




Lesson 10 Data Warehouse Overview
Vocabulary · Important Sentences · Questions and Answers · Problems
The term data warehouse was coined by Bill Inmon in the early 1990s. He described it as an integrated collection of information that could help companies and organizations make better decisions.
To be effective, a data warehouse had to be integrated, subject oriented, non-volatile, and time variant. In this article, I will go over all these factors in detail. If you are building a data warehouse, it is important for you to understand why they are important.
Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data collected within the data warehouse can come from different sources but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse can be found within a given period of time.[1]
It is important that the information contained within a data warehouse is stable. While data can be added, it should never be deleted. This property is referred to as being non-volatile. When a company uses a data warehouse that is stable, it gets a better understanding of the operations within the company. Although these terms were coined in the 1990s, they are still highly accurate today. However, it should be noted that some data warehouses are volatile. Many modern data warehouses deal with terabytes of data, and because they must store so much, many companies are forced to delete some of their information after a certain period of time. For instance, some companies will systematically delete data that has reached three years of age.
Before a data warehouse can be built, the correct data must be located. Generally, the information that will be added to the warehouse will come from daily information or historical information. The historical information may be stored in a legacy system and can be challenging to extract.
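The contrast drawn earlier between a non-volatile warehouse (rows are only added) and one that deletes data after three years can be sketched in a few lines of Python. This is a minimal illustration only: the names `warehouse`, `append_row`, and `purge_older_than`, as well as the sample rows and dates, are assumptions made for this example and do not come from the text.

```python
from datetime import date

# Hypothetical warehouse rows: (load_date, payload). Rows are only ever appended.
warehouse = []

def append_row(load_date, payload):
    """Add a row; existing rows are never updated or removed here."""
    warehouse.append((load_date, payload))

def purge_older_than(rows, today, max_age_years=3):
    """Return only the rows that have not yet reached the age limit."""
    # Assumes `today` is not 29 February, because replace() would fail
    # when the target year is not a leap year.
    cutoff = today.replace(year=today.year - max_age_years)
    return [row for row in rows if row[0] > cutoff]

append_row(date(2020, 1, 15), "order 1001")
append_row(date(2023, 6, 1), "order 2002")
append_row(date(2024, 3, 9), "order 3003")

# A "volatile" warehouse applies the retention policy; the append-only
# list itself is left untouched in this sketch.
kept = purge_older_than(warehouse, today=date(2024, 6, 30))
```

Here `kept` holds only the two newer orders, while `warehouse` still holds all three rows, which is the behaviour a strictly non-volatile design would preserve.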
The design of the data warehouse is important as well. Designers must make sure the design is consistent with the queries that will be run against the warehouse. To do this successfully, designers need to understand the database schema. It is crucial to make sure the data warehouse is designed correctly, as it is difficult to recreate some forms of data.
Another important aspect of data warehouses is data acquisition. Data acquisition can be defined as transferring data from a source to the warehouse. It is one of the most expensive parts of building a data warehouse. This process will often be conducted with an ETL (Extract, Transform, Load) tool.
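To make the three ETL stages concrete, the following is a minimal Python sketch using only the standard library. It is not how any commercial ETL tool works internally; the sample data in `SOURCE_CSV`, the alias table `CANONICAL_NAMES`, the `sales` table, and the function names are all hypothetical choices for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or a database.
SOURCE_CSV = """customer,amount
MS,120.50
Microsoft,79.50
 Acme ,10.00
"""

# Hypothetical mapping of known aliases to one canonical name.
CANONICAL_NAMES = {"MS": "Microsoft"}

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, unify aliases, convert amounts to numbers."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip()
        name = CANONICAL_NAMES.get(name, name)
        cleaned.append((name, float(row["amount"])))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
```

After the run, `totals` contains one entry per canonical customer name, because the transform step stored both spellings of the same company under a single name before loading.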
As of this writing, there are just over 50 ETL tools being sold. It may cost a company millions of dollars to transfer data from sources to the warehouse. Once the initial data has been transferred to the data warehouse, the process must be repeated consistently. Data acquisition is a continuous process, and the goal of a company is to make sure the warehouse is updated on a regular basis. When the warehouse is updated, it is often hard to determine which information in the source has changed since the previous update. The process of dealing with this issue is called changed data capture. It has become a separate field, and a number of products are currently being sold to deal with it.
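One simple way to find what has changed since the previous update is to compare two snapshots of the source by key and by a hash of each row. The Python sketch below shows that snapshot-comparison idea only; commercial changed data capture products often rely on other mechanisms, such as database logs or timestamps. The function names and the sample snapshots are assumptions made for this example.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's values so that changes can be detected cheaply."""
    joined = "|".join(str(row[key]) for key in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def detect_changes(previous, current, key="id"):
    """Compare two snapshots and return (inserted, updated, deleted) keys."""
    old = {row[key]: row_fingerprint(row) for row in previous}
    new = {row[key]: row_fingerprint(row) for row in current}
    inserted = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return inserted, updated, deleted

previous_snapshot = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]
current_snapshot = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 4, "name": "Di", "city": "Cairo"},
]
inserted, updated, deleted = detect_changes(previous_snapshot, current_snapshot)
```

Only the rows whose keys appear in `inserted` or `updated` would need to be sent to the warehouse on this run; the unchanged row with `id` 1 is skipped.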
It is important for data to be cleaned before it is placed in the warehouse. The data cleansing process is usually done during the data acquisition phase. Any data that is placed in a warehouse before being cleaned will pose a danger to the system and cannot be used. Data that has not been cleaned may be incorrect, and a company may make incorrect decisions based on it. This could lead to a number of problems. For example, all the information within a data warehouse that means the same thing must be stored in the same form. If there is information that reads “MS” and “Microsoft”, even though they mean the same thing, only one of them can be used to identify the element within the data warehouse.
1 Data Warehouse Tools
There are a number of important tools connected to data warehouses, and one of these is data aggregation. A data warehouse can be designed to store information at a certain level of detail. For example, you can store data for each transaction, or you can store it as a summary. These are examples of data aggregation. When data is summarized, queries run much faster. However, some of the detail is lost when data is summarized, and this detail may be important for solving a certain problem.
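The trade-off between transaction-level storage and a summary can be seen in a small Python sketch. The sample `transactions` and the function name `summarize_by_day` are hypothetical and serve only to illustrate the point.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (day, product, amount).
transactions = [
    ("2024-01-01", "widget", 10.0),
    ("2024-01-01", "gadget", 25.0),
    ("2024-01-01", "widget", 5.0),
    ("2024-01-02", "widget", 20.0),
]

def summarize_by_day(rows):
    """Aggregate to one total per day; the product detail is discarded."""
    totals = defaultdict(float)
    for day, _product, amount in rows:
        totals[day] += amount
    return dict(totals)

daily_summary = summarize_by_day(transactions)

# This question can be answered from the detail rows,
# but not from daily_summary, which no longer knows about products.
widget_total = sum(amount for _day, product, amount in transactions
                   if product == "widget")
```

The summary has two entries instead of four rows, so it is cheaper to query, but the per-product question can only be answered while the transaction-level rows are still available.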
Before you decide which one you will use, it is important to weigh your options carefully. Once you have carried out an operation, you will need to rebuild the warehouse in order to undo it. The best way to handle this situation is to make sure the data warehouse is constructed with a large amount of detail. However, the cost for this can be huge, depending on the storage options you choose.
Once you have filled your data warehouse with important information, you will want to use this data to help you make smart investment decisions. The tools that allow you to do this fall under a topic called business intelligence.
Business intelligence is a very diverse field. It comprises things such as Executive Information Systems and Decision Support Systems. Business intelligence can be broken down further into a field called multi-dimensional analysis tools. These are tools that allow a user to view data from a wide variety of angles. A query tool allows a user to send SQL queries to a warehouse to look for results. Data mining is also a field that falls under business intelligence, and it allows you to look for patterns and relationships within a data warehouse.
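As a small illustration of sending SQL queries to a warehouse and viewing the same data from more than one angle, the Python sketch below uses the standard `sqlite3` module with an in-memory database. The `sales` table, its columns, the sample figures, and the helper `total_by` are all assumptions for this example; a real query tool would connect to the organization's actual warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 100.0),
        ("north", "gadget", 50.0),
        ("south", "widget", 70.0),
        ("south", "gadget", 30.0),
    ],
)

def total_by(conn, dimension):
    """Sum sales along one dimension (column) of the sales table."""
    # Column names cannot be bound as SQL parameters, so only
    # names from a fixed whitelist are accepted.
    if dimension not in {"region", "product"}:
        raise ValueError(f"unknown dimension: {dimension}")
    query = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension}"
    return dict(conn.execute(query))

by_region = total_by(conn, "region")
by_product = total_by(conn, "product")
```

The same four rows yield one view per region and another per product, which is the basic idea behind looking at data from several angles.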
Another tool connected to data warehouses is data visualization. The tools used for data visualization present visual models of data. These could come in the form of intricate 3D images. The goal of data visualization is to allow the user to view trends in a way that is easier to understand than complicated models based on statistics. One tool that is allowing this field to advance is VRML, or Virtual Reality Modeling Language.
For data warehouses to function properly, it is also important to place an emphasis on metadata management. Metadata can be described as “information about information”.
Metadata must be managed when data is acquired or analyzed. Metadata is held in a repository and can give you important information about many of the data warehouse tools. The process of properly managing metadata has become a science in itself. If it is done properly, the company can benefit greatly. It is important because it allows organizations to analyze the changes that occur within database tables. It plays an important part in the construction of a data warehouse.
Data warehousing is a somewhat complicated field. Many vendors are attempting to advertise the tools, but the cost and complexity of the products have kept them from being used by a large number of companies. Any company that is thinking of using data warehouses must make sure it has taken the time to review and understand the technology. It can only be useful if you know how to use it. Once you understand and acquire the technology, it is possible to gain a powerful advantage over your competitors. This has made data warehouses attractive to many companies.
One of the biggest advantages of data warehouses is that they allow you to store information that you can use to improve the marketing strategies of your company. Not only can you improve the marketing strategies, but you will also be able to make strategic decisions based on the information you have compiled and organized. With techniques such as data mining and data visualization, you will be able to discover important patterns that you didn’t know existed. The patterns that you discover can allow your company to earn large profits.
2 Data Warehousing Methods
Most organizations agree that data warehouses are a useful tool. They benefit from the ability to store and analyze data, and this can allow them to make sound business decisions. It is also important for them to make sure the correct information is published, and it should be easy to access for the people who are responsible for making decisions.
Two elements make up the data warehouse environment: presentation and staging. The staging area is also known as the acquisition area. It is composed of ETL operations, and once the data has been prepared, it is sent to the presentation area.
When the data is placed within the presentation area, a number of programs will analyze and review it. While many organizations agree on the overall goal of data warehouses, the approaches to building them may differ. Attempting to use data marts alone is not a good approach, because they are geared towards departments. In addition, using data marts alone will be inefficient, and you will run into a number of long-term problems. Two techniques for building data warehouses have become very popular: the Kimball Bus Architecture and the Corporate Information Factory.
With the Kimball technique, the raw data will be transformed and refined within the staging area. It is important to make sure the data is properly handled during this step. During the staging process, the raw data will be pulled from the source systems. While some of the staging processes may be centralized, others will be distributed. The presentation area will have a dimensional structure, and this model will hold the same information as a standard model. However, it will be easier to use, and it will display information that is summarized.
A dimensional model is created for a business operation; departments within the organization do not play a role in this. The data is populated once it is placed within the dimensional warehouse and does not depend on the various departments that may compose an organization. When business processes have been developed within the warehouse, the system will become highly efficient.
The next popular data warehouse approach that you will want to become familiar with is the Corporate Information Factory (CIF). Another name for this technique is the EDW (Enterprise Data Warehouse) approach. The data that is extracted from the source will be coordinated.
Within the CIF, a standard data warehouse is used to hold data repositories, and it may also have specific data warehouses that are designed for data mining. The data marts may be designed for specific departments, and they may hold summary data in the form of a dimensional structure. The atomic data may be obtained from the standard data warehouse. While there are some similarities between these two techniques, there are some notable differences as well.
One of the primary differences between these two techniques is the normalized data foundation. With the Kimball approach, the data structures that must be obtained before the dimensional presentation will depend on the source data and transformation. In most cases, duplicate storage of data in both dimensional and normalized foundations is not required. Many of the people who choose to use a normalized data structure believe that it is faster than the dimensional structure, but they often fail to take ETL into consideration.
Another thing that separates the two data warehouse approaches is the management of atomic data. With the CIF, atomic data will be stored within a normalized data warehouse. In contrast, the Kimball method states that the atomic data should be placed within a dimensional structure. When the data is placed within a dimensional structure, it can be summarized in a wide variety of ways.
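A dimensional structure that holds atomic data is often laid out as a fact table surrounded by dimension tables (commonly called a star schema). The Python sketch below assumes that layout to show how the same atomic rows can be summarized in different ways. The table names `fact_sales`, `dim_product`, and `dim_date`, their columns, and the sample values are hypothetical and are not taken from the Kimball or CIF literature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- Fact table at atomic grain: one row per individual sale.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO dim_date VALUES (1, '2024-01-31', '2024-01'), (2, '2024-02-01', '2024-02');
INSERT INTO fact_sales VALUES (1, 1, 40.0), (2, 1, 15.0), (1, 2, 60.0);
""")

# Summarize the atomic facts by product category.
by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.category
"""))

# Summarize the same atomic facts by month.
by_month = dict(conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_date AS d USING (date_id)
    GROUP BY d.month
"""))
```

Because the fact table keeps every individual sale, any new summary (for example by day, or by product name) is just another join and `GROUP BY` over the same rows.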
It is important to make sure the information you have is detailed so that users will be able to ask relevant questions. While most users will not place an emphasis on the details of one atomic transaction, they may want a summary of a large number of transactions. It is important for them to have the details so that they will be able to answer important questions. The approach that you choose should be the one that best serves the needs of your company.
3 Data Warehouse Design Strategies
To build an effective data warehouse, it is important for you to understand data warehouse design principles. If your data warehouse is not built correctly, you can run into a number of different problems.
The proper methods for building a powerful data warehouse are based on information technology tactics. First off, it is important that you and your organization understand the importance of having a data warehouse. If workers feel that a data warehouse is unnecessary, they may not use it, and this could cause conflicts. Everyone in your organization should understand the importance of using the system.
After you have got your colleagues behind the concept of using a data warehouse, you will next want to focus on data integrity. You will want to avoid designing a data warehouse that loads data that is not consistent. It is also important to avoid creating a database that replicates data. The goal of your organization should be to integrate data and create standards that will be used and followed.
After data integrity, you will next want to look at implementation efficiency. This basically means that you will want to design a system that is simple to use. It doesn’t matter how well designed your data warehouse is if your workers have a hard time using it.
If your workers have a hard time using the data warehouse, it will slow down the speed and productivity of your operation. When it comes to creating a data warehouse, you will want to make it as simple as possible. All of your workers should be able to use it without problems. Implementation efficiency is a principle that naturally leads to the next topic you will want to focus on: user friendliness. This concept is an important part of your business, because end users will not utilize a program that is too difficult to use. Keep them in mind, and use a design that is friendly and easy to learn.
Once you have designed a data warehouse that is user friendly, you will next want to look at operational efficiency. Once the data warehouse has been created, it should be able to carry out operations quickly. In addition, it should not have errors or other technical problems. When errors or technical problems do occur, they should be simple to fix. Another thing you will want to look at is the cost of supporting the system. You will want to keep these costs as low as possible.
The design principles that have been discussed in this article so far are more related to business than to information technology. However, there are a number of IT design principles that you will want to follow. One of these is scalability. This is a problem that many data warehouse designers run into. The best way to deal with this issue is to create a data warehouse that is scalable from the beginning. Design it in a way that allows it to support expansions or upgrades. You should be able to adapt it to a number of different business situations. The best data warehouses are those that are scalable.
The data warehouse that you design should fall under the guidelines of information technology standards. Every tool that you use to build your data warehouse should work well with IT standards. You will want to make sure it is designed in a way that makes it easier for your workers to use. While following the guidelines in this article won’t guarantee success, it will greatly tip the odds in your favor. You should be wary of companies that promise you perfect results if you use their design methods.[2] No matter how well designed your data warehouse is, you will always run into problems. However, following the right principles will make the problems easier to recognize and solve.
When it comes to using a data warehouse, it is not a matter of “if” you will run into problems. It is a matter of “how” and “when”. When your data warehouse is well designed, you will be better equipped to solve any problems you encounter.
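Vocabulary
1. warehouse n. storehouse, depot.
2. go over: to be well received, to gain acceptance; to examine, to review.
3. orient vt., vi. to familiarize, to adapt; to point toward; to determine the position or direction of. n. the East, Asia.
4. variant n. a variant form; a variety; a variation. adj. different; differing; varying; various.
5. specific adj. explicit, exact, detailed; concrete, particular, special; limited to something.
6. volatile adj. flying; evaporating readily; changeable, unstable; lively; explosive. n. a winged creature; a volatile substance.
7. schema n. outline, plan, diagram, pattern.
8. acquisition n. the act of obtaining; something or someone acquired; a purchase.
9. aggregation n. collection, clustering, integration; an aggregate, a group.
10. strategy n. strategy, tactics, stratagem, plan of campaign; resourcefulness, artifice. Example: strategy and tactics.
11. intricate adj. complicated, tangled, hard to understand.
12. mart n. market; trading place.
13. repository n. warehouse, storehouse; container, museum; a very learned person; a trusted person, confidant.
14. staging n. the holding or conducting of something; configuration, arrangement in stages, stage, stage group, transport in stages; grading method.
15. populate vt. to inhabit, to settle people in; to migrate to; to colonize. Example: a densely (sparsely) populated city.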
Important Sentences
[1] Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data collected within the data warehouse can come from different sources but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse can be found within a given period of time.
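“Subject oriented” means that the data provides information about one specific subject rather than about how the company operates. Because a data warehouse is subject oriented, it allows you to analyze, for a specific subject, the related …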