




DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars building enterprise-wide data warehouses. Many people feel that, with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.

“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”

Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.

“OK,” you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term “warehouse DBMS” is used to refer to the management and utilization of data warehouses. We will not make this distinction here.

“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending);
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic region, in order to fine-tune production strategies;
(3) analyzing operations and looking for sources of profit;
(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. Products such as IBM's Data Joiner and Informix's DataBlade belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
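
As a rough illustration of the update-driven approach, the following minimal Python sketch integrates two heterogeneous sources in advance and stores a summarized result in a separate warehouse. Everything in it is an assumption made for the example and does not come from the text: the `orders` table, the flat-file layout in `FLAT_FILE`, the `sales_by_month` summary table, and the choice of SQLite.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source 1: an operational relational database (in-memory here).
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE orders (cust TEXT, amt_cents INTEGER, day TEXT)")
ops.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", 1250, "2024-01-05"), ("bob", 800, "2024-01-07"),
     ("alice", 300, "2024-02-01")],
)

# Hypothetical source 2: a flat file with different column names and units.
FLAT_FILE = (
    "customer_name,amount_usd,date\n"
    "BOB,4.50,2024-02-03\n"
    "Carol,10.00,2024-02-09\n"
)


def extract_rows():
    """Yield (customer, amount_cents, month) in one consistent encoding."""
    for cust, cents, day in ops.execute("SELECT cust, amt_cents, day FROM orders"):
        yield cust.lower(), cents, day[:7]
    for rec in csv.DictReader(io.StringIO(FLAT_FILE)):
        # Data cleaning: unify name casing and convert dollars to cents.
        cents = round(float(rec["amount_usd"]) * 100)
        yield rec["customer_name"].lower(), cents, rec["date"][:7]


def load_warehouse():
    """Integrate and summarize the sources in advance (update-driven)."""
    totals = defaultdict(int)
    for customer, cents, month in extract_rows():
        totals[(customer, month)] += cents

    wh = sqlite3.connect(":memory:")
    # Every row carries an explicit element of time (the month).
    wh.execute(
        "CREATE TABLE sales_by_month ("
        "customer TEXT, month TEXT, total_cents INTEGER, "
        "PRIMARY KEY (customer, month))"
    )
    wh.executemany(
        "INSERT INTO sales_by_month VALUES (?, ?, ?)",
        [(c, m, t) for (c, m), t in totals.items()],
    )
    wh.commit()
    return wh


warehouse = load_warehouse()
# Analytical queries now read only the warehouse, never the sources.
monthly = warehouse.execute(
    "SELECT month, SUM(total_cents) FROM sales_by_month "
    "GROUP BY month ORDER BY month"
).fetchall()
print(monthly)
```

The sources are touched only while `load_warehouse` runs. Later aggregation queries hit the pre-integrated store, so they do not compete with processing at the local sources. The price is that the warehouse reflects the data as of the last load, not the most current state.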
1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP from OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
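
The contrast between OLTP and OLAP access patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `account` table, the small star schema (`fact_sales` with `dim_time` and `dim_product`), and the use of SQLite are hypothetical choices for the example. Both workloads share one in-memory database purely for brevity, whereas the text argues for keeping them in separate systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- OLTP side: application-oriented table, accessed by primary key.
CREATE TABLE account (id INTEGER PRIMARY KEY, balance_cents INTEGER NOT NULL);
INSERT INTO account VALUES (1, 10000), (2, 5000);

-- OLAP side: a small star schema (one fact table, two dimension tables).
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (time_id INTEGER, product_id INTEGER, revenue_cents INTEGER);
INSERT INTO dim_time VALUES (1, 2023, 4), (2, 2024, 1);
INSERT INTO dim_product VALUES (10, 'books'), (11, 'games');
INSERT INTO fact_sales VALUES (1, 10, 700), (1, 11, 1200), (2, 10, 900), (2, 10, 100);
""")


def transfer(conn, src, dst, cents):
    """Short, atomic OLTP transaction: both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance_cents = balance_cents - ? WHERE id = ?",
            (cents, src),
        )
        conn.execute(
            "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
            (cents, dst),
        )


def revenue_by_year_and_category(conn):
    """Read-only OLAP query: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT t.year, p.category, SUM(f.revenue_cents)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY t.year, p.category
        ORDER BY t.year, p.category
    """).fetchall()


transfer(db, 1, 2, 2500)
balances = db.execute("SELECT id, balance_cents FROM account ORDER BY id").fetchall()
summary = revenue_by_year_and_category(db)
print(balances)
print(summary)
```

The transfer touches two rows by primary key and must be atomic, which is why it needs transaction support. The summary query scans and groups the whole fact table but modifies nothing, which is the kind of work a warehouse is organized to serve.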