Data Warehouse (Translated Foreign-Language Text)

Uploaded: 2022-01-20 · Format: DOC · Pages: 7

Document overview

DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon, a way to keep customers by learning more about their needs.

“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”

Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting,
and decision making.

“OK,” you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.

“How are organizations using
the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending);
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies;
(3) analyzing operations and looking for sources of profit;
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. Products such as IBM's Data Joiner and Informix's DataBlade belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
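
The contrast between the query-driven and the update-driven approach can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a description of how any real product works: the two in-memory sources, their field names, and every function name below are hypothetical.

```python
from collections import defaultdict

# Two hypothetical heterogeneous sources with different naming and units.
SOURCE_A = [  # amounts in dollars
    {"region": "east", "amount_usd": 120.0},
    {"region": "west", "amount_usd": 80.0},
]
SOURCE_B = [  # amounts in cents, region stored under a different key
    {"area": "east", "amount_cents": 5000},
    {"area": "east", "amount_cents": 2500},
]


def wrap_a(row):
    """Wrapper: translate a SOURCE_A row into the common schema."""
    return {"region": row["region"], "amount": row["amount_usd"]}


def wrap_b(row):
    """Wrapper: translate a SOURCE_B row into the common schema."""
    return {"region": row["area"], "amount": row["amount_cents"] / 100}


def integrated_rows():
    """Read every source and return rows in the common schema."""
    return [wrap_a(r) for r in SOURCE_A] + [wrap_b(r) for r in SOURCE_B]


def query_driven_total(region):
    """Mediator: visit all sources at query time, then integrate the results."""
    return sum(r["amount"] for r in integrated_rows() if r["region"] == region)


def build_warehouse():
    """Update-driven: integrate and summarize once, ahead of any query."""
    totals = defaultdict(float)
    for row in integrated_rows():
        totals[row["region"]] += row["amount"]
    return dict(totals)


# The warehouse is loaded in advance and queried directly afterwards.
WAREHOUSE = build_warehouse()


def update_driven_total(region):
    """Answer from the precomputed store; the sources are not touched."""
    return WAREHOUSE.get(region, 0.0)
```

In this sketch, appending a row to `SOURCE_A` changes the result of `query_driven_total` immediately, while `update_driven_total` stays the same until `build_warehouse()` is run again. That mirrors the remark above that a warehouse does not contain the most current information, and that its queries do not compete with processing at the local sources.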

1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different
organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query
often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
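
The difference between detailed operational data and consolidated warehouse data can be illustrated with Python's standard `sqlite3` module. This is a small sketch under stated assumptions: the table names, columns, and sample rows are hypothetical, and both tables live in one in-memory database only for brevity, whereas the text describes a warehouse as a physically separate store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational (OLTP-style) table: detailed, current transactions.
cur.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT, product TEXT, order_date TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "widget", "2021-01-15", 10.0),
        (2, "bob", "widget", "2021-02-03", 20.0),
        (3, "alice", "gadget", "2021-04-20", 30.0),
        (4, "carol", "widget", "2021-05-11", 40.0),
    ],
)

# Typical OLTP access: a short lookup of one record by primary key.
cur.execute("SELECT customer, amount FROM orders WHERE order_id = ?", (3,))
one_order = cur.fetchone()

# Warehouse-style table: data consolidated in advance by subject (product)
# and time (quarter), so every row carries an explicit element of time.
cur.execute(
    "CREATE TABLE sales_by_quarter ("
    " product TEXT, quarter TEXT, total_amount REAL, order_count INTEGER)"
)
cur.execute(
    "INSERT INTO sales_by_quarter"
    " SELECT product, quarter, SUM(amount), COUNT(*)"
    " FROM ("
    "   SELECT product, amount,"
    "          substr(order_date, 1, 4) || '-Q' ||"
    "          ((CAST(substr(order_date, 6, 2) AS INTEGER) + 2) / 3)"
    "          AS quarter"
    "   FROM orders"
    " )"
    " GROUP BY product, quarter"
)
conn.commit()

# Typical OLAP access: a read-only aggregation over the summarized data.
cur.execute(
    "SELECT quarter, SUM(total_amount) FROM sales_by_quarter"
    " GROUP BY quarter ORDER BY quarter"
)
quarterly = cur.fetchall()

print(one_order)
print(quarterly)
```

The analytical query reads only `sales_by_quarter`, so it never scans the transaction table that the operational workload is updating. Keeping the summary in its own table is the design choice the section argues for: consolidation is done once at load time instead of at every query.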
