3.51k likes | 4.06k Views
Data Warehousing Fundamentals. Course Objectives. After completing this course, you should be able to do the following: Describe the role of business intelligence (BI) and data warehousing in today’s marketplace
E N D
Course Objectives • After completing this course, you should be able to do the following: • Describe the role of business intelligence (BI) and data warehousing in today’s marketplace • Describe data warehousing terminology and the various technologies that are required to implement a data warehouse • Explain the implementation and organizational issues surrounding a data warehouse project • Identify data warehouse modeling concepts • Explain the extraction, transformation, and loading processes for building a data warehouse
Course Objectives • Identify management and maintenance processes that are associated with a data warehouse project • Describe methods for refreshing warehouse data • Explain warehouse metadata concepts • Identify tools that can be employed at each stage of the data warehouse project • Describe user profiles and techniques for querying the warehouse • Identify methods and tools for accessing and analyzing warehouse data
Lessons • Business Intelligence and Data Warehousing • Defining Data Warehouse Concepts and Terminology • Planning and Managing the Data Warehouse Project • Modeling the Data Warehouse • Building the Data Warehouse: Extracting Data • Building the Data Warehouse: Transforming Data • Building the Data Warehouse: Loading Warehouse Data • Refreshing Warehouse Data • Leaving a Metadata Trail • Managing and Maintaining the Data Warehouse
Lesson 1 Objectives • After completing this lesson, you should be able to do the following: • Describe the role of business intelligence in today’s marketplace • Describe why an online transaction processing system (OLTP) is not suitable for analytical reporting • Describe how extract processing for decision support querying led to data warehouse solutions that are employed today • Explain why businesses are driven to employ data warehouse technology
What Is Business Intelligence? “Business Intelligence is the process of transforming data into information and through discovery transforming that information into knowledge.” Gartner Group
Decision Knowledge Information Data Purpose of Business Intelligence The purpose of business intelligence is to convert the volume of data into business value through analytical reporting. Value Volume
Ad hoc access Productionplatforms Operational reports Decision makers Early Management Information Systems • MIS systems provided business data. • Reports were developed on request. • Reports provided little analysis capability. • Decision support tools gave personal ad hoc access to data.
Analyzing Data from Operational Systems • Data structures are complex. • Systems are designed for high performance and throughput. • Data is not meaningfully represented. • Data is dispersed. • OLTP systems may be unsuitable for intensive queries. Productionplatforms Operational reports
Operational systems Extracts Decisionmakers Data Extract Processing • End user computing offloaded from the operational environment • User’s own data
Operationalsystems Extracts Decisionmakers Extract Explosion Management Issues with Data Extract Programs
Productivity Issues with Extract Processing • Duplicated effort • Multiple technologies • Obsolete reports • No metadata
Data Quality Issues with Extract Processing • No common time basis • Different calculation algorithms • Different levels of extraction • Different levels of granularity • Different data field names • Different data field meanings • Missing information • No data correction rules • No drill-down capability
Data Warehousing and Business Intelligence Enterprise Data Warehouse Legacy Data Operations Data Analytical Reporting External Data Data Marts
Internal and external systems Datawarehouse Decisionmakers Advantages of Warehouse Processing Environments • Controlled • Reliable • Quality information • Single source of data
Advantages of Warehouse Processing Environments • No duplication of effort • No need for tools to support many technologies • No disparity in data, meaning, or representation • No time period conflict • No algorithm confusion • No drill-down restrictions
Success Factors for a Dynamic Business Environment • Know the business • Reinvent to face new challenges • Invest in products • Invest in customers • Retain customers • Invest in technology • Improve access to business information • Provide superior services and products • Be profitable
Business Drivers for Data Warehouses • Provide supporting information systems • Get quality information: • Reduce costs • Streamline the business • Improve margins
Technological Advances Enabling Data Warehousing • Hardware • Operating system • Database • Query tools • Applications • Large databases • 64-bit architectures • Indexing techniques • Affordable, cost-effective open systems • Robust warehouse tools • Sophisticated end user tools
Summary • In this lesson, you should have learned how to: • Describe the role of business intelligence in today’s marketplace • Describe why an online transaction processing system (OLTP) is not suitable for analytical reporting • Describe how extract processing for decision support querying led to data warehouse solutions employed today • Explain why businesses are driven to employ data warehouse technology
Practice 1-1 Overview • This practice covers the following topics: • Answering questions about data warehousing • Discussing how data warehousing meets business needs
Objectives • After completing this lesson, you should be able to do the following: • Identify a common, broadly accepted definition of a data warehouse • Describe the differences of dependent and independent data marts • Identify some of the main warehouse development approaches • Recognize some of the operational properties and common terminology of a data warehouse
Definition of a Data Warehouse • “A data warehouse is a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.” • — W.H. Inmon “数据仓库是一个面向主题的、集成的、随时间变化的、非易失的、用于战略决策的数据集合” “Building the Data Warehouse”(1991)
Definition of a Data Warehouse • “…数据仓库无非是所有数据集市的集合...” — Ralph Kimball • “数据仓库是信息数据库的具体实现,用来存储源自业务数据库的共享数据。典型的数据仓库应该是一个主题数据库,支持用户从巨大的运营数据存储中发现信息,支持对业务趋势进行跟踪和响应,实现业务的预测和计划。” — DM Review
Definition of a Data Warehouse “An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.” — Oracle’s Data Warehouse Definition “数据仓库是一个过程而不是一个项目” —另一角度描述数据仓库
Subject- oriented Integrated Data Warehouse Nonvolatile Time-variant Data Warehouse Properties
OLTP Applications Data Warehouse Subject Equity Plans Shares Insurance Loans Savings Customer financial information Subject-Oriented • Data is categorized and stored by business subject rather than by application.
Savings Current Accounts Loans Integrated • Data on a given subject is defined and stored once. Customer OLTP Applications Data Warehouse
Data Warehouse Time-Variant • Data is stored as a series of snapshots, each representing a period of time.
Nonvolatile • Typically data in the data warehouse is not updated or deleted. Operational Warehouse Load Insert, Update, Delete, or Read Read
First time load Refresh Refresh Purge or Archive Refresh Changing Warehouse Data Operational Databases Warehouse Database
Usage Curves • Operational system is predictable • Data warehouse: • Variable • Random
Enterprise wide Warehouse • Large scale implementation • Scopes the entire business • Data from all subject areas • Developed incrementally • Single source of enterprisewide data • Synchronized enterprisewide data • Single distribution point to dependent data marts
数据仓库设计中心思想 • 具有一个合适的粒度或细节以满足所有的数据集市 • 设计不能阻碍在数据集市中使用各种技术,能适应多维集市、统计、挖掘及探索型仓库
Data Marts • 数据仓库数据的一个子集。 BI环境中的大部分分析活动均在数据集市中进行。每个数据集市中的数据通常是为特定的功能所定制,不必对其他的使用有效。
OperationalSystems Marketing Finance Sales Flat Files Legacy Data External Data Operations Data External Data Dependent Data Mart Data Marts Data Warehouse Marketing Sales Finance HR
OperationalSystems Flat Files Legacy Data External Data Operations Data External Data Independent Data Mart Sales orMarketing
Features of a Data Mart • Not Real-Time Data • Consolidation and Cleansing
Warehouse Development Approaches • “Big bang” approach • Incremental approach: • Top-down incremental approach • Bottom-up incremental approach
Analyze enterprise requirements Build enterprise data warehouse Report in subsets or store in data marts “Big Bang” Approach
Top-Down Approach • Analyze requirements at the enterprise level • Develop conceptual information model • Identify and prioritize subject areas • Complete a model of selected subject area • Map to available data • Perform a source system analysis • Implement base technical architecture • Establish metadata, extraction, and load processes for the initial subject area • Create and populate the initial subject area data mart within the overall warehouse framework
Bottom-Up Approach • Define the scope and coverage of the data warehouse and analyze the source systems within this scope • Define the initial increment based on the political pressure, assumed business benefit and data volume • Implement base technical architecture and establish metadata, extraction, and load processes as required by increment • Create and populate the initial subject areas within the overall warehouse framework
Incremental Approach to Warehouse Development • Multiple iterations • Shorter implementations • Validation of each phase Increment 1 Strategy Definition Analysis Design Build Iterative Production
Data Warehousing Process Components • Methodology • Architecture • Extraction, Transformation, and Load (ETL) • Implementation • Operation and Support