1 / 23

Building the Warehouse

Building the Warehouse. Chapter 10. Overview. Planning Warehouse Storage. Meeting a Business Need. Defining DW Concepts & Terminology. Choosing a Computing Architecture. ETT (Building The Warehouse). Managing The Data Warehouse. Modeling The Data Warehouse. Planning

Download Presentation

Building the Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building the Warehouse Chapter 10

  2. Overview Planning Warehouse Storage Meeting a Business Need Defining DW Concepts & Terminology Choosing a Computing Architecture ETT (Building The Warehouse) Managing The Data Warehouse Modeling The Data Warehouse Planning For a Successful Warehouse Analyzing User Query Needs Supporting End User Access Project Management (Methodology, Maintaining Metadata)

  3. Extraction/Transformation/Transportation Process (ETT) * Extract source data * Load data into WH * Transform/clean data * Detect change * Index and summarize * Refresh data ETT Programs Gateways Tools Operational systems Warehouse

  4. ETT Processes • Must result in data that is relevant, useful, high-quality, accurate, and accessible • Require a large proportion of warehouse development time and resources Relevant ETT Useful Clean up Quality Consolidate Accurate Restructure Warehouse Accessible Opertational Systems

  5. Data Staging Area • The Construction site for the warehouse • Required by most implementations • Composed of ODS, flat files, or relational server tables • Frequently configured as multitier staging Operational system Data Staging area Warehouse Transport (Load) Extract

  6. Remote Staging Model Data staging area within the warehouse environment Warehouse environment Oper.envt. Data Staging area Warehouse Operational system Data staging area in its own environment, avoiding negative impact on the warehouse environment Oper.envt. Staging envt. Warehouse envt. Operational system Data Staging area Extract, Transform, transport Warehouse Transport (Local)

  7. Onsite Staging Model Data staging area within the operational environment, possibly affecting the operational system Operational environment WH envt. Operational system Data staging area Warehouse Extract Transform

  8. Extracting Data Data mapping • Routines developed to select fields from source • Various data formats • Rules, audit trails, error correction facilities Transform Data Staging area Operational databases Warehouse database

  9. Source Systems • Production • Archive • Internal • External

  10. Production Data IMS DB2 VSAM NonStop SQL Oracle Sybase Rdb SAP Shared Medical Systems Dun and Bradstreet Financials Hogan Financials Oracle Financials • Operating system platforms • Hardware platforms • File systems • Database systems and vertical applications

  11. Archive Data • Historical data • Useful for analysis over long periods of time • Useful for first-time load • May require unique transformations Warehouse database Operational database

  12. Internal Data • Planning, sales, and marketing organization data • Maintained by: - Spreadsheets (structured) - Documents (unstructured) • Treated like any other source data Planning Marketing Accounting Warehouse database

  13. External Data A.C.Nielsen, IRI, IMS, Waish America Competitive information • Information from outside the organization • Issues of frequency, format, and predictability • Described and tracked using metadata Purchased databases Dun and Bradstreet Economic forecasts Wall Street Journal Barron’s Warehousing databases

  14. Mapping • Defines which operational attributes to use • Defines how to transform the attributes for the warehouse • Defines where the attributes exist in the warehouse • Mapping tools are available Metadata File A Staging File One F1 Number F2 Name F3 DOB Staging File One Number USA123 Name Mr.Bloggs DOB 10-Dec-56 File A F1 123 F2 Bloggs F3 10/12/56

  15. Extraction Techniques • Programs: C, COBOL, PL/SQL • Gateways: transparent database access • In-house development is popular • Tools - High initial cost - Ongoing automation - Data cleanup

  16. Sources and Targets Data marts Data analysis Data mining OLAP

  17. Designing Extraction Processes • Analysis: - Source, technologies - Data types, quality, owners • Design options: - Manual, custom, gateway, third-party - Replication, full, or delta refresh • Design issues: - Batch window, volumes, data currency - Automation, skills needed, resources

  18. Maintaining Extraction Metadata • Source location, type, structure • Access method • Privilege information • Temporary storage • Failure procedures • Validity checks • Handlers for missing data

  19. Possible ETT Failure • A missing source file • A system failure • Poor mapping information • Inadequate storage planning • A source structural change • No contingency plan • Inadequate data validation

  20. Maintaining ETT Quality • ETT must be: - Tested - Documented - Monitored and reviewed • Display metadata must be coordinated

  21. Selection Criteria • Base functionality • Interface features • Metadata repository • Open API • Metadata access • Repository utilities • Input and output processing • Cleansing, reformatting, and auditing • Reference • Training requirements

  22. WTI Partner ETT Tools • Carleton • Constellar • Evolutionary Technologies • Informatica • Information Builders • Oracle EDMS, Toolkits, OADW • Prism Solutions • Sagent • Vality Technology

  23. Summary This lesson discussed the following topics: • ETT processes are essential and consume a large proportion of warehouse resources and time • The extraction process acquires source data • You may encounter many data sources • There are many data extraction issues • ETT Tools should be considered

More Related