1 / 61

December 5, 2005

Data Warehouse Architecture Best Practices. December 5, 2005. Speaker: R. Michael Pickering President, Cohesion Systems Consulting Inc. Agenda. Introductions Business Intelligence Background Architecture Best Practices Questions & Answers. Data Warehouse Architecture Best Practices.

zea
Download Presentation

December 5, 2005

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse Architecture Best Practices December 5, 2005 Speaker: R. Michael Pickering President, Cohesion Systems Consulting Inc.

  2. Agenda • Introductions • Business Intelligence Background • Architecture Best Practices • Questions & Answers DW Architecture Best Practices

  3. Data Warehouse Architecture Best Practices Introductions

  4. Presenter Biography • R. Michael Pickering • President and Chief Architect, Cohesion Systems Consulting Inc. • previously, Managing Consultant, BI&W, Oracle Consulting (Canada) • before that, Red Brick Systems, Inc. • over 8 years DW experience • Manulife Reinsurance, Bell Canada, USDA, Kraft Foods, LCBO, Telecom Argentina, Nortel Networks, Procter & Gamble, Bayer, Syncrude, OMoHLTC… • Mr. Pickering has had DW articles published in The Handbook of Data Management DW Architecture Best Practices

  5. Cohesion Systems Consulting • Provides DW and BI services, specializing in: • Architecture & Implementation Consulting • Project Management • Databases, Appliances & Emerging Technology • Training & Mentoring • Since inception in 2000, clients have included Enbridge, CIBC, The Bank of New York, Loyalty Management Group, Canada Post Borderfree, Katz Group DW Architecture Best Practices

  6. Audience Survey • By a show of hands, please indicate your experience with: • normalization • dimensional modeling • operational data store • data consolidation • Extract Transform Load (ETL) • metadata architecture • DW appliances DW Architecture Best Practices

  7. Data Warehouse Architecture Best Practices Business Intelligence Background

  8. What is Business Intelligence? • A Data Warehouse is usually one component of an overall business intelligence solution • IT people may be tempted to think in terms of products and technologiesBUT... DW Architecture Best Practices

  9. Overarching Goal • The overarching goal of business intelligence is to provide the information necessary to MANAGE a business • This means providing information in support of management decision making, which is why BI is also called “Decision Support” DW Architecture Best Practices

  10. Stages (4) BI is about “Data Abstraction” wisdom knowledge information data • audience for a data warehouse typically considers higher slices of data abstraction pyramid • lowest level of pyramid is too detailed & unwieldy DW Architecture Best Practices

  11. It’s Not Technology • Business Intelligence is about delivering business value • provide tangible benefit by answering important questions that can help the business to achieve its strategic focus • Improving profitability • Who are our five most profitable clients? • What are our least profitable products? • Reducing cost • Who are our lowest cost suppliers? • Which materials incur highest spoilage costs? • Improving customer satisfaction • What factors may lead to lost customers? DW Architecture Best Practices

  12. Business of BI • In some cases, legislation such as Sarbanes-Oxley or Basel II makes some kind of BI fundamental to doing business • Many leading companies use BI to achieve competitive advantage • E.g. Walmart, Dell, Amazon.com, Kraft, American Express, etc… DW Architecture Best Practices

  13. Data Warehouse Architecture • architecture is about delivering an elegant solution that meets the solution requirements • this means really understanding the problem • DW architecture is part art, part science DW Architecture Best Practices

  14. Good Architecture • ‘It’s not easy to describe a good design, but I’ll know it when I see it’ DW Architecture Best Practices

  15. BI Architecture Requirements • must recognize change as a constant • take incremental development approach • existing applications must continue to work • need to allow more data and new types of data to be added DW Architecture Best Practices

  16. End User Acceptance • understandability • understandability is in the eyes of the beholder • want to hide the complexity • try to make it: • intuitive, obvious • visible, memorable DW Architecture Best Practices

  17. End User Acceptance • performance • don’t want to interrupt the thinking process • provide one click, instantaneous access • warehouse must be available, “production” system DW Architecture Best Practices

  18. Data Warehouse Architecture Best Practices Architecture Best Practices

  19. High Level Architecture • remember the different “worlds” • on-line transaction processing (OLTP) • business intelligence systems (BIS) • users are different • data content is different • data structures are different • architecture & methodology must be different DW Architecture Best Practices

  20. Two Different Worlds • On-Line Transaction Processing • Entity Relational Data Model • created in 1960’s to address performance issues with relational database implementations • normalized to most efficiently get data in • divides the data into many discrete entities • many relationships between these entities • this approach was documented by C.J. Date in An Introduction to Database Systems DW Architecture Best Practices

  21. Two Different Worlds • Business Intelligence Systems • Dimensional Data Model • also called star schema • designed to easily get information out • fewer relationships than ERD, the only table with multiple joins connecting to other tables is the central table • developed in 1960’s by data service providers, formalized by Ralph Kimball in The Data Warehouse Toolkit DW Architecture Best Practices

  22. Entity Relation Disadvantages • all tables look the same • people can’t visualize/remember diagrams • software can’t navigate as schema becomes too complex • business processes mixed together • many artificial keys created DW Architecture Best Practices

  23. Dimensional Model Advantages • simplicity • humans can navigate and remember • software can navigate deterministically • business process explicitly separated (Data Mart) • not so many keys (keys = # of attendant tables) DW Architecture Best Practices

  24. Best Practice #1 • Use a data model that is optimized for information retrieval • dimensional model • denormalized • hybrid approach DW Architecture Best Practices

  25. Data Acquisition Processes • Extract Transform Load (ETL) • the process of unloading or copying data from the source systems, transforming it into the format and data model required in the BI environment, and loading it to the DW • also, a software development tool for building ETL processes (an ETL tool) • many production DWs use COBOL or other general-purpose programming languages to implement ETL DW Architecture Best Practices

  26. Data Quality Assurance • data cleansing • the process of validating and enriching the data as it is published to the DW • also, a software development tool for building data cleansing processes (a data cleansing tool) • many production DWs have only very rudimentary data quality assurance processes DW Architecture Best Practices

  27. Data Acquisition & Cleansing • getting data loaded efficiently and correctly is critical to the success of your DW • implementation of data acquisition & cleansing processes represents from 50 to 80% of effort on typical DW projects • inaccurate data content can be ‘the kiss of death’ for user acceptance DW Architecture Best Practices

  28. Best Practice #2 • Carefully design the data acquisition and cleansing processes for your DW • Ensure the data is processed efficiently and accurately • Consider acquiring ETL and Data Cleansing tools • Use them well! DW Architecture Best Practices

  29. Data Model • Already discussed the benefits of a dimensional model • No matter whether dimensional modeling or any other design approach is used, the data model must be documented DW Architecture Best Practices

  30. Documenting the Data Model • The best practice is to use some kind of data modeling tool • CA ERwin • Sybase PowerDesigner • Oracle Designer • IBM Rational Rose • Etc. • Different tools support different modeling notations, but they are more or less equivalent anyway • Most tools allow sharing of their metadata with an ETL tool DW Architecture Best Practices

  31. Data Model Standards • data model standards appropriate for the environment and tools chosen in your data warehouse should be adopted • considerations should be given to data access tool(s) and integration with overall enterprise standards • standards must be documented and enforced within the DW team • someone must ‘own’ the data model • to ensure a quality data model, all changes should be reviewed thru some formal process DW Architecture Best Practices

  32. Data Model Metadata • Business definitions should be recorded for every field (unless they are technical fields only) • Domain of data should be recorded • Sample values should be included • As more metadata is populated into the modeling tool it becomes increasingly important to be able to share this data across ETL and Data Access tools DW Architecture Best Practices

  33. Metadata Architecture • The strategy for sharing data model and other metadata should be formalized and documented • Metadata management tools should be considered & the overall metadata architecture should be carefully planned DW Architecture Best Practices

  34. Best Practice #3 • Design a metadata architecture that allows sharing of metadata between components of your DW • consider metadata standards such as OMG’s Common Warehouse Metamodel (CWM) DW Architecture Best Practices

  35. Alternative Architecture Approaches • Bill Inmon: “Corporate Information Factory” • Hub and Spoke philosophy • “JBOC” – just a bunch of cubes • Let it evolve naturally DW Architecture Best Practices

  36. What We Want(Architectural Principal) • In most cases, business and IT agree that the data warehouse should provide a ‘single version of the truth’ • Any approach that can result in disparate data marts or cubes is undesireable • This is known as data silos or… DW Architecture Best Practices

  37. Enterprise DW Architecture • how to design an enterprise data warehouse and ensure a ‘single version of the truth’? • according to Kimball: • start with an overall data architecture phase • use “Data Warehouse Bus” design to integrate multiple data marts • use incremental approach by building one data mart at a time DW Architecture Best Practices

  38. Data Warehouse Bus Architecture • named for the bus in a computer • standard interface that allows you to plug in cdrom, disk drive, etc. • these peripherals work together smoothly • provides framework for data marts to fit together • allows separate data marts to be implemented by different groups, even at different times DW Architecture Best Practices

  39. Data Mart Definition • data mart is a complete subset of the overall data warehouse • a single business process OR • a group of related business processes • think of a data mart as a collection of related fact tables sharing conformed dimensions, aka a ‘fact constellation’ DW Architecture Best Practices

  40. Designing The DW Bus • determine which dimensions will be shared across multiple data marts • conform the shared dimensions • produce a master suite of shared dimensions • determine which facts will be shared across data marts • conform the facts • standardize the definitions of facts DW Architecture Best Practices

  41. Dimension Granularity • conformed dimensions will usually be granular • makes it easy to integrate with various base level fact tables • easy to extend fact table by adding new facts • no need to drop or reload fact tables, and no keys have to be changed DW Architecture Best Practices

  42. Conforming Dimensions • by adhering to standards, the separate data marts can be plugged together • e.g. customer, product, time • they can even share data usefully, for example in a drill across report • ensures reports or queries from different data marts share the same context DW Architecture Best Practices

  43. Conforming Dimensions (cont’d) • accomplish this by adding any dimension attribute(s) needed in any data mart(s) to the standard dimension definition • attributes not needed everywhere can always be ignored • typically harder to determine how to load conformed dimensions than to design them initially • need a single integrated ETL process • what is the SOR for each attribute? • how do we deal with attributes for which there is more than one possible SOR? DW Architecture Best Practices

  44. Conforming Facts • in an enterprise, some metrics may not have the same generally accepted definition across all business units • conforming facts is generally a bigger design challenge than conforming dimensions • why? DW Architecture Best Practices

  45. Conforming Facts - Benefits • ensures the constituent data marts can as clearly as possible represent fact data expressed on the same basis using consistent definitions • ensures reports or queries from different data marts share consistent content • success of an Enterprise DW hinges on successfully conformed facts • any perceived inconsistencies in fact definitions across data marts will generally be considered to be a DW bug or data problem by users • if users don’t have full confidence in data quality they may stop using the DW DW Architecture Best Practices

  46. Data Consolidation • a current trend in BI/DW is ‘data consolidation’ • from a software vendor perspective, it is tempting to simplify this: • ‘we can keep all the tables for all your disparate applications in one physical database’ DW Architecture Best Practices

  47. Data Integration • To truly achieve ‘a single version of the truth’, must do more than simply consolidating application databases • Must integrate data models and establish common terms of reference DW Architecture Best Practices

  48. Best Practice #4 • Take an approach that consolidates data into ‘a single version of the truth’ • Data Warehouse Bus • conformed dimensions & facts • OR? DW Architecture Best Practices

  49. Operational Data Store (ODS) • a single point of integration for disparate operational systems • contains integrated data at the most detailed level (transactional) • may be loaded in ‘near real time’ or periodically • can be used for centralized operational reporting DW Architecture Best Practices

  50. Role of an ODS in DW Architecture • In the case where an ODS is a necessary component of the overall DW, it should be carefully integrated into the overall architecture • Can also be used for: • Staging area • Master/reference data management • Etc… DW Architecture Best Practices

More Related