1 / 18

Demonstration 10 EDW Implementation Strategy and Process

Learn about the standard implementation strategy and process for successful data warehouse demonstration. Discover the importance of prototyping, staging areas, and cross-table validations.

Download Presentation

Demonstration 10 EDW Implementation Strategy and Process

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Demonstration 10EDW Implementation Strategy and Process www.InstantBI.com 1/10/2012

  2. Agenda • Standard Implementation Strategy • Standard Implementation Process • Prototyping • Setting up the Staging Area

  3. Standard Implementation Strategy

  4. Prototyping • “Plan to build it twice. You will anyway” • Frederick P Brooks – The Mythical Man Month • The BEST IT development book ever written • DWs have become FAR more sophisticated • Early 90s (90-94) • 10-20 dimension tables and 5-10 fact tables • All about ‘sales’ and campaigns • These were the high value applications • 300+ work days to build the ETL for these in cobol • Today with models like BI4ALL or even custom development • 100+ dimension tables and 30+ fact tables (easily) • 200-300 work days to build the ETL • Much more of the ‘ETL time’ is ‘understanding data’

  5. Prototyping • Prototyping allows you to • Find data understanding errors before writing ETL • Quickly tune database model for performance • Start developing reports earlier • Report development tools have become more ‘complex’ • Ensure data integrity earlier • Data integrity is the #1 killer of ETL productivity • We used to only find errors when we wrote the ETL • Find ‘assumptions’ that do not hold up for this EDW • More easily communicate the ‘end result’ to business users • If you can’t build the prototype, you can’t build the real thing • Bottom Line: Strongly recommended

  6. System 1 Source Systems Reporting Systems BIApps 2 BIApps 1 Data Marts BIApps 3 System 2 Data Mart 1 System 3 Data Mart 2 Validate and Direct Connect Clean System 4 Data Mart 3 System 5 Data Mart 4 Trans Trans form EDW form EDW ODS And And Staging Data Mart 5 System 6 Load Area Load Extracts Extracts Data Mart 6 System 7 Commercial Specific Apps System 8 App 7 App 1 App 2 App 3 App 4 App 6 App 5 System 9 App 9 App 10 App 14 App 13 App 11 App 12 App 8 System 10 1

  7. Standard Implementation Strategy • Data is ‘somehow’ extracted from source systems • Must be careful to detect deletes • Can be a very difficult problem to solve • Data, as raw as possible, sent to Staging Area • Usually as files, sometimes as ODBC links • The OLTP system usually controls extraction schedule • Extract one field -> extract whole table • Data is profiled to determine real data type • Staging area tables use ‘real’ data types • Start developing understanding of raw data • Understanding data is a huge and difficult job usually

  8. Standard Implementation Strategy • Data loaded into ‘staging area’ • Today we can afford RDBMS staging areas (used to be files) • As data goes into the staging area fields are converted from possible native types to RDBMS types • Numeric/string dates, codes, flags, numerics in chars • Errors must be caught so both source and target columns are kept so the source value is visible • Calculations within a row are performed • Durations/elapsed times, ages etc • Three Flags are set • Row deleted from source – if it was deleted • Row valid – valid by default – set to ‘N’ if found invalid • Row sent to EDW = ‘N’ because it has not • It is then possible to run ‘cross table validations’ on the data BEFORE sending it into the EDW • Always beware sending invalid data into a DW

  9. Example Staging Table CREATE TABLE UNBILLED_CALLS ( CUM_START_DATE_TIME DATE NOT NULL, CALL_DIALLED_DIGITS VARCHAR2(18 BYTE), CALL_DURATION NUMBER(10,2) NOT NULL, CALL_RETAIL_PRICE NUMBER(14,3) NOT NULL, CALL_BREAKDOWN_CODE VARCHAR2(5 BYTE) NOT NULL, CALL_DISCOUNT NUMBER(4,1) NOT NULL, CALL_UNITS NUMBER(4) NOT NULL, CALL_PP_ALLOWANCE NUMBER(5,1) NOT NULL, CALL_CLASS VARCHAR2(5 BYTE), CALL_CATEGORY VARCHAR2(5 BYTE), CALL_ORIGINATION VARCHAR2(2 BYTE), SERVICE_CODE VARCHAR2(2 BYTE) NOT NULL, CUM_CUSTOMER NUMBER(10) NOT NULL, CUM_SUBSCRIBER VARCHAR2(18 BYTE) NOT NULL, CALL_DIRECTION VARCHAR2(2 BYTE), CALL_LOCATION VARCHAR2(13 BYTE), CALL_DESTINATION VARCHAR2(5 BYTE), CALL_RECORD_TYPE VARCHAR2(3 BYTE) NOT NULL, SERVICE_TYPE VARCHAR2(2 BYTE) NOT NULL, BUCKET_TYPE VARCHAR2(5 BYTE), ROW_DEL_FRM_SRC_IND VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL, ROW_VALID_IND VARCHAR2(1 BYTE) DEFAULT 'Y' NOT NULL, ROW_SENT_TO_IWS VARCHAR2(1 BYTE) DEFAULT 'N' NOT NULL )

  10. Getting Data into Staging Area • Get data into a Staging Area asap • Helps in learning to understand the data • Can query/browse much more easily than in files • We now use (free) utilities to load staging area pttype • Using utilities is faster and less costly than using Infa for pptype development • Read a wide variety of files and load the data ‘as is’ • Defaults the flags • Get ALL data for this release into staging area before starting mapping • Or at least as much as possible • Late arriving data can only confuse the issue

  11. Cross Table Validations • Cross table validations might include • Checking customer/account exists for a sales record • Checking address is a valid address • Checking details provided by retailer match other systems such as sell through capture • Checking codes entered on tables exist • The list of ‘possible’ things to check is endless • EA must decided what validations will stop data from flowing into the EDW • These are more likely to be business than technology based • We have only built to capability to do so, not the rules themselves • Can be implemented in Infa or Stored Procedures

  12. Data Quality • Data Quality Measures/Correction • Can be implemented in ETL tools if acquired or ‘home grown’ • Can send data back to source systems if needed • Is a whole ‘other topic’ • But is will be done at some point in the life of EDW

  13. The ‘Mapping Spreadsheet’ • So, Staging Area has data in it‘What to do next?’ • Build the left hand side of the ‘mapping spreadsheet’ • Source to Target Mapping • The right hand side starts blank!! • Once you have developed your understanding of the data, built/loaded the staging area with all the data you want in the ‘current release’ you are ready to perform ‘data mapping’ and Data Modelling. • To do this, you need to understand current models and have some ideas about BI4ALL model… • Now begins the modeling portion of the training…

  14. Data Mapping • Filling in the right hand side of mapping spreadsheet • Defining ‘real keys’ from source data • Defining tables to be joined/split • Defining how to present a ‘view’ of the staging area such that it can be sent into the EDW • Defining changes to EDW model • Two columns in SS - Table Exists/Column Exists • EDW Modeller sets them for the DBA • We will see the ‘in progress’ version later • Each source field required in the EDW is mapped • Notes and comments are included… • Database level transformations are included in SS • Key role in any EDW implementation

  15. Prototype Building • DBA implements changes to ODW/EDW Physical Model • Tables, indexes, naming standards • Key role is to ‘catch mistakes’ by the modeller • With 000’s of fields to map modeller will make mistakes • From the SS we can now generate pptype ETL • Tools available from IBI • One of which was just written (SeETL) • Includes all elements such as • Generated views • Updating control tables • Running and testing on windows 2008+ • All prior to building Informatica/DataStage ETL

  16. Prototype Building • Once Prototype is ‘relatively stable’ • Sizable amounts of test data can be loaded • Prototype can be moved to deployment platform • Presentation Views can be created • Cognos Catalogs/Business Objects started • Early reports started • More learning about real data volumes can happen • When errors are found (and they will be found) • Prototype can be changed easily • Generated ETL can be changed easily • When everyone is happy with the pptype • EDW is ‘made real’ • Real database, ETL, reports

  17. Conclusion • We have talked about the overall process of EDW Implementation • So, once you have developed your understanding of the data, built/loaded the staging area with all the data you want in the ‘current release’ you are ready to perform ‘data mapping’ and BI4ALL Modelling. • To do this, you need to understand the EDW model… • Now begins the modeling portion of the training…

  18. Thank You for Your Time

More Related