180 likes | 198 Views
Explore warehouse scenarios, leadership insights, technology decisions, and guiding principles for building a data management road map. Discover the importance of cross-departmental collaboration and incremental delivery in achieving a unified data foundation.
E N D
Holy Warehouse, Holy Grail The Quest for the Single Version of Truth August 2, 2006 Michael Covert
Agenda • Warehouse scenarios • Why are they built? • What can we expect when building one? • Guiding principles • Leadership and Communication • Modeling your business • Incremental Delivery and Construction • Technology Decisions • Building a Data Management road map • A Case Study 2
Warehouse Scenarios • Why do people build data warehouses? • To allow people to focus on analysis instead of collection and assembly of data. This is in response to growing needs to understand a business holistically. • Finding it – where is that data? • Integrating it – merging data from multiple sources. This is often a source of error and inconsistency. • Reporting on it – spreadsheets and expensive tools. Spreadsheets continue to be where most analysis occurs. • To produce a single version of truth • Believing it – people and data errors have been corrected. Differing business meanings have been reconciled so that aggregations and variances make sense. • To reduce internal friction and improve decision making • Reports from non-integrated systems often produce inconsistent and contradictory information. This leads to slowed decision making and in some cases, infighting. 3
Warehouse Scenarios • What can we expect when building a data warehouse? • Data warehouses typically integrate data from multiple areas: • Finance, HR, Sales, Inventory, etc. • This invariably leads to larger than normal decision making processes and to multi-departmental involvement • Politics will occur. • Big picture (enterprise) issues will arise and they will be difficult to solve. • Definitions, processes, frequency, and reliability 4
Warehouse Scenarios • Business data can vary greatly in complexity and in definition • Variation in attributes associated with a common entity type leads to an explosion in number of tables • Loans – auto loan, revolving credit, overdraft protection, mortgage, ad infinitum • Apparel – a myriad of categories, subcategories, and lower level hierarchies of product-specific attributes • Integration and aggregation of these entities requires cross-departmental alignment of the most basic definitions • Risk elements – probability of default, loss given default, collateral codes, service charge allocation • Sales and marketing – promotion and ring code usage, campaign phase definition, category definition, even SKU and package encoding 5
Warehouse Scenarios • The great technology debates • How will I organize my warehouse? • Should I build a star schema or a 3NF database? • How many technologies will I use? • The database itself • Warehouse, warehouse farms • Operational Data Store • Downstream data marts • The ETL Layer • A metadata repository • Master data management • How will I cleanse my data? • Usage of data marts and down stream systems • How will I implement my analytics and reporting layers? 6
Guiding Principles • Leadership • The best functioning data warehouse teams are cross-departmental and collaborative. Leadership is essential. • A shared expense model is preferred, but again requires leadership and cultural adoption. Project based expenses are tempting, but can lead to departmental development of an enterprise asset. • Communication • Communicate frequently. Advise on new data areas as they become available. Strive for smaller but more frequent additions. • Track adoption and use communication to increase usage. • When sufficient adoption as been achieved, if possible TURN OFF THE OLD SYSTEM(S)!!! 7
Guiding Principles • Model your business and maintain these models. • Use a data modeling tool to maintain your models. 8
Guiding Principles • Incremental delivery • Break the warehouse into subject areas that can be developed and evolved incrementally. • Think multi-dimensionally. • Devise a multi-dimensional structure for each subject area. • Identify overlaps where shared dimensions exist. • 3NF versus Star schemas • In most cases, 3NF versus star schema decisions should be based on: • Skill sets • In-place technology and processes • Technology governance • Many times, 3NF schemas have views that emulate star schemas. • Beware of view join overhead. * Build on successes 9
Guiding Principles • Technology selection • Limit technology through IT stewardship and governance. • Stay within product families where possible. • Implement an ETL layer. Use it to reuse data interfaces and reduce point-to-point data complexity. • Use multi-dimensional systems to offload data aggregation complexity. • Maintain conforming dimensions. • Strive for enterprise reporting, but realize that it is very difficult to achieve. 10
Guiding Principles • Layer your database to provide: • A data “landing zone” (also referred to as a staging area) • A data cleansing and integration layer • Assign data ownership to the owning line of business. • Build in “data lineage” traceability. • Program defensively from the very beginning. • Put cleaning rules here. Avoid intermingling them into operational code at all costs. Strive to reverse engineer them out of surviving systems. • A cleansed, integrated layer that is used to: • Feed downstream systems. • Provide the primary reporting interface for end user systems. • Resist allowing access to any other layer, specifically the landing zone! *** • Use this layered technique to build in audit controls and “restartability”. 12
Building a Data Management Road Map • A data management roadmap defines all data management processes and control objectives. • CoBiT, ITIL, et al 13
A Case Study • This case study involves a large financial institution with s significant portion of their business involving collateral-secured loans. • Loan reporting environment was manually intensive and heavily driven off of Microsoft Access databases (approx. 400 Access databases) • End user queries were run against transactional systems. The data that was retrieved was then integrated in redundant, and often inconsistent processes • Limited scalability for future growth (Microsoft Access databases) • Similar queries and duplicated analyses were performed by Business Analysts • Business Analysts spent approximately 80% of their time gathering, re-keying and developing reports • Major inconsistencies in these reports were producing political pressures between sales and the financial analysts. 14
Case Study – Initial Environment Data Sources Origination Dealers Locations Employees Servicing Risk Securitization 15
Case Study – New Environment Landing Zone Cleanse and Integrate 16
Case Study - Results • Excellent user acceptance of new system • Consolidated database now reduces departmental time required to access data • Simplified and pre-integrated data has reduced many inconsistencies • Excellent load and response times • Microsoft Reporting Services is being used to produce increasingly complex reports used by end users • Microsoft Analysis Services produces cubes that are easy to access and have nearly instantaneous response time • The new system architecture is much easier to change since it is simplified. New data sources have been added incrementally. 17
Conclusion • Leadership, Stewardship, and Communication • Business modeling and mapping • Data quality and ownership • Technology Governance and Control • Adoption of a data management roadmap • Data architecture • Technological risk management • Incremental delivery 18