240 likes | 413 Views
Data Warehouse Core Common Models: Progress and Future Direction Jim Tepin. Health and Human Services Data Warehouse Redevelopment Project. Best Practices Data Audit Trails; Common Tables; Physical Data Model Standards; Person Matching; Address Cleansing Common Standards
E N D
Data Warehouse Core Common Models:Progress and Future DirectionJim Tepin
Health and Human Services Data Warehouse Redevelopment Project
Best Practices Data Audit Trails; Common Tables; Physical Data Model Standards; Person Matching; Address Cleansing Common Standards Physical Data Base Design & Security Role Standards High-Level Architecture (including) Statewide Central “Lookup” Database Data Sharing / Central Views / Audit Compliance Security Architecture Design Common Models – Address, Citizen, Events HHS Data Warehouse Redevelopment Project
Target StateInfrastructure: DW Architecture (Cont’d) 5 Option 1 – Single Unified Data Warehouse Data Sources Agency 1 Apps. Agency 2 Apps. Agency N Apps. External Sources OLTPApps. • Single unified Data Warehouse for all participating Departments / Agencies BI needs • Follows the best practice hybrid model • Nothing bypasses the Foundation Layer • No Department / Agency versions of data or independent data marts are part of this ETLTools Subject A Subject B Subject C Foundation Layer • Logical View Data Mart Optimization Layer End Users Page 5
Target StateInfrastructure: DW Architecture (Cont’d) 5 Option 2 – Multiple Data Warehouse • Shared Data Warehouse Infrastructure for those who elect to use it – slight variation of status quo • Data warehouses remain completely under the control of each Department / Agency • Data sharing is achieved on a Department / Agency to Department / Agency basis Data Sources Agency 1 Apps. Agency 2 Apps. Agency N Apps. External Sources OLTPApps. Department/Agency ETL Each Data Warehouse includes common data that is acquired independently Agency 1 Agency 2 Agency N Dimensional views of Department / Agency Data Warehouses End Users Page 6
Data Sources Target StateInfrastructure: DW Architecture (Cont’d) 5 Option 3 – “Master” Data Warehouse OLTPApps. ExternalSources Agency 1 Apps. Agency 2 Sources Agency N Apps. ETL Processes Subject A Subject B Master Data Warehouse Example Department / Agency Data Warehouses Agency 3 Agency 1 Agency 2 Agency N Agency N+1 • Master Data Warehouse contains a subset of common data identified as being widely useful Page 7
Strengths: Provides Department / Agency control For the defined subset of State-wide data a single foundation data model supports consistent results (a single version of the truth) Provides for sharing of the most widely needed data Provides a moderate degree of reuse and leverage of the technology infrastructure and staff Potentially lower total cost of ownership than Option 2 Challenges: Deciding what should be included in the MDW is very challenging AND this will change over time causing rework Provides NO WAY to guarantee consistent results across all Departments / Agencies as there are no built-in controls to ensure the shared data source is used Adding additional data types and relationships can be complex, costly and slow A centralized data warehouse team must be created to manage the Master Data Warehouse 360 degree view of citizens and resulting outcome analysis may only be partially supported Limited consistency of results and measures across Departments / Agencies achieved Substantial redundancy of technologies, tools, staff and data acquisition through duplicated effort Substantially larger total cost of ownership than Option 1 Potential single point of failure Target StateInfrastructure: DW Architecture (Cont’d) 5 Option 3 – “Master” Data Warehouse Page 8
Common Address Model Page 9
HHS Common Address Prototype Addresses across agencies (CSES, DCH, DHS, Judicial) were gathered, analyzed and cleansed. Results: • Total Records: 132.6 million • Unique Raw Records: 34.2 million (74% reduction) Lansing Subset: • Total Records: 2.3 million • Unique Raw Records: 575 thousand (75% reduction) • Unique Cleansed Records: 158 thousand (93% reduction) Reductions above are based on record counts. A common model can also employ various technical means consistently (I.e. compression) to conserve disk space
Data Architecture Common location of both raw and cleansed addresses. Secure Central/Common Orientation Process Architecture Simple Integration “Open” Leverage Available Tools Compliance HHS standards compliant Audit compliant Common Address Model - Goals
Common Area for System Codes & Values Common Area for Federal Standards Codes (FIP, NAICS, etc.) Great Starting Point for Enterprise DW P_SOM_Lookup Database
Integration with Common Citizen Security Mechanisms IQ8 – Delivery Point Validation Common Address – On the Horizon
Overview: Merged View of Various “Events” Very Extensible (i.e. date of birth) Tend to be the relationship of a person, an organization and a time element. Can be “one-time” or over a duration. Intent: Micro-analysis. Macro-analysis. Citizen Events
Prove the concept. Integrate with Common Address Establish Security Architecture Business Intelligence Competency Center Citizen EventsOn the Horizon