1 / 33

Harnessing Metadata to Power Data Governance

Harnessing Metadata to Power Data Governance. Jim Denyer Director Data Warehouse Development. Timeline. Build first data model. Source 2 Target Into production. Data Profiling put into production. Collibra selected. Source 2 Target development starts.

lance
Download Presentation

Harnessing Metadata to Power Data Governance

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Harnessing Metadata to Power Data Governance Jim Denyer Director Data Warehouse Development

  2. Timeline • Build first data model • Source 2 Target • Into production • Data Profiling put into production • Collibra selected Source 2 Target development starts • Data Dictionaryv1 web application published • FYIgoes into production • Data Dictionaryv2 • More DB’s revealed • Data Profilingdevelopment starts 2010 2011 2012 2013 • First two Drexel co-op students join Confidential & Proprietary – Internal Use Only

  3. FIND YOUR INFORMATION

  4. Metadata Inputs Data Models Data Profiling Databases MetADATA Flat file definitions Source to Target Mappings

  5. Find Your Information • Combines ERD model and database system catalog metadata • Creates a single user interface for maintaining and viewing data definitions across the enterprise • In production for 3 years • Heavily used by business analysts, developers, testers and business users • Fluent in Teradata, Oracle, SQL/Server and DB2 • Captures metadata for non-relational metadata such as interface file descriptions • Role-based security to allow updates by SME’s using a browser window

  6. FYI - Architecture • SQL/Server 2008 R2 database • ASP.net Version 4 • Hosted on IIS • Windows authentication • Uses SQL/Server utilities, procedures and SSIS • Browser-based – supports IE and current Firefox version • Developed with co-op power from Drexel University

  7. FYI - Interface

  8. Demonstration of FYI

  9. Managing Transformation Rules

  10. Source 2 Target • Data architecture is a set of transformations • Each has a source and target • From Cinerama-wide spreadsheets • To Single searchable interface

  11. Source 2 Target Interface

  12. Demonstration of S2T

  13. Mastering Reference Data

  14. What is Reference Data? • Reference Data are tables in databases, also called ‘domains’ or ‘lookup tables’ • Captures information about entities • Categorizes information • Allows departments within an organization to speak the same language • Used in data analysis

  15. Can you Have Too Much? • Hundreds of Reference Code data sets • Potential confusion between similarly named data sets

  16. Categories of Reference Data • Medical Code Sets • Administrative Code Sets • Master Data Reference Data • Simple Codes • Complex Codes • Crosswalks

  17. Crosswalks Enterprise Standard Claim Status • Supporting multiple dialects • Types • Simple • Complex • Supports a single book of business • Enterprise definition defined by Data Stewardship Council System A Claim Status System B Claim Status

  18. Demonstration of Reference Data and Crosswalks

  19. Managing CHANGE

  20. Reference Code Changes • Pro-active: A business person initiates a new code value; initiate the workflow application • Re-active: A new code value is introduced through an inbound data feed; workflow is started automatically

  21. Corporate Code Set Database - Verification of Code Set Usage (as of 2011) HIPPA Transactions U 8 8 8 2 2 2 2 2 8 8 8 2 3 3 3 CCS Code Set Name Description Data Steward Specific Party Responsible Code Set Source HIPAA Transaction Owner 7 7 7 7 7 2 3 3 Purpose Crosswalk QA Date CCS Code Set # 7 7 7 7 0 1 6 7 8 0 4 5 7 P I D AMA & BCBSA for Level I 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 1 X X X HCPCS Procedure Code Procedure Code Kathryn Turnbull (CPT) CMS for Level II (HCPCS 837 P/I - Bev Prisco, Christy Validation, 278 & 837 Descriptions Leah Harris, Idris Peurifoy Nat) Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 2 X X X Revenue Code Revenue Code Kathryn Turnbull NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation, 278 & 837 Descriptions Leah Harris, Idris Peurifoy Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 3 X X X National Drug Code National Drug Code (NDC) Kathryn Turnbull First DataBank 837 P/I - Bev Prisco, Christy Validation Leah Harris, Idris Peurifoy Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 4 X X Admission Source Code Kathryn Turnbull NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation Leah Harris, Idris Peurifoy Johnson, Sharelle Bond A code which represents the priority of the admission to a 278-JoAnn Hobbs facility. This information is 5 X X Admission Type Code Can (John) Carfi NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation, 278 Descriptions pertinent for admitting a Johnson, Sharelle Bond patient to a hospital for inpatient services. A code indicating the reason 278-JoAnn Hobbs or explanation for the 6 X X Ambulance Transport Reason Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions ambulance transport from Johnson, Sharelle Bond one facility to another. A code indicating the type of ambulance transport. This 278-JoAnn Hobbs information indicates 7 X X Ambulance Transport Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions whether it is the initial Johnson, Sharelle Bond transport trip, return trip, etc. A code indicating the type of certification. This code provides information 278- JoAnn Hobbs 8 regarding the status of the X X X Certification Type Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions certification such as whether Johnson, Sharelle Bond it is the initial certification or it is renewed, revised, extended, canceled, etc. Current Operations Governance

  22. Data Acquisition • Inbound data is embedded with code values • Each code table reference is checked

  23. The Vision Solution for reference data

  24. Implementing a Solution that Supports: Simplified the process of moving approved reference data Strong framework for data governance processes Governs business terms and glossaries • Present Stewardship process • Operations and Informatics division needs • Automated workflow for approving new code values • Supporting custom workflow • Maintaining crosswalks

  25. What happens when Data Stewards in the workflow are outside the Enterprise? • To the cloud! • No emailing • No file transfer • No firewall obstacles

  26. Demonstration of Workflow

  27. Data Profiling and Alerts

  28. Data Profiling • Two types of data problems • Ones that abend a process • “Silent killers” that let bad data flow un-noticed • Monthly • Capture 12 month history of the behavior of data for the major inbound data feeds • Daily • Compare today’s data elements with the 12 month behavior • Automatically publish alert reports highlighting outliers

  29. Data Profiling Measures • Number of distinct values (for character fields, ID fields, and SK fields) • Number of tilde values (for character fields) • Number of white spaces (for character fields) • Number of null values (for all nullable fields) • Number of default -1 values (for SK fields only) • Maximum value (for numeric fields, not SK fields) • Minimum value (for numeric fields, not SK fields) • Maximum date(for date fields) • Minimum date (for date fields) • Average value (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Sum value (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Standard deviation (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Number of invalid code values (for code fields)

  30. Data Quality Alerts - Chart

  31. Data Quality Alerts

  32. The End Thank you!

More Related