330 likes | 469 Views
Harnessing Metadata to Power Data Governance. Jim Denyer Director Data Warehouse Development. Timeline. Build first data model. Source 2 Target Into production. Data Profiling put into production. Collibra selected. Source 2 Target development starts.
E N D
Harnessing Metadata to Power Data Governance Jim Denyer Director Data Warehouse Development
Timeline • Build first data model • Source 2 Target • Into production • Data Profiling put into production • Collibra selected Source 2 Target development starts • Data Dictionaryv1 web application published • FYIgoes into production • Data Dictionaryv2 • More DB’s revealed • Data Profilingdevelopment starts 2010 2011 2012 2013 • First two Drexel co-op students join Confidential & Proprietary – Internal Use Only
Metadata Inputs Data Models Data Profiling Databases MetADATA Flat file definitions Source to Target Mappings
Find Your Information • Combines ERD model and database system catalog metadata • Creates a single user interface for maintaining and viewing data definitions across the enterprise • In production for 3 years • Heavily used by business analysts, developers, testers and business users • Fluent in Teradata, Oracle, SQL/Server and DB2 • Captures metadata for non-relational metadata such as interface file descriptions • Role-based security to allow updates by SME’s using a browser window
FYI - Architecture • SQL/Server 2008 R2 database • ASP.net Version 4 • Hosted on IIS • Windows authentication • Uses SQL/Server utilities, procedures and SSIS • Browser-based – supports IE and current Firefox version • Developed with co-op power from Drexel University
Managing Transformation Rules
Source 2 Target • Data architecture is a set of transformations • Each has a source and target • From Cinerama-wide spreadsheets • To Single searchable interface
Mastering Reference Data
What is Reference Data? • Reference Data are tables in databases, also called ‘domains’ or ‘lookup tables’ • Captures information about entities • Categorizes information • Allows departments within an organization to speak the same language • Used in data analysis
Can you Have Too Much? • Hundreds of Reference Code data sets • Potential confusion between similarly named data sets
Categories of Reference Data • Medical Code Sets • Administrative Code Sets • Master Data Reference Data • Simple Codes • Complex Codes • Crosswalks
Crosswalks Enterprise Standard Claim Status • Supporting multiple dialects • Types • Simple • Complex • Supports a single book of business • Enterprise definition defined by Data Stewardship Council System A Claim Status System B Claim Status
Managing CHANGE
Reference Code Changes • Pro-active: A business person initiates a new code value; initiate the workflow application • Re-active: A new code value is introduced through an inbound data feed; workflow is started automatically
Corporate Code Set Database - Verification of Code Set Usage (as of 2011) HIPPA Transactions U 8 8 8 2 2 2 2 2 8 8 8 2 3 3 3 CCS Code Set Name Description Data Steward Specific Party Responsible Code Set Source HIPAA Transaction Owner 7 7 7 7 7 2 3 3 Purpose Crosswalk QA Date CCS Code Set # 7 7 7 7 0 1 6 7 8 0 4 5 7 P I D AMA & BCBSA for Level I 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 1 X X X HCPCS Procedure Code Procedure Code Kathryn Turnbull (CPT) CMS for Level II (HCPCS 837 P/I - Bev Prisco, Christy Validation, 278 & 837 Descriptions Leah Harris, Idris Peurifoy Nat) Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 2 X X X Revenue Code Revenue Code Kathryn Turnbull NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation, 278 & 837 Descriptions Leah Harris, Idris Peurifoy Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 3 X X X National Drug Code National Drug Code (NDC) Kathryn Turnbull First DataBank 837 P/I - Bev Prisco, Christy Validation Leah Harris, Idris Peurifoy Johnson, Sharelle Bond 278-JoAnn Hobbs Pam Basil, Danielle Di Donato, 4 X X Admission Source Code Kathryn Turnbull NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation Leah Harris, Idris Peurifoy Johnson, Sharelle Bond A code which represents the priority of the admission to a 278-JoAnn Hobbs facility. This information is 5 X X Admission Type Code Can (John) Carfi NUBC- www.nubc.org 837 P/I - Bev Prisco, Christy Validation, 278 Descriptions pertinent for admitting a Johnson, Sharelle Bond patient to a hospital for inpatient services. A code indicating the reason 278-JoAnn Hobbs or explanation for the 6 X X Ambulance Transport Reason Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions ambulance transport from Johnson, Sharelle Bond one facility to another. A code indicating the type of ambulance transport. This 278-JoAnn Hobbs information indicates 7 X X Ambulance Transport Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions whether it is the initial Johnson, Sharelle Bond transport trip, return trip, etc. A code indicating the type of certification. This code provides information 278- JoAnn Hobbs 8 regarding the status of the X X X Certification Type Code Can (John) Carfi ANSI X12N Imp. Guides 837 P/I - Bev Prisco, Christy 278 Descriptions certification such as whether Johnson, Sharelle Bond it is the initial certification or it is renewed, revised, extended, canceled, etc. Current Operations Governance
Data Acquisition • Inbound data is embedded with code values • Each code table reference is checked
The Vision Solution for reference data
Implementing a Solution that Supports: Simplified the process of moving approved reference data Strong framework for data governance processes Governs business terms and glossaries • Present Stewardship process • Operations and Informatics division needs • Automated workflow for approving new code values • Supporting custom workflow • Maintaining crosswalks
What happens when Data Stewards in the workflow are outside the Enterprise? • To the cloud! • No emailing • No file transfer • No firewall obstacles
Data Profiling • Two types of data problems • Ones that abend a process • “Silent killers” that let bad data flow un-noticed • Monthly • Capture 12 month history of the behavior of data for the major inbound data feeds • Daily • Compare today’s data elements with the 12 month behavior • Automatically publish alert reports highlighting outliers
Data Profiling Measures • Number of distinct values (for character fields, ID fields, and SK fields) • Number of tilde values (for character fields) • Number of white spaces (for character fields) • Number of null values (for all nullable fields) • Number of default -1 values (for SK fields only) • Maximum value (for numeric fields, not SK fields) • Minimum value (for numeric fields, not SK fields) • Maximum date(for date fields) • Minimum date (for date fields) • Average value (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Sum value (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Standard deviation (for numeric fields, not SK, ID, NO, YR/MO/YR_MO, or Timestamp fields. Takes absolute value, ignoring zeros) • Number of invalid code values (for code fields)
The End Thank you!