1 / 45

Ron Forino DAMA - Washington, DC September 1999

Project Driven Data Quality Improvement. Ron Forino DAMA - Washington, DC September 1999. Examples. According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.

damaris
Download Presentation

Ron Forino DAMA - Washington, DC September 1999

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Project Driven Data Quality Improvement Ron Forino DAMA - Washington, DC September 1999

  2. Examples • According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid. • Electronic data audits show that the invalid data values in the typical customer database average around 15 - 20%. Physical audits suggest that this number may be closer to 25 - 30%. • In 1992, 96,000 IRS tax refund checks were returned “undeliverable” due to incorrect addresses. • This year, incorrect price data in retail databases will cost American consumers as much as $2.5 billion in overcharges. • According to organizations like the Data Warehouse Institute, the Gartner Group and MetaGroup - Data Quality is one of the top 1-3 success factors to Data Warehousing. • The average mid-sized company may have 30,000 - 50,000 fields in files, tables, screens, reports, etc. [Platinum Technology]

  3. Agenda • Definitions • What is Data Quality? • Tactics and the End Game • Building Blocks to Data Quality • Tactical Initiatives • Strategic Initiatives • Tactical Data Quality • Rule Disclosure • Data Quality Measurement, Analysis and Certification • Meta Data Creation • Validation • Quality Improvement

  4. Definitions

  5. Definitions • Data Transformation - Changing data values to a format consistent with integrity and business rules agreed to by data stakeholders. • Data Cleansing - Consolidation of redundant customer records. Term used to describe the process of “merging and purging” of customer lists in an effort to reduce duplicate or inaccurate customer records. • Data Quality Improvement - The process of improving data quality to the level desired to support the enterprise information demand. • Data Quality -definition to follow….

  6. Data Quality Improvement Decision Tree Task Process Transform Conform to Business Rule Data Reengineering Data Quality Improvement Process Reengineering Standardize Validate Match Dedupe Integrate Enrich Match & Dedupe Data Cleansing

  7. Tactics and The End Game “We need better data quality...” Enterprise Initiative Select Project Data Quality Assessment Report & Recommendations Source System Clean-up Initiative

  8. Tactics and The End Game “We need better data quality...” Data Warehouse Enterprise Initiative Select Project Data Quality Assessment Data Quality Assessment Staging Specifications Report Report & Recommendations Source System Clean-up Initiative Source System Clean-up Initiative

  9. What is [Good] Data Quality?

  10. How Can We Know Good Data Quality? • Column 1 • 321453 • 212392 • 093255 • 214421 • . • . • . Is this Good Data Quality? What can we conclude?

  11. What is Data Quality? • Information Quality = f(Definition + Data + Presentation) • Definition • Defines Data • Domain Value Specification • Business Rules that Govern the Data • Information Architecture Quality • Data Content • Completeness • Validity/Reasonability • Data Presentation • Accessible • Timely • Non-ambiguous

  12. Data Content Missing Data Invalid Data Data Outside Legal Domain Illogical Combinations of Data Structural Record Key Integrity Referential Integrity Cardinality Integrity Migration/Integration Rationalization Anomalies Duplicate or Lost Entities Definitions and Standards Ambiguous Business Rules Multiple Formats for Same Data Elements Different Meanings for the Same Code Value Multiple Codes Values with the Same Meaning Field Used for Unintended Data Data in Filler Y2K Violation Common Data Quality Problems

  13. Building Blocks to Data Quality

  14. Building Blocks of a Data Quality Program Benefits Realization Strategic Defect Prevention DQ Requirements Quality Reengineering Enterprise Cultural Shift QC/Process Auditing Data Stewardship Tactical Validation Quality Improvement Meta Data Creation Measure Analyze & Certify Rule Disclosure

  15. Tactical Data Quality

  16. Steps to Tactical Data Quality Rule Disclosure Measure Quality Analyze & Certify Meta Data Creation Validation Quality Improvement

  17. Rule Disclosure

  18. Sources of Meta Data • Legacy Meta Data • Data Models, Process Models • Data Dictionary, Definitions, Aliases • Glossary of Terms • Transformation Meta Data • Data Mapping • Transformation Rules • Error Handling Rules • Access Meta Data • Data Directory • Data Definitions • The Subject Matter Expert • Database Directory • Domain Values, Range of Values • Run Books • Derived Data Calculations • Audit Statistics • Source & Transformation

  19. Acquiring good Meta Data is Essential Collect Documentation Validate the Meta Data Assess the Data Report Findings Collect Documentation Assess the Data Report Findings Validate Findings Preferred Collect Valid Meta Data Assess the Data Report Findings Meta Data can be gathered before, during or after the Assessment “You can pay me now, or you can pay me later…”

  20. Measuring • Data Quality • Techniques • Tools • Methods

  21. How can Data Quality be Measured? “One accurate measurement is worth a thousand expert opinions” [Grace Hopper, Admiral, US Navy] • Customer Complaints • User Interviews & Feedback • Customer Satisfaction Survey • Data Quality Requirements Gathering • Data Quality Assessments

  22. Measuring Data Quality - Tools • Analysis Tools • Specifically designed assessment tools • Quality Manager, Migration Architect • N & A: Trillium, Group-1, ID Centric, Finalist, etc. • Improvisations • SAS, Focus, SQL, other query tools • Other Necessary Tools • File Transfer • Data Conversion

  23. Assessment Measurements • Level 1: Completeness • Nulls or Blanks • Misuse (or overuse) of Default Values • Level 2: Validity • Data Integrity Anomalies • Invalid Data based on Business Rule • Level 3: Structural Integrity • Primary Key Uniqueness • Key Structure (Cardinality, Referential Integrity, Alternate Keys) • Level 4: Business Rule Violations • Relationship between two or more fields • Calculations Field Integrity Intuitive Integrity Rules Business Rule Integrity Requiring Meta Data

  24. Report Card • Analyze • and Certify • Identifying Problems • Sizing up Problems • “To Certify or Not to Certify…”

  25. Template - field level • Value - the domain occurrence • Frequency - the number of occurrences within the data set • Percent - the % of the whole set • 88 Info - the copybook definition for the value • Analysis - comments about our findings

  26. Identifying Problems 1 2 3 • Analysis (and Discovery) • 1. Is the field required? If so, blanks indicate an anomaly. • 2. Are the values “ID206” and “STANG” allowed? (Is this a problem with the data or the Meta Data? • 3.Some values occur in only 1.3% of the records. Is this telling us there is a problem?

  27. Data Quality Scoring

  28. Example: Poor Data Quality

  29. Field Analysis In a range of values, in the absence of domain rules, investigate the first and last .2% Bell curve distribution

  30. Management Reporting- Short Engagement

  31. Management Reporting - Status

  32. Management Reporting - Anomalies

  33. Management Reporting - Productivity

  34. Meta Data Creation

  35. Example: Data Quality Repository Newly Discovered Rules

  36. Meta Data Supply Chain Definition & Domain Meta Data Gathering Data Quality Statistical Reports Knowledge Management DQ Assessment Transformation & Edit Recommendations Data Quality & Definition Validation SME Validation Data Cleansing Update Meta Data Field Name Data Inventory Meta Data Work Groups Data Requirements

  37. Results Validation

  38. Report Validation SME validation… an opportunity to improve Meta Data 1. Supply a clear name for the field. 2. Is there a good definition? 3. Make the business rules public? 4. Will the SME initiate a data cleansing initiative? 5. Does the SME recommend edit or data transformation rules? 6. Are the findings consistent with the SMEs expectations? Report Sections Identification 1 2 3 Field Definition & Rules 4 5 Score & Explanation Statistical Reports & Analysis 6

  39. Quality Improvement

  40. Next Steps Continued Monitoring Legacy Data Extractions Monthly Reports Information Management Objectives Initiatives Data Clean-up Legacy System Enhancements & Re-engineering Management Report & Recommendations Perform Baseline Assessment (Discovered Business Rules) Steering Committee Data Migration Transformation & Cleansing Specifications Metadata, Models, Reports, etc.

  41. Lessons Learned- Data Cleanup $$ (More complete, more error prone) (Most complete, most accurate, most costly, most timely) (More accurate, less data) 100% Completeness 100% Accuracy

  42. Summary • We made the distinction between: - Data Migration - Data Quality - Data Cleansing • We defined what “good” data quality is. • We discussed that there could be 10 or more processes that could take place in building a comprehensive data quality program for the enterprise. - Tactical should precede the Strategic [or be the 1st step of ] • There are 6 steps to an effective tactical data quality initiative: - Rule Disclosure - Quality Measurement - Analyze and Certify - Meta Data Creation - Validation - Quality Improvement

  43. Reference Material • The Demings Management Method (Total Quality Management), Mary Walton • Data Quality for the Information Age, Tom Redman • The Data Warehouse Challenge: Taming Data Chaos, Michael Brackett • Improving Data Warehouse and Business Information Quality, Larry English • DM Review Magazine, Information Quality series by Larry English

  44. Ron Forino Director, Business Intelligence DMR Consulting Group (732)549-4100 X-8292 rforino@dmr.com ronforino@aol.com

More Related