Understanding Data Quality Issues: Finding Data Inaccuracies

Presentation Transcript


  1. Understanding Data Quality Issues: Finding Data Inaccuracies Art DeMaio Evoke Software VP Technical Sales Support

  2. Agenda • Why Is Understanding Data Important? • Methodology for Assessing Data: Defining, Weighting, Profiling, Revisiting, Finding, Addressing, Maintaining • What Is Profiling? • Benefits of the Assessment

  3. What the Experts Say… • “Information quality is not an esoteric notion; it directly affects the effectiveness and efficiency of business processes. Information quality also plays a major role in customer satisfaction.” - Larry P. English

  4. What the Experts Say… • “Poor data quality is costly. It lowers customer satisfaction, adds expense, and makes it more difficult to run a business and pursue tactical improvements such as data warehouses and re-engineering.” - Thomas C. Redman

  5. What’s in Your DATA… • “…three-quarters (of participating companies) reported significant problems as a result of defective data, with a third failing to bill or collect receivables as a result.” - PricewaterhouseCoopers survey of 600 CIOs, IT directors, or similar executives

  6. What is Data Quality? • Accuracy of Content • Structure • Completeness • Timeliness • Presentation

  7. Assessing Your Data: a seven-step cycle around the source data • 1-Define Issues • 2-Weight/Impact • 3-Profile Data • 4-Revisit Definitions, Weights • 5-Findings • 6-Address • 7-Maintain

  8. Defining Issues (step 1) • Standard list • Key requirements: Content, Structure, Completeness • Update the list by project or source

  9. Defining Issues: Sample [figure: sample issue definitions for step 1, Define Issues]

  10. Weight/Impact (step 2) • After the issues are initially identified: • Some issues are more critical than others • Weights are not priorities • Assign each issue a weighting factor (1-5) • Weighting factors SHOULD change by project
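
As an illustration of the 1-5 weighting step, here is a minimal Python sketch that keeps a standard issue list with default weights and lets each project override them. The issue names, default weights, and the `project_weights` helper are hypothetical; the slides do not prescribe any particular representation.

```python
# Hypothetical standard issue list with default impact weights
# (1 = minor impact, 5 = critical impact). These are weights, not work priorities.
STANDARD_WEIGHTS = {
    "blank_required_field": 4,
    "value_out_of_range": 3,
    "orphan_detail_row": 5,
    "nonstandard_format": 2,
}


def project_weights(defaults, overrides):
    """Return a new dict: the defaults with project-specific weights applied on top.

    Raises ValueError if any resulting weight is not an integer on the 1-5 scale.
    """
    merged = {**defaults, **overrides}
    for issue, weight in merged.items():
        if not isinstance(weight, int) or not 1 <= weight <= 5:
            raise ValueError(f"{issue}: weight must be an integer from 1 to 5, got {weight!r}")
    return merged


# A hypothetical CRM project where inconsistent formats matter more than usual.
crm_weights = project_weights(STANDARD_WEIGHTS, {"nonstandard_format": 4})
print(crm_weights["nonstandard_format"])
```

Keeping the defaults separate from the overrides makes it easy to see which weights a given project changed, and the standard list itself is never modified.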

  11. Profile Data (step 3) • What does data profiling mean?

  12. What is Data Profiling? The use of analytical techniques on data for the purpose of developing a thorough knowledge of its content, structure and quality. A process of developing information about data instead of information from data.

  13. What Is Data Profiling? Information ABOUT data (data profiling): • 30% of entries in SUPPLIER_ID are blank • The range of values in UNIT_PRICE is 5.99 to 4599.99 • There are 14 ORDER_HEADER rows with no ORDER_DETAIL rows. Information FROM data (not data profiling): • Texas auto buyers buy more Cadillacs per capita than buyers in any other state • The average mortgage amount increased by 6% last year • 10% of last year's customers did not buy anything this year
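
Each of the three "information about data" examples can be computed with a few lines of code. The sketch below uses small in-memory Python lists as stand-ins for tables; the sample rows, placing SUPPLIER_ID on ORDER_HEADER, the ORDER_ID join key, and the helper names are all assumptions for illustration. A real profiling tool would run equivalent queries against the source database.

```python
# Hypothetical sample rows standing in for the source tables.
order_header = [
    {"ORDER_ID": 1, "SUPPLIER_ID": "S01"},
    {"ORDER_ID": 2, "SUPPLIER_ID": ""},
    {"ORDER_ID": 3, "SUPPLIER_ID": None},
    {"ORDER_ID": 4, "SUPPLIER_ID": "S02"},
]
order_detail = [
    {"ORDER_ID": 1, "UNIT_PRICE": 5.99},
    {"ORDER_ID": 1, "UNIT_PRICE": 4599.99},
    {"ORDER_ID": 4, "UNIT_PRICE": 19.50},
]


def blank_pct(rows, column):
    """Percentage of rows whose value is missing, None, or an empty/whitespace string."""
    if not rows:
        return 0.0
    blanks = sum(1 for r in rows if r.get(column) is None or str(r[column]).strip() == "")
    return 100.0 * blanks / len(rows)


def value_range(rows, column):
    """(min, max) of the non-missing values in a column, or None if there are none."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return (min(values), max(values)) if values else None


def orphan_headers(headers, details, key):
    """Header rows that have no detail row with the same key value."""
    detail_keys = {d[key] for d in details}
    return [h for h in headers if h[key] not in detail_keys]


print(f"{blank_pct(order_header, 'SUPPLIER_ID'):.0f}% of SUPPLIER_ID entries are blank")
low, high = value_range(order_detail, "UNIT_PRICE")
print(f"UNIT_PRICE ranges from {low} to {high}")
orphans = orphan_headers(order_header, order_detail, "ORDER_ID")
print(f"{len(orphans)} ORDER_HEADER rows have no ORDER_DETAIL rows")
```

None of these results answers a business question; they describe how complete, how bounded, and how consistent the data is, which is the distinction the slide draws.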

  14. Profile Data (step 3) • This is a multi-step process: • Collect documentation • Review the DATA itself • Compare the data to the documentation • Identify and detail specific issues
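
The "compare the data to the documentation" step amounts to checking each value against what the documentation claims. This minimal Python sketch assumes the documentation has already been transcribed into a dictionary of per-column rules (expected type, whether blanks are allowed, and an optional list of valid codes); the CUSTOMER columns, the rules, and the `compare_to_documentation` function are hypothetical.

```python
# Hypothetical rules transcribed from a data dictionary for a CUSTOMER table.
DOCUMENTED = {
    "CUST_ID": {"type": int, "required": True},
    "STATE": {"type": str, "required": True, "allowed": {"TX", "CA", "NY"}},
    "PHONE": {"type": str, "required": False},
}


def compare_to_documentation(rows, documented):
    """Return (row_number, column, problem) for every value that contradicts the documentation."""
    issues = []
    for i, row in enumerate(rows, start=1):
        for column, rule in documented.items():
            value = row.get(column)
            if value is None or value == "":
                # Blank (or absent) values are only a problem if the column is documented as required.
                if rule["required"]:
                    issues.append((i, column, "blank but documented as required"))
                continue
            if not isinstance(value, rule["type"]):
                issues.append((i, column, f"expected {rule['type'].__name__}, found {type(value).__name__}"))
            elif "allowed" in rule and value not in rule["allowed"]:
                issues.append((i, column, f"value {value!r} not in documented code list"))
        # Columns present in the data but missing from the documentation.
        for column in sorted(row.keys() - documented.keys()):
            issues.append((i, column, "column not in documentation"))
    return issues


customers = [
    {"CUST_ID": 1, "STATE": "TX", "PHONE": "555-0100"},
    {"CUST_ID": "2", "STATE": "Tex", "PHONE": ""},
    {"CUST_ID": 3, "STATE": "", "FAX": "555-0199"},
]
for issue in compare_to_documentation(customers, DOCUMENTED):
    print(issue)
```

Each tuple is a candidate for the "identify and detail specific issues" step; a reviewer still has to decide whether the data or the documentation is the thing that is wrong.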

  15. Revisit (step 4: Revisit Definitions, Weights) • Review the issues and weights • Should there be more or fewer issues? • If so, what are they? • Has the relative importance of each issue changed?

  16. Findings (step 5) • Your findings tell others about the data: • Documented reports and/or charts • Results database • Quality Assessment Score

  17. Findings-Chart

  18. Findings-Chart

  19. Findings-Chart

  20. Findings-Chart • Weighted Issue Rate: 23.8% • Weighted Assessment Score: 76.2% (100% minus the weighted issue rate)
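
The slides report a weighted issue rate and a weighted assessment score that sum to 100%, but do not show how the rate is computed. One plausible reading, assumed in this sketch, is a weight-averaged issue rate: each issue's rate (records with the issue divided by records checked) is multiplied by its 1-5 weight, the products are summed and divided by the total weight, and the score is 1 minus that rate. The `weighted_assessment` function and the sample findings are hypothetical and are not the data behind the chart.

```python
def weighted_assessment(findings):
    """Return (weighted_issue_rate, weighted_assessment_score) as fractions.

    findings: list of (weight, records_with_issue, records_checked) tuples,
    with weights on the 1-5 scale. The weighting scheme itself is an assumption.
    """
    total_weight = sum(weight for weight, _, _ in findings)
    if total_weight == 0:
        raise ValueError("at least one finding with a positive weight is required")
    weighted_sum = sum(weight * bad / checked for weight, bad, checked in findings)
    issue_rate = weighted_sum / total_weight
    return issue_rate, 1.0 - issue_rate


# Hypothetical findings.
findings = [
    (5, 20, 100),  # critical issue found in 20% of records
    (3, 10, 100),  # moderate issue found in 10% of records
    (1, 50, 100),  # minor issue found in 50% of records
]
rate, score = weighted_assessment(findings)
print(f"Weighted Issue Rate - {rate:.1%}")
print(f"Weighted Assessment Score - {score:.1%}")
```

Under this scheme the sample scores 80.0%, because the heavily weighted issue affects only 20% of records; an unweighted average of the three rates (26.7%) would give 73.3%, letting the minor issue drag the score down.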

  21. Address the Issues (step 6) • Addressing your findings: • Actual vs. Potential • Subject Matter Expertise • Cleansing Requirements

  22. Maintain Vigilance (step 7) • Maintain: • Complete the cycle • Periodic review • Document score changes
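
For the "document score changes" point, a periodic review only needs to record each assessment score and compare it with the previous one. The sketch below is a minimal Python illustration; the `score_changes` helper, the review dates, and the follow-up scores are invented, and only the starting 76.2% echoes the score reported on the findings chart.

```python
from datetime import date


def score_changes(history):
    """Sort (review_date, score) pairs by date and attach the change since the previous review.

    Returns a list of (review_date, score, change); change is None for the first review
    and is rounded to one decimal place otherwise.
    """
    changes = []
    previous = None
    for review_date, score in sorted(history):
        change = None if previous is None else round(score - previous, 1)
        changes.append((review_date, score, change))
        previous = score
    return changes


# Hypothetical review history (scores in percent), recorded out of order.
history = [
    (date(2003, 1, 15), 76.2),
    (date(2003, 7, 15), 74.0),
    (date(2003, 4, 15), 79.0),
]
for review_date, score, change in score_changes(history):
    note = "baseline" if change is None else f"{change:+.1f} points"
    print(f"{review_date.isoformat()}  {score:.1f}%  {note}")
```

A drop such as the last review's is exactly what the periodic review is meant to surface, prompting another pass through the cycle.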

  23. Why Do the Assessment? • Quantify the quality issues • Isolate true problems • Proactive review: reduces the cost of resolving issues and reduces the risk of customer dissatisfaction • Define the scope of issues • Determine the resources required to address issues

  24. Why Do the Assessment? [chart: project costs; cost to address an issue vs. when in the project timeline you find the issue]

  25. Why Should It Be Done? “Pay me now or pay me later” [chart: cost over time]

  26. When Should It Be Done? • Every IT data project: Warehousing, CRM, ERP, EAI, M&A • Ongoing, based on: criticality of the system, current status (score), need to re-purpose data

  27. Bibliography • Larry P. English, Improving Data Warehouse and Business Information Quality, John Wiley & Sons, 1999 • Jack Olson, Data Profiling: The Accuracy Dimension, Morgan Kaufmann, 2002 • Thomas C. Redman, Data Quality for the Information Age, Artech House, 1996 • PricewaterhouseCoopers, “Global Data Management Survey”, 2001
