Understanding Data Quality Issues: Finding Data Inaccuracies
Art DeMaio
Evoke Software
VP Technical Sales Support
Agenda
• Why is Understanding Data Important
• Methodology for Assessing Data
  • Defining
  • Weighting
  • Profiling
  • Revisiting
  • Finding
  • Addressing
  • Maintaining
• What is Profiling
• Benefits of the Assessment
What the Experts say… • “Information quality is not an esoteric notion; it directly affects the effectiveness and efficiency of business processes. Information quality also plays a major role in customer satisfaction.” - Larry P. English
What the Experts say… • “Poor data quality is costly. It lowers customer satisfaction, adds expense, and makes it more difficult to run a business and pursue tactical improvements such as data warehouses and re-engineering.” - Thomas C. Redman
What’s in Your DATA… • “…three-quarters (of participating companies) reported significant problems as a result of defective data, with a third failing to bill or collect receivables as a result.” - In a PricewaterhouseCoopers survey of 600 CIOs, IT directors or similar executives
What is Data Quality?
• Accuracy of Content
• Structure
• Completeness
• Timeliness
• Presentation
Assessing Your Data
[Diagram: a seven-step cycle around the Source Data]
1-Define Issues
2-Weight/Impact
3-Profile Data
4-Revisit Definitions, Weights
5-Findings
6-Address
7-Maintain
Defining Issues
• Standard list
• Key requirements
  • Content
  • Structure
  • Completeness
• Update list by project or source
[Diagram: assessment cycle, step 1 (Define Issues) shown]
Defining Issues - Sample
[Figure: sample issue list; diagram: assessment cycle, step 1 (Define Issues) shown]
Weight Impact
• After the issues are initially identified:
  • Some issues are more critical than others
  • Weights are not priorities
  • Assign a weighting factor (1-5)
  • Weighting factors SHOULD change by project
[Diagram: assessment cycle, steps 1-2 shown]
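A minimal sketch of how a weighted issue list might be recorded, assuming Python. The `Issue` class, its field names, the `reweight` helper, and the sample issues are hypothetical illustrations. Only the three categories (Content, Structure, Completeness) and the 1-5 weighting scale come from the slides, and treating 5 as the highest impact is an assumption.

```python
from dataclasses import dataclass

# Categories taken from the "Defining Issues" slide.
VALID_CATEGORIES = {"Content", "Structure", "Completeness"}


@dataclass(frozen=True)
class Issue:
    """One entry in the issue list (hypothetical structure)."""
    name: str
    category: str
    weight: int  # 1..5; assumed here that 5 means the highest impact

    def __post_init__(self):
        if self.category not in VALID_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 1 <= self.weight <= 5:
            raise ValueError(f"weight must be between 1 and 5, got {self.weight}")


# Hypothetical standard list, reused across projects.
standard_issues = [
    Issue("Blank supplier identifier", "Completeness", 4),
    Issue("Unit price outside documented range", "Content", 3),
    Issue("Order header without detail rows", "Structure", 5),
]


def reweight(issues, overrides):
    """Return a new list with project-specific weights applied by issue name."""
    return [Issue(i.name, i.category, overrides.get(i.name, i.weight)) for i in issues]


# Weights should change by project: this project treats blank suppliers as critical.
warehouse_issues = reweight(standard_issues, {"Blank supplier identifier": 5})
```

Keeping the standard list immutable and deriving a per-project copy is one way to let weights vary by project without losing the baseline.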
Profile Data
• What does Data Profiling mean?
[Diagram: assessment cycle, steps 1-3 shown]
What is Data Profiling?
The use of analytical techniques on data for the purpose of developing a thorough knowledge of its content, structure and quality.
A process of developing information about data instead of information from data.
What is Data Profiling?
Information ABOUT Data (Data Profiling):
• 30% of entries in SUPPLIER_ID are blank
• The range of values in UNIT_PRICE is 5.99 to 4599.99
• There are 14 ORDER_HEADER rows with no ORDER_DETAIL rows
Information FROM Data (not Data Profiling):
• Texas auto buyers buy more Cadillacs per capita than any other state
• The average mortgage amount increased last year by 6%
• 10% of last year's customers did not buy anything this year
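The three "information about data" statements map onto simple profiling queries. The sketch below uses Python's standard `sqlite3` module on an in-memory database. The table layout (which table holds SUPPLIER_ID and UNIT_PRICE) and the sample rows are assumptions made for illustration; the slides name the columns and tables but not their schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows, not taken from the slides.
conn.executescript("""
    CREATE TABLE ORDER_HEADER (ORDER_ID INTEGER PRIMARY KEY, SUPPLIER_ID TEXT);
    CREATE TABLE ORDER_DETAIL (ORDER_ID INTEGER, UNIT_PRICE REAL);
    INSERT INTO ORDER_HEADER VALUES (1, 'S-100'), (2, ''), (3, NULL), (4, 'S-200');
    INSERT INTO ORDER_DETAIL VALUES (1, 5.99), (1, 120.00), (2, 4599.99);
""")

# Completeness: percentage of SUPPLIER_ID entries that are NULL or blank.
blank_pct = conn.execute("""
    SELECT 100.0 * SUM(CASE WHEN SUPPLIER_ID IS NULL OR TRIM(SUPPLIER_ID) = ''
                            THEN 1 ELSE 0 END) / COUNT(*)
    FROM ORDER_HEADER
""").fetchone()[0]

# Content: range of values in UNIT_PRICE.
min_price, max_price = conn.execute(
    "SELECT MIN(UNIT_PRICE), MAX(UNIT_PRICE) FROM ORDER_DETAIL"
).fetchone()

# Structure: ORDER_HEADER rows that have no ORDER_DETAIL rows.
orphan_headers = conn.execute("""
    SELECT COUNT(*) FROM ORDER_HEADER h
    WHERE NOT EXISTS (SELECT 1 FROM ORDER_DETAIL d WHERE d.ORDER_ID = h.ORDER_ID)
""").fetchone()[0]

print(f"{blank_pct:.0f}% of entries in SUPPLIER_ID are blank")
print(f"the range of values in UNIT_PRICE is {min_price} to {max_price}")
print(f"there are {orphan_headers} ORDER_HEADER rows with no ORDER_DETAIL rows")
```

Each query describes the data itself (completeness, value range, referential structure) rather than drawing a business conclusion from it.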
Profile Data
• This is a multi-step process:
  • Collect documentation
  • Review the DATA itself
  • Compare data to documentation
  • Identify and detail specific issues
[Diagram: assessment cycle, steps 1-3 shown]
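The "compare data to documentation" and "identify and detail specific issues" steps could be sketched as follows. The rule format, the `find_issues` function, the documented limits, and the sample rows are all hypothetical; the slides do not prescribe how documentation is encoded.

```python
# Hypothetical documented rules, as they might be collected from a data dictionary.
documentation = {
    "SUPPLIER_ID": {"required": True},
    "UNIT_PRICE": {"required": True, "min": 0.01, "max": 5000.00},
}

# Hypothetical sample of the actual data.
rows = [
    {"SUPPLIER_ID": "S-100", "UNIT_PRICE": 5.99},
    {"SUPPLIER_ID": "", "UNIT_PRICE": 4599.99},
    {"SUPPLIER_ID": "S-200", "UNIT_PRICE": -1.00},
    {"SUPPLIER_ID": "S-300", "UNIT_PRICE": None},
]


def find_issues(rows, documentation):
    """Compare each row to the documented rules and detail every mismatch."""
    issues = []
    for row_number, row in enumerate(rows, start=1):
        for column, rule in documentation.items():
            value = row.get(column)
            if value is None or value == "":
                if rule.get("required"):
                    issues.append((row_number, column, "missing required value"))
                continue  # nothing further to check on an empty value
            low, high = rule.get("min"), rule.get("max")
            if low is not None and value < low:
                issues.append((row_number, column, f"value {value} below documented minimum {low}"))
            if high is not None and value > high:
                issues.append((row_number, column, f"value {value} above documented maximum {high}"))
    return issues


issues = find_issues(rows, documentation)
for row_number, column, detail in issues:
    print(f"row {row_number}, {column}: {detail}")
```

Recording the row, the column, and the specific mismatch keeps each finding detailed enough to hand to a subject matter expert later.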
Revisit
• Review the issues and weights
  • Should there be more or fewer issues?
  • What are they?
  • Is the relative importance of each issue different?
[Diagram: assessment cycle, steps 1-4 shown]
Findings
• Your findings tell others about the data
  • Documented reports and/or charts
  • Results database
  • Quality Assessment Score
[Diagram: assessment cycle, steps 1-5 shown]
Findings - Chart
• Weighted Issue Rate: 23.8%
• Weighted Assessment Score: 76.2%
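The slides do not give the formula behind these two figures. One plausible reading, offered only as an assumption, is that the weighted issue rate is the weight-averaged failure rate across issues and that the assessment score is its complement (the two figures on the chart do sum to 100%). The function name and the sample findings below are hypothetical and are not meant to reproduce the chart's numbers.

```python
def weighted_issue_rate(findings):
    """Weight-averaged failure rate in percent (assumed formula).

    findings: list of (weight, records_with_issue, records_checked),
    where records_checked is greater than zero.
    """
    total_weight = sum(weight for weight, _, _ in findings)
    if total_weight == 0:
        raise ValueError("at least one issue must have a non-zero weight")
    weighted_sum = sum(weight * failed / checked for weight, failed, checked in findings)
    return 100.0 * weighted_sum / total_weight


# Hypothetical profiling results: (weight 1-5, failing records, records checked).
findings = [
    (5, 300, 1000),
    (3, 100, 1000),
    (2, 0, 1000),
]

issue_rate = weighted_issue_rate(findings)
assessment_score = 100.0 - issue_rate  # assumed: score is the complement of the rate

print(f"Weighted Issue Rate: {issue_rate:.1f}%")
print(f"Weighted Assessment Score: {assessment_score:.1f}%")
```

Under this reading, a heavily weighted issue that fails often pulls the score down more than a lightly weighted one with the same failure rate.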
Address the Issues
• Addressing your findings
  • Actual vs. Potential
  • Subject Matter Expertise
  • Cleansing Requirements
[Diagram: assessment cycle, steps 1-6 shown]
Maintain Vigilance
• Maintain
  • Complete the cycle
  • Periodic review
  • Document score changes
[Diagram: assessment cycle, all seven steps shown]
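Documenting score changes across periodic reviews can be as simple as keeping a dated log and reporting the difference from the previous review. This is a minimal sketch; the `score_changes` function, the review dates, and the scores are hypothetical.

```python
from datetime import date

# Hypothetical review log: (review date, weighted assessment score in percent).
score_history = [
    (date(2003, 1, 15), 76.2),
    (date(2003, 4, 15), 81.5),
    (date(2003, 7, 15), 79.0),
]


def score_changes(history):
    """Pair each review with its change from the previous review (None for the first)."""
    report = []
    previous = None
    for review_date, score in sorted(history):
        change = None if previous is None else round(score - previous, 1)
        report.append((review_date, score, change))
        previous = score
    return report


report = score_changes(score_history)

# Reviews where the score dropped are candidates for a closer look.
declines = [review_date for review_date, _, change in report
            if change is not None and change < 0]

for review_date, score, change in report:
    note = "first review" if change is None else f"{change:+.1f} points"
    print(f"{review_date.isoformat()}  score {score:.1f}%  ({note})")
```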
Why Do The Assessment?
• Quantify the quality issues
• Isolate true problems
• Proactive review
  • reduces the cost of resolving issues
  • reduces the risk of customer dissatisfaction
• Define the scope of issues
• Determine the resources required to address issues
Why Do The Assessment?
[Chart: Project Costs. Labels: Cost to Address an Issue; Project Timeline; When you find an Issue]
Why Should It Be Done?
• Pay me now or pay me later
[Chart axis: TIME]
When Should It Be Done?
• Every IT data project
  • Warehousing
  • CRM
  • ERP
  • EAI
  • M&A
• Ongoing based on
  • Criticality of the system
  • Current status (score)
  • Need to re-purpose data
Bibliography
• Larry P. English, Improving Data Warehouse and Business Information Quality, John Wiley & Sons Inc., 1999
• Jack Olson, Data Quality: The Accuracy Dimension, Morgan Kaufmann, 2002
• Thomas C. Redman, Data Quality for the Information Age, Artech House, 1996
• PricewaterhouseCoopers, “Global Data Management Survey”, 2001