1 / 27

Data Cleanup: Unlock the potential at a corporate scale

Data Cleanup: Unlock the potential at a corporate scale. Thibault Dambrine. IT professional for 25 years Network Designer ETL Data Warehouse Analyst Interface Specialist ERP Developer Data Quality Experience:

henrik
Download Presentation

Data Cleanup: Unlock the potential at a corporate scale

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Cleanup: Unlock the potential at a corporate scale

  2. ThibaultDambrine • IT professional for 25 years • Network Designer • ETL Data Warehouse Analyst • Interface Specialist • ERP Developer • Data Quality Experience: • Tasked to work on the pre-conversion data cleanup project during the Shell JDE to SAP transition

  3. Introduction • Premise: “What would a Company-wide Data Quality Initiative look like?” • Base: • Experience setting up a data cleanup team, prior to a JDE to SAP data conversion • Realizing the potential for increasing the data value within the corporation

  4. Defining Data Quality • Intrinsic Data Quality: • Accuracy, Completeness, Uniqueness • Reliability, Security and Accessibility • Contextual Data Quality: • Timeliness, Relevance • Inter-operability, consistency of identifiers • Accessibility and Representational Data Quality • Ease of understanding • Consistency of identifiers • Consistency in structures

  5. $ Quantifying the cost of Quality: The 1-10-100 Rule - Additional cost = Less Competitive Business $ Prevention Cost $ Correction Cost Failure Cost The 1-10-100 Quality Cost Rule

  6. The 1-10-100 Rule A Data Quality Example: The 1-10-100 Quality Rule Applied to mailing Data: It costs: • $1.00 to verify data at data entry time • $10.00 to clean the data after the fact • $100.00 to mop up errors caused by bad data • Packages mailed in the wrong address • Lost revenue • Lost customers • Bad (sloppy) reputation • Additional carrying cost for bad data

  7. Data Quality: The Up Side! Trust • Consistent data inquiry results build confidence in information systems • Tractability across Business and IT domains • Consistent data identifiers • promotes internal cross-department reporting • Consistency • Confidence in results Productivity • Removing redundant or near-redundant data • Maximizes re-use of data • Reduces the amount of data being processed • Reduces errors Reliability Consistently good quality data is data you can count on!

  8. A Data Cleanup Initiative • Where to start? • Who will enforce such quality initiatives? • How will the data quality be maintained on on-going basis?

  9. DQB Task 1: Identify Sponsor and Data Quality Boss To Identify sponsor: • Communicate clear understanding of the cost of bad data • Use the 1-10-100 rule • Initiative has to be backed with • Money • Authority • Responsibility

  10. Identify Data Quality Boss – DQB • Must be knowledgeable on data quality • Must be knowledgeable on the Business • Will be responsible for data quality • Will have authority to make changes Note: Responsibility without authority will not work

  11. DQB Task 2: Identify Data Sets • Identify/inventory high-level data sets e.g. • CMDB • ERP • Master Data e.g. • Customer Master • Item Master • Transaction Data e.g. • PO’s & Invoices • Inventory movements • Assign data sets to departments, potential lists of Data owners • Note: The final data owners may not be the one initially penciled in at this stage

  12. DQB Task 3: Identify Business Side Data Owners • Data Owners will effectively be the local, more granular, Data Quality Bosses. Again, they will need to • Be responsible for the data at their level • Have authority to request changes at their level • Have bottom up knowledge of the data, understand what “should be there” • Setup meetings with every department, in line with the Data Sets identified, with aim of coming up with a set of Data Owners • Have a presentation ready • Look for individuals who have been in the Business for a long time, who are well respected, who understand the data, the dependencies, and know who to talk to, to get answers, from the bottom up

  13. DQB Task 4: Request from Data Owners the “Data Quality Specification” or DQS • DQS is a document that spells out the data quality rules e.g. • No duplicates or near-duplicates • Data older than x years should be purged or archived • Data Dependencies such as no detail without a header or no invoice without a PO • No duplicates • Consistency e.g. data format • Quality audit e.g. postal code matches address • More… • Note: Some rules will apply in all DQS Documents • There is value in sharing, reviewing and updating the DQS over time. • Data quality issues are not always apparent until a first cut of data is cleaned up

  14. DQS (part of Task 4 ) Also look for: • Data Islands • Lack of consistent identifiers inhibit a single view of the big picture • Data Opportunity • Could correlated data sets be more useful to the Business? • Data Surprises • Misplaced Data • Information buried in free-form fields

  15. DQB Task 5: Build IT Data Quality Team • Data Quality • Cannot be a “side job” or a part-time task • Must be staffed with individuals who understand data. Best candidates • Proficient in SQL, data extract techniques • Understand ETL tools and techniques • Are detailed-oriented • Experience: Data Warehouse staff is good fishing grounds for such individuals

  16. Mid-Presentation Recap: All the Ingredients are now in placeThe real work can start! • Name Data Quality Sponsor & Data Quality Boss (DQB) • Identify Data Sets • Data Quality Owners • Data Quality Specifications (the DQ Roadmaps) • IT Data Cleansing Team

  17. Introducing: the Data Quality Cycle • We now have • a sponsor • Identified data sets and data owners • They have produced Data Quality Specifications • An IT Team ready to work on the first Data Quality measurements, based on the DQS • Next step: Initiate the cleanup • Not a single iteration but one that will be repeated in a cyclic fashion

  18. The Data Quality Cycle: Simple View DATA

  19. Data Quality Cycle - Corporate Version Analyze Data Improve Data Continuous Improvement Monitor Progress Formalize Schedule Make Progress VISIBLE

  20. Step 1: Identify Bad DataBad Data Definition: Does not adhere to DQS • Coordinate meetings to translate DQS documents into a suite of repeatable data cleansing procedures • Very important that these procedures should be repeatable, schedulable on regular basis • Initial Focus: Identify Bad Data • Bad data(does not adhere to DQS), • Inconsistent data • Old Data • Note: DQS will spell out rules for “old” and “inconsistent” • Ensure results are reported in a format readable by Management at executive level. This initiative has to be VISIBLE

  21. Step 2: Data Cleansing • Data Cleansing can be done in two ways: • Automated, IT based cleanup • Business-based, manual cleanup • Once the bad data is identified, determine who must do what • Business-based, • manual cleanup appropriate for more subtle tasks, e.g. to determine which of two duplicates identified should be kept. These tasks may require additional research, phone calls etc. • Automated, • IT based cleanup good for simpler tasks e.g. making telephone number formats consistent • Can be also sub-contracted to specialized data quality companies

  22. IT-based Data Cleansing and Outsource Considerations • Data cleansing may take valuable time from the Business, which is not available – Data Cleanup effort may suffer as a result • Not all data cleansing is a simple SQL • Not all data is most confidential When considering data cleansing tasks, look at all possible options • Outsourcing some data cleansing tasks may be more economical than doing it all in-house

  23. Step 3: Measure Progress • All programs, procedures written with the aim of identifying data quality issues should • Be stored, like any other programming assets • Be repeatable and be schedulable • Provide aggregate measures to describe the data cleanup status e.g. • X duplicates • Y old records • Z invoices without PO • Progress • Has to be measured in a published dashboard • Has to be visible by the entire organization to provide a sense of value

  24. Step 4: Data Hygiene: * Schedule the Cleanup/Review Tasks * Ensure results are visible • Bad data is created EVERY DAY • Data quality is an on-going effort • Establish, publish a schedule, part of the dashboard • Ensure there is visibility and accountability to ensure the levels of bad data • are going down with time • Or are kept at a minimal level

  25. Step 5: Sharpen the Saw Once the data cleanup cycle is established • Review Results • Review DQS documents periodically (setup schedule) • Get Business input • Improve process • Give input on improvements to be made • Ask the Business to come up with performance improvement measures born from the Data Quality initiative

  26. Conclusion Two sets of Five activities best define the Data Quality The Foundation Setup • Identify Data Quality Sponsor & Data Quality Boss (DQB) • Identify Data Sets • Identify Business Side Data Owners • Define Data Quality Specifications (DQS) - the DQ Roadmaps • Appoint IT Data Cleansing Team The Data Quality Cycle – Ongoing • Identify Bad Data • Initiate Data Cleansing • Measure Progress • Initiate Data Hygiene, Data Cleanup Cycle • Schedule Cleanup/Reviews • Ensure progress visible • Sharpen the Saw

  27. Links • How to improve Data Qualityhttp://www.informit.com/articles/article.aspx?p=399325&seqNum=3 • Predefined data quality rule definitions http://pic.dhe.ibm.com/infocenter/iisinfsv/v9r1/index.jsp?topic=%2Fcom.ibm.swg.im.iis.ia.quality.doc%2Ftopics%2Fpdr_predef.html • Creating Effective Business Rules: Interview with Graham Witthttp://dataqualitypro.com/data-quality-pro-blog/how-to-create-effective-business-rules-graham-witt • Gartner Magic Quadrant on Data Quality Tools – “Demand for data quality tools remains strong” http://www.citia.co.uk/content/files/50_161-377.pdf

More Related