Data Quality … a venture into some personal experiences and insights. Robert Hickman, Pragmatic Information Inc., March 5, 2002. Robert.Hickman@sympatico.ca
Data Quality Topics • Data Quality • The Real Ugliness • Romantic Reality • Cowboys (and Cowgirls) (c) Pragmatic Information Inc.
Data Quality: Is It Important? “Data and information are now as vital to an organization's well being and future success as oxygen is to humans.” “The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year.” Wayne W. Eckerson - The Data Quality Report
Data Quality: Why do “Information” Quality? • Performing work once • Performing work correctly • Trusting the data vs. investigating the data • Reporting against the data vs. applying data calculations and reformatting • Making the right decisions • Acting on business opportunities Source: Larry English
Data Quality: Does it have value? [Figure: chart with axis labels “Errors” and “Time”]
Data Quality … is a Process owned by management [Diagram labels: Legacy/OLTP Applications, CRM Applications, External Data Sources; Source Data; Data Warehouse; DM (three times); Quality]
The Real Ugliness: Quality Quote #1. “The real ugliness lies in the relationship between the people who produce the technology and the things they produce, which results in a similar relationship between the people who use the technology and the things they use.” Robert M. Pirsig [Zen and the Art of Motorcycle Maintenance]
The Real Ugliness: Data Presentation - The Business Viewpoint [Diagram labels: Data Processing Presentation Data]
The Real Ugliness: Puzzle #1 - Technical
The Real Ugliness: “Blank” Result (Product: Hockey Sticks; Time across, Location down)
Location    JAN  FEB  MAR
Store 105   100  125  130
Store 106    80   57   94
Store 107   250  302  196
Store 108   180  182  224
Store 109   207  212  251
Store 110    10    8    7
Store 111    12    4   35
(blank)      14    9    2
Total       853  899  939
The Real Ugliness: “Null” Result (Product: Hockey Sticks; Time across, Location down)
Location    JAN  FEB  MAR
Store 105   100  125  130
Store 106    80   57   94
Store 107   250  302  196
Store 108   180  182  224
Store 109   207  212  251
Store 110    10    8    7
Store 111    12    4   35
Total       839  890  937
The Real Ugliness: “Expected” Result (Product: Hockey Sticks; Time across, Location down)
Location    JAN  FEB  MAR
Store 105   100  125  130
Store 106    80   57   94
Store 107   250  302  196
Store 108   180  182  224
Store 109   207  212  251
Store 110    10    8    7
Store 111    12    4   35
Store 112    14    9    2
Total       853  899  939
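The “Blank”, “Null” and “Expected” results differ only in how the eighth store's label arrives. The following is a minimal Python sketch of one way that could happen, assuming the label is an empty string in the “Blank” case and None in the “Null” case, and assuming the reporting tool silently drops rows with a null key. The function `report` and the flag `drop_null_keys` are hypothetical names for illustration; the sales figures are taken from the slides.

```python
from collections import defaultdict

# Hockey stick sales (JAN, FEB, MAR) for the seven stores whose labels are intact.
BASE_ROWS = [
    ("Store 105", (100, 125, 130)),
    ("Store 106", (80, 57, 94)),
    ("Store 107", (250, 302, 196)),
    ("Store 108", (180, 182, 224)),
    ("Store 109", (207, 212, 251)),
    ("Store 110", (10, 8, 7)),
    ("Store 111", (12, 4, 35)),
]
# Sales for the eighth store; only its label varies between the three results.
LAST_ROW_SALES = (14, 9, 2)


def report(last_label, drop_null_keys=True):
    """Group sales by store label and return (per-store rows, column totals).

    drop_null_keys mimics a join or GROUP BY that discards rows whose key
    is null. That behaviour is an assumption, not something the slides state.
    """
    rows = BASE_ROWS + [(last_label, LAST_ROW_SALES)]
    by_store = defaultdict(lambda: [0, 0, 0])
    for label, sales in rows:
        if label is None and drop_null_keys:
            continue  # the row vanishes from the report and from the totals
        for month_index, units in enumerate(sales):
            by_store[label][month_index] += units
    totals = tuple(sum(column) for column in zip(*by_store.values()))
    return dict(by_store), totals


blank_rows, blank_totals = report("")                  # "Blank" result
null_rows, null_totals = report(None)                  # "Null" result
expected_rows, expected_totals = report("Store 112")   # "Expected" result

print(blank_totals)     # (853, 899, 939)
print(null_totals)      # (839, 890, 937)
print(expected_totals)  # (853, 899, 939)
```

In this sketch the blank label keeps the totals right but leaves an unlabeled row, while the null label changes the totals without any visible sign that a store is missing.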
The Real Ugliness: Puzzle #2 - Technical/Business. Current Address: • 48 Yonge St. Date Reported: 06/98 Previous Address: • 96 Toronto St, Date Reported: 01/99 Prior Address: • 19B6-96 Toronto St E, Date Reported: 06/98
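The slide does not state the answer to Puzzle #2. One visible inconsistency is that the previous address carries a later report date than the current one. A minimal Python sketch of a check for that, assuming the dates are in MM/YY form and the entries are listed newest first; the function name `out_of_order` is hypothetical.

```python
from datetime import datetime

# Address history from the slide, newest first, with MM/YY report dates.
history = [
    ("Current", "48 Yonge St.", "06/98"),
    ("Previous", "96 Toronto St", "01/99"),
    ("Prior", "19B6-96 Toronto St E", "06/98"),
]


def out_of_order(entries):
    """Return (newer, older) label pairs where the older address was reported later."""
    parsed = [(label, datetime.strptime(date, "%m/%y")) for label, _, date in entries]
    problems = []
    for (newer_label, newer_date), (older_label, older_date) in zip(parsed, parsed[1:]):
        if older_date > newer_date:
            problems.append((newer_label, older_label))
    return problems


print(out_of_order(history))  # [('Current', 'Previous')]
```

A check like this only flags the record for review; deciding which date or address is wrong remains a business question.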
The Real Ugliness: Puzzle #3 - Business (Product: Hockey Sticks; Time across, Location down)
Location    JAN  FEB  MAR
Store 105   100  125  130
Store 106    80   57   94
Store 107   250  302  196
Store 108   180  182  224
Store 109   207  212  251
Store 110    10    8    7
Store 111    12    4   35
Store 112    14    9    2
Total       853  899  939
The Real Ugliness: Puzzle #4 – Business (Product: Hockey Sticks; Time across, Sales Person down)
Sales Person  JAN  FEB  MAR
Tom            10   15   10
Mary            8    5    9
Megan          20   32   16
Sharon         10   12   24
Susan          27   22   21
Mike            1    0    7
Robert          1    4    3
Beth            1    9    2
The Real Ugliness: The Technology/Business Relationship
Romantic Reality: Quality Quote #2. “Romantic reality is the cutting edge of experience. It's the leading edge of the train of knowledge that keeps the whole train on the track. Traditional knowledge is only the collective memory of where that leading edge has been.” Robert M. Pirsig [Zen and the Art of Motorcycle Maintenance]
Romantic Reality: A Good Data Quality Habit. “See For Yourself” The Seven Habits of Highly Effective Data Modellers – Daniel Moody
Romantic Reality: Manufacturing data, or manufacturing data errors … • Un-displayable characters (hex on) • 34,190 null descriptions exist in a single dimension column • Time dimension was not populated • Dimensional hierarchies were incomplete • 71 states in the United States • Many others
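Several of the errors listed above (null descriptions, un-displayable characters, too many distinct “states”) are the kind that a simple column profile can surface. A minimal Python sketch, assuming the column is available as a plain list; the function `profile_column`, its `max_distinct` parameter, and the sample values are hypothetical and are not data from the presentation.

```python
def profile_column(values, max_distinct=None):
    """Count nulls, distinct values and unprintable entries in one column."""
    # Treat None and whitespace-only strings as missing.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    distinct = set(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(distinct),
        # Values containing control characters that will not display properly.
        "unprintable": sum(1 for v in non_null if not str(v).isprintable()),
        # True when there are more distinct values than the domain allows.
        "too_many_distinct": max_distinct is not None and len(distinct) > max_distinct,
    }


# Hypothetical state column in which only two real states appear.
sample = ["NY", "N.Y.", "New York", "CA", None, "ca", "", "CA\x07"]
profile = profile_column(sample, max_distinct=2)
print(profile)
```

For a real state column, `max_distinct` would be set to the number of values the business actually expects, which is how a count such as 71 would stand out.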
Romantic Reality: Beyond … • Corporate vs. Department • Calculations • Globalization • Currency • Multiculturalism • Names
Romantic Reality: Spanish Double Surnames • Father’s last name & mother’s maiden name • Double first name Example: “Juan José Martínez Sánchez” • First names are Juan José • Father’s last name is Martínez • Mother’s maiden name is Sánchez • Hyphen use? Source: http://www.nahj.org/resourceguide/chapter_6.html
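The example name shows why a single “last name” field derived by splitting on spaces can mislead. A minimal Python sketch, assuming the name parts are captured separately at entry; the class `SpanishName` and its field names are hypothetical, and the open question of hyphen use is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanishName:
    """Holds each part of the name separately instead of one free-text field."""
    given_names: tuple       # e.g. ("Juan", "José")
    paternal_surname: str    # father's last name
    maternal_surname: str    # mother's maiden name

    def full_name(self):
        return " ".join((*self.given_names, self.paternal_surname, self.maternal_surname))

    def sort_key(self):
        # Sort on the father's last name first, then the mother's maiden name.
        return (self.paternal_surname, self.maternal_surname, self.given_names)


name = SpanishName(("Juan", "José"), "Martínez", "Sánchez")
print(name.full_name())  # Juan José Martínez Sánchez

# A naive "last word is the last name" rule picks the mother's maiden name.
naive_last_name = name.full_name().split()[-1]
print(naive_last_name)        # Sánchez
print(name.paternal_surname)  # Martínez
```

Storing the parts separately keeps the decision about which surname to sort or match on explicit rather than buried in a parsing rule.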
Cowboys: Quote #3. “One thing about pioneers that you don't hear mentioned is that they are invariably, by their nature, messmakers. They go forging ahead, seeing only their noble, distant goal, and never notice any of the crud and debris they leave behind them. Someone else gets to clean that up and it's not a very glamorous or interesting job.” Robert M. Pirsig [Zen and the Art of Motorcycle Maintenance]
Cowboys: Clouds of Sunshine. Department A = 1500 units; Department B = 3700 units; Department C = 1700 units; Company = 5000 units (1900?)
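The department figures add up to more than the company figure, and the difference matches the “(1900?)” on the slide. A minimal Python sketch of that reconciliation, using the slide's numbers; the variable names are hypothetical, and reading the 1900 as an unexplained gap is an interpretation.

```python
# Units reported by each department, from the slide.
departments = {"A": 1500, "B": 3700, "C": 1700}
# Units reported at company level, from the slide.
company_reported = 5000

department_total = sum(departments.values())
gap = department_total - company_reported

print(department_total)  # 6900
print(gap)               # 1900

if gap != 0:
    print(f"Department totals and company total disagree by {gap} units")
```

Running a comparison like this on every reporting cycle would at least make the disagreement visible before the numbers reach a decision maker.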
Cowboys: Data Pools [Diagram labels: Legacy/OLTP Applications, ERP Applications, External Data Sources; Source Data; Data Warehouse; Data (eight times)]
Cowboys: Proprietary Solutions (Silos) [Diagram labels: Legacy/OLTP Applications, CRM Application, External Data Sources; Source Data; Data Warehouse; CRM DB]
Cowboys: Data Quality. Because there is no monitoring ... depending on the viewpoint, “The data is clean or it’s not clean”
Cowboys: Data Quality - No Data Quality Plan. Status of Data Quality Plans: No Plans 48%; Developing Plan 20%; Implementing 21%; Implemented 11%. Source: TDWI
Cowboys: A data quality model • Start-up (1998) • Required culture change • Business owned process • Data quality program • Data integrity incentives • Started with purchased data • Applied the data quality program • Data quality ROI: large contract awarded, increased market share
Conclusions • Data Quality – Important • The Real Ugliness – Relationship • Romantic Reality – See for yourself • Cowboys – Beware • Get the commitment from the top • Develop a Data Quality Program • Start Now
Selected References. Books: • Improving Data Warehouse and Business Information Quality – Larry English • Managing the Data Warehouse – W. Inmon, J. Welch, K. Glassey • The Data Warehouse Lifecycle Toolkit – R. Kimball, L. Reeves, M. Ross, W. Thornthwaite • Zen and the Art of Motorcycle Maintenance – Robert Pirsig White papers: • TDWI’s Data Quality Report – http://www.dw-institute.com/ Presentations: • The Seven Habits of Highly Effective Data Modellers – Daniel Moody Web sites: • http://www.nahj.org/resourceguide/chapter_6.html