Managing Information Quality in Organisations
Based on a presentation by Dr Mikhaila Burgess, School of Computer Science & Informatics, Cardiff University
Session overview
• What is quality? What is Data Quality (DQ)? And why is it important anyway?
• Potential impact of poor DQ
• Defining Data Quality
• Designing for Quality Data
• Ensuring DQ in databases
• So what goes wrong?
• Potential causes of poor DQ
• Managing DQ
… and some exercises
Data vs Information
• Data
    • Items about things, events, activities, transactions, …
    • Numeric, alphanumeric, figures, sounds, images, …
    • Recorded and stored, but not organised to convey any specific meaning
• Information
    • “data that have been organised in a manner that gives them meaning for the recipient” (Turban et al, 2005)
    • Has ‘surprise’ value: it tells the recipient something not already known
One person’s data is another’s information.
What is ‘quality’? What does the word actually mean?
Why is DQ important? Impact of Poor Data Quality … some examples
Defining Data Quality
How do we know what we all mean when we talk about DQ?
Designing for Quality Data
Ensuring a level of quality in your databases
So what goes wrong? Some causes of poor quality data & information
Data Entry: Human Aspect
• Unintentional errors in data entry
    • Lack of understanding
    • Poor training
• Intentional incorrect data entry
    • Malicious / non-malicious
• Poorly defined or out-of-date collection process
• Multiple levels of data entry
Garbage in, garbage out
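Unintentional entry errors can be reduced by checking values at the point of entry, before they reach the database. The following is a minimal Python sketch, not a method from the presentation: the customer form, its four required fields, the simplified UK postcode pattern and the 1900 cut-off for dates of birth are all hypothetical assumptions chosen for illustration.

```python
import re
from datetime import date

# Hypothetical entry rules for a customer form; illustrative only.
REQUIRED = ("surname", "email", "date_of_birth", "postcode")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # plausibility check, not full RFC 5322
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")  # simplified UK postcode shape


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def validate_customer(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it passes."""
    problems = [f"{f}: missing" for f in REQUIRED if is_blank(record.get(f))]

    email = record.get("email")
    if not is_blank(email) and not EMAIL_RE.match(email):
        problems.append("email: not a plausible address")

    postcode = record.get("postcode")
    if not is_blank(postcode) and not POSTCODE_RE.match(postcode.strip().upper()):
        problems.append("postcode: unrecognised format")

    # Reject dates of birth outside an assumed plausible range.
    dob = record.get("date_of_birth")
    if isinstance(dob, date) and not date(1900, 1, 1) <= dob <= date.today():
        problems.append("date_of_birth: outside plausible range")

    return problems
```

For example, a record with surname "Smith", email "j.smith@example.com", a 1980 date of birth and postcode "CF24 3AA" returns an empty list. Checks like these catch slips and blanks, but not intentional entries that are plausible yet wrong.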
Data Entry: Technical Aspect
• Inaccurate measuring or counting device
• Errors in the data storage process
• Missing data fields
• Data scanner
    • Poor-quality data scanner
    • Inappropriate scanner
    • Incorrect set-up
Scanned media: microfiche, microfilm, aperture cards
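Missing fields can be quantified so that the problem is visible and can be tracked over time. A minimal sketch of one common measure, a simple-ratio completeness score per field (1 minus the proportion of records in which the field is missing). Treating None and blank strings as "missing" is an assumption; real datasets may also use sentinels such as "N/A" or 0 that would need adding to `is_missing`.

```python
def is_missing(value) -> bool:
    """Assumed definition of 'missing': None or a blank string."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Simple-ratio completeness per field: 1 - (missing values / total records)."""
    if not records:
        raise ValueError("no records to assess")
    total = len(records)
    return {
        # A field absent from a record counts as missing, just like None or "".
        field: 1 - sum(is_missing(r.get(field)) for r in records) / total
        for field in fields
    }
```

A score of 1.0 means every record holds a value for that field; whether 0.75 is acceptable depends on what the data is used for.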
Herbarium Catalogue
• Approx 7 million specimens
    • Pressed & dried
    • Preserved in spirit
• 30,000 per year
• HerbCat: www.kew.org/herbcat/
• ePIC (electronic Plant Information Centre): www.kew.org/epic/
Type Specimen
• Over 350,000
• Original specimen
• Fixed species name & description
• 18th century
• Reference point for botanists: applying names correctly (taxonomy & systematics)
http://www.kew.org/collections/herb_types.html
Random Data
• Thursday 18th March 2010: NYPD’s Identity Theft Squad deliver a cheesecake to Walter (83) and Rose (82) Martin, Brooklyn, NY
    • Apologise & explain … and check that people “weren’t using that address for identity theft”
• 50 raids over 8 years
“Cops Sorry For Coming To Wrong Home 50 Times: 50 errant visits blamed on computer glitch”
“The snafu started when police used the address as part of what Browne called ‘random material’ to test an automated computer system that tracks crime complaints and records of other internal police information”
(Associated Press & Boston Globe)
Organisational Issues
• Scattering of databases throughout different departments or organisations
• Lack of awareness of data quality issues
• Obsession with technology
• Old (legacy) databases
• Poorly documented data
    • Missing/poor documentation about purpose
• Obsolete data
• Mergers & acquisitions
    • Non-merging of databases (autonomy)
    • Merging of databases
• Data stored in multiple locations and not correctly linked
Merging Databases
• Homonyms & synonyms
    • Synonyms: Surname, Name, Customer, CustName, …
    • Homonym: OrderID
        • ID for an order processed for a customer
        • ID for an order placed with a supplier
• Representational inconsistency
    • Data: eg address
    • Database: eg Oracle & SQL Server; Access & Objectivity
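One way to handle homonyms and synonyms during a merge is an explicit, reviewed mapping from each source's column names to a single canonical schema. A minimal sketch; the source names, column names and canonical names below are all hypothetical, chosen to mirror the slide's examples.

```python
# Hypothetical mapping from each source's column names to one canonical schema.
# Synonyms (CustName, Customer) collapse onto one name; the homonym OrderID is
# split according to which system it came from.
COLUMN_MAP = {
    "sales": {"CustName": "customer_name", "OrderID": "sales_order_id"},
    "support": {"Customer": "customer_name", "TicketNo": "ticket_id"},
    "purchasing": {"Supplier": "supplier_name", "OrderID": "purchase_order_id"},
}


def to_canonical(source: str, row: dict) -> dict:
    """Rename a row's columns into the canonical schema, refusing to guess."""
    mapping = COLUMN_MAP[source]
    unmapped = set(row) - set(mapping)
    if unmapped:
        # An unknown column needs a human decision, not an automatic match on its name.
        raise KeyError(f"{source}: no canonical name for {sorted(unmapped)}")
    return {mapping[column]: value for column, value in row.items()}
```

The design choice is to fail on anything unmapped: matching columns purely by name is exactly what lets a customer OrderID and a supplier OrderID end up in the same field.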
Merging Databases
• Designed for different purposes
    • Database design
    • Data collection
• Student database
    • Storing module marks, working out number of resits, allowing students to proceed, degree classification
    • Storing financial details, whether fees have been paid, ensuring no awards are presented until the account is clear
• RAF, Navy, Army
    • Codes for individual stock items
    • Merged databases …
    • Iraq: 3 days out of action!
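Where codes (here, stock-item codes) were assigned independently in each source, a merge should first check whether the same code means different things in different tables. A minimal sketch; the codes and descriptions are invented for illustration and are not the actual RAF, Navy or Army stock codes.

```python
from collections import defaultdict


def conflicting_codes(code_tables: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return codes used by more than one source with different descriptions."""
    meanings: dict[str, dict[str, str]] = defaultdict(dict)  # code -> {source: description}
    for source, table in code_tables.items():
        for code, description in table.items():
            meanings[code][source] = description
    return {
        code: by_source
        for code, by_source in meanings.items()
        # Compare case-insensitively so "Ration pack" and "ration pack" agree.
        if len({d.strip().lower() for d in by_source.values()}) > 1
    }
```

With hypothetical tables in which code "1005" is "boots, leather" for one service and "tent, 4-person" for another, only "1005" is reported; codes that appear in a single table, or agree apart from case, pass through.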
Merging Databases
• Duplicate data
    • eg customer name: Mikhaila Burgess
    • Variations: Dr Mikhaila Burgess, Dr M S E Burgess, Ms M Burgess, Mr M Burgess, M Burgess, Michaela Burges, Michael Burge
    • Misspellings: Mikalia Burgers, Mikkalia Burgese, Michelle Barron
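Variants and misspellings like these are often found by normalising names and then scoring pairs for similarity. A minimal sketch using Python's standard-library `difflib.SequenceMatcher`; the list of titles and the 0.8 threshold are assumptions to tune, and comparing every pair is only practical for small lists.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

TITLES = {"dr", "mr", "mrs", "ms", "miss", "prof"}  # assumed list of titles to ignore


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop titles, collapse spaces."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in TITLES)


def likely_duplicates(names: list[str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return candidate duplicate pairs with their similarity, highest first."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Dropping titles makes "Mr M Burgess" and "Ms M Burgess" identical, which shows why a similarity score should only nominate candidates: other attributes (address, date of birth) need checking before records are merged.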
Introducing DQ problems (Strong et al 1997): data production
Roles: creator, custodian, consumer
• Same data collected in different data sets
    • Customer data: sales, support, finance, …
    • Hospital: clinical, diagnosis, specialist treatment, finance, …
• Different purpose, different data stored
    • Not necessarily the same values
    • Different entry procedures & constraints
    • Different relevant information
• Cascading updates?
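When the same customer is held by several departments, a periodic check can at least detect where the copies disagree. A minimal sketch, assuming each source can be loaded as a dictionary keyed by a shared customer ID; the source names, IDs and fields are hypothetical.

```python
def inconsistencies(sources: dict[str, dict[str, dict]], fields: list[str]) -> list[tuple]:
    """List (key, field, {source: value}) wherever sources disagree on a shared record.

    `sources` maps a source name (e.g. "sales") to {customer_id: record}.
    """
    report = []
    all_ids = set().union(*(records.keys() for records in sources.values()))
    for cid in sorted(all_ids):
        for field in fields:
            # Collect this field's value from every source that holds it.
            values = {
                name: records[cid][field]
                for name, records in sources.items()
                if cid in records and field in records[cid]
            }
            if len(set(values.values())) > 1:  # more than one distinct value held
                report.append((cid, field, values))
    return report
```

Detection is not resolution: "1 High St" against "1 High Street" is a representational difference rather than a factual conflict, and deciding which copy is authoritative remains an organisational question.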
Introducing DQ problems (Strong et al 1997): data storage
Roles: creator, custodian, consumer
• Potentially large volumes of data
• Accessibility challenges
    • Access codes (eg country: 1-UK, 2-USA, …)
• Distributed data
    • Heterogeneous storage systems
    • Potentially inconsistent data formats & values
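Coded values and heterogeneous formats can be made consistent where data leaves each storage system. A minimal sketch; only the codes 1 and 2 come from the slide's example, and the source-system names and their date layouts are hypothetical.

```python
from datetime import date, datetime

# Country codes in the style of the slide's example; only 1 and 2 come from it.
COUNTRY_CODES = {1: "UK", 2: "USA"}

# Hypothetical source systems, each with its own (known) date layout.
SOURCE_DATE_FORMATS = {
    "uk_branch": "%d/%m/%Y",
    "us_branch": "%m/%d/%Y",
    "warehouse": "%Y-%m-%d",
}


def decode_country(code: int) -> str:
    """Translate a stored code into a readable value, failing loudly on unknown codes."""
    if code not in COUNTRY_CODES:
        raise ValueError(f"unknown country code {code!r}")
    return COUNTRY_CODES[code]


def parse_date(source: str, text: str) -> date:
    """Parse a date using the layout of the system it came from, not a guess."""
    return datetime.strptime(text.strip(), SOURCE_DATE_FORMATS[source]).date()
```

Recording the layout per source, rather than trying formats until one parses, matters because "03/04/2010" is valid in both UK and US layouts but means different days.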
Introducing DQ problems (Strong et al 1997): data usage
Roles: creator, custodian, consumer
• Information needs change
    • Personal requirements
    • Organisational environment
    • Data no longer relevant
• Conflicts between accessibility and security, privacy & confidentiality
• Access limitation due to lack of IT resources
But who are these people?
An Issue of Change
• Organisations change
• The environment changes
    • Government, competition, market needs, customers, customer requirements …
• Requirements & specifications change
    • Different projects have different requirements
    • Require data for different purposes
• Ideal world: stop data entry, clean, ensure fit for purpose, restart with a perfect database
    • Tomorrow it will no longer be perfect!
10 Potholes to IQ (Strong et al 1997)
#1 Multiple sources of the same information produce different values.
#2 Information is produced using subjective judgments, leading to bias.
#3 Systemic errors in information production lead to lost information.
#4 Large volumes of stored information make it difficult to access information in a reasonable time.
#5 Distributed heterogeneous systems lead to inconsistent definitions, formats, and values.
#6 Nonnumeric information is difficult to index.
#7 Automated content analysis across information collections is not yet available.
#8 As information consumers’ tasks and the organisational environment change, the information that is relevant and useful changes.
#9 Easy access to information may conflict with requirements for security, privacy, and confidentiality.
#10 Lack of sufficient computing resources limits access.
Managing Information Manage data/information as a product, not a by-product … TQM for Data!
The Deloitte CIO Club (October 2005)
• 50% of CIOs report that data quality issues have had a negative impact on their business in the last year, and 6% say it affects them on a daily basis. A further 19% are occasionally affected.
• Panel admits to a lack of strategic approach to managing data quality
    • 50% of CIOs consider data quality to be an IT issue, even though 88% also believe that their non-IT colleagues are aware of the benefits of better quality data.
    • Data cleansing is reactive, not proactive. Many CIOs stated it only happens “when it’s needed” (for example, when new systems are introduced), with none carrying out regular, programmed data cleansing sweeps.
http://www.deloitte.com/uk/cio/
Managing data as a product (Lee et al 2006) (Wang et al 1998)
• Data & information: typically treated as a by-product
    • Focus on the system, not the data
• Treat data/information as a product
    • An end deliverable that will satisfy customer needs
    • Focus on data & fitness for purpose
    • Fundamental change in organisations’ understanding of data
• Follow four principles …
    • Understand consumers’ information needs
    • Manage the data production process
    • Manage data as a product with a product life-cycle
    • Appoint a data product manager, responsible for managing the data product
Roles: consumer, creator, custodian
TQM to TDQM
• TQM: typical foundation for DQ/IQ programmes
• Define the IP (information product)
    • Identify characteristics of the IP, determine IQ dimensions
    • Identify IP requirements
    • Identify the IP manufacturing process, and those involved
• Measurement
    • Determine the extent of IQ problems
    • Look at results of previous attempts to resolve issues: learning from experience
• Analysis
    • Pinpoint causes of poor IQ; effects on the organisation; consider users; Pareto charts, SPC (statistical process control)
• Improvement
    • Deliver methods of continuous improvement
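The analysis step names Pareto charts and SPC. A minimal sketch of the arithmetic behind both, assuming error causes have already been logged as text labels and that every batch inspects the same number of records. The p-chart with 3-sigma limits is one standard SPC chart for proportions, chosen here for illustration rather than taken from the presentation.

```python
from collections import Counter
from math import sqrt


def pareto(causes: list[str]) -> list[tuple[str, int, float]]:
    """Rank error causes by frequency, with cumulative percentage of all errors."""
    counts = Counter(causes).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for cause, n in counts:
        running += n
        rows.append((cause, n, round(100 * running / total, 1)))
    return rows


def p_chart_limits(defective: list[int], sample_size: int) -> tuple[float, float, float]:
    """Lower limit, centre line and upper limit for the proportion of bad records per batch."""
    p_bar = sum(defective) / (len(defective) * sample_size)
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    # Proportions cannot leave [0, 1], so clip the 3-sigma limits.
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)
```

The Pareto rows show which few causes account for most errors (where improvement effort pays off first); a batch whose error proportion falls outside the p-chart limits suggests a special cause worth investigating.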
Data Quality Policy
• Needed for the organisation to remain engaged & succeed in maintaining a viable, sustained DQ effort
• Proactively supports business activities
A DQ policy must reflect the vision of the organisation.
Common failures:
• A DQ management programme is started … but the effort is not sustained
• A single DQ champion or department … others fail to come on board … not disseminated across the business
Organisational policy must involve all functions and activities relating to the maintenance of data products.
10 Policy Guidelines (Lee et al 2006)
The organisation …
• … adopts the basic principle of treating information as a product, not a by-product.
• … establishes and keeps data quality as a part of the business agenda.
• … ensures that the data quality policy and procedures are aligned with its business strategy, business policy, and business processes.
• … establishes clearly defined data quality roles and responsibilities as part of its organisation structure.
• … ensures that the data architecture is aligned with its enterprise architecture.
10 Policy Guidelines (Lee et al 2006)
• … takes a proactive approach in managing changing data needs.
• … has practical data standards in place.
• … plans for and implements pragmatic methods to identify and solve data quality problems, and has in place a means to periodically review its data quality and data quality environment.
• … fosters an environment conducive to learning and innovating with respect to data quality activities.
• … establishes a mechanism to resolve disputes and conflicts among different stakeholders.
Examples …
• http://www.lancashirecare.nhs.uk/documents/FOI_12DataQualityPolicy.pdf
• http://www.suffolk.gov.uk/CouncilAndDemocracy/OurPerformance/DataQualityPolicy.htm
Review
• What is quality?
• Defining quality & DQ
• Importance of quality data
• DQ in databases
    • Database design
    • Database integrity
• Some examples of poor DQ and its impact
    • http://www.iqtrainwrecks.com/
• Measuring DQ
• Managing data as a product
References
CROSBY, P.B. (1978) Quality is Free: The Art of Making Quality Certain, McGraw-Hill.
DROMEY, R.G. (1996) “Cornering the Chimera”, IEEE Software, 13(1), pp 33-43.
JURAN, J.M. & GODFREY, A.B. (1999) Juran’s Quality Handbook (Fifth Edition), McGraw-Hill, USA.
LEE, Y.W., PIPINO, L.L., FUNK, J.D. & WANG, R.Y. (2006) Journey to Data Quality, MIT Press, MA, USA.
PIRSIG, R.M. (1974) Zen and the Art of Motorcycle Maintenance, Random House.
REDMAN, T.C. (1995) “Improve Data Quality for Competitive Advantage”, Sloan Management Review, 36(2), Winter 1995, pp 99-107.
REDMAN, T.C. (1997) Data Quality for the Information Age, Artech House.
STRONG, D.M., LEE, Y.W. & WANG, R.Y. (1997) “10 Potholes in the Road to Information Quality”, IEEE Computer, August 1997, pp 38-46.
TURBAN, E., ARONSON, J.E. & LIANG, T.P. (2005) Decision Support Systems and Intelligent Systems (7th ed), Prentice-Hall.
WANG, R.Y., LEE, Y.W., PIPINO, L.L. & STRONG, D.M. (1998) “Managing Your Information as a Product”, Sloan Management Review, 39(4), Summer 1998, pp 95-105.
WANG, R.Y. & STRONG, D.M. (1996) “Beyond Accuracy: What Data Quality Means to Data Consumers”, Journal of Management Information Systems, 12(4), Spring 1996, pp 5-33.
WATSON, R.T. (2003) Data Management: Databases and Organizations, Wiley & Sons.