Understanding Data Quality

stacie
Presentation Transcript


  1. Understanding Data Quality

  2. Philosophical Position and Important Definitions

  3. Data quality dimensions in the literature • Dimensions include accuracy, reliability, importance, consistency, precision, timeliness, understandability, conciseness and usefulness • Wand and Wang (1996: p92)

  4. Kahn et al. (1997) developed a data quality framework based on product and service quality theory, in the context of delivering quality information to information consumers.

  5. Four levels of information quality were defined: • sound information, • useful information, • usable information, and • effective information. • The framework was used to define a process model to help organisations plan to improve data quality.

  6. A more formal approach to data quality is provided in the framework of Wand and Wang (1996), who use Bunge’s ontology to define data quality dimensions. • They formally define five intrinsic data quality problems: incomplete, meaningless, ambiguous, redundant and incorrect data.

  7. Semiotic Theory • Semiotic theory concerns the use of symbols to convey knowledge. Stamper (1992) defines six levels for analysing symbols. These are the physical, empirical, syntactic, semantic, pragmatic and social levels.

  8. Data quality can be examined at each of these levels: • Physical - concerned with the physical media used to communicate data • Empirical - • Syntactic - concerned with the structure of data • Semantic - concerned with the meaning of data • Pragmatic - concerned with the usage of data (usability and usefulness) • Social - concerned with the shared understanding of the meaning of the data/information generated from the data
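To make the difference between the syntactic and semantic levels concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a date field is expected in `YYYY-MM-DD` form; the function names and the pattern are hypothetical and do not come from the presentation.

```python
import re
from datetime import date

# Hypothetical expected structure for a date field: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_syntactically_valid(value: str) -> bool:
    """Syntactic level: does the value have the expected structure?"""
    return DATE_PATTERN.fullmatch(value) is not None


def is_semantically_valid(value: str) -> bool:
    """Semantic level: does the well-formed value denote a real calendar date?"""
    if not is_syntactically_valid(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        # Well-formed but meaningless, e.g. 30 February.
        return False
    return True
```

A value such as `2023-02-30` passes the structural check but fails the meaning check, which is the kind of problem that a purely syntactic validation cannot detect.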

  9. Discussion • Discuss strategies for ensuring data quality in each of the categories listed in the form, according to the levels given.

  10. 4 Common Data Challenges Faced During Modernization: • Data is fragmented across multiple source systems - Each system holds its own notion of the policyholder. This makes developing a unified customer-centric view extremely difficult. The situation is further complicated because the level and amount of detail captured differs from system to system.
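One way to picture the fragmentation problem is a small Python sketch that groups policyholder records from several systems under a shared key. The field names (`name`, `date_of_birth`) and the matching rule are assumptions made for illustration; real record matching usually needs far more careful rules than this.

```python
from collections import defaultdict


def match_key(record):
    """Build a hypothetical matching key: normalised name plus date of birth."""
    name = " ".join(record["name"].lower().split())
    return (name, record["date_of_birth"])


def unify(records):
    """Merge records that share a key, keeping the first non-missing value per field."""
    merged = defaultdict(dict)
    for record in records:
        key = match_key(record)
        for field, value in record.items():
            if value is None:
                continue  # this system did not capture the field
            merged[key].setdefault(field, value)
    return dict(merged)
```

Because each system captures a different level of detail, the merged view is only as good as the matching key: two systems that spell a name differently beyond case and spacing would still produce two separate policyholders here.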

  11. 4 Common Data Challenges Faced During Modernization: • Data formats across systems are inconsistent - This arises when an organization operates systems from multiple vendors and each vendor has chosen to implement a custom data representation. Responding to evolving business needs has then diluted the meaning and usage of data fields: the same field represents different data, depending on the context.
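As a small illustration of reconciling vendor-specific representations, the Python sketch below converts dates from two hypothetical source systems into a single ISO 8601 form. The system names and their formats are invented for the example; an actual migration would need a mapping derived from each vendor's documentation.

```python
from datetime import datetime

# Hypothetical per-system date formats, for illustration only.
SOURCE_FORMATS = {
    "system_a": "%d/%m/%Y",  # e.g. 05/03/1980
    "system_b": "%Y%m%d",    # e.g. 19800305
}


def normalise_date(raw: str, source: str) -> str:
    """Parse a date using its source system's format and return it as YYYY-MM-DD."""
    fmt = SOURCE_FORMATS[source]  # raises KeyError for an unknown system
    return datetime.strptime(raw.strip(), fmt).date().isoformat()
```

Keeping the format tied to the source system, rather than guessing it from the value, matters because a string such as `05/03/1980` reads as 5 March or 3 May depending on the system that produced it.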

  12. 4 Common Data Challenges Faced During Modernization: (Cont.) • Data is lacking in quality - This arises when an organization has units organized by line of function. Each unit holds expertise in a specific field and operates fairly autonomously, which results in different data entry practices. In addition, the data models of decades-old systems weren’t designed to handle today's business needs.

  13. 4 Common Data Challenges Faced During Modernization: (Cont.) • Systems are only available in defined windows during the day, not 24/7 - This arises when the organization's core systems are batch oriented. Updates are not available in the system until batch processing has completed. Furthermore, while batch processing is taking place, the systems are available neither for querying nor for accepting data. Another aspect affecting availability is the closed nature of the systems: they do not expose functionality for reuse by other systems.

  14. Lack of Centralized Approach Hurting Data Quality • “Data quality is the foundation for any data-driven effort, but the quality of information globally is poor. Organizations need to centralize their approach to data management to ensure information can be accurately collected and effectively utilized in today’s cross-channel environment.” • Thomas Schutz, senior vice president, general manager of Experian Data Quality
