
Data quality with specific reference to current banking data status


Presentation Transcript


  1. Data quality with specific reference to current banking data status – H. Diwakar

  2. Introduction – what is data quality? • Data quality is achieved whenever data is accurate, where accuracy encompasses the following properties: correctness, unambiguity, consistency, completeness and timeliness. • Data has quality if it satisfies the requirements of its intended use; this implies that data quality is always defined within the business context of an application.

  3. Correctness: data is correct if it conveys a lexically, syntactically and semantically correct statement – e.g., "a government servant is a minor" is not correct (semantically wrong).

  4. Unambiguity: data is unambiguous if it allows only one interpretation – e.g., recording Industry as "Others" for a company named "XYS Automobile Company" is ambiguous, since the name points to the automobile industry.

  5. Consistency: data is consistent if it exhibits no heterogeneity, either in content or in form – e.g., inconsistency between two fields, such as the bank name "OBC" in one place and "Oriental Bank of Commerce" in others, or a field type mismatch, such as a date stored as a string in one relation and as a date type in another.

  6. Completeness: data is complete if no piece of information is missing – e.g., a customer record with a blank date of birth is incomplete.

  7. Timeliness: data is timely if it is up to date.
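Taken together, these properties can be checked mechanically. Below is a minimal Python sketch of such a check; the field names (customer_id, bank_name, last_updated), the canonical-name map and the one-year staleness threshold are illustrative assumptions, not taken from the slides:

```python
from datetime import date, timedelta

# Hypothetical canonical names and required fields; a real system would use
# a maintained knowledge base and the bank's own schema.
CANONICAL_BANK_NAMES = {"OBC": "Oriental Bank of Commerce"}
REQUIRED_FIELDS = ["customer_id", "bank_name", "date_of_birth", "last_updated"]

def check_record(record):
    """Return a list of quality issues found in a single record."""
    issues = []
    # Completeness: no piece of information may be missing.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"incomplete: {field} is missing")
    # Consistency: abbreviations should be replaced by one canonical form.
    name = record.get("bank_name")
    if name in CANONICAL_BANK_NAMES:
        issues.append(f"inconsistent: '{name}' should be '{CANONICAL_BANK_NAMES[name]}'")
    # Timeliness: flag records not refreshed within a year (assumed threshold).
    last = record.get("last_updated")
    if last is not None and date.today() - last > timedelta(days=365):
        issues.append("stale: record not updated in over a year")
    return issues
```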

  8. Data quality • Data quality is particularly critical in business intelligence application contexts, due to its impact on decision-making effectiveness.

  9. Introduction • Data quality is often understood as "data cleansing" only. • A more complete view includes aspects such as • Matching • Profiling • Standardization • Cleansing • Monitoring • Enrichment.

  10. Various aspects of the data quality problem • Matching: identification, linking or merging of related entries within or across sets of data. • Profiling: analysis of data to capture statistics (metadata) that provide insight into the quality of the data and aid in identifying data quality issues.
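As a concrete illustration of profiling, the sketch below computes simple per-column statistics (null counts, distinct counts, min/max, most common values) of the kind a profiling tool would report; the sample interest-rate column is a made-up example:

```python
from collections import Counter

def profile_column(values):
    """Compute basic profiling statistics (metadata) for one column."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),        # completeness hint
        "distinct": len(counts),                     # duplicate/key hint
        "min": min(non_null) if non_null else None,  # out-of-range hint
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(3),        # skew / default-value hint
    }

# A made-up interest-rate column: the zeros and the null stand out immediately.
print(profile_column([7.5, 8.25, 0.0, None, 8.25, 0.0]))
```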

  11. Various aspects of the data quality problem • Parsing and standardization: decomposition of text fields into component parts and formatting of values into consistent layouts based on industry standards, local standards (for example, postal authority standards for address data), user-defined business rules, and knowledge bases of values and patterns. • Generalized "cleansing": modification of data values to meet domain restrictions, integrity constraints or other business rules that define sufficient data quality for the organization.
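A minimal standardization sketch, assuming a small set of hypothetical input date layouts (a real deployment would draw on industry and postal standards and larger knowledge bases of values and patterns):

```python
import re
from datetime import datetime

# Hypothetical input layouts seen in the source data.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%b-%Y")

def standardize_date(raw):
    """Parse a date written in any known layout and emit one consistent layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date layout: {raw!r}")

def standardize_name(raw):
    """Collapse repeated whitespace and normalize case in a name field."""
    return re.sub(r"\s+", " ", raw).strip().title()

print(standardize_date("01-Jan-2006"))                  # -> 2006-01-01
print(standardize_name("  oriental   bank of commerce"))
```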

  12. Various aspects of the data quality problem • Monitoring: deployment of controls to ensure ongoing conformance of data to the business rules that define data quality for the organization. • Enrichment: enhancing the value of internally held data by appending related attributes from external sources (for example, consumer demographic attributes or geographic descriptors).
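A minimal monitoring sketch; the two rules and the field names are illustrative assumptions, and in production such checks would run on a schedule against operational data:

```python
# Each organization would encode the business rules that define quality for it.
RULES = {
    "account_id_present":   lambda row: bool(row.get("account_id")),
    "balance_non_negative": lambda row: row.get("balance", 0) >= 0,
}

def monitor(rows):
    """Count rule violations in a batch of records."""
    violations = {name: 0 for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            if not rule(row):
                violations[name] += 1
    return violations

print(monitor([{"account_id": "A1", "balance": 1200.0},
               {"account_id": "",   "balance": -50.0}]))
```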

  13. Preparing CBS data for BI use • Data profiling • Matching (mostly satisfied via referential constraints) • Constraints that are banking-domain specific (a sketch follows this slide) • Standardization – historically run in batch-mode silos, as one step in an overall flow or as a stand-alone, offline process; now also deployed in a real-time manner, integrated into operational systems, with the goal of standardizing data as it is captured.
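As an illustration of banking-domain-specific constraints, the sketch below checks a loan account against three hypothetical rules; the actual rules would come from the bank's own policies:

```python
from datetime import date

def check_loan_account(acct):
    """Apply banking-domain rules (illustrative, not any bank's actual policy)."""
    problems = []
    if acct["interest_rate"] <= 0:
        problems.append("loan interest rate must be positive")
    if acct["sanction_date"] > date.today():
        problems.append("sanction date cannot be in the future")
    if acct["outstanding"] > acct["sanctioned_amount"]:
        problems.append("outstanding exceeds sanctioned amount")
    return problems

print(check_loan_account({"interest_rate": 0.0,
                          "sanction_date": date(2006, 3, 1),
                          "outstanding": 450000,
                          "sanctioned_amount": 500000}))
```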

  14. Data profiling techniques can show the presence of errors but cannot show their absence or their total number. Therefore, any metrics derived from the output of profiling are inexact. This does not make them useless; on the contrary, the errors found are true errors, and if there are enough of them you have uncovered true problems.

  15. Often a single fact is more shocking than statistical metrics. For example, telling management that profiling the birth dates of employees revealed that the youngest employee in the company has not been born yet, and that the oldest was born before the Civil War, is far more effective than any metric at getting across the point that improvements are needed now. • Another such fact: an interest rate of zero for a housing loan?
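Surfacing such headline facts from a birth-date column takes only a few lines. A sketch, assuming a pre-1865 cutoff to stand in for "born before the Civil War":

```python
from datetime import date

def birth_date_outliers(birth_dates):
    """Surface headline-grabbing outliers rather than abstract metrics."""
    today = date.today()
    return {
        "not_born_yet":    [d for d in birth_dates if d > today],
        "implausibly_old": [d for d in birth_dates if d.year < 1865],  # assumed cutoff
    }

sample = [date(1980, 5, 1), date(2031, 1, 1), date(1850, 7, 4)]
print(birth_date_outliers(sample))
```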

  16. Factors in evaluating data capture processes • Time between event and recording • Distance between event and recording • Number of handoffs of information before recording • Availability of all facts at recording • Ability to verify information at recording • Motivation of the person doing the recording • Skill, training and experience of the person doing the recording • Feedback provided to the recorder • Value placed on getting the data right • Auto-assist in the recording process • Error checking in the recording process

  17. Document the bad practices independently, for the benefit of future projects. Bad practices used in one application frequently find their way into other applications.

  18. Data quality issue remedy types • Improve data capture • Train entry staff • Replace entry processes • Provide meaningful feedback • Change motivations to encourage quality • Add defensive checkers – in data entry screens, transaction servers or DBMS implementations • Add periodic monitoring – perform periodic data profiling • Use data cleansing • Reengineer and reimplement the application • Reengineer and reimplement data extraction and movement • Educate the user community

  19. Defensive data checkers are software that assists in enforcing rules at the point of data entry to prevent invalid values, invalid combinations of valid values, and structural problems from getting into the database in the first place. Rule checking can be performed in multiple places and through multiple means.
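A minimal sketch of such a defensive checker, invoked before an insert reaches the database; the field names and the loan-rate rule are illustrative assumptions:

```python
# Reject bad data at the point of entry, so it never enters the database.
VALID_ACCOUNT_TYPES = {"SAVINGS", "CURRENT", "LOAN"}

def validate_entry(entry):
    """Raise ValueError for invalid values or invalid combinations of valid values."""
    # Invalid value: unknown account type.
    if entry["account_type"] not in VALID_ACCOUNT_TYPES:
        raise ValueError(f"unknown account type: {entry['account_type']}")
    # Invalid combination of individually valid values:
    # a LOAN account must carry a positive interest rate.
    if entry["account_type"] == "LOAN" and entry.get("interest_rate", 0) <= 0:
        raise ValueError("loan accounts require a positive interest rate")

validate_entry({"account_type": "LOAN", "interest_rate": 9.25})  # passes silently
```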

  20. Reference • Ted Friedman and Andreas Bitterer, "Magic Quadrant for Data Quality Tools, 2007", Gartner RAS Core Research Note G00149359, 29 June 2007.

  21. Evaluation Criteria Definitions • Ability to Execute: • Product/Service • Overall Viability (Business Unit, Financial, Strategy, Organization) • Sales Execution/Pricing • Market Responsiveness and Track Record • Marketing Execution • Customer Experience • Operations • Completeness of Vision: • Market Understanding • Marketing Strategy • Sales Strategy • Offering (Product) Strategy • Business Model • Vertical/Industry Strategy • Innovation • Geographic Strategy

  22. Product comparison – IBM • Acquired its data quality technology from Ascential, a well-known data quality player • Positioned in Information Server within its WebSphere product umbrella • Has ProfileStage and QualityStage • No known customers

  23. Inferences • Data profiling is mandatory before one goes for BI or a DW. • Domain-specific rule checking is mandatory. • Once issues are identified, what next? A policy is to be made by the bank: • Immediate correction • Omit the affected data • Go back to the branches?

  24. Product comparison – Business Objects: strengths and cautions • Acquired Firstlogic in 1Q06. • Strengths: good breadth of functional data quality capabilities – data profiling, data cleansing operations and the delivery of data quality services in an SOA context; Business Objects' strength remains very much in applications of customer data quality, specifically in matching/linking, de-duplication, and name and address standardization and validation. • Cautions: because Firstlogic's technology was heavily biased toward customer data quality, Business Objects is lagging behind many of its competitors with regard to product capabilities and experience in addressing other data domains.

  25. Product comparison – DataFlux • Has been moving out of the large shadow of its parent company, SAS, and has a presence in Europe. • Strengths: the DataFlux platform includes profiling, cleansing and monitoring capabilities in a single architecture, reducing the effort to integrate discrete products and increasing usability through a consistent user experience. • Cautions: DataFlux has been seen mostly as a stand-alone data quality technology provider; to become a data quality brand independent of SAS, DataFlux needs to strike partnerships and build a channel with other vendors in the business intelligence and data integration space.

  26. Product comparison – Informatica • Stepped into the data quality market with its acquisition of Similarity Systems in January 2006; the established partnerships between Informatica and Trillium and Firstlogic have since ended. • Strengths: used in several data quality implementations inside large global companies; well positioned for domain-agnostic cleansing; Data Explorer is one of the most mature profiling engines in the market. • Cautions: two separate profiling solutions – the PowerCenter profiling option and Informatica Data Explorer – whose overlap in functionality can sometimes create confusion for customers; architectural rationalization of the various data-quality-relevant components into one concise product family is still under way; traditionally sold to IT buyers.
