Data Quality Case Study

Presentation Transcript


  1. Data Quality Case Study Prepared by ORC Macro

  2. Data Correction • Background • Data Correction • Tracking system • SAS AF query application • Guidelines • Profile Analysis • SSNs • Names

  3. Profile Analysis—SSNs

  4. Profile Analysis: SSNs • Shared SSNs (n=7,100) • Candidates for Correction (Different Names): 27% • Candidates for Collapse (Same or Similar Names): 73%
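
The split of shared SSNs into collapse and correction candidates could be sketched roughly as follows. This is a minimal illustration, not the method used in the case study: the record layout, the `names_similar` helper, the 0.85 threshold, and the use of Python's `difflib` are all assumptions made for the example, and the sample records are invented.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def names_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two names match closely after case/space normalization.

    The threshold is an illustrative assumption, not a value from the slides.
    """
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio() >= threshold


def classify_shared_ssns(records):
    """Group (person_id, ssn, name) records by SSN and label each shared SSN.

    'collapse'   -> all names on the SSN are the same or similar
    'correction' -> at least one name on the SSN differs
    """
    by_ssn = defaultdict(list)
    for person_id, ssn, name in records:
        by_ssn[ssn].append((person_id, name))

    labels = {}
    for ssn, people in by_ssn.items():
        if len(people) < 2:
            continue  # SSN belongs to a single record, so it is not shared
        first_name = people[0][1]
        all_similar = all(names_similar(first_name, n) for _, n in people[1:])
        labels[ssn] = "collapse" if all_similar else "correction"
    return labels


# Invented sample data for illustration only.
records = [
    (1, "111-11-1111", "Smith, John A"),
    (2, "111-11-1111", "Smith, John"),
    (3, "222-22-2222", "Jones, Mary"),
    (4, "222-22-2222", "Garcia, Luis"),
    (5, "333-33-3333", "Lee, Ann"),
]
labels = classify_shared_ssns(records)
```

Comparing every name against the first record on the SSN keeps the sketch short; a production rule would likely compare all pairs and send borderline scores to manual review.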

  5. Profile Analysis: Names • Possible Duplicates: 23% (n=79,300) • Unique Persons: 77% (n=267,081)
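
The percentages on slide 5 can be checked from the two counts, assuming (the slide does not say so explicitly) that possible duplicates and unique persons together make up the whole set of records analyzed:

```python
# Counts taken from slide 5; treating them as a complete partition is an assumption.
possible_duplicates = 79_300
unique_persons = 267_081
total = possible_duplicates + unique_persons

# Shares rounded to whole percentages, as they appear on the slide.
duplicate_share = round(100 * possible_duplicates / total)
unique_share = round(100 * unique_persons / total)
```

Under that assumption the total is 346,381 records, and the rounded shares agree with the 23% and 77% shown on the slide.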

  6. Profile Analysis—Names

  7. OLTP—Commons Cases • Definition • Statistics • Status

  8. Data Correction • Identifying the extent of the problem • Investigating based on type of error • Validating the investigation • Implementing the change • Tracking the identification, investigation, validation, and implementation
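
One way to picture the tracking step is a small record that moves through the four stages in order and logs each transition. This is only a sketch: the `Stage` and `CorrectionCase` names and the note texts are hypothetical and are not taken from the internal tracking system mentioned in the slides.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Stages named on slide 8, in the order a correction passes through them."""
    IDENTIFIED = 1
    INVESTIGATED = 2
    VALIDATED = 3
    IMPLEMENTED = 4


@dataclass
class CorrectionCase:
    """Hypothetical tracking record for one data correction."""
    person_id: int
    problem: str
    stage: Stage = Stage.IDENTIFIED
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Log the identification so the history covers every stage.
        self.history.append((Stage.IDENTIFIED, self.problem))

    def advance(self, note: str) -> Stage:
        """Move to the next stage and record a note; stages cannot be skipped."""
        if self.stage is Stage.IMPLEMENTED:
            raise ValueError("case is already implemented")
        self.stage = Stage(self.stage.value + 1)
        self.history.append((self.stage, note))
        return self.stage


# Illustrative walk-through; the notes are invented for the example.
case = CorrectionCase(3070908, "two different middle initials found")
case.advance("scripts run")
case.advance("name, SSN, degrees, and grants checked against sources")
case.advance("correction applied")
```

Forcing the stages to be visited in order means a change cannot be logged as implemented before it has been validated.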

  9. Data Correction—An Example PERSON_ID=3070908—PPRF record • Identification of problem • Two different middle initials found • Investigation of problem • TA module • Scripts run • Validation of information • Name, SSN, degree(s), grant(s) • Sources

  10. Data Correction—An Example PERSON_ID=3070908—PPRF record • Implementation of correction • Grants report submitted to NIH OD • Tracking of correction • Internal tracking system • Post-correction • Loss of control of data

  11. Developing a Data Quality Business Plan

  12. Focus of Our Activities • Examination of the Database, Procedures, and Interface • Development of Modified Use Cases • Unified Modeling Language • Identification and Extraction of Business Rules • Identification of Business Model

  13. Data Quality Issues • Type-over of information • Generation of duplicate persons • Collapsing • Changes in degree and address data • Generation of orphans

  14. Type-Over Practices • Intentions: • Assign a new principal investigator (PI) to a grant • Change the name of a PI on a grant • Correct a misspelled name • Consequences: • Inclusion of incorrect information in a person profile • Absence of linkages between PIs and grant applications • Creation of false linkages between PIs and grant applications

  15. Factors Affecting Quality • Relatively easy access to person-related data elements • Lack of self-validation routines • Interface issues

  16. Solutions • Restricted access • Quality control validation • Interface simplification • Self-validation algorithm

  17. Data Quality Validation • Who does it? • ICs • A Quality Assurance group • Other • How is it done? • Staging areas • Manual and intelligent filtering • Architecture

  18. GM Module Screen GM1040

  19. GM Module Screen COM1100

  20. Self Validation • Name-matching algorithm • Consistency checking
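
A consistency check of the kind listed here might, for instance, flag a person whose records carry conflicting middle initials, the problem described in the earlier example. The sketch below assumes names are stored as "Last, First M" strings; that format, the function names, and the sample records are illustrative assumptions, not details from the presentation.

```python
from collections import defaultdict


def middle_initial(name: str) -> str:
    """Extract the middle initial from a 'Last, First M' name ('' if absent)."""
    _, _, given = name.partition(",")
    parts = given.split()
    return parts[1][0].upper() if len(parts) > 1 else ""


def inconsistent_middle_initials(records):
    """Return sorted person IDs whose records show more than one middle initial."""
    initials = defaultdict(set)
    for person_id, name in records:
        mi = middle_initial(name)
        if mi:  # a missing initial is not treated as a conflict
            initials[person_id].add(mi)
    return sorted(pid for pid, seen in initials.items() if len(seen) > 1)


# Invented sample records for illustration only.
records = [
    (101, "Doe, Jane Q"),
    (101, "Doe, Jane R."),
    (102, "Roe, Sam T"),
    (102, "Roe, Sam"),
]
flagged = inconsistent_middle_initials(records)
```

Here person 101 is flagged because the two records disagree, while person 102 is not, since one record simply omits the initial.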

  21. Higher-Level Analysis The following are being examined relative to their effect on quality: • Commons interface with IMPAC II • Database redundancy • Business rules in the database • Master person file • Front-end design • Human factors • Ownership

  22. Development of a Data Quality Model

  23. Major Goals Quality improvements plan for personal identifiers • Evaluate the different identification algorithms currently in use for IMPAC II • Develop identification algorithm(s) and procedures • Serve as consultant and guarantor of efficacy of algorithm implementation

  24. Moving Forward • Understanding the technical infrastructure • Identification of specific areas of concern • Development/proposal of data quality expectations • Development/proposal of appropriate, acceptable solutions

  25. Data Quality White Paper Knowledge assets are very real and carry tremendous value. Outline • Definition • Rules • Risks and Costs • NIH Expectations • Process • Measurements/Metrics • Testing • Continuous Improvements • Conclusions

  26. Conclusion • Examination of the Database, Procedures, and Interface • Development of Modified Use Cases • Unified Modeling Language • Identification and Extraction of Business Rules • Identification of Business Model • Development/Proposal of Appropriate, Acceptable Solutions • Development/Proposal of Data Quality Expectations • Identification of Specific Areas of Concern • Understanding the Technical Infrastructure