The Data Quality Score

Learn about the development and definition of the Data Quality Score (DQS) for assessing overall data quality in models. Explore a worked example and potential uses of the DQS in tracking data maturity and making informed decisions.

rcauthen

Presentation Transcript


  1. The Data Quality Score Pete Bailey Dstl Policy and Capability Studies

  2. Agenda • Background • Development of the Data Quality Score (DQS) • Definition of the DQS • Worked example • Potential uses of the DQS • Questions

  3. Background • An OA model is only as good as the data supporting it • it could be argued that the data is more important than the model • No simple metric available to indicate the overall quality of a data set • Difficult to assess the increasing maturity of those data sets • Difficult to compare data sets for different scenarios in the same model • Difficult, if not impossible, to compare the quality of data sets developed in support of a range of models

  4. DIAMOND Data Paper • Data paper produced as part of the development and validation of the DIAMOND model • Recorded the definition of every data item in the model • Recorded the data generation process for every data item

  5. Data Table • Records the following information about every data item in DIAMOND • Dimension • Scope (scenario independent, scenario dependent) • Quantity (defined once, a few times, many times) • Quality (not applicable, low, medium, high) • Maximum Quality attainable • Source • 267 data items detailed

  6. Data Paper

  7. Summary of Data

  8. Data Quality Score (DQS) • Developed as a single-figure indicator of the overall data quality of the DIAMOND model • Based upon the scope, quantity and quality of individual data items, compared against an ‘ideal’ data set • A subjective weighting system is used • 1, 1 for scope (i.e. no differentiation made) • 1, 2, 3 for quantity • 0, 1, 2, 3 for quality • Using these figures gives an overall DQS of 75%

  9. Definition of the DQS • where i ranges over all data items
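The formula on this slide did not survive in the transcript. The sketch below is only one plausible reading of the surrounding text (scope, quantity and quality of each item, compared against an ‘ideal’ data set): the weighted achieved quality divided by the weighted maximum attainable quality, summed over all data items i. The ratio form, the `DataItem` structure and the function name are assumptions, not the author's stated definition.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DataItem:
    """One model data item with its weights already looked up (hypothetical structure)."""
    name: str
    scope: float        # scope weight (1 for both scopes in the initial scheme)
    quantity: float     # quantity weight (1, 2 or 3)
    quality: float      # current quality weight
    max_quality: float  # maximum attainable quality weight


def data_quality_score(items: Iterable[DataItem]) -> float:
    """Assumed form: weighted achieved quality divided by weighted ideal quality."""
    items = list(items)
    achieved = sum(i.scope * i.quantity * i.quality for i in items)
    ideal = sum(i.scope * i.quantity * i.max_quality for i in items)
    if ideal == 0:
        raise ValueError("no item has an attainable quality above zero")
    return achieved / ideal
```

Under this reading a data set in which every item has already reached its maximum quality scores 1.0 (100%), whatever the individual weights are.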

  10. Redefinition of the Quality Values • The Quality and Maximum Quality scores are now based on a 4 point scale, originally developed for Capability Audit purposes • A: Assessment achieved by means of subjective judgement only • B: Assessment achieved by means of subjective judgement informed by objective evidence • (Objective evidence can include OA, experimental results and historical evidence.) • C: Assessment achieved by means of judgmental interpretation of existing objective evidence • D: Assessment drawn directly from objective evidence

  11. Allocating Weighting Values • Initial weighting values ({1,1}, {1,2,3}, {0,1,2,3}) were developed for simplicity and to test the approach • Actual values used are less important than the consistency across the attributes (scope, quantity, quality, maximum quality) • Consistency is required to ensure that no one attribute takes precedence over any of the others

  12. Worked Example • The following example is based on a simple UK tool called the Wartime Planning Tool (WPT) • Represents a single engagement between 2 groups of units (units can join or leave the engagement at any time), which can be supported by assets such as rocket artillery, attack helicopters (AH) and fixed wing air • A proportion of each group of units can be held in reserve • Posture, terrain and barriers (e.g. rivers, minefields, fortifications, etc.) can all affect the outcome

  13. Variable: Tempo • Definition: Single variable which controls the ‘speed’ of the engagement • Scope: Independent • Quantity: 1 • it is only defined once • Quality: A / B • currently based on Military Judgement • Max Quality: C • historical analysis of previous engagements could provide a more robust justification for this value

  14. Variable: Barrier Vulnerability • Definition: Represents the reduced vulnerability of a unit defending behind a barrier • Scope: Independent • Quantity: 2 • defined for each equipment type • Quality: A • currently based on analytical judgement • Max Quality: B • could be improved by the use of historical analysis but the cost of this could be large due to the number of cases required

  15. Variable: Effectiveness • Definition: The effectiveness of a firer equipment against a target equipment • Scope: Independent • Quantity: 3 • defined for every combination of firer and target equipment type • Quality: D • data is provided from lower level models which themselves are based upon historical analysis and trials / exercises • Max Quality: D • the quality cannot be further improved

  16. Variable: CP Modifier • Definition: The perceived % strength of a unit prior to entering the engagement • Scope: Dependent • Quantity: 2 • defined for each unit in the engagement • Quality: B • currently based on the static scores of the remaining equipments • Max Quality: C • inclusion of effects of training, morale, etc.
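As an illustration only, the four WPT variables above can be combined into a partial score. The slides give neither a numeric mapping for the A to D scale nor a result for this example, so everything numeric here is an assumption: A=1, B=2, C=3, D=4, the ‘A / B’ rating for Tempo taken as the midpoint 1.5, and the score taken as achieved weighted quality over maximum weighted quality.

```python
# Assumed numeric mapping for the A-D scale; the slides do not give one.
GRADE = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
# Scope weights {1, 1}: no differentiation between the two scopes.
SCOPE = {"Independent": 1.0, "Dependent": 1.0}

# (name, scope, quantity weight, current quality, maximum quality)
WPT_ITEMS = [
    ("Tempo", "Independent", 1, (GRADE["A"] + GRADE["B"]) / 2, GRADE["C"]),
    ("Barrier Vulnerability", "Independent", 2, GRADE["A"], GRADE["B"]),
    ("Effectiveness", "Independent", 3, GRADE["D"], GRADE["D"]),
    ("CP Modifier", "Dependent", 2, GRADE["B"], GRADE["C"]),
]


def wpt_dqs(items=WPT_ITEMS):
    """Assumed ratio form: achieved weighted quality over maximum weighted quality."""
    achieved = sum(SCOPE[s] * q * cur for _, s, q, cur, _ in items)
    ideal = sum(SCOPE[s] * q * best for _, s, q, _, best in items)
    return achieved / ideal


print(f"{wpt_dqs():.0%}")  # 19.5 / 25 under the assumptions above
```

With these assumed values the four items give 19.5 out of a possible 25, i.e. 78%. This covers only the four variables shown, not the full WPT data set.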

  17. Potential Uses of DQS • To track how the model data sets are maturing • To assess the cost / benefits of targeted data collection exercises • To assess the potential implications of model improvements on the overall data quality • By using consistent scoring systems, to compare the relative qualities of the data sets for various models
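One way the cost / benefit use might be supported is to rank items by how much the score would rise if each alone were brought up to its maximum quality. This is a sketch under the same assumed ratio form of the DQS; the function name and the three items are hypothetical and are not taken from DIAMOND or WPT.

```python
def improvement_potential(items):
    """Return {name: DQS gain if that item alone reached its maximum quality}."""
    ideal = sum(scope * qty * best for _, scope, qty, _, best in items)
    return {
        name: scope * qty * (best - cur) / ideal
        for name, scope, qty, cur, best in items
    }


# Hypothetical items: (name, scope weight, quantity weight, quality, max quality)
example = [
    ("item_a", 1, 1, 1, 3),
    ("item_b", 1, 3, 2, 3),
    ("item_c", 1, 2, 3, 3),
]

gains = improvement_potential(example)
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain:.1%}")
```

Here item_b ranks first even though it is only one quality step short of its maximum, because its quantity weight multiplies the gain. The estimated cost of each collection exercise could then be set against these gains.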

  18. Increasing Accessibility of DQS • DQS is defined by the data structure supporting the model • Most current models are populated by a ‘database’ which conforms to this data structure (either formally or informally) • DQS methodology could be incorporated into these ‘databases’ as part of the audit record • Would allow the DQS to be calculated automatically and eliminate the requirement for the Quantity attribute • the actual number of instantiations could be used instead
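A minimal sketch of that last point, assuming the same ratio form of the DQS: the 1, 2, 3 Quantity weight is replaced by the actual number of instantiations held in the database. The record layout and the counts are hypothetical, and the scope weight is omitted because both scopes are weighted 1.

```python
def dqs_from_counts(records):
    """records: iterable of (instance_count, quality, max_quality) tuples."""
    records = list(records)
    achieved = sum(n * cur for n, cur, _ in records)
    ideal = sum(n * best for n, _, best in records)
    # An empty or all-zero data set has no meaningful score.
    return achieved / ideal if ideal else float("nan")


# Hypothetical audit records: one item defined once, one defined 12 times,
# and one defined for 400 combinations.
records = [(1, 1, 3), (12, 2, 3), (400, 3, 3)]
print(f"{dqs_from_counts(records):.1%}")
```

One consequence worth weighing: with raw counts, large tables dominate the score far more than they do under the 1, 2, 3 weighting, so the same data set can score quite differently under the two schemes.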

  19. Questions?
