The Data Quality Score
Pete Bailey
Dstl Policy and Capability Studies
Agenda
• Background
• Development of the Data Quality Score (DQS)
• Definition of the DQS
• Worked example
• Potential uses of the DQS
• Questions
Background
• An OA model is only as good as the data supporting it
  • it could be argued that the data is more important than the model
• No simple metric available to indicate the overall quality of a data set
• Difficult to assess the increasing maturity of those data sets
• Difficult to compare data sets for different scenarios in the same model
• Difficult, if not impossible, to compare the quality of data sets developed in support of a range of models
DIAMOND Data Paper
• Data paper produced as part of the development and validation of the DIAMOND model
• Recorded the definition of every data item in the model
• Recorded the data generation process for every data item
Data Table
• Records the following information about every data item in DIAMOND:
  • Dimension
  • Scope (scenario independent, scenario dependent)
  • Quantity (defined once, a few times, many times)
  • Quality (not applicable, low, medium, high)
  • Maximum Quality attainable
  • Source
• 267 data items detailed
Data Quality Score (DQS)
• Developed as a single-figure indicator of the overall data quality of the DIAMOND model
• Based upon the scope, quantity and quality of individual data items, compared against an 'ideal' data set
• A subjective weighting system is used:
  • 1, 1 for scope (i.e. no differentiation made)
  • 1, 2, 3 for quantity
  • 0, 1, 2, 3 for quality
• Using these figures gives an overall DQS of 75% for the DIAMOND data set
Definition of the DQS

$$\mathrm{DQS} = \frac{\sum_i w^{\text{scope}}_i \, w^{\text{quantity}}_i \, w^{\text{quality}}_i}{\sum_i w^{\text{scope}}_i \, w^{\text{quantity}}_i \, w^{\text{max quality}}_i} \times 100\%$$

• where i ranges over all data items and each w is the weighting value allocated to the corresponding attribute of item i
• the denominator is the score of the 'ideal' data set, in which every item attains its maximum quality
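The calculation is straightforward to mechanise. Below is a minimal sketch using the initial weighting values from the previous slide; the function name, the dictionary-based item structure and the attribute labels are illustrative assumptions, not taken from the DIAMOND data paper.

```python
# Weighting values from the slides: {1, 1} for scope (no differentiation),
# {1, 2, 3} for quantity, {0, 1, 2, 3} for quality.
SCOPE_WEIGHT = {"independent": 1, "dependent": 1}
QUANTITY_WEIGHT = {"once": 1, "a few times": 2, "many times": 3}
QUALITY_WEIGHT = {"not applicable": 0, "low": 1, "medium": 2, "high": 3}

def dqs(data_items):
    """Return the DQS (%) for a list of data items.

    Each item is a dict with 'scope', 'quantity', 'quality' and
    'max_quality' keys; the score compares the achieved data set
    against the 'ideal' set in which every item reaches its
    maximum attainable quality.
    """
    achieved = sum(
        SCOPE_WEIGHT[item["scope"]]
        * QUANTITY_WEIGHT[item["quantity"]]
        * QUALITY_WEIGHT[item["quality"]]
        for item in data_items
    )
    ideal = sum(
        SCOPE_WEIGHT[item["scope"]]
        * QUANTITY_WEIGHT[item["quantity"]]
        * QUALITY_WEIGHT[item["max_quality"]]
        for item in data_items
    )
    return 100.0 * achieved / ideal
```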
Redefinition of the Quality Values
• The Quality and Maximum Quality scores are now based on a 4-point scale, originally developed for Capability Audit purposes:
  • A: Assessment achieved by means of subjective judgement only
  • B: Assessment achieved by means of subjective judgement informed by objective evidence
  • C: Assessment achieved by means of judgemental interpretation of existing objective evidence
  • D: Assessment drawn directly from objective evidence
• Objective evidence can include OA, experimental results and historical evidence
Allocating Weighting Values
• Initial weighting values ({1, 1}, {1, 2, 3}, {0, 1, 2, 3}) were developed for simplicity and to test the approach
• The actual values used are less important than consistency across the attributes (scope, quantity, quality, maximum quality)
• Consistency is required to ensure that no one attribute takes precedence over any of the others, as the sketch below illustrates
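A purely illustrative example of why consistency matters: the two data items and both weight scales below are invented for the comparison. When the quantity weights are inflated out of proportion to the quality weights, the heavily instantiated item dominates the score.

```python
# Two invented data items (scope weights omitted: both values are 1).
items = [
    # (quantity level, quality weight, max quality weight)
    (3, 1, 3),  # defined many times, low quality, high quality attainable
    (1, 3, 3),  # defined once, already at high quality
]

def dqs(items, quantity_scale):
    achieved = sum(quantity_scale[level] * qual for level, qual, _ in items)
    ideal = sum(quantity_scale[level] * maxq for level, _, maxq in items)
    return 100.0 * achieved / ideal

balanced = {1: 1, 2: 2, 3: 3}     # in step with the 0-3 quality weights
inflated = {1: 1, 2: 10, 3: 100}  # quantity now swamps quality

print(dqs(items, balanced))  # 50.0
print(dqs(items, inflated))  # ~34.0: driven almost entirely by item 1
```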
Worked Example
• The following example is based on a simple UK tool called the Wartime Planning Tool (WPT)
• WPT represents a single engagement between two groups of units (units can join or leave the engagement at any time), which can be supported by assets such as rocket artillery, attack helicopters (AH) and fixed-wing air
• A proportion of each group of units can be held in reserve
• Posture, terrain and barriers (e.g. rivers, minefields, fortifications) can all affect the outcome
Variable: Tempo
• Definition: single variable which controls the 'speed' of the engagement
• Scope: independent
• Quantity: 1
  • it is only defined once
• Quality: A / B
  • currently based on Military Judgement
• Max Quality: C
  • historical analysis of previous engagements could provide a more robust justification for this value
Variable: Barrier Vulnerability
• Definition: represents the reduced vulnerability of a unit defending behind a barrier
• Scope: independent
• Quantity: 2
  • defined for each equipment type
• Quality: A
  • currently based on analytical judgement
• Max Quality: B
  • could be improved by the use of historical analysis, but the cost of this could be large due to the number of cases required
Variable: Effectiveness
• Definition: the effectiveness of a firer equipment against a target equipment
• Scope: independent
• Quantity: 3
  • defined for every combination of firer and target equipment type
• Quality: D
  • data is provided from lower-level models which are themselves based upon historical analysis and trials / exercises
• Max Quality: D
  • the quality cannot be further improved
Variable: CP Modifier
• Definition: the perceived % strength of a unit prior to entering the engagement
• Scope: dependent
• Quantity: 2
  • defined for each unit in the engagement
• Quality: B
  • currently based on the static scores of the remaining equipments
• Max Quality: C
  • could be improved by including the effects of training, morale, etc.
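Putting the four WPT variables together gives a miniature worked calculation. This is a sketch only: the mapping of the A-D scale onto numeric weights (A = 1 ... D = 4) is a hypothetical choice for illustration, scope weights are omitted because both values are 1, and Tempo's 'A / B' quality is scored conservatively as A.

```python
LETTER_WEIGHT = {"A": 1, "B": 2, "C": 3, "D": 4}  # assumed mapping

wpt_items = [
    # (name, quantity weight, quality, max quality)
    ("Tempo",                 1, "A", "C"),
    ("Barrier Vulnerability", 2, "A", "B"),
    ("Effectiveness",         3, "D", "D"),
    ("CP Modifier",           2, "B", "C"),
]

achieved = sum(q * LETTER_WEIGHT[qual] for _, q, qual, _ in wpt_items)
ideal = sum(q * LETTER_WEIGHT[maxq] for _, q, _, maxq in wpt_items)

# achieved = 1 + 2 + 12 + 4 = 19; ideal = 3 + 4 + 12 + 6 = 25
print(f"WPT DQS = {100 * achieved / ideal:.0f}%")  # 76%
```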
Potential Uses of DQS
• To track how the model data sets are maturing
• To assess the costs / benefits of targeted data collection exercises
• To assess the potential implications of model improvements on the overall data quality
• By using consistent scoring systems, to compare the relative qualities of the data sets for various models
Increasing Accessibility of DQS
• The DQS is defined by the data structure supporting the model
• Most current models are populated from a 'database' which conforms to this data structure (either formally or informally)
• The DQS methodology could be incorporated into these 'databases' as part of the audit record
• This would allow the DQS to be calculated automatically and would eliminate the requirement for the Quantity attribute
  • the actual number of instantiations could be used instead, as sketched below
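A minimal sketch of that automation, assuming a SQLite audit table: the table name `data_audit`, its columns and the A = 1 ... D = 4 weight mapping are all illustrative assumptions rather than part of any real model database.

```python
import sqlite3

LETTER_WEIGHT = {"A": 1, "B": 2, "C": 3, "D": 4}  # assumed mapping

def dqs_from_audit(conn: sqlite3.Connection) -> float:
    """Return the DQS (%) from an audit table, weighting each data item
    by its actual number of instantiations rather than a coarse
    once / a few times / many times Quantity attribute."""
    achieved = 0
    ideal = 0
    for instances, quality, max_quality in conn.execute(
        "SELECT instances, quality, max_quality FROM data_audit"
    ):
        achieved += instances * LETTER_WEIGHT[quality]
        ideal += instances * LETTER_WEIGHT[max_quality]
    return 100.0 * achieved / ideal
```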