Completeness
February 27, 2006
Geog 458: Map Sources and Errors
Outline
• Completeness
• Testing completeness
• Documenting completeness in the metadata
• Data quality
Completeness
• A data set is called “complete” if what is defined or needed is encoded in the DB
• Spatial completeness: the degree to which all features are captured according to the data capture specification
• Attribute completeness: the degree to which the relevant attributes of a feature are available according to the given capture specification
• The data quality component that describes whether the entity objects represent all entity instances of the corresponding abstract universe
• That is, the relationship between the objects represented in the data set and the abstract universe of all such objects
Abstract universe
• Can be thought of as a reference frame
• Data set = digital representation of a subset of (perceived) reality
• Abstract universe = nominal ground (terrain nominal); an abstract view of the universe; universe of discourse; miniworld; a subset of perceived reality (it involves a process of selection and abstraction)
• The data set is intended to represent the abstract universe
• Since completeness describes the relationship between the data set and the abstract universe, a useful characterization of completeness relies on a comprehensive definition of the abstract universe
Data completeness vs. model completeness
• Completeness can be classified into two categories depending on how the abstract universe is defined or specified
• Data completeness: the abstract universe is defined by generic uses of the data; application-independent
• Model completeness: the abstract universe is defined by specific uses of the data; application-dependent
• So which one is more flexible? Which one can yield multiple versions of completeness for the same data?
Spatial completeness
• Say the abstract universe “lake” is defined as any water body with an area greater than 1 square mile
• Count the entities in the abstract universe; call this number A
• Count the entities encoded in the DB (lake data set); call this number B
• Completeness is then B/A
• The definition of “lake” varies by application, and so does A
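The B/A ratio can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the `WaterBody` record, the reference list standing in for the abstract universe, and the feature IDs are all hypothetical. One assumption goes beyond the slide: B counts only encoded features that match an entity of the abstract universe, so extra features (e.g., a pond below the threshold) cannot push the ratio above 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaterBody:
    """Hypothetical reference record for one water body in perceived reality."""
    feature_id: str
    area_sq_mi: float


def abstract_universe(reference, min_area_sq_mi=1.0):
    """IDs of the entities that count as 'lake' under the given definition."""
    return {wb.feature_id for wb in reference if wb.area_sq_mi > min_area_sq_mi}


def spatial_completeness(reference, encoded_ids, min_area_sq_mi=1.0):
    """Return B/A, or NaN when the abstract universe is empty."""
    universe = abstract_universe(reference, min_area_sq_mi)
    a = len(universe)                      # A: entities in the abstract universe
    b = len(universe & set(encoded_ids))   # B: those entities encoded in the DB
    return b / a if a else float("nan")


reference = [
    WaterBody("L1", 3.2),
    WaterBody("L2", 1.5),
    WaterBody("L3", 12.0),
    WaterBody("P1", 0.4),   # a pond: below the 1 sq mi threshold
]
encoded_ids = ["L1", "L3", "P1"]  # features present in the lake data set

print(spatial_completeness(reference, encoded_ids))       # A = 3, B = 2
print(spatial_completeness(reference, encoded_ids, 2.0))  # A = 2, B = 2
```

Raising the hypothetical threshold to 2 square miles shrinks A from 3 to 2, and the same data set becomes complete, which is the application dependence of A noted on the slide.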
Attribute completeness
• Subordinate to spatial completeness: attributes can only be checked for features that were captured
• Define what the relevant attributes are
  • A lake will have area, depth, type (e.g., freshwater), and so on
• Check whether attribute values are missing for each entity in the data set
  • The geometric description (e.g., area) might also be incomplete
• Report, for each attribute, the number of missing values out of the total number of features
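A per-attribute missing-value report might look like the sketch below. Features are plain dictionaries, the attribute names `area`, `depth` and `type` follow the lake example, and treating `None` (or an absent key) as "missing" is an assumption; a real data set may encode missing values differently (e.g., empty strings or sentinel codes).

```python
def attribute_completeness(features, attributes):
    """For each attribute, return (missing count, total feature count)."""
    total = len(features)
    report = {}
    for attr in attributes:
        # An absent key and an explicit None are both counted as missing.
        missing = sum(1 for f in features if f.get(attr) is None)
        report[attr] = (missing, total)
    return report


# Hypothetical lake features that passed spatial capture.
lakes = [
    {"id": "L1", "area": 3.2, "depth": 12.0, "type": "freshwater"},
    {"id": "L2", "area": 1.5, "depth": None, "type": "freshwater"},
    {"id": "L3", "area": None, "type": "saline"},
]

for attr, (missing, total) in attribute_completeness(lakes, ["area", "depth", "type"]).items():
    print(f"{attr}: {missing} of {total} values missing")
```

Only captured features enter the denominator, which is one way to see why attribute completeness is subordinate to spatial completeness.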
Relation to other data quality components
• Completeness may affect the logical consistency of a data set
  • A missing arc breaks node connectivity or leaves a polygon unclosed
  • A missing topological attribute (e.g., left/right polygon or from/to node) breaks connectivity
  • A missing primary key (PK) value violates the key constraint
  • A missing foreign key (FK) value affects the referential constraint
• So should this be documented under completeness or under logical consistency?
  • If the incompleteness causes a logical inconsistency, describe it in the logical consistency section
  • Otherwise, include it in the completeness section
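The documentation rule can be sketched as a classifier that routes each problem to a metadata section. Everything here is hypothetical: lake records keyed by `lake_id` (treated as the primary key) with a `basin_id` foreign key into a basin table. The sketch assumes that a missing PK value, a missing FK value, or an FK pointing at a basin absent from the data set causes a logical inconsistency, while any other missing value is plain incompleteness.

```python
def route_missing_values(records, pk, fk, referenced_keys):
    """Split problems into (logical consistency, completeness) lists."""
    consistency, completeness = [], []
    for i, rec in enumerate(records):
        for attr, value in rec.items():
            if value is None:
                if attr in (pk, fk):
                    # Incompleteness that breaks a key constraint.
                    consistency.append((i, attr, "missing key value"))
                else:
                    completeness.append((i, attr, "missing value"))
            elif attr == fk and value not in referenced_keys:
                # The referenced entity was never captured: referential constraint broken.
                consistency.append((i, attr, f"dangling reference {value!r}"))
    return consistency, completeness


lakes = [
    {"lake_id": "L1", "basin_id": "B1", "depth": 12.0},
    {"lake_id": None, "basin_id": "B1", "depth": 4.0},
    {"lake_id": "L3", "basin_id": "B7", "depth": None},
]
basins = {"B1", "B2"}

consistency, completeness = route_missing_values(lakes, "lake_id", "basin_id", basins)
print("Logical consistency:", consistency)
print("Completeness:", completeness)
```

Whether a null FK counts as an inconsistency depends on the data model; many relational systems allow null foreign keys unless the column is declared `NOT NULL`.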
Data quality vs. fitness for use
• Data quality
  • The totality of features and characteristics of a data set that bear on its ability to satisfy a stated set of requirements; application-independent
• Fitness for use
  • The totality of features and characteristics of a data set that bear on its ability to satisfy a set of requirements given by the application; application-dependent
Data quality vs. fitness for use
• Data quality information is usually provided by the producer of a data set
• Fitness for use is assessed by users when they evaluate a data set for a particular use; this principle is referred to as truth in labelling (in effect, users are responsible for quality control)
• See the lecture notes on spatial data quality for different approaches to quality control
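The split between producer-reported quality and user-assessed fitness for use can be illustrated with a small sketch. The measure names, values and thresholds are hypothetical: the producer's report is a fixed set of application-independent measures, and each application checks that same report against its own minimum requirements. Treating an unreported measure as unmet is also an assumption.

```python
# Application-independent quality report, as a producer might publish it (hypothetical values).
QUALITY_REPORT = {
    "spatial_completeness": 0.92,
    "depth_completeness": 0.60,
}


def fitness_for_use(report, requirements):
    """Return the requirements the report fails to meet; an empty list means fit for use."""
    failures = []
    for measure, minimum in requirements.items():
        value = report.get(measure)
        # A measure the producer did not report cannot be shown to meet the requirement.
        if value is None or value < minimum:
            failures.append(measure)
    return failures


# Two hypothetical applications judging the same report.
inventory = {"spatial_completeness": 0.90}
bathymetry = {"spatial_completeness": 0.90, "depth_completeness": 0.95}

print(fitness_for_use(QUALITY_REPORT, inventory))   # []
print(fitness_for_use(QUALITY_REPORT, bathymetry))  # ['depth_completeness']
```

The report never changes; only the requirements do, which is why the same data set can be fit for one use and unfit for another.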
Data quality report
• What you report in the data quality section of the metadata should be application-independent, so that it can be reused for any potential use of the data
• Reporting data quality can be thought of as the process of evaluating how well the data set meets the stated requirements
  • That is: how close the values are to ground truth (attribute/positional accuracy), whether the data are free of contradictions (logical consistency), and whether what is relevant is encoded in the DB (completeness)