
Completeness



Presentation Transcript


  1. Completeness February 27, 2006 Geog 458: Map Sources and Errors

  2. Outline • Completeness • Testing completeness • Documenting completeness in the metadata • Data quality

  3. Completeness • A data set is called “complete” if what is defined/needed is encoded in the DB • Spatial completeness: the degree to which all features required by the data capture specifications are captured • Attribute completeness: the degree to which the relevant attributes of a feature are available, as required by the given capture specifications • The data quality component that describes whether the entity objects represent all entity instances of the corresponding abstract universe • That is, it describes the relationship between the objects represented in the data set and the abstract universe of all such objects

  4. Abstract universe • Can be thought of as a reference frame • Data set = a digital representation of a subset of (perceived) reality • Abstract universe = terrain nominale; an abstract view of the universe; universe of discourse; miniworld; a subset of perceived reality (defining it involves selection and abstraction) • The data set is intended to represent the abstract universe • Since completeness describes the relationship between the data set and the abstract universe, a useful characterization of completeness depends on a comprehensive definition of the abstract universe

  5. Data completeness vs. model completeness • Completeness can be classified into two categories depending on how the abstract universe is defined or specified • Data completeness: the abstract universe is defined by generic uses of the data; application-independent • Model completeness: the abstract universe is defined by specific uses of the data; application-dependent • So which would be more flexible? Which would yield multiple versions of completeness for the same data?
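A minimal Python sketch of the distinction, assuming invented lake names and areas: the data set stays fixed, while each application defines its own abstract universe with a hypothetical area threshold. The measure used here is the share of the universe that is present in the data set, so the same data can be complete for one application and incomplete for another.

```python
# Hypothetical "perceived reality": lake name -> area in square miles.
# All names and values are invented for illustration.
REFERENCE_LAKES = {
    "Lake A": 3.2,
    "Lake B": 1.4,
    "Lake C": 0.6,
    "Lake D": 0.2,
}

# The data set under evaluation; it is the same for every application.
DATA_SET = {"Lake A", "Lake B", "Lake C"}


def abstract_universe(min_area_sq_mi):
    """Select the lakes a given application cares about (application-dependent)."""
    return {name for name, area in REFERENCE_LAKES.items() if area > min_area_sq_mi}


def model_completeness(min_area_sq_mi):
    """Share of the application's abstract universe that is present in DATA_SET."""
    universe = abstract_universe(min_area_sq_mi)
    if not universe:
        raise ValueError("the abstract universe is empty; completeness is undefined")
    return len(universe & DATA_SET) / len(universe)


# Three hypothetical applications, three versions of completeness for one data set.
for threshold in (1.0, 0.5, 0.1):
    print(f"lakes > {threshold} sq mi: completeness = {model_completeness(threshold):.2f}")
```

With these made-up numbers the data set is complete for the two coarser universes but not for the 0.1 sq mi one, which is the sense in which model completeness has multiple versions.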

  6. Spatial completeness • Suppose the abstract universe “lake” is defined as a water body with an area greater than 1 square mile • Count the entities in the abstract universe; call this number A • Count the entities encoded in the DB (lake data set); call this number B • Completeness is then B/A • The definition of “lake” varies by application, and so does A
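The B/A ratio assumes every encoded lake actually belongs to the abstract universe. If the DB also holds features outside it (errors of commission, such as a pond under the 1 sq mi cutoff), B/A overstates completeness. The Python sketch below, using hypothetical lake IDs, reports the plain B/A ratio alongside a ratio that counts only encoded lakes matching the universe, plus the omitted and committed IDs.

```python
def spatial_completeness(universe_ids, encoded_ids):
    """Compare the abstract universe with what is encoded in the DB."""
    if not universe_ids:
        raise ValueError("the abstract universe is empty; completeness is undefined")
    a = len(universe_ids)                 # A: entities in the abstract universe
    b = len(encoded_ids)                  # B: entities encoded in the DB
    matched = universe_ids & encoded_ids  # encoded entities that belong to the universe
    return {
        "A": a,
        "B": b,
        "ratio_B_over_A": b / a,
        "matched_ratio": len(matched) / a,
        "omissions": sorted(universe_ids - encoded_ids),    # in universe, not in DB
        "commissions": sorted(encoded_ids - universe_ids),  # in DB, not in universe
    }


# Hypothetical IDs: the universe holds the lakes with area > 1 sq mi.
universe = {"L1", "L2", "L3", "L4", "L5"}
encoded = {"L1", "L2", "L3", "L9"}  # L9 is assumed to be a pond below the cutoff

report = spatial_completeness(universe, encoded)
print(report)
```

Here B/A gives 0.8 while the matched ratio gives 0.6, because one of the four encoded features should not be in the lake data set at all.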

  7. Attribute completeness • Subordinate to spatial completeness • Define what the relevant attributes are • E.g., a lake might have area, depth, type (freshwater), and so on • Check whether attribute values are missing for each entity • Geometric descriptions (e.g., area) might also be incomplete • Report the number of missing values out of the total number of features for each attribute
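The per-attribute report described above can be sketched in Python as follows, assuming hypothetical lake records in which a missing value is either `None` or an absent key. The attribute names (area, depth, type) follow the slide; the records are invented.

```python
# Relevant attributes, as defined for the (hypothetical) capture specification.
REQUIRED_ATTRIBUTES = ("area", "depth", "type")

# Hypothetical lake records; None marks a missing value.
lakes = [
    {"id": "L1", "area": 3.2, "depth": 12.0, "type": "freshwater"},
    {"id": "L2", "area": 1.4, "depth": None, "type": "freshwater"},
    {"id": "L3", "area": None, "depth": 4.5, "type": None},
    {"id": "L4", "area": 2.1, "type": "saline"},  # "depth" key absent entirely
]


def missing_value_report(features, attributes):
    """For each attribute, count missing values out of the total number of features."""
    total = len(features)
    report = {}
    for attr in attributes:
        # dict.get returns None for an absent key, so both cases count as missing.
        missing = sum(1 for feature in features if feature.get(attr) is None)
        report[attr] = (missing, total)
    return report


for attr, (missing, total) in missing_value_report(lakes, REQUIRED_ATTRIBUTES).items():
    print(f"{attr}: {missing} of {total} missing ({missing / total:.0%})")
```

Treating an absent key and an explicit `None` the same way is a design choice of this sketch; a real schema may distinguish "not captured" from "not applicable".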

  8. Relation to other data quality components • Completeness may affect the logical consistency of a data set • Missing arc or node → connectivity, closed polygons • Missing attribute (left and right node) → connectivity • Missing attribute in PK → key constraint • Missing attribute in FK → referential constraint • So where should this be documented: under completeness or under logical consistency? • If incompleteness causes logical inconsistency, describe it in the logical consistency section • Otherwise, include it in the completeness section
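The attribute-level cases of this documentation rule can be sketched in Python, assuming a hypothetical lake table where `id` is the primary key and `basin_id` is a required foreign key into a basin table. A missing key value breaks a constraint and is routed to logical consistency; any other missing value is routed to completeness. Topological cases (missing arcs or nodes) are not covered here.

```python
# Hypothetical schema: key attribute -> the constraint a missing value would break.
KEY_ATTRIBUTES = {"id": "key constraint", "basin_id": "referential constraint"}

# Hypothetical lake records; None marks a missing value.
records = [
    {"id": "L1", "basin_id": "B1", "depth": 12.0},
    {"id": None, "basin_id": "B2", "depth": 3.0},   # missing primary key
    {"id": "L3", "basin_id": None, "depth": 4.5},   # missing foreign key
    {"id": "L4", "basin_id": "B1", "depth": None},  # missing non-key attribute
]


def route_missing_values(rows, attributes):
    """Decide which metadata section each missing value should be documented in."""
    sections = {"logical consistency": [], "completeness": []}
    for index, row in enumerate(rows):
        for attr in attributes:
            if row.get(attr) is not None:
                continue
            if attr in KEY_ATTRIBUTES:
                # Incompleteness that causes a constraint violation.
                sections["logical consistency"].append((index, attr, KEY_ATTRIBUTES[attr]))
            else:
                # Plain incompleteness with no constraint involved.
                sections["completeness"].append((index, attr, "missing value"))
    return sections


result = route_missing_values(records, ("id", "basin_id", "depth"))
for section, issues in result.items():
    print(section, issues)
```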

  9. Data quality vs. fitness for use • Data quality • The totality of features and characteristics of a data set that bear on its ability to satisfy a stated set of requirements; application-independent • Fitness for use • The totality of features and characteristics of a data set that bear on its ability to satisfy a set of requirements given by the application; application-dependent

  10. Data quality vs. fitness for use • Data quality information is usually provided by the producer of a data set • Fitness for use is assessed by users when they evaluate a data set for their own application → this principle is referred to as truth in labelling (in effect, users are responsible for quality control) • See the lecture notes on spatial data quality for different approaches to quality control

  11. Data quality report • What you report in the data quality section of the metadata should be application-independent, so that it can be reused for any potential use of the data • Reporting data quality can be thought of as the process of evaluating how well the data set meets the stated requirements • That is: how close the values are to ground truth (attribute/positional accuracy), whether the data set is free of contradictions (logical consistency), and whether what is relevant is encoded in the DB (completeness)
