
Data Quality Issues-Chapter 10



Presentation Transcript


  1. Data Quality Issues-Chapter 10 • GIGO: garbage in, garbage out • Quality Issues • Terminology • Sources, propagation, and management • What is Data Quality? • Overall fitness or suitability of data for a specific purpose

  2. Errors, Accuracy, Precision, & Bias • Errors • Difference between the real world and its representation in the GIS • Could be a single error, or the whole data set could be off • Accuracy • Extent to which an estimated value approaches the true value • Can never be 100% accurate • Precision • Recorded level of detail

  3. Errors, Accuracy, Precision, & Bias • Bias • Consistent error throughout the data set • Caused by humans or equipment • Difficult to spot
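
The three terms on the last two slides can be told apart with a small calculation. The sketch below is only an illustration: the elevation values are made up, and using the signed mean error as a bias indicator and the root-mean-square error as an accuracy measure is a common convention, not something the slides prescribe.

```python
import math

def mean_error(measured, truth):
    """Signed mean difference; a value far from zero suggests bias."""
    return sum(m - t for m, t in zip(measured, truth)) / len(truth)

def rmse(measured, truth):
    """Root-mean-square error; smaller means closer to the true values."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(measured, truth)) / len(truth))

def decimal_places(text):
    """Precision as the recorded level of detail: digits after the decimal point."""
    return len(text.split(".")[1]) if "." in text else 0

# Hypothetical elevations in metres (illustrative values only).
truth = [100.0, 150.0, 200.0, 250.0]
measured = [102.0, 152.0, 202.0, 252.0]  # every reading is 2 m too high

bias = mean_error(measured, truth)
accuracy = rmse(measured, truth)
print(f"mean error (bias indicator): {bias} m")
print(f"RMSE (accuracy measure): {accuracy} m")

# A value recorded with more digits is more precise, not necessarily more accurate.
print(decimal_places("12.3"), decimal_places("12.345"))
```

Because every reading is off by the same amount, the mean error equals the RMSE here, which is the signature of a purely consistent error.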

  4. Resolution • Smallest feature or data that can be displayed • Raster: cell size • Vector: point size, line widths

  5. Generalization • Process of simplifying

  6. Completeness & Consistency • Completeness • Are all instances of a feature that the GIS/map claims to include in fact there? • Simply put, how much data is missing? • Logical Consistency • Concerns the presence of contradictory relationships in the database • Some crimes recorded at place of occurrence, others at place where the report was taken • Data for one country is for 2000, for another it's for 2001 • Annual data series not taken on the same day/month etc. (sometimes called lineage error) • Data uses a different source or estimation technique for different years (again, lineage)
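
Both questions on this slide can be turned into simple checks. The following is a minimal sketch with made-up data: the hydrant IDs, the `year` field, and the function names are all hypothetical, chosen only to mirror the "how much data is missing?" and "2000 vs. 2001" examples above.

```python
def completeness(expected_ids, recorded_ids):
    """Return the share of expected features present, plus the missing IDs."""
    expected = set(expected_ids)
    found = expected & set(recorded_ids)
    missing = sorted(expected - found)
    return len(found) / len(expected), missing

def reference_years(records):
    """Collect the distinct reference years; more than one signals inconsistency."""
    return {record["year"] for record in records}

# Hypothetical inventory: the map claims to show hydrants H1 to H5.
ratio, missing = completeness(
    ["H1", "H2", "H3", "H4", "H5"],
    ["H1", "H2", "H4"],
)
print(f"completeness: {ratio:.0%}, missing: {missing}")

# Hypothetical country table mixing two reference years.
records = [
    {"country": "A", "year": 2000},
    {"country": "B", "year": 2001},
]
years = reference_years(records)
if len(years) > 1:
    print(f"inconsistent reference years: {sorted(years)}")
```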

  7. Compatibility • Compatibility • Overlaying maps of different scales • Cannot be combined • Combining nominal and ratio • Nominal scales distinguish one item from another, but they do not rank or quantify data. • Soil Name, City Name, Polygon Identification Number • Ordinal scales identify the relative magnitudes, but they do not quantify exact differences between values. • Income = (low, medium, or high) • Slope = (A, B), where A = 0-4% and B = 5-9%

  8. Applicability • Applicability • Suitability of data for commands, operations or analysis • Example: using points collected for your GIS as the basis for a parcel fabric

  9. Sources of Error in GIS • Survey Data • surveyor or instrument error • choice of spheroid and datum • Data encoding and entry • E.g. keying or digitizing errors • Remotely Sensed Data or Aerial Photography • Mistakes in classification • Change in time

  10. Manual Digitizing Errors • Cleaning and editing always required

  11. Vector to Raster or Raster to Vector

  12. Errors in Data Processing and Analysis • Is this data suitable for analysis? • Is it in a suitable format? • Different datums? • Are the data sets compatible? • Incompatible units? • Widely different scales? • Will the output mean anything?
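
The checklist above can be run against layer metadata before any overlay. This is a rough sketch, not a standard procedure: the dictionary keys (`datum`, `units`, `scale`), the sample layers, and the 10x threshold for "widely different scales" are all assumptions made for illustration.

```python
def compatibility_issues(a, b, max_scale_ratio=10):
    """List the reasons two layers may not be safe to combine.

    `scale` is the scale denominator (24000 for 1:24,000).
    `max_scale_ratio` is an arbitrary illustrative threshold.
    """
    issues = []
    if a["datum"] != b["datum"]:
        issues.append("different datums")
    if a["units"] != b["units"]:
        issues.append("incompatible units")
    ratio = max(a["scale"], b["scale"]) / min(a["scale"], b["scale"])
    if ratio > max_scale_ratio:
        issues.append("widely different scales")
    return issues

# Hypothetical layer metadata.
soils = {"datum": "NAD27", "units": "feet", "scale": 24000}
roads = {"datum": "NAD83", "units": "metres", "scale": 1000000}

issues = compatibility_issues(soils, roads)
print(issues or "no obvious incompatibilities")
```

An empty list only means none of these three checks failed; whether the output means anything still depends on the analysis itself.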

  13. Classification Errors

  14. EVALUATING CURRENT DATA • Most of the information captured in a GIS generally exists somewhere in the office that requires the application. Some additional data may be purchased or obtained by data sharing with other agencies. • The source, accuracy, reliability, condition and scale for each document or record must be evaluated.

  15. SOURCE • The data may be in paper or map form, or it may exist in computer files on another system. • Where did that information come from? • What is the source of the source? • Do you know how the map was compiled? • Do you know who compiled the map or record? • Have you spoken with the author to learn as much as possible about the data? • What are the strong & weak points about the data?

  16. Data Accuracy & Reliability • There are different types of accuracy. • Absolute positional accuracy refers to how closely a map location matches the corresponding real-world location (for example, a GPS coordinate point). • Relative positional accuracy is a measure of the relationships between the different features on the map. Relative accuracy compares the scaled distance between features measured from the map data with distances measured between the same features on the ground. • The other type of accuracy deals with the content of the information in the GIS database. Are there errors or missing data? A road may have positional accuracy but have the wrong road name associated with the feature. We think of this as Reliability. • Another very important aspect of reliability is how current the data sources are. If the map or record has not been properly maintained, some method of bringing the document up to date must be instituted.
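
The difference between absolute and relative positional accuracy shows up clearly when a whole map is shifted. The sketch below uses made-up planar coordinates in metres and hypothetical feature names; it assumes a flat coordinate system where straight-line distance is appropriate.

```python
import math

def distance(p, q):
    """Straight-line distance between two planar points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical coordinates in metres: surveyed ground truth vs. map data.
ground = {"A": (0.0, 0.0), "B": (30.0, 40.0)}
mapped = {"A": (3.0, 4.0), "B": (33.0, 44.0)}  # whole map shifted by (3, 4)

# Absolute accuracy: how far each mapped feature is from its true location.
absolute_errors = {name: distance(mapped[name], ground[name]) for name in ground}

# Relative accuracy: map distance between features vs. ground distance.
relative_error = abs(
    distance(mapped["A"], mapped["B"]) - distance(ground["A"], ground["B"])
)

print("absolute errors (m):", absolute_errors)
print("relative error (m):", relative_error)
```

Here every feature is 5 m from its true position, yet the distance between the two features is preserved, so the map has poor absolute accuracy and good relative accuracy.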

  17. Data Accuracy & Reliability

  18. MAINTENANCE OF DATA • Many of the answers needed to ensure proper data maintenance are fleshed out in a preliminary needs and data analysis. • Specifically, maintaining data involves knowing • Frequency of change • Quantity of change • Sources of change • It must be reiterated: If data is not going to be maintained, DO NOT PUT IT IN YOUR GIS.

  19. Condition • The condition of the source documents, especially maps, will determine how difficult the conversion will be. • Clear mylar and ink drawings will be easier to digitize (no matter what the method) than maps of poor legibility.
