Data Quality Issues-Chapter 10. GIGO: garbage in, garbage out. Quality issues: terminology; sources, propagation, and management. What is data quality? Overall fitness or suitability of data for a specific purpose. Errors, Accuracy, Precision, & Bias.
Data Quality Issues-Chapter 10 • GIGO: garbage in, garbage out • Quality Issues • Terminology • Sources, propagation, and management • What is Data Quality? • Overall fitness or suitability of data for a specific purpose
Errors, Accuracy, Precision, & Bias • Errors • Difference between the real world and its representation in the GIS • Could be a single error, or the whole data set could be off • Accuracy • Extent to which an estimated value approaches the true value • Can never be 100% accurate • Precision • Recorded level of detail
Errors, Accuracy, Precision, & Bias • Bias • Consistent error throughout data set • Human, equipment • Difficult to spot
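A minimal Python sketch of how error, bias, and accuracy from the two slides above can be put into numbers. The function name, the sample readings, and the choice of root-mean-square error (RMSE) as the accuracy summary are illustrative assumptions, not part of the original slides.

```python
import math
import statistics

def error_summary(measured, true_value):
    """Summarize repeated measurements of one quantity (hypothetical helper)."""
    # Error: difference between each recorded value and the real-world value.
    errors = [m - true_value for m in measured]
    # Bias: the consistent part of the error, estimated as the mean error.
    bias = statistics.mean(errors)
    # Accuracy summary (assumed metric): RMSE, which grows with bias and scatter.
    rmse = math.sqrt(statistics.mean([e * e for e in errors]))
    return {"bias": bias, "rmse": rmse}

# Two readings recorded to one decimal place, both above the true value of 100.0.
summary = error_summary([101.0, 103.0], 100.0)
print(summary["bias"])  # 2.0
```

A large bias with small scatter is the "difficult to spot" case: the readings agree with each other, so nothing looks wrong until they are compared with an independent reference.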
Resolution • Smallest feature or data that can be displayed • Raster: cell size • Vector: point size, line widths
Generalization • Process of simplifying
Completeness & Consistency • Completeness • Are all instances of a feature that the GIS/map claims to include in fact there? • Simply put, how much data is missing? • Logical Consistency • The absence of contradictory relationships in the database; examples of inconsistency: • Some crimes recorded at place of occurrence, others at the place where the report was taken • Data for one country is for 2000, for another it is for 2001 • Annual data series not taken on the same day/month, etc. (sometimes called lineage error) • Data uses a different source or estimation technique for different years (again, lineage)
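The two checks on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the idea of comparing against a reference list of feature IDs, and the "year" field are assumptions made for the example.

```python
def completeness(expected_ids, recorded_ids):
    """Return (share of expected features present, sorted list of missing IDs)."""
    expected = set(expected_ids)
    missing = expected - set(recorded_ids)
    ratio = 1.0 if not expected else 1 - len(missing) / len(expected)
    return ratio, sorted(missing)

def reference_years(records):
    """Collect the distinct reference years; more than one signals inconsistency."""
    return sorted({r["year"] for r in records})

# Hypothetical data: four features should exist, two of them were captured.
ratio, missing = completeness(["a", "b", "c", "d"], ["a", "c", "x"])
print(ratio, missing)  # 0.5 ['b', 'd']

# Hypothetical country records with mixed reference years.
print(reference_years([{"year": 2000}, {"year": 2001}, {"year": 2000}]))  # [2000, 2001]
```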
Compatibility • Compatibility • Overlaying maps of different scales • Cannot be combined • Combining nominal and ratio data • Nominal scales distinguish one item from another, but they do not rank or quantify data. • Soil Name, City Name, Polygon Identification Number • Ordinal scales identify the relative magnitudes, but they do not quantify exact differences between values. • Income = (low, medium, or high) • Slope = (A, B), where A = 0-4% and B = 5-9%
Applicability • Applicability • Suitability of data for commands, operations or analysis • Example: using points collected for your GIS to build a parcel fabric
Sources of Error in GIS • Survey Data • surveyor or instrument error • choice of spheroid and datum • Data encoding and entry • E.g. keying or digitizing errors • Remotely Sensed Data or Aerial Photography • Mistakes in classification • Change in time
Manual Digitizing Errors • Cleaning and editing always required
Errors in Data Processing and Analysis • Is this data suitable for analysis? • Is it in a suitable format? • Different datums? • Are the data sets compatible? • Incompatible units? • Widely different scales? • Will the output mean anything?
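The questions on this slide can be turned into a simple pre-analysis checklist. The sketch below assumes each data set carries a small metadata dictionary with hypothetical keys "datum", "units", and "scale" (the scale denominator, e.g. 24000 for 1:24,000); the threshold for "widely different" scales is an arbitrary illustrative value.

```python
def compatibility_issues(a, b, max_scale_ratio=4.0):
    """List reasons two data sets may not be safely combined (hypothetical check)."""
    issues = []
    if a["datum"] != b["datum"]:
        issues.append("different datums")
    if a["units"] != b["units"]:
        issues.append("incompatible units")
    # Compare scale denominators; max_scale_ratio is an assumed cutoff.
    big, small = max(a["scale"], b["scale"]), min(a["scale"], b["scale"])
    if big / small > max_scale_ratio:
        issues.append("widely different scales")
    return issues

parcels = {"datum": "NAD83", "units": "feet", "scale": 2400}
soils = {"datum": "NAD27", "units": "meters", "scale": 24000}
print(compatibility_issues(parcels, soils))
# ['different datums', 'incompatible units', 'widely different scales']
```

An empty list only means none of these three checks failed; it does not answer the last question on the slide, whether the output will mean anything.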
EVALUATING CURRENT DATA • Most of the information captured in a GIS generally exists somewhere in the office that requires the application. Some additional data may be purchased or obtained by data sharing with other agencies. • The source, accuracy, reliability, condition and scale for each document or record must be evaluated.
SOURCE • The data may be in paper or map form, or it may exist in computer files on another system. • Where did that information come from? • What is the source of the source? • Do you know how the map was compiled? • Do you know who compiled the map or record? • Have you spoken with the author to learn as much as possible about the data? • What are the strong & weak points about the data?
Data Accuracy & Reliability • There are different types of accuracy. • Absolute positional accuracy refers to the measurement of a map location as it relates to a real-world location (for example, a GPS coordinate point). • Relative positional accuracy is a measure of the relationships between the different features on the map. Relative accuracy compares the scaled distance between features measured from the map data with distances measured between the same features on the ground. • The other type of accuracy deals with the content of the information in the GIS database. Are there errors or missing data? A road may be positionally accurate but have the wrong road name associated with the feature. We think of this as Reliability. • Another very important aspect of reliability is how current the data sources are. If the map or record has not been properly maintained, some method of bringing the document up to date must be instituted.
MAINTENANCE OF DATA • Many of the answers needed to ensure proper data maintenance are fleshed out in a preliminary needs and data analysis. • Specifically, maintaining data involves knowing • Frequency of change • Quantity of change • Sources of change • It must be reiterated: If data is not going to be maintained, DO NOT PUT IT IN YOUR GIS.
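The difference between absolute and relative positional accuracy can be illustrated with a short Python sketch. The function names, the sample coordinates, and the use of RMSE as the summary statistic are assumptions made for this example; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def absolute_rmse(map_pts, ground_pts):
    """RMSE of the offset between each map point and its ground-truth position."""
    sq = [math.dist(m, g) ** 2 for m, g in zip(map_pts, ground_pts)]
    return math.sqrt(sum(sq) / len(sq))

def relative_rmse(map_pts, ground_pts):
    """RMSE of the differences between map and ground inter-feature distances."""
    diffs = [
        math.dist(map_pts[i], map_pts[j]) - math.dist(ground_pts[i], ground_pts[j])
        for i, j in combinations(range(len(map_pts)), 2)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical case: the whole map is shifted by (3, 4) relative to the ground survey.
ground = [(0, 0), (10, 0), (0, 10)]
shifted_map = [(3, 4), (13, 4), (3, 14)]
print(absolute_rmse(shifted_map, ground))  # 5.0
print(relative_rmse(shifted_map, ground))  # 0.0
```

In this made-up case every feature is 5 units away from its true location, yet the distances between features are preserved, so the map has poor absolute accuracy but perfect relative accuracy.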
Condition • The condition of the source documents, especially maps, will determine how difficult the conversion will be. • Clear mylar and ink drawings will be easier to digitize (no matter what the method) than maps of poor legibility.