Georeferencing Class 13 GISG 110
Objectives Georeferencing • What is it? • Geocoding and address matching • Systems of georeferencing • Uses and applications • Example • Geocoding • Requirements of geocoding • Problems with addresses • Scoring • Limitations of georeferencing
What is georeferencing (geolocating)? • Aligning geographic data to a known coordinate system so it can be viewed, queried, and analyzed with other geographic data • May involve shifting, rotating, scaling, skewing, and in some cases warping, rubber sheeting, or orthorectifying the data
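Shifting, rotating, and scaling can all be expressed as one six-parameter affine transform, the same form stored in world files. Below is a minimal sketch, assuming a scanned map whose pixel (col, row) positions are mapped to map coordinates. The class and function names and the example numbers are hypothetical. Skew would need an extra shear term, and warping, rubber sheeting, and orthorectification go beyond a single affine transform.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Affine:
    """Six-parameter affine transform: x' = a*col + b*row + c, y' = d*col + e*row + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, col: float, row: float) -> Tuple[float, float]:
        return (self.a * col + self.b * row + self.c,
                self.d * col + self.e * row + self.f)


def make_affine(scale_x: float, scale_y: float, rotation_deg: float,
                shift_x: float, shift_y: float) -> Affine:
    """Scale, then rotate counterclockwise by rotation_deg, then shift."""
    t = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return Affine(a=scale_x * cos_t, b=-scale_y * sin_t, c=shift_x,
                  d=scale_x * sin_t, e=scale_y * cos_t, f=shift_y)


# Hypothetical scanned map: 10 m pixels, north-up (rows increase downward,
# so scale_y is negative), upper-left corner placed at UTM-style coordinates.
to_map = make_affine(10.0, -10.0, 0.0, 500000.0, 3800000.0)
print(to_map.apply(10, 20))  # (500100.0, 3799800.0)
```

In practice, GIS software estimates these six parameters from control points rather than from known scale and rotation values.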
Geocoding • A GIS operation for converting street addresses into spatial data that can be displayed as features on a map • Usually by referencing address information from a street segment data layer
Address matching • A process that compares an address or table of addresses to the address attributes of a reference dataset • Determines whether a particular address falls within an address range associated with a feature in the reference dataset • If an address falls within a feature’s address range, it is considered a match and a location can be returned
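The range test described above, and the usual way a point is then placed along the matched segment (linear interpolation of the house number), can be sketched as follows. The `StreetSegment` record, the 2200-2300 range, and the coordinates are hypothetical, and this ignores details real locators handle, such as odd/even sides of the street and offsets from the centerline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreetSegment:
    """Hypothetical reference record: one street segment and its address range."""
    name: str
    from_addr: int
    to_addr: int
    start_xy: Tuple[float, float]  # coordinates at the from_addr end
    end_xy: Tuple[float, float]    # coordinates at the to_addr end


def match_address(number: int, street: str,
                  seg: StreetSegment) -> Optional[Tuple[float, float]]:
    """Return an interpolated (x, y) if the address falls in the segment's range, else None."""
    if street.strip().upper() != seg.name.upper():
        return None
    low, high = sorted((seg.from_addr, seg.to_addr))
    if not low <= number <= high:
        return None  # outside the range: no match on this segment
    span = seg.to_addr - seg.from_addr
    t = 0.0 if span == 0 else (number - seg.from_addr) / span
    x = seg.start_xy[0] + t * (seg.end_xy[0] - seg.start_xy[0])
    y = seg.start_xy[1] + t * (seg.end_xy[1] - seg.start_xy[1])
    return (x, y)


segment = StreetSegment("GERVAIS ST", 2200, 2300, (0.0, 0.0), (100.0, 0.0))
print(match_address(2250, "Gervais St", segment))  # (50.0, 0.0)
print(match_address(2400, "Gervais St", segment))  # None
```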
How they compare… • Georeferencing (geolocating) • General or umbrella term for aligning geographic data to a known coordinate system • E.g., The peaks of mountains were georeferenced (as points) for mapping (no address needed, just an x,y coordinate on the map; points can be added manually) • Geocoding (address matching) • A more specific operation that relies on address matching • Converts street addresses into spatial data • Compares a table of address(es) to the address attributes of a reference dataset • E.g., A list of 500 business addresses was geocoded, using a reference dataset, to build a point feature class (shapefile) of business locations; 459 were successfully matched (a 91.8% match rate).
Systems of georeferencing • Placenames • Postal addresses and codes • Linear referencing systems • Cadasters and Public Land Survey System (Regional) • Latitude and longitude (global, geographic)* • UTM (global, projected)* • State Plane Coordinates (regional, projected)* * Coordinate system
Coordinate systems defined • A reference framework consisting of a set of points, lines, and/or surfaces, and a set of rules, used to define the positions of points in space in either two or three dimensions • Geographic coordinate system • Projected coordinate system
Placenames • A system for authorizing and standardizing geographic names • Multiple names may exist for the same feature • Mt Everest = Chomolungma • Florence = Firenze • Coarse spatial resolution
Postal addresses and codes • Work well for dwellings and offices, not natural features • Most people know the ZIP code for their home, which makes it a useful mapping tool • Not useful where dwellings are not numbered consecutively along streets, or in large building complexes • ZIP codes can be changed whenever postal authorities want
Linear referencing systems • Identifies a location by measuring distance from a defined point along a defined path • E.g., an accident on Birch St, 45 feet from the Main St and Birch St intersection • Used in • Managing transportation infrastructure and responding to emergencies • Inventories of signs and bridges • Problems • Some urban streets intersect more than once • Difficult to use in rural areas (fewer intersections)
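The Birch St example can be made concrete by walking a given distance along a path made of straight segments. This is a minimal sketch, assuming the street centerline is stored as a list of (x, y) vertices in feet that starts at the Main St intersection; the vertex coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def locate_along(path: List[Point], distance: float) -> Point:
    """Return the point `distance` units from path[0], measured along the path."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    remaining = distance
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if remaining <= seg_len:
            # The target lies on this segment: interpolate within it.
            t = remaining / seg_len if seg_len else 0.0
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        remaining -= seg_len  # skip past this whole segment
    raise ValueError("distance exceeds path length")


# Hypothetical Birch St centerline (feet), starting at the Main St intersection
birch_st = [(0.0, 0.0), (30.0, 0.0), (30.0, 40.0)]
print(locate_along(birch_st, 45.0))  # (30.0, 15.0)
```

Because the measure depends on where the path starts, a street that meets Main St twice (one of the problems listed above) would need an explicit choice of starting intersection.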
Cadasters and Public Land Survey System Cadaster • Map of land ownership, maintained for the purposes of taxing land or creating a public record of ownership • Subdivision is the process of creating new parcels by legally subdividing existing ones • Parcels are uniquely identified and persistent through time, which makes them good for georeferencing • Few people are familiar with parcel identification codes; their use is largely limited to local officials
Cadasters and Public Land Survey System US Public Land Survey System (PLSS) • Evolved out of the need to survey and distribute land in the Western US in the early 1800s • Based on principal meridians and baselines • Defines land ownership (based on blocks) • Area laid out in townships (6 miles square) • Townships divided into sections (1 mile square, numbered 1-36) • Homestead (nominal family farm) • Problems • Squares don’t fit the curvature of the Earth • Complicated in rugged landscapes • Still used in natural resource management (mining, agriculture)
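Section numbers within a township follow a back-and-forth pattern: section 1 is in the northeast corner, numbering runs west along the top tier to 6, then east along the next tier from 7 to 12, and so on down to 36 in the southeast corner. The sketch below, with hypothetical function names, lays out that nominal 6 x 6 grid. It ignores the corrections real townships need because squares don't fit the curvature of the Earth.

```python
from typing import List, Tuple


def section_position(section: int) -> Tuple[int, int]:
    """Return (row, col) of a PLSS section in a nominal 6 x 6 township.

    Row 0 is the northern tier and col 0 the western column. Odd-numbered
    tiers from the top (rows 0, 2, 4) run east to west; the others run west to east.
    """
    if not 1 <= section <= 36:
        raise ValueError("sections are numbered 1-36")
    row, idx = divmod(section - 1, 6)
    col = 5 - idx if row % 2 == 0 else idx
    return row, col


def township_grid() -> List[List[int]]:
    """Lay out all 36 section numbers as they appear on a north-up map."""
    grid = [[0] * 6 for _ in range(6)]
    for n in range(1, 37):
        r, c = section_position(n)
        grid[r][c] = n
    return grid


for tier in township_grid():
    print(" ".join(f"{n:2d}" for n in tier))
# The top tier reads 6 5 4 3 2 1 (west to east); 36 lands in the SE corner.
```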
Latitude and Longitude Latitude • The angular distance along a meridian north or south of the equator, usually measured in degrees • Lines of latitude are also called parallels (except the equator) Longitude • The angular distance of a point on the earth’s surface east or west of a prime meridian (Greenwich) • Lines of longitude (meridians) intersect the equator and pass through the North and South Poles • Expressed in degrees, minutes, and seconds
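Most GIS software stores latitude and longitude as signed decimal degrees rather than degrees, minutes, and seconds. Each minute is 1/60 of a degree and each second 1/3600, with south latitudes and west longitudes negative. A minimal conversion sketch (the function name and sample values are hypothetical):

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees-minutes-seconds to signed decimal degrees.

    South latitudes and west longitudes come out negative.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError("hemisphere must be N, S, E, or W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in {"S", "W"} else value


print(dms_to_decimal(34, 30, 0, "N"))  # 34.5
print(dms_to_decimal(81, 15, 0, "W"))  # -81.25
```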
Latitude and Longitude Most commonly used reference system for locating positions on the earth
UTM Universal Transverse Mercator (UTM) • Commonly used projected coordinate system that divides the globe into sixty zones, each 6 degrees of longitude wide, starting at -180 degrees longitude • Coordinates in meters • Each zone has its own central meridian
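Because the 60 zones are each 6 degrees wide and numbered eastward from -180 degrees, the zone number and its central meridian follow from simple arithmetic. This is a sketch under the standard zone layout; it deliberately ignores the special-case zones around Norway and Svalbard, and the function names are hypothetical.

```python
import math


def utm_zone(longitude: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    if longitude == 180.0:
        return 60  # 180 degrees is the eastern edge of zone 60
    return int(math.floor((longitude + 180.0) / 6.0)) + 1


def central_meridian(zone: int) -> float:
    """Central meridian of a UTM zone: the midpoint of its 6-degree band."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must be 1-60")
    return -183.0 + 6.0 * zone


zone = utm_zone(-81.0)
print(zone, central_meridian(zone))  # 17 -81.0
```

A point near a zone edge sits far from its central meridian, which is one reason working across zone boundaries causes the problems listed under the UTM disadvantages.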
UTM • Provides georeferencing at high levels of precision for the entire globe • Adopted by many national and international mapping agencies, including NATO • Commonly used in topographic and thematic mapping, for referencing satellite imagery and as a basis for widely distributed spatial databases
UTM Advantages • UTM is frequently used • Consistent for the globe • Is a universal approach to accurate georeferencing Disadvantages • Adjacent zones can be skewed with respect to each other • Problems arise in working across zone boundaries • No simple mathematical relationship exists between coordinates of one zone and an adjacent zone
State Plane Coordinates • Individual coordinate systems adopted by U.S. state agencies • Each state's shape determines which projection is chosen to represent that state • Projections are chosen to minimize distortion over the state • A state may have 2 or more overlapping zones, each with its own projection system and grid • Units are generally in feet
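Since State Plane units are generally feet, converting to meters requires knowing which foot a zone uses. Two definitions are in use: the U.S. survey foot (exactly 1200/3937 m) and the international foot (exactly 0.3048 m). Which one applies to a given zone is an assumption to check against that zone's definition. The sketch below (hypothetical names, hypothetical easting) shows that the roughly 2 parts-per-million difference becomes about 1.2 m at a 2,000,000 ft easting.

```python
US_SURVEY_FOOT_M = 1200 / 3937   # U.S. survey foot, in meters
INTERNATIONAL_FOOT_M = 0.3048    # international foot, in meters


def feet_to_meters(feet: float, survey_foot: bool = True) -> float:
    """Convert a State Plane coordinate in feet to meters (survey foot by default)."""
    return feet * (US_SURVEY_FOOT_M if survey_foot else INTERNATIONAL_FOOT_M)


easting_ft = 2_000_000.0  # hypothetical easting
diff = feet_to_meters(easting_ft) - feet_to_meters(easting_ft, survey_foot=False)
print(f"{diff:.2f} m")  # 1.22 m
```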
State Plane Coordinates • Zones are divided north-south and east-west • Smaller states may have 1 zone; larger states may have 6 or more
State Plane Coordinates • A group of planar coordinate systems that divides the US into more than 130 zones • Each zone has its own map projection and parameters • Each zone uses either the NAD27 or the NAD83 horizontal datum
State Plane Coordinates Advantages • May give a better representation than the UTM system for a (US) state's area • Coordinates may be simpler than those of UTM Disadvantages • Not universal from state to state • Problems may arise at the boundaries of projections
Systems of georeferencing review • Placenames • Postal addresses and codes • Linear referencing systems • Cadasters and Public Land Survey System (Regional) • Latitude and longitude (global, geographic)* • UTM (global, projected)* • State Plane Coordinates (regional, projected)* * Coordinate system
Uses of georeferenced data • Important tool for emergency response, package delivery and marketing applications • Booming business in software, applications, and creation and maintenance of base maps • Mapquest.com • Mapblast.com
Georeferencing applications Geocoding converts addresses into a GIS database for use in the following applications • Emergency response (911) • Real estate • Crime analysis • Package delivery • Market analysis • Distribution of clients, customers, membership, etc. • Trade area assessment • Mass mailing • Simple navigation
Example of georeferencing Locate a mailing address • Uses the information in an address to assign it to various geographic features • The following mailing address table is listed from least specific to most specific
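The least-to-most-specific ordering used in this example can be sketched by splitting an address into its parts. The regular expression below is a hypothetical pattern for one simple U.S. layout ("number street, ST ZIP"), and the country is assumed rather than parsed. County (Richland in this example) is not written in the address and would need a separate ZIP-to-county lookup.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for one simple layout: "<number> <street>, <ST> <ZIP>"
ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+(?P<street>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip5>\d{5})$"
)


def address_hierarchy(address: str, country: str = "US") -> List[Tuple[str, str]]:
    """Split an address into components ordered from least to most specific."""
    m = ADDRESS_RE.match(address.strip())
    if m is None:
        raise ValueError(f"unrecognized address format: {address!r}")
    return [
        ("country", country),          # assumed, not parsed
        ("state", m["state"]),
        ("zip5", m["zip5"]),
        ("street", m["street"].strip()),
        ("number", m["number"]),
    ]


for level, value in address_hierarchy("2200 Gervais St, SC 29204"):
    print(level, value)
```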
Where in the world is SC? Knowing the address is in the US provides the initial sort for international mail
Where in the US is South Carolina? • South Carolina is one of 50 states • All SC addresses can be directly joined to a GIS polygon representing the state
Where in Richland County is 29204? • ZIP code 29204 is one of 15 five-digit ZIP codes in Richland County, SC • The five-digit ZIP code polygon boundaries can be extracted for any area from Census TIGER files
Where is 2200 Gervais St? • With the street name and number plus the five-digit ZIP code, an x,y coordinate is assigned • The level of precision depends on the type of basemap data available • Parcel boundaries • Street segment • 9-digit ZIP code (ZIP+4)
What are the coordinates of 2200 Gervais St? • Geocoding creates a point theme • The UTM system is used in this example • Note the differences in addresses based on different approaches • Most GIS software does not include x,y coordinates as attributes
Palmetto Seafood Company 2200 Gervais St
Geocoding process Goal: To build a GIS database from a set of addresses • Step 1: Build or obtain reference data (streets) • Step 2: Clean/format address data based on software requirements • Step 3: In software, set up rules for matching (address locator) • Step 4: Determine match rate and rejects • Step 5: Perform batch geocoding • Step 6: View match rate; if good (>75%): • Analyze or map data • If not good (<75%), go back to Step 2 with the rejected addresses
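The decision at the end of the geocoding process reduces to computing a match rate and comparing it with the 75% rule of thumb. A minimal sketch with hypothetical function names, applied to the business-address example (459 of 500 matched). The slide does not say what to do at exactly 75%; treating it as good here is an arbitrary choice.

```python
def match_rate(matched: int, total: int) -> float:
    """Percentage of input addresses that were matched."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matched <= total:
        raise ValueError("matched must be between 0 and total")
    return 100.0 * matched / total


def next_step(rate: float, threshold: float = 75.0) -> str:
    """Apply the 75% rule of thumb (exactly 75% counts as good here, by choice)."""
    if rate >= threshold:
        return "analyze or map data"
    return "return to Step 2 with the rejected addresses"


rate = match_rate(459, 500)
print(f"{rate:.1f}% -> {next_step(rate)}")  # 91.8% -> analyze or map data
```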
Geocoding requirements Three files or datasets • Address data (project data to match) • Reference data (data to match to) • Address locator (set of rules for matching)
Address locator • The address locator is the main tool for geocoding in ArcGIS • It is a dataset that contains information including address attributes, indexes, and queries for geocoding • Contains a snapshot of the reference data that is used for geocoding • In the process of geocoding, the reference data is no longer needed after the locator is created
Address locator styles • There are a number of different address locator styles • Choosing the right one depends on knowing what type of reference data you have and in what format your addresses will be • For example, if you want to geocode U.S. ZIP Code data, you may use the 5-Digit ZIP address locator style for creating a ZIP Code address locator based on a ZIP Code point reference feature class
Address locators • Also contain information on how an address is standardized (Ave, AV, or Avenue), the search methods used to find possible matches, and what output information is returned for a match • Can be redistributed by copying the address locators to a different workspace or network location
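Standardization means that variants such as Ave, AV, and Avenue are reduced to one form before matching. Real address locators ship their own, much larger rule tables; the sketch below uses hypothetical, deliberately tiny tables and only normalizes a direction right after the house number and a road type in the last position. It would, for example, mishandle a street actually named "North St".

```python
import re

# Hypothetical, deliberately tiny lookup tables.
SUFFIXES = {"AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
            "STREET": "ST", "STR": "ST", "ST": "ST"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize(address: str) -> str:
    """Uppercase, drop punctuation, and normalize a leading direction and trailing suffix."""
    tokens = re.sub(r"[^\w\s]", " ", address).upper().split()
    if len(tokens) >= 3 and tokens[0].isdigit():
        # The token after the house number may be a directional prefix.
        tokens[1] = DIRECTIONS.get(tokens[1], tokens[1])
    if tokens:
        # The last token is treated as the road type suffix.
        tokens[-1] = SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens)


print(standardize("2200 Gervais Street"))    # 2200 GERVAIS ST
print(standardize("100 North Main Avenue"))  # 100 N MAIN AVE
print(standardize("100 N. Main Av."))        # 100 N MAIN AVE
```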
Sources of Reference Data • ESRI and other private vendors • Web-based applications (mapblast) • Digital Yellow Pages • Demographic and Marketing Firms
Problems with address data records • Every set of addresses has some problems that make it difficult to obtain a 100% match rate • These problems can be grouped into the following categories: • Lack of street names - PO Boxes, Rural Routes • Human errors in address records - typos, spelling errors • Inconsistency of address records - Multiple spellings (Green & Greene)
Address errors • The block that contains the Palmetto Seafood Company has: • Multiple lots with the same address • Front and rear addresses • Lots with no address • Buildings with numbers out of sequence • What are the actual addresses of the 2200 block of Gervais St?
Scoring systems • Addresses are matched to specific records in the base map file on the basis of a scoring system • A perfect match yields a score of 100 • A match score between 75 and 100 can generally be considered a good match • The batch match process will not match the address if it yields a match score below the minimum match score
Scoring system adjustments Minimum match scores • The match process will not match the address if it yields a score below the minimum match score • A perfect match yields a score of 100 • A match score between 75 and 100 can generally be considered a good match • The default is 60 Spelling sensitivity • Determines how exact the spelling must be for a record in the base map file to be a candidate for matching (user defined) • This also applies to road type suffixes and directional prefixes
Minimum match score If the minimum match score were set at 80, only the first two records would have matched
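To see how a minimum match score acts as a cutoff, here is a sketch that uses Python's `difflib.SequenceMatcher` similarity ratio (scaled to 0-100) as a stand-in score. This is a swapped-in, hypothetical scoring rule, not the algorithm ArcGIS uses, and the reference records are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def match_score(address: str, candidate: str) -> float:
    """Hypothetical 0-100 score: string similarity of the two addresses."""
    return 100.0 * SequenceMatcher(None, address.upper(), candidate.upper()).ratio()


def best_match(address: str, reference: Iterable[str],
               min_score: float = 60.0) -> Optional[Tuple[str, float]]:
    """Return (record, score) for the top-scoring record, or None if below min_score."""
    scored = [(match_score(address, ref), ref) for ref in reference]
    if not scored:
        return None
    score, ref = max(scored)
    return (ref, score) if score >= min_score else None


reference = ["2200 GERVAIS ST", "2200 GREENE ST"]
print(best_match("2200 Gervais St", reference))  # ('2200 GERVAIS ST', 100.0)
print(best_match("2200 Gervais Street", reference, min_score=80))
```

Raising `min_score` turns more near-misses into unmatched records; lowering it accepts more matches at the risk of false positives.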
Spelling sensitivity If the spelling sensitivity is reduced to 50, then three candidate street records are found for 2200 Gerv St.
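Spelling sensitivity works as a threshold on how closely a street name must match before a reference record is even considered a candidate, so lowering it returns more candidates. The sketch again uses `difflib.SequenceMatcher` as a hypothetical stand-in for the real comparison, with hypothetical reference street names; it shows only the principle, not the exact candidate counts any GIS package would report.

```python
from difflib import SequenceMatcher
from typing import List


def street_candidates(street: str, reference_streets: List[str],
                      spelling_sensitivity: float) -> List[str]:
    """Return reference street names whose 0-100 similarity meets the sensitivity."""
    target = street.upper()
    return [s for s in reference_streets
            if 100.0 * SequenceMatcher(None, target, s.upper()).ratio() >= spelling_sensitivity]


streets = ["GERVAIS ST", "GREENE ST", "GADSDEN ST", "MAIN ST"]  # hypothetical names
strict = street_candidates("Gerv St", streets, 80)
loose = street_candidates("Gerv St", streets, 50)
print("sensitivity 80:", strict)
print("sensitivity 50:", loose)
```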
Limitations of georeferencing • Poor match rates result in incomplete databases • New subdivisions are not included in geocoding databases • The resolution of features in a layer depends on the level of georeferencing accuracy • Positional accuracy of data can easily range from a few hundred feet to several miles
Limitations of georeferencing • Mailing addresses are often not at the location of the feature (e.g., PO boxes at the Post Office) • Rural addresses (route and box numbers) are not handled by geocoding software • Ideally, there would be a perfect look-up table with a one-to-one match between addresses and locations
Review • Name three systems of georeferencing. • Geocoding aligns geographic data to a known coordinate system. It does not need a reference data set for comparison. (T/F) • Rural addresses (route and box numbers) are easily handled with geocoding software. (T/F) • In scoring address matched data, a match score greater than 75 can generally be considered a good match. (T/F) • Three files or datasets required before geocoding include: address data, reference data, and an address location. (T/F) • List three applications where georeferenced addresses are used. • In geocoding, the reference data is the main tool for geocoding in ArcGIS. (T/F) • State Plane Coordinate units are generally in feet. (T/F) • UTM coordinates are expressed in kilometers. (T/F) • A limitation of georeferencing is that new subdivisions are not included in geocoding databases. (T/F) • The equator is considered a parallel of latitude. (T/F)