Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Dr. David Arctur Research Fellow, Adjunct Faculty University of Texas at Austin Lecture 8 October 17, 2013 Geocoding
Outline INF385T(28620) – Fall 2013 – Lecture 8
• Geocoding overview
• Polygon geocoding
• Linear (street) geocoding
• Problems and solutions
• Geocoding layer sources
• Geocoding in ArcGIS
Overview
• Geocoding is the process of creating geometric representations of locations (such as points) from descriptions of locations (such as street addresses)
• It uses a computer program called a geocoding engine, which employs code tables and rules to standardize address components
Examples
• City’s economic development department
  • Maps technology businesses by street address to determine technology-rich areas in a city
• Hospital
  • Maps patients to determine where to open a satellite clinic
• Emergency dispatch
  • Maps callers’ addresses to determine who should respond to an emergency
• Retail store chain
  • Maps store and customer locations, and compares to mapped competitor locations
• Others?
Tabular data
• Text file or database
• Street addresses
• ZIP Codes
Geocoding reference layers
• Street centerlines
• ZIP Code polygons
Lecture 8 Polygon geocoding
ZIP Code geocoding
• Method to map data whose geocode is for a polygon
• Assign each record to its polygon
• Count the records for each polygon
• Join the table to the corresponding polygon layer
• Symbolize using a choropleth map or graduated point symbols
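The counting and joining steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field name `zip`, the record layout, and the helper names `count_by_zip` and `join_counts` are hypothetical and do not come from the lecture or from ArcGIS.

```python
from collections import Counter

def count_by_zip(records, zip_field="zip"):
    """Count table records per ZIP Code, skipping records with no ZIP Code."""
    return Counter(r[zip_field] for r in records if r.get(zip_field))

def join_counts(polygons, counts, zip_field="zip"):
    """Attach a 'count' attribute to each polygon record (0 when no records fall in it)."""
    return [dict(p, count=counts.get(p[zip_field], 0)) for p in polygons]

# Hypothetical attendee table and ZIP Code polygon attribute rows.
attendees = [
    {"name": "A", "zip": "15213"},
    {"name": "B", "zip": "15213"},
    {"name": "C", "zip": "15101"},
    {"name": "D", "zip": ""},       # missing ZIP Code, cannot be geocoded
]
zip_polygons = [{"zip": "15213"}, {"zip": "15101"}, {"zip": "16202"}]

counts = count_by_zip(attendees)
joined = join_counts(zip_polygons, counts)
```

The joined `count` attribute is what a choropleth map or graduated point symbols would then be drawn from.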
ZIP Code geocoding: points created at ZIP Code centroids
ZIP Code geocoding: points (attendees) spatially joined to ZIP Code polygons
ZIP Code geocoding: choropleth map created
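A spatial join of points to polygons rests on a point-in-polygon test. The sketch below uses the common ray-casting test, which is an assumption for illustration and not necessarily what ArcGIS does internally. The function names, the square polygons, and their coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def spatial_join_counts(points, polygons):
    """Count points per polygon; polygons maps a ZIP Code to a list of (x, y) vertices."""
    counts = {zip_code: 0 for zip_code in polygons}
    for p in points:
        for zip_code, vertices in polygons.items():
            if point_in_polygon(p, vertices):
                counts[zip_code] += 1
                break  # assume polygons do not overlap
    return counts

# Two hypothetical square "ZIP Code" polygons side by side.
polygons = {
    "15213": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "15101": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [(5, 5), (2, 8), (15, 5), (30, 30)]
join_result = spatial_join_counts(points, polygons)
```

Points that fall in no polygon, such as the last one here, are simply left uncounted.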
Lecture 8 Linear (street) geocoding
Linear geocoding (streets)
• TIGER (Census Bureau) street maps
• Four street address numbers, low to high for each side of a street segment
• Example segment, Oak Street: 100 to 198 on one side, 101 to 199 on the other
Address components
Example address: 125 Oak St E, Apt. 2, Pittsburgh, PA 15213
• Number: 125
• Street name: Oak
• Street type: St
• Direction, suffix: E (as in 125 Oak St E)
• Direction, prefix: E (as in 125 E Oak St)
• Unit number: Apt. 2
• Zone, city: Pittsburgh
• Zone, ZIP Code: 15213
Items for single-number street address:
Address        Unit     City         ZIP Code
125 Oak St E   Apt. 2   Pittsburgh   15213
Street Intersections
• Put intersections in the address field
  • Forbes AV & Craig ST
  • Grant ST & 5th AVE
  • North Star RD & Duncan AV
• Do not include street numbers
  • Not: 3999 Forbes Ave & 100 Craig ST
• Connectors
  • Any unusual character (e.g., &, @, |)
  • Just be consistent
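The intersection rules above can be checked with a small helper before geocoding. This is a minimal sketch: the function name `split_intersection` and the choice to reject any street that begins with an all-digit token are assumptions made for illustration.

```python
CONNECTORS = "&@|"  # connector characters listed on the slide

def split_intersection(address, connectors=CONNECTORS):
    """Return (street_a, street_b) for an intersection, or None if the text is not a valid one."""
    for symbol in connectors:
        if symbol in address:
            parts = [part.strip() for part in address.split(symbol)]
            if len(parts) != 2 or not all(parts):
                return None
            # A leading all-digit token is treated as a street number, which is not allowed.
            # Ordinal street names such as "5th" are not all digits, so they pass.
            if any(part.split()[0].isdigit() for part in parts):
                return None
            return tuple(parts)
    return None

ok = split_intersection("Grant ST & 5th AVE")
bad = split_intersection("3999 Forbes Ave & 100 Craig ST")
```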
Geocoding Flowchart
• Input address
• Parse address
• Generate Soundex key
• Find candidates: range and Soundex key
• Matches? If no, output "No match"
• If yes, score matches
• Best match >= 90? If yes, output address; if no, output "No match"
Geocoding steps
• Original address: 125 East Oak Street 15213
• Address parsed: |125|East|Oak|Street|15213
• Abbreviations standardized: |125|E|Oak|St|15213
• Elements assigned to match keys: [HN]:125 [SN]:Oak [ST]:St [SD]:E [ZP]:15213
• Index values calculated: [HN]:125 [SN]:Oak (Soundex #) [ST]:St [SD]:E [ZP]:15213 (Index #)
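The parse, standardize, and assign steps can be sketched as follows. This is a simplified illustration and not the ArcGIS geocoding engine. The abbreviation tables are tiny hypothetical stand-ins for real code tables, and the function `parse_address` handles only a prefix direction, not a suffix direction or a unit number.

```python
import re

# Hypothetical, heavily abbreviated code tables.
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W",
              "n": "N", "s": "S", "e": "E", "w": "W"}
STREET_TYPES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave",
                "av": "Ave", "road": "Rd", "rd": "Rd",
                "boulevard": "Blvd", "blvd": "Blvd"}

def parse_address(address):
    """Split an address into the match keys HN, SD, SN, ST, and ZP."""
    tokens = [t.strip(".,") for t in address.split()]
    keys = {"HN": None, "SD": None, "SN": None, "ST": None, "ZP": None}
    if tokens and tokens[0].isdigit():                     # house number
        keys["HN"] = tokens.pop(0)
    if tokens and re.fullmatch(r"\d{5}", tokens[-1]):      # five-digit ZIP Code
        keys["ZP"] = tokens.pop()
    if len(tokens) > 1 and tokens[0].lower() in DIRECTIONS:     # prefix direction
        keys["SD"] = DIRECTIONS[tokens.pop(0).lower()]
    if len(tokens) > 1 and tokens[-1].lower() in STREET_TYPES:  # street type
        keys["ST"] = STREET_TYPES[tokens.pop().lower()]
    keys["SN"] = " ".join(tokens) or None                  # whatever remains is the street name
    return keys

parsed = parse_address("125 East Oak Street 15213")
```

For the slide's example, `parsed` holds the same key assignments shown above: HN 125, SD E, SN Oak, ST St, ZP 15213.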
Soundex index
• Matches names based on how they sound (if indices match)
• Translates names to a four-character index of one letter and three digits
• First character of name remains unchanged
• Adjacent letters in the name that have the same Soundex key are assigned a single digit
• If the end of the name is reached before filling three digits, use zeros to complete the code
• Examples
  • Beadles = B-342, Beattles = B-342
  • Schultz = S-243, Shults = S-432
  • Oake = O-200, Oak = O-200
  • Smith = S-530, Smythe = S-530
  • Paine = P-500, Payne = P-500
  • Callahan = C-450, Calahan = C-450
• References
  • http://www.sconsig.com/sastips/soundex-01.htm
  • http://www.archives.gov/research/census/soundex.html
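A minimal sketch of the Soundex rules listed above, following the National Archives description (H and W do not separate letters with the same code, and a letter sharing the first letter's code is skipped). Soundex variants differ on these edge cases, so some names may be coded differently by other tools. The function name `soundex` and the hyphenated output format are choices made here for illustration.

```python
# Letter groups and their Soundex digits; vowels, H, W, and Y have no digit.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return a code such as 'S-530': first letter plus three digits, zero-padded."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W are ignored and do not break a run
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)       # new sound: keep its digit
        prev = code                   # a vowel resets prev, so a repeat after it counts again
    return first + "-" + "".join(digits)[:3].ljust(3, "0")

pairs = [("Beadles", "Beattles"), ("Oake", "Oak"), ("Smith", "Smythe"),
         ("Paine", "Payne"), ("Callahan", "Calahan")]
all_match = all(soundex(a) == soundex(b) for a, b in pairs)
```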
Scoring candidates
• Use a rule base to score source and reference matches
• Start with score of 100
• Subtract points for each mismatch
• Examples from rule base
  • Soundex indices match but street names do not (-2)
  • Street type missing in source (-1)
  • Street types do not match (-2)
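The three example rules can be expressed as a small scoring function. This is a sketch only: the record layout (`name`, `soundex`, `type`), the function names, and the assumption that every reference candidate has a street type are hypothetical. The cutoff of 90 is taken from the flowchart's "Best match >= 90?" test.

```python
def score_candidate(source, reference):
    """Start at 100 and subtract points for each mismatch named in the example rules."""
    score = 100
    # Soundex indices match but street names do not (-2)
    if (source["soundex"] == reference["soundex"]
            and source["name"].lower() != reference["name"].lower()):
        score -= 2
    if source.get("type") is None:
        score -= 1    # street type missing in source (-1)
    elif source["type"].lower() != reference["type"].lower():
        score -= 2    # street types do not match (-2)
    return score

def best_match(source, candidates, min_score=90):
    """Return (candidate, score) for the highest-scoring candidate, or None below the cutoff."""
    scored = [(score_candidate(source, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored, key=lambda pair: pair[0])
    return (candidate, score) if score >= min_score else None

# Hypothetical source address and reference street records.
source = {"name": "Oake", "soundex": "O-200", "type": "St"}
candidates = [
    {"name": "Oak", "soundex": "O-200", "type": "St"},
    {"name": "Oak", "soundex": "O-200", "type": "Ave"},
]
match = best_match(source, candidates)
```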
Candidate streets
• Candidates identified: 125 East Oak Street 15213
• Candidates scored and filtered
Address matched as point
• Best candidate matched
• Figure labels: Oak St, Pine Ave; address range endpoints 1, 2, 98, 99, 100, 101, 198, 199; matched address 125
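Once the best street segment is chosen, the point is typically placed by linear interpolation along the segment according to where the house number falls in the address range. The sketch below assumes a straight segment, even and odd numbers on opposite sides, and hypothetical names (`interpolate_address`, `left_range`, `right_range`). Which side counts as left or right is an assumption here, as are the coordinates.

```python
def interpolate_address(number, left_range, right_range, start, end):
    """Place a house number along a straight segment.

    left_range and right_range are (low, high) address numbers for each side;
    start and end are (x, y) coordinates of the segment's endpoints.
    Returns (side, (x, y)), or None if the number fits neither range.
    """
    for side, (low, high) in (("left", left_range), ("right", right_range)):
        # The number must fall in the range and share its odd/even parity.
        if low <= number <= high and number % 2 == low % 2:
            fraction = (number - low) / (high - low) if high != low else 0.5
            x = start[0] + fraction * (end[0] - start[0])
            y = start[1] + fraction * (end[1] - start[1])
            return side, (x, y)
    return None

# Oak St segment with ranges 100 to 198 and 101 to 199, drawn as a hypothetical 98-unit line.
result = interpolate_address(125, (100, 198), (101, 199), (0.0, 0.0), (98.0, 0.0))
```

Here 125 is odd, so it lands on the 101 to 199 side, roughly a quarter of the way along the segment.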
Lecture 8 Problems and solutions
Possible problems
• Variations in street names
  • Fifth Avenue, Fifth Ave., 5th AV
  • Saw Mill Run Blvd, Route 51
• Data entry errors
  • Fidth Avenue
  • Sawmill Run
• Place names
  • White House, Heinz Field, Empire State Building
• Intersections
  • Fifth Avenue and Craig Street
Possible problems (continued)
• Zones
  • 100 Main ST 15101, 100 Main ST 16202
• P.O. boxes
  • P.O. Box 125
• Missing street data
Solutions
• Clean data before geocoding
• Purchase or build high-quality maps (field verification)
• Use postal address standards
• Assign house numbers in rural areas
• Use alias tables
Alias table
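An alias table maps place names and alternate street names to addresses the reference layer can match. The following is a minimal sketch of applying one before geocoding. The table contents are illustrative examples chosen here, not the lecture's table, and the function name `apply_aliases` is hypothetical.

```python
# Illustrative alias entries (assumed for this sketch).
PLACE_ALIASES = {
    "white house": "1600 Pennsylvania Ave NW",
    "empire state building": "350 5th Ave",
}
STREET_ALIASES = {
    "route 51": "Saw Mill Run Blvd",   # alternate street name from the problems slide
}

def apply_aliases(address):
    """Replace a place name or alternate street name with its matchable form."""
    key = address.strip().lower()
    if key in PLACE_ALIASES:
        return PLACE_ALIASES[key]
    lowered = address.lower()
    for alias, official in STREET_ALIASES.items():
        idx = lowered.find(alias)
        if idx != -1:
            # Keep the house number and anything else around the aliased street name.
            return address[:idx] + official + address[idx + len(alias):]
    return address   # no alias applies

cleaned = [apply_aliases(a) for a in ["White House", "100 Route 51", "125 Oak St"]]
```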
Lecture 8 Geocoding layer sources
US Census TIGER files
• Digitized from 1:100,000 scale maps
• Pros:
  • Free and easy to download
  • Uniform across jurisdictional lines (nationally)
  • Street address formatting works well with standard GIS geocoding capacities
• Cons:
  • Incomplete data
  • Placement of address point is approximate
TIGER line attribute table
• Census street centerlines extracted from lines that make up census boundaries
• tl_2009_04013_edges.shp
• "FEATCAT" = 'S'
MAF/TIGER
• Master Address File / Topologically Integrated Geographic Encoding and Referencing
• MAF is a complete inventory of housing units and businesses in the United States and its territories
• TIGER is a collection of lines as we know it
• MAF is used to produce mail-out census forms and American Community Survey (ACS) random samples
• MAF/TIGER is used to produce maps for on-the-ground census takers
• MAF is confidential
• TIGER 2009 and newer have much improved positional accuracy
US Census ZIP Codes
• ZIP Code Tabulation Areas (ZCTAs)
• Approximations for census purposes
• Do not reflect actual ZIP Code areas and are not kept up to date
Local jurisdictions
• Parcel address points
• Pros:
  • Accurate placement of residential location (parcel positional data is often very good; e.g., +/- 5 meters or less)
• Cons:
  • May need to contact individuals within agencies to get most up-to-date data
  • May not be available, or may cost a substantial amount of money
  • Data ends at jurisdictional boundaries
  • Data files tend to be very large
Local jurisdictions
• Street centerlines
• Pros:
  • Potential to be more up to date (often yearly updates, sometimes quarterly)
  • Accuracy is often adequate to meet city infrastructure needs (typically +/- 10 meters or less)
• Cons:
  • May need to contact individuals within agencies to get most up-to-date data
  • Data ends at jurisdictional boundaries
Private vendors
• StreetMap USA
  • National dataset (US and Canada)
  • Address locators prebuilt, can geocode across the United States
• GDT Dynamap/2000 US street data
  • Small fee for individual ZIP Code layers
  • Map layers are the highest quality street map layers in terms of appearance, completeness, and accuracy
  • More than one million changes every quarter
  • Maps include more than 14 million US street segments and include postal boundaries, landmarks, water features, and other features
Online geocoding
• ArcGIS.com, Google, GeoCommons, Maptive, etc.
• Pros:
  • Fast and easy to access
  • Free or inexpensive
• Cons:
  • Loss of privacy/confidentiality
  • Accuracy
  • Usability in desktop GIS
Lecture 8 Geocoding in ArcGIS
Create address locator
• ArcCatalog
Choose address locator style
• Skeleton of the address locator
• Based on data tables and reference layer
Address locator styles
Note: there are other styles…
Other styles… (build custom locators)
• Queens, NY
• Salt Lake City, UT
• Regions of Illinois & Wisconsin
• Germany
• … and many others!
Choose reference layer
• Streets, ZIP Codes
ArcGIS locator parameters
Geocode in ArcMap
• Add tabular data and streets layer
• Add address locator
• Geocode addresses
• View geocoding results
• Interactively rematch addresses
Address rematching
• Investigate unmatched addresses
• Generally requires expertise and knowledge of local streets
• Compare a street name in the attributes of the streets table and the address table
Prepare log file
• Log file includes reasons why addresses did not get geocoded
• Useful for future work on cleaning addresses or repairing street maps
Summary
• Geocoding overview
• Polygon geocoding
• Linear (street) geocoding
• Problems and solutions
• Geocoding layer sources
• Geocoding in ArcGIS
Next week: Tutorial chapter 9, and discussion of term projects – see iSchool syllabus links: http://courses.ischool.utexas.edu/Arctur_David/2013/fall/385T/schedule.php