Geocoding Public Health Data. Lecture 5 Locating Street Addresses and Global Positioning System GIS and RS in Public Health Edmund Seto, Ph.D. School of Public Health University of California, Berkeley. Spatial Data.
Spatial Data
In previous lectures we talked about the wide availability of spatial data. Public health data are often inherently spatial:
• Vital statistics records include residential street addresses
• A cohort study of exposure to air pollution might consider residence and work addresses
The problem is how to get these locations onto a map (i.e., in a format that is readily usable within a GIS). The process of placing such data onto a map or into a GIS is known as geocoding.
Types of Geocoding • Relational Joins for Spatially Aggregated Data • Address Matching • Global Positioning System • Other Alternatives
Aggregated Data For example: A table of data that is grouped at the county level… How do we match this up with a GIS map of counties?
Relational Join A GIS is based on the concept of relational databases, which allow us to match geographic features with the corresponding attribute data. In exercises 1 and 2, we saw that a table of attributes can be “joined” with a table of geographic features based on a common identifier in the GIS. Common identifiers might be: country name, county name, postal code, etc.
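The join described above can be sketched outside a GIS in a few lines of Python. This is only an illustration of the idea: the table contents, the field names (`county`, `deaths_per_100k`, `feature_id`), and the helper `join_on_key` are all hypothetical, and the rates are made-up numbers.

```python
# Hypothetical attribute table: one row per county (values are made up).
attribute_rows = [
    {"county": "Alameda", "deaths_per_100k": 250.0},
    {"county": "Contra Costa", "deaths_per_100k": 240.0},
    {"county": "Marin", "deaths_per_100k": 200.0},
]

# Hypothetical feature table: one row per county polygon in the map layer.
feature_rows = [
    {"feature_id": 1, "county": "Alameda"},
    {"feature_id": 2, "county": "Contra Costa"},
    {"feature_id": 3, "county": "San Francisco"},
]


def join_on_key(features, attributes, key):
    """Left-join attribute rows onto feature rows using a shared identifier."""
    # Normalise the identifier so "alameda " and "Alameda" still match.
    lookup = {row[key].strip().lower(): row for row in attributes}
    joined = []
    for feature in features:
        match = lookup.get(feature[key].strip().lower())
        merged = dict(feature)
        if match is not None:
            merged.update({k: v for k, v in match.items() if k != key})
        # Keep unmatched features, but flag them so they can be checked.
        merged["matched"] = match is not None
        joined.append(merged)
    return joined


joined = join_on_key(feature_rows, attribute_rows, "county")
for row in joined:
    print(row)
```

Features with no matching attribute row are kept and flagged rather than dropped, since unmatched identifiers (spelling variants, missing counties) are usually the first thing to check after a join.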
Example map: CDC WONDER, 1999, diseases of the circulatory system (ICD codes I00-I99), age-adjusted to the year 2000 population.
Geocoding Limitations Beware! The scale and area-based measure that you choose, or are forced to use (individual address vs. census tract vs. block vs. ZIP code, etc.), can affect the results of your study. This is known as the Modifiable Areal Unit Problem. Openshaw, S., and P. Taylor, 1979: A million or so correlation coefficients: Three experiments on the modifiable areal unit problem, in Statistical Applications in the Spatial Sciences, ed. N. Wrigley (London: Pion), 127-144.
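A toy illustration of why the choice of areal unit matters: the same block-level counts, grouped into zones in two different ways, give very different pictures. The block populations, case counts, and both zoning schemes below are invented for the example.

```python
# Six hypothetical blocks along a street: (population, cases). Made-up numbers.
blocks = [(100, 1), (100, 9), (100, 9), (100, 1), (100, 1), (100, 9)]

# Two hypothetical ways of grouping the same blocks into larger zones.
zoning_a = [(0, 1), (2, 3), (4, 5)]
zoning_b = [(0,), (1, 2), (3, 4), (5,)]


def zone_rates(block_data, zones):
    """Return cases per person for each zone (a zone is a tuple of block indexes)."""
    rates = []
    for zone in zones:
        population = sum(block_data[i][0] for i in zone)
        cases = sum(block_data[i][1] for i in zone)
        rates.append(cases / population)
    return rates


rates_a = zone_rates(blocks, zoning_a)
rates_b = zone_rates(blocks, zoning_b)
print("zoning A:", rates_a)
print("zoning B:", rates_b)
```

Under zoning A every zone has the same rate, while zoning B shows alternating low-rate and high-rate zones, even though the underlying data are identical.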
Nancy Krieger, Jarvis T. Chen, Pamela D. Waterman, Mah-Jabeen Soobader, S. V. Subramanian, and Rosa Carson. Geocoding and Monitoring of US Socioeconomic Inequalities in Mortality and Cancer Incidence: Does the Choice of Area-based Measure and Geographic Level Matter? The Public Health Disparities Geocoding Project. Am J Epidemiol 2002; 156:471-482.
Street Addresses For example: A table of individual street addresses… How do we match this up with a GIS map of streets?
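Address Matching in GIS
Matching a street address to a location on the street map is known as Address Matching.
• Input: a street address, e.g. 1234 University Ave
• Street geography layer: each street segment has a name and a starting & ending address
• Step 1, matching: find the street segment whose name and address range contain the address
• Step 2, interpolation: estimate where the address falls along that segment
• Output: coordinates for the address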
Geocoding TIGER The US Census Bureau’s TIGER files include street address information.
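Figure: a University Ave street segment labeled with its TIGER address-range fields: FRADDL and TOADDL (from and to address on the left side of the street), FRADDR and TOADDR (from and to address on the right side).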
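The interpolation step can be sketched as follows. This is a simplified illustration, not how any particular GIS implements it: the segment is treated as a straight line, the endpoint coordinates and address ranges are made up, and the helper `interpolate_address` is hypothetical. Only the four address-range field names come from TIGER.

```python
# Hypothetical street segment. The address-range keys use the TIGER field
# names; the ranges and the planar endpoint coordinates (meters) are made up.
segment = {
    "name": "University Ave",
    "FRADDL": 1201, "TOADDL": 1299,   # odd numbers on the left side
    "FRADDR": 1200, "TOADDR": 1298,   # even numbers on the right side
    "start": (0.0, 0.0),
    "end": (100.0, 0.0),
}


def interpolate_address(house_number, seg):
    """Return (x, y, side) for a house number, or None if it is out of range."""
    sides = (("left", "FRADDL", "TOADDL"), ("right", "FRADDR", "TOADDR"))
    for side, from_field, to_field in sides:
        start_no, end_no = seg[from_field], seg[to_field]
        in_range = min(start_no, end_no) <= house_number <= max(start_no, end_no)
        # Odd and even house numbers normally sit on opposite sides.
        same_parity = house_number % 2 == start_no % 2
        if in_range and same_parity:
            span = end_no - start_no
            fraction = (house_number - start_no) / span if span else 0.0
            (x0, y0), (x1, y1) = seg["start"], seg["end"]
            return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0), side)
    return None


result = interpolate_address(1234, segment)
print(result)
```

Here 1234 is even, so it is placed on the right side, about 35% of the way along the segment. The estimate assumes house numbers are evenly spaced along the block, which is often not true in practice.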
Geocoding Services in ArcGIS ArcView provides a tool known as Geocoding Services that allows us to geocode data, in particular street addresses. For address matching, Geocoding Services works on the same principle that we have just discussed: it relies on street geography and interpolates the address numbers. ArcView comes with a license for StreetMap USA. For the following example, however, we will rely on TIGER files for our Geocoding Service.
Geocoding Berkeley Clinics From the Yellow Pages, I created a table of Berkeley clinics and their addresses. We will create a Geocoding Service in ArcView for geocoding these addresses. The Geocoding Service will be based on the Berkeley streets file that we clipped out from TIGER data in exercise 2.
1. Start up ArcCatalog. Under Geocoding Services, select “Create New Geocoding Service”.
Addresses can be formatted in a number of different ways, and here you can choose the style that fits the data that you’re using. For TIGER data we will use: US Streets (File-based)
5. In the Geocoding Services Manager “Add” the service we just created.
Address Matching Difficulties
Address Matching isn’t as easy as it seems. Even in our little example, we only had good matches for around 50% of our addresses. And we only tried 18 addresses in Berkeley!
Problems:
• Not all mailing addresses correspond to street addresses:
  • PO Box
  • 140 Warren Hall
  • Trailer Parks
• Newly developed areas lack street maps for geocoding
• Quality of data, which could be poorly formatted address data and/or errors in street geography data.
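Some of these problems can be caught before geocoding by screening the address table. The sketch below is a hypothetical pre-check, not part of ArcView's Geocoding Services: the function name `screen_address`, the status labels, and the regular expressions are all assumptions made for illustration.

```python
import re

# Loose patterns, chosen for illustration only.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)
STREET = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def screen_address(raw):
    """Return (status, detail) describing whether an address looks geocodable."""
    if not raw or not raw.strip():
        return ("missing", None)
    if PO_BOX.search(raw):
        # A PO box is a mailing address, not a location on a street.
        return ("po_box", None)
    match = STREET.match(raw)
    if not match:
        return ("no_house_number", None)
    number, street = match.groups()
    # Upper-case and collapse whitespace so formatting differences do not
    # block a match against the street layer.
    return ("candidate", (int(number), " ".join(street.upper().split())))


examples = ["1234  university ave", "PO Box 12", "Warren Hall", "140 Warren Hall"]
for address in examples:
    print(address, "->", screen_address(address))
```

Note that "140 Warren Hall" passes this screen because it looks like a house number plus a street name. It would only fail later, when no street called Warren Hall is found in the street layer, so a format check like this reduces but does not remove unmatched addresses.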
Address Matching Difficulties
• Texas DOH Guideline for Geocoding: http://www.tdh.state.tx.us/gis/Images/Docs/GUIDELINE_FOR_GEOCODING.pdf
• New Jersey Geocoding problems: http://www.state.nj.us/health/chs/releasable.htm
• Jane McElroy’s talk, Univ of Wisc., "Geocoding addresses from a large population-based study: Lessons learned and applied": http://www.pophealth.wisc.edu/lecture/pm803-02/pm803-25slides.ppt
No Geographic Data For example: Mapping data that cannot be easily located on existing maps. • Residential locations in rural villages • Environmental sampling sites • Infectious disease vector breeding & control sites
Global Positioning System • What is the Global Positioning System (GPS)? • A global navigation system • Answers the questions: • Where am I now? • How far is my destination? • How do I reach my destination?
GPS Background • A satellite-based navigation system • 24 very high-altitude orbiting satellites • Launched by U.S. Department of Defense • 24-hour, worldwide coverage • Free and reliable • Capable of very high accuracy location measurements
How Does GPS Work? • Uses radio signals transmitted from satellites to triangulate a position on the earth (strictly speaking this is trilateration, since it uses distances rather than angles) • 4 unknowns: x, y, z, time • Hence 4 satellites are required for triangulation
Triangulating Position (figure: satellites 1, 2, and 3 at distances d1, d2, and d3 from the receiver) • 3 satellites narrow the position down to one of 2 points. • One of those 2 points is either far off in space or moving very rapidly, so it can be rejected. So theoretically, if we could calculate the range to each satellite exactly, then only 3 satellites would be necessary.
Distance from each satellite? (figure: satellites 1, 2, and 3 at distances d1, d2, and d3 from the receiver) • The satellites are synchronized, and each transmits a known pseudo-random code • The receiver in the field generates the same pseudo-random code and measures the delay, or offset, in the code due to the transmission time from each satellite. The farther away a satellite is, the larger the delay in its signal.
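The delay measurement can be illustrated with a toy code-matching example. Everything here is a simplification: the code is a seeded random sequence rather than a real GPS code, the delay is a whole number of chips, there is no noise, and the chip duration is a hypothetical round value.

```python
import random

SPEED_OF_LIGHT = 299_792_458.0  # meters per second
CHIP_SECONDS = 1e-6             # hypothetical chip duration, for illustration


def make_code(length, seed):
    """Build a reproducible pseudo-random sequence of +1 and -1 chips."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def delay_signal(code, shift):
    """Circularly delay the code by `shift` chips, as transmission time would."""
    shift %= len(code)
    return code[-shift:] + code[:-shift] if shift else list(code)


def estimate_delay(local_code, received):
    """Slide the local copy against the received code; return the best offset."""
    n = len(local_code)
    best_shift, best_score = 0, float("-inf")
    for shift in range(n):
        score = sum(local_code[(i - shift) % n] * received[i] for i in range(n))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


code = make_code(1023, seed=42)
received = delay_signal(code, 200)          # signal arrives 200 chips late
delay_chips = estimate_delay(code, received)
distance_m = delay_chips * CHIP_SECONDS * SPEED_OF_LIGHT
print(delay_chips, "chips ->", round(distance_m), "m")
```

The offset with the highest agreement between the two codes is the delay, and multiplying the delay by the speed of light gives the range. A real receiver also has to handle noise, fractional chips, and delays longer than one code period.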
The fourth satellite (figure: a fourth satellite at distance d4 is added to satellites 1, 2, and 3) • A fourth satellite signal is needed to fix the position because the clocks in field receivers are not as accurate as those onboard the satellites. Hence, the fourth measurement is used to solve for the position even when there is imperfect timing.
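To make the "4 unknowns, 4 satellites" argument concrete, here is a minimal numerical sketch that solves for x, y, z and the receiver clock error from four measured ranges using Newton's method. The satellite coordinates, the receiver position, the clock error, and the starting guess are all invented for the example, and real receivers use more elaborate processing.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_position(satellites, measured_ranges, guess, iterations=10):
    """Newton iteration for (x, y, z, clock error expressed in meters)."""
    x, y, z, bias = guess
    for _ in range(iterations):
        jacobian, residuals = [], []
        for (sx, sy, sz), rho in zip(satellites, measured_ranges):
            r = math.sqrt((x - sx) ** 2 + (y - sy) ** 2 + (z - sz) ** 2)
            residuals.append(r + bias - rho)
            jacobian.append([(x - sx) / r, (y - sy) / r, (z - sz) / r, 1.0])
        dx, dy, dz, db = solve_linear(jacobian, [-f for f in residuals])
        x, y, z, bias = x + dx, y + dy, z + dz, bias + db
    return x, y, z, bias


# Made-up geometry (meters, Earth-centered): one satellite overhead, three lower.
satellites = [
    (26_571e3, 0.0, 0.0),
    (20_000e3, 17_000e3, 0.0),
    (20_000e3, -8_500e3, 14_700e3),
    (20_000e3, -8_500e3, -14_700e3),
]
true_position = (6_371e3, 0.0, 0.0)
true_clock_error_s = 1e-4  # receiver clock is off by 0.1 ms (made up)

# Each measured range is the true distance plus the same clock-error term.
measured = [math.dist(true_position, s) + SPEED_OF_LIGHT * true_clock_error_s
            for s in satellites]

# Rough starting guess, e.g. a previously known position (assumption).
x, y, z, bias_m = solve_position(satellites, measured, (6_300e3, 100e3, -100e3, 0.0))
clock_error_s = bias_m / SPEED_OF_LIGHT
print(x, y, z, clock_error_s)
```

Because the receiver clock error adds the same unknown amount to every measured range, it behaves like a fourth coordinate, which is why a fourth satellite is enough to solve for it.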
Sources of Inaccuracy • Multipath reception • Timing offset • Signal delays due to Earth’s ionosphere and atmosphere • Poor satellite geometry • Selective Availability (turned off May 1, 2000)
Differential Correction • Eliminates systematic errors: • S/A, receiver clock, satellite clocks, satellite position, ionosphere and atmosphere delays • Uses GPS receiver at a static known reference point to determine error in the signal • This error is similar for nearby GPS receivers at unknown positions • Error correction signal from the reference receiver can correct positions for the nearby receivers
Differential Correction (diagram) • BASE: reference station at a known location • ROVERs: moving receivers at unknown locations • Correction information is sent over a radio link, or the data are post-processed in the office
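The idea can be sketched in a few lines. This is a deliberately simplified, position-level version with made-up local coordinates in meters; real differential GPS typically corrects the individual satellite range measurements rather than the final position, and the function name below is hypothetical.

```python
def differential_correction(base_known, base_measured, rover_measured):
    """Remove the error observed at the base station from a rover position."""
    # Error = what the base receiver reported minus where it really is.
    error = tuple(m - k for m, k in zip(base_measured, base_known))
    # Nearby rovers are assumed to share (approximately) the same error.
    return tuple(r - e for r, e in zip(rover_measured, error))


# Made-up east/north coordinates in meters.
base_known = (1000.0, 2000.0)       # surveyed reference point
base_measured = (1004.0, 1997.0)    # what the base GPS reports right now
rover_measured = (1504.5, 2496.5)   # what a rover reports at the same moment

rover_corrected = differential_correction(base_known, base_measured, rover_measured)
print(rover_corrected)
```

The same function works whether the correction arrives over a radio link in the field or is applied later to logged positions, as long as the base and rover readings are matched by time.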
GPS Accuracy • GPS accuracy depends on other variables too: • Time spent on measurements • Averaging a bunch of measurements • Design of the receiver and antenna
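As a small illustration of the averaging point, the sketch below averages several fixes taken at one site and compares the error of the average with the errors of the individual fixes. The coordinates are made-up local east/north values in meters, and the "true" position is only known here because the data are invented.

```python
import math

true_position = (10.0, 20.0)  # made-up site location, meters

# Six made-up GPS fixes taken at the same site.
fixes = [(13.0, 16.0), (8.0, 22.0), (14.0, 19.0),
         (5.0, 23.0), (11.0, 18.0), (12.0, 25.0)]


def average_fix(points):
    """Average the east and north coordinates of repeated fixes."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


averaged = average_fix(fixes)
single_errors = [math.dist(p, true_position) for p in fixes]
averaged_error = math.dist(averaged, true_position)

print("averaged fix:", averaged)
print("smallest single-fix error:", round(min(single_errors), 2), "m")
print("error of averaged fix:", round(averaged_error, 2), "m")
```

Averaging helps with errors that vary randomly from fix to fix. It does not remove an error that is the same in every fix, which is what differential correction is for.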
GPS for Public Health • Disease case or incident sites • Sites of major exposure • Hazardous sites • Vector breeding sites • Intervention or control sites
Creating an Appropriate GPS/GIS Database • What spatial/temporal factors are relevant? • Spatial component: • Point Features • Line Features • Area Features • Attribute Data component: • What sorts of data are relevant for each particular type of spatial feature? • Spatial Resolution
Creating an Appropriate GPS/GIS Database • Fieldwork logistics are a real issue because you have to physically be at the site you want to map! • Cost of receivers vs efficiency • Battery power • Time needed for each feature • Difficulty getting to the sites
Schistosome Lifecycle (diagram): humans (worms) → eggs → fertilization → miracidia → snails (irrigation ditch habitat) → cercaria → irrigation ditch exposure → humans
Geocoding Snail Density • Intermediate host for the disease is a snail that lives in irrigation ditches • Preexisting methods for estimating snail density based on sampling frames • How can we geocode these frames? • Rural area • No maps available • Roughly 500 frames within a village • Money and time are limited
One solution • Map the ditches with GPS • Line feature • Attributes • Ditch ID • Ditch properties: width, flow, construction
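One way to picture the resulting GIS record is as a line feature (an ordered list of GPS points) plus an attribute record. The sketch below is hypothetical: the class name, field names, attribute values, and coordinates are assumptions for illustration, and the length calculation uses the standard haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Ditch:
    """A ditch mapped as a line feature with its attribute data."""
    ditch_id: str
    width_m: float
    flow: str
    construction: str
    vertices: list = field(default_factory=list)  # GPS points as (lat, lon)

    def length_m(self):
        """Sum the distances between consecutive GPS points."""
        return sum(haversine_m(a, b)
                   for a, b in zip(self.vertices, self.vertices[1:]))


# Made-up example: three GPS points walked along one ditch.
ditch = Ditch("D-01", width_m=1.5, flow="slow", construction="earthen",
              vertices=[(0.0, 0.0), (0.0, 0.001), (0.0, 0.002)])
print(ditch.ditch_id, round(ditch.length_m(), 1), "m")
```

Keeping the attributes on the ditch record means they are entered once per ditch in the field, rather than once per GPS point.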