Cleansing and Geocoding Spatial Data

Presentation Transcript


  1. Cleansing and Geocoding Spatial Data Jyoti Kamal, PhD

  2. Introduction: In developing an enterprise-wide clinical and financial data warehouse at the Ohio State University Medical Center (OSUMC), we invested considerable time and effort in analyzing the quality of the historical data coming from the source system, as a first step in our commitment to data quality. The OSUMC has developed an in-house technique and process to cleanse and geocode spatial data in a single pass on the city, state and zip code combination. Several goals guided the development of this technique and process.

  3. Goals: Identify potential marketing regions for hospital outreach clinic services. Identify regions by patient demographics for direct marketing mailings. Correlate abnormal diagnoses with patient regional distributions. Identify geographic correlations with other patient trends. Identify regions by physician referrals. Bio-surveillance: retrospective public health surveillance.

  4. Available Data: 7 years' worth of patient addresses; 7 years' worth of patient demographics. Data Issues: unknown integrity of data; no geocoding of data.

  5. Cleansing Logic and Process: We purchased a list of valid city, state and zip code combinations, along with a zip code to county mapping and the latitude and longitude coordinates of each zip code's center. The data is loaded into the warehouse and refreshed monthly to keep it synchronized with the United States Postal Service. Tool: Informix Data Stage™, used for extraction, transformation and load.

  6. Cleansing Logic and Process (continued): Figure 1. Cleansing logic showing different transform stages and output files.

  7. Cleansing Logic and Process (continued): Table 1. Sample of data before and after cleansing, and the additional geographic information attached to each output record. Changes in the cleansed data are italicized.
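
The transform stages of Figure 1 are not reproduced in this transcript, so the following is only a minimal sketch of what a single-pass check against the purchased reference list might look like. Everything named here (`GeoInfo`, `cleanse`, the sample rows and their coordinates, the "fall back to the zip code" rule) is an assumption for illustration, not the OSUMC implementation.

```python
from typing import NamedTuple, Optional, Tuple


class GeoInfo(NamedTuple):
    city: str
    state: str
    zip_code: str
    county: str
    lat: float
    lon: float


# Hypothetical rows standing in for the purchased reference list;
# the coordinates are illustrative, not authoritative.
REFERENCE = [
    GeoInfo("COLUMBUS", "OH", "43210", "FRANKLIN", 40.00, -83.02),
    GeoInfo("DUBLIN", "OH", "43017", "FRANKLIN", 40.11, -83.13),
]

BY_TRIPLE = {(g.city, g.state, g.zip_code): g for g in REFERENCE}
# Assumption: one preferred city per zip code (real zip codes can map to several).
BY_ZIP = {g.zip_code: g for g in REFERENCE}


def cleanse(city: str, state: str, zip_code: str) -> Tuple[Optional[GeoInfo], str]:
    """Return the matched reference row and a status for one address."""
    key = (city.strip().upper(), state.strip().upper(), zip_code.strip()[:5])
    if key in BY_TRIPLE:
        return BY_TRIPLE[key], "valid"
    if key[2] in BY_ZIP:
        # Assumed rule: trust the zip code and correct city/state from it.
        return BY_ZIP[key[2]], "corrected"
    return None, "rejected"
```

For example, `cleanse("Colombus", "OH", "43210")` would return the Columbus row with status `"corrected"`, which mirrors the before-and-after pairs that Table 1 describes.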

  8. Spatial Analysis: To simplify analysis, we divided the geographic area around the OSUMC into predefined rings and sectors. To define the rings, we chose imaginary concentric circles in increments of ten miles (with a larger increment beyond one hundred miles) and partitioned them into eight equal sectors of 45 degrees, each indicating a broad direction. One can, in fact, choose any increment and angle, depending on the granularity of distribution desired. With this picture in mind, we could write an application that, given the patient zip codes and the data shown in Table 2, determines the ring and sector into which each patient zip code falls. From this information, in turn, it was easy to determine what percentage of the OSUMC patient population fell into each ring and sector.

  9. Spatial Analysis: Table 2. Sample of distance and bearing calculations in reference to the OSUMC.
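
The slides do not say which formulas produced the distance and bearing values in Table 2. One common choice, shown here purely as an assumption, is the haversine great-circle distance together with the initial compass bearing; the function name and the Earth radius constant are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (miles) and initial bearing (degrees clockwise
    from north) from point 1 to point 2, with inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine formula for the distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

    # Initial bearing, normalized to the range [0, 360).
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing
```

In this setting, point 1 would be the OSUMC (its coordinates are not given in the slides) and point 2 the latitude and longitude of a patient's zip code center.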

  10. Spatial Analysis: Figure 2. The center of these circles is the OSUMC. The rings are drawn in increments of 10 miles, and the circles are divided into eight equal sectors of 45 degrees, each signifying a broad direction. N: North; NE: North East; E: East; SE: South East; S: South; SW: South West; W: West; NW: North West.
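
Given a distance and a bearing, assigning a ring and a sector is a small amount of arithmetic. The sketch below assumes that each 45-degree sector is centered on its compass direction (so N spans 337.5 to 22.5 degrees), which the figure may or may not match. Because the slides do not state the larger increment used beyond one hundred miles, everything past that limit is lumped into a single `"100+"` ring. Function names and labels are hypothetical.

```python
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_for(bearing_deg: float) -> str:
    """Map a bearing in degrees to one of eight 45-degree sectors.
    Assumption: sectors are centered on the compass directions."""
    return SECTORS[int(((bearing_deg % 360) + 22.5) // 45) % 8]


def ring_for(distance_miles: float, width: float = 10.0, limit: float = 100.0) -> str:
    """Label the ring containing a distance, e.g. '10-20' for 15 miles.
    Distances at or beyond `limit` share one ring in this sketch."""
    if distance_miles >= limit:
        return f"{int(limit)}+"
    lower = int(distance_miles // width) * int(width)
    return f"{lower}-{lower + int(width)}"
```

A zip code 15 miles away at a bearing of 135 degrees would land in ring `"10-20"`, sector `"SE"`. Changing `width` or the sector list gives the finer or coarser granularity the slides mention.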

  11. Results Figure 3. Patient distribution by distance and bearing in relationship to the OSUMC
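
A distribution like the one in Figure 3 can be obtained by counting patients per (ring, sector) cell and converting the counts to percentages. The sketch below assumes each patient has already been assigned a cell; the function name and the sample cells are made up for illustration.

```python
from collections import Counter


def percent_by_cell(patient_cells):
    """Return {(ring, sector): percent of patients} for an iterable of cells."""
    counts = Counter(patient_cells)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Hypothetical input: one (ring, sector) pair per patient.
sample = [("0-10", "N"), ("0-10", "N"), ("10-20", "SE"), ("100+", "W")]
shares = percent_by_cell(sample)
```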

  12. Results Figure 4. Top five services by patient count, with gender distribution, for patients aged 40 to 45 who live 10 to 20 miles southeast of the OSUMC.
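
The slides attribute this kind of slice to an OLAP application. As a rough, assumption-laden sketch of the same query in plain Python, the function below filters patient records by ring, sector and age range, then ranks services by patient count while keeping the gender breakdown. The record fields (`ring`, `sector`, `age`, `gender`, `service`) are hypothetical.

```python
from collections import Counter, defaultdict


def top_services(patients, ring, sector, min_age, max_age, n=5):
    """Return up to n (service, patient_count, {gender: count}) tuples,
    ordered by patient count, for patients in the given ring/sector/age range."""
    by_service = defaultdict(Counter)
    for p in patients:
        if p["ring"] == ring and p["sector"] == sector and min_age <= p["age"] <= max_age:
            by_service[p["service"]][p["gender"]] += 1
    ranked = sorted(by_service.items(), key=lambda kv: -sum(kv[1].values()))
    return [(svc, sum(g.values()), dict(g)) for svc, g in ranked[:n]]
```

A call such as `top_services(patients, "10-20", "SE", 40, 45)` corresponds to the slice described in the caption above.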

  13. Results Figure 5. Patient distribution in relationship to the OSUMC for oncology services.

  14. Conclusions: The process was sufficient for the OSUMC. The process was inexpensive. The application was extended beyond patient data to physician data. Business analysis and marketing personnel were satisfied with the patient distribution data by service, gender and age.

  15. Acknowledgments: The authors thank Dr. Joel Saltz, Dr. Hagop Mekhjian and Maxine Moehring for their support and encouragement. The ongoing help of the data warehouse team, in particular Jennifer Santangelo, Mike Ostrander and Israel Rosales, is greatly appreciated. Kevin Li and Mike Ostrander's work on the validation process and OLAP application has been invaluable.
