This project aims to empower health researchers by integrating spatial data into their research through scalable geospatial databases, analytics, and visualizations. It explores geospatial health context, facilitates geospatial analysis, and leverages big data for enhanced health registry insights.
Adding Value to Registries through Geospatial Big Data Fusion
Geospatial Health Context Big Table: Facilitating Geospatial Analysis in Health Research
Tim Haithcoat & Chi-Ren Shyu
University of Missouri Informatics Institute
June 13, 2019
THE GOAL
Develop robust processes for health researchers and practitioners to more easily incorporate spatially integrated health, social, cultural, access, infrastructure, and environmental parameters/factors and spatial context in their research using scalable geospatially enabled databases, analytics, and visualizations.
Unique Infrastructure
(Figure: Typical Relational DB vs. Typical Geospatial DB)
Tessellation over Census Blocks
(Figure: block centroids = 343,565 points)
Tessellation with Census Centroids
(Figure: Thiessen proximal polygons)
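Thiessen polygons partition space so that every location belongs to the polygon of its nearest census block centroid. A minimal sketch of that membership rule, assuming planar (projected) coordinates and hypothetical block IDs and points. It is a brute-force nearest-centroid search, not the project's Spark implementation; with hundreds of thousands of centroids, a spatial index such as a k-d tree would be the practical choice.

```python
import math

# Hypothetical census block centroids: block ID -> (x, y) in a projected
# coordinate system (planar units, e.g. meters). Values are illustrative only.
CENTROIDS = {
    "block_A": (0.0, 0.0),
    "block_B": (10.0, 0.0),
    "block_C": (0.0, 10.0),
}


def thiessen_cell(point, centroids):
    """Return the ID of the Thiessen (Voronoi) cell containing `point`.

    A point lies in a centroid's Thiessen polygon exactly when that centroid
    is its nearest one, so cell membership reduces to a nearest-neighbor
    search. Ties are broken by the first ID in sorted order.
    """
    px, py = point
    return min(
        sorted(centroids),
        key=lambda cid: math.hypot(px - centroids[cid][0], py - centroids[cid][1]),
    )


def tessellate(points, centroids):
    """Assign every point in a point file to its Thiessen cell."""
    return {pid: thiessen_cell(xy, centroids) for pid, xy in points.items()}


if __name__ == "__main__":
    pts = {"p1": (1.0, 2.0), "p2": (9.0, 1.0), "p3": (2.0, 8.0)}
    print(tessellate(pts, CENTROIDS))
```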
Extent of the Data Table
• Defined a point file with 318 million points for contiguous 48 states.
• How many columns (attributes)? Projection: 10,000+
• How many data sets? US Data.gov – Federal GIS: > 1,000
• What is the size of the table? 1.5 Gb/attribute; growth projection: 90 Tb
• Using Spark big data ecosystem
• Australian Cancer Atlas
• Determined main common keys:
  • Census Geography
  • Zip Code
  • Watershed
  • Etc.
• Created point summary counts for all geographies to use for analytics
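A minimal plain-Python sketch of the last two steps on this slide: fusing attribute data sets onto the point file through a common key, and producing point summary counts per geography. The project used the Spark ecosystem; this stand-in uses only the standard library, and the key names (`tract`, `zip`, `watershed`), attribute names, and values are all hypothetical.

```python
from collections import Counter

# Hypothetical point records: each row is one point of the point file,
# tagged with the common keys (geographies) it falls inside.
POINTS = [
    {"point_id": 1, "tract": "T1", "zip": "Z1", "watershed": "W1"},
    {"point_id": 2, "tract": "T1", "zip": "Z2", "watershed": "W1"},
    {"point_id": 3, "tract": "T2", "zip": "Z2", "watershed": "W2"},
]

# Hypothetical attribute data sets, each keyed by one common key.
TRACT_ATTRS = {"T1": {"median_income": 41000}, "T2": {"median_income": 52000}}
ZIP_ATTRS = {"Z1": {"broadband_pct": 0.71}, "Z2": {"broadband_pct": 0.88}}


def summary_counts(points, key):
    """Count how many points fall in each unit of one geography."""
    return Counter(p[key] for p in points)


def fuse(points, layers):
    """Attach attributes from each (key, table) layer to every point row,
    producing one wide table. Points with no match in a layer simply gain
    no columns from that layer."""
    wide = []
    for p in points:
        row = dict(p)
        for key, table in layers:
            row.update(table.get(p[key], {}))
        wide.append(row)
    return wide


if __name__ == "__main__":
    for geo in ("tract", "zip", "watershed"):
        print(geo, dict(summary_counts(POINTS, geo)))
    print(fuse(POINTS, [("tract", TRACT_ATTRS), ("zip", ZIP_ATTRS)])[0])
```

Keeping every point row keyed by all common geographies is what lets one table be summarized by tract, Zip Code, or watershed without re-running the spatial overlay.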
Establishing Context
• Inter-layer distance measures
• Coded 1st & 2nd order relationships
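The slide does not define its distance measures or relationship orders. Assuming "inter-layer distance" means the distance from each feature in one layer to the nearest feature in another, and "1st & 2nd order relationships" means contiguity neighbors and neighbors of neighbors, a sketch could look like this. Coordinates are (longitude, latitude) in degrees; all layer contents, unit IDs, and adjacency data are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between (lon, lat) pairs given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_distance(layer_a, layer_b):
    """For each feature in layer_a, the distance (km) to the nearest
    feature in layer_b (brute force over all pairs)."""
    return {fid: min(haversine_km(xy, other) for other in layer_b.values())
            for fid, xy in layer_a.items()}


def neighbor_orders(adjacency, unit):
    """Return (first-order, second-order) neighbor sets of `unit`.

    First order: units listed as adjacent to `unit`.
    Second order: neighbors of those neighbors, excluding `unit` itself
    and its first-order neighbors.
    """
    first = set(adjacency.get(unit, ()))
    second = set()
    for n in first:
        second.update(adjacency.get(n, ()))
    second -= first | {unit}
    return first, second


if __name__ == "__main__":
    homes = {"home_1": (0.0, 0.0)}
    hospitals = {"h1": (0.0, 1.0), "h2": (0.0, 2.0)}
    print(nearest_distance(homes, hospitals))
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(neighbor_orders(chain, "B"))
```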
Registry Data Loading
(Figure: registry data records)
Leveraging Geospatial in Registries
• Geocoding of Registry
  • Attach an X,Y coordinate to each record with its associated confidence (strongest).
  • Attach a primary key(s) (e.g., Census ID, Zip Code Tabulation Area) based on the geocode of the address to create an ‘easy’ linkage to associated data when needed.
  • Use the geocoded location to determine association with a primary key, so that attributes of interest move directly to the registry record.
• Determine what information, and at what geographic summarization level, registry data gets shared.
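A sketch of the geocode-then-link workflow in these bullets, assuming the geocoder has already returned an X,Y coordinate and a confidence score, and that aggregation units are available as simple polygons. The record fields, the 0 to 1 confidence scale, the unit IDs, and the attribute names are all hypothetical; a production system would more likely use a GIS library's spatial join than this hand-written ray-casting test.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    record_id: str
    x: float            # geocoded longitude or easting
    y: float            # geocoded latitude or northing
    confidence: float   # geocoder match confidence (assumed 0-1 scale)
    keys: dict = field(default_factory=dict)        # primary keys, e.g. tract
    attributes: dict = field(default_factory=dict)  # attributes moved onto record


# Hypothetical tract polygons (vertex rings) and tract-level attributes.
UNITS = {
    "T1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "T2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
UNIT_ATTRS = {
    "T1": {"median_income": 41000, "pct_over_60": 0.22},
    "T2": {"median_income": 52000, "pct_over_60": 0.31},
}


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.
    The closing vertex may be omitted or repeated."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_record(record, units, key_name, unit_attrs, wanted):
    """Store the ID of the unit containing the record's geocode as a primary
    key, then copy only the attributes of interest onto the record."""
    for unit_id, ring in units.items():
        if point_in_polygon(record.x, record.y, ring):
            record.keys[key_name] = unit_id
            attrs = unit_attrs.get(unit_id, {})
            record.attributes.update({a: attrs[a] for a in wanted if a in attrs})
            break
    return record


if __name__ == "__main__":
    rec = RegistryRecord("r1", x=12.0, y=5.0, confidence=0.95)
    print(link_record(rec, UNITS, "tract", UNIT_ATTRS, ["pct_over_60"]))
```

Storing the key alongside the copied attributes keeps both options from the slide open: attributes of interest travel with the record, while the key supports later linkage to anything else.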
Using the Big Data Table
Geospatial Health Context Big Table
• Data Required: Socio-Economic, Demographic, Infrastructure, Environmental, Cultural, Derived, Physical, Modeled
• User Data: Address, Zip Code, Tract, County
• Inquiry Type: Exploratory, Simple Question, Complex Question, Complex Question with Temporal
• Aggregation Unit: Zip Code, Tract, Block Group, County, Watershed, School District, Health Service Area
(Figure: Lifestyle 50%, Health Care 25%, Biology 15%, Environment 10%)
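The "Aggregation Unit" choice means the same point-level rows can be summarized at whichever geography a question needs. A minimal sketch, assuming each row already carries the ID of every unit it falls in; the column names (`zip`, `county`, `population`, `pm25`) and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point-level rows of the big table; each row carries the IDs of
# the aggregation units it falls in plus some attribute values.
ROWS = [
    {"zip": "Z1", "county": "C1", "population": 120, "pm25": 8.0},
    {"zip": "Z1", "county": "C1", "population": 80, "pm25": 10.0},
    {"zip": "Z2", "county": "C1", "population": 200, "pm25": 6.0},
]


def aggregate(rows, unit, sum_cols=(), mean_cols=()):
    """Roll point rows up to the chosen aggregation unit: columns in
    `sum_cols` (counts such as population) are summed, and columns in
    `mean_cols` (rates or concentrations) get an unweighted mean."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[unit]].append(r)
    out = {}
    for uid, members in groups.items():
        summary = {c: sum(m[c] for m in members) for c in sum_cols}
        summary.update({c: mean(m[c] for m in members) for c in mean_cols})
        out[uid] = summary
    return out


if __name__ == "__main__":
    print(aggregate(ROWS, "zip", ["population"], ["pm25"]))
    print(aggregate(ROWS, "county", ["population"], ["pm25"]))
```

The unweighted `mean` treats every point equally; if points stand for unequal populations, a population-weighted average would be the alternative.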
Choose an Issue
Right-Sizing Care: Over the next decade, the aging American population is expected to place increased demands on the U.S. healthcare system. For older Americans, a review of medical records found that 38% of doctor visits, including 27% of emergency room (ER) visits, could have been replaced with telemedicine.
Effort Required
• Census data tables (2 hrs)
• Census geography (1 hr)
• Hospital types (2 hrs)
• Road network zones (time and/or distance) (1 week)
• Broadband type (2 hrs)
Query Elements
• Age > 60 years
• Gender
• Hospital Service Area
• Broadband Service
The Data Needed
• Census age & gender
• Hospital locations
• Attributed road network
• Broadband attributes
• Census geography
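The query elements for this issue can be expressed as a filter plus a grouped count over the joined table. A sketch, assuming hypothetical person-level rows with `age`, `gender`, `hsa` (hospital service area), and `broadband` columns. Real census age and gender tables are aggregated counts, so an actual query would sum counts rather than count rows, and the road-network travel zones listed under Effort Required are not modeled here.

```python
from collections import Counter

# Hypothetical rows joined from census age & gender, hospital service area,
# and broadband attributes. Values are illustrative only.
ROWS = [
    {"age": 72, "gender": "F", "hsa": "HSA-1", "broadband": "fiber"},
    {"age": 65, "gender": "M", "hsa": "HSA-1", "broadband": "none"},
    {"age": 45, "gender": "F", "hsa": "HSA-2", "broadband": "cable"},
    {"age": 81, "gender": "F", "hsa": "HSA-2", "broadband": "none"},
]


def telemedicine_query(rows, min_age=60):
    """Count people older than `min_age`, grouped by hospital service area,
    gender, and whether any broadband service is available (True/False)."""
    return Counter(
        (r["hsa"], r["gender"], r["broadband"] != "none")
        for r in rows
        if r["age"] > min_age
    )


if __name__ == "__main__":
    for (hsa, gender, has_broadband), n in sorted(telemedicine_query(ROWS).items()):
        print(hsa, gender, "broadband" if has_broadband else "no broadband", n)
```

Groups with many older residents but no broadband are the ones where telemedicine would need infrastructure investment before it could replace in-person visits.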
Example Complex Questions
• What factors in different demographic groups or locations discourage people from cancer treatment?
• How can we update our healthcare delivery strategy based on availability of medical services with relation to cancer risk based on population growth, ageing, and cancer type?
• Can we identify any new relationships between cancer occurrence and environmental, socio-cultural, infrastructural, or other data to explore or generate new hypotheses?
• What is the magnitude of population cancer disparities in an area, where are they located, and what factors might be creating these ‘hot spots’?
Relevance
• The Geospatial Health Context Big Table provides:
  • Cancer Researchers an integrated big data repository to:
    • Search: enable stronger research designs (e.g., develop sampling/surveillance approaches).
    • Explore: understand the spatial interaction of a multitude of attributes.
    • Add contextual information based on neighborhood.
  • Decision Makers a new tool to evaluate policy implications and focus on the areas/populations affected.
  • Public Health Professionals the ability to identify, mitigate, and potentially prevent health disparities in cancer incidence.
Acknowledgments
Collaborators: Chi-Ren Shyu, PhD; Richard D. Hammer, M.D.; Tim Matisziw, PhD; Iris Zachary, PhD; Eileen Avery, PhD; Kelly Bowers, D.O.; Mirna Becevic, PhD
This work is supported by the NIH BD2K T32 Training grant (5T32LM012410-02).
The Big Data ecosystem is supported by NSF CNS-1429294.
Looking for research collaborations? Contact: HaithcoatT@missouri.edu