GLOBE: Analytics for Assessing Global Representativeness
Matthew D. Schmill, Lindsey Gordon, Erle Ellis, Nicholas Magliocca, Tim Oates
University of Maryland, Baltimore County

GLOBE: Enhancing Scientific Workflows
• The goal: accelerate and improve scientific workflows for land change science
• Joint work with Wayne Lutters, Erle Ellis, Tim Oates, and Penny Rheingans at University of Maryland, Baltimore County
  • IS, CSEE, GES
• Supported by NSF’s Cyber-Enabled Discovery & Innovation program
  • Fourth and final year of the program
• Centerpiece is the GLOBE system
• Enabling better science through
  • Real-time statistical assessments, interactive geovisualization tools
  • Scientific collaboration platform

Land Change Science
• Study of the interactions among human systems, ecosystems, the atmosphere, and other Earth systems as mediated through human use of land
• Cuts across many disciplines of social and natural science
• Typified by this challenge: how to integrate and synthesize local studies into “globalized” results
• Though GLOBE is targeted at land change scientists
  • The concept of representativeness is a very general concern
  • The GLOBE system is appropriate to any discipline engaged in synthesizing local studies into global results

Representativeness
• The degree to which a sample represents a global pattern
• A converse to bias
  • A representative sample is not biased; a biased sample is not representative
• Sampling bias: a typical criticism anywhere that samples are used to make inferences
• A land change science example:
  • Are you representing only accessible sites?
  • Accessibility as a measure of travel time to a city (Nelson, 2008)
• A measure of representativeness should be
  • Intuitive, understandable
  • Statistically sound

Measures of Representativeness
• Pearson’s Chi-Square
  • Requires the variable space be discrete
  • Unreliable with small sample sizes
• Kolmogorov-Smirnov Goodness-of-Fit Test
  • Does not require discrete space
  • Scaling and visualizing beyond 1D is hard
• f-Divergence (Hellinger, Jensen-Shannon)
  • Requires discrete variable space
• Probability Estimates
  • Chi-Square: simple
  • Monte Carlo methods for the rest

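To make the measures listed above concrete, here is a minimal Python sketch of the KS distance, the Hellinger distance, and a Monte Carlo probability estimate. It is an illustration under stated assumptions, not GLOBE's implementation: the function names, the resampling scheme (drawing same-sized random samples from the population without replacement), and the add-one p-value correction are all choices made for this example. Chi-square is left out because its reference distribution is available in closed form.

```python
import bisect
import math
import random
from collections import Counter


def ks_distance(sample, population):
    """Largest absolute gap between the two empirical CDFs (continuous variable)."""
    s, p = sorted(sample), sorted(population)
    gap = 0.0
    for x in set(s) | set(p):  # the maximum gap is reached at an observed value
        fs = bisect.bisect_right(s, x) / len(s)
        fp = bisect.bisect_right(p, x) / len(p)
        gap = max(gap, abs(fs - fp))
    return gap


def hellinger(sample_bins, population_bins):
    """Hellinger distance between two lists of bin labels (0 = identical, 1 = disjoint)."""
    cs, cp = Counter(sample_bins), Counter(population_bins)
    ns, n_pop = len(sample_bins), len(population_bins)
    total = sum(
        (math.sqrt(cs[b] / ns) - math.sqrt(cp[b] / n_pop)) ** 2
        for b in set(cs) | set(cp)
    )
    return math.sqrt(total / 2)


def monte_carlo_p_value(statistic, sample, population, trials=500, seed=0):
    """Estimate how often an unbiased random sample looks at least as extreme.

    Assumption: "unbiased" means a simple random sample of the same size,
    drawn without replacement from the population (a list).
    """
    rng = random.Random(seed)
    observed = statistic(sample, population)
    hits = 0
    for _ in range(trials):
        resample = rng.sample(population, len(sample))
        if statistic(resample, population) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction avoids a p-value of 0


# Made-up data: a population of 1000 values and a sample of only the lowest 50.
population = [i / 10 for i in range(1000)]
biased_sample = population[:50]
p = monte_carlo_p_value(ks_distance, biased_sample, population, trials=200)
print("KS distance:", ks_distance(biased_sample, population), "p-value:", p)
```

A small p-value here says that a random sample of the same size would rarely be this far from the population, which is the sense in which the sample is judged biased.
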
Representativeness
Gives you
• Quick metric for judging level of bias
• Basis for comparing samples/sampling methods
• A way to compute the probability of incorrectly concluding a sample is biased
Does not give you
• Any guidance on where to look to address sampling bias
• Any way to view this geographically

Representedness
• The degree to which a location or member of the population is represented by the collection
• The complement of representativeness
• Useful for visualization and analysis
  • Heat maps that show geographically where gaps lie
  • Can be used as a basis for case study search to fill study gaps

Computing Representedness
• Get datum for land unit (precipitation): 1573 mm/yr
• Locate datum in global distribution
• Compute representativeness for that value
  • Chi-Square: p-value of χ² times sign of the difference between sample and population
  • KS distance: difference in ECDF for population versus sample at unit datum

Computing Representedness
• Get datum for land unit: 49.2 m
• Locate datum in global distribution
• Compute representativeness for that value
  • Discrete: p-value of χ² times sign of the difference between sample and population
  • Continuous: difference in ECDF for population versus sample at unit datum
• Compute RGB (heat map)

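The continuous path on this slide (locate the land unit's datum in the global distribution, take the ECDF difference, convert it to a color) can be sketched in Python. This is a hedged illustration rather than GLOBE's code: the function names, the sign convention (sample minus population), and the red-white-blue color ramp are assumptions, and the data are made up apart from the 1573 mm/yr datum taken from the earlier slide.

```python
import bisect


def ecdf(sorted_values, x):
    """Fraction of the (pre-sorted) values that are <= x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def representedness(datum, sample_sorted, population_sorted):
    """Signed ECDF difference at the land unit's datum, in [-1, 1].

    Assumed convention: sample ECDF minus population ECDF.
    """
    return ecdf(sample_sorted, datum) - ecdf(population_sorted, datum)


def to_rgb(score):
    """Map a score in [-1, 1] to a color: red (-1), white (0), blue (+1).

    The color ramp is a hypothetical choice for this sketch.
    """
    s = max(-1.0, min(1.0, score))
    fade = round(255 * (1 - abs(s)))
    return (255, fade, fade) if s < 0 else (fade, fade, 255)


# Made-up precipitation values (mm/yr): a global population and a small sample
# of case-study sites that covers only wetter locations.
population = list(range(0, 3000, 10))
sample = [1600, 1800, 2000, 2200]

score = representedness(1573, sorted(sample), population)
print(score, to_rgb(score))
```

Running the same calculation for every land unit and painting each unit with its color would give a heat map of the kind the slide describes.
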
Addressing Bias
Study Gap Search
• Identify areas where density in population is significantly higher than in sample
• Search case database using that criterion
• Additional criteria available (fts, metadata)
Case Weighting
• Addresses biases in statistical analysis by
  • Over-weighting (> 1.0) cases in under-represented areas
  • Under-weighting (< 1.0) cases in over-represented areas
• Computed using representedness

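Both ideas on this slide can be illustrated over a discrete variable. The sketch below is an assumption-laden stand-in, not GLOBE's method: it weights each case by the ratio of population share to sample share in its bin, and it flags a gap with a simple ratio threshold where the slide speaks of a "significantly higher" density (which suggests a statistical test). All names and data are hypothetical.

```python
from collections import Counter


def bin_shares(labels):
    """Share of items falling into each bin."""
    counts = Counter(labels)
    return {b: c / len(labels) for b, c in counts.items()}


def case_weights(sample_bins, population_bins):
    """Weight per bin = population share / sample share.

    Weights above 1.0 mark under-represented bins, below 1.0 over-represented ones.
    """
    pop, smp = bin_shares(population_bins), bin_shares(sample_bins)
    return {b: pop.get(b, 0.0) / share for b, share in smp.items()}


def study_gaps(sample_bins, population_bins, ratio=2.0):
    """Bins whose population share is at least `ratio` times the sample share.

    The threshold is a placeholder for a proper significance test.
    """
    pop, smp = bin_shares(population_bins), bin_shares(sample_bins)
    return sorted(b for b, share in pop.items() if share >= ratio * smp.get(b, 0.0))


# Made-up example: most land units are remote, but most case studies are not.
population = ["remote"] * 60 + ["accessible"] * 40
sample = ["remote"] * 2 + ["accessible"] * 8

weights = case_weights(sample, population)
gaps = study_gaps(sample, population)
print(weights, gaps)
```

In this toy data the remote cases are weighted up and the accessible cases weighted down, and the remote bin is the one a gap search would target for new case studies.
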
The GLOBE Application
• Our platform for better Land Change Science
  • By improving workflows
  • As a social/collaborative platform
• Formally introduced to GLP OSM in March 2014
• Features
  • Allows researchers to create and manage case studies and their geometry
  • Integrates global data layers to augment user cases
  • Provides real-time analytics and visual tools
    • Similarity search
    • Representativeness analysis

Global Data
• Organized into a Discrete Global Grid [Sahr, White, and Kimerling, 2003]
  • ISEA Aperture 3, Hexagonal
  • 1.5M equal-area hexagons of 96 km² each at resolution 12 (native GLOBE resolution)
  • Downsampled grid at resolution 10 (863.8 km²) for approximate calculations
• Currently 75 global variables; variables can be processed and submitted to GLOBE
  • Human, remote sensing, biological, surface, climate

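The two cell areas quoted on this slide can be cross-checked with a little arithmetic. The sketch below assumes that an aperture 3 hexagonal grid has 10 · 3^r + 2 cells over the whole globe at resolution r, and uses an approximate Earth surface area of 510,065,622 km²; neither figure comes from the slides.

```python
# Assumption: approximate surface area of the Earth in km^2.
EARTH_SURFACE_KM2 = 510_065_622
APERTURE = 3


def cell_count(resolution: int) -> int:
    """Cells covering the whole globe, assuming the 10 * 3**r + 2 formula."""
    return 10 * APERTURE ** resolution + 2


def mean_cell_area_km2(resolution: int) -> float:
    """Average cell area if the cells tile the whole surface equally."""
    return EARTH_SURFACE_KM2 / cell_count(resolution)


for r in (10, 12):
    print(r, cell_count(r), round(mean_cell_area_km2(r), 1))

# Two resolution steps at aperture 3 change the cell area by a factor of about 9.
print(mean_cell_area_km2(10) / mean_cell_area_km2(12))
```

Under these assumptions the areas come out close to 863.8 km² and 96 km². The whole-globe cell count at resolution 12 is about 5.3 million, so the 1.5M hexagons on the slide presumably cover only a subset of the globe, such as land; that reading is an inference, not something the slide states.
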
GLOBE Cases
• GLOBE GES team has georeferenced and entered 630 cases
• Currently a total of 927 georeferenced, completed cases

In Summary
• Representativeness is an issue anywhere inferences are made from samples
• Representedness is a companion piece that enables geovisualization and gap search
• Can be implemented many ways
  • Classical hypothesis test (χ²)
  • Monte Carlo methods: f-divergence, KS distance
• GLOBE application enables representativeness workflow for land change science
  • Real-time assessment & visualizations
  • Gap search and case weighting

In the Pipeline
• Multidimensional Analysis
  • Quantifying the impact of data scarcity (small sample size)
  • Heuristic tools for guiding the user
  • Improved visual tools
• Dimensionality reduction
  • Identifying if and when it is possible
• Automated exploratory analysis
  • Helping the user to identify what analysis they should be running

Thanks!
• Visit us at http://globe.umbc.edu

Conceptual Overview
• Global Data: discrete global grid
• GLOBE GCE: analytical & computational engine
• GLOBE Web App: visual & interactive tools
• GLOBE Cases: geography + data