310 likes | 461 Views
Comparing Predictive Power in Climate Data: Clustering Matters. Karsten Steinhaeuser University of Minnesota joint work with Nitesh Chawla & Auroop Ganguly 12 th International Symposium on Spatial and Temporal Databases Minneapolis, MN August 24, 2011. Outline. Motivation Networks Primer
E N D
Comparing Predictive Power in Climate Data: Clustering Matters Karsten SteinhaeuserUniversity of Minnesotajoint work withNitesh Chawla & Auroop Ganguly 12th International Symposium onSpatial and Temporal DatabasesMinneapolis, MNAugust 24, 2011
Outline • Motivation • Networks Primer • From Data to Networks • Motivating Networks in Climate Science • Descriptive Analysis and Predictive Modeling • Empirical Evaluation & Comparison • Conclusions University of Minnesota
Mining Complex Data • Complex spatio-temporal data pose unique challenges • Tobler’s First Law of Geography:“Everything is related, but nearthings more than distant.” • But are all near things equally related? • Are there phenomena explained byinteractions among distant things?(teleconnections) University of Minnesota
Networks Primer What is a Network? • Oxford English Dictionary:network, n.: Any netlike or complexsystem or collection of interrelatedthings, as topographical features,lines of transportation, ortelecommunications routes(esp. telephone lines). • My working definition:Any set of items that are connected or related to each other.(“items” and “connections” can be concrete or abstract) University of Minnesota
Networks Primer Community Detection in Networks • Identify groups of nodesthat are relatively more tightlyconnected to each other thanto other nodes in the network • Computationally challengingproblem for real-world networks University of Minnesota
From Data to Networks • Networks are pervasive insocial science, technology,and nature • Many datasets explicitlydefine network structure • But networks can also represent other types of data, framework for identifying relationships, patterns, etc. University of Minnesota
Motivating Networks in Climate Uncertainty derives from many known andoften many more unknown sources University of Minnesota
Motivating Networks in Climate • Projections of climaterely on many factors • Understanding of thephysical processes • Ability to implementthis understanding incomputational models • Assumptions aboutthe future Source: IPCC SRES and AR-4 University of Minnesota
Motivating Networks in Climate • Some processes well-understood and modeled, others much less credible • Comparison to observations shows varying skills University of Minnesota
Motivating Networks in Climate • Models cannot capture some features/processes • Comparison to observations illustrates severe geographic variability, topographic bias University of Minnesota
Motivating Networks in Climate Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Answer:Stay Tuned… University of Minnesota
High-Performance Computing Data Storage & Management Knowledge Discovery System Inputs HPCCLens - Jaguar Observations Complex Networks Data Mining HPSS Oracle DB GCM outputs Visualization ARM Data Basic OLAPCapabilities ArcGIS PowerWall System Outputs Novel Insights Decision Support Knowledge Discovery for Climate University of Minnesota
Historical Climate Data • NCEP/NCAR Reanalysis (proxy for observation) • Monthly for 60 years (1948-2007) on 5ºx5º grid • Seven variables:Sea surface temperature (SST)Sea level pressure (SLP)Geopotential height (GH)Precipitable water (PW)Relative Humidity (RH)Horizontal wind speed (HWS)Vertical wind speed (VWS) Raw Data De-Seasonalize AnomalySeries University of Minnesota
Network Construction • View global climate system as a collection of interacting oscillators[Tsonis & Roebber, 2004] • Vertices represent locations in space • Edges denote correlation in variability • Link strength estimated by correlation, low-weight edges are pruned from the network • Construct networks only for locations over the oceans • Relatively better captured by models University of Minnesota
Autocorrelation Teleconnection Geographic Properties • Examine network structure in spatial context • Link lengths computed as great-circle distance • Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain Sea Level Pressure Precipitable Water Vertical Wind Speed University of Minnesota
Clustering Climate Networks • Apply community detection to partition networks • Use Walktrap algorithm[Pons & Latapy, 2006] • Efficient and works well for dense networks • Visualize spatial pattern using GIS tools • Cluster structure suggests relationships within the climate system Sea Level Pressure Precipitable Water University of Minnesota
Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes… but that’s not all. University of Minnesota
Descriptive Predictive • Network representation is able to capture interactions, reveal patterns in climate • Validate existing assumptions / knowledge • Suggest potentially new insights or hypothesesfor climate science • Want to extract the relationships between atmospheric dynamics over ocean and land • i.e., “Learn” physical phenomena from the data University of Minnesota
Predictive Modeling • Use network clusters as candidate predictors • Create response variables for target regions around the globe (illustrated below) • Build regressionmodel relatingocean clustersto land climate University of Minnesota
Illustrative Example • Predictive model for air temperature in Peru • Long-term variability highly predictable due towell-documented relation to El Nino • Small number of clusters have majority of skill • Feature selection (blue line) improves predictions Raw DataAll ClustersFeature Selection University of Minnesota
Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes and Yes… but wait, there’s more. University of Minnesota
Results on Train/Test University of Minnesota
Predictive Skill University of Minnesota
Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes, Yes, and Yes. University of Minnesota
Variations / Extensions • Compare network approach to traditional clustering methods • k-means, k-medoids, spectral, EM, etc. • Compare different types of predictive models • (linear) regression, regression trees, neural nets, support vector regression University of Minnesota
Compare Clustering Methods University of Minnesota
Compare Predictive Models University of Minnesota
Refining Model Projections University of Minnesota
Conclusions • Networks capture behavior of the climate system • Clusters (or “communities”) derived from these networks have useful predictive skill • Statistically significantly better than predictors based on clusters derived using traditional methods • Potential for advancing climate science • Understanding of physical processes • Complement climate model simulations University of Minnesota
Upcoming Events • First International Workshop on Climate Informatics, New York, NY, August 26, 2011http://www.nyas.org/climateinformatics • NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011https://c3.ndc.nasa.gov/dashlink/projects/43/ • IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011http://www.nd.edu/~dial/climkd11/ University of Minnesota
Thanks & Questions Contact ksteinha@umn.edu Personal Homepage http://www.nd.edu/~ksteinha NSF Expeditions onUnderstanding Climate Change http://climatechange.cs.umn.edu This work was supported in part by the National Science Foundation under Grants OCI-1029584 and BCS-0826958. This research was also funded in part by the project entitled “Uncertainty Assessment and Reduction for Climate Extremes and Climate Change Impacts” under the initiative “Understanding Climate Change Impact: Energy, Carbon, and Water Initiative” within the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725. University of Minnesota