1 / 31

Comparing Predictive Power in Climate Data: Clustering Matters

Comparing Predictive Power in Climate Data: Clustering Matters. Karsten Steinhaeuser University of Minnesota joint work with Nitesh Chawla & Auroop Ganguly 12 th International Symposium on Spatial and Temporal Databases Minneapolis, MN August 24, 2011. Outline. Motivation Networks Primer

walker
Download Presentation

Comparing Predictive Power in Climate Data: Clustering Matters

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Comparing Predictive Power in Climate Data: Clustering Matters Karsten SteinhaeuserUniversity of Minnesotajoint work withNitesh Chawla & Auroop Ganguly 12th International Symposium onSpatial and Temporal DatabasesMinneapolis, MNAugust 24, 2011

  2. Outline • Motivation • Networks Primer • From Data to Networks • Motivating Networks in Climate Science • Descriptive Analysis and Predictive Modeling • Empirical Evaluation & Comparison • Conclusions University of Minnesota

  3. Mining Complex Data • Complex spatio-temporal data pose unique challenges • Tobler’s First Law of Geography:“Everything is related, but nearthings more than distant.” • But are all near things equally related? • Are there phenomena explained byinteractions among distant things?(teleconnections) University of Minnesota

  4. Networks Primer What is a Network? • Oxford English Dictionary:network, n.: Any netlike or complexsystem or collection of interrelatedthings, as topographical features,lines of transportation, ortelecommunications routes(esp. telephone lines). • My working definition:Any set of items that are connected or related to each other.(“items” and “connections” can be concrete or abstract) University of Minnesota

  5. Networks Primer Community Detection in Networks • Identify groups of nodesthat are relatively more tightlyconnected to each other thanto other nodes in the network • Computationally challengingproblem for real-world networks University of Minnesota

  6. From Data to Networks • Networks are pervasive insocial science, technology,and nature • Many datasets explicitlydefine network structure • But networks can also represent other types of data, framework for identifying relationships, patterns, etc. University of Minnesota

  7. Motivating Networks in Climate Uncertainty derives from many known andoften many more unknown sources University of Minnesota

  8. Motivating Networks in Climate • Projections of climaterely on many factors • Understanding of thephysical processes • Ability to implementthis understanding incomputational models • Assumptions aboutthe future Source: IPCC SRES and AR-4 University of Minnesota

  9. Motivating Networks in Climate • Some processes well-understood and modeled, others much less credible • Comparison to observations shows varying skills University of Minnesota

  10. Motivating Networks in Climate • Models cannot capture some features/processes • Comparison to observations illustrates severe geographic variability, topographic bias University of Minnesota

  11. Motivating Networks in Climate Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Answer:Stay Tuned… University of Minnesota

  12. High-Performance Computing Data Storage & Management Knowledge Discovery System Inputs HPCCLens - Jaguar Observations Complex Networks Data Mining HPSS Oracle DB GCM outputs Visualization ARM Data Basic OLAPCapabilities ArcGIS PowerWall System Outputs Novel Insights Decision Support Knowledge Discovery for Climate University of Minnesota

  13. Historical Climate Data • NCEP/NCAR Reanalysis (proxy for observation) • Monthly for 60 years (1948-2007) on 5ºx5º grid • Seven variables:Sea surface temperature (SST)Sea level pressure (SLP)Geopotential height (GH)Precipitable water (PW)Relative Humidity (RH)Horizontal wind speed (HWS)Vertical wind speed (VWS) Raw Data De-Seasonalize AnomalySeries University of Minnesota

  14. Network Construction • View global climate system as a collection of interacting oscillators[Tsonis & Roebber, 2004] • Vertices represent locations in space • Edges denote correlation in variability • Link strength estimated by correlation, low-weight edges are pruned from the network • Construct networks only for locations over the oceans • Relatively better captured by models University of Minnesota

  15. Autocorrelation Teleconnection Geographic Properties • Examine network structure in spatial context • Link lengths computed as great-circle distance • Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain Sea Level Pressure Precipitable Water Vertical Wind Speed University of Minnesota

  16. Clustering Climate Networks • Apply community detection to partition networks • Use Walktrap algorithm[Pons & Latapy, 2006] • Efficient and works well for dense networks • Visualize spatial pattern using GIS tools • Cluster structure suggests relationships within the climate system Sea Level Pressure Precipitable Water University of Minnesota

  17. Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes… but that’s not all. University of Minnesota

  18. Descriptive  Predictive • Network representation is able to capture interactions, reveal patterns in climate • Validate existing assumptions / knowledge • Suggest potentially new insights or hypothesesfor climate science • Want to extract the relationships between atmospheric dynamics over ocean and land • i.e., “Learn” physical phenomena from the data University of Minnesota

  19. Predictive Modeling • Use network clusters as candidate predictors • Create response variables for target regions around the globe (illustrated below) • Build regressionmodel relatingocean clustersto land climate University of Minnesota

  20. Illustrative Example • Predictive model for air temperature in Peru • Long-term variability highly predictable due towell-documented relation to El Nino • Small number of clusters have majority of skill • Feature selection (blue line) improves predictions Raw DataAll ClustersFeature Selection University of Minnesota

  21. Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes and Yes… but wait, there’s more. University of Minnesota

  22. Results on Train/Test University of Minnesota

  23. Predictive Skill University of Minnesota

  24. Update! Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them toimprove or refine our understanding? Revised Answer:Yes, Yes, and Yes. University of Minnesota

  25. Variations / Extensions • Compare network approach to traditional clustering methods • k-means, k-medoids, spectral, EM, etc. • Compare different types of predictive models • (linear) regression, regression trees, neural nets, support vector regression University of Minnesota

  26. Compare Clustering Methods University of Minnesota

  27. Compare Predictive Models University of Minnesota

  28. Refining Model Projections University of Minnesota

  29. Conclusions • Networks capture behavior of the climate system • Clusters (or “communities”) derived from these networks have useful predictive skill • Statistically significantly better than predictors based on clusters derived using traditional methods • Potential for advancing climate science • Understanding of physical processes • Complement climate model simulations University of Minnesota

  30. Upcoming Events • First International Workshop on Climate Informatics, New York, NY, August 26, 2011http://www.nyas.org/climateinformatics • NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011https://c3.ndc.nasa.gov/dashlink/projects/43/ • IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011http://www.nd.edu/~dial/climkd11/ University of Minnesota

  31. Thanks & Questions Contact ksteinha@umn.edu Personal Homepage http://www.nd.edu/~ksteinha NSF Expeditions onUnderstanding Climate Change http://climatechange.cs.umn.edu This work was supported in part by the National Science Foundation under Grants OCI-1029584 and BCS-0826958. This research was also funded in part by the project entitled “Uncertainty Assessment and Reduction for Climate Extremes and Climate Change Impacts” under the initiative “Understanding Climate Change Impact: Energy, Carbon, and Water Initiative” within the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725. University of Minnesota

More Related