Terascale Data Organization for Discovering Multivariate Climatic Trends

Wesley Kendall, Markus Glatter, and Jian Huang (The University of Tennessee, Knoxville) • Tom Peterka, Robert Latham, and Robert Ross (Argonne National Laboratory)

Drought Analysis • Over the past ten years, drought damage has averaged about 2 billion dollars, with 10 billion dollars in damage occurring in 2002 alone [ncdc.noaa.gov] • Figure: Dried-up Xiliu Lake. Courtesy of theepochtimes.com

Drought Analysis • Many parameters and uncertainty • Low vegetation and low rainfall. What is low? • High drought index. What is high? • Extended period of time. How long is extended? • Abnormal for a given region. What is abnormal? • A system that can turn these knobs of uncertainty is highly useful for advancing scientific discovery

Mountain Pine Beetle Infestation • Mountain pine beetles have destroyed millions of trees on the west coast, causing significant ecological and economic damage • Early warning system • Need supercomputing • Need on-the-fly analysis [Hargrove et al., PERS 2009] • Figure: 2009 mountain pine beetle damage (red) to forests in Colorado. Courtesy of Bill Hargrove

A System For Full Range Analysis Of Scientific Data On-The-Fly • To thoroughly examine a problem, it must be studied at full scale, not in bits and pieces • Full range analysis is crucial to climate science • High spatial resolution is needed for global ecosystem dynamics • High temporal resolution is needed for inter-annual climate variability • On-the-fly analysis is also highly important • Integral aspect of early warning systems

Our Driving Application • NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) project • Data is large and complex • High spatial resolution, up to 250 meters • High temporal resolution, up to 1 day • Many products, many variables • Query for qualitative events like drought in 1.1 TB of MODIS data • 8-day intervals from Feb. 2000 – Feb. 2009 • 31,200 x 21,600 grid • 2 variables: vegetation index and water index
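As a rough sanity check on the figures above, the following sketch works out how many values the dataset holds and what storage per value the quoted 1.1 TB would imply. The timestep count (about 46 eight-day composites per year over roughly 9 years) and the use of decimal terabytes are assumptions, not numbers from the slides.

```python
# Back-of-the-envelope check of the dataset size quoted on the slide.
GRID_WIDTH = 31_200
GRID_HEIGHT = 21_600
NUM_VARIABLES = 2        # vegetation index and water index
DATASET_BYTES = 1.1e12   # 1.1 TB, taken as decimal terabytes (assumption)

# Assumption: about 46 eight-day composites per year over roughly 9 years.
TIMESTEPS = 46 * 9

points_per_timestep = GRID_WIDTH * GRID_HEIGHT
total_values = points_per_timestep * NUM_VARIABLES * TIMESTEPS
implied_bytes_per_value = DATASET_BYTES / total_values

print(f"grid points per timestep: {points_per_timestep:,}")
print(f"total values:             {total_values:,}")
print(f"implied bytes per value:  {implied_bytes_per_value:.2f}")
```

Under these assumptions the result comes out close to two bytes per value, which would be consistent with 16-bit storage, although the slides do not state the on-disk type.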

Our Focus • I/O performance • Good I/O implementation can yield orders of magnitude improvement • ADIOS – 1,400 vs. 1.4 seconds to write 7 GB file [Lofstead et al., IPDPS 2009] • S3D tweak – 1,443 vs. 6 seconds to write 6.5 MB file [Ross et al., SC Tutorial 2009] • No extended time to prepare data • Handle application-native formats • Scalability and load balancing

Jaguar Cray XT4 at ORNL • 7,832 quad-core 2.1 GHz AMD Opteron processors • 8 GB of memory per processor • Lustre file system, 144 Object Storage Targets (OST) • Figure: Jaguar Cray XT4. Courtesy of ornl.gov

How? Three Main Components: I/O, Querying, and Analysis

I/O Component Time-Varying Output Data

I/O Component Parallel File System Striping

I/O Component I/O across Time-Varying Files

Query Component Data Distribution for Load Balanced Queries

Query Component Parallel Queries

Analysis Component Parallel Sort

Analysis Component Analysis / Write Results

Repeat Repeat Query / Analysis Process

I/O Component

I/O Component • Ability to work with common formats • Use parallel netCDF library for netCDF-3 files • Maximize use of collective I/O in MPI-IO • More large and contiguous reads • Work with time-varying data across multiple files • Need file assignment process • Greedy assignment process is flexible and robust

Greedy File Assignment Four Processes, Three Files

Greedy File Assignment Each Process has Quota to Fill

Greedy File Assignment First Stage

Greedy File Assignment Second Stage
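The slides name the greedy assignment and its quota idea but do not spell out the two stages, so the following is only one plausible sketch. It assumes every process gets an equal quota and that files are consumed in order, each process taking as much of the current file as its quota still allows. The function name `greedy_assign` and the size units are hypothetical.

```python
from math import ceil

def greedy_assign(file_sizes, num_procs):
    """Return, per process, a list of (file_index, offset, length) pieces."""
    total = sum(file_sizes)
    quota = ceil(total / num_procs)   # amount each process tries to fill
    assignments = [[] for _ in range(num_procs)]
    proc, filled = 0, 0
    for file_index, size in enumerate(file_sizes):
        offset = 0
        while offset < size:
            # Take as much of this file as the current process can still hold.
            take = min(size - offset, quota - filled)
            assignments[proc].append((file_index, offset, take))
            offset += take
            filled += take
            # Move on to the next process once this quota is full.
            if filled == quota and proc < num_procs - 1:
                proc, filled = proc + 1, 0
    return assignments

# Four processes, three files (sizes in arbitrary units such as timesteps).
plan = greedy_assign([6, 6, 4], num_procs=4)
for rank, pieces in enumerate(plan):
    print(rank, pieces)
```

Because each process reads contiguous pieces of at most a few files, a scheme like this tends to favor the large contiguous reads mentioned on the I/O slide.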

I/O Component • No netCDF benchmarks exist for Jaguar • IOR benchmark results for raw data on Jaguar [Yu et al., IPDPS 2008] • 42 GB/s bandwidth for one file on 1K processes • 36 GB/s for one file per process on 1K processes

I/O Bandwidth Results • Achieved 28 GB/s, 75% of IOR benchmark

Query Component

Query Component • Compound Boolean range queries • Vegetation < 0.2 and water < 0.3 • Conceptual queries [Glatter et al., VIS 2008] • Regular expressions • Beginning of Spring: [-.4-.4]*T[.4-max]?* • Figure: Conceptual query for beginning of Spring
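To make the compound Boolean range query concrete, here is a minimal sketch that evaluates the slide's example (vegetation < 0.2 and water < 0.3) with a plain linear scan. It is not the tree-based index the system uses; the record layout and the names `matches` and `range_query` are assumptions for illustration.

```python
import math

def matches(record, ranges):
    """True if every named variable lies strictly inside its (low, high) range."""
    return all(low < record[name] < high for name, (low, high) in ranges.items())

def range_query(records, ranges):
    """Linear scan returning the records that satisfy all ranges at once."""
    return [r for r in records if matches(r, ranges)]

records = [
    {"vegetation": 0.15, "water": 0.10},
    {"vegetation": 0.15, "water": 0.45},
    {"vegetation": 0.60, "water": 0.20},
]

# vegetation < 0.2 and water < 0.3, written as open ranges with no lower bound.
query = {"vegetation": (-math.inf, 0.2), "water": (-math.inf, 0.3)}
hits = range_query(records, query)
print(hits)
```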

Query Component • Common query-driven visualization methods • Bitmap indexed [Stockinger et al., VIS 2005] • Optimal search, but lengthy serial index building (≈1 minute for 1.25 GB without I/O) • Large storage overhead (≈90% of dataset) • Tree based [Glatter et al., VIS 2006] • Search time depends on query, but highly scalable • Costly load balancing (≈8 hours for 105 GB with I/O) • The search phase is not the bottleneck; scalability is what is needed • Use tree based method and reduce load balancing time

Query Component • Want to establish the trade-off between the time to load balance and the time to query • Test five load balancing schemes • Hilbert-order sort, round robin distribution • Z-order sort, round robin distribution • Round robin distribution • Random distribution • No distribution
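Three of the schemes listed above can be sketched in a few lines. This is an illustration under assumptions: items are 2D grid points, the Z-order key is a standard bit-interleaved Morton code, and "random distribution" is read as assigning each item to a uniformly random process. The slides do not give these details, and all function names here are hypothetical.

```python
import random

def morton_key(x, y, bits=16):
    """Interleave the bits of x and y to get a Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def round_robin(items, num_procs):
    """Deal items out to processes in turn: item i goes to process i % num_procs."""
    return [items[p::num_procs] for p in range(num_procs)]

def z_order_round_robin(points, num_procs):
    """Sort points along the Z-order curve, then deal them out round robin."""
    ordered = sorted(points, key=lambda p: morton_key(p[0], p[1]))
    return round_robin(ordered, num_procs)

def random_distribution(items, num_procs, seed=0):
    """Send each item to a uniformly random process; no sorting is needed."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(num_procs)]
    for item in items:
        buckets[rng.randrange(num_procs)].append(item)
    return buckets

points = [(x, y) for x in range(4) for y in range(4)]
print(z_order_round_robin(points, 4))
print(random_distribution(points, 4))
```

The sorted variants pay for a global sort before distributing, while the random variant needs only one pass over the data, which is the load balancing cost the trade-off study is weighing against query time.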

Time Comparison • Random distribution, although simple, achieves the best trade-off between load balancing time and query time

Analysis Component

Analysis Component • Sort items for data coherence, then perform appropriate analysis • Use parallel sample sorting algorithm • Shown to work best on large data [Blelloch et al., TCS 1998]
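The idea behind sample sorting can be shown with a serial simulation in which each inner list stands in for one process's local data. This is a simplified sketch, not the system's parallel implementation; the oversampling factor and the function name `sample_sort` are assumptions.

```python
import bisect
import random

def sample_sort(partitions, oversample=4, seed=0):
    """Simulate sample sort over per-process lists; returns sorted per-process buckets."""
    rng = random.Random(seed)
    num_procs = len(partitions)

    # 1. Each process contributes a few random samples of its local data.
    samples = []
    for local in partitions:
        samples.extend(rng.sample(local, min(len(local), oversample)))
    samples.sort()

    # 2. Pick num_procs - 1 evenly spaced splitters from the sorted samples.
    splitters = []
    if samples:
        splitters = [samples[(i * len(samples)) // num_procs]
                     for i in range(1, num_procs)]

    # 3. Route every item to the bucket whose splitter range contains it.
    buckets = [[] for _ in range(num_procs)]
    for local in partitions:
        for item in local:
            buckets[bisect.bisect_right(splitters, item)].append(item)

    # 4. Each process sorts the bucket it received.
    return [sorted(bucket) for bucket in buckets]

rng = random.Random(1)
data = [[rng.randrange(1000) for _ in range(25)] for _ in range(4)]
result = sample_sort(data)
print([len(bucket) for bucket in result])
```

Concatenating the returned buckets in process order gives a globally sorted sequence; how evenly the buckets are filled depends on how well the samples represent the data.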

Discovering Multivariate Climatic Trends • We used this system for looking at two important problems in climate research: drought and time-lag analysis Wes, could I use this to look at global warming? Al Gore

Drought Analysis • Multivariate, complex problem space • Low vegetation index, low water index • High drought index • Prolonged period • Abnormal occurrence • Our test case • Query for vegetation index < 0.5 and water index < 0.3 • Compute drought index, keep values > 0.5 • Must be consistent for at least a month (4 timesteps) with maximum of two separate occurrences
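The persistence test in this case can be sketched as a run-length check over one location's time series. The slides do not define the drought index formula, so it is taken as an input here. Reading "maximum of two separate occurrences" as "one or two qualifying runs" is also an assumption, as are the function names.

```python
def drought_runs(flags, min_length=4):
    """Return (start, length) for each run of consecutive True flags of at least min_length."""
    runs, start = [], None
    # A trailing False sentinel closes any run that reaches the end of the series.
    for t, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_length:
                runs.append((start, t - start))
            start = None
    return runs

def is_drought(vegetation, water, drought_index, max_occurrences=2):
    """Apply the slide's thresholds per timestep, then require one or two long runs."""
    flags = [v < 0.5 and w < 0.3 and d > 0.5
             for v, w, d in zip(vegetation, water, drought_index)]
    runs = drought_runs(flags)
    return 1 <= len(runs) <= max_occurrences

vegetation = [0.4] * 6
water = [0.2] * 6
index = [0.6, 0.6, 0.6, 0.6, 0.2, 0.6]   # hypothetical drought index values
print(is_drought(vegetation, water, index))
```

Exposing the thresholds, `min_length`, and `max_occurrences` as parameters mirrors the "knobs of uncertainty" the earlier slide describes.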

Drought Analysis

2006 Mexico Drought

2001 - 2002 Canadian Prairie Drought

Time-Lag Analysis • Query for • First snow • First occurrence of 0.7 < water index < 0.9 • Vegetation green-up • First occurrence of 0.4 < vegetation index < 0.6 that happens after first snow • Compute time between events, an indicator of length of winter and severity of snow season
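For a single grid location, the time-lag computation described above can be sketched as two first-occurrence searches. The 8-day step comes from the dataset description; the function names and the choice to return `None` when either event is missing are assumptions.

```python
def first_index(series, low, high, start=0):
    """Index of the first value strictly between low and high, at or after start."""
    for t in range(start, len(series)):
        if low < series[t] < high:
            return t
    return None

def time_lag(water, vegetation, days_per_step=8):
    """Days between first snow and the first green-up that follows it, or None."""
    snow = first_index(water, 0.7, 0.9)
    if snow is None:
        return None
    green = first_index(vegetation, 0.4, 0.6, start=snow + 1)
    if green is None:
        return None
    return (green - snow) * days_per_step

water = [0.2, 0.8, 0.85, 0.3, 0.2]
vegetation = [0.5, 0.1, 0.1, 0.2, 0.45]
print(time_lag(water, vegetation))
```

In this toy series the vegetation value at the first timestep is ignored because it precedes the first snow, matching the requirement that green-up happen afterwards.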

2006 Time-Lag Analysis

Rocky Mountains In Canada

Colorado Ski Resorts

Canadian Boreal Forests

Application Timing Results • Table: Drought application timing results in seconds (columns: I/O, Preprocessing, Trend Extraction) • Table: Time-lag application timing results in seconds (columns: I/O, Preprocessing, Trend Extraction)

Conclusions • System provides robust approach for settings where many parameters need to be adjusted with immediate feedback • Greedy assignment of I/O is a practical solution with good performance • Random distribution of data, while simplistic, demonstrates advantage in performance for on-the-fly analysis

Future Work • Query datasets out of core • Handle more complex problems that will require orders of magnitude more queries

Acknowledgements • Funding for this work is primarily through the DOE SciDAC Institute of Ultra-Scale Visualization (http://www.ultravis.org) • Important components of the overall system were developed while supported in part by a DOE Early Career PI grant awarded to Jian Huang (No. DE-FG02-04ER25610) and by NSF grants CNS-0437508 and ACI-0329323 • The MODIS dataset was provided by NASA (http://modis.gsfc.nasa.gov) • This research used resources of the National Center for Computational Science at Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC, for DOE under Contract No. DE-AC05-00OR22725 • We would also like to thank Forrest Hoffman and David Erickson from Oak Ridge National Laboratory, and Bill Hargrove from the Eastern Forest Environmental Threat Assessment Center

Questions? • Wesley Kendall - kendall@eecs.utk.edu • anl.gov • utk.edu • ultravis.org

References • W. Yu, J. S. Vetter, and S. Oral, “Performance Characterization and Optimization of Parallel I/O on the Cray XT,” in IPDPS '08: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing, 2008. • M. Glatter, C. Mollenhour, J. Huang, and J. Gao, “Scalable Data Servers for Large Multivariate Volume Visualization,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 1291–1298, 2006. • J. Lofstead, F. Zheng, S. Klasky, and K. Schwan, “Adaptable, Metadata Rich IO Methods for Portable High Performance IO,” in IPDPS '09: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing, 2009. • W. W. Hargrove, J. P. Spruce, G. E. Gasser, and F. M. Hoffman, “Toward a National Early Warning System for Forest Disturbances Using Remotely Sensed Canopy Phenology,” Photogrammetric Engineering and Remote Sensing, vol. 75, no. 10, pp. 1150–1156, 2009. • G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. Smith, and M. Zagha, “An Experimental Analysis of Parallel Sorting Algorithms,” Theory of Computing Systems, vol. 31, no. 2, pp. 135–167, 1998. • M. Glatter, J. Huang, S. Ahern, J. Daniel, and A. Lu, “Visualizing Temporal Patterns in Large Multivariate Data using Textual Pattern Matching,” IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1467–1474, 2008. • K. Stockinger, J. Shalf, K. Wu, and E. Bethel, “Query-Driven Visualization of Large Data Sets,” in VIS '05: Proceedings of the IEEE Visualization Conference, October 2005, pp. 167–174. • R. Ross, R. Latham, M. Unangst, and B. Welch, “Parallel I/O in Practice,” Tutorial at SC '09: ACM / IEEE Supercomputing Conference, November 2009.
