George Ostrouchov, Statistics and Data Sciences Group, Computer Science and Mathematics Division

ORNL Statistics and Data Sciences: Understanding Variability and Bringing Rigor to Scientific Investigation. George Ostrouchov, Statistics and Data Sciences Group, Computer Science and Mathematics Division. Common evolutionary steps: experimental science and computational science.

Presentation Transcript


  1. ORNL Statistics and Data Sciences: Understanding Variability and Bringing Rigor to Scientific Investigation
     George Ostrouchov
     Statistics and Data Sciences Group
     Computer Science and Mathematics Division

  2. Common evolutionary steps: Experimental science and computational science
     • Early computational science relies largely on intuitive design and visual validation
     • Computational experiments are expensive
     • Petascale data sets are nearly as opaque as real systems; statistical analysis must select what to visualize
     • Uncertainty analysis is in its infancy
     • Statistics is a major partner in bringing computational science to the rigor and efficiency standards of experimental science
     • Methods to see through, examine, and classify variability
     • Uncertainty quantification
     • Statistical design of experiments
     Beckerman_0611

  3. ORNL statistics and data sciences group: Cutting through the fog of high-dimensional variability
     • High-dimensional multivariate classification for chem-bio mass spectrometer
     • Climate: Compact representation and insight in high-dimensional climate simulation space
     • Network intrusion detection without content analysis
     • High-dimensional density space classification
     • Visualization of high-dimensional relationships
     • Fusion: Mapping clustered heating coefficient space to tokamak cross-section space
     Ostrouchov_SDS_0611

  4. Statistical decomposition of time-varying simulation data
     • Transform to reduce non-linearity in distribution (often density-based)
     • PCA computed via SVD (or ICA, FA, etc.)
     • Construction of component movies
     • Interpretation of spatial, time, and movie components
     • Pairs of equal singular values indicate periodic motion
     $f(X_t) = S_0 + \lambda_{1t} S_1 + \dots + \lambda_{kt} S_k + E_t$
     [Figure: EETG GTC simulation data (W. Lee, Z. Lin, and S. Klasky); panels show the first two time coefficients and the spatial component $S_1$. Decomposition shows transient wave components in time.]
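The decomposition on this slide can be illustrated with a minimal sketch. It is not the group's implementation: it assumes NumPy, takes the transform f to be the identity, and uses a synthetic traveling wave in place of the GTC simulation data. The function name `decompose` and all parameter values are hypothetical. The sketch computes the time-mean field S0, takes the PCA via an SVD of the centered data, and returns the time coefficients (the factors multiplying S1 ... Sk), the spatial components, and the residual Et.

```python
import numpy as np

def decompose(X, k):
    """Split X (time steps x grid points) into S0 + coef @ S + E.

    Hypothetical helper: S0 is the time-mean field, coef holds one column of
    time coefficients per component, S holds one spatial component per row.
    """
    S0 = X.mean(axis=0)                                  # time-mean field
    U, s, Vt = np.linalg.svd(X - S0, full_matrices=False)  # PCA via SVD
    coef = U[:, :k] * s[:k]                              # time coefficients
    S = Vt[:k]                                           # spatial components
    E = X - S0 - coef @ S                                # residual
    return S0, coef, S, E, s

# Synthetic stand-in for simulation output (an assumption, not real data):
# a steady background field plus a wave traveling across a periodic grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)   # 64 grid points
t = np.arange(200)                                       # 200 time steps
omega = 2.0 * np.pi * 5 / 200                            # 5 full periods in time
wave = np.sin(3.0 * x[None, :] - omega * t[:, None])
background = 2.0 + np.cos(x)[None, :]
X = background + wave + 0.01 * rng.standard_normal((200, 64))

S0, coef, S, E, s = decompose(X, k=2)
print("leading singular values:", s[:4])
```

A traveling wave is the sum of two standing patterns a quarter period apart, so it occupies two components with nearly equal singular values. That is the pairing the slide associates with periodic motion. Plotting `coef[:, 0]` against `coef[:, 1]` would trace an approximate circle in this synthetic case.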

  5. Statistical decomposition of time-varying simulation data
     • Time dynamics of component pairs (i, j) stand out in a small multiples plot
     • Movies of any component combinations or residuals provide further views of dynamics
     $f(X_t) = S_0 + \lambda_{1t} S_1 + \dots + \lambda_{kt} S_k + E_t$

  6. Conditional small multiples provide access to high-dimensional relationships
     • Five-dimensional climate relationship involving surface heat exchange
     • Latent heat flux: x
     • Radiative heat flux: y
     • Various precipitation regimes: Rows
     • Time of day: 6 a.m. left columns, 6 p.m. right columns
     • Across a century of increasing CO2; century divided into three parts: Columns (1,4), (2,5), (3,6)
     • Hexagonal binning on a log scale controls overplotting

  7. RobustMap multiscale data decomposition
     Hierarchical data decomposition and dimension reduction with O(np) complexity
     • n data points in p dimensions
     • Axis represents difference between data groups
     • Each set of axes represents a local data group
     • Internal nodes contain axes for data splits
     • Leaf nodes contain axes for local dimension reduction
     [Figure: decomposition tree with nodes C1 through C5 holding axes b1, c1, ..., bk, ck; node C4 is NULL.]

  8. A statistical framework for guiding visualization of massive data sets
     Visualization scalability through guidance by statistical analysis
     From local models to annotated data in global context through clustering in a high-dimensional feature space
     [Figure: data volume (megabytes, gigabytes, terabytes, petabytes; labeled "More data" and "Human bandwidth overload?") against amount of analysis ("No analysis" to "More statistical analysis").]
     Browse a petabyte of data?
     • To see 1% of a petabyte at 10 MB per second takes 35 work days!
     • Statistical analysis must select views or reduce to quantities of interest in addition to fast rendering of data
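The 35-work-day figure on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and an 8-hour work day. The slide states neither assumption, and the function name `work_days_to_view` is hypothetical.

```python
def work_days_to_view(total_bytes, fraction, bytes_per_second, hours_per_day=8.0):
    """Work days needed to stream `fraction` of `total_bytes` at a fixed rate."""
    seconds = total_bytes * fraction / bytes_per_second
    return seconds / 3600.0 / hours_per_day

PETABYTE = 10**15   # assumed decimal petabyte
MEGABYTE = 10**6    # assumed decimal megabyte

# 1% of a petabyte viewed at 10 MB per second, 8 hours per day (assumed)
days = work_days_to_view(PETABYTE, 0.01, 10 * MEGABYTE)
print(f"{days:.1f} work days")
```

Under these assumptions, 1% of a petabyte is 10 TB, which takes 10^6 seconds (about 278 hours) at 10 MB per second. That is a little under 35 eight-hour days, consistent with the slide.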

  9. Contact
     George Ostrouchov
     Statistics and Data Sciences Group
     Computer Science and Mathematics Division
     (865) 574-3137
     ostrouchovg@ornl.gov
