Mining Complex Evolutionary Phenomena • D. Thompson, B. Gatlin, Center for Computational Systems, Mississippi State University • M. Jiang, M. Coatney, S. Mehta, S. Parthasarathy, R. Machiraju, Computer and Information Science, The Ohio State University • T-S. Choy, S. Barr, J. Wilkins, Department of Physics, Ohio State University
Insights Into Evolutions • Study evolution through simulations • Model them using continuum models • Obtain discrete models and solve • Generate data • However, …
Data Horror Stories … • 4.5 million points • 1500 time steps with full volume output every 4 time steps (375 solutions) • 750 MB per solution • 281.25 GB of data • O(10^8) grid points • Generates >10 Terabytes per day (every day) • Writes to disk 1 of every 1000 time steps (99.9% discarded) • Final database ~1 Terabyte • All analysis is done after the final database is obtained …
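The first set of figures can be checked with a few lines of arithmetic. This is only a sketch: the function name is illustrative, and decimal units (1 GB = 1000 MB) are an assumption that happens to reproduce the 281.25 GB quoted on the slide.

```python
# Worked arithmetic for the first data-volume example.
# Assumption: decimal units, 1 GB = 1000 MB.

def stored_volume_gb(time_steps, output_interval, mb_per_solution):
    """Return (number of stored solutions, total size in GB)."""
    solutions = time_steps // output_interval      # one full volume every N steps
    total_gb = solutions * mb_per_solution / 1000  # MB -> GB
    return solutions, total_gb

solutions, total_gb = stored_volume_gb(1500, 4, 750)
print(solutions, total_gb)  # 375 281.25
```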
Solutions ! • Get the rings of the smoke • Track them in time • Mine their properties • Use some science drivers
Driver 1 - CFD Vortices
CFD Of Interest – Bronchial Flow • Complex Non-rigid, Fractal-like Geometry • Deep recursive branching structure • Need insights into how flow changes • Study Vortices, swirling flow • Q: Persistence of vortex ? • Implications • Pulmonary drug delivery • Carcinogen Deposition
Object of Study: Vortices • Swirling regions • Core (Center of vortex) and swirling streamlines …
Driver 2 - Material Formation [Figure labels: Grain, Grain Boundary]
MD Of Interest – Defect Evolution • Sizes of active devices (Si-based transistors) and passive components (alloys) are shrinking • At sub-micron levels extended defects affect performance • Extended defects • Si is doped with Boron in a “Hot Bath” • Non-uniform solidification • Arise from point defects • Study evolution of point defects and formation of extended defects • Q: What structures finally remain?
Object of Study: Defects [Figure: defect atoms shown in red] • Point defects – interstitial and vacancy • Interstitial – Si atoms located at non-bulk positions
Problem Statement • Need – Locating, Characterizing & Tracking Structures in Large Domains. • Acts of Discovery and Perseverance! • Approach desired • Tied to simulations • Multiple time scales • Organized Search • Encode Structure, dynamics and relationships • Incorporate complex physics in discovery • Classification and categorization (similarity) • Verification of discovered entities for veracity • Generalize to other domains
Framework • Application (CFD, MD, …) • Sensor • Multires Transforms • Meta-stability Detection • Transient Detection • Feature Mining • Event Detection • Feature Tracking • Catalog • Spatio-temporal Rule Mining
Components • Sensors • Monitoring a stream • Swirl (CFD), Energy (MD) • Multiresolution Analysis • Temporal wavelet transform • Causal transforms • Eulerian Framework • Can be used with a spatial sub-division • Event Detection • Changes in Feature Demographics • Birth, death, continuation • Aggregation, bifurcation • Has impact on tracking
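A causal temporal transform on a sensor stream can be sketched as follows. The slides do not say which wavelet is used; a one-level, undecimated Haar step is assumed here because it needs only the current and previous sample. The function names and the threshold value are illustrative.

```python
# Sketch of a causal one-level Haar step on a sensor stream
# (e.g. swirl for CFD, energy for MD). The Haar choice is an assumption.

def causal_haar(stream):
    """Yield (average, detail) for each new sample, using only the past sample."""
    previous = None
    for x in stream:
        if previous is not None:
            yield (x + previous) / 2, (x - previous) / 2
        previous = x

def transient_steps(stream, threshold):
    """Return the sample indices whose detail magnitude exceeds the threshold."""
    hits = []
    for index, (_, detail) in enumerate(causal_haar(stream), start=1):
        if abs(detail) > threshold:
            hits.append(index)
    return hits

energy = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
print(transient_steps(energy, 0.5))  # [3]
```

Small detail coefficients suggest a meta-stable stretch of the stream, and a large one marks a candidate transient.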
Tracking - Correspondence Lagrangian Framework
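One way to sketch correspondence and the resulting events (birth, death, continuation, aggregation, bifurcation) is spatial overlap between features in consecutive time steps. Overlap matching is an assumption chosen for illustration, not necessarily the authors' method; features are represented as hypothetical sets of grid-cell ids.

```python
# Sketch: overlap-based correspondence between two consecutive time steps.
# Assumption: each feature is a set of grid-cell ids; overlap implies correspondence.

def classify_events(prev, curr):
    """prev, curr: dict feature name -> set of cell ids.
    Returns a list of (event, previous names, current names) tuples."""
    forward = {p: sorted(c for c in curr if prev[p] & curr[c]) for p in prev}
    backward = {c: sorted(p for p in prev if prev[p] & curr[c]) for c in curr}
    events = []
    for p, targets in sorted(forward.items()):
        if not targets:
            events.append(("death", (p,), ()))
        elif len(targets) > 1:
            events.append(("bifurcation", (p,), tuple(targets)))
    for c, sources in sorted(backward.items()):
        if not sources:
            events.append(("birth", (), (c,)))
        elif len(sources) > 1:
            events.append(("aggregation", tuple(sources), (c,)))
        elif len(forward[sources[0]]) == 1:
            # exactly one source that maps to exactly one target
            events.append(("continuation", (sources[0],), (c,)))
    return events

prev = {"A": {1, 2, 3, 4}, "B": {10, 11}, "C": {20, 21}, "D": {50}, "E": {60}}
curr = {"X": {1, 2}, "Y": {3, 4}, "Z": {11, 20}, "V": {50}, "W": {40}}
for event in classify_events(prev, curr):
    print(event)
```

In this example A splits into X and Y, B and C merge into Z, D continues as V, E dies, and W is born.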
Feature Mining Mechanics • Do not just use raw data • Features – A feature is a manifestation of the correlations between various parameters • Feature Mining – • Extract meta-stable features using underlying physics • Describe features as tangible shapes
Shapes Point cloud Proximity graphs Conical frusta
Similar Efforts - CFD Marusic, Kumar, Karypis, Interrante, U of Minn. Frequent subgraphs
Similar Efforts - MD • Defect is infrequent, atomsets of bulk are not! • Run common substructure discovery algorithm • Get bulk! • Remove atoms contained in common substructure atomsets • Remainder of structure is defect! [Figure labels: Alloys (Ni3Al); I1 defect]
Our Efforts: Finding Needles in a Haystack
Feature Mining 1: Classify-Aggregate [Pipeline diagram labels: Data, Transform, Tour Grid, Operator, Aggregate, Classify Points, Denoise, Track, Rank, Catalog, ROIs]
Applying To Defect Detection • Visit all atom sites • At each atom site: is it part of a defect? • Spatially aggregate atoms in the located areas • Works for quenched defects (local equilibria)
Feature Mining for Defects • Build spatially local classifiers • Define Bulk • Form Rules to define Bulk --- C1, C2,…,Cn • Typical Rules: • C1 = prescribed bond length • C2 = prescribed bond angle • Defect is not bulk
Feature Mining for Defects • Core defect atoms will satisfy C = ~C1 AND ~C2 AND ~C3 … AND ~Cn • Find neighborhood by locating atoms which satisfy D = ~C1 OR ~C2 OR ~C3 … OR ~Cn • Defect = Embed C graph in D graph • D is needed to deal with noise and uncertainty of conditions Ci • Cluster all atoms in D
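The C and D conditions above can be sketched in a few lines. The reading of "embed C graph in D graph" used here is an assumption: cluster the atoms that satisfy D over a neighbor graph, then keep only the clusters that contain at least one core atom satisfying C. The input layout (per-atom lists of rule violations and a neighbor table) is hypothetical.

```python
# Sketch of rule-based defect detection.
# Assumption: a defect is a connected cluster of D atoms containing a C atom.

def find_defects(violations, neighbors):
    """violations: dict atom -> list of bools, True where bulk rule Ci is violated.
    neighbors: dict atom -> iterable of neighboring atoms.
    Returns a list of defect clusters (sets of atoms)."""
    core = {a for a, v in violations.items() if all(v)}       # C: every rule violated
    candidate = {a for a, v in violations.items() if any(v)}  # D: some rule violated
    clusters, seen = [], set()
    for start in sorted(candidate):
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:  # depth-first walk restricted to D atoms
            atom = stack.pop()
            if atom in cluster:
                continue
            cluster.add(atom)
            stack.extend(n for n in neighbors.get(atom, ())
                         if n in candidate and n not in cluster)
        seen |= cluster
        if cluster & core:  # drop clusters without a core atom (likely noise)
            clusters.append(cluster)
    return clusters

violations = {0: [False, False], 1: [True, False], 2: [True, True],
              3: [False, True], 4: [False, False], 5: [True, False]}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(find_defects(violations, neighbors))  # [{1, 2, 3}]
```

Atom 5 violates one rule but has no core atom in its cluster, so it is treated as noise rather than a defect.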
Results – I3 Defects [Figures: I3A defect; I3B defect]
Related Work - SAL [Diagram labels: Aggregate, Classify, Original, Redescribe] • Yip & Zhao 96
Does It Always Work? (Classify-Aggregate) • Compute Swirl • Local Classification Method • Swirling regions contain vortices • False positives! • Cannot extract structures!
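The slides do not define how swirl is computed. One common point-local test, assumed here for illustration, flags a point as swirling when the velocity-gradient tensor has a pair of complex-conjugate eigenvalues, which holds exactly when the discriminant of its characteristic cubic is negative. The function name and the nested-list tensor layout are hypothetical.

```python
# Sketch of a point-local swirl test (assumed criterion: complex eigenvalues
# of the 3x3 velocity-gradient tensor, detected via the cubic discriminant).

def has_swirl(grad):
    """grad[i][j] = d(u_i)/d(x_j); returns True if eigenvalues include a complex pair."""
    (a, b, c), (d, e, f), (g, h, k) = grad
    trace = a + e + k
    trace_sq = a * a + e * e + k * k + 2 * (b * d + c * g + f * h)  # trace of grad @ grad
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    # characteristic polynomial: x^3 + p x^2 + q x + r
    p = -trace
    q = 0.5 * (trace * trace - trace_sq)
    r = -det
    disc = 18 * p * q * r - 4 * p**3 * r + p * p * q * q - 4 * q**3 - 27 * r * r
    return disc < 0

rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]  # u = (-y, x, 0)
strain = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]    # u = (x, -y, 0)
print(has_swirl(rotation), has_swirl(strain))  # True False
```

Because the test looks at a single point, it says nothing about whether neighboring points form a coherent vortex, which is consistent with the false positives noted above.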
Solution - Feature Mining 2: Aggregate-Classify (Verify) [Pipeline diagram labels: Data, Transform, Tour Grid, Operator, Verify, Aggregate, Denoise, Track, Rank, Catalog, ROIs]
Classify-Aggregate • Yellow: Good • Green: Bad • Yellow ones really swirl!
Classifier • Simple and efficient! • Can be error prone, since one verifies afterward • Point-based approach: • Label neighbors • Combinatorial: • Locally check for complete triangles
2 Swirling Criteria Verification Tools Verification
Defects at Finite Temp. • Visit all atom sites • At each atom site: is it part of a defect? • Spatially aggregate atoms in the located areas • Quench defect to verify
Current Work • Streaming • Tracking and Correspondence • Shape Descriptors • Data Structures for Data Management • Spatio-temporal associations
Summary • Computational sciences need computational instruments • Need to be scalable and use all lessons learned from parallel, distributed, streaming and out-of-core implementations • Need to exploit underlying source of data • Should provide good hooks to data-mining and intelligent systems • Needs very interdisciplinary work!