160 likes | 279 Views
Grid-based Data Stream Processing in e-Science. Richard Kuntschke 1 , Tobias Scholl 1 , Sebastian Huber 1 , Alfons Kemper 1 , Angelika Reiser 1 , Hans-Martin Adorf 2 , Gerard Lemson 3 , and Wolfgang Voges 3. 2 Max-Planck-Institut 2 für Astrophysik. 3 Max-Planck-Institut
E N D
Grid-basedData Stream Processingin e-Science Richard Kuntschke1, Tobias Scholl1, Sebastian Huber1, Alfons Kemper1, Angelika Reiser1, Hans-Martin Adorf2, Gerard Lemson3, and Wolfgang Voges3 2Max-Planck-Institut 2für Astrophysik 3Max-Planck-Institut 3für extraterrestrische 3Physik 1Lehrstuhl Informatik III: 1Datenbanksysteme 1Fakultät für Informatik 1Technische Universität München Grid-based Data Stream Processing in e-Science
Important Challenges in e-Science • In general: • Large and exponentially growing amounts of data • Distributed data archives • No unique identifiers • Uncertainty • In astrophysics: Spectral Energy Distributions (SEDs) • Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars, ...) • Generation requires spatial (astrometric) matching Grid-based Data Stream Processing in e-Science
Spatial (Astrometric) Matching Current solutions … • … load all data into main memory • Uses a lot of memory • Infeasible if memory size is insufficient • … process all data at once and deliver the complete result at the end • Inefficient • No results until all processing has completed Grid-based Data Stream Processing in e-Science
StarGlobe Grid-based P2P Data Stream Management System implemented on top of Globus In-network processing Early filtering Parallelization Pipelining Load-balancing Mobile user-defined operators Astrophysical Example Workflow Astrometric matching Performance evaluation Our Contributions Grid-based Data Stream Processing in e-Science
Publish Stream 1 Fct-Provider Subscribe transform filter Query 2 The StarGlobe Architecture filter Load mobile operators transform Grid-based Data Stream Processing in e-Science
Data-Prov. C Data-Prov. D Data-Prov. B Data-Prov. A Traditional Approach: Bring Data to Code NN_10 union Grid-based Data Stream Processing in e-Science
Data-Prov. B Data-Prov. C Data-Prov. D Data-Prov. A New Approach: Bring Code to Data Fct-Provider NN_10 NN_10 union NN_10 NN_10 NN_10 NN_10 scan scan scan scan Grid-based Data Stream Processing in e-Science
Mobile User-Defined Operators • Load user-defined operators from function provider servers in the network • Common interface for integrating external operators • Push-based iterator • Flexibility Grid-based Data Stream Processing in e-Science
StreamIterator Interface • open(Config, StreamWriter) • Configuration parameters • Writer for result stream • next(StreamIteratorEvent) • Next element in input stream • Writing output to result stream using StreamWriter.write() • close() Grid-based Data Stream Processing in e-Science
Communication betweenStreamProcessor and StreamIterator Grid-based Data Stream Processing in e-Science
Astrophysical Example Workflow Grid-based Data Stream Processing in e-Science
Distributed Query Evaluation Plan Grid-based Data Stream Processing in e-Science
Distributed Query Evaluation Plan Grid-based Data Stream Processing in e-Science
Distributed Query Evaluation Plan Grid-based Data Stream Processing in e-Science
Evaluation of Early Filtering Grid-based Data Stream Processing in e-Science
Conclusion • Synergies between research in computer science and other scientific disciplines, e.g., astrophysics • StarGlobe • Handling large data volumes efficiently • Early filtering, parallelization, pipelining • Returning first results early on • Pipelining • Flexible support of domain-specific application logic • Mobile user-defined operators • Results also applicable to other domains Grid-based Data Stream Processing in e-Science