Streamflow - Programming Model for Data Streaming in Scientific Workflows


Presentation Transcript


  1. Streamflow - Programming Model for Data Streaming in Scientific Workflows Chathura Herath

  2. Outline • Background • Motivation • Approach • Architecture • Programming Model • Domain application

  3. Background • Scientific workflows are a good programming model for scientific computing • Scientific domains have high volumes of data • Most of the data comes from sensors, catalogs and other experiments. • Most data sources are data streams or can be modeled as streams.

  4. Motivation • Huge data sources require preprocessing, mining and scaling down of data volumes. • Compute resources are limited relative to the scale of the data. • Currently experts determine which data sets contain the interesting data • Preserve the workflow programming model for the user. • Users are familiar with DAG execution • Define workflow patterns for use as new workflow semantics that can capture data streams • Goal • Real-time data mining, filtering and preprocessing • Data-driven reactive workflow systems • Feedback systems

  5. Data to Information [Diagram labels: Data Storage, Supercomputing, Information Rate, Data Rate]

  6. Data to Information [Diagram labels: Data Storage, Supercomputing, Scientific workflow, Information Rate, Data Rate, Stream Mining]

  7. Streamflow [Diagram labels: Data Storage, Supercomputing, Information Rate, Streamflow, Data Rate]

  8. Why Workflow Streaming? • Most scientific workflows are static • A considerable portion of the data for scientific workflows is produced by scientific sensors • Sensor data tend to behave as repeating data streams • Is it possible to provide a programming abstraction to capture data search and filtration?

  9. Possible approaches • Completely decoupled systems where workflows and data mining are separate. • Data mining rules or queries would produce outputs which may get refined again and again. • Some interesting event would launch the workflow. • It may lose the insight and abstraction provided by the workflows • The data mining itself may have complex data and control dependencies • Pure workflow approach • Workflow languages are not designed for streaming

  10. Stream Integration Approach • Complex Event Processing (CEP) system • Interacts with the streams • Filters and bundles data • Publishes input datasets to workflows • Workflow system • Handles the scientific computations • Gets invoked when a dataset of a specified nature is published to the CEP system [Diagram labels: Streamflow Composer, Streamflow Semantics, Esper, StreamBase, Workflow, Resources]
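
The division of labor on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Streamflow implementation: `StreamTrigger`, its `predicate` and `bundle_size` parameters, and the use of a plain callable as the "workflow" are all hypothetical names introduced here, and bundling by a fixed event count is a simplification.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]


class StreamTrigger:
    """Hypothetical stand-in for the CEP layer: it filters events, bundles the
    ones that pass, and hands each full bundle to a workflow callable."""

    def __init__(
        self,
        predicate: Callable[[Event], bool],
        bundle_size: int,
        workflow: Callable[[List[Event]], None],
    ) -> None:
        self.predicate = predicate
        self.bundle_size = bundle_size
        self.workflow = workflow
        self._bundle: List[Event] = []

    def on_event(self, event: Event) -> None:
        # Filter: events that fail the predicate never reach the workflow.
        if not self.predicate(event):
            return
        # Bundle: collect matching events until the dataset is complete.
        self._bundle.append(event)
        if len(self._bundle) == self.bundle_size:
            dataset, self._bundle = self._bundle, []
            # Publish: the workflow is invoked once per completed dataset.
            self.workflow(dataset)


# Record each dataset the "workflow" receives instead of running a computation.
invocations: List[List[Event]] = []
trigger = StreamTrigger(
    predicate=lambda e: e["reflectivity"] > 3.5,
    bundle_size=2,
    workflow=invocations.append,
)
for value in [1.0, 4.0, 3.6, 2.0, 5.0]:
    trigger.on_event({"reflectivity": value})
```

With these five readings the workflow is invoked once, with the two events 4.0 and 3.6; the event 5.0 stays buffered until another matching event arrives.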

  11. STREAMing workFLOWS - Streamflows • Streamflows are an enhancement of workflows to handle data streams • Allow the complex experimental logic to be encapsulated using scientific workflows • Allow the management of large streams of data with stream mining • Provide a programming model similar to workflow composition to handle streams [Diagram labels: Workflow, Streamflow]

  12. Stream Integration select * from DataminedRUCDATA(reflectivity > 3.5).win:time_batch(1h)
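
To make the intent of the query concrete, here is a rough Python emulation of "keep events with reflectivity above 3.5 and release them in one-hour batches". It is a sketch of the idea, not of the stream engine's actual behavior: the `time_batch` function, the `timestamp` field (in seconds), anchoring the first window at the first event, skipping empty batches, and flushing the last partial batch at end of input are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, float]


def time_batch(
    events: Iterable[Event], window_seconds: float, threshold: float
) -> Iterator[List[Event]]:
    """Yield one list of events per fixed-length window, keeping only events
    whose reflectivity exceeds the threshold.

    Events must arrive ordered by their 'timestamp' field (seconds).
    """
    batch: List[Event] = []
    window_end = None
    for event in events:
        if window_end is None:
            # Assumption: the first window starts at the first event seen.
            window_end = event["timestamp"] + window_seconds
        # Close every window that ended before this event arrived.
        while event["timestamp"] >= window_end:
            if batch:
                yield batch
                batch = []
            window_end += window_seconds
        if event["reflectivity"] > threshold:
            batch.append(event)
    if batch:
        # Assumption: flush the trailing partial window when input ends.
        yield batch


observations = [
    {"timestamp": 0.0, "reflectivity": 4.0},
    {"timestamp": 1800.0, "reflectivity": 2.0},
    {"timestamp": 3000.0, "reflectivity": 3.9},
    {"timestamp": 3600.0, "reflectivity": 5.0},
    {"timestamp": 9000.0, "reflectivity": 4.2},
]
batches = list(time_batch(observations, window_seconds=3600.0, threshold=3.5))
batch_times = [[e["timestamp"] for e in b] for b in batches]
```

Each batch would correspond to one input dataset published to a workflow.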

  13. Workflow Semantics • Conventional SOA components can be used as is. • Workflow components may change behavior based on input data or stream. • Filter nodes change the “cardinality” of the output stream • Aggregator nodes aggregate data over a window. • Generator nodes interface external streams to the Streamflow
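
The three node kinds named on this slide map naturally onto Python generators. The sketch below is illustrative only: the function names, the count-based window, and the choice to drop a trailing partial window are assumptions, not the semantics defined by Streamflow.

```python
from typing import Callable, Iterable, Iterator, List


def generator_node(source: Iterable[float]) -> Iterator[float]:
    """Expose an external source as a stream inside the flow."""
    yield from source


def filter_node(
    stream: Iterable[float], keep: Callable[[float], bool]
) -> Iterator[float]:
    """Pass through only matching items, so the output stream can carry
    fewer items than the input stream (a change in cardinality)."""
    for item in stream:
        if keep(item):
            yield item


def aggregator_node(
    stream: Iterable[float],
    window: int,
    combine: Callable[[List[float]], float],
) -> Iterator[float]:
    """Emit one combined value per `window` consecutive items.
    Assumption: an incomplete trailing window is discarded."""
    buffer: List[float] = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == window:
            yield combine(buffer)
            buffer = []


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


readings = [1.0, 4.0, 6.0, 2.0, 5.0, 7.0, 3.8]
filtered = list(filter_node(generator_node(readings), lambda r: r > 3.5))
averages = list(
    aggregator_node(
        filter_node(generator_node(readings), lambda r: r > 3.5),
        window=2,
        combine=mean,
    )
)
```

Seven readings enter the flow, five survive the filter, and the aggregator emits two averages, which shows how each node changes the number of items downstream.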

  14. Programming model • Join semantics • Constant inputs need to be matched to streams. • Inputs are streamed into the workflow from the stream engine • Outputs are published back by stream sinks and may be used for feedback.
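
One plausible reading of the join semantics is that each stream event is paired with the same set of constant inputs to form a complete input set for one workflow run. The sketch below assumes exactly that; `join_inputs`, the `port` name, and the list used as a stream sink are hypothetical and stand in for whatever the real system provides.

```python
from typing import Any, Dict, Iterable, Iterator, List


def join_inputs(
    constants: Dict[str, Any], stream: Iterable[Any], port: str
) -> Iterator[Dict[str, Any]]:
    """Pair every stream event with the constant inputs, producing one
    complete input set per event. `port` names the streamed input."""
    for event in stream:
        inputs = dict(constants)  # copy so runs do not share state
        inputs[port] = event
        yield inputs


def scale_workflow(inputs: Dict[str, Any]) -> float:
    """Toy workflow body: multiply the streamed reading by a constant."""
    return inputs["scale"] * inputs["reading"]


# A plain list stands in for a stream sink that publishes results back.
sink: List[float] = []
for input_set in join_inputs({"scale": 2.0}, [1.5, 2.0], port="reading"):
    sink.append(scale_workflow(input_set))
```

Values collected in the sink could be fed back into the stream engine as a new stream, which is the feedback path the slide mentions.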

  15. Evaluation • Deployment overhead • The extra overhead is Θ(1) because the workflow is flat. • The extra overhead is comparable to normal workflow deployment, because new workflows may need to be deployed • Runtime latency • Latency from an event arriving at the framework to its delivery to the workflow.

  16. Evaluation

  17. Domains • Meteorology • Astronomy [Figure labels: Storms Forming, Forecast Model, Streaming Observations, Data Mining, On-Demand Grid Computing]

  18. Related work • B. Biornstad. A workflow approach to stream processing. PhD thesis, Computer Science Department, ETH Zurich. • Y. Liu, N. Vijayakumar, and B. Plale. Stream processing in data-driven computational science. In Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, pages 160–167. IEEE Computer Society, Washington, DC, USA, 2006. • J. Buck, S. Ha, E. Lee, and D. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, 4(2):155–182, 1994. • DataTurbine • Y. Cai et al. MAIDS: Mining Alarming Incidents from Data Streams. Automated Learning Group, NCSA, University of Illinois at Urbana-Champaign, U.S.A.

  19. Future work • Develop a formal model for the workflow semantics • Event order guarantees • Handling of missing streams • Provenance for data streams

  20. Questions?
