A Tale of Two Workflows
Roger Barga, Microsoft Research (MSR)
Nelson Araujo, Dean Guo, Jared Jackson, Microsoft Research
The creative input of the Trident MSR summer ‘08 interns
MSR (Trident) Summer ‘08 Interns
• Eran Chinthaka, Indiana University
• David Koop, University of Utah
• Satya Sahoo, Wright State University
• Matt Valerio, Ohio State University
Trident Project Objectives
Demonstrate that a commercial workflow management system can be used to implement scientific workflows
Offer this system as an open source accelerator
• Write once, deploy and run anywhere;
• Abstract parallelism (HPC and many-core);
• Automatic provenance capture, for both workflow and results;
• Costing model for estimating the resources required;
• Integrated data storage and access, in particular cloud computing;
• Reproducible research.
Develop this in the context of real eScience applications
• Make sure we solve a real problem for actual project(s).
And this is where things got really interesting...
Workflow and the Neptune Array
Workflow is a bridge between the underwater sensor array (instrument) and the end users
Mandate
• Make data available to researchers in (near) real time;
• Store data for long-term time-series studies.
Features
• Allow human interaction with instruments;
• Deployed instruments will change regularly, as will the analysis;
• Facilitate automated, routine “survey campaigns”;
• Support automated event detection and reaction;
• Users can access through the web (or custom client software);
• Best effort is acceptable for most workflows.
Pan-STARRS Sky Survey
Slide compliments of Yogesh Simmhan
• One of the largest visible light telescopes
• 4 unit telescopes acting as one
• 1 gigapixel per telescope
• Surveys entire visible universe in 1 week
• Catalog solar system, moving objects/asteroids
• ps1sc.org: UHawaii, Johns Hopkins, …
Haleakala Observatory, Maui, Hawaii!!
Pan-STARRS Highlights
Slide compliments of Yogesh Simmhan
• 30TB of processed data/year
• ~1PB of raw data
• 5 billion objects; 100 million detections/week
• Updated every week
• SQL Server 2008 for storing detections
• Distributed view over spatially partitioned databases
• Replicated for fault tolerance
• Windows 2008 HPC Cluster
• Schedules workflows, monitors system
Pan-STARRS Data Flow
Slide compliments of Yogesh Simmhan
[Figure: data-flow diagram. Recoverable labels: IPP; csv files; Shared Data Store; Load Merge 1 to 6; partitions S1 to S16; L1, L2; Slice 1 to Slice 8; HOT: S1 to S16; WARM: s1 to s16 (in a different order); Main Distributed View.]
The Pan-STARRS Workflows
Slide compliments of Yogesh Simmhan
← Behind the Cloud || User-facing services →
[Figure: the Pan-STARRS Science Cloud. Recoverable labels: Telescope; Image Processing Pipeline (IPP); CSV Files; Load DB; Load Workflow; Merge Workflow; Flip Workflow; Cold Slice DB 1 and 2; Warm Slice DB 1 and 2; Hot Slice DB 1 and 2; Distributed View; CASJobs Query Service; MyDB; Validation Exception Notification; Slice Fault Recover Workflow; Data Valet Workflows; Data Consumer Queries & Workflows; Data Creators; Astronomers (Data Consumers); Admin & Load-Merge Machines; Production Machines.]
Data flows in one direction →, except for error recovery
Supporting Provenance for the Scientist & the Data Valet
Pan-STARRS Architecture
Slide compliments of Yogesh Simmhan
Workflow is just a member of the orchestra
Workflow and Pan-STARRS
Workflow carries out the data loading and merging
Features
• Support scheduling of workflows for nightly load and merge;
• Offer only controlled (protected) access to the workflow system;
• Workflows are tested, hardened and seldom change;
• Not a unit of reuse or knowledge sharing;
• Fault tolerance – ensure recovery and cleanup from faults;
• Assign cleanup workflows to undo state changes;
• Provenance as a record of state changes (system management);
• Performance monitoring and logging for diagnostics;
• Must “play well” in a distributed system;
• Provide ground truth for the state of the system.
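One way to read "assign cleanup workflows to undo state changes" is as a compensation pattern: each step carries its own undo, and a fault triggers the undos of the completed steps in reverse order. The Python sketch below is only an illustration under that assumption. `Step` and `run_with_cleanup` are hypothetical names, not part of Trident or of the Pan-STARRS system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One workflow step paired with the cleanup that undoes its state change."""
    name: str
    action: Callable[[], None]
    cleanup: Callable[[], None]


def run_with_cleanup(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on a fault, clean up completed steps in reverse order."""
    done: List[Step] = []
    for step in steps:
        try:
            step.action()
            log.append(f"completed {step.name}")
            done.append(step)
        except Exception as exc:
            log.append(f"fault in {step.name}: {exc}")
            for finished in reversed(done):
                finished.cleanup()
                log.append(f"cleaned up {finished.name}")
            return False
    return True


def _failing_merge() -> None:
    # Stand-in for a merge step that faults part-way through.
    raise RuntimeError("merge failed")


state = {"loaded": False}
log: List[str] = []
ok = run_with_cleanup(
    [
        Step("load", lambda: state.update(loaded=True), lambda: state.update(loaded=False)),
        Step("merge", _failing_merge, lambda: None),
    ],
    log,
)
```

The log doubles as a simple record of state changes, which is the role the slide gives to provenance in this setting.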
Other Partner Applications
Differing and Lurking Requirements
• I want to do this more than once and get exactly the same answer.
• I want to do this more than once, but don’t care if I get exactly the same answer.
• I’m only going to do this once and don’t care about keeping the data or the results long term (but I need to remember the inputs).
• I want to store the data in <local file, SQL Server, in the cloud, etc>.
• I want full provenance to validate a result, OPM compliant.
• I want to use my own provenance management system.
• Each group may wish a different UI (no WF), or authoring tool.
• I want any data from any agency or investigator even if the measurement sites aren’t quite co-located; I’ll deal with it later.
• I only want NCAR, MBARI, etc. data because I trust it.
• I know that Jon really wants my results to drive his model and I want to share my workflow and executables.
Each of these potentially impacts the technology, user interface, and API design
Why pay the price to architect?
Divide and conquer
• You can see all of the application components;
• Different components share interfaces;
• Different components developed by different people work together, even if someone else implements them.
Go from working to working
• Change one component, the rest keep working;
• Scale up or down over time;
• Testing components independently is possible.
Full design, incremental implementation
• Build what you need as you go;
• Integrating new data sources, data types, and analysis tools leverages the stable interfaces.
Plug and play…
Why not architect?
• It’s hard
  • You have to accumulate user scenarios, map them to the technical components, and then understand the implications.
  • What are the dimensions of change/flexibility?
• It doesn’t feel like you’re making progress
  • You spend a lot of time discovering what you already know. User scenarios often contain many of the same technical requirements again and again.
• It’s not fun
  • You have to keep your interfaces stable longer (because you have dependencies on them), so that great idea has to wait for the next release.
  • The design discussions can be rather “energetic”.
• It takes a team commitment
How we decided to architect
• Drive workflow development with 20 queries (workflows)
  • representative of the science
  • diverse enough to drive the design
20 Workflows for Neptune
How we decided to architect
• Drive workflow development with 20 queries (workflows)
  • representative of the science
  • diverse enough to drive the design
• Introduce a registry as single ground truth for all state and objects.
Trident Registry
Registry Management
• Provides ground truth state for Trident
• Captures provenance for workflows
• Records information on running jobs
• Metadata for all objects in Trident
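To make the registry's four roles concrete, here is a minimal in-memory sketch in Python. It is an illustration only: the `Registry` class, its method names, and the identifiers `wf-load` and `job-1` are all hypothetical and say nothing about how the Trident registry is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Registry:
    """Hypothetical single source of truth: metadata, job state, and provenance."""
    metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    jobs: Dict[str, str] = field(default_factory=dict)
    provenance: List[Tuple[str, str, str]] = field(default_factory=list)

    def register(self, object_id: str, **meta: Any) -> None:
        """Store metadata for any object (workflow, activity, data product)."""
        self.metadata[object_id] = dict(meta)

    def set_job_state(self, job_id: str, workflow_id: str, state: str) -> None:
        """Record the current job state and append it to the provenance trail."""
        if workflow_id not in self.metadata:
            raise KeyError(f"unknown workflow: {workflow_id}")
        self.jobs[job_id] = state
        self.provenance.append((job_id, workflow_id, state))


registry = Registry()
registry.register("wf-load", kind="workflow", owner="data valet")
registry.set_job_state("job-1", "wf-load", "Started")
registry.set_job_state("job-1", "wf-load", "Completed")
```

Every other service asks the registry for the current state rather than keeping its own copy, which is what makes it the ground truth.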
How we decided to architect
• Drive workflow development with 20 queries (workflows)
  • representative of the science
  • diverse enough to drive the design
• Introduce a registry as single ground truth for all state and objects;
• Introduce an event blackboard for service communication.
Trident Blackboard Overview
Matt Valerio, Satya Sahoo, Jared Jackson
[Figure: blackboard overview diagram. Recoverable labels, grouped as they appear:]
• Shared Ontology
• Publishers: Logging (Tracking, Design); Monitoring (Tracking, Resource Usage, User-Defined, …); Provenance (Tracking, Design, Data); other publishers
• Blackboard: Publisher Store, Subscription Store, …
• BlackboardMessage: concept1 = value1, concept2 = value2
• Subscription Profile: concept1, concept3
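The blackboard idea can be sketched as a small publish/subscribe hub in which messages are sets of concept-value pairs and each subscriber registers the concepts it cares about. The Python below is a sketch under those assumptions. The `Blackboard` class and its methods are hypothetical and are not the Trident API.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Message = Dict[str, str]  # concept -> value pairs
Handler = Callable[[Message], None]


class Blackboard:
    """Hypothetical hub that routes concept-value messages to subscribers."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[FrozenSet[str], Handler]] = []

    def subscribe(self, concepts: Iterable[str], handler: Handler) -> None:
        """Register a subscription profile (a set of concepts) and its handler."""
        self._subscriptions.append((frozenset(concepts), handler))

    def publish(self, message: Message) -> None:
        """Deliver to each subscriber only the concepts in its profile."""
        for concepts, handler in self._subscriptions:
            matched = {c: v for c, v in message.items() if c in concepts}
            if matched:
                handler(matched)


received: List[Message] = []
board = Blackboard()
board.subscribe(["concept1", "concept3"], received.append)
board.publish({"concept1": "value1", "concept2": "value2", "concept3": "value3"})
board.publish({"concept2": "value2"})  # no subscribed concept, so nothing is delivered
```

Because publishers and subscribers only agree on concept names from the shared ontology, either side can be replaced without the other noticing.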
Workflow Tracking
• Workflow Events: Aborted, Changed, Completed, Created, Idle, Loaded, Persisted, Resumed, Started, Suspended, Terminated, Unloaded
• Activity Events: Cancelling, Closed, Compensating, Executing, Faulting, Initialized
• User Events: User-defined
• Tracking Data: Instance ID, Activity Type, Activity Name, Timestamp, …
[Figure: tracking data is mapped through the ontology to concept-value pairs (concept1 = value1 … concept4 = value4), filtered against the aggregate subscription profile (concept1, concept3), and sent to the blackboard as a BlackboardMessage (concept1 = value1, concept3 = value3).]
Why filter at the publisher?
• Minimize network usage
• Optimize performance (more messages/sec)
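The publisher-side pipeline described above (map, filter, send) might look like the following Python sketch. The field-to-concept mapping and the function names are assumptions made for illustration; the slide does not say which tracking field maps to which concept.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical mapping from tracking-record fields to ontology concepts.
FIELD_TO_CONCEPT = {
    "instance_id": "InstanceId",
    "activity_type": "ActivityType",
    "activity_name": "ActivityName",
    "timestamp": "Timestamp",
}


def aggregate_profile(profiles: Iterable[Set[str]]) -> Set[str]:
    """Union of every subscriber's concepts: the only concepts worth sending."""
    wanted: Set[str] = set()
    for profile in profiles:
        wanted |= profile
    return wanted


def build_message(record: Dict[str, str], wanted: Set[str]) -> Optional[Dict[str, str]]:
    """Map a tracking record to concept-value pairs and drop unsubscribed concepts."""
    pairs = {FIELD_TO_CONCEPT[k]: v for k, v in record.items() if k in FIELD_TO_CONCEPT}
    filtered = {c: v for c, v in pairs.items() if c in wanted}
    return filtered or None  # None means nothing goes over the network


wanted = aggregate_profile([{"InstanceId"}, {"InstanceId", "ActivityName"}])
message = build_message(
    {
        "instance_id": "42",
        "activity_type": "Sequence",
        "activity_name": "LoadCsv",
        "timestamp": "2008-08-01T00:00:00Z",
    },
    wanted,
)
skipped = build_message({"timestamp": "2008-08-01T00:00:01Z"}, wanted)
```

Filtering before sending means a record that no subscriber cares about never leaves the publisher, which is where the network and throughput savings come from.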
Workflow Monitoring
Goals
• Real-time resource usage graphs (e.g. Silverlight)
• Subscriber-initiated
• Activity-initiated
• Creation of cost models for each type of activity
Implementation
• Subscribers listen for a specific resource concept
• The monitoring service polls a resource monitor at regular intervals
• The results are sent to the blackboard
[Figure: CPU Monitor chart, 0% to 100% over time, for SequenceActivity1, CpuIntensiveActivity1, MemoryIntensiveActivity1, CpuIntensiveActivity2.]
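The implementation bullets above describe a polling loop. A minimal Python sketch of that loop follows, with the resource monitor, the publish call, and the sleep function all injected so the example runs without real hardware counters. The names `poll_resource` and `CpuUsagePercent` are hypothetical, and the final average is only a crude stand-in for a cost model.

```python
from statistics import mean
from typing import Callable, Dict, List


def poll_resource(
    read_usage: Callable[[], float],
    publish: Callable[[Dict[str, float]], None],
    concept: str,
    samples: int,
    sleep: Callable[[float], None],
    interval_s: float = 1.0,
) -> None:
    """Poll a resource monitor at regular intervals and publish each reading."""
    for _ in range(samples):
        publish({concept: read_usage()})
        sleep(interval_s)


# Fake readings stand in for a real CPU monitor; a real service would pass time.sleep.
readings = iter([12.5, 80.0, 35.0])
published: List[Dict[str, float]] = []
poll_resource(
    read_usage=lambda: next(readings),
    publish=published.append,
    concept="CpuUsagePercent",
    samples=3,
    sleep=lambda seconds: None,
)

# A very rough per-activity cost estimate: the average of the observed usage.
average_cpu = mean(m["CpuUsagePercent"] for m in published)
```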
How we decided to architect
• Drive workflow development with 20 queries (workflows)
  • representative of the science
  • diverse enough to drive the design
• Introduce a registry as single ground truth for all state and objects;
• Introduce an event blackboard for service communication;
• Choose specific interfaces between components and stick to them
  • APIs, object models, browser user screens and forms
  • Everything can be replaced and/or augmented
Trident Registry Provider API
Eran Chinthaka and Nelson Araujo
[Figure: provider API layering diagram. Recoverable labels: Native, Managed, API, Web Services.]
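The point of a provider API is that callers depend on one stable interface while the backend behind it can be swapped. The Python sketch below illustrates that idea only. `RegistryProvider`, its two implementations, and `record_owner` are hypothetical, and the sketch does not model the native, managed, or web-service layers named on the slide.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional


class RegistryProvider(ABC):
    """Hypothetical stable interface; callers depend on this, not on a backend."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class InMemoryProvider(RegistryProvider):
    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


class JsonFileProvider(RegistryProvider):
    def __init__(self, path: str) -> None:
        self._path = path

    def _load(self) -> Dict[str, str]:
        if not os.path.exists(self._path):
            return {}
        with open(self._path, encoding="utf-8") as handle:
            return json.load(handle)

    def put(self, key: str, value: str) -> None:
        items = self._load()
        items[key] = value
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(items, handle)

    def get(self, key: str) -> Optional[str]:
        return self._load().get(key)


def record_owner(provider: RegistryProvider, workflow_id: str, owner: str) -> Optional[str]:
    """Client code written once against the interface, whatever the backend."""
    provider.put(f"{workflow_id}/owner", owner)
    return provider.get(f"{workflow_id}/owner")


with tempfile.TemporaryDirectory() as folder:
    providers = [InMemoryProvider(), JsonFileProvider(os.path.join(folder, "registry.json"))]
    results = [record_owner(p, "wf-load", "data valet") for p in providers]
```

The same client function runs unchanged against both backends, which is the "everything can be replaced and/or augmented" property in miniature.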
How we decided to architect
• Drive workflow development with 20 queries (workflows)
  • representative of the science
  • diverse enough to drive the design
• Introduce a registry as single ground truth for all state and objects;
• Introduce an event blackboard for service communication;
• Choose specific interfaces between components and stick to them
  • APIs, object models, browser user screens and forms
  • Everything can be replaced and/or augmented
• Separate the user interface to solve specific tasks
  • Separate authoring UI from runtime
  • Separate execution UI from runtime. It’s a workflow – what parameters do you want to set? What parts do you want to pause? Do over? Never do again?
  • Some things only work on the desktop; some things work best in the cloud. Enable users to select at runtime.
Trident Interface for Neptune
Workflow Selection
David Koop, Nelson Araujo
• Show me the workflows that
  • Process these data sets (sensor types);
  • Produce this kind of result (type of visualization, analysis);
• Order these workflows by time last used;
• Now apply this workflow to “this” area of the ocean.
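The selection queries above amount to filtering a workflow catalog by input and output type and then sorting by recency. A minimal Python sketch, assuming a hypothetical `WorkflowEntry` record and made-up catalog entries (none of these names or values come from Trident or Neptune):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class WorkflowEntry:
    """Hypothetical catalog record describing one workflow."""
    name: str
    inputs: FrozenSet[str]  # sensor types the workflow can process
    result_kind: str        # kind of visualization or analysis it produces
    last_used: datetime


def select_workflows(
    catalog: Iterable[WorkflowEntry],
    sensor_types: Iterable[str],
    result_kind: str,
) -> List[WorkflowEntry]:
    """Workflows that accept all given sensor types and produce the given result,
    most recently used first."""
    wanted = frozenset(sensor_types)
    matches = [
        w for w in catalog
        if wanted <= w.inputs and w.result_kind == result_kind
    ]
    return sorted(matches, key=lambda w: w.last_used, reverse=True)


catalog = [
    WorkflowEntry("temp-map", frozenset({"CTD"}), "map", datetime(2008, 7, 1)),
    WorkflowEntry("salinity-map", frozenset({"CTD", "ADCP"}), "map", datetime(2008, 8, 1)),
    WorkflowEntry("current-series", frozenset({"ADCP"}), "time-series", datetime(2008, 8, 15)),
]
selected = [w.name for w in select_workflows(catalog, ["CTD"], "map")]
```

Applying the chosen workflow to a particular area of the ocean would then be a matter of passing that region as a runtime parameter, which this sketch leaves out.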
Trident Logical Architecture
myExperiment
Questions
Scientific workflows for streamlining the data pipeline