THE US NATIONAL VIRTUAL OBSERVATORY
Managing VO data and process flows
Matthew J. Graham, CACR/Caltech
NVO Summer School 2005
Overview
• Astronomical data
• VOStore/VOSpace
• Workflows
• Astrogrid workflow
• CEA
VO Wheel™

The importance of data
• Data is the raison d’être of the VO
• LSST is the data source nonpareil
  • data rates of 540 MB/s, ~16 TB in 8 hrs
  • final archive > 3 PB of data
• Well-established ways of handling distributed data:
  • SRB
  • PVFS
  • OGSA-DAI
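The LSST figures above can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^6 MB, 1 PB = 1000 TB) and, for the second figure, assumes every night delivers the full 8 hours of raw data; neither assumption is stated on the slide.

```python
# Back-of-envelope check of the LSST figures quoted on the slide.
RATE_MB_PER_S = 540   # data rate quoted on the slide
NIGHT_HOURS = 8       # observing time quoted on the slide


def volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Return the data volume in decimal terabytes (1 TB = 10**6 MB)."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1e6


nightly_tb = volume_tb(RATE_MB_PER_S, NIGHT_HOURS)
print(f"{nightly_tb:.2f} TB per night")  # 15.55 TB per night

# Assumption: raw data simply accumulates, one full night after another.
nights_to_3pb = 3000 / nightly_tb
print(f"about {nights_to_3pb:.0f} nights of raw data to reach 3 PB")
```

The result, roughly 15.6 TB, agrees with the "~16 TB in 8 hrs" quoted above.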
Data use cases
• Client has data:
  • stored locally: transfers it to service
  • stored locally: service retrieves it
  • stored elsewhere: service retrieves it
• Service generates data:
  • stores it locally: notifies client of location
  • transfers it to the client’s local store
  • transfers it to a client-designated store
VOStore
• Provides a uniform interface to existing or new data storage locations (Facade pattern)
• Structured and unstructured data both first level
• Methods:
  • get
  • put
  • list / listAll
  • importInit
  • importData (sync/async)
  • exportInit
  • exportData (sync/async)
  • delete
  • rename
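The Facade idea can be sketched as one abstract interface with interchangeable backends. Only the method names (get, put, list, delete, rename) come from the slide; the signatures, the class names MemoryStore and DirectoryStore, and the omission of the import/export methods are assumptions made for illustration, not the VOStore specification.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class VOStore(ABC):
    """Hypothetical uniform interface; only the method names are from the slide."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def list(self) -> list[str]: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...

    @abstractmethod
    def rename(self, old: str, new: str) -> None: ...


class MemoryStore(VOStore):
    """Backend that keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, name, data):
        self._items[name] = data

    def get(self, name):
        return self._items[name]

    def list(self):
        return sorted(self._items)

    def delete(self, name):
        del self._items[name]

    def rename(self, old, new):
        self._items[new] = self._items.pop(old)


class DirectoryStore(VOStore):
    """Backend that maps each item to a file in an existing directory."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)

    def put(self, name, data):
        (self._root / name).write_bytes(data)

    def get(self, name):
        return (self._root / name).read_bytes()

    def list(self):
        return sorted(p.name for p in self._root.iterdir() if p.is_file())

    def delete(self, name):
        (self._root / name).unlink()

    def rename(self, old, new):
        (self._root / old).rename(self._root / new)


def archive(store: VOStore, name: str, data: bytes) -> list[str]:
    """Client code sees only the facade, never the backend."""
    store.put(name, data)
    return store.list()
```

A client calling archive() does not need to know whether the data ends up in memory, on disk, or in some other storage system wrapped the same way.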
VOSpace
• Orchestrates VOStores:
  • data collections: directories, user-defined
  • authorisation: user groups
  • processing efficiency: where is the nearest copy?
• move
• copy
• identifiers
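The "where is the nearest copy?" question can be illustrated with a toy replica catalogue. Everything in this sketch is hypothetical: the class layout, the distance table, the site names, and the identifier ivo://example/cat1 are invented for the example and do not reflect the VOSpace interface.

```python
from dataclasses import dataclass, field


@dataclass
class VOSpace:
    """Toy catalogue mapping identifiers to the stores holding a copy."""

    # distance[(client_site, store)] is a hypothetical transfer cost.
    distance: dict
    replicas: dict = field(default_factory=dict)

    def register(self, identifier: str, store: str) -> None:
        """Record that a store holds a copy of the identified data."""
        self.replicas.setdefault(identifier, set()).add(store)

    def copy(self, identifier: str, target: str) -> None:
        """Add a replica; raises KeyError for an unknown identifier."""
        self.replicas[identifier].add(target)

    def move(self, identifier: str, source: str, target: str) -> None:
        """Relocate a replica; raises KeyError if source holds no copy."""
        stores = self.replicas[identifier]
        stores.remove(source)
        stores.add(target)

    def nearest(self, identifier: str, client_site: str) -> str:
        """Return the store with the lowest cost from the client's site."""
        candidates = sorted(self.replicas[identifier])
        return min(candidates, key=lambda s: self.distance[(client_site, s)])


space = VOSpace(distance={("caltech", "sdsc"): 1.0,
                          ("caltech", "cambridge"): 9.0})
space.register("ivo://example/cat1", "cambridge")
print(space.nearest("ivo://example/cat1", "caltech"))  # cambridge
space.copy("ivo://example/cat1", "sdsc")
print(space.nearest("ivo://example/cat1", "caltech"))  # sdsc
```

The point of the design is that the identifier stays the same while replicas are copied or moved; only the catalogue changes.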
A virtual super-peer data network?
How to manage the flows?
• Way of describing a flow:
  • processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
  • preferably XML (verbose but rigorous)
• Way of controlling a flow: engine
• e-Science vs. e-Business:
  • open-ended vs. closed
  • verification and publication
  • static vs. dynamic workflows
  • volume and type of data
  • meta-transactions
  • customer, manager and user vs. scientist
Workflow patterns
[Diagram of split and merge patterns, with branches labelled AND, XOR and Multi]
• Sequence
• Parallel split
• Synchronisation
• Exclusive choice
• Simple merge
• Multi choice
• Synchronizing merge
• Multi merge
• Discriminator
• Deferred choice
• Multiple instances with/without synchronisation
• Implicit termination
• Interleaved parallel routing
• Milestone
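Three of the simpler patterns (sequence, parallel split followed by synchronisation, and exclusive choice) can be sketched in a few lines. The function names are made up for this illustration and the steps are plain Python callables; a real engine would add state, persistence and error handling.

```python
from concurrent.futures import ThreadPoolExecutor


def sequence(value, steps):
    """Sequence: feed the output of each step into the next one."""
    for step in steps:
        value = step(value)
    return value


def parallel_split_and_sync(value, branches):
    """Parallel split, then synchronisation: run all branches, wait for all."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, value) for branch in branches]
        # result() blocks, so the list is complete only when every
        # branch has finished: this is the synchronisation point.
        return [future.result() for future in futures]


def exclusive_choice(value, predicate, if_true, if_false):
    """Exclusive choice: exactly one of the two branches runs."""
    return if_true(value) if predicate(value) else if_false(value)


print(sequence(2, [lambda x: x + 1, lambda x: x * 10]))            # 30
print(parallel_split_and_sync(3, [lambda x: x + 1, lambda x: x * 2]))  # [4, 6]
print(exclusive_choice(5, lambda x: x > 3,
                       lambda x: "big", lambda x: "small"))        # big
```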
Workflow kerfuffle
• Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, OWL-S, xWFL, XPL, INCA
• Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE
Astrogrid workflow components
• JES (Job Execution System)
  • Astrogrid workflow engine
  • Manages control flow
  • Runs steps in a controlled asynchronous fashion
• CEC (Common Execution Controller)
  • Manages step execution
  • Manages data flow
• CEA (Common Execution Architecture) apps
  • datacenters: support complex queries against archives
  • processing: consume data files and reduce them
Astrogrid workflow schematic
[Figure] Components: Registry, Command Line, Portal, Client library, JES, CEC, CEA, Datacenter, MySpace. Interactions: application list, resolve application, submit workflow, save/load workflow, save/load data.
Astrogrid workflow language
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence/flow>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true">
            <value>ftp://aServer/myResults</value>
          </parameter>
        </output>
      </tool>
    </step>
    <step name="b">…
  </sequence/flow>
  <script>… <if test=…> <while test=…> <for var=… items=…> <parfor var=… items=…> <try> <catch>
</workflow>
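To show how the ${dec} reference relates to the set element, here is a rough sketch that parses a well-formed variant of the workflow and substitutes variables into the input parameters. It is not the JES implementation: the container is fixed as sequence, step "b" and the scripting elements are left out, all set elements are collected up front regardless of position, and the function name resolve_inputs is invented for the example.

```python
import xml.etree.ElementTree as ET
from string import Template

# Well-formed variant of the slide's workflow (assumption: <sequence> chosen,
# step "b" and scripting elements omitted).
WORKFLOW = """\
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true">
            <value>ftp://aServer/myResults</value>
          </parameter>
        </output>
      </tool>
    </step>
  </sequence>
</workflow>
"""


def resolve_inputs(xml_text: str) -> dict:
    """Return {step name: {parameter name: value}} with ${var} resolved."""
    root = ET.fromstring(xml_text)
    # Simplification: gather every <set> first, ignoring document order.
    variables = {s.get("var"): s.get("value") for s in root.iter("set")}
    resolved = {}
    for step in root.iter("step"):
        params = {}
        for param in step.findall("./tool/input/parameter"):
            raw = param.findtext("value", default="")
            # string.Template happens to use the same ${name} syntax.
            params[param.get("name")] = Template(raw).substitute(variables)
        resolved[step.get("name")] = params
    return resolved


print(resolve_inputs(WORKFLOW))  # {'a': {'RA': '21', 'Dec': '15'}}
```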
CEA
• Create a uniform interface and model for an application and its parameters
• Provides higher level description than WSDL:
  • Restrict how interfaces can be expressed
  • Provide specific semantics for astronomical quantities
  • Extra information, such as default values, GUI labels
  • VOResource extensions for a general application
• Provide asynchronous operation:
  • callback, polling and job identification
• Allow separate data and control flows
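The three ingredients of asynchronous operation named above (job identification, polling, callback) can be sketched in-process with a thread. The class AsyncApplication, its methods, and the status strings are hypothetical stand-ins; they are not the CEA web-service interface.

```python
import threading
import time
import uuid


class AsyncApplication:
    """Hypothetical stand-in for an asynchronous application service."""

    def __init__(self, func):
        self._func = func
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, params, callback=None):
        """Start the job in the background and return its identifier at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = ("RUNNING", None)

        def run():
            try:
                outcome = ("COMPLETED", self._func(**params))
            except Exception as exc:  # record the failure instead of losing it
                outcome = ("ERROR", exc)
            with self._lock:
                self._jobs[job_id] = outcome
            if callback is not None:
                callback(job_id, *outcome)  # callback style notification

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def status(self, job_id):
        with self._lock:
            return self._jobs[job_id][0]

    def result(self, job_id):
        with self._lock:
            return self._jobs[job_id][1]


def poll_until_done(app, job_id, interval=0.01, timeout=5.0):
    """Polling style: ask for the status repeatedly until the job finishes."""
    deadline = time.monotonic() + timeout
    while True:
        status = app.status(job_id)
        if status == "COMPLETED":
            return app.result(job_id)
        if status == "ERROR":
            raise RuntimeError(f"job {job_id} failed: {app.result(job_id)!r}")
        if time.monotonic() > deadline:
            raise TimeoutError(job_id)
        time.sleep(interval)
```

A client would call submit() with the tool parameters, keep the returned job identifier, and then either poll with it or wait for the callback; the two styles can coexist for the same job.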
Minimum CEA compliance
• Must implement CommonExecutionConnector interface
• Must send a message to services implementing ResultsListener interface
• Should send messages to services implementing JobMonitor interface
• Should perform basic type checking on all parameter types during init phase
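The last requirement, basic type checking during the init phase, might look something like the sketch below. The type names ("integer", "real", "text"), the function check_parameters, and the error messages are assumptions for illustration; the actual CEA parameter types are not given on the slide.

```python
# Hypothetical mapping from declared type names to Python parsers.
PARSERS = {"integer": int, "real": float, "text": str}


def check_parameters(declared: dict, supplied: dict) -> dict:
    """Validate supplied string values against declared types.

    Returns the parsed values, or raises ValueError listing every problem
    so that a bad job is rejected before any execution starts.
    """
    errors = []
    parsed = {}
    for name, type_name in declared.items():
        if name not in supplied:
            errors.append(f"missing parameter: {name}")
            continue
        try:
            parsed[name] = PARSERS[type_name](supplied[name])
        except ValueError:
            errors.append(f"{name}: {supplied[name]!r} is not a valid {type_name}")
    for name in sorted(set(supplied) - set(declared)):
        errors.append(f"unknown parameter: {name}")
    if errors:
        raise ValueError("; ".join(errors))
    return parsed


print(check_parameters({"RA": "real", "Dec": "real"},
                       {"RA": "21", "Dec": "15"}))  # {'RA': 21.0, 'Dec': 15.0}
```

Rejecting malformed parameters at init keeps failures cheap: nothing has been scheduled or transferred yet.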