Process/data API
Process API - intro
• The workflow engine runs applications
    • Executable code in different languages
    • API methods
    • Web services
• Applications require setup to run
    • Where they are
    • Where they will run (farm, local machine, specific machine)
    • Data IO
    • Version, etc.
Process API - intro
• We do this in two ways
• As a single object process
    • We have defined a data object to hold things
    • We can use the same idea for the process API
    • Set up the object and "doIt"
• As setup calls and an application call
    • Define setups for a process
    • Use a single call to run the process
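The two calling styles can be put side by side. This is only a sketch: `ProcessObject`, `ProcessSetupAPI`, `define` and the returned strings are names made up for illustration, not the real process API, and nothing is actually launched.

```python
class ProcessObject:
    """Hypothetical single-object form: set up the object, then doIt()."""

    def __init__(self):
        self.settings = {}

    def set(self, key, value):
        self.settings[key] = value

    def doIt(self):
        # A real implementation would launch the application here;
        # this sketch only reports what it would run.
        return "run %(process)s at %(location)s" % self.settings


class ProcessSetupAPI:
    """Hypothetical setup-call form: define setups, then one run call."""

    def __init__(self):
        self.setups = {}

    def define(self, name, **settings):
        self.setups[name] = settings

    def run(self, name):
        return "run %(process)s at %(location)s" % self.setups[name]


single = ProcessObject()
single.set("process", "maxit")
single.set("location", "server")
print(single.doIt())

api = ProcessSetupAPI()
api.define("align", process="maxit", location="server")
print(api.run("align"))
```

Both forms carry the same information. The difference is whether the setup travels with the object or is held by the API and referred to by name.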
Process API
• The following are the fields within the WFE process object (ignoring WFE-specific fields)
    • Name and human-readable name: not important
    • Type
    • File: where it is, could be a URL
    • Data: see later
    • Runtime/fail time: does the API monitor these?
    • Parameters
Process object fields
• Type
    • i.e. is this an exec, a URL, and so on
• Process
    • The actual mapped process name. A site-specific mapping will define the actual meaning of the process name.
• Location
    • Where the application is to run (client/server/farm), or other things like a URL.
    • Is it useful to have this in the WFE XML file, or as a separate process API XML setup? I would think the latter.
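A site-specific mapping could be as small as two lookup tables kept outside the workflow XML. The sketch below assumes this; the paths, host names and the `resolve` helper are all invented for illustration.

```python
# Hypothetical site mapping: mapped process name -> site-specific details.
SITE_PROCESS_MAP = {
    "maxit": {"type": "exec", "path": "/opt/site/bin/maxit"},
    "copy": {"type": "method", "path": "dataAPI.copy"},
}

# Hypothetical mapping of location names to where things actually run.
SITE_LOCATION_MAP = {
    "local": "localhost",
    "server": "wf-server.example.org",
    "farm": "farm-queue.example.org",
}


def resolve(process_name, location_name):
    """Turn mapped names from the workflow into concrete site details."""
    try:
        process = SITE_PROCESS_MAP[process_name]
        host = SITE_LOCATION_MAP[location_name]
    except KeyError as missing:
        # Report unknown names clearly instead of failing later at run time.
        raise LookupError("no site mapping for %s" % missing)
    return {"type": process["type"], "path": process["path"], "host": host}


print(resolve("maxit", "server"))
```

Keeping these tables in a separate setup file means the same workflow definition can move between sites without being edited.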
Process API
• Data
    • The WFE data object defines input and output at run time; the only mutability is class (static)
    • We have to pass data to a process, so it might be sensible to put it in the process object
    • See the data API definition for the object.
    • Some object containers are data in and some are data out; they need to have the same structure though.
Process API
• Runtime and failtime
    • These are WFE exception manager properties
    • It might not be a good idea to reproduce the exception handling outside the WFE, as the WFE needs to handle any failure. Process failure must not be hidden from the WFE.
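One way to keep failures visible to the engine is to run the process in a thread and report an outcome instead of swallowing the error. The sketch below assumes that runTime is the expected duration and failTime is the point at which the engine gives up; that reading, and the names `to_seconds` and `run_monitored`, are assumptions rather than the WFE's actual behaviour.

```python
import threading
import time


def to_seconds(hms):
    """Convert an 'HH:MM:SS' string such as '00:00:10' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_monitored(work, run_time, fail_time):
    """Run work() in a thread and return 'ok', 'slow' or 'failed'."""
    outcome = {}

    def target():
        try:
            work()
            outcome["error"] = None
        except Exception as exc:
            # Record the failure so the caller (the engine) sees it.
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    started = time.monotonic()
    worker.start()
    worker.join(to_seconds(fail_time))  # wait at most failTime
    elapsed = time.monotonic() - started

    if worker.is_alive() or outcome.get("error") is not None:
        return "failed"
    return "ok" if elapsed <= to_seconds(run_time) else "slow"


print(run_monitored(lambda: None, "00:00:04", "00:00:10"))
```

The outcome is returned to the caller rather than handled inside the process wrapper, which keeps the decision about what to do next with the engine.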
Process API
• Parameters
    • Probably a Python dictionary is best here.
    • Needs to be exposed to the WFE, since different parts of the workflow may need different parameters (consider MAXIT)
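With parameters held in a dictionary, a workflow step can override defaults without touching the rest. A minimal sketch, assuming flags map directly to command-line arguments; `to_argv` and the override values are hypothetical.

```python
def to_argv(executable, parameters):
    """Build an argument list from a {flag: value} dictionary."""
    argv = [executable]
    for flag, value in parameters.items():
        argv.append(flag)
        if value is not None:  # a None value means a bare flag
            argv.append(str(value))
    return argv


# Defaults for the program, plus an override for one part of the workflow.
defaults = {"-P": 33, "-x": "ddd"}
task_overrides = {"-P": 40}
merged = {**defaults, **task_overrides}

print(to_argv("myAlignProg", merged))
```

Because the dictionary is visible to the engine, each task can supply its own overrides while the program's defaults stay in one place.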
Process API
• The problem I have is defining which data object is which. The data object needs a definition so the program knows what the data is (see process API).
• Using a Python class object
• These will of course be defined in the workflow engine variables.
• Note the adding of multiple data objects

    ProcOb = ApiProcess()
    ProcOb.set('name', 'myAlignProg')
    ProcOb.set('parameters', '-P 33 -x ddd')
    ProcOb.set('type', 'exec')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output1', data.ob['D3'])
Process API
• Program exec
    • Executable
    • Process: use a mapped name for the application (site specific)
    • Location: local/server/farm (mapped names)
    • How do we know which objects are which?

    ProcOb = ApiProcess()
    ProcOb.set('type', 'exec')
    ProcOb.set('process', 'maxit')
    ProcOb.set('location', 'server')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output1', data.ob['D3'])
    processAPI.run(ProcOb)
Process API
• DataAPI copy
    • Copy data
    • Parameters = new version
    • Data objects: see later

    ProcOb = ApiProcess()
    ProcOb.set('name', 'copy')
    ProcOb.set('parameters', 'newVersion')
    ProcOb.set('process', 'method')
    ProcOb.set('location', 'dataAPI')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('output', data.ob['D3'])
    processAPI.run(ProcOb)
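The set/add calls above suggest a small container class. The sketch below is one possible shape, not the real `ApiProcess`: it keeps data objects in per-role lists so that order within a role is preserved, which is one answer to "which object is which". `DataRef`, the `ids` helper and the `__result` namespace are invented for the example.

```python
from collections import defaultdict


class DataRef:
    """Stand-in for a WFE data object: an ID plus where the data lives."""

    def __init__(self, data_id, namespace, where):
        self.data_id = data_id
        self.namespace = namespace
        self.where = where


class ApiProcess:
    """Hypothetical process object with scalar fields and data roles."""

    def __init__(self):
        self.fields = {}
        self.data = defaultdict(list)  # role -> data objects, in order added

    def set(self, key, value):
        self.fields[key] = value

    def add(self, role, data_object):
        self.data[role].append(data_object)

    def ids(self, role):
        return [ref.data_id for ref in self.data[role]]


proc = ApiProcess()
proc.set("type", "exec")
proc.set("process", "maxit")
proc.add("input", DataRef("D1", "__old_object", "DM"))
proc.add("input", DataRef("D2", "__new_object", "DM"))
proc.add("output", DataRef("D3", "__result", "DM"))

print(proc.ids("input"))
print(proc.ids("output"))
```

Position within a role is a weak contract. Distinct role names per object (such as 'output1' in the slides) would make the meaning explicit, at the cost of each program defining its own role names.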
Automated questions in XML

    <wf:task taskID="TD3" name="SequenceOK" nextTask="J1" breakpoint="false">
      <wf:description>Check whether the sequence align was OK</wf:description>
      <wf:decision type="AUTO">
        <wf:dataObjectsLocation>
          <wf:location dataID="D6" type="input"/>
        </wf:dataObjectsLocation>
        <wf:nextTasks>
          <wf:nextTask taskID="TW4">
            <wf:function dataID="D6" gte="20" less="200000000"/>
          </wf:nextTask>
          <wf:nextTask taskID="TM5">
            <wf:function dataID="D6" gte="2" less="20"/>
          </wf:nextTask>
          <wf:nextTask taskID="T9">
            <wf:function dataID="D6" gte="0" less="2"/>
          </wf:nextTask>
        </wf:nextTasks>
      </wf:decision>
    </wf:task>

• Decision data object
• Decision option
• More complex functions will require Python methods specific to the question
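The gte/less ranges in the decision above can be evaluated with a few lines of Python. This is a sketch under assumptions: the `xmlns:wf` URI is a placeholder added so the parser accepts the `wf:` prefix, and `choose_next_task` is a hypothetical helper, not an engine method.

```python
from xml.dom import minidom

# Trimmed copy of the decision block; the xmlns:wf value is a placeholder.
DECISION_XML = """
<wf:decision xmlns:wf="urn:example:wf" type="AUTO">
  <wf:nextTasks>
    <wf:nextTask taskID="TW4"><wf:function dataID="D6" gte="20" less="200000000"/></wf:nextTask>
    <wf:nextTask taskID="TM5"><wf:function dataID="D6" gte="2" less="20"/></wf:nextTask>
    <wf:nextTask taskID="T9"><wf:function dataID="D6" gte="0" less="2"/></wf:nextTask>
  </wf:nextTasks>
</wf:decision>
"""


def choose_next_task(decision_xml, values):
    """Return the taskID of the first option whose range holds the value."""
    document = minidom.parseString(decision_xml)
    for next_task in document.getElementsByTagName("wf:nextTask"):
        for function in next_task.getElementsByTagName("wf:function"):
            value = values[function.getAttribute("dataID")]
            low = float(function.getAttribute("gte"))
            high = float(function.getAttribute("less"))
            if low <= value < high:  # gte is inclusive, less is exclusive
                return next_task.getAttribute("taskID")
    return None  # no option matched; the engine must decide what to do


print(choose_next_task(DECISION_XML, {"D6": 25}))
```

Values outside every range return `None` here, so the engine rather than the decision code chooses how to handle an unanswerable question.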
Detailed description of the technology
• A data object is pre-declared in the XML
    • Data placeholder
    • Defines API object detail
• A task object can reference data objects
    • As input, output or both
• A process task:
    • API method
    • Exec program

    <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
      <wf:description>General object to copy</wf:description>
      <wf:location namespace="__old_object" where="DM"/>
    </wf:dataObject>

    <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
      <wf:description>Run API task to copy data object</wf:description>
      <wf:process runTime="00:00:04" failTime="00:00:10">
        <wf:detail name="APIcopy" type="method" where="API"/>
        <wf:dataObjectsLocation>
          <wf:location dataID="D1" type="input"/>
          <wf:location dataID="D2" type="output"/>
        </wf:dataObjectsLocation>
      </wf:process>
    </wf:task>
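Resolving a task's data references against the pre-declared objects might look like the sketch below. The `wf:workflow` wrapper element and the `xmlns:wf` URI are placeholders added so the fragment parses, the D2 declaration is taken from the copy example elsewhere in the deck, and `resolve_task_data` is a hypothetical helper.

```python
from xml.dom import minidom

WORKFLOW_XML = """
<wf:workflow xmlns:wf="urn:example:wf">
  <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
    <wf:location namespace="__old_object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataCopy" type="Object" mutable="true">
    <wf:location namespace="__new_object" where="DM"/>
  </wf:dataObject>
  <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
    <wf:process runTime="00:00:04" failTime="00:00:10">
      <wf:detail name="APIcopy" type="method" where="API"/>
      <wf:dataObjectsLocation>
        <wf:location dataID="D1" type="input"/>
        <wf:location dataID="D2" type="output"/>
      </wf:dataObjectsLocation>
    </wf:process>
  </wf:task>
</wf:workflow>
"""


def resolve_task_data(xml_text, task_id):
    """Map a task's input/output references to declared namespaces."""
    document = minidom.parseString(xml_text)

    # Index the pre-declared data objects by their dataID.
    declared = {}
    for node in document.getElementsByTagName("wf:dataObject"):
        location = node.getElementsByTagName("wf:location")[0]
        declared[node.getAttribute("dataID")] = location.getAttribute("namespace")

    resolved = {"input": [], "output": []}
    for task in document.getElementsByTagName("wf:task"):
        if task.getAttribute("taskID") != task_id:
            continue
        holder = task.getElementsByTagName("wf:dataObjectsLocation")[0]
        for ref in holder.getElementsByTagName("wf:location"):
            data_id = ref.getAttribute("dataID")
            resolved[ref.getAttribute("type")].append((data_id, declared[data_id]))
    return resolved


print(resolve_task_data(WORKFLOW_XML, "T2"))
```

A reference to an undeclared dataID raises `KeyError` in this sketch, which surfaces a broken workflow definition at load time instead of at run time.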
Creating data objects in WFE
• Each data XML statement is stored as a reference object
• This object is a placeholder which can be passed to processes
• It contains information on where to access the data

    # the data object ID
    self.object.set("deposition-dataset-ID", depID)
    self.object.set("workflow-class-ID", classID)
    self.object.set("workflow-instance-ID", instID)

    self.type = data.getAttribute("type")
    self.object.set("return-type", data.getAttribute("type"))

    # access follows the "mutable" attribute; the values are set as literal
    # strings, since the XML has no "read-write"/"read-only" attributes to read
    if data.getAttribute("mutable") == "true":
        self.object.set("access", "read-write")
    else:
        self.object.set("access", "read-only")

    # internal workflow cross reference
    self.name = data.getAttribute("dataID")
    self.nameHumanReadable = data.getAttribute("name")

    for detail in data.childNodes:
        if detail.nodeName == "wf:description":
            self.description = detail.firstChild.data
        elif detail.nodeName == "wf:location":
            self.nameSpace = detail.getAttribute("namespace")
            self.object.set("data-object-name", detail.getAttribute("namespace"))
            self.where = detail.getAttribute("where")
            self.object.set("data-object-location", detail.getAttribute("where"))
The engine data object
• May be a real or virtual payload of data
    • Where, what and type
• Payload is passed between tasks
    • The WF is a data processing pipeline
• A real value can be examined to affect the WF
    • The path is dependent on data values (auto/manual decisions are based on these values)
• The data version is WF instance data
• Can be domain data (via dataAPI)
• Can be WF data (via statusAPI); scope is defined by the object the data is stored in
Engine process manager
• This is a thread, running inside the exception manager

    import time

    def run(self):
        self.status = 1

        # Send the request data objects
        # (assumes inputObjects/outputObjects are dictionaries)
        for key, value in self.inputObjects.items():
            istat = myApi.do(value)

        # What sort of process is it?
        if self.task.uniqueType == "test":
            # test method - just counts for 5 seconds
            for i in range(5):
                time.sleep(1.0)
        elif self.task.uniqueType == "method":
            # this is an API process
            if self.task.uniqueWhere == "API":
                # this is an API method call
                self.processAPI.runMethod(self.task.uniqueName)
        elif self.task.uniqueType == "exec":
            # this is an exec program found "where"
            self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)

        # Get the response data objects
        for key, value in self.outputObjects.items():
            istat = myApi.do(value)

        self.statusAPI.setStatus("finished")
Workflow granularity
• It does not really matter
    • A process can be as complex as you like
    • Depends on go-back granularity
    • Depends on "how much would be lost if it crashed"
• Data is the problem!
    • The workflow is a flow of data, so hiding data from the engine will collapse a workflow to nothing.
    • The pathway choice is all about data: the less visible the data, the less choice in the workflow.
• If a process decides what to do with data, the consequences are:
    • Lose go-back ability
    • Lose track of the data and what is going on
    • Lose plug and play on the process
    • Lose exception management
Engine design examples
• Engine
    • Read XML: store objects and tasks
    • Start/restart (maybe at go-back point)
    • Run tasks: follow path
    • Exit
• Interface task
    • Send data objects to interface
    • Send actionable events
    • Wait for interface
    • Get return action from interface
• Process task
    • Send data object requests
    • Run process
    • Get response data objects
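The "run tasks, follow path" step can be pictured as a loop over nextTask links. This is a toy sketch, not the engine: the task table, the `kind` field and the handler functions are invented, and a handler returning a task ID stands in for a decision or interface task choosing the path.

```python
def run_workflow(tasks, start, handlers):
    """Follow nextTask links from `start` until a task has no successor."""
    visited = []
    current = start
    while current is not None:
        task = tasks[current]
        # A handler may return a task ID to override the default path
        # (as a decision task would), or None to follow nextTask.
        chosen = handlers[task["kind"]](task)
        visited.append(current)
        current = chosen if chosen is not None else task.get("nextTask")
    return visited


# Hypothetical task table loosely modelled on the task IDs in the slides.
tasks = {
    "T1": {"kind": "process", "nextTask": "TD3"},
    "TD3": {"kind": "decision", "nextTask": "J1"},
    "T9": {"kind": "process", "nextTask": None},
    "J1": {"kind": "process", "nextTask": None},
}
handlers = {
    "process": lambda task: None,   # run the process, keep the default path
    "decision": lambda task: "T9",  # pretend the data selected T9
}

print(run_workflow(tasks, "T1", handlers))
# Restarting at a go-back point is the same loop with a different start.
print(run_workflow(tasks, "TD3", handlers))
```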
John’s requirement 1
• 1) Identify, copy and archive an object
• Object declaration (the actual data)

    <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
      <wf:description>General object to copy</wf:description>
      <wf:location namespace="__old_object" where="DM"/>
    </wf:dataObject>
    <wf:dataObject dataID="D2" name="dataCopy" type="Object" dependence="D1" mutable="true">
      <wf:description>General object - new copy of data</wf:description>
      <wf:location namespace="__new_object" where="DM"/>
    </wf:dataObject>

• Task declaration (the process: a method within the API; data objects are referenced by name)

    <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
      <wf:description>Run API task to copy data object</wf:description>
      <wf:process runTime="00:00:04" failTime="00:00:10">
        <wf:detail name="APIcopy" type="method" where="API"/>
        <wf:dataObjectsLocation>
          <wf:location dataID="D1" type="input"/>
          <wf:location dataID="D2" type="output"/>
        </wf:dataObjectsLocation>
      </wf:process>
    </wf:task>
John’s requirement 2: make new data version
• Declare data
    • Input D1
    • Output D2
• Declare task
    • Method in API

    <wf:dataObjects>
      <wf:dataObject dataID="D1" name="dataToAddNewVersion" type="Object" mutable="true">
        <wf:description>General object to copy</wf:description>
        <wf:location namespace="__object" where="DM"/>
      </wf:dataObject>
      <wf:dataObject dataID="D2" name="dataNewVersion" type="Object" dependence="D1" mutable="true">
        <wf:description>New version of data</wf:description>
        <wf:location namespace="__object" where="DM"/>
      </wf:dataObject>
    </wf:dataObjects>

    <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
      <wf:description>Run API task create a new version of an object</wf:description>
      <wf:process runTime="00:00:04" failTime="00:00:10">
        <wf:detail name="APInewVersion" type="method" where="API"/>
        <wf:dataObjectsLocation>
          <wf:location dataID="D1" type="input"/>
          <wf:location dataID="D2" type="output"/>
        </wf:dataObjectsLocation>
      </wf:process>
    </wf:task>
John’s requirement 3: get version list and show
• Data: 3 objects
    • D1: object target
    • D2: version list
    • D3: which one to use
• Some tasks
    • Get list from API
    • Interface to choose (not shown)

    <wf:dataObject dataID="D1" name="dataObjectTarget" type="Object" mutable="false">
      <wf:description>target object to query on</wf:description>
      <wf:location namespace="__object_name" where="DM"/>
    </wf:dataObject>
    <wf:dataObject dataID="D2" name="VersionList" type="List" mutable="false">
      <wf:description>Return version list</wf:description>
      <wf:location namespace="versionList" where="local"/>
    </wf:dataObject>
    <wf:dataObject dataID="D3" name="useVersion" type="Integer" mutable="true">
      <wf:description>Version to use</wf:description>
      <wf:location namespace="version" where="WF"/>
    </wf:dataObject>

    <wf:task taskID="T2" name="requestVersionList" nextTask="T3" breakpoint="false">
      <wf:description>Run API to get the version list of an object</wf:description>
      <wf:process runTime="00:00:04" failTime="00:00:10">
        <wf:detail name="APIversionList" type="method" where="API"/>
        <wf:dataObjectsLocation>
          <wf:location dataID="D1" type="input"/>
          <wf:location dataID="D2" type="output"/>
        </wf:dataObjectsLocation>
      </wf:process>
    </wf:task>
John’s requirement 4/5: data selector
• A data object may need additional qualifiers to say what it is.
    • Selector value
    • "selection"
• It is likely that the qualifier will:
    • Need to be a WF class (static) variable
    • Need to be a WF inst (dynamic) variable

    <wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
      <wf:description>general object with qualifier</wf:description>
      <wf:location namespace="__object" qualifier="_entity.id=1" where="DM"/>
    </wf:dataObject>

    <wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
      <wf:description>general object with qualifier</wf:description>
      <wf:location namespace="__object" qualifier="set_entity.type='protein' where entity.id=1" where="DM"/>
    </wf:dataObject>
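For the simple form of qualifier (`_entity.id=1`), selection could be a plain filter. The sketch below assumes a 'category.item=value' syntax and an in-memory table of rows; `parse_qualifier`, `select` and the sample data are invented, and the longer "set ... where ..." form is not handled.

```python
def parse_qualifier(qualifier):
    """Split 'category.item=value' into its three parts."""
    target, _, value = qualifier.partition("=")
    category, _, item = target.partition(".")
    return category, item, value.strip("'\"")


def select(rows_by_category, qualifier):
    """Return the rows of the named category whose item equals the value."""
    category, item, value = parse_qualifier(qualifier)
    return [row for row in rows_by_category.get(category, [])
            if str(row.get(item)) == value]


# Hypothetical data standing in for what the data API would return.
data = {"_entity": [{"id": 1, "type": "protein"}, {"id": 2, "type": "water"}]}

print(select(data, "_entity.id=1"))
```

Values are compared as strings because the qualifier arrives as an XML attribute. A typed comparison would need the data object's declared type.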
John’s requirement 6: length/size of object
• Define object and placeholder for size value

    <wf:dataObject dataID="D1" name="dataTarget" type="Object" mutable="false">
      <wf:description>General object to copy</wf:description>
      <wf:location namespace="__object" where="DM"/>
    </wf:dataObject>
    <wf:dataObject dataID="D2" name="dataLength" type="integer" dependence="D1" mutable="true">
      <wf:description>Length of data object</wf:description>
      <wf:location namespace="dataLength" where="WF"/>
    </wf:dataObject>

• Run task to input data to function, and return length

    <wf:process runTime="00:00:04" failTime="00:00:10">
      <wf:detail name="APIObjectSize" type="method" where="API"/>
      <wf:dataObjectsLocation>
        <wf:location dataID="D1" type="input"/>
        <wf:location dataID="D2" type="output"/>
      </wf:dataObjectsLocation>
    </wf:process>
John’s requirement 7: format conversion
• Input and output formats
• Placeholder for status: this might be so intrinsic to all tasks that it should probably be pre-declared and always present

    <wf:dataObjects>
      <wf:dataObject dataID="D1" name="dataObjectPDB" type="Object" mutable="false">
        <wf:description>General object to convert format</wf:description>
        <wf:location namespace="__object" where="DM"/>
      </wf:dataObject>
      <wf:dataObject dataID="D2" name="dataObjectMMCIF" type="Object" dependence="D1" mutable="true">
        <wf:description>New data in different format</wf:description>
        <wf:location namespace="__object" where="DF"/>
      </wf:dataObject>
      <wf:dataObject dataID="D3" name="status" type="string" dependence="D1" mutable="true">
        <wf:description>A status code return</wf:description>
        <wf:location namespace="__object" where="DF"/>
      </wf:dataObject>
    </wf:dataObjects>

• And the API function to do this

    <wf:task taskID="T2" name="formatChange" nextTask="T9" breakpoint="false">
      <wf:description>Run API task to change the format of data</wf:description>
      <wf:process runTime="00:00:04" failTime="00:00:10">
        <wf:detail name="APIformatChangePDBtoPDBx" type="method" where="API"/>
        <wf:dataObjectsLocation>
          <wf:location dataID="D1" type="input"/>
          <wf:location dataID="D2" type="output"/>
          <wf:location dataID="D3" type="output"/>
        </wf:dataObjectsLocation>
      </wf:process>
    </wf:task>