This paper explores a data-centric approach for managing data conflicts and complexity in service-oriented workflows. It discusses the challenges faced during workflow orchestration and presents a solution that focuses on data flow modeling and integration. The paper also introduces the concept of a data-centric workflow and explores its advantages in terms of reusability, shared data models, and simplified integration.
A Data Centric Approach for Workflows • A. Akram, J. Kewley and R. Allan • CCLRC e-Science Centre, Daresbury Laboratory, UK
Web Services • Web Services are a popular “connection technology” for implementing SOA • Web Services can be • Described using a service description language, i.e. WSDL • Published to a registry of services, e.g. UDDI • Discovered through standard mechanisms • Invoked through a declared API • Composed with other services
Service Oriented Workflow • Typical workflows are based on available Web Services and their operations • A workflow requires additional work to resolve data conflicts between services • The normal steps to script any complicated workflow are: • Discovering suitable Web Services • Parsing WSDLs • Extracting the data information from WSDLs • Data mapping to match the service requirements • Data transformation at each activity level • Resolving the namespace issues related to different data sets • Most of the effort spent in such orchestration goes into resolving namespace issues, data ambiguity, and data transformation • Data-specific issues make workflows overwhelmingly large, complicated, and difficult to manage and maintain
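The "parsing WSDLs" and "extracting the data information" steps can be sketched as follows. This is a minimal illustration in Python using only the standard library; the WSDL fragment, its `urn:example-score` namespace, and the function name `extract_message_parts` are assumptions invented for the example, not part of the original work.

```python
import xml.etree.ElementTree as ET

# Standard WSDL 1.1 namespace.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL fragment, invented for illustration only.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example-score"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example-score">
  <message name="getScoreRequest">
    <part name="admin" type="xsd:string"/>
  </message>
  <message name="getScoreResponse">
    <part name="score" type="xsd:int"/>
  </message>
</definitions>"""


def extract_message_parts(wsdl_text):
    """Return {message name: [(part name, declared type or element)]}."""
    root = ET.fromstring(wsdl_text)
    messages = {}
    for message in root.findall("wsdl:message", WSDL_NS):
        parts = []
        for part in message.findall("wsdl:part", WSDL_NS):
            # A part refers to its data either by type or by element.
            declared = part.get("type") or part.get("element")
            parts.append((part.get("name"), declared))
        messages[message.get("name")] = parts
    return messages


parts = extract_message_parts(SAMPLE_WSDL)
```

Even in this small case the workflow author has to know each service's message names, part names, and type prefixes before any mapping can be written, which is the effort the slide describes.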
Service Oriented Workflow
<process ……. xmlns:ns1="urn:ehtpx-process" xmlns:ns3="http://clyde.dl.ac.uk:8080/process/services/Score">
  <assign name="AssignScoreGet">
    <copy>
      <from variable="inputVariable" part="payload" query="/ns1:ProcessAdminElement"/>
      <to variable="InvokeScoreGet_InputVariable" part="arg1" query="/ns2:getScoreResponse/admin"/>
    </copy>
  </assign>
</process>
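To show what a single copy of this kind involves, here is a rough Python equivalent written with the standard library. It is only a sketch: the `ns2` prefix is not declared in the excerpt, so the URI bound to `NS2` below is a placeholder assumption, and the function name `copy_admin` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS1 = "urn:ehtpx-process"          # taken from the excerpt above
NS2 = "urn:example-score-service"  # assumption: placeholder for the undeclared ns2


def copy_admin(payload_xml):
    """Copy the text of /ns1:ProcessAdminElement into /ns2:getScoreResponse/admin."""
    payload = ET.fromstring(payload_xml)
    expected_tag = f"{{{NS1}}}ProcessAdminElement"
    if payload.tag != expected_tag:
        raise ValueError(f"unexpected payload root: {payload.tag}")
    # Build the target message in the partner service's namespace.
    target = ET.Element(f"{{{NS2}}}getScoreResponse")
    admin = ET.SubElement(target, "admin")  # unqualified child, as in the query
    admin.text = payload.text
    return target


source = f'<p:ProcessAdminElement xmlns:p="{NS1}">alice</p:ProcessAdminElement>'
message = copy_admin(source)
```

Moving one value between two services already requires two namespaces and two paths to agree; a workflow with many activities repeats this for every field.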
Data Centric Workflow • Data Centric Workflows are modified Data-Flow Diagrams (DFDs) • Workflows in the context of Web Services focus on the data flowing from one process to another • Data Flow Modelling is the process of identifying, modelling and documenting how data moves across the system • Data flow modelling examines • Processes (activities that transform data from one form to another) • Data stores (the holding areas for data) • External entities (what sends data into a system or receives data from a system) • Data flows (routes by which data can flow) • Data modelling develops an accurate model of the business requirements, stakeholders, sub-processes and activities
Data Centric Workflow • A data centric workflow • defines data definitions in the context of the application • captures the intermediate data sets used by sub-processes during the lifecycle of the application • The services offered by different business partners or parties forming the supply chain follow the Data Flow Model • Web Services are developed in accordance with already negotiated and accepted data models • The role of any Web Service is pre-defined in the context of an application • Web Services from multiple vendors, partners and collaborators may have the same role in the application and can be interchangeable • Current services may need to be re-engineered to share a common data model and provide predictable services • Services need to develop their mapping routines and interfaces in compliance with the Data Model • The application has its own metadata, which must be coordinated through a shared data vocabulary
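The idea that services from different vendors fill the same pre-defined role because they agree on one data model can be sketched as below. All names here (`ScoreRequest`, `ScoreResult`, `ScoringService`, the two vendor classes, `run_step`) are hypothetical and chosen only for illustration; the scoring logic is a stand-in.

```python
from dataclasses import dataclass
from typing import Protocol


# Shared, agreed data model: every partner service uses these types.
@dataclass(frozen=True)
class ScoreRequest:
    admin: str


@dataclass(frozen=True)
class ScoreResult:
    admin: str
    score: int


class ScoringService(Protocol):
    """The pre-defined role a partner service plays in the application."""

    def get_score(self, request: ScoreRequest) -> ScoreResult: ...


class VendorAScoring:
    def get_score(self, request: ScoreRequest) -> ScoreResult:
        # Stand-in logic; a real service would do actual work.
        return ScoreResult(request.admin, len(request.admin))


class VendorBScoring:
    def get_score(self, request: ScoreRequest) -> ScoreResult:
        return ScoreResult(request.admin, 2 * len(request.admin))


def run_step(service: ScoringService, admin: str) -> ScoreResult:
    """The workflow step is written once against the role, not a vendor."""
    return service.get_score(ScoreRequest(admin))


result_a = run_step(VendorAScoring(), "abc")
result_b = run_step(VendorBScoring(), "abc")
```

Because both vendors accept and return the agreed types, swapping one for the other needs no mapping or transformation code in the workflow step.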
Data Centric Workflow • Business relationships and activity co-ordination are becoming complex, and they often demand simplicity in the integration of services • Business users want common interfaces, business processes, application functionality, tools and services • Advantages of Data Centric Workflow: • Reusable Data Set • Agreed Data Model • Shared Data Vocabulary • Standards-compliant Data Models • Uniform View • Simple Integration • Single Source of Modification • Improved Performance • Separation of Roles • Data Binding and Validation • Stability
Limitations of Web Services • Web Services lack the notion of: • State • Stateful interactions • Resource lifecycle management • Notification of state changes • Support for sharing and coordinated use of diverse resources
Web Services and State • Stateful Entities Exist • Data in a purchase order • Schema for a message exchange • Current usage agreement for resources • Metrics associated with work load on a server • No WS Standards for State Management • Each system does it in an “idiosyncratic way” • Integration impediment • Missing Component • Formalize a mechanism to represent “state”
Web Services Resource Framework • The Web Services Resource Framework (WS-RF) is built on the adopted Web Services architecture to address the limitations of Web Services • WS-RF comprises four inter-related specifications: • WS-ResourceProperties defines how WS-Resources are described by XML documents that can be queried and modified; • WS-ResourceLifetime defines mechanisms for destroying WS-Resources; • WS-ServiceGroup describes how collections of Web Services can be represented and managed; • WS-BaseFaults defines a standard exception reporting format. • WS-RF depends on two supporting specifications: • WS-Addressing • WS-Notification
WS-Resources • A Resource: • A specific set of state data expressible as an XML document • This is not typically all of the resource’s state! • Has a well-defined identity and lifecycle • Singleton resource may not have any unique identifier • Known to, and acted upon, by one or more Web services. • Many Possible Instances • Files, Database tables, EJB Entities, XML documents, Compositions of multiple data sources, Virtualized executions of applications, etc. • A WS-Resource has: • Identity: Can be uniquely identified/referenced • Lifetime: Often created & destroyed by clients • State: Part of the state can be projected as XML • Type: Its Web service interface
WS-Resource Sharing • WS-Resources are not bound to a single Web Service • Multiple Web Services can manage and monitor the same WS-Resource instance • WS-Resources are not confined to a single organization • Multiple organizations may work together on the same WS-Resource, leading to the concept of collaboration • Different Web Services can have distinct perspectives of a single WS-Resource • Dynamically generated WS-Resource EPRs can be: • Discovered, • Inspected and • Monitored via dedicated Web Services • The unique identity of a WS-Resource instance (its EPR) can be passed between partner processes and organizations: • Results in minimum network overhead • Avoids issues of stale information • Improved security options
Managing Multiple WS-Resources • In EA, WS-Resources related to different entities can be very similar • Different types of clients, e.g. normal or super user • Different types of bank accounts, e.g. current or savings account • Different but similar types of categories • Different products in the same category • Hierarchical nature of entities • WS-Resources of a similar nature mostly have operations of a similar nature • Similar operations on different WS-Resources can be effectively managed with a single Instance Service • A Web Service managing multiple WS-Resources could be deployed as a: • Gatekeeper Service • Monitoring Service • Auditing Service
WS-Resource Referencing • WS-Resources are composed of Resource Properties • Resource Properties reflect the state • Resource Properties can be references to other WS-Resources • Referencing other WS-Resources: • Defines the inter-dependency of the WS-Resources • Eliminates complicated business logic in the instance service • WS-Resources depend on the state of other WS-Resources to • Query • Modify
Implied Resource Pattern • The Implied Resource pattern has two services: • Factory Service: • Instantiates the resources • Returns the EndpointReference • Instance Service: • Works only on existing resources • Uses the EndpointReference to • Access • Manipulate the resource • Direct instantiation of resources is prohibited • A singleton resource may not have any Factory service
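The division of labour between the two services can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a WS-RF implementation: the class names, the use of a plain dictionary as the resource store, and the `urn:uuid:` string standing in for an EndpointReference are all hypothetical.

```python
import uuid


class FactoryService:
    """Creates resources and hands back a reference; it never operates on them."""

    def __init__(self, store):
        self._store = store

    def create(self, initial_state):
        # A URN string stands in for a WS-Addressing EndpointReference here.
        epr = f"urn:uuid:{uuid.uuid4()}"
        self._store[epr] = dict(initial_state)
        return epr


class InstanceService:
    """Works only on resources that already exist, identified by their reference."""

    def __init__(self, store):
        self._store = store

    def _resource(self, epr):
        try:
            return self._store[epr]
        except KeyError:
            raise LookupError(f"no resource for {epr}") from None

    def get_property(self, epr, name):
        return self._resource(epr)[name]

    def set_property(self, epr, name, value):
        self._resource(epr)[name] = value

    def destroy(self, epr):
        self._resource(epr)  # fail clearly if the resource is unknown
        del self._store[epr]


store = {}  # hypothetical shared backing store
factory = FactoryService(store)
instance = InstanceService(store)

epr = factory.create({"status": "new"})
instance.set_property(epr, "status", "running")
status = instance.get_property(epr, "status")
```

The client holds only the reference returned by the factory; every later call passes that reference to the instance service, which refuses to act on anything it was not given by a factory.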
Proof of Concept Implementation • The Data Centric workflow was developed as a stateful Web Service • The data model was encapsulated as a WS-Resource • The workflow was instantiated through its Factory service • Factory Services are independent and work in isolation • Resources are created only when required (“late binding”) • The most time-consuming activity was modelling the data related to the workflow • Application-specific data was modelled separately in various XML Schemas • To simulate real-world problems, different namespaces were mixed in the complex data types by using <xsd:import> and <xsd:include> • Separating the data from the Web Services forced us to use Document/literal style Web Services • Development of the individual Web Services was straightforward • The business logic of the workflow was implemented in the Instance service of the workflow
Proof of Concept Implementation • Minor changes in the Data Model and in the partner services don’t require changes in the main instance service of the workflow • Minor changes in the Data Model (restructuring of the data) had no impact on the workflow; • Severe changes in the Data Model may require changes in the partner services; • Interface changes of the partner services require changes in the workflow; • WSDL for the partner services was easy to manage and update; • Modelling data before implementation solves issues related to automatic WSDL generation by various tools, e.g. Java2WSDL for Axis, wscompile for JAX-RPC and wsdl.exe for the .NET platform • The workflow calls partner services in a predefined sequence without any complicated data mapping and transformation logic; • Partner services can even call the next service in the sequence without involving the main workflow service
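A workflow that calls partner services in a predefined sequence over one shared data model might look like the sketch below. The `WorkflowData` fields and the `validate` and `score` steps are hypothetical stand-ins for partner services; they are not taken from the proof of concept.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class WorkflowData:
    """Shared data model passed unchanged in shape between all steps."""

    sample_id: str
    validated: bool = False
    score: Optional[int] = None


def validate(data: WorkflowData) -> WorkflowData:
    # Stand-in for a partner service that validates the input.
    return replace(data, validated=True)


def score(data: WorkflowData) -> WorkflowData:
    # Stand-in for a partner service that depends on the previous step.
    if not data.validated:
        raise ValueError("data must be validated before scoring")
    return replace(data, score=len(data.sample_id))


# Predefined sequence of partner services.
PARTNER_SEQUENCE = [validate, score]


def run_workflow(data: WorkflowData, steps=PARTNER_SEQUENCE) -> WorkflowData:
    """Pass the same data model through each step; no per-step mapping code."""
    for step in steps:
        data = step(data)
    return data


final = run_workflow(WorkflowData("abc"))
```

Since every step consumes and produces the same type, the loop body contains no mapping or transformation logic, and adding a field to `WorkflowData` leaves `run_workflow` untouched.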
Conclusion • Scripting workflows based on the operations of available services is impractical • Most developers avoid direct interaction with XML • Shortcomings of workflow scripts are more obvious when dealing with data type mapping and transformation • Data modelling and a top-down (WSDL-first) approach are required to ensure a consistent solution for orchestration • A common platform-independent data type system facilitates: • Separation of roles, whereby the workflow can be developed in isolation from partner services • The Data Centric approach greatly increases productivity and simplifies development