New EMBOSS Web Service. Shaun McGlinchey (shaun@ebi.ac.uk). Outline.
New EMBOSS Web Service Shaun McGlinchey (shaun@ebi.ac.uk)
Outline • The presentation will discuss the challenges encountered in exposing the EMBOSS suite of command line sequence analysis tools as a ‘stateful’ SOAP based web service. An overview of the proposed framework for client-side requests, server-side job submission and results delivery will then be given.
What is EMBOSS? EMBOSS is "The European Molecular Biology Open Software Suite". What can I use EMBOSS for? • Consists of approx 300 command line applications covering areas such as: • Sequence alignment • Rapid database searching with sequence patterns • Protein motif identification, including domain analysis • Phylogenetic analysis • Presentation tools for publication
What is JAX-WS? • In the words of Sun: the Java API for XML Web Services (JAX-WS) is the centerpiece of a newly rearchitected API stack for Web services, the so-called "integrated stack" that includes JAX-WS 2.0, JAXB 2.0, and SAAJ 1.3. • Essentially a SOAP toolkit for Java • It is the renamed successor to JAX-RPC • It brings clear improvements in data binding capabilities through its tight integration with JAXB – Java Architecture for XML Binding
Current State of (old) EBI EMBOSS Web Service • The current server-side implementation is Perl-based. Sample clients are available for .NET, SOAP::Lite and Java (Axis). • Currently accepts free text as data input – weak typing – poor validation capability • Supports both synchronous and asynchronous job submission. • Asynchronous requests are allocated a job id • Migrating to a Java-based JAX-WS server-side implementation gives us more control over the generated artifacts, better data validation, and the ability to improve the functionality provided more rapidly.
EMBOSS Data Types • There are 52 datatypes (at the last count) used within the EMBOSS suite of applications. These fall under five headings • Simple – Array, Boolean, Integer, String … • Input – Codon, Features, Sequence, Seqall … • Selection Lists – List, Selection … • Output – Align, Report, Seqout … • Graphics – Graph, Xygraph
EMBOSS Qualifiers • An EMBOSS command line program accepts the application name plus qualifiers (each of which has a datatype): • water -asequence tsw:hba_human -bsequence tsw:hbb_human (i.e. water sequence seqall) • -asequence is of datatype Sequence, -bsequence of Seqall • Each qualifier has associated qualifiers, which can also be passed on the command line to enable advanced configuration of the application call: • -sbegin, -send, -sformat
General, Additional & Advanced Qualifiers • General qualifiers are common to all EMBOSS applications • -auto true - Turn off prompts (boolean datatype) • -stdout true - Write standard output (boolean datatype)
Web Service Development • In accordance with the Technology Recommendation we have chosen a Top-Down approach to WS development, not Bottom-Up. • Top-Down Approach to WS Development • Express data types in schema • Write WSDL (include schema) • Generate Artifacts (JavaBeans – data objects, server side stubs, implementation class)
Top-Down Approach to WS Development • Top-Down • Express data types in schema • Write WSDL (include schema) • Generate Artifacts (JavaBeans – data objects, server side stubs, implementation class) • Package (WAR file) • Deploy WAR file to server
Sample EMBOSS Application Schema (Head)
WSDL head:
    <?xml version="1.0" encoding="UTF-8"?>
    <definitions targetNamespace="emboss" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <types>
    <xsd:schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.ebi.ac.uk/ws/emboss/water/">
Schema head:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.ebi.ac.uk/ws/emboss/applications/water/" xmlns:jxb="http://java.sun.com/xml/ns/jaxb" jxb:version="1.0">
Application Schema – Custom Bindings (cont’d)
    <xsd:annotation>
      <xsd:appinfo>
        <jxb:schemaBindings>
          <jxb:package name="uk.ac.ebi.ws.emboss.applications.water">
          </jxb:package>
        </jxb:schemaBindings>
      </xsd:appinfo>
    </xsd:annotation>
Express Application Parameters
    <xsd:element name="asequence" type="tns:asequence"/>
    <xsd:complexType name="asequence">
      <xsd:sequence>
        <xsd:element name="asequence" type="xsd:string" nillable="false"/>
        <xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers" nillable="true"/>
      </xsd:sequence>
    </xsd:complexType>
Express asequenceQualifiers
    <xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers"/>
    <xsd:complexType name="asequenceQualifiers">
      <xsd:sequence>
        <xsd:element name="sbegin" type="xsd:integer"/>
        <xsd:element name="send" type="xsd:integer"/>
        <xsd:element name="usa" type="xsd:string"/>
        ……
      </xsd:sequence>
    </xsd:complexType>
Encapsulate all data types inside an application element
    <xsd:element name="water" type="tns:water"/>
    <xsd:complexType name="water">
      <xsd:sequence>
        <xsd:element name="asequence" type="tns:asequence"/>
        <xsd:element name="bsequence" type="tns:bsequence"/>
        <xsd:element name="datafile" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
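One benefit of expressing parameters in schema is that a payload can be checked before any job is built. The sketch below is an illustration only, not the EBI implementation: it validates a much simplified `water` element with the JDK's `javax.xml.validation` API. The class name `WaterSchemaCheck`, the flattened schema (plain string sequences plus an optional `sbegin`) and `elementFormDefault="qualified"` are assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a simplified "water" payload against a schema.
class WaterSchemaCheck {
    static final String NS = "http://www.ebi.ac.uk/ws/emboss/applications/water/";

    // Simplified stand-in for the generated application schema (assumption).
    static final String SCHEMA =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:tns='" + NS + "' targetNamespace='" + NS + "'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='water' type='tns:water'/>"
      + "<xsd:complexType name='water'><xsd:sequence>"
      + "<xsd:element name='asequence' type='xsd:string'/>"
      + "<xsd:element name='bsequence' type='xsd:string'/>"
      + "<xsd:element name='sbegin' type='xsd:integer' minOccurs='0'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "</xsd:schema>";

    // Returns true when the XML document conforms to SCHEMA, false otherwise.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            // With no ErrorHandler set, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<w:water xmlns:w='" + NS + "'>"
            + "<w:asequence>tsw:hba_human</w:asequence>"
            + "<w:bsequence>tsw:hbb_human</w:bsequence></w:water>";
        System.out.println(isValid(ok));
    }
}
```

A payload whose `sbegin` is not an integer, or which omits `bsequence`, is rejected before it gets anywhere near a command line, which is the kind of validation free-text input cannot offer.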
Using JAXB Generated Java Beans at the client side • Java Bean objects are generated for the client using the JAX-WS ‘wsimport’ tool – compiles WSDL + schema • Generated objects are populated using setters (client-side), e.g.
    Sequence asequence = new Sequence();
    asequence.setUsa("tsw:hba_human");
    // asequenceQual is the associated-qualifiers bean for asequence (declaration not shown on the slide)
    asequenceQual.setSprotein(true);
    asequenceQual.setSbegin(0);
EMBOSS Applications (300) • Manually creating the schema for each application is not scalable • Maven is a software project management & build tool. • We have written an EMBOSS ACD parser plugin for our Maven WS software build • Java class • Takes EMBOSS application definitions (ACD) as input • Outputs XML Schema and WSDL representing each EMBOSS application • These schemas are passed to a JAXB compiler, which generates our Java Bean objects
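To make the ACD-to-schema step concrete, here is a deliberately tiny sketch. It is not the actual Maven plugin: the class name `AcdToSchema`, the regular expression and the datatype mapping are all assumptions, and real ACD files carry far more detail (sections, attributes, calculated values) than this handles. It relies only on the basic ACD shape `datatype: name [ ... ]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, highly simplified ACD reader: one xsd:element per qualifier.
class AcdToSchema {
    // Matches the start of a definition such as "sequence: asequence [".
    private static final Pattern DEFINITION =
        Pattern.compile("^[ \\t]*(\\w+):[ \\t]*(\\w+)[ \\t]*\\[", Pattern.MULTILINE);

    // Structural keywords that do not describe a qualifier.
    private static final Set<String> SKIP = Set.of("application", "section");

    // Assumed mapping of simple ACD datatypes onto XML Schema built-ins.
    private static final Map<String, String> SIMPLE = Map.of(
        "boolean", "xsd:boolean",
        "integer", "xsd:integer",
        "float", "xsd:float",
        "string", "xsd:string");

    static List<String> elements(String acd) {
        List<String> out = new ArrayList<>();
        Matcher m = DEFINITION.matcher(acd);
        while (m.find()) {
            String datatype = m.group(1);
            String name = m.group(2);
            if (SKIP.contains(datatype)) {
                continue;
            }
            // Complex datatypes get their own named type, as on the slides.
            String type = SIMPLE.getOrDefault(datatype, "tns:" + name);
            out.add("<xsd:element name=\"" + name + "\" type=\"" + type + "\"/>");
        }
        return out;
    }

    public static void main(String[] args) {
        String acd = "application: water [\n  groups: \"Alignment:Local\"\n]\n"
            + "sequence: asequence [\n  parameter: \"Y\"\n]\n"
            + "boolean: brief [\n  default: \"Y\"\n]\n";
        elements(acd).forEach(System.out::println);
    }
}
```

The generated element lines would then be wrapped in the application's `complexType` and handed to the JAXB compiler.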
Advantages of WS EMBOSS Software Build • Advantages of this approach: • We can auto-generate XML schemas and application WSDLs • We can generate Java objects for use on the client side • We can easily integrate new EMBOSS applications as a WS by running the ACD file through our software build
Why go to these lengths? • Because of the sheer number of EMBOSS apps, it is necessary to provide a clear means of representing the invocation of separate applications and the passing of parameters appropriate to each app.
******* CLIENT SIDE CODE **********
    // Populate the parameters for the water application
    RunEmbossRequest run = new RunEmbossRequest();
    EmbossParams water = new EmbossParams();
    water.setAsequence(asequence);
    water.setBsequence(bsequence);
    // Select the application and attach its parameters
    Emboss emboss = new Emboss();
    emboss.setApplication(EmbossApplication.WATER);
    emboss.setApplicationParams(water);
    run.setEmbossParams(emboss);
    // Obtain the service port and submit the request
    WSEmbossService service = new WSEmbossService();
    WSEmboss wsemboss = service.getWSEmboss();
    RunEmbossResponse response = wsemboss.run(run);
Server-side – Reverse Process • At the server side, the request is de-serialised into the same Java objects and the values are read back using the getter methods, e.g.
******* SERVER-SIDE CODE **********
    // input is the incoming request object
    Emboss emboss = input.getEmbossParams();
    EmbossApplication embossApp = emboss.getApplication();
    String appname = embossApp.value();
    EmbossParams water = emboss.getApplicationParams();
    Sequence asequence = water.getAsequence();
    Seqall bsequence = water.getBsequence();
• This solution does not scale well
How do we get from a Web Service payload to a valid command line? • We are looking at the possibility of developing a generic mechanism to transform the SOAP envelope (our WS inputs – Water params etc.) using XSL (Extensible Stylesheet Language) into a form that can be used to invoke the EMBOSS binary (application)
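As a rough illustration of the XSL idea, the sketch below uses the JDK's built-in XSLT 1.0 processor to turn a parameter element into command line text. Everything here is an assumption for demonstration: the class name `PayloadToCommandLine`, the flat payload shape (one child element per qualifier) and the stylesheet itself. It does no validation or quoting, so its output should not be handed to a shell as-is.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical generic transform: <app><qualifier>value</qualifier>...</app>
// becomes "app -qualifier value ...".
class PayloadToCommandLine {
    // The root element name is the application; each child is one qualifier.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/*'>"
      + "<xsl:value-of select='local-name()'/>"
      + "<xsl:for-each select='*'>"
      + "<xsl:text> -</xsl:text><xsl:value-of select='local-name()'/>"
      + "<xsl:text> </xsl:text><xsl:value-of select='normalize-space(.)'/>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toCommandLine(String payloadXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(payloadXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toCommandLine(
            "<water><asequence>tsw:hba_human</asequence>"
          + "<bsequence>tsw:hbb_human</bsequence></water>"));
    }
}
```

Because the stylesheet works from element names rather than from application-specific classes, the same transform would serve any application whose payload follows this shape, which is the appeal of a generic mechanism over per-application getter code.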
Understanding our Job Submission Requirements • Building a valid & secure command line (approx 300 EMBOSS applications) • Issuing the command line (300 applications) • Retrieving results from the EMBOSS application • Our WS Job Submission should fulfil the EMBRACE Technology recommendations of: • Being a ‘Stateful Web Service’ • Implementing both synchronous and asynchronous functionality • Synchronous – submit a job (the client is locked in to that application until it returns a result) • Asynchronous (not synchronised) – submit a job but retain a free hand (not locked in) – we can poll the service with a jobid to obtain job status and results
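One way to approach the "valid & secure command line" requirement is to build an argument list rather than a shell string, and to reject anything not explicitly allowed. The sketch below is an assumption-laden illustration: the class name `CommandLineBuilder`, the per-application whitelist and the permitted value pattern are invented for the example, and a real service would derive the whitelist from the ACD definitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical builder that whitelists qualifiers and values.
class CommandLineBuilder {
    // Assumed whitelist; in practice this would come from the ACD files.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "water", Set.of("asequence", "bsequence", "datafile", "sbegin", "send", "sformat"));

    // Conservative value pattern: no whitespace, quotes or shell metacharacters.
    private static final Pattern SAFE_VALUE = Pattern.compile("[A-Za-z0-9_:.\\-]+");

    static List<String> build(String application, Map<String, String> qualifiers) {
        Set<String> allowed = ALLOWED.get(application);
        if (allowed == null) {
            throw new IllegalArgumentException("Unknown application: " + application);
        }
        List<String> argv = new ArrayList<>();
        argv.add(application);
        for (Map.Entry<String, String> q : qualifiers.entrySet()) {
            if (!allowed.contains(q.getKey())) {
                throw new IllegalArgumentException("Unknown qualifier: " + q.getKey());
            }
            if (q.getValue() == null || !SAFE_VALUE.matcher(q.getValue()).matches()) {
                throw new IllegalArgumentException("Unsafe value for -" + q.getKey());
            }
            argv.add("-" + q.getKey());
            argv.add(q.getValue());
        }
        // General qualifier from the slides: turn off interactive prompts.
        argv.add("-auto");
        argv.add("true");
        return argv;
    }

    public static void main(String[] args) {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("asequence", "tsw:hba_human");
        q.put("bsequence", "tsw:hbb_human");
        System.out.println(build("water", q));
    }
}
```

The resulting list can be passed to `java.lang.ProcessBuilder`, which starts the program directly without a shell, so each value reaches the application as a single argument.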
Operations to support requirement of ‘Stateful’ WS • RunJob: e.g. runJob(water); – all parameters for the job are encapsulated in the water object. The operation returns a jobid. • CancelJob: e.g. cancelJob("water12"); • This can be used to cancel the job execution • GetStatus: e.g. getStatus("water12"); • Returns one of: Waiting, Scheduled, Running, Done, Cancelled, Aborted • GetResult: e.g. getResult("water12"); • Retrieves the result of a job, given an identifier
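The four operations imply that the service keeps per-job state between calls. A minimal in-memory sketch of that bookkeeping might look as follows. It is only an illustration: the class name `JobRegistry`, the `complete` hook and the id scheme (application name plus a counter) are assumptions, and nothing is actually executed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory job table backing runJob/getStatus/cancelJob/getResult.
class JobRegistry {
    enum Status { WAITING, SCHEDULED, RUNNING, DONE, CANCELLED, ABORTED }

    private static final class Job {
        volatile Status status = Status.WAITING;
        volatile String result;
    }

    private final Map<String, Job> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Registers a job and returns its id, e.g. "water1".
    String runJob(String application) {
        String jobId = application + counter.incrementAndGet();
        jobs.put(jobId, new Job());
        return jobId;
    }

    Status getStatus(String jobId) {
        return find(jobId).status;
    }

    // Cancels the job unless it has already reached a final state.
    void cancelJob(String jobId) {
        Job job = find(jobId);
        if (job.status != Status.DONE && job.status != Status.ABORTED) {
            job.status = Status.CANCELLED;
        }
    }

    // Assumed hook called by the execution backend when a job finishes.
    void complete(String jobId, String result) {
        Job job = find(jobId);
        if (job.status != Status.CANCELLED) {
            job.result = result;
            job.status = Status.DONE;
        }
    }

    String getResult(String jobId) {
        Job job = find(jobId);
        if (job.status != Status.DONE) {
            throw new IllegalStateException("Job " + jobId + " is " + job.status);
        }
        return job.result;
    }

    private Job find(String jobId) {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("No such job: " + jobId);
        }
        return job;
    }
}
```

An asynchronous client would call `runJob`, poll `getStatus` with the returned id until it reports `DONE`, and then call `getResult`; a synchronous operation can be layered on top by doing that polling on the server side.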
Do we have to reinvent the wheel? – Enter OMII • We propose borrowing established technology as one possible solution to our requirements • Recently (this week) I met with the Software Group Leader at OMII – Open Middleware Infrastructure Institute, based at the University of Southampton – www.omii.ac.uk • OMII is an established GRID middleware service provider – very keen to have real users (developers using their products) • OMII designs GRID-related software products
What can they offer us? • We are interested in their GridSAM product • GridSAM consists of several subsystems that support: • Pluggable job persistence (if your job fails, it will be retried) • Job Queuing, Launching • Job Monitoring • Pending, staging in, active, executed, staging out, job completed
GridSAM cont’d • File Staging (stage in input files, stage out output files) • All this functionality is available through an API – JobManager Interface • Providing us with rich job submission functionality at little cost • Typically this functionality will be invoked from within the embedding Application – web service – using the API
How do I pass my job content to the GridSAM server? • Jobs are launched by passing a JSDL (Job Submission Description Language) document to the GridSAM server from a GridSAM client using the JobManager API • All of this can exist underneath your web service layer • Opportunity for a shared EMBRACE server perhaps!
Sample JSDL
    <?xml version="1.0" encoding="UTF-8"?>
    <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <JobDescription>
        <Application>
          <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
            <Executable>/bin/echo</Executable>
          </POSIXApplication>
        </Application>
      </JobDescription>
    </JobDefinition>
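A document of this shape can also be assembled at runtime. The sketch below builds it with the JDK's DOM and serialises it with the identity transformer; it uses only standard Java XML APIs and does not touch GridSAM itself. The class name `JsdlBuilder` is an assumption, and the `Argument` elements (part of the JSDL POSIX extension) go beyond what the sample slide shows.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that renders an executable and its arguments as JSDL.
class JsdlBuilder {
    static final String JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl";
    static final String POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix";
    private static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    static String build(String executable, List<String> arguments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element definition = doc.createElementNS(JSDL, "JobDefinition");
            definition.setAttributeNS(XMLNS, "xmlns", JSDL);
            doc.appendChild(definition);
            Element description = child(doc, definition, JSDL, "JobDescription");
            Element application = child(doc, description, JSDL, "Application");

            Element posix = child(doc, application, POSIX, "POSIXApplication");
            posix.setAttributeNS(XMLNS, "xmlns", POSIX);
            child(doc, posix, POSIX, "Executable").setTextContent(executable);
            for (String argument : arguments) {
                // One Argument element per command line token.
                child(doc, posix, POSIX, "Argument").setTextContent(argument);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build JSDL", e);
        }
    }

    private static Element child(Document doc, Element parent, String ns, String name) {
        Element element = doc.createElementNS(ns, name);
        parent.appendChild(element);
        return element;
    }

    public static void main(String[] args) {
        System.out.println(build("/bin/echo", List.of("hello")));
    }
}
```

For an EMBOSS job, the executable would be the application binary and the arguments its validated qualifiers and values.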
Very good! – What about the EMBOSS WS? • As mentioned, we propose to transform the EMBOSS WS payloads (SOAP messages) at runtime into valid JSDL documents to be submitted to GridSAM • GridSAM looks promising! • We will use the EMBOSS WS as a test bed • If successful we may make a recommendation to WP3