Middleware Technologies in Bioinformatics


Presentation Transcript


  1. Middleware Technologies in Bioinformatics Alan Robinson

  2. Talk Outline • What is middleware? • Why middleware? • What does middleware offer bioinformatics? • Who is middleware for? • Some middleware technologies • Some practical uses of middleware

  3. What is middleware? • The “stuff” between: • Hardware & software • OS & applications • Applications • Properties of middleware: • “Glue” • Integration • Interoperability • Infrastructure • Enables distributed computing.

  4. Why Middleware? • Bioinformatics requires: • Access to multiple distributed resources • Up-to-date information • Minimal data redundancy • Robust applications • Extendable applications • Monolithic App. vs. Components • Portable software

  5. Why Middleware? • Bioinformatics must contend with increasing amounts of information • Bioinformatics must adapt to changing and new technologies • Bioinformatics can learn from IT experiences in other domains • For example: • Microarray data accentuate these problems of integration and interoperability, but the same problems apply to HTP sequencing and proteomics.

  6. Pathways Interoperation & Integration • What do we know about other molecules involved in that pathway? • EMBL-Bank: DNA sequences • SWISS-PROT + TrEMBL: Protein sequences • EnsEMBL: Human genome gene annotation • Array-Express: Microarray expression data • EMSD: Macromolecular structure data • IntAct: Protein interactions

  7. Middleware Technologies • CORBA: • Object-oriented / components • Web services: • XML-based • Grid: • Next generation?

  8. CORBA Overview • CORBA allows the interconnection of databases and applications, regardless of: • The computer language of the applications that provide or use the objects • The machine architecture of the computers involved • The geographical location of the computer (connection through the Internet/Intranet)

  9. CORBA (diagram) • Client • Stub • ORB • Skel • Object

  10. CORBA (diagram) • Two Client/Object pairs, each with its own Stub, Skel and ORB • The two ORBs are connected by IIOP
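
The stub/skeleton arrangement on the two CORBA diagram slides can be illustrated with a small in-process sketch. This is not a real ORB and uses no CORBA library: the class names, the `length` operation, and the use of JSON in place of CORBA's binary marshalling are all assumptions made for illustration.

```python
import json


class SequenceStore:
    """Server-side object (hypothetical example, not from the talk)."""

    def length(self, sequence):
        return len(sequence)


class Skeleton:
    """Server side: unpacks a request and calls the real object."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes)
        method = getattr(self._servant, request["op"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode()


class Orb:
    """Toy request broker: routes requests to registered skeletons.

    A real ORB would send the bytes over the network (e.g. via IIOP).
    """

    def __init__(self):
        self._skeletons = {}

    def register(self, object_id, skeleton):
        self._skeletons[object_id] = skeleton

    def invoke(self, object_id, request_bytes):
        return self._skeletons[object_id].dispatch(request_bytes)


class Stub:
    """Client side: looks like the object, but forwards calls to the ORB."""

    def __init__(self, orb, object_id):
        self._orb = orb
        self._object_id = object_id

    def length(self, sequence):
        request = json.dumps({"op": "length", "args": [sequence]}).encode()
        reply = self._orb.invoke(self._object_id, request)
        return json.loads(reply)["result"]


orb = Orb()
orb.register("seq-store", Skeleton(SequenceStore()))
client_view = Stub(orb, "seq-store")
print(client_view.length("ACGT"))
```

The client only ever sees the stub, so the language, machine and location of the real object stay hidden behind the broker.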

  11. CORBA Services • Naming service • Trader service • Event service • Notification service • Object transaction service • Security service • LifeCycle service • Relationship service • Persistent state service • Externalisation service • Object query service • Object properties service • Concurrency service • Licensing service • Secure time service • Object collection service • ...

  12. Object Management Architecture • Application objects • Vertical CORBA facilities • Horizontal CORBA facilities • Object Request Broker • CORBA services

  13. Web services • A collection of XML-based technologies developed by the e-business community to address issues of: • Service discovery - Business processes • Interoperability - Data exchange • Major developers include: • Apache, IBM, HP, SUN & Microsoft (.NET) • http://www.webservices.org/ • http://www.ibm.com/developerworks/webservices/

  14. Web Services Architecture

  15. Web Services Stack

  16. SOAP • Simple Object Access Protocol • A lightweight protocol for exchange of information in a decentralized, distributed environment • A design goal is to encapsulate RPC calls using the extensibility and flexibility of XML • http://www.w3c.org/TR/SOAP/
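
As a rough illustration of how an RPC call can be encapsulated in XML, the sketch below wraps a call in a SOAP 1.1 envelope using only the Python standard library. The operation name `getSequence`, the parameter `accession`, and the service namespace are hypothetical and do not refer to any real service.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(method, params, service_ns):
    """Wrap an RPC-style call in a SOAP envelope and return it as XML text."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_request(xml_text):
    """Recover the method name and parameters from the envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV_NS}}}Body")
    call = body[0]
    method = call.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


# Hypothetical call: names and namespace are illustrative only.
message = build_soap_request(
    "getSequence", {"accession": "X00001"}, "urn:example:sequence-service"
)
print(message)
print(parse_soap_request(message))
```

In practice the envelope would be sent to the service over a transport such as HTTP; that step is left out here.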

  17. XML Messaging Using SOAP

  18. WSDL • Web Services Description Language • A specification to describe networked XML-based services • A simple way for service providers to describe the basic format of requests to their systems regardless of the underlying protocol • http://www.w3.org/TR/wsdl/

  19. UDDI • Universal Description, Discovery and Integration • UDDI creates a platform-independent, open framework & registry using the Internet for: • Describing services • Discovering businesses • Integrating business services • http://www.uddi.org/

  20. WSFL • Web Services Flow Language • An XML language for the description of Web Services compositions • Describes how Web services may be composed into new Web services to support business processes • http://www-4.ibm.com/software/solutions/webservices/pdf/WSFL.pdf

  21. Web Services Stack

  22. Proposed Specifications for Web services • Quality of service (WS-Quality) • Transactions • Service level agreements (WS-SLA) • Security (WS-Security) • Interactivity (WSXL) • ... • Serious investment & momentum in WS.

  23. Semantic Web • An evolution of the current Web • Information is given well-defined meaning, better enabling computers and people to work in co-operation • Data defined and linked for more effective discovery, automation, integration, and reuse across various applications • Enable users/agents to locate, select, employ, compose, and monitor Web-based services automatically • http://www.semanticweb.org/

  24. DAML+OIL • A markup language with a rich set of constructs that allow for the creation of complex and robust ontologies • Written in RDF & RDFSchema, but provides richer modelling primitives • Intent is to provide additional machine-processable semantics for resources • http://www.w3.org/TR/daml+oil-reference

  25. Example of DAML • "Parenthood is a more general relationship than motherhood" and "Mary is the mother of Bill" together allow a DAML system to conclude that "Mary is the parent of Bill" • If a user asks a DAML search system "Who are Bill's parents?" • The system can respond that “Mary is one of Bill's parents”, even though that fact is not stated anywhere, but can only be derived by a DAML application.
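
The inference in the parenthood example can be sketched in a few lines of Python. This is not DAML+OIL syntax or a real reasoner: facts are plain tuples, the property names `motherOf` and `parentOf` are made up for the sketch, and each property is assumed to have at most one more general property.

```python
def infer_superproperties(facts, sub_property_of):
    """Return the facts plus everything derivable via 'more general' properties."""
    derived = set(facts)
    frontier = list(facts)
    while frontier:
        subject, prop, obj = frontier.pop()
        parent_prop = sub_property_of.get(prop)
        if parent_prop is not None:
            new_fact = (subject, parent_prop, obj)
            if new_fact not in derived:
                derived.add(new_fact)
                frontier.append(new_fact)
    return derived


def parents_of(person, facts):
    """Answer "Who are <person>'s parents?" from a set of facts."""
    return sorted(s for (s, p, o) in facts if p == "parentOf" and o == person)


# "Mary is the mother of Bill"
stated = {("Mary", "motherOf", "Bill")}
# "Parenthood is a more general relationship than motherhood"
hierarchy = {"motherOf": "parentOf"}

known = infer_superproperties(stated, hierarchy)
print(parents_of("Bill", stated))  # [] : the fact is not stated anywhere
print(parents_of("Bill", known))   # ['Mary'] : derived from the hierarchy
```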

  26. DAML-S • Supplies Web service providers with a core set of markup language constructs for describing the properties and capabilities of their Web services in an unambiguous form that agents can interpret • Facilitates the automation of Web service tasks including automated discovery, execution, interoperation, composition and monitoring • Builds on top of DAML+OIL • http://www.daml.org/services/

  27. DAML for Web Services

  28. What is the Grid? • “An environment that enables geographically distributed scientists to achieve research goals more effectively, while enabling their results to be used in developments elsewhere” • Typified by access to HPC & HPN • Globus & AccessGrid.

  29. OGSA • Open Grid Services Architecture • “A proposed evolution of the current Globus toolkit towards a Grid system architecture based on an integration of Grid & Web service technologies” • http://www.globus.org/ogsa/

  30. OGSA • Architecture defines a uniform exposed service semantics (the Grid service) • Defines standard mechanisms for creating, naming, and discovering transient Grid service instances • Provides location transparency and multiple protocol bindings for service instances • Supports integration with underlying native platform facilities.

  31. OGSA • OGSA defines WSDL interfaces and associated conventions and mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification • Still embryonic.

  32. What is e-Science? Prof. Tom Rodden - Nottm. • e-Science is not just high bandwidth communication and HPC running simulations linked through “the GRID” • e-Science is about: • Exploiting digital technology to support all aspects of scientific activity • Support for large-scale science through distributed global collaborations • Formation of virtual co-laboratories allowing scientists to work together irrespective of location • Universal access to scientific resources • Support for scientific community.

  33. The Bottom Line... “The development of a communication and computational infrastructure to underpin the work of scientists” • Middleware enabling interoperability

  34. Distributed Annotation System • A web-based protocol for a distributed sequence annotation system developed by Lincoln Stein et al. at CSHL • A single server is the “reference server” and serves essential genome structural information • physical map, sequence, authorship information • Sequence annotation decentralised among multiple third-party annotators and integrated on an "as-needed" basis by client-side software

  35. Distributed Annotation System

  36. The DAS System • Interrogate annotation servers to retrieve and add features to the sequence retrieved from the reference server • Need a standard format to describe sequence features • The format must be able to deal with relative co-ordinates in which annotations are related to arbitrary hierarchical landmarks • Assumes good sequence is available, related by mapping info

  37. Distributed Annotation System (diagram) • Reference server • Annotation server 1 • Annotation server 2 • Client

  38. DAS Annotations • Annotations relate to a region of sequence • Each annotation is unambiguously located by defining its position relative to a reference sequence • Annotation co-ordinates stored relative to the smallest sequencing unit since more stable than co-ordinates based on links or chromosomes
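
To make the idea of relative co-ordinates concrete, here is a minimal sketch that maps an annotation stored against a small sequencing unit onto the top-level sequence. The landmark names other than `CHROMOSOME_I`, all offsets, 1-based inclusive co-ordinates, and the assumption that every landmark has the same orientation as its parent are illustrative choices, not part of the DAS specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Landmark:
    parent: Optional[str]  # None for the top-level sequence
    offset: int            # 1-based start of this landmark on its parent


def to_top_level(landmarks, ref, start, stop):
    """Translate (ref, start, stop) into co-ordinates on the top-level landmark."""
    landmark = landmarks[ref]
    while landmark.parent is not None:
        shift = landmark.offset - 1
        start, stop = start + shift, stop + shift
        ref = landmark.parent
        landmark = landmarks[ref]
    return ref, start, stop


# Hypothetical mapping: a clone placed on a contig placed on a chromosome.
LANDMARKS = {
    "CHROMOSOME_I": Landmark(parent=None, offset=1),
    "CONTIG_1": Landmark(parent="CHROMOSOME_I", offset=100001),
    "CLONE_A": Landmark(parent="CONTIG_1", offset=5001),
}

# An annotation at positions 10..20 of the clone.
print(to_top_level(LANDMARKS, "CLONE_A", 10, 20))
```

If the contig is later moved on the chromosome, only its `offset` changes; the annotation stored against the clone stays valid, which is the stability argument made on the slide.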

  39. Client/Server Interactions • DAS is web-based • Clients query the reference and annotation servers by sending formatted URL request to the server • Request composed of • site, data source, command and arguments • Servers process the request and return response as a formatted XML document http://stein.cshl.org/das/elegans/features?ref=CHROMOSOME_I&start=1000&stop=20
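
A request URL of this shape can be assembled from its four parts (site, data source, command, arguments) with the standard library. The helper name `build_das_url` is hypothetical, and the `stop` value used below is an arbitrary illustrative number, not the value from the slide's example URL.

```python
from urllib.parse import urlencode


def build_das_url(site, data_source, command, **arguments):
    """Compose a DAS-style request URL: site/data_source/command?arguments."""
    url = f"{site.rstrip('/')}/{data_source}/{command}"
    query = urlencode(arguments)
    return f"{url}?{query}" if query else url


# The stop value here is illustrative only.
url = build_das_url(
    "http://stein.cshl.org/das",
    "elegans",
    "features",
    ref="CHROMOSOME_I",
    start=1000,
    stop=2000,
)
print(url)
```

A client would then fetch this URL over HTTP and parse the XML document the server returns; the response format is not shown on the slide, so it is left out of the sketch.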

  40. Soaplab • Soaplab is a set of Web Services providing programmatic access to applications on remote computers. • Soaplab uses a specification for an Analysis Service (based on OMG's Biomolecular Sequence Analysis specification) • The EMBL-EBI has a Soaplab service running on top of several tens of analyses (most coming from EMBOSS) • Soaplab does not access individual analysis programs directly but uses a general-purpose package, AppLab, that hides all details about finding, starting, controlling, and using applications. http://industry.ebi.ac.uk/soaplab/

  41. The myGrid Project • “myGrid aims to design, develop & demonstrate higher level functionalities over an existing Web services & Grid infrastructure that support scientists in making use of complex distributed resources” • The exemplar domain is bioinformatics • http://www.mygrid.org.uk/

  42. Converging Technologies • Grid Computing: Globus, Sun Grid Engine, Condor, DS (Jini, Corba); an early adopter for OGSA • Agents: ACL, methodology • Web Technologies: SOAP, WSDL, UDDI, WSFL, DAML+OIL, OWL, RDF(S)

  43. myGrid Features • The myGrid project seeks to make the use of on-line services easier by improving: • Publication of services • Service repository • Discovery of services • Ontology & metadata • Use & access to services • Interoperation • Personalisation • Personal repository • Annotation • Configuration • Sharing • Workflow & process • Composition • Enactment • Storage • Management of provenance • Recording of the process • Attribution • Notification • Tracking changes to services • Update of services • Security & trust.

  44. Conclusions...
