
myGrid: Upper level Grid Services for the Bioinformatician

Prof. Carole Goble, http://www.mygrid.org.uk. Sun Microsystems BioGrid Symposium, Baltimore, USA, 4th-5th December 2002. UK eScience Programme: Grid-enabled eScience, with an emphasis on information integration and knowledge management.



  1. myGrid: Upper level Grid Services for the Bioinformatician • Prof. Carole Goble • http://www.mygrid.org.uk • Sun Microsystems BioGrid Symposium, Baltimore, USA, 4th-5th December 2002

  2. UK eScience Programme • Grid-enabled eScience • Emphasis on information integration and knowledge management • The Virtual Organisation view • $180 million + industrial contributions • Complete infrastructure of regional eScience centres, support and a UK computational Grid • Started on Globus, though Unicore used in EuroGrid with great success • Centres donated equipment – highly heterogeneous • Core component of the EU Grid FP6 programme • Map labels: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton

  3. myGrid • EPSRC UK eScience pilot project • 01/01/02 - end 30/03/05 • Uses the UK Grid infrastructure • IBM, Lion BioSciences, Millennium Pharmaceuticals & Oracle

  4. myGrid • Not a computational grid project • Building Grid middleware • Higher level services: workflow, databases, knowledge management, provenance… • Service-based: Open Grid Service Architecture early adopter • Bioinformatics services are published as Web services and Grid Services • Working with publicly available biological resources, e.g. EMBL-EBI

  5. What is the Grid? • Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations • On-demand, ubiquitous access to computing, data, and all kinds of services • New capabilities constructed dynamically and transparently from distributed services • No central location, No central control, No existing trust relationships, Little predetermination • Uniformity, Pooling & Virtualisation

  6. What is the Grid? “E-Scientists” Environment • In silico experiments • Information harvesting & PSE • Dynamically forming virtual organisations to solve problems • Describing, searching for and weaving resources: people, applications, db, content, instruments • Orchestrating resources • Support for scientific method: provenance, argumentation, opinion contextualisation etc. • BioUtility & communities of practice • Knowledge Grid / Information Grid / Data/Computation Grid

  7. Information Weaving • Large amounts of different kinds of data & many applications. • Highly heterogeneous. • Different types, algorithms, forms, implementations, communities, service providers • High autonomy. • Highly complex and inter-related, & volatile. • Much of it textual narrative

  8. Circadian Rhythms • Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila? • I’ve got a cluster of proteins from my experiment. How do their functions interrelate? And what are the proteins with a particular function? • Is a structure known for my protein? What other proteins have a similar structure? • Can I build a homology 3D model? • What is known about a homologous protein?

  9. e-Science Q & A • Who else has asked this question & can I use/adapt their approach? Workflow. • What were the results at each stage? Dynamic Data Repositories. • When was P12345 last updated? Which BLAST did I use? Provenance. • Has PDB changed since I last ran this? Notification. • Personalisation.
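The provenance and notification questions on this slide ("Which BLAST did I use?", "Has PDB changed since I last ran this?") can be illustrated with a minimal sketch. It assumes a run record that stores the version of each resource used; the `RunRecord` class, the `changed_since` helper and the version strings are hypothetical and are not part of myGrid.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical provenance record for one workflow run."""
    workflow: str
    resources_used: dict  # resource name -> version string seen at run time


def changed_since(record, current_versions):
    """Return the names of resources whose version differs from the recorded one."""
    return sorted(
        name
        for name, version in record.resources_used.items()
        if current_versions.get(name) != version
    )


# Illustrative data only: the version strings are made up.
last_run = RunRecord(
    workflow="protein_function_clustering",
    resources_used={"PDB": "2002-11", "BLAST": "2.2.4"},
)
stale = changed_since(last_run, {"PDB": "2002-12", "BLAST": "2.2.4"})
print(stale)
```

A notification service could run such a comparison whenever a resource announces an update and alert the owners of the affected runs.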

  10. Courtesy of Mark Wilkinson (BioMOBY)

  11. RASMOL myGrid • Service based architecture • Publication, discovery, interoperation, composition, decommissioning of myGrid services • Resource Interoperation • Workflow coordination & Database integration. • Experimental workflows rather than production workflows. • Experimentation • Provenance & Change Propagation • Personalisation & Collaborative working. • Security & ownership • Knowledge based using metadata and ontologies

  12. Knowledge (ontologies) • Security • Personalisation • Provenance • Metadata • Web Portal • Workbench • Carp gene expression analysis • TALISMAN annotation workbench • BioMedical Services Library: DAS, workflow sets, integrated databases • Upper level knowledge-based Grid Common Services: semantic integration, knowledge based querying, workflow composition, visualisation, provenance mgt, semantic service discovery • Middle level Grid Common Services: database access, distributed query processing, service discovery, workflow enactment, event notification • Low level Grid Common Services (OGSI): co-scheduling, data shipping, authentication, job execution, resource monitoring, database access …

  13. Who is myGrid for? • myGrid users • IS specialists • biologists • systems administrators • tool builders • infrequent • problem specific • service provider • bioinformaticians • bioinformatics tool builders

  14. myGrid Outcomes • e-Scientists • Environment built on toolkits for service access, personalisation & community. • Talisman – annotation for the InterPro family of pattern databases • UTOPIA – visual multiple sequence alignment • Workbench for gene expression in Carp & Graves' disease • Developers • Protocols and service descriptions. • myGrid-in-a-Box developer's kit of core services. • Reference implementation services & applications. • Bio services.

  15. Service based architecture • Each bio resource is a service • Database, archive, analysis, tool, person, instrument, a workflow … • Each myGrid architectural component is a service • Workflow enactment engine, event notification, registry, scheduler… • OGSA (Open Grid Service Architecture) early adopter • Web services • Grid protocols

  16. Metadata + ontology • W3C: RDF, DAML+OIL, OWL • Service registration, discovery, publication, composition, management. • Data types & ontologies • Service matchmaking • Ontology editor, deployment server & reasoner • Typing inputs and outputs of workflows • Semantic database integration • Portal driving … • Semantic Web • OGSA • Web services • Grid protocols

  17. (1) The user selects values from a drop-down list to create a property-based description of the required service. Values are constrained to provide only sensible alternatives. (2) Once the user has entered a partial description, they submit it for matching. The results are displayed below. (3) The user adds the operation to the growing workflow. (4) The workflow specification is complete and ready to match against those in the workflow repository.
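The matching steps on this slide can be sketched roughly as follows. This is a minimal illustration, assuming services are described by flat property/value pairs; the `ServiceDescription` class, the property names and the example services are hypothetical. The slides indicate that myGrid uses ontologies and a reasoner for this, which plain equality matching does not capture.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Hypothetical property-based description of a published service."""
    name: str
    properties: dict = field(default_factory=dict)


def match_services(partial, registry):
    """Return services that agree with every property given in `partial`."""
    return [
        service
        for service in registry
        if all(service.properties.get(k) == v for k, v in partial.items())
    ]


def allowed_values(registry, prop, partial):
    """Values still sensible for `prop`, given what the user has chosen so far.

    This models the constrained drop-down list: only values offered by
    services that still match the partial description are shown.
    """
    return {
        service.properties[prop]
        for service in match_services(partial, registry)
        if prop in service.properties
    }


# Illustrative registry only; the names and properties are made up.
REGISTRY = [
    ServiceDescription("blast_search", {"input": "protein_sequence", "output": "similar_sequences"}),
    ServiceDescription("function_lookup", {"input": "protein_id", "output": "protein_function"}),
    ServiceDescription("structure_lookup", {"input": "protein_id", "output": "protein_structure"}),
]

matches = [s.name for s in match_services({"input": "protein_id"}, REGISTRY)]
choices = allowed_values(REGISTRY, "output", {"input": "protein_id"})
print(matches, sorted(choices))
```

Adding one more property to the partial description narrows both the matches and the values offered in the next drop-down.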

  18. Integration & Coordination • View-based Information Repository for XML data • Database integration • Access XML and RDBMS with OGSA-DAI • Semantic database integration. • Distributed query processing. • Workflow • Dynamic workflow enactment engine. • Workflow repository • User interactivity. • Workflows linked with results

  19. E-Science Support • Data provenance and resource change management • Workflow logs. • Event notification service. • Incremental view management. • Workflow and query evolution. • Personalisation • Management of views over repositories. • Personalisation of process flows. • Annotation of data sets and workflows • Dynamic creation of personal data sets.

  20. Bio-Science services • Grid-enabled BioServices by the EMBL-European Bioinformatics Institute • EMBOSS, SRS, Open BQS, BLAST, XEmbl and EmblFetch, Flybase, Gadfly … • Applications using Gateway API • TALISMAN (annotation tool used by Interpro) • UTOPIA (sequence fingerprint analysis) • Portal • Workbench application

  21. Diagram components: Portal, Repository Client, Workflow Client, Ontology Client, Meta Data: Ontology, Personal Repository, Meta Data: Service Type Directory, Workflow Repository • How do the functions of a cluster of proteins interrelate? • Some proteins in my personal repository

  22. Find services that take a protein and give its functions, and pick the best match.

  23. Find another that displays the proteins based on their function. The ontology restricts inputs & outputs.

  24. Build a workflow of composed services linked together.

  25. See if a workflow that is appropriate already exists. It could have been made by anyone who will share with you.

  26. Pick one and enact it.

  27. Diagram components: Repos. Client, Workflow Client, Service Selection Client, Workflow Enactment, Personal Repository, Service Directory, Provenance Data, Bioinformatic Services (interactions numbered 1 to 4) • While it's running, it picks the best service instance that can run the service at that time.

  28. While it's running, it picks the best service instance that can run the service at that time. Or you choose.

  29. The workflow finishes with the final display service.

  30. Results are put into your personal repository, with a concept from the ontology to tell you and myGrid what they mean.

  31. A full provenance record is kept and linked with the results. We could redo or reuse the workflow.
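The enactment walkthrough in slides 27 to 31 can be sketched in a few lines. This is an illustration under stated assumptions, not the myGrid enactment engine: the `ServiceInstance` class, the `load` score used to pick the "best" instance, the concept name and the toy services are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServiceInstance:
    """Hypothetical running instance of a service type."""
    name: str
    service_type: str
    load: float                        # assumed "busyness" score; lower is better
    run: Callable[[object], object]


def enact(workflow, directory, data, concept):
    """Run each step on the least-loaded matching instance and log provenance."""
    provenance = []
    for service_type in workflow:
        candidates = [s for s in directory if s.service_type == service_type]
        if not candidates:
            raise LookupError(f"no instance available for {service_type!r}")
        chosen = min(candidates, key=lambda s: s.load)   # selection at run time
        output = chosen.run(data)
        provenance.append(
            {"step": service_type, "instance": chosen.name, "input": data, "output": output}
        )
        data = output
    # The entry that would go into the personal repository: the result,
    # an ontology concept describing it, and the linked provenance record.
    return {"result": data, "concept": concept, "provenance": provenance}


def lookup_function(protein_ids):
    known = {"P1": "kinase", "P2": "kinase", "P3": "transporter"}  # toy data
    return {p: known.get(p, "unknown") for p in protein_ids}


def group_by_function(functions):
    groups = {}
    for protein, function in functions.items():
        groups.setdefault(function, []).append(protein)
    return groups


DIRECTORY = [
    ServiceInstance("function_a", "get_function", 0.9, lookup_function),
    ServiceInstance("function_b", "get_function", 0.2, lookup_function),
    ServiceInstance("grouper", "group_by_function", 0.5, group_by_function),
]

entry = enact(
    ["get_function", "group_by_function"],
    DIRECTORY,
    ["P1", "P2", "P3"],
    concept="protein_function_grouping",
)
print(entry["result"])
```

Because the provenance list records which instance handled each step and what it received and produced, the same workflow can later be re-run or compared against the stored results.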

  32. HPC vs Bioinformatics • Computational Biology vs Bioinformatics => HPC vs Info Grid • Relationship between them? Shared components? Architectures? • Information management matters! Accelerating scientific process is not just accelerating compute intensive processes. • HPC style BioGrid • Provenance? Personalisation? Metadata? Interactivity? Knowledge? Intermediate results to db; annotated logs…

  33. We are not alone • Other Efforts – we are not alone • W3C semantic web, BioMOBY, I3C, OMG LSR, active ontology development in the community, DARPA, • Open Grid Service Architecture • We believe!! Links with Web Services give many benefits. • But it’s a moving target … • GGF is a zoo … over 40 RG and WG, often overlapping.

  34. Service Providers • It's hard to get Service Providers' buy-in • lower the barriers of entry • make it reliable. • security & intellectual property management • programmatic interfaces • How do we migrate legacy applications? • Whole bunch of apps and databases on the web • Accounting matters • Who is going to pay for all this?

  35. Hotch potch • Heterogeneity sucks • Multi-policy of everything – security, access, accounting really matters in EU • Getting a UK Grid to work is non-trivial • Huge investment in system admin. • Doing more than you could do before. • Not just another predictable BLAST service over a bunch of machines • Non-predictable analysis.

  36. Not a silver bullet! It's just middleware, not magic • Data quality • Content management of databases (controlled vocabularies) • Provenance and versioning policies • Appropriate use of tools • Computational inaccessibility of free text annotation • Database accessibility through means other than point and click web interfaces. Independent of the Grid!

  37. Life Sciences Grid (LSG) http://people.cs.uchicago.edu/~dangulo/LSG/

  38. The sum up • If you ignore the multi-organisational aspect of Grid • If you ignore the heterogeneous aspect of Grid • If you assume it's safe and free and fair • Then it's not so hard.

  39. The myGrid Team • Carole Goble • Norman Paton • Alvaro Fernandes • Stephen Pettifer • Luc Moreau • Dave De Roure • Chris Greenhalgh • Tom Rodden • John Brooke • Paul Watson • Alan Robinson • Rob Gaizauskas • Robert Stevens • Neil Wipat • Matthew Addis • Nick Sharman • Rich Cawley • Simon Harper • Karon Mee • Simon Miles • Vijay Dailani • Xiaojian Liu • Tom Oinn • Martin Senger • Milena Radenkovic • Kevin Glover • Angus Roberts • Chris Wroe • Mark Greenwood • Phil Lord • Neil Davis • Darren Marvin • Justin Ferris • Peter Li • Nedim Alpdemir • Luca Toldo • Robin McEntire • Anne Westcott • Tony Storey • Bernard Horan • Paul Smart • Robert Haynes

  40. Spares

  41. Applications: e-Scientist environment, Knowledge applications & networks, Collaboratory, Annotation, Text mining, Prediction • Knowledge services: Knowledge Services, Knowledge-based information services, Semantic services, Knowledge-based data/computation services • Base services: Data/computation services, Information services • Resources

  42. myGrid Stack • Sequence annotation, MSD, Cold Carp Gene Expression, Custom Application • Demonstrator Workbench, Demonstrator Application, UTOPIA, Workbench Apps Builder (Talisman) • User Agent, Presentation Services, Collaboration Support, Management Tools, Web Portal • Gateway API • Security, Personalisation, Provenance • BioMedical Services Library, e.g. Distributed Annotation Service • Semantic Discovery, Information Extraction, Knowledge Provenance, Validation & Assessment • Semantic aware services: Semantic Workflow Design, Ontology Service, Reasoner, Service matcher, Semantic Data Integration, Provenance metadata, Annotation • Base Services: Distributed Query, Workflow Enactment, QoS, Syntactic Discovery, Availability, Versioning, Preferences, Event Notification • Fabric …: Third Party, JobExecution, Device Access, MIR, ‘White Pages’ & ‘Yellow Pages’ Discovery, Database Access, Metadata Database

  43. myGrid Stack 0.1 (same component labels as slide 42)

  44. myGrid Stack 0.2 (same component labels as slide 42)

  45. Service based architecture • Find them • Publication, registration, discovery, matchmaking, deregistration. • Run them. • Execution, monitoring, exception handling. • Organise them. • Interoperation, composition, substitution.
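The "find them, run them, organise them" lifecycle can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions: the `ServiceRegistry` class, its method names and the two example services are hypothetical, and a real deployment would publish Web or Grid services rather than local Python functions.

```python
class ServiceRegistry:
    """Toy registry: publish, discover, deregister, and run with substitution."""

    def __init__(self):
        self._services = {}  # name -> (service_type, callable)

    def publish(self, name, service_type, func):
        self._services[name] = (service_type, func)

    def deregister(self, name):
        self._services.pop(name, None)

    def discover(self, service_type):
        return [n for n, (t, _) in self._services.items() if t == service_type]

    def run(self, service_type, payload):
        """Try each matching service in turn; substitute the next one on failure."""
        errors = []
        for name in self.discover(service_type):
            _, func = self._services[name]
            try:
                return name, func(payload)
            except Exception as exc:  # exception handling: record and move on
                errors.append((name, exc))
        raise RuntimeError(f"no service of type {service_type!r} succeeded: {errors}")


def unavailable(_payload):
    raise ConnectionError("primary service unavailable")  # simulated outage


registry = ServiceRegistry()
registry.publish("normalise_primary", "normalise_sequence", unavailable)
registry.publish("normalise_backup", "normalise_sequence", str.upper)

used, output = registry.run("normalise_sequence", "acgt")
registry.deregister("normalise_primary")
remaining = registry.discover("normalise_sequence")
print(used, output, remaining)
```

Substitution here is simply "try the next service of the same type"; choosing a semantically equivalent replacement would need the richer service descriptions discussed earlier in the talk.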
