A perspective on life science grids in Europe • Vincent Breton (CNRS-IN2P3, LPC Clermont-Ferrand) • ISGC 2007 • March 28th, 2007
Content • Introduction • The life science grid ecosystem • Embrace, BioinfoGRID, 2 contributions to life science activities on grids in Europe • Conclusion
Introduction • In this talk, LIFE SCIENCE = bioinformatics and molecular biology • Hurng-Chun Lee has already addressed grid-enabled drug discovery • WISDOM workshop tomorrow • Yannick Legré will address medical research later this afternoon
Needs of the life sciences community • Biologists need a growing capability to handle all the data relevant to their research topics • Design of complex analysis workflows • Knowledge management • Bioinformaticians, who develop the IT services for biologists, need growing resources • To store, update and curate exponentially growing databases • To run increasingly complex algorithms on this growing data set • To build new databases exploiting the growing body of knowledge • Biologists and bioinformaticians therefore have different needs • Biologists need high-level environments but few resources • Bioinformaticians need large resources to develop and/or update the services needed by biologists
The life science community needs both e-science and grid infrastructures • E-science focuses on creating new research environments for biologists • Use of the most recent information technologies (semantics, ontologies) • Design of virtual laboratories where the biologist can run experiments and manipulate the knowledge she/he is familiar with • Examples: MyGrid (UK) and VLe (Netherlands) -> T. Oinn talk • Grid infrastructures provide the resources needed at different levels • To support bioinformaticians who maintain the databases accessed by e-science environments (update, curate, store/duplicate) • To increase resources for e-science environments when needed • To enable specific heavy computing or data production projects (Decrypthon)
The situation in Europe • Several e-science projects are developing high-level e-science environments that are being adopted by the biology community • Grid infrastructures (EGEE, DEISA) now provide robust computing and growing data management services • Two projects are exploring the interface between e-science and grid infrastructures for the life sciences: Embrace and BioinfoGRID
Introduction • EMBRACE is an EU-sponsored Network of Excellence aimed at enabling bioinformatics research through better interoperability of databases, servers, and services
Example • You want to predict phosphorylation sites just outside transmembrane helices in 1329 membrane proteins. • Yesterday: • 1) Obtain software to predict transmembrane helices; • 2) Obtain software to predict phosphorylation sites; • 3) Install both programs; • 4) Write software that calls both programs; • 5) Write software that combines outputs and presents results. • Tomorrow: • Import APIs for the two services; • Write software that combines outputs and presents results.
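The step that remains in the "Tomorrow" scenario is combining the outputs of the two services. The sketch below shows one way to do it and rests on several assumptions. Each service is assumed to return transmembrane helices as 1-based inclusive (start, end) residue ranges and phosphorylation sites as residue positions. "Just outside" is read as lying within a hypothetical `window` of residues flanking either end of a helix. The service calls are not shown, and the function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Assumed output formats of the two (hypothetical) prediction services:
# helices as 1-based inclusive (start, end) residue ranges, sites as residue positions.
Helix = Tuple[int, int]


def sites_just_outside(helices: List[Helix], sites: List[int], window: int = 5) -> List[int]:
    """Keep predicted phosphorylation sites that lie within `window` residues
    of either end of a helix but not inside any helix (window is an assumption)."""
    selected = []
    for pos in sorted(set(sites)):
        if any(start <= pos <= end for start, end in helices):
            continue  # inside a transmembrane helix: not "just outside"
        if any(start - window <= pos < start or end < pos <= end + window
               for start, end in helices):
            selected.append(pos)
    return selected


def combine(predictions: Dict[str, Tuple[List[Helix], List[int]]],
            window: int = 5) -> Dict[str, List[int]]:
    """Merge per-protein outputs of both services into one result table."""
    return {pid: sites_just_outside(helices, sites, window)
            for pid, (helices, sites) in predictions.items()}


if __name__ == "__main__":
    # Toy predictions for one protein; real input would cover all 1329 proteins.
    demo = {"PROT1": ([(20, 42), (60, 81)], [10, 17, 30, 45, 55, 85, 99])}
    print(combine(demo))  # {'PROT1': [17, 45, 55, 85]}
```

The window size is a modelling choice. A biologist would tune it, or replace it with a distance measured toward the cytoplasmic side only.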
Data • EMBRACE includes nearly all European bioinformaticians with longstanding track records in providing databases, servers, and services. • Data types that they will make available: DNA sequences, protein sequences, macromolecular structures, SNPs, expression information, alignments, untranslated regions, structure domains, protein families, literature, electron micrographs, orthologs, ORFs, genome annotation, proteomics patterns, GPCRs, protein interactions, nucleotid…
Software • EMBRACE includes nearly all large European bioinformatics centers, which will all make their servers, services, and computational tools available through the EMBRACE-GRID. • Computational facilities that all European bioinformaticians will have at their fingertips include: DNA sequence analysis, genome annotation, homology searches at sequence and structure level, structure analysis, visualization, protein sequence analysis, phylogeny, protein domain mapping, pattern matching, HMM, neural nets, micro-arrays, workflow management, text-mining, systems biology, database techno…
Contact • EMBRACE is coordinated by Graham Cameron and Kerstin Nyberg at the EBI. • Peter Rice coordinates the content integration • Alan Bleasby coordinates the tools integration • Vincent Breton coordinates technology recommendation • Erik Bongcam Rudloff coordinates the test cases • Gert Vriend coordinates outreach and education
Recommendation: web service technology • [Figure: diagram showing the application, its application interface and the user interface]
Embrace grid • Develop standard web service interfaces to tools and databases • Provide a workbench (Taverna) to exploit these tools and data (-> T. Oinn talk) • Support database providers in accessing grid infrastructures for resources and grid services • Pioneers: Swiss Institute of Bioinformatics, CMBI (Netherlands) • Creation of an Embrace VO on EGEE • Develop interfaces between the e-science environments and the grid infrastructure • Issues: web service interface to grid services
Refinement of protein structures • Project led by Gert Vriend (CMBI, Nijmegen, NL) • Goal: recalculate the 3D structures of the 40,000 proteins stored in the PDB with an improved image reconstruction algorithm • Estimated computing need: 4 CPU years • Status: under deployment on the Embrace EGEE VO
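The 4 CPU-year estimate can be turned into a per-structure budget and a rough wall-clock time. The sketch below assumes the work divides evenly across the 40,000 structures. The CPU count and the efficiency factor passed to `wall_clock_days` are hypothetical inputs, not figures from the project.

```python
# Back-of-envelope budget for recalculating PDB structures, assuming the
# 4 CPU-year estimate is spread evenly over all structures.
HOURS_PER_YEAR = 365.25 * 24  # average year length in hours


def cpu_hours_per_structure(total_cpu_years: float, n_structures: int) -> float:
    """Average CPU time available per structure, in hours."""
    return total_cpu_years * HOURS_PER_YEAR / n_structures


def wall_clock_days(total_cpu_years: float, n_cpus: int, efficiency: float = 1.0) -> float:
    """Elapsed days if the work runs on n_cpus concurrently; efficiency (0-1]
    is a hypothetical factor for queueing, failures and resubmissions."""
    return total_cpu_years * 365.25 / (n_cpus * efficiency)


if __name__ == "__main__":
    print(f"{cpu_hours_per_structure(4, 40_000) * 60:.1f} min per structure")  # 52.6 min per structure
    print(f"{wall_clock_days(4, 500):.1f} days on 500 CPUs")  # 2.9 days on 500 CPUs
```

Under these assumptions each structure gets just under an hour of CPU time. The job is therefore a natural fit for many short, independent grid jobs rather than one long parallel run.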
BioinfoGRID Objective • The BioinfoGRID project aims to deploy bioinformatics applications on existing grid infrastructures (EGEE)
Grid-enabled in silico drug discovery • [Figure: grid service providers (chemoinformatics and bioinformatics teams) offer annotation, docking and MD services on the grid infrastructure to grid service customers (chemist/biologist teams); from the target, hits pass successive checkpoints until selected hits go to biology teams for in vitro tests]
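The drug discovery flow on this slide can be read as a chain of service stages. Each stage is followed by a checkpoint that passes only the best-scoring compounds to the next stage. The sketch below is an illustrative model, not the WISDOM implementation. The `Stage` type, the toy scores and the convention that lower scores are better are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scores = Dict[str, float]  # compound id -> score; lower is assumed to be better


@dataclass
class Stage:
    name: str                               # e.g. "docking" or "MD" (illustrative)
    run: Callable[[List[str]], Scores]      # grid service call, stubbed here
    keep: int                               # checkpoint: how many compounds pass


def run_pipeline(compounds: List[str], stages: List[Stage]) -> List[str]:
    """Run each stage on the surviving compounds, then apply its checkpoint."""
    hits = list(compounds)
    for stage in stages:
        scores = stage.run(hits)
        hits = sorted(scores, key=scores.get)[: stage.keep]
    return hits  # selected hits handed to biology teams for in vitro tests


# Toy scores standing in for real docking and MD services (hypothetical values).
TOY_DOCKING = {"c1": -7.0, "c2": -9.5, "c3": -6.1, "c4": -8.2}
TOY_MD = {"c1": -20.0, "c2": -15.0, "c3": -10.0, "c4": -31.0}

STAGES = [
    Stage("docking", lambda ids: {c: TOY_DOCKING[c] for c in ids}, keep=3),
    Stage("MD", lambda ids: {c: TOY_MD[c] for c in ids}, keep=2),
]

if __name__ == "__main__":
    print(run_pipeline(["c1", "c2", "c3", "c4"], STAGES))  # ['c4', 'c1']
```

Keeping each checkpoint as a separate `keep` threshold mirrors the slide's separation between service providers, who run the stages, and customers, who decide what passes.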
Molecular Dynamics (MD) simulation on the grid • Choice of Amber as MD software • MMPBSA procedure developed by G. Rastelli (Univ. Modena) • Licensing issues addressed • One license per grid site deploying Amber • Use of Amber restricted to grid users coming from institutes owning a license • Access granted to all the nodes where Amber is installed • Deployment of MD calculations on EGEE • Reranking of the 2500 best hits coming out of the first WISDOM data challenge on malaria in February • Selection of the 100 best hits • In vitro tests to start next month at Chonnam National University
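Two rules on this slide lend themselves to a short sketch: the Amber licensing policy, and the selection of the best hits after MM-PBSA rescoring. The function names, institute names and site names are hypothetical. The sketch also assumes that MM-PBSA free energies are given in kcal/mol, where more negative values mean stronger predicted binding.

```python
from typing import Dict, List, Set


def eligible_sites(user_institute: str, licensed_institutes: Set[str],
                   amber_sites: List[str]) -> List[str]:
    """Licensing rule as stated on the slide: a user from an institute owning an
    Amber license may run on every site where Amber is installed; others on none."""
    if user_institute not in licensed_institutes:
        return []
    return list(amber_sites)


def rerank(mmpbsa: Dict[str, float], n_best: int = 100) -> List[str]:
    """Order hits by MM-PBSA binding free energy (assumed kcal/mol, more negative
    = stronger predicted binding) and keep the n_best compounds."""
    return sorted(mmpbsa, key=mmpbsa.get)[:n_best]


if __name__ == "__main__":
    # Hypothetical institutes, sites and energies, for illustration only.
    print(eligible_sites("InstituteA", {"InstituteA"}, ["site1", "site2"]))  # ['site1', 'site2']
    print(eligible_sites("InstituteB", {"InstituteA"}, ["site1"]))  # []
    print(rerank({"a": -10.0, "b": -30.5, "c": -12.0}, n_best=2))  # ['b', 'c']
```

In the deployment described here, `rerank` would receive the 2500 rescored hits and return the 100 compounds to test in vitro.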
Conclusion • The life science community actually involves two communities: molecular biologists and bioinformaticians • The two communities have different needs • Biologists need high-level e-science environments • Bioinformaticians also need growing resources to maintain, update and curate their databases • Several grid projects in Europe are targeting both communities • Embrace • BioinfoGRID • Significant progress is being made • Still, web service interfaces to grid infrastructures need to be developed to foster adoption
Goals of the talk • Explain how the projects in Europe fit together • EGEE: the infrastructure • The links with the bioinformatics community • BioinfoGRID: use the EGEE grid as it is today. This means asking biologists and bioinformaticians to adapt • Embrace: my role is to build the bridge between the community's current culture and today's grids. • Notion of grid ecosystem • Need for different services for the biology and bioinformatics community • From MyGrid to EGEE • Presentation of Embrace • Presentation of BioinfoGRID