
Bioinformatics Applications in the Spanish Network for e-Science



Presentation Transcript


  1. Bioinformatics Applications in the Spanish Network for e-Science
     Ignacio Blanquer, Vicente Hernández

  2. Outline
     The Spanish Network for e-Science
     • Structure and link with the Spanish NGI.
     Bioinformatics applications in the Spanish Network for e-Science.
     Challenges for Bioinformatics on the Grid.
     Bioinformatics Session - EGEE’09 - Barcelona

  3. The Creation of the Spanish Network for e-Science
     As a consequence of the interest raised by the different research centres and groups participating in national and international projects on Grids and Supercomputing, the White Book on e-Science was produced (http://www.fecyt.es/e-ciencia/libroblanco.htm).
     Given the need for global coordination and for the development of common tools to ease access to resources, the Spanish Network for e-Science (CAC-2007-52) was created by the Ministry of Science and Innovation.
     • Officially approved in December 2007 and coordinated by Vicente Hernández García (Universidad Politécnica de Valencia).
     One of the mandates of the Network was to set up the Spanish NGI, which was officially created in July 2009.
     • The ministry nominated Isabel Campos (IFCA) as the coordinator of the Spanish NGI.

  4. Participant Groups
     More than 50 different institutions and 97 research groups.
     More than 1000 researchers.
     Dynamic structure
     • 28 groups have joined since the activity started.
     Structured in four activity areas.
     EGEE Booth Number 6.

  5. Infrastructure
     Site      Cores    TB
     IFCA        867   1
     CESGA       339   1
     UNIZAR       54   0.8
     PIC        1296  10
     CIEMAT      220   2.7
     UPV          36   1
     • gLite-based
     • Own BDII (EGEE-compatible)
     • Supporting IBERGRID (ES+PT)
     • 3 different WMs (Xbroker, gLite-WMS, GridWay)
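The per-site figures on this slide can be added up to get a sense of the overall size of the infrastructure. The following is a minimal sketch that only sums the numbers listed above; the names `SITES` and `totals` are illustrative and not part of the presentation.

```python
# Figures copied from the Infrastructure slide: site -> (cores, TB).
SITES = {
    "IFCA": (867, 1.0),
    "CESGA": (339, 1.0),
    "UNIZAR": (54, 0.8),
    "PIC": (1296, 10.0),
    "CIEMAT": (220, 2.7),
    "UPV": (36, 1.0),
}


def totals(sites):
    """Return (total cores, total TB) summed over all listed sites."""
    total_cores = sum(cores for cores, _ in sites.values())
    # Round to one decimal to avoid floating-point noise in the sum.
    total_tb = round(sum(tb for _, tb in sites.values()), 1)
    return total_cores, total_tb


total_cores, total_tb = totals(SITES)
print(f"{total_cores} cores, {total_tb} TB")
```

With the figures listed on the slide, this gives 2812 cores and 16.5 TB across the six sites.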

  6. Applications
     [Diagram labels: Expert Panel; Support Groups; Pilots (Pilot Selection, Pilot Migration, Deployment and Test, Report); Applications (Applications Proposal, Analysis and Selection, Resource Allocation, Autonomous Migration, Assisted Migration, Production); NGI Infrastructure.]
     3 roles are identified
     • Mature applications aiming at a challenging experiment.
     • Pilots that require intensive porting and a feasibility study.
     • Support groups with experience in porting applications.
     Pilots, Applications and Support Groups are certified by an expert board.
     An internal call for projects was set up.

  7. Overview of the Bioinformatics Applications
     Consolidated Use
     • Work on current databases to analyse quality, improve annotation or increase usability
       • CD-HIT.
       • GBLAST.
       • BiG - Metagenomics.
     Emerging Use
     • Port new applications to the Grid to provide new services
       • GFrodock.
       • G-MIRA.
       • Filogen.

  8. CD-HIT
     http://www.e-ciencia.es/wiki/index.php/CD-HIT
     Identification of Representative Sequences of Protein Families using CD-HIT
     • Proposed by the National Centre of Oncological Research (CNIO).
     • It proposes using the resources available through the Spanish Network for e-Science and the CD-HIT algorithm to create non-redundant versions of the available databases more regularly.
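To illustrate what "non-redundant versions" means here, the sketch below follows the greedy incremental idea that CD-HIT is based on: sequences are processed from longest to shortest, and each one either joins the cluster of the first sufficiently similar representative or becomes a new representative. This is a toy illustration, not the CD-HIT implementation: the similarity measure (`difflib.SequenceMatcher`) is a stand-in for CD-HIT's short-word filtering and alignment-based identity, and the function names, threshold and example sequences are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in similarity in [0, 1]; CD-HIT itself uses a short-word
    # filter followed by alignment-based sequence identity.
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold=0.9):
    """Greedy incremental clustering, longest sequence first.

    Returns a list of (representative, members) tuples.
    """
    clusters = []
    for seq in sorted(seqs, key=len, reverse=True):
        for representative, members in clusters:
            if similarity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical toy input: two near-identical sequences and one unrelated one.
example = [
    "MKTAYIAKQRQISFVKSHFSR",
    "GGGGGGGGGGGGGGGG",
    "MKTAYIAKQRQISFVKSHFSRQ",
]
clusters = greedy_cluster(example)
representatives = [rep for rep, _ in clusters]
print(representatives)
```

The list of representatives is the non-redundant set; every other sequence is within the similarity threshold of one of them.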

  9. GBLAST
     http://www.e-ciencia.es/wiki/index.php/BLAST
     [Figure: output size using the columns as input and the rows as reference database.]
     Analysis of the horizontal transfer of genes through a BLAST Processing Service
     • Proposed by the “Instituto de Biología Molecular y Celular de Plantas” and the GRyCAP, from the Universidad Politécnica de Valencia.
     • This experiment aims at identifying the horizontal transfer of genes between prokaryotes and plants, using the UniProt database, by comparing all known prokaryotic sequences (~4M) against all the known sequences of plants (~0.5M), animals (~1.5M) and fungi (~0.4M).
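The approximate sequence counts on this slide give a rough idea of the size of the search space. The sketch below simply multiplies the number of query sequences by the number of reference sequences; it is an upper bound on query/reference pairs, not a statement about how many alignments BLAST actually computes, since BLAST is a heuristic search. The variable names are illustrative.

```python
# Approximate sequence counts taken from the slide.
PROKARYOTIC_QUERIES = 4_000_000
REFERENCE_SETS = {
    "plants": 500_000,
    "animals": 1_500_000,
    "fungi": 400_000,
}

# Number of (query, reference) pairs per reference set.
pairs_per_set = {
    name: PROKARYOTIC_QUERIES * size for name, size in REFERENCE_SETS.items()
}
total_pairs = sum(pairs_per_set.values())

for name, pairs in pairs_per_set.items():
    print(f"{name}: {pairs:.1e} pairs")
print(f"total: {total_pairs:.1e} pairs")
```

Under these figures the total is about 9.6 × 10^12 query/reference pairs.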

  10. GFrodock
     http://www.e-ciencia.es/wiki/index.php/GFrodock
     Grid-Fast ROtational DOCKing
     • Proposed by the Centro de Investigaciones Biológicas, CSIC.
     • The objective is to determine the interaction between two proteins by analysing their atomic structure.
     • Aims at solving one of the CAPRI (Critical Assessment of Predicted Interactions) scientific challenges.

  11. BiG
     Metagenomic Analysis on the Grid
     Quality of the phylogenetic annotation of bacteria
     • Comparative phylogenetic experiment on a soil sample with respect to different releases of the NR GenBank database.
     • Many of the associations of sample fragments to biological families have changed, even recently.
     • The rate of change does not decrease over time, and in many cases it increases.
     • This reveals that the complete diversity of such communities is not sufficiently well described in current databases.

  12. GMIRA
     http://www.e-ciencia.es/wiki/index.php/MIRA
     Assembly of Pyrosequences
     • Proposed by the “Instituto de Biología Molecular y Celular de Plantas” and the Grid and High Performance Computing Research Group of the Universidad Politécnica de Valencia.
     • The new high-throughput sequencing techniques are producing millions of reads of between 80 and 500 nucleotides each, which require intensive post-processing for their assembly.
     • This pilot focuses on porting to the Grid one well-known code for this kind of sequence, which requires vast computing and memory resources.

  13. Filogen
     http://www.e-ciencia.es/wiki/index.php/Filogen
     Construction of Phylogenetic Trees
     • Proposed by the Institute of Research on Engineering in Aragon (I3A).
     • Phylogenetics aims at reconstructing the evolutionary relations among species and living beings using the information from their genomes.
     • This pilot focuses on porting a suite of general-purpose codes for this purpose, in order to reduce the long response time required for challenging executions.

  14. Current Status
     Resource Usage
     • 4 projects already have a VO created (vo.odthpiv.es-ngi.eu, vo.blast.es-ngi.eu, vo.filogen.es-ngi.eu and vo.frodock.es-ngi.eu).
     • 3 projects (GBLAST, FILOGEN and g-MIRA) have been granted resources for porting through an internal project call.
     • 33% of the resources have been consumed by the biomed applications.

  15. Challenges 1/2
     From the point of view of the resources
     • Improved scheduling of jobs
       • Highly dynamic nature of the behaviour of resources (multiple entry points, information system refresh delays, wide geographic distribution, …).
       • Need for Quality of Service and job run-length prediction.
     • Need for much more scalable algorithms and models
       • Go beyond the simple high-throughput approach based on splitting the input.
     • Minimisation of I/O bandwidth consumption
       • Improvement of locality of reference for large databases.
     • Specialised resources
       • Main memory constraints.
       • Availability of pre-existing tuned configurations of widely used software.
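The "simple high-throughput approach based on splitting the input" mentioned above can be sketched as follows: a set of FASTA records is cut into fixed-size chunks, and each chunk would then be submitted as an independent Grid job. This is only an illustration of the splitting step under assumed conventions; the function names, the chunk size and the sample data are hypothetical, and job submission itself is not shown.

```python
def read_fasta(lines):
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records, chunk_size):
    """Group records into lists of at most chunk_size items."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Hypothetical sample input with five short sequences.
sample = """>seq1
ACGT
>seq2
GGCC
TTAA
>seq3
AAAA
>seq4
CCCC
>seq5
TTTT
""".splitlines()

chunks = list(split_records(read_fasta(sample), chunk_size=2))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Each chunk is independent of the others, which is what makes this approach easy to distribute, and also why it does not help when every job still needs the full reference database or large amounts of memory.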

  16. Challenges 2/2
     From the point of view of the community
     • Trade-off in public databases between extensive coverage of the available information and its quality.
       • Many results of using the Grid in bioinformatics have focused on this issue.
       • Since databases are growing exponentially in size, this issue is likely to remain relevant in the medium term.
     • Popularisation of community access
       • Availability of simpler interfaces and configurable workflows
     • But Grids are not suitable for every kind of problem
       • Do not create excessive expectations.
       • Many research groups already have medium-size computing resources which can tackle most of the daily work.
       • Build user confidence.
