Bioinformatics Infrastructure for Life Sciences (BILS) and ELIXIR in Sweden
Bengt Persson

BILS – Bioinformatics Infrastructure for Life Sciences
• Distributed national research infrastructure, similar to ELIXIR
• Bioinformatics support
  • specialised nodes
    • Large-scale sequencing, Proteomics, Systems biology, Metabolomics, Structural calculations, Biobank-related bioinformatics
  • general nodes
• Bioinformatics network
  • nodes at each of the 6 large university cities
  • annual workshop
• Bioinformatics computation and data storage
  • in collaboration with SNIC (Swedish National Infrastructure for Computing)
• Swedish node in ELIXIR
ELIXIR Funding Agencies Planning Meeting 11.10.2010

Funding from Swedish Research Council
• 2010: 4 MSEK
• 2011: 8.5 MSEK
• 2012: 13 MSEK

Planned BILS activities
• Provided by the participating groups
  • leading bioinformatics groups in Sweden
• General support
  • distributed at the six large university cities
• Specialised support
  • Large-scale sequencing
  • Proteomics
  • Systems biology
  • Metabolomics
  • Structural calculations
  • Biobank-related bioinformatics
  • ...
• Training

Initial BILS activities
• Support to users of large-scale sequencing facilities
  • Providing analysis tools and methods
  • Maintaining databases and storage of primary data
  • Setting up pipelines for the first analysis steps of data
  • Evaluating, reporting on, and setting up bioinformatics software
  • In-depth bioinformatics support
• National data repository for mass spectrometry proteomics
  • in close collaboration with proteomics groups
  • interfacing European efforts
• BILS/BBMRI.SE collaboration
  • Building interfaces that give researchers using biobank data seamless access to bioinformatics tools and databases
• Support in metagenomics

Plans for autumn 2010
• BILS positions at each of the six large university towns
  • Umeå
  • Uppsala
  • Stockholm
  • Gothenburg
  • Linköping
  • Lund
• BILS technical coordinator

Swedish ELIXIR node
Sweden has a long tradition of providing bioinformatics tools and databases for the life science community. With the establishment of BILS, long-term support of these will be guaranteed, and Sweden is able to take responsibility for a number of tools and databases that are of European interest.
• Data resources
• Bioinformatics methods/services
• Computational and storage resources

Data resources
• Primary databases of data produced in Sweden
• Secondary databases developed in Sweden
• Human Protein Atlas
  • One major Swedish contribution to the international research community.
  • Localisation of human proteins in cells, tissues and organs.
  • Allows for a systematic exploration of the human proteome using antibody-based proteomics.
  • Version 6 (March 2010) contains
    • more than 9 million high-resolution images
    • 8,400 (40%) protein-encoding genes
    • 48 different normal tissues
    • 20 different cancer types
    • 47 different human cell lines
  • Focus on interfacing to various ELIXIR resources in order to facilitate the utilisation of this important information resource.

Data resources, cont.
• Several databases with protein families and orthologues, e.g.:
  • InParanoid/MultiParanoid and OrthoDisease
    • Comprehensive databases of orthologs in eukaryotes and disease gene orthologs
  • Pfam
    • The Sonnhammer group is a partner in the Pfam consortium, contributing software tools such as NIFAS and Pfamalyzer for analysing protein domain architecture and evolution.
  • HOPS
  • FunShift
• MolMeth (The Molecular Methods Database; http://www.molmeth.org)
  • Structured database developed for the BBMRI project with the aim of providing best practice-based protocols for molecular analyses of different types of samples
• PROPHECY (http://prophecy.lundberg.gu.se/)
  • Database with quantitative high-resolution genome-wide phenotypic information about genetic perturbations
• Microarray databases at LCB-Data-Warehouse (http://www2.lcb.uu.se/lcbdw.php)
• SDR and MDR databases (http://www.sdr-enzymes.org, http://www.mdr-enzymes.org)
  • short-chain and medium-chain dehydrogenases/reductases, including HMMs for family designation

Data resources, cont.
• Long-term storage of primary data from a variety of sources, e.g. large-scale sequencing and proteomics
• The framework for storage uses GRID storage and is scalable to fit European nodes.
• For proteomics, the web-based analysis system Proteios provides complete analysis for several proteomics workflows and generates XML in PRIDE format.
• The Proteios analysis system works seamlessly with files on local storage or on remote storage, such as the GRID storage.

National storage
[Diagram labels: Swestore; example path srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz; NSC, PDC, UPPMAX, HPC2N, Lunarc, C3SE; BILS web pages; Any user; Users of SNIC systems]
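
As a small illustration of how the example storage path on this slide is structured, the sketch below splits it into its endpoint and file components using only the Python standard library. The helper name `split_storage_url` is hypothetical and not part of any BILS, SNIC or SRM tooling; the code only parses the string and makes no attempt to contact the storage system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Example storage URL, copied verbatim from the slide.
EXAMPLE_URL = "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz"


def split_storage_url(url):
    """Split a storage URL into scheme, host, directory and file name.

    Hypothetical helper for illustration only: it parses the string
    and never connects to a server.
    """
    parts = urlparse(url)
    path = PurePosixPath(parts.path)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "directory": str(path.parent),
        "filename": path.name,
    }


if __name__ == "__main__":
    for key, value in split_storage_url(EXAMPLE_URL).items():
        print(f"{key}: {value}")
```

Read this way, the path has a storage endpoint (`srm.ndgf.org`), a directory that includes the database name and release (`UniProt14.8`), and the data file itself.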

Bioinformatics methods/services
• Methods/services developed and maintained in Sweden
• General European interest
• Examples:
  • Structural calculations
    • Pcons, Pfrag
  • Analysis of membrane proteins
    • TOPPRED, TMHMM, Phobius, Zpred, GPCRHMM, SHRIMP, SCAMPI, OCTOPUS, SPOCTOPUS, TOPCONS
  • Tools for microarray analysis
    • BASE (BioArray Software Environment)
  • Functional predictions
    • SFINX, FunCoup, Dasher

Bioinformatics methods, cont.
• EVALLER
  • Web tool for testing a protein’s potential allergenicity/cross-reactivity based on its amino acid sequence.
  • Suitable for scanning purposes and as a key part of an integrated allergenicity assessment procedure.
• UniDomInt
  • An integrated database of domain–domain interactions.
• jSquid
  • A Java tool to visualize networks and edge scores.
• GPCRHMM
  • A hidden Markov model for GPCR detection.
• SFINX
  • Integrated functional and structural protein feature prediction.
• Dasher
  • A Java DAS client for displaying annotations on a protein sequence.

Bioinformatics methods, cont.
• Interfaces towards BBMRI
  • Development of interfaces between bioinformatics services (BILS/ELIXIR) and the biobank infrastructure (BBMRI.SE/BBMRI), which will be important for making the extensive bioinformatics resources available for large-scale biomedical studies using biobanks
• Science for Life Laboratory in Stockholm/Uppsala
  • Development of analysis pipelines, computer programs, methods and standards that will be of European interest

Bio-compute centres
• Provided by BILS together with SNIC (Swedish National Infrastructure for Computing)
• Computational resources needed for periodic calculation campaigns
  • e.g. for sequence comparisons and HMM calculations when new major database releases require updates of the accompanying information
• Example: dedicate one of our large clusters to such campaigns for a number of days every 2–3 months

Infrastructure collaboration
[Diagram labels: BILS; National; BBMRI.SE; SNIC; Regional; NDGF; ELIXIR; PRACE; BBMRI; European; EGI; Other ELIXIR nodes; EBI; Computation; Storage; Grid; Cloud; Other BMS infrastructures, e.g. BBMRI; Bioinformation]