380 likes | 545 Views
SeRC Swedish e-Science Research Centre Juni Palmgren EGEE User Forum Uppsala April 2010. e-Science in UK. Belfast e-Science Centre (BeSC) Cambridge e-Science Centre (CeSC) STFC e-Science Centre (STFCeSC) e-Science North West (eSNW) National Grid Service (NGS) OMII-UK
E N D
SeRC Swedish e-Science Research Centre Juni Palmgren EGEE User Forum Uppsala April 2010 J Palmgren
e-Science in UK • Belfast e-Science Centre (BeSC) • Cambridge e-Science Centre (CeSC) • STFC e-Science Centre (STFCeSC) • e-Science North West (eSNW) • National Grid Service (NGS) • OMII-UK • Lancaster University Centre for e-Science • London e-Science Centre (LeSC) • North East Regional e-Science Centre (NEReSC) • Oxford e-Science Centre (OeSC) • Southampton e-Science Centre (SeSC) • Welsh e-Science Centre (WeSC) J Palmgren
Outline • Background • e-Science as a Strategic Research Area • SeRC application 2009 • Funding from 2010 • What will SeRC do? • Example: LifeSciences • Example: Complex Disease e-Science Community J Palmgren
Research InfrastructureSweden and Europe J Palmgren
Key e-Infrastructure components Well functioning network SUNET – Swedish University NETwork SNIC – Swedish National Infrastructur for Computing DISC – Database InfraStructure Committee HPC/ compute and storage Well documented databases J Palmgren
Strategic Research Areas 2008Call 2009 • 20 research areas in technology, climate and environment, energy and health – including e-Science • e-Science proposals to involve • Methods and tools • Application areas • Infrastructure • Resourses available for e-Science (research/infrastructure) • 2010 – 25 (20/5) MSEK • 2011 – 40 (32/8) MSEK • 2012 – 70 (56/14) MSEK etc for 2013 and beyond • Applications evaluated by international panel set up by the • Swedish Research Council Spring 2009 J Palmgren
SeRC applicatione-Science • Consortium KTH – LiU – SU – KI • KTH main applicant • 30 Milj/yr research: 38%, 27%, 27%, 8% • 20 Milj/yr infrastructure: 50% KTH, 50% LiU • Consortium application includes • Core of methods development • Strong application centers and research groups • Majority of Swedish e-infrastructure (PDC och NSC) J Palmgren
SeRC Main applicants • Dan Henningson, Mechanics, KTH • Hans Ågren, Theoretical chemistry, KTH • Anna Delin, Material science, KTH • Anna-Karin Tornberg, Numerical Analysis, KTH • Anders Ynnerman, NVIS, LiU • Bengt Persson, Bioinformatics, LiU/KI • Igor Abrikosov, Material science, LiU • Juni Palmgren, Mathematical statistics, SU/KI • Anders Lansner, Computational neuroscience, SU • Erik Lindahl, Bioinformatics, SU J Palmgren
SeRC Application granted • Highest ranked application, full research budget granted • Additional research funding to UU-Lund-Umeå consortia • Infrastructure (national) part treated separately J Palmgren
Strategic funding for e-Science J Palmgren
Strengths of the SeRC consortium • Ten centers of excellence (5 Linné, 4 SSF, 1 Berzelii) • Seven additional research centers • 54 key researchers named • Six members of KVA, Five ERC grants, Five “rådsforskare” • PDC + NSC 80% of total computer capacity for academic research • Software development • DALTON, EMTO, GROMACS, PENCIL, SIMSON, SUPERFLUID, FEniCS • Participation in EU/National projects • SweGrid, NDGF, EGEE, DEISA, PRACE, ELIXIR, BBMRI, INCF, IS-ENES • Industrial support • SAAB, SCANIA, SKF, BOMBARDIER, AstraZeneca, SECTRA, LightLab, Portendo, SMHI J Palmgren
SeRC Research showcases • FLUID MECHANICS: massively parallel turbulence simulations provide new understanding • ENVIRONMENT: simulation of future climat • MATERIAL: simulations give structure of iron in earths core • GENOME MEDICINE: Identification of genome sequence variation associated with cancer risk J Palmgren
SeRC Research showcases • MOLECULAR MODELING: distributed computations enable the largest ion-channel simulations in membranes • e-EPIDEMIOLOGY: ”Twin-Net data federation” – distributed database model for European infrastructure • VISUALIZATION: Interactive full-body scan possible through smart handeling of data • BRAIN MODEL: New method enables largest simulation of memory network J Palmgren
What will SeRC do? • Form ”e-Science communities”; collaboration between • e-Science applications • core e-Science • computer experts at PDC/NSC • Research in core e-Science areas of importance for e-Science applications • Strategic collaboration between PDC and NSC • national responsibility for large computer infrastructure • strong European partner • Cooperation with industry and society J Palmgren
e-Science communities • Computational fluid dynamics • Climate and the Environment • Bioinformatics: mapping and analyzing DNA and protein sequences • Complex Diseases: modeling molecular and medical data • Particle Simulations and Molecular Dynamics • Electron structures: DFT and Hartee-Fock methods • Waves: aero-acoustics and electromagnetics J Palmgren
Core e-Science areas • Numerical Analysis • Visualization and Image Science • Mathematical Modeling • Distributed resources • Database technology • Parallel Algorithms and Performance Optimization J Palmgren
Interface to industry and society • External advisory committee with international experts and representatives from industry and society • Representatives from industry and society in e-Science communities; Establish e-Science industry forum with dedicated coordinator • e-Science software and spin-offs • Open Source software of industrial interest • Spin-off companies in electromagnetics, material science, medical visualization, … • Establish software curation unit • Production of PhDs of interest to industry J Palmgren
Management structure • Director: Dan Henningson • Co-director: Anders Ynnerman J Palmgren
Swedish e-Science Research Centre (SeRC) J Palmgren
Life SciencesNew technology! Max IV, Lund Synchrotrones Deep sequencing MRI-PET Advanced light microscopy PANDA
Genome sequencing • 2000: Human genome working drafts • All data freely released • Project took about 10 years and cost about $3 billion • Jan 2008: Every 4 days major genome centers can sequence the same number of base pairs as the HGP • New technology has now reduced this to 10 hours J Palmgren
Explosion of new biological data Cataloging sequence variation highest priority! • 2008 genome sequencing produced 1.5 petabyte data at hundreds of labs in dozens of countries with two Tier 0 centra (EMBL-EBI and NCBI) • New sequencing technology produces 5-10 times more data • Bioinformatics needs are gigantic! Notice • LHC produces 15 petabyte data annually • LHC grid has 140 computer centra in 33 countries centered around CERN asTier 0. From Iain Mattai EMBL 2008
Increasing computational cost Values are for BLAST searches on Amazon EC2 (from Wilkening et al, IEEE Cluster09) From Folke Meyer J Palmgren
Chemical entities ChEBI; PubChem Protein interactions IntAct with Imex, PRIDE Pathways Kegg; Reactome Systems BioModels Bioinformatics Databases: From molecules to systems Genomes (Sanger, EBI + NCBI) Nucleotide sequence EMBL-Bank+Genbank+DDBj Protein sequence UniProt(EBI/SIB/PIR) Gene expression ArrayExpress, GEO Protein structure wwPDB(RCSB,EBI,PDBj) Protein families, motifs and domains InterPro (12 collaborators) From Janet Thornton, EBI
Integration with other dataFrom molecules to medicine & agriculture Genomes Proteins Metabolites
SeRC Complex Disease eScience Community - Focus on human complex disease - Secure transfer and storage of large scale individual specific data - mathematical and statistical modeling of molecular, clinical, epidemiological data - Early diagnosis, prevention and cure
LIFE SPAN REGISTRIES Hospital Discharge Cancer LATE LIFE EARLY LIFE Birth Specific Disease Registries Clinical Quality Registries Congenital Malformations Cause of Death Register of prescribed drug sales L I F E GENETICALLY INFORMATIVE Population Multiple Generation Twin Migration Bulding on:(i) Swedish population based registries
(ii) Swedish Research Biobanks Biobank deposit Storage Biobank use • Input • Ensure optimally valuable biological samples stored • Ethics • Quality control • Sample handling and processing • Data gathering • Links to registers • Output • Ensure biological samples are optimally used • Ethics • Technology platforms for high throughput sample analysis • Data handling • Biostatistics IT database
(iii) Cutting edge technologyStockholm-UppsalaScience for Life Laboratory; SciLifeLab HTP technology platforms: • Genomics • Proteomics and protein profiling • Bioimaging and functional biology • Bioinformatics and systems biology J Palmgren
SeRC Complex Disease eScience Community Initially three thematic areas • Cancer • Cancer Risk Prediction (CRiSP) • Cardiovascular disease/Inflammation • Center of Excellence for Research on Inflammation and Cardio-vascular disease (CERIC) • Neuroscience • Stockholm Brain Institute (SBI) J Palmgren
… with reference to C www.crispcenter.org2008-2018 Namn Efternamn
Summary The Swedish e-Science Research Centre (SeRC) Builds on • Transition from HPC-centres to e-Science enablers Recognizes that frontier science today often • Is complex, multi-scale, and multi-science in nature • Requires multi-disciplinary, multi-investigator, multi-institutional approaches (often international in scope). • Involves high data intensity and heterogeneity from observations, digital instruments, sensor nets, and simulations • demands semantic federation, active curation and long-term preservation of access to data • Places new demands on education J Palmgren
Example:Risk prediction in CRiSP • Model risk and prognosis • Traditional risk factors • Mammographic density • Genetic variants • Search for new biomarkers • Statistical learning methods for discovery and validation • Account for disease heterogeneity • ER+, ER- tumors • - omics tumor profiles • Other.. • Assess model performance • Internal and external validation