eScience and Grid. Tools and techniques for the next generation scientist. Professor Brian Vinter, Head of the Copenhagen eScience Center.
eScience «The next 10 to 20 years will see computational science firmly embedded in the fabric of science – the most profound development in the scientific method in over three centuries.» US Department of Energy 2003.
Mega-Science • The next scientific period will be dominated by Mega-Science projects • 10^4 researchers on a single project • Extreme data production • Highly integrated collaboration between different groups of scientists • Examples • CERN LHC • ALMA • Mars project
Data Production • 1 Exabyte = 1000 Petabytes • 1 Petabyte = 1000 Terabytes • 1 Terabyte = 1000 Gigabytes • 1 Gigabyte = 1000 Megabytes • 1997: Total data worldwide approx. 12 exabytes (incl. documents, film, TV, pictures, …) [1] • 1999: 2-3 exabytes of data produced [2] • 2002: Approx. 5 exabytes of data produced [2] • Global data availability doubles every 4-5 years. [1] http://www.lesk.com/mlesk/ksg97/ksg.html [2] http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
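To make the doubling claim concrete, here is a minimal arithmetic sketch. It assumes a fixed doubling period and starts from the 1997 figure of 12 exabytes; the function name `project_volume` and the 20-year horizon are illustrative choices, not part of the talk, and the result is an extrapolation rather than a measurement.

```python
# Unit ladder from the slide: EB -> PB -> TB -> GB -> MB, a factor of 1000 per step.
MEGABYTES_PER_EXABYTE = 1000 ** 4


def project_volume(start_exabytes: float, years: float, doubling_years: float) -> float:
    """Volume after `years`, assuming it doubles every `doubling_years` (an assumption)."""
    return start_exabytes * 2 ** (years / doubling_years)


# Compare the two ends of the "doubles every 4-5 years" range over 20 years.
for doubling in (4, 5):
    print(f"doubling every {doubling} years: {project_volume(12, 20, doubling):.0f} EB")
```

Under these assumptions, the one-year difference in doubling period changes the 20-year projection by a factor of two (384 versus 192 exabytes).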
eScience Components • Modeling and simulation • Data acquisition and handling • Visualization • HPC and Grid
Why is it getting more difficult? [Figure panels: 54 molecules, 442 molecules, 1372 molecules]
Nano-modeling • Extremely CPU- and data-intensive algorithms • Complex structure calculations • Multiple days of execution even on a supercomputer • Runs on both PCs and supercomputers
eScience and Bio/Med • We expect very good results from eScience in biology and medicine • The foremost advantages will come from introducing a mathematical, causal understanding of biological systems • Bioinformatics is already doing this • An emerging field: Systems Biology • Systems Medicine is also starting internationally
Calculations in treatment • Computational methods are already important in medical planning • Radiation planning • Bypass flow modeling • Robotic surgery • …
Personalized medicine • Every human is unique • Also at the genetic level • In our genome, which is written with the alphabet ACGT, we have a number of micro mutations called single nucleotide polymorphisms (SNPs) • These SNPs are often without consequence, but • Some make us sick • Some are indicators of a faulty gene • Others influence our reception of a drug • The last complication makes it very hard to make drugs for the general population • We want to move from commodity medicine to custom-tailored drugs
An example • Approx. 60% of today's medicines are metabolized by cytochrome P450 enzymes • Some people have highly efficient P450 enzymes while others have very slow and inefficient ones • Knowledge of a patient's P450 level will allow us to dose medicine for the individual much more efficiently • This is already in early use
And this is eScience how? • Developing a drug is not a linear process • The human genome is written with billions of letters • Any person has millions of SNPs • Finding the SNP that has an effect is a highly complex computational task
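As a toy illustration of what an SNP looks like in data, the sketch below lists the positions where two already-aligned sequences over the ACGT alphabet differ. The function name `snp_positions` and the example sequences are hypothetical. Real SNP studies also need alignment and statistical association across many individuals, which this sketch does not attempt; that part is where the computational cost lies.

```python
from typing import List


def snp_positions(reference: str, sample: str) -> List[int]:
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


# Hypothetical 8-letter fragments that differ in a single letter.
print(snp_positions("ACGTACGT", "ACGAACGT"))  # -> [3]
```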
eScience and geology • Geology and hydrology have also been using computational methods for a long time • There are very interesting aspects in combining different methods • e.g. including biological systems in the models • Inverse mapping of seismic data • It turns out that we use the same techniques in medicine • And soon in industry
Grid Minimum intrusion Grid
Minimum intrusion Grid [Diagram: users and resources connected through the Grid]
Processing plants • Like the power grid, the computing Grid has many types of power producers • High-yield power plants (fossil fuel, nuclear, …) • Supercomputers and large farms • Low-yield producers (windmills, etc.) • Individual PCs and game consoles • Very low-yield producers (solar panels, etc.) • Web browsers
VGrids • Best thing since sliced bread • VGrids are Virtual Organizations in MiG • They are a dead easy way to create collaborations • Share files • Share resources • Private entry page • Public Web-page
Portals • VOs can generate their own private entry pages, including application portals
Files in VGrids • A user keeps her personal home directory regardless of which VGrid she works in • But VGrids have a common directory that only members of the VGrid may access • These are represented as directories in the user's home directory • VGrid owners can create sub-VGrids
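The directory rules above can be modeled with a small data structure. This is only a conceptual sketch, not MiG's implementation or API: the `VGrid` class, the `visible_directories` helper, and the slash-separated naming of sub-VGrids are all assumptions made for illustration. The slides do not say whether membership of a parent VGrid carries over to its sub-VGrids, so the sketch treats each membership list independently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VGrid:
    """Hypothetical model of a VGrid with an optional parent (for sub-VGrids)."""
    name: str
    members: Set[str] = field(default_factory=set)
    parent: Optional["VGrid"] = None

    def path(self) -> str:
        """Nested directory name, e.g. 'bio/generecon' for a sub-VGrid."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"


def visible_directories(user: str, vgrids: List[VGrid]) -> List[str]:
    """Shared directories that would show up in the user's home directory."""
    return sorted(v.path() for v in vgrids if user in v.members)


bio = VGrid("bio", {"alice", "bob"})
sub = VGrid("generecon", {"alice"}, parent=bio)
print(visible_directories("alice", [bio, sub]))  # -> ['bio', 'bio/generecon']
print(visible_directories("bob", [bio, sub]))    # -> ['bio']
```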
Examples eScience on Grid
GeneRecon • GeneRecon seeks to identify genetic factors behind hereditary diseases • The overall idea is to compare two sets of genomes • One where the disease is observed • One where the disease is not observed • Approx. 1000 individuals in each set • GeneRecon is developed at the Bioinformatics Research Center, Århus University
GeneRecon • The algorithm is a Markov chain Monte Carlo method • A test run consists of approx. 30,000 individual tests • One test runs from 1 to 10 days on a PC • In total no less than 82 CPU years • MiG hosted the execution on Grid and got the execution time down below a month
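The "no less than 82 CPU years" figure follows from the numbers on the slide if every test takes the one-day minimum. A short check, assuming a 365-day year (the helper name `cpu_years` is illustrative):

```python
DAYS_PER_YEAR = 365  # assumption: plain calendar year, no leap days


def cpu_years(num_tests: int, days_per_test: float) -> float:
    """Total sequential CPU time in years for independent tests of equal length."""
    return num_tests * days_per_test / DAYS_PER_YEAR


low = cpu_years(30_000, 1)    # every test takes the 1-day minimum
high = cpu_years(30_000, 10)  # every test takes the 10-day maximum
print(f"between {low:.0f} and {high:.0f} CPU years")
```

The lower bound is about 82 CPU years, matching the slide; the upper bound under the same assumptions is ten times that.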
Statistics • 1315 jobs were submitted to Grid at the same time • 0 jobs were lost • First result: 2:04:44 • Last result: 28 days, 5:42:54
Groundwater modeling on Funen • Calibration of the Assens model • 1 model evaluation = 30 min • 920 model evaluations = 19 days
AUTOCAL • OfficeGRID • Days to hours [Diagram: one master and four clients]
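The master/client arrangement can be sketched as a master that hands independent model evaluations to a pool of clients. The code below is a generic illustration, not AUTOCAL or OfficeGRID code: local threads stand in for remote clients, `toy_model` is a hypothetical stand-in for a 30-minute model run, and the best-case timing assumes all evaluations are independent, which an iterative calibration may not allow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_master(evaluate: Callable[[float], float],
               parameter_sets: Sequence[float],
               num_clients: int) -> List[float]:
    """Master: hand each parameter set to a free client, collect results in input order."""
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        return list(pool.map(evaluate, parameter_sets))


def ideal_wall_clock_hours(evaluations: int, minutes_each: float, num_clients: int) -> float:
    """Best-case wall-clock time if evaluations are independent and evenly spread."""
    rounds = -(-evaluations // num_clients)  # ceiling division
    return rounds * minutes_each / 60


def toy_model(x: float) -> float:
    """Hypothetical objective standing in for one model evaluation."""
    return (x - 3.0) ** 2


print(run_master(toy_model, [1.0, 3.0, 5.0], num_clients=4))  # -> [4.0, 0.0, 4.0]
print(ideal_wall_clock_hours(920, 30, 1) / 24)  # one machine, in days
print(ideal_wall_clock_hours(920, 30, 4))       # four clients, in hours
```

With one machine the 920 evaluations take 460 hours, roughly the 19 days quoted for the Assens calibration; with four ideal clients the same work would take 115 hours, and more clients shorten it further.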
Drug Design • Molecular docking is a time-consuming calculation that this project performs in two steps • The first step is a coarse calculation that can eliminate molecules that won’t dock • This step can run on PCs and PS3s; a lot of work is being done towards efficient utilization of the CELL CPU for molecular docking • The molecules that survive the first step are then modeled more precisely at the quantum level on classic supercomputers and clusters
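The two-step idea is a filter-then-refine pipeline: a cheap score discards most candidates, and only the survivors get the expensive calculation. The sketch below shows that control flow only. The scoring functions, the cutoff, and the convention that a lower score means a better dock are all assumptions for illustration; the project's actual docking and quantum-level codes are not described in the slides.

```python
from typing import Callable, Iterable, List, Tuple


def two_stage_screen(molecules: Iterable[str],
                     coarse_score: Callable[[str], float],
                     fine_score: Callable[[str], float],
                     coarse_cutoff: float) -> List[Tuple[str, float]]:
    """Run the cheap score on everything, the expensive score only on survivors."""
    # Step 1: coarse filter (cheap, suited to PCs and consoles).
    survivors = [m for m in molecules if coarse_score(m) <= coarse_cutoff]
    # Step 2: precise scoring of survivors only (expensive, for clusters).
    scored = [(m, fine_score(m)) for m in survivors]
    return sorted(scored, key=lambda pair: pair[1])  # best (lowest) score first


# Hypothetical precomputed scores standing in for real calculations.
coarse = {"mol_a": 1.0, "mol_b": 9.0, "mol_c": 2.0}
fine = {"mol_a": -5.0, "mol_c": -7.0}  # mol_b never reaches step 2
print(two_stage_screen(coarse, coarse.get, fine.__getitem__, coarse_cutoff=5.0))
```

Because `mol_b` fails the coarse cutoff, the expensive function is never called for it, which is the point of splitting the work across cheap and expensive resources.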
SeGrid • Still a proposal • The idea is to share sensitive data through Grid and use the Grid technology to manage access control and automatic anonymization
More information • www.eScience.dk • Portal for KU's eScience activities • www.migrid.org • Portal for the Minimum intrusion Grid • www.rcuk.ac.uk/escience/ • The very ambitious UK eScience program