Integrated e-Infrastructure for Distributed, Data-driven, Data-intensive High Performance Computing: Biomedical Requirements. Peter V Coveney, Centre for Computational Science, University College London. Integrating the Strengths of the e-Research Community, NeSC, Thursday, 10th March 2011.
Contents • Computational Biomedicine • HIV/AIDS • Cardiovascular medicine • Cancer • ICT, e-Health and the Virtual Physiological Human • Infrastructure support • Shortcomings in UK infrastructure • Major policy hurdles • UCL CLMS initiative • Conclusions
UCL Projects • VPH Network of Excellence – EU (€8M); no HPC • ContraCancrum – EU (€3.4M); no HPC • VPH-Share – EU (€10.7M); no HPC • P-Medicine – EU (€13.7M); no HPC • INBIOMEDVision – EU (€2M) • MAPPER – EU (€2M); no HPC • A new approach to Science at the Life Sciences Interface – EPSRC (£4M) + HECToR • Large Scale Lattice-Boltzmann Simulation of Liquid Crystals – EPSRC (£800K) + HECToR
Patient-specific medicine • ‘Personalised medicine’: use the patient’s specific profile to better manage a disease or a predisposition towards a disease • Tailoring of medical treatments to the characteristics of an individual patient Why use patient-specific approaches? • Treatments can be assessed for their effectiveness for the patient before being administered, avoiding the potential expense of ineffective treatments Patient-specific medical simulation • Use of genotypic and/or phenotypic simulation to customise treatments for each patient, where computational simulation is used to predict the outcome of courses of treatment and/or surgery See: P. V. Coveney et al (eds), Interface Focus, Theme Issue on VPH, Vol. 1, No. 3, online 25th April 2011
[Figure: structure of the HIV-1 protease dimer with the inhibitor saquinavir. Labels: monomer A (residues 1–99), monomer B (residues 101–199), flaps, glycine 48/148, catalytic aspartic acids 25/125, leucine 90/190, saquinavir P2 subsite, N- and C-termini.] Medical/clinical domain I: HIV/AIDS HIV-1 protease is a common target for HIV drug therapy • Enzyme of HIV responsible for protein maturation • Target for anti-retroviral inhibitors • Example of structure-assisted drug design • 9 FDA-approved inhibitors of HIV-1 protease • So what’s the problem? • Drug-resistant mutations emerge in the protease • These mutations render drugs ineffective • Drug-resistant mutants have emerged for all FDA-approved inhibitors • Integrate simulation with conventional clinical decision support systems to refine results
Medical/clinical domain II: Grid-enabled neurosurgical imaging using simulation • The goal: to simulate large-scale, patient-specific cerebral blood flow in clinically relevant time frames • Objectives: • To study cerebral blood flow using patient-specific, image-based models. • To provide insights into cerebral blood flow and its anomalies. • To develop tools and policies with which users can better exploit the ability to reserve and co-reserve HPC resources. • To develop interfaces that let users easily deploy and monitor simulations across multiple computational resources. • To visualise and steer the results of distributed simulations in real time. • To yield patient-specific information that helps plan embolisation of arteriovenous malformations, aneurysms, etc. M. D. Mazzeo and P. V. Coveney, Computer Physics Communications, 178 (12), 894–914 (2008). DOI: 10.1016/j.cpc.2008.02.013.
Medical/clinical domain III: ContraCancrum (Clinically Oriented Translational Cancer Multilevel Modelling) [Figure labels: multi-level data; multi-level modelling] • Two dedicated clinical studies in ContraCancrum, one in glioma and one in lung cancer (200 cases/year) http://www.contracancrum.eu
Virtual Physiological Human: “a methodological and technological framework that, once established, will enable collaborative investigation of the human body as a single complex system ...” • Funded under EU FP7; ~€250M • 20 projects: 1 NoE (Network of Excellence), 5 IPs, 11 STREPs, 3 CAs [Figure label: Networking NoE]
[Figure labels: HIV, Heart, Aneurysms, Musculoskeletal] VPH-Share overview: VPH-Share will provide the organisational fabric (the infostructure), realised as a series of services offered in an integrated framework, to expose and manage data, information and tools, to enable the composition and operation of new VPH workflows, and to facilitate collaboration between members of the VPH community. €11M, 2011–2015, EU FP7 • Promotes cloud technologies
[Figure labels: multi-level disease modelling at the molecular, cellular and tissue/organ levels; multi-scale therapy predictions/disease evolution results] p-medicine • Predictive disease modelling in p-medicine will contribute to optimising cancer treatment by fully exploiting the patient’s individual data. • p-medicine focuses on Wilms tumour, breast cancer and acute lymphoblastic leukaemia. • The p-medicine infrastructure supports both generic, seamless, multi-level data integration and a VPH-specific, multi-level cancer data repository, to facilitate model validation and clinical translation through trials. • The infrastructure is scalable to any disease, provided that predictive modelling is clinically significant at one or more levels (from molecular to tissue level) and that developing such models is feasible (i.e. the biological mechanisms involved are understood well enough to model them). • Led by a clinical oncologist, Prof Norbert Graf! €13M, 2011–2013, EU FP7
Large-scale data & computing • Models are built for use in clinical decision support, so results are needed in a timely fashion • It must be possible to seamlessly “plug in” resources for parallel and large-scale computing “here and now” • Petascale computing is needed for activities such as: • drug binding affinity determination • blood flow through tumours • Compute time is available gratis via VPH-NoE-supervised VPH Virtual Community allocations on DEISA and, in future, on PRACE via MAPPER, …? • Seamless access to, and integration of, distributed heterogeneous data in a data warehouse, repeatedly over time (≈200 GB per patient and time point)
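The ≈200 GB per patient and time point quoted above makes warehouse sizing simple arithmetic. A minimal sketch follows; the cohort size and number of time points are hypothetical inputs chosen for illustration, not figures from the projects.

```python
# Approximate volume per patient per time point, as quoted on the slide.
GB_PER_PATIENT_TIMEPOINT = 200


def warehouse_size_tb(patients: int, timepoints: int,
                      gb_per_point: float = GB_PER_PATIENT_TIMEPOINT) -> float:
    """Total data-warehouse volume in TB (decimal units: 1 TB = 1000 GB)."""
    if patients < 0 or timepoints < 0:
        raise ValueError("patients and timepoints must be non-negative")
    return patients * timepoints * gb_per_point / 1000


# Hypothetical cohort: 200 patients, each imaged at 3 time points.
print(warehouse_size_tb(200, 3))  # 120.0
```

Because the volume grows linearly in both patients and time points, every extra imaging time point adds another ≈200 GB per patient.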
MAPPER: Objectives and Challenges MAPPER will develop computational strategies, software and services for distributed multiscale simulations across disciplines, exploiting existing and evolving European e-Infrastructure. Driven by seven exemplar multiscale applications, MAPPER will deploy a computational science infrastructure for distributed multiscale computing on and across European e-Infrastructures. By taking advantage of existing software and services, MAPPER will deliver high-quality components aimed at large-scale, heterogeneous, high-performance multidisciplinary multiscale computing, while maintaining ease of use and transparency for end users. MAPPER will advance the state of the art in high performance computing on e-Infrastructures by enabling distributed execution of multiscale models across all European e-Infrastructures. http://www.mapper-project.eu
VPH ToolKit http://toolkit.vph-noe.eu
VPH Virtual Community on DEISA • + euHeart in the second wave, and other non-VPH EU projects • VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011 • HECToR (Cray, UK) • SARA (IBM POWER6, Netherlands) • The DEISA-TeraGrid interoperability project has additional access to LRZ
VPH requires HPC and data integration • Computational experiments integrated seamlessly into current clinical practice • Clinical decisions influenced by patient-specific computations: the turnaround time covers data acquisition, simulation, post-processing, visualisation, final results and reporting. • Fitting the computational time scale to the clinical time scale: • Capture the clinical workflow • Get results that will influence clinical decisions: 1 day? 1 week? • This project: 15 to 30 minutes • Development of procedures and software in consultation with clinicians • Security/access is a major concern • Need to integrate data and compute via workflows • On-demand availability of storage, networking and computational resources
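The 15 to 30 minute target can be treated as a time budget that the whole pipeline, from data acquisition to reporting, must fit within. A minimal sketch, assuming hypothetical per-stage durations (the slide names the stages but not their timings) and taking the 30-minute upper bound as the window:

```python
from datetime import timedelta

# Hypothetical stage durations: the slide names the stages, not their timings.
STAGES = {
    "data acquisition": timedelta(minutes=5),
    "simulation": timedelta(minutes=15),
    "post-processing": timedelta(minutes=3),
    "visualisation": timedelta(minutes=2),
    "final results and reporting": timedelta(minutes=2),
}


def fits_clinical_window(stages: dict, window: timedelta = timedelta(minutes=30)):
    """Return (fits, total): whether the summed turnaround is within the window."""
    total = sum(stages.values(), timedelta())
    return total <= window, total


ok, total = fits_clinical_window(STAGES)
print(ok, total)  # True 0:27:00
```

Summing the stages, rather than timing the simulation alone, makes it clear that queue waits or slow data movement can break the clinical deadline even when the simulation itself is fast.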
Many of our projects have non-standard requirements of HPC service providers: • Ability to co-reserve resources (HARC) • Launch emergency simulations (SPRUCE) • Consistent interfaces for federated access (AHE) • Access to back-end nodes for steering and visualisation • Lightpath network connections • Data integration from multiple sources (IMENSE) • Support for software (ReG steering toolkit, etc.)
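Co-reservation means finding a time slot in which every required machine is free at once. The sketch below is a simple earliest-common-slot search over hypothetical availability windows; it is not HARC's actual protocol or interface, and it ignores the separate problem of committing the reservations on all machines together.

```python
from typing import Dict, List, Optional, Tuple

# Free windows per machine as (start, end) in hours from now; all values hypothetical.
Windows = List[Tuple[float, float]]


def earliest_co_reservation(free: Dict[str, Windows],
                            duration: float) -> Optional[float]:
    """Earliest start at which every machine is free for `duration` hours.

    The earliest feasible start always coincides with the start of some free
    window, so only those start times need to be checked.
    """
    candidates = sorted({start for windows in free.values() for start, _ in windows})
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in windows)
               for windows in free.values()):
            return t
    return None


free = {
    "machine_a": [(0, 4), (10, 20)],
    "machine_b": [(2, 6), (12, 18)],
    "machine_c": [(3, 8), (11, 16)],
}
print(earliest_co_reservation(free, 3))  # 12
print(earliest_co_reservation(free, 5))  # None
```

Restricting the search to window start times keeps it exact: if some start t works, the latest of the chosen windows' starts (which is at most t) also works.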
Individualized MEdiciNe Simulation Environment (IMENSE) • Data repository – the key store for project data, containing all patient data and the simulation data derived from it. • Integrated web portal – the central interface from which users upload and access data sets and analysis services; users can search for patient data based on a number of criteria. • Web services – the web services platform implements the required data processing functions. • Workflow environment – a virtual experiment system from which users can launch predefined workflows that automate moving data between the data environment and multiple data processing services. Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
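To make the repository-plus-search idea concrete, here is a toy in-memory sketch in which simulation outputs are stored with the patient record they derive from and records can be queried by several criteria. The record fields, patient IDs and method names are all hypothetical; this is not IMENSE's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatientRecord:
    # Hypothetical fields; IMENSE's actual schema is not described here.
    patient_id: str
    condition: str
    age: int
    derived: List[str] = field(default_factory=list)  # simulation data sets


class Repository:
    """Toy in-memory stand-in for the data repository and its search service."""

    def __init__(self) -> None:
        self._records: Dict[str, PatientRecord] = {}

    def upload(self, record: PatientRecord) -> None:
        self._records[record.patient_id] = record

    def attach_simulation(self, patient_id: str, dataset: str) -> None:
        # Simulation outputs are kept with the patient data they derive from.
        self._records[patient_id].derived.append(dataset)

    def search(self, **criteria) -> List[str]:
        """IDs of patients whose fields equal every given criterion, sorted."""
        return sorted(
            r.patient_id for r in self._records.values()
            if all(getattr(r, key) == value for key, value in criteria.items())
        )


repo = Repository()
repo.upload(PatientRecord("p01", "glioma", 54))
repo.upload(PatientRecord("p02", "lung cancer", 61))
repo.upload(PatientRecord("p03", "glioma", 47))
repo.attach_simulation("p01", "tumour_growth_run_1")
print(repo.search(condition="glioma"))          # ['p01', 'p03']
print(repo.search(condition="glioma", age=47))  # ['p03']
```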
[Figures: IMENSE Environment; IMENSE Interface]
Workflows • GSEngine is a workflow orchestration engine developed by the ViroLab project • It can be used to orchestrate applications launched by AHE • It allows services to be orchestrated through both point-and-click and scripting interfaces • Workflows are stored in a repository and shared between users • Many of ViroLab’s aims are similar to those of VPH-I projects, so GSEngine will be useful here Malawski et al, Future Generation Computer Systems, 26 (1), 138–146, 2010
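At its simplest, orchestration means running stored, named sequences of steps and passing each step's output to the next. A minimal sketch of that idea, with hypothetical step functions and a plain dictionary standing in for the shared workflow repository; it does not use or reproduce GSEngine's API.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]

# Shared store of named workflows, standing in for a workflow repository.
WORKFLOWS: Dict[str, List[Step]] = {}


def register(name: str, steps: List[Step]) -> None:
    WORKFLOWS[name] = steps


def run(name: str, context: dict) -> dict:
    """Run a stored workflow's steps in order, passing each step's output on."""
    for step in WORKFLOWS[name]:
        context = step(context)
    return context


# Hypothetical steps; a real step might call AHE to launch a remote job.
def fetch_patient_data(ctx: dict) -> dict:
    return {**ctx, "mesh": f"mesh_for_{ctx['patient']}"}


def launch_simulation(ctx: dict) -> dict:
    return {**ctx, "job": f"sim({ctx['mesh']})"}


def post_process(ctx: dict) -> dict:
    return {**ctx, "report": f"report[{ctx['job']}]"}


register("blood_flow", [fetch_patient_data, launch_simulation, post_process])
print(run("blood_flow", {"patient": "p01"})["report"])  # report[sim(mesh_for_p01)]
```

Storing workflows by name is what allows them to be shared: another user only needs the workflow's name and a starting context to rerun it.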
Inside IMENSE: Integrating the components Coveney et al, “An e-Infrastructure Environment for Patient Specific Multiscale Modelling and Treatment”, preprint, 2011
UK Infrastructural Failures • UK computing e-Infrastructure is crumbling. • The UK is not a hosting partner in PRACE. • No Tier-0 site in the works. • Only one Tier-1 machine (with issues). • HECToR has had several major failures; judging by its usage, researchers seem to have trouble using or trusting it. • What’s happening next? • Tier-2 facilities are also being dismantled. • NGS core nodes are being shut down!! • We cannot maintain a good level of e-Science research without the infrastructure to support it • Relative to other countries, we’re in full-scale retreat!
Infrastructure in the UK is fragmented [Figure labels: NGS, Data, HPC, Networks, ?]
TeraGrid eXtreme Digital (XD) • Two sets of services: • XES will provide a set of well-known (and standard) protocol specifications and profiles • CPS will support the diversity of services and capabilities required by the community • From the desktop to the largest machines! • The XD design is firmly tied to the user requirements of the science and engineering research community. • Presents the individual user with a common user environment • Caters both to researchers whose computations require very little data movement and to those performing very data-intensive computations. • Will offer a highly capable service interface to “community user accounts” such as science gateways https://www.teragrid.org/web/about/xdtransition
We face major policy hurdles • For our projects to be successful, we need integrated compute, storage, networks and services. • HPC providers’ antediluvian policies prevent this from happening • They still have a batch-job mentality! • No coordinated allocation policies in the EU • We need to apply for a project, then, if successful, apply separately for compute access • We can’t do the project if the compute application is rejected!
Importance of connectivity • With limited national facilities, connectivity to other countries becomes crucial. • 1–10 Gbit/s wide-area networks are needed for large simulations and data movement. • However, network provisioning is currently extremely difficult and time-consuming. • Researchers end up having to request the links themselves, rather than the resource providers doing so.
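The 1–10 Gbit/s range matters once it is set against the data volumes involved. A back-of-the-envelope sketch, reusing the ≈200 GB per patient and time point figure from the large-scale data slide and assuming an ideal, uncontended link by default (real throughput will be lower, which the hypothetical `efficiency` parameter can model):

```python
def transfer_time_s(size_gb: float, link_gbit_s: float,
                    efficiency: float = 1.0) -> float:
    """Seconds to move size_gb (decimal GB) over a link_gbit_s link.

    efficiency < 1 models protocol overhead and contention; 1.0 is the ideal line rate.
    """
    bits = size_gb * 8e9
    return bits / (link_gbit_s * 1e9 * efficiency)


for rate in (1, 10):
    minutes = transfer_time_s(200, rate) / 60
    print(f"{rate} Gbit/s: {minutes:.1f} min")
# 1 Gbit/s: 26.7 min
# 10 Gbit/s: 2.7 min
```

At 1 Gbit/s, moving a single patient time point would by itself consume most of a 15 to 30 minute clinical turnaround window, whereas at 10 Gbit/s it takes under three minutes.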
Policy issues • e-Science research has always required changes in resource provider policies to thrive, including: • Support for advance machine and network reservations • Including urgent computing • Improvements in accessibility and usability • Support for Audited Credential Delegation • Interoperability between machines and infrastructures • DEISA’s failure to address this augurs poorly for the future
Political issues • Streamlined procedures for UK or EU scientific projects. • All-in proposals which, when accepted, grant everything needed for a research project. • This includes funding for research as well as HPC resource allocations. • More sensible service level agreements. • If a simulation uses multiple machines and one fails, a full allocation refund should be given. MAPPER Policy Document – copies available
Computational Life and Medical Science (CLMS) • The CLMS Network is a 3-year initiative from September 2010 • Management: Dean’s Committee, Steering Committee • Supported by the Provost's Strategic Fund • Director: P.V. Coveney http://www.clms.ucl.ac.uk
CLMS Goals CLMS brings together UCL researchers with clinicians from UCL partners to develop shared data, compute, data transfer and application support services. Integrated e-Infrastructure and Services • Expand UCL’s world-leading position in life and biomedical sciences • Steer collaboration with academic institutions: within UCL, with UCLP and the NHS, UK-CMRI, Yale and others • Exploit initiatives in integrative biomedical systems science from the UK Research Councils, the EU and others around the world • Grow collaborations with industry, create business and commercial opportunities, and promote UCL IP licensing • Plan the next stages of activity in computational life and medical sciences at UCL
Conclusions • Biomedical projects all put pressure on resource providers to offer new services and new ways of working • For interactive and urgent work, the batch processing model does not work • The very conservative model adopted by HPC providers prevents their resources from being used in innovative ways to do new science and engage new and different kinds of users • If HPC is to be exploited in computational biomedicine, it needs to be used in a way that fits in with the medical and clinical workflow • VPH and similar initiatives will only increase pressure on resource providers for non-standard services