Issues with Production Grids. Tony Hey, Director of UK e-Science Core Programme
The ‘Grid’ is a set of core middleware services running on top of high performance global networks to support research and innovation
Grids of Grids of Simple Services (diagram). Labels: CPUs, Clusters, MPPs, Compute Resource Grids; Methods, Services, Functional Grids; Databases, Federated Databases, Data Resource Grids; Sensor, Sensor Nets; Overlay and Compose Grids of Grids
NGS “Today” • Projects: e-Minerals; e-Materials; Orbital Dynamics of Galaxies; Bioinformatics (using BLAST); GEODISE project; UKQCD Singlet meson project; Census data analysis; MIAKT project; e-HTPX project; RealityGrid (chemistry) • Users: Leeds; Oxford; UCL; Cardiff; Southampton; Imperial; Liverpool; Sheffield; Cambridge; Edinburgh; QUB; BBSRC; CCLRC • Interfaces: OGSI::Lite
RealityGrid AHM Experiment • Measuring protein-peptide binding energies: ΔGbind is vital, e.g. for understanding fundamental physical processes at play at the molecular level and for designing new drugs. • Computing a peptide-protein binding energy traditionally takes weeks to months. • We have developed a grid-based method to accelerate this process. We computed ΔGbind during the UK AHM, i.e. in less than 48 hours. (Figure labels: ligand, Src SH2 domain)
Experiment Details • A Grid-based approach, using the RealityGrid steering library, enables us to launch, monitor, checkpoint and spawn multiple simulations • Each simulation is a parallel molecular dynamics simulation running on a supercomputer-class machine • At any given instant, we had up to nine simulations in progress (over 140 processors) on machines at 5 different sites, e.g. 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL
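The example placement on this slide can be tallied to check the quoted totals. The following is a minimal sketch in Python; the dictionary layout and variable names are illustrative assumptions and are not part of the RealityGrid steering library.

```python
# Snapshot of concurrent simulations per machine, copied from the slide's example.
# The data structure is an illustrative assumption, not a RealityGrid API.
placement = {
    "TG-SDSC": 1,
    "TG-NCSA": 3,
    "NGS-Oxford": 3,
    "NGS-Leeds": 1,
    "NGS-RAL": 1,
}

total_simulations = sum(placement.values())
sites = len(placement)

# Split the tally by infrastructure, using the "TG-" / "NGS-" prefixes from the slide.
teragrid = sum(n for name, n in placement.items() if name.startswith("TG-"))
ngs = sum(n for name, n in placement.items() if name.startswith("NGS-"))

print(f"{total_simulations} simulations at {sites} sites "
      f"({teragrid} on TeraGrid, {ngs} on the NGS)")
```

The tally agrees with the slide: nine simulations in progress across five sites, split between US TeraGrid and UK NGS machines.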
Experiment Details (2) • In all, 26 simulations were run over 48 hours. We simulated over 6.8 ns of classical molecular dynamics in this time • Real-time visualization and off-line analysis required bringing back data from simulations in progress • We used UKLight between UCL and the TeraGrid machines (SDSC, NCSA)
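The figures on this slide imply some rough averages. The sketch below, in Python, works them out; it treats "over 6.8 ns" as a lower bound and assumes nothing about how the simulated time was actually distributed across the 26 runs.

```python
# Figures quoted on the slide; "over 6.8 ns" is treated as a lower bound.
simulations = 26
wall_clock_hours = 48
simulated_ns = 6.8

# Mean simulated time per run (individual runs may have differed widely).
avg_ns_per_simulation = simulated_ns / simulations

# Aggregate simulated time produced per wall-clock hour across all machines.
aggregate_ns_per_hour = simulated_ns / wall_clock_hours

print(f"average per simulation: at least {avg_ns_per_simulation:.2f} ns")
print(f"aggregate rate: at least {aggregate_ns_per_hour:.3f} ns per wall-clock hour")
```

These are averages only: they come to roughly 0.26 ns per simulation and 0.142 ns of simulated time per wall-clock hour in aggregate.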
The e-Infrastructure (diagram). UK NGS: Leeds, Manchester, Oxford, RAL. US TeraGrid: SDSC, NCSA, PSC. Also shown: UCL, AHM 2004, UKLight, Starlight (Chicago), Netherlight (Amsterdam), local laptops and Manchester vncserver. Legend: Computation, Steering clients, Service Registry, Network PoP. All sites connected by production network (not all shown)
The scientific results … Some simulations require extending and more sophisticated analysis needs to be performed
… and the problems • Restarted the GridService container Wednesday evening • Numerous quota and permission issues, especially at TG-SDSC • NGS-Oxford was unreachable Wednesday evening to Thursday morning • The steerer and launcher occasionally failed • We were unable to checkpoint two simulations • The batch queuing systems occasionally did not like our simulations • 5 simulations died of natural causes • Overall, up to six people were working on this calculation to solve these problems
NGS “Tomorrow”: Web Services based National Grid Infrastructure. Grid Operation Support Centre
Web Service Grids: An Evolutionary Approach to WSRF • WS-I: standards that have broad industry support and multiple interoperable implementations • ‘WS-I+’ profile: specifications that are emerging from the standardisation process and are recognised as being ‘useful’ • Specifications that have entered or will enter a standardisation process but are not stable and are still experimental
OMII Vision • To be the national provider of reliable, interoperable, open source grid middleware • Provide one-stop portal and software repository for grid middleware • Provide quality assured software engineering, testing, packaging and maintenance for our products • Lead the evolution of Grid middleware through a managed programme and wide reaching collaboration with industry
OMII Distribution 1 Oct 2004 • Collection of tested, documented and integrated software components for Web Service Grids • A base built from off-the-shelf Web Services technology • A package of extensions that can be enabled as required • An initial set of Web Services for building file-compute collaborative grids • Technical preview of Web Service version of OGSA-DAI database middleware • Sample applications
OMII future distributions • Include the services in previous distributions, plus: • OMII managed programme contributions • Database service • Workflow service • Registry service • Reliable messaging service • Notification service • Interoperability with other grids
Why Workflows and Services? Workflow = general technique for describing and enacting a process; it describes what you want to do, not how you want to do it. Web Service = how you want to do it; automated programmatic internet access to applications. • Automation • Capturing processes in an explicit manner • Tedium! Computers don’t get bored/distracted/hungry/impatient! • Saves repeated time and effort • Modification, maintenance, substitution and personalisation • Easy to share, explain, relocate, reuse and build • Available to wider audience: don’t need to be a coder, just need to know how to do Bioinformatics • Releases Scientists/Bioinformaticians to do other work • Record • Provenance: what the data is like, where it came from, its quality • Management of data (LSID - Life Science IDentifiers)
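The split between "what" (the workflow) and "how" (the services) can be sketched in a few lines of Python. Everything named here is hypothetical: the three local functions stand in for remote Web Services, and the `enact` helper is an illustrative toy, not Taverna, Freefluo or Scufl.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for Web Services: each one is "how" a step is done.
# Real services would be remote calls; these local functions are assumptions.
def fetch_sequence(identifier: str) -> str:
    return {"demo-1": "ATGGCCTTT"}.get(identifier, "")

def reverse_complement(sequence: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(sequence))

def gc_content(sequence: str) -> float:
    return sum(base in "GC" for base in sequence) / len(sequence)

SERVICES: Dict[str, Callable[[Any], Any]] = {
    "fetch_sequence": fetch_sequence,
    "reverse_complement": reverse_complement,
    "gc_content": gc_content,
}

# The workflow is "what" to do: an ordered list of step names, no implementation.
WORKFLOW: List[str] = ["fetch_sequence", "reverse_complement", "gc_content"]

def enact(workflow: List[str], services: Dict[str, Callable[[Any], Any]],
          initial_input: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Run each named step in order, recording every intermediate result."""
    value = initial_input
    provenance = []
    for step in workflow:
        value = services[step](value)
        provenance.append((step, value))
    return value, provenance

result, provenance = enact(WORKFLOW, SERVICES, "demo-1")
for step, value in provenance:
    print(step, "->", value)
```

Because the workflow is only a list of names, a service can be substituted by changing one entry in `SERVICES`, and the recorded `provenance` list shows where each result came from.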
Workflow Components • Scufl: Simple Conceptual Unified Flow Language • Taverna: writing, running workflows & examining results • Freefluo: workflow engine to run workflows • SOAPLAB: makes applications available • Diagram labels: SOAPLAB Web Service (any application); Web Service, e.g. DDBJ BLAST
The Williams Workflows • A: Identification of overlapping sequence • B: Characterisation of nucleotide sequence • C: Characterisation of protein sequence
The Workflow Experience. Have workflows delivered on their promise? YES! • Correct and biologically meaningful results • Automation • Saved time, increased productivity • Process split into three; you still require humans! • Sharing • Other people have used and want to develop the workflows • Change of work practices • Post hoc analysis: don’t analyse data piece by piece, receive all the data at once • Data stored and collected in a more standardised manner • Results amplification • Results management and visualisation
Future UK e-Infrastructure? Users get common access, tools, information and nationally supported services through the NGS and robust, standards-compliant middleware from the OMII. (Diagram labels: VRE, VLE, IE; HPCx + HECToR; LHC; ISIS TS2; GOSC; Regional and Campus grids; Integrated internationally)