Compute Grids, Data Grids and Service Grids Dr Neil Geddes CCLRC Head of e-Science Director of the UK Grid Operations Centre. Compute Grids, Data Grids and Service Grids - What they are - What they can do - Where they can be found - What the future holds in this arena.
Compute Grids, Data Grids and Service Grids What are they ?
What is a computational grid?
• A pool of computational resources that can be “plugged into” via standard interfaces.
  • Processors
  • Data storage devices
  • Instruments
Compute Grids
• Focus on high throughput computing
• Clusters of computers
• Some very big
• Clusters of clusters
• HPC meta-computing
• HPC + pre + post processing
• Grids enable coordination across administrative boundaries
• Key components:
  • Authentication, Authorisation
  • Resource discovery
  • Job submission/retrieval
  • Networking
(Image: NASA Information Power Grid)
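The key components listed for a compute grid can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: the class `ToyGrid`, the `Resource` record, the site names, and the certificate-style user name are all hypothetical and do not correspond to any real grid middleware. It shows how authentication, authorisation, resource discovery, and job submission/retrieval might fit together when resources belong to different administrative domains.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str        # hypothetical cluster name
    site: str        # administrative domain that owns the cluster
    free_cpus: int


@dataclass
class ToyGrid:
    """In-memory stand-in for a grid spanning several sites (illustrative only)."""
    users: dict          # user identity -> set of virtual organisations (VOs)
    resources: list      # Resource entries published by the sites
    allowed_vos: dict    # site -> set of VOs that the site accepts
    results: dict = field(default_factory=dict)

    def authenticate(self, subject):
        # Authentication: is this identity known to the grid at all?
        return subject in self.users

    def authorise(self, subject, site):
        # Authorisation: does the site accept any VO the user belongs to?
        return bool(self.users[subject] & self.allowed_vos.get(site, set()))

    def discover(self, subject, cpus):
        # Resource discovery: resources with enough free CPUs that the user may use.
        return [r for r in self.resources
                if r.free_cpus >= cpus and self.authorise(subject, r.site)]

    def submit(self, subject, cpus, task):
        # Job submission: run the task on the first matching resource.
        if not self.authenticate(subject):
            raise PermissionError("unknown user")
        matches = self.discover(subject, cpus)
        if not matches:
            raise RuntimeError("no suitable resource")
        job_id = f"{matches[0].name}-{len(self.results)}"
        self.results[job_id] = task()   # executed locally in this toy model
        return job_id

    def retrieve(self, job_id):
        # Job retrieval: fetch the stored output.
        return self.results[job_id]


grid = ToyGrid(
    users={"/C=UK/CN=alice": {"physics"}},
    resources=[Resource("cluster-a", "site-1", 16),
               Resource("cluster-b", "site-2", 64)],
    allowed_vos={"site-1": {"biology"}, "site-2": {"physics"}},
)
job = grid.submit("/C=UK/CN=alice", cpus=8, task=lambda: sum(range(10)))
print(job, grid.retrieve(job))
```

In this example the user is rejected by site-1 (wrong virtual organisation) and the job lands on site-2, which is the cross-domain coordination the slide refers to. A real deployment would replace each method with a networked service.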
Data Grids
• Focus on
  • Large data volumes
  • Coordinated data access
  • Heterogeneous and distributed data
  • Importance of metadata
• e.g.
  • Virtual Observatories
  • Medical images
• Important components
  • Authentication, Authorisation
  • Resource discovery
  • Data transfer
  • Confidentiality
  • Networking
(Image panels: X-ray, optical, infra-red, radio)
Service Grids
• Focus on
  • Everything else:
  • What you want to do rather than how it is done
• Integrate audio visual tools
• Remote control and tele-presence
  • Microscopes, Beamlines, test equipment
  • Integrated with compute and data grid
• Integrate with other services
  • Journal archives, website management
• Service based architectures
  • Web services
• Important components
  • Authentication, Authorisation
  • Resource discovery
  • Data transfer
  • Confidentiality
  • Common Interfaces
Common Grid Features
• Authentication
• Authorisation
• Accounting
• Resource discovery
• Data transfer
• Confidentiality
• Security
• Automation
Different emphasis for different deployments/problems.
Grid computing is about common standards/interfaces to enable inter-enterprise, collaborative computing.
Compute Grids, Data Grids and Service Grids What can they do ? Where can they be found ?
(some) US Grid Projects:
• Information Power Grid (IPG): Production Grid for aerosciences and other NASA missions.
• Network for Earthquake Eng. Simulation Grid (NEESGrid): Production Grid for earthquake engineering.
• National Virtual Observatory (NVO): Production Grids for data analysis in astronomy.
• Particle Physics Data Grid (PPDG): Production Grids for data analysis in high energy and nuclear physics.
• Southern California Earthquake Center 2: Full geophysics modeling using Grids and knowledge-based systems.
• TeraGrid: U.S. science infrastructure linking four major resource sites at 40 Gb/s.
• DOE Science Grid (DOESG): Supplies persistent Grid services.
• EdGrid: Promotes applications of modeling and visualization in science and mathematics education, and remote control of instruments (electron microscope) for K-12.
• Biomedical Informatics Research Network (BIRN): An NCRR initiative aimed at creating a testbed to address biomedical researchers' need to access and analyze data at a variety of levels of aggregation located at diverse sites throughout the country.
UK eScience Projects
• CLEF: A Co-operative Clinical e-Science Framework
• BiosimGRID: A GRID Database for biomolecular simulations
• e-HPTX: An e-Science resource for High Throughput Protein Crystallography
• AstroGrid: A Virtual Observatory for the UK
• BAIR: Biological Atlas of Insulin Resistance
• ClimatePrediction.com: Distributed computing for a global climate (NERC Pilot)
• DAME: Distributed Aircraft Maintenance Environment
• e-Protein: A distributed pipeline for structural-based proteome annotation using GRID technology
• e-Minerals: Environment from the molecular level: an e-Science proposal for modelling the atomistic processes involved in environmental issues
• Integrative Biology: A robust and fault tolerant Grid infrastructure for biomedical science
• GENIE: Grid Enabled Integrated Earth system model
• GEODISE: Grid Enabled Optimisation & Design Search for Engineering
• myGrid: Directly Supporting the E-Scientist
• Comb-e-Chem: Structure-Property Mapping: Combination Chemistry & the Grid
• NERC DataGrid: Data discovery and delivery for the NERC community
• GridPP: The Grid for UK Particle Physics
e-science and the UK GRID
(Image labels: CMS, LHCb, ATLAS)
climateprediction.net
• Launch ensemble of coupled simulations of 1950-2000 and compare with observations.
• Largest climate model ensemble ever (by a factor of >200)
• >45,000 users, >15,000 complete model runs, >1,000,000 model years in ~3 months (this is equivalent to 1.5 Earth Simulators)
• “Screensaver” requires
  • 10 CPU days on a 1.4 GHz P4, >128 MB memory, 600 MB disk space
• Global outreach (participants on all 7 continents, inc. Antarctica!)
• Generated much interest in schools (coolkidsforacoolclimate.com)
What is BIRN?
• Testbed for a biomedical knowledge infrastructure
• Creation and support of federated bioscience databases
• Data integration
• Interoperable analysis tools
• Datamining software
• Scalable and extensible
• Driven by research needs pull, not technology push
BIRN Today
• Established three neuroscience testbeds building on previously funded R01 research projects:
  - Mouse BIRN
  - Morph BIRN
  - Functional BIRN
  - BIRN Coordinating Center
• Integrating the activities of the advanced biomedical imaging and clinical research centers in the US.
• Developing hardware and software infrastructure for managing distributed data: creation of data grids.
• Exploring data using “intelligent” query engines that can make inferences upon locating “interesting” data.
• Building bridges across tools and data formats.
• Changing the use pattern for research data from the individual laboratory/project to shared use.
BIRN Network IT Infrastructure to hasten the derivation of new understanding and treatment of disease through use of distributed knowledge
About NEESgrid
NEESgrid will link earthquake researchers across the U.S. with leading-edge computing resources and research equipment, allowing collaborative teams (including remote participants) to plan, perform, and publish their experiments.
• Through the NEESgrid, researchers will:
  • perform tele-observation and tele-operation of experiments;
  • publish to and make use of a curated data repository using standardized markup;
  • access computational resources and open-source analytical tools;
  • access collaborative tools for experiment planning, execution, analysis, and publication.
• The components of the NEESgrid system will be completed by September 2004, when management and operation of the NEES system will be turned over to a consortium of earthquake engineering researchers and practitioners.
Compute Grids, Data Grids and Service Grids What the future holds ?
The drive toward standardisation
The Global Grid Forum (GGF) is a community-initiated forum of thousands of individuals from industry and research leading the global standardization effort for grid computing. GGF's primary objectives are to promote and support the development, deployment, and implementation of Grid technologies and applications via the creation and documentation of "best practices": technical specifications, user experiences, and implementation guidelines.
OASIS is a not-for-profit, global consortium that drives the development, convergence and adoption of e-business standards.
• Horizontal and e-business framework
• Web Services
• Security
• Public Sector
• Vertical industry applications
• WS-RF (from GGF)
for Everyone Enabling Grids for E-science in Europe
EGEE - Consortia UK e-Science: PPARC + Core Programme 10 European Consortia (incl. GEANT/TERENA/DANTE) + US + Russia
Also includes:
• http://www.csar.cfs.ac.uk/
  • 256 Itanium2 processor SGI Altix
  • 512 processor Origin3800
• http://www.hpcx.ac.uk/
  • Full installation = 1600 IBM p690+ Regatta processors; currently 1236 processors
• EMBL Nucleotide Sequences
• NCBI, BLAST, EMBOSS, FASTA, Gaussian
• Thus, the NGS provides access to over 2000 processors, over 36TB of "data-grid" capacity, common scientific applications and extensive data archives.
• Other resource providers anticipated to join in the future …
More than just computation and data resources…
• In future will include services to facilitate collaborative (grid) computing
  • Authentication (PKI X.509)
  • Job submission/batch service
  • Authorisation
  • Certificate management
  • Virtual Organisation management
  • Data access/integration services (SRB/OGSA-DAI/DQPS)
  • Information service
  • National Registry (of registries)
  • Data replication
  • Data caching
  • Grid monitoring
  • Accounting
Concluding Remarks
• Huge worldwide research activity
• Push towards standardisation and intersection with e-Business (web services)
• Increasing grid infrastructure deployed
‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ (Tony Blair, 2002)
The Particle Physics Challenge
Storage
• Raw recording rate 0.1 – 1 GByte/sec
• Accumulating at ~10 PetaBytes/year
• 10 PetaBytes of disk
Processing
• >100,000 of today’s fastest PCs
(Image labels: CMS, ATLAS, LHCb)
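The storage figures on this slide can be cross-checked with a short calculation. This is a rough sketch, not the experiments' own accounting: it assumes about 10^7 seconds of data taking per year (a common rule of thumb that the slide does not state) and decimal units (1 PetaByte = 10^6 GByte). The function name and constant are hypothetical.

```python
# Assumption (not from the slide): roughly 1e7 seconds of data taking per year.
LIVE_SECONDS_PER_YEAR = 1e7

# Decimal units: 1 PetaByte = 1e6 GByte.
GBYTE_PER_PBYTE = 1e6


def petabytes_per_year(rate_gbyte_per_sec, live_seconds=LIVE_SECONDS_PER_YEAR):
    """Yearly data volume in PetaBytes for a given raw recording rate."""
    return rate_gbyte_per_sec * live_seconds / GBYTE_PER_PBYTE


# The slide quotes a raw recording rate of 0.1 to 1 GByte/sec.
for rate in (0.1, 1.0):
    print(f"{rate} GByte/sec -> {petabytes_per_year(rate):.1f} PetaBytes/year")
```

Under these assumptions the quoted range of recording rates corresponds to roughly 1 to 10 PetaBytes per year, which is the same order of magnitude as the ~10 PetaBytes/year on the slide.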
CERN/LHC Community
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users