UK e-Science Program • Core Centres 2001 (EPSRC) • Research Council Pilot projects: Godiva Ocean grid (NERC), Genie Earth System (NERC), e-Minerals (NERC), e-Biodiversity (BBSRC) • Open EPSRC call for new e-Science Centres • Reading e-Science Centre (Nov. 2003) • Resources: Access Grid Node • Technical Director: Jon Blower
The Reading e-Science Centre (ReSC) • Jon Blower, Technical Director • http://www.resc.rdg.ac.uk • resc@rdg.ac.uk
Aims of the ReSC • Promote e-Science methods in the environmental science community • CGAM, DARC, ESSC, JCMM, NCAS all at Reading • Act as a focus for all e-Science activities in Reading • Provide expertise, help and support for these activities • Reach out into government agencies and industry • esp. Met Office, Environment Agency • British Maritime Technology
What is e-Science? • “science increasingly done through distributed global collaborations enabled by the Internet, using very large data collections, terascale computing resources and high performance visualization”
What is e-Science? (2) • Easier definition: “Collaborative science using distributed computing” • Who can benefit? • Users of lots of computing power • Users of large datasets • Users of very distributed datasets • scientists who work across geographical and institutional boundaries • Easier to explain with some concrete examples
Case 1: Ensemble modelling • The Problem: • Climate is sensitive to very many factors. How do we work out which factors are most important in determining our future climate? • The Solution: • Run (fairly simple) simulations many, many times over with different parameters (an ensemble run) • climateprediction.net: participants all over the world run the model on their home PCs
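The ensemble idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a toy zero-dimensional relation (warming = forcing / feedback), not the climateprediction.net model, and the function names, forcing value and parameter range are all hypothetical choices made for the example.

```python
import random

# Assumed forcing for doubled CO2 in W/m^2 (illustrative value, not from the slides)
F_2XCO2 = 3.7


def toy_model(feedback):
    """Equilibrium warming (K) of a toy model: dT = F / feedback."""
    return F_2XCO2 / feedback


def run_ensemble(n_members, seed=0):
    """Run the toy model once per member, each with a different parameter value."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_members):
        # Feedback parameter in W/m^2/K, drawn from an illustrative range
        feedback = rng.uniform(0.5, 2.0)
        results.append((feedback, toy_model(feedback)))
    return results


if __name__ == "__main__":
    warmings = sorted(w for _, w in run_ensemble(1000))
    print("min %.2f K, median %.2f K, max %.2f K"
          % (warmings[0], warmings[len(warmings) // 2], warmings[-1]))
```

Each member is independent of the others, which is why this kind of run can be farmed out to many unconnected home PCs.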
climateprediction.net results • Already largest climate model ensemble ever (by factor of >200) • >45,000 users, >15,000 complete model runs, >1,000,000 model years in ~3 months (this is equivalent to 1.5 Earth Simulators) • Global outreach (participants in all 7 continents, inc. Antarctica!) • Generated much interest in schools (coolkidsforacoolclimate.com) • Large range of sensitivities found [figure]
Case 2: Sharing large datasets • The Problem: • There are many different models of ocean circulation and we would like to compare and visualize the results. But there are lots of different data formats, and there’s lots of data! • The Solution: • Create an Internet-based service that allows users to cut out just the data they want, and get it in the format they want (this is called Grid Access Data Service, GADS) • Developed under the GODIVA project
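A minimal sketch of the "cut out just the data you want, in the format you want" idea. It does not use the GADS interface, which these slides do not describe; `subset` and `to_csv` are hypothetical helper names, and the grid is a small made-up latitude/longitude field held in plain Python lists.

```python
def subset(lats, lons, field, lat_min, lat_max, lon_min, lon_max):
    """Return only the rows/columns of `field` that fall inside the requested box."""
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    return ([lats[i] for i in rows],
            [lons[j] for j in cols],
            [[field[i][j] for j in cols] for i in rows])


def to_csv(lats, lons, field):
    """Deliver the (already subsetted) data in one possible output format."""
    lines = ["lat,lon,value"]
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            lines.append("%g,%g,%g" % (lat, lon, field[i][j]))
    return "\n".join(lines)


if __name__ == "__main__":
    lats = [-10, 0, 10, 20]
    lons = [0, 10, 20]
    # Made-up values: row index * 10 + column index
    field = [[i * 10 + j for j in range(len(lons))] for i in range(len(lats))]
    print(to_csv(*subset(lats, lons, field, 0, 10, 10, 20)))
```

Only the four requested grid points leave the "server" side here, rather than the whole field.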
GODIVA Web Portal • Allows users to interactively select data for download using a GUI • Users can create movies on the fly • cf. Live Access Server
Case 3: Highly distributed data • The Problem: • In order to study the genetic origins of a disease it is necessary to interrogate many data sources to perform in silico experiments to test hypotheses • The Solution: • Provide Web Services to access these data sources and a means for combining these Services into workflows. • These workflows can be shared between scientists, experiments can be easily repeated • myGrid project is doing just this (www.mygrid.org.uk)
The Taverna workbench • Each blob on the diagram is a Web Service • Flexible way of creating a distributed application • taverna.sourceforge.net
e-Science buzzwords • The GRID • highly heterogeneous network of supercomputers, clusters and commodity machines (and one PS2!) • cf. power grids (long way off!) • not all e-Science is done on The GRID (in fact, most isn’t at the moment) • Interoperability / standards • absolutely necessary for working together and avoiding duplication of effort • Metadata and Semantics (“The Semantic Web”) • Metadata = “data about data”, vital for discovering data resources • Meaning of data (semantics) must be precisely specified
The tools of the trade • Middleware • software that “glues together” existing systems and connects people with distant resources • Condor • Manages task of running jobs over several computers • Globus (Toolkit) • Most popular middleware, handles authentication, job submission, etc • version 3 very different from previous versions; it’s based on… • Web Services
Web Services • “Black box” subroutine that can be accessed over the Internet • Platform and language neutral • for example, code can run on Solaris, but be called from Mac, Windows, Linux etc, any language • Huge industry backing • IBM, Microsoft, Sun, etc • Grid Services extend WS for long-lived jobs • notification of progress, persistence of data etc
Workflows • Web Services can be composed into “workflows” to create a distributed application • hot topic of research and debate in e-Science • Lots of standards and tools to do this, but no one clear “winner” yet • BPEL is popular, but really designed for business-to-business (B2B) interaction
Example workflow [diagram] • Extract dataset 1 • Extract dataset 2 • Perform diagnostics • Compare datasets • Visualize results • Convert format
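The steps named in the example workflow can be sketched as ordinary functions chained together. Everything here is an assumption made for illustration: the wiring between steps is a guess (the diagram's arrows are not recoverable from the slide text), each function is a local stand-in for what would be a remote Web Service, and the data values are made up.

```python
def extract(name):
    """Stand-in for a data-extraction service; the two sources use different formats."""
    sources = {"dataset 1": "1.0,2.0,3.0", "dataset 2": "2.0;3.0;4.0"}
    return sources[name]


def convert_format(text):
    """Bring both source formats into one common form (a list of floats)."""
    return [float(item) for item in text.replace(";", ",").split(",")]


def perform_diagnostics(values):
    """A deliberately simple diagnostic: the mean of the values."""
    return sum(values) / len(values)


def compare(first, second):
    """Compare the diagnostics of the two datasets."""
    return second - first


def visualize(difference):
    """Stand-in for a visualization service; returns a text summary."""
    return "difference in means: %+.2f" % difference


def example_workflow():
    """Assumed wiring: extract -> convert -> diagnose (per dataset), then compare -> visualize."""
    diagnostics = []
    for name in ("dataset 1", "dataset 2"):
        diagnostics.append(perform_diagnostics(convert_format(extract(name))))
    return visualize(compare(*diagnostics))


if __name__ == "__main__":
    print(example_workflow())
```

Because each step only depends on the output of the previous one, any step could in principle be swapped for a different service without touching the rest.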
Visualization • Key component of many e-Science projects • Vital for validating models and finding features of interest • not just “pretty pictures” • Can do collaborative visualization • several groups can look at the same thing at the same time • e.g. mammography in hospitals • Real-time visualization of model results permits computational steering • RealityGrid (www.realitygrid.org) • explore parameter space much more quickly
GODIVA visualization • Adaptive meshing gives data compression with little visible degradation • 60 x 60 x 66 data points (~¼ million) reduced by a factor of ~10
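The figures on this slide can be checked with a quick calculation. The reduction factor is only quoted approximately on the slide, so the reduced count below is an order-of-magnitude estimate, not a measured value.

```python
# Grid dimensions quoted on the slide
nx, ny, nz = 60, 60, 66
full_points = nx * ny * nz

# "Factor of ~10" from the slide, treated here as exactly 10 for the estimate
reduction_factor = 10
reduced_points = full_points // reduction_factor

print("full grid: %d points, after meshing: roughly %d points"
      % (full_points, reduced_points))
```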
Why ReSC? • Centre of Excellence in Environmental e-Science • Reading Uni has strong links with Met Office, and Environment Agency • Support existing Reading e-Science activities • in ESSC, Comp Sci, Plant Sciences, etc • acts as focus and central point of contact • not just environmental e-Science • Complements NIEeS • National Institute for Environmental e-Science in Cambridge • www.niees.ac.uk
Who are we? • Two co-Directors • Keith Haines (ESSC) • Rachel Harrison (Computer Science) • Technical Director (first point of contact) • Jon Blower (ESSC) • Many Associates • Mike Evans, Lizzie Froude, Kevin Hodges, Chunlei Liu, Kecheng Liu, Adit Santokhee • join us!
What are we doing? • Building Reading e-Science community • Comp Sci, Met Dept, CGAM, DARC, Plant Sciences • Building infrastructure • Building Condor pool between ESSC and Comp Sci, further in future • Bidding for dedicated compute cluster • Building software • Web Services for environmental data access and manipulation • Outreach into govt agencies and industry • BMT, ECMWF, MCA, SEEDA • using Reading Enterprise Hub
ReSC projects • Flexible Online Environmental Data Systems (EDAS) • SEEDA project • delivery of live Met Office data to end users • e.g. BMT for search and rescue / oil spill mitigation • GODIVA • Grid for Ocean Diagnostics, Interactive Visualization and Analysis • GADS • Grid Access Data Service • Lizzie Froude’s PhD studentship • storm tracking diagnostics on large, distributed data sets • Lots more going on in Reading • e.g. BiodiversityWorld • Computer Science
How you can get involved • Talk to us! • Join the Reading University e-Science mailing list • e-science@lists.reading.ac.uk • Read our website: www.resc.rdg.ac.uk • Use the Wiki site to share ideas • Register expertise and interests • Share documents that might be of general use
What we can do for you • Provide technical expertise • e.g. on Web Services, workflow, etc • Provide advice on getting funding • Help find collaborators, resources etc • Provide computational resources • Provide live data • Provide Access Grid for use
The Access Grid accessgrid-resc@rdg.ac.uk
What is the Access Grid? • (not to be confused with The GRID!) • State-of-the-art videoconferencing suite • Can hold meetings with many sites at once • everyone can see and hear everyone else • Reduces travel costs and saves lots of time • Uses high-speed internet • no running costs! • Easy to operate • don’t need dedicated technician
In conclusion… • ReSC is here to support all Reading e-Science activity • We specialise in environmental e-Science • We’re always looking for new projects to be involved in • Many potential future projects • especially in area of delivery of real-time Met Office or Environment Agency data • engage GIS community • Let us know what you would like us to do! • resc@rdg.ac.uk
GENIE • Grid-Enabled Integrated Earth System model • Aims to create a distributed, component-based model of the earth system • Will study long-term climate change and palaeoclimate • Will incorporate components representing atmosphere, ocean, land surface, ice, ocean and land biogeochemistry, ocean sediments • Developing novel computing techniques for model framework, integration, data management, visualization • www.genie.ac.uk
GENIE (contd.) • New ways of working: • Web Portal for composing + executing simulations, retrieving results • Use of flocked Condor pools (London, Soton) and Beowulf clusters • Data client for post-processing • [Figure: Response of Atlantic circulation to freshwater forcing]
GENIE (contd.) • 3 international collaborators (Japan, US, Switzerland) • Involvement in international projects: PRISM, EMIC, GAIM • 4 oral, 2 poster presentations at EUG/AGU (Nice), IUGG (Japan), AHM 03 • 4 refereed journal papers (1 in press, 3 submitted) • Engagement with industry (50K each from Intel, Compusys for meetings) • ~20 people at present using shared code repository • Tyndall Centre will use code in integrated assessment model
GODIVA • Grid for Ocean Diagnostics, Interactive Visualisation and Analysis • Aims to quantify the thermohaline circulation via analysis of model results and observational data • Developing Web Services for performing common tasks on oceanographic data: • Data extraction, processing, analysis, visualisation • These Services will be composed into “workflows” to create flexible, distributed applications • collaborating with other e-Science projects (e.g. myGrid) in this matter
GODIVA progress • Talks/demonstrations at All Hands meeting and SCGlobal 2003 • Created prototype client application: • extracts live data and performs 3-D rendering • Also created data portal providing global access to data (next slide) • Will engage GIS community (e.g. MarineGIS project in Ireland) • (speaker note: mention irregular mesh) • www.nerc-essc.ac.uk/godiva
GODIVA Data Portal • Web-based, similar to Live Access Server • Users select area of interest and can download data or create movies in matter of seconds or minutes • Uses distributed computing for visualisation
NERC Data Grid • Objective is to build a grid which makes data discovery, delivery and use much easier than it is now • Standards compliant (ISO 19115, 19118), semantic data model for maximum interoperability • Data can be stored in many different ways (flat files, databases…) • Clear separation between discovery and use of data • 1 PI, 2 co-Investigators, 4 FTE staff, 3 registered US collaborators • ndg.nerc.ac.uk
NERC Data Grid progress • Involved in many UK events (All Hands, Met Soc, NIEeS workshops etc) • Generated much international interest (US, France, Netherlands, Australia…) • Major challenges: • Influencing OGC and ISO to support the complex requirements of the climate simulation community • Developing a “feature-registry” to allow semantics of data types to be well understood by different communities
climateprediction.net • Have created extremely powerful and distributed climate modelling facility by running model simulation on home computers (cf. SETI@home) • Launch ensemble of coupled simulations of 1950-2000 and compare with observations. • Run on to 2050 under a range of natural and anthropogenic forcing scenarios. • Investigates sensitivity of climate system to increasing CO2 with range of parameter values • Have collaborated with other universities and industry to build system
e-Minerals • Models the atomistic processes involved in environmental issues (radioactive waste disposal, pollution, weathering) • Simulation of radiation damage (Daresbury) • Order-N quantum mechanical model of fluids (Cambridge) • Complex fluid-mineral interfaces: crystal growth and dissolution (Bath) • Developing new methods • embedded clusters: links simulations of various sophistication to cover greater ranges of scales • first use of quantum Monte Carlo techniques in mineral sciences • eminerals.org
e-Minerals (contd.) • Have constructed minigrid across institutions to run code • ~30 scientists in 8 institutions • Users submit jobs using a Web Portal • This integrates the CCLRC Data Portal with the HPC Portal • Developing tools for collaborative visualisation across the virtual organisation • Collaborating with Peter Murray-Rust to extend the Chemical Markup Language (CML) for computational chemistry
NIEeS • National Institute for Environmental e-Science • Promotes and supports the use of e-science and grid technologies within the UK environmental science community • Holds workshops, courses, training events, visitor programmes, demonstration projects • Industry event forthcoming (Feb 12th) • generating much interest • www.niees.ac.uk
NIEeS (contd.) • Up to end of 2003 (since launch in July 2002): • 14 events held • 901 participants • e.g. Earth Systems Modelling workshop (Oct 03) received coverage in national press and engaged Earth Simulator community in Japan • Event sponsorship from BNFL, LaserScan • In-kind support from EDINA, ICE, IEMA, MIRO • Additional help from Hi Consulting
Illustration of an e-Science problem • SOC’s latest OCCAM model runs at 1/12 degree resolution, covering the entire globe • For every model day, the model outputs 8 GB of data • Hence the whole data set will be several TB in size • How do we work with this data set? • Might want to do analysis, visualisation etc • Extract just the data you want and work with it • OR move the programs (code) to the data, not vice-versa • These are two key principles of e-Science
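A rough back-of-the-envelope version of the numbers above. Only the 8 GB per model day comes from the slide. The rest are assumptions for illustration: decimal units (1 TB = 1000 GB), a 365-day model year, a run length chosen arbitrarily, and a regular latitude/longitude grid when estimating how much of a day's output a regional subset represents.

```python
GB_PER_MODEL_DAY = 8  # from the slide


def run_size_tb(model_days):
    """Total output in TB for a run of the given length (assumes 1 TB = 1000 GB)."""
    return GB_PER_MODEL_DAY * model_days / 1000.0


def box_fraction(box_deg_lat, box_deg_lon):
    """Fraction of a global field covered by a lat/lon box, assuming a regular grid."""
    return (box_deg_lat * box_deg_lon) / (180.0 * 360.0)


if __name__ == "__main__":
    print("one model year: %.2f TB" % run_size_tb(365))
    # A 10 x 10 degree region of one day's output, all depths and variables
    region_gb = GB_PER_MODEL_DAY * box_fraction(10, 10)
    print("10x10 degree subset of one model day: %.1f MB" % (region_gb * 1000))
```

Under these assumptions a single model year is already close to 3 TB, while a regional extract of one day is a few megabytes, which is the case for extracting only what is needed or moving the code to the data.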
Working with large data sets • Subset / resample • Transform / regrid / rotate • Analyse • Compare
UK e-Science Centres [map] • National e-Science Centre (NeSC) • National Institute for Environmental e-Science (NIEeS)
GADS: Background • Climate scientists have a need to access large datasets: • Model data and satellite observations • Data in a variety of formats (netCDF, HDF, GRIB, more), grids, naming conventions • Model intercomparisons (MERSEA) • Existing standards (DODS/OPeNDAP) are limited
Advantages of GADS • Data are abstracted from storage • Data can be exposed with standard variable names, even if data files do not conform to standards • Data can be delivered in many formats, irrespective of internal storage format • Deployed as Web Service • Platform-independent • Compatible with current e-Science advances
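The abstraction described above can be sketched as a small catalogue that maps standard variable names onto whatever the underlying files happen to use. This is not the GADS implementation: the catalogue layout, the reader functions, the variable names and the data values are all hypothetical, and the "readers" return canned numbers instead of opening real netCDF or GRIB files.

```python
# Hypothetical catalogue: standard name -> where and how the data are actually stored
CATALOGUE = {
    "sea_water_temperature": {"file_var": "TEMP", "stored_as": "netCDF"},
    "sea_water_salinity": {"file_var": "sal", "stored_as": "GRIB"},
}


def _read_netcdf(file_var):
    """Stand-in for a netCDF reader; returns made-up values."""
    return {"TEMP": [10.2, 10.4]}[file_var]


def _read_grib(file_var):
    """Stand-in for a GRIB reader; returns made-up values."""
    return {"sal": [35.1, 35.0]}[file_var]


READERS = {"netCDF": _read_netcdf, "GRIB": _read_grib}


def get_data(standard_name, out_format="list"):
    """Look data up by standard name; the caller never sees file names or formats."""
    entry = CATALOGUE[standard_name]
    values = READERS[entry["stored_as"]](entry["file_var"])
    if out_format == "csv":
        return ",".join(str(v) for v in values)
    return values


if __name__ == "__main__":
    print(get_data("sea_water_temperature"))
    print(get_data("sea_water_salinity", out_format="csv"))
```

The caller asks for a standard name and an output format; the storage format and the file's own variable name stay hidden behind the catalogue.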