210 likes | 334 Views
Grids – Achtergronden en praktijk in het EU Data Grid. David Groep, NIKHEF davidg@nikhef.nl. http://www.dutchgrid.nl/ http://www.eu-datagrid.org/ http://www.edg.org/. The GRID: networked data processing centres and ”middleware” software as the “glue” of resources.
E N D
Grids – Achtergronden en praktijkin het EU Data Grid David Groep, NIKHEFdavidg@nikhef.nl http://www.dutchgrid.nl/ http://www.eu-datagrid.org/http://www.edg.org/
The GRID: networked data processing centres and ”middleware” software as the “glue” of resources. Researchers perform their activities regardless geographical location, interact with colleagues, share and access data Scientific instruments and experiments provide huge amounts of data Grid – a vision Federico.Carminati@cern.ch next: beyond distributed computing
Beyond distributed computing A grid integrates resources that are • not owned or administered by one single organisation • speak a common, open protocol … that is generic • working as a coordinated, transparent system And … • can be used by many people from multiple organisations • that work together in one Virtual Organisation Checklist items based on: Ian Foster What is the Grid? July 2002 next: virtual organisations
Virtual Organisations • A VO is a temporary alliance of stakeholders • Users • Service providers • Information Providers A set of individuals or organisations, not under single hierarchical control, temporarily joining forces to solve a particular problem at hand, bringing to the collaboration a subset of their resources, sharing those at their discretion and each under their own conditions. Viewgraph: Foster, Kesselman, Tuecke, the Globus Project next: common and open protocols
Applications Common and open protocols • Resources must talk standard protocols … • … for interoperability of application toolkits Application Toolkits Condor-G DUROC MPICH-G2 VLAM-G Grid Services Information Replica GridFTP GRAM Grid Security Infrastructure (GSI) Grid Fabric FARMS Supers Desktops TCP/IP Apparatus DBs next: protocol standards
Standard protocols • New Grid protocols based on popular Web Services Open Grid Services Architecture • service discovery • many different bindings • easily integrated in hosting environments (Java, WebSphere, .NET) • is entirely generic • adds: transient services, stateful services Global Grid Forum (GGF) promotes the open standards process next: access in a coordinated way
Access in a coordinated way • New ‘qualities-of-service’ • Transparently crossing of domain boundariessatisfying constraints of • site autonomy • authenticity, integrity, confidentiality • single sign-on to all services • ways to address services collectively • preferably via portals and visual programming next: example GOME analysis
Example: GOME analysis Task: ozone is the component in the atmosphere that protects us from harmful UV radiation. Its concentration varies widely. What is happening? • the EnviSat satellite is orbiting the earth and measuring light absorption in the atmosphere • the absorption is related to the ozone concentration,but needs instrument corrections • ground-based observation give absolute concentrations • linking both datasets can give us the concentration everywhere • terabytes of data come in at several ground stations, and various labs need the final products Grid can provide a good solution to this problem next: GOME analysis on the Grid, domains
Example: Ozone Analysis on the Grid LIDARdatabase 10100100010111101001000100101101010010001000101011010100101010100001011110101001010011010010010111001001001010010011111010101001010111001010101010101001001001111101010100100010100101100010100000101010001010010001011110100100010010110101001000100010101101010010101010000101111010100101001101001001011100100100101001001111101010100101011100101010101010100100100111110101010010001010010110001010000010101000 NOPREGO resourcebroker validation visualize OPERA next: DataGrid overview
A Working Grid: the EU DataGrid Objective: build the next generation computing infrastructure providing intensive computation and analysis of shared large-scale databases, from hundreds of TeraBytes to PetaBytes, across widely distributed scientific communities • official start in 2001 • 21 partners • in the Netherlands: NIKHEF, SARA, KNMI • Pilot applications: earth observation, bio-medicine, high-energy physics • aim for production and stability next: history of grids
Realising the Grid Vision Grid was the logical next step in the end of the 1990: • Harnassing desktop power became commonplace– 1988: Condor, later: SETI@Home, Entropia, Distributed.NET • Peer-to-peer data access protocols emerged– 1999: Napster, later: Gnutella, KaZaa, BitTorrent • Network access became extremely fast– 1997: wide area bandwidth starts to double every 9 months! • 1997: Globus starts developing basic middleware– 1996: middleware by Legion, 2000: Unicore • Massive take-up of the Grid vision in 1999– lead in Europe by the EU DataGrid– others include: NASA-IPG, CrossGrid, GridLab, PPDG, Alliance, … next: the EU DataGrid project
Authentication Request OK C=IT/O=INFN /L=CNAF/CN=Pinco Palla/CN=proxy Query AuthDB VOMSpseudo-cert VOMSpseudo-cert VOMS user Grid Security Infrastructure Crucial in Grid computing: it gives Single Sign-On GSI uses a Public Key Infrastructure with proxy-ing and delegation multiple VOs per user, groups and role support contracts Service 2 Grid Service 1 connect to providers VOMS overview: Luca dell’Agnello and Roberto Cecchini, INFN and EDG WP6 next: information services overview
What is needed to get the work done • Fabric information • what are the resources (computers, disk, tape) available to my VO? • how do I access these resources (the “contact information”)? • “Physical” meta-data • when was this dataset written? • where can I find copies of it ‘close’ to me? • Contextual meta-data or ‘information’ • Which datasets contain feature “X”? • Which DNA sequence corresponds to this protein? • Actual storage, processing power, network connectivity next: spitfire
Spitfire: Access to Data Bases • based on common EDG Trust and Authorization Manager • VO and Role mapping to data base views Access via • Browser • Web Service • Commands Screenshots: Gavin McCance, Glasgow University and EDG WP2 next: R-GMA
Grid information: R-GMA Relational Grid Monitoring Architecture • a Global Grid Forum standard • Implemented by a relational model • used by grid brokers Screenshots: R-GMA Browser, Steve Ficher et al., RAL and EDG WP3 next: RLS and RMC
ATLAS Replica Service higgs1.dat, ... sara:atlas/data/higgs1.dat cern:lhc/atlas/higgses/1.dathiggs2.dat, ... cern:lhc/atlas/higgses/2.dat Replica Location Service • Search on file attributes (date, name, …) • Find replicas on (close) Storage Elements SE2CERN CECERN SE1SARA cacheUvA DAS2 CEDAS-2 next: CE and RB, brokering and LCAS
Compute Brokering: reliable execution • User can delegate all job actions to the Resource Broker …… and go away • Reliable scheduling of jobs over the entire grid (as seen from the R-GMA information system) • Users are roaming, and can retrieve their results anywhere, anytime next: EDG test bed overview
Current EU DataGrid Facilities EDG and LCG sites Core site NIKHEF RAL Tokyo Taipei BNL CERN Lyon ~1000 CPUs~100 Tbyte storage several key databases ~60 sites, ~600 users in ~7 VOs CNAF next: using EDG, VisualJob
Using the DataGrid for Real Screenshots: Krista Joosten and David Groep, NIKHEF next: Portals
Portals Screenshots: ICES/KIS and WTCW: VLAM-G; INFN-GRID and EDG: Genius; NPACI: Rocks next: conclusions and outlook
What more is there to see and do? The current Grids are only the beginning! • portals will get more users on the Grid • more functionality, better resilience, strong reliability • joining the Grid will be as simple as joining a file-sharing network • EGEE: a pan-European Grid Infrastructure being created today The EU DataGrid project web www.edg.org DutchGrid Platform www.dutchgrid.nl For other grid projects, see www.gridstart.orgwww.enterthegrid.com