
Grid Tutorial 2008



  1. Grid Tutorial 2008

  2. What do you think a GRID is? • The word ‘grid’ is (over)used a lot (hype) • Oracle databases ;( • cluster computing • cycle scavenging • “If a customer calls it a ‘grid’, then it is a grid” • cross-domain resource and data sharing • Is there a clear definition? • Coordinating resources that are not under central control • The use of standards, open and generic protocols & interfaces • Delivering a non-trivial amount of collective services • When do you need a grid? • More than one computer • More than one use (sharing) • More than one location (collaborating) • More than one company/division • More than one community • In general: more than ONE. J. Templon, dr. M. Bouwhuis

  3. The Grid metaphor (diagram): GRID MIDDLEWARE connects • Mobile access • Supercomputer, PC cluster • Workstation • Data storage, sensors, experiments • Visualising • Internet, networks Grid Tutorial, Utrecht, 2008

  4. Introduction to GRID computing • Introduction, GRID Tutorial • Maurice Bouwhuis, SARA Grid Tutorial, SURFnet, November 2008

  5. Problem 1: High Energy Physics • ATLAS (45 m), one of the four LHC detectors • Online system: multi-level trigger to filter out background and reduce data volume, starting from 40 MHz (40 TB/sec) • Level 1 - special hardware: 75 kHz (75 GB/sec) • Level 2 - embedded processors: 5 kHz (5 GB/sec) • Level 3 - PCs: 100 Hz (100 MB/sec) • Data recording & offline analysis • In NL: • ~3000 CPUs • 3,000,000 GB disk • 3,000,000 GB tape per year Grid Tutorial, SURFnet, September 2007 dr. M. Bouwhuis
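The trigger figures on slide 5 can be checked with a little arithmetic. The sketch below is not part of the original slides; it assumes decimal units (1 TB = 10^12 bytes) and the stage name "trigger input" is a label chosen here for the 40 MHz figure. It derives the implied event size at each stage and the rate reduction achieved by each trigger level.

```python
# Rates and data volumes as quoted on the slide; decimal units are an assumption.
# Each entry: (stage name, event rate in Hz, data rate in bytes per second).
STAGES = [
    ("trigger input", 40e6, 40e12),  # 40 MHz, 40 TB/sec (stage name is hypothetical)
    ("level 1", 75e3, 75e9),         # 75 kHz, 75 GB/sec
    ("level 2", 5e3, 5e9),           # 5 kHz, 5 GB/sec
    ("level 3", 100.0, 100e6),       # 100 Hz, 100 MB/sec
]


def event_size_bytes(rate_hz, bytes_per_second):
    """Average size of one event implied by a rate and a data volume."""
    return bytes_per_second / rate_hz


def reduction_factors(stages):
    """Rate reduction of each stage relative to the stage before it."""
    return [
        (stages[i][0], stages[i - 1][1] / stages[i][1])
        for i in range(1, len(stages))
    ]


def overall_reduction(stages):
    """Rate reduction from the first stage to the last."""
    return stages[0][1] / stages[-1][1]


for name, rate, volume in STAGES:
    print(f"{name}: {event_size_bytes(rate, volume) / 1e6:.1f} MB per event")
for name, factor in reduction_factors(STAGES):
    print(f"{name} reduces the rate by a factor of {factor:.0f}")
print(f"overall reduction: {overall_reduction(STAGES):.0f}")
```

Under these assumptions every stage implies the same event size of about 1 MB, so the data reduction comes entirely from discarding events rather than from shrinking them.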

  6. Life and Medical Science • Over 5 million sequence entries in GenBank • Over 3 billion bases from 41,000 species • Automation and increased resolution → more data AND more complex data • Many different sources • Hypothesis driven → data driven Grid Tutorial, Utrecht, November 2008

  7. And the rest • Astronomy (LOFAR et al.) • Climate research • Earth observation • Humanities and social sciences (alpha and gamma sciences) • Storage and long-term archiving • Analysis of digital files • Ecology • Food and health • Medical instrumentation design • Archaeology • … • Where are you from? Grid Tutorial, Utrecht, November 2008

  8. State of GRID today • So what is happening today? • Scale! Grid infrastructures operate worldwide • International infrastructures – EGEE, DEISA, NorduGrid, OSG, TeraGrid • National – NAREGI (Japan), UK-eScience, D-Grid, NLGrid • Interoperability – availability of middleware – Globus Toolkit, UNICORE, NAREGI, schedulers Grid Tutorial, Utrecht, November 2008

  9. State of GRID Today • Some basic requirements for a grid infrastructure • Transparent user administration – single sign on (single grid identity), authorisation and accounting based on grid identity – AAA facilities • Job scheduling – which can handle different environments • Global data access • Global information services – job information, data information, resource information • Interoperability! • Standards needed for federation of infrastructures – GGF, IETF…. Grid Tutorial, Utrecht, November 2008

  10. It all starts with Networking • Developments in network connectivity (high bandwidths) and tools play an important role • 10 Gbps WAN links are available today, both as shared links and as dedicated lightpaths (based on lambda technology) • 1 Gbps network adapters are commodity items on systems today, and 10GE adapters are available Grid Tutorial, Utrecht, November 2008
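To make the bandwidth figures on slide 10 concrete, here is a minimal sketch (not from the original slides) of the ideal time needed to move a dataset over a 1 Gbps and a 10 Gbps link. The 1 TB dataset size and the `efficiency` parameter are assumptions for illustration; real transfers lose some throughput to protocol overhead and contention on shared links.

```python
def transfer_time_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Ideal transfer time for size_bytes over a link of the given speed.

    efficiency is the assumed fraction of the nominal link speed that the
    transfer actually achieves (1.0 means the full nominal rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


ONE_TB = 1e12  # hypothetical dataset size, decimal terabyte

for label, bits_per_second in [("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    hours = transfer_time_seconds(ONE_TB, bits_per_second) / 3600
    print(f"{label}: {hours:.2f} hours for 1 TB")
```

At nominal speed the same terabyte takes a little over two hours on a 1 Gbps adapter and a little over thirteen minutes on a 10 Gbps lightpath, which is why dedicated high-bandwidth links matter for data-intensive grid use.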

  11. SURFnet6 infrastructure: DWDM on dark fiber (network map; labels include Muenster) Grid Tutorial, Utrecht, November 2008

  12. GEANT2 topology Grid Tutorial, Utrecht, November 2008

  13. Global Lambda Integrated Facility (GLIF) World Map. Visualization courtesy of Bob Patterson, NCSA/University of Illinois at Urbana-Champaign. Data compilation by Maxine Brown, University of Illinois at Chicago. Earth texture from NASA. www.glif.is Grid Tutorial, Utrecht, 2008

  14. EGEE Main Objectives • Operate a large-scale, production quality grid infrastructure for e-Science • Attract new resources and users from industry as well as sciences • Flagship grid infrastructure project co-funded by the European Commission • Now in its 3rd phase Univ. Linz - March 2008

  15. EGEE – What do we deliver? • Infrastructure operation • Sites distributed across many countries • Large quantity of CPUs and storage • Continuous monitoring of grid services & automated site configuration/management • Support multiple Virtual Organisations from diverse research disciplines • Middleware • Production quality middleware distributed under business friendly open source licence • Implements a service-oriented architecture that virtualises resources • Adheres to recommendations on web service inter-operability and evolving towards emerging standards • User Support - Managed process from first contact through to production usage • Training • Expertise in grid-enabling applications • Online helpdesk • Networking events (User Forum, Conferences etc.) Univ. Linz - March 2008

  16. 250 sites • 48 countries • 50,000 CPUs • 13 petabytes • >5000 users • >200 VOs • >140,000 jobs/day • Archeology • Astronomy • Astrophysics • Civil Protection • Comp. Chemistry • Earth Sciences • Finance • Fusion • Geophysics • High Energy Physics • Life Sciences • Multimedia • Material Sciences • … 32% Univ. Linz - March 2008

  17. Production Usage Status • ~19 million jobs run (8200 CPU-years, ~50K jobs/day) in 2006 • Non-physics usage is 10K jobs/day (the same as the whole of EGEE in 2005) • Continuous usage of between ¼ and ⅓ of the available resources • 24% of resources are contributed by groups external to the project • Grid Operations report: https://edms.cern.ch/document/726140 Grid Tutorial 2007, courtesy of Bob Jones (EGEE director)
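The usage figures on slide 17 can be cross-checked against each other. The sketch below is an illustration added here, not part of the original slides; it assumes a 365-day year and treats the rounded slide values as exact. It derives the average daily job count and the mean CPU time per job.

```python
# Figures as quoted on the slide for 2006 (rounded values, treated as exact here).
JOBS_2006 = 19e6
CPU_YEARS_2006 = 8200

DAYS_PER_YEAR = 365  # assumption: non-leap year
HOURS_PER_YEAR = DAYS_PER_YEAR * 24


def jobs_per_day(jobs, days=DAYS_PER_YEAR):
    """Average number of jobs per day over the period."""
    return jobs / days


def mean_cpu_hours_per_job(cpu_years, jobs):
    """Average CPU time consumed by one job, in hours."""
    return cpu_years * HOURS_PER_YEAR / jobs


print(f"jobs per day: {jobs_per_day(JOBS_2006):.0f}")
print(f"mean CPU hours per job: {mean_cpu_hours_per_job(CPU_YEARS_2006, JOBS_2006):.2f}")
```

The daily average comes out at roughly 52,000 jobs, consistent with the "~50K jobs/day" on the slide, and the mean job uses a little under four CPU hours.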

  18. Registered Collaborating Projects • Infrastructures: geographical or thematic coverage • Support Actions: key complementary functions • Applications: improved services for academia, industry and the public • 24 projects have registered as of February 2007 EGEE & SEE-GRID Summer School, Budapest, June 30th, 2007

  19. User Support Activities Grid Tutorial, Groningen, September 2006

  20. User support in the NE region • User support: contact user support at your local site or mail support@egee-ne.org • NE uses a ticketing system monitored by different partners from our region. In NL, NIKHEF, RC-RuG and SARA are responsible. • Tickets from GGUS are also imported into the NE system • Application support – NA4 activity. In NL: RC-RuG, SARA Grid Tutorial, Groningen, September 2006

  21. A Selection of Monitoring tools 1. GIIS Monitor 2. GIIS Monitor graphs 3. GOC Data Base 4. Scheduled Downtimes 5. GridIce – VO view 6. Live Job Monitor Grid Tutorial, Groningen, September 2006

  22. BiG Grid • Strengthen the existing national grid infrastructure in the Netherlands (NL-GRID by NCF) • Subsidy of 28 M€ for hardware and peopleware (expertise and support) • Core partners • NCF • Nikhef (High Energy Physics) • NBIC (BioInformatics) • Central and distributed facilities Grid Tutorial, SURFnet, September 2007

  23. Infrastructure • O(5000) CPU • O(10) PB disk storage • O(20) PB tape storage • O(10) Life Science Grid clusters

  24. Combination of ‘push’ and ‘pull’ • Application support: • expertise: Application Domain Analysts • help desk and operations centre • Uniform software suite → Collaboration • Standards: Open Grid Forum • EGEE: ‘production’ grid (40 disciplines) • BSIK VL-E project Grid Tutorial, SURFnet, September 2007

  25. Other Projects: DEISA • European super-computing grid • Shared global file system • Job migration • Co-scheduling Grid Tutorial, SURFnet, September 2007

  26. Virtual Laboratories (diagram) • Grid: harness multi-domain distributed resources • Virtual Laboratory: application-oriented services • Three application columns, each with: application-specific part, potential generic part, management of comm. & computing • Further labels: distributed computing; visualization & collaboration; knowledge; data & information

  27. Virtual Laboratory for e-Science (diagram) • Bio-diversity • Telescience • Food Informatics • Bio-Informatics • Data Intensive Science • Medical diagnosis & imaging • User interfaces & virtual-reality-based visualization • Interactive PSE • Adaptive information disclosure • Virtual lab. & system integration • Collaborative information management • High-performance distributed computing • Security & generic AAA • Optical networking

  28. The VL-e project (Vrije Universiteit) • 20 partners • Academic - Industrial • 40 M€ (20 M€ BSIK funding) • 2004 - 2008

  29. The Grid Tutorial • Day 1 • Grid Certificates and Virtual Organizations • Job Submission • Online multimedia collaboration by SURFnet • Security and authentication • Drinks • Day 2 • Data handling • User Scenarios (SciaGrid and BioMed) • Handout and USB stick • Paper handout • USB stick: demoXX, Imation, info.txt, tut_exercises.tgz, tutorial.pdf, tut_vmimage.zip Grid Tutorial, SURFnet, September 2007

  30. Sponsors • The tutorial is free thanks to the support of our sponsors • Gridforum.nl • Netherlands Center for BioInformatics • BigGrid • SURFnet Grid Tutorial, SURFnet, September 2007

  31. Grid Tutorial, SURFnet, September 2007

  32. Introduction to GRID computing Bringing It All Together Grid Tutorial, SURFnet, September 2007

  33. Summary You have seen and played with: • Authentication --- X509 certificate, VO • Job Submission --- Use the compute resources • Data Management --- Moving data around • Use Cases --- plans and achievements on the Grid • Related topics (diagram): Embrace/BioMed and other EU projects • Databases • Web Services • Service Oriented Architecture • Workflow Systems • Ontologies • Taverna/myGrid • This Tutorial

  34. Bringing it all Together • Try your own application on the Grid • Need help? Ask us, and we will work with you • Talk to the experts • We will walk around to answer “any” question • What does this type of Grid mean for BioInformatics? • Working session by Victor de Jager (NBIC), Machiel Jansen (SARA) and Pieter van Beek (SARA)

  35. Extra Extra Extra Extra Extra dr. M. Bouwhuis

  36. To remember • The Grid is already available, in production and in use • BIG GRID delivers the hardware and peopleware • Lots of storage and computing power dr. M. Bouwhuis

  37. Also visit the Expo • Virtual Laboratory for e-Science • Nationale Computer Faciliteiten • SARA • ……… Thank you for your attention dr. M. Bouwhuis

  38. Why in the Netherlands? • Grid pioneers: • leading role in grid development & standardisation • hosted the 1st Global Grid Forum (March 2001) • coordination of worldwide grid identity management • two ‘area directors’ in the Open Grid Forum (David Groep – security, Cees de Laat – infrastructure) • SURFnet: world leader in networking • The Netherlands is a world leader in: • bio banking • digital archiving • computational science • radio astronomy dr. M. Bouwhuis

  39. A wide variety of sciences • Humanities and social sciences (alpha and gamma sciences): DANS, MPI Nijmegen • Bio-science: BioAssist • Elementary particle physics (HEP): wLCG project (CERN) realises a grid prototype • Pan-disciplinary in the Netherlands: VL-e shapes Dutch e-Science • Philips jobs at NIKHEF dr. M. Bouwhuis

  40. NL – LHC/Tier-1 • Collaboration between NIKHEF and SARA. • Commitments laid down in the wLCG MoU (March 2006): • Partly funded by the NCF project ‘pilot BIG GRID’. • Emphasis on generic services! • Footnote 1: Gross capacity (no ‘fair share’ applied); including 20% LISA @ SARA Grid Tutorial, SURFnet, September 2007

  41. Definition of Grid • From an EU brochure: • It doesn’t matter if your team is modeling the Earth’s atmosphere, designing cars, creating animated films or finding new medicines, the basic principle is the same: your Grid supplies all the computing power, software, data and knowledge you need in one integrated package, and helps project teams work more closely together • The analogy with the power grid: • Just as you can plug into the power grid anywhere without knowing where your energy is coming from, you can plug into the grid without knowing where your (computing) resources are coming from. Grid Tutorial, Groningen, September 2006

  42. History (1) • From a news item in 1991: • “Smarr describes the metacomputer as a network of heterogeneous, computational resources linked by software in such a way that they can be used as easily as a personal computer” • So the concept was already introduced in the early 90s, under the name metacomputing. • The motivation was the emergence of computer networks. Grid Tutorial, Groningen, September 2006

  43. Example (1) The following is an example, from close to home, of the kind of initiatives started in those years. In 1996 a project was started in Amsterdam: The Amsterdam Metacomputing project is an ongoing effort from the University of Amsterdam (UvA), the Free University (VU) and "Academic Computing Services Amsterdam" (SARA) to develop a Metacomputer environment on the Amsterdam campus. Important components of this environment will be: automatic distribution and monitoring of jobs over a network of computer systems, uniform access to files of other users from each place to work and to each computer system incorporated in the environment, distributed storage of data on various fileservers, automatic backup, migration and archiving, general availability of both commercial and public domain software on software servers, and a minimum of system management tasks. In this way scientists will be able to devote all of their time to their actual task: science. Grid Tutorial, Groningen, September 2006

  44. Example (2) An extensive package of services will gradually be implemented and finally include the following components: • fileservers and distributed, transparent file-systems; • backup, migration and archiving services; • batch-queueing systems, designed for efficient use of local systems, and if desired, of computational servers supplied by SARA; • public domain and specialist (commercial) software servers. All components will be accessible from the scientist's desktop. A client-server architecture will play an important role. Combining components will be a relatively easy task, enhancing efficiency in terms of man-hours needed to accomplish a given task. These pages, as well as the Metacomputer are still in a development stage …….. Grid Tutorial, Groningen, September 2006

  45. Example (3) Systems available at SARA in 1996 • CRAY YMP vector system • Parsytec CC, 56 CPUs • IBM SP2, 76 CPUs Grid Tutorial, Groningen, September 2006

  46. Example (4) • SARA news item of 16-6-1998 • Foundation laid for the meta-environment. • Since 4 May, SARA's IBM RS/6000 SP parallel supercomputer has been using the DCE/DFS environment, a file system that enables a transparent computing environment. With the new file system, the files of DCE/DFS users are accessible worldwide from other computer systems that have DCE/DFS, which lays an important foundation for the meta-environment. • Users at the VU science faculty now have uniform access to their files, regardless of whether they work on the RS/6000 SP or on a local workstation. The same applies to users of the Parsytec CC system at SARA: from both the Parsytec and the RS/6000 SP, all files are directly accessible to the user. Grid Tutorial, Groningen, September 2006

  47. Example (5) • A web interface was developed for submitting jobs to the metacomputing environment; a meta job language was also used. • Job migration between systems and MPI across two systems were also investigated. • This was the first time we heard about Globus, now one of the well-known building blocks for grid infrastructures. • The network link between systems was a problem: only a Fast Ethernet (FE) link was available, Gbit was not available, and HiPPI (800 Mbps) was not available for the Parsytec. Grid Tutorial, Groningen, September 2006

  48. NetherLight – Lightpath connections to the Netherlands, 3rd quarter 2005 (network map; labels include GLORIAD, GLORIAD-RU, @NIKHEF, GE, 622M) Grid Tutorial, Groningen, September 2006
