
The SwissGrid Initiative

The SwissGrid Initiative. Peter Kunszt Manager Swiss Grid Initiative EGEE Summer School Budapest, July, 2006. Peter Kunszt. Doctorate in Theoretical Physics from the University of Bern. Building the Science Database of the Sloan Digital Sky Survey, Johns Hopkins University Baltimore.


Presentation Transcript


  1. The SwissGrid Initiative • Peter Kunszt, Manager, Swiss Grid Initiative • EGEE Summer School, Budapest, July 2006

  2. Peter Kunszt • Doctorate in Theoretical Physics from the University of Bern • Building the Science Database of the Sloan Digital Sky Survey, Johns Hopkins University, Baltimore • EU Grid Projects, leading data management middleware development, CERN, Geneva • Manager, Swiss Grid Initiative, Swiss National Supercomputing Centre CSCS, Manno PMB, 12_07_2006, P. Kunszt

  3. CSCS

  4. Content • Swiss Grid Initiative • Swiss Involvements in Grid Projects – challenges • EGEE • Swiss Bio Grid • SEPAC • Intelligent Scheduling System ISS • Importance of Grids in General • Beyond the Hype • Strategies for Successful Grids • Importance of National Grids • Why is it necessary to have a national Grid • Advantages and Disadvantages of participating in large projects

  5. Grid Computing in Switzerland – High Level Goals • Resource Sharing: Pooling of Available Resources • Excellent national network provided by national research network provider SWITCH • Optimal usage of national resources • Pooling of available resources at research institutions • Harvesting cycles on as-yet-unused resources (e.g. classroom PCs, cluster backfill queues)

  6. Grid Computing in Switzerland – High Level Goals • Coordination: Building an Infrastructure • Agreements on the usage of the available resources • Coordinated support of the resources • Sharing of tools and middleware

  7. Grid Computing in Switzerland – High Level Goals • Collaboration: Enabling Scientific Discovery • Coordinated application usage, thematic Grids • Building a community • Establishing a joint knowledge base

  8. Swiss Grid Initiative • Taking care of coordinating and supporting national Grid projects. • Point of contact for all Grid Projects • Point of support for all Grid users and administrators • Representation of Swiss Academic Research Interests • In Europe • Globally • Towards the Industry

  9. The Swiss Grid Initiative • The Swiss Grid Initiative has been created to • Provide support and expertise for the Swiss research community • Promote connectivity and collaboration between disciplines and users, especially CS and ‘high-need’ applications • Represent the interests of the national research community towards other national and EU Grid projects • Get involved in joint multinational projects, help Swiss partners to get funding • Interact with the industry in joint projects • Continuously initiate thematic projects, including e-Science pilot studies • Research and develop middleware components to fill gaps and to improve the services to the community and with the community

  10. Swiss Grid Initiative Focus • Support the End-User • Enabling Relevant Scientific Discovery from Day 1 (no testbeds) • Consulting about Gridification – not every project is suitable for the High Throughput Paradigm • Seek new opportunities and initiate new projects

  11. Content • Swiss Grid Initiative • Swiss Involvements in Grid Projects – challenges • EGEE • Swiss Bio Grid • SEPAC • Intelligent Scheduling System ISS • Importance of Grids in General • Beyond the Hype • Strategies for Successful Grids • Importance of National Grids • Why is it necessary to have a national Grid • Advantages and Disadvantages of participating in large projects

  12. EGEE and LCG • SEE PREVIOUS PRESENTATION FOR DETAILS • European Grid Infrastructure for Enabling E-science • Teaming up with the D-Grid in the DECH Federation

  13. Tiered model [Diagram labels: CERN Tier 0; Tier 1; Tier 2; Tier 3 (physics department); sites: UK, USA, France, Japan, Italy, Germany, Taipei, CSCS; Lab a, Lab b, Lab c, Lab m; Uni a, Uni b, Uni x, Uni y; grid for a regional group; grid for a physics study group]

  14. Swiss Partners • CSCS – Swiss Supercomputing Centre: SA1, NA2, NA3, NA4 • Is an LCG Tier2 Site • Support for Region and all of EGEE • Analysis of Physics Data • Biomed, Comp. Chemistry, EO applications • Training, Education, Public Relations • SWITCH – Swiss Research & Edu Network: JRA1 • Security Middleware: Next generation of Grid Certificates by integrating Shibboleth and PKI

  15. Challenges in EGEE: View of CSCS • Nontrivial Administration • Steep learning curve to become ‘EGEE member’ • Reporting, Deliverables, etc. • Substantial Communication Overhead • Finding the right partner to communicate with • Many bodies, forums, sometimes contradictory information • It helps to be vocal – just trotting along silently will not help to improve the project • Infrastructure: substantial effort • To keep the site running • To respond to updates • Many things are not well automated • Mean Time Between Failures very low (in Grid middleware) • Complex System: Many things break in many ways

  16. Swiss Bio Grid

  17. Swiss Bio Grid Applications • Usage Patterns of different Applications • Identified three classes of applications • Short CPU jobs (Docking) • Medium CPU + data exchange (Proteomics Pipelining) • Data intensive (Mass Spectrometry MS; Systems Biology) • Strategy: Address them in sequence, find commonalities • Dengue docking project (see next slides) • swissPIT (Protein Identification Toolbox) Project starting now

  18. Orphan Diseases: Dengue

  19. What does it take to make a drug? [Pipeline diagram labels: Target ID, Target validation, Screening, Optimization, Preclinical, Clinical; phases: BIOLOGY, CHEMISTRY, DEVELOPMENT] • 12 years of development, 802 million US$ (DiMasi, J.A. et al. (2003) J Health Econ, 22, 151-185) • 1 in 10'000 NCE becomes a product (Heilman, R.D. (1995) Qual Assur 4(1) 75-9) • 'Only' 20 years of patent – 8 years to make money

  20. “In Silico” Drug Development • Bioinformatics, data mining, visualization, simulations, modeling, and many algorithms, databases

  21. Screening of compounds • Computational screening of small compounds to identify early drug candidates

  22. Dengue Docking project • Proof of concept for successful private-public partnership • Biozentrum: in silico docking • Novartis Institute for Tropical Diseases: in vitro/in vivo follow-up • Novartis: drug development at cost

  23. Dengue Docking project • COMPOUND LIBRARIES: NCI Diversity (2k), NCI DTP (200k), ZINC (2700k) • TARGET PROTEINS: 3D structure of targets: NS5 Methyltransferase, NS3 Protease, GPE Envelope Glycoprotein, NS3 Helicase • ALGORITHMS: DOCK 5.1, Autodock 3.0.5, FlexX (SCAI/BioSolvIT), GLIDE (Schrödinger) • IT INFRASTRUCTURE [Illustration: a C code fragment, truncated in the transcript, that fills an array with random numbers and writes it to a file]

  24. Dengue NS5 Methyltransferase • PDB 1R6A: Structure solved in complex with Ribavirin and AdoHCys • 2' O-methylation of viral RNA (2nd capping step of type 1 RNA cap) • Cofactor: SAM • Deletion of SAM domain aborts viral replication in Kunjin (Koonin, 1993)

  25. Current Achievements of GRID-enabled Dengue Docking • Completed Phase I SwissBioGrid • Completed large-scale parameterization test using Autodock 3.0.5: >500'000 docking runs, >38'000h CPU time • In vitro testing of predicted binders is underway at NITD • Some initial candidates already in next phase
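
The two figures on this slide give a rough sense of job granularity. The sketch below is not from the presentation; it only divides the reported CPU time by the reported number of runs. Both values are lower bounds (">"), so the result is indicative at best, and the variable names are illustrative.

```python
# Figures quoted on the slide; both are lower bounds, so the ratio is indicative only.
docking_runs = 500_000   # ">500'000 docking runs"
cpu_hours = 38_000       # ">38'000h CPU time"

# Average CPU time per docking run, in seconds and in minutes.
avg_seconds_per_run = cpu_hours * 3600 / docking_runs
avg_minutes_per_run = avg_seconds_per_run / 60

print(f"about {avg_minutes_per_run:.1f} CPU minutes per docking run")
```

An average of a few CPU minutes per run would be consistent with docking being listed under "Short CPU jobs" on slide 17.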

  26. Some challenges in grid adoption • Compute resources are busy already • Agree on dedicated compute time for grid projects • PC Desktop grids: untapped resource • Buy new clusters for your grid (not the idea) • Non-intrusiveness • Firewall exceptions • Non-intrusiveness on PC Desktop grids: application level • Application clearing: • Security issues • Numerical stability in heterogeneous environments • Data model in bioinformatics different from HEP • Applications need access to large databases or data sets

  27. Challenge: Heterogeneity • Very different resources at participating institutes • Use ‘standard’ schedulers for clusters (Sun Grid Engine, LSF, PBS) • Agree on a higher-level Grid scheduler • Provide good documentation and bindings of the Grid scheduler to the predominant cluster schedulers • Work on new bindings • Here we are already quite advanced, can make good use of results of other projects – but still a long way to go!
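
To illustrate what a "binding" from a higher-level scheduler to the cluster schedulers named on this slide might look like, here is a minimal sketch. It is not the project's middleware: the `Job` type and `build_submit_command` function are hypothetical, and the flags shown (`qsub -N`, `-l walltime=` for PBS, `-l h_rt=` for Sun Grid Engine, `bsub -J`, `-W` for LSF) should be checked against the local installation before use.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical scheduler-neutral job description."""
    name: str
    script: str            # path to an executable job script
    walltime_minutes: int


def _hms(minutes: int) -> str:
    """Format a duration in minutes as HH:MM:SS."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


def build_submit_command(scheduler: str, job: Job) -> list[str]:
    """Translate the neutral job description into a scheduler-specific command line."""
    if scheduler == "pbs":
        return ["qsub", "-N", job.name,
                "-l", f"walltime={_hms(job.walltime_minutes)}", job.script]
    if scheduler == "sge":
        return ["qsub", "-N", job.name,
                "-l", f"h_rt={_hms(job.walltime_minutes)}", job.script]
    if scheduler == "lsf":
        # LSF's -W run limit is given in minutes (or hours:minutes).
        return ["bsub", "-J", job.name,
                "-W", str(job.walltime_minutes), job.script]
    raise ValueError(f"no binding for scheduler {scheduler!r}")


job = Job(name="dock", script="run.sh", walltime_minutes=90)
for scheduler in ("pbs", "sge", "lsf"):
    print(scheduler, build_submit_command(scheduler, job))
```

Keeping the job description neutral and confining scheduler details to one function means that "work on new bindings" amounts to adding another branch rather than changing the applications.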

  28. Challenge: Numerical stability [Figure: results for PDB 3dfr on an Athlon CPU and an Itanium2 CPU, before and after the bug fix]
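
The slide's figure is not recoverable from the transcript, but the general issue can be shown with a small sketch. This is not the bug the slide refers to; it is a generic example of how the order of floating-point operations changes a result, which is one reason the same application can give different answers on different hardware or compilers. The helper names `naive_sum` and `results_agree` and the tolerances are assumptions.

```python
import math


def naive_sum(values):
    """Add values strictly left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, added in two different orders.
order_a = [1e16, 1.0, -1e16]   # the 1.0 is lost when added to 1e16
order_b = [1e16, -1e16, 1.0]   # the large terms cancel first

sum_a = naive_sum(order_a)
sum_b = naive_sum(order_b)
exact = math.fsum(order_a)     # correctly rounded sum of the inputs


def results_agree(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two results from different machines within a tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(sum_a, sum_b, exact, results_agree(sum_a, sum_b))
```

When validating an application across heterogeneous resources, comparing against a reference value within an agreed tolerance is usually more meaningful than demanding bit-identical output.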

  29. Challenge: Security • Sensitive data, data safety • Rely on standards for Authentication and Authorization • Network data channel encryption • Encryption of distributed data on storage • Distributed keys and algorithms for retrieval (n of m schemes) • Not at all addressed yet; a lot of room for improvement
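
The slide does not say which "n of m" scheme was envisaged. As one well-known example, the sketch below uses Shamir's secret sharing: a key is split into m shares so that any n of them recover it. The prime modulus, the function names and the use of a plain integer as the key are assumptions made for illustration; this is not production cryptography.

```python
import secrets

# A Mersenne prime used as the field modulus (an assumption for this sketch).
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, m: int) -> list[tuple[int, int]]:
    """Split `secret` into m shares such that any n of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    # Random polynomial of degree n-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shares = split_secret(key, n=3, m=5)
print(recover_secret(shares[:3]) == key)
```

In a storage setting the shares would be held at different sites, so that no single site can decrypt the data alone while the loss of up to m - n sites is tolerated.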

  30. Challenge: Legacy • Licensed, proprietary, legacy code • Solve the problem together with the software provider • New licensing models for distributed computing (e.g. license servers don’t scale) • Legacy support • Recompilation if possible • Emulators • Virtual machines • Virtual Machines may be the way forward for many of these applications – but not production quality yet, a lot of research to be done; also a lot of room for improvement

  31. Challenge: User Interface • Users don’t want to deal with Grid specifics • Set up a Grid Portal • Many portals exist, however almost none have a good application-specific interface for the users • Proteomics Project addresses this: dedicated proteomics pipelining portal based on existing Grid portal technologies – work started now together with the Swiss Institute of Bioinformatics and SZTAKI using P-GRADE • P-GRADE also addresses Legacy issues to some extent

  32. SEPAC • SEPAC stands for South European Partnership for Advanced Computing • SPACI consortium: University of Lecce, University of Calabria, Hewlett-Packard • CILEA • CSCS • ETHZ • UNIZH [Map labels: UniZurich, ETH, CSCS, CompuLab ETH, CILEA, UniNa, UniLe, UniCal]

  33. SEPAC Project Scope • Infrastructure and Technology oriented collaboration • Exploration of technology and interoperability • Application portfolio being built • Building on another Grid Portal: the Grid Resource Broker from the Univ. of Lecce

  34. Intelligent Scheduling System ISS • Partners: CSCS, EPFL, EIA-FR • Provide a middleware service allowing optimal placement and scheduling of applications on the Grid – submit to the most suited computer architecture based on resource and application monitoring • Research-oriented project, exploiting new ideas for a scheduling approach (2 PhDs)

  35. ISS Details • Cost function includes monitoring data on machine status and application behaviour. Usage of Γ model. See http://pleiades1.epfl.ch/~rgruber/projects/iss.pdf • Monitoring Data on machines and applications delivered by application monitoring and the service itself • Actual job submission through existing Grid middleware • First Testbed: EPFL Mechanics department machines (clusters & single CPU machines) • Second Testbed: Whole EPFL • Third Testbed: EPFL + CSCS + EIA-Fr + ETHZ machines [Diagram labels: ISS; Switch; ETHZ: SMP/NUMA, High gM cluster; EPFL: SMP/NUMA, High gM cluster; CERN: EGEE; EIF: NoW; CSCS: SMP/vector, Low gM cluster]
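
The general idea of cost-based placement can be sketched as follows. This is not the ISS cost function or its Γ model, which are described in the linked document; the `Resource` fields, the weights and all numbers are invented for illustration. The sketch only shows the pattern of scoring each candidate resource from monitoring data and submitting to the cheapest one.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical monitoring snapshot for one machine."""
    name: str
    queue_wait_h: float      # expected waiting time in the queue, in hours
    relative_speed: float    # application speed relative to a reference machine
    price_per_cpu_h: float   # accounting cost per CPU hour


def estimated_cost(res: Resource, ref_cpu_hours: float,
                   weight_time: float = 1.0, weight_money: float = 1.0) -> float:
    """Weighted sum of turnaround time and monetary cost (illustrative only)."""
    runtime_h = ref_cpu_hours / res.relative_speed
    turnaround_h = res.queue_wait_h + runtime_h
    money = runtime_h * res.price_per_cpu_h
    return weight_time * turnaround_h + weight_money * money


def pick_resource(resources: list[Resource], ref_cpu_hours: float) -> Resource:
    """Choose the resource with the lowest estimated cost."""
    return min(resources, key=lambda r: estimated_cost(r, ref_cpu_hours))


candidates = [
    Resource("cluster-a", queue_wait_h=2.0, relative_speed=1.0, price_per_cpu_h=0.10),
    Resource("cluster-b", queue_wait_h=0.5, relative_speed=0.5, price_per_cpu_h=0.05),
]
best = pick_resource(candidates, ref_cpu_hours=4.0)
print(best.name)
```

With these made-up numbers the faster machine wins despite its longer queue; changing the weights shifts the balance between turnaround time and price.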

  36. Example: Integration of ISS into VIOLA/MSS/UniCORE Environment • Team: CSCS, EPFL, EIA-Fr, FhG, JFZ

  37. And There Are More… • … Swiss Involvements in Grids • CoreGrid: EPFL, CSCS, EIA-FR • KnowARC project: University of Geneva • DILIGENT: University of Basel • EMBRACE: University of Lausanne, SIB • Computational Chemistry Grid: University of Zurich • …

  38. Content • Swiss Grid Initiative • Swiss Involvements in Grid Projects – challenges • EGEE • Swiss Bio Grid • SEPAC • Intelligent Scheduling System ISS • Importance of Grids in General • Beyond the Hype • Strategies for Successful Grids • Importance of National Grids • Why is it necessary to have a national Grid • Advantages and Disadvantages of participating in large projects

  39. Are Grids Just a Hype? • Grids respond to a Paradigm Shift in Scientific Discovery • Paradigm Shift from Individual Researchers to Collaborations • Project driven research – joint work preferred over one-man shows • Collaborations do achieve the most relevant results these days • Need for Collaborative Computing Platforms • Need for temporary Virtual Organizations to do work, share data and results and publish results • Conclusion: Grids are here to stay

  40. All Grids? • Successful Grids are measured by the success of their users • Ease of use • Ease of configuration • Non-intrusiveness at participating sites • Security • Robustness • Conclusion: Some Grids will Disappear

  41. Measure of Success • Users are producing scientific results • Harnessing increased computing capacity • Easy integration of applications – users can focus on their field instead of computing • Number of Publications • Complexity of applications • We are not here yet • New Projects WANT to use your Grid instead of building their own • If people knock on your door that they want to work with you, you know you are successful • Your Repository of Middleware is used by others • You need robust, professionally documented, re-usable software • Using Grid Service standards, interoperable • Mandatory collaboration with other Grid projects and Universities

  42. Content • Swiss Grid Initiative • Swiss Involvements in Grid Projects – challenges • EGEE • Swiss Bio Grid • SEPAC • Intelligent Scheduling System ISS • Importance of Grids in General • Beyond the Hype • Strategies for Successful Grids • Importance of National Grids • Why is it necessary to have a national Grid • Advantages and Disadvantages of participating in large projects

  43. National Grids • Scaling large multinational projects can only be done through a well-managed hierarchy • Strategy of long-term infrastructures will follow the NREN model • EU drives in this direction: building on national infrastructures • Visible results of National Financing build the basis for EU funding

  44. Participating in Large Multinational Projects: ADVANTAGES • Being part of the game, enabling national users to play on the large international playground • Access to a much larger infrastructure • Ability to voice local interests to the large community • Ability to focus on strengths, taking components from others • Building expertise in large Grids • Profiting from international funding • Visibility of the national efforts on an international scale, raising the attractiveness of the country

  45. Participating in Large Multinational Projects: DISADVANTAGES • In large multinational projects the large nations will dominate • Many technological decisions are political and not based on quality • Choice of middleware components • Assigning development tasks to competing teams • Inefficiency of large projects • Communication Overhead • Meetings, conferences, telephones, emails... • Internal arguments • Need for Compromise – Slow Decision making • Positioning inside a project very important • Expertise • Choice of partners inside the project

  46. Links • SwissGrid Initiative: http://www.swiss-grid.org/ or http://www.gridinitiative.ch/ • CSCS: http://cscs.ch/
