The SwissGrid Initiative
Peter Kunszt
Manager, Swiss Grid Initiative
EGEE Summer School, Budapest, July 2006
Peter Kunszt
• Doctorate in Theoretical Physics from the University of Bern
• Building the Science Database of the Sloan Digital Sky Survey, Johns Hopkins University, Baltimore
• EU Grid Projects, leading data management middleware development, CERN, Geneva
• Manager Swiss Grid Initiative, Swiss National Supercomputing Centre CSCS, Manno
PMB, 12_07_2006, P. Kunszt
Content
• Swiss Grid Initiative
• Swiss Involvements in Grid Projects – challenges
  • EGEE
  • Swiss Bio Grid
  • SEPAC
  • Intelligent Scheduling System ISS
• Importance of Grids in General
  • Beyond the Hype
  • Strategies for Successful Grids
• Importance of National Grids
  • Why is it necessary to have a national Grid
  • Advantages and Disadvantages of participating in large projects
Grid Computing in Switzerland – High Level Goals
• Resource Sharing: Pooling of Available Resources
  • Excellent national network provided by the national research network provider SWITCH
  • Optimal usage of national resources
  • Pooling of available resources at research institutions
  • Harvesting cycles on as yet unused resources (e.g. classroom PCs, cluster backfill queues)
Grid Computing in Switzerland – High Level Goals
• Coordination: Building an Infrastructure
  • Agreements on the usage of the available resources
  • Coordinated support of the resources
  • Sharing of tools and middleware
Grid Computing in Switzerland – High Level Goals
• Collaboration: Enabling Scientific Discovery
  • Coordinated application usage, thematic Grids
  • Building a community
  • Establishing a joint knowledge base
Swiss Grid Initiative
• Taking care of coordinating and supporting national Grid projects
• Point of contact for all Grid projects
• Point of support for all Grid users and administrators
• Representation of Swiss academic research interests
  • In Europe
  • Globally
  • Towards industry
The Swiss Grid Initiative
• The Swiss Grid Initiative has been created to
  • Provide support and expertise for the Swiss research community
  • Promote connectivity and collaboration between disciplines and users, especially CS and ‘high-need’ applications
  • Represent the interests of the national research community towards other national and EU Grid projects
  • Get involved in joint multinational projects, help Swiss partners to get funding
  • Interact with industry in joint projects
  • Continuously initiate thematic projects, including e-Science pilot studies
  • Research and develop middleware components to fill gaps and to improve the services to the community and with the community
Swiss Grid Initiative Focus
• Support the End-User
• Enabling Relevant Scientific Discovery from Day 1 (no testbeds)
• Consulting about Gridification – not every project is suitable for the High Throughput Paradigm
• Seek new opportunities and initiate new projects
EGEE and LCG
• See previous presentation for details
• European Grid Infrastructure for Enabling E-science
• Teaming up with the D-Grid in the DECH Federation
Tiered model
[Diagram of the tiered LCG model. Labels on the diagram: CERN Tier 0; Tier 1 (UK, USA, France, Italy, Germany, Japan, Taipei, CERN Tier 1); Tier 2 (CSCS, Lab a, Lab b, Lab c, Lab m, Uni a, Uni b, Uni x, Uni y); Tier 3 (physics department); grid for a regional group; grid for a physics study group]
Swiss Partners
CSCS – Swiss Supercomputing Centre
• SA1, NA2, NA3, NA4
• Is an LCG Tier2 site
• Support for the region and all of EGEE
• Analysis of physics data
• Biomed, Comp. Chemistry, EO applications
• Training, education, public relations
SWITCH – Swiss Research & Education Network
• JRA1 Security Middleware: next generation of Grid certificates by integrating Shibboleth and PKI
Challenges in EGEE: View of CSCS
• Nontrivial Administration
  • Steep learning curve to become ‘EGEE member’
  • Reporting, deliverables, etc.
• Substantial Communication Overhead
  • Finding the right partner to communicate with
  • Many bodies, forums, sometimes contradictory information
  • It helps to be vocal – just trotting along silently will not help to improve the project
• Infrastructure: substantial effort
  • To keep the site running
  • To respond to updates
  • Many things are not well automated
  • Mean Time Between Failure very low (in Grid middleware)
Complex System: Many things break in many ways
Swiss Bio Grid
Swiss Bio Grid Applications
• Usage patterns of different applications
• Identified three classes of applications
  • Short CPU jobs (Docking)
  • Medium CPU + data exchange (Proteomics Pipelining)
  • Data intensive (Mass Spectrometry MS; Systems Biology)
• Strategy: Address them in sequence, find commonalities
  • Dengue docking project (see next slides)
  • swissPIT (Protein Identification Toolbox) project starting now
Orphan Diseases: Dengue
What does it take to make a drug?
Pipeline stages, spanning BIOLOGY, CHEMISTRY and DEVELOPMENT: Target ID → Target validation → Screening → Optimization → Preclinical → Clinical
• 12 years of development, 802 million US$ (DiMasi, J.A. et al. (2003) J Health Econ, 22, 151-185)
• 1 in 10‘000 NCEs becomes a product (Heilman, R.D. (1995) Qual Assur 4(1) 75-9)
• ‘Only’ 20 years of patent protection – 8 years to make money
“In Silico” Drug Development
Bioinformatics, data mining, visualization, simulations, modeling, and many algorithms, databases
Screening of compounds
Computational screening of small compounds to identify early drug candidates
Dengue Docking project
• Proof of concept for a successful private-public partnership
  • Biozentrum: in silico docking
  • Novartis Institute for Tropical Diseases: in vitro/in vivo follow-up
  • Novartis: drug development at cost
Dengue Docking project
COMPOUND LIBRARIES
• NCI Diversity (2k)
• NCI DTP (200k)
• ZINC (2700k)
TARGET PROTEINS: 3D structure of targets
• NS5 Methyltransferase
• NS3 Protease
• GPE Envelope Glycoprotein
• NS3 Helicase
IT INFRASTRUCTURE
ALGORITHMS
• DOCK 5.1
• Autodock 3.0.5
• FlexX (SCAI/BioSolveIT)
• GLIDE (Schrödinger)
[Slide graphic: truncated fragment of C/C++ source code, shown as decoration]
Dengue NS5 Methyltransferase
• PDB 1R6A: structure solved in complex with Ribavirin and AdoHCys
• 2' O-methylation of viral RNA (2nd capping step of type 1 RNA cap)
• Cofactor: SAM
• Deletion of SAM domain aborts viral replication in Kunjin (Koonin, 1993)
Current Achievements of GRID-enabled Dengue Docking
• Completed Phase I SwissBioGrid
• Completed large-scale parameterization test using Autodock 3.0.5: >500‘000 docking runs, >38‘000 h CPU time
• In vitro testing of predicted binders is underway at NITD
• Some initial candidates already in the next phase
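As a rough sanity check on the parameterization test, the reported totals imply an average cost per docking run. The sketch below treats the two lower bounds from the slide (500,000 runs and 38,000 CPU hours) as if they were exact values, which is an assumption; the variable names are illustrative only.

```python
# Back-of-the-envelope average, treating the slide's lower bounds as exact values.
docking_runs = 500_000   # ">500'000 docking runs"
cpu_hours = 38_000       # ">38'000 h CPU time"

minutes_per_run = cpu_hours * 60 / docking_runs
cpu_years = cpu_hours / (24 * 365)

print(f"about {minutes_per_run:.1f} CPU minutes per docking run")
print(f"about {cpu_years:.1f} CPU years in total")
```

Under these assumptions a single docking run is a short job of a few CPU minutes, which fits the "short CPU jobs" class identified for docking.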
Some challenges in grid adoption
• Compute resources are busy already
  • Agree on dedicated compute time for grid projects
  • PC Desktop grids: untapped resource
  • Buy new clusters for your grid (not the idea)
• Non-intrusiveness
  • Firewall exceptions
  • Non-intrusiveness on PC Desktop grids: application level
• Application clearing
  • Security issues
  • Numerical stability in heterogeneous environments
• Data model in bioinformatics different from HEP
  • Applications need access to large databases or data sets
Challenge: Heterogeneity
• Very different resources at participating institutes
  • Use ‘standard’ schedulers for clusters (Sun Grid Engine, LSF, PBS)
  • Agree on a higher-level Grid scheduler
  • Provide good documentation and bindings of the Grid scheduler to the predominant cluster schedulers
  • Work on new bindings
• Here we are already quite advanced, can make good use of results of other projects – but still a long way to go!
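The idea of "bindings" from a higher-level Grid scheduler to the local cluster schedulers can be sketched as a small translation layer. This is a minimal illustration, not the middleware used in the project: the `Job` class and `submit_command` function are hypothetical names, and only the commonly documented name and run-time-limit options of `qsub` (PBS and Sun Grid Engine) and `bsub` (LSF) are used. Local installations may require other options.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Scheduler-neutral job description (hypothetical, for illustration)."""
    name: str
    script: str
    walltime_minutes: int


def _hms(minutes: int) -> str:
    """Format a duration in minutes as HH:MM:SS."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


def submit_command(job: Job, scheduler: str) -> list[str]:
    """Translate a Job into the submit command line of one local scheduler."""
    if scheduler == "pbs":
        limit = f"walltime={_hms(job.walltime_minutes)}"
        return ["qsub", "-N", job.name, "-l", limit, job.script]
    if scheduler == "sge":
        limit = f"h_rt={_hms(job.walltime_minutes)}"
        return ["qsub", "-N", job.name, "-l", limit, job.script]
    if scheduler == "lsf":
        # bsub accepts the run limit in minutes via -W
        return ["bsub", "-J", job.name, "-W", str(job.walltime_minutes), job.script]
    raise ValueError(f"no binding for scheduler {scheduler!r}")


job = Job(name="dock", script="run_docking.sh", walltime_minutes=90)
for scheduler in ("pbs", "sge", "lsf"):
    print(scheduler, " ".join(submit_command(job, scheduler)))
```

The function only builds the command line and does not execute it, so each binding can be checked without access to a cluster. Adding a new binding means adding one branch, which is the kind of incremental work the slide refers to.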
Challenge: Numerical stability
[Figure: results for PDB 3dfr compared on an Athlon CPU and an Itanium2 CPU, before and after a bug fix]
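One common source of platform-dependent results, though not necessarily the one behind the example on this slide, is that floating-point addition is not associative. A different compiler or CPU may evaluate the same expression in a different order and change the last bits of the result. The sketch below shows the effect and why results from heterogeneous machines are better compared with a tolerance than bit for bit; the tolerance value is an arbitrary assumption.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# The same sum, evaluated in two different orders.
left_to_right = (a + b) + c
right_to_left = a + (b + c)

# Bitwise comparison is fragile; a relative tolerance is more robust.
bitwise_equal = left_to_right == right_to_left
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(left_to_right, right_to_left)
print("bitwise equal:", bitwise_equal)
print("equal within tolerance:", close_enough)
```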
Challenge: Security
• Sensitive data, data safety
  • Rely on standards for Authentication and Authorization
  • Network data channel encryption
  • Encryption of distributed data on storage
  • Distributed keys and algorithms for retrieval (n of m schemes)
• Not at all addressed yet; a lot of room for improvement
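An "n of m" scheme splits a key into m shares so that any n of them recover it, while fewer than n reveal nothing. The slide does not name a particular algorithm; the sketch below uses Shamir's secret sharing over a prime field as one well-known instance. The function names and the choice of prime are assumptions made for illustration, and this is not production-grade key management.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1 (must exceed the secret).
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, m: int) -> list[tuple[int, int]]:
    """Split `secret` into m shares; any n of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    # Random polynomial of degree n - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, n=3, m=5)
print(recover_secret(shares[:3]) == 123456789)
```

In a Grid setting the shares would be held by different sites, so that no single storage element can decrypt the data on its own.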
Challenge: Legacy
• Licensed, proprietary, legacy code
  • Solve the problem together with the software provider
  • New licensing models for distributed computing (e.g. license servers don’t scale)
• Legacy support
  • Recompilation if possible
  • Emulators
  • Virtual machines
• Virtual machines may be the way forward for many of these applications – but not production quality yet, a lot of research to be done; also a lot of room for improvement
Challenge: User Interface
• Users don’t want to deal with Grid specifics
  • Set up a Grid Portal
  • Many portals exist, however almost none have a good application-specific interface for the users
• Proteomics Project addresses this: dedicated proteomics pipelining portal based on existing Grid portal technologies – work started now together with the Swiss Institute of Bioinformatics and SZTAKI using P-GRADE
• P-GRADE also addresses Legacy issues to some extent
SEPAC
SEPAC stands for South European Partnership for Advanced Computing
• SPACI consortium
  • University of Lecce
  • University of Calabria
  • Hewlett-Packard
• CILEA
• CSCS
• ETHZ
• UNIZH
[Diagram labels: UniZurich, ETH, CSCS, CompuLab, ETH, CILEA, UniNa, UniLe, UniCal]
SEPAC Project Scope
• Infrastructure and technology oriented collaboration
• Exploration of technology and interoperability
• Application portfolio being built
• Building on another Grid Portal: the Grid Resource Broker from the Univ. of Lecce
Intelligent Scheduling System ISS
• Partners: CSCS, EPFL, EIA-FR
• Provide a middleware service allowing optimal placement and scheduling of applications on the Grid – submit to the most suited computer architecture based on resource and application monitoring
• Research-oriented project, exploiting new ideas for a scheduling approach (2 PhDs)
ISS Details
Sites in the diagram:
• ETHZ: SMP/NUMA, high gM cluster
• EPFL: SMP/NUMA, high gM cluster
• CERN: EGEE
• EIF: NoW
• CSCS: SMP/vector, low gM cluster
[Diagram labels: ISS, Switch]
• Cost function includes monitoring data on machine status and application behaviour. Usage of Γ model. See http://pleiades1.epfl.ch/~rgruber/projects/iss.pdf
• Monitoring data on machines and applications delivered by application monitoring and the service itself
• Actual job submission through existing Grid middleware
Testbeds:
• First testbed: EPFL Mechanics department machines (clusters & single CPU machines)
• Second testbed: whole EPFL
• Third testbed: EPFL + CSCS + EIA-Fr + ETHZ machines
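The principle of cost-based placement can be sketched as follows. This is not the ISS cost function or the Γ model, whose details are in the linked document; it is a deliberately simple stand-in that combines one item of machine monitoring (current queue wait) with one item of application monitoring (how fast the application runs on each architecture). All names, fields and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Monitoring snapshot of one resource (hypothetical fields)."""
    name: str
    queue_wait_hours: float  # machine status: expected wait before start
    relative_speed: float    # application behaviour: speed vs. a reference machine


def estimated_cost(machine: Machine, reference_runtime_hours: float) -> float:
    """Estimated turnaround time in hours: wait time plus scaled run time."""
    return machine.queue_wait_hours + reference_runtime_hours / machine.relative_speed


def best_machine(machines: list[Machine], reference_runtime_hours: float) -> Machine:
    """Pick the machine with the lowest estimated cost for this application."""
    return min(machines, key=lambda m: estimated_cost(m, reference_runtime_hours))


machines = [
    Machine("cluster", queue_wait_hours=1.0, relative_speed=1.0),
    Machine("vector", queue_wait_hours=6.0, relative_speed=4.0),
]
print(best_machine(machines, reference_runtime_hours=2.0).name)
print(best_machine(machines, reference_runtime_hours=20.0).name)
```

With these made-up numbers a short job goes to the lightly loaded cluster, while a long job is worth the longer wait for the faster machine. The actual submission would then go through existing Grid middleware, as the slide states.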
Example: Integration of ISS into VIOLA/MSS/UniCORE Environment
Team: CSCS, EPFL, EIA-Fr, FhG, JFZ
And There Are More…
• … Swiss Involvements in Grids
  • CoreGrid: EPFL, CSCS, EIA-FR
  • KnowARC project: University of Geneva
  • DILIGENT: University of Basel
  • EMBRACE: University of Lausanne, SIB
  • Computational Chemistry Grid: University of Zurich
  • …
Are Grids Just a Hype?
• Grids respond to a Paradigm Shift in Scientific Discovery
• Paradigm Shift from Individual Researchers to Collaborations
  • Project driven research – joint work preferred over one-man shows
  • Collaborations do achieve the most relevant results these days
• Need for Collaborative Computing Platforms
  • Need for temporary Virtual Organizations to do work, share data and results, and publish results
Grids are here to stay
All Grids?
• Successful Grids are measured by the success of their users
  • Ease of use
  • Ease of configuration
  • Non-intrusiveness at participating sites
  • Security
  • Robustness
Some Grids will disappear
Measure of Success
• Users are producing scientific results
  • Harnessing increased computing capacity
  • Easy integration of applications – users can focus on their field instead of computing
  • Number of publications
  • Complexity of applications
We are not here yet
• New projects WANT to use your Grid instead of building their own
  • If people knock on your door that they want to work with you, you know you are successful
• Your repository of middleware is used by others
  • You need robust, professionally documented, re-usable software
  • Using Grid Service standards, interoperable
  • Mandatory collaboration with other Grid projects and universities
National Grids
• Scaling large multinational projects can only be done through a well-managed hierarchy
• Strategy of long-term infrastructures will follow the NREN model
• EU drives in this direction: building on national infrastructures
• Visible results of national financing build the basis for EU funding
Participating in Large Multinational Projects: ADVANTAGES
• Being part of the game, enabling national users to play on the large international playground
• Access to a much larger infrastructure
• Ability to voice local interests to the large community
• Ability to focus on strengths, taking components from others
• Building expertise in large Grids
• Profiting from international funding
• Visibility of the national efforts on an international scale, raising the attractiveness of the country
Participating in Large Multinational Projects: DISADVANTAGES
• In large multinational projects the large nations will dominate
• Many technological decisions are political and not based on quality
  • Choice of middleware components
  • Assigning development tasks to competing teams
• Inefficiency of large projects
  • Communication overhead
  • Meetings, conferences, phone calls, emails...
  • Internal arguments
  • Need for compromise – slow decision making
• Positioning inside a project very important
  • Expertise
  • Choice of partners inside the project
Links
• SwissGrid Initiative:
  • http://www.swiss-grid.org/ or
  • http://www.gridinitiative.ch/
• CSCS:
  • http://cscs.ch/