Grid Computing Sudhindra Rao
Outline • History of Distributed Computing • Grid – Definition, Architecture details • P2P versus Grid • Web services • Java – anywhere computing paradigm • Middleware • Grid models and recent research • Research directions • Tools and grids available • References
History • Shift from Centralized Computing to Distributed Computing – powerful processors, faster networks • Parallel computing based on MPI and PVM models • Cluster Computing • Peer-to-peer computing • Grid computing
Application and Infrastructure technology trends (figure: technologies plotted against time, across four eras: Monolithic, Open, Distributed, Virtualized)
• Applications: Serial applications • Parallel applications • Multi-threaded • MPI/PVM • OpenMP • Client Server • CORBA • COM/DCOM • .NET • J2EE • Custom distributed systems • P2P • App Integration • Reliable messaging • Reliable execution • Service virtualization • Web services • Service Registration • Service discovery • Location independent service invocation • Lifting apps off the servers
• Infrastructure: Mainframes • Open Systems • Unix • Windows • Linux • Clusters • DRM • Infrastructure Virtualization • Grid • OGSA • Data Grid • Service provisioning
• Storage: Direct attached Storage
Technology Evolution: Cluster, Grid, P2P (figure: timeline with ticks at 1960, 1970, 1975, 1980, 1985, 1990, 1995, 2000)
• Computing: Mainframes • Minicomputers • PCs • Workstations • Crays • MPPs • WS Clusters • PC Clusters • PDAs • HTC • P2P • Grids
• Networking: Sputnik • ARPANET • Email • Ethernet • TCP/IP • IETF • XEROX PARC worm • Internet Era • HTML • Mosaic • W3C • WWW Era • XML • Web Services
What is Cluster/Grid? • A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across administrative domains, depending on their availability, capability, performance, cost, and users' quality-of-service requirements. (Figure labels: three clusters forming a Grid; a single cluster.)
Approaches for Parallel Programming • Implicit Parallelism • Supported by parallel languages and parallelizing compilers that take care of identifying parallelism, scheduling calculations, and placing data. • Explicit Parallelism • The programmer is responsible for most of the parallelization effort, such as task decomposition, mapping tasks to processors, and defining the communication structure. • This approach assumes that the user is often the best judge of how parallelism can be exploited for a particular application.
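A minimal sketch of explicit parallelism in Python, showing the three programmer responsibilities named above: task decomposition, mapping tasks to workers, and the communication structure. The function names and the sum-of-squares workload are hypothetical, chosen only for illustration. Threads are used to keep the sketch self-contained; in CPython they show the structure of the approach rather than a speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, n_tasks):
    """Task decomposition: split the input into n_tasks contiguous chunks."""
    size, extra = divmod(len(data), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """The work each task performs independently on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    chunks = decompose(data, n_workers)
    # Mapping: the pool assigns one chunk to each worker thread.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Communication: partial results are gathered and combined here.
    return sum(partials)
```

With implicit parallelism, none of these three steps would appear in the program text; a parallelizing compiler or language runtime would decide them.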
Parallel Programming Models and Tools • Shared Memory Model • DSM • Threads/OpenMP (enabled for clusters) • Java threads (HKU JESSICA, IBM cJVM) • Message Passing Model • PVM • MPI • Hybrid Model • Mixing shared and distributed memory model • Using OpenMP and MPI together • Object and Service Oriented Models • Wide area distributed computing technologies • OO: CORBA, DCOM, etc. • Services: Web Services-based service composition
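The difference between the shared memory and message passing models can be sketched with Python's standard library alone. This is an illustration under stated assumptions, not PVM, MPI, or OpenMP code: threads stand in for processors, a lock-protected variable stands in for shared memory, and a queue stands in for the message channel. Both function names are hypothetical.

```python
import queue
import threading


def shared_memory_count(n_threads=4, increments=1000):
    """Shared memory model: every thread updates one variable under a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:  # synchronization is required because state is shared
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_count(n_threads=4, increments=1000):
    """Message passing model: workers keep private state and send a result."""
    mailbox = queue.Queue()

    def worker():
        local = 0  # private to this worker, so no lock is needed
        for _ in range(increments):
            local += 1
        mailbox.put(local)  # the only interaction is an explicit message

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator receives one message per worker and combines them.
    return sum(mailbox.get() for _ in range(n_threads))
```

A hybrid program would combine the two: message passing between nodes and shared memory among the threads within a node.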
Levels of Parallelism (code granularity, code item, and the tool that exploits it)
• Large grain (task level): Program (Task i-1, Task i, Task i+1); exploited with PVM/MPI
• Medium grain (control level): Function (thread), e.g. func1(), func2(), func3(); exploited with Threads
• Fine grain (data level): Loop, e.g. a(0)=.., b(0)=.., a(1)=.., b(1)=.., a(2)=.., b(2)=..; exploited by Compilers
• Very fine grain (multiple issue): individual operations (+, x, Load); exploited with hardware (CPU)
Cluster Architecture (figure: layered stack, top to bottom)
• Sequential Applications and Parallel Applications
• Parallel Programming Environment
• Cluster Middleware (Single System Image and Availability Infrastructure)
• Nodes: PC/Workstation, each with Communications Software and Network Interface Hardware
• Cluster Interconnection Network/Switch
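A Typical P2P Computing Environment (figure: peers, each running a Peer Agent, and a Peer Discovery Service)
• The application's Peer Agent asks the Peer Discovery Service: "Who can help?"
• The discovery service answers: "P2, P7 can help!"
• The agent sends a request to P2, which replies: "Sorry, I am busy."
• The agent sends the request to another peer and receives a response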
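The exchange in this figure (ask the discovery service who can help, get declined by a busy peer, move on to the next candidate) can be sketched as follows. The class and method names are hypothetical and the sketch runs in a single process; a real P2P system would do this over the network.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    busy: bool = False

    def handle(self, job):
        """Return a response string, or None when the peer declines."""
        if self.busy:
            return None  # corresponds to "Sorry, I am busy."
        return f"{self.name} completed {job}"


class DiscoveryService:
    def __init__(self):
        self._entries = []  # list of (peer, set of services it offers)

    def register(self, peer, services):
        self._entries.append((peer, set(services)))

    def who_can_help(self, service):
        """Answer "Who can help?" with peers in registration order."""
        return [peer for peer, services in self._entries if service in services]


def submit(discovery, service, job):
    """Try each candidate peer in turn until one accepts the job."""
    for peer in discovery.who_can_help(service):
        response = peer.handle(job)
        if response is not None:
            return response
    return None  # nobody could take the job
```

With a busy P2 and an idle P7 both registered for the same service, `submit` is declined by P2 and returns P7's response.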
CPM: DC Economy-based P2P Computing (Jxta-based implementation) • Figure components: Market Server • Market Repository • CPM Agent • Discovery • Membership • Trader • Job Management • Accounting • Bill • User (Consumer) • Resources (Provider)
Definition of a Grid • Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed "autonomous" resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements • Coordinated resource sharing and problem solving in dynamic, multi-institutional Virtual Organizations (VOs) • Most current distributed technologies facilitate this in a local environment • J2EE, CORBA, VPN are a few examples • Nomadic users and applications provide new avenues for providing such a service • Mechanisms required to coordinate trusted and untrusted access to resources
A Typical Grid Computing Environment (figure) • Application • Grid Resource Broker • Grid Information Service • database • Resources R1, R2, R3, R4, R5, R6, and RN
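A minimal sketch of how a resource broker might use an information service to select a resource by availability, capability, and cost, the criteria named in the Grid definition. All class names, fields, and units here are hypothetical; this is not the Globus or Gridbus API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    available: bool
    mips: int             # capability, in a hypothetical unit
    cost_per_hour: float  # price, in a hypothetical currency


class GridInformationService:
    """Directory of resources and their current state."""

    def __init__(self, resources):
        self._resources = list(resources)

    def query(self, min_mips):
        """Return available resources that meet the capability requirement."""
        return [r for r in self._resources if r.available and r.mips >= min_mips]


class ResourceBroker:
    """Selects a resource on behalf of the application."""

    def __init__(self, information_service):
        self._gis = information_service

    def select(self, min_mips, max_cost_per_hour):
        """Pick the cheapest suitable resource, or None if none qualifies."""
        candidates = [
            r for r in self._gis.query(min_mips)
            if r.cost_per_hour <= max_cost_per_hour
        ]
        return min(candidates, key=lambda r: r.cost_per_hour, default=None)
```

Because resources are autonomous, the broker only reads state through the information service and re-queries at runtime rather than assuming a fixed pool.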
Virtual Drug Design: A Virtual Lab for “Molecular Modeling for Drug Design” on P2P Grid
• Components: Resource Broker (RB) • Grid Info. Service • Grid Market Directory • Data Replica Catalogue • Grid nodes, each with a GTS (Grid Trade Server) • Protein DataBanks PDB1 and PDB2
• User request: “Screen 2K molecules in 30min. for $10”
• Queries: “Give me list PDBs sources of type aldrich_300?” • “service providers?” • “service cost?”
• The RB maps suitable Grid nodes and Protein DataBank
• Job dispatch: “get mol.10 from pdb1 & screen it.”
• Data requests: “mol.10 please?” • “mol.5 please?”
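The request “Screen 2K molecules in 30min. for $10” combines a deadline with a budget. A minimal sketch of one way a broker could plan such a request is shown below. The greedy cheapest-first strategy is an assumption made for illustration, not the documented algorithm of any broker named in these slides, and the node names, per-job times, and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    seconds_per_job: float  # time to screen one molecule (hypothetical)
    cost_per_job: float     # price to screen one molecule (hypothetical)


def plan(nodes, n_jobs, deadline_s, budget):
    """Return {node name: job count}, or None if the request cannot be met.

    Cheapest nodes are filled first, each up to the number of jobs it can
    finish before the deadline when working on its jobs one after another.
    """
    allocation = {}
    remaining = n_jobs
    total_cost = 0.0
    for node in sorted(nodes, key=lambda n: n.cost_per_job):
        if remaining == 0:
            break
        capacity = int(deadline_s // node.seconds_per_job)
        jobs = min(capacity, remaining)
        if jobs > 0:
            allocation[node.name] = jobs
            total_cost += jobs * node.cost_per_job
            remaining -= jobs
    if remaining > 0 or total_cost > budget:
        return None  # deadline or budget cannot be satisfied
    return allocation
```

When the plan comes back as `None`, the user would have to relax either the deadline or the budget.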
Scalable Seamless Computing: Breaking Administrative Barriers (figure: performance plotted against the administrative barriers crossed)
• Administrative Barriers: Individual • Group • Department • Campus • State • National • Globe • Inter Planet • Galaxy
• Platforms: Desktop • SMPs or SuperComputers • Local Cluster • Enterprise Cluster/Grid • Global Cluster/Grid • Inter Planetary Grid!
Basic Elements • Security • Data locality • Resource Allocation & Scheduling • Computational Economy • Uniform Access • System Management • Resource Discovery • Network Management • Application Development Tools
Issues in Grid computing • Protocols required for interoperability • Define standard services – for access to computation, data, resource discovery etc. • APIs and SDKs to assist such protocol and service deployment • Current Distributed Computing – Resource sharing in a single organization – limited to sharing certain resource types only • Need for services to support a common set of applications – Middleware
Projects • Globus – A toolkit for grid computing infrastructure development • Gridbus • Legion • OGSA – Standard for developing Grid application infrastructure (derived from Globus)
Grid Computing Approaches • Object-oriented • Internet/partial-P2P • Network enabled Solvers (NetSolve) • Economic-based Utility / Service-Oriented Computing (Nimrod-G) • mix-and-match
Some Global Initiatives
• Australia: Nimrod-G, Gridbus, GridSim, Virtual Lab, DISCWorld, GrangeNet, etc.
• Europe: UK eScience, EU Data Grid, Cactus, XtremeWeb, etc.
• India: I-Grid
• Japan: Ninf, DataFarm
• Korea: N*Grid
• Singapore: NGP
• USA: AppLeS, Globus, Legion, Sun Grid Engine, NASA IPG, Condor-G, Jxta, NetSolve, AccessGrid, and many more
• Cycle Stealing & .com Initiatives: Distributed.net, SETI@Home, …, Entropia, UD, SCS, …
• Public Forums: Global Grid Forum, Australian Grid Forum, IEEE TFCC, CCGrid conference, P2P conference
Globus Approach • A toolkit and collection of services addressing key technical problems • Modular “bag of services” model • Not a vertically integrated solution • General infrastructure tools (aka middleware) that can be applied to many application domains • Inter-domain issues, rather than clustering • Integration of intra-domain solutions • Distinguish between local and global services
Grid computing – SuperScalar model (IBM Grid) • Ease the programming of GRID applications • Basic idea: apply the superscalar processor model at Grid scale, where the unit of work grows from ns to seconds/minutes/hours
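A minimal sketch of the superscalar idea, assuming it means here what it means in processors: tasks are written in sequential program order, and any tasks whose inputs are already available may run concurrently. The function below is hypothetical and is not the GRID superscalar runtime. It tracks only read-after-write dependencies and assumes each data item is written by a single task.

```python
def schedule_waves(tasks):
    """Group tasks into waves that can run concurrently.

    tasks: list of (name, inputs, outputs) tuples in program order.
    Returns a list of waves; every task in a wave depends only on tasks
    in earlier waves.
    """
    producer = {}  # data item -> index of the task that writes it
    deps = []      # deps[i] = indices of tasks that task i must wait for
    for i, (_, inputs, outputs) in enumerate(tasks):
        deps.append({producer[d] for d in inputs if d in producer})
        for d in outputs:
            producer[d] = i

    # A task's level is one more than the highest level among its
    # dependencies; tasks with no dependencies sit at level 0.
    level = []
    for d in deps:
        level.append(1 + max((level[j] for j in d), default=-1))

    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for i, (name, _, _) in enumerate(tasks):
        waves[level[i]].append(name)
    return waves
```

For example, two simulations that read the same input can share the first wave, and a merge step that reads both of their outputs lands in the second.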
Automatic code generation • Input: app.idl, processed by gsstubgen • Client side: app-stubs.c, app.h, app.c • Server side: app-worker.c, app-functions.c
Automatic code generation • Client: app.c, app-stubs.c, GRID superscalar runtime, GT2 • Each server i: app-worker.c, app-functions.c
Production Grids & Testbeds • NASA’s Information Power Grid • The Alliance National Technology Grid • GUSTO Testbed
Testbed Statistics (Browse the Testbed)
• Grid Nodes: 218 distributed across 62 sites in 21 countries (laptops, desktop PCs, WS, SMPs, clusters, supercomputers)
• Total CPUs: 3000+ (~3 TeraFlops)
• CPU Architecture: Intel x86, IA64, AMD, PowerPC, Alpha, MIPS
• Operating Systems: Windows or Unix-variants – Linux, Solaris, AIX, OSF, Irix, HP-UX
• Intranode Network: Ethernet, Fast Ethernet, Gigabit, Myrinet, QsNet, PARAMNet
• Internet/Wide Area Networks: GrangeNet, AARNet, ERNet, APAN, TransPAC, & so on.
Grid Technologies and Applications (figure: layered stack, top to bottom)
• Grid Apps.: High Energy Physics, Brain Activity Analysis, Natural Language Engineering, Molecular Docking, Portfolio Analysis, GAMESS Chemistry, …
• User-Level Middleware (Grid Tools), high-level services and tools: Gridscape, G-Monitor, Programming Framework; Grid Brokers & Schedulers: Nimrod-G, Gridbus Data Broker
• Core Grid Middleware: Globus (MDS, GRAM, GASS), Alchemi (.NET Grid Services + Clustering of desktop PCs), Data Management Services, GridBank, GMD, PKI-based Grid Security Interface (GSI)
• Grid Fabric: .NET, JVM, Condor, PBS, SGE, LSF, Tomcat; Windows, Linux, AIX, IRIX, OSF1, HP UX, Solaris
Classes of Applications that can be powered by Grids • Distributed HPC (Supercomputing): • Computational science. • High-Capacity/Throughput Computing: • Large scale simulation/chip design & parameter studies. • Content Sharing (free or paid) • Sharing digital contents among peers (e.g., Napster) • Remote software access/renting services: • Application service providers (ASPs) & Web services. • Data-intensive computing: • Drug Design, Particle Physics, Stock Prediction... • On-demand, realtime computing: • Medical instrumentation & Mission Critical. • Collaborative Computing: • Collaborative design, Data exploration, education. • Service Oriented Computing (SOC): • Towards economic-based Utility Computing: New paradigm, new applications, new industries, and new business.
What is Grid computing? • Grid is the next-generation internet • Grid requires a distributed operating system • Grid requires new programming models • Grid does not need high performance computers
Research directions • Publisher/Subscriber systems on the Grid – How can the grid be used to manage such applications, and what are the issues? • What levels of selectivity and regionalism are expected from VOs? • How to handle the dynamics of the topology and nodes? • Addressing QoS on Grid – best effort? • Efficient Discovery and Retrieval • Replication techniques
References • List of available resources on grid computing: http://www.gridcomputing.com • Foster, I., Kesselman, C., and Tuecke, S., “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Intl J. Supercomputer Applications, 2001 • Casanova, H., “Distributed Computing Research Issues in Grid Computing”, ACM SIGACT News, Distributed Computing Column 8, July 2002 • Lau, F., Ho, R., and Wang, C., “Grid Computing: Challenges and Design Approaches” • Foster, I., and Kesselman, C. (eds.), “The Grid: Blueprint for a New Computing Infrastructure”, Elsevier, 2004