Grid Computing 1 Grid Book, Chapters 1, 2, 3, 22 “Implementing Distributed Synthetic Forces Simulations in Metacomputing Environments” Brunett, Davis, Gottschalk, Messina, Kesselman http://www.globus.org CSE 160/Berman
Outline • What is Grid computing? • Grid computing applications • Grid computing history • Issues in Grid Computing • Condor, Globus, Legion • The next step CSE 160/Berman
What is Grid Computing? • A Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications • A Computational Grid is also called a “metacomputer”
Computational Grids • Term computational grid comes from an analogy with the electric power grid: • Electric power is ubiquitous • Don’t need to know the source (transformer, generator) of the power or the power company that serves it • Analogy falls down in the area of performance • Ever-present search for cycles in HPC. Two foci of research • “In the box” parallel computers -- PetaFLOPS architectures • Increasing development of infrastructure and middleware to leverage the performance potential of distributed Computational Grids CSE 160/Berman
Grid Applications • Distributed Supercomputing • Distributed Supercomputing applications couple multiple computational resources – supercomputers and/or workstations • Examples include: • SFExpress (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation) • Climate Modeling (high resolution, long time scales, complex models) CSE 160/Berman
Distributed Supercomputing Example – SF Express • SF Express = (Synthetic Forces Express) large scale distributed simulation of behavior and movement of entities (tanks, trucks, airplanes, etc.) for interactive battle simulation. • Entities require information about • State of terrain • Location and state of other entities • Info updated several times a second • Interest management allows entities to only look at relevant information, enabling scalability CSE 160/Berman
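The interest management idea on this slide can be illustrated with a minimal sketch. It assumes a simple distance-based notion of "interest"; the `Entity` class, the `interest_radius` field, and the coordinates are hypothetical and are not taken from SF Express itself.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Entity:
    name: str
    x: float                # position (hypothetical units, e.g. km)
    y: float
    interest_radius: float  # assumed model: entity only cares about nearby entities


def relevant_updates(receiver, entities):
    """Return the entities whose state updates the receiver needs to see."""
    return [
        e for e in entities
        if e is not receiver
        and hypot(e.x - receiver.x, e.y - receiver.y) <= receiver.interest_radius
    ]


tank = Entity("tank-1", 0.0, 0.0, 5.0)
truck = Entity("truck-7", 3.0, 4.0, 2.0)      # 5.0 away from the tank
plane = Entity("plane-2", 40.0, 30.0, 50.0)   # 50.0 away from the tank
world = [tank, truck, plane]

for receiver in world:
    names = [e.name for e in relevant_updates(receiver, world)]
    print(receiver.name, "receives updates from", names)
```

Under this assumed model the tank receives updates only from the truck, so traffic per entity depends on local density rather than on the total entity count, which is the scalability argument the slide makes.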
SF Express • Large scale SF Express run goals • Simulation of 50,000 entities in 8/97, 100,000 entities in 3/98 • Increase fidelity and resolution of simulation over previous runs • Improve • Refresh rate • Training environment responsiveness • Number of automatic behaviors • Ultimately use simulation for real-time planning as well as training • Large scale runs extremely resource-intensive
SF Express Programming Issues • How should entities be mapped to computational resources? • Entities receive information based on “interests” • Communication reduced and localized based on “interest management” • Consistency model for entity information must be developed • Which entities can/should be replicated? • How should updates be performed? CSE 160/Berman
SF Express Distributed Application Architecture • D = data server, I = interest management, R = router, S = simulation node [Figure: architecture diagram of router (R), interest management (I), simulation (S), and data server (D) nodes]
50,000 entity SF Express Run • 2 large-scale simulations run on August 11, 1997 CSE 160/Berman
50,000 entity SF Express Run • Simulation decomposed terrain (Saudi Arabia, Kuwait, Iraq) contiguously among supercomputers • Each supercomputer simulated a specific area and exchanged interest and state information with other supercomputers • All data exchanges were flow-controlled • Supercomputers fully interconnected, dedicated for experiment • Success depended on “moderate to significant system administration, interventions, competent system support personnel, and numerous phone calls.” • Subsequent Globus runs focused on improving data, control management and operational issues for wide area CSE 160/Berman
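A contiguous terrain decomposition of the kind described above might look like the following sketch. It assumes the terrain is cut into equal-width east-west strips, one per site; the function name, the strip layout, and the site names are hypothetical, since the slides do not say how the actual run partitioned the terrain.

```python
def owner_site(x, terrain_width, sites):
    """Map an east-west coordinate to the site simulating that strip of terrain."""
    if not 0 <= x < terrain_width:
        raise ValueError("x lies outside the terrain")
    strip_width = terrain_width / len(sites)
    # min() guards against floating-point rounding at the far edge
    return sites[min(int(x / strip_width), len(sites) - 1)]


sites = ["site-A", "site-B", "site-C"]  # hypothetical supercomputer sites
for x in (0.0, 450.0, 899.9):
    print(x, "->", owner_site(x, 900.0, sites))
```

Because neighbouring strips belong to different sites, only entities near a strip boundary generate cross-site interest and state traffic.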
High-Throughput Applications • Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work • High-throughput applications include RSA key cracking, SETI@home (detection of extra-terrestrial intelligence), MCell
High-Throughput Applications • Biggest master/slave parallel program in the world with master = website, slaves = individual computers CSE 160/Berman
High-Throughput Example - MCell • MCell – Monte Carlo simulation of cellular microphysiology. Simulation implemented as large-scale parameter sweep. CSE 160/Berman
MCell • MCell architecture: simulations performed by independent processors with distinct parameter sets and shared input files CSE 160/Berman
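A parameter sweep with independent tasks can be sketched as follows. This is a toy stand-in, not MCell's actual model or interface: `simulate`, `sweep`, and the parameters `release_count` and `bind_prob` are hypothetical names, and a local thread pool plays the role of the independent processors.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Toy Monte Carlo run: fraction of released particles that bind."""
    release_count, bind_prob, seed = params
    rng = random.Random(seed)  # per-task RNG keeps runs independent and repeatable
    bound = sum(rng.random() < bind_prob for _ in range(release_count))
    return params, bound / release_count


def sweep(release_counts, bind_probs, seed=0):
    """Run one independent simulation per parameter combination."""
    tasks = [
        (n, p, seed + i)
        for i, (n, p) in enumerate(product(release_counts, bind_probs))
    ]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(simulate, tasks))


results = sweep([1000, 5000], [0.1, 0.5])
for (n, p, _seed), fraction in results.items():
    print(f"release_count={n} bind_prob={p} bound_fraction={fraction:.3f}")
```

Each task touches only its own parameter set, so tasks can be placed on any available resource in any order, which is what makes this class of application a good fit for the Grid.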
MCell Programming Issues • How should we assign tasks to processors to optimize locality? • How can we use partial results during execution to steer the computation? • How do we mine all the resulting data from experiments for results? • During execution • After execution • How can we use all available resources?
Data-Intensive Applications • Focus is on synthesizing new information from large amounts of physically distributed data • Examples include NILE (distributed system for high energy physics experiments using data from CLEO), SAR/SRB applications (Grid version of MS Terraserver), digital library applications CSE 160/Berman
Data-Intensive Example - SARA • SARA = Synthetic Aperture Radar Atlas • Application developed at JPL and SDSC • Goal: Assemble/process files for user’s desired image • Radar data organized into tracks • User selects track of interest and properties to be highlighted • Raw data is filtered and converted to an image format • Image displayed in web browser
SARA Application Architecture • Application structure focused around optimizing the delivery and processing of distributed data • Computation servers and data servers are logical entities, not necessarily different nodes [Figure: client connected to compute servers and data servers]
SARA Programming Issues • Which data server should replicated data be accessed from? • Should computation be done at the data server or data moved to a compute server or something in between? • How big are the data files and how often will they be accessed? [Figure labels: OGI, UTK, UCSD; AppLeS/NWS]
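The first question, which replica to read from, can be sketched as a simple cost comparison. The model below assumes transfer time is latency plus file size divided by forecast bandwidth. The server names come from the slide, but the bandwidth and latency figures are invented for illustration, and the functions are hypothetical rather than part of AppLeS or the Network Weather Service.

```python
def transfer_seconds(size_mb, bandwidth_mbps, latency_s):
    """Estimated time to move size_mb megabytes over a link (assumed linear model)."""
    return latency_s + (size_mb * 8) / bandwidth_mbps


def pick_server(size_mb, forecasts):
    """Choose the replica with the smallest estimated transfer time.

    forecasts maps server name -> (bandwidth in Mb/s, latency in seconds).
    """
    return min(forecasts, key=lambda s: transfer_seconds(size_mb, *forecasts[s]))


# Invented forecast values, for illustration only
forecasts = {"OGI": (10.0, 0.05), "UTK": (40.0, 0.08), "UCSD": (25.0, 0.01)}

print(pick_server(100, forecasts))   # large file: bandwidth dominates
print(pick_server(0.01, forecasts))  # tiny file: latency dominates
```

With these made-up numbers the best server changes with file size, which is why the slide asks how big the files are and how often they are accessed.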
TeleImmersion • Focus is on use of immersive virtual reality systems over a network • Combines generators, data sets and simulations remote from user’s display environment • Often used to support collaboration • Examples include • Interactive scientific visualization (“being there with the data”), industrial design, art and entertainment CSE 160/Berman
Teleimmersion Example – Combustion System Modeling • A shared collaborative space • Link people at multiple locations • Share and steer scientific simulations on supercomputer • Combustion code developed by Lori Freitag at ANL • Boiler application used to troubleshoot and design better products [Figure labels: Chicago, San Diego]
Early Experiences with Grid Computing • Gigabit Testbeds Program • Late 80’s, early 90’s, gigabit testbed program was developed as joint NSF, DARPA, CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative • Goals were to • investigate potential architecture for a gigabit/sec network testbed • explore usefulness for end-users
Gigabit Testbeds – Early 90’s • 6 testbeds formed: • CASA (southwest) • MAGIC (midwest) • BLANCA (midwest) • AURORA (northeast) • NECTAR (northeast) • VISTANET (southeast) • Each had a unique blend of applications research and networking and computer science research
Gigabit Testbeds CSE 160/Berman
I-Way • First large-scale “modern” Grid experiment • Put together for SC’95 (the “Supercomputing” Conference) • I-Way consisted of a Grid of 17 sites connected by vBNS • Over 60 applications ran on the I-WAY during SC’95 CSE 160/Berman
I-Way “Architecture” • Each I-WAY site served by an I-POP (I-WAY Point of Presence) used for • authentication of distributed applications • distribution of associated libraries and other software • monitoring the connectivity of the I-WAY virtual network • Users could use single authentication and job submission across multiple sites or they could work directly with end-users • Scheduling done with a “human-in-the-loop” CSE 160/Berman
I-Soft – Software for I-Way • Kerberos based authentication • I-POP initiated rsh to local resources • AFS for distribution of software and state • Central scheduler • Dedicated I-WAY nodes on resource • Interface to local scheduler • Nexus based communication libraries • MPI, CaveComm, CC++ • In many ways, I-Way experience formed foundation of Globus CSE 160/Berman
SPRINT I-Way Application: Cloud Detection • Cloud detection from multimodal satellite data • Want to determine if satellite image is clear, partially cloudy or completely cloudy • Used remote supercomputer to enhance instruments with • Real-time response • Enhanced function, accuracy (of pixel image) • Developed by C. Lee, Aerospace Corporation, Kesselman, Caltech et al. CSE 160/Berman
PACIs • 2 NSF Supercomputer Centers (PACIs) – SDSC/NPACI and NCSA/Alliance, both committed to Grid computing • vBNS backbone between NCSA and SDSC running at OC-12 with connectivity to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s or more CSE 160/Berman
PACI Grid CSE 160/Berman
NPACI Grid Activities • Metasystems Thrust Area is one of the NPACI technology thrust areas • Goal is to create an operational metasystem for NPACI • Metasystems players: • Globus (Kesselman) • Legion (Grimshaw) • AppLeS (Berman and Wolski) • Network Weather Service (Wolski)
Alliance Grid Activities • Grid Task Force and Distributed Computing team are Alliance teams • Globus supported as exclusive grid infrastructure by Alliance • Grid concept pervasive throughout Alliance • Access Grid developed for use by distributed collaborative groups • Alliance grid players include Foster (Globus), Livny (Condor), Stevens (ANL), Reed (Pablo), etc.
Other Efforts • Centurion Cluster = Legion testbed • Legion cluster housed at UVA • 128 533-MHz DEC Alphas • 128 dual 400-MHz Pentium II machines • Fast Ethernet and Myrinet • Globus testbed = GUSTO, which supports Globus infrastructure and application development • 125 sites in 23 countries as of 2/2000 • Testbed aggregated from partner sites (including NPACI)
GUSTO (Globus) Computational Grid CSE 160/Berman
IPG • IPG = Information Power Grid • NASA effort in grid computing • Globus supported as underlying infrastructure • Application foci include aerospace design, environmental and space applications
Research and Development Foci for the Grid • Applications • Questions revolve around design and development of “Grid-aware” applications • Different programming models: polyalgorithms, components, mixed languages, etc. • Program development environment and tools required for development and execution of performance-efficient applications [Figure: Grid layers: Applications, Middleware, Infrastructure, Resources]
Research and Development Foci for the Grid • Middleware • Questions revolve around the development of tools and environments which facilitate application performance • Software must be able to assess and utilize dynamic performance characteristics of resources to support application • Agent-based computing and resource negotiation [Figure: Grid layers: Applications, Middleware, Infrastructure, Resources]
Research and Development Foci for the Grid • Infrastructure • Development of infrastructure that presents a “virtual machine” view of the Grid to users • Questions revolve around providing basic services to user: security, remote file transfer, resource management, etc., as well as exposing performance characteristics • Services must be supported on heterogeneous resources and must interoperate [Figure: Grid layers: Applications, Middleware, Infrastructure, Resources]
Research and Development Foci for the Grid • Resources • Questions revolve around heterogeneity and scale • New challenges focus on combining wireless and wired, static and dynamic, low-power and high-power, cheap and expensive resources • Performance characteristics of grid resources vary dramatically; integrating them to support the performance of individual and multiple applications is extremely challenging [Figure: Grid layers: Applications, Middleware, Infrastructure, Resources]