Grid Computing and LA Grid
Acknowledgement • Andrew Tanenbaum • Laukik Chitnis • Sanjay Ranka • Onyeka Ezenwoye • Jorge Rodriguez
Agenda • Motivation • Grid Computing Overview • Grid Middleware • LA Grid
Speed up Using Parallel Processing (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
Speed up Using Parallel Processing Real programs achieve less than the perfect speedup indicated by the dotted line.
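The two slides above describe the figure without giving a formula. The relationship they illustrate is commonly known as Amdahl's law, and a minimal sketch of it follows. The function name and the sample fraction of 0.1 are illustrative assumptions, not values from the slides.

```python
def amdahl_speedup(n_cpus: int, sequential_fraction: float) -> float:
    """Ideal speedup when a fraction f of the work is inherently sequential.

    On one CPU the program takes time T. On n CPUs the sequential part still
    takes f*T, while the parallelizable part shrinks to (1 - f)*T / n.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be between 0 and 1")
    f = sequential_fraction
    return 1.0 / (f + (1.0 - f) / n_cpus)


if __name__ == "__main__":
    # Assumed example: 10% of the program is sequential.
    for n in (1, 4, 16, 64):
        print(n, round(amdahl_speedup(n, 0.1), 2))
```

Only a fully parallelizable program (f = 0) reaches the perfect speedup of n. Any sequential part caps the speedup at 1/f, however many CPUs are added.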
Parallel Computer Architectures (a) On-chip parallelism. (b) A coprocessor. (c) A multiprocessor. (d) A multicomputer. (e) A grid.
Instruction-Level Parallelism (a) A CPU pipeline. (b) A sequence of VLIW instructions. (c) An instruction stream with bundles marked.
Multiprocessors (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.
Multicomputers (a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image of Fig. 8-17 split up among the 16 memories.
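As a rough illustration of the partitioning shown on the two slides above, the sketch below splits an image into a 4 × 4 grid of sections, one per CPU. The function name and the tuple layout are assumptions made for this example. How the sections are then handed to CPUs (shared memory or message passing) is left out.

```python
def tile_bounds(height, width, rows=4, cols=4):
    """Return (row_start, row_end, col_start, col_end) for each section.

    End indices are exclusive. Integer arithmetic spreads any remainder
    across the sections, so together they cover every pixel exactly once.
    """
    tiles = []
    for i in range(rows):
        r0 = i * height // rows
        r1 = (i + 1) * height // rows
        for j in range(cols):
            c0 = j * width // cols
            c1 = (j + 1) * width // cols
            tiles.append((r0, r1, c0, c1))
    return tiles


if __name__ == "__main__":
    # Assumed example size; each tuple would be given to one of 16 CPUs.
    for cpu, bounds in enumerate(tile_bounds(480, 640)):
        print(cpu, bounds)
```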
Taxonomy of Parallel Computers (1) Flynn’s taxonomy of parallel computers.
Taxonomy of Parallel Computers (2) A taxonomy of parallel computers.
UMA Multiprocessors Using Crossbar Switches (a) An 8 × 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.
NUMA Multiprocessors A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.
BlueGene (1) The BlueGene/L custom processor chip.
BlueGene (2) The BlueGene/L. (a) Chip. (b) Card. (c) Board. (d) Cabinet. (e) System.
Google (1) Processing of a Google query.
Google (2) A typical Google cluster.
Agenda • Motivation • Grid Computing Overview • Grid Middleware • LA Grid
Grid Computing • Grid computing is an emerging computing model that • tries to solve large-scale computation problems • by taking advantage of many networked computers • to model a virtual computer architecture • that is able to distribute process execution • across a parallel infrastructure. Source: www.wikipedia.org
Ian Foster and Carl Kesselman • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” 1998 • “The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization.” 2000
A Grid Checklist • A grid coordinates resources that are not subject to centralized control … • … using standard, open, general-purpose protocols and interfaces … • … to deliver nontrivial qualities of service. • Virtual Organizations • A group of individuals or institutions, defined by sharing rules, that share the resources of the “Grid” for a common goal. • Examples: application service providers, storage service providers, databases, crisis management teams, consultants.
The Grid isn’t a new concept • Using multiple distributed resources to cooperatively work on a single application has been around for decades • Networked OS (70s) • Distributed OS (80s and 90s) • Heterogeneous Computing http://www.acis.ufl.edu/hcw2006/ • Parallel Distributed Computing • Metacomputing http://www.cnds.jhu.edu/research/metacomputing/
How is a grid different? • Grids focus on site autonomy • Grids involve heterogeneity • Grids involve more resources than just computers and networks • Grids focus on the user
Grid Computing • A dynamic multi-institutional network of computers that come together to share resources for the purpose of coordinated problem solving. • Achieved through: • Open general-purpose protocols • Standard interfaces • Figure labels: resource, application, institutional boundary
So, what is a Grid again?! • Simply put, a Grid is a congregation of different sites collaborating to increase productivity • For example, Grid3 • A total of 35 sites • Over 3500 CPUs
Broad Division of Grid • Data Grids • Managing and manipulating large amounts of data • Main objective is to share large amounts of data, which would otherwise be impossible without the grid • Compute Grids • For compute-intensive tasks • Emphasis is on the federation of CPU cycles and the distribution of compute-intensive tasks • There is no consensus on this categorization; it only aids in understanding the requirements
Examples of Distributed Apps. • High Energy Physics applications • Monte Carlo simulations • CMS experiment • Finding interesting astronomical patterns • Sloan Digital Sky Survey • Coastal ocean monitoring and predicting • SURA Coastal Ocean Observing and Prediction (SCOOP) • Prime number generator • Cracking DES • Cannot be done on a single machine • You want to divide the application and run it on a distributed and decentralized environment
Why do you want a grid? • Different perspectives • User: I want to run my scientific application on the grid so that I can get results in 10 hours instead of 10 days • Organization: Our next big experiment will generate terabytes of data and we want to distribute, share and analyze the data • Organization: We want to tap into the existing grids and share resources
A User Perspective • “Run my app in 10 hrs that usually takes 10 days on my Pentium” • I need • More CPU cycles • More disk space • More bandwidth • Better software tools • All of the above • Alternatives to grid • Simple CPU cycle stealer • Simple SRM (Storage Resource Manager)
Sys admin perspective • “I got root!” • How do I distribute the load on the machines? • How do I reduce the overhead on the central server? • How do I manage local and remote users? • What should be the policies?
Organizational Perspective • Federation of scientists – distributing, sharing and analyzing data • Tapping into existing grids • Cost-effective: A grid can be built from commodity software and hardware without spending millions on the next super duper computer. • Reliability: If a site fails, we can simply move our jobs to another site (this can be seen as a user perspective as well) Where do you want to run your job today? GridSoft
Distributed App. Requirements • Requires • A lot of resources • Reservation of resources at a particular time • Monitoring of status of the submitted jobs to multiple sites • Storage that is not easily available at a single place
Grid Building Blocks • Computational Clusters • Storage Devices • Networks • Grid Resources and Layout: • User Interfaces • Computing Elements • Storage Elements • Monitoring Infrastructure… Some slides from Jorge Rodriguez’s presentation on “Building, Monitoring and Maintaining a Grid”
Computer Clusters • A few headnodes, gatekeepers and other service nodes • I/O servers, typically RAID fileservers • Cluster management “frontend” • The bulk are worker nodes • Disk arrays • Tape backup robots • Pictured: Dell cluster at UFlorida’s High Performance Center
A Typical Cluster Installation • Computing Cycles • Data Storage • Connectivity • Cluster Management • OS Deployment • Configuration • Many options • ROCKS (kickstart) • OSCAR (sys imager) • Sysconfig • Figure labels: WAN, Network Switch, Head Node/Frontend Server, I/O Node + Storage, Worker Nodes (Pentium III)
Layout of Typical Grid Site • Figure: Computing Fabric + Grid Middleware (VDT, OSG) + Grid Level Services => A Grid Site => The Grid • Figure labels: User Interface, Compute Element, Storage Element, Authz server, Monitoring Element, Data Management Services, Grid Operations, Monitoring Clients, Services
A Typical Grid Application Workflow • A small Montage workflow: ~1200 nodes, 7 levels • Mosaic of M42 created on the TeraGrid using Pegasus
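The slide above describes a workflow by its node count and number of levels. The sketch below shows one way such levels could be computed from task dependencies. The task names and the `workflow_levels` helper are hypothetical, and this is not how Pegasus itself represents a workflow.

```python
from collections import deque


def workflow_levels(deps):
    """Assign each task a level: 1 for tasks with no dependencies, otherwise
    one more than the deepest task it depends on.

    `deps` maps a task name to the list of tasks it depends on.
    """
    # Collect every task, including ones that appear only as dependencies.
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)

    children = {t: [] for t in tasks}
    remaining = {t: 0 for t in tasks}  # unprocessed dependencies per task
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
            remaining[task] += 1

    level = {t: 1 for t in tasks if remaining[t] == 0}
    queue = deque(level)
    processed = 0
    while queue:
        task = queue.popleft()
        processed += 1
        for child in children[task]:
            level[child] = max(level.get(child, 1), level[task] + 1)
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    if processed != len(tasks):
        raise ValueError("dependencies contain a cycle")
    return level


if __name__ == "__main__":
    # Hypothetical four-step workflow, not the Montage one from the slide.
    example = {"project": ["fetch"], "diff": ["project"], "mosaic": ["project", "diff"]}
    levels = workflow_levels(example)
    print(levels, "depth:", max(levels.values()))
```

Tasks on the same level do not depend on each other, so a scheduler is free to run them at different sites at the same time.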
Simple Expectations from the Grid • Simply put, we need the grid to do the following operations for each transformation: • Find the input datasets • Apply the transformations (process the input) • Store the output datasets • and publish its “presence” so that collaborating scientists can find it
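The four operations listed above can be sketched in a few lines. Everything here is an in-memory stand-in invented for illustration: `ReplicaCatalog`, the `mem://` locations, and `run_transformation` are hypothetical and do not correspond to any real grid service API.

```python
class ReplicaCatalog:
    """Toy catalog mapping a dataset name to the location where it is stored."""

    def __init__(self):
        self._locations = {}

    def publish(self, name, location):
        self._locations[name] = location

    def find(self, name):
        # Raises KeyError if the dataset was never published.
        return self._locations[name]


def run_transformation(catalog, storage, transform, input_names, output_name):
    # 1. Find the input datasets through the catalog.
    inputs = [storage[catalog.find(name)] for name in input_names]
    # 2. Apply the transformation (process the input).
    result = transform(*inputs)
    # 3. Store the output dataset.
    location = f"mem://{output_name}"
    storage[location] = result
    # 4. Publish its presence so collaborators can find it.
    catalog.publish(output_name, location)
    return location


if __name__ == "__main__":
    storage = {"mem://raw": [3, 1, 2]}
    catalog = ReplicaCatalog()
    catalog.publish("raw", "mem://raw")
    where = run_transformation(catalog, storage, sorted, ["raw"], "sorted")
    print(where, storage[where])
```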
Qualities expected from the grid • And of course, we would like these operations to be performed: • Efficiently • As quickly as possible • Seamlessly • for easy collaboration • Fairly • fair to all collaborators • Securely • security against loss (fault tolerance), unauthorized access
Agenda • Motivation • Grid Computing Overview • Grid Middleware • LA Grid
Grid Middleware • In an effort to view the ‘Grid as a Workstation’, a set of grid software and services act as middleware between the user and the grid of machines. • These services can be roughly categorized as follows: • Security services • Information Services • Data Management • Job Management • Virtual Data System
GLOBUS TOOLKIT 4 – GT4 • Open source toolkit developed by The Globus Alliance that allows us to build Grid applications. • Organized as a collection of loosely coupled components. • Consists of services, programming libraries, and development tools. • High-level services • Resource Monitoring and Discovery Service • Job Submission Infrastructure • Security Infrastructure • Data Management Services
Services offered in a Grid • Security Services • Information Services • Resource Management Services • Data Management Services
Security Services • Forms the underlying communication medium for all the services • Secure Authentication and Authorization • Single Sign-on • User need not explicitly authenticate himself every time a service is requested • Uniform Credentials • Ex: GSI (Globus Security Infrastructure)
User Proxy • Creates a proxy for single sign-on • Figure: two sites, A and B, each reached through a GSI-enabled GRAM; one uses plain Unix authentication, the other Kerberos authentication
GSI certificate • A GSI certificate includes four primary pieces of information: • A subject name, which identifies the person or object that the certificate represents. • The public key belonging to the subject. • The identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject. • The digital signature of the named CA. • A third party (a CA) is used to certify the link between the public key and the subject in the certificate. In order to trust the certificate and its contents, the CA's certificate must be trusted. The link between the CA and its certificate must be established via some non-cryptographic means, or else the system is not trustworthy.
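The four pieces of information and the trust rule described above can be modelled as follows. This is a simplified sketch, not the X.509 format that GSI actually uses. The field names, the `trusted_cas` dictionary, and the caller-supplied `verify_signature` function are assumptions made for illustration. No real cryptography is performed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str        # who or what the certificate represents
    public_key: bytes   # public key belonging to the subject
    issuer: str         # identity of the CA that signed the certificate
    signature: bytes    # digital signature of the named CA


def is_trusted(cert, trusted_cas, verify_signature):
    """Accept a certificate only if its CA is already trusted.

    `trusted_cas` maps a CA name to that CA's public key. It stands for
    trust established by non-cryptographic means, such as an administrator
    installing the CA certificate by hand.
    `verify_signature(cert, ca_public_key)` is a placeholder for a real
    signature check and must return True or False.
    """
    ca_public_key = trusted_cas.get(cert.issuer)
    if ca_public_key is None:
        return False  # unknown CA: nothing anchors the chain of trust
    return verify_signature(cert, ca_public_key)
```

A certificate signed by an unknown CA is rejected before any signature check, which mirrors the point that the CA's own certificate must be trusted first.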
Grid Proxy • Once your user certificate is in place, you need to create a grid proxy, which is used for accessing the Grid • In Globus, you can do this using • grid-proxy-init • A proxy is like a temporary ticket to use the Grid; with the command above, it is valid for 12 hours by default. • Once this is done, you should be able to run “grid jobs” • globus-job-run site-name /bin/hostname
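The “temporary ticket” idea can be illustrated with a small model of a proxy's lifetime. This sketch does not call Globus or create a real proxy. The `Proxy` class and its fields are hypothetical, and only the 12-hour default comes from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(hours=12)  # default stated on the slide


@dataclass(frozen=True)
class Proxy:
    subject: str                     # subject name from the user certificate
    issued_at: datetime              # when the proxy was created
    lifetime: timedelta = DEFAULT_LIFETIME

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.lifetime

    def is_valid(self, now: datetime) -> bool:
        # Usable from the moment of issue until just before expiry.
        return self.issued_at <= now < self.expires_at


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 9, 0)  # assumed example time
    proxy = Proxy("/CN=Some User", issued)
    print(proxy.expires_at, proxy.is_valid(issued + timedelta(hours=13)))
```

Once the proxy has expired, a new one has to be created before further grid jobs can be submitted.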
Gridmap file • A gridmap file at each site maps the grid id of a user to a local id • The grid id of the user is the subject of his/her grid user certificate • The local id is site-specific • Multiple grid ids can be mapped to a single local id • Usually a local id exists for each VO participating in that grid effort • The local ids are then used to implement site-specific policies • Priorities etc.
Gridmap file entry • The gridmap-file is maintained by the site administrator • Each entry maps a Grid DN (distinguished name of the user; subject name) to a local user name
#
# Distinguished Name                                              Local username
#
"/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960"          ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220"       grid3
"/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961"               ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Jorge Rodriguez 690211"         osg
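To make the mapping concrete, here is a minimal sketch of reading entries in the form shown above. It assumes each entry is a double-quoted DN followed by exactly one local user name, with `#` starting a comment line. The function names are illustrative and this is not the parser Globus uses.

```python
import shlex

SAMPLE = '''
#
# Distinguished Name    Local username
#
"/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960" ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220" grid3
"/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961" ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Jorge Rodriguez 690211" osg
'''


def parse_gridmap(text):
    """Return a dict mapping each grid DN to its local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # keeps the quoted DN as a single token
        if len(parts) != 2:
            raise ValueError(f"unexpected gridmap entry: {line!r}")
        dn, local_user = parts
        mapping[dn] = local_user
    return mapping


def grid_ids_for(mapping, local_user):
    """List the grid DNs that share one local id."""
    return sorted(dn for dn, user in mapping.items() if user == local_user)


if __name__ == "__main__":
    gridmap = parse_gridmap(SAMPLE)
    print(grid_ids_for(gridmap, "ivdgl"))
```

In the sample, two different DNs map to the local id `ivdgl`, which matches the point that several grid ids can share one local id.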