1. Grid Computing Mark P. Wachowiak, Ph.D.
February 2, 2007
2. 2 Objectives Grid computing
Software and middleware for the grid
Present and future grid applications
3. 3 Grid Computing Definition:
Grid computing is distributed computing performed transparently across multiple administrative domains (P.V. Coveney).
Distributed high-performance computing.
Large geographically distributed networks of computers.
Provides a means for using distributed resources to solve large problems.
What the Web did for communication, grids endeavor to do for computation.
4. 4 Grid Computing (2) Very general computing applications:
Database searches and queries.
Scientific applications.
Weather prediction.
Cryptography.
Business applications.
Transparency:
Distributing computational resources among multiple and widely separated sources and users is a difficult algorithmic problem.
5. 5 Characteristics of Grids Grids coordinate resources that are not subject to centralized control.
Grids use standard, open, general-purpose protocols and interfaces.
Grids deliver high qualities of service.
6. 6 Grid vs. Parallel Computing
7. 7 Grid vs. Parallel Computing (2) Grid computing is distinguished from parallel computing on one or more multiprocessors:
Parallel computing: locally clustered machines or large supercomputers.
Grid computing: computation across different administrative domains.
8. 8 Two Tenets of Grid Computing Virtualization
Individual resources (such as computers, disks, information sources, and applications) are pooled together and made available through abstractions.
Overcomes hard-coded connections between providers and consumers of resources.
Provisioning
When a request for a resource is made, a specific resource is identified to fulfill the request.
The system determines how to meet the need, and optimizes system performance.
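A minimal sketch of the two tenets, under stated assumptions: the `Resource` and `ResourcePool` classes, the site names, and the "most free units wins" policy are all hypothetical and chosen only for illustration. Consumers ask the pool for a kind of resource (virtualization), and the pool picks a concrete one to satisfy the request (provisioning).

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A pooled resource; every field here is illustrative."""
    name: str
    kind: str          # e.g. "cpu" or "disk"
    capacity: int      # total units offered
    in_use: int = 0    # units currently allocated

    @property
    def free(self) -> int:
        return self.capacity - self.in_use


class ResourcePool:
    """Virtualization: consumers ask the pool, never a specific machine."""

    def __init__(self, resources):
        self.resources = list(resources)

    def provision(self, kind: str, units: int) -> Resource:
        """Provisioning: pick a concrete resource for an abstract request.

        Assumed policy: choose the candidate with the most free units,
        which spreads load across the pool.
        """
        candidates = [r for r in self.resources
                      if r.kind == kind and r.free >= units]
        if not candidates:
            raise LookupError(f"no {kind} resource with {units} free units")
        chosen = max(candidates, key=lambda r: r.free)
        chosen.in_use += units
        return chosen


pool = ResourcePool([
    Resource("site-a-node", "cpu", capacity=8),
    Resource("site-b-node", "cpu", capacity=16),
    Resource("site-a-disk", "disk", capacity=100),
])
first = pool.provision("cpu", 10)   # only site-b-node has 10 free units
second = pool.provision("cpu", 4)   # site-a-node now has more free units
```

The caller never names a machine, so providers can be added or removed without changing consumer code.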
9. 9 Characteristics of Grid Applications Data acquired by scientific instruments.
Data are stored in archives at separate, possibly geographically distant, sites.
Data are managed by teams belonging to different organizations.
Large quantities of data (tera- or petabytes) are collected.
Software used to analyze and summarize the raw data.
10. 10 The Importance of Standardization Without standardization, grid computing practitioners would need to acquire accounts at many different computer centers, managed by different organizations.
Different security and authentication protocols and accounting practices would have to be applied.
Very heterogeneous software environment.
11. 11 Objectives Grid computing
Software and middleware for the grid
Grid applications
12. 12 Importance of Middleware Middleware eases the grid user's experience and provides levels of abstraction.
Middleware extends the Web's information and database management capabilities.
Middleware allows remote deployment of computational resources.
13. 13 Globus Toolkit Most widely-used middleware for grids.
Open source toolkit for building computing grids.
Provides a standard platform upon which other services build.
Provides directory services, security, and resource management.
14. 14 Objectives Grid computing
Software and middleware for the grid
Grid applications
15. 15 CPU Scavenging Unused PC resources worldwide are harnessed. Also known as shared computing.
CPU-scavenging systems gain and lose machines at unpredictable times as users interact with their computers, or as network connections fail.
CPU-scavengers can migrate jobs to allow smooth operation.
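The behavior described above can be sketched as follows. This is an illustration only: the `Scavenger` class, the machine and job names, and the least-loaded dispatch policy are assumptions, not the design of any particular system. When a machine leaves, its jobs are put back in the queue and migrated to the machines that remain.

```python
class Scavenger:
    """Tracks volunteer machines and the jobs assigned to them."""

    def __init__(self):
        self.assignments = {}   # machine name -> list of job ids
        self.pending = []       # jobs waiting for a machine

    def join(self, machine: str) -> None:
        self.assignments.setdefault(machine, [])
        self._dispatch()

    def leave(self, machine: str) -> None:
        """A machine vanished: requeue its jobs, then migrate them."""
        orphaned = self.assignments.pop(machine, [])
        self.pending = orphaned + self.pending
        self._dispatch()

    def submit(self, job: str) -> None:
        self.pending.append(job)
        self._dispatch()

    def _dispatch(self) -> None:
        # Assumed policy: hand each pending job to the least-loaded machine.
        while self.pending and self.assignments:
            target = min(self.assignments,
                         key=lambda m: len(self.assignments[m]))
            self.assignments[target].append(self.pending.pop(0))


grid = Scavenger()
grid.join("pc-1")
grid.join("pc-2")
for job in ["job-a", "job-b", "job-c"]:
    grid.submit(job)
grid.leave("pc-1")   # pc-1's owner starts using the machine again
```

If every machine leaves, the jobs simply wait in `pending` until a new machine joins.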
16. 16 SETI@home Search for Extraterrestrial Intelligence
Goal: to analyze vast amounts of data from the Arecibo radio telescope.
Initiated by the Space Sciences Laboratory at the University of California, Berkeley
17. 17 SETI@home (2) Uses a free screen saver, available to the public.
When activated, the screensaver program downloads time sequences of radio telescope data and searches them for radio sources.
SETI@home has more than 5 million participants.
Inspiration for other scientific applications in need of large computing resources.
18. 18 SETI@home (3) Main purpose: A program downloads and analyzes radio telescope data.
Data is recorded at the Arecibo Observatory in Puerto Rico.
The data is sent to Berkeley, where it is processed into units of 107 seconds of data.
These work units are sent from the SETI@home server over the Internet to participating computers around the world for analysis.
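As a rough sketch of the splitting step, assuming the recording is cut purely along the time axis: the function name `split_into_work_units` is hypothetical, and the sketch ignores any frequency splitting or overlap between units that the real pipeline may use.

```python
def split_into_work_units(total_seconds: float, unit_seconds: float = 107.0):
    """Yield (start, end) time spans covering the whole recording.

    The final span may be shorter than unit_seconds; how a partial
    unit is treated in practice is not stated in the source.
    """
    if unit_seconds <= 0:
        raise ValueError("unit_seconds must be positive")
    start = 0.0
    while start < total_seconds:
        end = min(start + unit_seconds, total_seconds)
        yield (start, end)
        start = end


# A 400-second recording yields three full units and one partial unit.
units = list(split_into_work_units(400.0))
```

Each span can then be packaged and sent to a different participating computer, since the units are analyzed independently.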
19. 19 SETI@home (4) The analysis software can search for signals with about one-tenth the strength of those sought in previous surveys, because it makes use of a very computationally intensive algorithm.
Data is merged into a database using SETI@home computers in Berkeley. Various pattern-detection algorithms are applied to search for the most interesting signals.
20. 20 SETI@home User Client
21. 21 BOINC Berkeley Open Infrastructure for Network Computing.
Funded by the National Science Foundation.
Used in the SETI@home project.
Client-server architecture:
Client: used by the computer supplying resources for one or more BOINC projects. Performs the computations.
Server: system software, such as database services and the project web site.
22. 22 Remote Procedure Calls Mechanism by which the server communicates with the client in BOINC.
Similar to a regular function call or method invocation, but one computer executes the function on another computer.
23. 23 Remote Procedure Calls - Examples Return screensaver mode:
get_screensaver_mode(int& status)
Get a list of results for jobs in progress:
get_results(RESULTS&)
Get a list of file transfers in progress:
get_file_transfers(FILE_TRANSFERS&)
Get the client's current state:
get_state(CC_STATE&)
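To illustrate the general idea of a remote procedure call, here is a small sketch using Python's standard `xmlrpc` modules. It is not BOINC's own RPC mechanism: the function names are borrowed from the list above, but their bodies and return values are invented, and both ends run in one process for convenience.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_screensaver_mode():
    """Stand-in for a screensaver status query (the value is made up)."""
    return 2


def get_file_transfers():
    """Stand-in returning an invented list of transfers in progress."""
    return [{"name": "workunit_001", "bytes_done": 1024}]


# Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_screensaver_mode)
server.register_function(get_file_transfers)
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
proxy = ServerProxy(f"http://{host}:{port}")

# These read like local calls, but each one is serialized, sent over a
# socket, and executed by the process that registered the function.
mode = proxy.get_screensaver_mode()
transfers = proxy.get_file_transfers()

server.shutdown()
server.server_close()
```

The caller only sees ordinary function calls; marshalling the arguments and results is handled by the proxy object.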
24. 24 Human Proteome Folding Project (HPFP) Goal: to predict the structure of human proteins.
Devised at the Institute for Systems Biology, University of Washington.
Produces the likely structures for each of the proteins using a set of predefined rules.
Improved knowledge of human proteins is important in developing new therapies.
Officially completed on July 18, 2006.
Second stage now underway.
25. 25 Human Proteome Folding Project
26. 26 Business Applications Business application grid (BAG).
The major focus is using existing grid computing technologies to unite all of an organization's desktops, workstations, servers, printers, peripherals, etc., to perform useful work during idle time.
Usually focused on well-defined problems:
Calculating performance averages for a mutual fund.
Reducing processing time in wealth management systems.
Database applications.
27. 27 Business Applications (2) A large financial services company uses specialized grid software for new corporate banking applications.
Oracle Corporation offers a grid database system.
28. 28 Business Grid Middleware Provides an IT-level infrastructure to support business applications.
Middleware provides services for composing, submitting, and managing business applications.
Business functions (e.g. credit card authorization and shipping-and-handling services) are not provided.
Globus Toolkit 4 makes it easier to build an application that taps into existing distributed computing resources (e.g. servers, storages, databases).
29. 29 Conclusions Grid computing is an enabling technology that is rapidly gaining popularity in:
Science.
Medicine.
Engineering.
Business and financial applications.
Many software vendors offer grid computing toolkits and middleware.
In 2004, 20% of companies were seeking grid computing solutions (Evans Data Corp.).
30. 30 Benefits of Grid Computing Collaboration.
Increased productivity.
Efficient use of resources and storage.
Cost-effectiveness.
Heterogeneous environments.
Failure tolerance.
Transparency.
31. 31 Challenges Lack of control over resources, administration.
Security.
Middleware.
Network failures.
Cultural issues.
32. 32 Thank you.
33. 33 Open Grid Services Architecture (OGSA) A standard for grid-based applications.
Framework for meeting grid requirements.
34. 34 Globus toolkit
35. 35 Other grid tools Resource management:
Grid Resource Allocation and Management Protocol (GRAM)
Information Services:
Monitoring and Discovery Service (MDS)
Security Services:
Grid Security Infrastructure (GSI)
Data Movement and Management:
Global Access to Secondary Storage (GASS) and GridFTP
36. 36 World-Wide Telescope (2002) Goal: deployment of data resources shared by astronomers.
Data:
Archives of observations over a particular period of time, part of the EM spectrum, and area of the sky.
Observations collected at different sites around the world.
Data on same celestial objects are combined over different periods of time and different parts of the EM spectrum.
37. 37 World-Wide Telescope (2) Data archives (terabyte-scale) are managed locally by the teams that collect the data.
As data is acquired, it is analyzed and stored as transformed data so that it can be used by remote astronomy sites.
Librarian role of scientists.
Metadata is required to describe:
Time the data was collected.
Part of the sky observed.
Instruments used.
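The metadata fields listed above could be represented as a simple record, with a lookup over a catalog of such records. This is only a sketch: the `ObservationMetadata` class, the `find_observations` helper, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ObservationMetadata:
    archive: str       # site that manages the data locally
    collected: date    # time the data was collected
    sky_region: str    # part of the sky observed
    instrument: str    # instrument used


def find_observations(catalog, sky_region, start, end):
    """Return entries for one sky region collected within [start, end]."""
    return [m for m in catalog
            if m.sky_region == sky_region and start <= m.collected <= end]


# Invented sample catalog spanning two archive sites.
catalog = [
    ObservationMetadata("site-a", date(2001, 3, 14), "orion", "radio"),
    ObservationMetadata("site-b", date(2001, 9, 2), "orion", "optical"),
    ObservationMetadata("site-b", date(2002, 1, 20), "andromeda", "optical"),
]

orion_2001 = find_observations(
    catalog, "orion", date(2001, 1, 1), date(2001, 12, 31))
sites = sorted({m.archive for m in orion_2001})
```

A remote astronomer can use the result to learn which archives hold matching data before requesting any of the data itself.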
38. 38 WCG ongoing projects FightAIDS@Home
Launched by WCG in 2005.
Each computer processes one potential drug molecule and tests how well it would dock with HIV protease, inhibiting viral reproduction.
Human Proteome Folding Phase 2
Released in 2006.
Extension of HPF1, focusing on human-secreted proteins.
Better protein models, but more computationally intensive.
39. 39 World Community Grid (WCG) Goal: to create the world's largest public computing grid for humanitarian concerns.
Administered and funded by IBM.
Platforms: Windows, Linux, and Mac OS X.
Uses the idle time of Internet-connected desktop computers.
The agent works as a screen saver (like SETI@home), only using a computer's resources when it would otherwise be idle, and returning resources to the users when requested.
Projects are approved by an advisory board: representatives of major research institutions, universities, UN, WHO.
40. 40 WCG Smallpox research Completed project.
WCG largely began due to the success of this project in shaving years off research time.
Analysis of therapeutic candidates to fight the smallpox virus.
About 35 million potential drug molecules were screened against several smallpox proteins, resulting in 44 new potential treatments.
41. 41 WCG Ongoing projects (2) Help Defeat Cancer (2006)
Processes large numbers of tissue samples using tissue microarrays.
Genome Comparison (2006)
Compares gene sequences of different organisms to find similarities.
Goal: to determine the function of specific gene sequences by comparing them with similar sequences of known function in other organisms.
42. 42 Other grid projects
43. 43 Requirements of grid systems Remote access to resources, specifically, to archived data.
Data processing at the site where the data is managed.
Remote requests (queries) result in a visualization or results from a small quantity of data.
The resource manager of a data archive creates instances of services when they are needed.
Similar to distributed object model, where servant objects are created when needed.
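Creating service instances only when they are needed can be sketched as below. The `ArchiveQueryService` and `ResourceManager` classes and the archive name are hypothetical; the point is only that nothing is created until the first request arrives, and that later requests reuse the same instance.

```python
class ArchiveQueryService:
    """Hypothetical service that answers queries against one archive."""

    instances_created = 0   # counts how many services were ever built

    def __init__(self, archive):
        self.archive = archive
        ArchiveQueryService.instances_created += 1

    def handle(self, query):
        return f"{self.archive}: result for {query!r}"


class ResourceManager:
    """Creates a service for an archive on first use, then reuses it."""

    def __init__(self):
        self._services = {}

    def service_for(self, archive):
        if archive not in self._services:
            self._services[archive] = ArchiveQueryService(archive)
        return self._services[archive]


manager = ResourceManager()
before = ArchiveQueryService.instances_created   # nothing created yet
reply = manager.service_for("sky-survey").handle("region=orion")
again = manager.service_for("sky-survey")        # reuses the instance
```

Because the query runs where the archive lives, only the small result string has to travel back to the requester.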
44. 44 Requirements of grid systems (2) Metadata to describe characteristics of archived data.
Directory services based on the metadata.
Software for:
Query management.
Data transfer.
Resource reservation.