Grid Computing and Middleware Shawn Malhotra Monday, February 5th, 2007
Overview • Background and definition • Importance of middleware • Globus Toolkit • Sample Applications
What is Grid Computing? • Computing model that leverages the power of many networked resources • Not just CPUs • Storage devices, special equipment (e.g., telescopes) • Share resources across administrative domains • Requires security features • Different from traditional cluster computing • Programmer sees a single ‘virtual computer’ • Web ↔ Information as Grid ↔ Computing Power
Why is Grid Computing Important? • Helps solve computationally expensive problems • Flexible enough to handle many small problems • Share costly resources amongst institutions • Federally funded research labs / academic institutions • Make resources available to anybody • Cost barrier is lowered • ‘Pay as you go’ type service • Increases overall bandwidth
Motivation for Middleware • Need robust, efficient ways to pool resources • Previous ‘ad-hoc’ methods not sufficient • Need for standardization! • Distributed Computing System (DCS) • Developed at the University of California at Irvine • Early 1970s • Focus on CPU management • Poor security solution • Abandoned in the 1980s
Globus Toolkit • Broader scope, more complete solution • CPU Management • Storage Management • Monitoring Services • More details to come … • Most popular grid computing framework • Implements several standards • OGSA, WSRF, SOAP, WSDL
Globus Toolkit - Overview • Facilitates grid application development • Open, extensible, flexible, high abstraction
Job Submission • GRAM interface • Grid Resource Allocation and Management • Specify resource requirements and flow • Uniform way to submit remote jobs • Translate request for local resources • Offers a variety of features • Retrieve job status • Send job signals (kill, start, restart) • Uses Web services interface
Job Scheduling • What happens after the job is submitted? • Submitted to a scheduler • Queues jobs and decides where and when to run them • Requirement matching, priority systems, etc. • Abstracts resources from user • Pool heterogeneous resources together • Can have multiple layers of scheduling • Local schedulers vs. metaschedulers
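As a rough illustration of the queueing and requirement matching described on this slide, here is a minimal Python sketch. It is not how Globus or any real scheduler is implemented; the `Job` and `Resource` classes, their fields, and the sample machines are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                          # lower number is served first
    name: str = field(compare=False)
    min_cpus: int = field(compare=False)   # hypothetical requirement fields
    min_mem_gb: int = field(compare=False)


@dataclass
class Resource:
    name: str
    cpus: int
    mem_gb: int
    busy: bool = False


def schedule(jobs, resources):
    """Serve jobs in priority order; place each on the first free resource
    that meets its requirements. Returns (placements, waiting)."""
    queue = list(jobs)
    heapq.heapify(queue)
    placements, waiting = [], []
    while queue:
        job = heapq.heappop(queue)
        match = next(
            (r for r in resources
             if not r.busy and r.cpus >= job.min_cpus and r.mem_gb >= job.min_mem_gb),
            None,
        )
        if match is None:
            waiting.append(job.name)       # no suitable resource right now
        else:
            match.busy = True
            placements.append((job.name, match.name))
    return placements, waiting


# Heterogeneous pool: one desktop and one larger cluster node (made-up values).
resources = [Resource("desktop-a", 2, 4), Resource("cluster-node", 16, 64)]
jobs = [Job(2, "render", 1, 2), Job(1, "simulate", 8, 32), Job(3, "analyze", 8, 16)]
placements, waiting = schedule(jobs, resources)
```

The user only states requirements and a priority; which machine actually runs the job is decided by the scheduler, which is the abstraction the slide refers to.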
Security • Access to resources must be controlled • Grid Security Infrastructure (GSI) • Provides basic security constructs • Certificate-based PKI system • Supports single sign-on over the grid • Supports delegation • Access control left to individual services • Infrastructure provides necessary info and control • Uses Web services interface
Other Provided Modules • Data management • Facilitates file transfer, access to data stores • Monitoring and discovery • APIs to get status, subscribe to content • Important since ‘grid’ is never down, only components • Collaboration tools • Facilitates person-to-person collaboration • Build web portals for chat, e-mail, etc.
Example Applications • What can you build with such a toolkit? • Applications range from the depths of the sea to the stars above! • LOOKING deep sea research • Condor batch computing infrastructure • BIRN medical resource pooling • LEAD meteorological data • NVO virtual observatory
Condor • Workload management system • Queuing, scheduling, prioritization, monitoring • Pool desktops into batch system • Use when idle, auto-detect when busy again • ClassAd mechanism • Novel way to match resources with requests • Flocking • Seamless combination of multiple networks • http://www.cs.wisc.edu/condor
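The idea behind ClassAd-style matchmaking can be sketched in a few lines of Python: jobs and machines each advertise attributes plus a requirements expression, and a match needs both sides to accept each other. This is only a toy model under assumed names; it does not use Condor's ClassAd language or API, and the attribute names (`memory_mb`, `keyboard_idle_s`) and the 15-minute idle threshold are invented for the example.

```python
def matches(job_ad, machine_ad):
    """Two-way match: each ad's requirements must accept the other ad."""
    return (job_ad["requirements"](job_ad, machine_ad)
            and machine_ad["requirements"](machine_ad, job_ad))


def best_machine(job_ad, machine_ads):
    """Among mutually acceptable machines, return the one the job ranks highest."""
    candidates = [m for m in machine_ads if matches(job_ad, m)]
    return max(candidates, key=job_ad["rank"], default=None)


# A desktop owner's policy: only accept jobs when the keyboard has been idle
# for at least 15 minutes (hypothetical threshold).
def idle_only(me, other):
    return me["keyboard_idle_s"] >= 900


job = {
    "owner": "alice",
    "requirements": lambda me, m: m["os"] == "LINUX" and m["memory_mb"] >= 512,
    "rank": lambda m: m["memory_mb"],      # prefer machines with more memory
}

machines = [
    {"name": "lab-pc-1", "os": "LINUX", "memory_mb": 2048,
     "keyboard_idle_s": 20, "requirements": idle_only},    # in use: rejects jobs
    {"name": "lab-pc-2", "os": "LINUX", "memory_mb": 1024,
     "keyboard_idle_s": 3600, "requirements": idle_only},
    {"name": "lab-pc-3", "os": "WINDOWS", "memory_mb": 4096,
     "keyboard_idle_s": 7200, "requirements": idle_only},  # job rejects the OS
    {"name": "lab-pc-4", "os": "LINUX", "memory_mb": 768,
     "keyboard_idle_s": 5000, "requirements": idle_only},
]

chosen = best_machine(job, machines)
```

Because the machine's owner also states requirements, a busy desktop can refuse work even though it would otherwise be the best fit, which is what lets idle desktops be pooled without disturbing their users.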
LOOKING • Make tools / data related to oceanography available to all researchers • ‘20,000 Terabits Beneath the Sea’ • Presented at iGrid2005 • Real-time high definition deep sea video • Monitor active underwater volcanoes • http://lookingtosea.ucsd.edu/
BIRN • Resource pooling • Tools for research and diagnosis • Collaboration • Common user interface • Better hypothesis testing • Use a distributed patient population • http://www.nbirn.net/
LEAD • Sharing meteorological resources • Algorithm Development and Mining (ADaM) • Works on observational data • Provides analysis tools • ARPS Data Assimilation System (ADAS) • Provides visualization tools • Earth Science Markup Language (ESML) • Uniform way of expressing data • Data Access Systems • Allow uniform access to distributed data • https://portal.leadproject.org/gridsphere/gridsphere
NVO • Expose the vast amount of astronomical data for all to use • Telescopes will produce 7 petabytes per year by 2012 • Standardized way of expressing data • VOTable • Creation of tools to produce required data • ConeSearch • Make accessing data like using real tools • http://www.us-vo.org/
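A cone search asks for every catalog source within a given angular radius of a sky position. The sketch below shows that geometry on a small in-memory list; it does not call any NVO service, and the catalog entries and field names (`id`, `ra`, `dec`) are made up for illustration. All angles are assumed to be in decimal degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding slightly above 1 for antipodal points.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog sources lying within radius_deg of (ra, dec)."""
    return [s for s in catalog
            if angular_separation_deg(ra, dec, s["ra"], s["dec"]) <= radius_deg]


# Hypothetical catalog entries.
catalog = [
    {"id": "src-1", "ra": 10.00, "dec": 20.00},
    {"id": "src-2", "ra": 10.05, "dec": 20.02},
    {"id": "src-3", "ra": 190.0, "dec": -20.0},
]

hits = cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.1)
hit_ids = [s["id"] for s in hits]
```

A standardized query of this shape is what lets the same client code work against many different archives.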
The WISDOM Project • Analyze potential anti-malaria drugs • Focus lab tests on promising compounds • Uses up to 5000 computers in 27 countries • Simulate drug interaction with malaria protein • Test 80,000 drugs per hour, 140 million in total • Shows the power of collaboration • Many computers borrowed from particle physics simulator in the UK – GridPP • Shared spare capacity • http://grid.globalwatchonline.com/epicentric_portal/site/GRID/
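A quick back-of-the-envelope check puts the figures on this slide in perspective. It assumes the quoted rate of 80,000 compounds per hour was sustained for the whole run and that all 5000 machines contributed equally; the slides state neither, so treat the results as rough estimates only.

```python
total_compounds = 140_000_000   # from the slide
rate_per_hour = 80_000          # from the slide (assumed sustained)
machines = 5000                 # from the slide ("up to" 5000; assumed all in use)

hours = total_compounds / rate_per_hour        # 1750.0 hours of grid time
days = hours / 24                              # roughly 73 days
per_machine_per_hour = rate_per_hour / machines  # 16.0 compounds per machine-hour

# Same workload on one machine at that per-machine rate (hypothetical).
single_machine_years = total_compounds / per_machine_per_hour / 24 / 365

print(f"{hours:.0f} hours, about {days:.0f} days on the grid")
print(f"about {single_machine_years:.0f} years on a single machine")
```

Under these assumptions, a job that takes a few months on the grid would take on the order of a thousand years on one machine.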
Grid Computing – The Future • Currently the domain of ‘Big Science’ • Make it more mainstream for ‘Little Science’ • Technology is not the barrier • Evolution of the standards • Continued enhancement of the toolkit • Better front-end design • Promote peer-to-peer collaboration • Security is still a challenge
Summary • Grid computing is a powerful collaborative computing model • Grid computing requires efficient, fully featured middleware to thrive • Grid computing enables research and development that is not possible in isolation
References • Globus site • http://www.globus.org/ • Wikipedia • http://en.wikipedia.org/wiki/Grid_computing • Grid Café • http://gridcafe.web.cern.ch/gridcafe/
The Need for Grid Solutions • Grids are essential to sustain Moore’s Law as physical limitations will eventually limit what individual computing stations can achieve • It will become less necessary as individual resources become more powerful since technology grows faster than the complexity of our research
The Corporate Barrier • True grid computing will never be embraced by corporations due to security issues and sensitivity of data. This will limit the scope and power of the technology • Much like Web 2.0 has caused a shift in corporate presence on the internet, a ‘Grid 2.0’ will eventually force corporations to embrace this technology
Grid Middleware • Middleware designed to manage a grid will eventually merge with software designed to handle multiple CPUs on one motherboard to form a common solution. • Grid computing is far too different from multi-CPU processing to ever offer a common solution.
Expanding User Base • Development of a good middleware solution that abstracts most details of the grid will bring grid computing to ‘Little Science’ and eventually individual users. • The complexity of grid computing and lack of demand will prevent grid computing from ever becoming part of the mainstream.