1 / 70

Grid Computing Using Modern Technologies

Grid Computing Using Modern Technologies. A 3-Part Tutorial presented by: Mary Thomas Dennis Gannon Geoffrey Fox. Tutorial Outline. Part I: Mary Thomas (TACC/UT) Understanding Web and Grid portal technologies Building application portals with GridPort Grid Web services

medwin
Download Presentation

Grid Computing Using Modern Technologies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid Computing Using Modern Technologies A 3-Part Tutorial presented by: Mary Thomas Dennis Gannon Geoffrey Fox

  2. Tutorial Outline • Part I: Mary Thomas (TACC/UT) • Understanding Web and Grid portal technologies • Building application portals with GridPort • Grid Web services • Part II: Dennis Gannon (Indiana) • Distributed Software Components • Grid Web services and science portals for scientific applications • Part III: Geoffrey Fox (Indiana) • Integrating Peer-to-Peer networks and web services with the Grid • Grid Web services and Gateway

  3. Introduction to Developing Web-BasedGrid Computing Portals Mary Thomas Texas Advanced Computing Center The University of Texas at Austin and NPACI Presented at GGF4, Toronto, Canada, Sunday, 2/15/02

  4. Goals of Part 1 • Introduce basic portal technologies and concepts • Provide enough knowledge to go out and begin process of evaluating/understanding technologies • Provide enough knowledge to build a computing portal based on GridPort/Perl/CGI • Not going to teach you how to install Grid software: assume you can do this is you have someone who takes care of this • Audience: application developers or scientists

  5. Approach • Introduce the concept of a portal for computational Grids and science applications • Use GridPort Toolkit to demonstrate the basic concepts needed to understand how to use the web and grid technologies • Show examples of how to program a portal

  6. Outline • Defining Portals and Web Technologies • The GridPort Portal Architecture • GridPort-Based Portals • Application Portals • Programming Example • Portal Services for Remote Clients • Client Portal Services • Grid Portal Web Services

  7. Defining The Web

  8. What is the Web?

  9. What is the Grid Web?

  10. Web Server Technologies • Web Servers: • Run on a machine, and clients access the process • Common Versions: • Netscape (http://www.netscape.com) • Apache (http://www.apache.org) - open source • OS’s: Windows, Unix, MacIntosh, Lynus, etc. • Web Programming Languages • Server: Java, Javascript, Python, PHP, Perl • Client: HTML, Javascript • Protocols: HTTP/CGI/Servlets/Applets • Security • HTTPS, SSL, Encryption • Cookies • Certificates

  11. Web Clients • Multiple display devices: • Desktop workstations, PC’s, PDA’s, cell phones, pagers, other wireless devices, televisions • Various viewing tools: • Browsers: Internet Explorer, Netscape Navigator, Opera • Visually Impaired tools • OS’s: Windows/WinCE, Unix, Mac, Lynux, Palm, etc. • Web Programming Languages • HTML, Javascript • Perl, Java for ‘scrapers’ • Security • HTTPS, SSL, Encryption, Cookies • Certificates

  12. Defining ComputationalWeb Portals

  13. What is a Portal? • Web sites that provide centralized access to a set of resources • Characterized by • Personalization • Security/authentication/authorization • What you see often changes based on what you are looking for (e.g.: adds) • Navigation/choices • Gateway for Web access to distributed resources/information • Hub from which users can locate all the Web content that they commonly need.

  14. Classes of portals • Horizontal or “mega-portals”: • information from search engines and the ISP's (yahoo) • everybody comes in, sees the same thing • allow personalization to some degree • Vertical • portals that are customized by the system. • the system recognizes who you are, and gives you a different view of the university or the company that you're going to build. • More specialized (amazon, travelocity, etc.) • Intranet: • inside a company that give particular people the information that they need

  15. Scientific Web Portals • Differ from Commercial Portals (yahoo, amazon) • Types of Science Portals: • User Portals: • simplify user’s ability to interact with and utilize a complex, often distributed environment • direct access to resources (compute, data, archival, instruments, and information) • Application Interfaces • Enables scientists to conduct simulations on multiple resources • EOT Portals • Educates public (future scientists?) about science using software simulations, visualizations, etc • Learning tools • Individual Portals • Users can roll out their own portals by writing web pages using standard HTML or Perl/CGI

  16. Why Use Portals for Computational Science? • Computational science environment is complex: • Users have access to a variety of distributed resources (compute, storage, etc.). • Interfaces, OS’s, Grid tools to these resources vary and change often • Environment changes: • Relocation/upgrade of binaries • Policies at sites sometimes differ, allocations change • Using multiple resources can be cumbersome • Grid adds complexity for programmers

  17. Portals Provide Simple Interfaces • Portals are web based and that has advantages - • Users know & understand the web • Can serve as a layer in the middle-tier infrastructure of the Grid • Integrate various Grid services and resources • Users can be isolated from resource specific details • Single web interface isolates system changes/differences • Not and end-all solution - several issues/challenges here • Performance, scaleability

  18. Virtual Organizations • GGF Model, based on Foster paper • “Anatomy of the Grid” • Hierarchical tree based • Each ‘node’ represents collections of: • Compute resources • Projects, • Centers • HotPage portals represent VO’s • NPACI, PACI • Like to build HiPCAT as a VO to run as DTF node

  19. Virtual Organizations (HotPage) NSF HiPCAT DTF PACI NPACI PSC Alliance SDSCI Mich TACC

  20. Portal Toolkits • Commercial: • Sun: Java Servlets, iPlanet • IBM: WebSphere • MSFT: .NET • Special interest groups: • uPortal, Javaspeed • R&D within Grid community: • GridPort Toolkit (http://gridport.npaci.edu) • GPDK () • GirdSphere (refer to ACM site???) • Gateway (Fox) • CCA (Gannon)

  21. Building Application Portals Using theGridPort Toolkit

  22. GridPort Architecture

  23. GridPort Toolkit Design Concepts • Key design goals: • Any site should be able to host a user portal • Any user should be able to create their own user portal if they have accounts and certificate • Key Requirements: • Base software design on infrastructure provided by World Wide Web: • use commodity technologies wherever possible • avoid shell programs/applications/applets • GridPort Toolkit should not require that additional services be run on the HPC Systems • reduce complexity -- there are enough of these already • so, leverage existing grid research & development • GSI certificate (PKI)

  24. GridPort Designed for Ease of Use • WWW interface is common, well understood, and pervasive • User Portals can be accessed by anyone who has access to a web browser, regardless of location • Users can construct customized application web pages: • only basic knowledge of HTML and Perl/CGI • Application programmers can extend the set of basic functions provided by the Toolkit • Portal services hosts can modify support services by adding/remove/modifying broker or grid interface codes

  25. Commodity Web Technologies • Use of commodity web technologies -> Portability • contribute to a ‘plug-n-play’ grid • Requirements: • Any Browser: Communicator, IE • HTTP, HTTPS, SSL, HTML/JavaScript, Perl/CGI, SSH, FTP • Netscape or Apache servers • Based on simple technology, this software is easily ported to, and used by other sites. • easy to modify and adapt to local site policies and requirements • Goal is to design a toolkit that is simple to implement, support, port, and develop

  26. Grid Technologies • Security: • Globus Grid Security Infrastructure (GSI), SSH • Myproxy for remote proxies • Job Execution: • Globus GRAM • Information: • Globus Grid Information Services (GIS) • File Management: • SDSC Storage Resource (SRB) • GridFTP

  27. Information Services • Designed to provide a user-oriented interface to NPACI Resources and Services • Consists of on-line documentation, static informational pages, and links to events within NPACI, including basic user information such as: • Documentation, training , news, consulting • Simple tools: • application search • systems information • generation of batch scripts for all compute resources • Network Weather System • No user authentication is required to view this information.

  28. Information Services: Dynamic • Dynamic information provided by automatic data collection scripts that provide real-time information for each machine (or summaries) such as: • Status Bar: displays live updates showing operational status and utilization of all NPACI resources • Machine Usage: displays summary of machine status, load, and batch queues • Batch Queues: displays currently executing and queued jobs • Node Maps: displays graphical map of how running applications are mapped to nodes • Network Weathering System: provides connectivity information between a user’s local host and grid resources • Pulled from 3 possible sources: • MDS, web services, local cron jobs

  29. Web Server to Grid Resource

  30. Interactive Sessions • How do they work? • What do they do: • Job submission • File management • Authentication • What do we use to do them? • List of grid technologies

  31. GridPort Interactive Services Diagram

  32. Interactive Sessions: Login/Logout • Login: • client browser connects to an HTTPS server • user enters valid NPACI Portal account ID • login into Portal using CA authentication (Globus): • Globus infrastructure manages user accounts on remote hosts • username used to map passphrase to valid key and certificate (local repository), or Myproxy (remote) • passphrase used to create proxy cert. using globus-proxy-init • if proxy-init successful, session key stored on client browser • data passed through web server over SSL channel • Session info stored in secure, local repository

  33. Interactive Sessions: Login/Logout • Logout: • user automatically logged out • if logout selected • session times out • on logout: • active session data files cleared • relevant user information archived for future sessions stored

  34. Grid Security at all Layers • GSI authentication for all portal services • transparent access to the grid via GSI infrastructure • Security between the client -> web server -> grid: • SSL/RC4-40 128 bit key/ SSL RSA X509 certificate • authentication tracked with cookies coupled to server data base/session tracking • Single login environment (another key goal) • provide access to all NPACI Resources where GSI available. • with full account access privileges for specific host • use client browser cookies to track state

  35. Portal Accounts • Portal accounts are not the same as resource accounts. • valid Grid user on resource, need allocations • processes run under own account with same access and privileges as if they had logged onto resource • Portal users must have a digital certificate signed by a known Certificate Authority (CA) • And must get DN into mapfile • Accounts for NPACI users obtained via an on-line web form: • Can generate a certificate - certificate and key are placed in a secure repository

  36. Interactive Sessions: Job Execution • Web server transactions: • confirm/authenticate user login status • parse command/request (CGI vars) • establish user environment • assemble remote command (Globus/SSH) • verify proxy (if Globus) or recreate • send command (e.g., Globus daemon on remote host) • parse, format, and return results to the web browser on the user’s workstation or store data (e.g., FTP). • While the user login is in the active state: • check for timeout • track current state • record information about job requests and user data for use in subsequent transactions or sessions.

  37. GridPort File System • Without SRB capabilities, files are distributed • Adds to complexities when migrating and managing data

  38. GridPort + SRB Architecture • With SRB capabilities, file access is direct • Single SRB account access allows for more flexible data management

  39. Variety of GridPort Applications • Current applications in production: • NPACI/PACI HotPages (also @PACI/NCSA ) • https://hotpage.npaci.edu • LAPK Portal: Pharmacokinetic Modeling (live demo of Pharmacokinetic Modeling Portal) • https://gridport.npaci.edu/LAPK • GAMESS (General Atomic and Molecular electronic Structure System) • https://gridport.npaci.edu/GAMESS • Telescience (Ellisman) • https://gridport.npaci.edu/Telescience • Protein Data Bank CE Portal (Phil Bourne) • https://gridport.npaci.edu/CE

  40. Programming Example: Job Submit • Client: • Example of Client HTML page • HTML Code • Server: • Perl/CGI parser script running on server • GridPort Toolkit function code

  41. HotPage View: Job Submission

  42. JobSubmit Web Page

  43. JobSumbit HTML Code <FORM action="https://hotpage.npaci.edu/tools/cgi-bin/job_submit.cgi" method=post enctype="application/x-www-form-urlencoded" name="job_submit"> Arguments: <INPUT TYPE="text" NAME="args"> Select Queue: <SELECT NAME="queue"> <OPTION VALUE="low">low <OPTION VALUE="normal">normal <OPTION VALUE="high">high <OPTION VALUE="express">express </SELECT> Number of Cpu’s: <INPUT TYPE="text" NAME="cpus"> Max Time (min): <INPUT TYPE="text" NAME="max_time"> <INPUT TYPE="hidden" NAME="mach" VALUE="SSPHN"> <INPUT TYPE="hidden" NAME="exe" VALUE="/rmount/paci/sdsc/mthomas/mpi_pi"> <INPUT TYPE="submit" METHOD="post" ACTION="https://hotpage.npaci.edu/tools/cgi-bin/job_submit.cgi" > </FORM>

  44. JobSumbit: Server Perl/CGI Parser • GRABS HTTP/CGI data and sends it to GridPort subroutine, waits for results #!/usr/local/bin/perl use CGI qw(:all); my $query = new CGI; $|=1; BEGIN{ ###GET THE SCRIPTS LOCATION AND THE GLOBAL VARS### $MY_LOCATION = "tools/cgi-bin"; $CURRENT_DIR = `pwd`; ($PORTAL_ROOT, $rest) = split(/$MY_LOCATION/, $CURRENT_DIR); $GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi"; require "$GLOBAL_VARS_CONFIG"; require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi"; }

  45. JobSubmit: Server Perl/CGI code (cont.) # load in code to do job submission through globus require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi"; # subroutines to get/set user directories (home,work, current) and do job handling require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi"; require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi"; my $args = $query->param(args); my $queue = $query->param(queue); my $cpus = $query->param(cpus); my $max_time = $query->param(max_time); $mach = $query->param(mach); my $exe = $query->param(exe); $exe = $exe . " $args"; # run the command through Globus, trap output, return to caller process @output = gridport_globus_job_submit($mach,$cpus,60,$exe,$max_time,$queue);

  46. gridport_globus_job_submit sub gridport_globus_job_submit { my @job = (); my $user = &get_username(); ### get the input and set up globus my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_; &globus_config($user); # verify data &mach_config($mach); # verify data #build the globus command my $globus_submit = "$globus_job_submit{$machines{$mach}{gv}} "; $globus_submit .= "$machines{$mach}{name}{job} -np $cpus -queue $queue "; $globus_submit .= "-maxtime $max_cpu_time $exe"; @job = run_command_timeout($globus_submit, $timeout); # run job return @job; }

  47. Laboratory for Applied Pharmacokinetics • (LAPK) Portal: • Users are Doctors, so need extremely simple interface • Must be portable – run from many countries • Need to hide details such as • Type of resources (T3E), file storage, batch script details, compilation,UNIX command line • Major Success: • LAPK users can now run multiple jobs at one time using portal. • Not possible before because developers had to keep codes & scripts simple enough for doctors to use on T3E

  48. Laboratory for Applied Pharmacokinetics • Uses gridport.npaci.edu portal services/capabilities: • File upload/download between local host/portal/HPC systems • Job Submit: • submission (builds batch script, moves files to resource, submit jobs) • Job tracking: in the background portal tracks jobs on system and moves results back over to portal storage when done • Job cancel/delete • Job History: maintains relevant job information

  49. LAPK Job Submit and Job History

More Related