210 likes | 649 Views
NSF Site Visit. 2-23-2006. HYDRA Using Windows Desktop Systems in Distributed Parallel Computing . NSF Site Visit. 2-23-2006. Introduction…. Windows desktop systems at IUB student labs 2300 systems, 3 year replacement cycle
E N D
NSF Site Visit 2-23-2006 HYDRAUsing Windows Desktop Systems in Distributed Parallel Computing
NSF Site Visit 2-23-2006 Introduction… • Windows desktop systems at IUB student labs • 2300 systems, 3 year replacement cycle • Pentium IV (>=1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps/GigE, Windows XP • More than 1.5 TF
NSF Site Visit 2-23-2006 Possibly Utilize Idle Cycles? Red: total ownerBlue: total idleGreen: total Condor
NSF Site Visit 2-23-2006 Problem Description • Once again... Windows desktop systems at IUB student labs: • As a scientific resource • Harvest idle cycles
NSF Site Visit 2-23-2006 Constraints • Systems dedicated to students using desktop office applications — not parallel scientific computing – making their availability unpredictable and sporadic • Microsoft Windows environment • Daily software rebuild (updates)
NSF Site Visit 2-23-2006 What could these systems be used for? • Many small computations and a few small messages • Foreman-worker • Parameter studies • Monte Carlo • Goal: High Throughput Computing (not HPC) • Parallel runs of the aforementioned small computations to make better use of resource • Parallel libraries – MPI, PVM, etc. – have constraints if availability of resources is ephemeral i.e. not predictable
NSF Site Visit 2-23-2006 Solution • Simple Message Brokering Library (SMBL) • Limited replacement for MPI • Both server and client library based on TCP socket abstraction • Porting from MPI is fairly straight forward • Process and Port Manager (PPM) • Plus … • Condor for job management, file transfer, no checkpointing or parallelism • Web portal for job submission
NSF Site Visit 2-23-2006 The Big PictureWe’ll discuss each part in more detail next… The shaded box indicates components hosted on multiple desktop computers
NSF Site Visit 2-23-2006 SMBL (Server) SMBL Server Process Table for 4 CPU parallel session • SMBL server maintains a dynamic pool of client process connections • Worker job manager hides details of ephemeral workers at the application level
NSF Site Visit 2-23-2006 SMBL (Server) SMBL Server Process Table for 4 CPU parallel session • SMBL server maintains a dynamic pool of client process connections • Worker job manager hides details of ephemeral workers at the application level
NSF Site Visit 2-23-2006 SMBL (Client) • Client library implements selected MPI-like calls • MPI_Send () SMBL_Send () • MPI_Recv () SMBL_Recv () • In charge of message delivery for each parallel process
NSF Site Visit 2-23-2006 Process and Port Manager (PPM) • Starts the SMBL server and application processes on demand • Assigns port/host to each parallel session • Directs workers to their servers
NSF Site Visit 2-23-2006 PPM (cont’d ...) PPM with two SMBL servers (two parallel sessions) Parallel Session 1 Parallel Session 2
NSF Site Visit 2-23-2006 Once again … the big picture The shaded box indicates components hosted on multiple desktop computers
NSF Site Visit 2-23-2006 Recent Development • Hydra cluster Teragrid enabled! (Nov 2005) • Allow TG users to use resource • Virtual Host based solution – two different URLs for IU and Teragrid users • Teragrid users authenticate against PSC’s Kerberos server
NSF Site Visit 2-23-2006 System Layout • PPM, SMBL server, Condor and web portal running on Linux server • Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE • Second Linux server running Samba to serve BLAST database
NSF Site Visit 2-23-2006 Portal • Creates and submits Condor files, handles data files • Apache/PHP based • Kerberos authentication • URLs: • http://hydra.indiana.edu (IU users) • http://hydra.iu.teragrid.org (Teragrid users)
NSF Site Visit 2-23-2006 Utilization of Idle Cycles Red: total ownerBlue: total idleGreen: total Condor
NSF Site Visit 2-23-2006 Summary • Large parallel computing facility created at a low cost • SMBL parallel message passing library that can deal with ephemeral resources • PPM port broker that can handle multiple parallel sessions • SMBL Homepage • http://smbl.sourceforge.net (Open Source)
NSF Site Visit 2-23-2006 Links and References • Hydra Portal • http://hydra.indiana.edu (IU users) • http://hydra.iu.teragrid.org (Teragrid users) • SMBL home page: http://smbl.sourceforge.net • Condor home page: http://www.cs.wisc.edu/condor/ • IU Teragrid home page – http://iu.teragrid.org
NSF Site Visit 2-23-2006 Links and References (cont’d..) • Parallel FastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml • Blast: http://www.ncbi.nlm.nih.gov/BLAST • Meme: http://meme.sdsc.edu/meme/intro.html