Stern Center for Research Computing Update Norman White February 24, 2005
Outline of talk • Background • Current Status and Plans • Feedback from faculty • How to submit jobs to the grid • Demo of grid engine for those interested
Background • Stern Research Computing • Research Computing has had little attention since Stern signed the WRDS agreement. • Several neglected areas • Computationally intensive research • Wharton (WRDS) not really appropriate • Eureka very slow • Desktop not appropriate • Rapidly growing demand • Desktop computing • Faculty offices becoming mini computer centers • Software licensing issues
Initial Response • Center for Digital Economy Research • Citigroup grant for small cluster (grid) • Salomon Center • Establishes a small staff and facilities for financial databases • Collaboration between Salomon Center and CEDER • Equipment consolidation in Copy Center • Stern Center for Research Computing established
CRC Mission • Foster and support computation-based research at Stern • Provide Stern with the ability to do cutting-edge research • Leverage Stern’s scale and scope
Immediate Goals (now completed) • Consolidate existing Research Computing Facilities • Provide immediate improvement in capabilities (processing, disk, software, backups) • Establish a research computing architecture which integrates existing and new hardware • Develop platform for continued improvement • Provide incentives for faculty to participate • Support PhD Research
Medium Term goals • Extend architecture to include • Stern Desktop support • Computation nodes • Data access from desktops • Labs • University facilities • Super computer on order • Provide programming support
The “Team” • Faculty Director – Norman White • “Virtual Team” • Scott Joens – IT and Salomon Center • David Frederick – IT • Dan Graham – IT • Vadim Barkalov – Student • …..
Current Status • Hardware • Cluster of machines in Copy Center ( ~15 Eurekas) • GRID – Primary research computer • Available to all researchers (1.5 times Eureka) • Main host for the rest of the machines • Sun Grid Engine Master Host • LEDA – 8 Processor Linux • HPC only (Matlab, …) • 5 times as powerful as Eureka • Miner • HPC Only (Matlab, Splus, R, Octave) • Total processing power >10 times Eureka • High speed gigabit network backbone • Gigabit connection to rest of Stern • Dedicated Tape backup unit for research computing
Software • Sun Grid Engine running on 2 machines • Soon to be rolled out to all machines • Matlab license server with 28 licenses • Can run on any node, Sun or Linux • SAS • Sun (Grid) only • Splus • Sun and Linux • Stata • Linux • Cplex, GAUSS, Mathematica, R, Octave, Perl, f77, C, Java … • Pine, Pico, emacs
User files • All user home directory files are available on any node. • Networked data storage available on all nodes (~ 1TB in total, more coming) • Home directories backed up every night. • Data once per week.
Grid Computing … • Concept • View machines as computing nodes • High speed network connecting machines in a cluster together • Support for heterogeneous nodes • Speed • OS (Solaris, Linux) • Software (SAS, Matlab) • Disk (need > 4GB) • Memory (> 256MB) • 3 types of host machines • Submit Host • Scheduling Host (knows what nodes have what resources) • Execution host
Advantages of Grid Computing • Grid Scheduler has intelligence • Knows load on all hosts • Knows hosts resources • Knows availability of hosts • Allows dynamic addition of nodes • Execution hosts can die and grid is unaffected • Understands grid-wide resources (like software licenses) • Provides an architecture for continuous growth
Who can use the “Stern HPC Grid” • Any researcher who needs to run jobs > 1 hour of cpu • Most users have been migrated (even though you don’t know it) • All large jobs will HAVE to run on the grid, unless there is some compelling reason not to.
How do I use the “grid” • You need to create a small shell file to run your job. • In the shell file, you tell Sun Grid Engine about your job so it can decide where to run it. • At a minimum you give it a name, and how to run your program. • Optionally declare resource needs like • CPU time (default is 2 hours) • Software (Matlab, Splus, SAS, …) • Memory (default is 256MB) • …. (many options)
Example Matlab job – 100 hours of CPU • #!/bin/sh • #$ -N mymatjob • #$ -l matlab,h_cpu=100:00:00 • matlab < mymatjob.m • To submit: qsub mymatjob.sh • To check: qstat (will show you the status of all jobs)
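The slide's job file can be extended with a few of the optional declarations mentioned earlier. The following is a minimal sketch, not the Center's recommended template: it only writes the job file and prints the submit command. `-N`, `-cwd`, `-j`, `-l`, `-m` and `-M` are standard Sun Grid Engine `qsub` options; the `matlab` resource and the 100-hour `h_cpu` request come from the slide, while the `h_vmem` memory request, the `JOB_FILE` variable and the e-mail address are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write an SGE job file with a few optional directives.
# It only creates the file; nothing is submitted.
JOB_FILE=${JOB_FILE:-mymatjob.sh}   # hypothetical variable, defaults to the slide's file name

cat > "$JOB_FILE" <<'EOF'
#!/bin/sh
#$ -N mymatjob
# Run from the directory the job was submitted from.
#$ -cwd
# Merge stderr into the stdout file.
#$ -j y
# Matlab license plus 100 hours of CPU, as in the slide's example.
#$ -l matlab,h_cpu=100:00:00
# Assumed memory resource name; the site may use a different one.
#$ -l h_vmem=1G
# Send mail when the job ends; the address is a placeholder.
#$ -m e
#$ -M you@example.edu
matlab < mymatjob.m
EOF

echo "Wrote $JOB_FILE; submit it with: qsub $JOB_FILE"
```

Lines beginning with `#$` are read by Sun Grid Engine as options; ordinary `#` comments are ignored by both the scheduler and the shell.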
So what is happening?? • When you submit your job, the Sun Grid Engine matches your needs against available resources. • It will then choose the “best” machine to process your job on. • I.e., the most lightly loaded machine that matches your requirements.
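The matching idea can be illustrated with a toy selection routine. This is not how Sun Grid Engine is implemented; it is a sketch of "least-loaded host that offers the requested resource". The host names are taken from the slides, but the load figures, the resource table layout, and the `pick_host` function are made up for the example.

```shell
#!/bin/sh
# Toy illustration only: pick the least-loaded host that offers a resource.
# Host names come from the slides; load figures and table layout are invented.
pick_host() {
    need=$1
    awk -v need="$need" '
        # Columns: host, load, comma-separated resources.
        # Keep only rows whose resource list contains the requested name.
        index("," $3 ",", "," need ",") > 0 {
            if (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
        }
        # Print the winner, or fail if no host qualifies.
        END { if (best != "") print best; else exit 1 }
    ' <<'EOF'
grid  1.80 matlab,sas,splus
leda  0.40 matlab
miner 0.90 matlab,splus,r,octave
EOF
}

pick_host matlab   # prints: leda
pick_host sas      # prints: grid
```

With this made-up table, a Matlab job goes to the idle machine, while a SAS job has only one eligible host regardless of its load.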
Why can’t I just log in and run it myself? • How would you know which machine has what resources? • How could you determine the load? • Sun Grid Engine will also: • Load balance across many machines • Deliver your output automatically • Email you when your job is complete • Allow you to have job dependencies • I.e., first run job A, then B, C, and D in parallel, and then E • SGE will manage parallel execution • I.e., run this job on 7 different Matlab nodes in parallel
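The dependency and parallel-execution bullets map onto two standard `qsub` options: `-hold_jid`, which holds a job until the named jobs finish, and `-t`, which submits an array job. The sketch below is a dry run that prints the commands rather than submitting anything; the job and script names (`jobA.sh`, `sweep.sh`, and so on), the `QSUB` variable and the `submit_pipeline` function are hypothetical. Referring to jobs by name in `-hold_jid` assumes a Grid Engine version that accepts names there; otherwise the numeric job IDs would be needed.

```shell
#!/bin/sh
# Dry run by default: prints the qsub commands instead of running them.
# Set QSUB=qsub on a submit host to submit for real.
QSUB=${QSUB:-echo qsub}

submit_pipeline() {
    # Job A runs first.
    $QSUB -N jobA jobA.sh

    # B, C and D each wait for A, then may run in parallel with each other.
    for j in B C D; do
        $QSUB -N "job$j" -hold_jid jobA "job$j.sh"
    done

    # E waits until B, C and D have all finished.
    $QSUB -N jobE -hold_jid jobB,jobC,jobD jobE.sh

    # Array job: 7 tasks, each requesting a Matlab license.
    # Inside sweep.sh, $SGE_TASK_ID (1..7) tells each task which piece to run.
    $QSUB -N sweep -t 1-7 -l matlab sweep.sh
}

submit_pipeline
```

An array job lets the scheduler place the 7 tasks wherever resources are free; it does not by itself guarantee 7 distinct machines.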
Advantages • Centralized management of all resources • Graphical interface (Qmon) to manage and view status • (Coming) Web interface for users to submit and monitor jobs.
So what about desktop users?? • Two answers • Is your desktop really the appropriate place to keep your data and do your computing, or are you doing it there because you have to? • New environment should make it more efficient and safe to do your computing on the grid. • If you need a Windows environment, we can still offer • Software installation • Access to consulting • Data storage and backup
Coming this summer • More grid Nodes?? • A Windows Server for expensive research applications (Authorware …) • ??? (What do you need)
Comments?? • What are your needs? • What isn’t covered here?