A status report from a research group in the Computer Science Division at U.C. Berkeley on the resources it makes available to NPACI users.
Berkeley FY98 Resource Working Group
David E. Culler
Computer Science Division, U.C. Berkeley
http://now.cs.berkeley.edu
IPPS 98
Disclaimer
• Still a research group, not a computer center
• Project transition phase
• 0.25 Staff FTE (minus family leave) till now
• Finally got jobs through UCB
• Interviewing NOW!
NPACI Users
• Currently 43 official NPACI users from 15 sites
• still modest number of hours
• front-end time is free
• only account for GLUnix usage
• Still have many unofficial external users
• about 50 non-CS at UCB and 25 external
• some have both kinds of accounts
• attempted to run CS267 through NPACI to debug “partnership”
• process too slow, difficult to use other resources
• attempted to move external NOW users to NPACI
• not ready for the rush
• So far mostly conventional MPI users
• not “systems” users yet
Available Resources
Partitions
• Initially a 10-node “default” cluster to absorb sequential load
• production parallel cluster
• 19 => 32 nodes
• grappling with NOW research <=> NPACI usage
Usage
Hardware Development
• Sun 450 SMP front-end for NPACI users
• now.npaci.edu
• starting point for next year's NPACI CLUMPS
• New slice of a clustered file server
• Sun 450 SMP (4 GB, 4 proc)
• with 500 GB Fibre Channel attached drive
• tape stacker for backup
• Testing gigabit Ethernet backbone
Cluster of SMPs (CLUMPS)
• Four Sun E5000s
• 8 processors
• 3 Myricom NICs
• Multiprocessor, Multi-NIC, Multi-Protocol
Pleiades Information Servers
• Basic Storage Unit:
• Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM
• scalable backup/restore
• Dedicated Info Servers
• web,
• security,
• mail, …
• VLANs project into dept.
Basic Hardware Configuration
[Figure labels: 126 GB disks (x4), FC hubs, Pleiades (x4), Sun 450 (4 proc, 4 GB), Sun 450 (2 proc, 2 GB), Tomorrow, ATM, Bay Router, NPACI CLUMPS, NOW nodes]
Tool Development
• GLUnix availability enhanced through GLUguard
• MPI over AM protocol development
• Virtual Network support
• Implicit Coscheduling
• Split-C environment
• Performance Analysis Tools
• available for download
Automatic Mgmt of Virtual Networks
• A collection of Endpoints forms a Virtual Network
• Direct, protected hardware access performance
• General purpose
[Figure labels: Host Memory, Process 1 ... Process n, Processor, NIC Mem, P, Network Interface]
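The figure labels on this slide (endpoints in host memory, a smaller NIC memory) suggest that only some endpoints can be resident on the network interface at once. The slide does not say how residency is decided, so the following is only a minimal sketch under assumptions: the class `NicEndpointCache`, its `frames` parameter, and the least-recently-used replacement policy are all hypothetical and chosen for illustration.

```python
from collections import OrderedDict


class NicEndpointCache:
    """Toy model: a NIC with a fixed number of endpoint frames.

    Hypothetical illustration only. The LRU policy is an assumption,
    not something stated on the slide.
    """

    def __init__(self, frames):
        self.frames = frames            # how many endpoints fit in NIC memory
        self.resident = OrderedDict()   # endpoint id -> True, oldest first
        self.evictions = []             # endpoints moved back to host memory

    def touch(self, endpoint_id):
        """Use an endpoint; return True if it was already resident."""
        if endpoint_id in self.resident:
            # Hit: mark as most recently used.
            self.resident.move_to_end(endpoint_id)
            return True
        if len(self.resident) >= self.frames:
            # Miss with a full NIC: evict the least recently used endpoint.
            victim, _ = self.resident.popitem(last=False)
            self.evictions.append(victim)
        self.resident[endpoint_id] = True
        return False
```

With two frames, touching endpoints `a`, `b`, `a`, `c` in that order evicts `b`, because `a` was used more recently when `c` needed a frame.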
MPI over AM
[Figure: on each node, a stack of App, MPICH, AM ADI, and Machine Architecture layers; the nodes are joined by the Network]
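The layering in this figure can be illustrated with a small single-process model: an active-message layer that runs a named handler on arrival, and an MPI-style send/receive layer built on top of it. This is a sketch under assumptions, not the NOW implementation: `Endpoint`, `MiniMPI`, the handler name `eager_send`, and the dictionary standing in for the network are all hypothetical.

```python
from collections import deque


class Endpoint:
    """Toy active-message endpoint (hypothetical, single process)."""

    def __init__(self, rank, network):
        self.rank = rank
        self.network = network      # dict: rank -> Endpoint, stands in for the wire
        self.handlers = {}
        self.inbox = deque()
        network[rank] = self

    def register(self, name, fn):
        self.handlers[name] = fn

    def request(self, dest, name, *args):
        # An active message carries the name of the handler to run on arrival.
        self.network[dest].inbox.append((name, self.rank, args))

    def poll(self):
        # Run the handler for every message that has arrived.
        while self.inbox:
            name, src, args = self.inbox.popleft()
            self.handlers[name](src, *args)


class MiniMPI:
    """Toy tagged send/recv layered on the endpoint above."""

    def __init__(self, endpoint):
        self.ep = endpoint
        self.unexpected = deque()   # messages that arrived before a matching recv
        endpoint.register("eager_send", self._on_send)

    def _on_send(self, src, tag, payload):
        self.unexpected.append((src, tag, payload))

    def send(self, dest, tag, payload):
        self.ep.request(dest, "eager_send", tag, payload)

    def recv(self, src, tag):
        self.ep.poll()
        for i, (s, t, payload) in enumerate(self.unexpected):
            if s == src and t == tag:
                del self.unexpected[i]
                return payload
        raise LookupError("no matching message has arrived yet")
```

The receive side matches on source and tag rather than arrival order, which is the part of MPI semantics the upper layer has to add on top of plain handler dispatch.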
Implicit Coscheduling
[Figure: per-node components labeled GS, LS, and A]
• Obtain coordinated scheduling without explicit subsystem interaction, using only the events in the program
• very easy to build
• potentially very robust to component failures
• inherently “service on-demand”
• scalable
• Local service component can evolve.
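One way a process can react to "only the events in the program" is two-phase waiting: spin briefly on an expected message, and yield the processor only if the reply does not arrive soon. The slide does not spell out the mechanism, so the sketch below is an assumption for illustration; the function name `spin_then_block` and its parameters are hypothetical, and a `threading.Event` stands in for message arrival.

```python
import threading
import time


def spin_then_block(event, spin_seconds, block_timeout=None):
    """Wait for `event` in two phases (hypothetical sketch).

    Returns "spin" if the event arrived while spinning, "block" if it
    arrived after the caller gave up the processor, "timeout" otherwise.
    """
    deadline = time.monotonic() + spin_seconds
    # Phase 1: spin. A fast reply suggests the peer is currently scheduled,
    # so staying on the processor keeps the two sides running together.
    while time.monotonic() < deadline:
        if event.is_set():
            return "spin"
    # Phase 2: block. A slow reply suggests the peer is not running, so
    # release the processor to other work until the event arrives.
    if event.wait(block_timeout):
        return "block"
    return "timeout"
```

The spin interval is the tuning knob: too short and coordinated processes fall out of step, too long and processor time is wasted waiting on a descheduled peer.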
Performance Analysis Tools
[Figure: NPB LU-A]
Tools (cont)
8-fold reduction in miss rate from 4 to 8 proc
Imports and Exports
• NPACI file configuration
• will be finished with new server
• TCP wrappers
• very valuable for NOW
• NPACI queueing
• SSH and Kerberos environment
• shared K5 domain
FY99 Budget Summary
Item                 Effort     Cost
Faculty Time         1.75 mo    20.4 K
Post-doc Res.        1 FT       79.7 K
Support Staff        2.2 FT     291 K
Travel                          19.5 K
4 x 4 Clumps                    358 K
S&E                             51.6 K
Direct Cost Total               746.1 K
Growth: Scale CLUMPS to 32, 48, or 64 proc.
Millennium PC Clumps
• Initial phase of $6M Intel grant
• Inexpensive, easy to manage Cluster
• NOW environment
• moving to NT
• Replicated in many departments
• Prototype for 400 proc PC cluster