Building a scientific research computing environment Eric Wu, BBN Technologies 10/29/2003
BBN Technologies • Consulting firm founded by MIT professors and a student in 1948 • [Photo: Leo Beranek (B) receiving the 2002 National Medal of Science] • Located in Cambridge, MA • Accomplishments • First ARPAnet • @ symbol in email • First router • Analyzed Nixon Watergate tapes • My department • Speech recognition. Transcription, not translation. • English, Arabic, Japanese • ~150 node network • http://www.bbn.com
What should I buy? • Hardware and software • Hardware depends on software to realize full potential • Software depends on hardware to realize full potential • $$$
Software • Test speed of software (benchmark) • Rules for benchmarking • First rule of benchmarking: • The only benchmark that matters is your code!!! • SPEC and vendor benchmarks are worthless (my opinion) • Always try to benchmark before buying a new architecture • Benchmarking resources • Your friends • Web • Supercomputing centers • Testdrive.hp.com (Alpha, Pentium, Itanium) • Buy one
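The first rule above can be put into practice with a very small timing harness. This is a minimal sketch in Python, not a full benchmarking tool: the `workload` function is a hypothetical stand-in, and the point is that you replace it with a call into your own code and your own input, then run the same script on each machine you are considering.

```python
import statistics
import time


def benchmark(func, repeats=5):
    """Run func() several times and return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def workload():
    # Hypothetical stand-in: replace with a call into your real code and input.
    total = 0.0
    for i in range(1, 200_000):
        total += 1.0 / (i * i)
    return total


if __name__ == "__main__":
    runs = benchmark(workload)
    print(f"best {min(runs):.4f} s, median {statistics.median(runs):.4f} s")
```

Reporting both the best and the median run helps separate the speed of the machine from noise caused by other jobs running at the same time.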
Software - Benchmarking example • Performance • VASP – Alpha is better than Xeon • CP90 – Alpha and Xeon are the same • Alpha costs 4-5x as much
Hardware • Hardware features • Memory speed • Interconnects (Front Side Bus) • Clock speed • 32 bit vs. 64 bit • Cache • Processor architecture • Understanding hardware can help to understand or predict speed
Hardware • [Diagram of hardware. Labels: Processor, Processor, Interconnect (Bus), Memory, Memory]
Memory and Front Side Bus • Don’t ignore memory and interconnects (FSB)! • Memory and Front Side Bus (FSB) speed make a difference in performance • Be careful when vendors are upgrading • FSB for Xeons lags behind Pentium 4 • FSB effects on a dual-processor machine • 1 job (+ 1 free processor) takes 1 hour • 2 jobs (no free processors) each take 1.25 hours • Bandwidth limitations!
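The dual-processor timings above can be turned into a throughput figure. The sketch below uses only the two numbers from the slide (1 hour for one job, 1.25 hours each for two simultaneous jobs); the function name is illustrative.

```python
def throughput(jobs_at_once, hours_per_job):
    """Jobs completed per hour when jobs_at_once jobs run side by side."""
    return jobs_at_once / hours_per_job


one_job = throughput(1, 1.0)     # one processor busy, one free
two_jobs = throughput(2, 1.25)   # both processors busy, sharing the bus

if __name__ == "__main__":
    print(f"1 job at a time:  {one_job:.2f} jobs/hour")
    print(f"2 jobs at a time: {two_jobs:.2f} jobs/hour")
    print(f"Gain from the second processor: {two_jobs / one_job:.2f}x (ideal is 2x)")
```

With these numbers the second processor buys 1.6x the throughput rather than 2x, which is the bandwidth limitation the slide refers to.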
Processor: clock speed • Defined as the rate at which the processor runs (cycles per second) • Useful only when comparing within an architecture (Pentium to Pentium) • Useless when comparing across architectures • For VASP, Alpha 1.25 GHz is 2x as fast as Xeon 2.8 GHz • For VASP, Itanium 900 MHz is 1.6x as fast as Xeon 2.8 GHz • Many other factors matter • Example: Instructions per clock cycle (IPC) also matter • Pentium 4 – 2 • Itanium “Madison” – 6
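To see how misleading clock speed is across architectures, the VASP figures above can be normalized per clock cycle. This is a rough sketch: it takes the slide's speed ratios and clock rates at face value and simply divides out the clock difference, with the Xeon 2.8 GHz as the reference.

```python
XEON_GHZ = 2.8  # reference chip from the slide


def per_clock_ratio(speedup_vs_xeon, clock_ghz, ref_clock_ghz=XEON_GHZ):
    """Work done per clock cycle relative to the reference chip."""
    return speedup_vs_xeon * ref_clock_ghz / clock_ghz


alpha = per_clock_ratio(2.0, 1.25)    # Alpha 1.25 GHz, 2x as fast as the Xeon
itanium = per_clock_ratio(1.6, 0.9)   # Itanium 900 MHz, 1.6x as fast as the Xeon

if __name__ == "__main__":
    print(f"Alpha does about {alpha:.2f}x the VASP work per cycle of the Xeon")
    print(f"Itanium does about {itanium:.2f}x the VASP work per cycle of the Xeon")
```

Both chips come out at roughly 4.5x to 5x the work per cycle for this one code, which is why the clock rate alone says little about which machine is faster.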
Processor: 32 bit vs. 64 bit • Definitions • 32 bit – can store range of 2^32 integers • 64 bit – can store range of 2^64 integers • Does not mean 64 bit is automatically faster or better! • Advantages of 64 bit • High memory applications • Each number points to an address in memory • 2^32 = 4x10^9, or 4G • 2^64 = 1.8x10^19, or 18 billion G • 32 bit can access > 4G with OS tricks, but slow • Applications with large range of numbers • Scientific computing • Cryptography • 32 bit can handle a 2^64 range with compiler tricks, but slow
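The address-space sizes are easy to check directly. The short sketch below computes the number of distinct values for each word size and expresses it in units of G, taking G as 10^9 to match the rounded figures on the slide.

```python
G = 10**9  # "G" taken as 10^9, matching the rounded figures on the slide


def addressable(bits):
    """Number of distinct values (or byte addresses) an integer of this width can hold."""
    return 2**bits


if __name__ == "__main__":
    for bits in (32, 64):
        count = addressable(bits)
        print(f"{bits} bit: {count} values, about {count / G:.3g} G")
```

The 64-bit range is 2^32 times larger than the 32-bit range, not twice as large.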
Processor: Cache • [Diagram of hardware. Labels: Processor, Processor, Cache, Interconnect (Bus), Memory, Memory. Annotations: Fast, Slow, Slow]
Processor: Cache • Cache • Bypass slow interconnect and memory • Reduce access time to information • Reduce bandwidth requirements to memory • L2 vs L3 • Lower L[n] means closer to processor, more potential for improvement • Effects • Faster code • Superlinear speedup in parallel code • Examples • Xeon 3.06 GHz 512k L2, 1MB L3 • Opteron 1MB L2 • Itanium “Madison” 6MB L3 • Alpha 16 MB L2 • [Diagram: Processor, L2 Cache, L3 Cache, Memory]
Hardware • Many processor features can influence speed • Effect on speed will depend on software • There is no substitute for benchmarking
Purchasing Strategies • Don’t forget to ask your friends • How much did they pay? • Which vendors? • How reliable? • Picking vendors • Know your group • How many students? • How many machines? • Know the differences between vendors • Vendor A vs. Vendor B • Hardware: Repair on site vs. send it back • Memory: Next day air replacement vs. send it back • Diagnosing problems: Motherboard lights vs. send it back • Rack Rails: Snap in vs. Screw in • Problem rate: 2/16 machines (9%) vs. 5/24 machines (21%) • Machine cooling: 5 fans vs. 2 fans • Cost: Vendor A is $550 more per node, 1/6th more!
Purchasing Strategies • Beware new hardware • 3 points of failure: hardware, compiler, software • Case study 1: Pentium 2 Xeons (1998) (donation) • Operating system? • Windows was slow • Linux was buggy • Compilers were new, no standards • Software (VASP) did not have Pentium support • Case study 2: Itanium I 600 (2000) on testdrive.hp.com • Processors were slower than expected • Intel compiler operated differently on Itanium and Xeon • Math libraries had bugs (MKL) • Software (VASP) did not have Itanium support • Sometimes, it’s better to let somebody else be the guinea pig
Purchasing Strategies - Examples • Buying Xeons • Quotation from Vendor A : ~$4500. • Quotation from Vendor B: ~$3000! • Go back to Vendor A, Vendor A lowers price to $3000 • This is extreme, but you should price shop. • SW Technologies http://www.swt.com gives prices of cheap Xeons. • Ask your friends what they paid. • Buying Alphas • Quotation from Vendor A : ~$12,500 • Threaten to buy all Xeons! • New quotation from Vendor A: $11,000
Parallel computing • Moore’s law is slowing down • Source: http://www.nersc.gov/~simon/cs267/
Parallel computing • Even with Moore’s law, at best we can only double system size every two years (with N scaling) • Parallel computing • Advancements in hardware • SMP machines • More processors/machine • Networking of Intel-type machines • Myrinet • Gigabit is cheaper • Advancements in software • MPICH and LAM are more robust • Your favorite code is probably parallel now • Cost • Usually cheaper (can be 50% cheaper). Some costs (cooling, power) usually covered by school or lab
Parallel computing Hardware • Networking Hardware • Fast Ethernet (100 Mbits/s) • Gigabit (1000 Mbits/s) • Myrinet • Quadrics, Infiniband, etc… • Definition of terms • Latency – Time to decide where to send packet. • Low latency is good for many small packets • Bandwidth • How fast does it transmit? • Maximum switching capacity • Maximum volume it can handle (relevant for gigabit)
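The interplay of latency and bandwidth can be illustrated with a simple model in which the time to send one message is a fixed latency plus the message size divided by the bandwidth. The bandwidths below are the Fast Ethernet and Gigabit figures from the slide; the 100 microsecond latency is a hypothetical value chosen for illustration, not a measurement of any real switch.

```python
FAST_ETHERNET_BPS = 100e6   # 100 Mbits/s, from the slide
GIGABIT_BPS = 1000e6        # 1000 Mbits/s, from the slide
ASSUMED_LATENCY_S = 100e-6  # hypothetical 100 microseconds, not a measured value


def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """Simple model: fixed per-message latency plus size divided by bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


def gigabit_advantage(message_bytes, latency_s=ASSUMED_LATENCY_S):
    """How many times faster Gigabit is than Fast Ethernet for one message."""
    slow = transfer_time(message_bytes, latency_s, FAST_ETHERNET_BPS)
    fast = transfer_time(message_bytes, latency_s, GIGABIT_BPS)
    return slow / fast


if __name__ == "__main__":
    for size in (64, 10_000_000):
        print(f"{size} bytes: Gigabit is {gigabit_advantage(size):.2f}x faster")
```

Under these assumptions the tenfold bandwidth advantage shows up only for large messages; for many small packets the latency term dominates, which is why low latency matters for communication-heavy codes.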
Parallel computing Hardware • Buy a vendor architecture • 8-16 processors on each machine • Examples: HP GS160, HP GS320, IBM Power 4 • Advantages • Less sysadmin • More reliable • Easier in every way • Division of machine into OS partitions (more for businesses) • Disadvantages • Cost - ~$500,000 vs. $50,000-$150,000 • Can pay for sysadmins instead
Parallel computing Hardware • Gigabit • Pricing • Cards are often free (standard) • Switches are moderately expensive, and falling • Few ports = cheap. Pricing does not scale well to >60 ports. • Latency • Moderate. Depends on switch and packet size • Be careful of switching capacity!! Make sure to buy a switch that is made for high performance computing, not routing. • Brands: • Foundry • Extreme • Cisco
Parallel computing Hardware • Myrinet • Pricing • Total ~$1100 a port (http://www.myri.com) • Linear scaling up to 128 ports. • Latency • Lowest latency • Needs setup of drivers (not too bad, but…) • Easy to expand • Best performance for large number of processors (at highest price)
Parallel computing Hardware • Remember the first rule of benchmarking. Example • PWSCF or ABINIT, parallelize over k-points • Little communication • Drawback – need a lot of memory, need kpoints>processors • No need for either gigabit or Myrinet • VASP parallelize over plane waves • A lot of communication • Reduce memory usage • Gigabit or Myrinet is essential • Know your code and how you will use it!
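The k-point drawback above can be seen in a toy work distribution. This is only an illustrative sketch of round-robin assignment, not how PWSCF or ABINIT actually schedule their work: each processor takes whole k-points, so any processor beyond the number of k-points has nothing to do.

```python
def distribute_kpoints(n_kpoints, n_procs):
    """Round-robin assignment of k-point indices to processors (illustrative only)."""
    return [list(range(rank, n_kpoints, n_procs)) for rank in range(n_procs)]


def idle_processors(n_kpoints, n_procs):
    """Count processors that receive no k-points at all."""
    return sum(1 for work in distribute_kpoints(n_kpoints, n_procs) if not work)


if __name__ == "__main__":
    print(distribute_kpoints(10, 4))   # every processor has work
    print(idle_processors(3, 8))       # more processors than k-points
```

Because each processor works on its own k-points independently, little communication is needed, but the scheme cannot use more processors than there are k-points.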
Parallel computing Software • PVM Parallel Virtual Machine • MPI Message Passing Interface • LAM http://www.lam-mpi.org • Designed for TCP/IP (clusters) • Performance (?) • MPICH http://www-unix.mcs.anl.gov/mpi/mpich/ • Stack architecture = flexibility. Not just TCP/IP • More popular • Slightly easier to use • Both MPICH and LAM can coexist. Pick the one you like.
Compilers • Often overlooked • Compilers can increase speed 10-100% • Compilers are cost-effective • Compiler may cost $500 • Cost to increase speed 10-100% can be $200-$2000/machine! • Disadvantages • Each compiler is different – alter code for each compiler • Students hate compiling codes
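The cost argument above can be made concrete with a break-even count. The sketch below uses the slide's figures ($500 for a compiler, $200 to $2000 per machine to buy the same speedup in hardware) and assumes, purely for illustration, that one compiler purchase covers every machine in the group; check the actual license terms before relying on that.

```python
import math

COMPILER_COST = 500  # dollars, from the slide


def break_even_machines(compiler_cost, hardware_cost_per_machine):
    """Smallest number of machines at which the compiler costs no more than
    buying the equivalent speedup in hardware (assumes one license covers all)."""
    return math.ceil(compiler_cost / hardware_cost_per_machine)


if __name__ == "__main__":
    for per_machine in (200, 2000):
        n = break_even_machines(COMPILER_COST, per_machine)
        print(f"Hardware at ${per_machine}/machine: compiler pays off at {n} machine(s)")
```

Even at the low end of the hardware estimate, the compiler pays for itself on a cluster of only a few machines.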
Compilers – gcc (2.95.3, 3.3) • Available at http://gcc.gnu.org • Advantages • Free • Portable • Wide base of users • Newer versions produce fast code • Disadvantages • Poor Fortran support
Compilers – Intel • Available at • http://www.intel.com/software/products/compilers/flin/noncom.htm • http://www.intel.com/software/products/compilers/clin/noncom.htm • Advantages • Free (academia) • Wide base of users (more so for Fortran) • FAST code on Intel chips. Reported fast code for AMD chips • Disadvantages • Harder to use (my opinion) • “Character” of different versions • No Red Hat 9 support
Compilers – Portland, and others • Pricing info at http://www.pgroup.com/pricing/ae.htm • Advantages • Works for all platforms • Robust • Disadvantages • Some cost • Not as fast • Other compilers • NAG • Fujitsu • Absoft
Math Libraries • BLAS/LAPACK • Intel MKL - http://www.intel.com/software/products/mkl/ • ATLAS - http://math-atlas.sourceforge.net • K. Goto’s BLAS - http://www.cs.utexas.edu/users/flame/goto/ • FFTW (http://www.fftw.org) • Vendor only • HP/Compaq cxml • IBM essl • SGI scsl
Disk Storage • Should be done on a RAID (Redundant Array of Inexpensive Disks) • RAID configuration provides fault tolerance • Different types of RAID • RAID 1 (mirroring) - 2 disks (two 100 G disks = 100G of data) • RAID 5 – 3+ disks (three 100G disks = 200G data, four 100G disks = 300G data, etc…) • Implemented within software or hardware • Disk type SCSI or IDE • May take one day to set up. Can save your hide!!!
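The capacity figures above follow a simple rule: RAID 1 keeps one disk's worth of data on two mirrored disks, and RAID 5 gives up one disk's worth of space to parity. A minimal sketch, assuming all disks are the same size and covering only the two levels described on the slide:

```python
def usable_capacity(level, n_disks, disk_gb):
    """Usable space in GB for the RAID levels described above (equal-size disks)."""
    if level == 1:
        if n_disks != 2:
            raise ValueError("RAID 1 as described here uses exactly 2 disks")
        return disk_gb
    if level == 5:
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n_disks - 1) * disk_gb
    raise ValueError("only RAID 1 and RAID 5 are covered in this sketch")


if __name__ == "__main__":
    print(usable_capacity(1, 2, 100))
    print(usable_capacity(5, 3, 100))
    print(usable_capacity(5, 4, 100))
```

The fraction of space lost to redundancy shrinks as disks are added to a RAID 5 array, from one third with three disks to one quarter with four.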
What type of RAID should I use? • Software or Hardware? • Software RAID is free • Hardware RAID has better performance (especially with more clients), but costs $$$. Usually can buy a PCI card and some cables. • SCSI or IDE disks? • IDE is cheap • SCSI is $$$, but better performance. Most believe better quality. • SATA disks are another alternative. • Costs • Hardware/SCSI can cost 3x more • Don’t forget cost of computer to house disks • My recommendation • Hardware/SCSI • Graduate students hate to do sysadmin tasks. • Graduate students tend to be lax with sysadmin tasks • Force your students to delete old files/use gzip • Hardware/IDE – If you need 100’s of G of storage
Backups • “You’re only as good as your last backup” Ancient computing proverb • MIT-TSM backups http://web.mit.edu/is/help/tsm/quickstart.html • $7.50 a month • Unlimited storage (rsync) – limited only by restore speed • With scripts, can backup every day • Disk mirroring with rsync • Buy a few cheap IDE disks • Use an old machine • Tape backups • “You’re only as good as your last restore” • Modern computing proverb
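Disk mirroring with rsync can be scripted so that it runs every day. The sketch below wraps a standard `rsync -a --delete` call in Python; the directory paths are hypothetical placeholders, and scheduling (for example with cron) is left to the reader.

```python
import subprocess


def build_rsync_command(source_dir, destination):
    """Build an rsync command that mirrors the contents of source_dir into destination."""
    # -a preserves permissions, times, and symlinks; --delete removes files
    # from the mirror that no longer exist in the source.
    return ["rsync", "-a", "--delete", source_dir.rstrip("/") + "/", destination]


def mirror(source_dir, destination):
    """Run the mirror and raise CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, destination), check=True)


if __name__ == "__main__":
    # Hypothetical paths: replace with your data directory and backup disk.
    print(" ".join(build_rsync_command("/home/group/data", "/mnt/backup/data")))
```

Note that `--delete` makes the copy an exact mirror, so a file deleted by mistake also disappears from the mirror on the next run. A mirror protects against disk failure, not against user error, which is one reason to test restores as the second proverb suggests.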
Further reading • 32 vs. 64 bit • Good article: http://www.arstechnica.com/cpu/03q1/x86-64/x86-64-1.html • Courses on supercomputers (recommended) • Berkeley: http://www.nersc.gov/~simon/cs267/ • Buffalo: http://www.ccr.buffalo.edu/content/education.htm#courses • Building a Beowulf • Ron Choy@mit http://www.mit.edu/people/cly/beowulf.ppt • ROCKS, “automatic” install of Beowulf cluster http://www.x2ca.com/articles/ICCS2003.pdf • Parallel computing/supercomputing links • Parascope http://www.computer.org/parascope/ • Nan’s page http://www.cs.rit.edu/~ncs/parallel.html • Top 500 http://www.top500.org/
Conclusions • Hardware understanding can help you make an intelligent decision • Nothing beats a benchmark of your code • Don’t forget the compiler and math libraries • Consider your parallel computing options • Be sure to implement fault-tolerant systems (RAID and backups)