Building a scientific research computing environment

Presentation Transcript


  1. Building a scientific research computing environment Eric Wu, BBN Technologies 10/29/2003

  2. Building a scientific research computing environment Eric Wu, BBN Technologies 10/29/2003

  3. BBN Technologies • Consulting firm founded by MIT professors and a student in 1948 • [Photo: Leo Beranek (the "B" in BBN) receiving the 2002 National Medal of Science] • Located in Cambridge, MA • Accomplishments • First ARPAnet • @ symbol in email • First router • Analyzed the Nixon Watergate tapes • My department • Speech recognition: transcription, not translation • English, Arabic, Japanese • ~150-node network • http://www.bbn.com

  4. What should I buy? • Hardware and software • Hardware depends on software to realize its full potential • Software depends on hardware to realize its full potential • $$$

  5. Software • Test speed of software (benchmark) • Rules for benchmarking • First rule of benchmarking: • The only benchmark that matters is your code!!! • SPEC, Vendor benchmarks are worthless (my opinion) • Always try to benchmark before buying a new architecture • Benchmarking resources • Your friends • Web • Supercomputing centers • Testdrive.hp.com (Alpha, Pentium, Itanium) • Buy one
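
To make the "benchmark your own code" rule concrete, here is a minimal wall-clock timing sketch in C (not from the original slides; the triad loop and the gettimeofday-based timer are stand-ins for whatever code and timer you actually use):

    /* bench.c - minimal wall-clock timing sketch; the triad loop is a
     * stand-in for whatever code you actually care about.
     * Example compile line (assumption): gcc -O2 bench.c -o bench */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define N 5000000

    int main(void)
    {
        double *a, *b, *c;
        struct timeval t0, t1;
        double secs;
        int i;

        a = malloc(N * sizeof(double));
        b = malloc(N * sizeof(double));
        c = malloc(N * sizeof(double));
        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        gettimeofday(&t0, NULL);
        for (i = 0; i < N; i++)        /* replace with your real kernel */
            a[i] = b[i] + 3.0 * c[i];
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("kernel time: %.6f s  (a[0]=%g)\n", secs, a[0]);

        free(a); free(b); free(c);
        return 0;
    }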

  6. Software - Benchmarking example • Performance • VASP – Alpha is better than Xeon • CP90 – Alpha and Xeon are the same • Alpha costs 4-5x as much

  7. Hardware • Hardware features • Memory speed • Interconnects (Front Side Bus) • Clock speed • 32 bit vs. 64 bit • Cache • Processor architecture • Understanding hardware can help to understand or predict speed

  8. Hardware [Diagram of hardware: two processors connected to two banks of memory through an interconnect (bus)]

  9. Memory and Front Side Bus • Don’t ignore memory and interconnects (FSB)! • Memory and Front Side Bus (FSB) speed make a difference in performance • Be careful when vendors upgrade components • FSB for Xeons lags behind Pentium 4 • FSB effects on a dual-processor machine • 1 job (+ 1 free processor) takes 1 hour • 2 jobs (no free processors) each take 1.25 hours • Bandwidth limitations!
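
The dual-processor numbers above can be reproduced in miniature. A hedged sketch, assuming a two-processor Linux box with gcc and pthreads available (array sizes, repeat counts, and the compile line are illustrative choices, not from the slides):

    /* fsb_contention.c - time one memory-bound sweep alone, then two
     * concurrently (one per processor), to see the shared-FSB slowdown.
     * Example compile line (assumption):
     *   gcc -O2 fsb_contention.c -o fsb_contention -lpthread */
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <sys/time.h>

    #define N (8 * 1024 * 1024)    /* ~64 MB per array: far larger than cache */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void *sweep(void *arg)
    {
        double *a = arg;
        double s = 0.0;
        int i, rep;
        for (rep = 0; rep < 10; rep++)
            for (i = 0; i < N; i++)
                s += a[i];                 /* streams through memory */
        return (void *)(long)(s != 0.0);   /* keep the compiler honest */
    }

    int main(void)
    {
        double *a, *b;
        pthread_t t1, t2;
        double start;
        int i;

        a = malloc(N * sizeof(double));
        b = malloc(N * sizeof(double));
        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 1.0; }

        start = now();
        sweep(a);
        printf("one sweep alone:         %.2f s\n", now() - start);

        start = now();
        pthread_create(&t1, NULL, sweep, a);
        pthread_create(&t2, NULL, sweep, b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("two sweeps concurrently: %.2f s\n", now() - start);

        free(a); free(b);
        return 0;
    }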

  10. Processor: clock speed • Defined as the rate at which the processor cycles (cycles per second) • Useful only when comparing within an architecture (Pentium to Pentium) • Useless when comparing across architectures • For VASP, an Alpha at 1.25 GHz is 2x as fast as a Xeon at 2.8 GHz • For VASP, an Itanium at 900 MHz is 1.6x as fast as a Xeon at 2.8 GHz • Many other factors matter • Example: instructions per clock cycle (IPC) also matter • Pentium 4 – 2 • Itanium “Madison” – 6
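
A rough illustration of why clock speed alone is misleading, using the clock rates and issue widths quoted above purely as peak figures (real code achieves far less than peak):

    peak instructions/second = clock rate x instructions per cycle (IPC)
    Pentium 4:  2.8 GHz x 2  =  5.6 x 10^9
    Itanium:    0.9 GHz x 6  =  5.4 x 10^9

The peak rates are comparable, yet the measured VASP speeds differ, which is why benchmarking your own code remains the only reliable guide.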

  11. Processor: 32 bit vs. 64 bit • Definitions • 32 bit – can represent a range of 2^32 integers • 64 bit – can represent a range of 2^64 integers • Does not mean 64 bit is automatically faster or better! • Advantages of 64 bit • High-memory applications • Each integer can address a location in memory • 2^32 = 4x10^9, or 4 G • 2^64 ≈ 1.8x10^19, or about 18 billion G • 32 bit can access > 4G with OS tricks, but slow • Applications with a large range of numbers • Scientific computing • Cryptography • 32 bit can handle 2^64 with compiler tricks, but slow
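
A small C sketch (an illustration, not from the slides) that prints the pointer width and the corresponding address-range limits on whatever machine it is compiled for:

    /* wordsize.c - what "32 bit vs. 64 bit" means on your machine.
     * Example compile line (assumption): gcc -O2 wordsize.c -o wordsize */
    #include <stdio.h>
    #include <limits.h>
    #include <stdint.h>

    int main(void)
    {
        printf("pointer size:        %d bits\n",
               (int)(sizeof(void *) * CHAR_BIT));
        printf("max unsigned 32-bit: %lu (~4 x 10^9, a 4 GB address space)\n",
               (unsigned long)UINT32_MAX);
        printf("max unsigned 64-bit: %llu (~1.8 x 10^19)\n",
               (unsigned long long)UINT64_MAX);
        return 0;
    }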

  12. Processor: Cache [Diagram of hardware: each processor has a fast cache next to it; the path through the interconnect (bus) to memory is slow]

  13. Processor: Cache • Cache • Bypass slow interconnect and memory • Reduce access time to information • Reduce bandwidth requirements to memory • L2 vs. L3 • Lower L[n] means closer to the processor, more potential for improvement • Effects • Faster code • Superlinear speedup in parallel code • Examples • Xeon 3.06 GHz: 512k L2, 1MB L3 • Opteron: 1MB L2 • Itanium “Madison”: 6MB L3 • Alpha: 16 MB L2 • [Diagram: processor, L2 cache, L3 cache, memory]
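
A minimal C sketch of the cache effect described above: the same sum, walked through memory in two different orders. The array size and compile line are illustrative assumptions:

    /* cache_order.c - the same arithmetic, two memory-access orders.
     * Row-major order walks memory contiguously and reuses cache lines;
     * column-major order on a C array strides through memory and misses.
     * Example compile line (assumption): gcc -O2 cache_order.c -o cache_order */
    #include <stdio.h>
    #include <sys/time.h>

    #define N 2000          /* 2000 x 2000 doubles = ~32 MB, bigger than cache */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static double a[N][N];
        double t, s = 0.0;
        int i, j;

        t = now();
        for (i = 0; i < N; i++)        /* cache-friendly: unit stride */
            for (j = 0; j < N; j++)
                s += a[i][j];
        printf("row-major sweep:    %.3f s\n", now() - t);

        t = now();
        for (j = 0; j < N; j++)        /* cache-hostile: stride of N doubles */
            for (i = 0; i < N; i++)
                s += a[i][j];
        printf("column-major sweep: %.3f s\n", now() - t);

        return (int)(s != 0.0);        /* keep the compiler from removing the loops */
    }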

  14. Hardware • Many processor features can influence speed • Effect on speed will depend on software • There is no substitute for benchmarking

  15. Purchasing Strategies • Don’t forget to ask your friends • How much did they pay? • Which vendors? • How reliable? • Picking vendors • Know your group • How many students? • How many machines? • Know the differences between vendors • Vendor A vs. Vendor B • Hardware: Repair on site vs. send it back • Memory: Next day air replacement vs. send it back • Diagnosing problems: Motherboard lights vs. send it back • Rack Rails: Snap in vs. Screw in • Problem rate: 2/16 machines (9%) vs. 5/24 machines (21%) • Machine cooling: 5 fans vs. 2 fans • Cost: Vendor A is $550 more per node, 1/6th more!

  16. Purchasing Strategies • Beware new hardware • 3 points of failure: hardware, compiler, software • Case study 1: Pentium II Xeons (1998) (donation) • Operating system? • Windows was slow • Linux was buggy • Compilers were new, no standards • Software (VASP) did not have Pentium support • Case study 2: Itanium I 600 (2000) on testdrive.hp.com • Processors were slower than expected • The Intel compiler behaved differently on Itanium and Xeon • Math libraries (MKL) had bugs • Software (VASP) did not have Itanium support • Sometimes, it’s better to let somebody else be the guinea pig

  17. Purchasing Strategies - Examples • Buying Xeons • Quotation from Vendor A : ~$4500. • Quotation from Vendor B: ~$3000! • Go back to Vendor A, Vendor A lowers price to $3000 • This is extreme, but you should price shop. • SW Technologies http://www.swt.com gives prices of cheap Xeons. • Ask your friends what they paid. • Buying Alphas • Quotation from Vendor A : ~$12,500 • Threaten to buy all Xeons! • New quotation from Vendor A: $11,000

  18. Parallel computing • Moore’s law is slowing down • Source: http://www.nersc.gov/~simon/cs267/

  19. Parallel computing • Even with Moore’s law, at best we can only double system size every two years (with N scaling) • Parallel computing • Advancements in hardware • SMP machines • More processors/machine • Networking of Intel-type machines • Myrinet • Gigabit is cheaper • Advancements in software • MPICH and LAM are more robust • Your favorite code is probably parallel now • Cost • Usually cheaper (can be 50%). Some costs (cooling, power) usually covered by school or lab

  20. Parallel computing Hardware • Networking Hardware • Fast Ethernet (100 Mbits/s) • Gigabit (1000 Mbits/s) • Myrinet • Quadrics, Infiniband, etc… • Definition of terms • Latency – fixed time cost per packet (e.g. deciding where to send it) • Low latency is good for many small packets • Bandwidth – how fast does it transmit? • Maximum switching capacity – the maximum volume the switch can handle (relevant for gigabit)
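
A tiny C sketch of the usual transfer-time model, time = latency + size / bandwidth. The latency and bandwidth constants below are placeholders, not measured values for any of the interconnects listed; plug in what your own ping-pong test reports:

    /* net_model.c - transfer time = latency + message size / bandwidth.
     * Shows why small messages are latency-dominated. */
    #include <stdio.h>

    int main(void)
    {
        double latency = 50e-6;                  /* seconds per message (placeholder) */
        double bandwidth = 100e6;                /* bytes per second (placeholder)    */
        double sizes[] = { 8, 1024, 1048576 };   /* 8 B, 1 KB, 1 MB messages          */
        int i;

        for (i = 0; i < 3; i++) {
            double t = latency + sizes[i] / bandwidth;
            printf("%10.0f bytes: %.6f s  (%.0f%% of it is latency)\n",
                   sizes[i], t, 100.0 * latency / t);
        }
        return 0;
    }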

  21. Parallel computing Hardware • Buy a vendor architecture • 8-16 processors on each machine • Examples: HP GS160, HP GS320, IBM Power 4 • Advantages • Less sysadmin work • More reliable • Easier in every way • Division of machine into OS partitions (more for businesses) • Disadvantages • Cost – ~$500,000 vs. $50,000-$150,000 • Can pay for sysadmins instead

  22. Parallel computing Hardware • Gigabit • Pricing • Cards are often free (standard) • Switches are moderately expensive, and falling • Few ports = cheap. Pricing does not scale well to >60 ports. • Latency • Moderate. Depends on switch and packet size • Be careful of switching capacity!! Make sure to buy a switch that is made for high performance computing, not routing. • Brands: • Foundry • Extreme • Cisco

  23. Parallel computing Hardware • Myrinet • Pricing • Total ~$1100 a port (http://www.myri.com) • Linear scaling up to 128 ports. • Latency • Lowest latency • Needs setup of drivers (not too bad, but…) • Easy to expand • Best performance for large number of processors (at highest price)

  24. Parallel computing Hardware • Remember the first rule of benchmarking. Example • PWSCF or ABINIT parallelize over k-points • Little communication • Drawback – needs a lot of memory, and needs kpoints > processors • No need for either gigabit or Myrinet • VASP parallelizes over plane waves • A lot of communication • Reduces memory usage • Gigabit or Myrinet is essential • Know your code and how you will use it!

  25. Parallel computing Software • PVM Parallel Virtual Machine • MPI Message Passing Interface • LAM http://www.lam-mpi.org • Designed for TCP/IP (clusters) • Performance (?) • MPICH http://www-unix.mcs.anl.gov/mpi/mpich/ • Stack architecture = flexibility. Not just TCP/IP • More popular • Slightly easier to use • Both MPICH and LAM can coexist. Pick the one you like.
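
For reference, a minimal MPI program in C; it builds the same way under MPICH or LAM (the mpicc/mpirun commands are the usual wrappers, but paths and launch details vary by installation):

    /* hello_mpi.c - minimal MPI sketch; works with either MPICH or LAM.
     * Example compile/run (assumption, varies by install):
     *   mpicc hello_mpi.c -o hello_mpi
     *   mpirun -np 4 ./hello_mpi        (LAM needs lamboot first) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }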

  26. Compilers • Often overlooked • Compilers can increase speed 10-100% • Compilers are cost-effective • A compiler may cost $500 • Buying hardware to get the same 10-100% speedup can cost $200-$2000/machine! • Disadvantages • Each compiler is different – code may need altering for each compiler • Students hate compiling codes
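
A hedged illustration of "the compiler matters": one vectorizable C kernel and the kind of command lines you might try with different compilers. The flags shown are common examples, not recommendations; check each compiler's documentation and re-benchmark after every change:

    /* flags_demo.c - the same source, built with different compilers/flags.
     * Example command lines (assumptions, verify against your compiler docs):
     *   gcc  -O3 -march=pentium4  flags_demo.c -o flags_demo
     *   icc  -O3 -xW              flags_demo.c -o flags_demo
     *   pgcc -fast                flags_demo.c -o flags_demo */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double s = 0.0;
        int i, rep;

        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }
        for (rep = 0; rep < 100; rep++)
            for (i = 0; i < N; i++)
                s += a[i] * b[i];          /* dot product: vectorizable loop */
        printf("s = %g\n", s);
        return 0;
    }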

  27. Compilers – gcc (2.95.3, 3.3) • Available at http://gcc.gnu.org • Advantages • Free • Portable • Wide base of users • Newer versions produce fast code • Disadvantages • Poor Fortran support

  28. Compilers – Intel • Available at • http://www.intel.com/software/products/compilers/flin/noncom.htm • http://www.intel.com/software/products/compilers/clin/noncom.htm • Advantages • Free (academia) • Wide base of users (more so for Fortran) • FAST code on Intel chips. Reported fast code for AMD chips • Disadvantages • Harder to use (my opinion) • “Character” of different versions • No Red Hat 9 support

  29. Compilers – Portland, and others • Pricing info at http://www.pgroup.com/pricing/ae.htm • Advantages • Works for all platforms • Robust • Disadvantages • Some cost • Not as fast • Other compilers • NAG • Fujitsu • Absoft

  30. Math Libraries • BLAS/LAPACK • Intel MKL - http://www.intel.com/software/products/mkl/ • ATLAS - http://math-atlas.sourceforge.net • K. Goto’s BLAS - http://www.cs.utexas.edu/users/flame/goto/ • FFTW (http://www.fftw.org) • Vendor only • HP/Compaq cxml • IBM essl • SGI scsl
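
A small C sketch of calling DGEMM through the CBLAS interface that ATLAS and MKL both provide (header name and link line vary by library; the ATLAS link line in the comment is an assumption):

    /* dgemm_demo.c - matrix multiply via CBLAS.
     * Example link line with ATLAS (assumption):
     *   gcc dgemm_demo.c -o dgemm_demo -lcblas -latlas */
    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        enum { N = 2 };
        double A[N*N] = { 1, 2, 3, 4 };
        double B[N*N] = { 5, 6, 7, 8 };
        double C[N*N] = { 0, 0, 0, 0 };

        /* C = 1.0 * A * B + 0.0 * C, all matrices stored row-major */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    N, N, N, 1.0, A, N, B, N, 0.0, C, N);

        printf("C = [ %g %g ; %g %g ]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }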

  31. Disk Storage • Should be done on a RAID (Redundant Array of Inexpensive Disks) • RAID configuration provides fault tolerance • Different types of RAID • RAID 1 (mirroring) - 2 disks (two 100 G disks = 100G of data) • RAID 5 – 3+ disks (three 100G disks = 200G data, four 100G disks = 300G data, etc…) • Implemented within software or hardware • Disk type SCSI or IDE • May take one day to set up. Can save your hide!!!

  32. What type of RAID should I use? • Software or Hardware? • Software RAID is free • Hardware RAID has better performance (especially with more clients), but costs $$$. Usually can buy a PCI card and some cables. • SCSI or IDE disks? • IDE is cheap • SCSI is $$$, but better performance. Most believe better quality. • SATA disks are another alternative. • Costs • Hardware/SCSI can cost 3x more • Don’t forget cost of computer to house disks • My recommendation • Hardware/SCSI • Graduate students hate to do sysadmin tasks. • Graduate students tend to be lax with sysadmin tasks • Force your students to delete old files/use gzip • Hardware/IDE – If you need 100’s of G of storage

  33. Backups • “You’re only as good as your last backup” – ancient computing proverb • MIT-TSM backups http://web.mit.edu/is/help/tsm/quickstart.html • $7.50 a month • Unlimited storage (rsync) – limited only by restore speed • With scripts, can back up every day • Disk mirroring with rsync • Buy a few cheap IDE disks • Use an old machine • Tape backups • “You’re only as good as your last restore” – modern computing proverb

  34. Further reading • 32 vs. 64 bit • Good article: http://www.arstechnica.com/cpu/03q1/x86-64/x86-64-1.html • Courses on supercomputers (recommended) • Berkeley: http://www.nersc.gov/~simon/cs267/ • Buffalo: http://www.ccr.buffalo.edu/content/education.htm#courses • Building a Beowulf • Ron Choy @ MIT: http://www.mit.edu/people/cly/beowulf.ppt • ROCKS, “automatic” install of a Beowulf cluster: http://www.x2ca.com/articles/ICCS2003.pdf • Parallel computing/supercomputing links • Parascope: http://www.computer.org/parascope/ • Nan’s page: http://www.cs.rit.edu/~ncs/parallel.html • Top 500: http://www.top500.org/

  35. Conclusions • Hardware understanding can help you make an intelligent decision • Nothing beats a benchmark of your code • Don’t forget the compiler and math libraries • Consider your parallel computing options • Be sure to implement fault-tolerant systems (RAID and backups)
