Parallel Computers: Past and Present Yenchi Lin Apr 17, 2003
Outline • Concepts/Background on Parallel Computers • Connection Machines • Earth Simulator • Conclusion
Quick architecture overview • SIMD, MIMD • Shared memory, distributed memory • MPP, PVP, SMP • NOW • Network of Workstations (clusters)
SIMD, MIMD • SIMD – Single Instruction, Multiple Data • All processors perform the same instruction on different pieces of data • Some processors can be masked out of executing certain instructions • MIMD – Multiple Instruction, Multiple Data • Each processor executes a different instruction on different data
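To make the SIMD model concrete, here is a minimal C sketch of lockstep execution with masking, loosely in the spirit of the CM-2's per-processor flags. NPROC, the data values, and the mask are illustrative, not from any real machine's API.

/* Sketch only: emulating SIMD execution with a mask.
 * Every "processor" sees the same instruction; masked
 * processors sit the cycle out. */
#include <stdio.h>

#define NPROC 8  /* pretend we have 8 lockstep processors */

int main(void) {
    int data[NPROC] = {1, 2, 3, 4, 5, 6, 7, 8};
    int mask[NPROC] = {1, 1, 0, 1, 0, 1, 1, 0}; /* 0 = masked out */

    /* One SIMD "instruction": every active processor doubles
     * its own data element. */
    for (int p = 0; p < NPROC; p++)
        if (mask[p])
            data[p] *= 2;

    for (int p = 0; p < NPROC; p++)
        printf("proc %d: %d\n", p, data[p]);
    return 0;
}

In a MIMD machine there is no such lockstep: each processor runs its own instruction stream, which the SPMD sketch later in this deck illustrates.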
Memory • Shared Memory • Single, unified address space across all processors • Distributed Memory • Each processor has its own address space • Hybrid • Multiple processors within a computing node share the same address space, while the whole system has many different address spaces.
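A minimal sketch of the shared-memory model, using OpenMP as a stand-in: every thread reads the same address space, so no explicit data movement is needed. In a distributed-memory system each process would instead own a private slice and exchange the rest via messages (e.g. MPI); the hybrid model combines the two. All names and sizes here are illustrative.

/* Sketch only: shared-memory parallelism with OpenMP.
 * Compile with: cc -fopenmp shared_demo.c */
#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[1000], sum = 0.0;
    for (int i = 0; i < 1000; i++) a[i] = 1.0;

    /* Every thread sees the same array "a"; the reduction
     * clause handles the concurrent updates to "sum". */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}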
Processors • PVP – parallel vector processors • Cray, NEC, Hitachi • MPP – massively parallel processors • Connection Machines • SMP – symmetric multiprocessors • Sun Fire, DEC (Compaq/HP) AlphaServer
Trends • [chart from D.E. Culler, J.P. Singh, and A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”]
Trends (cont.) • The trend of MPPs overtaking SMPs has continued as the number of NOWs (clusters) in the TOP500 list grows. [chart from D.E. Culler, J.P. Singh, and A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”]
Connection Machines • Invented by Danny Hillis of Thinking Machines Corp. while at MIT • Originally designed to run artificial intelligence applications • First working application on CM-1: the Game of Life • CM-1 (1985), CM-2 (1986) and CM-5 (1992) • Richard Feynman helped build the first CM-1s • At its peak, 70 machines were installed around the world, all of them in the TOP500 list • Thinking Machines Corp. filed for bankruptcy in 1994, became a pure software company in 1996, and was bought by Oracle in 1999
CM-2 – 1986 • SIMD • Hypercube connection • 1-bit processors in groups of 16 • 8-dimensional hypercube for the 8192-processor configuration, 12-dimensional for the 65536-processor configuration • Programming languages – C*, *Lisp, CM Fortran
Sprint Node in CM-2 • 1-bit serial processors • 16 in a group, two groups on the board • The two groups share the same memory and floating-point unit • The router has limited processing power • 12-degree connectivity!
Hypercube Connection in CM-2 • Maximum hop count in a hypercube = the dimension of the hypercube • The router randomly picks the next hop (see the sketch below) • High wire count • [figure: four-dimensional hypercube]
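A small C sketch of these two properties: node IDs on a hypercube differ by one bit per link, so the hop count between two nodes is the number of differing ID bits, which is at most the dimension; and a router can pick the next hop by flipping a randomly chosen differing bit. The node IDs below are illustrative.

/* Sketch only: routing on a d-dimensional hypercube. */
#include <stdio.h>
#include <stdlib.h>

/* Hops = popcount of XOR: each hop corrects one differing bit. */
int hops(unsigned src, unsigned dst) {
    unsigned diff = src ^ dst;
    int n = 0;
    while (diff) { n += diff & 1u; diff >>= 1; }
    return n;
}

/* Random routing: flip one randomly chosen bit in which src and
 * dst still differ. Assumes src != dst. */
unsigned random_next_hop(unsigned src, unsigned dst, int dim) {
    unsigned diff = src ^ dst;
    int choices[32], n = 0;
    for (int b = 0; b < dim; b++)
        if (diff & (1u << b)) choices[n++] = b;
    return src ^ (1u << choices[rand() % n]);
}

int main(void) {
    /* 12-dimensional hypercube, as in the 65536-processor CM-2. */
    unsigned src = 0x000, dst = 0xABC;
    printf("hops from %03x to %03x: %d (max %d)\n",
           src, dst, hops(src, dst), 12);
    printf("one random next hop: %03x\n",
           random_next_hop(src, dst, 12));
    return 0;
}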
CM-5 – 1992 • Distributed-memory multiprocessor • SPARC + custom vector units • Fat-tree structure • Programming languages – C*, *Lisp, CM Fortran, HPF, C++, etc. • Supports partitioning and multiple users
Processing Element in CM-5 • 33 MHz SPARC • Vector processor • Network interface • 32 MB memory • Connected using the Sun MBus • Network access is treated the same as memory access – expensive for larger messages
Fat-Tree of CM-5 • Three networks – data, control and diagnostic – synchronized to a 40 MHz clock • 4-ary fat tree with each processor as a leaf • Two parents per child for the first two levels • Four parents per child for higher levels • [figure: data network of CM-5]
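Using only the multiplicities stated on this slide, here is a toy C tally of how the number of parent links per child grows with tree level. Counting levels from the leaves is an assumption, and this is a sketch of the stated pattern, not the real CM-5 wiring.

/* Sketch only: per-child link multiplicity in the CM-5 data
 * network, per the slide: two parents per child for the first
 * two levels, four per child above that. */
#include <stdio.h>

int parents_per_child(int level) {      /* level 0 = leaves */
    return (level < 2) ? 2 : 4;
}

int main(void) {
    for (int level = 0; level < 5; level++)
        printf("level %d: %d parent links per child\n",
               level, parents_per_child(level));
    return 0;
}

The extra parent links are what keep bandwidth from thinning out toward the root, unlike an ordinary tree.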
Transition from CM-2 to CM-5 • 1-bit serial processors → SPARC processors with 64-bit vector units • SIMD → MIMD • SPMD used to emulate SIMD behavior (see the sketch below) • Hypercube → fat tree • Randomness preserved by randomized routing
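A minimal sketch of SPMD emulating SIMD masking, written with MPI rather than the CM languages (C*, *Lisp) since those are no longer common: every rank runs the same program on its own data, and a software mask decides which ranks execute the "instruction". The mask condition and variable names are illustrative.

/* Sketch only: SPMD emulation of SIMD behavior with MPI.
 * Compile with: mpicc spmd_demo.c */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int x = rank + 1;             /* each rank's private data */
    int active = (rank % 2 == 0); /* software mask: even ranks only */

    if (active)                   /* masked ranks skip the "instruction" */
        x *= 2;

    printf("rank %d: x = %d\n", rank, x);
    MPI_Finalize();
    return 0;
}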
Earth Simulator – 2002 • A collection of modified NEC SX-6 nodes • 640 nodes, 8 processors each • 12.3 GB/s × 2 network • Theoretical peak 40 TFLOPS • Maximum measured throughput of about 36 TFLOPS running Linpack
Programming Models of ES • MPI/HPF at the node and process level • OpenMP, threads • Automatic vectorization • (see the hybrid sketch below)
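A minimal sketch of this hybrid style, assuming MPI across nodes, OpenMP across the eight APs within a node, and a stride-1 inner loop left for the compiler's auto-vectorizer. The array sizes and names are illustrative, not taken from any ES code.

/* Sketch only: hybrid MPI + OpenMP + vectorizable loop.
 * Compile with: mpicc -fopenmp hybrid_demo.c */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 4096

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* OpenMP splits the loop across the processors of one node;
     * the simple stride-1 body is what an auto-vectorizing
     * compiler maps onto the vector pipelines. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++)
        local += a[i] * b[i];

    /* MPI combines the per-node partial sums across nodes. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) printf("dot product = %f\n", global);

    MPI_Finalize();
    return 0;
}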
Organization of ES • 320 processor-node (PN) cabinets, 2 nodes each • 65 interconnection-network (IN) cabinets • Crossbar connecting all 640 nodes • 12.3 GB/s × 2 (bidirectional) node-to-node; 640 × 12.3 GB/s ≈ 8 TB/s aggregate • 900 TB disk space, 1.6 PB tape storage
PN of ES • [node diagram: Arithmetic Processors (SX-6) and Memory (512 MB)]
Arithmetic Processor • Total of 640 × 8 = 5120 arithmetic processors
Remarks • Initial cost: • Development: 40 billion yen (≈ USD $400M) • Physical building: 7 billion yen (≈ USD $70M) • Operating cost: • Maintenance: 8 billion yen/year (≈ USD $80M) – about USD $2.54 per second ($80M divided by the ~31.5 million seconds in a year) • Electricity: 800 million yen/year (≈ USD $8M)
Eye Candies • [photos: an SX-6i single-AP unit; a PN cabinet; the back of a PN cabinet]
Conclusion • The Connection Machines were a bold early MPP design • The Earth Simulator shows how far vector-parallel machines have come since • Early designs versus recent designs: GigaFLOPS vs. TeraFLOPS • When will Americans take back the crown in supercomputing?
References • TOP500 – http://www.top500.org/ORSC/ • Earth Simulator – http://www.es.jamstec.go.jp/ • http://ails.arc.nasa.gov/Images/InfoSys/AC93-0146-2.html • http://ails.arc.nasa.gov/Images/InfoSys/AC90-0563-7.html • http://archive.ncsa.uiuc.edu/Pubs/TechReports/TR023/Summary.html • http://www.netlib.org/benchmark/top500/reports/report94/Architec/node32.html • http://mission.base.com/tamiko/cm/cm-text.htm • http://www.longnow.org/about/articles/ArtFeynman.html • D.E. Culler, J.P. Singh, and A. Gupta. “Parallel Computer Architecture: A Hardware/Software Approach.” 1999 • J.L. Hennessy and D.A. Patterson. “Computer Architecture: A Quantitative Approach,” 3rd ed. 2002 • D.J. Kerbyson, A. Hoisie, and H.J. Wasserman. “A Comparison Between the Earth Simulator and AlphaServer Systems Using Predictive Application Performance Models.” 2002 • Thinking Machines Corp. “The Network Architecture of the Connection Machine CM-5.” 1992 • G.E. Blelloch et al. “A Comparison of Sorting Algorithms for the Connection Machine CM-2.” 1991