1. Supercomputers: New Perspective. Prof. Sin-Min Lee
Department of Computer Science
22. Multiple Processor Organization Single instruction, single data stream - SISD
Single instruction, multiple data stream - SIMD
Multiple instruction, single data stream - MISD
Multiple instruction, multiple data stream- MIMD
23. Single Instruction, Single Data Stream - SISD Single processor
Single instruction stream
Data stored in single memory
Uni-processor
24. Single Instruction, Multiple Data Stream - SIMD Single machine instruction
Controls simultaneous execution
Number of processing elements
Lockstep basis
Each processing element has associated data memory
Each instruction executed on different set of data by different processors
Vector and array processors
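To make the lockstep idea concrete, here is a minimal sketch in C using x86 SSE intrinsics (an assumption; the slides do not name any particular instruction set). A single add instruction operates on four floats at once, which is the essence of SIMD.

/* SIMD sketch: one instruction, four data elements (assumes an x86 CPU with SSE). */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load four floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* ONE instruction performs FOUR additions in lockstep */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);       /* prints 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}

Vector and array processors apply the same principle to much longer vectors, with each processing element holding its own slice of the data.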
25. Multiple Instruction, Single Data Stream - MISD Sequence of data
Transmitted to set of processors
Each processor executes different instruction sequence
Never been implemented
26. Multiple Instruction, Multiple Data Stream- MIMD Set of processors
Simultaneously execute different instruction sequences
Different sets of data
SMPs, clusters and NUMA systems
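A minimal MIMD sketch in C, assuming POSIX threads (not named in the slides): two threads stand in for two processors, each executing a different instruction sequence on a different data set at the same time.

/* MIMD sketch: different instruction streams, different data (assumes POSIX threads; compile with -pthread). */
#include <pthread.h>
#include <stdio.h>

static void *sum_task(void *arg) {            /* instruction stream 1 */
    int *data = (int *)arg;
    int total = 0;
    for (int i = 0; i < 4; i++) total += data[i];
    printf("sum = %d\n", total);
    return NULL;
}

static void *max_task(void *arg) {            /* instruction stream 2 */
    double *data = (double *)arg;
    double best = data[0];
    for (int i = 1; i < 4; i++) if (data[i] > best) best = data[i];
    printf("max = %.2f\n", best);
    return NULL;
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};
    double reals[4] = {2.5, 9.5, 4.0, 7.25};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, ints);    /* different code,  */
    pthread_create(&t2, NULL, max_task, reals);   /* different data   */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

SMPs, clusters and NUMA systems differ mainly in how such processors communicate, which the following slides classify.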
28. Taxonomy of Parallel Processor Architectures
30. MIMD - Overview General purpose processors
Each can process all instructions necessary
Further classified by method of processor communication
37. Tightly Coupled - SMP Processors share memory
Communicate via that shared memory
Symmetric Multiprocessor (SMP)
Share single memory or pool
Shared bus to access memory
Memory access time to given area of memory is approximately the same for each processor
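A minimal sketch of the shared-memory style of communication, assuming OpenMP (not mentioned in the slides): every thread reads and writes the same array in the single shared memory, so no explicit messages are needed.

/* SMP sketch: threads cooperate through shared memory (assumes OpenMP; compile with -fopenmp). */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* Each processor fills part of the SAME array in shared memory. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    /* Communication is implicit: the reduction combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f\n", sum);   /* N*(N-1) = 999999000000 */
    return 0;
}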
38. Tightly Coupled - NUMA Nonuniform memory access
Access times to different regions of memory may differ
39. Loosely Coupled - Clusters Collection of independent uniprocessors or SMPs
Interconnected to form a cluster
Communication via fixed path or network connections
40. Parallel Organizations - SISD
41. Parallel Organizations - SIMD
42. Parallel Organizations - MIMD Shared Memory
43. Parallel Organizations - MIMD Distributed Memory
46. Symmetric Multiprocessors A symmetric multiprocessor (SMP) has the following characteristics:
Two or more similar processors of comparable capacity
Processors share same memory and I/O
Processors are connected by a bus or other internal connection
Memory access time is approximately the same for each processor
All processors share access to I/O
Either through same channels or different channels giving paths to same devices
All processors can perform the same functions (hence symmetric)
System controlled by integrated operating system
providing interaction between processors
Interaction at job, task, file and data element levels
47. SMP Advantages Performance
If some work can be done in parallel
Availability
Since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth
User can enhance performance by adding additional processors
Scaling
Vendors can offer range of products based on number of processors
48. Block Diagram of Tightly Coupled Multiprocessor
49. History Cray Research founded in 1972.
Cray Computer founded in 1988.
1976 First product – Cray-1 (240,000,000 OpS). Seymour Cray personally invented vector register technology.
1985 Cray-2 (1,200,000,000 OpS, a 5-fold increase over the Cray-1). Seymour is credited with immersion-cooling technology.
The Cray-3 substituted revolutionary new gallium arsenide integrated circuits for the traditional silicon ones.
1996 Cray was bought by SGI
In March 2000 the Cray Research name and business was sold by SGI to Tera Inc.
50. Visual Tour
51. Market Segment (from top500)
52. Supercomputer Architecture
53. Current Cray Products The Cray X1 is Cray’s only product with a unique vector CPU
Competitors are: Fujitsu, NEC, HP
Cray XT3 and XD1 use AMD Opteron CPUs (series 100 and series 200 respectively)
You can find full product specifications as well as additional information on current systems at www.cray.com
54. Performance Measurements Performance is measured in teraflops
Linpack is a standard benchmark
Performance is also measured in memory bandwidth & latency, disk performance, interconnects, internal IO, reliability, and others
For example:
My home system, Athlon 750, gives about 34 megaflops (34*10^6 flops)
Current mid-range supercomputers give about 40 teraflops (40*10^12 flops), which is 1,176,470 times faster
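As a toy illustration of how a flops figure is obtained (this is not the real Linpack benchmark, which times the solution of a dense system of linear equations), the sketch below simply counts floating-point operations and divides by elapsed time; the array size and loop are arbitrary choices.

/* Toy flops measurement: operations performed / elapsed seconds (assumes a POSIX system for clock_gettime). */
#include <stdio.h>
#include <time.h>

#define N 5000000

int main(void) {
    static double x[N], y[N];
    for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        y[i] = y[i] + 3.0 * x[i];          /* 2 floating-point operations per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double flops = 2.0 * N / secs;
    printf("%.2e flops (%.1f megaflops), check y[0] = %.1f\n",
           flops, flops / 1e6, y[0]);      /* printing y keeps the loop from being optimized away */
    return 0;
}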
55. Scalable Architecture in XT-3
56. Is Cray a good deal? Typical cost is approximately $30 million and above
Useful lifetime – 6 years
Most customers use supercomputers at 90% - 98% load
Clustered supercomputers and machines built around common desktop components (AMD/Intel CPUs, memory chips, motherboards, etc.) are significantly cheaper
57. Future Cray’s “Red Storm” system at Sandia National Laboratories runs the Linux OS
Current Cost $90 million
Uses 11,648 AMD Opteron CPUs
Current operational speed – 41.5 teraflops
Uses unique SeaStar chip, which passes messages between thousands of CPUs
Upgrades are scheduled to be completed by the end of 2005 using dual-core Opteron
Expected to reach 100 teraflops by the end of 2005
64. We currently have two Cray T3E supercomputers, which are used to run the daily weather forecasts and to run large-scale climate studies. These are both Massively Parallel Processor (MPP) systems, which is to say that they contain a large number of processors (CPUs), each with its own separate portion of volatile RAM which serves as main memory. The individual portions of memory are relatively small, 128 MB per processor on one T3E and 256 MB on the other, but because of the number of processors involved this gives total memory sizes of 118 GB on the T3E-900 and 168 GB on the T3E-1200E respectively.
In order to run programs with very high memory requirements, it is necessary for the programmer to break down the forecast or climate data into smaller sections and distribute it across a number of processors. Each processor can access its own local memory with normal load/store operations, but data held on remote processors must be accessed using special software routines, such as the Message Passing Interface (MPI) or Cray's SHMEM system.
The T3Es run an operating system called Unicos/mk. This is based on the Unix operating system, but it has been extensively modified to allow it to run across a large number of processors simultaneously. Unicos/mk is best thought of as an interactive system which has the capacity to run batch work, rather than the other way round. The batch facilities are provided by an additional piece of Cray software called the Network Queueing System (NQS).
For reasons of efficiency, the majority of the workload on the Crays, including the weather forecasts, is run in batch mode. This allows us to ensure that the supercomputers are run at full capacity over weekends and holiday periods, as well as allowing us to allocate extra computing power to our scientists when it is not required to produce the daily forecasts.
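A minimal sketch of the message-passing style described above, using standard MPI calls (this is illustrative, not Met Office code): each process owns its own block of data in local memory, and a value held on a remote process has to be requested explicitly.

/* MPI sketch: remote data must be sent and received, not loaded directly. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 100.0 * rank;           /* each rank's value in its own local memory */

    if (size >= 2) {
        if (rank == 0) {
            double remote;
            /* Rank 0 cannot load rank 1's memory directly; it must receive it. */
            MPI_Recv(&remote, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %.1f from rank 1\n", remote);
        } else if (rank == 1) {
            MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;                              /* run with e.g. mpirun -np 2 ./a.out */
}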
66. We have a decision to make. We can rewrite our codes for each computer system we buy in order to make them run really fast and efficiently on that system; this costs us a lot of money in employing people to rewrite our code. Alternatively, we can write code that may not be as fast and efficient, but that runs well on many different systems. This is closer to the approach we actually take. In reality it is a bit of both, but as far as possible we aim for code that is highly portable to different computers.
Slide 32: Again a very complex idea to explain! Can't think off the top of my head of a good analogy to use here. I'll let you know if I can think of something...
Slide 34: Similar comments to slide 32.
67. Weather forecasting
105. Teacher’s note.
You could do this as a class exercise to show how parallel processing works.
Ask 12 pupils (if your class is smaller than 26 you can make this group smaller; as long as you have one person in this group you can demonstrate the concept). Call them Group A and ask each of them to write out the seven times table.
Ask another 12 pupils, call them Group B, to write out one line each of the seven times table. So Ann does 1*7,
Branwen does 2*7, Sabeen does 3*7, Lucy does 4*7, and so on.
Then have one pupil time the first person in Group A to finish writing out the tables.
Have another pupil time how long it takes for everybody in Group B to have written down their answers.
Group B should be much faster.
This is the idea behind Parallel Processing.
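The classroom exercise translates almost directly into code. Below is a small sketch, assuming OpenMP purely for illustration: the first loop is Group A (one worker writes every line in turn), the second is Group B (each line is handed to a different worker).

/* Parallel-processing demo: serial "Group A" versus parallel "Group B" (assumes OpenMP; compile with -fopenmp). */
#include <stdio.h>

int main(void) {
    int serial[12], parallel[12];

    /* Group A: one worker writes out all twelve lines, one after another. */
    for (int i = 1; i <= 12; i++)
        serial[i - 1] = 7 * i;

    /* Group B: up to twelve workers each write out one line at the same time. */
    #pragma omp parallel for
    for (int i = 1; i <= 12; i++)
        parallel[i - 1] = 7 * i;

    /* Both groups produce the same table; Group B just finishes sooner when enough workers are available. */
    for (int i = 1; i <= 12; i++)
        printf("%2d x 7 = %2d (serial: %2d)\n", i, parallel[i - 1], serial[i - 1]);
    return 0;
}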
108. References http://research.microsoft.com/users/gbell/craytalk/sld066.htm
http://inventors.about.com/library/inventors/blsupercomputer.htm
http://americanhistory.si.edu/csr/comphist/cray.htm
http://web.mit.edu/invent/iow/cray.html
www.top500.org
http://www.spikynorman.dsl.pipex.com/CrayWWWStuff/
http://news.zdnet.co.uk/hardware/emergingtech/0,39020357,39162182,00.htm