
Tera MTA (Multi-Threaded Architecture)

Presentation by Thriveni Movva (CMPS 5433).


Presentation Transcript


  1. Tera MTA (Multi-Threaded Architecture) • Thriveni Movva (CMPS 5433)

  2. Presentation Contains • Evolution of Tera MTA • Design goals of Tera MTA • Tera MTA Architecture • Interconnection Network • Applications • Advantages & Drawbacks • Current MTA Status

  3. Evolution of Tera MTA • 1987: Tera Computer Company was established by Burton Smith in Washington, USA • 1988: Software development starts • 1991: Hardware development starts • 1997: First MTA-1 shipment to SDSC (San Diego Supercomputer Center)

  4. Tera MTA: Design Goals • To solve the two major problems then faced by high-performance parallel computers: scalability and programmability • To be suitable for very high-speed implementations • To be applicable to a wide spectrum of problems • To ease compiler implementation • To overcome the von Neumann bottleneck (the limited rate at which data can move between processor and memory)

  5. About Tera MTA • The Tera MTA is a high-performance system with: • scalar multithreaded processors with synchronization among threads • uniform-access shared memory, i.e., all data is accessible with equal ease (no locality, no cache, no mapping) • simple programming • zero-cost context switching

  6. About Multi-Threaded Architecture (MTA) • Uses a technique called multithreading that lets multiprocessors share memory without using caches • Because each processor always has other threads to run, it stays almost constantly busy instead of waiting on slow memory accesses, even in systems with thousands of processors • Multithreading allows each processor to switch thread contexts between execution cycles, and as a result the processor stays busy • Whenever a processor starts a slow memory or I/O instruction, rather than waiting tens of cycles for the stalled instruction to complete, the processor executes its next instruction from a different thread using different registers • Each processor has many copies of the program and pipeline control registers, one copy for each execution thread that it can support

  7. Tera MTA Overview • Up to 256 processors, each running at 260 MHz • Up to 128 active threads per processor • Up to 256 I/O processors • Peak performance of 256 GFlop/s • Processors and memory modules populate a sparse 3-D torus interconnection network • 4096 interconnection network nodes • Flat, shared main memory ranging from 16 to 512 GB • Cost: $5 million to $40 million

  8. A View of the Tera Multiprocessor

  9. Key Architecture Details • Each MTA processor has 128 “streams”, each of which is hardware (including 32 registers and a program counter) devoted to running a single thread of control • The processor executes instructions from the streams that are not blocked, in fair round-robin fashion • A stream can issue an instruction only every 21 cycles (the length of the instruction pipeline), so at least 21 ready threads are required to keep a processor fully busy • The processor makes a context switch on each cycle, choosing the next instruction from one of the streams that is ready to execute • The “rich” interconnection network is designed so that delays caused by references to data in memory can be hidden • Randomized memory mapping and a highly connected network provide near-uniform access time from any processor to any memory location
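
The 21-cycle issue rule on slide 9 can be illustrated with a toy simulation. This is a minimal sketch, not a model of the real hardware: it assumes every stream is always ready apart from the pipeline spacing (no memory stalls), and the function name `utilization` and the cycle count are made up for illustration.

```python
PIPELINE_DEPTH = 21  # cycles between successive issues from one stream (slide 9)

def utilization(ready_streams: int, cycles: int = 21_000) -> float:
    """Fraction of cycles in which the processor issues an instruction.

    Toy model: streams never block on memory; a stream that issues at
    cycle c may issue again no earlier than cycle c + PIPELINE_DEPTH.
    """
    next_ready = [0] * ready_streams  # earliest cycle each stream may issue
    issued = 0
    cursor = 0                        # round-robin starting point
    for cycle in range(cycles):
        # Scan the streams in round-robin order and issue from the first ready one.
        for offset in range(ready_streams):
            s = (cursor + offset) % ready_streams
            if next_ready[s] <= cycle:
                next_ready[s] = cycle + PIPELINE_DEPTH
                issued += 1
                cursor = (s + 1) % ready_streams
                break
    return issued / cycles

for n in (1, 7, 21, 128):
    print(n, round(utilization(n), 3))
```

Under these assumptions a single stream keeps the processor busy only 1/21 of the time, and utilization reaches 1.0 once 21 or more streams are ready, which is the point the slide makes.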

  10. Key Architecture Details • Hardware multithreading is used to tolerate high memory latency, which is typically on the order of 150 clock cycles • Expected benefits of the MTA include high processor utilization, near-linear scalability, and reduced programming effort, especially compared to distributed-memory machines using explicit message passing • The current MTA interconnection network is a 3-D toroidal mesh

  11. Tera MTA's Interconnection Network • The interconnection network is a three-dimensional, sparsely populated torus of pipelined packet-switching nodes, each of which is linked to some of its neighbors • Each link can transport a packet containing source and destination addresses, an operation, and 64 data bits in both directions simultaneously on every clock tick • Some of the nodes are also linked to resources, i.e., processors, data memory units, I/O processors, and I/O cache units • Instead of locating the processors on one side of the network and the memories on the other, the resources are distributed more or less uniformly throughout the network

  12. Tera MTA's Interconnection Network • The interconnection network of one 256-processor Tera system contains 4096 nodes arranged in a 16 × 16 × 16 toroidal mesh • As the Tera architecture scales to larger numbers of processors p, the number of network nodes grows as p^(3/2) rather than as the p log p associated with the more commonly used multistage networks. For example, a 1024-processor system would have 32,768 nodes
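
The p^(3/2) rule on slide 12 can be checked against both figures the slide gives. This is a small sketch; the function name `network_nodes` is hypothetical, and it assumes the processor count is a perfect square so that the torus side length is the integer sqrt(p).

```python
import math

def network_nodes(processors: int) -> int:
    """Network nodes for p processors, using the slide's p^(3/2) growth rule.

    Assumption for this sketch: p is a perfect square, so the torus is
    side x side x side with side = sqrt(p).
    """
    side = math.isqrt(processors)
    if side * side != processors:
        raise ValueError("sketch assumes a perfect-square processor count")
    return side ** 3

print(network_nodes(256))   # 4096  (16 x 16 x 16)
print(network_nodes(1024))  # 32768 (32 x 32 x 32)
```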

  13. Multithreading on one processor (figure; label: unused streams)

  14. Multithreading on multiple processors

  15. Latency Tolerance in Tera MTA • The latency incurred in memory references is hidden by multithreading • As there may be up to 128 instruction streams (threads) per processor, and each stream can issue up to 8 memory references without waiting for the preceding ones to complete, a latency of 128 × 8 = 1024 cycles can be tolerated • This lookahead allows threads to achieve peak performance • Three operations, a memory operation (M), an arithmetic operation (A), and a control operation (C), can be executed simultaneously per instruction per processor
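
The arithmetic behind the 1024-cycle figure on slide 15 can be written out as follows. This is a simplified sketch under stated assumptions: the processor issues one instruction per cycle, and each stream contributes up to 8 instructions before it must wait. The function names are made up for illustration, and the 150-cycle latency is the typical value quoted on slide 10.

```python
import math

STREAMS_PER_PROCESSOR = 128  # from slide 15
LOOKAHEAD_REFS = 8           # memory references a stream may have in flight (slide 15)

def tolerated_latency(streams: int = STREAMS_PER_PROCESSOR,
                      refs_per_stream: int = LOOKAHEAD_REFS) -> int:
    """Cycles of latency covered if every stream uses its full lookahead."""
    return streams * refs_per_stream

def streams_needed(latency_cycles: int,
                   refs_per_stream: int = LOOKAHEAD_REFS) -> int:
    """Streams required to cover a given latency in this simplified model."""
    return math.ceil(latency_cycles / refs_per_stream)

print(tolerated_latency())      # 1024
print(streams_needed(150))      # 19 streams with lookahead of 8
print(streams_needed(150, 1))   # 150 streams without lookahead
```

In this model, lookahead is what lets far fewer than 128 streams cover a 150-cycle memory latency; without it, a processor with 128 streams could not hide that latency fully.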

  16. The Tera Idea: Higher investment in hardware yields improved utilization and reduces software overhead

  17. Tera MTA Applications • PULSE 3D, used for simulating real-time heartbeats to better treat heart diseases. • MSC Software’s NASTRAN, a structural analysis code used extensively by the automobile and aerospace industries. • Livermore Software's LS-DYNA, which can simulate physical occurrences such as car crashes and metal stamping. • GAUSSIAN 98, a computational chemistry application used in molecular modeling. • MPIRE (for Massively Parallel Interactive Rendering Environment), a powerful graphics and animation application that visualizes complex phenomena. • Used in seismic analysis, national security and weather forecasting.

  18. Advantages of Tera MTA • Tera MTA uses multiple contexts to hide latency • Tera machines perform a context switch every clock cycle • Both pipeline latency and memory latency are hidden in the Tera approach • Thread creation is very cheap • With 128 contexts per processor, a large number (2K) of registers must be shared finely between threads • As long as there is plenty of parallelism in user programs to hide latency, and plenty of compiler support, the performance is potentially very high • The advantages of Tera's architecture are available to users via minimal changes to their application code

  19. Drawbacks of Tera MTA • Performance is poor when parallelism is limited; in particular, single-context performance is guaranteed to be low • A large number of contexts demands many registers and other hardware resources, which in turn implies higher cost and complexity • The limited focus on latency reduction and caching means that a lot of slack parallelism is needed to hide latency, as well as a lot of memory bandwidth; both raise the cost of building the machine • Bandwidth (not latency) limits practical MTA system size, and large MTA systems will have expensive memory networks

  20. Tera MTA: Tools • Tera provides two powerful tools, Traceview and Canal, that allow the programmer to understand: • how the compiler has multithreaded a program • how effectively the program actually utilizes the hardware

  21. Customers • San Diego Supercomputer Center (SDSC) • Logicon, under a Naval Research Lab • Tera Computer Company

  22. Tera MTA Macro Architecture

  23. Problems Solved using Tera MTA • irregular memory access patterns • Synchronization among threads • load balancing

  24. Current Industry Status: Cray Inc (ex-Tera) • Cray Research: • 1972: Est. by Seymour Cray in Minnesota, USA • 1976: First Cray-1 shipment to Los Alamos • 1980s: Ship follow-on products (Cray XMP, Cray YMP, Cray-2) • 1990s: More follow-on products (Cray C90, Cray J90, Cray T3D, Cray T90, Cray T3E, Cray SV1) • 1996: Merged with Silicon Graphics (SGI) • Tera Computer: • 1987: Est. by Burton Smith in Washington, USA • 1988: Software development starts • 1991: Hardware development starts • 1997: First MTA-1 shipment to SDSC (San Diego Supercomputer Center) • 2000: Purchased Cray business unit from SGI • Cray Inc. (Nasdaq NM: CRAY): • Est.: April 1, 2000 (Tera Computer + Cray Research) • HQ: Seattle, WA, USA • Products: Supercomputers (Vector, Microprocessor, Multithread) • Market: Government, Industry, Academic Research

  25. Cray Inc. (2000–present; result of merger between Tera Computer and Cray Research) • Cray SX-6 • Cray MTA-2 • Cray SV1 • Cray Red Storm • Cray X1 • Cray XD1

  26. Cray MTA-2, Multi-Threaded Architecture • 128 virtual processors in a CPU module • Up to 1 TB scalable shared memory • Zero-overhead thread switching

  27. Cray MTA-2 Overview (figure: Cray MTA-2 multithreaded system)

  28. Unique capability of Cray MTA • Visualization of a nebula using the MPIRE application on a Cray MTA system

  29. References • http://www.hoise.com/vmw/00/articles/vmw/JH-VM-01-00-1.html • http://www.cs.njit.edu/pact/eight/tutorial/tera.html • http://techreports.larc.nasa.gov/icase/1998/icase-1998-interim33.pdf • http://www.bearcave.com/misl/misl_tech/venture_capital.html
