This article discusses the communication interface of the Tile processor, including its on-chip interconnection architecture, hardware and software interfaces, and a performance comparison of its different communication mechanisms. The commonalities and differences between DaCS and MCAPI are also discussed.
2008-12-23 progress Wen-Long Yang
Outline • An example of many-core communication – the Tile processor • The interface of the micro-kernel • Commonalities & differences between DaCS & MCAPI
Overview • On-chip Interconnection Architecture of the Tile Processor • IEEE Micro, 2007 • Tile processor overview (TILE64 launched in 2007) • Developed by Tilera and inspired by MIT's Raw processor • Consists of a 2D grid of homogeneous, general-purpose compute elements, called tiles • The tiles are connected by 5 mesh networks that provide massive on-chip communication bandwidth • Supports DMA between the cores and between main memory & the cores • Each link consists of two 32-bit-wide unidirectional links • Each tile combines a processor, which implements a 3-way VLIW architecture, and its associated cache hierarchy with a switch • Each tile operates at 1 GHz • 4.8 MB of on-chip cache distributed among the tiles
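A quick back-of-the-envelope check of the figures above: with two 32-bit unidirectional links per connection clocked at 1 GHz, each link carries 4 bytes/cycle per direction. This is a sketch derived from the slide's numbers, not a measured figure:

```c
/* Per-direction link bandwidth implied by the slide:
 * 32 bits (4 bytes) per cycle per unidirectional link. */
unsigned long link_bw_bytes_per_sec(unsigned long clock_hz) {
    return 4UL * clock_hz;  /* 4 bytes/cycle * cycles/sec */
}
/* At 1 GHz: 4 * 1e9 = 4 GB/s per direction, per network. */
```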
Interconnect hardware • 5 networks: • Static network (STN) • Doesn't have a packetized format, but rather allows static configuration of the routing decisions at each switch • Lets applications send streams of data to another tile • User dynamic network (UDN) • I/O dynamic network (IDN) • Memory dynamic network (MDN) • Tile dynamic network (TDN) • Responsible only for tile-to-tile cache requests
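To make the static-routing idea concrete, here is a minimal sketch of how a per-switch routing decision could be derived for a 2D mesh. The function name and the dimension-ordered (X-then-Y) policy are illustrative assumptions, not the TILE64's actual configuration interface:

```c
/* Illustrative only: compute the next-hop port a switch at (x, y)
 * would be statically configured with, to forward toward (dx, dy),
 * using simple dimension-ordered (X-then-Y) routing. */
enum port { LOCAL, EAST, WEST, NORTH, SOUTH };

enum port next_hop(int x, int y, int dx, int dy) {
    if (dx > x) return EAST;    /* first correct the X coordinate */
    if (dx < x) return WEST;
    if (dy > y) return SOUTH;   /* then the Y coordinate (y grows down) */
    if (dy < y) return NORTH;
    return LOCAL;               /* arrived: deliver to this tile */
}
```

With the static network, such decisions are written into each switch ahead of time, so data streams flow without per-packet headers.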
Receive-side hardware demultiplexing • The UDN & IDN implement this functionality • Implemented by having several independent hardware queues with settable tags, which are used to identify packets
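The tag-matching mechanism can be sketched as follows. All names and sizes here are illustrative assumptions; the point is only that software sets a tag on each queue and arriving packets are steered to the queue whose tag matches:

```c
/* Sketch of receive-side demultiplexing: a few independent queues,
 * each with a software-settable tag. Sizes/names are hypothetical. */
#define NUM_QUEUES 4
#define QUEUE_DEPTH 8

struct demux_queue {
    unsigned tag;                 /* settable tag */
    int      count;
    unsigned words[QUEUE_DEPTH];
};

static struct demux_queue queues[NUM_QUEUES];

void set_tag(int q, unsigned tag) { queues[q].tag = tag; }

/* Steer an arriving word to the queue whose tag matches; returns the
 * queue index, or -1 if nothing matched (real hardware would instead
 * fall back to an interrupt or a catch-all queue). */
int demux_deliver(unsigned tag, unsigned word) {
    for (int q = 0; q < NUM_QUEUES; q++) {
        if (queues[q].tag == tag && queues[q].count < QUEUE_DEPTH) {
            queues[q].words[queues[q].count++] = word;
            return q;
        }
    }
    return -1;
}
```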
Software interfaces • They provide a C-based iLib library to support communication via the UDN • Raw channels: low overhead, but only a limited buffer size • Buffered channels: slightly higher overhead, but an effectively unlimited buffer • Socket-like channels • FIFO & point-to-point connections • Message passing • MPI-like • Besides unlimited buffering, it also provides a message key to identify each message • Receivers can process messages out of order • Messages are saved until the receivers are ready • Synchronization of the communication is managed by a messaging engine • Aims to allow more flexible communication mechanisms
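The key-based, out-of-order matching can be modeled in a few lines. This is a hedged sketch of the concept, not the real iLib API (function names and the fixed-size mailbox are assumptions):

```c
/* Sketch of key-based message matching: arriving messages are saved
 * with a key, and the receiver can fetch by key in any order. */
#define MAX_MSGS 16

struct msg { int key; int data; int valid; };
static struct msg mailbox[MAX_MSGS];

/* Save an arriving message until the receiver is ready. */
int msg_post(int key, int data) {
    for (int i = 0; i < MAX_MSGS; i++)
        if (!mailbox[i].valid) {
            mailbox[i].key = key;
            mailbox[i].data = data;
            mailbox[i].valid = 1;
            return 0;
        }
    return -1;  /* mailbox full */
}

/* Fetch the first buffered message with this key; -1 if none. */
int msg_receive(int key, int *data) {
    for (int i = 0; i < MAX_MSGS; i++)
        if (mailbox[i].valid && mailbox[i].key == key) {
            *data = mailbox[i].data;
            mailbox[i].valid = 0;
            return 0;
        }
    return -1;
}
```

Note how the receiver can fetch the second message before the first, which a plain FIFO channel cannot do.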
Implementation • Raw channel • Reserves a demux queue for it directly • Buffered channel • When the demux buffer is full, the demux triggers an interrupt handler that spills the data into main memory • So the receiver's read operations also need to check the buffer in main memory • Message passing • Also depends on interrupts to inform the sender/receiver/messaging engine to do their work
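The buffered-channel spill path can be sketched as follows (sizes and names are illustrative). The essential behavior is that an overflow drains the hardware queue into main memory, and a receive must check the memory buffer first to preserve FIFO order:

```c
/* Sketch of a buffered channel: a small hardware demux queue backed
 * by a spill buffer in main memory. All sizes are hypothetical. */
#define HW_DEPTH  4
#define MEM_DEPTH 64

static int hw[HW_DEPTH];             /* hardware demux queue */
static int hw_n;
static int mem[MEM_DEPTH];           /* spill buffer in main memory */
static int mem_head, mem_tail;

/* Models the interrupt handler: drain the hw queue into memory. */
static void spill_interrupt(void) {
    for (int i = 0; i < hw_n; i++) mem[mem_tail++] = hw[i];
    hw_n = 0;
}

void chan_send(int word) {
    if (hw_n == HW_DEPTH) spill_interrupt();  /* full -> interrupt */
    hw[hw_n++] = word;
}

/* FIFO order: spilled data in memory is older, so drain it first. */
int chan_recv(void) {
    if (mem_head < mem_tail) return mem[mem_head++];
    if (hw_n > 0) {
        int w = hw[0];
        for (int i = 1; i < hw_n; i++) hw[i - 1] = hw[i];
        hw_n--;
        return w;
    }
    return -1;  /* channel empty */
}
```

This also shows why buffered channels cost more than raw channels: every overflow adds an interrupt plus a copy through memory.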
Performance comparison • The UDN hardware provides at most 4 bytes/cycle • For raw channels, the maximum bandwidth is 3.93 bytes/cycle • For buffered channels, the maximum bandwidth is 1.4 bytes/cycle • For messaging, the maximum bandwidth is 0.97 bytes/cycle • Overheads • Buffered channels: interrupts & copies between the on-chip cache and main memory • Messaging: more frequent interrupts, identification of message keys, and copies between the on-chip cache and main memory
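Put differently, the numbers above correspond to roughly 98%, 35%, and 24% of the UDN's peak. A trivial helper makes the comparison explicit (the 4 bytes/cycle peak is taken from the slide):

```c
/* Fraction of the UDN peak (4 bytes/cycle) that a mechanism reaches. */
double efficiency(double bytes_per_cycle) {
    return bytes_per_cycle / 4.0;
}
/* raw:      3.93 / 4 ~ 0.98
 * buffered: 1.40 / 4 = 0.35
 * messaging:0.97 / 4 ~ 0.24 */
```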
Characteristics • Supports MPI-like communication • Sends a block of data to receivers • Supports multiple senders and receivers in a single function call • Allows batch transfers
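The "multiple receivers in one call" characteristic can be sketched like this. The function name, inbox layout, and sizes are illustrative assumptions, not the real API:

```c
/* Sketch: one call delivers a block of data to several receivers. */
#define MAX_RX 8
#define INBOX  32

static int inbox[MAX_RX][INBOX];
static int inbox_n[MAX_RX];

/* Copy the block data[0..n-1] into the inbox of every receiver
 * listed in rx[0..nrx-1]; -1 if any inbox would overflow. */
int send_block(const int *rx, int nrx, const int *data, int n) {
    for (int r = 0; r < nrx; r++)
        if (inbox_n[rx[r]] + n > INBOX) return -1;
    for (int r = 0; r < nrx; r++)
        for (int i = 0; i < n; i++)
            inbox[rx[r]][inbox_n[rx[r]]++] = data[i];
    return 0;
}
```

The up-front capacity check keeps the batch transfer all-or-nothing, which is one reasonable design choice for a multi-receiver call.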
Conclusions • MCAPI is more flexible than DaCS because its communication is identified by endpoints, not only by process ID or physical ID. • In a many-core environment, MCAPI is more suitable.
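To illustrate the endpoint argument: in a simplified model, an MCAPI-style endpoint is a (node, port) pair, so one node can expose several independent communication endpoints, whereas identification by process or physical ID alone gives each process a single address. This struct is a conceptual sketch, not the MCAPI type:

```c
/* Simplified model of an endpoint: a (node, port) pair. */
struct endpoint { int node; int port; };

int same_endpoint(struct endpoint a, struct endpoint b) {
    return a.node == b.node && a.port == b.port;
}
/* Two endpoints on the same node are still distinct channels,
 * which a plain process ID cannot express. */
```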