
2008-12-23 progress

This presentation discusses the communication interface of the Tile processor, including its on-chip interconnection architecture, its hardware and software interfaces, and a performance comparison of its different communication mechanisms. The commonalities and differences between DaCS and MCAPI are also discussed.

mckinneyb

Presentation Transcript


  1. 2008-12-23 progress Wen-Long Yang

  2. Outline
     • An example of many-core communication – Tile processor
     • The interface of the micro-kernel
     • Commonalities and differences between DaCS & MCAPI

  3. TILE PROCESSOR

  4. Overview
     • Source: On-chip Interconnection Architecture of the Tile Processor, IEEE Micro, 2007
     • Tile processor overview (TILE64 launched in 2007)
       • Developed by Tilera and inspired by MIT’s Raw processor
       • Consists of a 2D grid of homogeneous general-purpose compute elements, called tiles
       • The tiles are connected by 5 mesh networks that provide massive on-chip communication bandwidth
       • Supports DMA between the cores and between main memory and the cores
       • Each network link consists of two 32-bit-wide unidirectional links (one per direction)
       • Each tile combines a processor, which implements a 3-way VLIW, and its associated cache hierarchy with a switch
       • Each tile operates at 1 GHz
       • 4.8 MB of on-chip cache is distributed among the processors

  5. Block diagram

  6. Interconnect hardware
     • 5 networks:
       • Static network (STN)
         • Has no packetized format; instead, the routing decisions are statically configured at each switch
         • Lets applications send streams of data to another tile
       • User dynamic network (UDN)
       • I/O dynamic network (IDN)
       • Memory dynamic network (MDN)
       • Tile dynamic network (TDN)
         • Responsible only for tile-to-tile cache requests

  7. Receive-side hardware demultiplexing
     • The UDN and IDN implement this functionality
     • Implemented with several independent hardware queues, each with a settable tag that is used to identify incoming packets
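The tagged-queue idea on slide 7 can be illustrated with a small software model. This is only a sketch under stated assumptions: the queue count, the queue depth, the names `demux_unit`, `demux_deliver`, and `demux_read`, and the choice to reject packets whose tag matches no queue are all hypothetical and are not taken from the TILE64 hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES  4  /* hypothetical queue count, not the TILE64 figure */
#define QUEUE_DEPTH 8  /* hypothetical depth, in 32-bit words */

typedef struct {
    uint32_t tag;                 /* settable tag this queue accepts */
    uint32_t words[QUEUE_DEPTH];  /* ring buffer of received words */
    size_t head, count;
} demux_queue;

typedef struct {
    demux_queue q[NUM_QUEUES];
} demux_unit;

/* Program the tag of every queue and empty all queues. */
void demux_init(demux_unit *d, const uint32_t tags[NUM_QUEUES]) {
    assert(d != NULL && tags != NULL);
    for (size_t i = 0; i < NUM_QUEUES; i++) {
        d->q[i].tag = tags[i];
        d->q[i].head = 0;
        d->q[i].count = 0;
    }
}

/* Steer an incoming word to the queue whose tag matches.
 * Returns the queue index, or -1 if no tag matches or the queue is full. */
int demux_deliver(demux_unit *d, uint32_t tag, uint32_t word) {
    for (size_t i = 0; i < NUM_QUEUES; i++) {
        demux_queue *q = &d->q[i];
        if (q->tag != tag)
            continue;
        if (q->count == QUEUE_DEPTH)
            return -1;
        q->words[(q->head + q->count) % QUEUE_DEPTH] = word;
        q->count++;
        return (int)i;
    }
    return -1;
}

/* Pop the oldest word from one queue; returns false if it is empty. */
bool demux_read(demux_unit *d, int queue, uint32_t *out) {
    assert(queue >= 0 && queue < NUM_QUEUES);
    demux_queue *q = &d->q[queue];
    if (q->count == 0)
        return false;
    *out = q->words[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

Because each queue is independent, a receiver waiting on one tag is not blocked behind packets that carry a different tag, which is the point of demultiplexing on the receive side.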

  8. Software interfaces
     • A C-based library, iLib, supports communication via the UDN
     • Socket-like channels
       • FIFO, point-to-point connections
       • Raw channels: low overhead, but only a limited buffer size
       • Buffered channels: slightly higher overhead, but an unlimited buffer
     • Message passing
       • MPI-like
       • Besides an unlimited buffer, it also provides a message key to identify each message
       • Receivers can process messages out of order
       • Messages are saved until the receivers are ready
       • Communication is synchronized by a messaging engine
       • Aims to allow a more flexible communication mechanism
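The keyed, out-of-order receive described for message passing on slide 8 can be sketched as a mailbox that holds messages until a receiver asks for a particular key. This is not the iLib API: `mailbox`, `mailbox_send`, `mailbox_recv`, and the fixed capacities are hypothetical names and sizes chosen for illustration, and the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16  /* hypothetical capacity of the mailbox */
#define MAX_PAYLOAD 64  /* hypothetical maximum message size in bytes */

typedef struct {
    bool used;
    int key;             /* message key chosen by the sender */
    unsigned long seq;   /* arrival order, keeps FIFO order within a key */
    size_t len;
    unsigned char data[MAX_PAYLOAD];
} pending_msg;

typedef struct {
    pending_msg slot[MAX_PENDING];
    unsigned long next_seq;
} mailbox;

void mailbox_init(mailbox *mb) {
    assert(mb != NULL);
    memset(mb, 0, sizeof *mb);
}

/* Store a message until a receiver asks for its key.
 * Returns false if the payload is too large or the mailbox is full. */
bool mailbox_send(mailbox *mb, int key, const void *buf, size_t len) {
    if (len > MAX_PAYLOAD)
        return false;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        pending_msg *m = &mb->slot[i];
        if (m->used)
            continue;
        m->used = true;
        m->key = key;
        m->seq = mb->next_seq++;
        m->len = len;
        memcpy(m->data, buf, len);
        return true;
    }
    return false;
}

/* Receive the oldest pending message with the given key, regardless of
 * how many messages with other keys arrived before it. */
bool mailbox_recv(mailbox *mb, int key, void *buf, size_t cap, size_t *len) {
    pending_msg *best = NULL;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        pending_msg *m = &mb->slot[i];
        if (m->used && m->key == key && (best == NULL || m->seq < best->seq))
            best = m;
    }
    if (best == NULL || best->len > cap)
        return false;  /* nothing with this key, or caller buffer too small */
    memcpy(buf, best->data, best->len);
    *len = best->len;
    best->used = false;
    return true;
}
```

A receiver that asks for key 2 gets that message even if messages with key 1 arrived first and are still pending, which is the out-of-order behavior the slide describes.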

  9. Implementation
     • Raw channel
       • A demux queue is reserved for it directly
     • Buffered channel
       • When the demux buffer is full, the demux hardware triggers an interrupt handler that moves the data into memory
       • Read operations on the receiver therefore also need to check the buffer in main memory
     • Message passing
       • Also depends on interrupts to notify the sender, receiver, and messaging engine to do their work
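The buffered-channel behavior on slide 9 (spill to memory when the demux queue fills, then read from memory first) can be modeled in software as follows. This is a sketch under assumptions, not Tilera's implementation: the "interrupt handler" is an ordinary function call, the hardware queue depth of 4 words is made up, and `buffered_channel`, `channel_send`, and `channel_recv` are hypothetical names. The "unlimited" buffer is modeled as a growable heap array, so it is really bounded by available memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HW_DEPTH 4  /* hypothetical depth of the hardware demux queue */

typedef struct {
    uint32_t hw[HW_DEPTH];  /* models the reserved demux queue */
    size_t hw_head, hw_count;
    uint32_t *mem;          /* models the spill buffer in main memory */
    size_t mem_head, mem_count, mem_cap;
} buffered_channel;

void channel_init(buffered_channel *ch) {
    assert(ch != NULL);
    ch->hw_head = ch->hw_count = 0;
    ch->mem = NULL;
    ch->mem_head = ch->mem_count = ch->mem_cap = 0;
}

void channel_free(buffered_channel *ch) {
    free(ch->mem);
    channel_init(ch);
}

/* Stand-in for the interrupt handler: drain the full hardware queue
 * to the tail of the memory buffer, growing the buffer if needed. */
static bool spill_to_memory(buffered_channel *ch) {
    size_t needed = ch->mem_head + ch->mem_count + ch->hw_count;
    if (needed > ch->mem_cap) {
        size_t new_cap = ch->mem_cap ? ch->mem_cap * 2 : 16;
        while (new_cap < needed)
            new_cap *= 2;
        uint32_t *p = realloc(ch->mem, new_cap * sizeof *p);
        if (p == NULL)
            return false;
        ch->mem = p;
        ch->mem_cap = new_cap;
    }
    while (ch->hw_count > 0) {
        ch->mem[ch->mem_head + ch->mem_count] = ch->hw[ch->hw_head];
        ch->mem_count++;
        ch->hw_head = (ch->hw_head + 1) % HW_DEPTH;
        ch->hw_count--;
    }
    return true;
}

/* Arrival of one word at the receiver side of the channel. */
bool channel_send(buffered_channel *ch, uint32_t word) {
    if (ch->hw_count == HW_DEPTH && !spill_to_memory(ch))
        return false;
    ch->hw[(ch->hw_head + ch->hw_count) % HW_DEPTH] = word;
    ch->hw_count++;
    return true;
}

/* Receiver read: spilled words are older, so check memory first,
 * then fall back to the hardware queue. */
bool channel_recv(buffered_channel *ch, uint32_t *out) {
    if (ch->mem_count > 0) {
        *out = ch->mem[ch->mem_head++];
        if (--ch->mem_count == 0)
            ch->mem_head = 0;
        return true;
    }
    if (ch->hw_count > 0) {
        *out = ch->hw[ch->hw_head];
        ch->hw_head = (ch->hw_head + 1) % HW_DEPTH;
        ch->hw_count--;
        return true;
    }
    return false;
}
```

Every word in the memory buffer arrived before every word still in the hardware queue, so reading memory first preserves FIFO order. The extra check and the copies are the kind of overhead that slide 10 attributes to buffered channels.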

  10. Performance comparison
      • The UDN hardware provides at most 4 bytes/cycle
        • Raw channels: maximum bandwidth is 3.93 bytes/cycle
        • Buffered channels: maximum bandwidth is 1.4 bytes/cycle
        • Messaging: maximum bandwidth is 0.97 bytes/cycle
      • Overheads
        • Buffered channels: interrupts and copies between on-chip cache and memory
        • Messaging: more frequent interrupts, identification of message keys, and copies between on-chip cache and memory
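The figures on slide 10 are easier to compare as a fraction of the 4 bytes/cycle hardware peak and as absolute rates at the 1 GHz tile clock from slide 4. The helper below is a small worked calculation; the function names are hypothetical, and the absolute rates assume the link runs at the full 1 GHz tile clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define UDN_PEAK_BYTES_PER_CYCLE 4.0  /* hardware maximum from slide 10 */

typedef struct {
    const char *name;
    double bytes_per_cycle;  /* measured maximum from slide 10 */
} mechanism;

static const mechanism MECHANISMS[] = {
    {"raw channel", 3.93},
    {"buffered channel", 1.40},
    {"messaging", 0.97},
};

/* Fraction of the UDN hardware peak that a mechanism achieves. */
double link_efficiency(double bytes_per_cycle) {
    return bytes_per_cycle / UDN_PEAK_BYTES_PER_CYCLE;
}

/* Absolute rate, assuming the link is clocked at clock_hz. */
double bytes_per_second(double bytes_per_cycle, double clock_hz) {
    assert(clock_hz > 0.0);
    return bytes_per_cycle * clock_hz;
}

/* Print one line per mechanism: share of peak and rate in GB/s. */
void print_bandwidth_table(double clock_hz) {
    size_t n = sizeof MECHANISMS / sizeof MECHANISMS[0];
    for (size_t i = 0; i < n; i++) {
        double bpc = MECHANISMS[i].bytes_per_cycle;
        printf("%-17s %6.2f%% of peak, %.2f GB/s\n",
               MECHANISMS[i].name,
               100.0 * link_efficiency(bpc),
               bytes_per_second(bpc, clock_hz) / 1e9);
    }
}
```

Under these assumptions, raw channels reach about 98% of the hardware peak (3.93 GB/s), buffered channels 35% (1.4 GB/s), and messaging about 24% (0.97 GB/s), with GB meaning 10^9 bytes.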

  11. INTERFACE OF MICRO-KERNEL

  12. API summary (1/2)

  13. API summary (2/2)

  14. Characteristics
      • Supports MPI-like communication
      • Sends a block of data to receivers
      • Supports multiple senders and receivers in a single function call
      • Allows batch transfers

  15. COMMONALITIES & DIFFERENCES BETWEEN DACS & MCAPI

  16. Commonalities

  17. Differences

  18. Conclusions
      • MCAPI is more flexible than DaCS because its communication is identified by endpoint, not only by process ID or physical ID.
      • In a many-core environment, MCAPI is therefore more suitable.
