On-Chip Communication (Architecture and Design) • Sungjoo Yoo, ISRC, SNU
Contents • Part 1 • Introduction to on-chip communication • On-chip communication architecture • Software architecture • Hardware architecture • On-chip communication networks • Part 2 • Analysis and optimization of on-chip communication network • On-chip communication design on unreliable interconnect • Open issues and summary
Introduction • On-chip communication design • [Figure: a high-level functional specification (modules M1, M2, M3) is refined into an SoC implementation of the on-chip communication architecture, with modules on a μP and an IP connected through SW and HW wrappers to the physical communication network]
Designer’s Objectives and Problems • High performance • What is the maximum bandwidth of a wire? • What is the best-suited on-chip communication architecture (OCA)? • Low power consumption • What is the minimum energy required to send a given amount of data? • How can that minimum energy be achieved? • Small HW/SW overhead • Interconnection and transceiver • Conflicting objectives • Trade-offs
Specification of On-Chip Communication • Abstraction levels of on-chip communication • Client/server level • Message level • Transaction level • Implementation level
Client/Server Level • Concept • Service request/provide relation • A client component demands a service from server(s). • The service provider component need not be fixed and can be determined dynamically. • An object request broker (ORB) is needed. • Real example • Modem service • PDA device: baseband modem and vocoder • The modem service can be Bluetooth, IEEE802.11, CDMA2000, GPS, etc., depending on the location of the PDA device. • Indoor: Bluetooth or IEEE802.11 • Outdoor: IEEE802.11 (short range) or CDMA2000
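The dynamic service resolution described on this slide can be sketched as a toy broker. This is only an illustration under stated assumptions: the `ServiceBroker` class, its `register`/`resolve` methods, and the location strings are hypothetical names introduced here, not part of any real ORB or of the slides.

```python
from typing import Callable, Dict, List, Tuple


class ServiceBroker:
    """Toy object request broker (hypothetical): maps a service to candidate providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Tuple[str, Callable[[str], bool]]]] = {}

    def register(self, service: str, provider: str,
                 available: Callable[[str], bool]) -> None:
        # Providers are tried in registration order.
        self._providers.setdefault(service, []).append((provider, available))

    def resolve(self, service: str, location: str) -> str:
        # The provider is not fixed: it is chosen when the client asks.
        for provider, available in self._providers.get(service, []):
            if available(location):
                return provider
        raise LookupError(f"no provider for {service!r} at {location!r}")


broker = ServiceBroker()
# Availability rules follow the slide's indoor/outdoor example; the strings are made up.
broker.register("modem", "Bluetooth", lambda loc: loc == "indoor")
broker.register("modem", "IEEE802.11",
                lambda loc: loc in ("indoor", "outdoor-short-range"))
broker.register("modem", "CDMA2000", lambda loc: loc.startswith("outdoor"))

print(broker.resolve("modem", "indoor"))   # Bluetooth
print(broker.resolve("modem", "outdoor"))  # CDMA2000
```

The client (for example the vocoder) only names the service; which modem actually serves it is decided at request time.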
Message Level • Concept • Components communicate with each other via messages. • Message sender/receiver are fixed. • A message can have any type of data. • Real example • PDA: In the CDMA2000 mode, the vocoder sends messages to the CDMA2000 modem. • A message has a frame of voice data and control info.
Transaction Level • Concept • Components are mapped on real processors. • Communication is mapped on abstract communication networks. • Communication protocols are fixed. • Transaction can be read, write, burst_read, burst_write, etc. • For each candidate of real communication networks, the transaction performance can be analyzed. • Real example • PDA: vocoder on a DSP, modem on an IP, candidate communication networks (AMBA, Sonics, IBM, ...) • Determine bus priorities, packet priorities, TDMA slot assignment, etc.
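A minimal sketch of how transaction performance might be compared across candidate networks at this level. The cost model (a fixed setup cost per transaction plus a per-word cost) and the two candidates `bus_A` and `bus_B` with their numbers are assumptions made for illustration; they do not describe AMBA, Sonics, or any other real network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NetworkModel:
    """Abstract communication network (hypothetical parameters)."""
    name: str
    setup_cycles: int      # arbitration + address phase per transaction
    cycles_per_word: int   # data phase cost per transferred word


def transaction_cycles(net: NetworkModel, words: int) -> int:
    # One transaction pays the setup cost once, then one data phase per word.
    return net.setup_cycles + words * net.cycles_per_word


def workload_cycles(net: NetworkModel, transactions: List[Tuple[str, int]]) -> int:
    # A workload is a list of (transaction kind, number of words).
    return sum(transaction_cycles(net, words) for _, words in transactions)


candidates = [NetworkModel("bus_A", 3, 1), NetworkModel("bus_B", 1, 2)]
single_reads = [("read", 1)] * 8      # eight single-word reads
one_burst = [("burst_read", 8)]       # the same data as one burst

for net in candidates:
    print(net.name, workload_cycles(net, single_reads), workload_cycles(net, one_burst))
# bus_A 32 11
# bus_B 24 17
```

Under this toy model the ranking of the candidates depends on the transaction mix, which is why the analysis is repeated per candidate network.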
Implementation Level • On-chip communication architecture is implemented. • Software and hardware architecture • SW architecture (on μP or DSP with local memory w/ I/D caches): application SW, middleware, OS, device drivers • HW architecture: HW IP, DMA, memory, processor local bus, adapters • Communication network (OCBs w/ bridges, Sonics, packet/circuit switch, etc.)
On-Chip Communication Architecture • Software • Middleware, OS, device driver and ISR, memory instructions • Hardware • DMA, (bus) adapter, communication network (OCBs and bridges, packet network, etc.), memory
Software On-Chip Communication Architecture • Middleware: CORBA, COM+, Java, BREW • Service resolution • ORB implementation • Dynamic reconfiguration of services needs to be supported. • 802.11 baseband modem in HW --> Bluetooth in SW • Operating system • Communication services • pipe, shared memory, semaphore, mutex, etc. • Supported as OS system calls
Software On-Chip Communication Architecture • Device driver and ISR • The device driver depends on OS and the processor • OS • Preemptive or not, interrupt or not, synchronization services (semaphore, lock var, …) • Processor • Bus width, register set, exception behavior, etc. • Memory instructions • Load/store, load multiple/store multiple instructions • Cache/virtual memory instructions in ARM v6 architecture
Hardware On-Chip Communication Architecture • DMA (Direct Memory Access) • Block size • Adapter • Basic functionality: protocol conversion • E.g. VCI -- AMBA • Local communication architecture • Distributed bus arbitration/network routing: e.g. Sonics, packet switch network • [Figure: modules M1, M3, M4 on μPs (each with an OS) and an IP, attached through IP(μP) adapters and channel adapters to AMBA and CoreConnect]
Hardware On-Chip Communication Architecture • Communication network • On-chip bus • AMBA, CoreConnect, PI, etc. • Sonics μNetwork • On-chip communication network • Circuit switch • Philips • Packet switch • W. Dally (DAC01), Guerrier (DATE00)
Hardware On-Chip Communication Architecture • On-chip memory • Shared memory • E.g. external SDRAM in multimedia chips • Distributed memory w/ caches: e.g. Daytona architecture • Four 64-bit processing elements (PEs) • Each PE • - 32-bit RISC with DSP enhancements • - 64-bit vector co-processor (four MACs) • Split-transaction bus • - Shared memory based on L1 cache snooping • - Caches reduce bus traffic. • Embedded RTOS dynamically schedules tasks. • 120 mm², 0.35 μm, 100 MHz
Hardware On-Chip Communication Architecture • On-chip memory (cont’d) • On-chip implementation of linked list • Philips, DATE01 • Data transfer and storage exploration (DTSE) • IMEC • Focus on low power consumption and area of memory
On-Chip Communication Networks • Routing • Sonics μNetwork SiliconBackplane • Philips, Circuit Switch Network • Packet Switch Networks, Guerrier, DATE00 • Network topologies • Mesh, W. Dally, DAC2001 • Octagon, ST Microelectronics, DAC2001
Sonics μNetwork SiliconBackplane • On-chip bus • Time-division multiple access (TDMA)
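Two-step Arbitration • Step 1 (TDMA): the module originally assigned to the current slot is granted the bus • Step 2: if that module makes no bus access, the slot is granted by priority-based arbitration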
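The two-step scheme can be sketched as follows. This is a simplified model, not the Sonics implementation: the function `arbitrate`, the slot table, the priority order, and the module names are hypothetical.

```python
from typing import List, Optional, Set


def arbitrate(slot_owner: str, requests: Set[str],
              priority: List[str]) -> Optional[str]:
    """Return the module granted the bus in one slot, or None if nobody asks."""
    # Step 1 (TDMA): the module that owns this slot wins if it is requesting.
    if slot_owner in requests:
        return slot_owner
    # Step 2 (priority): otherwise the unused slot goes to the
    # highest-priority requester.
    for module in priority:
        if module in requests:
            return module
    return None


tdma_table = ["dsp", "cpu", "dsp", "io"]   # slot -> owning module (made up)
priority = ["cpu", "dsp", "io"]            # highest priority first (made up)
requests_per_slot = [{"dsp", "io"}, {"io"}, set(), {"cpu", "io"}]

grants = [
    arbitrate(tdma_table[slot % len(tdma_table)], reqs, priority)
    for slot, reqs in enumerate(requests_per_slot)
]
print(grants)  # ['dsp', 'io', None, 'io']
```

Slot ownership gives each module a guaranteed share of the bus, while the fallback step keeps unused slots from being wasted.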
Pipelined TDMA Bus Arbitration • Pipeline depth • Based on memory target latency at the desired clock frequency
Design Example: Carrier-Class VoIP Processing Card • DSP + CPU banks + IO + DRAM • DSP: ~16 processors • Voice and modem protocols, LEC • CPU: ~4 processors • Packet protocols • Control (call setup) • High-BW SDRAM
Communication Bandwidth Requirements: Basic I/O • IO traffic is low BW • Data IO rate = 1000 ch x 64 kb/s x 3, full duplex = 48 MB/s (worst case) • Data are buffered to SDRAM
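The worst-case I/O figure can be checked with a few lines of arithmetic. The sketch assumes that "full duplex" contributes a factor of 2 and that 1 MB/s = 8000 kb/s; the slide does not say what the factor of 3 stands for, so it is carried through as given.

```python
channels = 1000
rate_kbit_s = 64        # per-channel data rate from the slide
slide_factor = 3        # factor of 3 from the slide (meaning not stated there)
duplex_factor = 2       # assumption: full duplex counts both directions

total_kbit_s = channels * rate_kbit_s * slide_factor * duplex_factor
io_mbyte_s = total_kbit_s / 8 / 1000   # kb/s -> kB/s -> MB/s

print(io_mbyte_s)  # 48.0
```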
Communication Bandwidth Requirements: Cache Updates • CPU cache swap • assuming 1.6MIPS/channel • Total BW requirements: • 48 + 600 + 320 = 968 (MB/s)
Derivative Design Example • Full G.168 LEC uses a specialized core • LEC has local 4 MB memory • # of channels: 1000 --> 2000 • Increased traffic • Bus width: 64 --> 128 (bits)
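A rough sketch of why the wider bus matters. It assumes that the 968 MB/s total from the previous slide scales linearly with the channel count and that the bus moves one full-width word per cycle at 100% utilization. Both are simplifying assumptions made here, and the helper `min_bus_clock_mhz` is hypothetical.

```python
def min_bus_clock_mhz(required_mbyte_s: float, bus_width_bits: int) -> float:
    """Lowest bus clock (MHz) that sustains the load at one word per cycle."""
    bytes_per_cycle = bus_width_bits / 8
    return required_mbyte_s / bytes_per_cycle


base_mbyte_s = 968.0                        # total for 1000 channels (from the slides)
scaled_mbyte_s = base_mbyte_s * 2000 / 1000  # assumption: linear in channel count

print(min_bus_clock_mhz(base_mbyte_s, 64))     # 121.0
print(min_bus_clock_mhz(scaled_mbyte_s, 64))   # 242.0
print(min_bus_clock_mhz(scaled_mbyte_s, 128))  # 121.0
```

Under these assumptions, doubling the bus width absorbs the doubled traffic without raising the bus clock. The LEC's local memory may reduce the real traffic below the linear estimate.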
Circuit Switch Network: Philips PROPHID Architecture • Focus on high-throughput signal processing for multimedia applications • Requirements • High computation capacity and high communication bandwidth • Performance and programmability • PROPHID • Heterogeneous multi-processor architecture consisting of general and application specific processors • General purpose processor • Control and low-medium signal processing • Application specific processors • High performance signal processing
PNX8500 PROPHID architecture
PROPHID: An Architecture Template • For high throughput: ~10 Gbit/s and reconfigurable connections (switch matrix, 20 processors, 64 MHz) • Programmability and control applications: control-oriented bus • ~10 GOPS • Autonomous tasks based on data-driven execution
PROPHID: Autonomous Execution of ADS Processors - Autonomous task execution on Application Domain Specific (ADS) processors - Stream-based execution - Data availability determines the execution of tasks. - Master (CPU)-slave synchronization can be avoided.
Circuit Switch Network • Guaranteeing the throughput of streams with hard-real-time constraints in the PROPHID architecture. • Requirements of task execution on ADS processors • Time-interleaved task execution • Each task requires input/output FIFO’s.
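Data-driven, time-interleaved task execution with per-task FIFOs can be sketched as below. This is a software toy, not the PROPHID hardware: the `Task` class, the round-robin `run` loop, and the unbounded `deque` FIFOs are assumptions made for illustration (real FIFOs are bounded, which is what makes throughput guarantees non-trivial).

```python
from collections import deque
from typing import Callable, Deque, List


class Task:
    """A task that fires only when its input FIFO holds data (hypothetical model)."""

    def __init__(self, name: str, fn: Callable[[int], int],
                 in_fifo: Deque[int], out_fifo: Deque[int]) -> None:
        self.name = name
        self.fn = fn
        self.in_fifo = in_fifo
        self.out_fifo = out_fifo

    def ready(self) -> bool:
        # Data availability decides whether the task may execute.
        return len(self.in_fifo) > 0

    def fire(self) -> None:
        # Consume one token, produce one token.
        self.out_fifo.append(self.fn(self.in_fifo.popleft()))


def run(tasks: List[Task], max_rounds: int = 100) -> None:
    # Time-interleaved execution: tasks take turns on one processor,
    # with no master CPU telling them when to start.
    for _ in range(max_rounds):
        fired = False
        for task in tasks:
            if task.ready():
                task.fire()
                fired = True
        if not fired:
            break


src: Deque[int] = deque([1, 2, 3])
mid: Deque[int] = deque()
sink: Deque[int] = deque()
tasks = [Task("scale", lambda x: x * 2, src, mid),
         Task("offset", lambda x: x + 1, mid, sink)]
run(tasks)
print(list(sink))  # [3, 5, 7]
```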
High-Performance Communication Network in PROPHID Architecture • [Figure: communication network diagram; stage labels: space, time, time]
A Generic Architecture for On-Chip Packet-Switched Interconnections, DATE 2000. • A scalable system-level interconnection template is presented.
A Generic Architecture for On-Chip Packet-Switched Interconnections • Bus-based architectures will not meet the bandwidth requirements, since • they are inherently non-scalable in terms of bandwidth • Bandwidth is shared by the connected components. • Multiple on-chip bus approaches such as VSIA • Case-specific grouping of IPs • Not a truly scalable and reusable interconnection • In this paper, a generic interconnection template is presented.
A Generic Architecture for On-Chip Packet-Switched Interconnections • Switching networks • Circuit switching • As in the PROPHID communication network • High performance • Drawback • Poor reactivity to rapidly changing communication • E.g. data bursts in MPEG (the worst case must be assumed), random traffic between a CPU master and its slaves • Packet switching • Packets are forwarded by routers, as in the Internet. • Because routing decisions are distributed over the routers, the network can remain very reactive.
Packet Routing • Wormhole routing
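In wormhole routing a packet is split into flits: the header flit sets up the route and the remaining flits follow it through the routers in a pipeline, so a router does not have to buffer the whole packet. A minimal latency sketch, assuming one cycle per flit per hop and no contention (both function names are hypothetical), compares it with store-and-forward routing:

```python
def store_and_forward_cycles(hops: int, flits: int) -> int:
    # Each hop receives the whole packet before forwarding it.
    return hops * flits


def wormhole_cycles(hops: int, flits: int) -> int:
    # The header needs `hops` cycles; the other flits arrive pipelined behind it.
    return hops + flits - 1


hops, flits = 4, 8
print(store_and_forward_cycles(hops, flits))  # 32
print(wormhole_cycles(hops, flits))           # 11
```

The model ignores blocking: when the header stalls, the flits behind it keep their channels occupied, which is the main cost of the scheme.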
Network Topology: Fat-tree Network • Ex. 16 terminals: 8 --> 8 communication • The terminals can be processors, DSPs, memory, etc. • Routers are free to use any of the available paths • Packet: a sequence of 32-bit words • Packet payload may be of any size
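In a tree-shaped network a packet climbs until it reaches a router that is a common ancestor of source and destination, then descends. The sketch below counts the routers on such an up/down path. The arity of 4 and the terminal numbering are assumptions made here (the slide only states 16 terminals), and `routers_on_path` is a hypothetical helper.

```python
def routers_on_path(src: int, dst: int, arity: int = 4) -> int:
    """Routers crossed between two terminals with up/down routing in a tree."""
    if src == dst:
        return 0
    level = 0
    # Climb one level at a time until both terminals fall under the same router.
    while src != dst:
        src //= arity
        dst //= arity
        level += 1
    # `level` routers on the way up (including the turning point),
    # `level - 1` routers on the way down.
    return 2 * level - 1


print(routers_on_path(0, 1))    # 1  (same first-level router)
print(routers_on_path(0, 5))    # 3  (turn at the second level)
print(routers_on_path(3, 12))   # 3
```

In a fat tree several upward links lead toward equivalent ancestors, which is why routers are free to pick any available path on the way up; only the downward path is fixed by the destination.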