Chapters 11 and 15 I/O and Bus Interface

Chapters 11 and 15 I/O and Bus Interface • Memory-mapped I/O concept • Bus interface • Review on address, data and control buses • Advanced topic: split-transaction bus • ARM AMBA3 bus case study

I/O Device Interface • Physical connections • Exactly analogous to memory device interface • I/O Mapping Methods • Memory-mapped I/O: uses a range of addresses • Isolated (I/O-mapped, direct) I/O: uses IN, OUT instructions • I/O Methods • Programmed I/O: uses a polling program • Interrupt-driven I/O: I/O device sends an interrupt • Direct memory access (DMA): uses a DMA controller • [Figure: isolated I/O vs. memory-mapped I/O]

I/O Device Connection • I/O device can be treated like a memory device • For memory-mapped I/O, use the same method • If address is in memory, memory device is enabled • If address is in an I/O device, that I/O device is enabled • For I/O-mapped I/O, use an extra I/O enable signal • Example: Intel 8086 • IORC: I/O read control (active low) • IOWC: I/O write control (active low) • Handshaking (REQ, ACK, READY, etc.) may be necessary to coordinate timing of data transfers
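
The two mapping methods above can be sketched as address decoders. This is a minimal illustration only: the address ranges, the device names, and the function names are hypothetical and do not come from the slides or from any real board.

```python
from typing import Optional

# Hypothetical address map, chosen only for illustration.
MEMORY_RANGE = range(0x00000, 0xF0000)   # memory device
UART_RANGE = range(0xF0000, 0xF0010)     # a memory-mapped I/O device
IO_PORT_RANGE = range(0x0000, 0x0100)    # ports reached with IN/OUT


def decode_memory_mapped(address: int) -> Optional[str]:
    """Memory-mapped I/O: the address alone decides which device is enabled."""
    if address in MEMORY_RANGE:
        return "memory"
    if address in UART_RANGE:
        return "uart"
    return None  # no device responds to this address


def decode_isolated(address: int, io_access: bool) -> Optional[str]:
    """Isolated I/O: an extra I/O enable signal separates the two spaces.

    io_access models that extra signal, so the same numeric address can
    select either memory or an I/O port.
    """
    if io_access:
        return "io_port" if address in IO_PORT_RANGE else None
    return "memory" if address in MEMORY_RANGE else None
```

With this map, address 0x40 selects memory when `io_access` is false and an I/O port when it is true, which is the point of the extra enable signal.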

Bus System • Buses used in a computer system • Address bus • Data bus • Control bus: MWTC (memory write), MRDC (memory read), IOWC (I/O write), IORC (I/O read)

Pin Descriptions for Intel 8086 • Important (and frequently used) pins • Address pins: A19 through A0 • Several pins have dual uses • AD15 through AD0 (AD7 through AD0 on the 8088), A19/S6 through A16/S3 • ALE pin: if ALE=1, AD=address, else AD=data • Data pins: D15 through D0 (bidirectional) • Read enable pin: RD (active-low) • Write enable pin: WR (active-low) • IO/M: selects memory or I/O access • This type of notation is commonly used when a pin selects one of two modes of operation

Interrupt-related pins • INTR: interrupt request • INTA: interrupt acknowledge • NMI: non-maskable interrupt (uses vector 2) • RESET • Provides a soft reset of the CPU • Must be held high for at least 4 clock cycles • CLK: system clock signal • READY: if READY = 0, the CPU enters a wait state • HOLD and HLDA: used for I/O processing using DMA (direct memory access) • Suspends the CPU's access to the three main buses

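General Bus Cycle • Phases of one bus cycle, in order • ALE, DT/R, IO/M are issued • RD, WR, DEN are issued • Memory or I/O operation • Prepare for next cycle (precharge memory) • If READY=0 when sampled, a WAIT state is inserted between T2 and T3 • [Figure: bus-cycle timing diagram with WRITE and READ waveforms and the sample point]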

Advanced Topic • Split transaction bus case study • ARM AMBA3 (AXI) bus • Widely used in mobile and home system-on-chips, e.g., mobile application processor for smart phones • The most popular bus used in current mobile chips

[Source: AXI Spec] Interconnect, Interface & Channel

Separate Read / Write Channels • AMBA AXI allows for independent read and write transactions. • [Figure: AXI master and AXI slave connected by five channels: write address/control (AWREADY), write data (WREADY), write response (BREADY), read address/control (ARREADY), and read data (RREADY)]

Split Transaction • Address, data, and response are handled separately. • [Figure: five-channel AXI master/slave diagram]

Split Transaction: Write (1/3) • Master issues address on the write address/control channel

Split Transaction: Write (2/3) • Master gives data on the write data channel

Split Transaction: Write (3/3) • Slave acknowledges on the response channel

Split Transaction: Read (1/2) • Master issues address on the read address/control channel

Split Transaction: Read (2/2) • Slave returns data on the read data channel

Wire Counts • Address 32b, data 32b bus case: 184~204 wires in total • AW: 52~56, W: 39~43, B: 4~8, AR: 52~56, R: 37~41
Write Address/Control (52~56 wires), payload: • AWID[3:0] write address ID (0~4 bits) • AWADDR[31:0] write address • AWLEN[3:0] burst length • AWSIZE[2:0] burst size • AWBURST[1:0] burst type • AWLOCK[1:0] lock info • AWCACHE[3:0] cache type • AWPROT[2:0] protection type
Write Address/Control, handshake signals: • AWVALID write address valid • AWREADY write address ready
Write data (39~43 wires): • WID[3:0] write ID tag, AWID = WID (0~4 bits) • WDATA[31:0] write data • WSTRB[3:0] write strobes • WLAST write last • WVALID write valid • WREADY write ready
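
The totals on this slide can be reproduced by adding up signal widths per channel. The sketch below is an illustration, not part of the original deck. The AW and W widths are the ones listed on the slide; the B and R signal lists (BRESP[1:0], RRESP[1:0], RLAST) follow the AXI3 specification, and AR is assumed to mirror AW. The function name is hypothetical.

```python
def axi3_wire_counts(addr_bits=32, data_bits=32, id_bits=4):
    """Count signal wires per AXI3 channel for the given widths."""
    handshake = 2                  # xVALID + xREADY on every channel
    strobe_bits = data_bits // 8   # one WSTRB bit per byte lane
    # AxLEN[3:0], AxSIZE[2:0], AxBURST[1:0], AxLOCK[1:0], AxCACHE[3:0], AxPROT[2:0]
    addr_control = 4 + 3 + 2 + 2 + 4 + 3
    counts = {
        "AW": id_bits + addr_bits + addr_control + handshake,
        "W": id_bits + data_bits + strobe_bits + 1 + handshake,  # +1 for WLAST
        "B": id_bits + 2 + handshake,                            # BRESP[1:0]
        "AR": id_bits + addr_bits + addr_control + handshake,
        "R": id_bits + data_bits + 2 + 1 + handshake,            # RRESP[1:0], RLAST
    }
    counts["total"] = sum(counts.values())
    return counts


print(axi3_wire_counts(id_bits=0)["total"])  # 184
print(axi3_wire_counts(id_bits=4)["total"])  # 204
```

The ranges on the slide come from the ID width, which can be anywhere from 0 to 4 bits on each of the five channels.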

One Address for Burst • Separation of address and data channels • Master provides the start address of the burst • Slave needs to generate the remaining addresses based on burst type (FIXED, INCR, WRAP) • [Figure: ADDRESS channel carries A11, A21, A31; DATA channel carries D11 D12 D13 D14, D21 D22 D23, D31]
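
How a slave might derive the remaining addresses from the start address can be sketched as follows. This is a simplified model of the AXI burst rules, not code from the slides: `size` is the number of bytes per beat, `length` is the number of beats, and the function name is hypothetical. WRAP bursts are assumed to start at a size-aligned address with 2, 4, 8, or 16 beats.

```python
def burst_addresses(start, length, size, burst_type):
    """Return the address of every beat in a burst."""
    if burst_type == "FIXED":
        # Same address for every beat (for example, a FIFO register).
        return [start] * length
    if burst_type == "INCR":
        # After a possibly unaligned first beat, addresses are size-aligned.
        aligned = (start // size) * size
        return [start] + [aligned + i * size for i in range(1, length)]
    if burst_type == "WRAP":
        if start % size != 0 or length not in (2, 4, 8, 16):
            raise ValueError("WRAP needs an aligned start and 2/4/8/16 beats")
        total = size * length                 # bytes covered by the burst
        lower = (start // total) * total      # wrap boundary
        return [lower + (start - lower + i * size) % total
                for i in range(length)]
    raise ValueError("unknown burst type: " + burst_type)


print([hex(a) for a in burst_addresses(0x1008, 4, 4, "WRAP")])
# ['0x1008', '0x100c', '0x1000', '0x1004']
```

A wrapping burst is typically used for cache line fills, where the critical word is fetched first and the rest of the line follows.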

Burst Length, Size and Type

Benefit of Split Transaction: Multiple Outstanding Requests • Parameters for multiple outstanding requests • Master I/F: issuing capability, the number of outstanding requests the master can generate • Slave I/F: acceptance capability, the number of outstanding requests the slave can accept • [Figure: ADDRESS channel carries A11, A21, A31; DATA channel carries D11 D12 D13 D14, D21 D22 D23, D31] • Note: the delay of D21, D22, D23 is reduced

[B. Jacobs, 2002] Simplified DRAM Operations • Three key commands • Row access (Activate or ACT): tRCD • Column access (RD/WR): CL • Precharge (PRE): tRP • [Figure: DRAM array with bit line (BL), word line (WL), row address and row decoder, sense amplifier, column address and column decoder/buffer, and data in/out, annotated with tRCD, CL, and tRP]

[Source: D. Lee, 2008] DDR SDRAM Banks • Three dimensions: bank, row, and column • Memory access latency • E.g., 3-3-3: 3 cycles for each of ACT, RD/WR, & PRE • [Figure: banks, each with row decoder, row buffer, and column decoder, annotated with tRP, tRCD, and tCL; timeline of ACT, RD, data (D), and PRE commands]
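
The 3-3-3 example can be turned into a small latency calculation. This sketch is an illustration under stated assumptions: the function name and the three-case model (bank precharged, row hit, row conflict) are not from the slides, and the row-hit case (column access only) is standard DRAM behavior rather than something the slide states.

```python
def access_latency(open_row, requested_row, t_rcd=3, t_cl=3, t_rp=3):
    """Cycles until the first data of a read, for one bank.

    open_row is the row currently held in the row buffer, or None if the
    bank is precharged (no row open).
    """
    if open_row is None:
        return t_rcd + t_cl            # ACT, then RD
    if open_row == requested_row:
        return t_cl                    # row hit: RD only
    return t_rp + t_rcd + t_cl         # row conflict: PRE, ACT, then RD


print(access_latency(None, 7))  # 6
print(access_latency(7, 7))     # 3
print(access_latency(5, 7))     # 9
```

The gap between the 3-cycle and 9-cycle cases is what a memory controller tries to exploit when it reorders requests across banks and rows.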

Effects of Multiple Outstanding Requests • [Figure: two ADDRESS/DATA timelines for requests A11, A21, A31 and data D11 to D14, D21 to D23, D31, mapped to banks 0, 1, and 2; annotated latencies are tRCD + tCL and tRP + tRCD + tCL]

Read Burst Operation • Read request is initiated • Read request is accepted • Data read is ready • 1st data is transferred • The last data is transferred • Note: data is transferred only when valid = ready = 1
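
The valid/ready rule in the note can be sketched as a cycle-by-cycle model of one channel. This is a simplification for illustration: the source is assumed to assert VALID whenever it still has a beat to send, the READY pattern is supplied by the caller, and the function name is hypothetical.

```python
def run_channel(beats, ready_pattern):
    """Return (cycle, beat) for every beat transferred on one channel.

    The source holds VALID high while it has data; a beat moves only in a
    cycle where VALID and READY are both 1.
    """
    pending = list(beats)
    log = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)
        if valid and ready:
            log.append((cycle, pending.pop(0)))
    return log


print(run_channel(["D0", "D1", "D2"], [0, 1, 1, 0, 1]))
# [(1, 'D0'), (2, 'D1'), (4, 'D2')]
```

In cycles 0 and 3 the source keeps the same beat on the bus because READY is low, so nothing is transferred.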

Overlapping Read Bursts • Read request A is accepted • Read request B is accepted via the AR channel while data A(0) is transferred via the R channel

Write Burst Operation • Write request A is accepted • Response completes the write operation

An Architecture in Mobile Chip • Multiple masters access the shared memory, e.g., DRAM • How to arbitrate their accesses? • [Figure: CPU, video decoder, 3D graphics, LCD control, video process, and I/O buffer connected through a bus to memory]

Arbitration Scheme: Fixed Priority

[Source: PL301 TS] Arbitration Scheme: Round Robin
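
Fixed-priority and round-robin arbitration can be sketched in a few lines. This is a generic illustration, not the PL301 implementation: masters are numbered from 0, a lower number is assumed to mean higher fixed priority, and all names are hypothetical.

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requesting master, or None if idle."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


class RoundRobinArbiter:
    """Grant masters in rotating order, starting after the last winner."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        self.last = num_masters - 1   # so master 0 is checked first

    def grant(self, requests):
        for offset in range(1, self.num_masters + 1):
            master = (self.last + offset) % self.num_masters
            if requests[master]:
                self.last = master
                return master
        return None


rr = RoundRobinArbiter(3)
print([rr.grant([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
print(fixed_priority([True, True, True]))                # 0
```

With every master requesting, fixed priority always grants master 0 and can starve the others, while round robin gives each master one grant per rotation.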

Arbitration Scheme: Hybrid • Combination of round robin and fixed priority

[Source: PL301 TS] Arbitration Scheme • LRG (least recently granted) scheme
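
A least-recently-granted arbiter can be sketched with an ordered list. This is a generic model for illustration and is not taken from the PL301 documentation: the list is kept with the least recently granted master at the front, and the class name is hypothetical.

```python
class LRGArbiter:
    """Grant the requesting master that was granted longest ago."""

    def __init__(self, num_masters):
        # Front of the list = least recently granted.
        self.order = list(range(num_masters))

    def grant(self, requests):
        for master in self.order:
            if requests[master]:
                # The winner becomes the most recently granted master.
                self.order.remove(master)
                self.order.append(master)
                return master
        return None


arb = LRGArbiter(3)
print(arb.grant([True, True, True]))    # 0
print(arb.grant([True, False, True]))   # 2
print(arb.grant([True, True, False]))   # 1
```

Unlike a rotating pointer, the order here depends only on grant history, so a master that has been idle for a long time wins as soon as it requests.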

Data Bus • Write strobe • Narrow transfer • [Figure: incrementing burst case]

Data Bus (Cont’d) • Narrow transfer • An easy way to perform an unaligned write: use the write strobe • [Figure: incrementing burst case]
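
Which byte lanes the write strobe marks as valid can be computed from the address and the transfer size. The sketch below follows the byte-lane formulas in the AXI specification for the first transfer of a burst; the function name and the default 4-byte bus are assumptions made for illustration.

```python
def write_strobe(address, size, bus_bytes=4):
    """Return WSTRB as an integer bit mask (bit n = byte lane n valid).

    size is the number of bytes per transfer; address may be unaligned.
    """
    aligned = (address // size) * size            # size-aligned address
    bus_base = (address // bus_bytes) * bus_bytes
    lower = address - bus_base                    # first valid byte lane
    upper = aligned + size - 1 - bus_base         # last valid byte lane
    strobe = 0
    for lane in range(lower, upper + 1):
        strobe |= 1 << lane
    return strobe


# Narrow (1-byte) incrementing burst on a 32-bit bus: one lane per beat.
print([bin(write_strobe(a, 1)) for a in range(4)])
# ['0b1', '0b10', '0b100', '0b1000']

# Unaligned 4-byte write starting at address 0x01: lanes 1 to 3 only.
print(bin(write_strobe(0x01, 4)))  # 0b1110
```

The unaligned case shows the idea on the slide: the master issues a full-width transfer and simply leaves the strobe bit of the unwanted byte lane low.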

Unaligned Transfer

Unaligned Transfer (Cont’d)

Unaligned Transfer (Cont’d) • Wrapping burst case

Out-of-Order Transaction • Transaction ID is used to identify data transfer at all channels • ARID <-> RID and AWID, WID <-> BID • Up to four bits • Ordering by transaction ID • Master needs to finish data transfers with the same transaction ID in the order of request issue. • Slave can handle data transfers with different transaction IDs out of order • [Figure: ADDRESS channel carries A11, A21, A31; RDATA channel carries D11 D12 D13 D14, D21 D22 D23, D31]
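
The ordering rule can be expressed as a check on a completion trace. This is an illustrative sketch only: the transaction IDs assigned to A11, A21, and A31 below are hypothetical, as is the function name.

```python
def respects_id_order(issued, completed):
    """True if, for every transaction ID, completion order matches issue order.

    issued and completed are lists of (transaction_id, request_name).
    Requests with different IDs may complete in any relative order.
    """
    ids = {txn_id for txn_id, _ in issued}
    for txn_id in ids:
        issue_order = [name for i, name in issued if i == txn_id]
        done_order = [name for i, name in completed if i == txn_id]
        if issue_order != done_order:
            return False
    return True


issued = [(0, "A11"), (1, "A21"), (0, "A31")]
print(respects_id_order(issued, [(1, "A21"), (0, "A11"), (0, "A31")]))  # True
print(respects_id_order(issued, [(0, "A31"), (1, "A21"), (0, "A11")]))  # False
```

In the first trace A21 overtakes A11, which is allowed because the IDs differ; in the second trace A31 overtakes A11 with the same ID, which violates the rule.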

A Deadlock Problem in Accessing Multiple Slaves • Requests with the same transaction ID (shown by color in the figure) need to be finished in the order of request issue • Optimization in memory controller: memory controllers can serve independent requests out of order to increase memory utilization or to lower memory access latency • In the figure, master 1 issues C then D, and master 2 issues A then B • Memory 1 handles D then A; memory 2 handles B then C • D is blocked at master 1 • B is blocked at master 2 • [Figure: masters 1 and 2 connected through the on-chip bus to memory controllers 1 and 2 and memories 1 and 2, with a timeline of requests A, B, C, D]

Cyclic Dependency Schemes • Outstanding requests are permitted only for a single slave per transaction ID • The deadlock problem is resolved, but parallel (memory) accesses are limited • [Figure: timeline over cycles 1 to 17 showing memories 1 and 2 and masters 1 and 2 handling requests A, B, C, D]

Single Slave Scheme • Allow multiple outstanding transactions only to the same slave

Unique ID Scheme • Accept only out-of-order requests, i.e., requests with different transaction ID’s

Single Slave per ID • Combination of both single slave and unique ID schemes • Allow multiple outstanding requests to a single slave per transaction ID
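
The single-slave-per-ID rule can be sketched as a small tracker at the master interface. This is a minimal illustration, not the interconnect's actual logic: the class and method names are hypothetical, and slaves are identified by plain strings.

```python
class SingleSlavePerIdTracker:
    """Allow outstanding requests with one ID to target only one slave."""

    def __init__(self):
        self.outstanding = {}   # transaction ID -> [slave, request count]

    def can_issue(self, txn_id, slave):
        entry = self.outstanding.get(txn_id)
        return entry is None or entry[0] == slave

    def issue(self, txn_id, slave):
        """Record the request and return True, or return False to stall it."""
        if not self.can_issue(txn_id, slave):
            return False
        entry = self.outstanding.setdefault(txn_id, [slave, 0])
        entry[1] += 1
        return True

    def complete(self, txn_id):
        """Retire one outstanding request with this ID."""
        entry = self.outstanding[txn_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self.outstanding[txn_id]


tracker = SingleSlavePerIdTracker()
print(tracker.issue(0, "mem1"))  # True
print(tracker.issue(0, "mem2"))  # False: ID 0 is still outstanding at mem1
print(tracker.issue(1, "mem2"))  # True: a different ID may use another slave
```

Stalling the second request removes the cyclic wait between slaves, at the cost of the parallel accesses the stalled request could have had.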

Appendix

An Example of QoS Requirement • [Figure: CPU, video decoder, 3D graphics, LCD control, video process, mixer, and DMA connected through a bus to the memory controller, with an annotation of 40% of total BW]

[Source: PL301 TS] Programmable QoS • Maximum # of requests allowed for best effort traffic • Assume Tidemark = 4 and ID match = M0. If there are 4 outstanding requests from M1, then only requests from M0 are accepted by S0 until one of M1’s requests is served
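
The tidemark example can be modeled as a simple admission filter at the slave interface. This sketch follows the slide's description only; it is not the PL301 register interface, and the class name, method names, and master labels are hypothetical.

```python
class TidemarkFilter:
    """Limit outstanding best-effort requests; always admit matched masters."""

    def __init__(self, tidemark=4, id_match=("M0",)):
        self.tidemark = tidemark
        self.id_match = set(id_match)
        self.best_effort_outstanding = 0

    def try_accept(self, master):
        if master in self.id_match:
            return True                      # QoS master is never filtered
        if self.best_effort_outstanding >= self.tidemark:
            return False                     # tidemark reached: hold off
        self.best_effort_outstanding += 1
        return True

    def served(self):
        """One outstanding best-effort request has been served."""
        if self.best_effort_outstanding > 0:
            self.best_effort_outstanding -= 1


s0 = TidemarkFilter(tidemark=4, id_match=("M0",))
print([s0.try_accept("M1") for _ in range(5)])  # [True, True, True, True, False]
print(s0.try_accept("M0"))                      # True
s0.served()
print(s0.try_accept("M1"))                      # True
```

Once M1 has four requests outstanding, only M0 gets through until one of M1's requests is served, which matches the example on the slide.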
