This chapter explores the implementation of the Inverse Discrete Cosine Transform (IDCT) on FPGA modules and the design of the AMBA I/O interface. It also introduces the concepts and equations associated with the 2D IDCT and discusses the use of vector processing for an efficient IDCT implementation.
Chapter 6 FPGA Modules and Hardware Interface Design Professor Tzyy-Kuen Tien E-mail: tktien@mail.stut.edu.tw Http://www.eecs.stut.edu.tw STUT/EE
Outline 6.1 The Implementation of IDCT on FPGA 6.2 AMBA I/O Interface Design 6.3 I/O Interface Design
6.1 The Implementation of IDCT on FPGA
6.1 Compression/Decompression System • A block diagram of a compression/decompression system. • DCT/IDCT can be used in the system to reduce the bandwidth requirements.
6.1 Introduction to IDCT • Inverse Discrete Cosine Transform. • IDCT is used in the decoder to decompress DCT-compressed data. • IDCT is one of the most computation-intensive parts of the MPEG decoding process. • A fast, hardware-based IDCT implementation is crucial to speeding up the MPEG decoding process.
6.1 2D IDCT Equations (1/2) • The algorithm used for the calculation of the 2D IDCT coefficients is based on the following equation:
$$XC_{pq} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} XN_{mn} \cdot \frac{c(p)\,c(q)}{4} \cdot \cos\frac{\pi (2m+1)\,p}{2M} \cdot \cos\frac{\pi (2n+1)\,q}{2N} \qquad \text{(EQ 1)}$$
• First, the 1D IDCT of the rows is calculated, and then the 1D IDCT of the columns is calculated.
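6.1 2D IDCT Equations (2/2) • The 1D IDCT coefficients for the rows and columns can be calculated by separating equation 1 into the row part and the column part.
$$C = K \cdot \cos\frac{(2 \cdot \text{col number} + 1) \cdot \text{row number} \cdot \pi}{2 \cdot M}, \qquad K = \sqrt{1/N}\ \text{for row} = 0, \quad K = \sqrt{2/N}\ \text{for row} \neq 0 \qquad \text{(EQ 2)}$$
$$\mathit{Ct} = K \cdot \cos\frac{(2 \cdot \text{row number} + 1) \cdot \text{col number} \cdot \pi}{2 \cdot N}, \qquad K = \sqrt{1/M}\ \text{for col} = 0, \quad K = \sqrt{2/M}\ \text{for col} \neq 0 \qquad \text{(EQ 3)}$$
• M = total number of columns, N = total number of rows.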
6.1 Constant Values of C and Ct • The constant values for C and Ct calculated from equations 2 and 3 are as follows:
$$C = \begin{bmatrix}
23170 & 23170 & 23170 & 23170 & 23170 & 23170 & 23170 & 23170 \\
32138 & 27246 & 18205 & 6393 & -6393 & -18205 & -27246 & -32138 \\
30274 & 12540 & -12540 & -30274 & -30274 & -12540 & 12540 & 30274 \\
27246 & -6393 & -32138 & -18205 & 18205 & 32138 & 6393 & -27246 \\
23170 & -23170 & -23170 & 23170 & 23170 & -23170 & -23170 & 23170 \\
18205 & -32138 & 6393 & 27246 & -27246 & -6393 & 32138 & -18205 \\
12540 & -30274 & 30274 & -12540 & -12540 & 30274 & -30274 & 12540 \\
6393 & -18205 & 27246 & -32138 & 32138 & -27246 & 18205 & -6393
\end{bmatrix}$$
$$\mathit{Ct} = \begin{bmatrix}
23170 & 32138 & 30274 & 27246 & 23170 & 18205 & 12540 & 6393 \\
23170 & 27246 & 12540 & -6393 & -23170 & -32138 & -30274 & -18205 \\
23170 & 18205 & -12540 & -32138 & -23170 & 6393 & 30274 & 27246 \\
23170 & 6393 & -30274 & -18205 & 23170 & 27246 & -12540 & -32138 \\
23170 & -6393 & -30274 & 18205 & 23170 & -27246 & -12540 & 32138 \\
23170 & -18205 & -12540 & 32138 & -23170 & -6393 & 30274 & -27246 \\
23170 & -27246 & 12540 & 6393 & -23170 & 32138 & -30274 & 18205 \\
23170 & -32138 & 30274 & -27246 & 23170 & -18205 & 12540 & -6393
\end{bmatrix}$$
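The tabulated constants can be regenerated from equation 2 with a few lines of Python. This is a minimal sketch, not the author's tool flow: the function names are hypothetical, and the fixed-point scale factor of 2^16 is an assumption inferred from the table values, since the slides do not state it.

```python
import math

N = 8            # 8x8 block: total rows = total columns = 8
SCALE = 1 << 16  # ASSUMPTION: fixed-point scale inferred from the table, not given in the slides


def coefficient_matrix(n=N, scale=SCALE):
    """Return C[row][col] = round(scale * K * cos((2*col + 1) * row * pi / (2*n)))."""
    matrix = []
    for row in range(n):
        # K depends only on the row index (EQ 2).
        k = math.sqrt(1.0 / n) if row == 0 else math.sqrt(2.0 / n)
        matrix.append([
            round(scale * k * math.cos((2 * col + 1) * row * math.pi / (2 * n)))
            for col in range(n)
        ])
    return matrix


def transpose(matrix):
    """Return the transpose of a square list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


C = coefficient_matrix()
Ct = transpose(C)

for row in C:
    print(" ".join(f"{value:7d}" for value in row))
```

Storing the coefficients as 16-bit-scaled integers lets the hardware use integer multipliers; the result has to be scaled back down after each 1D pass.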
6.1 2D IDCT using Vector Processing • A one-dimensional 8-point IDCT, followed by an internal double-buffer memory, followed by another one-dimensional 8-point IDCT, provides the 2D IDCT architecture. • [Figure: 1D IDCT → double-buffer RAM → 1D IDCT.] • Vector processing using parallel multipliers is one method of implementing the IDCT. • Advantages of the vector processing method. • Regular structure, simple control and interconnect, and a good balance between performance and implementation complexity.
6.1 Behavioral Model (1/2) • The output Y of an 8×8 IDCT for input X is given by Y = Ct·X·C, where C is the matrix of cosine coefficients listed under "Constant Values of C and Ct" and Ct is its transpose. • The equation can also be written as Y = Ct·Z, where Z = X·C. • The input block X is:
$$X = \begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} & x_{04} & x_{05} & x_{06} & x_{07} \\
x_{10} & x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} & x_{17} \\
x_{20} & x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} & x_{27} \\
x_{30} & x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} & x_{37} \\
x_{40} & x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} & x_{47} \\
x_{50} & x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} & x_{57} \\
x_{60} & x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66} & x_{67} \\
x_{70} & x_{71} & x_{72} & x_{73} & x_{74} & x_{75} & x_{76} & x_{77}
\end{bmatrix}$$
6.1 Behavioral Model (2/2)
Z(0,0) = 23170x00 + 32138x01 + 30274x02 + 27246x03 + 23170x04 + 18205x05 + 12540x06 + 6393x07
Z(0,1) = 23170x00 + 27246x01 + 12540x02 – 6393x03 – 23170x04 – 32138x05 – 30274x06 – 18205x07
Z(0,2) = 23170x00 + 18205x01 – 12540x02 – 32138x03 – 23170x04 + 6393x05 + 30274x06 + 27246x07
Z(0,3) = 23170x00 + 6393x01 – 30274x02 – 18205x03 + 23170x04 + 27246x05 – 12540x06 – 32138x07
Z(0,4) = 23170x00 – 6393x01 – 30274x02 + 18205x03 + 23170x04 – 27246x05 – 12540x06 + 32138x07
Z(0,5) = 23170x00 – 18205x01 – 12540x02 + 32138x03 – 23170x04 – 6393x05 + 30274x06 – 27246x07
Z(0,6) = 23170x00 – 27246x01 + 12540x02 + 6393x03 – 23170x04 + 32138x05 – 30274x06 + 18205x07
Z(0,7) = 23170x00 – 32138x01 + 30274x02 – 27246x03 + 23170x04 – 18205x05 + 12540x06 – 6393x07
Or, grouping the even-indexed and odd-indexed inputs:
Z(k,0) = (23170xk0 + 30274xk2 + 23170xk4 + 12540xk6) + (32138xk1 + 27246xk3 + 18205xk5 + 6393xk7) = P01 + P02
Z(k,1) = (23170xk0 + 12540xk2 – 23170xk4 – 30274xk6) + (27246xk1 – 6393xk3 – 32138xk5 – 18205xk7) = P11 + P12
Z(k,2) = (23170xk0 – 12540xk2 – 23170xk4 + 30274xk6) + (18205xk1 – 32138xk3 + 6393xk5 + 27246xk7) = P21 + P22
Z(k,3) = (23170xk0 – 30274xk2 + 23170xk4 – 12540xk6) + (6393xk1 – 18205xk3 + 27246xk5 – 32138xk7) = P31 + P32
Z(k,4) = P31 – P32
Z(k,5) = P21 – P22
Z(k,6) = P11 – P12
Z(k,7) = P01 – P02
where k = 0, 1, …, 7
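The grouped form halves the multiplier count: each output pair Z(k,j), Z(k,7–j) shares the partial sums Pj1 and Pj2, so one row needs 32 multiplications instead of 64. The sketch below, in Python rather than HDL, checks the grouped form against the direct product Z = X·C for one row. The function names are hypothetical, and the reference matrix is regenerated with an assumed 2^16 scale factor that the slides do not state.

```python
import math

# Coefficients copied from the grouped expansion: EVEN[j] multiplies
# (xk0, xk2, xk4, xk6) to give Pj1; ODD[j] multiplies (xk1, xk3, xk5, xk7) to give Pj2.
EVEN = [
    (23170, 30274, 23170, 12540),
    (23170, 12540, -23170, -30274),
    (23170, -12540, -23170, 30274),
    (23170, -30274, 23170, -12540),
]
ODD = [
    (32138, 27246, 18205, 6393),
    (27246, -6393, -32138, -18205),
    (18205, -32138, 6393, 27246),
    (6393, -18205, 27246, -32138),
]


def idct_row_pass(x):
    """Compute Z(k,0..7) for one row x = [xk0..xk7] using 32 multiplications."""
    even_in = x[0::2]
    odd_in = x[1::2]
    p1 = [sum(c * v for c, v in zip(coeffs, even_in)) for coeffs in EVEN]
    p2 = [sum(c * v for c, v in zip(coeffs, odd_in)) for coeffs in ODD]
    z = [0] * 8
    for j in range(4):
        z[j] = p1[j] + p2[j]      # Z(k,j)   = Pj1 + Pj2
        z[7 - j] = p1[j] - p2[j]  # Z(k,7-j) = Pj1 - Pj2
    return z


def direct_row_pass(x):
    """Reference: Z(k,j) = sum_i x[i] * C[i][j], using all 64 multiplications."""
    z = []
    for j in range(8):
        acc = 0
        for i in range(8):
            k = math.sqrt(1 / 8) if i == 0 else math.sqrt(2 / 8)
            # ASSUMPTION: coefficients are scaled by 2**16 and rounded.
            acc += x[i] * round(65536 * k * math.cos((2 * j + 1) * i * math.pi / 16))
        z.append(acc)
    return z


row = [10, -3, 4, 0, 7, 1, -2, 5]
print(idct_row_pass(row) == direct_row_pass(row))
```

The same routine can serve as the second 1D pass once the intermediate block has been transposed, which is the role of the double-buffer memory between the two 1D IDCT stages.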
6.1 1D IDCT • The block diagram for the implementation of the 1D IDCT is shown below.
6.2 AMBA I/O Interface Design
6.2 AMBA I/O Interface Design • Introduction to the AMBA buses • AMBA AHB bus • AMBA ASB bus • AMBA APB bus
6.2 Introduction (1/5) • What is AMBA? • The Advanced Microcontroller Bus Architecture specification. • An on-chip communication standard for designing high-performance embedded microcontrollers. • Three distinct buses. • AHB (the Advanced High-performance Bus). • High-performance system backbone bus. • ASB (the Advanced System Bus). • An alternative system bus. • APB (the Advanced Peripheral Bus). • Minimal power consumption. • Reduced interface complexity.
6.2 Introduction (2/5) • Objectives of the AMBA specification. • To facilitate the right-first-time development of embedded microcontroller products. • To be technology-independent. • To ensure that highly reusable peripheral and system macrocells can be migrated across a diverse range of IC processes. • To encourage modular system design. • To minimize the silicon infrastructure required for both operation and manufacturing test.
6.2 Introduction (3/5) • Typical AMBA system. • [Figure: a high-performance ARM processor, high-bandwidth on-chip RAM, a high-bandwidth external memory interface, and a DMA bus master sit on the AHB or ASB; an AHB-to-APB or ASB-to-APB bridge connects the APB, which carries the UART, timer, keypad, and PIO.]
6.2 Introduction (4/5) • Feature
6.2 Introduction (5/5) • When to use AMBA AHB/ASB or APB. • A full AHB or ASB. • Bus masters. • On-chip memory blocks. • External memory interface. • High-bandwidth peripherals with FIFO interfaces. • DMA slave peripherals. • A simple APB interface. • Simple register-mapped slave devices. • Very low power interfaces where clocks cannot be globally routed. • Grouping narrow-bus peripherals to avoid loading the system bus.
6.2 A Typical AHB and APB System • AMBA Advanced High-performance Bus (AHB). • High performance. • Pipelined operation. • Burst transfers. • Multiple bus masters. • Split transactions. • AMBA Advanced Peripheral Bus (APB). • Low power. • Latched address and control. • Simple interface. • Suitable for many peripherals. • [Figure: the AHB connects a high-performance ARM processor, high-bandwidth on-chip RAM, a high-bandwidth memory interface, and a DMA bus master; an AHB-to-APB bridge connects the APB, which carries the UART, timer, keypad, and PIO.]
6.2 AMBA AHB Bus Interconnect • Multiplexor interconnection. • [Figure: each bus component connects through HADDR, HWDATA, and HRDATA; an address and control mux, a write data mux, and a read data mux route these signals between the components.]
6.2 AMBA AHB Transfer Type • Transfer type encoding. • HTRANS[1:0] • 00 – IDLE • No data transfer is required. • 01 – BUSY • Allows a bus master to insert IDLE cycles in the middle of a burst of transfers. • 10 – NONSEQ • Indicates the first transfer of a burst or a single transfer. • 11 – SEQ • Indicates the remaining transfers in a burst. • The address is related to the address of the previous transfer.
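The encoding can be captured as an enumeration, together with a small helper that lists the HTRANS values a master would drive for one burst. This is an illustrative sketch only: `HTrans` and `burst_htrans` are hypothetical names, and the helper simplifies BUSY handling to at most one BUSY cycle after a chosen beat.

```python
from enum import IntEnum


class HTrans(IntEnum):
    """HTRANS[1:0] encoding as listed on the slide."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


def burst_htrans(beats, busy_after=()):
    """Return the HTRANS sequence for a burst of `beats` transfers.

    busy_after: beat indices after which the master inserts one BUSY cycle
    (hypothetical parameter; BUSY is never appended after the final beat).
    """
    sequence = []
    for beat in range(beats):
        # The first beat starts the burst; the rest are sequential.
        sequence.append(HTrans.NONSEQ if beat == 0 else HTrans.SEQ)
        if beat in busy_after and beat != beats - 1:
            sequence.append(HTrans.BUSY)
    return sequence


print([t.name for t in burst_htrans(4, busy_after={1})])
```

A single transfer is simply a burst of one beat, so it is driven as NONSEQ.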
6.2 Slave Transfer Responses • Response encoding. • HRESP[1:0] • 00 – OKAY. • 01 – ERROR. • 10 – RETRY. • The transfer has not yet completed, so the bus master should retry the transfer. • 11 – SPLIT • The slave will request access to the bus on behalf of the master when the transfer can complete. • An ERROR, RETRY, or SPLIT response requires a two-cycle response.
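The response encoding and the two-cycle rule can be sketched as follows. The names `HResp` and `response_cycles` are hypothetical, and the HREADY pattern (low in the first response cycle, high in the second) is an assumption about the usual AMBA 2.0 behaviour that the slide itself does not spell out.

```python
from enum import IntEnum


class HResp(IntEnum):
    """HRESP[1:0] encoding as listed on the slide."""
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


def response_cycles(resp):
    """Return the (HREADY, HRESP) pairs a slave drives to deliver `resp`.

    OKAY completes in one cycle. ERROR, RETRY, and SPLIT take two cycles;
    ASSUMPTION: HREADY is 0 in the first cycle and 1 in the second.
    """
    if resp == HResp.OKAY:
        return [(1, HResp.OKAY)]
    return [(0, resp), (1, resp)]


for response in HResp:
    print(response.name, response_cycles(response))
```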
6.2 AMBA AHB Bus Arbitration • Bus master grant signals. • A master uses its HGRANTx signal only to determine when it owns the bus. • [Figure: three masters (#1 to #3), each with its own HGRANT_Mx input and HADDR_Mx[31:0] output; the arbiter; HMASTER[3:0]; the address and control multiplexer, whose HADDR output goes to all slaves; and the decoder.]
6.2 AMBA AHB Bus Slave • AHB bus slave interface. • Select: HSELx. • Address and control: HADDR[31:0], HWRITE, HTRANS[1:0], HSIZE[2:0], HBURST[2:0]. • Data: HWDATA[31:0], HRDATA[31:0]. • Transfer response: HREADY, HRESP[1:0]. • Reset: HRESETn. • Clock: HCLK. • Split-capable slave: HMASTER[3:0], HMASTLOCK, HSPLITx[15:0].
6.2 AMBA AHB Bus Master • AHB bus master interface. • Arbiter: HBUSREQx, HLOCKx. • Arbiter grant: HGRANTx. • Transfer type: HTRANS[1:0]. • Transfer response: HREADY, HRESP[1:0]. • Address and control: HADDR[31:0], HWRITE, HSIZE[2:0], HBURST[2:0], HPROT[3:0]. • Data: HRDATA[31:0], HWDATA[31:0]. • Reset: HRESETn. • Clock: HCLK.
6.2 AMBA AHB Arbiter • AHB arbiter interface. • Arbiter requests and locks: HBUSREQx1, HLOCKx1, HBUSREQx2, HLOCKx2, HBUSREQx3, HLOCKx3. • Arbiter grants: HGRANTx1, HGRANTx2, HGRANTx3, HMASTER[3:0], HMASTLOCK. • Address and control: HADDR[31:0], HSPLITx[15:0], HTRANS[1:0], HBURST[2:0], HRESP[1:0], HREADY. • Reset: HRESETn. • Clock: HCLK.
6.2 A Typical AMBA ASB-based Microcontroller • A typical AMBA system. • AMBA Advanced System Bus (ASB). • High performance. • Pipelined operation. • Burst transfers. • Multiple bus masters. • AMBA Advanced Peripheral Bus (APB). • Low power. • Latched address and control. • Simple interface. • Suitable for many peripherals. • [Figure: the ASB connects a high-performance ARM processor, high-bandwidth on-chip RAM, a high-bandwidth memory interface, and a DMA bus master; an ASB-to-APB bridge connects the APB, which carries the UART, timer, keypad, and PIO.]
6.2 AMBA ASB Description • Basic flow of the bus operation. • The arbiter determines which master is granted access to the bus. • When granted, a master initiates transfers on the bus. • The decoder uses the high-order address lines to select a bus slave. • The slave provides a transfer response back to the bus master, and data is transferred between the master and slave.
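The decoding step can be illustrated with a short sketch. The address map here is purely hypothetical: it assumes a 32-bit address whose top four bits pick one of sixteen equally sized slave regions, which is only one of many possible decoder designs.

```python
ADDRESS_BITS = 32
SELECT_BITS = 4  # ASSUMPTION: the top 4 address bits choose the slave


def decode_slave(address, select_bits=SELECT_BITS):
    """Return the index of the slave select line for a 32-bit address."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 32 bits")
    # Keep only the high-order bits; the low-order bits are decoded by the slave.
    return address >> (ADDRESS_BITS - select_bits)


def select_lines(address, select_bits=SELECT_BITS):
    """Return a one-hot list of select lines, with exactly one line asserted."""
    index = decode_slave(address, select_bits)
    return [int(i == index) for i in range(1 << select_bits)]


print(decode_slave(0x8000_1234))
```

Because the decode depends only on the address, slaves stay free of system-specific address-map knowledge, which supports the reuse goal stated in the AMBA objectives.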
6.2 ASB Transfers • Three types of transfer. • NONSEQUENTIAL • Used for single transfers or the first transfer of a burst. • SEQUENTIAL • Used for transfers in a burst. The address of a SEQUENTIAL transfer is always related to the previous transfer. • ADDRESS-ONLY • Used when no data movement is required.
6.2 AMBA ASB Bus Slave • ASB bus slave interface. • Select: DSEL. • Address and control: BA[31:0], BWRITE, BSIZE[1:0]. • Transfer response: BWAIT, BERROR, BLAST. • Data: BD[31:0]. • Reset: BnRES. • Clock: BCLK.
6.2 AMBA ASB Bus Master • ASB bus master interface. • Arbiter: AREQ, BLOK. • Arbiter grant: AGNT. • Transfer type: BTRAN[1:0]. • Transfer response: BWAIT, BERROR, BLAST. • Address and control: BA[31:0], BWRITE, BSIZE[1:0], BPROT[1:0]. • Data: BD[31:0]. • Reset: BnRES. • Clock: BCLK.
6.2 AMBA ASB Bus Decoder • ASB decoder interface. • Transfer type: BTRAN[1:0]. • Address and control: BA[31:0], BWRITE, BSIZE[1:0], BPROT[1:0]. • Selects: DSEL1, …, DSELn. • Transfer response: BWAIT, BERROR, BLAST. • Reset: BnRES. • Clock: BCLK.
6.2 AMBA ASB Bus Arbiter • ASB arbiter interface. • Arbiter requests: AREQx1, AREQx2, AREQx3. • Arbiter grants: AGNTx1, AGNTx2, AGNTx3. • Wait: BWAIT. • Lock: BLOK. • Reset: BnRES. • Clock: BCLK.
6.2 A Typical AMBA-based Microcontroller • AMBA Advanced Peripheral Bus (APB). • Low power. • Latched address and control. • Simple interface. • Suitable for many peripherals. • [Figure: the AHB or ASB connects a high-performance ARM processor, high-bandwidth on-chip RAM, a high-bandwidth memory interface, and a DMA bus master; the APB bridge connects the APB, which carries the UART, timer, keypad, and PIO.]
6.2 Activity of the Peripheral Bus • State diagram. • IDLE • The default state for the peripheral bus. • SETUP • The bus moves into this state when a transfer is required. • The bus remains in the SETUP state for one clock cycle and always moves to the ENABLE state next. • PSELx is asserted. • ENABLE • PENABLE is asserted. • The address, write, and select signals all remain stable during the transition from SETUP to ENABLE. • Glitches on these signals are acceptable during the transition from ENABLE to SETUP. • [Figure: state diagram. IDLE loops on "no transfer" and moves to SETUP on "transfer"; SETUP always moves to ENABLE; ENABLE moves to SETUP on "transfer" and to IDLE on "no transfer".]
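The state diagram translates directly into a three-state machine. The sketch below models it in Python as an aid to reading the diagram, not as synthesizable logic; `ApbState`, `next_state`, and `outputs` are hypothetical names, and `transfer` stands for the "a transfer is required" condition.

```python
from enum import Enum


class ApbState(Enum):
    IDLE = "IDLE"
    SETUP = "SETUP"
    ENABLE = "ENABLE"


def next_state(state, transfer):
    """Return the state after one clock edge; `transfer` is True when a transfer is required."""
    if state is ApbState.IDLE:
        return ApbState.SETUP if transfer else ApbState.IDLE
    if state is ApbState.SETUP:
        return ApbState.ENABLE  # SETUP always lasts exactly one cycle
    # ENABLE: start the next transfer immediately, or fall back to IDLE.
    return ApbState.SETUP if transfer else ApbState.IDLE


def outputs(state):
    """Return (PSELx, PENABLE) for the addressed peripheral in `state`."""
    psel = int(state is not ApbState.IDLE)
    penable = int(state is ApbState.ENABLE)
    return psel, penable


state = ApbState.IDLE
for transfer in (False, True, True, True, False, False):
    state = next_state(state, transfer)
    print(state.name, outputs(state))
```

Each transfer therefore occupies two clock cycles, SETUP then ENABLE, and back-to-back transfers never pass through IDLE.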
6.2 AMBA APB Interface Design • APB bridge interface. • System bus slave interface. • Selects: PSEL1, PSEL2, ..., PSELn. • Strobe: PENABLE. • Address and control: PADDR, PWRITE. • Read data: PRDATA. • Write data: PWDATA. • Reset: PRESETn. • Clock: PCLK.
6.2 AMBA APB Slave • APB slave interface. • Select: PSELx. • Strobe: PENABLE. • Address and control: PADDR, PWRITE. • Read data: PRDATA. • Write data: PWDATA. • Reset: PRESETn. • Clock: PCLK.
6.3 I/O Interface Design
6.3 I/O Interface • Provides a method for transferring information between CPU (or internal storage) and external I/O devices. • I/O devices connected to a computer need special communication links for interfacing them with the CPU.
6.3 Purposes of the Communication Link • Conversion of signal values. • The manner of operation of an I/O device may differ from that of the CPU. • Providing a synchronization mechanism. • The data transfer rate of I/O devices is usually slower than the transfer rate of the CPU. • Word format transformation. • Data codes and formats in I/O devices differ from the word format in the CPU. • The control of I/O devices. • To ensure that the operation of an I/O device is not disturbed by other I/O devices.
6.3 I/O Bus and Interface Modules • The I/O bus consists of data lines, address lines, and control lines.
6.3 I/O versus Memory Bus • There are three ways that computer buses can be used to communicate with memory and I/O: • Use two separate buses, one for memory and the other for I/O. • Use one common bus for both memory and I/O but have separate control lines for each. • Use one common bus for memory and I/O with common control lines.
6.3 Isolated versus Memory-Mapped I/O • Isolated I/O. • Isolate all I/O interface addresses from the addresses assigned to memory. • Distinct input and output instructions for I/O transfer. • Memory-mapped I/O. • Use the same address space for both memory and I/O. • No specific input or output instructions. • The CPU manipulates I/O data with the same instructions that are used to manipulate memory words.
6.3 Example of I/O Interface
CS   RS1  RS0  Register selected
0    X    X    None: data bus in high-impedance
1    0    0    Port A register
1    0    1    Port B register
1    1    0    Control register
1    1    1    Status register
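The register-select table above can be expressed as a small decode function. This is a behavioural sketch in Python; the function name `select_register` is hypothetical, and returning `None` stands in for the data bus being left in high impedance.

```python
REGISTERS = (
    "Port A register",   # RS1 RS0 = 0 0
    "Port B register",   # RS1 RS0 = 0 1
    "Control register",  # RS1 RS0 = 1 0
    "Status register",   # RS1 RS0 = 1 1
)


def select_register(cs, rs1, rs0):
    """Return the selected register name, or None when the chip is not selected."""
    if not cs:
        # CS = 0: RS1 and RS0 are don't-cares and no register drives the bus.
        return None
    return REGISTERS[(rs1 << 1) | rs0]


for cs, rs1, rs0 in ((0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)):
    print(cs, rs1, rs0, select_register(cs, rs1, rs0))
```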
6.3 Asynchronous Data Transfer • Asynchronous data transfer between two independent units requires control signals to indicate when data is being transmitted. • Two types of control mechanism are used for transferring data between two independent units. • Strobe control. • Handshaking.
6.3 Strobe Control (1/3) • The strobe control method employs a single control line to time each transfer. • The strobe may be activated by either the source or the destination.