Reconfigurable Computing Using Content Addressable Memory (CAM) for Improved Performance and Resource Usage
Group Members: Anderson Raid Marie Beltrao Raphael Christian
Outline
• Introduction – Literature Review
• Coarse Grain
• CAM
• Objectives of the paper
• CAM-based computing scheme
• MCB – hardware
• Multi-MCB Communication
• Application Mapping Process
• Estimation of Cycle time and performance
• Design and Organization of a Ternary CAM (TCAM)
• Hybrid CAM-LUT
• MCB – Delay and power components
• Simulation Results
Introduction – Literature Review
• Traditional FPGA
  > Significant design overhead and poor scalability with process technology
  > LUT-based
  > 80% or more of the power goes to the programmable interconnects
• Multi-cycle memory-based computing
  > Reduction in memory requirement
  > Little or no degradation in performance
  > CAM as the underlying memory
Introduction – Literature Review
• Fine grain vs. coarse grain
Fine grain:
• Fine control over bit-width
• Bit-level operations
• CAD tools available
• Flexible
• Costs: speed, power consumption, time to configure
Coarse grain:
• Less routing
• Better instruction density
• Better cycle times
• Small configuration sizes
• Little CAD support
• Less flexible
Introduction – Literature Review
Why coarse grain?
• Improves both performance and reliability of operation
• Significantly reduces configuration memory and configuration time
• Reduces routing overhead and improves routability
• Improves area and delay by minimizing the contribution of the programmable interconnects
• Spatial computing + multi-cycle computing
• LUT vs. CAM trade-off
Introduction – Literature Review
What is CAM?
• “Content Addressable Memory”
• Word length ranging from 36 to 144 bits
• Address space from 7 to 15 bits
• Access times as low as 0.25 ns
• The Embedded System Block (ESB) of the APEX20K from Altera Corporation incorporates such an embedded memory, but it cannot exploit the optimization obtained by considering don’t-care terms
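To make the "search by content" idea concrete, here is a minimal software sketch of a binary CAM. It is only a behavioural model: the class name `BinaryCAM` and its methods are hypothetical, and the sequential loop stands in for the parallel row comparison that real CAM hardware performs. The 36-bit word and 7-bit address sizes are taken from the lower ends of the ranges quoted above.

```python
from typing import List, Optional


class BinaryCAM:
    """Toy content-addressable memory: search by stored word, get its address."""

    def __init__(self, word_bits: int, address_bits: int) -> None:
        self.word_bits = word_bits
        self.capacity = 1 << address_bits  # number of rows
        self.words: List[Optional[int]] = [None] * self.capacity

    def write(self, address: int, word: int) -> None:
        if not 0 <= address < self.capacity:
            raise IndexError("address out of range")
        if not 0 <= word < (1 << self.word_bits):
            raise ValueError("word does not fit in word_bits")
        self.words[address] = word

    def search(self, key: int) -> Optional[int]:
        # Hardware compares all rows at once; this loop only models the result.
        for address, word in enumerate(self.words):
            if word == key:
                return address  # lowest matching address wins
        return None


cam = BinaryCAM(word_bits=36, address_bits=7)
cam.write(5, 0xABCDE)
cam.write(9, 0x12345)
print(cam.search(0x12345))  # address where the word is stored
print(cam.search(0xFFFFF))  # None: no row holds this word
```

Note the inversion relative to a RAM: the caller supplies data and receives an address, rather than supplying an address and receiving data.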
Objective of the paper
Implement “(…) a multi-cycle Memory Based Computational methodology that utilizes Content Addressable Memory (CAM) as the underlying reconfigurable fabric”
• Implement a large application efficiently
• Propose a CAM-based implementation of reconfigurable computing
• Discuss the circuit implementation and develop a scalable hardware framework that allows a large design to be mapped to multiple computational units
• Propose a hybrid LUT-CAM function representation that further reduces the memory requirement by storing some partitions in CAM and the others in LUT
CAM-based computing scheme
• Stores functional responses
MCB – hardware
Memory-based Computational Block (MCB)
• Stores and evaluates up to 128 partitions, 32 in each bank, with each partition having 12 inputs and outputs.
Multi-MCB Communication
[Figure: functional block diagram for memory-based computing]
Multi-MCB Communication
• A single MCB node has limited memory resources, which restricts scalability for larger applications
• Multi-MCB communication is designed to minimize interconnect overhead
• Hierarchical interconnect architecture
Application Mapping Process
• Partitioning
• Greedy heuristic-based partitioning approach that produces multi-input, multi-output logic blocks
• Posed as an optimization problem: evaluation time is the objective and memory requirement is the constraint
Application Mapping Process • Partitioning
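The slides do not spell out the partitioning heuristic, so the following is only an illustrative sketch of one greedy strategy, not the authors' algorithm. It assumes the netlist is a dictionary listed in topological order and uses the number of external inputs per partition as a stand-in for the memory constraint (the MCB described earlier allows 12; the example uses 4 to keep it small). The function names and the sample netlist are hypothetical.

```python
from typing import Dict, List, Set


def external_inputs(gates: List[str], netlist: Dict[str, List[str]]) -> Set[str]:
    """Signals a group of gates reads but does not produce itself."""
    produced = set(gates)
    return {src for g in gates for src in netlist[g] if src not in produced}


def greedy_partition(netlist: Dict[str, List[str]], max_inputs: int) -> List[List[str]]:
    """Pack gates (given in topological order) into partitions, greedily.

    A gate joins the current partition while the partition still needs at
    most `max_inputs` external signals; otherwise a new partition is opened.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    for gate in netlist:  # dicts preserve insertion order
        if len(external_inputs([gate], netlist)) > max_inputs:
            raise ValueError(f"gate {gate} alone exceeds the input limit")
        if current and len(external_inputs(current + [gate], netlist)) > max_inputs:
            partitions.append(current)
            current = []
        current.append(gate)
    if current:
        partitions.append(current)
    return partitions


# Hypothetical netlist: gate name -> signals it reads (a..f are primary inputs).
netlist = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["e", "f"],
    "g5": ["g3", "g4"],
}
parts = greedy_partition(netlist, max_inputs=4)
print(parts)
```

Absorbing `g3` into the first partition costs nothing because its inputs are produced inside that partition, which is why larger multi-input, multi-output blocks can reduce the number of evaluation steps.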
Application Mapping Process
• Scheduling
• Multi-cycle evaluation at each MCB requires a heuristic-based algorithm for scheduling the execution of the partitions
• Static scheduling
• Goal: minimize the number of evaluation cycles
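As a rough illustration of static scheduling, the sketch below assigns partitions to evaluation cycles before execution, respecting data dependencies. It is a generic list-scheduling heuristic under stated assumptions, not the algorithm from the paper: `units` (how many partitions can be evaluated per cycle) and the dependency table are hypothetical.

```python
from typing import Dict, List, Set


def static_schedule(deps: Dict[str, List[str]], units: int) -> List[List[str]]:
    """Assign partitions to evaluation cycles, at most `units` per cycle.

    `deps[p]` lists the partitions whose outputs p needs. A partition is
    ready once all of those were evaluated in an earlier cycle.
    """
    done: Set[str] = set()
    remaining = list(deps)
    cycles: List[List[str]] = []
    while remaining:
        ready = [p for p in remaining if all(d in done for d in deps[p])]
        if not ready:
            raise ValueError("cyclic dependency between partitions")
        this_cycle = ready[:units]  # greedy: take the first ready partitions
        cycles.append(this_cycle)
        done.update(this_cycle)
        remaining = [p for p in remaining if p not in done]
    return cycles


# Hypothetical dependency table between five partitions.
deps = {"p1": [], "p2": [], "p3": ["p1", "p2"], "p4": ["p3"], "p5": []}
schedule = static_schedule(deps, units=2)
print(schedule)
print(len(schedule), "evaluation cycles")
```

Because the schedule is computed ahead of time, the order can be stored alongside the configuration and no run-time arbitration is needed.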
Estimation of Cycle time and performance
• Simulations were carried out using a 70 nm technology model
• Cycle time was estimated for a LUT-based MCB framework
• Improvement of 56.3% in processing time
• At the cost of a 23.6% increase in energy per vector
Estimation of Cycle time and performance
• The performance improvement offered by the proposed framework was also validated for two algorithm-specific applications:
• DCT: Discrete Cosine Transform
• FIR: Finite Impulse Response
Design and Organization of a Ternary CAM (TCAM)
• Allows pattern matching with the use of “don’t cares”
• Attractive for implementing longest-prefix-match searches in routing tables
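A small software model can show how don't-care bits give longest-prefix matching. In this sketch each entry is a string over `0`, `1` and `X` (don't care), and priority is simply list order, so longer prefixes are placed first. The 8-bit table, the port names and the function names are all made up for illustration.

```python
from typing import List, Optional, Tuple


def matches(pattern: str, key: str) -> bool:
    """True if every non-'X' position of the ternary pattern equals the key bit."""
    return len(pattern) == len(key) and all(
        p in ("X", k) for p, k in zip(pattern, key)
    )


def tcam_lookup(entries: List[Tuple[str, str]], key: str) -> Optional[str]:
    """Return the value of the first matching entry (priority = list order)."""
    for pattern, value in entries:
        if matches(pattern, key):
            return value
    return None


# Hypothetical 8-bit routing table, sorted so longer prefixes come first.
routes = [
    ("101100XX", "port A"),   # 6-bit prefix
    ("1011XXXX", "port B"),   # 4-bit prefix
    ("XXXXXXXX", "default"),  # matches anything
]

print(tcam_lookup(routes, "10110010"))  # matches all three; first entry wins
print(tcam_lookup(routes, "10111111"))  # fails the 6-bit prefix, hits the 4-bit one
print(tcam_lookup(routes, "00000000"))  # only the catch-all matches
```

The same mechanism is what lets a TCAM store a logic function compactly: one row with don't-care bits can cover many input combinations that a LUT would have to list individually.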
Hybrid CAM-LUT
• The proposed framework contains both PLA- and LUT-based representations, which is advantageous for memory-efficient realization of all classes of functions (hybrid CAM/LUT-based).
• A hybrid approach can potentially reduce the total memory requirement.
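The per-partition choice between CAM and LUT can be sketched with a simple memory cost model. The model below is an assumption made for illustration, not the one used in the paper: a LUT stores one output word per input combination, while a PLA-style CAM stores one row per product term (cube), with 2 bits per ternary input cell plus the output bits. The function names and cube counts are hypothetical.

```python
def lut_bits(n_inputs: int, n_outputs: int) -> int:
    """LUT storage: one n_outputs-bit word for each of the 2**n_inputs addresses."""
    return (1 << n_inputs) * n_outputs


def cam_bits(n_cubes: int, n_inputs: int, n_outputs: int) -> int:
    """Assumed CAM storage: per cube, 2 bits per ternary input plus the outputs."""
    return n_cubes * (2 * n_inputs + n_outputs)


def choose_representation(n_cubes: int, n_inputs: int, n_outputs: int) -> str:
    """Pick whichever representation needs fewer bits for this partition."""
    if cam_bits(n_cubes, n_inputs, n_outputs) < lut_bits(n_inputs, n_outputs):
        return "CAM"
    return "LUT"


# A partition with 12 inputs and 12 outputs, as in the MCB described earlier.
print(lut_bits(12, 12))                     # 4096 words * 12 bits
print(cam_bits(100, 12, 12))                # 100 cubes * 36 bits
print(choose_representation(100, 12, 12))   # few cubes: CAM is smaller
print(choose_representation(2000, 12, 12))  # many cubes: LUT is smaller
```

Under this model the LUT cost is fixed by the input count, while the CAM cost grows with the number of cubes, so functions with many don't cares favour CAM and dense functions favour LUT.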
Questions? Thank you!