Soft Core Viterbi Decoder
EECS 290A Project
Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang
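High Level Architecture
[Figure: block diagram annotated with each block's share of gates, area, and power. Block labels are not recoverable; the surviving (% Gates / % Area / % Power) triples are 4/1/4, 23/36/29, 38/8/21, 2/1/4, 0/48/18, 18/4/15, 9/2/8.]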
Branch & Path Metric Generation
[Figure: hard-wired path metric interconnect]
• Branch metric computation appears to be implemented with a CORDIC block (840 MUXes, 58 adders & flip-flops, 32 15-bit buses)
• Branch metrics are hard-wired to each ACS unit
• Path metrics are stored in the ACS units
• Each ACS unit handles 16 states
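The hard-wired path metric interconnect follows from the trellis structure. The sketch below is a minimal model assuming 64 states (four ACS units of 16 states each) and a state register into which each new input bit is shifted at the least-significant end; the slides do not give the actual state numbering, and `next_state`, `predecessors`, and `source_units` are hypothetical names. It lists which ACS units supply the path metrics each unit needs.

```python
# Hypothetical model of the hard-wired path metric interconnect.
# Assumptions: 64 states (4 ACS units x 16 states) and a state register
# into which each new input bit is shifted at the least-significant end.
NUM_STATES = 64
STATES_PER_UNIT = 16
NUM_UNITS = NUM_STATES // STATES_PER_UNIT


def next_state(state: int, bit: int) -> int:
    """State reached from `state` when input `bit` is shifted in."""
    return ((state << 1) | bit) & (NUM_STATES - 1)


def predecessors(state: int) -> tuple[int, int]:
    """The two states that can transition into `state`."""
    base = state >> 1
    return base, base | (NUM_STATES >> 1)


def source_units(unit: int) -> list[int]:
    """ACS units whose stored path metrics `unit` must read."""
    first = unit * STATES_PER_UNIT
    units = set()
    for state in range(first, first + STATES_PER_UNIT):
        for pred in predecessors(state):
            units.add(pred // STATES_PER_UNIT)
    return sorted(units)


for u in range(NUM_UNITS):
    print(f"ACS unit {u} reads path metrics from units {source_units(u)}")
```

Under this convention every unit reads from exactly two units (units 0 and 1 from units 0 and 2; units 2 and 3 from units 1 and 3), so the connections can be fixed wiring rather than a switch network.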
ACS Architecture
[Figure: ACS unit datapath with 8x9 SRAMs, PMU/PML and BMU/BML inputs, Add and Compare-Select stages, a pipeline register, and a PML MUX]
• Each ACS unit stores 32 path metrics
• Only two SRAMs are active at a time
• Across all four ACS units, each path metric is stored twice
• SRAM accounts for 88% of the area and 27% of the power of each ACS unit
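The add-compare-select step performed for each state can be sketched as follows. The 9-bit metric width comes from the 8x9 SRAM word; the saturating add, the "smaller metric wins" rule, and the tie-breaking choice are assumptions, and `acs` is a hypothetical name. The decision bit it returns is the kind of value the traceback memory stores per state.

```python
# Sketch of one add-compare-select (ACS) operation.
# Assumptions: unsigned 9-bit path metrics (matching the 8x9 SRAM word),
# saturating addition, and "smaller metric wins" with ties going to path 0.
PM_BITS = 9
PM_MAX = (1 << PM_BITS) - 1


def acs(pm0: int, bm0: int, pm1: int, bm1: int) -> tuple[int, int]:
    """Return (new path metric, decision bit) for one trellis state."""
    # Add: extend each candidate path by its branch metric, saturating at 9 bits.
    cand0 = min(pm0 + bm0, PM_MAX)
    cand1 = min(pm1 + bm1, PM_MAX)
    # Compare-select: keep the better path and record which one survived.
    if cand1 < cand0:
        return cand1, 1
    return cand0, 0


print(acs(10, 3, 12, 0))  # (12, 1): path 1 survives
```

Saturation keeps the sketch short; fixed-width hardware decoders commonly renormalize metrics or use modulo arithmetic instead, since two saturated candidates can no longer be told apart.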
Traceback Architecture
[Figure: traceback unit block diagram: traceback memory units (pipeline register, MUX, SRAM) pass 192 decision bits to the traceback finite-state machine (labels: Decision Bits, Next_ramin). Traceback memory unit: 22% area, 20% power; finite-state machine: 11% area, 13% power.]
• State-machine blocks are just large sum-of-products combinational networks (351 gates each)
• Each memory unit contains a 16x64 SRAM and logic (192 MUXes, 128 flip-flops)
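Traceback walks backward through the stored decision bits. The sketch below treats each 64-bit word as one trellis stage, as in one row of a 16x64 SRAM. It assumes a 64-state trellis where the newest input bit is the state's least-significant bit and decision bit 1 means the predecessor with the top state bit set survived; these conventions and the name `traceback` are hypothetical, and the code is a software model, not the unit's finite-state machine.

```python
# Sketch of traceback over stored decision words (one 64-bit word per stage).
# Assumptions: 64 states; the newest input bit is the state's least-significant
# bit; decision bit d for state s selects predecessor (s >> 1) | (d << 5).
NUM_STATES = 64


def traceback(decisions: list[int], final_state: int) -> list[int]:
    """Recover the input bits, oldest first, from per-stage decision words."""
    state = final_state
    bits = []
    for word in reversed(decisions):
        bits.append(state & 1)            # input bit that led into this state
        d = (word >> state) & 1           # surviving-predecessor decision
        state = (state >> 1) | (d * (NUM_STATES >> 1))
    bits.reverse()
    return bits


# Build decisions along a known input sequence, then trace them back.
inputs = [1, 0, 1, 1, 0, 0, 1]
state, decisions = 0, []
for b in inputs:
    nxt = ((state << 1) | b) & (NUM_STATES - 1)
    d = 1 if state & (NUM_STATES >> 1) else 0
    decisions.append(d << nxt)            # only the surviving path's bit is set
    state = nxt
print(traceback(decisions, state))        # [1, 0, 1, 1, 0, 0, 1]
```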
Design Flow
Synthesis & Module Generation → Pre-Layout Verification & Analysis → Floor Planning → Place & Route → Post-Layout Verification & Analysis
Synthesis & Module Generation
• Design Compiler synthesis script (from Mentor/Inventra)
• SRAM generator (from Norman Walker)
Pre-Layout Verification & Analysis
• VHDL gate-level simulations (timing verification, switching activity annotation)
• PowerMill simulations (SRAM, core)
• Design Compiler, Power Compiler (static timing, power analysis)
Floor Planning
• Preview
Place & Route
• Silicon Ensemble
Post-Layout Verification & Analysis
• Interconnect parasitic extraction ("report simcap")
• PowerMill simulations, PathMill static analysis
• Design Compiler, Power Compiler (static timing, power analysis with back-annotated interconnect parasitics)
Synthesis and SRAM Generation
• Synthesis with Synopsys Design Compiler
  • Constraint: 66 kHz clock (a period long enough to be effectively unconstrained)
  • Bottom-up synthesis of 62 VHDL entities
• Low-power SRAM generator (from Pleiades)
  • Very large sense amplifiers and control logic
  • Optimized for power and speed at low supply voltages
  • Word length limited to a power of 2
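Because the generator's word length is limited to a power of 2, a 9-bit path metric word has to be rounded up. The helper below (a hypothetical name) models only that rounding rule, not the generator itself, and reports how much of each stored word goes unused.

```python
def generator_word_length(bits_needed: int) -> int:
    """Smallest power of 2 that is >= bits_needed (assumed generator rule)."""
    return 1 << (bits_needed - 1).bit_length()


needed = 9  # path metric width stored in the ACS SRAMs
width = generator_word_length(needed)
unused = 1 - needed / width
print(width, f"{unused:.0%} of each word unused")
```

For 9 bits this gives 16, which matches the 16x8-bit macro cells that were used in place of 9x8 SRAMs in the layout.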
Simulation Models
Behavioral C
• Parameterized, bit-true, and fast
• Used for system-level design and BER simulations
Behavioral VHDL
• Parameterized, bit-true, and cycle-true
• Used for structural simulations and as the test bench reference
RTL VHDL
• Synthesizable, crafted for specific parameters and implementation structure
• Used for synthesis
SRAM
• Simulation tools: TimeMill & PowerMill
• Parameters
  • 66 MHz clock
  • 2.5 V supply
  • Randomly generated test vectors
• Results
  • Power analysis
  • Timing analysis
SRAM: Power Numbers
• SRAM used for the ACS unit: 8 words by 9 data bits
Without parasitic extraction:
Operation        Avg. current (µA)   Avg. power (mW)   Avg. energy (pJ)
Read activity    663.73              1.659             24.885
Write activity   563.21              1.408             21.120
Read/Write       612.29              1.530             22.950
With parasitic extraction:
Operation        Avg. current (µA)   Avg. power (mW)   Avg. energy (pJ)
Read activity    949.89              2.3747            35.6205
Write activity   772.830             1.9320            28.980
Read/Write       851.42              2.1285            31.9275
SRAM: Power Numbers
• SRAM used for the traceback unit: 16 words by 64 data bits
Operation        Avg. current (µA)   Avg. power (mW)   Avg. energy (pJ)
Read activity    2170.7              5.4267            81.4005
Write activity   1893.4              4.7335            71.0025
Read/Write       2086.9              5.2172            78.2580
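The columns in both SRAM power tables are consistent with power = average current × 2.5 V and energy = power × 15 ns, one period of the roughly 66 MHz simulation clock. The sketch below (`power_and_energy` is a hypothetical helper) re-derives a few entries from the measured currents; differences in the last digits come from the tables rounding power before multiplying by the period.

```python
SUPPLY_V = 2.5      # simulation supply voltage
PERIOD_NS = 15.0    # ~66 MHz clock period implied by the tables


def power_and_energy(current_ua: float) -> tuple[float, float]:
    """Return (average power in mW, energy per clock cycle in pJ)."""
    power_mw = current_ua * SUPPLY_V / 1000.0   # uA * V = uW, then to mW
    energy_pj = power_mw * PERIOD_NS            # mW * ns = pJ
    return power_mw, energy_pj


measured_ua = {
    "ACS read": 663.73,
    "ACS read, extracted": 949.89,
    "Traceback read": 2170.7,
}
for name, current in measured_ua.items():
    p, e = power_and_energy(current)
    print(f"{name}: {p:.4f} mW, {e:.3f} pJ")
```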
SRAM: Timing Numbers
• Delays
  • Setup time; hold time
  • Data resolution: time needed for the data/address to become stable
SRAM             Setup (ns)   Hold (ns)   Data resolution (ns)
ACS SRAM         ~1           ~2          ~1.8
Traceback SRAM   ~1           ~2          ~5
Place and Route
• Floor planning of the Viterbi SRAM macro cells and standard cells was done in Preview; Silicon Ensemble was used for routing.
• Total SRAM macro cell area was 1.58 mm² (1.08 mm² with 9x8 SRAMs)
• Area of the 16 9x8-bit SRAM macro cells: 0.052 mm² each, 62% larger than required, because 16x8-bit SRAMs were used (the SRAM generator output had been verified for powers of 2)
• Area of the 3 16x64-bit SRAM macro cells: 0.25 mm² each
• Area of the standard cells: 1.02 mm² (0.35 mm² from the DEF file)
• Final chip area was 4.0 mm² (original estimate: 2.5 mm²)
• Parasitics for timing simulation were extracted from the final routed nets in Silicon Ensemble.
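The area figures can be cross-checked with simple arithmetic: the SRAM total follows from the per-macro areas, and the 4.0 mm² chip area from the final 1.25 mm x 3.2 mm die. The snippet below is only that check.

```python
acs_sram_mm2 = 16 * 0.052      # sixteen 16x8-bit macros standing in for 9x8 SRAMs
tb_sram_mm2 = 3 * 0.25         # three 16x64-bit traceback macros
total_sram_mm2 = acs_sram_mm2 + tb_sram_mm2
die_mm2 = 1.25 * 3.2           # final die dimensions in mm

print(f"SRAM macros: {total_sram_mm2:.2f} mm^2")
print(f"Die: {die_mm2:.2f} mm^2")
print(f"SRAM share of die: {total_sram_mm2 / die_mm2:.0%}")
```

The macros account for roughly 40% of the final die.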
Wiring Statistics
• Six metal layers; layers 5 and 6 used for power and ground, respectively
• Ground and power rails alternate, spaced 100 µm apart horizontally and vertically
• About 6,200 nets and 46,114 vias
• Total wire lengths:
  • Metal 1: 3,293 µm
  • Metal 2: 458,440 µm
  • Metal 3: 510,517 µm
  • Metal 4: 218,023 µm
  • Metal 5: 96,882 µm signal, 38,400 µm power
  • Metal 6: 8,660 µm signal, 37,500 µm ground
  • Wire length: 685 mm horizontal, 611 mm vertical, 1,296 mm total
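Summing the per-layer figures shows that the signal wiring alone comes to about 1,296 mm, matching 685 mm + 611 mm, so the 1,296 mm total appears to exclude the power and ground wiring on metal 5 and 6. The snippet is only that arithmetic check, using the numbers from the slide.

```python
# Per-layer wire lengths in micrometres, copied from the slide.
signal_um = {1: 3_293, 2: 458_440, 3: 510_517, 4: 218_023, 5: 96_882, 6: 8_660}
supply_um = {5: 38_400, 6: 37_500}   # metal 5 power, metal 6 ground

signal_mm = sum(signal_um.values()) / 1000
supply_mm = sum(supply_um.values()) / 1000
print(f"signal: {signal_mm:.1f} mm, power/ground: {supply_mm:.1f} mm")
print(f"horizontal + vertical: {685 + 611} mm")
```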
Final Placement and Routing
• Significant routing congestion at the 16x64-bit SRAM outputs, due to Silicon Ensemble's 1 µm grid size (observe the white and light blue wires)
• At least 6 unroutable nets remained, even at a 12 mm² chip area
• Final size was 1.25 mm x 3.2 mm (4 mm²), with 9 unroutable nets
• Silicon Ensemble's violation reports did not identify which nets were unroutable; they flagged only problems with ground and power connections.
Static Timing Checks
• All timing checks performed with Design Compiler's report_timing command
• Parasitic capacitances back-annotated with the set_load command
• No RC parasitics annotated
• No SRAM model was used for timing checks
• The critical path ran from the ACS control logic, through a PM output MUX select signal (in one ACS unit), and through the following ACS unit
• Checks performed at 2.5 V
Static Power Checks
• All power checks performed with Design Compiler's report_power command
• Switching activity was measured for every output port (transition counts over a 16,000-cycle simulation)
• Back-annotation performed with SAIF files
• No SRAM model was used for power checks (SRAM power was added manually)
• Checks performed at 2.5 V with a 60 MHz clock
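Annotated transition counts feed the standard estimate for net switching power, P = ½·C·V²·(toggle rate). The sketch below applies that relation to a single net; the 10 fF load and 8,000 transitions are hypothetical values for illustration, while the 16,000-cycle window, 2.5 V supply, and 60 MHz clock come from the slide. It covers only net switching power, not the internal and leakage components a full power report also includes, and `switching_power_w` is a hypothetical name.

```python
def switching_power_w(load_f: float, vdd: float, transitions: int,
                      cycles: int, f_clk_hz: float) -> float:
    """Net switching power: 0.5 * C * V^2 * toggle rate (transitions/s)."""
    toggle_rate = transitions / cycles * f_clk_hz
    return 0.5 * load_f * vdd ** 2 * toggle_rate


# Hypothetical net: 10 fF load toggling 8,000 times in the 16,000-cycle run.
p = switching_power_w(10e-15, 2.5, 8_000, 16_000, 60e6)
print(f"{p * 1e6:.4f} uW")
```

Because power scales linearly with the toggle rate, the same activity annotation at a lower clock frequency scales this term down proportionally.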
Performance Results
For a fixed throughput requirement of 100 ksps:
Summary
• Performance in intended operation (100 ksps)
  • Clock speed: 1.6 MHz
  • Power dissipation: 0.14 mW
  • Power density: 34.9 µW per mm²
• Cost
  • Die size: 4 mm²
  • Design effort: 30 work days
• Predictability and portability
  • Mentor/Inventra predictions vs. measured results
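The summary figures are mutually consistent: a 1.6 MHz clock at 100 ksps leaves 16 clock cycles per sample, and 0.14 mW over 4 mm² is about 35 µW/mm² (34.9 on the slide, presumably from unrounded values). The same numbers give the energy spent per sample.

```python
clock_hz = 1.6e6
throughput_sps = 100e3
power_w = 0.14e-3
die_mm2 = 4.0

cycles_per_sample = clock_hz / throughput_sps
density_uw_per_mm2 = power_w * 1e6 / die_mm2
energy_nj_per_sample = power_w / throughput_sps * 1e9

print(cycles_per_sample, round(density_uw_per_mm2, 2), round(energy_nj_per_sample, 3))
```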