CS252 Project Presentation Optimizing the Leon Soft Core

Project Outline
• Goal: Reduce the size of Leon on FPGAs
• Our motivation for using Leon:
  • RAMP research: emulation of multiprocessors
• Analysis:
  • LUT breakdown
• Optimizations:
  • Circuit Level
  • Architectural Level

Leon Overview
• 32-bit SPARC V8 compliant processor
• 7-stage pipeline, in-order
• Separate L1 Instruction & Data caches
• Configurable cache size, associativity, replacement policy
• Optional Memory Management Unit
• AMBA bus interface to memory and peripherals
• Supports Symmetric Multiprocessing
• Open-source (Gaisler Research)

Area analysis
• Configuration
  • MMU: Combined I/D-TLB, 2 entries only
  • Integer MUL/DIV enabled
  • Cache: Direct-mapped I/D caches
• Variables
  • DSU - Debug support unit
  • Target clock
    • 20 MHz - easy to achieve
    • 200 MHz - over-constrained

Resource breakdown

Why it’s BIG
• Debugging Support
  • More MUXes
  • One additional pipeline stage
  • Useful for RAMP emulation / bootstrapping
• IU is over 50%
  • Barrel shifter
  • Pipeline control (forwarding)

Circuit Level Optimizations
• Store LRU bits in Block RAMs instead of Flip Flops
  • Also saves LUTs
• One-hot encoding for signals
  • Synthesis tool does a good job of 1-hot encoding for many signals (e.g., state encoding)
  • Applied this to the cache output
    • Instead of data(set), we can use data(0) or data(1) or data(2) or data(3)
  • Useful only for multiway caches
  • LUT savings: ~ 100 LUTs
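The one-hot cache output selection above can be sketched behaviorally in Python. This is only an illustration, not the Leon HDL: the function names and the assumption that each way is gated by its own hit bit before the OR are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # 32-bit data words


def select_indexed(data, way):
    """Binary-indexed select, i.e. data(set) in the slide's notation."""
    return data[way]


def select_one_hot(data, hit):
    """AND-OR select: mask each way with its own hit bit, then OR the results.

    Assumes at most one bit of `hit` is set (one-hot).
    """
    result = 0
    for word, h in zip(data, hit):
        result |= word & (WORD_MASK if h else 0)
    return result


# Four-way example: way 2 hits.
ways = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
hit = [0, 0, 1, 0]
selected = select_one_hot(ways, hit)
```

With a one-hot hit vector the AND-OR form returns the same word as the indexed form, without decoding a binary way number first.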

Circuit Level Optimizations
• Use fast-carry chain logic
  • Provided 30% savings in LUT usage for TLB entries
• Multipliers for barrel shifter
  • Left shift by b is the same as multiplication by 2^b; a right shift by b is the upper word of a multiplication by 2^(32-b)
  • Savings of ~ 100 LUTs
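A minimal Python sketch of shifting with a multiplier, assuming 32-bit operands and a full-width product. The function names are hypothetical, and this models only the arithmetic, not how the design maps it onto FPGA multiplier blocks. A logical left shift is the low word of x * 2^b, and a logical right shift is the high word of x * 2^(32-b).

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def shl_via_mul(x, b):
    """Logical left shift by b: low word of x * 2^b."""
    return (x * (1 << b)) & MASK


def shr_via_mul(x, b):
    """Logical right shift by b: high word of x * 2^(WIDTH - b)."""
    return ((x * (1 << (WIDTH - b))) >> WIDTH) & MASK


# Compare against Python's native shifts for every shift amount 0..31.
x = 0x80000001
left_ok = all(shl_via_mul(x, b) == ((x << b) & MASK) for b in range(WIDTH))
right_ok = all(shr_via_mul(x, b) == (x >> b) for b in range(WIDTH))
```

Note that for b = 0 the right-shift multiplicand is 2^32, which needs 33 bits; real hardware would have to special-case or widen that operand.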

LUTs for Integer Mul / Div
• 2195 / 18429* LUTs for the entire two-core system (12%)
• 11.5% of the Leon3 core
• *(Xilinx ISE)

Didn’t your mother teach you to share?
• Savings of ~350 LUTs for prototype
  • Only multiplier shared
  • Only two cores
• 10% could become 5%... 2.5%... 1%...
• Even more for MAC
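The "10% could become 5%... 2.5%... 1%" progression follows from dividing one shared unit's cost across more cores. A small sketch of that arithmetic, assuming an idealized model in which sharing adds no arbitration logic (the function name and that assumption are illustrative, not measured results):

```python
def per_core_share(unit_percent, cores):
    """Percent of one core's area attributable to a unit shared by `cores` cores.

    Idealized: ignores any arbiter or extra MUX cost introduced by sharing.
    """
    return unit_percent / cores


# A unit costing ~10% of a core, shared among 1, 2, 4 and 10 cores.
shares = [per_core_share(10.0, n) for n in (1, 2, 4, 10)]
```

In practice the arbitration logic and the added contention would eat into these figures, so they are best read as an upper bound on the benefit.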

Operand MUXes
• 32-bit, 7-to-1 MUX
• 32-bit, 5-to-1 MUX

Operand MUXes
• 313 LUTs + 64 MUXes each

Integer Pipeline Changes
• Remove all forwarding
  • Single thread: Just stall
  • Fine Grain Multithreading could boost performance
• LUTs saved: 27-37%
• Maximum frequency improvement: 20%
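Why fine-grain multithreading can hide the stalls left by removing forwarding can be illustrated with a toy issue model. This is a sketch under strong assumptions, not a model of Leon's pipeline: every instruction depends on the previous one in its thread, a result becomes readable `latency` cycles after issue (the value 3 below is hypothetical), and threads are issued in strict round-robin order.

```python
def cycles_to_finish(n_threads, instrs_per_thread, latency):
    """Cycles needed to issue all instructions with no forwarding.

    Each thread's next instruction may issue only `latency` cycles after
    that thread's previous one. One issue slot per cycle, strict round-robin;
    a slot whose thread is not ready becomes a bubble.
    """
    ready_at = [0] * n_threads                  # earliest issue cycle per thread
    remaining = [instrs_per_thread] * n_threads
    cycle = 0
    thread = 0
    while any(remaining):
        if remaining[thread] and ready_at[thread] <= cycle:
            remaining[thread] -= 1
            ready_at[thread] = cycle + latency
        # otherwise this slot is wasted as a stall bubble
        cycle += 1
        thread = (thread + 1) % n_threads
    return cycle


single = cycles_to_finish(1, 4, 3)   # 4 instructions, one thread
multi = cycles_to_finish(3, 4, 3)    # 12 instructions, three threads
```

In this model one thread needs 10 cycles for 4 instructions, while three interleaved threads issue 12 instructions in 12 cycles, because each thread's dependent instruction arrives exactly when its operand is ready.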

Conclusions
• CAD tools already perform many optimizations
  • Remove unused logic
  • Infer technology-dependent logic from HDL source, e.g. fast carry chain logic
  • Optimize logic globally

Conclusions
• Optimization is possible
• Higher levels yield (much) greater savings
  • Circuit Level: 200-300 LUTs
  • Architectural Level: 1000+ LUTs
  • Sharing: ~700 per core
• Total: 35-40% savings
