Liquid Architecture – R esults of Optimization of the LEON

John W LockwoodVisiting Associate Professor Stanford Universityjwlockwd@stanford.edu http://stanford.edu/~jwlockwd Slides from Raw 2004, 2006, by Shobana Padmanbhan, et. al. In Collaboration with: Ron Cytron, Roger Chamberlain, Jason Fritts David Schuehler, Phillip Jones, Scott Friedman, Ben Brodie, Huakai Zhang Funded by : NSF 03-13203 Washington University in St. Louis www.arl.wustl.edu/arl/projects/fpx/projects/liquid_arch/ Liquid Architecture –Results of Optimization of the LEON

Soft processors App performance • Parameterized general purpose processors FPGA resources Power Number of registers Set size Set Associativity • Customization is performance-cost tradeoff • More “knobs” more options for customization

Actual System Configuration

Soft processor customization • LEON: 10 reconfigurable subsystems • Instruction cache • Parameters: sets, set size, line size, replacement policy • 4 * 7 * 2 * 3 = 168 configurations (4 parameters; 16 values) • Data cache • sets, set size, line size, replacement, fast read, fast write, local RAM, local RAM size • 168 * 2 * 2 * 2 * 7 = 9,408 configns (8 params; 29 values) • Integer unit • multiplier, registers, fast jump, fast decode, ICC, load delay, FPU enable, co-processor enable, hardware watchpoints • 119,040 configurations (10 parameters; 56 values) • & Floating-point unit, memory controller, peripherals,… • In total • 190 parameter values; 5*(1024) configurations!!

Manual Adjustment of LEON Parameters

Highlights of optimization technique • Optimize and Bound Parameterize • Search space still includes all 5*(1024) configurations • Build only 100’s instead of 5*(1024) of configurations • Formulate as binary integer nonlinear optimization program • Measure actual cost and performance of resulting application performance andprocessor area

Cost measurement • Application runtime • Measured in cycles from direct execution • Hardware-based profiler providesnon-intrusive, cycle-accurate, near-real-time measurement • FPGA resources • Measured In terms as number of LUTs and BRAM, from actual build • Power / Energy : Work in Progress • FPL, FPT, VLSI conference ..

Optimization technique Start with soft processor: base configuration Perturb parameter values, build configuration, track resource cost Run application on each configuration, trackruntime cost Formulate costs as Binary Integer NonlinearProgram Solve using TOMLAB/MatLab

Processor ICache reconfiguration

Processor ICache reconfiguration xi = 0 or 1 (off or on)

Processor ICache reconfiguration xi = 0 or 1 (off or on) No constraint needed

FPGA resource constraints • LUTs • BRAM xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

Optimization • Optimize application runtime • Optimize resource utilization also xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

Problem formulation • Minimize • Subject to … … Parameter validity constraints FPGA resource constraints Binary variables constraint xi = 0 or 1 (off or on) ri,li,bi: delta costs from base configuration n is number of configurations

Evaluation Our technique selects the same configuration Despite parameter independence assumption, near-optimal configuration

FPX Module(Stacked below Enet card) Virtex w/LEON core Gigabit Ethernet Serial port Front-panelNetworkPort

Choose methods to profile from the user interface Method Time / Cycles Liquid architecture: cycle-accurate profiling for free .text main addQuery findMatch computeKey computeBase coreLoop fillQuery Rnd

Method Address Range Liquid architecture: cycle-accurate profiling for free .text main Lo addQuery 0x4000027C 0x400003EF Hi findMatch computeKey computeBase coreLoop fillQuery Rnd

Liquid architecture: cycle-accurate profiling for free Method Event Monitor Bus PC CLK .text Stats Module main 0x4000035A Lo addQuery 0x4000027C 0x400003EF Hi findMatch computeKey computeBase coreLoop fillQuery Rnd

Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi addQuery Counter findMatch INCR computeKey computeBase coreLoop fillQuery Rnd

Liquid architecture: cycle-accurate profiling for free Function Event Monitor Bus PC CLK .text Stats Module Lo main 0x4000027C 0x4000035A 0x400003EF ≤ ≤ Hi addQuery Counter findMatch INCR computeKey computeBase Lo 0x400005D8 0x4000035A 0x4000061F ≤ ≤ Hi coreLoop fillQuery Counter INCR Rnd

Cache behavior Hits and misses in LEON

Cache behavior These signals are fed into the Event Monitoring Bus

Cache behavior Statistics Module

Cache behavior Statistics Module Statistics Module counts events

Liquid architecture enables fast, accurate results Seconds: fast, but no cache performance data available

Liquid architecture enables fast, accurate results Days: so slow you wouldn’t do this on the whole program

Liquid architecture enables fast, accurate results ½ hour: Practical, reasonably fast, totally accurate

Illustration of all configurations being searched

Distribution of generated configurations

Port of Liquid Architecture to RAMP • Determine optimal architecture for applications • Determine effect on performance withadditional parameters of • Interconnect bandwidth • Interconnect latency • Automate process to vary RDL parametersfor Multi-Processor system

Liquid Architecture – R esults of Optimization of the LEON

Liquid Architecture – R esults of Optimization of the LEON

Presentation Transcript

Software Architecture

Deterministic Optimization Models

Process Optimization

Distributed Systems Architecture Presentation II

Safe Handling and Use of Liquid Nitrogen

5. STERILIZATION OF LIQUID MEDIA

Dynamic Optimization 03.378 Dynamic Optimization BP/LP (nur für das Wahlpflichtfach Umweltökonomie) 2st Di 10-12, Geoma

Liquid Liquid Extraction:

Yolimar’s Book of Art

Solvent Extraction Liquid-Liquid Extraction Prepared by Dr.Nagwa El- mansy Cairo University Chemical Engineering Departm

QUERY OPTIMIZATION AND QUERY PROCESSING

Liquid-Liquid Extraction Lecture 23

Introduction Background Distributed DBMS Architecture Distributed Database Design

CSF

(Nonlinear) Multiobjective Optimization

6 . Distributed Query Optimization

On-Chip Interconnect Trend and Design Optimization

Software Architecture

-Early Islamic Architecture- -Moorish Architecture-

Introduction of Revit Architecture, Structure, and System