
RAMP Gold


Presentation Transcript


  1. RAMP Gold RAMPants {rimas,waterman,yunsup}@cs Parallel Computing Laboratory University of California, Berkeley

  2. A Survey of μArch Simulation Trends Typical ISCA 2008 papers simulated about twice as many instructions as those in 1998. So what?

  3. A Survey of μArch Simulation Trends Something seems broken here…

  4. A Survey of μArch Simulation Trends Something is clearly broken here.

  5. Something is Rotten in the State of California • A median ISCA ‘08 paper’s simulations run for fewer than four OS scheduling quanta! • We run yesterday’s apps at yesteryear’s timescales • And attempt to model N communicating cores with O(1/N) instructions per core?! • The problem is that simulators are too slow • Irony: since performance scales as sqrt(complexity), simulated instructions per wall-clock second falls as processors get faster

  6. RAMP Gold: Our Solution • RAMP Gold is an FPGA-based, 100 MIPS manycore simulator • Only 100x slower than real-time • Economical: RTL is BSD-licensed; commodity HW

  7. Our Target Machine 64 SPARC V8 cores, each with private I$ and D$, connected through a shared L2$ / interconnect to DRAM

  8. RAMP Gold Architecture • Mapping the target machine directly to an FPGA is inefficient • Solution: split timing and functionality • The timing logic decides how many target cycles an instruction sequence should take • Simulating the functionality of an instruction might take multiple host cycles • Target time and host time are orthogonal
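The timing/functionality split on this slide can be sketched in a few lines. This is an illustrative model only, not RAMP Gold's RTL: the class names and instruction latencies below are invented for the example.

```python
# Illustrative sketch of a functional/timing split (names and latencies
# are assumptions, not RAMP Gold's actual design).

class FunctionalModel:
    """Executes instructions; each one may take multiple host steps."""
    def __init__(self, program):
        self.program = program
        self.pc = 0

    def step(self):
        insn = self.program[self.pc]
        self.pc += 1
        return insn

class TimingModel:
    """Decides how many *target* cycles each instruction should take."""
    COSTS = {"add": 1, "load": 2, "mul": 3}   # assumed target latencies

    def __init__(self):
        self.target_cycles = 0

    def account(self, insn):
        self.target_cycles += self.COSTS.get(insn, 1)

# Target time advances per the timing model, independent of how many
# host cycles the functional model spent: the two clocks are orthogonal.
func = FunctionalModel(["add", "load", "mul", "add"])
timing = TimingModel()
for _ in range(4):
    timing.account(func.step())
print(timing.target_cycles)  # 1 + 2 + 3 + 1 = 7
```

Because only the timing model's parameters define the target machine, they can be changed at runtime without resynthesizing the functional pipeline.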

  9. Function/Timing Split Advantages • Flexibility • Can configure target at runtime • Synthesize design once, change target model parameters at will • Efficient FPGA resource usage • Example 1: model a 2-cycle FPU in 10 host cycles • Example 2: model a 16MB L2$ using only 256KB host BRAM to store tags/metadata

  10. Host Multithreading How are we going to model 64 cores? Rather than build 64 pipelines, we time-multiplex one pipeline: each host cycle, the pipeline issues from a different core’s state • Single hardware pipeline with multiple copies of CPU state • No bypass paths required • Not a multithreaded target!

  11. Cache Modeling The cache model maintains tag, state, and protocol bits internally (no data); each address is split into tag/index/offset fields and the stored tags are compared across all ways, up to the maximum associativity • Hit: don’t stall • Miss: stall an arbitrary number of target cycles Whenever the functional model issues a memory operation, the cache model determines how many target cycles to stall
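A tags-only cache model like the one described can be sketched as follows. The geometry and miss penalty are made-up parameters for illustration, not RAMP Gold's configuration:

```python
# Tags-only cache model: stores only tag metadata (no data array) and
# answers "how many target cycles should this access stall?"
# LINE, SETS, and MISS_PENALTY are illustrative assumptions.

LINE = 64          # bytes per line
SETS = 256         # direct-mapped for brevity
MISS_PENALTY = 20  # target stall cycles on a miss

tags = [None] * SETS

def access(addr):
    index = (addr // LINE) % SETS      # index field selects the set
    tag = addr // (LINE * SETS)        # tag field is compared
    if tags[index] == tag:
        return 0                       # hit: don't stall
    tags[index] = tag                  # fill the line's tag on a miss
    return MISS_PENALTY                # miss: stall target cycles

stall = access(0x1000) + access(0x1000)
print(stall)  # 20: first access misses, second hits
```

Because only tags and state are stored, a large target cache fits in little host BRAM, which is exactly the 16MB-L2-in-256KB trick from the previous slide.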

  12. Putting it all together The host pipeline consists of ifetch, decode, register-access, memory, and exception stages, fed by an instruction cache and a data cache, backed by a memory controller, and instrumented by the cache model & performance counters • Resource Utilization (XC5VLX110T) • LUTs – 14%, BRAM – 23% • We can fit 3 pipelines on one FPGA!

  13. Infrastructure

  14. Our accomplishments this semester

  15. HARDware ain’t no joke

  16. Sample Use Case: L1 D$ Tradeoffs • Assume we have a 64-core CMP with private 16KB direct-mapped L1 D$ • In the next tech gen, we can fit either of these improved configurations in a clock cycle: • 32KB direct-mapped L1 • 16KB 4-way set-associative L1 • Which should we choose?

  17. Sample Use Case: L1 D$ Tradeoffs Evidently, the associative cache is superior. It took longer to make these slides than to run these 10+ billion-instruction simulations
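The experiment behind these two slides can be sketched as a trace-driven comparison of the two candidate configurations. The trace and cache routine below are illustrative assumptions, not the actual workload or simulator:

```python
# Sketch of the L1 D$ tradeoff experiment: run one address trace through
# both candidate configurations and compare miss counts. The trace is a
# made-up worst case for the direct-mapped cache.

def misses(trace, size_bytes, ways, line=32):
    sets = size_bytes // (line * ways)
    cache = [[] for _ in range(sets)]   # each set: up to `ways` tags, LRU order
    miss = 0
    for addr in trace:
        s = (addr // line) % sets
        tag = addr // (line * sets)
        if tag in cache[s]:
            cache[s].remove(tag)        # hit: refresh LRU position
        else:
            miss += 1
            if len(cache[s]) == ways:
                cache[s].pop(0)         # evict least-recently-used tag
        cache[s].append(tag)
    return miss

# Two addresses 32KB apart conflict in the direct-mapped cache but
# coexist in the 4-way one:
trace = [0x0000, 0x8000] * 100
print(misses(trace, 32 * 1024, 1))   # 32KB direct-mapped: 200 misses
print(misses(trace, 16 * 1024, 4))   # 16KB 4-way:          2 misses
```

This kind of sweep is cheap at 100 MIPS, which is the slide's point: the 10+ billion-instruction runs finish in minutes of wall-clock time.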

  18. Future Directions • RAMP Gold closes two critical feedback loops • Expedient HW/SW co-tuning is within our grasp • Simulations can now be run on a thermal timescale, enabling the exploration of temperature-aware scheduling policies • We intend to explore both avenues!

  19. DEMO: Damascene Pipeline stages: Image → Convert Colorspace → Textons: K-means → Bg / Cga / Cgb / Texture Gradient → Combine → Intervening Contour → Generalized Eigensolver → Oriented Energy Combination → Combine, Normalize → Non-max Suppression → Contours
