Out of Order SuperScalar • Ankit Sethia • Daya Shanker • Gaurav Chadha • Kuldeep Singh
Basic Design • Out of Order (T3) • 2-way superscalar • Number of RS entries – 16 • Number of ROB entries – 64 (also tested with 8). • PRF entries – 64 • ALU – 2 • Multiplier – 2 • SystemVerilog used for the design process. • Helpful in designing. • We had just 5 synthesis runs.
Advanced Features • 2-way superscalar • Instruction Prefetcher • Stride Prefetcher • RAS • Load Store Queue (4 loads, 4 stores) • BTB, Local Branch Predictor • Non-blocking D-Cache
Attempted Features • Unconditional branch resolution in IF stage.
LSQ • Loads launch out of order, after dependencies on preceding stores are resolved. • Data is forwarded from the store queue to the load structure. • The load structure is not a queue. • An auxiliary load queue holds outstanding loads.
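The load-launch rule on this slide can be sketched as a small behavioral model. This is a Python sketch under stated assumptions, not the team's SystemVerilog: the names `Store` and `try_launch_load` are hypothetical, addresses are compared for exact equality, and a load conservatively waits whenever a younger preceding store still has an unknown address.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Store:
    addr: Optional[int]  # None until the store address has been computed
    data: Optional[int]  # None until the store data is available


def try_launch_load(load_addr: int,
                    older_stores: List[Store]) -> Tuple[str, Optional[int]]:
    """Decide what a load may do, given the stores that precede it.

    older_stores is ordered oldest -> youngest in program order.
    Returns ("wait", None), ("forward", data) or ("memory", None).
    """
    # Scan youngest-first so the most recent matching store wins.
    for st in reversed(older_stores):
        if st.addr is None:
            return ("wait", None)      # dependency not yet resolved
        if st.addr == load_addr:
            if st.data is None:
                return ("wait", None)  # address matches, data not ready
            return ("forward", st.data)
    # No preceding store touches this address: safe to go to the cache.
    return ("memory", None)
```

A load with no matching older store goes to memory, a load behind a matching store with ready data takes the forwarded value, and anything still ambiguous waits.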
DCache • Handles hit-under-miss and miss-under-miss. • Can support 16 outstanding load requests. • Priority order: eviction first, then the current request, then outstanding misses. • Has the highest priority among requests to memory.
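One reading of the priority order on this slide is a fixed-priority selection inside the cache controller. The sketch below models that reading in Python; the function names and the use of a `deque` for outstanding misses are assumptions made for illustration, and only the limit of 16 comes from the slide.

```python
from collections import deque
from typing import Deque, Optional, Tuple

MAX_OUTSTANDING = 16  # from the slide: 16 outstanding load requests


def record_miss(outstanding: Deque[int], addr: int) -> bool:
    """Track a new miss; refuse it when all 16 slots are in use."""
    if len(outstanding) >= MAX_OUTSTANDING:
        return False
    outstanding.append(addr)
    return True


def pick_request(eviction: Optional[int],
                 current: Optional[int],
                 outstanding: Deque[int]) -> Optional[Tuple[str, int]]:
    """Fixed priority: eviction > current request > outstanding misses."""
    if eviction is not None:
        return ("eviction", eviction)
    if current is not None:
        return ("current", current)
    if outstanding:
        return ("miss", outstanding.popleft())  # oldest miss first (assumption)
    return None
```

Lower-priority requests are left untouched when a higher-priority one is selected, so they are retried on a later cycle.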
Features Contd. • Heavy instruction prefetching • 60 at the max • Varied a lot • BTB / Branch Predictor • 2-bit local branch predictor
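A 2-bit predictor is commonly built from saturating counters indexed by the branch PC. The Python sketch below shows that textbook scheme; the table size of 64, the weakly-not-taken initial state, and the 4-byte instruction alignment used for indexing are assumptions, not values from the slides.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters, one per PC-indexed entry.

    Counter values: 0, 1 predict not taken; 2, 3 predict taken.
    """

    def __init__(self, entries: int = 64):  # table size is an assumption
        self.entries = entries
        self.counters = [1] * entries       # start weakly not taken

    def _index(self, pc: int) -> int:
        # Drop the two low bits, assuming 4-byte aligned instructions.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The two-bit hysteresis means a single mispredicted iteration, such as a loop exit, does not flip a strongly biased prediction.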
Features Contd. • Unconditional branch resolution in the IF stage. • Calculates the next PC for br/bsr in the IF stage. • RAS • A lot of difficulty in implementing the RAS.
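For readers unfamiliar with the RAS, a minimal behavioral sketch follows. It is written in Python and assumes a fixed-depth circular stack that overwrites its oldest entry on overflow; the depth of 8 and the class name are hypothetical. It deliberately ignores recovery after a branch squash.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Fixed-depth circular stack of predicted return addresses."""

    def __init__(self, depth: int = 8):  # depth is an assumption
        self.depth = depth
        self.entries: List[int] = [0] * depth
        self.top = 0    # index of the next free slot
        self.count = 0  # number of valid entries

    def push(self, return_pc: int) -> None:
        """Called on a subroutine call; overwrites the oldest entry when full."""
        self.entries[self.top] = return_pc
        self.top = (self.top + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self) -> Optional[int]:
        """Called on a return; None means there is no prediction."""
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.entries[self.top]
```

In a design like this one, a call such as bsr would push the address of the following instruction, and a return would pop it to predict the next PC.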
Stride Prefetcher • A data prefetching mechanism that prefetches data for stride-based access patterns. • Can handle up to four loads. • Keeps a table of four non-stride loads that may be present. • Third-highest priority among requests to memory.
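The general idea of stride detection can be sketched in Python as below. This is the textbook per-PC scheme, not necessarily the team's design: it issues a prefetch once the same non-zero stride has been seen twice in a row, tracks four loads as on the slide, and does not model the separate table of non-stride loads. The class names and the oldest-first replacement policy are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TABLE_SIZE = 4  # from the slide: handles up to four loads


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0


class StridePrefetcher:
    def __init__(self) -> None:
        # Keyed by load PC; dicts preserve insertion order in Python 3.7+.
        self.table: Dict[int, StrideEntry] = {}

    def observe(self, pc: int, addr: int) -> Optional[int]:
        """Record one load; return an address to prefetch, or None."""
        entry = self.table.get(pc)
        if entry is None:
            if len(self.table) >= TABLE_SIZE:
                # Evict the oldest tracked load (replacement policy assumed).
                self.table.pop(next(iter(self.table)))
            self.table[pc] = StrideEntry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        confirmed = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if confirmed else None
```

A load walking an array at addresses 100, 108, 116 produces no prefetch on the first two accesses and a prefetch of 124 on the third.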
Results • Final clock period after synthesis: 6.7 ns • All 33 benchmarks passed in simulation and synthesis • CPI ranges from 0.59 to 5.00
Interesting Bugs • In the I-cache, the input address does not change but the data has changed, so fetching stops. • Eviction during branch squash: same reason, reset. • A speculative load with an invalid address gets a continuous Nack from the cache controller.
Suggestions • SystemVerilog was really helpful • always_comb (no inferred latches), always_ff • Don't worry about wire vs. reg; use the logic type. • Structures, multidimensional arrays, literal assignment. • Queues are used a lot (ROB, LSQ, DCACHE, PREFETCHER), so a robust queue could be provided beforehand. • We faced problems with bottom-up synthesis; this could be covered in a tutorial section.
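As an illustration of the reusable queue suggested on this slide, here is a behavioral sketch of a fixed-capacity circular FIFO with explicit full and empty handling. It is written in Python rather than SystemVerilog, and the class and method names are hypothetical; a hardware version would expose the same head, tail, and count state as registers.

```python
from typing import Any, List, Optional


class CircularQueue:
    """Fixed-capacity FIFO using head/tail pointers and an entry count."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots: List[Any] = [None] * capacity
        self.head = 0   # next entry to dequeue
        self.tail = 0   # next free slot
        self.count = 0  # separate count disambiguates full from empty

    def full(self) -> bool:
        return self.count == self.capacity

    def empty(self) -> bool:
        return self.count == 0

    def enqueue(self, item: Any) -> bool:
        """Return False instead of overwriting when the queue is full."""
        if self.full():
            return False
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1
        return True

    def dequeue(self) -> Optional[Any]:
        """Return None when the queue is empty."""
        if self.empty():
            return None
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item
```

Keeping a separate count avoids the classic ambiguity where head equals tail for both the full and the empty state.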