The ESW (Expandable Split Window) Paradigm
Manoj Franklin & Gurindar S. Sohi
05/10/2002
Observations
• Large exploitable ILP exists, in theory
• Nearby instructions tend to be dependent; parallelism is available further downstream
• Centralized resources are bad
• Minimizing communication cost is important
What about others?
• Dataflow model
  + most general
  - unconventional PL paradigm
  - communication cost can be high
• Superscalar (SS), VLIW (sequential)
  + temporal locality
  - large centralized hardware
  - compiler too dumb
  - not scalable
• ESW = dataflow + sequential
Design Goals
• Decentralized resources
• Minimize wasted execution
• Speculative memory address disambiguation
• Realizability
Replace one large dynamic window with many small ones
How it works
• Basic window
  • Single-entry, loop-free, call-free block
  • Equal to, a superset of, or a subset of a basic block
• Execute basic windows in parallel
  • Multiple independent stages
  • Each complete with branch prediction, L1 cache, register file, etc.
Distributed Instruction Supply
Optimization: snooping on L2-L1 cache traffic
Distributed Inter-Instruction Communication
• Architecture:
  • Distributed future file
  • Create/use masks for dependency checks
• Observation:
  • Register use is mostly within a basic block
  • The rest is in subsequent blocks
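The create/use mask idea can be illustrated with a small sketch. This is not the hardware design from the slides, only a toy model under stated assumptions: registers are numbered integers, each instruction is a hypothetical `(dest, sources)` pair, and the helper names `reg_mask`, `window_masks`, and `pending_registers` are invented for illustration. A window's create mask marks the registers it writes; its use mask marks the registers it reads before writing them itself, so those values must come from an earlier window.

```python
def reg_mask(regs):
    """Pack an iterable of register numbers into an integer bitmask."""
    mask = 0
    for r in regs:
        mask |= 1 << r
    return mask


def window_masks(instructions):
    """Return (create_mask, use_mask) for one basic window.

    `instructions` is a list of hypothetical (dest, sources) pairs;
    dest may be None for instructions that write no register.
    """
    create = 0
    use = 0
    for dest, sources in instructions:
        # Only registers not yet written inside this window are external uses.
        use |= reg_mask(sources) & ~create
        if dest is not None:
            create |= 1 << dest
    return create, use


def pending_registers(earlier_create_masks, use_mask):
    """Registers this window reads that an earlier active window will create."""
    busy = 0
    for mask in earlier_create_masks:
        busy |= mask
    return busy & use_mask


# Window A: r1 = r2 + r3 ; r4 = r1 + r1
window_a = [(1, (2, 3)), (4, (1, 1))]
# Window B: r5 = r4 + r6
window_b = [(5, (4, 6))]

create_a, use_a = window_masks(window_a)
create_b, use_b = window_masks(window_b)
waiting = pending_registers([create_a], use_b)  # only bit 4 is set: B waits for r4
```

Window A uses r1 internally, so r1 never appears in its use mask; this mirrors the observation that most register use stays within a block, and only the remaining bits need to be checked across windows.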
Distributed Data Memory System
• Problem:
  • Address space is large, so create/use masks can't be used
  • Need to maintain consistency between multiple copies
• Solution: ARB (Address Resolution Buffer)
ARB
• Bits cleared upon commit
• Restart stages when a dependency is violated
• On a load, forward the value from the ARB if one already exists
Q. What happens when the ARB is full?
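The three ARB behaviors listed above (forward on load, restart on violation, clear on commit) can be sketched in software. This is a toy model, not the actual ARB organization: it assumes stages are identified by increasing integers in program order, keeps an unbounded dictionary instead of a fixed-size buffer (so it says nothing about the "ARB full" question), and the class and method names are hypothetical.

```python
class ARB:
    """Toy address resolution buffer: per-address load bits and store values."""

    def __init__(self):
        # address -> {"loads": set of stage ids, "stores": {stage id: value}}
        self.entries = {}

    def _entry(self, addr):
        return self.entries.setdefault(addr, {"loads": set(), "stores": {}})

    def load(self, stage, addr, memory):
        """Record a speculative load; forward a buffered store value if one exists."""
        entry = self._entry(addr)
        stores = entry["stores"]
        if stage not in stores:
            # The value comes from outside this stage, so mark the load bit.
            entry["loads"].add(stage)
        sources = [s for s in stores if s <= stage]
        if sources:
            return stores[max(sources)]  # nearest earlier (or own) store
        return memory.get(addr, 0)

    def store(self, stage, addr, value):
        """Buffer a store; return later stages that already loaded a stale value."""
        entry = self._entry(addr)
        entry["stores"][stage] = value
        stale = [
            s for s in entry["loads"]
            if s > stage and not any(stage < t < s for t in entry["stores"])
        ]
        return sorted(stale)

    def squash(self, from_stage):
        """Discard all bits and values of stages being restarted."""
        for addr in list(self.entries):
            entry = self.entries[addr]
            entry["loads"] = {s for s in entry["loads"] if s < from_stage}
            entry["stores"] = {
                s: v for s, v in entry["stores"].items() if s < from_stage
            }
            if not entry["loads"] and not entry["stores"]:
                del self.entries[addr]

    def commit(self, stage, memory):
        """Write the stage's stores to memory and clear its bits."""
        for addr in list(self.entries):
            entry = self.entries[addr]
            if stage in entry["stores"]:
                memory[addr] = entry["stores"].pop(stage)
            entry["loads"].discard(stage)
            if not entry["loads"] and not entry["stores"]:
                del self.entries[addr]
```

A typical sequence: stage 2 loads an address early, stage 1 then stores to it, `store` reports stage 2 as violated, the caller squashes from stage 2 onward, and the re-executed load receives the forwarded value.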
Simulation Environment
• Custom simulator using the MIPS R2000 pipeline
• Up to 2 instructions fetched/decoded/issued per IE
• Up to 32 instructions per basic window
• 4K-word L1 cache, 64KB L2 DM cache (100% hit rate, what??)
• 3-bit counter branch prediction
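For reference, a counter-based branch predictor of the kind mentioned in the last bullet can be sketched as follows. The slides only say "3-bit counter", so the details here are assumptions: one saturating counter per branch address, a predict-taken threshold at the midpoint, and counters that start at that threshold. The class name `CounterPredictor` is hypothetical.

```python
class CounterPredictor:
    """Per-branch saturating up/down counter predictor (illustrative sketch)."""

    def __init__(self, bits=3):
        self.max_count = (1 << bits) - 1   # 7 for a 3-bit counter
        self.threshold = 1 << (bits - 1)   # assumed: predict taken at 4 or above
        self.table = {}                    # branch address -> counter value

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self.table.get(pc, self.threshold) >= self.threshold

    def update(self, pc, taken):
        """Move the counter toward the actual outcome, saturating at the ends."""
        count = self.table.get(pc, self.threshold)
        if taken:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.table[pc] = count
```

With three bits, a branch that has saturated at "taken" needs four consecutive not-taken outcomes before the prediction flips, which makes the predictor tolerant of an occasional loop exit.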
Results
• Optimizations:
  • Moving instructions up
  • Expanding the basic window (in eqntott and espresso)
Basic window <= basic block
But is a 100% cache hit rate reasonable?
Discussion
• Compare this to CMP? RAW?
• Does the trade-off strike a balance?
New Results (1): In-order execution
New Results (2): Out-of-order execution