A Fully Buffered Memory System Simulator
FBsim 1.0
Rami Nasr - M.S. Thesis and ENEE 759H Course Project
Thursday, May 12th, 2005
Another Simulator?
Sim-DRAM exists and supports FB-DIMM. Why write another simulator?
• Sim-DRAM still had a few unresolved bugs in its FB-DIMM model when I began my study.
• FB-DIMM is radically different from other memory architectures. New simulator => fresh start.
• FBsim is made exclusively for simulating and studying the FB-DIMM architecture. It is easier to study FB-DIMM with a dedicated simulator.
• Different scheduler, mapping algorithm, approach, style, and area of study in the FB-DIMM design space.
• FBsim is ideal for simulating ‘unreasonably’ high memory request rates and studying channel saturation effects.
• The two simulators can be used to validate each other’s results in FB-DIMM studies.
• Writing a memory simulator was a great experience for me.
FBsim Overview
• All code written from scratch.
• Standalone product. Does not currently interface with CPU simulators or memory traces. Instead, it models memory transactions probabilistically according to user specifications.
• => Does not actually store memory data.
• Written in ANSI C. ~5000 lines of code. Code is organized into header files, commented, and quite easy to hack.
• Fast. For each memory channel, one second of run time simulates ~10 ms (or ~1 ms during channel saturation) on a 2.4 GHz Pentium 4.
• Supports Open & Closed Page Mode, Fixed & Variable Latency Mode.
• Supports output of macro and micro (frame-by-frame) simulation data.
• Does not model channel initialization, maintenance, or sync overhead.
• Does not model memory refresh.
• Does not model power consumption or power-related timing limitations (tFAW etc.).
• The features not yet modeled can be incorporated readily into future versions.
FBsim Overview 2
[Diagram: Input Transaction Generator -> Address Mapper -> Channel Scheduler 0, Channel Scheduler 1, ..., Channel Scheduler 7]
• A Frame Iteration
  • Try to generate transactions.
  • Map any generated transactions to their channel schedulers.
  • Fire each scheduler once.
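The frame iteration above can be sketched in C as follows. This is only an illustrative skeleton, not FBsim's code: the names (`Transaction`, `ChannelScheduler`, `generate_transaction`, `map_to_channel`, `fire_scheduler`, `run_frames`), the 64-byte granularity, the deterministic generator, and the one-transaction-per-frame placeholder scheduler are all assumptions made for the example.

```c
#include <assert.h>

#define NUM_CHANNELS 8      /* the diagram shows schedulers 0 through 7 */
#define LINE_BYTES   64UL   /* assumed transaction granularity */

typedef struct {
    unsigned long addr;
    int is_write;
} Transaction;

typedef struct {
    unsigned long accepted;     /* transactions handed to this scheduler */
    unsigned long pending;      /* transactions not yet scheduled */
    unsigned long frames_fired; /* how many times the scheduler has run */
} ChannelScheduler;

/* Hypothetical generator: emits one transaction every `period` frames
 * with a fixed 2:1 read/write pattern. A probabilistic model would
 * draw from user-specified distributions here instead. */
static int generate_transaction(unsigned long frame, unsigned long period,
                                Transaction *out)
{
    assert(out != 0);
    if (period == 0 || frame % period != 0)
        return 0;
    out->addr = (frame / period) * LINE_BYTES;
    out->is_write = ((frame / period) % 3 == 2);
    return 1;
}

/* Hypothetical mapper: simple line interleaving across channels. */
static int map_to_channel(const Transaction *t)
{
    return (int)((t->addr / LINE_BYTES) % NUM_CHANNELS);
}

/* Placeholder scheduler: retires at most one pending transaction. */
static void fire_scheduler(ChannelScheduler *s)
{
    if (s->pending > 0)
        s->pending--;
    s->frames_fired++;
}

/* One pass of the loop body is one frame iteration:
 * generate, map, then fire every channel scheduler once. */
static void run_frames(ChannelScheduler sched[NUM_CHANNELS],
                       unsigned long n_frames, unsigned long period)
{
    unsigned long frame;
    int c;
    for (frame = 0; frame < n_frames; frame++) {
        Transaction t;
        if (generate_transaction(frame, period, &t)) {
            ChannelScheduler *s = &sched[map_to_channel(&t)];
            s->accepted++;
            s->pending++;
        }
        for (c = 0; c < NUM_CHANNELS; c++)
            fire_scheduler(&sched[c]);
    }
}
```

Calling `run_frames(sched, 16, 1)` on zero-initialized schedulers hands two transactions to each of the eight channels and fires every scheduler sixteen times.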
Input Transaction Model
• Step Distributions
• Normal (Gaussian) Distributions
Input Transaction Model 2
[Figure labels: Bus Trace Viewer; FBsim Model]
Address Mapping
[Figure panels: Closed Page Mode; Open Page Mode]
• A physical address must be mapped to the right channel, DIMM, rank, bank, row, and column.
• FBsim is built to support different DIMM capacities, different channel capacities, and even unbalanced configurations.
• => An algorithm is needed to map each incoming transaction to a DIMM:

    WHILE (a non-zero row sum exists) {
        FOR (each channel with a non-zero row sum, visited exactly once) {
            The next 'result' is that channel's DIMM with the highest number.
            Decrement that DIMM's number by 1.
            Decrement the channel's row sum by 1.
        }
    }

Example: Modulus = 4 + 2 + 1 + 2 = 9
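The mapping algorithm above can be sketched in C as a routine that builds a repeating interleave table. Several things here are assumptions rather than details from the slides: the layout of the example (channel 0 holding DIMM numbers 4 and 2, channel 1 holding 1 and 2, which merely reproduces the modulus of 9), the tie-break toward the lowest DIMM index, the use of `index % modulus` for the lookup, and all names (`Slot`, `build_map`, `lookup`).

```c
#include <assert.h>

#define MAX_CHANNELS 8
#define MAX_DIMMS    8
#define MAX_SLOTS    64

typedef struct {
    int channel;
    int dimm;
} Slot;

/* Builds the interleave table from per-DIMM numbers (weights).
 * Returns the modulus, i.e. the sum of all DIMM numbers. */
static int build_map(int weights[MAX_CHANNELS][MAX_DIMMS],
                     int n_channels, int n_dimms, Slot table[MAX_SLOTS])
{
    int remaining[MAX_CHANNELS][MAX_DIMMS];
    int row_sum[MAX_CHANNELS];
    int total = 0, n = 0, c, d;

    assert(n_channels > 0 && n_channels <= MAX_CHANNELS);
    assert(n_dimms > 0 && n_dimms <= MAX_DIMMS);

    for (c = 0; c < n_channels; c++) {
        row_sum[c] = 0;
        for (d = 0; d < n_dimms; d++) {
            assert(weights[c][d] >= 0);
            remaining[c][d] = weights[c][d];
            row_sum[c] += weights[c][d];
        }
        total += row_sum[c];
    }
    assert(total <= MAX_SLOTS);

    /* Outer loop: repeat while any row sum is non-zero. */
    while (n < total) {
        /* Inner loop: visit each non-empty channel exactly once. */
        for (c = 0; c < n_channels; c++) {
            int best = 0;
            if (row_sum[c] == 0)
                continue;
            /* Pick the DIMM with the highest remaining number;
             * ties go to the lowest index (an assumption). */
            for (d = 1; d < n_dimms; d++)
                if (remaining[c][d] > remaining[c][best])
                    best = d;
            table[n].channel = c;
            table[n].dimm = best;
            n++;
            remaining[c][best]--;
            row_sum[c]--;
        }
    }
    return total;
}

/* Assumed lookup: the table repeats every `modulus` entries. */
static Slot lookup(const Slot *table, int modulus, unsigned long index)
{
    assert(modulus > 0);
    return table[index % (unsigned long)modulus];
}
```

With the assumed weights `{4, 2}` and `{1, 2}`, `build_map` returns 9, and each DIMM appears in the table as many times as its number, so a DIMM with twice the number receives twice the share of transactions.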
FB-DIMM Frame Format Review
• A SouthBound (SB) frame could be a:
  • Channel Frame (not modeled in FBsim)
  • Command Frame (up to three DRAM commands, with only one command possible to each DIMM in the channel)
  • Command + Wdata Frame (holds one DRAM command, plus one DDR beat of write data)
• A NorthBound (NB) frame could be a:
  • Channel Frame (not modeled in FBsim)
  • Read Response Frame (holds two DDR beats of returned read data)
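The frame rules above can be written down as a small C sketch. The type and function names (`SbFrame`, `DramCommand`, `sb_frame_is_legal`, `nb_frames_for_read`, `sb_frames_for_write`) and the `opcode` field are hypothetical, and the sketch encodes only the constraints stated on the slide, not the bit-level frame layout.

```c
#include <assert.h>

typedef enum {
    SB_COMMAND,        /* up to three DRAM commands */
    SB_COMMAND_WDATA   /* one DRAM command plus one DDR beat of write data */
} SbFrameType;

typedef struct {
    int dimm;    /* target DIMM on the channel */
    int opcode;  /* placeholder for the DRAM command itself */
} DramCommand;

typedef struct {
    SbFrameType type;
    int n_commands;
    DramCommand cmd[3];
} SbFrame;

enum {
    SB_WRITE_BEATS_PER_FRAME = 1, /* Command + Wdata frame */
    NB_READ_BEATS_PER_FRAME  = 2  /* Read Response frame */
};

/* Returns 1 if the frame respects the slide's rules, else 0. */
static int sb_frame_is_legal(const SbFrame *f)
{
    int i, j;
    assert(f != 0);
    if (f->type == SB_COMMAND) {
        if (f->n_commands < 0 || f->n_commands > 3)
            return 0;
    } else if (f->n_commands != 1) {
        return 0; /* Command + Wdata holds exactly one command */
    }
    /* Only one command may target each DIMM within a frame. */
    for (i = 0; i < f->n_commands; i++)
        for (j = i + 1; j < f->n_commands; j++)
            if (f->cmd[i].dimm == f->cmd[j].dimm)
                return 0;
    return 1;
}

/* NB frames needed to return `beats` DDR beats of read data. */
static int nb_frames_for_read(int beats)
{
    assert(beats >= 0);
    return (beats + NB_READ_BEATS_PER_FRAME - 1) / NB_READ_BEATS_PER_FRAME;
}

/* SB frames needed to deliver `beats` DDR beats of write data. */
static int sb_frames_for_write(int beats)
{
    assert(beats >= 0);
    return (beats + SB_WRITE_BEATS_PER_FRAME - 1) / SB_WRITE_BEATS_PER_FRAME;
}
```

For an assumed 8-beat burst, a read occupies 4 NB frames while a write occupies 8 SB frames, which follows directly from the two-beats-per-NB-frame and one-beat-per-SB-frame capacities.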
Some of my Results
Configurations are written as channels x DIMMs per channel.
• 1x8 achieved 7.9 GBps before saturating (82%)
• 2x4 achieved 15.6 GBps (82%)
• 4x2 achieved 31.3 GBps (82%)
• 8x1 achieved 45.2 GBps (59%!)
Case Study Conclusion
• With at least two DIMMs on each channel, performance scales very well in FB-DIMM.
• More than two DIMMs per channel only increases capacity, not throughput.
• Each added DIMM adds ~5 ns of average channel latency in Fixed Latency Mode (FLM), and slightly over half that in Variable Latency Mode (VLM).
• In Closed Page Mode, only 82% of a channel's peak theoretical throughput can be reached.
Some of my Results 2
• In Closed Page Mode with a 2:1 read/write ratio, a reordering window of ~12 transactions achieves the best possible performance (channel saturation) for an FB-DIMM channel scheduler. Increasing the window size beyond this has no benefit.
• The more skewed the read/write ratio, the bigger the scheduling window needs to be (at 4:1, it is ~18).
• In Variable Latency Mode, a reordering window of ~20 transactions achieves the best possible performance.
Some of my Results 3
• The micro-study shows that in Closed Page Mode, the FB channel can reach at most ~93% write data utilization on the SB and ~84% read data utilization on the NB.
• The micro-study also shows that FBsim channel utilization is slightly worse for read/write ratios other than 2:1 (it was 2% worse for 4:1).
• The FBsim scheduler can quite straightforwardly be made more adaptive to the read/write ratio of the transactions it is scheduling.
Future Ideas with FBsim
• I’m graduating this semester (if Dr Jacob and Mr (Dr?) Wang so please), and escaping to the corporate world.
• => Writing a guide for FBsim along with some ideas for future work. Anyone who wishes to take over development is eagerly encouraged to.
• If so, I would be happy to help get things rolling by email or in person. Feel free to access & use anything in FBsim or my thesis paper.
• I strongly believe a very interesting paper or three can quite quickly come out of this research area (me)
Future Ideas with FBsim 2
• For credibility in a paper, add an interface between FBsim and a CPU simulator or memory traces. Run real benchmarks through FBsim. Compare and contrast these results with the transaction modeling results.
• AND/OR add more functionality and provable realism to the transaction modeler. Study this.
• Best yet, integrate FBsim into the Sim-DRAM package as an added option.
• Add modeling for channel overhead, memory refresh overhead, error simulation and error handling, power consumption constraints and metrics.
• Enhance adaptivity of the FBsim scheduler to read/write ratios other than 2:1.
• Experiment with the address mapping algorithm and load balancing.
• Experiment with different types of scheduler implementations (e.g. ones not based on pattern matching). *involved*
• Study hardware constraints in FB-DIMM channel scheduling.
More Possible FB-DIMM Studies
• Channel utilization and configuration trade-offs for Open Page Mode
• Performance degradation from shrinking the scheduler reorder window size
• Relaxation of critical DRAM device parameters (density, nBanks, timing constraints, clock frequency) allowed by the FB-DIMM architecture
• OR optimizing the FB-DIMM architecture by increasing the SB and NB channel widths (adding lines) or bitrates, and maybe modifying the frame protocol
• The AMB is a logic device on a memory module!! Can add buffers, arithmetic units, processing power, etc.
Special Thanks to...
• Dr Jacob for introducing me to the field and guiding my progress
• David Wang for the course lectures and material