A New Methodology for Studying Realistic Processors in Computer Science Degrees • Crispín Gómez, María E. Gómez and Julio Sahuquillo • DISCA, Technical University of Valencia • DSI, University of Castilla-La Mancha
Outline • Motivation • Simulator • Proposed Methodology • Case Study • Conclusions
Motivation • Processor architecture evolves remarkably quickly • Teaching should cover everything from the basics to the most realistic, up-to-date concepts • [Figure labels: Superscalar, POWER, Out-of-Order Execution, In-Order Execution, Multicore, Manycore]
Motivation • Current designs are highly complex • Complex out-of-order cores • Multi-level memory hierarchy • On-chip interconnection network
Outline • Motivation • Simulator • Proposed Methodology • Case Study • Conclusions
Simulator • Multi2Sim: multicore and multithreaded • x86 binary compatibility • Application-only • Free simulator: open-source project • http://www.multi2sim.org/ • Widely used in research • Academia • Industry
Simulator – Cores • CPU: 6-stage pipelined processors, out-of-order execution • Execution stage may be customized to be multicycle • Speculative execution • Three multithreading paradigms are supported: • Coarse-grain, fine-grain, simultaneous multithreading • All microarchitectural parameters are customizable • Type of branch predictor • Issue width • Etc. • GPUs
Simulator – Memory Hierarchy • Complete memory hierarchy • Coherence protocol: MOESI • Flexible hierarchy: number of memory levels and memory structures in each level • Each memory structure is fully customizable • Number of sets • Number of ways • Block size
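To make the three cache parameters concrete, the following is a minimal sketch of how the number of sets, the number of ways and the block size determine a cache's capacity and the way a byte address is split into tag, set index and block offset. The class `CacheGeometry`, its field names and the example values are hypothetical illustrations; they are not Multi2Sim's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical container for the three parameters named on the slide."""
    sets: int        # number of sets
    ways: int        # associativity (blocks per set)
    block_size: int  # block size in bytes

    def capacity(self) -> int:
        """Total data capacity in bytes."""
        return self.sets * self.ways * self.block_size

    def split(self, address: int) -> tuple[int, int, int]:
        """Return (tag, set index, block offset) for a byte address."""
        offset = address % self.block_size
        block_number = address // self.block_size
        index = block_number % self.sets
        tag = block_number // self.sets
        return tag, index, offset


# Assumed example values: a 512 KiB, 8-way L2 with 64-byte blocks.
l2 = CacheGeometry(sets=1024, ways=8, block_size=64)
print(l2.capacity())         # 524288 bytes
print(l2.split(0x12345678))  # (tag, set index, block offset)
```

Doubling the block size while keeping the capacity and associativity fixed halves the number of sets, which is the kind of trade-off students can explore by changing parameters.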
Simulator – Interconnection Network • Interconnection network: • Any topology can be implemented • Routing based on forwarding tables (any routing algorithm can be used) • Each network element is fully customizable • Buffer size at switches • Link bandwidth
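As an illustration of table-based routing, the sketch below fills one forwarding table per node and then routes a packet hop by hop by table lookup. It assumes shortest-path routing computed with a breadth-first search, which is only one of the many algorithms that could populate the tables. The function names and the 2x2 mesh are hypothetical and do not reflect Multi2Sim's own data structures.

```python
from collections import deque


def build_forwarding_tables(links):
    """links maps each node to its neighbours (bidirectional topology).

    Returns tables[node][destination] = next hop along a shortest path.
    """
    tables = {node: {} for node in links}
    for dest in links:
        # Breadth-first search outward from the destination: the node from
        # which a neighbour is first reached is its next hop toward dest.
        visited = {dest}
        queue = deque([dest])
        while queue:
            current = queue.popleft()
            for neighbour in links[current]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    tables[neighbour][dest] = current
                    queue.append(neighbour)
    return tables


def route(tables, source, dest):
    """Follow the forwarding tables hop by hop and return the path taken."""
    path = [source]
    while path[-1] != dest:
        path.append(tables[path[-1]][dest])
    return path


# Assumed example topology: a 2x2 mesh with nodes 0-1 on top and 2-3 below.
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
tables = build_forwarding_tables(mesh)
print(route(tables, 0, 3))
```

Because each switch only consults its own table, replacing the search with any other table-filling policy changes the routing algorithm without touching `route`.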
Outline • Motivation • Simulator • Proposed Methodology • Case Study • Conclusions
Proposed Methodology • Aims to motivate students to engage with processor architecture • Realistic examples • Increasing difficulty levels • Shared use across several courses • Develops basic skills for final projects, MS theses or Ph.D. theses • Based on progressive interaction with Multi2Sim • Four learning phases of increasing difficulty, owing to the simulator’s complexity
Proposed Methodology • 1st phase: Modifying simulation parameters (in lab sessions) • Configure the system components • Launch simulations • Analyze the effects of the parameters on system performance
Proposed Methodology • 2nd phase: Modifying small pieces of code • Very small and bounded fragments of source code • Completely guided by the instructors • Modification of a provided baseline • Examples: branch predictor, prefetch mechanisms, … • Carried out as the final assignment of the course
Proposed Methodology • 3rd phase: Implementing complete functionalities • Consolidated simulator skills • Development of functionalities from scratch • Examples: memory controller, stream-buffer-based prefetcher, … • Final project or MS thesis • Some of this work has been published in top-level conferences • 4th phase: Complete autonomy • The students are in a privileged position to start a Ph.D.
Outline • Motivation • Simulator • Proposed Methodology • Case Study • Conclusions
Case study • The methodology has been adopted at the UPV in two courses • Advanced Processor Architectures • Computer Science degree and Master's degree • Networks-on-Chip • Master's degree • We have defined several learning stages with the simulator • Baseline system modeling • Execution of standard benchmark suites • Implementation of prefetching mechanisms
Case study • Baseline system modeling • Detailed explanation of the configuration for • Memory • Cores • Interconnection network • Sample configuration files are used
Case study • Benchmark execution • Parallel (SPLASH-2) • Multiprogrammed mixes (SPEC) • Performance study (IPC, execution time, network latency) varying the L2 block size
Case study • Implementation of prefetching mechanisms • A simple baseline prefetching mechanism is provided • OBL (One Block Look-ahead) on an L2 miss • Modifications to this mechanism • N-block sequential • N-block with regular stride
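The three prefetching policies named above can be sketched as functions that, given the block number of an L2 miss, return the block numbers to prefetch. This is a simplified illustration, not the code handed to students: the function names are hypothetical, addresses are expressed as block numbers rather than bytes, and the stride is assumed to be supplied by the caller rather than detected from the miss stream.

```python
def obl(miss_block):
    """One Block Look-ahead: on a miss to block b, prefetch block b + 1."""
    return [miss_block + 1]


def n_block_sequential(miss_block, n):
    """Prefetch the n blocks that follow the missing block."""
    return [miss_block + i for i in range(1, n + 1)]


def n_block_stride(miss_block, n, stride):
    """Prefetch n blocks spaced by a regular stride (in blocks)."""
    return [miss_block + i * stride for i in range(1, n + 1)]


print(obl(100))                       # [101]
print(n_block_sequential(100, 4))     # [101, 102, 103, 104]
print(n_block_stride(100, 3, 8))      # [108, 116, 124]
```

OBL is the special case of the sequential policy with n = 1, and the sequential policy is the special case of the strided one with a stride of 1, which mirrors the incremental way the baseline is extended.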
Case study • Results • This year, two final projects have been completed, on memory controllers and prefetching • Results from these projects are expected to be submitted to first-level international conferences • These projects are expected to evolve into MS theses • These expectations are based on the previous year's experience, when results from the projects were accepted at the PACT and IPDPS conferences
Outline • Motivation • Simulator • Proposed Methodology • Case Study • Conclusions
Conclusions • We have reduced the gap between the theoretical content of Computer Architecture courses and real processors • By using a CMP simulator that is well established in the international research community • Methodology based on an increasing degree of difficulty • The first steps are closely guided by instructors • Students are encouraged to move on to more complex implementations • Methodology + simulator = a good platform for future work, as the range of design choices is very wide
Thank you for your attention • Crispin.Gomez@uclm.es