Performance Model for Future Multicore Process Designs Yipkei Kwok 02/06/2008
A Non-Work-Conserving Operating System Scheduler For SMT Processors • Authors: A. Fedorova et al. • Calculates the optimal level of parallelism of SMT processors at run time • Analytical model • Estimates the workload’s IPC for a given degree of concurrency • First identify the performance bottleneck • Suppressing L2 misses improves performance the most
A Non-Work-Conserving Operating System Scheduler For SMT Processors • Factors • N • perf_cache_CPI(N) • L2_RMR • L2_WMR • L2_WBR_R • L2_WBR_W • WSC • L2_MCOST
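To make the role of these factors concrete, here is a deliberately simplified sketch of how an analytical model of this kind could turn a few of them into an IPC estimate. It is not the formula from the paper. It assumes that the miss rates are expressed as misses per instruction, that every L2 miss stalls for a fixed L2_MCOST cycles, and that perf_cache_CPI(N) is the aggregate CPI with a perfect cache. The write-back and WSC terms are omitted, and all numbers in `example` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelInputs:
    """Hypothetical container for a subset of the model factors."""
    perf_cache_cpi: Dict[int, float]  # N -> CPI with a perfect cache (assumed aggregate)
    l2_rmr: Dict[int, float]          # N -> L2 read misses per instruction (assumed unit)
    l2_wmr: Dict[int, float]          # N -> L2 write misses per instruction (assumed unit)
    l2_mcost: float                   # cycles lost per L2 miss (assumed constant)


def estimate_ipc(inputs: ModelInputs, n: int) -> float:
    """Estimate IPC at concurrency level n under the simplifying assumptions above."""
    miss_cpi = (inputs.l2_rmr[n] + inputs.l2_wmr[n]) * inputs.l2_mcost
    return 1.0 / (inputs.perf_cache_cpi[n] + miss_cpi)


def best_n(inputs: ModelInputs) -> int:
    """Return the concurrency level with the highest estimated IPC."""
    return max(inputs.perf_cache_cpi, key=lambda n: estimate_ipc(inputs, n))


# Made-up inputs: more threads lower the perfect-cache CPI but raise L2 miss rates.
example = ModelInputs(
    perf_cache_cpi={1: 1.0, 2: 0.6, 4: 0.4},
    l2_rmr={1: 0.001, 2: 0.002, 4: 0.010},
    l2_wmr={1: 0.0005, 2: 0.001, 4: 0.005},
    l2_mcost=100.0,
)
```

With these illustrative inputs, `best_n(example)` picks fewer threads than the hardware offers, which is the non-work-conserving idea: leaving a hardware context idle can pay off when cache contention dominates.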
A Non-Work-Conserving Operating System Scheduler For SMT Processors • Two-phase scheduling • Preparation phase • Collect model inputs under full parallelism • With hardware counters • Until the 100-millionth instruction retires • Optimization phase • Estimate the optimal N • Enforce it • Until … • New locality phase
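The two phases can be sketched as a small control loop. This is a toy illustration, not the authors' scheduler: `read_counters`, `estimate_ipc`, and `set_active_threads` are hypothetical callables standing in for the hardware-counter interface, the analytical model, and the enforcement mechanism. The counter key `instructions_retired` is also an assumed name, and detection of a new locality phase is left out because the slide does not specify it.

```python
from typing import Callable, Dict

PREP_INSTRUCTIONS = 100_000_000  # preparation phase length, taken from the slide


def run_two_phase(
    max_threads: int,
    read_counters: Callable[[], Dict[str, float]],
    estimate_ipc: Callable[[Dict[str, float], int], float],
    set_active_threads: Callable[[int], None],
) -> int:
    """Run one preparation phase and one optimization decision; return the chosen N."""
    # Preparation phase: run at full parallelism and poll the counters
    # until the 100-millionth instruction has retired.
    set_active_threads(max_threads)
    counters = read_counters()
    while counters["instructions_retired"] < PREP_INSTRUCTIONS:
        counters = read_counters()

    # Optimization phase: pick the N with the highest estimated IPC and enforce it.
    best = max(range(1, max_threads + 1), key=lambda n: estimate_ipc(counters, n))
    set_active_threads(best)
    return best
```

A real scheduler would wrap this in an outer loop and return to the preparation phase when the workload's behavior changes.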
Limitations • 3-56% improvement, but … • Empirical model based on the UltraSPARC T1 • SMT only • But expandable with, hopefully, reasonable effort • Once expanded, usable for performance prediction • What is needed? • Extra factors?
What new factors? • Depends on the systems to model • Shared-memory machine • Threaded parallel workloads • SMP of CMPs • SMT per core
What new factors? • Architecture • Homogeneous/heterogeneous cores • Differences in speed or functionality • Level of cache sharing • Interconnects
What new factors? • Parameters • Number of cores • Cache size • Degree of set-associativity • Number of cores sharing a cache • Bus, ring, crossbar, tiny-network • Switching & flow mechanisms • Routing algorithms • Fault tolerance techniques
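One way to picture this parameter space is as a typed configuration record that an extended model could take as input. The sketch below is purely illustrative: the class and field names are hypothetical, it covers only the numeric parameters and the interconnect type, and the rule that cores must divide evenly among shared caches is an added assumption, not something stated on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Interconnect(Enum):
    """Interconnect types named on the slide."""
    BUS = "bus"
    RING = "ring"
    CROSSBAR = "crossbar"
    TINY_NETWORK = "tiny-network"


@dataclass(frozen=True)
class MachineParams:
    """Hypothetical description of one machine configuration to model."""
    cores: int
    cache_size_kib: int
    associativity: int
    cores_per_cache: int
    interconnect: Interconnect

    def __post_init__(self) -> None:
        numeric = (self.cores, self.cache_size_kib,
                   self.associativity, self.cores_per_cache)
        if min(numeric) < 1:
            raise ValueError("all numeric parameters must be positive")
        # Assumption for this sketch: every shared cache serves the same number of cores.
        if self.cores % self.cores_per_cache != 0:
            raise ValueError("cores must divide evenly among shared caches")

    @property
    def num_caches(self) -> int:
        """Number of shared caches implied by the sharing degree."""
        return self.cores // self.cores_per_cache
```

For example, `MachineParams(8, 4096, 8, 2, Interconnect.CROSSBAR)` describes eight cores that share caches in pairs over a crossbar.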
What new factors? • Protocols • Cache coherence protocol for dedicated/semi-shared caches • Algorithms • Block replacement algorithm • Algorithms of cache coherence and data consistency protocols
Potential uses • Performance prediction for future processors • Scheduler
Does similar work exist? • Multi2Sim (2007) • Framework simulating the system working as a whole • Yet application-only simulation • Evaluates multicore-multithreaded processors • 3 major components simulated • Core • Cache hierarchy • Interconnect • Note: source code available
Enough? • Limitations • Homogeneous cores • Topology • Bus only • With variable bus width, though