Symbiotic Jobscheduling for a Simultaneous Multithreading Processor Authors: Allan Snavely and Dean Tullsen Presenter: Alexandra Fedorova Simon Fraser University
Super-scalar Processor • A super-scalar processor has multiple issue slots • A “slot” means we can issue/schedule an instruction • Many issue slots → many instructions issued per cycle • This is possible because we have many functional units [Figure: issue slots over time]
Problem: Under-Utilization • A single thread is not always able to fill all the slots • Slots are left unused: we waste energy! • One solution is speculative, out-of-order execution, but it is difficult to implement and has limitations. [Figure: issue slots over time, with unused slots]
Simultaneous Multithreading • An idea: fill unused slots with instructions from multiple threads • More instructions to choose from means more opportunity to fill the issue slots! [Figure: issue slots over time, filled by multiple threads]
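A toy sketch of the slot-filling idea, not a model of any real processor: each cycle, every thread has some number of ready instructions, and the processor issues as many as fit in its slots. The issue width and the per-cycle ready counts below are invented for illustration, and contention between threads is deliberately ignored.

```python
def issued_per_cycle(ready_counts, width):
    """For each cycle, issue as many ready instructions as the slots allow.

    ready_counts: one tuple per cycle, holding the number of ready
    instructions of each thread in that cycle.
    """
    return [min(sum(ready), width) for ready in ready_counts]


def utilization(issued, width):
    """Fraction of issue slots that were actually used."""
    return sum(issued) / (width * len(issued))


WIDTH = 4  # hypothetical issue width (slots per cycle)

# Hypothetical ready-instruction counts per cycle for two threads.
thread_a = [1, 2, 0, 3]
thread_b = [2, 1, 3, 0]

# Thread A alone: only its own instructions can fill the slots.
alone = issued_per_cycle([(a,) for a in thread_a], WIDTH)
# Threads A and B together: either thread can fill a free slot.
shared = issued_per_cycle(list(zip(thread_a, thread_b)), WIDTH)

util_alone = utilization(alone, WIDTH)
util_shared = utilization(shared, WIDTH)
print(f"alone:  {util_alone:.3f}")
print(f"shared: {util_shared:.3f}")
```

With these made-up numbers the shared case fills twice as many slots as thread A alone. The sketch assumes threads never slow each other down, which is exactly what does not hold in practice.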
Problem: Contention Resource sharing increases utilization, but also LEADS TO CONTENTION
Research Question How to enable sharing of processor hardware without causing contention?
Outline • Background and Problem Statement • Overview of the Idea • Challenge • Details of the Solution • Research Methodology • Results
Idea: Symbiotic Schedules • Assumption: OS has a queue of threads ready to execute. • Some threads compete less than others • Co-schedule threads that compete less [Figure: CPU and scheduling queue]
Challenge • How do we measure the degree of contention? • How do we identify co-schedules that have little contention?
Measuring Contention • Background: IPC = instructions/cycle (measure of progress) • IPC_SMT: thread’s IPC on the SMT processor; more contention → lower IPC_SMT • IPC_single: thread’s IPC running alone • IPC_SMT / IPC_single: measure of symbiosis for a given thread
Weighted Speedup • Sum of symbiosis measures across all N threads: $WS = \sum_{i=1}^{N} IPC_{SMT,i} / IPC_{single,i}$
Maximizing Symbiosis • How to achieve the best symbiosis online? • Proposal #1: • Run different thread combos • Measure their Weighted Speedup • Remember combos with the best Weighted Speedup • Co-schedule them in the future. Problem? Cannot measure WS online!!!
Problem: Predicting WS • WS cannot be measured online • Only offline, in lab conditions • So we must estimate it • using metrics available online
Estimating WS • Obtain hardware performance metrics (available online) • Measure WS (available offline) • Observe correlation between metrics and WS • Build a model to predict WS
Estimating WS: Part I 1. Measure IPC_SMT: run threads together, measure instructions and cycles 2. Measure IPC_single: run each thread in isolation, measure instructions and cycles 3. Compute WS = Σ (IPC_SMT / IPC_single)
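The three steps above can be sketched in a few lines. This is a minimal illustration, assuming the instruction and cycle counts have already been collected; the function names and all the numbers are hypothetical, not taken from the paper.

```python
def ipc(instructions, cycles):
    """Instructions per cycle for one measurement."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def weighted_speedup(smt_counts, single_counts):
    """WS = sum over threads of IPC_SMT / IPC_single.

    Both arguments are lists of (instructions, cycles) pairs,
    one pair per thread, in the same thread order.
    """
    if len(smt_counts) != len(single_counts):
        raise ValueError("need one SMT and one single-run sample per thread")
    return sum(
        ipc(*smt) / ipc(*single)
        for smt, single in zip(smt_counts, single_counts)
    )


# Hypothetical counts for two threads: (instructions, cycles).
smt_counts = [(1200, 1000), (500, 1000)]      # measured while co-scheduled
single_counts = [(2000, 1000), (1000, 1000)]  # measured running alone

ws = weighted_speedup(smt_counts, single_counts)
print(f"WS = {ws:.2f}")
```

Here the first thread keeps 60% of its stand-alone IPC and the second keeps 50%, so the made-up co-schedule has a WS of roughly 1.1.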
Estimating WS: Part II 4. Measure online hardware metrics: run threads together, read hardware counters • AllConf • Dcache • FQ • FP • etc. 5. Correlate WS to each metric [Figure: scatter plot of WS against AllConf, one point (AllConf1, WS1), (AllConf2, WS2), (AllConf3, WS3), (AllConf4, WS4), ... per co-schedule] 6. Metric with highest correlation is the best predictor
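Steps 5 and 6 can be illustrated with a plain Pearson correlation. This is a sketch under assumptions: the slides do not say which correlation measure was used, the sample values are invented, and ranking by the magnitude of the correlation (so that a strongly negative relationship also counts) is a choice made here, not something the source states.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant sample carries no predictive signal
    return cov / (dev_x * dev_y)


def best_predictor(metric_samples, ws_samples):
    """Return the metric whose samples correlate most strongly with WS."""
    strength = {
        name: abs(pearson(values, ws_samples))
        for name, values in metric_samples.items()
    }
    return max(strength, key=strength.get), strength


# Invented data: one WS value and one counter reading per co-schedule.
ws_samples = [1.0, 1.2, 1.4, 1.6]
metric_samples = {
    "AllConf": [40, 30, 20, 10],  # fewer conflicts, higher WS
    "Dcache": [5, 9, 6, 8],       # only loosely related to WS
}

name, strength = best_predictor(metric_samples, ws_samples)
print(name, {k: round(v, 3) for k, v in strength.items()})
```

In this invented data set AllConf tracks WS almost perfectly (inversely), so it is selected over Dcache.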
Result of the Model • We know which metric best predicts symbiosis (WS) • IPC • Dcache • FQ • Composite • Score • Measure Score online. If Score is high, there is high symbiosis.
Scheduler • Sample: run many different co-schedules and measure hardware counters • Optimize: predict which co-schedules have high symbiosis (those with high Score) • Schedule: select co-schedules that are predicted symbiotic (with high Score)
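The sample/optimize/schedule loop might look roughly like the sketch below. It is heavily simplified: it picks a single group of jobs rather than a full rotation of co-schedules, and `fake_score` is a hypothetical stand-in for running a candidate and reading hardware counters. The job names, job kinds, and scoring rule are all invented for illustration.

```python
import itertools
import random


def sample_coschedules(jobs, group_size, num_samples, rng):
    """Sample phase: choose some candidate groups of jobs to try."""
    all_groups = list(itertools.combinations(jobs, group_size))
    return rng.sample(all_groups, min(num_samples, len(all_groups)))


def pick_best(candidates, read_score):
    """Optimize phase: keep the candidate with the highest predicted symbiosis."""
    return max(candidates, key=read_score)


# Hypothetical job mix; a real scheduler would not know these labels.
JOB_KIND = {"A": "fp", "B": "fp", "C": "int", "D": "mem"}


def fake_score(group):
    """Stand-in for a counter-based Score: more diverse groups score higher."""
    return len({JOB_KIND[job] for job in group})


rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = sample_coschedules(list(JOB_KIND), 2, 6, rng)
best = pick_best(candidates, fake_score)

# Schedule phase: run the selected group until the job mix changes.
print("co-schedule:", best)
```

Several groups tie under this toy score, so which one is printed depends on the sampling order; the only guarantee is that the two floating-point jobs A and B are not paired.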
Summary • New processor motivated a new problem: resource contention • Addressed by co-scheduling symbiotic threads • Challenge: which threads are symbiotic? • Solution: heuristic based on hardware counters • On average 9% speedup