340 likes | 441 Views
Scheduling-based TM Contention Management A survey talk. Danny Hendler Ben-Gurion university. Danny Hendler. TM: a research toy?. TM. “Why STM can be more than a research toy.” Dragojevic et al., 2009. “ TM:Why it is only a research toy.” Cascaval et al., 2008.
E N D
Scheduling-based TM Contention ManagementA survey talk Danny Hendler Ben-Gurion university Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
TM: a research toy? TM “Why STM can be more than a research toy.” Dragojevic et al., 2009 “TM:Why it is only a research toy.” Cascaval et al., 2008 • There is consensus that TM performance must be improved. Specifically: • “In some workloads, performance degraded when we used too many concurrent • threads. One possible alternative to improving performance in these cases • would be to modify the thread scheduler so it avoids running more • concurrent threads than is optimal for a given workload, based on the information • provided by the STM runtime. “ [Dragojevic et al.] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
TM scheduling: rationale • Transactional threads controlled by TM-aware scheduler • Kernel-level, user-level • Richer “tool-box“ for reducing and/or preventing transaction conflicts Improve performance under high-contention Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
“Conventional” (non-scheduling) contention management Non TM-scheduledthreads TM system ContentionManager ContentionDetection Suicide Polite Karma arbitrate Abort/retry, wait greedy Aggressive proceed Polka “Polymorphic contention management”, [Guerraoui, Herlihy & Pochon DISC'05] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Conventional Contention Management is often problematic • Loser resumes execution after a waiting period • May resume execution too early • May resume execution too late • Repeated collisions occur under high contention • Livelocks • Performance may become worse than single lock Scheduling-based CM to the rescue. Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Talk outline • Preliminaries • The first TM schedulers • Later user-land work • Kernel support Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
The first TM schedulers • “Adaptive Transaction Scheduling for transactional memory systems” [Yoo & Lee, SPAA'08] • “CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC '08] • “Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari , Jarvis, Kirkham, Kotsedilis, Lujan and Watson, HiPEAC'09] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Adaptive Transaction Scheduling (ATS)(Yoo & Lee, SPAA'08) • A single scheduling queue • Per-thread Contention Intensity (CI) computed • Adaptive mechanism • CI below threshold transaction begins normally • CI above threshold transaction serialized (queued) Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
ATS: adaptive parallelism control An average of all the CIs from running threads Timeline flows from top to bottom Transactions begin execution without resorting to the scheduler As contention starts to increase, some transactions call the scheduler As more transactions get serialized, contention intensity starts to decrease Contention intensity subsides below threshold More transactions start without the scheduler to exploit more parallelism Behavior of a Queue-Based Scheduler ATS adaptively varies the number of concurrent transactions according to the dynamic contention feedback Yoo & Lee, Transaction Scheduling.
CAR-STM (Collision Avoidance and Resolution for STM) (Dolev, Hendler & Suissa, PODC'08) • Per-core transaction queues • Serialize conflicting transactions • Contention avoidance: attempt to avoid even first collision Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
CAR-STM high-level architecture Transaction thread T-Info Dispatcher CollisionAvoider TQ thread TQ thread Serializing contention mgr. Transaction queue #k Transaction queue #1 Core #1 Core #k Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Execution time: STMBench7*R/W dominated workloads (*) “STMBench7: a benchmark for STM”, [Guerraoui, Kapalka & Viteck., Eurosys'07]
Shortcomings of first TM schedulers • May restrict parallelism too much • ATS: a single serialization queue • CAR-STM: at most a single transactional thread per core • High overheads even in the lack of contention Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Talk outline • Preliminaries • The first TM schedulers • Later user-land work • Kernel support Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer • Avoid repeated collisions while minimizing over-serialization • No per-core queues • Adaptive • “On the impact of serializing contention management on STM performance”, [Heber, Hendler & Suissa., opodis'09] • “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) Transactional threads Conditionvariables Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) 1) t Identifies a collision 3) t changes status of t' to ABORT(writes that t is winner) t' t 2) t calls contention manager: ABORT_OTHER 4) t' identifies it was aborted Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) 6) Eventually t commits and broadcasts on its condition variable… t 5) t' rolls back transaction and goes to sleep on the condition variable of t t' Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) t' Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) Stabilization mechanism • Algorithm is adaptive • Serializing mode / “Conventional’’ mode • Prevents “mode-oscillations”: • Shifting to serialization-mode reduces perceived contention • Should use two thresholds Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
A low-overhead serializer (cont'd) Experimental evaluation Throughput of CM conventional algorithms is low Stabilization mechanism helps CAR-STM over-serializes compared with low-overhead serializer Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
“Shrink” – collision prevention/avoidance • Predicts future accesses based on past accesses • Read-set predicted based on past few committed/aborted TXs(temporal locality) • Write-set predicted based on immediately preceding aborted TX • Serialize with a probability proportional to the number of threads currently serialized (serialization affinity), if thread's success rate is low and a collision is predicted • “Preventing versus curing: avoiding conflicts in transactional memories”, [Dragojevic, Guerraoui, Singh and Singh, PODC'09] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
“Shrink” – collision prevention (cont'd) Serialize only if contention is high &a collision is “predicted” Don't serialize when contention is low Update statistics, release lock if you own it Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
“Shrink” – experimental evaluationSTMBench7, read-write workload Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Talk outline • Preliminaries • The first TM schedulers • Later user-land work • Kernel support Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Scheduling Support for Transactional Memory Contention Management • Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris) • (Strict) serialization • Soft serialization • Time-slice extension • Different mechanisms for communication between user-level STM library and kernel scheduler • “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10] Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
TM Library / Kernel Communication via Shared Memory Segment (Ser-k algorithm) • User code notifies kernel on events such as: transaction start, commit and abort (in which case thread yields) • Kernel code handles moving thread between ready and blocked queues Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Soft Serialization • Instead of blocking, reduce loser thread priority and yield • Efficient in scenarios where loser transactions may take a different execution path when retrying • Priority should be restored upon commit or when conflicting transactions terminate Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Time-slice extention • Preemption in the midst of a transaction increases conflict “window of vulnerability” • Defer preemption of transactional threads • avoid CPU monopolization by bounding number of extensions and yielding after commit • May be combined with serialization/soft serialization Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Evaluation (STMBench7, 16-core AMD Opterom) Serializing by local spinning is efficient as long as threads ≤ cores Conventional CM deteriorates when threads>cores Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Evaluation - STMBench7 throughput Serializing by sleeping on condition var is best when threads>cores, since system call overhead is negligible (long transactions) All strict serialization schemes significantly reduce aborts Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Additional TM scheduling work • “Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS'09] • “Taking the heat off transactions: dynamic selection of pessimistic concurrency control [Sonmez, Harris, Cristal, Unsal & Valeo, IPDPS'09] • “Proactive transaction scheduling for contention management”[Blake, Dreslinky & Mudge, MICRO'09] • “Improving performance by reducing aborts in HTM”[Ansari, Khan, Lujan, Kotselidis, Kirkham and Watson, HIPEAC'10] • “Window-based greedy contention management for TM”[Sharma, Estrade & Busch, DC'10] • “On Transaction Scheduling in distributed TM systems][Kim & Ravindran, 2010] • “Kernel-assisted Scheduling and Deadline Support for STM”[Maldonado, Marlier, Felber, Lawall, Muller & Riviere, DSN'11] • “Adaptive thread scheduling techniques for improving scalability of STM”[Chan, Lam & Wang, 2011] • … Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Scheduling support for TMConclusions & future work • Scheduling-based CM results in improved throughput under high contention • Overhead is negligible when contention is low • Lightweight kernel support can improve performance and efficiency for some workloads • Dynamically selecting best CM algorithm for workload at hand is a challenging research direction • Machine learning? Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
Thank you. Danny Hendler 3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome