Advanced Topics in Transactional Memory • Preventing vs Curing: Avoiding Conflicts in Transactional Memories • Idan Kedar • Article by: Aleksander Dragojevic, Anmol V. Singh, Rachid Guerraoui, Vasu Singh
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Scheduling Transactions • Basically, we want to schedule transactions, much the same way threads are scheduled. • It is already being done – poorly – by the transaction-oblivious OS scheduler. • It is partly done, as an afterthought, by contention managers.
Serializing Contention Managers • When CMs/TM schedulers decide to serialize a transaction B after a transaction A, they usually use some coarse measure to choose A and B. • Serialization is only justified when the reason for aborting a transaction is likely to repeat.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
The theoretical model • An infinite number of cores. • A central scheduler assigns transactions to cores. • Preemptive scheduling. • Every transaction has a release time. • A transaction may abort.
Some definitions • Makespan: the total time from the beginning of the first transaction until the successful commit of the last one. • Optimal offline scheduler (OOS): a theoretical scheduler which has complete knowledge of all the transactions.
Some more definitions • Competitive ratio: the makespan of a TM scheduler divided by the makespan of the OOS (see below). • A scheduler with a competitive ratio of k is called k-competitive. • Online clairvoyant scheduler: a scheduler which has complete knowledge of all transactions that have already started.
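In symbols, restating the definitions above:

$$\text{competitive ratio of scheduler } S \;=\; \frac{\text{makespan}(S)}{\text{makespan}(\text{OOS})}, \qquad S \text{ is } k\text{-competitive if this ratio is at most } k.$$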
Existing Contention Managers • Greedy: the older transaction gets to abort the younger one. • Serializer (employed by CAR-STM): Upon conflict between A and B, B will execute after A on the same core. • ATS: If a transaction T aborts k times, T is added to the global queue Q, which executes all its transactions sequentially.
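A minimal C++ sketch of the ATS policy described above, assuming the transaction body reports commit/abort as a boolean; the names, the threshold value, and the mutex-based global queue are illustrative assumptions, not the actual ATS implementation:

```cpp
#include <mutex>

// Sketch of the ATS idea: after a transaction aborts k times, it is moved
// to a single global queue whose transactions execute one at a time.
struct GlobalSerialQueue {
    std::mutex m;  // at most one queued transaction runs at a time
    template <typename Txn>
    void run(Txn&& txn) {
        std::lock_guard<std::mutex> lock(m);
        while (!txn()) { /* retry serially until commit */ }
    }
};

constexpr int kAbortThreshold = 4;  // the constant k (illustrative value)

// txn() returns true on commit, false on abort.
template <typename Txn>
void ats_execute(GlobalSerialQueue& q, Txn&& txn) {
    for (int aborts = 0; aborts < kAbortThreshold; ++aborts) {
        if (txn()) return;  // committed without serialization
    }
    q.run(txn);  // too many aborts: serialize behind all other queued transactions
}
```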
Dissing ATS • ATS (with a constant k) is O(n)-competitive. • Assume all transactions conflict with T1, T1 takes k time units, and T2–Tn are pairwise non-conflicting unit-length transactions. • After k aborts, T2–Tn will be queued after T1. • ATS has a makespan of k+n-1. • OOS has a makespan of k+1 (run T2–Tn in parallel, then T1). • Since k is constant, this is O(n)-competitive (see the ratio below).
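Putting the two makespans from this slide together:

$$\frac{\text{makespan}(\text{ATS})}{\text{makespan}(\text{OOS})} \;=\; \frac{k + n - 1}{k + 1} \;=\; O(n) \quad \text{for constant } k.$$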
Dissing Greedy and Serializer • Hagit Attiya proved that Greedy is O(n)-competitive. • Serializer is also O(n)-competitive. • Assume all transactions are unit-length, all conflict with T2, and T1 aborts T2. • T2 is serialized after T1, and T3–Tn are serialized after T2. • Serializer has a makespan of n. • OOS has a makespan of 2 (see the ratio below).
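Again, combining the two makespans from this slide:

$$\frac{\text{makespan}(\text{Serializer})}{\text{makespan}(\text{OOS})} \;=\; \frac{n}{2} \;=\; O(n).$$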
So what's fast? • Online clairvoyant transaction scheduler: such a scheduler can be 2-competitive with the OOS, and this article showed that 2 is a tight bound. • The idea: let all transactions run and abort once – now you have complete knowledge of them. Then reproduce the schedule the OOS would create – that phase has an optimal makespan. • The learning phase costs at most the optimal makespan (on infinitely many cores it lasts at most as long as the longest transaction, which is a lower bound on the optimal makespan), hence 2-competitive (see below). • If the scheduler's knowledge were inaccurate, it would degrade to O(n)-competitive.
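One way to write the upper-bound argument sketched on this slide:

$$\text{makespan} \;\le\; \underbrace{\text{makespan}(\text{OOS})}_{\text{let every transaction run and abort once}} \;+\; \underbrace{\text{makespan}(\text{OOS})}_{\text{replay the optimal schedule}} \;=\; 2 \cdot \text{makespan}(\text{OOS}).$$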
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
General outline of Shrink • Shrink uses locality of reference across the last few transactions to predict the read set of a thread. • For the write set, such an approach usually doesn't work. But we do know that an aborted transaction will attempt to write the same variables when it restarts. • A transaction touching the predicted data set of another transaction may be serialized.
Predicting the read set • Empirical observations show that multiple consecutive committed transactions of a thread read similar addresses. • This phenomenon is referred to as "temporal locality". • It is usually a result of traversals of data structures. • Shrink bases its read-set prediction on the last k transactions (see the sketch below). • k is referred to as the locality window.
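A small C++ sketch of this locality-window idea, assuming addresses are tracked as opaque pointers; the class name, the default k = 4, and the use of hash sets are illustrative assumptions, not the paper's actual data structures:

```cpp
#include <cstddef>
#include <deque>
#include <unordered_set>

// Per-thread predictor: the predicted read set is the union of the
// read sets of the last k committed transactions of this thread.
class ReadSetPredictor {
public:
    explicit ReadSetPredictor(std::size_t locality_window = 4)
        : k_(locality_window) {}

    // Called on commit with the addresses the transaction actually read.
    void on_commit(const std::unordered_set<const void*>& read_set) {
        history_.push_back(read_set);
        if (history_.size() > k_) history_.pop_front();
    }

    // Will the next transaction of this thread likely read `addr`?
    bool predicts_read(const void* addr) const {
        for (const auto& rs : history_)
            if (rs.count(addr)) return true;
        return false;
    }

private:
    std::size_t k_;                                        // locality window
    std::deque<std::unordered_set<const void*>> history_;  // last k read sets
};
```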
Predicting the write set • Unlike read sets, write sets are usually small. • Therefore, temporal locality doesn't work well for write sets. • When a transaction aborts and restarts, it writes to the same addresses and therefore has the same write set. • We assume no drastic changes between abort and restart.
Serialization affinity • Serializing transactions to avoid conflicts doesn't pay off in underloaded systems and in low-contention scenarios. • Serialization affinity stipulates that the probability of serializing a transaction should be proportional to the amount of contention.
Flowchart of Shrink • Success rate of a thread: the ratio of commits to aborts. • The write set is predicted from aborts. To avoid unnecessary aborts and mispredictions, Shrink only uses prediction when the success rate is low or contention is high (see the sketch below).
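A hedged C++ sketch of this decision flow – success-rate gating plus serialization affinity. The thresholds, the normalization of the success rate, and the probability formula are illustrative assumptions rather than the paper's exact constants:

```cpp
#include <random>
#include <unordered_set>

// Illustrative per-thread bookkeeping (the real Shrink keeps analogous state).
struct ThreadState {
    unsigned commits = 0;
    unsigned aborts  = 0;
    std::unordered_set<const void*> predicted_reads;   // union of the last k read sets
    std::unordered_set<const void*> predicted_writes;  // write set of the last aborted run
};

// Decide whether to serialize this thread's next transaction.
//   predicted_conflict : do the predicted sets above intersect the data sets of
//                        currently running transactions? (checked by the caller)
//   contention         : recent fraction of aborted transactions system-wide, in [0, 1]
bool should_serialize(const ThreadState& ts, bool predicted_conflict, double contention) {
    // Success rate of the thread, here normalized as commits / (commits + aborts).
    double total = static_cast<double>(ts.commits) + ts.aborts;
    double success_rate = (total == 0) ? 1.0 : ts.commits / total;

    // Only use prediction when the success rate is low or contention is high.
    if (success_rate > 0.5 && contention < 0.5)  // illustrative thresholds
        return false;

    if (!predicted_conflict)
        return false;  // no predicted conflict, start the transaction normally

    // Serialization affinity: serialize with probability proportional to contention.
    static thread_local std::mt19937 rng{std::random_device{}()};
    return std::uniform_real_distribution<double>(0.0, 1.0)(rng) < contention;
}
```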
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Shrink with SwissTM on STMBench7 • Shrink is comparable to regular SwissTM in underloaded cases. • Shrink improves SwissTM by up to 55% in read-dominated workloads.
Shrink with SwissTM on STAMP • On STAMP, Shrink gives better improvements when the system is more overloaded. • "intruder" is improved by 15%; the intruder scenario is one that requires serialization. • "yada" is improved by up to 120%; yada has long transactions and moderate contention.
Shrink with TinySTM on STMBench7 • Shrink is comparable to TinySTM in underloaded cases. • With 24 threads, Shrink performs 32 times better than TinySTM. This is because the cost of conflicts is very high in TinySTM.
Some more numbers • With TinySTM on STAMP, Shrink performed 100 times better than plain TinySTM on "intruder", "yada" and "vacation". • When SwissTM was configured with busy waiting, Shrink's throughput was hundreds of times higher on 24 threads.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Conflict Avoiding Schedulers • ATS (Adaptive Transaction Scheduling) – gains performance mostly in overloaded systems. • Steal-on-abort – when a thread's transaction is aborted, it is stolen by the prevailing (winning) thread. • TxLinux – integrates transactions into the kernel, making the OS scheduler transaction-aware. • All of these schedule in reaction to conflicts (curing). None of them schedule to prevent conflicts.
CAR-STM • CAR-STM employs Serializer, which we have already seen. • As we have seen, CAR-STM can serialize too much in some cases. • Shrink predicts conflicts, whereas CAR-STM doesn't. Instead, CAR-STM lets the application supply the prediction. • In domains which do not have temporal locality (for instance, worker threads), such application-supplied (domain-based) prediction will produce better results.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Aftermath • We started with the deep desire in our hearts to schedule transactions. The scheduler's quality is crucial for performance. • We explored the theoretical power and limitations of prediction-based TM schedulers. • We introduced Shrink, a scheduler with a heuristic that separates overloaded from underloaded cases and read prediction from write prediction.
In two words: buy it • Shrink dynamically serializes transactions based on the current contention and the likelihood of conflict with currently executing transactions. • Shrink's performance was demonstrated on SwissTM and TinySTM. TinySTM is a notable case where Shrink improves the performance of overloaded systems by orders of magnitude. • Shrink's main performance gain is in the average case. Its worst-case performance is no better than that of other schedulers.