Advanced Topics in Transactional Memory • Preventing vs Curing: Avoiding Conflicts in Transactional Memories • Idan Kedar • Article by: Aleksander Dragojevic, Anmol V. Singh, Rachid Guerraoui, Vasu Singh
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Scheduling Transactions • Basically, we want to schedule transactions, much the same way threads are scheduled. • It is already being done – poorly – by the transaction-oblivious OS scheduler. • It is partly done, as an afterthought, by contention managers.
Serializing Contention Managers • When CMs/TM schedulers decide to serialize a transaction B after a transaction A, they usually use some coarse measure to choose A and B. • Serialization is only justified when the reason for aborting a transaction is likely to repeat.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
The theoretical model • An infinite number of cores. • A central scheduler assigns transactions to cores. • Preemptive scheduling. • Every transaction has a release time. • A transaction may abort.
Some definitions • Makespan: the total time from the beginning of the first transaction until the successful commit of the last one. • Optimal offline scheduler (OOS): a theoretical scheduler which has complete knowledge of all the transactions.
Some more definitions • Competitive ratio: the makespan of a TM scheduler divided by the makespan of the OOS (see below). • A scheduler with a competitive ratio of k is called k-competitive. • Online clairvoyant scheduler: a scheduler which has complete knowledge of all transactions that have already started.
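In symbols, restating the definitions above:

$$\text{competitive ratio of scheduler } S \;=\; \frac{\text{makespan}(S)}{\text{makespan}(\text{OOS})}, \qquad S \text{ is } k\text{-competitive if this ratio is at most } k.$$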
Existing Contention Managers • Greedy: the older transaction gets to abort the younger one. • Serializer (employed by CAR-STM): Upon conflict between A and B, B will execute after A on the same core. • ATS: If a transaction T aborts k times, T is added to the global queue Q, which executes all its transactions sequentially.
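A minimal C++ sketch of the ATS policy described above, assuming the transaction body reports commit/abort as a boolean; the names, the threshold value, and the mutex-based global queue are illustrative assumptions, not the actual ATS implementation:

```cpp
#include <mutex>

// Sketch of the ATS idea: after a transaction aborts k times, it is moved
// to a single global queue whose transactions execute one at a time.
struct GlobalSerialQueue {
    std::mutex m;  // at most one queued transaction runs at a time
    template <typename Txn>
    void run(Txn&& txn) {
        std::lock_guard<std::mutex> lock(m);
        while (!txn()) { /* retry serially until commit */ }
    }
};

constexpr int kAbortThreshold = 4;  // the constant k (illustrative value)

// txn() returns true on commit, false on abort.
template <typename Txn>
void ats_execute(GlobalSerialQueue& q, Txn&& txn) {
    for (int aborts = 0; aborts < kAbortThreshold; ++aborts) {
        if (txn()) return;  // committed without serialization
    }
    q.run(txn);  // too many aborts: serialize behind all other queued transactions
}
```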
Dissing ATS • ATS (with a constant k) is O(n)-competitive. • Assume all transactions conflict with T1, T1 takes k time units, and T2–Tn are pairwise non-conflicting unit-length transactions. • After k aborts, T2–Tn will be queued after T1. • ATS has a makespan of k+n-1. • OOS has a makespan of k+1 (run T2–Tn in parallel, then T1). • Since k is constant, this is O(n)-competitive (see the ratio below).
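Putting the two makespans from this slide together:

$$\frac{\text{makespan}(\text{ATS})}{\text{makespan}(\text{OOS})} \;=\; \frac{k + n - 1}{k + 1} \;=\; O(n) \quad \text{for constant } k.$$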
Dissing Greedy and Serializer • Hagit Attiya proved that Greedy is O(n)-competitive. • Serializer is also O(n)-competitive. • Assume all transactions are unit-length, all conflict with T2, and T1 aborts T2. • T2 is serialized after T1, and T3–Tn are serialized after T2. • Serializer has a makespan of n. • OOS has a makespan of 2 (see the ratio below).
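Again, combining the two makespans from this slide:

$$\frac{\text{makespan}(\text{Serializer})}{\text{makespan}(\text{OOS})} \;=\; \frac{n}{2} \;=\; O(n).$$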
So what's fast? • Online clairvoyant transaction scheduler: such a scheduler can be 2-competitive with the OOS, and this article showed that 2 is a tight bound. • The idea: let all transactions run and abort once – now you have complete knowledge of them. Then reproduce the schedule the OOS would create – that phase has an optimal makespan. • The learning phase costs at most the optimal makespan (on infinitely many cores it lasts at most as long as the longest transaction, which is a lower bound on the optimal makespan), hence 2-competitive (see below). • If the scheduler's knowledge were inaccurate, it would degrade to O(n)-competitive.
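One way to write the upper-bound argument sketched on this slide:

$$\text{makespan} \;\le\; \underbrace{\text{makespan}(\text{OOS})}_{\text{let every transaction run and abort once}} \;+\; \underbrace{\text{makespan}(\text{OOS})}_{\text{replay the optimal schedule}} \;=\; 2 \cdot \text{makespan}(\text{OOS}).$$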
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
General outline of Shrink • Shrink uses locality of reference across the last few transactions to predict the read set of a thread. • For the write set, such an approach usually doesn't work. But we do know that an aborted transaction will attempt to write the same variables when it restarts. • A transaction touching the predicted data set of another transaction may be serialized.
Predicting the read set • Empirical observations show that multiple consecutive committed transactions of a thread read similar addresses. • This phenomenon is referred to as "temporal locality". • It is usually a result of traversals of data structures. • Shrink bases its read-set prediction on the last k transactions (see the sketch below). • k is referred to as the locality window.
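A small C++ sketch of this locality-window idea, assuming addresses are tracked as opaque pointers; the class name, the default k = 4, and the use of hash sets are illustrative assumptions, not the paper's actual data structures:

```cpp
#include <cstddef>
#include <deque>
#include <unordered_set>

// Per-thread predictor: the predicted read set is the union of the
// read sets of the last k committed transactions of this thread.
class ReadSetPredictor {
public:
    explicit ReadSetPredictor(std::size_t locality_window = 4)
        : k_(locality_window) {}

    // Called on commit with the addresses the transaction actually read.
    void on_commit(const std::unordered_set<const void*>& read_set) {
        history_.push_back(read_set);
        if (history_.size() > k_) history_.pop_front();
    }

    // Will the next transaction of this thread likely read `addr`?
    bool predicts_read(const void* addr) const {
        for (const auto& rs : history_)
            if (rs.count(addr)) return true;
        return false;
    }

private:
    std::size_t k_;                                        // locality window
    std::deque<std::unordered_set<const void*>> history_;  // last k read sets
};
```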
Predicting the write set • Unlike read sets, write sets are usually small. • Therefore, temporal locality doesn't work well for write sets. • When a transaction aborts and restarts, it writes to the same addresses and therefore has the same write set. • We assume no drastic changes between abort and restart.
Serialization affinity • Serializing transactions to avoid conflicts doesn't pay off in underloaded systems and in low-contention scenarios. • Serialization affinity stipulates that the probability of serializing a transaction should be proportional to the amount of contention.
Flowchart of Shrink • Success rate of a thread: the ratio of commits to aborts. • The write set is predicted from aborts. To avoid unnecessary aborts and mispredictions, Shrink only uses prediction when the success rate is low or contention is high (see the sketch below).
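A hedged C++ sketch of this decision flow – success-rate gating plus serialization affinity. The thresholds, the normalization of the success rate, and the probability formula are illustrative assumptions rather than the paper's exact constants:

```cpp
#include <random>
#include <unordered_set>

// Illustrative per-thread bookkeeping (the real Shrink keeps analogous state).
struct ThreadState {
    unsigned commits = 0;
    unsigned aborts  = 0;
    std::unordered_set<const void*> predicted_reads;   // union of the last k read sets
    std::unordered_set<const void*> predicted_writes;  // write set of the last aborted run
};

// Decide whether to serialize this thread's next transaction.
//   predicted_conflict : do the predicted sets above intersect the data sets of
//                        currently running transactions? (checked by the caller)
//   contention         : recent fraction of aborted transactions system-wide, in [0, 1]
bool should_serialize(const ThreadState& ts, bool predicted_conflict, double contention) {
    // Success rate of the thread, here normalized as commits / (commits + aborts).
    double total = static_cast<double>(ts.commits) + ts.aborts;
    double success_rate = (total == 0) ? 1.0 : ts.commits / total;

    // Only use prediction when the success rate is low or contention is high.
    if (success_rate > 0.5 && contention < 0.5)  // illustrative thresholds
        return false;

    if (!predicted_conflict)
        return false;  // no predicted conflict, start the transaction normally

    // Serialization affinity: serialize with probability proportional to contention.
    static thread_local std::mt19937 rng{std::random_device{}()};
    return std::uniform_real_distribution<double>(0.0, 1.0)(rng) < contention;
}
```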
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Shrink with SwissTM on STMBench7 • Shrink is comparable to regular SwissTM in underloaded cases. • Shrink improves SwissTM by up to 55% in read-dominated workloads.
Shrink with SwissTM on STAMP • On STAMP, Shrink gives better improvements when the system is more overloaded. • "intruder" is improved by 15%; the intruder scenario is one that requires serialization. • "yada" is improved by up to 120%; yada has long transactions and moderate contention.
Shrink with TinySTM on STMBench7 • Shrink is comparable to TinySTM in underloaded cases. • With 24 threads, Shrink performs 32 times better than TinySTM. This is because the cost of conflicts is very high in TinySTM.
Some more numbers • With TinySTM on STAMP, Shrink performed 100 times better than plain TinySTM on "intruder", "yada" and "vacation". • When SwissTM was configured with busy waiting, Shrink's throughput was hundreds of times higher on 24 threads.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Conflict Avoiding Schedulers • ATS (Adaptive Transaction Scheduling) – gains performance mostly in overloaded systems. • Steal-on-abort – when a thread's transaction is aborted, it is stolen by the prevailing (winning) thread. • TxLinux – integrates transactions into the kernel, making the OS scheduler transaction-aware. • All of these schedule in reaction to conflicts (curing). None of them schedule to prevent conflicts.
CAR-STM • CAR-STM employs Serializer, which we have already seen. • As we have seen, CAR-STM can serialize too much in some cases. • Shrink predicts conflicts, whereas CAR-STM doesn't. Instead, CAR-STM lets the application supply the prediction. • In domains which do not have temporal locality (for instance, worker threads), such application-supplied (domain-based) prediction will produce better results.
Contents • Scheduling Transactions • Setting the goal: how good can we get? • Das Shrink • The results • Related work: other schedulers • Conclusion
Aftermath • We started with the deep desire in our hearts to schedule transactions. The scheduler's quality is crucial for performance. • We explored the theoretical power and limitations of prediction-based TM schedulers. • We introduced Shrink, a scheduler with a heuristic that separates overloaded from underloaded cases and read prediction from write prediction.
In two words: buy it • Shrink dynamically serializes transactions based on the current contention and the likelihood of conflict with currently executing transactions. • Shrink's performance was demonstrated on SwissTM and TinySTM. TinySTM is a notable case where Shrink improves the performance of overloaded systems by orders of magnitude. • Shrink's main performance gain is in the average case. Its worst-case performance is no better than that of other schedulers.