Analyzing Aborts in Software Transactional Memory

Final Presentation
Presented by: Ofer Kiselov & Omer Kiselov
Supervised by: Dmitri Perelman

Overview
• Recap of the midterm presentation on the following subjects:
  * Software Transactional Memory abstraction
  * STM implementation example: TL2 overview
  * Aborts in STM
  * Unnecessary aborts in STM
  * Project goal
  * Implementation
  * Overview
• Online part – implementation
• Online logging
• Evaluation
• Hardware
• Deuce
• Benchmarks
• Results
• Conclusion and analysis
• Nice to have
• Future work

Importance Of Parallel Programming
• Frequency barrier – single-core performance can no longer be improved by raising the clock frequency.
• The industry has switched to multi-cores.
• Parallel programs make it possible to utilize multi-core processors.
• Accessing shared data requires synchronization.

Transactional Memory – why?
• Current synchronization – locks
  • Coarse-grained – limit parallelism
  • Fine-grained – high programming complexity
  • Error-prone (deadlocks / livelocks)
• Transactional memory solution
  • Intuitive for a programmer
  • Provides a “transaction” abstraction for a critical section (operations executed atomically)
  • Implemented in both software and hardware.

Why Do Aborts Happen?
[Diagram: T1, T2 and T3 read from O1 and write to O2 (OBJECT1, OBJECT2); T4 reads from O2 and writes to O1.]
• To maintain consistency, if T4 commits, then T1, T2 and T3 must abort!

Unnecessary Aborts
[Diagram: transactions T1, T2, T3 and objects o1, o2.]
• Aborts are bad: work is lost, resources are wasted, throughput decreases.
• Some aborts are necessary: continuing the run would violate correctness.
• And some aborts are not: analyzing precisely whether the algorithm should abort is too expensive.
• An “unnecessary” abort is one that could have been avoided, e.g. by keeping more versions or by checking transactional dependencies more precisely.

Project Goals
• Build a software analysis tool that:
  • measures abort statistics for a given run
  • evaluates how many of the aborts were unnecessary
  • evaluates the damage to performance
• “Will it pay off to change the design to prevent the unnecessary aborts?”

Project Formation
• An offline part for analyzing the run:
  • reads the log of the run.
  • gathers statistics.
  • analyzes unnecessary aborts.
• An online part for logging the run:
  • is inserted into a specific algorithm
  • runs in a benchmark
  • flushes the run info to an XML log file

Offline Part
• Gives basic statistics regarding the transactional run.
  • Counts aborts per reason.
  • Counts reads and writes.
  • Counts transactions.
• Inserts the path into the Run Descriptor ADT struct.
Parser
• Every log line represents a transactional action
  • represented by the LogLine abstract class
• Parser responsibility:
  • iterate over the XML
  • create the appropriate LogLine instances
• LogLine factories for the different operation types:
  • transactional start
  • read operation
  • write operation
  • transactional commit
[Diagram: Parser and Analyzer.]
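
The parser described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the XML element names (run, start, read, write, commit), the attribute names (tx, obj) and the subclass names are all hypothetical, since the slides do not show the real log format. Only the idea of a LogLine abstract class with one factory per operation type comes from the slide.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class LogParserSketch {
    /** One transactional action read from the log. */
    abstract static class LogLine {
        final int txId;
        LogLine(int txId) { this.txId = txId; }
    }
    static final class StartLine extends LogLine {
        StartLine(int txId) { super(txId); }
    }
    static final class ReadLine extends LogLine {
        final String objectId;
        ReadLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
    }
    static final class WriteLine extends LogLine {
        final String objectId;
        WriteLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
    }
    static final class CommitLine extends LogLine {
        CommitLine(int txId) { super(txId); }
    }

    /** Factory: maps a (hypothetical) element name to the matching LogLine subclass. */
    static LogLine fromElement(Element e) {
        int tx = Integer.parseInt(e.getAttribute("tx"));
        switch (e.getTagName()) {
            case "start":  return new StartLine(tx);
            case "read":   return new ReadLine(tx, e.getAttribute("obj"));
            case "write":  return new WriteLine(tx, e.getAttribute("obj"));
            case "commit": return new CommitLine(tx);
            default: throw new IllegalArgumentException("Unknown log element: " + e.getTagName());
        }
    }

    /** Iterates over the child elements of the root and builds one LogLine per element. */
    static List<LogLine> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<LogLine> lines = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    lines.add(fromElement((Element) n));
                }
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Malformed log", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<run><start tx=\"1\"/><read tx=\"1\" obj=\"o1\"/>"
                   + "<write tx=\"1\" obj=\"o2\"/><commit tx=\"1\"/></run>";
        for (LogLine line : parse(xml)) {
            System.out.println(line.getClass().getSimpleName() + " tx=" + line.txId);
        }
    }
}
```

Keeping the factory separate from the iteration means a new operation type only needs one more LogLine subclass and one more case.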

Transactional Dependencies
• The Run Descriptor is a precedence graph!

To create the graph, we needed to establish a way to turn the basic run into a graph.
[Diagram: RUN DESCRIPTOR. Transactions T1 and T4 appear as readers and writers of OBJECT1, OBJECT1 Version2, OBJECT2 and OBJECT2 Version2, connected by WaR edges.]

ABORTS ANALYZER
• Searches for unnecessary aborts in the RUN DESCRIPTOR.
• Speculatively adds the edges of the aborted transaction to the RUN DESCRIPTOR.
• Uses DFS to find cycles in the precedence graph.
  • A cycle means the abort was necessary.
• Removes the speculative edges at the end of the analysis.
• Built using the visitor pattern.
  • Flexible enough for more complex analysis.
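
The speculative check above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, transactions are plain strings, and the visitor pattern the slide mentions is left out for brevity. It relies on the graph of committed transactions being acyclic, so any cycle created by the speculative edges must pass through the aborted transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AbortAnalyzerSketch {
    /** A precedence edge: 'from' must be serialized before 'to'. */
    static final class Edge {
        final String from;
        final String to;
        Edge(String from, String to) { this.from = from; this.to = to; }
    }

    private final Map<String, Set<String>> successors = new HashMap<>();

    /** Adds an edge; returns true if the edge was not already in the graph. */
    boolean addEdge(String from, String to) {
        return successors.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    void removeEdge(String from, String to) {
        Set<String> out = successors.get(from);
        if (out != null) {
            out.remove(to);
            if (out.isEmpty()) successors.remove(from);
        }
    }

    /** DFS: can 'target' be reached from 'node' by following at least one edge? */
    private boolean reaches(String node, String target, Set<String> visited) {
        if (!visited.add(node)) return false; // already explored
        for (String next : successors.getOrDefault(node, Collections.emptySet())) {
            if (next.equals(target) || reaches(next, target, visited)) return true;
        }
        return false;
    }

    /**
     * Speculatively adds the aborted transaction's edges, looks for a cycle
     * through it, then removes exactly the edges it added.
     */
    boolean isAbortNecessary(String abortedTx, List<Edge> speculativeEdges) {
        List<Edge> added = new ArrayList<>();
        for (Edge e : speculativeEdges) {
            if (addEdge(e.from, e.to)) added.add(e);
        }
        try {
            return reaches(abortedTx, abortedTx, new HashSet<>());
        } finally {
            for (Edge e : added) removeEdge(e.from, e.to);
        }
    }

    public static void main(String[] args) {
        AbortAnalyzerSketch graph = new AbortAnalyzerSketch();
        // T1 must precede T4 and T4 must precede T1: committing T1 would create a cycle.
        System.out.println(graph.isAbortNecessary("T1",
                Arrays.asList(new Edge("T1", "T4"), new Edge("T4", "T1")))); // true
        // T2 only has to precede T4: no cycle, so this abort was unnecessary.
        System.out.println(graph.isAbortNecessary("T2",
                Arrays.asList(new Edge("T2", "T4")))); // false
    }
}
```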

Online part
Our goals:
• Run benchmarks to prepare the statistics for the offline part.
• Make sure that the measurements don’t distort the scheduling picture.

Platform Supporting STM
Introducing: Deuce STM!
• Deuce STM is an open-source Java STM environment.
• With Deuce STM, if the method
    public void doThing() {…}
  is not thread-safe, then
    @Atomic public void doThing() {…}
  is!
• Created by: Guy Korland, Nir Shavit, Pascal Felber, Igor Berman
TL2 Work Method With Logging
Deuce framework source code (excerpt):
    final public class Context implements org.deuce.transaction.Context {
        private static String objectId(Object reference, long field) {
            return Long.toString(System.identityHashCode(reference) + field);
        }
        final static AtomicInteger clock = new AtomicInteger(0);

How To Utilize Deuce for Logging
• Modified the Deuce framework code to call the logging utilities.
• Added more exception types to distinguish between different abort types.
[Diagram: Transactions Code (Start, Read, Write, Commit), Deuce Framework, TL2 Algorithm, Logger; caption: “A Perfectly Scalable Code”.]
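
As a rough illustration of distinguishing abort types by exception type, the sketch below defines one exception class per abort reason and counts aborts per reason. All class names here are hypothetical and do not come from the Deuce code base; the two reasons (a locked object, an unfit version) are the ones named elsewhere in this presentation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class AbortStatsSketch {
    /** Hypothetical base class for all abort reasons. */
    static class AbortException extends RuntimeException {
        AbortException(String message) { super(message); }
    }
    /** The object was locked by another transaction. */
    static final class LockedObjectAbort extends AbortException {
        LockedObjectAbort() { super("object locked by another transaction"); }
    }
    /** The object's version does not fit the transaction's snapshot. */
    static final class VersionAbort extends AbortException {
        VersionAbort() { super("object version does not fit this transaction"); }
    }

    // One counter per concrete exception class; LongAdder keeps updates cheap under contention.
    private final Map<Class<? extends AbortException>, LongAdder> counts = new ConcurrentHashMap<>();

    void record(AbortException e) {
        counts.computeIfAbsent(e.getClass(), k -> new LongAdder()).increment();
    }

    long count(Class<? extends AbortException> type) {
        LongAdder adder = counts.get(type);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        AbortStatsSketch stats = new AbortStatsSketch();
        try {
            throw new LockedObjectAbort();
        } catch (AbortException e) {
            stats.record(e); // a real retry loop would restart the transaction here
        }
        System.out.println("locked-object aborts: " + stats.count(LockedObjectAbort.class));
        System.out.println("version aborts: " + stats.count(VersionAbort.class));
    }
}
```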

Online Part Implementation, Version 1
• Main problem: adding to the priority queue damages parallelism and lowers performance.

Online Part Implementation, Version 2: The Back-End Collector
[Diagram with numbered steps 1–3; captions: “The log lines have ended”, “The program has ended”.]
• The threads don’t perform any extra actions to log the run.
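
One way to keep logging off the shared path is sketched below. It is an assumption-laden illustration, not the project's implementation: it reads "no extra actions" as "no extra synchronization between worker threads", gives every thread its own buffer, and lets a collector merge the buffers once the workers have finished. The shared sequence counter is also an assumption; it only stands in for whatever ordering information the real log lines carried.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class BufferedLoggerSketch {
    static final class Entry {
        final long seq;
        final String text;
        Entry(long seq, String text) { this.seq = seq; this.text = text; }
    }

    private final AtomicLong clock = new AtomicLong();
    private final Queue<List<Entry>> allBuffers = new ConcurrentLinkedQueue<>();
    // Each thread registers its private buffer once, on first use.
    private final ThreadLocal<List<Entry>> localBuffer = ThreadLocal.withInitial(() -> {
        List<Entry> buffer = new ArrayList<>();
        allBuffers.add(buffer);
        return buffer;
    });

    /** Called by worker threads: appends to the caller's own buffer, no locking. */
    void log(String text) {
        localBuffer.get().add(new Entry(clock.getAndIncrement(), text));
    }

    /** Back-end collector: call only after all workers have finished (e.g. after join). */
    List<Entry> collect() {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> buffer : allBuffers) merged.addAll(buffer);
        merged.sort(Comparator.comparingLong(e -> e.seq));
        return merged;
    }

    /** Starts the workers, waits for them, then collects and orders their log lines. */
    static List<Entry> runDemo(int threads, int linesPerThread) {
        BufferedLoggerSketch logger = new BufferedLoggerSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < linesPerThread; i++) logger.log("T" + id + " op " + i);
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return logger.collect();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 3).size() + " log lines collected");
    }
}
```

Compared with Version 1, no thread ever inserts into a shared ordered structure; sorting happens once, in the collector, after the run.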

What Do we Check? • Commit rate • Unnecessary aborts (classified by types) • Wasted work
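
The slides do not spell out how these quantities are computed. The sketch below shows one plausible set of definitions, all of them assumptions: commit ratio as commits over all transaction attempts, unnecessary aborts as a percentage of all aborts, and wasted work as the percentage of reads performed by transactions that later aborted.

```java
final class RunMetricsSketch {
    /** Assumed definition: commits / (commits + aborts). */
    static double commitRatio(long commits, long aborts) {
        long attempts = commits + aborts;
        return attempts == 0 ? 0.0 : (double) commits / attempts;
    }

    /** Generic percentage helper, e.g. unnecessary aborts out of all aborts. */
    static double percent(long part, long whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long commits = 80, aborts = 20, unnecessaryAborts = 5;
        long readsInAbortedTx = 30, totalReads = 120;
        System.out.println("commit ratio: " + commitRatio(commits, aborts));
        System.out.println("unnecessary aborts (%): " + percent(unnecessaryAborts, aborts));
        System.out.println("wasted reads (%): " + percent(readsInAbortedTx, totalReads));
    }
}
```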

Testbenches
• SSCA2 – short transactions, low contention, high memory utilization.
• Vacation – high contention, medium-length transactions, mostly reads.
• AVL tree – customizable contention, medium-length transactions.
  • Random choice between add, remove or search for a random integer in the tree.
  • Ability to change the integer range for custom contention.
  • Created by us.
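
The AVL tree workload can be pictured with the single-threaded sketch below. It is not the benchmark itself: java.util.TreeSet stands in for the AVL tree, the operations are not transactional, and the names and the equal three-way split between operations are assumptions. It only shows the operation mix and how the key range is the knob: a smaller range makes operations hit the same keys more often, which under concurrent transactions would mean more contention.

```java
import java.util.Random;
import java.util.TreeSet;

final class TreeWorkloadSketch {
    /**
     * Runs 'ops' random operations on keys in [0, keyRange) and
     * returns the counts as {adds, removes, searches}.
     */
    static int[] run(int ops, int keyRange, long seed) {
        Random random = new Random(seed);
        TreeSet<Integer> tree = new TreeSet<>(); // stand-in for the AVL tree
        int[] counts = new int[3];
        for (int i = 0; i < ops; i++) {
            int key = random.nextInt(keyRange);
            int op = random.nextInt(3); // 0 = add, 1 = remove, 2 = search
            switch (op) {
                case 0:  tree.add(key); break;
                case 1:  tree.remove(key); break;
                default: tree.contains(key); break;
            }
            counts[op]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run(1000, 16, 42L);
        System.out.println("adds=" + counts[0] + " removes=" + counts[1] + " searches=" + counts[2]);
    }
}
```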

Hardware • Benchmarks run on Trinity: • 8 quad-cores • 132 GB RAM • Machine was idle for our use.

Simulation Results – AVL tree
All graphs are a function of the number of threads.
[Graphs: Commit Ratio; Number of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]

Simulation Results – SSCA2
All graphs are a function of the number of threads.
[Graphs: Commit Ratio; Number of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]

Simulation Results – Vacation
All graphs are a function of the number of threads.
[Graphs: Commit Ratio; Number of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]

Simulation Results – AVL tree
All graphs are a function of the number of threads.
[Graphs: Number of Aborts by Type; Percentage of Aborts by Type]

Simulation Results – SSCA2
All graphs are a function of the number of threads.
[Graphs: Number of Aborts by Type; Percentage of Aborts by Type]

Simulation Results – Vacation
All graphs are a function of the number of threads.
[Graphs: Number of Aborts by Type; Percentage of Aborts by Type]

Logger Impact on Performance
• Logger access obviously demands more from the Deuce framework:
  • More memory accesses
  • More exception types
  • On every read & write
• How much distortion does the logger cause?
[Graph: AVL test with logging – commit ratio]

Conclusions
• As parallelism increases, the abort rate, the unnecessary-abort rate and the wasted-work rate increase as well.
• As parallelism increases, more aborts are caused by locked objects.
• To improve STM performance on highly parallel workloads, algorithms may be improved to prevent unnecessary aborts.

Nice To Have
• Automatically exporting the precedence graph to a Microsoft Visio drawing.
• The possibility to analyze according to abort types.
• A GUI.
• Expanding the simulation to more algorithms and testbenches, which would make it possible to compare performance between algorithms.

Future Work • Drop in abort rates after 128 threads due to a drop in concurrency – further analysis is required. • Unfit versions cause a lot of aborts. • The new SMV algorithm may solve this problem.

BIBLIOGRAPHY
• I. Keidar and D. Perelman. On avoiding spare aborts in transactional memory. In Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, pages 59–68, 2009.
• I. Keidar and D. Perelman. SMV: Selective Multi-Versioning STM.
• D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194–208, 2006.
• M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing, pages 92–101, 2003.

Questions?
