Presented by: Ofer Kiselov & Omer Kiselov. Supervised by: Dmitri Perelman. Final Presentation: Analyzing Aborts in Software Transactional Memory.
Overview • Recap of the midterm presentation: • Software Transactional Memory abstraction • STM implementation example: TL2 overview • Aborts in STM • Unnecessary aborts in STM • Project goal • Implementation overview • Online part: implementation • Online logging • Evaluation • Hardware • Deuce • Benchmarks • Results • Conclusion and analysis • Nice to have • Future work
Importance Of Parallel Programming • Frequency barrier – single-core performance can no longer improve by raising the clock frequency. • Switch to multi-cores. • Parallel programs allow utilizing multi-core processors. • Accessing shared data requires synchronization.
Transactional Memory – why? • Current synchronization – locks • Coarse-grained – limit parallelism • Fine-grained – high programming complexity • Error-prone (deadlocks / livelocks) • Transactional memory solution • Intuitive for a programmer • Provides a “transaction” abstraction for a critical section (operations executed atomically) • Implemented in both software and hardware.
Why Do Aborts Happen? • [Figure: transactions T1, T2, T3 and T4 accessing OBJECT1 (O1) and OBJECT2 (O2)] • T1, T2 and T3 read from O1 and write to O2. • T4 reads from O2 and writes to O1. • To maintain consistency, if T4 commits, then T1, T2 and T3 must abort!
Unnecessary Aborts • [Figure: transactions T1, T2 and T3 accessing objects o1 and o2, with labels A and C] • Aborts are bad: • work is lost, resources are wasted, throughput decreases • Some aborts are necessary: • continuing the run would violate correctness • And some aborts are not: • deciding exactly whether a transaction must abort is too expensive • “Unnecessary” abort: one that could have been avoided • e.g. by keeping more versions or checking transactional dependencies more precisely.
Project Goals • Build a software analysis tool that: • measures abort statistics for a given run • evaluates how many of the aborts were unnecessary • evaluates the damage to performance • “Will it pay off to add designs to stop the unnecessary aborts?”
Project Formation • An offline part for analyzing the run: • reads the log of the run • gathers statistics • analyzes unnecessary aborts • An online part for logging the run: • is inserted into a specific algorithm • runs in a benchmark • flushes the run info to an XML log file
Offline Part • Gives basic statistics regarding the transactional run: • counts aborts per reason • counts reads and writes • counts transactions • Inserts the path into the Run Descriptor ADT struct.
Parser • Every log line represents a transactional action • represented by the LogLine abstract class • Parser responsibility: • iterate over the XML • create the appropriate LogLine instances • LogLine factories for the different operation types: • transactional start • read operation • write operation • transactional commit
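The parser design described above (an abstract LogLine plus one factory per operation type) could look roughly like the sketch below. The slides do not show the log schema, so the element names (start, read, write, commit), the attributes (tx, obj) and every class name other than LogLine are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LogParserSketch {
    // Base class: one instance per logged transactional action.
    abstract static class LogLine {
        final int txId;
        LogLine(int txId) { this.txId = txId; }
        abstract String kind();
    }
    static final class StartLine extends LogLine {
        StartLine(int txId) { super(txId); }
        String kind() { return "start"; }
    }
    static final class ReadLine extends LogLine {
        final String objectId;
        ReadLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
        String kind() { return "read"; }
    }
    static final class WriteLine extends LogLine {
        final String objectId;
        WriteLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
        String kind() { return "write"; }
    }
    static final class CommitLine extends LogLine {
        CommitLine(int txId) { super(txId); }
        String kind() { return "commit"; }
    }

    // One factory per XML element name (hypothetical schema, not the project's real one).
    private static final Map<String, Function<Element, LogLine>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("start", e -> new StartLine(tx(e)));
        FACTORIES.put("read", e -> new ReadLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("write", e -> new WriteLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("commit", e -> new CommitLine(tx(e)));
    }

    private static int tx(Element e) { return Integer.parseInt(e.getAttribute("tx")); }

    // Iterates over the child elements of the root and builds one LogLine per element.
    static List<LogLine> parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("malformed log", ex);
        }
        List<LogLine> lines = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text nodes
            Element e = (Element) node;
            Function<Element, LogLine> factory = FACTORIES.get(e.getTagName());
            if (factory == null) {
                throw new IllegalArgumentException("unknown log element: " + e.getTagName());
            }
            lines.add(factory.apply(e));
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<log><start tx=\"1\"/><read tx=\"1\" obj=\"o1\"/><commit tx=\"1\"/></log>";
        for (LogLine line : parse(xml)) {
            System.out.println(line.kind() + " by T" + line.txId);
        }
    }
}
```

Keeping the factories in a map means a new operation type (for example an abort record) only needs a new LogLine subclass and one extra map entry.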
Transactional Dependencies Run Descriptor is a precedence graph!
To create the graph, we needed a way to turn the basic run into a graph. • [Figure: RUN DESCRIPTOR with transactions T1 and T4 acting as readers and writers of OBJECT1 and OBJECT2 (version 2), connected by WaR (write-after-read) edges]
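As a rough illustration of turning a run into a precedence graph, the sketch below uses the textbook construction: whenever two different transactions access the same object and at least one access is a write, an edge goes from the earlier transaction to the later one. This is an assumption-laden simplification; the project's Run Descriptor also tracks object versions, which this sketch ignores, and all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PrecedenceGraphSketch {
    enum OpType { READ, WRITE }

    // One logged access: which transaction did what to which object.
    static final class Op {
        final String tx;
        final OpType type;
        final String object;
        Op(String tx, OpType type, String object) {
            this.tx = tx;
            this.type = type;
            this.object = object;
        }
    }

    // Returns edges formatted as "T1 -WaR-> T4", in order of discovery.
    static Set<String> buildEdges(List<Op> schedule) {
        Set<String> edges = new LinkedHashSet<>();
        for (int i = 0; i < schedule.size(); i++) {
            Op first = schedule.get(i);
            for (int j = i + 1; j < schedule.size(); j++) {
                Op second = schedule.get(j);
                boolean sameObject = first.object.equals(second.object);
                boolean sameTx = first.tx.equals(second.tx);
                boolean bothReads = first.type == OpType.READ && second.type == OpType.READ;
                if (!sameObject || sameTx || bothReads) continue; // no conflict, no edge
                edges.add(first.tx + " -" + label(first, second) + "-> " + second.tx);
            }
        }
        return edges;
    }

    // WaR: write after read, RaW: read after write, WaW: write after write.
    private static String label(Op first, Op second) {
        if (first.type == OpType.READ) return "WaR";
        return second.type == OpType.READ ? "RaW" : "WaW";
    }

    public static void main(String[] args) {
        List<Op> schedule = Arrays.asList(
                new Op("T1", OpType.READ, "o1"),
                new Op("T4", OpType.READ, "o2"),
                new Op("T4", OpType.WRITE, "o1"),
                new Op("T1", OpType.WRITE, "o2"));
        for (String edge : buildEdges(schedule)) System.out.println(edge);
    }
}
```

In the schedule used in main, each transaction overwrites an object the other one has read, so the two WaR edges point in opposite directions and form a cycle.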
ABORTS ANALYZER • Searches for unnecessary aborts in the RUN DESCRIPTOR • Speculatively adds the edges of the aborted transaction to the RUN DESCRIPTOR • Uses DFS to find cycles in the precedence graph • A cycle means the abort was necessary; no cycle means it was unnecessary • Removes the speculative edges at the end of the analysis • Built using the visitor pattern • Flexible enough for more complex analyses
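The analyzer steps above (add the aborted transaction's edges speculatively, search for a cycle with DFS, then remove the edges) can be sketched as follows. This is a minimal illustration over a plain adjacency map, not the project's visitor-based implementation; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AbortAnalyzerSketch {
    // Precedence graph of the committed transactions: tx -> transactions that must follow it.
    private final Map<String, Set<String>> successors = new HashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    boolean hasEdge(String from, String to) {
        return successors.getOrDefault(from, Collections.<String>emptySet()).contains(to);
    }

    int edgeCount() {
        int n = 0;
        for (Set<String> out : successors.values()) n += out.size();
        return n;
    }

    // Returns true if adding the aborted transaction's edges closes a cycle through it,
    // i.e. the abort was necessary. Each edge is a {from, to} pair.
    boolean isAbortNecessary(String abortedTx, List<String[]> speculativeEdges) {
        List<String[]> added = new ArrayList<>();
        for (String[] e : speculativeEdges) {
            if (!hasEdge(e[0], e[1])) {
                addEdge(e[0], e[1]);
                added.add(e);
            }
        }
        try {
            return reachesItself(abortedTx);
        } finally {
            // Undo only the edges this call added, so the graph is left unchanged.
            for (String[] e : added) successors.get(e[0]).remove(e[1]);
        }
    }

    // Iterative DFS: is there a non-empty path from start back to start?
    private boolean reachesItself(String start) {
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            for (String next : successors.getOrDefault(node, Collections.<String>emptySet())) {
                if (next.equals(start)) return true;
                if (visited.add(next)) stack.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AbortAnalyzerSketch graph = new AbortAnalyzerSketch();
        graph.addEdge("T1", "T2");
        List<String[]> edges = Arrays.asList(new String[] {"T2", "T4"}, new String[] {"T4", "T1"});
        System.out.println("abort of T4 necessary: " + graph.isAbortNecessary("T4", edges));
    }
}
```

Restricting the search to cycles that pass through the aborted transaction is enough here, because the graph of committed transactions is assumed to be acyclic before the speculative edges are added.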
Online Part • Our goals: • Run benchmarks to prepare the statistics for the offline part. • Make sure that the measurements do not distort the scheduling picture.
Platform Supporting STM • Introducing: Deuce STM! • Deuce STM is an open-source Java STM environment. • With Deuce STM, if the method public void doThing() {…} is not thread-safe, then @Atomic public void doThing() {…} is. • Created by: Guy Korland, Nir Shavit, Pascal Felber, Igor Berman
TL2 Work Method With Logging • Deuce framework source code: final public class Context implements org.deuce.transaction.Context { private static String objectId(Object reference, long field) { return Long.toString(System.identityHashCode(reference) + field); } final static AtomicInteger clock = new AtomicInteger(0);
How To Utilize Deuce for Logging • Modified the Deuce framework code to call the logging utilities. • Added more exception types to distinguish between different abort types. • [Diagram: Deuce Framework, TL2 Algorithm, Transactions Code (Start, Read, Write, Commit), Logger] • A perfectly scalable code
Online Part Implementation: Version 1 • Main problem: adding to a priority queue damages parallelism and lowers performance.
Online Part Implementation: Version 2 • The back-end collector • The threads don't do any extra actions to log the run. • [Diagram, steps 1 to 3: the log lines have ended; the program has ended]
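The slides do not show how the Version 2 collector works internally. One common way to keep logging off the critical path, shown here purely as an assumption and not as the project's design, is to let every thread append to its own private buffer and have a collector merge the buffers once the workers have finished. The class name, the Entry type and the shared sequence counter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class BufferedRunLoggerSketch {
    // One log entry with a global sequence number used to restore the order later.
    static final class Entry {
        final long seq;
        final String text;
        Entry(long seq, String text) { this.seq = seq; this.text = text; }
    }

    // Hypothetical ordering source; a real STM could reuse its own global clock instead.
    private final AtomicLong sequence = new AtomicLong();
    private final ConcurrentLinkedQueue<List<Entry>> buffers = new ConcurrentLinkedQueue<>();

    // Each thread lazily creates a private buffer and registers it exactly once.
    private final ThreadLocal<List<Entry>> local = ThreadLocal.withInitial(() -> {
        List<Entry> buffer = new ArrayList<>();
        buffers.add(buffer);
        return buffer;
    });

    // Called by worker threads: no lock and no shared queue on the logging path.
    void log(String text) {
        local.get().add(new Entry(sequence.getAndIncrement(), text));
    }

    // Called by the collector only after all workers have been joined.
    List<Entry> collect() {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> buffer : buffers) all.addAll(buffer);
        all.sort(Comparator.comparingLong((Entry e) -> e.seq));
        return all;
    }

    // Small driver: starts the workers, waits for them, then collects the merged log.
    static List<Entry> runDemo(int threads, int perThread) {
        BufferedRunLoggerSketch logger = new BufferedRunLoggerSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) logger.log("T" + id + " op " + i);
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return logger.collect();
    }

    public static void main(String[] args) {
        System.out.println("collected entries: " + runDemo(4, 100).size());
    }
}
```

Compared with a shared priority queue, the only shared write in this sketch is the sequence counter; sorting is deferred to the collector, after the measured run is over.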
What Do we Check? • Commit rate • Unnecessary aborts (classified by types) • Wasted work
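The slides do not define these metrics formally. A plausible reading, stated here as an assumption, is: commit ratio = commits / (commits + aborts), unnecessary-abort percentage = unnecessary aborts / all aborts, and wasted work = reads performed by aborted transactions / all reads. The sketch below computes them from counters; the numbers in main are made up for illustration and are not measured results.

```java
class RunMetricsSketch {
    // Fraction of transaction attempts that committed (0.0 when nothing ran).
    static double commitRatio(long commits, long aborts) {
        long attempts = commits + aborts;
        return attempts == 0 ? 0.0 : (double) commits / attempts;
    }

    // Generic percentage helper, e.g. unnecessary aborts out of all aborts,
    // or reads done by aborted transactions out of all reads.
    static double percentage(long part, long whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    public static void main(String[] args) {
        // Illustrative counters only, not taken from the benchmarks.
        long commits = 750, aborts = 250, unnecessaryAborts = 50;
        long readsInAbortedTx = 1200, allReads = 6000;
        System.out.println("commit ratio: " + commitRatio(commits, aborts));
        System.out.println("unnecessary aborts (%): " + percentage(unnecessaryAborts, aborts));
        System.out.println("wasted reads (%): " + percentage(readsInAbortedTx, allReads));
    }
}
```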
Testbenches • SSCA2 – Short transactions, low contention, high memory utilization • Vacation – High contention, Medium length transaction, Mostly reads. • AVL tree – customizable contention, medium length transactions. • Random choice between add, remove or search for a random integer in the tree. • Ability to change integer range for custom contention. • Created by us.
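The AVL workload described above (a random choice between add, remove and search on a random integer, with the integer range controlling contention) can be sketched in a few lines. This is a single-threaded stand-in under stated assumptions: java.util.TreeSet (a red-black tree) replaces the project's AVL tree, and in the real benchmark each operation would be a transaction run by many threads. All names are hypothetical.

```java
import java.util.Random;
import java.util.TreeSet;

class TreeWorkloadSketch {
    // Applies `ops` random operations on keys in [0, keyRange) and returns the final tree.
    // A smaller keyRange makes operations hit the same keys more often, which in a
    // concurrent run translates into higher contention.
    static TreeSet<Integer> run(int ops, int keyRange, long seed) {
        Random random = new Random(seed);
        TreeSet<Integer> tree = new TreeSet<>();
        for (int i = 0; i < ops; i++) {
            int key = random.nextInt(keyRange);
            switch (random.nextInt(3)) {
                case 0:
                    tree.add(key);
                    break;
                case 1:
                    tree.remove(key);
                    break;
                default:
                    tree.contains(key); // search only, result ignored
                    break;
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println("keys left, narrow range: " + run(10000, 16, 1L).size());
        System.out.println("keys left, wide range: " + run(10000, 1 << 20, 1L).size());
    }
}
```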
Hardware • Benchmarks run on Trinity: • 8 quad-cores • 132 GB RAM • Machine was idle for our use.
Simulation Results – AVL tree • All graphs are a function of the number of threads. • [Graphs: Commit Ratio; Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – SSCA2 • All graphs are a function of the number of threads. • [Graphs: Commit Ratio; Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – Vacation • All graphs are a function of the number of threads. • [Graphs: Commit Ratio; Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – AVL tree • All graphs are a function of the number of threads. • [Graphs: Amount of Aborts by Type; Percentage of Aborts by Type]
Simulation Results – SSCA2 • All graphs are a function of the number of threads. • [Graphs: Amount of Aborts by Type; Percentage of Aborts by Type]
Simulation Results – Vacation • All graphs are a function of the number of threads. • [Graphs: Amount of Aborts by Type; Percentage of Aborts by Type]
Logger Impact on Performance • [Graph: AVL test with logging, commit ratio] • Logger access demands more from the Deuce framework: • more memory accesses • more exception types • on every read and write • How much distortion does the logger cause?
Conclusions • As parallelism increases, the abort rate, the unnecessary abort rate and the wasted work rate increase as well. • As parallelism increases, more aborts are caused by locked objects. • To improve STM performance on highly parallel workloads, algorithms may be improved to prevent unnecessary aborts.
Nice To Have • Automatically drawing the precedence graph in Microsoft Visio. • The ability to analyze according to abort types. • A GUI. • Extending the simulation to more algorithms and test benches, which would make performance comparisons between algorithms possible.
Future Work • Drop in abort rates after 128 threads due to a drop in concurrency – further analysis is required. • Unfit versions cause a lot of aborts. • The new SMV algorithm may solve this problem.
BIBLIOGRAPHY • I. Keidar and D. Perelman. On avoiding spare aborts in transactional memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 59–68, 2009. • I. Keidar and D. Perelman. SMV: Selective Multi-Versioning STM. • D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194–208, 2006. • M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software transactional memory for dynamic-sized data structures. In Proceedings of the twenty-second annual symposium on Principles of distributed computing, pages 92–101, 2003.