Automatic Data Partitioning in Software Transactional Memories

Torvald Riegel, Christof Fetzer, Pascal Felber (TU Dresden, Germany / Uni Neuchatel, Switzerland)

No one-size-fits-all TM!
• STMs:
  • Design:
    • Invisible vs. visible reads
    • Object-based vs. word-based
  • Parameters:
    • Lock-based: #locks, address-to-lock mapping
• HTMs:
  • Different interfaces (e.g., Rock vs. AMD’s ASF)
  • Resource bounds
• Heterogeneous workloads: Global tuning does not help
Divide and conquer!?

How to divide
• User-driven? Hmm, rather not …
• Temporally
  • Runtime tuning can handle phases
  • … but only if the whole workload has the same phases
• Memory
  • “Word-based”: mapping function is difficult
    • Runtime overheads
    • Mapping needs to be stable
    • Memory allocator affects mapping heavily (see false conflicts)
  • “Object-based”: still need mapping or per-object data
• Code
  • Problem: the same function might operate on different data
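The difficulty with a word-based mapping can be illustrated with a small model. The sketch below is written in Python for brevity rather than in the C used by the STMs discussed here. The word shift, the lock-array size, and the function names are assumptions chosen for illustration and are not taken from any STM implementation.

```python
WORD_SHIFT = 3   # assumption: 8-byte words
LOCK_BITS = 4    # assumption: 2**4 = 16 locks, deliberately tiny


def lock_index(addr: int, lock_bits: int = LOCK_BITS) -> int:
    """Map a word address to a slot in one shared lock array."""
    return (addr >> WORD_SHIFT) & ((1 << lock_bits) - 1)


def false_conflict(addr_a: int, addr_b: int, lock_bits: int = LOCK_BITS) -> bool:
    """True if two different words are guarded by the same lock."""
    same_word = (addr_a >> WORD_SHIFT) == (addr_b >> WORD_SHIFT)
    same_lock = lock_index(addr_a, lock_bits) == lock_index(addr_b, lock_bits)
    return same_lock and not same_word


# Two unrelated words that lie exactly one lock-array stride apart.
a = 0x1000
b = a + (1 << (WORD_SHIFT + LOCK_BITS))

collides_with_16_locks = false_conflict(a, b)
collides_with_256_locks = false_conflict(a, b, lock_bits=8)
```

In this model the two addresses share a lock when only 16 locks exist and stop colliding with 256 locks. Where the allocator places objects decides which addresses end up one stride apart, which is one way to read the remark that the allocator affects the mapping heavily.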

How to conquer?
• Tune concurrency control mechanisms
  • Use different STM implementations
  • Use HTM only where applicable/necessary
  • Tune TM parameters per partition
  • Challenge: Threads must agree on which mechanisms to use for each item/location!
  • Two-phase commit or similar is necessary when using several independent TM mechanisms
• Improve mapping/partitioning at other levels
  • E.g., location-to-lock mapping
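Why several independent TM mechanisms need two-phase commit can be sketched as follows. This is a single-threaded Python model, not code from Tanger or TinySTM. The `Participant` class and its `prepare`/`commit`/`abort` methods are a hypothetical interface, and a real mechanism may not offer an equivalent.

```python
class Participant:
    """Hypothetical stand-in for one TM mechanism used inside a transaction."""

    def __init__(self, name: str, will_validate: bool = True):
        self.name = name
        self.will_validate = will_validate
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: validate reads and acquire what is needed to commit,
        # but do not make any write visible yet.
        if self.will_validate:
            self.state = "prepared"
        return self.will_validate

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit everywhere or nowhere."""
    # Phase 1: every mechanism has to vote yes.
    if all(p.prepare() for p in participants):
        # Phase 2: only now may the writes become visible.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


good = [Participant("partition-a"), Participant("partition-b")]
bad = [Participant("partition-a"), Participant("partition-b", will_validate=False)]
good_result = two_phase_commit(good)
bad_result = two_phase_commit(bad)
```

Without the separate prepare phase, the first mechanism could already have published its writes when the second one fails validation, leaving the transaction half-committed.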

Data Partitioning
• Partition memory automatically
• We use Pool Allocation (Lattner et al., PLDI 05)
• Mixed compile-time/runtime technique:
  • Based on pointer analysis for C/C++
  • Nodes in points-to graph become partitions
  • Partitions are instantiated dynamically at runtime and supplied to called functions that use these partitions
• Memory allocator is not affected
• Implementation extends Tanger (STM compiler)
  • STM load/store functions get pointer to partition
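The idea that partitions are created at runtime and handed to the functions that use them can be modelled in a few lines. This is a Python sketch of the calling convention only. The names `Partition`, `stm_load`, `stm_store`, and `increment` are hypothetical, and the load/store functions perform no concurrency control at all.

```python
class Partition:
    """Runtime instance of one points-to graph node (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.loads = 0
        self.stores = 0


def stm_load(partition: Partition, obj: dict, field: str):
    # A real STM load would consult the partition's concurrency control here.
    partition.loads += 1
    return obj[field]


def stm_store(partition: Partition, obj: dict, field: str, value) -> None:
    partition.stores += 1
    obj[field] = value


# Before the transformation: increment(counter)
# After it, the caller also supplies the partition that `counter` belongs to.
def increment(counter: dict, counter_partition: Partition) -> None:
    value = stm_load(counter_partition, counter, "value")
    stm_store(counter_partition, counter, "value", value + 1)


# The caller instantiates one partition per data structure instance.
part_a, part_b = Partition("instance-a"), Partition("instance-b")
a, b = {"value": 0}, {"value": 10}
increment(a, part_a)
increment(a, part_a)
increment(b, part_b)
```

The same `increment` code runs against two different partitions here, which is the case that partitioning by code alone cannot distinguish.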

Example: Points-to graph for STAMP’s Vacation
[Figure: partial, simplified DS graph for main(). Annotations: node type, if known (struct has 4 fields, 2 are pointers); a Red-Black Tree instance; a second Red-Black Tree instance.]

Conquering …
• Partition types determine STM implementation used per partition (TinySTM):
  • Multiple Locks (general purpose)
  • Single Shared Lock (infrequently updated partitions)
  • Single Exclusive Lock (low-concurrency partitions)
  • Read-Only (no concurrency control necessary)
  • Thread-local, transaction-local
• Loads/stores dispatched to type-specific STM functions on each call
• Partition types and parameters can be tuned
  • E.g., read-only partitions get tuned on first write
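Per-call dispatch on the partition type might look roughly like this. The sketch is a single-threaded Python model with hypothetical names. The handlers only record which path ran instead of doing real locking, and retuning a read-only partition straight to Multiple Locks is an assumption made for the example.

```python
from enum import Enum, auto


class PType(Enum):
    READ_ONLY = auto()
    SINGLE_EXCLUSIVE_LOCK = auto()
    MULTIPLE_LOCKS = auto()


class Partition:
    def __init__(self, ptype: PType):
        self.ptype = ptype
        self.trace = []  # records which type-specific function ran


def load_read_only(part, mem, addr):
    part.trace.append("ro-load")  # no concurrency control necessary
    return mem[addr]


def load_locked(part, mem, addr):
    part.trace.append("locked-load")  # placeholder for lock/validate logic
    return mem[addr]


def store_locked(part, mem, addr, value):
    part.trace.append("locked-store")
    mem[addr] = value


def store_read_only(part, mem, addr, value):
    # First write to a read-only partition: retune it, then redo the store.
    part.ptype = PType.MULTIPLE_LOCKS
    part.trace.append("retuned")
    stm_store(part, mem, addr, value)


LOADS = {PType.READ_ONLY: load_read_only,
         PType.SINGLE_EXCLUSIVE_LOCK: load_locked,
         PType.MULTIPLE_LOCKS: load_locked}
STORES = {PType.READ_ONLY: store_read_only,
          PType.SINGLE_EXCLUSIVE_LOCK: store_locked,
          PType.MULTIPLE_LOCKS: store_locked}


def stm_load(part, mem, addr):
    return LOADS[part.ptype](part, mem, addr)


def stm_store(part, mem, addr, value):
    STORES[part.ptype](part, mem, addr, value)


mem = {0x10: 1}
part = Partition(PType.READ_ONLY)
first = stm_load(part, mem, 0x10)
stm_store(part, mem, 0x10, 2)
second = stm_load(part, mem, 0x10)
```

The table lookup on every load and store is the dispatch cost that the partitioned version pays. In a multi-threaded setting the type switch would additionally have to be coordinated between threads, which this model leaves out.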

Performance
[Figure: performance plot. Series: TinySTM w/o partitioning support, 2^20 / 2^24 locks; TinySTM with partitioning, 4 different tuning heuristics.]
• Partitioning decreases false conflicts in the lock array. The lock hash function gets a 2nd level at compile time.
• Exclusive Lock is faster than general-purpose STM
• Partitioning adds runtime overhead
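One way to picture the second hash level is that a lock is identified by the pair (partition, slot) instead of the slot alone. The Python sketch below counts cross-partition collisions for two synthetic address ranges. The addresses, array size, and function names are assumptions for illustration and do not reproduce the measured setup.

```python
from itertools import product

WORD_SHIFT = 3  # assumption: 8-byte words
LOCK_BITS = 4   # assumption: 16 slots per lock array


def slot(addr: int) -> int:
    return (addr >> WORD_SHIFT) & ((1 << LOCK_BITS) - 1)


def shared_lock(addr: int):
    """One lock array for the whole program."""
    return ("global", slot(addr))


def partitioned_lock(partition_id: str, addr: int):
    """First level: the partition (known at compile time). Second level: the hash."""
    return (partition_id, slot(addr))


# Two data structures living in different address ranges (synthetic).
struct_a = [0x1000 + 8 * i for i in range(8)]
struct_b = [0x9000 + 8 * i for i in range(8)]

shared_collisions = sum(
    1 for x, y in product(struct_a, struct_b)
    if shared_lock(x) == shared_lock(y)
)
partitioned_collisions = sum(
    1 for x, y in product(struct_a, struct_b)
    if partitioned_lock("A", x) == partitioned_lock("B", y)
)
```

With a single array, each word of one structure collides with exactly one word of the other in this setup. With per-partition arrays, cross-partition collisions cannot occur at all, although collisions inside a partition remain possible.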

Performance (2)
[Figure: performance plot. Annotations: Read-Only partitions during first phase of benchmark; 2^26 locks! (2^24: livelocks due to false conflicts); 5 x 256K locks.]

Challenges
• Analysis: Calls to libraries?
  • Points-to graphs can probably be attached to libs (local per-function analysis + call graph)
  • Analysis is bottom-up on call graph
• TM implementations that don’t support two-phase commit
• Dispatch: Runtime overheads
  • JIT?
  • Size of binaries
• Tuning partitions and partitioning
  • No direct feedback; partitioning results in even more parameters to be tuned
  • Partition selection / merging at compile-time/runtime

Questions?
Tanger + TinySTM + …: http://tinystm.org (send email for version with partitioning support)

Backup Slides

Are there partitions?

Partition Type Performance & Tuning Strategies
• Tuning strategy:
  • Start with read-only type
  • On reaching a certain number of aborts, switch to:
    • Single Exclusive Lock
    • Single Shared Lock
    • Multiple Locks
• Part-1: switch directly to Multiple Locks; Part-4: try other types first (single locks, fewer multiple locks)
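The abort-driven strategy can be read as a small state machine per partition. The Python sketch below is one possible interpretation. The abort threshold of 3, the exact escalation order used for Part-4, and all names are assumptions, and the "fewer multiple locks" step is not modelled.

```python
# Escalation orders, as interpreted from the bullet list above (assumption).
PART_1 = ["read-only", "multiple-locks"]
PART_4 = ["read-only", "single-exclusive-lock", "single-shared-lock", "multiple-locks"]


class PartitionTuner:
    """Escalates a partition's type after a fixed number of aborts."""

    def __init__(self, order, abort_threshold: int = 3):
        self.order = order
        self.abort_threshold = abort_threshold  # hypothetical value
        self.level = 0
        self.aborts = 0

    @property
    def ptype(self) -> str:
        return self.order[self.level]

    def on_abort(self) -> str:
        self.aborts += 1
        last = len(self.order) - 1
        if self.aborts >= self.abort_threshold and self.level < last:
            self.level += 1  # switch to the next partition type
            self.aborts = 0  # start counting again for the new type
        return self.ptype


part1 = PartitionTuner(PART_1)
part4 = PartitionTuner(PART_4)
history1 = [part1.on_abort() for _ in range(3)]
history4 = [part4.on_abort() for _ in range(9)]
```

Under these assumptions Part-1 reaches Multiple Locks after the first threshold, while Part-4 needs three rounds of aborts to get there because it tries both single-lock types on the way.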

Analysis
• We use Data Structure Analysis (DSA [1]):
  • Pointer analysis for the LLVM compiler framework
  • Creates a points-to graph with Data Structure (DS) nodes
• Context-sensitive:
  • Data structures distinguished based on call graphs
• Field-sensitive:
  • Distinguishes between DS fields
• Unification-based:
  • Pointers target a single node in the points-to graph
  • Information about pointers from different places gets merged
  • If the information is incompatible, the node is collapsed (= “nothing known”)
• Can safely analyze incomplete programs:
  • Calls to external / not analyzed functions have an effect only on the data that escapes into / from these functions (gets marked “External”)
  • Analyzing more code increases analysis precision
[1] Chris Lattner, PhD thesis, 2005
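The unification-based property can be illustrated with a generic union-find sketch in which every pointer has at most one pointee node. This is a textbook-style Python model of unification, not DSA itself: it has no field or context sensitivity, and all names are hypothetical.

```python
class Node:
    def __init__(self, label: str):
        self.label = label
        self.parent = self   # union-find parent
        self.pointee = None  # the single node this node's pointers target


def find(n: Node) -> Node:
    while n.parent is not n:
        n.parent = n.parent.parent  # path halving
        n = n.parent
    return n


def unify(a: Node, b: Node) -> Node:
    a, b = find(a), find(b)
    if a is b:
        return a
    b.parent = a
    pa, pb = a.pointee, b.pointee
    if pa is None:
        a.pointee = pb
    elif pb is not None:
        unify(pa, pb)  # the pointees must be merged as well
    return find(a)


def points_to(var: Node) -> Node:
    root = find(var)
    if root.pointee is None:
        root.pointee = Node("*" + root.label)
    return find(root.pointee)


def address_of(p: Node, x: Node) -> None:  # models p = &x
    unify(points_to(p), x)


def assign(p: Node, q: Node) -> None:      # models p = q
    unify(points_to(p), points_to(q))


p, q, r = Node("p"), Node("q"), Node("r")
obj_a, obj_b = Node("obj_a"), Node("obj_b")
address_of(p, obj_a)
address_of(q, obj_b)
separate_before = find(obj_a) is not find(obj_b)
assign(r, p)
assign(r, q)  # r may point to either object, so both nodes get merged
merged_after = find(obj_a) is find(obj_b)
```

The two objects start as distinct nodes and would become distinct partitions. Once one pointer may refer to both, unification merges them into a single node, which shows how information from different places can reduce the number of partitions.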

Analysis (2)
Integration into Tanger compilation process:
• Compile and link program parts into an LLVM intermediate representation module
• Analyze module using DSA
  • Local intra-function analysis: per-function DS graph
  • Merge DS graphs bottom-up in call graph (put callees’ information into callers)
  • Merge DS graphs top-down in call graph (vice versa)
• Transactify module
  • Use DSA information to decide between object-based / word-based
  • Requirement: If a memory chunk (DS node) is object-based, then it must be safe for object-based everywhere in the program
  • DSA can give us this guarantee
• Link in STM library and generate native code
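The bottom-up pass over the call graph amounts to visiting callees before callers and folding their results into the caller. The Python sketch below shows only that traversal order. The call graph, the function names, and the use of plain string sets in place of DS graphs are assumptions, and recursion (cycles in the call graph) is not handled.

```python
# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["insert", "lookup"],
    "insert": ["rotate"],
    "lookup": [],
    "rotate": [],
}

# Stand-in for the per-function (local) analysis result.
LOCAL_FACTS = {
    "main": {"tree"},
    "insert": {"node"},
    "lookup": set(),
    "rotate": {"node", "parent"},
}


def bottom_up_order(graph, root):
    """Post-order traversal: every callee appears before its callers."""
    seen, order = set(), []

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        for callee in graph[fn]:
            visit(callee)
        order.append(fn)

    visit(root)
    return order


def merge_bottom_up(graph, local_facts, root):
    """Put callees' information into callers."""
    summary = {fn: set(facts) for fn, facts in local_facts.items()}
    for fn in bottom_up_order(graph, root):
        for callee in graph[fn]:
            summary[fn] |= summary[callee]
    return summary


order = bottom_up_order(CALL_GRAPH, "main")
summary = merge_bottom_up(CALL_GRAPH, LOCAL_FACTS, "main")
```

The top-down pass would walk the same order in reverse and push caller information into callees.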
