Data Representation Synthesis
PLDI’2011, ESOP’12, PLDI’12
Peter Hawkins, Stanford University
Alex Aiken, Stanford University
Kathleen Fisher, Tufts
Martin Rinard, MIT
Mooly Sagiv, TAU
http://theory.stanford.edu/~hawkinsp/
Research Interests
• Verifying properties of low-level data structure manipulations
• Synthesizing low-level data structure manipulations
• Composing ADT operations
• Concurrent programming via smart libraries
Motivation
• Sophisticated data structures exist
• Increase abstraction level
• Simplify programming
• Concurrency
• Integrated into modern PL
• But hard to use
• Composing several operations
thttpd: Web Server
[Figure: a list of Conn objects linked by next, each pointing to a Map object; every Map stores its index, and the hash table "table" (slots [0] to [5]) points to the Map objects, with unused slots null.]
Representation Invariants:
1. ∀n: Map. ∀v: Z. table[v] = n ⇔ n->index = v
2. ∀n: Map. n->rc = |{n' : Conn . n'->file_data = n}|
thttpd: mmc.c
static void add_map(Map *m) {
    int i = hash(m);
    …
    table[i] = m;
    …
    m->index = i;
    …
    m->rc++;
}
Representation Invariants:
1. ∀n: Map. ∀v: Z. table[v] = n ⇔ index[n] = v
2. ∀n: Map. rc[n] = |{n' : Conn . file_data[n'] = n}|
Annotations on the slide mark the points in add_map where an invariant is broken and where it is restored.
TOMCAT Motivating Example: TOMCAT 5.*
attr = new HashMap();
…
Attribute removeAttribute(String name) {
    Attribute val = null;
    synchronized (attr) {
        boolean found = attr.containsKey(name);
        if (found) {
            val = attr.get(name);
            attr.remove(name);
        }
    }
    return val;
}
Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist.
TOMCAT Motivating Example: TOMCAT 6.*
attr = new ConcurrentHashMap();
…
Attribute removeAttribute(String name) {
    Attribute val = null;
    /* synchronized (attr) { */
    boolean found = attr.containsKey(name);
    if (found) {
        val = attr.get(name);
        attr.remove(name);
    }
    /* } */
    return val;
}
Concurrent client: attr.put("A", o); attr.remove("A");
Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist.
With the lock removed, the client's calls can interleave with the three map operations, so the invariant no longer holds even though each individual operation is atomic.
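The interleaving problem on this slide can be reproduced deterministically. The following is a minimal Python sketch, not the Tomcat code: AtomicMap is a hypothetical stand-in for a concurrent map, and the other_thread hook is an assumption used to inject what a second thread might do between get and remove.

```python
import threading


class AtomicMap:
    """Toy stand-in for a concurrent map: every single method is atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def contains_key(self, key):
        with self._lock:
            return key in self._data

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def remove(self, key):
        # Atomically removes the key and returns its old value (or None).
        with self._lock:
            return self._data.pop(key, None)


def remove_attribute_racy(attr, name, other_thread=lambda: None):
    """Mirrors the slide's code: three atomic calls, no enclosing lock.

    `other_thread` is a hypothetical hook standing in for whatever a
    second thread does between get() and remove().
    """
    val = None
    if attr.contains_key(name):
        val = attr.get(name)
        other_thread()
        attr.remove(name)
    return val


def remove_attribute_atomic(attr, name):
    """Relies on the map's single atomic remove, so nothing can interleave."""
    return attr.remove(name)
```

If the hook removes "A" itself, both callers end up reporting the same value as "removed", which violates the stated invariant. The atomic variant avoids this by using one map operation instead of three.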
Multiple ADTs
public void put(K k, V v) {
    if (this.eden.size() >= size) {
        this.longterm.putAll(this.eden);
        this.eden.clear();
    }
    this.eden.put(k, v);
}
Invariant: Every element that is added to eden is in either eden or longterm.
Filesystem Example
[Figure: the list "filesystems" links filesystem=1 and filesystem=2 through s_list; each filesystem's s_files list links its cached files through f_fs_list; the global lists file_in_use and file_unused link files through f_list. Files shown: 14, 7, 6, 5, 2.]
Filesystem Example: Access Patterns
• Find all mounted filesystems
• Find cached files on each filesystem
• Iterate over all used or unused cached files in Least-Recently-Used order
[Figure: the filesystem diagram, with filesystems 1 and 2 and files 14, 7, 6, 5, 2 on the file_in_use and file_unused lists.]
Desired Properties of Data Representations
• Correctness
• Efficiency for access patterns
• Simple to reason about
• Easy to evolve
Specifying Combination of Shared Data Structures
• Separation logic
• First-order logic + reachability
• Monadic second-order logic on trees
• Graph grammars
• …
Filesystem Example
[Figure: the filesystem diagram, with filesystems 1 and 2 and files 7, 5, 14, 6, 2 on the file_in_use and file_unused lists.]
• Functional dependency: {fs, file} → {inuse}
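The functional dependency on this slide can be checked mechanically against a concrete relation. The sketch below is only an illustration: the Row type, the satisfies_fd helper, and the sample tuples are assumptions, with the inuse values read off the figure residue rather than stated in the text.

```python
from collections import namedtuple

Row = namedtuple("Row", ["fs", "file", "inuse"])

# Sample tuples; the inuse values are an assumption taken from the figures.
FILESYSTEMS = {
    Row(1, 14, True), Row(2, 7, True),
    Row(1, 6, False), Row(2, 5, False), Row(1, 2, False),
}


def satisfies_fd(relation, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine `rhs`."""
    seen = {}
    for row in relation:
        key = tuple(getattr(row, c) for c in lhs)
        val = tuple(getattr(row, c) for c in rhs)
        # A second row with the same key must carry the same value.
        if seen.setdefault(key, val) != val:
            return False
    return True
```

For example, satisfies_fd(FILESYSTEMS, ["fs", "file"], ["inuse"]) holds for the sample data, while ["fs"] alone does not determine ["inuse"].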
Filesystem Example Containers: View 1
[Figure: the filesystem diagram overlaid with a container view keyed by filesystem (fs:1, fs:2), whose entries are the files file:2, file:5, file:6, file:7, file:14.]
Filesystem Example Containers: View 2
[Figure: the filesystem diagram overlaid with a container view keyed by inuse (inuse:T, inuse:F), whose entries are (fs, file) pairs such as fs:2 file:5, fs:2 file:7, fs:1 file:2, fs:1 file:14.]
Filesystem Example Containers: Both Views
[Figure: both container views over the same files: one keyed by fs and then file, the other keyed by inuse and then (fs, file).]
Columns: fs, file, inuse
Functional dependency: {fs, file} → {inuse}
Decomposing Relations
• High-level shape descriptor
• A rooted directed acyclic graph
• The root represents the whole relation
• The sub-graphs rooted at each node represent sub-relations
• Edges are labeled with columns
• Each edge maps a relation into a sub-relation
• Different types of edges for different primitive data structures
• Shared nodes correspond to shared sub-relations
[Figure: decomposition graph for the columns fs, file, inuse with {fs, file} → {inuse}: one path follows edge fs and then edge file, the other follows edge inuse and then edge fs, file; both reach a shared node that stores inuse. Edges are annotated with container kinds (list, array).]
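A decomposition with a shared node can be sketched with ordinary nested maps. This is a minimal illustration under stated assumptions: Python dicts stand in for the list and array containers, and build_decomposition, to_relation, and the sample rows are hypothetical names and data, not RelC output.

```python
# Sample tuples (fs, file, inuse); the values are assumed from the figures.
ROWS = [(1, 14, True), (2, 7, True), (1, 6, False), (2, 5, False), (1, 2, False)]


def build_decomposition(rows):
    """Build both paths of the decomposition over shared leaf objects."""
    by_fs = {}     # edge fs, then edge file:        fs -> file -> leaf
    by_inuse = {}  # edge inuse, then edge fs, file: inuse -> (fs, file) -> leaf
    for fs, file, inuse in rows:
        leaf = {"inuse": inuse}  # shared node holding the remaining column
        by_fs.setdefault(fs, {})[file] = leaf
        by_inuse.setdefault(inuse, {})[(fs, file)] = leaf
    return by_fs, by_inuse


def to_relation(by_fs):
    """Read the represented relation back from the fs/file path."""
    return {(fs, file, leaf["inuse"])
            for fs, files in by_fs.items()
            for file, leaf in files.items()}
```

Both paths reach the same leaf object for a given (fs, file) pair, which is the sense in which a shared node corresponds to a shared sub-relation.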
Filesystem Example
[Figure: the path through edges fs and file: fs:1 and fs:2 at the top, with the entries file:14, file:2, file:5, file:7, file:6 below.]
Memory Decomposition (Left)
[Figure: the same path, extended with the inuse value stored at each file node.]
Filesystem Example
[Figure: the path through edges inuse and fs, file: inuse:T and inuse:F at the top, with the entries (fs:2, file:7), (fs:1, file:6), (fs:1, file:14), (fs:2, file:5), (fs:1, file:2) below.]
Memory Decomposition (Right)
[Figure: the same path, extended with the inuse value stored at each (fs, file) node.]
Decomposition Instance
[Figure: both paths together, sharing one node per (fs, file) pair that stores its inuse value.]
Shape Descriptor
[Figure: the same instance next to the decomposition graph with edges fs, file, inuse, and fs, file, for the specification {fs, file} → {inuse}.]
Memory State
[Figure: the decomposition instance annotated with the container kinds (list, array) and with the pointer fields s_list, f_list, and f_fs_list that implement its edges.]
Memory State
[Figure: the original filesystem diagram (filesystems, s_list, s_files, f_list, f_fs_list, file_in_use, file_unused) shown next to the decomposition graph that describes it.]
Adequacy
Not every decomposition is a good representation of a relation.
A decomposition is adequate if it can represent every possible relation matching a relational specification.
Sufficient conditions for adequacy are enforced.
Adequacy of Decompositions
• All columns are represented
• Nodes are consistent with functional dependencies
• Columns bound to paths leading to a common node must functionally determine each other
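Respect Functional Dependencies
[Figure: an edge labeled file leading to a node that stores inuse.]
This shape requires {file} → {inuse}, which does not follow from {fs, file} → {inuse}.
Respect Functional Dependencies
[Figure: an edge labeled file, fs leading to a node that stores inuse.]
This shape requires {file, fs} → {inuse}.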
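Adequacy and Sharing
Columns bound on a path to an object x must functionally determine columns bound on any other path to x.
[Figure: one path through edges fs and file, another through edges inuse and fs, file, both reaching a shared node.]
The paths bind {fs, file} and {inuse, fs, file}. Under {fs, file} → {inuse}, each set determines the other.
Adequacy and Sharing
Columns bound on a path to an object x must functionally determine columns bound on any other path to x.
[Figure: one path through edges fs and file, another through edges inuse and fs, both reaching a shared node.]
The paths bind {fs, file} and {inuse, fs}. Here {inuse, fs} does not determine file.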
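The sharing condition can be checked with the standard attribute-closure computation for functional dependencies. This is a sketch of that textbook check, not RelC's actual adequacy test. The names closure, paths_agree, and FDS are introduced here for illustration.

```python
def closure(columns, fds):
    """Columns functionally determined by `columns` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of column sets.
    """
    result = set(columns)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def paths_agree(path_a, path_b, fds):
    """Sharing condition: each path's columns must determine the other's."""
    return (set(path_b) <= closure(path_a, fds)
            and set(path_a) <= closure(path_b, fds))


# The specification used throughout the filesystem example.
FDS = [({"fs", "file"}, {"inuse"})]
```

With this specification, paths_agree({"fs", "file"}, {"inuse", "fs", "file"}, FDS) holds, while the pair {"fs", "file"} and {"inuse", "fs"} fails because file is not in the closure of {"inuse", "fs"}.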
Decompositions and Aliasing
• A decomposition is an upper bound on the set of possible aliases
• Example: there are exactly two paths to any instance of node w
The RelC Compiler [PLDI’11]
[Figure: RelC takes a relational specification, written by the programmer, and a decomposition, written by the data structure designer, and generates C++.]
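The RelC Compiler
Relational Specification
• Columns: fs, file, inuse
• {fs, file} → {inuse}
• foreach <fs, file, inuse> ∈ filesystems s.t. fs = 5 do …
[Figure: RelC takes the relational specification and a graph decomposition (edges fs, file, inuse, and fs, file, implemented with list and array containers) and generates C++.]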
Query Plans
foreach <fs, file, inuse> ∈ filesystems if inuse = T do …
[Figure: the decomposition graph annotated with a query plan built from scan operations.]
Cost proportional to the number of files
Query Plans
foreach <fs, file, inuse> ∈ filesystems if inuse = T do …
[Figure: the decomposition graph annotated with a query plan that performs an access on inuse followed by a scan.]
Cost proportional to the number of files in use
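The cost difference between the two plans can be seen by counting the entries each one visits. The sketch below assumes dict-based containers and the sample tuples used in the figures. The plan functions are hypothetical illustrations and not the code RelC generates.

```python
# Sample tuples (fs, file, inuse); the values are assumed from the figures.
ROWS = [(1, 14, True), (2, 7, True), (1, 6, False), (2, 5, False), (1, 2, False)]

by_fs = {}     # fs -> file -> inuse
by_inuse = {}  # inuse -> set of (fs, file)
for fs, file, inuse in ROWS:
    by_fs.setdefault(fs, {})[file] = inuse
    by_inuse.setdefault(inuse, set()).add((fs, file))


def plan_scan_all():
    """Scan every filesystem and every file, then filter on inuse."""
    visited, result = 0, set()
    for fs, files in by_fs.items():
        for file, inuse in files.items():
            visited += 1
            if inuse:
                result.add((fs, file))
    return result, visited


def plan_access_inuse():
    """Look up inuse = True directly, then scan only that sub-relation."""
    result = set(by_inuse.get(True, ()))
    return result, len(result)
```

Both plans return the same (fs, file) pairs. The first visits every file, while the second visits only the files in use.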
Removal and graph cuts
remove <fs:1>
[Figure, before: the memory state with fs:1 and fs:2 and files 14, 7, 6, 5, 2, shown next to the decomposition graph.]
[Figure, after: only fs:2 remains, with file:7 on the inuse:T list and file:5 on the inuse:F list.]
Abstraction Theorem
• If the programmer obeys the relational specification, the decomposition is adequate, and the individual containers are correct
• Then the generated low-level code maintains the relational abstraction
[Figure: commuting diagram. Applying remove <fs:1> to the relation and running the generated low-level code for remove <fs:1> on the memory state lead to corresponding results.]
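The commuting diagram can be exercised on a toy decomposition: remove at the low level, read the relation back, and compare with removal on the relation itself. This is a hedged sketch with dict and set containers. The functions build, remove_fs, and abstraction are names introduced here, not part of RelC.

```python
def build(rows):
    """Represent the relation along both paths of the decomposition."""
    by_fs, by_inuse = {}, {}
    for fs, file, inuse in rows:
        by_fs.setdefault(fs, {})[file] = inuse
        by_inuse.setdefault(inuse, set()).add((fs, file))
    return by_fs, by_inuse


def remove_fs(by_fs, by_inuse, fs):
    """Low-level removal: cut every edge that reaches tuples with this fs."""
    for file, inuse in by_fs.pop(fs, {}).items():
        by_inuse[inuse].discard((fs, file))


def abstraction(by_fs, by_inuse):
    """Read the relation back from each path; both paths must agree."""
    left = {(fs, file, inuse)
            for fs, files in by_fs.items()
            for file, inuse in files.items()}
    right = {(fs, file, inuse)
             for inuse, keys in by_inuse.items()
             for fs, file in keys}
    assert left == right, "the two paths disagree"
    return left
```

Removing only from one path would leave the two paths inconsistent, which is why removal has to cut the graph along every path that reaches the affected tuples.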
Concurrent Filesystem Example
• Lock-based concurrent code is difficult to write and difficult to reason about
• Hard to choose the correct granularity of locking
• Hard to use concurrent containers correctly
Granularity of Locking
Fine-grained locking:
• More scalable
• More complex
Coarse-grained locking:
• Less complex
• Lower overhead
• Easier to reason about
• Easier to change
Concurrent Containers
• Concurrent containers are hard to write
• Excellent concurrent containers exist (e.g., Java’s ConcurrentHashMap and ConcurrentSkipListMap)
• Composing concurrent containers is tricky
• It is not usually safe to update containers in parallel
• Lock association is tricky
• In practice programmers often* use concurrent containers incorrectly
* 44% of the time [Shacham et al., 2011]
Concurrent Data Representation Synthesis [PLDI’12]
[Figure: the RelScala compiler takes a relational specification and a concurrent decomposition and produces concurrent data structures with atomic transactions.]
Concurrent Decompositions
• Concurrent decompositions describe how to represent a relation using concurrent containers and locks
• Concurrent decompositions specify:
• the choice of containers
• the number and placement of locks
• which locks protect which data
[Figure legend: edge styles denote ConcurrentHashMap and TreeMap containers.]
Concurrent Interface
Interface consisting of atomic operations:
Goal: Serializability
Every schedule of operations is equivalent to some serial schedule.
[Figure: operations of Thread 1 and Thread 2 laid out along a time axis.]
Two-Phase Locking
Attach a lock to each piece of data.
Two-phase locking protocol:
• Well-locked: To perform a read or write, a thread must hold the corresponding lock
• Two-phase: All lock acquisitions must precede all lock releases
Theorem [Eswaran et al., 1976]: Well-locked, two-phase transactions are serializable
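The two conditions of the protocol can be enforced by a small transaction helper. The following is a minimal sketch, assuming Python's threading.Lock as the lock type. TwoPhaseTransaction is a hypothetical class introduced for illustration, and it does nothing to prevent deadlock.

```python
import threading


class TwoPhaseTransaction:
    """Tracks the locks of one transaction and enforces the two-phase rule."""

    def __init__(self):
        self._held = []
        self._shrinking = False

    def acquire(self, lock):
        # Two-phase: no acquisition is allowed once any lock was released.
        if self._shrinking:
            raise RuntimeError("two-phase violation: acquire after release")
        lock.acquire()
        self._held.append(lock)

    def holds(self, lock):
        # Well-locked check: call this before reading or writing the data
        # that `lock` protects.
        return any(held is lock for held in self._held)

    def release_all(self):
        # Enter the shrinking phase and release in reverse acquisition order.
        self._shrinking = True
        while self._held:
            self._held.pop().release()
```

A transaction would call acquire for every lock it needs, check holds before each access, and call release_all once at the end. Releasing everything at the end is one simple way to keep all acquisitions ahead of all releases.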
Two-Phase Locking
[Figure: a decomposition and a decomposition instance.]
Attach a lock to every edge. We’re done!
Problem 1: Can’t attach locks to container entries
Problem 2: Too many locks
Lock Placements
[Figure: a decomposition and a decomposition instance.]
1. Attach locks to nodes