Explore broadcast environments, reading current and consistent data, & developing correctness criteria in data dissemination. Learn scalable solutions, cache utilization, & time-constrained broadcasting.
Reading Consistent and Current Data “Off the Air” Krithi Ramamritham Indian Institute of Technology, Bombay University of Mass. Amherst
Outline • broadcast environments • reading current and consistent data • development of suitable -- correctness criteria -- mechanisms for disseminating (control) data • efficient and scalable solutions • exploiting caches • time-constrained broadcasting
Broadcast Data Dissemination • business data, e.g., Vitria, Tibco • election coverage data • stock related data • traffic information • sportscasts, e.g., Praja • Datacycle • Broadcast disks [Figure label: Data Server]
Example: E-auctions • numerous potential clients • clients must have access to current and consistent database state • only a small fraction actually offer bids • asymmetric communication medium -- broadcast current state of auction -- clients offer bids using low bandwidth uplinks, reduce client transmissions
[Figure: a server cyclically broadcasts data items A to G to clients; the clients hold subsets such as BC, EG, and AB; link labels: 14.4 Kbps and 100 Mbps]
Client-Server Communication Model • Asymmetric communication environments • Server periodically broadcasts data to clients using high bandwidth broadcast links • Clients listen to the broadcast to fetch data • Clients communicate with the server using low bandwidth upstream links • Update handling • Transactions at clients update data and send them to the server • Server resolves update conflicts, commits updates, and broadcasts updates to clients through broadcast links
Mutually Consistent Reads [Figure: between TrBegin and TrEnd a transaction performs R(x), R(y), and R(z) at different points on a time axis measured in broadcast cycles] Are x, y, and z mutually consistent?
Serializability in Bcast Env. • Serializability: a global property • dynamic conflict resolution => excessive comm. • e.g., locking: • acquiring read locks by client transactions • server swamped with lock requests • client uses precious uplink bandwidth • avoid potential conflicts • clients must be conservative • unilaterally disallow certain correct executions • unnecessary aborts
Schedules • Server: W2(x) C2 W4(y) C4 • ClientA: R1(x) R1(y) • ClientB: R3(x) R3(y)
Serialization Orders • Server: W2(x) C2 W4(y) C4 => T2 -> T4 • ClientA: R1(x) R1(y) => T4 -> T1 -> T2 • ClientB: R3(x) R3(y) => T2 -> T3 -> T4 • But, global history is not serializable
Serializability? All read-only transactions are • 1. required to see the same serial order of update transactions -- even if executing at different clients • 2. required to be serializable w.r.t. all the update transactions -- even if updates do not affect the values read => unnecessary and inappropriate
Broadcast Data Requirements Mutual consistency -- server maintains mutually consistent data -- clients read mutually consistent data Currency -- clients see data that is current
A Sufficient Criterion • All update transactions are serializable. • Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from. Example: Server: W2(x) C2 W4(y) C4 (T2 -> T4); ClientA: R1(x) R1(y) (T4 -> T1 -> T2); ClientB: R3(x) R3(y) (T2 -> T3 -> T4)
A Sufficient Criterion • All update transactions are serializable. • Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from. Ensures consistency of • the database • the values read by transactions. external consistency [Weihl 87]; update consistency [Bober and Carey 92]
Schedule is Correct • Server: W2(x) C2 W4(y) C4 (T2 -> T4) • ClientA: R1(x) R1(y) (T4 -> T1 -> T2) • ClientB: R3(x) R3(y) (T2 -> T3 -> T4) • Even though global history is not serializable
Implications • If clients know the update schedule, read-only transactions need not contact the server. => addresses the primary problems with serializability
Possible Concerns? Will transactions executing at the same client see different serial orders of update transactions? • T1 followed by T2 • because of mutual consistency and currency, • T2’s reads will be consistent and current relative to T1's • T1 concurrent with T2 • can see different update orders • only if the updates are unrelated
Outline • broadcast environments • reading current and consistent data • development of correctness criteria • mechanisms • performance results • exploiting caches • time-constrained broadcasts
Approx Relationships [Figure: relationships among Mutual Consistency, View Serializability, and Conflict Serializability]
The Approach 1. Update transactions at the server are conflict serializable. 2. Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from. affect(T) = transactions that affect what T reads, i.e., transactions T directly or indirectly reads from. For every read-only transaction T, the serialization graph consisting of only the transactions in {T} ∪ affect(T) is acyclic.
The Algorithm: F-Matrix • the server functionality • the client functionality • the nature of the control information -- transmitted from the server to the clients -- to help clients determine consistency of values read • the client read-only transaction validation protocol
Server Functionality • Ensures the conflict serializability of all update transactions • During each cycle, the server broadcasts 1. the latest committed values of all data items at the beginning of the cycle 2. a control matrix • Incrementally maintains the control matrix -- as updates occur
Client Functionality • Read: consult the control information transmitted during that cycle to determine whether the read operation can proceed; if the read operation cannot proceed, the transaction is aborted. • Write: performed on a local copy of the data item in the client; no checks are made. • Commit: update tr: (write set + values) along with (read set + cycle numbers) sent to server; read tr: commit succeeds
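As a minimal sketch of criterion 2, the Python below checks acyclicity of the serialization graph restricted to {T} ∪ affect(T). It uses the example schedule from the earlier slides (Server: W2(x) C2 W4(y) C4; ClientA: R1(x) R1(y); ClientB: R3(x) R3(y)). The edge list and the `affect` sets are written out by hand from that schedule, and the helper name `has_cycle` is hypothetical, not part of the original talk.

```python
from collections import defaultdict

def has_cycle(nodes, edges):
    """Return True if the directed graph restricted to `nodes` has a cycle."""
    nodes = set(nodes)
    adj = defaultdict(list)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(u):
        colour[u] = GREY
        for v in adj[u]:
            # A grey successor means we found a back edge, i.e. a cycle.
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)

# Conflict edges derived by hand from the example schedule (an assumption):
#   T4 -> T1 (T1 reads y written by T4), T1 -> T2 (T1 reads x before T2 writes it),
#   T2 -> T3 (T3 reads x written by T2), T3 -> T4 (T3 reads y before T4 writes it).
edges = [("T4", "T1"), ("T1", "T2"), ("T2", "T3"), ("T3", "T4")]
affect = {"T1": {"T4"}, "T3": {"T2"}}

global_cyclic = has_cycle({"T1", "T2", "T3", "T4"}, edges)
t1_ok = not has_cycle({"T1"} | affect["T1"], edges)
t3_ok = not has_cycle({"T3"} | affect["T3"], edges)
print(global_cyclic, t1_ok, t3_ok)
```

The global graph contains a cycle, yet each read-only transaction's restricted graph is acyclic, which is the situation the criterion is meant to accept.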
Control Matrix: Intuition • Example 1: Server: W2(x) C2 W4(y) C4; Client: R1(x) R1(y) • Example 2: Server: W2(x) C2 W4(x) W4(y) C4; Client: R1(x) R1(y) • T is currently reading y; T had read x • Did any tr that affected y change x after T read it?
Control Matrix • Objects: n objects, all initialized at cycle 0 • C: n x n control matrix • C(x,y) = max(cycle in which T commits), where T affects the latest committed value of y and also writes to x
Control Matrix: Example • Let T affect the latest committed value of y and also write to x: C(x,y) = cycle in which T commits • Ti commits during broadcast cycle i • T1: w(x) w(y) c • T2: r(y) w(y) c • T2 updates y last; T2 reads y from T1; T1 writes to x during cycle 1, so C(x,y) = 1 • Resulting matrix (rows and columns in the order x, y): row x = (1, 1); row y = (1, 2)
Precondition for Consistent Reads • T previously read x from broadcast cycle b • RT = set of (x,b) pairs • C is the matrix at the beginning of the current cycle • read y iff read-condition(y) holds: forall (x,b) in RT, C(x,y) < b • i.e., no transaction that affected y wrote x after T read x
Is Schedule Correct? • Server: W2(x) C2 W4(y) C4 • ClientA: R1(x) R1(y) • ClientB: R3(x) R3(y) • T is currently reading y; T had read x • No tr that affected y changed x after T read it • Figure annotations: RT = {(x,1)}; RT = {(x,1),(y,3)}; C(x,y) = 0 ok; RT = {(x,2),(y,2)}
Is Schedule Correct? • Server: W2(x) C2 W4(x) W4(y) C4 • ClientA: R1(x) R1(y) • ClientB: R3(x) R3(y) • T is currently reading y; T had read x • No tr that affected y changed x after T read it • Figure annotations: RT = {(x,1)}; RT = {(x,1),(y,3)}; C(x,y) = 2 ok; RT = {(x,2),(y,2)}
Control Matrix - Overheads • maxcycles: maximum number of cycles that a read tr could span • need to store only cycle numbers from 0 to maxcycles -- perform modulo (maxcycles + 1) arithmetic • Transmitting the matrix: -- n² × log(maxcycles) bits per broadcast cycle -- if object size is small, this overhead can be significant -- transmit column j right after object obj_j
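The slides say only that the server maintains the matrix incrementally; they do not give the update rule. The sketch below is one rule that follows from the definition of C(x,y), under two assumptions: update transactions read the latest committed values, and commits are applied one at a time in serialization order. The function name `commit_update` is hypothetical.

```python
def commit_update(C, objects, read_set, write_set, cycle):
    """Return a new control matrix after a transaction commits in `cycle`.

    For each written object y, the committing transaction becomes the latest
    writer of y, so column y is rebuilt:
      - C(x, y) = cycle if the transaction also writes x;
      - otherwise C(x, y) is inherited from the values it read:
        max over z in read_set of the old C(x, z), or 0 for a blind write.
    """
    new_C = {x: dict(row) for x, row in C.items()}
    for y in write_set:
        for x in objects:
            if x in write_set:
                new_C[x][y] = cycle
            else:
                new_C[x][y] = max((C[x][z] for z in read_set), default=0)
    return new_C

# The example from the slide: T1 = w(x) w(y) commits in cycle 1,
# T2 = r(y) w(y) commits in cycle 2.
objects = ["x", "y"]
C = {x: {y: 0 for y in objects} for x in objects}   # all initialized at cycle 0
C = commit_update(C, objects, read_set=set(), write_set={"x", "y"}, cycle=1)
C = commit_update(C, objects, read_set={"y"}, write_set={"y"}, cycle=2)
print(C)
```

With this rule the sketch reproduces the matrix on the slide: C(x,y) stays at 1 because T2 inherits T1's write to x through its read of y.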
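The read condition translates almost directly into code. The sketch below assumes the control matrix is held as a nested dictionary `C[x][y]` and the read set RT as a list of (object, cycle) pairs; both representations are illustrative choices, not something the slides prescribe.

```python
def read_condition(C, RT, y):
    """True iff, for every (x, b) in RT, C(x, y) < b.

    In words: no transaction that affected the current value of y
    wrote x in or after the cycle b in which x was read.
    """
    return all(C[x][y] < b for x, b in RT)

# Matrix from the two-transaction example: C(x, y) = 1.
C = {"x": {"x": 1, "y": 1}, "y": {"x": 1, "y": 2}}

early_reader = read_condition(C, [("x", 1)], "y")   # x was read in cycle 1
late_reader = read_condition(C, [("x", 2)], "y")    # x was read in cycle 2
print(early_reader, late_reader)
```

A transaction that read x in cycle 1 must abort when it tries to read y, while one that read x in cycle 2 may proceed. An empty RT always passes, so a transaction's first read is never blocked.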
Smaller Control Matrix • partition objects into groups • control matrix: n x numgroups • SC(x,s) = max over y in s of C(x,y) • updating an object in s = update to any object in s • fewer entries to transmit compared to C • read-condition(y), where s is the group of y: forall (x,b) in RT, SC(x,s) < b • T is currently reading y; T had read x • No tr that affected any object in y's group changed x after T read it [Figure: objects partitioned into group 1 and group 2]
Group Size • increasing size of group => more unnecessary conflicts • reducing size of group => increased control information overhead • n groups => F-Matrix • one group => Datacycle -- achieves serializability -- read-condition for Datacycle: no previously read object has been updated
R-Matrix • To achieve Mutual Consistency • Read condition: objects previously read have not been updated by other transactions, or the object being read has not been updated since the beginning of the transaction
Schedule is Correct • Server: W2(x) C2 • ClientA: R1(x) R1(y) • objects previously read have not been updated by other transactions, or the object being read has not been updated • Not acceptable by Datacycle -- accepted by R-Matrix
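The grouped matrix and its read condition can be sketched as follows. The three-object database, the grouping, and the matrix values are made up for illustration, and the function names are hypothetical.

```python
def summarize(C, groups):
    """Build SC where SC[x][s] = max of C[x][y] over all objects y in group s."""
    return {
        x: {s: max(row[y] for y in members) for s, members in groups.items()}
        for x, row in C.items()
    }

def grouped_read_condition(SC, group_of, RT, y):
    """True iff, for every (x, b) in RT, SC(x, s) < b, where s is y's group."""
    s = group_of[y]
    return all(SC[x][s] < b for x, b in RT)

# Illustrative data: one transaction wrote x and z in cycle 3; y was written in cycle 1.
C = {
    "x": {"x": 3, "y": 0, "z": 3},
    "y": {"x": 0, "y": 1, "z": 0},
    "z": {"x": 3, "y": 0, "z": 3},
}
groups = {"g1": ["x"], "g2": ["y", "z"]}
group_of = {"x": "g1", "y": "g2", "z": "g2"}
SC = summarize(C, groups)

RT = [("x", 2)]                                        # x was read in cycle 2
full_ok = all(C[x]["y"] < b for x, b in RT)            # full-matrix condition for y
grouped_ok = grouped_read_condition(SC, group_of, RT, "y")
print(full_ok, grouped_ok)
```

Here the full matrix would allow the read of y, but the grouped matrix rejects it because z, which shares y's group, was affected by a writer of x. This is the kind of unnecessary conflict that larger groups introduce in exchange for fewer transmitted entries.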
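The difference between the two read conditions can be sketched with a per-object vector of last-update cycles. That representation, the function names, and the cycle numbers are assumptions made for illustration; the slides state the conditions only in words.

```python
def datacycle_read_ok(last_update, RT):
    """Datacycle: no previously read object has been updated since it was read."""
    return all(last_update[x] < b for x, b in RT)

def r_matrix_read_ok(last_update, RT, y):
    """R-Matrix: the Datacycle condition holds, OR the object being read
    has not been updated since the transaction's first read."""
    if not RT:
        return True
    first_cycle = min(b for _, b in RT)
    return datacycle_read_ok(last_update, RT) or last_update[y] < first_cycle

# The schedule from the slide: T1 reads x in cycle 1, T2 then writes x and
# commits in cycle 2, and T1 goes on to read y, which was never updated.
last_update = {"x": 2, "y": 0}
RT = [("x", 1)]

datacycle_ok = datacycle_read_ok(last_update, RT)
r_matrix_ok = r_matrix_read_ok(last_update, RT, "y")
print(datacycle_ok, r_matrix_ok)
```

Datacycle aborts T1 because x changed after it was read, whereas R-Matrix lets the read of y proceed because y is unchanged since T1 began.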
Hardware Support • a bit could be set by hardware if any of the previously read values of a transaction are changed. • a read is disallowed if • the bit is set and • if the object being read has been changed during or after the cycle in which the first read operation was performed by the tr
Outline • broadcast environments • reading current and consistent data • development of correctness criteria • mechanisms • performance results • exploiting caches • time-constrained broadcasts
Simulated System • broadcast medium bandwidth -- 64 Kbits/s • time unit -- time to broadcast one bit • timestamp size = 8 bits • object size = one KByte • Control matrix overheads: -- Datacycle and R-Matrix -- 0.1% -- with 300 objects, F-Matrix -- 23%
Parameters • Client Transaction Length: 4 • Server Transaction Length: 8 • Transaction Rate at Server: 1 in 2.5 × 10^5 bit-units • Number of Objects in Database: 300 • Size of Objects in Database: 1 KB • Server Read Operation Probability: 0.5 • Client Inter-Operation Delay: 64K bit-units • Client Inter-Transaction Delay: 128K bit-units • Client Restart Delay: 0 bit-units • Timestamp Size: 8 bits
Effect of Client Tr. Length [Figure: curves for Datacycle, R-Matrix, F-Matrix, and F-Matrix-ideal] • F-Matrix -- has best perf. -- scales very well
Effect of Server Tr. Length [Figure: curves for Datacycle, R-Matrix, F-Matrix, and F-Matrix-ideal] • Longer server transactions -- more updates at the server for each cycle -- response time increases • F-Matrix: -- little increase in response time -- scalable
Effect of Server Tr. Rate [Figure: curves for Datacycle, R-Matrix, F-Matrix, and F-Matrix-ideal; x-axis: Transaction Rate (in 1 per 1000X bits), ticks from 210 to 250] • F-Matrix -- does not degrade
Effect of Number of Objects [Figure: curves for Datacycle, R-Matrix, F-Matrix, and F-Matrix-ideal] • # objects increases -- probability of trs accessing an object decreases -- length of cycle increases -- control information increases -- # server transactions per cycle increases -- increases # conflicts at server -- response times increase • F-Matrix -- displays the best response times -- has least rate of increase • effect similar for size increases
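The quoted overheads can be reproduced with a short calculation. The sketch assumes that 1 KByte means 1024 bytes, that overhead is measured as the control information's share of the total bits broadcast per cycle, and that Datacycle and R-Matrix send one timestamp per object. None of these is stated explicitly on the slide.

```python
n_objects = 300
object_bits = 1024 * 8      # assumption: 1 KByte = 1024 bytes
timestamp_bits = 8

data_bits = n_objects * object_bits                      # payload per cycle
f_matrix_bits = n_objects * n_objects * timestamp_bits   # full n x n matrix
vector_bits = n_objects * timestamp_bits                 # one timestamp per object

def overhead(control_bits):
    """Fraction of a broadcast cycle taken up by control information."""
    return control_bits / (control_bits + data_bits)

f_matrix_overhead = overhead(f_matrix_bits)
vector_overhead = overhead(vector_bits)
print(f"F-Matrix: {f_matrix_overhead:.1%}, vector: {vector_overhead:.2%}")
```

Under these assumptions the results come out at roughly 23% and 0.1%, matching the figures quoted for F-Matrix and for Datacycle and R-Matrix.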
Summary of Results • F-Matrix > R-Matrix > Datacycle • weaker abort condition leads to better response times • response time scalability • F-Matrix is highly scalable with respect to • client / server transaction length • server transaction rate • In many cases F-Matrix very close to F-Matrix-ideal
Enhancements • Concurrency control granularity • “Finer” granularity results in higher concurrency • Matrix not scalable for finer granularity due to overheads resulting from matrix transmission • Control information transmission • Reduce transmitted information if matrix is sparse • Use appropriate indices to speed up clients’ access to concurrency control information
Outline • broadcast environments • reading current and consistent data • development of suitable • correctness criteria • mechanisms for disseminating (control) data • performance results • exploiting caches • time-constrained broadcasts
Client Caches - Serializability • server maintains read/write locks • server aware of client cache contents • invalidations / propagations to clients • scalability problems • otherwise, clients read old data -- compromises currency / consistency
Client Caches - C Matrix • transmit only the deltas (Δs) over the previous C matrix transmission • but -- client has to store the previously transmitted C matrix -- client should listen to the last broadcast of the C matrix and the subsequent Δs -- increases usage of scarce client resources (battery power)
Caches - Weak Currency • suppose data needs to be current only to within D time units, where D > broadcast cycle time • read from the client cache • data removed from cache as soon as it is "not current" • ensuring mutually consistent reads -- store the columns of the control matrix corresponding to the data items cached -- along with the cycle at which the data items were cached
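One way to picture delta transmission is a sparse map of changed entries that the client applies to its stored copy of the matrix. The encoding and the function names below are hypothetical; the slides do not specify a wire format.

```python
def matrix_delta(prev, curr):
    """Entries of `curr` that differ from `prev`, keyed by (row, column)."""
    return {
        (x, y): v
        for x, row in curr.items()
        for y, v in row.items()
        if prev[x][y] != v
    }

def apply_delta(prev, delta):
    """Rebuild the current matrix from the stored matrix plus a delta."""
    new = {x: dict(row) for x, row in prev.items()}
    for (x, y), v in delta.items():
        new[x][y] = v
    return new

prev = {"x": {"x": 1, "y": 1}, "y": {"x": 1, "y": 1}}
curr = {"x": {"x": 1, "y": 1}, "y": {"x": 1, "y": 2}}

delta = matrix_delta(prev, curr)          # only C(y, y) changed
rebuilt = apply_delta(prev, delta)
print(delta, rebuilt == curr)
```

The client can rebuild the current matrix only if it holds the previous one and has missed no delta, which is the storage and listening cost noted on the slide.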
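A cache entry under weak currency might look like the sketch below. The class name, field names, and use of a caller-supplied clock `now` are all assumptions. The sketch covers only the currency bound D; the stored control-matrix column is kept so that a consistency check could use it, but that check is not implemented here.

```python
class WeakCurrencyCache:
    """Client cache whose entries expire once they are older than D time units."""

    def __init__(self, max_age):
        self.max_age = max_age    # D, in the same time units as `now`
        self.entries = {}

    def put(self, item, value, cycle, column, now):
        # Keep the value, the broadcast cycle it came from, and the item's
        # column of the control matrix as broadcast in that cycle.
        self.entries[item] = {
            "value": value,
            "cycle": cycle,
            "column": dict(column),
            "cached_at": now,
        }

    def get(self, item, now):
        """Return the entry if it is still current; otherwise evict it and return None."""
        entry = self.entries.get(item)
        if entry is None:
            return None
        if now - entry["cached_at"] > self.max_age:
            del self.entries[item]
            return None
        return entry

cache = WeakCurrencyCache(max_age=10)
cache.put("y", value=42, cycle=3, column={"x": 1, "y": 2}, now=0)
fresh = cache.get("y", now=5)     # within D: served from the cache
stale = cache.get("y", now=11)    # older than D: evicted
print(fresh is not None, stale is None)
```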