Replication and Distribution CSE 444 Spring 2012 University of Washington
Hash Maps • Precursors to Bloom filters. • Used to reduce communication while joining. • S = Set to transmit. • S = {x1, x2, …, xn} • H = Hash Map. • An array of m bits.
Operation • To insert x in H: • Compute the hash of x to get a bit position j. • Set bit j to 1. • To send S, insert all of its elements in H. • Two distinct elements can hash to the same position. • This creates false positives.
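The insert and lookup operations above can be sketched as follows. This is a minimal illustration, not the course's implementation: the class name `BitHashMap`, the helper `build`, and the choice of SHA-256 reduced modulo m as the hash function are all assumptions made for the example.

```python
import hashlib


class BitHashMap:
    """An array of m bits with a single hash function (illustrative sketch)."""

    def __init__(self, m: int):
        self.m = m
        self.bits = bytearray((m + 7) // 8)  # m bits, packed 8 per byte

    def _position(self, x: str) -> int:
        # Assumed hash function: first 8 bytes of SHA-256, reduced modulo m.
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        j = self._position(x)
        self.bits[j // 8] |= 1 << (j % 8)  # set bit j to 1

    def might_contain(self, x: str) -> bool:
        # True for every inserted element; may also be True for an element
        # that was never inserted (a false positive).
        j = self._position(x)
        return bool(self.bits[j // 8] & (1 << (j % 8)))


def build(S, m: int) -> BitHashMap:
    """Insert every element of S into a fresh m-bit map."""
    h = BitHashMap(m)
    for x in S:
        h.insert(x)
    return h
```

Only `h.bits` needs to be sent over the network; the receiver checks its own elements with `might_contain` and returns those that match.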
Question Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows: 1. R computes a hash map M with cN bits, where c = 8, and sends it to S. 2. S checks its items in M and sends all matches to R. 3. R computes the result and sends the matching 50 documents to S. Q: Indicate the total number of bytes transferred over the network in each step.
Analysis • Recall |H| = m. • Insert one element into H. • Probability that bit j remains 0? • p = (1 – 1/m)
Analysis • Recall |H| = m. • Insert all n elements into H. • Probability that bit j remains 0? • p = (1 – 1/m)^n ≈ e^{–n/m} (for large m)
Probability of False Positives • Take a random element y, and check if its hash is set to 1 in H. • Probability of FP = probability that the hash is 1. • Probability that bit j is 1? • p = 1 – (1 – 1/m)^n ≈ 1 – e^{–n/m} (for large m)
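To see how close the exponential approximation is, the two expressions can be evaluated side by side. This is a small sketch; the function names `fp_exact` and `fp_approx` are made up for the example.

```python
import math


def fp_exact(n: int, m: int) -> float:
    """Probability that a given bit is 1 after n inserts into m bits."""
    return 1.0 - (1.0 - 1.0 / m) ** n


def fp_approx(n: int, m: int) -> float:
    """Large-m approximation of the same probability: 1 - e^(-n/m)."""
    return 1.0 - math.exp(-n / m)


if __name__ == "__main__":
    n, m = 1_000_000, 8_000_000
    print(f"exact:  {fp_exact(n, m):.6f}")
    print(f"approx: {fp_approx(n, m):.6f}")
```

For n = 1 million and m = 8 million the two values agree to well within one part in a million, which is why the approximation is used in the analysis.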
Solution • Step 1: Send the hash map. • cN bits = 8 million bits = 1 million bytes = 1 MB. • Step 2: Send the matched documents (including false positives). • FP rate = 1 – e^{–n/m} = 1 – e^{–1/8} ≈ 11.75%, taken as 11% here. • 110,000 false-positive documents. • 110,050 documents in total (including the 50 common ones). • 110.05 MB. • Step 3: Send the 50 matching documents = 50 KB. • Total of 111.1 MB. • The naïve solution without hash maps takes 1 GB of data transfer.
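The arithmetic in the solution can be checked with a short script. This is a sketch under stated assumptions: decimal units (1 KB = 1,000 bytes, as the slide's totals imply), false positives counted as N times the FP rate, and names such as `bytes_per_step` invented for the example. It evaluates both the slide's 11% figure and the unrounded value of 1 – e^{–1/8}.

```python
import math

N = 1_000_000       # documents held by each supplier
DOC_BYTES = 1_000   # 1 KB per document (decimal units, assumed)
C = 8               # hash map uses C * N bits
COMMON = 50         # documents the two suppliers share


def bytes_per_step(fp_rate: float):
    """Return the bytes sent in steps 1, 2 and 3 for a given FP rate."""
    step1 = C * N // 8                       # the hash map itself
    false_positives = round(N * fp_rate)     # S's documents that match by accident
    step2 = (false_positives + COMMON) * DOC_BYTES
    step3 = COMMON * DOC_BYTES               # R returns the true matches
    return step1, step2, step3


slide = bytes_per_step(0.11)                     # rate as used on the slide
formula = bytes_per_step(1 - math.exp(-1 / C))   # unrounded rate
naive = N * DOC_BYTES                            # ship every document instead

if __name__ == "__main__":
    print("slide rate:  ", slide, "total", sum(slide))
    print("formula rate:", formula, "total", sum(formula))
    print("naive:       ", naive)
```

With the 11% rate the total matches the slide's 111.1 MB; with the unrounded rate the total is somewhat higher, but in both cases it stays far below the 1 GB of the naïve approach.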
Setup • Five sites. • Four sites: 10% read only, 2% writes each. • One site: 50% read only, 2% writes. • Each site can communicate with every other site.
Read-locks-one, write-locks-all • What is the average number of inter-site messages exchanged? • All reads are local, so no remote locks are acquired. • Each write requires locks at the 4 other sites.
Majority locking • What is the average number of inter-site messages? • Locks at 2 other sites are needed for both reads and writes. • What if you could broadcast across sites with 1 message? • Lock acquisition and release each take 1 message for all sites. • Lock grants still take 1 message per site.
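Primary-copy locking • What is the average number of inter-site messages? • Sites other than the primary must acquire locks from the primary site for each operation. • 48% of the actions need remote locks (assuming the primary is the site with 50% reads and 2% writes, the other four sites account for 4 × 12% = 48%).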
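The three locking schemes can be compared with a small calculation. The slides do not fix a per-lock message cost, so this sketch assumes that each remote lock costs 3 messages (request, grant, release), that data shipping is ignored, that the workload is five sites with the percentages from the Setup slide, and that the primary copy sits at the busiest site. All function and constant names are made up for the example.

```python
# (read fraction, write fraction) of all actions issued at each site.
SITES = [(0.10, 0.02)] * 4 + [(0.50, 0.02)]

# Assumption: one remote lock = request + grant + release.
MSGS_PER_REMOTE_LOCK = 3


def rowa() -> float:
    """Read-locks-one, write-locks-all: only writes contact other sites."""
    writes = sum(w for _, w in SITES)
    return writes * (len(SITES) - 1) * MSGS_PER_REMOTE_LOCK


def majority(broadcast: bool = False) -> float:
    """Majority locking: every action locks a majority of the copies."""
    remote = len(SITES) // 2   # majority of 5 is 3; one of them is local
    actions = sum(r + w for r, w in SITES)
    if broadcast:
        # 1 broadcast request + one grant per remote lock + 1 broadcast release
        per_action = 1 + remote + 1
    else:
        per_action = remote * MSGS_PER_REMOTE_LOCK
    return actions * per_action


def primary_copy(primary_index: int) -> float:
    """Primary-copy locking: actions at non-primary sites lock remotely."""
    remote_fraction = sum(
        r + w for i, (r, w) in enumerate(SITES) if i != primary_index
    )
    return remote_fraction * MSGS_PER_REMOTE_LOCK


if __name__ == "__main__":
    print("ROWA:                ", rowa())
    print("Majority:            ", majority())
    print("Majority + broadcast:", majority(broadcast=True))
    print("Primary copy:        ", primary_copy(primary_index=4))
```

Changing `MSGS_PER_REMOTE_LOCK` or the primary site shows how sensitive each scheme is to those assumptions; the relative ordering is what the exercise is meant to expose.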
Two-Phase Commit • Coordinator : 0 • Three subordinates : {1, 2, 3} • Messages • P (Prepare) • C (Commit) • A (Abort) • Y (Yes vote) • N (No vote) • Ignore acks.
2PC • What messages are exchanged for a successful commit? • (0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C) • When exactly does the commit occur? • When the coordinator force-writes the commit record.
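The message sequence above can be reproduced with a short trace generator, reading each tuple as (sender, receiver, message). This is a simplified sketch: the function name `two_phase_commit` is made up, acks are ignored as on the slide, all votes are assumed to arrive, and logging is reduced to a comment.

```python
def two_phase_commit(votes, coordinator=0):
    """Trace one 2PC round.

    votes maps each subordinate id to its vote, 'Y' or 'N'.
    Returns (messages, decision), where messages is a list of
    (sender, receiver, message) tuples and decision is 'C' or 'A'.
    """
    subordinates = sorted(votes)

    # Phase 1: the coordinator sends Prepare, subordinates reply with votes.
    messages = [(coordinator, s, "P") for s in subordinates]
    messages += [(s, coordinator, votes[s]) for s in subordinates]

    # The coordinator decides: commit only if every vote is Yes.
    decision = "C" if all(v == "Y" for v in votes.values()) else "A"
    # At this point the coordinator force-writes the decision record;
    # for a commit, this is the moment the transaction is committed.

    # Phase 2: the coordinator sends the decision to every subordinate.
    messages += [(coordinator, s, decision) for s in subordinates]
    return messages, decision


if __name__ == "__main__":
    trace, outcome = two_phase_commit({1: "Y", 2: "Y", 3: "Y"})
    print(outcome, trace)
```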
2PC (continued) • If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point, and send abort messages to the subordinates? • If the coordinator has sent all the prepare messages, received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort? • If site 1 has received a prepare message and voted Yes, but has not received any commit or abort messages, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?
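One way to reason about the first two questions is to write down the coordinator's decision rule. The sketch below assumes the usual textbook behaviour of 2PC, in which the coordinator may abort on its own at any time before it has force-written a commit record; the function name `coordinator_decision` and the `timed_out` flag are hypothetical and not part of the slides.

```python
from typing import Dict, Optional


def coordinator_decision(
    votes: Dict[int, Optional[str]], timed_out: bool = False
) -> Optional[str]:
    """Decide the outcome from the votes received so far.

    votes maps each subordinate id to 'Y', 'N', or None (no vote yet).
    Returns 'C' (commit), 'A' (abort), or None (keep waiting).
    """
    if any(v == "N" for v in votes.values()):
        # A single No vote settles the outcome; the missing votes
        # cannot change it, so there is no need to wait for them.
        return "A"
    if all(v == "Y" for v in votes.values()):
        return "C"
    # Some votes are still missing: the coordinator may give up and
    # abort (modelled here as a timeout), or it may keep waiting.
    return "A" if timed_out else None
```

For example, `coordinator_decision({1: "N", 2: None, 3: None})` returns `"A"` without waiting for sites 2 and 3.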