
Replication and Distribution

CSE 444, Spring 2012, University of Washington.



  1. Replication and Distribution CSE 444 Spring 2012 University of Washington

  2. HASH MAPS

  3. Hash Maps • Precursors to Bloom filters. • Used to reduce communication while joining. • S = set to transmit. • $S = \{x_1, x_2, \ldots, x_n\}$ • H = hash map. • An array of m bits.

  4. Operation • To insert x in H: • Compute the hash of x to get a bit position j. • Set bit j to 1. • To send S, insert all of its elements in H. • Two distinct elements can hash to the same position. • This creates false positives.
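
The insert and lookup operations above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: the class name `BitHashMap`, the helper `build`, and the use of SHA-256 reduced modulo m as the hash function are all hypothetical choices.

```python
import hashlib


class BitHashMap:
    """Single-hash bit array H of m bits (a precursor to a Bloom filter)."""

    def __init__(self, m: int) -> None:
        self.m = m
        # One byte per bit, chosen for readability rather than compactness.
        self.bits = bytearray(m)

    def _position(self, x: str) -> int:
        # ASSUMPTION: SHA-256 modulo m stands in for the unspecified hash function.
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.m

    def insert(self, x: str) -> None:
        # Compute bit position j for x and set bit j to 1.
        self.bits[self._position(x)] = 1

    def might_contain(self, x: str) -> bool:
        # True for every inserted element; may also be True for others
        # (a false positive) when two elements share a position.
        return self.bits[self._position(x)] == 1


def build(items, m: int) -> BitHashMap:
    """Insert every element of the set to transmit into a fresh map."""
    h = BitHashMap(m)
    for x in items:
        h.insert(x)
    return h


S = ["x1", "x2", "x3"]
H = build(S, m=64)
print(all(H.might_contain(x) for x in S))  # True: there are no false negatives
```

Only the m bits of H need to be sent to the other side, not the elements themselves, which is where the communication saving comes from.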

  5. Question Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows: 1. R computes a hash map M with cN bits, where c = 8, and sends it to S. 2. S checks its items in M and sends all matches to R. 3. R computes the result and sends the matching 50 documents to S. Q: Indicate the total number of bytes transferred over the network in each step.

  6. Analysis • Recall |H| = m. • Insert one element into H. • Probability that bit j remains 0? • $p = 1 - 1/m$

  7. Analysis • Recall |H| = m. • Insert all n elements into H. • Probability that bit j remains 0? • $p = (1 - 1/m)^n \approx e^{-n/m}$ (for large m)

  8. Probability of False Positives • Take a random element y, and check if the bit at its hash position is set to 1 in H. • Probability of FP = probability that this bit is 1. • Probability that bit j is 1? • $p = 1 - (1 - 1/m)^n \approx 1 - e^{-n/m}$ (for large m)
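
As a quick numerical check of the approximation, the sketch below compares the exact expression with the exponential form. The function names are illustrative, and the values of n and m are taken from the question in this deck (N = 1 million elements, cN = 8 million bits).

```python
import math


def fp_exact(n: int, m: int) -> float:
    """Probability that a given bit is 1 after inserting n elements into m bits."""
    return 1.0 - (1.0 - 1.0 / m) ** n


def fp_approx(n: int, m: int) -> float:
    """Large-m approximation 1 - e^(-n/m) of the same probability."""
    return 1.0 - math.exp(-n / m)


n, m = 1_000_000, 8_000_000
print(f"exact  = {fp_exact(n, m):.6f}")
print(f"approx = {fp_approx(n, m):.6f}")
```

For these sizes the two values agree to well within one part in a million, at roughly 0.1175.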

  9. Question Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows: 1. R computes a hash map M with cN bits, where c = 8, and sends it to S. 2. S checks its items in M and sends all matches to R. 3. R computes the result and sends the matching 50 documents to S. Indicate the total number of bytes transferred over the network in each step.

  10. Solution • Step 1: Send the hash map. • cN bits = 8 million bits = 1 million bytes = 1 MB. • Step 2: Number of matched tuples (including false positives). • FP rate = $1 - e^{-n/m} = 1 - e^{-1/8} \approx 0.1175$, taken as 11% below. • 110,000 false positive documents. • 110,050 documents in total (including the 50 common ones). • 110.05 MB. • Step 3: 50 documents = 50 KB. • Total of 111.1 MB. • The naïve solution without hash maps takes 1 GB of data transfer.
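
The arithmetic in this solution can be reproduced with a short script. This is a sketch with assumptions made explicit: the function name `transfer_bytes` is hypothetical, units are decimal (1 KB = 1,000 bytes, matching "1 million bytes = 1 MB"), and, as on the slide, the false positive rate is applied to all N documents of S. Passing `fp_rate=0.11` follows the slide's rounded figure; leaving it out uses $1 - e^{-1/c}$ instead, which gives a somewhat larger step 2.

```python
import math


def transfer_bytes(N, c, doc_bytes, common, fp_rate=None):
    """Return the bytes sent in steps 1, 2 and 3 of the protocol."""
    # Step 1: R sends a hash map of c*N bits.
    step1 = c * N // 8
    # Step 2: S sends every document whose bit is set (false positives + true matches).
    if fp_rate is None:
        fp_rate = 1.0 - math.exp(-1.0 / c)  # n/m = N/(cN) = 1/c
    false_positives = round(fp_rate * N)
    step2 = (false_positives + common) * doc_bytes
    # Step 3: R sends the truly matching documents back to S.
    step3 = common * doc_bytes
    return step1, step2, step3


# Using the slide's rounded false positive rate of 11%.
s1, s2, s3 = transfer_bytes(N=1_000_000, c=8, doc_bytes=1_000, common=50, fp_rate=0.11)
print(s1, s2, s3, s1 + s2 + s3)  # 1000000 110050000 50000 111100000

# Using the unrounded rate 1 - e^(-1/8).
e1, e2, e3 = transfer_bytes(N=1_000_000, c=8, doc_bytes=1_000, common=50)
print(e1 + e2 + e3)
```

Either way the total stays far below the 1 GB needed to ship all of R's documents.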

  11. Distributed locking

  12. Setup • Five sites (figure). • Four sites each handle 10% read-only operations and 2% writes. • One site handles 50% read-only operations and 2% writes. • Each site can communicate with every other site.

  13. Read-locks-one, write-locks-all • What is the average number of inter-site messages exchanged? • All reads are local, so no remote locks are acquired. • Each write requires locks at the 4 other sites.

  14. Majority locking • What is the average number of inter-site messages? • Locks at 2 other sites are needed for both reads and writes. • What if you could broadcast across sites with 1 message? • Lock acquisition and release is 1 message for all sites. • Lock grants still take 1 message per site.

  15. Primary-copy locking • What is the average number of inter-site messages? • The non-primary copies need to acquire locks for each operation. • 48% of the actions (those at the four 10%/2% sites) need locks.
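
One way to turn the three locking slides into numbers is sketched below. It is not the deck's official answer: it assumes that each remote lock costs 3 inter-site messages (request, grant, release), that the primary copy lives at the busiest site, and that no broadcast is available. The names `SITES` and `MSGS_PER_REMOTE_LOCK` are illustrative.

```python
# (read share, write share) of all operations, one pair per site, from the Setup slide.
SITES = [(0.10, 0.02), (0.10, 0.02), (0.50, 0.02), (0.10, 0.02), (0.10, 0.02)]

# ASSUMPTION: one remote lock = request + grant + release = 3 messages.
MSGS_PER_REMOTE_LOCK = 3

reads = sum(r for r, _ in SITES)   # 90% of operations
writes = sum(w for _, w in SITES)  # 10% of operations


def rowa_messages() -> float:
    """Read-locks-one, write-locks-all: only writes lock the other sites."""
    return writes * (len(SITES) - 1) * MSGS_PER_REMOTE_LOCK


def majority_messages() -> float:
    """Majority locking: every operation locks a majority, one of them local."""
    remote_locks = (len(SITES) // 2 + 1) - 1  # 3 of 5 sites, minus the local one
    return (reads + writes) * remote_locks * MSGS_PER_REMOTE_LOCK


def primary_copy_messages(primary: int = 2) -> float:
    """Primary-copy locking: operations away from the primary take 1 remote lock."""
    r, w = SITES[primary]  # ASSUMPTION: the primary is the 50%/2% site
    remote_share = (reads + writes) - (r + w)  # 48% of operations
    return remote_share * MSGS_PER_REMOTE_LOCK


for name, fn in [("read-one/write-all", rowa_messages),
                 ("majority", majority_messages),
                 ("primary copy", primary_copy_messages)]:
    print(f"{name}: {fn():.2f} messages per operation on average")
```

Changing `MSGS_PER_REMOTE_LOCK` rescales all three results equally, so the relative ordering of the schemes does not depend on that assumption.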

  16. Two phase commit

  17. Two-Phase Commit • Coordinator: 0 • Three subordinates: {1, 2, 3} • Messages • P (Prepare) • C (Commit) • A (Abort) • Y (Yes vote) • N (No vote) • Ignore acks.

  18. 2PC • What messages are exchanged for a successful commit? • (0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C) • When exactly does the commit occur? • When the coordinator force-writes the commit record.
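
The message trace above follows a fixed pattern, which the sketch below generates. It reads each triple as (sender, receiver, message), an interpretation inferred from the trace rather than stated on the slides; the function name `successful_commit` is hypothetical, and acks and logging are left out.

```python
def successful_commit(coordinator, subordinates):
    """Messages for a 2PC round in which every subordinate votes Yes."""
    # Phase 1: the coordinator asks every subordinate to prepare.
    prepare = [(coordinator, s, "P") for s in subordinates]
    # Every subordinate replies with a Yes vote.
    votes = [(s, coordinator, "Y") for s in subordinates]
    # Phase 2: after force-writing its commit record, the coordinator sends Commit.
    commit = [(coordinator, s, "C") for s in subordinates]
    return prepare + votes + commit


trace = successful_commit(0, [1, 2, 3])
print(trace)
print(len(trace))  # 9 messages: 3 per subordinate
```

With k subordinates a successful commit therefore costs 3k messages when acks are ignored.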

  19. 2PC (continued) • If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point and send abort messages to the subordinates? • If the coordinator has sent all the prepare messages and received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort? • If site 1 has received a prepare message and voted Yes, but has not received any commit or abort message, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?
