
Presentation Transcript


  1. Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture #26: Database Systems

  2. System Votes CMU SCS 15-415/615

  3. Two-Phase Commit [Diagram: the Application Server sends a Commit Request to the Coordinator (Node 1). In Phase 1 (Prepare), the coordinator asks the Participants (Node 2, Node 3) to prepare and each replies OK. In Phase 2 (Commit), the coordinator tells the participants to commit and each again replies OK.] CMU SCS 15-415/615
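A minimal sketch of the coordinator's two phases in Python; the Participant class and method names are illustrative placeholders, not part of the lecture:

    class Participant:
        # Illustrative participant that always votes OK; a real node would force a
        # prepare record to its log before voting.
        def __init__(self, name):
            self.name = name
        def prepare(self, txn):
            return True                      # vote OK
        def commit(self, txn):
            print(self.name, "commit", txn)
        def abort(self, txn):
            print(self.name, "abort", txn)

    def two_phase_commit(participants, txn):
        # Phase 1: Prepare -- collect a vote from every participant.
        votes = [p.prepare(txn) for p in participants]
        if all(votes):
            # Phase 2: Commit -- allowed only after a unanimous OK.
            for p in participants:
                p.commit(txn)
            return "COMMIT"
        # Any negative vote (or timeout) forces a global abort.
        for p in participants:
            p.abort(txn)
        return "ABORT"

    print(two_phase_commit([Participant("Node 2"), Participant("Node 3")], txn="T1"))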

  4. Paxos

  5. Paxos • Consensus protocol where a coordinator proposes an outcome (e.g., commit or abort) and then the participants vote on whether that outcome should succeed. • Does not block if a majority of participants are available and has provably minimal message delays in the best case. • First correct protocol that was provably resilient in the face of asynchronous networks. CMU SCS 15-415/615

  6. Paxos [Diagram: the Application Server sends a Commit Request to the Proposer (Node 1). The proposer sends Propose to the Acceptors (Nodes 2, 3, 4) and each replies Agree; the proposer then sends Commit and each acceptor replies Accept.] CMU SCS 15-415/615

  7. Paxos [Diagram: the same flow, but one acceptor is down (marked X). The proposer still collects Agree and Accept replies from a majority of the acceptors, so the Commit succeeds.] CMU SCS 15-415/615

  8. Paxos [Diagram: two competing proposers. Proposer 1 sends Propose(n) and the acceptors reply Agree(n). Proposer 2 then sends Propose(n+1); when Proposer 1 tries Commit(n), the acceptors reply Reject(n, n+1). The acceptors send Agree(n+1) to Proposer 2, which sends Commit(n+1), and the acceptors reply Accept(n+1).]
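A minimal sketch of the acceptor-side rule that produces the Agree/Reject behavior above; the class and ballot handling are illustrative, not the lecture's notation:

    class Acceptor:
        # Agrees only to ballots at least as high as any it has already seen,
        # which is what turns Commit(n) into Reject(n, n+1) once Propose(n+1)
        # has been observed.
        def __init__(self):
            self.promised = -1        # highest ballot number agreed to so far
            self.accepted = None      # (ballot, value) accepted so far, if any

        def propose(self, n):
            if n >= self.promised:
                self.promised = n
                return ("AGREE", n, self.accepted)
            return ("REJECT", n, self.promised)

        def commit(self, n, value):
            if n >= self.promised:
                self.promised = n
                self.accepted = (n, value)
                return ("ACCEPT", n)
            return ("REJECT", n, self.promised)

    a = Acceptor()
    print(a.propose(1))          # ('AGREE', 1, None)
    print(a.propose(2))          # a second proposer arrives with a higher ballot
    print(a.commit(1, "txn"))    # ('REJECT', 1, 2): the stale proposer is turned away
    print(a.commit(2, "txn"))    # ('ACCEPT', 2)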

  9. 2PC vs. Paxos • 2PC is a degenerate case of Paxos. • Single coordinator. • Only works if everybody is up. • Use leases to determine who is allowed to propose new updates to avoid continuous rejection. CMU SCS 15-415/615

  10. Google Spanner • Google’s geo-replicated DBMS (>2011) • Schematized, semi-relational data model. • Concurrency Control: • 2PL + T/O (Pessimistic) • Externally consistent global write-transactions with synchronous replication. • Lock-free read-only transactions.

  11. Google Spanner CREATE TABLE users { uid INT NOT NULL, email VARCHAR, PRIMARY KEY (uid) }; CREATE TABLE albums { uid INT NOT NULL, aid INT NOT NULL, name VARCHAR, PRIMARY KEY (uid, aid) } INTERLEAVE IN PARENT users ON DELETE CASCADE; Interleaved row layout: users(1001) ⤷ albums(1001, 9990) ⤷ albums(1001, 9991) users(1002) ⤷ albums(1002, 6631) ⤷ albums(1002, 6634)

  12. Google Spanner • Ensures ordering through globally unique timestamps generated from atomic clocks and GPS devices. • Database is broken up into tablets: • Use Paxos to elect leader in tablet group. • Use 2PC for txns that span tablets. • TrueTime API
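A rough sketch of the commit-wait idea behind timestamp ordering with TrueTime; tt_now() and the uncertainty bound are illustrative stand-ins, not Google's actual API:

    import time

    EPSILON = 0.007   # assumed clock-uncertainty bound (seconds); TrueTime keeps
                      # this small with GPS receivers and atomic clocks

    def tt_now():
        # Returns an (earliest, latest) interval guaranteed to contain real time.
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def commit_wait(commit_ts):
        # Block until commit_ts is definitely in the past, so the timestamp order
        # of transactions matches real-time order (external consistency).
        while tt_now()[0] <= commit_ts:
            time.sleep(0.001)

    commit_ts = tt_now()[1]      # choose the latest possible current time
    commit_wait(commit_ts)       # wait out the uncertainty before acknowledging
    print("safe to report commit at", commit_ts)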

  13. Google Spanner [Diagram: two application servers issue concurrent transactions over the network, T1: Set A=2, B=9 and T2: Set A=0, B=7, against Node 1 (currently A=1) and Node 2 (currently B=8); the nodes agree on an order for the writes via Paxos or 2PC.]

  14. Google F1 • OCC engine built on top of Spanner. • Read phase followed by a write phase. • In the read phase, F1 returns the last-modified timestamp with each row. No locks. • The timestamp for a row is stored in a hidden lock column. The client library returns these timestamps to the F1 server. • If the timestamps differ from the current timestamps at the time of commit, the transaction is aborted.
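A minimal sketch of that optimistic validation, using an in-memory table and made-up helper names rather than F1's real interface:

    # Each row carries the timestamp of its last write (the hidden lock column).
    table = {"row1": {"value": 10, "ts": 100},
             "row2": {"value": 20, "ts": 105}}

    def read(key):
        row = table[key]
        return row["value"], row["ts"]    # value plus last-modified timestamp, no lock taken

    def commit(writes, read_timestamps, commit_ts):
        # Validate: abort if any row read by the transaction changed afterwards.
        for key, seen_ts in read_timestamps.items():
            if table[key]["ts"] != seen_ts:
                return "ABORT"
        # Write phase: install the new values with the new timestamp.
        for key, value in writes.items():
            table[key] = {"value": value, "ts": commit_ts}
        return "COMMIT"

    v, ts = read("row1")
    table["row1"]["ts"] = 110              # simulate a concurrent writer
    print(commit({"row1": v + 1}, {"row1": ts}, commit_ts=120))   # -> ABORT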

  15. Redis • Remote Dictionary Server (2009) • Key-value data store: • Values can be strings, hashes, lists, sets and sorted sets. • Single-threaded execution engine. • In-memory storage: • Snapshots + WAL for persistence. CMU SCS 15-415/615

  16. Redis
      STRING: page:index.html → "<html><head>…", view_count → 12345
      SET: users_logged_in → { 1, 2, 3, 4, 5 }
      LIST: latest_post_ids → { 111, 112, 119, … }
      HASH: user:999:session → { time => 1430086922, username => tupac }
      SORTED SET: current_user_scores → { odb ~ 11, tupac ~ 12, biggie ~ 19, eazye ~ 20 }
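A quick sketch of those data types through the redis-py client (3.x or newer), assuming a local server on the default port; key names mirror the slide:

    import redis

    r = redis.Redis()                                   # assumes localhost:6379

    r.set("view_count", 12345)                          # STRING
    r.incr("view_count")
    r.sadd("users_logged_in", 1, 2, 3, 4, 5)            # SET
    r.rpush("latest_post_ids", 111, 112, 119)           # LIST
    r.hset("user:999:session",                          # HASH
           mapping={"time": 1430086922, "username": "tupac"})
    r.zadd("current_user_scores",                       # SORTED SET (member -> score)
           {"odb": 11, "tupac": 12, "biggie": 19, "eazye": 20})

    print(r.zrange("current_user_scores", 0, -1, withscores=True))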

  17. Redis • Asynchronous master-slave replication: • Master sends its oplog to downstream replicas. • Does not wait for acknowledgements. • Newer versions can require that a minimum number of replicas be connected before accepting writes, but still do not check whether those replicas actually received the writes. CMU SCS 15-415/615

  18. Redis • Supports some notion of transactions: • Operations are batched together and executed serially on server side. • Allows for compare-and-swap. • Does not support rollback! CMU SCS 15-415/615
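A sketch of that batching plus compare-and-swap using WATCH/MULTI/EXEC in redis-py; the key name is illustrative:

    import redis

    r = redis.Redis()
    r.set("view_count", 0)

    # Optimistic compare-and-swap: WATCH the key, queue writes with MULTI, and
    # EXEC fails (WatchError) if another client modified the key in between.
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch("view_count")
                current = int(pipe.get("view_count"))
                pipe.multi()                        # start queuing commands
                pipe.set("view_count", current + 1)
                pipe.execute()                      # run the queued batch atomically
                break
            except redis.WatchError:
                continue                            # the key changed under us; retry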

  19. MongoDB • Document Data Model • Think JSON, XML, Python dicts • Not Microsoft Word documents • Different terminology: • Document → Tuple • Collection → Table/Relation

  20. MongoDB • JSON-only query API • Single-document atomicity. • No server-side joins: • “Pre-join” collections by embedding related documents inside of each other. • No cost-based query planner / optimizer.

  21. MongoDB • A customer has orders and each order has order items. Customers R1(custId, name, …) ⨝ Orders R2(orderId, custId, …) ⨝ Order Items R3(itemId, orderId, …)

  22. MongoDB • A customer has orders and each order has order items. { "custId" : 1234, "custName" : "Christos", "orders" : [ { "orderId" : 10001, "orderItems" : [ { "itemId" : "XXXX", "price" : 19.99 }, { "itemId" : "YYYY", "price" : 29.99 } ] }, { "orderId" : 10050, "orderItems" : [ { "itemId" : "ZZZZ", "price" : 49.99 } ] } ] } [Diagram: a single Customer document embeds its Orders, and each Order embeds its Order Items.]
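A sketch of storing and querying that embedded document with PyMongo, assuming a local mongod; the database and collection names are illustrative:

    from pymongo import MongoClient

    client = MongoClient()                       # assumes mongod on localhost:27017
    customers = client.shop.customers            # "shop" / "customers" are made up

    customers.insert_one({
        "custId": 1234,
        "custName": "Christos",
        "orders": [
            {"orderId": 10001,
             "orderItems": [{"itemId": "XXXX", "price": 19.99},
                            {"itemId": "YYYY", "price": 29.99}]},
            {"orderId": 10050,
             "orderItems": [{"itemId": "ZZZZ", "price": 49.99}]},
        ],
    })

    # A single-document read returns the customer with all orders "pre-joined".
    doc = customers.find_one({"orders.orderItems.itemId": "YYYY"})
    print(doc["custName"], len(doc["orders"]))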

  23. MongoDB • Heterogeneous distributed components. • Centralized query router. • Master-slave replication. • Auto-sharding: • Define ‘partitioning’ attributes for each collection (hash or range). • When a shard gets too big, the DBMS automatically splits the shard and rebalances.
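A sketch of declaring the partitioning attribute for a collection against a mongos query router with PyMongo; the names are illustrative and this assumes a sharded cluster is already running:

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)      # connect to the mongos router

    # Enable sharding for the database, then hash-partition the collection on custId.
    client.admin.command("enableSharding", "shop")
    client.admin.command("shardCollection", "shop.customers",
                         key={"custId": "hashed"})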

  24. MongoDB • Originally used mmap storage manager • No buffer pool. • Let the OS decide when to flush pages. • Single lock per database. • Version 3 (2015) now supports pluggable storage managers. • WiredTiger from BerkeleyDB alumni. • Fine-grained locking.
