
Transactions, Concluded, and the Future of Data Management

Zachary G. Ives, University of Pennsylvania. CIS 550 – Database & Information Systems, December 4, 2003. Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke.


Presentation Transcript


  1. Transactions, Concluded, and the Future of Data Management Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 4, 2003 Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke

  2. Final Administrivia • Project demos today and tomorrow • Final exam handed out at the end of today’s class • Finals plus project reports due by 1PM, 12/18/2003 • Project reports should be roughly 10-15 pages • Remember, quality and clarity of presentation matter! • Also, email me a brief message detailing: • Your contributions to the project • Your group members’ contributions and your assessment of “group dynamics” • Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall

  3. Last Time… • We were discussing isolation levels • How to keep transactions from interfering with one another • Or at least, how to minimize this • Recall the strongest version of isolation was serializability

  4. Theory of Serializability • A schedule of a set of transactions is a linear ordering of their actions • e.g. for the simultaneous deposits example: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal) • A serial schedule is one in which all the steps of each transaction occur consecutively • A serializable schedule is one which is equivalent to some serial schedule (i.e. given any initial state, the final state is the same as one produced by some serial schedule) • The example above is neither serial nor serializable
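
A minimal sketch of why the deposits schedule above is not serializable, assuming each transaction reads the balance and then writes back that value plus its own deposit. The function `run`, the starting balance, and the deposit amounts are hypothetical and chosen only for illustration.

```python
def run(schedule, deposits, initial):
    """Execute (txn, action) steps against a single balance X.bal."""
    balance = initial
    local = {}  # value each transaction saw when it read X.bal
    for txn, action in schedule:
        if action == "R":
            local[txn] = balance
        else:  # "W": write back the value read plus this transaction's deposit
            balance = local[txn] + deposits[txn]
    return balance


deposits = {1: 50, 2: 30}  # hypothetical deposit amounts
interleaved = [(1, "R"), (2, "R"), (1, "W"), (2, "W")]  # R1 R2 W1 W2
serial_12 = [(1, "R"), (1, "W"), (2, "R"), (2, "W")]    # T1 then T2
serial_21 = [(2, "R"), (2, "W"), (1, "R"), (1, "W")]    # T2 then T1

for name, schedule in [("interleaved", interleaved),
                       ("T1;T2", serial_12),
                       ("T2;T1", serial_21)]:
    print(name, run(schedule, deposits, 100))
```

Under these assumptions both serial orders end with the same balance, while the interleaved schedule loses T1's deposit, so it matches neither serial schedule.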

  5. Questions of Concern • Given a schedule S, is it serializable? • How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?

  6. Conflicting Actions • Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively • If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter • If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if one of the actions is a write • Ri(Q) Wj(Q) produces a different final value for Q than Wj(Q) Ri(Q)
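
The conflict rule above can be sketched as a small predicate. The `Action` type and the function name `conflicts` are illustrative assumptions, not notation from the slides.

```python
from typing import NamedTuple


class Action(NamedTuple):
    txn: int    # transaction number, e.g. 1 for T1
    op: str     # "R" for read, "W" for write
    item: str   # data item name, e.g. "Q"


def conflicts(a: Action, b: Action) -> bool:
    """Two actions conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.op, b.op)


print(conflicts(Action(1, "R", "Q"), Action(2, "W", "Q")))  # read vs. write, same item
print(conflicts(Action(1, "R", "Q"), Action(2, "R", "Q")))  # two reads
print(conflicts(Action(1, "W", "Q"), Action(2, "W", "P")))  # different items
```

Only the first pair conflicts; adjacent actions that do not conflict can be swapped without changing the outcome of the schedule.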

  7. Testing for Serializability • Given a schedule S, we can construct a di-graph G=(V,E) called a precedence graph • V : all transactions in S • E : Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S • Theorem: A schedule S is conflict serializable if and only if its precedence graph contains no cycles • Note that testing for a cycle in a digraph can be done in time O(|V|²)
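
A sketch of the test, assuming a schedule is given as a list of (transaction, operation, item) tuples in execution order. The function names are hypothetical. Cycle detection here uses depth-first search, which runs in O(|V| + |E|) and therefore stays within the bound quoted above.

```python
def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) of the precedence graph."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # An earlier action of Ti conflicts with a later action of Tj.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(nodes, edges):
    """Depth-first search for a back edge in a directed graph."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


# The simultaneous-deposits schedule: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
deposit_schedule = [(1, "R", "X.bal"), (2, "R", "X.bal"),
                    (1, "W", "X.bal"), (2, "W", "X.bal")]
edges = precedence_graph(deposit_schedule)
print("conflict serializable:", not has_cycle({1, 2}, edges))
```

For the deposits schedule the graph has edges in both directions between T1 and T2, so the sketch reports that it is not conflict serializable.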

  8. An Example • [Figure: interleaved schedule of T1, T2, T3 over the actions R(X,Y,Z), R(X), W(X), R(Y), W(Y), R(Y), R(X), W(Z), with its precedence graph on T1, T2, T3] • Cyclic: not serializable

  9. Another Example • [Figure: interleaved schedule of T1, T2, T3 over the actions R(X), W(X), R(X), W(X), R(Y), W(Y), R(Y), W(Y), with its precedence graph on T1, T2, T3] • Acyclic: serializable

  10. Producing the Equivalent Serial Schedule • If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph • For the second example, the equivalent serial schedule is: • R1(Y)W1(Y) R2(X)W2(X) R2(Y)W2(Y) R3(X)W3(X)
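
A sketch of the topological-sort step using Python's standard `graphlib` module (Python 3.9+). The interleaving below is an assumed reading of the second example, since the column layout of the original figure is not preserved here; it was chosen to be consistent with the serial schedule stated above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed interleaving: R2(X) W2(X) R3(X) W3(X) R1(Y) W1(Y) R2(Y) W2(Y)
schedule = [(2, "R", "X"), (2, "W", "X"), (3, "R", "X"), (3, "W", "X"),
            (1, "R", "Y"), (1, "W", "Y"), (2, "R", "Y"), (2, "W", "Y")]

# Precedence edges: Ti -> Tj when an action of Ti precedes and conflicts
# with an action of Tj.
edges = {(ti, tj)
         for i, (ti, op_i, item_i) in enumerate(schedule)
         for tj, op_j, item_j in schedule[i + 1:]
         if ti != tj and item_i == item_j and "W" in (op_i, op_j)}

sorter = TopologicalSorter()
for txn in {t for t, _, _ in schedule}:
    sorter.add(txn)                 # make sure every transaction is a node
for before, after in edges:
    sorter.add(after, before)       # 'after' must come later than 'before'
order = list(sorter.static_order()) # raises CycleError if the graph is cyclic

# Emit each transaction's actions consecutively, in topological order.
serial = [a for txn in order for a in schedule if a[0] == txn]
text = " ".join(f"{op}{txn}({item})" for txn, op, item in serial)
print(text)
```

With this assumed interleaving the graph is the chain T1 → T2 → T3, so the topological order is unique and the printed schedule agrees with the one on the slide.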

  11. Locking and Serializability • We said that for a serializable schedule, a transaction must hold all locks until it terminates (a condition called strict locking) • It turns out that this is crucial to guarantee serializability • Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.

  12. Well-Formed, Two-Phased Transactions • A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesn’t release the lock until the action is performed • Locks are also released by the end of the transaction • A transaction is two-phased if it never acquires a lock after unlocking one • i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
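
Both definitions can be checked mechanically for a single transaction. The sketch below assumes a transaction is written as a list of (kind, item) steps, where the kinds "S-lock", "X-lock", "unlock", "R", and "W" are a hypothetical encoding chosen for this example.

```python
def is_two_phased(actions):
    """True if no lock is acquired after any lock has been released."""
    unlocked = False
    for kind, _item in actions:
        if kind == "unlock":
            unlocked = True
        elif kind in ("S-lock", "X-lock") and unlocked:
            return False
    return True


def is_well_formed(actions):
    """True if every read holds at least a shared lock, every write holds an
    exclusive lock, and all locks are released by the end."""
    held = {}  # item -> "S" or "X"
    for kind, item in actions:
        if kind == "S-lock":
            held.setdefault(item, "S")  # keep an exclusive lock if already held
        elif kind == "X-lock":
            held[item] = "X"
        elif kind == "unlock":
            held.pop(item, None)
        elif kind == "R" and item not in held:
            return False
        elif kind == "W" and held.get(item) != "X":
            return False
    return not held


good = [("S-lock", "A"), ("R", "A"), ("X-lock", "B"), ("W", "B"),
        ("unlock", "A"), ("unlock", "B")]
early_release = [("X-lock", "A"), ("W", "A"), ("unlock", "A"),
                 ("X-lock", "B"), ("W", "B"), ("unlock", "B")]

print(is_well_formed(good), is_two_phased(good))
print(is_well_formed(early_release), is_two_phased(early_release))
```

The second transaction is well-formed but not two-phased: it releases its lock on A before acquiring the lock on B.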

  13. Two-Phased Locking Theorem • If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability • i.e., there is a very simple scheduler! • However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable • i.e., one bad apple spoils the bunch.
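
The "very simple scheduler" only has to refuse conflicting lock requests. A minimal sketch of such a lock table follows; the class `LockTable` and its methods are hypothetical names, and a real scheduler would also queue or block refused requests and handle deadlock, which this sketch omits.

```python
class LockTable:
    """Grants shared ("S") and exclusive ("X") locks, never conflicting ones."""

    def __init__(self):
        self.holders = {}  # item -> {txn: "S" or "X"}

    def request(self, txn, item, mode):
        """Return True and record the lock if it is compatible, else False."""
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        # Shared locks are compatible with each other; anything involving an
        # exclusive lock held or requested by different transactions conflicts.
        if others and (mode == "X" or "X" in others.values()):
            return False
        current = self.holders.setdefault(item, {})
        if current.get(txn) != "X":   # never downgrade an exclusive lock
            current[txn] = mode
        return True

    def release(self, txn, item):
        self.holders.get(item, {}).pop(txn, None)


table = LockTable()
print(table.request(1, "Q", "S"))  # T1 takes a shared lock on Q
print(table.request(2, "Q", "S"))  # T2 may share it
print(table.request(2, "Q", "X"))  # refused while T1 still holds its lock
```

The theorem says this check is sufficient only when every transaction using the table is itself well-formed and two-phased.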

  14. Summary of Transactions • Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system • Theoretically, the “correct” execution of transactions is serializable (i.e. equivalent to some serial execution) • Practically, this may adversely affect throughput → isolation levels • With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate

  15. What to Look for Down the Road • … well, no one really knows the answer to this… • … But here are some hints, ideas, and hot directions • Sensors and streaming data • Peer-to-peer meets databases • “The Semantic Web” • Collaborative data sharing

  16. Sensors and Streaming Data • No databases at all… • … Instead we have networks of simple sensors • Madden, starting at MIT • Gehrke, Cornell • Widom, Stanford • queries are in SQL • data is live and “streaming” • we compute aggregates over “windows”
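
As a rough illustration of an aggregate over a window, the sketch below keeps a running average of the most recent readings from a stream. It is written in plain Python rather than a streaming SQL dialect; the function name, the window size, and the sample readings are all hypothetical.

```python
from collections import deque


def windowed_average(readings, size):
    """Yield the average of the last `size` readings after each new arrival."""
    window = deque(maxlen=size)  # the oldest reading drops out automatically
    for value in readings:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical temperature readings arriving one at a time.
for avg in windowed_average([10, 20, 30, 40], size=2):
    print(avg)
```

Because only the current window is held in memory, the same loop works on an unbounded stream of readings.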

  17. What’s Interesting Here • We’re not talking about data on disk – we’re talking about queries over “current readings” • Sensors are generally “stupid” and may be battery-operated • A lot of challenges are networking-related: how to aggregate data before it gets sent, etc. • The next step (e.g., work initiated here @ Penn): including sensors that capture images – a very different problem! • This has many more compelling applications – security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.

  18. Peer-to-Peer Computing • Fundamentally, our model of DBMSs tends to be centralized • Even for data integration: there’s a single mediator • This has many implications: central administration, central coordination, etc. • What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.? • A better architecture? • Solutions to many problems unsolved by distributed DBMSs? • Replication, object location, distributed optimization, resiliency to failure, … • New types of applications, e.g., in integration?

  19. P2P Work • As a new architecture for storage and querying • PIER (Berkeley), P-Grid (EPFL), Medusa (MIT) • A better way of thinking about translating and exchanging data • Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento

  20. The Semantic Web • In some ways, a very “pie-in-the-sky” vision • But some real and concrete problems might be partly solvable • Goal is really very similar to data integration, where somehow we have mappings between the schemas • Currently, most people in the SW community are from the knowledge representation community and use RDF • Focus: very rich ways of describing schemas – “ontologies” – that blend querying with class definitions • “Teachers are people who teach students”; “Tenure-track professors are teachers at universities who can get tenure”; etc. • Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas

  21. Holes in the Semantic Web • What issues and concerns came up in the data integration assignment you had? • Do you think a richer schema language would help for these? • Do you think “better normalization” would help? • Fundamentally, we need: • Languages for not only describing relationships, but transformations between formats (e.g., XML schemas) • Automatic or partly automated ways of discovering mappings and correspondences • These are all database problems, and the solution likely must come from the DB community • This is part of what P2P systems like Piazza, Hyperion try to address

  22. My Take on the Future • We’ve evolved from a world where data management is about controlling the data • Instead, data management is about translating and transforming data using declarative languages • It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another • It’s the plumbing that connects different applications using different formats • Orchestra project at Penn: focuses on how to build a system for supporting collaborative science • People publish and map data in different schemas • What happens if people start updating it? • How do you propagate, manage, trace, reconcile changes?
