Transaction Research History and Challenges. Invited talk for session on Systems Perspectives on Database Technology; Achievements and Dreams Forgotten @ ACM SIGMOD 2006, Chicago, Ill, 27 June 2006. Jim Gray Microsoft http://research.microsoft.com/~gray/talks/.
Transaction Research History and Challenges Invited talk for session on Systems Perspectives on Database Technology; Achievements and Dreams Forgotten @ ACM SIGMOD 2006, Chicago, Ill, 27 June 2006 Jim Gray Microsoft http://research.microsoft.com/~gray/talks/ Thanks to: Phil Bernstein, Surajit Chaudhuri, Dave DeWitt, Rick Snodgrass, Gerhard Weikum
Databases Are State • DB is a collection of facts • Store the facts • Find the facts • Combine the facts to make new ones
Transactions are State Changes • And of course these changes are state (facts) • My metadata is just your data • It’s all rock ‘n’ roll to me. • It’s turtles all the way down.
Transactions Have a LONG History • Timescale (years before present): 6000, 1000, 100, 50, 25, 0 • First clay tablets were transaction records • General ledger – lots of technology there • Punched cards • Batch (tape) transactions • Online (concurrency & durability issues) • And now… What next? • I believe it is back to clay tablets….
“Formal” Transaction Notion: Interesting History • Formalization happened concurrently in many groups: GE, IBM, MIT, Tokyo, … • Many others saw it as useless • Transactions don’t give THE right answer, they just give AN answer. • Heated debate among the “enlightened” • “Winner” was wrong, but right at the time.
ACID came to define Transaction “elevator pitch” • Atomicity: All or nothing • Consistency: Preserve application invariants • Isolation: No concurrency surprises • Durability: No commitments lost • Lesson: Simple Story Matters. It is IMPORTANT to get it right.
Red-Green Balls Example: What’s Wrong With This Picture? • A: Change all Red to Green • B: Change all Green to Red • [Figure: initial state, with the outcomes of the schedules “A, B”, “B, A”, and “A+B”] • A+B: Not allowed. Why? Not a one-at-a-time schedule. • Even people who have worked on this for 40 years still puzzle about things like this. • The “answer” is subtle. Probably there is no “answer”. We get to set the rules.
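The puzzle can be made concrete with a small sketch. It assumes, purely for illustration, that the disallowed A+B outcome arises when both transactions choose their targets from the same initial snapshot; the slide does not say which execution model its picture used, and the names `change_all`, `run_serial`, and `run_on_shared_snapshot` are hypothetical.

```python
def change_all(balls, src, dst):
    """Return a new list with every ball of color src recolored to dst."""
    return [dst if color == src else color for color in balls]


def run_serial(balls, steps):
    """Apply each (src, dst) transaction one at a time, in the given order."""
    for src, dst in steps:
        balls = change_all(balls, src, dst)
    return balls


def run_on_shared_snapshot(balls, steps):
    """Assumed concurrent model: every transaction selects its targets from
    the same initial snapshot, then writes. Not a one-at-a-time schedule."""
    snapshot = list(balls)
    result = list(balls)
    for src, dst in steps:
        for i, color in enumerate(snapshot):
            if color == src:
                result[i] = dst
    return result


A = ("red", "green")   # A: change all Red to Green
B = ("green", "red")   # B: change all Green to Red
initial = ["red", "green", "red", "green"]

a_then_b = run_serial(initial, [A, B])              # every ball ends up red
b_then_a = run_serial(initial, [B, A])              # every ball ends up green
a_plus_b = run_on_shared_snapshot(initial, [A, B])  # the colors swap
```

Under this assumed model the concurrent result matches neither serial order, which is one way to read "not a one-at-a-time schedule".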
The Virtue of Transactions • They are simple • Convert complex errors into simple go / no-go • Simplifies component composition • Simplifies distributed system error handling (especially useful in a “cluster”) • Lampson: “Transactions are ‘pixie dust’ that you sprinkle on your program to make it reliable.”
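A minimal sketch of the go / no-go idea, using Python's standard `sqlite3` module. The `account` table, the `transfer` helper, and the balances are assumptions made up for this illustration and are not from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst. Return True (go) or False (no-go)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


first_ok = transfer(conn, "alice", "bob", 30)     # fits within alice's balance
after_first = balances(conn)
second_ok = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_second = balances(conn)
```

The caller never sees a half-applied transfer: the failed attempt leaves both rows exactly as they were, so the only outcomes to handle are "it happened" and "it did not".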
But… Technology Clouded our/my Thinking • Disk & RAM Storage was expensive • Accesses were expensive • So we discarded old values, did update-in-place • Makes it easy to find current state, and possible to find old state from the log. • But, many applications want data lineage – databases don’t optimize for that. • But… now storage is “free”. Keep everything! • Some kept old versions: Prime Codasyl, Oracle, Rdb, but “garbage collected”
Restatement: It is a Mistake to Update Data • Discards information! • You should only ADD information • Examples: clay tablets, general ledger, punched cards, batch processing (Old-Master to New-Master)
Correct Solution: Temporal Databases • No Update! No Delete! • Only Insert and Read-@-Time grouped into transactions • Every item has time dimension(s) • transaction time, valid time,… • This is BETTER than clay tablets & punched cards (they did not have valid time & transaction time) • Same as general ledger • References: • Bernstein, Hadzilacos, Goodman “transaction” book. • Snodgrass et al., Temporal SQL • Reed, Atomic Actions thesis
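The Insert and Read-@-Time model above can be sketched in a few lines. This is an illustration under stated assumptions, not a design from the talk: the `AppendOnlyStore` class and its methods are hypothetical, transaction time is a simple logical counter, and valid time is left out entirely.

```python
import bisect
import itertools


class AppendOnlyStore:
    """Insert-only key/value store: no update, no delete.

    Every inserted version carries the transaction time at which it
    was committed (a logical counter here, which is an assumption).
    """

    def __init__(self):
        self._clock = itertools.count(1)
        self._versions = {}  # key -> list of (tx_time, value), ascending

    def commit(self, writes):
        """Insert all writes as one transaction; return its transaction time."""
        tx_time = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((tx_time, value))
        return tx_time

    def read_at(self, key, tx_time):
        """Return the value of key as of tx_time, or None if not yet inserted."""
        versions = self._versions.get(key, [])
        times = [t for t, _ in versions]
        i = bisect.bisect_right(times, tx_time)
        return versions[i - 1][1] if i else None


store = AppendOnlyStore()
t1 = store.commit({"balance": 100})
t2 = store.commit({"balance": 70, "note": "paid rent"})

old_balance = store.read_at("balance", t1)   # the state as of the first commit
new_balance = store.read_at("balance", t2)   # the state as of the second commit
early_note = store.read_at("note", t1)       # not inserted yet at t1
```

Nothing is overwritten, so the old state stays directly readable instead of having to be reconstructed from a log.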
Lesson • Technology can warp your (my) thinking • No update with clay tablets, cards, tape • Disks allowed/encouraged update (precious disk space). • Now that disk space is free, … I see the error of my ways • But… at the time (1970..2005) it was “right” • Systems that tried, “failed” (e.g. Postgres, TSQL) • Real lesson: Good ideas can go bad. Good ideas may have to wait.
What About Durability? • Discussion so far: atomic-consistent-isolated (ACI) state change • Durability always used replicas. • Log replica is compact but “useless” • Want object-replica • Want security, query, … • Log is just a technology for replicas. • Replica technology has made huge progress. • Problem: too many solutions. • Durability requires geo-plex: Lots of Copies Keep Stuff Safe (LOCKSS) • Challenge today is to simplify options.
What’s WRONG With ACID Transactions? • Transactions are an UN-availability feature • Correctness/consistency… Fight with “Do it now!!!” (Lesson: “Do it now!!!” usually wins) • Users hate to wait! • Transactions are • Good within an organization: I trust you! • Bad across organizations: Can I depend on you?
Workflow – Still An Elusive Goal • If X is good, recursive-X is better • What is the generalization of transaction? If they are atomic, what are molecules? How to compose them? • Great progress on Multi-level Transaction Model (Weikum-Vossen book) • Limited progress on • workflow • parallelism within transactions…
Workflow Progress • There are LOTS of workflow systems. • What “concepts” have helped? • Compensation model • Simple metaphors (e.g. Sagas) • Commit–Abort dependencies
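As a rough illustration of the compensation model behind Sagas: each step is paired with a compensating action, and a failure undoes the completed steps in reverse order. The `run_saga` helper and the booking steps are hypothetical names chosen for this sketch, not an API from any workflow system.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If a step raises, run the compensations of the steps that already
    completed, newest first, and report failure. Returns (ok, log).
    """
    done = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"did {name}")
            done.append((name, compensate))
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for prev_name, prev_compensate in reversed(done):
                prev_compensate()
                log.append(f"compensated {prev_name}")
            return False, log
    return True, log


# Hypothetical trip booking: the hotel step fails, so the flight is undone.
state = {"flight": False, "hotel": False}


def book_flight():
    state["flight"] = True


def cancel_flight():
    state["flight"] = False


def book_hotel():
    raise RuntimeError("no rooms")


def cancel_hotel():
    state["hotel"] = False


ok, log = run_saga([
    ("flight", book_flight, cancel_flight),
    ("hotel", book_hotel, cancel_hotel),
])
```

Unlike an ACID rollback, the flight booking was visible to others before it was compensated; the saga trades isolation for not holding locks across a long workflow.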
Aside: The Software Crisis and Transactional Memory • Software systems are getting too complex. • Try-catch fault handling model • Huge advance • Unworkable in complex systems. • Multi-core and Many-core force parallel programming • So, Software is in crisis (as usual). • Transactional memory (treat methods as sub-transactions) simplifies error handling. • Reminiscent of Randell’s Recovery Blocks and… • Great progress in this space, challenging problems. • PS: they definitely update in place.
Transaction Research Advice for 2007… • Think in terms of temporal databases • Transactions of Insert and Read@time • Simplify replication (as a path to Durability) • LOCKSS is the key to durability • Temporal model may make it easier • Don’t give up on workflow • It is too important. • Non-ACID workflow? But, all my advice on it has been a dead end. • Simpler programming model with Transactions? • Cleaner & Simpler fault handling. • Many-core parallelism?
The abstract I promised to talk about. Database Operating Systems: Storage & Transactions • Database systems now use most of the technologies the research community developed over the last 3 decades: Self-organizing data, non-procedural query processors, automatic-parallelism, transactional storage and execution, self-tuning, and self-healing. • After a period of linear evolution, database concepts and systems are undergoing rapid evolution and mutation -- entering a synthesis with programming languages, with file systems, with networking, and with sensor networks. Files are being unified with other types and becoming first-class objects. The transaction model appears to be fundamental to the transactional memory needed to program multi-core systems in parallel. • Workflow systems are now a reality. The long-heralded parallel database machine idea of data-flow programming has begun to bear fruit. Each of these new applications of our ideas raises new and challenging research questions. • Blue are undelivered promises. So it goes.