Mastering Transactions for Reliable Computing Systems

Transactions Paul Greenfield CSIRO

Why? • Why are transactions and transaction processing interesting? • They make it possible for mere mortals to program high performance, reliable and scalable computing systems • The basis of enterprise ‘mission critical’ computing

Programming is simple • Most business operations are really quite simple • Sell a book, withdraw cash, book a ticket, trade stocks, …. • Some database lookups • Some computation • Some database updates • But in the real world…

Real programming is hard • Real systems have to be fast • Real systems have to scale • Real systems have to be reliable • Real systems have to recover from failure when it does happen • And real systems are built by the average programmer

Speed and Scale • Speed • Responding quickly to requests • Customers can’t be kept waiting too long, especially on the Web • Scale • Banks have thousands of ATMs, stores have hundreds of registers, Web sites can have many thousands of users • All need fast response times

Speed and Scale • Solution is concurrency • Doing more than one operation at a time • Processing while waiting for I/O • Using multiple processors • Problem is concurrency • Interference between programs • Conflicts over shared resources • Can’t sell the same seat twice

Reliability • What happens when a computer system is down? • Lost customers, sales, money, …. • 24x7 operations, 99.99% availability • Solutions • Stand-by systems (warm or hot) • Clusters • Based on transactions

Failure • Computer hardware fails! • System software has bugs!! • Application programs have bugs!!! • System still has to be reliable!!!! • Fail cleanly (maintain data integrity) • Recover from transient errors • Recover after system failure

Data Integrity • Business has rules about its data • Money must always be accounted for • Seats cannot be sold twice or lost • Easy if system never fails… • Transactions maintain integrity despite failures

Transaction Systems • Efficiently handle high volumes of requests • Avoid errors from concurrency • Avoid partial results after failure • Grow incrementally • Avoid downtime • And … never lose data

Ancient History • Earliest transaction systems • Airline reservations (1960’s) • SABRE: 300,000 devices, 4200 requests/sec, custom-made OS • Banking, government and other very large systems (1970’s) • CICS, IMS, COMS, TIP • Large (expensive) mainframe computers

Recent Past • UNIX systems (1980’s…) • TP monitors • Tuxedo, TopEnd, Encina, CICS, … • Object Transaction Monitors • Databases • Oracle, Sybase, Informix • Basis for mainstream commercial computing • Coexisting with mainframe TP

Present • Commodity transaction processing • Windows NT, Windows 2000 • MTS and COM+ from Microsoft • UNIX TP ports • Tuxedo, Orbix, WebLogic, … • SQL Server, Oracle, … • Becoming pervasive

What is a Transaction? • A complete, indivisible business operation • Book a seat • Transfer money • Withdraw cash • Sell something • Borrow a book

ACID • Transaction systems must pass the ACID test • Atomic • Consistent • Isolated • Durable

Atomic • Transactions have to be atomic • All or nothing • Execute completely or not at all • Even after failure and recovery • Successful transactions commit • Changes made permanent • Failing transactions abort • Changes backed out

Atomic Example • Transferring money • Move $100 from account A to account B • Take $100 from account A • Put $100 in account B • Both actions have to take place or none • Failure after withdrawal step? • Money disappears??
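The transfer on this slide can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the mechanism of any particular TP system: the `account` table, the starting balances and the `fail_after_withdrawal` switch are assumptions added for the example.

```python
import sqlite3

def open_bank():
    """Create an in-memory bank with two accounts (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount, fail_after_withdrawal=False):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        # Take the money from the source account.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_after_withdrawal:
            raise RuntimeError("simulated failure after the withdrawal step")
        # Put the money in the destination account.
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()      # both changes become permanent together
    except Exception:
        conn.rollback()    # the withdrawal is backed out, so no money disappears
        raise

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling `transfer(conn, "A", "B", 100, fail_after_withdrawal=True)` raises the simulated error and leaves both balances exactly as they were before the call.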

Consistency • Move data from one consistent state to another • Money in bank is accounted for and not ‘lost’ • Really an application program responsibility • But AID helps by making programming simpler

Isolation • Every transaction thinks it is running all alone (isolated) • Reality of concurrency is hidden • Transactions running together do not interfere with each other • Looks like transactions are run serially • Illusion assisted by databases

Isolation Example • Banking • Two ATMs trying to withdraw the last $100 from an account
  ATM 1: fetch balance from account
  ATM 2: fetch balance from account
  ATM 1: update account (balance=balance-$100)
  ATM 2: update account (balance=balance-$100)
• Isolation stops this from happening • Second transaction waits for first to complete
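A small sketch of the two-ATM problem, assuming an in-memory `Account` class and a `threading.Lock` standing in for the database's isolation mechanism (real databases use their own locking or versioning schemes). The first function replays the bad interleaving by hand so the result is deterministic; the second runs two threads against the locked account.

```python
import threading

class Account:
    """Hypothetical account whose withdrawals are serialized by a lock."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Fetch, check and update happen under one lock, so a second
        # withdrawal waits for the first to complete.
        with self._lock:
            current = self.balance            # fetch balance from account
            if current < amount:
                return False                  # refused: not enough money
            self.balance = current - amount   # update account
            return True

def interleaved_without_isolation(balance, amount):
    """Both ATMs fetch the balance before either one updates it."""
    seen_by_atm1 = balance
    seen_by_atm2 = balance
    dispensed = 0
    if seen_by_atm1 >= amount:
        balance = seen_by_atm1 - amount
        dispensed += amount
    if seen_by_atm2 >= amount:
        balance = seen_by_atm2 - amount       # overwrites ATM 1's update
        dispensed += amount
    return balance, dispensed

def two_atms_with_isolation(balance, amount):
    """Run two concurrent withdrawals against the locked account."""
    account = Account(balance)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(account.withdraw(amount)))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, sorted(results)
```

Without isolation the account ends at $0 but $200 has been handed out; with the lock exactly one of the two withdrawals succeeds.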

Durable • Once changes are committed they are permanent • Even after failure and recovery • Changes written to disk • Wait for write to complete • Largely responsibility of database • DB told to commit changes by transaction manager
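The "write to disk, then wait for the write to complete" step might look like the following sketch. It assumes a simple append-only log of JSON records; the function names and the record layout are invented for illustration and are not how any specific database formats its log.

```python
import json
import os
import tempfile

def commit_record(log_path, record):
    """Append one record and return only after it has been forced to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()               # push Python's buffer to the operating system
        os.fsync(log.fileno())    # wait for the operating system to write it out
    return True                   # only now is it safe to report "committed"

def recover(log_path):
    """After a restart, read back every record that was committed."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]

def demo():
    """Commit two changes, then read them back as recovery would."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "commit.log")
        commit_record(path, {"txn": 1, "account": "A", "delta": -100})
        commit_record(path, {"txn": 1, "account": "B", "delta": 100})
        return recover(path)
```

The point of the sketch is the ordering: the caller is told the commit succeeded only after `os.fsync` returns.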

Distributed Transactions • Now do all this across multiple computer systems… • Geographically dispersed • Multiple application servers • Multiple database servers • All working together • A single transaction can use all of these resources • Full ACID properties still maintained

Distributed System

Distributed Transactions • What happens when a transaction updates data on two or more systems? • Transaction still needs to be atomic • All updates succeed or all fail • But systems can independently fail and recover! • Transaction manager keeps track and coordinates changes using 2PC

No Two Phase Commit? • Without 2PC updates can be lost and data can become inconsistent when systems fail and recover
  Transaction: Withdraw $100, Deposit $100
  Sydney: Withdraw $100, Commit
  Melbourne: Deposit $100, *System fails*, System recovers but update lost
  Result: Money withdrawn but deposit lost

Two Phase Commit • Transaction manager coordinates updates made by resource managers • Phase 1 • Prepare to commit • Phase 2 • Commit • Transaction manager always knows the state of the transaction

Phase 1 • Transaction manager asks all RMs to prepare to commit. • RMs can save their intended changes and then say ‘yes’. • Any RM can say ‘no’. • No RM actually commits yet! • If all RMs said ‘yes’, go to Phase 2. • If any RMs said ‘no’, tell everyone to abandon their intended changes.

Phase 2 • Transaction manager asks all resource managers to go ahead and commit their changes. • Can now recover from failure • RM knows what transactions were questionable at point of failure • TM knows whether transactions succeeded or failed

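Two Phase Commit
  Successful transaction: Coordinator sends Prepare; Participant replies Prepared; Coordinator sends Commit; Participant replies Done
  Failing transaction: Coordinator sends Prepare; Participant replies No; Coordinator sends Abort; Participant replies Done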
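The two phases can be simulated in a few lines. This is a sketch under simplifying assumptions: everything runs in one process, nothing is written to disk, and there are no timeouts or crashes. The class names, the `vote_yes` switch and the in-memory `log` list are illustrative only.

```python
class ResourceManager:
    """Hypothetical RM that holds intended changes until told to commit."""
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.pending = {}      # intended changes, not yet visible
        self.committed = {}    # changes made permanent
        self.state = "active"

    def work(self, key, value):
        self.pending[key] = value

    def prepare(self):
        # Phase 1: save intended changes and vote. No RM commits yet.
        if self.vote_yes:
            self.state = "prepared"
            return True
        self.abort()
        return False

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()
        self.state = "committed"

    def abort(self):
        self.pending.clear()
        self.state = "aborted"

class TransactionManager:
    """Coordinator that records the outcome before telling the RMs."""
    def __init__(self):
        self.log = []

    def run(self, rms):
        votes = [rm.prepare() for rm in rms]       # Phase 1
        if all(votes):
            self.log.append("commit")              # decision recorded first
            for rm in rms:
                rm.commit()                        # Phase 2
            return "committed"
        self.log.append("abort")
        for rm in rms:
            rm.abort()                             # abandon intended changes
        return "aborted"
```

With a Sydney RM holding the withdrawal and a Melbourne RM holding the deposit, a single "no" vote from Melbourne means neither change is applied, which is the case the earlier Sydney/Melbourne example gets wrong without 2PC.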

Transaction Managers • Just what is a TM? • Can be part of the database software • Updating multiple Oracle databases… • Can be part of a ‘transaction monitor’ • CICS, Tuxedo, … • Can be stand-alone • MS DTC, X/Open model

Distributed Transactions • Multiple Transaction managers • One per node (computer) involved, looking after their local resource managers • TMs cooperate in distributed transactions • Produces transaction trees • Each TM coordinates TMs below it • One root TM (where it all started) • New branch when apps invoke code or access a resource on a new node

Transaction Trees • [Figure: a tree of four TMs and five RMs]
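One way to picture a transaction tree is as a recursive structure in which each TM gathers the votes of its local RMs and of the TMs below it, and the root decides. The sketch below assumes an in-process tree and represents each local RM simply by its yes/no vote; the `Node` class and the node names are hypothetical.

```python
class Node:
    """Hypothetical TM node: local RM votes plus subordinate TMs."""
    def __init__(self, name, local_votes, children=()):
        self.name = name
        self.local_votes = list(local_votes)   # one boolean per local RM
        self.children = list(children)         # TMs coordinated by this TM
        self.outcome = None

    def prepare(self):
        # A node is prepared only if all of its local RMs and all of the
        # TMs below it are prepared.
        return all(self.local_votes) and all(c.prepare() for c in self.children)

    def finish(self, outcome):
        # The decision flows from the root down every branch.
        self.outcome = outcome
        for child in self.children:
            child.finish(outcome)

def run_from_root(root):
    """The root TM (where it all started) decides for the whole tree."""
    outcome = "commit" if root.prepare() else "abort"
    root.finish(outcome)
    return outcome
```

A single "no" from an RM on a leaf node is enough to abort the transaction on every node in the tree.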

X/Open Standards • Standard model and interfaces • XA: TM to RM • TX: Application to TM • [Figure: Application program talks to the Resource manager via SQL,… and to the Transaction manager via TX; Transaction manager talks to the Resource manager via XA]

Resources • Much more than just databases • ATMs, printers, terminal responses, … • Can only issue cash or print cheque if transaction is successful
  Sequence 1: Begin, update account, dispense cash, Commit
  Sequence 2: Begin, update account, Commit, dispense cash
• **Crash** part-way through either sequence leaves the cash and the account out of step

Resources • Older TP systems supported files, printers, terminal output, data comm • App wrote output normally • OS/TP deferred actual output until transaction committed • Application problem? • Write to database for background app? • Use transactional message queues?
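Deferring output until commit can be sketched as a transaction object that buffers external actions and releases them only when the transaction succeeds. The class below is an illustration under assumed names; a plain Python list stands in for the device (cash dispenser, printer, terminal).

```python
class DeferredOutputTransaction:
    """Hypothetical transaction that holds back output until commit."""
    def __init__(self, device):
        self.device = device      # list standing in for real hardware
        self._pending = []        # output the application has 'written'
        self.state = "active"

    def emit(self, action):
        # The application writes output normally; nothing reaches the device yet.
        self._pending.append(action)

    def commit(self):
        # Actual output happens only once the transaction has committed.
        self.state = "committed"
        self.device.extend(self._pending)
        self._pending.clear()

    def abort(self):
        # A failed transaction produces no output at all.
        self.state = "aborted"
        self._pending.clear()
```

This sketch does not cover a crash between the commit decision and the physical output; the slide's suggestions of a background application reading from the database or a transactional message queue address that gap.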

TP Monitor • Mainframe transaction processing (TP) • CICS, IMS, COMS, TIP, … • Terminals, batch input • Screen formatting • Message-based with transaction code routing • Normally ‘stateless applications’ for scalability

Stateless? • Transaction is a single atomic and complete operation • No data left behind in application • Scalability through multiple copies of server applications • A transaction can be sent to any copy of the application • Copies can be created as necessary

COMS • Burroughs/Unisys TP Monitor (1984+) • [Figure: COMS, a Queue and two TP apps]

Controlling Transactions • Who really controls the transaction? • Client or server? • Explicit vs implicit transactions? • SQL transactions • Started somehow • Tx_begin, method call, SQL statement • Finished by application code • Commit or Abort

API Models • Client-side transaction control • Explicit begin-transaction • Explicit Commit/Abort • Server-side transaction control • Perform one complete business transaction in each call
  Client-side:
    Tx_begin
    Call withdraw(accno, amount)
    Call deposit(accno, amount)
    SQL insert into audit ….
    Tx_commit
  Server-side:
    Dim acc = new Account
    Account.transfer(accfrom, accto, amount)
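The two styles can be compared side by side using Python's `sqlite3` module. This is only a sketch: the table layout, account numbers and audit text are assumptions, and explicit `BEGIN`/`COMMIT` statements stand in for `Tx_begin`/`Tx_commit`.

```python
import sqlite3

def make_db():
    # isolation_level=None turns off implicit transactions, so the code
    # below issues BEGIN / COMMIT / ROLLBACK itself.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (accno TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE audit (note TEXT)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("from", 300), ("to", 0)])
    return conn

def withdraw(conn, accno, amount):
    conn.execute("UPDATE account SET balance = balance - ? WHERE accno = ?", (amount, accno))
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE accno = ?", (accno,)).fetchone()
    if balance < 0:
        raise ValueError("insufficient funds")

def deposit(conn, accno, amount):
    conn.execute("UPDATE account SET balance = balance + ? WHERE accno = ?", (amount, accno))

def client_controlled(conn):
    """Client-side control: the caller brackets several calls itself."""
    conn.execute("BEGIN")
    try:
        withdraw(conn, "from", 100)
        deposit(conn, "to", 100)
        conn.execute("INSERT INTO audit VALUES ('client transfer 100')")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

def transfer(conn, accfrom, accto, amount):
    """Server-side control: one call is one complete business transaction."""
    conn.execute("BEGIN")
    try:
        withdraw(conn, accfrom, amount)
        deposit(conn, accto, amount)
        conn.execute("INSERT INTO audit VALUES (?)", (f"server transfer {amount}",))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

In the server-side style the client never sees a transaction boundary: it calls `transfer` and gets either the whole effect or none of it.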

Transaction Models • Server-side fits with n-tier models • Client just does presentation • Business logic in server transactions • Declarative transactions (EJB, MTS) • No leakage between layers • [Figure: three tiers: Client, Application, Database]

API models • Message-based • Route by transaction code (eg BKFLT) • Very flexible, returns another message • Procedure calls (RPC-based) • Calling transactional procedures with parameters & return values • Method calls • Objects with transactional methods
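Routing by transaction code amounts to a lookup from the code at the front of a message to the program that handles it. The sketch below assumes a made-up text format (`CODE key=value key=value`) and a made-up reply shape; only the `BKFLT` code comes from the slide.

```python
def book_flight(fields):
    """Hypothetical handler for the BKFLT transaction code."""
    return {"code": "BKFLT", "status": "OK",
            "flight": fields["flight"], "seat": fields["seat"]}

# Transaction code -> handler. A real TP monitor would route to a server
# program; plain functions are used here for illustration.
HANDLERS = {"BKFLT": book_flight}

def route(message):
    """Parse the leading transaction code, dispatch, return another message."""
    code, _, payload = message.partition(" ")
    handler = HANDLERS.get(code)
    if handler is None:
        return {"code": code, "status": "UNKNOWN"}
    fields = dict(item.split("=", 1) for item in payload.split())
    return handler(fields)
```

For example, `route("BKFLT flight=QF1 seat=12A")` is dispatched to `book_flight`, while an unregistered code comes back with status `UNKNOWN`.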

Procedure Models • Extension of RPC technologies • Encina • Faded technology, not fashionable • Not object-oriented • Marshalling parameters • Turns function to message & back • Mask differences between platforms • Uses proxies and stubs

Object Models • Extension of ORB technologies • Call methods on remote objects • CORBA standards (OTMs) • Fading technology • An OO model • Objects on servers → state on server • Balancing load? Scalability?

Component Models • Rising star • Microsoft with MTS & COM+ • EJB • Call methods on remote components • Look like objects to clients • Normally stateless • Declarative • Transactional-ness is a property, not coding concern
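"Declarative" means the method is marked as transactional and the surrounding runtime supplies the commit or rollback. MTS, COM+ and EJB do this through component attributes or deployment settings; the Python decorator below is only an analogy, and the `transactional` name, the bookshop tables and the `fail` switch are assumptions made for the example.

```python
import functools
import sqlite3

def transactional(method):
    """Wrap a method so it commits on success and rolls back on any error."""
    @functools.wraps(method)
    def wrapper(conn, *args, **kwargs):
        try:
            result = method(conn, *args, **kwargs)
            conn.commit()
            return result
        except Exception:
            conn.rollback()
            raise
    return wrapper

def open_shop():
    """Create an in-memory bookshop (illustrative schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (title TEXT PRIMARY KEY, copies INTEGER)")
    conn.execute("CREATE TABLE sales (title TEXT)")
    conn.execute("INSERT INTO stock VALUES ('TP Concepts', 3)")
    conn.commit()
    return conn

@transactional
def sell_book(conn, title, fail=False):
    # The business logic contains no begin, commit or abort calls.
    conn.execute("UPDATE stock SET copies = copies - 1 WHERE title = ?", (title,))
    if fail:
        raise ValueError("simulated failure mid-sale")
    conn.execute("INSERT INTO sales VALUES (?)", (title,))
```

The body of `sell_book` reads like ordinary code; whether it runs in a transaction is decided by the marker on the method, not by statements inside it.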

Stateless • Clients think they have server objects • Reference stays around but not object • Object created when method called • Object ‘destroyed’ when method finishes • Reduced server resource usage • Methods can run anywhere • Not bound to server objects • Scales to server farms

More Complications • Security • Implementation models & internals • Scaling questions • Reliability and fault-tolerance • Programming models

Next Week • Databases • How do we get isolation? • How do we recover? • More technical details • Protocols, logging • Recovering from failure
