This project focuses on implementing a real-time failover mechanism for a Blackjack game application, ensuring continuous gameplay for users. The architecture is fault-tolerant, with passive replication and efficient failover strategies. Performance evaluation shows significant improvements in failover times. The setup includes multiple servers, a Replication Manager, and client-server communication enhancements. Further improvements in failover time and runtime efficiency are suggested. Open issues include GUI enhancements, load balancing with Replication Manager, and performance profiling.
Team 2: The HouseParty Blackjack
Mohammad Ahmad, Jun Han, Joohoon Lee, Paul Cheong, Suk Chan Kang
Team Members
• Hwi Cheong (Paul) hcheong@andrew.cmu.edu
• Mohammad Ahmad mohman@cmu.edu
• Joohoon Lee jool@ece.cmu.edu
• Jun Han junhan@andrew.cmu.edu
• SukChan Kang sckang@andrew.cmu.edu
Baseline Application
• Blackjack game application
  • User can create tables and play Blackjack.
  • User can create/retrieve profiles.
• Configuration
  • Operating System: Linux
  • Middleware: Enterprise JavaBeans (EJB)
  • Application Development Language: Java
  • Database: MySQL
  • Servers: JBoss
  • J2EE 1.4
Baseline Architecture
• Three-tier system
• Server completely stateless
• Server name hard-coded into clients
• Every client talks to HostBean (session bean)
Fault-Tolerant Design
• Passive replication
• Completely stateless servers
  • No need to transfer state from primary to backup
  • All state stored in the database
  • Only one instance of HostBean (session bean) is needed to handle multiple client invocations, which is efficient on the server side
• Degree of replication depends on the number of available machines
• Sacred machines
  • Replication Manager (chess)
  • MySQL database (mahjongg)
  • Clients
Replication Manager
• Responsible for server availability notification and recovery
• Server availability notification
  • Server notifies Replication Manager during boot.
  • Replication Manager pings each available server periodically.
• Server recovery
  • Process fault: pinging fails; Replication Manager reboots the server by sending a script to the machine.
  • Machine fault (crash fault): pinging fails; sending the script does nothing; the machine has to be booted and the server has to be launched manually.
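The availability bookkeeping described on this slide can be sketched in plain Java. This is only an illustration under stated assumptions, not the team's code: the class name ServerRegistry, its methods, and the ping and recover callbacks are all hypothetical. The ping is passed in as a predicate and the restart script as a callback so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of the Replication Manager's availability bookkeeping. */
class ServerRegistry {
    private final Set<String> available = new LinkedHashSet<>();

    /** Called when a server announces itself during boot. */
    synchronized void register(String server) {
        available.add(server);
    }

    /**
     * One ping round: every server that fails the ping is dropped from the
     * available set and handed to the recovery action (for example, a
     * restart script). Returns the servers dropped in this round.
     */
    synchronized List<String> sweep(Predicate<String> ping, Consumer<String> recover) {
        List<String> failed = new ArrayList<>();
        // Iterate over a copy so that removal does not disturb the iteration.
        for (String server : new ArrayList<>(available)) {
            if (!ping.test(server)) {
                available.remove(server);
                failed.add(server);
                recover.accept(server);
            }
        }
        return failed;
    }

    /** Snapshot of the servers currently believed to be alive. */
    synchronized List<String> availableServers() {
        return new ArrayList<>(available);
    }
}
```

In a running system, sweep would be called periodically (for instance from a scheduled task), and a recovered server would re-register itself on boot, as the slide describes.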
Replication Manager (cont’d)
• Client-RM communication
  • Client contacts Replication Manager each time it fails over.
  • Client quits when Replication Manager returns no server or Replication Manager can’t be reached.
Failover Mechanism
• Server process is killed.
• Client receives a RemoteException.
• Client contacts Replication Manager and asks for a new server.
• Replication Manager gives the client a new server.
• Client retries the invocation on the new server.
• Replication Manager sends a script to recover the crashed server.
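The client side of these steps can be sketched as a retry loop. This is a minimal illustration, not the project's client: Host and ReplicationManager are hypothetical stand-ins for the remote HostBean interface and the Replication Manager lookup, and only java.rmi.RemoteException is taken from the slides.

```java
import java.rmi.RemoteException;

/** Hypothetical stand-in for the remote HostBean interface. */
interface Host {
    String getPlayerName() throws RemoteException;
}

/** Hypothetical stand-in for the Replication Manager lookup. */
interface ReplicationManager {
    /** Returns a live server, or null if none is available. */
    Host nextServer();
}

class FailoverClient {
    private final ReplicationManager rm;
    private Host current;

    FailoverClient(ReplicationManager rm) {
        this.rm = rm;
        this.current = rm.nextServer();
    }

    /** Invokes getPlayerName(), failing over whenever a RemoteException is thrown. */
    String getPlayerName() {
        while (current != null) {
            try {
                return current.getPlayerName();
            } catch (RemoteException e) {
                // The server is presumed dead: ask the Replication Manager
                // for a replacement and retry the same invocation.
                current = rm.nextServer();
            }
        }
        // Matches the slide: the client gives up when no server is returned.
        throw new IllegalStateException("no server available");
    }
}
```

Detection here relies entirely on the exception arriving, so the failover time is bounded only if the exception is raised promptly.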
Failover Experiment Setup
• 3 servers initially available
• Replication Manager on chess
• 30 fault injections
• Client keeps making invocations until 30 failovers are complete.
• 4 probes on server, 3 probes on client to calculate latency
Failover Experiment Result
[Figure: latency (ms) vs. invocation number]
Failover Experiment Results
• Maximum jitter: ~700 ms
• Minimum jitter: ~300 ms
• Average failover time: ~404 ms
Failover Pie Chart
• Most of the latency comes from receiving the exception from the failed server and connecting to the new server.
Real-time Fault-Tolerant Baseline Architecture Improvements
• Failover time improvements
  • Saving the list of servers in the client
    • Reduces time spent communicating with the Replication Manager
  • Pre-creating host beans
    • Client creates host beans on all servers as soon as it receives the list from the Replication Manager
• Runtime improvements
  • Caching on the server side
Client-RM and Client-Server Improvements
• Client-RM and client-server communication
  • Client contacts Replication Manager each time it runs out of servers, to receive a list of available servers.
  • Client connects to all servers in the list and creates a host bean on each, then starts the application with one server.
  • During each failover, client connects to the next server in the list.
  • No looping inside the list
  • Client quits when Replication Manager returns an empty list of servers or Replication Manager can’t be reached.
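The client-side server list can be sketched as follows. The class ServerList and its connect callback are hypothetical names used for illustration; the callback stands in for whatever lookup creates a host bean on a given server. The sketch shows the two properties stated above: handles are created eagerly for every server in the list, and failover only moves forward, never wrapping around.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical client-side list of pre-created host handles. */
class ServerList<H> {
    private final List<H> hosts = new ArrayList<>();
    private int index = 0;

    /** Eagerly creates one handle per server name received from the Replication Manager. */
    ServerList(List<String> serverNames, Function<String, H> connect) {
        for (String name : serverNames) {
            hosts.add(connect.apply(name));
        }
    }

    /** Returns the handle in use, or null once the list is exhausted. */
    H current() {
        return index < hosts.size() ? hosts.get(index) : null;
    }

    /** Moves to the next handle without wrapping; returns null when none is left. */
    H failover() {
        if (index < hosts.size()) {
            index++;
        }
        return current();
    }
}
```

When failover() returns null, the client would go back to the Replication Manager for a fresh list, and quit if that list is empty.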
Real-time Server
• Caching in server
  • Saves commonly accessed database data in the server
  • Uses a HashMap to map each query to previously retrieved data
  • O(1) expected lookup time for cached data
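A query cache of this kind can be sketched with java.util.HashMap. The class QueryCache and the database callback are hypothetical; the callback stands in for the real MySQL access. The sketch does not handle invalidation, so it assumes cached rows do not change while the server is running, and the methods are synchronized only because this plain-Java example may be shared across threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical server-side cache mapping a query string to its result. */
class QueryCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database;

    /** The database function stands in for the real query execution. */
    QueryCache(Function<String, String> database) {
        this.database = database;
    }

    /** Returns the cached result, querying the database only on a miss. */
    synchronized String lookup(String query) {
        String result = cache.get(query);
        if (result == null) {
            result = database.apply(query);
            cache.put(query, result);
        }
        return result;
    }
}
```

A hit costs one hash lookup instead of a database round trip, which is where the O(1) claim comes from.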
Real-time Failover Experiment Setup
• 3 servers initially available
• Replication Manager on chess
• 30 fault injections
• Client keeps making invocations until 30 failovers are complete.
• 4 probes on server, 5 probes on client to calculate latency and naming service time
• Client probes
  • Probes around getPlayerName() and getTableName()
  • Probes around getHost() (for failover)
• Server probes
  • Record source of invocation (name of method)
  • Record invocation arrival and result return times
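A probe around a client call can be as simple as two timestamps. The sketch below is an assumption about how such a probe might look, not the instrumentation used in the experiment: LatencyProbe and its record format (method name, elapsed microseconds) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical probe that records how long each wrapped call takes. */
class LatencyProbe {
    private final List<String> records = new ArrayList<>();

    /** Runs the call, recording "method,elapsedMicros" even if the call throws. */
    <T> T time(String method, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            records.add(method + "," + elapsedMicros);
        }
    }

    /** Returns the records collected so far, in call order. */
    List<String> records() {
        return records;
    }
}
```

Wrapping getPlayerName(), getTableName(), and getHost() in time(...) would yield one record per invocation, which can then be plotted against the invocation number.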
Real-time Failover Experiment Results
[Figure: latency (ms) vs. invocation number]
Real-time Failover Experiment Results
• Average failover time: 217 ms
  • Roughly half (about 54%) of the failover latency without the improvements (404 ms)
• Non-failover RTT is visibly lower (shown in the graphs)
[Figures: latency before and after the real-time implementation]
Open Issues
• Blackjack game GUI
• Load balancing using the Replication Manager
• Multiple clients per table (JMS)
• Profiling on JBoss to help improve performance
• Generating a more realistic workload
• TimeoutException
Conclusions
• What we have accomplished
  • Fault-tolerant system with automatic server detection and recovery
  • Our real-time implementations improved both failover time and general performance.
• What we have learned
  • Merging code can be a pain.
  • A stateless bean is accessed by multiple clients.
  • State can exist even in stateless beans and is useful if accessed by all clients: the cache!
• What we would do differently
  • Start evaluation earlier…
  • Put more effort and time into implementing timeouts to enable bounded detection of server failure.