Fault Tolerance and Performance Analysis in Wireless CORBA • Chen Xinyu • Supervisor: Prof. Michael R. Lyu • Markers: Prof. Jerome Yen, Prof. John C.S. Lui • 2002-12-09
Outline • Motivation • Wireless CORBA • Fault Tolerant Wireless CORBA • Performance and Availability Analysis • Conclusions and Future Work
Motivation • Mobile Computing • Permanent failures: physical damage • Transient failures: mobile host, wireless link, environmental conditions • Fault Tolerant CORBA: entity replication
Wireless CORBA Architecture (figure): Home Domain with the Home Location Agent, Static Hosts, and Access Bridges; Visited Domain with Access Bridges; Terminal Domain in which the mobile host mh1 runs the Terminal Bridge. The Terminal Bridge connects to an Access Bridge (ab1, ab2) through a GIOP Tunnel carrying GTP Messages, and mh1 is shown attached to different Access Bridges as it moves.
Basic Concepts • Checkpoint: the program state saved during failure-free execution • Repair: brings the failed device back to normal operation • Rollback: reloads the program state saved at the most recent checkpoint • Recovery: reprocesses the program from the most recent checkpoint, applying the logged messages, up to the point just before the failure
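The terms above can be illustrated with a minimal Python sketch. The `CheckpointedCounter` class and its running-total "program" are hypothetical and are not taken from the slides. The sketch assumes a program whose state is a single integer and whose messages are integers to add. Repair (bringing the device back) is outside the scope of the sketch.

```python
import copy


class CheckpointedCounter:
    """Toy program (hypothetical): the state is a running total of messages."""

    def __init__(self):
        self.state = {"total": 0}
        # Stable storage: survives a crash in this sketch.
        self._checkpoint = copy.deepcopy(self.state)
        self._log = []  # messages received since the last checkpoint

    def handle(self, message):
        # Log the message before processing it, so it can be replayed later.
        self._log.append(message)
        self.state["total"] += message

    def checkpoint(self):
        # Save the current state; earlier log entries are no longer needed.
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def crash(self):
        # A failure wipes the volatile state only.
        self.state = None

    def rollback(self):
        # Reload the state saved at the most recent checkpoint.
        self.state = copy.deepcopy(self._checkpoint)

    def recover(self):
        # Roll back, then reprocess the logged messages in arrival order.
        self.rollback()
        for message in self._log:
            self.state["total"] += message
```

After `handle(1)`, `handle(2)`, `checkpoint()`, `handle(3)`, and `crash()`, calling `rollback()` restores a total of 3, and `recover()` brings the total back to 6 by replaying the logged message.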
Device, Wireless & Mobile Issues • Device Issues: slow processor, small memory, small disk space, low power supply, physical damage • Wireless Issues: high bit error rate, low bandwidth, long transfer delay • Mobile Issue: handoff • Design points raised: applying the Access Bridge as stable storage; applying the mobile host as stable storage; uncoordinated checkpointing; pessimistic message logging; a large number of system messages or a large amount of information carried in one message; collection of checkpoints and logs
Fault Tolerance Architecture (figure): Mobile Side (Mobile Host with Client Object, ORB, and Terminal Bridge) and Fixed Side (Mobile Support Station with Access Bridge and ORB; Static Server with Object Replica and ORB), linked by a GIOP Tunnel, with Multicast Messages to the object replicas. Recovery Mechanism, Logging Mechanism, and Platform layers appear in the stacks.
Mobile Host Handoff (figure): the Mobile Host hands off between Access Bridge 1, Access Bridge 2, and Access Bridge 3; each handoff is accompanied by a Location Update involving the Home Location Agent.
Mobile Host Crash (figure): Access Bridge 1, Access Bridge 2, Access Bridge 3, and the Home Location Agent.
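Mobile Host Recovery (figure): after the Mobile Host reconnects, its last checkpoint and the succeeding message logs are collected from Access Bridge 1, Access Bridge 2, and Access Bridge 3; the logged messages are sorted by acknowledgement sequence number (Ack. SN) and replayed. The Home Location Agent is also shown.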
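The recovery step (collect the last checkpoint and later logs, sort by Ack. SN, replay) might be sketched as follows. This is only an illustration under stated assumptions. The `AccessBridgeStore` and `LoggedMessage` types, the integer state, and the "add the payload" replay rule are all hypothetical and do not come from the slides or from the Wireless CORBA specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LoggedMessage:
    ack_sn: int   # acknowledgement sequence number (assumed unique per message)
    payload: int  # hypothetical message content: an integer to add to the state


@dataclass
class AccessBridgeStore:
    """Hypothetical view of what one access bridge holds for a mobile host."""
    name: str
    # (Ack. SN covered by the checkpoint, saved state), or None if no checkpoint.
    checkpoint: Optional[Tuple[int, int]] = None
    logs: List[LoggedMessage] = field(default_factory=list)


def recover_state(bridges: List[AccessBridgeStore]) -> int:
    """Rebuild the state from the newest checkpoint plus the later logs."""
    checkpoints = [b.checkpoint for b in bridges if b.checkpoint is not None]
    if checkpoints:
        # The last checkpoint is the one covering the highest Ack. SN.
        base_sn, state = max(checkpoints, key=lambda c: c[0])
    else:
        base_sn, state = 0, 0  # assumption: start from an empty state
    # Collect the logs that follow the checkpoint from every bridge visited.
    later = [m for b in bridges for m in b.logs if m.ack_sn > base_sn]
    # Replay in acknowledgement order, whichever bridge stored the message.
    for message in sorted(later, key=lambda m: m.ack_sn):
        state += message.payload
    return state
```

Sorting by Ack. SN matters because handoffs scatter the logs over several access bridges, so the order in which the logs are collected need not match the order in which the messages were originally processed.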
Assumptions • Failure occurrences, message arrivals, and handoff events each follow a homogeneous Poisson process, with their own respective rate parameters • Failures do not occur while the program is in the repair or rollback process • A failure is detected as soon as it occurs
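As a rough illustration of the first assumption, the sketch below samples events from three independent homogeneous Poisson processes. The rate names and numeric values are hypothetical, since the slide's parameter symbols did not survive extraction. The sketch relies on a standard property: the time to the next event of any kind is exponential with the summed rate, and the kind of event is chosen with probability proportional to its rate.

```python
import random


def next_event(rng, rates):
    """Return (waiting time, event kind) for the next event of any kind.

    `rates` maps an event kind to its Poisson rate (events per unit time).
    """
    total_rate = sum(rates.values())
    waiting_time = rng.expovariate(total_rate)
    kinds = list(rates)
    kind = rng.choices(kinds, weights=[rates[k] for k in kinds])[0]
    return waiting_time, kind


def count_events(rng, rates, horizon):
    """Count events of each kind that occur within `horizon` time units."""
    counts = {kind: 0 for kind in rates}
    now = 0.0
    while True:
        waiting_time, kind = next_event(rng, rates)
        now += waiting_time
        if now > horizon:
            return counts
        counts[kind] += 1
```

For example, `count_events(random.Random(0), {"failure": 0.1, "message": 5.0, "handoff": 0.5}, 2000.0)` should give per-kind counts whose empirical rates (count divided by 2000) are close to the configured ones.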
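Execution without Checkpointing (timeline figure): time axis from 0 to t with message-processing segments m0(N), m1(n1), ..., mj(1), ..., mj(N), handoffs H1, ..., Hk, failures F1, ..., Fj, and the intervals Z0, Y0, X0, and X(N). Legend: R = Repair, H = Handoff.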
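Execution with Equi-number Checkpointing (timeline figure): time axis from 0 to t between checkpoints Ci-1 and Ci, with message-processing segments mi0(a), mi1(ni1), ..., mij(1), ..., mij(a), handoffs Hi(1), ..., Hi(k), failures Fi(1), ..., Fi(j), and the intervals Zi(0), Yi(0), Xi(0), and Xi(N,a). Legend: C = Checkpointing, R+C = Repair + Rollback, H = Handoff.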
Average Availability • Uptime interval: the program produces useful work towards its completion • Downtime interval: repair and rollback, handoff, checkpoint creation, wasted computation • Average availability: the fraction of an execution during which a mobile host (MH) is in an uptime interval
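Read literally, the definition above is a ratio of uptime to total execution time. The sketch below computes it from a list of labelled intervals. The interval labels and the function name are hypothetical, and the slides derive the quantity analytically rather than from a trace like this.

```python
# Downtime categories listed on the slide (the label strings are hypothetical).
DOWNTIME_KINDS = {"repair_rollback", "handoff", "checkpoint", "wasted"}


def average_availability(intervals):
    """Fraction of the execution spent in uptime intervals.

    `intervals` is a list of (kind, duration) pairs, where kind is "up"
    or one of DOWNTIME_KINDS.
    """
    uptime = 0.0
    downtime = 0.0
    for kind, duration in intervals:
        if kind == "up":
            uptime += duration
        elif kind in DOWNTIME_KINDS:
            downtime += duration
        else:
            raise ValueError(f"unknown interval kind: {kind!r}")
    total = uptime + downtime
    if total == 0:
        raise ValueError("the execution has zero total duration")
    return uptime / total
```

With 80 time units of useful work and 20 units of combined downtime, the function returns 0.8.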
Equi-number Checkpointing • With respect to message number: the number of messages in each checkpointing interval is held fixed • With respect to checkpoint number: the number of checkpoints is held fixed
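One possible reading of the two variants is sketched below. Both function names are hypothetical, and the placement of checkpoints (after a message is processed, with a final checkpoint allowed at the very end) is an assumption rather than something the slide states.

```python
def checkpoints_fixed_interval(total_messages, messages_per_interval):
    """Variant 1: checkpoint after every `messages_per_interval` messages.

    Returns the message counts after which a checkpoint is taken.
    """
    if total_messages < 0 or messages_per_interval <= 0:
        raise ValueError("counts must be non-negative and the interval positive")
    return list(range(messages_per_interval, total_messages + 1, messages_per_interval))


def checkpoints_fixed_count(total_messages, checkpoint_count):
    """Variant 2: take exactly `checkpoint_count` checkpoints, spread evenly.

    The interval length now depends on the total number of messages.
    """
    if total_messages < 0 or checkpoint_count <= 0:
        raise ValueError("counts must be non-negative and the checkpoint count positive")
    return [(i * total_messages) // checkpoint_count
            for i in range(1, checkpoint_count + 1)]
```

Under these assumptions, a longer program takes more checkpoints in the first variant, while in the second variant it keeps the same number of checkpoints and each interval grows.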
Average Availability vs. Message Arrival Rate and Handoff Rate
Conclusions • Fault tolerant wireless CORBA • Equi-number checkpointing strategy • LST (Laplace-Stieltjes transform) and expectation of program execution time • Average availability • Optimal checkpointing interval • Beneficial condition
Future Work • Analysis model: the message queuing effect during repair and recovery • Failure detector: distributed consensus with link failures, process failures, and mobile disconnections; leads to a faster solution; reduces communication costs • Fault tolerance in Ad Hoc networks: without infrastructure support; self-organizing and adaptive