Behavior Isolation in Enterprise Systems Mohamed Mansour mansour@cc.gatech.edu
Travel Industry Example [Diagram: Client 1, Client 2, Client 3, clearinghouse, message queue, GDS, airlines]
GDS Scale • Mission critical environment • 24/7 • 11.5 million queries/day • 2-16 seconds processing time • ~10 GB data set, 20% annual increase • 8 updates per day, moving to seamless updates
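To put the scale figures in perspective, the following back-of-the-envelope sketch converts them into an average query rate, a rough number of requests in flight, and a projected data set size. It assumes load is spread evenly over the day (real traffic will have peaks) and applies Little's law (in-flight = arrival rate × time in system) to the 2 s and 16 s bounds; the function name `data_set_gb` and the 5-year horizon are illustrative only.

```python
# Figures taken from the slide; everything derived from them is an estimate.
QUERIES_PER_DAY = 11.5e6
SECONDS_PER_DAY = 24 * 60 * 60

# Average arrival rate, assuming evenly spread load (an assumption).
rate = QUERIES_PER_DAY / SECONDS_PER_DAY  # queries per second

# Little's law: average requests in flight = rate * processing time.
in_flight_low = rate * 2     # if every query took 2 seconds
in_flight_high = rate * 16   # if every query took 16 seconds


def data_set_gb(years, start_gb=10.0, growth=0.20):
    """Projected data set size after `years` of compound annual growth."""
    return start_gb * (1 + growth) ** years


print(f"average rate: {rate:.1f} queries/s")
print(f"requests in flight: {in_flight_low:.0f} to {in_flight_high:.0f}")
print(f"data set after 5 years: {data_set_gb(5):.1f} GB")
```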
Why Do We Care? • Business • Consumer loyalty • Violates contractual agreements • Technical • Occurs even in highly engineered systems • Can cause ripple effects
Let's Just Fix It! • Difficult to identify root cause • Constant data changes • Request stream dependency • Sometimes can’t fix root cause • 3rd-party libraries • Interactions with OS and H/W caches • Complex code base
I(solation) Queue • Dynamic management of message streams • Correlate message sequences with server behavior (learning phase) • Isolate undesired sequences (control phase) • Evaluation metrics: Quality of Information (QoI) metrics
Learning Phase • Use online learning methods • Statistical correlation [ICSOC 06] • HMM [GIT-CERCS-06-11] • Behavior Model • Associate undesired behaviors with certain input patterns
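The idea of associating undesired behaviors with input patterns can be illustrated with a simple online frequency count. This is only a minimal sketch and not the statistical-correlation or HMM method of the cited reports: the class name `BehaviorModel`, the choice of pattern (a pair of consecutive message labels), and the sample labels are all assumptions made for illustration.

```python
from collections import defaultdict


class BehaviorModel:
    """Online estimate of how often each input pattern precedes bad behavior."""

    def __init__(self):
        self.seen = defaultdict(int)  # pattern -> times observed
        self.bad = defaultdict(int)   # pattern -> times followed by undesired behavior

    def observe(self, pattern, undesired):
        """Update the counts with one (pattern, outcome) observation."""
        self.seen[pattern] += 1
        if undesired:
            self.bad[pattern] += 1

    def risk(self, pattern):
        """Observed fraction of undesired outcomes; 0.0 for unseen patterns."""
        n = self.seen.get(pattern, 0)
        return self.bad.get(pattern, 0) / n if n else 0.0


# Hypothetical observations: (previous label, current label) -> undesired?
model = BehaviorModel()
model.observe(("A", "B"), True)
model.observe(("A", "B"), True)
model.observe(("A", "A"), False)
model.observe(("B", "A"), True)
model.observe(("B", "A"), False)
```

Because the counts are updated one observation at a time, the model can keep learning while the system runs, which is the property the slide asks of an online method.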
Control Phase • Observe input message sequence • Control sequence dispatched to each server to maintain QoI • Dispatcher • Reordering messages in queue
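Reordering messages in the queue can be sketched as a bounded look-ahead: the dispatcher scans the first few queued messages and picks the first one that does not form an undesired sequence with what the server processed last. This is a hypothetical illustration; the function `pick_next`, the `window` parameter, and the `is_undesired` predicate (which would come from the learned behavior model) are assumptions, not part of the original system.

```python
from collections import deque


def pick_next(queue, last_msg, is_undesired, window=4):
    """Remove and return the next message to send to a server.

    Scans up to `window` messages from the head of `queue` and returns the
    first one for which is_undesired(last_msg, msg) is False. If none
    qualifies, falls back to the head of the queue so no message starves
    indefinitely behind the window. Raises IndexError if the queue is empty.
    """
    for i in range(min(window, len(queue))):
        msg = queue[i]
        if not is_undesired(last_msg, msg):
            del queue[i]
            return msg
    return queue.popleft()


# Hypothetical rule: switching to a different label is the undesired sequence.
def switches_label(prev, msg):
    return prev is not None and prev != msg


pending = deque(["eu", "asia", "eu"])
chosen = pick_next(pending, "asia", switches_label)
```

The bounded window is a design choice that limits how far a message can be overtaken, trading some isolation quality for fairness.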
I-Queue Applied to Worldspan Pricing Engine • Affects customer relations • Possible impact on consumer experience: fewer options • Objective: return maximum number of alternate fares • Problem • Variable number of alternate fares for same query • Root cause unknown
Establishing Behavior Model • Heuristics point to query geographies • Geography based on From/To city pair, e.g. East Coast to EU • Fare data stored in disk files separated by geography • Use geo-locality as our predictor • Goal: improve geo-locality
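Deriving a geography from a From/To city pair can be sketched as a table lookup. The airport codes, region names, and the `REGION_OF` table below are hypothetical placeholders; the actual geography codes and mapping used in the study are not given in the slides. Treating the pair as directional is also an assumption.

```python
# Hypothetical airport-to-region table (illustrative values only).
REGION_OF = {
    "ATL": "US-EAST",
    "JFK": "US-EAST",
    "LHR": "EU",
    "CDG": "EU",
    "NRT": "ASIA",
}


def geography(origin, destination):
    """Return a geography code for a From/To city pair, or None if unknown."""
    try:
        return f"{REGION_OF[origin]}>{REGION_OF[destination]}"
    except KeyError:
        return None


east_coast_to_eu = geography("ATL", "LHR")
same_geo = geography("JFK", "CDG")
```

Requests whose city pairs map to the same code would read the same fare data files, which is why the code can serve as a locality predictor.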
Modified Queue Dispatcher • Dispatcher maintains server execution history • Request routed to an available server with matching geography [Diagram: message queue, GDS]
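A dispatcher of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the class `GeoDispatcher` is hypothetical, the execution history is reduced to the last geography each server processed, and a request with no matching idle server simply goes to the first idle one.

```python
class GeoDispatcher:
    """Routes each request to an idle server that last served the same geography."""

    def __init__(self, server_ids):
        # Execution history, simplified to the last geography per server.
        self.last_geo = {s: None for s in server_ids}
        self.idle = list(server_ids)

    def dispatch(self, geo):
        """Assign a request; returns (server_id, matched) or None if all are busy."""
        if not self.idle:
            return None
        for s in self.idle:
            if geo is not None and self.last_geo[s] == geo:
                chosen, matched = s, True
                break
        else:
            # No idle server has matching history: take the first idle one.
            chosen, matched = self.idle[0], False
        self.idle.remove(chosen)
        self.last_geo[chosen] = geo
        return chosen, matched

    def release(self, server_id):
        """Mark a server as available again after it finishes a request."""
        self.idle.append(server_id)


d = GeoDispatcher(["s1", "s2"])
first = d.dispatch("EU")     # no history yet, so no match
second = d.dispatch("ASIA")
d.release("s2")
d.release("s1")
third = d.dispatch("EU")     # s1 last served EU, so it is preferred
```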
Evaluation • Used real traces from Worldspan • Set of about 1800 requests • 20% process in 16 seconds • Geography extracted from messages • Hand-coded mapping from city pairs to geography code • Processing times measured using Worldspan servers • Completely static environment • Simulations to measure geo-matching • Compare different isolation points
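A simulation of geo-matching can be sketched along the following lines. It is a toy model, not the simulator used in the evaluation: it assumes one request arrives per tick, every request takes the same `service_ticks`, and the baseline hands each request to the server that has been idle longest. The function `simulate` and the synthetic trace are hypothetical; the real study used Worldspan traces.

```python
import random


def simulate(trace, n_servers, service_ticks, geo_aware):
    """Return the fraction of requests served by a server whose previous
    request had the same geography.

    Requires n_servers >= service_ticks so an idle server always exists.
    """
    if n_servers < service_ticks:
        raise ValueError("need n_servers >= service_ticks in this toy model")
    if not trace:
        return 0.0
    last_geo = [None] * n_servers
    free_at = [0] * n_servers
    matches = 0
    for tick, geo in enumerate(trace):
        # Idle servers, longest-idle first (the baseline's choice).
        idle = sorted((s for s in range(n_servers) if free_at[s] <= tick),
                      key=lambda s: free_at[s])
        chosen = idle[0]
        if geo_aware:
            # Prefer an idle server whose last request had this geography.
            for s in idle:
                if last_geo[s] == geo:
                    chosen = s
                    break
        if last_geo[chosen] == geo:
            matches += 1
        last_geo[chosen] = geo
        free_at[chosen] = tick + service_ticks
    return matches / len(trace)


# Synthetic trace of hypothetical geography codes, fixed seed for repeatability.
rng = random.Random(0)
trace = [rng.choice(["G1", "G2", "G3", "G4", "G5"]) for _ in range(1800)]
baseline_rate = simulate(trace, n_servers=8, service_ticks=4, geo_aware=False)
geo_rate = simulate(trace, n_servers=8, service_ticks=4, geo_aware=True)
```

Running the same trace with different `n_servers` values gives a rough way to explore how match rate depends on farm size in this simplified setting.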
Improvement in Geo-locality • Matching improves 6-fold at the minimum farm size • Matching can improve further by adding more servers
Choosing the Right Metrics to Monitor • Min. of 28 servers to avoid queuing delays • Geo-match increases with more servers • Queuing delay is not the best metric to monitor