HotDep’05
Computational Risk Management for Building Highly Reliable Network Services
Chaki Ng, Brent N. Chun, Philip Buonadonna
Network Service Performance
• Desire for Hard Performance Guarantees
  • “99.999% availability,” “all trades < 30 seconds”
• Difficult to Achieve Consistently
  • Demand: workload varies and can be bursty
  • Supply: resource needs vary and are hard to plan for
• Dedicated and Over-Provisioning
  • $$$, low utilization
• Shared Infrastructure
  • Resource supply varies: competition, failures
  • Trade off supply against performance guarantees
Chaki Ng || Computational Risk Management
Computational Service Provider (CSP)
• Goal: mechanism to manage supply
• Resources (e.g. server nodes)
• Accommodate peak demand of most services
• Markets of nodes
• Each node sells resource contracts
• Spot, futures, options
• Contracts priced based on supply and demand
Measure Risk
• How to quantify performance guarantees
• Risk metrics: simple statistical summaries of undesirable outcomes
• Example: Value-at-Risk (VaR)
  • Finance: “The Fidelity mutual fund will lose no more than $25MM monthly, with 95% probability”
  • Computation: “Amazon.com will process orders in less than 30 seconds daily for 95% of all orders”
• Two challenges: calculate VaR and sensitivity analysis of VaR
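One common way to write VaR formally is as a quantile of the outcome distribution. The symbols below are notation assumed for this sketch and do not appear on the slides: X is a loss-like quantity (monetary loss, or order-processing time) and α is the confidence level.

```latex
\mathrm{VaR}_{\alpha}(X) = \inf\{\, x \in \mathbb{R} : \Pr(X \le x) \ge \alpha \,\}
```

Under this reading, with α = 0.95 and X the order-processing time, a target of VaR at most 30 seconds roughly matches the slide's statement that 95% of all orders are processed within 30 seconds.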
Calculate VaR
• Calculate the expected performance distribution
• Example method: historical
• Other methods: Variance, Monte Carlo, Stress Testing
[Figure: two probability distributions. Left: Fidelity Fund Profit/Loss, 95% VaR: -$27MM. Right: Amazon.com Order Time (0 to 36 seconds), 95% VaR: 33 seconds.]
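The historical method can be sketched as an empirical quantile over past observations. The function name `historical_var`, the nearest-rank rule, and the sample data are assumptions made for illustration; the slides do not specify how the quantile is taken.

```python
import math


def historical_var(samples, confidence=0.95):
    """Nearest-rank empirical quantile: the smallest observed value v such
    that at least `confidence` of the samples are <= v."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(samples)
    # 1-based rank; the small epsilon guards against float round-up
    # when confidence * n is mathematically an integer.
    rank = max(1, math.ceil(confidence * len(ordered) - 1e-9))
    return ordered[rank - 1]


# Hypothetical order-processing times in seconds (made-up data).
order_times = [12, 15, 9, 22, 30, 18, 14, 35, 11, 16,
               13, 19, 21, 10, 17, 20, 25, 28, 33, 8]

var_95 = historical_var(order_times, 0.95)
print(f"95% VaR of order time: {var_95} seconds")
```

A service would compare the result against its target (for example 30 seconds) to decide whether its current resources are sufficient.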
Compute VaR: Model Supply and Demand
[Diagram: VaR is computed from four inputs: Own Service Workload Forecast; Supply: Set of Accessible Node Resources; Node Performance and Trade Forecast; Aggregate Workload Forecast.]
Sensitivity Analysis of VaR
• Goal: model how VaR varies as the set of resource contracts changes
• VaR = F(set of resource contracts)
• Forecast demand and supply
• Nodes and aggregate workload forecast
• Own client workload forecast
• Model portfolio VaR
• Swap set of resource contracts
• Calculate VaR improvements
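The relation VaR = F(set of resource contracts) can be illustrated with a toy Monte Carlo model. Everything below is an assumption made for this sketch rather than the authors' model: the contract fields (`capacity`, `availability`), the rule that order time equals demand divided by the capacity of the nodes that are up, and all numbers.

```python
import math
import random


def historical_var(samples, confidence=0.95):
    """Nearest-rank empirical quantile of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(confidence * len(ordered) - 1e-9))
    return ordered[rank - 1]


def simulate_order_times(contracts, workload, trials=2000, seed=0):
    """Toy model: each trial draws which nodes are up and one demand level;
    order time = demand / total capacity of the nodes that are up."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    times = []
    for _ in range(trials):
        capacity = sum(c["capacity"] for c in contracts
                       if rng.random() < c["availability"])
        demand = rng.choice(workload)
        times.append(demand / capacity if capacity > 0 else math.inf)
    return times


def portfolio_var(contracts, workload, confidence=0.95):
    """F(set of resource contracts): VaR of the simulated order time."""
    return historical_var(simulate_order_times(contracts, workload), confidence)


def swap_improvement(portfolio, sell, buy, workload):
    """VaR reduction from selling the contract named `sell` and buying `buy`."""
    before = portfolio_var(portfolio, workload)
    swapped = [c for c in portfolio if c["name"] != sell] + [buy]
    return before - portfolio_var(swapped, workload)


# Hypothetical contracts and workload forecast (made-up numbers).
portfolio = [
    {"name": "Node1", "capacity": 4, "availability": 0.90},
    {"name": "Node2", "capacity": 5, "availability": 0.95},
    {"name": "Node3", "capacity": 5, "availability": 0.95},
]
node4 = {"name": "Node4", "capacity": 8, "availability": 0.98}
workload = [200, 250, 300, 350, 400]

gain = swap_improvement(portfolio, "Node1", node4, workload)
print(f"VaR change from swapping Node1 for Node4: {gain:.2f} seconds")
```

A positive value means the swap lowers the 95% order time; repeating the call over candidate swaps gives a crude sensitivity table.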
Portfolio Management
• Goal: meet target VaR within budget and minimal cost
• Continuous portfolio optimization
• Find available set of resources
• Find sets that achieve best VaR
• Trade resource contracts
• Buy best set within budget
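One pass of the optimization loop above can be sketched as a brute-force search over affordable sets of contracts. This is an illustrative assumption, not the authors' algorithm: `choose_portfolio`, the offer fields, and the stand-in `toy_var` model are hypothetical, and exhaustive search only scales to a handful of offers.

```python
from itertools import combinations


def choose_portfolio(offers, var_of, target_var, budget):
    """Return the cheapest affordable set that meets the target VaR, or,
    if none meets it, the affordable set with the lowest VaR.
    Returns None when nothing is affordable."""
    best = None
    for size in range(1, len(offers) + 1):
        for combo in combinations(offers, size):
            cost = sum(o["price"] for o in combo)
            if cost > budget:
                continue
            var = var_of(combo)
            # Sets meeting the target rank first and are ordered by cost;
            # the rest are ordered by VaR.
            key = (0, cost, var) if var <= target_var else (1, var, cost)
            if best is None or key < best[0]:
                best = (key, combo, cost, var)
    if best is None:
        return None
    _, combo, cost, var = best
    return {"nodes": [o["name"] for o in combo], "cost": cost,
            "var": var, "meets_target": var <= target_var}


def toy_var(combo):
    """Stand-in VaR model (assumption): 300 units of work / total capacity."""
    return 300 / sum(o["capacity"] for o in combo)


# Hypothetical market offers (made-up prices and capacities).
offers = [
    {"name": "Node1", "price": 50, "capacity": 8},
    {"name": "Node2", "price": 40, "capacity": 6},
    {"name": "Node3", "price": 35, "capacity": 5},
    {"name": "Node4", "price": 30, "capacity": 9},
]

choice = choose_portfolio(offers, toy_var, target_var=30, budget=100)
print(choice)
```

In a continuous setting this selection would be rerun as prices and forecasts change, with the difference between the current and chosen sets turned into buy and sell orders.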
Finance: Manage Portfolio VaR
Target VaR: “The Fidelity mutual fund will lose no more than $25MM monthly with 95% probability.”
[Figure: probability distribution of Fidelity profit/loss, marked 95% VaR: -$27MM. The VaR portfolio holds MSFT, ORCL, EBAY, and IBM and trades on the financial markets: Sell IBM @ $75, Buy EBay @ $37.]
Computation: Manage Portfolio VaR
Target VaR: “Amazon.com will process orders in less than 30 seconds for 95% of all orders.”
[Figure: probability distribution of Amazon.com order time (0 to 36 seconds), marked 95% VaR: 33 seconds. The VaR portfolio holds Node1, Node2, Node3, and Node4 and trades on the CSP: Sell Node1 @ $50, Buy Node4 @ $30.]
Open Problems
• Resource Contracts: pricing, base units
• Programming: model, API
• Modeling Supply and Demand
• Portfolio Strategies: “standard portfolios”
• Interoperability: across different CSPs
Conclusion
• Dedicated vs. shared
• CSP: share resources via markets
• Achieve performance goals in the context of shared CSP
• Quantify performance goal via risk metrics like VaR
• Calculation and sensitivity analysis
• Portfolio optimization
Backup Slides
Simple Experiment
[Diagram: Service Workload, Failover, Node Failures]
Each request tries N nodes randomly
If both nodes are down, the request fails
Daily Service Availability = Successful Requests / All Requests
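The experiment can be approximated with a short simulation. The failure model is an assumption, since the slide does not describe it: here each node is independently down with probability `p_down` at the moment of each request, and a request fails only when every node it tries is down. The parameter values are made up; the 100 requests per hour and 100 daily runs follow the Results slide.

```python
import random


def daily_availability(num_nodes, tries, p_down, requests_per_hour=100, rng=None):
    """Simulate one day and return successful requests / all requests."""
    rng = rng or random.Random()
    total = 24 * requests_per_hour
    successes = 0
    for _ in range(total):
        # Assumed failure model: independent per-request node outages.
        down = [rng.random() < p_down for _ in range(num_nodes)]
        # Failover: the request tries `tries` distinct nodes chosen at random.
        picked = rng.sample(range(num_nodes), tries)
        if any(not down[i] for i in picked):
            successes += 1
    return successes / total


rng = random.Random(42)  # fixed seed so the sketch is repeatable
runs = [daily_availability(num_nodes=4, tries=2, p_down=0.1, rng=rng)
        for _ in range(100)]
print(f"mean daily availability over {len(runs)} runs: {sum(runs) / len(runs):.4f}")
print(f"worst day: {min(runs):.4f}")
```

Sorting `runs` and reading off a low percentile would give an availability VaR in the sense used earlier in the talk.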
Results
• Each point: 100 daily runs, 100 requests/hr