Computational Economics and Job-Specific Service Level Agreements

Bin Li and Dr. Lee Gillam, Department of Computing, FEPS

Outline
• Introduction
• Service Level Agreement
• Proposed SLA structure: based on the WS-Agreement standard
• Aim: to build a price comparison service for the computational market
• Potential of the computational market
• Analogy: financial market
• Relevant literature
• Financial risk management, portfolio theory
• Value-at-Risk
• VaR Monte Carlo portfolio simulation evaluations
• Constructing job-specific SLAs
• Building probability of failure
• Building probability of completion
• Building job-specific charges
• Managing multiple job-specific SLAs (providers)
• Conclusion and future work

Service Level Agreement (SLA)
• Negotiated agreement document
• Legal service contract: rights and liabilities
• Requirements
• Charges
• Legal issues
• Penalty
• Long and boring, with lots of legal terms

Google Apps (SLA) (standard edition agreement)
… ANY USE (of APP service) THEREOF SHALL BE AT CUSTOMER'S OWN RISK. GOOGLE AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND … NON-INFRINGEMENT. GOOGLE ASSUMES NO RESPONSIBILITY FOR THE PROPER USE OF THE SERVICE. … GOOGLE MAKES NO REPRESENTATION THAT GOOGLE (OR ANY THIRD PARTY) WILL ISSUE UPDATES OR ENHANCEMENTS TO THE SERVICE. GOOGLE DOES NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE SERVICE WILL BE UNINTERRUPTED OR ERROR FREE.
(SLA) During the Term of the applicable Google Apps Agreement, the Google Apps Covered Services web interface will be operational and available to Customer at least 99.9% of the time in any calendar month (the "Google Apps SLA").
If Google does not meet the Google Apps SLA, and if Customer meets its obligations under this Google Apps SLA, Customer will be eligible to receive the Service Credits (not money back but 3 to 15 days longer service) ... Customer must notify Google within thirty days from the time Customer becomes eligible to receive a Service Credit. Failure to comply with this requirement will forfeit Customer’s right to receive a Service Credit.

AWS (SLA)
(EC2) AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit.
(S3) AWS will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the “Service Commitment”). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit (10% to 25% of your monthly bill).
“The test for commercially reasonable efforts is less stringent than that imposed by the ‘best efforts’ clauses contained in some agreements.” -- http://definitions.uslegal.com/c/commercially-reasonable-efforts/
To receive a Service Credit, you must submit a request (i) include your account number … (ii) include … the dates and times of each incident of Region Unavailable that you claim to have experienced including instance ids of the instances that were running and affected during the time of each incident; (iii) include your server request logs … (iv) … within thirty (30) business days …
99.95% availability = 0.05% of 365 days = 0.1825 days/year down ≈ 4.4 hours/year down
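The downtime that an uptime percentage still permits follows directly from the length of the commitment period. A minimal sketch of that arithmetic, assuming a 365-day service year and a 30-day billing month (both assumptions; the function name is illustrative and not taken from either SLA):

```python
HOURS_PER_DAY = 24


def allowed_downtime_hours(availability_pct: float, period_days: float) -> float:
    """Hours of downtime in the period that still meet the availability target."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * HOURS_PER_DAY


# EC2 commitment: 99.95% over a service year (assumed here to be 365 days).
ec2_yearly = allowed_downtime_hours(99.95, 365)
# S3 and Google Apps commitment: 99.9% per month (assumed here to be 30 days).
monthly = allowed_downtime_hours(99.9, 30)

print(f"99.95% per year:  {ec2_yearly:.2f} hours ({ec2_yearly / HOURS_PER_DAY:.4f} days)")
print(f"99.9%  per month: {monthly:.2f} hours")
```

The same function can be used to compare commitments quoted over different periods on a common hourly basis.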

Job-specific SLA
• Describes the services for a particular submitted task
• Server management: SLAs are created automatically and dynamically as user demand changes; a per-job SLA, unlike ITIL (a general, continual SLA)
• The concept and practice of SLAs bring the notion of risk management into the computational market
• Ensures QoS: acts as a contract between providers and users, negotiated with brokers
• Clarifies the business nature and the parties’ obligations
• Two frameworks:
  • Web Services Agreement (WS-Agreement, OGF GRAAP): part of Service-Oriented Architecture (SOA), XML syntax
  • Web Service Level Agreement (WSLA): IBM

SLA Structure
• SDTs (Service Description Terms): identify the work to be done
  • the required platform;
  • the software involved;
  • the set of expected arguments;
  • input/output resources;
  • etc.
• GTs (Guarantee Terms): provide assurance between provider and requester on quality of service (QoS)
  • price of the service;
  • insurance price;
  • the probability of failure;
  • the penalty for failure;
  • the starting time;
  • the probability of completion;
  • etc.
[Figure: SLA structure, XML example]
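The two groups of terms can be pictured as a small data structure. The following is only a sketch: the class and field names are hypothetical, the values are made up, and it does not reproduce the WS-Agreement XML schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ServiceDescriptionTerms:
    """SDTs: identify the work to be done (field names are hypothetical)."""
    platform: str
    software: str
    arguments: List[str] = field(default_factory=list)
    input_resources: List[str] = field(default_factory=list)
    output_resources: List[str] = field(default_factory=list)


@dataclass
class GuaranteeTerms:
    """GTs: assurances on quality of service (field names are hypothetical)."""
    price: float
    insurance_price: float
    probability_of_failure: float
    penalty_for_failure: float
    start_time: str
    probability_of_completion: float

    def __post_init__(self) -> None:
        # Probabilities must be valid before the SLA is offered to anyone.
        for name in ("probability_of_failure", "probability_of_completion"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


@dataclass
class JobSpecificSLA:
    sdt: ServiceDescriptionTerms
    gt: GuaranteeTerms


# Illustrative values only; none of them come from the presentation.
sla = JobSpecificSLA(
    sdt=ServiceDescriptionTerms(
        platform="linux-x86_64",
        software="var-monte-carlo",
        arguments=["--paths", "100000"],
        input_resources=["portfolio.csv"],
        output_resources=["var_report.txt"],
    ),
    gt=GuaranteeTerms(
        price=12.50,
        insurance_price=1.25,
        probability_of_failure=0.03,
        penalty_for_failure=6.00,
        start_time="2010-01-01T09:00:00Z",
        probability_of_completion=0.97,
    ),
)
print(asdict(sla))
```

Keeping the guarantee terms in their own type makes it straightforward to regenerate them per job while the service description stays fixed.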

Basic Grid Service Use Case

Aim: provide the same kind of comparison service for compute resources. Goods: tasks with job specification. Retailers: resource providers. Invoice: SLA. Other factors: insurance, risk or availability confidence level etc.

Grid, Utility, Cloud Computing: Potential Market
• Biggest structural change in IT since the 1960s.
• TechMarketView: by 2012, 15% of the UK software market will be delivered by Cloud (22% are applications).
• Computational market: economics issues such as pricing and liability are absent.
[Diagram: in the computational market, service level agreements and risk assessment build on resource monitoring and time series analysis; by analogy, the financial market offers financial derivatives, derivatives risk analysis and financial risk management measures.]

Background and literature:
• Financial Grids:
  • Macleod G., Donachy P., Harmer T.J., Perrot R. H., Conlon B., Press J., Lungu F., “Implied Volatility Grid: Grid Based Integration to Provide On Demand Financial Risk Analysis”, Belfast e-Science Centre, Queen’s University of Belfast, 2005.
  • Donachy P., Stødle D., “Risk Grid - Grid Based Integration of Real-Time Value-at-Risk (VaR) Services”, EPSRC UK e-Science All Hands Meeting, 2003.
  • Germano G., Engel M., “City@home: Monte Carlo derivative pricing distributed on networked computers”, Grid Technology for Financial Modelling and Simulation, 2006.
  • Schumacher J., Jaekel U., and Zimmermann F., “Grid Services for Derivatives Pricing”, Grid Technology for Financial Modelling and Simulation, 2006.
• Grid economics:
  • Gray, J. (2003): Distributed Computing Economics. Microsoft Research Technical Report MSR-TR-2003-24 (also presented at Microsoft VC Summit 2004, Silicon Valley, April 2004)
  • Chetty, M. and Buyya, R. (2002). Weaving electrical and computational grids: How analogous are they? Computing in Science and Engineering, to appear, May/June 2002.
  • Kenyon, C. and Cheliotis, G. (2002). Architecture requirements for commercializing grid resources. In 11th IEEE International Symposium on High Performance Distributed Computing (HPDC'02).
  • Kenyon, C. and Cheliotis, G. (2003), Grid Resource Commercialization: Economic Engineering and Delivery Scenarios. Grid Resource Management: State of the Art and Research Issues.
  • Voss, K., Djemame, K., Gourlay, I. and Padgett, J. (2007), AssessGrid, Economic Issues Underlying Risk Awareness in Grids, LNCS, Springer Berlin / Heidelberg
  • Birkenheuer, G., Hovestadt, M., Voss, K., Kao, O., Djemame, K., Gourlay, I., Padgett, J.: Introducing Risk Management into the Grid. Proc. 2nd IEEE Intl. Conf. on e-Science and Grid Computing, Amsterdam, The Netherlands (2006)

Comparison

Grid for Financial Risk Analysis
• Risk fact:
  • Risk is an integral part of the real world in general, and the financial world in particular.
• Market:
  • Grid infrastructures in Bank of America and HSBC: 3000 to 6000 processors
  • Computational services market: customers willing to pay for the use of computer systems instead of purchasing and maintaining hardware and software.
  • Grid / Cloud: HP, Amazon, Sun, IBM etc.
• Financial risk management:
  • Monetary based: losses or profits.
  • Risk can only be reduced (mitigated), never eliminated.
  • Fundamental risk management theory: portfolio (diversification).
  • Ensures that a market event has a reduced impact on the whole portfolio
  • Depends on the correlation or covariance of each asset's return with those of the other assets.
  • Diversified portfolio: standard deviation of each asset; correlation among assets
  • Useful analysis measurements (models): mean-variance; correlation; the sensitivities (the Greeks); Value-at-Risk
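How diversification depends on the standard deviation of each asset and the correlation among assets can be illustrated with the standard portfolio variance formula, sigma_p^2 = sum over i, j of w_i w_j sigma_i sigma_j rho_ij. The sketch below uses made-up weights and volatilities; the function name is illustrative.

```python
import math
from typing import List


def portfolio_std(weights: List[float], stds: List[float],
                  corr: List[List[float]]) -> float:
    """Portfolio standard deviation from weights, asset stds and a correlation matrix."""
    n = len(weights)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            variance += weights[i] * weights[j] * stds[i] * stds[j] * corr[i][j]
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Two equally weighted assets, each with a 20% standard deviation (illustrative).
weights = [0.5, 0.5]
stds = [0.2, 0.2]

for rho in (1.0, 0.3, 0.0):
    corr = [[1.0, rho], [rho, 1.0]]
    print(f"correlation {rho:+.1f}: portfolio std = {portfolio_std(weights, stds, corr):.4f}")
```

With perfectly correlated assets the portfolio is as risky as each asset on its own; lower correlation reduces the portfolio standard deviation, which is the effect diversification relies on.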

Value-at-Risk (VaR)
• As defined by Philippe Jorion, Value-at-Risk “summarizes the worst maximum potential loss in value of a portfolio of financial instruments over a certain target horizon with a given level of confidence”.
• Three components:
  • Confidence level (quantile)
  • Holding period (time horizon)
  • Monetary base
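The three components map directly onto the inputs of a simple historical-simulation VaR: the confidence level picks a quantile of the losses, the holding period is the period over which each return is measured, and the monetary base scales returns into money. A minimal sketch, using made-up daily returns and one common quantile convention (other conventions interpolate between observations):

```python
import math
from typing import Sequence


def historical_var(returns: Sequence[float], monetary_base: float,
                   confidence: float) -> float:
    """Loss not exceeded with the given confidence, based on observed returns.

    Each return is assumed to cover one holding period (here: one day).
    """
    losses = sorted(-r * monetary_base for r in returns)
    # Rank of the quantile; rounding first avoids floating-point surprises in ceil().
    rank = math.ceil(round(confidence * len(losses), 9))
    return losses[max(rank, 1) - 1]


# Hypothetical one-day returns of a portfolio (not real market data).
daily_returns = [-0.10, -0.05, -0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04]

var_90 = historical_var(daily_returns, monetary_base=1000.0, confidence=0.90)
print(f"One-day 90% VaR on a base of 1000: {var_90:.2f}")
```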

Value-at-Risk (VaR)

Value-at-Risk (VaR)
[Figures: Monte Carlo simulation using Condor DAG; comparison of methods]

VaR Monte Carlo Simulation Evaluation
[Figures: MSC speedup for a single financial instrument; MSC speedup for an option-free financial portfolio]
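Monte Carlo VaR parallelises naturally because each batch of simulated paths is independent, which is what makes it a candidate for distribution across compute nodes. The sketch below runs the batches sequentially in one process and assumes normally distributed one-period returns; the parameters, batch layout and function names are illustrative and are not the Condor DAG setup evaluated in the slides.

```python
import random
from typing import List


def simulate_losses(n_paths: int, value: float, mu: float, sigma: float,
                    seed: int) -> List[float]:
    """One independent batch of simulated one-period losses (normal returns assumed)."""
    rng = random.Random(seed)
    return [-value * rng.gauss(mu, sigma) for _ in range(n_paths)]


def monte_carlo_var(value: float, mu: float, sigma: float, confidence: float = 0.99,
                    batches: int = 4, paths_per_batch: int = 25_000) -> float:
    """Merge the batches and read the VaR off the empirical loss quantile."""
    losses: List[float] = []
    for batch in range(batches):
        # Each batch depends only on its own seed, so batches could run as separate jobs.
        losses.extend(simulate_losses(paths_per_batch, value, mu, sigma, seed=batch))
    losses.sort()
    index = min(len(losses) - 1, int(confidence * len(losses)))
    return losses[index]


# Illustrative inputs: portfolio worth 1,000,000 with a 2% one-period volatility.
mc_var = monte_carlo_var(value=1_000_000.0, mu=0.0, sigma=0.02)
print(f"Simulated 99% VaR: {mc_var:,.0f}")
```

Under the normal assumption the result can be checked against the closed-form quantile, which is a useful sanity test before moving to instruments with no analytical solution.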

The Bridge
[Diagram linking service-based Financial Grids, complex financial products and markets, compute resources, risk-balanced portfolios, computational economics and risk analysis; edge labels: provide, construct, develop possible formulation]


The Bridge
• Grid-based financial risk analysis applications (Financial Grids):
  • Place great demands on available resources;
  • Assume availability at any given time.
• Aim:
  • Ability to predict (risks of resource availability for) the predictability (risks on historical use portfolio).
• Major impetus for the work:
  • Uncertainty: availability of computation resources
  • Predicting future resource availability: computation resource monitoring

Building probability of failure
• Closest work: Voss et al.: risk-aware Grid architecture.
  • Voss, K., Djemame, K., Gourlay, I. and Padgett, J., “AssessGrid, Economic Issues Underlying Risk Awareness in Grids”, LNCS, Springer Berlin / Heidelberg, 2007
• Specific financial analysis for creating a computation economy over queuing-based systems.
• Computation economy as a commodity market;
• Due considerations:
  • 1. For trading and hedging of risk, options, futures and structured products.
  • 2. Collecting data: historical computation resource use -> predict future resource use for such a class of applications.
  • 3. Construction of portfolios of computer resources (extending financial models (CDOs) offers potential for a future market in computation economics).
• Diversify the risk (resource probability of failure) within the overall portfolio.

Predict Future Resource Availability
• Analysis of historical Grid resource usage:
  • Data source: UK’s National Grid Service (NGS)
  • Monitoring system: Ganglia
  • Grid middleware: Globus
  • Data dimensions: 37 system metrics in XML, including network bandwidth use, temperature and CPU use
  • Minimum capture interval: 15 seconds
• Measurements:
  • Distribution analysis
  • Skewness and kurtosis analysis
• Prediction:
  • Simulation under a normal distribution assumption
  • Simulation under a Laplace distribution assumption
[Figures: CPU usage (real time, year data); CPU usage (changes, year data); CPU usage (changes, MC simulated, normal)]
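The measurement and prediction steps can be sketched as follows: take the changes in a usage series, compute their moments, then simulate new changes under a normal and under a Laplace assumption with the same mean and standard deviation. The usage values below are placeholders rather than NGS data, and the function names are illustrative.

```python
import math
import random
from typing import List, Tuple


def moments(xs: List[float]) -> Tuple[float, float, float, float]:
    """Mean, standard deviation, skewness and excess kurtosis (population versions)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def simulate_normal(mean: float, std: float, n: int, seed: int) -> List[float]:
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


def simulate_laplace(mean: float, std: float, n: int, seed: int) -> List[float]:
    """Laplace draws as the difference of two exponentials with mean b = std / sqrt(2)."""
    rng = random.Random(seed)
    b = std / math.sqrt(2.0)
    return [mean + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]


# Placeholder CPU usage readings in percent (hypothetical, not monitoring output).
cpu_usage = [41.0, 43.5, 42.0, 47.0, 46.5, 52.0, 49.0, 48.5, 55.0, 53.0]
changes = [later - earlier for earlier, later in zip(cpu_usage, cpu_usage[1:])]

mean, std, skew, kurt = moments(changes)
print(f"observed changes: mean={mean:.3f} std={std:.3f} skew={skew:.3f} excess kurtosis={kurt:.3f}")

normal_sim = simulate_normal(mean, std, 100_000, seed=1)
laplace_sim = simulate_laplace(mean, std, 100_000, seed=1)
print(f"normal  simulation excess kurtosis: {moments(normal_sim)[3]:.2f}")
print(f"Laplace simulation excess kurtosis: {moments(laplace_sim)[3]:.2f}")
```

The Laplace distribution has heavier tails than the normal (excess kurtosis of 3 against 0), so comparing the observed kurtosis with both simulations indicates which assumption fits the usage changes better.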

Building job-specific charges
Price comparison service. Aim: computation resource price benchmark. Amazon Web Services: successful Cloud business model; computation resource cost in a real market.

Price benchmark • Reliability: Of 64 instances in 10 experiments, only 7 completed (1 failing node in other 3)

Building probability of completion
Foster’s Hypothesis

[Figure labels: 234s, 106s, 76s]

Performance
• Is a Cloud better than a Supercomputer?
• Grid/HPC: shorter application runtimes with narrower distributions
• Cloud: longer application runtimes with wider distributions, but ready and relatively easy to use.

Managing multiple SLAs
• Future commercialized computational market: multiple providers (SLAs)
• Collateralized Debt Obligations (CDOs)
• Structured transaction
• Generic CDO:
• Special Purpose Vehicle (SPV)
• Underlying assets
• Collateral management
• Tranche management
• Risk-identified chunks: tranches (ranked in the order in which they are secured to be paid, e.g. AAA, AA, BBB, BB and equity)
• Premium: basis points for each tranche
[Figures: financial CDO; CDO components]

Constructing Resource CDO
• Processes:
  • Sort the resources in the system into different classes according to historical information.
  • Assign different basis points (premiums) to guarantee various levels of performance.
  • The top class of resources should carry the highest premium, to insure the highest availability and performance.
[Figure: resource CDO]
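The sorting step can be sketched as ranking resources by their historical failure rate and cutting the ranking into tranches. Everything concrete below is an assumption made for illustration: the node names, failure rates, premium values and the equal-size split are hypothetical; only the tranche labels and the rule that the top class carries the highest premium come from the slides.

```python
import math
from typing import Dict, List, Tuple

# (label, premium in basis points); premiums are made-up values, highest for the top class.
TRANCHES: List[Tuple[str, int]] = [
    ("AAA", 500), ("AA", 300), ("BBB", 200), ("BB", 100), ("Equity", 0),
]


def build_resource_cdo(failure_rates: Dict[str, float]) -> Dict[str, dict]:
    """Rank resources from most to least reliable and split them into equal-size tranches."""
    ranked = sorted(failure_rates, key=failure_rates.get)
    size = math.ceil(len(ranked) / len(TRANCHES))
    cdo = {}
    for k, (label, premium_bps) in enumerate(TRANCHES):
        cdo[label] = {
            "premium_bps": premium_bps,
            "resources": ranked[k * size:(k + 1) * size],
        }
    return cdo


# Hypothetical historical failure rates per node (fraction of failed jobs).
history = {
    "node01": 0.12, "node02": 0.01, "node03": 0.30, "node04": 0.05, "node05": 0.02,
    "node06": 0.08, "node07": 0.20, "node08": 0.03, "node09": 0.15, "node10": 0.50,
}

resource_cdo = build_resource_cdo(history)
for label, tranche in resource_cdo.items():
    print(f"{label:6s} {tranche['premium_bps']:4d} bps  {tranche['resources']}")
```

Re-running the construction whenever monitoring data changes is one way the tranches could be kept current; a threshold-based split on failure rate would be an equally plausible alternative to equal-size tranches.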

Managing multiple SLAs (Autonomic SLAs)
• Dynamically alter themselves as the resource status changes.
• Strongly connected to the resource CDO, and therefore to the monitoring system.
• Also consider the situation in which a job in a tranche fails.
• The more expensive, lower-risk submission is always guaranteed completion.
• Protects the processes in the more senior tranches.
• Protects the brokers.
• Multiple providers? Future Grid and Cloud computing will benefit.

Conclusion and future work
• Analogy:
  • Financial price changes – Computation resource usage changes
  • Financial risk management – Risk assessment in computation market
  • Financial derivatives – Service level agreement
• Build computation economy:
  • Key: binding autonomic SLAs with risk analysis
  • Aim: computation price comparison
  • Measuring risk: predict the predictability (future resource availability)
  • Risk mitigation: resource CDOs
• Initial steps:
  • predict future resource availability (probability of failure);
  • build probability of completion;
  • build job-specific service price benchmark;
  • construct resource CDO.

Conclusion and future work
• Future work objectives and contributions
• To produce a methodology for calculating and evaluating resource portfolio risk of failure.
  • Done: VaR, Black-Scholes option model implementations, sensitivities, correlation and moments analysis.
  • Further work: research into expected shortfall and related financial models; provide an algorithm for calculating resource portfolio risk of failure.
• To construct an algorithm that creates on-the-fly resource tranches (resource CDO).
  • Done: general portfolio selection techniques, historical data collection from NGS, future resource availability simulation under normal and Laplace assumptions.
  • Further work: understand risk analysis for more complicated financial derivatives, obtain long-term Grid historical data from NGS, and finally create an algorithm for constructing the resource CDO.
• Automatic creation of SLAs.
  • Further work: WS-Agreement standards with XML practice; an application for automatically creating SLAs.
• To adapt the use of resource portfolio risk of failure and the resource CDO to create autonomic SLAs.
  • Combine all the objectives above.
• Extend our previous analysis into Cloud computing.

Further references
• Li, B., Gillam, L., and O'Loughlin, J. (2010) Towards Application-Specific Service Level Agreements: Experiments in Clouds and Grids, In Antonopoulos and Gillam (Eds.), Cloud Computing: Principles, Systems and Applications. Springer-Verlag.
• Li, B. and Gillam, L. (2009), Grid Service Level Agreements using Financial Risk Analysis Techniques, In Antonopoulos, Exarchakos, Li and Liotta (Eds.), Handbook of Research on P2P and Grid Systems for Service-Oriented Computing: Models, Methodologies and Applications. IGI Global.

Thank you for your attention. Questions?
