
SLA Management in AssessGrid




  1. SLA Management in AssessGrid Dominic Battré, TU Berlin

  2. AssessGrid in a Nutshell • Requirement for Service Level Agreements from users • Reluctance to sign SLAs by providers

  3. AssessGrid in a Nutshell

  4. AssessGrid in a Nutshell

  5. AssessGrid in a Nutshell

  6. AssessGrid in a Nutshell [Figure: failed jobs vs. successful jobs for DAS-2, Grid 3, TeraGrid, …] * Statistics from 2005/2006!

  7. AssessGrid in a Nutshell

  8. AssessGrid in a Nutshell • User: • Which provider is reliable? • How reliable is a provider? • Does a provider lie? • Provider: • How reliable am I? • Can I sign SLAs? • Can I improve my reliability?

  9. Agenda • AssessGrid in a Nutshell • Content of SLAs • Demo • Job submission and provider selection • Fault Tolerance • Underlying technology • Negotiation Manager • Risk Assessment and Management • Content of SLAs as WS-Agreement • Future Challenges

  10. Content of SLAs • Participating parties • Job Definition • Scheduling • Executable • File Staging • Acceptable Probability of Failure • Price and penalty • Each job specified with: number of nodes, runtime, earliest start time, latest finish time [Figure: schedule of Jobs 1 to 7, nodes over time]
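The per-job fields listed on this slide could be captured in a small record type. The following is only a sketch: the class name `JobRequest` and all field names are hypothetical and are not taken from the AssessGrid code base.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical record of the per-job SLA fields named on the slide."""
    nodes: int                 # number of nodes
    runtime: timedelta         # requested runtime
    earliest_start: datetime   # job must not start before this
    latest_finish: datetime    # job must have finished by this
    max_pof: float             # acceptable probability of failure, in [0, 1]
    price_eur: float           # price paid on success
    penalty_eur: float         # penalty paid on failure

    def slack(self) -> timedelta:
        """Time left in the scheduling window beyond the bare runtime."""
        return (self.latest_finish - self.earliest_start) - self.runtime

    def is_feasible(self) -> bool:
        """The window must be at least as long as the runtime."""
        return (
            self.nodes > 0
            and 0.0 <= self.max_pof <= 1.0
            and self.slack() >= timedelta(0)
        )
```

A positive `slack()` is what would give a provider room to reschedule or restart a job inside the agreed window.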

  11. Job Submission And Provider Selection

  12. Job Submission and Provider Selection: Specify Job [Diagram: End-User, Broker, Providers] • Program, Input, Output • Acceptable PoF • Penalty in case of failure • Deadline

  13. Job Submission and Provider Selection: Get Quotes [Diagram: End-User, Broker, Providers]

  14. Job Submission and Provider Selection: Get Quotes [Diagram: End-User, Broker, Providers] • Forwarding based on • Matching of templates to request • Quotes created in the past • Performance in the past

  15. Job Submission and Provider Selection: Generate Quotes [Diagram: End-User, Broker, Providers] • Calculate Probability of Failure (PoF) • Calculate required number of spare nodes, extra time • Calculate price • Check available resources in schedule
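The slide does not say how the PoF or the number of spare nodes is computed. One way to illustrate the idea is the toy model below. It rests on assumptions that are not from the talk: nodes fail independently, failures follow an exponential distribution with a known mean time between failures, and a spare node replaces a failed node without any restart cost. All function names are hypothetical.

```python
import math
from typing import Optional


def node_failure_prob(runtime_h: float, mtbf_h: float) -> float:
    """ASSUMPTION: exponential failures; probability a node fails during the run."""
    return 1.0 - math.exp(-runtime_h / mtbf_h)


def job_pof(nodes: int, spares: int, p: float) -> float:
    """Probability that more than `spares` of the nodes + spares machines fail."""
    total = nodes + spares
    survive = sum(
        math.comb(total, k) * p**k * (1.0 - p) ** (total - k)
        for k in range(spares + 1)
    )
    return 1.0 - survive


def spares_needed(nodes: int, p: float, max_pof: float,
                  max_spares: int = 64) -> Optional[int]:
    """Smallest number of spares that keeps the PoF within the accepted bound."""
    for spares in range(max_spares + 1):
        if job_pof(nodes, spares, p) <= max_pof:
            return spares
    return None  # the bound cannot be met with the allowed number of spares
```

Under this model a provider could quote a higher price for a lower accepted PoF, because each additional spare node is capacity that cannot be sold elsewhere.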

  16. Job Submission and Provider Selection: Quotes [Diagram: End-User, Broker, Providers]

  17. Job Submission and Provider Selection: Enhance Quotes [Diagram: End-User, Broker, Providers] • Broker's own estimate of PoF in case of unreliable providers • Ranking performed according to the user's preferences

  18. Job Submission and Provider Selection: Quotes [Diagram: End-User, Broker, Providers]

  19. Job Submission and Provider Selection: Select Provider [Diagram: End-User, Broker, Providers] • Criteria: • Price, PoF, Adjusted PoF • AHP-Ranking

  20. Job Submission and Provider Selection: Get Reputation [Diagram: End-User, Broker, Providers]

  21. DS Analytical Hierarchy Process [Figure: criteria hierarchy] • Top-level criteria: Past Performance, Maintenance, Security, Customer Support, Infrastructure, Experience • Maintenance: Staff 24/7, Staff training/yr, Staff experience • Infrastructure: Red. Power, Red. Storage, Storage Age, …
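To make the AHP ranking step more concrete, the sketch below derives criterion weights from a pairwise-comparison matrix and uses them to score providers. It uses the row geometric-mean method, a common approximation of the principal-eigenvector weights, and not necessarily what AssessGrid implements. The comparison values and provider scores in the usage lines are invented for illustration.

```python
import math
from typing import Dict, List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Weights from a reciprocal comparison matrix (row geometric-mean method)."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def score(provider: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a provider's per-criterion scores (each in [0, 1])."""
    return sum(w * provider[criterion] for criterion, w in weights.items())


# Illustrative only: "Past Performance" judged 3x as important as "Maintenance".
criteria = ["Past Performance", "Maintenance"]
weights = dict(zip(criteria, ahp_weights([[1.0, 3.0], [1.0 / 3.0, 1.0]])))

providers = {
    "A": {"Past Performance": 0.9, "Maintenance": 0.2},
    "B": {"Past Performance": 0.5, "Maintenance": 0.9},
}
ranking = sorted(providers, key=lambda name: score(providers[name], weights),
                 reverse=True)
```

In a full hierarchy the same computation would be repeated per level, with sub-criteria weights (for example the staff-related items under Maintenance) multiplied by the weight of their parent.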

  22. Job Submission and Provider Selection: Create Agreement [Diagram: End-User, Broker, Providers]

  23. Fault Tolerance

  24. Demonstration of Fault Tolerance

  25. Underlying Technology: The Negotiation Manager

  26. Negotiation Manager • Globus Toolkit 4 • Apache 2 License • 2 Flavours • Simple Framework • AssessGrid Implementation (OpenCCS, Risk Assessment, …) • Features • Template Store • Access Control, Credential Delegation • State Management • Staging by GridFTP • Simple Validation of CreationConstraints • Extensible • WS-Notification • Optional: Quote Mechanism • Optional: Cheap Cancellation Extension

  27. Template Store • Optional component • Templates stored persistently in RDBMS • Get, Insert, Delete by WS-RF • Monitoring by WS-Notification • Access policies: • Everybody can read • Admin(s) can modify • Templates used in AssessGrid • Regular Job (POSIX and SPMD) • Out-sourced Job with checkpoint data-set

  28. Access Control • Default: • 3 User Groups • Admins, Owners, Users • Admin has access to everything • Owner is legally responsible • Users have read access • Owner and Users are different in case of SLA outsourcing • Defaults can be overridden • Option to delegate credentials

  29. State Management • Asynchronous, multi-threaded, persistent state management [State diagram; states: Start, Wait for stage-in, Do stage-in, Stage-in done, Wait for execution, Wait for termination, Do stage-out, Stage-out done, Cleanup]

  30. File-staging • Files specified by JSDL • User delegates credentials • User estimates duration • Shorter duration triggers earlier execution • Longer duration triggers later execution • Staging by GridFTP

  31. CreationConstraints • Difficult to support Namespaces: • //wsag:…/assessgrid:… - prefixes are just strings • Very difficult to support structural information • xs:group, xs:all, xs:choice, xs:sequence • Possible but difficult to support xs:restriction • xs:simple • Check for enumeration (xs:restriction of xs:string) • Check for valid dates (xs:restriction of xs:date) • Everything else close to impossible • {min,max}{In,Ex}clusive • totalDigits, fractionDigits, length, … probably useless
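The two checks the slide calls feasible, and the namespace pitfall it mentions, can be illustrated with the Python standard library. This is a sketch and not the Negotiation Manager's validation code: the function names and the `urn:example:wsag` namespace URI are placeholders, and the date check ignores the optional timezone suffix that xs:date allows.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable


def check_enumeration(value: str, allowed: Iterable[str]) -> bool:
    """xs:restriction of xs:string with enumeration facets: simple membership."""
    return value in set(allowed)


def check_date(value: str) -> bool:
    """xs:restriction of xs:date: accept only real calendar dates (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def has_child(xml_text: str, local_name: str, namespace_uri: str) -> bool:
    """Match by namespace URI; the prefix used in the document is irrelevant."""
    root = ET.fromstring(xml_text)
    return root.find("ns:" + local_name, {"ns": namespace_uri}) is not None


# Two documents that differ only in the prefix they bind to the same URI.
DOC_A = '<a:Terms xmlns:a="urn:example:wsag"><a:All/></a:Terms>'
DOC_B = '<b:Terms xmlns:b="urn:example:wsag"><b:All/></b:Terms>'
```

Both `DOC_A` and `DOC_B` contain the same element as far as XML namespaces are concerned, which is why a constraint location written with a fixed prefix such as `//wsag:…` cannot be compared as a plain string.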

  32. Optional Quote Mechanism [Sequence diagram between User and Provider; steps: Get Template, Fill Template, Create Quote, Create Agreement; labels: modify, bound, Yes / No, bound]

  33. Extensible [Figure contrasting two designs, "Not:" and "But:"; labels: WSDL, Black Box, deployed, NegMgr, WSDL Interface, Domain-specific Implementation]

  34. Cancellation Policy • Motivation: • Serious issues of 3-way commit protocol (reservations) • Goal: Cheap Cancellation Policy • “Full refund if product bought online is returned online within 14 days” (German law) • “Cancellation before first day of validity: 15 EUR, after that: not possible” (Deutsche Bahn) • “less than 24 hours before scheduled stay: 50% of first day for cancellation” (hotels)

  35. Cancellation Policy - Rules • Ends of periods: offsets relative to createQuote, createAgreement, or Earliest Start (e.g. +5min, -1d) • Price: relative or absolute (e.g. -80%, 1 EUR)

  36. Cancellation Policy - Combination [Figure: cancellation price over time; time axis: createQuote, createAgreement, +5min, -1d, Earliest Start; price levels: 0.50 EUR, -50%, Full price] • Used in Broker for roll-back of unsuccessful workflow mappings
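A combined policy of this kind amounts to a piecewise function of time. The sketch below is one reading of the figure labels and is an assumption: 0.50 EUR within five minutes of createAgreement, half the full price until one day before the earliest start, and the full price after that. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def cancellation_fee(now: datetime,
                     agreement_created: datetime,
                     earliest_start: datetime,
                     full_price_eur: float) -> float:
    """Fee owed if the agreement is cancelled at time `now` (assumed rules)."""
    # Period 1: cheap cancellation shortly after the agreement was created.
    if now <= agreement_created + timedelta(minutes=5):
        return 0.50
    # Period 2: half price until one day before the earliest start time.
    if now <= earliest_start - timedelta(days=1):
        return 0.5 * full_price_eur
    # Period 3: too late to cancel cheaply; the full price is due.
    return full_price_eur
```

The cheap first period is what would let a broker book resources at several providers for a workflow and roll back the bookings at low cost if the overall mapping fails.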

  37. Context
      <wsag:Context>
        …
        <wsag:AgreementInitiator>
          <AG:DistinguishedName> /C=DE/O=… </AG:DistinguishedName>
        </wsag:AgreementInitiator>
        <wsag:AgreementResponder>…</…>
        <AG:ServiceUsers>
          <AG:ServiceUser>DN</…>
        </AG:ServiceUsers>
        …
      </wsag:Context>

  38. Terms, SDTs • Conjunction of terms • Common structure of templates • WS-AG too powerful/difficult to fully support • Service Description Term (one) • assessgrid:ServiceDescription (extension of abstract ServiceTermType) • jsdl:POSIXExecutable / SPMD (executable, arguments, environment) • jsdl:Resources • jsdl:DataStaging * • assessgrid:PoF (upper bound)

  39. Terms, GuaranteeTerms • No hierarchy but two meta guarantees • ProviderFulfillsAllObligations • e.g. Reward: 1000 EUR, Penalty 1000 EUR • ConsumerFulfillsAllObligations • e.g. Reward: 0 EUR, Penalty 1000 EUR • First violation is responsible for failure • If there is no hardware problem, the failure is the user's fault • Other Guarantees • Execution Time • Any start time (best effort) • Exact start time • Earliest start time, latest finish time • Maximum StageIn/Out time • No Cancellation [Diagram labels: No timely execution, No stage-out]
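The rule "first violation is responsible for failure" can be sketched as picking the earliest recorded violation and charging that party its meta-guarantee penalty. The penalty amounts are the example values from the slide. The `Violation` record and the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Violation(NamedTuple):
    time: float       # when the violation was observed (seconds since start)
    party: str        # "provider" or "consumer"
    guarantee: str    # which guarantee was violated


# Example penalties from the slide, in EUR, per meta guarantee.
PENALTY_EUR = {"provider": 1000, "consumer": 1000}


def responsible_party(violations: List[Violation]) -> Optional[str]:
    """The party behind the earliest violation bears the failure."""
    if not violations:
        return None
    return min(violations, key=lambda v: v.time).party


def penalty_due(violations: List[Violation]) -> int:
    """Penalty owed by the responsible party; 0 if nothing was violated."""
    party = responsible_party(violations)
    return PENALTY_EUR[party] if party is not None else 0
```

For instance, if the consumer misses the maximum stage-in time and the provider then cannot start the job on time, only the consumer's earlier violation counts.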

  40. Stuff I did not talk about • Risk Assessment • Risk Management • Checkpointing details, runtime extension, spare nodes, … • Confidence and Reputation Service • Workflows • Description in WS-Agreement • Mapping to individual SLAs • Simulation tools

  41. Future Challenges • Failure detection and analysis • (Re)negotiation • Risk Assessment • Interoperability of WS-Agreement implementations by micro-specs – or even common template structures • Automatic evaluation of CreationConstraints • Posthumous resolving of disagreements • Third party blaming • Persisting Problems • Dependencies of violated guarantees • Violation caused by third party or unknown cause • Failure/success of entire SLA

  42. http://www.assessgrid.eu
