
Transparent Cross-Border Migration of Parallel Multi Node Applications






Presentation Transcript


  1. Transparent Cross-Border Migration of Parallel Multi Node Applications. Dominic Battré, Matthias Hovestadt, Odej Kao, Axel Keller, Kerstin Voss. Cracow Grid Workshop 2007.

  2. Outline
  HPC4U: Highly Predictable Clusters for Internet-Grids (EC-funded project in FP6). AssessGrid: Advanced Risk Assessment & Management for Trustable Grids (EC-funded project in FP6).
  • Motivation
  • The Software Stack
  • Cross-Border Migration
  • Summary

  3. The Gap between Grid and RMS
  • User asks for an SLA
  • Grid middleware realizes the job by means of the local RMS
  • BUT: RMS offer best effort only
  • Need: SLA-aware RMS
  (Diagram: a user request with an SLA goes to the grid middleware, which needs "Reliability? Quality of Service?"; the RMS instances on machines M1, M2, M3 answer "Best Effort!" where "Guaranteed!" is required.)

  4. HPC4U: Highly Predictable Clusters for Internet-Grids
  Objective: a software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, and acting as an active Grid component.
  Key features:
  • System-level checkpointing
  • Job migration
  • Job types: sequential and MPI-parallel
  • Planning-based scheduling

  5. HPC4U: Planning-Based Scheduling
  (Diagram: queue-based scheduling places new jobs into queues; planning-based scheduling places new jobs directly into a machine/time schedule.)
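
Planning-based scheduling, in contrast to queue-based best effort, assigns every job a concrete slot in the machine/time plan at submission, which is what makes start-time guarantees possible. A minimal sketch of the idea (illustrative only; the function names are hypothetical and not the OpenCCS scheduler API):

```python
# Minimal planning-based scheduling sketch: every job receives a
# concrete start time at submission instead of waiting in a queue.

def free_nodes(reservations, total_nodes, t_start, t_end):
    """Nodes left over during [t_start, t_end) given existing reservations."""
    used = 0
    for (s, e, n) in reservations:
        if s < t_end and e > t_start:   # reservation overlaps the window
            used += n
    return total_nodes - used

def plan_job(reservations, total_nodes, nodes, duration, earliest=0):
    """Earliest start time at which `nodes` nodes are free for `duration`
    time units.  Candidate starts are `earliest` and the end times of
    existing reservations (the only points where availability can rise)."""
    candidates = sorted({earliest} | {e for (_, e, _) in reservations if e > earliest})
    for t in candidates:
        # check availability at t and at every reservation start inside the window
        points = [t] + [s for (s, _, _) in reservations if t < s < t + duration]
        if all(free_nodes(reservations, total_nodes, p, p + 1) >= nodes
               for p in points):
            return t
    return None

# Example: a 4-node machine with one job holding 3 nodes from t=0 to t=10.
existing = [(0, 10, 3)]
print(plan_job(existing, total_nodes=4, nodes=2, duration=5))  # -> 10
print(plan_job(existing, total_nodes=4, nodes=1, duration=5))  # -> 0
```

Because every accepted job has a planned interval, the scheduler can tell at negotiation time whether an SLA's deadline is feasible, rather than discovering it after queueing.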

  6. HPC4U: Software Stack
  (Diagram: user/broker interface and CLI on top of the negotiation component; the RMS with scheduler and SSC; storage, process, and network subsystems on the cluster.)

  7. HPC4U: Checkpointing Cycle
  (Diagram: the RMS coordinating the process, network, and storage subsystems.)
  1. Checkpoint job and halt
  2. In-transit packets captured
  3. Return: "Checkpoint completed!"
  4. Snapshot
  5. Link to snapshot
  6. Resume job
  7. Job running again
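
The seven steps of the cycle can be sketched as a small coordinator routine. This is purely illustrative: the subsystem classes below are hypothetical stand-ins for the HPC4U process, network, and storage components, not their real interfaces.

```python
# Illustrative coordinator for the checkpointing cycle shown above.

class ProcessSubsystem:
    def checkpoint_and_halt(self, job):        # step 1: CP job + halt
        return f"cp-image:{job}"

class NetworkSubsystem:
    def drain_in_transit_packets(self, job):   # step 2: in-transit packets
        return True

class StorageSubsystem:
    def snapshot(self, image):                 # step 4: storage snapshot
        return f"snapshot-of-{image}"

def checkpoint_cycle(rms_log, job, proc, net, storage):
    image = proc.checkpoint_and_halt(job)      # 1. checkpoint job and halt
    net.drain_in_transit_packets(job)          # 2. capture in-transit packets
    rms_log.append("checkpoint completed")     # 3. report back to the RMS
    snap = storage.snapshot(image)             # 4. take the storage snapshot
    rms_log.append(f"link {snap}")             # 5. RMS links checkpoint to snapshot
    rms_log.append(f"resume {job}")            # 6. resume the job
    return snap                                # 7. job is running again
```

The point of the ordering is that the process image, the network state, and the storage state are captured as one consistent unit before the job continues.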

  8. Cross-Border Migration: Intra Domain
  (Diagram: two identical cluster software stacks, each with user/broker interface, CLI, CRM, negotiation, PP, RMS with scheduler and SSC, and the storage, process, and network subsystems on the cluster.)

  9. Cross-Border Migration: Target Retrieval
  (Diagram: the same source and destination cluster stacks; a migration target is determined.)

  10. Cross-Border Migration: Checkpoint Migration
  (Diagram: the same source and destination cluster stacks; the checkpoint data is transferred to the target.)

  11. Cross-Border Migration: Remote Execution
  (Diagram: the same source and destination cluster stacks; the job resumes on the target.)

  12. Cross-Border Migration: Result Migration
  (Diagram: the same source and destination cluster stacks; the result data is transferred back.)

  13. Cross-Border Migration: Using Globus
  • WS-AG implementation based on GT4
  • Developed in the EU project AssessGrid
  • Source specifies SLA / file-staging parameters
  • Subset of JSDL (POSIX jobs)
  • Resource determination via broker
  • Source directly contacts destination
  • Destination pulls migration data via Grid-FTP
  • Destination pushes result data back to source
  • Source uses WSRF event notification
  (Diagram: the cluster stack extended with a WS-AG broker next to the CLI and CRM.)
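
The last four bullets describe the data flow of one migration. A hedged sketch of that sequence, where the pull/push/notify callables stand in for Grid-FTP transfers and WSRF event notification (they are placeholders, not Globus API calls):

```python
# Illustrative walk-through of the cross-border migration data flow.

def migrate(pull, restart_from_checkpoint, push, notify, checkpoint_url):
    data = pull(checkpoint_url)              # destination pulls migration data
    result = restart_from_checkpoint(data)   # remote execution from the checkpoint
    push(result)                             # result data pushed back to the source
    notify("stage-out complete")             # source informed via event notification
    return result

# Example wiring with in-memory stand-ins (URL is made up for illustration):
events = []
result = migrate(
    pull=lambda url: f"checkpoint from {url}",
    restart_from_checkpoint=lambda data: f"results of ({data})",
    push=events.append,
    notify=events.append,
    checkpoint_url="gridftp://source/job42.ckpt",
)
```

Note the asymmetry the slide states: the destination *pulls* the checkpoint but *pushes* the results, so the source only has to wait for the notification.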

  14. Ongoing Work: Introducing Risk Management
  • Topic of the EU project AssessGrid
  • Incorporated in the SLA
  • Provider: estimates the risk of agreeing to an SLA, considers the probability of failure in the schedule, assessment based on historical data
  (Diagram: the stack extended with a risk assessor, consultant service, and monitoring alongside the WS-AG broker, negotiation, PP, RMS with scheduler and SSC, and the storage, process, and network subsystems.)

  15. Summary: Best Effort is not Enough
  Cross-border migration and risk assessment provide new means to increase the reliability of Grid computing.

  16. More Information
  • Read the paper
  • AssessGrid: www.assessgrid.eu
  • HPC4U: www.hpc4u.eu
  • OpenCCS: www.openccs.eu
  Thanks for your attention!

  17. BACKUP

  18. Scheduling Aspects
  • Execution time: exact start time; earliest start time, latest finish time; user provides stage-in files by time X; provider keeps stage-out files until time Y
  • Provisional reservations
  • Job priorities
  • Job suspension

  19. HPC4U

  20. Motivation: Fault Tolerance
  • Commercial Grid users need SLAs
  • Providers are cautious on adoption. Reason: business-case risk, missed deadlines due to system failures, penalties to be paid
  • Solution: prevention with fault tolerance. Fault-tolerance mechanisms are available, but application modification is mandatory, an overall solution (system software, process, storage, file system, network) is required, and the combination with Grid migration is missing

  21. HPC4U Objective
  • Software-only solution for an SLA-aware, fault-tolerant infrastructure, offering reliability and QoS, acting as an active Grid component
  • Key features:
  • Definition and implementation of SLAs
  • Resource reservation for guaranteed QoS
  • Application-transparent fault tolerance

  22. HPC4U: Concept
  • SLA negotiation as an explicit statement of expectations and obligations in a business relationship between provider and customer
  • Reservation of CPU, storage, and network for the desired time interval
  • Job start in a checkpointing environment
  • In case of system failure: job migration / restart with respect to the SLA

  23. Phases of Operation
  (Diagram: timeline of the SLA lifetime: negotiation with acceptance or rejection of the SLA, allocation of system resources, then pre-runtime, runtime with stage-in, computation, and stage-out, and post-runtime.)
  • Negotiation of the SLA
  • Pre-Runtime: configuration of resources (e.g. network, storage, compute nodes)
  • Runtime: stage-in, computation, stage-out
  • Post-Runtime: re-configuration

  24. Phase: Pre-Runtime
  • Task of the pre-runtime phase: configuration of all allocated resources. Goal: fulfill the requirements of the SLA
  • Reconfiguration affects all HPC4U elements:
  • Resource Management System (e.g. configuration of assigned compute nodes)
  • Storage subsystem (e.g. initialization of a new data partition)
  • Network subsystem (e.g. configuration of the network infrastructure)

  25. Phase: Runtime
  • Runtime phase = lifetime of the job in the system: adherence to the SLA has to be assured, and FT mechanisms have to be utilized
  • The phase consists of three distinct steps:
  • Stage-In: transmission of the required input data from the Grid customer to the compute resource
  • Computation: execution of the application
  • Stage-Out: transmission of the generated output data from the compute resource back to the Grid customer

  26. Phase: Post-Runtime
  • Task of the post-runtime phase: re-configuration of all resources (e.g. re-configuration of the network, deletion of checkpoint datasets, deletion of temporary data)
  • Counterpart to the pre-runtime phase: the allocation of resources ends, schedules in the RMS and storage are updated, and the resources become available for new jobs

  27. PROCESS

  28. Subsystems
  • Process subsystem: checkpointing of processes, cooperative checkpointing protocol (CCP)
  • Network subsystem: checkpoint of the network state
  • Storage subsystem: provision of storage, provision of snapshots

  29. STORAGE

  30. Storage Subsystem
  (Diagram: the Virtual Storage Manager (VSM) between the computing interface and storage; the VSM-SR interface maps a logical space with data layout strategies onto the physical space of storage resources 1 and 2.)
  • Functionalities: negotiates the storage part of the SLA, provides storage capacity at a given QoS level, provides FT mechanisms
  • Requirement: manage multiple jobs running on the same SR

  31. Data Container Concept
  • Idea: create a storage environment for applications at a desired QoS level, abstracting from the physical devices
  • Components (diagram): the job's file system issues file I/O (read, write, open, …) to the data container; the data container maps block addresses onto a logical volume with data layout policies (e.g. simple striping); block I/O (read, write, ioctl) goes from the logical space to the physical devices of the storage resource

  32. Data Container Properties
  Storage part of the SLA:
  • Data container section: size; file system type; number of nodes that need to access the data container (private/shared)
  • Performance section: application I/O profile → benchmark; bandwidth (in MB/s or IO/s); or default configuration
  • Dependability section: data redundancy type (within a cluster); snapshot needed or not; data replication or not (between clusters)
  • Job-specific section: job's time to schedule and time to finish
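
The four SLA sections above map naturally onto a plain record type. The field names below are illustrative stand-ins, not the project's actual SLA schema:

```python
# Hypothetical model of the storage part of the SLA described above.
from dataclasses import dataclass

@dataclass
class DataContainerSLA:
    # Data container section
    size_gb: int
    filesystem: str              # file system type, e.g. "ext3"
    accessing_nodes: int         # > 1 implies a shared container
    # Performance section
    bandwidth_mb_s: float        # from a benchmark of the application I/O profile
    # Dependability section
    redundancy: str              # data redundancy type within the cluster
    snapshot: bool               # snapshot needed or not
    replication: bool            # data replication between clusters or not
    # Job-specific section
    time_to_schedule: int
    time_to_finish: int

    @property
    def shared(self) -> bool:
        """Private vs. shared follows from the number of accessing nodes."""
        return self.accessing_nodes > 1
```

Keeping the dependability fields explicit is what lets the negotiation component price in, for example, the snapshot's impact on SR performance.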

  33. Fault-Tolerance Mechanisms
  • RAID: tolerate the failure of one or more disks
  • RAIN: tolerate the failure of one or more nodes
  • Implementation: hardware or software; storage FT mechanisms rely on special data layouts
  • Storage snapshot (software)

  34. Data Container Snapshot
  • Provides an instantaneous copy of data containers
  • Technique used: copy-on-write (COW): create multiple copies of data without duplicating all the data blocks
  • Together with a checkpoint, it allows application restart from a previous running state
  • Impact on SR performance: taken into account at negotiation time
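
The copy-on-write idea can be shown in a few lines: taking a snapshot copies nothing, and a block's old content is preserved only when that block is first overwritten afterwards. A minimal sketch (in-memory toy, not a storage implementation):

```python
# Minimal copy-on-write snapshot sketch.

class COWContainer:
    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))  # live block map
        self.snapshots = []                    # each: {block_no: preserved old data}

    def snapshot(self):
        self.snapshots.append({})              # instantaneous: no data copied
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        for snap in self.snapshots:
            if block_no not in snap:           # first write since that snapshot:
                snap[block_no] = self.blocks[block_no]  # preserve the old block
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        # Preserved block if it was overwritten, otherwise the shared live block.
        return self.snapshots[snap_id].get(block_no, self.blocks[block_no])

c = COWContainer(["a", "b"])
s = c.snapshot()
c.write(0, "A")
print(c.read_snapshot(s, 0), c.blocks[0])  # -> a A
```

This is why the snapshot itself is cheap but steady-state write performance suffers, the "impact on SR performance" the slide says must be considered at negotiation time.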

  35. Snapshot: Single-Node Job Restart After Node Failure
  (Diagram: after a node failure: 1. restore the job's data from the previous snapshot 2. start the data container 3. restore the job's state from the previous checkpoint 4. job restart; the storage resource uses a redundant data layout.)
  Characteristics:
  • The job is running on a single node
  • The data container is private to that node
  • The data container snapshot resides on the same storage resource

  36. Interfaces with Other Components
  (Diagram: the RMS talks to the storage subsystem's VSM over the proprietary VSM-RMS interface; the VSM talks to storage resources (SR) over the VSM-SR interface via wrappers: an open-source client-server wrapper with callbacks for Exanodes data containers, and wrappers for classical storage arrays; communication over the network via socket, RDMA, ….)

  37. ASSESSGRID

  38. Grid Fabric Layer with Risk Assessor
  • Negotiation Manager: Agreement / AgreementFactory web services; checks whether an offer complies with the template; initiation of file transfers
  • Scheduler: creates tentative schedules for offers
  • Risk Assessor
  • Consultant Service: records data
  • Monitoring: runtime behavior

  39. Precautionary Fault Tolerance
  • How many spare resources are available at execution time?
  • Use of a planning-based scheduler

  40. Estimating Risk for a Job Execution
  • Use of a planning-based scheduler
  • How much slack time is available for fault tolerance?
  • How much effort do I undertake for fault tolerance?
  • What is the considered risk of resource failure?
  (Diagram: the execution time plus slack time fit between the earliest start time and the latest finish time.)
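
The timeline on the slide boils down to simple arithmetic: the slack is whatever the SLA window leaves over beyond the pure execution time, and that slack bounds how much fault-tolerance effort (e.g. restarts from a checkpoint) the provider can afford. A sketch with illustrative numbers and a made-up restart-cost model:

```python
# Slack-time reasoning from the slide (all numbers illustrative).

def slack_time(earliest_start, latest_finish, execution_time):
    """Spare time within the SLA window beyond the pure execution time."""
    return (latest_finish - earliest_start) - execution_time

def max_restarts(slack, restart_cost):
    """How many checkpoint restarts the available slack can absorb,
    assuming a fixed cost per restart (a simplifying assumption)."""
    return slack // restart_cost

print(slack_time(0, 100, 60))  # -> 40
print(max_restarts(40, 15))    # -> 2
```

A tight SLA window therefore means little room for fault tolerance, which feeds directly into the risk estimate for agreeing to the SLA.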

  41. Risk Assessment
  • Estimate the risk of agreeing to an SLA:
  • consider the risk of resource failure
  • estimate the risk for a job execution
  • initiate precautionary FT mechanisms
  (Diagram: jobs classified as low risk, middle risk, or high risk.)
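
One way such an estimate could work, consistent with the "assessment based on historical data" mentioned earlier: derive a per-node failure rate from monitoring history, then compute the probability that at least one of the job's nodes fails during the reservation. The exponential failure model and independence assumption below are illustrative, not the project's actual estimator:

```python
# Illustrative probability-of-failure (PoF) estimate from historical data.
import math

def failure_rate(failures, observed_hours):
    """Failures per hour, from the monitoring history."""
    return failures / observed_hours

def pof(rate, hours, nodes):
    """P(at least one of `nodes` nodes fails within `hours`),
    assuming independent, exponentially distributed failures."""
    p_node = 1 - math.exp(-rate * hours)
    return 1 - (1 - p_node) ** nodes

rate = failure_rate(failures=3, observed_hours=3000)   # 0.001 failures/hour
risk = pof(rate, hours=10, nodes=8)                    # roughly 8% for this example
```

Such a number can then be compared against thresholds to bin a job as low, middle, or high risk and to trigger precautionary FT mechanisms.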

  42. Risk Management at Job Execution
  (Diagram: events feed into risk management, which turns decisions into actions; inputs include the risk assessment, the business model (price, penalty), weekend/holiday/workday, the schedule (SLAs, best effort), and redundancy measures.)

  43. Detection of Bottlenecks
  • Consultant Service: analysis of SLA violations, estimated risk for the job, planned FT mechanisms
  • Monitoring information: job, resources
  • Data mining: find connections between SLA violations, detect weak points in the provider's infrastructure

  44. WS-AG

  45. Components
  (Diagram only.)

  46. Implementation with Globus Toolkit 4
  • Why Globus? Utility: authentication, authorization, delegation, RFT, MDS, WS-Notification
  • Impact
  • Problem 1: GRAM (Grid Resource Allocation and Management): a state machine incl. file staging, delegation of credentials, RSL; cannot use it: written for batch schedulers, not for planning schedulers
  • Problem 2: deviations from the WS-AG spec.: different namespaces for WS-A, WS-RF

  47. Implementation with Globus Toolkit 4
  • Technical challenges
  • xs:anyType: wrote custom serializers/deserializers
  • Substitution groups: used in ItemConstraint (creation constraints); cannot be mapped to Java by Axis; replaced by xs:anyType and used as a DOM tree
  • CreationConstraints: namespace prefixes in XPaths are meaningless; need for WSDL and interpretation of xs:all, xs:choice, and friends

  48. Context
  (Sidebar: agreement structure: Context, Terms, Creation Constraints.)
  <wsag:Context>
    …
    <wsag:AgreementInitiator>
      <AG:DistinguishedName>/C=DE/O=…</AG:DistinguishedName>
    </wsag:AgreementInitiator>
    <wsag:AgreementResponder>EPR</…>
    <AG:ServiceUsers>
      <AG:ServiceUser>DN</…>
    </AG:ServiceUsers>
    …
  </wsag:Context>

  49. Terms, SDTs
  • Conjunction of terms
  • Common structure of templates
  • WS-AG is too powerful/difficult to fully support
  • Service Description Term (one): assessgrid:ServiceDescription (extension of the abstract ServiceTermType), containing jsdl:POSIXExecutable (executable, arguments, environment), jsdl:Application ((mis-)used for libraries), jsdl:Resources, jsdl:DataStaging *, assessgrid:PoF (upper bound)

  50. Terms, GuaranteeTerms
  • No hierarchy, but two meta guarantees:
  • ProviderFulfillsAllObligations (e.g. reward: 1000 EUR, penalty: 1000 EUR), violated by no timely execution
  • ConsumerFulfillsAllObligations (e.g. reward: 0 EUR, penalty: 1000 EUR), violated by no stage-out
  • The first violation is responsible for the failure: if there was no hardware problem, then it is a user fault
  • Other guarantees: execution time (any start time (best effort), exact start time, earliest start time / latest finish time); user provides stage-in files by time X; provider keeps stage-out files until time Y
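
The "first violation is responsible" rule is a small but important piece of evaluation logic: whichever party violated a guarantee term earliest is blamed for the failed agreement. A sketch of that rule (illustrative only, not the WS-AG evaluation machinery):

```python
# First-violation-decides rule for the two meta guarantees above.

def responsible_party(violations):
    """violations: list of (time, party, guarantee) tuples.
    The earliest violation decides who failed the agreement;
    None if no violation occurred."""
    if not violations:
        return None
    return min(violations)[1]   # tuples compare by time first

# Hypothetical example: the consumer missed the stage-in deadline
# before the provider missed the execution deadline.
violations = [
    (120, "consumer", "stage-in files by time X"),
    (300, "provider", "timely execution"),
]
print(responsible_party(violations))  # -> consumer
```

This matches the slide's reading: if the late execution was caused by missing stage-in files rather than a hardware problem, the failure counts as a user fault.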
