Outline

An overview of current directions and status for resource selection and contract monitoring in the Cactus "Worm" migration architecture, covering the resource selection request and response protocol, resource "scheduling", contract monitoring, and the migration manager.

Presentation Transcript


  1. Outline • Resource Selection: Current Directions • Contracts: Current Directions • Current Status • Resource Selection • Request Protocol • Response Protocol • Resource “Scheduling” • Contracts • Migration Manager

  2. Resource Selection Current Directions

  3. Cactus Worm Server [Architecture diagram. Components: Cactus Flesh; “Worm” Migration Module; User Supplied Application Payload; External GridFTP Server (Source); GridFTP Client Thorn; External GridFTP Server (Destination); Performance Degradation Detection; External Resource Selection Service; Resource Selection Client Thorn; Migration Logic Manager. Legend: External Processes; Thorns; Cactus Application Unit; Current Architecture; Under Development; Data transfer.]

  4. GRIS’s Resource Selector Architecture: UCSD [Diagram labels: (UCSD) Resource Selection Client Thorn; Request in ClassAds format; Protocol? HTTP? SOAP?; Response (format?); HFA/GradsSoft Translator; MDS; Resource Selection Library UCSD (HFA/GradsSoft); NWS.]

  5. GRIS’s Resource Selector Architecture: ClassAds [Diagram labels: (ClassAds) Resource Selection Client Thorn; Protocol? HTTP? SOAP?; Request in ClassAds format; Response (format?); UTk Project; NWS; Resource Selection Engine; MDS; Needed for recovery and timeliness?; ClassAds library.]

  6. Resource Selector Architecture: Other RS’s [Diagram labels: (Other) Resource Selection Client Thorn; Request in some format; Protocol? HTTP? SOAP?; Response in some format; Other Resource Selection Service.]

  7. Contract Monitoring Current directions

  8. Contract Monitor • Driven by three user-controllable parameters • Time quantum for “time per iteration” • % degradation in time per iteration (relative to prior average) before noting violation • Number of violations before migration • Potential causes of violation • Competing load on CPU • Computation requires more processing power: e.g., mesh refinement, new subcomputation • Hardware problems

  9. Contract Monitor Details • The end user specifies several variables. • These variables can be changed during runtime by contacting the application with an HTTP interface. • These variables include: • time quantum • % degradation • number of violations before migration • The system will then calculate the average wall clock time per iteration for each time quantum. • If the average iteration in any time quantum has lower performance (by the percentage specified) than the average for all the other previous quanta, then a violation is noted.
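
The check described on this slide can be sketched as follows. This is a minimal illustration, not the Cactus implementation: the class and method names are hypothetical, and it assumes that the baseline is the mean of the averages of all previous quanta and that a quantum closes once its iterations add up to at least the time quantum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContractMonitor:
    """Hypothetical sketch of the contract check; all names are illustrative."""
    quantum_seconds: float   # time quantum over which iteration times are averaged
    degradation_pct: float   # % slowdown (vs. prior average) that counts as a violation
    max_violations: int      # violations tolerated before migration is requested
    violations: int = 0
    _past_averages: List[float] = field(default_factory=list)
    _current: List[float] = field(default_factory=list)

    def record_iteration(self, seconds: float) -> None:
        """Record the wall clock time of one iteration."""
        self._current.append(seconds)
        if sum(self._current) >= self.quantum_seconds:
            self._close_quantum()

    def _close_quantum(self) -> None:
        average = sum(self._current) / len(self._current)
        if self._past_averages:
            # Assumption: baseline is the mean of all previous quantum averages.
            baseline = sum(self._past_averages) / len(self._past_averages)
            if average > baseline * (1 + self.degradation_pct / 100):
                self.violations += 1
        self._past_averages.append(average)
        self._current = []

    def should_migrate(self) -> bool:
        """True once more than the allowed number of violations has been noted."""
        return self.violations > self.max_violations
```

Because the three parameters are plain attributes, changing them at runtime (as the slide describes via an HTTP interface) would amount to assigning new values between iterations.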

  10. Actions Taken on Contract Violation • Occurs when more than the specified number of violations have been noted • New set of resources requested from the ResourceSelector • Checkpoints application • Moves checkpoint data to the new resources along with other data needed for restart • Restarts application on the new resources
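
The sequence of actions on this slide can be sketched as a single function. The four callables are hypothetical stand-ins for the ResourceSelector request, the checkpoint routine, the data transfer, and the restart; the early return when no resources come back is an assumption, since the slide does not say what happens in that case.

```python
from typing import Callable, List


def migrate(request_resources: Callable[[], List[str]],
            checkpoint: Callable[[], str],
            transfer: Callable[[str, List[str]], None],
            restart: Callable[[List[str]], None]) -> bool:
    """Run the migration steps in order; return True if migration happened."""
    machines = request_resources()      # new set of resources from the selector
    if not machines:
        return False                    # assumption: keep running where we are
    checkpoint_path = checkpoint()      # write application state
    transfer(checkpoint_path, machines) # move checkpoint and restart data
    restart(machines)                   # resume on the new resources
    return True
```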

  11. Current Status

  12. Resource Selection • Demonstrated migration using RS with a simple protocol (using raw sockets). • Working on a more robust protocol over HTTP, with ClassAds requests and XML responses • Robustness (error handling) is critical on a real grid • Important to use a well-known protocol • Working on incorporating the performance model into ClassAds

  13. Resource Selection: Example Input [ Type = "request"; Owner = "dangulo"; RequiredDomains = {"cs.uiuc.edu", "ucsd.edu"}; Requirements = other.opSys == "LINUX" && other.minMemSize > (100G / other.CPUCount) && Include(other.domains, RequiredDomains); Rank = other.minCPUSpeed * other.CPUCount / (other.maxCPULoad + 1); ]

  14. Resource Selection: Input • Need to specify other user-centric information • Cactus is installed in user space • We’re investigating whether we can put the Performance Model equations into the ClassAds format in order to pass them to the Resource Selector. • The “Rank” value in the preceding slide shows a simple example of this.
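
To make the “Rank” expression concrete, here is a small sketch that evaluates the same formula over a few candidate resources and orders them best first. The candidate names and attribute values are invented for illustration; only the formula comes from the example request.

```python
def rank(ad: dict) -> float:
    """Rank = minCPUSpeed * CPUCount / (maxCPULoad + 1), as in the example request."""
    return ad["minCPUSpeed"] * ad["CPUCount"] / (ad["maxCPULoad"] + 1)


# Hypothetical resource descriptions (values are made up).
candidates = [
    {"name": "cluster-a", "minCPUSpeed": 1000, "CPUCount": 8, "maxCPULoad": 1.0},
    {"name": "cluster-b", "minCPUSpeed": 800, "CPUCount": 16, "maxCPULoad": 3.0},
    {"name": "cluster-c", "minCPUSpeed": 500, "CPUCount": 4, "maxCPULoad": 0.0},
]

# Higher rank is better: fast, numerous, lightly loaded CPUs score highest.
best_first = sorted(candidates, key=rank, reverse=True)
```

The `+ 1` in the denominator keeps the expression defined for idle machines, where `maxCPULoad` is 0.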

  15. Resource Selection: Example output <virtualMachine> <result statusCode="200" statusMessage="OK"/> <machineList> <machine dns="amajor.cs.uiuc.edu" processor="1"/> <machine dns="bmajor.cs.uiuc.edu" processor="1"/> <machine dns="cmajor.cs.uiuc.edu" processor="1"/> <machine dns="dmajor.cs.uiuc.edu" processor="1"/> <machine dns="emajor.cs.uiuc.edu" processor="1"/> <machine dns="fmajor.cs.uiuc.edu" processor="1"/> <machine dns="hmajor.cs.uiuc.edu" processor="1"/> </machineList> </virtualMachine>

  16. Resource Selection: Example output (no resource is found) <virtualMachine> <result statusCode="204" statusMessage="No match Resource is Found"/> <machineList> </machineList> </virtualMachine>

  17. Resource Selection: Example output (bad request from client, i.e. request format error) <virtualMachine> <result statusCode="400" statusMessage="Bad Request"/> <machineList> </machineList> </virtualMachine>

  18. Resource Selection: Example output (MDS server is down) <virtualMachine> <result statusCode="601" statusMessage="MDS Service is not available"/> <machineList> </machineList> </virtualMachine>
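
A client could read these responses roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`, not the Resource Selection Client Thorn itself; the function name is hypothetical, and it assumes the response is well-formed XML with self-closing `<machine/>` elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_response(xml_text: str) -> Tuple[int, str, List[str]]:
    """Return (statusCode, statusMessage, machine DNS names) from a response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        raise ValueError("response has no <result> element")
    code = int(result.get("statusCode", "0"))
    message = result.get("statusMessage", "")
    machines = [m.get("dns", "") for m in root.iter("machine")]
    return code, message, machines


# Illustrative response in the shape shown on the slides.
sample = """<virtualMachine>
  <result statusCode="200" statusMessage="OK"/>
  <machineList>
    <machine dns="amajor.cs.uiuc.edu" processor="1"/>
    <machine dns="bmajor.cs.uiuc.edu" processor="1"/>
  </machineList>
</virtualMachine>"""

code, message, machines = parse_response(sample)
```

The status codes shown on the slides (200, 204, 400, 601) can then drive error handling: only a 200 with a non-empty machine list is usable for migration.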

  19. Resource “Scheduling” • What word do we use for allocating machines to data? (“Scheduling” seems wrong.) • We’re assuming that RS does this • We need to map RS output to the Cactus machine distribution

  20. Contract Monitoring • Demonstrated detection of performance degradation • Application monitors placed in Cactus scheduling • routine called once per iteration • accesses Cactus internal timing API • synchronization implies that timing on all nodes is identical • could use different Cactus scheduling times to get node-dependent results

  21. Migration Manager • In initial development • Will allow RS selection to occur asynchronously • Will make intelligent choice on whether migration will actually help • Will not migrate to seemingly lower quality resources
