  1. Cluster Computing on the Fly: Peer-to-Peer Scheduling of Idle Cycles in the Internet • Virginia Lo, Daniel Zappala, Dayi Zhou, Shanyu Zhao, and Yuhong Liu • Network Research Group, University of Oregon

  2. CCOF Motivation • A variety of users and their applications need additional computational resources • Many machines throughout the Internet lie idle for long periods of time • Many users are willing to donate cycles • How to provide cycles to the widest range of users (beyond institutional barriers)?

  3. CCOF Scenario #1 • A chess hobbyist wants to test her chess program • She only has a PC at home • She joins a chess-interest cycle-sharing community and discovers hosts that will run her chess state-space search algorithm for a few weeks

  4. CCOF Scenario #2 • Experiments with a network game are due in a week to meet a conference deadline • PlanetLab is overloaded • Network Research Group machines are overloaded • Requests for hosts go out to machines in the department, across campus, to colleagues at other universities, to personal friends, and to general donors

  5. CCOF Goals and Assumptions • Cycle sharing in an open peer-to-peer environment • Application-specific scheduling • Long-term fairness • Hosts retain local control; guest code runs in a sandbox

  6. Cycle Sharing Applications Four classes of applications that can benefit from harvesting idle cycles: • Infinite workpile • Workpile with deadlines • Tree-based search • Point-of-Presence (PoP)

  7. Infinite workpile • Consume huge amounts of compute time • Master-slave model • “Embarrassingly parallel”: no communication among hosts • Ex: SETI@home, Folding@home (Stanford), etc.

  8. Workpile with deadlines • Similar to infinite workpile but with more moderate resource demands • Must be completed by a deadline (days or weeks) • Some are capable of increasingly refined results given extra time • Ex: simulations over a large parameter space, ray tracing, genetic algorithms

  9. Tree-based Search • Tree of slave processes rooted in single master node • Dynamic growth as search space is expanded • Dynamic pruning as costly solutions are abandoned • Low amount of communication among slave processes to share lower bounds • Ex: distributed branch and bound, alpha-beta search, recursive backtracking
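A minimal sequential sketch of the pruning that makes bound-sharing worthwhile in this class of applications. The callables (expand, lower_bound, is_solution, cost) are application-supplied placeholders, and the incumbent `best` is the value slave processes would occasionally exchange; this is illustrative only, not the CCOF implementation.

```python
# Illustrative branch-and-bound sketch (not from the CCOF paper).
import heapq
from itertools import count

def branch_and_bound(root, expand, lower_bound, is_solution, cost, best=float("inf")):
    """Best-first branch and bound.  `best` is the incumbent cost; in a CCOF
    setting it would be refreshed occasionally with bounds shared by other
    slave processes."""
    tie = count()                              # tie-breaker so the heap never compares nodes
    frontier = [(lower_bound(root), next(tie), root)]
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if bound >= best:                      # dynamic pruning of costly subtrees
            continue
        if is_solution(node):
            best = min(best, cost(node))
            continue
        for child in expand(node):             # dynamic growth of the search tree
            b = lower_bound(child)
            if b < best:
                heapq.heappush(frontier, (b, next(tie), child))
    return best
```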

  10. Point-of-presence • Minimal consumption of CPU cycles • Requires placing application code at hosts dispersed throughout the Internet to meet specific location, topological-distribution, or resource requirements • Ex: security monitoring systems, traffic analysis systems, protocol testing, distributed games

  11. CCOF Architecture

  12. CCOF Architecture • Cycle sharing communities based on factors such as interest, geography, performance, trust, or generic willingness to share. • Span institutional boundaries without institutional negotiation • A host can belong to more than one community • May want to control community membership

  13. CCOF Architecture • Application schedulers discover hosts, negotiate access, export code, and collect and verify results • Application-specific (tailored to the needs of the application) • Resource discovery • Monitor jobs for progress; check results for correctness • Kill or migrate jobs as needed
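A hypothetical skeleton of an application scheduler's control loop covering the responsibilities listed above. The names (Job, discover, export, verify, and the handle's done/result/job interface) are ours, not from CCOF; the callables are injected so the policy stays application-specific.

```python
# Hypothetical application-scheduler skeleton (names are illustrative).
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    payload: str
    attempts: int = 0

class ApplicationScheduler:
    """discover(n) -> hosts, export(job, host) -> handle with done()/result()/job,
    verify(job, result) -> bool are all injected placeholders."""

    def __init__(self, discover, export, verify):
        self.discover, self.export, self.verify = discover, export, verify

    def run(self, jobs, max_attempts=3):
        handles = {j.job_id: self.export(j, h)
                   for j, h in zip(jobs, self.discover(len(jobs)))}
        results = {}
        while handles:
            for jid, handle in list(handles.items()):
                if not handle.done():
                    continue                             # still monitoring progress
                job, result = handle.job, handle.result()
                if self.verify(job, result):             # e.g. quiz-based checking
                    results[jid] = result
                    del handles[jid]
                elif job.attempts + 1 < max_attempts:    # bad result: migrate the job
                    job.attempts += 1
                    handles[jid] = self.export(job, self.discover(1)[0])
                else:
                    del handles[jid]                     # give up on this job
        return results
```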

  14. CCOF Architecture (cont.) • Local schedulers enforce local policy • Run guest code in background mode vs. preempt it when the user returns • QoS through admission control and reservation policies • Local machine protected through a sandbox • Tight control over communication
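A rough sketch of one possible local-scheduler policy on a Unix host, assuming load average as a crude proxy for owner activity; the threshold and the nice-based background mode are illustrative choices, not CCOF's mechanism (which additionally sandboxes guest code).

```python
# Illustrative local-scheduler policy; thresholds and names are assumptions.
import os, time, subprocess

IDLE_CPU_THRESHOLD = 0.25          # donate cycles only when the owner is mostly idle

def owner_is_active():
    """Crude proxy: 1-minute load average normalized by core count."""
    load1, _, _ = os.getloadavg()
    return load1 / (os.cpu_count() or 1) > IDLE_CPU_THRESHOLD

def run_guest(cmd):
    """Run guest code (cmd is an argv list) at lowest priority and preempt it
    when the owner appears to return."""
    proc = subprocess.Popen(["nice", "-n", "19"] + cmd)   # background mode
    while proc.poll() is None:
        if owner_is_active():
            proc.terminate()                              # local policy wins
            return "preempted"
        time.sleep(5)
    return "finished"
```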

  15. CCOF Architecture (cont.) • Coordinated scheduling • Across local schedulers, across application schedulers • Enforce long-term fairness • Enhance resource discovery through information exchange

  16. CCOF Preliminary Work • Wave Scheduler • Resource discovery experiments • Quizzes for Correctness • Point-of-Presence Scheduler

  17. Wave Scheduler • Well-suited for workpile with deadlines • Provides on-going access to dedicated cycles by following night timezones around the globe • Uses a CAN-based overlay to organize hosts by timezone
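A simplified sketch of the wave idea using plain UTC offsets rather than the CAN-based timezone overlay: prefer hosts with the most night remaining, so a job migrates westward as morning arrives in its current zone. The function names and the 22:00-06:00 night window are assumptions.

```python
# Illustrative wave-scheduling host selection (not the CAN-based implementation).
from datetime import datetime, timedelta, timezone

def local_hour(utc_offset_hours, now_utc):
    return (now_utc + timedelta(hours=utc_offset_hours)).hour

def night_remaining(utc_offset_hours, now_utc):
    """Hours of night (22:00-06:00) left in this host's timezone, 0 if daytime."""
    h = local_hour(utc_offset_hours, now_utc)
    if h >= 22:
        return (24 - h) + 6
    return 6 - h if h < 6 else 0

def pick_night_host(hosts, now_utc=None):
    """hosts: list of (host_id, utc_offset_hours).  Prefer the host with the
    most night left, so jobs ride the night 'wave' around the globe."""
    now_utc = now_utc or datetime.now(timezone.utc)
    candidates = [ho for ho in hosts if night_remaining(ho[1], now_utc) > 0]
    return max(candidates, key=lambda ho: night_remaining(ho[1], now_utc), default=None)
```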

  18. Wave Scheduler

  19. Resource Discovery (Zhou and Lo, to appear in WGP2P’04 at CC-Grid ’04) • Highly dynamic environment (hosts come and go) • Hosts maintain profiles of blocks of idle time • Four basic search methods: • Rendezvous points • Host advertisements • Client expanding ring search • Client random walk search
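Sketches of the two client-driven methods over an unstructured overlay represented as an adjacency dict; `has_idle_cycles` stands in for a check against a host's idle-time profile and is an assumed callable.

```python
# Illustrative client-driven search over a hypothetical overlay {node: [neighbors]}.
import random
from collections import deque

def expanding_ring_search(overlay, start, has_idle_cycles, max_ttl=4):
    """Flood with an increasing TTL until at least one idle host is found."""
    for ttl in range(1, max_ttl + 1):
        found, seen, frontier = [], {start}, deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if node != start and has_idle_cycles(node):
                found.append(node)
            if hops < ttl:
                for nbr in overlay[node]:
                    if nbr not in seen:
                        seen.add(nbr)
                        frontier.append((nbr, hops + 1))
        if found:
            return found              # stop expanding once this ring finds hosts
    return []

def random_walk_search(overlay, start, has_idle_cycles, walk_len=32):
    """Forward the request to one random neighbor at a time."""
    node = start
    for _ in range(walk_len):
        node = random.choice(overlay[node])
        if has_idle_cycles(node):
            return node
    return None
```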

  20. Resource Discovery • Rendezvous points perform best: high job completion rate and low message overhead, but they favor large jobs under heavy workloads • ==> coordinated scheduling is needed for long-term fairness

  21. CCOF Verification • Goal: verify correctness of returned results for workpile and workpile with deadlines • Quizzes = easily verifiable computations that are indistinguishable from the actual work • Standalone quizzes vs. embedded quizzes • Quiz performance stored in a reputation system • Quizzes vs. replication
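An illustrative sketch of quiz-based verification (the quiz fraction, scoring, and function names are our assumptions): mix precomputed quizzes into a host's batch so they are indistinguishable from real tasks, then accept the batch and adjust the host's reputation based on the quiz answers.

```python
# Illustrative quiz-based verification sketch.
import random

def make_batch(real_tasks, quizzes, quiz_fraction=0.1):
    """Mix quiz tasks (with precomputed answers) into a batch of real tasks."""
    n_quiz = max(1, int(len(real_tasks) * quiz_fraction))
    batch = real_tasks + random.sample(quizzes, min(n_quiz, len(quizzes)))
    random.shuffle(batch)              # quizzes must be indistinguishable from work
    return batch

def check_batch(results, known_answers, reputation, host):
    """results: {task_id: answer}; known_answers covers only the quiz tasks."""
    passed = all(results.get(t) == a for t, a in known_answers.items())
    reputation[host] = reputation.get(host, 0) + (1 if passed else -5)
    return passed                      # accept the batch only if every quiz passes
```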

  22. Point-of-Presence Scheduler • Scalable protocols for identifying selected hosts in the community overlay network such that each ordinary node is within k hops of C of the selected hosts • The (C,k)-dominating set problem • Useful for leader election, rendezvous point placement, monitor location, etc.

  23. CCOF Dom(C,k) Protocol • Round 1: Each node says HI to its k-hop neighbors <each node learns the size of its own k-hop neighborhood> • Round 2: Each node sends the size of its k-hop neighborhood to all its neighbors <each node learns the sizes of its neighbors’ k-hop neighborhoods> • Round 3: If a node’s neighborhood size is maximal among its neighbors, it declares itself a dominator and notifies all neighbors <some nodes hear from dominators, some don’t> • For nodes not yet covered by C dominators, repeat Rounds 1-3, excluding current dominators, until all nodes are covered.
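A centralized simulation of the rounds above on a graph given as an adjacency dict; the real protocol runs these steps with peer-to-peer message exchanges, and the early exit when no candidates remain is our simplification.

```python
# Centralized simulation of the Dom(C,k) rounds (illustrative, not the protocol code).
from collections import deque

def k_hop_neighborhood(graph, node, k):
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == k:
            continue
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                frontier.append((u, d + 1))
    return seen

def dom_c_k(graph, c, k):
    """Grow a dominator set until every node is within k hops of at least c
    dominators, or no non-dominator candidates remain."""
    dominators = set()
    coverage = {v: 0 for v in graph}
    while True:
        candidates = [v for v in graph
                      if v not in dominators and coverage[v] < c]
        if not candidates:
            break
        # Rounds 1-2: each candidate learns (here: we compute) the size of its
        # k-hop neighborhood, ignoring nodes that are already dominators.
        size = {v: len(k_hop_neighborhood(graph, v, k) - dominators)
                for v in candidates}
        # Round 3: nodes whose size is maximal among their 1-hop neighbors
        # declare themselves dominators and cover their k-hop neighborhoods.
        new = {v for v in candidates
               if all(size[v] >= size[u] for u in graph[v] if u in size)}
        dominators |= new
        for d in new:
            for v in k_hop_neighborhood(graph, d, k):
                coverage[v] += 1
    return dominators
```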

  24. CCOF Research Issues • Incentives and fairness • What incentives are needed to encourage hosts to donate cycles? • How to keep track of resources consumed vs. resources donated? • How to prevent resource hogs from taking an unfair share? • Resource discovery • How to discover hosts in a highly dynamic environment (hosts come and go, withdraw cycles, fail)? • How to discover hosts that can be trusted and that will provide the needed resources?

  25. CCOF Research Issues • Verification, trust, and reputation • How to check returned results? • How to catch malicious or misbehaving hosts that change results with low frequency? • Which reputation system? • Application-based scheduling • How do trust and reputation influence scheduling? • How should a host decide from whom to accept work?

  26. CCOF Research Issues • Quality of service and performance monitoring • How to provide local admission control? • How to evaluate and provide QoS: guaranteed versus predictive service? • Security • How to prevent attacks launched from guest code running on the host? • How to prevent denial-of-service attacks in which useless code occupies many hosts?

  27. Related Work • Systems most closely resembling CCOF: SHARP (Fu, Chase, Chun, Schwab, Vahdat, 2003); Partage and Self-organizing Flock of Condors (Hu, Butt, Zhang, 2003); BOINC (Anderson, 2003), limited to donating cycles to workpile applications • Resource discovery: Iamnitchi and Foster (2002); Condor matchmaking • Load sharing within and across institutions: Condor, Condor Flocks, Grid computing • Incentives and fairness: see the Berkeley Workshop on Economics of P2P Systems; OurGrid (Andrade, Cirne, Brasileiro, Roisenberg, 2003) • Trust and reputation: EigenRep (Kamvar, Schlosser, Garcia-Molina, 2003); TrustMe (Singh and Liu, 2003)
