
Building Grids with Condor

Alan De Smet, Computer Sciences Department, University of Wisconsin-Madison. condor-admin@cs.wisc.edu, http://www.cs.wisc.edu/condor


Presentation Transcript


  1. Building Grids with Condor • Alan De Smet • Computer Sciences Department, University of Wisconsin-Madison • condor-admin@cs.wisc.edu • http://www.cs.wisc.edu/condor

  2. What are Grids? • Grids allow access to distributed, remote compute cycles • Send input and executable over • Run the executable • Pull output back

  3. Grids typically... • have multiple sites with multiple administrative domains • Different people, different rules and policies • use the public internet • Unreliable, insecure • hand jobs off to a local batch system • PBS, LSF, Condor, etc

  4. Archetypical Grid [Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Firewall, Remote Site, Head Node, Batch System, Execute Node]

  5. Grids with Condor • Distributed Condor pools • Condor Flocking • Condor to Globus: Condor-G • Condor to Condor: Condor-C

  6. Distributed pools with Condor • Simply have a single large Condor pool spanning multiple sites • Variety of work to better support public internet • Encryption and authentication • Disconnected starter-shadow • TCP instead of UDP updates

  7. Distributed pools with Condor [Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Condor Pool, Execute Node]

  8. Distributed pools with Condor: Advantages • Unified system (all Condor) • Strong matchmaking capabilities • Directly matching jobs to execute nodes • Everything else great about Condor • (See rest of talks for details)
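
The matchmaking named on slide 8 is driven by expressions in a job's submit description. A minimal sketch, assuming a hypothetical executable called my_analysis and illustrative thresholds that are not from the talk. The requirements and rank commands and the OpSys and Memory machine attributes are standard Condor names. File transfer is switched on because a pool spanning several sites is assumed here to have no shared file system.

```
# Hypothetical job; file names and thresholds are illustrative only.
universe     = vanilla
executable   = my_analysis
output       = my_analysis.out
error        = my_analysis.err
log          = my_analysis.log

# Assumption: sites do not share a file system, so let Condor move files.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Only match execute nodes running Linux with at least 512 MB of memory.
requirements = (OpSys == "LINUX") && (Memory >= 512)

# Among the nodes that match, prefer those with more memory.
rank         = Memory

queue
```

The job is matched directly to an execute node whose machine attributes satisfy requirements, which is the "directly matching jobs to execute nodes" point above.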

  9. Distributed pools with Condor: Disadvantages • Requires coordination between sites • Weak with firewalls and address translation (NATs) • Solutions such as Generic Connection Brokering (GCB) exist • Centralized point of failure • A central manager network outage will stop jobs from starting

  10. Condor Flocking • A Condor submit node (condor_schedd) works with multiple Condor pools
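
Flocking is set up in the Condor configuration files on both sides. A minimal sketch, assuming two hypothetical hosts: submit.local.example.org (the local submit node) and cm.remote.example.org (the remote pool's central manager). FLOCK_TO and FLOCK_FROM are real configuration macros; the host names are placeholders.

```
# condor_config on the local submit node (hypothetical host names).
# Pools to try, in order, when the local pool cannot run the job.
FLOCK_TO = cm.remote.example.org

# condor_config in the remote pool.
# Submit nodes that are allowed to flock jobs into this pool.
FLOCK_FROM = submit.local.example.org
```

Depending on the Condor version, the remote pool's authorization settings may also need to list the flocking submit node, so check the manual for the release in use.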

  11. Condor Flocking [Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Local Condor Pool, Internet, Remote Condor Pool, Batch System, Execute Node]

  12. Condor Flocking Advantages • Unified system (all Condor) • Strong matchmaking capabilities • Directly matching jobs to execute nodes • Slightly less coordination between sites required • Sites can have different policies

  13. Condor Flocking Disadvantages • Same weaknesses as a large distributed pool • Weak with firewalls and address translation (NATs) • Network connection intensive • More complex than a single pool

  14. Globus • Globus provides a remote front end to multiple batch systems • In addition to other functionality

  15. Globus [Diagram labels: Submit Node, Internet, Firewall, Remote Site, Head Node (Globus), Batch System, Execute Node]

  16. Globus Advantages • Standard • Widely used • Variety of tools built on top of Globus are available • Can speak to a variety of batch systems • Condor, PBS, LSF, etc • Each site can run own system

  17. Globus Disadvantages • Minimal: designed to be a lower layer • Simple command line tools • No job tracking • No matchmaking • No recovery from errors • Must configure Globus in addition • Strictly remote side

  18. Condor-G • Condor can provide an interface and job queue for Globus: Condor-G • Submit file settings: universe = grid; grid_type = gt2 (or gt3, or gt4)
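
The two settings on slide 18 sit inside an ordinary submit description. A minimal sketch, assuming the 6.7-era syntax the talk uses, a hypothetical gatekeeper at gatekeeper.example.org fronting a PBS jobmanager, and a hypothetical executable called my_analysis. The globusscheduler keyword is assumed to be the matching command for naming the Globus resource in this syntax.

```
# Hypothetical Condor-G job; host and file names are placeholders.
universe        = grid
grid_type       = gt2

# Remote Globus gatekeeper and the jobmanager for its batch system.
globusscheduler = gatekeeper.example.org/jobmanager-pbs

executable      = my_analysis
output          = my_analysis.out
error           = my_analysis.err

# The user log is how the local queue tracks the remote job.
log             = my_analysis.log

queue
```

Later Condor releases fold grid_type and the resource name into a single grid_resource command (for example, grid_resource = gt2 gatekeeper.example.org/jobmanager-pbs), so check the manual for the version in use.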

  19. Condor-G [Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Firewall, Remote Site, Head Node (Globus), Batch System, Execute Node]

  20. Condor-G Advantages • Interoperable with and builds on strengths of Globus • Provides persistent submit queue • Attempts to automatically recover from errors

  21. Condor-G Disadvantages • Must configure Condor-G in addition • Strictly submitter side • Remote side doesn’t need to know

  22. Condor-G Status and News • Globus Toolkit 2 is stable • Globus Toolkit 3 is supported • But we think most people are moving to… • Globus Toolkit 4 support is in progress • GT4 beta works now in Condor 6.7.6 • Condor will officially support GT4 soon after the official GT4 release

  23. Condor-C • Condor handing jobs off to Condor • Submit file settings: universe = grid; grid_type = condor • Once handed off, behaves like a normal Condor job
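
A Condor-C job uses the same grid universe, pointed at another schedd instead of a Globus gatekeeper. A minimal sketch, assuming the 6.7-era syntax from the slide, a hypothetical remote schedd named schedd.remote.example.org, and a hypothetical remote central manager named cm.remote.example.org. The remote_schedd and remote_pool keywords are assumed to be the matching commands for this syntax; verify them against the manual for the release in use.

```
# Hypothetical Condor-C job; host and file names are placeholders.
universe      = grid
grid_type     = condor

# The schedd that will take over the job, and its pool's central manager.
remote_schedd = schedd.remote.example.org
remote_pool   = cm.remote.example.org

executable    = my_analysis
output        = my_analysis.out
error         = my_analysis.err
log           = my_analysis.log

queue
```

Later Condor releases express the same hand-off with a single command of the form grid_resource = condor followed by the remote schedd name and the remote pool name.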

  24. Condor-C [Diagram labels: Submit Node, Queue (Job 1, Job 2, …), Internet, Firewall, Remote Site, Head Node (Condor), Batch System, Execute Node]

  25. Condor-C Advantages • Unified system (All Condor) • Relatively easy to configure if you’re already using Condor • Can optionally speak to a variety of batch systems • Each site can run own system: PBS, LSF, etc

  26. Condor-C is Flexible • General way to redistribute Condor work between schedds • Overloaded schedd? • Fan the work out [Diagram: four schedds and eight startds]

  27. Condor-C Disadvantages • Work in progress • Not yet ready for multi-user environments • Expected for Condor 6.8.0 • No strong security yet • Expected for Condor 6.8.0 • Speaking to other batch systems very new, not yet distributed

  28. Condor-C • Available for evaluation in Condor 6.7 • First stable release in Condor 6.8

  29. The End? • More on Wednesday • At the Computer Science Building • Demos 9:00AM to Noon • Condor-C – Alan De Smet – room 4247 • Condor-G – Jaime Frey – room 4254 • Birds of a Feather discussion • 1:00PM to 2:30PM – room 4331
