Building Grids with Condor
Alan De Smet
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor

What are Grids?
• Grids allow access to distributed, remote compute cycles
  • Send input and executable over
  • Run the executable
  • Pull output back

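The three steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a second temporary directory stands in for the remote site, a file copy stands in for the network, and the names `run_remotely`, `job.py`, `input.txt`, and `output.txt` are all hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_remotely(executable: Path, input_file: Path, remote_dir: Path) -> Path:
    """Stage in, run, and stage out one job; returns the local output path."""
    # 1. Send input and executable over (a plain copy stands in for the network).
    remote_exe = Path(shutil.copy(executable, remote_dir))
    remote_in = Path(shutil.copy(input_file, remote_dir))
    remote_out = remote_dir / "output.txt"

    # 2. Run the executable at the "remote" site.
    subprocess.run(
        [sys.executable, str(remote_exe), str(remote_in), str(remote_out)],
        check=True,
    )

    # 3. Pull the output back next to the original input.
    return Path(shutil.copy(remote_out, input_file.parent / "output.txt"))


with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    local_dir, remote_dir = Path(local), Path(remote)
    # A tiny hypothetical job: upper-case the input file.
    job = local_dir / "job.py"
    job.write_text(
        "import sys\n"
        "text = open(sys.argv[1]).read()\n"
        "open(sys.argv[2], 'w').write(text.upper())\n"
    )
    data = local_dir / "input.txt"
    data.write_text("hello grid")
    result = run_remotely(job, data, remote_dir).read_text()
    print(result)
```

A real grid adds authentication, queuing, and failure handling around each of these steps, which is what the rest of the talk is about.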
Grids typically...
• have multiple sites with multiple administrative domains
  • Different people, different rules and policies
• use the public internet
  • Unreliable, insecure
• hand jobs off to a local batch system
  • PBS, LSF, Condor, etc.

Archetypical Grid
[Diagram: a submit node with a job queue (Job 1, Job 2, …), the Internet, a firewall, and a remote site with a head node, a batch system, and an execute node]

Grids with Condor
• Distributed Condor pools
• Condor Flocking
• Condor to Globus: Condor-G
• Condor to Condor: Condor-C

Distributed pools with Condor
• Simply have a single large Condor pool spanning multiple sites
• A variety of work has gone into better supporting the public internet
  • Encryption and authentication
  • Disconnected starter-shadow
  • TCP instead of UDP updates

Distributed pools with Condor
[Diagram: a submit node with a job queue (Job 1, Job 2, …), the Internet, and a Condor pool containing an execute node]

Distributed pools with Condor: Advantages
• Unified system (all Condor)
• Strong matchmaking capabilities
  • Directly matching jobs to execute nodes
• Everything else great about Condor
  • (See rest of talks for details)

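To make "directly matching jobs to execute nodes" concrete, here is a toy sketch of two-sided matchmaking. It is not Condor's matchmaker: plain dictionaries and lambdas stand in for ClassAds, and every attribute name (`memory_mb`, `owner`, and so on) and machine name is a hypothetical choice for illustration.

```python
Ad = dict  # a toy stand-in for a ClassAd: attribute name -> value


def matches(job: Ad, machine: Ad) -> bool:
    """A match needs both sides' requirements to accept the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)


def matchmake(jobs: list, machines: list) -> dict:
    """Give each job the free machine it ranks highest among those that match."""
    free = list(machines)
    assignment = {}
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            continue  # job stays idle until a suitable machine appears
        best = max(candidates, key=job["rank"])
        assignment[job["name"]] = best["name"]
        free.remove(best)
    return assignment


machines = [
    {"name": "node1", "os": "LINUX", "memory_mb": 512,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node2", "os": "LINUX", "memory_mb": 2048,
     "requirements": lambda job: True},
]
jobs = [
    {"name": "job1", "owner": "alice",
     "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 1024,
     "rank": lambda m: m["memory_mb"]},
    {"name": "job2", "owner": "guest",
     "requirements": lambda m: m["os"] == "LINUX",
     "rank": lambda m: m["memory_mb"]},
]
assignment = matchmake(jobs, machines)
print(assignment)
```

The point of the sketch is that both the job and the machine get a say: `job2` stays idle here because the only free machine refuses guest jobs, even though the job itself would accept that machine.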
Distributed pools with Condor: Disadvantages
• Requires coordination between sites
• Weak with firewalls and address translation (NATs)
  • Solutions like Generic Connection Brokering (GCB) exist
• Centralized point of failure
  • A central manager network outage will stop jobs from starting

Condor Flocking
• A Condor submit node (condor_schedd) works with multiple Condor pools

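As a rough sketch of the idea, the function below offers idle jobs to an ordered list of pools, local pool first, and only moves on to the next pool while jobs remain idle. It is a simplification under stated assumptions: real pools negotiate matches rather than exposing a free-slot count, and `place_jobs` and the pool names are hypothetical.

```python
def place_jobs(idle_jobs, pools):
    """Offer idle jobs to each pool in order; return (placements, still_idle).

    `pools` is an ordered list of (pool_name, free_slots) pairs, local pool first.
    """
    placements = {}
    remaining = list(idle_jobs)
    for pool_name, free_slots in pools:
        taken, remaining = remaining[:free_slots], remaining[free_slots:]
        for job in taken:
            placements[job] = pool_name
        if not remaining:
            break  # nothing left idle, so no need to contact further pools
    return placements, remaining


pools = [("local", 2), ("remote-a", 1), ("remote-b", 5)]
placements, still_idle = place_jobs(["j1", "j2", "j3", "j4"], pools)
print(placements, still_idle)
```

Because the submit node talks to each pool separately, each pool keeps its own policies, which is the trade-off the following slides discuss.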
Condor Flocking
[Diagram: a submit node with a job queue (Job 1, Job 2, …) in a local Condor pool, the Internet, and a remote Condor pool with a batch system and an execute node]

Condor Flocking Advantages
• Unified system (all Condor)
• Strong matchmaking capabilities
  • Directly matching jobs to execute nodes
• Slightly less coordination between sites required
  • Sites can have different policies

Condor Flocking Disadvantages
• As with a large distributed pool:
  • Weak with firewalls and address translation (NATs)
• Network connection intensive
• More complex than a single pool

Globus
• Globus provides a remote front end to multiple batch systems
  • In addition to other functionality

Globus
[Diagram: a submit node, the Internet, a firewall, and a remote site with a head node running Globus, a batch system, and an execute node]

Globus Advantages
• Standard
• Widely used
  • Variety of tools built on top of Globus are available
• Can speak to a variety of batch systems
  • Condor, PBS, LSF, etc.
• Each site can run own system

Globus Disadvantages
• Minimal: designed to be a lower layer
  • Simple command line tools
  • No job tracking
  • No matchmaking
  • No recovery from errors
• Must configure Globus in addition
  • Strictly remote side

Condor-G
• Condor can provide an interface and job queue for Globus: Condor-G
    universe = grid
    grid_type = gt2 (or gt3, or gt4)

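For context, the two lines on this slide would sit inside a larger submit description. The helper below is a hedged sketch that renders one as text. Only `universe` and `grid_type` come from the slide; `executable`, `output`, `log`, and `queue` are ordinary submit commands, the file names are hypothetical, and the line that names the remote Globus resource is deliberately left to the `extra` argument because the slide does not show it.

```python
def render_submit(executable, grid_type="gt2", extra=None):
    """Return the text of a grid-universe submit description."""
    commands = {
        "universe": "grid",        # from the slide
        "grid_type": grid_type,    # from the slide: gt2, gt3, or gt4
        "executable": executable,
        "output": "job.out",       # hypothetical file names
        "log": "job.log",
    }
    # Site-specific commands (for example, the remote resource) go here.
    commands.update(extra or {})
    lines = [f"{key} = {value}" for key, value in commands.items()]
    lines.append("queue")
    return "\n".join(lines) + "\n"


submit_text = render_submit("my_job")
print(submit_text)
```

Generating the file from a function like this is just one way to keep several near-identical submit descriptions consistent; writing the file by hand is equally valid.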
Condor-G
[Diagram: a submit node with a job queue (Job 1, Job 2, …), the Internet, a firewall, and a remote site with a head node running Globus, a batch system, and an execute node]

Condor-G Advantages
• Interoperable with and builds on strengths of Globus
• Provides persistent submit queue
• Attempts to automatically recover from errors

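The value of a persistent queue plus automatic recovery can be illustrated with a toy model. This is not how Condor-G is implemented: the JSON file, the `PersistentQueue` class, the retry count, and the `flaky_submit` function are all assumptions made up for this sketch.

```python
import json
import tempfile
from pathlib import Path


class PersistentQueue:
    """A job queue stored as JSON so it survives a restart of the submitter."""

    def __init__(self, path: Path):
        self.path = path
        self.jobs = json.loads(path.read_text()) if path.exists() else {}

    def _save(self):
        self.path.write_text(json.dumps(self.jobs))

    def add(self, job_id: str):
        self.jobs[job_id] = "idle"
        self._save()

    def run_all(self, submit, max_attempts: int = 3):
        """Try each unfinished job, retrying failed submissions."""
        for job_id, state in list(self.jobs.items()):
            if state == "done":
                continue
            for _ in range(max_attempts):
                try:
                    submit(job_id)
                    self.jobs[job_id] = "done"
                    break
                except ConnectionError:
                    self.jobs[job_id] = "idle"  # keep it queued and try again
            self._save()


attempts = []


def flaky_submit(job_id):
    """Hypothetical remote submission that fails on its first call."""
    attempts.append(job_id)
    if len(attempts) == 1:
        raise ConnectionError("remote site unreachable")


with tempfile.TemporaryDirectory() as tmp:
    queue_file = Path(tmp) / "queue.json"
    queue = PersistentQueue(queue_file)
    queue.add("job1")
    queue.run_all(flaky_submit)
    reloaded = PersistentQueue(queue_file).jobs  # state as a restart would see it
    print(reloaded)
```

The job survives the first failed attempt because its state lives in the queue rather than in the command that submitted it, which is the gap the bare Globus command line tools leave open.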
Condor-G Disadvantages
• Must configure Condor-G in addition
  • Strictly submitter side
  • Remote side doesn’t need to know

Condor-G Status and News
• Globus Toolkit 2 is stable
• Globus Toolkit 3 is supported
  • But we think most people are moving to…
• Globus Toolkit 4 in progress
  • GT4 beta works now in Condor 6.7.6
  • Condor will officially support it soon after the official GT4 release

Condor-C
• Condor handing jobs off to Condor
    universe = grid
    grid_type = condor
• Once handed off, behaves like a normal Condor job

Condor-C
[Diagram: a submit node with a job queue (Job 1, Job 2, …), the Internet, a firewall, and a remote site with a head node running Condor, a batch system, and an execute node]

Condor-C Advantages
• Unified system (all Condor)
• Relatively easy to configure if you’re already using Condor
• Can optionally speak to a variety of batch systems
  • Each site can run own system: PBS, LSF, etc.

Condor-C is Flexible
[Diagram: several schedds, each shown with startds]
• General way to redistribute Condor work between schedds
• Overloaded schedd?
  • Fan the work out

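One way to picture "fan the work out" is a simple round-robin split. The sketch below is an assumption-laden illustration, not Condor-C's actual policy: the `fan_out` function, the `keep_local` threshold, and the schedd names are all hypothetical.

```python
def fan_out(jobs, schedds, keep_local):
    """Keep `keep_local` jobs on the first schedd, spread the rest round-robin."""
    local, *others = schedds
    plan = {name: [] for name in schedds}
    plan[local] = list(jobs[:keep_local])
    for i, job in enumerate(jobs[keep_local:]):
        # With no other schedds available, the overflow simply stays local.
        target = others[i % len(others)] if others else local
        plan[target].append(job)
    return plan


plan = fan_out([f"job{n}" for n in range(1, 8)],
               ["schedd-a", "schedd-b", "schedd-c"], keep_local=3)
print(plan)
```

Any other distribution rule (by load, by site policy, by data location) would slot into the same shape; round-robin is used here only because it is the easiest to follow.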
Condor-C Disadvantages
• Work in progress
• Not yet ready for multi-user environments
  • Expected for Condor 6.8.0
• No strong security yet
  • Expected for Condor 6.8.0
• Speaking to other batch systems very new, not yet distributed

Condor-C
• Available for evaluation in Condor 6.7
• First stable release in Condor 6.8

The End?
• More on Wednesday
  • At the Computer Science Building
  • Demos 9:00AM to Noon
    • Condor-C, Alan De Smet, room 4247
    • Condor-G, Jaime Frey, room 4254
  • Birds of a Feather discussion
    • 1:00PM to 2:30PM, room 4331