An introduction to Condor, a system that turns collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing (HTC) facility. The slides cover ClassAd Matchmaking, fault tolerance, robust job management for Globus with Condor-G, typical HTC challenges, and the NUG30 result, which was solved using many Grid resources simultaneously. They close with the outline of the Condor Tutorial on managing jobs and resources.
Condor Introduction Asia Pacific Grid Workshop, Tokyo, Japan, October 2001
Outline Overview: What is Condor • What does Condor do? • What is Condor good for? • What kind of results can I expect?
The Condor Project (Established ‘85) Distributed High Throughput Computing research performed by a team of ~25 faculty, full time staff and students who: • face software engineering challenges in a distributed UNIX/Linux/NT environment, • are involved in national and international collaborations, • actively interact with academic and commercial users, • maintain and support a large distributed production environment, • and educate and train students. Funding – US Govt. (DoD, DoE, NASA, NSF), AT&T, IBM, INTEL, Microsoft, UW-Madison
What is High-Throughput Computing? • High-performance: CPU cycles/second under ideal circumstances. • “How fast can I run simulation X on this machine?” • High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances. • “How many times can I run simulation X in the next month using all available machines?”
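The high-throughput question ("how many times can I run simulation X in the next month?") can be sketched as simple arithmetic. The snippet below is a minimal illustration, not part of Condor: the `Machine` class, the `runs_completed` helper, and every number in it are made up for the example, and it assumes a run never migrates between machines.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    availability: float  # hypothetical: fraction of the window the machine is free for us
    speed: float         # hypothetical: speed relative to a reference machine (1.0)


def runs_completed(pool, hours_per_run, window_hours):
    """Count whole runs finished within the window, one machine at a time."""
    total = 0
    for m in pool:
        # Reference-machine hours this machine can deliver in the window.
        usable_hours = window_hours * m.availability * m.speed
        total += int(usable_hours // hours_per_run)
    return total


# Made-up pool: one dedicated node, one half-idle desktop, one fast but rarely idle desktop.
pool = [Machine(1.0, 1.0), Machine(0.5, 1.0), Machine(0.25, 2.0)]
month_total = runs_completed(pool, hours_per_run=10, window_hours=720)
print(month_total)
```

Under these assumptions the two part-time desktops together contribute as many runs as the dedicated node, which is the point of measuring throughput over a month rather than speed on one machine.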
What is Condor? • Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing (HTC) facility. • Condor uses ClassAd Matchmaking to make sure that everyone is happy. • Fault tolerance is provided through checkpointing and other technologies.
The Condor System • Unix and NT • Operational since 1986 • Manages more than 1300 CPUs at UW-Madison • Software available free on the web • More than 150 Condor installations worldwide in academia and industry
Some HTC Challenges • Condor does whatever it takes to run your jobs, even if some machines… • Crash (or are disconnected) • Run out of disk space • Don’t have your software installed • Are frequently needed by others • Are far away & managed by someone else
What is ClassAd Matchmaking? • Condor uses ClassAd Matchmaking to make sure that work gets done within the constraints of both users and owners. • Users (jobs) have constraints: • “I need an Alpha with 256 MB RAM” • Owners (machines) have constraints: • “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
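The two-sided nature of matchmaking can be sketched in a few lines of Python. This is a simplified illustration and not Condor's ClassAd language or matchmaker: the `Ad` class, the helper functions, the attribute names, and the 15-minute idle threshold standing in for "away from my desk" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Attrs = Dict[str, Any]


@dataclass
class Ad:
    attrs: Attrs
    # requirements(my_attrs, other_attrs) -> True if the other party is acceptable
    requirements: Callable[[Attrs, Attrs], bool]


def is_match(a, b):
    """A match requires BOTH sides to accept each other."""
    return a.requirements(a.attrs, b.attrs) and b.requirements(b.attrs, a.attrs)


def find_machine(job, machines):
    """Return the first machine whose constraints and the job's are both met."""
    for machine in machines:
        if is_match(job, machine):
            return machine
    return None


def job_requirements(my, target):
    # "I need an Alpha with 256 MB RAM"
    return target["Arch"] == "ALPHA" and target["Memory"] >= 256


def machine_requirements(my, target):
    # "Only run jobs when I am away from my desk and never run jobs owned by Bob."
    # Assumption: "away" means the keyboard has been idle for at least 15 minutes.
    return my["KeyboardIdle"] >= 15 * 60 and target["Owner"] != "bob"


machines = [
    Ad({"Name": "busy-alpha", "Arch": "ALPHA", "Memory": 512, "KeyboardIdle": 10}, machine_requirements),
    Ad({"Name": "idle-intel", "Arch": "INTEL", "Memory": 512, "KeyboardIdle": 3600}, machine_requirements),
    Ad({"Name": "idle-alpha", "Arch": "ALPHA", "Memory": 256, "KeyboardIdle": 3600}, machine_requirements),
]
alice_job = Ad({"Owner": "alice"}, job_requirements)
bob_job = Ad({"Owner": "bob"}, job_requirements)

alice_match = find_machine(alice_job, machines)
bob_match = find_machine(bob_job, machines)
print(alice_match.attrs["Name"] if alice_match else None)
print(bob_match)
```

In this sketch Alice's job skips the Alpha whose owner is at the keyboard and the idle machine with the wrong architecture, while Bob's job matches nothing because every owner's policy excludes him.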
Upgrade to Condor-G A Grid-enabled version of Condor that provides robust job management for Globus. • Robust replacement for globusrun • Provides extensive fault-tolerance • Brings Condor’s job management features to Globus jobs
What Have We Done on the Grid Already? • Example: NUG30 • quadratic assignment problem • 30 facilities, 30 locations • minimize cost of transferring materials between them • posed in 1968 as challenge, long unsolved • but with a good pruning algorithm & high-throughput computing...
NUG30 Solved on the Grid with Condor + Globus Resources simultaneously utilized: • the Origin 2000 (through LSF) at NCSA • the Chiba City Linux cluster at Argonne • the SGI Origin 2000 at Argonne • the main Condor pool at Wisconsin (600 processors) • the Condor pool at Georgia Tech (190 Linux boxes) • the Condor pool at UNM (40 processors) • the Condor pool at Columbia (16 processors) • the Condor pool at Northwestern (12 processors) • the Condor pool at NCSA (65 processors) • the Condor pool at INFN (200 processors)
NUG30 - Solved!!! Sender: goux@dantec.ece.nwu.edu Subject: Re: Let the festivities begin. Hi dear Condor Team, you all have been amazing. NUG30 required 10.9 years of Condor Time. In just seven days ! More stats tomorrow !!! We are off celebrating ! condor rules ! cheers, JP.
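The figures in the message imply an average level of parallelism, worked out below. This is back-of-the-envelope arithmetic only, and it assumes that "Condor Time" means accumulated processor time and that a year is 365 days.

```python
cpu_years = 10.9      # "10.9 years of Condor Time", read here as accumulated processor time
wall_days = 7         # "In just seven days"
DAYS_PER_YEAR = 365   # assumption: leap days ignored

cpu_days = cpu_years * DAYS_PER_YEAR
avg_busy_processors = cpu_days / wall_days  # processors kept busy, on average, all week
print(round(avg_busy_processors))
```

Under these assumptions the run kept roughly 570 processors busy on average for the whole week.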
The Idea Computing power is everywhere; we try to make it usable by anyone.
Condor Tutorial This Afternoon: Outline • Understanding Condor • Using Condor to manage jobs • Using Condor to manage resources • Condor Architecture and Mechanisms • Condor on the Grid • Flocking • Condor-G • Case Study: Distributed TeraFlop
Thank you! Check us out on the Web: http://www.cs.wisc.edu/condor Email: condor-admin@cs.wisc.edu