Condor Introduction Asia Pacific Grid Workshop Tokyo, Japan October 2001

Condor is a distributed high-throughput computing (HTC) facility built from collections of workstations and dedicated clusters. This introduction covers ClassAd Matchmaking, fault tolerance through checkpointing, and robust job management for Globus with Condor-G. It also describes common HTC challenges and the NUG30 result, which was solved using multiple Grid resources simultaneously, and closes with the outline of the Condor Tutorial on managing jobs and resources.

noguera
Presentation Transcript


  1. Condor Introduction Asia Pacific Grid Workshop, Tokyo, Japan, October 2001

  2. Outline Overview: What is Condor • What does Condor do? • What is Condor good for? • What kind of results can I expect?

  3. The Condor Project (Established '85) Distributed High Throughput Computing research performed by a team of ~25 faculty, full-time staff and students who: • face software engineering challenges in a distributed UNIX/Linux/NT environment, • are involved in national and international collaborations, • actively interact with academic and commercial users, • maintain and support a large distributed production environment, • and educate and train students. Funding: US Govt. (DoD, DoE, NASA, NSF), AT&T, IBM, Intel, Microsoft, UW-Madison

  4. What is High-Throughput Computing? • High-performance: CPU cycles/second under ideal circumstances. • “How fast can I run simulation X on this machine?” • High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances. • “How many times can I run simulation X in the next month using all available machines?”
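The difference between the two questions on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the machine counts, run times, and availability fractions are invented numbers, not figures from the talk, and the names `Machine` and `runs_per_month` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    hours_per_run: float       # time for one run of simulation X on this machine
    available_fraction: float  # share of the month the machine is free for our jobs


def runs_per_month(machines, hours_in_month=30 * 24):
    """Count whole runs of simulation X completed across all machines in a month."""
    total = 0
    for m in machines:
        usable_hours = hours_in_month * m.available_fraction
        total += int(usable_hours // m.hours_per_run)
    return total


# High-performance view: one fast, fully dedicated machine (assumed numbers).
fast_server = [Machine("server", hours_per_run=2.0, available_fraction=1.0)]

# High-throughput view: 50 slower desktops, each free only half the time (assumed numbers).
desktops = [Machine(f"desk{i}", hours_per_run=6.0, available_fraction=0.5)
            for i in range(50)]

server_runs = runs_per_month(fast_server)
desktop_runs = runs_per_month(desktops)
print(server_runs, desktop_runs)
```

Under these assumed numbers each individual desktop is three times slower and idle only half the month, yet the pool as a whole completes far more runs per month than the single fast machine.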

  5. What is Condor? • Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing (HTC) facility. • Condor uses ClassAd Matchmaking to make sure that everyone is happy. • Fault tolerance provided with checkpointing and other technologies.
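The checkpointing idea mentioned on this slide can be pictured with a toy example. This is not Condor's checkpoint mechanism, which the slide does not describe; it is a minimal sketch assuming a job that saves its own state to a file, and every name in it (`run_with_checkpoint`, `fail_at`, `job.ckpt`) is hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoint(path, total_steps, fail_at=None):
    """Sum 0..total_steps-1, saving progress to `path` after every step.

    `fail_at` simulates a machine crash just before that step (illustrative hook).
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        # A checkpoint exists: resume from the saved state instead of starting over.
        with open(path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated machine crash")
        state["acc"] += state["step"]
        state["step"] += 1
        with open(path, "w") as f:
            json.dump(state, f)
    return state["acc"]


ckpt_path = os.path.join(tempfile.mkdtemp(), "job.ckpt")

# First attempt: the "machine" crashes before step 6.
try:
    run_with_checkpoint(ckpt_path, 10, fail_at=6)
except RuntimeError:
    pass

# Second attempt (think: a different machine) picks up at step 6, not step 0.
result = run_with_checkpoint(ckpt_path, 10)
print(result)
```

The point of the sketch is only that work done before a failure is not lost: the second attempt repeats none of the first six steps.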

  6. The Condor System • Unix and NT • Operational since 1986 • Manages more than 1300 CPUs at UW-Madison • Software available free on the web • More than 150 Condor installations worldwide in academia and industry

  7. Some HTC Challenges • Condor does whatever it takes to run your jobs, even if some machines… • Crash (or are disconnected) • Run out of disk space • Don’t have your software installed • Are frequently needed by others • Are far away & managed by someone else

  8. What is ClassAd Matchmaking? • Condor uses ClassAd Matchmaking to make sure that work gets done within the constraints of both users and owners. • Users (jobs) have constraints: • “I need an Alpha with 256 MB RAM” • Owners (machines) have constraints: • “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
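The two-sided nature of matchmaking can be sketched in a few lines. The code below is a toy model, not the ClassAd language or Condor's matchmaker: the dictionary layout, attribute names such as `memory_mb` and `owner_away`, and the helper functions are all assumptions made for illustration. It encodes the slide's two example constraints and accepts a pairing only when both the job and the machine agree.

```python
def make_machine(name, arch, memory_mb, owner_away):
    """Build a toy machine ad carrying the owner's policy from the slide."""
    ad = {"name": name, "arch": arch, "memory_mb": memory_mb, "owner_away": owner_away}
    # Owner constraint: only run jobs when I am away, and never jobs owned by Bob.
    ad["requirements"] = lambda job: ad["owner_away"] and job["owner"] != "bob"
    return ad


def make_job(owner):
    """Build a toy job ad carrying the user's constraint from the slide."""
    return {
        "owner": owner,
        # User constraint: I need an Alpha with 256 MB RAM.
        "requirements": lambda m: m["arch"] == "ALPHA" and m["memory_mb"] >= 256,
    }


def find_matches(job, machines):
    """Return names of machines where BOTH sides' requirements are satisfied."""
    return [m["name"] for m in machines
            if job["requirements"](m) and m["requirements"](job)]


machines = [
    make_machine("m1", "ALPHA", 512, owner_away=True),   # satisfies both sides
    make_machine("m2", "ALPHA", 128, owner_away=True),   # too little memory
    make_machine("m3", "INTEL", 512, owner_away=True),   # wrong architecture
    make_machine("m4", "ALPHA", 256, owner_away=False),  # owner is at the desk
]

alice_matches = find_matches(make_job("alice"), machines)
bob_matches = find_matches(make_job("bob"), machines)
print(alice_matches, bob_matches)
```

Neither side can force a match: a machine that meets the job's needs is still skipped if its owner's policy rejects the job, and the reverse holds as well.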

  9. Upgrade to Condor-G A Grid-enabled version of Condor that provides robust job management for Globus. • Robust replacement for globusrun • Provides extensive fault-tolerance • Brings Condor’s job management features to Globus jobs

  10. What Have We Done on the Grid Already? • Example: NUG30 • quadratic assignment problem • 30 facilities, 30 locations • minimize cost of transferring materials between them • posed in 1968 as challenge, long unsolved • but with a good pruning algorithm & high-throughput computing...

  11. NUG30 Solved on the Grid with Condor + Globus Resources simultaneously utilized: • the Origin 2000 (through LSF) at NCSA • the Chiba City Linux cluster at Argonne • the SGI Origin 2000 at Argonne • the main Condor pool at Wisconsin (600 processors) • the Condor pool at Georgia Tech (190 Linux boxes) • the Condor pool at UNM (40 processors) • the Condor pool at Columbia (16 processors) • the Condor pool at Northwestern (12 processors) • the Condor pool at NCSA (65 processors) • the Condor pool at INFN (200 processors)

  12. NUG30 - Solved!!! Sender: goux@dantec.ece.nwu.edu Subject: Re: Let the festivities begin. Hi dear Condor Team, you all have been amazing. NUG30 required 10.9 years of Condor Time. In just seven days! More stats tomorrow!!! We are off celebrating! condor rules! cheers, JP.
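The two figures in this message imply an average level of parallelism. The calculation below is a back-of-the-envelope sketch under two assumptions not stated in the email: that "Condor Time" means aggregate processor time, and that the run took exactly seven days of wall-clock time.

```python
DAYS_PER_YEAR = 365.25           # assumed conversion factor

condor_time_years = 10.9         # from the email
wall_clock_days = 7              # "just seven days", taken as exactly 7

cpu_days = condor_time_years * DAYS_PER_YEAR
avg_processors = cpu_days / wall_clock_days

print(f"{cpu_days:.0f} processor-days over {wall_clock_days} days "
      f"is about {avg_processors:.0f} processors busy on average")
```

Under those assumptions the run kept somewhat under 600 processors busy on average for the whole week.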

  13. The Idea Computing power is everywhere; we try to make it usable by anyone.

  14. Condor Tutorial This Afternoon: Outline • Understanding Condor • Using Condor to manage jobs • Using Condor to manage resources • Condor Architecture and Mechanisms • Condor on the Grid • Flocking • Condor-G • Case Study: Distributed TeraFlop

  15. Thank you! Check us out on the Web: http://www.cs.wisc.edu/condor Email: condor-admin@cs.wisc.edu
