Condor RoadMap

Explore the roadmap for Condor Version 6.7.x focusing on scalability, resources, failover, and accessibility. Learn about key improvements such as increased job capacity, better matchmaking, and enhanced security.

Presentation Transcript


  1. Condor RoadMap

  2. Outline • The “Big Picture” • Version 6.7.x • Availability • Failover • Scalability • Resources, jobs, matchmaking framework, files • Accessibility • APIs, more Grid middleware, network

  3. Big Picture What do we want to achieve in a new Condor developer series? • Technology Transfer • Building a bridge between the Condor production software development activity and the academic core research activity: BAD-FS, Stork, Diskrouter, Parrot (transparent I/O), Schedd Glidein, VO Schedulers, HA, Management, Improved ClassAds…

  4. What do we want to achieve, cont. • New Ports: Go to where the cycles are! • The Red Hat Dilemma • Our porting ‘hopper’: • AIX 5.1L on the PowerPC architecture • Red Hat AS server on x86 • Fedora Core on x86 • Fedora Core 2 on x86 • Red Hat AS server on AMD64 • SuSE 8.0 on AMD64 • Red Hat AS server on IA64 • HPUX 11.11 64-bit

  5. What do we want to achieve, cont. • Improve existing ports • Move “clipped wing” ports to full ports (w/ checkpoint, process migration) • Mac OS X, Windows • Better integration into environments • Windows: operate better w/ DFS, use MSI • Unix: operate w/ AFS

  6. What do we want to achieve, cont. • Address changes in the computing landscape • Firewalls, NATs • 64-bit operating systems • Emphasis on data • Movement towards standards such as WS, OGSA, …

  7. Version 6.7.x Theme • Version 6.7.x • Scalability • Resources, jobs, matchmaking framework, security • Availability • Failover • Accessibility • APIs, more Grid middleware, network

  8. High Availability in v6.7.x What happens if my submit machine reboots? Once upon a time, only one answer: job restarts. Checkpoint? No Checkpoint?

  9. New: Job Progress continues if connection is interrupted • For Vanilla and Java universe jobs, Condor now supports reestablishment of the connection between the submitting and executing machines. • To take advantage of this feature, put the following line into your job’s submit description file: JobLeaseDuration = <N seconds> For example: JobLeaseDuration = 1200
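
The lease idea on this slide can be illustrated with a small simulation. This is only a sketch, not Condor's implementation: the `JobLease` class and `job_survives` helper are hypothetical names, time is a simulated integer count of seconds, and it assumes the job is given up once the lease has fully expired.

```python
from dataclasses import dataclass


@dataclass
class JobLease:
    """Toy model of a job lease; not Condor's actual implementation."""
    duration: int          # seconds, playing the role of JobLeaseDuration
    last_renewal: int = 0  # simulated time of the last successful renewal

    def renew(self, now: int) -> None:
        # The submit side renews the lease whenever it can reach the job.
        self.last_renewal = now

    def is_valid(self, now: int) -> bool:
        # The execute side keeps the job running while the lease is unexpired.
        return now - self.last_renewal < self.duration


def job_survives(lease: JobLease, outage_start: int, outage_end: int) -> bool:
    """Return True if the lease is still valid when the connection returns."""
    lease.renew(outage_start)          # last contact just before the outage
    return lease.is_valid(outage_end)  # reconnect attempt after the outage


if __name__ == "__main__":
    # A 10-minute outage fits inside a 1200-second lease; a 30-minute one does not.
    print(job_survives(JobLease(duration=1200), 0, 600))
    print(job_survives(JobLease(duration=1200), 0, 1800))
```

In this model a longer lease tolerates longer outages, at the cost of the execute machine holding on to a job whose submit machine may never return.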

  10. What if the submission point spontaneously explodes? (don’t try this at home)

  11. More High Availability Solutions • Condor can support a submit machine “hot spare” • If your submit machine is down for longer than N minutes, a second machine can take over • Two mechanisms available • Job Mirroring • Described by Jaime earlier today • High Availability Daemon Failover • Just tell the condor_master to run ONE instance

  12. Daemon Failover [Diagram: Machine A and Machine B, each with a Master and a SchedD; lock operations labeled Obtain Lock, Refresh Lock, and Check Lock; states labeled Active and Active (hot spare)]
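
The lock operations named on this slide (obtain, refresh, check) suggest a lease-style lock shared by the two machines. The following is a minimal sketch under that assumption, not the condor_master's actual logic: `SharedLock`, `Master`, `tick`, and `stale_after` are hypothetical names, and the shared lock is an in-memory object rather than anything both machines could really see.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedLock:
    """Stand-in for a lock visible to both machines (hypothetical)."""
    holder: Optional[str] = None
    refreshed_at: float = 0.0


class Master:
    """Toy master that runs the daemon only while it holds the lock."""

    def __init__(self, name: str, lock: SharedLock, stale_after: float):
        self.name = name
        self.lock = lock
        self.stale_after = stale_after  # seconds without a refresh before takeover
        self.running_schedd = False

    def tick(self, now: float) -> None:
        """One polling step: refresh our own lock, or check the other machine's."""
        if self.lock.holder == self.name:
            self.lock.refreshed_at = now              # Refresh Lock
        elif (self.lock.holder is None
              or now - self.lock.refreshed_at > self.stale_after):
            self.lock.holder = self.name              # Obtain Lock
            self.lock.refreshed_at = now
        # Only the current lock holder runs the single daemon instance.
        self.running_schedd = self.lock.holder == self.name


if __name__ == "__main__":
    lock = SharedLock()
    a = Master("A", lock, stale_after=30)
    b = Master("B", lock, stale_after=30)
    a.tick(0); b.tick(0)    # A obtains the lock, B stays a hot spare
    a.tick(10); b.tick(10)  # A refreshes, B checks and finds the lock fresh
    b.tick(50)              # A has stopped refreshing, so B takes over
    print(lock.holder, a.running_schedd, b.running_schedd)
```

The choice of `stale_after` is the trade-off in this sketch: a short value gives fast takeover, while a long value avoids a takeover when the active machine is merely slow.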

  13. Accessibility • Support for GCB • Condor working w/ NATs, Firewalls • Distributed Resource Management Application API (DRMAA) • GGF Working Group • An API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems • Condor DRMAA interface to appear in v6.7.0

  14. SOAP/Grid Service [Diagram: condor_schedd interfaces: Cedar; Web Service: SOAP, HTTPS; OGSI: SOAP, HTTPG]

  15. New “Grid Universe” • With the new Grid Universe, always specify a ‘gridtype’. So the old “globus” Universe is now declared with two submit description file lines, universe = grid and gridtype = gt2 • Other gridtypes? GT3 for OGSA-based Globus Toolkit 3

  16. Condor-G improvements • Condor-G can submit to either Globus GT2 or GT3 resources, including support for GT3 with web services. • Condor-G includes everything required; no need for client to have a GT3 installation. • Good migration path to OGSA • Condor-G to Nordugrid, Unicore, Condor, ORACLE • Support for credential refresh via the MyProxy Online Credential Management in NMI http://grid.ncsa.uiuc.edu/myproxy/

  17. Why Condor + MyProxy? • Long-lived tasks or services need credentials • Task lifetime is difficult to predict • Don’t want to delegate long-lived credentials • Fear of compromise • Instead, renew credentials with MyProxy as needed during the task’s lifetime • Provides a single point of monitoring and control • Renewal policy can be modified at any time • For example, disable renewals if compromise is detected or suspected
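
The renew-as-needed policy on this slide can be sketched as follows. This is an illustration only and does not use the MyProxy protocol or any Condor-G code: `Credential`, `RenewalService`, `maybe_renew`, and the `threshold` parameter are hypothetical names, and time is a simulated integer count of seconds.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    expires_at: int  # simulated time at which the short-lived credential expires


@dataclass
class RenewalService:
    """Hypothetical stand-in for a credential repository such as MyProxy."""
    lifetime: int         # lifetime of each short-lived credential it issues
    enabled: bool = True  # the single point of control: renewals can be switched off

    def renew(self, now: int) -> Credential:
        if not self.enabled:
            raise PermissionError("renewals disabled")
        return Credential(expires_at=now + self.lifetime)


def maybe_renew(cred: Credential, service: RenewalService,
                now: int, threshold: int) -> Credential:
    """Renew only when the remaining lifetime drops below the threshold."""
    if cred.expires_at - now >= threshold:
        return cred            # still enough lifetime left, keep the credential
    return service.renew(now)  # close to expiry, ask the service for a fresh one


if __name__ == "__main__":
    service = RenewalService(lifetime=3600)
    cred = service.renew(0)
    cred = maybe_renew(cred, service, now=3200, threshold=600)
    print(cred.expires_at)
```

Because every renewal goes through the service, setting `enabled = False` stops a suspect task from obtaining further credentials while the ones already issued simply expire.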

  18. Credential Renewal [Diagram: Condor-G Scheduler, MyProxy, Resource Manager, and Job, spread across Home and Remote sites; arrows labeled Submit Jobs, Enable Renewal, Retrieve Credentials, Refresh Credentials, and Launch Job]

  19. More… • Condor can now transfer job data files larger than 2 GB in size. • On all platforms that support 64-bit file offsets • Real-time spooling of stdout, stderr, and stdin in any universe, including Vanilla • Real-time monitoring of job progress

  20. Thank you!
