An opportunistic HTCondor pool inside an interactive-friendly Kubernetes cluster

Learn about the implementation and performance of an opportunistic HTCondor pool within an interactive-friendly Kubernetes cluster. This presentation discusses the Pacific Research Platform, the use of Kubernetes as a resource manager, dealing with opportunistic pod types, and future possibilities.

Presentation Transcript


  1. An opportunistic HTCondor pool inside an interactive-friendly Kubernetes cluster At HTCondor Week 2019 Presented by Igor Sfiligoi, UCSD, for the PRP team HTCondor Week, May 2019

  2. Outline • Where do I come from? • What did we do? • How is it working? • Looking ahead

  3. The Pacific Research Platform • The PRP was originally created as a regional networking project • Establishing end-to-end links between 10Gbps and 100Gbps (GDC)

  4. The Pacific Research Platform • The PRP was originally created as a regional networking project • Establishing end-to-end links between 10Gbps and 100Gbps • Expanded nationally since • And beyond, too

  5. The Pacific Research Platform • Recently the PRP evolved into a major resource provider, too • Because scientists really need more than bandwidth tests • They need to share their data at high speed and compute on it, too • The PRP now also provides • Extensive compute power: about 330 GPUs and 3.5k CPU cores • A large distributed storage area: about 2 PBytes • Select user communities now directly use all the resources PRP has to offer • Still doing all the network R&D in the same setup, too • We call it the Nautilus cluster

  6. Kubernetes as a resource manager

  7. Designed for interactive use

  8. Opportunistic use

  9. Kubernetes priorities https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
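
For readers unfamiliar with the mechanism behind the link on slide 9, here is a minimal sketch of how a low-priority class and a pod that uses it could be described. It is not the PRP configuration: the class name, the value of -10, the pod name, and the image are all hypothetical. The manifests are built as plain Python dicts so they can be inspected or dumped to JSON. Only the field names (`scheduling.k8s.io/v1`, `PriorityClass`, `value`, `priorityClassName`) come from the Kubernetes API.

```python
import json

# Hypothetical name and value; the slides do not state the actual settings.
OPPORTUNISTIC_CLASS = "opportunistic"
OPPORTUNISTIC_VALUE = -10  # below 0, the priority of pods with no class when no global default is set


def priority_class(name: str, value: int, description: str) -> dict:
    """Build a PriorityClass manifest (scheduling.k8s.io/v1) as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }


def execute_pod(name: str, image: str, priority_class_name: str) -> dict:
    """Build a single-container Pod manifest that references a priority class."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "priorityClassName": priority_class_name,
            "containers": [{"name": "htcondor-execute", "image": image}],
        },
    }


low = priority_class(
    OPPORTUNISTIC_CLASS,
    OPPORTUNISTIC_VALUE,
    "Pods that may be preempted by regular workloads (illustrative).",
)
pod = execute_pod(
    "htcondor-execute-0",                   # hypothetical pod name
    "example.org/htcondor-execute:latest",  # hypothetical image
    OPPORTUNISTIC_CLASS,
)

print(json.dumps([low, pod], indent=2))
```

Because the scheduler can evict lower-priority pods to make room for higher-priority ones, a pod in such a class would only hold resources that nothing else is asking for.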

  10. HTCondor as the OSG helper

  11. HTCondor in a (set of) container(s)

  12. Service vs Opportunistic Pure opportunistic

  13. Then came the users Not without elevated privileges

  14. Then came the users

  15. Dealing with many opportunistic pod types

  16. Dealing with many opportunistic pod types I know how to implement this.

  17. Dealing with many opportunistic pod types I was told this is coming.

  18. Dealing with many opportunistic pod types In OSG-land, glideinWMS solves this for me.

  19. Dealing with many opportunistic pod types No concrete plans on how to address these yet.

  20. Dealing with many opportunistic pod types

  21. Are side-containers an option? [Diagram: a Pod holding an HTCondor container and a User job container]
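
The layout named on slide 21, one Pod holding an HTCondor container next to a user job container, can be sketched as a two-container Pod manifest. This is only an illustration of the pod shape, built as a Python dict. The function name, container names, pod name, and images are hypothetical. How HTCondor would hand a job to the sibling container is the open question the slide raises, and the sketch does not address it.

```python
import json


def side_container_pod(name: str, condor_image: str, job_image: str) -> dict:
    """Build a Pod manifest with an HTCondor container and a sibling user job container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                # Container names below are illustrative, not taken from the slides.
                {"name": "htcondor", "image": condor_image},
                {"name": "user-job", "image": job_image},
            ],
        },
    }


side_pod = side_container_pod(
    "htcondor-side-0",                 # hypothetical pod name
    "example.org/htcondor:latest",     # hypothetical HTCondor image
    "example.org/user-job:latest",     # hypothetical user-provided image
)

print(json.dumps(side_pod, indent=2))
```

Containers in one pod are scheduled together on the same node, which is what would let a user-supplied image run beside the HTCondor daemons without nesting one container inside another.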

  22. Will nested containers be a reality soon?

  23. Looking ahead

  24. A final picture • Opportunistic GPU usage over the past few months

  25. Summary • We created an opportunistic HTCondor pool in the PRP Kubernetes cluster • OSG users can now use any otherwise-unused cycles • The lack of nested containerization forces us to have multiple execute pod types • Some micromanagement is currently needed; we hope for more automation in the future

  26. Acknowledgments This work was partially funded by US National Science Foundation (NSF) awards CNS-1456638, CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC 1450871, OAC-1659169 and OAC-1841530.
