
Flights of the Condor: War Stories, Challenges, and Solutions

Flights of the Condor: War Stories, Challenges, and Solutions. Jason Stowe, Condor Week 2009, April 22nd, 2009. Coming to Condor Week since 2005. Started as a User. Users hunger for features.


Presentation Transcript


  1. Flights of the Condor: War Stories, Challenges, and Solutions. Jason Stowe, Condor Week 2009, April 22nd, 2009

  2. Coming to Condor Week since 2005. Started as a User

  3. Users hunger for features

  4. AccountingGroups (2004/2005); Configuration w/Pipes (2005/2006); GroupResourcesUsed (2006/2007); Condor in Cloud (2007/2008); Resource Weights (2008/2009). Based upon customer requests

  5. Focus on software development for managing Condor at any scale, and provide services that complement the technology

  6. Universities, Fortune 500s, Government Labs, Small/Medium Businesses, that use Condor

  7. Users like Condor because... it's open, it works, it's flexible, (for corporations) there is no API/operating system lock-in, and...

  8. The Community

  9. Today, let’s talk about a few challenges, solutions

  10. War Story #1: Compute & Data

  11. Whenever you find or solve a computation problem, you discover a data problem.

  12. “Dark” or Latent, Unused Storage on any OS/Device

  13. Empty space dispersed across machines in unusable sizes

  14. “We need more filer space, but we have empty space on all our machines.”

  15. So we looked at Hadoop

  16. New type of storage: Aggregated or “Cloud” Storage

  17. Block Store Architecture

  18. But how do we use it?

  19. 1.5 years ago: It works well to access it in Java, but what about mounting?

  20. So we tried WebDAV

  21. Next up, open source FUSE driver

  22. Need: Windows/Linux, Reliable, Large Files, scalable, and Read/Write

  23. Mountable drivers: Linux (FUSE) / Windows (IFS)

  24. CloudFS Architecture

  25. When we rolled it out...

  26. Customers Asked for Surprising Features
      • HTTP/REST protocols similar to Amazon S3. Reasons: installing a mountable driver across servers/workstations is prohibitive; customers want a similar interface to various cloud storage providers => Internal Cloud
      • FTP interface, because it is simple!

  27. Status Today

  28. Mountable Multi-platform Drivers. Linux: SUSE 10, RHEL/CentOS 4 & 5; Windows 2k3+; OSX 10.3+

  29. Encryption to avoid snooping sensitive data

  30. Data Nodes built on Java: Linux, Windows, OSX, Solaris

  31. RESTful Storage Service & FTP interface

  32. Management interface for controlling storage features (Integrating with CycleServer)

  33. Looking forward to condor_hadoop!

  34. War Story #2: Cloud Calculations

  35. Condor users: the Peak vs. Median usage Problem

  36. Need for compute power comes up suddenly

  37. Condor Users hunger for resources

  38. Condor users balance “We need more servers for big runs” and “Our servers are 40% utilized”

  39. Many ways to solve this problem using EC2

  40. Use cases do exist for adding nodes to a local Condor pool using Amazon EC2

  41. We favored entire pools in cloud

  42. Data Scheduling, Performance issues

  43. Run workflows faster using resources you could never buy...

  44. ...and we can test CycleServer at a scale our users have and we don’t

  45. Need a 1000-node Condor pool? Wait 15 minutes

  46. Dynamic Resources => Pool can be sized to the jobs

  47. 1 core x 1000 hrs = 1000 cores x 1 hr = ~$200

  48. Sounds good, but how do we do this for a Workflow like BLAST?

  49. From e-Science 2008: for 64x the processors, Hadoop running BLAST: 57x; mpiBLAST: 52.4x
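
The arithmetic behind slides 47 and 49 can be sketched in a few lines of Python. This is a rough illustration, not anything shown in the talk. It assumes flat per-core-hour pricing with no startup, storage, or data-transfer charges, and it reads the 57x and 52.4x figures on slide 49 as speedups relative to a 64x increase in processors. The function names are hypothetical.

```python
def core_hours(cores: int, hours: float) -> float:
    """Total core-hours consumed by a run."""
    return cores * hours


def implied_rate(total_cost: float, cores: int, hours: float) -> float:
    """Cost per core-hour implied by a total bill (assumes flat pricing)."""
    return total_cost / core_hours(cores, hours)


def parallel_efficiency(speedup: float, processor_ratio: float) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / processor_ratio


# Slide 47: one core for 1000 hours uses the same core-hours as 1000 cores for 1 hour.
serial = core_hours(1, 1000)
burst = core_hours(1000, 1)

# The slide's approximate $200 total implies this per-core-hour rate.
rate = implied_rate(200.0, 1000, 1)

# Slide 49, read as speedup on 64x the processors (an assumption).
hadoop_eff = parallel_efficiency(57.0, 64.0)
mpiblast_eff = parallel_efficiency(52.4, 64.0)

print(f"core-hours: serial={serial}, burst={burst}")
print(f"implied rate: ${rate:.2f} per core-hour")
print(f"efficiency: Hadoop BLAST {hadoop_eff:.1%}, mpiBLAST {mpiblast_eff:.1%}")
```

Under these assumptions the bill is the same either way, so the burst run buys a 1000x shorter wall-clock time at no extra compute cost, provided the workflow scales close to linearly. That condition is what the BLAST figures on slide 49 speak to.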
