
Experiments in Utility Computing: Hadoop and Condor



Presentation Transcript


  1. Experiments in Utility Computing: Hadoop and Condor
  Sameer Paranjpye, Y! Web Search

  2. Outline
  • Introduction
    • Application environment, motivation, development principles
  • Hadoop and Condor
    • Description, Hadoop-Condor interaction

  3. Introduction

  4. Web Search Application Environment
  • Data-intensive distributed applications
    • Crawling, document analysis and indexing, web graphs, log processing, …
  • Highly parallel workloads
    • Bandwidth to data is a significant design driver
  • Very large production deployments
    • Several clusters of 100s-1000s of nodes
    • Lots of data (billions of records; input/output of 10s of TB in a single run)

  5. Why Condor and Hadoop?
  • To date, our utility computing efforts have used a command-and-control model
    • Closed, “cathedral”-style development
    • Custom-built, proprietary solutions
  • Hadoop and Condor
    • An experimental effort to leverage open source for infrastructure components
  • Current deployment: a cluster supporting research computations
    • Multiple users running ad hoc, experimental programs

  6. Vision - Layered Platform, Open APIs
  • Applications (Crawl, Index, …)
  • Programming Models (MPI, DAG, MW, MR, …)
  • Distributed Store (HDFS, Lustre, Ibrix, …)
  • Batch Scheduling (Condor, SGE, SLURM, …)

  7. Development philosophy
  • Adopt, collaborate, extend
    • Open source commodity software
    • Open APIs for interoperability
  • Identify and use existing robust platform components
  • Engage the community and participate in developing nascent and emerging solutions

  8. Hadoop and Condor

  9. Hadoop
  • Open source project developing:
    • A distributed store
    • An implementation of the Map/Reduce programming model
  • Led by Doug Cutting
  • Implemented in Java
  • Alpha (0.1) release available for download (Apache distribution)
  • Genesis
    • Lucene and Nutch (open source search)
    • Hadoop factors out their distributed compute/storage infrastructure
  • http://lucene.apache.org/hadoop
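To make the Map/Reduce model on this slide concrete, here is a toy, single-process sketch of its three phases: map emits (key, value) pairs, the framework groups values by key ("shuffle"), and reduce folds each group to one result. The class and method names are illustrative; this deliberately ignores Hadoop's real job API and all distribution.

```java
import java.util.*;
import java.util.stream.*;

// Toy in-memory word count in the Map/Reduce style Hadoop implements.
public class ToyMapReduce {
    // Map phase: one input record -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: (key, all values for that key) -> one output value.
    static int reduce(String key, List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // "Framework": run map over all records, shuffle (group by key), reduce.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String record : records)
            for (Map.Entry<String, Integer> kv : map(record))
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
        Map<String, Integer> out = new TreeMap<>();
        shuffled.forEach((k, vs) -> out.put(k, reduce(k, vs)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the crawl fetched the page", "the index")));
    }
}
```

In Hadoop itself the shuffle happens across machines and the map/reduce functions are user-supplied classes, but the contract is the same as this sketch.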

  10. Hadoop DFS
  • Distributed storage system
    • Files are divided into uniform-sized blocks and distributed across cluster nodes
    • Block replication for failover
    • Checksums for corruption detection and recovery
  • DFS exposes details of block placement so that computation can be migrated to the data
  • Notable differences from mainstream DFS work
    • A single ‘storage + compute’ cluster vs. separate clusters
    • A simple I/O-centric API vs. attempts at POSIX compliance
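The block-plus-checksum idea above can be sketched in a few lines: split a file into uniform blocks, store a checksum per block, and recompute on read to detect a corrupt replica. The tiny block size and the use of CRC32 here are illustrative assumptions, not Hadoop's actual on-disk format.

```java
import java.util.*;
import java.util.zip.CRC32;

// Sketch: uniform-sized blocks, each carrying a checksum for corruption detection.
public class BlockChecksums {
    static final int BLOCK_SIZE = 4; // toy size; real HDFS blocks are tens of MB

    record Block(byte[] data, long checksum) {}

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // "Write" path: split the file into blocks, recording a checksum per block.
    static List<Block> split(byte[] file) {
        List<Block> blocks = new ArrayList<>();
        for (int off = 0; off < file.length; off += BLOCK_SIZE) {
            byte[] b = Arrays.copyOfRange(file, off,
                    Math.min(off + BLOCK_SIZE, file.length));
            blocks.add(new Block(b, crc(b)));
        }
        return blocks;
    }

    // "Read" path: recompute and compare. A mismatch means this replica is
    // corrupt, so a reader should fall back to another replica.
    static boolean verify(Block b) {
        return crc(b.data()) == b.checksum();
    }

    public static void main(String[] args) {
        List<Block> blocks = split("hello hdfs".getBytes());
        System.out.println("all valid: "
                + blocks.stream().allMatch(BlockChecksums::verify));
        blocks.get(0).data()[0] ^= 1; // simulate bit-rot on one replica
        System.out.println("block 0 valid after flip: " + verify(blocks.get(0)));
    }
}
```

Replication then builds on the same check: detected corruption triggers a re-read (and re-replication) from a healthy copy.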

  11. Hadoop DFS Architecture
  • Master/slave architecture
  • DFS master: the “Namenode”
    • Manages all filesystem metadata
    • Controls read/write access to files
    • Manages block replication
  • DFS slaves: the “Datanodes”
    • Serve read/write requests from clients
    • Perform replication tasks upon instruction by the Namenode
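A minimal model of the Namenode's role described above: it maps each file path to a list of block IDs, and each block to the set of Datanodes holding a replica. Everything here (class names, the round-robin placement) is an illustrative assumption, not Hadoop's actual code; the point is the shape of the metadata and the "where are this file's blocks?" query that clients make.

```java
import java.util.*;

// Toy Namenode metadata: file -> blocks, block -> replica locations.
public class ToyNamenode {
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    private final Map<Long, Set<String>> blockToDatanodes = new HashMap<>();
    private long nextBlockId = 0;

    // Metadata op: create a file of `numBlocks` blocks, each replicated on
    // `replicas` Datanodes (simple round-robin placement for illustration).
    void create(String path, int numBlocks, List<String> datanodes, int replicas) {
        List<Long> blocks = new ArrayList<>();
        for (int i = 0; i < numBlocks; i++) {
            long id = nextBlockId++;
            Set<String> holders = new TreeSet<>();
            for (int r = 0; r < replicas; r++)
                holders.add(datanodes.get((i + r) % datanodes.size()));
            blockToDatanodes.put(id, holders);
            blocks.add(id);
        }
        fileToBlocks.put(path, blocks);
    }

    // The query a client makes before reading: which Datanodes hold each
    // block of this file? Exposing this is what lets computation move to
    // the data rather than the other way around.
    List<Set<String>> locate(String path) {
        List<Set<String>> out = new ArrayList<>();
        for (long id : fileToBlocks.getOrDefault(path, List.of()))
            out.add(blockToDatanodes.get(id));
        return out;
    }

    public static void main(String[] args) {
        ToyNamenode nn = new ToyNamenode();
        nn.create("/home/sameerp/foo", 2, List.of("dn1", "dn2", "dn3"), 3);
        System.out.println(nn.locate("/home/sameerp/foo"));
    }
}
```

Note that in this design only metadata flows through the master; block I/O goes directly between clients and Datanodes, which is what the architecture diagram on the next slide depicts.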

  12. Hadoop DFS Architecture
  [Diagram: clients send metadata ops (e.g. /home/sameerp/foo, 3 replicas; /home/sameerp/docs, 4 replicas) to the Namenode, then perform block I/O directly against Datanodes spread across Rack 1 and Rack 2]

  13. Benchmarks

  14. Deployment
  • Research cluster of 600 nodes
    • A billion+ web pages
    • Several months' worth of logs
    • 10s of TB of data
  • Multiple users running ad hoc research computations
    • Crawl experiments, various kinds of log analysis, …
  • Commodity platform: Intel/AMD, Linux, locally attached SATA drives
  • A testbed for the open source approach
    • Still early days; deployment exposed many bugs
    • Future releases will first stabilize at the current size, then scale to 1000+ nodes

  15. Hadoop-Condor interactions
  • DFS makes data locations available to applications
    • Applications generate job descriptions (class-ads) to schedule jobs close to their data
  • Extensions to enable Hadoop programming models to run in the scheduler universe
    • Master/Worker, MPI-universe-like meta-scheduling
  • Condor enables sharing among applications
    • Priority, accounting, and quota mechanisms manage resource allocation among users and apps
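The first interaction above can be sketched concretely: take the Datanodes that HDFS reports as holding a job's input blocks and emit a Condor submit description whose `Requirements` expression restricts the job to those machines. The `Machine` attribute and `Requirements` expression follow standard Condor ClassAd conventions; the executable name and the surrounding ad are illustrative assumptions.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch: turn HDFS block locations into a data-locality constraint
// in a Condor job description.
public class LocalityAd {
    static String classAd(String executable, Set<String> dataHosts) {
        // Build e.g.: (Machine == "d") || (Machine == "e")
        String req = dataHosts.stream()
                .map(h -> "(Machine == \"" + h + "\")")
                .collect(Collectors.joining(" || "));
        return "Universe = vanilla\n"
             + "Executable = " + executable + "\n"
             + "Requirements = " + req + "\n"
             + "Queue\n";
    }

    public static void main(String[] args) {
        // Suppose HDFS reports the input's replicas live on nodes d and e,
        // as in the diagram on the next slide.
        System.out.print(classAd("analyze_logs", new TreeSet<>(Set.of("d", "e"))));
    }
}
```

A production version would presumably prefer (rather than require) the data-holding nodes, e.g. via a `Rank` expression, so the job can still run elsewhere when those machines are busy.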

  16. Hadoop-Condor interactions
  [Diagram: (1) scheduler-universe apps query HDFS for data locations (d, e); (2) they submit Condor class-ads requesting scheduling on d and e; (3)-(4) Condor allocates resources on the nodes holding the data]

  17. The End
