
-Sandeep Shiva

Decoupling Storage and Computation in Hadoop with SuperDataNodes. George Porter UC San Diego La Jolla, CA 92093 gmporter@cs.ucsd.edu. -Sandeep Shiva. Contents: Introduction Existing architecture Advantages, Limitations SuperDataNode Advantages, Limitations Evaluation Results.


Presentation Transcript


  1. Decoupling Storage and Computation in Hadoop with SuperDataNodes George Porter UC San Diego La Jolla, CA 92093 gmporter@cs.ucsd.edu -Sandeep Shiva

  2. Contents: • Introduction • Existing architecture • Advantages, Limitations • SuperDataNode • Advantages, Limitations • Evaluation Results

  3. Introduction: • Rise in data-intensive computing • Data-parallel programming systems • Use of Hadoop is growing: Adobe, AOL, Amazon, Facebook, and others • Fact: Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds!

  4. Hadoop Architecture

  5. Advantages of Hadoop: • It is able to process a portion of the data in parallel on each node, leading to very high scalability. • It relies on commodity server nodes and networking fabrics, reducing the cost of deployment considerably.

  6. Limitations: • The ratio of computation to storage might change over time, or might not be known in advance. • When the workload varies, it might be desirable to power down or re-purpose some of the Hadoop nodes for other applications. This is not possible in the existing architecture because the data is spread across all the nodes, and migrating it takes time. • Note: Offloading a single terabyte from a typical disk over a gigabit link takes approximately three hours.
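The three-hour figure in the note can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the helper `transfer_hours` and the 75% link efficiency are assumptions introduced here, not values from the slides. At full gigabit line rate a terabyte takes a little over two hours, and a modest allowance for disk and protocol overhead brings the estimate close to three.

```python
TERABYTE = 10**12  # bytes, decimal terabyte
GIGABIT = 10**9    # bits per second


def transfer_hours(size_bytes: float, rate_bits_per_sec: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually achieved (hypothetical parameter, not from the slides).
    """
    effective_rate = rate_bits_per_sec * efficiency
    return size_bytes * 8 / effective_rate / 3600


ideal = transfer_hours(TERABYTE, GIGABIT)
derated = transfer_hours(TERABYTE, GIGABIT, efficiency=0.75)  # assumed 75%
print(f"{ideal:.2f} h at line rate, {derated:.2f} h at 75% efficiency")
```

Under these assumptions, emptying even a single node before powering it down is an hours-long operation, which is the cost the slide is pointing at.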

  7. SuperDataNode

  8. SuperDataNode: • It is a node with a much larger number of disks for storage than a traditional Hadoop node. • It hosts a set of virtual machines (VMs). Each VM is assigned its own network interface (if 1 Gbit/sec links are used), or a portion of a network interface (if 10 Gbit/sec links are used), and its own IP address. • Note: An experiment showed that a SuperDataNode reduced the total job execution time of a Sort workload by 17%, and of a Grep workload by 54%.

  9. Advantages: • Decouples the amount of storage from the number of worker nodes. • Support for “archival” data – a subset of data with a low probability of access. • Increased uniformity for job scheduling and block placement. • Ease of management – workers become stateless; SuperDataNode (SDN) management is similar to that of a regular storage node.

  10. Limitations: • Storage bandwidth between SuperDataNodes and TaskTrackers is a scarce resource. • Effect on fault tolerance • Cost of SuperDataNodes

  11. Evaluation: • Baseline: 10 Sun Fire™ X4150 servers running OpenSolaris™, each with 8 GB of memory and four 146 GB SAS disk drives. • SuperDataNode: one Sun Fire™ X4540 server configured with 64 GB of memory and 48 SATA drives of 500 GB each. • Results: Sort: 17% less time; Grep: 54% less time
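To put the evaluation numbers in perspective, the following sketch computes the raw disk capacity of each configuration and converts the reported time reductions into speedup factors. It is only a back-of-the-envelope illustration: the variable names and the `speedup` helper are introduced here, and raw capacity ignores replication and file system overhead.

```python
# Figures taken from the evaluation slide.
BASELINE_SERVERS = 10
BASELINE_DISKS_PER_SERVER = 4
BASELINE_DISK_GB = 146
SDN_DISKS = 48
SDN_DISK_GB = 500

# Raw capacity in GB (no replication or file system overhead considered).
baseline_raw_gb = BASELINE_SERVERS * BASELINE_DISKS_PER_SERVER * BASELINE_DISK_GB
sdn_raw_gb = SDN_DISKS * SDN_DISK_GB


def speedup(time_reduction: float) -> float:
    """Convert a fractional reduction in run time into a speedup factor."""
    return 1.0 / (1.0 - time_reduction)


print(f"baseline raw capacity: {baseline_raw_gb} GB")
print(f"SuperDataNode raw capacity: {sdn_raw_gb} GB")
print(f"Sort speedup: {speedup(0.17):.2f}x, Grep speedup: {speedup(0.54):.2f}x")
```

Note that "54% less time" corresponds to a little more than a 2x speedup, not 1.54x, because the factor is the reciprocal of the remaining run time.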

  12. Thank you!
