
System Software Considerations for Cloud Computing on Big Data

Michael Kozuch, Intel Labs Pittsburgh. March 17, 2011.


Presentation Transcript


  1. System Software Considerations for Cloud Computing on Big Data
     Michael Kozuch, Intel Labs Pittsburgh
     March 17, 2011

  2. Outline
     • Background: Open Cirrus
     • Cluster software stack
     • Big Data
     • Power
     • Recent news

  3. Open Cirrus

  4. Open Cirrus* Cloud Computing Testbed
     Collaboration between industry and academia, sharing
     • hardware infrastructure
     • software infrastructure
     • research
     • applications and data sets
     Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
     14 sites currently, target of around 20 in the next two years
     Sites shown: ISPRAS*, KIT*, UIUC*, ETRI*, China Telecom*, CESGA*, CMU*, GaTech*, China Mobile*, IDA*, MIMOS*

  5. Open Cirrus* http://opencirrus.org
     Independently-managed sites providing a cooperative research testbed
     • Objectives
        • Foster systems research around cloud computing
        • Enable federation of heterogeneous datacenters
        • Vendor-neutral open-source stacks and APIs for the cloud
        • Expose research community to enterprise-level requirements
        • Capture realistic traces of cloud workloads
     • Each Site
        • Runs its own research and technical teams
        • Contributes individual technologies
        • Operates some of the global services

  6. Intel BigData Cluster
     [Figure: cluster topology diagram; the recoverable labels are listed below]
     Node groups:
     • Mobile Rack, 8 (1U) nodes: 2x Xeon E5440 (quad-core) [Harpertown], 16GB DRAM, 2x 1TB disk
     • 3U Rack, 5 storage nodes: 12x 1TB disk
     • 20 nodes: 1x Xeon (single-core) [Irwindale], 6GB DRAM, 366GB disk
     • 10 nodes: 2x Xeon 5160 (dual-core) [Woodcrest], 4GB RAM, 2x 75GB disk
     • 10 nodes: 2x Xeon E5345 (quad-core) [Clovertown], 8GB DRAM, 2x 150GB disk
     • Blade Rack, 40 nodes (two racks shown): 2x Xeon E5345 (quad-core) [Clovertown], 8GB DRAM, 2x 150GB disk
     • 1U Rack, 15 nodes: 2x Xeon E5420 (quad-core) [Harpertown], 8GB DRAM, 2x 1TB disk
     • 2U Rack, 15 nodes: 2x Xeon E5440 (quad-core) [Harpertown], 8GB DRAM, 6x 1TB disk
     • 2U Rack, 15 nodes: 2x Xeon E5520 (quad-core) [Nehalem-EP], 16GB DRAM, 6x 1TB disk
     • 12 nodes: 2x Xeon X5650 (six-core) [Westmere-EP], 48GB DRAM, 6x 0.5TB disk
     Network: 1 Gb/s links (bundled x2 to x27, some point-to-point) into 24 Gb/s and 48 Gb/s switches; 45 Mb/s T3 to Internet
     Key: rXrY = row X rack Y; rXrYcZ = row X rack Y chassis Z
     Rack locations labeled: r1r1; r1r2; r1r3, r1r4, r2r3; r1r5; r3r2, r3r3; r2r1c1-4; r2r2c1-4

  7. Cloud Software Stack

  8. Cloud Software Stack – Key Learnings
     • Enable use of application frameworks (Hadoop, Maui-Torque)
     • Enable general IaaS use
     • Provide Big Data storage service
     • Enable physical resource allocation
     Stack layers (top to bottom): Application Frameworks; IaaS; Storage Service; Resource Allocator; Nodes
     Why Physical?
     • Virtualization overhead
     • Access to physical resources
     • Security issues

  9. Zoni Functionality
     Provides each project with a mini-datacenter
     Isolation of experiments
     • Allocation
        • Assignment of physical resources to users
     • Isolation
        • Allow multiple mini-clusters to co-exist without interference
     • Provisioning
        • Booting of specified OS
     • Management
        • OOB power management
     • Debugging
        • OOB console access
     [Figure labels: Domain 0 (DNS/PXE/DHCP, Server Pool 0); Domain 1 (DNS/PXE/DHCP, Server Pool 1); Gateway]

  10. Intel BigData Cluster Dashboard

  11. Big Data

  12. Example Applications
      "There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since 1948." - Gartner

  13. Big Data
      • Interesting applications are data hungry
      • The data grows over time
      • The data is immobile
         • 100 TB @ 1 Gbps ~= 10 days
      • Compute comes to the data
      • Big Data clusters are the new libraries
      The value of a cluster is its data
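The "100 TB @ 1 Gbps ~= 10 days" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal terabytes and a link running at full line rate with no protocol overhead; the function name `transfer_days` is illustrative only.

```python
def transfer_days(data_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move data_bytes over a link at full line rate."""
    seconds = data_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # seconds per day


TB = 10**12  # decimal terabyte (assumption)

days = transfer_days(100 * TB, 1e9)
print(f"{days:.1f} days")  # prints "9.3 days"
```

Any real transfer would be slower once overhead and contention are included, which is consistent with the slide's rounded figure of about 10 days.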

  14. Example Motivating Application: Online Processing of Archival Video
      • Research project: Develop a context recognition system that is 90% accurate over 90% of your day
         • Leverage a combination of low- and high-rate sensing for perception
         • Federate many sensors for improved perception
      • Big Data: Terabytes of archived video from many egocentric cameras
      • Example query 1: "Where did I leave my briefcase?"
         • Sequential search through all video streams [Parallel Camera]
      • Example query 2: "Now that I've found my briefcase, track it"
         • Cross-cutting search among related video streams [Parallel Time]
      [Figure label: Big Data Cluster]

  15. Big Data System Requirements
      • Provide high-performance execution over Big Data repositories
         → Many spindles, many CPUs
         → Parallel processing
      • Enable multiple services to access a repository concurrently
      • Enable low-latency scaling of services
      • Enable each service to leverage its own software stack
         → IaaS, file-system protections where needed
      • Enable slow resource scaling for growth
      • Enable rapid resource scaling for power/demand
         → Scaling-aware storage

  16. Storing the Data – Choices
      Model 1: Separate Compute/Storage (compute servers, storage servers)
      • Compute and storage can scale independently
      • Many opportunities for reliability
      Model 2: Co-located Compute/Storage (compute/storage servers)
      • No compute resources are under-utilized
      • Potential for higher throughput

  17. Cluster Model
      [Figure: external network; Cluster Switch (BWswitch) with connections to R racks; per-rack TOR Switch; rack of N server nodes, each with BWnode, BWdisk, p cores, d disks]
      The cluster switch quickly becomes the bottleneck.
      Local computation is crucial.

  18. I/O Throughput Analysis
      Configuration: 20 racks of 20 2-disk servers; BWswitch = 10 Gbps
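The slide's results table did not survive in this transcript, but the bottleneck argument can be sketched under stated assumptions. The model below is not the author's analysis: it assumes 0.64 Gbps (80 MB/s) per disk, 1 Gbps per node NIC, and that BWswitch is the total capacity the cluster switch offers to cross-rack traffic. It compares purely local reads with reads of data spread uniformly at random across the cluster.

```python
def _bound(capacity: float, fraction: float) -> float:
    """Total throughput allowed when `fraction` of it must cross `capacity`."""
    return capacity / fraction if fraction > 0 else float("inf")


def aggregate_read_gbps(racks: int, nodes_per_rack: int, disks_per_node: int,
                        bw_disk: float, bw_node: float, bw_switch: float,
                        local: bool) -> float:
    """Upper bound on cluster-wide read throughput, in Gbps."""
    nodes = racks * nodes_per_rack
    disk_bound = nodes * disks_per_node * bw_disk
    if local:
        # Every task reads from its own node's disks; no network involved.
        return disk_bound
    # Uniform random placement: most reads leave the node and the rack.
    off_node = 1 - 1 / nodes
    off_rack = 1 - 1 / racks
    nic_bound = _bound(nodes * bw_node, off_node)
    switch_bound = _bound(bw_switch, off_rack)
    return min(disk_bound, nic_bound, switch_bound)


# Slide configuration: 20 racks of 20 2-disk servers, BWswitch = 10 Gbps.
# bw_disk and bw_node are assumptions, not values from the slides.
cfg = dict(racks=20, nodes_per_rack=20, disks_per_node=2,
           bw_disk=0.64, bw_node=1.0, bw_switch=10.0)
local_gbps = aggregate_read_gbps(**cfg, local=True)
random_gbps = aggregate_read_gbps(**cfg, local=False)
print(f"local: {local_gbps:.0f} Gbps, random: {random_gbps:.1f} Gbps")
```

Under these assumptions the random-placement case is capped by the cluster switch at roughly 10.5 Gbps, about fifty times below the local-read bound, which illustrates why local computation is crucial.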

  19. Data Location Information
      • Issues:
         • Many different file system possibilities (HDFS, PVFS, Lustre, etc.)
         • Many different application framework possibilities
         • Consumers could be virtualized
      • Solution:
         • Standard cluster-wide Data Location Service
         • Resource Telemetry Service to evaluate scheduling choices
         • Enables virtualized location info and file system agnosticism
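One way to picture how the two services could cooperate is the sketch below. All class, method, and host names here are hypothetical and invented for illustration; the slides do not specify an API. The idea is that a scheduler asks the Data Location Service where a block lives, then uses the Resource Telemetry Service to pick among those hosts, without knowing which file system holds the data.

```python
from dataclasses import dataclass, field


@dataclass
class DataLocationService:
    """Hypothetical cluster-wide map from (file, block index) to replica hosts."""
    _locations: dict = field(default_factory=dict)

    def register(self, path: str, block: int, hosts: list) -> None:
        self._locations[(path, block)] = list(hosts)

    def locate(self, path: str, block: int) -> list:
        return list(self._locations.get((path, block), []))


@dataclass
class ResourceTelemetryService:
    """Hypothetical per-host load view (0.0 = idle, 1.0 = saturated)."""
    load: dict = field(default_factory=dict)

    def host_load(self, host: str) -> float:
        return self.load.get(host, 0.0)


def pick_host(dls, rts, path: str, block: int):
    """Return the least-loaded host holding the block, or None if unknown."""
    hosts = dls.locate(path, block)
    if not hosts:
        return None
    return min(hosts, key=rts.host_load)


dls = DataLocationService()
dls.register("/video/cam1.avi", 0, ["node-a", "node-b"])
rts = ResourceTelemetryService({"node-a": 0.9, "node-b": 0.2})
print(pick_host(dls, rts, "/video/cam1.avi", 0))  # prints "node-b"
```

Because the scheduler only sees host names, the same lookup could in principle sit in front of HDFS, PVFS, or Lustre, or be translated for virtualized consumers.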

  20. Exposing Location Information
      [Figure: two panels, (a) non-virtualized and (b) virtualized. Labels: Data Location Service; Resource Telemetry Service; LA application; LA runtime; DFS; OS; Virtual Machines; Guest OS; VM Runtime; VMM]

  21. Power

  22. Demand Scaling / Power Proportionality
      • "A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems," Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, and Albert Zomaya
      [Figure label: (System) Efficiency]

  23. Power Proportionality and Big Data
      [Plot: The Hadoop Filesystem (10K blocks). x-axis: node number i (up to i=100); y-axis: number of blocks stored on node i (up to 2000)]
      Possible power savings: ~66%, ~0%
      Optimal: ~95%

  24. Rabbit Filesystem
      A reliable, power-proportional filesystem for Big Data workloads
      Simple Strategy: Maintain a "primary replica"
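The primary-replica idea can be illustrated with a toy layout. This is a sketch under assumptions, not Rabbit's actual placement algorithm: it puts one replica of every block on a small fixed set of primary nodes (round-robin) and the remaining replicas on the other nodes, so that powering on only the primary set keeps every block available. The function names and the choice of 5 primary nodes are arbitrary.

```python
def primary_replica_layout(num_blocks: int, num_nodes: int,
                           num_primary: int, replicas: int = 3) -> dict:
    """Map block id -> list of node ids; the first entry is the primary replica."""
    secondary = list(range(num_primary, num_nodes))
    if len(secondary) < replicas - 1:
        raise ValueError("not enough non-primary nodes for the extra replicas")
    layout = {}
    for b in range(num_blocks):
        primary = b % num_primary
        # Consecutive positions modulo len(secondary) give distinct nodes.
        others = [secondary[(b * (replicas - 1) + k) % len(secondary)]
                  for k in range(replicas - 1)]
        layout[b] = [primary] + others
    return layout


def blocks_available(layout: dict, powered_on: set) -> bool:
    """True if every block has at least one replica on a powered-on node."""
    return all(any(n in powered_on for n in nodes) for nodes in layout.values())


# 10K blocks on 100 nodes, as in the plot on the previous slide.
layout = primary_replica_layout(10_000, 100, num_primary=5)
primary_set = set(range(5))
print(blocks_available(layout, primary_set))    # True: primaries cover all blocks
print(blocks_available(layout, set(range(4))))  # False: one primary node is off
```

With a layout like this, the fraction of nodes that can be powered down while keeping all data readable is set by the size of the primary set rather than by chance, in contrast to a uniformly random block placement.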

  25. Recent News

  26. Recent News
      • "Intel Labs to Invest $100 Million in U.S. University Research"
         • Over five years
         • Intel Science and Technology Centers: 3+2 year sponsored research
         • Half-dozen or more by 2012
         • Each can have small number of Intel research staff on site
      • New ISTC focusing on cloud computing possible

  27. Tentative Research Agenda Framing

  28. Potential Questions

  29. Potential Research Questions
      • Software stack
         • Is physical allocation an interesting paradigm for the public cloud?
         • What are the right interfaces between the layers?
         • Can multi-variable optimization work across layers?
      • Big Data
         • Can a hybrid cloud-HPC file system provide best-of-both-worlds?
         • How should the file system deal with heterogeneity?
         • What are the right file system sharing models for the cloud?
         • Can physical resources be taken from the FS and given back?

  30. Potential Research Questions
      • Power
         • Can storage service power be reduced without reducing availability?
         • How should a power-proportional FS maintain a good data layout?
      • Federation
         • Which applications can cope with limited bandwidth between sites?
         • What are the optimal ways to join data across clusters?
         • How necessary is federation?
      How should compute, storage, and power be managed to optimize for performance, energy, and fault-tolerance?

  31. Backup

  32. Scaling – Power Proportionality
      • Demand scaling presents perf./power trade-off
         • Our servers: 250W loaded, 150W idle, 10W off, 200s setup
      • Research underway for scaling cloud applications
         • Control theory
         • Load prediction
         • Autoscaling
      • Scaling beyond single tier less well-understood
      [Figure: Cloud-based App; request rate: λ]
      Note: proportionality issue is orthogonal to FAWN design
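The trade-off can be made concrete with the server figures on this slide (250W loaded, 150W idle, 10W off). The sketch below makes two assumptions of its own: power rises linearly from idle to loaded when demand is spread over all servers, and the 200s setup time is ignored, so it shows only the steady-state saving and not the performance cost of waking servers. Demand is expressed as the number of fully loaded servers' worth of work.

```python
import math

P_LOADED, P_IDLE, P_OFF = 250.0, 150.0, 10.0  # watts, from the slide


def always_on_watts(n_servers: int, demand: float) -> float:
    """All servers stay on; demand is spread across them (linear power model)."""
    return n_servers * P_IDLE + (P_LOADED - P_IDLE) * demand


def demand_scaled_watts(n_servers: int, demand: float) -> float:
    """Only enough servers to carry the demand are on; the rest are off."""
    active = min(n_servers, math.ceil(demand))
    return active * P_LOADED + (n_servers - active) * P_OFF


n, demand = 100, 30  # 100 servers at 30% aggregate load (illustrative)
on = always_on_watts(n, demand)
scaled = demand_scaled_watts(n, demand)
print(on, scaled)  # prints "18000.0 8200.0"
print(f"saving: {1 - scaled / on:.0%}")
```

The gap between the two numbers is the potential saving; the 200s setup time is what makes it hard to realize, since servers must be powered on ahead of demand, which is where load prediction and autoscaling come in.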

  33. Scaling – Power Proportionality
      • Project 1: Multi-tier power management
         • E.g. Facebook
      • Project 2: Multi-variable optimization
      • Project 3: Collective optimization
      • Open Cirrus may have key role
      [Figure: request rate λ entering a stack of IaaS (e.g. Tashi); Distributed file system (e.g. Rabbit); Resource allocator (e.g. Zoni); Physical resources]
