System Software Considerations for Cloud Computing on Big Data
Michael Kozuch, Intel Labs Pittsburgh
March 17, 2011
Outline • Background: Open Cirrus • Cluster software stack • Big Data • Power • Recent news
Open Cirrus* Cloud Computing Testbed
A collaboration between industry and academia, sharing:
• hardware infrastructure
• software infrastructure
• research
• applications and data sets
Sponsored by HP, Intel, and Yahoo! (with additional support from NSF). 14 sites currently (including ISPRAS*, KIT*, UIUC*, ETRI*, China Telecom*, CESGA*, CMU*, GaTech*, China Mobile*, IDA*, and MIMOS*), with a target of around 20 in the next two years.
Open Cirrus* http://opencirrus.org
Independently-managed sites providing a cooperative research testbed.
• Objectives
• Foster systems research around cloud computing
• Enable federation of heterogeneous datacenters
• Provide vendor-neutral, open-source stacks and APIs for the cloud
• Expose the research community to enterprise-level requirements
• Capture realistic traces of cloud workloads
• Each site
• Runs its own research and technical teams
• Contributes individual technologies
• Operates some of the global services
Intel BigData Cluster
[Cluster diagram: racks of heterogeneous nodes connected by 1 Gb/s links through per-rack switches (24-48 Gb/s) to the cluster switches, with a 45 Mb/s T3 link to the Internet. Node types include:]
• Mobile rack: 8 1U nodes, each with 2 quad-core Xeon E5440 (Harpertown), 16 GB DRAM, 2x 1 TB disks
• 3U rack: 5 storage nodes, each with 12x 1 TB disks
• Blade racks (2): 40 nodes each, with 2 quad-core Xeon E5345 (Clovertown), 8 GB DRAM, 2x 150 GB disks
• 1U racks: 15 nodes each, with 2 quad-core Xeon E5420 (Harpertown), 8 GB DRAM, 2x 1 TB disks
• 2U racks: 15 nodes each, with either 2 quad-core Xeon E5440 (Harpertown), 8 GB DRAM, 6x 1 TB disks, or 2 quad-core Xeon E5520 (Nehalem-EP), 16 GB DRAM, 6x 1 TB disks
• 12 nodes with 2 six-core Xeon X5650 (Westmere-EP), 48 GB DRAM, 6x 0.5 TB disks
• Older nodes: 20 with a single-core Xeon (Irwindale), 6 GB DRAM, 366 GB disk; 10 with 2 dual-core Xeon 5160 (Woodcrest), 4 GB RAM, 2x 75 GB disks; 10 with 2 quad-core Xeon E5345 (Clovertown), 8 GB DRAM, 2x 150 GB disks
Cloud Software Stack – Key Learnings
• Enable use of application frameworks (Hadoop, Maui-Torque)
• Enable general IaaS use
• Provide a Big Data storage service
• Enable allocation of physical resources
[Stack diagram: Application Frameworks and IaaS layered over a Storage Service and a Resource Allocator, spanning the cluster nodes]
Why physical allocation? Virtualization overhead, access to physical resources, and security issues.
Zoni Functionality
Provides each project with a mini-datacenter and isolation of experiments.
• Allocation: assignment of physical resources to users
• Isolation: allow multiple mini-clusters to co-exist without interference
• Provisioning: booting of a specified OS
• Management: out-of-band (OOB) power management
• Debugging: OOB console access
[Diagram: per-domain server pools (Domain 0, Domain 1), each with its own DNS/PXE/DHCP service, behind a gateway]
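As an illustration of the out-of-band management a system like Zoni relies on, the sketch below wraps the standard ipmitool CLI to query and power-cycle nodes through their BMCs. The host names and credential handling are hypothetical and illustrative; this is not Zoni's actual interface.

```python
import subprocess

def ipmi_power(bmc_host: str, user: str, password: str, action: str = "status") -> str:
    """Issue an out-of-band power command (status/on/off/cycle) to a node's BMC.

    Uses the standard ipmitool CLI over the IPMI lanplus interface; the
    surrounding node/credential management here is purely illustrative.
    """
    if action not in {"status", "on", "off", "cycle"}:
        raise ValueError(f"unsupported power action: {action}")
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", bmc_host, "-U", user, "-P", password,
           "chassis", "power", action]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Example (hypothetical BMC address): power-cycle a node handed to a new mini-cluster
# print(ipmi_power("node42-bmc.example.org", "admin", "secret", "cycle"))
```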
Example Applications There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since 1948. - Gartner
Big Data • Interesting applications are data hungry • The data grows over time • The data is immobile • 100 TB @ 1Gbps ~= 10 days • Compute comes to the data • Big Data clusters are the new libraries The value of a cluster is its data
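The "compute comes to the data" conclusion follows directly from the transfer arithmetic; a quick check, assuming the full 1 Gb/s link is usable with no protocol overhead:

```python
# Back-of-the-envelope: moving 100 TB over a 1 Gb/s link
data_bits = 100e12 * 8                 # 100 TB expressed in bits
link_bps = 1e9                         # 1 Gb/s, assumed fully usable
transfer_days = data_bits / link_bps / 86400
print(f"~{transfer_days:.1f} days")    # ~9.3 days, i.e. roughly 10 days
```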
Example Motivating Application: Online Processing of Archival Video
• Research project: develop a context-recognition system that is 90% accurate over 90% of your day
• Leverage a combination of low- and high-rate sensing for perception
• Federate many sensors for improved perception
• Big Data: terabytes of archived video from many egocentric cameras, held in a Big Data cluster
• Example query 1: "Where did I leave my briefcase?" Sequential search through all video streams [Parallel Camera]
• Example query 2: "Now that I've found my briefcase, track it." Cross-cutting search among related video streams [Parallel Time]
Big Data System Requirements
• Provide high-performance execution over Big Data repositories: many spindles, many CPUs, parallel processing
• Enable multiple services to access a repository concurrently
• Enable low-latency scaling of services
• Enable each service to leverage its own software stack: IaaS, with file-system protections where needed
• Enable slow resource scaling for growth
• Enable rapid resource scaling for power/demand: scaling-aware storage
Storing the Data – Choices
• Model 1: Separate compute and storage servers. Compute and storage can scale independently; many opportunities for reliability.
• Model 2: Co-located compute/storage servers. No compute resources are under-utilized; potential for higher throughput.
Cluster Model
[Diagram: a cluster switch (bandwidth BW_switch) on the external network connects to R racks; each rack of N server nodes has a top-of-rack (TOR) switch, and each node has p cores, d disks, node bandwidth BW_node, and disk bandwidth BW_disk.]
The cluster switch quickly becomes the bottleneck; local computation is crucial.
I/O Throughput Analysis
[Chart: aggregate I/O throughput for 20 racks of 20 two-disk servers, with BW_switch = 10 Gb/s]
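A rough version of that analysis falls out of the cluster model above. The sketch below compares aggregate read bandwidth when every node reads its own disks against the case where all data must cross the cluster switch. The rack counts and switch bandwidth come from the slide; the per-disk and per-NIC bandwidths are illustrative assumptions.

```python
# Cluster model parameters
R, N, d = 20, 20, 2              # 20 racks of 20 two-disk servers (from the slide)
BW_switch = 10e9                 # cluster switch, 10 Gb/s (from the slide)
BW_disk = 0.8e9                  # ~100 MB/s per disk -- assumed
BW_node = 1e9                    # 1 Gb/s NIC per node -- assumed

# Co-located compute/storage with local reads: each node streams from its own disks.
local_aggregate = R * N * d * BW_disk                     # ~640 Gb/s

# Separate storage (or non-local placement): every byte traverses the cluster switch.
remote_aggregate = min(R * N * d * BW_disk, BW_switch)    # capped at 10 Gb/s

print(f"local reads : {local_aggregate / 1e9:6.0f} Gb/s aggregate")
print(f"via switch  : {remote_aggregate / 1e9:6.0f} Gb/s aggregate")
# The ~64x gap is why the cluster switch becomes the bottleneck
# and why local computation is crucial.
```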
Data Location Information
• Issues:
• Many different file system possibilities (HDFS, PVFS, Lustre, etc.)
• Many different application framework possibilities
• Consumers could be virtualized
• Solution:
• A standard, cluster-wide Data Location Service
• A Resource Telemetry Service to evaluate scheduling choices
• Together these enable virtualized location information and file-system agnosticism
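One way to picture such a service: a thin, file-system-agnostic query layer that maps a file's blocks to the nodes holding replicas, which a framework scheduler can then use to prefer local placements. The interface below is a hypothetical sketch, not the actual Open Cirrus service API.

```python
from collections import defaultdict

class DataLocationService:
    """Hypothetical cluster-wide location service: (file, block) -> replica hosts.

    A real deployment would translate queries to the underlying DFS
    (HDFS, PVFS, Lustre, ...) and map virtual to physical hosts for VMs.
    """
    def __init__(self):
        self._blocks = defaultdict(list)   # (path, block_index) -> [hostnames]

    def register(self, path, block_index, hosts):
        self._blocks[(path, block_index)] = list(hosts)

    def locations(self, path, block_index):
        return self._blocks.get((path, block_index), [])

    def preferred_host(self, path, block_index, candidates):
        """Pick a candidate compute node that already holds the block, if any."""
        local = set(self.locations(path, block_index)) & set(candidates)
        return next(iter(local), None)   # fall back to the scheduler's default

# Example: a framework asks where to run a task over block 0 of a video file
dls = DataLocationService()
dls.register("/videos/cam7.mp4", 0, ["node12", "node40", "node87"])
print(dls.preferred_host("/videos/cam7.mp4", 0, ["node40", "node91"]))  # node40
```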
Exposing Location Information
[Diagram: in both (a) non-virtualized and (b) virtualized stacks, a location-aware (LA) application and its LA runtime sit above the OS (or guest OS, VM runtime, and VMM) and the distributed file system (DFS), and query the cluster's Data Location Service and Resource Telemetry Service.]
Demand Scaling / Power Proportionality
[Chart: system efficiency versus load]
• See "A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems," Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, and Albert Zomaya
Power Proportionality and Big Data
[Chart: number of blocks stored on each node (node number i, up to i=100) of the Hadoop Filesystem with 10K blocks. Possible power savings: ~0% to ~66%, versus an optimal ~95%.]
Rabbit Filesystem
A reliable, power-proportional filesystem for Big Data workloads.
Simple strategy: maintain a "primary replica".
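The idea behind the primary-replica strategy is to keep one complete copy of the data on a small "primary" subset of nodes, so that the remaining nodes can be powered down without losing availability. The placement sketch below illustrates that idea under simplifying assumptions (uniform block sizes, a fixed primary set); it is not Rabbit's actual layout algorithm.

```python
import random

def place_blocks(blocks, nodes, primary_count, replication=3):
    """Toy power-proportional placement: replica 1 of every block goes to a
    small primary set; remaining replicas spread over the other nodes."""
    primaries, secondaries = nodes[:primary_count], nodes[primary_count:]
    layout = {}
    for i, block in enumerate(blocks):
        first = primaries[i % len(primaries)]                 # primary copy
        others = random.sample(secondaries, replication - 1)  # extra copies
        layout[block] = [first] + others
    return layout

nodes = [f"node{i:02d}" for i in range(100)]
layout = place_blocks(range(10_000), nodes, primary_count=10)

# With every block's first replica on 10 primaries, the other 90 nodes can be
# powered off and all data stays available (at reduced bandwidth) -- versus a
# layout that scatters all replicas randomly, where almost no node can be
# turned off without making some block unavailable.
```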
Recent News
• "Intel Labs to Invest $100 Million in U.S. University Research"
• Over five years
• Intel Science and Technology Centers (ISTCs): 3+2-year sponsored research
• Half a dozen or more by 2012
• Each can have a small number of Intel research staff on site
• A new ISTC focusing on cloud computing is possible
Potential Research Questions • Software stack • Is physical allocation an interesting paradigm for the public cloud? • What are the right interfaces between the layers? • Can multi-variable optimization work across layers? • Big Data • Can a hybrid cloud-HPC file system provide best-of-both-worlds? • How should the file system deal with heterogeneity? • What are the right file system sharing models for the cloud? • Can physical resources be taken from the FS and given back?
Potential Research Questions
• Power
• Can storage service power be reduced without reducing availability?
• How should a power-proportional FS maintain a good data layout?
• Federation
• Which applications can cope with limited bandwidth between sites?
• What are the optimal ways to join data across clusters?
• How necessary is federation?
Overall: how should compute, storage, and power be managed to optimize for performance, energy, and fault-tolerance?
Scaling – Power Proportionality
• Demand scaling presents a performance/power trade-off
• Our servers: 250 W loaded, 150 W idle, 10 W off, 200 s setup time
• Research is underway on scaling cloud applications: control theory, load prediction, autoscaling
• Scaling beyond a single tier is less well understood
[Diagram: a cloud-based app receiving requests at rate λ]
Note: the proportionality issue is orthogonal to the FAWN design.
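Those power numbers make the setup cost of demand scaling concrete. A quick break-even estimate, assuming a server draws roughly full (loaded) power during its 200 s setup, which is an assumption rather than a measured figure:

```python
P_loaded, P_idle, P_off = 250.0, 150.0, 10.0   # watts (from the slide)
t_setup = 200.0                                 # seconds of setup (from the slide)
P_setup = P_loaded                              # assumption: ~full power while setting up

def energy_idle(gap_s):
    """Leave the server idle for the whole gap (joules)."""
    return P_idle * gap_s

def energy_off(gap_s):
    """Power off, then start setup t_setup before the server is needed again."""
    return P_off * (gap_s - t_setup) + P_setup * t_setup

# Break-even idle gap: P_idle*T = P_off*(T - t_setup) + P_setup*t_setup
break_even = t_setup * (P_setup - P_off) / (P_idle - P_off)
print(f"powering off pays only for idle gaps longer than ~{break_even:.0f} s")  # ~343 s
print(energy_idle(600) - energy_off(600))  # a 10-minute gap saves ~36 kJ
```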
Scaling – Power Proportionality
• Project 1: Multi-tier power management (e.g., Facebook)
• Project 2: Multi-variable optimization
• Project 3: Collective optimization; Open Cirrus may have a key role
[Stack diagram: requests at rate λ flow through the IaaS layer (e.g., Tashi), the distributed file system (e.g., Rabbit), and the resource allocator (e.g., Zoni), down to the physical resources]