Cloud Computing Skepticism

Abhishek Verma, Saurabh Nangia

Outline • Cloud computing hype • Cynicism • MapReduce Vs Parallel DBMS • Cost of a cloud • Discussion

Recent Trends • Amazon S3 (March 2006) • Salesforce AppExchange (March 2006) • Amazon EC2 (August 2006) • Facebook Platform (May 2007) • Google App Engine (April 2008) • Microsoft Azure (Oct 2008)

Tremendous Buzz

Gartner Hype Cycle* Cloud Computing * From http://en.wikipedia.org/wiki/Hype_cycle

Blind men and an Elephant CLOUD COMPUTING

“Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades.” whatis.com Definition of Cloud Computing

“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. […] The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?” Larry Ellison During Oracle’s Analyst Day From http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/

From http://geekandpoke.typepad.com

Reliability • Many enterprises (necessarily or unnecessarily) set their SLA uptimes at 99.99% or higher, which cloud providers have not yet been prepared to match • Not clear that all applications require such high service levels • IT shops do not always deliver on their SLAs, but their failures are less public and customers can’t switch easily * SLAs expressed in Monthly Uptime Percentages; Source: McKinsey & Company

A Comparison of Approaches to Large-Scale Data Analysis* • Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker • To appear in SIGMOD ’09 • *Basic ideas from “MapReduce: A major step backwards”, D. DeWitt and M. Stonebraker

MapReduce – A major step backwards • A giant step backward • No schemas, CODASYL instead of relational • A sub-optimal implementation • Uses brute-force sequential search instead of indexing • Materializes O(m·r) intermediate files • Does not account for data skew • Not novel at all • Represents a specific implementation of well-known techniques developed nearly 25 years ago • Missing most of the common current DBMS features • Bulk loader, indexing, updates, transactions, integrity constraints, referential integrity, views • Incompatible with DBMS tools • Report writers, business intelligence tools, data mining tools, replication tools, database design tools

MapReduce II* • MapReduce didn't kill our dog, steal our car, or try and date our daughters. • MapReduce is not a database system, so don't judge it as one • Both analyze and perform computations on huge datasets • MapReduce has excellent scalability; the proof is Google's use • Does it scale linearly? • No scientific evidence • MapReduce is cheap and databases are expensive • We are the old guard trying to defend our turf/legacy from the young turks • Propagation of ideas between sub-disciplines is very slow and sketchy • Very little information is passed from generation to generation * http://www.databasecolumn.com/2008/01/mapreduce-continued.html

Tested Systems • Hadoop • 0.19 on Java 1.6, 256MB block size, JVM reuse • Rack-awareness enabled • DBMS-X (unnamed) • Parallel DBMS from a “major relational db vendor” • Row based, compression enabled • Vertica (co-founded by Stonebraker) • Column oriented • Hardware configuration: 100 nodes • 2.4 GHz Intel Core 2 Duo • 4GB RAM, two 250GB SATA hard disks • GigE ports, 128Gbps switching fabric

Data Loading • Grep Dataset • Record = 10-byte key + 90-byte random value • 5.6 million records = 535MB/node • Another set = 1TB/cluster • Hadoop • Command line utility • DBMS-X • LOAD SQL command • Administrative command to re-organize data
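The per-node size quoted for the Grep dataset can be checked with a short calculation. This is only a sketch: the constant and function names are illustrative, and it assumes the slide's "535MB" uses binary megabytes (MiB), which is what makes 5.6 million 100-byte records come out near that figure.

```python
# Figures from the slide: 10-byte key + 90-byte value, 5.6 million records per node.
KEY_BYTES = 10
VALUE_BYTES = 90
RECORDS_PER_NODE = 5_600_000


def per_node_bytes(records: int = RECORDS_PER_NODE) -> int:
    """Raw size of the Grep dataset on one node, in bytes."""
    return records * (KEY_BYTES + VALUE_BYTES)


def to_mib(n_bytes: int) -> float:
    """Convert bytes to binary megabytes (assumption: the slide's MB means MiB)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    size = per_node_bytes()
    print(f"{size} bytes per node, about {to_mib(size):.0f} MiB")
```

The result lands within about 1 MiB of the slide's 535MB, which supports reading "10b" and "90b" as bytes rather than bits.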

Grep Task Results • SELECT * FROM Data WHERE field LIKE '%XYZ%';

Select Task Results • SELECT pageURL, pageRank FROM Rankings WHERE pageRank > X;

Join Task • SELECT INTO Temp sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue FROM Rankings AS R, UserVisits AS UV WHERE R.pageURL = UV.destURL AND UV.visitDate BETWEEN Date('2000-01-15') AND Date('2000-01-22') GROUP BY UV.sourceIP; • SELECT sourceIP, totalRevenue, avgPageRank FROM Temp ORDER BY totalRevenue DESC LIMIT 1;
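To make the two-step join query concrete, here is a minimal sketch that runs an equivalent query on a toy dataset using Python's built-in sqlite3 module. It is not the benchmark setup: SQLite stands in for the parallel DBMSs, the sample rows are made up, the intermediate table is named JoinResult and filled with INSERT ... SELECT (SQLite has no SELECT INTO), and dates are stored as ISO-8601 text so that BETWEEN compares them correctly.

```python
import sqlite3


def top_revenue_source(rankings, visits, start, end):
    """Return (sourceIP, totalRevenue, avgPageRank) for the top-earning source IP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER)")
    conn.execute(
        "CREATE TABLE UserVisits "
        "(sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL)"
    )
    conn.execute(
        "CREATE TABLE JoinResult "
        "(sourceIP TEXT, avgPageRank REAL, totalRevenue REAL)"
    )
    conn.executemany("INSERT INTO Rankings VALUES (?, ?)", rankings)
    conn.executemany("INSERT INTO UserVisits VALUES (?, ?, ?, ?)", visits)

    # Step 1: join, filter by date range, and aggregate per source IP.
    conn.execute(
        """
        INSERT INTO JoinResult
        SELECT UV.sourceIP, AVG(R.pageRank), SUM(UV.adRevenue)
        FROM Rankings AS R, UserVisits AS UV
        WHERE R.pageURL = UV.destURL
          AND UV.visitDate BETWEEN ? AND ?
        GROUP BY UV.sourceIP
        """,
        (start, end),
    )

    # Step 2: pick the source IP with the highest total revenue.
    row = conn.execute(
        "SELECT sourceIP, totalRevenue, avgPageRank "
        "FROM JoinResult ORDER BY totalRevenue DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return row


# Hypothetical sample rows, for illustration only.
SAMPLE_RANKINGS = [("a.com", 10), ("b.com", 20), ("c.com", 30)]
SAMPLE_VISITS = [
    ("1.1.1.1", "a.com", "2000-01-16", 1.0),
    ("1.1.1.1", "b.com", "2000-01-17", 2.0),
    ("2.2.2.2", "c.com", "2000-01-18", 5.0),
    ("2.2.2.2", "a.com", "2000-02-01", 100.0),  # outside the date range
    ("3.3.3.3", "b.com", "2000-01-20", 0.5),
]

if __name__ == "__main__":
    print(top_revenue_source(SAMPLE_RANKINGS, SAMPLE_VISITS, "2000-01-15", "2000-01-22"))
```

The large visit dated 2000-02-01 is excluded by the date filter, so it does not affect the ranking.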

Concluding Remarks • DBMS-X 3.2 times faster than Hadoop, Vertica 2.3 times faster than DBMS-X • Parallel DBMSs win because of • B-tree indices to speed the execution of selection operations • Novel storage mechanisms (e.g., column-orientation) • Aggressive compression techniques with the ability to operate directly on compressed data • Sophisticated parallel algorithms for querying large amounts of relational data • Ease of installation and use • Fault tolerance? • Loading data?

The Cost of a Cloud: Research Problems in Data Center Networks • Albert Greenberg, James Hamilton, David A. Maltz, Parveen Patel (MSR Redmond) • Presented by: Saurabh Nangia

Overview • Cost of cloud service • Improving low utilization • Network agility • Incentive for resource consumption • Geo-distributed network of DCs

Cost of a Cloud? • Where does the cost go in today’s cloud service data centers?

Cost of a Cloud • Amortized costs (one-time purchases amortized over reasonable lifetimes, assuming 5% cost of money) • ~45% Servers • ~25% Infrastructure • ~15% Power draw • ~15% Network

Are Clouds any different? • Can existing solutions for the enterprise data center work for cloud service data centers?

Enterprise DC vs Cloud DC (1) • In enterprise • Leading cost: operational staff • Automation is partial • IT staff : Servers = 1:100 • In cloud • Staff costs under 5% • Automation is mandatory • IT staff : Servers = 1:1000

Enterprise DC vs Cloud DC (2) • Large economies of scale • Cloud DCs leverage economies of scale • But up-front costs are high • Scale out • Enterprise DCs “scale up” • Cloud DCs “scale out”

Types of Cloud Service DC (1) • Mega data centers • Tens of thousands (or more) servers • Drawing tens of Mega-Watts of power (at peak) • Massive data analysis applications • Huge RAM, Massive CPU cycles, Disk I/O operations • Advantages • Cloud services applications build on one another • Eases system design • Lowers cost of communication needs

Types of Cloud Service DC (2) • Micro data centers • Thousands of servers • Drawing power peaking in 100s of Kilo-Watts • Highly interactive applications • Query/response, office productivity • Advantages • Used as nodes in content distribution network • Minimize speed-of-light latency • Minimize network transit costs to user

Cost Breakdown

Server Cost (1) • Example • 50,000 servers • $3000 per server • 5% cost of money • 3 year amortization • Amortized cost = 50000 * 3000 * 1.05 / 3 = $52.5 million per year!! • Utilization remarkably low, ~10%
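The server figure follows directly from the formula on the slide. The sketch below reproduces it and, as an illustrative extra, shows what ~10% utilization does to the cost of each unit of useful capacity. The function names are hypothetical, and the simple "capital × (1 + rate) / years" rule is the slide's own approximation, not a full loan-amortization schedule.

```python
def amortized_annual_cost(capital: float, cost_of_money: float, years: float) -> float:
    """Slide's approximation: add the cost of money once, then spread over the lifetime."""
    return capital * (1 + cost_of_money) / years


def cost_per_utilized_server(annual_cost: float, servers: int, utilization: float) -> float:
    """Annual cost per fully-utilized server equivalent (illustrative metric)."""
    return annual_cost / (servers * utilization)


if __name__ == "__main__":
    servers = 50_000
    annual = amortized_annual_cost(servers * 3000, 0.05, 3)
    print(f"Amortized server cost: ${annual / 1e6:.1f} million per year")
    print(f"Per server: ${annual / servers:.0f} per year")
    print(
        "Per fully-utilized server equivalent at 10% utilization: "
        f"${cost_per_utilized_server(annual, servers, 0.10):.0f} per year"
    )
```

Under these assumptions each server costs about $1,050 per year, but at 10% utilization the cost of one server's worth of useful work is ten times that, which is why the later slides focus on agility.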

Server Cost (2) • Uneven Application Fit • Uncertainty in demand forecasts • Long provisioning time scales • Risk Management • Hoarding • Virtualization short-falls

Reducing Server Cost • Solution: Agility • to dynamically grow and shrink resources to meet demand, and • to draw those resources from the optimal location • Barrier: Network • Increases fragmentation of resources • Therefore, low server utilization

Infrastructure Cost • Infrastructure is overhead of Cloud DC • Facilities dedicated to • Consistent power delivery • Evacuating heat • Large scale generators, transformers, UPS • Amortized cost: $18.4 million per year!! • Infra cost: $200M • 5% cost of money • 15 year amortization

Reducing Infrastructure Cost • Reason for high cost: requirement to deliver consistent power • Relaxing the requirement implies scaling out • Deploy larger numbers of smaller data centers • Resilience at data center level • Layers of redundancy within data center can be stripped out (no UPS & generators) • Geo-diverse deployment of micro data centers

Power • Power Usage Efficiency (PUE) = (Total Facility Power)/(IT Equipment Power) • Typically PUE ~ 1.7 • Inefficient facilities, PUE of 2.0 to 3.0 • Leading facilities, PUE of 1.2 • Amortized cost = $9.3 million per year!! • PUE: 1.7 • $0.07 per kWh • 50,000 servers each drawing 180W on average
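The annual power bill can be approximated from the PUE definition and the three inputs on the slide. This is a back-of-the-envelope sketch with hypothetical function names; it assumes the servers draw their average power around the clock for a 365-day year, and it gives roughly $9.4 million, close to the $9.3 million quoted.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, servers always on


def facility_power_kw(servers: int, watts_per_server: float, pue: float) -> float:
    """Total facility power = IT equipment power * PUE."""
    it_power_kw = servers * watts_per_server / 1000
    return it_power_kw * pue


def annual_power_cost(servers: int, watts_per_server: float, pue: float,
                      dollars_per_kwh: float) -> float:
    """Yearly electricity cost for the whole facility."""
    kwh_per_year = facility_power_kw(servers, watts_per_server, pue) * HOURS_PER_YEAR
    return kwh_per_year * dollars_per_kwh


if __name__ == "__main__":
    cost = annual_power_cost(50_000, 180, 1.7, 0.07)
    print(f"Facility draw: {facility_power_kw(50_000, 180, 1.7) / 1000:.1f} MW")
    print(f"Annual power cost: ${cost / 1e6:.2f} million")
    # Same servers in a leading facility (PUE 1.2), for comparison.
    print(f"At PUE 1.2: ${annual_power_cost(50_000, 180, 1.2, 0.07) / 1e6:.2f} million")
```

Because cost scales linearly with PUE, moving from 1.7 to 1.2 would cut the bill by about 29% under the same assumptions.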

Reducing Power Costs • Decreasing power cost -> decreased need for infrastructure • Goal: Energy proportionality • A server running at N% load consumes N% of its peak power • Hardware innovation • High efficiency power supplies • Voltage regulation modules • Reduce amount of cooling for data center • Equipment failure rates increase with temperature • Make network more mesh-like & resilient

Network • Capital cost of networking gear • Switches, routers and load balancers • Wide area networking • Peering: traffic handed off to ISP for end users • Inter-data center links b/w geo distributed DC • Regional facilities (backhaul, metro-area connectivity, co-location space) to reach interconnection sites • Back-of-the-envelope calculations difficult

Reducing Network Costs • Sensitive to site selection & industry dynamics • Solution: • Clever design of peering & transit strategies • Optimal placement of micro & mega DC • Better design of services (partitioning state) • Better data partitioning & replication

Perspective • On is better than off • Server should be engaged in revenue production • Challenge: Agility • Build in resilience at systems level • Stripping out layers of redundancy inside each DC, and instead using other DC to mask DC failure • Challenge: Systems software & Network research

Cost of Large Scale DC *http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx

Solutions!

Improving DC efficiency • Increasing Network Agility • Appropriate incentives to shape resource consumption • Joint optimization of Network & DC resources • New mechanisms for geo-distributing states

Agility • Any server can be dynamically assigned to any service anywhere in DC • Conventional DC • Fragment network & server capacity • Limit dynamic growth and shrink of server pools

Networking in Current DC • DC networks carry two types of traffic • Between external end systems & internal servers • Between internal servers • Load Balancer • Virtual IP address (VIP) • Direct IP address (DIP)

Conventional Network Architecture

Problems (1) • Static Network Assignment • Individual applications mapped to specific physical switches & routers • Advantage: performance & security isolation • Disadvantage: works against agility • Policy-overloaded (traffic, security, performance) • VLAN spanning concentrates traffic on links high in tree

Problems (2) • Load Balancing Techniques • Destination NAT • All DIPs in a VIP’s pool must be in the same layer 2 domain • Under-utilization & fragmentation • Source NAT • Servers can be spread across layer 2 domains • But server never sees the client IP • Client IP required for data mining & response customization

Problems (3) • Poor server-to-server connectivity • Connections between servers in different layer 2 domains must go through layer 3 • Links oversubscribed • Capacity of links between access routers & border routers < output capacity of servers connected to access router • Ensure no saturation in any of the network links!
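The oversubscription point can be expressed as a simple ratio: the aggregate output capacity of the servers under an access router divided by the capacity of its uplinks. The sketch below uses made-up numbers (40 servers with 1 Gbps ports behind a 10 Gbps uplink); none of these figures come from the slides, and the function names are hypothetical.

```python
def oversubscription_ratio(server_count: int, server_gbps: float, uplink_gbps: float) -> float:
    """Aggregate server output capacity divided by uplink capacity."""
    return server_count * server_gbps / uplink_gbps


def worst_case_per_server_gbps(server_count: int, server_gbps: float,
                               uplink_gbps: float) -> float:
    """Bandwidth each server gets through the uplink if all send at once."""
    return min(server_gbps, uplink_gbps / server_count)


if __name__ == "__main__":
    # Hypothetical rack: 40 servers with 1 Gbps NICs behind a 10 Gbps uplink.
    ratio = oversubscription_ratio(40, 1.0, 10.0)
    share = worst_case_per_server_gbps(40, 1.0, 10.0)
    print(f"Oversubscription {ratio:.0f}:1, worst-case {share * 1000:.0f} Mbps per server")
```

A ratio above 1 means the uplink saturates whenever enough servers talk across it at the same time, which is why placement has to keep chatty servers close together and agility suffers.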

Problems (4) • Proprietary hardware scales up, not out • Load balancers used in pairs • Replaced when load becomes too much
