
Introduction

Introduction. Readings. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, L. Barroso and U. Hölzle. Introduction. Increasingly we are seeing more of our applications moving from the PC to the Internet, e.g., Email – Gmail, Yahoo



  1. Introduction

  2. Readings • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, L. Barroso and U. Hölzle

  3. Introduction • More and more of our applications are moving from the PC to the Internet, e.g., • Email – Gmail, Yahoo • Photo management – Picasa, Kodak, Sutterbug • Word processing – Google Apps • Why? • Less work for the user • Potentially lower cost for the user

  4. Introduction • Supporting this move from the PC to the “Internet” requires a large number of servers, storage, network support, etc. • Companies like Amazon, Google, and eBay run data centers with tens of thousands of machines • For users to trust these systems, a number of issues must be addressed, e.g., failure handling

  5. Architecture

  6. Architecture • Common elements include • Low-end servers, typically in a blade enclosure within a rack • Servers within a rack are interconnected by a local Ethernet switch (rack switch) • The rack switch has a number of uplink connections to one or more cluster-level (data-center-level) Ethernet switches

  7. Storage • Disks can be connected directly to each server and managed by a global distributed file system (e.g., Google’s GFS); or • Disks can be part of Network Attached Storage (NAS) devices that are directly connected to the cluster-level switch

  8. Storage • NAS • Reliability is provided by the device through replication and error correction • Server node • Needs a fault-tolerant file system at the cluster level, which is not trivial to implement • Writes are slower • Potentially lower cost than using NAS • Disks can be the same as those in your PC

  9. Storage Hierarchy

  10. Networking Fabric • Tradeoffs between speed, scale, and cost • Intra-rack connectivity is relatively inexpensive to achieve • Network switches with high port counts have a different price structure than switches used for rack connectivity • Much more expensive • Network switches with few ports require programmers to be aware of the scarce bandwidth

  11. Latency, Bandwidth, Capacity • It is much faster for an application to retrieve data from local disks than from off-rack disks, but • Applications often need more storage than is found on a local disk (e.g., Google search) • How is this dealt with efficiently?

  12. Power Usage • Peak power usage measured at one of Google’s data centers: • Networking 5% • CPUs 33% • Disks 10% • DRAM 30% • Other 22%

  13. Handling Failures • The high number of components almost guarantees failures • Disk drives can exhibit annualized failure rates higher than 4% • Lots of restarts needed • This issue has received a good deal of attention
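
To make the "almost guarantees failures" point concrete, here is a minimal Python sketch of the arithmetic. The fleet size of 10,000 disks is an assumed figure chosen for illustration (the slides only say "tens of thousands of machines"), the 4% rate is the annualized failure rate quoted above, and the function names are hypothetical. The daily probability assumes failures are independent and spread evenly over the year, which real fleets only approximate.

```python
def expected_failures_per_year(num_disks: int, afr: float) -> float:
    """Expected number of disk failures in one year for a given fleet size."""
    return num_disks * afr


def prob_at_least_one_failure_per_day(num_disks: int, afr: float) -> float:
    """Chance that at least one disk fails on a given day.

    Assumes independent failures and a uniform daily rate of afr / 365.
    """
    daily_rate = afr / 365
    return 1 - (1 - daily_rate) ** num_disks


if __name__ == "__main__":
    disks = 10_000  # assumed fleet size, not a figure from the slides
    afr = 0.04      # annualized failure rate from the slide
    per_year = expected_failures_per_year(disks, afr)
    print(f"expected failures per year: {per_year:.0f}")
    print(f"expected failures per day:  {per_year / 365:.2f}")
    print(f"P(at least one failure on a given day): "
          f"{prob_at_least_one_failure_per_day(disks, afr):.2f}")
```

Under these assumptions a disk fails somewhere in the fleet roughly once a day, so failure handling has to be routine rather than exceptional.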

  14. Request Handling • Lots of disks, so how is data placed so that it can be found? • Let’s look at Amazon • Partition the data so that groups of servers handle just a part of the inventory (or any other data) • The router needs to be able to extract keys from the request • Hashing is one strategy for doing this • Based on the key, you then determine the server to handle the request
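
The key-to-server step can be sketched in Python as below. This is an illustrative sketch only, not Amazon's actual routing scheme: the group names, the `item_id` request field, and the function names are all hypothetical. It uses a stable hash from `hashlib` rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib
from typing import Dict, List

# Hypothetical server groups; each would hold one partition of the data.
SERVER_GROUPS: List[str] = ["group-0", "group-1", "group-2", "group-3"]


def extract_key(request: Dict[str, str]) -> str:
    """Pull the partitioning key out of a request (assumed field name)."""
    return request["item_id"]


def pick_group(key: str, groups: List[str] = SERVER_GROUPS) -> str:
    """Map a key to a server group with a stable hash and a modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]


def route(request: Dict[str, str]) -> str:
    """Router step: extract the key, then choose the group that owns it."""
    return pick_group(extract_key(request))


if __name__ == "__main__":
    print(route({"item_id": "book-123"}))
```

A plain modulo reassigns most keys whenever the number of groups changes; consistent hashing is a common alternative when groups are added or removed often.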

  15. Online Evolution • Internet-time implies constant change • Need acceptable quality • Three approaches to managing upgrades • Fast reboot: Cluster at a time • Minimize yield impact • Rolling upgrade: Node at a time • Versions must be compatible • Big flip: Half the cluster at a time • Reserved for complex changes • Either way: use staging area, be prepared to revert
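
As a rough illustration of the rolling upgrade approach from the Online Evolution slide, the Python sketch below upgrades one node at a time and reverts the whole cluster if a node fails its check. The cluster is modeled as a plain dictionary of node name to version, and `rolling_upgrade` and the `healthy` callback are hypothetical names; a real system would also drain traffic from each node and use the staging area first.

```python
from typing import Callable, Dict


def rolling_upgrade(
    versions: Dict[str, str],
    new_version: str,
    healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade nodes one at a time; revert all nodes on the first failure.

    versions maps node name -> running version and is updated in place.
    healthy(node, version) stands in for a post-upgrade health check.
    Returns True if every node ended up on new_version.
    """
    previous = dict(versions)  # snapshot so we are prepared to revert
    for node in list(versions):
        versions[node] = new_version  # old and new versions now coexist
        if not healthy(node, new_version):
            versions.clear()
            versions.update(previous)  # roll every node back
            return False
    return True


if __name__ == "__main__":
    cluster = {"node-1": "v1", "node-2": "v1", "node-3": "v1"}
    ok = rolling_upgrade(cluster, "v2", lambda node, version: True)
    print(ok, cluster)
```

Because old and new versions serve requests side by side during the loop, this approach only works when the versions are compatible, which is the constraint the slide notes.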

  16. Summary • We have briefly discussed a high-level view of data centers • In this course we will discuss how Google, Amazon, etc. deal with some of the implications of these architectures
