1 / 11

Web Search For a Planet: The Google Cluster Architecture

Web Search For a Planet: The Google Cluster Architecture. Luiz Andre Barroso Jeffrey Dean Urs Holzle Google. Outline. Introduction Architecture Overview Serving a Google Query Design Principles of Google Clusters The Power Problem Parallelism

manasa
Download Presentation

Web Search For a Planet: The Google Cluster Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Web Search For a Planet: The Google Cluster Architecture Luiz Andre Barroso Jeffrey Dean Urs Holzle Google

  2. Outline • Introduction • Architecture Overview • Serving a Google Query • Design Principles of Google Clusters • The Power Problem • Parallelism • Hardware-level Application Characteristics • Conclusion

  3. Introduction • Search engines • require high amounts of computation per request • A single query on Google (on average) • reads hundreds of megabytes of data • consumes tens of billions of CPU cycles • Instead of a smaller number of high-end servers • Google combines more than 15,000 commodity-class PCs

  4. Architecture Overview • Provide reliability in software • Use cluster of PCs • Not high-end servers • Design system to maximize aggregate throughput • No server response time can parallelize individual • requests

  5. Serving a Google Query Figure shows Google query-serving architecture

  6. Serving a Google Query(Cont.) • Geographically distributed clusters • Each with a few thousand machines • First perform Domain Name System (DNS) lookup • Maps request to nearby cluster • Send HTTP request to selected cluster • Request serviced locally w/in that cluster • Clusters consist of Google Web Servers (GWSs) • Hardware-based load balancer distributes load among GWSs

  7. Design Principles of Google Clusters • Software level reliability • No fault-tolerant hardware features; e.g. • redundant power supplies • A redundant array of inexpensive disks (RAID) • Use replication • for better request throughput and availability • Price/performance beats peak performance • CPUs giving the best performance per unit price • Not the CPUs with best absolute performance • Using commodity PCs • reduces the cost of computation

  8. The Power Problem • Power is a very big issue • Google servers consumes 400-700 W/sq. ft • Typical commercial data center consumes 70-150 W/sq. ft. • Reducing power would reduce performance but it may not reduce cooling and power issues

  9. Parallelism • Lookup of matching docs in a large index • Many lookups in a set of smaller indexes followed by a merge step • A query stream is of multiple streams each handled by a cluster • Adding machines to a pool increases serving capacity

  10. Hardware-level Application Characteristics • Price/Performancebeats pure Performance • Use dual CPU servers rather than quad • Use IDE rather than SCSI • Use commodity PCs • Essentially mid-range PCs except for disks • No redundancy as in many high-end servers • Use rack mounted clusters connected via Ethernet

  11. Conclusion Web search engine can be implemented like Google using: • Lots of commodity machines. • Parallel algorithms for latency. • Replication for throughput and reliability. • Software based reliability. • This resulting distributed system will have a very good price/performance ratio.

More Related