Web Search for a Planet: The Google Cluster Architecture. Eugenio De Hoyos, 6175 Computer Science Seminar, October 4, 2011
introduction: “… a single query on Google reads hundreds of megabytes of data and consumes tens of billions of CPU cycles…” IO: 500 MB @ 20 MB/s → 25 sec. CPU: 10 × 10^9 cycles @ 2 GHz → 5 sec.
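The arithmetic on the introduction slide can be checked with a short sketch. The 500 MB, 20 MB/s, 10 × 10^9 cycles, and 2 GHz figures come from the slide; the function names are illustrative only.

```python
def io_seconds(megabytes: float, mb_per_sec: float) -> float:
    """Time to read the data from a single disk at the given throughput."""
    return megabytes / mb_per_sec


def cpu_seconds(cycles: float, clock_hz: float) -> float:
    """Time to execute the cycles on a single CPU at the given clock rate."""
    return cycles / clock_hz


io_time = io_seconds(500, 20)      # 500 MB at 20 MB/s
cpu_time = cpu_seconds(10e9, 2e9)  # 10 x 10^9 cycles at 2 GHz

print(f"IO:  {io_time:.0f} sec")   # IO:  25 sec
print(f"CPU: {cpu_time:.0f} sec")  # CPU: 5 sec
```

On one disk and one CPU, both figures are far too slow for an interactive query, which is presumably why the talk goes on to spread the work across many machines.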
outline: A Single Query; Philosophy; Power; Index Hardware; Index Memory; Conclusion
a single query http://www.googlefalle.com
a single query (diagram labels: Hardware Load Balancer; multiple Google Web Servers)
a single query (diagram labels: Google Web Servers; numbered steps 1 to 4; Index Servers; Document Servers; Shards; several PCs per Shard)
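The diagram's structure (web server, index servers, document servers, shards, several PCs per shard) can be sketched roughly as below. This is a toy model under stated assumptions, not Google's implementation: the data, the scoring, the routing of documents by `doc_id`, and every name in the code are hypothetical.

```python
import random

PCS_PER_SHARD = 4  # assumed: each shard is held by several interchangeable PCs

# Hypothetical index shards: word -> list of (doc_id, score) hits.
_index_data = [
    {"google": [(1, 0.9)], "cluster": [(1, 0.5), (2, 0.7)]},
    {"google": [(3, 0.4)], "cluster": [(4, 0.8)]},
]
# Every PC of a shard holds the same copy of that shard's data.
INDEX_SHARDS = [[data] * PCS_PER_SHARD for data in _index_data]

# Hypothetical document shards: doc_id -> title, routed by doc_id modulo.
DOC_SHARDS = [
    {2: "Cluster basics", 4: "Cluster power"},
    {1: "Google cluster", 3: "Google search"},
]


def search_index(word):
    """Query one PC of every index shard and merge the hits by score."""
    hits = []
    for pcs in INDEX_SHARDS:
        pc = random.choice(pcs)  # any PC of the shard can answer
        hits.extend(pc.get(word, []))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def fetch_titles(doc_ids):
    """Look up each document in the document shard that owns it."""
    return [DOC_SHARDS[doc_id % len(DOC_SHARDS)][doc_id] for doc_id in doc_ids]


def handle_query(word):
    """Web-server role: search the index, then fetch the matching documents."""
    ranked = search_index(word)
    return fetch_titles([doc_id for doc_id, _ in ranked])


results = handle_query("cluster")
print(results)  # ['Cluster power', 'Cluster basics', 'Google cluster']
```

Because every PC of a shard holds the same data, the random choice changes only which machine does the work, never the answer. That property is what lets a shard keep serving when one of its PCs fails.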
philosophy (diagram labels: Service A; Service B; Service C)
the power problem (diagram labels: RAM/BOARD; HD)
A Google data center, circa 2000. Note the fan on the floor to cool servers. (Credit: Stephen Shankland, CNET News.com / Jeff Dean, Google)
their observation: Equipment Cost; Power & Cooling; Scale
are their numbers right? Min. Amortization Requires $1,500; Operating Costs; Min. Cost Requires $20,000; Amortization; Cost of inefficiency
hardware: index server (RAM; CPU; Hard Drive)
hardware (pipeline diagram, scale of 1 Clock Cycle: Short Pipeline, Pentium III, versus Long Pipeline, Pentium IV)
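One common way to reason about the short-versus-long pipeline comparison is the textbook model in which every mispredicted branch costs a fixed number of refill cycles, and a longer pipeline has a larger refill cost. The sketch below applies that model. All numbers are illustrative assumptions, not measurements of the Pentium III or Pentium IV, and the function name is hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Cycles per instruction once branch misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% of instructions are branches, 5% of them mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Assumed penalties: the longer pipeline takes twice as long to refill.
short_cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=10)
long_cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=20)

print(f"short pipeline: {short_cpi:.2f} cycles/instruction")  # 1.10
print(f"long pipeline:  {long_cpi:.2f} cycles/instruction")   # 1.20
```

Under this model, a workload with many hard-to-predict branches gains less from a deep pipeline's higher clock rate, because more of each second is spent refilling the pipeline.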
hardware (diagram: instruction-level parallelism versus thread-level parallelism)
hardware: simultaneous multithreading (SMT) (diagram labels: one CPU; L1; L2)
hardware: chip multiprocessor (CMP) (diagram labels: two CPUs; two L1 caches; one L2 cache)
memory & scalability: Unpredictable memory access; large cache lines and prefetch help; memory bandwidth OK (diagram labels: RAM; line length; Cache; CPU; cache length)
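The point about large cache lines can be illustrated with a toy direct-mapped cache simulator. It is only a sketch: the 2 KB capacity, the line sizes, and the access pattern (a sequential scan of one 4 KB block in 4-byte steps) are assumptions chosen for illustration, not figures from the talk.

```python
def count_misses(addresses, line_size, capacity=2048):
    """Simulate a direct-mapped cache and return the number of misses."""
    num_lines = capacity // line_size
    tags = [None] * num_lines  # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size  # memory block containing this address
        slot = block % num_lines   # the one cache line it may occupy
        if tags[slot] != block:
            tags[slot] = block     # fetch the whole line from RAM
            misses += 1
    return misses


# Sequential scan of a 4 KB block, one 4-byte word at a time.
scan = range(0, 4096, 4)

misses_32 = count_misses(scan, line_size=32)
misses_128 = count_misses(scan, line_size=128)
print(misses_32, misses_128)  # 128 32
```

With the same cache capacity, the larger line cuts the misses for the scan by a factor of four, since each miss brings in four times as much adjacent data. The jump to the start of a new block remains unpredictable and would miss either way.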
conclusion: Cluster architecture is ideal and least expensive; maximize throughput; software reliability
conclusion (diagram labels: Service A; Service B; Service C)
a discussion question… HDMI Monitor; USB Keyboard; 700 MHz ARM11; 128 MB RAM; OpenGL ES 2.0; 1080p -- David Braben, UK game developer