490 likes | 682 Views
Java @ Google JavaZone 2005. Knut Magne Risvik Google Inc. September 14, 2005. Presentation Outline. Background: Google’s mission and computing platform. GFS and MapReduce: Ebony and Ivory of our Infrastructure Java for computing : Coupling infrastructure and Java
E N D
Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005
Presentation Outline • Background: Google’s mission and computing platform. • GFS and MapReduce: Ebony and Ivory of our Infrastructure • Java for computing: Coupling infrastructure and Java • Java in Google products: apps and middle-tiers • The Java expertise at Google: We host Java leadership. • Giving it back: Google contributions to Java • Closing notes and Q&A: Swags for good questions.
Google’s Mission To organize the world’s information and make it universally accessible and useful
Explosive Computational Requirements Every Google service sees continuing growth in computational needs • More queries: More users, happier users • More data:Bigger web, mailbox, blog, etc. • Better results:Find the right information, and find it faster more data more queries better results
A Simple Challenge For Our Computing Platform • Create world’s largest computing infrastructure • Make sure we can afford it Need to drive efficiency of the computing infrastructure to unprecedented levels
Many Interesting Challenges • Server design and architecture • Power efficiency • System software • Large scale networking • Performance tuning and optimization • System management and repairs automation
Design Philosophy Single-machine performance does not matter • The problems we are interested in are too large for any single system • Can partition large problems, so throughput beats peak performance Stuff Breaks • If you have one server, it may stay up three years (1,000 days) • If you have 1,000 servers, expect to lose one a day “Ultra-reliable” hardware makes programmers lazy • A reliable platform will still fail – software still needs to be fault-tolerant • Fault-tolerant software beats fault-tolerant hardware
Why Use Commodity PCs? • Single high-end 8-way Intel server: • IBM eserver xSeries 440 • 8 2-GHz Xeon, 64 GB RAM, 8 TB of disk • $758,000 • Commodity machines: • Rack of 88 machines • 176 2-GHz Xeons, 176 GB RAM, ~7 TB of disk • $278,000 • 1/3X price, 22X CPU, 3X RAM, 1X disk Sources: racksaver.com, TPC-C performance results, both from late 2002
Misc. servers query Spell checker Google Web Server Ad Server Index servers Doc servers I0 I1 I2 IN D0 D1 DM … … Replicas Replicas I0 I1 I2 IN D0 D1 DM … … Index shards Doc shards I0 I1 I2 IN D0 D1 DM Elapsed time: 0.25s, machines involved: 1000+ Google Query Serving Infrastructure
Reliable Building Blocks • Need to store data reliably • Need to run jobs on pools of machines • Need to make it easy to apply lots of computational resources to problems In-house solutions: • Storage: Google File System (GFS) • Job scheduling: Global Work Queue (GWQ) • MapReduce: simplified large-scale data processing
Misc. servers GFS Master Replicas Client Masters GFS Master Client Client C1 C1 C0 C0 C5 … Chunkserver N Chunkserver 2 Chunkserver 1 C2 C3 C2 C5 C5 Google File System - GFS • Master manages metadata • Data transfers happen directly between clients/chunkservers • Files broken into chunks (typically 64 MB) • Chunks triplicated across three machines for safety
GoogleFile API access to GFS • GoogleFile. Public API with two roles: • Creational class. Static methods to obtain InputStream, OutputStream and GoogleChannel on top of a Google file. • File manipulation. A subset of the methods provided by the java.io.File class. • GoogleInputStream. Implements the read method. • GoogleOutputStream. Extends java.io.OutputStream, write method. • GoogleChannel. This is a public class. It implements the ByteChannel interface and a subset of the methods in the FileChannel class. This class provides random access. • GoogleFile.Stats. • The JNI Layer is implemented by the class FileImpl and set of SWIG JNI wrappers generated during the build process.
GFS Usage at Google • 30+ Clusters • Clusters as large as 2000+ chunkservers • Petabyte-sized filesystems • 2000+ MB/s sustained read/write load • All in the presence of HW failures • More information can be found: The Google File SystemSanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung19th ACM Symposium on Operating Systems Principles (http://labs.google.com/papers/gfs.html)
MapReduce: Large Scale Data Processing • Many tasks: Process lots of data to produce other data • Want to use hundreds or thousands of CPUs, and it has to be easy • MapReduce provides, for programs following a particular programming model: • Automatic parallelization and distribution • Fault-tolerance • I/O scheduling • Status and monitoring
Example: Word Frequencies in Web Pages A typical exercise for a new engineer in his or her first week • Have files with one document per record • Specify a map function that takes a key/value pairkey = document namevalue = document text • Output of map function is (potentially many) key/value pairs.In our case, output (word, “1”) once per word in the document “document1”, “to be or not to be” “to”, “1” “be”, “1” “or”, “1” …
key = “be” values = “1”, “1” key = “or” values = “1” key = “not” values = “1” key = “to” values = “1”, “1” “2” “1” “1” “2” Example continued: word frequencies in web pages • MapReduce library gathers together all pairs with the same key • The reduce function combines the values for a keyIn our case, compute the sum • Output of reduce (usually 0 or 1 value) paired with key and saved “be”, “2” “not”, “1” “or”, “1” “to”, “2”
Example: Pseudo-code map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_values: EmitIntermediate(w, "1"); Reduce(String key, Iterator intermediate_values): // key: a word, same for input and output // intermediate_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result)); Total 80 lines of code
Typical Google Cluster • 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory • Limited bisection bandwidth • Storage: local IDE disks and Google File System (GFS) • GFS running on the same machines provides reliable, replicated storage of input and output data • Job scheduling system: jobs made up of tasks, scheduler assigns tasks to machines
Execution GFS: Google File System Map task 1 Map task 2 Map task 3 map map map k1:v k2:v k1:v k3:v k1:v k4:v Shuffle and Sort Reduce task 1 Reduce task 2 k1:v,v,v k3,v k2:v k4,v reduce reduce GFS: Google File System
Optimizations • Shuffle stage is pipelined with mapping • Many more tasks than machines, for load balancing • Locality: map tasks scheduled near the data they read • Backup copies of map & reduce tasks (avoids stragglers) • Compress intermediate data • Re-execute tasks on machine failure
Results Using 1800 machines: • MR_Grep scanned 1 terabyte in 100 seconds • MR_Sort sorted 1 terabyte of 100 byte records in 14 minutes Rewrote Google's production indexing system • a sequence of 7, 10, 14, 17, 21, 24 MapReductions • simpler • more robust • faster • more scalable
Widely applicable at Google • Implemented as a C++ library linked to user programs • Java JNI interface similar to GFS API. • Can read and write many different data types Example uses: distributed grepdistributed sortterm-vector per hostdocument clusteringmachine learning... web access log stats web link-graph reversal inverted index construction statistical machine translation …
Conclusion • MapReduce has proven to be a useful abstraction • Greatly simplifies large-scale computations at Google • Fun to use: focus on problem, let library deal with messy details MapReduce: Simplified Data Processing on Large ClustersJeffrey Dean and Sanjay GhemawatOSDI'04: Sixth Symposium on Operating System Design and Implementation (Search Google for “MapReduce”)
AdWords FE - Millions of ads - Billions of transactions - Extreme rates GMail middletier - UI and storage brokerage - Content searching, analysis - Tagging Java in Google Applications
Java expertise @ Google • Joshua Bloch - Collections Framework, Java 5.0 language enhancements, java.math, Author of "Effective Java," Coauthor of "Java Puzzlers." • Neal Gafter - Lead developer of javac, implementor of Java 5.0 language enhancements, "Coauthor of "Java Puzzlers." • Robert Griesmer - Architect and technical lead of the HotSpot JVM. • Doug Kramer - Javadoc architect, Java platform documentation lead. • Tim Lindholm - Original member of the Java project, key contributor to the Java programming language, implementor of the classic JVM, coauthor of "The Java Viutual Machine Specification." • Michael "madbot" McCloskey - Designer and implementer of java.util.regexp. • Srdjan Mitrovic - Co-implementor of the HotSpot JVM. • David Stoutamire - Technical lead for Java performance, designer and implementer of parallel garbage collection. • Frank Yellin - Original member of the Java project, Co-implementor of classic JVM, KVM and CLDC, Coauthor of "The Java Viutual Machine specification."
Giving it back – JCP Expert Groups • Executive Committe for J2SE/J2EE • JSR 166X: Concurrency Utilities (continuing) http://www.jcp.org/en/jsr/detail?id=166 • JSR 199: Java Compiler API http://www.jcp.org/en/jsr/detail?id=199 • JSR 220: Enterprise JavaBeans 3. http://www.jcp.org/en/jsr/detail?id=220 • JSR 250: Common Annotations for the Java Platform http://www.jcp.org/en/jsr/detail?id=250 • JSR 260: Javadoc Tag Technology Update http://www.jcp.org/en/jsr/detail?id=260 • JSR 269: Pluggable Annotation Processing API http://www.jcp.org/en/jsr/detail?id=269 • JSR 270: J2SE 6.0 ("Mustang") Release Contents http://www.jcp.org/en/jsr/detail?id=270Google representative: gafter • JSR 273: Design-Time API for JavaBeans JBDT http://www.jcp.org/en/jsr/detail?id=273 • JSR 274: The BeanShell Scripting Language http://www.jcp.org/en/jsr/detail?id=274 • JSR 277: Java Module System http://www.jcp.org/en/jsr/detail?id=277
Closing Notes • Google = Computing infrastructure • Java is becoming a first class citizen at Google • Essential native interfaces being built • API design extremely important at our scale, the Java expertise is driving general API work • Google brings high-scale industrial experience into JCP expert groups.
Q&A Knut Magne Risvik Google Inc. September 14, 2005