Telegraph Java Experiences Sam Madden UC Berkeley madden@cs.berkeley.edu RightOrder : Telegraph & Java
Telegraph Overview • 100% Java • In-memory database • Query engine for alternative sources • Web • Sensors • Testbed for adaptive query processing
Telegraph & WWW : FFF • Federated Facts and Figures • Collect Data on the Election • Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies • Route tuples dynamically based on source loads and selectivities
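The routing idea can be sketched in a few lines. This is a simplified illustration, not Telegraph's eddy: it greedily sends each tuple to the unvisited filter with the lowest observed pass rate, whereas the eddies paper uses a lottery scheme that also accounts for operator cost. The names `TrackedFilter` and `MiniEddy` are hypothetical.

```java
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical filter wrapper that tracks how many tuples it has seen and passed. */
class TrackedFilter {
    final String name;
    final IntPredicate predicate;
    long seen = 0;
    long passed = 0;

    TrackedFilter(String name, IntPredicate predicate) {
        this.name = name;
        this.predicate = predicate;
    }

    /** Observed fraction of tuples that pass; 1.0 until any tuple has been seen. */
    double selectivity() {
        return seen == 0 ? 1.0 : (double) passed / seen;
    }

    boolean apply(int tuple) {
        seen++;
        boolean ok = predicate.test(tuple);
        if (ok) passed++;
        return ok;
    }
}

/** Greedy stand-in for an eddy: the routing order is chosen per tuple at run time. */
class MiniEddy {
    private final List<TrackedFilter> filters;

    MiniEddy(List<TrackedFilter> filters) {
        this.filters = filters;
    }

    /** Routes one tuple; returns true if it survives every filter. */
    boolean route(int tuple) {
        boolean[] done = new boolean[filters.size()];
        for (int step = 0; step < filters.size(); step++) {
            // Pick the not-yet-visited filter with the lowest observed pass rate.
            int best = -1;
            for (int i = 0; i < filters.size(); i++) {
                if (!done[i] && (best < 0
                        || filters.get(i).selectivity() < filters.get(best).selectivity())) {
                    best = i;
                }
            }
            done[best] = true;
            if (!filters.get(best).apply(tuple)) {
                return false; // dropped early; the remaining filters never see it
            }
        }
        return true;
    }
}
```

Because the greedy rule never explores, a filter whose selectivity changes later may not be re-evaluated; a lottery-based policy avoids that at the cost of some randomness.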
fff.cs.berkeley.edu
Architecture Overview • Query Parser • JLex & CUP • Preoptimizer • Chooses Access Paths • Eddy • Routes Tuples To Modules
Modules • Doubly-Pipelined Hash Joins • Index Joins • For probing into web pages • Aggregates & Group Bys • Scans • Telegraph Screen Scraper: View web pages as Relations
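A doubly-pipelined (symmetric) hash join keeps a hash table per input and emits matches as soon as either side delivers a tuple. The sketch below shows that general technique under simple assumptions (integer keys, string payloads, everything in memory); the class and method names are illustrative and are not taken from Telegraph.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal symmetric hash join: each arriving tuple is inserted, then probes the other side. */
class SymmetricHashJoin {
    private final Map<Integer, List<String>> left = new HashMap<>();
    private final Map<Integer, List<String>> right = new HashMap<>();

    /** Handles a tuple from the left input and returns any new join results. */
    List<String> onLeft(int key, String value) {
        return insertAndProbe(left, right, key, value, true);
    }

    /** Handles a tuple from the right input and returns any new join results. */
    List<String> onRight(int key, String value) {
        return insertAndProbe(right, left, key, value, false);
    }

    private static List<String> insertAndProbe(Map<Integer, List<String>> mine,
                                               Map<Integer, List<String>> other,
                                               int key, String value, boolean valueIsLeft) {
        // Build step: remember this tuple so later arrivals on the other side can find it.
        mine.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        // Probe step: pair this tuple with everything already seen on the other side.
        List<String> out = new ArrayList<>();
        for (String match : other.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(valueIsLeft ? value + "|" + match : match + "|" + value);
        }
        return out;
    }
}
```

Neither input blocks the other, which suits slow or bursty web sources; the price is that both hash tables must be kept for the life of the join.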
Execution Framework • One Thread Per Query • Iterator Model for Queries • Experimented with Thread Per Module • Linux threads are expensive • Two Memory Management Models • Java Objects • Home Rolled Byte Arrays
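In the iterator model, a single thread pulls tuples from the root operator, and each operator pulls from its child. The following is a generic sketch of that model with tuples reduced to integers; the interface and class names are assumptions for illustration, not Telegraph's API.

```java
import java.util.function.IntPredicate;

/** Pull-based operator interface: open, call next until it returns null, then close. */
interface TupleIterator {
    void open();
    Integer next(); // returns null when the input is exhausted
    void close();
}

/** Leaf operator that scans an in-memory array. */
class ArrayScan implements TupleIterator {
    private final int[] rows;
    private int pos;

    ArrayScan(int[] rows) {
        this.rows = rows;
    }

    public void open() { pos = 0; }

    public Integer next() {
        return pos < rows.length ? rows[pos++] : null;
    }

    public void close() { }
}

/** Selection operator: pulls from its child until a tuple satisfies the predicate. */
class Select implements TupleIterator {
    private final TupleIterator child;
    private final IntPredicate predicate;

    Select(TupleIterator child, IntPredicate predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public void open() { child.open(); }

    public Integer next() {
        Integer t;
        while ((t = child.next()) != null) {
            if (predicate.test(t)) return t;
        }
        return null;
    }

    public void close() { child.close(); }
}
```

The whole plan runs on the caller's thread, so no per-module threads or queues are needed.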
Tuples as Java Objects • Tuple Data stored as a Java Object • Each in separate byte array • Tuples copied on joins, aggregates • Issues • Memory Management between Modules, Queries, Garbage collector control • Allocation Overhead • Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec
Tuples As Byte Array • All tuples stored in same byte array / query • Surrogate Java Objects • [Figure: a byte array with a directory of (offset, size) entries, referenced by surrogate objects]
Byte Array (cont) • Allows explicit control over memory / query (or module) • Compaction eliminates garbage collection randomness • Lower throughput: 15,000 tuples / sec • No surrogate object reuse • Synchronization costs
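The byte-array layout described on these two slides can be sketched as follows. This is a minimal, single-threaded illustration under assumed names (`TuplePool`, `TupleRef`); Telegraph's actual structures are not shown in the slides, and this version does not recycle directory slots.

```java
import java.util.Arrays;

/** One byte array holds every tuple; a directory records each tuple's (offset, size). */
class TuplePool {
    private final byte[] data;
    private final int[] offset; // directory: where each tuple starts
    private final int[] size;   // directory: tuple length, or -1 if the slot was freed
    private int used = 0;       // bytes consumed at the front of data
    private int slots = 0;      // directory entries handed out so far

    TuplePool(int capacityBytes, int maxTuples) {
        data = new byte[capacityBytes];
        offset = new int[maxTuples];
        size = new int[maxTuples];
        Arrays.fill(size, -1);
    }

    /** Copies a tuple into the pool and returns its directory slot. */
    int add(byte[] tuple) {
        if (used + tuple.length > data.length || slots == offset.length) {
            throw new IllegalStateException("pool full");
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        offset[slots] = used;
        size[slots] = tuple.length;
        used += tuple.length;
        return slots++;
    }

    void free(int slot) {
        size[slot] = -1;
    }

    byte[] read(int slot) {
        if (size[slot] < 0) throw new IllegalStateException("slot freed");
        return Arrays.copyOfRange(data, offset[slot], offset[slot] + size[slot]);
    }

    /** Slides live tuples toward the front; handles stay valid because only offsets change. */
    void compact() {
        int write = 0;
        for (int s = 0; s < slots; s++) {
            if (size[s] < 0) continue;
            System.arraycopy(data, offset[s], data, write, size[s]);
            offset[s] = write;
            write += size[s];
        }
        used = write;
    }

    int bytesUsed() {
        return used;
    }
}

/** Surrogate object: a small handle that names a tuple without holding its bytes. */
class TupleRef {
    private final TuplePool pool;
    private final int slot;

    TupleRef(TuplePool pool, int slot) {
        this.pool = pool;
        this.slot = slot;
    }

    byte[] bytes() {
        return pool.read(slot);
    }
}
```

Compaction runs when the query chooses, which is the explicit control the slide refers to; every access through a surrogate goes through the directory, and a shared pool would need locking.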
Other System Pieces • XML Based Catalog • Java Introspection Helps • Applet-based Front End • JDBC Interface • Fault Tolerance / Multiple Servers • Via simple UNIX tools
RightOrder Questions • Performance vs. C • JNI Issues • Garbage Collection Issues • Serialization Costs • Lots of Java Objects • JDBC vs ODI
Performance Vs. C • JVM + JIT Performance Encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low level benchmarks • IBM JIT 2x Faster than HotSpot for Telegraph Scans • Stability Issues • www.javalobby.org/features/jpr
JIT Performance vs C • [Chart: benchmark results for Optimized Intel C, Optimized MS C, and IBM JIT] • Source: www.javalobby.org/features/jpr
Performance Gotchas • Synchronization • ~2x Function Call overhead in HotSpot • Used in Libraries: Vector, StringBuffer • String allocation single most intensive operation in Telegraph • Mercator: 20% initial CPU Cost • Garbage Collection • Java dumb about reuse • Mercator: 15% Cost • OceanStore: 30 ms avg latency, 1 s peak
More Gotchas • Final Methods • Declaring methods final allows inlining • Serialization • RMI, JNI use serialization • Philippsen & Haumacher show that serialization is slow
Performance Tools • Tools to address some issues • JAX, Jopt: make bytecode smaller, faster • www.alphaworks.ibm.com/tech/JAX • www.condensity.com • Bytecode optimizer • www.optimizeit.com • Good profiler, memory allocation and garbage collection monitor
JNI Issues • Not a part of Telegraph • JNI overhead quite large (JDK 1.1.8, PII 300 MHz) • Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master’s Thesis, UC Berkeley, 1999.
More JNI • But, this is being worked on • IBM JDK: 100,000-byte copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII) • JNI allows synchronization (pin / unpin), thread management • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html • GCJ + CNI: access Java objects via C++ classes • http://gcc.gnu.org/java/
Garbage Collection • Performance • Big problem: 1 s or longer to GC lots of objects • Most Java GCs blocking (not concurrent or multi-threaded) • Unexpected Latencies • OceanStore: Network File Server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC • In high-concurrency apps, such delays are disastrous
Garbage Collection Cont. • Limited Control • Runtime.gc() only a hint • Runtime.freeMemory() unreliable • No way to disable • No object reuse • Lots of unnecessary memory allocations
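Since the collector cannot be steered, one common workaround for the missing object reuse is an application-level free list. The sketch below shows that general idea for fixed-size buffers; `BufferPool` is a hypothetical name, the class is not thread-safe, and nothing in the slides says Telegraph used this exact approach.

```java
import java.util.ArrayDeque;

/** Free list of fixed-size buffers: released buffers are handed out again instead of reallocated. */
class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocated = 0; // number of buffers actually created with new

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a recycled buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] b = free.pollFirst();
        if (b == null) {
            allocated++;
            b = new byte[bufferSize];
        }
        return b;
    }

    /** Hands a buffer back; the caller must not touch it afterwards. */
    void release(byte[] b) {
        if (b.length != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        free.addFirst(b);
    }

    int allocatedCount() {
        return allocated;
    }
}
```

A recycled buffer still holds its old contents, so callers either overwrite it fully or clear it first; sharing the pool across threads would add the synchronization cost noted earlier.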
Serialization • Not in Telegraph • Philippsen and Haumacher, “More Efficient Object Serialization.” International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999. • Serialization costs for RMI are 50% of total RMI time • Discard longevity for 7x speedup • Sun Serialization provides versioning • Complete class description stored with each serialized object • Most standard classes forward compatible (JDK docs note special cases) • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
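The cost of storing a class description with the data is easy to see by comparing byte counts. The sketch below serializes one small object with standard `ObjectOutputStream` and with hand-written `DataOutputStream` calls; the `Reading` class and its fields are invented for illustration, and the example measures size only, not the RMI timings cited above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical record with 12 bytes of actual payload (one int, one long). */
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;
    final int sensorId;
    final long timestamp;

    Reading(int sensorId, long timestamp) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
    }
}

class SerializationSizes {
    /** Bytes produced by default Java serialization, including the class description. */
    static int defaultSize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Bytes produced when only the field values are written by hand. */
    static int handWrittenSize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(r.sensorId);
            out.writeLong(r.timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

The hand-written form gives up versioning and type information, which is the trade the slide describes as discarding longevity for speed.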
Lots of Objects • GC Issues Serious • Memory Management • GC makes programmers allocate willy-nilly • Hard to partition memory space • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries
Storage Overheads • Java Object class is big: • Integer requires 23 bytes in JDK 1.3 • int requires 4.3 bytes • No way to circumvent object fields • Use primitives or hand-written serialization whenever possible
JDBC vs ODI • No experience with Oracle • JDBC overheads are high, but we don’t have specific performance numbers
Bottom Line • Java great for many reasons • GC, standard libraries, type safety, introspection, etc. • Significant reductions in development and debugging time. • Java performance isn’t bad • Especially with some tuning • Memory Management an Issue • Lack of control over JVMs bad • When to garbage collect, how to serialize, etc.