C-JDBC: Flexible Database Clustering Middleware

Emmanuel Cecchet – INRIA • Julie Marguerite – ObjectWeb • Willy Zwaenepoel – EPFL

Motivations • Database tier should be • scalable • highly available • without modifying the client application • database vendor independent • on commodity hardware [Diagram labels: Internet, JDBC]

Scaling the database tier – Alternative 1 (master-slave) • Cons • failover time on master failure • limited scalability [Diagram labels: Internet, Web frontend, App. server, Master]

Scaling the database tier – Alternative 2 (SMP) • Cons • Cost • Scalability limit [Diagram labels: Internet, Web frontend, App. server, Database; placeholders for a well-known database vendor and for well-known hardware and database vendors]

Scaling the database tier – Alternative 3 (shared disks) • Cons • still expensive hardware • availability [Diagram labels: Internet, Web frontend, App. server, Database, Disks; placeholder for another well-known database vendor]

Outline • C-JDBC architecture • High availability • Use cases • Conclusion

C-JDBC architectural overview [Diagram labels: Application server (JVM), C-JDBC JDBC driver, C-JDBC controller (JVM), MySQL JDBC driver, MySQL database]

C-JDBC basic concepts • 2 components • C-JDBC driver • C-JDBC controller • 100% Java implementation • Read-one, write-all approach • Tunable replication • full partitioning • full replication • partial replication
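The read-one, write-all approach can be illustrated with a minimal Java sketch. It assumes full replication and round-robin read balancing. `Backend` and `ReadOneWriteAllRouter` are hypothetical names chosen for illustration, not classes from the C-JDBC code base, and the backends are in-memory stand-ins rather than real databases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database backend (not a C-JDBC class).
class Backend {
    final String name;
    final List<String> appliedWrites = new ArrayList<>();
    int readsServed = 0;

    Backend(String name) {
        this.name = name;
    }
}

// Sketch of read-one, write-all routing over fully replicated backends.
class ReadOneWriteAllRouter {
    private final List<Backend> backends;
    private int next = 0; // round-robin cursor used for reads

    ReadOneWriteAllRouter(List<Backend> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    // A read is sent to exactly one backend (round-robin in this sketch).
    Backend routeRead(String sql) {
        Backend chosen = backends.get(next);
        next = (next + 1) % backends.size();
        chosen.readsServed++;
        return chosen;
    }

    // A write is sent to every backend so that all replicas stay identical.
    // Returns the number of backends that received the write.
    int routeWrite(String sql) {
        for (Backend b : backends) {
            b.appliedWrites.add(sql);
        }
        return backends.size();
    }
}
```

With two backends, one `routeWrite` call reaches both of them, while two consecutive `routeRead` calls are served by different backends. Partial replication would additionally need a table-to-backend map, which this sketch leaves out.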

Functional overview • connect myDB • connect login, password • execute SELECT * FROM t

Functional overview • execute INSERT INTO t …

Failures • No two-phase commit • parallel transactions • failed nodes are automatically disabled • execute INSERT INTO t …
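The failure handling described on this slide can be sketched in Java as follows. This is an illustration under stated assumptions, not the controller's implementation: `WriteTarget` and `FailureAwareBroadcaster` are hypothetical names, and the write is sent to the backends one after another for simplicity, whereas the slide mentions parallel transactions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend abstraction for illustration (not the C-JDBC API).
interface WriteTarget {
    String name();

    void execute(String sql) throws Exception;
}

// Sketch: broadcast a write and disable any backend that fails,
// instead of coordinating the backends with two-phase commit.
class FailureAwareBroadcaster {
    private final List<WriteTarget> enabled = new ArrayList<>();
    private final List<String> disabled = new ArrayList<>();

    FailureAwareBroadcaster(List<WriteTarget> targets) {
        enabled.addAll(targets);
    }

    // Returns the number of backends on which the write succeeded.
    int write(String sql) {
        List<WriteTarget> failed = new ArrayList<>();
        int ok = 0;
        for (WriteTarget t : enabled) {
            try {
                t.execute(sql);
                ok++;
            } catch (Exception e) {
                failed.add(t); // remember the failure, keep serving the others
            }
        }
        // Failed backends stop receiving requests until they are restored.
        for (WriteTarget t : failed) {
            enabled.remove(t);
            disabled.add(t.name());
        }
        if (ok == 0) {
            throw new IllegalStateException("write failed on all backends");
        }
        return ok;
    }

    List<String> disabledNames() {
        return new ArrayList<>(disabled);
    }

    int enabledCount() {
        return enabled.size();
    }
}
```

The write is reported as successful as long as at least one backend applied it. A disabled backend then has to be resynchronized before it can be re-enabled.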

Restoring a backend • Updates stored in the recovery log • Database dumps associated with checkpoints

Synchronization • Replay missing updates from log

Healed Cluster • Re-enable backend when done
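The restore sequence from the last three slides (restore a dump, replay the missing updates, re-enable the backend) relies on the recovery log knowing which updates came after a checkpoint. A minimal Java sketch of that bookkeeping follows. `RecoveryLogSketch` is a hypothetical in-memory class, and persistence of the log and of the dump is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a recovery log: an ordered list of updates plus named checkpoints.
class RecoveryLogSketch {
    private final List<String> updates = new ArrayList<>();
    private final Map<String, Integer> checkpoints = new HashMap<>();

    // Every update statement is appended to the log in execution order.
    void logUpdate(String sql) {
        updates.add(sql);
    }

    // A checkpoint records the log position that a database dump corresponds to.
    void checkpoint(String name) {
        checkpoints.put(name, updates.size());
    }

    // Updates that a backend restored from the dump of `name` still has to replay.
    List<String> updatesSince(String name) {
        Integer position = checkpoints.get(name);
        if (position == null) {
            throw new IllegalArgumentException("unknown checkpoint: " + name);
        }
        return new ArrayList<>(updates.subList(position, updates.size()));
    }
}
```

A backend restored from the dump taken at checkpoint `cp1` would execute the statements returned by `updatesSince("cp1")` in order and could then be re-enabled.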

Vertical scalability • Addresses JVM scalability issues • Distributes a large number of connections across many backends

Controller replication • Prevent the controller from being a single point of failure • Group communication for controller synchronization

Controller replication • jdbc:cjdbc://node1,node2/myDB • Total order reliable multicast
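The URL on this slide lists several controllers for a single virtual database. How the C-JDBC driver parses the URL and fails over is not shown in the slides, so the following Java sketch only illustrates the shape of the URL. `CjdbcUrlSketch` is a hypothetical helper, not part of the driver.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a URL of the form jdbc:cjdbc://host1,host2/database
// into its controller list and database name.
class CjdbcUrlSketch {
    static final String PREFIX = "jdbc:cjdbc://";

    final List<String> controllers = new ArrayList<>();
    final String database;

    CjdbcUrlSketch(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a C-JDBC URL: " + url);
        }
        String rest = url.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing database name: " + url);
        }
        // Controllers are separated by commas before the first slash.
        for (String host : rest.substring(0, slash).split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                controllers.add(trimmed);
            }
        }
        if (controllers.isEmpty()) {
            throw new IllegalArgumentException("no controller in URL: " + url);
        }
        database = rest.substring(slash + 1);
    }
}
```

For `jdbc:cjdbc://node1,node2/myDB` the sketch yields the controllers `node1` and `node2` and the database `myDB`. A client could try the controllers in turn when one of them is unreachable.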

Mixing horizontal & vertical scalability

Current limitations • Replication granularity is table • No distributed joins • Network partition with replicated controllers • JDBC only • support of PHP, Perl, ODBC through wrappers or bridges • partial support of JDBC 3.0

Other features • SSL support • Support for heterogeneous databases • SQL monitoring • JMX based administration console • Request player

Budget High Availability • High availability infrastructure “on a budget” • Typical eCommerce setup • http://www.budget-ha.com

OpenUSS: University Support System • eLearning • High availability • Portability • Linux, HP-UX, Windows • InterBase, Firebird, PostgreSQL, HypersonicSQL • http://openuss.sourceforge.net

Flood alert system • Disaster recovery • Independent nodes synchronized with C-JDBC • VPN for security issues • http://floodalert.org

J2EE benchmarking • Large-scale J2EE clusters • http://jmob.objectweb.org [Diagram labels: Internet, emulated users]

Conclusion • C-JDBC: Flexible Database Clustering Middleware • scalable • highly available • without modifying the client application • database vendor neutral • on commodity hardware • LGPL software hosted by ObjectWeb

Q&A • Thanks to all users and contributors ... • http://c-jdbc.objectweb.org

TPC-W benchmark (Amazon.com) • Nearly linear speedups with the shopping mix

Result cache • Cache contains a list of SQL->ResultSet • Policy defined by queryPattern->Policy • 3 policies • EagerCaching: variable granularities for invalidations • RelaxedCaching: invalidations based on timeout • NoCaching: never cached
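The three cache policies can be sketched in Java as follows. This is a simplified illustration, not the C-JDBC cache: `ResultCacheSketch` is a hypothetical class, the caller passes the policy directly instead of deriving it from a query pattern, eager invalidation is done at table granularity only, and the current time is passed in explicitly so that the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a SQL -> result cache with eager, relaxed and no-caching policies.
class ResultCacheSketch {
    enum Policy { EAGER, RELAXED, NONE }

    private static final class Entry {
        final Object result;
        final String table;
        final Policy policy;
        final long storedAt;

        Entry(Object result, String table, Policy policy, long storedAt) {
            this.result = result;
            this.table = table;
            this.policy = policy;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long relaxedTimeoutMs;

    ResultCacheSketch(long relaxedTimeoutMs) {
        this.relaxedTimeoutMs = relaxedTimeoutMs;
    }

    // Store a result; queries under the NONE policy are never cached.
    void put(String sql, String table, Policy policy, Object result, long nowMs) {
        if (policy == Policy.NONE) {
            return;
        }
        entries.put(sql, new Entry(result, table, policy, nowMs));
    }

    // Return the cached result, or null on a miss or an expired relaxed entry.
    Object get(String sql, long nowMs) {
        Entry e = entries.get(sql);
        if (e == null) {
            return null;
        }
        if (e.policy == Policy.RELAXED && nowMs - e.storedAt >= relaxedTimeoutMs) {
            entries.remove(sql);
            return null;
        }
        return e.result;
    }

    // A write to `table` invalidates eager entries read from that table.
    // Relaxed entries are left alone and expire only through their timeout.
    void onWrite(String table) {
        entries.values().removeIf(e -> e.policy == Policy.EAGER && e.table.equals(table));
    }
}
```

The trade-off shown here is that eager entries are never stale but cost an invalidation on every write, while relaxed entries may return stale results until the timeout elapses.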

Recovery log • All updates are stored in the recovery log • Database dumps associated with checkpoints

Making new checkpoints • Disable one backend to have a coherent snapshot • Mark the new checkpoint entry in the log • Use Octopus to store the dump

Making new checkpoints • Replay missing updates from log

Making new checkpoints • Re-enable backend when done

Handling a backend failure • A node fails! • The node is automatically disabled, but the administrator still has to repair or replace it

Fault tolerant recovery log [Diagram label: UPDATE statement]

