
GORDA Kickoff meeting INRIA – Sardes project


Presentation Transcript


  1. GORDA Kickoff meeting, INRIA – Sardes project. Emmanuel Cecchet, Sara Bouchenak

  2. Outline • INRIA, ObjectWeb & Sardes • GORDA

  3. INRIA key figures • A public scientific and technological research institute, in computer science and control, under the dual authority of the Ministry of Research and the Ministry of Industry • A scientific force of 3,000 (Jan. 2003) • 900 permanent staff: 400 researchers and 500 engineers, technical and administrative staff • 450 researchers from other organizations • 700 Ph.D. students • 200 external collaborators • 750 trainees, post-doctoral students and visiting researchers from abroad (universities or industry) • 6 Research Units, including INRIA Rhône-Alpes • Budget: 120 M€ (tax not incl.)

  4. iCluster 2 • Itanium-2 processors • 104 nodes (dual 64-bit 900 MHz processors, 3 GB memory, 72 GB local disk) connected through a Myrinet network • 208 processors, 312 GB memory, 7.5 TB disk in total • connected to the GRID • Linux OS (RedHat Advanced Server) • first Linpack experiments at INRIA (Aug. 03) reached 560 GFlop/s • Applications: grid computing, classical scientific computing, high performance Internet servers, …

  5. ObjectWeb key figures • Open source middleware development • based on open standards • J2EE, CORBA, OSGi… • International consortium • founded by INRIA, Bull and FT R&D in 2001 • Academic partners • European universities and research centers • Industrial partners • RedHat, Suse, MySQL, … • NEC, Bull, France Telecom, Dassault, Cap Gemini, …

  6. ObjectWeb architecture [diagram: a common software architecture for component-based development. The stack spans Hardware & OS, JVM and Networked Resources at the bottom; component and composition frameworks; dedicated middleware platforms and application servers providing interoperability, transactions, deployment, persistence, database connectivity and tools; and applications (OSGi, J2EE, CCM, …) on top. ObjectWeb projects placed in the stack include JOnAS, C-JDBC, Fractal, Jonathan, Think, JORAM, Speedo, ProActive, OpenCCM, OSCAR, Kilim, Bonita, Enhydra XMLC, JORM/MEDOR, JOTM, CAROL, RmiJdbc, DotNetJ, Octopus, Zeus, JAWE, JMOB and RUBiS.]

  7. Sardes project • Distributed systems group • Main research themes • reflective component technology • autonomous systems management • Application areas • high-availability J2EE servers • dynamic monitoring, configuration and resource management in large-scale distributed systems (embedded system networks, ubiquitous computing) • Result dissemination through ObjectWeb

  8. Outline • INRIA, ObjectWeb & Sardes • GORDA

  9. Sardes experiences • Component-based open source middleware • ObjectWeb (http://www.objectweb.org) • J2EE application servers • JOnAS clustering (http://jonas.objectweb.org) • Database replication middleware • C-JDBC (http://c-jdbc.objectweb.org) • Benchmarking • RUBiS (http://rubis.objectweb.org) • TPC-W (http://jmob.objectweb.org) • CLIF (http://clif.objectweb.org) • Monitoring • LeWYS (http://lewys.objectweb.org)

  10. Common scalability practice: a single high-end database (a well-known database vendor, or well-known hardware + database vendor bundles) behind the application tier • Cons • cost • scalability limit [diagram: Internet → Web frontend → App. server → Database]

  11. Replication with shared disks: several database nodes (another well-known database vendor) attached to the same disks • Cons • still expensive hardware • availability (the shared disks remain a single point of failure) [diagram: Internet → Web frontend → App. server → Database nodes → shared Disks]

  12. Master/Slave replication • Cons • consistency • failover time on master failure • scalability [diagram: Internet → Web frontend → App. server → Master database replicating to slaves]

  13. Atomic broadcast-based replication: the database tier should be • scalable • highly available • without modifying the client application • database vendor independent • on commodity hardware [diagram: Internet → replicas kept consistent through atomic broadcast]

  14. C-JDBC • JDBC compliant (no client application modification; see the client sketch below) • database vendor independent • a JDBC driver for each backend is required • heterogeneity support • no 2PC, no group communication between databases • group communication for controller replication only [diagram: clients use JDBC over the Internet to reach the controller, which uses JDBC to reach the backends]
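
A minimal sketch of the "no client application modification" claim: the client below is plain JDBC, and only the loaded driver class and the URL point at C-JDBC. It assumes the C-JDBC 1.x driver class name org.objectweb.cjdbc.driver.Driver and the jdbc:cjdbc:// URL scheme; host, port, credentials and table t are placeholders taken from the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CjdbcClient {
    public static void main(String[] args) throws Exception {
        // Load the C-JDBC driver (class name assumed from the C-JDBC docs).
        Class.forName("org.objectweb.cjdbc.driver.Driver");

        // Everything below is plain JDBC: only the URL differs from a
        // direct database connection.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:cjdbc://node1:25322/myDB", "login", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM t")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```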

  15. RAIDb – Definition • Redundant Array of Inexpensive Databases • better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases • RAIDb levels offer various tradeoffs between performance and fault tolerance

  16. RAIDb • Redundant Array of Inexpensive Databases • better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases • RAIDb controller • gives the view of a single database to the client • balances the load on the database backends • RAIDb levels • RAIDb-0: full partitioning • RAIDb-1: full mirroring • RAIDb-2: partial replication • composition is possible

  17. C-JDBC – Key ideas • Middleware implementing RAIDb • Two components • generic JDBC 2.0 driver (C-JDBC driver) • C-JDBC Controller • C-JDBC Controller provides • performance scalability • high availability • failover • caching, logging, monitoring, … • Supports heterogeneous databases

  18. C-JDBC – Overview

  19. Heterogeneity support • offload a single Oracle DB with several MySQL instances • RAIDb-2 for partial replication

  20. Inside the C-JDBC Controller [diagram: client and backend connections through sockets; administration through JMX]

  21. C-JDBC features • unified authentication management • tunable concurrency control • automatic schema detection • tunable replication: full partitioning, partial replication, full replication • caching: metadata, parsing, result with various invalidation granularities • various load balancing strategies • on-the-fly query rewriting for macros and heterogeneity support • recovery log for dynamic backend adding and failure recovery • database backup/restore using Octopus • JMX based monitoring and administration • graphical administration console

  22. Functional overview (reads): connect myDB → connect login, password → execute SELECT * FROM t

  23. Functional overview (writes): execute INSERT INTO t … (a transaction sketch follows)
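
A sketch of the same write path from the application side, again assuming the driver class and URL scheme above; table t and its columns are hypothetical. The point is that the application issues one plain JDBC write, and the controller, not the application, propagates it to the backends.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CjdbcWrite {
    public static void main(String[] args) throws Exception {
        Class.forName("org.objectweb.cjdbc.driver.Driver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:cjdbc://node1:25322/myDB", "login", "password")) {
            conn.setAutoCommit(false); // group the write into a transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 1);          // hypothetical columns
                ps.setString(2, "demo");
                ps.executeUpdate();       // the controller replicates this write
            }
            conn.commit();
        }
    }
}
```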

  24. Failures • no 2-phase commit • parallel transactions • failed nodes are automatically disabled [diagram: execute INSERT INTO t … continues on the remaining backends]

  25. Controller replication • prevents the controller from being a single point of failure • group communication for controller synchronization • the C-JDBC driver supports multiple controllers with automatic failover • jdbc:c-jdbc://node1:25322,node2:12345/myDB

  26. Controller replication [diagram: jdbc:cjdbc://node1,node2/myDB; the controllers are synchronized by uniform total order reliable multicast]

  27. Mixing horizontal & vertical scalability

  28. Lessons learned • SQL parsing cannot be generic • many discrepancies in JDBC implementations • minimize the use of group communications • IP multicast does not scale • a notification infrastructure is needed • users want • no single point of failure • control (monitoring, pluggable recovery policies, …) • no database vendor lock-in • no database modification • need for an exhaustive test suite • benchmarking accurately is very difficult • load injection requires resources • monitoring and exploiting results is tricky

  29. Sardes role in GORDA • provide input • GORDA APIs • group communication requirements • monitoring and management requirements • middleware implementation based on C-JDBC • dissemination effort • ObjectWeb • possible participation in the JCP for JDBC extensions • hardware resources for experiments • eCommerce benchmarks

  30. Other interests • LeWYS (http://lewys.objectweb.org) • monitoring infrastructure • generic hardware/kernel probes for Linux/Windows • software probes: JMX, SNMP, … • monitoring repository • autonomic behavior • building supervision loops • self-healing clusters • self-sizing (expand or shrink) • SLAs

  31. Q without A • do we consider distributed query execution? • XA support? • what cluster size is targeted? • do we target grids or clusters of clusters? • reconciliation • consistency/caching • which network architecture is considered? • are relaxed or loose consistency models an option? • what will the GRI really cover? • do we impose a specific way of doing replication? • access to read-sets/write-sets is difficult to implement with legacy databases • which workloads are considered? • which WP deals with backup/recovery? • licensing issues?

  32. Q&A. Thanks to all users and contributors ...

  33. Bonus slides

  34. INTERNALS

  35. Virtual Database • gives the view of a single database • establishes the mapping between the database name used by the application and the backend specific settings • backends can be added and removed dynamically • configured using an XML configuration file

  36. Authentication Manager • matches the real login/password used by the application with backend-specific login/password pairs • a dedicated administrator login manages the virtual database

  37. Scheduler • Manages concurrency control • Specific implementations for Single DB, RAIDb 0, 1 and 2 • Query-level • Optimistic and pessimistic transaction level • uses the database schema that is automatically fetched from backends

  38. Request cache • caches results of SQL requests • improved SQL statement analysis to limit cache invalidations • table-based invalidations • column-based invalidations • single-row SELECT optimization • request parsing possible in the C-JDBC driver • offloads the controller • parsing cache in the driver • (an invalidation sketch follows)
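
A purely illustrative sketch of table-based invalidation, the coarsest granularity listed above; the class and field names are hypothetical, not C-JDBC's internal API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical result cache: SELECT results keyed by SQL text, plus a
// reverse index from table name to the cached statements that read it.
public class RequestCache {
    private final Map<String, List<Object[]>> resultsBySql = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> sqlByTable = new ConcurrentHashMap<>();

    public List<Object[]> get(String sql) {
        return resultsBySql.get(sql); // null on cache miss
    }

    public void put(String sql, Set<String> tablesRead, List<Object[]> rows) {
        resultsBySql.put(sql, rows);
        for (String table : tablesRead) {
            sqlByTable.computeIfAbsent(table, t -> ConcurrentHashMap.newKeySet())
                      .add(sql);
        }
    }

    // Called when a write touches a table: drop every cached SELECT on it.
    public void invalidate(String tableWritten) {
        Set<String> stale = sqlByTable.remove(tableWritten);
        if (stale != null) {
            stale.forEach(resultsBySql::remove);
        }
    }
}
```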

  39. Load balancer 1/2 • RAIDb-0 • query directed to the backend holding the needed tables • RAIDb-1 • reads executed by the current thread • writes executed in parallel by a dedicated thread per backend (see the sketch below) • result returned once one, a majority, or all backends commit • if one node fails while others succeed, the failing node is disabled • RAIDb-2 • same as RAIDb-1, except that writes are sent only to the nodes owning the written table
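
A sketch of the parallel write described for RAIDb-1, assuming one task per backend; the names are hypothetical, and the disabling of a failed backend is only indicated in comments.

```java
import java.sql.Connection;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical RAIDb-1 write path: the same statement runs in parallel
// on every backend, one task per backend.
public class ParallelWriter {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public List<Future<Integer>> broadcast(List<Connection> backends, String sql)
            throws InterruptedException {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (Connection backend : backends) {
            tasks.add(() -> {
                try (Statement s = backend.createStatement()) {
                    // A backend whose write fails while others succeed
                    // would be disabled by the caller.
                    return s.executeUpdate(sql);
                }
            });
        }
        // Caller inspects the futures: return once one / a majority / all commit.
        return pool.invokeAll(tasks);
    }
}
```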

  40. Load balancer 2/2 • static load balancing policies • Round-Robin (RR) • Weighted Round-Robin (WRR) • Least Pending Requests First (LPRF) • request sent to the node with the shortest pending-request queue (see the sketch below) • efficient if the backends have homogeneous performance
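
A minimal sketch of the LPRF policy: track in-flight requests per backend and pick the minimum. The types are hypothetical illustrations, not C-JDBC's load balancer classes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Least Pending Requests First: route each request to the backend with
// the fewest requests currently in flight.
public class LprfBalancer {
    public static class Backend {
        final String name;
        final AtomicInteger pending = new AtomicInteger();
        Backend(String name) { this.name = name; }
    }

    private final List<Backend> backends;

    public LprfBalancer(List<Backend> backends) { this.backends = backends; }

    public Backend choose() {
        // Pick the backend with the shortest pending-request queue;
        // callers increment pending before a request and decrement after.
        return backends.stream()
                .min(Comparator.comparingInt(b -> b.pending.get()))
                .orElseThrow(IllegalStateException::new);
    }
}
```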

  41. Connection Manager • connection pooling for a backend • Simple: no pooling • RandomWait: blocking pool • FailFast: non-blocking pool • VariablePool: dynamic pool • connection pools defined on a per-login basis • resource management per login • dedicated connections for admin • (a blocking-pool sketch follows)
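
A sketch of the RandomWait-style blocking pool in its simplest possible shape (fixed capacity, callers block until a connection is released); this is an illustration, not C-JDBC's actual connection manager API.

```java
import java.sql.Connection;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical blocking pool: a fixed set of open connections shared
// by all requests for one (backend, login) pair.
public class BlockingPool {
    private final BlockingQueue<Connection> free;

    public BlockingPool(Collection<Connection> connections) {
        // Fair queue: waiting callers get connections in arrival order.
        this.free = new ArrayBlockingQueue<>(connections.size(), true, connections);
    }

    public Connection acquire() throws InterruptedException {
        return free.take(); // blocks until a connection becomes available
    }

    public void release(Connection c) {
        free.offer(c); // never blocks: capacity equals the initial set
    }
}
```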

  42. Recovery Log • checkpoints are associated with database dumps • records all updates and transaction markers since a checkpoint • used to resynchronize a database from a checkpoint • JDBCRecoveryLog • stores the log in a database • can itself be re-injected into a C-JDBC cluster for fault tolerance • (a logging sketch follows)
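
A sketch of the append side of a JDBC-backed recovery log, as described above: each update since the checkpoint is written to a log table, so a backend restored from the checkpoint's dump can be replayed forward. The table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical JDBCRecoveryLog shape: updates and transaction markers
// are appended to a log table in a regular database.
public class JdbcRecoveryLog {
    private final Connection logDb;

    public JdbcRecoveryLog(Connection logDb) { this.logDb = logDb; }

    public void record(long transactionId, String sql) throws SQLException {
        try (PreparedStatement ps = logDb.prepareStatement(
                "INSERT INTO recovery_log (tx_id, statement) VALUES (?, ?)")) {
            ps.setLong(1, transactionId);
            ps.setString(2, sql);
            ps.executeUpdate();
        }
    }
}
```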

  43. SCALABILITY

  44. C-JDBC scalability • Horizontal scalability • prevents the controller from being a single point of failure (SPOF) • distributes the load among several controllers • uses group communications for synchronization • C-JDBC driver • automatic failover across multiple controllers • jdbc:c-jdbc://node1:25322,node2:12345/myDB • connection caching • URL parsing/controller lookup caching

  45. C-JDBC scalability • Vertical scalability • allows nested RAIDb levels • allows a tree architecture for scalable write broadcast • necessary with a large number of backends • the C-JDBC driver is re-injected into the C-JDBC controller

  46. C-JDBC vertical scalability • RAIDb-1-1 with C-JDBC • no limit to composition depth

  47. C-JDBC vertical scalability • RAIDb-0-1 with C-JDBC

  48. CHECKPOINTING

  49. Fault-tolerant recovery log [diagram: handling of an UPDATE statement by the recovery log]

  50. Checkpointing • Octopus is an ETL tool • Octopus is used to store a dump of the initial database state • currently done by the user with the database-specific dump tool
