Hypertable

Presentation Transcript


  1. Hypertable Doug Judd Zvents, Inc.

  2. Background hypertable.org

  3. Web 2.0 = Data Explosion (figure; labels: Web 1.0, Web 2.0)

  4. Traditional Tools Don’t Scale Well • Designed for a single machine • Typical scaling solutions: ad hoc; manual/static resource allocation

  5. The Google Stack • Google File System (GFS) • MapReduce • Bigtable

  6. Architectural Overview

  7. What is Hypertable? • An open source, high-performance, scalable database modeled after Google's Bigtable • Not relational • Does not support transactions

  8. Hypertable Improvements Over Traditional RDBMS • Scalable • High random insert, update, and delete rate

  9. Data Model • Sparse, two-dimensional table with cell versions • Cells are identified by a 4-part key: Row, Column Family, Column Qualifier, Timestamp

  10. Table: Visual Representation

  11. Table: Actual Representation

  12. Anatomy of a Key • Row key is \0 terminated • Column Family is represented with 1 byte • Column Qualifier is \0 terminated • Timestamp is stored big-endian, ones' complement
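
The key layout on this slide can be sketched as a small encoder. This is an illustration only, not Hypertable's actual serializer: the function name `encode_key` and the 64-bit timestamp width are assumptions, since the slide gives neither. Storing the timestamp bit-inverted and big-endian means a plain bytewise comparison orders higher timestamps first, which is presumably the point of the ones' complement.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds a key laid out as
//   row '\0' | 1-byte column family | qualifier '\0' | 8-byte timestamp
// The 8-byte timestamp width is an assumption made for this sketch.
std::string encode_key(const std::string &row, uint8_t column_family,
                       const std::string &qualifier, uint64_t timestamp) {
  // '\0' is the terminator, so it must not appear inside row or qualifier.
  assert(row.find('\0') == std::string::npos);
  assert(qualifier.find('\0') == std::string::npos);

  std::string key;
  key.append(row);
  key.push_back('\0');
  key.push_back(static_cast<char>(column_family));
  key.append(qualifier);
  key.push_back('\0');

  // Ones' complement flips every bit, so a larger timestamp yields a
  // smaller stored value. Writing the most significant byte first
  // (big-endian) makes bytewise order match that numeric order.
  const uint64_t inverted = ~timestamp;
  for (int shift = 56; shift >= 0; shift -= 8)
    key.push_back(static_cast<char>((inverted >> shift) & 0xff));
  return key;
}
```

With this layout, two keys for the same row, column family, and qualifier compare so that the one with the higher timestamp sorts first.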

  13. Concurrency • Bigtable uses copy-on-write • Hypertable uses a form of MVCC (multi-version concurrency control) • Deletes are carried out by inserting “delete” records
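
A minimal sketch of the idea that a delete is just another inserted version, assuming a toy in-memory store. The class `VersionedCells` and its methods are hypothetical names for illustration and do not reflect Hypertable's internals. Nothing is overwritten: a reader at a given snapshot time sees the newest version at or before that time, and a "delete" record hides everything older.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical versioned store: every write adds a version, and a delete
// is stored as a version flagged as a "delete" record.
class VersionedCells {
 public:
  void set(const std::string &key, uint64_t ts, const std::string &value) {
    versions_[key][ts] = Version{false, value};
  }
  void set_delete(const std::string &key, uint64_t ts) {
    versions_[key][ts] = Version{true, ""};
  }
  // Copies the value visible at snapshot time `as_of` into *value and
  // returns true, or returns false if the key is absent or deleted then.
  bool get(const std::string &key, uint64_t as_of, std::string *value) const {
    assert(value != nullptr);
    auto row = versions_.find(key);
    if (row == versions_.end()) return false;
    // Versions are ordered newest first, so lower_bound yields the newest
    // version whose timestamp is <= as_of.
    auto it = row->second.lower_bound(as_of);
    if (it == row->second.end() || it->second.is_delete) return false;
    *value = it->second.value;
    return true;
  }

 private:
  struct Version {
    bool is_delete;
    std::string value;
  };
  std::map<std::string, std::map<uint64_t, Version, std::greater<uint64_t>>>
      versions_;
};
```

A reader whose snapshot predates the delete record still sees the old value, which is what lets readers and writers proceed without blocking each other in this toy model.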

  14. CellStore • Sequence of 65K blocks of compressed key/value pairs

  15. System Overview

  16. Range Server • Manages ranges of table data • Caches updates in memory (CellCache) • Periodically spills (compacts) cached updates to disk (CellStore)
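
The cache-then-spill behaviour can be sketched as follows. This is a simplified model under stated assumptions: `RangeSketch`, the fixed `max_cached` threshold, and the in-memory vectors standing in for on-disk CellStores are all invented for illustration. Real CellStores are compressed block files, and the slide says spills happen periodically rather than at a fixed count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one range: updates land in a sorted in-memory map
// (the "CellCache") and are spilled to immutable sorted runs ("CellStores").
class RangeSketch {
 public:
  explicit RangeSketch(std::size_t max_cached) : max_cached_(max_cached) {
    assert(max_cached > 0);
  }

  void set(const std::string &key, const std::string &value) {
    cache_[key] = value;
    if (cache_.size() >= max_cached_) compact();
  }

  // Looks in the cache first, then in the stores from newest to oldest,
  // so the most recent value for a key always wins.
  bool get(const std::string &key, std::string *value) const {
    assert(value != nullptr);
    auto hit = cache_.find(key);
    if (hit != cache_.end()) {
      *value = hit->second;
      return true;
    }
    for (auto store = stores_.rbegin(); store != stores_.rend(); ++store) {
      auto it = std::lower_bound(
          store->begin(), store->end(), key,
          [](const Cell &cell, const std::string &k) { return cell.first < k; });
      if (it != store->end() && it->first == key) {
        *value = it->second;
        return true;
      }
    }
    return false;
  }

  std::size_t store_count() const { return stores_.size(); }
  std::size_t cached_count() const { return cache_.size(); }

 private:
  using Cell = std::pair<std::string, std::string>;

  // Spill: copy the sorted cache into a new immutable run, then clear it.
  void compact() {
    stores_.emplace_back(cache_.begin(), cache_.end());
    cache_.clear();
  }

  std::size_t max_cached_;
  std::map<std::string, std::string> cache_;
  std::vector<std::vector<Cell>> stores_;
};
```

Because each spilled run is already sorted, lookups in it can use binary search, and writes never touch data that has already been spilled.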

  17. Client API
      class Client {
        void create_table(const String &name, const String &schema);
        Table *open_table(const String &name);
        String get_schema(const String &name);
        void get_tables(vector<String> &tables);
        void drop_table(const String &name, bool if_exists);
      };

  18. Client API (cont.)
      class Table {
        TableMutator *create_mutator();
        TableScanner *create_scanner(ScanSpec &scan_spec);
      };
      class TableMutator {
        void set(KeySpec &key, const void *value, int value_len);
        void set_delete(KeySpec &key);
        void flush();
      };
      class TableScanner {
        bool next(CellT &cell);
      };

  19. Language Bindings • Currently C++ only • Thrift Broker

  20. Write Ahead Commit Log • Persists all modifications (inserts and deletes) • Written into underlying DFS
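
The write-ahead pattern on this slide, reduced to a toy: every modification is appended to the log before it is applied in memory, so the in-memory state can be rebuilt by replaying the log. `LogRecord` and `LoggedTable` are hypothetical names, and a `std::vector` stands in for the log file in the underlying DFS. For brevity the sketch applies a delete by erasing the key rather than by keeping a "delete" record.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical log entry: one insert or one delete.
struct LogRecord {
  bool is_delete;
  std::string key;
  std::string value;
};

class LoggedTable {
 public:
  // `log` stands in for a commit log file in the DFS. Any records already
  // in it are replayed, which models recovery after a restart.
  explicit LoggedTable(std::vector<LogRecord> *log) : log_(log) {
    assert(log != nullptr);
    for (const LogRecord &record : *log_) apply(record);
  }

  void set(const std::string &key, const std::string &value) {
    write(LogRecord{false, key, value});
  }
  void set_delete(const std::string &key) { write(LogRecord{true, key, ""}); }

  bool contains(const std::string &key) const { return cells_.count(key) > 0; }

 private:
  // Log first, then apply: a crash after the append loses nothing.
  void write(const LogRecord &record) {
    log_->push_back(record);
    apply(record);
  }
  void apply(const LogRecord &record) {
    if (record.is_delete)
      cells_.erase(record.key);
    else
      cells_[record.key] = record.value;
  }

  std::vector<LogRecord> *log_;
  std::map<std::string, std::string> cells_;
};
```

Constructing a second `LoggedTable` over the same log reproduces the first table's state, which is the recovery property the commit log exists to provide.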

  21. Range Meta-Operation Log • Facilitates Range meta-operations: loads, splits, moves • Part of Master and RangeServer • Ensures Range state and location consistency

  22. Compression • Cell Stores store compressed blocks of key/value pairs • Commit Log stores compressed blocks of updates • Supported compression schemes: zlib (--best and --fast), lzo, quicklz, bmz, none

  23. Caching • Block Cache: caches CellStore blocks; blocks are cached uncompressed • Query Cache: caches query results (TBD)

  24. Bloom Filter • Negative Cache • Probabilistic data structure • Indicates if key is not present
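
A minimal Bloom filter sketch showing why it works as a negative cache: a "no" answer is always correct, while a "yes" answer only means the key may be present. The class name, the sizing parameters, and the double-hashing scheme built on `std::hash` are choices made for this illustration, not details from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical Bloom filter over string keys.
class BloomFilter {
 public:
  BloomFilter(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  void insert(const std::string &key) {
    for (std::size_t i = 0; i < num_hashes_; ++i) bits_[index(key, i)] = true;
  }

  // false: the key was definitely never inserted.
  // true:  the key may have been inserted (false positives are possible).
  bool may_contain(const std::string &key) const {
    for (std::size_t i = 0; i < num_hashes_; ++i)
      if (!bits_[index(key, i)]) return false;
    return true;
  }

 private:
  // Double hashing: derive the i-th bit position from two base hashes.
  std::size_t index(const std::string &key, std::size_t i) const {
    const std::size_t h1 = std::hash<std::string>{}(key);
    const std::size_t h2 = std::hash<std::string>{}(key + '#') | 1;
    return (h1 + i * h2) % bits_.size();
  }

  std::vector<bool> bits_;
  std::size_t num_hashes_;
};
```

In a storage engine, a filter like this lets a lookup skip a file entirely whenever `may_contain` returns false, at the cost of an occasional unnecessary read on a false positive.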

  25. Scaling (part I)

  26. Scaling (part II)

  27. Scaling (part III)

  28. Access Groups • Provides control of physical data layout (hybrid row/column oriented) • Improves performance by minimizing I/O
      CREATE TABLE crawldb {
        Title MAX_VERSIONS=3,
        Content MAX_VERSIONS=3,
        PageRank MAX_VERSIONS=10,
        ClickRank MAX_VERSIONS=10,
        ACCESS GROUP default (Title, Content),
        ACCESS GROUP ranking (PageRank, ClickRank)
      };

  29. Filesystem Broker Architecture • Hypertable can run on top of any distributed filesystem (e.g. Hadoop, KFS)

  30. Keys To Performance • C++ • Asynchronous communication

  31. C++ vs. Java • Hypertable is CPU intensive: manages large in-memory key/value map; alternate compression codecs (e.g. BMZ) • Hypertable is memory intensive: Java uses 2-3 times the amount of memory to manage a large in-memory map (e.g. TreeMap); poor processor cache performance

  32. Performance Test (AOL Query Logs) • 75,274,825 inserted cells • 8 node cluster: 1 x 1.8 GHz dual-core Opteron, 4 GB RAM, 3 x 7200 RPM SATA drives • Average row key: 7 bytes • Average value: 15 bytes • Replication factor: 3 • 4 simultaneous insert clients • 500K random inserts/s • 680K scanned cells/s

  33. Performance Test II • Simulated AOL query log data • 1 TB data • 9 node cluster: 1 x 2.33 GHz quad-core Intel, 16 GB RAM, 3 x 7200 RPM SATA drives • Average row key: 9 bytes • Average value: 18 bytes • Replication factor: 3 • 4 simultaneous insert clients • Over 1M random inserts/s (sustained)

  34. Weaknesses • Range data managed by a single range server • Though no data loss, can cause periods of unavailability • Can be mitigated with client-side cache or memcached

  35. Project Status • Currently in “alpha” • Just released version 0.9.0.7 • Will release “beta” version end of August • Waiting on Hadoop JIRA 1700

  36. License • GPL 2.0 • Why not Apache?

  37. Questions? • www.hypertable.org
