
Hypertable

Hypertable. Doug Judd, CEO, Hypertable, Inc. A high-performance, open source, scalable database modeled after Bigtable. High-performance implementation (C++). Project started in March 2007. Runs on top of HDFS. Thrift interface for all popular languages (Java, PHP, Ruby, Python, Perl, etc.).


Presentation Transcript


  1. Hypertable
      Doug Judd, CEO, Hypertable, Inc.

  2. • Modeled after Bigtable
      • High Performance Implementation (C++)
      • Project Started in March 2007
      • Runs on top of HDFS
      • Thrift Interface for all popular languages
        • Java
        • PHP
        • Ruby
        • Python
        • Perl, etc.
      High Performance, Open Source, Scalable Database

  3. Hypertable Deployments

  4. Architecture

  5. Underlying Data Representation

  6. Scaling (part I)

  7. Scaling (part II)

  8. Scaling (part III)

  9. Request Routing

  10. Query Handling

  11. Features

  12. Load data from HT to Hive and vice-versa
      • Use Hive types
      • Use Hive QL (joins, aggregations)
      • Low latency data warehousing
      • Uses Hypertable’s native MapReduce Input/Output format

  13. Namespaces
      /development
        user
        tweet
      /testing
        user
        tweet
      /production
        /v1
          user
          tweet
        /v2
          user
          tweet

  14. Column Family Options
      • TTL=<t>
        • “time to live”
        • Remove cells that are older than <t>
      • MAX_VERSIONS=<n>
        • Keep only most recent <n> cell versions
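
The pruning rules on this slide can be sketched in a few lines. This is only a conceptual illustration in Python, not Hypertable's C++ implementation: the function `prune_versions`, the `(timestamp, value)` cell layout, and the use of seconds for `<t>` are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: one cell version is (timestamp in seconds, value).
Cell = Tuple[float, str]


def prune_versions(cells: List[Cell], now: float,
                   ttl: Optional[float] = None,
                   max_versions: Optional[int] = None) -> List[Cell]:
    """Return the versions of one cell that survive TTL and MAX_VERSIONS."""
    # Sort newest first so that "most recent <n>" is simply a prefix.
    survivors = sorted(cells, key=lambda c: c[0], reverse=True)
    if ttl is not None:
        # TTL=<t>: drop versions older than t.
        survivors = [c for c in survivors if now - c[0] <= ttl]
    if max_versions is not None:
        # MAX_VERSIONS=<n>: keep only the n most recent versions.
        survivors = survivors[:max_versions]
    return survivors


versions = [(100.0, "v1"), (200.0, "v2"), (300.0, "v3")]
print(prune_versions(versions, now=310.0, ttl=150.0))      # v3 and v2 survive
print(prune_versions(versions, now=310.0, max_versions=1))  # only v3 survives
```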

  15. Access Groups
      • Provides control over physical layout
        • Row oriented
        • Column oriented
        • Hybrid
      • Reduces I/O

      CREATE TABLE MyTable (
        a,
        b,
        c,
        d,
        ACCESS GROUP first (a),
        ACCESS GROUP second (b, c, d)
      );
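
Why this reduces I/O can be shown with a toy model: if each access group is stored separately, a query only has to read the groups that contain the requested columns. The Python sketch below mirrors the `first`/`second` grouping from the slide's `CREATE TABLE`; the dictionary layout and the function `groups_to_read` are hypothetical and say nothing about how Hypertable actually stores the groups.

```python
from typing import Dict, List

# Hypothetical layout mirroring the CREATE TABLE on the slide:
# access group "first" holds column a, "second" holds b, c and d.
ACCESS_GROUPS: Dict[str, List[str]] = {
    "first": ["a"],
    "second": ["b", "c", "d"],
}


def groups_to_read(columns: List[str]) -> List[str]:
    """Return the access groups that must be read to serve the given columns."""
    wanted = set(columns)
    return [name for name, cols in ACCESS_GROUPS.items() if wanted & set(cols)]


print(groups_to_read(["a"]))       # only "first" is touched
print(groups_to_read(["b", "d"]))  # only "second" is touched
print(groups_to_read(["a", "c"]))  # both groups are touched
```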

  16. Regular Expression Filtering
      • Google’s RE2 regular expression engine
        • Extremely fast (up to 50X Java regex)
        • Searches run in time linear in the size of the input
        • Searches constrained to a fixed amount of memory
      • Supported Searches:
        • Row key
        • Column qualifier
        • Value

      SELECT CELLS tag:/(?i)(nosql|bigtable)/ FROM MyTable
        WHERE ROW REGEXP "^\D+" AND VALUE REGEXP "(?i)hypertable";
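
To make the three filter positions concrete, here is a small Python sketch that applies the slide's three patterns to in-memory cells. It uses Python's built-in `re` module, which is a backtracking engine and not RE2, so it illustrates only what is matched, not the linear-time guarantee. The `(row key, column qualifier, value)` tuple layout, the sample data, and the assumption that each pattern is applied as an unanchored search are all made up for the example.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical layout: (row key, column qualifier, value).
Cell = Tuple[str, str, str]

ROW_RE = re.compile(r"^\D+")                         # row key starts with a non-digit
QUALIFIER_RE = re.compile(r"(?i)(nosql|bigtable)")   # case-insensitive qualifier match
VALUE_RE = re.compile(r"(?i)hypertable")             # case-insensitive value match


def select_cells(cells: Iterable[Cell]) -> List[Cell]:
    """Keep cells whose row key, qualifier and value all match their pattern."""
    return [c for c in cells
            if ROW_RE.search(c[0])
            and QUALIFIER_RE.search(c[1])
            and VALUE_RE.search(c[2])]


sample = [
    ("post-1", "NoSQL", "Hypertable rocks"),  # matches all three patterns
    ("42", "nosql", "hypertable"),            # row key starts with a digit
    ("post-2", "sql", "hypertable"),          # qualifier does not match
    ("post-3", "bigtable", "hbase"),          # value does not match
]
print(select_cells(sample))
```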

  17. Atomic Counters
      • New column option (COUNTER):

      CREATE TABLE counts (
        url COUNTER
      );

      • Modified via existing API using specially formatted values
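
The slide's examples of the specially formatted values did not survive in this transcript, so the sketch below has to assume a format: a leading `+`, `-` or `=` followed by an integer, meaning increment, decrement and reset. That format, the function `apply_counter_update`, and the in-memory dictionary are assumptions for illustration only. The point is that the server interprets the written value as an operation on the stored count instead of storing it as a new cell value.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical in-memory store: (row key, column) -> current count.
counters: DefaultDict[Tuple[str, str], int] = defaultdict(int)


def apply_counter_update(row: str, column: str, value: str) -> int:
    """Apply an update in the ASSUMED "+N" / "-N" / "=N" format and return the count."""
    op, amount = value[0], int(value[1:])
    key = (row, column)
    if op == "+":
        counters[key] += amount   # increment
    elif op == "-":
        counters[key] -= amount   # decrement
    elif op == "=":
        counters[key] = amount    # reset to an absolute value
    else:
        raise ValueError(f"unrecognized counter value: {value!r}")
    return counters[key]


apply_counter_update("example.com/index.html", "url", "+1")
apply_counter_update("example.com/index.html", "url", "+1")
print(counters[("example.com/index.html", "url")])
```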

  18. Group Commit
      • Supports highly concurrent updates
      • Trades minimum latency for better throughput
      • Configurable commit interval per-table:

      CREATE TABLE counts (
        url,
        domain
      ) GROUP_COMMIT_INTERVAL=100;
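
The latency-for-throughput trade can be sketched as follows: updates are buffered and written to the commit log together once per interval, so many updates share one log write. This is a simplified, single-threaded Python model, not Hypertable's implementation. The class `GroupCommitLog` is hypothetical, and treating the interval as milliseconds is an assumption.

```python
from typing import List


class GroupCommitLog:
    """Buffers updates and commits them in batches, at most once per interval."""

    def __init__(self, interval_ms: int) -> None:
        self.interval_ms = interval_ms        # assumed unit: milliseconds
        self.pending: List[str] = []          # updates waiting for the next commit
        self.batches: List[List[str]] = []    # each entry stands for one log write
        self.last_commit_ms = 0

    def submit(self, update: str, now_ms: int) -> None:
        """Queue an update; commit the whole queue if the interval has elapsed."""
        self.pending.append(update)
        if now_ms - self.last_commit_ms >= self.interval_ms:
            self.batches.append(self.pending)
            self.pending = []
            self.last_commit_ms = now_ms


log = GroupCommitLog(interval_ms=100)
for t, update in [(10, "a"), (40, "b"), (90, "c"), (120, "d"), (150, "e"), (230, "f")]:
    log.submit(update, now_ms=t)
print(log.batches)  # six updates, two log writes
```

In this model update "a" waits 110 ms before it is committed, which is the latency cost, while six updates need only two log writes, which is the throughput gain.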

  19. Compression
      • Block compression
        • Cell Store (SSTable) blocks
        • Commit Log blocks
      • Supported Compression Schemes:
        • zlib
        • lzo
        • quicklz
        • bmz
        • none
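
Block compression means each block is compressed on its own, so a reader can decompress just the block it needs. The Python sketch below illustrates that idea with the standard `zlib` module, one of the schemes listed on the slide. The 64 KB block size and the function names are assumptions for the example, not Hypertable defaults.

```python
import zlib
from typing import List


def compress_blocks(data: bytes, block_size: int = 65536) -> List[bytes]:
    """Split data into fixed-size blocks and compress each block independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_block(blocks: List[bytes], index: int) -> bytes:
    """Decompress a single block without touching the others."""
    return zlib.decompress(blocks[index])


data = b"row-key:value;" * 10000            # 140,000 bytes of repetitive data
blocks = compress_blocks(data)
restored = b"".join(read_block(blocks, i) for i in range(len(blocks)))
print(len(blocks), restored == data)
```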

  20. Bloom Filter
      • Dramatically reduces disk access
      • Associated with each Cell Store
      • Tells you if a key is definitely not present
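
A minimal Bloom filter shows why a negative answer is definite while a positive one is not: a key that was added always sets its bits, so any unset bit proves the key was never added, whereas all bits being set may be a coincidence. The Python sketch below is a generic textbook version. The class, its default sizes, and the SHA-256 based hashing are assumptions for illustration and do not reflect the filter Hypertable uses.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership filter with no false negatives and possible false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


cell_store_filter = BloomFilter()
for key in ["row-1", "row-2", "row-3"]:
    cell_store_filter.add(key)
print(cell_store_filter.might_contain("row-2"))
```

A reader would consult the filter first and skip the disk read for a Cell Store whenever `might_contain` returns `False`.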

  21. Performance Evaluation

  22. Setup
      • Modeled after the test described in the Bigtable paper
      • 1 Test Dispatcher, 4 Test Clients, 4 Tablet Servers
      • Test was written entirely in Java
      • Hardware
        • 1 X 1.8 GHz Dual-core Opteron
        • 10 GB RAM
        • 3X 250GB SATA drives
      • Software
        • HDFS 0.20.2 running on all 10 nodes, 3X replication
        • HBase 0.20.4
        • Hypertable 0.9.3.3

  23. Latency

  24. Throughput

  25. Why does Performance Matter? $$$

  26. Upcoming Release (0.9.5)
      • Last “alpha” release
      • Release Date: February 15th 2011
      • Features
        • Automatic range balancing
        • Asynchronous API
        • Improved Monitoring System

  27. Resources

  28. Professional Support

  29. Q&A
