1 / 12

James Taylor jtaylor@salesforce

James Taylor jtaylor@salesforce.com. Phoenix. We put the SQL back in the NoSQL. Agenda. Phoenix Overview Phoenix Implementation Performance Analysis Phoenix Roadmap Demo. Completed. Phoenix Overview. SQL layer on top of HBase Delivered as a embedded JDBC driver

Download Presentation

James Taylor jtaylor@salesforce

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. James Taylor jtaylor@salesforce.com Phoenix We put the SQL back in the NoSQL

  2. Agenda • Phoenix Overview • Phoenix Implementation • Performance Analysis • Phoenix Roadmap • Demo Completed

  3. Phoenix Overview • SQL layer on top of HBase • Delivered as a embedded JDBC driver • Targeting low latency queries over HBase data • Columns modeled as multi-part row key and key values • Query engine transforms SQL into series of scans • Using native HBase APIs and capabilities • Coprocessors for aggregation • Custom filters for expression evaluation • Transaction isolation through scan time range • Optionally client-controlled timestamps • Open sourcing soon • 100% Java Completed

  4. Phoenix SQL Support • SELECT <expression>… • FROM <table> • WHERE <expression> • GROUP BY <expression>… • HAVING <aggregate expression> • ORDER BY <aggregate expression>… • LIMIT <value> • Aggregation Functions • MIN, MAX, AVG, SUM, COUNT • Built-in Functions • SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE • Operators • =,!=,<>,<,<=,>,>=, LIKE • AND, OR, NOT • Bind Parameters • ?, :# • CASE WHEN • IN (<value>…) • DDL/DML (in progress) • CREATE/DROP <table> • DELETE FROM <table> WHERE <expression> • UPSERT INTO <table> [(<column>…)] • VALUES (<value>…)

  5. Sample Queries SELECT host, TRUNC(dateTime, 'DAY'), AVG(cache_hit), MIN(cache_hit), MAX(cache_hit) FROM server_metrics WHERE host LIKE 'cs11-%' AND dateTime> TO_DATE('2012-04-01') AND dateTime< TO_DATE('2012-07-01') GROUP BY host, TRUNC(dateTime, 'DAY') HAVING MIN(cache_hit) < 90 ORDER BY host, AVG(cache_hit) SELECT product_number, product_name, CASE WHEN list_price = 0 THEN 'Mfg item - not for resale' WHEN list_price < 50 THEN 'Under $50' WHEN list_price >= 50 and list_price < 250 THEN 'Under $250' WHEN list_price >= 250 and list_price < 1000 THEN 'Under $1000' ELSE 'Over $1000' END as price_category FROM product_catalogue WHERE product_category IN ('Camping', 'Hiking’) AND (product_name LIKE '%Pack’ OR product_name LIKE '% Cots %’) Completed

  6. Query Processing Product Metrics HTable ORG_ID DATE FEATURE Row Key TXNS Key Values IO_TIME RESPONSE_TIME • Scan • Start key: ORG_ID (:1) + DATE (:2) • End key: ORG_ID (:1) + DATE (:3) • Filter • Filter: IO_TIME > 100 • Aggregation • Intercepts scan on region server • Builds map of distinct FEATURE values • Returns one row per distinct group • Client does final merge SELECT feature, SUM(txns) FROM product_metrics WHERE org_id = :1 AND date >= :2 AND date <= :3 AND io_time > 100 GROUP BY feature

  7. Phoenix Query Optimizations • Start/stop key of scan based on AND-ed columns • Through SUBSTR, ROUND, TRUNC, LIKE • Parallelized on client by chunking over start/stop key of scan • Aggregation on region-servers through coprocessor • Inline for GROUP BY over row key ordered columns • In memory map per group otherwise • WHERE clause executed through custom filters • Incremental evaluation with early termination • Evaluated through byte pointers • IN and OR over same column (in progress) • Becomes batched get or filter with next row hint • Top N queries (future) • Through coprocessor keeping top N rows • TABLESAMPLE (future) • Becomes filter with next row hint Completed

  8. Phoenix Performance

  9. Phoenix Performance Completed

  10. Phoenix Roadmap • Increase breadth of SQL support • DML/DDL (in progress) • Derived tables (SELECT * FROM (SELECT foo FROM bar)) • More built-in functions: COALESCE, UPPER, TRIM • More operators: ||, IS NULL, *,/,+,- • Secondary indexes • Multiple projections for immutable data • Reordered columns in row key • Different levels of aggregation • Incrementally maintained for non immutable data • TABLESAMPLE for sampling • Improve multi-byte support • Joins • Hash join • OLAP extensions • OVER • PARTITION BY Completed

  11. Demo Completed Time-series database charting http://goo.gl/61WRs

  12. Thank you! Questions/comments?

More Related