120 likes | 226 Views
James Taylor jtaylor@salesforce.com. Phoenix. We put the SQL back in the NoSQL. Agenda. Phoenix Overview Phoenix Implementation Performance Analysis Phoenix Roadmap Demo. Completed. Phoenix Overview. SQL layer on top of HBase Delivered as a embedded JDBC driver
E N D
James Taylor jtaylor@salesforce.com Phoenix We put the SQL back in the NoSQL
Agenda • Phoenix Overview • Phoenix Implementation • Performance Analysis • Phoenix Roadmap • Demo Completed
Phoenix Overview • SQL layer on top of HBase • Delivered as a embedded JDBC driver • Targeting low latency queries over HBase data • Columns modeled as multi-part row key and key values • Query engine transforms SQL into series of scans • Using native HBase APIs and capabilities • Coprocessors for aggregation • Custom filters for expression evaluation • Transaction isolation through scan time range • Optionally client-controlled timestamps • Open sourcing soon • 100% Java Completed
Phoenix SQL Support • SELECT <expression>… • FROM <table> • WHERE <expression> • GROUP BY <expression>… • HAVING <aggregate expression> • ORDER BY <aggregate expression>… • LIMIT <value> • Aggregation Functions • MIN, MAX, AVG, SUM, COUNT • Built-in Functions • SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE • Operators • =,!=,<>,<,<=,>,>=, LIKE • AND, OR, NOT • Bind Parameters • ?, :# • CASE WHEN • IN (<value>…) • DDL/DML (in progress) • CREATE/DROP <table> • DELETE FROM <table> WHERE <expression> • UPSERT INTO <table> [(<column>…)] • VALUES (<value>…)
Sample Queries SELECT host, TRUNC(dateTime, 'DAY'), AVG(cache_hit), MIN(cache_hit), MAX(cache_hit) FROM server_metrics WHERE host LIKE 'cs11-%' AND dateTime> TO_DATE('2012-04-01') AND dateTime< TO_DATE('2012-07-01') GROUP BY host, TRUNC(dateTime, 'DAY') HAVING MIN(cache_hit) < 90 ORDER BY host, AVG(cache_hit) SELECT product_number, product_name, CASE WHEN list_price = 0 THEN 'Mfg item - not for resale' WHEN list_price < 50 THEN 'Under $50' WHEN list_price >= 50 and list_price < 250 THEN 'Under $250' WHEN list_price >= 250 and list_price < 1000 THEN 'Under $1000' ELSE 'Over $1000' END as price_category FROM product_catalogue WHERE product_category IN ('Camping', 'Hiking’) AND (product_name LIKE '%Pack’ OR product_name LIKE '% Cots %’) Completed
Query Processing Product Metrics HTable ORG_ID DATE FEATURE Row Key TXNS Key Values IO_TIME RESPONSE_TIME • Scan • Start key: ORG_ID (:1) + DATE (:2) • End key: ORG_ID (:1) + DATE (:3) • Filter • Filter: IO_TIME > 100 • Aggregation • Intercepts scan on region server • Builds map of distinct FEATURE values • Returns one row per distinct group • Client does final merge SELECT feature, SUM(txns) FROM product_metrics WHERE org_id = :1 AND date >= :2 AND date <= :3 AND io_time > 100 GROUP BY feature
Phoenix Query Optimizations • Start/stop key of scan based on AND-ed columns • Through SUBSTR, ROUND, TRUNC, LIKE • Parallelized on client by chunking over start/stop key of scan • Aggregation on region-servers through coprocessor • Inline for GROUP BY over row key ordered columns • In memory map per group otherwise • WHERE clause executed through custom filters • Incremental evaluation with early termination • Evaluated through byte pointers • IN and OR over same column (in progress) • Becomes batched get or filter with next row hint • Top N queries (future) • Through coprocessor keeping top N rows • TABLESAMPLE (future) • Becomes filter with next row hint Completed
Phoenix Performance Completed
Phoenix Roadmap • Increase breadth of SQL support • DML/DDL (in progress) • Derived tables (SELECT * FROM (SELECT foo FROM bar)) • More built-in functions: COALESCE, UPPER, TRIM • More operators: ||, IS NULL, *,/,+,- • Secondary indexes • Multiple projections for immutable data • Reordered columns in row key • Different levels of aggregation • Incrementally maintained for non immutable data • TABLESAMPLE for sampling • Improve multi-byte support • Joins • Hash join • OLAP extensions • OVER • PARTITION BY Completed
Demo Completed Time-series database charting http://goo.gl/61WRs
Thank you! Questions/comments?