90 likes | 213 Views
SQL (92 and Beyond) Support for Hive. Jason Dai Principal Engineer Intel SSG (Software and Services Group). What SQL support is needed?. More SQL-92 support for analytics Complete SQL data type system
E N D
SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)
What SQL support is needed? • More SQL-92 support for analytics • Complete SQL data type system • Data types (e.g., Datetime, fixed precision numbers), type conversion rules & function (CAST), Datetime expressions and functions (e.g. extract, +/- interval), etc. • Full subquery support • Subqueryin WHERE clauses, correlated subquery, scalar subquery, etc. • New expressions (EXISTS, ALL, ANY, etc.) • Complete Set operators • DISTINCT UNION, INTERSECT, EXCEPT, etc. • Multiple-table SELECT statement • Update/delete? • On HBase only? • (Almost) SQL-92 compliance? • How about transaction?
What SQL support is needed (continued)? • Additional analytics support (beyond SQL-92) • Advanced OLAP functions for analysis & reporting • E.g., rank, rollup, cube, window function(SQL 2003), etc. • Advanced SQL syntax • E.g. WITH clause (SQL-99) • Procedural extensions • E.g., Begin, End, If…Then...Else, Loop/Exit/Continue, etc.
Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • (Almost) SQL-compliant Hive parser • A lot of work: SQL much more complex than HiveQL • HiveQL grammar file: ~61KB with 2487 lines • SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines • Also complex: many existing Hive grammar rules need to be changed • To support more complex SQL constructs (e.g., subquery) • UDF/UDAF/UDTF • For some operators (e.g., rank) Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query
Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • Analysis, transformation & optimization • SQL data type system • Subquery support (incl. subqueryunnestting) • Multiple-table SELECT • Set operations • Advanced OLAP functions • … Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query
How to Leverage Existing Works? • Project Panthera: • Our open source efforts to enable better analytics capabilities on Hadoop/HBase • https://github.com/intel-hadoop/project-panthera SQL-AST Analyzer & Translator Hive Semantic Analyzer SubqueryUnnesting Multi-Table SELECT (Open Source) SQL Parser* Hadoop MR SQL-AST Hive-AST MINUS Support INTERSECT Support SQL Driver … Query *https://github.com/porcelli/plsql-parser … • A SQL engine for Hive MapReduce • Goal: full analytical SQL support for OLAP • Subquery in WHERE clause • Correlated subquery • Multiple-table SELECT statement • … HiveQL Hive Parser Hive-AST
How to Leverage Existing Works? • NextR Hive UDFs • https://github.com/nexr/hive-udf • UDFs for Oracle dbextensions (rank, decode, nvl, etc.) • SQL windowing functions for Hive • https://github.com/hbutani/SQLWindowing