1 / 9

SQL (92 and Beyond) Support for Hive

SQL (92 and Beyond) Support for Hive. Jason Dai Principal Engineer Intel SSG (Software and Services Group). What SQL support is needed?. More SQL-92 support for analytics Complete SQL data type system

tambre
Download Presentation

SQL (92 and Beyond) Support for Hive

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)

  2. What SQL support is needed? • More SQL-92 support for analytics • Complete SQL data type system • Data types (e.g., Datetime, fixed precision numbers), type conversion rules & function (CAST), Datetime expressions and functions (e.g. extract, +/- interval), etc. • Full subquery support • Subqueryin WHERE clauses, correlated subquery, scalar subquery, etc. • New expressions (EXISTS, ALL, ANY, etc.) • Complete Set operators • DISTINCT UNION, INTERSECT, EXCEPT, etc. • Multiple-table SELECT statement • Update/delete? • On HBase only? • (Almost) SQL-92 compliance? • How about transaction?

  3. What SQL support is needed (continued)? • Additional analytics support (beyond SQL-92) • Advanced OLAP functions for analysis & reporting • E.g., rank, rollup, cube, window function(SQL 2003), etc. • Advanced SQL syntax • E.g. WITH clause (SQL-99) • Procedural extensions • E.g., Begin, End, If…Then...Else, Loop/Exit/Continue, etc.

  4. Workload Analysis

  5. Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • (Almost) SQL-compliant Hive parser • A lot of work: SQL much more complex than HiveQL • HiveQL grammar file: ~61KB with 2487 lines • SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines • Also complex: many existing Hive grammar rules need to be changed • To support more complex SQL constructs (e.g., subquery) • UDF/UDAF/UDTF • For some operators (e.g., rank) Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query

  6. Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • Analysis, transformation & optimization • SQL data type system • Subquery support (incl. subqueryunnestting) • Multiple-table SELECT • Set operations • Advanced OLAP functions • … Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query

  7. How to Leverage Existing Works? • Project Panthera: • Our open source efforts to enable better analytics capabilities on Hadoop/HBase • https://github.com/intel-hadoop/project-panthera SQL-AST Analyzer & Translator Hive Semantic Analyzer SubqueryUnnesting Multi-Table SELECT (Open Source) SQL Parser* Hadoop MR SQL-AST Hive-AST MINUS Support INTERSECT Support SQL Driver … Query *https://github.com/porcelli/plsql-parser … • A SQL engine for Hive MapReduce • Goal: full analytical SQL support for OLAP • Subquery in WHERE clause • Correlated subquery • Multiple-table SELECT statement • … HiveQL Hive Parser Hive-AST

  8. How to Leverage Existing Works? • NextR Hive UDFs • https://github.com/nexr/hive-udf • UDFs for Oracle dbextensions (rank, decode, nvl, etc.) • SQL windowing functions for Hive • https://github.com/hbutani/SQLWindowing

More Related