1 / 18

Hive 101: Hive Query Language

Hive 101: Hive Query Language. 2014-08-21. Jeff Clouse. Agenda. What is Hive HUE HQL Select Operators Functions Joins Sub Queries Union Hive best practices. What is Hive. High level implementation of MapReduce Language is Hive Query Language - HQL

ryan-franks
Download Presentation

Hive 101: Hive Query Language

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hive 101: Hive Query Language 2014-08-21 Jeff Clouse

  2. Agenda • What is Hive • HUE • HQL • Select • Operators • Functions • Joins • Sub Queries • Union • Hive best practices

  3. What is Hive • High level implementation of MapReduce • Language is Hive Query Language - HQL • HQL is a subset of ANSI SQL with extensions • Metadata is stored in MySQL • Semantics are very much like Oracle and MySQL • There are no Updates

  4. What is Hive • Hive tables • External Tables • Warehouse Tables • Drops in HIVE External tables delete metadata • Drops in the HIVE warehouse really delete

  5. HUE • Hadoop User Experience • Provides web access to Hive

  6. HQL Select Syntax • Select • Select * From t1 • Distinct • Select Distinct col1 From t1 • Where • Select * From t1 wherecol1 = ‘US’ • Limit • Select * From t1 limit 5 • Group By • Select col1, sum(col2) as Total From t1 group by col1 • Order By • Select col1, sum(col2) as Total From t1 group by col1 order by col1 • Having • Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50

  7. HQL Predicate Operators • = Equals • <=> Equals or both sides are NULL • <>, != Not equal • < Less Than • <= Less than or equal to • > Greater than • >= Greater than or equal to • [not] between Value is equal to or between two values • is [not] NULL Check Value for NULL • like Value is like another value. Wildcards are % and _

  8. HQL Arithmetic Operators • A - B Subtract B from A • A * B Multiply A and B • A / B Divide A by B • A + B Add A and B • A % B The remainder resulting from A/B • A & B Bitwise and of A and B • A | B Bitwise or of A and B • A ^ B Bitwise xor of A and B • ~A Bitwise negation of A

  9. HQL Logical Operators • A and B, A && B Boolean and of A and B • A or B, A || B Boolean or of A and B • NOT A, !A Boolean negation of A • A [NOT] IN (B,…) A is in [or not] a set of values

  10. HQL Functions • Round(A) • Round(A,2) • Floor(A) • Ceiling(A) • Rand() • Year(date) • Month(date) • Datediff(date1, date2) • Date_add(startdate, days) • Length(A) • Upper(A) • Concat(A, B, …) • Substring(A, start ,len) • Trim(A) • Sum(A) • Count(*) • Min(A) • Max(A)

  11. HQL Joins • Join • Select * from table1 t1 join table2 t2 on t1.key = t2.key • Only returns records from both tables • Outer Joins • Left • Select * from table1 t1 left join table2 t2 on t1.key = t2.key • Returns all rows from the left table, t1, and matching rows from the right table. Missing rows from the right table will be populated with NULL • Right • Select * from table1 t1 right join table2 t2 on t1.key = t2.key • Returns all rows from the right table, t2, and matching rows from the left table. Missing rows from the left table will be populated with NULL • Full • Select * from table1 t1 full outer join table2 t2 on t1.key = t2.key • Returns all rows from both tables. Missing rows from either table will be populated with NULL

  12. HQL SubQueries and Union • Used to combine multiple result sets • Only UNION ALL is supported currently • The number and name of columns returned by each select statement must be the same. Select * from ( Select col1, col2 from t1 UNION ALL select col1, col2 from t2 ) unionResults • Sub-queries are only supported in the from clause • Support for sub-queries in the where clause will be limited to IN and EXISTS in Hive 0.13

  13. Hive best practices • Smallest to largest tables for joins • Data Layout • Partition large tables • Use the partition in your where clause

  14. Partitioning – by Month Trans Table Jan Feb Dec Partitioned by Month F0100 F0200 F1200 Files within the partitions F0101 F0201 F1201 F0102 F0202 F1202 F0103 F0203 F1203 F0104 F1204 F0105

  15. Hive best practices • Smallest to largest tables for joins • Data Layout • Partition large tables • Use the partition in your where clause • Bucketing

  16. Bucketing– by Basket_id Trans_item Trans Tables Files containing Rows with same hash for Bucket_Id

  17. Hive best practices • Smallest to largest tables for joins • Data Layout • Partition large tables • Use the partition in your where clause • Bucketing • Data Sampling • Bucket TABLESAMPLE(bucket 30 out of 64 on basket_id) • Block TABLESAMPLE(1 PERCENT) • Parallel Processing • set hive.exec.parallel=true;

  18. Questions?

More Related