100 likes | 395 Views
Hive Index. Yongqiang He Software Engineer Facebook Data Infrastructure Team. Agenda. Create Index. C REATE INDEX index_name ON TABLE table_name ( col_name …) AS ‘ index handler class name ’ [WITH DEFERRED REBUILD] [INDEXPROPERTIES ( prop_key = prop_value , … ) ]
E N D
Hive Index • Yongqiang He • Software Engineer • Facebook Data Infrastructure Team
Create Index • C REATE INDEX index_name • ON TABLE table_name(col_name…) • AS ‘index handler class name’ • [WITH DEFERRED REBUILD] • [INDEXPROPERTIES (prop_key=prop_value, …)] • [IN TABLE index_table_name] • [[ROW FORMAT …] STORED AS …] • EXAMPLE 1: • CREATE TABLE src (key int, value string); • CREATE INDEX src_index ON TABLE src(key) as 'COMPACT' WITH DEFERRED REBUILD STORED AS RCFILE; • EXAMPLE 2: • CREATE TABLE srcpart_rc (key int, value string) PARTITIONED BY (ds string, hr int) STORED AS RCFILE; • CREATE INDEX src_part_indexON TABLEsrcpart_rc(key) as 'COMPACT' WITH DEFERRED REBUILD;
Update Index / Rebuild Index • ALTER INDEX index_nameON table_name [partitionSpec] REBUILD; • EXAMPLE : • ALTER INDEXsrc_indexON srcREBUILD; • ALTER INDEXsrc_part_indexON srcpart_rcREBUILD; • ALTER INDEX src_part_indexON srcpart_rcpartition(ds=‘2010-08-09’) REBUILD;
No Optimizer Available right now. • Need to use index manually in query. • (Working on an index optimizer to direct a query against index.) Use Index • designed to help queries with filter clause (point query, range query). • EXAMPLE : • Original query:SELECT key, value FROM srcpart_rc WHERE key=100; • Query with index: • INSERT OVERWRITE DIRECTORY "/tmp/index_result" SELECT `_bucketname` , `_offsets` FROM default__srcpart_rc_srcpart_rc_index__ WHERE key=100; • SET hive.index.compact.file=/tmp/index_result; • SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; • SELECT key, value FROM srcpart_rc WHERE key=100 ORDER BY key;
Metastore upgrade script • MySQL: • http://wiki.apache.org/hadoop/Hive/IndexDev#Metastore_Upgrades