What is Apache Hive in terms of big data and Hadoop? How does it relate to business intelligence and management reporting? Can it be used with Business Objects?
Apache Hadoop Hive • What is it? • Architecture • Related Projects • Hive DDL • Hive DML • HiveQL Examples • Business Intelligence
Hive – What is it? • A data warehouse for Hadoop • Open source, written in Java • Holds metadata (table definitions) in a relational database, the metastore • Allows SQL-like queries (HiveQL) • Supports "big data" data sets • Offers built-in and user-defined functions • Has indexing (removed in Hive 3.0)
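The metastore idea can be illustrated with a toy model: table definitions are rows in a relational database, kept apart from the data files. The sketch below uses Python's built-in sqlite3 as that database. The table names `tbls` and `cols` and the helpers `create_table`, `show_tables` and `describe` are hypothetical simplifications, not the real metastore schema, which Hive usually keeps in Derby, MySQL or PostgreSQL.

```python
import sqlite3

# Toy metastore: table definitions are rows in a relational database,
# stored apart from the data files. Hypothetical, heavily simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, tbl_name TEXT UNIQUE, location TEXT);
    CREATE TABLE cols (tbl_id INTEGER, position INTEGER, col_name TEXT, col_type TEXT);
    """
)


def create_table(name, columns, location):
    """Record a table definition, roughly what CREATE TABLE stores."""
    cur = conn.execute(
        "INSERT INTO tbls (tbl_name, location) VALUES (?, ?)", (name, location)
    )
    conn.executemany(
        "INSERT INTO cols VALUES (?, ?, ?, ?)",
        [(cur.lastrowid, i, col, typ) for i, (col, typ) in enumerate(columns)],
    )


def show_tables():
    """Table names in alphabetical order, like SHOW TABLES."""
    return [row[0] for row in conn.execute("SELECT tbl_name FROM tbls ORDER BY tbl_name")]


def describe(name):
    """(column, type) pairs in declared order, like DESCRIBE."""
    return conn.execute(
        "SELECT c.col_name, c.col_type FROM cols c "
        "JOIN tbls t ON t.tbl_id = c.tbl_id "
        "WHERE t.tbl_name = ? ORDER BY c.position",
        (name,),
    ).fetchall()


# Assumed location under Hive's default warehouse directory.
create_table("customer", [("age", "INT"), ("address", "STRING")],
             "/user/hive/warehouse/customer")
print(show_tables())         # ['customer']
print(describe("customer"))  # [('age', 'INT'), ('address', 'STRING')]
```

Nothing here touches the data files: creating or describing a table only reads and writes metadata rows.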
Hive – Architecture Where does Hive sit in the Hadoop architecture?
Hive – Architecture • Start with an existing HDFS and Hadoop cluster • Add Hive and its metastore (the metadata store) • Use Flume and Sqoop to move data into HDFS • Use the Hive LOAD DATA command to load flat files • Use ODBC to connect your BI layer
Hive – Related Projects • Apache Flume – move large data sets to Hadoop • Apache Sqoop – command-line tool to move RDBMS data to Hadoop • Apache HBase – non-relational database • Apache Pig – analyse large data sets • Apache Oozie – workflow scheduler • Apache Mahout – machine learning and data mining • Hue – Hadoop user interface • Apache ZooKeeper – configuration and coordination service
Hive - DDL • Create table hive> CREATE TABLE customer (age INT, address STRING); • Partitions hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY (sdate STRING); • Show tables hive> SHOW TABLES; • Describe table hive> DESCRIBE customer;
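A partitioned table stores each partition in its own subdirectory named `column=value`, so a query that filters on the partition column only has to read the matching directories (partition pruning). The Python sketch below models that layout. It assumes the default warehouse directory `/user/hive/warehouse`, ignores Hive's escaping of special characters in partition values, and the helpers `partition_path` and `prune` are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: Hive's default warehouse directory (hive.metastore.warehouse.dir).
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_path(table, partition):
    """Directory holding one partition, e.g. .../customer/sdate=2008-08-15."""
    path = WAREHOUSE / table
    for key, value in partition.items():
        path = path / f"{key}={value}"
    return str(path)


def prune(partition_dirs, key, value):
    """Keep only directories containing a matching key=value segment."""
    wanted = f"{key}={value}"
    return [d for d in partition_dirs if wanted in PurePosixPath(d).parts]


dirs = [partition_path("customer", {"sdate": d})
        for d in ("2008-08-14", "2008-08-15", "2008-08-16")]
print(prune(dirs, "sdate", "2008-08-15"))
# ['/user/hive/warehouse/customer/sdate=2008-08-15']
```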
Hive - DDL • Alter table hive> ALTER TABLE customer ADD COLUMNS (phone STRING); • Drop table hive> DROP TABLE customer;
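ALTER TABLE ... ADD COLUMNS changes only the schema held in the metastore; existing data files are not rewritten, so rows written earlier simply have no value for the new column and read as NULL. The Python sketch below models this with a hypothetical new column `phone` and Hive's default Ctrl-A (`\x01`) field delimiter. Values are kept as strings to keep the focus on the schema change.

```python
# Rows written before the ALTER; fields are separated by Ctrl-A (\x01).
stored_lines = ["34\x01Wellington", "51\x01Auckland"]
schema = ["age", "address"]


def read_rows(lines, columns):
    """Apply the current schema at read time.

    Missing trailing fields become None (Hive NULL); surplus fields are dropped.
    """
    rows = []
    for line in lines:
        fields = line.split("\x01")
        fields += [None] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical: ALTER TABLE customer ADD COLUMNS (phone STRING);
schema.append("phone")
print(read_rows(stored_lines, schema))
```

The data in `stored_lines` is never touched; only the column list used at read time changes.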
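Hive - DML • Loading flat files into Hive hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer; • No verification of incoming data: the file is copied into the table's storage as-is and is only interpreted against the table schema when it is queried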
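Because LOAD DATA does not check the file, type problems only surface at query time, when a field that cannot be read as the declared type comes back as NULL ("schema on read"). The Python sketch below imitates reading the two-column `customer` table from a text file using Hive's default Ctrl-A (`\x01`) delimiter. The sample lines and the helpers `parse_int` and `read_customer` are hypothetical, and Python's `int()` is slightly more lenient than Hive's parser (it accepts surrounding whitespace, for example).

```python
# Hive's default field delimiter for text tables is Ctrl-A (\x01).
FIELD_DELIM = "\x01"


def parse_int(text):
    """Return an int, or None (Hive NULL) when the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None


def read_customer(lines):
    """Read (age INT, address STRING) rows; bad or missing fields become None."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_DELIM)
        age = parse_int(fields[0])
        address = fields[1] if len(fields) > 1 else None
        rows.append((age, address))
    return rows


# Loaded without checks: one bad age, one missing address.
raw = ["34\x01Wellington\n", "unknown\x01Auckland\n", "27\n"]
print(read_customer(raw))
# [(34, 'Wellington'), (None, 'Auckland'), (27, None)]
```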
HiveQL Examples • HiveQL, an SQL-like language hive> SELECT a.age FROM customer a WHERE a.sdate='2008-08-15'; selects the age column from a single partition and displays the result without storing it hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate='2008-08-15'; writes every column of that partition's rows to an HDFS directory, replacing its existing contents
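The difference between the two queries can be sketched in Python: the first returns one column from one partition, and the second replaces a directory's contents with that partition's rows written as Ctrl-A delimited text. This is a simulation, not Hive. A local temporary directory stands in for HDFS, the output file name `000000_0` and the helpers `select_age` and `insert_overwrite_directory` are hypothetical, and the partition column `sdate` is appended to each row because `SELECT a.*` on a partitioned Hive table returns partition columns last.

```python
import os
import shutil
import tempfile

# In-memory stand-in for the partitioned customer table: sdate -> rows.
customer = {
    "2008-08-14": [(40, "Napier")],
    "2008-08-15": [(34, "Wellington"), (51, "Auckland")],
}


def select_age(table, sdate):
    """One column from one partition; nothing is stored."""
    return [age for age, _address in table.get(sdate, [])]


def insert_overwrite_directory(table, sdate, directory):
    """Replace the directory's contents with the partition's rows as text."""
    shutil.rmtree(directory, ignore_errors=True)  # OVERWRITE: old contents go
    os.makedirs(directory)
    out = os.path.join(directory, "000000_0")     # hypothetical file name
    with open(out, "w") as f:
        for row in table.get(sdate, []):
            # a.* includes the partition column as the last field.
            f.write("\x01".join(str(v) for v in row + (sdate,)) + "\n")
    return out


print(select_age(customer, "2008-08-15"))  # [34, 51]
target = os.path.join(tempfile.mkdtemp(), "hdfs_file")
path = insert_overwrite_directory(customer, "2008-08-15", target)
with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```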
Hive – Business Intelligence • Use ODBC to connect Hive to your BI layer • You can then use BI tools such as Business Objects • Create a universe over the Hive instance • Create reports against the universe • Run ad hoc queries against the universe
Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve them