An introduction to Cloudera Impala: what is it and how does it work? How can it bring real-time performance gains to Apache Hadoop?
Impala
• What is it?
• How does it work?
• Performance
• Formats
• Architecture
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Impala – What is it?
• Ad hoc, real-time queries for Hadoop
• Open source
• Developed by Cloudera
• Based on Google's 2010 Dremel paper
• Direct data access via the Impala engine
• The upcoming Parquet format for Hadoop will:
  • Add columnar binary storage to Hadoop
  • Improve Impala performance
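Why does columnar storage help a query engine like Impala? A query that touches one column only needs to read that column, not every full row. The minimal in-memory Python sketch below illustrates the idea under stated assumptions: the table, its column names and the helper functions are all hypothetical, and this is not Parquet's actual on-disk format.

```python
from typing import Dict, List

# Hypothetical sample table: each dict is one row.
rows: List[Dict[str, object]] = [
    {"id": 1, "city": "Auckland", "sales": 120},
    {"id": 2, "city": "Wellington", "sales": 80},
    {"id": 3, "city": "Auckland", "sales": 45},
]

def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot a list of row dicts into one list per column (columnar layout)."""
    columns: Dict[str, List[object]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def sum_column_row_store(rows: List[Dict[str, object]], name: str) -> int:
    # Row layout: every full row is visited, even though only one field is needed.
    return sum(row[name] for row in rows)

def sum_column_col_store(columns: Dict[str, List[object]], name: str) -> int:
    # Columnar layout: only the single requested column is read.
    return sum(columns[name])

columns = to_columns(rows)
print(sum_column_row_store(rows, "sales"))     # 245
print(sum_column_col_store(columns, "sales"))  # 245
```

Both functions return the same answer; the difference is how much data each one has to touch, which is what matters when the data lives on disk.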
Impala – How does it work?
• Direct data access
• Query planning / coordination on the data nodes
• Node-based query engine
• Low latency
• Performance improvement
• Queries data on HDFS or HBase
• Uses the same HiveQL syntax (SQL-like)
• Can be used from the Hue GUI
• Supports table joins and aggregation
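Node-based execution with coordination can be pictured as scatter-gather aggregation: each node aggregates only its local data, and a coordinator merges the partial results. The Python sketch below is a simplified illustration, not Impala's actual implementation; the partitions, the three "nodes" and the function names are hypothetical. It computes roughly what `SELECT city, SUM(sales) FROM t GROUP BY city` would return.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical partitions of one table, as if stored on three data nodes.
PARTITIONS: List[List[Tuple[str, int]]] = [
    [("Auckland", 120), ("Wellington", 80)],
    [("Auckland", 45), ("Christchurch", 60)],
    [("Wellington", 20)],
]

def local_aggregate(partition: List[Tuple[str, int]]) -> Dict[str, int]:
    """Runs on each node: partial SUM per key over local data only."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in partition:
        partial[key] += value
    return dict(partial)

def coordinator_merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Runs on the coordinating node: merge the per-node partial sums."""
    total: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

partials = [local_aggregate(p) for p in PARTITIONS]
result = coordinator_merge(partials)
print(sorted(result.items()))
# [('Auckland', 165), ('Christchurch', 60), ('Wellington', 100)]
```

Only the small partial results travel to the coordinator, rather than every raw row, which is one reason this style of execution keeps latency low.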
Impala – Performance
Impala delivers performance gains:
• I/O-bound queries (limited by hardware): at least 3 times faster
• Complex queries (multiple MapReduce stages): at least 7 times faster
• Cached queries: at least 20 times faster
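A "minimum N times" speed-up translates into an upper bound on runtime: a query that took T seconds before should take at most T / N seconds in Impala. The small Python sketch below does that arithmetic using the factors quoted on the slide. It assumes the baseline is the same query run through MapReduce, which the slide implies but does not state. The 140-second baseline is a hypothetical example, not a measurement.

```python
# Minimum speed-up factors quoted on the slide, keyed by query type.
MIN_SPEEDUP = {
    "io_bound": 3,
    "complex": 7,
    "cached": 20,
}

def max_impala_seconds(baseline_seconds: float, query_type: str) -> float:
    """Upper bound on Impala runtime implied by a 'minimum N times faster' claim.

    Assumes baseline_seconds is the runtime of the same query via MapReduce.
    """
    factor = MIN_SPEEDUP[query_type]
    return baseline_seconds / factor

# Hypothetical 140-second baseline query.
for kind in MIN_SPEEDUP:
    print(kind, round(max_impala_seconds(140.0, kind), 2))
```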
Impala – Formats
Supported formats:
• Text and SequenceFiles, which can be compressed with:
  • Snappy
  • GZIP
  • BZIP2
• Future support for:
  • Avro
  • RCFile
  • LZO-compressed text files
  • Parquet
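To see what a compressed text-format table file involves, the Python sketch below builds some hypothetical comma-delimited rows and compresses them with two of the listed codecs, GZIP and BZIP2, using only the standard library. The row layout and delimiter are assumptions for illustration. Snappy is omitted because Python's standard library has no Snappy module.

```python
import bz2
import gzip

# Hypothetical comma-delimited rows for a text-format table.
lines = [f"{i},Auckland,{i * 10}" for i in range(1000)]
raw = ("\n".join(lines) + "\n").encode("utf-8")

gz_bytes = gzip.compress(raw)  # GZIP-compressed text
bz_bytes = bz2.compress(raw)   # BZIP2-compressed text

# Both codecs are lossless: decompressing returns the original bytes.
assert gzip.decompress(gz_bytes) == raw
assert bz2.decompress(bz_bytes) == raw

print("raw:", len(raw), "gzip:", len(gz_bytes), "bzip2:", len(bz_bytes))
```

The trade-off between codecs is usually compression ratio versus CPU cost. Snappy favours speed, while BZIP2 typically compresses harder but more slowly.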
Impala – Architecture
Impala – Requirements
What does Impala need to run?
• CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
• CDH 4.1 (Cloudera Hadoop Distribution)
• Cloudera Manager (advised)
Contact Us
• Feel free to contact us at:
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems