An introduction to Cloudera Impala

An introduction to Cloudera Impala: what is it and how does it work? How can it bring real-time performance gains to Apache Hadoop?

semtechs

Presentation Transcript


  1. Impala
     • What is it?
     • How does it work?
     • Performance
     • Formats
     • Architecture
     www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Impala – What is it?
     • Ad hoc, real-time query engine for Hadoop
     • Open source
     • Developed by Cloudera
     • Based on Google's 2010 Dremel paper
     • Direct data access via the Impala engine
     • A future Hadoop Parquet update will
        • Add columnar binary storage to Hadoop
        • Improve Impala performance

  3. Impala – How does it work?
     • Direct data access
     • Query planning / coordination on data nodes
     • Node-based query engine
     • Low latency
     • Performance improvement
     • Queries data on HDFS or HBase
     • Uses the same HiveQL syntax (SQL-like)
     • Has the Hue GUI
     • Allows table joins and aggregation
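The slide above mentions SQL-like HiveQL syntax with table joins and aggregation. As a rough illustration only, the sketch below runs such a query using Python's built-in sqlite3 module as a stand-in engine, so it works without a cluster. The tables `customers` and `orders` and their columns are hypothetical, not taken from the slides. The query uses only plain JOIN and GROUP BY, which an Impala or Hive shell would likely accept largely unchanged, though that is not tested here.

```python
import sqlite3

# Stand-in engine: sqlite3 is used only so the query can run locally.
# Table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")

# A join plus an aggregation written in plain SQL.
QUERY = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region
"""

# Sort in Python so the result order does not depend on the engine.
rows = sorted(conn.execute(QUERY).fetchall())
for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

The query text is the part that carries over; connecting to a real Impala daemon would need a client library and connection details that the slides do not cover.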

  4. Impala – Performance
     Impala delivers performance gains
     • I/O-bound queries (limited by hardware)
        • At least 3 times faster
     • Complex queries (multiple MapReduce stages)
        • At least 7 times faster
     • Cached queries
        • At least 20 times faster
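To make the minimum speed-up factors on the slide above concrete, here is a small arithmetic sketch. It assumes each factor applies to the runtime of the same query on the existing Hadoop stack. The baseline of 420 seconds is a made-up figure, and the function name is illustrative only.

```python
# Minimum speed-up factors quoted on the slide, keyed by query type.
MIN_SPEEDUP = {"io_bound": 3, "complex": 7, "cached": 20}


def max_expected_runtime(baseline_seconds: float, query_type: str) -> float:
    """Upper bound on runtime implied by the minimum speed-up factor."""
    return baseline_seconds / MIN_SPEEDUP[query_type]


baseline = 420.0  # hypothetical baseline runtime in seconds
for query_type in MIN_SPEEDUP:
    bound = max_expected_runtime(baseline, query_type)
    print(f"{query_type}: at most {bound:.0f} s")
```

Because the slide gives minimum gains, the computed values are upper bounds on the expected runtime, not predictions.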

  5. Impala – Formats
     Supported formats
     • Text and Sequence Files, which can be compressed as
        • Snappy
        • GZIP
        • BZIP
     • Future support for
        • Avro
        • RCFile
        • LZO text file
        • Parquet

  6. Impala – Architecture

  7. Impala – Requirements
     What does Impala need to run?
     • CentOS 6.2
        • or RHEL (Red Hat Enterprise Linux)
     • CDH 4.1 (Cloudera Hadoop Distribution)
     • Cloudera Manager (advised)

  8. Contact Us
     • Feel free to contact us at
        • www.semtech-solutions.co.nz
        • info@semtech-solutions.co.nz
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems
