Impala is an open-source tool developed by Cloudera that offers ad hoc, real-time queries for Hadoop, based on Google's 2010 Dremel paper. It accesses data directly through the Impala engine, and a future Parquet update to Hadoop is expected to add columnar binary storage and improve Impala performance. Learn more at Semtech Solutions.
Impala
• What is it?
• How does it work?
• Performance
• Formats
• Architecture
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Impala – What is it?
• Ad hoc, real-time query for Hadoop
• Open source
• Developed by Cloudera
• Based on Google's 2010 Dremel paper
• Direct data access via the Impala engine
• A future Hadoop Parquet update will
  • Add columnar binary storage to Hadoop
  • Improve Impala performance
Impala – How does it work?
• Direct data access
• Query planning / coordination on data nodes
• Node-based query engine
• Low latency
• Performance improvement
• Queries data on HDFS or HBase
• Uses the same HiveQL syntax (SQL-like)
• Has the Hue GUI
• Allows table joins and aggregation
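The SQL-like syntax, joins, and aggregation mentioned above can be illustrated with a small sketch. This is not Impala itself: Python's built-in sqlite3 module stands in as the query engine so the example runs anywhere, and the customers and orders tables are hypothetical. The SELECT statement uses only a plain join with GROUP BY, the kind of query the slide describes, but it has not been checked against an Impala cluster.

```python
import sqlite3

# SQLite is a stand-in engine here, not Impala; it only shows the query shape.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for illustration.
conn.executescript("""
CREATE TABLE customers (id INTEGER, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")

# A table join followed by an aggregation per region.
QUERY = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""

rows = conn.execute(QUERY).fetchall()
for region, order_count, total_amount in rows:
    print(region, order_count, total_amount)
```

On a real cluster the same kind of statement would be submitted through the Hue GUI or another Impala client rather than through sqlite3.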