
Apache Kylin

This presentation gives an overview of the Apache Kylin project. It explains Kylin architecture in relation to Hadoop/HBase/Hive and Druid.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

semtechs

Presentation Transcript


  1. What Is Apache Kylin?
     ● An analytics data warehouse
     ● For big data, released under the Apache 2.0 license
     ● Open source, written in Java
     ● Kylin is an OLAP engine with a SQL interface
     ● For huge tables (e.g., >100 million rows)
     ● Provides query responses within seconds on TB- to PB-scale data

  2. How Does Kylin Work?
     ● Kylin runs on a Hadoop cluster
     ● It needs these services
       – HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper
     ● State information is stored in HBase
     ● Historic data / star schema is stored in Hive
     ● Access Kylin at http://<hostname>:7070/kylin
     ● Uses a Lambda architecture for real time streaming
       – layers: batch, speed and serving
       – batch / near real-time processing
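The three Lambda layers named on this slide can be illustrated with a toy sketch. It is not Kylin code: the event tuples, the `CUTOFF` value and the function names are all made up for illustration. The batch layer aggregates data up to a cutoff, the speed layer aggregates whatever arrived afterwards, and the serving layer combines the two views to answer a query.

```python
from collections import Counter

def batch_view(events, cutoff):
    """Batch layer: precompute totals per key for events at or before the cutoff."""
    totals = Counter()
    for ts, key, amount in events:
        if ts <= cutoff:
            totals[key] += amount
    return totals

def speed_view(events, cutoff):
    """Speed layer: aggregate only the events that arrived after the cutoff."""
    totals = Counter()
    for ts, key, amount in events:
        if ts > cutoff:
            totals[key] += amount
    return totals

def serve(batch, speed, key):
    """Serving layer: answer a query by combining the batch and speed views."""
    return batch.get(key, 0) + speed.get(key, 0)

# Hypothetical events: (timestamp, country, amount)
events = [(1, "nz", 10), (2, "us", 5), (3, "nz", 7), (4, "nz", 1)]
CUTOFF = 2  # the last timestamp covered by the batch build (illustrative)

batch = batch_view(events, CUTOFF)
speed = speed_view(events, CUTOFF)
print(serve(batch, speed, "nz"))
```

The point of the split is that the expensive batch view can be rebuilt occasionally, while the small speed view keeps answers close to real time.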

  3. Kylin Software Requirements
     ● Requirements as of release v3.0.1
       – Hadoop: 2.7+, 3.1+ (since v2.5)
       – Hive: 0.13 - 1.2.1+
       – HBase: 1.1+, 2.0 (since v2.5)
       – Spark (optional): 2.3.0+
       – Kafka (optional): 1.0.0+ (since v2.5)
       – JDK: 1.8+ (since v2.5)
       – OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
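A requirements list like this is easy to turn into a pre-install check. The sketch below is only an illustration: the minimum versions are copied from the slide, while the dictionary keys and helper names are invented here. It compares lower bounds only and does not model the separate 2.x and 3.x branches listed for Hadoop and HBase.

```python
import re

# Minimum versions taken from the slide (Kylin v3.0.1); key names are illustrative.
MINIMUMS = {
    "hadoop": "2.7",
    "hive": "0.13",
    "hbase": "1.1",
    "spark": "2.3.0",
    "kafka": "1.0.0",
    "jdk": "1.8",
}

def parse(version):
    """Turn a version string such as '1.8.0_252' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))

def meets_minimum(component, installed):
    """Return True if the installed version is at least the listed minimum."""
    need = parse(MINIMUMS[component])
    have = parse(installed)
    # Pad the shorter tuple with zeros so '2.7' compares equal to '2.7.0'.
    width = max(len(need), len(have))
    need += (0,) * (width - len(need))
    have += (0,) * (width - len(have))
    return have >= need

print(meets_minimum("hadoop", "2.6.5"))
print(meets_minimum("jdk", "1.8.0_252"))
```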

  4. Kylin In Cluster Mode

  5. Kylin Real Time Streaming Architecture

  6. Kylin Real Time Streaming Architecture
     ● Streaming Receiver
       – ingest data from stream data sources
     ● Streaming Coordinator
       – coordinate work loads
     ● Metadata Store
       – store streaming related metadata
     ● Query Engine
       – query real-time data from streaming receiver
     ● Build Engine
       – build cube from the real-time data

  7. Kylin Vs Druid
     Druid is more suitable for real time analysis. Kylin is more focused on the OLAP case.
     ● Druid has good integration with Kafka for real time streaming analysis. The real time capability of Kylin (v3) is for real time OLAP.
     ● Druid uses bitmap indexes for internal data structures. Kylin uses bitmap indexes for real time data and MOLAP cubes for historical data.
     ● Kylin provides ANSI SQL. Druid provides its own query language.
     ● Druid has limitations on table joins. Kylin supports star schemas.
     ● Kylin has good integration with BI tools, such as Tableau or Excel. Druid has limited integration with existing BI tools.
     ● Since Kylin supports MOLAP cubes, it has very good performance for complex queries on data sets with billions of rows.
     ● Since Druid needs to scan the full index, performance may suffer if the data set and query range are too big.
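The MOLAP idea behind the performance claim can be shown with a toy sketch: aggregates are computed ahead of time for every combination of dimensions, so a query becomes a lookup rather than a scan. This is not Kylin's storage format; the dimension names, rows and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "year")  # illustrative dimensions

def build_cube(rows, measure="sales"):
    """Precompute SUM(measure) for every combination of dimensions (one cuboid each)."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube

def query(cube, **filters):
    """Answer SUM(measure) for the given dimension values by lookup, without scanning rows."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)

rows = [
    {"country": "NZ", "year": 2019, "sales": 10},
    {"country": "NZ", "year": 2020, "sales": 20},
    {"country": "US", "year": 2020, "sales": 5},
]
cube = build_cube(rows)
print(query(cube, country="NZ"))
print(query(cube, year=2020))
```

The trade-off is build time and storage: the number of cuboids grows as 2^n with the number of dimensions, which is why cube design matters in practice.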

  8. Some Kylin Users

  9. Kylin Ecosystem

  10. Kylin Ecosystem
     ● Kylin Core
       – The fundamental framework of the Kylin OLAP engine comprises the Metadata Engine, Query Engine, Job Engine and Storage Engine, which run the entire stack. It also includes a REST server to service client requests
     ● Extensions
       – Plugins to support additional functions and features
     ● Integration
       – Lifecycle management support to integrate with job scheduler, ETL, monitoring and alerting systems
     ● User Interface
       – Allows third party users to build customized user interfaces atop the Kylin core
     ● Drivers
       – ODBC and JDBC drivers to support different tools and products, such as Tableau
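As an illustration of a client talking to the REST server, the sketch below builds (but does not send) a SQL query request. The `/kylin/api/query` path, the JSON field names and HTTP Basic authentication are assumptions recalled from the Kylin REST API documentation and should be checked against the installed release. The host, project, table and credentials are placeholders.

```python
import base64
import json
import urllib.request

def build_query_request(host, sql, project, user, password, port=7070):
    """Build a POST request for Kylin's query endpoint (path and fields are assumptions)."""
    url = f"http://{host}:{port}/kylin/api/query"
    body = json.dumps({"sql": sql, "project": project, "limit": 100}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values; replace them with those of a real deployment.
req = build_query_request(
    "localhost", "select count(*) from kylin_sales", "learn_kylin", "ADMIN", "KYLIN"
)
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```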

  11. Available Books
     ● See “Big Data Made Easy”
       – Apress Jan 2015
     ● See “Mastering Apache Spark”
       – Packt Oct 2015
     ● See “Complete Guide to Open Source Big Data Stack”
       – Apress Jan 2018
     ● Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     ● Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020

  12. Connect
     ● Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     ● See my open source blog at
       – open-source-systems.blogspot.com/
     ● I am always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration
