Apache Kylin

What Is Apache Kylin?
● An analytics data warehouse
● For big data / Apache 2.0 license
● Open source / written in Java
● Kylin is an OLAP engine with a SQL interface
● For huge tables (e.g., >100 million rows)
● Provides query latency measured in seconds on TB to PB scale data

How Does Kylin Work?
● Kylin runs on a Hadoop cluster
● It needs these services
– HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper
● State information is stored in HBase
● Historic data / star schema stored in Hive
● Access Kylin at http://<hostname>:7070/kylin
● Uses Lambda architecture for real time streaming
– layers: batch, speed and serving
– batch / near real-time processing
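Besides the web UI on port 7070, Kylin serves a REST API under the same base URL. The sketch below builds (but does not send) a SQL query request for the `/kylin/api/query` endpoint using only the Python standard library. The host name, project name and table name are hypothetical, and the ADMIN / KYLIN credentials are assumed to be the defaults of a fresh install; adjust all of them for a real cluster.

```python
import base64
import json
import urllib.request


def build_query_request(host, sql, project, user="ADMIN", password="KYLIN", limit=100):
    """Build a POST request for Kylin's SQL query endpoint (nothing is sent here)."""
    url = f"http://{host}:7070/kylin/api/query"
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


# Hypothetical host, project and table names (assumptions, not from the slides).
request = build_query_request(
    "kylin-host", "SELECT COUNT(*) FROM kylin_sales", "learn_kylin"
)
print(request.full_url)

# To run the query against a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the example runnable without a cluster and makes the URL, headers and JSON body easy to inspect.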

Kylin Software Requirements
● Requirements as of release v3.0.1
– Hadoop: 2.7+, 3.1+ (since v2.5)
– Hive: 0.13 - 1.2.1+
– HBase: 1.1+, 2.0 (since v2.5)
– Spark (optional): 2.3.0+
– Kafka (optional): 1.0.0+ (since v2.5)
– JDK: 1.8+ (since v2.5)
– OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+

Kylin In Cluster Mode


Kylin Real Time Streaming Architecture
● Streaming Receiver
– ingests data from stream data sources
● Streaming Coordinator
– coordinates workloads
● Metadata Store
– stores streaming related metadata
● Query Engine
– queries real-time data from the streaming receiver
● Build Engine
– builds cubes from the real-time data
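Conceptually, a query in this kind of architecture combines results built from historical data with results from data still held by the streaming receivers. The following is a toy sketch of that merge step only, not Kylin's implementation; the function name and the dictionary-based result format are assumptions made for illustration, and it only works for additive measures such as counts or sums.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine additive aggregates (counts or sums) from both layers."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds the values of keys present in both views
    return dict(merged)


# Hypothetical page-view counts per country:
# one result from the historical cube, one from the last few minutes of the stream.
batch_view = {"US": 1000, "DE": 400}
realtime_view = {"US": 12, "FR": 3}

print(merge_views(batch_view, realtime_view))
```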

Kylin Vs Druid
Druid is more suitable for real time analysis. Kylin is more focused on the OLAP case.
● Druid has good integration with Kafka for real time streaming analysis. The real time capability of Kylin (v3) is for real time OLAP.
● Druid uses bitmap indexes for internal data structures. Kylin uses bitmap indexes for real time data and MOLAP cubes for historical data.
● Kylin provides ANSI SQL, Druid provides its own query language.
● Druid has limitations on table joins, Kylin supports star schemas.
● Kylin has good integration with BI tools, such as Tableau or Excel. Druid has limited integration with existing BI tools.
● Since Kylin supports MOLAP cubes, it has very good performance for complex queries on data sets with billions of rows.
● Since Druid needs to scan the full index, performance may suffer if the data set and query range are too big.
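The difference between precomputed MOLAP cubes and index scans can be illustrated with a toy example. The sketch below precomputes a SUM for every combination of dimensions (each combination is what Kylin calls a cuboid), so that a group-by query becomes a lookup. It is a simplified in-memory illustration with assumed data and function names, not Kylin's cube build algorithm.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the given dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical fact rows.
rows = [
    {"country": "US", "year": 2019, "sales": 10},
    {"country": "US", "year": 2020, "sales": 20},
    {"country": "DE", "year": 2020, "sales": 5},
]
cube = build_cube(rows, ("country", "year"), "sales")

# "SELECT country, SUM(sales) ... GROUP BY country" is now a dictionary lookup.
print(cube[("country",)])
# The grand total lives in the cuboid with no dimensions.
print(cube[()])
```

With n dimensions a full cube has 2^n cuboids, so the approach trades build time and storage for query speed.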

Some Kylin Users


Kylin Ecosystem
● Kylin Core
– The fundamental framework of the Kylin OLAP engine. It comprises the Metadata Engine, Query Engine, Job Engine and Storage Engine that run the entire stack, and also includes a REST Server to service client requests
● Extensions
– Plugins to support additional functions and features
● Integration
– Lifecycle management support to integrate with job scheduler, ETL, monitoring and alerting systems
● User Interface
– Allows third party users to build customized user interfaces atop Kylin core
● Drivers
– ODBC and JDBC drivers to support different tools and products, such as Tableau

Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration
