This presentation gives an overview of the Apache Kylin project. It explains Kylin architecture in relation to Hadoop/HBase/Hive and Druid.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
What Is Apache Kylin?
● An analytics data warehouse for big data
● Open source, released under the Apache 2.0 license
● Written in Java
● Kylin is an OLAP engine with a SQL interface
● Designed for huge tables (e.g., more than 100 million rows)
● Provides query response times measured in seconds at TB to PB scale
How does Kylin work?
● Kylin runs on a Hadoop cluster
● It needs these services
  – HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper
● State information is stored in HBase
● Historic data is stored in Hive as a star schema
● Access the Kylin web interface at http://<hostname>:7070/kylin
● Uses a Lambda architecture for real time streaming
  – layers: batch, speed and serving
  – combines batch and near real-time processing
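The three Lambda layers named above can be illustrated with a minimal sketch. This is not Kylin code: the event shape and the function names (batch_view, speed_view, serve) are assumptions made for illustration only. It shows a batch layer that precomputes counts from historic data, a speed layer that counts recent events, and a serving layer that merges both to answer a query.

```python
from collections import Counter


def batch_view(historic_events):
    """Batch layer: precompute counts per key from historic data."""
    return Counter(event["key"] for event in historic_events)


def speed_view(recent_events):
    """Speed layer: count events the batch layer has not absorbed yet."""
    return Counter(event["key"] for event in recent_events)


def serve(batch, speed, key):
    """Serving layer: merge the batch and speed views to answer a query."""
    return batch.get(key, 0) + speed.get(key, 0)


# Hypothetical sample data: three historic events and one recent event
historic = [{"key": "NZ"}, {"key": "NZ"}, {"key": "US"}]
recent = [{"key": "NZ"}]

total_nz = serve(batch_view(historic), speed_view(recent), "NZ")
```

In a real deployment the batch view would be rebuilt periodically and the speed view would then only need to cover events that arrived after the last rebuild.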
Kylin Software Requirements
● Requirements as of release v3.0.1
  – Hadoop: 2.7+, 3.1+ (since v2.5)
  – Hive: 0.13 - 1.2.1+
  – HBase: 1.1+, 2.0 (since v2.5)
  – Spark (optional): 2.3.0+
  – Kafka (optional): 1.0.0+ (since v2.5)
  – JDK: 1.8+ (since v2.5)
  – OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
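A requirements list like this can be checked mechanically before installation. The sketch below is a hypothetical helper and not part of Kylin: it covers only the simple minimum versions for Hadoop, HBase and the JDK, omits the optional components and the Hive range, and assumes version strings made purely of dot-separated integers.

```python
def parse_version(text):
    """Turn '2.7.3' into (2, 7, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions copied from the slide (mandatory components only)
MINIMUMS = {"hadoop": "2.7", "hbase": "1.1", "jdk": "1.8"}


def unmet(installed):
    """Return the components that are missing or below their minimum version."""
    return sorted(
        name
        for name, minimum in MINIMUMS.items()
        if name not in installed
        or parse_version(installed[name]) < parse_version(minimum)
    )
```

For example, unmet({"hadoop": "2.6.5", "hbase": "1.1.2", "jdk": "1.8"}) reports only Hadoop as too old.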
Kylin Real Time Streaming Architecture
● Streaming Receiver
  – ingests data from stream data sources
● Streaming Coordinator
  – coordinates work loads
● Metadata Store
  – stores streaming related metadata
● Query Engine
  – queries real-time data from the streaming receiver
● Build Engine
  – builds cubes from the real-time data
Kylin Vs Druid
Druid is more suitable for real time analysis. Kylin is more focused on the OLAP case.
● Druid has good integration with Kafka for real time streaming analysis. The real time capability of Kylin (v3) is aimed at real time OLAP.
● Druid uses bitmap indexes for its internal data structures. Kylin uses bitmap indexes for real time data and MOLAP cubes for historical data.
● Kylin provides ANSI SQL. Druid provides its own specific query language.
● Druid has limitations on table joins. Kylin supports star schemas.
● Kylin has good integration with BI tools, such as Tableau or Excel. Druid has limited integration with existing BI tools.
● Because Kylin supports MOLAP cubes, it performs very well for complex queries on data sets with billions of rows.
● Because Druid needs to scan the full index, performance may suffer if the data set and query range are too large.
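The performance claim for MOLAP cubes rests on precomputation: aggregates are built ahead of time, so a query becomes a lookup rather than a scan. The sketch below illustrates that general idea only. It is not Kylin's cube build algorithm, and the fact table, dimension names and functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("country", "year")

# Hypothetical fact table with one measure, "sales"
FACTS = [
    {"country": "NZ", "year": 2019, "sales": 10},
    {"country": "NZ", "year": 2020, "sales": 5},
    {"country": "US", "year": 2020, "sales": 7},
]


def build_cube(facts, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (each cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in facts:
                cuboid[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, **filters):
    """Answer a SUM query by lookup in the matching cuboid, not by scanning facts."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube[dims].get(tuple(filters[d] for d in dims), 0)


cube = build_cube(FACTS, DIMENSIONS, "sales")
```

With n dimensions there are 2^n cuboids, so the trade-off is build time and storage in exchange for fast queries.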
Kylin Ecosystem
● Kylin Core
  – The fundamental framework of the Kylin OLAP engine comprises a Metadata Engine, Query Engine, Job Engine and Storage Engine to run the entire stack. It also includes a REST Server to service client requests
● Extensions
  – Plugins to support additional functions and features
● Integration
  – Lifecycle management support to integrate with job scheduler, ETL, monitoring and alerting systems
● User Interface
  – Allows third parties to build customized user interfaces atop the Kylin core
● Drivers
  – ODBC and JDBC drivers to support different tools and products, such as Tableau
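As an illustration of a client using the REST Server, the sketch below builds (but does not send) an HTTP request for a SQL query. It assumes the query endpoint is /kylin/api/query with a JSON body containing "sql" and "project" fields and HTTP Basic authentication, which is my understanding of the Kylin REST API and should be checked against the documentation for your release. The table name, project name and credentials are placeholders.

```python
import base64
import json
import urllib.request


def build_query_request(hostname, sql, project, user, password, port=7070):
    """Build a POST request for the assumed /kylin/api/query endpoint."""
    url = f"http://{hostname}:{port}/kylin/api/query"
    # HTTP Basic authentication: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"sql": sql, "project": project}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace with a real host, project, table and credentials
request = build_query_request(
    "localhost", "select count(*) from sales_fact", "demo_project", "ADMIN", "KYLIN"
)
# To run it against a live server: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a cluster and makes the URL and headers easy to inspect.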
Available Books
● See “Big Data Made Easy”
  – Apress Jan 2015
● See “Mastering Apache Spark”
  – Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration