Apache Druid

What Is Druid?
● Real Time Analytics Database
● Distributed Architecture
● Open Source
● Highly Performant
● Time Series Database
● Apache 2 License
● Written in Java

Druid Use Cases
● User activity and behaviour
● Network flows
● Digital marketing
● Application performance management
● IoT and device metrics
● OLAP and business intelligence
Suited to workloads that need real-time data ingestion, fast queries and high uptime.

Druid Features
● Column-oriented storage
● Native search indexes
● Streaming and batch ingest
● Flexible schemas
● Time-optimized partitioning
● SQL support
● Horizontal scalability
● Easy operation

Druid Users
● Airbnb
● Alibaba
● Booking.com
● Cisco
● eBay
● Hulu
● Lyft
● Outbrain
● PayPal
● Pinterest
● Slack
● Twitter
● Walmart
● Yahoo
Some of the better-known users, among many others.

Druid MetaStore
● Stores metadata about the system and the data it holds
● Can use the following databases
  – Derby, MySQL, PostgreSQL
● Stores metadata such as
  – Segments, Rules, Config
  – Tasks, Audit

Druid Deep Storage
● Deep storage persists Druid segment data
● Uses storage like
  – Local Mounts, AWS S3, HDFS
● Core extensions available from Druid committers
● Extension examples include
  – Azure, Cassandra, Cloudfiles
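The MetaStore and deep storage choices above are normally expressed as entries in Druid's common.runtime.properties file. The following is a minimal sketch in Python that only renders such lines; the helper names, host, bucket and credentials are hypothetical, and the property keys are quoted from the Druid configuration reference as best recalled, so check them against the version you run.

```python
# Sketch: render metadata-store and deep-storage settings as properties lines.
# Helper names and all example values are hypothetical placeholders.

JDBC_PORTS = {"mysql": 3306, "postgresql": 5432}  # usual default ports


def metadata_properties(db_type, host, database, user, password):
    """Return properties lines pointing Druid at an external metadata store."""
    if db_type not in JDBC_PORTS:
        raise ValueError(f"unsupported metadata store: {db_type}")
    uri = f"jdbc:{db_type}://{host}:{JDBC_PORTS[db_type]}/{database}"
    return [
        f"druid.metadata.storage.type={db_type}",
        f"druid.metadata.storage.connector.connectURI={uri}",
        f"druid.metadata.storage.connector.user={user}",
        f"druid.metadata.storage.connector.password={password}",
    ]


def s3_deep_storage_properties(bucket, base_key):
    """Return properties lines selecting AWS S3 as deep storage."""
    return [
        "druid.storage.type=s3",
        f"druid.storage.bucket={bucket}",
        f"druid.storage.baseKey={base_key}",
    ]


if __name__ == "__main__":
    lines = metadata_properties("mysql", "db.example.com", "druid", "druid", "changeme")
    lines += s3_deep_storage_properties("example-druid-bucket", "druid/segments")
    print("\n".join(lines))
```

Both MySQL and S3 support are shipped as extensions, so a real cluster would also need the matching entries in its extension load list.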

Druid Architecture

Druid Architecture 2

Druid Processes
● Historical – store and query historic data
● MiddleManager – ingest new data
● Broker – process client queries
● Coordinator – watch over Historical processes
● Overlord – watch over MiddleManager processes
● Router – optional – provide a unified API gateway
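Each of the processes listed above runs as its own service with an HTTP port. As a rough sketch, the mapping below uses what are believed to be the default ports from the Druid documentation, together with the `/status/health` path that Druid processes expose; the function names are hypothetical, and the ports are assumptions that a given cluster may well override.

```python
# Sketch: build health-check URLs for the six Druid process types.
# Port numbers are assumed defaults; clusters can change them.
import urllib.request

DEFAULT_PORTS = {
    "coordinator": 8081,
    "broker": 8082,
    "historical": 8083,
    "overlord": 8090,
    "middlemanager": 8091,
    "router": 8888,
}


def health_url(process, host="localhost"):
    """Return the health-check URL for one process type."""
    return f"http://{host}:{DEFAULT_PORTS[process]}/status/health"


def is_healthy(process, host="localhost", timeout=2.0):
    """Return True if the process answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(process, host), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error, ...
        return False


if __name__ == "__main__":
    # Only prints the URLs; call is_healthy() against a running cluster.
    for name in DEFAULT_PORTS:
        print(name, health_url(name))
```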

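Druid Query
● Druid supports JSON and SQL based queries
● SQL GROUP BY accepts GROUPING SETS, ROLLUP and CUBE
● GROUPING SETS computes several groupings in one query, which avoids repeated scans
● ROLLUP provides grouped data for each level of the listed columns, from the most detailed level up to the grand total
● CUBE provides grouped data for every combination of the listed columns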
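ROLLUP and CUBE can both be read as shorthand for a particular list of grouping sets. The sketch below expands them in Python and renders an equivalent GROUPING SETS query; the table name `visits`, the columns `country` and `city`, and the function names are hypothetical, and identifiers are left unquoted for brevity.

```python
# Sketch: expand ROLLUP and CUBE into explicit grouping sets.
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP (a, b) -> (a, b), (a), (): one set per level, down to the total."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE (a, b) -> (a, b), (a), (b), (): every subset of the columns."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


def grouping_sets_sql(table, columns, sets):
    """Render a SQL string that groups by the given grouping sets."""
    select_list = ", ".join(columns)
    rendered = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return (
        f"SELECT {select_list}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY GROUPING SETS ({rendered})"
    )


if __name__ == "__main__":
    cols = ["country", "city"]
    print(grouping_sets_sql("visits", cols, rollup_sets(cols)))
    print(grouping_sets_sql("visits", cols, cube_sets(cols)))
```

A string built this way could be sent as the `query` field of a JSON body posted to the Broker's `/druid/v2/sql` endpoint.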

Druid High Availability (HA)
● Use 3 or 5 ZooKeeper nodes on their own hardware
● For the MetaStore use MySQL or PostgreSQL
  – With replication and failover
● Use multiple Coordinators and Overlords
  – Using the same MetaStore and ZooKeeper
● Scale Brokers horizontally
● Use a load balancer
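The advice to run 3 or 5 ZooKeeper nodes follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its nodes are up. A small worked sketch, with hypothetical function names:

```python
# Sketch: majority-quorum arithmetic behind odd-sized ZooKeeper ensembles.


def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

An ensemble of 4 tolerates no more failures than one of 3, and 6 no more than 5, which is why odd sizes are the usual choice.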

Available Books
● See “Big Data Made Easy”, Apress, Jan 2015
● See “Mastering Apache Spark”, Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”, Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration
