This presentation gives an overview of the Apache Druid project. It covers areas like use cases, features, architecture and users.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

Music: "Little Planet", composed and performed by Bensound from http://www.bensound.com/
What Is Druid?
● Real Time Analytics Database
● Distributed Architecture
● Open Source
● Highly Performant
● Time Series Database
● Apache 2 License
● Written in Java
Druid Use Cases
● User activity and behaviour
● Network flows
● Digital marketing
● Application performance management
● IoT and device metrics
● OLAP and business intelligence
Suited to workloads that need real-time data ingestion, fast queries and high uptime.
Druid Features
● Column-oriented storage
● Native search indexes
● Streaming and batch ingest
● Flexible schemas
● Time-optimized partitioning
● SQL support
● Horizontal scalability
● Easy operation
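To make the "time-optimized partitioning" bullet more concrete, here is a minimal Python sketch of the general idea: events are bucketed by time so that a time-range query only has to read the matching buckets. This illustrates the concept only and is not Druid's implementation; the hourly bucket size, the function names and the sample events are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_chunk(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour (the hypothetical chunk key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def partition_by_hour(events):
    """Group (timestamp, value) pairs into hourly chunks."""
    chunks = defaultdict(list)
    for ts, value in events:
        chunks[hour_chunk(ts)].append(value)
    return dict(chunks)


def query_range(chunks, start: datetime, end: datetime):
    """Read only the chunks whose hour start falls in [start, end)."""
    return [
        value
        for key in sorted(chunks)
        if start <= key < end
        for value in chunks[key]
    ]


# Made-up sample events, used only for illustration.
events = [
    (datetime(2020, 1, 1, 10, 5, tzinfo=timezone.utc), "a"),
    (datetime(2020, 1, 1, 10, 45, tzinfo=timezone.utc), "b"),
    (datetime(2020, 1, 1, 11, 15, tzinfo=timezone.utc), "c"),
]
chunks = partition_by_hour(events)
```

A query for the 11:00 hour then touches one chunk and skips the 10:00 chunk entirely, which is the benefit the bullet refers to.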
Druid Users
● Airbnb
● Alibaba
● Booking.com
● Cisco
● eBay
● Hulu
● Lyft
● Outbrain
● PayPal
● Pinterest
● Slack
● Twitter
● Walmart
● Yahoo
Some of the better-known users, among many others.
Druid MetaStore
● Stores metadata about the system and the data it holds
● Can use the following databases
  – Derby, MySQL, PostgreSQL
● Stores metadata such as
  – Segments, Rules, Config
  – Tasks, Audit
Druid Deep Storage
● Deep storage persists Druid segment data
● Uses storage like
  – Local Mounts, AWS S3, HDFS
● Core extensions available from Druid committers
● Extension examples include
  – Azure, Cassandra, Cloudfiles
Druid Processes
● Historical – store and query historic data
● MiddleManager – ingest new data
● Broker – process client queries
● Coordinator – watch over Historical processes
● Overlord – watch over MiddleManager processes
● Router – optional – provide a unified API gateway
Druid Query
● Druid supports JSON and SQL based queries
● The SQL syntax is as follows
● GROUPING SETS computes several groupings in one query, which improves efficiency and reduces scanning
● ROLLUP provides grouped data for each level of the grouping columns
● CUBE provides grouped data for each combination of the grouping columns
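The relationship between ROLLUP, CUBE and GROUPING SETS can be sketched in Python: both ROLLUP and CUBE are shorthand for a specific list of grouping sets. The helper names, the `events` table and its `country` and `city` columns are hypothetical, chosen only for this example. The JSON body uses a `query` field, which is the shape Druid's SQL HTTP endpoint (`/druid/v2/sql`) accepts; nothing is sent over the network here.

```python
import json
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(a, b) is shorthand for GROUPING SETS ((a, b), (a), ())."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(a, b) is shorthand for GROUPING SETS ((a, b), (a), (b), ())."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


def grouping_sets_clause(sets):
    """Render a list of grouping sets as a SQL GROUP BY clause."""
    rendered = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return f"GROUP BY GROUPING SETS ({rendered})"


def sql_request_body(sql):
    """Wrap a SQL string in the JSON body used for a SQL-over-HTTP request."""
    return json.dumps({"query": sql})


# Hypothetical table and columns, used only for illustration.
columns = ["country", "city"]
sql = (
    "SELECT country, city, COUNT(*) AS cnt FROM events "
    + grouping_sets_clause(rollup_sets(columns))
)
body = sql_request_body(sql)
```

For two columns, ROLLUP yields three grouping sets (both columns, the first column, and the grand total), while CUBE yields four because it also groups by the second column alone.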
Druid High Availability (HA)
● Use 3 or 5 ZooKeeper nodes on their own hardware
● For the MetaStore, use MySQL or PostgreSQL
  – With replication and failover
● Use multiple Coordinators and Overlords
  – Using the same MetaStore and ZooKeeper
● Scale Brokers horizontally
● Use a load balancer
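The advice to run 3 or 5 ZooKeeper nodes follows from majority-quorum arithmetic: an ensemble stays available only while more than half of its nodes are up. A small sketch of that arithmetic (the function names are made up for this example):

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


# Compare ensemble sizes: an even size adds a node without adding tolerance.
summary = {n: tolerated_failures(n) for n in (3, 4, 5)}
```

Three nodes tolerate one failure and five tolerate two; four nodes still tolerate only one, which is why odd ensemble sizes are the usual choice.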
Available Books
● See “Big Data Made Easy”, Apress, Jan 2015
● See “Mastering Apache Spark”, Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”, Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration