This presentation gives an overview of the Apache Kafka project. It covers areas like producers, consumers, topics, partitions, APIs, architecture and usage.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

Music: "Little Planet", composed and performed by Bensound, from http://www.bensound.com/
What Is Apache Kafka?
● A stream processing platform
● Open source / Apache 2.0 license
● Written in Java and Scala
● A publish/subscribe system for record streams
● Scalable / fault tolerant
● Topic-based, partitioned FIFO queues
How Does Kafka Work?
● Kafka runs as a cluster of servers
● Stores records in topics
● Topics are partitioned into queues
● Partitions are stored across the cluster
● Consumers are organised into groups
● Stream processors transform records
● Reusable connectors process queues
  – For instance database connectors
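To make the database connector idea concrete, here is a minimal sketch of a source connector that copies table rows into a topic. It does not use Kafka or Kafka Connect: the topic is a plain Python list, the database is an in-memory SQLite table, and `source_connector`, `customers` and `customers_topic` are hypothetical names chosen for illustration.

```python
import sqlite3

def source_connector(connection, table, topic):
    """Copy every row of a table into a topic, one record per row."""
    # The table name is a trusted literal here; never build SQL from user input.
    cursor = connection.execute(f"SELECT id, name FROM {table} ORDER BY id")
    for row_id, name in cursor:
        topic.append({"id": row_id, "name": name})

# Set up a small in-memory database to act as the RDBMS source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

customers_topic = []   # plain list standing in for a Kafka topic
source_connector(db, "customers", customers_topic)
print(customers_topic)
```

The connector knows nothing about who reads the topic, which is what makes it reusable across consumers.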
Kafka APIs
● Producer API
  – Allows applications to publish to topics
● Consumer API
  – Applications subscribe to topics / process data streams
● Streams API
  – Application acts as a stream processor, transforming streams
● Connector API
  – Build reusable producers / consumers
  – E.g. RDBMS connectors/producers/consumers
● Admin API
  – For topic and broker management
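The producer, consumer and stream processor roles listed above can be sketched with an in-memory stand-in. This is not the Kafka client API: `topics`, `publish`, `consume` and `run_stream_processor` are hypothetical helpers, and the topic names are made up for the example.

```python
from collections import defaultdict

topics = defaultdict(list)   # topic name -> list of records (stand-in for a broker)

def publish(topic, record):
    """Producer role: append a record to a topic."""
    topics[topic].append(record)

def consume(topic):
    """Consumer role: return a copy of the records currently in a topic."""
    return list(topics[topic])

def run_stream_processor(source, sink, transform):
    """Stream processor role: read the source topic, transform, publish to the sink."""
    for record in consume(source):
        publish(sink, transform(record))

publish("readings-celsius", 20.0)
publish("readings-celsius", 25.0)
run_stream_processor("readings-celsius", "readings-fahrenheit",
                     lambda celsius: celsius * 9 / 5 + 32)
print(topics["readings-fahrenheit"])
```

A stream processor is simply a consumer of one topic and a producer to another, which is why it composes from the first two roles.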
Kafka Topic Queue Offsets
● Records are published to topics
● Topics are multi-subscriber
● Topics contain partition queues
● A partition queue contains a sequence of records
● Each record has a queue offset (position)
● Consumers use the offset to read records
● Queue record retention is configurable
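The relationship between offsets and retention can be illustrated with a small model of a single partition queue. This is a sketch under stated assumptions, not Kafka's storage code: `PartitionLog` is a hypothetical class, and its count-based `max_records` limit stands in for Kafka's configurable retention settings.

```python
class PartitionLog:
    """An append-only record sequence where each record keeps a fixed offset."""

    def __init__(self, max_records):
        self.max_records = max_records  # hypothetical count-based retention limit
        self.base_offset = 0            # offset of the oldest retained record
        self.records = []

    def append(self, record):
        """Store a record and return the offset assigned to it."""
        offset = self.base_offset + len(self.records)
        self.records.append(record)
        if len(self.records) > self.max_records:
            self.records.pop(0)         # retention: drop the oldest record
            self.base_offset += 1       # surviving records keep their offsets
        return offset

    def read(self, offset):
        """Return all retained records from the given offset onwards."""
        start = max(offset, self.base_offset) - self.base_offset
        return self.records[start:]

log = PartitionLog(max_records=3)
offsets = [log.append(r) for r in ["a", "b", "c", "d"]]
print(offsets)        # offsets are assigned in append order
print(log.read(2))    # records from offset 2 onwards
print(log.read(0))    # offset 0 has expired; reading starts at the oldest retained record
```

Because offsets never change once assigned, several subscribers can read the same queue independently, each tracking only its own position.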
Kafka Producer Consumer
● Producers write to partitions, e.g. Producer1 → P0
● Producers are responsible for record → partition mapping
● Kafka only guarantees order within a partition
● Kafka cluster contains <n> servers
● Partitions are mapped to servers
● Consumers are members of consumer groups
● Each consumer must maintain its partition read offset
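A short sketch may help show why per-partition ordering is still useful: if the producer maps records by key, all records for one key land in the same partition and stay in order. The code below is an illustration only. `zlib.crc32` is a stand-in hash (not the one Kafka's client uses), and `partition_for`, `assign` and the consumer names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    """Producer side: map a record key to a partition (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def assign(partition_ids, consumers):
    """Group side: give each partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition_id in enumerate(partition_ids):
        assignment[consumers[index % len(consumers)]].append(partition_id)
    return assignment

# Producer writes: records with the same key always reach the same partition.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]:
    partitions[partition_for(key)].append((key, value))

# Consumer group: each consumer keeps its own read offset per assigned partition.
group = assign(list(partitions), ["consumer-a", "consumer-b"])
read_offsets = {c: {p: 0 for p in ps} for c, ps in group.items()}
print(partitions)
print(group)
```

Ordering across different keys is not guaranteed here, which mirrors the slide's point that order holds only within a partition.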
Kafka's Stack Role
● A low latency messaging system
  – Records load balanced across partitions
● As a storage system
  – Using local file system storage
  – Scales horizontally in terms of performance
● As a stream processing system
  – Using the Streams API to transform data
● Data replication provides fault tolerance
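The fault tolerance point can be sketched as follows: if each partition is copied to more than one server, another copy can take over when a server fails. This is a simplified model and not Kafka's actual replica placement or leader election. `replicas_for`, `leader` and the server names are hypothetical.

```python
def replicas_for(partition, servers, replication_factor):
    """Pick consecutive servers to hold copies of a partition (illustrative only)."""
    return [servers[(partition + i) % len(servers)] for i in range(replication_factor)]

def leader(replicas, failed):
    """Return the first replica that is still alive, or None if all are down."""
    for server in replicas:
        if server not in failed:
            return server
    return None

servers = ["server-0", "server-1", "server-2"]
replicas = replicas_for(0, servers, replication_factor=2)
print(replicas)                                # two copies of partition 0
print(leader(replicas, failed=set()))          # normal operation
print(leader(replicas, failed={"server-0"}))   # a surviving copy takes over
```

With a replication factor of 2, the partition stays readable after one server failure but not after both of its servers fail.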
Available Books
● See “Big Data Made Easy”
  – Apress Jan 2015
● See “Mastering Apache Spark”
  – Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration