Apache Kafka

What Is Apache Kafka?
● A stream processing platform
● Open source, Apache 2.0 license
● Written in Java and Scala
● A publish/subscribe system for record streams
● Scalable and fault tolerant
● Topic-based, partitioned FIFO queues

How Does Kafka Work?
● Kafka runs as a cluster of servers
● Stores records in topics
● Topics are partitioned into queues
● Partitions are stored across the cluster
● Consumers are organised into groups
● Stream processors transform records
● Reusable connectors process queues
  – For instance, database connectors
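The idea that partitions are spread across the cluster, and then shared out among the members of a consumer group, can be sketched with a toy round-robin function. This is only an illustration: `round_robin`, the server names and the consumer names are hypothetical, and Kafka's real placement and group assignment strategies are more involved.

```python
def round_robin(partitions, owners):
    """Deal partitions out to owners in turn (toy assignment, not Kafka's algorithm)."""
    assignment = {owner: [] for owner in owners}
    for i, partition in enumerate(partitions):
        assignment[owners[i % len(owners)]].append(partition)
    return assignment


# A topic with six partitions, numbered 0..5 (hypothetical sizes).
partitions = list(range(6))

# Partitions stored across three servers in the cluster.
placement = round_robin(partitions, ["server-0", "server-1", "server-2"])

# The same partitions divided between two consumers in one group,
# so each partition is read by exactly one group member.
group = round_robin(partitions, ["consumer-a", "consumer-b"])
```

Under these assumptions each server holds two partitions and each consumer reads three, so adding servers or consumers spreads the same partitions more thinly.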

Kafka APIs
● Producer API
  – Allows applications to publish to topics
● Consumer API
  – Applications subscribe to topics and process data streams
● Streams API
  – Applications act as stream processors, transforming streams
● Connector API
  – Build reusable producers and consumers
  – E.g. RDBMS connectors, producers and consumers
● Admin API
  – For topic and broker management

Kafka Logical Architecture

Kafka Topic Queue Offsets
● Records are published to topics
● Topics are multi-subscriber
● Topics contain partition queues
● A partition queue contains a sequence of records
● Each record has a queue offset (position)
● Consumers use the offset to read records
● Queue record retention is configurable
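A minimal sketch of one partition queue may help make offsets and retention concrete. It is a toy in-memory model, not Kafka's storage format or client API: `PartitionLog` and `max_records` are hypothetical names, and the record-count limit stands in for Kafka's real time-based and size-based retention settings.

```python
class PartitionLog:
    """Toy append-only log for a single partition."""

    def __init__(self, max_records):
        self.max_records = max_records  # hypothetical retention limit
        self.base_offset = 0            # offset of the oldest retained record
        self.records = []

    def append(self, value):
        """Store a record and return the offset assigned to it."""
        offset = self.base_offset + len(self.records)
        self.records.append(value)
        # Retention: drop the oldest records; surviving offsets never change.
        while len(self.records) > self.max_records:
            self.records.pop(0)
            self.base_offset += 1
        return offset

    def read(self, offset, limit=10):
        """Return up to `limit` (offset, value) pairs starting at `offset`."""
        start = max(offset, self.base_offset) - self.base_offset
        chunk = self.records[start:start + limit]
        return [(self.base_offset + start + i, v) for i, v in enumerate(chunk)]


log = PartitionLog(max_records=3)
offsets = [log.append(v) for v in ["a", "b", "c", "d", "e"]]
oldest_available = log.read(0)   # records 0 and 1 have been expired
latest_only = log.read(4)
```

Because the offset is a position in the sequence and not a list index, a consumer that remembers "offset 4" still gets the same record after older records have been removed.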

Kafka Producer Consumer
● Producers write to partitions, e.g. Producer1 → P0
● Producers are responsible for record → partition mapping
● Kafka only guarantees order within a partition
● A Kafka cluster contains <n> servers
● Partitions are mapped to servers
● Consumers are members of consumer groups
● Each consumer must maintain its partition read offset
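The record → partition mapping and the per-consumer read offset can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `choose_partition`, `ToyTopic` and `ToyConsumer` are hypothetical names, and CRC32 is used only because it is a stable hash in the Python standard library; Kafka's own default partitioner uses a different hash.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a record key to a partition with a stable hash (CRC32 in this toy)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """A topic as a list of in-memory partition queues."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append to the partition chosen by the key; return (partition, offset)."""
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Keeps its own next-read offset for each assigned partition."""

    def __init__(self, topic, assigned):
        self.topic = topic
        self.offsets = {p: 0 for p in assigned}

    def poll(self):
        """Return records not yet seen and advance the stored offsets."""
        out = []
        for p, offset in list(self.offsets.items()):
            new = self.topic.partitions[p][offset:]
            out.extend(new)
            self.offsets[p] = offset + len(new)
        return out


topic = ToyTopic(num_partitions=3)
for i in range(4):
    topic.send("user-1", f"event-{i}")

consumer = ToyConsumer(topic, assigned=[0, 1, 2])
first = consumer.poll()    # all four events, in the order they were sent
second = consumer.poll()   # nothing new since the last poll
```

Records with the same key always land in the same partition, which is why their order is preserved; records with different keys may land in different partitions, where no relative order is guaranteed.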

Kafka's Stack Role
● A low latency messaging system
  – Records load balanced across partitions
● As a storage system
  – Using local file system storage
  – Scales horizontally in terms of performance
● As a stream processing system
  – Using the Streams API to transform data
● Data replication provides fault tolerance
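How replication gives fault tolerance can be sketched with a toy replica placement. The slide does not describe the mechanism, so this is an assumption-laden illustration: `place_replicas` and `elect_leaders` are hypothetical helpers, the broker names are made up, and the "first live replica becomes leader" rule is a simplification of how Kafka chooses partition leaders.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Put each partition on `replication_factor` consecutive brokers (toy placement)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leaders(placement, live_brokers):
    """Pick the first live replica of each partition as leader (None if all are lost)."""
    leaders = {}
    for p, replicas in placement.items():
        live = [b for b in replicas if b in live_brokers]
        leaders[p] = live[0] if live else None
    return leaders


brokers = ["b0", "b1", "b2"]
placement = place_replicas(num_partitions=3, brokers=brokers, replication_factor=2)

leaders_before = elect_leaders(placement, live_brokers={"b0", "b1", "b2"})
leaders_after = elect_leaders(placement, live_brokers={"b0", "b2"})  # b1 has failed
```

With two copies of every partition, losing one broker in this sketch still leaves each partition with a live replica that can take over.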

Available Books
● See “Big Data Made Easy”
  – Apress, Jan 2015
● See “Mastering Apache Spark”
  – Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration
