A short introduction to Apache Storm: what is it and how does it work? How can it provide real-time data processing for big data?
Apache Storm
• What is it?
• Architecture
• Storm vs Hadoop
• History
• Terms
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Apache Storm – What is it?
• A real-time big data processing system
• Stream based
• Fault tolerant and distributed
• Non-persistent
• In the Apache Incubator
• Written in Clojure and Java
• Originally released under the Eclipse Public License; Apache releases use the Apache License 2.0
Apache Storm – Storm vs Hadoop
• Storm
  • Distributed and fault tolerant
  • Real time / stream based
  • Master/slave plus ZooKeeper
  • Non-persistent
  • Big data analysis
• Hadoop
  • Distributed and fault tolerant
  • Batch / file based
  • Master/slave plus ZooKeeper
  • Persistent, uses HDFS
  • Big data analysis
Apache Storm – Storm vs Hadoop
• Hadoop versus Storm
  • They are complementary technologies
  • They might both be used in a single system
  • Storm to process real-time streams of data
  • Hadoop and MapReduce to process batched data on HDFS
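The contrast between the two processing models can be sketched in plain Python. This is a toy illustration only, not Storm or Hadoop code: the function names `batch_word_count` and `stream_word_count` are hypothetical, and a small in-memory list stands in for both a file on HDFS and a live data feed.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch style: read the whole data set, then return one final result."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def stream_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: yield an updated result after every record arrives."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield dict(counts)  # a snapshot is available straight away


# Assumed sample input; a real stream would never end.
data = ["storm is fast", "hadoop is batch"]
final_batch = batch_word_count(data)
snapshots = list(stream_word_count(data))
```

The batch function has nothing to report until all input has been read, while the streaming function has a usable answer after each record. On the same finite input, the last streaming snapshot matches the batch result.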
Apache Storm – Architecture
• Storm architecture at a high level
Apache Storm – Architecture
• Composed of streams of tuples, linked together by bolts
• Streams are sourced via spouts
Apache Storm – Architecture
• From these components we form topologies
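How a spout and bolts combine into a topology can be sketched with Python generators. This is a simplified model under stated assumptions, not the Storm API: `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical names, the spout reads a fixed list, and the topology is a simple chain, whereas a real topology can be a graph running across many machines.

```python
from collections import Counter
from typing import Callable, Iterator, List


def sentence_spout() -> Iterator[tuple]:
    """Spout: the source of the stream (a fixed list here, for illustration)."""
    for sentence in ["the cow jumped", "the moon"]:
        yield (sentence,)


def split_bolt(stream: Iterator[tuple]) -> Iterator[tuple]:
    """Bolt: turns each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterator[tuple]) -> Iterator[tuple]:
    """Bolt: emits (word, running_count) for every word tuple it receives."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(spout: Callable[[], Iterator[tuple]],
                 bolts: List[Callable[[Iterator[tuple]], Iterator[tuple]]]
                 ) -> Iterator[tuple]:
    """Topology: a spout feeding a chain of bolts, evaluated lazily."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return stream


results = list(run_topology(sentence_spout, [split_bolt, count_bolt]))
```

Each bolt consumes one stream and produces another, so adding a processing step means adding one more bolt to the list.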
Apache Storm – History
• What is Apache Storm's history?
  • Developed by BackType
  • BackType was acquired by Twitter
  • Open sourced by Twitter in Sept 2011
  • Added to the Apache Incubator in 2013
Apache Storm – Terms
• Tuple – an ordered list of elements
• Stream – an unbounded sequence of tuples
• Spout – like a tap or faucet, a source of streams
• Bolt – functions, filters, etc. that process streams
• Topology – an ETL-like architecture built from spouts, streams, and bolts
• Nimbus – the master node, like the Hadoop JobTracker
• Supervisor – controls worker processes
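The first few terms can be made concrete with a small Python sketch. It is an analogy rather than Storm code: the names `number_spout` and `even_filter_bolt` are hypothetical, a Python tuple stands in for a Storm tuple, and an infinite generator stands in for an unbounded stream.

```python
import itertools
from typing import Iterator, Tuple


def number_spout() -> Iterator[Tuple[int, int]]:
    """Spout emitting an unbounded stream of (n, n squared) tuples."""
    for n in itertools.count(1):
        yield (n, n * n)


def even_filter_bolt(stream: Iterator[Tuple[int, int]]
                     ) -> Iterator[Tuple[int, int]]:
    """Bolt acting as a filter: pass on only tuples whose first field is even."""
    return (t for t in stream if t[0] % 2 == 0)


# The stream never ends, so a consumer takes only as many tuples as it needs.
first_three = list(itertools.islice(even_filter_bolt(number_spout()), 3))
```

Because the stream is unbounded, nothing here waits for the input to finish; the bolt handles each tuple as it arrives.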
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems