An introduction to Apache Flume: what is it used for, how does it work, and how does it fit into the Hadoop tool set?
Apache Flume
• What is it?
• How does it work?
• Architecture
• Reliability
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Flume – What is it?
• A data collection service for Hadoop
• For distributed systems
• Open source
• Scalable
• Reliable
• Manageable
• Fault tolerant
Flume – How does it work?
• Flume uses agents, each of which has
  • A source
    • Listens for events
    • Writes events to the channel
  • A channel
    • Queues event data as transactions
  • A sink
    • Writes event data to a target, e.g. HDFS
    • Removes the event from the queue
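The source, channel and sink roles described above can be sketched in a few lines of Python. This is only a toy model of the idea, not Flume's own API: the `Channel` class, the `source` and `run_sink` functions, and the `store` list standing in for HDFS are all hypothetical names chosen for illustration. The point it shows is that an event leaves the queue only after the sink has written it successfully.

```python
from collections import deque


class Channel:
    """In-memory queue standing in for a Flume channel (hypothetical model)."""

    def __init__(self):
        self._queue = deque()

    def put(self, event):
        self._queue.append(event)

    def take_batch(self, deliver, batch_size=2):
        """Hand up to batch_size events to deliver(); remove them only on success."""
        batch = list(self._queue)[:batch_size]
        if not batch:
            return 0
        try:
            deliver(batch)
        except Exception:
            return 0          # rollback: events stay queued for a retry
        for _ in batch:       # commit: drop the delivered events
            self._queue.popleft()
        return len(batch)

    def __len__(self):
        return len(self._queue)


def source(lines, channel):
    """Source: listen for events (here, iterate over lines) and write them to the channel."""
    for line in lines:
        channel.put(line)


def run_sink(channel, target):
    """Sink: write batches to the target until the channel has nothing left to deliver."""
    while channel.take_batch(target.extend):
        pass


channel = Channel()
store = []                      # stands in for a target such as HDFS
source(["event-1", "event-2", "event-3"], channel)
run_sink(channel, store)
```

If the delivery callable raises, `take_batch` returns 0 and the events remain in the channel, which is the behaviour the "queue event data as transactions" bullet is getting at.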
Flume – Architecture
• A single agent showing its parts
• Generally one agent for a given data type
Flume – Architecture
• Agents can be chained into flows
• Avro can be used for data serialization
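A rough sketch of chaining, again as a toy model rather than Flume's API: the sink of one agent hands serialized events to the source of the next. JSON from the Python standard library is used here purely as a stand-in for Avro, and the `Agent` class, the agent names and the `hops` field are assumptions made for the example.

```python
import json
from collections import deque


class Agent:
    """Toy agent: the source side accepts events, the sink side forwards them (hypothetical)."""

    def __init__(self, name, forward):
        self.name = name
        self.channel = deque()
        self.forward = forward      # callable that receives serialized bytes

    def receive(self, payload):
        """Source: decode an incoming event, record the hop, and queue it."""
        event = json.loads(payload.decode("utf-8"))
        event["hops"].append(self.name)
        self.channel.append(event)

    def drain(self):
        """Sink: serialize each queued event and pass it to the next hop."""
        while self.channel:
            event = self.channel[0]
            self.forward(json.dumps(event).encode("utf-8"))
            self.channel.popleft()   # remove only after the hand-off returned


delivered = []                                       # final target, e.g. HDFS
collector = Agent("collector", delivered.append)     # second agent in the flow
web_agent = Agent("web", collector.receive)          # first agent feeds the second

web_agent.receive(json.dumps({"body": "GET /index.html", "hops": []}).encode("utf-8"))
web_agent.drain()
collector.drain()
final = json.loads(delivered[0])
```

After both agents have drained, `final["hops"]` lists the agents the event passed through, in order.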
Flume – Architecture
In complicated flows it may be necessary to think about
• Event data reliability
• Should we have
  • Complete end-to-end reliability
  • Send and forget
  • Or something in between?
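The two ends of that reliability choice can be illustrated with a small sketch. It is not how Flume implements either mode; `send_and_forget`, `send_with_retry` and the `FlakyTarget` class (which fails every other call) are hypothetical helpers that only show what each choice costs.

```python
def send_and_forget(events, send):
    """Try each event once; failures are dropped silently."""
    for event in events:
        try:
            send(event)
        except ConnectionError:
            pass


def send_with_retry(events, send, max_attempts=3):
    """Retry each event until send() succeeds; return the events that never did."""
    undelivered = []
    for event in events:
        for _ in range(max_attempts):
            try:
                send(event)
                break
            except ConnectionError:
                continue
        else:
            undelivered.append(event)
    return undelivered


class FlakyTarget:
    """Hypothetical target that fails on every odd-numbered call."""

    def __init__(self):
        self.calls = 0
        self.received = []

    def __call__(self, event):
        self.calls += 1
        if self.calls % 2 == 1:
            raise ConnectionError("target unavailable")
        self.received.append(event)


lossy = FlakyTarget()
send_and_forget(["a", "b", "c", "d"], lossy)

reliable = FlakyTarget()
leftover = send_with_retry(["a", "b", "c", "d"], reliable)
```

With this target, send and forget delivers only half of the events, while the retrying sender delivers all four at the cost of twice as many calls. "Something in between" would correspond to a smaller `max_attempts` or to retrying only some event types.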
Flume – Architecture
• Complex flows may have many links
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems