Flume Introduction, Features & Architecture

This article is a complete Apache Flume tutorial. It explains how Apache Flume is used to transfer data from web servers to HDFS or HBase. This Apache Flume tutorial covers the following:
• What is Apache Flume
• Why Apache Flume
• Features of Apache Flume
• Flume Architecture
• Data Flow
• Flume Advantages
• Disadvantages
• Flume Applications
Apache Flume from HKR Training

About HKR
HKR Training is best at giving you online and offline classes with the best quality facilities at a low price, without any compromise on quality. We do not train students on outdated applications; making them work with advanced technologies under up-to-date, real-time experts is our INNOVATION. Online and offline classroom training alone is no longer beneficial. Our curriculum for all courses has been designed as per the latest MNC requirements; that is our STRATEGY. Our success secret lies in our experienced faculty, who always transform students into IT champs with their continuous efforts and support; that is our best ACHIEVEMENT.

What can you expect from us? A dedicated learning platform with 24*7 support, and best-in-class training materials that will help you learn advanced techniques and gain practical knowledge of all available IT technologies. Our courses are specifically curated for both professionals and job-seekers. Online classes conducted by knowledgeable, certified trainers help you earn certification at your convenience.

TOPIC 01: INTRODUCTION
Overview, Architecture, Data flow model, Reliability and Recoverability

TOPIC 02: SETTING UP AN AGENT
Configuring individual components, Wiring the pieces together, Data ingestion

TOPIC 03: EXECUTING COMMANDS
Network streams

TOPIC 04: SETTING MULTI-AGENT FLOW
Consolidation, Multiplexing the flow, Configuration, Defining the flow, Configuring individual components, Adding multiple flows in an agent

TOPIC 05: CONFIGURING A MULTI-AGENT FLOW
Fan-out flow, Flume Sources: Avro Source, Exec Source, NetCat Source, Sequence Generator Source, Syslog Sources (Syslog TCP Source, Syslog UDP Source), Legacy Sources (Avro Legacy Source, Thrift Legacy Source), Custom Source

TOPIC 06: FLUME SINKS
HDFS Sink, Logger Sink, Avro Sink, IRC Sink, File Roll Sink, Null Sink, HBase Sinks (HBase Sink, AsyncHBase Sink), Custom Sink

TOPIC 07: FLUME CHANNELS
Memory Channel, JDBC Channel, Recoverable Memory Channel, File Channel, Pseudo Transaction Channel, Custom Channel, Flume Channel Selectors (Replicating Channel Selector, Multiplexing Channel Selector, Custom Channel Selector)

TOPIC 08: FLUME SINK PROCESSORS
Default Sink Processor, Failover Sink Processor, Load Balancing Sink Processor, Custom Sink Processor

TOPIC 09: FLUME INTERCEPTORS
Timestamp Interceptor, Host Interceptor, Flume Properties

TOPIC 10: SECURITY
Monitoring, Troubleshooting, Handling agent failures, Compatibility (HDFS, AVRO)
What is Apache Flume?
Apache Flume is an open-source tool for collecting, aggregating, and moving huge amounts of streaming data from external web servers to a central store such as HDFS or HBase. It is a highly available and reliable service with tunable recovery mechanisms. Apache Flume was designed mainly to move streaming data generated by various applications into the Hadoop Distributed File System (HDFS).

Why Apache Flume?
Consider a company with millions of services running on multiple servers. These services produce lots of logs, and to gain insights and understand customer behavior, the company needs to analyze all of these logs together. To process the logs, it requires an extensible, scalable, and reliable distributed data collection service. That service must be able to move unstructured data such as logs from the source to the system where they will be processed (such as HDFS).

Flume is an open-source distributed data collection service for transferring data from a source to a destination. It is a reliable and highly available service for collecting, aggregating, and transferring huge amounts of logs into HDFS, and it has a simple and flexible architecture. Apache Flume is highly robust and fault-tolerant, with tunable reliability mechanisms for fail-over and recovery. It supports data collection in batch as well as streaming mode.

Features of Apache Flume
• Apache Flume is a robust, fault-tolerant, and highly available service.
• It is a distributed system with tunable reliability mechanisms for fail-over and recovery.
• Apache Flume is horizontally scalable.
• Apache Flume supports complex data flows such as multi-hop flows, fan-in flows, fan-out flows, and contextual routing.
• Apache Flume supports a large set of sources, channels, and sinks.
• Apache Flume can efficiently ingest log data from various servers into a centralized repository.
• With Flume, we can collect data from different web servers in real time as well as in batch mode.
• We can import large volumes of data generated by social networking sites and e-commerce sites into HDFS using Apache Flume.

Apache Flume Architecture
Apache Flume has a simple and flexible architecture. Data generators such as Facebook, Twitter, e-commerce sites, or various other external sources produce the data, and individual Flume agents running on them collect it. A data collector then collects the data from the agents, aggregates it, and pushes it into a centralized repository such as HBase or HDFS.

Flume Event
A Flume event is the basic unit of data transferred from source to destination. It consists of a byte payload and an optional set of string headers.

Flume Agent
A Flume agent is an independent Java Virtual Machine (JVM) process in Apache Flume. An agent receives events from clients or other Flume agents and passes them on to the next destination, which can be a sink or another agent. A Flume agent contains three main components: a source, a channel, and a sink.

Source
A source receives data from the data generators and transfers it to one or more channels in the form of events. Flume supports several types of sources.
Example: Exec source, Thrift source, Avro source, Twitter 1% source, etc.

Channel
A channel receives events from the Flume source and buffers them until the sinks consume them. It is a transient store. Flume supports different types of channels.
Example: Memory channel, File channel, JDBC channel, etc.

Sink
A sink consumes events from the channel and delivers them to the destination. The destination can be a centralized store or another Flume agent.
Example: HDFS sink.

Additional Components of Flume Agent
A few more components, besides those described above, play a significant role in transferring events.

Interceptors
Interceptors alter or inspect Flume events as they pass between the source and the channel.

Channel Selectors
Channel selectors determine which channel or channels receive the data when a source has multiple channels. There are two types of channel selectors: default (replicating) and multiplexing.

Sink Processors
Sink processors invoke a particular sink from a group of sinks, for example to fail over to a backup sink or to load-balance events across sinks.

Data Flow

1. Multi-hop Flow
Within Apache Flume, there can be multiple agents. Before reaching its final destination, a Flume event may travel through more than one Flume agent. This is called a multi-hop flow.

2. Fan-out Flow
The data flow from one Flume source to multiple channels is called a fan-out flow. Fan-out flow is of two types: replicating and multiplexing.
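A Flume agent is configured through a text file in the Java properties format. In it, a source lists its channels (`sources.<name>.channels`, plural), while each sink names exactly one channel (`sinks.<name>.channel`). The sketch below embeds a minimal configuration for a replicating fan-out, with one netcat source feeding a memory channel and a file channel, and uses a small Python helper to report how the pieces are wired. The agent and component names (`a1`, `r1`, `c1`, `c2`, `k1`, `k2`), the port, and the HDFS path are hypothetical placeholders. `parse_properties` is a simplified reader written for this illustration, not a complete implementation of the properties format.

```python
# Sketch: read a Flume-style agent configuration and report the wiring.
# Agent/component names, the port, and the HDFS path are hypothetical.

FLUME_CONF = """
# Components of agent a1
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Source: turns each line of text received on a TCP port into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Fan-out: the replicating selector copies every event into both channels
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

# Channels: one in-memory, one durable on local disk
a1.channels.c1.type = memory
a1.channels.c2.type = file

# Sinks: each sink drains exactly one channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = hdfs
# hdfs.path value is a hypothetical placeholder
a1.sinks.k2.hdfs.path = /flume/events
a1.sinks.k2.channel = c2
"""


def parse_properties(text):
    """Simplified reader: 'key = value' lines; '#' lines are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def agent_wiring(props, agent):
    """Return ({source: [channels]}, {sink: channel}) for one agent."""
    prefix = agent + "."
    declared = props[prefix + "channels"].split()

    source_to_channels = {}
    for src in props[prefix + "sources"].split():
        targets = props[f"{prefix}sources.{src}.channels"].split()
        for ch in targets:
            if ch not in declared:
                raise ValueError(f"source {src} uses undeclared channel {ch}")
        source_to_channels[src] = targets

    sink_to_channel = {}
    for sink in props[prefix + "sinks"].split():
        ch = props[f"{prefix}sinks.{sink}.channel"]
        if ch not in declared:
            raise ValueError(f"sink {sink} uses undeclared channel {ch}")
        sink_to_channel[sink] = ch
    return source_to_channels, sink_to_channel


if __name__ == "__main__":
    sources, sinks = agent_wiring(parse_properties(FLUME_CONF), "a1")
    print(sources)  # {'r1': ['c1', 'c2']}
    print(sinks)    # {'k1': 'c1', 'k2': 'c2'}
```

Because a sink reads from a single channel, sending the same events to two destinations takes two channels and two sinks. That is why the sketch declares `c1`/`k1` and `c2`/`k2` in pairs.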
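The interceptors and channel selectors described above can be illustrated with a toy, in-process Python model. This is not Flume's Java API. It shows a source passing each event through interceptors and then through a selector. The replicating selector copies every event to all channels, while the multiplexing selector picks channels from the value of one header and falls back to a default list. The class and function names, the `region` header, and its mapping are assumptions for illustration only. The `timestamp` and `host` header names follow Flume's Timestamp and Host interceptors, but the toy versions are simplified.

```python
# Toy model of a source's event path: interceptors -> channel selector ->
# channels. All names are hypothetical; this is not Flume's Java API.
import socket
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Like a Flume event: a byte payload plus optional string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """Toy channel: a FIFO buffer holding events until a sink takes them."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def put(self, event):
        self.queue.append(event)


def timestamp_interceptor(event):
    # Like Flume's timestamp interceptor: record the time in epoch millis.
    event.headers["timestamp"] = str(int(time.time() * 1000))
    return event


def host_interceptor(event):
    # Similar in spirit to Flume's host interceptor; this toy stores the
    # local hostname under the "host" header.
    event.headers["host"] = socket.gethostname()
    return event


def replicating_selector(event, channels):
    # Replicating (the default selector): every channel gets the event.
    return list(channels.values())


def make_multiplexing_selector(header, mapping, default):
    # Multiplexing: choose channels by one header's value; unmapped or
    # missing values fall back to the default channel list.
    def select(event, channels):
        names = mapping.get(event.headers.get(header), default)
        return [channels[name] for name in names]
    return select


def source_receive(event, interceptors, selector, channels):
    # Interceptors run first; then the selector decides where the event goes.
    for intercept in interceptors:
        event = intercept(event)
    for channel in selector(event, channels):
        channel.put(event)


if __name__ == "__main__":
    channels = {name: Channel(name) for name in ("c1", "c2", "c3")}
    by_region = make_multiplexing_selector(
        "region", {"IN": ["c1"], "US": ["c2"]}, default=["c3"])
    for region in ("IN", "US", "EU"):
        event = Event(b"click", {"region": region})
        source_receive(event, [timestamp_interceptor, host_interceptor],
                       by_region, channels)
    for name, channel in channels.items():
        print(name, [e.headers["region"] for e in channel.queue])
    # c1 ['IN']
    # c2 ['US']
    # c3 ['EU']
```

In a real agent, the same routing would be declared in the configuration file with the `selector.type = multiplexing`, `selector.header`, `selector.mapping.<value>`, and `selector.default` properties of the source, not in code.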
3. Fan-in Flow
A fan-in flow is a data flow in which data is transferred from many sources to one channel.

Flume Advantages
1. Apache Flume enables us to store streaming data in any of the centralized repositories (such as HBase or HDFS).
2. Flume provides a steady data flow between producer and consumer during read/write operations.
3. Flume supports contextual routing.
4. Apache Flume guarantees reliable message delivery.
5. Flume is reliable, scalable, extensible, fault-tolerant, manageable, and customizable.

Flume Disadvantages
1. Apache Flume offers weaker ordering guarantees.
2. Apache Flume does not guarantee that the messages it delivers are 100% unique, so duplicates are possible.
3. Its topology can be complex, and reconfiguration is challenging.
4. Apache Flume may suffer from scalability and reliability issues.

Apache Flume Applications
1. E-commerce companies use Apache Flume to analyze customer behavior from a particular region.
2. We can use Apache Flume to move huge amounts of data generated by application servers into the Hadoop Distributed File System at high speed.
3. Apache Flume is used for fraud detection.
4. We can use Apache Flume in IoT applications.
5. Apache Flume can be used to aggregate machine- and sensor-generated data.
6. We can use Apache Flume in alerting or SIEM (security information and event management) systems.

Summary
In short, Apache Flume is an open-source tool for collecting, aggregating, and moving huge amounts of data from external web servers to a central store. It is a highly available and reliable service that can be used to ingest data from various applications into HDFS, and it is useful for e-commerce sites that want to understand customer behavior. This Apache Flume tutorial explained the Flume architecture and data flow, and listed Flume's features, advantages, and disadvantages. To know more, click the link https://hkrtrainings.com/apache-flume-training

India Address: Mehdipatnam, Hyderabad, TS, India. Phone: +91 7036587777
USA Address: 40640 High Street Fremont, Illinois, Chicago. Phone: +1 (872) 231 0447
Mail: info@hkrtrainings.com
Website: www.hkrtrainings.com