This presentation attempts to give an overview of the Apache NiFi flow management system currently included in Cloudera's CDF product.

Links for further information and connecting:
http://www.semtech-solutions.co.nz
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385

Music: "Little Planet", composed and performed by Bensound, from http://www.bensound.com/
What Is NiFi?
● An Apache data flow automation system, supported commercially by Cloudera
● Written in Java
● Apache 2 License
● Cluster based and scalable
● Has a web based user interface
● Widely extendable
● Offers data flow monitoring
NiFi History
● Based on NiagaraFiles, developed by the NSA
● Open sourced by the NSA in 2014
● Commercialised by Onyara Inc
● Onyara purchased by Hortonworks in 2015
● Hortonworks and Cloudera announced their merger in 2018
● Cloudera plans a full open source path
How Does NiFi Work?
● NiFi runs in a JVM on each server in the cluster
● Uses ZooKeeper for configuration / coordination
  – One node acts as the Cluster Coordinator
  – One node acts as the Primary Node
● The JVM encapsulates
  – Web server
  – Processors / Extensions
  – Repositories for
    ● FlowFile / Content / Data Provenance
NiFi Architecture 2
● Web Server for monitoring and administration
● Flow Controller manages extensions and resources
● FlowFile Processor 1 .. N
  – The actual data flow worker
  – Each processor supports NiFi data flow
● Extensions allow remote system connectivity
  – Can be user defined
● FlowFile Repo: tracks and maintains current flows
● Content Repo: maintains data in transit
● Provenance Repo: historic data flow information
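The relationship between the three repositories can be sketched with a toy model. This is not NiFi's implementation; `FlowFile`, `ToyRepositories`, and the `claim-N` naming are hypothetical names used only for illustration. The sketch assumes a FlowFile is a set of attributes plus a pointer into the content store, and that each processing step writes new content rather than overwriting the old.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class FlowFile:
    """Toy FlowFile: attributes plus a pointer (claim) into the content store."""
    attributes: dict
    content_claim: str
    flowfile_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyRepositories:
    """Hypothetical in-memory stand-ins for the three repositories."""

    def __init__(self):
        self.content = {}     # content repo: claim id -> bytes
        self.flowfiles = {}   # FlowFile repo: id -> current FlowFile state
        self.provenance = []  # provenance repo: append-only event history

    def create(self, data: bytes, attributes: dict) -> FlowFile:
        claim = f"claim-{len(self.content)}"
        self.content[claim] = data
        ff = FlowFile(dict(attributes), claim)
        self.flowfiles[ff.flowfile_id] = ff
        self.provenance.append(("CREATE", ff.flowfile_id))
        return ff

    def process(self, ff: FlowFile, transform) -> FlowFile:
        """A 'processor' step: store transformed content under a new claim."""
        new_claim = f"claim-{len(self.content)}"
        self.content[new_claim] = transform(self.content[ff.content_claim])
        ff.content_claim = new_claim  # the old claim is left untouched
        self.provenance.append(("CONTENT_MODIFIED", ff.flowfile_id))
        return ff


repos = ToyRepositories()
ff = repos.create(b"hello", {"filename": "a.txt"})
repos.process(ff, bytes.upper)
```

Because the original content is kept under its old claim and every step appends an event, the history of a FlowFile can be stepped through after the fact, which is the idea behind the provenance features described later.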
NiFi Performance
● NiFi server RAM is limited by the JVM memory settings
● Garbage collection rate is important
● The nifi.properties file holds performance configuration, e.g.
  – nifi.ui.autorefresh.interval (browser performance)
  – nifi.queue.swap.threshold (use of swap)
  – nifi.provenance.repository.index.threads
    ● Change for high volume flows
  – nifi.provenance.repository.implementation
    ● WriteAheadProvenanceRepository might cause Java garbage collection issues
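As a rough illustration of checking these settings, the sketch below parses a fragment in nifi.properties format. The `SAMPLE` values are illustrative only and should not be read as recommendations, and `parse_properties` is a hypothetical helper written for this example, not part of NiFi.

```python
# Illustrative fragment only; the values are assumptions, check your own file.
SAMPLE = """\
# performance-related settings
nifi.ui.autorefresh.interval=30 sec
nifi.queue.swap.threshold=20000
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE)
swap_threshold = int(props["nifi.queue.swap.threshold"])
index_threads = int(props["nifi.provenance.repository.index.threads"])
provenance_impl = props["nifi.provenance.repository.implementation"]
```

To inspect a real installation, the same helper could be given the contents of the nifi.properties file from that installation's conf directory.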
NiFi Flow Management
● Guaranteed data delivery
● Uses write ahead logs and content repositories
● Queue buffering / back pressure
● Queue priority configuration
● Flow configuration (latency / throughput)
● UI based data flow builds
● UI based data flow monitoring
● UI based data provenance
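Back pressure and queue priority can be illustrated with a toy queue. This is a simplified model under stated assumptions, not NiFi's code: `ToyConnection` and its methods are hypothetical, and where NiFi stops scheduling the upstream component once a threshold is reached, this sketch simply refuses the item.

```python
import heapq
import itertools


class ToyConnection:
    """Hypothetical bounded queue with a back pressure threshold and priorities."""

    def __init__(self, back_pressure_threshold: int):
        self.threshold = back_pressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.threshold

    def offer(self, item, priority: int = 0) -> bool:
        """Return False when back pressure applies and upstream should pause."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the item with the lowest priority number, or None."""
        return heapq.heappop(self._heap)[2] if self._heap else None


conn = ToyConnection(back_pressure_threshold=2)
accepted = [
    conn.offer("bulk", priority=5),
    conn.offer("urgent", priority=1),
    conn.offer("late", priority=1),  # refused: the queue is at its threshold
]
first = conn.poll()                  # the urgent item leaves before the bulk one
retry = conn.offer("late", priority=1)
```

Once the downstream side drains an item, the queue drops below its threshold and the upstream side can deliver again, which is why the retry succeeds.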
NiFi Ease Of Use 1
● Visually create data flows in real time
● Changes take immediate effect
● Use flow templates for existing flow types
● Data provenance for
  – Problem tracking
  – Data compliance issues
  – Stepping through historic data transforms
● Fine grained data investigation using the UI and repositories
NiFi Security
● Data flow based encryption / decryption
● 2 way SSL
● User access control
● Pluggable / extendable authorization possible
● Data flow level authorization
  – Flow level component access
  – Supports multi tenant access / sharing
  – Even multi tenant support within a flow
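The idea behind 2 way SSL is that the server also demands and verifies a certificate from the client. The sketch below shows that idea using Python's standard `ssl` module. It is a generic illustration, not NiFi configuration; the function name and the file paths in the comments are hypothetical.

```python
import ssl


def make_mutual_tls_server_context() -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    # Purpose.CLIENT_AUTH yields a context meant for server-side sockets.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Ordinary TLS only authenticates the server; this line makes it 2 way.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would also load its own identity and trusted CAs,
    # for example (hypothetical paths):
    #   ctx.load_cert_chain("server-cert.pem", "server-key.pem")
    #   ctx.load_verify_locations("trusted-ca.pem")
    return ctx


ctx = make_mutual_tls_server_context()
```

With `CERT_REQUIRED` set, a handshake fails unless the client presents a certificate signed by a trusted authority.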
NiFi Extensible / Scalable
● Many NiFi points of extension
  – Processors, Controller Services, Reporting Tasks
  – Prioritizers, Custom User Interfaces
● NiFi Site-to-Site (S2S) interface for distributed communication
● Extension conflicts avoided using NiFi Archives (NARs)
● Scale out NiFi cluster instances
● Scale NiFi concurrent tasks up and down
NiFi Further Information
● For further information see
  – https://nifi.apache.org
  – https://en.wikipedia.org/wiki/Apache_NiFi
  – http://vision.cloudera.com/cloudera-dataflow/
● I included the Cloudera link because CDF now uses NiFi for edge data and flow management.
Available Books
● See "Big Data Made Easy", Apress, Jan 2015
● See "Mastering Apache Spark", Packt, Oct 2015
● See "Complete Guide to Open Source Big Data Stack", Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – nz.linkedin.com/pub/mike-frampton/20/630/385
Contact Us
● Feel free to contact us at
  – info@semtech-solutions.co.nz
● Or connect on LinkedIn
● I'm always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration