Apache Fluo

What Is Apache Fluo?
● For incremental updates of large-scale data sets
● Open source, Apache 2.0 license
● Based upon Apache Accumulo
  – Uses Hadoop HDFS to store data
  – Uses ZooKeeper for configuration
  – Partitions tables into tablets
● It is a distributed system
● Supports cross-node transactions
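
Range partitioning into tablets can be pictured with a toy Python model. This is not Accumulo code. A table's sorted rows are cut at split points, and each tablet holds the rows up to and including its split point. The function name `tablet_for_row` and the split values are illustrative assumptions.

```python
from bisect import bisect_left


def tablet_for_row(row, split_points):
    """Return the index of the tablet that holds `row`.

    Toy model of range partitioning: `split_points` must be sorted, and each
    tablet covers (previous split, split], so a split point is the inclusive
    end row of its tablet. The last tablet holds everything after the final split.
    """
    return bisect_left(split_points, row)


splits = ["g", "m"]  # three tablets: (-inf, g], (g, m], (m, +inf)
for r in ["apple", "g", "house", "m", "zebra"]:
    print(r, "-> tablet", tablet_for_row(r, splits))
# apple -> 0, g -> 0, house -> 1, m -> 1, zebra -> 2
```

Because each tablet is a contiguous row range, different tablets can be served by different nodes, which is what makes the table distributed.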

What Is Apache Fluo?
● Allows monitoring of large data sets to
  – Identify small changes
  – Join those changes into the larger data set
  – Without processing all of the data
● Transactions allow many concurrent changes
  – Without data corruption
● Fluo uses code-based observers, which
  – Act on table column changes
● Offers a Java-based Fluo API
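
The benefit of processing only changes can be sketched in plain Python. This is a hypothetical word-count example, not the Fluo API. `full_recompute` rescans every document. `incremental_update` applies only the difference for the one document that changed, which is the kind of work an observer would do.

```python
from collections import Counter


def full_recompute(docs):
    """Batch approach: rescan every document to rebuild the word counts."""
    counts = Counter()
    for text in docs.values():
        counts.update(text.split())
    return counts


def incremental_update(counts, old_text, new_text):
    """Incremental approach: apply only the change for one document."""
    counts.subtract(old_text.split())
    counts.update(new_text.split())
    # Drop words whose count fell to zero so the result matches a full rebuild.
    for word in [w for w, c in counts.items() if c <= 0]:
        del counts[word]


docs = {"d1": "fluo is incremental", "d2": "accumulo stores data"}
counts = full_recompute(docs)

old_text = docs["d1"]
docs["d1"] = "fluo is fast"
incremental_update(counts, old_text, docs["d1"])
print(counts == full_recompute(docs))  # True
```

The incremental path touches only the words of `d1`. Its cost depends on the size of the change, not on the size of the whole data set.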

What Is Apache Fluo?
● Use of Fluo is code based and low level
● Fluo uses Hadoop YARN to run its processes
● Fluo uses ZooKeeper to
  – Store its metadata
  – Store its state information
● Fluo data is stored in Fluo tables on Accumulo (HDFS)
  – Same structure as Accumulo, except
  – Keys have no timestamps
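
The following toy Python comparison contrasts the two key layouts. It assumes a simplified Fluo column made of family, qualifier and visibility. In Accumulo, two writes that differ only in timestamp are separate versions. In the Fluo model, the row+column alone addresses a cell, so a second write replaces the first. The class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccumuloKey:
    """Accumulo-style key: every write carries a timestamp, so versions coexist."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


@dataclass(frozen=True)
class FluoKey:
    """Simplified Fluo cell address: row + column, no user-visible timestamp."""
    row: str
    family: str
    qualifier: str
    visibility: str = ""


accumulo = {}
accumulo[AccumuloKey("doc1", "stats", "count", "", 1)] = "5"
accumulo[AccumuloKey("doc1", "stats", "count", "", 2)] = "6"  # a second version

fluo = {}
fluo[FluoKey("doc1", "stats", "count")] = "5"
fluo[FluoKey("doc1", "stats", "count")] = "6"  # replaces the first value

print(len(accumulo), len(fluo))  # 2 1
```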

Fluo Architecture

Fluo Architecture
● Large-scale computation through small-scale transactions
● Clients access Fluo through its Java API
● Clients ingest data through the API
● The application's Oracle process allocates transaction timestamps
● The application's worker processes run user code
● User code (observers) monitors column changes
● Multiple workers can run the same observers
● Transactions change data; snapshots read data
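
The role of the Oracle's timestamps can be sketched with a hypothetical single-process `ToyOracle` and a multi-version store. Each commit is stamped with a timestamp from the oracle. A snapshot read at start timestamp `t` sees only values committed before `t`. The real Oracle is a separate Fluo process, and nothing here is Fluo's API.

```python
import threading


class ToyOracle:
    """Hypothetical stand-in for the Oracle: unique, increasing timestamps."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def next_timestamp(self):
        with self._lock:  # serialize allocation so no two callers get the same value
            ts = self._next
            self._next += 1
            return ts


class VersionedStore:
    """Keeps every committed version of a key as (commit_ts, value)."""

    def __init__(self):
        self._versions = {}

    def write(self, key, value, commit_ts):
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, start_ts):
        """Snapshot read: newest value committed before start_ts, or None."""
        visible = [(ts, v) for ts, v in self._versions.get(key, []) if ts < start_ts]
        return max(visible)[1] if visible else None


oracle = ToyOracle()
store = VersionedStore()

store.write("balance", "100", commit_ts=oracle.next_timestamp())  # a writer commits
reader_start = oracle.next_timestamp()                            # a reader begins
store.write("balance", "150", commit_ts=oracle.next_timestamp())  # a later commit

old_view = store.read("balance", reader_start)             # "100": later commit invisible
new_view = store.read("balance", oracle.next_timestamp())  # "150": fresh snapshot sees it

# Many concurrent clients still never receive the same timestamp.
issued = []


def grab(count):
    for _ in range(count):
        issued.append(oracle.next_timestamp())


threads = [threading.Thread(target=grab, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(old_view, new_view, len(set(issued)))  # 100 150 400
```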

Fluo Architecture
● Fluo provides snapshot isolation
● A snapshot only sees transactions committed before it started
● Concurrent transactions can overlap and collide
● Write skew is possible when
  – Concurrent transactions read overlapping data but update different keys
● Fluo supports scanners to read data ranges or spans
● Fluo has a transaction-based LoaderExecutor
  – To aid the loading of data
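
Write skew can be reproduced with a toy snapshot-isolation store in Python. The model rests on two assumptions. First, a logical clock stands in for the Oracle. Second, a commit aborts only if another transaction committed a write to one of its keys after it started (a write-write check). The classic invariant "at least one doctor is on call" breaks because each transaction reads both keys but writes a different one. The names are hypothetical.

```python
class Store:
    """Toy multi-version store; a logical clock stands in for the Oracle."""

    def __init__(self, initial):
        self.versions = {k: [(0, v)] for k, v in initial.items()}  # key -> [(commit_ts, value)]
        self.clock = 0

    def tick(self):
        self.clock += 1
        return self.clock

    def begin(self):
        return Txn(self, self.tick())


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        visible = [(ts, v) for ts, v in self.store.versions[key] if ts < self.start_ts]
        return max(visible)[1]

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Abort only if a key we write was committed by someone after we started."""
        for key in self.writes:
            if any(ts > self.start_ts for ts, _ in self.store.versions[key]):
                return False
        commit_ts = self.store.tick()
        for key, value in self.writes.items():
            self.store.versions[key].append((commit_ts, value))
        return True


def go_off_call(txn, me, other):
    # The invariant check reads `other`, but the transaction only writes `me`.
    if txn.get(other):
        txn.set(me, False)


store = Store({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()  # same snapshot: both doctors on call
go_off_call(t1, "alice", "bob")
go_off_call(t2, "bob", "alice")
ok1 = t1.commit()  # True
ok2 = t2.commit()  # True: it wrote a different key, so no conflict is detected
after = store.begin()
print(after.get("alice"), after.get("bob"))  # False False: invariant broken

t3, t4 = store.begin(), store.begin()
t3.set("alice", True)
t4.set("alice", True)
ok3, ok4 = t3.commit(), t4.commit()  # True False: same key, so the later commit is refused
```

The second pair shows the write-write case. Both transactions write `alice`, so the later commit is refused. Under snapshot isolation in general, a common way to prevent the skew is to make both transactions also write a shared key they read, which turns it into a write-write conflict.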

Fluo Architecture
● Fluo supports incremental processing via notifications and observers
● Notifications
  – Persistent markers, set by a transaction, indicating that
  – An observer should run later for a certain row+column
● Observers
  – User-provided code that is registered to
  – Process notifications for a certain column
● An observer receives the row/column that triggered it, plus a transaction
● Fluo worker processes running across a cluster execute the observers
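
The notification/observer flow can be mimicked in a few lines of Python. This is a hypothetical single-process toy, not Fluo's Observer interface. A transaction that writes a column can also leave a notification marker. A worker loop then hands each marker's row and column, plus a fresh transaction, to the observer registered for that column.

```python
from collections import deque


class ToyTx:
    """Hypothetical transaction handle: buffers writes and notification markers."""

    def __init__(self, table):
        self.table = table
        self.writes = {}  # (row, col) -> value, applied on commit
        self.notify = []  # (row, col) markers to record on commit

    def get(self, row, col, default=None):
        return self.writes.get((row, col), self.table.get((row, col), default))

    def set(self, row, col, value, trigger=False):
        self.writes[(row, col)] = value
        if trigger:
            self.notify.append((row, col))


class ToyWorker:
    """Hypothetical worker: commits transactions and dispatches notifications."""

    def __init__(self):
        self.table = {}         # (row, col) -> value
        self.pending = deque()  # notification markers (persistent in real Fluo)
        self.observers = {}     # column -> observer(tx, row, col)

    def register(self, col, observer):
        self.observers[col] = observer

    def commit(self, tx):
        self.table.update(tx.writes)
        self.pending.extend(tx.notify)

    def run_pending(self):
        while self.pending:
            row, col = self.pending.popleft()
            tx = ToyTx(self.table)  # each observer run gets its own transaction
            self.observers[col](tx, row, col)
            self.commit(tx)


def count_words(tx, row, col):
    """Observer for the 'text' column: derive a word count in the same row."""
    tx.set(row, "wordcount", len(tx.get(row, col, "").split()))


worker = ToyWorker()
worker.register("text", count_words)

loader = ToyTx(worker.table)
loader.set("doc1", "text", "fluo runs observers", trigger=True)
worker.commit(loader)
worker.run_pending()
print(worker.table[("doc1", "wordcount")])  # 3
```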

Fluo Architecture
● Fluo supports two types of notification
● Strong notification
  – Guarantees an observer will run at most once
  – When a row+column is modified
  – Even if several transactions modify it before the observer runs
● Weak notification
  – Causes an observer to run at least once
  – An observer may run multiple times and/or concurrently
  – Based on a single weak notification
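
The difference between the two notification types can be illustrated with a toy model in Python. The model assumes three workers that all pick up the same marker before any of them finishes. In the strong case, the acknowledgement is treated as part of the observer's transaction, so stale workers collide and only one run commits. In the weak case, every worker's run counts. The model and its names are assumptions, not Fluo's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class NotifiedCell:
    """Toy notification state for one row+column (not Fluo's storage format)."""
    pending: bool = False  # at most one marker per row+column
    acks: int = 0          # bumped each time a strong notification is acknowledged


def modify(cell):
    # Any number of modifications before a worker runs leave ONE pending marker.
    cell.pending = True


def concurrent_scan(cell, workers, strong):
    """All `workers` see the marker before any finishes; return committed runs."""
    if not cell.pending:
        return 0
    acks_seen = cell.acks  # every worker's snapshot is taken at the same point
    committed = 0
    for _ in range(workers):
        if strong:
            # The acknowledgement is written in the observer's own transaction,
            # so a worker with a stale snapshot collides and its run is rolled back.
            if cell.acks != acks_seen:
                continue
            cell.acks += 1
        committed += 1
    cell.pending = False
    return committed


strong_cell, weak_cell = NotifiedCell(), NotifiedCell()
for _ in range(5):
    modify(strong_cell)
    modify(weak_cell)

strong_runs = concurrent_scan(strong_cell, workers=3, strong=True)  # 1
weak_runs = concurrent_scan(weak_cell, workers=3, strong=False)     # 3
print(strong_runs, weak_runs)  # 1 3
```

An observer triggered by a weak notification can run more than once, so its work should tolerate duplicate runs.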

Fluo Row Locking

Fluo Row Locking
● For cross-node transactions, Fluo uses
  – Accumulo conditional mutations
● Conditional mutations lock entire rows
  – On the server side, while checking conditions
● Row locks can impact transaction performance
● This may be a problem if
  – Many transactions update separate columns in the same row
  – Those transactions are very likely to run concurrently
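
Row-level locking can be sketched with a hypothetical `ToyTabletServer` in Python. The server applies a conditional mutation (a compare-and-set on one column) while holding a lock on the whole row. Four concurrent puts to different columns of one row serialize, while four puts to different rows overlap. The `check_cost` sleep is an assumed stand-in for server-side condition checking. Real Accumulo behaviour and timings will differ.

```python
import threading
import time


class ToyTabletServer:
    """Applies conditional mutations while holding a lock on the whole row."""

    def __init__(self):
        self.rows = {}        # row -> {column: value}
        self._row_locks = {}  # row -> threading.Lock
        self._guard = threading.Lock()

    def _lock_for(self, row):
        with self._guard:  # create per-row state safely on first use
            self.rows.setdefault(row, {})
            return self._row_locks.setdefault(row, threading.Lock())

    def conditional_put(self, row, col, expected, value, check_cost=0.0):
        """Set row/col to `value` only if its current value equals `expected`."""
        with self._lock_for(row):  # one lock per ROW, even for different columns
            time.sleep(check_cost)  # stands in for server-side condition checking
            cells = self.rows[row]
            if cells.get(col) != expected:
                return False
            cells[col] = value
            return True


def timed_puts(server, targets, check_cost):
    """Run one conditional put per (row, col) target concurrently; return seconds."""
    threads = [
        threading.Thread(target=server.conditional_put, args=(row, col, None, "v", check_cost))
        for row, col in targets
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


server = ToyTabletServer()
same_row = timed_puts(server, [("r1", f"c{i}") for i in range(4)], 0.05)
diff_rows = timed_puts(server, [(f"r{i}", "c") for i in range(2, 6)], 0.05)
print(f"same row: {same_row:.2f}s, different rows: {diff_rows:.2f}s")

first = server.conditional_put("r9", "c", None, "a")   # succeeds: cell was empty
second = server.conditional_put("r9", "c", None, "b")  # fails: condition no longer holds
```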

Available Books
● See “Big Data Made Easy”
  – Apress, Jan 2015
● See “Mastering Apache Spark”
  – Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration
