This presentation gives an overview of the Apache Fluo project. It explains Apache Fluo in terms of its architecture, functionality and transactions.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

What Is Apache Fluo?
● For incremental updates to large-scale data sets
● Open source, Apache 2.0 license
● Based upon Apache Accumulo
  – Uses Hadoop HDFS to store data
  – Uses ZooKeeper for configuration
  – Partitions tables into tablets
● It is a distributed system
● Supports cross-node transactions

What Is Apache Fluo?
● Allows monitoring of large data sets to
  – Identify small changes
  – Join changes into the larger data set
  – Without reprocessing all of the data
● Transactions allow many concurrent changes
  – Without data corruption
● Fluo uses code-based observers which
  – Act on table column changes
● Offers a Java-based API

What Is Apache Fluo?
● Use of Fluo is code based and low level
● Fluo uses Hadoop YARN to run its processes
● Fluo uses ZooKeeper to
  – Store its metadata
  – Store its state information
● Fluo data is stored in Fluo tables on Accumulo (HDFS)
  – Same structure as Accumulo, except
  – Rows have no timestamps

Fluo Architecture
● Large-scale computation through small-scale transactions
● Clients access Fluo through the Java API
● Clients ingest data through the API
● Application oracle processes assign transaction timestamps
● Application worker processes run user code
● User code (observers) monitors column changes
● Multiple workers can run the same observers
● Transactions change data, snapshots read data

Fluo Architecture
● Fluo provides snapshot isolation
● A snapshot sees only transactions committed before it started
● Transaction overlap / collision is possible
● In this case a write skew is possible if
  – Different keys are concurrently updated
● Fluo supports scanners to read data ranges or spans
● Fluo has a transaction-based LoaderExecutor
  – To aid the loading of data
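
To make snapshot isolation and write skew concrete, here is a minimal Python sketch of a toy multi-version store. It is an illustration only and not the Fluo API: `Store`, `Transaction` and `write_skew_demo` are hypothetical names, and the integer clock merely stands in for the oracle's timestamps. Each transaction reads from the snapshot fixed at its start, and a commit fails only when another transaction has already committed a write to the same key.

```python
class Store:
    """Toy multi-version store; an illustration, not the Fluo API."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # stands in for the oracle's timestamp counter

    def next_ts(self):
        self.clock += 1
        return self.clock

    def read(self, key, ts):
        """Return the newest value committed at or before ts, else None."""
        value = None
        for commit_ts, committed in self.versions.get(key, []):
            if commit_ts <= ts:
                value = committed
        return value

    def last_commit_ts(self, key):
        history = self.versions.get(key, [])
        return history[-1][0] if history else 0


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()  # the snapshot is fixed at start
        self.writes = {}                 # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Fail on a write-write collision, otherwise publish all writes."""
        for key in self.writes:
            if self.store.last_commit_ts(key) > self.start_ts:
                return False
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return True


def write_skew_demo():
    store = Store()
    setup = Transaction(store)
    setup.set("alice", 50)
    setup.set("bob", 50)
    setup.commit()

    # Both transactions check the same rule (combined balance >= 100)
    # against their own snapshot, then update different keys.
    t1, t2 = Transaction(store), Transaction(store)
    if t1.get("alice") + t1.get("bob") >= 100:
        t1.set("alice", t1.get("alice") - 100)
    if t2.get("alice") + t2.get("bob") >= 100:
        t2.set("bob", t2.get("bob") - 100)
    committed = (t1.commit(), t2.commit())

    after = Transaction(store)
    return committed, after.get("alice") + after.get("bob")
```

In this sketch both overlapping transactions commit because they wrote different keys, yet the combined balance ends up negative: that is the write skew. Two overlapping transactions that write the same key collide instead, and the second commit returns `False`.
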

Fluo Architecture
● Fluo supports incremental processing via
● Notifications
  – Persistent markers set by a transaction that indicate
  – An observer should run later for a certain row+column
● Observers
  – User-provided code that is registered to
  – Process notifications for a certain column
  – An observer receives the row/column that triggered it, plus a transaction
● Fluo worker processes running across a cluster
  – Will execute observers
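
The notification and observer flow can be sketched with a small in-memory model. This is a Python illustration under stated assumptions, not the Fluo Java API: `ToyFluo`, `run_worker`, `count_words` and the column names `doc:content`, `stats:words` and `stats:total` are all hypothetical. A write to an observed column leaves a marker, and a worker later runs the registered observer for each marker.

```python
from collections import deque


class ToyFluo:
    """In-memory stand-in for a table plus its notification markers."""

    def __init__(self):
        self.table = {}               # (row, column) -> value
        self.observers = {}           # observed column -> callback
        self.notifications = deque()  # pending (row, column) markers

    def register(self, column, observer):
        self.observers[column] = observer

    def get(self, row, column, default=None):
        return self.table.get((row, column), default)

    def set(self, row, column, value):
        self.table[(row, column)] = value
        if column in self.observers:
            # Leave a marker: an observer should run later for this cell.
            self.notifications.append((row, column))

    def run_worker(self):
        """Run the observer for every pending marker; return the run count."""
        runs = 0
        while self.notifications:
            row, column = self.notifications.popleft()
            self.observers[column](self, row, column)
            runs += 1
        return runs


def count_words(tx, row, column):
    """Observer: keep a global word total current by applying only the delta."""
    new_count = len(tx.get(row, column).split())
    old_count = tx.get(row, "stats:words", 0)
    tx.set(row, "stats:words", new_count)
    total = tx.get("all", "stats:total", 0)
    tx.set("all", "stats:total", total + new_count - old_count)


fluo = ToyFluo()
fluo.register("doc:content", count_words)
fluo.set("doc1", "doc:content", "a b c")
fluo.set("doc2", "doc:content", "x y")
fluo.run_worker()
fluo.set("doc1", "doc:content", "a b")  # a small change to one document
fluo.run_worker()
```

The observer touches only the row that changed and adjusts the total by the difference, which is the incremental idea: the larger data set stays current without being rescanned.
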

Fluo Architecture
● Fluo supports two types of notification
● Strong notification
  – Guarantees an observer will run at most once
  – When a column is modified
  – Even if the row+column is updated several times before it runs
● Weak notification
  – Causes an observer to run at least once
  – Observers may run multiple times and/or concurrently
  – Based on a single weak notification
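
The practical difference between the two notification types can be sketched as follows. This is a toy Python model, not Fluo code: `run_strong`, `run_weak` and the `deliveries` parameter are hypothetical, with `deliveries` simulating how often a single weak notification happens to trigger the observer. The point it illustrates is that an observer behind a weak notification should be written so that repeated runs are harmless.

```python
def run_strong(updates, observer):
    """Coalesce repeated updates: one observer run per distinct (row, column)."""
    pending = set(updates)
    for row, column in sorted(pending):
        observer(row, column)
    return len(pending)


def run_weak(notification, observer, deliveries):
    """Run the observer `deliveries` times for one weak notification."""
    if deliveries < 1:
        raise ValueError("a weak notification triggers at least one run")
    for _ in range(deliveries):
        observer(*notification)
    return deliveries


# Strong: three updates to the same cell lead to a single observer run.
strong_calls = []
strong_runs = run_strong(
    [("r1", "c"), ("r1", "c"), ("r1", "c"), ("r2", "c")],
    lambda row, column: strong_calls.append((row, column)),
)

# Weak: the observer drains queued work, so extra runs find nothing to do.
queued = {"r1": [1, 2, 3]}
totals = {"r1": 0}


def fold_queue(row, column):
    totals[row] += sum(queued[row])
    queued[row].clear()


run_weak(("r1", "queue"), fold_queue, deliveries=3)
```

After the weak example `totals["r1"]` is 6 regardless of the `deliveries` value, because the second and third runs see an empty queue.
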

Fluo Row Locking
● For cross-node transactions Fluo uses
  – Accumulo conditional mutations
● Conditional mutations lock entire rows
  – On the server side, while conditions are checked
● Row locks can impact transaction performance
● May be a problem if
  – Many transactions update separate columns in the same row
  – Those transactions are very likely to run concurrently
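
The row-lock effect can be shown with a toy model of a conditional write. This Python sketch does not use Accumulo's ConditionalWriter: `ToyTablet`, `conditional_put` and the `"accepted"` / `"rejected"` / `"busy"` results are hypothetical names chosen for the illustration. The condition is checked on a single column, but the lock that guards the check covers the whole row.

```python
import threading
from collections import defaultdict


class ToyTablet:
    """Toy server-side store with one lock per row (illustration only)."""

    def __init__(self):
        self.cells = {}                               # (row, column) -> value
        self.row_locks = defaultdict(threading.Lock)  # row -> lock

    def conditional_put(self, row, column, expected, value, blocking=True):
        """Write value only if the cell currently equals expected.

        Returns "accepted", "rejected" (condition failed) or "busy"
        (non-blocking call while the row lock is held by someone else).
        """
        lock = self.row_locks[row]
        if not lock.acquire(blocking=blocking):
            return "busy"
        try:
            if self.cells.get((row, column)) != expected:
                return "rejected"
            self.cells[(row, column)] = value
            return "accepted"
        finally:
            lock.release()


tablet = ToyTablet()
first = tablet.conditional_put("r1", "a", None, 1)   # cell was empty
second = tablet.conditional_put("r1", "a", None, 2)  # condition now fails

# Simulate a check in progress on row r1 by holding its lock.
with tablet.row_locks["r1"]:
    same_row = tablet.conditional_put("r1", "b", None, 1, blocking=False)
    other_row = tablet.conditional_put("r2", "b", None, 1, blocking=False)
```

Here `same_row` is `"busy"` even though column `b` is unrelated to column `a`, while `other_row` is `"accepted"`. That is why many concurrent transactions on separate columns of one row can still slow each other down.
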

Available Books
● See “Big Data Made Easy”
  – Apress Jan 2015
● See “Mastering Apache Spark”
  – Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration