A short introduction to Apache Gora: what is it and how does it work? How can it provide data store abstraction and persistence for big data?
Apache Gora
• What is it?
• Gora – Nutch
• Supports
• Data Access
• APIs
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Apache Gora – What is it?
• Provides for Big Data
  • In-memory data model
  • Persistence
  • Data store abstraction
• Supports persisting to
  • Column stores
  • Key/value stores
  • Document stores
  • RDBMSs
• Supports use of Hadoop
Apache Gora – What is it?
• Released under the Apache License 2.0
• Written in Java
• Offers a persistence framework
• Designed for big data applications
• Used by Nutch 2.x for web crawl data storage
• Used for
  • Persistence
  • Indexing
  • Analytics
Apache Gora – Nutch
• Nutch 2.x now uses Gora
• Abstracted storage
• Data store independence
• Handles object-to-persistent mappings
• Can use various NoSQL solutions
Apache Gora – Supports
• Gora supports the following
  • Apache Accumulo
  • Apache Cassandra
  • Apache HBase
  • Amazon DynamoDB
  • Pig
  • Hive
  • Cascading
  • MapReduce
Apache Gora – Data Access
• Java API for data access
• Independent of location
• Core Gora APIs
  • Store
  • Persistency
  • Query
  • MapReduce
Apache Gora – Store API
• Java API – org.apache.gora.store.*
• DataStore handles object persistence
• DataStore methods process objects
  • Persist
  • Fetch
  • Query
  • Delete
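The four operations listed on the Store API slide (persist, fetch, query, delete) can be illustrated without Gora itself. The sketch below is a minimal stand-in that uses only the Java standard library. The names KeyedStore, InMemoryStore and StoreDemo are hypothetical and are not Gora classes, and the inclusive key-range query is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Small demo: callers only see the KeyedStore interface, not the backend.
class StoreDemo {
    public static void main(String[] args) {
        KeyedStore<String, String> store = new InMemoryStore<>();
        store.put("org.example/a", "page A");
        store.put("org.example/b", "page B");
        System.out.println(store.get("org.example/a"));
        System.out.println(store.query("org.example/a", "org.example/b"));
        store.delete("org.example/a");
        System.out.println(store.get("org.example/a"));
    }
}

// Hypothetical data store abstraction; NOT Gora's DataStore interface.
interface KeyedStore<K, T> {
    void put(K key, T value);             // persist
    T get(K key);                         // fetch, null if the key is absent
    boolean delete(K key);                // delete, true if a row was removed
    List<T> query(K startKey, K endKey);  // query an inclusive key range
}

// One possible backend: a sorted in-memory map (an assumption for this sketch).
class InMemoryStore<K extends Comparable<K>, T> implements KeyedStore<K, T> {
    private final TreeMap<K, T> rows = new TreeMap<>();

    public void put(K key, T value) {
        rows.put(key, value);
    }

    public T get(K key) {
        return rows.get(key);
    }

    public boolean delete(K key) {
        return rows.remove(key) != null;
    }

    public List<T> query(K startKey, K endKey) {
        // subMap with both bounds inclusive returns the rows in key order
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }
}
```

Because the calling code depends only on the interface, a different backend could be swapped in without changing it, which is the data store independence the slides describe.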
Apache Gora – Persistency API
• Java API – org.apache.gora.persistency.*
• Core classes
  • BeanFactory
    • Construct keys
  • Persistent
    • Persist objects
  • State
    • State managed through StateManager
    • NEW, CLEAN (UNMODIFIED)
    • DIRTY (MODIFIED), DELETED
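The NEW, CLEAN, DIRTY and DELETED states on the Persistency API slide suggest a simple life cycle for a persistent object. The sketch below shows one way such state tracking could work. It uses only the Java standard library: RecordState, TrackedPage and StateDemo are hypothetical names, not Gora classes, and the transition rules are assumptions made for illustration, since the slide names the states but not the transitions.

```java
// Small demo of the assumed life cycle.
class StateDemo {
    public static void main(String[] args) {
        TrackedPage page = new TrackedPage("http://example.org/");
        System.out.println(page.state());   // a freshly created object
        page.markSaved();
        System.out.println(page.state());   // after a successful write
        page.setTitle("Example");
        System.out.println(page.state());   // after a field change
        page.markDeleted();
        System.out.println(page.state());   // after deletion
    }
}

// The four states named on the slide.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

// Hypothetical persistent object that tracks its own state.
class TrackedPage {
    private final String url;
    private String title;
    private RecordState state = RecordState.NEW;

    TrackedPage(String url) {
        this.url = url;
    }

    String url() {
        return url;
    }

    String title() {
        return title;
    }

    RecordState state() {
        return state;
    }

    // Assumed rule: modifying a CLEAN object makes it DIRTY; a NEW object stays NEW.
    void setTitle(String title) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("cannot modify a deleted record");
        }
        this.title = title;
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    // Assumed rule: only NEW and DIRTY objects need to be written to the store.
    boolean needsWrite() {
        return state == RecordState.NEW || state == RecordState.DIRTY;
    }

    // Called by a store after a successful write; a deleted record stays DELETED.
    void markSaved() {
        if (state != RecordState.DELETED) {
            state = RecordState.CLEAN;
        }
    }

    void markDeleted() {
        state = RecordState.DELETED;
    }
}
```

Tracking state this way lets a store skip unmodified objects and write only those that are new or changed.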
Apache Gora – Query API
• Java API – org.apache.gora.query.*
• Core classes
  • Query
    • Constructed via DataStore
  • PartitionQuery
    • Divide results of Query into partitions
    • Run queries on data nodes
    • Generate Hadoop InputSplits
  • Result
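The idea behind PartitionQuery, dividing one query into partitions that can be processed separately, can be illustrated with a plain key range. The sketch below uses only the Java standard library. KeyRange and PartitionDemo are hypothetical names, not Gora or Hadoop classes, and the numeric half-open key range with an even split is an assumption chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo: split one range into three contiguous partitions.
class PartitionDemo {
    public static void main(String[] args) {
        for (KeyRange part : KeyRange.split(new KeyRange(0, 10), 3)) {
            System.out.println(part);
        }
    }
}

// Hypothetical half-open key range [start, end).
final class KeyRange {
    final long start; // inclusive
    final long end;   // exclusive

    KeyRange(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not be less than start");
        }
        this.start = start;
        this.end = end;
    }

    long size() {
        return end - start;
    }

    // Divide a range into at most `parts` contiguous, non-overlapping sub-ranges.
    static List<KeyRange> split(KeyRange range, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be at least 1");
        }
        List<KeyRange> out = new ArrayList<>();
        long base = range.size() / parts;   // minimum length of each partition
        long extra = range.size() % parts;  // the first `extra` partitions get one more key
        long cursor = range.start;
        for (int i = 0; i < parts; i++) {
            long len = base + (i < extra ? 1 : 0);
            if (len == 0) {
                continue; // skip empty partitions when parts exceeds the range size
            }
            out.add(new KeyRange(cursor, cursor + len));
            cursor += len;
        }
        return out;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}
```

Each sub-range could then be handed to a separate worker, in the same spirit as the slide's partitions being turned into Hadoop InputSplits.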
Apache Gora – MapReduce API
• Java API – org.apache.gora.mapreduce.*
• GoraMapper
• GoraReducer
• ALL Record Counter
• Reader
• Writer
• Hadoop / Avro
  • Serialise
  • De-serialise
  • Persistent
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You can pay for just the hours you need to solve your problems