An introduction to Apache Gora

A short introduction to Apache Gora: what is it, how does it work, and how does it provide data store abstraction and persistence for big data?

semtechs


Presentation Transcript


  1. Apache Gora • What is it? • Gora – Nutch • Supports • Data Access • APIs www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Apache Gora – What is it? • Provides for Big Data • In memory data model • Persistence • Data store abstraction • Supports persisting to • Column stores • Key/value stores • Document stores • RDBMSs • Supports use of Hadoop

  3. Apache Gora – What is it? • Released under the Apache License 2.0 • Written in Java • Offers a persistence framework • Designed for big data applications • Used by Nutch 2.x for web crawl data storage • Used for • Persistence • Indexing • Analytics

  4. Apache Gora – Nutch • Nutch 2.x now uses Gora • Abstracted storage • Data store independence • Handles object-to-persistence mappings • Can use various NoSQL solutions

  5. Apache Gora – Supports • Gora supports the following • Apache Accumulo • Apache Cassandra • Apache HBase • Amazon DynamoDB • Pig • Hive • Cascading • MapReduce

  6. Apache Gora – Data Access • Java API for data access • Independent of location • Core Gora APIs • Store • Persistency • Query • MapReduce

  7. Apache Gora – Store API • Java API – org.apache.gora.store.* • DataStore handles object persistence • DataStore methods process objects • Persist • Fetch • Query • Delete
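The four operations on this slide can be pictured with a small stand-in. The sketch below is not Gora's `DataStore`: the names `SketchStore` and `InMemorySketchStore` are hypothetical, and the backend is just a sorted in-memory map. It only illustrates the idea of data store abstraction, where calling code talks to one interface and the backend behind it can be swapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical interface, NOT Gora's DataStore: it only mirrors the four
// operations named on the slide (persist, fetch, query, delete).
interface SketchStore<K extends Comparable<K>, T> {
    void put(K key, T value);             // persist
    T get(K key);                         // fetch, null if the key is absent
    List<T> query(K startKey, K endKey);  // query an inclusive key range
    boolean delete(K key);                // delete, true if a row was removed
}

// In-memory backend. A different backend could implement the same
// interface without any change to the calling code.
class InMemorySketchStore<K extends Comparable<K>, T> implements SketchStore<K, T> {
    private final TreeMap<K, T> rows = new TreeMap<>();

    @Override public void put(K key, T value) { rows.put(key, value); }

    @Override public T get(K key) { return rows.get(key); }

    @Override public List<T> query(K startKey, K endKey) {
        // subMap with both bounds inclusive returns rows in key order
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }

    @Override public boolean delete(K key) { return rows.remove(key) != null; }
}

class StoreSketchDemo {
    public static void main(String[] args) {
        SketchStore<String, String> store = new InMemorySketchStore<>();
        store.put("page:a", "first crawled page");
        store.put("page:b", "second crawled page");
        System.out.println(store.get("page:a"));
        System.out.println(store.query("page:a", "page:b").size());
        System.out.println(store.delete("page:a"));
    }
}
```

The demo depends only on the interface type, which is the point of the abstraction: replacing `InMemorySketchStore` with another implementation would leave `main` unchanged.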

  8. Apache Gora – Persistency API • Java API – org.apache.gora.persistency.* • Core classes • BeanFactory • Construct keys • Persistent • Persist objects • State • State managed through StateManager • NEW, CLEAN (UNMODIFIED) • DIRTY (MODIFIED), DELETED
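The four state names suggest a simple lifecycle for a persistent object. The sketch below is one plausible reading of that lifecycle and makes no claim about how Gora's `StateManager` behaves: `SketchState`, `TrackedRecord`, and the transition rules are all assumptions made for illustration.

```java
// Hypothetical state names taken from the slide; the transitions below are
// an assumed reading of them, not Gora's StateManager logic.
enum SketchState { NEW, CLEAN, DIRTY, DELETED }

class TrackedRecord {
    private SketchState state = SketchState.NEW; // never written to a store yet
    private String content;

    TrackedRecord(String content) { this.content = content; }

    SketchState getState() { return state; }

    String getContent() { return content; }

    // Changing a field of an unmodified (CLEAN) record makes it DIRTY.
    // NEW stays NEW, because the whole record still has to be written.
    void setContent(String content) {
        this.content = content;
        if (state == SketchState.CLEAN) {
            state = SketchState.DIRTY;
        }
    }

    // Called after a successful write: memory and store now agree.
    void markSaved() {
        if (state != SketchState.DELETED) {
            state = SketchState.CLEAN;
        }
    }

    void markDeleted() { state = SketchState.DELETED; }

    // Only NEW and DIRTY records need to be sent to the store.
    boolean needsWrite() {
        return state == SketchState.NEW || state == SketchState.DIRTY;
    }
}

class StateSketchDemo {
    public static void main(String[] args) {
        TrackedRecord record = new TrackedRecord("v1");
        System.out.println(record.getState());
        record.markSaved();
        record.setContent("v2");
        System.out.println(record.getState());
    }
}
```

Tracking state this way lets a store skip writes for unmodified objects, which is one common motivation for a dirty flag in persistence frameworks.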

  9. Apache Gora – Query API • Java API – org.apache.gora.query.* • Core classes • Query • Constructed via DataStore • PartitionQuery • Divides the results of a Query into partitions • Runs queries on data nodes • Generates Hadoop InputSplits • Result
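The partitioning idea can be shown without Hadoop. The sketch below splits a sorted key list into contiguous slices that could each be processed independently, in the way an input split hands one slice of the data to one task. `KeyPartition` and `PartitionSketch` are hypothetical names, and the even-split rule is an assumption; a real store would more likely partition along its own region or node boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical value type: one contiguous slice of a query's key range.
final class KeyPartition {
    final String startKey; // inclusive
    final String endKey;   // inclusive
    final int size;        // number of keys in the slice

    KeyPartition(String startKey, String endKey, int size) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.size = size;
    }
}

class PartitionSketch {
    // Splits sortedKeys into at most `parts` contiguous partitions whose
    // sizes differ by at most one. Assumes the input is already sorted.
    static List<KeyPartition> partition(List<String> sortedKeys, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<KeyPartition> out = new ArrayList<>();
        int n = sortedKeys.size();
        int base = n / parts;   // minimum keys per partition
        int extra = n % parts;  // the first `extra` partitions get one more key
        int from = 0;
        for (int i = 0; i < parts && from < n; i++) {
            int len = base + (i < extra ? 1 : 0);
            out.add(new KeyPartition(
                    sortedKeys.get(from), sortedKeys.get(from + len - 1), len));
            from += len;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        for (KeyPartition p : partition(keys, 3)) {
            System.out.println(p.startKey + ".." + p.endKey + " (" + p.size + " keys)");
        }
    }
}
```

Each partition carries only its key bounds, so a worker can run the same range query against its own slice without seeing the rest of the data.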

  10. Apache Gora – MapReduce API • Java API – org.apache.gora.mapreduce.* • GoraMapper • GoraReducer • ALL Record Counter • Reader • Writer • Hadoop / Avro • Serialise • De-serialise • Persistent

  11. Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve your problems
