
HBase: Hadoop Database

HBase: Hadoop Database. B. Ramamurthy. Motivation-0. Think about the goal of a typical application today and its data characteristics. Application trend: Search → Analytics


Presentation Transcript


  1. HBase: Hadoop Database • B. Ramamurthy

  2. Motivation-0
     • Think about the goal of a typical application today and its data characteristics
     • Application trend: Search → Analytics
     • Simple get from a database → provide the primary key → get the row; a traditional RDBMS is optimized for this: normalized tables, multiple indices, etc.
     • NULLs are expensive
     • Analytics → a huge number of rows accessed efficiently → to supply analytic algorithms with big data → inherently denormalized, multiple versions, e.g. time series
     • NULLs are typical, the norm, and very common

  3. Motivation-1
     • HDFS itself is “big”
     • Why do we need “HBase”, which is bigger and more complex?
     • Word count and web logs are simple compared to web pages; consider what a web crawler encounters:
     • http://www.cse.buffalo.edu
     • http://www.math.buffalo.edu/index.shtml

  4. Introduction
     • Persistence is realized (implemented) in traditional applications using a Relational Database Management System (RDBMS)
     • Relations are expressed using tables, and data is normalized
     • Well-founded in relational algebra and functions
     • Related data are located together
     • However, social relationship data and networks demand a different kind of data representation
     • Relationships are multi-dimensional
     • Data is by choice not normalized (i.e., inherently redundant)
     • Column-based tables rather than row-based (consider the Friends relation in Facebook)
     • Sparse table
     • Solution is HBase: HBase is a database built on HDFS
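The difference between a fixed-schema row that stores NULLs and a sparse, column-based row can be sketched in plain Python. This is a toy illustration only, not HBase code; the column names, the sample user, and the `friends:` prefix are all hypothetical.

```python
# Toy comparison: fixed-schema row (NULLs occupy slots) vs sparse row
# (absent cells are simply not stored). All names are hypothetical.

FIXED_COLUMNS = ["name", "email", "phone", "fax", "pager"]

def fixed_row(values):
    """Return one slot per schema column; missing values become None."""
    return [values.get(col) for col in FIXED_COLUMNS]

def sparse_row(values):
    """Return only the cells that actually hold a value."""
    return {col: val for col, val in values.items() if val is not None}

user = {"name": "Ada", "email": "ada@example.org"}

dense = fixed_row(user)
sparse = sparse_row(user)

# A sparse row can also grow columns per row, e.g. one column per friend.
sparse["friends:bob"] = "since 2012"
sparse["friends:carol"] = "since 2014"

print(dense)
print(sorted(sparse))
```

The dense row pays for three empty slots, while the sparse row stores nothing for them and can still add a column per friend without a schema change.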

  5. Motivation-2
     • Google: GFS → Bigtable, Colossus
     • Facebook: HDFS → Hive, Cassandra, HBase
     • Yahoo: HDFS → HBase
     • To source an MR workflow and to sink the output of an MR workflow
     • To organize data for large-scale analytics
     • To organize data for querying
     • To organize data for warehousing and intelligence discovery
     • NoSQL (see salesforce.com)
     • Compare storing bank account details with storing Facebook user account details

  6. HBase
     • HBase reference: http://hbase.apache.org
     • Main concept: billions of rows and millions of columns on top of commodity infrastructure (say, HDFS)
     • HBase is a data repository for big data
     • It can be a source and a sink for an HDFS workflow
     • HBase includes base classes for supporting and backing MR workflows, Pig, and Hive, as a sink as well as a source
     • [Diagram labels: HDFS, HBASE, HBASE]

  7. When to use HBase?
     • When you need to store high-volume data
     • Unstructured data
     • Sparse data
     • Column-oriented data
     • Versioned data (same data template, captured at various times; time-lapse data)
     • When you need high scalability (you are generating data from an MR workflow and need to sink it somewhere)
     • When you have rows so long that a traditional table would need to be split: sharding into horizontal partitions
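The "versioned data" case can be made concrete with a minimal sketch of a cell that keeps several timestamped values. This is an in-memory toy, not the HBase API; the class name `VersionedCell`, the retention limit of 3, and the sample readings are assumptions made for illustration.

```python
import bisect

class VersionedCell:
    """Toy cell that keeps up to max_versions timestamped values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._timestamps = []   # sorted ascending
        self._values = []       # parallel to _timestamps

    def put(self, timestamp, value):
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value              # same timestamp overwrites
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)
        # Drop the oldest versions beyond the retention limit.
        excess = len(self._timestamps) - self.max_versions
        if excess > 0:
            del self._timestamps[:excess]
            del self._values[:excess]

    def get(self, as_of=None):
        """Return the newest value, or the newest one at or before as_of."""
        if not self._timestamps:
            return None
        if as_of is None:
            return self._values[-1]
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i > 0 else None

# Hypothetical time series: four readings, only the newest three are kept.
temperature = VersionedCell(max_versions=3)
for ts, reading in [(100, 20.5), (200, 21.0), (300, 21.7), (400, 22.1)]:
    temperature.put(ts, reading)

latest = temperature.get()
at_250 = temperature.get(as_of=250)
at_150 = temperature.get(as_of=150)   # version at ts=100 was already dropped
print(latest, at_250, at_150)
```

Reading "as of" a timestamp is what makes the same cell usable as a small time series rather than a single overwritten value.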

  8. HBase: The Definitive Guide
     • By Lars George
     • Online version available
     • Also look at http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

  9. Column-based

  10. HBase Architecture

  11. Data Model
     • http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
     • Table
     • Row# is an uninterpreted value (the row key; HBase treats it as raw bytes)
     • Column families, with columns named family:qualifier (courses:mth309, courses:cse241)
     • Region
     • Region file

  12. Applications: Google Earth
     • [Diagram: layered stack, top to bottom: Client (HTable) and MR Client (HTable); HBase; HDFS; operating system; hardware]

  13. User table
     • Implemented through region servers and regions: rows, column families, columns
     • [Diagram labels: META data, -ROOT-, Client]

  14. Row
     • One row’s data: Row key → Column family (several per row) → Column qualifier (several per family) → Timestamp: data (several versions per qualifier)
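The nesting on this slide (row key, then column family, then column qualifier, then timestamped data) maps naturally onto nested dictionaries. The sketch below is an in-memory model for intuition only, not the HBase client API; `ToyTable`, the row key `student42`, and the grade values are hypothetical, while the `courses` family and its qualifiers come from the Data Model slide.

```python
import time
from collections import defaultdict

class ToyTable:
    """Toy model: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict))
        )

    def put(self, row_key, family, qualifier, value, timestamp=None):
        if timestamp is None:
            timestamp = int(time.time() * 1000)   # milliseconds, an assumption
        self._rows[row_key][family][qualifier][timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the value with the largest timestamp, or None if absent."""
        versions = (
            self._rows.get(row_key, {}).get(family, {}).get(qualifier, {})
        )
        if not versions:
            return None
        return versions[max(versions)]

    def family(self, row_key, family):
        """Return {qualifier: latest value} for one family of one row."""
        qualifiers = self._rows.get(row_key, {}).get(family, {})
        return {q: v[max(v)] for q, v in qualifiers.items() if v}

table = ToyTable()
table.put("student42", "courses", "mth309", "B+", timestamp=1)
table.put("student42", "courses", "mth309", "A-", timestamp=2)   # newer version
table.put("student42", "courses", "cse241", "A", timestamp=1)

print(table.get("student42", "courses", "mth309"))
print(table.family("student42", "courses"))
```

Older versions stay in the innermost dictionary, so nothing is lost when a cell is written again; a read simply picks the largest timestamp.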

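  15. Rows
     • Rows with keys A through Z are split into regions by key range, and the regions are spread across Region server 1, Region server 2, and Region server 3
     • Regions shown (in diagram order): Keys T-Z, Keys I-M, Keys F-I, Keys A-C, Keys M-T, Keys C-F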
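Finding the region that holds a given row key amounts to a search over sorted start keys. The sketch below uses the six key ranges from the slide; the assignment of regions to servers is hypothetical (the diagram layout does not survive in the transcript), and treating the last region as open-ended is a simplifying assumption.

```python
import bisect

# (start_key, end_key, server). Start keys are inclusive, end keys exclusive.
# The server assignment below is made up for illustration.
REGIONS = [
    ("A", "C", "regionserver1"),
    ("C", "F", "regionserver3"),
    ("F", "I", "regionserver3"),
    ("I", "M", "regionserver2"),
    ("M", "T", "regionserver2"),
    ("T", "Z", "regionserver1"),
]
START_KEYS = [start for start, _, _ in REGIONS]

def locate(row_key):
    """Return the (start, end, server) tuple whose range contains row_key."""
    i = bisect.bisect_right(START_KEYS, row_key) - 1
    if i < 0:
        raise KeyError(f"{row_key!r} sorts before the first region")
    return REGIONS[i]

print(locate("Dylan"))    # falls in C-F
print(locate("C"))        # a start key belongs to the region it opens
print(locate("Tanaka"))   # falls in T-Z
```

Because row keys are kept sorted, a lookup needs only the list of start keys, which is why range scans over adjacent keys tend to stay within one region.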

  16. Big-data applications: EMR, healthcare, health exchanges
     • [Diagram labels: HBase API, RegionServer, Master, MemStore, Write-ahead log, HFile, ZooKeeper, HDFS]
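Three of the components named on this slide (write-ahead log, MemStore, HFile) form the write path as it is commonly described: an edit is logged first, then buffered in memory, and later flushed to an immutable sorted file. The sketch below is a heavily simplified in-memory model under that assumption; `ToyRegionStore`, the flush threshold, and the use of Python lists in place of files on HDFS are all illustrative, and compaction and recovery are omitted.

```python
import bisect

class ToyRegionStore:
    """Toy write path: write-ahead log -> MemStore -> flushed sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []        # append-only log of edits not yet flushed
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted tuples of (key, value), oldest first

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the edit first
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as one sorted, immutable file."""
        if not self.memstore:
            return
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        self.wal.clear()   # flushed edits no longer need the log

    def get(self, key):
        """Check the MemStore first, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            keys = [k for k, _ in hfile]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return hfile[i][1]
        return None

store = ToyRegionStore(flush_threshold=2)
store.put("b", 1)
store.put("a", 2)    # reaches the threshold and triggers a flush
store.put("a", 3)    # newer value lives in the MemStore only

print(store.hfiles)
print(store.get("a"), store.get("b"), store.get("z"))
```

Reads consult the newest data first, so the value in the MemStore shadows the older one already flushed to a file.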
