HBase. Presented by Chintamani Siddeshwar and Swathi Selvavinayakam. http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041
HBase • Open-source implementation of Google's Bigtable • HDFS as the underlying distributed file system • ZooKeeper as the lock service • Tight integration with Hadoop MapReduce
Why HBase? • Scales out to thousands of nodes • Access granularity is a row: a read or write to a single row is atomic • Designed for workloads of simple operations on individual items • Provides efficient random access to individual rows • Allows dynamic repartitioning of data (tables are divided into regions that are split and reassigned across servers as they grow)
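A minimal sketch of what single-row atomicity means, assuming a toy in-memory table with one lock per row. The class `ToyTable` and its `put`/`get` methods are hypothetical and are not the HBase client API. A write that touches several columns of one row holds that row's lock, so a reader of the same row never observes it half-applied, while writes to different rows take separate locks.

```python
import threading
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table illustrating row-level atomicity.

    Not the HBase client API: every row simply gets its own lock.
    """

    def __init__(self):
        self._rows = {}                                  # row key -> {column: value}
        self._row_locks = defaultdict(threading.Lock)    # row key -> lock
        self._guard = threading.Lock()                   # serializes creation of row locks

    def _lock_for(self, row):
        with self._guard:
            return self._row_locks[row]

    def put(self, row, columns):
        """Write several columns of one row as one indivisible step."""
        with self._lock_for(row):
            self._rows.setdefault(row, {}).update(columns)

    def get(self, row):
        """Return a copy of the row; it reflects either all of a put or none of it."""
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))
```

Nothing here spans rows: two puts to different rows are independent, which mirrors the "row is the unit of atomicity" point on the slide.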
Data Model • A sparse, distributed, persistent, multi-dimensional sorted map • (row, column, timestamp) -> cell • Column = column family : column qualifier
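The map described on this slide can be modelled in a few lines. This is a minimal sketch assuming an in-memory Python structure; the class `ToySortedMap`, its methods, the family names and the sample row keys are hypothetical, not HBase's storage format or API. Cells are keyed by (row, column, timestamp), keys are kept sorted so a row-range scan returns rows in order, a read without a timestamp returns the newest version, and only written cells are stored, which is what makes the map sparse. As in HBase, column families are fixed when the table is created, while qualifiers can be added at write time.

```python
from bisect import bisect_left, insort


class ToySortedMap:
    """Hypothetical model of (row, column, timestamp) -> cell.

    Columns are "family:qualifier" strings. Illustration only.
    """

    def __init__(self, families):
        self.families = set(families)  # declared up front, like a table schema
        self._cells = {}               # (row, column, -timestamp) -> value
        self._keys = []                # the same keys, kept sorted

    def put(self, row, column, timestamp, value):
        family, sep, _qualifier = column.partition(":")
        if not sep or family not in self.families:
            raise KeyError(f"unknown column family in {column!r}")
        # Negating the timestamp makes newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column, timestamp=None):
        """Newest value with a timestamp <= `timestamp` (newest overall if None)."""
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Latest value of every stored column for rows in [start_row, stop_row)."""
        result = {}
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, _ = self._keys[i]
            # The first key seen for a (row, column) pair is its newest version.
            result.setdefault(row, {}).setdefault(column, self._cells[self._keys[i]])
            i += 1
        return result
```

For example, after writing `info:title` for row `com.cnn.www` at timestamps 1 and 5, `get("com.cnn.www", "info:title")` returns the version written at 5, and `get(..., timestamp=4)` returns the one written at 1.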
Other Features • Compression • In-memory column families • Multiple masters (one active, the others on standby for failover) • Rolling restarts • Bloom filters • Efficient bulk loads • Can act as a data source and sink for Hive, Pig and Cascading
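HBase can keep a Bloom filter per store file so that a read can skip files that definitely do not contain the requested row. The sketch below shows only the underlying data structure, assuming a minimal Python Bloom filter; the bit-array size, the number of hash functions, the SHA-256 double-hashing scheme and the `files_to_read` helper are illustrative assumptions, not HBase's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def files_to_read(store_files, row):
    """Hypothetical read path: open only files whose filter answers 'maybe'."""
    return [name for name, bloom in store_files.items() if bloom.might_contain(row)]
```

The saving comes from the "definitely absent" answer: every file whose filter rejects the row is never read from disk, at the cost of a small in-memory bit array per file.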
Use Cases • Mozilla • Yahoo! • Twitter • Facebook • Adobe
HBase vs. RDBMS (each point gives HBase first, then a typical RDBMS) • Column-oriented vs. row-oriented (mostly) • Flexible schema (columns added on the fly) vs. fixed schema • Good with sparse tables vs. not optimized for sparse tables • No query language vs. SQL • De-normalize your data vs. normalize as far as you can • No multi-row transactions (only single-row operations are atomic) vs. transactional
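To illustrate "de-normalize your data", here is a sketch assuming a hypothetical users/orders schema: a one-to-many relation that an RDBMS would keep in two normalized tables joined by a foreign key is folded into one wide row per user. Each order becomes its own column qualifier in a hypothetical `orders` family, so no schema change is needed as orders arrive, and fetching a user together with all of their orders becomes a single-row read rather than a join. The function name `denormalize` and the `info`/`orders` family names are illustrative only.

```python
def denormalize(users, orders):
    """Fold a one-to-many relation into one wide row per user.

    users:  {user_id: {attribute: value}}     like a normalized 'users' table
    orders: [(order_id, user_id, total), ...] like an 'orders' table with a foreign key
    Returns {row_key: {"family:qualifier": value}} (family names are hypothetical).
    """
    rows = {}
    for user_id, attrs in users.items():
        rows[user_id] = {f"info:{name}": value for name, value in attrs.items()}
    for order_id, user_id, total in orders:
        # Each order is a new qualifier in the row: columns are added on the fly.
        rows.setdefault(user_id, {})[f"orders:{order_id}"] = total
    return rows
```

Users without orders simply have no `orders:` columns, which is the sparse-table case that HBase handles well.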
Related Chapters • Big Data • Data Modelling
References • http://ofps.oreilly.com/titles/9781449396107/intro.html • http://wiki.apache.org/hadoop/Hbase/DataModel • http://www.slideshare.net/amansk/hbase-hadoop-day-seattle-4987041 • http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf