Seven Databases in Seven Weeks

Seven Databases in Seven Weeks HBase

Diagram labels: HBase, HDFS (Hadoop Distributed File System), DFS Server

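How "Seven Databases in Seven Weeks" organizes the HBase chapter

Day 1: CRUD and table administration
  - Run HBase in standalone mode
  - Create a table
  - Put data in and get it back out

Day 2: Working with big data
  - Load a Wikipedia dump
  - Get used to driving HBase from scripts (not the shell)

Day 3: Taking it to the cloud (not covered in this talk)
  - Operate HBase through Thrift
  - Deploy to EC2 using Whirr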

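HBase features

Automatic sharding and automatic failover
  - When a table grows large, it is split automatically
  - The resulting shards fail over automatically when a node fails

Data consistency (the C in CAP)
  - An update can be read from the moment it is applied
  - HBase does not relax this to eventual consistency, where reads only eventually return the same value

Hadoop/HDFS integration
  - Can be deployed on top of Hadoop's HDFS
  - Hadoop MapReduce jobs can use HBase as an input or output target without an intermediate API layer

Multiple interfaces
  - Available through the Java native API, and also through Thrift and the REST API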

Day 1: Deploying HBase in standalone mode

Commands:

    [root@HBase01 ask]# cd /opt/
    [root@HBase01 opt]# wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/hbase/hbase-0.94.7/hbase-0.94.7.tar.gz
    [root@HBase01 opt]# tar zxvf hbase-0.94.7.tar.gz
    [root@HBase01 opt]# vi hbase-0.94.7/conf/hbase-site.xml

hbase-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///var/files/hbase</value>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/var/files/zookeeper</value>
      </property>
    </configuration>

Where the files are placed:

    /var
      /files
        /hbase
        /zookeeper

This is the minimum configuration needed to run standalone. The two properties set where HBase and ZooKeeper write their files; any directory can be used as the destination.

Every property that can be set in the XML is listed in src/main/resources/hbase-default.xml.

Day 1: Deploying HBase in standalone mode

A JDK is required:

    [root@HBase01 opt]# hbase-0.94.7/bin/start-hbase.sh
    +======================================================================+
    | Error: JAVA_HOME is not set and Java could not be found              |
    +----------------------------------------------------------------------+
    | Please download the latest Sun JDK from the Sun Java web site        |
    | > http://java.sun.com/javase/downloads/ <                            |
    |                                                                      |
    | HBase requires Java 1.6 or later.                                    |
    | NOTE: This script will find Sun Java whether you install using the   |
    | binary or the RPM based installer.                                   |
    +======================================================================+

JDK variants (pick one of the following and install it)

Point HBase at the Java installation directory:

    [root@HBase01 opt]# vi hbase-0.94.7/conf/hbase-env.sh
    - # export JAVA_HOME=/usr/java/jdk1.6.0/
    + export JAVA_HOME=/usr/java/latest/

Day 1: Deploying HBase in standalone mode

Start:

    [root@HBase01 opt]# hbase-0.94.7/bin/start-hbase.sh
    starting master, logging to /opt/hbase-0.94.7/bin/../logs/hbase-root-master-HBase01.db.algnantoka.out

Connect with the shell:

    [root@HBase01 opt]# hbase-0.94.7/bin/hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013

    hbase(main):001:0> status
    1 servers, 0 dead, 2.0000 average load

Stop:

    [root@HBase01 opt]# hbase-0.94.7/bin/stop-hbase.sh
    stopping hbase...........

Day 1: Using HBase

Creating a table: create

    hbase(main):009:0> help "create"
    Create table; pass table name, a dictionary of specifications per
    column family, and optionally a dictionary of table configuration.
    Dictionaries are described below in the GENERAL NOTES section.
    Examples:

      hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
      hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
      hbase> # The above in shorthand would be the following:
      hbase> create 't1', 'f1', 'f2', 'f3'
      hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
      hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
      hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
      hbase> # Optionally pre-split the table into NUMREGIONS, using
      hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
      hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

Basic form:

    create 'TableName', {NAME => 'ColumnFamilyName', Option => Value ...} ...

Shorthand:

    create 'TableName', 'ColumnFamilyName', ...

Day 1: Using HBase

Inserting a record: put

    hbase(main):010:0> help "put"
    Put a cell 'value' at specified table/row/column and optionally
    timestamp coordinates. To put a cell value into table 't1' at
    row 'r1' under column 'c1' marked with the time 'ts1', do:

      hbase> put 't1', 'r1', 'c1', 'value', ts1

SampleTable:

    create 'SampleTable', 'color', 'shape'
    put 'SampleTable', 'first', 'color:red', '#F00'
    put 'SampleTable', 'first', 'color:blue', '#00F'
    put 'SampleTable', 'first', 'color:yellow', '#FF0'

Day 1: Using HBase

Reading a record: get

    hbase(main):011:0> help "get"
    Get row or cell contents; pass table name, row, and optionally
    a dictionary of column(s), timestamp, timerange and versions. Examples:

      hbase> get 't1', 'r1'
      hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
      hbase> get 't1', 'r1', {COLUMN => 'c1'}
      hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
      hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
      hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
      hbase> get 't1', 'r1', 'c1'
      hbase> get 't1', 'r1', 'c1', 'c2'
      hbase> get 't1', 'r1', ['c1', 'c2']

SampleTable:

    get 'SampleTable', 'first'
    get 'SampleTable', 'first', 'color'
    get 'SampleTable', 'first', 'color:blue'

Day 1: Using HBase

Searching records: scan

    hbase(main):001:0> help 'scan'
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications. Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    or COLUMNS, CACHE

    If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.

    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the
    Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.

    Some examples:

      hbase> scan '.META.'
      hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
      hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
      hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}

    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false). By
    default it is enabled. Examples:

      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

    Also for experts, there is an advanced option -- RAW -- which instructs the
    scanner to return all cells (including delete markers and uncollected deleted
    cells). This option cannot be combined with requesting specific COLUMNS.
    Disabled by default. Example:

      hbase> scan 't1', {RAW => true, VERSIONS => 10}
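The STARTROW, STOPROW and LIMIT options in the help text above are easier to picture with a small model. The following is a plain-Ruby sketch, not the HBase client API: `toy_scan` and the sample rows are made up for illustration, and the only HBase behavior it assumes is that rows are visited in row-key order with an inclusive start row and an exclusive stop row.

```ruby
# Toy scan over rows kept sorted by row key. Plain Ruby, not the HBase API.
# start_row is inclusive and stop_row is exclusive.
def toy_scan(rows, start_row: nil, stop_row: nil, limit: nil)
  selected = rows.keys.sort.select do |key|
    (start_row.nil? || key >= start_row) && (stop_row.nil? || key < stop_row)
  end
  selected = selected.first(limit) if limit
  selected.map { |key| [key, rows[key]] }
end

# Hypothetical sample data: row key => { 'family:qualifier' => value }
rows = {
  'row3' => { 'color:red' => '#F00' },
  'row1' => { 'color:blue' => '#00F' },
  'row2' => { 'shape:square' => '4' }
}

from_row2 = toy_scan(rows, start_row: 'row2')           # row2 and row3
first_only = toy_scan(rows, stop_row: 'row3', limit: 1) # row1 only
```

Because the keys are visited in sorted order, a range scan touches only a contiguous slice of the table, which is why the row key design matters so much later in the deck.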

Day 1: Using HBase

Timestamps

Writing to the same cell five times:

    put 'table', 'first', 'color:red', '#FFF'
    put 'table', 'first', 'color:red', '#000'
    put 'table', 'first', 'color:red', '#0F0'
    put 'table', 'first', 'color:red', '#00F'
    put 'table', 'first', 'color:red', '#F00'

Versions held in 'first', 'color:red' (labelled timestamp 1 to 5 on the slide), and what each get returns:

    timestamp 5   #F00   <- get 'table', 'first', 'color:red'
    timestamp 4   #00F   <- get 'table', 'first', {COLUMN=>'color:red', TIMESTAMP=>4}
    timestamp 3   #0F0
    timestamp 2   #000
    timestamp 1   #FFF

    get 'table', 'first', {COLUMN=>'color:red', VERSIONS=>4} returns timestamps 5 through 2.
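The versioning shown on this slide can be modelled in a few lines of plain Ruby. This is only a sketch: `VersionedCell` is a hypothetical class, not part of HBase, and it assumes the column family keeps at least five versions so that all five writes survive.

```ruby
# Toy model of one HBase cell (row + column) that keeps several versions.
# Plain Ruby for illustration, not the HBase client API.
class VersionedCell
  def initialize(max_versions)
    @max_versions = max_versions
    @versions = {} # timestamp => value
  end

  # Store a value under a timestamp, then drop the oldest versions
  # beyond max_versions.
  def put(timestamp, value)
    @versions[timestamp] = value
    keep = @versions.keys.sort.last(@max_versions)
    @versions.select! { |ts, _| keep.include?(ts) }
    self
  end

  # Return up to `versions` [timestamp, value] pairs, newest first.
  # With `timestamp:`, return only the version stored at exactly that time.
  def get(versions: 1, timestamp: nil)
    if timestamp
      return @versions.key?(timestamp) ? [[timestamp, @versions[timestamp]]] : []
    end
    @versions.sort_by { |ts, _| -ts }.first(versions)
  end
end

cell = VersionedCell.new(5)
['#FFF', '#000', '#0F0', '#00F', '#F00'].each_with_index do |value, i|
  cell.put(i + 1, value) # timestamps 1 to 5, as on the slide
end

latest    = cell.get                # newest version only
at_four   = cell.get(timestamp: 4)  # the version written at timestamp 4
last_four = cell.get(versions: 4)   # timestamps 5, 4, 3, 2
```

The `max_versions` argument plays the role of the VERSIONS setting on a column family: once it is exceeded, the oldest versions are no longer returned.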

Day 1: Using HBase

Changing the schema: alter

    hbase(main):009:0> disable 'table1'
    0 row(s) in 2.5190 seconds

    hbase(main):010:0> get 'table1', 'first', 'color:red'
    COLUMN                CELL
    ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: table1 is disabled.

    hbase(main):012:0> alter 'table1', { NAME => 'color', VERSIONS => 10}
    Updating all regions with the new schema...
    1/1 regions updated.
    Done.
    0 row(s) in 1.3630 seconds

    hbase(main):014:0> enable 'table1'
    0 row(s) in 2.3000 seconds

The table targeted by alter must be offline (disabled). The alter command above changes the number of versions the column family retains.

A schema change through alter proceeds as follows:
  1. Create an empty table with the new schema
  2. Copy the data over from the original table
  3. Discard the original table

This is expensive, so as a rule schema changes (changes to column families) should be avoided.

Day 1: Using HBase

JRuby scripting

hoge.rb:

    include Java
    # HBase-related Java classes
    import org.apache.hadoop.hbase.client.HTable
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.HBaseConfiguration

    # Convert each argument to a Java byte array
    def jbytes(*args)
      args.map { |arg| arg.to_s.to_java_bytes }
    end

    table = HTable.new( HBaseConfiguration.new, "table1" )

    # Build one Put for row "third" holding three cells
    p = Put.new( *jbytes( "third" ) )
    p.add( *jbytes( "color", "black", "#000" ) )
    p.add( *jbytes( "shape", "triangle", "3" ) )
    p.add( *jbytes( "shape", "square", "4" ) )

    # The record is actually inserted here
    table.put( p )

Run:

    [root@HBase01 opt]# hbase-0.94.7/bin/hbase shell hoge.rb

    hbase(main):002:0> get 'table1', 'third', {COLUMN => ['color','shape']}
    COLUMN                CELL
     color:black          timestamp=1369049856405, value=#000
     shape:square         timestamp=1369049856405, value=4
     shape:triangle       timestamp=1369049856405, value=3
    9 row(s) in 0.0870 seconds

All cells written by the single put share the same timestamp.

The hbase shell is an extended JRuby interpreter, so it can run JRuby scripts.

What is HBase?

                                              Batch processing   Real-time response   File system
    Google's internal systems                 MapReduce          BigTable             Google File System (GFS)
      (as described in published papers)
    Hadoop project (Google clone)             MapReduce          HBase                Hadoop Distributed File System (HDFS)

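BigTable (a sorted, column-oriented database)

Diagram labels (the targets are inferred from the rest of the deck):
  - Column family: defined in the schema
  - Column: schemaless (can be added freely)
  - Row key: required, kept sorted

Each column is versioned by timestamp. For one column:

    timestamp 5   #F00
    timestamp 4   #00F
    timestamp 3   #0F0
    timestamp 2   #000
    timestamp 1   #FFF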

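BigTable (a sorted, column-oriented database)

Diagram: a table drawn as six regions.

  • A table is physically partitioned (sharded) into regions
  • Each region is served by a region server in the cluster
  • Regions are created per column family (more precisely, each region keeps a separate store for each column family)
  • Regions split the sorted row keys into ranges of a suitable size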
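The last bullet, regions as ranges of sorted row keys, can be sketched as a lookup over region start keys. This is a plain-Ruby illustration under assumptions: the region names and split keys are invented, and each region is taken to cover the keys from its own start key up to, but not including, the next region's start key.

```ruby
# Toy region lookup: each region covers [start_key, next region's start_key).
# Region names and split keys below are hypothetical.
Region = Struct.new(:name, :start_key)

# Regions sorted by start key; the first region starts at the empty key.
REGIONS = [
  Region.new('region-1', ''),
  Region.new('region-2', 'g'),
  Region.new('region-3', 'p')
].freeze

# Find the last region whose start key is <= row_key, using a binary
# search for the first region that starts after row_key.
def region_for(row_key, regions = REGIONS)
  idx = regions.bsearch_index { |region| region.start_key > row_key }
  idx.nil? ? regions.last : regions[idx - 1]
end

apple = region_for('apple').name # 'region-1'
grape = region_for('g').name     # 'region-2', start keys are inclusive
zebra = region_for('zebra').name # 'region-3'
```

Row keys that sort next to each other fall in the same region, so a scan over a narrow key range usually stays on one region server.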

BigTable (a sorted, column-oriented database)

The initial table schema design is critically important:
  - Do not add column families freely; handle new needs by adding columns wherever possible
  - Shape row keys so that access tends to be sequential
  - Do not build a structure that searches on a column or column family as the condition

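HBase features (recap)

Automatic sharding and automatic failover
  - When a table grows large, it is split automatically
  - The resulting shards fail over automatically when a node fails

Data consistency (the C in CAP)
  - An update can be read from the moment it is applied
  - HBase does not relax this to eventual consistency, where reads only eventually return the same value

Hadoop/HDFS integration
  - Can be deployed on top of Hadoop's HDFS
  - Hadoop MapReduce jobs can use HBase as an input or output target without an intermediate API layer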

The elements behind HBase's features

Automatic sharding and automatic failover
  - Automatic region splitting
  - ??

Data consistency (the C in CAP)
  - ??

Hadoop/HDFS integration
  - HDFS: a GFS clone
  - HBase: a BigTable clone

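The elements behind HBase's features

Automatic failover and data consistency (the C in CAP)

Diagram labels: Master Server; ZooKeeper; Region Server holding a Region, an in-memory store, a WAL and a local store, with Read and Write arrows; HDFS; "replicate"; Region Server (failover target) with its own Region, WAL and local store.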
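The slide gives only the diagram labels, so the following is one reading of it rather than a description of HBase internals: a write is appended to the write-ahead log (WAL) before the in-memory store is updated, and a failover target rebuilds the in-memory state by replaying that log. `ToyRegionServer` and the shared array standing in for a log on HDFS are hypothetical.

```ruby
# Toy write path: append to a write-ahead log (WAL) first, then update the
# in-memory store. On failover, a new server rebuilds the store from the WAL.
# Plain Ruby for illustration, not HBase code.
class ToyRegionServer
  attr_reader :memstore

  def initialize(wal)
    @wal = wal      # stands in for a log kept on shared storage (HDFS)
    @memstore = {}  # row => latest value
  end

  def put(row, value)
    @wal << [row, value]    # 1. log the edit
    @memstore[row] = value  # 2. update memory; readable immediately
  end

  def get(row)
    @memstore[row]
  end

  # Build a replacement server by replaying every logged edit in order.
  def self.recover(wal)
    server = new(wal)
    wal.each { |row, value| server.memstore[row] = value }
    server
  end
end

shared_wal = []
primary = ToyRegionServer.new(shared_wal)
primary.put('first', '#F00')
primary.put('first', '#00F')
primary.put('second', '#0F0')

# The primary is assumed lost here; the standby only has the shared log.
standby = ToyRegionServer.recover(shared_wal)
recovered_first = standby.get('first') # '#00F', the last logged write
```

Since every read goes to the single server that owns the row and every write is logged before it is acknowledged, this arrangement gives read-your-writes behavior and still survives the loss of that server.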

Day 2: Working with Wikipedia data

Ran out of steam \(^0^)/

Time taken by scan (seconds)
