530 likes | 669 Views
Using Q4M a message queue storage engine for MySQL. Cybozu Labs, Inc. Kazuho Oku. Background. Who am I?. Name: Kazuho Oku ( 奥 一穂 ) Original Developer of Palmscape / Xiino The oldest web browser for Palm OS Working at Cybozu Labs since 2005
E N D
Using Q4Ma message queue storage engine for MySQL Cybozu Labs, Inc. Kazuho Oku
Background Using Q4M
Who am I? • Name: Kazuho Oku (奥一穂) • Original Developer of Palmscape / Xiino • The oldest web browser for Palm OS • Working at Cybozu Labs since 2005 • Research subsidiary of Cybozu, Inc. in Japan Using Q4M
About Cybozu, Inc. • Japan’s largest groupware vendor • Mostly provides as software products, not as services • Some of our apps bundle MySQL as storage Using Q4M
About Pathtraq • Started in Aug. 2007 • Web ranking service • One of Japan’s largest • like Alexa, but semi-realtime, and per-page • running on MySQL • Need for a fast and reliable message relay • for communication between the main server and content analysis server(s) Using Q4M
Design Goals of Q4M • Robust • Do not lose data on OS crash or power failure • Fast • Transfer thousands of messages per second • Easy to Use • Use SQL for access / maintenance • Integration into MySQL • no more separate daemons to take care of Using Q4M
What is a Message Queue? Using Q4M
What is a Message Queue? • Middleware for persistent asynchronous communication • communicate between fixed pairs (parties) • a.k.a. Message Oriented Middleware • MQ is intermediate storage • RDBMS is persistent storage • Senders / receivers may go down Using Q4M
Minimal Configuration of a MQ Queue • Senders and receivers access a single queue Sender Receiver Using Q4M
MQ and Relays Queue Queue • Separate queue for sender and receiver • Messages relayed between queues Sender Receiver Relay Using Q4M
Merits of Message Relays • Destination can be changed easily • Relays may transfer messages to different locations depending on their headers • Robustness against network failure • no loss or duplicates when the relay fails • Logging and Multicasting, etc. Using Q4M
Message Brokers • Publish / subscribe model • Separation between components and their integration • Components read / write to predefined queues • Integration is definition of routing rules between the message queues • Messages are often transformed (filtered) within the relay agent Using Q4M
What about Q4M? • Q4M itself is a message queue • Can connect Q4M instances to create a message relay • Provides API for creating message relays and brokers Using Q4M
Performance of Q4M • over 7,000 mess/sec. • message size: avg. 512 bytes • syncing to disk • Outperforming most needs • if you need more, just scale out • Can coexist with other storage engines without sacrificing their performance see http://labs.cybozu.co.jp/blog/kazuhoatwork/2008/06/q4m_06_release_and_benchmarks.php Using Q4M
Applications of Q4M Using Q4M
Asynchronous Updates • Mixi (Japan's one of the largest SNS) uses Q4M to buffer writes to DB, to offload peak demands from http://alpha.mixi.co.jp/blog/?p=272 Using Q4M
Connecting Distant Servers • Pathtraq uses Q4M to create a relay between its database and content analysis processes → Contents to be analyzed → Pathtraq DB Content Analysis Processes MySQL conn. over SSL,gzip ← Results of the analysis ← Using Q4M
To Prefetch Data • livedoor Reader (web-based feed aggregator) uses Q4M to prefetch data from database to memcached • uses Q4M for scheduling web crawlers as well from http://d.hatena.ne.jp/mala/20081212/1229074359 Using Q4M
Scheduling Web Crawlers Request Queue Retry Queue • Web crawlers with retry-on-error • Sample code included in Q4M dist. If failed to fetch, store URL in retry queue Store Result Spiders Read URL URL DB Re- scheduler Using Q4M
Delayed Content Generation • Hatetter(RSS feed-to-twitter-API gateway) uses Q4M to delay content generation • Source code: github.com/yappo/website-hatetter Using Q4M
User Notifications Queue(s) • For sending notifications from web services DB App. Logic SMTP Agent IM Agent Using Q4M
Installing Q4M Using Q4M
Installing Q4M • Compatible with MySQL 5.1 • Download from q4m.31tools.com • Binary releases available for some platforms • Installing from source: • requires source code of MySQL • ./configure && make && make install • run support-files/install.sql Using Q4M
Configuration Options of Q4M • --with-sync=no|fsync|fdatasync|fcntl • Controls synchronization to disk • default: fdatasync on linux • --enable-mmap • Mmap’ed reads lead to higher throughput • default: yes • --with-delete=pwrite|msync • msyncrecommended on linux>=2.6.20 if you need really high performance Using Q4M
Q4M Basics Using Q4M
The Model Q4M table • Various publishers write to queue • Set of subscribers consume the entries in queue Publisher Publisher Subscribers Publisher Using Q4M
Creating a Q4M Table • ENGINE=QUEUE creates a Q4M table • No primary keys or indexes • Sorted by insertion order (it’s a queue) mysql> CREATE TABLE qt ( -> id int(10) unsigned NOT NULL, -> message varchar(255) NOT NULL -> ) ENGINE=QUEUE; Query OK, 0 rows affected (0.42 sec) Using Q4M
Modifying Data on a Q4M Table • No restrictions for INSERT and DELETE • No support for UPDATE mysql> INSERT INTO qt (id,message) -> VALUES -> (1,'Hello'), -> (2,'Bonjour'), -> (3,'Hola'); Query OK, 3 rows affected (0.02 sec) mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +----+---------+ 3 rows in set (0.00 sec) Using Q4M
SELECT from a Q4M Table • Works the same as other storage engines • SELECT COUNT(*) is cached mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +----+---------+ 3 rows in set (0.00 sec) mysql> SELECT COUNT(*) FROM qt; +----------+ | COUNT(*) | +----------+ | 3 | +----------+ 1 row in set (0.00 sec) How to subscribe to a queue? Using Q4M
Calling queue_wait() • After calling, only one row becomes visible from the connection mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +----+---------+ 3 rows in set (0.00 sec) mysql> SELECT queue_wait('qt'); +------------------+ | queue_wait('qt') | +------------------+ | 1 | +------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | +----+---------+ 1 row in set (0.00 sec) Using Q4M
OWNER Mode and NON-OWNER Mode • In OWNER mode, only the OWNED row is visible • OWNED row becomes invisible from other connections • rows of other storage engines are visible NON-OWNER Mode 1,'Hello' 2,'Bonjour' 3,'Hola' OWNER Mode 1,'Hello' queue_wait() queue_end() queue_abort() Using Q4M
Returning to NON-OWNER mode • By calling queue_abort, the connection returns to NON-OWNER mode mysql> SELECT QUEUE_ABORT(); +---------------+ | QUEUE_ABORT() | +---------------+ | 1 | +---------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | | 2 | Bonjour | | 3 | Hola | +----+---------+ 3 rows in set (0.01 sec) Using Q4M
Consuming a Row • By calling queue_end, the OWNED row is deleted, and connection returns to NON-OWNER mode mysql> SELECT queue_wait('qt'); (snip) mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 1 | Hello | +----+---------+ 1 row in set (0.01 sec) mysql> SELECT queue_end(); +-------------+ | queue_end() | +-------------+ | 1 | +-------------+ 1 row in set (0.01 sec) mysql> SELECT * FROM qt; +----+---------+ | id | message | +----+---------+ | 2 | Bonjour | | 3 | Hola | +----+---------+ 2 rows in set (0.00 sec) Using Q4M
Writing a Subscriber • Call two functions: queue_wait, queue_end • Multiple subscribers can be run concurrently • each row in the queue is consumed only once while (true) { SELECT queue_wait('qt'); # switch to owner mode rows := SELECT * FROM qt; # obtain data if (count(rows) != 0) # if we have any data, then handle_row(rows[0]); # consume the row SELECT queue_end(); # erase the row from queue } Using Q4M
Writing a Subscriber (cont'd) • Or call queue_wait as a condition • Warning: conflicts with trigger-based insertions while (true) { rows := SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Using Q4M
The Model – with code Q4M table while (true) { rows := SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Publisher INSERT INTO queue ... INSERT INTO queue ... INSERT INTO queue ... Publisher Subscribers Publisher Using Q4M
Three Functions in Detail Using Q4M
queue_wait(table) • Enters OWNER mode • 0〜1 row becomes OWNED • Enters OWNER mode even if no rows were available • Default timeout: 60 seconds • Returns 1 if a row is OWNED (0 on timeout) • If called within OWNER mode, the owned row is deleted Using Q4M
Revisiting Subscriber Code • Calls to queue_end just before queue_wait can be omitted while (true) { rows := SELECT * FROM qt WHERE queue_wait('qt'); if (count(rows) != 0) handle_row(rows[0]); SELECT queue_end(); } Using Q4M
Conditional queue_wait() • Consume rows of certain condition • Rows that do not match will be left untouched • Only numeric columns can be checked • Fast - condition tested once per each row examples: SELECT queue_wait('table:(col_a*3)+col_b<col_c'); SELECT queue_wait('table:retry_count<5'); Using Q4M
queue_wait(tbl_cond,[tbl_cond…,timeout]) • Accepts multiple tables and timeout • Data searched from leftmost table to right • Returns table index (the leftmost table is 1) of the newly owned row • Returns zero if no rows are being owned example: SELECT queue_wait('table_A','table_B',60); Using Q4M
Functions for Exiting OWNER Mode • queue_end • Deletes the owned row and exits OWNER mode • queue_abort • Releases (instead of deleting) the owned row and exits OWNER mode • Close of a MySQL connection does the same thing Using Q4M
Relaying and Routing Messages Using Q4M
The Problem Q4M Table (source) Q4M Table (dest.) • Relay (or router) consists of more than 3 processes, 2 conns • No losses, no duplicates on crash or disconnection Relay Program Using Q4M
Internal Row ID • Every row have a internal row ID • invisible from Q4M table definition • monotonically increasing 64-bit integer • Used for detecting duplicates • Use two functions to skip duplicates • Data loss prevented by using queue_wait / queue_end Using Q4M
queue_rowid() • Returns row ID of the OWNED row (if any) • Returns NULL if no row is OWNED • Call when retrieving data from source Using Q4M
queue_set_srcid(src_tbl_id, mode, src_row_id) • Call before inserting a row to destination table • Checks if the row is already inserted into the table, and ignores next INSERT if true • Parameters: • src_tbl_id - id to determine source table (0〜63) • mode - "a" to drop duplicates, "w" to reset • src_row_id - row ID obtained from source table Using Q4M
Pseudo Code • Relays data from src_tbl to dest_tbl while (true) { # wait for data SELECT queue_wait(src_tbl) => src_db; # read row and rowid row := (SELECT * FROM src_tbl => src_db); rowid := (SELECT queue_rowid() => src_db); # insert the row after setting srcid SELECT queue_set_srcid(src_tbl_id, 'a', rowid) => dest_db; INSERT INTO dest_tbl (row) => dest_db; } Using Q4M
q4m-forward • Simple forwarder script • installed into mysql-dir/bin usage: q4m-forward [options] src_addrdest_addr example: % support-files/q4m-forward \ "dbi:mysql:database=db1;table=tbl1;user=foo;password=XXX" \ "dbi:mysql:database=db2;table=tbl2;host=bar;user=foo" options: --reset reset duplicate check info. --sender=idx slot no. used for checking duplicates (0..63, default: 0) --help Using Q4M
Limitations and the Future of Q4M Using Q4M