510 likes | 650 Views
HandlerSocket plugin for MySQL. Jun 29, 2010 DeNA Technology Seminar @ Yoyogi IT Platform Dept., System Management Division DeNA Co.,Ltd. Akira Higuchi <higuchi dot akira at dena dot jp>. Who am I?. Akira Higuchi, Ph.D. in science IT Platform Dept., DeNA Co.,Ltd.
E N D
HandlerSocket plugin for MySQL Jun 29, 2010 DeNA Technology Seminar @ Yoyogi IT Platform Dept., System Management Division DeNA Co.,Ltd. Akira Higuchi <higuchi dot akira at dena dot jp>
Who am I? • Akira Higuchi, Ph.D. in science • IT Platform Dept., DeNA Co.,Ltd. • system-wide performance optimization • middleware development • The creator of HandlerSocket plugin • Using GNU/Linux since 1993 • Fedora: yum install KoboDeluxe • Debian: apt-get install kobodeluxe
What is HandlerSocket? • Non-SQL interface for MySQL
What HandlerSocket aims • Executes simple CRUD operations fast • Omit SQL parsing • Combine multiple requests on the server side • Allows SQL on the same database • Only simple operations can be faster • Seamless migration from SQL queries
Offers a direct and non-SQL interface to MySQL storage engines Own TCP/IP listener Talks a text protocol There is a C++ and a Perl client libraries Only works with Linux The source code is here: https://github.com/ahiguti/HandlerSocket-Plugin-for-MySQL More infos on the DeNA Tech Blog http://engineer.dena.jp/ (in Japanese) HandlerSocket plugin
Applications libmysql libhsclient Listener for libmysql SQL Layer Handlersocket Plugin Handler Interface Innodb MyISAM Other storage engines … Construction client app mysqld
Other NoSQL interfaces to MySQL • mycached • http://developer.cybozu.co.jp/kazuho/2009/08/mycached-memcac.html • Works with any storage engines • Talks the memcached protocol • NDB API • http://dev.mysql.com/doc/ndbapi/en/index.html • Dedicated for the ndbcluster engine
Handlersocket executes simple read queries 4x faster than mysqld/libmysql Very effective when many columns are retrieved The reason is described later Performance
Commands supported by HandlerSocket (for reading data) • In pseudo-SQL... SELECT f1, .. , fn FROM db.table WHERE k1, ... , km = v1, ... , vm ORDER BY index_i LIMIT offset, limit • (k1, ... , km) are the key fields (or a prefix) of the index_i • =, >=, >, <=, and < can be used for a comparator
Commands supported by HandlerSocket (for modifying data) • UPDATE, DELETE, and INSERT • Does not support transactions • Modifications are recorded to the binary log in the row-based format • Modifications are durable
Command example • create table db1.table1 (k int key, v char(20)) • insert into db1.table1 values (234, 'foo'), (678, ‘bar’) $ telnet localhost 9998 Trying 127.0.0.1... Connected to localhost. Escape character is '^]'. P 0 db1 table1 PRIMARY k,v 0 1 0 = 1 234 0 2 234 foo 0 = 1 678 0 2 678 bar opens the PK find k = 234 find k = 678
Why fast? • No SQL parsing low CPU usage • Executes multiple requests in bulk low CPU/Disk usage • Own client/server protocol small network transmission size
oprofile results – libmysql/mysqld • Executes “SELECT v from table where k = ?” many times samples| %| ------------------ 9669940 53.1574 mysqld 4438098 24.3970 vmlinux 1835976 10.0927 libpthread-2.5.so 1680656 9.2389 libc-2.5.so 397970 2.1877 e1000e 89136 0.4900 oprofiled 42881 0.2357 oprofile
oprofile results – libmysql/mysqld • CPU usage inside mysqld samples % symbol name 748022 7.7355 MYSQLparse(void*) 219702 2.2720 my_pthread_fastmutex_lock 205606 2.1262 make_join_statistics(JOIN*, TABLE_LIST*, 198234 2.0500 btr_search_guess_on_hash 180731 1.8690 JOIN::optimize() 177120 1.8317 row_search_for_mysql 171185 1.7703 lex_one_token(void*, void*) 162683 1.6824 alloc_root 131823 1.3632 read_view_open_now 122795 1.2699 mysql_select(THD*, Item***, TABLE_LIST*, 100276 1.0370 open_table(THD*, TABLE_LIST*, st_mem_root*, 99575 1.0297 mem_pool_fill_free_list 96434 0.9973 build_template(row_prebuilt_struct*, THD*, 86349 0.8930 get_hash_symbol(char const*, unsigned int,
oprofile results – libmysql/mysqld • CPU usage inside the Linux kernel samples % symbol name 204393 4.6054 schedule 118648 2.6734 tcp_sendmsg 115832 2.6099 tcp_recvmsg 106537 2.4005 tcp_v4_rcv 103915 2.3414 tcp_ack 103534 2.3328 system_call 93864 2.1150 dev_queue_xmit 86831 1.9565 __mod_timer 85891 1.9353 tcp_rcv_established 84083 1.8946 .text.task_rq_lock
oprofile results – libmysql/mysqld • libmysql/mysqld • Much CPU time spent in mysqld • Parsing SQL is slow • schedule() is called frequently
oprofile results – HandlerSocket • CPU usage inside MySQL with HandlerSocket samples| %| ------------------ 1919039 51.0453 vmlinux 811998 21.5987 mysqld 421215 11.2041 libpthread-2.5.so 207166 5.5105 e1000e 191566 5.0955 handlersocket.so 188618 5.0171 libc-2.5.so 13622 0.3623 oprofiled 5707 0.1518 oprofile
oprofile results – HandlerSocket • CPU consumption in mysqld samples % symbol name 119684 14.7394 btr_search_guess_on_hash 58202 7.1678 row_search_for_mysql 46946 5.7815 mutex_delay 38617 4.7558 my_pthread_fastmutex_lock 37707 4.6437 buf_page_get_known_nowait 36528 4.4985 rec_get_offsets_func 34625 4.2642 build_template(row_prebuilt_struct*, THD*, TABLE*, 20024 2.4660 row_sel_store_mysql_rec 19347 2.3826 btr_cur_search_to_nth_level 16701 2.0568 row_sel_convert_mysql_key_to_innobase 13343 1.6432 cmp_dtuple_rec_with_match 11381 1.4016 ha_innobase::index_read(unsigned char*, 11176 1.3764 dict_index_copy_types 10762 1.3254 mtr_memo_slot_release 10734 1.3219 ha_innobase::init_table_handle_for_HANDLER()
oprofile results – HandlerSocket • CPU consumption in the Linux kernel samples % symbol name 129038 6.7241 tcp_sendmsg 80080 4.1729 tcp_v4_rcv 69658 3.6298 dev_queue_xmit 66171 3.4481 .text.skb_release_data 63316 3.2994 __qdisc_run 60279 3.1411 tcp_recvmsg 59703 3.1111 ip_output 58462 3.0464 .text.skb_release_head_state 48876 2.5469 tcp_ack 48733 2.5394 __alloc_skb 45660 2.3793 ip_queue_xmit 44671 2.3278 tcp_transmit_skb
oprofile results – HandlerSocket • HandlerSocket • Most CPU time is consumed in the kernel • schedule() is not called frequently • Inside mysqld, innodb eats most CPU time
Threading model mysqld: • Thread per connection (MySQL 5) • Thread pooling (MySQL 6?)
Threading model HandlerSocket: • Small number of threads • Many connections per thread • Uses epoll() • Virtually unlimited number of concurrent connections • Small memory footprint
HandlerSocket reader thread handlersocket reader thread reads requests from many clients locks the DB, gets a read view locks/unlocks (1/#conns) times per request executes many requests unlocks the DB returns responses to clients
HandlerSocket writer thread handlersocket writer thread reads requests from many clients locks the DB, begins a transaction executes multiple ops in a single transaction executes multiple requests commits, and unlocks the DB returns responses to clients
Write throughput • Condition: • Durable write • sync_binlog = 1 • innodb_flush_log_at_trx_commit = 1 • innodb_support_xa = 1 • Write-back cache with BBU, or SSD • Throughput: • MySQL: up to 1000 qps • HandlerSocket: up to 30000 qps
How HandlerSocket locks tables • MyISAM: • Shared-exclusive lock • InnoDB: • Reader threads don’t block • Only one writer thread can be executed at the same time • HandlerSocket requests are deadlock-free • Only simple operations are supported
MySQL C/S protocol write(3, "L\0\0\0\3select column0,column1,column2,column3,column4 from db_1.table_1 where k=15", 80) = 80 read(3, "\1\0\0\1\0056\0\0\2\3def\4db_1\7table_1\7table_1\7column0\7column0\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\3\3def\4db_1\7table_1\7table_1\7column1\7column1\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\4\3def\4db_1\7table_1\7table_1\7column2\7column2\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\5\3def\4db_1\7table_1\7table_1\7column3\7column3\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\6\3def\4db_1\7table_1\7table_1\7column4\7column4\f\r\0<\0\0\0\375\200\0\0\0\0\5\0\0\7\376\0\0\"\0\n\0\0\10\0010\0011\0012\0013\0014\5\0\0\t\376\0\0\"\0", 16384) = 327 SELECT column0, column1, column2, column3, column4 FROM db_1.table_1 where k = 15 when the above query is executed...
HandlerSocket C/S protocol when an equivalent query is executed using handlersocket... write(3, "1\t=\t1\t15\n", 9) = 9 read(3, "0\t5\t0\t1\t2\t3\t4\n", 8192) = 14
MySQL C/S protocol • The strace result shows that MySQL C/S protocol is verbose • Result-set metadata http://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#Field_Packet • Result-set metadata become very large if a result-set has many columns • Neither a HANDLER statement nor a server-side prepared statement does not help to avoid this problem
libhsclient • Client library for C++
Net::HandlerSocket • Client library for Perl • Invokes libhsclient via XS my $cli = new Net::HandlerSocket( {host => ‘localhost’, port => 9999}); $cli->open_index(1, ‘db1’, ‘table1’, ‘PRIMARY’, ‘k,v’); my $res = $cli->exec_multi([ [ 1, ‘=‘, [ ’33’ ], 1, 0 ], [ 1, ‘=‘, [ ’44’ ], 1, 0, ‘U’, [ ’44’, ‘hoge’ ] ], [ 1, ‘>=‘, [ ’55’ ], 10, 20 ], ]);
HandlerSocket configuration options • handlersocket_threads = 16 • Number of reader threads • Recommended value is the number of logical CPU • handlersocket_thread_wr = 1 • Number of writer threads • Recommended value is ... 1 • handlersocket_port = 9998 • Listening port for reader requests • handlersocket_port_wr = 9999 • Listening port for writer requests
Other configuration options • innodb_buffer_pool_size • As large as possible • innodb_log_file_size, innodb_log_files_in_group • As large as possible • innodb_thread_concurrency = 0 • open_files_limit = 65535 • Number of file descriptors mysqld can open • HandlerSocket can handle up to 65000 concurrent connections
Other configuration options • innodb_adaptive_hash_index = 1 • Adaptive has index is fast, but consume memory
Options related to durability • sync_binlog = 1 • innodb_flush_log_at_trx_commit = 1 • innodb_support_xa = 1
Benchmark • Server: • Core2Quad Q6600 • CentOS 5.4 • Single EXPI9301CT(e1000e) • Single Intel X25-E (write-back cache disabled) • Schema: • CREATE TABLE table1 (k varchar(32) KEY, v varchar(32)) engine = INNODB; • Read benchmark: • 10000000 records • SELECT v from table1 where k = ? • Random access • Write benchmark: • 10000000 records • UPDATE table SET v = ? where k = ? • Random access • Durable write • sync_binlog = 1 • innodb_flush_log_at_trx_commit = 1 • innodb_support_xa = 1
Issues • Difficult to build • Requires the source of mysql • MySQL binary compatibility?