320 likes | 391 Views
Ten Thousand Tables Can’t Be Wrong…. Richard Chart – Co-Founder, ScienceLogic, LLC. rchart@sciencelogic.com. www.sciencelogic.com. This Presentation is about…. An unusual approach to application scaling using MySQL
E N D
Ten Thousand Tables Can’t Be Wrong… Richard Chart – Co-Founder, ScienceLogic, LLC rchart@sciencelogic.com www.sciencelogic.com
This Presentation is about… • An unusual approach to application scaling using MySQL • The design choices we made early in product design, and how they look a few years later • A few hard earned lessons in what to look for as an application grows beyond its roots
…Covering • The decision path that led to using MySQL in this way • The growth path that got us to current scale • Along the way I will talk about: • What we discovered about MySQL write performance • While real world monitoring is important… understanding how to interpret the results is even more so
In the beginning… • …or at least the beginning for ScienceLogic: 2003 • Product concept: IT Management Appliance • A series of choices - fundamental technology for each layer: • Hardware • Operating System • Application Architecture • Database technology & architecture Sweet Layers – Shirley Shelton
Technology Choices • Hardware/OS/architecture – for another day • Database Technology: MySQL • Performance • Stability • Experience • Support (increasingly important)
Scale • Starting point: single appliance supporting 200 devices • Goal: extensible architecture with - • very deep host and application monitoring • 1,000+ devices, each with 100+ management points • Current reality – • several times the scale of the original goal • Future: The fundamentals are sound: next generation of the product moves up another order of magnitude
Database Architecture • The challenge: • Wide range of monitored nodes (100…1,000+) • Wide range of data points (ping…hundreds of HW, OS and application metrics) • Distributed collection (WAN: latency, security concerns) • Commercial product: MySQL dual licensing comes into play
Database Architecture • Data Characteristics: • 90+% Writes • ACID not important • Resilient to loss • (Remaining data is not invalidated by gaps) • Data elements valuable by themselves • Data elements much more valuable when • relationships added
The MyISAM Fit • Very fast writes when no lock contention • Simple data handling • Lower license cost that InnoDB (important only for those of us shipping commercial products) • Not transactional – and we don’t care! Not InnoDB (all the opposite reasons) • Slower to stuff with data • More expensive (for us commercial folks)
What about memory tables? • Limited applicable areas in EM7: most of the data has to live on disk for reporting weeks or months in future • No measurable benefit over MyISAM in transient data areas where we could use them in EM7 • Because we take advantage of MyISAM cached indexes and the required data is in the index • …but the application continues to evolve, we will use them in future if the right situation occurs
Scaling without lock contention • The ace in the hole • Dynamically created tables • No more than one thread writing to a table at once • Separate thread consolidates data for reporting across devices • This approaches the sharding architecture used in highly scalable web sites, but with core data stored centrally
Dynamic Table Creation • NO_TABLE = 1146 # MySQL error code • try: • db.execute(“INSERT INTO dynamic_app_%s.stat_%s • VALUES (10, ’sample data’)”, (app, device)) • except MySQLdb.Error, e: • if e.args[0] == NO_TABLE: • db.execute(“CREATE DATABASE IF NOT EXISTS dynamic_app_%s”, (app)) • db.execute(“CREATE TABLE IF NOT EXISTS dynamic_app_%s.stat_%s • LIKE dynamic_app_0.stat_0”,(app, device))
How far does this go? • So Far: • 20,000+ tables • 2,200+ queries per second • 5 billion rows • 93% writes • Next limit is how quickly data can be stuffed onto disk
The Database Platform • 4 x Intel Xeon Dual Core 7140M, 16MB Cache, 3.4GHz, 800MHz FSB • 16GB (16 x 1GB) 400MHz Single Ranked DIMMs • Hardware RAID Controller • 10 x 146GB 15k RPM SAS Drives (RAID10) • Linux 2.6 kernel • MySQL 5.0.x
Managing Performance With Growth • As usage rates escalate, things that once were fine become an issue…eg: • Query Cache entries purged due to too many entries, or too many changes to underlying tables • Lock contention • Sort data set size causing increased • created_tmp_disk_tables rather than created_tmp_tables
Open Files & Tables • Critical when scaling this way • show global status like 'open%' • Open Files (Linux) • /etc/security/limits.conf • mysql hard nofile 20480 • mysql soft nofile 20480 • /etc/my.cnf • [safe_mysqld] • open_files_limit = 20480 • [mysqld] • table_cache=8192
Misc MyISAM Tuning Helpers • concurrent_insert = 2 • 1 (Default) Enables concurrent insert for MyISAM tables that don't have holes • 2 Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole, new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole. • myisam_recover = QUICK, BACKUP
Trends • Very important to measure DB stats over time • show global status like ‘opened_tables'; • Point in time counter useless in its own right • …very valuable with 5 minute poll and graphed deltas • Some stats must be combined to be useful • Percentage of requests waiting for locks (deltas) (table_locks_waited/(table_locks_waited + table_locks_immediate) * 100
Monitoring • MySQL Enterprise Monitor • Very worthwhile tool – take advantage of it if you subscribe to MySQL Enterprise support • Other Tools • EM7 monitors databases (as well as servers, routers, firewalls, etc. etc.), so of course we use that…
Some monitored changes are obviously bad… eg increase in created_tmp_disk_tables Monitoring Caveat! • Some monitors are misleading… …what’s going on here?
What really happened: • Misconfigured clients caused the CPU load • created_tmp_tables unconnected with the CPU load • EXPLAIN showed small row sets being sorted in memory tmp tables
Summarizing & Pruning Data • Keeping the source statistics tables small is key for ongoing performance – • Summarized data for reporting • Infrequent writes, regular reads - MyISAM fine here also in most cases • In EM7 we summarize hourly, daily, monthly etc. • Retention periods configurable • DELETEs suck MySQL performance… • can use a deleted row marker • If you can do a purge in your app, you’re golden • In EM7 we schedule DELETEs for a nightly quiet time
Summarizing Data With Dynamic Stored Procedure • USE dynamic_app_data_43; • DROP PROCEDURE IF EXISTS dynamic_app_43.dynamictest; • DELIMITER // • CREATE PROCEDURE dynamic_app_43.dynamictest(n INTEGER) • BEGIN • SET @s = ""; • SET @s = CONCAT(@s," INSERT INTO app_crunched • (did,object,ind,year,month, date,average, total, poll_count)"); • SET @s = CONCAT(@s," SELECT ",n,",object,ind,YEAR(date),MONTH(date),date,0,0,0"); • SET @s = CONCAT(@s," FROM stat_", n); • SET @s = CONCAT(@s," WHERE crunched = 0"); • #SELECT @s; • PREPARE exe FROM @s; • EXECUTE exe; • END; • // • DELIMITER ; • CALL dynamictest(5);
What’s The Downside? • Reporting tools not good with dynamic databases and tables (eg Crystal Reports) • 80 : 20 rule: • Above a certain size of implementation, some tables just have to use row locking • (with EM7, 1,500 devices, 10,000 tables, and 0.1% need to be InnoDB… so should be the 99.9 : 0.1 rule) • Backup and data maintenance complexity of multiple engines
So… MyISAM all the way? • Well, no. • In our application, at larger sites, around a dozen tables need the characteristics of InnoDB, or around 0.1% • Be selective in the storage engine choice, consider relative merits for each part of the application
Questions Richard Chart rchart@sciencelogic.com
A hand on us… Contact for a pack: Richard Chart rchart@sciencelogic.com