0 to 60 in 3.1 • Tyler Carlton • Cory Sessions
The Project • Medium-sized demographics data-mining project • 1,700,000+ user base • Hundreds of data points per user
“Legacy” System – Why Upgrade? • Main DB (external users) • Offline backup (internal users) • Weekly manual copy backups • Max of 3 simultaneous data pulls • 8hr+ pull times for complex data pulls • Random index corruption
Notes: Smaller is Better • On average, CPU usage with MySQL was 20% lower than with our old database solution
Why We Chose MySQL Cluster • Scalable • Distributed processing • Five-nines (99.999%) reliability • Instant data availability between internal & external users
What We Built – NDB Data Nodes • 8-node NDB cluster • Dual Core 2 Quad CPUs @ 1.8 GHz • 16 GB RAM (data memory) • 6× 15k RPM SAS drives in RAID 10
What We Built – API & MGMT Nodes • 3 API nodes + 1 management node • Dual Core 2 Quad CPUs @ 1.8 GHz • 8 GB RAM • 300 GB 7200 RPM drives (RAID 0) • A config sketch for this topology follows below
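As a rough sketch, the topology above maps to a MySQL Cluster config.ini along these lines (hostnames and the memory split are illustrative assumptions, not the exact production file):

    # config.ini sketch: 1 management node, 8 data nodes, 3 API nodes
    [ndbd default]
    NoOfReplicas=2        # two replicas across the 8 data nodes
    DataMemory=12G        # leave headroom under the 16 GB of physical RAM
    IndexMemory=2G

    [ndb_mgmd]
    HostName=mgmt1        # hypothetical hostname

    [ndbd]
    HostName=data1        # one [ndbd] section per data node (data1 through data8)

    [mysqld]
    HostName=api1         # one [mysqld] section per API node (api1 through api3)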
NDB Issues with a Large Data Set • NDB load times • Loading from backup: ~1 hour • Restarting NDB nodes: ~1 hour • Note: load times differ depending on your data size (a restore sketch follows below)
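For reference, a restore of this kind is driven with ndb_restore; a minimal sketch, with the connect string, node IDs, backup ID, and path all placeholders:

    # restore table metadata once (-m), then restore data (-r) per data node
    ndb_restore -c mgmt1 -n 2 -b 1 -m -r --backup_path=/backups/BACKUP-1
    ndb_restore -c mgmt1 -n 3 -b 1 -r --backup_path=/backups/BACKUP-1
    # repeat with -n set to each remaining data node ID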
NDB Issues with a Large Data Set • Indexing issues • Force index (NDB picks the wrong one) • Index creation/modification order matters (Seriously!) • Local checkpoint tuning (see the config sketch below) • TimeBetweenLocalCheckpoints – 20 means 4 MB (4 × 2^20 bytes) of write operations • NoOfFragmentLogFiles – number of 4 × 16 MB redo log file sets • None deleted until 3 local checkpoints complete • On startup: local checkpoint buffers would overrun • RTFM (two, maybe three times)
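When NDB picks the wrong index, an index hint is the workaround; the table and index names here are hypothetical:

    -- force the optimizer onto a known-good index
    SELECT user_id FROM demographics FORCE INDEX (idx_zip) WHERE zip = '84042';

And a minimal config.ini sketch of the checkpoint tuning above (the NoOfFragmentLogFiles value is illustrative; size it from your own write rate so the redo log survives 3 local checkpoints):

    [ndbd default]
    # 20 => start a local checkpoint after 4 MB (4 * 2^20 bytes) of writes
    TimeBetweenLocalCheckpoints=20
    # number of 4 x 16 MB redo log file sets; undersizing overruns the log
    NoOfFragmentLogFiles=300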
NDB Network Issues • Network transport packet size • Buffer would fill and overrun • This caused nodes to miss their heartbeats and drop • This would happen when: • A backup was running • A local checkpoint was running at the same time • Solved by: increasing the network packet buffer (see the sketch below)
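In config.ini terms, the fix amounts to raising the TCP transporter buffers; a sketch with illustrative sizes:

    [tcp default]
    SendBufferMemory=4M      # larger buffers absorb backup + checkpoint
    ReceiveBufferMemory=4M   # traffic without dropping heartbeats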
Issues – IN Statements • IN statements die with engine_condition_pushdown=ON once the set reaches approximately 10,000 items (triggered in our case by zip code lists) • We really want engine_condition_pushdown=ON, but it broke this for us, so… we had to disable it (see below)
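In the MySQL 5.0/5.1-era servers this deck covers, pushdown is controlled by the engine_condition_pushdown system variable; disabling it for a session looks like this (it can also be set in my.cnf):

    -- turn off condition pushdown so large IN lists are evaluated
    -- on the API node instead of being pushed to the data nodes
    SET SESSION engine_condition_pushdown = OFF;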
Structuring Apps: Redundancy • Redundant power supplies + dual power sources • Port trunking w/ redundant Gig-E switches • NDB replicas: 2 (2×4 node group setup), so 8 nodes × 16 GB ÷ 2 replicas = 64 GB max data size • MySQL (API node) heads: load balanced with automatic failover
Structuring Apps: Internal Apps • Ultimate goal: offload the data-intensive processing to the MySQL nodes
The Good Stuff: Stats! Queries per Second (over 20 days) • 1100-1500 queries/sec average during our peak times • 250 queries/sec average overall
Website Traffic Stats for March 2008
Net Usage: NDB Node • All NDB data nodes show nearly identical network bandwidth usage • MySQL (API) nodes use about 9 MB/s max under our current structure • Totaling 75 MB/s (600 Mb/s) during peak
Monitoring & Maintenance • SNMP monitoring: CPU, network, memory, load, disk • Cron scripts (a sketch follows below): • Node status & node-down notification • Backups • Database maintenance routines • The MySQL Clustering book provided the basis for the scripts
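A minimal sketch of the node-status cron check; the connect string, grep pattern, and mail recipient are assumptions, not the production script:

    #!/bin/sh
    # cron: poll cluster status and mail ops if any node is disconnected
    STATUS=$(ndb_mgm -c mgmt1 -e show 2>&1)
    if echo "$STATUS" | grep -q "not connected"; then
        echo "$STATUS" | mail -s "NDB node down" ops@example.com
    fi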
Net Usage: Next Steps… Dolphin NIC Testing • 4-node test cluster • 4× overall performance • Brand-new patch handles automatic Ethernet/Dolphin failover (beta as of March 28)
Contact Information • Tyler Carlton www.qdial.com tcarlton@gmail.com • Cory Sessions CorySessions.com OrangeSoda.com