
0 to 60 in 3.1


Presentation Transcript


  1. 0 to 60 in 3.1
     Tyler Carlton, Cory Sessions

  2. <Insert funny joke here>

  3. The Project
     • Medium-sized demographics data mining project
     • 1,700,000+ user base
     • Hundreds of data points per user

  4. “Legacy” System – Why Upgrade?
     • Main DB (external users) + offline backup (internal users)
     • Weekly manual copy backups
     • Max of 3 simultaneous data pulls
     • 8hr+ pull times for complex data pulls
     • Random index corruption

  5. Notes: Smaller Is Better
     • On average, CPU usage with MySQL was 20% lower than with our old database solution

  6. Why We Chose MySQL Cluster
     • Scalable
     • Distributed processing
     • Five-nines (99.999%) reliability
     • Instant data availability between internal & external users

  7. What We Built – NDB Data Nodes
     • 8-node NDB cluster
     • Dual Core 2 Quad @ 1.8 GHz
     • 16 GB RAM (data memory)
     • 6x 15k RPM SAS drives in RAID 10

  8. What We Built – API & MGMT Nodes
     • 3 API nodes + 1 management node
     • Dual Core 2 Quad @ 1.8 GHz
     • 8 GB RAM
     • 300 GB 7200 RPM drives (RAID 0)

  9. NDB Issues with a Large Data Set
     • NDB load times (restore sketch below)
       • Loading from backup: ~1 hour
       • Restarting NDB nodes: ~1 hour
     • Note: load times differ depending on your data size
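
  For reference, a restore from a native NDB backup in this generation of MySQL Cluster is run per data node with ndb_restore; the host, node IDs, backup ID, and path below are placeholders, not values from this deployment:

      # Restore metadata once (-m) along with data, from the first node's backup files
      ndb_restore -c mgmhost:1186 -n 2 -b 5 -m -r --backup_path=/backups/BACKUP/BACKUP-5
      # Restore data only (-r) for each remaining data node
      ndb_restore -c mgmhost:1186 -n 3 -b 5 -r --backup_path=/backups/BACKUP/BACKUP-5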

  10. NDB Issues with a Large Data Set
      • Indexing issues
        • Force indexes (NDB picks the wrong one)
        • Index creation/modification order matters (seriously!)
      • Local checkpoint tuning (config sketch below)
        • TimeBetweenLocalCheckpoints: a value of 20 means 4 MB (4 × 2^20 bytes) of write operations
        • NoOfFragmentLogFiles: number of redo log file sets (4 × 16 MB files each)
        • None are deleted until 3 local checkpoints have completed
        • On startup, the local checkpoint buffers would overrun
      • RTFM (two, maybe three times)
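
  A minimal config.ini sketch of the two checkpoint parameters named above; the values shown are illustrative, not the tuned values from this cluster:

      [NDBD DEFAULT]
      # Base-2 log of 4-byte words written between local checkpoints:
      # 20 means 4 MB (4 x 2^20 bytes) of write operations
      TimeBetweenLocalCheckpoints = 20
      # Redo log size: number of file sets, each 4 files x 16 MB;
      # a set is only reused after 3 local checkpoints complete
      NoOfFragmentLogFiles = 16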

  11. NDB Network Issues
      • Network transport packet size
        • The buffer would fill and overrun
        • This caused nodes to miss their heartbeats and drop
      • This would happen when a backup and a local checkpoint were running at the same time
      • Solved by increasing the network packet buffer (config sketch below)
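
  The slide does not name the exact parameter; one plausible reading is the per-transporter send/receive buffers, which MySQL Cluster sizes with SendBufferMemory and ReceiveBufferMemory in config.ini. A hedged sketch, with illustrative values:

      [TCP DEFAULT]
      # Larger transporter buffers so combined backup + local checkpoint
      # traffic cannot overrun them and starve heartbeats (values illustrative)
      SendBufferMemory = 8M
      ReceiveBufferMemory = 8M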

  12. Issues - IN Statements
      • IN statements die with engine_condition_pushdown=ON once the set reaches approx. 10,000 items (triggered by ZIP codes)
      • We really need engine_condition_pushdown=ON, but it broke this for us, so we had to disable it (workaround below)
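
  A sketch of the failure shape and the workaround; the table and column names are hypothetical, and in the MySQL 5.0/5.1 of this era condition pushdown was toggled via the engine_condition_pushdown system variable:

      -- Work around the bug by disabling pushdown for this session only
      SET SESSION engine_condition_pushdown = OFF;
      -- A query shaped like the one that failed: ~10,000 ZIP codes in one IN list
      SELECT user_id, age, income
      FROM demographics              -- hypothetical table name
      WHERE zip IN ('84601', '84604', '90210', /* ... ~10,000 values ... */ '10001');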

  13. Structuring Apps: Redundancy
      • Redundant power supplies + dual power sources
      • Port trunking with redundant Gig-E switches
      • NDB replicas: 2 (2x4 setup), 64 GB max data size (arithmetic below)
      • MySQL heads (API nodes): load balanced with automatic failover
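
  The 64 GB ceiling falls out of the arithmetic: 8 data nodes × 16 GB of data memory each, divided by 2 replicas, leaves 64 GB of unique data. In config.ini terms (DataMemory taken from the slide's 16 GB figure, shown here only to illustrate the math):

      [NDBD DEFAULT]
      NoOfReplicas = 2     # every fragment stored on 2 nodes -> four 2-node groups
      DataMemory = 16G     # per data node; 8 nodes x 16G / 2 replicas = 64G usable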

  14. Structuring Apps: Internal Apps
      • Ultimate goal: offload the data-intensive processing to the MySQL nodes

  15. The Good Stuff: Stats!
      Queries per second (over 20 days):
      • Average of 1,100-1,500 queries/sec during our peak times
      • Average of 250 queries/sec overall

  16. Website Traffic Stats for March 2008

  17. Net Usage: NDB Nodes
      • All NDB data nodes have nearly identical network bandwidth usage
      • MySQL (API) nodes use about 9 MB/s max under our current structure
      • Totaling 75 MB/s (600 Mb/s) during peak

  18. Monitoring & Maintenance
      • SNMP monitoring: CPU, network, memory, load, disk
      • Cron scripts (node-down sketch below):
        • Node status & node-down notification
        • Backups
        • Database maintenance routines
      • The MySQL Clustering book provided the base for the scripts
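
  A minimal sketch of the node-down cron check, assuming the management node answers on the standard port; the host and alert address are placeholders:

      #!/bin/sh
      # Ask the management node for cluster status; down data nodes
      # are reported as "not connected" in the output of SHOW.
      STATUS=$(ndb_mgm -c mgmhost:1186 -e show 2>&1)
      if echo "$STATUS" | grep -q "not connected"; then
          echo "$STATUS" | mail -s "NDB node down" admin@example.com
      fi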

  19. Next Steps: Dolphin NIC Testing
      • 4-node test cluster
      • 4x overall performance
      • Brand-new patch to handle automatic Ethernet/Dolphin failover (beta as of March 28)

  20. Questions?

  21. Contact Information
      • Tyler Carlton: www.qdial.com, tcarlton@gmail.com
      • Cory Sessions: CorySessions.com, OrangeSoda.com
