Presentation on paper “Ten Rules for Scalable Performance in Simple Operation Datastores”

Presentation on paper “Ten Rules for Scalable Performance in Simple Operation Datastores” By 鄭秀青 (0156824)

Purpose of the paper • Ten rules to help anyone who wants to choose a DBMS that handles Simple Operations • Not considering general-purpose traditional row stores (GPTRS), such as MySQL and MSSQL • Simple Operations (SO) is a term coined by the authors: • Read or write a few items • Applies to the OLTP model

Rule #1 Look for shared-nothing scalability • Nodes share neither main memory nor disk • A collection of self-contained nodes is connected by a network • Greatly reduces synchronisation and locking overhead • Scales easily until the network bandwidth is exhausted
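A minimal sketch of the shared-nothing idea, assuming hash partitioning by key. The `ShardedStore` class and its methods are hypothetical names for illustration, and each "node" is simply a private Python dict standing in for a server with its own memory and disk.

```python
import hashlib


class ShardedStore:
    """Toy shared-nothing store: every node owns a private dict."""

    def __init__(self, num_nodes):
        # No structure is shared between nodes; each holds only its own keys.
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, key):
        # A stable hash of the key decides which single node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


store = ShardedStore(num_nodes=4)
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))
```

Because a key lives on exactly one node, a read or write of that key needs no locking or coordination with the other nodes.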

Rule #2 High-level languages are good and need not hurt performance • Programmers write less code, and the code is easier to understand • Programmers do not need to understand complex storage optimisations • Less maintenance when the database changes

Rule #3 Plan to carefully leverage main memory databases • Doubling the size of RAM does not make the database twice as fast • CPU overheads need to be considered • If the CPU overheads are handled poorly, performance improves only marginally even when the entire database is placed in memory
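The point about CPU overheads can be illustrated with an Amdahl-style estimate. This is a sketch under assumed numbers: the 30% I/O share is hypothetical and does not come from the paper, and `speedup` is a name introduced here.

```python
def speedup(io_fraction, io_speedup):
    """Overall speedup when only the I/O share of the work gets faster."""
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (cpu_fraction + io_fraction / io_speedup)


# Hypothetical workload: 30% of transaction time is disk I/O, 70% is CPU overhead.
# Even if memory makes the I/O part free, the CPU part still has to be paid.
best_case = speedup(io_fraction=0.3, io_speedup=float("inf"))
print(round(best_case, 2))
```

Under this assumption the upper bound is 1 / 0.7, well short of a large gain, which is why the CPU overheads have to be reduced as well.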

Rule #4 HA and automatic recovery are essential for SO scalability • Few clients today are willing to accept downtime in their SO applications • Most people want redundant hardware and a second copy of their data • Disaster recovery should be considered an extension of HA • The DBMS you choose should have built-in high availability

Rule #5 On-line everything • Users want their database to be “up” all the time • In addition to failure recovery, other reasons for taking a DB offline should be considered: • Schema changes • Index changes • Reprovisioning • Software upgrades • The actions above should be performed without interrupting the DB service

Rule #6 Avoid multi-node operations • A multi-node operation is one that spans several servers • If the majority of operations involve several servers, the advantages of scalability may be lost, because the overheads of cross-server communication and synchronisation increase dramatically
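A small sketch of the difference between single-node and multi-node operations, assuming data is partitioned by a hypothetical integer `customer_id` across `NUM_NODES` servers. All names here are illustrative.

```python
NUM_NODES = 4  # assumed cluster size


def node_of(customer_id):
    # Partitioning function: each customer lives on exactly one node.
    return customer_id % NUM_NODES


def nodes_touched(customer_ids):
    # The set of nodes an operation must contact.
    return {node_of(c) for c in customer_ids}


single = nodes_touched([17])        # e.g. update one customer's order
multi = nodes_touched(range(100))   # e.g. a report over all customers
print(len(single), len(multi))      # prints "1 4"
```

An operation on one customer stays on one node, while the report has to coordinate every node in the cluster. Choosing the partitioning key so that the common operations look like the first case is what keeps the cross-server overhead down.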

Rule #7 Don’t try to build ACID yourself • Use a DBMS that provides ACID • Do not try to code ACID at the application level, as it: • Complicates the design • Is difficult to maintain
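As an illustration of leaving ACID to the DBMS, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `account` table and the `transfer` function are hypothetical; the database's own transaction and `CHECK` constraint do the work, with no hand-written undo logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


transfer(conn, 1, 2, 30)
try:
    transfer(conn, 1, 2, 500)  # would overdraw account 1, so the CHECK fails
except sqlite3.IntegrityError:
    pass  # the credit to account 2 was rolled back by the database

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # prints {1: 70, 2: 30}
```

The failed transfer leaves no partial update behind, and the application never had to track what to undo.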

Rule #8 Look for administrative simplicity • Choose a DBMS with easy-to-use administrative tools, including • Installation • Schema construction • Application design • Data distribution • Tuning • Monitoring

Rule #9 Pay attention to node performance • “Node performance is less important compared to linear scalability” is a misconception • Assume solution A provides node performance a factor of 20 better than solution B • If solution A requires 50 hardware nodes • Solution B will need 1000 nodes for the same workload
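The arithmetic on this slide can be checked with a short sketch. It assumes both solutions scale linearly; the throughput figures are hypothetical and chosen only so that solution A comes out at 50 nodes.

```python
import math


def nodes_needed(target_tps, tps_per_node):
    # With linear scalability, node count is total load divided by per-node throughput.
    return math.ceil(target_tps / tps_per_node)


TPS_B = 1_000            # assumed per-node throughput of solution B
TPS_A = 20 * TPS_B       # solution A is 20 times faster per node
TARGET_TPS = 1_000_000   # assumed total workload

nodes_a = nodes_needed(TARGET_TPS, TPS_A)
nodes_b = nodes_needed(TARGET_TPS, TPS_B)
print(nodes_a, nodes_b)  # prints "50 1000"
```

Both solutions scale linearly here, yet B needs 20 times the hardware, power, and administration for the same workload.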

Rule #10 Open source gives you more control over your future • Avoid expensive software license and upgrade fees • Proprietary technical support is not always superior • Several vendors provide open source software consultancy and technical support

Conclusion • Use the DB's built-in functions whenever possible • The performance of an individual node is as important as the overall performance • Use open source software to avoid large license bills
