450 likes | 618 Views
An Analysis of the Suitability of SQL and NoSQL for Commercial Applications. Project Group 12: Monika Puhazhendhi – A0075132N Wang Jun – A0078092Y Qiao Liang – U0905199 Nithiya Priya Ravichandran – A0069854M . A CS4221 project. Introduction.
E N D
An Analysis of the Suitability of SQL and NoSQL for Commercial Applications Project Group 12: Monika Puhazhendhi– A0075132N Wang Jun – A0078092Y Qiao Liang – U0905199 Nithiya Priya Ravichandran – A0069854M A CS4221 project
Introduction • Database volumes expanding with cloud computing • Multiple sources of data - customers, the public, point-of-sale devices, GPS, mobile devices, RFID readers, website, video and so on • Databases and Datastore – increasing importance
MySQL & NoSQL - Overview • MySQL • Tested Relation DBMS • SQL standard • ACID properties • Good performance with biz and web applications • NoSQL • “Not Only SQL” • Schema-less DBMS • BASE properties • Representation forms: Document-store, Key-value, XML, Graph, Column-store • Designed for flexibility and scalability
Project Objective ObjectiveAnalyze suitability of SQL and NoSQL for commercial applications through case studies Overview • Case studies • Pure SQL/NoSQL • Hybrid • Security Issues • Analysis of SQL vsNoSQL
Wikipedia - MySQL • Own DBMS - Media wiki • Primary DB model – MySQL • Provides multiple user access • Handles heavy weight queries • Efficient recovery from crash • Satisfactory performance
Why not NoSQL? • Intensive cross-indexing and multiple different views • Lose efficiency and data locality • Require moving of data from data to application layer • Sub-sections of wiki page • Individual version control along with normalization • Wastage of space due to duplication • Does not provide strong consistency and atomicity to handle high profile of data flow
LinkedIn - NoSQL • Back-end : MySQL for ACID properties • Front-end : MongoDB & Voldemort • Voldemort • Key-value storage • Efficient handling of million reads and writes daily • Useful to access and analyze large unstructured data • DataBus : real-time DB change capture system
Quora - MySQL • Social Q&A site with approx. 5m users • Pure MySQL DB solution • No intention to change to NoSQL anytime soon • Built on a range of database technologies
Quora - MySQL Why? • They don’t have to • Smaller user base than Facebook and Twitter • Read-heavy DB operations – Mitigate the load with read slaves • Able to scale up and scale out conventionally without issue • They don’t want to • Believe SQL is the “right way” to go – support for structured data, heavy lifting, no post-processing, efficient data storage with little redundancy • NoSQL not matured sufficiently
Twitter • “Online social networking and microblogging service that enables users to send and read ‘tweets’, which are text messages limited to 140 characters.” – wiki • 500m+ active users • Average tweets per day: 58m
Twitter - Early stage (2007) • MySQL with replica and memcache
Twitter - Current stage • Cassandra • High-velocity writes and low-velocity reads • Usage in geolocation, places of interest, real-time data analytics • Gizzard • Distributed data storage framework • Manages data partitioning across data stores
But it’s still MySQL • FlockDB built on Gizzard and MySQL for ID generation and graph storage
Twitter – Future Plan “For now, we’re not working on using Cassandra as a store for Tweets. This is a change in strategy. Instead we’re going to continue to maintain our existing MySQL-based storage. We believe that this isn’t the time to make large scale migration to a new technology.”
Facebook - Infrastructure 1. Front-end cluster: Web server & Memcache 2. Back-end cluster: Database (MySQL) 3. Auxiliary cluster: Inbox search, Ads,…
Facebook – Implementation • Tier 1 – Web Server • Tier 2 – Memcache • Tier 3 – Database
Facebook - MySQL • Key-value store • Why use MySQL? • Simplicity • Reliability • Speed • Random data distribution among many instances • Load balancing at physical node level • Custom partitioning scheme with global ID • Custom archiving scheme on per-user basis
Facebook - Memcache • Distributed memory caching system • Caching of query results • Primary and secondary indexing • Speed up dynamic database-driven websites • Advantages • Highly reliable • Low latency • High service rates • Disadvantages • Easily corrupted • Limited data model • Inefficient for small data • Aware of only own server
Facebook – Issues with MySQL • Local migration of data • Explosion of user traffic on single node • No support for global queries • Data-driven schemas • FB Objects and Associations • Higher level design abstraction • Automatic construction of SQL query and cache key • Performance issues – no well-defined schema
Facebook - Cassandra • Use in inbox search • Issue – Balance between normalization and join queries • Features • Denormalization to one huge table • Replication among multiple nodes • Delayed consistency
Facebook - HBase • Need for distributed fault-tolerant database and manage write-dominated workload • HBase – distributed DB • Based on Google’s BigTable • Integrated with Apache Hadoop and Zookeeper • Good scalability and performance • Simpler consistency model than Cassandra • HDFS
Google App Engine • Database products • Google CloudSQL • Google Datastore
Google App Engine - CloudSQL • Need for RDBMS • Until 2011, schema-less object data store based on GFS and BigTable • SQL-like syntax – “GQL” • No support for relational features – joins, sub-queries, multiple filters • CloudSQL • RDBMS service for development in Java and Python on GAE • Connect to MySQL instance in cloud • Import/export of instances • RESTful JSON API • Synchronous replication
Google App Engine - Datastore • Infinite scalability • Eventual consistency vs. Strong consistency • Data modelling
Why security Without security, bad things can happen: • Confidentiality: Attacker can access your business secret • Integrity: We store a FYP paper but retrieve a piece of joke • Availability: Cannot access the data when we need it • Authentication: A hacker can login to your database and do whatever he wants
Security Mechanisms of RDBMS • Strong authentication mechanism • Encryption mechanism for data storage and transmission • Consistent transaction • Mature industry standard
NoSQL Security It does not mean no security protection but • Weak authentication and no authentication by default • Plain text storage and transmission • Motivation for vendors to offer security protection
NoSQL Security – User Role • System Design • Security Model • Secured on Application side
NoSQL Security – User Role • NoSQL is not SQL or RDBMS • Not designed for • Access or constrain control • Normal structured data • Replace RDBMS
Anlaysis of SQL vsNoSQL • No one perfect solution – Trade-offs • Decision is based on the particular use case. • Big players usually combine both to solve their various problems • NoSQL may take some time to mature as the community grows and commercial support catches up.
Where to use SQL? • If the application requires • Good range of expressive and powerful queries • Assurance of ACID properties • Better security mechanisms • Maturity of technology • If scalability is more important than ensuring ACID properties, NoSQL is should be chosen
Where to use NoSQL? • If the application requires • Easy, fast and high scalability • Flexible schema