
Analysis of Cloud Data Management Systems

Student: Miro Szydlowski. Supervisor: Prof. Mehmet Orgun. Date: 11.11.11.

Presentation Transcript


  1. Analysis of Cloud Data Management Systems Student: Miro Szydlowski Supervisor: Prof. Mehmet Orgun Date: 11.11.11

  2. INTRODUCTION • ? • Distributed Databases • NoSQL • Cloud Data Stores • Relational Database Management Systems

  3. Presentation Plan • Origins of Database Management Systems • Rise to power • ACID qualities • Problems and Solutions • Consequences of being popular • Partitioning, Replication, Load Balancing • Distributed Database Management Systems • Challenges of the Connected World • Cloud Computing • Definition, Types • Place of DBMS in the Cloud • Cloud Data Management Systems • CAP, BASE, NoSQL and a few other concepts • NoSQL by implementation type • Example: Amazon SimpleDB • Which one to choose?

  4. Database Management Systems • “…set of software programs that control the organisation, storage, management and data retrieval” • Database Models: • Hierarchical • Network • Relational • Object-relational

  5. Origins of Relational Database Management Systems • 1970: relational model proposed by E. F. Codd at IBM; early implementations followed, including Ingres at the University of California, Berkeley • In the following 20 years RDBMS became not only accepted and essential, but were considered the only solution for enterprise data storage • Why? • Data normalisation • Metadata reuse • User Views <-> Community View <-> Storage • SQL! • Guarantees data integrity - ACID

  6. ACID • Atomicity • Consistency • Isolation • Durability • Provides consistent state of the database • …but at a cost
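
A minimal sketch of atomicity and consistency, using Python's standard `sqlite3` module. The `accounts` table, the `transfer` helper and the non-negative balance rule are illustrative assumptions, not part of the presentation. The credit is applied first on purpose, so that a failed debit shows the whole transaction being rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is undone as well (atomicity).
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, rolled back
print(ok, failed, balances(conn))
```

The "cost" mentioned on the slide shows up here as the constraint check and the transaction bookkeeping performed on every write.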

  7. Problems and Solutions • Very successful solution, but the businesses were growing… • Data volume • Data warehousing, business intelligence • Mergers and acquisitions • WWW • New Solutions: • Partitioning (hardware, horizontal, vertical) • Replication (multi-master, master-slave) • Load Balancing • …and finally • Distributed Database Management Systems • …but the challenges kept coming…
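
To make the difference between horizontal and vertical partitioning concrete, here is a small sketch on in-memory rows. The `customers` data and both helper functions are hypothetical; a real RDBMS does this at the storage level rather than in application code.

```python
customers = [
    {"id": 1, "name": "Ann", "country": "AU", "notes": "prefers email"},
    {"id": 2, "name": "Jan", "country": "PL", "notes": "key account"},
    {"id": 3, "name": "Bob", "country": "AU", "notes": "new customer"},
]


def horizontal_partition(rows, key):
    """Split by rows: every partition keeps all columns of its rows."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Split by columns: every partition keeps the key plus one column group."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


by_country = horizontal_partition(customers, "country")
core, extra = vertical_partition(customers, "id", [("name",), ("country", "notes")])
print(sorted(by_country), core[0], extra[0])
```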

  8. Challenges of the Connected World • Search Engines • Mobile Devices • Business-To-Business (Web Services) • Stream Processing • Data Warehousing • Directory Services • Current example: 2011 Twitter statistics: • 1 billion Tweets per week • 140 million Tweets per day on average • 177 million Tweets sent on March 11, 2011 • Current record: 6,939 Tweets per second (TPS), set 4 seconds after midnight in Japan on New Year’s Day • New Solutions needed ASAP
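
A quick back-of-the-envelope check of the Twitter figures above, written out in Python. It only uses the numbers quoted on the slide; the variable names are ours.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_week = 1_000_000_000
tweets_per_day_avg = 140_000_000
peak_tps = 6_939

# One billion per week is roughly consistent with 140 million per day.
daily_from_weekly = tweets_per_week / 7

# Average sustained write rate implied by the daily figure.
avg_tps = tweets_per_day_avg / SECONDS_PER_DAY

# How far the record peak sits above the average load.
peak_to_avg = peak_tps / avg_tps

print(f"{daily_from_weekly / 1e6:.1f} million/day, "
      f"{avg_tps:.0f} TPS average, peak is {peak_to_avg:.1f}x the average")
```

The point for capacity planning is the gap between the two rates: a system sized for the average load would be overwhelmed several times over at the peak.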

  9. What is Cloud Computing? • Lots of definitions, one of them below: • “…a pool of highly scalable, abstracted infrastructure, capable of hosting end-customer applications, that is billed by consumption” (James Staten) • Automation • Virtualization • Scalability • Pay-as-you-go pricing model

  10. Cloud Computing Types • By Deployment Type • By Service Type • Cloud Data Management Systems? IaaS or PaaS

  11. Dark Cloud • Beginning of 21st century – open critique of the relational database management systems: • Too complex for an average user • Can’t cope with data volumes • Relational mapping is overkill • One size doesn’t fit all – we want to prioritize some features • Why do we need to build the ORM? • Distributed RDBMSs are fake! • Scalability! Why don’t we re-engineer and rebuild instead of constantly ‘patching’ RDBMS?

  12. CAP and BASE • Eric Brewer, at the ACM Symposium on Principles of Distributed Computing (PODC) in 2000, made a statement: • It is unachievable to implement all three qualities of a “shared-data system” at once: • Consistency • Availability • Partition Tolerance • …so – pick any two! • Since we can’t guarantee ACID, let’s BASE our systems on another principle: • Basically Available • Soft State • Eventually Consistent • These two ideas changed the approach to database design… • …and gave birth to the ‘NoSQL’ movement
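
A toy illustration of "eventually consistent": writes are acknowledged by a primary copy immediately and reach the replicas only when replication runs. The class and method names are hypothetical, and a real store would replicate in the background over a network instead of through an explicit `sync()` call.

```python
from collections import deque


class EventuallyConsistentStore:
    """One primary copy plus replicas that are updated asynchronously."""

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()  # writes not yet shipped to the replicas

    def put(self, key, value):
        # The write is acknowledged as soon as the primary has it
        # (basically available); replicas lag behind (soft state).
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, index, key):
        # May return stale data until replication has caught up.
        return self.replicas[index].get(key)

    def sync(self):
        # Apply queued writes in order; afterwards all copies agree.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = EventuallyConsistentStore()
store.put("user:1", "Miro")
stale = store.read_replica(0, "user:1")  # replica has not seen the write yet
store.sync()
fresh = store.read_replica(0, "user:1")  # now consistent with the primary
print(stale, fresh)
```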

  13. A few new concepts • Hash-based partitioning • a certain property of each entity is used to calculate a hash value, which determines which database server stores the entity • ‘Shared nothing’ architecture • cluster of independent machines that communicate over a high-speed network • Sharding • splitting up a database across multiple machines • MapReduce • not a database system, but a programming framework • every job sent is divided into two parts: a ‘Map’ and a ‘Reduce’
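
Two of the concepts above can be sketched in a few lines of Python. `shard_for` picks a server from a hash of the entity key (MD5 is an arbitrary choice here, made only because it is stable across runs), and `map_reduce` is a single-process imitation of the Map and Reduce phases using word counting as the job. All names are illustrative; real frameworks distribute both phases across machines.

```python
import hashlib
from collections import defaultdict


def shard_for(key, n_servers):
    """Hash-based partitioning: map an entity key to a server index."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_servers


def map_reduce(records, mapper, reducer):
    """Run the map phase, group values by key, then run the reduce phase."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # Map: emit (key, value) pairs
            groups[key].append(value)
    # Reduce: combine all values that share a key
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    return sum(counts)


lines = ["pick any two", "pick the right tool", "any tool"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
placement = {key: shard_for(key, 4) for key in ("user:1", "user:2", "user:3")}
print(word_counts, placement)
```

Note that plain modulo hashing moves most keys when the number of servers changes; the slide does not say how the surveyed systems handle that.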

  14. NoSQL Movement • Their main objection: unnecessary complexity of the relational databases • Motto: “select the right tool for the job” • “Tool in the box” approach • Principles of NoSQL data stores: • Built for performance • Built for real scalability • Built for high availability • Typically use a very specific data access pattern • Either schemaless or implementing very simple schemas • Weak consistency guarantees • Declarative query language (such as SQL) replaced with simple APIs

  15. NoSQL Databases by Implementation Type • Key/Value Stores • BigTable • Document-based • Columnar • (also graph, object-oriented, distributed object stores and dozens of others…)

  16. Key/Value Stores • Data is stored as a key/value pair • Basic APIs – Put/Get/Remove • Scalability: Sharding or Replicating data items • Advantages: Performance and scalability • Best For: High-performance systems that deal with one type of object • Examples: HBase, SimpleDB, Cassandra • Potential Issues: Data integrity has to be enforced by the application; supports only one type of query
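
The Put/Get/Remove interface is small enough to sketch in full. This in-memory class is a hypothetical stand-in and not the API of any product named on the slide; it shows why lookup by key is the only query such a store supports.

```python
class KeyValueStore:
    """Minimal key/value store exposing only Put, Get and Remove."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value; integrity checks are
        # left to the application.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def remove(self, key):
        """Delete the key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


kv = KeyValueStore()
kv.put("session:42", {"user": "miro", "cart": ["book"]})
session = kv.get("session:42")
removed = kv.remove("session:42")
missing = kv.get("session:42")
print(session, removed, missing)
```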

  17. ‘BigTable’ Databases • Named after Google’s ‘BigTable’ implementation • Each row can have a different set of columns • A row can have thousands of columns • Records can have multiple fields • Records are indexed by [row-key, column-key, timestamp] • Usually sharded • Advantages: Highly optimized for write operations, highly scalable, (quoted) extremely even performance • Examples: Google Analytics, Google Docs, Microsoft Azure Tables • Potential Issues: Lack of text search, very difficult to import and export data – query times out after 30 sec
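
The [row-key, column-key, timestamp] index can be pictured as a sparse map that keeps several timestamped versions per cell. The `SparseTable` class below is a hypothetical in-memory sketch of that data model only; it says nothing about how Google's implementation stores or shards the data.

```python
class SparseTable:
    """Cells addressed by (row key, column key, timestamp)."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])  # oldest first

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), [])
        visible = [v for ts, v in versions if timestamp is None or ts <= timestamp]
        return visible[-1] if visible else None

    def columns(self, row):
        """Rows are sparse: each row only has the columns written to it."""
        return sorted(col for (r, col) in self._cells if r == row)


table = SparseTable()
table.put("page:home", "title", "Welcome", timestamp=1)
table.put("page:home", "title", "Welcome back", timestamp=5)
table.put("page:about", "author", "Miro", timestamp=2)

latest = table.get("page:home", "title")
older = table.get("page:home", "title", timestamp=3)
print(latest, older, table.columns("page:home"), table.columns("page:about"))
```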

  18. Document Databases • Completely schemaless • All document data is stored in the document itself • Documents are usually encoded in JSON, BSON or XML • Scalability: good, implemented through asynchronous replication • Advantages: client application can store data in its final form; support for custom views • Examples: CouchDB, MongoDB, Terrastore • Best For: wikis, blogs, document management systems • Potential Issues: They actually don’t outperform RDBMS, not well supported
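
A sketch of the schemaless idea using JSON and Python's standard `json` module. `DocumentStore` is a hypothetical toy, not the API of CouchDB or MongoDB; the point is that two documents in the same store need not share any fields, and that a "view" is just a function applied to every document.

```python
import json


class DocumentStore:
    """Stores each document whole, serialized as JSON."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, document):
        self._docs[doc_id] = json.dumps(document)

    def load(self, doc_id):
        return json.loads(self._docs[doc_id])

    def find(self, predicate):
        """Return the ids of all documents for which predicate(doc) is true."""
        return [
            doc_id
            for doc_id, raw in self._docs.items()
            if predicate(json.loads(raw))
        ]


docs = DocumentStore()
docs.save("post-1", {"type": "blog", "title": "CAP explained", "tags": ["nosql"]})
docs.save("page-7", {"type": "wiki", "title": "ACID", "revisions": 3})

post = docs.load("post-1")
wiki_ids = docs.find(lambda d: d.get("type") == "wiki")
print(post["tags"], wiki_ids)
```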

  19. Columnar Databases • ‘Between’ SQL and NoSQL – can use SQL syntax, but use wide columns • Each column is stored separately, at a different disk location • Scalability and Performance: both good because rows and columns can be split across multiple nodes: rows – sharding, columns – column groups • Advantages – great when you need data aggregation • Examples: Vertica, HBase • Best At: Data warehousing, data mining • Potential Issues: Not great at handling complex relationships; better than RDBMS only when row size is big and not many columns of a single row are required
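
Why column storage helps aggregation can be seen by laying the same data out both ways. The `orders` rows and the `to_columns` helper are illustrative assumptions; real columnar engines add compression and on-disk layout on top of this idea.

```python
orders = [
    {"order_id": 1, "region": "APAC", "amount": 120},
    {"order_id": 2, "region": "EMEA", "amount": 80},
    {"order_id": 3, "region": "APAC", "amount": 200},
]


def to_columns(rows):
    """Turn a list of row dictionaries into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(orders)

# Row layout: every whole row is touched just to read one field.
total_from_rows = sum(row["amount"] for row in orders)

# Column layout: only the 'amount' column is read.
total_from_columns = sum(columns["amount"])

print(columns["region"], total_from_rows, total_from_columns)
```

The reverse case explains the slide's caveat: rebuilding one complete row from the column layout means visiting every column.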

  20. Example: Amazon SimpleDB • Data Store Type: Entity-Attribute-Value • Data Model: Document Store/Big Table • Cloud Type: Platform as a Service • The data model is based on domains, items, attributes and values: • Domains are currently limited to 10 GB each, and each account is limited to 100 domains • Domains are collections of items that are described by attribute-value pairs • Doesn’t have the concept of schema – everything is a string • Designed for reads rather than writes • Updates are made to the central database ONLY and then distributed to ‘slaves’ • Client interface: SOAP and REST • Availability: multiple geographically distributed copies of each data item • Scalability: Great • Pay-as-you-go model: Clients are charged by data storage, data transfer and machine utilization • Potential Issues: eventual consistency, no data types or constraints
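
The domain / item / attribute / value model, and the practical consequence of "everything is a string", can be sketched with nested dictionaries. This is a local illustration only and does not call the SimpleDB service; `put_attributes` and `pad` are hypothetical helpers, and zero-padding is shown as one common workaround for comparing numbers stored as strings.

```python
def put_attributes(domains, domain, item, attributes):
    """Store attribute values as strings under domain -> item -> attribute."""
    item_attrs = domains.setdefault(domain, {}).setdefault(item, {})
    for name, value in attributes.items():
        # An attribute may hold several values, and every value is a string.
        item_attrs.setdefault(name, set()).add(str(value))


def pad(number, width=6):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)


domains = {}
put_attributes(domains, "products", "item-1", {"colour": "red", "price": pad(9)})
put_attributes(domains, "products", "item-1", {"colour": "blue"})
put_attributes(domains, "products", "item-2", {"price": pad(10)})

# Strings compare character by character, so unpadded numbers sort wrongly.
naive_comparison = "9" < "10"          # False
padded_comparison = pad(9) < pad(10)   # True

print(domains["products"]["item-1"], naive_comparison, padded_comparison)
```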

  21. Summary – RDBMS or NoSQL? It depends… • If you have a low-volume, medium-complexity suite of applications, don’t change it – this is what RDBMS are good at • If your data is normalized and relies on joins, don’t move to schemaless NoSQL • If you’re looking for an off-the-shelf system and don’t want to get involved in customized development, choose RDBMS • If your problem can’t be resolved using RDBMS [e.g. you have serious scalability issues] and you’re determined to fix it at any cost, go ‘NoSQL’ • If you have access to sufficient quantities of sufficiently smart people, choose NoSQL

  22. Summary – RDBMS or NoSQL? ‘Choose the right tool for the job’

  23. Questions?
