1 / 28

CS 405G: Introduction to Database Systems

CS 405G: Introduction to Database Systems. 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky. Presentation next Tuesday and Thursday. Summary. Tree-based indexes: O(logN) for search and update, support range queries

ulema
Download Presentation

CS 405G: Introduction to Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky

  2. Presentation next Tuesday and Thursday. Chen Qian @ University of Kentucky

  3. Summary • Tree-based indexes: O(logN) for search and update, support range queries • Hash-based indexes: best for equality searches O(1), cannot support range searches. • Static and dynamic Chen Qian @ University of Kentucky

  4. NoSQL Systems: Motivation NoSQL: The Name • “SQL” = Traditional relational DBMS • Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS • “NoSQL” = “No SQL” = Not using traditional relational DBMS • “No SQL”  Don’t use SQL language

  5. NoSQL Systems: Motivation NoSQL: The Name • “SQL” = Traditional relational DBMS • Recognition over past decade or so: Not every data management/analysis problem is best solved using a traditional relational DBMS • “NoSQL” = “No SQL” = Not using traditional relational DBMS • “No SQL”  Don’t use SQL language • “NoSQL” = “Not Only SQL”

  6. NoSQL Systems: Motivation Not every data management/analysis problem is best solved using a traditional DBMS • Database Management System (DBMS) provides…. • … efficient, reliable, convenient, and safe • multi-user storage of and access to massive • amounts of persistent data.

  7. NoSQL Systems: Motivation NoSQL Systems Alternative to traditional relational DBMS • Flexible schema • Quicker/cheaper to set up • Massive scalability • Relaxed consistency  higher performance & availability • No declarative query language  more programming • Relaxed consistency  fewer guarantees

  8. NoSQL Systems: Motivation Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Load into database system Data cleaning Data extraction Verification Schema Nothing above is needed for noSQL!

  9. NoSQL Systems: Motivation Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Find all records for… • Given UserID • Given URL • Given timestamp • Certain construct appearing in additional-info

  10. NoSQL Systems: Motivation Example #1: Web log analysis Each record: UserID, URL, timestamp, additional-info Separate records: UserID, name, age, gender, … Task: Find average age of user accessing given URL May not require strict consistency.

  11. NoSQL Systems: Motivation Example #2: Social-network graph Each record: UserID1, UserID2 Separate records: UserID, name, age, gender, … Task: Find all friends of friends of friends of … friends of given user Large number of joins? Not efficient at all! Specially designed graph database may be better

  12. NoSQL Systems: Motivation Example #3: Wikipedia pages Large collection of documents Combination of structured and unstructured data Task: Retrieve introductory paragraph of all pages about U.S. presidents before 1900 Mix of structured and unstructured data

  13. NoSQL Systems: Motivation NoSQL Systems Alternative to traditional relational DBMS • Flexible schema • Quicker/cheaper to set up • Massive scalability • Relaxed consistency  higher performance & availability • No declarative query language  more programming • Relaxed consistency  fewer guarantees

  14. NoSQL Systems Overview

  15. NoSQL Systems: Overview NoSQL Systems Several incarnations • MapReduce framework: OLAP • Key-value stores: OLTP • Document stores • Graph database systems

  16. NoSQL Systems: Overview MapReduce Framework Originally from Google, open source Hadoop • No data model, data stored in files • User provides specific functions map() reduce() • System provides data processing “glue”, fault-tolerance, scalability

  17. NoSQL Systems: Overview Map and Reduce Functions Map: Divide problem into subproblems Reduce: Do work on subproblems, combine results

  18. NoSQL Systems: Overview MapReduce Architecture

  19. NoSQL Systems: Overview MapReduce Example: Web log analysis Each record: UserID, URL, timestamp, additional-info Task: Count number of accesses for each domain (inside URL)

  20. NoSQL Systems: Overview MapReduce Example (modified #1) Each record: UserID, URL, timestamp, additional-info Task: Total “value” of accesses for each domain based on additional-info

  21. NoSQL Systems: Overview MapReduce Framework • No data model, data stored in files • User provides specific functions • System provides data processing “glue”, fault-tolerance, scalability

  22. NoSQL Systems: Overview MapReduce Framework Schemas and declarative queries are missed Hive – schemas, SQL-like query language Pig – more imperative but with relational operators • Both compile to “workflow” of Hadoop (MapReduce) jobs

  23. NoSQL Systems: Overview Key-Value Stores Extremely simple interface • Data model: (key, value) pairs • Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) Implementation: efficiency, scalability, fault-tolerance • Records distributed to nodes based on key • Replication • Single-record transactions, “eventual consistency”

  24. NoSQL Systems: Overview Key-Value Stores Extremely simple interface • Data model: (key, value) pairs • Operations: Insert(key,value), Fetch(key), Update(key), Delete(key) • Some allow (non-uniform) columns within value • Some allow Fetch on range of keys Example systems • Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …

  25. NoSQL Systems: Overview Document Stores Like Key-Value Stores except value is document • Data model: (key, document) pairs • Document: JSON, XML, other semistructured formats • Basic operations: Insert(key,document), Fetch(key), Update(key), Delete(key) • Also Fetch based on document contents Example systems • CouchDB, MongoDB, SimpleDB, …

  26. NoSQL Systems: Overview Graph Database Systems • Data model: nodes and edges • Nodes may have properties (including ID) • Edges may have labels or roles

  27. NoSQL Systems: Overview Graph Database Systems • Interfaces and query languages vary • Single-step versus “path expressions” versus full recursion • Example systems Neo4j, FlockDB, Pregel, … • RDF “triple stores” can map to graph databases

  28. NoSQL Systems: Overview NoSQL Systems • “NoSQL” = “Not Only SQL” Not every data management/analysis problem is best solved exclusively using a traditional DBMS • Current incarnations • MapReduce framework • Key-value stores • Document stores • Graph database systems

More Related