1 / 71

Modeling Data

A quick primer on NoSQL databases, types, and best use cases. Learn about Document Stores, hierarchical data structures, schema-agnostic JSON, and data modeling approaches.

adelia
Download Presentation

Modeling Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling Data in a Document-Oriented NoSQL Database { "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }

  2. A Quick Primeron NoSQL

  3. “If all you have is a hammer, everything looks like a nail“ -Abraham Maslow

  4. “If all you have is a hammer, everything looks like a nail“ -Abraham Maslow

  5. The database renaissance!

  6. Choose the right tools for the right job

  7. NoSQL in a nutshell • NoSQL is buzzword • NoSQL is varied • Key-value • Wide-column • Document-oriented • Graph

  8. Document stores Document stores contain data objects that are inherently hierarchical, tree-like structures (most notably JSON). Built for scale and performance Great for: Hierarchical Trees, Logging, Telemetry

  9. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents

  10. Not these documents

  11. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents schema-agnostic JSON store for hierarchical and de-normalized data at scale

  12. Let’s walkthrougha basic scenario

  13. !=

  14. !=

  15. What a waste of space…

  16. More tables!Okay… What if I have 100,000 product types?Or I have varying features for a single product type?

  17. { "ItemType": "Book", "Title": "Harry Potter and the Sorcerer's Stone", "Author": "J.K. Rowling", "Pages": "864", "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ] } { "ItemType": "Laptop", "Name": "Lenovo Thinkpad X1 Carbon", "Processor": "Core i7 3.3 Ghz", "Memory": "8 GB DDR3L SDRAM", "Storage": "256 GB SSD", "Graphics": "Intel HD Graphics 4400", "Weight": "1 pound" }

  18. Data Modeling

  19. Myth: No Schema, No Problem There is still a schema; just no specific rules.

  20. How do approaches differ? ORM Data normalization Come as you are

  21. Modeling data, the relational way ContactDetail PersonContactDetailLnk Person Id Id PersonId ContactDetailId Address ContactDetailType Id Id

  22. Modeling data, the document way { "id": "0ec1ab0c-de08-4e42-a429-...", "addresses": [ { "street": "1 Redmond Way", "city": "Redmond", "state": "WA", "zip": 98052} ], "contactDetails": [ {"type": "home", "detail": “555-1212"}, {"type": "email", "detail": “me@ms.com"} ], ... } Person Id Addresses Address … Address … ContactDetails ContactDetail …

  23. To embed, or to reference, that is the question embed reference

  24. To embed, or to reference, that is the question Data from entities are queried together

  25. To embed, or to reference, that is the question Data from entities are queried together

  26. To embed, or to reference, that is the question Data from entities are queried together { id: "book1", covers: [ {type: "front", artworkUrl: "http://..."}, {type: "back", artworkUrl: "http://..."} ], index: "", chapters: [ {id: 1, synopsis: "", pageCount:24, wordCount:1456}, {id: 2, synopsis: "", pageCount:18, wordCount:960} ] }

  27. To embed, or to reference, that is the question Data from entities are queried together The child is a dependent e.g. Order Line depends on Order { id: "order1", customer: "customer1", orderDate: "2014-09-15T23:14:25.7251173Z" lines: [ {product: "13inch screen" , price: 200.00, qty: 50 }, {product: "Keyboard", price:23.67, qty:4}, {product: "CPU", price:87.89, qty:1} ] }

  28. To embed, or to reference, that is the question Data from entities are queried together The child is a dependent e.g. Order Line depends on Order 1:1 relationship • { • id: "person1", • name: "Mickey" • creditCard: { • number: "**** **** **** 4794", • expiry: "06/2019", • cvv: "868", • type: "Mastercard" • } • }

  29. To embed, or to reference, that is the question Data from entities are queried together The child is a dependent e.g. Order Line depends on Order 1:1 relationship Similar volatility { id: "person1", name: "Mickey", contactInfo: [ {email: "mickey@disney.com"}, {mobile: "+1 555-5555"}, {twitter: "@MickeyMouse"} ] }

  30. To embed, or to reference, that is the question Data from entities are queried together The child is a dependent e.g. Order Line depends on Order 1:1relationship Similar volatility The set of values or sub-documents is bounded (1:few) { id: "task1", desc: "deliver an awesome presentation @ #sqlbits", categories: [ "conference", "talk", "workshop", “databases“ ] }

  31. To embed, or to reference, that is the question Data from entities are queried together The child is a dependent e.g. Order Line depends on Order 1:1 relationship Similar volatility The set of values or sub-documents is bounded (1:few) Typically denormalized data models provide better read performance

  32. To embed, or to reference, that is the question one-to-many relationships (unbounded) { id: "post1", author: "Mickey Mouse", tags: [ "fun", "cloud", "develop"] } {id: "c1", postId: "post1", comment: "Coolest blog post"} {id: "c2", postId: "post1", comment: "Loved this post, awesome"} {id: "c3", postId: "post1", comment: "This is rad!"} … {id: "c10000", postId: "post1", comment: "You are the coolest cartoon character"} … {id: "c2000000", postId: "post1", comment: "Are we still commenting on this blog?"}

  33. To embed, or to reference, that is the question { author-id: "author1", book-id: "book1" } { author-id: "author2", book-id: "book1" } { id: "book1", name: "100 Secrets of Disneyland" } { id: "book2", name: "The best places to eat @ Disney" } one-to-many relationships (unbounded) many-to-many relationships { id: "author1", name: "Mickey Mouse" } { id: "author2", name: "Donald Duck" } Look familiar? It should …. It's the "relational" way

  34. To embed, orto reference, that is the question • one-to-many relationships (unbounded) • many-to-many relationships { id: "book1", name: "100 Secrets of Disneyland", authors: ["author1", "author2"] } { id: "book2", name: "The best places to eat @ Disney”, authors: ["author1"] } { id: "author1", name: "Mickey Mouse", books: ["book1", "book2"] } { id: "author2", name: "Donald Duck" books: ["book1"] }

  35. To embed, or to reference, that is the question one-to-many relationships (unbounded) many-to-many relationships Related data changes frequently The referenced entity is a key entity used by many others { id: "1", author: "Mickey Mouse", stocks: ["dis", "msft"] } { id: "dis", opening: "52.09", numerOfTrades: 10000, trades: [{qty:57, price: 53.97}, {qty:5, price: 54.01}] }

  36. To embed, or to reference, that is the question one-to-many relationships (unbounded) many-to-many relationships Related data changes frequently The referenced entity is a key entity used by many others Typically normalizing provides better write performance. Normalized data models can require more round trips to the server.

  37. Where do you put the reference? Publisher & Book … does publisher refer to book? Publisher document: { id: "mspress", name: "Microsoft Press", books: [ 1, 2, ... ] } Book documents: {id: 1, name: "DocumentDB 101" } {id: 2, name: "DocumentDB for RDBMS Users" }

  38. Where do you put the reference? Publisher & Book … or book refer to publisher? Publisher document: { id: "mspress", name: "Microsoft Press", } Book documents: {id: 1, name: "DocumentDB 101", pub-id: "mspress"} {id: 2, name: "DocumentDB for RDBMS Users", pub-id: "mspress"}

  39. Embed vs Reference: No absolutes!

  40. Embed vs Reference: No absolutes!

  41. Embed + Reference { "id": "product1", "type": "product", "name": "Microsoft Band 2 – Medium", "price": "174.99", "summary": "Continuous heart rate monitor tracks heart rate...", "images": [ {"image1": "http://..."}, { "image2": "http://..."} ], "reviews": { "averageStars": 4, "reviewCount": 313 } }

  42. Embed + Reference { "id": "product1", "type": "reviewSummary", "reviewBreakdown: [ {5:24},{4:10},{3:3},{2:0},{1:4} ], "topReview": { "rating": 4, "title": "More comfortable than Band 1: But New Size Scale!", "snippet": "I've been wearing the first Band since it…", "fullReviewLink": "http://..." } }

  43. Use Case: Hierarchical Trees

  44. Use case: Hierarchical trees { {id: "Jill" }, {id: "Ben", manager: "Jill" }, {id: "Susan", manager: "Jill" }, {id: "Andrew", manager: "Ben" }, {id: "Sven", manager: "Susan" }, {id: "Thomas", manager: "Sven" } } Jill Ben Susan Andrew Sven Thomas To get the manager of any employee is trivial - SELECT manager FROM org WHERE id = "Susan"

  45. Use case: Hierarchical trees { {id: "Jill" }, {id: "Ben", manager: "Jill" }, {id: "Susan", manager: "Jill" }, {id: "Andrew", manager: "Ben" }, {id: "Sven", manager: "Susan" }, {id: "Thomas", manager: "Sven" } } Jill Ben Susan Andrew Sven Thomas To get all employees where Jill is the manager is also easy - SELECT * FROM org WHERE manager = "Jill"

  46. Use case: Hierarchical trees { {id: "Jill", directs:["Ben","Susan"] }, {id: "Ben", directs:["Andrew"] }, {id: "Susan", directs: ["Sven"] }, {id: "Andrew" }, {id: "Sven", directs: ["Thomas"] }, {id: "Thomas" } } Jill Ben Susan Andrew Sven Thomas To get all direct reports for Jill is easy - SELECT * FROM org WHERE id = "Jill"

  47. Use case: Hierarchical trees { {id: "Jill", directs:["Ben","Susan"] }, {id: "Ben", directs:["Andrew"] }, {id: "Susan", directs: ["Sven"] }, {id: "Andrew" }, {id: "Sven", directs: ["Thomas"] }, {id: "Thomas" } } Jill Ben Susan Andrew Sven Thomas To find the manager for an employee is possible - SELECT * FROM empWHERE ARRAY_CONTAINS(emp.directs, "Ben")

More Related