NoSQL with Mongo DB and R

Presentation Transcript


  1. NoSQL with Mongo DB and R • Bob Wakefield • (I know stuff.) • Sit down. Strap in. Hang on.

  2. NoSQL with Mongo DB and R • This is a discussion. Feel free to hop in and correct me if I say something crazy.

  3. WARNING! • “We’ll talk about this a little bit later.”

  4. Motivations for this presentation • Kaggle competition experience • Recent experience on a client site • NoSQL skills starting to be in demand

  5. RecSys Challenge 2013: Yelp business rating prediction • Build a recommender system based on user ratings • AKA ETL Hell

  6. System 1 • Schema changes break everything downstream!

  7. System 2

  8. Wakefield Career Management System • Step 1: Scan market for high value skills. • Step 2: Acquire skills. • Step 3: Sell skills to the highest bidder. • Step 4: Get Paid.

  9. Sample of Job Board Post • Designs NoSQL dynamic schemas to leverage simplicity and power of NoSQL • Experience with NoSql-based databases, such as MongoDB • Knowledge of NoSQL databases (MongoDB, Hadoop, Couch DB, etc…) • Five years' experience with NoSQL, columnar, and key/value databases

  10. Mongo DB ETL Example • Data from Yelp Kaggle Competition • Data in JSON format • 229,907 reviews { 'type': 'review', 'business_id': (encrypted business id), 'user_id': (encrypted user id), 'stars': (star rating), 'text': (review text), 'date': (date, formatted like '2012-03-14', %Y-%m-%d in strptime notation), 'votes': {'useful': (count), 'funny': (count), 'cool': (count)} }
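A minimal sketch of the transform step for a single review, assuming the reviews arrive as one JSON object per line. The sample values and the `flatten_review` helper are hypothetical and not taken from the deck; only the field layout follows the slide.

```python
import json
from datetime import datetime

# Hypothetical sample record shaped like the review layout on the slide.
SAMPLE_LINE = json.dumps({
    "type": "review",
    "business_id": "b1",
    "user_id": "u1",
    "stars": 4,
    "text": "Good service.",
    "date": "2012-03-14",
    "votes": {"useful": 2, "funny": 0, "cool": 1},
})


def flatten_review(line):
    """Parse one JSON review and flatten the nested votes into columns."""
    review = json.loads(line)
    votes = review.get("votes", {})
    return {
        "business_id": review["business_id"],
        "user_id": review["user_id"],
        "stars": review["stars"],
        # The slide gives the date format as %Y-%m-%d in strptime notation.
        "date": datetime.strptime(review["date"], "%Y-%m-%d").date(),
        "votes_useful": votes.get("useful", 0),
        "votes_funny": votes.get("funny", 0),
        "votes_cool": votes.get("cool", 0),
    }


row = flatten_review(SAMPLE_LINE)
print(row)
```

Flattening the nested `votes` object into separate columns is one possible choice for a tabular target; a document store could keep it nested instead.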

  11. Main Topics for this Evening • NoSQL • MongoDB • Modeling unstructured data • MongoDB and R

  12. Intended Audience • Business/Data Analyst • Data Architect • General Scenario • You have read only access to MongoDB and need to retrieve your own data.

  13. People I’m Going to Ignore • Software Developers • DBAs

  14. Source Material - Books • NoSQL Distilled by P. Sadalage and M. Fowler • MongoDB Applied Design Patterns by R. Copeland • The Definitive Guide to MongoDB by Plugge, Membrey and Hawkins • MongoDB online docs

  15. Source Material - YouTube • Introduction to NoSQL by Martin Fowler – GOTO conferences • Workshop: NoSQL Data Modelling (Jan Steemann) Teil2 – ArangoDB • NoSQL Data Modelling for Scalable eCommerce – Dataversity Net • Domain Driven Design – Zend • Webinar on the rmongodb R package - comsystotv

  16. Why NoSQL • Handles Schema Changes Well (easy development) • Solves Impedance Mismatch problem • Rise of JSON • python module: simplejson

  17. A really generic and unofficial definition of NoSQL An ill-defined set of mostly open-source databases, mostly developed in the 21st century, and mostly not using SQL.

  18. Common Characteristics of NoSQL Databases • Non-relational • Open source • Cluster friendly • Built from the ground up to handle 21st century data challenges • Schema-less*

  19. Various Types of NoSQL Databases

  20. Example of graph database

  21. What is an aggregate? • Not what you think. • Definition from Domain Driven Design • “A group of related entities and value objects.” • aggregate = document

  22. What is a document? • Not what you think. • word document <> NoSQL document

  23. Example of a document { "business_id": "rncjoVoEFUJGCUoC1JgnUA", "full_address": "8466 W Peoria Ave\nSte 6\nPeoria, AZ 85345", "open": true, "categories": ["Accountants", "Professional Services", "Tax Services"], "city": "Peoria", "review_count": 3, "name": "Peoria Income Tax Service", "neighborhoods": [], "longitude": -112.241596, "state": "AZ", "stars": 5.0, "latitude": 33.581867000000003, "type": "business" }

  24. A MongoDB Vocab lesson

  25. Facts about MongoDB that will BLOW YOUR MIND!! • No Schemas • No transactions • No joins • Max document size of 16MB • Larger documents handled with GridFS

  26. Facts about MongoDB that are fairly mundane. • Runs on most common OSs • Windows • Linux • Mac • Solaris • Data stored as BSON (Binary JSON) • used for speed • translation handled by language drivers

  27. Retrieving Data

  28. Rules for building NoSQL Data Structures Rule 1: Every document must have an _id. Rule 2: There is only one rule.

  29. Designing NoSQL Data Structures • NoSQL data structures driven by application design. • Need to take into account necessary CRUD operations • To embed or not to embed. That is the question! • Rule of thumb is to embed whenever possible. • No modeling standards or CASE tools!

  30. A (denormalized) embedded structure An array of values { "business_id": "rncjoVoEFUJGCUoC1JgnUA", "full_address": "8466 W Peoria Ave\nSte 6\nPeoria, AZ 85345", "open": true, "categories": ["Accountants", "Professional Services", "Tax Services"], "city": "Peoria", "review_count": 3, "name": "Peoria Income Tax Service", "neighborhoods": [], "longitude": -112.241596, "state": "AZ", "stars": 5.0, "latitude": 33.581867000000003, "type": "business" }
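A rough illustration of how an embedded array of values gets used, written as plain Python over in-memory dictionaries rather than against a live MongoDB server. The second business and the `in_category` helper are hypothetical. The membership test mirrors how a MongoDB filter such as `{"categories": "Tax Services"}` matches any document whose array contains that value.

```python
# Two business documents; the second one is made up for contrast.
businesses = [
    {
        "name": "Peoria Income Tax Service",
        "city": "Peoria",
        "categories": ["Accountants", "Professional Services", "Tax Services"],
    },
    {
        "name": "Hypothetical Diner",
        "city": "Peoria",
        "categories": ["Restaurants"],
    },
]


def in_category(docs, category):
    """Return the documents whose 'categories' array contains the value."""
    return [doc for doc in docs if category in doc.get("categories", [])]


matches = in_category(businesses, "Tax Services")
for doc in matches:
    print(doc["name"])
```

Because the categories live inside the business document, no second lookup is needed to answer "which businesses are in this category".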

  31. A (denormalized) embedded structure An array of sub-documents { "_id" : "First Post", "comments" : [ {"author" : "Bob", "text" : "Nice Post!"}, {"author" : "Tom", "text" : "Dislike!"} ], "comment_count" : 2 } This makes for a hairy query!
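To give a feel for why sub-document arrays are harder to query, here is a small sketch in plain Python, assuming the post has already been retrieved as a dictionary. The `unwind_comments` helper is hypothetical; it turns each embedded comment into its own flat row, which is roughly the reshaping an analyst needs before filtering or loading the data into a table.

```python
posts = [
    {
        "_id": "First Post",
        "comments": [
            {"author": "Bob", "text": "Nice Post!"},
            {"author": "Tom", "text": "Dislike!"},
        ],
        "comment_count": 2,
    }
]


def unwind_comments(docs):
    """Yield one flat row per embedded comment, tagged with its post _id."""
    for doc in docs:
        for comment in doc.get("comments", []):
            yield {"post_id": doc["_id"], **comment}


rows = list(unwind_comments(posts))
bob_rows = [row for row in rows if row["author"] == "Bob"]
print(rows)
print(bob_rows)
```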

  32. A normalized structure //db.post schema { "_id" : "First Post", "author" : "Rick", "text" : "This is my first post." } //db.comments schema { "_id" : ObjectId(...), "post_id" : "First Post", "author" : "Bob", "text" : "Nice Post!" }
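With a normalized layout and no joins on the database side, the application has to stitch posts and comments together itself. The sketch below shows one way to do that in plain Python, assuming both collections have already been fetched as lists of dictionaries. The integer `_id` values stand in for ObjectIds, the second comment is borrowed from the embedded example, and `attach_comments` is a hypothetical helper.

```python
from collections import defaultdict

posts = [
    {"_id": "First Post", "author": "Rick", "text": "This is my first post."},
]
comments = [
    # Integer _id values are placeholders for MongoDB ObjectIds.
    {"_id": 1, "post_id": "First Post", "author": "Bob", "text": "Nice Post!"},
    {"_id": 2, "post_id": "First Post", "author": "Tom", "text": "Dislike!"},
]


def attach_comments(post_docs, comment_docs):
    """Group comments by post_id and embed them under each post."""
    by_post = defaultdict(list)
    for comment in comment_docs:
        by_post[comment["post_id"]].append(
            {"author": comment["author"], "text": comment["text"]}
        )
    return [{**post, "comments": by_post.get(post["_id"], [])} for post in post_docs]


joined = attach_comments(posts, comments)
print(joined)
```

The result has the same shape as the embedded structure, which is the trade-off in a nutshell: normalizing saves duplication but moves the join into application code.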

  33. A polymorphic structure • When all the documents in a collection are similarly, but not identically, structured. • Enables simpler schema migration. • custom_field_1 • no more of this crap • Better mapping of object-oriented inheritance and polymorphism.

  34. A polymorphic structure //Page document (stored in nodes collection) { _id : 1, title: "Welcome", url: "/", type: "page", text: "Welcome to my wonderful wiki." } //Photo document (also stored in nodes collection) { _id: 3, title: "Cool Photo", url: "/photo.jpg", type: "photo", content: Binary(...) }
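A reader of a polymorphic collection typically branches on the discriminator field. The following is a minimal sketch in plain Python, assuming the `type` field plays that role as in the two node documents. The `describe` helper and the placeholder bytes standing in for `Binary(...)` are hypothetical.

```python
nodes = [
    {"_id": 1, "title": "Welcome", "url": "/", "type": "page",
     "text": "Welcome to my wonderful wiki."},
    # Placeholder bytes stand in for the Binary(...) content on the slide.
    {"_id": 3, "title": "Cool Photo", "url": "/photo.jpg", "type": "photo",
     "content": b"\x89PNG"},
]


def describe(node):
    """Summarize a node by dispatching on its 'type' field."""
    kind = node.get("type")
    if kind == "page":
        return f"page {node['title']!r}: {len(node['text'])} characters of text"
    if kind == "photo":
        return f"photo {node['title']!r}: {len(node['content'])} bytes"
    return f"unknown node type {kind!r}"


summaries = [describe(node) for node in nodes]
for line in summaries:
    print(line)
```

The shared fields (`_id`, `title`, `url`, `type`) behave like a base class, and the per-type fields like subclass attributes, which is the object-oriented mapping the previous slide refers to.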

  35. rmongodb • Two packages available • RMongo = Dodge Omni • rmongodb = Porsche • rmongodb usage example

  36. Final Thoughts • Data Architects should NOT be designing NoSQL data structures • Are NoSQL DBs going to totally replace RDBMS? • Polyglot Persistence

  37. Questions? • You should consider this presentation a book report. • I’ve only been studying this stuff for a month. • I MIGHT have an answer to your question. • I might not...
