Big Data and NoSQL
Cloud Computing: Module 4
90% of the world’s data was created in the past two years (as reported in 2012: http://mashable.com/2012/06/22/data-created-every-minute/)
Where does this data come from?
1. Machine-generated/sensor data, e.g. logs and call records
2. Social data, e.g. Facebook and Twitter
3. Traditional enterprise data, e.g. web store transactions
3 V’s of Big Data
• Volume
  – Terabytes of tweets -> product sentiment analysis
  – Annual meter readings -> predict power consumption
• Velocity
  – Scrutinize 5 million trade events created each day to identify potential fraud
  – Analyze 500 million daily call detail records in real time to predict customer churn faster
• Variety
  – Monitor hundreds of live video feeds from surveillance cameras to target points of interest
  – Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Benefits
• Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while they are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and clear inventory.
• Generate retail coupons at the point of sale based on the customer's current and past purchases.
• Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior.
Scale Up: Adding resources to a single node, i.e., adding more CPU, more RAM, etc. to a single computer.
Scale Out: Adding more nodes or servers to the system, i.e., if there is one computer in the system, scaling out means adding more computers to it.
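To make the scale-out idea concrete, here is a minimal sketch, assuming a simple hash-modulo placement rule (the `hashKey`, `nodeFor` and `loadPerNode` helpers are hypothetical, not how any particular database assigns data): each key is routed to one of N nodes, so adding nodes spreads the same keys, and the work on them, across more machines.

```javascript
// Minimal sketch: route keys to nodes with a hash-modulo rule.
// hashKey, nodeFor and loadPerNode are hypothetical helpers for illustration only.

// Deterministic 32-bit FNV-1a string hash.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Pick the node responsible for a key when the cluster has nodeCount nodes.
function nodeFor(key, nodeCount) {
  return hashKey(key) % nodeCount;
}

// Count how many keys each node receives.
function loadPerNode(keys, nodeCount) {
  const counts = new Array(nodeCount).fill(0);
  for (const key of keys) {
    counts[nodeFor(key, nodeCount)] += 1;
  }
  return counts;
}

const keys = Array.from({ length: 10000 }, (_, i) => `user${i}`);
console.log("1 node :", loadPerNode(keys, 1)); // scale up: one machine holds every key
console.log("4 nodes:", loadPerNode(keys, 4)); // scale out: the same keys split across four nodes
```

With one node, all 10,000 keys land on the same machine; with four, each node holds only part of them, which is the capacity gain that scaling out buys.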
NoSQL Database Types
• Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
• Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4j and HyperGraphDB.
• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key") together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as a list, set or hash, which adds functionality.
• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets and store columns of data together instead of rows.
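The four data models can be pictured with plain JavaScript values. This is only an illustrative sketch, assuming in-memory structures stand in for real stores; all variable names and data are hypothetical and do not reflect the APIs of Riak, Neo4j, Cassandra or any other product.

```javascript
// Illustrative sketch: the four NoSQL data models as plain JavaScript structures.
// All names and data are hypothetical.

// Key-value store: an opaque value looked up by a single key.
const kv = new Map();
kv.set("session:42", "cart=3 items");

// Document store: each key maps to a structured, possibly nested document.
const documents = new Map();
documents.set("user:1", {
  name: "Ada",
  emails: ["ada@example.com"], // key-array pair
  address: { city: "London" }, // nested document
});

// Graph store: people and "follows" relationships as an adjacency list.
const follows = new Map([
  ["ada", ["bob", "carol"]],
  ["bob", ["carol"]],
  ["carol", []],
]);

// Typical graph question: who is two hops away from a person?
function friendsOfFriends(graph, person) {
  const result = new Set();
  for (const friend of graph.get(person) ?? []) {
    for (const fof of graph.get(friend) ?? []) {
      if (fof !== person) result.add(fof);
    }
  }
  return [...result];
}

// Wide-column idea (as described above): values of one column are stored together,
// so an aggregate over a single column only has to read that column.
const meterReadings = {
  meterId: ["m1", "m2", "m3"],
  kwh: [310, 295, 402],
};
const totalKwh = meterReadings.kwh.reduce((sum, v) => sum + v, 0);

console.log(kv.get("session:42"));                 // -> cart=3 items
console.log(documents.get("user:1").address.city); // -> London
console.log(friendsOfFriends(follows, "ada"));     // -> [ 'carol' ]
console.log(totalKwh);                             // -> 1007
```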
Why Not RDBMS?
• Reading data (caching)
  – A cache accelerates only data reads.
  – Cold cache thrash: caches are temporary, so whenever an application seeks some data, it first tries to find it in the caching tier; when the data is not there, the application is forced to read it from the RDBMS, and the extra load on the database delays both reads and writes.
  – Another tier to manage: with an RDBMS, caching is built as a separate infrastructure tier, and inserting another tier into the existing architecture adds complexity.
• Partitioning (sharding)
  – The application needs to be partition-aware.
  – When you fill a shard, it is highly disruptive to re-shard.
  – Relationships are broken, i.e., referential integrity is no longer enforced across shards.
  – You lose some of the most important benefits of the relational model.
  – You have to create and maintain a schema on every server.
• Schema
  – RDBMS technology requires a strict definition of a "schema" before any data is stored in the database; the schema is integral because it defines the structure of the database. Changes such as capturing new information or changing the application's data formats and content are therefore highly disruptive, so they are frequently avoided.
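Why re-sharding is disruptive can be seen with a minimal sketch, assuming the application places rows with a naive `hash(key) % shardCount` rule (a hypothetical scheme; `hashKey`, `shardFor` and `movedFraction` are illustrative helpers): when a single shard is added, most keys map to a different shard and must be moved.

```javascript
// Sketch: fraction of keys that change shard when one shard is added,
// under a naive hash-modulo placement rule (hypothetical scheme).

// Deterministic 32-bit FNV-1a string hash.
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Share of keys whose shard differs between the old and the new layout.
function movedFraction(keys, oldCount, newCount) {
  let moved = 0;
  for (const key of keys) {
    if (shardFor(key, oldCount) !== shardFor(key, newCount)) moved += 1;
  }
  return moved / keys.length;
}

const keys = Array.from({ length: 10000 }, (_, i) => `order${i}`);
console.log(`4 -> 5 shards: ${(movedFraction(keys, 4, 5) * 100).toFixed(1)}% of keys move`);
```

With a uniformly distributed hash, a key keeps its shard only when h mod 4 equals h mod 5, which holds for 4 of every 20 hash values, so roughly 80% of the data has to be copied between servers for a 25% capacity increase.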
What is MongoDB?
MongoDB is a document database. An example document:
{
  id: "00e8da9d",
  type: "Film",
  pricing: { ... },
  details: {
    title: "The Matrix",
    director: [ "Andy Wachowski", "Larry Wachowski" ],
    writer: [ "Andy Wachowski", "Larry Wachowski" ],
    ...,
    aspect_ratio: "1.66:1"
  },
  ...
}
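Nested documents and arrays are reached by walking a field path. MongoDB queries use dot notation such as "details.title" for this, including numeric positions inside arrays; the sketch below is a hypothetical plain-JavaScript helper, `getPath`, that mimics the lookup on the example document (with the elided `...` parts left out). It is not MongoDB's own implementation.

```javascript
// Hypothetical helper: resolve a dot-notation path such as "details.title"
// against a plain JavaScript object. Array elements can be addressed by index.
function getPath(doc, path) {
  let current = doc;
  for (const part of path.split(".")) {
    if (current === null || current === undefined) return undefined;
    current = current[part]; // works for object keys and array indices ("0", "1", ...)
  }
  return current;
}

// The example film document, with the elided parts omitted.
const film = {
  id: "00e8da9d",
  type: "Film",
  details: {
    title: "The Matrix",
    director: ["Andy Wachowski", "Larry Wachowski"],
    writer: ["Andy Wachowski", "Larry Wachowski"],
    aspect_ratio: "1.66:1",
  },
};

console.log(getPath(film, "details.title"));      // -> The Matrix
console.log(getPath(film, "details.director.1")); // -> Larry Wachowski
console.log(getPath(film, "pricing.retail"));     // -> undefined (field not present)
```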
Installing MongoDB
1. Download the binary files for the desired release of MongoDB from https://www.mongodb.org/downloads.
2. Extract the files from the downloaded archive:
   tar -zxvf mongodb-linux-x86_64-2.6.1.tgz
3. Copy the extracted folder to the location from which MongoDB will run:
   mkdir -p mongodb
   cp -R -n mongodb-linux-x86_64-2.6.1/ mongodb
4. Ensure the location of the binaries is in the PATH variable. The MongoDB binaries are in the bin/ directory of the archive. To put them on your PATH, add a line such as the following to your shell’s rc file (e.g. ~/.bashrc):
   export PATH=<mongodb-install-directory>/bin:$PATH
   Replace <mongodb-install-directory> with the path to the extracted MongoDB archive.
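Adding the archive's bin/ directory to PATH works because the shell searches each PATH directory in order for an executable with the requested name. The Node.js sketch below mimics that search so you can check whether `mongod` will be found; `findOnPath` is a hypothetical helper (not part of MongoDB), and it assumes a POSIX system (it skips empty PATH entries and ignores Windows .exe suffixes).

```javascript
// Sketch: return the first PATH directory entry that contains an executable file
// with the given name, similar to what the shell does when you type "mongod".
// Hypothetical helper, POSIX-oriented.
const fs = require("fs");
const path = require("path");

function findOnPath(name, pathVar = process.env.PATH || "") {
  for (const dir of pathVar.split(path.delimiter)) {
    if (!dir) continue; // skip empty entries (simplification)
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // exists and is executable
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // not present or not executable in this directory; keep searching
    }
  }
  return null;
}

const found = findOnPath("mongod");
console.log(found ? `mongod found at ${found}` : "mongod is not on PATH");
```

If the message says mongod is not on PATH after editing ~/.bashrc, the usual cause is that the shell has not re-read the file yet (open a new terminal or run `source ~/.bashrc`).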
Running MongoDB
1. Create the data directory. The following example command creates the default /data/db directory:
   mkdir -p /data/db
2. Set permissions for the data directory. Before running mongod for the first time, ensure that the user account running mongod has read and write permissions for the directory.
3. Run MongoDB. To run MongoDB, run the mongod process at the system prompt. If necessary, specify the path of the mongod binary or of the data directory (the latter with the --dbpath option).
   Run without specifying paths: if your system PATH variable includes the location of the mongod binary and you use the default data directory (i.e., /data/db), simply enter mongod at the system prompt:
   mongod
4. Stop MongoDB as needed. To stop MongoDB, press Control+C in the terminal where the mongod instance is running.
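The data-directory steps can also be scripted. The sketch below is a hypothetical Node.js helper, `ensureDataDir`, not a MongoDB tool: it creates the directory if needed (like `mkdir -p`) and checks that the current user can read and write it, which is the condition mongod needs. Creating the default /data/db usually requires elevated privileges, so the usage example assumes a scratch directory under the OS temp folder instead.

```javascript
// Hypothetical helper: prepare a data directory for mongod.
// Creates it recursively (like `mkdir -p`) and verifies read/write access
// for the user running this script.
const fs = require("fs");
const os = require("os");
const path = require("path");

function ensureDataDir(dir) {
  fs.mkdirSync(dir, { recursive: true }); // no error if it already exists
  try {
    fs.accessSync(dir, fs.constants.R_OK | fs.constants.W_OK);
    return { dir, ok: true };
  } catch (err) {
    return { dir, ok: false, reason: err.code }; // e.g. "EACCES"
  }
}

// Usage: a scratch directory instead of /data/db, which typically needs root to create.
const dataDir = path.join(os.tmpdir(), "mongodb-sketch", "data", "db");
const status = ensureDataDir(dataDir);
console.log(status.ok
  ? `ready: mongod --dbpath ${status.dir}`
  : `no read/write access to ${status.dir} (${status.reason})`);
```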
Where to Go Further? http://docs.mongodb.org/manual/tutorial/ https://university.mongodb.com/
Handling Databases
Connect to a mongod from the system prompt:
  mongo
From the mongo shell, display the list of databases with the following operation:
  show dbs
Switch to a new database named mydb with the following operation:
  use mydb
Confirm that your session has the mydb database as context by checking the value of the db object, which returns the name of the current database:
  db
Inserting Data into Collections
  j = { name : "mongo" }
  k = { x : 3 }
  db.testData.insert( j )
  db.testData.insert( k )
Dropping a Database
  db.dropDatabase()
Result:
  { "dropped" : "mydb", "ok" : 1 }
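The shell session relies on MongoDB creating databases and collections lazily: `use mydb` and the first `insert` into `testData` need no CREATE statement, because MongoDB creates a database or collection when data is first stored in it. The sketch below imitates that behaviour with plain JavaScript objects; the `Database` class, its `find`/`insert`/`dropDatabase` methods and the numeric `_id` counter are hypothetical stand-ins, not the mongo shell API (real `_id` values are ObjectIds).

```javascript
// Hypothetical in-memory imitation of the shell session above.
// Collections come into existence on first insert, as in MongoDB.
class Database {
  constructor(name) {
    this.name = name;
    this.collections = new Map();
    this.nextId = 1; // stand-in for MongoDB's ObjectId generation
  }

  // Reading never creates a collection; a missing one behaves as empty.
  find(collectionName) {
    return this.collections.get(collectionName) ?? [];
  }

  insert(collectionName, doc) {
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []); // created implicitly on first insert
    }
    const stored = { _id: this.nextId++, ...doc }; // copy the document and add an _id
    this.collections.get(collectionName).push(stored);
    return stored;
  }

  dropDatabase() {
    this.collections.clear();
    return { dropped: this.name, ok: 1 };
  }
}

const db = new Database("mydb");
const j = { name: "mongo" };
const k = { x: 3 };
db.insert("testData", j);
db.insert("testData", k);
console.log(db.find("testData")); // two documents, each with an _id
console.log(db.dropDatabase());   // -> { dropped: 'mydb', ok: 1 }
```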
Inserting Data
SQL:
  INSERT INTO post (title, description, tags, likes) VALUES ('MongoDB Overview', 'MongoDB is no sql database', 'database', 100);
MongoDB:
  db.post.insert([ { title: 'MongoDB Overview', description: 'MongoDB is no sql database', tags: 'database', likes: 100 } ])
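The correspondence between the two statements is mechanical: each SQL column name becomes a field name and each value becomes that field's value. The sketch below shows the mapping with a hypothetical `rowToDocument` helper; it is an illustration of the idea, not a migration tool.

```javascript
// Hypothetical helper: turn one SQL row (column names + values) into a document.
function rowToDocument(columns, values) {
  if (columns.length !== values.length) {
    throw new Error("column and value counts differ");
  }
  const doc = {};
  columns.forEach((column, i) => {
    doc[column] = values[i];
  });
  return doc;
}

// The row from the INSERT INTO post statement.
const post = rowToDocument(
  ["title", "description", "tags", "likes"],
  ["MongoDB Overview", "MongoDB is no sql database", "database", 100]
);
console.log(post);

// Unlike a fixed table row, a document can later hold an array in a field,
// e.g. several tags, without a schema change or a separate tags table.
post.tags = ["mongodb", "database", "NoSQL"];
console.log(post.tags.length); // -> 3
```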
Retrieving Data
  db.mycol.find({"tags": "mongodb", "title": "MongoDB Overview"}).pretty()
Result:
  {
    "_id": ObjectId(7df78ad8902c),
    "title": "MongoDB Overview",
    "description": "MongoDB is no sql database",
    "tags": ["mongodb", "database", "NoSQL"],
    "likes": "100"
  }
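This query combines two conditions: every listed field must match, and a scalar value compared with an array field such as tags matches when the array contains that value. The sketch below imitates just those two rules over an in-memory array; `matches`, `fieldMatches` and `find` are hypothetical helpers, the second document is invented for contrast, and numeric `_id` values stand in for ObjectIds. Real MongoDB queries support many more operators.

```javascript
// Hypothetical sketch of two MongoDB equality-query rules:
// 1. all fields in the query must match (implicit AND);
// 2. a scalar compared with an array field matches if the array contains it.
function fieldMatches(value, expected) {
  if (Array.isArray(value)) return value.includes(expected);
  return value === expected;
}

function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) =>
    fieldMatches(doc[field], expected)
  );
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const mycol = [
  {
    _id: 1,
    title: "MongoDB Overview",
    description: "MongoDB is no sql database",
    tags: ["mongodb", "database", "NoSQL"],
    likes: "100",
  },
  { _id: 2, title: "MongoDB Overview", tags: "database", likes: 100 },
];

const result = find(mycol, { tags: "mongodb", title: "MongoDB Overview" });
console.log(JSON.stringify(result, null, 2)); // indented output, similar in spirit to .pretty()
```

Only the first document is returned: the second has the right title, but its tags value is the single string "database", so the tags condition fails.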