1 / 34

CSE 482: Big Data Analysis

CSE 482: Big Data Analysis. Lecture 4: SQL and Database Querying. Outline of Today’s Lecture. Previous lecture talks about how to collect online data from the Web Today’s lecture focuses on how to store the data into a database What is a database? What is a database management system?

paul
Download Presentation

CSE 482: Big Data Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE 482: Big Data Analysis Lecture 4: SQL and Database Querying

  2. Outline of Today’s Lecture • Previous lecture talks about how to collect online data from the Web • Today’s lecture focuses on how to store the data into a database • What is a database? • What is a database management system? • What is SQL (structured query language)? • Using SQL with python

  3. What is a Database? • Collection of related data central to a given enterprise • Examples: • Banking – savings/checking accounts, mortgage, etc • Vehicle registration – car registration, year, make, etc • Student registration – name, PID, GPA, last semester enrolled, etc • Electronic Medical Records – name, SSN, date of birth, address, symptoms, diseases, medication, test results, etc • Sales – customer purchase transactions, customer complaints, etc

  4. Database Management System (DBMS) • A collection of programs that enables users to create and maintain a database • Examples of DBMS • MS Access, MS SQL Server, IBM DB2, Oracle, Sybase, Postgres, mySQL, and many more • These are relational DBMS (which mean, these systems would model the data as a collection of relations/tables) • We will use mySQL as our DBMS for this class

  5. MySQL Database Account • Every registered student will have access to a MySQL account on mysql-user.cse.msu.edu • To log in, go to: • http://www.cse.msu.edu/facility/phpMyAdmin/index.php • Username is your CSE usernamePassword is your PID (if you’re using it the first time)Server Choice: mysql-user • Send an email to manager@cse.msu.edu if you have problems logging in • Note that your database password is not the same as your CSE account password (unless you set them to be the same)

  6. MySQL Database Account

  7. SQL (Structured Query Language) • A language provided by relational DBMS to • Create a table • Alter the schema (structure) of a table • Insert data records into a table • Delete data records from a table • Query a table • etc

  8. Edit is a relationship between user and article Example: Wikipedia Database

  9. CREATE TABLE CREATE TABLE User(Username VARCHAR(50) PRIMARY KEY,IS_Admin BOOLEAN,Content VARCHAR(500)); CREATE TABLE Article (Title VARCHAR(50) PRIMARY KEY,Content VARCHAR(500) ); A primary key is used to uniquely identify the records in a table

  10. CREATE TABLE CREATE TABLE Edit ( Editor VARCHAR(50), Edited_Title VARCHAR(50), Modified_Time DATETIME, PRIMARY KEY(Editor, Edited_Title, Modified_Time); FOREIGN KEY (Editor) REFERENCES User(Username),FOREIGN KEY (Edited_Title) REFERENCES Article(Title) ); A foreign key is used to enforce the constraint that rows which were referenced in the Edit table exists in the referenced table

  11. Example of Foreign Key Inserting a row (XYZ, MySQL, 2013-01-01 10:10:10) is a violation of foreign key constraint because there is no Article titled MySQL

  12. MySQL Example You can also use phpMyAdmin interface to directly access the database:https://www.cse.msu.edu/phpMyAdmin/ Option to create new table

  13. MySQL Example Enter the columns of the table and their constraints

  14. Summary of useful MySQL commands • set password=password(‘new password’); • show databases; -- show the list of databases available to you • use dbname; -- use the database called dbname • show tables; -- show tables available • describe User; • truncate table User; • drop table User; • source script-file; -- executing SQL commands from a script-file • load data infile /path/file.txt into table User;

  15. DROP TABLE • Delete a table from the database • After deletion, the table can no longer be used in queries, updates, or any other commands since its description no longer exists • Example:DROP TABLE USER;

  16. ALTER TABLE • Add a new column to an existing table: ALTER TABLE USER ADD BIRTHDATE DATETIME; • Remove a column from an existing table: ALTER TABLE USER DROP COLUMN BIRTHDATE; • Modify an existing column in a table: ALTER TABLE USER MODIFY BIRTHDATE INTEGER DEFAULT 1900; • Modify constraints of a table: ALTER TABLE EDIT DROP PRIMARY KEY;

  17. INSERT or DELETE Operation INSERT INTO Article VALUES (‘Database’, ‘$@%@23’); INSERT INTO Article VALUES (‘Michigan’, ‘#525s36’); DELETE FROM Article WHERE Title=‘Database’; DELETE FROM Article WHERE Content=‘#525s35’;

  18. Retrieval Queries in SQL • Basic form of the SQL retrieval queries: SELECT <attribute list> FROM <table list> WHERE <condition> • <attribute list> is a list of column names whose values are to be retrieved by the query • <table list> is a list of table names required to process the query • <condition> is a conditional (Boolean) expression that identifies the rows to be retrieved by the query

  19. Example: Wikipedia Database

  20. SQL Query (for 1 Table) Query: Find the names of users who have administrative privileges SELECT Username FROM User WHERE isAdmin = true;

  21. Query: Show all the records in the User table SQL Query (for 1 Table) A wildcard (*) in the SELECT clause means retrieve all the columns SELECT * FROM User; Query has no WHERE clause (which means all rows will satisfy the condition)

  22. Query: Count the number of edits made by XYZ SQL Query (for 1 Table) Count is an aggregate function; used to aggregate the rows SELECT Count(*) FROM Edit WHERE Editor=‘XYZ’;

  23. Query: Count the number of edits made by each user SQL Query (for 1 Table) SELECT Editor, Count(*) FROM Edit GROUP BY Editor; GROUP BY clause means group the results based on the Editor column before summarizing them

  24. Query: Count the number of distinct articles edited by each user SQL Query (for 1 Table) SELECT Editor, Count(DISTINCT Edited_Title) FROM Edit GROUP BY Editor;

  25. Query: Find the top 2 editors who made the most edits SQL Query (for 1 Table) SELECT Editor, Count(*) FROM Edit GROUP BY Editor ORDER BY Count(*) DESC LIMIT 2;

  26. SQL Query for 2 Tables • Query: Find the titles of Wikipedia articles edited by administrators SELECT Edited_Title FROM User, Edit WHERE IsAdmin = True; SQL is wrong. Missing the join condition.

  27. SQL Query for 2 Tables • Query: Find the titles of Wikipedia articles edited by administrators SELECT Edited_Title FROM User, Edit WHERE Editor = Username AND IsAdmin = True; Join condition

  28. SQL Query for 2 Tables • Query: For each non-administrator, count the number of distinct articles they have edited SELECT Editor, COUNT(DISTINCT Edited_Title) FROM User, Edit WHERE Editor = Username AND IsAdmin = False GROUP BY Editor;

  29. Using MySQL with Python Install mysql-connector-python library conda install mysql-connector-python Steps Import mysql-connector module Create database connection cnx = mysql.connector.connect(username, password, host, database) Create a cursor cursor = cnx.cursor() Pass query to cursor for execution cursor.execute(query) If it is a database update query: cnx.commit() Close cursor and database connection

  30. Using MySQL with Python Example: load wiki_edit.txt file into MySQL database Schema: RevisionId, Article_name, EditDate, Editor

  31. Using MySQL with Python • Create a new table (and drop the old one) in MySQL

  32. Using MySQL with Python • Insert tuples from DataFrame into MySQL (after removing the unicode characters in article name) Make sure you commit the database changes

  33. Using MySQL with Python • Query table and print the results

  34. Accessing CSE MySQL from Home • You may not be able to access the database directly from your code outside of MSU network (firewall) • You may need to login to MSU VPN or EGR VPN

More Related