140 likes | 265 Views
CS143 Project 1. Due: Oct 24 th , 11:59 PM All the materials will be posted in courseweb . Before we start. Two things to do: Find your partner At most 2 students
E N D
CS143 Project 1 Due: Oct 24th, 11:59 PM All the materials will be posted in courseweb.
Before we start • Two things to do: • Find your partner • At most 2 students • Send team information (you and your partner's name, UID, email, expected password for your MySQL account) to one of TAs by Oct 12th • Note: This team is for your project, NOT homework. You need to finish your homework individually. • Get familiar with Linux and MySQL
Linux • A Unix-like, open source operating system • All the projects will be done on the SEASNET linux server. • lnxsrv03.seas.ucla.edu • How to access the server? • If you are using SEASNET machine, all SEASNET machines already have a secure shell client installed, so you simply need to run the client. • If you need to access from a personal machine that does not have a secure shell client, you can download a windows secure shell client http://www.filewatcher.com/m/SSHSecureShellClient-3.2.9.exe.5517312.0.0.html Or Putty http://www.chiark.greenend.org.uk/~sgtatham/putty/ • Mac OS X or Unix machines have a secure shell client preinstalled. Simply type "ssh -l <userid> lnxsrv03.seas.ucla.edu" within your command line interface
Linux • Account for Linux Server • Apply SEASNET account if you don’t have one • Frequently used Linux command: • http://linuxcommand.org/learning_the_shell.php • Try it!
MySQL • Already installed on lnxsrv03 server • Username & password will be assigned after you submit your team request. • MySQL document: http://dev.mysql.com/tech-resources/articles/mysql_intro.html
Project 1 • Step 1: Loading the data • Step 2: Running easy queries • Step 3: Applying some constraints • Step 4: Join operation • Step 5: A more complicated query • Step 6: Putting all together
Step 1: Loading the data • There are 5 data files located at /u/cs/class/cs143/cs143ta/proj1/data/ • Author.csv, Coauthored.csv, Authored.csv, Paper.csv, Cites.csv • Also available at the courseweb • Task: Load these 5 data files to MySQL using the “load data” command • Before Loading, you should: • 1. Log in MySQL • 2. Use your own database • 3. Create 5 tables: Author, Coauthored, Authored, Paper, Cites
Step 1: Loading the data Table Cites: • ID (INTEGER) • paper1ID (INTEGER) • paper2ID (INTEGER) • is_self_citation(INTEGER) Table Authored: • ID (INTEGER) • AuthorID(INTEGER) • paperID(INTEGER) • Email (VARCHAR) • rank_in_author_list(INTEGER) • original_name_str(VARCHAR) • email_domain(VARCHAR) • email_country(VARCHAR) • affiliation_str(VARCHAR) • affil(VARCHAR) Table Paper: • ID (INTEGER) • paper_id(INTEGER) • title_str(VARCHAR) • authors_str(VARCHAR) • area (VARCHAR) • num_abstract_wds(INTEGER) • num_authors(INTEGER) • num_kb(INTEGER) • num_pages(INTEGER) • num_revisions(INTEGER) • num_title_wds(INTEGER) • comments_str(VARCHAR) • submit_date(DATE) • submitter_email(VARCHAR) • submitter_name(VARCHAR) Table Author: • ID (INTEGER) • author_name(VARCHAR) • first_name(VARCHAR) • last_name(VARCHAR) • preferred_name(VARCHAR) Table CoAuthored: • ID (INTEGER) • author1ID (INTEGER) • author2ID (INTEGER) • paper_ID(INTEGER)
Step 2: Running some easy queries • Write queries that return the answers to these questions: • “Give me the author_name of all the Authors with first_name ‘Kevin’.” • “Return author_name and preferred_name of all the Authors who have different author_name and preferred_name.” Sort your results first by author_name then by preferred_name.
Step 3: Applying some constraints • Add a unique key constraint to the CoAuthored table in which the combination of (author1ID, author2ID, paper_ID) should be unique. • Add foreign key constraints for author1ID and author2ID. • More details in project description.
Step 4: Join operation • Write queries that return the answers to these questions: • “Return the author_name of all co-authors of the author with ID ‘42673’.” • “Return the author_name of all authors who have more than 10 co-authors.”
Step 5: A more complicated query • Write one query that returns the answer to following question: • “Give me the author_name of all the authors with the number of papers they co-authored in, in the decreasing order of the number of paper.”
Step 6: Putting all together • Create a script named P1 that shows every step in this part of the project. You can use the ‘--' tag to make comments within your SQL script. Make sure you give adequate comments documenting each part of each step. • Execute the script and save all outputs in a file call P1_Output. • Add one README file, which includes you and your partner's name, UID, email, and any other information you think is useful. • Make a zip file and submit through courseweb.