330 likes | 481 Views
SQL, Databases, and Ensembl Modules. Please look for next lecture. Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html. Survey of Databases (if time). Ensembl web interface Ensembl Biomart. RDBMS. Oracle and Sybase (many others)
E N D
Please look for next lecture Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html
Survey of Databases(if time) Ensembl web interface Ensembl Biomart
RDBMS • Oracle and Sybase (many others) • industry standard, commercial products • development and management tools • PostgresSQL • full-featured relational DBMS • open source • found in most linux distributions • handles unusual datatypes well which adds flexibility for future extensions • MySQL • open-source relational DBMS • easy to setup and use • Linux/Windows/Mac • Each has variations • SQL, datatypes, functions, features
Structured Query Language (SQL) • SQL is the standard language used to create, modify, maintain, and query relational databases. • SQL commands are issued within the context of the DBMS interface • SQL commands can be passed to the DBMS by other programs (such as perl, PHP, Java, manually, etc.) • Many venders have extensions (Sybase-SQL) • SQL Example: • select FIELD from TABLE where CONDITION • select TABLE.FIELD from TABLE where CONDITION • select program from software where program like 'blast%';
SQL datatypes • datatype -- describes the data stored in a particular column of a table • typically is either numeric or character strings • SQL defines subtypes that set different upper limits on the size of text or numerical data • Also -- special types such as DATE, MONEY
SQL datatypes • INT • FLOAT • REAL (larger float) • CHAR -- fixed length text string • TEXT -- variable length text • BLOB -- variable length binary field • DECIMAL -- real number stored as character string • DATE • TIMESTAMP -- value updates every time the record is modified • ENUM -- limited set of options (numeric or named) • SET -- value is one of a limited set
Creating a new table in the database CREATE TABLE tablename (columnane type [modifiers], columanme type [modifiers]) create table software_package ( packid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, packname VARCHAR(100), packurl VARCHAR(255), function TEXT, keyword ENUM, os SET, format SET, archfile VARCHAR(255) );
Inserting data INSERT INTO table(colname1, colname2, columname3) VALUES ('value1','value2','value3')
Accessing Data -- SELECT SELECT [fields] FROM [table] where [condition] select * from [table] where [condition] select packname, packurl from software select packname, packurl from software where keyword = "sequence alignment"; select packname, packurl from software where packname like "B%";
Joining multiple tables • SELECT can "join" two related tables • relationships between tables are created by replicating information ("primary key") from one table as a "foreign key" in another table select attribute from table1, table2 where primarykey1 = foreignkey2 SELECT packname, packurl, reference_date FROM software, reference WHERE software.package_id = reference.package_id AND reference_date >= 1998;
Example select Clients.Last, Clients.First, Clients.Phone from Clients where Clients.City = “El Paso” order by Clients.Last, Clients.First select Last, First, Phone from Clients where City = "El Paso" order by Last, First
SQL Basics • select * from table_name • select count(*) from table_name • Find start times of Entertainers where Jameson is the Client • Nested query (may not always be supported) • select StartTime from Engagements where ClientID = (select ClientID from Clients where Last = “Jameson”) • Join • select Engagements.StartTime, Clients.ClientID from Engagements, Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” • select Engagements.StartTime from Engagements,Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” • insert into TABLE values (val1, val2, …) • insert into TABLE (FIELDa, FIELDb, …FIELDn) values (value1, value2, …) • insert into Clients values (“Terry”, “Braun”, 11/10/03,555-6666)
…as a spreadsheet Flattened table structure, and lined up the rows. Appears to be okay…. Except… The relationships in this trivial example are more complicated than a simple flat structure such as a spreadsheet. Example) The first Engagement is between Client 9001, and Entertainer 3002
SQL Basics delete TABLE where FIELD = value update TABLE set FIELD = value drop DATABASE
exon transcript_id exon_num sequence_start sequence_stop intron transcript_id intron_num sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name description date transcript id sequence_id source source_id sequence id target_id type sequence chr_name strand genomic_start genomic_stop source source_id refresh target id date gene_name description accession status set_target_join set_id target_id rank cas_rank cas_options set_table id project_id name date description
Sample Data exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description
Extracting Data from a DB • This simple figure provides me with enough understanding of the database structure to extract data
Local SQL Example: • We have gene name (BBS4), and would like to find project name, and the exon number, start/stops • select id from target where gene_name = "BBS4" • 11 • select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = TARGET_ID • 3 • select project.name from project, set_table where set_table.project_id = project.id and set_table.id = SET_ID • project333 • select id from sequence where target_id = TARGET_ID • 33 • select id from transcript where sequence_id = SEQUENCE_ID • 68 • select exon_num, sequence_start, sequence_stop from exon where transcript_id = TRANSCRIPT_ID • Whew.
MySQL demo (gscr) • mysql -h gscr -u tabraun –p • use trapss;
TrAPSS Database Show tables; Show columns from table TABLE_NAME;
exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name =testset date description • select id from target where gene_name = "BBS4";
exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id = 23 date gene_name = BBS4 description accession status target_set_info set_id target_id = 23 rank = 5 cas_rank cas_options set_table id project_id name = testset date description select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = 23; (3 rows ==> in 3 sets)
exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id = target_id rank = 5 cas_rank cas_options set_table id = 7 project_id name =testset date description select project.name from project, set_table where set_table.project_id = project.id and set_table.id = 7;
exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id = 23 type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select id from sequence where target_id = 23;
exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id = 2216 source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select id from transcript where sequence_id = 2216;
exon transcript_id = 5053 exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id= 5053 sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select exon_num, sequence_start, sequence_stop from exon where transcript_id = 5053;
SQL Examples – One Large Query select distinct exon_num, project.name, sequence_start, sequence_stop from exon, transcript, sequence, target, target_set_info, set_table, project where target.gene_name = "BBS4" and project.name = "pro1" and set_table.name = "testset" and set_table.project_id = project.id and target_set_info.set_id = set_table.id and target_set_info.target_id = target.id and sequence.target_id = target.id and transcript.sequence_id = sequence.id and exon.transcript_id = transcript.id order by exon_num;
Executing SQL commands from a text filehttp://www.mysql.org/doc/refman/5.0/en/batch-commands.html • However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here: • shell> mysql db_name < text_file • If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line: • shell> mysql < text_file • If you are already running mysql, you can execute an SQL script file using the source or \. command: • mysql> source file_name • mysql> \. file_name
Remotely mysql -h gscr.eng.uiowa.edu -u tabraun -p
Setting up MySQL • server -- database application that is running on the machine where the data is stored • runs as a "daemon" on Unix machines • a process that is always on, listening for and responding to requests (from clients) • mysqld -- server • client -- programs that connect to the server and request data • can be on same machine as server, or • located on other machines and connected by network • mysql -- client • mysqladmin -- administration tool
MySLQ help http://dev.mysql.com/ Show tables; Show columns from table TABLE_NAME; Show TrAPSS (time permitting)
Survey of Databases(if time) Ensembl web interface Ensembl Ensmart