1 / 33

SQL, Databases, and Ensembl Modules

SQL, Databases, and Ensembl Modules. Please look for next lecture. Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html. Survey of Databases (if time). Ensembl web interface Ensembl Biomart. RDBMS. Oracle and Sybase (many others)

Download Presentation

SQL, Databases, and Ensembl Modules

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL, Databases, and Ensembl Modules

  2. Please look for next lecture Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html

  3. Survey of Databases(if time) Ensembl web interface Ensembl Biomart

  4. RDBMS • Oracle and Sybase (many others) • industry standard, commercial products • development and management tools • PostgresSQL • full-featured relational DBMS • open source • found in most linux distributions • handles unusual datatypes well which adds flexibility for future extensions • MySQL • open-source relational DBMS • easy to setup and use • Linux/Windows/Mac • Each has variations • SQL, datatypes, functions, features

  5. Structured Query Language (SQL) • SQL is the standard language used to create, modify, maintain, and query relational databases. • SQL commands are issued within the context of the DBMS interface • SQL commands can be passed to the DBMS by other programs (such as perl, PHP, Java, manually, etc.) • Many venders have extensions (Sybase-SQL) • SQL Example: • select FIELD from TABLE where CONDITION • select TABLE.FIELD from TABLE where CONDITION • select program from software where program like 'blast%';

  6. SQL datatypes • datatype -- describes the data stored in a particular column of a table • typically is either numeric or character strings • SQL defines subtypes that set different upper limits on the size of text or numerical data • Also -- special types such as DATE, MONEY

  7. SQL datatypes • INT • FLOAT • REAL (larger float) • CHAR -- fixed length text string • TEXT -- variable length text • BLOB -- variable length binary field • DECIMAL -- real number stored as character string • DATE • TIMESTAMP -- value updates every time the record is modified • ENUM -- limited set of options (numeric or named) • SET -- value is one of a limited set

  8. Creating a new table in the database CREATE TABLE tablename (columnane type [modifiers], columanme type [modifiers]) create table software_package ( packid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, packname VARCHAR(100), packurl VARCHAR(255), function TEXT, keyword ENUM, os SET, format SET, archfile VARCHAR(255) );

  9. Inserting data INSERT INTO table(colname1, colname2, columname3) VALUES ('value1','value2','value3')

  10. Accessing Data -- SELECT SELECT [fields] FROM [table] where [condition] select * from [table] where [condition] select packname, packurl from software select packname, packurl from software where keyword = "sequence alignment"; select packname, packurl from software where packname like "B%";

  11. Joining multiple tables • SELECT can "join" two related tables • relationships between tables are created by replicating information ("primary key") from one table as a "foreign key" in another table select attribute from table1, table2 where primarykey1 = foreignkey2 SELECT packname, packurl, reference_date FROM software, reference WHERE software.package_id = reference.package_id AND reference_date >= 1998;

  12. Example select Clients.Last, Clients.First, Clients.Phone from Clients where Clients.City = “El Paso” order by Clients.Last, Clients.First select Last, First, Phone from Clients where City = "El Paso" order by Last, First

  13. SQL Basics • select * from table_name • select count(*) from table_name • Find start times of Entertainers where Jameson is the Client • Nested query (may not always be supported) • select StartTime from Engagements where ClientID = (select ClientID from Clients where Last = “Jameson”) • Join • select Engagements.StartTime, Clients.ClientID from Engagements, Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” • select Engagements.StartTime from Engagements,Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” • insert into TABLE values (val1, val2, …) • insert into TABLE (FIELDa, FIELDb, …FIELDn) values (value1, value2, …) • insert into Clients values (“Terry”, “Braun”, 11/10/03,555-6666)

  14. …as a spreadsheet Flattened table structure, and lined up the rows. Appears to be okay…. Except… The relationships in this trivial example are more complicated than a simple flat structure such as a spreadsheet. Example) The first Engagement is between Client 9001, and Entertainer 3002

  15. SQL Basics delete TABLE where FIELD = value update TABLE set FIELD = value drop DATABASE

  16. exon transcript_id exon_num sequence_start sequence_stop intron transcript_id intron_num sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name description date transcript id sequence_id source source_id sequence id target_id type sequence chr_name strand genomic_start genomic_stop source source_id refresh target id date gene_name description accession status set_target_join set_id target_id rank cas_rank cas_options set_table id project_id name date description

  17. Sample Data exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description

  18. Extracting Data from a DB • This simple figure provides me with enough understanding of the database structure to extract data

  19. Local SQL Example: • We have gene name (BBS4), and would like to find project name, and the exon number, start/stops • select id from target where gene_name = "BBS4" • 11 • select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = TARGET_ID • 3 • select project.name from project, set_table where set_table.project_id = project.id and set_table.id = SET_ID • project333 • select id from sequence where target_id = TARGET_ID • 33 • select id from transcript where sequence_id = SEQUENCE_ID • 68 • select exon_num, sequence_start, sequence_stop from exon where transcript_id = TRANSCRIPT_ID • Whew.

  20. MySQL demo (gscr) • mysql -h gscr -u tabraun –p • use trapss;

  21. TrAPSS Database Show tables; Show columns from table TABLE_NAME;

  22. exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name =testset date description • select id from target where gene_name = "BBS4";

  23. exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id = 23 date gene_name = BBS4 description accession status target_set_info set_id target_id = 23 rank = 5 cas_rank cas_options set_table id project_id name = testset date description select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = 23; (3 rows ==> in 3 sets)

  24. exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id = target_id rank = 5 cas_rank cas_options set_table id = 7 project_id name =testset date description select project.name from project, set_table where set_table.project_id = project.id and set_table.id = 7;

  25. exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id source = Ensembl source_id sequence id target_id = 23 type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select id from sequence where target_id = 23;

  26. exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id sequence_id = 2216 source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select id from transcript where sequence_id = 2216;

  27. exon transcript_id = 5053 exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id project id name = pro1 description date transcript id= 5053 sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status target_set_info set_id target_id rank = 5 cas_rank cas_options set_table id project_id name = testset date description select exon_num, sequence_start, sequence_stop from exon where transcript_id = 5053;

  28. SQL Examples – One Large Query select distinct exon_num, project.name, sequence_start, sequence_stop from exon, transcript, sequence, target, target_set_info, set_table, project where target.gene_name = "BBS4" and project.name = "pro1" and set_table.name = "testset" and set_table.project_id = project.id and target_set_info.set_id = set_table.id and target_set_info.target_id = target.id and sequence.target_id = target.id and transcript.sequence_id = sequence.id and exon.transcript_id = transcript.id order by exon_num;

  29. Executing SQL commands from a text filehttp://www.mysql.org/doc/refman/5.0/en/batch-commands.html • However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here: • shell> mysql db_name < text_file • If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line: • shell> mysql < text_file • If you are already running mysql, you can execute an SQL script file using the source or \. command: • mysql> source file_name • mysql> \. file_name

  30. Remotely mysql -h gscr.eng.uiowa.edu -u tabraun -p

  31. Setting up MySQL • server -- database application that is running on the machine where the data is stored • runs as a "daemon" on Unix machines • a process that is always on, listening for and responding to requests (from clients) • mysqld -- server • client -- programs that connect to the server and request data • can be on same machine as server, or • located on other machines and connected by network • mysql -- client • mysqladmin -- administration tool

  32. MySLQ help http://dev.mysql.com/ Show tables; Show columns from table TABLE_NAME; Show TrAPSS (time permitting)

  33. Survey of Databases(if time) Ensembl web interface Ensembl Ensmart

More Related