390 likes | 495 Views
Biocomputing Databases and Perl. 13 th February 2006. HTTP. CGI. Web Server. Application Server. Client. Request. Response. HTTP. CGI. DB API. Web Server. Application Server. Client. DB Server +DB. e.g. PC + Apache +PERL (with CGI module) + MySQL. The Database Tier.
E N D
BiocomputingDatabases and Perl 13th February 2006
HTTP CGI Web Server Application Server Client Request Response
HTTP CGI DB API Web Server Application Server Client DB Server +DB e.g. PC + Apache +PERL (with CGI module) + MySQL
The Database Tier • stores and retrieves data. • manages updates, allows simultaneous (concurrent) access from web servers, provides security, ensures the integrity of data • requires complex software. • Database Management Systems (DBMSs) • searches and manages data that's stored in databases • a database is a collection of related data, and an application can have more than one database • interface is accessed using SQL (Standard Query Language) that's used to define and manipulate databases and data
Why use a database server? • more than one user who needs to access the data at the same time. • a moderate amount of data • relationships between the stored data items • more than one kind of data object • constraints that must be rigidly enforced on the data, such as field lengths, field types etc.. • If data must be queried to produce reports or results. • large amount of data that must be searched quickly. • If security is important • When adding, deleting, or modifying data is a complex process.
Relational databases • Entities and Attributes • An entity is a person, place, event, or thing about which data is collected • An entity set is a named collection of entities sharing common characteristics • Attributes are characteristics of the entity • Attributes are also called fields • Tables • A table holds related entities or an entity set • Also called relations • Comprised of rows and columns
Tables • Two-dimensional structure with rows and columns • Each row (tuple) represents a single entity • Columns represent attributes • Row/column intersection represents single value • Tables must have an attribute to uniquely identify each row, a unique key • Column values all have same data format • Each column has range of values called attribute domain • Order of the rows and columns is immaterial to the DBMS
Fingerprint Sequence Motif
mySQL • Command-line interpreter • UNIX/Mac /usr/local/bin/mysql -uhugh -pshhh • Windows Start > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client
Managing databases • Creating databases mysql> CREATE DATABASE phyloprints; • To use that database use phyloprints; • Creating tables CREATE TABLE "fingerprint" ( "fprint_accn" varchar(15), "identifier" varchar(15), "motifs" int2, "date" date, "up_date" date, "family_title" text, "family_doc“ varchar(18), "category" text, "second_accn" varchar(15), "pseudo" bool ); SHOW CREATE TABLE wine;
Table = fingerprint +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | fprint_accn | varchar() | 15 | | identifier | varchar() | 15 | | motifs | int2 | 2 | | date | date | 4 | | up_date | date | 4 | | family_title | text | var | | family_doc | varchar() | 18 | | category | text | var | | second_accn | varchar() | 15 | | pseudo | bool | 1 | +----------------------------------+----------------------------------+-------+ Table = motif +----------------------------------+----------------------------------+-------+ |Field | Type |Length | +----------------------------------+----------------------------------+-------+ |fprint_accn | varchar() | 18 | | motif | int2 | 2 | | repeat | varchar() | 4 | | seqn_accn | varchar() | 15 | | seqn_fragment | varchar() | 35 | | start_position | int4 | 4 | | inter_motif_dist | int4 | 4 | | initial | bool | 1 | | final | bool | 1 | +----------------------------------+----------------------------------+-------+ attributes
Modifiers • Are applied to attributes • NOT NULL (row can’t exist without this attribute having a value) identifier varchar(15) NOT NULL, • DEFAULT (sets the data to the value that follows when no data is supplied) pseudo" bool DEFAULT f,
Keys • Primary key – one or more attributes that uniquely identify a row in a table • It is essential that every table has one PRIMARY KEY (fprint_accn), key names*fprint_accn,identifier) type=MyISAM;
Removing databases and tables • DROP statement (make sure you want to do this – it doesn’t check!) DROP TABLE seqn; DROP DATABASE phyloprints;
Inserting/Updating/Deleting Data • Inserting Data INSERT INTO fingerprint VALUES (‘PR90008’, ‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t); • Inserting lots of data rows INSERT INTO fingerprint VALUES (‘PR90008’, ‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t),(‘PR00240’,’ADRENRGCA1DR’,7,1996-08-14,2000-02-18,’Alpha-1D adrenergic receptor signature’,’PDOC00240’,f);
DELETE statement • All data DELETE FROM fingerprint; • Deleting specific rows – use WHERE DELETE FROM fingerprint WHERE identifer = ‘CRYSTALLIN’; • UPDATE statement UPDATE fingerprint SET fprint_accn = ‘PR00001’ WHERE identifier=‘CRYSTALLIN’;
Querying with SQL • Key operators – SELECT – PROJECT – JOIN • Other operators – INTERSECT – UNION (union compatible tables) – DIFFERENCE – PRODUCT – DIVIDE
Querying databases with SQL • SELECT statement SELECT identifier, fprint_accn FROM fingerprint; SELECT * FROM fingerprint; • WHERE clause SELECT * from fingerprint WHERE fprint_accn=‘PR00001’;
select identifier,fprint_accn from fingerprint where pseudo='t' ORDER BY fprint_accn; • select fprint_accn,COUNT(*) from fingerprint where pseudo='t' GROUP BY fprint_accn; • select DISTINCT seqn_accn from motif; • select distinct seqn_accn from motif LIMIT 10;
Join queries • output data that's based on relationships between two or more tables
BiocomputingDatabases and Perl 15th February 2006
Outline • Database connectivity in Perl using DBI module • A series of small exercises
Querying databases using Perl #! /usr/bin/perl -w # intro6.pl - connect to MySQL, retrieve data, write plain text output use strict; use DBI; my $user = ‘george’; my $pass = ‘george’; my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); $count = 0; while (my @val = $sth->fetchrow_array ()) { printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++$count; push(@allval,@val); } print "count rows total\n"; $sth->finish (); $dbh->disconnect (); exit (0);
%./intro6.pl name = Fargo-Moorhead Twins, wins = 36, losses = 16 name = Winnipeg Maroons, wins = 24, losses = 26 name = Minot Why Nots, wins = 19, losses = 23 name = Warren Wanderers, wins = 16, losses = 30 4 rows total
Perl example… use DBI; my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database: " . DBI->errstr; my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die "Couldn't prepare statement: " . $dbh->errstr; print "Enter name> "; while ($lastname = <>) { # Read input from the user my @data; chomp $lastname; $sth->execute($lastname) # Execute the query or die "Couldn't execute statement: " . $sth->errstr; # Read the matching records and print them out while (@data = $sth->fetchrow_array()) { my $firstname = $data[1]; my $id = $data[2]; print "\t$id: $firstname $lastname\n"; } if ($sth->rows == 0) { print "No names matched `$lastname'.\n\n"; } $sth->finish; print "\n"; print "Enter name> "; } $dbh->disconnect;
Query databases over the Web #! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write HTML output use strict; use DBI; use CGI qw(:standard); my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); print header(), start_html ("team data"); $count = 0; while (my @val = $sth->fetchrow_array ()) { print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2])); ++$count; } print p ("$count rows total"), end_html (); $sth->finish (); $dbh->disconnect (); exit (0);
More about data retrieval with DBI.pm • fetchrow_arrayref $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); while (my $ref = $sth->fetchrow_arrayref ()) { printf "name = %s, wins = %d, losses = %d\n", $ref->[0], $ref->[1], $ref->[2]; } $sth->finish ();
#!/usr/bin/perl use DBI; my $db = “sequence”; my $server = ‘localhost’; my $user = ‘root’; my $passwd = ‘passwd’; my $dbConnection = DBI->connect(“dbi::mysql:$db:$server”,$user,$passwd) my $query = “show tables”; my $sql = $dbConnection->prepare($query); $sql ->execute(); while(my $row = $sql->fetchrow_arrayref){ print join(“\t”,@$row),”\n”; } $dbConnection->disconnect; exit;
Higher retrieval methods • selectrow_array() • to retrieve a single row or a single column value • selectcol_arrayref() • returns the first column of a result set, as a reference to an array of values. • selectall_arrayref() • retrieves the entire result set as a matrix and returns a reference to it
Inserting data in the database sub new_employee { # Arguments: database handle; first and last names of new employee; # department ID number for new employee's work assignment my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle; $insert_handle->execute($first, $last, $department) or return 0; $update_handle->execute($department) or return 0; return 1; # Success }
Inserting data in the database sub new_employee { # Arguments: database handle; first and last names of new employee; # department ID number for new employee's work assignment my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle; my $success = 1; $success &&= $insert_handle->execute($first, $last, $department); $success &&= $update_handle->execute($department); my $result = ($success ? $dbh->commit : $dbh->rollback); unless ($result) { die "Couldn't finish transaction: " . $dbh->errstr } return $success; }
A short cut for DELETING, UPDATING and INSERTING data use the do statement $dbh->do('DELETE FROM people WHERE age > 65');
A little on more keys… • A foreign key is a column or columns whose values are the same as the primary key of another table. You can think of a foreign key as a copy of primary key from another relational table. The relationship is made between two relational tables by matching the values of the foreign key in one table with the values of the primary key in another.
Reminders • Executable CGI files • Configuring Apache to read cgi files from cgi-bin • In httpd.conf • ScriptAlias /cgi-bin/ "C:/Program Files/Apache Group/Apache2/cgi-bin/"