Running Other Programs And CGI Scripts
Teaching Survey
Please fill in the teaching survey at: http://www.ims.tau.ac.il/tal/login.asp
I read it closely, and I make changes in the course from year to year according to the feedback.
Exam
• The exam will be on the computers in the PC classroom, on 31/1/2007 at 9:00
• The computers will be disconnected from the network (i.e. no internet access. Sorry…)
• You will receive a floppy disk (diskette) with some files, and the exam questions on paper.
• You will write your solutions as normal Perl scripts and save them to the floppy, which you will submit at the end of the exam.
• 2 A4 pages
• Everything except BioPerl and CGI
Some exam questions
• Write a script that reads a DNA sequence from STDIN and prints its reverse complement. The sequence may be in either lowercase or uppercase letters.
• The file exam1.pl contains a script that reads a sequence file in Genbank format. Add the missing regular expression in line 25 so that it finds all CDS lines. The regular expression should extract the coordinates of the start and stop codons. Fill in the appropriate variables in lines 27 and 28.
• The file exam2.pl contains a script that reads a file in PDB format (see example in EHD1.pbd) and finds all the "ATOM…" lines. Write the subroutine getAtomInfo that is called for each such line. The subroutine has one parameter: the scalar string of the ATOM line. It should return the following data structure:
    {'amino_acid' => AMINO_ACID, 'coordinates' => [X,Y,Z], 'amino_acid_number' => N}
• Make a copy of exam2.pl and name it exam3.pl. Add a new section at the end of the script that makes an array of arrays. Each internal array should hold all the hashes of the ATOMs that belong to a single amino acid of the protein.
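As an illustration of the first question, here is one possible sketch of a reverse-complement subroutine. The name reverse_complement and the hard-coded demo sequence are assumptions made for this example, and characters other than A, C, G and T are simply left unchanged.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
# Upper- and lower-case letters are both handled; the case is preserved.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;     # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Demo on a hypothetical sequence (not taken from the exam files)
my $seq = "ATGCaa";
print reverse_complement($seq), "\n";
```

To read the sequence from STDIN instead, as the question asks, replace the demo lines with something like: my $seq = <STDIN>; chomp $seq;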
Dealing with less common formats
e.g. Rate4Site: still not very widely used (54 citations so far…), so there are no BioPerl modules that will run it for you and read its output:
#POS  SEQ  SCORE    QQ-INTERVAL         STD     MSA DATA
#The alpha parameter 1.5
  1   K   -0.9763  [-1.6621,-0.5750]   0.8777  6/6
  2   V    0.9820  [-0.1107,2.2169]    1.5983  6/6
  3   F    0.0035  [-0.9640,0.4935]    1.3195  6/6
  4   S    0.2010  [-0.7766,0.8962]    1.3975  6/6
  5   K   -0.3480  [-1.1423,0.1673]    1.0990  6/6
  6   C   -0.7887  [-1.4855,-0.3560]   1.0182  6/6
  7   E   -0.9894  [-1.6621,-0.5750]   0.8714  6/6
  8   L    0.0153  [-0.9640,0.4935]    1.3378  6/6
  9   A   -1.1347  [-1.6621,-0.7766]   0.7487  6/6
 10   H   -0.3200  [-1.1423,0.1673]    1.1252  6/6
 11   K   -0.3557  [-1.1423,0.1673]    1.1077  6/6
 12   L   -0.8331  [-1.4855,-0.3560]   0.9965  6/6
 13   K   -0.9763  [-1.6621,-0.5750]   0.8777  6/6
 14   A    1.6809  [0.4935,2.2169]     1.6672  6/6
 15   Q    1.4315  [0.1673,2.2169]     1.7297  6/6
 16   E    0.1025  [-0.9640,0.8962]    1.3784  6/6
 17   M    0.5006  [-0.5750,1.4226]    1.4456  6/6
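Since no ready-made module reads this format, a short regular expression can do the job. The following is a minimal sketch that assumes every data line looks like the ones shown above (position, residue, score, bracketed interval, STD, and an MSA count such as 6/6). The subroutine name parse_rate4site and the hash keys are made up for this example.

```perl
use strict;
use warnings;

# Parse Rate4Site output lines into a list of hash references.
# Lines starting with '#' (headers) and empty lines are skipped.
sub parse_rate4site {
    my (@lines) = @_;
    my @sites;
    foreach my $line (@lines) {
        next if $line =~ /^\s*#/;
        next if $line =~ /^\s*$/;
        if ($line =~ m{^\s*(\d+)\s+(\S)\s+(-?[\d.]+)\s+
                       \[\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\]\s+
                       ([\d.]+)\s+(\d+/\d+)}x) {
            push @sites, { pos      => $1,
                           residue  => $2,
                           score    => $3,
                           interval => [$4, $5],
                           std      => $6,
                           msa      => $7 };
        }
    }
    return @sites;
}

# Demo on two lines copied from the output shown above
my @sites = parse_rate4site(
    "#POS SEQ SCORE QQ-INTERVAL STD MSA DATA",
    "1 K -0.9763 [-1.6621,-0.5750] 0.8777 6/6",
    "14 A 1.6809 [0.4935,2.2169] 1.6672 6/6",
);
foreach my $site (@sites) {
    print "position $site->{pos} ($site->{residue}): score $site->{score}\n";
}
```

In a real script the lines would come from the Rate4Site output file (open it and pass <IN> to the subroutine) rather than from a hard-coded list.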
Running programs from a script
You may run programs using the system function:
    $exitValue = system("blast.exe ...");
    if ($exitValue != 0) {die "blast failed!";}
This way the output of blast will be seen on the screen.
Another way is to use "back-ticks" (left of the "1" key on your keyboard):
    @blastOutput = `blast.exe ...`;
This way the output of blast is stored in the array (one line per element).
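The two approaches can be tried side by side without having blast installed. In this sketch the external program is the Perl interpreter itself ($^X holds its path), used only as a stand-in for blast.exe. The commands it runs are invented for the demonstration.

```perl
use strict;
use warnings;

# 1. system: the child's output goes straight to the screen,
#    and we only get back its exit status.
my $exitValue = system($^X, '-e', 'print "shown on screen\n"');
if ($exitValue != 0) {
    # the real exit code of the program is in the high byte
    die "command failed, exit code " . ($exitValue >> 8) . "\n";
}

# 2. back-ticks: the child's output is captured, one line per element.
my $cmd = qq{"$^X" -e "print qq(line 1\\nline 2\\n)"};
my @output = `$cmd`;
print "captured ", scalar(@output), " lines\n";
print "first line: $output[0]";
```

Use system when you only care whether the program succeeded, and back-ticks when the script needs to parse what the program printed.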
Class exercise 15
• Write a script that runs clustalw on a given protein FASTA file (use ex15.zip from the website, use the help file in there!)
• Modify the script: Now do both multiple sequence alignment, and build an NJ tree.
• Modify the script: Now add a rate4site run on the output of clustalw (type "rate4site.exe -h" for help)
CGI: Common Gateway Interface
• A CGI script is a script that is intended to be used over the internet.
• A CGI script on a web server can be used by a user to obtain data from databases (e.g. Genbank web server) or run analyses for the user (e.g. Blast at NCBI). The results of the script are an HTML page.
HTML: What is a web page?
• All web pages that you see on the internet are written in HTML.
• HTML (HyperText Markup Language) is a computer language that defines how a web page will look in your web browser.
• Web browsers (such as Microsoft Internet Explorer) read HTML text files and produce colorful graphical pages.
• You can see the HTML source code of a web page in Explorer by clicking: View->Source
Try it on the course web page:
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="Eyal Privman">
<meta name="GENERATOR" content="Mozilla/4.77 [en] (X11; U; IRIX64 6.5 IP27) [Netscape]">
<title>Perl Programming By Eyal Privman</title>
<style>
…
HTML basics
• HTML uses tags. Tags are always enclosed in angle-brackets and are case-insensitive. For example: <head>
• Tags typically occur in begin-end pairs. These pairs are in the form <tag> ... </tag>
For example, if you want some text to be underlined in your page:
<u>Aim:</u> The aim of this course is to introduce the participant
Structure of HTML documents
• The whole document should be between <html> ... </html>
• The text between <head> ... </head> includes general information about the page.
• Inside the "head" section, use <title> ... </title> to write the title of the page.
• The text between <body> ... </body> is the actual contents of the page
Class exercise 16
• Create the following HTML file and view it with Internet Explorer:
<html>
<head>
<title>Hello World Page</title>
</head>
<body>
<h1> Hello World! </h1>
</body>
</html>
• (name your file "class_ex16.1.html")
Running a CGI over the web
The easiest way to get yourself a web server is to have an account at the bioinformatics unit (on the bioinfo server).
You should place your HTML files and CGI scripts in your home directory on the bioinfo server.
You will have to ask the staff of the bioinfo unit to open your account to web access. (They will create the needed directories for you)
Producing an HTML page with a script
Any Perl script can output its results in HTML, using simple print commands. The Perl CGI module can make it easier for you:
    #!/usr/local/bin/perl
    # (the line above is necessary on a UNIX server)
    use CGI;
    my $cgi = new CGI;
    print $cgi->header .
          $cgi->start_html('Hello World Page') .
          $cgi->h1('Hello World!') .
          $cgi->end_html;
    exit (0);   # tells the server everything is fine
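For comparison, the same page can be produced with plain print commands and no CGI module at all. This is only a sketch: the subroutine name hello_page is invented here, and the first printed line is the HTTP header that a CGI script has to send, followed by an empty line, before the HTML itself.

```perl
use strict;
use warnings;

# Build a complete CGI response: HTTP header, empty line, then the HTML page.
sub hello_page {
    my ($title, $heading) = @_;
    return "Content-Type: text/html\n\n"
         . "<html>\n"
         . "<head><title>$title</title></head>\n"
         . "<body>\n"
         . "<h1>$heading</h1>\n"
         . "</body>\n"
         . "</html>\n";
}

print hello_page('Hello World Page', 'Hello World!');
```

The CGI module version is shorter and less error-prone for larger pages, but seeing the raw output makes it clear that a CGI script is just a program that prints text.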
Class exercise 16
2. Create the Perl script from the previous slide and test it.
An HTML form can run a CGI script
Here is the HTML for a form that takes input (a name) and invokes a CGI script named script.pl, which should be placed in the directory cgi-bin:
<HTML>
<HEAD>
<TITLE>HTML Form Example</TITLE>
</HEAD>
<BODY>
<FORM method="GET" action="/cgi-bin/script.pl">
<h3>Enter your name:</h3>
<p> <INPUT type="text" name="userName"> </p>
<h3>Submit this Form</h3>
<p> <INPUT type="submit" value="Send Data Now!"> </p>
<h3>Reset this Form</h3>
<p> <INPUT type="reset" value="Clear all my input now"> </p>
</FORM>
</BODY>
</HTML>
Using the input in the CGI script
Use the CGI function param to get the input that was entered into the form.
To get a list of all parameter names:
    my @params = $cgi->param();
To get the value (or values) for a specific parameter name:
    my @values = $cgi->param(PARAM_NAME);
For the example form in the previous slide, the CGI script could do this:
    print $cgi->h1('Hello '.$cgi->param("userName").'!');
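To see where these parameters come from: with method="GET" the browser appends the form input to the address as a query string, for example script.pl?userName=Eyal, and the web server passes it to the script in the environment variable QUERY_STRING. The sketch below decodes such a string by hand. It is a simplified illustration and not how the CGI module is actually implemented; the subroutine name parse_query and the sample input are invented.

```perl
use strict;
use warnings;

# Decode a GET query string into a hash: name => [list of values].
sub parse_query {
    my ($query) = @_;
    my %param;
    foreach my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
        }
        push @{ $param{$name} }, $value;
    }
    return %param;
}

# In a real CGI script the input would be $ENV{QUERY_STRING}
my %param = parse_query('userName=Ada+Lovelace&lang=Perl%205');
print "Hello $param{userName}[0]!\n";
```

In practice $cgi->param does all of this for you, which is one of the main reasons to use the CGI module.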
Class exercise 17: The UNIX Challenge
• Create the HTML form and the Perl script from the previous slides on the bioinfo server (it's a UNIX system!):
• Log in to bioinfo using TeraTerm (Start → Tera Term): The host is "bioinfo.tau.ac.il", choose SSH, click OK, click Yes, user-name is "symp", password is "turj".
• In UNIX you can use "cd" as in Windows, and "ls" or "ls -l" are like "dir".
• Use the command "mkdir DIR_NAME" to create a directory named after your first name inside the directory "public_html". The HTML file should be in there.
• To create and edit files use the editor pico ("pico FILE_NAME"). To paste into TeraTerm click the middle mouse button.
• To access this HTML from your browser use this address: http://bioinfo.tau.ac.il/~symp/YOUR_NAME/form.html
Class exercise 17
• Create another directory for yourself inside the directory "cgi-bin". The CGI script should be in there.
• After creating the script you have to give it execution permissions: "chmod +x SCRIPT_NAME". Use "ls -l" to check that it now has x's like this:
    (bioinfo:symp)~/cgi-bin/eyal>ls -l
    -rwxr-xr-x 1 symp staff 167 Jan 23 13:14 hello.pl*
• The reference to the CGI script in the HTML form should be:
    <FORM method="GET" action="/cgi-bin/symp/YOUR_NAME/script.pl">
Bonus* Write another HTML form that asks the user for a FASTA file of DNA sequences, and runs a CGI version of ex3.4 (find ORFs in each sequence)
(Class exercise 17) Download and install a package
• If you find a package in CPAN or elsewhere you can usually download a compressed archive of all the files of the package, which is usually a .tar.gz file.
For example: Search for BioPerl version 1.4 in CPAN. It should be called something like "bioperl-1.4.tar.gz"
• Unzip it (extract the files from the compressed archive)
• Place the unzipped files or directories in the site\lib\ directory of the ActivePerl directory on your computer (…\ActivePerl-5.8.7.813\site\lib\).
For example, the "Bio" directory of BioPerl should be moved to: …\ActivePerl-5.8.7.813\site\lib\Bio
Now you should be able to use modules named like Bio::SeqIO.
• Test it with SeqIO_example.pl (available on the webpage)
(Class exercise 17) Using packages from other directories
The command "use lib" asks Perl to also look in a certain directory when searching for packages that are used in the script:
    use lib 'D:\perl\myPackages';
    use myPackage;
(Assuming that the directory "myPackages" contains "myPackage.pm")
• Move the "Bio" directory of BioPerl to 'D:\test' and make SeqIO_example.pl find it by adding "use lib"
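The effect of "use lib" can be checked with a small self-contained script. In this sketch a temporary directory plays the role of 'D:\perl\myPackages', and the module myDemoPackage and its greet subroutine are invented for the demonstration. The BEGIN block is needed only because the demo creates the module file itself, and "use" statements are executed at compile time.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir;
BEGIN {
    # Create a throw-away directory holding a tiny module file
    $dir = tempdir(CLEANUP => 1);
    open(my $fh, '>', "$dir/myDemoPackage.pm") or die "cannot write module: $!";
    print $fh "package myDemoPackage;\n";
    print $fh "sub greet { return 'hello from myDemoPackage' }\n";
    print $fh "1;\n";
    close($fh);
}

use lib $dir;         # add the directory to Perl's search path (@INC)
use myDemoPackage;    # found only because of the line above

print myDemoPackage::greet(), "\n";
print "search path now starts with: $INC[0]\n";
```

In a normal script the directory already exists, so the two "use" lines from the slide are all that is needed.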
BioPerl: run blast over the web
BioPerl lets us blast our sequence at the NCBI website: use Bio::Tools::Run::RemoteBlast instead of Bio::Tools::Blast (which I showed you before)
    use Bio::Tools::Run::RemoteBlast;
    # here we define the parameters and input of blast
    my %runParam = (-method   => 'remote',
                    -prog     => 'blastp',
                    -database => 'swissprot',
                    -seqs     => [$seqObj1,$seqObj2]);
    # here we run it
    my $blastObj = Bio::Tools::Blast->new(-run    => \%runParam,
                                          -parse  => 1,        # ask to parse the report
                                          -signif => '1e-10',  # the cutoff
                                          -strict => 1);
Running a local blast
1. You could install blast on your computer from: ftp.ncbi.nlm.nih.gov (there, go to the directory: blast/executables/release/). But this may be difficult, and you will also need to download and install the databases you want to search.
2. You can also work on the Unix servers of the bioinformatics unit, where you can use the local blast that is already installed. The Genbank databases that are installed there can be used for blast and for any other work, such as getting a sequence by its accession.
Class exercise 18
• Write a script that runs blast over the web on a given protein FASTA file (use the same FASTA file as in ex. 14), and prints the accessions of the first 20 hits for each input sequence.
• Modify the script: Take the accession of a sequence as a command-line argument, fetch this sequence from Genbank over the web, and then blast it