1. BINF634 FALL09 LECTURE 1 1 BINF 634 Bioinformatics Programming Instructor: Jeff Solka Ph.D.
Office: Room 312C OB
Phone: 540-809-9799
Email: jlsolka@gmail.com
Office Hours: By appointment
Required texts:
Beginning Perl for Bioinformatics by Tisdall and Waliszewski
Programming Perl (3rd Edition) by Wall, Christiansen and Orwant
Course Meeting Times: Room 304B, Monday 4:30 pm to 7:10 pm
Course webpage
http://binf.gmu.edu/~jsolka/fall09/binf634/Fall_2009BINF_634_Syllabus_rev1.html
2. BINF634 FALL09 LECTURE 1 2 Acknowledgements John Grefenstette
Assistance with course development.
Sharing course materials.
Friendship :^).
3. BINF634 FALL09 LECTURE 1 3 Experimental Biology Computational Biology and Bioinformatics
4. BINF634 FALL09 LECTURE 1 4 Bioinformatics Programming Tasks Manage large experimental data sets
Sequence data
Microarray data (gene expression)
Mass spec data (proteomics)
Genotype project data (HapMap)
Clinical data
Build tools for Knowledge Discovery
Find motifs in sequence data
Data clustering
Visualization
Build analysis pipelines
Glue several analysis steps together into a single automated process
"Munge" data: Take data from one application or database and format it for input to another application or database
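As a minimal sketch of the "munging" task described above, the following Perl script turns tab-delimited records into FASTA entries. The record layout (name, tab, sequence), the sample data, and the helper name to_fasta are assumptions made for illustration; they are not taken from the course materials.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one record per line, "name<TAB>sequence".
my @records = (
    "seq1\tATGCCCGGGTAA",
    "seq2\tATGAAATTTTAG",
);

# Convert one tab-delimited record into a two-line FASTA entry.
sub to_fasta {
    my ($line) = @_;
    chomp $line;                              # drop a trailing newline, if any
    my ($name, $seq) = split /\t/, $line;     # split on the tab character
    return ">$name\n$seq\n";
}

print to_fasta($_) for @records;
```

In a real pipeline the records would be read from a file or from another program's output rather than from a hard-coded list.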
5. BINF634 FALL09 LECTURE 1 5 Where the Course Fits
6. BINF634 FALL09 LECTURE 1 6
Objectives Programming skills
Problem solving and Debugging
Reading and Writing Documentation
Data Munging: Data filtering and transformation
Pattern matching and data mining
Visualization and web presentation
Object-oriented programming
Bioinformatics skills
Biological sequence analysis
Interacting with biological databases
Using Bioperl
7. BINF634 FALL09 LECTURE 1 7 Background and Prerequisites Molecular Biology
BIOL 482 or similar course
Recombinant DNA - Watson, Gilman, Witkowski, Zoller
http://www.amazon.com/Recombinant-DNA-Genes-Genomes-Course/dp/0716728664/ref=dp_ob_title_bk
Online Tutorials
http://www.biology-online.org/1/5_DNA.htm
Computer Science
IT 108, CS 112 or similar
Previous programming experience
8. BINF634 FALL09 LECTURE 1 8 Course Policies Programming assignments (50%)
5 graded programming assignments
Exams: Midterm (20%) and Final (20%)
May include both closed-book section and open-book programming problems
In-class Quizzes (10%)
Weekly homework assignments
All HW assignments must be submitted to me via email by the beginning of the next class. HW assignments will not be graded individually, but you may be called upon to discuss your work during the next class. Therefore, late assignments will not be accepted.
Grading criteria: A: 90-100 B: 80-89 C: 70-79
Keep an eye on the webpage
http://binf.gmu.edu/~jsolka/fall09/binf634/Fall_2009BINF_634_Syllabus_rev1.html
9. BINF634 FALL09 LECTURE 1 9 Honor Code Policies I take honor code violations very seriously.
Programming assignments must be your work. Each assignment will specify whether you may use code from other sources. Any material you take from another source must be acknowledged within the program documentation. You must read and understand the honor code handout. Violations of the honor code WILL be referred to the Honor Council.
All students must adhere to the GMU Honor Code:
See: http://honorcode.gmu.edu/
10. BINF634 FALL09 LECTURE 1 10 Pragmatics Assignments and Announcements
Will be posted on the course webpage; check daily
Class email will be sent to your email address from Patriot Web
Accounts
You should have an account on the server binf.gmu.edu
Systems administrator: Chris Ryan, cryan1@gmu.edu
Accessing perl:
Login from Rooms 304B or 320
Login from off-campus using ssh
Go to ftp://ftp.ssh.com/pub/ssh/ for academic Windows client
Alternatively go to http://www.chiark.greenend.org.uk/~sgtatham/putty/
Install perl on your own computer -- see textbooks and backup slide materials
11. Pragmatics Unix
This class will focus on using the Unix operating system
We will be using Mac OS X (at least in the classroom)
There are numerous UNIX tutorials
http://www.unixtools.com/tutorials.html
Text Editors
Perl programs are stored in plain text files
I recommend emacs or vim as a Unix text editor (see links for Windows support)
http://www.claremontmckenna.edu/math/ALee/emacs/emacs.html
http://www.vim.org
If you are interested in an integrated development environment I recommend Eclipse (see backup slides)
www.eclipse.org
There are tutorials for each online
http://www.gnu.org/software/emacs/tour/
http://www.yolinux.com/TUTORIALS/LinuxTutorialAdvanced_vi.html
12. BINF634 FALL09 LECTURE 1 12 Review: Molecular Biology Life evolved from common origin about 3.5 billion years ago
All life shares similar biochemistry
Proteins: active elements
Nucleic acids: informational elements
Molecular Biology: the study of structure and function of proteins and nucleic acids
13. BINF634 FALL09 LECTURE 1 13 Proteins Functions:
Structural proteins
Enzymes
Transport
Antibody defense
Structure:
Chains of amino acids
Typical size ~300 residues
Range from about 100 to over 5000 residues
14. BINF634 FALL09 LECTURE 1 14
15. BINF634 FALL09 LECTURE 1 15
16. BINF634 FALL09 LECTURE 1 16 Translation Translation involves mRNA and ribosomes
Ribosomes made of protein and ribosomal RNA (rRNA)
Transfer RNAs (tRNAs) make the connection between specific codons in mRNA and amino acids
As tRNA binds to the next codon in mRNA, its amino acid is bound to the last amino acid in the protein chain
When a STOP codon is encountered, the ribosome releases the mRNA and synthesis ends
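The translation steps above can be sketched in Perl as a loop that reads an mRNA string three bases at a time and stops at the first STOP codon. The codon table here is deliberately partial and the function name translate is hypothetical; a real program would list all 64 codons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partial codon table, for illustration only (the full code has 64 codons).
my %codon_table = (
    AUG => 'M', UUU => 'F', GGC => 'G', GCU => 'A',
    UAA => '*', UAG => '*', UGA => '*',       # '*' marks a STOP codon
);

# Translate an mRNA string codon by codon until a STOP codon is reached.
sub translate {
    my ($mrna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($mrna); $i += 3) {
        my $codon = substr($mrna, $i, 3);
        my $aa = $codon_table{$codon};
        $aa = 'X' unless defined $aa;         # codon missing from the partial table
        last if $aa eq '*';                   # STOP codon: synthesis ends
        $protein .= $aa;
    }
    return $protein;
}

print translate("AUGUUUGGCGCUUAAGGC"), "\n";  # prints MFGA
```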
17. BINF634 FALL09 LECTURE 1 17
18. BINF634 FALL09 LECTURE 1 18 DNA Structure DNA contains:
Genes
"a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions"
Promoters
a promoter is a region of DNA that facilitates the transcription of a particular gene
Non-coding regions
DNA which does not contain instructions for making proteins
Reading frames
An open reading frame (ORF): a contiguous sequence of DNA starting at a start codon and ending at a STOP codon
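The ORF definition above maps naturally onto a Perl regular expression. The sketch below assumes an uppercase DNA string, ATG as the only start codon, and a search of the forward strand only; the function name first_orf is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first ORF found: ATG, then whole codons, then an in-frame STOP.
# The non-greedy *? makes the match end at the first in-frame STOP codon.
sub first_orf {
    my ($dna) = @_;
    if ($dna =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/) {
        return $1;
    }
    return;    # no ORF found
}

my $orf = first_orf("CCATGAAATGACC");
print "$orf\n";    # prints ATGAAATGA
```

A fuller search would also scan the reverse complement and report every ORF rather than only the first.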
19. BINF634 FALL09 LECTURE 1 19 Shotgun DNA Sequencing
20. Sequence Files -- FASTA Format
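In FASTA format each record starts with a header line beginning with ">", followed by one or more lines of sequence. As a rough sketch, a parser for such text might look like the following; the two records and the function name parse_fasta are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up FASTA text with two records.
my $fasta_text = <<'END_FASTA';
>seq1 hypothetical example record
ATGCCCGGG
TTTAAA
>seq2 another made-up record
ATGAAATAG
END_FASTA

# Build a hash mapping each sequence ID to its full sequence.
sub parse_fasta {
    my ($text) = @_;
    my (%seq_for, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {       # header line: the ID is the first word
            $id = $1;
            $seq_for{$id} = '';
        }
        elsif (defined $id) {           # sequence line: append to current record
            $line =~ s/\s+//g;
            $seq_for{$id} .= $line;
        }
    }
    return %seq_for;
}

my %seqs = parse_fasta($fasta_text);
print "$_: $seqs{$_}\n" for sort keys %seqs;
```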
21. GenBank Record LOCUS AK091721 2234 bp mRNA linear PRI 20-JAN-2006
DEFINITION Homo sapiens cDNA FLJ34402 fis, clone HCHON2001505.
ACCESSION AK091721
VERSION AK091721.1 GI:21750158
KEYWORDS oligo capping; fis (full insert sequence).
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
Hominidae; Homo.
TITLE Complete sequencing and characterization of 21,243 full-length
human cDNAs
JOURNAL Nat. Genet. 36 (1), 40-45 (2004)
FEATURES Location/Qualifiers
source 1..2234
/organism="Homo sapiens"
/mol_type="mRNA"
CDS 529..1995
/note="unnamed protein product"
/codon_start=1
/protein_id="BAC03731.1"
/db_xref="GI:21750159"
/translation="MVAERSPARSPGSWLFPGLWLLVLSGPGGLLRAQEQPSCRRAFD
...
RLDALWALLRRQYDRVSLMRPQEGDEGRCINFSRVPSQ"
ORIGIN
1 gttttcggag tgcggaggga gttggggccg ccggaggaga agagtctcca ctcctagttt
61 gttctgccgt cgccgcgtcc cagggacccc ttgtcccgaa gcgcacggca gcggggggaa
...
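A few fields of the record above can be pulled out with pattern matches. This is only a sketch: it works on three lines copied from the record, uses whitespace-tolerant patterns rather than the fixed columns of a real GenBank flat file, and handles only a simple start..end CDS location.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three lines copied from the record shown above.
my @record_lines = (
    'LOCUS AK091721 2234 bp mRNA linear PRI 20-JAN-2006',
    'ACCESSION AK091721',
    'CDS 529..1995',
);

my ($accession, $cds_start, $cds_end);
for my $line (@record_lines) {
    if ($line =~ /^ACCESSION\s+(\S+)/) {
        $accession = $1;
    }
    elsif ($line =~ /^\s*CDS\s+(\d+)\.\.(\d+)/) {
        ($cds_start, $cds_end) = ($1, $2);
    }
}

my $cds_length = $cds_end - $cds_start + 1;   # positions are inclusive
print "$accession CDS: $cds_start..$cds_end ($cds_length bp)\n";
# prints AK091721 CDS: 529..1995 (1467 bp)
```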
22. Why Perl? Widely used in Bioinformatics
Bioperl
http://www.bioperl.org/wiki/Main_Page
Ease of Programming
Excellent pattern matching features
Good for gluing other programs together
Easy to learn (enough to get started)
Rapid Prototyping
Few lines of code needed for many problems
One-liners
Portability
Runs on Unix, Windows, Macs
Open Source Culture
Many sources of help (try: % perldoc perldoc)
% perldoc -f print
http://perldoc.perl.org/index-tutorials.html
Many sources of useful modules ( http://www.cpan.org/ )
23. BINF634 FALL09 LECTURE 1 23 Variables The types of Perl variables are indicated by the initial symbol:
$var stores a scalar (a single string or number)
$x = 10;
$s = "ATTGCGT";
$x = 3.1417;
@var stores an array (a list of values)
@a = (10, 20, 30);
@a = (100, $x, "Jones", $s);
print "@a\n"; # prints "100 3.1417 Jones ATTGCGT"
%var stores a hash (associative array)
%ages = ( John => 30, Mary => 22, Lakshmi => 27 );
print $ages{"Mary"}, "\n"; # prints 22
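As a small sketch of working with a hash, the script below initializes one from a parenthesized list and loops over its keys in sorted order. The names and ages are the ones used on this slide; the variable $count is added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hash is initialized from a list in parentheses.
my %ages = (John => 30, Mary => 22, Lakshmi => 27);

# Single elements are scalars, so they are accessed with $ and braces.
for my $name (sort keys %ages) {
    print "$name is $ages{$name}\n";
}

# In scalar context, keys returns the number of entries.
my $count = keys %ages;
print "Number of people: $count\n";   # prints Number of people: 3
```

Hashes are unordered, which is why the keys are sorted before printing.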
24. BINF634 FALL09 LECTURE 1 24 Declaring Variables use strict;
Putting use strict; at the top of your programs will tell perl to slap your hands with a fatal error whenever you break certain rules.
Requires us to declare all variables
Avoids creating variables by typos
variables may be declared using my, our or local
for now, we only need to use my:
my $a; # value of $a is undef
my ($a, $b, $c); # $a, $b, $c are all undef
my @array; # value of @array is ()
Can combine declaration and initialization:
my @array = qw/A list of words/;
my $a = "A string";
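To see use strict catch a typo without stopping the whole script, the sketch below compiles a misspelled variable name inside a string eval and prints the resulting error. The variable names are hypothetical, and in an ordinary script the same typo would simply stop compilation with this error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "ATTGCGT";

# The code in the string misspells $sequence as $sequnce. Under "use strict"
# it fails to compile, so eval returns undef and puts the message in $@.
my $result = eval q{ my $len = length($sequnce); $len };
my $error  = $@;

print "strict caught the typo: $error" unless defined $result;
```

Without use strict, Perl would quietly treat $sequnce as a new, empty global variable and report a length of 0.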
25. BINF634 FALL09 LECTURE 1 25 How Things Can Go Wrong
26. BINF634 FALL09 LECTURE 1 26 Scalar and List Context All operations in Perl are evaluated in either scalar or list context, and may behave differently depending on context
@array = ('one', 'two', 'three');
$a = @array; # scalar context for assignment, return size
print $a; # prints 3
($a) = @array; # list context for assignment
print $a; # prints 'one'
($a, $b) = @array;
print "$a, $b"; # prints 'one, two'
($a, $b, $c, $d) = @array; # $d is undefined
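The same array can be made to yield its size or its elements depending on context. The following sketch, with made-up variable names, also shows the built-in scalar() function, which forces scalar context where a list would otherwise be expected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $count   = @bases;     # scalar context: number of elements
my ($first) = @bases;     # list context: first element

# print supplies list context, so scalar() is needed to get the size here.
print "Count: ", scalar(@bases), "\n";   # prints Count: 4
print "First: $first\n";                 # prints First: A
```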
27. BINF634 FALL09 LECTURE 1 27 String Operations Ways to concatenate strings
$DNA1 = "ATG";
$DNA2 = "CCC";
$DNA3 = $DNA1 . $DNA2; # concatenation operator
$DNA3 = "$DNA1$DNA2"; # string interpolation
print "$DNA3"; # prints ATGCCC
$DNA3 = '$DNA1$DNA2'; # no string interpolation
print "$DNA3"; # prints $DNA1$DNA2
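Beyond concatenation, two other string operations come up constantly with sequence data: reversing a string and translating characters. As a sketch, they combine to give the reverse complement of a DNA string (uppercase input is assumed):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCCC";

# In scalar context, reverse returns the string with its characters reversed.
my $revcom = reverse $dna;

# tr/// replaces each base with its complement, character by character.
$revcom =~ tr/ACGT/TGCA/;

print "$dna -> $revcom\n";                    # prints ATGCCC -> GGGCAT
print "Length: ", length($dna), "\n";         # prints Length: 6
```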
28. BINF634 FALL09 LECTURE 1 28 Arrays An array stores an ordered list of scalars:
@gene_array = ("EGF1", "TFEC", "CFTR", "LOC1691");
print "@gene_array\n";
Output:
EGF1 TFEC CFTR LOC1691
# there's more than one way to do it (see previous slide on declaring variables)
@gene_array = qw/EGF1 TFEC CFTR LOC1691/;
29. BINF634 FALL09 LECTURE 1 29 Arrays An array stores an ordered list of scalars:
@a = ("one", "two", "three", "four");
The array is indexed by integers starting with 0:
print "$a[1] $a[0] $a[3]\n";
prints:
two one four
Notice: $a[i] is a scalar, since we used the $ form of addressing the variable
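A few more everyday array operations, sketched with the gene list from the previous slide. The added name BRCA1 and the variable names are for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @genes = qw/EGF1 TFEC CFTR LOC1691/;

print "Last index: ", $#genes, "\n";      # prints Last index: 3
print "Last element: $genes[-1]\n";       # prints Last element: LOC1691

push @genes, "BRCA1";                     # add an element to the end
my $removed = shift @genes;               # remove the first element

print "Removed: $removed\n";              # prints Removed: EGF1
print join(", ", @genes), "\n";           # prints TFEC, CFTR, LOC1691, BRCA1
```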
30. BINF634 FALL09 LECTURE 1 30 Unix Commands I cat --- for creating and displaying short files
chmod --- change permissions
cd --- change directory
cp --- for copying files
date --- display date
echo --- echo argument
ftp --- connect to a remote machine to download or upload files
grep --- search file
head --- display first part of file
ls --- see what files you have
lpr --- standard print command
more --- use to read files
mkdir --- create directory
mv --- for moving and renaming files
31. BINF634 FALL09 LECTURE 1 31 Unix Commands II pwd --- find out what directory you are in
rm --- remove a file
rmdir --- remove directory
setenv --- set an environment variable
sort --- sort file
tail --- display last part of file
tar --- create an archive, add or extract files
ssh --- log in to another machine
wc --- count characters, words, lines
This site has a nice reference card
http://www.digilife.be/quickreferences/QRC/UNIX%20commands%20reference%20card.pdf
32. BINF634 FALL09 LECTURE 1 32 chmod and tar chmod
There is a nice tutorial here
http://www.perlfect.com/articles/chmod.shtml
tar
There is a nice tutorial here
http://www.apl.jhu.edu/Misc/Unix-info/tar/tar_2.html
33. BINF634 FALL09 LECTURE 1 33 Running perl on binf.gmu.edu % ssh binf.gmu.edu
Password: ******
-- Create binf634 directory (lines starting with -- are notes; don't type them)
% mkdir binf634
% cd binf634
% ls
-- Copy a file to current directory
-- (the "." means "current directory")
% cp ~jsolka/public_html/fall09/binf634/bookcode/examples/example4-1.pl .
% ls
% ls -l
% l
34. BINF634 FALL09 LECTURE 1 34 Running perl on binf.gmu.edu % cat example4-1.pl
#!/usr/bin/perl -w
# Example 4-1 Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA;
# Finally, we'll specifically tell the program to exit.
exit;
-- Changing permissions
% chmod 755 example4-1.pl
-- Running a perl script
% example4-1.pl
35. BINF634 FALL09 LECTURE 1 35 Editing a Perl Script -- Read the Emacs or vi tutorial.
-- Make a copy and edit the copy
% cp example4-1.pl first.pl
% l
% e first.pl
-- 1. Change 'print $DNA;' to 'print $DNA, "\n";'
-- 2. Now add a comment:
# Author: your name
% cat first.pl
#!/usr/bin/perl -w
# Author: Jeff Solka
# Example 4-1 Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA, "\n";
# Finally, we'll specifically tell the program to exit.
exit;
36. BINF634 FALL09 LECTURE 1 36 For Next Week Read Tisdall chapters 1-5.
Be ready to ask questions
Be ready to answer questions
HW 1: Write programs as described in the following exercises from "Beginning Perl for Bioinformatics" by Tisdall:
4.3, 4.4, 4.5, 5.2, 5.4 and 5.6
For each exercise, create a perl script called exX.Y.pl, for example, ex4.3.pl for the first exercise.
email me the assignments at jlsolka@gmail.com
Use the following format
initialoffirstname.lastname.ex.4.3
37. BINF634 FALL09 LECTURE 1 37 Some of the Details
38. BINF634 FALL09 LECTURE 1 38 Alternative Development Environments
39. BINF634 FALL09 LECTURE 1 39 What is Eclipse? Eclipse is a multi-language software development platform comprising an IDE and a plug-in system to extend it. It is written primarily in Java and is used to develop applications in this language and, by means of the various plug-ins, in other languages as well: C/C++, Cobol, Python, Perl, PHP and more.
The initial codebase originated from VisualAge. In its default form it is meant for Java developers, consisting of the Java Development Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs provide translations into over a dozen natural languages.
Released under the terms of the Eclipse Public License, Eclipse is free and open source software.
http://en.wikipedia.org/wiki/Eclipse_(software)
40. BINF634 FALL09 LECTURE 1 40 What Operating Systems Does Eclipse Run Under? LINUX
MAC OSX
WINDOWS
XP
Vista
41. BINF634 FALL09 LECTURE 1 41 Languages Supported by the Eclipse IDE JAVA
Out of the box
PERL
Via EPIC library
Note one must also have a PERL interpreter installed
PYTHON
Via PyDev library
Note one must also have a PYTHON interpreter installed
42. BINF634 FALL09 LECTURE 1 42 Advantages and Disadvantages of the Eclipse Development Environment Advantages
Support for a plethora of languages
Industrial strength
Used by many professional software developers
Has support for configuration management
Disadvantages
Can be slow when developing in languages other than JAVA (may be mere anecdotal evidence)
43. BINF634 FALL09 LECTURE 1 43 Installing Eclipse Under Windows XP - I First make sure that you have a Java Runtime Environment installed
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Owner>java -version
java version "1.5.0_05"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode)
C:\Documents and Settings\Owner>
If you don't have a JRE installed go to
http://java.sun.com/j2se/1.4.2/download.html
44. BINF634 FALL09 LECTURE 1 44 Installing Eclipse Under Windows XP - II Obtain the Eclipse zipped file from the Eclipse downloads link at http://www.eclipse.org/downloads/
I believe that I chose this one
Eclipse IDE for Java Developers (85 MB)
Unzip it into an eclipse folder under your windows Program Files directory
In my case here
C:\Program Files\eclipse
Note that Eclipse does not modify your system's registry
45. BINF634 FALL09 LECTURE 1 45 Installing Eclipse Under Windows XP - III Once installed (unzipped)
Double click on the eclipse.exe icon
There is a hello world java tutorial
There are a number of other tutorials
Eclipse3-1.pdf (I will email it to you; it is publicly available on the web)
46. BINF634 FALL09 LECTURE 1 46 Downloading ActiveState's ActivePerl Go here and click on the Windows download link
http://www.activestate.com/activeperl/
You should be downloading version 5.10
Use this self extracting binary to install the program
This takes a long time (30 minutes or more, go enjoy your favorite beverage)
47. BINF634 FALL09 LECTURE 1 47 Installing the Eclipse EPIC Library This is my synopsis of this EPIC webpage tutorial
http://www.epic-ide.org/download.php
This is also a helpful site
http://www.epic-ide.org/faq.php
Under Eclipse use Help -> Software Updates
Switch to the Available Software tab
Choose Add Site and choose
http://e-p-i-c.sf.net/updates
Tick the newly created site and click the install button
48. BINF634 FALL09 LECTURE 1 48 Creating Your First PERL Program Under the Eclipse IDE - I Under Eclipse go to Window -> Open Perspective -> Other
Choose PERL
Under Eclipse go to Window -> Preferences
Click on the PERL + and enter the full path to the ActiveState PERL executable
In my case it is
"C:\Perl\bin\perl5.10.0.exe"
49. BINF634 FALL09 LECTURE 1 49 Creating Your First PERL Program Under the Eclipse IDE - II Click on File -> New PERL Project
Call it something like HelloWorld
Click on File -> New PERL File
Call it something like HelloWorldPerl
Left click on this file symbol and make sure its extension is .pl (Now it should have a camel symbol)
Enter in your code
print "Hello from ActivePerl!\n";
Now you should be able to choose Run from the top menu or left click on the program symbol and choose Run As Perl Local
If all goes well a console window with the output
Hello from ActivePerl!
should show up
50. BINF634 FALL09 LECTURE 1 50 Debugging With Eclipse and PERL The Perl PPM package PadWalker has to be installed before you can debug your PERL programs under Eclipse
Follow the steps on the next two slides to install PadWalker within ActiveState's PERL
51. BINF634 FALL09 LECTURE 1 51 First Find the Package (PadWalker) Find a package.
To find a package in the repository:
Click the All packages button,
Enter text from the package's name or abstract in the Filter field
As text is entered in the Filter field, the list of packages is automatically updated as the substring match becomes more precise. Click the magnifying glass icon to filter on different meta-data (e.g. Author).
Alternatively, just start typing the name of the package. The Package List will highlight the first package that matches the string you have typed.
52. BINF634 FALL09 LECTURE 1 52 Next Install the Package (PadWalker) Install a package.
To install a package from the repository:
Click on the desired package in the Package List to select it.
Mark the package by:
Clicking the Mark for install button or,
Hitting the "+" key or,
Selecting Install <package-name> from the Action menu or,
Right-clicking the selection and choosing Install <package-name> from the context menu.
Click the Run marked actions button or select Run Marked Actions (Ctrl-Enter) from the File menu.
In my case I installed PadWalker 1.7
53. BINF634 FALL09 LECTURE 1 53 Installing PadWalker Via ppm There are other interesting discussions here, but they seem to have been largely superseded by the GUI-based ActiveState PERL ppm interface
http://trouchelle.com/perl/ppmrepview.pl
54. BINF634 FALL09 LECTURE 1 54 Editors
55. BINF634 FALL09 LECTURE 1 55 http://www.viemu.com/vi-vim-cheat-sheet.gif
56. BINF634 FALL09 LECTURE 1 56 http://refcards.com/docs/gildeas/gnu-emacs/emacs-refcard-a4.pdf