300 likes | 498 Views
Introduction to Computational Biology. Programming with Matlab. Why Program?. Programming a computer provides: Accuracy Reduce human error Speed and efficiency Processing the human genome manually is impossible Adding all numbers from 1..1000 is a waste of our time Repetition
E N D
Introduction to Computational Biology Programming with Matlab
Why Program? • Programming a computer provides: • Accuracy • Reduce human error • Speed and efficiency • Processing the human genome manually is impossible • Adding all numbers from 1..1000 is a waste of our time • Repetition • Same program can be run many times using different input data • Automation
Anatomy of a program Input Instructions Output Data Steps encoded in a programming language Results – readable by you or another program 3, 4 Sum 7 Align GACACGTTTTTGG---TGGTTTT GAGCAGACACGTTTTTGGTGGTTTT |||| |||||| ||||||| GAGCAGACATTTTTTGATTCTGGTTTT GACAT-TTTTTGATTCTGGTTTT
Computing • A computable task • Does it make sense to create a program to solve my problem? • Text Editor • A program that allows you to save text files • Emacs (unix), vi (unix), gedit (unix), notepad (Windows) • Compiler/Interpreter • gcc (C), javac (Java), perl (Perl)
Strings “This is a string” ‘This is also a string’ “ACGACTACGACTAGCATCAGCATCAG”
vectors >> v = [3 1] v = 3 1
vectors >> v = [3 1]; >>
Row vectors >> v = [3 1 7 -21 5 6] v = 3 1 7 -21 5 6
column vectors >> v = [3 1 7 -21 5 6]' v = 3 1 7 -21 5 6
A new vector >> v = [1:8] v = 1 2 3 4 5 6 7 8
A new vector with increments of 0.25 >> v = [2:.25:4] v = Columns 1 through 7 2.0000 2.2500 2.5000 2.7500 3.0000 3.2500 3.5000 Columns 8 through 9 3.7500 4.0000
Accessing vector elements >> v(1) ans = 2
Basic operations >> v v = 0 2 4 6 8 >> v(1:3) ? >> v(2:4) ? >> v(1:3)-v(2:4) ans = -2 -2 -2
Basic operations >> u = [0:-1:-4] u = 0 -1 -2 -3 -4 >> -2*u ans = 0 2 4 6 8
A matrix: n x n (row x column) >> A = [ 1 2 3; 3 4 5; 6 7 8] A = 1 2 3 3 4 5 6 7 8
Indexing a matrix A = 1 2 3 3 4 5 6 7 8 >> A(1:2,3:4) ??? Index exceeds matrix dimensions. >> A(1:2,2:3) ans = 2 3 4 5 >> A(1:2,2:3)' ans = 2 4 3 5
Vector operations >> v+b ans = 3 6 9 >> v-b ans = -1 -2 -3 >> v = [1 2 3]' v = 1 2 3 >> b = [2 4 6]' b = 2 4 6
Plotting >> v v = 1 2 3 >> b b = 2 4 6 >> plot(v,b)
Loops >> for j=1:4, j end j = 1 j = 2 j = 3 j = 4
Loops >> v = [1:3:10] v = 1 4 7 10 >> for j=1:4, v(j) = j; end >> v v = 1 2 3 4
If statements a = 2; b = 3; if (a<b) j = -1; end
If and elseif statements a = 4; b = 3; if (a<b) j = -1; else if (a>b) j = 2; end
Character arrays >> c = ’atcg’
http://www.math.ohiou.edu/~just/bioinfo05/supplements/MLBasicsIV/watson_crick.mhttp://www.math.ohiou.edu/~just/bioinfo05/supplements/MLBasicsIV/watson_crick.m Example 1: find the complement function c = watson_crick(v) %This is a function m-file for finding the Watson-Crick complement of a %string of nucleotide symbols % % %Author: Winfried Just % Department of Mathematics % Ohio University % Athens, OH 45701 % just@math.ohiou.edu % %Date: March 31, 2005 %Input: v - string of letters from the alphabet {a, c, g, t} %Output: c - the string of Watson-Crick complements to the terms of v %
Example 1 cont. function c = watson_crick(v) for i = 1:length(v) if v(i) == 'a' c(i) = 't'; elseif v(i) == 'c' c(i) = 'g'; elseif v(i) == 'g' c(i) = 'c'; elseif v(i) == 't' c(i) = 'a'; else c(i) = '?'; disp('Non-nucleotide symbol encountered'); end end
Run watson_crick >> watson_crick (’accgatgcttatggatc’)
Example 2: find the AUG function start_codon(v) %This is a function m-file for finding the position of the first start codon % % %Author: Winfried Just % Department of Mathematics % Ohio University % Athens, OH 45701 % just@math.ohiou.edu % %Date: March 31, 2005 %Input: v - string of letters from the alphabet {a, c, g, t} %Output: a message that shows the position of the first % start codon in v %
http://www.math.ohiou.edu/~just/bioinfo05/supplements/MLBasicsIV/start_codon.mhttp://www.math.ohiou.edu/~just/bioinfo05/supplements/MLBasicsIV/start_codon.m Example 2: find the AUG function start_codon(v) found = 0; %this variable will tell us when we have found a start codon i = 1; while ~found & i < length(v) - 1 if v(i) == 'a' & v(i+1) == 't' & v(i+2) == 'g' found = 1; disp(['start codon at position ', num2str(i)]) else i = i+1; end end if ~found disp('no start codon found') end
Run start_codon >> start_codon(’accgatgcttatggatc’)
Examples relevant to biology • Log-log plot • http://www.math.ohiou.edu/courses/matlab/math266a/266A-fitting-logplot.pdf