
Computer Programming for Biologists

Class 5, Nov 21st, 2013. Karsten Hokamp, http://bioinf.gen.tcd.ie/GE3M25. Overview: Program Exit, Test Submission, Random Numbers, Regular Expressions.


Presentation Transcript


  1. Computer Programming for Biologists, Class 5, Nov 21st, 2013, Karsten Hokamp, http://bioinf.gen.tcd.ie/GE3M25

  2. Overview
     • Program Exit
     • Test Submission
     • Random numbers
     • Regular Expressions

  3. Exiting a program
     1. automatic exit at end of script
     2. explicit exit with value:
        exit 0; # default
        or
        exit 1; # normally indicates an error
     3. exit on failure:
        die "error message";
        (a trailing "\n" in the message suppresses the line number)

  4. Exiting a program: example
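
The example shown on this slide is not part of the transcript. As a stand-in, here is a minimal sketch of the exit behaviours listed above; the helper `checked_length` and its error message are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the length of a sequence, dies if it is empty.
sub checked_length {
    my ($sequence) = @_;
    # The trailing "\n" suppresses the "at <script> line <N>." suffix.
    die "Error: empty sequence\n" if $sequence eq '';
    return length $sequence;
}

print "Length: ", checked_length('atgc'), "\n";

# eval traps the die so that this demonstration can carry on;
# without eval the script would stop here with a non-zero exit status.
my $length = eval { checked_length('') };
print "Caught: $@" unless defined $length;

# An explicit 'exit 1;' at this point would signal an error to the shell.
# Reaching the end of the script exits automatically with status 0.
```

After running a script in the Terminal, `echo $?` shows the exit status of the last command, which is one way to see the difference between `exit 0`, `exit 1` and `die`.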

  5. Exam Structure
     1. Multiple choice quiz (on paper)
        • 10 questions, up to 2 points each
     2. Programming task selection:
        two short ones (30 points each)
        two long ones (50 points each)
        • pick any two for up to 80 points
        • submit online

  6. Exam Structure, example scenario 1
     Tasks on offer: multiple choice; simple programs (Option A, Option B); harder programs (Option C, Option D)
     Multiple choice: 20 points
     Option A: 30 points
     Option C: 50 points
     Total: 100 points max

  7. Exam Structure, example scenario 2
     Tasks on offer: multiple choice; simple programs (Option A, Option B); harder programs (Option C, Option D)
     Multiple choice: 20 points
     Option A: 30 points
     Option B: 30 points
     Total: 80 points max

  8. Exam Structure, example scenario 3
     Tasks on offer: multiple choice; simple programs (Option A, Option B); harder programs (Option C, Option D)
     Multiple choice: 20 points
     Option C: 50 points
     Option D: 50 points
     Total: 100 points max (the two programming tasks count for at most 80 points)

  9. Sample Programming Task
     Write a Perl script that does the following:
     - Prompt user for input
     - Store received input in a variable
     - Check if it is a palindrome, e.g. level, kayak, rotator
     - If it is, exit with success message
     - If not, exit or die with failure message

  10. Sample Programming Task
     Step 1: Put comments into your script and save as 'pal.pl'

  11. Sample Programming Task
     Step 2: Add Perl code (one line at a time):

  12. Sample Programming Task
     Step 3: Run script in Terminal to make sure syntax is ok:

  13. Sample Programming Task
     Step 4: Add another line of code:

  14. Sample Programming Task
     Step 5: Run and repeat until script is finished.
     Hints:
     - receive input: $input = <>;
     - remove newline with chomp
     - check for palindrome, i.e.: $input eq reverse($input)
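
The code screenshots for these steps are not in the transcript. The sketch below shows one way the palindrome check from the hints could look; the helper name `is_palindrome` and the test word 'perl' are illustrative and not from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text reads the same in both directions.
sub is_palindrome {
    my ($text) = @_;
    # 'scalar' makes sure reverse flips the string rather than a list
    return $text eq scalar reverse($text);
}

# Words from the task description plus one counter-example
for my $word ( 'level', 'kayak', 'rotator', 'perl' ) {
    if ( is_palindrome($word) ) {
        print "'$word' is a palindrome\n";
    }
    else {
        print "'$word' is not a palindrome\n";
    }
}
```

In the actual script the word would come from the user: read it with `$input = <>;`, strip the newline with `chomp $input;`, then print a success message or call `die` depending on the result of the check.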

  15. Sample Programming Task
     Online Submission: http://bioinf.gen.tcd.ie/cgi-bin/upload.pl
     • include name and student id as comments
     • insert student id into text box
     • upload intermediate versions → last uploaded version counts!

  16. rand
     rand = Perl's random number generator
     - default: $num = rand; → 0 <= $num < 1, e.g. 0.24099
     - higher numbers: $num = rand(10); → 0 <= $num < 10, e.g. 6.5223
     Tip: use 'int' to remove digits after decimal point

  17. rand
     Exercise: Try it in the Perl debugger!
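
A short sketch of the calls described above, for trying outside the debugger. The variable names are illustrative, and the printed values differ from run to run.

```perl
use strict;
use warnings;

my $fraction = rand;                # 0 <= $fraction < 1
my $below_10 = rand(10);            # 0 <= $below_10 < 10
my $digit    = int rand(10);        # whole number from 0 to 9
my $die_roll = 1 + int rand(6);     # whole number from 1 to 6

print "rand:            $fraction\n";
print "rand(10):        $below_10\n";
print "int rand(10):    $digit\n";
print "1 + int rand(6): $die_roll\n";
```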

  18. rand example
     Task: write a script that generates a random DNA sequence of length X
     Pseudocode:
     - specify alphabet
     - select random letter
     - append to sequence string
     - repeat X times
     - print sequence to screen

  19. rand example
     Implementation: use pseudocode as comments

  20. rand example
     Implementation: initialise the variables (optional, but good practice)

  21. rand example
     Implementation: get a random position first, then use it as index in the array and retrieve the corresponding base
     Note: only the integer part of $pos is used

  22. rand example
     Implementation: concatenate letter to string using the '.=' operator

  23. rand example
     Implementation: wrap code block into a loop
     - '..' builds a list with numbers from 1 to $length
     - closing brace marks the end of the loop
     Where to get '$length' from? How about the command line?

  24. rand example
     Implementation: take the first element from @ARGV

  25. rand example
     Implementation: print output

  26. rand example
     Check for argument on command line

  27. rand example
     Output: error message if run without argument on command line, otherwise a random sequence
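
The code on slides 19 to 27 is not included in the transcript. The following is a sketch assembled from the pseudocode and captions, not the original script. The helper name `random_dna` and the fallback length of 20 are assumptions; per the slides, the original exits with an error message when no argument is given.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the steps from the pseudocode.
sub random_dna {
    my ($length) = @_;
    my @alphabet = ( 'a', 'c', 'g', 't' );    # specify alphabet
    my $sequence = '';                        # initialise (good practice)
    for ( 1 .. $length ) {                    # repeat X times
        # rand(@alphabet) is rand(4): the array is used in scalar context
        my $pos = rand(@alphabet);
        # only the integer part of $pos is used as the index
        $sequence .= $alphabet[$pos];         # append to sequence string
    }
    return $sequence;
}

# Take the length from the first command-line argument.
my $length = shift @ARGV;
if ( !defined $length ) {
    # Assumption for this sketch: fall back to 20 instead of stopping.
    warn "Usage: $0 <length>; using length 20 for this demonstration\n";
    $length = 20;
}

print random_dna($length), "\n";              # print sequence to screen
```

To match the behaviour described on slide 27, the fallback could be replaced with `die "Usage: $0 <length>\n" unless defined $length;`.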

  28. Regular Expressions
     • constructs that describe patterns
     • powerful methods for text processing
     • search for patterns in a string
     • search and extract patterns
     • search and replace patterns
     • pattern at which to split a string

  29. Regular Expressions
     Examples:
     • Look for a motif in a DNA/protein sequence
     • Find low complexity repeats and mask with x's
     • Find start of sequence string in GenBank record
     • Extract e-mail addresses from a web page
     • Replace strings, e.g.: '@tcd.ie' with '@gmail.com'

  30-34. Regular Expressions
     Find a pattern in a string (stored in a variable):
        $sequence = 'ataggctagctaga';
        if ( $sequence =~ /ctag/ ) { print 'Found!'; }
     Parts of the match expression:
     - $sequence = string in which to search
     - =~ = binding operator
     - / / = delimiters
     - ctag = pattern

  35. Regular Expressions
     Find a pattern in a string (stored in $_):
        $_ = 'ataggctagctaga';
        if ( /ctag/ ) { print 'Found!'; }
     Without binding // to a variable, the regular expression works on $_

  36. Regular Expressions
     Search modifier: i = make search case-insensitive
        $sequence = 'ataggctagctaga';
        if ( $sequence =~ /TAG/i ) { print 'Found!'; }

  37. Regular Expressions
     Metacharacters:
     ^ = match at the beginning of a line
     $ = match at the end of the line
     . = match any character (except newline)
     \ = escape the next metacharacter
        $sequence = ">sequence1\natgacctggaataggat";
        if ( $sequence =~ /^>/ ) { # line starts with '>'
            print 'Found Fasta header!';
        }
     /\.$/ matches a dot at the end of a line

  38. Project
     Exercise: Modify your course project (sequanto.pl) to use a regular expression for detection of a header line instead of 'substr' and 'eq' to check the first character.
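
The contents of sequanto.pl are not shown in the transcript, so the following is only a sketch of the change the exercise asks for. The helper `is_header` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the line is a FASTA header, 0 otherwise.
sub is_header {
    my ($line) = @_;
    # previous approach: substr( $line, 0, 1 ) eq '>'
    return $line =~ /^>/ ? 1 : 0;
}

# Illustrative input lines
my @lines = ( '>sequence1', 'atgacctgga', 'ataggat' );

for my $line (@lines) {
    if ( is_header($line) ) {
        print "Header:   $line\n";
    }
    else {
        print "Sequence: $line\n";
    }
}
```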

  39. Regular Expressions
     Matching repetition:
     • a? = match 'a' 1 or 0 times
     • a* = match 'a' 0 or more times, i.e., any number of times
     • a+ = match 'a' 1 or more times, i.e., at least once
     • a{n,m} = match at least n times, but not more than m times
     • a{n,} = match n or more times
     • a{n} = match exactly n times
        $sequence =~ /a{5,}/; # finds repeats of 5 or more 'a's
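
As a sketch of the last quantifier in use, the hypothetical helper below returns the first run of five or more 'a's. The parentheses capture the matched run into $1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first run of 5 or more 'a's, or undef.
sub first_poly_a {
    my ($sequence) = @_;
    return $sequence =~ /(a{5,})/ ? $1 : undef;
}

# Illustrative sequence with a run of seven 'a's in the middle
my $sequence = 'ggc' . ( 'a' x 7 ) . 'tcgaaatt';
my $run      = first_poly_a($sequence);

if ( defined $run ) {
    print "Found a run of ", length($run), " a's\n";
}
else {
    print "No run of 5 or more a's\n";
}
```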

  40. Regular Expressions
     Search for classes of characters:
     \d = match a digit character
     \w = match a word character (alphanumeric and '_')
     \D = match a non-digit character
     \W = match a non-word character
     \s = match a whitespace character
     \S = match a non-whitespace character
        $date = '30 Jan 2009';
        if ( $date =~ /\d{1,2}\s+\w+\s+\d{2,4}/ ) {
            print 'Correct date format!';
        }
     also matches '1 February 09'
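
A sketch of the date check with explicit whitespace between day, month and year. The helper name `looks_like_date` is made up, and anchoring the pattern with ^ and $ is a design choice of this sketch so that extra text around the date is rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the text looks like 'day month year', else 0.
sub looks_like_date {
    my ($text) = @_;
    # 1-2 digits, whitespace, a word, whitespace, 2-4 digits
    return $text =~ /^\d{1,2}\s+\w+\s+\d{2,4}$/ ? 1 : 0;
}

for my $candidate ( '30 Jan 2009', '1 February 09', 'Jan 30' ) {
    my $verdict = looks_like_date($candidate) ? 'ok' : 'not a date';
    print "'$candidate': $verdict\n";
}
```

Note that \w also matches digits, so this only checks the shape of the string, not whether the month is a real month name.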

  41. Regular Expressions
     Match special characters:
     \t = matches a tabulator (tab)
     \b = matches a word boundary
     \r = matches return
     \n = matches UNIX newline
     \cM = matches Control-M (line-ending in Windows)
        while (my $line = <>) {
            if ($line =~ /\cM/) {
                warn "Windows line-ending detected!";
            }
        }

  42. Regular Expressions
     Search for range of characters:
     [ ] = match any one of the characters listed within the brackets
     - = specifies a range, e.g. [a-z], or [0-9]
     ^ = as first character inside the brackets: match any character not in the list, e.g. [^A-Z]
        $sequence = 'ataggctapgctaga';
        if ( $sequence =~ /[^acgt]/ ) {
            print "Sequence contains non-DNA character: $&";
        }
     $& is a special variable containing the last pattern match
     $` and $' contain the strings before and after the match

  43. Regular Expressions
     Search and replace (substitute): s/pattern/replacement/
        $sequence = 'ataggctagctaga';
        $rna = $sequence;
        $rna =~ s/t/u/; -> 'auaggctagctaga'
     Only the first match will be replaced!

  44. Regular Expressions
     Modifiers for substitution:
     i = case-insensitive
     g = global
     s = let '.' also match newline
        $sequence = 'ataggctagctaga';
        $rna = $sequence;
        $rna =~ s/t/u/g; -> 'auaggcuagcuaga'
     replaces all 't' in the string with 'u'

  45. Regular Expressions
     Example: Clean up a sequence string:
        $sequence = " 1 ataggctagctagat 16 ttagagctagta ";
        $sequence =~ s/[^actg]//g; -> 'ataggctagctagatttagagctagta'
     Deletes everything that is not a, c, t, or g.
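
The three substitutions above can be checked side by side in a small script. The sketch copies each string before modifying it so the original stays intact; the variable names other than $sequence and $rna are illustrative.

```perl
use strict;
use warnings;

my $sequence = 'ataggctagctaga';

# '( my $copy = $original ) =~ s/.../.../' copies first, then edits the copy
( my $first_only = $sequence ) =~ s/t/u/;     # replaces only the first 't'
( my $rna        = $sequence ) =~ s/t/u/g;    # 'g' replaces every 't'

my $raw = " 1 ataggctagctagat 16 ttagagctagta ";
( my $clean = $raw ) =~ s/[^actg]//g;         # drop digits and whitespace

print "first only: $first_only\n";
print "global:     $rna\n";
print "cleaned:    $clean\n";
```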

  46. Regular Expressions
     Extract matched patterns:
     • put patterns in parentheses
     • \1, \2, \3, … refer back to ()'s within the pattern itself
     • $1, $2, $3, … refer back to ()'s in the replacement and after the pattern match
        $sequence = ">test\natgtagagctagta";
        if ($sequence =~ /^>(.*)/) { $id = $1; }
     or
        $email =~ s/(.*)\@(.*)\.(.*)/$1 at $2 dot $3/;
        print "Changed address to $1 at $2 dot $3\n";
     changes 'kahokamp@tcd.ie' to 'kahokamp at tcd dot ie'
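
A sketch that exercises both kinds of reference. The helper `mask_email` and the doubled-letter search are illustrative additions; the sample strings are taken from the slides.

```perl
use strict;
use warnings;

# $1 after a successful match: extract the identifier from a FASTA header.
my $record = ">test\natgtagagctagta";
my $id;
if ( $record =~ /^>(.*)/ ) {
    $id = $1;    # '.' does not match newline, so the capture stops at "\n"
}

# $1, $2, $3 in the replacement part of a substitution.
# Hypothetical helper, not from the slides.
sub mask_email {
    my ($email) = @_;
    $email =~ s/(.*)\@(.*)\.(.*)/$1 at $2 dot $3/;
    return $email;
}

# \1 inside the pattern itself: find the first doubled letter.
my $doubled;
if ( 'ataggctagctaga' =~ /(.)\1/ ) {
    $doubled = $1;
}

print "id:      $id\n";
print "masked:  ", mask_email('kahokamp@tcd.ie'), "\n";
print "doubled: $doubled\n";
```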

  47. Regular Expressions in split
     Change a string into an array of characters:
        @array = split //, $string;
     Split input line at tabs:
        @columns = split /\t/, $input_line;
     Default splits $_ on whitespace:
        while (<>) {
            @columns = split;
            …
        }
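
The three forms of split can be tried on fixed strings instead of input lines. The sample data below is made up for illustration.

```perl
use strict;
use warnings;

# Empty pattern: one array element per character
my @bases = split //, 'atgc';

# Split a tab-separated line into columns (illustrative data)
my @columns = split /\t/, "gene1\tchr1\t1200";

# No arguments: split $_ on whitespace, ignoring leading whitespace
my @words;
for ("  alpha  beta\tgamma\n") {    # 'for' makes the string available as $_
    @words = split;
}

print scalar(@bases), " bases: @bases\n";
print scalar(@columns), " columns: @columns\n";
print scalar(@words), " words: @words\n";
```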

  48. Exam 1
     Friday, 22nd November, 12 – 1pm, East End Mac lab
     Allowed material:
     - your own laptop
     - course material (slides, web pages)
     - Perl and Unix documentation
     Tip: Include lots of comments!
