Advanced Perl for Bioinformatics


  1. Advanced Perl for Bioinformatics Lecture 5

  2. Regular expressions - review
     • Put the pattern you want to match between //, bind it to a variable with =~, then use the result in a conditional:
         if ($dna =~ /GAATTC/) { print "EcoRI\n"; }
     • Square brackets within the pattern allow alternative characters at one position:
         if ($dna =~ /CAG[AT]CAG/)
     • A vertical bar means "or"; it lets you look for either of two completely different patterns:
         if ($dna =~ /GAAT|ATTC/)
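The three kinds of match reviewed on this slide can be tried together in one short script. This is only a sketch: the helper name find_patterns and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a label for each pattern found in $dna.
sub find_patterns {
    my ($dna) = @_;
    my @found;
    push @found, 'EcoRI site'   if $dna =~ /GAATTC/;       # literal pattern
    push @found, 'CAG[AT]CAG'   if $dna =~ /CAG[AT]CAG/;   # A or T in the middle
    push @found, 'GAAT or ATTC' if $dna =~ /GAAT|ATTC/;    # either alternative
    return @found;
}

my $dna = 'TTCAGACAGGGAATTCAA';   # made-up test sequence
print "$_\n" for find_patterns($dna);
```

Wrapping the tests in a sub keeps each pattern on its own line, so adding another pattern means adding one more push.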

  3. Reading and writing files, review
     • Open a file for reading:
         open INPUT, "/home/class30/input.txt";
     • Or for writing:
         open OUTPUT, ">/home/class30/output.txt";
     • Make sure you can open it!
         open INPUT, "input.txt" or die "Can't open file\n";
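The same steps can also be written with the three-argument form of open and lexical filehandles, which many current Perl scripts prefer. The sketch below is one possible variant, not the form used on the slide: write_lines and read_lines are hypothetical helpers, and a temporary file stands in for the class paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a list of lines to $path, one per line.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $out, '>', $path) or die "Can't open $path for writing: $!\n";
    print {$out} "$_\n" for @lines;
    close($out) or die "Can't close $path: $!\n";
    return;
}

# Hypothetical helper: read every line of $path, with newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can't open $path for reading: $!\n";
    chomp(my @lines = <$in>);
    close($in);
    return @lines;
}

# A temporary file is used here instead of a real input or output file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
write_lines($tmp_path, 'GAATTC', 'GGATCC');
print "$_\n" for read_lines($tmp_path);
```

Including $! in the die message reports why the open failed (for example, a missing file or a permissions problem).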

  4. Test time Last one…

  5. Hashes
     Perl has another super useful data structure called a hash, for want of a better name. A hash is an associative array, i.e. a collection of key-value pairs in which each value is looked up by its key (a string) rather than by a numeric position.

  6. Making a hash of it
     • You can think of a hash as a set of questions and answers:
         my %matthash = (
             "first_name" => "Matt",
             "surname"    => "Hudson",
             "age"        => "secret",
             "height"     => 187,        # cm
             "hairstyle"  => "D minus",
         );

  7. Getting the hash back
         my %matthash = (
             "first_name" => "Matt",
             "surname"    => "Hudson",
             "age"        => "secret",
             "height"     => 187,        # cm
             "hairstyle"  => "D minus",
         );
         print "my name is ", $matthash{first_name};
         print " ", $matthash{surname}, "\n";
     Unlike an array, a hash lets you store a lot of information and recover it easily and quickly by key, without knowing in what order you added it.
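A few more everyday hash operations, as a minimal sketch: adding a pair after the hash is created, and checking whether a key exists before using it. The lookup helper is hypothetical and only there to make the fallback explicit.

```perl
use strict;
use warnings;

my %matthash = (
    first_name => 'Matt',
    surname    => 'Hudson',
    height     => 187,      # cm
);

# Hypothetical helper: return the value for $key, or $default if the key is absent.
sub lookup {
    my ($hash_ref, $key, $default) = @_;
    return exists $hash_ref->{$key} ? $hash_ref->{$key} : $default;
}

$matthash{hairstyle} = 'D minus';    # add a new key-value pair
print "Name: $matthash{first_name} $matthash{surname}\n";
print "Age: ", lookup(\%matthash, 'age', 'unknown'), "\n";
```

Using exists avoids the "uninitialized value" warning you would get from printing a key that was never stored.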

  8. Hashes as an array
     • You can get the "keys" of the hash and use them like an array (note that keys come back in no particular order):
         foreach my $info (keys %matthash) {
             print "$info = $matthash{$info}\n";
         }
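Because keys returns the keys in no guaranteed order, a common habit is to sort them when the output order matters. A small sketch, with a hypothetical helper named describe:

```perl
use strict;
use warnings;

# Hypothetical helper: format a hash as "key = value" strings in sorted key order.
sub describe {
    my (%hash) = @_;
    return map { "$_ = $hash{$_}" } sort keys %hash;
}

my %matthash = (first_name => 'Matt', surname => 'Hudson', height => 187);
print "$_\n" for describe(%matthash);
```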

  9. Why are hashes useful? Exercise.
     • Many of you might have noticed in the exercise on restriction sites that, using arrays, there was no way to keep track of which site belonged to which enzyme.
     • Modify your script to use a hash like this one:
         my %enzymehash = (
             "EcoRI"   => "GAATTC",
             "BamHI"   => "GGATCC",
             "HindIII" => "AAGCTT",
         );

  10. (an) answer
         foreach my $name (keys %enzymehash) {
             if ($sequence =~ /$enzymehash{$name}/) {
                 print "I found a site for $name, $enzymehash{$name}\n";
             }
         }
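One possible extension of this answer, sketched under assumptions: report where each site occurs, not just whether it occurs. The helper find_sites and the test sequence are made up for illustration; the /g modifier in a while loop steps through successive matches, and $-[0] holds the 0-based start of the most recent match.

```perl
use strict;
use warnings;

my %enzymehash = (
    EcoRI   => 'GAATTC',
    BamHI   => 'GGATCC',
    HindIII => 'AAGCTT',
);

# Hypothetical helper: map each enzyme name to the 1-based start
# positions of its site in $sequence. Enzymes with no site are omitted.
sub find_sites {
    my ($sequence, %enzymes) = @_;
    my %hits;
    for my $name (sort keys %enzymes) {
        my $site = $enzymes{$name};
        while ($sequence =~ /\Q$site\E/g) {
            push @{ $hits{$name} }, $-[0] + 1;   # convert to 1-based position
        }
    }
    return %hits;
}

my $sequence = 'AAGGATCCTTGAATTCGGGAATTC';   # made-up sequence
my %hits = find_sites($sequence, %enzymehash);
for my $name (sort keys %hits) {
    print "$name ($enzymehash{$name}) at: @{ $hits{$name} }\n";
}
```

The hash of arrays keeps every position under its enzyme name, which is the bookkeeping that plain arrays made awkward in the original exercise.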

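  11. Putting data in a hash
         my %hash;
         while (<FILE>) {
             # placeholder pattern: the two bracketed groups become key and value
             if (/stuff(important stuff) more stuff (best stuff)/) {
                 $hash{$1} = $2;
             }
         }
     Or...
         while (my $line = <FILE>) {
             chomp $line;                    # remove the trailing newline
             my @tmp = split /\t/, $line;    # split on tabs
             $hash{$tmp[0]} = $tmp[1];
         }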
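The split-based approach can be sketched as a complete script. Here an in-memory string stands in for a real tab-delimited file, and load_table is a hypothetical helper name; the two-column layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key<TAB>value" lines from an open filehandle into a hash.
sub load_table {
    my ($fh) = @_;
    my %hash;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                  # skip blank lines
        my ($key, $value) = split /\t/, $line, 2;  # at most two fields
        $hash{$key} = $value;
    }
    return %hash;
}

# Made-up data; opening a reference to a string gives an in-memory filehandle.
my $data = "EcoRI\tGAATTC\nBamHI\tGGATCC\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";
my %enzymes = load_table($fh);
close($fh);
print "$_ => $enzymes{$_}\n" for sort keys %enzymes;
```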

  12. Advanced regex
     • The fun isn't over yet.
     • You can match precise numbers of characters
     • Any number of characters
     • Positions in a line
     • Precise formatting (spaces, tabs etc.)
     • You can get bits of the string you matched out and store them in variables
     • You can use regexes to substitute or to translate
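Each item in the list above corresponds to a small piece of regex syntax. The sketch below touches each one once; the helper names (transcribe, complement, gc_count) and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: DNA to RNA using substitution (s///).
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/T/U/g;
    return $rna;
}

# Hypothetical helper: complement using translation (tr///), one character for another.
sub complement {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGT/TGCA/;
    return $comp;
}

# Hypothetical helper: tr/// with an empty replacement list only counts matches.
sub gc_count {
    my ($dna) = @_;
    return ($dna =~ tr/GC//);
}

my $line = "ATGCCCGGGTTTAAA\t42";    # made-up line: sequence, tab, number

# Precise numbers of characters: exactly three G's in a row.
print "three Gs in a row\n" if $line =~ /G{3}/;

# Positions and any number of characters: ^ is the start, $ the end, .* anything between.
print "starts with ATG, ends in digits\n" if $line =~ /^ATG.*\d+$/;

# Precise formatting and capturing: \t is a tab; brackets store matches in $1 and $2.
if ($line =~ /^([ACGT]+)\t(\d+)$/) {
    my ($seq, $number) = ($1, $2);
    print "sequence=$seq number=$number\n";
    print "RNA: ", transcribe($seq), "\n";
    print "Complement: ", complement($seq), "\n";
    print "GC count: ", gc_count($seq), "\n";
}
```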

  13. Grabbing bits of the regex
         my $blastline = "Query= AT1g34399 gene CDS";
         $blastline =~ /Query= (.+) gene/;
         my $atgnumber = $1;
         print "The accession number is $atgnumber\n";
     The text matched by the part of the regex inside brackets (parentheses) is stored in the special variable $1, which you can then use for other things. If you put in another pair of brackets, its match is stored in $2.
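A slightly more defensive version of the same idea, as a sketch: use two pairs of brackets, and only read $1 and $2 after checking that the match succeeded (otherwise they may still hold values from an earlier match). The helper parse_query_line is hypothetical and assumes lines shaped like the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: return (accession, feature type) from a line such as
# "Query= AT1g34399 gene CDS", or an empty list if the line does not match.
sub parse_query_line {
    my ($line) = @_;
    if ($line =~ /^Query=\s+(\S+)\s+gene\s+(\S+)/) {
        return ($1, $2);          # first and second bracketed groups
    }
    return;
}

my ($accession, $type) = parse_query_line("Query= AT1g34399 gene CDS");
print "The accession number is $accession ($type)\n" if defined $accession;
```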

  14. Using modules
     • You can use other people's modules, including those that come with Perl. These provide extra commands, or change the way your Perl script behaves, e.g.:
         use strict;
         use warnings;
         use Bio::Perl;
     You will see these stacked up at the beginning of more complicated Perl scripts. Some modules come with Perl (strict, warnings; see man perlmod); others you need to download and install yourself.
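As a small illustration of a module that provides extra commands, the sketch below uses List::Util, which ships with Perl, to import sum and max. The helper fragment_stats and the fragment sizes are made up.

```perl
use strict;                   # variables must be declared before use
use warnings;                 # warn about likely mistakes
use List::Util qw(sum max);   # core module; imports sum() and max()

# Hypothetical helper: total and largest of a list of fragment sizes.
sub fragment_stats {
    my (@sizes) = @_;
    return (sum(@sizes), max(@sizes));
}

my @sizes = (1200, 350, 4800);    # made-up fragment lengths in bp
my ($total, $largest) = fragment_stats(@sizes);
print "Total: $total bp, largest: $largest bp\n";
```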

  15. A last exercise?...
     • So: how might hashes help you solve this?
     • Open up a BLAST output file
     • Spit out the name of the query sequence, the top hit, and how many hits there were.
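One way to start on this exercise is sketched below. It rests on loudly stated assumptions about the report layout: the query name is on a line beginning "Query=", the one-line hit summaries follow a "Sequences producing significant alignments" header, and the summaries end at the first line beginning with ">". The sample report is made up for illustration and is not real BLAST output; summarise_blast is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: collect query name, top hit and hit count into a hash.
# ASSUMPTION: report layout as described in the text above this block.
sub summarise_blast {
    my ($fh) = @_;
    my %summary = (query => undef, top_hit => undef, hits => 0);
    my $in_hit_list = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^Query=\s*(\S+)/) {
            $summary{query} = $1;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            $in_hit_list = 1;                    # hit summaries follow
        }
        elsif ($in_hit_list && $line =~ /^>/) {
            last;                                # alignments start here
        }
        elsif ($in_hit_list && $line =~ /^(\S+)/) {
            $summary{top_hit} = $1 unless defined $summary{top_hit};
            $summary{hits}++;
        }
    }
    return %summary;
}

# A tiny made-up report, for illustration only.
my $report = <<'END';
Query= AT1g34399

Sequences producing significant alignments:      (bits)  Value

hitA  example protein one                          250   1e-66
hitB  example protein two                          120   3e-27

>hitA example protein one
END

open(my $fh, '<', \$report) or die "Can't open in-memory report: $!\n";
my %summary = summarise_blast($fh);
close($fh);
print "Query: $summary{query}\nTop hit: $summary{top_hit}\nHits: $summary{hits}\n";
```

The hash gives each piece of the answer a name, so the printing code does not depend on the order in which the values were found. A real report should be checked against these layout assumptions before relying on the counts.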

  16. Programming projects
     • Now it's time to think of your programming projects.
     • Hopefully you have an idea; we'll discuss how feasible each one is in the time available
     • If not, here are some suggestions

  17. Suggested program functions
     • Translate a cDNA into protein, and then check it against the Pfam database for HMM hits.
     • Make a real restriction map of a DNA sequence, with predicted fragment sizes.
     • Align proteins of a favorite family, open the alignment and find residues that are totally conserved.
     • Perform BLAST against the latest version of the database files for a particular organism: the program checks whether the user has the latest files and, if not, downloads them.
     • Design PCR primers, to make a fragment size chosen by the user, for a sequence input from a FASTA file.
     • Check whether primer sites are unique in a sequenced, or partially sequenced, genome, and give an "electronic PCR" result.
     • Output an XML formatted version of a BLAST or HMMER text file.
     • Analyze codon usage in a protein-coding DNA sequence and calculate the Ka/Ks ratio.
