
Lecture 6


barth

Presentation Transcript


  1. Lecture 6: More advanced Perl…

  2. Substitute
     • Like the s/// function in vi:
         # cut with EcoRI and chew back
         $linker = "GGCCAATTGGAAT";
         $linker =~ s/CAATTG/CG/g;
     • Or, to get the number of sites as well:
         $ecosites = ($linker =~ s/CAATTG/CG/g);
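As a small sketch of the return-value point above: s///g returns the number of substitutions it made, and a plain /g match can count sites without changing the string. The variable names `$count` and `$replaced` are illustrative and do not come from the slides.

```perl
use strict;
use warnings;

my $linker = "GGCCAATTGGAAT";

# Count the sites without modifying the string: in list context a /g match
# returns every match, and assigning through an empty list counts them.
my $count = () = $linker =~ /CAATTG/g;

# s///g returns the number of substitutions made (an empty string if none).
my $replaced = ($linker =~ s/CAATTG/CG/g);

print "sites found: $count, replaced: $replaced, result: $linker\n";
```

Counting first is useful when the original string is still needed afterwards, since s/// edits it in place.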

  3. Reverse and Translate
         my $DNA = 'CCGTAA';
         $DNA =~ tr/ACGT/TGCA/;   # swap each base for its complement
         print "$DNA";
         $DNA = reverse $DNA;     # $DNA is now the reverse complement
         print " $DNA\n";
     • tr/// could have been made with DNA in mind…

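  4. Also, a quick way to calculate GC%
         my $DNA = 'CCGTAA';
         # tr/// returns the number of characters it replaced,
         # and these two calls also complement the sequence
         my $gc = ($DNA =~ tr/CG/GC/);
         my $at = ($DNA =~ tr/AT/TA/);
         print (($gc/($gc+$at))*100);
         $DNA = reverse $DNA;
         print "\n$DNA\n";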
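The two tr/// tricks above can be wrapped into reusable subroutines. This is only a sketch: the names `reverse_complement` and `gc_percent` are made up here, the input is assumed to be upper-case A, C, G and T, and GC% is taken over the whole string length rather than over `$gc + $at` as on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string (upper-case ACGT only).
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;        # scalar assignment, so the string is reversed
    $rc =~ tr/ACGT/TGCA/;         # complement each base
    return $rc;
}

# Hypothetical helper: GC percentage. tr/// with an empty replacement list
# counts the matching characters without changing the string.
sub gc_percent {
    my ($dna) = @_;
    my $gc    = ($dna =~ tr/GC//);
    my $total = length $dna;
    return $total ? 100 * $gc / $total : 0;
}

my $DNA = 'CCGTAA';
printf "%s -> %s, GC = %.1f%%\n", $DNA, reverse_complement($DNA), gc_percent($DNA);
```

Because both subroutines work on a copy of the argument, the caller's `$DNA` is left untouched.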

  5. push, shift, unshift and pop
         @DNA = ( "A", "C", "G", "T" );
     • Add "A" to the END of @DNA:
         push( @DNA, "A" );
     • Remove "A" (or whatever is there) from the END of @DNA:
         $end = pop( @DNA );
     • Add "T" to the START of @DNA:
         unshift( @DNA, "T" );
     • Remove "T" (or whatever is there) from the START of @DNA:
         $start = shift( @DNA );
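Running the four operations in sequence shows how the array changes at each step. A minimal sketch, with the array contents noted in the comments:

```perl
use strict;
use warnings;

my @DNA = ("A", "C", "G", "T");

push @DNA, "A";            # @DNA is now A C G T A
my $end = pop @DNA;        # removes "A"; @DNA is A C G T
unshift @DNA, "T";         # @DNA is now T A C G T
my $start = shift @DNA;    # removes "T"; @DNA is A C G T

print "end=$end start=$start DNA=@DNA\n";
```

push/pop treat the array as a stack, while push/shift treat it as a queue.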

  6. Arguments
     • Arguments are data given to a command or function in UNIX or Perl, e.g.
         [matt@mrmarsh]$ hmmpfam myprotein.fas Pfam.ls
     • You can get data into a Perl script with arguments:
         [matt@mrmarsh]$ myscript.pl myprotein.fas

  7. Arguments, cont.
     • The arguments end up in a special array called @ARGV:
         my $file = shift @ARGV;
         open FILE, $file or die $!;

  8. Arguments, cont.
     • You can put as many arguments as you like in @ARGV; the number is arbitrary:
         my @data;
         foreach my $file (@ARGV) {
             open FILE, $file or die $!;
             while (<FILE>) {
                 push @data, $_;    # the array comes first, then the value
             }
             close FILE;
         }
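The same loop can be written with lexical filehandles and the three-argument form of open, which is the style usually recommended in current Perl. This is a sketch under that assumption; the subroutine name `read_files` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of every named file into one list.
sub read_files {
    my @files = @_;
    my @data;
    foreach my $file (@files) {
        # '<' opens the file for reading only
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;            # drop the trailing newline
            push @data, $line;
        }
        close $fh;
    }
    return @data;
}

my @data = read_files(@ARGV);
print scalar(@data), " lines read\n";
```

Saved as a script, it would be run the same way as on the slides, for example with one or more filenames after the script name.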

  9. Index
     • Returns the position of a match, or -1 if there is none
     • Takes three arguments: index string, substring, offset (the offset is optional)
         $linker = "GGCCAATTGGAAT";
         $pos = 0;
         while (($pos = index ($linker, "CAATTG", $pos)) > -1) {
             print "EcoRI at $pos\n";
             $pos++;
         }
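The index loop above generalises to a subroutine that returns every position of a substring. A minimal sketch; the name `find_all` and the longer test string are made up for illustration. Positions are zero-based, as returned by index.

```perl
use strict;
use warnings;

# Hypothetical helper: all zero-based positions of $substring in $string,
# including overlapping ones.
sub find_all {
    my ($string, $substring) = @_;
    my @positions;
    my $pos = 0;
    while (($pos = index($string, $substring, $pos)) > -1) {
        push @positions, $pos;
        $pos++;    # step past this hit so the next search moves on
    }
    return @positions;
}

my @sites = find_all("GGCCAATTGGAATCAATTG", "CAATTG");
print "sites at: @sites\n";
```

Advancing `$pos` by one, rather than by the length of the substring, is what lets overlapping matches be found.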

  10. Bioperl
     • Bioperl is a HUGE set of ready-made Perl modules that do almost all the jobs you need for bioinformatics.
     • Examples: recover a DNA sequence from a website, translate DNA to protein, read GenBank files, convert to FASTA, parse BLAST output files into spreadsheets…

  11. Bioperl, cont.
     • Unfortunately, there is a downside. Bioperl is extremely complex and very difficult to use. Also, the code is only as good as the people who wrote it.
     • Still, it can save you an awful lot of time. But in order to use it, you need to learn the Perl syntax for objects and references.

  12. Object syntax
     • Perl can be used as an object oriented programming language, although this isn’t enforced (as it is in Java).
     • Bioperl is an object oriented set of modules.
     • You pass Bioperl either a function call or a reference. It will return an object. You need to know what to do with the object when you get it, or Bioperl isn’t much use.
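To make the object idea concrete, here is a minimal sketch of a Perl class. `MySeq` is a hypothetical class written for this illustration and is not part of Bioperl; its method names merely echo the seq and display_id calls used later in the lecture.

```perl
use strict;
use warnings;

package MySeq;    # hypothetical class, not part of Bioperl

# Constructor: an object is just a reference that has been blessed into a class.
sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{id}, seq => $args{seq} };
    return bless $self, $class;
}

# Accessor methods: the object arrives as the first argument.
sub display_id { my ($self) = @_; return $self->{id}; }
sub seq        { my ($self) = @_; return $self->{seq}; }

package main;

my $object = MySeq->new(id => 'linker1', seq => 'GGCCAATTGGAAT');
print $object->display_id, ": ", $object->seq, "\n";
```

The `->` in `$object->seq` is the same dereference operator used for plain references; on an object it calls a method.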

  13. References and dereferences
     • Hashes, arrays and even scalar variables can be big, so you don’t always want to duplicate them
     • You can pass a reference to these structures to another function. A reference is a variable that tells the code where to find a variable, hash or array without duplicating it.
         my $reference = \$DNA;
         my $data = ${$reference};    # dereference the reference, not $DNA itself
     • Objects can also be dereferenced with the dereference operator: ->
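A short sketch of taking references and dereferencing them in both notations. The subroutine `count_items` and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $DNA   = 'CCGTAA';
my @bases = ('A', 'C', 'G', 'T');

my $dna_ref   = \$DNA;      # reference to a scalar
my $bases_ref = \@bases;    # reference to an array

# Hypothetical helper: takes an array reference, so the array is not copied.
sub count_items {
    my ($array_ref) = @_;
    return scalar @{$array_ref};
}

print ${$dna_ref}, "\n";               # the whole scalar
print $bases_ref->[1], "\n";           # one element, arrow notation
print ${$bases_ref}[1], "\n";          # the same element, block notation
print count_items($bases_ref), "\n";

# Changing the data through the reference changes the original variable.
${$dna_ref} .= 'GG';
print "$DNA\n";
```

The last two lines show why references save memory: there is only one copy of the data, so an edit made through the reference is visible in `$DNA`.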

  14. References and dereferences
     • Magically, because a reference is a variable, you can fill a hash or an array with references.
     • This is very useful for spreadsheet-type matrix data:
         my @data;    # just a normal array
         open FILE, $spreadsheet or die $!;
         while (<FILE>) {
             my @line = split "\t", $_;
             push @data, \@line;
         }

  15. References and dereferences
     • And then, to get the data back, just dereference:
         foreach (@data) {
             foreach (@{$_}) {
                 print "$_\t";
             }
         }
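Once the rows are stored as array references, a single cell can be reached directly by row and column. The sketch below uses made-up tab-separated rows in place of the spreadsheet file so that it runs on its own.

```perl
use strict;
use warnings;

# Made-up tab-separated rows standing in for the spreadsheet file.
my @rows = ("gene\tlength\tgc", "abc1\t1200\t48", "xyz2\t950\t52");

my @data;
foreach my $row (@rows) {
    my @line = split /\t/, $row;
    push @data, \@line;    # @line is a fresh array on every pass
}

# Row 1, column 2: the arrow between adjacent subscripts is optional.
print $data[1][2], "\n";
print $data[1]->[2], "\n";

# Print the table back out, one row per line.
foreach my $row_ref (@data) {
    print join("\t", @{$row_ref}), "\n";
}
```

Declaring `@line` with `my` inside the loop matters: each pass gets a new array, so the stored references do not all point at the same row.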

  16. References and dereferences
     • Also, you can make a hash of arrays:
         my %fileshash;
         foreach my $file (@filelist) {
             open FILE, $file or die $!;
             my @lines = (<FILE>);
             close FILE;
             $fileshash{$file} = \@lines;
         }

  17. References and dereferences
     • And then, you can get back any line of any file:
         foreach (keys %fileshash) {
             print join "\n", @{$fileshash{$_}};
         }
         # or
         my $file = $ARGV[0];
         my $line = $ARGV[1];
         print ${$fileshash{$file}}[$line];

  18. References and dereferences
     • And of course you can also make an array of hashes:
         my @hashrefarray;
         foreach my $file (@filelist) {
             open FILE, $file or die $!;
             my %quesandans;
             while (<FILE>) {
                 # a question, a tab, then the answer; skip lines that don't match
                 next unless /([^\t]+)\t(.+)/;
                 $quesandans{$1} = $2;
             }
             close FILE;
             push @hashrefarray, \%quesandans;
         }

  19. References and dereferences
     • And get the data back in a similar way:
         foreach (@hashrefarray) {
             print "answer for $question is ";
             print ${$_}{$question}, "\n";    # $_ holds a hash reference
         }
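A self-contained sketch of looking up a key through each hash reference in the array. The question and answer pairs are made up and stand in for the tab-separated files; the block form and the arrow form give the same result.

```perl
use strict;
use warnings;

# Made-up question/answer pairs standing in for the tab-separated files.
my @hashrefarray = (
    { capital => 'Paris',  river => 'Seine'  },
    { capital => 'London', river => 'Thames' },
);

my $question = 'river';
my @answers;
foreach my $hashref (@hashrefarray) {
    my $block_form = ${$hashref}{$question};    # block notation
    my $arrow_form = $hashref->{$question};     # arrow notation, same value
    push @answers, $arrow_form;
    print "answer for $question is $arrow_form\n" if $block_form eq $arrow_form;
}
```

Note the position of the braces: `${$hashref}{$question}` dereferences the reference first and then looks up the key.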

  20. References and dereferences
     • This, of course, leads to a very flexible and powerful set of data structures, since you can go as deep as you like:
     • Hashes of hashes of arrays
     • Arrays of hashes of hashes of hashes
     • etc.
     • When they get this complicated, the dereference notation -> starts to get useful.
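As an illustration of going several levels deep, here is a sketch of a hash of hashes of arrays. The organisms, genes and positions are invented for the example.

```perl
use strict;
use warnings;

# Made-up data: a hash of hashes of arrays (organism -> gene -> site positions).
my %sites = (
    ecoli => { lacZ => [ 12, 340 ], recA => [ 77 ] },
    yeast => { GAL4 => [ 5, 98, 410 ] },
);

# Chained subscripts: arrows between adjacent brackets can be left out.
print $sites{ecoli}{lacZ}[1], "\n";
print $sites{ecoli}->{lacZ}->[1], "\n";

# The block form needs braces around the inner reference.
print ${ $sites{yeast}{GAL4} }[2], "\n";

# With a reference in a scalar, the first arrow is required.
my $organism_ref = $sites{yeast};
print scalar(@{ $organism_ref->{GAL4} }), " sites in GAL4\n";

# Add a position to an inner array through its reference.
push @{ $sites{ecoli}{recA} }, 150;
print "recA: @{ $sites{ecoli}{recA} }\n";
```

The arrow form reads left to right, which is why it tends to be easier to follow than nested `${ ... }` blocks once the structure is more than two levels deep.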

  21. Bioperl: Example
     • Open a FASTA format sequence file:
         use Bio::Perl;
         use strict;
         my $file = $ARGV[0];
         die "give me a sequence filename!\n" unless $file;
         my @seq_object_array = read_all_sequences($file,'fasta');
     • The read_all_sequences function returns all the sequences in the FASTA file as an array of object references

  22. Bioperl: Example
     • Each sequence from the file is now held as a long string inside its object, which can be accessed by dereferencing:
         foreach my $object (@seq_object_array) {
             my $sequence = uc ($object->seq());
             my $name = $object->display_id;
             my $pos = 0;
             # reuse the outer $pos so each search starts after the previous hit
             while (($pos = index ($sequence, "CAATTG", $pos)) > -1) {
                 print "EcoRI at $pos of $name\n";
                 $pos++;
             }
         }

  23. More Bioperl
     • I could spend a whole semester on Bioperl, but I won’t. You are going to have to figure it out for yourselves if you need it.
     • perldoc bioperl
     • I recommend going through the example script bptutorial.pl. I have downloaded it and put it in your home directories.

  24. That’s it for now
     • The more you know, the more there is to learn
     • The only way to really learn this stuff is to write programs
     • You need to get cracking with some programming projects in class!
