Perl and the substitution operator s///

tegan
Presentation Transcript


  1. Perl and the substitution operator s///

  2. Outline
     • Eclipse Example and Submitting files
     • Binding Operator
     • Building an RE example
     • Anchors
     • Precedence
     • More examples

  3. Eclipse
     • Show example of creating file and "workspace"
     • File to submit

  4. Binding Operator =~
     • "=~" binds a scalar expression to a pattern match.
     • Certain operations search or modify the string $_ by default.
     • This operator makes the pattern match operation work on some other scalar (variable).
     • The right argument is a search pattern, substitution, or transliteration (translation).
     • The left argument is what is supposed to be searched, substituted, or translated instead of the default $_.
     • When used in scalar context, the return value generally indicates the success of the operation.
     • General form: (target =~ operation), where the operation is m//, s///, or tr///. The first two take a regular expression (RE); tr/// takes character lists, not an RE.

  5. Clarifying the Binding Operator

         #!/usr/bin/perl -w
         # weirdBind.pl
         #
         # This example illustrates how the BINDING OPERATOR
         # is VERY different from assignment
         $seq1 = "ATGATGATG";
         $seq2 = "ATGATGATG";
         $seq3 = "ATGATGATG";

         if($seq1 =~ m/ATG/) {          # Binding operator does NOT change
                                        # the value of the variable on the left
             print "Yeah, we matched\n";
             print " 1 $seq1\n";        # seq1 is unchanged
         }

         $seq2 =~ m/ATG/;               # this line has virtually no effect on the program --
         print "2 $seq2\n";             # all it does is return "true". Again, seq2 unchanged

         $seq3 = m/ATG/;                # this line is almost GIBBERISH (even though
         print " 3$seq3\n";             # it does not cause a compilation error):
                                        # it matches against $_ and ASSIGNS the result to $seq3

         # BUT the left-hand expression can be changed with s/// or tr///
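The closing comment on slide 5 can be illustrated with a short sketch. The variable names here are illustrative and not from the slides. It is meant to show that binding with m// leaves the left side untouched, while s/// and tr/// modify it in place and return a count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variables for illustration (same sequence as slide 5)
my $matched  = "ATGATGATG";
my $subbed   = "ATGATGATG";
my $translit = "ATGATGATG";

my $found = ($matched  =~ m/ATG/);          # true; $matched is unchanged
my $count = ($subbed   =~ s/ATG/CAT/);      # replaces the first ATG only
my $chars = ($translit =~ tr/ATGC/TACG/);   # swaps A<->T and G<->C

print "m//:   $matched (found=$found)\n";
print "s///:  $subbed (substitutions=$count)\n";
print "tr///: $translit (characters changed=$chars)\n";
```

Without the /g flag, s/// stops after the first replacement, which is why only the leading ATG changes.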

  6. Example: Building an RE
     Let's say, just for the sake of argument, we are interested in this gene (BBS4):
     http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_033028.2
     This page is in "GenBank" record format. Let's build an RE to search through the file and find the exon positions.

  7. Example -- RE
     • Save the file to your computer
     • "Display" == GenBank
     • "SendTo" == File (creates a file called "sequences.gb")
     • In the file, an exon looks something like this:

         exon            59..110
                         /gene="BBS4"
                         /note="alignment:Splign"
                         /number=2

  8. Example
     • Build a Perl script to read in the file, line by line, so we can start building a regular expression. Something like this:

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             # pattern matching and RE here
             # How might we test if this is working?
         }

  9. Skeleton: where does the pattern go?

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             if($line =~ ????) {
                 # Do something useful
             }
         }

  10. First attempt: match one exact line

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon 1..58/) {
                 print "Found line: $line";
                 # Do something useful
             }
         }

  11. More general?

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon \d..\d/) {
                 print "Found line: $line";
                 # Do something useful
             }
         }

  12. Missing Exons • What happened????

  13. Even more general?

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon \d+..\d+/) {
                 print "Found line: $line";
                 # Do something useful
             }
         }
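A small sketch can make the "missing exons" question concrete. It assumes three test strings: "exon 1..58" and "exon 59..110" use positions quoted in the slides, while "exon 12345" is a made-up line added to show a false positive. The array names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test lines: the third one is invented to expose the unescaped dots
my @lines = ("exon 1..58", "exon 59..110", "exon 12345");

my (@single, @plus, @escaped);
for my $line (@lines) {
    push @single,  $line if $line =~ m/exon \d..\d/;         # one digit each side
    push @plus,    $line if $line =~ m/exon \d+..\d+/;       # one or more digits
    push @escaped, $line if $line =~ m/exon\s+\d+\.\.\d+/;   # literal periods
}

print "\\d..\\d         : ", join(", ", @single),  "\n";
print "\\d+..\\d+       : ", join(", ", @plus),    "\n";
print "\\d+\\.\\.\\d+    : ", join(", ", @escaped), "\n";
```

The single-digit pattern skips "exon 59..110" because \d consumes exactly one digit. Both patterns with unescaped dots also accept "exon 12345", since an unescaped dot matches any character, and only the escaped version rejects it.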

  14. What else could we do?

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon\s+\d+\.\.\d+/) {
                 # \s+  matches one or more whitespace characters
                 # \.\. matches literal periods (not just any character)
                 print "Found line: $line";
                 # Do something useful
             }
         }

  15. Check to see if we missed any exons

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         $end = -1;    # Note: $end == -1 means no exon has been seen yet
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon\s+(\d+)\.\.(\d+)/) {
                 # \s+  matches one or more whitespace characters
                 # \.\. matches literal periods (not just any character)
                 print "Found line: $line";
                 $old_end = $end;
                 $start = $1;
                 $end = $2;
                 if(($start == $old_end+1) || ($old_end == -1)) {
                     # that's good -- do nothing
                 }
                 else {
                     print "old_end end $old_end $end\n";
                     print "did we miss one\n";
                     exit(1);
                 }
             }
         }

  16. More functionality

         #!/usr/bin/perl
         open(FH, "sequences.gb") or die "Cannot open sequences.gb: $!";
         $end = -1;    # Note: $end == -1 means no exon has been seen yet
         while($line = <FH>) {
             # from file: exon 1..58
             if($line =~ m/exon\s+(\d+)\.\.(\d+)/) {
                 # \s+  matches one or more whitespace characters
                 # \.\. matches literal periods (not just any character)
                 print "Found line: $line";
                 $old_end = $end;
                 $start = $1;
                 $end = $2;
                 $line = <FH>;    # read in 3 more lines -- pretty ugly
                 $line = <FH>;
                 $line = <FH>;
                 if($line =~ m/number=(\d+)/) {
                     $number = $1;
                     print "number = $number\n";
                 }
                 if(($start == $old_end+1) || ($old_end == -1)) {
                     # that's good -- do nothing
                 }
                 else {
                     print "old_end end $old_end $end\n";
                     print "did we miss one\n";
                     exit(1);
                 }
             }
         }
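Slide 16 calls reading three extra lines "pretty ugly". One possible alternative, sketched here under assumptions, is to keep scanning line by line and attach each /number= qualifier to the most recent exon. The in-memory record below is a hypothetical excerpt in GenBank layout (only exon 59..110 with number 2 is quoted in the slides), and the @exons structure is an illustrative choice, not the author's.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical excerpt; a real script would read sequences.gb instead
my $record = <<'END';
     exon            1..58
                     /gene="BBS4"
                     /number=1
     exon            59..110
                     /gene="BBS4"
                     /note="alignment:Splign"
                     /number=2
END

my @exons;    # one hash per exon: start, end, number
for my $line (split /\n/, $record) {
    if ($line =~ m/exon\s+(\d+)\.\.(\d+)/) {
        push @exons, { start => $1, end => $2 };
    }
    elsif (@exons && $line =~ m{/number=(\d+)}) {
        $exons[-1]{number} = $1;    # belongs to the most recent exon
    }
}

for my $exon (@exons) {
    my $number = defined $exon->{number} ? $exon->{number} : "?";
    print "exon $number: $exon->{start}..$exon->{end}\n";
}
```

This version does not depend on the qualifier sitting exactly three lines below the exon line, so it keeps working when an exon has more or fewer qualifiers.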

  17. Anchors
     A pattern that doesn't match at the start of a string can "float" down the string trying to match somewhere else. Anchors constrain the match to particular locations.
     • ^ marks the beginning of the string (not to be confused with negation inside a character class, [^ ])
     • $ marks the end of the string
     • /^fred/ matches freddy, but NOT manfred
     • /fred$/ matches manfred, but NOT freddy
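The two anchor examples from slide 17 can be checked with a few lines of Perl. This is a minimal sketch; the %result hash and the extra test word "fred" are additions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %result;    # word => "start/end" anchor outcome
for my $word ("freddy", "manfred", "fred") {
    my $starts = ($word =~ /^fred/) ? "yes" : "no";
    my $ends   = ($word =~ /fred$/) ? "yes" : "no";
    $result{$word} = "$starts/$ends";
    print "$word: starts with fred = $starts, ends with fred = $ends\n";
}
```

The word "fred" satisfies both anchors at once, which is what /^fred$/ would require.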

  18. Precedence (highest to lowest)
     • Parentheses ()
     • Quantifiers *, +, ?, {n,m}
     • Anchors ^ $
     • Alternation |
     See man perlre, perlrequick, and perlretut
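Because alternation binds most loosely, anchors attach to the nearest alternative rather than to the whole pattern. The sketch below uses made-up test words to show the difference that parentheses make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test words
my @words = ("fred", "freddy", "barney", "mr barney");

# Reads as (^fred) OR (barney$): each anchor applies to one alternative
my @ungrouped = grep { /^fred|barney$/ } @words;

# Parentheses group the alternation, so both anchors apply to the whole match
my @grouped = grep { /^(fred|barney)$/ } @words;

print "ungrouped: ", join(", ", @ungrouped), "\n";
print "grouped:   ", join(", ", @grouped),   "\n";
```

The ungrouped pattern accepts all four words, while the grouped one accepts only the exact strings "fred" and "barney".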

  19. Properties of regexps
     • Any regexp will match at the earliest possible position
     • ?, *, +, and {n,m} are maximal matching (greedy), meaning they will match as much of the string as possible
     • Minimal-match versions: ??, *?, +?, {n,m}?
     • a+? == match 'a' 1 or more times, but as few as possible
     • Example

  20. minMaxMatch.pl

         #!/usr/bin/perl
         # minMaxMatch.pl
         #
         # Example to show difference between
         # minimal matching, and maximal (greedy) matching
         $query = "ATGCCCTGGC";
         if($query =~ m/C+?/) {
             print "min match = $&\n";    # min match = C -- NOTE NEW special variable
         }
         if($query =~ m/C+/) {
             print "max match = $&\n";    # max match = CCC
         }

  21. Using Regular Expressions
     Case insensitive flag /i

         $_ = "fred barney";
         if(/FRED/i) {
             print "found fred\n";
         }

  22. Match Variables

         $_ = "Bob tom, morning";
         if(/(\S+) (\S+), (\S+)/) {    # match non-whitespace
             print "words were: $1 $3 $2\n";
             $save_for_later = $3;
         }

     Output: words were: Bob morning tom
     Note, the comma is outside of the (), so it is not included.
     Match variables are available until the next successful match.
     An unsuccessful match does not reset them: they still hold the values from the last successful match, so only use them after checking that the match succeeded.
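The point about unsuccessful matches is easy to get wrong, so here is a minimal sketch of it. It reuses the string from slide 22; the second pattern and all variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Bob tom, morning";    # string from slide 22

my $ok_first      = ($text =~ /(\S+) (\S+), (\S+)/);    # succeeds
my $after_success = $1;                                 # first captured word

my $ok_second     = ($text =~ /(\d+)/);                 # fails: no digits
my $after_failure = $1;                                 # left over from the first match

print "after success: $after_success\n";
print "after failure: $after_failure\n";
print "second match ", ($ok_second ? "succeeded" : "failed"), "\n";
```

Both prints show "Bob", which is why captures should only be read inside the branch that tested the match.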

  23. Automatic Match Variables
     These intentionally have strange names so that programmers do not use one by accident.
     • $& = part of the string that actually matched the pattern
     • $` = portion of the string before the matched section
     • $' = portion of the string remaining after the matched section

  24. Interpolating into Patterns and Auto Match Variable

         my $seq = "GGcaTGccAT";
         my $query = "ATG";
         if($seq =~ /$query/i) {
             print "Match: $&";    # Match: aTG
         }

  25. Example

         $_ = "ATCGAGAGCATGCCATGCAT";
         if(/ATG/) {
             print "Found sequence at position ".length($`)."\n";
         }

     PRICE: Once you use any one of these automatic match variables anywhere in your entire program, other regular expressions will run a little more slowly (in Perl versions before 5.20). Some programmers simply don't use them.
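For programmers who avoid the automatic match variables, the same position can be obtained in other ways. The sketch below shows two options, assuming the sequence from slide 25: the built-in @- array, whose first element holds the start offset of the last successful match, and an explicit capture of the text before the match. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "ATCGAGAGCATGCCATGCAT";    # sequence from slide 25

# Option 1: $-[0] is the offset where the last successful match started
my $offset;
if ($seq =~ /ATG/) {
    $offset = $-[0];
    print "Found sequence at position $offset\n";
}

# Option 2: capture the text before the first ATG and measure it
my $prefix_length;
if ($seq =~ /^(.*?)ATG/) {
    $prefix_length = length($1);
    print "Prefix length is $prefix_length\n";
}
```

Both approaches report the same position as length($`) for the first ATG in the string.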

  26. Substitutions with s///
     s/PATTERN/REPLACEMENT STRING/

         $_ = "The Cowboys lost the football game";
         s/Cowboys/Giants/;    # replace Cowboys with Giants
         print "$_\n";

         $_ = "He's out bowling with Fred tonight";
         s/with (\w+)/against $1/;    # He's out bowling against Fred tonight
         # This matches 'with Fred' and replaces it with 'against Fred'
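The slide applies s/// to $_, but the binding operator from slide 4 lets it work on any scalar. The following sketch reuses the two sentences from slide 26 with hypothetical variable names, and also checks the return value so the script can tell whether anything was replaced. The second pattern is deliberately changed to one that cannot match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $headline = "The Cowboys lost the football game";
my $plans    = "He's out bowling with Fred tonight";

# Bound to a named variable instead of $_
my $replaced = ($headline =~ s/Cowboys/Giants/);

# \d+ cannot match "Fred", so nothing is replaced and the string is untouched
my $not_replaced = ($plans =~ s/with (\d+)/against $1/);

print "$headline\n";
print "$plans\n";
print "first substitution ",  ($replaced     ? "happened" : "did not happen"), "\n";
print "second substitution ", ($not_replaced ? "happened" : "did not happen"), "\n";
```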

  27. End
