Strings and Patterns in Perl Ellen Walker Bioinformatics Hiram College
Finding a Fixed Pattern
• my $string = "ATAAGCTTATCG";
• my $pattern = "GCT";
• print index($string, $pattern);
• print index(reverse($string), $pattern);
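As a quick check of what the two print statements produce, here is a minimal sketch that stores the results before printing them. The names $forward, $reversed and $backward are illustrative and do not come from the slides. Note that index counts positions from 0 and returns -1 when the pattern is absent.

```perl
use strict;
use warnings;

my $string  = "ATAAGCTTATCG";
my $pattern = "GCT";

# index returns the 0-based position of the first match, or -1 if there is none.
my $forward = index($string, $pattern);

# scalar() makes the string reversal explicit; in list context,
# reverse would reverse a list of arguments instead of the characters.
my $reversed = scalar reverse($string);
my $backward = index($reversed, $pattern);

print "forward:  $forward\n";    # forward:  4
print "reversed: $backward\n";   # reversed: 0
```

The reversed string is "GCTATTCGAATA", which is why the second search succeeds at position 0.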
Finding multiple occurrences
• my $start = 0;
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
• print index($string, $pattern, $start);
• $start = index($string, $pattern, $start) + length($pattern);
When do you stop searching?
Finding all (non-overlapping) occurrences
my $start = 0;
my $found;
$found = index($string, $pattern, $start);
while ($found > -1) {
    print "$pattern found at $found\n";
    $start = $found + length($pattern);
    $found = index($string, $pattern, $start);
}
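The loop above can be wrapped in a reusable subroutine. This is a sketch under a few assumptions: the name find_all and the test sequence are made up for illustration, and the empty-pattern guard is an addition (without it, $start would never advance and the loop would not end).

```perl
use strict;
use warnings;

# Return the start positions of every non-overlapping occurrence
# of $pattern in $string, using index() as in the loop above.
sub find_all {
    my ($string, $pattern) = @_;
    my @positions;
    return @positions if $pattern eq '';   # guard: an empty pattern never advances

    my $start = 0;
    my $found = index($string, $pattern, $start);
    while ($found > -1) {                  # index returns -1 when nothing is left
        push @positions, $found;
        $start = $found + length($pattern);
        $found = index($string, $pattern, $start);
    }
    return @positions;
}

my @hits = find_all("ATAAGCTTATCGCT", "GCT");
print "@hits\n";   # 4 11
```

To report overlapping occurrences as well, one option is to advance with $start = $found + 1 instead of adding the pattern length.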
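Pattern Matching Operators
• Three types of operators (so far)
• Translation: tr
• Substitution: s (add the g modifier to change every occurrence)
• Matching: m
• Used with =~ to apply the operator to a string (tr and s modify the string; m only tests it)
• Example:
• $complement =~ tr/ACGT/TGCA/;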
Translation
• The tr operator takes two sequences of characters of the same length
• Every character of the string that appears in the first sequence is changed to the character at the same position in the second sequence
• This is destructive; save the old string before you use it!
Translation examples
• my $string = "actgTGCA";
• my $capitalizedString = $string;
• $capitalizedString =~ tr/actg/ACTG/;
• my $lowerCaseString = $string;
• $lowerCaseString =~ tr/ACTG/actg/;
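A common use of tr in bioinformatics is the reverse complement. The sketch below combines tr with reverse on a copy of the input, so the original string survives. The subroutine names reverse_complement and gc_count are illustrative, not from the slides. The second subroutine relies on the fact that tr returns the number of characters it matched, and that an empty replacement list leaves the string unchanged.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string without modifying the caller's copy.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar assignment: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base in place
    return $rc;
}

# Count G and C characters; tr with an empty replacement only counts.
sub gc_count {
    my ($dna) = @_;
    my $count = ($dna =~ tr/GCgc//);
    return $count;
}

my $string = "ATAAGCTTATCG";
print reverse_complement($string), "\n";
print gc_count($string), "\n";
print "$string\n";                    # the original is untouched
```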
Substitution
• Replaces the text that matches a pattern with a replacement string
• The pattern and the replacement need not be the same length
• s changes only the first occurrence
• Add g to change all occurrences
• Example:
• $string =~ s/T/U/g;
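To see the effect of the g modifier, the following sketch transcribes a DNA string to RNA both ways. The subroutine names transcribe and transcribe_first_only are made up for this illustration. Each one works on a copy, since s is destructive.

```perl
use strict;
use warnings;

# Replace every T with U (the g modifier changes all occurrences).
sub transcribe {
    my ($dna) = @_;
    my $rna = $dna;          # copy first: s modifies its target
    $rna =~ s/T/U/g;
    return $rna;
}

# Without g, only the first T is replaced.
sub transcribe_first_only {
    my ($dna) = @_;
    my $rna = $dna;
    $rna =~ s/T/U/;
    return $rna;
}

print transcribe("ATAAGCTTATCG"), "\n";             # AUAAGCUUAUCG
print transcribe_first_only("ATAAGCTTATCG"), "\n";  # AUAAGCTTATCG
```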
Substitution Examples
• my $aminoAcids = $dna;
• $aminoAcids =~ s/AUG/Met/g;
• $aminoAcids =~ s/GGU/Gly/g;
• $aminoAcids =~ s/GGG/Gly/g;
A sequence of these substitutions will not really work to translate RNA (why not?)
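One likely reason the substitutions fall short is that s///g matches at any offset, not only on codon boundaries. For comparison, here is a sketch that walks the string three characters at a time and looks each codon up in a hash. The subroutine name translate_codons, the '???' placeholder and the four-entry table are assumptions for illustration; a real codon table has 64 entries.

```perl
use strict;
use warnings;

# Partial codon table, for illustration only.
my %codon_to_aa = (
    AUG => 'Met',
    GGU => 'Gly',
    GGG => 'Gly',
    UAA => 'Stop',
);

# Translate an RNA string codon by codon, starting at position 0.
# Codons missing from the table become '???'; a trailing partial
# codon (1 or 2 bases) is ignored.
sub translate_codons {
    my ($rna) = @_;
    my @amino_acids;
    for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
        my $codon = substr($rna, $i, 3);
        push @amino_acids,
            exists($codon_to_aa{$codon}) ? $codon_to_aa{$codon} : '???';
    }
    return join('-', @amino_acids);
}

print translate_codons("AUGGGUGGG"), "\n";   # Met-Gly-Gly
print translate_codons("AAUGGG"), "\n";      # ???-Gly
```

In the second call the codons are AAU and GGG, yet s/AUG/Met/g applied to the same string would replace the AUG that straddles them.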
Matching
• Not destructive to the string
• Tests whether the string matches (can be used as the condition in an if statement)
• Example: if ($string =~ m/T/) { print "String is DNA, not RNA\n"; }
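The matching test extends naturally to an if/elsif chain. In this sketch the subroutine name classify_sequence and the labels it returns are illustrative choices, not part of the slides. Because m does not modify its target, the same string can be tested several times.

```perl
use strict;
use warnings;

# Label a sequence by the bases it contains.
sub classify_sequence {
    my ($seq) = @_;
    if ($seq =~ m/U/) {
        return "RNA";        # uracil only occurs in RNA
    }
    elsif ($seq =~ m/T/) {
        return "DNA";        # thymine only occurs in DNA
    }
    else {
        return "unknown";    # neither T nor U present
    }
}

my $string = "ATAAGCTTATCG";
print classify_sequence($string), "\n";   # DNA
print "$string\n";                         # unchanged by the match
```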
Non-Exact Patterns
• Can be used with s or m
• Include:
• wildcard characters
• multiple option matches
• capturing
Wildcard characters
. matches any single character (except a newline, by default)
* matches 0 or more repetitions of the preceding character
+ matches 1 or more repetitions of the preceding character
^ matches at the beginning of the string
$ matches at the end of the string
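A short sketch of these special characters in use follows. The three subroutine names and the test sequences are invented for illustration; each subroutine returns 1 on a match and 0 otherwise.

```perl
use strict;
use warnings;

# ^ and $ anchor the match; .* allows any run of characters in between.
sub starts_atg_ends_taa {
    my ($seq) = @_;
    return $seq =~ m/^ATG.*TAA$/ ? 1 : 0;
}

# AAA+ means two A's followed by one or more A's: at least three in total.
sub ends_with_poly_a {
    my ($seq) = @_;
    return $seq =~ m/AAA+$/ ? 1 : 0;
}

# A* allows zero A's, so "GT" itself matches.
sub has_g_then_as_then_t {
    my ($seq) = @_;
    return $seq =~ m/GA*T/ ? 1 : 0;
}

print starts_atg_ends_taa("ATGGGATAA"), "\n";   # 1
print ends_with_poly_a("GCTAA"), "\n";          # 0
print has_g_then_as_then_t("CGT"), "\n";        # 1
```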
Multiple option matches
[actg] matches one character in the set a, c, t, g
[^A-Z] matches one character that is not in the range A-Z
TAG|TGA|TAA matches either TAG, TGA or TAA
• Example: my $Rpattern = 'A|G';
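A pattern stored in a variable such as $Rpattern can be interpolated into a match. The sketch below assumes that usage; the subroutine names are illustrative. It wraps the variable in a non-capturing group (?:...) because | has low precedence: without the group, ^A|G+$ would mean "starts with A, or ends with one or more G".

```perl
use strict;
use warnings;

my $Rpattern = 'A|G';   # purines, as on the slide

# Count purines: in list context, a /g match returns every matched substring.
sub count_purines {
    my ($seq) = @_;
    my @matches = $seq =~ m/(?:$Rpattern)/g;
    return scalar @matches;
}

# True (1) only if every character is a purine.
sub is_all_purines {
    my ($seq) = @_;
    return $seq =~ m/^(?:$Rpattern)+$/ ? 1 : 0;
}

# True (1) if any of the three stop codons occurs anywhere in the string.
sub has_stop_codon {
    my ($seq) = @_;
    return $seq =~ m/TAG|TGA|TAA/ ? 1 : 0;
}

print count_purines("ATAAGCTTATCG"), "\n";   # 6
print is_all_purines("AGGA"), "\n";          # 1
print has_stop_codon("ATGTGA"), "\n";        # 1
```

For single characters, the class [AG] is an equivalent and simpler way to write A|G.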
Capturing Patterns
• Any pattern in parentheses is "captured"
• The captured text can be recovered with \1, \2 etc. inside the pattern, and with $1, $2 etc. in the replacement
• Example:
• s/(...)(...)/$2$1/ switches the first two codons in the string.
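The codon swap can be tried out as follows. This sketch writes the replacement with $2$1, which Perl prefers over \2\1 on the replacement side, and adds a ^ anchor to make "first two codons" explicit. The subroutine names are illustrative. The second subroutine shows a backreference used inside the pattern itself.

```perl
use strict;
use warnings;

# Swap the first two codons; strings shorter than six characters
# do not match and are returned unchanged.
sub swap_first_two_codons {
    my ($seq) = @_;
    $seq =~ s/^(...)(...)/$2$1/;
    return $seq;
}

# \1 inside the pattern matches the same three characters again,
# so this detects any triplet that is immediately repeated.
sub has_repeated_triplet {
    my ($seq) = @_;
    return $seq =~ m/(...)\1/ ? 1 : 0;
}

print swap_first_two_codons("ATGGGATAA"), "\n";   # GGAATGTAA
print has_repeated_triplet("ATGATGC"), "\n";      # 1
```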
Slides are not Complete! • Pages 56-57 of the Perl book have an extensive list of regular expression examples.
Examples
• 6-mer palindrome: (.)(.)(.)\3\2\1
• Pair of nucleotides repeated at least three times: (.)(.).*\1\2.*\1\2
• Strings that end with GGA: GGA$
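The three example patterns can be checked with a small sketch like the one below. The subroutine names and test sequences are made up for illustration. Note that the palindrome pattern finds text that reads the same in both directions; a site such as GAATTC, which is palindromic only in the reverse-complement sense, does not match it.

```perl
use strict;
use warnings;

# Six characters that read the same forwards and backwards.
sub has_palindrome_6mer {
    my ($seq) = @_;
    return $seq =~ m/(.)(.)(.)\3\2\1/ ? 1 : 0;
}

# The same pair of characters occurring at least three times,
# with anything (or nothing) between the occurrences.
sub has_pair_three_times {
    my ($seq) = @_;
    return $seq =~ m/(.)(.).*\1\2.*\1\2/ ? 1 : 0;
}

# $ anchors the match at the end of the string.
sub ends_with_gga {
    my ($seq) = @_;
    return $seq =~ m/GGA$/ ? 1 : 0;
}

print has_palindrome_6mer("ACGGCA"), "\n";       # 1
print has_palindrome_6mer("GAATTC"), "\n";       # 0
print has_pair_three_times("ATGGATCCAT"), "\n";  # 1
print ends_with_gga("ATGGGA"), "\n";             # 1
```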