Understand substitution with s/// and character replacement with tr// in Perl, useful for modifying data. Find patterns, replace, match globally, and handle case. See examples with detailed explanations.
Perl Syntax: substitution s// and character replacement tr//
Substitution
Pattern matching is useful for finding or indexing items, but to modify the data, substitution is required. Substitution searches a string for a PATTERN and, if found, replaces it with REPLACEMENT.
    $line =~ s/PATTERN/REPLACEMENT/;
Returns the number of replacements made: at most 1 without the g flag, and false (the empty string) if the pattern was not found.
    $result = $line =~ s/PATTERN/REPLACEMENT/;
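The return value described above can be checked with a short script. This is a minimal sketch; the sample string and the variable names ($first, $all, $none) are illustrative assumptions, not part of the slides.

```perl
use strict;
use warnings;

my $line = "cat bat cat";

# Without /g only the first match is replaced, so the result is 1.
my $first = ($line =~ s/cat/dog/);       # $line is now "dog bat cat"

# With /g the result is the number of replacements made.
my $all = ($line =~ s/[bc]at/rat/g);     # $line is now "dog rat rat"

# A failed substitution returns false and leaves $line unchanged.
my $none = ($line =~ s/zebra/horse/);

print "first=$first all=$all line=$line\n";
print "zebra was not found\n" unless $none;
```

Because a failed s/// is false and a successful one is at least 1, the result can be used directly as the condition of an if.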
s///
    $_ = "one two";
    s/^([^ ]+) +([^ ]+)/$2 $1/;   #
    $_ = "green scaly dinosaur";
    s/(\w+) (\w+)/$2, $1/;        #
    s/^/huge, /;                  #
    s/,.*een//;                   #
    s/green/red/;                 #
    s/\w+$/($`!)$&/;              #
    s/\s+(!\W+)/$1 /;             #
    s/huge/gigantic/g;            #
    $_ = "fred barney";
    if (s/fred/george/) {         # s/// returns true if successful
        print "replaced fred with george\n";
    }
s///
    $_ = "one two";
    s/^([^ ]+) +([^ ]+)/$2 $1/;   # swap first 2 words: "two one"
    $_ = "green scaly dinosaur";
    s/(\w+) (\w+)/$2, $1/;        # "scaly, green dinosaur"
    s/^/huge, /;                  # "huge, scaly, green dinosaur"
    s/,.*een//;                   # "huge dinosaur"
    s/green/red/;                 # fails: "green" is no longer in the string
    s/\w+$/($`!)$&/;              # "huge (huge !)dinosaur"
    s/\s+(!\W+)/$1 /;             # "huge (huge!) dinosaur" -- matches " !)" and replaces it with "!) "
    s/huge/gigantic/;             # "gigantic (huge!) dinosaur"
    $_ = "fred barney";
    if (s/fred/george/) {         # s/// returns true if successful
        print "replaced fred with george\n";
    }
Character Replacement
• A similar operation to substitution is character replacement.
• tr/CHARACTER SEARCH LIST/REPLACEMENT LIST/
  -- note that this is not pattern matching, but character matching
  -- returns the number of characters replaced
• $line =~ tr/a-z/A-Z/;             # convert to upper case
• $count_CG = $line =~ tr/CG/CG/;   # count C and G without changing $line
• $line =~ tr/ACGT/TGCA/;           # complement a DNA sequence in one pass
• $line =~ s/A/T/g;   # be CAREFUL
• $line =~ s/C/G/g;   # doing the same with four substitutions
• $line =~ s/G/C/g;   # leaves a sequence containing only A and C
• $line =~ s/T/A/g;
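The difference between the single tr/// and the four chained s///g calls is easiest to see on a small input. The following is a minimal sketch; the sample sequence and the names $complement and $broken are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $dna = "AACCGGTT";

# tr/// maps every character in one pass, so nothing is translated twice.
(my $complement = $dna) =~ tr/ACGT/TGCA/;    # "TTGGCCAA"

# Chained s///g: each step also rewrites the output of the earlier steps.
my $broken = $dna;
$broken =~ s/A/T/g;    # "TTCCGGTT"
$broken =~ s/C/G/g;    # "TTGGGGTT"
$broken =~ s/G/C/g;    # "TTCCCCTT"
$broken =~ s/T/A/g;    # "AACCCCAA"

# Identical search and replacement lists count characters without changing $dna.
my $count_CG = ($dna =~ tr/CG/CG/);          # 4

print "complement: $complement\n";
print "broken:     $broken\n";
print "C+G count:  $count_CG\n";
```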
Character Replacement Flags
tr/SEARCH_LIST/REPLACEMENT_LIST/
    tr/ / /c -- complement the SEARCH_LIST
             -- the effective search list is made up of all characters NOT in SEARCH_LIST
    tr/ / /d -- delete found but unreplaced characters
    tr/ / /s -- squash duplicate replaced characters
             -- sequences (or runs) of characters that were replaced by the same character are squashed down to a single character
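Each of the three flags can be tried on one string. This is a sketch under assumptions: the sample sequence (with N for unknown bases and . for gaps) and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $seq = "ACGTNNNNacgt..ACGT";

# /c: count every character that is NOT an upper-case A, C, G or T.
# With an empty replacement list and no /d, the string is left unchanged.
my $other = ($seq =~ tr/ACGT//c);            # 10

# /d: delete characters found in the search list that have no replacement.
(my $no_gaps = $seq) =~ tr/.//d;             # "ACGTNNNNacgtACGT"

# /s: squash each run of a translated character down to one character.
(my $squashed = $seq) =~ tr/N//s;            # "ACGTNacgt..ACGT"

print "other=$other\nno_gaps=$no_gaps\nsquashed=$squashed\n";
```

The copy-then-bind form `(my $copy = $seq) =~ tr/...//` keeps the original string intact, which is convenient when comparing several flags side by side.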
Character Replacement
    $count_CG = 0;
    $count_AT = 0;
    while ($line = <IN>) {
        $count_CG += $line =~ tr/CG/CG/;   # add this line's count to the running total
        $count_AT += $line =~ tr/AT/AT/;
    }
    $total = $count_CG + $count_AT;
    $percent_CG = 100 * ($count_CG/$total);
    print "The sequence was $percent_CG% CG-rich.\n";
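The same calculation can be wrapped in a subroutine so that it runs without an input file. This is only a sketch: gc_percent is a hypothetical helper name, the input lines are made up, and counting lower-case bases as well is an assumption that goes beyond the slide.

```perl
use strict;
use warnings;

# gc_percent is a hypothetical helper, not from the slides.
# It accumulates counts over all lines and guards against an empty total.
sub gc_percent {
    my @lines = @_;
    my ($count_CG, $count_AT) = (0, 0);
    for my $line (@lines) {
        $count_CG += ($line =~ tr/CGcg//);   # empty replacement list: count only
        $count_AT += ($line =~ tr/ATat//);
    }
    my $total = $count_CG + $count_AT;
    return if $total == 0;                   # nothing countable: return undef
    return 100 * $count_CG / $total;
}

my $percent = gc_percent("ACGT\n", "GGCC\n", "atat\n");
printf "The sequence was %.1f%% CG.\n", $percent;
```

Returning undef for an input with no A, C, G or T lets the caller decide how to report that case instead of dying on a division by zero.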
Pattern Matching Flags
Flags for m// and s// (not tr//):
    g   match globally, find all occurrences
    i   ignore case
    m   treat string as multiple lines: ^ and $ match at the start and end of each line
    s   treat string as single line: . also matches a newline
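A short script makes the effect of each flag visible. This is a minimal sketch; the two-line sample text and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# g: in list context, return every match instead of only the first.
my @words = ($text =~ /(\w+) line/g);                   # ("first", "second")

# i: ignore case.
my $found = ($text =~ /SECOND/i) ? 1 : 0;               # 1

# m: ^ matches at the start of each line, not only at the start of the string.
my @starts = ($text =~ /^(\w)/mg);                      # ("f", "s")

# s: without it . stops at a newline; with it . matches the newline too.
my $without_s = ($text =~ /first.*second/)  ? 1 : 0;    # 0
my $with_s    = ($text =~ /first.*second/s) ? 1 : 0;    # 1

print "words: @words\nstarts: @starts\n";
print "found=$found without_s=$without_s with_s=$with_s\n";
```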
Examples
    $_ = " AtttcgAtggctaaaAtttgctt";
    s/A/a/g;     # " atttcgatggctaaaatttgctt"
    s/^\s+//;    # strip leading white space
    s/\s+$//;    # strip trailing white space
Binding operator
    $string = " opps";
    $string =~ s/^\s+//;   # "opps"
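The two stripping substitutions can also be combined into one by using alternation with the g flag. This is a sketch of one common idiom rather than something shown on the slide; the sample string and the name $trimmed are assumptions.

```perl
use strict;
use warnings;

my $string = "   oops   ";

# Either leading or trailing white space matches; /g applies both alternatives.
(my $trimmed = $string) =~ s/^\s+|\s+$//g;

print "[$trimmed]\n";
```

Running the two separate substitutions from the slide gives the same result; the single-statement form is just more compact.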
Upper/Lower Case
    \U -- everything that follows is upper case
    \L -- everything that follows is lower case
    \u -- next single character upper case
    \l -- next single character lower case
    $_ = "I saw Barney with Fred";
    s/(fred|barney)/\U$1/gi;   # "I saw BARNEY with FRED"
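The case escapes can be combined in the replacement part of s///. The sketch below uses an invented sample name; note that \E (which ends a \U or \L region) does not appear on the slide and is included here as an addition.

```perl
use strict;
use warnings;

my $name = "fRED flintstone";

(my $upper = $name) =~ s/(\w+)/\U$1/;             # "FRED flintstone"
(my $lower = $name) =~ s/(\w+)/\L$1/;             # "fred flintstone"

# \u\L: lower-case the whole word, then upper-case its first character.
(my $title = $name) =~ s/(\w+)/\u\L$1/g;          # "Fred Flintstone"

# \E ends the upper-casing so that $2 is left as it was.
(my $mixed = $name) =~ s/(\w+) (\w+)/\U$1\E $2/;  # "FRED flintstone"

print "$upper\n$lower\n$title\n$mixed\n";
```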
#!/usr/bin/perl
# newNaive.pl
#
# Here is an example that shows how the "match"
# returns a "true" -- so that on the "IF" control structure,
# execution proceeds into the block
#
# The $` is a special variable that "remembers" all of the
# string that was passed over by the pattern matching engine.
#
# Using the length() function, the position of the match is determined,
# and printed
#
$_ = "CCCATGATG";
if (/ATG/) {
    print "Found sequence at position ".length($`)."\n";
}
#!/usr/bin/perl
# newNaive2.pl
#
# Here is an example that shows how the "match"
# returns a "true" -- so that on the "IF" control structure,
# execution proceeds into the block
#
# The $` is a special variable that "remembers" all of the
# string that was passed over by the pattern matching engine.
# $& holds the matched text and $' holds everything after the match.
#
# Using the length() function, the position of the match is determined,
# and printed
#
$_ = "CCCATAATTTAGTTTT";
if (/ATG/) {
    print "Found start codon $& at position ".length($`)."\n";
}
elsif (m/TAG/) {
    print "found stop codon $& at position ".length($`)."\n";
    print "There are ".scalar(length($&)+length($'))." nucleotides after the stop, including the stop codon\n";
}
elsif (m/TAA/) {
    print "found stop codon $& at position ".length($`)."\n";
    print "There are ".scalar(length($&)+length($'))." nucleotides after the stop, including the stop codon\n";
}
elsif (m/TGA/) {
    print "found stop codon $& at position ".length($`)."\n";
    print "There are ".scalar(length($&)+length($'))." nucleotides after the stop, including the stop codon\n";
}
else {
    print "Start/stop codons not found in $_\n";
}
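In newNaive2.pl the elsif chain reports whichever codon is tested first, not whichever occurs first in the sequence: for this input TAG (position 9) is reported even though TAA occurs earlier (position 4). A possible variation, sketched here as an assumption rather than as the author's method, uses alternation to find the leftmost stop codon and the built-in @- array for the match offset.

```perl
use strict;
use warnings;

my $seq = "CCCATAATTTAGTTTT";
my ($codon, $position, $remaining);

if ($seq =~ /(TAG|TAA|TGA)/) {
    $codon     = $1;                        # whichever stop codon occurs first
    $position  = $-[0];                     # offset of the match start, same as length($`)
    $remaining = length($seq) - $position;  # nucleotides from the stop codon to the end
    print "found stop codon $codon at position $position\n";
    print "There are $remaining nucleotides after the stop, including the stop codon\n";
}
else {
    print "Stop codons not found in $seq\n";
}
```

A single pattern also removes the three repeated print blocks, since $1 records which alternative matched.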
#!/usr/bin/perl
# sub.pl
#
# Example where I match "with" and " " and one or more
# word characters
# Then I replace all of that "with word" with "against word"
#
# The $1 corresponds to the first set of parentheses.
#
$_ = "He's out bowling with Fred tonight";
s/with (\w+)/against $1/;
print "$_\n";   # He's out bowling against Fred tonight
#!/usr/bin/perl -w
# bind3.pl
#
# Here's an example that takes a unix path ($path)
# and copies it to another variable ($filename)
# Then, we search for one or more of any character {.+}
# followed by a "/" character -- but we have to use the
# escape metacharacter "\" so that we don't end the match {\/}.
# Finally, we are looking for one or more non-white spaces {(\S+)}
# at the end -- to pull off the last file name "FOUND"
#
$path = "/home/tabraun/test/bob/FOUND";
$filename = $path;
$filename =~ s/.+\/(\S+)/$1/;
print "$filename\n";
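Two alternative ways of pulling off the last path component are sketched below for comparison. Neither is from the slides: the first changes the delimiter to braces so that "/" needs no escaping, and the second uses basename from the core File::Basename module. The variable names are assumptions.

```perl
use strict;
use warnings;
use File::Basename;   # core module; provides basename()

my $path = "/home/tabraun/test/bob/FOUND";

# s{...}{...} uses braces as delimiters, so "/" can appear unescaped.
# Greedy .* runs to the last "/", and everything up to it is deleted.
(my $by_regex = $path) =~ s{.*/}{};

# basename() returns the final component of the path.
my $by_module = basename($path);

print "$by_regex\n$by_module\n";
```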
#!/usr/bin/perl
# randomSeq.pl
#
# Don't get too uptight over the srand() line -- it is just setting
# a "seed" for the rand() function with a value that approximates
# a random number. If you must know, it takes a process ID ($$),
# shifts its bits left 15 times, then adds the process ID to the shifted
# value, then does a bit-wise XOR (^) with the current time().
#
print "Enter length of sequence to generate:";
$length = <STDIN>;
chomp($length);                  # remove the trailing newline
srand(time() ^ ($$ + ($$ << 15)) );
$seq = "";
while ($length > 0) {            # stay in loop until have generated enough sequence
    $rand = int rand(4);         # integer number between 0 and 3 inclusive
    $rand =~ tr/0123/ACTG/;      # turn the digit into a nucleotide
    $length = $length-1;         # decrease loop counter
    $seq = $seq . $rand;         # keep the nucleotide I just created
}
# Since I am out of the loop, I must be done
print "$seq\n";
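The digit-to-nucleotide step can also be written as an array lookup instead of tr. This is a sketch under assumptions: the length is hard-coded rather than read from STDIN, the @bases array is an invented name, and srand is left to Perl's default seeding.

```perl
use strict;
use warnings;

my $length = 12;              # assumed value; the script above reads it from STDIN
my @bases  = qw(A C T G);     # same order as the tr/0123/ACTG/ mapping
my $seq    = "";

# rand @bases gives a number in [0, 4); int truncates it to an index 0..3.
$seq .= $bases[ int rand @bases ] for 1 .. $length;

print "$seq\n";
```

Using 1 .. $length as the loop range means a zero or negative length simply produces an empty sequence.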