Teaching Materials by Ivan Ovcharenko <IVOvcharenko@lbl.gov> 84-234 part 3
part 2

File management:
• How to deal with files?
• Read data from a file?
• Write to a file?

Regular expressions:
• How to compare strings?
• Extract substrings?
• Substitute characters?
• Practical example: reverse complement of a DNA sequence
File management in the Windows graphical user interface:
1. Open
2. Edit
3. Save
The Perl file management structure is simpler. Instead of editing, the program reads or writes:
1. Open file
2. READ or WRITE data (line by line)
3. Close file
How to open and close a file "data.txt" from a Perl program?

    # open data.txt file for READING
    open (FILE, "< data.txt");

FILE is the filehandle. This name is used everywhere later in the program whenever we deal with this file. The symbol in front of the file name gives the direction of the file data flow:

    <  READ from a file
    >  WRITE to a file

    # close the file specified by the FILE filehandle
    close (FILE);
Writing "Hello everyone" to the "tmp.txt" file:

    #!/usr/local/bin/perl
    open (FILE, "> tmp.txt");
    print FILE "Hello everyone\n";
    close (FILE);
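The open, write, and close steps above can also be sketched with error checking. This is only one possible variant, not the method used in the slides: `use strict`, `my`, the three-argument form of `open`, the `or die` checks, and the helper names `write_line` and `read_first_line` are all assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write one line of text to a file, replacing its contents.
sub write_line {
    my ($name, $text) = @_;
    open(my $out, '>', $name) or die "Cannot write to $name: $!";
    print $out "$text\n";
    close($out) or die "Cannot close $name: $!";
}

# Hypothetical helper: read the first line back, without its newline.
sub read_first_line {
    my ($name) = @_;
    open(my $in, '<', $name) or die "Cannot read $name: $!";
    my $line = <$in>;
    close($in);
    chomp($line) if defined $line;
    return $line;
}

write_line('tmp.txt', 'Hello everyone');
my $first = read_first_line('tmp.txt');
print "$first\n";   # prints: Hello everyone
```

If the file cannot be opened, `die` stops the program and `$!` carries the operating system's error message, so a wrong file name is reported instead of being silently ignored.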
Reading data from a file

    #!/usr/local/bin/perl
    # open file data.txt for reading
    open (FILE, "< data.txt");
    # read file line by line and print it out to the screen
    while ($line = <FILE>) {
        print "$line";
    }
    # close file
    close (FILE);

The while loop is analogous to the for loop: the statements in its body are executed repeatedly as long as the condition in parentheses is true.
$line = <FILE> reads the next line from the file specified by the filehandle FILE. At the end of the file there is nothing left to read, the condition becomes false, and the loop stops.
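To see the loop stop at the end of the input, here is a minimal sketch that counts lines. It makes several assumptions that are not in the slides: the helper name `count_lines` is hypothetical, the filehandle is stored in a `my` variable, and an in-memory string stands in for data.txt so the example runs without any file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines available on a filehandle.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    # <$fh> returns undef at end of file, which ends the loop.
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    return $count;
}

# An in-memory "file" with three lines stands in for data.txt.
my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory text: $!";
my $n = count_lines($fh);
close($fh);
print "$n lines\n";   # prints: 3 lines
```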
Example. Calculating the sum of the numbers in the file data.txt

Contents of data.txt:

    1
    18
    23
    2

The chomp command removes the "\n" (new line) symbol from the end of the string.

    #!/usr/local/bin/perl
    $sum = 0;
    open (FILE, "< data.txt");
    while ($line = <FILE>) {
        chomp($line);
        $sum = $sum + $line;
    }
    close(FILE);
    print "Sum of the numbers in data.txt file is $sum\n";

Output:

    Sum of the numbers in data.txt file is 44
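The same summation can be sketched as a reusable routine. The helper name `sum_lines`, the skipping of empty lines, and the in-memory string used in place of data.txt are assumptions made for this illustration; the slides only show the inline loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: add up the numbers found one per line on a filehandle.
sub sum_lines {
    my ($fh) = @_;
    my $sum = 0;
    while (defined(my $line = <$fh>)) {
        chomp($line);
        next if $line eq '';   # skip empty lines
        $sum += $line;
    }
    return $sum;
}

# An in-memory "file" holding the same four numbers as data.txt.
my $data = "1\n18\n23\n2\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my $total = sum_lines($fh);
close($fh);
print "Sum is $total\n";   # prints: Sum is 44
```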
Operations with strings

    $string = "Hello everybody\n";

    # concatenating strings
    $strA = "Hello ";
    $strB = "everybody\n";
    $string = $strA . $strB;

    # length of the string -> number of characters inside
    $strLen = length ($strA);    # $strLen = 6

    # extracting a part of a string
    $strA = "Hello everybody\n";
    $strB = substr ($strA, 2, 5);
    print "$strB";               # prints: llo e

substr ($string, $offset, $n) extracts $n characters from the string $string, starting at the position $offset (the first position in a string is 0, not 1!)
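As a small illustration of length and substr working together, the sketch below cuts a string into fixed-size pieces. The helper name `chunks` and the use of an array with `push` are assumptions for this example and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cut a string into pieces of $size characters.
sub chunks {
    my ($string, $size) = @_;
    my @pieces;
    for (my $offset = 0; $offset < length($string); $offset += $size) {
        # substr returns fewer characters if the string ends early
        push @pieces, substr($string, $offset, $size);
    }
    return @pieces;
}

my @codons = chunks('ATGGCCTAA', 3);
my $joined = join(' ', @codons);
print "$joined\n";   # prints: ATG GCC TAA
```

Because offsets start at 0, the first piece begins at offset 0, the second at offset 3, and so on.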
Calculate the length of every line in a file named a.txt

    #!/usr/local/bin/perl
    # open the a.txt file
    open (INP, "< a.txt");
    # read the file line by line
    while ($line = <INP>) {
        chomp($line);
        $lineLength = length($line);
        print "$lineLength\n";
    }
    # close the file
    close (INP);
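Comparing strings

eq is true when two strings are identical; ne is true when they differ.

    $strA = "AAA";
    $strB = "BBB";
    $strC = "bbb";
    if ($strA eq $strB) { print "true\n"; }
    if ($strB ne $strC) { print "true\n"; }

Which of these comparisons print "true"?

    $strA = "AAAbbb";
    $strC = "bbb";
    if ($strC eq substr($strA,3,3)) { print "true\n"; }

Does this one print "true"?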
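String comparison with eq distinguishes upper and lower case. The sketch below shows this on a pair of strings that differ only in case, together with one possible way to ignore case by lowercasing both sides with the built-in lc function. The helper name `same_ignoring_case` is hypothetical and is not part of the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two strings while ignoring letter case.
sub same_ignoring_case {
    my ($left, $right) = @_;
    return lc($left) eq lc($right) ? 1 : 0;
}

my $upper = "ACGT";
my $lower = "acgt";

# eq is case-sensitive, so these two strings count as different.
if ($upper eq $lower) {
    print "eq: same\n";
} else {
    print "eq: different\n";
}

# After lowercasing both sides they match.
if (same_ignoring_case($upper, $lower)) {
    print "ignoring case: same\n";
}
```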
Modifying strings

    $strA = "AAAxCTT";
    # substitute 'x' symbol by 'N' symbol in string $strA
    $strA =~ s/x/N/;
    # substitute all 'A' symbols by 'G' symbols (g = global substitution)
    $strA =~ s/A/G/g;
    # substitute all 'A's by 'G's, all 'T's by 'A's
    $strA =~ tr/AT/GA/;
    print "$strA \n";

Run in sequence, these statements print GGGNCAA. (Without the first substitution the result would be GGGxCAA.)

Note: tr/// substitutes only single symbols, while s/// substitutes strings. For example, applied to the original "AAAxCTT":

    $strA =~ s/AAA/123/;    # 123xCTT
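To compare the three operations side by side, the sketch below applies each one separately to a fresh copy of the same string. The helper names `replace_first`, `replace_all`, and `map_chars` are hypothetical and were introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each hypothetical helper works on its own copy of the input string.

# s/// without g: only the first 'A' is replaced.
sub replace_first {
    my ($s) = @_;
    $s =~ s/A/G/;
    return $s;
}

# s///g: every 'A' is replaced.
sub replace_all {
    my ($s) = @_;
    $s =~ s/A/G/g;
    return $s;
}

# tr///: character-by-character mapping, A becomes G and T becomes A.
sub map_chars {
    my ($s) = @_;
    $s =~ tr/AT/GA/;
    return $s;
}

my $original = "AAAxCTT";
my $first = replace_first($original);
my $all   = replace_all($original);
my $map   = map_chars($original);
print "$first\n";   # prints: GAAxCTT
print "$all\n";     # prints: GGGxCTT
print "$map\n";     # prints: GGGxCAA
```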
Example. Convert the string "Robert has 2 brothers and 2 sisters" to "John has 3 brothers and 3 sisters".

    $string = "Robert has 2 brothers and 2 sisters";
    # 2 --> 3
    $string =~ tr/2/3/;
    # Robert --> John
    $string =~ s/Robert/John/;
Reverse complement of a DNA sequence. Input file "seq1.fasta". Output file "RCseq1.fasta".

    #!/usr/local/bin/perl
    # open file seq1.fasta for reading & RCseq1.fasta for writing
    open (INP, "< seq1.fasta");
    open (OUT, "> RCseq1.fasta");

    # read the sequence and store it into the string
    # variable $seq, skip the header line
    $seq = "";
    <INP>;    # reading a header line and losing the data
    while ($line = <INP>) {
        chomp($line);
        $seq = $seq . $line;
    }

    # reverse complement the sequence
    $seq = reverse($seq);
    $seq =~ tr/ACTGactg/TGACtgac/;

    # output the $seq sequence in fasta format, 50 characters per line
    $offset = 0;
    while ($offset < length($seq)) {
        $subSeq = substr ($seq, $offset, 50);
        $offset = $offset + 50;
        print OUT "$subSeq\n";
    }

    # close files
    close (INP);
    close (OUT);
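The core step of the program, reversing the sequence and then complementing each base, can be tried on its own with a short string. This is a minimal sketch: the helper name `reverse_complement` and the test sequence are assumptions for illustration, and no files are involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    # Assigning to a scalar makes reverse() reverse the characters.
    my $rc = reverse($seq);
    # Complement each base: A<->T and C<->G, in both cases.
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $result = reverse_complement('ATGCCG');
print "$result\n";   # prints: CGGCAT
```

Applying the routine twice returns the original sequence, which is a quick way to check it.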