Dive into Perl programming with exercises on loops, FASTA file analysis, and file input/output handling. Learn to read, process, and manipulate file contents efficiently.
Previously on... Perl course (let’s practice some more loops)
FASTA: Analyzing complex input
[Flowchart: Start; Read line; Save header; Read line; Concatenate to sequence; Read line; Header or end of input? (No / Yes); Do something; End of input? (No); End]
• Overall design: read the FASTA file (several sequences). For each sequence:
  1. Read the FASTA sequence
     1.1. Read FASTA header
     1.2. Read each line until next FASTA header
  2. Do something
     2.1. Compute G+C content
     2.2. Print header and G+C content
• Let’s see how it’s done…
    # 1. Read FASTA sequence
    $fastaLine = <STDIN>;
    while (defined $fastaLine) {
        # 1.1. Read FASTA header (drop the leading ">" and the newline)
        $header = substr($fastaLine, 1);
        chomp $header;
        $fastaLine = <STDIN>;
        # 1.2. Read sequence until next FASTA header
        $seq = "";    # start each sequence from scratch
        while ((defined $fastaLine) and
               (substr($fastaLine, 0, 1) ne ">")) {
            $seq .= $fastaLine;
            $fastaLine = <STDIN>;
        }
        # 2. Do something
        ... # 2.1 compute $gcContent
        print "$header: $gcContent\n";
    }
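Step 2.1 is left open on the slide. One possible way to fill it in is sketched below, assuming the sequence is plain DNA text that may still contain newlines; the sub name gc_content and the sample sequence are made up for illustration and are not part of the course code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): percentage of G and C
# characters in a DNA string, ignoring whitespace and letter case.
sub gc_content {
    my ($seq) = @_;
    $seq =~ s/\s+//g;                 # drop newlines left by line-by-line reading
    return 0 if length($seq) == 0;    # avoid dividing by zero on empty input
    my $gc = ($seq =~ tr/GCgc//);     # tr/// with an empty replacement only counts
    return 100 * $gc / length($seq);
}

# Made-up sequence, as it would look after concatenating two input lines.
my $seq = "ATGC\nGGCC\n";
printf "%.1f\n", gc_content($seq);
```

Inside the loop this could be used as `$gcContent = gc_content($seq);` just before the print statement.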
Class exercise 4a
1. Write a script that reads lines of names and expenses:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
   For each line, print the name and the sum. Stop when you reach "END".
2. Change your script to read names and expenses on separate lines. Identify lines with numbers by a "+" sign as the first character in the string:
    Yossi
    +6.10
    +16.50
    +5.00
    Dana
    +21.00
    +6.00
    Refael
    +6.10
    +24.00
    +7.00
    +8.00
    END
   Sum the numbers while there is a "+" sign before them.
   Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Class exercise 4a
3. (Home Ex. 2 Q. 5) Write a script that reads several protein sequences in FASTA format, and prints the name and length of each sequence. Start with the example code from the last lesson.
4*. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header starts with 'Chr07'.
5**. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header contains 'Chr07'.
Reading files
Open a file for reading, and link it to a filehandle:
    open(IN, "<EHD.fasta");
Then read lines from the filehandle, exactly as you would from <STDIN>:
    my $line = <IN>;
    my @inputLines = <IN>;
    foreach $line (@inputLines) ...
Every filehandle opened should be closed:
    close(IN);
Always check that the open didn’t fail (e.g. if a file by that name doesn’t exist):
    open(IN, "<$file") or die "can't open file $file";
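The same open / read / close sequence can also be written with a lexical filehandle and the three-argument form of open, which is a different style from the bareword IN used on the slide. The sketch below is only an illustration: the sub name read_lines and the demo file contents are assumptions, and the temporary file comes from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return all lines of a file, without trailing newlines.
sub read_lines {
    my ($file) = @_;
    # Three-argument open keeps the mode ("<") separate from the file name;
    # $! holds the system's reason if the open fails.
    open(my $in, '<', $file) or die "can't open file $file: $!";
    my @lines = <$in>;
    close($in);
    chomp @lines;
    return @lines;
}

# Demo on a small temporary file created just for this example.
my ($tmp, $demoFile) = tempfile(UNLINK => 1);
print $tmp ">seq1\nACGT\n";
close($tmp);

my @lines = read_lines($demoFile);
print scalar(@lines), " lines, first: $lines[0]\n";
```

Including `$!` in the die message shows why the open failed (missing file, no permission, and so on), which the plain message on the slide does not.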
Writing to files
Open a file for writing, and link it to a filehandle:
    open(OUT, ">EHD.analysis") or die ...
NOTE: If a file by that name already exists, it will be overwritten!
You could instead append lines to the end of an existing file:
    open(OUT, ">>EHD.analysis") or die ...
Print to a file (in both cases); note that there is no comma after the filehandle:
    print OUT "The mutation is in exon $exonNumber\n";
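The difference between ">" and ">>" can be checked with a small experiment. This is a sketch under assumptions: the sub name write_line is hypothetical, a lexical filehandle with three-argument open is used in place of the bareword OUT, and the file is a throwaway created with File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one line to $file using the given mode,
# ">" (overwrite) or ">>" (append).
sub write_line {
    my ($mode, $file, $text) = @_;
    open(my $out, $mode, $file) or die "can't open $file: $!";
    print $out "$text\n";      # no comma between the filehandle and the text
    close($out) or die "can't close $file: $!";
}

my (undef, $file) = tempfile(UNLINK => 1);
write_line('>',  $file, 'first');
write_line('>',  $file, 'second');   # overwrites: 'first' is gone
write_line('>>', $file, 'third');    # appends after 'second'

open(my $in, '<', $file) or die "can't open $file: $!";
print <$in>;    # prints the lines "second" and "third"
close($in);
```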
File Test Operators
You can ask questions about a file or a directory name (not a filehandle):
    if (-e $name) {
        print "The file $name exists!\n";
    }
    -e $name    exists
    -r $name    is readable
    -w $name    is writable by you
    -z $name    has zero size
    -s $name    has non-zero size (returns size)
    -f $name    is a file
    -d $name    is a directory
    -l $name    is a symbolic link
    -T $name    is a text file
    -B $name    is a binary file (opposite of -T)
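Several of these operators can be combined to classify a name. The following is a minimal sketch, with a hypothetical sub describe_path and temporary names created through File::Temp so that it runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: describe a path with a few file test operators.
sub describe_path {
    my ($name) = @_;
    return "missing"    unless -e $name;
    return "directory"  if -d $name;
    return "empty file" if -z $name;
    return "file of " . (-s $name) . " bytes";
}

my $dir = tempdir(CLEANUP => 1);
my ($fh, $file) = tempfile(DIR => $dir);
print "$dir: ",  describe_path($dir),  "\n";
print "$file: ", describe_path($file), "\n";   # nothing written yet
print $fh "ACGT\n";
close($fh);
print "$file: ", describe_path($file), "\n";   # now has a non-zero size
```

A check such as `-e $outFileName` before opening an output file is one way to avoid overwriting existing results.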
Working with paths
    open( IN, '<D:\workspace\Perl\p53.fasta' );
• Always use a full path name: it is safer and clearer to read.
• Remember to use \\ in double quotes:
    open( IN, "<D:\\workspace\\Perl\\$name.fasta" );
• (Usually) you can also use /:
    open( IN, "<D:/workspace/Perl/$name.fasta" );
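An alternative to typing the separators by hand, not covered on the slide, is to let the core File::Spec module join the path components. A minimal sketch, where the sub name fasta_path and the directory name are assumptions:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build "<dir><separator><name>.fasta" with the
# separator that suits the operating system the script runs on.
sub fasta_path {
    my ($dir, $name) = @_;
    return File::Spec->catfile($dir, "$name.fasta");
}

my $path = fasta_path('workspace', 'p53');
print "$path\n";
```

The result uses "\" on Windows and "/" on Unix-like systems, so no escaping is needed in the script itself.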
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    $line = <STDIN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print "$name $sum\n";
        # Read next line
        $line = <STDIN>;
        chomp $line;
    }
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
    $line = <IN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print "$name $sum\n";
        # Read next line
        $line = <IN>;
        chomp $line;
    }
    close(IN);
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
    open(OUT, '>D:\perl_ex\out.txt') or die "can't open output file";
    $line = <IN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print OUT "$name $sum\n";
        # Read next line
        $line = <IN>;
        chomp $line;
    }
    close(IN);
    close(OUT);
Class exercise 5a
1. Change the script for class exercise 4a.2 to read the lines from an input file (instead of reading lines from the keyboard).
2. Now, in addition, write the output of the previous question to a file named 'D:\perl_ex\class.ex.4a2.out' (instead of printing to the screen).
3*. Now, before opening 'D:\perl_ex\class.ex.4a2.out', check if it exists, and if so, print a message that the output file already exists, and exit the script.
4*. Change the script for class exercise 4a.3 to receive two strings from the user: 1) the name of a FASTA file, 2) the name of an output file. Then read from the FASTA file given by the user, and write to the output file also supplied by the user.
Command line arguments
It is common to give arguments (separated by spaces) on the command line of a program or a script:
    > perl -w findProtein.pl D:\perl_ex\in.fasta 2 430
They will be stored in the array @ARGV:
    @ARGV: 'D:\perl_ex\in.fasta' '2' '430'
    foreach my $arg (@ARGV) {
        print "$arg\n";
    }
Output:
    D:\perl_ex\in.fasta
    2
    430
Command line arguments
Arguments are separated by spaces, so a space inside a path splits it into two arguments:
    > perl -w findProtein.pl D:\my perl\in.fasta 2 430
    @ARGV: 'D:\my' 'perl\in.fasta' '2' '430'
    foreach my $arg (@ARGV) {
        print "$arg\n";
    }
Output:
    D:\my
    perl\in.fasta
    2
    430
Command line arguments
Put an argument that contains a space in double quotes to keep it in one piece:
    > perl -w findProtein.pl "D:\my perl\in.fasta" 2 430
    @ARGV: 'D:\my perl\in.fasta' '2' '430'
    foreach my $arg (@ARGV) {
        print "$arg\n";
    }
Output:
    D:\my perl\in.fasta
    2
    430
Command line arguments
It is common to give arguments (separated by spaces) on the command line of a program or a script:
    > perl -w findProtein.pl D:\perl_ex\in.fasta D:\perl_ex\out.txt
They will be stored in the array @ARGV:
    my $inFile = $ARGV[0];
    my $outFile = $ARGV[1];
Or more simply:
    my ($inFile, $outFile) = @ARGV;
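When a script expects a fixed number of arguments, it can help to check the count before using them. The sketch below assumes exactly two arguments are required; the sub name parse_args and the usage text are made up for illustration, and a fixed list stands in for @ARGV so the example runs without real arguments.

```perl
use strict;
use warnings;

# Hypothetical helper: accept exactly two arguments or stop with a usage message.
sub parse_args {
    my @args = @_;
    die "Usage: perl findProtein.pl <input file> <output file>\n"
        unless @args == 2;
    my ($inFile, $outFile) = @args;
    return ($inFile, $outFile);
}

# Stand-in for @ARGV; in a real script this would be parse_args(@ARGV).
my @demoArgs = ('D:\perl_ex\in.fasta', 'D:\perl_ex\out.txt');
my ($inFile, $outFile) = parse_args(@demoArgs);
print "reading $inFile, writing $outFile\n";
```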
Reading files - example
Reminder: the class exercise of 3 days ago.
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    $line = <STDIN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print "$name $sum\n";
        # Read next line
        $line = <STDIN>;
        chomp $line;
    }
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    my ($inFileName) = @ARGV;
    open(IN, "<$inFileName") or die "can't open $inFileName";
    $line = <IN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print "$name $sum\n";
        # Read next line
        $line = <IN>;
        chomp $line;
    }
    close(IN);
Reading files: example
Input:
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
    END
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    my ($inFileName, $outFileName) = @ARGV;
    open(IN, "<$inFileName") or die "can't open $inFileName";
    open(OUT, ">$outFileName") or die "can't open $outFileName";
    $line = <IN>;
    chomp $line;
    # Each iteration processes one input line and prints its output
    while ($line ne "END") {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print OUT "$name $sum\n";
        # Read next line
        $line = <IN>;
        chomp $line;
    }
    close(IN);
    close(OUT);
Reading files: example
Input (no END line):
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    my ($inFileName, $outFileName) = @ARGV;
    open(IN, "<$inFileName") or die "can't open $inFileName";
    open(OUT, ">$outFileName") or die "can't open $outFileName";
    $line = <IN>;
    chomp $line;
    # Each iteration processes one input line and prints its output;
    # the loop now stops at the end of the input instead of at "END"
    while (defined $line) {
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print OUT "$name $sum\n";
        # Read next line
        $line = <IN>;
        chomp $line;    # at end of input $line is undefined here, and chomp warns under -w
    }
    close(IN);
    close(OUT);
Reading files: example
Input (no END line):
    Yossi 6.10,16.50,5.00
    Dana 21.00,6.00
    Refael 6.10,24.00,7.00,8.00
Output:
    Yossi 27.6
    Dana 27
    Refael 45.1
Script:
    my ($inFileName, $outFileName) = @ARGV;
    open(IN, "<$inFileName") or die "can't open $inFileName";
    open(OUT, ">$outFileName") or die "can't open $outFileName";
    $line = <IN>;
    # Each iteration processes one input line and prints its output
    while (defined $line) {
        chomp $line;    # chomp only after checking that a line was read
        # Separate name and numbers
        @nameAndNums = split(/ /, $line);
        $name = $nameAndNums[0];
        @nums = split(/,/, $nameAndNums[1]);
        $sum = 0;
        # Sum numbers
        foreach $num (@nums) {
            $sum = $sum + $num;
        }
        print OUT "$name $sum\n";
        # Read next line
        $line = <IN>;
    }
    close(IN);
    close(OUT);
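The read-then-test pattern above is often folded into the loop condition itself. The sketch below shows that variant under a few assumptions: the sub name sum_line is hypothetical, and an in-memory filehandle (opened on a reference to a string) stands in for the input file so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Name n1,n2,..." into "Name sum".
sub sum_line {
    my ($line) = @_;
    my ($name, $numbers) = split(/ /, $line);
    my $sum = 0;
    $sum += $_ foreach split(/,/, $numbers);
    return "$name $sum";
}

# An in-memory filehandle stands in for the input file.
my $data = "Yossi 6.10,16.50,5.00\nDana 21.00,6.00\n";
open(my $in, '<', \$data) or die "can't open in-memory data: $!";

# while (my $line = <$in>) stops at end of input: Perl adds the
# "defined" test to this form of loop condition automatically.
while (my $line = <$in>) {
    chomp $line;
    print sum_line($line), "\n";
}
close($in);
```

Because the read happens in the condition, there is only one place where a line is fetched, and chomp is never called on an undefined value.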
Class exercise 5b
1. Change the script of class exercise 5a.2 so that the script receives the input and output file names as arguments.
2*. Write a script that receives a number of numeric arguments and prints their sum. For example:
    10 20 30 40
   output:
    100