
Previously on... PERL course (let's practice some more loops)

Dive into Perl programming with exercises on loops, FASTA file analysis, and file input/output handling. Learn to read, process, and manipulate file contents efficiently.

chade




Presentation Transcript


  1. Previously on... PERL course (let's practice some more loops)

  2. FASTA: Analyzing complex input

     [Flowchart: Start, Read line, Save header, Read line, Concatenate to sequence, Read line, "Header or end of input?" (No / Yes), Do something, "End of input?" (No), End]

     Overall design:
     • Read the FASTA file (several sequences).
     • For each sequence:
        1. Read the FASTA sequence
           1.1. Read FASTA header
           1.2. Read each line until next FASTA header
        2. For each sequence: Do something
           2.1. Compute G+C content
           2.2. Print header and G+C content
     • Let's see how it's done…

  3. [Flowchart as on slide 2]

         # 1. Read FASTA sequence
         $fastaLine = <STDIN>;
         while (defined $fastaLine) {
             # 1.1. Read FASTA header (drop the leading ">" and the newline)
             $header = substr($fastaLine,1);
             chomp $header;
             $fastaLine = <STDIN>;
             # 1.2. Read sequence until next FASTA header
             $seq = "";    # start every sequence from an empty string
             while ((defined $fastaLine) and
                    (substr($fastaLine,0,1) ne ">" ))
             {
                 chomp $fastaLine;
                 $seq .= $fastaLine;
                 $fastaLine = <STDIN>;
             }
             # 2. Do something
             ... # 2.1 compute $gcContent
             print "$header: $gcContent\n";
         }
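
The slide leaves step 2.1 open. The following is a minimal sketch of one way to fill it in, not the course's own solution. It assumes the input starts with a header line, counts G and C with `tr`, and reports G+C as a fraction of the sequence length. The helper names `gc_content` and `fasta_gc` and the demo sequences are made up for illustration. It reads from an in-memory string instead of STDIN so that it runs without an input file.

```perl
use strict;
use warnings;

# Return the G+C fraction (0..1) of a DNA string; 0 for an empty sequence.
sub gc_content {
    my ($seq) = @_;
    my $length = length $seq;
    return 0 if $length == 0;
    my $gc = ($seq =~ tr/GCgc//);    # tr returns the number of matching characters
    return $gc / $length;
}

# Read FASTA records from a filehandle; return a list of [header, gc] pairs.
sub fasta_gc {
    my ($fh) = @_;
    my @results;
    my $line = <$fh>;
    while (defined $line) {
        # 1.1. Read FASTA header
        chomp $line;
        my $header = substr($line, 1);    # drop the leading ">"
        my $seq = "";                     # each record starts empty
        $line = <$fh>;
        # 1.2. Read sequence until next FASTA header
        while (defined $line and substr($line, 0, 1) ne ">") {
            chomp $line;
            $seq .= $line;
            $line = <$fh>;
        }
        # 2. Do something: compute G+C content
        push @results, [$header, gc_content($seq)];
    }
    return @results;
}

# Demo on an in-memory FASTA string (made-up data).
my $demo = ">seq1\nGGCC\nAATT\n>seq2\nGCGC\n";
open(my $in, '<', \$demo) or die "can't open in-memory file";
foreach my $rec (fasta_gc($in)) {
    printf "%s: %.2f\n", $rec->[0], $rec->[1];
}
close($in);
```

To run it on real input, the filehandle passed to `fasta_gc` could be `\*STDIN` or a handle opened on a FASTA file.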

  4. Class exercise 4a

     1. Write a script that reads lines of names and expenses:
            Yossi 6.10,16.50,5.00
            Dana 21.00,6.00
            Refael 6.10,24.00,7.00,8.00
            END
        For each line print the name and the sum. Stop when you reach "END".

     2. Change your script to read names and expenses on separate lines. Identify lines with numbers by a "+" sign as the first character in the string:
            Yossi
            +6.10
            +16.50
            +5.00
            Dana
            +21.00
            +6.00
            Refael
            +6.10
            +24.00
            +7.00
            +8.00
            END
        Sum the numbers while there is a '+' sign before them.

     Output:
            Yossi 27.6
            Dana 27
            Refael 45.1

  5. Class exercise 4a

     3. (Home Ex. 2 Q. 5) Write a script that reads several protein sequences in FASTA format, and prints the name and length of each sequence. Start with the example code from the last lesson.

     4*. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header starts with 'Chr07'.

     5**. Write a script that reads several DNA sequences in FASTA format, and prints FASTA output of the sequences whose header contains 'Chr07'.

  6. Reading and writing files

  7. Reading files

     Open a file for reading, and link it to a filehandle:
         open(IN, "<EHD.fasta");

     And then read lines from the filehandle, exactly like you would from <STDIN>:
         my $line = <IN>;
         my @inputLines = <IN>;
         foreach $line (@inputLines) ...

     Every filehandle opened should be closed:
         close(IN);

     Always check that the open didn't fail (e.g. if a file by that name doesn't exist):
         open(IN, "<$file") or die "can't open file $file";
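
The open, read, close sequence from this slide can be sketched as a complete script. This is an illustrative sketch only: it uses a lexical filehandle and the three-argument form of `open` (a common alternative to the bareword `IN` style in the slides), adds `$!` to the error message so the reason for a failure is shown, and first writes a small temporary file so that it runs on its own. The function name `count_lines` and the file content are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the example is self-contained.
my ($tmpFh, $file) = tempfile(UNLINK => 1);
print $tmpFh ">EHD1\nMSTAV\nLLKR\n";    # made-up FASTA content
close($tmpFh);

# Count the lines of a file, reading one line at a time.
sub count_lines {
    my ($name) = @_;
    # Always check that the open did not fail.
    open(my $in, '<', $name) or die "can't open file $name: $!";
    my $count = 0;
    while (defined(my $line = <$in>)) {
        $count++;
    }
    # Every filehandle opened should be closed.
    close($in);
    return $count;
}

print "lines: ", count_lines($file), "\n";
```

Reading line by line in a `while` loop avoids holding the whole file in memory, which `my @inputLines = <IN>;` does.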

  8. Writing to files

     Open a file for writing, and link it to a filehandle:
         open(OUT, ">EHD.analysis") or die...
     NOTE: If a file by that name already exists it will be overwritten!

     You could append lines to the end of an existing file:
         open(OUT, ">>EHD.analysis") or die..

     Print to a file (in both cases):
         print OUT "The mutation is in exon $exonNumber\n";
     (no comma after the filehandle OUT)
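
The difference between ">" and ">>" can be checked with a short experiment. The sketch below is illustrative: it works in a temporary directory so that no real file is overwritten, and the helper names `write_line` and `read_lines` and the text written are assumptions, not part of the course material.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory so no real file is overwritten.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/EHD.analysis";

# Open $name in the given mode (">" or ">>"), print one line, and close it.
sub write_line {
    my ($mode, $name, $text) = @_;
    open(my $out, $mode, $name) or die "can't open $name: $!";
    print $out "$text\n";    # no comma after the filehandle
    close($out) or die "can't close $name: $!";
}

# Return all lines of a file as a list.
sub read_lines {
    my ($name) = @_;
    open(my $in, '<', $name) or die "can't open $name: $!";
    my @lines = <$in>;
    close($in);
    return @lines;
}

write_line('>',  $file, "exon 1");    # creates the file
write_line('>>', $file, "exon 2");    # appends a second line
my @afterAppend = read_lines($file);

write_line('>',  $file, "exon 3");    # overwrites: earlier lines are gone
my @afterOverwrite = read_lines($file);

print "after append:    ", scalar(@afterAppend), " line(s)\n";
print "after overwrite: ", scalar(@afterOverwrite), " line(s)\n";
```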

  9. File Test Operators

     You can ask questions about a file or a directory name (not filehandle):
         if (-e $name) {
             print "The file $name exists!\n";
         }

     -e $name   exists
     -r $name   is readable
     -w $name   is writable by you
     -z $name   has zero size
     -s $name   has non-zero size (returns size)
     -f $name   is a file
     -d $name   is a directory
     -l $name   is a symbolic link
     -T $name   is a text file
     -B $name   is a binary file (opposite of -T)
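
A few of these operators can be tried together in one small function. This is a sketch under stated assumptions: the function name `describe`, the file names, and the file content are invented for the example, and the files are created in a temporary directory that is removed when the script ends.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $empty = "$dir/empty.txt";
my $full  = "$dir/full.txt";

# Create one empty file and one file holding four characters.
open(my $fh, '>', $empty) or die "can't open $empty: $!";
close($fh);
open($fh, '>', $full) or die "can't open $full: $!";
print $fh "ACGT";
close($fh);

# Describe a path using the file test operators.
sub describe {
    my ($name) = @_;
    return "missing"    unless -e $name;    # does not exist
    return "directory"  if -d $name;
    return "empty file" if -z $name;        # exists with zero size
    return "file of " . (-s $name) . " bytes";    # -s returns the size
}

foreach my $name ($dir, $empty, $full, "$dir/no_such_file") {
    print "$name: ", describe($name), "\n";
}
```

The same `-e` test is one way to refuse to overwrite an existing output file before opening it with ">".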

  10. Working with paths

          open( IN, '<D:\workspace\Perl\p53.fasta' );

      • Always use a full path name; it is safer and clearer to read
      • Remember to use \\ in double quotes
          open( IN, "<D:\\workspace\\Perl\\$name.fasta" );
      • (usually) you can also use /
          open( IN, "<D:/workspace/Perl/$name.fasta" );
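
The quoting rules on this slide can be verified without opening any file, by comparing the strings themselves. A minimal sketch, assuming `$name` holds "p53" (the variable names `$single`, `$double` and `$forward` are made up):

```perl
use strict;
use warnings;

my $name = "p53";

# Single quotes: backslashes are kept as typed, variables are not interpolated.
my $single  = 'D:\workspace\Perl\p53.fasta';

# Double quotes: each backslash must be doubled, and $name is interpolated.
my $double  = "D:\\workspace\\Perl\\$name.fasta";

# Forward slashes need no escaping in either kind of quotes.
my $forward = "D:/workspace/Perl/$name.fasta";

print "single:  $single\n";
print "double:  $double\n";
print "forward: $forward\n";
print "single and double are ", ($single eq $double ? "identical" : "different"), "\n";
```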

  11. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          $line = <STDIN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print "$name $sum\n";
              # Read next line
              $line = <STDIN>;
              chomp $line;
          }

  12. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
          $line = <IN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print "$name $sum\n";
              # Read next line
              $line = <IN>;
              chomp $line;
          }
          close(IN);

  13. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          open(IN, '<D:\perl_ex\in.txt') or die "can't open input file";
          open(OUT,'>D:\perl_ex\out.txt') or die "can't open output file";
          $line = <IN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print OUT "$name $sum\n";
              # Read next line
              $line = <IN>;
              chomp $line;
          }
          close(IN);
          close(OUT);

  14. Class exercise 5a

      1. Change the script for class exercise 4a.2 to read the lines from an input file (instead of reading lines from the keyboard).

      2. Now, in addition, write the output of the previous question to a file named 'D:\perl_ex\class.ex.4a2.out' (instead of printing to the screen).

      3*. Now, before opening 'D:\perl_ex\class.ex.4a2.out', check if it exists, and if so, print a message that the output file already exists, and exit the script.

      4*. Change the script for class exercise 4a.3 to receive from the user two strings: 1) the name of a FASTA file 2) the name of an output file. Then read from the FASTA file given by the user, and write to the output file also supplied by the user.

  15. Passing information using command-line arguments

  16. Command line arguments

      It is common to give arguments (separated by spaces) within the command-line for a program or a script:
          > perl -w findProtein.pl D:\perl_ex\in.fasta 2 430

      They will be stored in the array @ARGV:
          @ARGV: 'D:\perl_ex\in.fasta' '2' '430'

          foreach my $arg (@ARGV){
              print "$arg\n";
          }

      Output:
          D:\perl_ex\in.fasta
          2
          430

  17. Command line arguments

      It is common to give arguments (separated by spaces) within the command-line for a program or a script:
          > perl -w findProtein.pl D:\my perl\in.fasta 2 430

      They will be stored in the array @ARGV:
          @ARGV: 'D:\my' 'perl\in.fasta' '2' '430'

          foreach my $arg (@ARGV){
              print "$arg\n";
          }

      Output:
          D:\my
          perl\in.fasta
          2
          430

  18. Command line arguments

      It is common to give arguments (separated by spaces) within the command-line for a program or a script:
          > perl -w findProtein.pl "D:\my perl\in.fasta" 2 430

      They will be stored in the array @ARGV:
          @ARGV: 'D:\my perl\in.fasta' '2' '430'

          foreach my $arg (@ARGV){
              print "$arg\n";
          }

      Output:
          D:\my perl\in.fasta
          2
          430

  19. Command line arguments

      It is common to give arguments (separated by spaces) within the command-line for a program or a script:
          > perl -w findProtein.pl D:\perl_ex\in.fasta D:\perl_ex\out.txt

      They will be stored in the array @ARGV:
          my $inFile = $ARGV[0];
          my $outFile = $ARGV[1];

      Or more simply:
          my ($inFile,$outFile) = @ARGV;
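
When a script expects a fixed number of arguments, it can help to check the count before using them. The sketch below is one possible approach and not part of the course material: the function name `parse_args` and the usage text are assumptions. It is called here with fixed strings so that it runs without any command-line arguments; in a real script the call would be `parse_args(@ARGV)`.

```perl
use strict;
use warnings;

# Split an argument list into input and output file names.
# Dies with a usage message if the number of arguments is not two.
sub parse_args {
    my @args = @_;
    die "Usage: perl findProtein.pl <inFile> <outFile>\n" unless @args == 2;
    my ($inFile, $outFile) = @args;
    return ($inFile, $outFile);
}

# In a real script: my ($inFile, $outFile) = parse_args(@ARGV);
my ($inFile, $outFile) = parse_args('D:\perl_ex\in.fasta', 'D:\perl_ex\out.txt');
print "input:  $inFile\n";
print "output: $outFile\n";
```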

  20. Command line arguments in Eclipse

  21. Command line arguments in Eclipse

  22. Reading files - example

      Reminder: the class exercise of 3 days ago.

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

  23. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          $line = <STDIN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print "$name $sum\n";
              # Read next line
              $line = <STDIN>;
              chomp $line;
          }

  24. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          my ($inFileName) = @ARGV;
          open(IN, "<$inFileName") or die "can't open $inFileName";
          $line = <IN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print "$name $sum\n";
              # Read next line
              $line = <IN>;
              chomp $line;
          }
          close(IN);

  25. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
          END
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          my ($inFileName, $outFileName) = @ARGV;
          open(IN, "<$inFileName") or die "can't open $inFileName";
          open(OUT, ">$outFileName") or die "can't open $outFileName";
          $line = <IN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while ($line ne "END") {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print OUT "$name $sum\n";
              # Read next line
              $line = <IN>;
              chomp $line;
          }
          close(IN);
          close(OUT);

  26. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          my ($inFileName, $outFileName) = @ARGV;
          open(IN, "<$inFileName") or die "can't open $inFileName";
          open(OUT, ">$outFileName") or die "can't open $outFileName";
          $line = <IN>;
          chomp $line;
          # Each iteration processes one input line and prints its output
          while (defined $line) {
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print OUT "$name $sum\n";
              # Read next line
              $line = <IN>;
              chomp $line;
          }
          close(IN);
          close(OUT);

  27. Reading files: example

      Input:
          Yossi 6.10,16.50,5.00
          Dana 21.00,6.00
          Refael 6.10,24.00,7.00,8.00
      Output:
          Yossi 27.6
          Dana 27
          Refael 45.1

          my ($inFileName, $outFileName) = @ARGV;
          open(IN, "<$inFileName") or die "can't open $inFileName";
          open(OUT, ">$outFileName") or die "can't open $outFileName";
          $line = <IN>;
          # Each iteration processes one input line and prints its output
          while (defined $line) {
              # chomp only after checking that a line was actually read
              chomp $line;
              # Separate name and numbers
              @nameAndNums = split(/ /, $line);
              $name = $nameAndNums[0];
              @nums = split(/,/, $nameAndNums[1]);
              $sum = 0;
              # Sum numbers
              foreach $num (@nums) {
                  $sum = $sum + $num;
              }
              print OUT "$name $sum\n";
              # Read next line
              $line = <IN>;
          }
          close(IN);
          close(OUT);
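
The same loop can be written more compactly by reading the line inside the `while` condition, so the read appears only once. This is a sketch of an alternative style, not the course's version: it uses lexical filehandles, `use strict`, and a function named `sum_expenses` (a made-up name) that takes an input and an output filehandle. Skipping blank lines is an added assumption. For the demo it reads from and writes to in-memory strings, so no files are needed.

```perl
use strict;
use warnings;

# Read "name n1,n2,..." lines from $in and print "name sum" lines to $out.
sub sum_expenses {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        chomp $line;
        next if $line eq "";    # skip blank lines (an added assumption)
        # Separate name and numbers
        my ($name, $list) = split(/ /, $line);
        # Sum numbers
        my $sum = 0;
        foreach my $num (split(/,/, $list)) {
            $sum += $num;
        }
        print $out "$name $sum\n";
    }
}

# Demo with in-memory filehandles, using the input from the slides.
my $input  = "Yossi 6.10,16.50,5.00\nDana 21.00,6.00\nRefael 6.10,24.00,7.00,8.00\n";
my $output = "";
open(my $in,  '<', \$input)  or die "can't open in-memory input";
open(my $out, '>', \$output) or die "can't open in-memory output";
sum_expenses($in, $out);
close($in);
close($out);
print $output;
```

With real files, the two `open` calls would take `$inFileName` and `$outFileName` from `@ARGV` instead of the string references.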

  28. Class exercise 5b

      1. Change the script of class exercise 5a.2 so that the script receives the input and output file names as arguments.

      2*. Write a script that receives a number of numeric arguments and prints their sum. For example:
              10 20 30 40
          output:
              100
