96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl • 8/13~8/22 蘇中才 • 8/24~8/29 張天豪 • 8/31 曾宇鳯
Regular expression File handle
File handle • Reserved file handle • File manipulation • File test operator • File status • Localtime
Reserved file handle • STDIN • STDOUT • STDERR • DATA • ARGV • ARGVOUT
File handle - open • Input • open SEQ, "seq.txt"; • open SEQ, "< seq.txt"; • Output • open SEQ, "> seq.txt"; • Appended output • open LOG, ">> log.txt";
File handle - close • Input/Output • close SEQ; • close LOG;
File handle - die • Error handling: print the message and stop the program • die "<your error message>"; • $! : system error message • Example (disorder.fa is read-only, so opening it for append fails)
    #!/usr/bin/perl -w
    #log.pl : try to append to a read-only file
    open LOG, ">> disorder.fa" or die "LOG ERROR:$!\n";
    # write log
    close LOG;
File handle - warn • Warning handling: print the message and keep running • warn "<your error message>"; • $! : system error message • Example • open LOG, ">> disorder.txt" or warn "LOG ERROR:$!";
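The difference between die and warn can be seen in a short sketch. It assumes a hypothetical path (no_such_dir/seq.txt) that does not exist, so every open fails; eval is used only so the demo survives the die. It also shows a detail of die worth knowing: when the message does not end in "\n", Perl appends " at SCRIPT line N." to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path, assumed NOT to exist, so every open() below fails.
my $path = "no_such_dir/seq.txt";

# warn: print the message on STDERR, then keep running.
open(my $log, '<', $path) or warn "LOG ERROR: $!\n";
print "still running after warn\n";

# die: stop the program. eval {} catches it here only so the demo can go on;
# the die message ends up in $@.
my $ok = eval {
    open(my $seq, '<', $path) or die "SEQ ERROR: $!\n";
    1;
};
print "caught: $@" unless $ok;

# Without a trailing "\n", Perl appends " at SCRIPT line N." to the message.
eval { die "no newline" };
print "caught: $@";
```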
File copy
    #!/usr/bin/perl -w
    #copy1.pl : copy data from the input file into the output file
    open INPUT, "<disorder.fa" or die "disorder.fa can't be opened\n";
    open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
    my $line;
    while ( $line = <INPUT> ) {
        chomp $line;
        print OUTPUT "$line\n";
    }
    close INPUT;
    close OUTPUT;
File copy +
    #!/usr/bin/perl -w
    #copy2.pl : copy data from the input file into the output file
    if (not -e "disorder1.fa") {
        die "disorder1.fa doesn't exist\n";
        print "continue to open disorder1.fa\n";   # never reached: die ends the program
    }
    open INPUT, "<disorder1.fa" or die "disorder1.fa can't be opened\n";
    if (-e "temp.fa") {
        warn "temp.fa already exists\n";
        print "continue to write temp.fa\n";       # reached: warn only prints a message
    }
    open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
    my $line;
    while ( $line = <INPUT> ) {
        chomp $line;
        print OUTPUT "$line\n";
    }
    close OUTPUT;
    close INPUT;
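The same copy can also be written with three-argument open and lexical filehandles, which most current Perl code prefers because the file name cannot be misread as a mode and the handle is scoped to a variable. This is a sketch, not the course's version: the sub name copy_file and the script name copy3.pl are hypothetical, and $! is added to each message so the reason for a failure is reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy $src to $dst line by line (hypothetical helper, same logic as copy2.pl).
sub copy_file {
    my ($src, $dst) = @_;
    die "$src doesn't exist\n" unless -e $src;
    warn "$dst already exists; overwriting it\n" if -e $dst;

    open(my $in,  '<', $src) or die "can't open $src: $!\n";
    open(my $out, '>', $dst) or die "can't create $dst: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        print {$out} "$line\n";
    }
    close $in;
    close $out or die "can't close $dst: $!\n";
}

copy_file(@ARGV) if @ARGV == 2;   # usage: perl copy3.pl disorder.fa temp.fa
```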
Exercise File handle
File size • Get the size of a file in bytes • my $size = -s "disorder.fa"; • Check file size • if ( -s "disorder.fa" > 5*1024) { … } • if ($size=-s "disorder.fa" > 5*1024) { print "disorder.fa has $size bytes\n";} • What's the value of $size? Why?
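The quiz comes down to operator precedence, which a small sketch can make visible. It assumes nothing beyond core Perl; the sub name size_checks and the script name size.pl are hypothetical. -s is a named unary operator and binds tighter than >, while = binds looser than >, so the unparenthesized assignment stores the result of the comparison rather than the byte count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns what $size holds with and without parentheses around the assignment.
sub size_checks {
    my ($fn) = @_;

    # Parsed as  $size = ((-s $fn) > 5*1024)
    # so $size receives the comparison result (1 or ""), not the byte count.
    my $size = -s $fn > 5*1024;

    # Parenthesizing the assignment keeps the byte count.
    my $bytes;
    if (($bytes = -s $fn) > 5*1024) {
        print "$fn has $bytes bytes\n";
    }
    return ($size, $bytes);
}

size_checks($ARGV[0]) if @ARGV;   # e.g. perl size.pl disorder.fa
```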
Exercise – linenumber.pl: print every line of the input file prefixed with its line number • Input (disorder.fa)
    >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
    MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
    TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
    ...
    EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
    GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
    DALGIDEYGG
• Output
    1 >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
    2 MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
    3 TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
    ...
    128 EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
    129 GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
    130 DALGIDEYGG
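One possible solution sketch, not the official answer: Perl keeps the current line number of the last-read filehandle in the built-in variable $., so no counter of our own is needed. The sub name number_lines is hypothetical; passing the handles in makes it easy to test with in-memory files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# linenumber.pl (sketch): print each input line prefixed with its number.
# $. holds the line number of the most recently read filehandle.
sub number_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} "$. $line";
    }
}

if (@ARGV) {
    my $fn = shift @ARGV;
    open(my $in, '<', $fn) or die "can't open $fn: $!\n";
    number_lines($in, \*STDOUT);
    close $in;
}
```

Usage: perl linenumber.pl disorder.fa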
Regular expression File status, localtime
File status
    #!/usr/bin/perl -w
    #stat.pl : show the information of the file
    my $fn = shift @ARGV;
    die "please enter a filename\n" if (not defined($fn));
    die "$fn doesn't exist\n" if (not -e $fn);
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
    print "device = $dev\n";
    print "inode = $ino\n";
    print "mode = $mode\n";
    print "node link = $nlink\n";
    print "user id = $uid\n";
    print "group id = $gid\n";
    print "rdev = $rdev\n";
    print "size = $size\n";
    print "atime = $atime\n";       # last access, in seconds since the epoch
    print "mtime = $mtime\n";       # last modification
    print "ctime = $ctime\n";       # last inode change on Unix, not creation time
    print "block size = $blksize\n";
    print "blocks = $blocks\n";
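When only one or two fields are needed, a list slice on stat avoids naming all thirteen variables. A minimal sketch; the sub name size_and_mtime is hypothetical, and it relies on the standard stat order in which index 7 is the size and index 9 is mtime.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick fields out of stat's 13-element list with a list slice:
# index 7 = size in bytes, index 9 = modification time (mtime).
sub size_and_mtime {
    my ($fn) = @_;
    my ($size, $mtime) = (stat $fn)[7, 9];
    return ($size, $mtime);
}

if (@ARGV) {
    my ($size, $mtime) = size_and_mtime($ARGV[0]);
    print "size = $size\nmtime = $mtime\n";
}
```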
Local time
    #!/usr/bin/perl -w
    #localtime1.pl : show the readable time of the file
    my $fn = shift @ARGV;
    die "please enter a filename\n" if (not defined($fn));
    die "$fn doesn't exist\n" if (not -e $fn);
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
    # scalar context: localtime returns one readable string
    my $alocal = localtime $atime;
    my $mlocal = localtime $mtime;
    my $clocal = localtime $ctime;
    print "atime = $alocal\n";
    print "mtime = $mlocal\n";
    print "ctime = $clocal\n";
Local time +
    #!/usr/bin/perl -w
    #localtime2.pl : show the user-defined time of the file
    my $fn = shift @ARGV;
    die "please enter a filename\n" if (not defined($fn));
    die "$fn doesn't exist\n" if (not -e $fn);
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
    # list context: localtime returns nine separate fields
    my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
    print "mtime = $year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";
Local time • $sec : 0~59 • $min : 0~59 • $hour : 0~23 • $day : 1~31 • $mon : 0~11 (0 = January) • $year : years since 1900 (add 1900 for the full year) • $wday : 0 (Sunday) ~ 6 (Saturday) • $yday : 0 (Jan 1) ~ 364, or 365 in leap years • $isdst : true if daylight saving time is in effect, false otherwise
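To turn these raw fields into a calendar date, add 1900 to $year and 1 to $mon. The sketch below uses gmtime (UTC) instead of localtime only so that its output does not depend on the machine's time zone; both functions return the same nine fields. The sub name ymd is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format an epoch time as YYYY/MM/DD hh:mm:ss, fixing up $year and $mon.
sub ymd {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $day, $mon, $year, undef, $yday) = gmtime $epoch;
    return sprintf "%04d/%02d/%02d %02d:%02d:%02d (yday %d)",
        $year + 1900, $mon + 1, $day, $hour, $min, $sec, $yday;
}

print ymd(0), "\n";           # 1970/01/01 00:00:00 (yday 0)
print ymd(31 * 86400), "\n";  # 1970/02/01 00:00:00 (yday 31)
```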
Exercise localtime
Quiz – localtime
    my ($sec,$min,$hour,$day,$mon,$year,$wday,
        $yday,$isdst) = localtime $mtime;
    print "mtime = $year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";
• Output: mtime = 107/7/2 10:10:16 (4;213;0)
    my $mlocal = localtime $mtime;
    print "mtime = $mlocal\n";
• Output: mtime = Thu Aug  2 10:10:16 2007
    my ($mlocal) = localtime $mtime;
• Output: ?
Exercise • How can we show the time information of disorder.fa as "2007/8/2 10:10:16 (Thu)"? • Hint: year, month and weekday • @weekDays = qw(Sun Mon Tue Wed Thu Fri Sat); • How can we show it as "Aug 2 2007 10:10:16 (Thu)"? • @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
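One possible approach, sketched under the assumption that Perl 5.10 or later is available (for the // operator): index the hint arrays with $wday and $mon, and fix up the year. The sub names style1 and style2 are hypothetical; both take the nine-field list returned by localtime.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @weekDays = qw(Sun Mon Tue Wed Thu Fri Sat);
my @months   = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# "2007/8/2 10:10:16 (Thu)"
sub style1 {
    my ($sec, $min, $hour, $day, $mon, $year, $wday) = @_;
    return sprintf "%d/%d/%d %02d:%02d:%02d (%s)",
        $year + 1900, $mon + 1, $day, $hour, $min, $sec, $weekDays[$wday];
}

# "Aug 2 2007 10:10:16 (Thu)"
sub style2 {
    my ($sec, $min, $hour, $day, $mon, $year, $wday) = @_;
    return sprintf "%s %d %d %02d:%02d:%02d (%s)",
        $months[$mon], $day, $year + 1900, $hour, $min, $sec, $weekDays[$wday];
}

my $fn = shift(@ARGV) // "disorder.fa";
if (-e $fn) {
    my @t = localtime((stat $fn)[9]);   # stat index 9 = mtime
    print style1(@t), "\n";
    print style2(@t), "\n";
}
```

Fed the fields from the quiz above (107/7/2 10:10:16, weekday 4), style1 gives "2007/8/2 10:10:16 (Thu)" and style2 gives "Aug 2 2007 10:10:16 (Thu)".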
Regular expression Basic
How to search for a word in a text file? • Unix command • grep • Perl • Regular expression
An example of Regular expression
    #!/usr/bin/perl -w
    #google1.pl : check string with/without a certain pattern
    while (1) {
        print "Please enter your query:";
        my $line = <>;
        last if (not defined($line));   # stop at end of input (Ctrl-D)
        if ($line =~ /google/) {
            print "Found!!!\n";
        } else {
            print "No match\n";
        }
    }
If we want to find the following words • google, g01gle, g12gle, gabgle, …, gxxgle • ggle, gogle, google, gooogle, …, go…ogle • gogle, google, gooogle, …, go…ogle • google, goooogle, goooooogle, …, goo…oogle • ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …
Meta-character • Wildcard (.) • Matches any single character except "\n" • Quantifier (applies to the preceding character or group) • ? : zero or one time • * : zero or more times • + : one or more times
If we want to find the following words • google, g01gle, g12gle, gabgle, …, gxxgle • /g..gle/ • ggle, gogle, google, gooogle, …, go…ogle • /go*gle/ • gogle, google, gooogle, …, go…ogle • /go+gle/ • google, goooogle, goooooogle, …, goo…oogle • /g(oo)+gle/ • ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, … • /g.*gle/
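The patterns above can be checked against a few sample words. The word list is made up for illustration. Note that the patterns are not anchored, so a match anywhere inside a word counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(ggle gogle google gooogle g01gle gagle);

# Return the sample words in which the pattern matches somewhere.
sub hits {
    my ($re) = @_;
    return grep { $_ =~ $re } @words;
}

for my $p ('g..gle', 'go*gle', 'go+gle', 'g(oo)+gle', 'g.*gle') {
    print "/$p/ matches: ", join(' ', hits(qr/$p/)), "\n";
}
```

For example, /g(oo)+gle/ matches only google here: gooogle has an odd number of o's, so no run of whole "oo" pairs is followed directly by "gle".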
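Character class • [ ] : matches any one of the characters listed inside • - : a range of characters, e.g. a-z • ^ : placed first inside [ ], matches any character NOT listed • Examples • [abcdefghijklmnopqrstuvwxyz] or [a-z] • [0123456789] or [0-9] • [abcxyz] • [02468] (not the same as [^13579], which also matches letters, spaces and every other non-odd-digit character) • [A-Za-z0-9]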
Character class shorthand • [\d] : [0-9] • [\w] : [A-Za-z0-9_] • [\s] : roughly [\f\t\n\r ] (for ASCII text) • Something you don't want • [\D] : [^\d] • [\W] : [^\w] • [\S] : [^\s] • How about [\s\S]? • What's the difference between . and [\s\S]?
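A quick way to answer the last question: every character is either whitespace or not, so [\s\S] matches any character, including "\n", while . (without the /s flag) does not match "\n". The two-line test string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "gap\ngle";   # "g" ... "gle" split across a newline

my $with_dot   = ($text =~ /g.*gle/)      ? "match" : "no match";  # . stops at "\n"
my $with_class = ($text =~ /g[\s\S]*gle/) ? "match" : "no match";  # [\s\S] crosses it

print "dot: $with_dot, class: $with_class\n";   # dot: no match, class: match
```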
Please think: what does each pattern match? • /google/ • /g[\d][\d]gle/ • /g..gle/ • /g[\w]*gle/ • /g.*gle/ • /g[\d\D]*gle/ • /g..........gle/
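Additional quantifiers • | : alternation (either the left or the right side) • {n,m} : repeat between n and m times • Examples • /(google|Google)/ or /(G|g)oogle/ • /g..........gle/ or /g.{10}gle/ (whereas /go{10}gle/ requires exactly ten o's) • /go{0,100}gle/ • /g(oo)+gle/ or /g(oo){1,}gle/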
Anchors • ^ : beginning of the string • $ : end of the string (or just before a final "\n") • \b : word boundary • \B : not a word boundary (writing [^\b] does not work, because \b inside [ ] means backspace) • Examples • /^google$/ • /\bgoogle\b/
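The difference between the two example patterns shows up on a few made-up strings: /^google$/ requires the whole string to be "google", while /\bgoogle\b/ accepts "google" as a separate word anywhere, including between dots, but not inside a longer word such as googleplex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @strings = ("google", "I use google daily", "googleplex", "my.google.com");

# Return the strings that match the given pattern.
sub matching {
    my ($re) = @_;
    return grep { $_ =~ $re } @strings;
}

my @exact = matching(qr/^google$/);     # the whole string is "google"
my @word  = matching(qr/\bgoogle\b/);   # "google" as a separate word
print "exact: @exact\n";
print "word : ", join(' | ', @word), "\n";
```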
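Grouping and backreferences • ( ) : group and capture • \1, \2, … : backreference to the text captured by the 1st, 2nd, … group • Examples • /g(o)\1gle/ • /g([\S])\1gle/ • Captured text after a successful match (matched variables) • $1, $2, …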
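A small sketch of /g([\S])\1gle/: the group captures one non-space character, \1 requires that same character again, and after a successful match $1 holds what was captured. The sub name doubled and the sample words are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the doubled character between "g" and "gle", or undef if none.
sub doubled {
    my ($word) = @_;
    # ([\S]) captures one non-space character; \1 demands the same one again.
    return $word =~ /g([\S])\1gle/ ? $1 : undef;
}

for my $word (qw(google gaagle gabgle g11gle)) {
    my $c = doubled($word);
    print defined $c ? "$word: doubled '$c'\n" : "$word: no match\n";
}
```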
Exercise Basic regular expression
Exercise • How to extract these words? • gogle, gooogle, gooooogle, gooooooogle (No ggogles) • g11gle, g33gle, g55gle, g77gle, g99gle (excluding gg99gles) • What do these mean? • /g[\d]+gle/ • /go?gle/ • /g([\w])([\w])\2\1gle/
Magic variable - $_ • Magic: while (<>) reads each line into $_, and chomp and // work on $_ when no variable is given
    while (<>) {
        chomp;
        if (/google/) { print "$_\n"; }
    }
• Original
    while ($line = <>) {
        chomp($line);
        if ($line =~ /google/) { print "$line\n"; }
    }
Magic variable - $_
    #!/usr/bin/perl -w
    #google2.pl : check string with/without a certain pattern
    print "Please enter your query:";
    while (<>) {
        chomp;
        if (/google/) {
            print "Found!!!\n";
        } else {
            print "No match\n";
        }
        print "Please enter your query:";
    }
Regular expression Flags
Regular Expression • String matching • m// or // • String substitution • s/// • String transliteration • tr/// or y///
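tr/// (or y///) is not a regular expression: it maps characters one to one and returns how many it touched. That makes it handy for sequence work. A sketch of two common uses, with hypothetical sub names and a made-up sequence: a reverse complement, and a G/C count using an empty replacement list, which counts without changing the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse the sequence, then complement each base with tr///.
sub revcomp {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# tr/// returns the number of characters matched; with an empty
# replacement list the string itself is left unchanged.
sub gc_count {
    my ($dna) = @_;
    return ($dna =~ tr/GCgc//);
}

my $seq = "ATGCGGTA";
print "revcomp: ", revcomp($seq), "\n";   # TACCGCAT
print "GC: ", gc_count($seq), "\n";       # 4
```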
Matching • Complete syntax • m// • Examples • m/google/ • m/g(oo){0,}gle/ • Others • m<google>, m[google], m!google!, …
Flag options • /i : case insensitivity • /s : let . also match "\n" (like [\d\D]) • /m : multiple lines, so ^ and $ match at the start and end of every line, not only of the whole string • Examples • google, Google, GOOGLE, gOOGLE, GooGle, … • m/google/i
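The /m and /s flags only matter when a string contains "\n". A sketch on a made-up three-line string; the match counts use the common ()= idiom, which evaluates a list assignment in scalar context to get the number of matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Google\nsearch\ngoogle";   # three lines in one string

my $without_m = () = $text =~ /^google$/gi;    # ^ and $ only at the string's ends
my $with_m    = () = $text =~ /^google$/gim;   # /m: ^ and $ at every line
my $without_s = $text =~ /Google.search/  ? 1 : 0;   # . stops at "\n"
my $with_s    = $text =~ /Google.search/s ? 1 : 0;   # /s: . matches "\n" too

print "without /m: $without_m, with /m: $with_m\n";   # 0, 2
print "without /s: $without_s, with /s: $with_s\n";   # 0, 1
```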
Matched patterns • $& : the text matched by the last successful match • $` : the part of the string before $& • $' : the part of the string after $& • Example
    $string = "Microsoft google Yahoo";
    $string =~ m/google/i;
    print "[$`][$&][$']\n";
• Output: [Microsoft ][google][ Yahoo]
Matched pattern - $&, $`, $'
    #!/usr/bin/perl -w
    #google3.pl : check string with/without a certain pattern
    print "Please enter your query:";
    while (<>) {
        chomp;
        if (m/google/i) {
            print "Match:[$&]\n";
            print "prefix : [$`]\n";
            print "suffix : [$']\n";
        } else {
            print "No match\n";
        }
        print "Please enter your query:";
    }
Substitution • Complete syntax • s/// or s### • Examples • $string =~ s/google/GOOGLE/ • s/(google|GOOGLE)/Microsoft/ • Others • s#^https://#http://#;
Flag options • /i : case insensitivity • /s : let . also match "\n" (like [\d\D]) • /g : global, replace every match instead of only the first • Examples • s/google/yahoo/sg • s/\s+/ /g • s/^\s+// • s/\s+$// • s#^.*/##s
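The whitespace examples above are often combined to clean up a line, and s#^.*/##s strips a path down to its file name because the greedy .* runs to the last "/". A sketch with hypothetical sub names and a hypothetical path; the last two lines show what /g changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim both ends, then collapse every inner whitespace run to one space.
sub squeeze {
    my ($s) = @_;
    $s =~ s/^\s+//;     # strip leading whitespace
    $s =~ s/\s+$//;     # strip trailing whitespace
    $s =~ s/\s+/ /g;    # /g: replace every run, not just the first
    return $s;
}

# Delete everything up to and including the last "/".
sub basename_of {
    my ($path) = @_;
    $path =~ s#^.*/##s;
    return $path;
}

print "[", squeeze("  Microsoft \t google\n Yahoo  "), "]\n";  # [Microsoft google Yahoo]
print basename_of("/home/user/data/disorder.fa"), "\n";       # disorder.fa

(my $first_only = "a  b  c") =~ s/\s+/ /;    # without /g: "a b  c"
(my $all        = "a  b  c") =~ s/\s+/ /g;   # with /g:    "a b c"
```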