1 / 34

Bioinformatics 生物信息学理论和实践 唐继军 jtang@cse.sc 13928761660

Bioinformatics 生物信息学理论和实践 唐继军 jtang@cse.sc.edu 13928761660. More Conditions. But. Use ==, <, <=, >, >=, !=, ||, && for numeric numbers Use eq, lt, le, gt, ge, ne, or, and for string comparisons. More Arithmatics. +, -, *, **, /, % +=, -=, *=, **=, /=, %= ++, --. $x = 28; $x = $x/2;

hueyr
Download Presentation

Bioinformatics 生物信息学理论和实践 唐继军 jtang@cse.sc 13928761660

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Bioinformatics生物信息学理论和实践唐继军jtang@cse.sc.edu13928761660Bioinformatics生物信息学理论和实践唐继军jtang@cse.sc.edu13928761660

  2. More Conditions

  3. But • Use ==, <, <=, >, >=, !=, ||, && for numeric numbers • Use eq, lt, le, gt, ge, ne, or, and for string comparisons

  4. More Arithmatics • +, -, *, **, /, % • +=, -=, *=, **=, /=, %= • ++, --

  5. $x = 28; $x = $x/2; print $x/=2, "\n"; print $x--, "\n"; print $x, "\n"; print --$x, "\n"; print $x, "\n"; print $x % 3, "\n"; print $x**2, "\n";

  6. #!/usr/bin/perl -w print "Please type the filename of the DNA sequence data: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; $position = 0; while ( $position < length $DNA) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } $position++; } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

  7. #!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; for ( $position = 0 ; $position < length $DNA ; ++$position ) { $base = substr($DNA, $position, 1); if ( $base eq 'C' or $base eq 'G') { ++$count_of_CG; } } print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

  8. #!/usr/bin/perl –w print "Please type the filename of the DNA sequence data: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_CG = 0; while($DNA =~ /c/ig) {$count_of_CG++;} while($DNA =~ /g/ig) {$count_of_CG++;} print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

  9. $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC"; $s = ""; for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) { $s .= substr($DNA, $i, $j); } print $s, "\n";

  10. $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC"; $s = ""; for ($i = 0, $j = 5; $i < $j; $i+=2, $j++) { $s .= substr $DNA, $i, $j; } print ($s, "\n");

  11. Call functions/subroutines • Name p1, p2, p3; • Name(p1, p2, p3); • print $DNA1, $DNA2, "\n"; • print ($DNA1, $DNA2, "\n");

  12. Exercise 1 • Ask for a protein file in fasta format • Ask for an amino acid • Count the frequency of that amino acid • TKFHSNAHFYDCWRMLQYQLDMRCMRAISTFSPHCGMEHMPDQTHNQGEMCKPRMWQVSMNQSCNHTPPFRKTYVEWDYMAKALIAPYTLGWLASTCFIW

  13. Exercise 2 • Ask for an RNA file in fasta format • Convert it to RNA • Ask for a codon • Count the frequency of that codon • TCGTACTTAGAAATGAGGGTCCGCTTTTGCCCACGCACCTGATCGCTCCTCGTTTGCTTTTAAGAACCGGACGAACCACAGAGCATAAGGAGAACCTCTAGCTGCTTTACAAAGTACTGGTTCCCTTTCCAGCGGGATGCTTTATCTAAACGCAATGAGAGAGGTATTCCTCAGGCCACATCGCTTCCTAGTTCCGCTGGGATCCATCGTTGGCGGCCGAAGCCGCCATTCCATAGTGAGTTCTTCGTCTGTGTCATTCTGTGCCAGATCGTCTGGCAAATAGCCGATCCAGTTTATCTCTCGAAACTATAGTCGTACAGATCGAAATCTTAAGTCAAATCACGCGACTAGACTCAGCTCTATTTTAGTGGTCATGGGTTTTGGTCCCCCCGAGCGGTGCAACCGATTAGGACCATGTAGAACATTAGTTATAAGTCTTCTTTTAAACACAATCTTCCTGCTCAGTGGTACATGGTTATCGTTATTGCTAGCCAGCCTGATAAGTAACACCACCACTGCGACCCTAATGCGCCCTTTCCACGAACACAGGGCTGTCCGATCCTATATTACGACTCCGGGAAGGGGTTCGCAAGTCGCACCCTAAACGATGTTGAAGGCTCAGGATGTACACGCACTAGTACAATACATACGTGTTCCGGCTCTTATCCTGCATCGGAAGCTCAATCATGCATCGCACCAGCGTGTTCGTGTCATCTAGGAGGGGCGCGTAGGATAAATAATTCAATTAAGATATCGTTATGCTAGTATACGCCTACCCGTCACCGGCCAACAGTGTGCAGATGGCGCCACGAGTTACTGGCCCTGATTTCTCCGCTTCTAATACCGCACACTGGGCAATACGAGCTCAAGCCAGTCTCGCAGTAACGCTCATCAGCTAACGAAAGAGTTAGAGGCTCGCTAAATCGCACTGTCGGGGTCCCTTGGGTATTTTACACTAGCGTCAGGTAGGCTAGCATGTGTCTTTCCTTCCAGGGGTATG

  14. Subroutine • Some code needs to be reused • A good way to organize code • Called “function” in some languages • Name • Return • Parameters (@_)

  15. #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_G = countG($DNA); print $count_of_G; sub countG { my($dna) = @_; my($count) = 0; $count = ( $dna =~ tr/Gg//); return $count; }

  16. #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>; @DNA = <DNAFILE>; close DNAFILE; $DNA = join('', @DNA); $DNA =~ s/\s//g; $count_of_G = count($DNA, 'Gg'); print $count_of_G; sub count { my($dna, $pattern) = @_; my($count) = 0; $count = ( eval("$dna =~ tr/$pattern//") ); return $count; }

  17. Codon

  18. sub codon2aa { my($codon) = @_; if ( $codon =~ /TCA/i ) { return 'S' } # Serine elsif ( $codon =~ /TCC/i ) { return 'S' } # Serine elsif ( $codon =~ /TCG/i ) { return 'S' } # Serine elsif ( $codon =~ /TCT/i ) { return 'S' } # Serine elsif ( $codon =~ /TTC/i ) { return 'F' } # Phenylalanine elsif ( $codon =~ /TTT/i ) { return 'F' } # Phenylalanine elsif ( $codon =~ /TTA/i ) { return 'L' } # Leucine elsif ( $codon =~ /TTG/i ) { return 'L' } # Leucine elsif ( $codon =~ /TAC/i ) { return 'Y' } # Tyrosine elsif ( $codon =~ /TAT/i ) { return 'Y' } # Tyrosine elsif ( $codon =~ /TAA/i ) { return '_' } # Stop elsif ( $codon =~ /TAG/i ) { return '_' } # Stop elsif ( $codon =~ /TGC/i ) { return 'C' } # Cysteine elsif ( $codon =~ /TGT/i ) { return 'C' } # Cysteine elsif ( $codon =~ /TGA/i ) { return '_' } # Stop elsif ( $codon =~ /TGG/i ) { return 'W' } # Tryptophan elsif ( $codon =~ /CTA/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTC/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTG/i ) { return 'L' } # Leucine elsif ( $codon =~ /CTT/i ) { return 'L' } # Leucine elsif ( $codon =~ /CCA/i ) { return 'P' } # Proline elsif ( $codon =~ /CCC/i ) { return 'P' } # Proline elsif ( $codon =~ /CCG/i ) { return 'P' } # Proline elsif ( $codon =~ /CCT/i ) { return 'P' } # Proline elsif ( $codon =~ /CAC/i ) { return 'H' } # Histidine elsif ( $codon =~ /CAT/i ) { return 'H' } # Histidine elsif ( $codon =~ /CAA/i ) { return 'Q' } # Glutamine elsif ( $codon =~ /CAG/i ) { return 'Q' } # Glutamine elsif ( $codon =~ /CGA/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGC/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGG/i ) { return 'R' } # Arginine elsif ( $codon =~ /CGT/i ) { return 'R' } # Arginine elsif ( $codon =~ /ATA/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATC/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATT/i ) { return 'I' } # Isoleucine elsif ( $codon =~ /ATG/i ) { return 'M' } # Methionine elsif ( $codon =~ /ACA/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACC/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACG/i ) { return 'T' } # Threonine elsif ( $codon =~ /ACT/i ) { return 'T' } # Threonine elsif ( $codon =~ /AAC/i ) { return 'N' } # Asparagine elsif ( $codon =~ /AAT/i ) { return 'N' } # Asparagine elsif ( $codon =~ /AAA/i ) { return 'K' } # Lysine elsif ( $codon =~ /AAG/i ) { return 'K' } # Lysine elsif ( $codon =~ /AGC/i ) { return 'S' } # Serine elsif ( $codon =~ /AGT/i ) { return 'S' } # Serine elsif ( $codon =~ /AGA/i ) { return 'R' } # Arginine elsif ( $codon =~ /AGG/i ) { return 'R' } # Arginine elsif ( $codon =~ /GTA/i ) { return 'V' } # Valine elsif ( $codon =~ /GTC/i ) { return 'V' } # Valine elsif ( $codon =~ /GTG/i ) { return 'V' } # Valine elsif ( $codon =~ /GTT/i ) { return 'V' } # Valine elsif ( $codon =~ /GCA/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCC/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCG/i ) { return 'A' } # Alanine elsif ( $codon =~ /GCT/i ) { return 'A' } # Alanine elsif ( $codon =~ /GAC/i ) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GAT/i ) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GAA/i ) { return 'E' } # Glutamic Acid elsif ( $codon =~ /GAG/i ) { return 'E' } # Glutamic Acid elsif ( $codon =~ /GGA/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGC/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGG/i ) { return 'G' } # Glycine elsif ( $codon =~ /GGT/i ) { return 'G' } # Glycine else { print STDERR "Bad codon \"$codon\"!!\n"; exit; } # }

  19. sub codon2aa { my($codon) = @_; if ( $codon =~ /GC./i) { return 'A' } # Alanine elsif ( $codon =~ /TG[TC]/i) { return 'C' } # Cysteine elsif ( $codon =~ /GA[TC]/i) { return 'D' } # Aspartic Acid elsif ( $codon =~ /GA[AG]/i) { return 'E' } # Glutamic Acid elsif ( $codon =~ /TT[TC]/i) { return 'F' } # Phenylalanine elsif ( $codon =~ /GG./i) { return 'G' } # Glycine elsif ( $codon =~ /CA[TC]/i) { return 'H' } # Histidine elsif ( $codon =~ /AT[TCA]/i) { return 'I' } # Isoleucine elsif ( $codon =~ /AA[AG]/i) { return 'K' } # Lysine elsif ( $codon =~ /TT[AG]|CT./i) { return 'L' } # Leucine elsif ( $codon =~ /ATG/i) { return 'M' } # Methionine elsif ( $codon =~ /AA[TC]/i) { return 'N' } # Asparagine elsif ( $codon =~ /CC./i) { return 'P' } # Proline elsif ( $codon =~ /CA[AG]/i) { return 'Q' } # Glutamine elsif ( $codon =~ /CG.|AG[AG]/i) { return 'R' } # Arginine elsif ( $codon =~ /TC.|AG[TC]/i) { return 'S' } # Serine elsif ( $codon =~ /AC./i) { return 'T' } # Threonine elsif ( $codon =~ /GT./i) { return 'V' } # Valine elsif ( $codon =~ /TGG/i) { return 'W' } # Tryptophan elsif ( $codon =~ /TA[TC]/i) { return 'Y' } # Tyrosine elsif ( $codon =~ /TA[AG]|TGA/i) { return '_' } # Stop else {print STDERR "Bad codon \"$codon\"!!\n"; exit; } }

  20. Exercise • Make the subroutine of converting codon to aa • Read in a dna fasta file, print out an Amino Acid sequence

  21. #!/usr/bin/perl -w $dna = 'CGACGTCTTCGTACGGGACTAGCTCGTGTCGGTCGC'; $protein = ''; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein .= codon2aa($codon); } print "I translated the DNA\n\n$dna\n\n into the protein\n\n$protein\n\n"; sub codon2aa { #... }

  22. Reading Frame   5'                                                   3'    atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa     M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V   *  2  tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat      C   P   S   *   I   A   *   R   G   F   H   H   L   R   T   M   Y  3   gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata       A   Q   A   E   *   R   R   G   V   F   I   I   *   G   R   C   I  three in the forward reading, three in the reverse complement reading

  23. Exercise 3 • Make the subroutine of converting codon to aa • Read in a dna fasta file, print out an Amino Acid sequence • There are 6 reading frame, can you try to print all 6 version?

  24. #!/usr/bin/perl –w print "Please type the filename: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>;@DNA = <DNAFILE>;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n"; sub dna2peptide { my ($dna) = @_; my $protein = ""; for(my $i=0; $i < (length($dna) - 2) ; $i += 3) { $codon = substr($dna,$i,3); $protein .= codon2aa($codon); } return $protein; } sub codon2aa { #... }

  25. Modules • A Perl Module is a self-contained pieceof [Perl] code that can be used by a Perl program later • Like a library • End with extension .pm • Needs a 1 at the end

  26. #Bio.pm sub codon2aa { #.... #.... } sub dna2peptide { #.... #.... } 1

  27. #!/usr/bin/perl -w use Bio; print "Please type the filename: "; $dna_filename = <STDIN>; chomp $dna_filename; open(DNAFILE, $dna_filename); $name = <DNAFILE>;@DNA = <DNAFILE>;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

  28. #Bio.pm sub codon2aa { #.... #.... } sub dna2peptide { #.... #.... } sub fasta_read { print "Please type the filename: "; my $dna_filename = <STDIN>; chomp $dna_filename; unless (open(DNAFILE, $dna_filename)) { print "Cannot open file ", $dna_filename, "\n"; } $name = <DNAFILE>;@DNA = <DNAFILE>;close DNAFILE; $DNA = join( '', @DNA);$DNA =~ s/\s//g; return $DNA; } 1

  29. #!/usr/bin/perl -w use Bio; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

  30. Scope • my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared. • Blocks of code are hunks within curly braces {}; files are blocks. • Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

  31. #!/usr/bin/perl -w use Bio; use strict; use warnings; $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

  32. Variable "$DNA" is not imported at frame2.pl line 6. Variable "$DNA" is not imported at frame2.pl line 8. Variable "$DNA" is not imported at frame2.pl line 9. Variable "$DNA" is not imported at frame2.pl line 10. Variable "$DNA" is not imported at frame2.pl line 12. Variable "$DNA" is not imported at frame2.pl line 12. Variable "$DNA" is not imported at frame2.pl line 13. Variable "$DNA" is not imported at frame2.pl line 14. Variable "$DNA" is not imported at frame2.pl line 15. Global symbol "$DNA" requires explicit package name at frame2.pl line 6. Global symbol "$DNA" requires explicit package name at frame2.pl line 8. Global symbol "$DNA" requires explicit package name at frame2.pl line 9. Global symbol "$DNA" requires explicit package name at frame2.pl line 10. Global symbol "$DNA" requires explicit package name at frame2.pl line 12. Global symbol "$DNA" requires explicit package name at frame2.pl line 12. Global symbol "$DNA" requires explicit package name at frame2.pl line 13. Global symbol "$DNA" requires explicit package name at frame2.pl line 14. Global symbol "$DNA" requires explicit package name at frame2.pl line 15. Execution of frame2.pl aborted due to compilation errors.

  33. #!/usr/bin/perl -w use Bio; use strict; use warnings; my $DNA = fasta_read(); print "First ", dna2peptide($DNA), "\n"; print "Second ", dna2peptide(substr($DNA, 1)), "\n"; print "Third ", dna2peptide(substr($DNA, 2)), "\n"; $DNA = reverse $DNA; $DNA =~ tr/ACGTacgt/TGCAtgca/; print "Fourth ", dna2peptide($DNA), "\n"; print "Fifth ", dna2peptide(substr($DNA, 1)), "\n"; print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

More Related