Welcome to Integrated Bioinformatics Friday, 8 September 2006. Comparison of genomes – Scenario Installing and running Blast. Weekend/Monday – How to find differences. Nature of research articles. Escherichia coli. . . . very small lab rats.
Welcome to Integrated Bioinformatics Friday, 8 September 2006 • Comparison of genomes – Scenario • Installing and running BLAST • Weekend/Monday – How to find differences • Nature of research articles
Escherichia coli . . . . . . very small lab rats Courtesy of Kent State University Microbiology E. coli: What makes it kill?
E. coli: What makes it kill? Escherichia coli . . . haemorrhagic colitis
E. coli: What makes it kill? [Figure: raw genome sequence of E. coli K12 and of E. coli O157:H7, each passed to a Gene finder]
E. coli: What makes it kill? Killer protein → Similarity finder → Killer functions (Membrane protein, sodium transporter; Iron-responsive transcriptional regulator; Calcium-dependent protein kinase; Unknown protein; Unknown protein; Unknown protein; . . .) → ideas for new antibiotics
Welcome to Integrated Bioinformatics Friday, 8 September 2006 • Nature of research articles • Comparison of genomes – Scenario • Weekend/Monday – How to find differences – Parsing programs – Regular expressions
Welcome to Integrated Bioinformatics Friday, 8 September 2006 • Nature of problem sets • Nature of research articles • Comparison of genomes – Scenario • Weekend/Monday – How to find differences • Today – Why differences
How do differences arise between genomes? Addition/deletion of DNA Where do they come from? How to distinguish foreign from native genes? – GC-content
How do differences arise between genomes? Addition/deletion of DNA Point mutation organism 1 TTT TCT GAA TCC GTA GAC GTT organism 2 TTT TCT GAA TCA GCA GAC GTG What kind of mutations arise?
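The comparison of organism 1 and organism 2 on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the two sequences are already aligned and split into codons of equal length; the function name `codon_differences` is made up for this example.

```python
# Aligned codons copied from the slide.
org1 = "TTT TCT GAA TCC GTA GAC GTT".split()
org2 = "TTT TCT GAA TCA GCA GAC GTG".split()


def codon_differences(a, b):
    """List every mismatch between two aligned codon lists.

    Each entry is (codon number, position within codon, base in a, base in b),
    with both numbers counted from 1.
    """
    diffs = []
    for codon_no, (codon_a, codon_b) in enumerate(zip(a, b), start=1):
        for pos, (base_a, base_b) in enumerate(zip(codon_a, codon_b), start=1):
            if base_a != base_b:
                diffs.append((codon_no, pos, base_a, base_b))
    return diffs


for codon_no, pos, base_a, base_b in codon_differences(org1, org2):
    print(f"codon {codon_no}, position {pos}: {base_a} -> {base_b}")
```

The two organisms differ at three sites: the third base of codon 4, the second base of codon 5, and the third base of codon 7.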
How do differences arise between genomes? Addition/deletion of DNA Point mutation Keeping track of gene variants – Concepts of ortholog / paralog
Phage Infection Phage genome Bacterial chromosome Lysogenic pathway Lytic pathway Phage genome How do differences arise between genomes? Death General transduction
Phage Infection Phage genome Bacterial chromosome Lysogenic pathway Lytic pathway Phage genome How do differences arise between genomes? Life!
Corynephage β: tox– C.d. → tox+ C.d. (C.d. = Corynebacterium diphtheriae) The gene encoding diphtheria toxin (tox) is carried on corynephage β. Lysogenic conversion by corynephage β confers toxigenicity!!
How to distinguish foreign from native genes?
GC-content = ([G] + [C]) / [total nucleotides]
Codon  Amino acid  Borrelia burgdorferi (29% GC content)  Mycobacterium tuberculosis (65% GC content)
AAU    Asn         0.80                                   0.21
AAC    Asn         0.20                                   0.79
AAA    Lys         0.80                                   0.26
AAG    Lys         0.20                                   0.74
SQ2: List the two triplets that code for Lys. What proportion of each is used in Borrelia burgdorferi compared to Mycobacterium tuberculosis? Is this finding surprising? Why or why not?
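The GC-content formula and the codon-usage fractions in the table can be computed along these lines. This is a minimal sketch: the function names `gc_content` and `synonymous_usage` are invented for illustration, the input is assumed to be a coding sequence read in frame from its first base, and the short test strings are toy data, not real genes.

```python
from collections import Counter


def gc_content(seq):
    """Return ([G] + [C]) / [total nucleotides], ignoring non-ACGTU characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = sum(seq.count(base) for base in "ACGTU")
    return gc / total if total else 0.0


def synonymous_usage(coding_seq, codons):
    """Return the fraction of each codon in `codons` among all of them found.

    `codons` should list the synonymous DNA codons for one amino acid,
    for example ("AAA", "AAG") for Lys. RNA input (U) is converted to DNA (T).
    """
    seq = coding_seq.upper().replace("U", "T")
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in codons}


# Toy inputs only.
print(gc_content("TTTTCTGAATCCGTAGACGTT"))
print(synonymous_usage("AAAAAGAAAAAA", ("AAA", "AAG")))
```

Run over all annotated genes of a genome, the second function would produce fractions comparable to the table's columns; a gene whose fractions look like the other column is a candidate foreign gene.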
How to distinguish foreign from native genes?
SQ4: The GC content of Bacillus anthracis is 33.97%. By analysis of codon use, would it likely be easier to detect a foreign gene originating from Borrelia burgdorferi or from Mycobacterium tuberculosis?
Codon  Amino acid  Borrelia burgdorferi (29% GC content)  Mycobacterium tuberculosis (65% GC content)
AAU    Asn         0.80                                   0.21
AAC    Asn         0.20                                   0.79
AAA    Lys         0.80                                   0.26
AAG    Lys         0.20                                   0.74
DNA mutation has multiple causes
• Errors during DNA replication
  – base mis-incorporation
  – polymerase slippage / repeat amplification
• Errors during recombination or cell division
  – chromosome loss or rearrangement
  – large insertions or deletions
• Environmental factors – mutagens
  – radiation – UV or ionizing radiation
  – chemical – many mechanisms of action
• Spontaneous events
  – tautomerisation
  – depurination
  – deamination
• Viral infection or transposons
How do differences arise between genomes? Addition/deletion of DNA Point mutation organism 1 TTT TCT GAA TCC GTA GAC GTT organism 2 TTT TCT GAA TCA GCA GAC GTG Codons: GUU Val, GUC Val, GUA Val, GUG Val; GCU Ala, GCC Ala, GCA Ala, GCG Ala
How do differences arise between genomes? Addition/deletion of DNA Point mutation organism 1 TTT TCT GAA TCC GTA GAC GTT organism 2 TTT TCT GAA TCA GCA GAC GTG Codons: GUU Val, GUC Val, GUA Val, GUG Val; GCU Ala, GCC Ala, GCA Ala, GCG Ala Silent mutation
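Whether a point mutation is silent can be checked by translating both codons and comparing the amino acids. The sketch below assumes the standard genetic code (written here in DNA letters, with bases ordered T, C, A, G) and aligned codons; the helper name `classify` is made up for this example.

```python
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify(codon_a, codon_b):
    """Label a pair of aligned codons as identical, silent, or an amino acid change."""
    if codon_a == codon_b:
        return "identical"
    aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
    return "silent" if aa_a == aa_b else f"{aa_a}->{aa_b}"


org1 = "TTT TCT GAA TCC GTA GAC GTT".split()
org2 = "TTT TCT GAA TCA GCA GAC GTG".split()
for number, (a, b) in enumerate(zip(org1, org2), start=1):
    print(number, a, b, classify(a, b))
```

For the slide's sequences, codons 4 and 7 change without altering the amino acid (Ser and Val respectively), while codon 5 changes Val to Ala.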
Single base mutations Transitions: purine for purine or pyrimidine for pyrimidine Transversions: purine for pyrimidine or pyrimidine for purine
How do differences arise between genomes? Addition/deletion of DNA Point mutation organism 1 TTT TCT GAA TCC GTA GAC GTT organism 2 TTT TCT GAA TCA GCA GAC GTG Transition: purine ↔ purine, pyrimidine ↔ pyrimidine Transversion: purine ↔ pyrimidine
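The transition/transversion rule is simple enough to write down directly. In this sketch the function name `substitution_type` is invented for illustration; purines are A and G, pyrimidines are C, T and U.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def substitution_type(base_a, base_b):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    a, b = base_a.upper(), base_b.upper()
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        raise ValueError(f"not a nucleotide: {base_a!r} or {base_b!r}")
    if a == b:
        raise ValueError("bases are identical, so there is no substitution")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The three differences between organism 1 and organism 2 on the slide.
for a, b in [("C", "A"), ("T", "C"), ("T", "G")]:
    print(f"{a} -> {b}: {substitution_type(a, b)}")
```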
Tautomerization of bases Normal pairing: C–G, T–A Tautomeric forms: C*–A, T*–G
DNA replication can “lock in” a mutation Mutations can arise as a consequence of misincorporation during replication
How to distinguish foreign from native genes? SQ7: There are two codons each for 9 of the amino acids. Choose any one of these 18 codons. • Create a transition mutation in the third position of the codon. What is the result? • Create a transversion mutation in the third position. What is the result? • In the third position, are transition mutations or transversion mutations more likely to result in a change in the amino acid encoded?
How do differences arise between genomes? Addition/deletion of DNA Point mutation Keeping track of gene variants – Concepts of ortholog / paralog
Orthologs, Paralogs, and Xenologs • Speciation event leading to orthologs • Gene duplication gives rise to paralogs • Horizontal transfer leads to xenologs
Orthologs vs Paralogs SQ5: Are genes B1 and C2 orthologs or paralogs? How to predict orthology with imperfect information?