Accurate Reconstruction of Molecular Phylogenies Using United Codon-aa Sequence Alignments: The molecular clock has two hands
Xiaolong Wang, Ocean University of China
Email: Xiaolong@ouc.edu.cn Website: www.DNAPlusPro.com
From the Cell to Protein Machines.
Nature, Vol. 409, 15 February 2001 • Science, Vol. 291, 16 February 2001
Species with completed genome sequences (partial list): Rickettsia prowazekii, Ureaplasma urealyticum, Drosophila melanogaster, Helicobacter pylori, Bacillus subtilis, human, Buchnera sp. APS, Escherichia coli, Arabidopsis, Thermotoga maritima, Caenorhabditis elegans, Thermoplasma acidophilum, mouse, rat, Borrelia burgdorferi, Neisseria meningitidis Z2491, Plasmodium falciparum, Mycobacterium tuberculosis, Aquifex aeolicus
How many characters are in the “Heaven Book”? 3 × 10^9 characters = 10,000 books × 100 pages per book × 3,000 characters per page.
(Slide background: a page of raw DNA sequence, beginning CCGGTCTCCCCGCCCGCGCGCGAAGTAAAGG…)
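The slide's arithmetic can be checked in a few lines of Python. The constants are the slide's own figures; the function name `books_needed` is illustrative.

```python
# Figures from the slide: 3 x 10^9 letters, printed at
# 3,000 characters per page and 100 pages per book.
GENOME_LETTERS = 3 * 10**9
CHARS_PER_PAGE = 3_000
PAGES_PER_BOOK = 100


def books_needed(letters: int,
                 chars_per_page: int = CHARS_PER_PAGE,
                 pages_per_book: int = PAGES_PER_BOOK) -> int:
    """Number of books needed to print `letters` characters (rounded up)."""
    chars_per_book = chars_per_page * pages_per_book
    return -(-letters // chars_per_book)  # ceiling division on integers


print(books_needed(GENOME_LETTERS))  # 10000
```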
Bioinformatics: bioinformaticians are faced with mountains of DNA fragments.
Genome sequences: • What can we do with them? • Comparative genomics • Functional genomics • Structural biology • How are they useful? • Drug design • Personal genetics • Molecular breeding • Gene prediction and annotation • Non-coding RNA discovery • Molecular phylogeny reconstruction • …
I. Drug Design • Understanding how structures bind other molecules (function) • Designing inhibitors • Docking and structure modeling
Three-dimensional molecular structure is one of the foundations of structure-based drug design. Often, structural data are available for a protein and for a drug separately, but not for the two in complex. Docking is the computational prediction of how two molecules fit together in 3D space.
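Bioinformatics and new drug development: gene sequences and expression data → data processing → association analysis → identification of target molecules → drug design. Modern drug research is a process of knowledge mining based on biological information.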
Protein inhibitors (viruses as an example) • attachment, entry and fusion inhibitors • DNA polymerase inhibitors • integrase inhibitors • interferons • maturation inhibitors • monoclonal antibodies • neuraminidase inhibitors • NS3 protease inhibitors • nucleoside reverse transcriptase inhibitors • protease inhibitors • reverse transcriptase inhibitors • RNA polymerase inhibitors
Nucleic acid inhibitors (antisense oligonucleotides or RNAi) • Targeting mRNA • Targeting microRNA • Targeting genomic DNA • Interfering with RNA processing • Aptamers: oligonucleotide or peptide molecules that bind to a specific target molecule
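At its simplest, an antisense oligonucleotide against an mRNA is the reverse complement of the target stretch, written as DNA. The sketch below shows only that base-pairing step; the function name and the example target are hypothetical, and real oligo design also weighs target accessibility, backbone chemistry and off-target matches.

```python
# Watson-Crick partners for an RNA (or DNA) target, written as DNA:
# A->T, C->G, G->C, U->A (and T->A for DNA input).
_PAIR = str.maketrans("ACGUT", "TGCAA")


def antisense_dna(target: str) -> str:
    """Return the 5'->3' DNA oligo that base-pairs with a 5'->3' target."""
    seq = target.upper()
    if set(seq) - set("ACGUT"):
        raise ValueError("unexpected character in target sequence")
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return seq.translate(_PAIR)[::-1]


print(antisense_dna("AUGGCCAUU"))  # AATGGCCAT
```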
Fomivirsen (Vitravene, ISIS 2922): the first and only antisense antiviral drug approved by the FDA; US$63.87 per injection
Wang X, Gou D, Xu S-y (2010) Polymerase-Endonuclease Amplification Reaction (PEAR) for Large-Scale Enzymatic Production of Antisense Oligonucleotides. PLoS ONE 5(1): e8430. doi:10.1371/journal.pone.0008430
Polymerase-Endonuclease Amplification Reaction (PEAR) for Enzymatic Production of Antisense Oligonucleotides
Figure: PEAR reaction scheme. Labels: target X; probe X'R'X'; repeated denaturation, annealing, slipped annealing and elongation (dNTPs, Taq polymerase); cleavage by PspGI into X R X and X' R' X' units.
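The cleaving step in the scheme uses PspGI. As a string-level sketch only, assuming PspGI recognises 5'-CCWGG-3' (W = A or T) and cuts just before the first C (check the enzyme supplier's documentation), a tandem-repeat product can be digested back into monomer units as follows. The repeat unit is hypothetical, and the model follows one strand only, ignoring the staggered cut on the complementary strand.

```python
import re

# Assumed PspGI site: 5'-CCWGG-3' with W = A or T, cut before the first C.
# A zero-width lookahead reports every site position, including overlaps.
PSPGI_SITE = re.compile(r"(?=CC[AT]GG)")


def digest(seq: str) -> list[str]:
    """Split one strand at every assumed PspGI cut position."""
    seq = seq.upper()
    cuts = [m.start() for m in PSPGI_SITE.finditer(seq) if m.start() > 0]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical 12-nt repeat unit carrying one site at its 5' end.
UNIT = "CCAGGTTACGAT"
repeat_product = UNIT * 3  # a PEAR-like tandem-repeat product
print(digest(repeat_product))
```

Each of the three repeats comes back as a separate `UNIT`, which is the monomer-release idea behind the cleaving step.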
Other potential applications of PEAR • (1) PEAR is a minimal DNA replication system that could be used to study the origin and evolution of repetitive DNA in genomes, as well as the origin and evolution of genetic material and of life. • (2) Repetitive DNA produced by PEAR can be transferred into cells or organisms to study: • the function of repetitive DNA sequences • repeat gene sponges • repeat gene missiles • repeat gene probes
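About 6 million years ago, humans and chimpanzees, descended from a common ancestor, set off along separate evolutionary paths. Six million years later, scientists are taking a new approach: by analysing the genome sequence of our close relative, the chimpanzee, and comparing it with the human genome, they are addressing questions about human origins and evolution.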
Using ClustalW to perform a progressive multiple sequence alignment (MSA): http://www2.ebi.ac.uk/clustalw/
Using ClustalW to perform a progressive multiple sequence alignment (MSA): http://www.clustal.org/
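ClustalW's progressive method first aligns every pair of sequences, builds a guide tree from the pairwise scores, and then merges sequences in tree order. The pairwise building block can be sketched as a plain Needleman-Wunsch global alignment. The match, mismatch and linear gap scores below are illustrative assumptions, not ClustalW's substitution matrices or position-specific gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("MGIN", "MIN"))  # (1, 'MGIN', 'M-IN')
```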
Multiple sequence alignment programs: Clustal • MUSCLE • MAFFT • ProbCons • PRALINE • T-Coffee
United codon-aa sequence alignment
Coding DNA sequences: atggggataaat … tga, atgataaatagt … tga
→ Translate → Peptide sequences: MGIN … *, MINS … *
→ Combine → Combined DNA-protein sequences: atgMgggGataIaatN … tga*, atgMataIaatNagtS … tga*
→ Align → Combined alignment:
atgMgggGataIaatN---- … tga*
atgM----ataIaatNagtS … tga*
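The translate-and-combine step, and the reverse step of splitting an aligned combined row into DNA and protein rows, can be sketched in Python. This assumes the standard genetic code (NCBI translation table 1) and assumes the aligner keeps each 4-character codon+residue unit intact, as in the example alignment. `combine` and `split` are hypothetical helper names, not functions of any published tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def combine(cds: str) -> str:
    """Interleave each codon with its amino acid: 'atgggg' -> 'atgMgggG'."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    units = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Unknown or ambiguous codons are marked 'X'.
        units.append(codon + CODON_TABLE.get(codon.upper(), "X"))
    return "".join(units)


def split(aligned: str) -> tuple[str, str]:
    """Recover (DNA row, protein row) from an aligned combined row made of
    intact 4-character units; a '----' unit becomes '---' plus '-'."""
    if len(aligned) % 4:
        raise ValueError("aligned row is not made of 4-character units")
    dna, protein = [], []
    for i in range(0, len(aligned), 4):
        unit = aligned[i:i + 4]
        dna.append(unit[:3])
        protein.append(unit[3])
    return "".join(dna), "".join(protein)


print(combine("atggggataaattga"))   # atgMgggGataIaatNtga*
print(split("atgM----ataIaatNagtS"))  # ('atg---ataaatagt', 'M-INS')
```

Splitting recovers a codon alignment and a protein alignment that are guaranteed to agree with each other, because both come from the same combined row.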
S1A ClustalW • Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
S1B MAFFT • Katoh K, Kuma K, Toh H, Miyata T. 2005. Nucleic Acids Res. • Katoh K, Misawa K, Kuma K, Miyata T. 2002. Nucleic Acids Res.
S1C MUSCLE • Edgar RC. 2004. MUSCLE. Nucleic Acids Res. 32(5):1792–1797. • Edgar RC. 2004. MUSCLE. BMC Bioinformatics. 5:113.
S1D T-Coffee • Poirot O, O'Toole E, Notredame C. 2003. Tcoffee@igs. Nucleic Acids Res. 31(13):3503–3506. • Notredame C, Higgins DG, Heringa J. 2000. T-Coffee. J Mol Biol. 302(1):205–217.
S1E PRANK • Löytynoja A, Goldman N. 2005. Proc Natl Acad Sci USA. 102(30):10557–10562. • Löytynoja A, Goldman N. 2008. Science. 320(5883):1632–1635.
S1G Codon alignment (back-translated)
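A back-translated codon alignment threads each sequence's original codons through its gapped protein row, which is what tools such as PAL2NAL do. A minimal sketch, assuming each protein row was translated from exactly the given coding sequence with no frameshifts; the function name `back_translate` is illustrative.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Replace each residue with its codon from `cds` and each gap with '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = [c for c in aligned_protein if c != "-"]
    if len(residues) != len(codons):
        raise ValueError("protein row and CDS describe different lengths")
    next_codon = iter(codons)
    return "".join("---" if c == "-" else next(next_codon)
                   for c in aligned_protein)


# Rows of a protein alignment paired with their original coding sequences.
print(back_translate("MGIN-", "atggggataaat"))  # atggggataaat---
print(back_translate("M-INS", "atgataaatagt"))  # atg---ataaatagt
```

Unlike the combined approach, the codon alignment here is fixed entirely by the protein alignment; the DNA plays no part in placing the gaps.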
Figure S2 (env): S2A ClustalW • S2B MAFFT • S2C MUSCLE • S2D T-Coffee • S2E PRANK • S2F DNA+PRO
Figure 3 (gag and env): 3A Protein • 3B DNA+PRO • 3C Back translate (combined) • 3D Codon alignment
Figure S3 (gag and env): S3A Protein • S3B DNA+PRO
Figure 4 (BamHI homologs): 4A Protein • 4B Codon alignment • 4C DNA+PRO
Figure 5 (SAUSA300_2431 homologs): 5A Protein • 5B Codon alignment • 5C DNA+PRO
A robust multi-gene phylogenetic tree