Lectures in Computational Virology: Bioinformatic Studies on the Evolution, Structure and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D.
Department of Microbiology and the Center for Computational Biology, Montana State University, Bozeman, MT
mars@parvati.msu.montana.edu
Summary: Lecture II
• Introduction to Retroid Agents
• The Genome Parsing Suite
• Retroid Agents in the Human Genome
• Discovery-based Hypothesis Generation
Retroid Agents
Retroviruses, retrotransposons, pararetroviruses, retroposons, retroplasmids, retrointrons, and retrons.
[Figure: flow of genetic information among RNA, DNA and protein.
• Retroid Agents: reverse transcriptase mediated replication or transposition (RNA to DNA)
• All cellular systems and most DNA viruses: replication by DNA-dependent DNA polymerase
• RNA viruses (e.g., Ebola, rabies, influenza, polio): replication by RNA-dependent RNA polymerase
• Transcription yields RNA (snRNAs, ribozymes, tRNA, rRNA); translation yields protein
McClure, 2000]
Distribution of Retroid Agents among Eukaryotes and Eubacteria
[Table: presence (+) or absence (-) of each class of Retroid Agent in each host group.
Host groups: Eubacteria, Archaea, Human, Vertebrates, Invertebrates, Plants, Fungi, Protists (Slime Mold, Alga, Oomycetes, Protozoa), Plastids, Baculovirus Genome, Conjugative transposons.
Retroid Agent classes: Retroviruses; Pararetroviruses (Caulimoviruses, Badnaviruses, Hepadnaviruses); Retrotransposons (Gypsy, DIRS1, Copia); Retroposons; Retrointrons; Retroplasmids; Retrons; Retrophages.]
Gene Maps
[Figure: phylogenetic tree based on 65 RT sequences, shown beside gene maps (scale in nucleotides, 1000 to 4000) for representative Retroid Agents: retroviruses (HIV-1), orphan class (DIRS-1), gypsy-like retrotransposons (17.6), caulimoviruses (CaMV), hepadnaviruses (HBV), copia-like retrotransposons (Copia), retroposons (LIN-H, CIN4, R2Bm, I-FAC, INGI), Group II introns (INT-SC1), plasmids (MAUP), retrons (MX65) and TERT. Gene labels on the maps include MA, C and NC.
RT = reverse transcriptase; RH = ribonuclease H; H-C/IN = integrase; PR = aspartic acid protease.
McClure, 2000]
Conserved Motifs of Retroid Agent Proteins
[Figure: numbered conserved motifs in each protein.
• RNA-dependent DNA polymerase: reverse transcriptase (six motifs, across the fingers, palm, thumb and connection regions) and ribonuclease H (numbered motifs with conserved D and E residues)
• Aspartic acid protease: motifs 1 to 3 (DTG, G, ILG)
• Integrase: motifs 1 to 4 across the zinc-binding (H, H, C, C), core (D, D, E) and DNA-binding regions]
Roles of Retroid Agents:
1) Disease:
   a) Retroviruses:
      1) exogenous, infectious: HIV, HTLV
      2) endogenous associations: breast cancer, testicular tumors, insulin-dependent diabetes, multiple sclerosis, rheumatoid arthritis, schizophrenia and systemic lupus erythematosus
   b) LINEs, insertional mutagenesis:
      1) Hemophilia A
      2) muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
Possible function of HERV-W
[Figure labels: Syncytiotrophoblast; Trophoblast; HERV-W; Endometrium; Syncytin]
What is the “host” genomic environment of active Retroid Agents?
[Figure labels: Predicted functional RT; Predicted Retroid genome; Real Contig; Real Chromosome; Disease; Reproduction; Development]
The score of a given motif is calculated by

\[ M_{\text{score}} = \frac{M + M1 + M2}{M_{\text{length}}} \]

M, M1 and M2 are based on the number of amino acids in a motif found in common between a known RT query sequence and the potential RT:
• M is a count of amino acid identities
• M1 is a count of conservative substitutions (ILMV, AG, ST, DE, NQ, FY, RK)
• M2 accounts for older substitutions (LIMV, AGST, DENQ, FYW, RKH)

The overall OSM score is calculated by

\[ \mathrm{OSM}_{\text{score}} = \frac{\sum_{i=1}^{T_{\text{motifs}}} M_{\text{score},i}}{T_{\text{motifs}}} \]

where T motifs is the number of motifs comprising the OSM.
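The motif and OSM scores defined above can be sketched in Python. This is a minimal illustration, not the Genome Parsing Suite code. It assumes the query and candidate motifs are already aligned and of equal length, that M length is the motif length, and that each position is counted in at most one of M, M1 or M2 (identity first, then the conservative groups, then the broader "older" groups). The function names `motif_score` and `osm_score` are hypothetical.

```python
# Substitution groups as listed on the slide.
CONSERVATIVE = ["ILMV", "AG", "ST", "DE", "NQ", "FY", "RK"]   # counted in M1
OLDER = ["LIMV", "AGST", "DENQ", "FYW", "RKH"]                # counted in M2


def _same_group(a, b, groups):
    """True if residues a and b fall in the same substitution group."""
    return any(a in g and b in g for g in groups)


def motif_score(query, candidate):
    """(M + M1 + M2) / M_length for two pre-aligned, equal-length motifs.

    Assumption: each position contributes to at most one of M, M1, M2.
    """
    if not query or len(query) != len(candidate):
        raise ValueError("motifs must be non-empty and of equal length")
    m = m1 = m2 = 0
    for q, c in zip(query.upper(), candidate.upper()):
        if q == c:
            m += 1                                  # identity
        elif _same_group(q, c, CONSERVATIVE):
            m1 += 1                                 # conservative substitution
        elif _same_group(q, c, OLDER):
            m2 += 1                                 # older substitution
    return (m + m1 + m2) / len(query)


def osm_score(motif_pairs):
    """Mean motif score over the T_motifs motifs that make up an OSM."""
    scores = [motif_score(q, c) for q, c in motif_pairs]
    if not scores:
        raise ValueError("an OSM needs at least one motif")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example using two protease motifs named on an earlier slide.
    pairs = [("DTG", "DSG"), ("ILG", "IWG")]
    print(osm_score(pairs))
```

Because identities and both substitution classes carry equal weight in the formula, the score under these assumptions is the fraction of motif positions that are identical or fall within one of the listed groups.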
Status of the Human Genome Project
• 3,200,000 Kbp of the euchromatic portion of the human chromosomes are being sequenced
• The heterochromatic portion is not being sequenced
• As of January 5, 2003 (non-redundant sequence only):
  • 98.8% of the euchromatic portion has been done
  • 3.0% is completed to the working draft level
  • 95.8% has been completed to 99% accuracy
Fluctuations in Nucleotides
[Figure: fluctuation in nucleotides per chromosome (A) and unique BLAST RT hits per chromosome (B) over the last four freezes. The bar codes are as follows: black, November 2002; right-hatched, June 2002; gray, April 2002; and left-hatched, December 2001.]
Distribution of Significant BLAST Hits
[Table: distribution of significant BLAST hits retrieved by 22 RT protein query sequences per chromosome. Chromosomal size from the Nov. 2002 HGD freeze is given in megabase pairs. Other column designations are described in the text. The significant raw and unique hits are from all 22 queries. The RTs with six motifs are significant hits retrieved by LINEs, HERVs, MMLV and TERT queries. Intact OSMs are found only in LINEs, HERVs and the TERT. The last two columns report the full-length LINEs with all components and perfect LINEs, respectively.]
Classification of 1656 whole LINEs
A total of 153 LINEs appear to be perfect, while 86 contain a single stop codon and 80 a single frame-shift.
Distribution of significant BLAST hits per query sequence

Query       Raw hits   Unique hits   RTs with 6 motifs   Perfect OSMs
H-LIN       170260     69692         7345                2760
HERV-K      2982       496           86                  22
HERV-L      8208       2910          208                 12
MMLV        4559       2108          4                   0
MPMV        3506       52            0                   0
HIV         903        8             0                   0
FIV         1505       15            0                   0
HTLV        3232       51            0                   0
Snakehead   3109       39            0                   0
SPUMA       2369       17            0                   0
HBV         58         31            0                   0
RTBV        60         12            0                   0
CMV         174        11            0                   0
Copia       104        9             0                   0
Gypsy       334        14            0                   0
DIRO        97         12            0                   0
IPAO        27         13            0                   0
PMAUP       19         18            0                   0
RECO        9          9             0                   0
H_TERT      1857       1581          1                   1
R_TERT      26         21            0                   0
Archaea     11         11            0                   0

Values indicate raw hits/unique hits/RTs with 6 motifs/Perfect OSMs for the 22 representative sequences used to query the HGD. Sequences, excluding the HERVs and human TERT, are the representative mean sequences for over 600 RTs from eight different classes of Retroid Agents.
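The slash-separated values in this table (raw hits/unique hits/RTs with 6 motifs/Perfect OSMs) are easy to parse for further analysis. The sketch below is illustrative only; the `HitCounts` type and `parse_hits` function are hypothetical names, and the three rows are copied from the table.

```python
from typing import NamedTuple


class HitCounts(NamedTuple):
    raw: int            # raw BLAST hits
    unique: int         # unique hits
    six_motif_rts: int  # RTs with six motifs
    perfect_osms: int   # perfect OSMs


def parse_hits(field: str) -> HitCounts:
    """Parse a 'raw/unique/six-motif/perfect' field into named counts."""
    parts = field.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"expected four slash-separated values, got {field!r}")
    return HitCounts(*(int(p) for p in parts))


if __name__ == "__main__":
    # Three rows taken from the table.
    table = {
        "H-LIN": "170260/69692/7345/2760",
        "HERV-K": "2982/496/86/22",
        "H_TERT": "1857/1581/1/1",
    }
    for query, field in table.items():
        hits = parse_hits(field)
        print(f"{query}: {hits.unique / hits.raw:.1%} of raw hits are unique")
```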
Distribution of the 482 Low Frequency Reverse Transcriptase hits
[Table: distribution of the 482 Low Frequency Reverse Transcriptase hits with remnants of at least one motif. Values are the number of Low Frequency hits/number of hits with a minimum of one recognizable motif. Of the 482 hits, 108 have at least one recognizable RT motif. The remaining 374 hits have remnants of at least one motif and were conserved enough to be scored by GPS.]
[Table: Low Frequency RT hits on each chromosome (1 to 22, X and Y) for the HIV, MPMV, Spuma and TERT queries, broken down by motif (K, D, QG, DD, G-K, LG). Cell entries are counts tagged C or R.]
Looking at the environment of each Retroid Agent
[Figure labels: Chromosome 21 contig NT_029490, TPTE gene; truncated LINE inserted into intron 6; truncated L1MB1 inserted into intron 6; truncated L1PA5 inserted into intron 8; truncated LINE inserted into intron 18]
Figure 3: Looking at the environment of each Retroid genome. In this example, four truncated LINEs are found within three different introns of a putative tyrosine phosphatase gene (TPTE). Insertions of Retroid genomes into introns may have little effect on a gene, or may allow for gene shuffling. In this case none of the coding region of the gene was disrupted, which suggests that Retroid sequence information may be utilized to make introns, that selection favors insertions that do not disrupt coding capacity, or that introns provide the preferential target site for transposition. The black lines represent the exons of the TPTE gene.
Distribution of Retroid Agents on Human Chromosomes (November 2002 Freeze)
Query: 22 distinct reverse transcriptase sequences representing 18 subgroups were used to query the NCBI’s Human Genome Database.
Results:
1) Retroid Agents are not randomly distributed on human chromosomes.
2) Chromosomes X and Y have the highest percent Retroid Agent sequence.
3) Of those remaining, Chromosome 4 has the highest and Chromosome 20 the lowest percent Retroid Agent sequence.
Only two chromosomes, 19 and 21, are without at least one intact and potentially active LINE.
Using exact sequence lengths for each hit of each category indicated in the table of data, the November freeze of the human genome contains at least 1.01% unique RT sequences, 0.35% full-length LINEs and 0.032% active LINEs.
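The percentages above come from summing exact hit lengths and dividing by genome size. The sketch below shows one way such a figure could be computed; it is not the lab's procedure. It assumes hits are given as hypothetical (chromosome, start, end) half-open intervals and that overlapping hits on the same chromosome should be merged so no base is counted twice. The coordinates and genome length in the example are toy values, not data from the study.

```python
from collections import defaultdict


def covered_bases(hits):
    """Total bases covered by hits, merging overlaps per chromosome.

    hits: iterable of (chromosome, start, end) with half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        if end < start:
            raise ValueError("hit end must not precede its start")
        by_chrom[chrom].append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:              # overlaps or abuts current run
                cur_end = max(cur_end, end)
            else:                             # gap: close the current run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def percent_of_genome(hits, genome_length_bp):
    """Percent of the genome covered by the given hits."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * covered_bases(hits) / genome_length_bp


if __name__ == "__main__":
    # Toy intervals on two chromosomes of a 10,000 bp "genome".
    toy_hits = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 50)]
    print(percent_of_genome(toy_hits, 10_000))
```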
New hypotheses from discovery-based research
1) Low frequency RT-like sequences (not from LINEs or ERVs) are discernible in the Human Genome.
2) Human low frequency RT-like sequences are remnants of ancient invasions.
3) Human low frequency RT-like sequences are remnants of failed invasions.
4) The pattern of low frequency RT-like sequences is unique in each organismal genome.
5) Both unique and trans-organismal patterns of low frequency RT-like sequences are found in Eukaryotes.
What mechanisms could be maintaining these signals?
• Gene conversion, an event without a mechanism.
• Transcriptional inactivation due to methylation of CpG regions.
• Translational recoding.
• Complementation.
The McClure Lab
• Eric Donaldson, B.S., Bioinformatician II
• Dustin Lee, M.S., Bioinformatics Programmer
• Aaron Juntunen, Undergraduate programmer
• Crystal Hepp, Undergraduate
• Kendal Harwood, Undergraduate
• Dr. Marcella McClure, P.I. (Marcie)