Diversity and survival strategies of LTR retrotransposons in the Arabidopsis genome. Brooke Peterson-Burch Voytas Laboratory Iowa State University. Beyond genes. Most DNA in eukaryotes doesn’t code for anything necessary for the survival and replication of the organism.
Diversity and survival strategies of LTR retrotransposons in the Arabidopsis genome Brooke Peterson-Burch Voytas Laboratory Iowa State University
Beyond genes • Most DNA in eukaryotes doesn’t code for anything necessary for the survival and replication of the organism. • How did that sequence get there? • Why isn’t it eliminated? • Genome sequences can teach us about genome evolution and the part that retroelements play
What’s a retroelement? • Type of transposable element • An mRNA copy of the parental element ‘genome’ is reverse transcribed into DNA and inserted into a new location in the host • Transposition is replicative
Retroelement genomes [Figure: genome organization of retroposons, Dirs, BEL, Pseudoviridae, Metaviridae, and Retroviridae (HIV-1), showing the LTRs, gag (MA, CA, NC), and the PR, RT, RH, IN, and EN domains, the λ recombinase of Dirs, and the additional HIV-1 genes (env, nef, tat, rev, vif, vpr, vpu)]
Retro living… • Transcription produces the element mRNA • Translation [Figure: HIV-1 and Pseudoviridae genome diagrams]
Retroelement life cycle • Particle packaging • Only viruses escape host cell [Figure: HIV-1 and Pseudoviridae genome diagrams]
Retroelement life cycle • Reverse transcription produces the cDNA [Figure: HIV-1 and Pseudoviridae genome diagrams]
Retroelement life cycle • Integration (IN) of the cDNA produces a new copy of the element [Figure: HIV-1 and Pseudoviridae genome diagrams]
Retroelements play a major role in the structure and evolution of many genomes • Genome sequences provide a great resource for diversity, distribution, and element identification studies
Retroelements and Genomes • Genome data-mining can help answer questions about: • Number of Elements • Types of Elements • Diversity • Physical distribution • Impact on host • Odd or interesting elements • Evolutionary history • Element sequence and domain characteristics
A retroelement family tree [Figure: tree with branches for Retroviridae, Metaviridae, Dirs, retroposons, BEL, and Pseudoviridae]
A. thaliana captures all plant Pseudoviridae diversity [Figure: Pseudoviridae tree with bootstrap values and a 0.1 scale bar, inset in the retroelement family tree; taxa include Lueckenbuesser, Osser, Endovir1-1, PREM 2, SIRE 1, Opie-2, ToRTL1, Art1, Evelknievel, Hopscotch, Retrofit, AtRE1, copia, Tst1, RIRE1, BARE 1, Sto 4, Tnt1 94, Tto1, Panzee, Tgmr, Ta1-3, Melmoth, 1731, Mosqcopia, Ty5-6p, Tca5, Ty4, Ta11, Tca2, and Ty1]
Mapping proteases to HIV-1 structure helps explain patterns of conservation [Figure: Pseudoviridae genome diagram showing the LTRs and the MA, CA, NC, PR, IN, RT, and RH domains]
Integrase: what’s happening in the back? [Figure: integrase alignment with a proline-rich region and conserved H, C, D, and E residues marked. Pseudoviridae elements (1731, BARE-1, copia, Endovir1-1, Melmoth, Mosqcopia, Opie-2, Osser, Retrofit, Tnt1-94, Ty1, Ty5) are scored +/- for a common region and for presence of the ILGD motif, and share a GKGY motif. Metaviridae and Retroviridae elements (Del, MMLV, SnRV, Tf1, Ty3-2, Athila5-1, gypsy, HIV1, osvaldo, RSV, WDSV) are scored +/- for a chromodomain and share a GPF/Y motif]
Retroviruses independently evolved at least twice in plants [Figure: tree (scale bar: 0.1 changes) of Retroviridae (HIV-1, Rousv, MoMLV), Metaviridae (including Ty3, Gypsy, Del1, Reina, Cyclops, Calypso, Fababean, Athila4-6, Grande, Tat4-1, Cinful-1, MAG, SURL), and Pseudoviridae (including Ty5-6p, Evelknievel, Osser, Hopscotch, Retrofit, Tst1, SIRE-1, Endovir1-1, Opie-2, ToRTL1, Art1, Ta1-3, Tnt1-94, Tto1, copia, Ty1), with the putative retroviruses marked]
Retrovirus envlike-coding regions show a bipartite structural organization [Figure: env-like ORFs of ToRTL1 (668 aa), Endovir1-1 (476 aa), and SIRE-1 (648 aa), with percent-identity labels of 31% and 24%, shown alongside the HIV-1 genome diagram]
Gag surprises… • Gag is much larger in the retroviral lineage • Sequence and structural conservation is evident [Figure: Gag regions A, B, and C compared between the putative retrovirus group and the (Hemi/Pseudo)virus elements]
Diversity of the Pseudoviridae family summary • Enzymatic regions appear to be highly constrained other than the IN C-terminus. • Arabidopsis LTR retrotransposons are representative of plant elements in the family. • The putative retroviruses represent a uniquely evolving Pseudoviridae lineage bearing numerous changes in the retrotransposon genome. • Sub-lineage differences suggest areas to focus experimental efforts for functional studies. • Gag shows greater sequence conservation than previously thought.
Summary continued… • envlike-coding regions have been evolutionarily conserved indicating a functional role for the ORF • features suggestive of viral env proteins have been identified in all LTR retrotransposon envlike ORFs • putative env proteins have evolved in at least two independent plant LTR retrotransposon lineages, giving credence to the hypothesis that retroviruses evolved from retrotransposons
Organization of the retroelement populations of the Arabidopsis genome
Do retroelements of higher eukaryotes choose where they integrate? • Is yeast a good model? • Multicellular organism genome projects have noted that transposable element numbers are markedly increased near centromeres. • This project quantitatively documents these anecdotal observations for the Arabidopsis genome
Completed genome? [Figure: chromosome map drawn against a scale of 10 to 90 Mb]
RetroMap: a graphical tool for simplifying whole-genome analysis of retroelements
RetroMap Features • RetroMap provides the following tools to work with genome data: • Parse BLAST results • Assign lineages or arbitrary groupings to retroelements • View chromosomal locations • Identify and extract LTRs • Identify and extract full-length elements • Assign ages to complete LTR retroelements • Extract sequence(s) for hits • Visualize hit open reading frames • Generate information about neighboring annotated features (Arabidopsis thaliana only) • Generate tab-delimited data files of retroelement information for direct import into statistical software packages
Overview of how RetroMap generates retroelement data for a genome
Starting eprobe sequences [Figure: tree (scale bar 0.1, with bootstrap values) of the starting sequences: WDSV, MMLV, SnRV, Cer1 Ce, Osvaldo Db, RSV, Athila At, HIV1, Ty3 Sc, sushi Fr, PAT Pred, Tf1 Spom, Dirs1 Dd, TAtRL, ta11, Prt1 Pbla, L1 Hs, Roo Dm, R2 Dm, Mazi Dm, R1 Dm, BEL Dm, Jockey Dm, Pao Bm, SIRE1 Gm, Tca2 Ca, Ty5 Sp, Endovir1-1 At, Art1 At, copia Dm]
A. thaliana LTR retrotransposon genome overview [Figure: rooted trees of the A. thaliana Metaviridae (Tat, Metavirus, and Athila lineages) and Pseudoviridae, with scale bars of 0.1 and 0.2]
A. thaliana retroelements consist of retroposons and only two LTR families • Pseudoviridae elements are significantly shorter (p=.0001)
Dating LTR retrotransposons • LTRs are identical at time of insertion • Relative ages can be estimated from the sequence divergence (genetic distance) of the LTRs, e.g. $T = \frac{d}{2k}$, where $d$ is the genetic distance, $d = 1 - (\%\,\text{identity} \div 100)$, and $k$ is the nucleotide substitution rate for the genome
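The dating formula on this slide can be written out as a short calculation. The sketch below is only an illustration of T = d / (2k); the function names and the substitution rate used in the usage note are assumptions for the example, not values taken from the talk.

```python
def ltr_divergence(percent_identity: float) -> float:
    """Genetic distance d = 1 - (% identity / 100) between the two LTRs."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    return 1.0 - percent_identity / 100.0


def ltr_age(percent_identity: float, k: float) -> float:
    """Estimated time since insertion, T = d / (2k).

    k is the nucleotide substitution rate (substitutions per site per year).
    It must be supplied by the caller; no default is assumed here.
    """
    if k <= 0:
        raise ValueError("substitution rate k must be positive")
    return ltr_divergence(percent_identity) / (2.0 * k)
```

For example, with an assumed rate of `k = 1.5e-8` substitutions per site per year, `ltr_age(97.0, 1.5e-8)` gives roughly one million years. The factor of 2 reflects that both LTRs accumulate changes independently after insertion.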
Pseudos are younger than Metas, and the Athila sublineage is the oldest tested
Going solo • Homologous recombination between the two LTRs of a full-length element loops out and deletes the retroelement internal sequences, leaving a solo LTR in the host DNA
No family distribution is random • Metaviridae Athila and Tat are found preferentially inside heterochromatic regions; other groups are not • Pseudoviridae and retroposon distributions are not significantly different • Solo LTRs show the same distributions as full-length family members
Hypotheses • Retroelement lineages show ‘universal’ organizational characteristics on the family level • General retroelement abundance at centromeres is due to reduced elimination…the ‘graveyard scenario’ • Metaviridae in Arabidopsis are targeted to heterochromatin
Conclusions • Heterochromatic regions DO appear to act as graveyards, at least in the case of the Pseudoviridae (and presumably the retroposons) • Younger Pseudoviridae elements tend to be found outside of heterochromatin • Solo LTR distributions indicate that homologous recombination between LTRs is not greatly inhibited in heterochromatin • The Metaviridae lineages appear to use targeting in their interactions with the host genome
Acknowledgements • So many people helped make this research happen; I couldn’t have done it without their support and input. Special thanks go to the many members of the Voytas lab, past and present, undergrads too! I’ve been lucky to have good collaborators who are interesting and fun to work with. These have included Dr. Nettleton, Dr. Wright, Dr. Laten from Loyola University, and always Dr. Voytas. To the head honcho: no one can say it hasn’t been a crazy, crazy ride. Thanks. :o)
Basic Hit Redundancy Elimination Scheme • 1) Simple match: no overlap with nearest hit, no compression • 2) Overlap case(s): both hits are merged into one representing their combined maximum extent on the database sequence • 3) Two non-overlapping hits which should be combined: the left hit checks its boundary position on its query sequence and determines whether the other hit falls within that range; if so, merge. The right hit repeats the procedure if the left hit failed to indicate a merge • 4) An example of a merge case which may lead to false positives [Figure: cases 1 to 4 drawn against the query sequence]
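Cases 1 and 2 of this scheme amount to merging overlapping intervals on the database sequence. The sketch below covers only those two cases and is not RetroMap's actual code; the function name and the `(start, end)` tuple layout are assumptions for illustration. Case 3 needs query-side coordinates and is left out.

```python
from typing import List, Tuple

# A hit is (start, end) on the database sequence, with start <= end.
Hit = Tuple[int, int]


def merge_overlapping_hits(hits: List[Hit]) -> List[Hit]:
    """Collapse overlapping hits into their combined maximum extent.

    Case 1: a hit that overlaps nothing is kept unchanged.
    Case 2: overlapping hits are replaced by one hit spanning both.
    """
    merged: List[Hit] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous hit: extend it to the larger end.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap with the nearest hit: keep as a separate hit.
            merged.append((start, end))
    return merged
```

Sorting first means each hit only has to be compared with the most recently merged one, so the pass over the hits is linear after the sort.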
BLAST false-positive amplification problem [Figure: LTR and RT hits recovered in BLAST round 1 and in BLAST round 2]
LTR prediction • Works only for hits of a sequence interior to LTRs • Blast2Sequences is used to detect repeats • 10 kb of sequence upstream and downstream of the hit are compared • Innermost matching repeats are taken to be the LTRs [Figure: a hit on the genome sequence with its 10 kb flanks compared by Blast2Sequences]
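The flank comparison can be illustrated with a toy version. The sketch below swaps Blast2Sequences for exact k-mer matching, which is a much cruder technique and would miss diverged LTRs; the function name, the `k` parameter, and the coordinate convention are all assumptions for the example. It returns only a seed position in each flank, not the full LTR boundaries.

```python
from typing import Dict, Optional, Tuple


def find_innermost_repeat(genome: str, hit_start: int, hit_end: int,
                          flank: int = 10_000, k: int = 20
                          ) -> Optional[Tuple[int, int]]:
    """Find the innermost k-mer shared by the two flanks of a hit.

    hit_start/hit_end are 0-based, end-exclusive genome coordinates.
    Returns (upstream_pos, downstream_pos) of the shared k-mer, or None.
    """
    up_start = max(0, hit_start - flank)
    upstream = genome[up_start:hit_start]
    downstream = genome[hit_end:hit_end + flank]

    # Record the first (closest to the hit) position of each downstream k-mer.
    first_seen: Dict[str, int] = {}
    for i in range(len(downstream) - k + 1):
        first_seen.setdefault(downstream[i:i + k], i)

    # Scan the upstream flank from the hit outward, so the first match
    # found is the one closest to the hit.
    for j in range(len(upstream) - k, -1, -1):
        kmer = upstream[j:j + k]
        if kmer in first_seen:
            return up_start + j, hit_end + first_seen[kmer]
    return None
```

Scanning outward from the hit is what makes the result the innermost repeat pair, matching the rule on the slide. A real implementation would align the flanks and tolerate mismatches rather than require exact k-mers.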
LTR Identification Errors • Tandem elements • Nested elements • Degenerate or simple internal repeat elements [Figure: for each case, the hit(s), the 10 kb flanks, and the resulting predicted element]
Sample distribution data Sample hit neighbors annotation data