
Analysis of Alu repeat elements

Analysis of Alu repeat elements. Molecular biology & Phylogeny Laboratory Woo-Yeon Kim. CONTENTS. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history INTRODUCTION NEW IDEAS RESULTS DISCUSSIONS


Presentation Transcript


  1. Analysis of Alu repeat elements. Molecular biology & Phylogeny Laboratory, Woo-Yeon Kim

  2. CONTENTS • Whole-genome analysis of Alu repeat elements reveals complex evolutionary history • INTRODUCTION • NEW IDEAS • RESULTS • DISCUSSIONS • Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition

  3. Genome Research (Letter). Supplemental material is available online at www.genome.org

  4. INTRODUCTION

  5. Alu repeats • A family of SINEs (short interspersed nuclear elements) • Replicate via LINE-mediated reverse transcription of an RNA polymerase Ⅲ transcript • Roughly 280 bp long • Record the history of substitution patterns in the human genome • Polymorphic Alu insertions serve as markers to determine genetic distances between human subpopulations [Figure: SINE structure; labels: L, R, Poly A signal, AAAAA]

  6. K-means • Step 1: Place K points into the space represented by the objects being clustered. These points are the initial group centroids. • Step 2: Assign each object to the group with the closest centroid. • Step 3: When all objects have been assigned, recalculate the positions of the K centroids. • Step 4: Repeat Steps 2 and 3 until the centroids no longer move. This separates the objects into groups, from which the metric to be minimized can be calculated.
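
The four steps on this slide can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the paper: the function names, the random choice of initial centroids from the data, the `max_iter` cap, and the use of squared Euclidean distance as the metric are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric vectors into k groups (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: use k of the objects as the initial centroids (an assumption;
    # the slide only says "place K points into the space").
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop when assignments, and hence centroids, stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recalculate each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """The metric being minimized: total squared distance to own centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy two-dimensional data, invented for illustration only.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

Because the result depends on the initial centroids, practical use typically repeats the run from several seeds and keeps the clustering with the smallest `within_cluster_ss`.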

  7. NEW IDEAS

  8. An example using real data • Uses only the 5 Alu positions with diagnostic mutations in the Ya5 subfamily (positions 91, 98, 146, 175, and 238) • Applies k-means clustering with k = 2
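
One way to prepare such data for clustering is to turn the nucleotides at the five diagnostic positions into numeric vectors. The sketch below assumes aligned sequences indexed with 1-based positions and uses a one-hot encoding; the encoding, the function names, and the synthetic sequences are assumptions for illustration, not details taken from the paper.

```python
# Diagnostic positions listed on the slide (treated here as 1-based
# coordinates in an alignment to the Alu consensus).
DIAGNOSTIC_POSITIONS = (91, 98, 146, 175, 238)
NUCLEOTIDES = "ACGT"


def diagnostic_vector(aligned_seq, positions=DIAGNOSTIC_POSITIONS):
    """One-hot encode the bases at the given positions (4 numbers per position)."""
    vector = []
    for pos in positions:
        base = aligned_seq[pos - 1].upper()
        # Gaps or ambiguous symbols such as '-' or 'N' become four zeros.
        vector.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vector


def substitute(seq, changes):
    """Return a copy of seq with {1-based position: base} substitutions applied."""
    chars = list(seq)
    for pos, base in changes.items():
        chars[pos - 1] = base
    return "".join(chars)


# Synthetic placeholder sequences, NOT real Alu data or real Ya5 mutations.
reference = "A" * 280
variant = substitute(reference, {91: "G", 98: "C", 146: "T", 175: "G", 238: "C"})
vectors = [diagnostic_vector(s) for s in (reference, variant)]
```

Each sequence becomes a 20-dimensional vector, which any k-means routine can then split into k = 2 groups.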

  9. Looking for overrepresented pairs • Identifying nested subfamilies • Computing biprofiles, frequencies of pairs of nucleotide values
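
A biprofile can be sketched as a table of pair frequencies, here taken over every pair of alignment positions. The slide does not say how overrepresentation is scored, so the ratio of observed pair frequency to the product of the single-position frequencies used below is an assumption for illustration, not the paper's statistical test. Function names and the toy sequences are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def profile(sequences):
    """Per-position nucleotide frequencies: a list of {base: frequency}."""
    n = len(sequences)
    length = len(sequences[0])
    return [
        {base: c / n for base, c in Counter(seq[i] for seq in sequences).items()}
        for i in range(length)
    ]


def biprofile(sequences):
    """Pair frequencies: {(i, j): {(a, b): frequency}} for positions i < j (0-based)."""
    n = len(sequences)
    length = len(sequences[0])
    counts = {pair: Counter() for pair in combinations(range(length), 2)}
    for seq in sequences:
        for i, j in counts:
            counts[(i, j)][(seq[i], seq[j])] += 1
    return {
        pair: {values: c / n for values, c in ctr.items()}
        for pair, ctr in counts.items()
    }


def overrepresentation(sequences):
    """Observed / expected-under-independence ratio for each observed pair."""
    single = profile(sequences)
    ratios = {}
    for (i, j), freqs in biprofile(sequences).items():
        for (a, b), observed in freqs.items():
            expected = single[i][a] * single[j][b]
            ratios[(i, j, a, b)] = observed / expected
    return ratios


# Toy aligned sequences, invented for illustration only.
toy = ["AC", "AC", "GT", "GT"]
toy_ratios = overrepresentation(toy)
```

A ratio well above 1 means two nucleotide values co-occur more often than their individual frequencies predict, which is the kind of signal that would point to a nested subfamily. Enumerating all position pairs is quadratic in sequence length, so this form suits short elements such as Alu.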

  10. RESULTS

  11. Aligned consensus sequences of selected subfamilies • Roughly 480,000 full-length Alu elements • Recursively split subfamilies • Identifying 213 subfamilies

  12. An evolutionary tree of Alu subfamilies

  13. DISCUSSION • Significant mutations from the consensus sequence: detectable by a rigorous whole-genome analysis • Partial results: not statistically discernible • Limitations of this algorithm: it excludes insertion/deletion mutations, frequent CpG mutations, and mutations to nucleotide values already present in other subfamilies • Statistically distinguishable subfamilies: only 19 of the 31 subfamilies currently reported in Repbase Update

  14. Bioinformatics (Discovery Note). Supplementary data are available online at www.igib.res.in/manuscriptdata/aluanalysis.html

  15. Alu distribution in the whole genome. Fig. 1(a): Number of Alu repeats on each chromosome of the human genome, with vertical segments showing the number belonging to each Alu subfamily

  16. Alu repeat density and association with genes. Fig. 1(b): Variation in Alu and gene densities in the human genome

  17. Alu in intergenic and intragenic regions • Variation in Alu content in genes of the human genome • Alu densities in the intergenic and intragenic regions of the human genome

  18. Distribution of Alu subfamilies • The most abundant Alu subfamily: Alu S, covering 6.4% of the genome • Chromosome Y: the most Alu-poor chromosome, with a high density of Alu Y and very low densities of Alu S and Alu J • Chromosomes 13 and 9: similar trend, with chromosome 13 having the lowest density of Alu J • Chromosomes 8 and X: high densities of Alu S and Alu J, very low density of Alu Y

  19. Correlation analysis • GC content appears to have the strongest association with Alu density overall, followed by gene density and then intron density
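
The slide does not state which correlation measure was used. As a hedged illustration, the sketch below computes Pearson's correlation coefficient between two per-chromosome series; the choice of Pearson's r, the function name, and all numbers are assumptions, and the values are invented placeholders rather than data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-chromosome values, for illustration only.
alu_density = [0.08, 0.10, 0.12, 0.20, 0.25]   # invented Alu density values
gc_content = [0.38, 0.39, 0.41, 0.46, 0.48]    # invented GC fractions
gene_density = [5.0, 9.0, 7.0, 15.0, 21.0]     # invented genes per Mb

r_gc = pearson_r(alu_density, gc_content)
r_gene = pearson_r(alu_density, gene_density)
```

Comparing coefficients such as `r_gc` and `r_gene` across features is one simple way to rank their association with Alu density, as the slide does for GC content, gene density, and intron density.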

  20. DISCUSSION • Analysis of Alu distribution in genes • Statistically significant correlation between Alu and gene densities • Alu density is higher in intragenic regions, so these elements are preferentially located in genes • Chromosomes 19 and 22 have the highest Alu and gene densities • Alu density correlates with features in the order GC content > gene density > intron density • Abundance of Alu subfamilies: Alu S > Alu J > Alu Y • Young subfamilies: enriched on chromosomes 9, 13 and Y • Old subfamilies: enriched on chromosomes 8 and X • Older Alus correlate more strongly with GC content than younger ones
