2. Overview
Background
Tandem repeats
Methodology
Results
Conclusions
References
3. Background
A tandem repeat is an array of consecutive copies of a repeating pattern
Length of the repeating pattern (consensus) = 5 bp
Total repeat length = 25 bp
Three main types of tandem repeats:
Microsatellites -- 1-5 bp repeating pattern
Minisatellites -- 6-50 bp repeating pattern
Large tandem repeats -- repeating pattern longer than 50 bp
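The three categories above reduce to a simple rule on the length of the repeating pattern. Below is a minimal Python sketch. It assumes the cut-offs given on this slide, and other sources draw the micro/minisatellite boundary slightly differently. The function name `classify_tandem_repeat` is hypothetical.

```python
def classify_tandem_repeat(period_bp: int) -> str:
    """Classify a tandem repeat by the length (bp) of its repeating pattern.

    Cut-offs follow this slide (assumption): 1-5 bp micro, 6-50 bp mini,
    more than 50 bp large.
    """
    if period_bp < 1:
        raise ValueError("the repeating pattern must be at least 1 bp long")
    if period_bp <= 5:
        return "microsatellite"
    if period_bp <= 50:
        return "minisatellite"
    return "large tandem repeat"


if __name__ == "__main__":
    for period in (2, 5, 6, 50, 51, 300):
        print(period, classify_tandem_repeat(period))
```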
4. Significance
Tandem repeats can be used to determine whether two DNA samples come from the same person
Uses:
Forensic use
Paternity testing
5. Mechanism of tandem duplication
Unequal recombination is the major known mechanism for the formation of large tandem repeats
Image source: http://hc.ims.u-tokyo.ac.jp/JSBi/journal/GIW02/GIW02F010/GIW02F010.html
6. Tandem gene duplication
Benefits: new gene functions can arise, and tandem duplication is responsible for the evolution of gene clusters
Example: zinc finger genes in mammalian genomes
7. Purpose
Large tandem repeats are common in eukaryotic genomes: they make up 1.684% of the human genome and 1.525% of the chimpanzee genome
Aim: to date large tandem duplications and to find the relationship between various characteristics of long tandem repeats and their evolutionary age
Eight genomes were analyzed: three primates, two rodents, dog, chicken and puffer fish
8. Methodology
Identification
Tandem Repeats Finder (TRF) is used to identify large tandem repeats
Distance computation
The Jukes-Cantor model gives the distance between two repeats
Transformation
The computed distance is transformed into evolutionary time
9. Tandem Repeats Finder
Tools for finding tandem repeats include STRING, Mreps and TRF
TRAP: T. Jose P. Sobreira, A. Durham and A. Gruber
TRF can be downloaded at http://tandem.bu.edu/trf/trf.html
TRF output includes:
Starting and ending positions of each tandem repeat
Number of repetitions (copy number)
A%, C%, G%, T%: percentage of each base in the tandem repeat
Length of the consensus word (only the first 10 bases)
10. Tandem Repeats Finder
Outline of Tandem Repeats Finder:
The TRF program has two main components: detection and analysis
Detection - finds candidate tandem repeats
Analysis - produces an alignment for each candidate and statistics about the alignment
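TRF itself finds approximate repeats using k-tuple matching and statistical criteria, which is beyond a slide-sized example. The hypothetical function below is a much simpler stand-in for the detection step. It finds only exact tandem copies, so it is not TRF's algorithm and would miss the diverged repeats that the dating method depends on. It illustrates the core idea: a tandem repeat of period p is a stretch where each base equals the base p positions earlier.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter pattern (e.g. 'ATAT')."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq, min_period, max_period, min_copies=2):
    """Naive scan for maximal runs of *exact* tandem copies.

    Returns (start, end, period, copies, unit) tuples, 0-based, end exclusive.
    Unlike TRF, mismatches and indels are not tolerated.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, max_period + 1):
        i = 0
        while i + p <= n:
            # Extend to the right while each base equals the base one period back.
            j = i + p
            while j < n and seq[j] == seq[j - p]:
                j += 1
            copies = (j - i) // p
            unit = seq[i:i + p]
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, i + copies * p, p, copies, unit))
                # Starts before j - p + 1 would only give shorter runs ending at j.
                i = j - p + 1
            else:
                i += 1
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GATCC" * 5, min_period=2, max_period=10))
    # → [(0, 25, 5, 5, 'GATCC')]
```

The primitivity check keeps the scan from also reporting GATCCGATCC (period 10) as a second repeat. Restricting `min_period` to 51 would limit the output to large tandem repeats in the sense of slide 3.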
11. Tandem Repeats Finder
Large tandem repeats were extracted from the TRF results
Example TRF result line:
1 5 100 0 50 20 40 20 20 1.92 GATCC GATCCGATCCGATCCGATCCGATCC
GATCC - period or consensus
GATCCGATCCGATCCGATCCGATCC - repeat
1 - indices
5 - consensus or period size
100 - percent matches
0 - percent indels
50 - score
20 - % of A
40 - % of C
20 - % of G
20 - % of T
1.92 - entropy of the base composition
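The legend can be checked mechanically. Recomputing the base composition and entropy from the repeat sequence reproduces the 20/40/20/20 percentages and the 1.92 value. Here entropy is computed as Shannon entropy in bits, which matches the value shown.

Below is a minimal Python sketch. The field names are hypothetical and follow this slide's legend, not TRF's own output format. TRF's .dat files contain additional columns, such as the end index and copy number.

```python
import math
from collections import Counter

# Column names for the example line, in the order given by the legend above.
# The %G and %T columns are inferred from the A, C, G, T ordering (assumption).
FIELDS = ["indices", "period", "percent_matches", "percent_indels", "score",
          "pct_A", "pct_C", "pct_G", "pct_T", "entropy", "consensus", "repeat"]


def parse_example_line(line: str) -> dict:
    """Split one whitespace-separated line laid out like the example above."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for key in FIELDS[:9]:  # indices through pct_T are integers here
        record[key] = int(record[key])
    record["entropy"] = float(record["entropy"])
    return record


def composition_and_entropy(seq: str):
    """Percent A/C/G/T and Shannon entropy (bits) of the base composition."""
    counts = Counter(seq.upper())
    n = len(seq)
    pct = {base: 100.0 * counts[base] / n for base in "ACGT"}
    entropy = -sum((p / 100) * math.log2(p / 100) for p in pct.values() if p > 0)
    return pct, entropy


if __name__ == "__main__":
    line = "1 5 100 0 50 20 40 20 20 1.92 GATCC GATCCGATCCGATCCGATCCGATCC"
    rec = parse_example_line(line)
    pct, h = composition_and_entropy(rec["repeat"])
    print(pct, round(h, 2))  # → {'A': 20.0, 'C': 40.0, 'G': 20.0, 'T': 20.0} 1.92
```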
12. DNA Sequence Evolution Model For Dating
13. Computing divergence of tandem repeating units
Repeat identity: each repeat unit is compared with the other units, and the maximum similarity (identity) is used
Example: consensus GATCC, repeat GATCC|GATCC|GATCC|GATCC|GATCC
Figure: Dating tandem duplications
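The identity step can be sketched as follows. The study used stretcher, which computes a gapped global alignment. To stay short, this hypothetical sketch instead compares equal-length units position by position with no gaps, so it is only a stand-in for the alignment-based similarity. The function names and the diverged example array are made up for illustration.

```python
def split_units(repeat: str, period: int) -> list:
    """Cut a repeat into consecutive units of the consensus length (drops a partial tail)."""
    return [repeat[i:i + period] for i in range(0, len(repeat) - period + 1, period)]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length units, with no gaps."""
    if len(a) != len(b) or not a:
        raise ValueError("units must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_identity_per_unit(units: list) -> list:
    """For each unit, its highest identity to any other unit in the same array."""
    if len(units) < 2:
        raise ValueError("need at least two units to compare")
    return [
        max(identity(u, v) for j, v in enumerate(units) if j != i)
        for i, u in enumerate(units)
    ]


if __name__ == "__main__":
    print(split_units("GATCC" * 5, 5))
    # Hypothetical diverged array, made up for illustration.
    print(max_identity_per_unit(["GATCC", "GATCC", "GATTC", "GACTC"]))
    # → [1.0, 1.0, 0.8, 0.8]
```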
14. Jukes-Cantor model
Computes the distance between two repeats under two assumptions:
All bases occur with equal probability, i.e. p = 0.25 for each of A, T, G and C
All possible base substitutions are equally likely: A ↔ G, A ↔ C, A ↔ T, C ↔ G, C ↔ T, G ↔ T
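15. Jukes-Cantor model
m = number of mismatches (observed substitutions) between the two repeats
n = length of the compared sequence
$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} \cdot \frac{m}{n}\right)$$
D = distance between the two repeats
Example: if mismatches are observed at 25% of the sites (m/n = 0.25), the Jukes-Cantor distance between the two repeats is $D = -\frac{3}{4} \ln\left(1 - \frac{1}{3}\right) \approx 0.304$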
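The formula translates directly into code. The guard reflects that the logarithm is undefined once m/n reaches 3/4. Below is a minimal sketch with a hypothetical function name. The 25-mismatches-in-100-sites call reproduces the 0.304 from the example.

```python
import math


def jukes_cantor_distance(mismatches: int, length: int) -> float:
    """Jukes-Cantor distance D = -3/4 * ln(1 - 4/3 * m/n)."""
    if length <= 0:
        raise ValueError("length must be positive")
    p = mismatches / length  # observed proportion of differing sites
    if not 0 <= p < 0.75:
        # The logarithm's argument must be positive, so p must stay below 3/4.
        raise ValueError("distance is undefined for m/n >= 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)


if __name__ == "__main__":
    print(round(jukes_cantor_distance(25, 100), 3))  # → 0.304
```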
16. Estimating the evolutionary time
The computed distance (D) between two repeats is transformed into evolutionary time
The neutral mutation rate in mammals is approximately $1.25 \times 10^{-9}$ per site per year
$$T = \frac{D}{1.25 \times 10^{-9}} \text{ years ago}$$
Example: for D = 0.1, $T = 0.1 / (1.25 \times 10^{-9}) = 8 \times 10^{7}$ years, i.e. 80 million years ago
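The conversion is a single division. The sketch below follows the slide's formula T = D / rate, with the rate of 1.25 × 10^-9 per site per year as the default. The constant and function names are hypothetical.

```python
# Neutral mutation rate used on this slide: substitutions per site per year.
NEUTRAL_RATE_PER_SITE_PER_YEAR = 1.25e-9


def distance_to_years(distance: float, rate: float = NEUTRAL_RATE_PER_SITE_PER_YEAR) -> float:
    """Convert a distance D into time before present using T = D / rate."""
    if distance < 0 or rate <= 0:
        raise ValueError("distance must be >= 0 and rate must be > 0")
    return distance / rate


if __name__ == "__main__":
    years = distance_to_years(0.1)
    print(f"{years / 1e6:.0f} million years ago")  # → 80 million years ago
```

Passing a different `rate` lets the same function be reused if another neutral mutation rate is assumed.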
17. Materials and methods
Materials
The genome files were downloaded from the UCSC site: http://hgdownload.cse.ucsc.edu/downloads.html
The Tandem Repeats Finder and stretcher (EMBOSS) programs were downloaded
Procedure
Extraction of large tandem repeats with Tandem Repeats Finder
Calculation of similarities between tandem repeat units using stretcher
Computation of distances using the Jukes-Cantor model
Transformation of distances into evolutionary time
18. Tree of life
19. Recap – period & repeat
20. Results
21. Results
22. Total number of repeats
23. Total number of period or consensus
24. Results of repeat length
25. % Repeat results
26. Dating tandem repeats
27. Tree of life
28. Conclusions
Primates (human, chimpanzee and macaque) have the highest number of long tandem repeat duplications
The dating peak is prominent in human, chimpanzee and macaque, especially between 80 and 120 million years ago
The tandem repeat results follow a pattern similar to the species divergence shown in the tree of life
Dog, rat and mouse show a steady increase in the number of tandem duplications, but the burst between 80 and 120 million years ago is negligible
Human has the highest number of duplications among all the studied genomes
29. Acknowledgements
Advisor: Dr. Haixu Tang
School of Informatics
Members of Computational Omics Lab
Parents, Rajen & Rajeev
Prasanta
30. References
Jaitly D, Kearney P, Lin G, Ma B. Methods for reconstructing the history of tandem repeats and their application to the human genome.
Rivals E. A survey on algorithmic aspects of tandem repeats evolution.
Bertrand D, Gascuel O. Topological rearrangements and local search method for tandem duplication trees.
Zhang L, Ma B, Wang L, Xu Y. Greedy method for inferring tandem duplication history.
Elemento O, Gascuel O. A fast and accurate distance algorithm to reconstruct tandem duplication trees.
Benson G. Tandem repeats finder: a program to analyze DNA sequences.