
FireμSat: An Algorithm to Detect Tandem Repeats in DNA


jontae

Presentation Transcript


  1. FireμSat: An Algorithm to Detect Tandem Repeats in DNA

  2. Introduction • What are tandem repeats in DNA? • How are we going to detect tandem repeats in DNA? • Why would anybody want to detect tandem repeats in DNA?

  3. Genetic sequences • DNA consists of four different nucleotides, namely: Adenine (A), Guanine (G), Cytosine (C) and Thymine (T). • Genetic databanks, e.g. GenBank, EMBOSS and Entrez, store DNA sequences as concatenated single-letter codes in FASTA format.

  4. Tandem Repeats (TR’s) in genome sequences • DNA molecules are subject to numerous mutational events. One consequence of these events that can be detected by computationally analyzing genome sequences is tandem duplication. • A TR or TR-zone is a string of nucleotides characterized by a motif that introduces the string, contiguously followed by a number of ‘copies’ of the motif, e.g. ACGACGACGACGACG.

  5. Tandem Repeats • Perfect tandem repeat (PTR): the copies are exact, e.g. ACGACGACGACGACG, which has five copies of the motif ACG. • Approximate tandem repeat (ATR): the copies of the motif include non-exact copies, so mutational events have most likely occurred, e.g. ACGACACGAGGACGAG. • In the absence of further qualification, reference to a tandem repeat should be construed as a reference to either a PTR or an ATR.
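
As a small illustration of the PTR definition, the following Python sketch counts the contiguous exact copies of a motif at a given position. The function name `count_perfect_copies` is hypothetical and is not part of FireμSat; it only shows why the first example string is a PTR and the second is not.

```python
def count_perfect_copies(seq: str, motif: str, start: int = 0) -> int:
    """Count contiguous exact copies of `motif` in `seq`, beginning at `start`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    copies = 0
    pos = start
    # Step through the sequence one motif length at a time while copies match.
    while seq.startswith(motif, pos):
        copies += 1
        pos += len(motif)
    return copies


ptr = "ACGACGACGACGACG"    # the PTR example from the slide
atr = "ACGACACGAGGACGAG"   # the ATR example from the slide

print(count_perfect_copies(ptr, "ACG"))  # 5 exact copies
print(count_perfect_copies(atr, "ACG"))  # only the introducing motif matches exactly
```

An exact-match scan like this stops at the first mutated copy, which is why approximate repeats need the error model described on the later slides.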

  6. Tandem Repeat Elements • A PTR element (PTRE) is a TR element that matches the motif. If the motif is for example ACG then the PTRE will also be ACG. • An ATR element (ATRE) is a TR element similar to the motif but not an exact copy thereof. If the motif is ACG then an ATRE may for example be AC.

  7. Microsatellites • The length of PTRE’s may vary, giving satellites, minisatellites and microsatellites • Microsatellites are a subset of TR’s • (conforming to Benson, Delgrange, Rivals & Abajian)

  8. Formal problem statement A PTR whose motif ρ is repeated p times, where p 1, is denoted by ρ^p. An ATR u that is derived from this PTR ρ^p must always have the motif ρ as its prefix. It therefore has the form ρu_2…u_p, where each ATRE u_k (k = 2…p) is the result of at most ε mutations on ρ. Here ε is the so-called motif error. Besides the restrictions applicable to the motif error, threshold values are also introduced that constrain the attributes of the detected TR.
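
The condition that each element results from at most ε mutations on ρ can be sketched with an edit-distance test. The Python below assumes that every deletion, insertion or mismatch counts as one mutation (unit-cost Levenshtein distance), which the slides do not state explicitly; the names `edit_distance` and `is_tre` are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or mismatch
        prev = curr
    return prev[-1]


def is_tre(candidate: str, motif: str, eps: int) -> bool:
    """True if `candidate` is within `eps` mutations of `motif` (assumed metric)."""
    return edit_distance(motif, candidate) <= eps


print(is_tre("AC", "ACG", 1))   # one deletion away from the motif
print(is_tre("TTT", "ACG", 1))  # three mismatches, so not an element for eps = 1
```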

  9. Tolerated error types • Errors regarding the motif or PTRE (motif errors): • deletions • mismatches • insertions • Errors related to the detected TR (TR errors): • the ratio between PTRE’s and ATRE’s • the minimum number of TRE’s to be reported • the maximum number of consecutive ATRE’s

  10. Motif errors • At most 50% of the motif may be in error • If |ρ| = 2 or |ρ| = 3 then ε = 0 or ε = 1 • (default = 1) • If |ρ| = 4 or |ρ| = 5 then ε = 0, ε = 1 or ε = 2 • (default = 2) • Consider the motif ACGTT: ACT is then an ATRE in which two deletions have occurred.
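
The permitted and default values of ε listed above coincide with the integer part of half the motif length. The following Python sketch encodes that observation; the formula `len(motif) // 2` is an assumption that reproduces the four listed cases, and the function names are hypothetical. Lengths outside 2 to 5 are rejected because the slide gives no values for them.

```python
def max_motif_error(motif: str) -> int:
    """Largest motif error allowed under the 50% rule (assumed: floor(|motif| / 2))."""
    if not 2 <= len(motif) <= 5:
        raise ValueError("the slide only specifies motif lengths 2 to 5")
    return len(motif) // 2


def resolve_motif_error(motif: str, eps=None) -> int:
    """Return `eps` if it is permitted, or the default (the maximum) when omitted."""
    limit = max_motif_error(motif)
    if eps is None:
        return limit
    if not 0 <= eps <= limit:
        raise ValueError(f"eps must lie between 0 and {limit} for this motif")
    return eps


print(resolve_motif_error("ACG"))       # default for a motif of length 3
print(resolve_motif_error("ACGTT"))     # default for a motif of length 5
print(resolve_motif_error("ACGTT", 1))  # an explicitly chosen, permitted value
```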

  11. Motif errors: Types of Mutations • Deletion: the absence of a base pair in the motif. • Insertion: up to ε base pairs inserted at any position of the PTRE. • Mismatch: the replacement of a base pair in the motif by another.
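
To make the three mutation types concrete, the following Python sketch enumerates every string reachable from a motif by exactly one deletion, one insertion or one mismatch. The helper names are hypothetical; FireμSat itself represents these variants with finite automata rather than explicit sets.

```python
ALPHABET = "ACGT"


def deletions(motif: str) -> set:
    """All strings obtained by removing exactly one base from the motif."""
    return {motif[:i] + motif[i + 1:] for i in range(len(motif))}


def insertions(motif: str) -> set:
    """All strings obtained by inserting one base at any position."""
    return {motif[:i] + b + motif[i:]
            for i in range(len(motif) + 1) for b in ALPHABET}


def mismatches(motif: str) -> set:
    """All strings obtained by replacing one base with a different base."""
    return {motif[:i] + b + motif[i + 1:]
            for i in range(len(motif)) for b in ALPHABET if b != motif[i]}


print(sorted(deletions("ACG")))   # ['AC', 'AG', 'CG']
print(len(mismatches("ACG")))     # 9: three positions, three alternatives each
print("ACGT" in insertions("ACG"))
```

Applying `deletions` twice to ACGTT yields ACT, the two-deletion ATRE mentioned earlier.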

  12. Detected TR errors: the substring error • The substring error must not exceed the maximum substring error allowed, where • substring error = (n_d × p_d) + (n_i × p_i) + (n_m × p_m) − n_ptre • where • n_d: number of deletions • n_i: number of insertions • n_m: number of mismatches • p_d: penalty allocated to deletions • p_i: penalty allocated to insertions • p_m: penalty allocated to mismatches
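
The penalty formula on this slide can be evaluated directly. The Python sketch below does so under two assumptions that the slide does not state: that n_ptre is the number of PTRE’s in the candidate TR, and that a candidate is acceptable when its substring error does not exceed the allowed maximum. The function and parameter names other than the slide’s own symbols are hypothetical.

```python
def substring_error(n_d: int, n_i: int, n_m: int, n_ptre: int,
                    p_d: int, p_i: int, p_m: int) -> int:
    """Weighted mutation count minus the number of perfect elements (assumed meaning of n_ptre)."""
    return (n_d * p_d) + (n_i * p_i) + (n_m * p_m) - n_ptre


def within_substring_error(error: int, max_error: int) -> bool:
    """Assumed acceptance test: the error may not exceed the allowed maximum."""
    return error <= max_error


# One deletion and one mismatch among three perfect elements, unit penalties.
err = substring_error(n_d=1, n_i=0, n_m=1, n_ptre=3, p_d=1, p_i=1, p_m=1)
print(err)                             # -1
print(within_substring_error(err, 0))  # True
```

Because perfect elements are subtracted, a long run of exact copies can offset a few mutated ones.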

  13. Detected TR errors: the minimum number of TRE’s • tn_tre = tn_ptre + tn_atre • tn_tre must reach the minimum number of TRE’s to be reported • the default value for this minimum = 2 • to prevent the output of unwanted data

  14. Detected TR errors: the maximum number of consecutive ATRE’s • tn_atreC • tn_atreC is incremented for every ATRE read • tn_atreC is set to zero whenever a PTRE is read • the default of tn_atreC is 0
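
The counter behaviour described above (increment on every ATRE, reset on every PTRE) can be sketched as follows. This Python example assumes the elements of a candidate TR have already been labelled 'P' (PTRE) or 'A' (ATRE), and that the counter is compared with a user-supplied limit; both the labelling scheme and the function name are hypothetical.

```python
def exceeds_consecutive_atre_limit(labels: str, limit: int) -> bool:
    """Return True if more than `limit` ATRE's ('A') occur in a row."""
    tn_atre_c = 0  # mirrors the slide's tn_atreC counter
    for label in labels:
        if label == "A":
            tn_atre_c += 1          # incremented for every ATRE read
            if tn_atre_c > limit:
                return True
        elif label == "P":
            tn_atre_c = 0           # reset whenever a PTRE is read
        else:
            raise ValueError(f"unknown element label: {label!r}")
    return False


print(exceeds_consecutive_atre_limit("PAPAP", 1))  # False: ATRE's never adjacent
print(exceeds_consecutive_atre_limit("PAAP", 1))   # True: two ATRE's in a row
```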

  15. Deletion: Refers to the absence of a base pair in the motif. FAD(ACG,1)

  16. Mismatch: Refers to the replacement of a base pair in the motif by another. FAm(ACG,1)

  17. High-level Description of FireμSat • generateWords(ρ,ε) generates a set of all words of length ρLength from the alphabet Σ = {A,C,G,T}. • createFATR(ρ,ε) returns FATR(ρ,ε) as discussed. • findIndices(gSeq, FATR, τ, α, β, p_m, p_d, p_i) returns a set of index pairs in gSeq of an identified TR. • The TR is such that it complies with the constraints specified by τ, α, β. Various counters have to be updated to ensure correct output.
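
The first step, generating all words of a given length over Σ, can be sketched in Python with `itertools.product`. This is only an illustration of the word-generation step as described; the snake_case name `generate_words` and its single `length` parameter are assumptions, and the automaton construction and index search are not shown.

```python
from itertools import product

SIGMA = "ACGT"


def generate_words(length: int) -> set:
    """All words of exactly `length` letters over the alphabet {A, C, G, T}."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return {"".join(letters) for letters in product(SIGMA, repeat=length)}


words = generate_words(3)
print(len(words))      # 64 candidate motifs of length 3 (4 ** 3)
print("ACG" in words)  # True
```

The number of words grows as 4 to the power of the motif length, which stays manageable for the short motifs (lengths 2 to 5) considered here.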

  18. Why does anybody want to detect TR’s in DNA? • The cause of several human diseases can be traced to having too many copies of a certain nucleotide triplet. • TR’s play a role in the development of immune system cells. • TR’s serve as genetic markers in plant and animal species. • TR’s play a role in gene regulation and contribute to the breeding of disease-resistant cultivars.

  19. Conclusion A new theoretical approach to detect TR’s in DNA has been introduced. The time complexity of FireμSat is linear in |gSeq|. The practical implementation of FireμSat is in progress. The following matters constitute a future research agenda: • the performance of FireμSat • the possibility of reducing FATR • if successful, the latter results could suggest ways of adapting FireμSat to detect minisatellites and satellites as well.
