
Pairwise Sequence Analysis-V


Presentation Transcript


  1. Pairwise Sequence Analysis-V • Relatedness • “Not just important, but everything” • Modeling Alignment Scores • Coin Tosses • Unit Distributions • Extreme Value Distribution • Lambda and K revealed • Loose Ends Lecture 6 CS566

  2. Modeling Expectation • Reduced model: Coin tosses • Given: • N coin tosses • Probability of heads p • Problem: • What is the expected length of the longest run of heads? • Solution: • Experimental: Perform several repetitions and count • Theoretical: E(Run_max) = log_{1/p}(N) • For example, for a fair coin and 64 tosses, E(Run_max) = log_2(64) = 6 Lecture 6 CS566
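
A quick way to sanity-check E(Run_max) ≈ log_{1/p}(N) is to simulate it. A minimal sketch (Python, using the slide's illustrative N = 64 and p = 0.5; not part of the original lecture):

    import math
    import random

    def longest_run(n_tosses, p_heads):
        """Length of the longest run of heads in one sequence of tosses."""
        best = current = 0
        for _ in range(n_tosses):
            if random.random() < p_heads:
                current += 1
                best = max(best, current)
            else:
                current = 0
        return best

    N, p, trials = 64, 0.5, 10_000
    empirical = sum(longest_run(N, p) for _ in range(trials)) / trials
    theoretical = math.log(N, 1 / p)   # log base 1/p of N
    print(f"empirical ~ {empirical:.2f}, log_1/p(N) = {theoretical:.2f}")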

  3. Random alignment as Coin tosses • Head = Match • Assume • Score = Run of matches • Maximum score = Longest run of matches • Therefore • Same model of expectation • For example: For DNA sequences of length N, • E(MatchLength_max) = Expected longest run of matches = log_{1/p}(N) Lecture 6 CS566

  4. Local alignment as Coin tosses • Assume • Score in local alignment = Run of matches • Maximum score = Longest run of matches • Therefore • Similar model of expectation • For DNA sequences of length n & m • E(MatchLength_max) ~ log_{1/p}(nm) (Why not just n or m? Every pair of starting positions is a separate trial; see the sketch below) ~ log_{1/p}(K'nm) • Var(MatchLength_max) = C, a constant (i.e., independent of the sample space) Lecture 6 CS566
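
To see why the estimate uses nm rather than n or m alone: every pair of starting positions (one in each sequence) begins its own run of "coin tosses", so there are roughly nm trials. A minimal worked sketch in Python, with illustrative lengths n = m = 1000 and DNA match probability p = 1/4:

    import math

    p = 0.25            # probability two random DNA bases match
    n, m = 1000, 1000   # illustrative sequence lengths

    one_sequence = math.log(n, 1 / p)       # if only n start points mattered: ~4.98
    all_pairs    = math.log(n * m, 1 / p)   # one trial per pair of start points: ~9.97
    print(one_sequence, all_pairs)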

  5. Refining Model • S = AS-matrix-based score between unrelated sequences • E(S) ~ log_{1/p}(K'nm) = [ln(K'nm)]/λ (where λ = log_e(1/p) = ln(1/p)) • Holy Grail: Need P(S > x), the probability that a score between unrelated sequences exceeds x Lecture 6 CS566
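
As a rough illustration of the E(S) formula, the K' and λ values below are placeholders (the real values depend on the scoring matrix and residue frequencies), so this is a sketch of the arithmetic, not a calibrated estimate:

    import math

    K_prime, lam = 0.1, 1.1   # placeholder constants for some scoring system
    n, m = 1000, 1000         # illustrative sequence lengths

    expected_best = math.log(K_prime * n * m) / lam
    print(f"E(S) ~ ln(K'nm)/lambda = {expected_best:.2f}")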

  6. Poisson distribution estimate of P(S > x) • Consider the coin-toss example • Given x >> E(Run_max) • Define Success = (Run_max ≥ x) • Define P_n = Probability of n successes • Define y = E[number of successes], i.e., the average number of successes • Then the number of successes follows a Poisson distribution: P_n = (e^{-y} y^n)/n! • The probability of 0 successes (no score exceeding x) is P_0 = e^{-y} • Then the probability of at least one score exceeding x is P(S > x) = Σ_{i≥1} P_i = 1 - P_0 = 1 - e^{-y} • For local alignment scores, the expected number of successes is y = Kmn·e^{-λx} • Therefore, P(S > x) = 1 - exp(-Kmn·e^{-λx}) Lecture 6 CS566
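
The last two formulas translate directly into code: y = Kmn·e^{-λx} is the expected number of chance scores reaching x, and the Poisson zero term gives the tail probability. A small sketch with placeholder K and λ (real values come from the scoring system):

    import math

    def p_value(x, n, m, K=0.1, lam=1.1):
        """P(S > x) for unrelated sequences via the Poisson approximation."""
        y = K * m * n * math.exp(-lam * x)   # expected no. of scores >= x
        return 1.0 - math.exp(-y)            # prob. of at least one such score

    # illustrative call: two 1000-letter sequences, score cut-off 25
    print(p_value(25, 1000, 1000))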

  7. Unit Distributions • Normalize Gaussian and EVD • Area under curve = 1 • Curve maximum (mode) at 0 • Then • For Gaussian • Mean = 0; SD = 1 • For EVD • Mean = 0.5772 (Euler's constant); Variance = π²/6 ≈ 1.645 • Unit EVD: P(S > x) = 1 - exp(-e^{-x}) • General EVD (mode u, scale λ): P(S > x) = 1 - exp(-e^{-λ(x-u)}) • Z-score representation in terms of SDs • P(Z > z) = 1 - exp(-e^{-(1.2825z + 0.5772)}) Lecture 6 CS566
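
Both tail formulas are one-liners; the sketch below (Python, values illustrative) evaluates the unit-EVD form and the Z-score form side by side:

    import math

    def p_unit_evd(x):
        """P(S > x) for the unit EVD (mode 0, scale 1)."""
        return 1.0 - math.exp(-math.exp(-x))

    def p_evd_zscore(z):
        """P(Z > z), with Z measured in standard deviations of the EVD."""
        return 1.0 - math.exp(-math.exp(-1.2825 * z - 0.5772))

    print(p_unit_evd(3.0), p_evd_zscore(3.0))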

  8. Lambda and K • λ = Scale factor for the scoring system • Effectively converts AS matrix values into actual natural-log likelihood ratios • K = Scale factor that reduces the search space to compensate for the non-independence of local alignments • Estimated by fitting to the Poisson approximation or to the equation for E(S) Lecture 6 CS566
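
For ungapped scoring, λ need not be fitted at all: it is the unique positive root of Σ_ij p_i·p_j·e^{λ·s_ij} = 1, where p_i are background residue frequencies and s_ij the matrix scores. A minimal bisection sketch for a toy +1/-1 DNA scoring scheme (an illustrative choice, not the lecture's matrix):

    import math

    def estimate_lambda(score, freqs, lo=1e-6, hi=10.0, iters=60):
        """Bisection for lambda in: sum_ij p_i p_j exp(lambda * s_ij) = 1."""
        def f(lam):
            return sum(pi * pj * math.exp(lam * score[i][j])
                       for i, pi in enumerate(freqs)
                       for j, pj in enumerate(freqs)) - 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return (lo + hi) / 2

    # +1 match, -1 mismatch, uniform base frequencies -> lambda = ln(3) ~ 1.0986
    score = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
    print(estimate_lambda(score, [0.25] * 4))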

  9. Treasure Trove of Probabilities • Probability distribution of scores between unrelated sequences, P(S_unrel) • Probability distribution of the number of scores from P(S_unrel) exceeding some cut-off; its mean is the number of scores exceeding the cut-off expected by chance alone • Probability of observing a score ≥ x between unrelated sequences, P(S ≥ x) Lecture 6 CS566

  10. Loose Ends • What about gap parameters? • Short answer: No formal theory • Long answer: • Found empirically • The choice of parameters can effectively turn the local alignment algorithm into a global alignment • What about gapped alignment? • Not formally proven, but simulations show statistical behavior similar to ungapped alignment • Effective sequence length n' = n - E(MatchLength_max) Lecture 6 CS566
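
The edge-effect correction in the last bullet is a one-line adjustment; a tiny sketch, assuming the expected maximal match length has already been estimated (the value 10 is just a placeholder):

    def effective_length(n, expected_max_match=10):
        """n' = n - E(MatchLength_max): end positions cannot host a full-length match."""
        return max(n - expected_max_match, 1)

    print(effective_length(1000))   # 990 with the placeholder correction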
