
ISMB 2007 Review

ISMB 2007 Review. Kyung-Ah Sohn. Bayesian Association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations Jim C. Huang, Anitha Kannan and John Winn University of Toronto, MS Research, Cambridge.


Presentation Transcript


  1. ISMB 2007 Review Kyung-Ah Sohn

  2. Bayesian Association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations Jim C. Huang, Anitha Kannan and John Winn University of Toronto, MS Research, Cambridge

  3. A statistical method for alignment-free comparison of regulatory sequences Miriam R. Kantorovitz, Gene E. Robinson and Saurabh Sinha UIUC, USA

  4. Motivation • How do we measure the similarity between two regulatory DNA sequences in an alignment-free manner? • Needed for sequences that do not show any statistically significant alignment • e.g. two sequences that are not orthologous, yet are functionally related • e.g. detecting regulatory regions in a new genome that are homologous to known enhancers or promoters, which show a significantly lower level of alignment than coding sequences

  5. Comparison of k-word frequency distributions How can two 4^k-dimensional vectors of k-word counts be compared? • Euclidean distance • Information-theoretic measure such as the KL distance • Geometric measure such as the cosine of the angle between the count vectors • Statistical measure such as the correlation coefficient
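The measures listed on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the helper names (`kword_counts`, `euclidean`, `cosine`, `correlation`) and the DNA alphabet ordering are assumptions, and the KL distance is left out because it needs a smoothing choice the slide does not specify.

```python
import math
from collections import Counter
from itertools import product


def kword_counts(seq, k, alphabet="ACGT"):
    """Return the 4^k-dimensional vector of k-word counts (lexicographic order)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine of the angle between two count vectors (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlation(u, v):
    """Pearson correlation: the cosine of the mean-centred vectors (assumed non-constant)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])
```

For k = 2 the count vector has 16 entries, so `len(kword_counts("ACGTACGT", 2))` is 16.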

  6. Contribution of this paper • D2 score: alignment-free similarity measure defined as the number of k-word matches • D2z score: normalized measure that captures the statistical significance of the D2 score • Reduces the time complexity from O(4^(2k)) to O(4^k)

  7. D2 score • For A = A_1 A_2 … A_n1 and B = B_1 B_2 … B_n2, let Y(i,j) be the indicator variable for a match between the k-words starting at position i in A and at position j in B • D2(A,B) = Σ_i Σ_j Y(i,j), with i = 1 … n1−k+1 and j = 1 … n2−k+1: the number of k-word matches between the two sequences A and B, including overlaps

  8. D2 score • Equivalently, D2 is the inner product of the vectors of k-word counts in A and B • Let W be the set of all k-words on the alphabet of size d, and N_w(A) the number of times w appears in the sequence A • Then D2(A,B) = Σ_{w in W} N_w(A) · N_w(B)
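The two views of D2 on slides 7 and 8 (a double sum of match indicators, and an inner product of k-word counts) can be checked against each other with a short sketch. The function names are hypothetical, and sequences are assumed to be plain Python strings.

```python
from collections import Counter


def d2_pairwise(a, b, k):
    """D2 as a double sum: count every pair of positions (i, j) whose k-words match."""
    return sum(
        a[i:i + k] == b[j:j + k]
        for i in range(len(a) - k + 1)
        for j in range(len(b) - k + 1)
    )


def d2_inner_product(a, b, k):
    """D2 as the inner product of the k-word count vectors of a and b."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[j:j + k] for j in range(len(b) - k + 1))
    # Words absent from b contribute zero, so iterating over a's words suffices.
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

For example, with `a = "ACGTAC"`, `b = "ACAC"` and `k = 2`, the word AC occurs twice in each sequence and no other word is shared, so both functions return 4. The pairwise version takes time proportional to the product of the sequence lengths, while the count-based version is linear in their sum.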

  9. D2z score • D2z(A,B) = (D2(A,B) − E(D2)) / σ(D2), where E(D2) and σ(D2) are the expectation and the standard deviation of D2(A,B) • Approximately standard normal when the sequences are long enough • How to compute E(D2) and σ(D2)? • IID case • Markov model case

  10. Expectation • IID model, where f_a^A is the background probability of letter a in the sequence A
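The equation on slide 10 is not preserved in the transcript, so the sketch below is a derivation under stated assumptions rather than the slide's formula. If letters are drawn independently, with sequence A following background frequencies f^A and sequence B following f^B, then two k-words match with probability (Σ_a f_a^A · f_a^B)^k, and linearity of expectation over all (n1−k+1)(n2−k+1) position pairs gives E(D2). The function name and the dictionary representation of the frequencies are hypothetical.

```python
def expected_d2_iid(n1, n2, k, freq_a, freq_b):
    """Expected D2 under an assumed IID model with per-sequence letter frequencies.

    freq_a and freq_b map each letter to its background probability in
    sequences A and B respectively (assumed inputs, each summing to 1).
    """
    # Probability that one letter of A equals one letter of B.
    p_letter = sum(freq_a[x] * freq_b.get(x, 0.0) for x in freq_a)
    # Probability that two k-words match, times the number of position pairs.
    return (n1 - k + 1) * (n2 - k + 1) * p_letter ** k
```

With uniform frequencies of 0.25 per nucleotide, two sequences of length 10 and k = 2, there are 81 position pairs and each matches with probability 0.25² = 0.0625, so the expected value is 5.0625.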

  11. Expectation • Markov Model

  12. Variance

  13. Variance – IID case • Case (a): Cov(Y(i,j), Y(s,t))=0 • Case (b): • Case (c): …

  14. Variance – Markov model • Case (a)

  15. Evaluation and Comparison • Evaluate whether functionally and/or evolutionarily related sequence pairs are scored better than unrelated pairs of sequences randomly chosen from the genome • Positive set: a set of CRMs known to regulate expression in the same tissue • Negative set: a set of equally many randomly chosen non-coding sequences • Score every pair of sequences within the positive set and every pair within the negative set, sort all the scores in one combined list, and count how many of the pairs in the top half of this list are from the positive set
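The evaluation procedure on slide 15 can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, `score` stands for any pairwise similarity function (such as a D2z implementation) where a higher value means more similar, and ties are broken by sort order.

```python
from itertools import combinations


def top_half_positive_count(positive, negative, score):
    """Count positive-set pairs among the top-scoring half of all scored pairs."""
    # Score every pair within the positive set and every pair within the negative set.
    scored = [(score(x, y), True) for x, y in combinations(positive, 2)]
    scored += [(score(x, y), False) for x, y in combinations(negative, 2)]
    # Sort the combined list from best to worst score.
    scored.sort(key=lambda item: item[0], reverse=True)
    top_half = scored[: len(scored) // 2]
    return sum(is_positive for _, is_positive in top_half)
```

Because both sets have the same size, they contribute the same number of pairs, so a perfect measure places every positive pair in the top half and an uninformative one places about half of them there.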

  16. Evaluation on functionally related regulatory sequences

  17. Evaluation on orthologous regulatory sequences

  18. Summary • Proposed a new sequence similarity score

  19. Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics Song Wu, Jie Yang and Rongling Wu Department of Statistics, University of Florida

  20. HIV dynamics • Bi-exponential model for short-term dynamic changes of HIV virion copies in AIDS patients after initiation of HAART: V(t) = P1·exp(−λ1·t) + P2·exp(−λ2·t) • V(t): plasma load at time t • λ1, λ2: viral decay rates in the first and second phase • P1, P2: baseline viral loads when the treatment is initiated • Limitation: does not incorporate the characteristics of long-term HIV viral load changes
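A minimal sketch of a bi-exponential decay curve of the form V(t) = P1·exp(−λ1·t) + P2·exp(−λ2·t) is given below. The symbol names P1 and P2, the function name, and all numeric values in the usage note are illustrative assumptions, not estimates from the paper.

```python
import math


def viral_load(t, p1, p2, lam1, lam2):
    """Plasma viral load at time t under a bi-exponential decay model.

    p1, p2: baseline loads of the two components at treatment start (t = 0).
    lam1, lam2: decay rates of the first (fast) and second (slow) phase.
    """
    return p1 * math.exp(-lam1 * t) + p2 * math.exp(-lam2 * t)
```

At t = 0 the load equals p1 + p2. With made-up values such as `lam1 = 0.5` and `lam2 = 0.05`, the first term dominates early and vanishes quickly, leaving the slow second phase to govern the later trajectory.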

  21. HIV dynamics • Two phases of viral load decay • The early rapid decay – λ1 • The late slow decay, corresponding to the clearance of free and latent viruses – λ2 • It is not sensible to assume a constant λ2 over a long-term treatment period

  22. Natural cubic spline • Piecewise third-order polynomial function that passes through a set of control points • Estimate λ2(t) using a cubic spline

  23. Quantitative genetic model • Marker: alleles M/m with frequencies p/(1−p) • QTL, genetically associated with the marker: alleles A/a with frequencies q/(1−q) • D: linkage disequilibrium between the marker and the QTL • Four haplotypes MA, Ma, mA and ma with frequencies p11 = pq + D, p10 = p(1−q) − D, p01 = (1−p)q − D, p00 = (1−p)(1−q) + D
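The four haplotype frequencies on slide 23 follow directly from the allele frequencies and the linkage disequilibrium D. The sketch below encodes those four expressions; the function name and the validity check are assumptions added for illustration.

```python
def haplotype_frequencies(p, q, d):
    """Haplotype frequencies for marker allele M (freq p), QTL allele A (freq q), LD d."""
    freqs = {
        "MA": p * q + d,              # p11
        "Ma": p * (1 - q) - d,        # p10
        "mA": (1 - p) * q - d,        # p01
        "ma": (1 - p) * (1 - q) + d,  # p00
    }
    # D must be small enough that no haplotype frequency becomes negative.
    if any(f < 0 for f in freqs.values()):
        raise ValueError("d is out of range for the given allele frequencies")
    return freqs
```

The four values always sum to 1, and the marginals recover the allele frequencies: MA + Ma equals p and MA + mA equals q, whatever the value of D.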

  24. Linear model linking genetic and residual effects
