
Applications of scan statistics in molecular biology and neuroscience

Applications of scan statistics in molecular biology and neuroscience, by Chan Hock Peng, Dept of Statistics and Applied Probability. Outline. 1. General introduction 2. Applications in molecular biology (weighted scan statistics) 3. Tail probability computations


Presentation Transcript


  1. Applications of scan statistics in molecular biology and neuroscience, by Chan Hock Peng, Dept of Statistics and Applied Probability

  2. Outline • 1. General introduction • 2. Applications in molecular biology (weighted scan statistics) • 3. Tail probability computations • 4. Applications in neuroscience (template matching problem) • 5. Tail probability computations • 6. Extensions and other applications

  3. Notation • Scan statistic: the maximum score in any window of length u. • Rate: the underlying rate at which events occur under normal circumstances. • n: the length of the interval under consideration.
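For the simplest case, where every event scores 1, the maximum score in any window of length u is just the largest event count in such a window. The following is a minimal sketch of that computation, not the speaker's code; the function name `scan_statistic`, the half-open window convention [t, t + u), and the event days are all assumptions made for illustration.

```python
from bisect import bisect_left

def scan_statistic(event_times, u):
    """Largest number of events in any window [t, t + u).

    It is enough to try windows whose left edge sits on an event,
    because any window can be slid right until it starts at one
    without losing events.
    """
    times = sorted(event_times)
    best = 0
    for i, start in enumerate(times):
        # Index of the first event at or beyond the window's right edge
        j = bisect_left(times, start + u)
        best = max(best, j - i)
    return best

# Hypothetical event days within a record of n = 5 * 365 days
event_days = [100, 400, 1500, 1510, 1522, 1700]
print(scan_statistic(event_days, u=30))
```

With these made-up days, the three events at 1500, 1510 and 1522 fall inside one 30-day window, which is the kind of cluster the examples that follow are concerned with.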

  4. Example 1 • (USA Today, 1996) On Feb 22, the US Navy suspended all operations of the F-14 jet after the third crash in one month. • Three crashes in a month was seven times the expected rate based on a 5-year period. • Scan statistic = 3, n = 5*365 days, rate = 1/70 per day (an expected 3/7 crashes per 30-day month).

  5. Example 2 • (Home News, 1995) In a 10-month period, 11 residents died at a Tennessee state institution. The number was twice what was expected. • The judge was angry and ordered the mental health commissioner to spend one in four weekends at the institution. • Scan statistic = 11, n = ?, rate = 11/20 per month (an expected 5.5 deaths in 10 months).

  6. Clusters of DAM sites in E. coli DNA • Karlin and Brendel (1992). • DAM site: an occurrence of the pattern GATC. • Important in the repair and replication of DNA. • Scan statistic = 8, n = 4.7 million, rate = 1.1/250. • P-value approximation of Naus (1982).

  7. Palindromes in DNA • A-T and C-G are complementary bases. • The complement of CCACGTGG is GGTGCACC. • CCACGTGG is a palindromic pattern because its complement, read backwards, is the same as the pattern itself.
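The palindrome definition on this slide can be checked mechanically: complement each base, reverse the result, and compare with the original. A small sketch follows; the helper names `complement` and `is_palindromic` are illustrative and not taken from the talk.

```python
# Base pairing stated on the slide: A-T and C-G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq):
    """Replace every base by its complementary base."""
    return "".join(COMPLEMENT[base] for base in seq)

def is_palindromic(seq):
    """True if the complement of seq, read backwards, equals seq."""
    return complement(seq)[::-1] == seq

print(complement("CCACGTGG"))      # GGTGCACC
print(is_palindromic("CCACGTGG"))  # True
```

The DAM-site pattern GATC passes the same check, since its complement CTAG reads GATC backwards.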

  8. Palindromic sequences in viruses • Masse et al. (1992) & Leung et al. (1994). • Palindromic sequences cluster around the origin of replication. • An event occurs if there is a palindromic pattern of length at least 10 base pairs. • HCMV sequence: scan statistic = 10, n = 229354, rate = 0.001, p-value = 0.00195.

  9. Extensions to general scoring functions (weighted scan) • In Chew, Choi and Leung (2005), longer palindromic patterns are given larger weights. • For example, a pattern of length k can be given a score of k/10. • P-value computations?
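To make the weighted scan concrete, the sketch below sums the score k/10 over all palindromes whose positions fall in a window of length u and takes the maximum over windows. It is an illustration under assumptions, not the method of Chew, Choi and Leung: the function name `weighted_scan`, the window convention (pos - u, pos], and the example positions and lengths are all hypothetical.

```python
def weighted_scan(events, u):
    """Maximum total score in any window of length u.

    events: list of (position, palindrome_length) pairs.
    Each palindrome of length k contributes k / 10, as on the slide.
    Scores are non-negative, so it is enough to try windows
    (pos - u, pos] that end at an event.
    """
    events = sorted(events)
    best = 0.0
    total_length = 0  # sum of palindrome lengths inside the current window
    left = 0
    for pos, k in events:
        total_length += k
        # Drop events that have fallen out of the window (pos - u, pos]
        while events[left][0] <= pos - u:
            total_length -= events[left][1]
            left += 1
        best = max(best, total_length / 10)
    return best

# Hypothetical palindrome positions and lengths
palindromes = [(1000, 10), (1200, 14), (1350, 10), (5000, 12), (9000, 10)]
print(weighted_scan(palindromes, u=500))
```

Setting every length to 10 gives each event a score of 1 and recovers the unweighted count-based scan statistic.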

  10. Other applications of weighted scan • Rajewsky et al. (2002) & Lifanov et al. (2003): scanning for clusters of transcription factor binding sites. • Position weight matrices are used to score words for similarity to a given motif. • Siepel et al. (2005): searching for segments of high evolutionary conservation.

  11. P-value computations for weighted scan • Chan and Zhang (2006). where • I is a large deviation rate function. • is an overshoot function. • K is the moment generating function of the scores.

  12. Template matching in neuroscience • Neurons are the basic units of information processing in the brain. • They generate small and highly peaked electric potentials known as spikes. • The pattern of spikes is modeled as a point or counting process, e.g. a Poisson process.

  13. Template pattern • Dave and Margoliash (2000) and Mooney (2000): the spike patterns of a zebra finch when it is listening to a bird song. • The template w records, for the ith neuron, the times at which spikes were generated in the time interval [0,T).

  14. Longer spike train patterns • Let y be the corresponding spike train patterns when the finch is sleeping, observed over a longer period of time [0,a). • If w matches well with a segment of y, this is evidence of bird song replay, and hence of song learning, during sleep.

  15. Scoring function • Consider a kernel function f, e.g. f(x) = 1 if x < 0.025 ms and f(x) = -0.3 if x > 0.025 ms. • For the illustration below, consider d = 1 and T = 0.2 ms. • Let w = {.01, .05, .09, .12}. • Let y = {.32, .75, 1.03, 1.15, 1.25}.

  16. To check if there is a match between w and the segment of y starting at time t=1, compare w = {.01, .05, .09, .12} against the points of y-1 that fall in [0,T), namely {.03, .15}. • The point .03 provides a score of 1 because there is a point in w less than 0.025 ms away. • The point .15 provides a score of -0.3 because the nearest point in w is more than 0.025 ms away. • The overall score at time t=1 is 1 - 0.3 = 0.7.
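The worked example above can be reproduced in a few lines. This is a sketch of the scoring rule as the slides describe it for d = 1, not code from the talk; the names `kernel` and `match_score` are illustrative, and the slide does not say what happens when the distance is exactly 0.025, so the sketch assigns -0.3 in that case as an assumption.

```python
def kernel(x, h=0.025):
    """Kernel from the slide: 1 if the distance x is below h, else -0.3.

    The value at exactly x == h is not given in the slides; -0.3 is assumed.
    """
    return 1.0 if x < h else -0.3

def match_score(w, y, t, T):
    """Score the template w against the segment of y starting at time t."""
    score = 0.0
    for spike in y:
        shifted = spike - t
        if 0 <= shifted < T:  # keep only spikes inside the window [t, t + T)
            nearest = min(abs(shifted - p) for p in w)
            score += kernel(nearest)
    return score

w = [0.01, 0.05, 0.09, 0.12]
y = [0.32, 0.75, 1.03, 1.15, 1.25]
print(match_score(w, y, t=1.0, T=0.2))
```

At t = 1 only the shifted points .03 and .15 lie in [0, 0.2), so the score is 1 - 0.3 = 0.7 up to floating-point rounding, in agreement with the slide.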

  17. Scan statistics • For d>1, add up the scores over all neurons starting at the same time t. • The scan statistic is the maximum possible score over all t in the interval [0,a-T). • Chi (2004) obtained an approximation of the tail probability of the scan statistic. • Chan & Loh (2005) obtained a more precise approximation.
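A rough numerical version of this definition sums the per-neuron scores at each start time t and maximizes over t. The sketch below evaluates t only on a grid, so it approximates the maximum over the continuous interval [0, a-T) rather than computing it exactly. The function names, the grid step, the second neuron's spike times, and the choice a = 1.5 are all hypothetical.

```python
def train_score(w_i, y_i, t, T, h=0.025):
    """Score of one neuron's template w_i against y_i shifted by t."""
    score = 0.0
    for spike in y_i:
        x = spike - t
        if 0 <= x < T:
            nearest = min(abs(x - p) for p in w_i)
            # Kernel from the slides; the value at distance == h is assumed
            score += 1.0 if nearest < h else -0.3
    return score

def scan_over_grid(w, y, a, T, step=0.01):
    """Approximate the scan statistic by maximizing over a grid of t."""
    n_steps = int(round((a - T) / step))
    best_t, best = 0.0, float("-inf")
    for k in range(n_steps):
        t = k * step
        # Add up the scores of all d neurons at the same start time t
        total = sum(train_score(wi, yi, t, T) for wi, yi in zip(w, y))
        if total > best:
            best_t, best = t, total
    return best_t, best

# Neuron 1 uses the slide's data; neuron 2 is made up for illustration
w = [[0.01, 0.05, 0.09, 0.12], [0.03, 0.10]]
y = [[0.32, 0.75, 1.03, 1.15, 1.25], [0.50, 1.03, 1.10]]
best_t, best = scan_over_grid(w, y, a=1.5, T=0.2)
print(best_t, best)
```

A finer grid, or evaluating t only where a spike enters or leaves the window or crosses the 0.025 threshold, would get closer to the exact maximum.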

  18. Assumptions and related information • Each is stationary while are independent Poisson processes. • There are separate formulas for the case where the kernel f is continuous and the case where it is not. • The number of times a large score c is exceeded is a Poisson random variable.
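The Poisson property in the last bullet is what links exceedance counts to tail probabilities. As a hedged illustration, write M for the scan statistic, N_c for the number of times the score c is exceeded, and \mu_c for the mean of N_c; these symbols are introduced here and do not appear in the slides. If N_c is Poisson with mean \mu_c, then

```latex
P(M \ge c) = P(N_c \ge 1) = 1 - e^{-\mu_c} \approx \mu_c
\quad \text{when } \mu_c \text{ is small.}
```

Under this reading, approximating the tail probability reduces to approximating the mean number of exceedances.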

  19. Table of approximations

      c       Monte Carlo (s.e.)   Chan & Loh
      0.017   0.0387 (0.0019)      0.0383
      0.018   0.0237 (0.0012)      0.0241
      0.019   0.0158 (0.0008)      0.0149
      0.020   0.0095 (0.0005)      0.0091
      0.021   0.0054 (0.0003)      0.0055
      0.022   0.0033 (0.0002)      0.0033

  20. Future work • Higher-dimensional Poisson processes, e.g. 2 or 3 dimensions. • Applications in astronomy and imaging. • Varying window sizes.
