Random walk

Random walk. Presented by Changqing Li. Mathematics, Probability, Statistics. What is a random walk? An intuitive understanding: a series of movements whose direction and size are randomly decided (e.g., the path a drunk person leaves behind).

Presentation Transcript


  1. Random walk Presented by Changqing Li Mathematics Probability Statistics

  2. What is a Random Walk? • An intuitive understanding: a series of movements whose direction and size are randomly decided (e.g., the path a drunk person leaves behind). • Formal definition: Let $x_0$ be a fixed vector in the d-dimensional Euclidean space $\mathbb{R}^d$ and $\{\xi_n\}_{n \ge 1}$ a sequence of independent, identically distributed (i.i.d.) random variables taking values in $\mathbb{R}^d$. The discrete-time stochastic process $\{X_n\}_{n \ge 0}$ defined by $X_0 = x_0$, $X_n = x_0 + \xi_1 + \cdots + \xi_n$ is called a d-dimensional random walk.
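
The formal definition can be turned into a few lines of code. The following is a minimal sketch, assuming standard normal increments in two dimensions; the function names `random_walk` and `gaussian_step_2d` and the choice of increment law are illustrative assumptions, not part of the slides.

```python
import random

def random_walk(x0, n_steps, step_sampler, seed=None):
    """Return the path [X_0, ..., X_n], where X_k = x0 + xi_1 + ... + xi_k."""
    rng = random.Random(seed)
    path = [tuple(x0)]
    for _ in range(n_steps):
        step = step_sampler(rng)  # one i.i.d. increment xi_k
        path.append(tuple(p + s for p, s in zip(path[-1], step)))
    return path

def gaussian_step_2d(rng):
    # Hypothetical increment law: independent standard normal coordinates.
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

path = random_walk((0.0, 0.0), 1000, gaussian_step_2d, seed=42)
print("start:", path[0], "end:", path[-1])
```

Any other sampler that returns a d-tuple can be passed in place of `gaussian_step_2d`, as long as successive draws are independent and identically distributed.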

  3. Why Random Walks? • A random walk (RW) is a useful model for understanding stochastic processes across a variety of scientific disciplines. • Random walk theory supplies the basic probability theory behind BLAST (the most widely used sequence alignment method).

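  4. Definitions (cont.) If the starting point $x_0 \in \mathbb{Z}^d$ and the step RVs $\xi_n$ take values in $\mathbb{Z}^d$, then the walk $\{X_n\}$ is called a d-dimensional lattice random walk. In the lattice walk case, if we only allow a jump from $x$ to $x + e$, where $e = e_i$ or $e = -e_i$ for some unit coordinate vector $e_i$, then the process is called a d-dimensional simple random walk.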
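
A simple random walk on the integer lattice can be sketched as follows. The symmetric choice (every one of the 2d neighbours equally likely) is an assumption made here for illustration, and `simple_random_walk` is a hypothetical helper name.

```python
import random

def simple_random_walk(d, n_steps, seed=None):
    """Simple random walk on Z^d started at the origin."""
    rng = random.Random(seed)
    pos = [0] * d
    path = [tuple(pos)]
    for _ in range(n_steps):
        axis = rng.randrange(d)            # pick one coordinate axis uniformly
        pos[axis] += rng.choice((-1, 1))   # move one unit along that axis
        path.append(tuple(pos))
    return path

walk3d = simple_random_walk(3, 50, seed=0)
print(walk3d[:5])
```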

  5. Definitions (cont.) A random walk is called a restricted walk if it is limited to the interval [a, b]. The endpoints a and b are called absorbing barriers if the walk, once it reaches one of them, stays there forever; they are called reflecting barriers if the walk bounces back on reaching them.
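
The two barrier types can be compared in a small simulation. This is a sketch under stated assumptions: steps are ±1 with equal probability, a < b, and "reflecting" is implemented by sending a step that would leave [a, b] back to the neighbouring interior point (one of several possible conventions). The name `restricted_walk` is hypothetical.

```python
import random

def restricted_walk(start, a, b, n_steps, barrier="absorbing", seed=None):
    """+/-1 walk on the integers a..b with absorbing or reflecting endpoints."""
    rng = random.Random(seed)
    x = start
    path = [x]
    for _ in range(n_steps):
        if barrier == "absorbing" and x in (a, b):
            path.append(x)          # absorbed: the walk stays here forever
            continue
        x += rng.choice((-1, 1))
        if barrier == "reflecting":
            if x < a:
                x = a + 1           # bounce back off the lower barrier
            elif x > b:
                x = b - 1           # bounce back off the upper barrier
        path.append(x)
    return path

absorbed = restricted_walk(2, 0, 5, 200, barrier="absorbing", seed=1)
reflected = restricted_walk(2, 0, 5, 200, barrier="reflecting", seed=1)
print(absorbed[-1], min(reflected), max(reflected))
```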

  6. Example: DNA sequence alignment modeled as RW
       ggagactgtagacagctaatgctata
       | |  |  ||| ||        |||
       gaacgccctagccacgagcccttatc
     Simple scoring scheme at each position: +1 for identical nucleotides, -1 for different nucleotides.
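
The accumulated score of this alignment can be computed directly. The sketch below applies the +1/-1 scheme to the two sequences from the slide (the second sequence is written in lowercase here, treating the capital G as a typing artifact, which is an assumption); `score_walk` is an illustrative name.

```python
seq1 = "ggagactgtagacagctaatgctata"
seq2 = "gaacgccctagccacgagcccttatc"

def score_walk(s, t, match=1, mismatch=-1):
    """Cumulative alignment score, starting from 0 before the first position."""
    total = 0
    walk = [0]
    for x, y in zip(s, t):
        total += match if x == y else mismatch
        walk.append(total)
    return walk

walk = score_walk(seq1, seq2)
print(walk)
print("final score:", walk[-1])  # 11 matches - 15 mismatches = -4
```

Each match moves the walk up one unit and each mismatch moves it down one unit, so the list `walk` is exactly the path of a simple random walk.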

  7. Example: simple RW • Ladder point (LP): a point in the walk lower than any previously reached point. • Excursion: the part of the walk from an LP until the highest point attained before the next LP. • The excursions in the figure have heights 1, 1, 4, 0, 0, 0, 3. BLAST theory focuses on the maximum heights achieved by these excursions.
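
Ladder points and excursion heights are easy to extract from a path. The sketch below assumes the starting point counts as the first ladder point and that the last, possibly unfinished, excursion is reported as well; the example path is made up for illustration and is not the walk in the slide's figure.

```python
def excursion_heights(walk):
    """Maximum height above each ladder point before the next ladder point."""
    heights = []
    ladder = walk[0]      # assumption: the start is the first ladder point
    peak = ladder
    for x in walk[1:]:
        if x < ladder:                     # new ladder point reached
            heights.append(peak - ladder)  # close the previous excursion
            ladder = peak = x
        else:
            peak = max(peak, x)
    heights.append(peak - ladder)          # last (possibly unfinished) excursion
    return heights

example = [0, 1, 0, -1, 0, -1, -2, -3, -2, -1, -2]  # hypothetical path
print(excursion_heights(example))  # [1, 1, 0, 2]
```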

  8. Example: General RW • Consider an arbitrary scoring scheme (e.g., a substitution matrix).

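  9. General Walk • Suppose generally the possible step sizes are $-c, -c+1, \ldots, 0, \ldots, d-1, d$, and their respective probabilities are $p_{-c}, p_{-c+1}, \ldots, p_d$. • The mean step size is negative, i.e., $E(S) = \sum_{j=-c}^{d} j\,p_j < 0$. • The mgf of S (the step size) is $m(\theta) = \sum_{j=-c}^{d} p_j e^{j\theta}$.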

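  10. General Walk • There exists a unique positive $\theta^*$ such that $m(\theta^*) = 1$, where $m$ is the mgf of the step size. • To consider the walk that starts at 0, with a stopping boundary at -1 and no upper boundary, impose an artificial barrier at $y > 0$. The possible stopping points are then $-c, \ldots, -1, y, \ldots, y+d-1$, where $-c$ and $d$ are the smallest and largest possible step sizes. • Wald's identity states $E\left(m(\theta)^{-N} e^{\theta T_N}\right) = 1$, where $N$ is the number of steps until the walk stops and $T_N$ is the total displacement when the walk stops.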
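
The positive root of the mgf equation can be found numerically. The sketch below uses plain bisection and assumes the step distribution is given as a dictionary mapping step size to probability; `mgf` and `theta_star` are illustrative names, and the ±1 example distribution is hypothetical. For steps +1 with probability p and -1 with probability q > p, the root is known in closed form, $\theta^* = \ln(q/p)$, which gives a check.

```python
import math

def mgf(theta, dist):
    """Moment generating function m(theta) = sum_j p_j * exp(j * theta)."""
    return sum(p * math.exp(s * theta) for s, p in dist.items())

def theta_star(dist, tol=1e-12):
    """Unique positive root of m(theta) = 1, found by bisection."""
    mean = sum(s * p for s, p in dist.items())
    if mean >= 0 or not any(s > 0 and p > 0 for s, p in dist.items()):
        raise ValueError("need a negative mean and a positive step with p > 0")
    lo = hi = 1.0
    while mgf(lo, dist) >= 1.0:   # shrink until m(lo) < 1 (true near 0+)
        lo /= 2
    while mgf(hi, dist) <= 1.0:   # grow until m(hi) > 1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mgf(mid, dist) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

steps = {+1: 0.25, -1: 0.75}      # hypothetical step distribution
root = theta_star(steps)
print(root, math.log(3))          # both are approximately 1.0986
```

Bisection works here because the mgf is convex, equals 1 at 0, and has negative slope there, so it dips below 1 and crosses 1 exactly once for positive argument.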

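  11. General Walk • Thus, taking $\theta = \theta^*$ (the unique positive root of $m(\theta) = 1$) in Wald's identity, $\sum_k P_k e^{k\theta^*} = 1$, where $P_k$ is the probability that the walk finishes at the point k and the sum runs over the possible stopping points. • The mean number of steps until the walk stops would be $E(N) = \frac{\sum_k k\,P_k}{\sum_j j\,p_j}$, where $p_j$ is the probability of a step of size j.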

  12. Random Walks in real life! In supernovae – how “star stuff” gets to be inside us (eventually!)

  13. Random Walks (in your body) • How two liquids (and air!) mix together (diffusion) • Cells inside your body

  14. Random Walks and $$ (Wall Street) Stock market – predicting the price/cost of a stock in the future

  15. Application: BLAST • BLAST is the most frequently used method for assessing which DNA or protein sequences in a large database have significant similarity to a given query sequence. It searches for high-scoring local alignments between sequences and then tests the significance of the scores found via P-values. • The null hypothesis to be tested is that, for each aligned pair of amino acids, the two amino acids were generated by independent mechanisms.

  16. BLAST: modeling • The positions in the alignment are numbered from left to right as 1, 2, …, N. A score S(j, k) is allocated to each position where the aligned amino acid pair (j, k) is observed, where S(j, k) is the (j, k) element of the chosen substitution matrix. • The accumulated score at position i is the sum of the scores for the amino acid comparisons at positions 1, 2, …, i. As i increases, the accumulated score undergoes a random walk.
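
The accumulated score can be sketched with a running sum over the aligned pairs. The substitution scores below are made-up toy values on a three-letter alphabet, not entries of any real substitution matrix, and `TOY_SCORES`, `score`, and `accumulated_scores` are hypothetical names.

```python
from itertools import accumulate

# Hypothetical toy substitution scores (symmetric); NOT a real matrix.
TOY_SCORES = {
    ("A", "A"): 2, ("R", "R"): 3, ("N", "N"): 3,
    ("A", "R"): -1, ("A", "N"): -2, ("R", "N"): -1,
}

def score(j, k):
    """S(j, k), looked up symmetrically in the toy table."""
    return TOY_SCORES[(j, k)] if (j, k) in TOY_SCORES else TOY_SCORES[(k, j)]

def accumulated_scores(query, target):
    """Accumulated score at positions 1..N of an ungapped alignment."""
    return list(accumulate(score(j, k) for j, k in zip(query, target)))

print(accumulated_scores("ARNAR", "ARANN"))  # [2, 5, 3, 1, 0]
```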

  17. BLAST: calculating parameters • Let Y1, Y2, … be the respective maximum heights of the excursions of this walk after leaving one ladder point and before arriving at the next, and let Ymax be the maximum of these maxima. Ymax is in effect the test statistic used in BLAST, so it is necessary to find its distribution under the null hypothesis. • The asymptotic probability distribution of any Yi is shown to be geometric-like, $P(Y_i \ge y) \sim C e^{-\lambda y}$. The values of C and $\lambda$ in this distribution depend on the substitution matrix used and the amino acid frequencies {pj} and {pj’}. The probability distribution of Ymax also depends on n, the mean number of ladder points in the walk.
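
A rough numerical sketch of how the tail of Ymax follows from the tail of a single excursion. It assumes the excursion heights are independent, that each has tail probability C·exp(-λy), and that the number of excursions is fixed at n rather than random; the parameter values are made up, and the function names are hypothetical. This is not BLAST's actual P-value computation.

```python
import math

def excursion_tail(y, C, lam):
    """Assumed geometric-like tail P(Y_i >= y) = C * exp(-lam * y), capped at 1."""
    return min(1.0, C * math.exp(-lam * y))

def ymax_tail(y, C, lam, n):
    """P(Ymax >= y) for n independent excursions (n treated as fixed)."""
    return 1.0 - (1.0 - excursion_tail(y, C, lam)) ** n

# Hypothetical parameter values, for illustration only.
C, lam, n = 0.5, 1.0, 10
for y in (3, 5, 10):
    print(y, ymax_tail(y, C, lam, n))
```

For large y the result is close to n·C·exp(-λy), which shows why the distribution of Ymax depends on the number of ladder points as well as on C and λ.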

  18. Reference • http://mathworld.wolfram.com/RandomWalk2-Dimensional.html • http://mathworld.wolfram.com/Borel-TannerDistribution.html • http://www.bioss.ac.uk/~dirk/talks/tutorial_Blast.pdf#page=5&zoom=auto,53,792 • http://www.jstor.org/discover/10.2307/27851819?uid=3739840&uid=2129&uid=2&uid=70&uid=4&uid=3739256&sid=21102991585977
