BIOINFORMATICS 2054 Statistical Foundations for Bioinformatics Data Mining Class #13, April 17, 2003. Review of Exercises 6.3, 6.4, 6.5. Intro to Monte Carlo Markov Chain Application to CLUSTAL W. Simulation of a.a. coincidence in aligned blocks, p.201 (code will be e-mailed).
Metropolis-Hastings algorithm
Given: a likelihood function $L(\theta)$ for $\theta \in \Theta$ (not necessarily normalized).
Goal: to generate variates from $L(\theta) / \int_\Theta L(\theta)\,d\theta$.
Idea: imagine swarms of particles at each value of $\theta$, with relative quantity $L^*(\theta)$. At each “tick” they jump with probability $\Pr(\theta_2 \mid \theta_1) = q(\theta_2 \mid \theta_1)$. Except that sometimes they stick!
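Let $p(\theta_1,\theta_2) = \dfrac{L(\theta_2)\,q(\theta_1 \mid \theta_2)}{L(\theta_1)\,q(\theta_2 \mid \theta_1)}$, the standard Metropolis-Hastings ratio. Allow a particle at $\theta_1$ to jump to $\theta_2$ (chosen by h) if and only if a) $p > 1$ or b) $Z \sim \mathrm{Bernoulli}(p)$ equals 1. So Pr(starts at $\theta_1$, jumps to $\theta_2$) $= L^*(\theta_1)\,q(\theta_2 \mid \theta_1)\,\min\{1,\,p(\theta_1,\theta_2)\}$, up to the normalization of $L^*$.
[Figure: $L(\theta_1)$ and $L(\theta_2)$ marked at $\theta_1$ and $\theta_2$.] The net interchange between $\theta_1$ and $\theta_2$ is $L^*(\theta_1)\,q(\theta_2 \mid \theta_1)\,\min\{1,\,p(\theta_1,\theta_2)\} - L^*(\theta_2)\,q(\theta_1 \mid \theta_2)\,\min\{1,\,p(\theta_2,\theta_1)\}$.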
Irreducibility and convergence
• Transition kernel too narrow to reach across: not irreducible; will not converge to the correct distribution.
• Transition kernel will reach across eventually: irreducible, but convergence may take too long.
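The two cases can be tried out on a toy target. The sketch below is an illustration only: the two-island likelihood, the uniform random-walk proposal, and all names (`likelihood`, `random_walk_chain`, `half_width`) are assumptions made here, not part of the lecture material.

```python
import random

def likelihood(theta):
    """Hypothetical unnormalized target: two flat islands, [0,1] and [3,4],
    separated by a gap where the likelihood is zero."""
    return 1.0 if (0.0 <= theta <= 1.0 or 3.0 <= theta <= 4.0) else 0.0

def random_walk_chain(half_width, n_steps, start=0.5, seed=0):
    """Metropolis chain with a symmetric uniform proposal of the given half-width."""
    rng = random.Random(seed)
    theta = start
    path = [theta]
    for _ in range(n_steps):
        candidate = theta + rng.uniform(-half_width, half_width)
        # Symmetric proposal, so the acceptance ratio reduces to L(candidate)/L(theta).
        ratio = likelihood(candidate) / likelihood(theta)
        if ratio >= 1.0 or rng.random() < ratio:
            theta = candidate      # jump
        path.append(theta)         # otherwise the particle sticks
    return path

# Kernel narrower than the gap: the chain can never leave the left island.
narrow = random_walk_chain(half_width=0.5, n_steps=5000)
# Kernel wide enough to reach across: both islands are visited.
wide = random_walk_chain(half_width=3.0, n_steps=5000)

print("largest value reached by narrow chain:", max(narrow))
print("fraction of wide chain on right island:",
      sum(t >= 3.0 for t in wide) / len(wide))
```

With the narrow kernel every candidate that lands in the gap is rejected, so the right island is unreachable no matter how long the chain runs.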
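If $L^*(\theta_1)/L^*(\theta_2) = L(\theta_1)/L(\theta_2)$, then there is no net change: the true distribution is stationary w.r.t. this transition. It is also “self-correcting”: a value of $\theta$ holding more than its share of particles loses more than it gains.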
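The jump-or-stick rule can be sketched in a few lines of Python. This is a minimal sketch, assuming the acceptance ratio has the usual Metropolis-Hastings form $L(\theta_2)q(\theta_1|\theta_2) / (L(\theta_1)q(\theta_2|\theta_1))$; the function name `metropolis_hastings`, its arguments, and the standard-normal example target are hypothetical choices made for illustration.

```python
import math
import random

def metropolis_hastings(log_l, propose, log_q, theta0, n_steps, seed=0):
    """Sketch of a Metropolis-Hastings sampler.

    log_l(theta)        log of the unnormalized likelihood L(theta)
    propose(rng, theta) draws a candidate from q(. | theta)
    log_q(to, frm)      log q(to | frm), up to an additive constant
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        cand = propose(rng, theta)
        # log of p(theta, cand) = L(cand) q(theta|cand) / (L(theta) q(cand|theta))
        log_p = (log_l(cand) + log_q(theta, cand)) - (log_l(theta) + log_q(cand, theta))
        # Jump if p >= 1, or if a Bernoulli(p) draw equals 1; otherwise stick.
        if log_p >= 0.0 or rng.random() < math.exp(log_p):
            theta = cand
        samples.append(theta)
    return samples

# Example: unnormalized standard normal target, Gaussian random-walk proposal.
samples = metropolis_hastings(
    log_l=lambda t: -0.5 * t * t,
    propose=lambda rng, t: rng.gauss(t, 1.0),
    log_q=lambda to, frm: -0.5 * (to - frm) ** 2,
    theta0=0.0,
    n_steps=20000,
)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("sample mean:", mean, "sample variance:", var)
```

Working on the log scale avoids underflow when the likelihood is a product of many small terms; the normalizing constant of $L$ never appears because only ratios are used.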
State Space (random quantities): $T$ is a stochastic process, $\theta \mapsto T\,T\,T \dots T\,T\,T(\theta)$.
Function Space (prob. distributions): $T$ is a functional (a function that takes a function to another function). Starting from the initial distribution $\pi_0$, $T\,T\,T\,T \dots T\,T\,T(\pi_0)$ approaches the stationary distribution $\pi$.
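The function-space view can be made concrete on a finite state space, where a distribution is a list of probabilities and applying T is a vector-matrix product. The 3-state kernel and the name `apply_T` below are hypothetical, chosen only to illustrate repeated application of T to an initial distribution.

```python
def apply_T(dist, T):
    """One application of T in function space: maps the current distribution
    over states to the distribution one transition later."""
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-state kernel; row i holds Pr(next = j | current = i).
T = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

dist = [1.0, 0.0, 0.0]   # initial distribution: all mass on state 0
for _ in range(50):      # T T T ... T applied to the initial distribution
    dist = apply_T(dist, T)

print([round(x, 4) for x in dist])
```

For this kernel the iterates should settle near (0.25, 0.5, 0.25), which is the distribution left unchanged by one more application of T.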
Gibbs sampling
One always accepts, because… This leads to an acceptance probability $p(\theta_1,\theta_2)$ that is always equal to one. Notice that this cycles through the parameters: it is really a successive composition of a number of transition processes.
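One way to see why the acceptance probability is one is to write it out. The derivation below assumes the acceptance ratio has the usual Metropolis-Hastings form and that the proposal for component $k$ is its full conditional; the component notation $\theta_k$, $\theta_{-k}$ is introduced here for illustration and does not come from the slides.

```latex
% Current state:  \theta  = (\theta_k,  \theta_{-k})
% Candidate:      \theta' = (\theta'_k, \theta_{-k})   (only component k changes)
% Proposal:       q(\theta' \mid \theta) = L(\theta'_k \mid \theta_{-k})
% Factor the joint as L(\theta) = L(\theta_k \mid \theta_{-k})\, L(\theta_{-k}),
% where L(\theta_{-k}) = \int L(t, \theta_{-k})\, dt.
\[
p(\theta,\theta')
  = \frac{L(\theta')\, q(\theta \mid \theta')}{L(\theta)\, q(\theta' \mid \theta)}
  = \frac{L(\theta'_k \mid \theta_{-k})\, L(\theta_{-k}) \cdot L(\theta_k \mid \theta_{-k})}
         {L(\theta_k \mid \theta_{-k})\, L(\theta_{-k}) \cdot L(\theta'_k \mid \theta_{-k})}
  = 1 .
\]
```

Every factor in the numerator also appears in the denominator, so the candidate is never rejected.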
WinBUGS • the movie!
Connection with Bayesian networks
• Arrows represent the simplest factorization, using conditional independence.
• Directed acyclic graph.
[Figure: directed acyclic graph with nodes u, v, x, d.]
Connection with Bayesian networks
• Directed acyclic graph; arrows represent the simplest factorization, using conditional independence.
• Derive the “full conditional” by dropping terms without v.
[Figure: node v with its neighborhood labeled parents(v), children(v), par(ch(v)), anc(par(v)), desc(ch(v)).]
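Dropping the terms without v can be written out as follows. This is a sketch of the standard result for a directed acyclic graph; the shorthand $\mathrm{par}(\cdot)$ and $\mathrm{ch}(\cdot)$ follows the node labels above, and the product index $w$ is introduced here for illustration.

```latex
% Joint distribution factorized along the arrows of the DAG:
\[
p(\text{all nodes}) = \prod_{w} p\bigl(w \mid \mathrm{par}(w)\bigr)
\]
% Only two kinds of factor mention v: its own term, and the terms of its children.
\[
p\bigl(v \mid \text{all other nodes}\bigr) \;\propto\;
p\bigl(v \mid \mathrm{par}(v)\bigr)
\prod_{w \in \mathrm{ch}(v)} p\bigl(w \mid \mathrm{par}(w)\bigr)
\]
```

The full conditional of v therefore involves only parents(v), children(v), and par(ch(v)); the anc(par(v)) and desc(ch(v)) terms drop out.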
[Figure: Reduced array, with index labels 1 … W and 1, n, N.]
Algorithm
• initialize the θ’s.
• initialize the p’s.
• pick a row l (at random? or in order?)
• update the p’s using Eq. 1.
• update θ(l) (randomly) using Eq. 2.
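The loop above can be sketched in Python under stated assumptions. Eq. 1 and Eq. 2 are not recoverable from this transcript, so the code substitutes stand-ins: the p's are taken to be per-column residue frequencies (with pseudocounts) estimated from all rows except row l, and the per-row quantity is taken to be a segment start offset, redrawn with probability proportional to the product of the p's over the segment. The names `column_probs`, `gibbs_align`, `width`, and `pseudo` are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 amino acids

def column_probs(seqs, starts, width, skip, pseudo=1.0):
    """Stand-in for Eq. 1: estimate p[j][a], the probability of residue a in
    column j, from every row except `skip` (the reduced array)."""
    probs = []
    for j in range(width):
        counts = {a: pseudo for a in ALPHABET}
        for r, (seq, st) in enumerate(zip(seqs, starts)):
            if r != skip:
                counts[seq[st + j]] += 1.0
        total = sum(counts.values())
        probs.append({a: c / total for a, c in counts.items()})
    return probs

def gibbs_align(seqs, width, n_sweeps, seed=0):
    """Cycle through the rows, re-estimating the p's and redrawing each row's offset."""
    rng = random.Random(seed)
    # Initialize the per-row offsets at random; the p's are recomputed from
    # them at each step, so they need no separate initialization here.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_sweeps):
        for l in range(len(seqs)):             # rows taken in order
            p = column_probs(seqs, starts, width, skip=l)
            weights = []
            for st in range(len(seqs[l]) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= p[j][seqs[l][st + j]]
                weights.append(w)
            # Stand-in for Eq. 2: draw the new offset with probability proportional to w.
            starts[l] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input with a shared "WWW" block at different offsets.
seqs = ["ACDWWWKLM", "GHWWWPQRS", "TVWWWACDE"]
starts = gibbs_align(seqs, width=3, n_sweeps=50)
print(starts)
```

Because each offset is drawn from a conditional distribution rather than proposed and tested, every update is accepted, which is the Gibbs property. Taking rows in order rather than at random is one of the two options the slide leaves open.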
Assignment for next week
Reminder: due next week:
• Identify the exons of TP53.
• State and test a hypothesis about the codons.