Maximum Likelihood Estimates and the EM Algorithms II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm
Include Functions in R
• source("file path")
• Example
  In MME.R:
    MME = function(y1, y2, y3, y4) {
      n = y1+y2+y3+y4;
      phi1 = 4.0*(y1/n-0.5);
      phi2 = 1-4*y2/n;
      phi3 = 1-4*y3/n;
      phi4 = 4.0*y4/n;
      phi = (phi1+phi2+phi3+phi4)/4.0;
      print("By MME method")
      return(phi); # print(phi);
    }
  In R:
    > source("C:/MME.R")
    > MME(125, 18, 20, 24)
    [1] "By MME method"
    [1] 0.5935829
Example 1 in Genetics (1)
• Two linked loci with alleles A and a, and B and b
• A, B: dominant
• a, b: recessive
• A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab
[Figure: cross of the double heterozygote AaBb showing the four gamete types]
Example 1 in Genetics (2)
• Probabilities for genotypes in gametes
[Figure: table of gamete probabilities for AB, Ab, aB, ab]
Example 1 in Genetics (3)
• Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
• More:
  http://en.wikipedia.org/wiki/Genetics
  http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (5)
• Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
• A*: the dominant phenotype from (Aa, AA, aA).
• a*: the recessive phenotype from aa.
• B*: the dominant phenotype from (Bb, BB, bB).
• b*: the recessive phenotype from bb.
• A*B*: 9 gametic combinations.
• A*b*: 3 gametic combinations.
• a*B*: 3 gametic combinations.
• a*b*: 1 gametic combination.
• Total: 16 combinations.
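These counts can be checked by brute force. The short R sketch below is an addition for illustration, not part of the original slides; the object names gametes, cross and phenotype are just illustrative choices. It enumerates the 16 equally weighted gametic combinations and tabulates the four phenotypes.

  # Enumerate the 16 gametic combinations and count the four phenotypes
  gametes <- c("AB", "Ab", "aB", "ab")
  cross <- expand.grid(father = gametes, mother = gametes, stringsAsFactors = FALSE)
  hasA <- grepl("A", cross$father) | grepl("A", cross$mother)   # dominant A* phenotype
  hasB <- grepl("B", cross$father) | grepl("B", cross$mother)   # dominant B* phenotype
  phenotype <- paste0(ifelse(hasA, "A*", "a*"), ifelse(hasB, "B*", "b*"))
  table(phenotype)   # A*B*: 9, A*b*: 3, a*B*: 3, a*b*: 1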
Example 1 in Genetics (6)
• Let φ = (1 − r)², where r is the recombination fraction between the two loci. Then the probabilities of the four phenotypes are
  P(A*B*) = (2 + φ)/4,
  P(A*b*) = P(a*B*) = (1 − φ)/4,
  P(a*b*) = φ/4.
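As a sanity check on this formula, the following R sketch (an addition, not from the original slides; phenotype.probs is a hypothetical helper) computes the four phenotype probabilities directly from the gamete probabilities (1 − r)/2, r/2, r/2, (1 − r)/2 for AB, Ab, aB, ab and compares them with (2 + φ)/4, (1 − φ)/4, (1 − φ)/4, φ/4.

  phenotype.probs <- function(r) {
    # gamete probabilities for AB, Ab, aB, ab from an AaBb parent
    g <- c(AB = (1-r)/2, Ab = r/2, aB = r/2, ab = (1-r)/2)
    p.ab <- g[["ab"]]^2                        # a*b*: both gametes are ab
    p.Ab <- (g[["Ab"]] + g[["ab"]])^2 - p.ab   # A*b*: both gametes carry b, not both ab
    p.aB <- (g[["aB"]] + g[["ab"]])^2 - p.ab   # a*B*: both gametes carry a, not both ab
    p.AB <- 1 - p.Ab - p.aB - p.ab             # A*B*: everything else
    c(p.AB, p.Ab, p.aB, p.ab)                  # order: A*B*, A*b*, a*B*, a*b*
  }
  r <- 0.2
  phi <- (1 - r)^2
  phenotype.probs(r)   # equals c((2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4)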
Example 1 in Genetics (7)
• Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
  (Y1, Y2, Y3, Y4) ~ Multinomial(n; (2 + φ)/4, (1 − φ)/4, (1 − φ)/4, φ/4).
• We know that 0 ≤ r ≤ 1/2, and so φ = (1 − r)² satisfies 1/4 ≤ φ ≤ 1.
Example 1 in Genetics (8)
• Suppose that we observe the data (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from Multinomial(n; (2 + φ)/4, (1 − φ)/4, (1 − φ)/4, φ/4) with n = 187. Then the probability mass function is
  f(y1, y2, y3, y4 | φ) = [n! / (y1! y2! y3! y4!)] ((2 + φ)/4)^y1 ((1 − φ)/4)^y2 ((1 − φ)/4)^y3 (φ/4)^y4.
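A quick way to see where this likelihood peaks is to evaluate it on a grid with the built-in dmultinom(). This sketch is an addition for illustration, not part of the original slides.

  # Evaluate the multinomial log-likelihood of the observed data on a grid of phi values
  y <- c(125, 18, 20, 24)
  loglik <- function(phi) {
    dmultinom(y, prob = c((2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4), log = TRUE)
  }
  phi.grid <- seq(0.1, 0.9, by = 0.01)
  ll <- sapply(phi.grid, loglik)
  phi.grid[which.max(ll)]   # roughly 0.58, close to the MLE found later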
Maximum Likelihood Estimate (MLE)
• Likelihood: L(θ) = f(x1, …, xn | θ) = ∏ᵢ f(xᵢ | θ) for an i.i.d. sample, viewed as a function of the parameter θ.
• Maximize likelihood: solve the score equations, which set the first derivatives of the log-likelihood to zero.
• Under regularity conditions, the MLE is consistent, asymptotically efficient and asymptotically normal!
• More: http://en.wikipedia.org/wiki/Maximum_likelihood
MLE for Example 1 (1)
• Likelihood:
  L(φ) ∝ (2 + φ)^y1 (1 − φ)^(y2+y3) φ^y4,
  so the log-likelihood is
  l(φ) = y1 log(2 + φ) + (y2 + y3) log(1 − φ) + y4 log(φ) + constant.
• MLE: solve the score equation
  l′(φ) = y1/(2 + φ) − (y2 + y3)/(1 − φ) + y4/φ = 0.
MLE for Example 1 (2)
• Multiplying the score equation by φ(2 + φ)(1 − φ) gives a quadratic equation in φ:
  Aφ² + Bφ + C = 0, with A = n, B = −[y1 − 2(y2 + y3) − y4], C = −2y4,
  where n = y1 + y2 + y3 + y4.
• The MLE is the root that lies in (0, 1):
  φ̂ = [−B + sqrt(B² − 4AC)] / (2A).
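The closed-form root can be computed directly in R. This short sketch (not in the original slides) plugs the observed counts into the quadratic above.

  # Closed-form MLE from the quadratic A*phi^2 + B*phi + C = 0
  y1 <- 125; y2 <- 18; y3 <- 20; y4 <- 24
  n <- y1 + y2 + y3 + y4
  A <- n
  B <- -(y1 - 2*(y2 + y3) - y4)
  C <- -2*y4
  phi.hat <- (-B + sqrt(B^2 - 4*A*C)) / (2*A)
  phi.hat   # about 0.5778, the root in (0, 1)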
MLE for Example 1 (3)
• Checking:
• 1. Does the solution φ̂ fall in the admissible range (0, 1)?
• 2. Is the second derivative of the log-likelihood negative at φ̂, so that φ̂ is a local maximum?
• 3. Compare the log-likelihood at φ̂ with its values at the boundary points: is φ̂ the global maximum?
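These checks can also be carried out numerically. The sketch below is an addition, not in the original slides: it maximizes the log-likelihood with optimize(), verifies that the second derivative is negative at the maximizer, and reports the asymptotic standard error from the expected (Fisher) information for this model.

  y1 <- 125; y2 <- 18; y3 <- 20; y4 <- 24
  loglik <- function(phi) {
    y1*log(2+phi) + (y2+y3)*log(1-phi) + y4*log(phi)
  }
  opt <- optimize(loglik, interval = c(1e-6, 1-1e-6), maximum = TRUE)
  opt$maximum                       # about 0.5778, inside (0, 1)
  # Second derivative of the log-likelihood at the maximizer (should be negative)
  d2 <- -y1/(2+opt$maximum)^2 - (y2+y3)/(1-opt$maximum)^2 - y4/opt$maximum^2
  d2 < 0
  # Asymptotic standard error from the expected (Fisher) information
  n <- y1 + y2 + y3 + y4
  info <- n*(1/(4*(2+opt$maximum)) + 1/(2*(1-opt$maximum)) + 1/(4*opt$maximum))
  1/sqrt(info)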
A Banach Space
• A Banach space B is a vector space over the field ℝ (or ℂ), equipped with a norm ‖·‖, such that every Cauchy sequence of B converges in B (i.e., B is complete with respect to the norm).
• More: http://en.wikipedia.org/wiki/Banach_space
Lipschitz Continuous
• Let D be a closed subset of a Banach space B and f: D → B a mapping.
• f is Lipschitz continuous on D with constant λ if
  ‖f(x) − f(y)‖ ≤ λ ‖x − y‖ for all x, y in D.
• f is a contraction mapping on D if f is Lipschitz continuous with λ < 1 and f(D) ⊆ D.
• More: http://en.wikipedia.org/wiki/Lipschitz_continuous
Fixed Point Theorem (1)
• If f is a contraction mapping on D, i.e., f is Lipschitz continuous on D with constant λ < 1 and f(D) ⊆ D, then
• f has a unique fixed point x* in D such that f(x*) = x*, and
• for any initial value x0 in D, the iteration x_{k+1} = f(x_k) converges to x*.
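A generic version of this iteration is easy to write in R. The sketch below is an illustration, not part of the original slides; FixedPoint, tol and max.iter are hypothetical names, and cos is used only as a simple example of a contraction near its fixed point.

  # Repeatedly apply f until successive iterates are closer than a tolerance
  FixedPoint <- function(f, x0, tol = 1e-8, max.iter = 1000) {
    x <- x0
    for (k in 1:max.iter) {
      x.new <- f(x)
      if (abs(x.new - x) < tol) return(list(fixed.point = x.new, iterations = k))
      x <- x.new
    }
    warning("maximum number of iterations reached")
    list(fixed.point = x, iterations = max.iter)
  }
  # Example: cos is a contraction near its fixed point (about 0.739)
  FixedPoint(cos, x0 = 1)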
Fixed Point Theorem (2) • More: http://en.wikipedia.org/wiki/Fixed-point_theorem http://www.math-linux.com/spip.php?article60
Applications for MLE (1)
• Numerical solution of the score equation l′(φ) = 0: rewrite it as a fixed-point problem φ = g(φ) with
  g(φ) = φ + α l′(φ), α ≠ 0.
• If g is a contraction mapping on an interval D containing the root, then for any initial value φ^(0) in D the iteration
  φ^(k+1) = g(φ^(k)) = φ^(k) + α l′(φ^(k))
  converges to the unique fixed point φ*, and l′(φ*) = 0.
Applications for MLE (2)
• How to choose α such that g(φ) = φ + α l′(φ) is a contraction mapping?
• Optimal α?
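One way to answer these questions numerically: for g(φ) = φ + α l′(φ), the derivative g′(φ) = 1 + α l″(φ) must stay below 1 in absolute value on an interval around the root for g to be a contraction there, and α = −1/l″(φ̂) makes g′(φ̂) = 0. The sketch below is an addition, not from the original slides (contraction.constant is a hypothetical helper); it checks this for Example 1 with the parallel chord choice α = −1/l″(0.9).

  y1 <- 125; y2 <- 18; y3 <- 20; y4 <- 24
  f1 <- function(phi) y1/(2+phi) - (y2+y3)/(1-phi) + y4/phi      # l'(phi)
  f2 <- function(phi) -y1/(2+phi)^2 - (y2+y3)/(1-phi)^2 - y4/phi^2   # l''(phi)
  contraction.constant <- function(alpha, lower = 0.4, upper = 0.7) {
    grid <- seq(lower, upper, by = 0.001)
    max(abs(1 + alpha*f2(grid)))     # Lipschitz constant of g on the interval
  }
  alpha <- -1/f2(0.9)                # the parallel chord choice with initial value 0.9
  contraction.constant(alpha)        # below 1 on this interval
  # An "optimal" alpha makes g'(phi.hat) = 0, i.e. alpha = -1/l''(phi.hat)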
Parallel Chord Method (1)
• The parallel chord method is also called simple iteration: the fixed-point iteration φ^(k+1) = φ^(k) + α l′(φ^(k)) is run with the constant α = −1/l″(φ^(0)), so every chord drawn in the plot has the same slope l″(φ^(0)).
Plot Parallel Chord Method by R (1)
### Simple iteration ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Parallel chord method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)
Plot Parallel Chord Method by R (2)
phi0 = 0.25   # Given the initial value 0.25 #
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "red")
phi1 = -b0/f2(phi0)   # Find the closer phi #
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")
Define Functions for Example 1 in R
• We will define some functions and variables for finding the MLE in Example 1 by R.
# First, second and third derivatives of log likelihood #
f1 = function(y1, y2, y3, y4, phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 = function(y1, y2, y3, y4, phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
f3 = function(y1, y2, y3, y4, phi) {2*y1*(2+phi)^(-3)-2*(y2+y3)*(1-phi)^(-3)+2*y4*(phi)^(-3)}
# Expected second derivative of log likelihood (the negative of the Fisher information) #
I = function(y1, y2, y3, y4, phi) {(-1)*(y1+y2+y3+y4)*(1/(4*(2+phi))+1/(2*(1-phi))+1/(4*phi))}
y1 = 125; y2 = 18; y3 = 20; y4 = 24; initial = 0.9
Parallel Chord Method by R (1)
> fix(SimpleIteration)
function(y1, y2, y3, y4, initial){
  phi = NULL;
  i = 0;
  alpha = -1.0/f2(y1, y2, y3, y4, initial);
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-5){
    i = i+1;
    phi1 = phi2;
    phi2 = alpha*f1(y1, y2, y3, y4, phi1)+phi1;
    phi[i] = phi2;
  }
  print("By parallel chord method (simple iteration)");
  return(list(phi = phi2, iteration = phi));
}
Parallel Chord Method by R (2) > SimpleIteration(y1, y2, y3, y4, initial)
Newton-Raphson Method (1) • http://math.fullerton.edu/mathews/n2003/Newton'sMethodMod.html • http://en.wikipedia.org/wiki/Newton%27s_method
Plot Newton-Raphson Method by R (1)
### Newton-Raphson Method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
# First and second derivatives of log likelihood #
f1 <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
f2 <- function(phi) {(-1)*y1*(2+phi)^(-2)-(y2+y3)*(1-phi)^(-2)-y4*(phi)^(-2)}
x = c(10:80)*0.01
y = f1(x)
plot(x, y, type = 'l', main = "Newton-Raphson method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)
Plot Newton-Raphson Method by R (2)
# Given the initial value 0.25 #
phi0 = 0.25
segments(0, f1(phi0), phi0, f1(phi0), col = "green", lty = 2)
segments(phi0, f1(phi0), phi0, -200, col = "green", lty = 2)
# Use the tangent line to find the intercept b0 #
b0 = f1(phi0)-f2(phi0)*phi0
curve(f2(phi0)*x+b0, add = T, col = "purple", lwd = 2)
# Find the closer phi #
phi1 = -b0/f2(phi0)
segments(phi1, -200, phi1, f1(phi1), col = "green", lty = 2)
segments(0, f1(phi1), phi1, f1(phi1), col = "green", lty = 2)
Plot Newton-Raphson Method by R (3)
# Use the parallel line to find the intercept b1 #
b1 = f1(phi1)-f2(phi0)*phi1
curve(f2(phi0)*x+b1, add = T, col = "red")
curve(f2(phi1)*x-f2(phi1)*phi1+f1(phi1), add = T, col = "blue", lwd = 2)
legend(0.45, 250, c("Newton-Raphson", "Parallel chord method"),
       col = c("blue", "red"), lty = c(1, 1))
Newton-Raphson Method by R (1)
> fix(Newton)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    phi2 = phi1-f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);
    phi[i] = phi2;
  }
  print("By Newton-Raphson method");
  return(list(phi = phi2, iteration = phi));
}
Newton-Raphson Method by R (2)
> Newton(125, 18, 20, 24, 0.9)
[1] "By Newton-Raphson method"
$phi
[1] 0.577876
$iteration
[1] 0.8193054 0.7068499 0.6092827 0.5792747 0.5778784 0.5778760 0.5778760
Halley’s Method
• The Newton-Raphson iteration function is
  g(φ) = φ − l′(φ)/l″(φ).
• It is possible to speed up convergence by using more expansion terms than the Newton-Raphson method does when the objective function is very smooth, as in the method of Edmond Halley (1656-1742):
  g(φ) = φ − [l′(φ)/l″(φ)] · [1 − l′(φ) l‴(φ) / (2 l″(φ)²)]^(−1)
  (http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html)
Halley’s Method by R (1)
> fix(Halley)
function(y1, y2, y3, y4, initial){
  i = 0;
  phi = NULL;
  phi2 = initial;
  phi1 = initial+1;
  while(abs(phi1-phi2) >= 1.0E-6){
    i = i+1;
    phi1 = phi2;
    # Halley's update: the Newton step divided by the correction factor #
    u = f1(y1, y2, y3, y4, phi1)/f2(y1, y2, y3, y4, phi1);
    phi2 = phi1-u/(1.0-u*f3(y1, y2, y3, y4, phi1)/(2.0*f2(y1, y2, y3, y4, phi1)));
    phi[i] = phi2;
  }
  print("By Halley method");
  return(list(phi = phi2, iteration = phi));
}
Halley’s Method by R (2)
> Halley(125, 18, 20, 24, 0.9)
[1] "By Halley method"
$phi
[1] 0.577876
$iteration
[1] 0.5028639 0.5793692 0.5778760 0.5778760
Bisection Method (1)
• Assume that f ∈ C[a, b] and that there exists a number r ∈ [a, b] such that f(r) = 0. If f(a) and f(b) have opposite signs, and {c_n} represents the sequence of midpoints generated by the bisection process, then
  |r − c_n| ≤ (b − a)/2^(n+1) for n = 0, 1, 2, …,
  and the sequence {c_n} converges to r.
• That is, lim_{n→∞} c_n = r.
• http://en.wikipedia.org/wiki/Bisection_method
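The error bound also tells how many bisection steps guarantee a given tolerance. The one-line sketch below is an addition, not from the original slides; it computes that number for the interval and tolerance used in the Bisection() code on the following slides (bisection.steps is a hypothetical helper).

  # Number of steps so that (b - a)/2^(n+1) falls below the tolerance delta
  bisection.steps <- function(a, b, delta) ceiling((log(b - a) - log(delta))/log(2))
  bisection.steps(0.25, 1, 1e-6)   # about 20 steps for the interval used later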
Plot the Bisection Method by R
### Bisection method ###
y1 = 125; y2 = 18; y3 = 20; y4 = 24
f <- function(phi) {y1/(2+phi)-(y2+y3)/(1-phi)+y4/phi}
x = c(1:100)*0.01
y = f(x)
plot(x, y, type = 'l', main = "Bisection method",
     xlab = expression(varphi),
     ylab = "First derivative of log likelihood function")
abline(h = 0)
abline(v = 0.5, col = "red")
abline(v = 0.75, col = "red")
text(0.49, 2200, labels = "1")
text(0.74, 2200, labels = "2")
Bisection Method by R (1)
> fix(Bisection)
function(y1, y2, y3, y4, A, B)   # A, B is the boundary of parameter #
{
  Delta = 1.0E-6;   # Tolerance for width of interval #
  Satisfied = 0;    # Condition for loop termination #
  phi = NULL;
  YA = f1(y1, y2, y3, y4, A);   # Compute function values #
  YB = f1(y1, y2, y3, y4, B);
  # Calculation of the maximum number of iterations #
  Max = as.integer(1+floor((log(B-A)-log(Delta))/log(2)));
  # Check to see if the bisection method applies #
  if(((YA >= 0) & (YB >= 0)) || ((YA < 0) & (YB < 0))){
    print("The function values at the boundary points do not differ in sign.");
Bisection Method by R (2)
    print("Therefore, this method is not appropriate here.");
    return(NULL);   # Exit the function #
  }
  for(K in 1:Max){
    if(Satisfied == 1) break;
    C = (A+B)/2;   # Midpoint of interval #
    YC = f1(y1, y2, y3, y4, C);   # Function value at midpoint #
    # (K-1 is 0 on the first pass, so the first midpoint is not recorded) #
    if((K-1) < 100) phi[K-1] = C;
    if(YC == 0){
      A = C;   # Exact root is found #
      B = C;
    }
    else{
      if((YB*YC) >= 0){
Bisection Method by R (3)
        B = C;   # Squeeze from the right #
        YB = YC;
      }
      else{
        A = C;   # Squeeze from the left #
        YA = YC;
      }
    }
    if((B-A) < Delta) Satisfied = 1;   # Check for early convergence #
  }   # End of 'for'-loop #
  print("By Bisection Method");
  return(list(phi = C, iteration = phi));
}
Bisection Method by R (4)
> Bisection(125, 18, 20, 24, 0.25, 1)
[1] "By Bisection Method"
$phi
[1] 0.5778754
$iteration
 [1] 0.4375000 0.5312500 0.5781250 0.5546875 0.5664062 0.5722656 0.5751953
 [8] 0.5766602 0.5773926 0.5777588 0.5779419 0.5778503 0.5778961 0.5778732
[15] 0.5778847 0.5778790 0.5778761 0.5778747 0.5778754