
§❶ Review of Likelihood Inference



  1. §❶ Review of Likelihood Inference Robert J. Tempelman

  2. Likelihood Inference Data Analysts: Don’t throw away that Math Stats text just yet!!! Meaningful computing skills are a plus!
  • Necessary prerequisites to understanding Bayesian inference
  • Distribution theory
  • Calculus
  • Asymptotic theory (e.g., Taylor expansions)
  • Numerical methods/optimization
  • Simulation-based analyses
  • Programming skills
  • SAS PROC ???? or R package ???? is only really a start to understanding data analysis.
  • I don’t think that SAS PROC MCMC (version 9.3)/WinBUGS is a fix to all of your potential Bayesian inference problems.

  3. The “simplest” model
  • Basic mean model: y_i = μ + e_i, i = 1, …, n
  • Common distributional assumption: e_i ~ NIID(0, σ²_e), i.e., y_i ~ N(μ, σ²_e)
  • What does this mean? Think pdf!!! (pdf: probability density function)
  • Conditional independence → the joint pdf is the product of the independent pdfs: p(y | μ, σ²_e) = ∏_i p(y_i | μ, σ²_e)

  4. Likelihood function
  Regard the joint pdf as a function of the parameters:
  L(μ, σ²_e | y) ∝ p(y | μ, σ²_e) = ∏_i (2πσ²_e)^(-1/2) exp{-(y_i - μ)²/(2σ²_e)}   (‘∝’ means ‘proportional to’)
  Simplify the joint pdf further:
  L(μ, σ²_e | y) ∝ (σ²_e)^(-n/2) exp{-Σ_i (y_i - μ)²/(2σ²_e)}

  5. Maximum likelihood estimation
  • Maximize L(μ, σ²_e | y) with respect to the unknowns.
  • Well actually, we directly maximize the log likelihood: l(μ, σ²_e | y) = -(n/2) log(σ²_e) - Σ_i (y_i - μ)²/(2σ²_e) + constant
  • One strategy: Use first derivatives: i.e., determine ∂l/∂μ and ∂l/∂σ²_e and set to 0.
  • Result? μ̂ = ȳ and σ̂²_e = Σ_i (y_i - ȳ)²/n (note the divisor n, not n - 1).
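
  To connect these closed-form results to software, here is a minimal SAS sketch (small hypothetical data set) that computes ȳ and the ML variance estimate; VARDEF=N makes PROC MEANS divide by n rather than n - 1:

  data ex_normal;                 /* hypothetical data set for illustration */
    input y @@;
    cards;
  9.1 10.4 11.2 9.8 10.6
  ;
  run;
  proc means data=ex_normal vardef=n n mean var;  /* VARDEF=N gives the ML (divide-by-n) variance */
    var y;
  run;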

  6. Example Data, Log Likelihood & Maximum Likelihood estimates

  7. Likelihood inference for discrete data
  Consider the binomial distribution: p(y | n, p) ∝ p^y (1 - p)^(n - y), so l(p | y) = y log(p) + (n - y) log(1 - p) + constant.
  Set dl/dp = y/p - (n - y)/(1 - p) to zero → p̂ = y/n.
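
  A short SAS check with hypothetical counts (y successes out of n trials) confirms that the score equation gives p̂ = y/n:

  data _null_;
    y = 7; n = 20;                                /* hypothetical counts */
    phat = y/n;                                   /* root of y/p - (n-y)/(1-p) = 0 */
    loglike = y*log(phat) + (n-y)*log(1-phat);    /* binomial log likelihood kernel at the MLE */
    put phat= loglike=;
  run;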

  8. Sometimes iterative solutions are required
  • First-derivative-based methods can be slow for some problems.
  • Second-derivative methods are often desirable, e.g., Newton-Raphson:
  • Generally faster
  • Provide asymptotic standard errors as a useful by-product

  9. Plant Genetics Example (Rao, 1971)
  y1, y2, y3, and y4 are the observed numbers of 4 different phenotypes involving genotypes differing at two loci from the progeny of self-fertilized heterozygotes (AaBb). Under genetic theory, the distribution of the four phenotypes (with complete dominance at each locus) is multinomial.

  10. Probabilities
  Under the multinomial model the four phenotype probabilities are (2 + θ)/4, (1 - θ)/4, (1 - θ)/4, and θ/4, with 0 ≤ θ ≤ 1.
  θ → 0: close linkage in repulsion; θ → 1: close linkage in coupling.

  11. Genetic Illustration of Coupling/Repulsion (figure: gamete arrangements at the two loci, contrasting coupling and repulsion for θ = 0 and θ = 1)

  12. Likelihood function
  Given y = (y1, y2, y3, y4) multinomial with the probabilities above:
  L(θ | y) ∝ (2 + θ)^y1 (1 - θ)^(y2+y3) θ^y4
  l(θ | y) = y1 log(2 + θ) + (y2 + y3) log(1 - θ) + y4 log(θ) + constant

  13. First and second derivatives
  First derivative: l'(θ) = y1/(2 + θ) - (y2 + y3)/(1 - θ) + y4/θ
  Second derivative: l''(θ) = -y1/(2 + θ)² - (y2 + y3)/(1 - θ)² - y4/θ²
  Recall the Newton-Raphson algorithm: θ^[t+1] = θ^[t] - l'(θ^[t])/l''(θ^[t])

  14. Newton-Raphson: SAS data step and output
  data newton;
    y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
    theta = 0.01; /* try starting value of 0.50 too */
    do iterate = 1 to 5;
      loglike = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
      firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
      secndder = (-y1/(2+theta)**2 - (y2+y3)/(1-theta)**2 - y4/theta**2);
      theta = theta + firstder/(-secndder);
      output;
    end;
    asyvar = 1/(-secndder); /* asymptotic variance of theta_hat at convergence */
    output;
  run;
  proc print data=newton;
    var iterate theta loglike;
  run;

  15. Asymptotic standard errors
  Given the observed information I(θ̂) = -l''(θ̂), the asymptotic variance of θ̂ is I(θ̂)⁻¹, and the asymptotic standard error is its square root.
  proc print data=newton;
    var asyvar;
  run;

  16. Alternative to Newton-Raphson
  • Fisher’s scoring: substitute the expected information E[-l''(θ)] for the observed information -l''(θ) in Newton-Raphson.
  • Now E[y1] = n(2 + θ)/4, E[y2 + y3] = n(1 - θ)/2, and E[y4] = nθ/4, with n = y1 + y2 + y3 + y4.
  • Then E[-l''(θ)] = (n/4)[1/(2 + θ) + 2/(1 - θ) + 1/θ].

  17. Fisher scoring: SAS data step and output
  In some applications, Fisher’s scoring is easier than Newton-Raphson…but the observed information is probably more reliable than the expected information (Efron and Hinkley, 1978).
  data newton;
    y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
    n = y1 + y2 + y3 + y4;
    theta = 0.01; /* try starting value of 0.50 too */
    do iterate = 1 to 5;
      loglike = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
      firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
      secndder = (n/4)*(-1/(2+theta) - 2/(1-theta) - 1/theta);
      theta = theta + firstder/(-secndder);
      output;
    end;
    asyvar = 1/(-secndder); /* asymptotic variance of theta_hat at convergence */
    output;
  run;
  proc print data=newton;
    var iterate theta loglike;
  run;

  18. Extensions to multivariate θ
  Suppose that θ is a p × 1 vector with score vector g(θ) = ∂l/∂θ and Hessian H(θ) = ∂²l/∂θ∂θ'.
  Newton-Raphson: θ^[t+1] = θ^[t] - [H(θ^[t])]⁻¹ g(θ^[t])
  Fisher’s scoring: θ^[t+1] = θ^[t] + [E(-H(θ^[t]))]⁻¹ g(θ^[t]), or, equivalently, replace the Hessian by its expectation in Newton-Raphson.
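
  As an illustration of the multivariate update, here is a minimal PROC IML sketch (hypothetical data; θ = (μ, σ²_e)' for the simple normal model of slides 3-5) that applies Fisher’s scoring; the estimates settle at ȳ and Σ_i (y_i - ȳ)²/n:

  proc iml;
    y = {9.1, 10.4, 11.2, 9.8, 10.6};       /* hypothetical data vector */
    n = nrow(y);
    theta = {10, 1};                        /* starting values for (mu, sigma2) */
    do iter = 1 to 10;
      mu = theta[1]; s2 = theta[2];
      r = y - mu;
      /* score vector: first derivatives of the normal log likelihood */
      g = ( sum(r)/s2 ) // ( -n/(2*s2) + sum(r##2)/(2*s2##2) );
      /* expected information matrix: diagonal for this model */
      Info = diag( (n/s2) // (n/(2*s2##2)) );
      theta = theta + inv(Info)*g;          /* Fisher's scoring update */
    end;
    print theta;                            /* converges to ybar and sum((y-ybar)##2)/n */
  quit;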

  19. Generalized linear models
  • For multifactorial analysis of non-normal (binary, count) data.
  • Consider the probit link binary model.
  • Implies the existence of normally distributed latent (underlying) variables, the liabilities ℓ_i.
  • Could do something similar for the logistic link binary model.
  • Consider a simple population mean model: ℓ_i = μ + e_i; e_i ~ N(0, σ²_e)
  • Let μ = 10 and σ_e = 2

  20. The liability (latent variable) concept
  The liability ℓ_i ~ N(μ = 10, σ²_e) with σ_e = 2 is compared to a threshold τ = 12:
  ℓ_i ≤ τ → Y = 0 (“failure”); ℓ_i > τ → Y = 1 (“success”)
  i.e., probability of “success” = Pr(ℓ_i > 12) = 1 - Φ((12 - 10)/2) = 1 - Φ(1) = 15.87%
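
  The 15.87% figure can be verified directly in SAS with the standard normal cdf:

  data _null_;
    mu = 10; sige = 2; tau = 12;                 /* values from the slide */
    p_success = 1 - probnorm((tau - mu)/sige);   /* Pr(liability > threshold) */
    put p_success=;                              /* 0.1587 */
  run;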

  21. Inferential utopia!
  • Suppose we’re able to measure the liabilities ℓ directly.
  • Also suppose a more general multi-population (trt) model: ℓ = Xα + e; e ~ N(0, R); typically R = Iσ²_e
  • Then ML(α) = OLS(α): α̂ = (X'X)⁻¹X'ℓ
  • But (sigh…), we of course don’t generally observe ℓ.

  22. Suppose there are 3 subclasses
  ℓ = Xα + e, with mean liabilities of 9, 10, and 11 for Herd 1, Herd 2, and Herd 3, respectively.
  Use the “corner parameterization”: the herd means are written as x_i'α = μ + x_i*'α*, with one herd effect set to zero.

  23. Probability of success as a function of effects (can’t observe liabilities…just observed binary data)
  (figure: shaded areas above the threshold under each herd’s liability density give the success probabilities)

  24. Reparameterize model
  For Herd 1, Herd 2, and Herd 3, let δ denote the threshold and x_i'α = (μ + x_i*'α*) the herd means. These cannot be estimated separately from σ²_e…i.e., σ²_e is not identifiable.

  25. Reparameterize the model again
  Consider the remaining parameters as standardized ratios: τ = δ/σ_e, ξ = μ/σ_e, and β = α*/σ_e → the same as constraining σ_e = 1.
  Notice that the new threshold is now 12/2 = 6, whereas the mean responses for the three herds are now 9/2 = 4.5, 10/2 = 5, and 11/2 = 5.5.

  26. There is still another identifiability problem
  • Between τ and ξ.
  • One solution? “Zero out” τ.
  • Notice that the new threshold is now 0, whereas the mean responses for the three herds are now 4.5 - 6 = -1.5, 5 - 6 = -1, and 5.5 - 6 = -0.5.
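
  A small data-step check tracks the same arithmetic: standardize the herd mean liabilities and the threshold by σ_e, then subtract the standardized threshold (“zero it out”):

  data _null_;
    sige = 2; tau = 12;                  /* residual SD and threshold from the slides */
    array mu{3} _temporary_ (9 10 11);   /* herd mean liabilities */
    do h = 1 to 3;
      std = mu{h}/sige - tau/sige;       /* standardized mean minus standardized threshold */
      put h= std=;                       /* -1.5, -1, -0.5 */
    end;
  run;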

  27. Higher values of … translate into lower probabilities of disease. Note that …

  28. Deriving likelihood function
  Given y_i = 0, 1 with Pr(y_i = 1 | β) = Φ(x_i'β), i.e., p(y_i | β) = Φ(x_i'β)^y_i [1 - Φ(x_i'β)]^(1 - y_i).
  Suppose you have a second animal (i′). If animals i and i′ are conditionally independent, then p(y_i, y_i′ | β) = p(y_i | β) p(y_i′ | β).
  Example: Pr(y_i = 1, y_i′ = 0 | β) = Φ(x_i'β) [1 - Φ(x_i′'β)].

  29. Deriving likelihood function
  • More general case: n animals, conditional independence →
  • So…the likelihood function for the probit model: L(β | y) = ∏_i Φ(x_i'β)^y_i [1 - Φ(x_i'β)]^(1 - y_i)
  • Alternative: logistic model: replace Φ(x_i'β) with exp(x_i'β)/(1 + exp(x_i'β)).
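
  The two links differ only in the cdf applied to the linear predictor; a quick SAS comparison at a single (hypothetical) value of x_i'β:

  data _null_;
    eta = 0.5;                             /* hypothetical linear predictor value */
    p_probit = probnorm(eta);              /* Phi(eta) */
    p_logistic = exp(eta)/(1 + exp(eta));  /* logistic cdf */
    put p_probit= p_logistic=;
  run;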

  30. Small probit regression example
  Link function = probit. Data (12 observations):
  x: 40 40 40 43 43 43 47 47 47 50 50 50
  y:  1  1  0  0  1  0  0  0  1  0  0  0

  31. Log likelihood
  l(β | y) = Σ_i { y_i log Φ(x_i'β) + (1 - y_i) log[1 - Φ(x_i'β)] }
  Fisher’s scoring: β^[t+1] = β^[t] + (X'WX)⁻¹ X'Wv, with p_i = Φ(x_i'β), weights w_i = φ(x_i'β)²/[p_i(1 - p_i)], and working residuals v_i = (y_i - p_i)/φ(x_i'β).
  The Newton-Raphson equations can be written in the same weighted-least-squares form, with the observed rather than expected second derivatives.
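
  Here is a minimal PROC IML sketch of that scoring algorithm for the small example (the data appear in the SAS program on the next slide); it fits the same model that PROC GENMOD fits below and returns asymptotic standard errors from the inverse information:

  proc iml;
    xcov = {40,40,40,43,43,43,47,47,47,50,50,50};   /* covariate values */
    y    = {1,1,0,0,1,0,0,0,1,0,0,0};               /* binary responses */
    n    = nrow(y);
    Xd   = j(n,1,1) || xcov;                        /* design matrix: intercept + slope */
    beta = {0, 0};                                  /* starting values */
    do iter = 1 to 25;
      eta = Xd*beta;
      p   = probnorm(eta);                          /* Phi(eta) */
      d   = pdf("Normal", eta);                     /* phi(eta) */
      w   = d##2 / (p#(1-p));                       /* Fisher scoring weights */
      score = Xd` * ( d # (y-p) / (p#(1-p)) );      /* first derivatives */
      Info  = Xd` * diag(w) * Xd;                   /* expected information X'WX */
      beta  = beta + inv(Info)*score;               /* Fisher's scoring update */
    end;
    se = sqrt(vecdiag(inv(Info)));                  /* asymptotic standard errors */
    print beta se;
  quit;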

  32. A SAS program
  data example_binary;
    input x y;
    cards;
  40 1
  40 1
  40 0
  43 0
  43 1
  43 0
  47 0
  47 0
  47 1
  50 0
  50 0
  50 0
  ;
  run;
  proc genmod data=example_binary descending;
    class y;
    model y = x / dist=bin link=probit;
    contrast 'slope' x 1;
  run;

  33. Key output

  34. Wald test
  • Asymptotic inference: avar(β̂) = (X'WX)⁻¹ evaluated at β̂.
  • Reported standard errors are square roots of the diagonals.
  • Hypothesis test of H0: K'β = 0: the Wald statistic (K'β̂)'[K'(X'WX)⁻¹K]⁻¹(K'β̂) is asymptotically χ² with rank(K) degrees of freedom.
  • When is n “large enough” for this to be trustworthy????

  35. Likelihood ratio test
  Reduced model (H0: β1 = 0):
  proc genmod data=example_binary descending;
    class y;
    model y = / dist=bin link=probit;
  run;
  -2(logL_reduced - logL_full) = -2(-7.63 - (-6.21)) = 2.84
  The p-value for H0: β1 = 0 is Prob(χ²_1 > 2.84) = 0.09. Again…asymptotic.
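
  The quoted statistic and p-value follow directly from the two log likelihoods:

  data _null_;
    lrt = -2*(-7.63 - (-6.21));     /* reduced vs. full log likelihoods from the slide */
    p = 1 - probchi(lrt, 1);        /* upper-tail chi-square probability, 1 df */
    put lrt= p=;                    /* 2.84 and about 0.09 */
  run;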

  36. A PROC GLIMMIX “fix” for uncertainty: use asymptotic F-tests rather than χ²-tests (“less asymptotic?”)
  proc glimmix data=example_binary;
    model y = x / dist=bin link=probit;
    contrast 'slope' x 1;
  run;

  37. Ordinal Categorical Data
  • How I learned this: “Sire evaluation for ordered categorical data with a threshold model” by Dan Gianola and Jean Louis Foulley (1983), Genetics, Selection, Evolution 15:201-224 (GF83). See also Harville and Mee (1984), Biometrics (HM84).
  • Application: calving ease scores (0 = unassisted, 5 = Caesarean), determined by an underlying continuous liability relative to a set of thresholds.

  38. Liabilities: σ_e = 2. Consider three different herds/subclasses.

  39. Underlying normal densities for each of three herds. Probabilities highlighted for Herd 2

  40. Constraints
  It is not really possible to separately estimate σ_e from τ1, τ2, μ1, μ2, and μ3. Define instead ℓ* = ℓ/σ_e, τ1* = τ1/σ_e, τ2* = τ2/σ_e, μ1* = μ1/σ_e, μ2* = μ2/σ_e, and μ3* = μ3/σ_e.

  41. Yet another constraint required
  Suppose we use the corner parameterization; when expressed as ratios over σ_e, the overall mean is μ* = 5.5 and the thresholds are τ1* = 4.0 and τ2* = 6.0, such that τ1* and τ2* are not separately identifiable from μ*.
  One solution: “zero out” μ*, i.e.,
  τ1** = τ1* - μ* = 4.0 - 5.5 = -1.5
  τ2** = τ2* - μ* = 6.0 - 5.5 = +0.5

  42. (figure: liability scale after “zeroing out” μ*, showing the herd means μ** and the shifted thresholds)
  τ1** = τ1* - μ* = 4.0 - 5.5 = -1.5
  τ2** = τ2* - μ* = 6.0 - 5.5 = +0.5

  43. Alternative constraint
  Estimate μ but “zero out” one of τ1 or τ2, say τ1. Start with μ* = 5.5, τ1* = 4.0, and τ2* = 6.0. Then:
  μ** = μ* - τ1* = 5.5 - 4.0 = 1.5
  τ2** = τ2* - τ1* = 6.0 - 4.0 = 2.0
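
  Both constraint choices (slides 41 and 43) amount to simple shifts of the standardized parameters, as a quick data-step check shows:

  data _null_;
    t1 = 4.0; t2 = 6.0; mu = 5.5;       /* standardized values from the slides */
    t1_a = t1 - mu; t2_a = t2 - mu;     /* constraint A: "zero out" mu   => -1.5, +0.5 */
    mu_b = mu - t1; t2_b = t2 - t1;     /* constraint B: "zero out" tau1 =>  1.5,  2.0 */
    put t1_a= t2_a= / mu_b= t2_b=;
  run;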

  44. One last constraint possibility
  Set τ1 = 0 and τ2 to an arbitrary value > τ1, and infer upon σ_e. Say σ_e = 2: τ1 fixed to 0; τ2 fixed to 4.

  45. Likelihood function for Ordinal Categorical Data
  Based on the multinomial (m categories), where Pr(Y_i = j) = Φ(τ_j - η_i) - Φ(τ_{j-1} - η_i), with τ_0 = -∞, τ_m = +∞, and η_i = x_i'β.
  Likelihood: L(τ, β | y) = ∏_i ∏_j [Φ(τ_j - η_i) - Φ(τ_{j-1} - η_i)]^{1(Y_i = j)}
  Log likelihood: l(τ, β | y) = Σ_i Σ_j 1(Y_i = j) log[Φ(τ_j - η_i) - Φ(τ_{j-1} - η_i)]
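
  In SAS, a cumulative-probit (threshold) model of this form can be fit by ML with PROC GENMOD; a minimal sketch, assuming a hypothetical data set calving with an ordinal variable score and a herd factor:

  proc genmod data=calving;
    class herd;
    model score = herd / dist=multinomial link=cumprobit;  /* cumulative probit threshold model */
  run;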

  46. Hypothetical small example
  • Ordinal outcome having 3 possible categories.
  • Two subjects in the dataset: the first subject has a response of 1 whereas the second has a response of 3.
  • Their contribution to the log likelihood (see the sketch below): l = log Φ(τ1 - η1) + log[1 - Φ(τ2 - η2)].
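
  A minimal data-step sketch evaluates that contribution, assuming hypothetical values for the two thresholds and the subjects’ linear predictors:

  data _null_;
    t1 = 0; t2 = 2;                      /* hypothetical thresholds */
    eta1 = 0.5; eta2 = 0.5;              /* hypothetical linear predictors */
    l1 = log( probnorm(t1 - eta1) );     /* subject 1: Pr(Y = 1) = Phi(t1 - eta1) */
    l2 = log( 1 - probnorm(t2 - eta2) ); /* subject 2: Pr(Y = 3) = 1 - Phi(t2 - eta2) */
    loglike = l1 + l2;
    put loglike=;
  run;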

  47. Solving for ML
  • Let’s use Fisher’s scoring: θ^[t+1] = θ^[t] + [E(-H)]⁻¹ g.
  • For a three+ category problem, θ stacks the identifiable thresholds and the location parameters β.

  48. Setting up Fisher’s scoring: 2nd derivatives (see GF83 or HM84 for details)

  49. Setting up Fisher’s scoring: 1st derivatives (see GF83 for details)

  50. Fisher’s scoring algorithm
  Iterate θ^[t+1] = θ^[t] + [E(-H(θ^[t]))]⁻¹ g(θ^[t]) until convergence.
