The Method of Likelihood. Hal Whitehead BIOL4062/5062. What is likelihood Maximum likelihood Maximum likelihood estimation Likelihood ratio tests Likelihood profile confidence intervals Model selection: Likelihood ratio tests Akaike Information Criterion (AIC)
The Method of Likelihood Hal Whitehead BIOL4062/5062
• What is likelihood
• Maximum likelihood
• Maximum likelihood estimation
• Likelihood ratio tests
• Likelihood profile confidence intervals
• Model selection:
  • Likelihood ratio tests
  • Akaike Information Criterion (AIC)
• Likelihood and least-squares
• Calculating likelihood
The Method of Likelihood
Observations: Y = {y1, y2, y3, ...}
  e.g. weights of 30 crabs of known age and sex
Model specified by: μ1, μ2, μ3, …
  e.g. y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e, where e ~ N(0, 1)
The LIKELIHOOD of Y is:
  L = Probability(Y | Model & μ1, μ2, μ3, ...)
Likelihood
The LIKELIHOOD of Y is:
  L = Probability(Y | Model & μ1, μ2, μ3, ...)
The LIKELIHOOD that Z became a criminal:
  the probability that Z became a criminal, given what we know of Z's characteristics and how those characteristics translate into the probability of being a criminal
The LIKELIHOOD of Y is:
  L = Probability(Y | Model & μ1, μ2, μ3, …)
We can work this out if we know μ1, μ2, μ3, …
Weights of 30 crabs of known age and sex:
  y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e
e.g. Prob. of these 30 weights is 0.04 if:
  female wt at age 0, μ1 = 30.0
  growth parameter, μ2 = 0.7
  excess male weight, μ3 = 5.0
  residual s.d., μ4 = 6.3
L(μ1=30.0, μ2=0.7, μ3=5.0, μ4=6.3) = 0.04
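Working out a likelihood for known parameters can be sketched in a few lines. This is only an illustration: the five ages, sexes and weights are made up (they are not the 30 crabs), and the function name log_likelihood is an assumption. For continuous weights the "probability" of each observation is a normal density, and the sum of log densities is used because the product of many small numbers underflows.

```python
import math

def log_likelihood(weights, ages, sexes, mu1, mu2, mu3, mu4):
    """Log-likelihood of y = mu1 + mu2*sqrt(Age) + mu3*Sex + mu4*e, e ~ N(0, 1)."""
    total = 0.0
    for y, age, sex in zip(weights, ages, sexes):
        expected = mu1 + mu2 * math.sqrt(age) + mu3 * sex
        z = (y - expected) / mu4
        # Log of the normal density with mean `expected` and s.d. mu4
        total += -0.5 * math.log(2 * math.pi) - math.log(mu4) - 0.5 * z * z
    return total

# Hypothetical data for five crabs (illustration only, not the slide's data)
ages = [1.0, 4.0, 9.0, 16.0, 25.0]
sexes = [0, 1, 0, 1, 0]
weights = [31.0, 36.5, 32.0, 38.0, 33.5]

# Same mean parameters, two different residual standard deviations
ll_a = log_likelihood(weights, ages, sexes, 30.0, 0.7, 5.0, 6.3)
ll_b = log_likelihood(weights, ages, sexes, 30.0, 0.7, 5.0, 1.0)
print(ll_a, ll_b)
```

With these made-up weights, which sit close to the predicted means, the smaller residual s.d. gives the higher log-likelihood.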
Maximum Likelihood Estimators
If we do not know μ1, μ2, μ3, ...
MAXIMUM LIKELIHOOD of Y is:
  L(μ1, μ2, μ3, ...) = Max over μ1, μ2, … of {Prob.(Y | μ1, μ2, μ3, ...)}
e.g. Max prob. of 30 weights is 0.12 when:
  female wt at age 0, μ1 = 28.4
  growth parameter, μ2 = 0.31
  excess male weight, μ3 = 1.7
  residual s.d., μ4 = 3.9
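A minimal sketch of maximum likelihood estimation, using the simplest version of the model (y = μ1 + μ4·e, no age or sex terms) because its estimators have a closed form. The weights are hypothetical and the function names are assumptions. The check at the end simply confirms that moving either parameter away from the estimate lowers the log-likelihood.

```python
import math

def log_lik(ys, mu1, mu4):
    """Log-likelihood of y = mu1 + mu4*e with e ~ N(0, 1)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(mu4) - 0.5 * ((y - mu1) / mu4) ** 2
        for y in ys
    )

def mle(ys):
    """Closed-form maximum likelihood estimates for the intercept-only model."""
    n = len(ys)
    mu1_hat = sum(ys) / n
    # The MLE of the residual s.d. divides by n, not n - 1
    mu4_hat = math.sqrt(sum((y - mu1_hat) ** 2 for y in ys) / n)
    return mu1_hat, mu4_hat

ys = [31.0, 36.5, 32.0, 38.0, 33.5]  # hypothetical weights
mu1_hat, mu4_hat = mle(ys)
max_ll = log_lik(ys, mu1_hat, mu4_hat)

# Log-likelihood at a grid of nearby parameter values (the estimate itself excluded)
nearby = [
    log_lik(ys, mu1_hat + d1, mu4_hat + d4)
    for d1 in (-0.5, 0.0, 0.5)
    for d4 in (-0.5, 0.0, 0.5)
    if (d1, d4) != (0.0, 0.0)
]
print(mu1_hat, mu4_hat, max_ll, max(nearby))
```

For models without closed-form estimators the same maximum is usually found by numerical optimisation of the log-likelihood.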
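Maximum likelihood
[Figure: likelihood plotted against μ1, marking the maximum likelihood and the maximum likelihood estimator of μ1.]
[Figure: likelihood plotted against μ1, marking the maximum likelihood for a precise estimate and for an imprecise estimate.]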
Likelihood Ratio Tests
If:
  μ1, μ2, μ3, …, μt is the true model
  μ1, μ2, μ3, …, μt, ..., μg is a more general model
then:
  G = 2·Log[L(μ1, μ2, μ3, …, μg) / L(μ1, μ2, μ3, …, μt)]
  (twice the log of the ratio of the maximum likelihoods)
is distributed as χ² with g-t degrees of freedom for large sample sizes (asymptotically)
If G is unexpectedly large then the data are unlikely to be from model μ1, μ2, μ3, …, μt
Likelihood Ratio Tests
G = 2·Log[L(μ1, μ2, μ3, …, μg) / L(μ1, μ2, μ3, …, μt)]
This is the "G-test for goodness-of-fit":
  null hypothesis: μ1, μ2, μ3, …, μt
  alternative hypothesis: μ1, μ2, μ3, …, μt, ..., μg
Likelihood: an example
             Expect   Find
Wild Type    75%      80
Mutants      25%      10
Total        100%     90

Null hypothesis: Binomial Distribution with q = 0.75
Likelihood(q=0.75) = 90C10 · 0.75⁸⁰ · 0.25¹⁰ = 0.000551

Maximum Likelihood Estimator
Alternative hypothesis: Binomial Distribution with q = ?
Likelihood(q) = 90C10 · q⁸⁰ · (1-q)¹⁰
This has a maximum value when q = 80/90 = 0.89
Max Likelihood(q) = 90C10 · (0.89)⁸⁰ · (1-0.89)¹⁰ = 0.1326

Likelihood Ratio Test
• G = 2 · Log{Max Likelihood(q) / Likelihood(q = 0.75)} = 2 · Log(0.1326 / 0.000551) = 10.96
• G is distributed as χ² with 1 d.f. if q = 0.75
• G is significantly large (P < 0.01) in χ²(1)
• so: reject null hypothesis.
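The arithmetic of this example can be checked with a short sketch. The function name binom_likelihood is an assumption; the counts and q = 0.75 come from the slides. Log is taken as the natural log, and the p-value uses the closed form for the upper tail of χ² with 1 d.f.

```python
import math

def binom_likelihood(k, n, q):
    """Probability of k successes in n trials with success probability q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

n, wild = 90, 80

# Null hypothesis: q = 0.75
lik_null = binom_likelihood(wild, n, 0.75)

# Alternative: q free, maximised at the observed proportion
q_hat = wild / n
lik_max = binom_likelihood(wild, n, q_hat)

# Likelihood ratio statistic (natural log)
G = 2 * math.log(lik_max / lik_null)

# Upper tail of chi-square with 1 d.f.: P(X > g) = erfc(sqrt(g / 2))
p_value = math.erfc(math.sqrt(G / 2))

print(lik_null, lik_max, G, p_value)
```

The binomial coefficient cancels in the ratio, so G depends only on the q terms.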
Profile Likelihood Confidence Intervals
[Figure: likelihood plotted against μ1.]
[Figure: log-likelihood plotted against μ1, marking the maximum likelihood, the maximum likelihood estimator of μ1, and the 95% c.i. at 2 units of log-likelihood below the maximum.]
[Figure: log-likelihood contours (relative to maximum likelihood) over μ1 and μ2, marking the MLE (0) and the -2 contour that bounds the 95% confidence region.]
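A likelihood-based interval of this kind can be sketched for the one-parameter wild-type example (80 of 90), where the profile is just the log-likelihood itself. The interval is the set of q whose log-likelihood lies within a fixed drop of the maximum. The drop of 1.92 (half of 3.84, the 95% point of χ² with 1 d.f.) is an assumption here; the slides mark the cut-off as roughly 2. The helper bisect is a hypothetical name for a plain bisection search.

```python
import math

def log_lik(q, k=80, n=90):
    """Binomial log-likelihood for q, dropping the constant binomial coefficient."""
    return k * math.log(q) + (n - k) * math.log(1 - q)

def bisect(f, lo, hi, tol=1e-9):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid  # root is in the upper half
        else:
            hi = mid  # root is in the lower half
    return (lo + hi) / 2

q_hat = 80 / 90
DROP = 1.92  # assumed cut-off: 3.84 / 2

def excess(q):
    """Positive inside the interval, negative outside."""
    return log_lik(q) - (log_lik(q_hat) - DROP)

lower = bisect(excess, 0.5, q_hat)
upper = bisect(excess, q_hat, 0.999)
print(lower, q_hat, upper)
```

The interval is not symmetric about the estimate, and it excludes q = 0.75, in line with the likelihood ratio test rejecting that value.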
Model Selection Using Likelihood-Ratio Tests
Weights of 30 crabs of known age and sex:
  M(0): y = μ1 + μ4·e                              Log(L) = -23.04
  M(1): y = μ1 + μ2·√Age + μ4·e                    Log(L) = -20.34
  M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e      Log(L) = -19.84
G(M(0) vs. M(1)) = 2 × (-20.34 - (-23.04)) = 5.40   P(χ²(1)) < 0.05
G(M(1) vs. M(2)) = 2 × (-19.84 - (-20.34)) = 1.00   P(χ²(1)) > 0.10
G(M(0) vs. M(2)) = 2 × (-19.84 - (-23.04)) = 6.40   P(χ²(2)) < 0.05
But: What is critical p-value?
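These three tests can be reproduced from the log-likelihoods alone. In the sketch below the parameter counts (2, 3 and 4) are read off the model formulas by counting the μ terms, and the names lr_test and chi2_upper_tail are assumptions. The tail probabilities use closed forms that exist only for 1 and 2 degrees of freedom.

```python
import math

# Log-likelihoods from the slides; parameter counts inferred from the formulas
log_lik = {"M(0)": -23.04, "M(1)": -20.34, "M(2)": -19.84}
n_params = {"M(0)": 2, "M(1)": 3, "M(2)": 4}

def chi2_upper_tail(g, df):
    """P(chi-square(df) > g), closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(g / 2))
    if df == 2:
        return math.exp(-g / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")

def lr_test(simple, general):
    """Return (G, degrees of freedom, p-value) for a nested pair of models."""
    g_stat = 2 * (log_lik[general] - log_lik[simple])
    df = n_params[general] - n_params[simple]
    return g_stat, df, chi2_upper_tail(g_stat, df)

pairs = [("M(0)", "M(1)"), ("M(1)", "M(2)"), ("M(0)", "M(2)")]
results = {(simple, general): lr_test(simple, general) for simple, general in pairs}

for (simple, general), (g_stat, df, p) in results.items():
    print(f"{simple} vs. {general}: G = {g_stat:.2f}, d.f. = {df}, P = {p:.3f}")
```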
Model Selection Using Likelihood-Ratio Tests
Weights of 30 crabs of known age and sex:
  M(1): y = μ1 + μ2·√Age + μ4·e
  M(3): y = μ1 + μ3·Sex(0:1) + μ4·e
But: Cannot compare M(1) and M(3) using likelihood-ratio tests
Model Selection Using Likelihood-Ratio Tests
• What is critical p-value?
• Cannot compare models which are not subsets of one another using likelihood-ratio tests
So: Akaike Information Criterion (AIC)
Akaike Information Criterion (AIC)
• Kullback-Leibler Information (KLI):
  • “information lost when model M(0) is used to approximate model M(1)”
  • “distance from M(0) to M(1)”
• AIC(M) = -2 × Log(Likelihood(M)) + 2 × K(M)
  • K(M) is the number of estimable parameters of model M
• AIC is an estimate of the expected relative distance (KLI) between a fitted model, M, and the unknown true mechanism that generated the data
Akaike Information Criterion (AIC)
• AIC(M) = -2 × Log(Likelihood(M)) + 2 × K(M)
  • K(M) is the number of estimable parameters
• In model selection: choose the model with the smallest AIC
  • least expected relative distance between M and the unknown true mechanism that generated the data
Model Selection Using AIC
Weights of 30 crabs of known age and sex:
  M(0): y = μ1 + μ4·e                              AIC = 50.08
  M(1): y = μ1 + μ2·√Age + μ4·e                    AIC = 46.68
  M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e      AIC = 47.68
  M(3): y = μ1 + μ3·Sex(0:1) + μ4·e                AIC = 49.95
Model Selection Using AIC
• Differences in AIC between models: ΔAIC
• Support for less favoured model:
  • ΔAIC: 0-2   Substantial
  • ΔAIC: 4-7   Considerably less
  • ΔAIC: >10   Essentially none
Model Selection Using AIC
Weights of 30 crabs of known age and sex:
  M(0): y = μ1 + μ4·e                              AIC = 50.08   Unlikely
  M(1): y = μ1 + μ2·√Age + μ4·e                    AIC = 46.68   BEST
  M(2): y = μ1 + μ2·√Age + μ3·Sex(0:1) + μ4·e      AIC = 47.68   Good
  M(3): y = μ1 + μ3·Sex(0:1) + μ4·e                AIC = 49.95   Unlikely
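The AIC values and their differences can be recomputed from the log-likelihoods given earlier. In this sketch the parameter counts are inferred by counting the μ terms in each formula; the AIC for M(3) is copied from the slide because its Log(L) is not given. The function name aic is an assumption.

```python
# Log-likelihoods from the likelihood-ratio slides; counts inferred from the formulas
log_lik = {"M(0)": -23.04, "M(1)": -20.34, "M(2)": -19.84}
n_params = {"M(0)": 2, "M(1)": 3, "M(2)": 4}

def aic(log_likelihood, k):
    """AIC = -2 x Log(Likelihood) + 2 x K."""
    return -2 * log_likelihood + 2 * k

aic_values = {m: aic(log_lik[m], n_params[m]) for m in log_lik}
aic_values["M(3)"] = 49.95  # taken directly from the slide; its Log(L) is not given

# Smallest AIC is preferred; delta-AIC is measured from that model
best = min(aic_values, key=aic_values.get)
delta_aic = {m: a - aic_values[best] for m, a in aic_values.items()}

for m in sorted(aic_values):
    print(f"{m}: AIC = {aic_values[m]:.2f}, delta AIC = {delta_aic[m]:.2f}")
```

Unlike the likelihood-ratio tests, this comparison includes M(3), which is not nested with M(1).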
Modifications to AIC
AIC for small sample sizes:
  AICc = -2 × (Log-Likelihood) + 2 × K × n/(n-K-1)
  n is sample size
AIC for overdispersed count data:
  QAIC = -2 × (Log-Likelihood)/c + 2 × K
  c is “variance inflation factor” (c = χ²/df)
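Both modifications are one-line formulas. The sketch below applies AICc to the crab models M(1) and M(2) with n = 30, using the log-likelihoods from the earlier slides and parameter counts inferred from the formulas. The function names are assumptions. QAIC is only checked at c = 1, where it reduces to AIC, because the crab weights are not count data.

```python
def aicc(log_likelihood, k, n):
    """Small-sample AIC: -2 x Log-Likelihood + 2 x K x n / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > K + 1")
    return -2 * log_likelihood + 2 * k * n / (n - k - 1)

def qaic(log_likelihood, k, c):
    """AIC for overdispersed counts: -2 x Log-Likelihood / c + 2 x K."""
    return -2 * log_likelihood / c + 2 * k

n = 30  # sample size of the crab example
aicc_m1 = aicc(-20.34, 3, n)
aicc_m2 = aicc(-19.84, 4, n)

# With no overdispersion (c = 1), QAIC is the ordinary AIC
qaic_m1_no_overdispersion = qaic(-20.34, 3, 1.0)

print(aicc_m1, aicc_m2, qaic_m1_no_overdispersion)
```

With n = 30 the small-sample penalty widens the gap between M(1) and M(2) compared with plain AIC.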
Burnham, K. P., and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach, 2nd ed. New York: Springer-Verlag.
Likelihood and Least-Squares
• If errors are normally distributed:
  • least-squares and maximum-likelihood estimates of parameters are the same
  • but the σ² estimators are not (the maximum-likelihood estimator divides the residual sum of squares by n)
• Likelihood is a more powerful and theoretically-based technique
AIC and Least-Squares
• If all models assume normal errors with constant variance:
  • AIC = n·Log(σ²) + 2·K
  • σ² = Σeᵢ²/n (the MLE of σ²)
  • K is the total number of estimated regression parameters, including the intercept and σ²
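The least-squares form drops a constant from the full AIC. With normal errors, the maximised log-likelihood is -(n/2)·(Log(2π·σ²) + 1), so the full AIC equals n·Log(σ²) + 2·K plus n·(1 + Log(2π)). That constant is the same for every model fitted to the same data, so differences in AIC are unchanged. The sketch below checks this numerically on hypothetical residuals; the function names are assumptions.

```python
import math

def aic_least_squares(residuals, k):
    """n*Log(sigma2) + 2K, with sigma2 the MLE (divisor n) of the residual variance."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return n * math.log(sigma2) + 2 * k

def aic_full(residuals, k):
    """-2*Log-Likelihood + 2K for normal errors, evaluated at the MLE of sigma2."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    log_l = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * log_l + 2 * k

residuals = [-3.2, 2.3, -2.2, 3.8, -0.7]  # hypothetical residuals from a fitted model
k = 2  # e.g. intercept and sigma2

# Constant dropped by the least-squares form; depends only on n
constant = len(residuals) * (1 + math.log(2 * math.pi))
difference = aic_full(residuals, k) - aic_least_squares(residuals, k)
print(difference, constant)
```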
Calculating Likelihoods
• Analytical formulae
• Compute by multiplying probabilities
• Estimate by simulation:
  • proportion of simulations (e.g. out of 1,000) in which the observed data are obtained, given model and parameters
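Estimation by simulation can be sketched with the wild-type example: simulate 90 offspring many times with a given q and record how often exactly 80 are wild type. The function name, the seed and the use of 20,000 simulations (rather than 1,000, to reduce noise) are assumptions made for this illustration.

```python
import random

def simulated_likelihood(q, n_trials=90, observed=80, n_sims=1000, seed=1):
    """Proportion of simulated data sets that match the observed count exactly."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_sims):
        # One simulated data set: count wild types among n_trials offspring
        wild = sum(1 for _ in range(n_trials) if rng.random() < q)
        if wild == observed:
            hits += 1
    return hits / n_sims

estimate = simulated_likelihood(80 / 90, n_sims=20000)
print(estimate)
```

The estimate should land near the analytical binomial value of about 0.13 for q = 80/90. Simulation is mainly useful when no formula for the probability is available.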
The Method of Likelihood
• Probability of data given model
• Estimate parameters using maximum likelihood
• Estimate confidence intervals using likelihood profiles
• Compare models using:
  • likelihood ratio tests
  • Akaike Information Criterion (AIC)