Maximum likelihood estimates What are they and why do we care? Relationship to AIC and other model selection criteria
Maximum Likelihood Estimates (MLE) • Given a model, the MLE is (are) the value(s) of the parameter(s) of interest that are most likely given the observed data. • That is, they maximize the likelihood of the model given the data. • The likelihood of a model is the product of the probabilities of the individual observations.
Maximum Likelihood Estimation • For linear models (e.g., ANOVA and regression) MLEs are usually determined using the linear equations that minimize the sum of the squared residuals – closed form. • For nonlinear models and some distributions we determine MLEs by setting the first derivative equal to zero and confirming the solution is a maximum by checking that the second derivative is negative – closed form. • Or we can search for the values that maximize the probabilities of all of the observations – numerical estimation. • The search stops when certain criteria are met: • Precision of the estimate • Change in the likelihood • Solution seems unlikely (stops after n iterations)
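To make the numerical-estimation idea concrete, here is a minimal sketch (assuming Python with SciPy; not part of the original slides) that searches (0, 1) for the p maximizing a binomial log-likelihood. The optimizer's convergence tolerance plays the role of the stopping criteria listed above.

```python
from math import comb, log
from scipy.optimize import minimize_scalar

def neg_log_lik(p, n, y):
    """Negative binomial log-likelihood, -ln L(p | n, y)."""
    return -(log(comb(n, y)) + y * log(p) + (n - y) * log(1 - p))

# Search for the p that maximizes ln L given 7 successes in 10 trials.
result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6),
                         method="bounded", args=(10, 7))
print(result.x)  # ~0.7 = y/n, matching the closed-form MLE
```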
Binomial probability • Some theory and math • An example • Assumptions • Adding a link function • Additional assumptions about βs
Binomial Sampling • Characterized by two mutually exclusive events • Heads or tails • On or off • Dead or alive • Used or not used, or • Occupied or not occupied. • Often referred to as Bernoulli trials
Models • Trials have an associated parameter p • p = probability of success • 1 – p = probability of failure (= q) • p + q = 1 • p also represents a model • Single parameter • p is equal for every trial
Binomial Sampling • p is a continuous variable between 0 and 1 (0 < p < 1) • y is the number of successful outcomes • n is the number of trials • The estimator of p is the proportion of successes, $\hat{p} = y/n$. • This estimator is unbiased.
Binomial Probability Function • The probability of observing y successes given n trials with underlying probability p is $$P(y \mid n, p) = \binom{n}{y} p^y (1-p)^{n-y}$$ • Example: 10 flips of a fair coin (p = 0.5), 7 of which turn up heads, is written $$P(7 \mid 10, 0.5) = \binom{10}{7} 0.5^7 (1-0.5)^{3}$$
Binomial Probability Function (2) • Evaluated numerically: $\binom{10}{7} 0.5^7 0.5^3 = 120 \times 0.5^{10} \approx 0.117$ • In Excel: =BINOMDIST(7, 10, 0.5, FALSE)
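The same calculation as a quick Python check (Python is an addition here; the slides use Excel):

```python
from math import comb

# Binomial probability of 7 heads in 10 flips of a fair coin
n, y, p = 10, 7, 0.5
print(comb(n, y) * p**y * (1 - p)**(n - y))  # 0.1171875
```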
Likelihood Function of Binomial Probability • Reality: • we have data (n and y) • we don't know the model (p) • This leads us to the likelihood function: $$L(p \mid n, y) = \binom{n}{y} p^y (1-p)^{n-y}$$ • Read: the likelihood of p given n and y is ... • It is not a probability function. • It is a positive function over 0 < p < 1.
Likelihood Function of Binomial Probability (2) • Alternatively, the likelihood of the data given the model can be thought of as the product of the probabilities of the individual observations: $$L(p \mid \text{data}) = \prod_{i=1}^{n} p^{f_i} (1-p)^{1-f_i}$$ where $f_i = 1$ for a success and $f_i = 0$ for a failure.
Log likelihood • Although the likelihood function is useful, the log-likelihood has desirable properties: the terms are additive, and the binomial coefficient does not include p: $$\ln L(p \mid n, y) = \ln \binom{n}{y} + y \ln(p) + (n-y) \ln(1-p)$$
Log likelihood • Using the alternative (Bernoulli product) form: $$\ln L(p \mid \text{data}) = \sum_{i=1}^{n} \left[ f_i \ln(p) + (1-f_i) \ln(1-p) \right]$$ • The estimate of p that maximizes the value of ln(L) is the MLE.
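For the binomial case the maximum can also be found in closed form; this short derivation (added here, not on the original slide) connects the log-likelihood to the estimator $\hat{p} = y/n$: $$\frac{d \ln L}{dp} = \frac{y}{p} - \frac{n-y}{1-p} = 0 \;\Rightarrow\; y(1-p) = (n-y)\,p \;\Rightarrow\; \hat{p} = \frac{y}{n} = \frac{7}{10} = 0.7$$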
Precision • As n increases, precision increases and the variance of the estimate decreases. • [Figure: likelihood curves L(p | 10, 7) and L(p | 100, 70); both peak at p = 0.7, but the n = 100 curve is much narrower.]
Properties of MLEs • Asymptotically normally distributed • Asymptotically minimum variance • Asymptotically unbiased as n → ∞ • One-to-one transformations of MLEs are also MLEs. For example, mean lifespan computed from the annual survival MLE $\hat{S}$ (e.g., $-1/\ln(\hat{S})$ under constant survival) is also an MLE.
Assumptions: • The n trials must be identical – i.e., the population is well defined (e.g., 20 coin flips, 50 Kirtland's warbler nests, 75 radio-marked black bears in the Pisgah Bear Sanctuary). • Each trial results in one of two mutually exclusive outcomes (e.g., heads or tails, survived or died, successful or failed, etc.). • The probability of success on each trial remains constant (homogeneous). • Trials are independent events (the outcome of one does not depend on the outcome of another). • y, the number of successes, is the random variable after n trials.
Example – use/non-use survey • Selected 50 sites (n) at random (or systematically) within a study area. • Visited each site once and surveyed for species x. • Species was detected at 10 sites (y). • Meets the binomial assumptions: • Sites selected without bias • Surveys conducted using the same methods • Sites could only be used or not used (occupied) • No knowledge of habitat differences or species preferences • Sites are independent • Additional assumption – perfect detection
Example – results • MLE: 20% (±6%) of the area is occupied.
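These numbers follow from the binomial results above; as a worked check (added here, assuming the usual normal-approximation standard error): $$\hat{p} = \frac{y}{n} = \frac{10}{50} = 0.20, \qquad \widehat{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.20 \times 0.80}{50}} \approx 0.057 \;(\approx 6\%)$$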
Link functions - adding covariates • "Link" the covariates, the data (X), with the response variable (i.e., use or occupancy) • Usually done with the logit link: $$\text{logit}(p_i) = \ln\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i \;\Leftrightarrow\; p_i = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}}$$ • Nice properties: • Constrains the result to 0 < p_i < 1 • The βs can range over −∞ < β < +∞ • Additional assumption – the βs are normally distributed
Link function • Binomial (Bernoulli product) likelihood: $$L(p \mid \text{data}) = \prod_{i} p_i^{f_i}(1-p_i)^{1-f_i}$$ • Substitute the link for p: $\text{logit}(p_i) = \beta_0 + \beta_1 x_i$ • Voila! – logistic regression
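A minimal numerical sketch of that substitution (Python and SciPy are additions here, and the data are hypothetical): maximizing the Bernoulli log-likelihood with a logit link is exactly logistic regression.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: detection (1) / non-detection (0) and one site covariate
x = np.array([0.2, 0.5, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1])
f = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

def neg_log_lik(beta):
    """-ln L(b0, b1 | data) with logit(p_i) = b0 + b1 * x_i."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    return -np.sum(f * np.log(p) + (1 - f) * np.log(1 - p))

fit = minimize(neg_log_lik, x0=[0.0, 0.0])
print(fit.x)  # MLEs of beta0 and beta1
```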
Link function • More than one covariate can be included by extending the logit (linear equation): $$\text{logit}(p_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_j x_{ij}$$ • βs are the estimated parameters (effects); • estimated for each period or group, or • constrained to be equal using the data (x_ij).
Link function • The use rates or real parameters of interest are calculated from the βs by back-transforming: $$p_i = \frac{e^{\beta_0+\beta_1 x_{i1}+\dots}}{1+e^{\beta_0+\beta_1 x_{i1}+\dots}}$$ • This is a HUGE concept and applicable to EVERY estimator we examine. • Occupancy and detection probabilities are replaced by the link-function submodel of the covariate(s). • Conceivably every site has a different probability of use, related to the values of its covariates.
Multinomial probability An example Adding a link function
Multinomial Distribution and Likelihoods • Extension of the binomial to more than two possible mutually exclusive outcomes. • Nearly always introduced by way of die tossing. • Another example: • Multiple presence/absence surveys at multiple sites
Binomial Coefficient • The binomial coefficient is the number of ways y successes can be obtained from n trials: $$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$ • Example – 7 successes in 10 trials: $$\binom{10}{7} = \frac{10!}{7!\,3!} = 120$$
Multinomial coefficient • The multinomial coefficient gives the number of possible orderings of the outcomes; for die tossing (6 possibilities): $$\frac{n!}{n_1!\,n_2!\cdots n_6!}$$ • Example – rolling each die face once in 6 trials: $$\frac{6!}{1!\,1!\,1!\,1!\,1!\,1!} = 720$$
Properties of multinomials • Dependency among the counts. • For example, if a die is thrown and it is not a 1, 2, 3, 4, or 5, then it must be a 6.
Multinomial pdf • Probability of an outcome or series of outcomes: $$P(n_1, \dots, n_k \mid n, p_1, \dots, p_k) = \frac{n!}{n_1! \cdots n_k!} \prod_{i=1}^{k} p_i^{n_i}$$
Die example 1 • The probability of rolling a fair die (p_i = 1/6) six times (n = 6) and turning up each face only once (n_i = 1) is: $$P = \frac{6!}{1! \cdots 1!}\left(\frac{1}{6}\right)^6 = 720 \times \frac{1}{46656} \approx 0.0154$$
Die example 1 • Dependency: because the counts must sum to n and the probabilities to 1, the last category is determined by the others – e.g., $p_6 = 1 - (p_1 + p_2 + p_3 + p_4 + p_5)$.
Example 2 • Another example – the probability of rolling two 2s, three 3s, and one 4 is: $$P = \frac{6!}{2!\,3!\,1!}\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^3\left(\frac{1}{6}\right)^1 = 60 \times \frac{1}{46656} \approx 0.0013$$
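Both die examples can be checked numerically; a minimal sketch (assuming Python with SciPy, not part of the original slides):

```python
from scipy.stats import multinomial

fair_die = [1/6] * 6

# Example 1: each face exactly once in six rolls
print(multinomial.pmf([1, 1, 1, 1, 1, 1], n=6, p=fair_die))  # ~0.0154

# Example 2: two 2s, three 3s, one 4 (faces 1, 5, 6 never rolled)
print(multinomial.pmf([0, 2, 3, 1, 0, 0], n=6, p=fair_die))  # ~0.0013
```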
Likelihood • As you might have expected, the likelihood of the multinomial is of greater interest to us. • We frequently have data (n, y_1 … y_m) and are seeking to determine the model (p_1 … p_m). The likelihood for our example with the die is: $$L(p_1,\dots,p_6 \mid n, y_1,\dots,y_6) = \frac{n!}{y_1! \cdots y_6!} \prod_{i=1}^{6} p_i^{y_i}$$
Log-likelihood • This likelihood has all of the same properties we discussed for the binomial case. • We usually solve by maximizing ln(L).
Log-likelihood • Ignoring the multinomial coefficient (a constant that does not include the p_i): $$\ln L(p_1,\dots,p_m \mid n, y_1,\dots,y_m) = \sum_{i=1}^{m} y_i \ln(p_i) + \text{constant}$$
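As a concrete check, a minimal Python sketch (an addition, not from the slides) evaluating this log-likelihood at the multinomial MLEs, which are the observed proportions $\hat{p}_i = y_i / n$:

```python
import numpy as np

y = np.array([0, 2, 3, 1, 0, 0])   # observed counts (die Example 2)
p_hat = y / y.sum()                # MLEs: the observed proportions

# ln L = sum of y_i * ln(p_i), ignoring the constant coefficient;
# skip categories with y_i = 0 (their contribution is 0 by convention).
obs = y > 0
print(np.sum(y[obs] * np.log(p_hat[obs])))
```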
Presence-absence surveys & multinomials Procedure: • Select a sample of sites • Conduct repeated presence-absence surveys at each site • Usually temporal replication • Sometimes spatial replication • Record presence or absence of the species during each survey
Encounter histories for each site & species • Encounter history matrix • Each row represents a site • Each column represents a sampling occasion • On each occasion each species is recorded as: • '1' if encountered (captured) • '0' if not encountered
Encounter history - example • For sites sampled on 3 occasions there are 8 (= 2^m = 2^3) possible encounter histories • 10 sites were sampled 3 times (not enough for a good estimate) • 1 – detected during survey • 0 – not detected during survey • A separate encounter history is recorded for each species
Encounter history - example • Each capture history is a possible outcome • Analogous to one face of the die (n_i) • Data consist of the number of times each capture history appears (y_i)
Encounter history - example • Each encounter history has an associated probability (P_i) • Each per-occasion detection probability p_ij can be different • e.g., the history '101' has probability $P = p_1(1-p_2)p_3$
Log-likelihood example • Log-likelihood: • Calculate the log of the probability of each encounter history (ln(P_i)) • Multiply ln(P_i) by the number of times that history was observed (y_i) • Sum the products: $\ln L = \sum_i y_i \ln(P_i)$
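A minimal sketch of that recipe in Python (an illustration, not from the slides; the per-occasion detection probabilities and counts are hypothetical, and every site is assumed occupied, as in the perfect-detection example earlier):

```python
import numpy as np

# Hypothetical per-occasion detection probabilities for 3 surveys
p = np.array([0.4, 0.5, 0.3])

# Encounter histories (rows) over the 3 occasions, with the number
# of sites (y_i) showing each history.
histories = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]])
counts = np.array([2, 3, 4, 1])

# P_i: product over occasions of p_j (if detected) or 1 - p_j (if not)
# (ignores occupancy: assumes every site is occupied)
P = np.prod(np.where(histories == 1, p, 1 - p), axis=1)

log_lik = np.sum(counts * np.log(P))  # sum of y_i * ln(P_i)
print(log_lik)
```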
Link function in binomial (recap) • Binomial (Bernoulli product) likelihood: $L(p \mid \text{data}) = \prod_i p_i^{f_i}(1-p_i)^{1-f_i}$ • Substitute the link for p: $\text{logit}(p_i) = \beta_0 + \beta_1 x_i$ • Voila! – logistic regression
Multinomial with link function • Substitute the logit link for each p_ij: $\text{logit}(p_{ij}) = \beta_0 + \beta_1 x_{ij}$
But wait a minute! • Is Pr(Occupancy) = Pr(Encounter)?
Is Pr(Occupancy) = Pr(Encounter)? • No – the probability of encounter includes both detection and use (occupancy). • Occupancy analysis estimates each, providing estimates of use conditional on detection. • Sites known to be used are those with at least one detection; for the rest, is the species absent, or present but not detected?