
Data Handling & Analysis Polynomials and model fit

Data Handling & Analysis: Polynomials and model fit. Andrew Jackson, a.jackson@tcd.ie. Linear type data. How are two measures related? What do we do about curvature? Data are the number of species (Y) recorded per time spent looking for them (X).


Presentation Transcript


  1. Data Handling & Analysis: Polynomials and model fit • Andrew Jackson, a.jackson@tcd.ie

  2. Linear type data • How are two measures related?

  3. What do we do about curvature? • Data are the number of species (Y) recorded per time spent looking for them (X) • Specifically, these come from fisheries data • They are a good proxy for species diversity in the marine habitat

  4. Clearly a straight line won’t do

  5. … the residuals are horrible
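
As a rough illustration of what "horrible" residuals look like, the sketch below fits a straight line to a curve that levels off and prints the sign of each residual. The data are made up for illustration (they are not the fisheries data from the slides), and `fit_line` is a hypothetical helper name.

```python
import math

# Made-up, noise-free data that rise quickly and then level off.
# These stand in for species count (Y) against search time (X).
x = [1, 2, 4, 8, 16, 32, 64]
y = [2.0 * math.log(v) + 1.0 for v in x]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line(x, y)

# Residual = observed value minus fitted value.
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# A run of same-signed residuals is the warning sign: the line sits
# above the data at both ends and below the data in the middle.
print("".join("+" if r > 0 else "-" for r in residuals))
```

Residuals from an adequate model should scatter around zero with no pattern; a systematic arch like this one suggests the straight line is missing curvature.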

  6. Polynomials • Polynomials are models that are linear in their coefficients but can still describe curvature in X • Quadratics • $Y = b_0 + b_1 X + b_2 X^2$ • Cubics • $Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3$ • 5th, 6th order polynomials etc…
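
Because a polynomial is linear in its coefficients, it can be fitted with ordinary least squares once the powers of X are treated as separate predictor columns. The sketch below shows this using only the Python standard library; `polyfit_ls` is a hypothetical helper, the data are made up, and solving the normal equations directly is a simplification that is only numerically safe for low orders and modest X values.

```python
def polyfit_ls(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    p = degree + 1
    # Design matrix: one column per power of x (x^0, x^1, ..., x^degree).
    X = [[xi ** j for j in range(p)] for xi in x]
    # Augmented system (X^T X | X^T y).
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(c + 1, p):
            factor = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= factor * A[c][k]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / A[i][i]
    return b

# Made-up data generated exactly from Y = 1 + 2X - 0.5X^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * v - 0.5 * v ** 2 for v in x]

coefs = polyfit_ls(x, y, degree=2)
print([round(c, 6) for c in coefs])
```

Changing `degree` to 3 fits the cubic with the same code, which is the practical meaning of "linear in the coefficients".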

  7. Quadratic model

  8. Quadratic residuals • Better… • But not so good at lower values of x • Try a more complicated model like a cubic

  9. Cubic model • Note the double curvature • Model appears to explain the lower values better • But how sure are we of the increase at higher values?

  10. Cubic residuals • Better than the quadratic • But still over-estimating the lowest values of x

  11. Log transform the X variable • Model is Y ~ log(X) • Appears to explain the data very well across the full range • Check the residuals…
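
A minimal sketch of the idea: Y ~ log(X) is still a straight-line fit, just with log(X) as the predictor. The data and noise values below are invented for illustration, and `fit_line` and `rss` are hypothetical helper names.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def rss(x, y, b0, b1):
    """Residual sum of squares for the fitted line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data: a logarithmic trend plus small fixed perturbations.
x = [1, 2, 4, 8, 16, 32, 64]
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.15, 0.05]
y = [3.0 + 2.0 * math.log(xi) + e for xi, e in zip(x, noise)]

# Straight line on the raw X scale.
rss_linear = rss(x, y, *fit_line(x, y))

# Same fitting routine, but with log(X) as the predictor.
log_x = [math.log(xi) for xi in x]
rss_log = rss(log_x, y, *fit_line(log_x, y))

print(f"RSS straight line: {rss_linear:.3f}")
print(f"RSS Y ~ log(X):    {rss_log:.3f}")
```

The log model uses the same two coefficients as the straight line, so any improvement in fit comes without extra parameters.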

  12. Y~log(X) residuals • Now these look pretty near perfect

  13. The null model • Consists of a mean and a variance only • It gives us a benchmark against which we can test our models that include more information • If we can’t do better than the null model then we don’t understand our data or system!
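
The null model can be sketched in a few lines: it predicts the mean for every observation, so its residuals are simply deviations from the mean. The values below are made up, `fit_null_model` is a hypothetical helper, and the variance shown is the maximum-likelihood version (dividing by n rather than n - 1), which is an assumption about how the model is fitted.

```python
def fit_null_model(y):
    """Return the mean and the maximum-likelihood variance of y."""
    n = len(y)
    mean = sum(y) / n
    # Divides by n (maximum likelihood); the usual sample variance divides by n - 1.
    variance = sum((v - mean) ** 2 for v in y) / n
    return mean, variance

# Made-up response values.
y = [2.0, 4.0, 6.0, 8.0]

mean, variance = fit_null_model(y)

# The null model ignores X entirely, so every fitted value is the mean.
residuals = [v - mean for v in y]

print(mean, variance)
print(residuals)
```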

  14. Residuals of the null model

  15. Choosing between alternative models • We now have a choice between 5 models • Null model (zero order polynomial, which includes an intercept only – i.e. just a mean and variance model) • Straight line (first order polynomial) • Quadratic (second order polynomial) • Cubic (third order polynomial) • First order polynomial with log(X) • How do we select which one to use? • Higher order polynomials require more parameters

  16. Parsimony as a central tenet • Parsimony is the preference for the simplest explanation of a phenomenon, and it underpins all of science • So we need to pick the model that • Fits the data best, and … • Uses the fewest parameters

  17. Likelihood of data
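
The figure for this slide is not part of the transcript. As a sketch only, assuming independent and normally distributed residuals with constant variance (the usual assumption for least-squares models, though the slide does not state it), the log-likelihood of $n$ observations $y_i$ with fitted values $\hat{y}_i$ can be written as:

```latex
\log L = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
         - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
% Substituting the maximum-likelihood variance estimate
% \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 gives
\log L = -\frac{n}{2}\left[\log\!\left(2\pi\hat{\sigma}^2\right) + 1\right]
```

Smaller residuals give a smaller $\hat{\sigma}^2$ and therefore a larger log-likelihood.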

  18. AIC for model selection • We will use Akaike’s Information Criterion (AIC) to select the most suitable model • $\mathrm{AIC} = -2\log(L) + 2k$, where $L$ is the likelihood • The log-likelihood increases as the fit improves • $k$ is the number of parameters in the model • Lower AIC = more suitable model
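
The AIC formula can be sketched directly in code. The log-likelihood here assumes normally distributed residuals evaluated at the maximum-likelihood variance, which is an assumption not stated on the slide; the residuals are made up, and counting the variance as one of the k parameters is a convention that differs between software packages.

```python
import math

def gaussian_loglik(residuals):
    """Log-likelihood at the ML variance estimate, assuming normal errors."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(loglik, k):
    """AIC = -2 * log-likelihood + 2 * k."""
    return -2.0 * loglik + 2.0 * k

# Made-up residuals from two hypothetical fits to the same response.
poor_fit = [2.0, -1.5, 1.0, -2.5, 1.0]
good_fit = [0.2, -0.1, 0.1, -0.3, 0.1]

# k = 3 here: intercept, slope, and the residual variance (a convention).
print("poor fit AIC:", aic(gaussian_loglik(poor_fit), k=3))
print("good fit AIC:", aic(gaussian_loglik(good_fit), k=3))
```

Each extra parameter adds 2 to the AIC, so a more complex model is only preferred if it raises the log-likelihood by more than 1.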

  19. AIC of our models • Null model: 248.2 • Straight line: 184.1 • Quadratic: 142.5 • Cubic: 124.9 • 4th order: 83.5 • 5th order: 77.6 • 6th order: 77.7 • log(X): 68.4 • So the log(X) model is the best in this case • Note that adding more orders to the polynomials ceases to confer any benefit after 5th order. Also… these get increasingly difficult to explain and relate to biological phenomena
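
The comparison on this slide amounts to sorting the models by AIC. The sketch below uses the AIC values reported above; the dictionary keys and the difference column (each model's AIC minus the lowest AIC) are presentation choices added here, not part of the original slides.

```python
# AIC values as reported on the slide.
aic_values = {
    "Null model": 248.2,
    "Straight line": 184.1,
    "Quadratic": 142.5,
    "Cubic": 124.9,
    "4th order": 83.5,
    "5th order": 77.6,
    "6th order": 77.7,
    "log(X)": 68.4,
}

# The preferred model is the one with the lowest AIC.
best = min(aic_values, key=aic_values.get)

# List the models from best to worst with their distance from the best.
for name, value in sorted(aic_values.items(), key=lambda item: item[1]):
    difference = value - aic_values[best]
    print(f"{name:14s} AIC = {value:6.1f}  difference = {difference:6.1f}")
```

The 6th order polynomial has a slightly higher AIC than the 5th order one, which is the point at which the extra parameter stops paying for itself.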

  20. Conclusions • AIC provides an objective way to compare alternative models • Lower AIC indicates a more parsimonious model • AIC values may only be compared between models fitted to exactly the same response variable • AIC provides only a relative, not an absolute, indication of model fit • Still need to check that the model is any good • Residuals etc…
