
Modeling

Modeling: create an abstraction of something in the real world. A model can be parameterized and should be validated against real-world data. Types: interpolation, correlation, simulation. Correlation models predict values of a dependent variable from one or more independent variables.

lamont

Presentation Transcript


  1. Modeling • Create an abstraction of something in the real world • Can be parameterized • Should be validated against real-world data • Types: • Interpolation • Correlation • Simulation

  2. Correlation Models • Predicting values of a dependent variable from one or more independent variables • Remember: correlation does not imply causation Wikipedia
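
As a minimal sketch of predicting a dependent variable from one independent variable, the snippet below fits a straight line by ordinary least squares. The function names and the elevation/temperature numbers are made up for illustration and do not come from the slides; a good fit here says nothing about causation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predict the dependent variable at a new value of the independent variable."""
    return intercept + slope * x


# Hypothetical data: elevation in metres (independent), temperature in deg C (dependent).
elevation = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
temperature = [20.0, 17.0, 14.0, 11.0, 8.0]

intercept, slope = fit_line(elevation, temperature)
print(predict(intercept, slope, 1200.0))
```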

  3. Parametric Methods • Typical Probability Distribution Functions • Gaussian, Negative Exponential, Binomial, Gamma, Poisson, … • Generalized Linear Models • Linearize data • Polynomial • Linear • Nth-Order Polynomials • Generalized Additive Models • Box-Models (BioClim) • Logistic…

  4. Gaussian (Normal) Function Wikipedia
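
The curve on this slide did not survive transcription. As a rough sketch, the standard Gaussian density with mean `mu` and standard deviation `sigma` can be evaluated as follows; the function and parameter names are illustrative choices, not taken from the slides.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The density is highest at the mean and symmetric around it.
print(gaussian_pdf(0.0), gaussian_pdf(1.0), gaussian_pdf(-1.0))
```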

  5. Exponential (negative) Wikipedia
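
A small sketch of the negative exponential density, assuming the usual rate parameterization f(x) = rate * exp(-rate * x) for x >= 0. The name `rate` is an assumption; the slide itself gives no formula.

```python
import math


def exponential_pdf(x, rate=1.0):
    """Density of an exponential distribution with the given rate (events per unit)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if x < 0:
        return 0.0  # the distribution has no mass below zero
    return rate * math.exp(-rate * x)


# The density starts at `rate` and decays monotonically.
print(exponential_pdf(0.0, rate=2.0), exponential_pdf(1.0, rate=2.0))
```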

  6. Binomial • Number of successes in a series of yes/no trials Wikipedia
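
To make "number of successes in a series of yes/no trials" concrete, here is a minimal sketch of the binomial probability mass function. The argument names are illustrative.

```python
import math


def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent yes/no trials."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if successes < 0 or successes > trials:
        return 0.0
    return math.comb(trials, successes) * p ** successes * (1.0 - p) ** (trials - successes)


# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))
```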

  7. Gamma function Wikipedia

  8. Poisson distribution • Probability of a given number of events occurring in a fixed interval of time Wikipedia
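
The Poisson probability of seeing a given number of events in a fixed interval can be sketched as below, assuming events occur independently at a constant average rate. `mean_events` is a hypothetical name for the expected count per interval.

```python
import math


def poisson_pmf(events, mean_events):
    """Probability of exactly `events` events when `mean_events` are expected per interval."""
    if mean_events <= 0:
        raise ValueError("mean_events must be positive")
    if events < 0:
        return 0.0
    return math.exp(-mean_events) * mean_events ** events / math.factorial(events)


# With 2 events expected per interval, list the probability of seeing 0 to 4.
for k in range(5):
    print(k, poisson_pmf(k, 2.0))
```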

  9. Generalized Linear Models

  10. Polynomial • Flexible and adaptable • Notorious for oscillations between exact-fit values
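
The oscillation between exact-fit values can be demonstrated with a classic test case that is not from the slides: interpolating the Runge function 1 / (1 + 25 x^2) at equally spaced points on [-1, 1]. The sketch below is an illustration under that assumption; the polynomial passes through every node exactly, yet the worst error between nodes grows as the degree increases.

```python
def runge(x):
    """Test function commonly used to show interpolation oscillations."""
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_eval(nodes, values, x):
    """Evaluate the polynomial that passes exactly through (nodes, values) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_error(degree, samples=401):
    """Largest gap between the interpolant and the true function on a fine grid."""
    nodes = [-1.0 + 2.0 * i / degree for i in range(degree + 1)]
    values = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in grid)


for degree in (4, 8, 12, 16):
    print(degree, max_error(degree))
```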

  11. Non-Parametric Methods • Piece-wise Regression (MaxEnt) • Kernel Smoothing (NPMR) • Neural Nets • Regression/Decision Trees • Multivariate adaptive regression splines (MARS) • Genetic Algorithms
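
As one illustration of a non-parametric method from this list, the sketch below implements a simple one-predictor kernel smoother (a Gaussian-weighted local average). This is a generic textbook smoother chosen for illustration, not the NPMR software itself, and the `bandwidth` parameter and sample data are assumptions.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Estimate y at x0 as a Gaussian-weighted average of the observed ys."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x0 is too far from the data for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical observations; a small bandwidth follows the data closely,
# a large one approaches the overall mean.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
print(kernel_smooth(xs, ys, 2.0, 0.5), kernel_smooth(xs, ys, 2.0, 50.0))
```

No functional form is assumed for the relationship; the bandwidth alone controls how smooth the estimate is.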

  12. Generalized Additive Models • Can use parametric or non-parametric functions

  13. Specific Methods/Software • MaxEnt: Species Distribution/Habitat Suitability Models • Non-Parametric Multiplicative Regression (NPMR) • Genetic Algorithm for Rule Set Production (GARP) • Others…

  14. Trees A tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Wikipedia

  15. Trees • Classification Trees • Predicted outcome is a class (sex) • Regression Trees • Predicted outcome is a value (percent) • Boosted Trees • Build many trees in sequence, each fitted to the errors of the trees before it • Random Forests • Combine many trees to improve fit
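
To show how a regression tree arrives at a predicted value, the following sketch finds a single best split on one predictor by minimizing the summed squared error of the two resulting leaves. It is a one-level "stump" written for illustration, with invented data; real tree software repeats this step recursively and adds stopping rules.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the split with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1:]


# Hypothetical data with an obvious jump between x = 3 and x = 10.
threshold, left_mean, right_mean = best_split(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    [5.0, 5.0, 5.0, 20.0, 20.0, 20.0],
)
print(threshold, left_mean, right_mean)
```

Each leaf predicts the mean of the observations that fall into it, which is what makes the outcome a value rather than a class.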

  16. Model Selection • Type of model should be selected based on what is known about the phenomenon being modeled • Given: • A set of data from “tests” • A set of models where we can compute the probability of each test • We can compute the “best” model based on its fit to the data and its number of parameters (if we can compute a probability for each “test” and the data are independent and identically distributed)

  17. Parsimony • “…too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable.” Edwards, A. W. F. (2001). Occam’s bonus. p. 128–139; in Zellner, A., Keuzenkamp, H. A., and McAleer, M. Simplicity, inference and modelling. Cambridge University Press, Cambridge, UK.

  18. Likelihood • Likelihood of a set of parameter values given some observed data = probability of the observed data given those parameter values
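
The definition above can be sketched with a binomial example: the likelihood of a success probability `p`, given 7 successes in 10 trials, is the probability of observing that outcome if `p` were true. The data and the grid of candidate values are invented for illustration.

```python
import math


def binomial_likelihood(p, successes, trials):
    """Likelihood of p given the data = probability of the data given p."""
    return math.comb(trials, successes) * p ** successes * (1.0 - p) ** (trials - successes)


# Hypothetical observation: 7 successes in 10 trials.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binomial_likelihood(p, 7, 10))
print(best_p, binomial_likelihood(best_p, 7, 10))
```

The data stay fixed while the parameter varies; the candidate with the highest likelihood on this grid is the observed proportion of successes.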

  19. Likelihood

  20. Akaike Information Criterion • AIC = 2K - 2 ln(L) • K = number of estimated parameters in the model • L = maximized value of the likelihood function for the estimated model
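
The formula on this slide was an image that did not survive transcription. The standard definition is AIC = 2K - 2 ln(L), which the sketch below computes from a maximized log-likelihood. The function name and example numbers are illustrative.

```python
def aic(max_log_likelihood, num_parameters):
    """Akaike Information Criterion: 2K - 2 ln(L), taking ln(L) directly as input."""
    if num_parameters < 0:
        raise ValueError("num_parameters must be non-negative")
    return 2 * num_parameters - 2 * max_log_likelihood


# Hypothetical model with 3 estimated parameters and a maximized log-likelihood of -10.
print(aic(-10.0, 3))
```

Working with ln(L) rather than L avoids numerical underflow, because the likelihood of many independent observations is a product of small probabilities.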

  21. Parsimony Model Based Inference in the Life Sciences, Anderson

  22. AIC

  23. AIC • Has only a relative meaning • Smaller is “better” • Balance between complexity (overfitting, lots of parameters) and bias
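
Because AIC values are only meaningful relative to one another, a common way to read them is as differences from the smallest value among the candidate models. The sketch below ranks models that way; the model names and AIC values are invented, and all candidates are assumed to have been fitted to the same data.

```python
def rank_by_aic(aic_by_model):
    """Return (model, delta) pairs sorted from best to worst, where delta = AIC - min AIC."""
    if not aic_by_model:
        raise ValueError("need at least one model")
    best = min(aic_by_model.values())
    deltas = {name: value - best for name, value in aic_by_model.items()}
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical AIC values for three candidate models fitted to the same data set.
ranking = rank_by_aic({"linear": 104.2, "quadratic": 100.1, "cubic": 101.9})
for name, delta in ranking:
    print(name, round(delta, 1))
```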
