Modeling • Create an abstraction of something in the real world • Can be parameterized • Should be validated against real-world data • Types: • Interpolation • Correlation • Simulation
Correlation Models • Predicting values of a dependent variable from one or more independent variables • Remember: correlation does not imply causation Wikipedia
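As a minimal sketch of a correlation model, the snippet below fits a straight line by ordinary least squares and uses it to predict the dependent variable from one independent variable. The variable names and the elevation/temperature numbers are made up for illustration and are not from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a new value of x."""
    return intercept + slope * x


# Hypothetical data: temperature (dependent) against elevation (independent).
elevation = [100.0, 200.0, 300.0, 400.0, 500.0]
temperature = [20.0, 19.0, 18.5, 17.0, 16.5]

slope, intercept = fit_line(elevation, temperature)
print(slope, intercept, predict(250.0, slope, intercept))
```

The fitted line only describes how the two variables move together in this sample; it says nothing about whether one causes the other.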
Parametric Methods • Typical Probability Distribution Functions • Gaussian, Negative Exponential, Binomial, Gamma, Poisson, … • Generalized Linear Models • Linearize data • Polynomial • Linear • Nth-Order Polynomials • Generalized Additive Models • Box-Models (BioClim) • Logistic…
Gaussian (Normal) Function Wikipedia
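For reference, the Gaussian density can be written out directly. This is a small sketch using only the standard library; the function name and default parameters are illustrative choices.

```python
import math


def gaussian_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# The curve is symmetric about the mean and peaks there.
print(gaussian_pdf(0.0), gaussian_pdf(1.0), gaussian_pdf(-1.0))
```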
Exponential (negative) Wikipedia
Binomial • Number of successes in a series of yes/no trials Wikipedia
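A short sketch of the binomial probability mass function: the probability of exactly `k` successes in `n` independent yes/no trials, each with success probability `p`. The function name and the coin-flip numbers are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


# Probability of exactly 3 heads in 10 flips of a fair coin.
print(binomial_pmf(3, 10, 0.5))
```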
Gamma distribution Wikipedia
Poisson distribution • Probability of a given number of events occurring in a fixed interval of time Wikipedia
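The Poisson probability can be sketched the same way. Here `rate` is the expected number of events in the interval; the name and the example value are assumptions for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected per interval."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# If 2 events are expected per interval, how likely are 0, 1, 2, 3 events?
for k in range(4):
    print(k, poisson_pmf(k, 2.0))
```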
Polynomial • Flexible and adaptable • Notorious for oscillations between exact-fit values
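The oscillation problem can be demonstrated with a classic test case (Runge's function, which is not mentioned in the slides and is used here only as an illustration). The sketch passes a polynomial exactly through equally spaced points and measures how far it strays from the true curve between them.

```python
def lagrange_interpolate(nodes_x, nodes_y, x):
    """Evaluate the unique polynomial through the given points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes_x, nodes_y)):
        weight = 1.0
        for j, xj in enumerate(nodes_x):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def runge(x):
    """Smooth bell-shaped test function on [-1, 1]."""
    return 1.0 / (1.0 + 25.0 * x * x)


def max_error(degree, samples=401):
    """Largest gap between the exact-fit polynomial and the true function."""
    nodes_x = [-1.0 + 2.0 * i / degree for i in range(degree + 1)]
    nodes_y = [runge(x) for x in nodes_x]
    worst = 0.0
    for s in range(samples):
        x = -1.0 + 2.0 * s / (samples - 1)
        gap = abs(lagrange_interpolate(nodes_x, nodes_y, x) - runge(x))
        worst = max(worst, gap)
    return worst


# A higher-order polynomial fits more points exactly but swings harder between them.
for degree in (4, 10):
    print(degree, max_error(degree))
```

With equally spaced points the higher-order fit is worse between the points, even though it matches more of them exactly.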
Non-Parametric Methods • Piece-wise Regression (MaxEnt) • Kernel Smoothing (NPMR) • Neural Nets • Regression/Decision Trees • Multivariate adaptive regression splines (MARS) • Genetic Algorithms
Generalized Additive Models • Can use parametric or non-parametric functions
Specific Methods/Software • MaxEnt: Species Distribution/Habitat Suitability Models • Non-Parametric Multiplicative Regression (NPMR) • Genetic Algorithm for Rule Set Production (GARP) • Others…
Trees A tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Wikipedia
Trees • Classification Trees • Predicted outcome is a class (sex) • Regression Trees • Predicted outcome is a value (percent) • Boosted Trees • Combine many small trees fitted in sequence, each one correcting the errors of the trees before it • Random Forests • Combine many trees to improve fit
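To make the idea of a regression tree concrete, here is a sketch of the smallest possible one: a single split (a "stump") on one variable. It tries every threshold between neighboring data points and keeps the one that minimizes the squared error when each side is predicted by its mean. The data and function names are invented for illustration; real tree software repeats this step recursively and over many variables.

```python
def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors if each side is predicted by its own mean.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(x, threshold, left_mean, right_mean):
    """Predicted value is the mean of the leaf that x falls into."""
    return left_mean if x <= threshold else right_mean


# Hypothetical data with an obvious jump between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0]
threshold, left_mean, right_mean = fit_stump(xs, ys)
print(threshold, left_mean, right_mean)
```

A classification tree follows the same pattern but scores each split by how pure the classes are on each side rather than by squared error.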
Model Selection • Type of model should be selected based on what is known about the phenomenon being modeled • Given: • A set of data from “tests” • A set of models where we can compute the probability of each test • We can compute the “best” model based on its fit to the data and its number of parameters (if we can compute a probability for each “test” and the data are independent and identically distributed)
Parsimony • “…too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable.” Edwards, A. W. F. (2001). Occam’s bonus. p. 128–139; in Zellner, A., Keuzenkamp, H. A., and McAleer, M. Simplicity, inference and modelling. Cambridge University Press, Cambridge, UK.
Likelihood • Likelihood of a set of parameter values given some observed data = probability of the observed data given those parameter values
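A small sketch of this definition, assuming a binomial model and made-up data (7 successes in 10 trials). The data stay fixed while the parameter `p` varies, and the likelihood of each candidate `p` is simply the probability of the observed data under that `p`.

```python
import math


def likelihood(p, successes, trials):
    """Likelihood of p given the data = P(data | p) under a binomial model."""
    return math.comb(trials, successes) * p ** successes * (1.0 - p) ** (trials - successes)


# Hypothetical observation: 7 successes in 10 trials.
successes, trials = 7, 10

# Scan candidate parameter values and keep the most likely one.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: likelihood(p, successes, trials))
print(best_p, likelihood(best_p, successes, trials))
```

The most likely value lands on the observed success fraction, which is the maximum-likelihood estimate for this model.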
Akaike Information Criterion • AIC = 2K - 2 ln(L) • K = number of estimated parameters in the model • L = Maximized likelihood function for the estimated model
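A minimal sketch of the calculation, using the standard definition AIC = 2K - 2 ln(L). It compares two hypothetical models for the same made-up data (7 successes in 10 trials): a fair-coin model with no estimated parameters, and a model whose success probability is estimated from the data. The function names and data are assumptions for illustration.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion from the maximized log-likelihood and parameter count."""
    return 2 * k - 2 * log_likelihood


def binomial_log_likelihood(p, successes, trials):
    """Natural log of the binomial probability of the observed data."""
    return (
        math.log(math.comb(trials, successes))
        + successes * math.log(p)
        + (trials - successes) * math.log(1.0 - p)
    )


successes, trials = 7, 10

# Model A: fair coin, p fixed at 0.5, so K = 0 estimated parameters.
aic_fair = aic(binomial_log_likelihood(0.5, successes, trials), k=0)

# Model B: p estimated from the data (maximum likelihood), so K = 1.
aic_fitted = aic(binomial_log_likelihood(successes / trials, successes, trials), k=1)

print(aic_fair, aic_fitted)
```

With this little data the fitted model has the higher likelihood, but not by enough to pay for its extra parameter, so the simpler model gets the smaller AIC.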
Parsimony Model Based Inference in the Life Sciences, Anderson
AIC • Only has a relative meaning: compare values across models fitted to the same data • Smaller is “better” • Balances complexity (overfitting, many parameters) against bias (underfitting, too few parameters)