Generalized Linear Model (GZLM): Overview
Dependent Variables • Continuous • Discrete • Dichotomous • Polytomous • Ordinal • Count
Continuous Variables • Quantitative variables that can take on any value within the limits of the variable
Continuous Variables (cont’d) • Distance, time, or length • Infinite number of possible divisions between any two values, at least theoretically • “Only love can be divided endlessly and still not diminish” (Anne Morrow Lindbergh) • More than 11 ordered values • Scores on standardized scales such as those that measure parenting attitudes, depression, family functioning, and children’s behavioral problems
Discrete Variables • Finite number of indivisible values; cannot take on all possible values within the limits of the variable • Dichotomous • Polytomous • Ordinal • Count
Dichotomous Variables • Two categories used to indicate whether an event has occurred or some characteristic is present • Sometimes called binary or binomial variables • “To be or not to be, that is the question” (William Shakespeare, “Hamlet”)
Dichotomous DVs • Placed in foster care or not • Diagnosed with a disease or not • Abused or not • Pregnant or not • Service provided or not
Polytomous Variables • Three or more unordered categories • Categories mutually exclusive and exhaustive • Sometimes called multicategorical or multinomial variables • “Inanimate objects can be classified scientifically into three major categories: those that don't work, those that break down and those that get lost” (Russell Baker)
Polytomous DVs • Reason for leaving welfare: • marriage, stable employment, move to another state, incarceration, or death • Status of foster home application: • licensed to foster, discontinued application process prior to licensure, or rejected for licensure • Changes in living arrangements of the elderly: • newly co-residing with their children, no longer co-residing, or residing in institutions
Ordinal Variables • Three or more ordered categories • Sometimes called ordered categorical variables or ordered polytomous variables • “Good, better, best; never let it rest till your good is better and your better is best” (Anonymous)
Ordinal DVs • Job satisfaction: • very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied • Severity of child abuse injury: • none, mild, moderate, or severe • Willingness to foster children with emotional or behavioral problems: • least acceptable, willing to discuss, or most acceptable
Count Variables • Number of times a particular event occurs to each case, usually within a given: • Time period (e.g., number of hospital visits per year) • Population size (e.g., number of registered sex offenders per 100,000 population), or • Geographical area (e.g., number of divorces per county or state) • Whole numbers that can range from 0 through +∞
Count Variables (cont’d) “Now I've got heartaches by the number, Troubles by the score, Every day you love me less, Each day I love you more” (Ray Price)
Count DVs • Number of hospital visits, outpatient visits, services used, divorces, arrests, criminal offenses, symptoms, placements, children fostered, children adopted
Generalized Linear Model (GZLM) (selected regression models)
Generalized How? • DV continuous or discrete • Normal or non-normal error distributions • Constant or non-constant variance • Provides a unifying framework for analyzing an entire class of regression models
GLM & GZLM Similarities • IVs are combined in a linear fashion (α + β₁X₁ + β₂X₂ + … + βₖXₖ); • a slope is estimated for each IV; • each slope has an accompanying test of statistical significance and confidence interval; • each slope indicates the IV’s independent contribution to the explanation or prediction of the DV;
GLM & GZLM Similarities (cont’d) • the sign of each slope indicates the direction of the relationship • IVs can be any level of measurement; • the same methods are used for coding categorical IVs (e.g., dummy coding); • IVs can be entered simultaneously, sequentially or using other methods; • product terms can be used to test interactions;
GLM & GZLM Similarities (cont’d) • powered terms (e.g., the square of an IV) can be used to test curvilinearity; • overall model fit can be tested, as can incremental improvement in a model brought about by the addition or deletion of IVs (nested models); and • residuals, leverage values, Cook’s D, and other indices are used to diagnose model problems.
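Because the IVs are built the same way in a GLM and a GZLM, the coding steps listed above (dummy coding a categorical IV, product terms for interactions, powered terms for curvilinearity) can be sketched once. This minimal Python sketch assumes hypothetical data: the IV names, the categories, the reference category, and the helper functions are all illustrative, not taken from the source.

```python
def dummy_code(values, reference):
    """Dummy-code a categorical IV: one 0/1 column per category except the reference."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def product_term(x, z):
    """Element-wise product of two IVs, entered to test an X-by-Z interaction."""
    return [a * b for a, b in zip(x, z)]

def squared_term(x):
    """Square of an IV, entered alongside X to test a curvilinear relationship."""
    return [a ** 2 for a in x]

# Hypothetical data: a three-category IV and a quantitative IV in z-score units.
region = ["urban", "rural", "suburban", "urban", "rural"]
responsibility = [0.5, -1.0, 1.5, 0.0, 2.0]

dummies = dummy_code(region, reference="urban")
print(dummies)         # {'rural': [0, 1, 0, 0, 1], 'suburban': [0, 0, 1, 0, 0]}

rural_by_resp = product_term(dummies["rural"], responsibility)
print(rural_by_resp)   # [0.0, -1.0, 0.0, 0.0, 2.0]

resp_squared = squared_term(responsibility)
print(resp_squared)    # [0.25, 1.0, 2.25, 0.0, 4.0]
```

The reference category ("urban" here) gets no column of its own; each dummy's slope is then interpreted relative to it, in either kind of model.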
Common Assumptions • Correct model specification • Variables measured without error • Independent errors • No perfect multicollinearity
Correct Model Specification • Have you included relevant IVs? • Have you excluded irrelevant IVs? • Do the IVs that you have included have linear or non-linear relationships with your DV (or some function of your DV, as discussed below)? • Are one or more of your IVs moderated by other IVs (i.e., are there interaction effects)?
Variables Measured without Error • Limitation of regression models, given that most often our variables contain some measurement error
Independent Errors • Can be result of study design, e.g.: • Clustered data, which occurs when data are collected from groups • Temporally linked data, which occurs when data are collected repeatedly over time from the same people or groups • Can lead to incorrect significance tests and confidence intervals
Independent Errors (cont’d) • Examples of when this might not be true • Effect of parenting practices on behavioral problems of children and reports of parenting practices and behavioral problems collected from both parents in two-parent families • Effect of parenting practices on behavioral problems of children and information collected about behavioral problems for two or more children per family • Effects of leader behaviors on group cohesion in small groups, and information collected about leader behaviors and group cohesion from all members of each group
No Perfect Multicollinearity • Perfect multicollinearity exists when an IV is predicted perfectly by a linear combination of the remaining IVs • Typically quantified by “tolerance” or the “variance inflation factor” (VIF = 1/tolerance) • Even high (but not perfect) levels of multicollinearity may pose problems (e.g., tolerance < .20 or especially < .10)
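Tolerance for an IV is 1 − R² from regressing that IV on all the other IVs, and VIF is its reciprocal. A minimal NumPy sketch, assuming simulated (hypothetical) IVs in which x2 is nearly a copy of x1; the function name and the data are illustrative only. With perfect multicollinearity the tolerance would reach (essentially) 0 and the VIF would blow up.

```python
import numpy as np

def tolerance_and_vif(X):
    """For each IV (column of X), regress it on the remaining IVs plus an intercept.
    Tolerance = 1 - R^2 from that regression; VIF = 1 / tolerance."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        target = X[:, j]
        predictors = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
        resid = target - predictors @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tolerance = 1.0 - r_squared
        results.append((tolerance, 1.0 / tolerance))
    return results

# Simulated (hypothetical) IVs: x2 is nearly a copy of x1; x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 0.2 * rng.standard_normal(200)
x3 = rng.standard_normal(200)

stats = tolerance_and_vif(np.column_stack([x1, x2, x3]))
for name, (tol, vif) in zip(["x1", "x2", "x3"], stats):
    flag = "problem" if tol < 0.20 else "ok"   # .20 cutoff from the slide
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.1f} ({flag})")
```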
Estimating Parameters (e.g., β) • GLM • Ordinary Least Squares (OLS) estimation • Estimates minimize the sum of the squared differences between observed and estimated values of the DV http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html • GZLM • Maximum Likelihood (ML) estimation • Estimates are the parameter values with the greatest likelihood (i.e., the maximum likelihood) of generating the observed sample data, if model assumptions are true
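The two estimation approaches can be contrasted in a short NumPy sketch. Everything in it is an assumption for illustration: the data are simulated, and the ML routine is a plain Newton-Raphson solver for Poisson regression with a log link (one standard way to reach ML estimates, not necessarily what any particular statistics package does internally).

```python
import numpy as np

def ols(X, y):
    """OLS: coefficients that minimize the sum of squared residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def poisson_ml(X, y, tol=1e-10, max_iter=100):
    """ML estimates for Poisson regression (log link) via Newton-Raphson.
    Assumes column 0 of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # start from the intercept-only model
    for _ in range(max_iter):
        mu = np.exp(X @ beta)                # estimated mean counts
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        information = X.T @ (X * mu[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated (hypothetical) data: one z-scored IV and a count DV.
rng = np.random.default_rng(1)
n = 285
x = rng.standard_normal(n)
y = rng.poisson(np.exp(0.018 + 0.185 * x)).astype(float)
X = np.column_stack([np.ones(n), x])

b_ols = ols(X, y)
b_ml = poisson_ml(X, y)
print("OLS (DV in original units):", b_ols)
print("Poisson ML (log of mean count):", b_ml)
```

At the OLS solution the residuals are uncorrelated with every IV; at the ML solution the gradient of the log-likelihood is zero. The two sets of slopes are on different scales and are not directly comparable.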
Testing Hypotheses • Overall and nested models (β₁ = β₂ = … = βₖ = 0) • GLM • F • GZLM • Likelihood ratio χ² • Individual slopes (β = 0) • GLM • t • GZLM • Wald χ² or likelihood ratio χ²
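For a GZLM, the likelihood ratio χ² is twice the gain in log-likelihood from the reduced to the full model, and the Wald χ² for one slope is (slope / SE)². A minimal sketch, assuming simulated (hypothetical) Poisson data with a single IV, so the overall test and the slope test both have 1 df; the helper names are illustrative, and the 1-df p-value uses the identity P(χ²₁ > s) = erfc(√(s/2)).

```python
import math
import numpy as np

def poisson_fit(X, y, iters=50):
    """Poisson ML (log link) by Newton-Raphson; column 0 of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(iters):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    return beta

def poisson_loglik(X, y, beta):
    """Poisson log-likelihood: sum of y*eta - exp(eta) - ln(y!)."""
    eta = X @ beta
    return float(np.sum(y * eta - np.exp(eta)) - sum(math.lgamma(v + 1) for v in y))

def chi2_sf_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

# Simulated (hypothetical) data.
rng = np.random.default_rng(1)
n = 285
x = rng.standard_normal(n)
y = rng.poisson(np.exp(0.018 + 0.185 * x)).astype(float)
X_full = np.column_stack([np.ones(n), x])
X_reduced = X_full[:, :1]          # intercept only: H0 is beta = 0

b_full = poisson_fit(X_full, y)
b_reduced = poisson_fit(X_reduced, y)

# Likelihood ratio chi-square: twice the gain in log-likelihood.
lr = 2.0 * (poisson_loglik(X_full, y, b_full) - poisson_loglik(X_reduced, y, b_reduced))

# Wald chi-square: (slope / SE)^2, with SE from the inverse information matrix.
mu = np.exp(X_full @ b_full)
cov = np.linalg.inv(X_full.T @ (X_full * mu[:, None]))
wald = (b_full[1] / math.sqrt(cov[1, 1])) ** 2

print(f"LR chi2 = {lr:.2f}, p = {chi2_sf_df1(lr):.4f}")
print(f"Wald chi2 = {wald:.2f}, p = {chi2_sf_df1(wald):.4f}")
```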
Estimating DV with GLM • Three ways of expressing the same thing… • μ̂ = α + β₁X₁ + β₂X₂ + … + βₖXₖ • μ̂ = η • Assumed linear relationship • μ = Greek letter mu • Estimated mean value of DV • η = Greek letter eta • Linear predictor
Estimating DV with Poisson Regression • ln(μ̂) = α + β₁X₁ + β₂X₂ + … + βₖXₖ • ln(μ̂) = η • Assumed linear relationship
Single (Quantitative) IV Example • DV = number of foster children adopted • IV = Perceived responsibility for parenting (scale scores transformed to z-scores) • N = 285 foster mothers • Do foster mothers who feel a greater responsibility to parent foster children adopt more foster children?
Poisson Model • ln(μ̂) = α + βX • Log of estimated mean count • ln(μ̂) = .018 + (.185)(X) • Log of mean number of children adopted • Does not have intuitive or substantive meaning
Mathematical Functions • Function • √4 = 2 • Inverse (reverse) function • 2² = 4
Mathematical Functions (cont’d) • Function • ln(μ̂), natural logarithm of μ̂ • “Link function” • Inverse (reverse) function • exp(η), exponential of η • e^x on calculator • exp(x) in SPSS and Excel • “Inverse link function”
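The link and inverse link undo each other in the same way √4 and 2² do. A minimal Python sketch using the standard `math` module; the helper names are illustrative, and the value 1.225 is borrowed from the estimated-mean example for this Poisson model.

```python
import math

def log_link(mu):
    """Link function: ln(mu) maps a positive mean count to the linear-predictor scale."""
    return math.log(mu)

def inverse_log_link(eta):
    """Inverse link: exp(eta) maps a linear predictor back to a mean count."""
    return math.exp(eta)

# A function and its inverse undo each other, as with sqrt(4) = 2 and 2**2 = 4.
print(math.sqrt(4), 2 ** 2)               # 2.0 4

mu_hat = 1.225                            # a mean count, e.g., children adopted
eta = log_link(mu_hat)                    # back on the log scale
print(round(eta, 3))                      # 0.203
print(round(inverse_log_link(eta), 3))    # 1.225
```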
Link Function • ln(μ̂), log of estimated mean count • Connects (i.e., links) mean value of DV to linear combination of IVs • Transforms μ̂ so that its relationship with η is linear (ln(μ̂) = η) • Different GZLM models use different links • Does not have intuitive or substantive meaning
Inverse (Reverse) Link Function • Three ways of expressing the same thing… • μ̂ = exp(α + β₁X₁ + β₂X₂ + … + βₖXₖ) • μ̂ = exp(η) • μ̂ = e^η • Values of μ̂ represent values of the DV with intuitive and substantive meaning • e.g., mean number of children adopted
Estimated Mean DV • ln(μ̂) = .018 + (.185)(X) • X = 0 • .018 + (.185)(0) = .018 • e^.018 = 1.018 • M = 1.02 children adopted • X = 1 • .018 + (.185)(1) = .203 • e^.203 = 1.225 • M = 1.23 children adopted
Examples of Exponentiation • e^0 = 1.00 • e^.50 = 1.65 • e^1.00 = 2.72
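The arithmetic on the last two slides can be reproduced in a few lines of Python: compute the linear predictor from the adoption-example estimates, then apply the inverse link. The constant and function names are illustrative only.

```python
import math

# Estimates from the adoption example: ln(mu_hat) = .018 + .185 * X
ALPHA, BETA = 0.018, 0.185

def estimated_mean(x):
    """Apply the inverse link: mu_hat = exp(alpha + beta * x)."""
    eta = ALPHA + BETA * x        # linear predictor (log of the mean count)
    return math.exp(eta)          # mean number of children adopted

for x in (0, 1):
    print(f"X = {x}: eta = {ALPHA + BETA * x:.3f}, mean = {estimated_mean(x):.2f}")
# X = 0: eta = 0.018, mean = 1.02
# X = 1: eta = 0.203, mean = 1.23

# The exponentiation examples: e^0, e^.50, e^1.00
print([round(math.exp(v), 2) for v in (0.0, 0.5, 1.0)])   # [1.0, 1.65, 2.72]
```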
Problem • For discrete DVs the relationship between the DV (μ̂) and the linear predictor (η) is non-linear • η = α + β₁X₁ + β₂X₂ + … + βₖXₖ • μ̂ = e^η • Non-linear • A one-unit increase in an IV may be associated with a different amount of change in the mean DV, depending on the initial value of the IV
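The problem can be seen numerically with the adoption-example estimates. In this Python sketch, equal one-unit steps in X produce different changes in the estimated mean count depending on the starting value of X, while the ratio of successive means stays at exp(.185) (a property of the log link). The starting X values are arbitrary choices for illustration.

```python
import math

# Adoption-example estimates: ln(mu_hat) = .018 + .185 * X
ALPHA, BETA = 0.018, 0.185

def estimated_mean(x):
    """Estimated mean count via the inverse link."""
    return math.exp(ALPHA + BETA * x)

changes = []
for x in (-2, 0, 2):
    before, after = estimated_mean(x), estimated_mean(x + 1)
    changes.append(after - before)
    print(f"X {x:+d} -> {x + 1:+d}: change in mean = {after - before:.3f}, "
          f"ratio = {after / before:.3f}")
# The differences grow with X; every ratio equals exp(.185), about 1.203.
```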
Solution • Linear relationship between a linear combination of one or more IVs and some function of the DV