
Ctd. Local Linear Models, LDA (cf. PCA, FA), Mixed Models: Optimizing, Iterating

Peter Fox, Data Analytics – ITWS-4600/ITWS-6600/MATP-4450, Group 4 Module 13, April 9, 2018.



  1. Ctd. Local Linear Models, LDA (cf. PCA, FA), Mixed Models: Optimizing, Iterating • Peter Fox • Data Analytics – ITWS-4600/ITWS-6600/MATP-4450 • Group 4 Module 13, April 9, 2018

  2. Smoothing/ local … • https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/00Index.html • http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf

  3. Classes of local regression • Locally (weighted) scatterplot smoothing • LOESS • LOWESS • Fitting is done locally: the fit at point x is made using points in a neighborhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance)

  4. Classes of local regression • The size of the neighborhood is controlled by α (set by span). • For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.

  5. Classes of local regression • For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. • Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit. • It can be important to tune the control list to achieve acceptable speed.
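The loess options described on the slides above (span, default least-squares family, family="symmetric") can be tried on a small example. This is a minimal sketch on simulated data; the data, the span of 0.3 and the planted outliers are assumptions made for illustration and are not from the lecture.

```r
# Simulated data (an assumption for illustration; not from the slides)
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
y[c(50, 120)] <- y[c(50, 120)] + 5        # two gross outliers
d <- data.frame(x = x, y = y)

# Default family ("gaussian"): weighted least squares in each neighbourhood,
# using the nearest 30% of the points (span = alpha = 0.3)
fit_ls <- loess(y ~ x, data = d, span = 0.3, degree = 1)

# family = "symmetric": a few M-estimation iterations with Tukey's biweight,
# started from the least-squares fit
fit_rob <- loess(y ~ x, data = d, span = 0.3, degree = 1, family = "symmetric")

# Compare both fits with the known underlying curve on a grid
grid     <- data.frame(x = seq(0, 10, length.out = 50))
pred_ls  <- predict(fit_ls,  newdata = grid)
pred_rob <- predict(fit_rob, newdata = grid)
c(ls     = mean(abs(pred_ls  - sin(grid$x))),
  robust = mean(abs(pred_rob - sin(grid$x))))
```

Changing span shows the trade-off directly: a smaller span follows the data more closely, a larger one smooths more.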

  6. Friedman (supsmu in modreg) • supsmu is a running lines smoother which chooses between three spans for the lines. • The running lines smoothers are symmetric, with k/2 data points each side of the predicted point, and values of k as 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. • If span is specified, a single smoother with span span * n is used.

  7. Friedman • The best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running lines smoother and the final prediction chosen by linear interpolation. • For small samples (n < 40) or if there are substantial serial correlations between observations close in x-value, a pre-specified fixed span smoother (span > 0) should be used. • Reasonable span values are 0.2 to 0.4.
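The two ways of calling Friedman's smoother described above (cross-validated span versus a fixed span) look like this in practice. A minimal sketch, assuming simulated data; supsmu now lives in the stats package, so no extra library is needed.

```r
# Simulated data (an assumption for illustration; not from the slides)
set.seed(2)
n <- 300
x <- sort(runif(n, 0, 10))
y <- sin(x) + rnorm(n, sd = 0.3)

# Default: span = "cv", i.e. the span is chosen by cross-validation
fit_cv <- supsmu(x, y)

# Fixed span, as suggested for small samples or serially correlated data
fit_fixed <- supsmu(x, y, span = 0.3)

# Both return a list with the sorted x values and the smoothed y values
str(fit_cv)
mean(abs(fit_cv$y - fit_fixed$y))   # how much the two smooths differ
```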

  8. Local non-param • lplm (in Rearrangement) • Local nonparametric method, local linear regression estimator with box kernel (default), for conditional mean functions

  9. Ridge regression • Addresses ill-posed regression problems using filtering approaches (e.g. high-pass) • Often called “regularization” • lm.ridge (in MASS)
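To see what the regularization does, lm.ridge can be run on a deliberately ill-posed problem. A minimal sketch, assuming simulated data with two nearly collinear predictors; the lambda grid is an arbitrary choice for illustration.

```r
library(MASS)   # provides lm.ridge

# Simulated ill-posed problem (an assumption; not from the slides)
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)        # x2 is almost a copy of x1
y  <- 1 + 2 * x1 + rnorm(n)
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Ordinary least squares: the x1/x2 coefficients are unstable
coef(lm(y ~ x1 + x2, data = d))

# Ridge regression over a grid of penalties
lambdas <- seq(0, 10, by = 0.5)
ridge   <- lm.ridge(y ~ x1 + x2, data = d, lambda = lambdas)

ridge_coefs <- coef(ridge)            # one row of coefficients per lambda
best        <- which.min(ridge$GCV)   # lambda with the smallest GCV score
ridge_coefs[best, ]
```

select(ridge) prints the HKB, L-W and GCV choices of lambda if a quick summary is preferred.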

  10. Quantile regression • Quantile regression is desired if conditional quantile functions are of interest. One advantage of quantile regression, relative to ordinary least squares regression, is that the quantile regression estimates are more robust against outliers in the response measurements • In practice we often prefer using different measures of central tendency and statistical dispersion to obtain a more comprehensive analysis of the relationship between variables • rq (in quantreg)

  11. Splines • smooth.spline, splinefun (stats, modreg) and ns (in splines) • http://www.inside-r.org/r-doc/splines • a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (which are known as knots)

  12. Splines • For interpolation, splines are often preferred to polynomial interpolation: they yield similar results to interpolating with higher-degree polynomials while avoiding instability due to overfitting • Features: simplicity of their construction, their ease and accuracy of evaluation, and their capacity to approximate complex shapes • Most common: cubic spline, i.e., of degree 3, in particular the cubic B-spline
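The three spline functions named on the slides cover interpolation, smoothing and regression bases. A minimal sketch, assuming a sine curve as test data and df = 5 for the natural spline basis; both choices are illustrative only.

```r
library(splines)   # provides ns()

# 1. Interpolation: a natural cubic spline through 11 exact points
x_knots <- seq(0, 2 * pi, length.out = 11)
y_knots <- sin(x_knots)
f <- splinefun(x_knots, y_knots, method = "natural")
interp_err <- max(abs(f(x_knots) - y_knots))   # zero at the knots, up to rounding
f(pi / 2)                                      # close to sin(pi/2) = 1

# 2. Smoothing: noisy observations, penalty chosen by generalized cross-validation
set.seed(4)
xn <- seq(0, 2 * pi, length.out = 200)
yn <- sin(xn) + rnorm(200, sd = 0.2)
ss <- smooth.spline(xn, yn)

# 3. Regression: a natural cubic spline basis inside lm()
basis  <- ns(xn, df = 5)
fit_ns <- lm(yn ~ basis)
summary(fit_ns)$r.squared
```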

  13. More… • Partial Least Squares Regression (PLSR) • mvr (in pls) • Principal Component Regression (PCR) • Canonical Powered Partial Least Squares (CPPLS)

  14. PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all • On the other hand, PLSR does take the response variable into account, and therefore often leads to models that are able to fit the response variable with fewer components • Whether or not that ultimately translates into a better model, in terms of its practical use, depends on the context
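The point that PCR ignores the response when building components can be shown with a hand-rolled PCR. A minimal sketch using prcomp plus lm rather than mvr from the pls package; the simulated predictors, where the response depends only on the low-variance one, are an assumption chosen to make the effect visible.

```r
# Simulated data (an assumption for illustration; not from the slides)
set.seed(5)
n <- 200
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),   # x1 and x2 share most of the variance
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n, sd = 0.3))       # low-variance predictor
y <- 2 * X[, "x3"] + rnorm(n, sd = 0.1)   # response depends only on x3

# Components are built from X alone; y plays no role here
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Regress y on the first component only, then on all three
pcr_1 <- lm(y ~ pc$x[, 1])
pcr_3 <- lm(y ~ pc$x[, 1:3])
r2 <- c(one = summary(pcr_1)$r.squared, all = summary(pcr_3)$r.squared)
r2
```

The first component captures the shared x1/x2 variance, which is unrelated to y, so it explains little of the response. PLSR would weight directions by their covariance with y instead.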

  15. Linear Discriminant Analysis • Find a linear combination of features that characterizes or separates two or more classes of objects or events, i.e. a linear classifier, cf. dimension reduction then classification (multiple classes, e.g. facial rec.) • Function lda in package MASS • Dependent variable (the class) is categorical and independent variables are continuous • Assumes normal distribution of classes and equal class covariances, cf. Fisher LD, which does not (fdaCMA in package CMA)
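A short run of lda makes the setup concrete: a categorical class and continuous predictors. A minimal sketch, assuming the built-in iris data as a stand-in example (it is not used in the lecture); accuracy is measured on the training data, so it is optimistic.

```r
library(MASS)   # provides lda

# Class = Species (categorical); predictors = four continuous measurements
fit <- lda(Species ~ ., data = iris)

# Coefficients of the linear discriminants (the linear combinations of features)
fit$scaling

# Classify the training data and tabulate predicted against true classes
pred <- predict(fit, iris)
table(true = iris$Species, predicted = pred$class)
acc <- mean(pred$class == iris$Species)
acc

# pred$x holds the discriminant scores: with 3 classes there are at most 2
dim(pred$x)
```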

  16. Relation to PCA, FA? • Both seek linear combinations of variables which best “explain” the data (variance) • LDA explicitly models the difference between the classes of data • PCA on the other hand does not take into account any difference in class • Factor analysis (FA) builds the feature combinations based on differences of factors rather than similarities

  17. Relation to PCA, FA? • Discriminant analysis is not an interdependence technique: a distinction between independent variables and dependent variables is made (unlike factor analysis) • NB: If you have categorical independent variables, the equivalent technique is Discriminant Correspondence Analysis (discrimin.coa in ade4) • See also Flexible DA (fda) and Mixture DA (mda) in mda

  18. Now mixed models

  19. What is a mixed model? • Often known as latent class (mixed) models, or linear or non-linear mixed models • Basic type – mix of two models • Random component: part of the model that is random or unobserved • Systematic component = observed… • E.g. linear model: y = y0 + br x + bs z • y0 – intercept • br – random coefficient • bs – systematic coefficient • Or y = y0 + fr(x,u,v,w) + fs(z,a,b) • Or …

  20. Example • Gender – systematic • Movie preference – random? • In semester – systematic • Students on campus – random? • Summer – systematic • People at the beach – random?

  21. Remember latent variables? • In factor analysis – goal was to use observed variables (as components) in “factors” • Some variables were not used – why? • Low cross-correlations? • Small contribution to explaining the variance? • Mixed models aim to include them!! • Thoughts?

  22. Latent class (LC) • LC models do not rely on the traditional modeling assumptions which are often violated in practice (linear relationship, normal distribution, homogeneity) • less subject to biases associated with data not conforming to model assumptions. • In addition, LC models include variables of mixed scale types (nominal, ordinal, continuous and/or count variables) in the same analysis.

  23. Latent class (LC) • For improved cluster or segment description the relationship between the latent classes and external variables (covariates) can be assessed simultaneously with the identification of the clusters. • eliminates the need for the usual second stage of analysis where a discriminant analysis is performed to relate the cluster results to demographic and other variables.

  24. Kinds of Latent Class Models • Three common statistical application areas of LC analysis are those that involve • 1)  clustering of cases, • 2)  variable reduction and scale construction, and • 3) prediction.

  25. Thus! • To construct and then run a mixed model, YOU must make many choices including: • the nature of the hierarchy, • the fixed effects and, • the random effects.

  26. Beyond mixture = 2? • Hierarchy, fixed, random = 3? • More? • Changes over time – a fourth dimension?

  27. Comparing lm, glm, lme4, lcmm
      lmm.data <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module9/lmm.data.txt",
                             header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
      summary(lmm.data)
             id              extro           open            agree           social       class   school
       Min.   :   1.0   Min.   :30.20   Min.   :22.30   Min.   :18.48   Min.   : 46.31   a:300   I  :200
       1st Qu.: 300.8   1st Qu.:54.17   1st Qu.:36.20   1st Qu.:31.90   1st Qu.: 89.32   b:300   II :200
       Median : 600.5   Median :60.15   Median :39.98   Median :35.05   Median : 99.20   c:300   III:200
       Mean   : 600.5   Mean   :60.27   Mean   :40.06   Mean   :35.07   Mean   : 99.53   d:300   IV :200
       3rd Qu.: 900.2   3rd Qu.:66.50   3rd Qu.:43.93   3rd Qu.:38.42   3rd Qu.:109.83           V  :200
       Max.   :1200.0   Max.   :90.83   Max.   :57.87   Max.   :58.44   Max.   :151.96           VI :200

  28. Comparing lm, glm, lme4, lcmm
      > head(lmm.data)
        id    extro     open    agree    social class school
      1  1 63.69356 43.43306 38.02668  75.05811     d     IV
      2  2 69.48244 46.86979 31.48957  98.12560     a     VI
      3  3 79.74006 32.27013 40.20866 116.33897     d     VI
      4  4 62.96674 44.40790 30.50866  90.46888     c     IV
      5  5 64.24582 36.86337 37.43949  98.51873     d     IV
      6  6 50.97107 46.25627 38.83196  75.21992     d      I
      > nrow(lmm.data)
      [1] 1200

  29. Comparing lm, glm, lme4, lcmm
      lm.1 <- lm(extro ~ open + social, data = lmm.data)
      summary(lm.1)
      Call:
      lm(formula = extro ~ open + social, data = lmm.data)
      Residuals:
           Min       1Q   Median       3Q      Max
      -30.2870  -6.0657  -0.1616   6.2159  30.2947
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 58.754056   2.554694  22.998   <2e-16 ***
      open         0.025095   0.046451   0.540    0.589
      social       0.005104   0.017297   0.295    0.768
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 9.339 on 1197 degrees of freedom
      Multiple R-squared: 0.0003154, Adjusted R-squared: -0.001355
      F-statistic: 0.1888 on 2 and 1197 DF, p-value: 0.828

  30. And then
      lm.2 <- lm(extro ~ open + agree + social, data = lmm.data)
      summary(lm.2)
      Call:
      lm(formula = extro ~ open + agree + social, data = lmm.data)
      Residuals:
           Min       1Q   Median       3Q      Max
      -30.3151  -6.0743  -0.1586   6.2851  30.0167
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 57.839518   3.148056  18.373   <2e-16 ***
      open         0.024749   0.046471   0.533    0.594
      agree        0.026538   0.053347   0.497    0.619
      social       0.005082   0.017303   0.294    0.769
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 9.342 on 1196 degrees of freedom
      Multiple R-squared: 0.0005222, Adjusted R-squared: -0.001985
      F-statistic: 0.2083 on 3 and 1196 DF, p-value: 0.8907

  31. anova(lm.1, lm.2)
      Analysis of Variance Table
      Model 1: extro ~ open + social
      Model 2: extro ~ open + agree + social
        Res.Df    RSS Df Sum of Sq      F Pr(>F)
      1   1197 104400
      2   1196 104378  1    21.598 0.2475  0.619
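The F statistic and p-value in this anova table can be reproduced by hand, which shows what the comparison actually tests. A minimal sketch using only the rounded values printed in the table, so the last digit may differ slightly from R's own output.

```r
# Values copied from the anova(lm.1, lm.2) table
sum_sq  <- 21.598    # drop in residual sum of squares when 'agree' is added
rss_2   <- 104378    # residual sum of squares of the larger model (lm.2)
df_diff <- 1         # one extra coefficient in lm.2
df_2    <- 1196      # residual degrees of freedom of lm.2

# F = (improvement per extra parameter) / (residual variance of the larger model)
f_stat <- (sum_sq / df_diff) / (rss_2 / df_2)

# Upper-tail probability of the F distribution with (1, 1196) degrees of freedom
p_val <- pf(f_stat, df_diff, df_2, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

A large p-value such as this one means adding agree does not reduce the residual variance by more than chance would.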

  32. Nesting, etc
      lm.3 <- lm(extro ~ open + social + class + school, data = lmm.data)
      summary(lm.3)
      Call:
      lm(formula = extro ~ open + social + class + school, data = lmm.data)
      Residuals:
           Min       1Q   Median       3Q      Max
      -13.1368  -0.9154   0.0176   0.8631  13.6773
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 43.069523   0.476596  90.369   <2e-16 ***
      open         0.010793   0.008346   1.293    0.196
      social      -0.001773   0.003106  -0.571    0.568
      classb       2.038816   0.136575  14.928   <2e-16 ***
      classc       3.696904   0.136266  27.130   <2e-16 ***
      classd       5.654166   0.136286  41.488   <2e-16 ***
      schoolII     7.921787   0.167294  47.353   <2e-16 ***
      schoolIII   12.119003   0.166925  72.602   <2e-16 ***
      schoolIV    16.052566   0.167100  96.066   <2e-16 ***
      schoolV     20.410702   0.166936 122.266   <2e-16 ***
      schoolVI    28.063091   0.167009 168.033   <2e-16 ***
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 1.669 on 1189 degrees of freedom
      Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
      F-statistic: 3631 on 10 and 1189 DF, p-value: < 2.2e-16

  33. Nesting, etc
      lm.4 <- lm(extro ~ open + agree + social + class + school, data = lmm.data)
      summary(lm.4)
      Call:
      lm(formula = extro ~ open + agree + social + class + school, data = lmm.data)
      Residuals:
           Min       1Q   Median       3Q      Max
      -13.1270  -0.9090   0.0155   0.8734  13.7295
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 43.254814   0.577059  74.957   <2e-16 ***
      open         0.010833   0.008349   1.298    0.195
      agree       -0.005474   0.009605  -0.570    0.569
      social      -0.001762   0.003107  -0.567    0.571
      classb       2.044195   0.136939  14.928   <2e-16 ***
      classc       3.701818   0.136577  27.104   <2e-16 ***
      classd       5.660806   0.136822  41.374   <2e-16 ***
      schoolII     7.924110   0.167391  47.339   <2e-16 ***
      schoolIII   12.117899   0.166983  72.569   <2e-16 ***
      schoolIV    16.050765   0.167177  96.011   <2e-16 ***
      schoolV     20.406924   0.167115 122.113   <2e-16 ***
      schoolVI    28.065860   0.167127 167.931   <2e-16 ***
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 1.669 on 1188 degrees of freedom
      Multiple R-squared: 0.9683, Adjusted R-squared: 0.968
      F-statistic: 3299 on 11 and 1188 DF, p-value: < 2.2e-16

  34. Analyze the variances**
      anova(lm.3, lm.4)
      Analysis of Variance Table
      Model 1: extro ~ open + social + class + school
      Model 2: extro ~ open + agree + social + class + school
        Res.Df    RSS Df Sum of Sq      F Pr(>F)
      1   1189 3311.4
      2   1188 3310.5  1   0.90492 0.3247 0.5689

  35. Specific interaction term
      # 'class:school' - different situation than one
      # with random effects (e.g., nested variables).
      lm.5 <- lm(extro ~ open + social + class:school, data = lmm.data)
      summary(lm.5)

  36. Summary
      Call:
      lm(formula = extro ~ open + social + class:school, data = lmm.data)
      Residuals:
          Min      1Q  Median      3Q     Max
      -9.8354 -0.3287  0.0141  0.3329 10.3912
      Coefficients: (1 not defined because of singularities)
                          Estimate Std. Error  t value Pr(>|t|)
      (Intercept)        8.008e+01  3.073e-01  260.581   <2e-16 ***
      open               6.019e-03  4.965e-03    1.212    0.226
      social             5.239e-04  1.853e-03    0.283    0.777
      classa:schoolI    -4.038e+01  1.970e-01 -204.976   <2e-16 ***
      classb:schoolI    -3.460e+01  1.971e-01 -175.497   <2e-16 ***
      classc:schoolI    -3.186e+01  1.970e-01 -161.755   <2e-16 ***
      classd:schoolI    -2.998e+01  1.972e-01 -152.063   <2e-16 ***
      classa:schoolII   -2.814e+01  1.974e-01 -142.558   <2e-16 ***
      classb:schoolII   -2.675e+01  1.971e-01 -135.706   <2e-16 ***

  37. Summary
      classc:schoolII   -2.563e+01  1.970e-01 -130.139   <2e-16 ***
      classd:schoolII   -2.456e+01  1.969e-01 -124.761   <2e-16 ***
      classa:schoolIII  -2.356e+01  1.970e-01 -119.605   <2e-16 ***
      classb:schoolIII  -2.259e+01  1.970e-01 -114.628   <2e-16 ***
      classc:schoolIII  -2.156e+01  1.970e-01 -109.482   <2e-16 ***
      classd:schoolIII  -2.064e+01  1.971e-01 -104.697   <2e-16 ***
      classa:schoolIV   -1.974e+01  1.972e-01 -100.085   <2e-16 ***
      classb:schoolIV   -1.870e+01  1.970e-01  -94.946   <2e-16 ***
      classc:schoolIV   -1.757e+01  1.970e-01  -89.165   <2e-16 ***
      classd:schoolIV   -1.660e+01  1.969e-01  -84.286   <2e-16 ***
      classa:schoolV    -1.548e+01  1.970e-01  -78.609   <2e-16 ***
      classb:schoolV    -1.430e+01  1.970e-01  -72.586   <2e-16 ***
      classc:schoolV    -1.336e+01  1.974e-01  -67.687   <2e-16 ***
      classd:schoolV    -1.202e+01  1.970e-01  -61.051   <2e-16 ***
      classa:schoolVI   -1.045e+01  1.970e-01  -53.038   <2e-16 ***
      classb:schoolVI   -8.532e+00  1.971e-01  -43.298   <2e-16 ***

  38. Summary
      classc:schoolVI   -5.575e+00  1.969e-01  -28.310   <2e-16 ***
      classd:schoolVI           NA         NA       NA       NA
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 0.9844 on 1174 degrees of freedom
      Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
      F-statistic: 4264 on 25 and 1174 DF, p-value: < 2.2e-16
      # The output of both models shows 'NA' where an interaction term is
      # redundant with one listed somewhere above it (there are 4 classes and 6 schools).
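The single NA coefficient comes from fitting 24 class-by-school cell dummies alongside an intercept: the dummies sum to the intercept column, so one of them cannot be estimated. A minimal sketch on simulated data with the same 4 x 6 layout; the response values are random and the object names are made up for illustration.

```r
# Simulated 4 classes x 6 schools layout (an assumption; not the lecture data)
set.seed(6)
d <- expand.grid(class  = c("a", "b", "c", "d"),
                 school = c("I", "II", "III", "IV", "V", "VI"),
                 rep    = 1:5)
d$y <- rnorm(nrow(d))

# Interaction only: intercept + 24 cell dummies, one of which is redundant
fit_cells <- lm(y ~ class:school, data = d)
n_coef_cells <- length(coef(fit_cells))
n_na_cells   <- sum(is.na(coef(fit_cells)))

# Main effects plus interaction: 1 + 3 + 5 + 15 = 24 estimable coefficients
fit_full  <- lm(y ~ class * school, data = d)
n_na_full <- sum(is.na(coef(fit_full)))

c(coefs = n_coef_cells, na_cells = n_na_cells, na_full = n_na_full)

# Both parameterizations describe the same 24 cell means
same_fit <- isTRUE(all.equal(unname(fitted(fit_cells)), unname(fitted(fit_full))))
same_fit
```

So the NA is a coding artifact, not a sign that the class d, school VI cell lacks data.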

  39. Specific interaction term
      lm.6 <- lm(extro ~ open + agree + social + class:school, data = lmm.data)
      summary(lm.6)
      # some output omitted…
      Coefficients: (1 not defined because of singularities)
                          Estimate Std. Error  t value Pr(>|t|)
      (Intercept)        8.036e+01  3.680e-01  218.376   <2e-16 ***
      open               6.097e-03  4.964e-03    1.228    0.220
      agree             -7.751e-03  5.699e-03   -1.360    0.174
      social             5.468e-04  1.852e-03    0.295    0.768
      …
      classd:schoolVI           NA         NA       NA       NA
      Residual standard error: 0.9841 on 1173 degrees of freedom
      Multiple R-squared: 0.9891, Adjusted R-squared: 0.9889
      F-statistic: 4103 on 26 and 1173 DF, p-value: < 2.2e-16

  40. Compare interaction terms
      anova(lm.5, lm.6)
      Analysis of Variance Table
      Model 1: extro ~ open + social + class:school
      Model 2: extro ~ open + agree + social + class:school
        Res.Df    RSS Df Sum of Sq      F Pr(>F)
      1   1174 1137.7
      2   1173 1135.9  1    1.7916 1.8502  0.174

  41. Structure in glm • Even the more flexible Generalized Linear Model (glm) function cannot handle nested effects, although it can handle some types of random effects (e.g., repeated measures designs/data, which are not covered here). • The primary benefit of the 'glm' function is the ability to specify non-normal distributions • Output from the 'glm' function offers the Akaike Information Criterion (AIC), which can be used to compare models and is much preferred over R-squared or even adjusted R-squared • A lower AIC indicates a better-fitting model; an AIC of -22.45 indicates a better-fitting model than one with an AIC of 14.25

  42. glm? 'glm' function offers the Akaike Information Criterion (AIC) – so…
      glm.1 <- glm(extro ~ open + social + class + school, data = lmm.data)
      summary(glm.1)
      Call:
      glm(formula = extro ~ open + social + class + school, data = lmm.data)
      Deviance Residuals:
           Min       1Q   Median       3Q      Max
      -13.1368  -0.9154   0.0176   0.8631  13.6773
      Coefficients:

  43. glm?
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 43.069523   0.476596  90.369   <2e-16 ***
      open         0.010793   0.008346   1.293    0.196
      social      -0.001773   0.003106  -0.571    0.568
      classb       2.038816   0.136575  14.928   <2e-16 ***
      classc       3.696904   0.136266  27.130   <2e-16 ***
      classd       5.654166   0.136286  41.488   <2e-16 ***
      schoolII     7.921787   0.167294  47.353   <2e-16 ***
      schoolIII   12.119003   0.166925  72.602   <2e-16 ***
      schoolIV    16.052566   0.167100  96.066   <2e-16 ***
      schoolV     20.410702   0.166936 122.266   <2e-16 ***
      schoolVI    28.063091   0.167009 168.033   <2e-16 ***

  44. glm?
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      (Dispersion parameter for gaussian family taken to be 2.785041)
      Null deviance: 104432.7 on 1199 degrees of freedom
      Residual deviance: 3311.4 on 1189 degrees of freedom
      AIC: 4647.5
      Number of Fisher Scoring iterations: 2

  45. Glm2, 3
      > glm.2 <- glm(extro ~ open + social + class:school, data = lmm.data)
      > glm.3 <- glm(extro ~ open + agree + social + class:school, data = lmm.data)

  46. Compare… • Glm1 - AIC: 4647.5 • Glm2 - AIC: 3395.5 • Glm3 – AIC: 3395.6 • Conclusion?
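The comparison can be tabulated in one call with AIC(), and the value can be checked against its definition, AIC = -2 log-likelihood + 2 k. A minimal sketch on simulated data with a class-by-school structure; the data, effect sizes and object names (g_main, g_cells) are assumptions for illustration, so the AIC values will not match the ones on the slide.

```r
# Simulated data (an assumption; not the lecture's lmm.data)
set.seed(7)
d <- expand.grid(rep    = 1:10,
                 class  = c("a", "b", "c", "d"),
                 school = c("I", "II", "III", "IV", "V", "VI"))
cell     <- interaction(d$class, d$school)        # 24 class-by-school cells
cell_eff <- rnorm(nlevels(cell), sd = 3)          # a separate mean for each cell
d$open   <- rnorm(nrow(d), mean = 40, sd = 6)
d$extro  <- 60 + cell_eff[as.integer(cell)] + rnorm(nrow(d))

# Additive main effects versus one term per class-by-school cell
g_main  <- glm(extro ~ open + class + school, data = d)
g_cells <- glm(extro ~ open + class:school,  data = d)

# One row per model: number of parameters (df) and AIC; lower AIC is better
aic_tab <- AIC(g_main, g_cells)
aic_tab

# AIC by hand for the first model: -2 * logLik + 2 * (number of parameters)
ll       <- logLik(g_main)
aic_hand <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(aic_hand, AIC(g_main))
```

AIC penalizes each extra parameter by 2, so the interaction model only wins if its fit improves by more than that penalty.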

  47. However… In order to adequately test these nested (random) effects, we must turn to another type of modeling function/package.
      > library(lme4)

  48. However… • The Linear Mixed Effects (lme4) package is designed to fit a linear mixed model or a generalized linear mixed model or a nonlinear mixed model. • Example – following lm and glm • Fit linear mixed effect models with fixed effects for open & social or open, agree, & social, as well as random/nested effects for class within school; to predict scores on the outcome variable, extroversion (extro)
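A model of the kind described in the last bullet might look like the following. This is a minimal sketch on simulated data, assuming the lme4 package is installed; the formula with (1 | school/class) is one common way to write a random intercept for class nested within school and is an assumption here, since the slides do not show the call.

```r
# Simulated data with school and class-within-school effects
# (an assumption for illustration; not the lecture's lmm.data)
set.seed(8)
d <- expand.grid(student = 1:20,
                 class   = c("a", "b", "c", "d"),
                 school  = c("I", "II", "III", "IV", "V", "VI"))
cell       <- interaction(d$school, d$class)      # 24 class-within-school cells
school_eff <- rnorm(nlevels(d$school), sd = 8)    # random school effects
cell_eff   <- rnorm(nlevels(cell), sd = 2)        # random class-within-school effects
d$open     <- rnorm(nrow(d), mean = 40, sd = 6)
d$social   <- rnorm(nrow(d), mean = 100, sd = 15)
d$extro    <- 60 + school_eff[as.integer(d$school)] +
              cell_eff[as.integer(cell)] + rnorm(nrow(d))

fit_ok <- FALSE
if (requireNamespace("lme4", quietly = TRUE)) {
  # Fixed effects: open and social; random intercepts: school and class within school
  lmm.1 <- lme4::lmer(extro ~ open + social + (1 | school/class),
                      data = d, REML = TRUE)
  print(summary(lmm.1))
  fit_ok <- TRUE
}
```

The summary reports variance components for school and class:school separately from the fixed-effect estimates, which lm and glm cannot do.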
