Multivariate statistical analysis Regression analysis
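Regression vs. correlation • Regression: analyzes the (a priori) causal relationship between the explanatory variables and the response variable • Correlation: measures the strength of the association between variables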
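Regression model • $(Y_1, Y_2, \ldots, Y_J) = f(X_1, X_2, \ldots, X_K)$ • $K \ge 2$ explanatory variables: multiple regression • $J \ge 2$ response variables: multivariate regression • The assumed model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i$, $i = 1, \ldots, N$ • $e_i$ is the random error term, which must satisfy the following assumptions • $e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ • Normality • Independence • Equality of variance (homoscedasticity)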
Modeling the regression line • Ref.
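A minimal sketch of fitting the regression model by ordinary least squares in plain Python, with no external libraries. The function name `ols_fit` and the row-list input format are assumptions for illustration; it solves the normal equations $(X'X)b = X'y$ directly, which is fine for small teaching examples but less numerically stable than the QR-based solvers used by statistical packages.

```python
def ols_fit(X, y):
    """Least-squares estimates [b0, b1, ..., bK] for y = b0 + b1*x1 + ... + bK*xK + e.

    X is a list of N rows, each holding the K predictor values of one
    observation; a column of ones for the intercept is prepended here.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])  # number of parameters, K + 1
    # Augmented matrix [A'A | A'y] of the normal equations.
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular X'X: predictors are perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(p):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    # M is now diagonal in its first p columns.
    return [M[i][p] / M[i][i] for i in range(p)]


# Noise-free illustrative data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
b = ols_fit(X, y)
```

Because the data contain no noise, `b` recovers the generating coefficients 1, 2 and 3 up to floating-point rounding.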
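Sums of squares • Sum of squares for error: $SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ • Sum of squares for the model: $SSM = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ • Total sum of squares: $SST = \sum_{i=1}^{N} (y_i - \bar{y})^2 = SSM + SSE$ • $MSM = SSM/\text{d.f. of model} = SSM/K$ • $MSE = SSE/\text{d.f. of error} = SSE/(N-K-1)$ • d.f. of total $= N-1$ • $F = MSM/MSE$, compared with an $F(K, N-K-1)$ distribution to test $H_0: \beta_1 = \cdots = \beta_K = 0$ (the overall, or "total", test)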
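A small sketch of the decomposition, assuming the fitted values `y_hat` come from a least-squares fit that includes an intercept (otherwise $SST = SSM + SSE$ does not hold). `anova_table` is a hypothetical helper name; the example uses the one-predictor fit ($K = 1$) $\hat{y} = 2.2 + 0.6x$ for illustrative data $x = 1, \ldots, 5$, $y = 2, 4, 5, 4, 5$.

```python
def anova_table(y, y_hat, K):
    """Sums of squares, mean squares and the overall F statistic."""
    N = len(y)
    y_bar = sum(y) / N
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error
    ssm = sum((fi - y_bar) ** 2 for fi in y_hat)            # model
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    msm = ssm / K              # model d.f. = K
    mse = sse / (N - K - 1)    # error d.f. = N - K - 1
    return {"SSE": sse, "SSM": ssm, "SST": sst,
            "MSM": msm, "MSE": mse, "F": msm / mse}


y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]   # 2.2 + 0.6 * x for x = 1..5
table = anova_table(y, y_hat, K=1)
```

Here $SSE = 2.4$, $SSM = 3.6$, $SST = 6.0$, and $F = 3.6 / 0.8 = 4.5$ on $(1, 3)$ degrees of freedom.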
Determination • Coefficient of determination • $R^2 = SSM/SST = 1 - SSE/SST$, $0 \le R^2 \le 1$ • Adjusted coefficient of determination • Adjusted by dividing each sum of squares by its degrees of freedom • $\text{Adj. } R^2 = 1 - \frac{SSE/(N-K-1)}{SST/(N-1)} = 1 - (1 - R^2)\frac{N-1}{N-K-1}$ • Requires $N > K + 1$: the number of observations must exceed the number of explanatory variables plus one • Measures the goodness of fit of the sample regression line
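The two formulas translate directly into code. This sketch (the function name `determination` is illustrative) computes $R^2$ and the degrees-of-freedom form of the adjusted $R^2$, using the illustrative values $SSE = 2.4$, $SST = 6.0$, $N = 5$, $K = 1$.

```python
def determination(sse, sst, N, K):
    """Return (R^2, adjusted R^2) for a model with K predictors and N observations."""
    if N <= K + 1:
        raise ValueError("adjusted R^2 needs N > K + 1")
    r2 = 1 - sse / sst
    # Adjusted form: each sum of squares divided by its degrees of freedom.
    adj_r2 = 1 - (sse / (N - K - 1)) / (sst / (N - 1))
    return r2, adj_r2


r2, adj_r2 = determination(sse=2.4, sst=6.0, N=5, K=1)
```

This gives $R^2 = 0.6$ and an adjusted value of about 0.467, which matches the equivalent form $1 - (1 - 0.6)\cdot 4/3$; the adjustment lowers the value to penalize the number of predictors.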
t-test for the coefficients of the explanatory variables: marginal testing
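For the one-predictor case ($K = 1$) the standard error of the slope has the closed form $SE(b_1) = \sqrt{MSE / \sum_i (x_i - \bar{x})^2}$, and under $H_0: \beta_1 = 0$ the ratio $t = b_1 / SE(b_1)$ follows a $t$ distribution with $N - 2$ degrees of freedom. With more predictors, $1/\sum_i (x_i - \bar{x})^2$ is replaced by the corresponding diagonal element of $(X'X)^{-1}$. The sketch below covers only the simple case; `slope_t_test` is a hypothetical name, and no p-value is computed because the Python standard library has no $t$ distribution.

```python
import math


def slope_t_test(x, y):
    """Slope, its standard error, t statistic and d.f. for y = b0 + b1*x + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                      # N - K - 1 with K = 1
    se_b1 = math.sqrt((sse / df) / sxx)
    return b1, se_b1, b1 / se_b1, df


b1, se_b1, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Here $b_1 = 0.6$, $SE(b_1) = \sqrt{0.08} \approx 0.283$ and $t \approx 2.12$ on 3 d.f., below the two-sided 5% critical value of about 3.18, so the slope is not significant at that level. With a single predictor, $t^2$ (here 4.5) always equals the overall $F$ statistic.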
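Conflicts between total testing and marginal testing • The overall F-test and the individual t-tests can disagree, e.g. a significant F with no significant t • Confidence interval vs. confidence region: the joint confidence region for several coefficients is not the rectangle formed by their individual (narrower) confidence intervals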
Selecting the predictors • Checking the contribution of additional variables • Stepwise regression • Forward selection • Backward elimination
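Forward selection can be sketched as a greedy loop: start from the intercept-only model, add the candidate that most improves the criterion, and stop when no candidate helps. This sketch assumes adjusted $R^2$ as the criterion (packages more often use partial F-tests, p-values or AIC), and it updates the fit incrementally by Gram-Schmidt orthogonalization so no matrix inversion is needed. It only takes forward steps; full stepwise regression would also re-check whether an already chosen variable should be dropped. `forward_select` and the toy data are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_select(columns, y):
    """Greedy forward selection by adjusted R^2.

    columns maps a predictor name to its list of N values.
    Returns (names in order of entry, adjusted R^2 of the final model).
    """
    N = len(y)
    y_bar = sum(y) / N
    resid = [yi - y_bar for yi in y]        # residuals of the intercept-only model
    sst = dot(resid, resid)
    sse = sst
    basis = [[1.0] * N]                     # orthogonal basis of the current design
    chosen, best_adj = [], 0.0              # intercept-only model has adj. R^2 = 0
    remaining = dict(columns)
    while remaining and N - len(chosen) - 2 > 0:   # keep N > K + 1 after adding one
        K = len(chosen) + 1
        best = None
        for name, col in remaining.items():
            # Remove the part of the candidate already spanned by the current design.
            z = [float(v) for v in col]
            for q in basis:
                c = dot(z, q) / dot(q, q)
                z = [zi - c * qi for zi, qi in zip(z, q)]
            zz = dot(z, z)
            if zz < 1e-12:                  # collinear with chosen predictors: skip
                continue
            new_sse = sse - dot(resid, z) ** 2 / zz
            adj = 1 - (new_sse / (N - K - 1)) / (sst / (N - 1))
            if best is None or adj > best[0]:
                best = (adj, name, z, new_sse)
        if best is None or best[0] <= best_adj:
            break                           # no candidate improves adjusted R^2
        best_adj, name, z, sse = best
        c = dot(resid, z) / dot(z, z)
        resid = [ri - c * zi for ri, zi in zip(resid, z)]
        basis.append(z)
        chosen.append(name)
        del remaining[name]
    return chosen, best_adj


# y = 1 + 2*x1 plus a small wiggle; x2 carries no extra information about y.
data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 1, -1, -1, 1, 1]}
y = [3.1, 4.9, 7.0, 9.0, 10.9, 13.1]
chosen, adj = forward_select(data, y)
```

On this toy data only `x1` enters; adding `x2` would leave SSE unchanged while costing a degree of freedom, so the adjusted $R^2$ would drop and the loop stops.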
Testing the assumptions • Normality testing • Shapiro-Wilk statistic • Q-Q / P-P plots (expected vs. observed distribution) • Variance equality testing • Plot the residuals against each $x_k$ • Check that the pattern is random (no funnel shape) • Independence testing • For cross-sectional data, independence is assumed to follow from random, independent sampling • For longitudinal data, use time-series analysis • Durbin-Watson test for first-order autocorrelation of the residuals • $d \approx 2$: no autocorrelation; $d > 2$: negative autocorrelation; $d < 2$: positive autocorrelation
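The Durbin-Watson statistic is $d = \sum_{t=2}^{N} (e_t - e_{t-1})^2 / \sum_{t=1}^{N} e_t^2$. It is roughly $2(1 - r_1)$, where $r_1$ is the lag-1 autocorrelation of the residuals, so it lies between 0 and 4. The sketch below computes $d$ only and assumes the residuals are listed in time order; judging significance needs the tabulated lower and upper bounds, which are not reproduced here.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step
runs = [1, 1, 1, -1, -1, -1]          # long runs of the same sign
d_alt = durbin_watson(alternating)
d_runs = durbin_watson(runs)
```

`d_alt` is $20/6 \approx 3.33$ (negative autocorrelation) and `d_runs` is $4/6 \approx 0.67$ (positive autocorrelation).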
Collinearity • A pair of predictor variables that are strongly correlated • Tolerance $= 1 - R_j^2$, where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other predictors • If strong correlation exists, the tolerance becomes small, approaching zero • VIF (variance inflation factor) • The reciprocal of tolerance, $VIF_j = 1/(1 - R_j^2)$: a small tolerance makes the VIF very large
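In general $R_j^2$ comes from regressing $x_j$ on all the other predictors. With exactly two predictors that regression has a single regressor, so $R_j^2$ is simply the squared Pearson correlation between them, which keeps this sketch short. `tolerance_and_vif` is an illustrative name, and the two-predictor restriction is an assumption of the sketch.

```python
import math


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF when the model has exactly two predictors x1 and x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)      # R_j^2 equals the squared correlation here
    tolerance = 1 - r2
    vif = 1 / tolerance if tolerance > 0 else math.inf
    return tolerance, vif


tol_a, vif_a = tolerance_and_vif([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])  # nearly collinear
tol_b, vif_b = tolerance_and_vif([1, 2, 3, 4], [1, -1, -1, 1])                # uncorrelated
```

For the nearly collinear pair $R_j^2 \approx 0.993$, so the tolerance is about 0.007 and the VIF about 140; the uncorrelated pair gives a tolerance and VIF of exactly 1.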
Outliers • Leverage $h_{jj}$, with $1/N \le h_{jj} \le 1$ • For a single predictor, $h_{jj} = \frac{1}{N} + \frac{(x_j - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$ • If $h_{jj}$ is comparatively large, consider removing this observation
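The single-predictor leverage formula is easy to compute directly. The flagging threshold $h_{jj} > 2(K+1)/N$ used below is a common rule of thumb, not something the slide specifies, so treat it as an assumption of this sketch. As a sanity check, the leverages always sum to $K + 1$.

```python
def leverages(x):
    """Leverage h_jj of each observation in a one-predictor regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xj - x_bar) ** 2 / sxx for xj in x]


x = [1, 2, 3, 4, 10]
h = leverages(x)
# Assumed cut-off: flag h_jj > 2(K + 1)/N, a common rule of thumb (here K = 1).
cutoff = 2 * (1 + 1) / len(x)
flagged = [j for j, hj in enumerate(h) if hj > cutoff]
```

The leverages are 0.38, 0.28, 0.22, 0.20 and 0.92; only the last observation ($x = 10$) exceeds the cut-off of 0.8.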
Weighted regression • Sample observations can be given different influence (weights) • Outliers are assigned weights near 0
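Weighted least squares minimizes $\sum_i w_i (y_i - b_0 - b_1 x_i)^2$. For one predictor the solution has the same form as ordinary least squares, with weighted means and weighted sums of squares. This sketch (hypothetical name `wls_line`) shows that giving an outlier a weight of 0 removes its influence completely; how weights are chosen in practice, e.g. from residuals or variance estimates, is beyond this example.

```python
def wls_line(x, y, w):
    """Weighted least-squares intercept and slope for y = b0 + b1*x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 30]                                # last point is an outlier; the rest follow y = 1 + 2x
b0_ols, b1_ols = wls_line(x, y, [1, 1, 1, 1, 1])    # equal weights: ordinary least squares
b0_wls, b1_wls = wls_line(x, y, [1, 1, 1, 1, 0])    # outlier weighted to 0
```

With equal weights the outlier pulls the fit to $b_0 = -3.2$, $b_1 = 6.2$; with weight 0 on it the fit returns $b_0 = 1$, $b_1 = 2$.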
Data transformation • Transformations to achieve normality and equal variance • Common choices: log, inverse, or square transformations
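As an illustration of the log transformation, a multiplicative model $y = a e^{bx}$ becomes the straight line $\ln y = \ln a + bx$, which ordinary least squares can fit. This sketch uses noise-free illustrative data with $a = 2$ and $b = 0.5$; the helper `fit_line` is hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]           # exponential growth, a = 2, b = 0.5
b0, b1 = fit_line(x, [math.log(yi) for yi in y])   # regress ln(y) on x
a_hat = math.exp(b0)                               # back-transform the intercept
```

The slope on the log scale recovers $b = 0.5$ directly, and exponentiating the intercept recovers $a = 2$.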