
Multivariate statistical analysis

Multivariate statistical analysis. Regression analysis. Regression vs. correlation. Regression analyzes the (a priori) causal relationship between explanatory variables and the response variable; correlation measures the strength of association between variables. Regression model. (Y1, Y2, …, Yj) = f(X1, X2, …, Xk); k ≥ 2: multiple regression; j ≥ 2: multivariate regression.

salma

Presentation Transcript


  1. Multivariate statistical analysis Regression analysis

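  2. Regression vs. correlation • Regression: analyzes the (a priori) causal relationship between explanatory variables and the response variable • Correlation: measures the strength of association between variables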

  3. Regression model • (Y1, Y2, …, Yj) = f(X1, X2, …, Xk) • k ≥ 2: multiple regression • j ≥ 2: multivariate regression • The assumed model for a single response: y = β0 + β1x1 + β2x2 + … + βkxk + e • e is the random error term, subject to some prerequisite assumptions • Errors are i.i.d. ~ N(0, σ²) • Normality • Independence • Variance equality
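As an illustration of the assumed model, the following minimal sketch fits y = β0 + β1x1 + … + βkxk + e by ordinary least squares with NumPy. The function name `fit_ols`, the synthetic data, and the "true" coefficients (1, 2, −3) are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates (b0, b1, ..., bk) for y = b0 + b1*x1 + ... + bk*xk + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic data (hypothetical): two predictors, errors drawn i.i.d. from N(0, 0.5^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
e = rng.normal(scale=0.5, size=200)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + e

print(fit_ols(X, y))  # estimates should land near the assumed values 1, 2, -3
```

With k = 2 predictors and a single response this is a multiple regression in the slide's terminology.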

  4. Modeling the regression line • Ref.

  5. ANOVA table for regression analysis—total model testing

  6. Sums of squares • Sum of squares for error (SSE) • Sum of squares for model (SSM) • Sum of squares for total (SST), with SST = SSM + SSE • MSE = SSE/d.f. of error = SSE/(N−K−1) • MSM = SSM/d.f. of model = SSM/K • d.f. of total = N−1 • F = MSM/MSE
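The quantities in the ANOVA table can be computed directly from a fitted model. The sketch below uses K degrees of freedom for the model and N−K−1 for the error, which is the convention that matches the adjusted R² formula elsewhere in these slides. The function name `regression_anova` and the five data points are assumptions chosen for illustration.

```python
import numpy as np

def regression_anova(X, y):
    """Return SSE, SSM, SST, MSE, MSM and F for an OLS fit with intercept.

    X has shape (N, K): N observations, K explanatory variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta

    sse = np.sum((y - y_hat) ** 2)          # unexplained variation
    ssm = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the model
    sst = np.sum((y - y.mean()) ** 2)       # total variation, equals SSM + SSE
    mse = sse / (n - k - 1)                 # error d.f. = N - K - 1
    msm = ssm / k                           # model d.f. = K
    return {"SSE": sse, "SSM": ssm, "SST": sst,
            "MSE": mse, "MSM": msm, "F": msm / mse}

# Small hypothetical data set with one predictor (K = 1, N = 5).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
print(regression_anova(X, y))
```

The F value would then be compared with an F distribution on (K, N−K−1) degrees of freedom to test the model as a whole.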

  7. Determination • Coefficient of determination • R² = SSM/SST = 1 − SSE/SST, 0 ≤ R² ≤ 1 • Adjusted coefficient of determination • Adjusted by dividing each sum of squares by its degrees of freedom • Adj. R² = 1 − [SSE/(N−K−1)]/[SST/(N−1)] = 1 − (1−R²)[(N−1)/(N−K−1)] • N > K+1: the sample size must exceed the number of explanatory variables plus one • Determines the goodness of fit of a sampled regression line
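A short worked example of the two coefficients, written as a sketch: the function name `r_squared` and the numbers are assumptions for illustration. It takes observed values, fitted values, and the number of explanatory variables K.

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Return (R^2, adjusted R^2) for fitted values y_hat from a model with k predictors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    if n <= k + 1:
        raise ValueError("need N > K + 1 observations")
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2

# Hypothetical observed and fitted values: SSE = 0.10, SST = 10, N = 5, K = 1.
r2, adj_r2 = r_squared([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], k=1)
print(round(r2, 4), round(adj_r2, 4))  # 0.99 0.9867
```

The adjusted value is smaller because the factor (N−1)/(N−K−1) penalizes each additional explanatory variable.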

  8. t-test for the coefficients of explanatory variables: marginal testing

  9. Conflicts between total testing and marginal testing • Confidence interval vs. confidence region (a joint region for several coefficients at once, as opposed to the narrower confidence intervals computed for each coefficient separately)

  10. Determining the predictors • Checking the contribution of additional variables • Stepwise regression • Forward selection • Backward elimination

  11. Testing the assumptions • Normality testing • Shapiro-Wilk statistic • Q-Q / P-P plots (expected distribution vs. observed distribution) • Variance equality testing • Plot the residuals against each predictor x • Verify that the pattern is random • Independence testing • Durbin-Watson test for first-order autocorrelation of the residuals • The statistic is near 2 when there is no autocorrelation; > 2 indicates a negative ("−") relation, < 2 a positive ("+") relation • Random and independent sampling is assumed for cross-sectional data • Time-series analysis for longitudinal data
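The Durbin-Watson statistic mentioned above is simple enough to compute by hand: it is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below assumes the residuals are already ordered in time; the function name and the two residual series are made up for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2), in [0, 4]."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Alternating residuals: negative first-order autocorrelation, so d > 2.
print(durbin_watson([1, -1, 1, -1]))        # 3.0

# Slowly drifting residuals: positive autocorrelation, so d is well below 2.
print(durbin_watson([1.0, 0.9, 0.8, 0.7]))
```

A value near 2 is consistent with independent errors; formal testing compares d against tabulated lower and upper bounds.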

  12. Collinearity • A pair of predictor variables that are strongly correlated • Tolerance = 1 − Rj², where Rj² is the coefficient of determination from regressing predictor j on the other predictors • If there is strong correlation, the tolerance is small and close to zero • VIF (variance inflation factor) • The inverse of tolerance, VIF = 1/(1 − Rj²); if the tolerance is small, the VIF becomes very large
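Tolerance and VIF can be obtained by regressing each predictor on all the others. The following is a minimal NumPy sketch under that definition; the function name `tolerance_and_vif` and the synthetic predictors are assumptions for the example.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays, one entry per column of X (shape N x K)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all remaining predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        sse = resid @ resid
        sst = np.sum((target - target.mean()) ** 2)
        tol[j] = sse / sst            # equals 1 - R_j^2
    return tol, 1.0 / tol             # VIF is infinite under exact collinearity

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)
x3 = rng.normal(size=100)
tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print(tol)
print(vif)  # the first two VIFs should be far larger than the third
```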

  13. Outliers • Leverage hjj (1/n ≤ hjj ≤ 1) • For a single predictor x: hjj = 1/n + (xj − x̄)² / Σi (xi − x̄)² • If hjj is comparatively large, the observation has high leverage; consider removing it.
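To make the leverage formula concrete, the sketch below evaluates it for a single predictor and checks it against the diagonal of the hat matrix H = D(DᵀD)⁻¹Dᵀ, where D is the design matrix with an intercept column. The function names and the five x values are assumptions for illustration.

```python
import numpy as np

def leverage_simple(x):
    """Leverage for simple regression: 1/n + (x_j - mean)^2 / sum_i (x_i - mean)^2."""
    x = np.asarray(x, dtype=float)
    dev2 = (x - x.mean()) ** 2
    return 1.0 / len(x) + dev2 / dev2.sum()

def leverage_hat(x):
    """Same quantity computed as the diagonal of the hat matrix."""
    x = np.asarray(x, dtype=float)
    D = np.column_stack([np.ones(len(x)), x])
    # diag(D (D^T D)^{-1} D^T) without forming the full n x n matrix product twice.
    return np.einsum("ij,ji->i", D, np.linalg.solve(D.T @ D, D.T))

x = [1, 2, 3, 4, 10]           # hypothetical data; the last point lies far from the rest
print(leverage_simple(x))      # the last entry is by far the largest
print(leverage_hat(x))
```

The leverages sum to the number of fitted parameters (2 here), so a single value close to 1 means that one observation largely determines its own fitted value.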

  14. Weighted regression • Sample data points can have different impact • Outliers → set the influence weight near 0
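One common way to implement weighted regression is weighted least squares, which minimizes Σ wᵢ(yᵢ − ŷᵢ)² and can be solved by scaling each row of the design matrix and response by √wᵢ. The sketch below assumes that formulation; the function name `fit_wls`, the data, and the weights are hypothetical.

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares with intercept: minimizes sum_i w_i * (y_i - fitted_i)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])   # last point is an outlier from y = 1 + 2x

print(fit_wls(x, y, np.ones(5)))                   # equal weights: pulled toward the outlier
print(fit_wls(x, y, [1.0, 1.0, 1.0, 1.0, 1e-6]))   # outlier weight near 0: close to (1, 2)
```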

  15. Data transformation • Transformation to achieve normality and variance equality • Transformation by log, inverse, or square
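As a small example of the log transformation, suppose the response grows exponentially with the predictor, y = exp(a + b·x). Taking the logarithm gives log y = a + b·x, which is linear and can be fitted with an ordinary straight line. The function name `fit_log_linear` and the values a = 0.5, b = 0.3 are assumptions for this sketch.

```python
import numpy as np

def fit_log_linear(x, y):
    """Fit log(y) = a + b*x by least squares and return (a, b). Requires y > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, np.log(y), 1)   # polyfit returns the highest degree first
    return a, b

x = np.linspace(1.0, 10.0, 20)
y = np.exp(0.5 + 0.3 * x)                # hypothetical noise-free exponential response
print(fit_log_linear(x, y))              # recovers approximately (0.5, 0.3)
```

With real data the transformation is chosen by inspecting the residuals before and after, for example with the normality and variance checks listed in slide 11.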
