Ch10 Logistic Regression

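Ch10 Logistic Regression. Regression analysis describes the relationship between a dependent variable and one ( ) predictor variable(s). Required assumptions: normality (the independent variables are not assumed to be normal), homogeneity of variance, independence. Uses of regression analysis: prediction (given x, find y), control (given y, find x), description. Logistic Regression: An Introduction to Categorical Data Analysis, Alan Agresti, 1996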

jada-conner

Presentation Transcript


  1. Ch10 Logistic Regression

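  2. Regression analysis
     • Describes the relationship between a dependent variable and one ( ) predictor variable(s).
     • Required assumptions:
       • Normality (the independent variables are not assumed to be normal)
       • Homogeneity of variance
       • Independence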

  3. Uses of regression analysis:
     • Prediction (given x, find y)
     • Control (given y, find x)
     • Description

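  4. Logistic Regression
     • An Introduction to Categorical Data Analysis, Alan Agresti, 1996
     • When the groups in a discriminant analysis do not satisfy the normality assumption, logistic regression can be used instead.
     • Logistic regression does not predict whether an event occurs; it predicts the probability of the event.
     • When the dependent variable (Y) is discrete with only two or a few categories, analyze it with logistic regression.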

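  5. Logistic Regression
     • It can relate categorical and quantitative independent variables to a categorical dependent variable.
     • A consumer survey yields qualitative categorical data on consumer behavior (whether to invest, purchase intention, event occurred or not) together with the factors that influence it (age, income, place of origin, economic climate, weather, and preferences).
     • When the dependent variable has two categories or is qualitative, logistic or probit analysis is more appropriate.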

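  6. Logistic Regression
     • Generalized linear models for binary data
     • Many categorical response variables have only two categories:
       vote (Democrat vs. Republican)
       choice of car (imported vs. domestic)
       breast cancer diagnosis (absent vs. present)
     • Let Y denote the binary response: $P(Y=1) = \pi$ (success), $P(Y=0) = 1 - \pi$ (failure).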

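  7. A binary response is also called a Bernoulli variable. Its distribution is specified by the probabilities of success and failure. For this distribution:
     mean $E(Y) = \pi$
     variance $\mathrm{Var}(Y) = \pi(1 - \pi)$
     If there are n independent observations of a binary response with the same parameter $\pi$, the number of successes follows a binomial distribution with index n and parameter $\pi$.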
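As a quick numerical check of the Bernoulli and binomial facts on this slide, the following Python sketch enumerates the outcomes directly. The values `pi = 0.3` and `n = 10` are arbitrary illustrative choices, not taken from the slides.

```python
import math

def bernoulli_moments(pi):
    """Mean and variance of a Bernoulli(pi) variable by direct enumeration."""
    outcomes = {1: pi, 0: 1.0 - pi}          # value -> probability
    mean = sum(y * p for y, p in outcomes.items())
    var = sum((y - mean) ** 2 * p for y, p in outcomes.items())
    return mean, var

def binomial_pmf(k, n, pi):
    """P(k successes in n independent Bernoulli(pi) trials)."""
    return math.comb(n, k) * pi**k * (1 - pi) ** (n - k)

pi, n = 0.3, 10                               # illustrative values (assumed)
mean, var = bernoulli_moments(pi)
binom_mean = sum(k * binomial_pmf(k, n, pi) for k in range(n + 1))

print(mean, var)        # should match pi and pi * (1 - pi)
print(binom_mean)       # should match n * pi
```

The enumeration agrees with $E(Y) = \pi$ and $\mathrm{Var}(Y) = \pi(1-\pi)$, and the binomial mean comes out as $n\pi$.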

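  8. Logistic regression function
     Probability of success (nonlinear in x): $P = \dfrac{e^{f(x)}}{1 + e^{f(x)}}$
     Probability of failure (nonlinear in x): $1 - P = \dfrac{1}{1 + e^{f(x)}}$
     Odds: $\dfrac{P}{1 - P} = e^{f(x)}$
     Log odds: $\ln\left(\dfrac{P}{1 - P}\right) = f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$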
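The relationships on this slide can be checked numerically with a minimal Python sketch. The coefficients `B0` and `B1` are hypothetical values chosen only for illustration, and a single predictor is assumed.

```python
import math

B0, B1 = -1.0, 0.5   # hypothetical coefficients, not from the slides

def f(x):
    """Linear predictor f(x) = B0 + B1 * x."""
    return B0 + B1 * x

def p_success(x):
    return math.exp(f(x)) / (1.0 + math.exp(f(x)))

def p_failure(x):
    return 1.0 / (1.0 + math.exp(f(x)))

def odds(x):
    return p_success(x) / p_failure(x)

def log_odds(x):
    return math.log(odds(x))

x = 3.0
print(p_success(x) + p_failure(x))   # the two probabilities sum to 1
print(odds(x), math.exp(f(x)))       # odds equal exp(f(x))
print(log_odds(x), f(x))             # the log odds recover the linear predictor
```

The probabilities are nonlinear in x, but taking the log of the odds returns the linear predictor exactly, which is the point of the transformation.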

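  9. After transformation the model is linear:
     $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$
     • The nonlinear relationship between $\pi(x)$ and x is monotonic:
     • $\pi(x)$ increases continuously as x increases, or
     • $\pi(x)$ decreases continuously as x increases.
     [Figure: panels (a) $\beta > 0$ and (b) $\beta < 0$; horizontal axis x, vertical axis $\pi(x)$ with upper limit 1]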

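  10. The parameter $\beta$ determines how fast the curve rises or falls.
      When $\beta > 0$, $\pi(x)$ increases as x increases, as in (a).
      When $\beta < 0$, $\pi(x)$ decreases as x increases, as in (b).
      When $\beta = 0$, the curve becomes a horizontal line: $\pi(x)$ is constant in x, and Y is independent of x.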

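  11. $\beta > 0$: where the logit curve is steepest
      In figure (a), draw a tangent line at a given x to describe the rate of change at that point. In terms of the logistic regression parameters, the slope at that point is
      $m = \beta\,\pi(x)\,(1 - \pi(x))$
      Ex: if $\pi(x) = 0.5$, then $m = \beta(0.5)(0.5) = 0.25\beta$; as $\pi(x)$ approaches 1, m approaches 0.
      [Figure (a): curve of $\pi(x)$ against x, with 0.5 and 1 marked on the vertical axis]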

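  12. $\beta > 0$: where the logit curve is steepest
      The curve is steepest at the x value where $\pi(x) = 0.5$, namely $x = -\alpha/\beta$.
      Ex: $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$
          $\log(0.5/0.5) = \log 1 = 0 = \alpha + \beta x$
          $\Rightarrow x = -\alpha/\beta$
      [Figure (a): curve of $\pi(x)$ against x, with 0.5 and 1 marked on the vertical axis]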
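To illustrate the slope formula $m = \beta\,\pi(x)(1-\pi(x))$ and the steepest point $x = -\alpha/\beta$, here is a small Python sketch that compares the formula with a numerical derivative. `ALPHA` and `BETA` are hypothetical values chosen for illustration.

```python
import math

ALPHA, BETA = -2.0, 0.8   # hypothetical parameters with BETA > 0

def pi(x):
    """Logistic curve pi(x) = exp(a + b x) / (1 + exp(a + b x))."""
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x)))

def slope_formula(x):
    """Tangent slope from the closed form beta * pi * (1 - pi)."""
    return BETA * pi(x) * (1.0 - pi(x))

def slope_numeric(x, h=1e-6):
    """Central-difference approximation of the derivative of pi at x."""
    return (pi(x + h) - pi(x - h)) / (2.0 * h)

x_half = -ALPHA / BETA                      # where pi(x) = 0.5
print(pi(x_half))                           # 0.5 at the steepest point
print(slope_formula(x_half), 0.25 * BETA)   # maximum slope is 0.25 * beta
print(slope_formula(1.0), slope_numeric(1.0))
```

Because $\pi(1-\pi)$ is largest at $\pi = 0.5$, the slope at `x_half` is at least as large as the slope anywhere else on the curve.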

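  13. Odds Ratio Interpretation
      Odds vs. the odds ratio:
      $\dfrac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x) = e^{\alpha}\,(e^{\beta})^{x}$
      This gives an interpretation: each one-unit increase in x multiplies the odds by $e^{\beta}$.
      The log odds, $\log\dfrac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$, is the logit transform of $\pi(x)$ and is linear in x; that is, each one-unit change in x changes the logit by $\beta$.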
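The multiplicative interpretation can be verified with a short Python sketch. `ALPHA` and `BETA` are hypothetical values used only to show that the ratio of odds at x + 1 and x equals $e^{\beta}$ regardless of x.

```python
import math

ALPHA, BETA = -1.0, 0.5   # hypothetical parameters

def pi(x):
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x)))

def odds(x):
    return pi(x) / (1.0 - pi(x))

# Ratio of odds for a one-unit increase in x, at several starting points.
ratios = [odds(x + 1) / odds(x) for x in (0.0, 2.0, 5.0)]
print(ratios, math.exp(BETA))

# The odds themselves factor as e^alpha * (e^beta)^x.
x = 2.0
print(odds(x), math.exp(ALPHA) * math.exp(BETA) ** x)
```

Every ratio equals $e^{\beta}$, so the effect of x on the odds is constant on the multiplicative scale even though its effect on the probability is not.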

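  14. Why logistic regression is preferred over other probability models (in the context of case-control studies)
      Under a retrospective sampling design:
      Ex: case-control study
          Y = 1: cases
          Y = 0: controls
      Two samples are observed. If the distribution of x differs between cases and controls, then x and Y are associated.
      Logistic regression works with the odds and the odds ratio, so the model can be fitted to retrospective data and used to estimate the effect for cases versus controls.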

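  15. Inference for logistic regression: confidence intervals for effects
      The statistical theory for the model parameters helps judge the significance and size of an effect.
      For large samples, a confidence interval for $\beta$ in $\log\dfrac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$ is
      $\hat{\beta} \pm z_{\alpha/2}\,(\mathrm{ASE})$
      Exponentiating the endpoints gives an interval for $e^{\beta}$, the multiplicative effect on the odds of a one-unit increase in x.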

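  16. ASE (Asymptotic Standard Error)
      Interval: $\hat{\beta} \pm z_{\alpha/2}\,(\mathrm{ASE})$
      Ex: Is a female crab's width (gap) related to having satellites? (Y = 1: has satellites, Y = 0: none; the model predicts which female crabs have satellites.)
      $\hat{\beta} = 0.497$ and ASE = 0.102.
      Sol: a 95% confidence interval for $\beta$ is $0.497 \pm 1.96\,(0.102) = (0.298, 0.697)$.
      Inference: each one-centimeter increase in width raises the odds of having a satellite by at least 35% and at most doubles them.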
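The interval arithmetic on this slide can be reproduced with a few lines of Python. The inputs are the rounded estimate and ASE shown on the slide, so the lower limit comes out near 0.297 rather than 0.298; the slide's value was presumably computed from unrounded estimates.

```python
import math

beta_hat = 0.497   # estimate from the slide
ase = 0.102        # asymptotic standard error from the slide
z = 1.96           # standard normal quantile for a 95% interval

lower = beta_hat - z * ase
upper = beta_hat + z * ase

# Exponentiate the endpoints to get the interval for the odds multiplier e^beta.
odds_lower = math.exp(lower)
odds_upper = math.exp(upper)

print(f"beta interval: ({lower:.3f}, {upper:.3f})")
print(f"odds multiplier interval: ({odds_lower:.2f}, {odds_upper:.2f})")
```

The exponentiated limits are roughly 1.35 and 2.01, which is where the "at least 35%, at most double" reading comes from.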

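  17. Logistic regression significance testing
      $H_0: \beta = 0$ means the probability of success does not depend on x.
      $H_a: \beta \neq 0$ means the probability of success depends on x.
      For large samples the test statistic is
      $z = \hat{\beta} / \mathrm{ASE}$
      When $\beta = 0$, z has a standard normal distribution (one-sided or two-sided tests are possible).
      For the two-sided alternative $\beta \neq 0$, $z^2$ has a large-sample $\chi^2$ distribution with df = 1 under $H_0$.
      p-value: the right-tail probability of the $\chi^2$ distribution above the observed value.
      The parameter estimate divided by its standard error, then squared, is called the Wald statistic.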
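A minimal Python sketch of the Wald test, using the estimate 0.497 and ASE 0.102 from the crab example on the previous slide. The chi-squared right-tail probability for df = 1 is computed through the identity $P(\chi^2_1 > w) = \mathrm{erfc}(\sqrt{w/2})$, so only the standard library is needed.

```python
import math

def wald_test(estimate, ase):
    """Return (z, Wald chi-square, two-sided p-value) for H0: beta = 0."""
    z = estimate / ase
    wald = z ** 2
    # For df = 1, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return z, wald, p_value

z, wald, p_value = wald_test(0.497, 0.102)
print(f"z = {z:.3f}, Wald chi-square = {wald:.2f}, p = {p_value:.2e}")
```

With these inputs z is about 4.87 and the Wald statistic about 23.7, so the p-value is far below conventional significance levels.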

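  18. Another approach to model inference and checking: the likelihood ratio
      Maximize the likelihood function under each of the following two conditions, then take the ratio.
      1) Under the restriction of $H_0$: maximize over all possible parameter values.
      2) Under the full model, where either $H_0$ or $H_1$ may hold: maximize over all possible parameter values.
      Let $l_1$ be the maximum of the likelihood function under the full model,
      and $l_0$ the maximum under the simpler model of $H_0$.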

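  19. Ex: for the linear predictor $\alpha + \beta x$, with $H_0: \beta = 0$ and $H_a: \beta \neq 0$:
      $l_0$: the likelihood function evaluated, with $\beta = 0$, at the value of $\alpha$ most likely to have produced the observed data.
      $l_1$: the likelihood function evaluated at the combination $(\alpha, \beta)$ most likely to have produced the observed data.
      Here $l_0$ is a maximum over a restricted subset of the range that yields $l_1$, so $l_1$ is at least as large as $l_0$.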

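  20. Likelihood-ratio test statistic
      $-2\log(l_0 / l_1) = -2\,[\log(l_0) - \log(l_1)] = -2\,[L_0 - L_1]$
      $L_0$ and $L_1$ denote the maximized log-likelihood values.
      Under $H_0: \beta = 0$, this statistic has a large-sample $\chi^2$ distribution with df = 1.
      In practice the likelihood-ratio test is generally more reliable than the Wald test.
      The likelihood-ratio test compares the maximum log-likelihood $L_0$ when $\beta = 0$ (i.e., $\pi(x)$ is forced to be the same at every x) with the maximum log-likelihood $L_1$ for unrestricted $\beta$.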

  21. The test statistic $-2\,(L_0 - L_1)$ has a large-sample $\chi^2$ distribution with df = 1.
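The likelihood-ratio statistic can be worked through with the 2x2 counts that appear in Exhibit 10.1 (24 observations: 12 events and 12 non-events; among the 11 cases with SIZE = 1 there are 10 events, among the 13 with SIZE = 0 there are 2). The Python sketch below assumes that, with a single binary predictor, the fitted probabilities equal the observed group proportions. Under that assumption it reproduces the -2 LOG L values 33.271 and 17.864 reported in the exhibit.

```python
import math

def neg2_loglik(groups):
    """-2 log-likelihood when each group is fitted by its own event proportion.

    groups: list of (events, non_events) pairs.
    """
    total = 0.0
    for events, non_events in groups:
        p = events / (events + non_events)
        total += events * math.log(p) + non_events * math.log(1.0 - p)
    return -2.0 * total

# H0 (beta = 0): one common probability for all 24 observations.
neg2_L0 = neg2_loglik([(12, 12)])
# Full model: separate probabilities for SIZE = 1 and SIZE = 0.
neg2_L1 = neg2_loglik([(10, 1), (2, 11)])

lr_stat = neg2_L0 - neg2_L1                    # equals -2 (L0 - L1)
p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # chi-squared right tail, df = 1

print(f"{neg2_L0:.3f} {neg2_L1:.3f} {lr_stat:.3f} {p_value:.5f}")
```

The difference of about 15.407 on 1 df matches the "Chi-Square for Covariates" entry in the exhibit.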

  22. EXHIBIT 10.1: Logistic regression analysis with one categorical variable as the independent variable

      Response Variable: SUCCESS        Number of Observations: 24
      Link Function: Logit              Response Levels: 2

      Response Profile
      Ordered Value   SUCCESS   Count
            1            1        12
            2            2        12

  23. Exhibit 10.1 (continued)

      Criteria for Assessing Model Fit
                   Intercept   Intercept and
      Criterion    Only        Covariates      Chi-Square for Covariates
      AIC          35.271      21.864          .
      SC           36.449      24.221          .
      -2 LOG L     33.271      17.864          15.407 with 1 DF (p=0.0001)
      Score        .           .               13.594 with 1 DF (p=0.0002)

  24. Exhibit 10.1 (continued)

      Analysis of Maximum Likelihood Estimates
                  Parameter   Standard   Wald         Pr >         Standardized
      Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
      INTERCPT    -1.7047     0.7687     4.9181       0.0266       .
      SIZE         4.0073     1.3003     9.4972       0.0021       1.124514

      Association of Predicted Probabilities and Observed Responses
      Concordant = 76.4%     Somers' D = 0.750
      Discordant =  1.4%     Gamma     = 0.964
      Tied       = 22.2%     Tau-a     = 0.391
      (144 pairs)            c         = 0.875
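With a single binary predictor, the maximum likelihood estimates in this exhibit can be recovered in closed form from the 2x2 counts (SIZE = 1: 10 events, 1 non-event; SIZE = 0: 2 events, 11 non-events). The following Python sketch uses the standard log-odds and log-odds-ratio formulas. It is a check under that assumption, not a description of how the software fitted the model.

```python
import math

# Counts from the classification/contingency tables of the exhibit.
events_1, non_events_1 = 10, 1    # SIZE = 1
events_0, non_events_0 = 2, 11    # SIZE = 0

# Intercept: log odds of an event when SIZE = 0.
intercept = math.log(events_0 / non_events_0)
# Slope: log odds ratio between SIZE = 1 and SIZE = 0.
slope = math.log(events_1 / non_events_1) - intercept

# Large-sample standard errors from reciprocal cell counts.
se_intercept = math.sqrt(1 / events_0 + 1 / non_events_0)
se_slope = math.sqrt(1 / events_1 + 1 / non_events_1
                     + 1 / events_0 + 1 / non_events_0)

wald_intercept = (intercept / se_intercept) ** 2
wald_slope = (slope / se_slope) ** 2

print(f"INTERCPT {intercept:.4f} {se_intercept:.4f} {wald_intercept:.4f}")
print(f"SIZE     {slope:.4f} {se_slope:.4f} {wald_slope:.4f}")
```

The results agree with the tabulated estimates, standard errors, and Wald chi-squares to within rounding.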

  25. Exhibit 10.1 (continued)

      Classification Table
                              Predicted
                          EVENT   NO EVENT   Total
      Observed EVENT        10        2        12
               NO EVENT      1       11        12
               Total        11       13        24

      Sensitivity = 83.3%   Specificity = 91.7%   Correct = 87.5%
      False Positive Rate = 9.1%   False Negative Rate = 15.4%
      NOTE: An EVENT is an outcome whose ordered response value is 1.
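The summary rates under the classification table follow directly from its four cell counts. The Python sketch below uses the definitions that reproduce the figures shown here: the false positive rate is taken relative to predicted events and the false negative rate relative to predicted non-events, which differs from the convention used in some other fields. The function name is hypothetical.

```python
def classification_rates(tp, fn, fp, tn):
    """Percent rates from a 2x2 table of observed vs. predicted outcomes.

    tp: observed EVENT, predicted EVENT      fn: observed EVENT, predicted NO EVENT
    fp: observed NO EVENT, predicted EVENT   tn: observed NO EVENT, predicted NO EVENT
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "correct": 100.0 * (tp + tn) / total,
        # Share of predicted events that were actually non-events.
        "false_positive_rate": 100.0 * fp / (tp + fp),
        # Share of predicted non-events that were actually events.
        "false_negative_rate": 100.0 * fn / (tn + fn),
    }

rates = classification_rates(tp=10, fn=2, fp=1, tn=11)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

The same function applied to the other classification tables in this chapter gives their reported rates as well.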

  26. Exhibit 10.1 (continued)

      OBS   SUCCESS   SIZE   PHAT        OBS   SUCCESS   SIZE   PHAT
        1      1        1    0.90909      13      2        1    0.90909
        2      1        1    0.90909      14      2        0    0.15385
        3      1        1    0.90909      15      2        0    0.15385
        4      1        1    0.90909      16      2        0    0.15385
        5      1        1    0.90909      17      2        0    0.15385
        6      1        1    0.90909      18      2        0    0.15385
        7      1        1    0.90909      19      2        0    0.15385
        8      1        1    0.90909      20      2        0    0.15385
        9      1        1    0.90909      21      2        0    0.15385
       10      1        1    0.90909      22      2        0    0.15385
       11      1        0    0.15385      23      2        0    0.15385
       12      1        0    0.15385      24      2        0    0.15385

  27. Exhibit 10.2: Contingency Analysis Output

      TABLE OF SUCCESS BY SIZE

      SUCCESS      SIZE
      Frequency |
      Percent   |
      Row Pct   |
      Col Pct   |        1 |        2 |   Total
      ----------+----------+----------+
              1 |       10 |        2 |      12
                |    41.67 |     8.33 |   50.00
                |    83.33 |    16.67 |
                |    90.91 |    15.38 |
      ----------+----------+----------+
              2 |        1 |       11 |      12
                |     4.17 |    45.83 |   50.00
                |     8.33 |    91.67 |
                |     9.09 |    84.62 |
      ----------+----------+----------+
      Total             11         13        24
                     45.83      54.17    100.00

  28. Exhibit 10.2 (continued)

      STATISTICS FOR TABLE OF SUCCESS BY SIZE

      Statistic                       DF    Value     Prob
      ----------------------------------------------------
      Chi-Square                       1    13.594    0.000
      Likelihood Ratio Chi-Square      1    15.407    0.000
      Continuity Adj. Chi-Square       1    10.741    0.001

      Statistic                       Value     ASE
      ----------------------------------------------------
      Gamma                           0.964     0.046
      Kendall's Tau-b                 0.753     0.133
      Stuart's Tau-c                  0.750     0.134
      Somers' D C|R                   0.750     0.134
      Somers' D R|C                   0.755     0.132

  29. Exhibit 10.3: Logistic regression for categorical and continuous variables

      Step 0. Intercept entered:

      Analysis of Maximum Likelihood Estimates
                  Parameter   Standard   Wald         Pr >         Standardized
      Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
      INTERCPT    0           0.4082     0.0000       1.0000       .

      Residual Chi-Square = 16.5512 with 2 DF (p=0.0003)

  30. Exhibit 10.3 (continued)

      Analysis of Variables Not in the Model
                  Score        Pr >
      Variable    Chi-Square   Chi-Square
      SIZE        13.5944      0.0002
      FP          13.8301      0.0002

      Step 1. Variable FP entered:

      Analysis of Variables Not in the Model
                  Score        Pr >
      Variable    Chi-Square   Chi-Square
      SIZE        5.0283       0.0249

  31. Exhibit 10.3 (continued)

      Step 2. Variable SIZE entered:

      Criteria for Assessing Model Fit
                   Intercept   Intercept and
      Criterion    Only        Covariates      Chi-Square for Covariates
      AIC          35.271      17.789          .
      SC           36.449      21.323          .
      -2 LOG L     33.271      11.789          21.482 with 2 DF (p=0.0001)
      Score        .           .               16.551 with 2 DF (p=0.0003)
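The AIC and SC entries in this table are consistent with the usual definitions AIC = -2 log L + 2k and SC = -2 log L + k log(n), where k counts the estimated parameters (including the intercept) and n = 24 observations. The Python sketch below checks this and also recomputes the likelihood-ratio chi-square on 2 df; for df = 2 the chi-squared right-tail probability is simply exp(-x/2). Treat the formulas as an assumption about how the output was produced.

```python
import math

N_OBS = 24  # number of observations, from Exhibit 10.1

def aic(neg2_log_l, k):
    """Akaike information criterion for k estimated parameters."""
    return neg2_log_l + 2 * k

def sc(neg2_log_l, k, n):
    """Schwarz criterion for k estimated parameters and n observations."""
    return neg2_log_l + k * math.log(n)

neg2_null, neg2_full = 33.271, 11.789     # -2 LOG L values from the table

aic_null, sc_null = aic(neg2_null, 1), sc(neg2_null, 1, N_OBS)
aic_full, sc_full = aic(neg2_full, 3), sc(neg2_full, 3, N_OBS)

lr_stat = neg2_null - neg2_full           # chi-square for covariates, 2 df
p_value = math.exp(-lr_stat / 2.0)        # exact right tail for df = 2

print(f"{aic_null:.3f} {sc_null:.3f} {aic_full:.3f} {sc_full:.3f}")
print(f"LR = {lr_stat:.3f}, p = {p_value:.6f}")
```

Lower AIC and SC for the model with covariates indicate a better fit after penalizing the two extra parameters.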

  32. Exhibit 10.3 (continued)

      Analysis of Maximum Likelihood Estimates
                  Parameter   Standard   Wald         Pr >         Standardized
      Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
      INTERCPT    -4.4450     1.8432     5.8159       0.0159       .
      SIZE         3.0552     1.5981     3.6550       0.0559       0.857342
      FP           1.9245     0.9116     4.4570       0.0348       1.139820

  33. Exhibit 10.3 (continued)

      Association of Predicted Probabilities and Observed Responses
      Concordant = 95.8%     Somers' D = 0.917
      Discordant =  4.2%     Gamma     = 0.917
      Tied       =  0.0%     Tau-a     = 0.478
      (144 pairs)            c         = 0.958

      NOTE: All explanatory variables have been entered into the model.

      Summary of Stepwise Procedure
             Variable            Number   Score        Wald         Pr >
      Step   Entered   Removed   In       Chi-Square   Chi-Square   Chi-Square
      1      FP                  1        13.8301      .            0.0002
      2      SIZE                2        5.0283       .            0.0249
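The four association measures can be recomputed from the concordant, discordant, and tied percentages, the number of pairs (144), and the number of observations (24). The Python sketch below uses the commonly documented rank-correlation definitions; because the inputs are rounded percentages, the results match the printed values only to about two or three decimals.

```python
def association(pct_concordant, pct_discordant, pct_tied, n_pairs, n_obs):
    """Somers' D, Gamma, Tau-a and c from pair percentages."""
    nc = pct_concordant / 100.0 * n_pairs   # concordant pairs
    nd = pct_discordant / 100.0 * n_pairs   # discordant pairs
    nt = pct_tied / 100.0 * n_pairs         # tied pairs
    somers_d = (nc - nd) / n_pairs
    gamma = (nc - nd) / (nc + nd)
    tau_a = (nc - nd) / (n_obs * (n_obs - 1) / 2.0)
    c = (nc + 0.5 * nt) / n_pairs
    return somers_d, gamma, tau_a, c

# Two-predictor model (this exhibit).
d3, g3, t3, c3 = association(95.8, 4.2, 0.0, 144, 24)
# One-predictor model (Exhibit 10.1).
d1, g1, t1, c1 = association(76.4, 1.4, 22.2, 144, 24)

print(f"{d3:.3f} {g3:.3f} {t3:.3f} {c3:.3f}")
print(f"{d1:.3f} {g1:.3f} {t1:.3f} {c1:.3f}")
```

Adding FP to the model removes the ties and raises c from 0.875 to about 0.958, i.e. the predicted probabilities rank events above non-events more consistently.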

  34. Exhibit 10.3 (continued)

      Classification Table
                              Predicted
                          EVENT   NO EVENT   Total
      Observed EVENT         9        3        12
               NO EVENT      1       11        12
               Total        10       14        24

      Sensitivity = 75.0%   Specificity = 91.7%   Correct = 83.3%
      False Positive Rate = 10.0%   False Negative Rate = 21.4%

  35. Exhibit 10.3 (continued)

      NOTE: An EVENT is an outcome whose ordered response value is 1.

      OBS   SUCCESS   SIZE   FP     PHAT        OBS   SUCCESS   SIZE   FP     PHAT
        1      1        1    0.58   0.43202      13      2        1    2.28   0.95248
        2      1        1    2.80   0.98199      14      2        0    1.06   0.08278
        3      1        1    2.77   0.98094      15      2        0    1.08   0.08575
        4      1        1    3.50   0.99525      16      2        0    0.07   0.01325
        5      1        1    2.67   0.97699      17      2        0    0.16   0.01572
        6      1        1    2.97   0.98695      18      2        0    0.70   0.04319
        7      1        1    2.18   0.94297      19      2        0    0.75   0.04735
        8      1        1    3.24   0.99220      20      2        0    1.61   0.20641
        9      1        1    1.49   0.81421      21      2        0    0.34   0.02208
       10      1        1    2.19   0.94400      22      2        0    1.15   0.09692
       11      1        0    2.70   0.67939      23      2        0    0.44   0.02664
       12      1        0    2.57   0.62265      24      2        0    0.86   0.05787
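The PHAT column can be reproduced from the fitted coefficients of this exhibit (INTERCPT = -4.4450, SIZE = 3.0552, FP = 1.9245) by applying the logistic function to the linear predictor. The Python sketch below does this for three of the listed observations; small differences in the fourth or fifth decimal are expected because the printed coefficients are rounded.

```python
import math

# Coefficients from the maximum likelihood table of Exhibit 10.3.
INTERCEPT, B_SIZE, B_FP = -4.4450, 3.0552, 1.9245

def phat(size, fp):
    """Predicted probability of an event for given SIZE and FP."""
    linear = INTERCEPT + B_SIZE * size + B_FP * fp
    return 1.0 / (1.0 + math.exp(-linear))

# (OBS, SIZE, FP, PHAT as printed in the listing)
rows = [(1, 1, 0.58, 0.43202), (11, 0, 2.70, 0.67939), (14, 0, 1.06, 0.08278)]
for obs, size, fp, printed in rows:
    print(obs, round(phat(size, fp), 5), printed)
```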

  36. Exhibit 10.4: Discriminant analysis for data in Table 10.1

      Canonical Discriminant Functions
                         Pct of     Cum      Canonical     After   Wilks'
      Fcn   Eigenvalue   Variance   Pct      Corr          Fcn     Lambda    Chi-square   df   Sig
                                                        :  0       .310367   24.570       2    .0000
      1*    2.2220       100.00     100.00   .8304      :
      * Marks the 1 canonical discriminant functions remaining in the analysis.

      Unstandardized canonical discriminant function coefficients
                    Func 1
      SIZE          1.8552118
      FP             .9162471
      (Constant)   -2.3834923
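Several entries in this exhibit are linked by standard two-group discriminant analysis identities: Wilks' lambda is 1/(1 + eigenvalue), the canonical correlation is sqrt(eigenvalue/(1 + eigenvalue)), and Bartlett's chi-square approximation is -(n - 1 - (p + g)/2) ln(lambda). The Python sketch below checks them with n = 24 cases, p = 2 predictors, and g = 2 groups, and evaluates the unstandardized discriminant function for one observation. The formulas are assumptions about how the output was computed, and no classification cutoff is implied.

```python
import math

EIGENVALUE = 2.2220                # from the exhibit
N_CASES, N_VARS, N_GROUPS = 24, 2, 2

wilks_lambda = 1.0 / (1.0 + EIGENVALUE)
canonical_corr = math.sqrt(EIGENVALUE / (1.0 + EIGENVALUE))
# Bartlett's approximation with p * (g - 1) = 2 degrees of freedom.
chi_square = -(N_CASES - 1 - (N_VARS + N_GROUPS) / 2.0) * math.log(wilks_lambda)

def discriminant_score(size, fp):
    """Unstandardized canonical discriminant score (coefficients from the exhibit)."""
    return -2.3834923 + 1.8552118 * size + 0.9162471 * fp

print(f"{wilks_lambda:.6f} {canonical_corr:.4f} {chi_square:.3f}")
print(discriminant_score(1, 0.58))   # observation 1 of the Exhibit 10.3 listing
```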

  37. Exhibit 10.4 (continued)

      Classification results
                       No. of    Predicted Group Membership
      Actual Group     Cases         1            2
      ------------     ------    ---------    ---------
      Group 1            12         11            1
                                   91.7%         8.3%
      Group 2            12          1           11
                                    8.3%        91.7%

      Percent of "grouped" cases correctly classified: 91.67%

  38. Exhibit 10.5: Logistic Regression For Mutual Fund Data

      Stepwise Selection Procedure

      Criteria for Assessing Model Fit
                   Intercept   Intercept and
      Criterion    Only        Covariates      Chi-Square for Covariates
      AIC          190.400     147.711         .
      SC           193.327     165.275         .
      -2 LOG L     188.400     135.711         52.689 with 5 DF (p=0.0001)
      Score        .           .               44.034 with 5 DF (p=0.0001)

      NOTE: All explanatory variables have been entered into the model.

  39. Exhibit 10.5 (continued)

      Summary of Stepwise Procedure
             Variable             Number   Score        Wald         Pr >
      Step   Entered    Removed   In       Chi-Square   Chi-Square   Chi-Square
      1      YIELD                1        21.0379      .            0.0001
      2      TOTRET               2        11.9103      .            0.0006
      3      SIZE                 3        8.5928       .            0.0034
      4      SCHARGE              4        4.1344       .            0.0420
      5      EXPENRAT             5        5.5516       .            0.0185

  40. Exhibit 10.5 (continued)

      Analysis of Maximum Likelihood Estimates
                  Parameter   Standard   Wald         Pr >         Standardized
      Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
      INTERCPT    -2.5902     1.2642      4.1981      0.0405        .
      SIZE         0.8542     0.4773      3.2020      0.0735        0.236320
      SCHARGE     -0.1394     0.0589      5.6088      0.0179       -0.302154
      EXPENRAT    -1.4361     0.6793      4.4699      0.0345       -0.321113
      TOTRET       0.8090     0.2509     10.3988      0.0013        0.402480
      YIELD        0.0553     0.0124     19.9669      0.0001        0.694773

  41. Exhibit 10.5 (continued)

      Association of Predicted Probabilities and Observed Responses
      Concordant = 85.5%     Somers' D = 0.711
      Discordant = 14.4%     Gamma     = 0.712
      Tied       =  0.1%     Tau-a     = 0.351
      (4661 pairs)           c         = 0.856

  42. Exhibit 10.5 (continued)

      Classification Table
                              Predicted
                          EVENT   NO EVENT   Total
      Observed EVENT        45       14        59
               NO EVENT     12       67        79
               Total        57       81       138

      Sensitivity = 76.3%   Specificity = 84.8%   Correct = 81.2%
      False Positive Rate = 21.1%   False Negative Rate = 17.3%