Ch10 Logistic Regression
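Regression Analysis
• Describes the relationship between one dependent variable and one (or more) predictor variables.
• Assumptions that must hold:
  • Normality (this assumption does not apply to the independent variables)
  • Homogeneity of variance
  • Independence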
Uses of Regression Analysis
• Prediction (given x, find y)
• Control (given y, find x)
• Description
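Logistic Regression
• An Introduction to Categorical Data Analysis, Alan Agresti, 1996
• When the groups in a discriminant analysis do not satisfy the normality assumption, logistic regression can be used instead.
• Logistic regression does not predict whether an event occurs; it predicts the probability that the event occurs.
• When the dependent variable (Y) is discrete, with only two or a few categories, logistic regression is the appropriate analysis.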
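Logistic Regression
• Can relate both categorical and quantitative independent variables to a categorical response.
• Typical setting: a consumer survey yields qualitative, categorical data on consumer behavior (whether or not to invest, purchase intention, event occurred or not) together with the factors that influence it (age, income, place of origin, economic climate, weather, preferences).
• When the dependent variable has two categories or is otherwise qualitative, logistic or probit analysis is more appropriate.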
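Logistic Regression
• Generalized linear models for binary data
• Many categorical response variables have only two categories:
  • Vote (Democrat vs. Republican)
  • Choice of car (imported vs. domestic)
  • Breast cancer diagnosis for a woman (absent vs. present)
• Let Y denote the binary response: $P(Y=1) = \pi$ (success), $P(Y=0) = 1 - \pi$ (failure).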
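• A binary response is also called a Bernoulli variable; its distribution is specified by the success probability $\pi$ and the failure probability $1 - \pi$.
• Mean: $E(Y) = \pi$
• Variance: $\mathrm{Var}(Y) = \pi(1 - \pi)$
• Given n independent observations of a binary response with the same parameter $\pi$, the number of successes follows a binomial distribution with index n and parameter $\pi$.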
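The mean and variance of the number of successes can be checked directly from the binomial probabilities. This is a minimal sketch; the values of `n` and `pi` are hypothetical and chosen only for illustration. Setting `n = 1` gives the Bernoulli case.

```python
import math

n, pi = 10, 0.3  # hypothetical index n and success probability pi

# Binomial probabilities P(S = k) for k = 0..n successes.
pmf = [math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]

# Mean and variance computed from the probabilities themselves.
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print(sum(pmf))                      # total probability
print(mean, n * pi)                  # mean vs. n * pi
print(variance, n * pi * (1 - pi))   # variance vs. n * pi * (1 - pi)
```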
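Logistic regression function
• Probability of success (nonlinear in x): $P = \dfrac{e^{f(x)}}{1 + e^{f(x)}}$
• Probability of failure (nonlinear in x): $1 - P = \dfrac{1}{1 + e^{f(x)}}$
• Odds: $\dfrac{P}{1 - P} = e^{f(x)}$
• Log odds: $\ln\left(\dfrac{P}{1 - P}\right) = f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots$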
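The four quantities on this slide can be computed in a few lines. This is a sketch only; the value of the linear predictor `f_x` is hypothetical.

```python
import math

def success_probability(f_x):
    """P = e^f(x) / (1 + e^f(x))."""
    return math.exp(f_x) / (1.0 + math.exp(f_x))

def odds(p):
    """Odds of success, P / (1 - P)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds; recovers the linear predictor f(x)."""
    return math.log(odds(p))

f_x = 0.8  # hypothetical value of beta_0 + beta_1 * x_1 + ...
p = success_probability(f_x)

print(p, 1 - p)      # success and failure probabilities
print(odds(p))       # equals e^f(x)
print(log_odds(p))   # equals f(x)
```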
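• The transformation yields a linear form: $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$
• The nonlinear relationship between $\pi(x)$ and x is monotonic:
  • $\pi(x)$ increases continuously as x increases, or
  • $\pi(x)$ decreases continuously as x increases.
[Figure: curves of $\pi(x)$ against x, bounded above by 1; panel (a) $\beta > 0$, panel (b) $\beta < 0$]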
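• The parameter $\beta$ determines how fast the curve rises or falls.
• When $\beta > 0$, $\pi(x)$ increases as x increases, as in (a).
• When $\beta < 0$, $\pi(x)$ decreases as x increases, as in (b).
• When $\beta = 0$, the curve flattens to a horizontal line: $\pi(x)$ is constant in x, and Y is independent of x.
Steepest point of the logit curve ($\beta > 0$)
• In figure (a), the tangent line drawn at a particular x describes the rate of change at that point. In terms of the logistic regression parameter, the slope at that point is $m = \beta\,\pi(x)\,(1 - \pi(x))$.
• Example: if $\pi(x) = 0.5$, then $m = \beta(0.5)(0.5) = 0.25\beta$; when $\pi(x) = 1$, $m = 0$.
[Figure: panel (a), $\pi(x)$ against x with the levels 0.5 and 1 marked]
Steepest point of the logit curve ($\beta > 0$)
• The curve is steepest at the x where $\pi(x) = 0.5$, namely $x = -\alpha/\beta$.
• Derivation: $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$. At $\pi(x) = 0.5$, $\log(0.5/0.5) = \log 1 = 0 = \alpha + \beta x$, so $x = -\alpha/\beta$.
[Figure: panel (a), $\pi(x)$ against x with $\pi(x) = 0.5$ at $x = -\alpha/\beta$]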
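Odds Ratio Interpretation
• Odds vs. the odds ratio
• $\dfrac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x) = e^{\alpha}\,(e^{\beta})^{x}$
• Interpretation: the odds are multiplied by $e^{\beta}$ for each one-unit increase in x.
• The log odds, $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$, is the logit transform of $\pi(x)$ and is linear in x; i.e., each one-unit change in x changes the logit by $\beta$.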
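The slope formula, the location of the steepest point, and the multiplicative effect on the odds can all be checked numerically. The sketch below uses hypothetical values for `alpha` and `beta`; nothing here comes from a fitted model.

```python
import math

alpha, beta = -2.0, 0.5  # hypothetical intercept and slope, beta > 0

def pi_x(x):
    """Success probability pi(x) under logit[pi(x)] = alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def slope(x):
    """Slope of the tangent to the curve at x: beta * pi(x) * (1 - pi(x))."""
    return beta * pi_x(x) * (1.0 - pi_x(x))

def odds(x):
    """Odds of success at x: exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# The curve is steepest where pi(x) = 0.5, i.e. at x = -alpha / beta.
x_steepest = -alpha / beta
print(x_steepest, pi_x(x_steepest), slope(x_steepest))

# A central difference agrees with the slope formula at that point.
h = 1e-6
numeric_slope = (pi_x(x_steepest + h) - pi_x(x_steepest - h)) / (2 * h)
print(numeric_slope)

# The odds are multiplied by e^beta for every one-unit increase in x.
odds_multipliers = [odds(x + 1) / odds(x) for x in (0.0, 2.5, 10.0)]
print(odds_multipliers, math.exp(beta))
```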
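Why logistic regression is preferred over other models for the probability: case-control studies
• It applies to retrospective sampling designs.
• Example: a case-control study, with Y = 1 for cases and Y = 0 for controls. Two samples are observed; if the distribution of x differs between cases and controls, x and Y are associated.
• Logistic regression works with odds and odds ratios, so the model can be fitted to retrospective data and used to estimate effects in case-control studies.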
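Inference for logistic regression: confidence intervals for effects
• Statistical inference for the model parameters helps judge the significance and the size of effects.
• For large samples, a confidence interval for $\beta$ in $\log\left(\dfrac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x$ is $\hat\beta \pm z_{\alpha/2}\,(\mathrm{ASE})$.
• Exponentiating the endpoints gives the corresponding interval for $e^{\beta}$, the multiplicative effect on the odds of a one-unit increase in x.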
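• ASE: asymptotic standard error
• Example: is a female crab's width related to whether she has satellites? (Y = 1 if she has a satellite, Y = 0 if not; the response of interest is whether a female crab has a satellite.)
• $\hat\beta = 0.497$ and ASE = 0.102.
• Solution: a 95% confidence interval for $\beta$ is $\hat\beta \pm z_{\alpha/2}\,(\mathrm{ASE}) = 0.497 \pm 1.96\,(0.102) = (0.298, 0.697)$.
• Inference: each 1 cm increase in width raises the odds of having a satellite by at least 35% and at most doubles them.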
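The interval and its exponentiated endpoints can be reproduced from the two numbers on the slide. A minimal sketch: computed from the rounded inputs, the lower endpoint comes out near 0.297 rather than the slide's 0.298, presumably because the slide's interval was obtained from unrounded estimates.

```python
import math

beta_hat = 0.497  # estimated effect of width, from the slide
ase = 0.102       # asymptotic standard error, from the slide
z = 1.96          # standard normal quantile for a 95% interval

lower = beta_hat - z * ase
upper = beta_hat + z * ase
print(lower, upper)

# Exponentiating gives the interval for e^beta, the multiplicative
# effect on the odds of a 1 cm increase in width.
odds_lower = math.exp(lower)
odds_upper = math.exp(upper)
print(odds_lower, odds_upper)
```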
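Logistic regression significance testing
• $H_0: \beta = 0$, the probability of success does not depend on x.
• $H_a: \beta \ne 0$, the probability of success depends on x.
• In large samples the test statistic is $z = \hat\beta / \mathrm{ASE}$. When $\beta = 0$, z has a standard normal distribution (one-sided or two-sided P-values can be obtained).
• For the two-sided alternative $\beta \ne 0$, $z^2$ has a $\chi^2$ distribution with df = 1 under $H_0$; the P-value is the right-tail $\chi^2$ probability above the observed value.
• This statistic, the parameter estimate divided by its standard error and then squared, is called the Wald statistic.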
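As an illustration, the Wald test can be applied to the crab example's estimate (0.497 with ASE 0.102). The sketch uses only the standard library: for df = 1 the right-tail chi-squared probability equals `erfc(sqrt(x / 2))`, which is also the two-sided normal P-value for z. Because the inputs are rounded, the statistic may differ slightly from a value computed on unrounded estimates.

```python
import math

beta_hat, ase = 0.497, 0.102  # estimate and standard error from the crab example

z = beta_hat / ase   # approximately standard normal when beta = 0
wald = z ** 2        # approximately chi-squared with df = 1 when beta = 0

# Two-sided P-value from the standard normal distribution.
p_normal = math.erfc(abs(z) / math.sqrt(2))

# Right-tail P-value from the chi-squared distribution with df = 1.
p_chi_square = math.erfc(math.sqrt(wald / 2))

print(z, wald)
print(p_normal, p_chi_square)  # the two P-values coincide
```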
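Another approach to inference and model checking: the likelihood ratio
• Maximize the likelihood function under each of two conditions, then take the ratio:
  1) Under the restriction imposed by $H_0$, maximize over all possible parameter values.
  2) Under the full model, where either $H_0$ or $H_a$ may hold, maximize over all possible parameter values.
• Let $l_1$ be the maximized likelihood under the full model and $l_0$ the maximized likelihood under the simpler model of $H_0$.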
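• Example: for the linear predictor $\alpha + \beta x$, test $H_0: \beta = 0$ against $H_a: \beta \ne 0$.
• $l_0$: the likelihood evaluated, with $\beta = 0$, at the value of $\alpha$ most likely to have produced the observed data.
• $l_1$: the likelihood evaluated at the combination $(\alpha, \beta)$ most likely to have produced the observed data.
• $l_0$ is a maximum over a restricted subset of the parameter values over which $l_1$ is maximized, so $l_1$ is at least as large as $l_0$.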
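Likelihood-ratio test statistic
• $-2\log(l_0 / l_1) = -2\,[\log(l_0) - \log(l_1)] = -2\,(L_0 - L_1)$, where $L_0$ and $L_1$ are the maximized log-likelihoods.
• Under $H_0: \beta = 0$, this statistic has a large-sample $\chi^2$ distribution with df = 1.
• In practice, the likelihood-ratio test is generally more reliable than the Wald test.
• The likelihood-ratio test compares the maximized log-likelihood $L_0$ when $\beta = 0$ (i.e., when $\pi(x)$ is forced to be the same at every x) with the maximized log-likelihood $L_1$ when $\beta$ is unrestricted.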
EXHIBIT 10.1: Logistic regression analysis with one categorical variable as the independent variable

Response Variable: SUCCESS
Number of Observations: 24
Link Function: Logit
Response Levels: 2

Response Profile
Ordered Value   SUCCESS   Count
      1            1        12
      2            2        12
Exhibit 10.1 (continued)

Criteria for Assessing Model Fit
             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          35.271      21.864          .
SC           36.449      24.221          .
-2 LOG L     33.271      17.864          15.407 with 1 DF (p=0.0001)
Score        .           .               13.594 with 1 DF (p=0.0002)
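The fit criteria in this table can be reproduced from the cell counts alone. The sketch below relies on two assumptions stated here rather than in the exhibit: the counts per SIZE level (10 events in 11 observations at SIZE = 1, 2 events in 13 at SIZE = 0) are read off the data listing of Exhibit 10.1, and, with a single binary predictor, the maximum likelihood fit reproduces the observed proportion in each group. AIC is taken as -2 log L + 2k and SC as -2 log L + k ln(n), with k the number of estimated parameters.

```python
import math

# (events, observations) at each level of SIZE, from the Exhibit 10.1 listing.
groups = {1: (10, 11), 0: (2, 13)}

def log_likelihood(events, trials, p):
    """Binary-data log-likelihood for `events` successes in `trials` at probability p."""
    return events * math.log(p) + (trials - events) * math.log(1 - p)

total_events = sum(e for e, _ in groups.values())
n = sum(t for _, t in groups.values())

# L0: intercept-only model, one common probability for every observation.
L0 = log_likelihood(total_events, n, total_events / n)
# L1: model with SIZE, one fitted probability per SIZE level.
L1 = sum(log_likelihood(e, t, e / t) for e, t in groups.values())

lr_statistic = -2 * (L0 - L1)
p_value = math.erfc(math.sqrt(lr_statistic / 2))  # chi-squared right tail, df = 1

aic_intercept, aic_full = -2 * L0 + 2 * 1, -2 * L1 + 2 * 2
sc_intercept, sc_full = -2 * L0 + 1 * math.log(n), -2 * L1 + 2 * math.log(n)

print(-2 * L0, -2 * L1, lr_statistic, p_value)
print(aic_intercept, aic_full, sc_intercept, sc_full)
```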
Exhibit 10.1 (continued)

Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -1.7047     0.7687     4.9181       0.0266       .
SIZE         4.0073     1.3003     9.4972       0.0021       1.124514

Association of Predicted Probabilities and Observed Responses
Concordant = 76.4%     Somers' D = 0.750
Discordant =  1.4%     Gamma     = 0.964
Tied       = 22.2%     Tau-a     = 0.391
(144 pairs)            c         = 0.875
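The Wald chi-square and its P-value in this table follow from each estimate and its standard error. A minimal sketch using only the printed (rounded) values, so the results agree with the table to about two or three decimals:

```python
import math

# (parameter estimate, standard error) as printed in the table.
estimates = {"INTERCPT": (-1.7047, 0.7687), "SIZE": (4.0073, 1.3003)}

wald = {}
p_value = {}
for name, (estimate, std_error) in estimates.items():
    # Wald chi-square: (estimate / standard error) squared.
    wald[name] = (estimate / std_error) ** 2
    # Right-tail chi-squared probability with df = 1.
    p_value[name] = math.erfc(math.sqrt(wald[name] / 2))
    print(name, wald[name], p_value[name])
```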
Exhibit 10.1 (continued)

Classification Table
                        Predicted
                    EVENT    NO EVENT    Total
                  +---------------------+
          EVENT   |   10         2      |   12
Observed          |                     |
       NO EVENT   |    1        11      |   12
                  +---------------------+
          Total       11        13          24

Sensitivity= 83.3%  Specificity= 91.7%  Correct= 87.5%
False Positive Rate= 9.1%  False Negative Rate= 15.4%
NOTE: An EVENT is an outcome whose ordered response value is 1.
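The summary rates under the classification table can be recomputed from its four cells. Note that the printed false positive and false negative rates are conditional on the prediction (share of predicted events, or predicted non-events, that are wrong), which is what the sketch below assumes; the variable names are illustrative.

```python
# Cell counts from the classification table (rows observed, columns predicted).
true_pos, false_neg = 10, 2    # observed EVENT row
false_pos, true_neg = 1, 11    # observed NO EVENT row
total = true_pos + false_neg + false_pos + true_neg

sensitivity = true_pos / (true_pos + false_neg)   # events correctly predicted
specificity = true_neg / (false_pos + true_neg)   # non-events correctly predicted
correct = (true_pos + true_neg) / total

# Rates conditional on the prediction.
false_positive_rate = false_pos / (true_pos + false_pos)
false_negative_rate = false_neg / (false_neg + true_neg)

for label, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                     ("Correct", correct),
                     ("False Positive Rate", false_positive_rate),
                     ("False Negative Rate", false_negative_rate)]:
    print(f"{label}= {100 * value:.1f}%")
```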
Exhibit 10.1 (continued)

OBS  SUCCESS  SIZE  PHAT         OBS  SUCCESS  SIZE  PHAT
  1     1       1   0.90909       13     2       1   0.90909
  2     1       1   0.90909       14     2       0   0.15385
  3     1       1   0.90909       15     2       0   0.15385
  4     1       1   0.90909       16     2       0   0.15385
  5     1       1   0.90909       17     2       0   0.15385
  6     1       1   0.90909       18     2       0   0.15385
  7     1       1   0.90909       19     2       0   0.15385
  8     1       1   0.90909       20     2       0   0.15385
  9     1       1   0.90909       21     2       0   0.15385
 10     1       1   0.90909       22     2       0   0.15385
 11     1       0   0.15385       23     2       0   0.15385
 12     1       0   0.15385       24     2       0   0.15385
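The PHAT column is the fitted probability obtained by passing the linear predictor through the logistic function. A short sketch using the estimates printed in Exhibit 10.1 (INTERCPT = -1.7047, SIZE = 4.0073); with one binary predictor the two fitted values also equal the observed proportions 10/11 and 2/13.

```python
import math

intercept, b_size = -1.7047, 4.0073  # estimates printed in Exhibit 10.1

def phat(size):
    """Fitted probability of an EVENT for a given value of SIZE."""
    linear_predictor = intercept + b_size * size
    return 1.0 / (1.0 + math.exp(-linear_predictor))

print(phat(1), 10 / 11)  # SIZE = 1
print(phat(0), 2 / 13)   # SIZE = 0
```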
Exhibit 10.2: Contingency Analysis Output

TABLE OF SUCCESS BY SIZE

SUCCESS      SIZE
Frequency |
Percent   |
Row Pct   |
Col Pct   |       1 |       2 |  Total
----------+---------+---------+
        1 |      10 |       2 |     12
          |   41.67 |    8.33 |  50.00
          |   83.33 |   16.67 |
          |   90.91 |   15.38 |
----------+---------+---------+
        2 |       1 |      11 |     12
          |    4.17 |   45.83 |  50.00
          |    8.33 |   91.67 |
          |    9.09 |   84.62 |
----------+---------+---------+
Total            11        13       24
              45.83     54.17   100.00
Exhibit 10.2 (continued)

STATISTICS FOR TABLE OF SUCCESS BY SIZE

Statistic                      DF    Value     Prob
---------------------------------------------------
Chi-Square                      1    13.594    0.000
Likelihood Ratio Chi-Square     1    15.407    0.000
Continuity Adj. Chi-Square      1    10.741    0.001

Statistic                      Value     ASE
--------------------------------------------
Gamma                          0.964     0.046
Kendall's Tau-b                0.753     0.133
Stuart's Tau-c                 0.750     0.134
Somers' D C|R                  0.750     0.134
Somers' D R|C                  0.755     0.132
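The three chi-square statistics can be recomputed from the 2 x 2 table of SUCCESS by SIZE. A minimal standard-library sketch; the expected counts are the usual row total times column total over the grand total, and the continuity-adjusted version subtracts 0.5 from each absolute deviation.

```python
import math

observed = [[10, 2], [1, 11]]  # rows: SUCCESS = 1, 2; columns: SIZE = 1, 2

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pair each observed count with its expected count under independence.
cells = [(observed[i][j], row_totals[i] * col_totals[j] / n)
         for i in range(2) for j in range(2)]

pearson = sum((o - e) ** 2 / e for o, e in cells)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)
continuity_adj = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)

print(pearson, likelihood_ratio, continuity_adj)
```

The Pearson and likelihood-ratio values match the Score and -2 LOG L chi-squares for covariates reported in Exhibit 10.1.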
Exhibit 10.3: Logistic regression for categorical and continuous variables

Step 0. Intercept entered:

Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    0           0.4082     0.0000       1.0000       .

Residual Chi-Square = 16.5512 with 2 DF (p=0.0003)
Exhibit 10.3 (continued)

Analysis of Variables Not in the Model
            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        13.5944      0.0002
FP          13.8301      0.0002

Step 1. Variable FP entered:

Analysis of Variables Not in the Model
            Score        Pr >
Variable    Chi-Square   Chi-Square
SIZE        5.0283       0.0249
Exhibit 10.3 (continued)

Step 2. Variable SIZE entered:

Criteria for Assessing Model Fit
             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          35.271      17.789          .
SC           36.449      21.323          .
-2 LOG L     33.271      11.789          21.482 with 2 DF (p=0.0001)
Score        .           .               16.551 with 2 DF (p=0.0003)
Exhibit 10.3 (continued)

Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -4.4450     1.8432     5.8159       0.0159       .
SIZE         3.0552     1.5981     3.6550       0.0559       0.857342
FP           1.9245     0.9116     4.4570       0.0348       1.139820
Exhibit 10.3 (continued)

Association of Predicted Probabilities and Observed Responses
Concordant = 95.8%     Somers' D = 0.917
Discordant =  4.2%     Gamma     = 0.917
Tied       =  0.0%     Tau-a     = 0.478
(144 pairs)            c         = 0.958

NOTE: All explanatory variables have been entered into the model.

Summary of Stepwise Procedure
        Variable             Number   Score        Wald         Pr >
Step    Entered   Removed    In       Chi-Square   Chi-Square   Chi-Square
1       FP                   1        13.8301      .            0.0002
2       SIZE                 2        5.0283       .            0.0249
Exhibit 10.3 (continued)

Classification Table
                        Predicted
                    EVENT    NO EVENT    Total
                  +---------------------+
          EVENT   |    9         3      |   12
Observed          |                     |
       NO EVENT   |    1        11      |   12
                  +---------------------+
          Total       10        14          24

Sensitivity= 75.0%  Specificity= 91.7%  Correct= 83.3%
False Positive Rate= 10.0%  False Negative Rate= 21.4%
Exhibit 10.3 (continued)

NOTE: An EVENT is an outcome whose ordered response value is 1.

OBS  SUCCESS  SIZE   FP    PHAT         OBS  SUCCESS  SIZE   FP    PHAT
  1     1       1   0.58   0.43202       13     2       1   2.28   0.95248
  2     1       1   2.80   0.98199       14     2       0   1.06   0.08278
  3     1       1   2.77   0.98094       15     2       0   1.08   0.08575
  4     1       1   3.50   0.99525       16     2       0   0.07   0.01325
  5     1       1   2.67   0.97699       17     2       0   0.16   0.01572
  6     1       1   2.97   0.98695       18     2       0   0.70   0.04319
  7     1       1   2.18   0.94297       19     2       0   0.75   0.04735
  8     1       1   3.24   0.99220       20     2       0   1.61   0.20641
  9     1       1   1.49   0.81421       21     2       0   0.34   0.02208
 10     1       1   2.19   0.94400       22     2       0   1.15   0.09692
 11     1       0   2.70   0.67939       23     2       0   0.44   0.02664
 12     1       0   2.57   0.62265       24     2       0   0.86   0.05787
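The PHAT values in this listing can be recovered from the two-variable estimates printed in Exhibit 10.3 (INTERCPT = -4.4450, SIZE = 3.0552, FP = 1.9245). The sketch checks three observations chosen for illustration; because the printed coefficients are rounded, agreement is to about four decimals.

```python
import math

intercept, b_size, b_fp = -4.4450, 3.0552, 1.9245  # estimates from Exhibit 10.3

def phat(size, fp):
    """Fitted probability of an EVENT for given SIZE and FP."""
    linear_predictor = intercept + b_size * size + b_fp * fp
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# (OBS, SIZE, FP, PHAT as printed) for three rows of the listing.
rows = [(1, 1, 0.58, 0.43202), (11, 0, 2.70, 0.67939), (14, 0, 1.06, 0.08278)]

for obs, size, fp, printed in rows:
    print(obs, phat(size, fp), printed)
```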
Exhibit 10.4: Discriminant analysis for data in Table 10.1

Canonical Discriminant Functions
                   Pct of     Cum      Canonical    After   Wilks'
Fcn   Eigenvalue   Variance   Pct      Corr         Fcn     Lambda     Chi-square   df   Sig
                                                  :   0     .310367    24.570        2   .0000
1*    2.2220       100.00     100.00   .8304      :
* Marks the 1 canonical discriminant functions remaining in the analysis.

Unstandardized canonical discriminant function coefficients
               Func 1
SIZE           1.8552118
FP              .9162471
(Constant)    -2.3834923
Exhibit 10.4 (continued)

Classification results -
                  No. of    Predicted Group Membership
Actual Group      Cases          1           2
------------      ------    ---------   ---------
Group 1             12          11           1
                               91.7%        8.3%
Group 2             12           1          11
                                8.3%       91.7%

Percent of "grouped" cases correctly classified: 91.67%
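Several numbers in Exhibit 10.4 are tied together by standard identities that hold when there is a single discriminant function: Wilks' lambda is 1 / (1 + eigenvalue), and the canonical correlation is the square root of eigenvalue / (1 + eigenvalue). The chi-square is assumed here to be Bartlett's approximation, -(n - 1 - (p + g) / 2) ln(lambda), with n = 24 cases, p = 2 variables and g = 2 groups; that assumption reproduces the printed value.

```python
import math

eigenvalue = 2.2220                    # from the canonical discriminant table
n_cases, n_vars, n_groups = 24, 2, 2

wilks_lambda = 1 / (1 + eigenvalue)
canonical_corr = math.sqrt(eigenvalue / (1 + eigenvalue))

# Bartlett's chi-square approximation for Wilks' lambda (assumed form).
chi_square = -(n_cases - 1 - (n_vars + n_groups) / 2) * math.log(wilks_lambda)

# Hit rate from the classification results: 11 + 11 correct out of 24.
hit_rate = (11 + 11) / n_cases

print(wilks_lambda, canonical_corr, chi_square, 100 * hit_rate)
```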
Exhibit 10.5: Logistic Regression For Mutual Fund Data

Stepwise Selection Procedure

Criteria for Assessing Model Fit
             Intercept   Intercept and
Criterion    Only        Covariates      Chi-Square for Covariates
AIC          190.400     147.711         .
SC           193.327     165.275         .
-2 LOG L     188.400     135.711         52.689 with 5 DF (p=0.0001)
Score        .           .               44.034 with 5 DF (p=0.0001)

NOTE: All explanatory variables have been entered into the model.
Exhibit 10.5 (continued)

Summary of Stepwise Procedure
        Variable              Number   Score        Wald         Pr >
Step    Entered    Removed    In       Chi-Square   Chi-Square   Chi-Square
1       YIELD                 1        21.0379      .            0.0001
2       TOTRET                2        11.9103      .            0.0006
3       SIZE                  3        8.5928       .            0.0034
4       SCHARGE               4        4.1344       .            0.0420
5       EXPENRAT              5        5.5516       .            0.0185
Exhibit 10.5 (continued)

Analysis of Maximum Likelihood Estimates
            Parameter   Standard   Wald         Pr >         Standardized
Variable    Estimate    Error      Chi-Square   Chi-Square   Estimate
INTERCPT    -2.5902     1.2642     4.1981       0.0405       .
SIZE         0.8542     0.4773     3.2020       0.0735        0.236320
SCHARGE     -0.1394     0.0589     5.6088       0.0179       -0.302154
EXPENRAT    -1.4361     0.6793     4.4699       0.0345       -0.321113
TOTRET       0.8090     0.2509     10.3988      0.0013        0.402480
YIELD        0.0553     0.0124     19.9669      0.0001        0.694773
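Following the odds interpretation of a logistic coefficient, each estimate in this table can be exponentiated to give the multiplicative change in the odds for a one-unit increase in that variable, holding the others fixed. A small sketch using the printed estimates; the dictionary name is illustrative.

```python
import math

# Parameter estimates printed in Exhibit 10.5 (intercept omitted).
estimates = {
    "SIZE": 0.8542,
    "SCHARGE": -0.1394,
    "EXPENRAT": -1.4361,
    "TOTRET": 0.8090,
    "YIELD": 0.0553,
}

# e^beta: factor by which the odds change per one-unit increase.
odds_multiplier = {name: math.exp(beta) for name, beta in estimates.items()}

for name, factor in odds_multiplier.items():
    direction = "raises" if factor > 1 else "lowers"
    print(f"{name}: a one-unit increase {direction} the odds by a factor of {factor:.3f}")
```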
Exhibit 10.5 (continued)

Association of Predicted Probabilities and Observed Responses
Concordant = 85.5%     Somers' D = 0.711
Discordant = 14.4%     Gamma     = 0.712
Tied       =  0.1%     Tau-a     = 0.351
(4661 pairs)           c         = 0.856
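The four rank-correlation indexes are simple functions of the concordant, discordant, and tied percentages. The sketch below uses the printed percentages, so the results match the table only to rounding; the sample size of 138 is taken from the total of the Exhibit 10.5 classification table (59 events times 79 non-events gives the 4661 pairs).

```python
concordant, discordant, tied = 0.855, 0.144, 0.001  # proportions of pairs
pairs = 4661      # event / non-event pairs
n_obs = 138       # total observations (59 events + 79 non-events)

# Somers' D: difference between concordant and discordant proportions.
somers_d = concordant - discordant
# Gamma: same difference, ignoring tied pairs.
gamma = (concordant - discordant) / (concordant + discordant)
# c: concordant proportion plus half of the tied proportion.
c_index = concordant + 0.5 * tied
# Tau-a: (concordant - discordant) pairs over all n(n-1)/2 pairs of observations.
tau_a = (concordant - discordant) * pairs / (n_obs * (n_obs - 1) / 2)

print(somers_d, gamma, tau_a, c_index)
```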
Exhibit 10.5 (continued)

Classification Table
                        Predicted
                    EVENT    NO EVENT    Total
                  +---------------------+
          EVENT   |   45        14      |   59
Observed          |                     |
       NO EVENT   |   12        67      |   79
                  +---------------------+
          Total       57        81         138

Sensitivity= 76.3%  Specificity= 84.8%  Correct= 81.2%
False Positive Rate= 21.1%  False Negative Rate= 17.3%