Stata Tutorial, Lecture 4: Comparing Two Samples
Social Statistics
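Open 85q1family.dta, the Stata data file for the family questionnaire of the Social Change Survey (社會變遷基本資料調查), Phase 3, Wave 2
• Because of Chinese encoding incompatibilities, some text appears garbled and is hard to read
• Open 85q1_format.txt to look up the variable names and value labels
• Take j2 and j3 as examples
• j2 asks the respondent: "Section 10, item 2. On average, about how many hours per week do you usually spend on housework? _______ hours"
• j3 asks the respondent: "Section 10, item 3. On average, about how many hours per week does your spouse usually spend on housework? _______ hours"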
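The data set contains variable labels, but encoding incompatibilities can garble them
• To check for garbled labels:
• Data > Data Editor
• Click once on the variable name j2; the whole column of values below it is highlighted
• Right-click > Variable > Properties > Label
• The Chinese text shown is 「通常您平均牢週大約花多少時間做家務工作︺」 (note the wrong character 牢 in place of 每 and the stray closing symbol)
• Correct the garbled characters
• Correct the garbled characters in the variable label of j3 as well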
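The same repair can be made from the Command window instead of the Data Editor. The following is a minimal sketch, assuming 85q1family.dta is already open and that the label text should match the questionnaire wording quoted from 85q1_format.txt; substitute whatever wording you prefer.

```stata
* Replace the garbled variable labels with the questionnaire wording
label variable j2 "通常您平均每週大約花多少時間做家務工作?"
label variable j3 "通常您的配偶平均每週大約花多少時間做家務工作?"

* Confirm that the labels now display correctly
describe j2 j3
```

Typing the labels as commands also leaves a record that can be saved in a do-file and rerun.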
Checking variables for abnormal values
• Close the Data Editor window
• Use a box plot to look for extreme values
• Graphics > Easy graphs > Box plot > Main, then type j2 in the Variable field
Using a box plot to look for extreme values (figure: box plot of j2)
The same method can be used to check j3 for extreme values
• Alternatively, type the command directly in the Command window
This is the Command window (figure: screenshot of the Stata Command window)
Type directly in the Command window
• graph box j2
• Then press Enter
• Stata commands are case-sensitive and must be typed in lowercase
summarize varname, detail
• In the Command window, type summarize j2, detail
• Or use Statistics > Summaries, tables, & tests > Summary statistics > Summary statistics
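. summarize j2, detail

          通常您平均每週大約花多少時間做家務工作?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1924
25%            2              0       Sum of Wgt.        1924

50%            7                      Mean           50.32692
                        Largest       Std. Dev.      191.1342
75%           20            998
90%           35            998       Variance       36532.28
95%           70            998       Skewness       4.717707
99%          996            999       Kurtosis       23.40378

(The title line is the variable label of j2: "On average, about how many hours per week do you usually spend on housework?")
• These respondents must really love housework! Values of 996 to 999 hours per week are implausibly high.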
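Recoding the extreme values
• Looking in 85q1_format.txt, we find:
• J2, J3: 996 "don't know", 998 "not applicable", 999 "refused"
• So every value of 995 or above should be set to system missing
• recode j2 995/max=.
• The period (.) is Stata's system missing value.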
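The recoding step can be written out as follows. This is a sketch under the assumption that 996, 998 and 999 are the only special codes, as listed in 85q1_format.txt; the commented-out `mvdecode` line is an alternative that converts only those documented codes instead of a whole range.

```stata
* Range form used on the slide: everything from 995 up becomes missing
recode j2 (995/max = .)

* Alternative (use one or the other): convert only the documented codes
* mvdecode j2, mv(996 998 999)

* Check that the maximum is now a plausible number of hours
summarize j2
```

The range form is shorter, while the `mvdecode` form makes it explicit which codes were treated as missing.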
A week has only 168 hours, so the remaining large values should be capped at a reasonable maximum: counting 16 waking hours a day gives 16 × 7 = 112 hours per week

. summarize j2, detail

          通常您平均每週大約花多少時間做家務工作?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1849
25%            2              0       Sum of Wgt.        1849

50%            7                      Mean           11.96106
                        Largest       Std. Dev.      15.30762
75%           15            105
90%           28            112       Variance       234.3232
95%           36            168       Skewness       3.208555
99%           70            168       Kurtosis       20.90302
Use inspect to see the rough distribution and the number of missing cases
• Data > Describe data > Inspect variables

. inspect j2

j2:  通常您平均每週大約花多少時間做家務工作      Number of Observations
-------------------------------------------    ------------------------------
                                                Total   Integers   Nonintegers
|  #                         Negative               -          -             -
|  #                         Zero                 305        305             -
|  #                         Positive            1544       1544             -
|  #                                            -----      -----         -----
|  #                         Total               1849       1849             -
|  #   .   .   .   .         Missing               75
+----------------------                         -----
0                   168                          1924
  (47 unique values)
recode j2 168=112
. inspect j2

j2:  通常您平均每週大約花多少時間做家務工作      Number of Observations
-------------------------------------------    ------------------------------
                                                Total   Integers   Nonintegers
|  #                         Negative               -          -             -
|  #                         Zero                 305        305             -
|  #                         Positive            1544       1544             -
|  #                                            -----      -----         -----
|  #                         Total               1849       1849             -
|  #   .   .   .   .         Missing               75
+----------------------                         -----
0                   112                          1924
  (46 unique values)
. sum j2, detail

          通常您平均每週大約花多少時間做家務工作?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1849
25%            2              0       Sum of Wgt.        1849

50%            7                      Mean           11.90049
                        Largest       Std. Dev.      14.79188
75%           15            105
90%           28            112       Variance       218.7996
95%           36            112       Skewness       2.632377
99%           70            112       Kurtosis       12.87359
. inspect j3

j3:  通常您的配偶平均每週大約花多少時間做家      Number of Observations
-------------------------------------------    ------------------------------
                                                Total   Integers   Nonintegers
|  #                         Negative               -          -             -
|  #                         Zero                 263        263             -
|  #                         Positive            1661       1661             -
|  #                                            -----      -----         -----
|  #               #         Total               1924       1924             -
|  #   .   .   .   #         Missing                -
+----------------------                         -----
0                   999                          1924
  (54 unique values)
. summarize j3, detail

        通常您的配偶平均每週大約花多少時間做家務工作?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1924
25%            4              0       Sum of Wgt.        1924

50%           14                      Mean           278.8342
                        Largest       Std. Dev.      436.2336
75%          996            998
90%          998            999       Variance       190299.7
95%          998            999       Skewness        1.03888
99%          998            999       Kurtosis       2.085666
Missing values and recoding for j3
• recode j3 990/max=.
• recode j3 168=112
. recode j3 168=112
(j3: 4 changes made)

. inspect j3

j3:  通常您的配偶平均每週大約花多少時間做家      Number of Observations
-------------------------------------------    ------------------------------
                                                Total   Integers   Nonintegers
|  #                         Negative               -          -             -
|  #                         Zero                 263        263             -
|  #                         Positive            1144       1144             -
|  #                                            -----      -----         -----
|  #                         Total               1407       1407             -
|  #   .   .   .   .         Missing              517
+----------------------                         -----
0                   150                          1924
  (50 unique values)
. summarize j3, detail

        通常您的配偶平均每週大約花多少時間做家務工作?
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1407
25%            2              0       Sum of Wgt.        1407

50%            7                      Mean           14.49893
                        Largest       Std. Dev.       18.2296
75%           21            112
90%           35            112       Variance       332.3185
95%           49            150       Skewness       2.569526
99%           85            150       Kurtosis       12.65059
Values above 112 (here, 150) still remain, so cap everything from 112 upward at 112
• recode j3 112/max=112
• tabulate j3
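The same cap can be applied with `replace`. This is a sketch of an equivalent approach, not the command used on the slide; the point it illustrates is that Stata treats missing values as larger than any number, so a plain `j3 > 112` condition would also match the missing cases.

```stata
* Cap weekly hours at 112. Missing values compare as larger than every
* number in Stata, so they must be excluded explicitly.
replace j3 = 112 if j3 > 112 & !missing(j3)

* Verify: the largest category should now be 112
tabulate j3
```

Without the `!missing(j3)` guard, the respondents recoded to missing earlier would be turned into 112-hour values.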
(tail of the tabulate j3 output)

         70 |         10        0.71       98.29
         80 |          3        0.21       98.51
         84 |          6        0.43       98.93
         85 |          1        0.07       99.00
         90 |          1        0.07       99.08
         98 |          4        0.28       99.36
        100 |          1        0.07       99.43
        105 |          1        0.07       99.50
        112 |          7        0.50      100.00
------------+-----------------------------------
      Total |      1,407      100.00
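Looking at the difference between men and women
• Item A1 (variable a1) is gender: 1 is male, 2 is female.
• Data > Data Editor, find the variable a1, right-click
• Variable > Properties > Label: change it to 性別 (gender)
• Value label > Define/Modify > Define > Label name
• Enter gender > OK > for Value enter 1 > for Text enter 男 (male) > OK
• For Value enter 2 > for Text enter 女 (female) > OK > Cancel > Close > for Value label choose gender > OK
• Close the Data Editor window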
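The dialog steps above correspond to three labeling commands. A minimal sketch, assuming the variable is stored as a1 (lowercase, as in the later ttest commands) and using the label name gender from the slide:

```stata
* Variable label for a1
label variable a1 "性別"

* Define a value label named gender and attach it to a1
label define gender 1 "男" 2 "女"
label values a1 gender

* Check that the categories now display with their labels
tabulate a1
```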
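Do men and women differ in their share of housework?
• Statistics > Summaries, tables, & tests > Tables > One/two-way table of summary statistics
• In the dialog, the grouping variable is the independent variable (a1, gender) and the summarized variable is the dependent variable (j2, housework hours)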
            | Summary of 通常您平均每週大約花多少時間做家務工作
       性別 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         男 |   6.0485537    10.23684         968
         女 |   18.330306   16.287017         881
------------+------------------------------------
      Total |   11.900487   14.791877        1849

(性別 = gender, 男 = male, 女 = female)
• Is the difference large?
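A table of this kind can also be requested from the Command window. The sketch below assumes that the menu item maps to `tabulate` with the `summarize()` option, which produces group means, standard deviations and frequencies in this layout.

```stata
* Mean, standard deviation and frequency of j2 within each category of a1
tabulate a1, summarize(j2)
```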
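Population variances unknown but assumed equal
• Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests
• In the dialog: the group variable is the independent variable (a1), the variable name is the dependent variable (j2), and the confidence level is set to 99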
. ttest j2, by(a1) level(99)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     968    6.048554    .3290245    10.23684    5.199367    6.897741
      女 |     881    18.33031    .5487235    16.28702    16.91382     19.7468
---------+--------------------------------------------------------------------
combined |    1849    11.90049    .3439971    14.79188    11.01349    12.78748
---------+--------------------------------------------------------------------
    diff |           -12.28175    .6268771               -13.89815   -10.66535
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t = -19.5920
Ho: diff = 0                                     degrees of freedom =     1847

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(男 = male, 女 = female)
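Population variances unknown and not equal
• The method above assumes that the population variances are unknown but equal.
• Regardless of sample size, statistical software generally uses the t test
• What if the population variances are unknown and not equal?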
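Population variances unknown and not equal
• Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests
• Tick the unequal variances option
• The degrees of freedom then require a more complicated calculation; tick the option for the approximation proposed by Welch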
The difference between men's and women's housework hours when the population variances are unknown and unequal

. ttest j2, by(a1) unequal welch level(99)

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     968    6.048554    .3290245    10.23684    5.199367    6.897741
      女 |     881    18.33031    .5487235    16.28702    16.91382     19.7468
---------+--------------------------------------------------------------------
combined |    1849    11.90049    .3439971    14.79188    11.01349    12.78748
---------+--------------------------------------------------------------------
    diff |           -12.28175    .6398083               -13.93195   -10.63155
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t = -19.1960
Ho: diff = 0                             Welch's degrees of freedom =  1456.62

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(男 = male, 女 = female)
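To see where the Welch degrees of freedom come from, the value can be recomputed by hand from the standard errors in the output. This sketch uses Welch's approximation, df = (a + b)^2 / [a^2/(n1 + 1) + b^2/(n2 + 1)] - 2, where a and b are the squared standard errors of the two group means; the scalar names are made up for the illustration.

```stata
* Squared standard errors of the two group means, from the ttest output
* (men: n = 968, women: n = 881)
scalar w_a = .3290245^2
scalar w_b = .5487235^2

* Welch's approximation to the degrees of freedom
scalar w_num = (w_a + w_b)^2
scalar w_den = w_a^2/(968 + 1) + w_b^2/(881 + 1)
display "Welch df = " w_num/w_den - 2
```

The result should agree with the 1456.62 reported by ttest, up to rounding of the standard errors. If `unequal` is specified without `welch`, ttest uses Satterthwaite's approximation instead, which gives a slightly different value.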
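Testing whether the variances are equal (variance ratio F test)
• Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group variance comparison tests
• In the dialog: the group variable is the independent variable (a1) and the variable name is the dependent variable (j2)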
Testing whether the variances are equal (variance ratio F test)

. sdtest j2, by(a1) level(99)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     968    6.048554    .3290245    10.23684    5.199367    6.897741
      女 |     881    18.33031    .5487235    16.28702    16.91382     19.7468
---------+--------------------------------------------------------------------
combined |    1849    11.90049    .3439971    14.79188    11.01349    12.78748
------------------------------------------------------------------------------
    ratio = sd(男) / sd(女)                                       f =   0.3950
Ho: ratio = 1                                    degrees of freedom = 967, 880

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.0000         2*Pr(F < f) = 0.0000           Pr(F > f) = 1.0000

(男 = male, 女 = female)
• sd(男) / sd(女) is not equal to one; the p-value shows that the null hypothesis of equal variances can be rejected
• Note that sdtest performs the classical variance ratio F test, not Levene's test; in Stata, Levene's test is provided by robvar
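Based on the result of the variance ratio test, the unequal-variances version of the t test is the more appropriate choice.
• It shows that men spend significantly fewer hours on housework than women.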
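The output above comes from `sdtest`, which is the F test of the variance ratio and is sensitive to non-normality; housework hours are strongly right-skewed. If Levene's test proper is wanted, a sketch using Stata's `robvar` command would be:

```stata
* Robust tests of equal variances across the two gender groups:
* W0 is Levene's statistic; W50 and W10 are the Brown-Forsythe variants
robvar j2, by(a1)
```

`robvar` reports each statistic with its own p-value, so its conclusion can be compared directly with the F test.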
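Comparing the housework burden of married and never-married respondents
• A5 (variable a5) is the respondent's marital status
• 1 is never married, 2 is married, 3 is other
• Do married people carry a heavier housework burden?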
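Comparing the housework burden of married and never-married respondents
• Proceed as in the comparison of men and women
• Stata returns the following error:
• . ttest j2, by(a5) level(99)
• more than 2 groups found, only 2 allowed
• r(420);
• This is because a5 has three values: never married, married, and other
• Use an if condition to restrict the comparison to never-married and married respondents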
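One way to write the restriction is sketched below, assuming a5 is coded 1 (never married), 2 (married) and 3 (other) as described. `inlist()` names the two groups to keep rather than the one to drop, so observations with a missing a5 are excluded from the condition as well.

```stata
* Keep only never-married (1) and married (2) respondents
ttest j2 if inlist(a5, 1, 2), by(a5) level(99)
```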
Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests (enter the restriction a5!=3 on the dialog's by/if/in tab)
Equal variances

. ttest j2 if a5!=3, by(a5) level(99)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
    未婚 |     306    5.598039    .5156249    9.019752    4.261516    6.934562
    已婚 |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
---------+--------------------------------------------------------------------
combined |    1837    11.87262    .3434793     14.7216    10.98695    12.75828
---------+--------------------------------------------------------------------
    diff |           -7.528675    .9051995               -9.862742   -5.194608
------------------------------------------------------------------------------
    diff = mean(未婚) - mean(已婚)                                t =  -8.3171
Ho: diff = 0                                     degrees of freedom =     1835

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(未婚 = never married, 已婚 = married)
Unequal variances

. ttest j2 if a5!=3, by(a5) unequal welch level(99)

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
    未婚 |     306    5.598039    .5156249    9.019752    4.261516    6.934562
    已婚 |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
---------+--------------------------------------------------------------------
combined |    1837    11.87262    .3434793     14.7216    10.98695    12.75828
---------+--------------------------------------------------------------------
    diff |           -7.528675    .6472826                -9.20044    -5.85691
------------------------------------------------------------------------------
    diff = mean(未婚) - mean(已婚)                                t = -11.6312
Ho: diff = 0                             Welch's degrees of freedom =  712.885

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(未婚 = never married, 已婚 = married)
Variance ratio test: the null hypothesis of equal variances is rejected (two-sided p = 0.0000)

. sdtest j2 if a5!=3, by(a5) level(99)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
    未婚 |     306    5.598039    .5156249    9.019752    4.261516    6.934562
    已婚 |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
---------+--------------------------------------------------------------------
combined |    1837    11.87262    .3434793     14.7216    10.98695    12.75828
------------------------------------------------------------------------------
    ratio = sd(未婚) / sd(已婚)                                   f =   0.3471
Ho: ratio = 1                                   degrees of freedom = 305, 1530

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.0000         2*Pr(F < f) = 0.0000           Pr(F > f) = 1.0000

(未婚 = never married, 已婚 = married)
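Comparing groups on two levels
• Is there a difference between married men and married women, and between never-married men and never-married women?
• Is marriage disadvantageous for women, at least in terms of time spent on housework?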
Equal variances
• Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests (repeat the command by a5 on the by/if/in tab)
Subgroup comparison, equal variances

. by a5, sort : ttest j2 if a5!=3, by(a1) level(99)

-------------------------------------------------------------------------------------------
-> a5 = 未婚

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     177    5.316384    .7992975    10.63396    3.234972    7.397796
      女 |     129    5.984496    .5435252    6.173259    4.563295    7.405698
---------+--------------------------------------------------------------------
combined |     306    5.598039    .5156249    9.019752    4.261516    6.934562
---------+--------------------------------------------------------------------
    diff |           -.6681119     1.04519               -3.377347    2.041123
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t =  -0.6392
Ho: diff = 0                                     degrees of freedom =      304

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2616         Pr(|T| > |t|) = 0.5232          Pr(T > t) = 0.7384

(未婚 = never married, 男 = male, 女 = female)
Subgroup comparison, equal variances (continued)

-> a5 = 已婚

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     784    6.095663    .3493023    9.780465    5.193722    6.997605
      女 |     747    20.50602    .6054935    16.54893    18.94238    22.06967
---------+--------------------------------------------------------------------
combined |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
---------+--------------------------------------------------------------------
    diff |           -14.41036    .6909184               -16.19227   -12.62845
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t = -20.8568
Ho: diff = 0                                     degrees of freedom =     1529

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(已婚 = married, 男 = male, 女 = female)
Subgroup comparison, unequal variances

. by a5, sort : ttest j2 if a5!=3, by(a1) unequal welch level(99)

-------------------------------------------------------------------------------------------
-> a5 = 未婚

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     177    5.316384    .7992975    10.63396    3.234972    7.397796
      女 |     129    5.984496    .5435252    6.173259    4.563295    7.405698
---------+--------------------------------------------------------------------
combined |     306    5.598039    .5156249    9.019752    4.261516    6.934562
---------+--------------------------------------------------------------------
    diff |           -.6681119      .96659               -3.174232    1.838008
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t =  -0.6912
Ho: diff = 0                             Welch's degrees of freedom =  292.466

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2450         Pr(|T| > |t|) = 0.4900          Pr(T > t) = 0.7550

(未婚 = never married, 男 = male, 女 = female)
Subgroup comparison, unequal variances (continued)

-> a5 = 已婚

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     784    6.095663    .3493023    9.780465    5.193722    6.997605
      女 |     747    20.50602    .6054935    16.54893    18.94238    22.06967
---------+--------------------------------------------------------------------
combined |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
---------+--------------------------------------------------------------------
    diff |           -14.41036     .699024                -16.2138   -12.60693
------------------------------------------------------------------------------
    diff = mean(男) - mean(女)                                    t = -20.6150
Ho: diff = 0                             Welch's degrees of freedom =  1199.87

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

(已婚 = married, 男 = male, 女 = female)
Subgroup tests of equal variances

. by a5, sort : sdtest j2 if a5!=3, by(a1) level(99)

-------------------------------------------------------------------------------------------
-> a5 = 未婚

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     177    5.316384    .7992975    10.63396    3.234972    7.397796
      女 |     129    5.984496    .5435252    6.173259    4.563295    7.405698
---------+--------------------------------------------------------------------
combined |     306    5.598039    .5156249    9.019752    4.261516    6.934562
------------------------------------------------------------------------------
    ratio = sd(男) / sd(女)                                       f =   2.9673
Ho: ratio = 1                                    degrees of freedom = 176, 128

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 1.0000         2*Pr(F > f) = 0.0000           Pr(F > f) = 0.0000

(未婚 = never married, 男 = male, 女 = female)
Subgroup tests of equal variances (continued)

-> a5 = 已婚

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
      男 |     784    6.095663    .3493023    9.780465    5.193722    6.997605
      女 |     747    20.50602    .6054935    16.54893    18.94238    22.06967
---------+--------------------------------------------------------------------
combined |    1531    13.12671    .3912873    15.31029    12.11757    14.13586
------------------------------------------------------------------------------
    ratio = sd(男) / sd(女)                                       f =   0.3493
Ho: ratio = 1                                    degrees of freedom = 783, 746

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.0000         2*Pr(F < f) = 0.0000           Pr(F > f) = 1.0000

(已婚 = married, 男 = male, 女 = female)
Comparing the groups with box plots (figure: box plots of housework hours by group)
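The slide's figure is not recoverable here, but a grouped box plot of this kind could be drawn as sketched below. The choice of grouping (gender within marital status) is an assumption about what the figure showed.

```stata
* Box plots of weekly housework hours for men and women,
* shown separately for never-married (1) and married (2) respondents
graph box j2 if inlist(a5, 1, 2), over(a1) over(a5)
```

The first `over()` defines the inner grouping and the second the outer grouping, so swapping them places marital status inside each gender instead.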
Is there a difference between single men and married men?
• Is there a difference between single women and married women?
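These two questions reverse the roles of the variables used earlier: the test is run within each gender, comparing marital-status groups. A sketch, assuming the codings described above (a1: 1 male, 2 female; a5: 1 never married, 2 married) and choosing the unequal-variances form because the variance ratio tests above rejected equal variances:

```stata
* Within each gender, compare never-married and married respondents
bysort a1: ttest j2 if inlist(a5, 1, 2), by(a5) unequal welch level(99)

* Check the equal-variance assumption within each gender as well
bysort a1: sdtest j2 if inlist(a5, 1, 2), by(a5) level(99)
```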