Statistics: Applications and Advanced Topics (統計學: 應用與進階), Chapter 13: Hypothesis Testing

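Basic concepts of hypothesis testing • How to carry out a hypothesis test • The hypothesis testing procedure • The p-value of a test • Error probabilities and power • The power function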

Nonstatistical Hypothesis Testing • A criminal trial is an example of hypothesis testing without the statistics. • In a trial, a jury must decide between two hypotheses. • The null hypothesis is H0: The defendant is innocent. • The alternative hypothesis (or research hypothesis) is H1: The defendant is guilty. • The jury does not know which hypothesis is true. They must make a decision on the basis of the evidence presented.

Nonstatistical Hypothesis Testing • In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. • That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).

Nonstatistical Hypothesis Testing • If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis. • Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. • That is why we never say that we accept the null hypothesis.

Nonstatistical Hypothesis Testing • There are two possible errors. • A Type I error occurs when we reject a true null hypothesis, that is, when the jury convicts an innocent person. • A Type II error occurs when we don't reject a false null hypothesis, that is, when a guilty defendant is acquitted.

Nonstatistical Hypothesis Testing • The probability of a Type I error is denoted α (Greek letter alpha). • The probability of a Type II error is denoted β (Greek letter beta). • The two probabilities are inversely related: for a fixed amount of evidence (sample size), decreasing one increases the other.

Nonstatistical Hypothesis Testing • 5. Two possible errors can be made. • Type I error: Reject a true null hypothesis. • Type II error: Do not reject a false null hypothesis. • P(Type I error) = α • P(Type II error) = β

Decision Results

Jury Trial (H0: Innocent)
Verdict | Actual Situation: Innocent | Actual Situation: Guilty
Innocent | Correct | Error
Guilty | Error | Correct

H0 Test
Decision | Actual Situation: H0 True | Actual Situation: H0 False
Do not reject H0 | Correct (1 − α) | Type II Error (β)
Reject H0 | Type I Error (α) | Power (1 − β)
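The trade-off between α and β can be made concrete with a small numerical sketch. It assumes a right-tailed z test whose statistic is N(0, 1) under H0 and N(shift, 1) under the alternative; the function name `type_ii_error` and the value `shift = 2.0` are hypothetical choices for illustration, not taken from the slides.

```python
from statistics import NormalDist


def type_ii_error(alpha: float, shift: float) -> float:
    """Return beta for a right-tailed z test at significance level alpha.

    `shift` is a hypothetical standardized distance between the true mean
    and the H0 mean, i.e. (mu1 - mu0) * sqrt(n) / sigma.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value under H0
    # H0 is not rejected when Z <= z_alpha; under the alternative Z ~ N(shift, 1)
    return std_normal.cdf(z_alpha - shift)


# Lowering alpha (fewer Type I errors) raises beta (more Type II errors).
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, shift=2.0)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With the shift held fixed, each smaller α in the loop gives a larger β, which is the inverse relation between the two error probabilities.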

Nonstatistical Hypothesis Testing • In our judicial system, Type I errors are regarded as more serious. • We try to avoid convicting innocent people. We are more willing to acquit guilty people. • We arrange to make α small by requiring the prosecution to prove its case and instructing the jury to find the defendant guilty only if there is "evidence beyond a reasonable doubt."

Nonstatistical Hypothesis Testing • The critical concepts are these: • 1. There are two hypotheses, the null and the alternative hypotheses. • 2. The procedure begins with the assumption that the null hypothesis is true. • 3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true. • 4. There are two possible decisions: conclude that there is enough evidence to support the alternative hypothesis, or conclude that there is not enough evidence to support the alternative hypothesis.

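Hypothesis Testing • A hypothesis is a claim about a population parameter • I claim that the mean height of students at this school is 166 cm • I claim that the mean IQ of students at this school is 130 • The purpose of hypothesis testing is to subject such claims to statistical scrutiny, using a statistical test to infer whether the hypothesis is true or false • The decision is either to reject the hypothesis or to fail to reject it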

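Hypothesis Testing • For an unknown population parameter, we can form many different hypotheses, for example [H1] [H2] [H3], where the constants appearing in these hypotheses each denote some fixed number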

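Null and Alternative Hypotheses • In hypothesis testing we consider two mutually exclusive hypotheses: • The null hypothesis is the hypothesis the researcher wants to test, usually denoted H0. • The alternative hypothesis is the exact opposite of the null hypothesis: if the null hypothesis does not hold, the alternative hypothesis is true. It is usually denoted H1 or HA.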

Null Hypothesis • What is tested • Has serious outcome if incorrect decision made • Always has equality sign: =, ≤, or ≥ • Designated H0 (pronounced H-oh)

Alternative Hypothesis • Opposite of null hypothesis • Always has inequality sign: ≠, <, or > • Designated Ha

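Null and Alternative Hypotheses • For example, if the null hypothesis is H0 : μ = μ0 • then the alternative hypothesis can be H1 : μ > μ0, H1 : μ < μ0, or H1 : μ ≠ μ0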

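Simple vs. Composite Hypotheses • If a hypothesis contains only one specific hypothesized value, such as μ = 166, it is called a simple hypothesis • If a hypothesis allows more than one possible parameter value, such as μ > 166, it is called a composite hypothesis • The null hypothesis is usually stated as a simple hypothesis and the alternative as a composite hypothesis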

Example • Test that the population mean is not 3 • Steps: • State the question statistically (μ ≠ 3) • State the opposite statistically (μ = 3) • Must be mutually exclusive & exhaustive • Select the alternative hypothesis (μ ≠ 3) • Has the ≠, <, or > sign • State the null hypothesis (μ = 3)

Example • Is the population average amount of TV viewing different from 12 hours? • Steps • State the question statistically: μ ≠ 12 • State the opposite statistically: μ = 12 • Select the alternative hypothesis: Ha: μ ≠ 12 • State the null hypothesis: H0: μ = 12

Example • Is the average cost per hat less than or equal to $20? • Steps • State the question statistically: μ ≤ 20 • State the opposite statistically: μ > 20 • Select the alternative hypothesis: Ha: μ > 20 • State the null hypothesis: H0: μ ≤ 20

Example • Is the average amount spent in the bookstore greater than $25? • Steps • State the question statistically: μ > 25 • State the opposite statistically: μ ≤ 25 • Select the alternative hypothesis: Ha: μ > 25 • State the null hypothesis: H0: μ ≤ 25

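Hypothesis Testing • In general, a statistical testing procedure leads to one of two decisions: • reject H0 and accept H1 as true, or • fail to reject H0. We do not say "accept H0" because failing to find evidence against H0 does not mean that H0 is beyond doubt; it only means that we lack sufficient evidence to overturn it. You may hear or read "accept H0" in some settings or textbooks, but keep in mind that such a statement does not mean H0 is absolutely true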

Basic Idea • [Figure: sampling distribution of sample means under H0, centered at μ = 50, with a sample mean of 20 marked far in the tail] • It is unlikely that we would get a sample mean of this value (20) if in fact the population mean were μ = 50; therefore, we reject the hypothesis that μ = 50.

Level of Significance • Probability • Defines unlikely values of sample statistic if null hypothesis is true • Called rejection region of sampling distribution • Designated α (alpha) • Typical values are .01, .05, .10 • Selected by researcher at start

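Rejection Region (One-Tail Test) • [Figure: sampling distribution centered at the H0 value, with the sample statistic on the horizontal axis; the nonrejection region has area 1 − α (level of confidence); the rejection region beyond the critical value has area α; the observed sample statistic is marked on the axis]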


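Rejection Regions (Two-Tailed Test) • [Figure: sampling distribution centered at the H0 value, with the sample statistic on the horizontal axis; the nonrejection region between the two critical values has area 1 − α (level of confidence); each tail beyond a critical value is a rejection region of area α/2; the observed sample statistic is marked on the axis]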

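The critical values that bound these rejection regions can be computed directly once a distribution is fixed. The sketch below assumes the test statistic is standard normal under H0 (the figures do not name a distribution); the helper `critical_values` and its `tail` labels are hypothetical names introduced for illustration.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical value(s) of a standard normal statistic at level alpha.

    tail: "right" or "left" for a one-tail test, "two" for a two-tailed test.
    """
    std_normal = NormalDist()
    if tail == "right":
        # Rejection region: the upper tail with area alpha
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        # Rejection region: the lower tail with area alpha
        return (std_normal.inv_cdf(alpha),)
    if tail == "two":
        # Rejection regions: both tails, each with area alpha / 2
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left', or 'two'")


for tail in ("right", "left", "two"):
    print(tail, [round(v, 3) for v in critical_values(0.05, tail)])
```

Splitting α across two tails pushes each critical value further from the center than in the one-tail case at the same α.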

One Population Tests • Mean: Z test (1 & 2 tail) or t test (1 & 2 tail) • Proportion: Z test (1 & 2 tail) • Variance: χ² test (1 & 2 tail)

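Example: Carrying Out a Hypothesis Test • A pharmaceutical company claims that only 5% of people experience serious side effects after taking its new drug. The Food and Drug Administration is skeptical and decides to test the claim statistically. After 287 subjects are given the drug, let Xi = 1 if the i-th subject has serious side effects and Xi = 0 otherwise. Clearly, the Xi are i.i.d. Bernoulli(μ), where μ is the probability of a serious side effect

Example: Carrying Out a Hypothesis Test • The hypotheses are H0 : μ = 0.05 versus H1 : μ > 0.05. Suppose we find that 25 of the 287 subjects show serious side effects, that is, $\bar{x}$ = 25/287 = 0.0871. Given this sample, how do we test the company's claim?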

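Example: Carrying Out a Hypothesis Test • The basic logic of hypothesis testing is to believe, for the moment, that H0 is true • If H0 : μ = 0.05 is true, the mean of any sample we draw will not be exactly 0.05, but it should be fairly close to 0.05 • In other words, given that the null hypothesis holds, a sample mean far above 0.05 is very unlikely; that is, under H0 the probability of observing such an extreme value is very small

Example: Carrying Out a Hypothesis Test • Therefore, if such an "unlikely" event actually occurs, we can reject the null hypothesis on that basis • In short, if the value of $\bar{X}$ is too large, larger than some constant c, we reject the null hypothesis • How small must the probability of such an extreme event be to count as "unlikely"? In general we choose a very small probability α, such as 0.10, 0.05, or 0.01, as the basis for rejecting the null hypothesis

Example: Carrying Out a Hypothesis Test • That is, under the probability distribution implied by a true null hypothesis (called the null distribution), if the value of $\bar{X}$ exceeds some constant c and the conditional probability of this extreme event, $P(\bar{X} > c \mid H_0 \text{ true})$, is very small, we decide to reject the null hypothesis • This small probability α is usually called the significance level. Hypothesis testing is also called a test of significance, meaning that we use a random sample to decide whether there is sufficient evidence to reject the null hypothesis.

Example: Carrying Out a Hypothesis Test • Hence statistical "significance" refers not to the size of a value but to the size of a probability. An extreme event is called significant only when its probability of occurring is small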

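Example: Carrying Out a Hypothesis Test • In the example above, we want to find a critical value c such that $P(\bar{X} > c \mid \mu = 0.05) = \alpha$; that is, we define a small-probability event $\{\bar{X} > c\}$ • Once c is found and the realized sample mean $\bar{x}$ is known, the decision rule is {reject H0 when $\bar{x} > c$}, which is also called the rejection region (RR)

Example: Carrying Out a Hypothesis Test • Note that the decision rule is fixed before the sample is realized: c is a critical value found before the sample is realized, so this is again an ex ante concept • The decision to reject or not is made by comparing the realized $\bar{x}$ with c

Example: Carrying Out a Hypothesis Test • Moreover, $P(\bar{X} > c \mid \mu = 0.05) = \alpha$ has two key features • First, it is an ex ante probability, defined before the sample is realized • Second, it is a conditional probability, conditional on H0 : μ = 0.05 being true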
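To see how the ex ante probability P(X̄ > c | μ = 0.05) depends on the choice of c, the sketch below evaluates it for a few candidate cutoffs. It is an illustration under stated assumptions: it treats the 287 outcomes as i.i.d. Bernoulli(0.05) and uses the exact binomial distribution of the count (a swapped-in technique, not a method shown on the slides); the function name `prob_mean_exceeds` and the candidate values of c are hypothetical.

```python
from math import comb, floor


def prob_mean_exceeds(c: float, mu0: float, n: int) -> float:
    """P(X-bar > c | mu = mu0) for n i.i.d. Bernoulli(mu0) observations.

    X-bar > c is equivalent to the count of ones being at least
    floor(n * c) + 1, so the probability is a binomial upper tail.
    """
    k_min = max(0, floor(n * c) + 1)
    return sum(
        comb(n, k) * mu0**k * (1 - mu0) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Hypothetical candidate cutoffs: a larger c makes the event {X-bar > c} rarer under H0.
for c in (0.05, 0.06, 0.07, 0.08, 0.09):
    print(f"c={c:.2f}  P(X-bar > c | mu=0.05) = {prob_mean_exceeds(c, 0.05, 287):.4f}")
```

The probability is computed entirely from H0 and n, with no reference to the observed data, which is what makes it an ex ante, conditional quantity.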

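Example: Carrying Out a Hypothesis Test • Returning to the drug example, we look for c such that $P(\bar{X} > c \mid \mu = 0.05) = \alpha$

Example: Carrying Out a Hypothesis Test • By the CLT, under H0 : μ = μ0 with μ0 = 0.05 and n = 287, we know that $\frac{\bar{X} - \mu_0}{\sqrt{\mu_0(1-\mu_0)/n}}$ is approximately N(0, 1)

Example: Carrying Out a Hypothesis Test • Hence $P(\bar{X} > c \mid \mu = \mu_0) = P\left(\frac{\bar{X} - \mu_0}{\sqrt{\mu_0(1-\mu_0)/n}} > \frac{c - \mu_0}{\sqrt{\mu_0(1-\mu_0)/n}}\right) \approx \alpha$ requires $\frac{c - \mu_0}{\sqrt{\mu_0(1-\mu_0)/n}} = z_\alpha$

Example: Carrying Out a Hypothesis Test • The value of c is therefore $c = \mu_0 + z_\alpha \sqrt{\mu_0(1-\mu_0)/n}$ • If we choose α = 0.01, then $z_\alpha$ = 2.33 and $\sqrt{0.05 \times 0.95 / 287} \approx 0.0129$, so $c \approx 0.05 + 2.33 \times 0.0129 \approx 0.08$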

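Example: Carrying Out a Hypothesis Test • That is, the rejection region is RR = {reject H0 when $\bar{X} \geq 0.08$} • In this example, $\bar{X} = \bar{x} = 0.0871 > 0.08 = c$, so we reject H0 : μ = 0.05 and accept H1 : μ > 0.05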
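The arithmetic behind this decision can be reproduced in a few lines. The sketch assumes the CLT-based critical value c = μ0 + z_α·sqrt(μ0(1 − μ0)/n) for a right-tailed test of a proportion; the function name `critical_value` is a hypothetical label, and the code uses the unrounded normal quantile where the slide uses 2.33.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(mu0: float, n: int, alpha: float) -> float:
    """Approximate (CLT-based) cutoff c with P(X-bar > c | mu = mu0) ~ alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 2.33 when alpha = 0.01
    return mu0 + z_alpha * sqrt(mu0 * (1 - mu0) / n)


n, side_effects = 287, 25          # sample size and count from the example
x_bar = side_effects / n           # realized sample mean, about 0.0871
c = critical_value(mu0=0.05, n=n, alpha=0.01)
reject_h0 = x_bar > c

print(f"x_bar = {x_bar:.4f}, c = {c:.4f}, reject H0: {reject_h0}")
```

Note that c depends only on μ0, n, and α, so it can be computed before any data are collected; only the final comparison uses the realized sample mean.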

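Common Ways to Carry Out a Hypothesis Test • There are in fact two ways to carry out a hypothesis test. The first, shown above, is to find the critical value c and reject the null hypothesis when $\bar{x} > c$ • The second parallels the construction of interval estimators in the previous chapter: we construct the same statistic as in that chapter • We find the sampling distribution of this statistic and use it to find a critical value such that, under H0, the probability of the statistic falling beyond that value equals α • Finally, we compute the realized value of the statistic; the decision rule is to reject H0 when it falls beyond the critical value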
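A minimal sketch of the second approach, applied to the drug example. It assumes the statistic is the standardized sample mean Z = (x̄ − μ0)/sqrt(μ0(1 − μ0)/n), approximated by N(0, 1) through the CLT; the slides' own symbol for the statistic did not survive extraction, so the name `z_statistic` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x_bar: float, mu0: float, n: int) -> float:
    """Standardized sample mean for Bernoulli data under H0: mu = mu0."""
    return (x_bar - mu0) / sqrt(mu0 * (1 - mu0) / n)


alpha = 0.01
z_obs = z_statistic(x_bar=25 / 287, mu0=0.05, n=287)  # realized statistic
z_crit = NormalDist().inv_cdf(1 - alpha)              # right-tail critical value
reject_h0 = z_obs > z_crit

print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

Because standardizing is a monotone transformation of the sample mean, comparing the standardized statistic with its critical value leads to the same decision as comparing x̄ with c.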

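Common Ways to Carry Out a Hypothesis Test • If the sampling distribution of the test statistic is known, the test is called an exact test • If the sampling distribution of the test statistic is unknown but the conditions of the CLT hold, then for a sufficiently large sample it can be approximated by N(0, 1); we call this an approximate test, or a large-sample test • The hypothesis testing procedure is listed below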

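Hypothesis Testing Procedure: θ denotes the population parameter to be tested [Step 1] Set up the null hypothesis (H0) and the alternative hypothesis (H1) • H0 : θ = θ0 • H1 : three possibilities • θ > θ0 : right-tailed test (RTT) • θ < θ0 : left-tailed test (LTT) • θ ≠ θ0 : two-tailed test (TTT)

Hypothesis Testing Procedure [Step 2] Construct the test statistic and find its sampling distribution (for example the standard normal, t, χ², or F distribution). If the exact sampling distribution is unknown but the CLT applies, the sampling distribution of the statistic can be approximated by the standard normal distribution [Step 3] Choose the significance level α

Hypothesis Testing Procedure [Step 4] Using the sampling distribution (or approximate distribution) of the test statistic, find the critical value(s) and construct the rejection region (RR). For example, if the test statistic is distributed N(0, 1) under H0, the critical value is $z_\alpha$ (right-tailed test), $-z_\alpha$ (left-tailed test), or $z_{\alpha/2}$ (two-tailed test), and the rejection regions are • Reject H0 when the statistic is greater than $z_\alpha$ (right-tailed) • Reject H0 when the statistic is less than $-z_\alpha$ (left-tailed) • Reject H0 when the absolute value of the statistic is greater than $z_{\alpha/2}$ (two-tailed) [Step 5] Check whether the realized test statistic falls in the rejection region and make the decision.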
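The five steps can be collected into one function for the case where the test statistic is (approximately) N(0, 1) under H0. This is a sketch under that assumption only; `z_test`, `ZTestResult`, and the parameter names are hypothetical, and a t, χ², or F statistic would need a different distribution in Step 4.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class ZTestResult:
    statistic: float       # realized test statistic
    critical_value: float  # positive critical value z_alpha or z_{alpha/2}
    reject_h0: bool        # True if the statistic falls in the rejection region


def z_test(estimate: float, theta0: float, std_error: float,
           alpha: float, tail: str) -> ZTestResult:
    """Test H0: theta = theta0 with a statistic assumed N(0, 1) under H0.

    tail: "right" (H1: theta > theta0), "left" (H1: theta < theta0),
          or "two" (H1: theta != theta0).
    """
    # Step 2: standardized statistic; std_error is evaluated under H0
    stat = (estimate - theta0) / std_error
    std_normal = NormalDist()
    # Steps 3-5: alpha is given; find the critical value and apply the rule
    if tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = stat > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = stat < -crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(stat) > crit
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return ZTestResult(stat, crit, reject)


# Usage with the drug example: H0: mu = 0.05 against H1: mu > 0.05, alpha = 0.01
result = z_test(estimate=25 / 287, theta0=0.05,
                std_error=sqrt(0.05 * 0.95 / 287), alpha=0.01, tail="right")
print(result)
```

Keeping the tail as an explicit argument mirrors Step 1: the alternative hypothesis, not the data, determines which rejection region applies.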
