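Population proportion and sample proportion
• Many everyday surveys simply ask whether respondents approve of or support something, and then compute the proportion that the "approve" and "oppose" counts make up.
• This chapter introduces statistical methods for inference about a single proportion. The next chapter covers inference about the distribution of a set of proportions.
Social Statistics (Part I)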
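Population proportion and sample proportion
• Suppose we want to estimate A-bian's (Chen Shui-bian's) vote share in the presidential election, that is, the proportion of all voters who vote for him. We can draw a sample of size n with an appropriate sampling method and observe the proportion of the sample that supports A-bian. This is the support rate within the sample, called the sample proportion.
• If we know the sampling distribution of the sample proportion (its expected value, variance, and shape), we can use the sample proportion to make inferences about the population proportion.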
Sampling Distribution of the Sample Proportion
• Let p denote the proportion of items in a population that possess a certain characteristic (for example, unemployed, or income below the poverty level).
• To estimate p, we take a random sample of n observations from the population and count the number X of items in the sample that possess the characteristic.
• The sample proportion $\hat{p} = X/n$ is used to estimate the population proportion p.
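Sampling Distribution of the Sample Proportion: Definition
• Suppose a random experiment has only two possible outcomes (X = 1: supports A-bian; X = 0: does not support A-bian). If the population has N members (all voters) and K of them will vote for A-bian, then the population proportion supporting A-bian is
• $p = K/N$ (N = population size, K = total number of A-bian supporters)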
Sampling Distribution of the Sample Proportion: Definition
• Valid votes cast in the last presidential election: N = 12,664,393
• Votes for A-bian: K = 4,977,697
• Population proportion: p = K/N = 39.30%
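Sampling Distribution of the Sample Proportion: Definition
• Draw n elements at random from the population of N as a sample $(X_1, X_2, \dots, X_n)$. If k of the n sampled people support A-bian, the proportion supporting him is called the sample proportion:
• $\hat{p} = \frac{k}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (n = sample size, k = number of supporters in the sample)
• k is the total count of sampled people who support A-bian (X = 1).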
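Sampling Distribution of the Sample Proportion: Definition
• Before the election, a polling center surveyed 1,500 people (n = 1500), of whom 573 supported A-bian (k = 573), so the sample proportion is $\hat{p} = 573/1500 = 38.2\%$.
• The sampling error is $\hat{p} - p = 38.2\% - 39.3\% = -1.1$ percentage points. Because each sample contains different people, the computed sample proportion differs from sample to sample, so the sample proportion is itself a random variable.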
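A quick numerical check of the figures on these slides, as a minimal Python sketch. The variable names are illustrative, and the simulated polls at the end are an added illustration (not data from the slides) of why the sample proportion behaves as a random variable.

```python
import random

# Figures quoted in the slides.
N, K = 12_664_393, 4_977_697   # valid votes cast, votes for A-bian
n, k = 1500, 573               # poll size, supporters in the poll

p = K / N                      # population proportion
p_hat = k / n                  # sample proportion
sampling_error = p_hat - p

print(f"p = {p:.4f}, p_hat = {p_hat:.4f}, sampling error = {sampling_error:+.4f}")

# Illustration only: five simulated polls of size n from a population with
# proportion p. Each poll gives a slightly different sample proportion.
rng = random.Random(0)
simulated = [sum(rng.random() < p for _ in range(n)) / n for _ in range(5)]
print(simulated)
```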
The Bernoulli Distribution: Definition
• P(X = 1) = p
• P(X = 0) = 1 - p
• If we let q = 1 - p, then the p.f. of X can be written as follows: $f(x) = p^{x} q^{1-x}$ for $x = 0, 1$.
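The Bernoulli Distribution: Definition
• $E(X) = 1 \cdot p + 0 \cdot q = p$ (the expected value of X equals the population proportion)
• $E(X^2) = \sum_x x^2 f(x) = 1^2 \cdot p + 0^2 \cdot q = p$
• $Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p) = pq$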
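The moment calculations above can be checked numerically. This is a small sketch that sums over the two outcomes of the p.f.; the function names and the value p = 0.393 are chosen here for illustration.

```python
def bernoulli_pf(x, p):
    """P(X = x) for x in {0, 1}: p**x * q**(1 - x), with q = 1 - p."""
    q = 1 - p
    return p**x * q**(1 - x)

def bernoulli_moments(p):
    """Return (E(X), Var(X)) computed directly from the p.f."""
    mean = sum(x * bernoulli_pf(x, p) for x in (0, 1))
    second_moment = sum(x**2 * bernoulli_pf(x, p) for x in (0, 1))
    return mean, second_moment - mean**2

mean, var = bernoulli_moments(0.393)
print(mean, var)   # mean should equal p, variance should equal p * (1 - p)
```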
Sampling Distribution of the Sample Proportion
• The Normal Approximation Rule for Proportions: Let p denote the proportion of a population possessing some characteristic of interest. Take a random sample of n observations from the population. Let X denote the number of items in the sample possessing the characteristic. We estimate the population proportion p by the sample proportion $\hat{p} = X/n$. If $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with mean $E(\hat{p}) = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{pq/n}$.
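Sampling Distribution of the Sample Proportion
• Proof (mean): $E(\hat{p}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot np = p$
Sampling Distribution of the Sample Proportion
• Proof (variance), assuming $X_1, X_2, \dots, X_n$ are independent: $Var(\hat{p}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{npq}{n^2} = \frac{pq}{n}$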
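A simulation can make the mean and variance of the sample proportion concrete. The sketch below draws many samples of independent Bernoulli trials and compares the empirical mean and variance of the sample proportion with p and pq/n. The values p = 0.4, n = 100, the number of repetitions, and the seed are hypothetical choices made here, not figures from the slides.

```python
import random
from statistics import fmean, pvariance

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of n independent Bernoulli(p) trials; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.4, 100            # hypothetical values chosen for illustration
p_hats = simulate_p_hats(p, n, reps=5000)

print("mean of p_hat:    ", fmean(p_hats), "theory:", p)
print("variance of p_hat:", pvariance(p_hats), "theory:", p * (1 - p) / n)
```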
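Sampling Distribution of the Sample Proportion
• If the distribution of $\hat{p}$ is approximately normal, then the random variable $Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$ has approximately a standard normal distribution.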
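Example
• Suppose 55% of voters will support A-bian in this election. If we take a random sample of n = 400 people to predict A-bian's vote share, what is the probability that we predict he will lose (that is, $\hat{p} < 0.5$)?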
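One way to work this example is to apply the normal approximation rule directly: with p = 0.55 and n = 400, the sample proportion is approximately normal with mean 0.55 and standard deviation sqrt(pq/n). The sketch below assumes that "lose" means a sample proportion below 0.5, and it uses Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.55, 400
se = sqrt(p * (1 - p) / n)              # standard deviation of p_hat

# P(p_hat < 0.5) under the normal approximation.
prob_lose = NormalDist(mu=p, sigma=se).cdf(0.5)

z = (0.5 - p) / se                      # the same event on the Z scale
print(f"se = {se:.4f}, z = {z:.2f}, P(p_hat < 0.5) = {prob_lose:.4f}")
```

The probability comes out at roughly 0.022, so a poll of 400 would rarely call the race the wrong way under these assumptions.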
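Example
• Of your first 15 grandchildren, what is the chance there will be more than 10 boys? (Assume equal probability of male and female.)
• "More than 10 boys" is equivalent to "the proportion of boys is more than 10/15".
• Use the Normal Approximation Rule with p = 0.5 and n = 15: $P(\hat{p} > 10/15) = P\left(Z > \frac{0.667 - 0.5}{\sqrt{(0.5)(0.5)/15}}\right) = P(Z > 1.29) \approx 0.0985$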
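With n as small as 15 it is worth comparing the normal approximation against the exact binomial probability. The sketch below computes both. The continuity-corrected variant is an addition made here for comparison and does not appear on the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 15, 0.5

# Exact binomial probability of 11 or more boys.
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(11, n + 1))

# Normal approximation for the proportion: P(p_hat > 10/15).
se = sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(p, se).cdf(10 / 15)

# Same approximation with a continuity correction (an addition, not on the slide):
# treat "more than 10" as "more than 10.5" on the count scale.
approx_cc = 1 - NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(10.5)

print(f"exact = {exact:.4f}, normal = {approx:.4f}, with correction = {approx_cc:.4f}")
```

The uncorrected approximation overstates the exact probability noticeably here, and the corrected version lands much closer, which is a reminder that the rule is only approximate for small samples.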
Confidence intervals for proportions (large samples)
• We know that $\hat{p} \sim N(p, pq/n)$ approximately, where q = 1 - p, provided $np \ge 5$ and $nq \ge 5$.
Value of $z_{\alpha}$
• $P(Z \ge z_{\alpha/2}) = \alpha/2$
• $P(Z \le -z_{\alpha/2}) = \alpha/2$
• $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha$
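The critical values can be read from a table or computed from the inverse of the standard normal CDF. A minimal sketch follows; the helper name `z_upper` is made up for this example.

```python
from statistics import NormalDist

def z_upper(tail_area):
    """Return z such that P(Z >= z) = tail_area for a standard normal Z."""
    return NormalDist().inv_cdf(1 - tail_area)

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    print(f"{conf:.0%} confidence: z_(alpha/2) = {z_upper(alpha / 2):.3f}")
```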
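Confidence intervals for proportions (large samples)
• $P\left(\hat{p} - z_{\alpha/2}\sqrt{pq/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{pq/n}\right) = 1 - \alpha$
• The formula above needs the population proportion p in order to compute the standard error.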
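Confidence intervals for proportions (large samples)
• Because we have no information on p and q, when the sample is large enough we usually use the sample proportion $\hat{p}$ to estimate the standard error: $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$.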
Confidence interval for the population proportion p: Definition
• Let p denote the population proportion. Suppose we take a large random sample of n observations and obtain the sample proportion $\hat{p}$. A confidence interval for the population proportion having level of confidence $100(1-\alpha)\%$ is given by $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.
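Wilson estimate
• Estimating the standard error by substituting the sample proportion for the population proportion is not always accurate.
• For example, toss a coin three times and get heads all three times. Then
• $\hat{p} = 3/3 = 1$
• $\hat{p}(1-\hat{p})/n = 1(1-1)/3 = 0$, so the estimated standard error is 0 and the interval collapses to a single point.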
Wilson estimate
• To get a CI for p we must know the standard deviation of $\hat{p}$, which in practice is estimated from the sample.
• Unfortunately, modern computer studies reveal that confidence intervals based on this approach can be quite inaccurate, even for large samples.
-- When the sample is not an SRS (simple random sample).
-- When the sample size is small.
Wilson estimate
• The Wilson estimate: add 2 successes and 2 failures, so that the sample proportion is moved slightly away from 0 and 1: $\tilde{p} = \frac{X + 2}{n + 4}$.
-- Because this estimate was first suggested by Edwin Bidwell Wilson in 1927, we call it the Wilson estimate.
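Wilson estimate
• The sampling distribution of $\tilde{p}$ is approximately normal with mean p and standard deviation $\sqrt{p(1-p)/(n+4)}$.
• An approximate level C confidence interval for p is $\tilde{p} \pm z^{*}\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$, where $z^{*}$ is the standard normal value with area C between $-z^{*}$ and $z^{*}$.
• The margin of error is $m = z^{*}\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$.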
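To see what the adjustment does in the coin example (three heads in three tosses), here is a minimal sketch of the plus-four interval. The function name is hypothetical, and clipping the limits to the range 0 to 1 is a choice made in this sketch rather than something stated on the slides.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(successes, n, level=0.95):
    """Plus-four (Wilson estimate) interval: add 2 successes and 2 failures."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # Clipping to [0, 1] is a choice made here: a proportion cannot leave that range.
    return p_tilde, max(0.0, p_tilde - margin), min(1.0, p_tilde + margin)

p_tilde, low, high = plus_four_ci(3, 3)   # three heads in three tosses
print(f"p_tilde = {p_tilde:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Unlike the unadjusted estimate, which gives a standard error of zero for this sample, the adjusted estimate is 5/7 and the interval has a sensible, if wide, spread.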
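Confidence interval for the population proportion p: Example
• The government wants to estimate the proportion of households with monthly income below NT$25,000. 500 households are interviewed, and 200 of them have income below NT$25,000. Find the 95% confidence interval for p.
• $\hat{p} = 200/500 = 0.4$, so the interval is $0.4 \pm 1.96\sqrt{(0.4)(0.6)/500} = (0.3571, 0.4429)$.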
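Example
• 500 people are randomly sampled from Taipei City and asked whether they favor a referendum. 312 are in favor. Find the 95% confidence interval for the proportion of Taipei residents who favor a referendum.
• $\hat{p} = 312/500 = 0.624$, and the confidence interval for p is $0.624 \pm 1.96\sqrt{(0.624)(0.376)/500} = 0.624 \pm 0.0425 = (0.5815, 0.6665)$.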
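Both confidence-interval examples follow the same recipe, so a small helper makes the arithmetic easy to repeat. This is a sketch of the large-sample interval (sample proportion plus or minus the critical value times the estimated standard error). The function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Large-sample CI: p_hat +/- z_(alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

income_ci = proportion_ci(200, 500)       # household income example
referendum_ci = proportion_ci(312, 500)   # Taipei referendum example
print(income_ci, referendum_ci)
```

The results should land very close to the intervals in the two examples; tiny differences in the last digit can come from rounding the critical value to 1.96.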
One-sided confidence intervals for the population proportion
• Suppose that we take a random sample of n observations from some population having unknown proportion p. Suppose we wish to find the lower confidence limit LCL such that the probability is $(1-\alpha)$ that p exceeds LCL. The one-sided interval (LCL, 1.00) is a left-sided confidence interval. The LCL is given by: $LCL = \hat{p} - z_{\alpha}\sqrt{\hat{p}\hat{q}/n}$
One-sided confidence intervals for the population proportion
• Construct a right-sided 95% CI for the proportion of defective items produced by a machine if 16 items are found to be defective in a random sample of 100 items.
• With $\hat{p} = 16/100 = 0.16$ and the one-sided critical value $z_{0.05} = 1.645$, the upper limit is $0.16 + 1.645\sqrt{(0.16)(0.84)/100} = 0.2203$, so the 95% right-sided CI for p is (0, 0.2203).
• This means that we can be 95% confident that the population proportion is less than 0.2203.
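The defective-items example can be checked with a short calculation. The sketch below assumes the one-sided critical value (about 1.645 for 95%), puts all of alpha in one tail, and uses the estimated standard error. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def upper_confidence_limit(successes, n, level=0.95):
    """Upper limit of a right-sided interval (0, UCL): p_hat + z_alpha * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    z_alpha = NormalDist().inv_cdf(level)      # one-sided: all of alpha in one tail
    return p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)

ucl = upper_confidence_limit(16, 100)
print(f"95% right-sided interval: (0, {ucl:.4f})")
```

With these inputs the upper limit comes out near 0.220. Using the two-sided value 1.96 instead would give a wider interval than a one-sided 95% statement needs.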
Determining the sample size: Margin of Error
• Suppose that we take a random sample from some population. Then a $100(1-\alpha)\%$ confidence interval for the population proportion extends at most a distance m on each side of the sample proportion if the number of observations is at least $n = \frac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{m^{2}}$.
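Determining the sample size
• The problem is that we do not yet know $\hat{p}$ (the sample size has not even been decided), so the formula above cannot be used unless we have an estimate of p.
(1) We can use a pilot study to obtain an estimate of p.
(2) If the sample proportion is unknown, we can take the most conservative approach and use the largest possible variance, 0.5 × 0.5 = 0.25, to compute n.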
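Sample size and confidence interval for the proportion
• If the population proportion cannot be estimated in advance, the sample size is $n = \frac{z_{\alpha/2}^{2}\,(0.25)}{m^{2}}$.
• If the population proportion p can be estimated, the sample size is $n = \frac{z_{\alpha/2}^{2}\,p(1-p)}{m^{2}}$.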
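Sample size and confidence interval for the proportion
• A polling organization wants to know a presidential candidate's vote share. How large a sample is needed, at minimum, so that with 95% confidence the margin of error does not exceed 0.03?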
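Sample size and confidence interval for the proportion
• A polling organization wants to know a presidential candidate's vote share. Suppose the organization requires that the error between the sample proportion and the population proportion not exceed 0.01, with 95% confidence. What sample size is needed?
• Since p is unknown, use p = 0.5: $n = \frac{(1.96)^{2}(0.5)(0.5)}{(0.01)^{2}} = 9604$.
• So at least 9,604 sample points should be selected.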
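The two polling questions can be answered with one small function. The sketch below implements the sample-size formula with the conservative default p = 0.5 and rounds up, since a fractional observation is not possible. The function name and its default arguments are assumptions of this example.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, level=0.95, p=0.5):
    """Smallest n with z_(alpha/2) * sqrt(p * (1 - p) / n) <= margin.

    p = 0.5 is the conservative choice when no estimate of p is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

n_for_3_points = required_sample_size(0.03)   # margin of error 0.03
n_for_1_point = required_sample_size(0.01)    # margin of error 0.01
print(n_for_3_points, n_for_1_point)
```

Cutting the margin of error to a third multiplies the required sample size by about nine, because n grows with the inverse square of the margin.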
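Tests of the population proportion
• Sampling distribution of the sample proportion $f(\hat{p})$: if the population proportion is p, and $np \ge 5$ and $nq \ge 5$, then the sample proportion $\hat{p}$ is approximately normal, $\hat{p} \sim N(p, pq/n)$.
• The Normal Approximation Rule for Proportions: if $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with mean $E(\hat{p}) = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{pq/n}$.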
Sampling Distribution of the Sample Proportion
• If the distribution of $\hat{p}$ is approximately normal, then the random variable $Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$ has approximately a standard normal distribution.
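Tests of the population proportion
• Assume $np \ge 5$ and $nq \ge 5$, and test the following hypotheses:
• $H_0: p = p_0$ (or $H_0: p \ge p_0$) versus $H_1: p < p_0$
• If $H_0$ is true, the sample proportion is approximately $N(p_0, p_0 q_0/n)$, where $p_0$ is the population proportion when the hypothesis is true and $q_0 = 1 - p_0$.
• Reject $H_0$ if $Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} < -z_{\alpha}$, or equivalently if $\hat{p} < \hat{p}^{*} = p_0 - z_{\alpha}\sqrt{p_0 q_0/n}$ (critical value approach).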
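Example: Testing a population proportion
• A pan-Blue legislator claims that polls show 60% of the public supports Lien Chan's visit to China. A pan-Green group claims that support is below 60%. You check this with a sample of 100:
• $H_0: p = 0.6$ vs. $H_1: p < 0.6$
• Suppose 55 people in the sample support the visit. At the 5% significance level, can we reject the pan-Blue legislator's claim?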
Example: Testing a population proportion
• Solution: If $H_0$ is true, then $\hat{p}$ has approximately a normal distribution with mean p = 0.6 and variance pq/n = (0.6)(0.4)/100 = 0.0024.
• If we use a one-tailed test at the 5% level of significance, the critical region consists of all values of Z less than $-z_{\alpha} = -z_{0.05} = -1.645$.
• From the sample, $\hat{p} = x/n = 55/100 = 0.55$.
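Example: Testing a population proportion
• $Z = \frac{0.55 - 0.6}{\sqrt{0.0024}} = -1.02 > -1.645$, so we do not reject $H_0$.
• Equivalently, the critical value is $\hat{p}^{*} = 0.6 - 1.645\sqrt{0.0024} = 0.519$. The observed sample proportion is 0.55 > 0.519, so we cannot reject the null hypothesis.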
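The test in this example can be reproduced in a few lines. The sketch below computes the test statistic, the critical value on both the Z scale and the sample-proportion scale, and the decision. The lower-tail p-value is an extra that does not appear on the slides.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x, alpha = 0.6, 100, 55, 0.05
p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)           # standard error under H0

z = (p_hat - p0) / se0                  # test statistic
z_crit = NormalDist().inv_cdf(alpha)    # lower-tail critical value, about -1.645
p_star = p0 + z_crit * se0              # critical value on the p_hat scale
p_value = NormalDist().cdf(z)           # lower-tail p-value (not on the slides)

reject = z < z_crit
print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p* = {p_star:.3f}, "
      f"p-value = {p_value:.3f}, reject H0: {reject}")
```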
Sampling distribution of the difference between sample proportions
• Suppose we take independent samples of sizes $n_1$ and $n_2$ from two populations. Let $p_1$ and $p_2$ be the proportions of items in each population that possess a certain characteristic, and let $q_1 = 1 - p_1$, $q_2 = 1 - p_2$. If $n_1 p_1 > 5$, $n_1 q_1 > 5$, $n_2 p_2 > 5$, and $n_2 q_2 > 5$, then the random variable $(\hat{p}_1 - \hat{p}_2)$ is approximately normally distributed with mean $p_1 - p_2$ and standard deviation $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$.
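Example
• A marketing firm wants to know how popular a TV program is among high-income and low-income people. Suppose 40% of high-income people like the program, and 50% of low-income people like it. The firm draws a sample of 100 from the high-income population and a sample of 200 from the low-income population. What is the probability that the two sample proportions differ by less than 0.05?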
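A sketch of one way to work this example, using the sampling distribution of the difference between two sample proportions. It reads "differ by less than 0.05" as an absolute difference below 0.05, which is an interpretation made here.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.40, 100    # high-income population and sample size
p2, n2 = 0.50, 200    # low-income population and sample size

mean_diff = p1 - p2
sd_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# P(-0.05 < p1_hat - p2_hat < 0.05) under the normal approximation.
diff = NormalDist(mean_diff, sd_diff)
prob = diff.cdf(0.05) - diff.cdf(-0.05)

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.4f}, probability = {prob:.4f}")
```

Because the true difference is 0.10, the chance of seeing sample proportions within 0.05 of each other is fairly small, roughly one in five under these assumptions.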
Confidence intervals for the difference of two population proportions
• Let $\hat{p}_1$ denote the observed proportion of successes in a random sample of $n_1$ observations from a population with proportion $p_1$ of successes, and let $\hat{p}_2$ denote the observed proportion of successes in an independent random sample of $n_2$ observations from a population with proportion $p_2$ of successes. A $100(1-\alpha)\%$ confidence interval for $(p_1 - p_2)$ is given by the interval $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$.
• This result holds provided $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, and $n_2 q_2 \ge 5$.
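A minimal sketch of the two-sample interval follows. The counts (45 of 100 and 80 of 200) are hypothetical numbers invented for this illustration, not data from the slides, and the function name is also made up.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """CI for p1 - p2: (p1_hat - p2_hat) +/- z_(alpha/2) * estimated standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

# Hypothetical counts for illustration only.
low, high = diff_proportion_ci(45, 100, 80, 200)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

In this made-up case the interval contains 0, so the data would not rule out equal population proportions.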
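Tests concerning differences of proportions
• To test whether the difference between two population proportions equals a specified value (for example zero, meaning the proportions are equal), let the proportion in population 1 be $p_1$ and the proportion in population 2 be $p_2$:
• $H_0: p_1 - p_2 = D_0$
• Draw samples of sizes $n_1$ and $n_2$ from the two populations and compute the sample proportions $\hat{p}_1$ and $\hat{p}_2$.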
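Tests concerning differences of proportions
• If the null hypothesis $H_0: p_1 - p_2 = D_0$ is true, and $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, $n_2 q_2 \ge 5$, then $(\hat{p}_1 - \hat{p}_2)$ is approximately normal with mean $D_0$ and standard deviation $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$, so that $Z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}}$ is approximately standard normal.
• Usually we want to test the null hypothesis $H_0: p_1 - p_2 = 0$, that is, $H_0: p_1 = p_2$.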
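The test statistic can be sketched as follows. Two assumptions are made here: the counts are hypothetical (invented for illustration), and the unknown population proportions in the standard deviation are replaced by the sample proportions. When the hypothesized difference is zero, a pooled estimate of the common proportion is a common alternative for the standard error. This sketch keeps the unpooled form so that it works for any hypothesized difference.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, d0=0.0):
    """Z statistic for H0: p1 - p2 = d0, with sample proportions plugged into the SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / se

# Hypothetical counts for illustration only.
z = two_proportion_z(45, 100, 80, 200)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative H1: p1 != p2

print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```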