Population proportion and sample proportion

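Population proportion and sample proportion • Many everyday surveys simply ask whether respondents approve of or support something, and then compute the proportion that the count of "approve" or "oppose" answers makes up. • This chapter introduces statistical methods for inference about a single proportion. The next chapter covers inference about the distribution of a set of proportions. Social Statistics (I)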

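Population proportion and sample proportion • To estimate A-bian's vote share in the presidential election, i.e. the proportion of all voters who vote for A-bian, we can draw a sample of size $n$ with an appropriate sampling method and observe the proportion of the sample that supports A-bian. This gives A-bian's support rate in the sample, which is called the sample proportion. • If we know the sampling distribution of the sample proportion, i.e. its expected value, variance, and shape, we can use the sample proportion to make inferences about the population proportion.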

Sampling Distribution of the Sample Proportion • Let $p$ denote the proportion of items in a population that possess a certain characteristic (e.g. unemployed, income below the poverty level). • To estimate $p$, we take a random sample of $n$ observations from the population and count the number $X$ of items in the sample that possess the characteristic. • The sample proportion $\hat{p} = X/n$ is used to estimate the population proportion $p$.

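Sampling Distribution of the Sample Proportion: Definition • Suppose a random trial has only two possible outcomes ($X=1$: supports A-bian; $X=0$: does not support A-bian). If the population contains $N$ individuals (all voters), of whom $K$ will vote for A-bian, then the population proportion supporting A-bian is • $p = K/N$ ($N$ = population size, $K$ = total number of A-bian supporters)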

Sampling Distribution of the Sample Proportion: Definition • Valid votes cast in the last presidential election: 12,664,393 ($N$) • Votes for A-bian: 4,977,697 ($K$) • Population proportion: $p = K/N = 39.30\%$

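Sampling Distribution of the Sample Proportion: Definition • If $n$ elements are drawn at random from the population of $N$ as a sample, written $(X_1, X_2, \ldots, X_n)$, and $k$ of the $n$ sampled people support A-bian, the proportion of supporters is called the sample proportion: • $\hat{p} = k/n = \frac{1}{n}\sum_{i=1}^{n} X_i$ ($n$ = sample size, $k$ = number of supporters in the sample) • $k$ is the total count of sampled people who support A-bian ($X=1$).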

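Sampling Distribution of the Sample Proportion: Definition • Before the election, a polling center surveyed 1500 people ($n=1500$), of whom 573 supported A-bian ($k=573$), so the sample proportion is $\hat{p} = 573/1500 = 38.2\%$. • Sampling error: the people drawn differ from one sample to the next, so the computed sample proportion differs too. The sample proportion is therefore itself a random variable.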
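
As a quick arithmetic check of the two definitions, the following Python sketch recomputes the population proportion from the election figures and the sample proportion from the poll. The variable names are illustrative, and the last quantity (the gap between the two proportions) is an addition for illustration, not a figure from the slides.

```python
# Population proportion p = K / N, using the election figures quoted above.
N = 12_664_393  # valid votes cast (population size)
K = 4_977_697   # votes for A-bian
p = K / N

# Sample proportion p_hat = k / n, using the pre-election poll.
n = 1500        # sample size
k = 573         # respondents in the sample who support A-bian
p_hat = k / n

# Illustrative addition: how far this particular sample landed from p.
sampling_error = p_hat - p

print(f"p = {p:.4f}, p_hat = {p_hat:.4f}, p_hat - p = {sampling_error:+.4f}")
```

A different sample of 1500 people would give a different `p_hat`, which is why the sample proportion is treated as a random variable.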

The Bernoulli Distribution: Definition • $P(X=1) = p$ • $P(X=0) = 1-p$ • If we let $q = 1-p$, then the p.f. of $X$ can be written as follows: $f(x) = p^{x} q^{1-x}$ for $x = 0, 1$, and $f(x) = 0$ otherwise.

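The Bernoulli Distribution: Definition • $E(X) = 1 \cdot p + 0 \cdot q = p$ (the expected value of $X$ equals the population proportion) • $E(X^2) = \sum x^2 f(x) = 1^2 \cdot p + 0^2 \cdot q = p$ • $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p) = pq$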

Sampling Distribution of the Sample Proportion • The Normal Approximation Rule for Proportions: Let $p$ denote the proportion of a population possessing some characteristic of interest. Take a random sample of $n$ observations from the population. Let $X$ denote the number of items in the sample possessing the characteristic. We estimate the population proportion $p$ by the sample proportion $\hat{p} = X/n$. If $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with mean $E(\hat{p}) = p$ and variance $\mathrm{Var}(\hat{p}) = pq/n$.

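Sampling Distribution of the Sample Proportion • Proof (mean): write $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where each $X_i$ is Bernoulli with $E(X_i) = p$. Then $E(\hat{p}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{np}{n} = p$.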

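Sampling Distribution of the Sample Proportion • Proof (variance): assume $X_1, X_2, \ldots, X_n$ are independent, each with $\mathrm{Var}(X_i) = pq$. Then $\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{npq}{n^2} = \frac{pq}{n}$.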
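
The mean and variance of the sample proportion can also be checked empirically. The sketch below is a small simulation, not part of the original slides: it draws many samples of independent Bernoulli trials and compares the simulated mean and variance of the sample proportion with $p$ and $pq/n$. The values of `p`, `n`, `reps` and the seed are hypothetical choices made only for illustration.

```python
import random
from statistics import fmean, pvariance

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of n Bernoulli(p) trials; return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 50  # hypothetical values chosen for illustration
props = simulate_sample_proportions(p, n, reps=20_000)

print("simulated mean    :", round(fmean(props), 4), " theory:", p)
print("simulated variance:", round(pvariance(props), 5), " theory:", p * (1 - p) / n)
```

With 20,000 replications the simulated values should sit close to the theoretical ones, though they will not match exactly.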

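Sampling Distribution of the Sample Proportion • If the distribution of $\hat{p}$ is approximately normal with mean $p$ and variance $pq/n$, then the random variable $Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$ is approximately standard normal, $N(0, 1)$.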

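Example • Suppose 55% of voters will support A-bian in this election, and we draw a random sample of $n = 400$ people to predict whether A-bian wins. What is the probability that we predict A-bian will lose (i.e. that $\hat{p} < .5$)?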
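
One way to work this example is with the normal approximation rule. The sketch below assumes that "predicting a loss" means the sample proportion falls below .5; that reading, and the variable names, are assumptions rather than something the slide states.

```python
from math import sqrt
from statistics import NormalDist

p = 0.55   # assumed true support for A-bian
n = 400    # sample size

# Standard deviation of the sample proportion under the normal approximation.
se = sqrt(p * (1 - p) / n)

# Standardize the cutoff p_hat = 0.5 and take the lower-tail probability.
z = (0.5 - p) / se
prob_predict_loss = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(p_hat < 0.5) = {prob_predict_loss:.4f}")
```

The rule applies here because both $np = 220$ and $nq = 180$ are well above 5.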

Example • Of your first 15 grandchildren, what is the chance there will be more than 10 boys? (Assume equal probability of male/female.) • "More than 10 boys" is equivalent to "the proportion of boys is more than 10/15". • Use the Normal Approximation Rule:
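
The slide's own calculation did not survive, so the following is only a sketch of how the question could be worked. It computes the plain normal approximation for $P(\hat{p} > 10/15)$, a continuity-corrected version (cutoff 10.5/15), and the exact binomial probability for comparison. The continuity correction and the exact calculation are additions, not taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 15, 0.5
se = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
std_normal = NormalDist()

# Plain normal approximation: P(p_hat > 10/15).
plain = 1 - std_normal.cdf((10 / n - p) / se)

# Continuity-corrected approximation: P(X >= 10.5) on the count scale.
corrected = 1 - std_normal.cdf((10.5 / n - p) / se)

# Exact binomial probability of 11 or more boys out of 15.
exact = sum(comb(n, k) for k in range(11, n + 1)) / 2**n

print(f"plain: {plain:.4f}  corrected: {corrected:.4f}  exact: {exact:.4f}")
```

With a sample this small ($np = nq = 7.5$), the corrected approximation tracks the exact binomial value more closely than the plain one.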

Confidence intervals for proportions (large samples) • We know that $\hat{p} \sim N(p, pq/n)$ approximately, where $q = 1-p$, provided $np \ge 5$ and $nq \ge 5$.

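Value of $z_{\alpha}$ • $P(Z \ge z_{\alpha/2}) = \alpha/2$ • $P(Z \le -z_{\alpha/2}) = \alpha/2$ • $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1-\alpha$ (Figure: standard normal curve with central area $1-\alpha$ and an area of $\alpha/2$ in each tail.)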
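
The critical values $z_{\alpha/2}$ are usually read from a table, but they can also be computed. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_upper` is hypothetical.

```python
from statistics import NormalDist

def z_upper(tail_area):
    """Return z such that P(Z >= z) = tail_area for a standard normal Z."""
    return NormalDist().inv_cdf(1 - tail_area)

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence: z_(alpha/2) = {z_upper(alpha / 2):.3f}")
```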

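Confidence intervals for proportions (large samples) • The interval $\hat{p} \pm z_{\alpha/2}\sqrt{pq/n}$ requires the population proportion $p$ in order to compute the standard error.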

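Confidence intervals for proportions (large samples) • Because we have no information about $p$ and $q$, when the sample is large enough we usually use the sample proportion $\hat{p}$ to estimate the standard error: $\sqrt{\hat{p}(1-\hat{p})/n}$.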

Confidence interval for the population proportion $p$: Definition • Let $p$ denote the population proportion. Suppose we take a large random sample of $n$ observations and obtain the sample proportion $\hat{p}$. A confidence interval for the population proportion with confidence level $100(1-\alpha)\%$ is given by $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.


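Wilson estimate • Replacing the population proportion with the sample proportion to estimate the standard error is not always appropriate. • Example: toss a coin three times and get heads all three times. Then • $\hat{p} = 3/3 = 1$ • $\hat{p}(1-\hat{p})/n = 1(1-1)/3 = 0$, so the estimated standard error is zero and the interval collapses to a single point.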

Wilson estimate • We must know the s.d. of the population to get a CI for $p$; in practice it is estimated from $\hat{p}$. • Unfortunately, modern computer studies reveal that confidence intervals based on this approach can be quite inaccurate, even for large samples. -- When the sample is not an SRS. -- When the sample size is small.

Wilson estimate • The Wilson estimate: add 2 successes and 2 failures (so that the sample proportion is moved slightly away from 0 and 1). -- Because this estimate was first suggested by Edwin Bidwell Wilson in 1927, we call it the Wilson estimate.

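Wilson estimate • With $X$ successes in $n$ trials, the Wilson estimate is $\tilde{p} = (X+2)/(n+4)$. The sampling distribution of $\tilde{p}$ is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/(n+4)}$. • An approximate level $C$ confidence interval for $p$ is $\tilde{p} \pm z^{*}\,SE_{\tilde{p}}$, where $SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$ and $z^{*}$ is the standard normal critical value for level $C$. • The margin of error is $m = z^{*}\,SE_{\tilde{p}}$.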
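
To see how adding 2 successes and 2 failures changes the coin example (three heads in three tosses), here is a minimal sketch. The function name `plus_four_ci` is hypothetical, and clipping the limits to the range [0, 1] is a choice made here, not something stated in the slides.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(successes, n, confidence=0.95):
    """Interval built from the estimate (successes + 2) / (n + 4)."""
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    margin = z_star * se
    # Clip to [0, 1]; this clipping is an assumption of this sketch.
    return p_tilde, max(0.0, p_tilde - margin), min(1.0, p_tilde + margin)

# Coin example: 3 heads in 3 tosses.
naive_se = sqrt((3 / 3) * (1 - 3 / 3) / 3)   # collapses to 0
p_tilde, low, high = plus_four_ci(3, 3)

print(f"naive SE = {naive_se}")
print(f"p_tilde = {p_tilde:.4f}, interval = ({low:.4f}, {high:.4f})")
```

The interval is still very wide with only three tosses, but it no longer collapses to the single point 1.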

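Confidence interval for the population proportion $p$: Example • The government wants to estimate the proportion of households with monthly income below NT$25,000. 500 households are interviewed, and 200 of them have a monthly income below NT$25,000. Find the 95% confidence interval for $p$. • $\hat{p} = 200/500 = .4$, so the interval is $.4 \pm 1.96\sqrt{(.4)(.6)/500}$, i.e. (.3571, .4429).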

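Example • 500 people are drawn at random from Taipei City and asked whether they favor the referendum; 312 say they do. Find the 95% confidence interval for the proportion of Taipei residents who favor the referendum. • $\hat{p} = 312/500 = .624$, and the confidence interval for $p$ is $.624 \pm 1.96\sqrt{(.624)(.376)/500}$, i.e. approximately (.5815, .6665).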
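
The two interval examples (household income and the Taipei referendum) can be reproduced with a few lines of Python. This is a sketch of the large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

income_ci = proportion_ci(200, 500)      # households below NT$25,000
referendum_ci = proportion_ci(312, 500)  # Taipei residents in favor

print("income example    :", tuple(round(x, 4) for x in income_ci))
print("referendum example:", tuple(round(x, 4) for x in referendum_ci))
```

Passing a different `confidence` value, such as 0.99, widens the interval accordingly.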

One-sided confidence intervals for the population proportion • Suppose that we take a random sample of $n$ observations from some population with unknown proportion $p$, and we wish to find the lower confidence limit LCL such that the probability is $(1-\alpha)$ that $p$ exceeds LCL. The one-sided interval (LCL, 1.00) is a left-sided confidence interval. The LCL is given by: $\mathrm{LCL} = \hat{p} - z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$.

One-sided confidence intervals for the population proportion • Construct a right-sided 95% CI for the proportion of defective items produced by a machine if 16 items are found to be defective in a random sample of 100 items. • The upper confidence limit is $.16 + 1.645\sqrt{(.16)(.84)/100} \approx .2203$, so the 95% right-sided CI for $p$ is (0, .2203). This means that we can be 95% confident that the population proportion is less than .2203.
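
The one-sided limits use $z_{\alpha}$ rather than $z_{\alpha/2}$, because all of $\alpha$ sits in one tail. The sketch below computes both limits for the defective-items example; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_limit(successes, n, confidence=0.95):
    """LCL = p_hat - z_alpha * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_alpha = NormalDist().inv_cdf(confidence)
    return p_hat - z_alpha * sqrt(p_hat * (1 - p_hat) / n)

def upper_confidence_limit(successes, n, confidence=0.95):
    """UCL = p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_alpha = NormalDist().inv_cdf(confidence)
    return p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)

# 16 defective items in a random sample of 100.
ucl = upper_confidence_limit(16, 100)
lcl = lower_confidence_limit(16, 100)

print(f"right-sided 95% CI: (0, {ucl:.4f})")
print(f"left-sided 95% CI : ({lcl:.4f}, 1)")
```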

Determining the sample size • Margin of error: Suppose that we take a random sample from some population. Then a $100(1-\alpha)\%$ confidence interval for the population proportion extends at most a distance $m$ on each side of the sample proportion if the number of observations is $n = \dfrac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{m^{2}}$.

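Determining the sample size • The problem is that we do not yet know $\hat{p}$ (the sample size has not even been decided), so the formula $n = z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})/m^{2}$ cannot be used unless we have an estimate of $p$. (1) We can run a pilot study to obtain an estimate of $p$. (2) When the sample proportion is unknown, we can take the most conservative approach and use the largest possible variance, $.5 \times .5 = .25$, to compute $n$.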

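Sample size and confidence interval for the proportion • If the population proportion cannot be estimated, the sample size is $n = \dfrac{z_{\alpha/2}^{2}(.25)}{m^{2}}$. • If the population proportion $p$ can be estimated, the sample size is $n = \dfrac{z_{\alpha/2}^{2}\,pq}{m^{2}}$.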

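Sample size and confidence interval for the proportion • A polling organization wants to know the proportion of votes a presidential candidate will receive. How large a sample is needed, at minimum, for the margin of error to be no more than .03 at the 95% confidence level?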

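Sample size and confidence interval for the proportion • A polling organization wants to know the proportion of votes a presidential candidate will receive. Suppose the organization requires the error between the sample proportion and the population proportion to be no more than 0.01 at the 95% confidence level. What should the sample size be? • Substitute $m = .01$ and $z_{.025} = 1.96$. Since $p$ is unknown, use $p = .5$: $n = (1.96)^{2}(.5)(.5)/(.01)^{2} = 9{,}604$. At least 9,604 sample points should therefore be selected.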
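
Both polling questions (margins of .03 and .01) reduce to the same calculation. The sketch below rounds the result up to the next whole observation and defaults to the conservative value $p = .5$; the function name `required_sample_size` and the parameter `p_guess` are hypothetical, and the pilot estimate of .3 in the last line is made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z^2 * p * (1 - p) / n <= margin^2, using p = p_guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

print("margin .03, conservative p = .5:", required_sample_size(0.03))
print("margin .01, conservative p = .5:", required_sample_size(0.01))

# Hypothetical pilot-study estimate p = .3 (illustration only).
print("margin .03, pilot estimate p = .3:", required_sample_size(0.03, p_guess=0.3))
```

Using $p = .5$ always gives the largest sample size, so a pilot estimate away from .5 can only reduce the requirement.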

Tests of the population proportion • Sampling distribution of the sample proportion $f(\hat{p})$, by the Normal Approximation Rule for Proportions: if the population proportion is $p$, and $np \ge 5$ and $nq \ge 5$, the random variable $\hat{p}$ has approximately a normal distribution with mean $p$ and variance $pq/n$, i.e. $\hat{p} \sim N(p, pq/n)$.

Sampling Distribution of the Sample Proportion • If the distribution of $\hat{p}$ is approximately normal, then the random variable $Z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$ is approximately standard normal, $N(0, 1)$.

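Tests of the population proportion • Assume $np \ge 5$ and $nq \ge 5$, and test the hypotheses $H_0: p = p_0$ (or $H_0: p \ge p_0$) against $H_1: p < p_0$. • If $H_0$ is true, the sample proportion is approximately $N(p_0, p_0 q_0/n)$, where $p_0$ is the population proportion when the hypothesis is true and $q_0 = 1 - p_0$. • Reject $H_0$ if $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} < -z_{\alpha}$, or equivalently if $\hat{p} < \hat{p}^{*} = p_0 - z_{\alpha}\sqrt{p_0 q_0/n}$ (critical value approach).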


Page 614, Procedure 12.2B (cont.)

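Example: Testing a population proportion • A pan-Blue legislator claims that polls show 60% of the public support Lien Chan's visit to China; a pan-Green group claims that support is below 60%. You use a sample of 100 people to check: $H_0: p = .6$ vs. $H_1: p < .6$. • Suppose 55 people in the sample support Lien Chan's visit. At the 5% significance level, can we reject the pan-Blue legislator's claim?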

Example: Testing a population proportion • Solution: If $H_0$ is true, then $\hat{p}$ has approximately a normal distribution with mean $p = .6$ and variance $pq/n = (.6)(.4)/100 = .0024$. • If we use a one-tailed test at the 5% level of significance, the critical region consists of all values of $Z$ less than $-z_{\alpha} = -z_{.05} = -1.645$. • From the sample, $\hat{p} = x/n = 55/100 = .55$.

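Example: Testing a population proportion • The test statistic is $Z = (.55 - .6)/\sqrt{.0024} = -1.02$, which is greater than $-1.645$, so we do not reject $H_0$. • Equivalently, the observed sample proportion .55 exceeds the critical value .519, so the null hypothesis cannot be rejected. (Figure: standard normal curve marking the observed value $-1.02$.)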
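
The whole test for this example can be sketched in Python as follows. The p-value on the last lines is an extra quantity that does not appear in the slides, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.6       # proportion claimed under H0
n = 100        # sample size
x = 55         # supporters observed in the sample
alpha = 0.05   # significance level for the one-tailed test H1: p < p0

p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)            # standard error computed under H0
z = (p_hat - p0) / se0                   # test statistic

z_crit = NormalDist().inv_cdf(alpha)     # lower-tail critical value, about -1.645
p_hat_crit = p0 + z_crit * se0           # same cutoff on the p_hat scale
reject_h0 = z < z_crit

# Extra (not in the slides): lower-tail p-value of the observed statistic.
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, critical p_hat = {p_hat_crit:.3f}")
print(f"reject H0: {reject_h0}, p-value = {p_value:.4f}")
```

Comparing `z` with `z_crit` and comparing `p_hat` with `p_hat_crit` are the same decision rule expressed on two scales.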

Sampling distribution of the difference between sample proportions • Suppose we take independent samples of sizes $n_1$ and $n_2$ from two populations. Let $p_1$ and $p_2$ be the proportions of items in each population that possess a certain characteristic, and let $q_1 = 1-p_1$, $q_2 = 1-p_2$. If $n_1p_1 > 5$, $n_1q_1 > 5$, $n_2p_2 > 5$, and $n_2q_2 > 5$, then the random variable $(\hat{p}_1 - \hat{p}_2)$ is approximately normally distributed with mean $p_1 - p_2$ and variance $\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}$.

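Example • A marketing firm wants to know how popular a TV program is among high-income and low-income people. Suppose 40% of high-income people like the program, and 50% of low-income people like it. The firm draws a sample of 100 from the high-income population and a sample of 200 from the low-income population. What is the probability that the difference between the two sample proportions is less than .05?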
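
The slide's solution did not survive, so the following is only one possible reading of the question: it assumes "difference less than .05" means the absolute difference $|\hat{p}_1 - \hat{p}_2| < .05$ and applies the normal approximation for the difference of two independent sample proportions. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.40, 100   # high-income population proportion and sample size
p2, n2 = 0.50, 200   # low-income population proportion and sample size

# Mean and standard deviation of (p_hat1 - p_hat2) under the normal approximation.
mean_diff = p1 - p2
sd_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumed reading: P(|p_hat1 - p_hat2| < 0.05) = P(-0.05 < difference < 0.05).
diff_dist = NormalDist(mean_diff, sd_diff)
prob = diff_dist.cdf(0.05) - diff_dist.cdf(-0.05)

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.4f}, probability = {prob:.4f}")
```

If the question is instead read as a one-sided statement about $\hat{p}_1 - \hat{p}_2$, only the last probability line changes.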


Confidence intervals for the difference of two population proportions • Let $\hat{p}_1$ denote the observed proportion of successes in a random sample of $n_1$ observations from a population with proportion $p_1$ of successes, and let $\hat{p}_2$ denote the observed proportion of successes in an independent random sample of $n_2$ observations from a population with proportion $p_2$ of successes. A $100(1-\alpha)\%$ confidence interval for $(p_1 - p_2)$ is given by the interval $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$, where $\hat{q}_i = 1 - \hat{p}_i$. This result holds provided $n_1p_1 \ge 5$, $n_1q_1 \ge 5$, $n_2p_2 \ge 5$, and $n_2q_2 \ge 5$.
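
A minimal sketch of the interval for a difference of two proportions follows. The counts used (42 of 100 and 104 of 200) are hypothetical data invented for illustration, and the function name `diff_proportion_ci` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample interval for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

# Hypothetical data: 42 successes out of 100, and 104 successes out of 200.
low, high = diff_proportion_ci(42, 100, 104, 200)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

With these made-up counts the interval contains 0, so the data would not rule out equal population proportions at this confidence level.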

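Tests concerning differences of proportions • We want to test whether the difference between two population proportions equals a specified value (in particular, whether the two proportions are equal). Let the proportion in population 1 be $p_1$ and the proportion in population 2 be $p_2$: • $H_0: p_1 - p_2 = D_0$ • Draw samples of sizes $n_1$ and $n_2$ from the two populations and compute the sample proportions $\hat{p}_1$ and $\hat{p}_2$.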

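Tests concerning differences of proportions • If the null hypothesis $H_0: p_1 - p_2 = D_0$ is true, and $n_1p_1 \ge 5$, $n_1q_1 \ge 5$, $n_2p_2 \ge 5$, $n_2q_2 \ge 5$, then $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{p_1q_1/n_1 + p_2q_2/n_2}}$ is approximately $N(0, 1)$. • Usually we want to test the null hypothesis $H_0: p_1 - p_2 = 0$, i.e. $H_0: p_1 = p_2$.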
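
For the common case $H_0: p_1 = p_2$, a standard textbook approach estimates the shared proportion by pooling the two samples. The slides' own formula did not survive, so the pooled standard error below is an assumption about the intended method, and the counts are hypothetical data invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled proportion (assumed method)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # common p estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * NormalDist().cdf(-abs(z))              # two-sided p-value
    return z, p_value

# Hypothetical data: 42 successes out of 100, and 104 successes out of 200.
z, p_value = two_proportion_z_test(42, 100, 104, 200)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

For a one-sided alternative, the p-value would be a single tail of the standard normal distribution instead of twice the tail.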
