Chapter 16: Inference About a Proportion. In Chapter 16: 16.1 Proportions 16.2 The Sampling Distribution of a Proportion 16.3 Hypothesis Test, Normal Approximation 16.4 Hypothesis Test, Exact Binomial Method 16.5 Confidence Interval for a Population Proportion 16.6 Sample Size and Power
Structure of the Book • Chapters 1 – 10 focused on statistical concepts and practices • Chapters 11 – 15 focused on the analysis of quantitative response variables • Chapters 16 – 19 focus on the analysis of categorical response variables After completing selected chapters on statistical concepts, you may cover Chapters 11 – 19 in any order.
The Nature of the Response Variable Determines the Analysis This figure illustrates a basic difference between quantitative and categorical data analysis.
Binary Response Variable Examples of binary response variables • Classification of a respondent as a current smoker: “yes” or “no” • Gender: “male” or “female” • Whether a patient survives five or more years: “survived” or “did not survive” • Whether the subject developed a blood clot: “case” or “non-case”
Binary Response Variable, cont • One outcome is arbitrarily labeled a “success” and the other a “failure” • If the process of selection is random, the number of successes in a sample will follow a binomial distribution with parameters n and p • Notation: Let X ~ b(n,p) represent a binomial random variable with parameters n and p
§16.1 Proportions • Conditions: • Single SRS • Response variable is binary • Describe the proportion of successes in the sample, denoted “p hat”: $\hat{p} = x / n$, where x = no. of successes and n = sample size
Proportion, cont Two of ten individuals in the sample have a risk factor for disease X. Therefore, the prevalence of this risk factor in the sample is: $\hat{p} = 2 / 10 = 0.2$
A Proportion is an Average of 0s and 1s Here are the data in tabular form with the variable coded 1 = risk factor present and 0 = risk factor absent. Note that with n = 10 and x = 2, $\hat{p} = 2/10 = 0.2$ and the sample mean $\bar{x} = \sum x_i / n = 2/10 = 0.2$. The sample mean and sample proportion are equivalent when we think of binary responses in this way.
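The equivalence between a proportion and a mean of 0s and 1s can be checked with a short Python sketch. The list below is a hypothetical coding of the ten individuals: only the counts (n = 10, x = 2) come from the slide, and the positions of the 1s are arbitrary.

```python
from statistics import mean

# Hypothetical 0/1 coding: 1 = risk factor present, 0 = absent.
# Only the totals (ten individuals, two with the risk factor) are from the slide.
responses = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

n = len(responses)             # sample size
x = sum(responses)             # number of successes
p_hat = x / n                  # sample proportion
sample_mean = mean(responses)  # ordinary arithmetic mean of the 0/1 values

print(p_hat, sample_mean)
```

Both quantities are computed from the same sum divided by the same n, so they must agree.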
Incidence and Prevalence • Prevalences and some types of incidences are proportions • Incidence proportion (average risk) is the proportion that develops a specified condition over a set period of time • Prevalence is the proportion with the characteristic at a particular point in time
Illustrative Example: Prevalence of Smoking An SRS of 57 adults reveals 17 current smokers. Thus, the prevalence of smokers in the sample is: $\hat{p} = 17 / 57 = 0.2982$. Calculations should carry at least 4 significant digits. For reporting purposes, the APA publication guide (2001) recommends that proportions be converted to percentages and reported with one-decimal accuracy (e.g., 29.8%).
§16.2 Inference about a Proportion How good is the sample proportion $\hat{p}$ at estimating the population proportion p? To answer this question, consider what would happen if we took repeated samples, each of size n, from the population. How would the sample proportions be distributed?
Sampling Distribution of a Proportion • In SRSs, the random number of successes X follows a binomial distribution with parameters n and p (Chapter 6) • The sample proportion $\hat{p}$ is a mathematical transformation of the count of successes (divide the count by n) • When n is moderate to large, a Normal approximation to the binomial can be used (§8.3) to describe the sampling distribution of $\hat{p}$
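As a rough illustration of these points, the sketch below builds the exact sampling distribution of the sample proportion from the binomial probabilities and compares its mean and standard deviation with the values the Normal approximation would use, p and sqrt(pq/n). The function name is hypothetical, and n = 57 and p = 0.25 are borrowed from the smoking illustration purely as example inputs.

```python
from math import comb, sqrt

def phat_distribution(n, p):
    """Exact distribution of p-hat = X / n when X ~ b(n, p).

    Returns a dict mapping each possible value of p-hat to its probability.
    """
    return {x / n: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

n, p = 57, 0.25  # example inputs only
dist = phat_distribution(n, p)

# Mean and standard deviation of p-hat computed from the exact distribution.
exact_mean = sum(value * prob for value, prob in dist.items())
exact_sd = sqrt(sum((value - exact_mean) ** 2 * prob for value, prob in dist.items()))

# Parameters the Normal approximation would use.
normal_mean = p
normal_sd = sqrt(p * (1 - p) / n)

print(round(exact_mean, 4), round(exact_sd, 4))
print(round(normal_mean, 4), round(normal_sd, 4))
```

The two pairs of numbers match because dividing a binomial count by n rescales its mean np and standard deviation sqrt(npq) by 1/n; the approximation concerns the shape of the distribution, not these two parameters.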
§16.3 Hypothesis Test, Normal Approximation Method • A. Hypotheses. H0: p = p0 vs. Ha: p ≠ p0, where p0 represents the proportion specified by the null hypothesis • B. Test statistic. $z_{\text{stat}} = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $q_0 = 1 - p_0$ • C. P-value. Convert zstat to a P-value [e.g., using Table F]. Interpret results. • D. Significance level (optional).
Illustration: Hypothesis Test An SRS of n = 57 finds 17 smokers (p-hat = 17 / 57 = 0.2982). The national average for smoking prevalence is 0.25. Is the proportion in the sample significantly different from the national average? • A. H0: p = 0.25 vs. Ha: p ≠ 0.25 • B. $z_{\text{stat}} = \dfrac{0.2982 - 0.25}{\sqrt{(0.25)(0.75)/57}} = 0.84$ • C. P = 0.4010 [via Table F]. Weak evidence against H0. The sample proportion is not significantly different from the national average.
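The illustration can be reproduced with a minimal Python sketch, assuming the usual Normal-approximation z statistic (sample proportion minus p0, divided by sqrt(p0 q0 / n)) and a two-sided P-value. The function name is hypothetical, and the standard library's `NormalDist` stands in for Table F.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x, n, p0):
    """Two-sided z test of H0: p = p0 using the Normal approximation."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z_stat = (p_hat - p0) / se
    # Two-sided P-value: area in both tails beyond |z_stat|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value

z_stat, p_value = one_sample_z_test(x=17, n=57, p0=0.25)
print(round(z_stat, 2), round(p_value, 2))
```

The z statistic comes out at about 0.84 and the P-value at about 0.40. The software value can differ from the table value in the third decimal because Table F is entered with z rounded to two decimals.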
§16.4 Hypothesis Test, Exact Binomial Method • When n is small (e.g., fewer than 5 successes expected), binomial distributions do not resemble Normal distributions and z procedures cannot be used. • Instead, an exact binomial procedure (e.g., “Fisher’s method”) should be used
Exact Binomial Method A. Hypotheses. H0: p = p0 vs. Ha: p ≠ p0, where p0 represents the proportion under the null hypothesis B. Test statistic. Observed number of successes, x. C. P-value. Use a software program to calculate the P-value. Interpret the results. The theory of the test assumes X ~ b(n, p0). For right-sided tests, the P-value = Pr(X ≥ x) from the binomial distribution. (See text for additional details.) D. Significance level (optional).
Exact Binomial Test, Example Tea challenge. An individual correctly identifies the order of adding milk to tea in 6 of 8 attempts. Can we say that this is better than random guessing? • A. Hypotheses. H0: p = 0.5 vs. Ha: p > 0.5 • B. Statistic. 6 out of 8 • C. P-value. P = 0.145 via WinPepi > Describe.exe > Program A (next slide). Weak evidence against H0. Calculations also shown on p. 358 of text. • D. Significance level (optional). The evidence against H0 is not significant at α = .10.
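The right-sided P-value for the tea challenge can also be computed directly from the binomial distribution. This is a minimal sketch of Pr(X ≥ x) under X ~ b(n, p0); it is not the WinPepi routine, and the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(x, n, p0):
    """Right-sided exact P-value: Pr(X >= x) when X ~ b(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

# Tea challenge: 6 correct out of 8 attempts, H0: p = 0.5.
p_value = binomial_upper_tail(x=6, n=8, p0=0.5)
print(round(p_value, 3))  # 37/256, which rounds to 0.145
```

The sum covers the outcomes 6, 7, and 8 correct, with 28, 8, and 1 arrangements respectively out of 256 equally likely sequences under H0.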
Exact Binomial Test, Example Output from WinPepi > Describe.exe > Program A
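§16.5 Confidence Interval for Population Proportion This method is called the “plus four method” because it adds four imaginary points (two successes and two failures) during calculations. It is much more accurate than the traditional Normal method. A (1 − α)100% confidence interval for p is: $\tilde{p} \pm z_{1-\alpha/2} \sqrt{\tilde{p}\tilde{q} / \tilde{n}}$, where $\tilde{p} = (x + 2)/(n + 4)$, $\tilde{q} = 1 - \tilde{p}$, and $\tilde{n} = n + 4$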
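Confidence Interval, Example Based on n = 57 and x = 17, the 95% CI for the prevalence of smoking in the population is: plus-four proportion $\tilde{p} = (17 + 2)/(57 + 4) = 19/61 = 0.3115$, so $0.3115 \pm 1.96\sqrt{(0.3115)(0.6885)/61} = 0.3115 \pm 0.1162 = (0.1953,\ 0.4277)$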
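A short Python sketch of the plus-four calculation follows, assuming the usual form of the method: add two successes and two failures, then apply the Normal-based interval to the adjusted counts. The function name and the `conf` parameter are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(x, n, conf=0.95):
    """Plus-four confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. about 1.96 for 95%
    n_tilde = n + 4                # four imaginary observations added
    p_tilde = (x + 2) / n_tilde    # two of them counted as successes
    margin = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - margin, p_tilde + margin

lower, upper = plus_four_ci(x=17, n=57)
print(round(lower, 3), round(upper, 3))
```

For the smoking example this gives an interval of roughly 0.195 to 0.428, or about 19.5% to 42.8%.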
Confidence Interval, Example The plus-four CI method is similar to something called the “Wilson score method”. Here’s output from WinPepi > Describe.exe > Program A showing the Wilson score CI for the example
§16.6 Sample Size and Power Three approaches: • n needed to estimate p with margin of error m (for confidence interval) • n needed to test H0 at given α level and power • The power of testing H0 under stated conditions §9.6, §10.3, and §11.7 have additional background if needed
n Needed to Achieve Margin of Error m • $n = \dfrac{z_{1-\alpha/2}^{2}\, p^{*} q^{*}}{m^{2}}$, where p* represents an educated guess for population proportion p and q* = 1 − p* (when no educated guess is available, let p* = .5) • Round up to next integer to ensure stated precision
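n Needed to Achieve m, Example Suppose our educated guess for the proportion is p* = 0.30. For margin of error of .05, use: $n = z_{1-\alpha/2}^{2}(0.30)(0.70)/(0.05)^{2}$. For margin of error of .03, use: $n = z_{1-\alpha/2}^{2}(0.30)(0.70)/(0.03)^{2}$. With z = 1.96 (95% confidence), these come to 322.7 and 896.4, which round up to n = 323 and n = 897.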
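The margin-of-error calculation can be sketched in Python, assuming the usual formula n = z² p* q* / m² and a 95% confidence level. The confidence level is an assumption, since the slide does not state one, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_margin(m, p_star=0.5, conf=0.95):
    """Sample size needed to estimate p with margin of error m.

    p_star is the educated guess for p (0.5 when no guess is available).
    conf=0.95 is an assumed default, not a value taken from the slide.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Round up so the stated precision is met rather than narrowly missed.
    return ceil(z**2 * p_star * (1 - p_star) / m**2)

print(n_for_margin(0.05, p_star=0.30))  # 323
print(n_for_margin(0.03, p_star=0.30))  # 897
```

Tightening the margin from .05 to .03 nearly triples the required sample size, because n grows with 1/m².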
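n to Test H0: p = p0 $n = \left( \dfrac{z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}}{p_1 - p_0} \right)^{2}$ where • α ≡ alpha level of the test (two-sided) • 1 – β ≡ power of the test • p0 ≡ proportion under the null hypothesis • p1 ≡ proportion under the alternative hypothesis • q0 = 1 − p0 and q1 = 1 − p1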
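n to Test H0: p = p0, example How large a sample is needed to test H0: p = .21 against Ha: p = .31 at α = 0.05 (two-sided) with 90% power? $n = \left( \dfrac{1.96\sqrt{(.21)(.79)} + 1.28\sqrt{(.31)(.69)}}{.31 - .21} \right)^{2} \approx 193.3$, so use n = 194. Round up to ensure the stated power.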
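Power When Testing H0: p = p0 • $1 - \beta = \Phi\!\left( \dfrac{|p_1 - p_0|\sqrt{n} - z_{1-\alpha/2}\sqrt{p_0 q_0}}{\sqrt{p_1 q_1}} \right)$ where • Φ ≡ cumulative probability of the standard Normal distribution • α ≡ alpha level of the test (two-sided) • n ≡ sample size • p0 ≡ proportion under the null hypothesis • p1 ≡ proportion under the alternative hypothesis • q0 = 1 − p0 and q1 = 1 − p1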
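Power, Example What is the power of testing H0: p = .21 against Ha: p = .31 at α = 0.05 (two-sided) when n = 57? $1 - \beta = \Phi\!\left( \dfrac{|.31 - .21|\sqrt{57} - 1.96\sqrt{(.21)(.79)}}{\sqrt{(.31)(.69)}} \right) = \Phi(-0.09) \approx 0.46$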
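A rough Python sketch of this power calculation follows, assuming the standard Normal-approximation power formula for a one-sample test of a proportion: the standard Normal cumulative probability of (|p1 − p0|·sqrt(n) − z(1−α/2)·sqrt(p0 q0)) / sqrt(p1 q1). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_of_test(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: p = p0 when the truth is p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = (abs(p1 - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1 - p0))) / sqrt(p1 * (1 - p1))
    return NormalDist().cdf(z)  # cumulative probability of the standard Normal

power = power_of_test(p0=0.21, p1=0.31, n=57)
print(round(power, 2))
```

With n = 57 the power is roughly 0.46, so a true difference of this size would be missed more often than it is detected. Passing n = 194 to the same function gives a power of just over 0.90.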
Conditions for Inference • Sampling independence (SRS or facsimile) • Valid information • The plus-four confidence interval requires at least 10 observations • The z test of H0: p = p0 requires np0q0 ≥ 5
“I'd rather have a sound judgment than a talent.” Mark Twain