Inference About a Proportion in Data Analysis

Chapter 16: Inference About a Proportion

In Chapter 16: • 16.1 Proportions • 16.2 The Sampling Distribution of a Proportion • 16.3 Hypothesis Test, Normal Approximation • 16.4 Hypothesis Test, Exact Binomial Method • 16.5 Confidence Interval for a Population Proportion • 16.6 Sample Size and Power

Our data analysis journey continues …

Structure of the Book • Chapters 1 – 10 focused on statistical concepts and practices • Chapters 11 – 15 focused on the analysis of quantitative response variables • Chapters 16 – 19 focus on the analysis of categorical response variables • After completing selected chapters on statistical concepts, you may cover Chapters 11 – 19 in any order.

The Nature of the Response Variable Determines the Analysis This figure illustrates a basic difference between quantitative and categorical data analysis.

Binary Response Variable Examples of binary response variables • Classification of a respondent as a current smoker: “yes” or “no” • Gender: “male” or “female” • Whether a patient survives five or more years: “survived” or “did not survive” • Whether the subject developed a blood clot: “case” or “non-case”

Binary Response Variable, cont. • One outcome is arbitrarily labeled a “success” and the other a “failure” • If the process of selection is random, the number of successes in a sample will follow a binomial distribution with parameters n and p • Notation: Let X ~ b(n, p) represent a binomial random variable with parameters n and p

§16.1 Proportions • Conditions: a single SRS and a binary response variable • Describe the proportion of successes in the sample, denoted $\hat{p}$ (“p hat”): $\hat{p} = x / n$, where x = number of successes and n = sample size

Proportion, cont. Two of ten individuals in the sample have a risk factor for disease X. Therefore, the prevalence of this risk factor in the sample is $\hat{p} = 2 / 10 = 0.2$.

A Proportion is an Average of 0s and 1s Here are the data in tabular form with the variable coded 1 = risk factor present and 0 = risk factor absent. Note that with n = 10 and x = $\sum x_i$ = 2, the sample proportion is $\hat{p} = x / n = 2 / 10 = 0.2$ and the sample mean is $\bar{x} = \sum x_i / n = 2 / 10 = 0.2$. The sample mean and sample proportion are equivalent when we think of binary responses in this way.
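The equivalence between a proportion and a mean of 0/1 values can be checked with a few lines of Python. This is a minimal sketch: the list `responses` is a hypothetical coding of the ten individuals (only the totals, x = 2 and n = 10, come from the slide; the positions of the two 1s are made up for illustration).

```python
# Hypothetical 0/1 coding of ten individuals: 1 = risk factor present, 0 = absent.
# Only the totals (two 1s among ten values) come from the slide; the order is invented.
responses = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]

n = len(responses)                  # sample size
x = sum(responses)                  # number of successes (count of 1s)
p_hat = x / n                       # sample proportion
sample_mean = sum(responses) / n    # ordinary arithmetic mean of the 0/1 values

print(x, n, p_hat, sample_mean)     # prints: 2 10 0.2 0.2
```

Because summing 0s and 1s simply counts the 1s, the two calculations are the same operation under different names.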

Incidence and Prevalence • Prevalences and some types of incidences are proportions • Incidence proportion (average risk) is the proportion that develops a specified condition over a set period of time • Prevalence is the proportion with the characteristic at a particular point in time

Illustrative Example: Prevalence of Smoking An SRS of 57 adults reveals 17 current smokers. Thus, the prevalence of smokers in the sample is $\hat{p} = 17 / 57 = 0.2982$. Calculations should carry at least 4 significant digits. For reporting purposes, the APA publication guide (2001) recommends that proportions be converted to percentages and reported with one-decimal accuracy (e.g., 29.8%).

§16.2 Inference about a Proportion How good is the sample proportion $\hat{p}$ at estimating the population proportion p? To answer this question, consider what would happen if we took repeated samples, each of size n, from the population. How would the sample proportions be distributed?

Sampling Distribution of a Proportion • In SRSs, the random number of successes X in such samples follows a binomial distribution with parameters n and p (Chapter 6) • The sample proportion $\hat{p}$ is a mathematical transformation of the count of successes (divide the count by n) • When n is moderate to large, a Normal approximation to the binomial can be used (§8.3) to describe the sampling distribution of $\hat{p}$

Normal Approximation for Proportions
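The Normal approximation for a sample proportion is commonly stated as: $\hat{p}$ is approximately Normal with mean p and standard deviation $\sqrt{pq/n}$, where q = 1 − p. The sketch below assumes that standard form and compares it with the exact binomial calculation. The values n = 57 and p = 0.25 and the helper names are chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def phat_normal_params(n, p):
    """Mean and standard deviation of p-hat under the Normal approximation."""
    return p, sqrt(p * (1 - p) / n)

def binomial_cdf(k, n, p):
    """Exact binomial probability Pr(X <= k) for X ~ b(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Illustrative values (assumed here, not prescribed by the slide).
n, p, k = 57, 0.25, 17

mu, sd = phat_normal_params(n, p)
exact = binomial_cdf(k, n, p)            # Pr(X <= 17), exact
approx = NormalDist(mu, sd).cdf(k / n)   # Pr(p-hat <= 17/57), Normal approximation

print(f"SD of p-hat = {sd:.4f}")
print(f"exact = {exact:.4f}, Normal approximation = {approx:.4f}")
```

The two probabilities will not match exactly, because a continuous curve is standing in for a discrete distribution; the gap shrinks as n grows.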

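§16.3 Hypothesis Test, Normal Approximation Method • A. Hypotheses. H0: p = p0 vs. Ha: p ≠ p0, where p0 represents the proportion specified by the null hypothesis • B. Test statistic. $z_{stat} = (\hat{p} - p_0) / SE_{\hat{p}}$, where $SE_{\hat{p}} = \sqrt{p_0 q_0 / n}$ and $q_0 = 1 - p_0$ • C. P-value. Convert $z_{stat}$ to a P-value [e.g., using Table F]. Interpret results. • D. Significance level (optional).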

Illustration: Hypothesis Test An SRS of n = 57 finds 17 smokers ($\hat{p}$ = 17 / 57 = 0.2982). The national average for smoking prevalence is 0.25. Is the proportion in the sample significantly different from the national average? • A. Hypotheses. H0: p = 0.25 vs. Ha: p ≠ 0.25 • B. Test statistic. $SE_{\hat{p}} = \sqrt{(0.25)(0.75) / 57} = 0.0574$, so $z_{stat} = (0.2982 - 0.25) / 0.0574 = 0.84$ • C. P-value. P = 0.4010 [via Table F]. Weak evidence against H0. The sample proportion is not significantly different from the national average.
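The z test in this illustration can be reproduced in software. The sketch below assumes the usual one-sample z statistic, $(\hat{p} - p_0) / \sqrt{p_0 q_0 / n}$, and a two-sided P-value. The function name `proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Two-sided one-sample z test for a proportion (Normal approximation)."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_stat, p_value = proportion_z_test(x=17, n=57, p0=0.25)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.4f}")
```

Software carries z to full precision, so its P-value can differ in the third or fourth decimal place from a table lookup that uses z rounded to 0.84.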

§16.4 Hypothesis Test, Exact Binomial Method • When n is small (e.g., fewer than 5 successes expected), binomial distributions do not resemble Normal distributions and z procedures cannot be used. • Instead, an exact binomial procedure (e.g., “Fisher’s method”) should be used.

Exact Binomial Method A. Hypotheses. H0: p = p0 vs. Ha: p ≠ p0, where p0 represents the proportion under the null hypothesis (one-sided alternatives are also possible) B. Test statistic. Observed number of successes, x. C. P-value. Use a software program to calculate the P-value. Interpret the results. The theory of the test assumes X ~ b(n, p0). For right-sided tests, the P-value = Pr(X ≥ x) from the binomial distribution. (See text for additional details.) D. Significance level (optional).

Exact Binomial Test, Example Tea challenge. An individual correctly identifies the order of adding milk to tea in 6 of 8 attempts. Can we say that this is better than random guessing? • A. Hypotheses. H0: p = 0.5 vs. Ha: p > 0.5 • B. Statistic. 6 out of 8 • C. P-value. P = 0.145 via WinPepi > Describe.exe > Program A (next slide). Weak evidence against H0. Calculations also shown on p. 358 of text. • D. Significance level (optional). The evidence against H0 is not significant at α = .10.
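The right-sided exact P-value in the tea challenge, Pr(X ≥ 6) for X ~ b(8, 0.5), can be computed directly from the binomial formula. This is a minimal sketch in Python rather than the WinPepi program named on the slide. The function name `binomial_right_tail` is hypothetical.

```python
from math import comb

def binomial_right_tail(x, n, p0):
    """Exact right-sided P-value: Pr(X >= x) when X ~ b(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

p_value = binomial_right_tail(x=6, n=8, p0=0.5)
print(round(p_value, 3))   # prints 0.145, i.e. (28 + 8 + 1) / 256
```

The sum has only three terms here (6, 7, or 8 correct), which is why the calculation is also feasible by hand.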

Exact Binomial Test, Example Output from WinPepi > Describe.exe > Program A

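§16.5 Confidence Interval for Population Proportion This method is called the “plus four method” because it adds four imaginary observations (two successes and two failures) during calculations. It is much more accurate than the traditional Normal method. A (1 − α)100% confidence interval for p is $\tilde{p} \pm z_{1-\alpha/2} \sqrt{\tilde{p}\tilde{q} / \tilde{n}}$, where $\tilde{p} = (x + 2) / (n + 4)$, $\tilde{q} = 1 - \tilde{p}$, and $\tilde{n} = n + 4$.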

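Confidence Interval, Example Based on n = 57 and x = 17, the 95% CI for the prevalence of smoking in the population is found as follows: $\tilde{p} = (17 + 2) / (57 + 4) = 19 / 61 = 0.3115$, $\tilde{q} = 1 - 0.3115 = 0.6885$, and $\sqrt{(0.3115)(0.6885) / 61} = 0.0593$, so the interval is $0.3115 \pm (1.96)(0.0593) = 0.3115 \pm 0.1162$, or (0.195, 0.428).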
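A plus-four interval is straightforward to compute in code. The sketch below assumes the standard plus-four form (add two successes and two failures, then apply the usual Normal-based interval to the adjusted values). The function name `plus_four_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(x, n, conf=0.95):
    """Plus-four confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95% confidence
    n_tilde = n + 4                                # four imaginary observations
    p_tilde = (x + 2) / n_tilde                    # two of them counted as successes
    se = sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - z * se, p_tilde + z * se

lower, upper = plus_four_ci(x=17, n=57)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For reporting, the limits would typically be converted to percentages, in line with the one-decimal convention mentioned earlier in the chapter.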

Confidence Interval, Example The plus-four CI method is similar to something called the “Wilson score method”. Here is output from WinPepi > Describe.exe > Program A showing the Wilson score CI for the example.

§16.6 Sample Size and Power Three approaches: • n needed to estimate p with margin of error m (for confidence interval) • n needed to test H0 at given α level and power • The power of testing H0 under stated conditions. §9.6, §10.3, and §11.7 have additional background if needed.

n Needed to Achieve Margin of Error m • $n = z_{1-\alpha/2}^2 \, p^* q^* / m^2$, where p* represents an educated guess for the population proportion p and q* = 1 − p* (when no educated guess for p is available, let p* = .5) • Round up to the next integer to ensure the stated precision

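n Needed to Achieve m, Example Suppose our educated guess for the proportion is p* = 0.30. With 95% confidence (z = 1.96): • For a margin of error of .05, use $n = (1.96^2)(0.30)(0.70) / 0.05^2 = 322.7$, rounded up to 323 • For a margin of error of .03, use $n = (1.96^2)(0.30)(0.70) / 0.03^2 = 896.4$, rounded up to 897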
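The sample size needed for a given margin of error can be sketched in code. This assumes the usual formula $n = z^2 p^* q^* / m^2$ and a 95% confidence level, which the slide does not state explicitly. The function name `n_for_margin` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_margin(m, p_star=0.5, conf=0.95):
    """Sample size needed to estimate p with margin of error m.

    p_star is an educated guess for p (0.5 is the conservative default);
    conf=0.95 is an assumption, not something the slide specifies.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = z**2 * p_star * (1 - p_star) / m**2
    return ceil(n)   # round up so the stated precision is met

print(n_for_margin(0.05, p_star=0.30))
print(n_for_margin(0.03, p_star=0.30))
```

Because m is squared in the denominator, tightening the margin from .05 to .03 nearly triples the required sample size.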

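n to Test H0: p = p0 • $n = \left( \dfrac{z_{1-\alpha/2} \sqrt{p_0 q_0} + z_{1-\beta} \sqrt{p_1 q_1}}{p_1 - p_0} \right)^2$, where • α ≡ alpha level of the test (two-sided) • 1 – β ≡ power of the test • p0 ≡ proportion under the null hypothesis, with q0 = 1 – p0 • p1 ≡ proportion under the alternative hypothesis, with q1 = 1 – p1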

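n to Test H0: p = p0, Example How large a sample is needed to test H0: p = .21 against Ha: p = .31 at α = 0.05 (two-sided) with 90% power? $n = \left( \dfrac{1.96 \sqrt{(0.21)(0.79)} + 1.28 \sqrt{(0.31)(0.69)}}{0.31 - 0.21} \right)^2 = 193.3$, rounded up to 194. Round up to the next integer to ensure the stated power.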
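The sample-size calculation for this test can be sketched in code. The block assumes the commonly used formula $n = \left[ (z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}) / (p_1 - p_0) \right]^2$ for a two-sided one-sample test of a proportion. The function name `n_for_test` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_for_test(p0, p1, alpha=0.05, power=0.90):
    """Sample size to test H0: p = p0 against p = p1 (two-sided alpha)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 1.28 for 90% power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    n = (numerator / (p1 - p0)) ** 2
    return ceil(n)   # round up so the stated power is met

print(n_for_test(p0=0.21, p1=0.31))
```

Unrounded z values give a slightly different decimal than table values rounded to two places, but the result rounds up to the same integer in this example.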

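Power When Testing H0: p = p0 • $1 - \beta = \Phi\left( \dfrac{|p_1 - p_0| \sqrt{n} - z_{1-\alpha/2} \sqrt{p_0 q_0}}{\sqrt{p_1 q_1}} \right)$, where • Φ ≡ cumulative probability function of the Standard Normal distribution • α ≡ alpha level of the test (two-sided) • n ≡ sample size • p0 ≡ proportion under the null hypothesis, with q0 = 1 – p0 • p1 ≡ proportion under the alternative hypothesis, with q1 = 1 – p1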

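Power, Example What is the power of testing H0: p = .21 against Ha: p = .31 at α = 0.05 (two-sided) when n = 57? $1 - \beta = \Phi\left( \dfrac{(0.10) \sqrt{57} - 1.96 \sqrt{(0.21)(0.79)}}{\sqrt{(0.31)(0.69)}} \right) = \Phi(-0.09) \approx 0.46$. The power is only about 46%.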
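The power calculation in this example can be sketched in code. The block assumes the commonly used approximation $1 - \beta = \Phi\left( (|p_1 - p_0|\sqrt{n} - z_{1-\alpha/2}\sqrt{p_0 q_0}) / \sqrt{p_1 q_1} \right)$. The function name `power_one_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_proportion(p0, p1, n, alpha=0.05):
    """Approximate power of the two-sided z test of H0: p = p0 when p = p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = (abs(p1 - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1 - p0))) / sqrt(p1 * (1 - p1))
    return NormalDist().cdf(z)   # Phi(z), the standard Normal cumulative probability

power = power_one_proportion(p0=0.21, p1=0.31, n=57)
print(f"power = {power:.2f}")
```

A study of this size would detect a true proportion of .31 less than half the time, which helps explain why a nonsignificant result in a small sample is weak evidence for the null hypothesis.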

Conditions for Inference • Sampling independence (SRS or facsimile) • Valid information • The plus-four confidence interval requires at least 10 observations • The z test of H0: p = p0 requires np0q0 ≥ 5

“I’d rather have a sound judgment than a talent.” (Mark Twain)
