Chapter 8 Inference Concerning Proportions
Inference for a Single Proportion (p)
• Goal: Estimate the proportion of individuals in a population with a certain characteristic (p). This is equivalent to estimating a binomial probability.
• Sample: Take an SRS of n individuals from the population and observe X that have the characteristic. The sample proportion is $\hat{p} = X/n$ and has the following sampling properties:
• Mean: $\mu_{\hat{p}} = p$
• Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
• Shape: approximately normal for large n
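Large-Sample Confidence Interval for p
• Take an SRS of size n from a population where p is the true (unknown) proportion of successes
• Observe X successes
• Set confidence level C and choose z* such that $P(-z^* \le Z \le z^*) = C$ (C = 90%: z* = 1.645; C = 95%: z* = 1.96; C = 99%: z* = 2.576)
• Compute $\hat{p} = X/n$ and its estimated standard error $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
• Confidence interval: $\hat{p} \pm z^* \, SE_{\hat{p}}$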
Example - Ginkgo and Acetazolamide for AMS
• Study Goal: Measure the effect of Ginkgo and Acetazolamide on the occurrence of Acute Mountain Sickness (AMS) in Himalayan trekkers
• Parameter: p = True proportion of all trekkers receiving Ginkgo & Acetazolamide (G&A) who would suffer from AMS
• Sample Data: n = 126 trekkers received G&A, X = 18 suffered from AMS
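The large-sample interval can be checked on the trekker data with a short Python sketch. The function name `large_sample_ci` is hypothetical, and the sketch assumes the standard Wald form (sample proportion plus or minus z* times its estimated standard error); z* is taken from the standard library's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(x, n, conf=0.95):
    """Large-sample (Wald) confidence interval for a single proportion."""
    p_hat = x / n
    # z* leaves (1 - conf) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat - z_star * se, p_hat + z_star * se


# Trekker data from the slide: n = 126 received G&A, X = 18 suffered AMS
lo, hi = large_sample_ci(18, 126)
print(f"p_hat = {18 / 126:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

For these data the interval is roughly (0.082, 0.204), so the plausible AMS rates under G&A run from about 8% to about 20%.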
Wilson’s “Plus 4” Method
• For moderate to small sample sizes, large-sample methods may not work well with respect to coverage probabilities
• Simple approach that works well in practice (n ≥ 10):
• Pretend you have 4 extra individuals: 2 successes, 2 failures
• Compute the estimated sample proportion in light of the new “data”, as well as its standard error: $\tilde{p} = (X+2)/(n+4)$, $SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$
• Confidence interval: $\tilde{p} \pm z^* \, SE_{\tilde{p}}$
Example: Lister’s Tests with Antiseptic
• Experiments with antiseptic in patients with upper limb amputations (Joseph Lister, circa 1870)
• n = 12 patients received antiseptic, X = 1 died
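A minimal Python sketch of the "plus 4" interval applied to Lister's data follows. The function name `plus_four_ci` is hypothetical, and clipping the endpoints to the range [0, 1] is an added assumption, not something stated on the slide.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """'Plus 4' interval: add 2 successes and 2 failures, then proceed as usual."""
    p_tilde = (x + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # Assumption: clip to [0, 1], since a proportion cannot fall outside it
    lo = max(0.0, p_tilde - z_star * se)
    hi = min(1.0, p_tilde + z_star * se)
    return lo, hi


# Lister's data from the slide: n = 12 patients, X = 1 died
lo, hi = plus_four_ci(1, 12)
print(f"p_tilde = {3 / 16:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Here the adjusted estimate is 3/16 = 0.1875. The unclipped lower limit is slightly negative, so the reported interval is roughly (0, 0.379).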
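Significance Test for a Proportion
• Goal: Test whether a proportion (p) equals some null value p0
• H0: p = p0
• Test statistic: $z_{obs} = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$
• P-value: $P(Z \ge z_{obs})$ for Ha: p > p0; $P(Z \le z_{obs})$ for Ha: p < p0; $2P(Z \ge |z_{obs}|)$ for Ha: p ≠ p0
• Large-sample test works well when np0 and n(1-p0) are both > 10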
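Ginkgo and Acetazolamide for AMS
• Can we claim that the incidence rate of AMS is less than 25% for trekkers receiving G&A?
• H0: p = 0.25 Ha: p < 0.25
• Test statistic: $z_{obs} = (0.143 - 0.25)/\sqrt{0.25(0.75)/126} = -0.107/0.0386 \approx -2.78$
• P-value: $P(Z \le -2.78) \approx 0.003$
• Strong evidence that the incidence rate is below 25% (p < 0.25)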
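The one-sample test can be sketched in Python as below. The function name `one_prop_z_test` and its `alternative` argument are hypothetical. The standard error uses the null value p0, as is usual for the large-sample z test.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z test of H0: p = p0. Returns (z_obs, p_value)."""
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z_obs = (p_hat - p0) / se0
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z_obs)
    elif alternative == "greater":
        p_value = std_normal.cdf(-z_obs)
    else:
        p_value = 2 * std_normal.cdf(-abs(z_obs))
    return z_obs, p_value


# Trekker data: H0: p = 0.25 versus Ha: p < 0.25, with X = 18 and n = 126
z_obs, p_value = one_prop_z_test(18, 126, 0.25, alternative="less")
print(f"z_obs = {z_obs:.2f}, P-value = {p_value:.4f}")
```

The conditions for the large-sample test hold here, since np0 = 31.5 and n(1-p0) = 94.5 both exceed 10.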
Comparing Two Population Proportions
• Goal: Compare two populations/treatments with respect to a nominal (binary) outcome
• Sampling Design: Independent vs Dependent Samples
• Methods based on large vs small samples
• Contingency tables used to summarize data
• Measures of Association: Absolute Risk, Relative Risk, Odds Ratio
Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts
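2x2 Tables - Notation

|               | Outcome Present | Outcome Absent    | Group Total |
|---------------|-----------------|-------------------|-------------|
| Group 1       | X1              | n1-X1             | n1          |
| Group 2       | X2              | n2-X2             | n2          |
| Outcome Total | X1+X2           | (n1+n2)-(X1+X2)   | n1+n2       |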
Example - Firm Type/Product Quality
• Groups: Not Integrated (Weave only) vs Vertically Integrated (Spin and Weave) Cotton Textile Producers
• Outcomes: High Quality (High Count) vs Low Quality (Low Count)

|                       | High Quality | Low Quality | Group Total |
|-----------------------|--------------|-------------|-------------|
| Not Integrated        | 33           | 55          | 88          |
| Vertically Integrated | 5            | 79          | 84          |
| Outcome Total         | 38           | 134         | 172         |

Source: Temin (1988)
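As a rough illustration, the marginal counts and the three measures of association named earlier can be computed from the table in Python. The slides name these measures but do not define them, so the formulas below are the standard textbook definitions, used here as an assumption: difference in proportions, ratio of proportions, and ratio of odds. The variable names are illustrative only.

```python
# Cell counts from the cotton textile table: (High Quality, Low Quality)
table = {
    "Not Integrated": (33, 55),
    "Vertically Integrated": (5, 79),
}

# Marginal counts: row totals per group, column totals per outcome
row_totals = {group: sum(cells) for group, cells in table.items()}
col_totals = tuple(sum(col) for col in zip(*table.values()))

(x1, f1), (x2, f2) = table.values()
n1, n2 = x1 + f1, x2 + f2

# Standard definitions (assumed; not spelled out on the slides)
risk_diff = x1 / n1 - x2 / n2        # difference in proportions
rel_risk = (x1 / n1) / (x2 / n2)     # ratio of proportions
odds_ratio = (x1 * f2) / (f1 * x2)   # cross-product ratio

print(row_totals, col_totals)
print(f"difference = {risk_diff:.3f}, relative risk = {rel_risk:.2f}, "
      f"odds ratio = {odds_ratio:.2f}")
```

The row and column totals should reproduce the margins shown in the table (88, 84 and 38, 134).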
Notation
• Proportion in Population 1 with the characteristic of interest: p1
• Sample size from Population 1: n1
• Number of individuals in Sample 1 with the characteristic of interest: X1
• Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1/n_1$
• Similar notation for Population/Sample 2
Example - Cotton Textile Producers
• p1 - True proportion of all non-integrated firms that would produce High quality
• p2 - True proportion of all vertically integrated firms that would produce High quality
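Notation (Continued)
• Parameter of Primary Interest: p1-p2, the difference in the 2 population proportions with the characteristic (2 other measures given below)
• Estimator: $D = \hat{p}_1 - \hat{p}_2 = X_1/n_1 - X_2/n_2$
• Standard Error (and its estimate): $\sigma_D = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, $SE_D = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
• Pooled Estimated Standard Error when p1=p2=p: $SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1+X_2)/(n_1+n_2)$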
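Cotton Textile Producers (Continued)
• Parameter of Primary Interest: p1-p2, the difference in the 2 population proportions that produce High quality output
• Estimator: $\hat{p}_1 - \hat{p}_2 = 33/88 - 5/84 = 0.375 - 0.060 = 0.315$
• Standard Error (and its estimate): $SE_D = \sqrt{0.375(0.625)/88 + 0.060(0.940)/84} \approx 0.058$
• Pooled Estimated Standard Error when p1=p2=p: $\hat{p} = 38/172 = 0.221$, $SE_{Dp} = \sqrt{0.221(0.779)(1/88 + 1/84)} \approx 0.063$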
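The estimator and both standard errors for the cotton textile data can be reproduced with a short Python sketch. The function name `diff_and_se` is hypothetical; the formulas are the usual unpooled and pooled standard errors for a difference of two independent sample proportions.

```python
from math import sqrt


def diff_and_se(x1, n1, x2, n2):
    """Return (p1_hat - p2_hat, unpooled SE, pooled SE) for two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled: each group's variance estimated from its own sample proportion
    se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Pooled: a single common proportion, appropriate when p1 = p2 is assumed
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p1_hat - p2_hat, se_unpooled, se_pooled


# Cotton textile data: 33 of 88 non-integrated, 5 of 84 integrated firms
diff, se_u, se_p = diff_and_se(33, 88, 5, 84)
print(f"difference = {diff:.4f}, SE = {se_u:.4f}, pooled SE = {se_p:.4f}")
```

The pooled version is the one used in the significance test of p1 = p2, while the unpooled version is the natural choice for a confidence interval.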
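Confidence Interval for p1-p2 (Wilson’s Estimate)
• Method adds a success and a failure to each group to improve the coverage rate under certain conditions: $\tilde{p}_1 = (X_1+1)/(n_1+2)$, $\tilde{p}_2 = (X_2+1)/(n_2+2)$
• Standard error: $SE_{\tilde{D}} = \sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$
• The confidence interval is of the form: $(\tilde{p}_1 - \tilde{p}_2) \pm z^* \, SE_{\tilde{D}}$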
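Example - Cotton Textile Production
• Wilson estimates: $\tilde{p}_1 = 34/90 = 0.378$, $\tilde{p}_2 = 6/86 = 0.070$, $SE_{\tilde{D}} \approx 0.058$
• 95% Confidence Interval for p1-p2: $(0.378 - 0.070) \pm 1.96(0.058) = 0.308 \pm 0.114$, or approximately (0.194, 0.422)
• The entire interval is positive, providing evidence that non-integrated producers are more likely to produce high quality output (p1-p2 > 0)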
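The Wilson-adjusted interval for the difference can be sketched in Python as follows. The function name `wilson_diff_ci` is hypothetical, and the sketch assumes the "add one success and one failure to each group" form described on the preceding slide.

```python
from math import sqrt
from statistics import NormalDist


def wilson_diff_ci(x1, n1, x2, n2, conf=0.95):
    """Interval for p1 - p2 after adding 1 success and 1 failure to each group."""
    p1_t = (x1 + 1) / (n1 + 2)
    p2_t = (x2 + 1) / (n2 + 2)
    se = sqrt(p1_t * (1 - p1_t) / (n1 + 2) + p2_t * (1 - p2_t) / (n2 + 2))
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1_t - p2_t
    return diff - z_star * se, diff + z_star * se


# Cotton textile data: 33 of 88 non-integrated, 5 of 84 integrated firms
lo, hi = wilson_diff_ci(33, 88, 5, 84)
print(f"95% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
```

Both endpoints are positive for these data (roughly 0.194 and 0.422), which matches the conclusion that p1 > p2.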
Significance Tests for p1-p2
• Deciding whether p1=p2 can be done by interpreting “plausible values” of p1-p2 from the confidence interval:
• If entire interval is positive, conclude p1 > p2 (p1-p2 > 0)
• If entire interval is negative, conclude p1 < p2 (p1-p2 < 0)
• If interval contains 0, do not conclude that p1 ≠ p2
• Alternatively, we can conduct a significance test:
• H0: p1 = p2 Ha: p1 ≠ p2 (2-sided) Ha: p1 > p2 (1-sided)
• Test Statistic: $z_{obs} = (\hat{p}_1 - \hat{p}_2)/\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1+X_2)/(n_1+n_2)$ is the pooled sample proportion
• P-value: $2P(Z \ge |z_{obs}|)$ (2-sided) $P(Z \ge z_{obs})$ (1-sided)
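Example - Cotton Textile Production
• H0: p1 = p2 Ha: p1 ≠ p2
• Test Statistic: $z_{obs} = (0.375 - 0.060)/0.063 \approx 4.99$, using the pooled standard error
• P-value: $2P(Z \ge 4.99) < 0.0001$
• Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms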
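A minimal Python sketch of the two-sample test on the cotton textile data is given below. The function name `two_prop_z_test` and its `alternative` argument are hypothetical. The statistic uses the pooled standard error, matching the null hypothesis p1 = p2.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: p1 = p2. Returns (z_obs, p_value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_obs = (p1_hat - p2_hat) / se_pooled
    std_normal = NormalDist()
    if alternative == "greater":  # Ha: p1 > p2
        p_value = std_normal.cdf(-z_obs)
    else:  # Ha: p1 != p2
        p_value = 2 * std_normal.cdf(-abs(z_obs))
    return z_obs, p_value


# Cotton textile data: 33 of 88 non-integrated, 5 of 84 integrated firms
z_obs, p_value = two_prop_z_test(33, 88, 5, 84)
print(f"z_obs = {z_obs:.2f}, two-sided P-value = {p_value:.2e}")
```

The tail probability is computed as `cdf(-|z|)` rather than `1 - cdf(|z|)` to limit rounding error when the statistic is far out in the tail.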