
Chapter 8



Presentation Transcript


  1. Chapter 8 Inference Concerning Proportions

  2. Inference for a Single Proportion (p) • Goal: Estimate the proportion of individuals in a population with a certain characteristic (p). This is equivalent to estimating a binomial probability. • Sample: Take an SRS of n individuals from the population and observe X that have the characteristic. The sample proportion is $\hat{p} = X/n$ and has the following sampling properties: mean $\mu_{\hat{p}} = p$, standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, and an approximately normal sampling distribution when n is large.

  3. Large-Sample Confidence Interval for p • Take an SRS of size n from a population where p is the true (unknown) proportion of successes. • Observe X successes. • Set confidence level C and choose z* such that $P(-z^* \le Z \le z^*) = C$ (C = 90% ⇒ z* = 1.645; C = 95% ⇒ z* = 1.96; C = 99% ⇒ z* = 2.576). • Compute $\hat{p} = X/n$ and form the interval $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$.

  4. Example - Ginkgo and Acetaz for AMS • Study Goal: Measure the effect of Ginkgo and Acetazolamide on the occurrence of Acute Mountain Sickness (AMS) in Himalayan trekkers • Parameter: p = True proportion of all trekkers receiving Ginkgo & Acetaz who would suffer from AMS. • Sample Data: n=126 trekkers received G&A, X=18 suffered from AMS
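The interval for the trekker data can be checked numerically. The following is a minimal sketch, assuming the usual large-sample interval $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$; the function name `large_sample_ci` is illustrative and does not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(x, n, conf=0.95):
    """Return (lower, upper) for the large-sample interval p_hat +/- z* * SE."""
    p_hat = x / n                                  # sample proportion X/n
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 when conf = 0.95
    se = sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error
    return p_hat - z_star * se, p_hat + z_star * se


# Trekker data from the slide: n = 126 received G&A, X = 18 suffered from AMS.
lower, upper = large_sample_ci(18, 126)
print(f"p_hat = {18 / 126:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With these counts the sample proportion is about 0.14 and the 95% interval runs from roughly 0.08 to 0.20.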

  5. Wilson’s “Plus 4” Method • For moderate to small sample sizes, large-sample methods may not work well with respect to coverage probabilities • Simple approach that works well in practice (n ≥ 10): • Pretend you have 4 extra individuals, 2 successes, 2 failures • Compute the estimated sample proportion in light of the new “data” as well as its standard error: $\tilde{p} = (X+2)/(n+4)$, $SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$ • The confidence interval is $\tilde{p} \pm z^* SE_{\tilde{p}}$

  6. Example: Lister’s Tests with Antiseptic • Experiments with antiseptic in patients with upper limb amputations (Joseph Lister, circa 1870) • n=12 patients received antiseptic, X=1 died
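A short sketch of the "Plus 4" calculation for Lister's data, assuming the adjusted estimate $\tilde{p} = (X+2)/(n+4)$ with standard error $\sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$. The function name `plus_four_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """Return (p_tilde, lower, upper) after adding 2 successes and 2 failures."""
    p_tilde = (x + 2) / (n + 4)                    # adjusted sample proportion
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 when conf = 0.95
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))   # adjusted standard error
    return p_tilde, p_tilde - z_star * se, p_tilde + z_star * se


# Lister's data from the slide: n = 12 patients received antiseptic, X = 1 died.
p_tilde, lower, upper = plus_four_ci(1, 12)
print(f"p_tilde = {p_tilde:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For these counts the lower limit falls slightly below 0. Since a proportion cannot be negative, the interval is read as starting at 0.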

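  7. Significance Test for a Proportion • Goal: test whether a proportion (p) equals some null value $p_0$ • $H_0: p = p_0$ • Test statistic: $z_{obs} = (\hat{p} - p_0) / \sqrt{p_0(1-p_0)/n}$ • The large-sample test works well when $np_0$ and $n(1-p_0)$ are both > 10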

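  8. Ginkgo and Acetaz for AMS • Can we claim that the incidence rate of AMS is less than 25% for trekkers receiving G&A? • $H_0: p = 0.25$, $H_a: p < 0.25$ • With n=126 and X=18: $\hat{p} = 18/126 \approx 0.143$, $z_{obs} = (0.143 - 0.25)/\sqrt{0.25(0.75)/126} \approx -2.78$, P-value $= P(Z \le -2.78) \approx 0.003$ • Strong evidence that the incidence rate is below 25% (p < 0.25)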
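The test can be reproduced with a few lines of code. This is a sketch under the assumption that the slide uses the standard large-sample z statistic $(\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$ with a lower-tail P-value; the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Return (z, lower-tail P-value) for H0: p = p0 against Ha: p < p0."""
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    return z, NormalDist().cdf(z)  # P(Z <= z)


n, x, p0 = 126, 18, 0.25

# Large-sample condition from the slides: n*p0 and n*(1-p0) both above 10.
condition_ok = n * p0 > 10 and n * (1 - p0) > 10

z_obs, p_value = one_proportion_z_test(x, n, p0)
print(f"condition met: {condition_ok}, z = {z_obs:.2f}, P-value = {p_value:.4f}")
```

The z statistic comes out near -2.78 with a P-value of roughly 0.003, consistent with the slide's conclusion.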

  9. Comparing Two Population Proportions • Goal: Compare two populations/treatments with respect to a nominal (binary) outcome • Sampling Design: Independent vs Dependent Samples • Methods based on large vs small samples • Contingency tables used to summarize data • Measures of Association: Absolute Risk, Relative Risk, Odds Ratio

  10. Contingency Tables • Tables representing all combinations of levels of explanatory and response variables • Numbers in table represent Counts of the number of cases in each cell • Row and column totals are called Marginal counts

  11. 2x2 Tables - Notation
                       Outcome Present   Outcome Absent      Group Total
      Group 1          X1                n1-X1               n1
      Group 2          X2                n2-X2               n2
      Outcome Total    X1+X2             (n1+n2)-(X1+X2)     n1+n2

  12. Example - Firm Type/Product Quality • Groups: Not Integrated (Weave only) vs Vertically Integrated (Spin and Weave) Cotton Textile Producers • Outcomes: High Quality (High Count) vs Low Quality (Low Count) Source: Temin (1988)
                               High Quality   Low Quality   Group Total
      Not Integrated           33             55            88
      Vertically Integrated    5              79            84
      Outcome Total            38             134           172
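Marginal counts are simply the row and column sums of the cell counts. The sketch below rebuilds the totals for the textile table from its four cells; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Cell counts from the slide: rows are firm types, columns are quality levels.
counts = {
    "Not Integrated": {"High Quality": 33, "Low Quality": 55},
    "Vertically Integrated": {"High Quality": 5, "Low Quality": 79},
}

# Row (group) totals: these are n1 and n2 in the 2x2 notation.
group_totals = {group: sum(cells.values()) for group, cells in counts.items()}

# Column (outcome) totals: X1+X2 and (n1+n2)-(X1+X2).
outcomes = ["High Quality", "Low Quality"]
outcome_totals = {o: sum(cells[o] for cells in counts.values()) for o in outcomes}

# Overall sample size n1+n2.
grand_total = sum(group_totals.values())

print(group_totals)
print(outcome_totals)
print(grand_total)
```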

  13. Notation • Proportion in Population 1 with the characteristic of interest: $p_1$ • Sample size from Population 1: $n_1$ • Number of individuals in Sample 1 with the characteristic of interest: $X_1$ • Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1/n_1$ • Similar notation for Population/Sample 2

  14. Example - Cotton Textile Producers • p1 - True proportion of all non-integrated firms that would produce High quality • p2 - True proportion of all vertically integrated firms that would produce High quality

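  15. Notation (Continued) • Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions with the characteristic (2 other measures given below) • Estimator: $\hat{p}_1 - \hat{p}_2 = X_1/n_1 - X_2/n_2$ • Standard Error (and its estimate): $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, estimated by $SE = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ • Pooled Estimated Standard Error when $p_1 = p_2 = p$: $SE_p = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1+X_2)/(n_1+n_2)$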

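  16. Cotton Textile Producers (Continued) • Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions that produce High quality output • Estimator: $\hat{p}_1 - \hat{p}_2 = 33/88 - 5/84 \approx 0.375 - 0.060 = 0.315$ • Standard Error (and its estimate): $SE = \sqrt{0.375(0.625)/88 + 0.060(0.940)/84} \approx 0.058$ • Pooled Estimated Standard Error when $p_1 = p_2 = p$: $\hat{p} = 38/172 \approx 0.221$, $SE_p = \sqrt{0.221(0.779)(1/88 + 1/84)} \approx 0.063$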
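The three quantities on this slide can be computed directly from the table counts. This is a minimal sketch assuming the standard unpooled and pooled standard-error formulas for a difference in proportions; the variable names are illustrative.

```python
from math import sqrt

# Counts from the firm type / product quality table.
x1, n1 = 33, 88  # not integrated: high-quality firms, group size
x2, n2 = 5, 84   # vertically integrated: high-quality firms, group size

p1_hat = x1 / n1
p2_hat = x2 / n2
diff = p1_hat - p2_hat  # estimator of p1 - p2

# Unpooled estimated standard error (used for confidence intervals).
se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Pooled estimated standard error (used when testing H0: p1 = p2).
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, difference = {diff:.3f}")
print(f"unpooled SE = {se_unpooled:.4f}, pooled SE = {se_pooled:.4f}")
```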

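  17. Confidence Interval for p1-p2 (Wilson’s Estimate) • Method adds a success and a failure to each group to improve the coverage rate under certain conditions: $\tilde{p}_i = (X_i+1)/(n_i+2)$ for i = 1, 2, with $SE_{\tilde{p}_1 - \tilde{p}_2} = \sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$ • The confidence interval is of the form: $(\tilde{p}_1 - \tilde{p}_2) \pm z^* SE_{\tilde{p}_1 - \tilde{p}_2}$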

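  18. Example - Cotton Textile Production • 95% Confidence Interval for p1-p2: $\tilde{p}_1 = 34/90 \approx 0.378$, $\tilde{p}_2 = 6/86 \approx 0.070$, so the interval is $0.308 \pm 1.96(0.058) \approx (0.194, 0.422)$ • The entire interval is positive, providing evidence that non-integrated producers are more likely to provide high quality output (p1-p2 > 0)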
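The interval for the textile data can be reproduced as follows. This sketch assumes Wilson's adjustment means adding one success and one failure to each group, as the preceding slide describes; the function name `wilson_diff_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_diff_ci(x1, n1, x2, n2, conf=0.95):
    """Interval for p1 - p2 after adding 1 success and 1 failure per group."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 when conf = 0.95
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Not integrated: 33 of 88 high quality; vertically integrated: 5 of 84.
lower, upper = wilson_diff_ci(33, 88, 5, 84)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Both limits are positive (roughly 0.19 to 0.42), which is what supports the conclusion that p1 exceeds p2.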

  19. Significance Tests for p1-p2 • Deciding whether $p_1 = p_2$ can be done by interpreting “plausible values” of $p_1 - p_2$ from the confidence interval: • If the entire interval is positive, conclude $p_1 > p_2$ ($p_1 - p_2 > 0$) • If the entire interval is negative, conclude $p_1 < p_2$ ($p_1 - p_2 < 0$) • If the interval contains 0, do not conclude that $p_1 \ne p_2$ • Alternatively, we can conduct a significance test: • $H_0: p_1 = p_2$; $H_a: p_1 \ne p_2$ (2-sided) or $H_a: p_1 > p_2$ (1-sided) • Test Statistic: $z_{obs} = (\hat{p}_1 - \hat{p}_2) / \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1+X_2)/(n_1+n_2)$ is the pooled sample proportion • P-value: $2P(Z \ge |z_{obs}|)$ (2-sided), $P(Z \ge z_{obs})$ (1-sided)

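  20. Example - Cotton Textile Production • $z_{obs} = 0.315 / 0.063 \approx 5.0$, so the P-value is below 0.0001 for either the 1-sided or the 2-sided alternative • Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms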
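A sketch of the two-sample test for the textile data, assuming the pooled z statistic that is standard for testing $H_0: p_1 = p_2$. The function name `two_proportion_z_test` is illustrative and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, one-sided P-value, two-sided P-value) for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    p_one_sided = std_normal.cdf(-z)           # P(Z >= z), for Ha: p1 > p2
    p_two_sided = 2 * std_normal.cdf(-abs(z))  # 2 * P(Z >= |z|)
    return z, p_one_sided, p_two_sided


# Not integrated: 33 of 88 high quality; vertically integrated: 5 of 84.
z_obs, p_one_sided, p_two_sided = two_proportion_z_test(33, 88, 5, 84)
print(f"z = {z_obs:.2f}")
print(f"one-sided P-value = {p_one_sided:.2e}, two-sided P-value = {p_two_sided:.2e}")
```

The upper-tail probability is computed as `cdf(-z)` rather than `1 - cdf(z)` to avoid losing precision when z is large.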
