
Lecture 5 Outline: Thu, Sept 18


iman

Presentation Transcript


  1. Lecture 5 Outline: Thu, Sept 18 • Announcement: No office hours on Tuesday, Sept. 23rd after class. Extra office hour: Tuesday, Sept. 23rd from 12-1 p.m. • Chapter 1.5.4 (additional material on sampling units), 2.1.2, 2.2 • Sampling frame and sampling units • Paired t-test • Sampling distribution of sample average • t-ratio and t-test • Confidence intervals

  2. Notes on Box Plots • Dotted lines extend to the largest (and smallest) points in data that are within 1.5 IQRs of the third (first) quartile. All other points are marked by dots. • The red bracket on the side of the box plot shows the shortest half of data (shortest interval containing half the data). The shortest half is at the center for symmetric distributions, but off-center for non-symmetric ones.
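
As a rough illustration of the two box plot features described above, the sketch below computes the whisker ends and the shortest half for a small made-up data set. The quartile rule used here (Python's "inclusive" method) is an assumption and may differ slightly from the one JMP uses; the function names are hypothetical.

```python
import statistics

def whisker_ends(data):
    """Most extreme points lying within 1.5 IQRs of the first/third quartile."""
    xs = sorted(data)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    inside = [x for x in xs if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    return inside[0], inside[-1]

def shortest_half(data):
    """Shortest interval (lo, hi) that covers at least half of the points."""
    xs = sorted(data)
    k = (len(xs) + 1) // 2  # number of points in a "half", rounded up
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # made-up, right-skewed data
print(whisker_ends(values))   # 20 falls outside the whiskers and would be drawn as a dot
print(shortest_half(values))  # sits to the left of the box's center for skewed data
```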

  3. Simple Random Sample • A simple random sample (of size n) is a subset of a population obtained by a procedure giving all sets of n distinct items in the population an equal chance of being chosen. • Need a frame: a numbered list of all subjects. • Simple random sample: Generate random number for each subject. Choose subjects with n smallest numbers. • Simple random sample in JMP: • Click on Tables, Subset, then put the number n in the box “Sampling Rate or Sample Size.”
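
The random-number procedure on this slide can be sketched in a few lines. This is only an illustration, not what JMP does internally; the frame of 100 numbered subjects and the function name are made up.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n from a frame (list of subjects)."""
    rng = random.Random(seed)
    # Generate an independent uniform random number for each subject in the frame.
    keyed = [(rng.random(), subject) for subject in frame]
    # Choose the subjects with the n smallest random numbers.
    keyed.sort(key=lambda pair: pair[0])
    return [subject for _, subject in keyed[:n]]

frame = list(range(1, 101))  # hypothetical frame: subjects numbered 1..100
sample = simple_random_sample(frame, n=10, seed=5)
print(sorted(sample))
```

Because every subject's random number comes from the same distribution, every set of n distinct subjects is equally likely to hold the n smallest numbers.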

  4. Sampling units • In conducting a random sample, it is important that we are randomly sampling the units of interest. Otherwise we may create a selection bias. • Sampling families • If we want mean number of children per family, we should either • Sample by family • Sample by person but downweight kids from large families. • Suppose we want to know mean level of radiation in community and have available a frame of housing lots in the community. We need to use variable probability sampling, giving a larger probability of being sampled to larger lots.
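
A small worked example may make the selection bias concrete. The family sizes below are invented, and the sketch assumes we sample children (so families with no children never enter the frame).

```python
# Made-up population: number of children in each of six families.
children_per_family = [1, 1, 2, 2, 3, 6]

# Sampling by family: each family counts once.
mean_by_family = sum(children_per_family) / len(children_per_family)

# Sampling by child: a family with k children appears k times in the frame,
# so the naive average of "size of my family" is pulled toward large families.
sizes_seen_by_child = [k for k in children_per_family for _ in range(k)]
naive_mean_by_child = sum(sizes_seen_by_child) / len(sizes_seen_by_child)

# Downweighting each child by 1/k undoes the over-representation.
weights = [1 / k for k in sizes_seen_by_child]
weighted_mean_by_child = (
    sum(w * k for w, k in zip(weights, sizes_seen_by_child)) / sum(weights)
)

print(mean_by_family, naive_mean_by_child, weighted_mean_by_child)
```

Here the naive by-child average (55/15, about 3.67) overstates the by-family mean of 2.5, while the weighted version recovers it.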

  5. The clinician’s illusion • For several diseases such as schizophrenia, alcoholism and opiate addiction, clinicians think that the long-term prognosis is much worse than do researchers. • Part of disagreement may arise from differences in the population they sample • Clinicians: “Prevalence” sample – sample from population currently suffering disease which contains a disproportionate number of people suffering disease for long time • Researchers: “Incidence” sample – sample from population who has ever contracted the disease. • Reference: P. Cohen, J. Cohen, Archives of General Psychiatry, 1984.
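
To see how strongly the two sampling schemes can differ, consider a purely hypothetical disease in which 90% of cases last 1 year and 10% last 20 years. The sketch assumes a steady rate of new cases, so that the chance of being ill on any given date is proportional to duration.

```python
# Hypothetical mix: (duration in years, share of everyone who ever contracts the disease)
cases = [(1, 0.9), (20, 0.1)]

# Incidence sample: everyone who ever had the disease is equally likely to be drawn.
incidence_share_long = cases[1][1]

# Prevalence sample: chance of being currently ill is proportional to duration.
total = sum(duration * share for duration, share in cases)
prevalence_share_long = cases[1][0] * cases[1][1] / total

print(incidence_share_long)             # share of long cases seen by researchers
print(round(prevalence_share_long, 2))  # share of long cases seen by clinicians
```

Under these made-up numbers, long-duration cases are 10% of the incidence population but roughly two thirds of the patients a clinician would see at any one time.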

  6. Case Study 2.1.2 • Broad Question: Are any physiological indicators associated with schizophrenia? Early studies suggested certain areas of the brain may be different in persons with schizophrenia than in others, but confounding factors clouded the issue. • Specific Question: Is the left hippocampus region of the brain smaller in people with schizophrenia? • Research design: Sample pairs of monozygotic twins, where one twin was schizophrenic and the other was not. Comparing monozygotic twins controls for genetic and socioeconomic differences.

  7. Case Study 2.1.2 Cont. • The mean difference (unaffected-affected) in volume of the left hippocampus region across the 15 pairs is 0.199 cm³. Is this larger than could be explained by “chance”? • Probability (chance) model: Random sampling (fictitious) from a single population. • Scope of inference • Goal is to make inference about the population mean, but inference to a larger population is questionable because we did not take a random sample. • No causal inference can be made. In fact, researchers had no theories about whether abnormalities preceded the disease or resulted from it.

  8. Probability Model • Goal is to compare two groups (affecteds and unaffecteds) but we have taken a paired sample. We can think of having one population (pairs of twins) and looking at the mean of one variable, the difference in hippocampus volumes in each pair. • Probability model: Simple random sample with replacement from population. For a large population, this is essentially equivalent to a simple random sample without replacement.

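  9. Parameters and Statistics • Population parameters ($\mu$, $\sigma^2$) • $\mu$ = population mean • $\sigma^2$ = population variance = average size of $(Y-\mu)^2$ in population • Hypotheses: $H_0: \mu = 0$ vs. $H_a: \mu \neq 0$ • Sample statistics ($\bar{Y}$, $s^2$) • Sample: $Y_1, \ldots, Y_n$ • $\bar{Y}$ = sample mean • $s^2$ = sample variance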

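  10. Sampling distribution of sample mean • See Displays 2.3 and 2.4 • Standard deviation of $\bar{Y}$: $SD(\bar{Y}) = \sigma/\sqrt{n}$ • Standard error of $\bar{Y}$: $SE(\bar{Y}) = s/\sqrt{n}$ • Estimated standard deviation of the sampling distribution of $\bar{Y}$ • For schizophrenia study, $SE(\bar{Y}) = s/\sqrt{15} \approx 0.0615$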

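  11. Test Statistics • Z-ratio • For a general parameter: $Z = (\text{Estimate} - \text{Parameter})/SD(\text{Estimate})$ • For 1-group: $Z = (\bar{Y} - \mu)/(\sigma/\sqrt{n})$ • t-ratio • For a general parameter: $t = (\text{Estimate} - \text{Parameter})/SE(\text{Estimate})$ • For 1-group: $t = (\bar{Y} - \mu)/(s/\sqrt{n})$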
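
For the one-group case, the t-ratio is the sample mean minus the hypothesized mean, divided by the standard error s/√n. A minimal sketch using only the standard library follows; the data are made up and the function names are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Estimated SD of the sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def t_ratio(sample, mu0=0.0):
    """One-group t-ratio for a hypothesized population mean mu0."""
    return (statistics.mean(sample) - mu0) / standard_error(sample)

diffs = [1, 2, 3, 4, 5]  # made-up paired differences
t = t_ratio(diffs)
print(round(standard_error(diffs), 4), round(t, 4))
```

For a paired design, `sample` would hold the within-pair differences, which is how a two-column comparison reduces to a one-group problem.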

  12. Distribution of test statistics • Facts from statistical theory: If* the population distribution of Y is normal, then the sampling distribution of • (i) the z-ratio is standard normal • (ii) the t-ratio is Student’s t on n-1 degrees of freedom • * = We will study the “if” part later; for now we will assume it is true • See Display 2.5

  13. Testing a hypothesis about $\mu$ • Could the difference of $\bar{Y}$ from $\mu_0$ (the hypothesized value for $\mu$; $\mu_0 = 0$ here) be due to chance (in random sampling)? • Test statistic: $t = (\bar{Y} - \mu_0)/SE(\bar{Y})$ • If $H_0$ is true, then t equals the t-ratio and has the Student’s t-distribution with n-1 degrees of freedom

  14. P-value • The (2-sided) p-value is the proportion of random samples with absolute value of t ratios >= observed test statistic (|t|) • Schizophrenia example: t = 3.23
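
The 2-sided p-value for the schizophrenia example can be checked numerically. The sketch below integrates the Student's t density with Simpson's rule so that it needs only the standard library; it is one possible way to get the tail area, not the method JMP uses, and the function names are hypothetical. It takes n-1 = 14 degrees of freedom for the 15 twin pairs.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) for T ~ t(df), using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t_obs|)
    return 1 - 2 * central_half   # both tails, by symmetry

p = two_sided_p_value(3.23, df=14)
print(round(p, 4))
```

Halving the result gives the corresponding 1-sided p-value when the observed t falls on the side favored by the alternative.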

  15. Schizophrenia Example • p-value (2-sided, paired t-test) = .006 • So either, • (i) the null hypothesis is incorrect OR • (ii) the null hypothesis is correct and we happened to get a particularly unusual sample (only 6 out of 1000 are as unusual) • Strong evidence against $H_0$ • One-sided test: $H_0: \mu = 0$ vs. $H_a: \mu > 0$ • Test statistic: $t = (\bar{Y} - \mu_0)/SE(\bar{Y})$ • For schizophrenia example, t = 3.23, p-value (1-sided) = .003

  16. Matched pairs t-test in JMP • Click Analyze, Matched Pairs, put two columns (e.g., affected and unaffected) into Y, Paired Response. • Can also use one-sample t-test. Click Analyze, Distribution, put difference into Y, columns. Then click red triangle under difference and click test mean.

  17. Confidence Interval for $\mu$ • A confidence interval is a range of “plausible values” for a statistical parameter (e.g., the population mean) based on the data. It conveys the precision of the sample mean as an estimate of the population mean. • A confidence interval typically takes the form: point estimate $\pm$ margin of error • The margin of error depends on two factors: • Standard error of the estimate • Degree of “confidence” we want.

  18. CI for population mean • If the population distribution of Y is normal (* we will study the “if” part later), 95% CI for mean of single population: $\bar{Y} \pm t_{n-1}(.975) \times SE(\bar{Y})$ • For schizophrenia data: $0.199 \pm 2.145 \times 0.0615 = (0.067, 0.331)$
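
The interval for the schizophrenia data can be reproduced from summary numbers given elsewhere in the lecture. In this sketch the standard error is backed out from the reported mean difference (0.199) and t statistic (3.23), and the multiplier 2.145 is the 97.5th percentile of Student's t on 14 degrees of freedom, taken from a standard t table; the function name is hypothetical.

```python
def t_confidence_interval(mean, se, t_multiplier):
    """Return (low, high) for: point estimate +/- multiplier * standard error."""
    margin = t_multiplier * se
    return mean - margin, mean + margin

mean_diff = 0.199        # mean difference in volume, from the case study
se = mean_diff / 3.23    # SE implied by the reported t statistic (assumption)
low, high = t_confidence_interval(mean_diff, se, t_multiplier=2.145)
print(round(low, 3), round(high, 3))
```

Raising the confidence level increases the multiplier and widens the interval, while a larger sample shrinks the standard error and narrows it.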

  19. Interpretation of CIs • A 95% confidence interval will contain the true parameter (e.g., the population mean) 95% of the time if repeated random samples are taken. • It is impossible to say whether it is successful or not in any particular case, i.e., we know that the CI will usually contain the true mean under random sampling, but we do not know for the schizophrenia data if the CI (0.067 cm³, 0.331 cm³) contains the true mean difference.
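
The "95% of the time" statement can be illustrated by simulation. The sketch below repeatedly draws samples of size 15 from a normal population whose mean and SD are made up, builds the t-based interval each time (with 2.145 as the table value for 14 degrees of freedom), and records how often the interval covers the true mean.

```python
import math
import random
import statistics

def coverage(true_mean, sd, n, t_multiplier, reps, seed):
    """Fraction of simulated samples whose t-interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        ybar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if ybar - t_multiplier * se <= true_mean <= ybar + t_multiplier * se:
            hits += 1
    return hits / reps

# Hypothetical population values chosen only for illustration.
rate = coverage(true_mean=0.2, sd=0.24, n=15, t_multiplier=2.145, reps=4000, seed=1)
print(rate)  # should land close to 0.95
```

Each simulated interval either covers the true mean or it does not; the 95% describes the procedure over many samples, not any single interval.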

  20. Confidence Intervals in JMP • For both methods of doing paired t-test (Analyze, Matched Pairs or Analyze, Distribution), the 95% confidence intervals for the mean are shown on the output.
