Lab 4: What is a t-test?

luka

Presentation Transcript


  1. Lab 4: What is a t-test? Something British mothers use to see if the new girlfriend is significantly better than the old one? 

  2. The t Distribution
     • We want to compute a confidence interval & test a hypothesis for an unknown population mean µ
     • To use the Z distribution, we must know the population standard deviation σ
     • In most real-life situations, we don’t know the true population standard deviation
     • In this case, we estimate σ with the sample standard deviation s and use the t distribution instead of the Z distribution to calculate confidence intervals & test hypotheses
     It seems the more we learn, the less we know!

  3. T vs. Z Distribution: Who’s who?
     T Distribution:
     • Unimodal & symmetric around zero
     • Use when population µ & σ are both UNKNOWN
     • Assumes the variable of interest is normally distributed
     • Using the sample S.D. in place of σ introduces more sampling variability
     • Heavier tails
     • (n-1) degrees of freedom
     Z Distribution (CLT):
     • Unimodal & symmetric around zero
     • Use when population µ is UNKNOWN but σ is KNOWN
     • The CLT lets us say the sampling distribution of the mean is approximately normal as n gets large, even when the underlying variable of interest is not normally distributed
     • Lighter tails

  4. T vs. Z Distribution: Who’s who?

  5. The t Distribution
     • Assumptions:
        • The variable (X) is normally distributed
        • Random sample of size n from the underlying population
     • Very similar to the Z distribution as the sample size gets “large” (30+)
     • (n-1) degrees of freedom
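The "heavier tails" point from slide 3 and the "similar to Z for large n" point above can be checked with a small simulation. This is an illustrative Python sketch, not part of the STATA lab. It draws samples from a normal distribution, so the t assumptions hold and H0: µ = 0 is true. It then forms the t statistic using the sample S.D. and counts how often |t| exceeds the Z cutoff 1.96. The function name `rejection_rate`, the seed, and the number of repetitions are arbitrary choices made for this example.

```python
import math
import random


def rejection_rate(n, reps=10_000, z_crit=1.96, seed=1):
    """Share of simulated samples whose t statistic satisfies |t| > z_crit.

    Each sample has n draws from a standard normal, so the true mean is 0
    and H0: mu = 0 is true. If the t statistic really followed Z, this
    share would be about 5%.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample S.D.
        t = mean / (s / math.sqrt(n))
        if abs(t) > z_crit:
            hits += 1
    return hits / reps


for n in (5, 10, 30, 100):
    print(n, rejection_rate(n))
```

With small n, the share lands well above 0.05 because the t distribution has heavier tails than Z. As n grows, it drifts toward 0.05, which is why the Z cutoff is roughly adequate for "large" samples.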

  6. Degrees of Freedom: (n-1)
     • The “Currency” of statistics: you earn a degree of freedom for every data point you collect, and you spend a degree of freedom for each parameter you estimate. Since you usually need to spend 1 just to calculate the mean, you are left with n-1 (total data points "n" minus 1 spent on calculating the mean). (Reference: http://www.isixsigma.com/dictionary)
     • A general rule is that the degrees of freedom decrease when we have to estimate more parameters.
     • Before you can compute the standard deviation, you first have to estimate the mean.
     • This causes you to lose a degree of freedom. (Reference: http://www.childrens-mercy.org/stats/ask/df.asp)
     Two statistics are in a bar, talking and drinking. One statistic turns to the other and says, "So how are you finding married life?" The other statistic responds, "It's okay, but you lose a degree of freedom."
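The sketch below shows why the estimated mean "costs" a degree of freedom. It is illustrative Python only, and the function name `variance_two_ways` and the simulation settings are hypothetical. It compares two ways of scaling the sum of squared deviations: dividing by n and dividing by n-1. The deviations are measured from the sample mean rather than the true mean, so dividing by n underestimates σ² on average by the factor (n-1)/n. Dividing by n-1 corrects this. For reference, Python's `statistics.variance` uses the n-1 divisor and `statistics.pvariance` uses n.

```python
import random
import statistics


def variance_two_ways(sample):
    """Return (SS/n, SS/(n-1)), where SS is the sum of squared deviations from the sample mean."""
    n = len(sample)
    mean = sum(sample) / n  # estimating the mean uses up one degree of freedom
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


# The two versions match the standard library's population and sample variance.
data = [1.0, 2.0, 4.0]
print(variance_two_ways(data), statistics.pvariance(data), statistics.variance(data))

# Average both estimators over many samples of size n from N(0, 1), where sigma^2 = 1.
rng = random.Random(42)
n, reps = 5, 50_000
avg_div_n = avg_div_n_minus_1 = 0.0
for _ in range(reps):
    v_n, v_n1 = variance_two_ways([rng.gauss(0.0, 1.0) for _ in range(n)])
    avg_div_n += v_n / reps
    avg_div_n_minus_1 += v_n1 / reps

# Expected values: (n-1)/n = 0.8 for SS/n and 1.0 for SS/(n-1); the averages should land near these.
print(round(avg_div_n, 3), round(avg_div_n_minus_1, 3))
```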

  7. STATA & the one-sample t-test

  8. STATA Options
     • Get the t critical value (used for CIs & hypothesis tests): display invttail(df,p)
     • Get the p-value: display ttail(df,t) (one-sided) or display tprob(df,t) (two-sided)
     • Run a t-test from summary data (useful for summary homework problems): ttesti n x_bar s µ
     • Run a t-test on actual data (useful in real-life research): ttest varname = µ
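For readers without STATA, the three probability functions above can be approximated in plain Python. This is a sketch, not STATA's implementation. The Python names `ttail`, `tprob`, and `invttail` are chosen only to mirror the STATA functions. The sketch assumes an integer number of degrees of freedom, which always holds for the one-sample test (df = n-1). It computes the central probability P(|T| < t) with the classical closed-form series for the t distribution, and it finds the inverse by bisection.

```python
import math


def _central_prob(df: int, t: float) -> float:
    """P(|T| < |t|) for Student's t with integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    total, term = 1.0, 1.0
    if df % 2 == 0:
        # sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2)]
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        return s * total
    if df == 1:
        return 2.0 * theta / math.pi
    # (2/pi) * (theta + sin*cos*[1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ... up to c^(df-3)])
    for k in range(1, (df - 1) // 2):
        term *= (2 * k) / (2 * k + 1) * c2
        total += term
    return 2.0 / math.pi * (theta + s * c * total)


def ttail(df: int, t: float) -> float:
    """One-sided upper tail P(T >= t), like STATA's ttail(df,t)."""
    upper = 0.5 * (1.0 - _central_prob(df, t))
    return upper if t >= 0 else 1.0 - upper


def tprob(df: int, t: float) -> float:
    """Two-sided tail P(|T| >= |t|), like STATA's tprob(df,t)."""
    return 1.0 - _central_prob(df, t)


def invttail(df: int, p: float) -> float:
    """The t with P(T >= t) = p, like STATA's invttail(df,p), found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p > 0.5:
        return -invttail(df, 1.0 - p)  # symmetry of the t distribution around zero
    lo, hi = 0.0, 1.0
    while ttail(df, hi) > p:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ttail(df, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(invttail(6, 0.025))              # compare with: display invttail(6,0.025)
print(ttail(6, 20.6), tprob(6, 20.6))  # compare with: display ttail(6,20.6) and tprob(6,20.6)
```

A closed form keeps the sketch dependency-free. A library routine (for example, a regularized incomplete beta function) would be the usual choice for non-integer degrees of freedom.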

  9. STATA: obtaining the critical value
     • Example: Concentration of benzene in cigars
     • Hypothesis test: two-sided test
     • Null hypothesis: μ = 81 μg/g vs. alternative hypothesis: μ ≠ 81 μg/g
     • Standard deviation is unknown
     • α = 0.05 (two-sided test)
     • Data:
        • Sample size, n = 7
        • Sample mean = 151 μg/g
        • Sample standard deviation, s = 9 μg/g
        • d.f. = n-1 = 7-1 = 6
     • The STATA command: invttail(df,p), where df is the degrees of freedom and p is a number between 0 and 1
     • display invttail(6,0.025) returns 2.4469118
     • This means that if the T statistic is above 2.447 or below -2.447, we reject the null hypothesis at the 5% alpha level. The observed value of the statistic is
       $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{151 - 81}{9/\sqrt{7}} \approx 20.6$,
       so we reject the null hypothesis.
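The decision rule on this slide is plain arithmetic once the critical value is known. The short Python sketch below is illustrative only. It recomputes the t statistic from the slide's summary data and compares |t| with the critical value 2.4469118, copied from the `display invttail(6,0.025)` output. It does not call STATA.

```python
import math

n, x_bar, s, mu0 = 7, 151.0, 9.0, 81.0  # benzene example (μg/g)
t_crit = 2.4469118  # from: display invttail(6,0.025)

se = s / math.sqrt(n)          # standard error of the mean, s / sqrt(n)
t_stat = (x_bar - mu0) / se    # observed t statistic
reject = abs(t_stat) > t_crit  # two-sided rejection region: t > 2.447 or t < -2.447

print(f"SE = {se:.5f}, t = {t_stat:.4f}, reject H0: {reject}")
# SE = 3.40168, t = 20.5781, reject H0: True
```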

  10. Note of Confusion!
     • invnorm(p) returns the inverse cumulative standard normal distribution [i.e., it returns the z that satisfies P(Z ≤ z) = p]
     • invttail(df,p) returns the inverse REVERSE cumulative Student's t distribution [i.e., it returns the t that satisfies P(T ≥ t) = p]
     • So for the upper 2.5% critical value, instead of using invttail(6,0.975) (which returns -2.447), we should use invttail(6,0.025)
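The standard normal makes the lower-tail versus upper-tail distinction easy to check, and Python's standard library provides it as `statistics.NormalDist`. In this illustrative Python sketch, `inv_cdf` follows the invnorm convention P(Z ≤ z) = p. Passing 1 - p gives the upper-tail ("reverse cumulative") quantile, which is the convention invttail uses for t. The helper name `upper_tail_quantile` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail_quantile(p: float) -> float:
    """Return z with P(Z >= z) = p (the 'reverse cumulative' convention of invttail)."""
    return Z.inv_cdf(1.0 - p)


print(Z.inv_cdf(0.975))            # lower-tail convention, like invnorm(0.975): about 1.96
print(upper_tail_quantile(0.025))  # upper-tail convention with p = 0.025: about 1.96
print(upper_tail_quantile(0.975))  # the "wrong" p for an upper critical value: about -1.96
```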

  11. STATA: obtaining the p-value
     • Use ttail(df,t) (one-sided) or tprob(df,t) (two-sided)
     • display ttail(6,20.6) returns 4.257e-07
     • This gives you P(T ≥ 20.6). To obtain the p-value for this two-sided test, we need p = P(|T| ≥ 20.6) = P(T ≥ 20.6 or T ≤ -20.6) = 2*P(T ≥ 20.6) = 8.513e-07.
     • tprob gives you P(|T| ≥ t) directly: display tprob(6,20.6) returns 8.513e-07

  12. STATA: running a one-sample t-test from summary statistics
     • Syntax: ttesti n x_bar s µ
     • Null hypothesis: μ = 81 μg/g vs. alternative hypothesis: μ ≠ 81 μg/g
     • Data: sample size n = 7; sample mean = 151 μg/g (x_bar); sample standard deviation s = 9 μg/g; d.f. = n-1 = 7-1 = 6
     • Command: ttesti 7 151 9 81

  13. STATA Output

     One-sample t test
     ------------------------------------------------------------------------------
              |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
            x |       7         151     3.40168           9    142.6764    159.3236
     ------------------------------------------------------------------------------
     Degrees of freedom: 6

                                    Ho: mean(x) = 81

         Ha: mean < 81               Ha: mean != 81                Ha: mean > 81
           t =  20.5781                 t =  20.5781                 t =  20.5781
       P < t =   1.0000           P > |t| =   0.0000             P > t =   0.0000
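The Std. Err. and 95% Conf. Interval columns in this output follow directly from the summary statistics: SE = s/√n and CI = x̄ ± t* × SE. Here t* = invttail(6,0.025) = 2.4469118, taken from the critical-value slide. The minimal Python check below is illustrative only, and it hard-codes t* rather than computing it.

```python
import math

n, x_bar, s = 7, 151.0, 9.0  # ttesti 7 151 9 81
t_crit = 2.4469118           # invttail(6,0.025): upper 2.5% point of t with 6 d.f.

se = s / math.sqrt(n)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"SE = {se:.5f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
# SE = 3.40168, 95% CI = [142.6764, 159.3236]

# The hypothesized mean lies outside the 95% CI, which agrees with rejecting H0 at alpha = 0.05.
print("81 inside CI:", lower <= 81 <= upper)
```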

  14. STATA: running a one-sample t-test on data
     • Open lowbwt.dta, contained on the disk in your book.
     • Suppose you wish to test a hypothesis about the population mean gestational age of low birth weight infants (for example, you might hypothesize that low birth weight infants have gestational ages greater than 28 weeks). To test this one-sided hypothesis:
        • H0: mean <= 28
        • H1: mean > 28
        • Alpha level = 0.05
     • You would use the following STATA command: ttest gestage = 28

  15. STATA Output

     One-sample t test
     ------------------------------------------------------------------------------
     Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
      gestage |     100       28.89     .253419     2.53419    28.38716    29.39284
     ------------------------------------------------------------------------------
     Degrees of freedom: 99

                                Ho: mean(gestage) = 28

         Ha: mean < 28               Ha: mean != 28                Ha: mean > 28
           t =   3.5120                 t =   3.5120                 t =   3.5120
       P < t =   0.9997           P > |t| =   0.0007             P > t =   0.0003

     • STATA reports the two-sided alternative and both one-sided alternatives. In this case, we use the one on the right (Ha: mean > 28). Since the p-value of 0.0003 is less than our alpha level of 0.05, we reject the null hypothesis and conclude that the mean gestational age is greater than 28 weeks.
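The same one-sided test can be reproduced from the summary line of this output (n = 100, mean 28.89, Std. Dev. 2.53419). The Python sketch below is illustrative, not STATA code. It assumes an integer number of degrees of freedom and uses the classical closed-form series for the t distribution; the helper name `t_upper_tail` is hypothetical. It also shows how the three p-values STATA prints are linked: P < t = 1 - (P > t), and P > |t| = 2 × P(T ≥ |t|).

```python
import math


def t_upper_tail(df: int, t: float) -> float:
    """P(T >= t) for Student's t with integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    total, term = 1.0, 1.0
    if df % 2 == 0:
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        central = s * total                                # P(|T| < |t|), even df
    elif df == 1:
        central = 2.0 * theta / math.pi                    # df = 1 (Cauchy) case
    else:
        for k in range(1, (df - 1) // 2):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        central = 2.0 / math.pi * (theta + s * c * total)  # P(|T| < |t|), odd df
    upper = 0.5 * (1.0 - central)
    return upper if t >= 0 else 1.0 - upper


n, x_bar, sd, mu0, alpha = 100, 28.89, 2.53419, 28.0, 0.05
df = n - 1
se = sd / math.sqrt(n)
t_stat = (x_bar - mu0) / se

p_right = t_upper_tail(df, t_stat)           # Ha: mean > 28  ("P > t")
p_left = 1.0 - p_right                       # Ha: mean < 28  ("P < t")
p_two = 2.0 * t_upper_tail(df, abs(t_stat))  # Ha: mean != 28 ("P > |t|")

print(f"t = {t_stat:.4f}")
print(f"P < t = {p_left:.4f}, P > |t| = {p_two:.4f}, P > t = {p_right:.4f}")
print("reject H0 (one-sided, Ha: mean > 28):", p_right < alpha)
```

Only the right-tail p-value is compared with α here, because the alternative hypothesis (mean > 28) points in that direction.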

  16. STATA: two-sample t-test & paired t-test
     Next Week…
