Explore point estimation, confidence intervals, sample size, and normality assessment methods in statistical inference. Learn how to interpret confidence intervals and choose sample sizes effectively.
This Week
• Review of estimation and hypothesis testing
• Reading Le (review)
  • Chapter 4: Sections 4.1 – 4.3
  • Chapter 5: Sections 5.1 and 5.4
  • Chapter 7: Sections 7.1 – 7.3
• Reading C & S
  • Chapter 2: A-E
  • Chapter 6: A, B, F
Point Estimate
Sampling error = true value – estimate (unknown, because the true value is unknown)
Statistical Inference
• Population with mean μ = ?
• A simple random sample of n elements is selected from the population.
• The sample data provide a value for the sample mean x̄.
• The value of x̄ is used to make inferences about the value of μ.
Interval Estimation
In general, confidence intervals are of the form: Estimate ± 1.96 × SE
• Estimate = mean, proportion, regression coefficient, odds ratio...
• SE = standard error of your estimate
• 1.96 = multiplier for a 95% CI based on the normal distribution
[Figure: standard normal distribution, with 2.5% probability in each tail (below -1.96 and above 1.96)]
Estimation for Population Mean μ
• Point estimate: x̄ = Σxᵢ / n
• Estimate of variability in population: s = √[ Σ(xᵢ – x̄)² / (n – 1) ]
• Estimate of variability in point estimate (SE): SE = s / √n
• 95% Confidence Interval: x̄ ± 1.96 × s / √n
• A slightly larger number than 1.96, based on the t-distribution, is used for smaller n
Assumptions
• Data in the population follow a normal distribution, or
• Sample size is large enough to apply the central limit theorem (CLT)
• CLT: no matter the shape of the population distribution, the distribution of the sample mean approaches a normal distribution as the sample size gets large
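The CLT statement above can be checked with a small simulation. This is only a sketch: the exponential population, the number of repetitions, and the helper names `sample_means` and `skewness` are illustrative choices, not part of the course material. Skewness of the sample means should shrink toward 0 (the value for a normal distribution) as n grows.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Means of `reps` samples of size n from a right-skewed
    exponential population (population mean = 1). Hypothetical setup."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

def skewness(xs):
    """Simple moment-based skewness; close to 0 for symmetric data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

for n in (2, 10, 50):
    means = sample_means(n)
    # The mean of the sample means stays near 1; the skewness drops as n grows.
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```

How large n must be depends on how skewed the population is; a strongly skewed population needs a larger sample before the normal approximation is reasonable.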
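Meaning of Confidence Interval
• Before the sample is drawn, there is a 95% chance that the interval you will compute contains μ (that you “capture” the true value μ with your interval)
• Equivalently, about 95% of intervals computed this way from repeated samples contain μ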
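A repeated-sampling simulation makes this interpretation concrete. The sketch below assumes a normal population with hypothetical values (mean 215, SD 20) and counts how often the interval x̄ ± 1.96 × s/√n contains the true mean; the function name `coverage` and all defaults are illustrative.

```python
import math
import random
import statistics

def coverage(mu=215.0, sigma=20.0, n=100, reps=2000, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean mu.
    Population values are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps

print(coverage())  # expected to land close to 0.95
```

Any single interval either contains the true mean or does not; the 95% figure is the long-run proportion over many samples.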
Example
Suppose a sample of n = 100 persons has mean = 215 mg/dL and standard deviation = 20 mg/dL.
95% CI:
Lower limit: 215 – 1.96 × 20/10 = 215 – 3.92 ≈ 211
Upper limit: 215 + 1.96 × 20/10 = 215 + 3.92 ≈ 219
95% CI = (211, 219)
“We are about 95% confident that the interval 211-219 contains μ”
We can pretty much rule out that μ > 220
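The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch; the function name `mean_ci` is illustrative and the 1.96 multiplier assumes the normal approximation used above.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Normal-based confidence interval: mean ± z × sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

low, high = mean_ci(215, 20, 100)
print(round(low, 2), round(high, 2))  # 211.08 218.92
```

Rounded to whole numbers, this gives the interval (211, 219) quoted in the example.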
Properties of Confidence Intervals
• As sample size increases, the CI gets narrower
  • Because the SE gets smaller
• Can use different levels of confidence
  • 90, 95, 99% are common
  • More confidence means a wider interval, so a 90% CI is narrower than a 99% CI
  • What would a 100% CI look like?
• Changes with population standard deviation
  • A more variable population means a wider interval
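Effect of sample size
Suppose we had only 10 observations. What happens to the confidence interval?
For n = 100: 215 ± 1.96 × 20/√100 = 215 ± 3.9 = (211.1, 218.9)
For n = 10: 215 ± 1.96 × 20/√10 = 215 ± 12.4 = (202.6, 227.4)
(With n = 10 the t-distribution multiplier would be larger than 1.96, making the interval wider still.)
Larger sample size = narrower interval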
Effect of confidence level
Suppose we use a 90% interval. What happens to the confidence interval?
90%: 215 ± 1.645 × 20/√100 = 215 ± 3.3 = (211.7, 218.3)
Lower confidence level = narrower interval
(A 99% interval would use 2.58 as the multiplier and the interval would be wider)
Effect of standard deviation
Suppose we had an SD of 40 (instead of 20). What happens to the confidence interval?
215 ± 1.96 × 40/√100 = 215 ± 7.8 = (207.2, 222.8)
More variation = wider interval
Effect of different sample
Suppose a new sample has a mean of 212 (but the same standard deviation). What happens to the confidence interval?
212 ± 1.96 × 20/√100 = 212 ± 3.9 = (208.1, 215.9)
Same width, moves a little
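The four "what happens" questions can be compared side by side. The sketch below is illustrative only: `mean_ci` is a hypothetical helper, and it uses normal multipliers throughout, including for n = 10, where a t-based multiplier would give a somewhat wider interval.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-based CI for a mean at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

scenarios = [
    ("baseline",   dict(mean=215, sd=20, n=100, level=0.95)),
    ("n = 10",     dict(mean=215, sd=20, n=10,  level=0.95)),
    ("90% level",  dict(mean=215, sd=20, n=100, level=0.90)),
    ("SD = 40",    dict(mean=215, sd=40, n=100, level=0.95)),
    ("mean = 212", dict(mean=212, sd=20, n=100, level=0.95)),
]

for label, kwargs in scenarios:
    low, high = mean_ci(**kwargs)
    print(f"{label:<12} ({low:.1f}, {high:.1f})  width = {high - low:.1f}")
```

Only the last scenario leaves the width unchanged: a different sample mean shifts the interval without stretching it.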
How Big A Sample To Take?
• Depends on the variability in the population
• Depends on how precise an estimate you want
• Cost: if it doesn’t cost much to sample an element, then sample many
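The first two considerations can be turned into a number with the standard normal-based formula n = (z × SD / E)², where E is the desired half-width (margin of error) of the interval. The slides do not state this formula, so treat it as an assumption. It also requires a guess at the population SD; the values below are illustrative and the function name `sample_size` is hypothetical.

```python
import math

def sample_size(sd, margin, z=1.96):
    """Smallest n for which z × sd / sqrt(n) <= margin.
    Assumes the SD is known (or guessed) in advance."""
    return math.ceil((z * sd / margin) ** 2)

# Hypothetical planning values: SD of 20 mg/dL, want the mean within ±2 mg/dL.
print(sample_size(20, 2))  # 385
```

Halving the margin roughly quadruples the required sample size, which is where cost enters the decision.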
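95% Confidence Intervals for μ Using SAS
PROC MEANS DATA = datasetname N MEAN STD STDERR CLM;
VAR list of variables;
RUN;
This will display the following statistics:
• N
• Mean
• Standard Deviation
• Standard Error of Mean
• Lower 95% Confidence Limit
• Upper 95% Confidence Limit
(CLM requests the two confidence limits; listing any statistic keyword replaces the default output, so the others are named explicitly.)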
Assessing Normality with Graphs
• Boxplots, stem-and-leaf plots, histograms
  • Look for skewness (non-symmetry)
  • Hard to get normal-looking graphs with small sample sizes
  • Can check effect of transformations
• Normal probability plots
  • x-axis: related to inverse of standard normal distribution
  • y-axis: actual data
  • * actual data
  • + what we would expect if data were really normal
Assessing normality - PROC UNIVARIATE
PROC UNIVARIATE DATA = demo NORMAL PLOT;
VAR ursod; * ursod is urinary sodium excretion in 8 hours;
RUN;
NORMAL and PLOT are two options that test for normality and display simple graphs.
Plots are best: with enough data, tests for normality almost always reject the normality assumption.
STEM AND LEAF PLOT
Stem Leaf                        #   Boxplot
  16 6                           1      0
  15 0                           1      0
  14 7                           1      0
  13 6                           1      0
  12 038                         3      0
  11 7                           1      |
  10 49                          2      |
   9 57                          2      |
   8 0002                        4      |
   7 033456                      6      |
   6 0134568                     7   +-----+
   5 001347                      6   |  +  |
   4 00001123333456777779999    23   *-----*
   3 011244455667799            15   +-----+
   2 23444556678888999          17      |
   1 4677788                     7      |
     ----+----+----+----+---
Multiply Stem.Leaf by 10**+1
The UNIVARIATE Procedure
Variable: ursod
Normal Probability Plot
[Plot: ursod on the y-axis (15 to 165) against normal quantiles on the x-axis (-2 to +2); * = actual data, + = what we would expect if data were really normal. The * points do not follow the + line.]
Variable: lursod
Normal Probability Plot
[Plot: lursod (log of ursod) on the y-axis (2.65 to 5.15) against normal quantiles on the x-axis (-2 to +2); * = actual data, + = what we would expect if data were really normal.]
Log transformed value shows a better linear pattern
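One way to put a number on "a better linear pattern" is the correlation between the sorted data and the normal scores that form the x-axis of a normal probability plot. The sketch below is not what PROC UNIVARIATE reports: the data values are made up to be right-skewed (they are not the ursod data), and the Blom plotting positions (i – 0.375)/(n + 0.25) are an assumed convention.

```python
import math
from statistics import NormalDist, fmean

def normal_scores(n):
    """Expected standard normal quantiles for ranks 1..n (Blom positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(data):
    """Correlation of sorted data with normal scores; near 1 means a straight plot."""
    ordered = sorted(data)
    return pearson(normal_scores(len(ordered)), ordered)

# Hypothetical right-skewed measurements, for illustration only.
raw = [14, 16, 17, 22, 23, 24, 28, 29, 30, 31, 34, 36, 39, 40, 41,
       43, 45, 47, 49, 50, 53, 57, 64, 73, 82, 97, 120, 147, 166]
logged = [math.log(x) for x in raw]

print("raw:", round(qq_correlation(raw), 3))
print("log:", round(qq_correlation(logged), 3))
```

A correlation closer to 1 after the log transformation is the numerical counterpart of the straighter plot; it complements, rather than replaces, looking at the plot itself.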
Hypothesis Testing
• Hypothesis: A statement about parameters of a population or of a model (μ = 200?)
• Test: Do the data agree with the hypothesis? (sample mean 220)
• Measure the agreement with probability
Steps in hypothesis testing
• State null and alternative hypotheses (Ho and Ha)
  • Ho is usually a statement of no effect or no difference between groups
• Choose α level
  • Probability of falsely rejecting Ho (Type I error)
Steps in hypothesis testing
• Calculate test statistic, find p-value (p)
  • Measures how far the data are from what you expect under the null hypothesis
• State conclusion:
  • p < α: reject Ho
  • p > α: insufficient evidence to reject Ho
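Possible results of tests
                       Reality
What we decide         Ho true                Ho false
Reject Ho              Type I error (α)       Correct decision
Do not reject Ho       Correct decision       Type II error (β)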
Details
• α is related to the confidence level (confidence level = 1 – α)
• α is commonly set at 0.05 or 0.01
• β (the probability of a Type II error) is usually predetermined by sample size
One sample t-test: test for population mean
• Simple random sample from a normal population (or n large enough for CLT)
• Ho: μ = μo
• Ha: μ ≠ μo, pick α
• Test statistic: t = (x̄ – μo) / (s / √n), with n – 1 degrees of freedom
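The one-sample t statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors: t = (x̄ – μo) / (s / √n) with n – 1 degrees of freedom. The sketch below computes it for made-up data. The ages are hypothetical (not the course's DEMO dataset), the function name `one_sample_t` is illustrative, and 2.201 is the two-sided 5% critical value of the t-distribution with 11 degrees of freedom.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for Ho: mu = mu0."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # s / sqrt(n)
    return (xbar - mu0) / se, n - 1

# Hypothetical ages for 12 people, for illustration only.
ages = [21, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 28]
t, df = one_sample_t(ages, mu0=25)

T_CRIT_11DF = 2.201  # two-sided critical value for alpha = 0.05, 11 df
decision = "reject Ho" if abs(t) > T_CRIT_11DF else "insufficient evidence to reject Ho"
print(round(t, 2), df, decision)
```

Comparing |t| with a critical value is equivalent to comparing the p-value with α; statistical software such as PROC TTEST reports the p-value directly.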
Matched pairs data
• Recall independence requirement for CIs
• Similar issue for t-tests
• Observations are not independent
  • Examples: pre and post test, left and right eyes, brother-sister pairs
• Solution: look at paired differences, do a one sample test on the differences d = X2 – X1
• Ho: μd = 0, Ha: μd ≠ 0 (μd = population mean of the differences)
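The paired approach reduces two dependent columns to one column of differences, then applies the one-sample test to that column. A minimal sketch, with made-up pre/post measurements, a hypothetical function name `paired_t`, and 2.571 as the two-sided 5% critical value for 5 degrees of freedom:

```python
import math
import statistics

def paired_t(x1, x2):
    """One-sample t test on the differences d = x2 - x1 (Ho: mean difference = 0).
    Returns (t statistic, degrees of freedom)."""
    diffs = [b - a for a, b in zip(x1, x2)]
    n = len(diffs)
    dbar = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return dbar / se, n - 1

# Hypothetical pre and post values for 6 subjects, for illustration only.
pre = [140, 152, 138, 147, 160, 155]
post = [135, 150, 136, 140, 158, 149]

t, df = paired_t(pre, post)
T_CRIT_5DF = 2.571  # two-sided critical value for alpha = 0.05, 5 df
print(round(t, 2), df, "reject Ho" if abs(t) > T_CRIT_5DF else "do not reject Ho")
```

The degrees of freedom are based on the number of pairs, not the total number of measurements, because each pair contributes a single difference.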
PROC TTEST, one sample test
PROC TTEST DATA = DEMO;
VAR age;
RUN;
• Tests if mean age is different from zero. Not very useful
• Need to be tricky...
Use a DATA step to calculate a new variable
• Subtract the value of the mean under the null hypothesis
• Test the new variable for difference from zero
DATA DEMO;
SET DEMO;
dage = age - 25;
RUN;
PROC TTEST DATA = DEMO;
VAR dage;
RUN;
This tests whether the mean age is different from 25
PROC TTEST one sample output
T-Tests
Variable    DF    t Value    Pr > |t|
dage        11      -0.41      0.6931
Conclusion: We have insufficient evidence to claim that the mean age is different from 25 (p = 0.69)