
Experimental design and analysis



  1. Experimental design and analysis Estimation

  2. Frequency (probability) distribution [figure: curve of P(Y) against Y] • Distribution of values of a variable in the population • Frequency (probability) of different values of a variable occurring under repeated sampling

  3. Types of estimates • Point estimate • Single-value estimate of the parameter, e.g. the sample mean ȳ is a point estimate of μ • Interval estimate • Range within which the parameter lies, known with some degree of confidence, e.g. a 95% confidence interval is an interval estimate of μ

  4. Methods of estimation • Maximum likelihood • Ordinary least squares

  5. Maximum likelihood (ML) • Given sample of observations from population • Find estimate of parameter that maximises likelihood of observing these data

  6. Likelihood function: • likelihood of observed data for all possible values of parameter • maximum likelihood estimate is that which maximises likelihood function • Maximum likelihood estimators usually require iterative algorithms • no simple arithmetic solutions
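To make the idea concrete, here is a minimal sketch of maximum likelihood by brute-force grid search (a stand-in for the smarter iterative algorithms real software uses). The data and the known σ are invented for illustration; the model is a normal distribution with unknown mean:

```python
import math

# Invented sample; assume a normal model with known sigma = 2.0.
data = [4.1, 5.3, 3.8, 6.0, 4.9]
sigma = 2.0

def log_likelihood(mu):
    # Sum of the log of the normal density at each observation.
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in data)

# "Iterative" search: evaluate the likelihood over a grid of candidate mu values
# and keep the candidate that maximises it.
candidates = [i / 1000 for i in range(2000, 8000)]  # 2.000 .. 7.999
mle = max(candidates, key=log_likelihood)
```

For a normal model the maximum-likelihood estimate of μ is exactly the sample mean (4.82 here), which the grid search recovers to its 0.001 resolution.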

  7. Ordinary least squares (OLS) • Given sample of observations from population • OLS estimator is that which minimises sum of squared differences between each observation and parameter estimate • OLS estimators have simple arithmetic solutions • specific assumptions

  8. OLS estimate of μ • The sample mean ȳ is the value that minimises the sum of squared deviations between each observation and the sample mean: Σ(yᵢ - ȳ)²
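A quick numerical check of this claim, using the limpet counts from the worked example later in the deck: no candidate value produces a smaller sum of squared deviations than the sample mean.

```python
# Limpet counts from the 15 quadrats in the worked example.
data = [4, 2, 2, 2, 9, 15, 19, 34, 16, 26, 20, 2, 2, 9, 8]

def ss(c):
    # Sum of squared deviations of each observation from a candidate value c.
    return sum((y - c) ** 2 for y in data)

mean = sum(data) / len(data)  # 11.33...

# Search a fine grid of candidates: the winner sits at the sample mean.
best = min((i / 100 for i in range(0, 3500)), key=ss)
```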

  9. OLS vs ML • For common parameters (e.g. μ): • ML and OLS produce the same estimates when the OLS assumptions hold • ML is more reliable when the assumptions are not met • OLS is easier to calculate

  10. Sampling distribution The frequency (or probability) distribution of a statistic. • Many samples (size n) from population • Calculate all the sample means • Plot frequency distribution of sample means (sampling distribution)

  11. Sampling distribution of sample means [figure: multiple samples each yield a sample mean; the distribution of P(ȳ) against ȳ is the sampling distribution]

  12. Sampling distribution of mean • The sampling distribution of the sample mean approaches a normal distribution as n gets larger - the Central Limit Theorem • The mean of this sampling distribution is μ, the mean of the original population • The standard deviation of this sampling distribution is σ/√n, the standard deviation of the original population divided by the square root of the sample size - the standard error of the mean
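These properties can be checked by simulation. The sketch below draws repeated samples from an invented non-normal population (uniform on [0, 10], so μ = 5 and σ = 10/√12) and compares the spread of the sample means against σ/√n:

```python
import random
import statistics

random.seed(1)

# Invented population for illustration: uniform on [0, 10],
# so mu = 5 and sigma = 10 / sqrt(12) ~ 2.887 (not normal at all).
n = 25
reps = 20000

# Draw many samples of size n and keep each sample mean.
means = [statistics.fmean(random.uniform(0, 10) for _ in range(n))
         for _ in range(reps)]

mean_of_means = statistics.fmean(means)       # close to mu = 5
sd_of_means = statistics.stdev(means)         # close to sigma / sqrt(n)
theoretical_se = (10 / 12 ** 0.5) / n ** 0.5  # ~ 0.577
```

A histogram of `means` would also look close to normal despite the flat parent population, which is the Central Limit Theorem at work.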

  13. Standard error of mean • population SE estimated by sample SE: s/√n • measures precision of sample mean • how close the sample mean is likely to be to the true population mean

  14. Standard error of mean • If SE is low: • repeated samples would produce similar sample means • therefore, any single sample mean likely to be close to population mean • If SE is high: • repeated samples would produce very different sample means • therefore, any single sample mean may not be close to population mean

  15. Worked example Random sample of 15 quadrats from rocky shore at Cheviot Beach. Variable is number of limpets. 4, 2, 2, 2, 9, 15, 19, 34, 16, 26, 20, 2, 2, 9, 8 Sample mean 11.33 Sample median 9 Sample variance 100.67 Sample SD 10.03 SE of mean 2.59
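These summary statistics can be reproduced with Python's standard library (`statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the slide):

```python
import statistics

# Limpet counts from the 15 quadrats.
limpets = [4, 2, 2, 2, 9, 15, 19, 34, 16, 26, 20, 2, 2, 9, 8]

mean = statistics.fmean(limpets)     # 11.33
median = statistics.median(limpets)  # 9
var = statistics.variance(limpets)   # 100.67 (n - 1 denominator)
sd = statistics.stdev(limpets)       # 10.03
se = sd / len(limpets) ** 0.5        # 2.59
```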

  16. Interval estimate • How confident are we in a single sample estimate of μ, i.e. how close do we think our sample mean is to the unknown population mean? • Remember μ is a fixed, but unknown, value • An interval (range of values) within which we are 95% (for example) sure μ occurs - a confidence interval

  17. Distribution of sample means [figure: distribution of P(ȳ) against ȳ, with the central 95% and 99% regions marked] • Calculate the proportion of sample means within a range of values • Transform the distribution of means to a normal distribution with mean = 0 and variance = 1

  18. t statistic • Transform any sample mean to the equivalent value from a distribution of sample means with a mean of 0 and standard deviation of 1: t = (ȳ - μ) / (s/√n)

  19. The t statistic • This t statistic follows a t distribution, which has a mathematical formula • Same as the normal distribution for n > 30; otherwise flatter and more spread out than the normal distribution • Different t distributions for different sample sizes < 30 (actually df, which is n - 1) • The proportions of t values between particular t values, from t distributions with different df, are tabulated in most stats books and programmed into stats software

  20. [figure: t distribution, Pr(t) against t, with 95% of the area between -2.78 and +2.78] For n = 5 (df = 4), 95% of all t values occur between t = -2.78 and t = +2.78 • Probability is 95% that t is between -2.78 and +2.78 • Probability is 95% that (ȳ - μ)/(s/√n) is between -2.78 and +2.78 • Rearrange the equation to solve for μ

  21. For a 95% CI, use the t value between which 95% of all t values in the t distribution occur, for the specific df (n - 1): • This is a confidence interval • For CIs from repeated samples of size n, 95% of the CIs would contain μ and 5% wouldn't • 95% probability that this interval includes the true population mean

  22. Cheviot Beach (n = 15, df = 14) • Sample mean 11.33 • Sample SD 10.03 • SE 2.59 • The t value (95%, 14 df) = 2.15 (from a t-table) • 2.5% of t values are greater than 2.15 • 2.5% of t values are less than -2.15 • 95% of t values are between -2.15 and +2.15 • P {11.33 - 2.15 (10.03 / √15) < μ < 11.33 + 2.15 (10.03 / √15)} = 0.95 • P {5.78 < μ < 16.89} = 0.95
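The same interval can be computed directly from the raw limpet counts; using the unrounded t value 2.145 reproduces the slide's limits:

```python
limpets = [4, 2, 2, 2, 9, 15, 19, 34, 16, 26, 20, 2, 2, 9, 8]
n = len(limpets)

mean = sum(limpets) / n
sd = (sum((y - mean) ** 2 for y in limpets) / (n - 1)) ** 0.5
se = sd / n ** 0.5

# t(95%, 14 df) to three decimals; the slide rounds it to 2.15.
t = 2.145

lower = mean - t * se  # 5.78
upper = mean + t * se  # 16.89
```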

  23. Confidence interval • The interval 5.78 - 16.89 will contain μ 95% of the time • We are 95% confident that the interval 5.78 - 16.89 contains μ

  24. Aim: • To obtain a narrow interval (range) for a given level of confidence (e.g. 95%), given s and n • Sample (n = 15) with mean of 11.33 and s of 10.03: • t for a 95% CI with 14 df is 2.15 • 95% CI is 5.78 to 16.89 (from the previous slide), i.e. a width of 11.11

  25. 1. Different estimates of s • Sample (n = 15) with a mean of 11.33 and s of 5.01 (half of before): • t for a 95% CI with 14 df is 2.15 • 95% CI is 8.55 to 14.11, i.e. a width of 5.56 (cf. 11.11) • So less variability in the population (and sample) results in a narrower interval - we are 95% confident that a narrower interval contains μ

  26. 2. Different sample sizes • Sample (n = 30) with mean of 11.33 and s of 10.03: • t for a 95% CI with 29 df is 2.05 • 95% CI is 7.58 to 15.08, i.e. a width of 7.50 (cf. 11.11) • So increasing the sample size narrows the interval, because we have a better estimate of s and therefore of the SE

  27. 3. Different level of confidence (e.g. 99%) • Sample (n = 15) with mean of 11.33 and s of 10.03: • t for a 99% CI with 14 df is 2.98 • 99% CI is 3.62 to 19.05, i.e. a width of 15.43 (cf. 11.11) • So requiring a greater level of confidence results in a wider interval for a given n and s
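The three comparisons above reduce to one formula for the interval width, 2 t s/√n; the sketch below confirms how each change moves the width (t values taken from a t-table; the slide's widths differ in the last decimal because it rounds t first):

```python
def ci_width(s, n, t):
    # Width of the confidence interval: 2 * t * SE = 2 * t * s / sqrt(n)
    return 2 * t * s / n ** 0.5

base   = ci_width(10.03, 15, 2.145)  # 95% CI, original sample: ~11.11
half_s = ci_width(5.01, 15, 2.145)   # halved s: narrower
more_n = ci_width(10.03, 30, 2.045)  # doubled n: narrower
conf99 = ci_width(10.03, 15, 2.977)  # 99% confidence: wider
```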

  28. Estimating other parameters • Logic of interval estimation of population mean using t-distribution works for many other population parameters if we know: • exact formula for standard deviation of statistic, i.e. standard error • sampling distribution of (statistic divided by SE) follows a t-distribution
