Covering key statistical concepts such as mean, median, standard deviation, and linear regression for beginners. Learn to create tables, graphs, and interpret statistical data effectively.
Statistics: Data Analysis and Presentation Fr Clinic II
Overview • Tables and Graphs • Populations and Samples • Mean, Median, and Standard Deviation • Standard Error & 95% Confidence Interval (CI) • Error Bars • Comparing Means of Two Data Sets • Linear Regression (LR)
Warning • Statistics is a huge field; I’ve simplified considerably here. For example: • Mean, Median, and Standard Deviation • There are alternative formulas • Standard Error and the 95% Confidence Interval • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means rather than a single mean) • Error Bars • Don’t go beyond the interpretations I give here! • Comparing Means of Two Data Sets • We cover only the t test for two means when the variances are unknown but equal; there are other tests • Linear Regression • We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!
Tables • Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters (table not reproduced) • Consistent format, title, units, big fonts • Differentiate headings, number columns
Figures • Figure 1: Turbidity of Pond Water, Treated and Untreated (figure not reproduced) • Consistent format, title, units • Good axis titles, big fonts
Populations and Samples • Population • All of the possible outcomes of experiment or observation • US population • Particular type of steel beam • Sample • A finite number of outcomes measured or observations made • 1000 US citizens • 5 beams • We use samples to estimate population properties • Mean, Variability (e.g. standard deviation), Distribution • Height of 1000 US citizens used to estimate mean of US population
Mean and Median • Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10 • Mean = sum of values divided by number of samples = (1+3+3+6+8+10)/6 = 5.2 NTU • Median = the middle number of the ranked values (ranks 1 to 6: 1, 3, 3, 6, 8, 10) • For an even number of sample points, average the middle two = (3+6)/2 = 4.5 NTU • Excel: Mean – AVERAGE; Median – MEDIAN
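As a cross-check on the arithmetic in this slide, here is a minimal Python sketch using the standard `statistics` module in place of Excel's AVERAGE and MEDIAN. The variable names are illustrative and are not from the slides.

```python
import statistics

# Turbidity readings from the slide, in NTU.
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# statistics.mean plays the role of Excel's AVERAGE.
mean_ntu = statistics.mean(turbidity_ntu)

# statistics.median averages the two middle values when the count is even,
# which is the rule the slide applies by hand.
median_ntu = statistics.median(turbidity_ntu)

print(f"mean = {mean_ntu:.1f} NTU")
print(f"median = {median_ntu:.1f} NTU")
```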
Variance • Measure of variability • Sum of the squared deviations about the mean, divided by the degrees of freedom: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ • n = number of data points; degrees of freedom = n − 1 • Excel: variance – VAR
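The definition above can be sketched in Python for the turbidity data used earlier (1, 3, 3, 6, 8, 10 NTU). This sketch assumes the degrees of freedom are n − 1, the sample-variance convention that Excel's VAR and Python's `statistics.variance` both follow.

```python
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]
n = len(turbidity_ntu)
mean_ntu = sum(turbidity_ntu) / n

# Sum of the squared deviations about the mean.
sum_sq_dev = sum((x - mean_ntu) ** 2 for x in turbidity_ntu)

# Divide by the degrees of freedom (n - 1) to get the sample variance.
variance_by_hand = sum_sq_dev / (n - 1)

# Library result for comparison; it also divides by n - 1.
variance_lib = statistics.variance(turbidity_ntu)

print(f"by hand: {variance_by_hand:.2f} NTU^2")
print(f"library: {variance_lib:.2f} NTU^2")
```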
Standard Deviation, s • Square root of the variance • For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean • Area under the curve is the probability of getting a value within the specified range • Excel: standard deviation – STDEV • (Figure: normal curve plotted against standard deviations from the mean, with the central 95% between −1.96 and 1.96)
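A short Python sketch of both points on this slide: the standard deviation as the square root of the variance, and the 95% area within 1.96 standard deviations on a normal curve. It uses `statistics.NormalDist` (Python 3.8 or later). The turbidity values are reused from the mean and median slide purely as sample input.

```python
import math
import statistics
from statistics import NormalDist

turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Sample standard deviation (the role of Excel's STDEV) ...
std_ntu = statistics.stdev(turbidity_ntu)
# ... is the square root of the sample variance.
std_from_variance = math.sqrt(statistics.variance(turbidity_ntu))

# Area under the standard normal curve between -1.96 and +1.96.
standard_normal = NormalDist(mu=0, sigma=1)
central_area = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"s = {std_ntu:.2f} NTU")
print(f"area within 1.96 standard deviations = {central_area:.3f}")
```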
Standard Error of Mean • Standard deviation of the mean • Of a sample of size n • Taken from a population with standard deviation s: $SE = s / \sqrt{n}$ • The estimate of the mean depends on the sample selected • As n increases, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves • As n increases, the distribution of the mean estimate approaches normal, regardless of the population distribution
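A minimal sketch of the standard error, assuming the usual formula s divided by the square root of n. The helper name `standard_error` is made up for this example. The second calculation illustrates the point about sample size: with the same s, four times as many data points halves the standard error.

```python
import math
import statistics

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

turbidity_ntu = [1, 3, 3, 6, 8, 10]
se_ntu = standard_error(turbidity_ntu)

# Hypothetical comparison: same s, but four times as many data points.
s = statistics.stdev(turbidity_ntu)
se_with_4n = s / math.sqrt(4 * len(turbidity_ntu))

print(f"standard error with n = 6:  {se_ntu:.2f} NTU")
print(f"standard error with n = 24: {se_with_4n:.2f} NTU")
```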
95% Confidence Interval (CI) for Mean • Interval within which we are 95% confident the true mean lies: $\bar{x} \pm t_{95\%,n-1} \, s / \sqrt{n}$ • t95%,n-1 is the t-statistic for the 95% CI if sample size = n • If n ≥ 30, let t95%,n-1 = 1.96 (Normal Distribution) • Otherwise, use Excel formula: TINV(0.05,n-1) • n = number of data points
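A sketch of the 95% CI for the six turbidity readings used earlier, assuming the interval is the mean plus or minus t times the standard error. Python's standard library has no equivalent of TINV, so the t value for 5 degrees of freedom is entered by hand as 2.571 (the rounded table value). Treat that constant as an assumption to check against Excel.

```python
import math
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]
n = len(turbidity_ntu)

# Assumption: 2.571 is the rounded two-tailed 95% t value for
# n - 1 = 5 degrees of freedom, i.e. what TINV(0.05, 5) reports.
T_95_DF5 = 2.571

mean_ntu = statistics.mean(turbidity_ntu)
std_err = statistics.stdev(turbidity_ntu) / math.sqrt(n)

# Half-width of the interval: t times the standard error.
half_width = T_95_DF5 * std_err
ci_lower = mean_ntu - half_width
ci_upper = mean_ntu + half_width

print(f"mean = {mean_ntu:.2f} NTU, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f}) NTU")
```

With only six points, n is well below 30, so the t value is noticeably larger than 1.96 and the interval is wide.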
Error Bars • Show data variability on plot of mean values • Types of error bars include: • ± Standard Deviation, ± Standard Error, ± 95% CI • Maximum and minimum value
Using Error Bars to compare data • Standard Deviation • Demonstrates data variability, but no comparison possible • Standard Error • If bars overlap, any difference in means is not statistically significant • If bars do not overlap, indicates nothing! • 95% Confidence Interval • If bars overlap, indicates nothing! • If bars do not overlap, difference is statistically significant • We’ll use 95 % CI
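The 95% CI rule above can be written as a small check. This is only a sketch: the function names and the example means and half-widths are hypothetical, and the returned strings repeat the slide's interpretations without adding any.

```python
def bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
    """True if the two error bars (mean +/- half-width) overlap."""
    return abs(mean_a - mean_b) <= half_width_a + half_width_b

def interpret_ci_bars(mean_a, half_width_a, mean_b, half_width_b):
    """Apply the slide's rule for 95% CI error bars."""
    if bars_overlap(mean_a, half_width_a, mean_b, half_width_b):
        return "bars overlap: indicates nothing"
    return "bars do not overlap: difference is statistically significant"

# Hypothetical means and 95% CI half-widths, in NTU.
print(interpret_ci_bars(2.0, 0.3, 3.0, 0.4))
print(interpret_ci_bars(2.0, 0.6, 3.0, 0.5))
```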
Example 1 Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.
What can we do? • Plot mean water quality data for various filters with error bars • Plot mean water quality over time with error bars
Comparing Filter Performance • Use the t test to determine whether the means of two populations are different. • Based on two data sets • E.g., turbidity produced by two different filters
Comparing Two Data Sets using the t test • Example - You pump 20 gallons of water through filter 1 and 2. After every gallon, you measure the turbidity. • Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20 • Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20 • You ask the question - Do the Filters make water with a different mean turbidity?
Do the Filters make different water? • Use TTEST (Excel) • It returns the fractional probability of being wrong if you answer yes • We want this probability to be small: 0.01 to 0.10 (1 to 10%). Use 0.01
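Excel's TTEST needs the raw measurements, but the filter example gives only summary values (mean, s, n). The sketch below assumes the equal-variance (pooled) two-sample t test named in the Warning slide and computes the t statistic directly from those summaries. Because the standard library has no t distribution, the two-sided probability is approximated by integrating the t density with Simpson's rule. All function names here are illustrative.

```python
import math

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom for two means, equal variances assumed."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(t, df):
    """Probability density of Student's t distribution."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t_stat, df, steps=20000):
    """Probability of a |t| at least this large (Simpson's rule, even step count)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)

# Summary values from the filter example (turbidity in NTU).
t_stat, df = pooled_t_statistic(2.0, 0.5, 20, 3.0, 0.6, 20)
p_value = two_sided_p_value(t_stat, df)

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"p = {p_value:.2g}")
print("different at the 0.01 level" if p_value < 0.01 else "no difference shown")
```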
“t test” Questions • Do two filters make different water? • Take multiple measurements of a particular water quality parameter for 2 filters • Do two filters treat different amounts of water between cleanings? • Measure the amount of water filtered between cleanings for two filters • Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated? • For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated
Linear Regression • Fit the best straight line to a data set • Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².
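R² - Coefficient of Multiple Determination • $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$ • ŷi = predicted y values, from the regression equation • yi = observed y values • ȳ = mean of the observed y values • R² = fraction of variance explained by the regression (variance = standard deviation squared) • R² = 1 if the data lie along a straight line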
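To show what the trendline option computes, here is a sketch of simple least-squares regression in plain Python. The x and y values are made up for illustration and do not come from the slides. R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y.

```python
def simple_linear_regression(x, y):
    """Least-squares slope, intercept, and R^2 for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Slope = sum of cross-deviations / sum of squared x deviations.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    predicted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data set, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

slope, intercept, r_squared = simple_linear_regression(x_values, y_values)
print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.2f}")
```

Feeding in points that lie exactly on a line, such as y = 2x + 1, gives R² = 1, matching the last bullet of the slide.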