Data Freshman Clinic II
Overview • Populations and Samples • Presentation • Tables and Figures • Central Tendency • Variability • Confidence Intervals • Error Bars • Student t test • Linear Regression • Applications
Populations and Samples • Population • All possible data points • Entire US population • Every rainfall event in Glassboro (past, present, and future) • Sample • Subset of population • We use samples to estimate population parameters
Presentation • Present clearly, objectively • Properly communicate uncertainty • Compare using valid statistics
Tables Table 1: Water Quality (average of 3 to 5 values)
Figures – Bar Chart Figure 1: Average Turbidity of Pond Water, Treated and Untreated
Figures – XY Scatter Figure 2: Change in Water Quality
Central Tendency • Example: Turbidity of Treated Water (NTU) • Sample is 1, 3, 3, 6, 8, 10 n = 6 Mean = Sum of values divided by number of data points e.g., (1+3+3+6+8+10)/6 = 5.17 NTU Median = The middle number Rank - 1 2 3 4 5 6 Number - 1 3 3 6 8 10 (ordered) For even number of sample points, average middle two e.g., (3+6)/2 = 4.5 For odd number of sample points, median = middle point
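The mean and median above can be checked with a minimal Python sketch. It uses only the standard-library `statistics` module and the six-point sample from the slide; the variable names are illustrative, not part of the original material.

```python
from statistics import mean, median

# Turbidity of treated water (NTU): the six-point sample from the slide
turbidity = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity)      # sum of values divided by number of points
sample_median = median(turbidity)  # averages the middle two values when n is even

print(f"mean   = {sample_mean:.2f} NTU")
print(f"median = {sample_median} NTU")
```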
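Variability • Standard deviation of a sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ where $x_i$ = ith data point, $\bar{x}$ = mean of sample, n = number of data points • e.g., $s = \sqrt{[(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2] / (6-1)} = 3.43$ NTU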
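As a rough check of the sample standard deviation, the sketch below computes it twice: once term by term with the n - 1 denominator, and once with the standard-library `statistics.stdev`, which uses the same denominator. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, sample from the slide

n = len(turbidity)
x_bar = mean(turbidity)

# Sum of squared deviations from the sample mean, divided by n - 1
manual_s = math.sqrt(sum((x - x_bar) ** 2 for x in turbidity) / (n - 1))

# Library version of the same sample standard deviation
library_s = stdev(turbidity)

print(f"s (manual)  = {manual_s:.2f} NTU")
print(f"s (library) = {library_s:.2f} NTU")
```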
Confidence Interval of Mean • Estimated range within which population mean falls • e.g., 95% confidence interval of mean, based on our sample, is (1.57, 8.77), i.e., $1.57 \le \mu \le 8.77$, where $\mu$ = population mean • We are 95% confident true mean of population (from which our sample was drawn) lies within this range • Confidence interval (CI) calculated from sample: $\bar{x} \pm t \, s / \sqrt{n}$, where $\bar{x}$ = sample mean, t = statistical parameter related to confidence, s = sample standard deviation, and n = sample size
Calculating “t” • In Excel, type “=TINV” into a cell and select the “=” symbol in the formula bar • The Student’s t-distribution inverse formula palette pops up • “Probability” = 1 – confidence level (as a fraction), e.g., if confidence level is 95%, “probability” = 1 – 0.95 = 0.05 • “Deg_freedom” = degrees of freedom = n – 1 • TINV returns “t”, the statistical parameter we need to estimate a confidence interval based on a sample
Calculating a Confidence Interval • For our example: • “TINV” returned 2.57 • t x s / sqrt(n) = 2.57 x 3.43 / sqrt(6) = 3.60 • 5.17 – 3.60 = 1.57 • 5.17 + 3.60 = 8.77 • CI: (1.57, 8.77) with 95% confidence • i.e., we are 95% confident the population mean lies between 1.57 and 8.77 • Quite wide! • Lower “s” or higher “n” will narrow the range
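The confidence-interval arithmetic above can be sketched in Python. This is only an illustration: the t value of 2.57 is copied from the slide (the TINV result for probability 0.05 and 5 degrees of freedom) rather than computed, and the helper name `confidence_interval` is hypothetical.

```python
import math
from statistics import mean, stdev

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, sample from the slide
t_value = 2.57                   # taken from the slide's TINV result, not computed here

def confidence_interval(data, t):
    """Return (low, high) = mean -/+ t * s / sqrt(n) for a sample."""
    n = len(data)
    half_width = t * stdev(data) / math.sqrt(n)
    centre = mean(data)
    return centre - half_width, centre + half_width

low, high = confidence_interval(turbidity, t_value)
print(f"95% CI: ({low:.2f}, {high:.2f}) NTU")
```

Because the half-width scales with s / sqrt(n), rerunning the helper with a larger or less scattered sample shows the interval narrowing.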
Error Bars • Used to show data variability on a graph • Bar chart, XY,…
Types of Error Bars • Standard Error of Mean • Confidence Interval • Standard Deviation • Percentage http://www.graphpad.com/articles/errorbars.htm
Adding Error Bars 1. Create chart in Excel 2. Select a data series by selecting a data point or bar 3. From “Format” menu, select “Selected data series…” 4. Select “custom” 5. Select + and – error bar data. This could be standard deviation, standard error, or confidence limits.
Error Bars and our Example • Standard Error of Mean • s / sqrt(n) = 3.43 / sqrt(6) = 1.40 • Put 1.40 in + and - cells • Since the mean = 5.17, the error bars in a bar chart would go from • 5.17 – 1.40 = 3.77 to • 5.17 + 1.40 = 6.57
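The standard-error bars for the example can be reproduced with a short sketch. It assumes the same six-point turbidity sample used earlier in the deck; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, sample from the slide

# Standard error of the mean: s / sqrt(n)
standard_error = stdev(turbidity) / math.sqrt(len(turbidity))

# Ends of the error bar drawn around the sample mean
bar_low = mean(turbidity) - standard_error
bar_high = mean(turbidity) + standard_error

print(f"SE = {standard_error:.2f} NTU")
print(f"error bar from {bar_low:.2f} to {bar_high:.2f} NTU")
```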
Interpreting Error Bars • Error bars can be used to compare two sample means • Standard Error (SE) • If SE bars do not overlap, no conclusions can be drawn • If SE bars overlap, the samples do not appear to be drawn from significantly different populations • Confidence Interval (CI) • If CI bars do not overlap, the samples appear to be drawn from significantly different populations, at the confidence level of the confidence interval • If CI bars overlap, no conclusions can be drawn http://www.graphpad.com/articles/errorbars.htm
Comparing Samples with a t-test • Example - You measure untreated and treated pond water • Treated: mean = 2 NTU, s = 0.5 NTU, n = 20 • Untreated: mean = 3 NTU, s = 0.6 NTU, n = 20 • You ask the question – Is the average turbidity of treated water different from that of untreated water? • Use a t-test
Is the water different? • Use TTEST (Excel) • TTEST returns the probability (as a fraction) of being wrong if you claim a statistically significant difference (Type I error) • Select significance level ahead of time, usually 0.01 to 0.1 • For our example, the returned value, 0.0000015, is very small
T test steps • Identify two samples to compare • Select α, the significance level of the statistical test • We’ll use 0.05 in this class • Confidence = 1 – α • Use Excel “TTEST” formula to estimate probability of Type I Error • If probability returned by TTEST is less than or equal to 0.05, assume the samples come from two different populations • For our example, 0.0000015 < 0.05, so assume the treated water is different from the untreated water
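The decision rule above can be illustrated outside Excel with a standard-library Python sketch. It is not a reimplementation of TTEST, which works on raw data. It assumes a two-tailed, equal-variance (pooled) t-test applied to the summary statistics of the treated and untreated samples, and it obtains the probability by numerically integrating the Student t density with Simpson's rule. All function names are hypothetical.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed probability via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)

# Summary statistics from the slides: treated vs. untreated pond water (NTU)
t_stat, df = pooled_t(2.0, 0.5, 20, 3.0, 0.6, 20)
p_value = two_tailed_p(t_stat, df)

alpha = 0.05
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.2e}")
print("different populations" if p_value <= alpha else "no conclusion")
```

The pooled form is used only because the slides do not say which TTEST type was chosen; an unequal-variance test would change the degrees of freedom and therefore the probability.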
Linear Regression • Fit the best straight line to a data set • Right-click on data point and use “trendline” option • Use “options” tab to show equation and R²
R² - Coefficient of Multiple Determination • $R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$ where $\hat{y}_i$ = predicted y values, from regression equation, $\bar{y}$ = average of y, $y_i$ = observed y values • R² = fraction of variance explained by regression (variance = standard deviation squared) • R² = 1 if data lie along a straight line
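To make the idea of R² as the fraction of variance explained concrete, here is a small least-squares sketch in plain Python. The flow-rate data are invented purely for illustration and are not measurements from the class. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * x for x in xs]
    # Variation of predicted values about the mean, over total variation
    ss_explained = sum((p - y_bar) ** 2 for p in predicted)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, ss_explained / ss_total

# Hypothetical data: stroke rate (strokes/min) vs. flow rate (L/min)
stroke_rate = [10, 20, 30, 40, 50]
flow_rate = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(stroke_rate, flow_rate)
print(f"y = {slope:.3f} x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

Feeding the helper points that lie exactly on a line, such as x = 1, 2, 3 with y = 2, 4, 6, should give an R² of 1 to within rounding.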
What might you do in this class? • Flow rate versus stroke rate • Figure with linear regression over linear range • Ability to improve water quality • Table and t-test comparison with untreated water (for turbidity and apparent color), or • Bar chart (for turbidity and apparent color) with confidence interval error bars • Pressure change versus flow rate, power versus flow rate • Figure (no statistics possible because we only took one reading of pressure for each flow rate and the relationship is non-linear) • Force versus stroke rate • Figure w/95% confidence interval error bars for each data point • Power versus flow rate • Figure
Example – Water Quality Table 2: Improvement in Water Quality Note: Statistical significance tested at α = 0.05 using t-test