Data Analysis, Presentation, and Statistics Fr Clinic I
Overview • Tables and Graphs • Populations and Samples • Mean, Median, and Standard Deviation • Standard Error & 95% Confidence Interval (CI) • Error Bars • Comparing Means of Two Data Sets • Linear Regression (LR)
Warning • Statistics is a huge field; I’ve simplified considerably here. For example: • Mean, Median, and Standard Deviation • There are alternative formulas • Standard Error and the 95% Confidence Interval • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…) • Error Bars • Don’t go beyond the interpretations I give here! • Linear Regression • We only look at simple LR and only calculate the intercept, slope, and R². There is much more to LR!
Should I Use a Table or Graph? • Tables • Presenting a large amount of different data • Comparing multiple characteristics • Graphs • Visual presentation quickly gives information • Comparing one or two characteristics • Showing trends
Tables • Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters (table not reproduced) • Consistent Format, Title, Units, Big Fonts • Differentiate Headings, Number Columns
Figures • Figure 1: Turbidity of Pond Water, Treated and Untreated (figure not reproduced) • Consistent Format, Title, Units • Good Axis Titles, Big Fonts
Graphing Suggestions • 1, 2, 5 rule • Set gradations so that the smallest division of the axis is 1, 2, or 5 times a non-negative integer power of 10 (1, 2, 5, 10, 20, 50, and so on) • Huh? • Set your scale up so that the smallest division is an integer increment.
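The 1, 2, 5 rule can be sketched in Python as below. This is only an illustration: the function name `nice_step` and the `max_divisions` parameter are hypothetical, and the sketch assumes any integer power of 10 is acceptable as the base of the step.

```python
import math

def nice_step(data_range, max_divisions=10):
    """Smallest 1-2-5 step that covers data_range in at most max_divisions steps."""
    if data_range <= 0 or max_divisions <= 0:
        raise ValueError("data_range and max_divisions must be positive")
    raw = data_range / max_divisions          # ideal step, usually not a "nice" number
    base = 10 ** math.floor(math.log10(raw))  # power of 10 at or just below raw
    for multiplier in (1, 2, 5, 10):
        step = multiplier * base
        if step >= raw:
            return step
    return 10 * base  # guard against floating-point rounding at the boundary

print(nice_step(20))  # axis spanning 20 units
print(nice_step(37))  # axis spanning 37 units
```

Rounding the ideal step up (never down) keeps the number of divisions at or below `max_divisions`.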
Graphing Suggestions • Labels • All axes should be labeled • Include units on the label • Points, lines, curves • Play around with options • Color can be your friend • Color can be your enemy
Populations and Samples • Population • All of the possible outcomes of an experiment or observation • Examples: the US population; a particular type of steel beam • Sample • A finite number of outcomes measured or observations made • Examples: 1000 US citizens; 5 beams • We use samples to estimate population properties • Mean, variability (e.g., standard deviation), distribution • Example: the heights of 1000 US citizens are used to estimate the mean height of the US population
Mean and Median • Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10 • Mean = sum of values divided by number of samples = (1+3+3+6+8+10)/6 = 5.2 NTU • Median = the middle number of the ranked values • Rank: 1 2 3 4 5 6 • Value: 1 3 3 6 8 10 • For an even number of sample points, average the middle two: (3+6)/2 = 4.5 NTU • Excel functions: AVERAGE (mean), MEDIAN (median)
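For readers working outside Excel, the same mean and median can be checked with Python's standard `statistics` module. This is a minimal sketch using the six turbidity values from the slide; the variable names are illustrative.

```python
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # treated-water turbidity values from the slide

mean_ntu = statistics.mean(turbidity_ntu)      # sum of values / number of values
median_ntu = statistics.median(turbidity_ntu)  # averages the middle two when n is even

print(f"mean   = {mean_ntu:.1f} NTU")
print(f"median = {median_ntu:.1f} NTU")
```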
Variance • Measure of variability • Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n − 1): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ • n = number of data points • Excel function: VAR (variance)
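The variance calculation can be sketched in Python as follows, assuming the degrees of freedom are n − 1 (the sample variance, which is also what Excel's VAR computes). The helper name `sample_variance` is hypothetical; `statistics.variance` is the standard-library equivalent.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # same turbidity data as the mean/median slide
print(sample_variance(turbidity_ntu))
print(statistics.variance(turbidity_ntu))  # library version, also divides by n - 1
```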
Standard Deviation, s • Square root of the variance: $s = \sqrt{s^2}$ • For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean • Area under the curve is the probability of getting a value within the specified range • Figure (not reproduced): normal curve with the central 95% of the area between -1.96 and 1.96; x-axis: Standard Deviations from Mean • Excel function: STDEV (standard deviation)
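Both claims on this slide can be checked numerically. The sketch below, which assumes Python 3.8 or later for `statistics.NormalDist`, computes s for the turbidity data and the area under the standard normal curve between -1.96 and 1.96.

```python
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]
s = statistics.stdev(turbidity_ntu)  # square root of the n - 1 variance

# Area under the standard normal curve within 1.96 standard deviations of the mean
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"s = {s:.2f} NTU")
print(f"fraction within 1.96 standard deviations = {fraction_within:.4f}")
```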
Standard Error of Mean • Standard deviation of the mean of a sample of size n taken from a population with standard deviation s: $SE = s / \sqrt{n}$ • Estimate of mean depends on sample selected • As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves • As n increases, mean estimate distribution approaches normal, regardless of population distribution
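A minimal Python sketch of the standard error, assuming the usual form s/√n with s estimated from the sample itself. The helper name `standard_error` and the value s = 2.0 in the loop are hypothetical, chosen only to show how the standard error shrinks as n grows.

```python
import math
import statistics

def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

turbidity_ntu = [1, 3, 3, 6, 8, 10]
print(f"standard error = {standard_error(turbidity_ntu):.2f} NTU")

# For a fixed s, quadrupling n halves the standard error.
s = 2.0  # hypothetical standard deviation, for illustration only
for n in (4, 16, 64):
    print(n, s / math.sqrt(n))
```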
95% Confidence Interval (CI) for Mean • Interval within which we are 95% confident the true mean lies: $\bar{x} \pm t_{95\%,n-1} \, s / \sqrt{n}$ • $t_{95\%,n-1}$ is the t-statistic for a 95% CI if sample size = n • If n ≥ 30, let $t_{95\%,n-1}$ = 1.96 (Normal Distribution) • Otherwise, use Excel formula: TINV(0.05,n-1) • n = number of data points
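The interval can be sketched in Python as mean ± t · s/√n. Python's standard library has no t-distribution, so this sketch takes the t value as an argument; the value 2.571 used below is the standard two-tailed 95% value for 5 degrees of freedom, which is what TINV(0.05, 5) returns to three decimals. The function name `confidence_interval_95` is hypothetical.

```python
import math
import statistics

def confidence_interval_95(values, t_crit):
    """Return (low, high) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width

turbidity_ntu = [1, 3, 3, 6, 8, 10]
# n = 6, so n - 1 = 5 degrees of freedom; two-tailed 95% t value is about 2.571
low, high = confidence_interval_95(turbidity_ntu, t_crit=2.571)
print(f"95% CI: {low:.2f} to {high:.2f} NTU")
```

With only six points the interval is wide; for n ≥ 30 the slide's shortcut of t = 1.96 could be passed as `t_crit` instead.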
Error Bars • Show data variability on plot of mean values • Types of error bars include: • ± Standard Deviation, ± Standard Error, ± 95% CI • Maximum and minimum value
Using Error Bars to compare data • Standard Deviation • Demonstrates data variability, but no comparison possible • Standard Error • If bars overlap, any difference in means is not statistically significant • If bars do not overlap, indicates nothing! • 95% Confidence Interval • If bars overlap, indicates nothing! • If bars do not overlap, difference is statistically significant • We’ll use 95 % CI
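The 95% CI rule above amounts to a simple overlap check, sketched here in Python. The function names and the two pairs of intervals are hypothetical and serve only to illustrate the rule; they are not data from the clinic.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def compare_by_95ci(ci_a, ci_b):
    """Apply the slide's interpretation of 95% CI error bars."""
    if intervals_overlap(ci_a, ci_b):
        return "bars overlap: no conclusion"
    return "bars do not overlap: difference is statistically significant"

# Hypothetical 95% CIs (low, high) for two data sets, for illustration only
print(compare_by_95ci((1.0, 3.0), (4.0, 6.0)))
print(compare_by_95ci((1.0, 5.0), (4.0, 6.0)))
```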
Example 1 • Create bar chart of Name vs Mean • Right-click on data • Select “Format Data Series”
Linear Regression • Fit the best straight line to a data set • Right-click on a data point and use the “trendline” option • Use the “options” tab to get the equation and R²
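R² - Coefficient of Multiple Determination • $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$ • $\hat{y}_i$ = predicted y values, from regression equation • $y_i$ = observed y values • $\bar{y}$ = mean of the observed y values • R² = fraction of variance explained by regression (variance = standard deviation squared) • R² = 1 if data lie along a straight line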
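Outside Excel, the intercept, slope, and R² of a simple linear regression can be computed with a few lines of Python. This is a sketch using the standard least-squares formulas, with R² taken as 1 minus the ratio of residual to total sum of squares; the function name and the sample data are hypothetical.

```python
def linear_regression(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Both denominators below are zero if all x (or all y) values are identical
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical data lying exactly on y = 1 + 2x, so R^2 should come out as 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope, r_squared = linear_regression(xs, ys)
print(intercept, slope, r_squared)
```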