Data Analysis I Anthony E. Butterfield CH EN 4903-1 "When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason." ~ Thucydides (460 – 395 BC)
Data Analysis • Reasons for data analysis using our π data. • Basics of data analysis. • Statistics. • Probability distributions. • Confidence Intervals. • Error Propagation. • Rejecting data. • Hypothesis Testing. • Fitting data. http://www.che.utah.edu/~geoff/writing/index.html
Analysis of Our Experiment • Hypothesis: Stuff that looks like a circle is a circle. • We have our data… What now? Is the hypothesis true?
Object Name | Width | Perimeter
Battery | 4.4 ± 0.1 | 14.0 ± 0.1
Scotch Tape | 2.6 ± 0.0 | 8.2 ± 0.0
Duct Tape | 5.3 ± 0.1 | 16.8 ± 0.3
Floppy | 6.3 ± 0.1 | 19.0 ± 1.0
Fitting | 8.8 ± 0.0 | 27.7 ± 0.2
Gold Doubloon | 3.5 ± 0.0 | 10.7 ± 0.2
Red Cap | 4.1 ± 0.5 | 12.9 ± 1.0
White Cap | 4.0 ± 0.0 | 12.5 ± 0.0
Black Cap | 7.8 ± 0.0 | 24.6 ± 0.0
Soup Can | 6.7 ± 0.5 | 21.3 ± 0.1
Frisbee | 8.8 ± 0.1 | 27.8 ± 0.5
Poker Chip | 27.0 ± 0.5 | 85.0 ± 1.0
Toy Wheel | 5.6 ± 0.1 | 17.1 ± 0.2
Spool of Wire | 25.9 ± 0.1 | 81.5 ± 0.1
Plastic Cup | 9.8 ± 0.0 | 31.4 ± 0.0
Paper Cup | 2.9 ± 0.0 | 9.3 ± 0.0
Results from Our Experiment • Good news: The average “π” we found is pretty close to π. • But is it close enough? • Other issues: Precision, accuracy, types of error?
For or Against • “π” ≈ π • Confidence in our hypothesis is increased. • Nothing is “proven”. • Publish results: A. E. Butterfield, et al., “The Circularity of Circular Looking Stuff”, Nature, 2009. • “π” does not ≈ π • Confidence in our hypothesis is diminished. • Going against robust “theory”: Check methods, calculations, take more data… • Good luck publishing…
Data Analysis, Big Picture • We need an objective means to avoid Thucydides' criticism, and impartially choose whether our data supports or undermines our favored hypothesis. • "The method of science, as stodgy and grumpy as it may seem, is far more important than the findings of science." ~ Carl Sagan, The Demon-Haunted World
Types of Data Analysis • Quality vs Quantity • Quantitative • “The temperature is 45.2 ± 0.1 °C (95% CL).” • Semi-Quantitative • “The temperature is above 0 °C.” • Qualitative • “It’s hot.” • Structural Analysis – What is its structure? • Content Analysis – What is in it? • Distribution Analysis – Where is it? • Process Analysis – When does it occur?
Basics of Statistics • Mean: • Deviation: • Standard Deviation: • Variance:
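The four quantities named above can be sketched on the π data from the experiment table. This is a minimal illustration, not the slide's own equations: the variable names are made up here, and the choice of the sample form (dividing by N − 1 rather than N) is an assumption, since the original formulas are not shown.

```python
import math

# Width and perimeter columns from the experiment table (units are not given there)
widths = [4.4, 2.6, 5.3, 6.3, 8.8, 3.5, 4.1, 4.0,
          7.8, 6.7, 8.8, 27.0, 5.6, 25.9, 9.8, 2.9]
perimeters = [14.0, 8.2, 16.8, 19.0, 27.7, 10.7, 12.9, 12.5,
              24.6, 21.3, 27.8, 85.0, 17.1, 81.5, 31.4, 9.3]

# Each object gives one estimate of pi: perimeter / width
pi_estimates = [p / w for p, w in zip(perimeters, widths)]
n = len(pi_estimates)

mean = sum(pi_estimates) / n                          # mean
deviations = [x - mean for x in pi_estimates]         # deviation of each point
variance = sum(d * d for d in deviations) / (n - 1)   # variance (assumed sample form)
std_dev = math.sqrt(variance)                         # standard deviation

print(f"mean = {mean:.4f}, std dev = {std_dev:.4f}, variance = {variance:.5f}")
```

Because the table values are rounded, the mean computed this way may differ in the last digits from any figure quoted elsewhere in the slides.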
Discrete Probability Distributions • Random variable x can take on n different values, x1, x2…, xn, with probabilities of P1, P2…, Pn, respectively. • Examples:
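The slide's own examples are not shown, so here is a hypothetical one: a fair six-sided die, where x takes the values 1 through 6, each with probability 1/6. The sketch checks that the probabilities sum to one and computes the probability-weighted mean and variance.

```python
from fractions import Fraction

# Hypothetical example (not from the slides): a fair six-sided die
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * len(values)

total_probability = sum(probs)  # must equal 1 for a valid distribution

# Probability-weighted mean and variance of the discrete distribution
expected = sum(x * p for x, p in zip(values, probs))
variance = sum((x - expected) ** 2 * p for x, p in zip(values, probs))

print(total_probability, expected, variance)
```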
Continuous Probability Distributions • A probability density function that describes the probability that a continuous variable will fall within a particular range. • Examples:
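As a rough sketch of "probability of falling within a range", the code below integrates a density over an interval. The exponential distribution is chosen only for illustration (the slide's examples are not shown), and the midpoint rule is one simple way to do the integral.

```python
import math

def exponential_pdf(x, rate=1.0):
    """PDF of an exponential distribution (illustrative choice of distribution)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def probability_between(pdf, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of pdf from a to b."""
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

p = probability_between(exponential_pdf, 1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form for rate = 1

print(f"numerical = {p:.6f}, exact = {exact:.6f}")
```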
Central Limit Theorem • The sum of a sufficiently large number of independent and identically distributed random variables (with finite variance) is approximately normally distributed, regardless of the original distribution.
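A quick simulation can make this concrete. The sketch below sums uniform random numbers (a distinctly non-normal starting distribution) and compares the result with what a normal distribution would predict. The number of terms, number of trials, and seed are arbitrary choices.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_TERMS = 30     # variables per sum (arbitrary choice)
N_TRIALS = 5000  # number of sums drawn (arbitrary choice)

sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_TRIALS)]

mean = sum(sums) / N_TRIALS
std = math.sqrt(sum((s - mean) ** 2 for s in sums) / (N_TRIALS - 1))

# Each uniform(0, 1) term has mean 1/2 and variance 1/12
expected_mean = N_TERMS / 2
expected_std = math.sqrt(N_TERMS / 12)

# A normal distribution puts about 68% of values within one sigma of the mean
within_one_sigma = sum(abs(s - mean) <= std for s in sums) / N_TRIALS

print(f"mean {mean:.3f} (expected {expected_mean:.3f})")
print(f"std  {std:.3f} (expected {expected_std:.3f})")
print(f"fraction within one sigma: {within_one_sigma:.3f}")
```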
Normal Distribution • AKA: Gaussian distribution, bell curve. • One of the most common distributions in nature and, therefore, data analysis. • Probability density function (PDF):
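The equation itself is missing from this transcript. The standard form of the normal PDF, with mean μ and standard deviation σ, is given here for reference:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```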
Normal Cumulative Distribution Function • Integrate PDF from -∞ to x. • The probability that a value will be below x.
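A minimal sketch of the CDF, computed two ways: by integrating the PDF numerically and by the closed form in terms of the error function. The function names are made up for this example, and the lower limit of −∞ is approximated by μ − 10σ.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_cdf_numeric(x, mu=0.0, sigma=1.0, steps=20_000):
    """Midpoint-rule integral of the PDF; -infinity is approximated by mu - 10 sigma."""
    lower = mu - 10.0 * sigma
    h = (x - lower) / steps
    return sum(normal_pdf(lower + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

closed_form = normal_cdf(1.0)
numeric = normal_cdf_numeric(1.0)
print(f"closed form = {closed_form:.6f}, numeric = {numeric:.6f}")
```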
Normal Examples • What is the probability, with μ = 0 and σ = 1, of the measurement being exactly 0? 0% • What is the probability of measuring a value between -0.5 and 1.5, with μ = 0 and σ = 1? 62% • Between -1 and 3, with μ = 0 and σ = 2? 62%
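The three examples can be checked with the normal CDF written in terms of the error function. This is a small sketch with helper names chosen for this example. Note that the third case is the second one rescaled: with σ = 2, the limits -1 and 3 sit at -0.5σ and 1.5σ.

```python
import math

def normal_cdf(x, mu, sigma):
    """Probability that a normal variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """Probability that a normal variable falls between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Exactly 0: an interval of zero width has zero probability
p_exactly_zero = prob_between(0.0, 0.0, mu=0.0, sigma=1.0)

# Between -0.5 and 1.5 with mu = 0, sigma = 1
p_second = prob_between(-0.5, 1.5, mu=0.0, sigma=1.0)

# Between -1 and 3 with mu = 0, sigma = 2
p_third = prob_between(-1.0, 3.0, mu=0.0, sigma=2.0)

print(f"{p_exactly_zero:.0%}, {p_second:.0%}, {p_third:.0%}")
```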
An Abnormal Distribution • Log-normal Distribution • Used when random variables multiply. • Particles often take this distribution.
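To illustrate "random variables multiply", the sketch below builds each sample as a product of random positive factors. The factor range, counts, and seed are arbitrary assumptions. The products are skewed (mean well above median), while their logarithms are roughly symmetric, which is the signature of a log-normal distribution.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N_FACTORS = 20     # factors multiplied per sample (arbitrary choice)
N_TRIALS = 10_000  # number of samples (arbitrary choice)

def random_product():
    """Product of independent positive factors drawn from uniform(0.5, 1.5)."""
    result = 1.0
    for _ in range(N_FACTORS):
        result *= random.uniform(0.5, 1.5)
    return result

products = sorted(random_product() for _ in range(N_TRIALS))
mean = sum(products) / N_TRIALS
median = products[N_TRIALS // 2]

# Taking logs turns the product into a sum, so the logs are close to normal
logs = [math.log(p) for p in products]  # still sorted, since log is increasing
log_mean = sum(logs) / N_TRIALS
log_median = logs[N_TRIALS // 2]

print(f"products: mean {mean:.3f}, median {median:.3f}")
print(f"logs:     mean {log_mean:.3f}, median {log_median:.3f}")
```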
Normal Confidence Intervals • A range that a parameter lies within, given a certain probability. • Confidence intervals for normal distributions:
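As a sketch, here is a 95% confidence interval for the mean of the π estimates from the experiment table, using the normal distribution as this slide does. One caveat, stated as an assumption: with only 16 points, a Student's t multiplier would give a somewhat wider interval. The normal multiplier is used here because it matches the slide and is available in the standard library.

```python
import math
from statistics import NormalDist, mean, stdev

# Width and perimeter columns from the experiment table
widths = [4.4, 2.6, 5.3, 6.3, 8.8, 3.5, 4.1, 4.0,
          7.8, 6.7, 8.8, 27.0, 5.6, 25.9, 9.8, 2.9]
perimeters = [14.0, 8.2, 16.8, 19.0, 27.7, 10.7, 12.9, 12.5,
              24.6, 21.3, 27.8, 85.0, 17.1, 81.5, 31.4, 9.3]
pi_estimates = [p / w for p, w in zip(perimeters, widths)]

confidence = 0.95
# Two-sided multiplier: the z value that leaves (1 - confidence) / 2 in each tail
z = NormalDist().inv_cdf(0.5 + confidence / 2)

center = mean(pi_estimates)
# Standard error of the mean: sample standard deviation / sqrt(N)
half_width = z * stdev(pi_estimates) / math.sqrt(len(pi_estimates))
low, high = center - half_width, center + half_width

print(f"{center:.3f} ± {half_width:.3f} ({confidence:.0%} CL)")
print("interval contains pi:", low < math.pi < high)
```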
C.I. for Single Measurements • Gauges / Rulers. • Estimated by the distinguishable increments. • In our experiment? • Digital readouts. • Often ± the smallest digital precision available. • Fluctuating values. • Use the range of fluctuation over an appropriate amount of time. • Example readings: 2.41 ± 0.01 (0.4%), 3.11 ± 0.01 (0.3%), 1.27 ± 0.02 (1.6%)
Error Propagation • For addition or subtraction, intuition may be: • But it is unlikely that the extremes of error will occur twice: • Multiplication or division: • Our π data: • In general:
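The equations for these bullets are missing from the transcript. The standard forms, for independent errors, are sketched here in the same order as the bullets. The symbols x, y, z, and f are generic placeholders, not necessarily the slide's notation.

```latex
% Addition or subtraction, z = x \pm y
\text{Worst case (intuition):}\quad \sigma_z = \sigma_x + \sigma_y
\qquad
\text{Independent errors:}\quad \sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}

% Multiplication or division, z = xy or z = x/y
\frac{\sigma_z}{|z|} =
  \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2}

% The pi data, where pi is estimated as perimeter / width
\frac{\sigma_\pi}{\pi} =
  \sqrt{\left(\frac{\sigma_{\mathrm{perimeter}}}{\mathrm{perimeter}}\right)^2
      + \left(\frac{\sigma_{\mathrm{width}}}{\mathrm{width}}\right)^2}

% In general, f = f(x_1, \dots, x_n)
\sigma_f^2 = \sum_{i=1}^{n}
  \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_{x_i}^2
```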
Some Examples • Calculate interfacial tension between a liquid and a solid: • If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v? • The error in v has two terms: T’s contribution and P’s contribution.
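A sketch of the second example, assuming v is the molar volume of an ideal gas, v = RT/P (the worked equations are not shown in this transcript). Each term under the square root is one input's contribution to the variance of v.

```python
import math

R = 8.314  # gas constant in L·kPa/(mol·K)

T, sigma_T = 25.0 + 273.15, 1.0  # K (an uncertainty of 1 °C is 1 K)
P, sigma_P = 101.0, 2.0          # kPa

v = R * T / P  # molar volume in L/mol (assumed meaning of v)

# Partial derivatives of v = R T / P
dv_dT = R / P
dv_dP = -R * T / P**2

# Each input's contribution to the variance of v
T_contribution = (dv_dT * sigma_T) ** 2
P_contribution = (dv_dP * sigma_P) ** 2
sigma_v = math.sqrt(T_contribution + P_contribution)

print(f"v = {v:.2f} ± {sigma_v:.2f} L/mol")
print(f"T's contribution: {T_contribution:.4f}, P's contribution: {P_contribution:.4f}")
```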
A Better, Numerical Method • Can be used for problems which are solved numerically. • Adding σi may give a different result than subtracting it.
An Example • If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v of an ideal gas? • P is the biggest source of error.
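The slide's formula for the numerical method is not shown, so the sketch below uses a common one-at-a-time scheme as an assumption: shift each input by its σ, record the change in the result, and combine the changes in quadrature. The function and variable names are hypothetical. Running it with +σ and with −σ shows the two giving slightly different answers.

```python
import math

R = 8.314  # gas constant in L·kPa/(mol·K)

def molar_volume(T, P):
    """Ideal-gas molar volume in L/mol, for T in K and P in kPa."""
    return R * T / P

nominal = {"T": 25.0 + 273.15, "P": 101.0}
sigmas = {"T": 1.0, "P": 2.0}

def propagate(func, nominal, sigmas, sign=1.0):
    """Shift each input by sign * sigma, one at a time, and combine the
    resulting changes in func in quadrature."""
    base = func(**nominal)
    changes = {}
    for name, sigma in sigmas.items():
        shifted = dict(nominal)
        shifted[name] += sign * sigma
        changes[name] = func(**shifted) - base
    total = math.sqrt(sum(c * c for c in changes.values()))
    return base, changes, total

v, plus_changes, sigma_plus = propagate(molar_volume, nominal, sigmas, +1.0)
_, minus_changes, sigma_minus = propagate(molar_volume, nominal, sigmas, -1.0)

print(f"v = {v:.2f} L/mol")
print(f"adding sigma:      {sigma_plus:.3f}  {plus_changes}")
print(f"subtracting sigma: {sigma_minus:.3f}  {minus_changes}")
```

No derivatives are needed, so the same function works when v comes out of a numerical solver instead of a formula.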
Chauvenet’s Criterion • A statistically justifiable means of rejecting outlying data may be desired (illegitimate error). • A measurement may be rejected only if the probability of a value that far from the mean on a normal distribution, times the number of measurements, is less than 0.5. • Tossing data out is suspect, though; avoid it.
Example of Chauvenet’s Criterion • Data from our circle experiment: • We could toss the “floppy” datum. • Doing so would make our average π = 3.1465, versus 3.138. • That is further from π.
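A sketch of the criterion applied to the experiment table, using the two-sided normal tail probability and the sample standard deviation (both are assumptions about the exact form used on the slide). Because the table values are rounded, the averages computed here may differ slightly from the 3.138 and 3.1465 quoted above.

```python
import math
from statistics import mean, stdev

# name: (width, perimeter) from the experiment table
DATA = {
    "Battery": (4.4, 14.0), "Scotch Tape": (2.6, 8.2), "Duct Tape": (5.3, 16.8),
    "Floppy": (6.3, 19.0), "Fitting": (8.8, 27.7), "Gold Doubloon": (3.5, 10.7),
    "Red Cap": (4.1, 12.9), "White Cap": (4.0, 12.5), "Black Cap": (7.8, 24.6),
    "Soup Can": (6.7, 21.3), "Frisbee": (8.8, 27.8), "Poker Chip": (27.0, 85.0),
    "Toy Wheel": (5.6, 17.1), "Spool of Wire": (25.9, 81.5),
    "Plastic Cup": (9.8, 31.4), "Paper Cup": (2.9, 9.3),
}

ratios = {name: p / w for name, (w, p) in DATA.items()}
values = list(ratios.values())
n = len(values)
avg, s = mean(values), stdev(values)

def two_sided_tail(z):
    """Probability that a normal value lies at least |z| sigma from the mean."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Chauvenet: a point may be rejected if n * P(deviation this large) < 0.5
rejected = [name for name, x in ratios.items()
            if n * two_sided_tail((x - avg) / s) < 0.5]
kept = [x for name, x in ratios.items() if name not in rejected]

print("rejected:", rejected)
print(f"average with all data: {avg:.4f}")
print(f"average after rejection: {mean(kept):.4f}")
```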