Statistics

Statistics • Introduction
• 1.) All measurements contain random error
  • Results always have some uncertainty
• 2.) Uncertainties are used to determine whether two or more experimental results are equivalent or different
  • Statistics is used to accomplish this task
Is the mutant (transgenic) mouse significantly fatter than the normal (wild-type) mouse? Statistical methods provide unbiased means to answer such questions. Masuzaki, H., et al. Science (2001), 294(5549), 2166

Statistics • Gaussian Curve
• 1.) For a series of experimental results with only random error:
(i) A large number of experiments done under identical conditions will yield a distribution of results.
(ii) The distribution of results is described by a Gaussian, or normal, error curve.
[Figure: Gaussian error curve; x-axis: Value, y-axis: Number of Occurrences. High population about the correct value, low population far from the correct value.]

Statistics • Gaussian Curve
• 2.) Any set of data (and corresponding Gaussian curve) can be characterized by two parameters:
(i) Mean or average value ($\bar{x}$):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
where: n = number of data points, $x_i$ = value of data point number i
(ii) Standard deviation (s):
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• The smaller the standard deviation, the more precise the measurement.
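The two formulas above translate directly into code. A minimal sketch, assuming plain Python lists of numbers; the function names `mean` and `sample_std` are illustrative, not from the slides:

```python
import math

def mean(values):
    """Arithmetic mean: sum of the data points divided by n."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, using n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
```

The `n - 1` denominator matches the formula for s; dividing by n instead would describe the spread of these particular points rather than estimate the spread of the whole population.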

Statistics • Gaussian Curve
• 3.) Other Terms Used to Describe a Data Set
(i) Variance: related to the standard deviation
  • Used to describe how “wide” (how precise) a distribution of results is
  • variance = $s^2$, where s = standard deviation
(ii) Range: difference between the highest and lowest values in a set of data
  • Example: measurements of 4 light bulb lifetimes: 821, 783, 834, 855 hours
  High value = 855 hours; low value = 783 hours
  Range = high value − low value = 855 − 783 = 72 hours
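For the light bulb lifetimes, the range and variance can be checked with Python's standard `statistics` module. A small sketch; `statistics.variance` and `statistics.stdev` both use the n − 1 (sample) convention, which is assumed to match the slides' s:

```python
import statistics

# Light bulb lifetimes from the example, in hours
lifetimes = [821, 783, 834, 855]

# Range: highest value minus lowest value
value_range = max(lifetimes) - min(lifetimes)

# Sample variance and standard deviation (n - 1 in the denominator);
# the variance is the square of the standard deviation
variance = statistics.variance(lifetimes)
std_dev = statistics.stdev(lifetimes)

print(f"range = {value_range} h, s = {std_dev:.1f} h, variance = {variance:.2f} h^2")
```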

Statistics • Gaussian Curve
• 3.) Other Terms Used to Describe a Data Set
(iii) Median: the value in a set of data that has an equal number of data values above it and below it
  • For an odd number of data points, the median is the middle value
  • For an even number of data points, the median is the value halfway between the two middle values
  • Example:
  Data set: 1.19, 1.23, 1.25, 1.45, 1.51; mean ($\bar{x}$) = 1.33; median = 1.25
  Data set: 1.19, 1.23, 1.25, 1.45; mean ($\bar{x}$) = 1.28; median = 1.24
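The odd/even rule for the median can be written as a short function. A sketch only; the name `median` is illustrative, and Python's `statistics.median` implements the same rule:

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [1.19, 1.23, 1.25, 1.45, 1.51]
even_set = [1.19, 1.23, 1.25, 1.45]
print(median(odd_set), median(even_set))
```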

Statistics • Gaussian Curve
(iii) Example: For the following bowling scores, 116.0, 97.9, 114.2, 106.8, and 108.3, find the mean, median, range, and standard deviation.
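One way to check the bowling answers is with the standard `statistics` module. A minimal sketch, assuming the sample (n − 1) standard deviation used elsewhere in these slides:

```python
import statistics

scores = [116.0, 97.9, 114.2, 106.8, 108.3]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
score_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation (n - 1)

print(f"mean = {mean_score:.2f}, median = {median_score}, "
      f"range = {score_range:.1f}, s = {std_dev:.1f}")
```

This gives a mean of about 108.6, a median of 108.3, a range of 18.1, and s ≈ 7.1, the values reused in the later bowling examples.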

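Statistics • Gaussian Curve
• 4.) Relating Terms Back to the Gaussian Curve
(i) Formula for a Gaussian curve:
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
where: e = base of the natural logarithm (2.71828…), $\mu \approx \bar{x}$ (mean), $\sigma \approx s$ (standard deviation)
  • The entire area under the curve is normalized to one
[Figure: Gaussian curve with the mean and ± one standard deviation marked.]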
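The normalization claim can be checked numerically. A sketch, assuming the hypothetical helper names `gaussian` and `area` and a simple trapezoidal-rule integration (not a method from the slides); μ = 0 and σ = 1 are illustrative values:

```python
import math

def gaussian(x, mu, sigma):
    """Normalized Gaussian (normal) curve."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(mu, sigma, a, b, steps=10_000):
    """Trapezoidal-rule estimate of the area under the curve between a and b."""
    h = (b - a) / steps
    total = 0.5 * (gaussian(a, mu, sigma) + gaussian(b, mu, sigma))
    total += sum(gaussian(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

# Hypothetical parameters chosen only for illustration
mu, sigma = 0.0, 1.0
print(area(mu, sigma, -10, 10))        # whole curve: close to 1
print(area(mu, sigma, -sigma, sigma))  # within ±1 standard deviation: close to 0.683
```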

Statistics • Standard Deviation and Probability
• 1.) By knowing the standard deviation (s) and the mean ($\bar{x}$) of a set of results (and the corresponding Gaussian curve):
(i) The probability of the next result falling in any given range can be calculated by expressing the range in units of standard deviation:
$$z = \frac{x - \bar{x}}{s}$$
(ii) The probability of a result falling in a given portion of the Gaussian curve is equal to the normalized area of the curve in that portion.
(iii) Example: 68.3% of the area of a Gaussian curve lies between −1s and +1s ($\bar{x}$ ± 1s). Thus, any new result has a 68.3% chance of falling within this range.
  • The probability of measuring a value in a certain range is equal to the area of that range
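For a normal distribution, the area within ±k standard deviations of the mean has the closed form erf(k/√2), so the 68.3% figure can be reproduced without a table. A sketch using the standard library's `math.erf`; the function name `prob_within` is illustrative:

```python
import math

def prob_within(k):
    """Probability that a result lies within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k}s: {prob_within(k):.1%}")
```

This prints 68.3%, 95.4%, and 99.7% for ±1s, ±2s, and ±3s.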

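Statistics • Standard Deviation and Probability
• Tables of the Gaussian curve give the area under the curve between the mean value and a result located z standard deviations away.
• The total area of each half of the curve is 0.5.
• The remaining area beyond the result is 0.5 − area.
• Example: z = 1.3
  • Area from the mean to z = 1.3 is 0.403
  • Area from z = 1.3 to infinity is 0.5 − 0.403 = 0.097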
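The table lookup can be replaced by the error function: the area from the mean to z is ½·erf(|z|/√2). A sketch of the z = 1.3 example; the helper name `area_mean_to_z` is illustrative:

```python
import math

def area_mean_to_z(z):
    """Area under the standard Gaussian curve between the mean (z = 0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z = 1.3
a = area_mean_to_z(z)
tail = 0.5 - a  # area beyond z, since each half of the curve has area 0.5
print(f"{a:.3f} {tail:.3f}")
```

This reproduces the table values 0.403 and 0.097.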

Statistics • Standard Deviation and Probability
(iii) Example: A bowler has a mean score of 108.6 and a standard deviation of 7.1. What fraction of the bowler’s scores will be less than 80.2?
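A sketch of the bowling calculation, following the table method (convert to z, find the area from the mean, subtract from 0.5), with `math.erf` standing in for the printed table:

```python
import math

mean_score, s = 108.6, 7.1
x = 80.2

z = (x - mean_score) / s  # about -4.0 standard deviations
area_mean_to_x = 0.5 * math.erf(abs(z) / math.sqrt(2))
fraction_below = 0.5 - area_mean_to_x  # tail area below x

print(f"z = {z:.2f}, fraction below {x} = {fraction_below:.1e}")
```

With z = −4.0, the fraction is about 3.2 × 10⁻⁵, so only about 0.003% of the scores (roughly 3 in 100,000) fall below 80.2.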

Statistics • Standard Deviation and Probability
• 2.) Knowing the standard deviation (s) of a data set indicates the precision of a measurement
(i) Common intervals used for expressing analytical results are multiples of the standard deviation about the mean (±1s, ±2s, ±3s).
(ii) The precision of many analytical measurements is expressed as $\bar{x} \pm 2s$
  • There is only a ~5% chance (1 out of 20) that any given measurement on the sample will fall outside this range

Statistics • Standard Deviation and Probability
• 4.) The precision of a mean (average) result is expressed using a confidence interval
(i) The relationship between the true mean value (μ) and the measured mean ($\bar{x}$) is given by:
$$\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$$
where: s = standard deviation, n = number of measurements, t = Student’s t value, degrees of freedom = (n − 1)
  • The ± term defines the confidence interval
  • Note: as n increases, the confidence interval becomes smaller (μ becomes more precisely known)

Statistics • Standard Deviation and Probability
• 4.) The precision of a mean (average) result is expressed using a confidence interval
(ii) Student’s t
  • A statistical tool frequently used to express confidence intervals
  • Its value is read from a table using the degrees of freedom, (n − 1), from the number of measurements
  • A probability distribution that addresses the problem of estimating the mean of a normally distributed population when the sample size is small
  • The population standard deviation (σ) is unknown and has to be estimated from the data using s

Statistics • Standard Deviation and Probability
• 4.) The precision of a mean (average) result is expressed using a confidence interval
(iii) The meaning of a confidence interval
  • To determine the “true” mean, we would need to collect an infinite number of data points, which is obviously not possible
  • The confidence interval tells us the probability that the range of numbers contains the “true” mean
    • 50% confidence interval → the range of numbers contains the true mean only 50% of the time
    • 90% confidence interval → the range of numbers contains the true mean 90% of the time
[Figure: repeated confidence intervals plotted against the “true” mean; at 50% confidence, 50% of data sets do not contain the true mean.]
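The "contains the true mean 90% of the time" interpretation can be illustrated by simulation: draw many small data sets from a known population, build a 90% interval for each, and count how often the interval contains the true mean. A sketch under stated assumptions: the population (mean 100, σ = 5), the seed, and the trial count are hypothetical, and t = 2.132 is the standard two-tailed 90% value for 4 degrees of freedom, hardcoded rather than looked up:

```python
import math
import random
import statistics

N = 5               # measurements per data set (4 degrees of freedom)
T_90_DF4 = 2.132    # Student's t, 90% confidence, 4 degrees of freedom

def coverage(true_mean=100.0, sigma=5.0, trials=5000, seed=1):
    """Fraction of simulated 90% confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(N)]
        x_bar = statistics.mean(sample)
        half_width = T_90_DF4 * statistics.stdev(sample) / math.sqrt(N)
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.90
```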

Statistics • Standard Deviation and Probability
(iii) Example: For the bowling scores 116.0, 97.9, 114.2, 106.8, and 108.3, the mean is 108.6 and the standard deviation is 7.1. What is the 90% confidence interval for the mean?
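A sketch of the confidence-interval calculation μ = x̄ ± ts/√n for the bowling scores. The t value (2.132 for 90% confidence and n − 1 = 4 degrees of freedom) is a standard table value hardcoded here as an assumption, since the slides' t table is not reproduced:

```python
import math
import statistics

scores = [116.0, 97.9, 114.2, 106.8, 108.3]
n = len(scores)
t_90 = 2.132  # Student's t, 90% confidence, n - 1 = 4 degrees of freedom

x_bar = statistics.mean(scores)
s = statistics.stdev(scores)
half_width = t_90 * s / math.sqrt(n)

print(f"mu = {x_bar:.1f} ± {half_width:.1f}")
```

The result is μ = 108.6 ± 6.8 at the 90% confidence level.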

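Statistics • Standard Deviation and Probability
• 5.) Comparison of Two Data Sets
(i) To determine whether two results obtained by the same method are statistically the same, use the following formula to determine a calculated t:
$$t_{\text{calculated}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}, \qquad s_{\text{pooled}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$
where: $\bar{x}_1, \bar{x}_2$ = mean results of samples 1 & 2; $n_1, n_2$ = number of measurements of samples 1 & 2; $s_{\text{pooled}}$ = “pooled” standard deviation
  • Requires that the standard deviations of the two data sets be similar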

Statistics • Standard Deviation and Probability
• 5.) Comparison of Two Data Sets
(ii) Compare the calculated t to the corresponding value in the Student’s t table.
  • Use the desired % confidence level at the appropriate degrees of freedom
  • Degrees of freedom = (n1 + n2 − 2)
(iii) If the calculated t is greater than the value in the Student’s t table, then the two results are significantly different at the given % confidence level.
  • This is easier to achieve at a lower % confidence level
  • If the calculated t is less than or equal to the table value, the difference is not significant at that confidence level

Statistics • Standard Deviation and Probability
• 5.) Comparison of Two Data Sets
(iv) Example: The amount of ¹⁴CO₂ in a plant sample is measured to be 28, 32, 27, 39, and 40 counts/min (mean = 33.2). The amount of radioactivity in a blank is found to be 28, 21, 28, and 20 counts/min (mean = 24.25). Are the mean values significantly different at the 95% confidence level?

Statistics • Standard Deviation and Probability
• 5.) Comparison of Two Data Sets
(iv) Example (continued):
  • Degrees of freedom = (5 + 4 − 2) = 7
  • From the Student’s t table at 7 degrees of freedom and the 95% confidence level: t = 2.365
  • Calculated t (2.48) > 2.365
  • The results are significantly different at the 95% confidence level, but not at 98% or higher confidence levels
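A sketch of the pooled-standard-deviation t test applied to the plant and blank counts. The function name `pooled_t` is illustrative, and the critical value 2.365 is the table value quoted above:

```python
import math
import statistics

def pooled_t(a, b):
    """Calculated t and degrees of freedom for two data sets with similar s."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))
    t = abs(statistics.mean(a) - statistics.mean(b)) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2

plant = [28, 32, 27, 39, 40]  # counts/min
blank = [28, 21, 28, 20]      # counts/min
T_TABLE_95_DF7 = 2.365        # table value at 95% confidence, 7 degrees of freedom

t_calc, dof = pooled_t(plant, blank)
print(f"t = {t_calc:.2f}, dof = {dof}, significant: {t_calc > T_TABLE_95_DF7}")
```

Working from unrounded values gives t ≈ 2.47 (the 2.48 above comes from rounded intermediate values); either way it exceeds 2.365.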

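Statistics • Standard Deviation and Probability
• 6.) Comparison of Two Methods
(i) To determine whether two methods give the same results for the same samples, use the following formula to determine a calculated t:
$$t_{\text{calculated}} = \frac{|\bar{d}|}{s_d}\sqrt{n}, \qquad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$$
where: $d_i$ = difference between the two methods’ results for sample i; $\bar{d}$ = mean of these differences (equal to the difference in the mean values of the two methods); n = number of samples analyzed by each method; $s_d$ = standard deviation of the differences
(ii) Degrees of freedom = (n − 1)
(iii) If the calculated t is greater than the value in the Student’s t table, then the two methods are significantly different at the given % confidence level.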
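The paired comparison works on the per-sample differences rather than on the two data sets separately. A minimal sketch; the function name `paired_t` and its parameter names are hypothetical, and it takes the two methods' results as equal-length lists in the same sample order:

```python
import math
import statistics

def paired_t(method_a, method_b):
    """Calculated t and degrees of freedom for two methods run on the same n samples."""
    if len(method_a) != len(method_b):
        raise ValueError("each sample must be measured by both methods")
    d = [a - b for a, b in zip(method_a, method_b)]  # per-sample differences d_i
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    if s_d == 0:
        raise ValueError("differences have zero spread; t is undefined")
    return abs(d_bar) / s_d * math.sqrt(n), n - 1
```

Pairing removes the sample-to-sample variation (for example, different cholesterol levels in different blood samples) so that only the disagreement between the methods is tested.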

Statistics • Standard Deviation and Probability
• 6.) Comparison of Two Methods
(iv) Example: Two methods for measuring cholesterol in blood provide the following results: Are these methods significantly different at the 95% confidence level?

Statistics • Standard Deviation and Probability
• 6.) Comparison of Two Methods
(iv) Example (continued):
  • Degrees of freedom = (6 − 1) = 5
  • From the Student’s t table at 5 degrees of freedom and the 95% confidence level: t = 2.571
  • Calculated t (1.20) ≤ 2.571
  • The results are not significantly different at the 95% confidence level

Statistics • Dealing with Bad Data
• 1.) Q Test
(i) A method used to decide whether or not to reject a “bad” data point.
(ii) Procedure:
1. Arrange the data in order of increasing value.
   Example: 12.47, 12.48, 12.53, 12.56, 12.67 (questionable point: 12.67)
2. Determine the lowest and highest values and the total range of values (range = 12.67 − 12.47 = 0.20).
3. Determine the difference (gap) between the “bad” data point and the nearest value (gap = 12.67 − 12.56 = 0.11).
   • Calculate the Q value: Q = gap / range = 0.11 / 0.20 = 0.55

Statistics • Dealing with Bad Data
• 1.) Q Test
(ii) Procedure (continued):
4. Compare the calculated Q value to those in tables at the same value of n and the desired % confidence level.
   • n: total number of values or observations
   • For example, at n = 5 and 90% confidence, the table value of Q is 0.64
   • Since Q (calculated) ≤ Q (table), i.e. 0.55 ≤ 0.64, data point 12.67 cannot be rejected at the 90% confidence level
(iii) Although the Q test is valuable for eliminating bad data, common sense and repeating experiments with questionable results are usually more helpful.

Statistics • Dealing with Bad Data
• 1.) Q Test
(ii) Example: For the bowling scores 116.0, 97.9, 114.2, 106.8, and 108.3, the mean is 108.6 and the standard deviation is 7.1. Using the Q test, decide whether the score 97.9 should be discarded.
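The Q-test procedure (sort, compute gap and range, compare to the table) can be sketched as below. The function name `q_value` is illustrative, and the only critical value included is the one quoted in the slides (Q = 0.64 at n = 5, 90% confidence), so this sketch is only valid for five-point data sets:

```python
Q_TABLE_90_N5 = 0.64  # table value of Q for n = 5 at 90% confidence

def q_value(values, suspect):
    """Q = gap / range for a questionable lowest or highest value."""
    ordered = sorted(values)
    if suspect not in (ordered[0], ordered[-1]):
        raise ValueError("the questionable point must be the lowest or highest value")
    if suspect == ordered[0]:
        gap = ordered[1] - ordered[0]
    else:
        gap = ordered[-1] - ordered[-2]
    value_range = ordered[-1] - ordered[0]
    return gap / value_range

slide_data = [12.47, 12.48, 12.53, 12.56, 12.67]
bowling = [116.0, 97.9, 114.2, 106.8, 108.3]
for data, suspect in ((slide_data, 12.67), (bowling, 97.9)):
    q = q_value(data, suspect)
    verdict = "reject" if q > Q_TABLE_90_N5 else "retain"
    print(f"{suspect}: Q = {q:.2f} -> {verdict}")
```

For the bowling scores, Q = 8.9 / 18.1 ≈ 0.49, which is below 0.64, so 97.9 is retained at the 90% confidence level.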
