310 likes | 332 Views
Statistics: Dealing With Uncertainty. ACADs (08-006) Covered Keywords Sample, normal distribution, central tendency, histogram, probability, sample, population, data sorting, standard normal distribution, Z-tables, probability. Supporting Material. Statistics. Dealing With Uncertainty.
E N D
Statistics: Dealing With Uncertainty ACADs (08-006) Covered Keywords Sample, normal distribution, central tendency, histogram, probability, sample, population, data sorting, standard normal distribution, Z-tables, probability. Supporting Material
Statistics Dealing With Uncertainty
Objectives • Describe the difference between a sample and a population • Learn to use descriptive statistics (data sorting, central tendency, etc.) • Learn how to prepare and interpret histograms • State what is meant by normal distribution and standard normal distribution. • Use Z-tables to compute probability.
Statistics • “There are lies, d#$& lies, and then there’s statistics.” Mark Twain
Statistics is... • a standard method for... - collecting, organizing, summarizing, presenting, and analyzing data - drawing conclusions - making decisions based upon the analyses of these data. • used extensively by engineers (e.g., quality control)
Populations and Samples • Population - complete set of all of the possible instances of a particular object • e.g., the entire class • Sample - subset of the population • e.g., a team • We use samples to draw conclusions about the parent population.
Why use samples? • The population may be large • all people on earth, all stars in the sky. • The population may be dangerous to observe • automobile wrecks, explosions, etc. • The population may be difficult to measure • subatomic particles. • Measurement may destroy sample • bolt strength
Exercise: Sample Bias • To three significant figures, estimate the average age of the class based upon your team. • When would a team not be a representative sample of the class?
Measures of Central Tendency • If you wish to describe a population (or a sample) with a single number, what do you use? • Mean - the arithmetic average • Mode - most likely (most common) value. • Median - “middle” of the data set.
What is the Mean? • The mean is the sum of all data values divided by the number of values.
Sample Mean Where: • is the sample mean • xi are the data points • n is the sample size
Population Mean Where: • μis the population mean • xiare the data points • Nis thetotal number of observations in the population
What is the Mode? • mode - the value that occurs the most often in discrete data (or data that have been grouped into discrete intervals) • Example, students in this class are most likely to get a grade of B.
Mode continued • Example of a grade distribution with mean C, mode B
What is the Median? • Median - for sorted data, the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points). • useful to characterize data sets with a few extreme values that would distort the mean (e.g., house price,family incomes).
What Is the Range? • Range - the difference between the lowest and highest values in the set. • Example, driving time to Houston is 2 hours +/- 15 minutes. Therefore... • Minimum = 105 min • Maximum = 135 minutes • Range = 30 minutes
Standard Deviation • Gives a unique and unbiased estimate of the scatter in the data.
Standard Deviation • Population • Sample Variance = s2 Deviation Variance = s2
The Subtle Difference Between s and σ N versus n-1 n-1 is needed to get a better estimate of the population s from the sample s. Note: for large n, the difference is trivial.
A Valuable Tool • Gauss invented standard deviation circa 1700 to explain the error observed in measured star positions. • Today it is used in everything from quality control to measuring financial risk.
Team Exercise • In your team’s bag of M&M candies, count • the number of candies for each color • the total number of candies in the bag • When you are done counting, have a representative from your team enter your data in Excel More
Team Exercise (con’t) For each color, and the total number of candies, determine the following: maximum mode minimum median range standard deviation mean variance
Individual Exercise: Histograms • Flip a coin EXACTLY ten times. Count the number of heads YOU get. • Report your result to the instructor who will post all the results on the board • Open Excel • Using the data from the entire class, create bar graphs showing the number of classmates who get one head, two heads, three heads, etc.
Data Distributions • The “shape” of the data is described by its frequency histogram. • Data that behaves “normally” exhibit a “bell-shaped” curve, or the “normal” distribution. • Gauss found that star position errors tended to follow a “normal” distribution.
mean The Normal Distribution • The normal distribution is sometimes called the “Gauss” curve. RF Relative Frequency x
Standard Normal Distribution Define: Then Area = 1.00 z
Some handy things to know. • 50% of the area lies on each side of the mid-point for any normal curve. • A standard normal distribution (SND) has a total area of 1.00. • “z-Tables” show the area under the standard normal distribution, and can be used to find the area between any two points on the z-axis.
Using Z Tables (Appendix C, p. 624) • Question: Find the area between z= -1.0 and z= 2.0 • From table, for z = 1.0, area = 0.3413 • By symmetry, for z = -1.0, area = 0.3413 • From table, for z= 2.0, area = 0.4772 • Total area = 0.3413 + 0.4772 = 0.8185 • “Tails” area = 1.0 - 0.8185 = 0.1815
“Quick and Dirty” Estimates of m and s • m@ (lowest + 4*mode + highest)/6 • For a standard normal curve, 99.7% of the area is contained within ± 3 s from the mean. • Define “highest” = m + 3 s • Define “lowest” = m - 3 s • Therefore, s@ (highest - lowest)/6
Example: Drive time to Houston • Lowest = 1 h • Most likely = 2 h • Highest = 4 h (including a flat tire, etc.) • m = (1+4*2+4)/6 = 2.16 (2 h 12 min) • s = (4 - 1)/6= 0.5 h • This technique (Delphi) was used to plan the moon flights.
Review • Central tendency • mean • mode • median • Scatter • range • variance • standard deviation • Normal Distribution