Summary • Five-number summary, percentiles, mean • Box plot, modified box plot • Robust statistics – mean, median, trimmed mean • Outliers • Measures of variability • range, IQR, average absolute deviation, variance and standard deviation • The average signed deviation of the data values from the mean is zero.
Population (census) vs. sample; parameter (population) vs. statistic (sample) • Population – parameter • Mean $\mu$ • Standard deviation $\sigma$ • Sample – statistic • Sample mean $\bar{x}$ • Sample standard deviation $s$
Bias, sampling • Sampling – how do we construct a sample from the population? • Bias – a sample is biased if it differs from the population in a systematic way. • Unbiased standard deviation – divide by $n - 1$ rather than $n$.
SRS (simple random sample) • Sampling with replacement • Generates independent samples. • Two sample values are independent if what we get on the first draw doesn't affect what we get on the second. • Sampling without replacement • Deliberately avoids choosing any member of the population more than once. • This type of sampling is not independent; however, it is more common. • The error is small as long as • the sample is large • the sample size is no more than 10% of the population size
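The two sampling schemes can be sketched with Python's standard `random` module. The population of 100 member IDs and the sample size of 10 are hypothetical values chosen for illustration; they do not come from the slides.

```python
import random

# Hypothetical population of 100 member IDs (not from the slides).
population = list(range(1, 101))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Sampling with replacement: each draw is independent, repeats are possible.
with_replacement = rng.choices(population, k=10)

# Sampling without replacement: no member can be chosen twice.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Here the sample size is exactly 10% of the population, the upper limit named in the rule of thumb above.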
Suppose you have a bag with 3 cards in it. The cards are numbered 0, 2 and 4. • Population mean = 2 • Population variance = 8/3 • An important property of a sample statistic that estimates a population parameter is that if you evaluate the sample statistic for every possible sample and average the results, that average should equal the population parameter. We want: $E[s^2] = \sigma^2$, that is, the average of the sample variance over all possible samples equals the population variance. • A statistic with this property is called unbiased.
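The card example can be checked by brute force. The sketch below assumes samples of size 2 drawn with replacement (the slide does not state the sample size) and compares dividing by n with dividing by n - 1; the helper name `variance` is introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

cards = [0, 2, 4]
n = 2  # sample size (an assumption; the slide does not state it)

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

# All 9 ordered samples of size 2 drawn with replacement.
samples = list(product(cards, repeat=n))

population_variance = variance(cards, len(cards))
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(population_variance)  # 8/3
print(avg_div_n)            # 4/3
print(avg_div_n_minus_1)    # 8/3
```

Under these assumptions, only the n - 1 divisor averages out to the population variance of 8/3; dividing by n systematically underestimates it.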
Histogram revision • Distribution – the pattern of values in the data • Histogram – visualizing the distribution • We can see • whether the data tend to be close to a particular value • whether the data vary a lot or a little about the most common values • whether that variation tends to be more above or below the common values • whether there are unusually large or small values in the data
Life expectancy data – histogram • Use the interactive histogram applet to generate a histogram with a bin size of 10, starting at 40. (Figure: histogram; horizontal axis: life expectancy, vertical axis: frequency)
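Without the applet, the same binning (width 10, starting at 40) can be sketched in plain Python. The life expectancy values below are hypothetical stand-ins, not the dataset used in the slides.

```python
from collections import Counter

# Hypothetical life expectancy values in years (not the slide's dataset).
life_expectancy = [47.8, 52.1, 58.4, 64.7, 66.0, 71.3,
                   73.2, 74.9, 76.7, 79.5, 81.0, 83.4]

start, width = 40, 10  # bins [40, 50), [50, 60), ...

# Map each value to the left edge of its bin and count values per bin.
counts = Counter(start + width * int((x - start) // width)
                 for x in life_expectancy)

# Print a simple text histogram, one row per bin.
for left in sorted(counts):
    print(f"[{left}, {left + width}): {'#' * counts[left]}")
```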
Life expectancy data – histogram (Figure: histogram; horizontal axis: life expectancy, vertical axis: frequency)
Making conclusions from a histogram • What can you tell about the life expectancy data? • How many modes are there? • Where is the mode? • Is the distribution symmetric, left-skewed or right-skewed? • Are there outliers? (Figure: histogram; horizontal axis: life expectancy, vertical axis: frequency)
Making conclusions from a histogram • Where is the mode, the median, the mean? (Figure: histogram; horizontal axis: life expectancy, vertical axis: frequency)
Five-number summary • Min. = 47.79 • Q1 = 64.67 • Median = 73.24 • Q3 = 76.65 • Max. = 83.39 • What is the position of the mean relative to the median?
Standardizing
Playing chess • Pretend I am a chess player. • Which of the following tells you most about how good I am: • My rating is 1800. • 8110th place among world competitive chess players. • Ranked higher than 88% of competitive chess players.
Distribution • Distribution of scores in one particular year • We should use relative frequencies: convert all absolute frequencies to proportions.
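Converting absolute frequencies to proportions is a single division per bin. The bin labels and counts below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical absolute frequencies per score bin (not from the slides).
absolute = {"0-20": 4, "20-40": 10, "40-60": 20, "60-80": 12, "80-100": 4}

total = sum(absolute.values())

# Relative frequency = absolute frequency / total number of observations.
relative = {label: count / total for label, count in absolute.items()}

for label, proportion in relative.items():
    print(f"{label}: {proportion:.2f}")
```

The relative frequencies always sum to 1, which makes distributions with different numbers of observations directly comparable.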
Height data – absolute frequencies http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights
Height data – relative frequencies What proportion of values is between 170 cm and 173.75 cm? 30%
Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? We can’t tell for certain.
How should we modify the data or the histogram to get more detail? • Adding more values to the dataset • Increasing the bin size • Decreasing the bin size
Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? 36%
Decreasing bin size • Check out what happens with the smallest bin size for Physics Test Scores from http://quarknet.fnal.gov/cosmics/histo.shtml.
Normal distribution • Recall the empirical rule: 68-95-99.7.
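The three percentages in the empirical rule can be recovered from the standard normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} s.d.: {proportion:.1%}")
```

The results are close to 68%, 95% and 99.7%, which is where the rule's name comes from.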
Empirical rule (Figure: normal curve; axis marked in standard deviations from -3 to +3, with a second axis of raw values from 0 to 6)
Z • Z – the number of standard deviations a value lies away from the mean • If the Z-value is 1, what percentage of values are less than that value? About 84%. (Figure: normal curve; axis marked in standard deviations from -3 to +3)
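The figure of about 84% follows from the empirical rule and the symmetry of the normal curve: half of the values lie below the mean, and half of the central 68% lies between the mean and one standard deviation above it. As a worked step:

```latex
P(Z < 1) = P(Z < 0) + P(0 < Z < 1) \approx 0.50 + \frac{0.68}{2} = 0.84
```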
Who is more popular? Let’s demonstrate the importance of Z-scores with the following example.
Who is more popular? • s.d. = 36, Z = -3.53 • s.d. = 60, Z = -2.57
Formula • What formula describes what we did?
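The formula itself did not survive in the extracted slide text. The standard definition of a Z-score, written with $x$ for the original value, $\mu$ for the mean and $\sigma$ for the standard deviation (symbols assumed here, not taken from the slide), is:

```latex
Z = \frac{x - \mu}{\sigma}
```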
Quiz • What does a negative Z-score mean? • The original value is negative. • The original value is less than the mean. • The original value is less than 0. • The original value minus the mean is negative.
Quiz II • If we standardize a distribution by converting every value to a Z-score, what will be the new mean of this standardized distribution? • If we standardize a distribution by converting every value to a Z-score, what will be the new standard deviation of this standardized distribution?
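One way to explore both questions is to standardize a small dataset and measure the result. The values below are hypothetical, and the sketch uses the population standard deviation (`pstdev`, dividing by n) as an assumption.

```python
from statistics import mean, pstdev

# Hypothetical raw values (not from the slides).
values = [150, 160, 165, 170, 175, 180, 190]

mu = mean(values)
sigma = pstdev(values)  # population standard deviation (divide by n)

# Convert every value to a Z-score.
z_scores = [(x - mu) / sigma for x in values]

print(round(mean(z_scores), 10))    # mean of the standardized values
print(round(pstdev(z_scores), 10))  # standard deviation of the standardized values
```

Up to floating-point rounding, the standardized values have mean 0 and standard deviation 1, whatever the original mean and spread were.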
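Standard normal distribution • Standardizing turns a normal distribution $N(\mu, \sigma)$ into the standard normal distribution $N(0, 1)$.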
Meaning of relative frequencies (Figure values: 3, 4, 4, 5, 3, 1, 3, 2, 2, 3)
Probability density function (PDF)