ENGR 224/STAT 224 Probability and Statistics, Lecture 16: Estimation
Estimation • This is our introduction to the field of inferential statistics. • We already know why we want to study samples instead of entire populations (e.g., limited resources, destructive sampling). • By studying the sample and its statistics, we can make inferences about the population and its parameters.
Sampling Distributions • We previously considered distributions of individual scores, {x1, x2, x3, …, xn}. • We now want to consider the distribution of sample statistics, where the samples are all of the same size and are drawn from the same population. • In this case sample 1 could differ from sample 2, which could differ from sample 3, and so on. • If the samples are different, then so too are the sample statistics; that is, each sample statistic is a random variable.
Example: Exam Scores: Sampling Distribution of the Mean • For example, suppose each of us were to take a random sample of 6 scores from the population of exam scores (on next slide) and compute the mean score.
Example: Exam Scores Did each of us obtain exactly the same value? No, we obtained different values. Let's sketch the distribution (histogram or stem-and-leaf plot). What do we observe? The sample means have a distribution of their own, and that distribution has a mean (the mean of the sample means) and a standard deviation (the standard deviation of the sample means).
Definition: Sampling Distribution of a Sample Statistic The sampling distribution of a sample statistic (e.g., the sample mean $\bar{x}$) is the distribution of the values of that statistic obtained when we repeatedly draw samples of the same size, n, from the same population.
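The definition can be tried out by simulation. The sketch below is only an illustration: the population is a made-up stand-in (the lecture's exam scores are not reproduced here), and the function name, the number of repetitions, and the seed are all assumptions.

```python
import random
import statistics

def sampling_distribution_of_mean(population, n, num_samples, seed=0):
    """Draw num_samples samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(num_samples)]

# Stand-in population (hypothetical, NOT the lecture's exam scores).
population = list(range(40, 101))

# Repeatedly draw samples of size 6, as in the exam-score exercise.
means = sampling_distribution_of_mean(population, n=6, num_samples=2000)

print("population mean:         ", statistics.mean(population))
print("mean of the sample means:", round(statistics.mean(means), 2))
print("std dev of sample means: ", round(statistics.stdev(means), 2))
```

The list `means` is an empirical approximation of the sampling distribution of the mean; plotting it as a histogram corresponds to the sketch made in class.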
Factors affecting the Sampling Distribution • The Sampling Distribution of any sample statistic therefore depends on three things • Distribution of the population • Sample size • Sampling process
Random Sampling • The most common sampling process is random sampling, in which • each outcome (random variable) Xi is independent of the others, and • the probability distribution of each Xi is the same, • i.e., the Xi are independent and identically distributed (i.i.d.). • This is achieved if sampling is done with replacement, or if the population is infinite. • For a finite population of size N, the condition is approximately satisfied if n/N ≤ 0.05.
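A minimal sketch of these conditions in Python, assuming a hypothetical finite population of N = 1000 numbered items; the helper name `approx_iid` is invented for illustration and simply encodes the n/N ≤ 0.05 rule of thumb stated above.

```python
import random

def approx_iid(n, N, threshold=0.05):
    """True if sampling n items without replacement from N is approximately i.i.d."""
    return n / N <= threshold

rng = random.Random(1)
population = list(range(1, 1001))  # hypothetical finite population, N = 1000

# Sampling with replacement: every draw has the same distribution and is independent.
with_replacement = rng.choices(population, k=30)

# Sampling without replacement: draws are dependent (no value can repeat).
without_replacement = rng.sample(population, k=30)

print(approx_iid(30, 1000))  # n/N = 0.03, within the rule of thumb
print(approx_iid(30, 200))   # n/N = 0.15, outside the rule of thumb
```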
Example: Rolling a die
Population = {1, 2, 3, 4, 5, 6}, each value equally likely with P = 1/6. Take samples of size 2 with replacement and compute the mean x of each sample.

x      Samples              P(x)    x P(x)
1      11                   1/36    1/36
1.5    12 21                2/36    3/36
2      13 22 31             3/36    6/36
2.5    14 23 32 41          4/36    10/36
3      15 24 33 42 51       5/36    15/36
3.5    16 25 34 43 52 61    6/36    21/36
4      26 35 44 53 62       5/36    20/36
4.5    36 45 54 63          4/36    18/36
5      46 55 64             3/36    15/36
5.5    56 65                2/36    11/36
6      66                   1/36    6/36
Total                               126/36
Example: Rolling a die cont. • The mean value of all possible sample means is 126/36 = 3.5. • One can also calculate the standard deviation of the sample means; in this case it is 1.2076. • Note that the mean of the population of scores is also 3.5. Curious …
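Both numbers can be checked by enumerating all 36 equally likely samples. The sketch below uses exact fractions so that no rounding enters until the final square root; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)

# All 36 equally likely ordered samples of size 2, drawn with replacement.
sample_means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]

# Mean and (population-style) variance of the 36 sample means.
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print(mean_of_means)                 # 7/2
print(round(sqrt(var_of_means), 4))  # 1.2076
```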
Example: Marbles in a bag
Suppose there are 20 marbles in a bag, each inscribed with the number 1, 3, or 8: 10 marbles show 1, 6 show 3, and 4 show 8. Our goal is to estimate the mean of all marbles in the bag. Samples of size three are drawn with replacement from the bag and the sample median is calculated.

Sample (order ignored)   Probability   Median
111                      0.125         1
113                      0.225         1
118                      0.150         1
138                      0.180         3
333                      0.027         3
331                      0.135         3
338                      0.054         3
888                      0.008         8
881                      0.060         8
883                      0.036         8

Probability distribution of the sample median:
x    P(X = x)
1    0.500
3    0.396
8    0.104

Expected value of sample median = 1(0.500) + 3(0.396) + 8(0.104) = 2.52
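The distribution of the sample median can be reproduced by enumerating every ordered draw of three marbles and accumulating its probability. This is a sketch with illustrative variable names; it works from the value probabilities 0.5, 0.3, and 0.2 implied by the marble counts.

```python
from collections import defaultdict
from itertools import product
from statistics import median

# Marble values and their probabilities (10, 6, and 4 marbles out of 20).
population = {1: 0.5, 3: 0.3, 8: 0.2}

median_dist = defaultdict(float)
for draw in product(population, repeat=3):  # 27 ordered samples of size 3
    prob = 1.0
    for value in draw:
        prob *= population[value]           # independent draws: multiply
    median_dist[median(draw)] += prob

expected_median = sum(x * p for x, p in median_dist.items())

print({x: round(p, 3) for x, p in sorted(median_dist.items())})
print(round(expected_median, 2))
```

Enumerating ordered draws (27 of them) gives the same totals as the ten unordered samples listed above, because each unordered sample's probability already includes all of its orderings.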
Example: Marbles in a bag
Suppose there are 20 marbles in a bag, each inscribed with the number 1, 3, or 8: 10 marbles show 1, 6 show 3, and 4 show 8. Samples of size three are drawn with replacement from the bag and the sample mean is calculated.

Sample (order ignored)   Probability   Mean
111                      0.125         1
113                      0.225         5/3
118                      0.150         10/3
138                      0.180         4
333                      0.027         3
331                      0.135         7/3
338                      0.054         14/3
888                      0.008         8
881                      0.060         17/3
883                      0.036         19/3

Expected value of sample mean = 1(0.125) + 5/3(0.225) + 10/3(0.150) + 4(0.180) + 3(0.027) + 7/3(0.135) + 14/3(0.054) + 8(0.008) + 17/3(0.060) + 19/3(0.036) = 3
Population mean = 1(0.5) + 3(0.3) + 8(0.2) = 3
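The same enumeration, done with exact fractions, confirms that the expected value of the sample mean equals the population mean. The variable names below are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Marble values with exact probabilities (10, 6, and 4 marbles out of 20).
population = {1: Fraction(10, 20), 3: Fraction(6, 20), 8: Fraction(4, 20)}

expected_sample_mean = Fraction(0)
for draw in product(population, repeat=3):  # 27 ordered samples of size 3
    prob = Fraction(1)
    for value in draw:
        prob *= population[value]           # independent draws: multiply
    expected_sample_mean += Fraction(sum(draw), 3) * prob

population_mean = sum(x * p for x, p in population.items())

print(expected_sample_mean, population_mean)  # 3 3
```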
Definition: Estimator and Point Estimate Definition: An estimator is a sample statistic (such as the sample mean, sample median, sample proportion, or sample standard deviation) used to approximate a population parameter. Definition: A point estimate is a single value or point used to approximate a population parameter.
Biased Estimators • In the marble example, the expected value of the sample median (2.52) is not the same as the population mean (3). The sample median is a BIASED estimator of the mean. • The expected value of the sample mean (3) is the same as the population mean. The sample mean is an UNBIASED estimator of the mean. • In general, if the sampling distribution of a statistic has a mean equal to the population parameter being estimated, then the statistic is an unbiased estimator of that parameter. • In general, if the sampling distribution of a statistic has a mean different from the population parameter being estimated, then the statistic is a biased estimator of that parameter.
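In symbols, the idea can be written as follows. The notation $\hat{\theta}$ for an estimator of a parameter $\theta$ is an assumption introduced here, not taken from the slides; the numbers are those of the marble example.

```latex
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
\qquad
\text{sample median: } 2.52 - 3 = -0.48
\qquad
\text{sample mean: } 3 - 3 = 0
```

An estimator is unbiased exactly when this difference is zero.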
“The sample mean is the best point estimate of the population mean $\mu$.”
Central Limit Theorem • Given • The random variable x has a distribution (which may or may not be normal) with mean $\mu$ and standard deviation $\sigma$, • Samples of size n are randomly selected from the population with replacement • Then • As the sample size increases, the distribution of the sample means $\bar{x}$ approaches a normal distribution. • The mean of the sample means is the population mean $\mu$. • The standard deviation of the sample means is $\sigma/\sqrt{n}$.
Notation • Let samples of size n be selected from a population with mean $\mu$ and standard deviation $\sigma$. • The mean of the sample means is denoted $\mu_{\bar{x}}$. • Therefore the CLT says that $\mu_{\bar{x}} = \mu$. • The standard deviation of the sample means is denoted $\sigma_{\bar{x}}$. • According to the CLT, we have that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ for large populations.
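As a check of the relation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ against the die example (samples of size n = 2 from a fair die, whose population mean is 3.5), the worked calculation below recovers the value 1.2076 quoted earlier for the standard deviation of the sample means.

```latex
\sigma^2 = \frac{1}{6}\sum_{k=1}^{6} (k - 3.5)^2 = \frac{35}{12},
\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{35}{12 \cdot 2}} \approx 1.2076
```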
Application of the CLT Therefore, the distribution of the sample mean $\bar{x}$ of a random sample drawn from practically any population with mean $\mu$ and standard deviation $\sigma$ can be approximated by a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, provided the sample size n is sufficiently large and the population is large relative to n.
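One way to see the approximation at work is to compare it with a simulation. The sketch below reuses the marble population (values 1, 3, 8 with probabilities 0.5, 0.3, 0.2); the sample size n = 50, the threshold 3.5, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

# Marble population from the earlier example.
values, weights = [1, 3, 8], [0.5, 0.3, 0.2]
mu = sum(v * w for v, w in zip(values, weights))                       # population mean
sigma = sqrt(sum(w * (v - mu) ** 2 for v, w in zip(values, weights)))  # population std dev

n = 50           # sample size (assumed for illustration)
threshold = 3.5  # we ask for P(sample mean > 3.5) (assumed for illustration)

# Normal approximation suggested by the CLT: mean mu, std dev sigma / sqrt(n).
approx = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

# Monte Carlo check: draw many samples with replacement and count exceedances.
rng = random.Random(0)
trials = 20000
hits = sum(
    sum(rng.choices(values, weights=weights, k=n)) / n > threshold
    for _ in range(trials)
)
simulated = hits / trials

print(round(approx, 3), round(simulated, 3))
```

The two printed probabilities should be close but not identical: the population is skewed and discrete, so the normal curve is only an approximation at n = 50.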
Overview Distributions of Statistics: Sections 5.3, 5.4 of text
Homework • Reread 5.3 and read 5.4 • Start preparing for the Midterm Exam (Chaps. 1–5)