Probability

Statistics 111 - Lecture 7 Probability Normal Distribution and Standardization Stat 111 - Lecture 7 - Normal Distribution

Administrative Notes
• Homework 2 due on Monday

Outline
• Law of Large Numbers
• Normal Distribution
• Standardization and Normal Table

Data versus Random Variables
• Data variables are variables for which we actually observe values
• E.g. heights of students in the Stat 111 class
• For these data variables, we can directly calculate the statistics s² and x̄
• Random variables are things that we don't directly observe, but we still have a probability distribution of all possible values
• E.g. heights of the entire Penn student population

Law of Large Numbers
• The rest of the course will be about using data statistics (x̄ and s²) to estimate parameters of random variables (μ and σ²)
• Law of Large Numbers: as the size of our data sample increases, the mean x̄ of the observed data variable approaches the mean μ of the population
• If our sample is large enough, we can be confident that our sample mean is a good estimate of the population mean!
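
The Law of Large Numbers can be illustrated with a short simulation. The sketch below is a minimal Python example, assuming a hypothetical Normal population with mean 266 and SD 16 (the values used in the gestation example later in this lecture) and a fixed seed so the run is repeatable. The name `sample_mean` is illustrative, not part of the course material.

```python
import random
import statistics

POP_MEAN = 266.0  # assumed population mean (mu), for illustration
POP_SD = 16.0     # assumed population SD (sigma), for illustration

def sample_mean(n, rng):
    """Draw n values from N(POP_MEAN, POP_SD^2) and return their mean."""
    draws = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.fmean(draws)

rng = random.Random(111)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    x_bar = sample_mean(n, rng)
    print(f"n = {n:>7}: sample mean = {x_bar:.2f}, error = {abs(x_bar - POP_MEAN):.2f}")
```

The error typically shrinks as n grows, although any single small sample can land close to the population mean by chance.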

The Normal Distribution
• The Normal distribution has the shape of a “bell curve” with parameters μ and σ² that determine the center and spread
[Figure: bell curve with center μ and spread σ]
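
For reference, the bell curve is the graph of the Normal density function. The formula does not appear on the slide; it is the standard textbook form, written here with the same parameters μ and σ²:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\qquad -\infty < x < \infty
```

Changing μ shifts the center of the curve, and changing σ (the square root of σ²) widens or narrows it.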

Different Normal Distributions
• Each different value of μ and σ² gives a different Normal distribution, denoted N(μ,σ²)
• We can adjust values of μ and σ² to provide the best approximation to observed data
• If μ = 0 and σ² = 1, we have the Standard Normal distribution N(0,1)
[Figure: Normal curves N(2,1), N(-1,2), and N(0,2)]

Property of Normal Distributions
• Normal distribution follows the 68-95-99.7 rule:
• About 68% of observations are between μ - σ and μ + σ
• About 95% of observations are between μ - 2σ and μ + 2σ
• About 99.7% of observations are between μ - 3σ and μ + 3σ
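
The 68-95-99.7 rule can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` is an assumption made for this example.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

The probabilities depend only on k, not on the particular μ and σ, which is why the rule applies to every Normal distribution.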

Calculating Probabilities
• For more general probability calculations, we have to do integration
• For the standard normal distribution, we have tables of probabilities already made for us!
• If Z follows N(0,1): P(Z < -1.00) = 0.1587

Standard Normal Table
• If Z has N(0,1): P(Z > 1.46) = 1 - P(Z < 1.46) = 1 - 0.9279 = 0.0721
• What if we need to do a probability calculation for a non-standard Normal distribution?
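
The two table lookups above can be reproduced in software. This is a small sketch that uses the cumulative distribution function of `statistics.NormalDist` in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard Normal distribution

p_below = Z.cdf(-1.00)     # P(Z < -1.00), read directly from the CDF
p_above = 1 - Z.cdf(1.46)  # P(Z > 1.46) via the complement rule

print(f"P(Z < -1.00) = {p_below:.4f}")
print(f"P(Z > 1.46)  = {p_above:.4f}")
```

Like the table, the CDF only gives "less than" probabilities, so "greater than" probabilities come from subtracting from 1.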

Standardization
• If we only have a standard normal table, then we need to transform our non-standard normal distribution into a standard one
• This process is called standardization
[Figure: non-standard Normal curve transformed to the standard Normal curve, with mean 0 and SD 1]

Standardization Formula
• We convert a non-standard normal distribution into a standard normal distribution using a linear transformation
• If X has a N(μ,σ²) distribution, then we can convert to Z, which follows a N(0,1) distribution: Z = (X - μ)/σ
• First, subtract the mean μ from X
• Then, divide by the standard deviation σ of X
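
The two steps of the standardization formula translate directly into code. The function below is a minimal sketch; the name `standardize` and the input check are assumptions made for this example.

```python
def standardize(x, mu, sigma):
    """Return the z-score of x for a Normal distribution with mean mu and SD sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    centered = x - mu        # first, subtract the mean
    return centered / sigma  # then, divide by the standard deviation

# Illustrative call with mean 266 and SD 16
print(standardize(310, mu=266, sigma=16))
```

Note that the function takes the standard deviation σ, not the variance σ² that appears in the N(μ,σ²) notation.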

Linear Transformations of Variables
• Sometimes we need to do simple mathematical operations on our variables, such as adding and/or multiplying with constants: Y = a·X + b
• Example: changing temperature scales: Fahrenheit = (9/5)·Celsius + 32
• How are means and variances affected?

Mean/Variances of Linear Transforms
• For the transformed variable Y = a·X + b:
• mean(Y) = a·mean(X) + b
• Var(Y) = a²·Var(X)
• SD(Y) = |a|·SD(X)
• Note that adding a constant b does not affect measures of spread (variance and SD)
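
These rules can be checked on data using the temperature conversion as the linear transform. The sketch below uses made-up Celsius readings (not course data) and compares the statistics computed directly on the Fahrenheit values with what the rules predict.

```python
import statistics

celsius = [12.0, 15.5, 20.0, 22.5, 30.0]  # made-up temperatures, for illustration only
a, b = 9 / 5, 32                           # Fahrenheit = a * Celsius + b
fahrenheit = [a * c + b for c in celsius]

mean_c = statistics.mean(celsius)
var_c = statistics.variance(celsius)  # sample variance s^2
sd_c = statistics.stdev(celsius)      # sample SD s

# Left: computed directly from the transformed data. Right: predicted by the rules.
print("mean:", statistics.mean(fahrenheit), "vs", a * mean_c + b)
print("var: ", statistics.variance(fahrenheit), "vs", a**2 * var_c)
print("sd:  ", statistics.stdev(fahrenheit), "vs", abs(a) * sd_c)
```

The two columns should agree up to floating-point rounding; the constant 32 appears only in the mean.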

More complicated linear functions
• We can also do linear transformations involving more than one variable: Z = a·X + b·Y + c
• The mean formula is similar: mean(Z) = a·mean(X) + b·mean(Y) + c
• If X and Y are also independent, then var(Z) = a²·var(X) + b²·var(Y)
• We need a more complicated variance formula (in the book) if the variables are not independent
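
The formulas for two independent variables can be written as a small helper. In this sketch the function name `combo_mean_var` and all parameter values are made up for illustration. As a cross-check, `statistics.NormalDist` supports arithmetic that treats two distributions as independent, so it should give the same mean and variance.

```python
from statistics import NormalDist

def combo_mean_var(a, b, c, mean_x, var_x, mean_y, var_y):
    """Mean and variance of Z = a*X + b*Y + c, assuming X and Y are independent."""
    mean_z = a * mean_x + b * mean_y + c
    var_z = a**2 * var_x + b**2 * var_y  # valid only under independence
    return mean_z, var_z

# Made-up parameters, for illustration only.
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=20, sigma=4)
a, b, c = 2, -1, 5

mean_z, var_z = combo_mean_var(a, b, c, X.mean, X.variance, Y.mean, Y.variance)

# NormalDist arithmetic assumes independence, so it should agree with the helper.
Z = a * X + b * Y + c
print("formula:   ", mean_z, var_z)
print("NormalDist:", Z.mean, Z.variance)
```

Note that the negative coefficient b = -1 still adds variance, because the coefficients are squared.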

Standardization Example
Dear Abby, You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for 10 months and 5 days. My husband is in the Navy and it could not have been conceived any other time because I only saw him once for an hour, and I didn’t see him again until the day after the baby was born. I don’t drink or run around, and there is no way the baby isn’t his, so please print a retraction about the 266-day carrying time because I am in a lot of trouble! -San Diego Reader

Standardization Example
• According to well-documented data, gestation time follows a normal distribution with mean μ of 266 days and SD σ of 16 days
• Let X = gestation time. What percent of babies have a gestation time greater than 310 days (10 months & 5 days)?
• Need to convert X = 310 into a standard Z: Z = (X - μ)/σ = (310 - 266)/16 = 44/16 = 2.75

Standardization Example
• P(X > 310) = P(Z > 2.75) = 1 - P(Z < 2.75) = 1 - 0.9970 = 0.0030
• So, there is only a 0.3% chance of a pregnancy lasting longer than 310 days!
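
The gestation calculation can be reproduced end to end. This sketch uses the lecture's parameters (mean 266, SD 16) and computes the tail probability two ways: through the z-score and the standard Normal, and directly from the non-standard distribution. The variable names are illustrative.

```python
from statistics import NormalDist

gestation = NormalDist(mu=266, sigma=16)  # parameters from the lecture
standard = NormalDist(0, 1)

z = (310 - gestation.mean) / gestation.stdev  # standardize X = 310
p_via_z = 1 - standard.cdf(z)                 # P(Z > z) from the standard Normal
p_direct = 1 - gestation.cdf(310)             # P(X > 310) without standardizing

print(f"z = {z}")
print(f"P(X > 310) = {p_via_z:.4f} (via z), {p_direct:.4f} (direct)")
```

Both routes give the same probability, which is the point of standardization: one standard table serves every Normal distribution.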

Reverse Standardization
• Sometimes, we need to convert a standard normal Z into a non-standard normal X
• Example: what is the length of pregnancy below which we have 10% of the population?
• From the table, we see P(Z < -1.28) = 0.10
• Reverse standardization formula: X = σ·Z + μ
• For Z = -1.28, we calculate X = -1.28·16 + 266 = 246 days (8.2 months)
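
Reverse standardization can be sketched in the same way. Here `inv_cdf` from `statistics.NormalDist` stands in for reading the table backwards, and `reverse_standardize` is a hypothetical helper name. Because the table value -1.28 is rounded, the software result may differ from the slide's figure by a fraction of a day.

```python
from statistics import NormalDist

MU, SIGMA = 266, 16  # gestation parameters from the lecture

def reverse_standardize(z, mu, sigma):
    """Convert a standard Normal value z back to the original scale."""
    return sigma * z + mu

z10 = NormalDist().inv_cdf(0.10)  # z with P(Z < z) = 0.10, about -1.28
x10 = reverse_standardize(z10, MU, SIGMA)

print(f"z = {z10:.2f}, X = {x10:.1f} days")
```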

Another Example
• NCAA Division 1 SAT Requirements: athletes are required to score at least 820 on combined math and verbal SAT
• In 2000, SAT scores were normally distributed with mean μ of 1019 and SD σ of 209
• What percentage of students have scores greater than 820?
• Z = (X - μ)/σ = (820 - 1019)/209 = -199/209 = -0.95

Another Example
• P(X > 820) = P(Z > -0.95) = 1 - P(Z < -0.95)
• P(Z < -0.95) = 0.17, so P(X > 820) = 0.83
• 83% of students meet NCAA requirements
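
A printed table forces the z-score to be rounded to two decimals before the lookup. The sketch below, using the SAT parameters from this example, compares the table-style answer with the unrounded one to show how little the rounding matters here. The variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=1019, sigma=209)  # combined SAT parameters from the lecture
Z = NormalDist(0, 1)

z_exact = (820 - sat.mean) / sat.stdev
z_table = round(z_exact, 2)  # a printed table is indexed by z to 2 decimals

p_exact = 1 - Z.cdf(z_exact)  # P(X > 820) with the unrounded z
p_table = 1 - Z.cdf(z_table)  # P(X > 820) with the table-style z

print(f"z exact = {z_exact:.4f}, z table = {z_table}")
print(f"P(X > 820): exact = {p_exact:.4f}, table-style = {p_table:.4f}")
```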

SAT Verbal Scores
• Now, just look at X = Verbal SAT score, which is normally distributed with mean μ of 505 and SD σ of 110
• What Verbal SAT score will place a student in the top 10% of the population?

SAT Verbal Scores
• From the table, P(Z > 1.28) = 0.10
• Need to reverse standardize to get X: X = σ·Z + μ = 110·1.28 + 505 = 646
• So, a student needs a Verbal SAT score of 646 in order to be in the top 10% of all students
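
The top-10% calculation generalizes to any upper fraction. The helper below is a minimal sketch; `top_fraction_cutoff` is a hypothetical name, and the call uses the Verbal SAT parameters from this example.

```python
from statistics import NormalDist

def top_fraction_cutoff(mu, sigma, fraction):
    """Score x with P(X > x) = fraction for X ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(1 - fraction)  # upper-tail z, about 1.28 for the top 10%
    return sigma * z + mu                   # reverse standardization

cutoff = top_fraction_cutoff(mu=505, sigma=110, fraction=0.10)
print(f"top 10% cutoff: {cutoff:.1f}")
```

The upper-tail fraction is converted to a "less than" probability (1 - fraction) first, because the inverse CDF, like the table, works with lower-tail probabilities.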

Next Class - Lecture 8
• Chapter 5: Sampling Distributions
