Chapter 2: Probability

A random variable (r.v.) is a variable whose value is unknown until it is observed. The value of a random variable results from an experiment. Experiments can be either controlled (laboratory) or uncontrolled (observational). Most economic variables are random and are the result of uncontrolled experiments.

Random Variables
A discrete random variable can take on only a finite (or countably infinite) number of values, such as:
• The number of visits to a doctor’s office
• The number of children in a household
• The flip of a coin
• A dummy (binary) variable: D = 0 if male, D = 1 if female
A continuous random variable can take any real value (not just whole numbers) in an interval on the real number line, such as:
• Gross Domestic Product next year
• The price of a share in Microsoft
• The interest rate on a 30-year mortgage

Probability Distributions of Random Variables
• All random variables have probability distributions that describe the values the random variable can take on and the associated probabilities of these values.
• Knowing the probability distribution of a random variable gives us some indication of the value the r.v. may take on.

Probability Distribution for a Discrete Random Variable
Expressed as a table, graph, or function.
1. Suppose X = the number of tails when a coin is flipped twice. X can take on the values 0, 1, or 2. Let f(x) be the associated probabilities:

Table:
X    f(x)
0    0.25
1    0.50
2    0.25

Graph: probability is represented as height on a bar graph.
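The table for the two-flip example can be reproduced by listing the equally likely outcomes and counting tails. The sketch below assumes a fair coin; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of tails in each outcome.
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# f(x) = (number of outcomes with x tails) / (total number of outcomes).
pmf = {x: Fraction(count, len(outcomes)) for x, count in sorted(tail_counts.items())}

for x, prob in pmf.items():
    print(x, float(prob))
```

Using exact fractions makes it easy to confirm that the probabilities sum to exactly 1.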

2. Suppose X is a binary variable that can take on two values: 0 or 1. Furthermore, assume P(X=1) = p and P(X=0) = (1-p).
Function: $P(X = x) = f(x) = p^x (1-p)^{1-x}$ for x = 0, 1

Table:
X    f(x)
0    (1-p)
1    p

Suppose p = 0.10. Then X takes on 0 with probability 0.90 and X takes on 1 with probability 0.10.
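As a quick check of the function form, the following sketch evaluates f(x) = p^x (1-p)^(1-x) at both values of x for p = 0.10. The helper name `bernoulli_pmf` is an illustrative choice, not something from the slides.

```python
def bernoulli_pmf(x, p):
    """Return f(x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)

p = 0.10
table = {x: bernoulli_pmf(x, p) for x in (0, 1)}

for x, prob in table.items():
    print(x, prob)
```

Setting x = 1 leaves only the factor p, and setting x = 0 leaves only (1 - p), which is why the single formula reproduces the two-row table.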

Facts about discrete probability distribution functions
1. Each probability P(X=x) = f(x) must lie between 0 and 1: $0 \le f(x) \le 1$
2. The sum of the probabilities must be 1. If X can take on n different values, then: $f(x_1) + f(x_2) + \dots + f(x_n) = 1$

Probability Distribution (Density) for Continuous Random Variables
Expressed as a function or graph. Continuous r.v.’s can take on an infinite number of values in a given interval.
• A table isn’t appropriate to express a pdf.
EX: $f(x) = 2x$ for $0 \le x \le 1$, and $f(x) = 0$ otherwise

Because a continuous random variable has an uncountably infinite number of values, the probability of any one value occurring is zero: P(X = a) = 0. Instead, we ask “What is the probability that X is between a and b?” P[a < X < b] = ? The probability P[a < X < b] is the proportion of the time, over many repetitions of the experiment, that X will fall between a and b.

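Probability is represented as area under the function. Total area must be 1.0. For f(x) = 2x on the interval from 0 to 1, the region under the function is a triangle with base 1 and height 2, so its area is 1.0. The probability that X lies between 0 and 1/2 is the area of the smaller triangle with base 1/2 and height 1: $P[0 \le X \le 1/2] = ½ \times ½ \times 1 = 0.25$. [Area of any triangle is ½ × Base × Height]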
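The area calculation can also be checked numerically. The sketch below approximates the area under f(x) = 2x with a midpoint rule; the function names and the number of steps are assumptions made for illustration, and the course itself relies on tables rather than code.

```python
def f(x):
    """Density from the example: f(x) = 2x on [0, 1], 0 otherwise."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(density, lo, hi, steps=100_000):
    """Approximate the area under `density` on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under(f, 0, 1)      # should be close to 1.0
prob_half = area_under(f, 0, 0.5)     # should be close to 0.25

# Same probability from the triangle formula: 1/2 * base * height.
triangle_area = 0.5 * 0.5 * f(0.5)

print(total_area, prob_half, triangle_area)
```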

Uniform Random Variable: u is distributed uniformly between a and b
• The p.d.f. is a horizontal line between a and b of height 1/(b-a)
• $f(u) = 1/(b-a)$ if $a \le u \le b$, and $f(u) = 0$ otherwise
EX: Spin a dial on a clock, so a = 0 and b = 12. Find the probability that u lies between 1 and 2.
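For a uniform density the area is a rectangle, so the probability is the width of the interval times the height 1/(b-a). A minimal sketch for the clock example follows; `uniform_prob` is a hypothetical helper name.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= u <= hi) for u uniform on [a, b]: overlap width times height 1/(b - a)."""
    if b <= a:
        raise ValueError("need a < b")
    # Only the part of [lo, hi] that lies inside [a, b] carries probability.
    left, right = max(lo, a), min(hi, b)
    overlap = max(0.0, right - left)
    return overlap / (b - a)

clock_prob = uniform_prob(1, 2, a=0, b=12)
print(clock_prob)
```

For the clock, the interval from 1 to 2 has width 1 and the density has height 1/12, so the probability is 1/12.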

$P[a \le X \le b] = \int_a^b f(x)\,dx$
In calculus, the integral of a function defines the area under it. For continuous random variables it is the area under f(x), and not f(x) itself, which defines the probability of an event. We will NOT be integrating functions; when necessary we use tables and/or computers to calculate the necessary probability (integral).

Rules of Summation
Rule 1: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Rule 2: $\sum_{i=1}^{n} a = na$
Rule 3: $\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$
Rule 4: $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

Rules of Summation (continued)
Rule 5: $\sum_{i=1}^{n} (a x_i + b y_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$
Rule 6: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$
From Rule 6, we can prove (in class) that: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

Rules of Summation (continued)
Rule 7: $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + \dots + f(x_n)$
Notation: $\sum_{x} f(x_i) = \sum_{i} f(x_i) = \sum_{i=1}^{n} f(x_i)$
Rule 8: $\sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i, y_j) = \sum_{i=1}^{n} [f(x_i, y_1) + f(x_i, y_2) + \dots + f(x_i, y_m)]$
The order of summation does not matter: $\sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i, y_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} f(x_i, y_j)$
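The summation rules are easy to spot-check on a small data set. The numbers and the function f below are made up for illustration; the sketch checks the linearity rule, the fact that deviations from the mean sum to zero, and the interchange of the order of a double summation.

```python
# Hypothetical data chosen only to exercise the rules.
x = [2.0, 4.0, 7.0, 11.0]
y = [1.0, 3.0, 5.0, 9.0]
a, b = 3.0, -2.0
n = len(x)

# Linearity: sum(a*x_i + b*y_i) equals a*sum(x_i) + b*sum(y_i).
lhs = sum(a * xi + b * yi for xi, yi in zip(x, y))
rhs = a * sum(x) + b * sum(y)

# Deviations from the sample mean sum to zero.
x_bar = sum(x) / n
deviation_sum = sum(xi - x_bar for xi in x)

# Double summation: the order of summation does not matter.
def f(xi, yj):
    return xi * yj + xi

rows_first = sum(sum(f(xi, yj) for yj in y) for xi in x)
cols_first = sum(sum(f(xi, yj) for xi in x) for yj in y)

print(lhs, rhs, deviation_sum, rows_first, cols_first)
```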

The Mean of a Random Variable
The mean of a random variable is its mathematical expectation, or expected value. For a discrete random variable, this is:
$E(X) = \sum_{i=1}^{n} x_i f(x_i) = x_1 f(x_1) + x_2 f(x_2) + \dots + x_n f(x_n)$
where n is the number of values X can take on.
It is a probability-weighted average of the possible values the random variable X can take on. This is a sum for discrete r.v.’s and an integral for continuous r.v.’s.

E(X) tells us the “long-run” average value for X. It is not necessarily a value one would expect X to take on in any single draw.
• If you were to randomly draw values of X from its pdf an infinite number of times and average these values, you would get E(X).
• E(X) = μ. This Greek letter “mu” is not used in your text but is commonly used to denote the mean of X.

Example: Roll a fair die
$E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5$
Interpretation: In a large number of rolls of a fair die, one-sixth of the values will be 1’s, one-sixth of the values will be 2’s, etc., and the average of these values will be 3.5.
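The long-run interpretation can be illustrated by comparing the exact probability-weighted average with the average of many simulated rolls. This is only a sketch: the seed and the number of rolls are arbitrary choices, and the simulated mean will be close to, not exactly, 3.5.

```python
import random
from fractions import Fraction

# Exact expected value: probability-weighted average of the six faces.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_value = sum(face * prob for face, prob in pmf.items())

# Long-run average: simulate many rolls with a fixed seed for repeatability.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(float(expected_value), sample_mean)
```

Note that 3.5 is not a face of the die, which is why E(X) should be read as a long-run average rather than a value X is expected to take on.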

Mathematical Expectation
• Think of E(.) as an operator that requires you to weight by probabilities any expression inside the parentheses, and then sum.
• $E(g(X)) = \sum_{i=1}^{n} g(x_i) f(x_i) = g(x_1) f(x_1) + g(x_2) f(x_2) + \dots + g(x_n) f(x_n)$

Rules of Mathematical Expectation
• E(c) = c where c is a constant
• E(cX) = cE(X) where c is a constant and X is a random variable
• E(a + cX) = a + cE(X) where a and c are constants and X is a random variable

Variance of a Random Variable
• Like the mean, the variance of a r.v. is an expected value, but it is the expected value of the squared deviations from the mean.
• Let $g(x) = (x - E(x))^2$
• Variance: $\sigma^2 = Var(x) = E[(x - E(x))^2] = \sum_i g(x_i) f(x_i) = \sum_i (x_i - E(x))^2 f(x_i)$
• It measures the amount of dispersion in the possible values for X.

About Variance
• The unit of measurement is X units squared.
• When we create a new random variable as a linear transformation of X, y = a + cx, we know that E(y) = a + cE(x), but $Var(y) = c^2 Var(x)$ (proof in class).
This property tells us that the amount of variation in y is determined by the amount of variation in x and the constant c. The additive constant a in no way alters the amount of variation in the values of y.
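The linear-transformation property can be verified on a small discrete distribution. The sketch below reuses the fair die as X and picks a = 10 and c = 3 arbitrarily; these values and the helper names are assumptions for illustration.

```python
from fractions import Fraction

# X is a fair die; any discrete pmf would do.
pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(dist):
    """E(V) for a discrete distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def variance(dist):
    """Var(V) = E[(V - E(V))^2]."""
    mu = mean(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())

# y = a + c*x: each value is shifted and scaled, probabilities are unchanged.
a, c = 10, 3
pmf_y = {a + c * x: p for x, p in pmf_x.items()}

print(mean(pmf_y), a + c * mean(pmf_x))
print(variance(pmf_y), c**2 * variance(pmf_x))
```

The shift a moves every value and the mean by the same amount, so it cancels out of each deviation from the mean; only the scale factor c survives, and it is squared.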

About Variance (con’t)
• $E[(x - E(x))^2] = E[x^2 - 2E(x)x + [E(x)]^2]$
  $= E(x^2) - 2E(x)E(x) + [E(x)]^2$
  $= E(x^2) - 2[E(x)]^2 + [E(x)]^2$
  $= E(x^2) - [E(x)]^2$
• Run the E(.) operator through, pulling out constants and stopping on random variables. Remember that E(x) is itself a constant, so E(E(x)) = E(x).

Standard Deviation
• Because variance is in squared units of the r.v., we can take the square root of the variance to obtain the standard deviation.
• $\sigma = \sqrt{\sigma^2} = \sqrt{Var(x)}$
Be sure to take the square root after you square and sum the deviations from the mean.
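To tie the variance formulas together, the following sketch computes the variance of a fair die two ways, from the definition and from E(x²) minus the squared mean, and then takes the square root. The fair die is an assumed example carried over from the mean calculation.

```python
import math
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean_x = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean_x) ** 2 * p for x, p in pmf.items())

# Shortcut: E(x^2) - [E(x)]^2.
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mean_x**2

# Square root is taken last, after squaring and summing the deviations.
std_dev = math.sqrt(var_definition)

print(var_definition, var_shortcut, std_dev)
```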

Joint Probability • An experiment can randomly determine the outcome of more than one variable. • When there are 2 random variables of interest, we study the joint probability density function • When there are more than 2 random variables of interest, we study the multivariate probability density function.

For a discrete joint pdf, probability is expressed in a matrix. Let X = return on stocks and Y = return on bonds, with P(X=x, Y=y) = f(x,y); e.g., P(X=10, Y=8) = 0.30.

About Joint P.d.f.’s
• Marginal Probability Distribution:
What is the probability distribution for X regardless of what values Y takes on? $f(x) = \sum_y f(x,y)$
What is the probability distribution for Y regardless of what values X takes on? $f(y) = \sum_x f(x,y)$

• Conditional Probability Distribution:
What is the probability distribution for X given that Y takes on a particular value? $f(x|y) = f(x,y)/f(y)$
What is the probability distribution for Y given that X takes on a particular value? $f(y|x) = f(x,y)/f(x)$
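Marginal and conditional distributions can be read off a joint table by summing and dividing. The joint table below is hypothetical: only the cell P(X=10, Y=8) = 0.30 comes from the slides, and the remaining values and probabilities are invented so that the table sums to 1.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y). Only the (10, 8) cell is from the slides.
joint = {
    (10, 6): Fraction(20, 100),
    (10, 8): Fraction(30, 100),
    (20, 6): Fraction(40, 100),
    (20, 8): Fraction(10, 100),
}

def marginal_x(joint_pmf):
    """f(x) = sum over y of f(x, y)."""
    result = {}
    for (x, _y), prob in joint_pmf.items():
        result[x] = result.get(x, 0) + prob
    return result

def marginal_y(joint_pmf):
    """f(y) = sum over x of f(x, y)."""
    result = {}
    for (_x, y), prob in joint_pmf.items():
        result[y] = result.get(y, 0) + prob
    return result

def conditional_x_given_y(joint_pmf, y_value):
    """f(x | y) = f(x, y) / f(y) for one fixed value of y."""
    f_y_value = marginal_y(joint_pmf)[y_value]
    return {x: prob / f_y_value for (x, y), prob in joint_pmf.items() if y == y_value}

f_x = marginal_x(joint)
f_y = marginal_y(joint)
f_x_given_8 = conditional_x_given_y(joint, 8)

print(f_x, f_y, f_x_given_8)
```

Dividing by f(y) rescales one column of the table so that it sums to 1, which is what makes the conditional distribution a proper probability distribution.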

Covariance: A measure that summarizes the joint probability distribution between two random variables.
• $cov(x,y) = E[(x - E(x))(y - E(y))] = \sum_x \sum_y (x - E(x))(y - E(y)) f(x,y)$

About Covariance: It measures the joint association between 2 random variables. Try asking: “When X is large, is Y more or less likely to also be large?”
• If the answer is that Y is likely to be large when X is large, then we say X and Y have a positive relationship: Cov(x,y) > 0.
• If the answer is that Y is likely to be small when X is large, then we say that X and Y have a negative relationship: Cov(x,y) < 0.
$cov(x,y) = E[(x - E(x))(y - E(y))]$
$= E[xy - E(x)y - xE(y) + E(x)E(y)]$
$= E(xy) - E(x)E(y) - E(x)E(y) + E(x)E(y)$
$= E(xy) - E(x)E(y)$ (useful!)
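Both covariance formulas can be evaluated on a small joint table to confirm they agree. The table is hypothetical (only the (10, 8) cell with probability 0.30 appears in the slides), and `expect` is an illustrative helper.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y). Only the (10, 8) cell is from the slides.
joint = {
    (10, 6): Fraction(20, 100),
    (10, 8): Fraction(30, 100),
    (20, 6): Fraction(40, 100),
    (20, 8): Fraction(10, 100),
}

def expect(g, joint_pmf):
    """E[g(X, Y)] = sum over x and y of g(x, y) * f(x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint_pmf.items())

mean_x = expect(lambda x, y: x, joint)
mean_y = expect(lambda x, y: y, joint)

# Definition: expected product of deviations from the means.
cov_definition = expect(lambda x, y: (x - mean_x) * (y - mean_y), joint)

# Shortcut: E(xy) - E(x)E(y).
cov_shortcut = expect(lambda x, y: x * y, joint) - mean_x * mean_y

print(mean_x, mean_y, cov_definition, cov_shortcut)
```

In this made-up table the large value of X is paired mostly with the small value of Y, so the covariance comes out negative.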

Correlation
• Covariance has awkward units of measurement.
• Correlation removes all units of measurement by dividing covariance by the product of the standard deviations: $\rho_{xy} = Cov(x,y)/(\sigma_x \sigma_y)$, and $-1 \le \rho_{xy} \le 1$
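To see what "removes all units" means in practice, the sketch below rescales X by a factor of 100 (for example, changing its units) and compares covariance and correlation before and after. The joint table is hypothetical and the helper names are illustrative.

```python
import math

# Hypothetical joint pmf f(x, y), invented for illustration.
joint = {(10, 6): 0.20, (10, 8): 0.30, (20, 6): 0.40, (20, 8): 0.10}

def moments(joint_pmf):
    """Return (cov, sd_x, sd_y) for a discrete joint pmf given as {(x, y): prob}."""
    mean_x = sum(x * p for (x, _y), p in joint_pmf.items())
    mean_y = sum(y * p for (_x, y), p in joint_pmf.items())
    var_x = sum((x - mean_x) ** 2 * p for (x, _y), p in joint_pmf.items())
    var_y = sum((y - mean_y) ** 2 * p for (_x, y), p in joint_pmf.items())
    cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint_pmf.items())
    return cov, math.sqrt(var_x), math.sqrt(var_y)

def correlation(joint_pmf):
    """rho = cov / (sd_x * sd_y)."""
    cov, sd_x, sd_y = moments(joint_pmf)
    return cov / (sd_x * sd_y)

# Same distribution with X measured in units 100 times smaller.
rescaled = {(100 * x, y): p for (x, y), p in joint.items()}

cov_original = moments(joint)[0]
cov_rescaled = moments(rescaled)[0]
rho_original = correlation(joint)
rho_rescaled = correlation(rescaled)

print(cov_original, cov_rescaled)
print(rho_original, rho_rescaled)
```

The covariance grows by the same factor of 100 as the units of X, while the correlation stays the same because the standard deviation of X in the denominator grows by that factor too.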

What does correlation look like? [Figure: four panels, ρ = 0, ρ = .7, ρ = .3, ρ = .9]

Statistical Independence Two random variables are statistically independent if knowing the value that one will take on does not reveal anything about what value the other may take on: f(x|y) = f(x) or f(y|x) = f(y) This implies that f(x,y) = f(x)f(y) if X and Y are independent. If 2 r.v.’s are independent, then their covariance will necessarily be equal to 0.
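The factorization condition f(x,y) = f(x)f(y) gives a direct test for independence in a discrete joint table. The sketch below uses one hypothetical table that fails the test and then builds an independent table from the same marginals to show its covariance is zero; all numbers and helper names are assumptions for illustration.

```python
from fractions import Fraction

def marginals(joint_pmf):
    """Return (f_x, f_y) as dictionaries of marginal probabilities."""
    f_x, f_y = {}, {}
    for (x, y), prob in joint_pmf.items():
        f_x[x] = f_x.get(x, 0) + prob
        f_y[y] = f_y.get(y, 0) + prob
    return f_x, f_y

def is_independent(joint_pmf):
    """True if f(x, y) == f(x) * f(y) in every cell of the table."""
    f_x, f_y = marginals(joint_pmf)
    return all(joint_pmf.get((x, y), 0) == f_x[x] * f_y[y] for x in f_x for y in f_y)

def covariance(joint_pmf):
    """cov(x, y) = E(xy) - E(x)E(y)."""
    f_x, f_y = marginals(joint_pmf)
    mean_x = sum(x * p for x, p in f_x.items())
    mean_y = sum(y * p for y, p in f_y.items())
    mean_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    return mean_xy - mean_x * mean_y

# Hypothetical dependent table.
dependent = {
    (10, 6): Fraction(20, 100),
    (10, 8): Fraction(30, 100),
    (20, 6): Fraction(40, 100),
    (20, 8): Fraction(10, 100),
}

# Independent table with the same marginals: each cell is f(x) * f(y).
f_x, f_y = marginals(dependent)
independent = {(x, y): f_x[x] * f_y[y] for x in f_x for y in f_y}

print(is_independent(dependent), covariance(dependent))
print(is_independent(independent), covariance(independent))
```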

Functions of more than one Random Variable
Suppose that X and Y are two random variables. If we form a weighted sum of them, we create a new random variable Z = aX + bY that has the following mean and variance:
$E(Z) = E(aX + bY) = aE(X) + bE(Y)$
$Var(Z) = Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y)$
If X and Y are independent, then Cov(X,Y) = 0 and $Var(Z) = Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)$ (see page 31).
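The variance formula for a weighted sum can be checked by computing Var(Z) directly from the distribution of Z and comparing it with the formula. The joint table and the weights a = 2, b = 3 are hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y) and hypothetical weights.
joint = {
    (10, 6): Fraction(20, 100),
    (10, 8): Fraction(30, 100),
    (20, 6): Fraction(40, 100),
    (20, 8): Fraction(10, 100),
}
a, b = 2, 3

def expect(g, joint_pmf):
    """E[g(X, Y)] = sum over x and y of g(x, y) * f(x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint_pmf.items())

mean_x = expect(lambda x, y: x, joint)
mean_y = expect(lambda x, y: y, joint)
var_x = expect(lambda x, y: (x - mean_x) ** 2, joint)
var_y = expect(lambda x, y: (y - mean_y) ** 2, joint)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y), joint)

# Direct route: treat Z = aX + bY as its own random variable.
mean_z = expect(lambda x, y: a * x + b * y, joint)
var_z_direct = expect(lambda x, y: (a * x + b * y - mean_z) ** 2, joint)

# Formula route: a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
var_z_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

print(mean_z, var_z_direct, var_z_formula)
```

Because the covariance in this table is negative, the variance of Z is smaller than it would be if X and Y were independent, which is the idea behind diversifying across stocks and bonds.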

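Normal Probability Distribution
• Many random variables tend to have a normal distribution (a well-known bell shape).
• Theoretically, $x \sim N(\beta, \sigma^2)$ where $E(x) = \beta$ and $Var(x) = \sigma^2$.
The probability density function is $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\beta)^2}{2\sigma^2}\right)$
[Figure: normal density plotted against x, with a and b marked on the axis]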

Normal Distribution (con’t)
• A family of distributions, each with its own mean and variance. The mean anchors the distribution’s center and the variance captures the spread of the bell-shaped curve.
• Finding the area under the curve would require integrating the p.d.f., which is too complicated. A computer-generated table gives all the probabilities we need for a normal r.v. that has mean 0 and variance 1.
To use the table (pg. 389), we take a normal random variable $x \sim N(\beta, \sigma^2)$ and transform it by subtracting the mean and dividing by the standard deviation. This is a linear transformation of x that creates a new random variable with mean 0 and variance 1:
$Z = (x - \beta)/\sigma$, where $Z \sim N(0,1)$
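The standardization step can be sketched with Python's standard library, where `statistics.NormalDist` plays the role of the printed table. The mean 3, standard deviation 3, and the interval from 4 to 6 are assumed values chosen for illustration, and `normal_prob` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumed example: x ~ N(beta, sigma^2) with beta = 3 and sigma = 3.
beta, sigma = 3.0, 3.0

# The standard normal N(0, 1) stands in for the printed table.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_prob(lo, hi, mean, std_dev):
    """P(lo <= x <= hi) via the transformation Z = (x - mean) / std_dev."""
    z_lo = (lo - mean) / std_dev
    z_hi = (hi - mean) / std_dev
    return standard_normal.cdf(z_hi) - standard_normal.cdf(z_lo)

prob = normal_prob(4.0, 6.0, beta, sigma)

# Cross-check without standardizing, using the N(beta, sigma^2) cdf directly.
original = NormalDist(mu=beta, sigma=sigma)
direct = original.cdf(6.0) - original.cdf(4.0)

print(prob, direct)
```

Standardizing gives the same answer as working with the original distribution, which is why a single table for N(0,1) is enough for every member of the normal family.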

Statistical inference: drawing conclusions about a population based on a sample
