Introduction to Probability and Statistics

Introduction to Probability and Statistics Chapter 12

Topics • Types of Probability • Fundamentals of Probability • Statistical Independence and Dependence • Expected Value • The Normal Distribution

Sample Space and Event • Probability is associated with performing an experiment whose outcomes occur randomly • The sample space contains all the outcomes of an experiment • An event is a subset of the sample space • The probability of an event is always between zero and one, inclusive • The probabilities of all the mutually exclusive outcomes in the sample space must sum to one • Events in an experiment are mutually exclusive if only one can occur at a time

Types of Probability • Objective Probability • Stated prior to the occurrence of the event • Based on the logic of the process producing the outcomes • Relative frequency is the more widely used definition of objective probability. • Subjective Probability • Based on personal belief, experience, or knowledge of a situation. • Frequently used in making business decisions. • Different people often arrive at different subjective probabilities.

Fundamentals of Probability Distributions • Frequency Distribution • organization of numerical data about the events • Probability Distribution • A list of corresponding probabilities for each event • Mutually Exclusive Events • If two or more events cannot occur at the same time • Probability that one or more events will occur is found by summing the individual probabilities of the events: • P(A or B) = P(A) + P(B)

Fundamentals of Probability A Frequency Distribution Example • Grades for past four years.

Fundamentals of Probability Non-Mutually Exclusive Events & Joint Probability • Probability that non-mutually exclusive events M and F or both will occur expressed as: • P(M or F) = P(M) + P(F) - P(MF) • A joint (intersection) probability, P(MF), is the probability that two or more events that are not mutually exclusive can occur simultaneously.

Fundamentals of Probability Cumulative Probability Distribution • Determined by adding the probability of an event to the sum of all previously listed probabilities • Probability that a student will get a grade of C or higher: • P(A or B or C) = P(A) + P(B) + P(C) = .10 + .20 + .50 = .80
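
The running sum described above can be sketched in a few lines of Python. Only P(A), P(B), and P(C) come from the slide; the values for grades D and F are assumed placeholders, chosen only so that the distribution sums to one.

```python
from itertools import accumulate

# P(A), P(B), P(C) are from the slide; P(D) and P(F) are assumed
# placeholders chosen only so that the distribution sums to one.
grades = ["A", "B", "C", "D", "F"]
probs = [0.10, 0.20, 0.50, 0.10, 0.10]

# Each cumulative entry adds the current probability to the running total.
cumulative = dict(zip(grades, accumulate(probs)))

p_c_or_higher = cumulative["C"]
print(f"P(A or B or C) = {p_c_or_higher:.2f}")
```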

Statistical Independence and Dependence Independent Events • Events that do not affect each other are independent. • The probability that independent events occur together is computed by multiplying the probabilities of each event. • P(AB) = P(A) · P(B) • For a coin tossed three consecutive times, the probability of getting a head on the first toss, a tail on the second, and a tail on the third is: • P(HTT) = P(H) · P(T) · P(T) = (.5)(.5)(.5) = .125
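
As a quick check of the multiplication rule, the sketch below enumerates every sequence of three tosses of a fair coin and multiplies the per-toss probabilities. The variable names are illustrative only.

```python
from itertools import product

# Probability of each face on a single toss of a fair coin.
p = {"H": 0.5, "T": 0.5}

# Enumerate all 2^3 sequences; independence lets us multiply per-toss probabilities.
sequences = {
    "".join(seq): p[seq[0]] * p[seq[1]] * p[seq[2]]
    for seq in product("HT", repeat=3)
}

p_htt = sequences["HTT"]
total = sum(sequences.values())
print(f"P(HTT) = {p_htt}, sum over all sequences = {total}")
```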

Statistical Independence and Dependence Independent Events – Bernoulli Process Definition • Properties of a Bernoulli Process: • Two possible outcomes for each trial. • Probability of the outcome remains constant over time. • Outcomes of the trials are independent. • Number of trials is discrete and integer.

Binomial Distribution • Used to determine the probability of a number of successes in n trials of a Bernoulli process. • P(r) = [n!/(r!(n - r)!)] p^r q^(n - r) • where: p = probability of a success • q = 1 - p = probability of a failure • n = number of trials • r = number of successes in n trials • Determine the probability of getting exactly two tails in three tosses of a coin: P(r = 2) = [3!/(2! 1!)] (.5)^2 (.5)^1 = .375
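
A minimal sketch of the coin-toss question, assuming the standard binomial formula P(r) = C(n, r) p^r q^(n - r). The function name `binomial_pmf` is a hypothetical helper, not something from the slides.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    q = 1.0 - p  # probability of a failure
    return comb(n, r) * p**r * q**(n - r)

# Exactly two tails in three tosses of a fair coin.
p_two_tails = binomial_pmf(2, 3, 0.5)
print(f"P(r = 2) = {p_two_tails}")
```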

Example • Microchips are inspected at the quality control station • From every batch, four are selected and tested for defects • Given a defective rate of 20%, what is the probability that a sample of four contains exactly two defectives?

Binomial Distribution Example – Quality Control • What is the probability that a sample of four will contain exactly two defectives? • P(r=2) = [4!/(2! 2!)] (.2)^2 (.8)^2 = .1536 • What is the probability of getting two or more defectives? • P(r≥2) = P(r=2) + P(r=3) + P(r=4) = .1536 + .0256 + .0016 = .1808 • Probability of less than two defectives: • P(r<2) = P(r=0) + P(r=1) = 1.0 - [P(r=2) + P(r=3) + P(r=4)] • = 1.0 - .1808 = .8192
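
The three quality-control probabilities can be reproduced with a short sketch, assuming the sample of four chips behaves as a Bernoulli process with a 20% defect rate. The helper `pmf` is illustrative.

```python
from math import comb

n, p = 4, 0.20  # sample size and defective rate from the example

def pmf(r: int) -> float:
    """Binomial probability of exactly r defectives in the sample."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

p_exactly_two = pmf(2)
p_two_or_more = sum(pmf(r) for r in range(2, n + 1))
p_less_than_two = 1.0 - p_two_or_more  # complement rule

print(f"P(r=2)  = {p_exactly_two:.4f}")
print(f"P(r>=2) = {p_two_or_more:.4f}")
print(f"P(r<2)  = {p_less_than_two:.4f}")
```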

Dependent Events • If the occurrence of one event affects the probability of the occurrence of another event, the events are dependent. • Coin toss to select bucket, draw for blue ball. • If tail occurs, 1/6 chance of drawing blue ball from bucket 2; if head results, no possibility of drawing blue ball from bucket 1. • Probability of event “drawing a blue ball” dependent on event “flipping a coin.”

Dependent Events – Conditional Probabilities • Unconditional: P(H) = .5; P(T) = .5, must sum to one. • Conditional: P(R|H) = .33, P(W|H) = .67, P(R|T) = .83, P(W|T) = .17

Math Formulation of Conditional Probabilities • Given two dependent events A and B: • P(A|B) = P(AB)/P(B) or P(AB) = P(A|B) · P(B) • With data from previous example: • P(RH) = P(R|H) · P(H) = (.33)(.5) = .165 • P(WH) = P(W|H) · P(H) = (.67)(.5) = .335 • P(RT) = P(R|T) · P(T) = (.83)(.5) = .415 • P(WT) = P(W|T) · P(T) = (.17)(.5) = .085
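
The joint probabilities follow mechanically from the conditional and unconditional values. The sketch below builds the joint table and, as an extra illustrative step not shown on the slide, sums it to get the marginal probability of outcome R.

```python
# Unconditional probabilities of the coin flip.
p_flip = {"H": 0.5, "T": 0.5}

# Conditional probabilities of the draw given the flip, keyed as (draw, flip).
p_draw_given_flip = {
    ("R", "H"): 0.33, ("W", "H"): 0.67,
    ("R", "T"): 0.83, ("W", "T"): 0.17,
}

# Joint probability: P(draw and flip) = P(draw | flip) * P(flip).
joint = {
    (draw, flip): cond * p_flip[flip]
    for (draw, flip), cond in p_draw_given_flip.items()
}

# Marginal probability of R, summed over both flip outcomes.
p_r = joint[("R", "H")] + joint[("R", "T")]

for key, value in joint.items():
    print(key, round(value, 3))
print("P(R) =", round(p_r, 3))
```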

Summary of Example Problem Probabilities

Bayesian Analysis • In Bayesian analysis, additional information is used to alter (improve) the marginal probability of the occurrence of an event. • Improved probability is called posterior probability • A posterior probability is the altered marginal probability of an event based on additional information. • Bayes’ Rule for two events, A and B, and third event, C, conditionally dependent on A and B: • P(A|C) = P(C|A) P(A) / [P(C|A) P(A) + P(C|B) P(B)]

Bayesian Analysis – Example (1 of 2) • Machine setup; if correct, 10% chance of a defective part; if incorrect, 40%. • 50% chance the setup will be correct, 50% chance it will be incorrect. • What is the probability that the machine setup is incorrect if a sample part is defective? • Solution: P(C) = .50, P(IC) = .50, P(D|C) = .10, P(D|IC) = .40 • where C = correct, IC = incorrect, D = defective • P(IC|D) = P(D|IC) P(IC) / [P(D|IC) P(IC) + P(D|C) P(C)] = (.40)(.50) / [(.40)(.50) + (.10)(.50)] = .80

Statistical Independence and Dependence Bayesian Analysis – Example (2 of 2) • Previously, the manager knew that there was a 50% chance that the machine was set up incorrectly • Now, after testing the part, he knows that if it is defective, there is 0.8 probability that the machine was set up incorrectly
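
The revision from a 50% prior to a 0.8 posterior can be traced with a short sketch of Bayes' rule. The variable names are illustrative.

```python
# Prior (marginal) probabilities of the machine setup.
p_correct, p_incorrect = 0.50, 0.50

# Probability of a defective part under each setup.
p_def_given_correct = 0.10
p_def_given_incorrect = 0.40

# Total probability of a defective part (the denominator of Bayes' rule).
p_defective = (p_def_given_correct * p_correct
               + p_def_given_incorrect * p_incorrect)

# Posterior probability that the setup is incorrect, given a defective part.
p_incorrect_given_def = p_def_given_incorrect * p_incorrect / p_defective

print(f"P(D) = {p_defective:.2f}")
print(f"P(IC|D) = {p_incorrect_given_def:.2f}")
```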

Expected Value Random Variables • When the values of variables occur in no particular order or sequence, the variables are referred to as random variables. • Random variables are represented by a letter x, y, z, etc. • It is possible to assign a probability to the occurrence of each possible value, for example the possible values of the number of heads or of demand per week.

Expected Value Example (1 of 4) • Machines break down 0, 1, 2, 3, or 4 times per month. • Relative frequency of breakdowns, or a probability distribution: P(0) = .10, P(1) = .20, P(2) = .30, P(3) = .25, P(4) = .15

Expected Value Example (2 of 4) • The expected value is computed by multiplying each possible value of the variable by its probability and summing these products. • It is the weighted average, or mean, of the probability distribution of the random variable. • Expected value of number of breakdowns per month: • E(x) = (0)(.10) + (1)(.20) + (2)(.30) + (3)(.25) + (4)(.15) • = 0 + .20 + .60 + .75 + .60 • = 2.15 breakdowns

Expected Value Example (3 of 4) • Variance is a measure of the dispersion of random variable values about the mean. • Variance is computed as follows: • 1. Square the difference between each value and the expected value. • 2. Multiply the resulting amounts by the probability of each value. • 3. Sum the values compiled in step 2. • General formula: • σ^2 = Σ [x_i - E(x)]^2 P(x_i)

Expected Value Example (4 of 4) • Standard deviation is computed by taking the square root of the variance. • For example data: • σ^2 = 1.4275 • standard deviation = σ = sqrt(1.4275) • = 1.19 breakdowns per month
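
The expected value, variance, and standard deviation of the breakdown distribution can be checked with the sketch below. Carried at full precision, the variance evaluates to 1.4275, and the standard deviation still rounds to 1.19.

```python
from math import sqrt

# Breakdowns per month and their probabilities, from the example.
values = [0, 1, 2, 3, 4]
probs = [0.10, 0.20, 0.30, 0.25, 0.15]

# Expected value: probability-weighted average of the values.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: probability-weighted squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

std_dev = sqrt(variance)

print(f"E(x) = {mean:.2f}")
print(f"variance = {variance:.4f}")
print(f"std dev = {std_dev:.2f}")
```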

Poisson Distribution • Based on the number of outcomes occurring during a given time interval or in a specified region • Examples • # of accidents that occur on a given highway during a 1-week period • # of customers coming to a bank during a 1-hour interval • # of TVs sold at a department store during a given week • # of breakdowns of a washing machine per month

Conditions • Consider the example of the # of breakdowns of a washing machine per month • Each breakdown is called an occurrence • Occurrences are random in that they do not follow any pattern (they are unpredictable) • An occurrence is always considered with respect to an interval (here, one month)

The Probability Mass Function • X = number of counts in the interval • X is a Poisson random variable with parameter λ > 0 • PMF: f(x) = e^(-λ) λ^x / x!, x = 0, 1, 2, ... • Mean and variance: E[X] = λ, V(X) = λ

Example • If a bank gets on average λ = 6 bad checks per day, what is the probability that it will receive four bad checks on any given day? 10 bad checks on any two consecutive days? • Solution • x = 4 and λ = 6, then f(4) = e^(-6) 6^4 / 4! = 0.134 • λ = 12 and x = 10, then f(10) = e^(-12) 12^10 / 10! = 0.105
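
A minimal sketch of the bad-check calculation, assuming the standard Poisson mass function f(x) = e^(-λ) λ^x / x!. The helper name `poisson_pmf` is hypothetical. The two-day question uses λ = 12 because the daily mean of 6 is doubled over two days.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean count is lam."""
    return exp(-lam) * lam**x / factorial(x)

# Four bad checks in one day (mean 6 per day).
p_four_in_one_day = poisson_pmf(4, 6.0)

# Ten bad checks in two consecutive days (mean 2 * 6 = 12).
p_ten_in_two_days = poisson_pmf(10, 12.0)

print(f"f(4; 6)   = {p_four_in_one_day:.3f}")
print(f"f(10; 12) = {p_ten_in_two_days:.3f}")
```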

Example • The number of failures of a testing instrument from contamination particles on the product is a Poisson random variable with a mean of 0.02 failures per hour. • What is the probability that the instrument does not fail in an 8-hour shift? • What is the probability of at least one failure in one 24-hour day?

Solution • Let X denote the number of failures in 8 hours. Then X has a Poisson distribution with λ = (0.02)(8) = 0.16 • P(X = 0) = e^(-0.16) = 0.8521 • Let Y denote the number of failures in 24 hours. Then Y has a Poisson distribution with λ = (0.02)(24) = 0.48 • P(Y ≥ 1) = 1 - P(Y = 0) = 1 - e^(-0.48) = 0.3812
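
Both answers depend only on the probability of zero occurrences, which for a Poisson variable is e^(-λ). The sketch below scales the hourly rate to each interval; the variable names are illustrative.

```python
from math import exp

rate_per_hour = 0.02  # mean failures per hour, from the example

# Scale the rate to the length of each interval.
lam_shift = rate_per_hour * 8    # 8-hour shift
lam_day = rate_per_hour * 24     # 24-hour day

# P(X = 0) for a Poisson variable is exp(-lambda).
p_no_failure_in_shift = exp(-lam_shift)

# At least one failure is the complement of no failures.
p_at_least_one_in_day = 1.0 - exp(-lam_day)

print(f"P(X = 0)  = {p_no_failure_in_shift:.4f}")
print(f"P(Y >= 1) = {p_at_least_one_in_day:.4f}")
```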

The Normal Distribution Continuous Random Variables • Continuous random variable can take on an infinite number of values within some interval. • Continuous random variables have values that are not countable • Cannot assign a unique probability to each value

The Normal Distribution Definition • The normal distribution is a continuous probability distribution that is symmetrical on both sides of the mean. • The center of a normal distribution is its mean μ. • The area under the normal curve represents probability, and total area under the curve sums to one.

The Normal Distribution Example (1 of 5) • Mean weekly carpet sales of 4,200 yards, with standard deviation of 1,400 yards. • What is the probability of sales exceeding 6,000 yards? • μ = 4,200 yd; σ = 1,400 yd; probability that number of yards of carpet will be equal to or greater than 6,000 expressed as: P(x ≥ 6,000).

The Normal Distribution Example (2 of 5)

The Normal Distribution Standard Normal Curve (1 of 2) • The area or probability under a normal curve is measured by determining the number of standard deviations from the mean. • Number of standard deviations a value is from the mean designated as Z. • Z = (x - μ)/σ

The Normal Distribution Standard Normal Curve (2 of 2)

The Normal Distribution Example (3 of 5) • Z = (x - μ)/σ = (6,000 - 4,200)/1,400 = 1.29 standard deviations • P(x ≥ 6,000) = .5000 - .4015 = .0985

The Normal Distribution Example (4 of 5) • Determine probability that demand will be 5,000 yards or less. • Z = (x - μ)/σ = (5,000 - 4,200)/1,400 = .57 standard deviations • P(x ≤ 5,000) = .5000 + .2157 = .7157

The Normal Distribution Example (5 of 5) • Determine probability that demand will be between 3,000 yards and 5,000 yards. • Z = (3,000 - 4,200)/1,400 = -1,200/1,400 = -.86 • P(3,000 ≤ x ≤ 5,000) = .2157 + .3051 = .5208

Different Table • P(3,000 ≤ x ≤ 5,000) = • P((3,000 - 4,200)/1,400 ≤ z ≤ (5,000 - 4,200)/1,400) = • P(-0.86 ≤ z ≤ 0.57) = • P(z ≤ 0.57) - P(z ≤ -0.86) = • P(z ≤ 0.57) - P(z ≥ 0.86) = • P(z ≤ 0.57) - [1 - P(z ≤ 0.86)] = • 0.7157 - [1 - 0.8051] = 0.5208
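
The three carpet-demand probabilities can also be computed without a table. The sketch below uses Python's standard-library `statistics.NormalDist`, which is a tooling choice of this example rather than anything the slides prescribe. Its results differ from the table-based answers in the third decimal place because the table lookups round Z to two decimals.

```python
from statistics import NormalDist

# Weekly carpet demand: mean 4,200 yd, standard deviation 1,400 yd.
demand = NormalDist(mu=4200, sigma=1400)

# P(x >= 6,000): upper-tail area beyond 6,000 yards.
p_at_least_6000 = 1.0 - demand.cdf(6000)

# P(x <= 5,000): cumulative area up to 5,000 yards.
p_at_most_5000 = demand.cdf(5000)

# P(3,000 <= x <= 5,000): difference of two cumulative areas.
p_between = demand.cdf(5000) - demand.cdf(3000)

print(f"P(x >= 6,000)          = {p_at_least_6000:.4f}")
print(f"P(x <= 5,000)          = {p_at_most_5000:.4f}")
print(f"P(3,000 <= x <= 5,000) = {p_between:.4f}")
```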

The Normal Distribution Sample Mean and Variance • The population mean and variance are for the entire set of data being analyzed. • The sample mean and variance are derived from a subset of the population data and are used to make inferences about the population.

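The Normal Distribution Computing the Sample Mean and Variance • Sample mean: x̄ = (Σ x_i)/n • Sample variance: s^2 = [Σ x_i^2 - (Σ x_i)^2/n]/(n - 1) • Sample standard deviation: s = sqrt(s^2)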

The Normal Distribution Example Problem Re-Done • Sample mean = 42,000/10 = 4,200 yd • Sample variance = [(190,060,000) - (1,764,000,000/10)]/9 = 1,517,777 • Sample std. dev. = sqrt(1,517,777) = 1,232 yd
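
The sample statistics above can be reproduced from the summary totals alone. The sketch assumes, as the arithmetic implies, that n = 10, Σx = 42,000, and Σx² = 190,060,000. The individual observations are not given in the text, so none are invented here.

```python
from math import sqrt

# Summary totals implied by the worked example.
n = 10                    # number of weekly observations
sum_x = 42_000            # sum of the observations (yards)
sum_x_sq = 190_060_000    # sum of the squared observations

sample_mean = sum_x / n

# Shortcut form of the sample variance with an n - 1 denominator.
sample_variance = (sum_x_sq - sum_x**2 / n) / (n - 1)

sample_std = sqrt(sample_variance)

print(f"sample mean     = {sample_mean:,.0f} yd")
print(f"sample variance = {sample_variance:,.2f}")
print(f"sample std dev  = {sample_std:,.0f} yd")
```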

The Normal Distribution Chi-Square Test for Normality (1 of 2) • It can never be simply assumed that data are normally distributed. • A statistical test should be performed to check whether the data are consistent with the assumed distribution. • The Chi-square test is used to determine if a set of data fit a particular distribution. • It compares an observed frequency distribution with a theoretical frequency distribution that would be expected to occur if the data followed a particular distribution (testing the goodness-of-fit).

The Normal Distribution Chi-Square Test for Normality (2 of 2) • In the test, the observed frequency in each range of the frequency distribution is compared to the theoretical frequency that should occur in that range if the data follow a particular distribution. • A Chi-square statistic is then calculated and compared to a number, called a critical value, from a chi-square table. • If the test statistic is greater than the critical value, the hypothesis that the data follow the distribution being tested is rejected; if it is less, the hypothesis is not rejected. • Chi-square test is a form of hypothesis testing.
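
The comparison described above can be sketched as follows. The observed and theoretical frequencies are hypothetical numbers made up for illustration, not data from the slides. The degrees-of-freedom rule (ranges minus estimated parameters minus one) and the 0.05 significance level are assumptions of this sketch.

```python
# Hypothetical observed and theoretical frequencies for five ranges;
# these numbers are illustrative only and are not from the slides.
observed = [8, 22, 41, 20, 9]
theoretical = [7.0, 24.0, 38.0, 24.0, 7.0]

# Chi-square statistic: sum of (observed - theoretical)^2 / theoretical.
chi_square = sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))

# Assumed degrees of freedom: k ranges, minus p estimated parameters
# (mean and standard deviation, so p = 2), minus one.
k, p = len(observed), 2
dof = k - p - 1

# Critical value from a chi-square table for 2 degrees of freedom
# at the 0.05 significance level.
critical_value = 5.991

# Reject the hypothesized distribution only if the statistic exceeds it.
reject = chi_square > critical_value

print(f"chi-square = {chi_square:.4f}, dof = {dof}, reject = {reject}")
```

With these made-up frequencies the statistic falls below the critical value, so the hypothesis of normality would not be rejected.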

Statistical Analysis with Excel (1 of 3)
