Lecture Eleven Probability Models
Outline • Bayesian Probability • Duration Models
Bayesian Probability • Facts • Incidence of the disease in the population is one in a thousand • The probability of testing positive if you have the disease is 99 out of 100 • The probability of testing positive if you do not have the disease is 2 out of 100
Using Conditional Probability (H = healthy, S = sick) • Pr(+ ∩ H) = Pr(+/H)*Pr(H) = 0.02*0.999 = 0.01998 • Pr(+ ∩ S) = Pr(+/S)*Pr(S) = 0.99*0.001 = 0.00099
False Positive Paradox • Probability of Being Sick If You Test + • Pr(S/+) ? • From Conditional Probability: • Pr(S/+) = Pr(S ∩ +)/Pr(+) = 0.00099/0.02097 • Pr(S/+) = 0.0472
Bayesian Probability By Formula • Pr(S/+) = Pr(S ∩ +)/Pr(+) = Pr(+/S)*Pr(S)/Pr(+) • Where Pr(+) = Pr(+/S)*Pr(S) + Pr(+/H)*Pr(H) • And using our facts: Pr(S/+) = 0.99*0.001/[0.99*0.001 + 0.02*0.999] • Pr(S/+) = 0.00099/[0.00099 + 0.01998] • Pr(S/+) = 0.00099/0.02097 = 0.0472
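The calculation above can be reproduced in a few lines. This is a minimal sketch; the function and argument names are illustrative choices, not part of the lecture, and the three inputs are the facts stated on the earlier slide.

```python
def posterior_sick_given_positive(prevalence, sensitivity, false_positive_rate):
    """Pr(S/+) from Bayes' rule.

    prevalence          = Pr(S)
    sensitivity         = Pr(+/S)
    false_positive_rate = Pr(+/H)
    """
    joint_sick = sensitivity * prevalence                       # Pr(+ and S)
    joint_healthy = false_positive_rate * (1.0 - prevalence)    # Pr(+ and H)
    return joint_sick / (joint_sick + joint_healthy)            # divide by Pr(+)


p_sick = posterior_sick_given_positive(0.001, 0.99, 0.02)
print(round(p_sick, 4))  # 0.0472
```

Even with a 99% detection rate, fewer than 5% of positive tests come from sick people, because healthy people outnumber sick people 999 to 1.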
Duration Models • Exploratory (Graphical) Estimates • Kaplan-Meier • Functional Form Estimates • Exponential Distribution
Kaplan-Meier Estimate of Survivor Function • Survivor Function = (# at risk - # ending)/# at risk
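One way to turn the ratio above into a survivor curve is sketched below. It assumes the standard product-limit form, in which the ratio is computed at each time a spell ends and the ratios are multiplied cumulatively. The data and the censoring flags are hypothetical, made up purely for illustration.

```python
def kaplan_meier(durations, ended):
    """Return [(t, S(t))] for each time t at which at least one spell ends.

    durations: observed spell lengths
    ended:     True if the spell ended at that time, False if it was censored
    """
    end_times = sorted({d for d, e in zip(durations, ended) if e})
    survival = 1.0
    curve = []
    for t in end_times:
        at_risk = sum(1 for d in durations if d >= t)
        ending = sum(1 for d, e in zip(durations, ended) if d == t and e)
        survival *= (at_risk - ending) / at_risk   # the ratio from the slide
        curve.append((t, survival))
    return curve


# Hypothetical spells: six observations, two of them censored.
durations = [2, 3, 3, 5, 6, 8]
ended = [True, True, False, True, True, False]
curve = kaplan_meier(durations, ended)
for t, s in curve:
    print(t, round(s, 4))
```

Censored spells never count as "ending", but they stay in the at-risk count until their observed duration is reached.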
Exponential Distribution • Density: f(t) = λ exp[-λt], 0 ≤ t • Cumulative Distribution Function F(t) • F(t) = ∫_0^t λ exp[-λu] du • F(t) = -exp[-λu], evaluated from u = 0 to u = t • F(t) = -1 {exp[-λt] - exp[0]} • F(t) = 1 - exp[-λt] • Survivor Function, S(t) = 1 - F(t) = exp[-λt] • Taking logarithms, ln S(t) = -λt
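As a numerical check on the integration above, the sketch below integrates the exponential density with a midpoint rule and compares the result with the closed form 1 - exp[-λt]. The rate parameter is written `lam`; its value and the evaluation point are arbitrary illustrative choices.

```python
import math


def density(t, lam):
    """Exponential density with rate parameter lam."""
    return lam * math.exp(-lam * t)


def survivor(t, lam):
    """Closed-form survivor function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def cdf_numeric(t, lam, steps=10000):
    """Midpoint-rule approximation to the integral of the density from 0 to t."""
    width = t / steps
    return sum(density((i + 0.5) * width, lam) for i in range(steps)) * width


lam, t = 0.5, 2.0                      # illustrative values only
closed_form = 1.0 - survivor(t, lam)
numeric = cdf_numeric(t, lam)
log_survivor = math.log(survivor(t, lam))   # should equal -lam * t
print(round(closed_form, 6), round(numeric, 6), round(log_survivor, 6))
```

Because ln S(t) is linear in t with slope -λ, plotting the log of an estimated survivor function against time is a quick graphical test of whether an exponential model is plausible.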
Exponential Distribution (Cont.) • Mean = 1/λ = ∫_0^∞ t f(t) dt • Memoryless feature: • Duration conditional on surviving until t = τ: • DURC(τ) = [∫_τ^∞ t f(t) dt]/S(τ) = τ + 1/λ • Expected remaining duration = duration conditional on surviving until time τ, i.e. DURC, minus τ • Or 1/λ, which is equal to the overall mean, so the distribution is memoryless
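The step behind the conditional duration can be filled in with one integration by parts. The notation is an assumption made here: λ is the rate parameter of the exponential density, τ is the time survived so far, and DURC is the expected duration conditional on surviving until τ.

```latex
\begin{aligned}
\int_{\tau}^{\infty} t\,\lambda e^{-\lambda t}\,dt
  &= \Bigl[-t\,e^{-\lambda t}\Bigr]_{\tau}^{\infty}
     + \int_{\tau}^{\infty} e^{-\lambda t}\,dt
   = \tau e^{-\lambda \tau} + \frac{1}{\lambda}\,e^{-\lambda \tau}, \\[4pt]
\mathrm{DURC}(\tau)
  &= \frac{\int_{\tau}^{\infty} t\,\lambda e^{-\lambda t}\,dt}{S(\tau)}
   = \frac{e^{-\lambda \tau}\left(\tau + \frac{1}{\lambda}\right)}{e^{-\lambda \tau}}
   = \tau + \frac{1}{\lambda}, \\[4pt]
\mathrm{DURC}(\tau) - \tau &= \frac{1}{\lambda}.
\end{aligned}
```

Setting τ = 0 recovers the unconditional mean 1/λ, so the expected remaining duration does not depend on how long the spell has already lasted.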
Exponential Distribution (Cont.) • Hazard rate or function, h(t), is the probability of failure conditional on survival until that time, and is the ratio of the density function to the survivor function. It is a constant for the exponential. • h(t) = f(t)/S(t) = λ exp[-λt]/exp[-λt] = λ
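The constant hazard can be confirmed numerically by evaluating the ratio of density to survivor function at several times. This is a small sketch; the rate `lam` and the evaluation times are arbitrary values chosen for illustration.

```python
import math


def hazard(t, lam):
    """Hazard h(t) = f(t) / S(t) for an exponential with rate lam."""
    density = lam * math.exp(-lam * t)
    survivor = math.exp(-lam * t)
    return density / survivor


lam = 0.25                                   # illustrative rate
times = (0.0, 1.0, 5.0, 20.0)
hazards = [hazard(t, lam) for t in times]    # each entry is lam, up to rounding
print(hazards)
```

A distribution whose hazard rises or falls with t would fail this check, which is one reason to compare the exponential against the graphical Kaplan-Meier estimate before adopting it.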
Model Building • Reference: Ch 20
20.2 Polynomial Models • There are models where the independent variables (xi) may appear as functions of a smaller number of predictor variables. • Polynomial models are one such example.
Polynomial Models with One Predictor Variable • General linear model: y = b0 + b1*x1 + b2*x2 + … + bp*xp + e • Polynomial model of order p: y = b0 + b1*x + b2*x^2 + … + bp*x^p + e
Polynomial Models with One Predictor Variable • First order model (p = 1) • y = b0 + b1*x + e • Second order model (p = 2) • y = b0 + b1*x + b2*x^2 + e • [Figure: second order curves for b2 < 0 and b2 > 0]
Polynomial Models with One Predictor Variable • Third order model (p = 3) • y = b0 + b1*x + b2*x^2 + b3*x^3 + e • [Figure: third order curves for b3 < 0 and b3 > 0]
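A polynomial model in one predictor is still linear in its coefficients: the regressors are simply the powers of x. The sketch below builds those regressors and evaluates the fitted part of the model. The coefficient values are hypothetical and the function names are illustrative.

```python
def polynomial_terms(x, p):
    """Return [1, x, x**2, ..., x**p], the regressors of an order-p model."""
    return [x ** k for k in range(p + 1)]


def predict(coefficients, x):
    """Evaluate b0 + b1*x + ... + bp*x**p for coefficients [b0, b1, ..., bp]."""
    p = len(coefficients) - 1
    return sum(b * term for b, term in zip(coefficients, polynomial_terms(x, p)))


# Hypothetical second order model with b2 < 0 (a curve that opens downward).
b = [1.0, 2.0, -0.5]
y_hat = predict(b, 3.0)     # 1 + 2*3 - 0.5*9
print(y_hat)  # 2.5
```

Because each power of x enters as its own column, the coefficients can be estimated with the same least squares machinery used for any multiple regression.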
Polynomial Models with Two Predictor Variables • First order model • y = b0 + b1*x1 + b2*x2 + e • [Figure: y plotted against x1 and x2 for b1 > 0, b1 < 0, b2 > 0, and b2 < 0]
20.3 Nominal Independent Variables • In many real-life situations one or more independent variables are nominal. • Including nominal variables in a regression analysis model is done via indicator variables. • An indicator variable (I) can assume one out of two values, “zero” or “one”. • Examples: • I = 1 if the temperature was below 50°; I = 0 if the temperature was 50° or more • I = 1 if a first condition out of two is met; I = 0 if a second condition out of two is met • I = 1 if data were collected before 1980; I = 0 if data were collected after 1980 • I = 1 if a degree earned is in Finance; I = 0 if a degree earned is not in Finance
Nominal Independent Variables; Example: Auction Car Price (II) • Example 18.2 - revised (Xm18-02a) • Recall: A car dealer wants to predict the auction price of a car. • The dealer believes now that odometer reading and the car color are variables that affect a car’s price. • Three color categories are considered: • White • Silver • Other colors • Note: Color is a nominal variable.
Nominal Independent Variables; Example: Auction Car Price (II) • Example 18.2 - revised (Xm18-02b) • I1 = 1 if the color is white; I1 = 0 if the color is not white • I2 = 1 if the color is silver; I2 = 0 if the color is not silver • The category “Other colors” is defined by: I1 = 0; I2 = 0
How Many Indicator Variables? • Note: To represent the situation of three possible colors we need only two indicator variables. • Conclusion: To represent a nominal variable with m possible categories, we must create m-1 indicator variables.
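The m-1 rule can be illustrated with a small encoder that leaves out one baseline category. This is a sketch only; the function name and the lowercase category labels are illustrative, with "other" standing in for the “Other colors” category of the car example.

```python
def indicator_variables(value, categories, baseline):
    """Encode a nominal value as m-1 indicators, omitting the baseline category."""
    return [1 if value == c else 0 for c in categories if c != baseline]


colors = ["white", "silver", "other"]          # m = 3 categories
white = indicator_variables("white", colors, baseline="other")
silver = indicator_variables("silver", colors, baseline="other")
other = indicator_variables("other", colors, baseline="other")
print(white, silver, other)  # [1, 0] [0, 1] [0, 0]
```

A third indicator for the baseline would always equal 1 minus the sum of the other two, so it would be perfectly collinear with the intercept and the model could not be estimated.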
Nominal Independent Variables; Example: Auction Car Price • Solution • The proposed model is y = b0 + b1(Odometer) + b2I1 + b3I2 + e • [Figure: the data, with white, silver, and other color cars labeled]
Example: Auction Car Price The Regression Equation • From Excel (Xm18-02b) we get the regression equation • PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • The equation for an “other color” car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer) • The equation for a white color car: Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer) • The equation for a silver color car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer) • [Figure: Price against Odometer, three parallel lines, one for each color category]
Example: Auction Car Price The Regression Equation • From Excel we get the regression equation • PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • For one additional mile the auction price decreases by 5.55 cents, on average. • A white car sells, on average, for $90.48 more than a car of the “Other color” category. • A silver color car sells, on average, for $295.48 more than a car of the “Other color” category.
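The interpretation above can be checked by evaluating the fitted equation for each color at the same odometer reading. The coefficients come from the regression equation on the slide; the function name and the odometer value of 36,000 miles are illustrative assumptions.

```python
def predicted_price(odometer, color):
    """Fitted auction price from the slide's regression equation."""
    i1 = 1 if color == "white" else 0     # indicator for white
    i2 = 1 if color == "silver" else 0    # indicator for silver
    return 16701 - 0.0555 * odometer + 90.48 * i1 + 295.48 * i2


miles = 36000                             # hypothetical odometer reading
other = predicted_price(miles, "other")
white = predicted_price(miles, "white")
silver = predicted_price(miles, "silver")
print(round(other, 2), round(white, 2), round(silver, 2))
# 14703.0 14793.48 14998.48
```

The gaps between the three predictions equal the indicator coefficients at every odometer reading, which is why the three fitted lines are parallel.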
Example: Auction Car Price The Regression Equation (Xm18-02b) • There is insufficient evidence to infer that a white color car and a car of “other color” sell for a different auction price. • There is sufficient evidence to infer that a silver color car sells for a larger price than a car of the “other color” category.
Nominal Independent Variables; Example: MBA Program Admission (MBA II) • Recall: The Dean wanted to evaluate applications for the MBA program by predicting future performance of the applicants. • The following three predictors were suggested: • Undergraduate GPA • GMAT score • Years of work experience • It is now believed that the type of undergraduate degree should be included in the model. Note: The undergraduate degree is nominal data.
Nominal Independent Variables; Example: MBA Program Admission (II) • I1 = 1 if B.A.; I1 = 0 otherwise • I2 = 1 if B.B.A.; I2 = 0 otherwise • I3 = 1 if B.Sc. or B.Eng.; I3 = 0 otherwise • The category “Other group” is defined by: I1 = 0; I2 = 0; I3 = 0
Nominal Independent Variables; Example: MBA Program Admission (II) MBA-II
20.4 Applications in Human Resources Management: Pay-Equity • Pay-equity can be handled in two different forms: • Equal pay for equal work • Equal pay for work of equal value. • Regression analysis is extensively employed in cases of equal pay for equal work.
Human Resources Management: Pay-Equity • Solution • Construct the following multiple regression model: y = b0 + b1Education + b2Experience + b3Gender + e • Note the nature of the variables: • Education – Interval • Experience – Interval • Gender – Nominal (Gender = 1 if male; = 0 otherwise).
Human Resources Management: Pay-Equity • Solution – Continued (Xm20-03) • Analysis and Interpretation • The model fits the data quite well. • The model is very useful. • Experience is a variable strongly related to salary. • There is no evidence of sex discrimination.
Human Resources Management: Pay-Equity • Solution – Continued (Xm20-03) • Analysis and Interpretation • Further studying the data we find: • Average experience (years) for women is 12. • Average experience (years) for men is 17. • Average salary for female managers is $76,189. • Average salary for male managers is $97,832.
Midterm Grade Distribution • A: 68- 7 • A-: 65-67 7 • B+: 61-64 9 • B: -59 7 • total 30
Midterm Grade Distribution: Normal Distribution • If you scored above the median, A- or A; otherwise, B or B+