Introduction to Logistic Regression. Up to this point, we have been dealing with categorical exposure variables.
Introduction to Logistic Regression • Up to this point, we have been dealing with categorical exposure variables. • If the exposure variable is continuous (for example, income), the results covered in previous lectures cannot be applied unless you categorize it. Keep in mind that categorizing a continuous exposure variable is problematic in some cases.
Introduction to Logistic Regression • Methods are therefore needed to deal with a continuous exposure variable without categorization. • Consider the research question of whether or not income is associated with buying a new car.
Introduction to Logistic Regression • Q: Can we study the association by modeling the probability of buying a car as a linear function of income, i.e., $P(y=1 \mid x) = \beta_0 + \beta_1 x$, where y=1 if one buys a new car and 0 otherwise, and x=income?
Introduction to Logistic Regression • A: No, because the right-hand side of the equation can be negative and can exceed 1, which is not allowable since a probability is always between 0 and 1. • Therefore, we are looking for a function of x that lies in (0,1). • One such function is the logistic function, defined as: $f(x) = \dfrac{e^{x}}{1+e^{x}} = \dfrac{1}{1+e^{-x}}$
Introduction to Logistic Regression • Note that this logistic function is S-shaped, which means that changing the exposure level does not affect the probability much if the exposure level is low or high. • Therefore, the logistic function is suitable for describing the association of income with the probability of buying a new car because changing income does not affect the probability much if income is low or high.
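The contrast between the linear model and the logistic function can be sketched numerically. The snippet below is only an illustration: the coefficient values and income grid are hypothetical and were chosen to make the effect visible, not taken from the lecture.

```python
import math

def linear_probability(income, b0, b1):
    """Linear model for P(y=1 | x); nothing keeps it inside [0, 1]."""
    return b0 + b1 * income

def logistic(z):
    """Logistic function; its value always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration (income in dollars).
LIN_B0, LIN_B1 = -0.2, 0.00001
LOG_B0, LOG_B1 = -3.0, 0.00006

incomes = [0, 10_000, 50_000, 100_000, 200_000]
linear_values = [linear_probability(x, LIN_B0, LIN_B1) for x in incomes]
logistic_values = [logistic(LOG_B0 + LOG_B1 * x) for x in incomes]

for x, lin, lgs in zip(incomes, linear_values, logistic_values):
    print(f"income={x:>7}  linear={lin:6.2f}  logistic={lgs:.4f}")
```

With these assumed values the linear model returns a negative "probability" at the lowest income and a value above 1 at the highest, while the logistic values stay inside (0,1) and change slowly at both ends of the income range.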
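Logistic Regression Model • A Logistic Regression Model models the conditional probability of Y=1 given explanatory variables X1,…,Xr as a logistic function of a linear combination of X1,…,Xr, i.e., $P(Y=1 \mid X_1,\dots,X_r) = \dfrac{e^{\beta_0+\beta_1 X_1+\cdots+\beta_r X_r}}{1+e^{\beta_0+\beta_1 X_1+\cdots+\beta_r X_r}}$
Logistic Regression Model • Equivalently, a Logistic Regression Model models the logarithm of the conditional odds of Y=1 given explanatory variables X1,…,Xr as a linear function of X1,…,Xr, i.e., $\log\dfrac{P(Y=1 \mid X_1,\dots,X_r)}{1-P(Y=1 \mid X_1,\dots,X_r)} = \beta_0+\beta_1 X_1+\cdots+\beta_r X_r$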
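The equivalence of the two formulations (probability as a logistic function, log odds as a linear function) comes from the fact that the log-odds (logit) transform and the logistic function are inverses of each other. A minimal sketch, with hypothetical function names and coefficient values:

```python
import math

def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) to its log odds."""
    return math.log(p / (1.0 - p))

def prob_from_linear_predictor(betas, xs):
    """P(Y=1 | X1..Xr) for coefficients [b0, b1, ..., br] and values [x1, ..., xr]."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return logistic(z)

# Hypothetical coefficients and covariate values, for illustration only.
betas = [-1.0, 0.8, -0.3]
xs = [2.0, 1.0]
p = prob_from_linear_predictor(betas, xs)
print(p, logit(p))  # the log odds recover the linear predictor b0 + b1*x1 + b2*x2
```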
Logistic Regression Model: Marginal (crude) Odds Ratio • A Logistic Regression Model allows one to obtain the marginal (crude) odds ratio to study the association of an explanatory variable, X, with a binary response variable, Y, where X can be either a qualitative or a quantitative explanatory variable. • The remainder of our discussion is split into two parts, one for X = a qualitative explanatory variable, and one for X = a quantitative explanatory variable.
Logistic Regression Model: Marginal (crude) Odds Ratio • Q: How do we set up a regression model to obtain the marginal (crude) odds ratio to study the association of a qualitative explanatory variable with a binary response variable? • Examples of qualitative explanatory variables: gender (female, male) and race (white, black, others). • A: A qualitative explanatory variable cannot be included directly in a Logistic Regression Model. Dummy variable(s) need to be created and then included in the model. Dummy variables can be viewed as a way of quantitatively identifying the classes of a qualitative explanatory variable.
Logistic Regression Model: Marginal (crude) Odds Ratio • Example 1: qualitative explanatory variable=gender • Create a dummy variable, gdum, for gender: gdum=1 for one gender and gdum=0 for the other. • Use the following logistic regression model to link the conditional odds of Y=1 (having lung cancer) given gdum with gdum: $\log\dfrac{P(Y=1 \mid gdum)}{1-P(Y=1 \mid gdum)} = \beta_0+\beta_1\, gdum$
Logistic Regression Model: Marginal (crude) Odds Ratio • Interpretation of $e^{\beta_1}$: The ratio of the odds of Y=1 for gdum=1 to the odds of Y=1 for gdum=0. • Why? Because $\beta_1$ is the logarithm of the ratio of the odds of Y=1 for gdum=1 to the odds of Y=1 for gdum=0.
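The "Why?" step can be written out explicitly. The derivation below assumes the model for Example 1 has an intercept $\beta_0$ and a single slope $\beta_1$ on gdum; the symbol names are an assumed notation.

```latex
\begin{aligned}
\log \operatorname{odds}(Y=1 \mid \mathrm{gdum}=1) &= \beta_0 + \beta_1, \\
\log \operatorname{odds}(Y=1 \mid \mathrm{gdum}=0) &= \beta_0, \\
\log \frac{\operatorname{odds}(Y=1 \mid \mathrm{gdum}=1)}{\operatorname{odds}(Y=1 \mid \mathrm{gdum}=0)}
  &= (\beta_0 + \beta_1) - \beta_0 = \beta_1, \\
\frac{\operatorname{odds}(Y=1 \mid \mathrm{gdum}=1)}{\operatorname{odds}(Y=1 \mid \mathrm{gdum}=0)}
  &= e^{\beta_1}.
\end{aligned}
```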
Logistic Regression Model: Marginal (crude) Odds Ratio • Example 2: qualitative explanatory variable=race (white, black and hispanic) • Choose two classes of race, for example, white and black, and then create a dummy variable for each of these two classes: rdum1=1 if white and 0 otherwise; rdum2=1 if black and 0 otherwise. • Use the following logistic regression model to link the conditional odds of Y=1 (having lung cancer) given rdum1 and rdum2 with rdum1 and rdum2: $\log\dfrac{P(Y=1 \mid rdum1, rdum2)}{1-P(Y=1 \mid rdum1, rdum2)} = \beta_0+\beta_1\, rdum1+\beta_2\, rdum2$
Logistic Regression Model: Marginal (crude) Odds Ratio • Interpretation of $e^{\beta_1}$: The ratio of the odds of Y=1 for rdum1=1 to the odds of Y=1 for rdum1=0 with rdum2 fixed at 0. • Why? Because $\beta_1$ is the logarithm of the ratio of the odds of Y=1 for rdum1=1 to the odds of Y=1 for rdum1=0 with rdum2 fixed at 0.
Logistic Regression Model: Marginal (crude) Odds Ratio • Correspondence between classes of race and the values of rdum1 and rdum2: white: rdum1=1, rdum2=0; black: rdum1=0, rdum2=1; hispanic: rdum1=0, rdum2=0.
Logistic Regression Model: Marginal (crude) Odds Ratio • Since rdum1=1 and rdum2=0 corresponds to the white group and rdum1=0 and rdum2=0 corresponds to the hispanic group, the interpretation of $e^{\beta_1}$ becomes the ratio of the odds of Y=1 for white to the odds of Y=1 for hispanic.
Logistic Regression Model: Marginal (crude) Odds Ratio • Interpretation of $e^{\beta_2}$: The ratio of the odds of Y=1 for rdum1=0 and rdum2=1 to the odds of Y=1 for rdum1=0 and rdum2=0. • Why? Because $\beta_2$ is the logarithm of the ratio of the odds of Y=1 for rdum1=0 and rdum2=1 to the odds of Y=1 for rdum1=0 and rdum2=0.
Logistic Regression Model: Marginal (crude) Odds Ratio • Since rdum1=0 and rdum2=1 corresponds to the black group and rdum1=0 and rdum2=0 corresponds to the hispanic group, the interpretation of $e^{\beta_2}$ becomes the ratio of the odds of Y=1 for black to the odds of Y=1 for hispanic.
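The dummy coding of Example 2 and the resulting odds ratios can be sketched in a few lines. The function names and the coefficient values below are hypothetical; only the coding (white and black get a dummy, hispanic is the reference class) follows the example.

```python
import math

def race_dummies(race):
    """Return (rdum1, rdum2); hispanic is the reference class (0, 0)."""
    if race not in ("white", "black", "hispanic"):
        raise ValueError(f"unknown race class: {race}")
    rdum1 = 1 if race == "white" else 0
    rdum2 = 1 if race == "black" else 0
    return rdum1, rdum2

def log_odds(race, b0, b1, b2):
    """Log odds of Y=1 under log odds = b0 + b1*rdum1 + b2*rdum2."""
    rdum1, rdum2 = race_dummies(race)
    return b0 + b1 * rdum1 + b2 * rdum2

def odds_ratio_vs_reference(race, b0, b1, b2):
    """Odds of Y=1 for `race` divided by the odds of Y=1 for hispanic."""
    return math.exp(log_odds(race, b0, b1, b2) - log_odds("hispanic", b0, b1, b2))

# Hypothetical coefficient values, for illustration only.
b0, b1, b2 = -2.0, 0.4, 0.9
print(odds_ratio_vs_reference("white", b0, b1, b2))  # equals exp(b1)
print(odds_ratio_vs_reference("black", b0, b1, b2))  # equals exp(b2)
```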
Logistic Regression Model: Marginal (crude) Odds Ratio • Q: Can we also create a dummy variable rdum3 for hispanic (rdum3=1 if hispanic and 0 otherwise) and include it along with the other two dummy variables rdum1 and rdum2 in a Logistic Regression Model? i.e., $\log\dfrac{P(Y=1 \mid rdum1, rdum2, rdum3)}{1-P(Y=1 \mid rdum1, rdum2, rdum3)} = \beta_0+\beta_1\, rdum1+\beta_2\, rdum2+\beta_3\, rdum3$ • A: Definitely not. Two reasons are provided on the slides that follow.
Logistic Regression Model: Marginal (crude) Odds Ratio • Reason 1: $e^{\beta_1}$ and $e^{\beta_2}$ can no longer be interpreted as odds ratios. • Correspondence between classes of race and values of rdum1, rdum2 and rdum3: white: (1, 0, 0); black: (0, 1, 0); hispanic: (0, 0, 1).
Logistic Regression Model: Marginal (crude) Odds Ratio • The ratio of the odds of Y=1 for white to the odds of Y=1 for hispanic becomes $e^{\beta_1-\beta_3}$.
Logistic Regression Model: Marginal (crude) Odds Ratio • The ratio of the odds of Y=1 for black to the odds of Y=1 for hispanic becomes $e^{\beta_2-\beta_3}$.
Logistic Regression Model: Marginal (crude) Odds Ratio • Reason 2: More importantly, it causes an over-parameterization problem, i.e., the number of parameters in the model is larger than the number of quantities that can be estimated. • The three estimable quantities are the three log odds, with $\beta_0+\beta_1$ corresponding to white, $\beta_0+\beta_2$ corresponding to black, and $\beta_0+\beta_3$ corresponding to hispanic, i.e., $\log \text{odds}(Y=1 \mid \text{white}) = \beta_0+\beta_1$, $\log \text{odds}(Y=1 \mid \text{black}) = \beta_0+\beta_2$, $\log \text{odds}(Y=1 \mid \text{hispanic}) = \beta_0+\beta_3$.
Logistic Regression Model: Marginal (crude) Odds Ratio • However, the model uses four parameters, $\beta_0, \beta_1, \beta_2, \beta_3$, to describe the three estimable quantities. As a result, $\beta_0, \beta_1, \beta_2, \beta_3$ are not estimable.
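The over-parameterization problem can be made concrete: with an intercept and one dummy per class, shifting the intercept by any constant and subtracting the same constant from every dummy coefficient leaves all three log odds unchanged, so the data cannot distinguish the two parameter sets. A small sketch with hypothetical parameter values:

```python
import math

RACES = ("white", "black", "hispanic")

def three_dummies(race):
    """Return (rdum1, rdum2, rdum3): one dummy per class, no reference class."""
    return tuple(1 if race == r else 0 for r in RACES)

def log_odds(race, b0, b1, b2, b3):
    """Log odds of Y=1 under the model with all three dummies."""
    rdum1, rdum2, rdum3 = three_dummies(race)
    return b0 + b1 * rdum1 + b2 * rdum2 + b3 * rdum3

# Two different (hypothetical) parameter vectors: params_b shifts the intercept
# up by c and every dummy coefficient down by c.
c = 10.0
params_a = (0.5, 1.0, 2.0, 3.0)
params_b = (0.5 + c, 1.0 - c, 2.0 - c, 3.0 - c)

for race in RACES:
    # The dummies always sum to 1, i.e. they reproduce the intercept column.
    print(race, sum(three_dummies(race)),
          log_odds(race, *params_a), log_odds(race, *params_b))
```

Both parameter vectors give identical log odds for every class, which is why the four coefficients cannot be estimated separately.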
Logistic Regression Model: Marginal (crude) Odds Ratio • In general, for a qualitative explanatory variable that has s classes, s-1 dummy variables need to be created. • Choose one class to be the reference class (for example, hispanic is used as the reference class in example 2). Create one dummy variable for each of the non-reference classes (for example, white and black are the non-reference classes in example 2), and then include those s-1 dummy variables in a Logistic Regression Model.
Logistic Regression Model: Marginal (crude) Odds Ratio • Q: How do we set up a regression model to study the association of a quantitative explanatory variable with a binary response variable? • Examples of quantitative explanatory variables: age, income. • A: Unlike a qualitative explanatory variable, a quantitative explanatory variable can be included in the model directly.
Logistic Regression Model: Marginal (crude) Odds Ratio • Example 3: quantitative explanatory variable=age • Use the following logistic regression model to link the conditional odds of Y=1 (having lung cancer) given age with age: $\log\dfrac{P(Y=1 \mid age)}{1-P(Y=1 \mid age)} = \beta_0+\beta_1\, age$
Logistic Regression Model: Marginal (crude) Odds Ratio • Interpretation of $e^{\beta_1}$: The ratio of the odds of Y=1 for age=age0+1 to the odds of Y=1 for age=age0. • Why? Because $\beta_1$ is the logarithm of the ratio of the odds of Y=1 for age=age0+1 to the odds of Y=1 for age=age0.
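One consequence of this interpretation is that the odds ratio for a one-year increase does not depend on the starting age age0, and a k-year increase multiplies the odds by the one-year odds ratio raised to the power k. A short sketch, assuming the model log odds = b0 + b1*age with hypothetical coefficient values:

```python
import math

def odds(age, b0, b1):
    """Odds of Y=1 at a given age under log odds = b0 + b1*age."""
    return math.exp(b0 + b1 * age)

def odds_ratio(age_from, age_to, b0, b1):
    """Odds of Y=1 at age_to divided by the odds of Y=1 at age_from."""
    return odds(age_to, b0, b1) / odds(age_from, b0, b1)

# Hypothetical coefficients, for illustration only.
b0, b1 = -5.0, 0.05

print(odds_ratio(40, 41, b0, b1))  # one-year increase starting at 40
print(odds_ratio(70, 71, b0, b1))  # one-year increase starting at 70: same ratio
print(odds_ratio(40, 50, b0, b1))  # ten-year increase: exp(10 * b1)
```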
Logistic Regression Model: Marginal (crude) Odds Ratio • Q: How do we set up a regression model to study the association of an ordinal explanatory variable with a binary response variable? • Example of an ordinal explanatory variable: income level (=0 if income<30,000, =1 if 30,000<=income<50,000, =2 if income>=50,000). • A: An ordinal explanatory variable can be treated as either a qualitative or a quantitative explanatory variable. However, different treatments lead to different interpretations.
Logistic Regression Model: Marginal (crude) Odds Ratio • Example 4: ordinal explanatory variable=income_level (0,1,2) treated as a qualitative explanatory variable • Choose two classes of income_level, for example, 1 and 2, and then create a dummy variable for each of these two classes: incdum1=1 if income_level=1 and 0 otherwise; incdum2=1 if income_level=2 and 0 otherwise. • Use the following logistic regression model to link the conditional odds of Y=1 (having lung cancer) given incdum1 and incdum2 with incdum1 and incdum2: $\log\dfrac{P(Y=1 \mid incdum1, incdum2)}{1-P(Y=1 \mid incdum1, incdum2)} = \beta_0+\beta_1\, incdum1+\beta_2\, incdum2$
Logistic Regression Model: Marginal (crude) Odds Ratio • The interpretation of $e^{\beta_1}$: the ratio of the odds of Y=1 for income_level=1 to the odds of Y=1 for income_level=0. • The interpretation of $e^{\beta_2}$: the ratio of the odds of Y=1 for income_level=2 to the odds of Y=1 for income_level=0.
Logistic Regression Model: Marginal (crude) Odds Ratio • Example 5: ordinal explanatory variable=income_level treated as a quantitative explanatory variable • Use the following logistic regression model to link the conditional odds of Y=1 (having lung cancer) given income_level with income_level: $\log\dfrac{P(Y=1 \mid income\_level)}{1-P(Y=1 \mid income\_level)} = \beta_0+\beta_1\, income\_level$
Logistic Regression Model: Marginal (crude) Odds Ratio • Interpretation of $e^{\beta_1}$: The ratio of the odds of Y=1 for income_level=1 to the odds of Y=1 for income_level=0. • Why? Because $\beta_1$ is the logarithm of the ratio of the odds of Y=1 for income_level=1 to the odds of Y=1 for income_level=0.
Logistic Regression Model: Marginal (crude) Odds Ratio • $e^{\beta_1}$ has a second interpretation: The ratio of the odds of Y=1 for income_level=2 to the odds of Y=1 for income_level=1. • Why? Because $\beta_1$ is also the logarithm of the ratio of the odds of Y=1 for income_level=2 to the odds of Y=1 for income_level=1.
Logistic Regression Model: Marginal (crude) Odds Ratio • Q: What is the interpretation of $e^{2\beta_1}$? • A: The ratio of the odds of Y=1 for income_level=2 to the odds of Y=1 for income_level=0. • Why? Because $2\beta_1$ is the logarithm of the ratio of the odds of Y=1 for income_level=2 to the odds of Y=1 for income_level=0.
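The difference between the two treatments of income_level can be sketched side by side. In the qualitative coding the odds ratios for level 1 versus 0 and level 2 versus 0 are free to take any values; in the quantitative coding the second is forced to be the square of the first. Function names and coefficient values here are hypothetical.

```python
import math

def log_odds_qualitative(level, b0, b1, b2):
    """Example 4 style: dummies incdum1, incdum2 with level 0 as reference."""
    incdum1 = 1 if level == 1 else 0
    incdum2 = 1 if level == 2 else 0
    return b0 + b1 * incdum1 + b2 * incdum2

def log_odds_quantitative(level, b0, b1):
    """Example 5 style: income_level (0, 1, 2) entered as a number."""
    return b0 + b1 * level

def odds_ratio(log_odds_a, log_odds_b):
    """Odds ratio corresponding to two log odds values."""
    return math.exp(log_odds_a - log_odds_b)

# Hypothetical coefficients, for illustration only.
qual = (-1.0, 0.3, 1.1)   # b0, b1, b2
quant = (-1.0, 0.5)       # b0, b1

or_qual_1v0 = odds_ratio(log_odds_qualitative(1, *qual), log_odds_qualitative(0, *qual))
or_qual_2v0 = odds_ratio(log_odds_qualitative(2, *qual), log_odds_qualitative(0, *qual))
or_quant_1v0 = odds_ratio(log_odds_quantitative(1, *quant), log_odds_quantitative(0, *quant))
or_quant_2v1 = odds_ratio(log_odds_quantitative(2, *quant), log_odds_quantitative(1, *quant))
or_quant_2v0 = odds_ratio(log_odds_quantitative(2, *quant), log_odds_quantitative(0, *quant))

print(or_qual_1v0, or_qual_2v0)                   # exp(b1), exp(b2): unrelated
print(or_quant_1v0, or_quant_2v1, or_quant_2v0)   # exp(b1), exp(b1), exp(2*b1)
```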
Logistic Regression Model: Common conditional (adjusted) Odds Ratio • We have understood the issue of using marginal (crude) odds ratio to assess the association of an exposure variable X with a binary variable Y in the presence of a confounding variable Z and the need to use conditional (adjusted) Odds Ratios. • Q: Let’s assume for now that the conditional odds ratios are the same and ask the question of how to use a Logistic Regression Model to estimate the Common conditional (adjusted) Odds Ratio?
Logistic Regression Model: Common conditional (adjusted) Odds Ratio • A: All you need to do is to add the confounding variable Z (or its dummy variables in the case where Z is a qualitative explanatory variable) to the Logistic Regression Model that contains the exposure variable X (or its dummy variables in the case where X is a qualitative explanatory variable).
Logistic Regression Model: Common conditional (adjusted) Odds Ratio • Example 6: X=gender, Z=tea drinking: • Create one dummy variable for X (gdum) and one dummy variable for Z (tdum), • and use the following Logistic Regression Model: $\log\dfrac{P(Y=1 \mid gdum, tdum)}{1-P(Y=1 \mid gdum, tdum)} = \beta_0+\beta_1\, gdum+\beta_2\, tdum$
Logistic Regression Model: Common conditional (adjusted) Odds Ratio • Interpretation of $e^{\beta_1}$: The common conditional (adjusted) odds ratio of Y and X, conditioning (adjusting) on Z. • Why? • $e^{\beta_1}$ is equal to the ratio of the odds of Y=1 for gdum=1 to the odds of Y=1 for gdum=0 with tdum fixed at 1.
Logistic Regression Model: Common conditional (adjusted) Odds Ratio • Why? (cont.) • $e^{\beta_1}$ is also equal to the ratio of the odds of Y=1 for gdum=1 to the odds of Y=1 for gdum=0 with tdum fixed at 0.
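Both "Why?" statements follow from one calculation, written here for a generic fixed value t of tdum. The derivation assumes the Example 6 model has the form log odds = $\beta_0 + \beta_1\,\mathrm{gdum} + \beta_2\,\mathrm{tdum}$ with no interaction term; the symbol names are an assumed notation.

```latex
\begin{aligned}
\frac{\operatorname{odds}(Y=1 \mid \mathrm{gdum}=1,\ \mathrm{tdum}=t)}
     {\operatorname{odds}(Y=1 \mid \mathrm{gdum}=0,\ \mathrm{tdum}=t)}
  &= \frac{e^{\beta_0 + \beta_1 + \beta_2 t}}{e^{\beta_0 + \beta_2 t}}
   = e^{\beta_1}, \qquad t \in \{0, 1\}.
\end{aligned}
```

The term $\beta_2 t$ cancels, so the odds ratio is the same in both strata of Z, which is exactly the common conditional odds ratio assumption.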
Logistic Regression Model: Test For effect modification • Q: Now, let’s remove the assumption that the conditional odds ratios are the same and ask the question of how to use a Logistic Regression Model to test whether or not Z is an effect modifier with respect to the association of X with Y.
Logistic Regression Model: Test For effect modification • A: All you need to do is to add the interaction term(s) into the logistic regression model assuming no effect modification and test for the null hypothesis that all the beta coefficients of interaction terms are equal to 0. How to create interaction term(s) depends on which case of the following four cases it is. • Case 1: both X and Z are qualitative • Case 2: X is qualitative and Z is quantitative • Case 3: X is quantitative and Z is qualitative • Case 4: both X and Z are quantitative
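Logistic Regression Model: Test For effect modification • Case 1: both X and Z are qualitative • (r-1)(s-1) interaction terms need to be created: $X_i Z_j$ for $i=1,\dots,r-1$ and $j=1,\dots,s-1$, where $X_1,\dots,X_{r-1}$ are the r-1 dummy variables of X, and $Z_1,\dots,Z_{s-1}$ are the s-1 dummy variables of Z.
Logistic Regression Model: Test For effect modification Example 1 of case 1: X=gender, Z=tea drinking: • Create one dummy variable for X (gdum) and one dummy variable for Z (tdum). • Create one interaction term: gdum×tdum, • and use the following Logistic Regression Model: $\log\dfrac{P(Y=1 \mid gdum, tdum)}{1-P(Y=1 \mid gdum, tdum)} = \beta_0+\beta_1\, gdum+\beta_2\, tdum+\beta_3\, gdum \times tdum$
Logistic Regression Model: Test For effect modification • Example 1 of case 1 (cont.) • Fact: the null hypothesis that Z is not an effect modifier is equivalent to $\beta_3=0$. • Why? • $e^{\beta_1+\beta_3}$ is equal to the ratio of the odds of Y=1 for gdum=1 and tdum=1 to the odds of Y=1 for gdum=0 and tdum=1.
Logistic Regression Model: Test For effect modification • Example 1 of case 1 (cont.) • $e^{\beta_1}$ is equal to the ratio of the odds of Y=1 for gdum=1 and tdum=0 to the odds of Y=1 for gdum=0 and tdum=0. • The two conditional odds ratios are equal if and only if $\beta_3=0$.
Logistic Regression Model: Test For effect modification Example 2 of case 1: X=gender, Z=race: • Create one dummy variable for X (gdum) and two dummy variables for Z (rdum1, rdum2). • Create two interaction terms: gdum×rdum1 and gdum×rdum2, • and use the following Logistic Regression Model: $\log \text{odds}(Y=1) = \beta_0+\beta_1\, gdum+\beta_2\, rdum1+\beta_3\, rdum2+\beta_4\, gdum \times rdum1+\beta_5\, gdum \times rdum2$
Logistic Regression Model: Test For effect modification • Case 2: X is qualitative and Z is quantitative • r-1 interaction terms need to be created: $X_1 Z,\dots,X_{r-1} Z$, where $X_1,\dots,X_{r-1}$ are the r-1 dummy variables of X.
Logistic Regression Model: Test For effect modification Example of case 2: X=race, Z=age: • Create two dummy variables for X (rdum1, rdum2). • Create two interaction terms: rdum1×age and rdum2×age, • and use the following Logistic Regression Model: $\log \text{odds}(Y=1) = \beta_0+\beta_1\, rdum1+\beta_2\, rdum2+\beta_3\, age+\beta_4\, rdum1 \times age+\beta_5\, rdum2 \times age$
Logistic Regression Model: Test For effect modification • Case 3: X is quantitative and Z is qualitative • s-1 interaction terms need to be created: $X Z_1,\dots,X Z_{s-1}$, where $Z_1,\dots,Z_{s-1}$ are the s-1 dummy variables of Z.
Logistic Regression Model: Test For effect modification Example of case 3: X=age, Z=race: • Create two dummy variables for Z (rdum1, rdum2). • Create two interaction terms: age×rdum1 and age×rdum2, • and use the following Logistic Regression Model: $\log \text{odds}(Y=1) = \beta_0+\beta_1\, age+\beta_2\, rdum1+\beta_3\, rdum2+\beta_4\, age \times rdum1+\beta_5\, age \times rdum2$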
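The link between the interaction coefficient and effect modification can be checked numerically. The sketch below assumes the model log odds = b0 + b1*gdum + b2*tdum + b3*gdum*tdum; the coefficient names and values are hypothetical.

```python
import math

def log_odds(gdum, tdum, b0, b1, b2, b3):
    """Log odds of Y=1 in the model with a gdum-by-tdum interaction term."""
    return b0 + b1 * gdum + b2 * tdum + b3 * gdum * tdum

def gender_odds_ratio(tdum, b0, b1, b2, b3):
    """Odds ratio for gdum=1 versus gdum=0 within the stratum tdum."""
    return math.exp(log_odds(1, tdum, b0, b1, b2, b3)
                    - log_odds(0, tdum, b0, b1, b2, b3))

# Hypothetical coefficients: one set with an interaction, one without.
with_interaction = (-1.5, 0.7, 0.2, 0.5)
no_interaction = (-1.5, 0.7, 0.2, 0.0)

for params in (with_interaction, no_interaction):
    print(gender_odds_ratio(0, *params), gender_odds_ratio(1, *params))
```

When b3 is nonzero the stratum-specific odds ratios differ (exp(b1) versus exp(b1 + b3)), which is effect modification; when b3 is 0 they coincide.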
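Creating the interaction terms amounts to multiplying each dummy variable by the quantitative variable. The helper below is a hypothetical sketch for the age-by-race example, with hispanic assumed to be the reference class as in Example 2; the column order is an assumption made for illustration.

```python
def design_row(age, race):
    """Return [1, age, rdum1, rdum2, age*rdum1, age*rdum2] for one subject."""
    if race not in ("white", "black", "hispanic"):
        raise ValueError(f"unknown race class: {race}")
    rdum1 = 1 if race == "white" else 0   # dummy for white
    rdum2 = 1 if race == "black" else 0   # dummy for black
    return [1, age, rdum1, rdum2, age * rdum1, age * rdum2]

def linear_predictor(row, betas):
    """Log odds of Y=1: the sum of coefficient times column over the row."""
    return sum(b * x for b, x in zip(betas, row))

# Hypothetical coefficients b0..b5, for illustration only.
betas = [-4.0, 0.05, 0.3, 0.6, 0.01, -0.02]

for race in ("white", "black", "hispanic"):
    row = design_row(50, race)
    print(race, row, linear_predictor(row, betas))
```

Under this coding the slope of age is b1 for the reference class, b1 + b4 for white, and b1 + b5 for black, so testing b4 = b5 = 0 is the test of no effect modification by race.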