Chapter 16 Logistic Regression Analysis
Content • Logistic regression • Conditional logistic regression • Application
Purpose: To establish the logistic regression equation, which is used to estimate the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a kind of nonlinear regression.
Data:
1. The dependent variable is a binary categorical variable that takes two values, such as "yes" and "no".
2. All, or at least most, of the independent variables should be categorical; some of them may be numerical. Categorical variables must be coded numerically.
Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors. Using the χ² test (or u test) for this purpose has two drawbacks:
1. It can examine only one risk factor at a time.
2. It yields only a qualitative conclusion.
Category:
1. Grouped (non-conditional) logistic regression
2. Matched (conditional) logistic regression
§1 Logistic regression (non-conditional logistic regression)
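I Basic concepts
The probability of a positive outcome under the action of m independent variables $X_1, X_2, \ldots, X_m$ is written
$$P = P(Y = 1 \mid X_1, X_2, \ldots, X_m)$$
Regression model:
$$P = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m)]}$$
If we define
$$\operatorname{logit} P = \ln\frac{P}{1 - P}$$
then the model becomes linear in the parameters:
$$\operatorname{logit} P = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$$
Range: the probability P lies between 0 and 1; logit P ranges from −∞ to ∞.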
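The mapping between P and logit P can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard model form logit P = β0 + β1X1 + ... + βmXm; the function names are illustrative and not part of the chapter.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability p with 0 < p < 1; the result can be any real number."""
    return math.log(p / (1.0 - p))


def logistic(z: float) -> float:
    """Inverse of logit: maps any real z back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(beta0: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """P(Y = 1 | X) under the assumed model logit P = beta0 + sum(beta_j * x_j)."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return logistic(z)
```

With all coefficients equal to zero the linear predictor is 0, so `predicted_probability` returns 0.5, which corresponds to logit P = 0.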
The meaning of the model parameters
The constant $\beta_0$ is the natural logarithm of the odds (the ratio of the probability of occurrence to the probability of non-occurrence) when the exposure dose is zero.
The regression coefficient $\beta_j$ is the change in logit P when the independent variable $X_j$ changes by one unit, with the other variables held fixed.
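Odds ratio (OR)
The odds ratio is the statistical index used in epidemiology to measure the effect of a risk factor. It is computed as
$$OR_j = \frac{P_1/(1 - P_1)}{P_0/(1 - P_0)}$$
where $P_1$ and $P_0$ are the probabilities of the outcome at two exposure levels $c_1$ and $c_0$ of $X_j$. Under the logistic model, $\ln OR_j = \beta_j (c_1 - c_0)$.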
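II Parameter estimation for the logistic regression model
1. Parameter estimation
Principle: maximum likelihood estimation. For n independent observations with outcomes $Y_i$ (1 or 0) and model probabilities $P_i$, the likelihood is
$$L = \prod_{i=1}^{n} P_i^{Y_i} (1 - P_i)^{1 - Y_i}$$
and the estimates $b_0, b_1, \ldots, b_m$ are the values that maximize $\ln L$.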
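Maximum likelihood estimation can be illustrated with a small Newton-Raphson routine. This is a sketch under stated assumptions, not the software used for the chapter's examples: it handles a single covariate, takes grouped counts, and the counts in the usage note are made up for illustration (they are not the data of Table 16-1).

```python
import math
from typing import Iterable, Tuple


def fit_logistic_1d(rows: Iterable[Tuple[float, int, float]], iterations: int = 25) -> Tuple[float, float]:
    """Fit logit P = b0 + b1 * x by Newton-Raphson.

    rows: (x, y, count) triples, where y is 1 (case) or 0 (control)
    and count is how many subjects share that (x, y) pattern.
    Returns the maximum likelihood estimates (b0, b1).
    """
    data = list(rows)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0               # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0       # information matrix (negative Hessian)
        for x, y, n in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += n * (y - p)
            g1 += n * (y - p) * x
            h00 += n * w
            h01 += n * w * x
            h11 += n * w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < 1e-10:
            break
    return b0, b1


# Hypothetical 2x2 table: exposed 30 cases / 10 controls, unexposed 20 cases / 40 controls.
example_rows = [(1, 1, 30), (1, 0, 10), (0, 1, 20), (0, 0, 40)]
b0_hat, b1_hat = fit_logistic_1d(example_rows)
```

For a single binary exposure the fitted slope should agree with the logarithm of the cross-product ratio of the 2x2 table, which gives a convenient check on the routine.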
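2. Estimation of the OR
The OR between two levels ($c_1$, $c_0$) of one factor $X_j$ is estimated by
$$\widehat{OR}_j = \exp[b_j (c_1 - c_0)]$$
where $b_j$ is the estimate of $\beta_j$.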
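The estimate of the OR and a confidence interval for it can be computed from a coefficient and its standard error. The sketch below assumes the usual Wald-type interval, exp[(b ± 1.96·SE)(c1 − c0)], for a 95% level; the function names are illustrative, and any numbers passed in are the reader's own, since the fitted values for Example 16-1 are not reproduced here.

```python
import math
from typing import Tuple


def odds_ratio(b: float, c1: float = 1.0, c0: float = 0.0) -> float:
    """Estimated OR comparing level c1 with level c0 of one factor."""
    return math.exp(b * (c1 - c0))


def odds_ratio_ci(b: float, se: float, c1: float = 1.0, c0: float = 0.0,
                  z: float = 1.96) -> Tuple[float, float]:
    """Wald-type confidence interval for the OR (z = 1.96 gives about 95%)."""
    d = c1 - c0
    end_a = math.exp((b - z * se) * d)
    end_b = math.exp((b + z * se) * d)
    # Sort the endpoints so the interval is valid even when c1 < c0.
    return (min(end_a, end_b), max(end_a, end_b))
```

If the interval excludes 1, the factor is associated with the outcome at the corresponding significance level.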
Example 16-1: Table 16-1 gives case-control data for studying the relationship between smoking, drinking and esophageal cancer. Carry out a logistic regression analysis.
Define the coding of each variable:
Table 16-1 Case-control data on the relationship between smoking and esophageal cancer
Results:
The OR for smoking versus non-smoking:
95% confidence interval of the OR:
The OR for drinking versus non-drinking:
95% confidence interval of the OR:
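III Hypothesis tests for the logistic regression model
1. Likelihood ratio test
2. Wald test: each parameter estimate is compared with zero, using its standard error as the yardstick. The statistics are
$$u = \frac{b_j}{S_{b_j}} \quad \text{or} \quad \chi^2 = \left(\frac{b_j}{S_{b_j}}\right)^2, \ \nu = 1$$
Both $\chi^2$ values are greater than 3.84, the critical value of $\chi^2$ with one degree of freedom at α = 0.05, so both smoking and drinking are associated with esophageal cancer. The conclusion is the same as above.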
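The two test statistics can be computed as follows. This is a minimal sketch: the function names are illustrative, and the comparison with 3.84 applies only when the statistic has one degree of freedom (a single coefficient tested at α = 0.05).

```python
CRITICAL_CHI2_DF1 = 3.84  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def wald_chi_square(b: float, se: float) -> float:
    """Wald statistic: the squared ratio of an estimate to its standard error."""
    return (b / se) ** 2


def likelihood_ratio_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """G = 2 * (ln L of the full model - ln L of the reduced model)."""
    return 2.0 * (loglik_full - loglik_reduced)


def rejects_null(statistic: float, critical: float = CRITICAL_CHI2_DF1) -> bool:
    """True when the statistic exceeds the critical value."""
    return statistic > critical
```

For example, an estimate of 1.0 with standard error 0.5 gives a Wald statistic of 4.0, which exceeds 3.84.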
IV Variable selection
Methods: forward selection, backward elimination and stepwise regression.
Test statistic: not the F statistic, but one of the likelihood ratio, Wald and score test statistics.
Example 16-2: To explore the risk factors for coronary heart disease, a case-control study was carried out on 26 patients with coronary heart disease and 28 controls. Table 16-2 gives the definition of each factor and Table 16-3 gives the data. Use stepwise logistic regression to select the risk factors.
Table 16-2 Eight possible risk factors for coronary heart disease and their coding
Table 16-3 Case-control data on risk factors for coronary heart disease
Learn how to read the results!
Table 16-4 Example 16-2: independent variables entered into the equation and the related parameter estimates
§2 Conditional logistic regression
I Principle
Table 16-5 Data format for 1:M conditional logistic regression
* t = 0 denotes the case; the others are controls.
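The role of the matched sets can be illustrated with the conditional likelihood. The formula for the principle is not reproduced in this section, so the sketch below assumes the standard form for 1:M matching: each matched set contributes exp(β·x of the case) divided by the sum of exp(β·x) over all M + 1 members, and the constant term cancels within a set. Function and argument names are illustrative.

```python
import math
from typing import Sequence


def matched_set_likelihood(betas: Sequence[float],
                           members: Sequence[Sequence[float]]) -> float:
    """Conditional likelihood contribution of one matched set.

    members[0] holds the covariates of the case (t = 0);
    members[1:] hold the covariates of its M controls.
    """
    scores = [math.exp(sum(b * x for b, x in zip(betas, row))) for row in members]
    return scores[0] / sum(scores)


def conditional_log_likelihood(betas: Sequence[float],
                               strata: Sequence[Sequence[Sequence[float]]]) -> float:
    """Sum of the log contributions over all matched sets."""
    return sum(math.log(matched_set_likelihood(betas, s)) for s in strata)
```

When all coefficients are zero every member of a set is equally likely to be the case, so each set contributes 1/(M + 1); this is a convenient sanity check. The estimates are the coefficient values that maximize `conditional_log_likelihood`.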
P344: Table 16-7 Data from a 1:2 matched case-control study of laryngeal cancer
Six risk factors were screened by stepwise variable selection, and four of them entered the equation. Table 16-9 shows the results.
Table 16-8 Example 16-3: estimates for the independent variables that entered the equation and the related parameters
§3 Applications of logistic regression and points to note
I Applications of logistic regression
1. Analysis of epidemiological risk factors
One feature of logistic regression is that its parameters have a clear meaning (each coefficient corresponds to an odds ratio), so it is well suited to epidemiological studies.
2. Analysis of clinical trials
The goal of a clinical trial is to assess the effect of a drug or treatment. If confounding factors are present and are not balanced between the groups, the results will be biased, so these factors must be adjusted for in the analysis. When the dependent variable is binary, logistic regression can be used to obtain adjusted results.
3. Dose-response analysis of drugs or poisons
In dose-response studies of drugs or poisons, the distribution of the logarithm of the dose is usually close to normal. The normal distribution function is very similar in shape to the logistic distribution function, so the relationship can be expressed by the following model. (Here P is the positive rate and X is the dose.)
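A dose-response curve of this kind can be sketched in Python. The model formula itself is not preserved here, so the code assumes the common form logit P = β0 + β1 ln X; under that assumption the dose giving a 50% positive rate is exp(−β0/β1). The function names and any coefficient values supplied are hypothetical.

```python
import math


def response_probability(dose: float, b0: float, b1: float) -> float:
    """Positive rate at a given dose, assuming logit P = b0 + b1 * ln(dose)."""
    z = b0 + b1 * math.log(dose)
    return 1.0 / (1.0 + math.exp(-z))


def median_effective_dose(b0: float, b1: float) -> float:
    """Dose at which P = 0.5, obtained by solving b0 + b1 * ln(X) = 0."""
    return math.exp(-b0 / b1)
```

With a positive slope the positive rate increases with the dose, and evaluating `response_probability` at `median_effective_dose(b0, b1)` returns 0.5 by construction.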
4. Prediction and discrimination
Logistic regression is a probability model, so it can be used to predict the probability that an event occurs. For example, in clinical practice it can give the probability of a disease given certain indices. See Chapter 18 on discriminant analysis.
Summary:
Purpose: To establish the logistic regression equation, which is used to estimate the dependent variable (outcome) from the independent variables (risk factors). Logistic regression is a probabilistic, nonlinear regression model.
Data:
1. The dependent variable is a binary categorical variable that takes two values, such as "yes" and "no".
2. All, or at least most, of the independent variables should be categorical; some of them may be numerical. Categorical variables must be coded numerically.
Implication: Logistic regression can be used to study the quantitative relationship between the occurrence of a disease or phenomenon and multiple risk factors.
Category:
• 1. Grouped (non-conditional) logistic regression
• 2. Matched (conditional) logistic regression
Thinking: To analyze the factors that influence the outcome of rescue in AMI patients, a hospital collected five years of data on AMI patients, 200 cases in total. Many factors are involved; for reasons of space only three are listed here. The data are shown in the following table. P = 0 means successful rescue and P = 1 means death; X1 = 1 means shock before rescue and X1 = 0 means no shock before rescue; X2 = 1 means heart failure before rescue and X2 = 0 means no heart failure before rescue; X3 = 1 means more than 12 hours elapsed from the onset of AMI symptoms to rescue and X3 = 0 means 12 hours or less. Which analysis method is most suitable, and why? What results can be obtained?