Regression. Dr. L. Jeyaseelan, Dept. of Biostatistics, Christian Medical College, Vellore, India
Linear Regression ... a linear regression coefficient indicates the impact of each independent variable on the outcome in the context of (or “adjusting for”) all other variables. - J. Concato, A. R. Feinstein, T. R. Holford
Overview • Research interest often lies in describing the relationship between two variables, and thus in predicting the value of one variable from the value of the other for an individual. • Describing the relationship between the values of two variables is called regression.
Origin of Regression Concept • Sir Francis Galton (1822-1911) used the term regression • to explain the relationship between the heights (inches) of fathers and their sons. • Father-son pairs (n = 1,078) • Son's height, Y = 33.73 + 0.516 × (father's height, X) • When X = 74, Y ≈ 72 (the son is not as tall as his father) • When X = 65, Y ≈ 67 (the son is taller than his father)
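The two predictions above can be reproduced with a few lines of Python. This is only a sketch: the function and constant names are invented for illustration, and the coefficients are the ones quoted on the slide.

```python
# Fitted line quoted in the text: son's height = 33.73 + 0.516 * father's height (inches).
INTERCEPT = 33.73
SLOPE = 0.516


def predict_son_height(father_height):
    """Predicted son's height (inches) from the father's height (inches)."""
    return INTERCEPT + SLOPE * father_height


for father in (74, 65):
    son = predict_son_height(father)
    direction = "shorter" if son < father else "taller"
    print(f"father {father} in -> son {son:.1f} in ({direction} than father)")
```

The predicted heights come out at about 71.9 and 67.3 inches, which round to the 72 and 67 quoted above: sons of tall fathers are predicted to be shorter than their fathers, and sons of short fathers taller.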
Assumptions • Outcome is normally distributed • Independent observations • Relationship between variables is linear
Linear Regression Equation. Suppose we want to test whether there is any relationship between the birth weight (BW) of a baby and blood pressure (BP). The dependent variable is BP and the independent variable is BW, so the equation will be BP = a + b (BW), i.e. given a value of birth weight (BW), the corresponding blood pressure (BP) can be predicted. In mathematics, Y is called a function of X, but in statistics the term regression is used to describe the relationship.
So the regression equation will be BP = a + b (BW), with the estimated values substituted for a and b. • What do these coefficients tell us? • The slope b means that for each unit increase in X (i.e. birth weight), Y (blood pressure) increases by 25.34 units.
Straight line: The equation of the straight line is Y = β0 + β1 X, where β0 is the Y-intercept of the line and β1 is the slope. The following diagram depicts the relationship between blood pressure and drug concentration.
The upper line shows the relationship Y = 20 + 15X, which represents the effect of drug A on an animal. The quantity of drug is measured in micrograms and the blood pressure in millimeters of mercury. If 4 µg of the drug has been given, the blood pressure would be Y = 20 + 15(4) = 80 mm Hg. If the independent variable equals zero, the dependent variable does not equal zero; it equals β0. In the diagram, this is a blood pressure of 20 mm Hg, the normal BP of the animal in the absence of the drug. When no drug is administered, the BP is the same Y-intercept for every line, since the same animal is studied.
In the above equation, β0 is called the Y-intercept and β1 is called the slope or regression coefficient. In the lower line, Y = 20 + 7.5X, the Y-intercept remains the same, but the slope has been halved. We can think of this as the effect of a different drug, B, on the same animal. (Kleinbaum and Kupper, 1978)
Test for slope and intercepts. The null hypothesis is β1 = 0. Wald statistic: t = b1 / SE(b1), where b1 is the estimated slope and SE(b1) is its standard error. The data showed a 5-unit change in cholesterol level for a one-year increase in age. Is this increase of 5 units confined to this dataset (a chance effect), or is it a real change due to the effect of age?
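As a rough sketch of how this test statistic is computed, the snippet below estimates the slope, its standard error, and their ratio using only the Python standard library. The ages and cholesterol values are hypothetical numbers invented for illustration (they are not the data behind the slide), and the function name is an assumption.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope, standard error of slope, t) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)
    return slope, se_slope, slope / se_slope


# Hypothetical ages (years) and cholesterol values, made up for this example.
ages = [30, 35, 40, 45, 50, 55, 60]
cholesterol = [262, 290, 310, 350, 365, 400, 418]

slope, se_slope, t = slope_t_statistic(ages, cholesterol)
print(f"slope = {slope:.2f}, SE = {se_slope:.3f}, t = {t:.1f}")
```

The t value is compared with the t distribution on n − 2 degrees of freedom. With n = 7 the two-sided 5% critical value is about 2.571, so a t far above that would lead to rejecting β1 = 0.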
Interpretation: For a one-year increase in age, there is a significant 5-unit increase in cholesterol level.
Prediction: Age = 43 years, Cholesterol = ??? Cholesterol = 107.55 + (5.25 × Age)
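Substituting Age = 43 into the prediction equation answers the question on this slide. The arithmetic, worked step by step with the coefficients given above:

```latex
\begin{aligned}
\text{Cholesterol} &= 107.55 + 5.25 \times \text{Age} \\
                   &= 107.55 + 5.25 \times 43 \\
                   &= 107.55 + 225.75 \\
                   &= 333.30
\end{aligned}
```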
Principle: The estimated value of Y at X = Xi is Ŷi = β0 + β1 Xi, where β0 and β1 are the intercept and slope regression parameters to be determined. The error in predicting an actual observation Y = Yi at X = Xi is ei = Yi − Ŷi.
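Total sum of squared errors (SSE): SSE = Σ (Yi − Ŷi)², summed over all observations. [Figure: scatter plot of y against x with the fitted line.] Objective: fit the line so that SSE is minimised.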
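A minimal sketch of this least-squares principle, assuming the usual closed-form solution for a single predictor (slope = Sxy / Sxx, intercept = mean of Y − slope × mean of X). The data points and function names are hypothetical. The sketch also checks that nudging the fitted slope in either direction makes the SSE larger.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """SSE = sum of squared differences between observed and predicted Y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data points, made up for this example.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
best_sse = sum_squared_errors(x, y, intercept, slope)

# Any other line gives a larger SSE than the least-squares line.
steeper_sse = sum_squared_errors(x, y, intercept, slope + 0.1)
flatter_sse = sum_squared_errors(x, y, intercept, slope - 0.1)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, SSE = {best_sse:.3f}")
print(f"SSE with slope + 0.1 = {steeper_sse:.3f}, with slope - 0.1 = {flatter_sse:.3f}")
```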
Simple (Linear) Regression • One independent variable • Age and cholesterol • Age and BP • Age and Forced Vital Capacity (FVC)
Multiple (Linear) Regression • More than one independent variable • Age, gender, BMI and cholesterol • Age, height, weight and FVC
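To illustrate the step from one independent variable to several, here is a sketch that fits cholesterol on two predictors (age and BMI) by solving the least-squares normal equations in plain Python. Everything in it is an assumption made for illustration: the data are hypothetical and generated from known coefficients so that the fit can be checked, and the helper names are invented. In practice a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def fit_multiple_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical (age in years, BMI) pairs; the outcome is generated from known
# coefficients (intercept 100, +5 per year of age, +2 per BMI unit) with no noise.
predictors = [(30, 22.0), (35, 27.5), (40, 24.0), (45, 30.0), (50, 23.5), (55, 28.0)]
cholesterol = [100.0 + 5.0 * age + 2.0 * bmi for age, bmi in predictors]

intercept, b_age, b_bmi = fit_multiple_regression(predictors, cholesterol)
print(f"intercept = {intercept:.2f}, age = {b_age:.2f}, BMI = {b_bmi:.2f}")
```

Each fitted coefficient is read as the change in the outcome per unit change in that variable with the other variables held fixed, which is the "adjusting for" interpretation quoted at the start.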
Uses: • Measure of linear association • Interpolation • Prediction after controlling for confounders • To identify which combination of variables best predicts the response variable or outcome
Misuses • Extrapolating without assurance that the trend remains the same • Using a regression relationship whose slope has been shown to be not significantly different from zero • Concluding that a cause-and-effect relationship exists, while the relationship may just be statistical • Applying the relationship established in one group of subjects to another group without assurance that it is applicable to all groups