Ch 18 Simple Linear Regression 2010.02.24
Like correlation analysis, simple linear regression is a technique to explore the relationship between two continuous random variables. • What are the differences between correlation analysis and simple linear regression? • Among children of both sexes, head circumference appears to increase linearly between the ages of 2 and 18 years.
In this case, head circumference is the response, and age is the explanatory variable. • An understanding of their relationship helps parents and pediatricians to monitor growth and detect possible cases of macrocephaly and microcephaly.
18.1 Regression Concepts • Suppose that we are interested in the probability distribution of a continuous random variable Y, the response variable. Here Y is the head circumference, in centimeters, for the population of low birth weight infants. • Since the distribution of measurements is roughly normal, approximately 95% of the infants have head circumferences between 22.1 cm and 31.9 cm.
Suppose we also know that the head circumferences of newborn infants increase with gestational age. • For a given gestational age x (the explanatory variable), the distribution of measurements is approximately normal. For example, the head circumferences of infants whose gestational age is 26 weeks are normally distributed with mean $\mu_{y|26} = 24$ cm and standard deviation $\sigma_{y|26} = 1.6$ cm.
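For each value of gestational age x, the standard deviation $\sigma_{y|x}$ is constant and is less than $\sigma_y$. In fact, $\sigma_{y|x}^2 = (1 - \rho^2)\,\sigma_y^2$, where $\rho$ is the correlation between X and Y in the underlying population. • If X and Y have no linear relationship, then $\rho = 0$ and $\sigma_{y|x} = \sigma_y$. • If $\sigma_{y|x} = 1.6$ and $\sigma_y = 2.5$ (the value implied by the 95% range above), then $\rho = \pm\sqrt{1 - (1.6)^2/(2.5)^2} \approx \pm 0.77$. There is a fairly strong correlation between x and y in the underlying population of low birth weight infants, but we cannot determine whether the correlation is positive or negative.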
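The link between the conditional and marginal standard deviations can be checked with a few lines of arithmetic. The sketch below assumes the standard identity $\sigma_{y|x}^2 = (1 - \rho^2)\sigma_y^2$ and recovers $\sigma_y$ from the 95% range quoted earlier (22.1 cm to 31.9 cm) using the usual 1.96 multiplier. The function names are illustrative only.

```python
import math

def conditional_sd(sigma_y, rho):
    """sigma_{y|x} implied by sigma_y and the correlation rho."""
    return sigma_y * math.sqrt(1 - rho ** 2)

def abs_rho(sigma_y, sigma_y_given_x):
    """Magnitude of rho implied by the two standard deviations.

    The sign cannot be recovered, because rho enters only as rho**2.
    """
    return math.sqrt(1 - sigma_y_given_x ** 2 / sigma_y ** 2)

# A 95% range of mean +/- 1.96 sd has total width 2 * 1.96 * sd.
sigma_y = (31.9 - 22.1) / (2 * 1.96)
sigma_y_given_x = 1.6  # value quoted for infants of 26 weeks gestational age

rho_magnitude = abs_rho(sigma_y, sigma_y_given_x)
print(round(sigma_y, 2), round(rho_magnitude, 2))
```

Because only $\rho^2$ appears in the identity, the calculation gives the strength of the correlation but not its direction.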
18.2.1 The Population Regression Line • As noted in the preceding section, mean head circumference tends to become larger as gestational age increases. • Suppose the relationship is linear: $\mu_{y|x} = \alpha + \beta x$, where $\mu_{y|x}$ is the mean head circumference of low birth weight infants whose gestational age is x weeks. This model, known as the population regression line, is the equation of a straight line.
The parameters $\alpha$ and $\beta$ are constants called the coefficients of the equation; $\alpha$ is the y-intercept of the line and $\beta$ is its slope. • The slope is the change in the mean value of y that corresponds to a one-unit increase in x.
We actually fit a model of the form $y = \alpha + \beta x + \varepsilon$, where $\varepsilon$, known as the error, is the distance a particular outcome y lies from the population regression line $\mu_{y|x} = \alpha + \beta x$. • In simple linear regression, the coefficients of the population regression line are estimated using a random sample of observations $(x_i, y_i)$. Before we attempt to fit such a line, we must make a few assumptions:
Assumptions: • For a specified value of x, the distribution of y is normal with mean $\mu_{y|x}$ and standard deviation $\sigma_{y|x}$. • The relationship between $\mu_{y|x}$ and x is described by the straight line $\mu_{y|x} = \alpha + \beta x$. • For any specified value of x, $\sigma_{y|x}$ (the standard deviation of the outcomes y) does not change. This assumption of constant variability across all values of x is known as homoscedasticity. • The outcomes y are independent.
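One way to see what these assumptions say is to simulate data that satisfy them. The sketch below draws independent normal outcomes around a straight line with the same standard deviation at every x. The values of `ALPHA`, `BETA`, and `SIGMA` are hypothetical, chosen only for illustration; they are not estimates from the low birth weight data.

```python
import random
import statistics

# Hypothetical population parameters (illustration only).
ALPHA, BETA, SIGMA = 4.0, 0.8, 1.6

def simulate_outcomes(x, n, rng):
    """Draw n independent outcomes y at a fixed x.

    Each y is normal with mean ALPHA + BETA * x (linearity) and
    standard deviation SIGMA regardless of x (homoscedasticity).
    """
    mean = ALPHA + BETA * x
    return [mean + rng.gauss(0.0, SIGMA) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is reproducible
for x in (24, 28, 32):
    ys = simulate_outcomes(x, 5000, rng)
    # The sample mean should track the line; the sample sd should stay near SIGMA.
    print(x, round(statistics.mean(ys), 2), round(statistics.stdev(ys), 2))
```

The printed means move with x while the printed standard deviations stay roughly constant, which is the pattern the assumptions describe.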
Figure 18.3
Figure 18.4
18.2.2 The Method of Least Squares • Gestational age is the explanatory variable and head circumference is the response variable. • Lines sketched by two different individuals are unlikely to be identical, even though both might be attempting to depict the same trend. Which line best describes the relationship between the two variables? The method of least squares provides an answer.
Figure 18.5: two-way scatter plot
Figure 18.6: residuals
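If $y_i$ is the observed outcome of Y for a particular value $x_i$, and $\hat{y}_i$ is the corresponding point on the fitted line, then the residual is $e_i = y_i - \hat{y}_i$. • We choose a criterion for fitting a line that makes the residuals as small as possible: the least-squares line minimizes the sum of squared residuals $\sum_i e_i^2 = \sum_i (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$. • We find that $\hat{\beta} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sum_i (x_i - \bar{x})^2$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$.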
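The least-squares criterion has a closed-form solution, so the fit takes only a few lines. The sketch below uses the standard estimates (slope equal to the sum of cross-products over the sum of squared x deviations, intercept equal to $\bar{y} - \hat{\beta}\bar{x}$) on a small made-up data set. The data and function names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def sum_squared_residuals(xs, ys, alpha, beta):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, alpha_hat, beta_hat)
# Any other line, such as one with a slightly different slope, does worse.
other = sum_squared_residuals(xs, ys, alpha_hat, beta_hat + 0.1)
print(round(alpha_hat, 2), round(beta_hat, 2), best < other)
```

Comparing the fitted line with a perturbed one shows the sense in which the least-squares line is "best": no other straight line gives a smaller sum of squared residuals.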
18.2.3 Inference for Regression Coefficients • Testing hypotheses about the population slope $\beta$.
It can be shown that a test of the null hypothesis $H_0: \beta = 0$ is mathematically equivalent to the test of $H_0: \rho = 0$, where $\rho$ is the correlation between head circumference and gestational age in the underlying population of low birth weight infants.
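The equivalence between testing that the slope is zero and testing that the correlation is zero can be checked numerically. The sketch below assumes the usual forms of the two statistics, $t = \hat{\beta}/\widehat{se}(\hat{\beta})$ with $\widehat{se}(\hat{\beta}) = s_{y|x}/\sqrt{\sum_i (x_i - \bar{x})^2}$, and $t = r\sqrt{(n-2)/(1-r^2)}$, both with n - 2 degrees of freedom. The data are hypothetical.

```python
import math

def _sums(xs, ys):
    """Centered sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return n, x_bar, y_bar, sxx, syy, sxy

def slope_t_statistic(xs, ys):
    """t statistic for H0: beta = 0, based on the fitted slope."""
    n, x_bar, y_bar, sxx, _, sxy = _sums(xs, ys)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))      # residual standard deviation
    se_beta = s_yx / math.sqrt(sxx)      # standard error of the slope
    return beta_hat / se_beta

def correlation_t_statistic(xs, ys):
    """t statistic for H0: rho = 0, based on the sample correlation r."""
    n, _, _, sxx, syy, sxy = _sums(xs, ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical (gestational age, head circumference) pairs.
xs = [23, 25, 27, 29, 31, 33]
ys = [22.0, 23.5, 25.1, 26.0, 28.4, 29.3]

t_slope = slope_t_statistic(xs, ys)
t_corr = correlation_t_statistic(xs, ys)
print(math.isclose(t_slope, t_corr, rel_tol=1e-9))
```

The two statistics agree up to floating-point rounding, so either test leads to the same conclusion.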
18.2.4 Inference for Predicted Values • We might also be interested in estimating the mean value of y corresponding to a particular value of x.
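We might also want to predict an individual value of y for a new member of the population. • The 95% prediction interval for an individual outcome y is wider than the confidence interval for the mean of y at the same x, because it must also account for the variability of individual outcomes about that mean.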
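Both intervals can be sketched from the fitted line. The code below assumes the standard error formulas $s_{y|x}\sqrt{1/n + (x - \bar{x})^2/\sum_i (x_i - \bar{x})^2}$ for the estimated mean and $s_{y|x}\sqrt{1 + 1/n + (x - \bar{x})^2/\sum_i (x_i - \bar{x})^2}$ for an individual prediction. The critical value is passed in by the caller; `T_CRIT = 2.776` is the 97.5th percentile of the t distribution with 4 degrees of freedom, matching the six hypothetical data points used here.

```python
import math

def fit_line(xs, ys):
    """Return (alpha_hat, beta_hat, s_yx, x_bar, sxx) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))
    return alpha_hat, beta_hat, s_yx, x_bar, sxx

def interval(xs, ys, x_new, crit, individual=False):
    """Interval at x_new: for the mean of y, or for one new y if individual=True."""
    alpha_hat, beta_hat, s_yx, x_bar, sxx = fit_line(xs, ys)
    n = len(xs)
    extra = 1.0 if individual else 0.0   # extra variance of a single outcome
    se = s_yx * math.sqrt(extra + 1.0 / n + (x_new - x_bar) ** 2 / sxx)
    y_hat = alpha_hat + beta_hat * x_new
    return y_hat - crit * se, y_hat + crit * se

# Hypothetical (gestational age, head circumference) pairs.
xs = [23, 25, 27, 29, 31, 33]
ys = [22.0, 23.5, 25.1, 26.0, 28.4, 29.3]
T_CRIT = 2.776  # t quantile for n - 2 = 4 degrees of freedom

ci = interval(xs, ys, 29, T_CRIT)                    # mean of y at x = 29
pi = interval(xs, ys, 29, T_CRIT, individual=True)   # one new y at x = 29
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Both intervals are centered on the same fitted value, and both widen as x moves away from $\bar{x}$.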
18.3.1 The Coefficient of Determination • The coefficient of determination is denoted $R^2$. • If $s_y^2$ is the total observed variation in the y values and $s_{y|x}^2$ is the variation in the y values that still remains after accounting for the relationship between y and x, then $s_y^2 - s_{y|x}^2$ must be the variation in y that is explained by linear regression. • $R^2$ is the proportion of the total observed variability among the y values that is explained by the linear regression of y on x.
$R^2 = 0.6095$; this value implies a moderately strong linear relationship between gestational age and head circumference; 60.95% of the variability among the observed values of head circumference is explained by the linear relationship. • The remaining 39.05% of the variation is not explained by this relationship.
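As a minimal sketch of the "proportion explained" idea, the code below computes $R^2$ in its common sum-of-squares form, one minus the residual sum of squares over the total sum of squares, and compares it with the squared sample correlation. This form is an assumption about how the quantity is calculated, and the data are hypothetical.

```python
import math

def r_squared(xs, ys):
    """Proportion of the variability in y explained by the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (gestational age, head circumference) pairs.
xs = [23, 25, 27, 29, 31, 33]
ys = [22.0, 23.5, 25.1, 26.0, 28.4, 29.3]

r2 = r_squared(xs, ys)
r = correlation(xs, ys)
print(math.isclose(r2, r ** 2, rel_tol=1e-9))

# Magnitude of the correlation implied by the reported R^2 of 0.6095.
implied_abs_r = math.sqrt(0.6095)
print(round(implied_abs_r, 2))
```

In simple linear regression this $R^2$ equals $r^2$, so the reported value of 0.6095 corresponds to a correlation of about 0.78 in magnitude.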
18.3.2 Residual Plots • Residual plots help evaluate how well the least-squares regression line fits the observed data. • The residual of observation i is $e_i = y_i - \hat{y}_i$.
A plot of the residuals can also suggest a failure of the assumption of homoscedasticity. Recall that homoscedasticity means that the standard deviation of the outcomes y, or $\sigma_{y|x}$, is constant across all values of x. • Figure 18.11: the spread of the residuals either increases or decreases as $\hat{y}$ increases. In this case, simple linear regression is not the appropriate technique for modeling the relationship between x and y.
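A residual plot is a visual check, but the same idea can be sketched numerically: compute the residuals, order them by fitted value, and compare their spread in the lower and upper halves. This is only a crude illustration and not a formal test. The data are made up so that the scatter grows with x.

```python
import statistics

def fitted_and_residuals(xs, ys):
    """Return (fitted values, residuals) from the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    return fitted, resid

def spread_by_half(xs, ys):
    """Standard deviation of the residuals for low and high fitted values."""
    fitted, resid = fitted_and_residuals(xs, ys)
    ordered = [e for _, e in sorted(zip(fitted, resid))]
    half = len(ordered) // 2
    return statistics.stdev(ordered[:half]), statistics.stdev(ordered[half:])

# Made-up data whose scatter about the line grows with x (a funnel shape).
xs = list(range(1, 11))
ys = [x + (0.1 * x if x % 2 else -0.1 * x) for x in xs]

low_spread, high_spread = spread_by_half(xs, ys)
print(round(low_spread, 2), round(high_spread, 2))
```

A markedly larger spread in one half than in the other is the numerical counterpart of the funnel shape that signals a failure of homoscedasticity.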
18.3.3 Transformations • If the residuals do not exhibit a random scatter but instead follow a distinct trend, this suggests that the true relationship between x and y might not be linear. A transformation of x or y or both might be appropriate. • We begin by looking at transformations of the form $x^p$ or $y^p$, where p = ..., -3, -2, -1, -1/2, ln, 1/2, 1, 2, 3, ...
The circle of powers, or ladder of powers, provides a general guideline for choosing a transformation. • Quadrant I: the powers of both x and y are greater than 1. • Quadrant II: the power of y is greater than 1 and the power of x is less than 1. • Quadrant III: the powers of both x and y are less than 1. • Quadrant IV: the power of y is less than 1 and the power of x is greater than 1.
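Moving along the ladder of powers can be sketched as trying a few values of p and seeing which one makes the relationship most nearly linear. In the code below, p = 0 is used as a stand-in for the ln step of the ladder; that coding is a convention adopted here, not something the slides specify. The data are made up so that y grows like the square of x, which calls for a power of y below 1.

```python
import math

def power_transform(value, p):
    """Apply one step of the ladder of powers; p == 0 stands for ln."""
    return math.log(value) if p == 0 else value ** p

def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Made-up curved data: y increases like x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]

# Try several powers of y and record how linear the result is in x.
results = {}
for p in (1, 0.5, 0):
    transformed = [power_transform(y, p) for y in ys]
    results[p] = correlation(xs, transformed)
    print(p, round(results[p], 4))
```

Here the square root of y (p = 1/2) straightens the relationship exactly, while ln y overshoots; in practice the choice would be confirmed by refitting the line and re-examining the residual plot.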