Linear Regression Modeling with Data
The BIG Question Did you prepare for today? If you did, mark yes and estimate the amount of time you spent preparing on your frequency log.
Problem Suppose we are given the following data about father and son heights to analyze. What can we conclude about it?
Connect Is there anything we have studied that can help us decide where to start? We could formulate a hypothesis to investigate, such as: Is there a correlation between a father’s height and his son’s height? Hₐ: There is a correlation between a father’s height and his son’s height. H₀: There is no correlation between a father’s height and his son’s height.
Definitions For a problem such as this one, we are trying to determine if there is a relationship between two variables. This is called a correlation. The data can be represented as ordered pairs (x, y). Does anyone recall what the x and y are called? The x-variable is the independent (or explanatory) variable and the y-variable is the dependent (or response) variable. This is similar to the concepts you have seen in algebra. In our example, the father’s height is the independent variable and the son’s height is the dependent variable.
Scatter plot A scatter plot is a graph of the ordered pairs (x, y), used to see what kind of correlation two variables might have. Example 1: What kind of correlation would you guess these data sets to have? [Four scatter plots, labeled: Negative Linear Correlation, Nonlinear Correlation, No Correlation, Positive Linear Correlation]
Father and Son Data Scatter plot Using SPSS, I loaded the father and son height data and generated a scatter plot: [scatter plot of the father and son height data] What kind of correlation does it look like it might have? It looks like a positive linear relationship.
Question Is there a way we can calculate whether there is a correlation and how strong it might be? The correlation coefficient, denoted r, measures the strength and direction of a linear relationship between two variables. The population correlation coefficient is denoted ρ. How do we calculate the correlation coefficient? The formula is:
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
where n is the number of data pairs.
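To see the arithmetic step by step, here is a minimal Python sketch of a common computational form of r, namely (nΣxy - ΣxΣy) divided by the product of sqrt(nΣx² - (Σx)²) and sqrt(nΣy² - (Σy)²). The function name `pearson_r` and the five height pairs are made up for illustration; they are not the father and son data used in this lecture.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (x, y)."""
    n = len(xs)  # number of data pairs
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical heights in inches (assumption: NOT the lecture's data set).
father = [65, 67, 68, 70, 72]
son = [66, 68, 67, 71, 73]
r = pearson_r(father, son)
print(round(r, 3))
```

A value near 1 from this sketch would suggest a strong positive linear relationship in the made-up sample; statistical software such as SPSS reports the same quantity directly.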
What is the correlation coefficient for the father and son data? Using SPSS, the output gives a correlation coefficient of .668. What is the range for the correlation coefficient? It runs from -1 to 1. If r = -1 there is a perfect negative correlation. If r is close to 0 there is no linear correlation. If r = 1 there is a perfect positive correlation. On that scale, .668 lies between 0 and 1, on the positive side.
Analysis Since the correlation coefficient is .668, there appears to be a positive linear relationship between a father’s height and his son’s height in this sample. However, is this relationship strong enough to conclude that the population correlation coefficient ρ is also different from zero? To decide, we would use r as the test statistic and could use the standardized test statistic t with n - 2 degrees of freedom. How do we calculate the t statistic here?
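One standard way to standardize r is t = r / sqrt((1 - r²) / (n - 2)). The sketch below applies it to the values quoted in this lecture (r = .668, n = 11). The function name `t_statistic` is an illustrative choice, and because r is entered here rounded to three decimals, the result may differ slightly in the last digit from what SPSS reports.

```python
from math import sqrt


def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs")
    return r / sqrt((1 - r ** 2) / (n - 2))


# r and n as quoted in the lecture; r is rounded, so t is approximate.
t = t_statistic(0.668, 11)
print(round(t, 2))
```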
Hypothesis testing for significance To test the null hypothesis that there is no linear relationship between the independent and dependent variables, we would use the model: H₀: ρ = 0, Hₐ: ρ ≠ 0, α = .05. The degrees of freedom would be 11 - 2 = 9. Thus, at the .05 significance level, the rejection regions are t < -t₀ = -2.262 and t > t₀ = 2.262.
Calculate and Summarize Running a model analysis in SPSS at the .05 level of significance gives a t-value of 2.690. The test statistic lies inside the rejection region, which starts at 2.262. Thus there is enough evidence to reject the null hypothesis and conclude that there is a significant linear correlation between a father’s height and his son’s height.
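The decision rule for this two-tailed test can be written out as a short Python sketch. The t-value 2.690 and the critical value 2.262 are the numbers quoted above; the function name `rejects_null` is hypothetical.

```python
def rejects_null(t, t_critical):
    """Two-tailed test: reject H0 when t is at or beyond either critical value."""
    return abs(t) >= t_critical


# Values quoted in the lecture: SPSS t-value and the critical value for 9 d.f.
decision = rejects_null(2.690, 2.262)
print(decision)  # True: the test statistic falls in the rejection region
```

Taking the absolute value covers both tails at once, so a strongly negative t would lead to rejection as well.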
Finding the Regression Line Now that we know there is a significant linear correlation between a father’s height and his son’s height, we can find the regression line. The regression line is the line that best models the data. It can be used to predict the value of y given a value of x. In SPSS we find the regression line: [SPSS plot showing the regression line]
Question Can we find the exact equation of the regression line? Yes, the equation is similar to the equation of a line from algebra. Who recalls the equation of a line?
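As a sketch of where this question leads, a line of the form y = mx + b can be fitted with the usual least-squares formulas: m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = (Σy - mΣx) / n. The function name `fit_line` and the height pairs below are assumptions for illustration only; they are not the lecture’s data or SPSS output.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


# Hypothetical heights in inches (assumption: NOT the lecture's data set).
father = [65, 67, 68, 70, 72]
son = [66, 68, 67, 71, 73]
m, b = fit_line(father, son)

# Predict a son's height for a father who is 69 inches tall.
predicted = m * 69 + b
print(round(m, 3), round(b, 3), round(predicted, 1))
```

Predictions from such a line are most trustworthy for x-values inside the range of the observed data.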