Lecture Notes

The Relation between Two Variables: Correlation and Regression
Prof. L Prado
OER - www.helpyourmath.com

A mathematical model is a mathematical expression that represents some phenomenon. It can be a deterministic model or a probabilistic model. Models often describe the relationship between 2 variables.

Learning objectives
1. Draw and interpret scatter diagrams
2. Understand the properties of the linear correlation coefficient
3. Compute and interpret the linear correlation coefficient

10.1. Scatter Diagrams and Correlation
When dealing with 2 variables:
• We try to see the relationship between the 2 variables
• Sometimes there is a 3rd variable that is not considered but that affects the results (a lurking variable). Shoe size does not cause height to change; age affects both variables
• Therefore, we can’t conclude that variable A causes variable B
• Some examples are:
  • Rainfall amounts and plant growth (possible lurking variable: sunlight)
  • Exercise and cholesterol levels for a group of people (possible lurking variable: diet)
  • Height and weight for a group of people
  • Height and the fastest speed you have ever driven a car
• When we have two variables, they could be related in one of several different ways
  • They could be unrelated
  • One variable (the explanatory or predictor variable) could be used to explain the other (the response or dependent variable)
  • One variable could be thought of as causing the other variable to change

Scatter Diagrams
The scatter diagram is a graph that visually shows the relationship between 2 quantitative variables. The explanatory variable is plotted on the horizontal axis and the response variable on the vertical axis. The response variable (y-axis) is the variable whose value can be explained by the value of the explanatory variable (x-axis).

Linear Correlation
• The linear correlation coefficient is a measure of the strength and direction of the linear relation between two quantitative variables
• The sample correlation coefficient “r” is (shortcut form)
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
• This should be computed with software (and not by hand) whenever possible

Answers the question ‘How strong is the linear relationship between 2 variables?’
• The coefficient of correlation is used
• The population correlation coefficient is denoted ρ (rho)
• Values range from -1 to +1
• Measures the degree of association
• The sign of r indicates the direction of the relationship:
  • Positive: the two variables tend to increase together
  • Negative: as one variable increases, the other tends to decrease
• Used mainly for understanding
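The range and sign properties listed above can be checked on small data sets. The sketch below is an illustration added to these notes, not part of the original slides: the helper name `pearson_r` and both data sets are made up for the demonstration, and the function uses the standard shortcut formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient via the shortcut formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # hypothetical points exactly on a rising line
falling = [10 - 3 * x for x in xs]   # hypothetical points exactly on a falling line

r_up = pearson_r(xs, rising)     # perfect positive correlation, r close to +1
r_down = pearson_r(xs, falling)  # perfect negative correlation, r close to -1
print(round(r_up, 6), round(r_down, 6))
```

Points that lie exactly on a line give r at the ends of the scale; real data normally falls somewhere in between.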

[Figure: correlation scale running from -1.0 (perfect negative correlation) through -.5, 0 (no correlation) and +.5 to +1.0 (perfect positive correlation). The degree of negative correlation increases toward -1.0; the degree of positive correlation increases toward +1.0.]

[Figure: example scatter diagrams. Strong negative, r = –0.8; strong positive, r = 0.8; moderate negative, r = –0.5; moderate positive, r = 0.5; very weak, r = 0.1; very weak, r = –0.1.]
• Examples of positive correlation
• Examples of negative correlation
• In general, if the correlation is visible to the eye, then it is likely to be strong

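Data
x   y
1   2
1   8
3   6
5   4

Shortcut formula:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
$$ r = \frac{4(48) - (10)(20)}{\sqrt{4(36) - (10)^2}\,\sqrt{4(120) - (20)^2}} $$
$$ r = \frac{-8}{59.329} = -0.135 $$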
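The arithmetic in this example can be reproduced with a few lines of software. The sketch below is an added illustration, assuming the data are the four pairs (1, 2), (1, 8), (3, 6), (5, 4), which is the reading consistent with the sums used in the calculation; the variable names are chosen here for clarity.

```python
from math import sqrt

# Data pairs (x, y) read from the worked example
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]

n = len(x)                                   # 4
sum_x = sum(x)                               # 10
sum_y = sum(y)                               # 20
sum_x2 = sum(v * v for v in x)               # 36
sum_y2 = sum(v * v for v in y)               # 120
sum_xy = sum(a * b for a, b in zip(x, y))    # 48

# Shortcut formula for the sample correlation coefficient
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(numerator, round(denominator, 3), round(r, 3))
```

The value of r is close to 0, which matches the "very weak" negative correlation category described earlier.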

Correlation is not causation!
• Just because two variables are correlated does not mean that one causes the other to change
• There is a strong correlation between shoe sizes and vocabulary sizes for grade school children
  • Clearly larger shoe sizes do not cause larger vocabularies
  • Clearly larger vocabularies do not cause larger shoe sizes
• Often lurking variables result in confounding

Summary: Chapter 10 – Section 1
• Visual methods
  • Scatter diagrams
  • Analogous to histograms for single variables
• Numeric methods
  • Linear correlation coefficient
  • Analogous to mean and variance for single variables
• Care should be taken in the interpretation of linear correlation (nonlinearity and causation)
• Correlation between two variables can be described with both visual and numeric methods

Chapter 10 – Section 2
Learning objectives
1. Find the least-squares regression line and use the line to make predictions and estimations
2. Interpret the slope and the y-intercept of the least-squares regression line
3. Compute the sum of squared residuals

If we have two variables X and Y, we often would like to model the relation as a line
• Draw a line through the scatter diagram
• We want to find the line that “best” describes the linear relationship … the regression line (“The Best Fit”)

Linear Equations
• We want to use a linear model
• Linear models can be written in several different (equivalent) ways
  • y = m x + b
  • y – y1 = m (x – x1)
  • y = b0 + b1x
• Because the slope and the intercept are important to analyze, we will use y = b0 + b1x
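As a short added illustration of why these forms are equivalent, expanding the point-slope form gives the slope-intercept form, so the slope b1 equals m and the intercept b0 equals b:

$$ y - y_1 = m(x - x_1) \;\Longrightarrow\; y = m x + (y_1 - m x_1) = b_1 x + b_0, \qquad b_1 = m, \quad b_0 = b = y_1 - m x_1 $$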


[Figure: scatter diagram showing the model line, the x value of interest, the observed value y, the predicted value ŷ, and the residual.]
• One difference between mathematics and statistics is that statistics assumes that the measurements are not exact, that there is an error or residual
• The formula for the residual is always
  • Residual = Observed – Predicted
• On the scatter diagram, the residual is the vertical distance between the observed value y and the predicted value ŷ on the model line at the x value of interest

The equation for the least-squares regression line is given by ŷ = b0 + b1x
• b1 is the slope of the least-squares regression line (marginal change)
• b0 is the y-intercept of the least-squares regression line

Data
x   y
1   4
2   24
4   8
5   32

Line: ŷ = 5 + 4x

Least-Squares Property
A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
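The least-squares property can be checked numerically for this data set. The sketch below is an added illustration: it computes each residual (observed minus predicted) for the line ŷ = 5 + 4x and compares its sum of squared residuals with two other lines. The function name and the two comparison lines are arbitrary choices made here, not part of the original notes.

```python
# Data from the example: x = 1, 2, 4, 5 and y = 4, 24, 8, 32
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0, b1):
    """Sum of (observed - predicted)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residual = Observed - Predicted, for the line y_hat = 5 + 4x
residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
print(residuals)                        # [-5, 11, -13, 7]

ssr_fit = sum_squared_residuals(5, 4)   # the line from the notes
ssr_alt1 = sum_squared_residuals(4, 4)  # hypothetical line: intercept lowered by 1
ssr_alt2 = sum_squared_residuals(5, 5)  # hypothetical line: slope raised by 1
print(ssr_fit, ssr_alt1, ssr_alt2)      # 364 368 410
```

Both comparison lines give a larger sum than 364, which is what the least-squares property predicts for any line other than ŷ = 5 + 4x.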

Slope of the least-squares regression line (shortcut formula):
$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$
y-intercept:
$$ b_0 = \bar{y} - b_1\bar{x} $$
Calculators or computers can compute these values.

• Finding the values of b1 and b0 by hand is a very tedious process
  • You should use software for this
• Finding the coefficients b1 and b0 is only the first step of a regression analysis
  • We need to interpret the slope b1
  • We need to interpret the y-intercept b0
  • We need to do quite a bit more statistical analysis … this is covered in Section 4.3 and also in Chapter 14

Data
x   y
1   2
1   8
3   6
5   4

n = 4, Σx = 10, Σy = 20, Σx² = 36, Σy² = 120, Σxy = 48

$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{4(48) - (10)(20)}{4(36) - (10)^2} = \frac{-8}{44} = -0.181818 $$
$$ b_0 = \bar{y} - b_1\bar{x} = 5 - (-0.181818)(2.5) = 5.45 $$

The estimated equation of the regression line is:
$$ \hat{y} = 5.45 - 0.182x $$
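The same coefficients can be obtained with software rather than by hand. The sketch below is an added illustration, assuming the data are the pairs (1, 2), (1, 8), (3, 6), (5, 4), the reading that matches the sums n = 4, Σx = 10, Σy = 20, Σx² = 36 and Σxy = 48; the variable names are chosen here.

```python
# Data pairs (x, y) from the worked example
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shortcut formula for the slope, then the intercept from the two means
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
x_bar, y_bar = sum_x / n, sum_y / n
b0 = y_bar - b1 * x_bar

print(round(b1, 3), round(b0, 2))
```

Rounded to three and two decimal places, these values reproduce the slope and intercept of the estimated regression line.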

Guidelines for Using The Regression Equation
1. If there is no significant linear correlation, don’t use the regression equation to make predictions.
2. When using the regression equation for predictions, stay within the scope of the available sample data.
3. A regression equation based on old data is not necessarily valid now.
4. Don’t make predictions about a population that is different from the population from which the sample data was drawn.
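Guideline 2 can be built into software as a simple range check. The sketch below is an added illustration only: the function name `predict_within_scope` is hypothetical, and it uses the x values 1, 2, 4, 5 and the line ŷ = 5 + 4x from the least-squares example in these notes. It addresses guideline 2 alone; whether the linear correlation is significant (guideline 1) is a separate check that this code does not perform.

```python
def predict_within_scope(x_new, b0, b1, x_sample):
    """Return y_hat = b0 + b1 * x_new, refusing to extrapolate beyond the sample x range."""
    lowest, highest = min(x_sample), max(x_sample)
    if not (lowest <= x_new <= highest):
        raise ValueError(
            f"x = {x_new} is outside the sampled range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new

x_sample = [1, 2, 4, 5]                             # x values of the sample data
inside = predict_within_scope(3, 5, 4, x_sample)    # 5 + 4 * 3 = 17

try:
    predict_within_scope(10, 5, 4, x_sample)        # x = 10 lies outside [1, 5]
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True

print(inside, extrapolation_blocked)                # 17 True
```

Raising an error, rather than silently returning a number, makes it harder to use the equation outside the scope of the sample data by accident.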
