This topic involves comparing a set of data that includes two variables: one variable is the independent variable (explanatory variable) and the other is the dependent variable (response variable). The strength of the relationship between the two variables is analyzed using Pearson's product-moment correlation coefficient and the coefficient of determination, and a line of best fit, called the least squares regression line, is found.
RELATIONSHIPS BETWEEN TWO NUMERICAL VARIABLES GENERAL MATHS Unit 2
This topic involves…
• Comparing a set of data that includes two variables:
- One variable is the INDEPENDENT variable – the EXPLANATORY variable (plotted on the x-axis)
- The other is the DEPENDENT variable – the RESPONSE variable (plotted on the y-axis)
• Analysis of the data on a SCATTERPLOT
• Analysing the strength of the relationship between the 2 variables using both:
- Pearson’s product-moment correlation coefficient
- the Coefficient of Determination
• Finding a line of best fit called the LEAST SQUARES REGRESSION line and interpreting the result based on the equation of this line.
Explanatory vs response variables
The value of the RESPONSE variable depends on the EXPLANATORY variable. To help identify which variable is which, we can put the scenario into this sentence:
“ { Response Variable } ” depends on the “ { Explanatory Variable } ”
Let's apply this logic by identifying the EXPLANATORY and RESPONSE variable in each case:
a. The ‘time spent filling up a swimming pool’ with water compared to the ‘size of the pool’.
b. The ‘hours per week spent doing laundry’ compared with ‘the number of children living in the house’.
c. The ‘last time you went to the hairdresser’ and the ‘length of your hair’.
d. A ‘child’s height’ compared to their ‘age’.
If you need more help with these terms (explanatory vs response variables, scatterplots), view the following video tutorials:
https://www.youtube.com/watch?v=VlJ85vE97lA
http://www.vcefurthermaths.com/2011/02/interpreting-scatterplots/
http://www.vcefurthermaths.com/2011/01/tutorial-12-independent-and-dependent-variables/
SCATTERPLOTS
We plot the EXPLANATORY variable on the x-axis and the RESPONSE variable on the y-axis. We can plot these on a scatterplot by hand, but we can also use our calculator to create it for us.
eg. Results on a Maths test were compared to the time spent studying the day before the test. The following data was obtained from a sample of 10 students.
CALCULATOR: Statistics
Enter Data ‘Study’ into List 1, Enter Data ‘Result’ into List 2.
To plot the SCATTERPLOT, now choose:
- Type: Scatter
- Xlist: Explanatory Variable List (Study Time)
- Ylist: Response Variable List (Result)
Click SET
Now click graph
The following scatterplot is produced, with Result on the y-axis and Study Time on the x-axis. Comparing this scatterplot to the graphs on our worksheet, we can say that there is a Strong Positive relationship between the number of hours studying and the score achieved on the test.
CORRELATION
The relationship between variables is called the correlation. We describe the correlation in terms of form, strength and direction.
- Form – Linear or Non-linear (all of this course will be linear, but we still need to mention this)
- Strength – Strong, Moderate, Weak
- Direction – Positive, Negative
CORRELATION
Example scatterplots, from strongest positive to strongest negative:
- Strong positive linear correlation
- Moderate positive linear correlation
- Weak positive linear correlation
- No correlation
- Weak negative linear correlation
- Moderate negative linear correlation
- Strong negative linear correlation
Pearson’s product correlation coefficient
• What is it?
• Correlation between sets of data is a measure of how well they are related. The most common measure of data correlation in statistics is the Pearson Correlation (in full, Pearson’s product-moment correlation coefficient).
• It shows the strength and direction of the linear relationship between two sets of data.
• The correlation coefficient can be represented as “ r ”
• Calculated using the equation: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
• Luckily, our calculator can solve this for us!
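As a rough illustration of what the calculator is doing, the sketch below computes r directly from the standard definition of Pearson's correlation coefficient. The study-time and result values are made-up placeholders (the class data table is not reproduced here), and the function name `pearson_r` is only an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired lists xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data, not the class data set
study_hours = [1, 2, 3, 4, 5]   # explanatory variable (x)
results = [52, 60, 61, 75, 80]  # response variable (y)

r = pearson_r(study_hours, results)
print(f"r = {r:.3f}")
```

Whatever data is used, the value returned always lies between -1 and 1, which is a quick sanity check on any hand calculation.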
Pearson’s product correlation coefficient
• What does it look like? What values can we expect?
• The value will be between -1 and 1.
• 1 tells us there is a perfect positive linear relationship
• -1 tells us there is a perfect negative linear relationship
• Relationships for r values in between are given in the following table
Pearson’s product correlation coefficient If you need more help with this, view the following video tutorial http://www.vcefurthermaths.com/2011/02/tutorial-18-pearsons-correlation-coefficient/
Pearson’s product correlation coefficient
Finding the Correlation Coefficient ‘r’ using the calculator
eg. Consider the problem given earlier – Results on a Maths test were compared to the time spent studying the day before the test. The following data was obtained from a sample of 10 students. This data gave the scatterplot of Result (y-axis) against Study Time (x-axis).
• Using what we know about ‘r’, view the scatterplot and predict – what approximate value will r have?
• Using this data, we can use the calculator to find r.
• First, using the Statistics function on your calculator, place the data into List 1 and List 2
Pearson’s product correlation coefficient
Finding the Correlation Coefficient ‘r’ using the calculator
Choose: Calc, Linear Reg
Xlist: list with Explanatory Variable
Ylist: list with Response Variable
Pearson’s product correlation coefficient
Finding the Correlation Coefficient ‘r’ using the calculator
Click OK. The following information is revealed.
Reading r off this gives the Correlation Coefficient r = 0.964
This confirms that there is a strong positive relationship between the amount of study done and the score achieved on the maths test.
Coefficient of determination
Finding the Coefficient of determination ‘r²’ using the calculator
• The coefficient of determination, r², tells us how much of the variation in the response variable can be explained by the variation in the explanatory variable
• It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph.
Reading r² off the calculator screen gives the Coefficient of Determination = 0.929
From this, we can say that 92.9% of the total variation in y (test result) can be explained by variation in x (the amount of time spent studying)
We will cover more on this later…..
Now DO EXERCISE 14.2 Q7abc, 2, 8, 9, 10, 3, 4, 11, 5
The coefficient of determination
• Is a measure that allows us to determine how certain one can be in making predictions just by looking at a certain model/graph.
• The coefficient of determination, r², tells us how much of the variation in the response variable can be explained by the variation in the explanatory variable (note: not whether one causes the other).
• We find r² simply by squaring our r value. Our calculator also generates this value for us.
• Because r values have to be between -1 and +1, r² values are between 0 and 1.
• When answering questions about the Coefficient of Determination, the way we word our response is important – variations in the response variable are not necessarily caused by variations in the explanatory variable. Rather, we say: the variation in the Response Variable can be explained by variation in the Explanatory Variable.
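The squaring step can be checked with a few lines of code. This is a minimal sketch using the r value from the study-time example (r = 0.964); the function name is an illustrative choice, not something the calculator provides.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient r to get r^2 (between 0 and 1)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2


r = 0.964  # correlation coefficient from the study-time example
r_squared = coefficient_of_determination(r)
percent = round(r_squared * 100, 1)

print(f"r^2 = {r_squared:.3f}")
print(f"About {percent}% of the variation in test result can be "
      f"explained by variation in study time.")
```

Rounded to three decimal places this gives 0.929, matching the value read off the calculator for that example.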
The coefficient of determination If you need more help with this, view the following video tutorial http://www.vcefurthermaths.com/2011/02/tutorial-19-coefficient-of-determination/
The coefficient of determination
Using the calculator to find r²
Click STATISTICS. Insert Data into lists
Choose CALC, Linear Reg
Xlist: Choose list which has the EXPLANATORY variable
Ylist: Choose list which has the RESPONSE variable
r² is found!
The coefficient of determination
Finding the Coefficient of determination ‘r²’ using the calculator
Again, looking at our prior example: Results on a Maths test were compared to the time spent studying the day before the test. This data was obtained from a sample of 10 students.
Reading r² off the calculator screen gives the Coefficient of Determination = 0.929
From this, we can say that 92.9% of the total variation in y (test result) can be explained by variation in x (the amount of time spent studying)
The coefficient of determination
Let's try another example: The resting heart rates of 10 individuals were compared with the number of hours per week they spent exercising, to see if one influences the other.
Click STATISTICS. Insert Data into lists
Choose CALC, Linear Reg
r² is found!
The coefficient of determination
The resting heart rates of 10 individuals were compared with the number of hours per week they spent exercising, to see if one influences the other.
Reading r² off the calculator screen gives the Coefficient of Determination = 0.873
From this, we can say that 87.3% of the total variation in y (heart rate) can be explained by variation in x (the amount of time spent exercising)
Now look at the r value…..what else does this tell us about the relationship between the two variables?
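One thing the r value adds is direction: r² on its own only gives the size of r, because squaring removes the sign. The sketch below recovers the size of r from r² = 0.873 and attaches a sign supplied by the user. The direction is an input here because it has to be read from the scatterplot or the calculator's r value; the function name and the `direction` parameter are illustrative assumptions.

```python
from math import sqrt


def r_from_r_squared(r_squared, direction):
    """Recover r from r^2, given the direction seen on the scatterplot.

    direction must be "positive" or "negative"; r^2 alone cannot tell us which.
    """
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    magnitude = sqrt(r_squared)
    return magnitude if direction == "positive" else -magnitude


# The two candidate r values for the heart-rate example (r^2 = 0.873)
for direction in ("positive", "negative"):
    print(direction, round(r_from_r_squared(0.873, direction), 3))
```

Either way the size of r is about 0.934, so the relationship is strong; only the scatterplot or the calculator's r tells us whether heart rate rises or falls as exercise increases.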
Now DO EXERCISE 14.2 Q6, 12, 13, 14, 15, 16, 17
LEAST SQUARES REGRESSION LINE
• The least squares regression line is simply a line of best fit of your scatter-plotted data.
• Our calculator can easily be used to plot this line and give us the equation that represents this line.
What does it look like??
LEAST SQUARES REGRESSION LINE
Let's find the linear regression line for our earlier example: The resting heart rates of 10 individuals were compared with the number of hours per week they spent exercising.
Click STATISTICS. Insert Data into lists
Choose CALC, Linear Reg
Select drop down box to display
LEAST SQUARES REGRESSION LINE
The value of a and the value of b are found!
We can then put these values into our straight line equation to find our regression line.
We are used to seeing this in the form $y = mx + c$.
The equation we are using in this topic is the same as this, except the “m” is replaced by “b” and the “c” is replaced with “a”: $y = a + bx$.
Substituting the a and b values from the calculator into this equation gives the regression line for this example.
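To show where a and b come from, here is a minimal sketch that fits y = a + bx by least squares using the standard formulas. The exercise and heart-rate numbers are invented placeholders (the class data table is not reproduced here), and `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are the same, so no line can be fitted")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


# HYPOTHETICAL data, not the class data set
exercise_hours = [0, 2, 4, 6, 8]         # explanatory variable (x)
resting_heart_rate = [80, 76, 70, 66, 60]  # response variable (y)

a, b = least_squares_line(exercise_hours, resting_heart_rate)
print(f"heart rate = {a:.2f} + ({b:.2f}) x exercise hours")
```

A negative b, as in this made-up data, means the line slopes downward: the predicted response falls as the explanatory variable increases.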
Now DO EXERCISE 14.3 Q1, 2, 5, 6, 11, 13, 14, 10
LEAST SQUARES REGRESSION EQUATIONS { without the lists of data }
There may be questions to answer when you have no access to the original table of data, but INSTEAD we have other information about the data (such as mean, standard deviation). In these cases we can use the following equations to find values for r, r², or the equation of the least squares regression line.
$y = a + bx$
Where a and b can be found using:
$b = r \frac{s_y}{s_x}$
$a = \bar{y} - b\bar{x}$
Here $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ the standard deviations, of the explanatory and response variables.
LEAST SQUARES REGRESSION EQUATIONS { without the lists of data }
eg1. Given the summary details
1) Find the value of b (correct to 2 decimal places):
2) Find the value of a (correct to 2 decimal places):
3) Find the linear regression equation:
LEAST SQUARES REGRESSION EQUATIONS { without the lists of data }
eg2. The following values were found when comparing the width of strawberries with the hours of daylight the plant was exposed to:
- Mean width of strawberries = 25mm
- Mean hours of daylight = 8 hours
- Standard Deviation of width of strawberries = 5mm
- Standard Deviation of hours of daylight = 3 hours
- Pearson’s Correlation Coefficient = 0.92
Find the equation of the least squares (linear) regression line (follow the given steps)
1) Decide which is the EV (x-data) and which is the RV (y-data)
2) Define your variables based on which is your EV and RV
3) Sub values into your “b” equation to find “b”
4) Sub values into your “a” equation to find “a”
5) Sub your “a” and “b” values into the “y” equation to give your answer.
Eg2 (cont’d)
Using the strawberry summary values above, find the equation of the least squares (linear) regression line.
1) Decide which is the EV (x-data): DAYLIGHT HOURS, and which is the RV (y-data): WIDTH OF STRAWBERRY
2) Define your variables based on which is your EV and RV (DONE ABOVE)
3) Sub values into your “b” equation to find “b”
4) Sub values into your “a” equation to find “a”
5) Sub your “a” and “b” values into the “y” equation to give your answer.
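Steps 3 to 5 can be checked with a short calculation. The sketch below assumes the standard summary-statistic formulas b = r × (s_y / s_x) and a = ȳ − b × x̄, with daylight hours as x and strawberry width as y; the function name is an illustrative choice.

```python
def regression_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (a, b) for y = a + b*x from summary statistics only."""
    if sd_x <= 0:
        raise ValueError("standard deviation of x must be positive")
    b = r * sd_y / sd_x      # slope
    a = mean_y - b * mean_x  # y-intercept
    return a, b


# Values from the strawberry example:
# x = hours of daylight (EV), y = width of strawberry in mm (RV)
a, b = regression_from_summary(r=0.92, mean_x=8, sd_x=3, mean_y=25, sd_y=5)

print(f"b = {b:.2f}")
print(f"a = {a:.2f}")
print(f"width = {a:.2f} + {b:.2f} x daylight hours")
```

With these values b is about 1.53 and a is about 12.73 when the unrounded b is carried through. If b is rounded to 1.53 before finding a, the intercept comes out as 12.76 instead, so it is worth keeping extra decimal places until the final step.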
Using the calculator to find values {if we have data}
Enter your data into lists
Choose calc – Two-Variable
Xlist : Explanatory Data
Ylist : Response Data
Scroll through to find the values you need
Now DO EXERCISE 14.3 Q1, 2, 5, 6, 11, 13, 14, 10, 3, 4, 7, 8
Making predictions {Interpolation & extrapolation}
Using what we have learnt, we can make predictions about values which lie within a given data set. We can do this by using our least squares regression equation, substituting in an exact value for either x or y and solving for the other unknown.
Example 1: A factory produces toy cars. A linear regression line gives the cost to produce a given number of toy cars. Predict the cost to produce 80 toy cars.
Solution: We can substitute the number of cars, 80, into the regression equation and solve for the cost.
How many cars could be made for a cost of $7000?
What would the ‘fixed’ costs be in this scenario? (Hint: the cost when no cars have been made yet.) $1200
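The substitute-and-solve idea can be sketched in code. The fixed cost of $1200 comes from the example, but the regression equation itself is not reproduced here, so the cost per car below (20 dollars) is a purely hypothetical slope chosen for illustration. The answers it produces are not the answers to the slide's questions.

```python
def predict_y(a, b, x):
    """Substitute x into y = a + b*x to predict y."""
    return a + b * x


def solve_for_x(a, b, y):
    """Rearrange y = a + b*x to find the x that gives a chosen y."""
    if b == 0:
        raise ValueError("slope is zero, so y does not depend on x")
    return (y - a) / b


FIXED_COST = 1200   # intercept a: the cost when no cars are made (from the example)
COST_PER_CAR = 20   # slope b: HYPOTHETICAL value, not from the original equation

cost_for_80 = predict_y(FIXED_COST, COST_PER_CAR, 80)
cars_for_7000 = solve_for_x(FIXED_COST, COST_PER_CAR, 7000)

print(f"Predicted cost for 80 cars: ${cost_for_80}")
print(f"Cars that could be made for $7000: {cars_for_7000:.0f}")
```

Setting x = 0 in `predict_y` returns the intercept a, which is why the fixed cost can be read straight off the equation.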
Making predictions {Interpolation & extrapolation}
What is Interpolation and Extrapolation?
Interpolation – An estimation of a value within the given data set – predictions are reliable.
Extrapolation – An estimation of a value which is outside of the given set of data – predictions not as reliable.
Example 2: The heights of plants were measured and compared to the amount of water they received weekly.
A) Is predicting the height of a plant which receives 600ml of water reliable?
600ml – This value is within our data set, so is an INTERPOLATION – predictions reliable.
B) Is predicting the height of a plant which receives 2000ml of water reliable?
2000ml – This value is outside of our data set, so is an EXTRAPOLATION – predictions not reliable.
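The interpolation/extrapolation decision comes down to checking whether the value being substituted lies inside the range of the observed explanatory data. Here is a minimal sketch of that check; the list of water amounts is a made-up stand-in for the plant data table, chosen only so that 600ml falls inside the range and 2000ml outside it, as in the example.

```python
def classify_prediction(x_value, observed_xs):
    """Label a prediction as interpolation or extrapolation.

    A prediction is an interpolation when x_value lies between the smallest
    and largest observed values of the explanatory variable (inclusive).
    """
    if not observed_xs:
        raise ValueError("need at least one observed value")
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x_value <= highest:
        return "interpolation"
    return "extrapolation"


# HYPOTHETICAL weekly water amounts in ml, not the class data set
water_ml = [100, 250, 400, 550, 700, 850, 1000]

for amount in (600, 2000):
    print(f"{amount}ml: {classify_prediction(amount, water_ml)}")
```

The same check applies when predicting x from y: compare the chosen y value with the range of the observed response data.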
Making predictions {Interpolation & extrapolation}
Example 3: A study of the weekly supermarket shopping cost of various household income groups produced the results shown in the table below.
a. Determine the least squares regression line
b. Predict the weekly spend for an income of $720
c. Predict the weekly income if the supermarket spend is $146
d. Predict the weekly spend for an income of $1500
e. How reliable are your predictions in b and d?
b – Interpolation (lies within our given data set) RELIABLE
d – Extrapolation (lies outside of the data set) NOT RELIABLE, INDICATION ONLY
Now DO EXERCISE 14.4 Q1 - 6, Q8 – 14, Q16