
Bivariate Data Analysis


sanson

Presentation Transcript


  1. Bivariate Data Analysis

  2. Modelling the data

  3. Basically we need to find an equation of a straight line that may model the relationship. The gradient of the line gives us a rate, i.e. the ‘y’ units per ‘x’ unit, e.g. the grams of fat per gram of protein. (Note: We do NOT say “a 1 unit increase in x results in a change of b units in y”, where b is the gradient. This encourages causal thinking. Beware.)

  4. Quote: “All models are wrong - but some are useful” (George Box, famous statistician)

  5. When the data points lie exactly on a straight line… This equation describes the relationship between the variables x and y

  6. Going Crackers! • Do crackers with more fat content have greater energy content? • Can knowing the percentage total fat content of a cracker help us to predict the energy content? • If I switch to a different brand of cracker with 100mg per 100g less salt content, what change in percentage total fat content can I expect?

  7. Common Cracker Brands [Dot plot: Energy (Calories/100g), axis from 380 to 530]. The energy content of 100g of cracker for 18 common cracker brands is shown in the dot plot with summary statistics below. Based on the information above, my prediction for the energy content of a cracker is ____________ Calories per 100g

  8. Common Cracker Brands [Dot plot: Energy (Calories/100g), axis from 380 to 530]. The energy content of 100g of cracker for 18 common cracker brands is shown in the dot plot with summary statistics below. Based on the information above, my prediction for the energy content of a cracker is about 449 Calories per 100g
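
With no explanatory variable available, a natural single-number prediction is a measure of centre such as the mean or median. A minimal sketch of that idea follows. The 18 brand values are not in this transcript, so the numbers are made up and will not reproduce the 449 on the slide.

```python
from statistics import mean, median

# Hypothetical energy contents (Calories/100g); NOT the slide's 18 brands.
energy = [392, 410, 425, 431, 440, 447, 452, 458, 463, 470, 481, 495, 510]

# With no explanatory variable, predict with a measure of centre.
prediction_mean = mean(energy)
prediction_median = median(energy)

print(f"mean   = {prediction_mean:.1f} Calories/100g")
print(f"median = {prediction_median} Calories/100g")
```

Either value could serve as the prediction; which one the slide used is not stated in the transcript.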

  9. Another quantitative variable which could be useful as the explanatory variable in predicting the energy content (the response variable) of 100g of cracker is _______

  10. Another quantitative variable which could be useful as the explanatory variable in predicting the energy content (the response variable) of 100g of cracker is carbohydrate content.

  11. The Consumer magazine gives some nutritional information from an analysis of these 18 brands of cracker. Some of this information is shown in the table below:

  12. What do I see in these scatter plots? • The data suggests a linear trend • Positive association • The data suggests constant scatter • Appears to be a strong relationship • No outliers • No groupings (Plot labels: response variable, explanatory variable.)

  13. What do I see in these scatter plots?

  14. What do I see in these scatter plots? • The data suggests a linear trend • Positive association • The data suggests constant scatter • Appears to be a moderate relationship • No outliers • No groupings

  15. What do I see in these scatter plots?

  16. What do I see in these scatter plots? • No obvious trend overall • Suggestion of two groups (about 30 or less AND about 50-60 crackers per pkt) • No outliers

  17. From these plots, the best explanatory variable to use to predict energy content is ________________________ because _____________

  18. From these plots, the best explanatory variable to use to predict energy content is total fat content because the relationship is stronger (less scatter) so I can make more reliable predictions.

  19. Draw a straight line to fit these data.

  20. Roughly, my line predicts the energy content for a cracker with a 10% total fat content is about ___?____ Calories

  21. Roughly, my line predicts the energy content for a cracker with a 10% total fat content is about 440 Calories
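
Once a line is drawn, predicting is just a matter of reading the line at the chosen x value. The sketch below assumes a hypothetical line with intercept 400 and gradient 4; these coefficients are invented for illustration and are not read off the slide's plot, where the line was fitted by eye.

```python
def predict_energy(fat_percent, intercept=400.0, slope=4.0):
    """Predicted energy (Calories/100g) from a straight line.

    intercept and slope are hypothetical values chosen for illustration.
    """
    return intercept + slope * fat_percent


# Prediction for a cracker with 10% total fat content.
print(predict_energy(10))  # 440.0
```

For this made-up line the gradient reads as a rate: 4 Calories per 100g for each percentage point of total fat.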

  22. Which line?

  23. Balancing errors • Errors are the vertical distances between the points and the fitted straight line. • The errors can be marked on a scatterplot using error bars. • The aim is to balance the sum of the error bars above the line with the sum of the errors below the line.
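
The balancing idea can be sketched in a few lines. The data and both candidate lines below are hypothetical. Note that both lines balance their errors, which suggests that balancing alone does not single out one line.

```python
def balance(xs, ys, intercept, slope):
    """Return (sum of errors above the line, sum of errors below the line).

    An error is the vertical distance between a point and the line.
    """
    above = 0.0
    below = 0.0
    for x, y in zip(xs, ys):
        error = y - (intercept + slope * x)
        if error > 0:
            above += error
        else:
            below += -error
    return above, below


# Hypothetical data: total fat (%) and energy (Calories/100g).
fat = [5, 10, 15, 20, 25]
energy = [410, 445, 450, 490, 505]

# Two hypothetical candidate lines, both passing through (mean x, mean y).
print(balance(fat, energy, 400, 4))
print(balance(fat, energy, 370, 6))
```

For these made-up points each line has equal totals above and below, so a further criterion is needed to choose between them.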

  24. Regression

  25. The Least Squares Regression Line: minimise the sum of squared prediction errors.
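
A minimal sketch of the least squares calculation, using the standard closed-form formulas for a straight line (gradient = Sxy / Sxx, intercept = mean of y minus gradient times mean of x). The data and the comparison line "drawn by eye" are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimising the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: total fat (%) and energy (Calories/100g).
fat = [5, 10, 15, 20, 25]
energy = [410, 445, 450, 490, 505]

intercept, slope = least_squares(fat, energy)
sse_fit = sum_squared_errors(fat, energy, intercept, slope)
sse_eye = sum_squared_errors(fat, energy, 400, 4)  # hypothetical line by eye

print(f"least squares line: energy = {intercept:.1f} + {slope:.1f} * fat")
print(f"SSE, least squares line: {sse_fit:.1f}")
print(f"SSE, line drawn by eye : {sse_eye:.1f}")
```

No other straight line gives a smaller sum of squared errors for the same points, so the by-eye line can only match or exceed the least squares value.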

  26. The scatter plot is the basic tool for investigating the relationship between 2 quantitative variables. Check for a linear trend – never do a linear regression without first looking at the scatter plot

  27. Problem: How does the total fat content of 100g of cracker change with a 100mg decrease in salt content? Use the template on pages 34 and 35 to answer this question.

  28. Four scatter plots with fitted lines are shown below. The equation of the fitted line and the value of R² are given for each plot.

  29. Comment on any relationship between the scatter plot and the value of R². What do you think R² is measuring? The smaller the scatter about the trend line, the greater the value of R².

  30. So what does R² measure? In a nutshell, it is a measure of how well a model fits the data.

  31. When we ask how well the model fits, we’re really asking how much of the data is still in the residuals.

  32. We can write Data = Model + Residual or Residual = Data - Model

  33. The difference between the observed ‘y’ value and its associated predicted ‘y’ value is called the residual. The residual at each data value tells us how far off our prediction is at that point.
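
A small sketch of the residual calculation, assuming hypothetical data and a hypothetical fitted line (intercept 400, gradient 4). It also checks that Data = Model + Residual holds at every point.

```python
# Hypothetical data: total fat (%) and observed energy (Calories/100g).
fat = [5, 10, 15, 20, 25]
energy = [410, 445, 450, 490, 505]

# Hypothetical model: energy = 400 + 4 * fat.
intercept, slope = 400, 4
fitted = [intercept + slope * x for x in fat]

# Residual = Data - Model (observed y minus predicted y).
residuals = [y - f for y, f in zip(energy, fitted)]

for x, y, f, r in zip(fat, energy, fitted, residuals):
    print(f"fat={x:2d}  observed={y}  predicted={f}  residual={r:+d}")

# Data = Model + Residual at every point.
print(all(f + r == y for y, f, r in zip(energy, fitted, residuals)))
```

A positive residual means the point sits above the line (the prediction is too low); a negative residual means the prediction is too high.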

  34. For linear regression, the errors should be normally distributed

  35. R² = the fraction of the variance that is accounted for by the model
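
One common way to compute this fraction for a least squares line is R² = 1 - (sum of squared residuals) / (total sum of squares of y about its mean). The sketch below assumes hypothetical data and fits the line with the standard closed-form formulas.

```python
def r_squared(xs, ys):
    """R^2 for the least squares line of ys on xs.

    Assumes the xs are not all equal and the ys are not all equal.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)           # variation in y
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left in residuals
    return 1 - ss_resid / ss_total


# Hypothetical data: total fat (%) and energy (Calories/100g).
fat = [5, 10, 15, 20, 25]
energy = [410, 445, 450, 490, 505]

r2 = r_squared(fat, energy)
print(f"R^2 = {r2:.3f}")
```

For these made-up points about 96% of the variation in energy is accounted for by the line; the rest remains in the residuals.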

  36. Look at the scatter plot (with fitted line) below. What do you notice?

  37. Compare the fitted value with the observed value

  38. The points lie in a perfect straight line. Correlation coefficient, r = 1. Fitted values = observed values.

  39. Distribution of y-values: shows variation in y’s. Distribution of fitted y-values: shows variation in fitted y’s.

  40. Regression relationship = Trend + scatter. In this case: no scatter.

  41. The variability in the fitted values is exactly the same as the variability in the observed values. The fitted line explains all of the variability in the observed values. • There is variability in the x-values, so we expect variability in the fitted values.

  42. In this case, there are no residuals and hence R² = 1
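
The perfect-line case can be checked numerically. The sketch below assumes made-up points lying exactly on y = 1 + 2x (not the slide's data) and computes R² as the variance of the fitted values divided by the variance of the observed values.

```python
from statistics import pvariance


def fit_line(xs, ys):
    """Least squares (intercept, slope) for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical points lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Variability in fitted values equals variability in observed values.
r2 = pvariance(fitted) / pvariance(y)
print("fitted   :", fitted)
print("residuals:", residuals)
print("R^2      :", r2)
```

Every residual is zero, so all of the variability in y is carried by the fitted values.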

  43. Look at the scatter plot below. What do you notice?

  44. There is no linear relationship. (Fitted line shown.)

  45. Correlation coefficient, r = 0. Fitted values all equal 5.

  46. Regression relationship = Trend + scatter. In this case: variability in y-values, no variability in fitted values.

  47. The variability in the residuals is exactly the same as the variability in the observed values.
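
The opposite extreme can be checked the same way. The points below are made up so that the correlation is exactly zero and the mean of y is 5 (they are not the slide's data). The least squares line is then horizontal at the mean, and the residuals keep all of the variability in y.

```python
from statistics import pvariance


def fit_line(xs, ys):
    """Least squares (intercept, slope) for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical points with no linear relationship (Sxy = 0, mean of y = 5).
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 7, 3]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("fitted values        :", fitted)
print("variance of y        :", pvariance(y))
print("variance of residuals:", pvariance(residuals))
print("R^2                  :", pvariance(fitted) / pvariance(y))
```

With no variability in the fitted values, the fraction of variance accounted for by the line is zero.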
