1.3 Trends in Data

Presentation Transcript


  1. 1.3 Trends in Data Due now: p. 20–24 #1, 4, 9, 11, 14 Learning goal: Describe the trend and correlation in a scatter plot and construct a median-median line MSIP / Home Learning: p. 37 #2, 3, 6, 8

  2. Variables • Variable (Mathematics) • a symbol denoting a quantity or symbolic representation • an unknown quantity • Variable (Statistics) • A measurable attribute; these typically vary over time or between individuals • E.g. Height, Weight, Age, Favourite Hockey Team • Can be Discrete, Continuous or neither • Continuous: Weight (digital scale) • Discrete: Number of siblings • Neither: Hair colour

  3. The Two Types of Variables • Independent Variable • horizontal axis • Time is independent (why?) • Timing is dependent • e.g., time to run a race vs. length of race • Dependent Variable • values depend on the independent variable • vertical axis • Format: “dependent vs. independent” • e.g., a graph of arm span vs. height means arm span is the dependent variable and height is the independent

  4. Scatter Plots • a graph that shows two numeric variables • each axis represents a variable • each point indicates a pair of values (x, y) • may show a trend

  5. What is a trend? • the ‘direction’ of the data • a pattern of average behavior that occurs over time • e.g., costs tend to increase over time (inflation) • need two variables to exhibit a trend

  6. An Example of a trend • U.S. population from 1780 to 1960 • Describe the trend

  7. Correlations • Strength is… • None: no clear pattern in the data • Weak: data loosely follows a pattern • Strong: data follows a clear pattern • For strong/weak, direction is… • Positive: data rises from left to right (overall) • As x increases, y increases • Negative: data drops from left to right (overall) • As x increases, y decreases • Example: strong, positive linear correlation • http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html

  8. Line of Best Fit • A straight line that represents the trend in the data • Can be used to make predictions (graph or equation) • Can be drawn or calculated • Fathom has 3: movable, median-median, least squares • Gives no measurement of the strength of the trend (that’s tomorrow!)

  9. An example of the line of best fit • this is temperature recycling data with a median-median line added • what type of trend are we looking at?

  10. Creating a Median-Median Line • Order the points by x-value and divide them into 3 groups, keeping the two end groups the same size • If there is 1 extra point, include it in the middle group • If there are 2 extra points, include one in each end group • Calculate the median x- and y-coordinates for each group and plot the 3 median points (x, y) • If the median points are in a straight line, connect them • Otherwise, draw a line through the two outer median points, then slide it 1/3 of the way toward the middle median point without changing its slope; this is the line of best fit
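
The construction on this slide can be sketched in Python. This is a minimal illustration, not Fathom's implementation: the function name `median_median_line` and the sample points are made up for the example, and it assumes the outer median points have different x-values.

```python
from statistics import median


def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points)  # order the points by x (ties broken by y)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    size, extra = divmod(n, 3)
    # 1 extra point goes in the middle group; 2 extras go one to each end group.
    end_size = size + (1 if extra == 2 else 0)
    groups = [pts[:end_size], pts[end_size:n - end_size], pts[n - end_size:]]
    # The median x and the median y are found separately within each group.
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    # Line through the two outer median points.
    slope = (y3 - y1) / (x3 - x1)
    outer_intercept = y1 - slope * x1
    # Slide the line 1/3 of the way toward the middle median point.
    gap = y2 - (slope * x2 + outer_intercept)
    return slope, outer_intercept + gap / 3


# Hypothetical data, 9 points, so each group has 3 points.
sample = [(1, 1), (2, 3), (3, 2), (4, 6), (5, 5), (6, 7), (7, 9), (8, 8), (9, 12)]
slope, intercept = median_median_line(sample)
print(f"y = {slope:.3f}x + {intercept:.3f}")  # slope ≈ 1.167, intercept ≈ -0.167
```

If the three median points already lie on one line, the gap is zero and the result is just the line connecting them.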

  11. Median-Median Line

  12. Median-Median Line (10 points)

  13. Lines of Best Fit – why 3? • Drawing a line of best fit is arbitrary • Hit as many points as possible • Have the same number of points above and below the line • Outliers tend to be ignored • The median-median line is easy to construct and takes the spread of the data into consideration • The least-squares line takes every point into consideration but is based on a complicated formula

  14. AGENDA for Wed-Thu • 1.3 Median-Median Line • Using a regression equation • Fathom Activity - Predict your weight as an NHL player • 1.4 Trends With Technology • Correlation coefficient • Coefficient of Determination • Residuals • Least-Squares Line • Fathom Investigation: finding the Least Squares Line

  15. Scatter Plots - Summary • A graph that compares two numeric variables • One is dependent on the other • May show a trend / correlation • positive/negative and strong/weak • A line may be a good model • Median-Median and Least-Squares • If not, non-linear (can be quadratic, exponential, logarithmic, etc.)

  16. Using a regression equation • For a line of best fit, the equation will be in the form y = mx + b • e.g., W = 7.25 H – 332 • Mr. Lieff is 71.5 in tall. His weight would be: • W = 7.25(71.5) – 332 = 186

  17. Fathom Activity – Predict your weight as an NHL player! • Click http://www.nhl.com/ice/playerstats.htm • Under TEAM: Pick your favourite • You can also change Position, Country, Status • Under REPORT: BIOS • Click GO> • Copy the URL • Run Fathom → File → Import → Import From URL → Paste • Create a scatter plot of Weight vs. Height • Add a median-median line • Use the equation to: • predict your weight based on your height • Is it accurate? Discuss with a neighbour. • MSIP / Home Learning: p. 51 #1-6, 7 bcd, 8

  18. 1.4 Trends in Data Using Technology Learning goal: Describe and measure the strength of trends Due now: p. 37 #2, 3, (6-7 or 8) MSIP / Home Learning: p. 51 #1-6, 7 bcd, 8 use Fathom and Excel

  19. Regression • The process of fitting a line or curve to a set of data • A line is linear regression (Excel or Fathom) • A curve can be quadratic, cubic, exponential, logarithmic, etc. (Excel) • We do this to generate a mathematical model (equation) • We can use the equation to make predictions • Interpolation – within the span of the data • Extrapolation – outside of the span of the data
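
The difference between interpolation and extrapolation comes down to whether the x-value sits inside the span of the observed data. A small Python sketch, with a hypothetical helper name and made-up heights, shows the check:

```python
def prediction_type(x, observed_x):
    """Classify a prediction at x as interpolation or extrapolation."""
    lo, hi = min(observed_x), max(observed_x)
    return "interpolation" if lo <= x <= hi else "extrapolation"


# Hypothetical heights (cm) used to build a model.
heights_cm = [150, 158, 163, 171, 180]
print(prediction_type(165, heights_cm))  # interpolation
print(prediction_type(195, heights_cm))  # extrapolation
```

Predictions flagged as extrapolation deserve more caution, since nothing in the data shows that the trend continues outside the observed span.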

  20. Example • armspan = 0.87 height + 22 • y = 0.87 x + 22 • What is the arm span of a student who is 175 cm tall? • y = 0.87(175) + 22 • = 174.25 cm • How tall is a student with a 160 cm arm span? • y = 0.87x + 22 • 160 = 0.87x + 22 • 160 – 22 = 0.87x • 138 = 0.87x • x = 138 ÷ 0.87 • = 158.6 cm
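
The two calculations on this slide can be checked with a short Python sketch. The slope and intercept come from the slide; the function names are chosen here only for illustration.

```python
SLOPE = 0.87      # cm of arm span per cm of height (from the slide)
INTERCEPT = 22.0  # cm


def predict_arm_span(height_cm):
    """Use the model directly: arm span = 0.87 * height + 22."""
    return SLOPE * height_cm + INTERCEPT


def height_for_arm_span(arm_span_cm):
    """Rearrange the model to solve for height."""
    return (arm_span_cm - INTERCEPT) / SLOPE


print(round(predict_arm_span(175), 2))     # 174.25
print(round(height_for_arm_span(160), 1))  # 158.6
```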

  21. Correlation Coefficient • The correlation coefficient, r, is an indicator of the strength and direction of a linear relationship • r = 0 no linear relationship • r = 1 perfect positive correlation • r = -1 perfect negative correlation • r² is the coefficient of determination • Takes on values from 0 to 1 • r² is the proportion of the variation in the y-variable that is explained by the linear relationship with x • if r² = 0.85, that means that 85% of the variation in y is explained by x
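
For readers who want to see where r comes from, here is a minimal Python sketch of the standard Pearson formula. The slides rely on Fathom or Excel for this, so the function name and data below are illustrative assumptions only.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a moderate positive trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # 0.775 0.6
```

Here the coefficient of determination is 0.6, so about 60% of the variation in y is accounted for by the linear relationship with x.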

  22. Residuals • a residual is the vertical distance between a point and the line of best fit • if the model you are considering is a good fit, the residuals should be small and have no noticeable pattern • The least-squares line minimizes the sum of the squares of the residuals http://www.math.csusb.edu/faculty/stanton/m262/regress/
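
A least-squares line and its residuals can be computed by hand with the usual closed-form formulas. The Python sketch below uses made-up data and illustrative function names, and it keeps residuals signed (positive above the line, negative below) rather than taking the distance.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Signed vertical gaps between each point and the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
sse = sum(r * r for r in res)  # sum of squared residuals
print(round(slope, 2), round(intercept, 2), round(sse, 2))  # 0.6 2.2 2.4
```

Any other slope or intercept gives a larger sum of squared residuals for the same data, which is what "least squares" refers to.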

  23. Least Squares Line: Weight vs. Height (NHL) • w = 7.23h – 325

  24. Using the equation • How much does a player who is 71 in tall weigh? • w = 7.23(71) – 325 • = 188.33 lbs • How tall is a player who weighs 180 lbs? • w = 7.23h – 325, so h = (w + 325) ÷ 7.23 • So h = (180 + 325) ÷ 7.23 • = 69.85 in or 177.4 cm

  25. 1.5 Comparing Apples to Oranges • http://www.smarter.org/research/apples-to-oranges/

  26. The Power of Data • Chapter 1.5 – The Media • Mathematics of Data Management (Nelson) • MDM 4U • “There are 3 kinds of lies: lies, damn lies and statistics.”

  27. Example 1 – Changing the scale on the axis • Why is the following graph misleading?

  28. Example 1 – Scale from 0 • Consider that this is a bar graph – could it still be misleading?

  29. Include every category!

  30. Example 2 – Using a Small Sample • For the following surveys, consider: • The sample size • If there is any (mis)leading language

  31. Example 2 – Using a Small Sample • “4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum.” • “In the past, we found errors in 4 out of 5 of the returns people brought in for a Second Look review.” (H&R Block) • “Did you know that 1 in 4 women can misread a traditional pregnancy test result?” (Clearblue Easy Digital Pregnancy Test) • “Using Pedigree® DentaStix® daily can reduce the build up of tartar by up to 80%.” • “Did you know that the average Canadian wastes $500 of food in a year?” (Zip-Lock Freezer bags)

  32. Details on the Trident Survey • How many dentists did they ask? • Actual number: 1200 • 4 out of 5 is convincing but reasonable • 5 out of 5 is preposterous • 3 out of 5 is good but not great • Actual statistic 85% • Recommend Trident over what? • There were 2 other options: • Chewing sugared gum • Not chewing gum

  33. Misleading Statements(?) • How could these statements be misleading? • “More people stay with Bell Mobility than any other provider.” • “Every minute of every hour of every business day, someone comes back to Bell.”

  34. “More people stay with Bell Mobility than any other provider.” • Does not specify how many more customers stay with Bell. • e.g. Percentage of customers renewing their plan: Bell: 30% Rogers: 29% Telus: 25% Fido: 28% • Did they compare percentages or totals? • What does it mean to “stay with Bell”? Honour entire contract? Renew contract at the end of a term? • Are early terminations factored in? If so, does Bell have a higher cost for early terminations? • Competitors’ renewal rates may have decreased due to family plans / bundling • Does the data include Private / Corporate plans?

  35. “Every minute of every hour of every business day, someone comes back to Bell.” • 60 mins x 7 hours x 5 days = 2 100/wk • What does it mean to “Come back to Bell”? • How many hours in a business day?

  36. How does the media use (misuse) data? • To inform the public about world events in an objective manner • It sometimes gives misleading or false impressions to sway the public or to increase ratings • It is important to: • Study statistics to understand how information is represented or misrepresented • Correctly interpret tables/charts presented by the media

  37. MSIP / Homework • Read pp. 57 – 60 Ex. 1-2 • Complete p. 60 #1-6 • Final Project Example – Manipulating Data (on wiki) • Examples • http://junkcharts.typepad.com/ • http://www.coolschool.ca/lor/AMA11/unit1/U01L02.htm • http://mediamatters.org/research/200503220005
