Communicating Quantitative Information

trevet

Presentation Transcript


  1. Communicating Quantitative Information • Diagrams • Sampling issue • Risk & Communicating Risk • Smoking. Hormone Replacement Therapy • Dimension • Homework: Prepare/Design diagram/chart. Postings

  2. Special Survey • Will come back to topic of Sampling • Accuracy (confidence, error) of tests, surveys depends on • quality of sample • size of sample • My sister asked me: can [young people] identify Einstein? • Use my students / sample of my students • Spring 2008: • students responding to request to take survey in both my classes • Last Spring: quantity was pretty small (14 + 11)

  3. Preview • Margin of error: • Claim actual result (for whole population) is within certain limits (answer plus/minus MoE) • Confidence • confidence that this particular sample is not so unusual as to make results wrong, where wrong means the actual result is outside the margin of error. • Generally, the means (averages) of samples are distributed normally around the mean of the whole population, and the SD of this distribution is smaller (tighter) than the SD of the distribution of this quantity for the whole population.

  4. Thought experiment… • Want to get average height of people in the class. • Claim: impossible to measure everyone, so use a sample. • Only time to measure 6 people: 62 72 68 66 69 71, for a sample mean of 68 • Statistics say that (IF the sample is random) we can be 95% confident that the mean of the whole class is between 61.4 and 74.6 • will spend some more time on this
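
A minimal sketch of the interval arithmetic for these six heights. It assumes a t-based interval for the mean, with the critical value 2.571 for 5 degrees of freedom hard-coded (an assumption, not stated on the slide). For comparison it also computes mean ± 2 population standard deviations, which is the calculation that reproduces the slide's 61.4 and 74.6; the t-based interval for the mean comes out narrower.

```python
import math
import statistics

# The six measured heights from the slide (inches).
heights = [62, 72, 68, 66, 69, 71]
n = len(heights)
mean = statistics.mean(heights)

# Assumed method 1: t-based 95% confidence interval for the mean.
sample_sd = statistics.stdev(heights)        # divides by n - 1
standard_error = sample_sd / math.sqrt(n)    # SD of the sample mean
T_CRIT_95_DF5 = 2.571                        # assumed t critical value, 5 d.f.
margin = T_CRIT_95_DF5 * standard_error
ci_mean = (mean - margin, mean + margin)

# Assumed method 2: mean +/- 2 population SDs (spread of individual heights).
pop_sd = statistics.pstdev(heights)          # divides by n
two_sd_range = (mean - 2 * pop_sd, mean + 2 * pop_sd)

print(f"mean = {mean}")
print(f"t-based interval for the mean: {ci_mean[0]:.1f} to {ci_mean[1]:.1f}")
print(f"mean +/- 2 SD: {two_sd_range[0]:.1f} to {two_sd_range[1]:.1f}")
```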

  5. Thought experiment • Want to know favorable rating of the President from whole USA population. • Ask a sample • p (proportion) of sample view President favorably. • Want to make a statement with 99% confidence • Formula for the margin of error, call it E: we can be 95% sure that the proportion of the whole population favorable is within p-E and p+E • If we want to be 99% sure, then formula will give a bigger margin of error, call this F: we can be 99% sure that the proportion of the whole population favorable is within p-F and p+F.
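
The slide does not give the formula itself. One common choice, assumed here, is the normal-approximation margin of error z * sqrt(p(1-p)/n). The sample proportion 0.52 and sample size 1000 below are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a sample proportion.

    p: proportion observed in the sample
    n: sample size
    confidence: two-sided confidence level, e.g. 0.95
    """
    # z is the standard normal quantile leaving (1 - confidence)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 52% favorable in a sample of 1000 people.
p, n = 0.52, 1000
E = margin_of_error(p, n, 0.95)   # 95% margin, the slide's E
F = margin_of_error(p, n, 0.99)   # 99% margin, the slide's F

print(f"95%: {p - E:.3f} to {p + E:.3f}")
print(f"99%: {p - F:.3f} to {p + F:.3f}")
```

With the same sample, F is larger than E: asking for more confidence widens the interval.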

  6. Another way • Random sample of size N means that each person in the whole population is equally likely to be in the sample • There are many samples of size N • The results of a sample of size N vary, but • Some samples of size N are very different from the whole population, but most aren’t • In most cases, the sample result will be close to the result for the whole population • What do I mean by close? Within the margin of error • What do I mean by most? This refers to the confidence level (19/20, 99/100)

  7. Formally… • The averages of samples of size N are normally distributed • The average (mean) of that distribution is the mean of the whole population • Its standard deviation is smaller than the population's by a factor of the square root of N • Think of narrow mountain • To halve (reduce to ½ what it was) the required margin of error, you need to quadruple (*4) the sample size
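
A small simulation sketch of the square-root-of-N rule. The population (normal, mean 68, SD 4), the sample sizes, and the number of trials are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: normally distributed, mean 68, SD 4.
POP_MEAN, POP_SD = 68.0, 4.0


def sd_of_sample_means(n, trials=5000):
    """Draw `trials` random samples of size n; return the SD of their means."""
    means = []
    for _ in range(trials):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


sd_n25 = sd_of_sample_means(25)     # theory: 4 / sqrt(25)  = 0.8
sd_n100 = sd_of_sample_means(100)   # theory: 4 / sqrt(100) = 0.4

print(f"N = 25:  SD of sample means is about {sd_n25:.2f}")
print(f"N = 100: SD of sample means is about {sd_n100:.2f}")
```

Quadrupling N from 25 to 100 roughly halves the spread of the sample means, which is why the margin of error halves as well.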

  8. Note • The size of the whole population does not enter into these calculations!

  9. Quality of sample • Does not mean: how good you are…in any way. • Does mean: how representative of general population • For the Einstein question, this means how representative of 'young people today'. Amend that to college students. • The former class was practically all seniors. That class and all since tend to be journalism, history, political science majors… • Students who don't have specific required math & science courses • These factors mean the sample is not representative of the college population!

  10. Quality of sample • Opportunity sample • subjects available to me in my classes. Are they/you typical of 'young people'? (My sister thought yes.) • Response bias • students who took up offer. Are they/you more likely to 'know Einstein' than those who didn't? • higher level of general curiosity • diligence at obtaining extra credit

  11. Tester reliability • I was generous in categorizing answers as correct. • Two questions considered separately • 23 out of 26 • 24 out of 26

  12. Reporting • Confidence at level alpha that actual proportion is within error of tested proportion • More confident at larger interval • I am 95% confident (chances are only 1/20 that this is wrong) that the actual proportion that knows Einstein is at least 83%. • I am 99.5% confident (chances are 1/200 that this is wrong) that the actual proportion is at least 78%

  13. Formulas for margin of error • Based on the finding that means of samples are [close to] normally distributed, with a standard deviation that is a function of the tested proportion and the size of the sample. • One-tailed test (just checking one side because tested proportion is close to 1)
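
A minimal sketch of a one-tailed lower bound of this kind. It assumes the normal approximation with standard error sqrt(p(1-p)/n), and it assumes the 24-out-of-26 result is the tested proportion; neither is stated on the slides. Under those assumptions the bounds land near the 83% and 78% figures reported on the previous slide.

```python
import math
from statistics import NormalDist


def one_tailed_lower_bound(successes, n, confidence):
    """Lower confidence bound for a proportion (normal approximation)."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    # One-tailed: all of the (1 - confidence) error sits on the low side.
    z = NormalDist().inv_cdf(confidence)
    return p - z * standard_error


# Assumed input: 24 correct answers out of 26 students.
lower_95 = one_tailed_lower_bound(24, 26, 0.95)
lower_995 = one_tailed_lower_bound(24, 26, 0.995)

print(f"95% confident the proportion is at least {lower_95:.3f}")
print(f"99.5% confident the proportion is at least {lower_995:.3f}")
```

With only 26 responses and a proportion this close to 1, the normal approximation is rough, so the bounds are best read as approximate.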

  14. Correlation (again) • Two variables • common examples • height & weight • mortality & set of health risk factors (e.g., smoking history) • Are the two correlated? Does value of one predict [some of] value of the other?

  15. Linear model • Linear = line. • X and Y (standard names for two variables: values that vary!) • Y = a + b*X • Two cases: a = 0, b > 0; a > 0, b > 0 • Note: negative values of X and/or Y may or may not be valid…

  16. Linear Model • a>0, b<0 • (This will be basis of negative correlation. Still a relationship, but in the negative. As X gets big, Y gets small.)

  17. Cab fare • (Numbers are not right, but the idea is) • $3 to get in • $2 every ¼ of a mile • Y is the fare/total cost (not including tip!) and X is distance, given in miles rounded up to the nearest quarter mile. • Fare = 3 + 2*(miles * 4) • Example: rode ½ a mile. Fare is 3 + 2*2 = 7
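
The fare rule can be sketched as a small function. The constant and function names are illustrative, and rounding up to the next quarter mile is done with `math.ceil`, following the slide's description.

```python
import math

BASE_FARE = 3               # dollars charged just for getting in
RATE_PER_QUARTER_MILE = 2   # dollars for each quarter mile (or part of one)


def cab_fare(miles):
    """Fare in dollars for a trip of `miles`, rounded up to the next 1/4 mile."""
    quarter_miles = math.ceil(miles * 4)
    return BASE_FARE + RATE_PER_QUARTER_MILE * quarter_miles


print(cab_fare(0.5))   # the slide's example: half a mile
```

This is the linear model Y = a + b*X with a = 3 and b = 2 when X is measured in quarter miles, or b = 8 when X is measured in miles.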

  18. (rough) graph of cab fare Points (0,3), (1,8)

  19. Aside • Units: miles versus quarter miles, miles versus feet versus kilometers versus … need to be understood. Some stories/calculations/experiments succeed or fail based on getting the units right! • space flight that failed due to misunderstanding/lack of agreement on units.

  20. Correlation • Two variables, X and Y. • Make a graph (computer program does not make a graph; you think about a graph) • Process: determine line that would be the best fit • defined as minimizing the sum of the squares of the vertical distances of the points from the line ('least squares')
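
A minimal sketch of the least-squares fit for Y = a + b*X, using the standard closed-form solution. The five data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + b*X minimizing squared vertical error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The best-fit line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"Y = {a:.1f} + {b:.1f}*X")
```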

  21. Excel example • List two sets of numbers • Graph using scatter plot • Use =CORREL(B2:B8,C2:C8) • Result: .96927
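
Excel's CORREL returns the Pearson correlation coefficient. A sketch of the same calculation follows; the seven height and weight values are hypothetical stand-ins for the two spreadsheet columns, which are not reproduced in the transcript, so the result will not match the slide's figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights (inches) and weights (pounds) of seven people.
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [115, 120, 130, 140, 150, 165, 170]
r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 mean the points lie close to a straight line; values near 0 mean a line predicts little.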

  22. Other models • … other relationship: quadratic, log, exponential, etc. • Say you know deer population at two points in time. Is/will the growth be linear or exponential? • (Graph: population versus time)
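
Two points alone cannot settle the question, because both kinds of model pass through them exactly. A sketch with hypothetical deer counts (100 at year 0, 200 at year 10) shows how far the two models diverge afterwards.

```python
def linear_model(t0, p0, t1, p1):
    """Straight line through (t0, p0) and (t1, p1)."""
    slope = (p1 - p0) / (t1 - t0)
    return lambda t: p0 + slope * (t - t0)


def exponential_model(t0, p0, t1, p1):
    """Constant-percentage growth curve through (t0, p0) and (t1, p1)."""
    growth_per_unit = (p1 / p0) ** (1 / (t1 - t0))
    return lambda t: p0 * growth_per_unit ** (t - t0)


# Hypothetical observations: 100 deer at year 0, 200 deer at year 10.
linear = linear_model(0, 100, 10, 200)
exponential = exponential_model(0, 100, 10, 200)

for year in (10, 20, 30):
    print(year, round(linear(year)), round(exponential(year)))
```

Under these numbers the linear model adds 100 deer per decade while the exponential model doubles each decade, so a third observation would be needed to tell them apart.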

  23. Caution • Correlation is not cause • coincidental • both caused by other factor • Cause is not….absolute determination. • other factors

  24. Terminology (reprise) • False positive: wrongly say someone/something has condition. • False negative: wrongly say someone/something does NOT have condition when, in fact, he or she or it does. • Control group: group in experiment that does not have treatment. • Treatment/condition group: group in experiment that does have treatment.

  25. Double-blind study • Randomly assign subjects to • treatment • control (may give placebo) • Subject does not know which… • Tester/evaluator does not know which… • See what happens. Time period may be long. • Smoking cannot be studied using a double-blind study!

  26. Retrospective study • Of the people who did/have X, ask how many did Y? • Not as reliable. • Also need to study a group that does not have X. • 85% of people with lung cancer report that they smoke[d]. • (How many George Burns are there?)

  27. Smoking and Lung Cancer • Strong correlation • more smoking increases chances of lung cancer • smoking comes before the cancer • many different studies • Women's incidence of lung cancer went up when women started smoking • Incidence going down in groups decreasing smoking • Biological evidence • nicotine experiments with animals • lab study of lungs, blood, blood pressure, etc.

  28. What's likely to kill you http://www.reason.com/blog/show/128501.html

  29. Small multiples • Several (many) graphs/diagrams of the same format

  30. Homework • Identify complex topic (such as health risks, sports records, voting) • multiple dimensions/factors; multiple categories; timeline?, geography? • Find reputable source (more than one source even better) • Determine critical findings • determine audience • Design/build diagram (chart, graph, picture) • Bring to class to present AND to turn in. Be professional! • as appropriate, consider examples shown in class: • using 'small multiple' idea as done for 31 days • spreadsheet, but with pictures & words (tax cuts) • as appropriate, consider charts presented on health risks • This could be topic for your project I paper + charts • DUE in 1 week. • Continue postings.
