Communicating Quantitative Information. Inflation Election district Polling, predictions, confidence intervals, margin of error Homework: Identify topic for Project 1. Postings. Prepare for Midterm. Inflation. is when goods and services cost more over time money is worth less
Communicating Quantitative Information Inflation Election district Polling, predictions, confidence intervals, margin of error Homework: Identify topic for Project 1. Postings. Prepare for Midterm
Inflation • is when goods and services cost more over time • money is worth less • Government agencies analyze a 'shopping cart' of goods and services and calculate (and publish) a number • If annual inflation is 2% = .02, it means that something that cost $100 last year would cost $102 this year (on average): old_cost * (1 + inflation_rate) is the new_cost
Hint • Need to change the percentage into a fraction • 2% becomes .02 • Need to add 1 • Multiply old by 1.02 • Hint: if inflation is positive (if goods and services are increasing in price), then new must be more than old—need to multiply by something that increases…..
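The steps in the hint can be sketched in a few lines of Python. This is only an illustration: the function name new_cost and its parameter names are made up here, and it assumes inflation is given as a percentage for a single year.

```python
def new_cost(old_cost, inflation_percent):
    """Return the price after one year of inflation given as a percentage."""
    inflation_rate = inflation_percent / 100  # 2% becomes .02
    return old_cost * (1 + inflation_rate)    # add 1, then multiply the old cost

# The $100 item at 2% inflation from the Inflation slide
print(round(new_cost(100, 2), 2))
```

The same function can be used to check answers to the exercises that follow.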
Exercises • If inflation is 4%, what would new prices be for something • $50 • $10 • If inflation is 12%, what would new prices be for something • $50 • $10
History • Mostly, there is inflation, though deflation is possible (and generally not good for economy) • Central banks ('the fed') try to regulate inflation by changes in the interest rates • Calculation is complex • Consider computers • digital cameras
Dental expenses • Yes, expenses have gone up, but have they gone up faster than inflation, that is, faster than everything • Look at the graph • Gray line versus blue line • NOTE: both are increases
Pie chart versus Bar graph • Pie is to show parts of a whole • For example, different categories of spending • Bar graphs can show categories, also. • Better than pie charts if categories are not everything • Bar graphs good for showing different time periods • Horizontal (x-axis) typically holds the time • Clustered bar good for comparisons • Stacked bar good for parts of a whole
On graphs • Graphs and diagrams are for showing context…. Telling a story (the relevant story) • Complexity is okay • Want to encourage AND reward study • Remember: definitions, denominator, distribution, difference (context), dimension • Dimension: may be axis in graph • Gapminder uses color, size of 'dot', and timing • Napoleon marching to/from Moscow: color, thickness of line, geography, temperature
On re-districting • One technique is to concentrate [known] voters of one type to remove from other districts • Are voters so predictable? • Do the qualities of the individual representatives count?
New topic(s) • Measurement • Polling and sampling
Measurements • Measuring something can require defining a system / process • Competitive figure skating • ‘operational’ definition • ‘likely voter’ • someone who voted in x% of last general elections and/or y% of primaries • And knows the voting place • Fixed place and time • For surveys: answered a specific question in the context of other questions, …
Source • The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith, HarperResource
Caution • Procedures (formulas) presented without proof, though, hopefully, motivated • Go over process different ways • Next class: models of population, subpopulations in sample
Task • Want to know the percentage (proportion) of some large group • adults in USA • television viewers • web users • For a particular thing • think the president is doing a good job • watched specific program • viewed specific commercial • visited specific website
Strategy: Sampling • Ask a small group • phone • solicitation at a mall • other? • Monitor actions of a small group, group defined for this purpose • Monitor actions of a panel chosen ahead of time
Quality of sample • Recall discussion on students who 'took the bait' to take special survey • More on quality of sample later • More on adjusting data from panel for statement about total population later
Two approaches • Estimating with confidence interval: p in general population based on proportion phat in sample • Hypothesis testing: H0 (null hypothesis) p = p0 versus Ha p > p0
Estimation process • Construct a sample of size n and determine phat • Ask who they are voting for (for now, let this be binomial choice) • Use this as estimate for actual proportion p. • … but the estimate has a margin of error. This means: the actual value is within a range centered at phat …UNLESS the sample was really strange. • The confidence value specifies what the chances are of the sample being that strange.
Statement • I'm 95% sure that the actual proportion is in the following range…. • phat – m <= p <= phat + m • Notice: if you want to claim more confidence, you need to make the margin bigger.
Image from Cartoon book • You are standing behind a target. • An arrow is shot at the target, at a specific point in the target. The arrow comes through to your side. • You draw a circle (more complex than +/- error) and say: Chances are the target point is in this circle unless the shooter was 'way off'. The shooter would only be way off X percent of the time. (Typically X is 5% or 1%.)
Mathematical basis • The phat values from different samples are themselves (approximately) normally distributed… • if sample and p satisfy certain conditions. • Most samples produce phat values that are close to the p value of the whole population. • Only a small number of samples produce values that are way off. • Think of outliers of normal distribution
Actual (mathematical) process • Can use these techniques when n*p >= 5 and n*(1-p) >= 5 (the sample size must be this big) • The phat values are distributed close to a normal distribution with standard deviation sd(p) = sqrt(p*(1-p)/n) • Can estimate this using phat in place of p in the formula! • Choose the level of confidence you want (again, typically 95% or 99%, that is, a 5% or 1% chance of being wrong). For 95% confidence, look up (or learn by heart) the value 1.96: this is the number of standard deviations such that 95% of values fall within that distance of the mean. So .95 is P(-1.96 <= (p-phat)/sd(p) <= 1.96)
Notes • p is less than 1 so (1-p) is positive. • Margin of error decreases as p varies from .5 in either direction. (Check using Excel.) • If the sample produces a very high (close to 1) or very low (close to 0) value, p * (1-p) gets smaller • (.9)*(.1) = .09; (.8)*(.2) = .16; (.6)*(.4) = .24; (.5)*(.5) = .25
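The slide suggests checking this in Excel; the same check can be sketched in Python. The helper name spread is made up for this illustration, and the list of p values is simply the ones on the slide plus their mirror images.

```python
def spread(p):
    """The p * (1 - p) term inside the standard deviation formula."""
    return p * (1 - p)

values = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
for p in values:
    print(p, round(spread(p), 2))

# The term is largest in the middle and shrinks toward 0 and 1
largest = max(values, key=spread)
print("largest at p =", largest)
```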
Notes • Need to quadruple the n to halve the margin of error.
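A quick way to see why: n sits under a square root, so multiplying n by 4 divides the standard deviation (and so the margin of error) by 2. The sketch below assumes the standard deviation formula sqrt(p*(1-p)/n) used in the book example; the function name sd_phat and the sample sizes 1000 and 4000 are chosen only for illustration.

```python
import math

def sd_phat(p, n):
    """Standard deviation of the sample proportion for sample size n."""
    return math.sqrt(p * (1 - p) / n)

small = sd_phat(0.55, 1000)  # original sample size
large = sd_phat(0.55, 4000)  # four times as many people
print(round(small, 4), round(large, 4), round(small / large, 2))
```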
Formula • Use a value called the z-score • For 95% confidence, the value is 1.96
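Instead of a printed table, the value for a given confidence level can be looked up with Python's standard library. This is a sketch, not the book's method: the function name z_for_confidence is made up, and it assumes a two-sided interval, with half of the leftover probability in each tail.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Number of standard deviations that covers `confidence` of a normal curve."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
```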
Mechanics • Process is • Gather data (get phat and n) • Choose confidence level • Using table, calculate margin of error. • Book example: 55% (.55 of a sample of 1000) said they backed the politician • sd(phat) = square_root((.55)*(.45)/1000) = .0157 • Multiply by z-score (e.g., 1.96 for 95% confidence) to get margin of error • So p is within the range: .550 – (1.96)*(.0157) and .550 + (1.96)*(.0157), that is, .519 to .581 or 51.9% to 58.1%
Example, continued • 51.9% to 58.1% may round to 52% to 58%, or may say 55% plus or minus 3 percentage points. • What is typically left out is that there is a 1/20 chance that the actual value is NOT in this range.
95% confident means • 95/100 probability that this is true • 5/100 chance that this is not true • 5/100 is the same as 1/20, so • There is only a 1/20 chance that this is not true. • Only 1/20 truly random samples would give an answer that deviated more from the real value • ASSUMING NO INTRINSIC QUALITY PROBLEMS • ASSUMING IT IS RANDOMLY CHOSEN
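The 1/20 claim can be tried out with a small simulation. This is a sketch under stated assumptions: the true proportion is set to .55 and the sample size to 1000 to match the book example, every sample is truly random, and the function name coverage, the number of trials, and the seed are arbitrary choices made here.

```python
import math
import random

def coverage(p=0.55, n=1000, trials=2000, z=1.96, seed=1):
    """Fraction of random samples whose interval phat +/- m contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        yes = sum(rng.random() < p for _ in range(n))  # one simulated poll
        phat = yes / n
        m = z * math.sqrt(phat * (1 - phat) / n)       # margin of error
        if phat - m <= p <= phat + m:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should come out near .95, meaning roughly 1 sample in 20 misses the true value even though nothing is wrong with the sampling.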
99% confidence means • [Give fraction positive] • [Give fraction negative]
Why • Confidence intervals given mainly for 95% and 99%?? • History, tradition, doing others required more computing….
Let's ask a question • How many of you watched the last Super Bowl? World Cup? • Sample is whole class • How many registered to vote? • Sample size is number in class 18 and older • ????
Variation of book problem • Say sample was 300 (not 1000), so the divisor is smaller. • sd(phat) = square_root((.55)*(.45)/300) = .0287 • Bigger number. The circle around the arrow is larger. The margin is larger because it was based on a smaller sample. • Multiplying by 1.96, get .056; subtracting from and adding to the .55, get .494 to .606. You/we are 95% sure that the true value is in this range. • Oops: may be better, but may be worse. The fact that the lower end is below .5 is significant for an election!
Exercise Determine / choose / read • size of sample n • proportion in sample (phat) • claimed confidence level (and consult table). • Hint: go back to Mechanics slide and Table slide and plug in the numbers!
Exercise • size of sample is n • proportion in sample is phat • confidence level produces factor called the z-score • Can be anything, but common values are [80%], 90%, 95%, 99% • Use table. For example, 95% value is 1.96; 99% is 2.58 • Calculate margin of error m • m = zscore * sqrt((phat)*(1-phat)/n) • Actual value is >= phat – m and <= phat + m
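The whole procedure fits in one short function. This is a minimal sketch: the names Z_TABLE and confidence_interval are made up for illustration, and the table holds only the two z-scores quoted on the slide.

```python
import math

# z-scores quoted on the slide; other confidence levels would need a fuller table
Z_TABLE = {0.95: 1.96, 0.99: 2.58}

def confidence_interval(phat, n, confidence=0.95):
    """Return (low, high, m) for sample proportion phat and sample size n."""
    zscore = Z_TABLE[confidence]
    m = zscore * math.sqrt(phat * (1 - phat) / n)  # margin of error
    return phat - m, phat + m, m

# Book example (n = 1000) and the variation with a smaller sample (n = 300)
for n in (1000, 300):
    low, high, m = confidence_interval(0.55, n)
    print(n, round(low, 3), round(high, 3), round(m, 3))
```

Keeping the z-scores in a small table mirrors the "use table" step; plugging in a different phat, n, or confidence level is then a one-line change.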
Hypothesis testing • Pre-election polling • Repeat example • Source (again) The Cartoon Guide to Statistics by Gonick and Smith • See also for Jury selection, product inspection, etc.
Hypothesis testing • Null hypothesis: p = p0 • Alternate hypothesis: p > p0 • Do a test and decide if there is evidence to reject the Null hypothesis. (Need more evidence to reject than to keep.) • Similar analysis (not giving proof!)
Hypothesis testing, continued • Test statistic is Z = (.55 - .50)/(sqrt(.5*.5)/sqrt(1000)) = 3.16 • Use Excel: =1-normsdist(3.16) gives P(z >= 3.16) = .0008 • Reject the Null hypothesis: if it were true (if p = p0), the chances of getting a sample result this high would be only .0008
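The same calculation can be sketched in Python instead of Excel. It assumes the numbers from the book example (phat = 0.55, p0 = 0.50, n = 1000); the function name z_statistic is made up here, and NormalDist().cdf from the standard library plays the role of Excel's normsdist.

```python
import math
from statistics import NormalDist

def z_statistic(phat, p0, n):
    """How many standard deviations phat lies above p0 if the null hypothesis holds."""
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = z_statistic(0.55, 0.50, 1000)
p_value = 1 - NormalDist().cdf(z)  # chance of a z this large or larger under H0
print(round(z, 2), round(p_value, 4))
```

Note that the standard deviation here uses p0 rather than phat, because the test asks what samples would look like if the null hypothesis were true.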
Project I • Paper or presentation on news story involving mathematics and/or quantitative reasoning • Involving the audience is good • Everybody be ready with paper or ready to present. Some presentations may go to next class. • Use multiple sources • Explain the mathematics!!!
Ways to get topic • Topic, assignment in other course that involves quantitative information • Double dipping • Alternative: compare how two different newspapers/writers/media treat the same topic. There must be real differences. • Variant (special case): election polling. Talk about similarities and differences, perhaps definition of 'toss-up', how they describe sources,? • Paulos TV series: http://abcnews.go.com/Technology/WhosCounting/
Homework • Topic for project 1 due by October 20 • You can re-use any topic you or anyone else posted • You can re-use spreadsheet or diagram topics • You can use topics I suggested • You can use topics from another class • YOU MUST post your proposal even if it is a topic I suggested. • Midterm is October 18 • Presentation and project 1 paper due Nov. 4 • (Guide to midterm is on-line. Reviewing will assume you have studied the guide.)