Consuming statistical data. Badania Operacyjne. Statistical pitfalls.
Consuming statistical data
Badania Operacyjne (Operations Research)
Statistical pitfalls
• Confounding conditional probabilities
• Suppose that you are about to board a flight and you fear that the plane might be blown up by a bomb
• An old joke suggests that you take a bomb on the plane with you, because the probability of two bombs is really low
• Assuming independence of your action and the other passengers’ actions, the probability of there being two bombs conditional on your bringing one is equal to the probability of one bomb conditional on your not bringing one.
Problem 2
• You are going to play roulette. You first sit there and observe, and you notice that the last five times it came up “black.”
• Would you bet on “red” or on “black”?
• The same applies to betting on 1, 2, 3, 4, 5, 6 in a state lottery.
Statistical pitfalls
• Ignoring base probabilities: Problem 1
• You are concerned that you might have a disease
• You are going to take a test such that:
  • If you have the disease, the test will show it with prob. 95%
  • If you don’t, the test might still be positive with prob. 10%
• Assume that you took the test and you tested positive.
• What is the probability of actually having the disease?
Statistical pitfalls
• The answer depends on the base probability P(D) of having the disease
• Suppose the disease is known to be extinct, so P(D)=0; then P(D|T)=0
• And if P(D)=0.9, then P(D|T)=0.9884
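The two values above follow from Bayes' rule applied to the test described in Problem 1. The sketch below is only an illustration: the function name `posterior` and the middle prior of 1% are assumptions chosen for the example, not figures from the slides.

```python
def posterior(prior, sensitivity=0.95, false_positive_rate=0.10):
    """P(D | T): probability of having the disease given a positive test."""
    true_positive = sensitivity * prior                   # P(T | D) * P(D)
    false_positive = false_positive_rate * (1.0 - prior)  # P(T | not D) * P(not D)
    return true_positive / (true_positive + false_positive)


# 0.0 and 0.9 are the priors from the slide; 0.01 is a hypothetical rare disease.
for prior in (0.0, 0.01, 0.9):
    print(f"P(D) = {prior:.2f}  ->  P(D|T) = {posterior(prior):.4f}")
```

With a rare disease the posterior stays low even after a positive test, which is the point of the slide: the same test result means very different things under different base probabilities.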
Another example - prejudice
• Assume that most of the top squash players are Pakistani. It does not mean that most Pakistanis are top squash players. Yet people often make this mistake.
Statistical pitfalls
• Biased samples (correlation)
• 1936 US presidential election: Roosevelt (Democrat) vs Landon (Republican)
• Opinion poll in the Literary Digest
• The poll relied on car and telephone registration lists
• Another example: members of some ultraconservative party refuse to respond to the pollsters’ questions
Statistical pitfalls
• Biased samples (sampling procedure)
• Problem 4: Family size:
• I wish to find the average number of children in a family
• I go to a school, randomly select several dozen children, and ask them how many siblings they have
• I compute the average
• The bias stems from my very choice of sampling children in schools (large families are over-represented, and families without children never appear)
• We can immediately see that this is wrong without any information on correlation
• Notice that the sample is not biased if you want to answer the question:
• “How many children (including yourself) do you grow up with?”
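The size of this bias can be computed exactly for a small made-up population. In the sketch below the list `families` is hypothetical, and it is assumed that every child attends the school and is equally likely to be picked.

```python
def average_per_family(family_sizes):
    """Average number of children per family (the quantity we wanted)."""
    return sum(family_sizes) / len(family_sizes)


def average_reported_by_children(family_sizes):
    """Expected family size reported by a randomly chosen child.

    A family with n children sends n children to school, and each of
    them reports n, so families are weighted by their size.
    """
    total_children = sum(family_sizes)
    return sum(n * n for n in family_sizes) / total_children


families = [0, 1, 1, 2, 2, 6]  # hypothetical numbers of children per family
print(average_per_family(families))            # what we wanted to estimate
print(average_reported_by_children(families))  # what the school survey estimates
```

The second number is the correct answer to "how many children do you grow up with", but it overstates the average family size.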
Statistical pitfalls
• Biased samples (sampling procedure)
• Waiting time
• I wish to estimate the average time between the arrival of two consecutive buses
• I go to the bus stop, measure the time until the next bus arrives, and multiply it by two
• A bus that happens to take longer to arrive has a higher probability of appearing in my sample
• If I wished to estimate the waiting time for a passenger who arrives at the stop at a random moment, the sample would not be biased
Italians vs Europeans
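The bus-stop bias can also be worked out exactly. In this sketch the list `gaps` (minutes between consecutive buses) is hypothetical, and the observer is assumed to arrive at a uniformly random moment.

```python
def mean_gap(gaps):
    """Average time between two consecutive buses (the target quantity)."""
    return sum(gaps) / len(gaps)


def mean_wait_random_arrival(gaps):
    """Expected wait for someone arriving at a uniformly random moment.

    The arrival falls into a gap of length g with probability g / total,
    and the expected wait within that gap is g / 2.
    """
    total = sum(gaps)
    return sum(g * g for g in gaps) / (2 * total)


gaps = [5, 5, 20]  # hypothetical gaps in minutes
print(mean_gap(gaps))                      # true average gap
print(2 * mean_wait_random_arrival(gaps))  # "measure and multiply by two" estimate
```

Doubling the measured wait overestimates the average gap whenever the gaps are unequal, while the measured wait itself is an unbiased estimate of what a random passenger experiences.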
Problem 3
• A study of students’ grades in the United States showed that immigrants had, on average, a higher grade point average than US-born students. The conclusion was that Americans are not very smart, or at least do not work very hard, as compared with other nationalities.
• What do you think?
Statistical pitfalls
• Biased samples (sampling procedure)
• Problem 5: Winner’s curse:
• We are conducting an auction for a good that has a common value (e.g. an oil field)
• This common value is not known with certainty
• Assume that each firm gets an estimate of the worth of the oil field and submits it
• The estimates they get are statistically unbiased
• If you have 1 firm, its expected payoff is zero
• If you have several firms, the firm that wins the bid is likely to lose money
• Think of “winning the auction” as a sampling procedure
• The winning bids that are sampled are not representative of the whole “population” of bids
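A small simulation can illustrate the winner's curse. This is a sketch under stated assumptions, none of which come from the slides: estimates are normally distributed around the true value, every firm bids exactly its estimate, the highest bid wins and is paid in full, and all numeric parameters are hypothetical.

```python
import random


def average_winner_profit(n_firms, true_value=100.0, noise_sd=20.0,
                          trials=20000, seed=0):
    """Average profit of the winning firm over many simulated auctions."""
    rng = random.Random(seed)
    total_profit = 0.0
    for _ in range(trials):
        # Each firm's estimate is unbiased: true value plus zero-mean noise.
        bids = [rng.gauss(true_value, noise_sd) for _ in range(n_firms)]
        # The winner pays its own (highest) bid and receives the true value.
        total_profit += true_value - max(bids)
    return total_profit / trials


for n in (1, 2, 5, 10):
    print(n, round(average_winner_profit(n), 2))
```

Each individual estimate is unbiased, yet the maximum of several estimates is not, so the average profit of the winner falls as the number of bidders grows.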
Statistical pitfalls
• Regression to the mean
• “Regression” refers to the process of fitting a curve to data points, under the assumption that there is some inherent noise in the data-generating process. Because of that noise, a simple curve is better than a more complex curve that matches the data points exactly (overfitting)
• Historically, linear regression was first used to explain the height of men by the height of their fathers.
• The line was increasing but the slope was less than one, hence “regression”
• The height depends on genes
• And on all other things (in the absence of info about them, let’s put them all together and call them noise)
• Assume that the noise is independent of the father’s height.
• Then take a very tall man
• He will pass on to his son his genes but not the noise, so the son is expected to be tall, but less tall than his father
Statistical pitfalls
• Regression to the mean
• Suppose that you select students by their grades on an examination, and assign the best to a separate class
• After a year you check their progress
• You would expect them to do better than the average student,
• But you would also expect them, on average, to do below their previous level.
• This is because of the way you selected them
• Talent will be robust
• Noise (luck on the day of the exam, etc.) will not be
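The student-selection example can be simulated with a simple "score = talent + noise" model. The model, the equal variances of talent and noise, and all parameter values below are assumptions made for illustration only.

```python
import random
from statistics import mean


def simulate_selection(n_students=20000, top_fraction=0.1, seed=0):
    """Select the best students on exam 1 and compare them on exam 2."""
    rng = random.Random(seed)
    talent = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    # Each exam score is the same talent plus fresh, independent luck.
    exam1 = [t + rng.gauss(0.0, 1.0) for t in talent]
    exam2 = [t + rng.gauss(0.0, 1.0) for t in talent]
    k = int(n_students * top_fraction)
    best = sorted(range(n_students), key=lambda i: exam1[i], reverse=True)[:k]
    return (mean(exam1[i] for i in best),   # selected group, first exam
            mean(exam2[i] for i in best),   # selected group, second exam
            mean(exam2))                    # everyone, second exam


first, second, overall = simulate_selection()
print(round(first, 2), round(second, 2), round(overall, 2))
```

The selected group stays above the overall average on the second exam (talent persists) but falls below its own first-exam average (luck does not).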
Statistical pitfalls
• Regression to the mean
• Your friend tells you that you must see a movie that’s just out: “It’s the best movie I have ever seen”
• You select a political leader or investment consultant based on their past performance
Problem 6 [At a restaurant]
• ANN: I hate it. It’s just like I told you: they don’t make an effort anymore.
• BARBARA: They?
• ANN: Just taste it. It’s really bad food. Don’t you remember how it was the first time we were here?
• BARBARA: Well, maybe you’re tired.
• ANN: Do you like your dish?
• BARBARA: Well, it isn’t bad. Maybe not as good as last time, but…
• ANN: You see? They first make an effort to impress and lure us, and then they think that we’re anyway going to come back. No wonder that so many restaurants shut down after less than a year.
• BARBARA: Well, I’m not sure that this restaurant is so new.
• ANN: It isn’t?
• BARBARA: I don’t think so. Jim mentioned it to me a long time ago, it’s only us who didn’t come here for so long.
• ANN: So how did they know they should have impressed us the first time and how did they know it’s our second time now? Do you think the waiter was telling the chef, “Two sirloins at no. 14, but don’t worry about it, they’re here for the second time”?
Statistical pitfalls
• Correlation and causation
• Two variables are correlated if they tend to assume high values together and low values together
• We measure it by the covariance and the correlation coefficient
• Causality is a much trickier concept because it involves counterfactuals, namely statements of the type:
• “X is high and so is Y; but had X been low, Y would have been low, too.”
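For reference, the standard definitions of the two measures mentioned above are given below. The notation ($\mathbb{E}$ for expectation, $\sigma_X$ and $\sigma_Y$ for standard deviations, $\rho$ for the correlation coefficient) is an assumption, since the slides do not fix one.

```latex
\operatorname{Cov}(X, Y) = \mathbb{E}\bigl[(X - \mathbb{E}[X])\,(Y - \mathbb{E}[Y])\bigr],
\qquad
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
-1 \le \rho_{X,Y} \le 1 .
```

Both quantities are symmetric in X and Y, which is one way to see that they cannot by themselves say which variable causes which.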
Problem 7
• Studies show a high correlation between years of education and annual income. Thus, argued your teacher, it’s good for you to study: the more you do, the more money you will make in the future.
• Is this conclusion warranted?
• More education → more income: X → Y
• More income → more education: Y → X
• Rich parents → more education and more income: Z → X, Y
Problem 8
• In a recent study, it was found that people who did not smoke at all had more visits to their doctors than people who smoked a little bit. One researcher claimed: “Apparently, smoking is just like consuming red wine – too much of it is dangerous, but a little bit is actually good for your health!”
• Do you accept this conclusion?
Statistical pitfalls
• Correlation and causation
• You want to measure the effect that smoking (X) has on general health (Y)
• Linear regression gives you the correlation between the two
• This correlation may be the result of:
  • X affecting Y
  • Y affecting X
  • Another variable Z affecting both X and Y
  • Pure chance
• We can choose an instrument, that is, a variable that:
  • Is correlated with X
  • Is not correlated with the other factors that affect Y, so that it can influence Y only through X
• For example, tobacco tax: If tobacco taxes only affect health because they affect smoking (holding other variables in the model fixed), correlation between tobacco taxes and health is evidence that smoking causes changes in health. An estimate of the effect of smoking on health can be made by also making use of the correlation between taxes and smoking patterns.
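The instrument idea can be sketched on simulated data. Everything in the model below is hypothetical: an unobserved confounder (called `stress` here) raises smoking and lowers health, the tax lowers smoking and has no other path to health, and all coefficients are made up. The instrumental-variable estimate used is the simple ratio Cov(tax, health) / Cov(tax, smoking).

```python
import random


def covariance(a, b):
    """Population covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate_iv(n=50000, true_effect=-2.0, seed=0):
    """Return (naive regression slope, instrumental-variable estimate)."""
    rng = random.Random(seed)
    tax = [rng.gauss(0.0, 1.0) for _ in range(n)]      # instrument
    stress = [rng.gauss(0.0, 1.0) for _ in range(n)]   # unobserved confounder
    smoking = [-1.0 * z + u + rng.gauss(0.0, 1.0)
               for z, u in zip(tax, stress)]
    health = [true_effect * x - 3.0 * u + rng.gauss(0.0, 1.0)
              for x, u in zip(smoking, stress)]
    naive = covariance(smoking, health) / covariance(smoking, smoking)
    iv = covariance(tax, health) / covariance(tax, smoking)
    return naive, iv


naive, iv = simulate_iv()
print("naive slope:", round(naive, 2), " IV estimate:", round(iv, 2))
```

In this setup the naive slope mixes the effect of smoking with the effect of the confounder, while the ratio based on the tax should land close to the assumed true effect of -2.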
Statistical pitfalls
6. Statistical significance
Problem 9
Comment on the following.
• CHARLES: I don’t use a mobile phone anymore.
• DANIEL: Really? Why?
• CHARLES: Because it was found to be correlated with brain cancer.
• DANIEL: C’mon, you can’t be serious. I asked an expert and they said that the effect is so small that it’s not worth thinking about.
• CHARLES: As long as you have something to think with. Do as you please, but I’m not going to kill myself.
• DANIEL: Fine, it’s your decision. But I tell you, the effects that were found were insignificant.
• CHARLES: Insignificant? They were significant at the 5% level!
Even if the use of mobile phones increases the probability of brain tumors only from 0.0000302 to 0.0000303, with a large enough sample the difference will be statistically significant.
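The gap between "statistically significant" and "large" can be made concrete with a rough calculation. The sketch below uses the two probabilities from the slide and the usual two-proportion z statistic; as a simplifying assumption it treats the observed proportions as exactly equal to the true ones, and the sample sizes are hypothetical.

```python
import math


def z_statistic(p1, p2, n_per_group):
    """Approximate z statistic for comparing two proportions.

    Assumes the observed proportions equal the true ones, so this is the
    typical z value rather than the result of one particular study.
    """
    standard_error = math.sqrt(p1 * (1 - p1) / n_per_group
                               + p2 * (1 - p2) / n_per_group)
    return (p2 - p1) / standard_error


p_no_phone, p_phone = 0.0000302, 0.0000303  # probabilities from the slide
for n in (10**6, 10**9, 10**11):            # hypothetical sample sizes
    z = z_statistic(p_no_phone, p_phone, n)
    print(f"n = {n:.0e}: z = {z:.3f}, significant at 5%: {z > 1.96}")
```

The effect size is the same in every row; only the sample size changes whether the 5% threshold (|z| > 1.96 for a two-sided test) is crossed.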