Confidence Intervals for Means

Confidence Intervals for Means. Chapter 8, Section 1. Statistical Methods II. QM 3620. Estimation and Guessing.


Presentation Transcript


  1. Confidence Intervals for Means. Chapter 8, Section 1. Statistical Methods II. QM 3620

  2. Estimation and Guessing
     Suppose I wanted to guess your age. I would look you over carefully and then, based on the information I have, I would venture a guess. Odds are that I would be wrong. I might actually be close, but I would most likely be wrong.
     If it was really important that I get that guess correct, I could widen my guess a bit to include a range of values. If you go to a carnival where they will attempt to guess your age for a prize, they consider themselves correct if they guess within a couple of years. They give themselves a “margin of error” so to speak.
     That margin of error compensates for the limited information they have on you. If they could glean more information, like your parents’ ages, when you graduated from high school, etc., then they could accurately determine your precise age. More information allows you to guess more accurately and more precisely.

  3. Estimation and Guessing
     So how does that figure into what we are learning?
     An important part of statistics is the ability to guess the value of certain numbers. Okay, guess is a dubious word. Let’s use the word estimate; it sounds more scientific.
     So this is a class on guessing… er, I mean estimating?
     Partially, but it is not as simple as that. There are bad ways to estimate and good ways to estimate, so we are going to learn how to go about it the right way.

  4. A Good Estimate
     So what makes a good estimate?
     A good estimate is accurate. A good estimate is precise.
     Accuracy? Precision? What do you mean?
     Accuracy is the ability to estimate correctly. Precision is the specificity of the estimate.
     Think of it this way: I could be 100% accurate if I said that you are between 0 and 115 years of age, but I would be extremely imprecise. Alternatively, I could guess that you are exactly 21.6 years old. I would be extremely precise, but what are my chances of guessing correctly?
     There is a trade-off between accuracy and precision. Precise estimates lead to inaccuracy, whereas accurate estimates require looser precision.
     So how do I win this “game” if there is a trade-off?
     That is where statistics comes in. Statistics allows us to control the trade-off of precision and accuracy so you can make the “best” estimate for your specific situation.

  5. How Did They Do It?
     Statisticians realized that they had to start with their best one-number estimate, the point estimate. This is the one number that you would guess if you were only allowed one number.
     So, what is the one number?
     Well, that depends on what you are estimating. Generally speaking, if you have a sample of data, the best one-number estimate of the statistic in a population is the same statistic in the sample. In other words, the best one-number estimate of the mean of a population would be the mean of the sample. If you want to estimate the proportion of a population, you start with the proportion in the sample. If you want to estimate… well, you get the picture.
     So why aren’t we done once we have the point estimate?
     That’s what I was explaining earlier when we talked about accuracy and precision. The point estimate is really precise; it’s one number. But it’s also bound to be inaccurate. You only have some of the data (i.e. a sample) rather than all of the data (i.e. the population), so anything based on limited information is bound to be inaccurate.
     Okay, give me a hammer. Let’s pound that last point in. UNLESS YOU HAVE ALL OF THE DATA, YOU ARE WORKING WITH LIMITED INFORMATION. LIMITED INFORMATION LEADS TO INACCURATE ESTIMATES. SO, CHOOSE YOUR POISON… INACCURACY OR IMPRECISION. IMPRECISION CAN BE DEALT WITH BUT INACCURACY JUST MAKES YOU WRONG.
     We are going to build a range or margin of error for our estimate to improve our accuracy… and sacrifice a bit of precision in the process. That is the best way to approach situations with limited information.

  6. Populations? Samples?
     Forgot about samples and populations, eh? Okay, quick review. Suppose we were interested in determining the average price paid for a ticket to a baseball game.
     When we say population, what we really mean is ALL of the observations that we want to talk about.
     Do we really want to make an estimate for ALL baseball games? We can’t seriously be including Little League games. Perhaps it is better to say all tickets to professional games.
     Oh, not interested in minor league games. The population would then be more properly defined as all professional major league games, or we could say all MLB games.
     What year? Thanks! You had to throw something into the mix, didn’t you. Let’s say this year’s MLB games.
     Now that we have a population defined, let’s take a look at the observations.
     It might be possible to determine the price of every ticket by badgering the front office of each baseball team, but that does not address the discounts offered. Not every ticket goes for full price.
     And how the heck are we going to find out how much someone paid for a ticket when the game has already been played? How are we going to find all those people?
     Bottom line is that looking at every observation in a population is possible… sometimes… but usually we have to look at those observations to which we have access.
     This doesn’t even start to address the problems of looking at observations when we have to destroy the observation to get a measurement. Say, for example, the need for a company like General Electric to determine the average lifespan of their light bulbs. If we destroy all of the observations, then where are we?
     A sample is just a subset of the population; one that we can get our hands on and hopefully one that represents the population well. Random samples are always the best.

  7. Margin of Error
     We’ve got a sample, and we’ve calculated the point estimate. In this case, let’s say we have a sample of 100 observations of the amount spent on an MLB ticket this year.
     The average price of these 100 tickets was $23.52. This would make a good starting (point) estimate for the average price for all of the tickets.
     Do you suppose that this is really the average price paid for ALL MLB tickets this year?
     Didn’t think so. You are too smart to think it was that easy.
     Now we have to build a margin of error around our point estimate.
     Our margin of error depends on how much the ticket prices vary.
     Think of it this way. If all of the tickets in the sample were exactly the same price, then we might reasonably think that all tickets in the population cost the same. If that is the case, we could use our sample average to estimate the population average with no margin of error. Simple… but unrealistic.
     Point of fact: If there is a large variation in the values of the sample observations, we can presume that there is also a large variation in the values in the population. That would imply that our sample average might change quite a bit from sample to sample as the observations could be quite different in each sample.
     Another point of fact: The fewer observations in the sample, the less information we have about the point estimate and the more we have to compensate for a lack of complete information.
     Bottom Line: The margin of error has everything to do with the number of and variance in the observations, and not the value of the point estimate.

  8. Why it Works
     Hopefully you now understand the basic thoughts behind estimating with a range using a margin of error. Creating one is not that difficult. Just remind yourself that boring statisticians formulated this so you don’t have to.
     The Mechanics
     To actually create the range using a margin of error, we focus on how much a sample mean is likely to vary from sample to sample. This tells us how close we can expect the sample mean to be to the population mean we are interested in. If the sample means are expected to vary substantially from sample to sample, the population mean could be far away from the sample mean and we would have to compensate with a large margin of error. If the sample means vary little, then the population mean will likely be close to the sample mean and our margin of error will be small.

  9. Why it Works
     • How do we know how much the sample means will vary when we only have one sample?
     • Good question! The good news is that the variation in the sample means can be directly calculated from the variation in the individual observations.
     • There is a mathematical relationship… the standard error of the sample mean can be estimated by dividing the standard deviation of the observations by the square root of the sample size.
     • Standard Error? Yes, to eliminate confusion, statisticians use the term standard deviation to refer to the variation in individual data points and the term standard error to refer to the variation in calculations like a mean or median.
     • So the standard error of the sample means is really just the standard deviation of the sample means… or how much sample means vary from sample to sample.
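
The relationship described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course material (which uses Excel): the function name `standard_error` and the call lengths are made up for the example.

```python
import math
import statistics

def standard_error(observations):
    """Estimate the standard error of the sample mean as s / sqrt(n)."""
    n = len(observations)
    s = statistics.stdev(observations)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)

# Hypothetical observations, invented purely for illustration.
calls = [4.0, 6.0, 8.0, 10.0, 12.0]
print(standard_error(calls))  # about 1.414
```

Here the standard deviation of the observations is about 3.16, and dividing by the square root of 5 gives a standard error of about 1.41.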

  10. The Equation
     Our next step is to use the margin of error (the amount we add to and subtract from our point estimate) to form our “range guess” (or, more technically, our confidence interval). This confidence interval will give us the best accuracy and precision combination… but we need a statistical multiple from the statisticians. This multiple takes into account the sample size and the confidence (accuracy) level (as was mentioned in the “Margin of Error” slide).
     The general formula for all confidence intervals is:
     $\text{Point Estimate} \pm \text{Multiple} \times \text{Standard Error of Point Estimate}$
     The version for a confidence interval for a population mean is:
     $\text{Sample Mean} \pm t\text{-value} \times \text{Standard Error of Sample Mean}$
     Take a close look at that second formula.
     The calculation to the right-hand side of the plus-minus sign (±) is the margin of error.
     The standard error of the sample mean can be estimated directly from the sample you took (see the slide “Why It Works” again if you don’t believe me).
     We stated that the margin of error needs to become greater if the sample size is smaller (hence we have less information) or if the confidence level is higher (to increase accuracy we lose precision)… THEREFORE we can expect a greater multiple for a smaller sample size and a greater multiple for a higher confidence level. The t-value has all of this built in.

  11. The t-value
     • You are going to need to be able to determine the t-value (Stats I).
     • Why do we use a t value? Let me explain it this way. The t value takes into account that we have had to estimate everything from our data.
     • We estimate the standard deviation and the standard error of a sample mean from our data… and that was before we even got to our main purpose of estimating the overall mean. You can only estimate so many things before you better start compensating for it. The t value has a built-in compensator for those intermediate estimates.
     • The textbook reviews how to look up the t in the table on pages 340-342, if you are so inclined. Personally, I think a table sells the t value short. There are too many values that are needed that do not show up on the tables. Use the computer. It can fill in the blanks where no t value apparently exists. The next slide shows how we do it in Excel.
     • By the way, in the future the multiple is not always a t value. It all depends. If you are estimating some other statistic besides the population mean, you will likely be using an altogether different multiple.
     • The bottom line is that the multiple is invariably linked to the accuracy and precision of the estimate. High confidence (high accuracy) leads to a big multiple and less precision (wide margin of error), and vice versa.

  12. Finding a t value using Excel
     To find a t value in Excel, you use the TINV function. The TINV function needs two bits of information: 1) the confidence level you want; and 2) the sample size (minus 1).
     [Figure: The t Distribution, with the central area shaded and bounded by −t value and +t value]
     The TINV function takes the confidence level backwards. It wants to know the level of “unconfidence” or the “probability that you are wrong”. To be 95% confident would imply that you are 5% “unconfident”. The TINV function looks up the area outside the blue areas in the graph above.
     The sample size has to be reduced by one when looking up the t value to compensate for the estimation we did of the standard deviation. Trust me on this… you don’t want to see the derivations.
     The t distribution looks just like a normal distribution, except it is a bit flatter (which depends on the amount of information you have, i.e. the sample size). An infinite sample size makes the t distribution identical to the normal distribution.
     So, a t value for a 95% confidence (or .95 in decimal format) with a sample size of 100 would be:
     =TINV(1-0.95,100-1)
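
For readers without Excel at hand, the same lookup can be approximated with only the Python standard library. This is a rough sketch and not how Excel computes TINV: it integrates the t density numerically with Simpson's rule and then searches by bisection for the multiple whose central area equals the confidence level. The function names (`t_pdf`, `central_area`, `t_value`) are invented for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t density between -t and +t (Simpson's rule, steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled because the density is symmetric about zero

def t_value(confidence, n):
    """Two-sided t multiple for a confidence level and a sample of size n."""
    df = n - 1  # sample size minus one, as on the slide
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < confidence:  # widen until the target is bracketed
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_value(0.95, 100), 4))  # 1.9842
```

Note that this sketch takes the confidence level directly, whereas TINV wants one minus the confidence level. The printed value corresponds to the Excel call on the slide, a 95% level with 99 degrees of freedom.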

  13. Now that we have everything
     Putting it all together…
     We have a point estimate… the mean we calculated from the sample… referred to as $\bar{x}$
     We have the measure of the variation in the sample… the standard deviation we calculated from the sample… which is referred to as $s$
     We have the sample size… the number of observations in the sample… which is referred to as $n$
     We have the multiple… the t value from Excel… which is referred to as $t$
     What does that spell?
     x s n t
     No, it actually spells confidence interval calculation time

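  14. The Calculation
     Remember our confidence interval equation from an earlier slide:
     $\text{Sample Mean} \pm t\text{-value} \times \text{Standard Error of Sample Mean}$
     Algebraically, the equation looks like:
     $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$
     where $\bar{x}$ = Sample Mean, $\frac{s}{\sqrt{n}}$ = Standard Error of Sample Mean, and $t$ = t-value.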
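
As a rough illustration of the calculation, the sketch below computes the interval for a small sample in Python. The data are hypothetical, the function name `confidence_interval` is invented for the example, and the t value is passed in by hand (2.776 is the standard two-sided 95% table value for 4 degrees of freedom), just as one would paste in a value obtained from Excel.

```python
import math
import statistics

def confidence_interval(observations, t):
    """Return (lower, upper) for x_bar +/- t * s / sqrt(n)."""
    n = len(observations)
    x_bar = statistics.mean(observations)   # point estimate
    s = statistics.stdev(observations)      # sample standard deviation
    margin = t * s / math.sqrt(n)           # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 5 values, invented for illustration.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
low, high = confidence_interval(sample, t=2.776)  # 95% level, n - 1 = 4
print(round(low, 2), round(high, 2))
```

The sample mean here is 8, so the interval is centered on 8 and extends one margin of error in each direction.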

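  15. Something to Keep in Mind
     $\text{Margin of Error} = t \cdot \frac{s}{\sqrt{n}}$
     Variation in the sample: as $s \uparrow$, Margin of Error $\uparrow$
     Sample size: as $n \uparrow$, Margin of Error $\downarrow$
     Confidence Level (CL): as CL $\uparrow$, Margin of Error $\uparrow$
     …and margin of error is directly related to precision. A smaller margin of error is a more precise estimate.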
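
These three relationships can be checked numerically. The sketch below is only an illustration: the standard deviations are made up, the helper `margin_of_error` is invented for the example, and the t multiples are hard-coded from a standard two-sided t table rather than computed.

```python
import math

# Two-sided t multiples from a standard t table, keyed by (confidence level, n - 1).
T_TABLE = {
    (0.90, 24): 1.711,
    (0.95, 24): 2.064,
    (0.99, 24): 2.797,
    (0.95, 99): 1.984,
}

def margin_of_error(s, n, confidence):
    """Margin of error t * s / sqrt(n), using the small table above."""
    t = T_TABLE[(confidence, n - 1)]
    return t * s / math.sqrt(n)

base = margin_of_error(s=2.0, n=25, confidence=0.95)
more_variation = margin_of_error(s=4.0, n=25, confidence=0.95)   # larger s
bigger_sample = margin_of_error(s=2.0, n=100, confidence=0.95)   # larger n
higher_level = margin_of_error(s=2.0, n=25, confidence=0.99)     # larger CL

print(base, more_variation, bigger_sample, higher_level)
```

Doubling the standard deviation doubles the margin of error, while quadrupling the sample size cuts it by slightly more than half, because the t multiple also shrinks a little.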

  16. So What was the Point of All This?
     Remember, our whole reason for this series of thoughts and calculations was to help us best estimate the mean of some variable in a large group by using only the limited information provided by a sample from that group.

  17. Application Time
     Let’s try this for real

  18. Business Application Highlights
     Read the discussion of interval estimation when the standard deviation for the population is unknown (page 340) and the explanation of the t distribution and degrees of freedom (pages 340-342).
     Read the business application on pages 342-343.
     Heritage Software operates a service center in Tulsa, Oklahoma to respond to service calls on their educational and business software.
     Time spent helping a customer is an important measure of efficiency of these operations. More time per customer means that more service operators must be on staff to handle the load.
     Management would like the average call time for these service operators to be estimated.
     A sample of 25 calls was collected and recorded with the intent of estimating the average time for all calls taken by the service operators.

  19. The Approach
     The 25 calls that we have data on will serve as a basis for our estimate. We can calculate a mean from the sample and use it as a point estimate (one-number estimate) for the mean length of all calls. This is the anchor of our interval estimate.
     We also need the standard deviation of the observations. This, along with the sample size, will be used to calculate the standard error of the sample mean.
     The t value, which is our multiple, will be determined using Excel. We try not to use tables in this class.
     We’ll use two different approaches in Excel to get us the information we need. One will do most of the work for us. The other requires us to walk through the process step by step. KNOW THEM BOTH.
     The key to this whole process is to remind yourself that you are working with a small bit of information from a sample. You are trying to estimate something that you have no way of verifying. Using a logical data-driven approach, we can derive information from the sample to give us a “best guess” interval that also provides us with a means of measuring our “accuracy” via the confidence level. It is better to know how certain you are than to be shooting in the dark.

  20. A Reiteration of Definitions
     The Point Estimate
     The point estimate for the population mean is always the sample mean. It’s our one-number best guess.
     The Confidence Level
     A standard confidence level is 95%, which means our odds are 19 out of 20 (95%) that the confidence interval captures the mean length of a service center call for all calls. Remember, this is based solely on the information we have from a small sample of calls.
     If 95% isn’t good enough (not your decision in this case), then you can be more certain by using a 99% confidence interval. The bad news: remember what happens to precision when we want to be more accurate.
     The Standard Error of the Point Estimate
     The mean of a sample, which is our point estimate, is going to change from sample to sample. Taking into account this variation is a key part of forming an interval estimate. If the point estimates would vary a great deal, then we will have to form up a pretty wide interval to take that into account.

  21. The Implied Assumptions
     All statistical estimates are going to come with some assumptions.
     The first assumption is that you are not trying to cook the data to make the estimate come out in some predetermined way. We’ll say that you are supposed to be unbiased. This is not an explicit assumption, but it is there nonetheless.
     The second basic assumption is that we took a random sample.
     Almost all statistical calculations assume that you are not choosing the members of the sample based on opinion. For the mathematics to work in this situation, you need to be allowing each observation to have an equal chance to be in the sample. That is called a random sample… like rolling dice to see which observation to include.
     Think back to the MLB ticket price example. It is probably impossible to randomly choose from the tickets that were sold. Thus we may have tried to use a random sample, but it is doubtful that we actually achieved randomness. That makes any results questionable. When someone says they sampled randomly, ask them how they did it. Question everything.
     NOTE: Statistical calculations are built to deal with random samples. Any confidence intervals you calculate will never adjust for bias or poor sampling techniques. The margin of error is built to handle natural variation in the data, not incompetence.
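
When the full list of observations really is available, giving each one an equal chance of selection is straightforward. The sketch below is a hypothetical illustration: the ticket IDs are invented, the helper `draw_random_sample` is not from the course, and a seed is used only so the draw can be repeated.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Pick n distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical list of ticket IDs standing in for the population.
ticket_ids = list(range(1, 1001))
chosen = draw_random_sample(ticket_ids, 100, seed=42)
print(len(chosen))  # 100
```

The hard part in practice is the one the slide points out: getting a complete list to draw from in the first place.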

  22. The Formal Assumptions
     The text indicates that we have to assume that the observation values (call lengths in this case) are mound-shaped or normally distributed, specifically if the sample size is small.
     The brutal truth is that even with small sample sizes, the confidence interval we are calculating will work, and statisticians tend to spend too much time on the little details. The main problem here is not going to be the distribution of the observations, but the means by which we choose them. If we could really randomly select from all observations, then it is likely that those observations are all in the computer… and then we don’t really need to use statistics to estimate anything. We just calculate the real number by using all of the observations.
     For really small sample sizes, you do need to make sure that the observations are not strongly non-mound-shaped. Did you follow that? That means, if the sample is really small, say 20 or less, then the distribution of the observations should really be reasonably mound-shaped, with strong emphasis on the word “reasonably”. The problems only seem to occur in a sample with a strong bi-modal distribution (which means that observations form what looks more like two mounds rather than one).
     How do you check the distribution of the observations? I am glad you asked. Plot a histogram. They are not difficult to do with Excel and you might be surprised by the amount of information you can glean about some variable by looking at a histogram.
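
A quick text histogram is enough to eyeball whether a small sample looks reasonably mound-shaped. The sketch below is one possible way to do it in Python rather than Excel; the function `text_histogram`, the bin count, and the data are all invented for illustration.

```python
def text_histogram(observations, bins=5):
    """Count observations in equal-width bins and build one text bar per bin."""
    lo, hi = min(observations), max(observations)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for x in observations:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * width
        lines.append(f"{left:6.1f} to {left + width:6.1f} | {'#' * c}")
    return counts, lines

# Hypothetical observations, invented for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts, lines = text_histogram(data)
print("\n".join(lines))
```

A single peak that tapers off on both sides, as in this made-up data, is the "reasonably mound-shaped" pattern; two separate peaks would be the bi-modal warning sign.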

  23. Problem 1
     Heritage Software operates a service center in Tulsa, Oklahoma to respond to service calls on their educational and business software.
     Time spent helping a customer is an important measure of efficiency of these operations. More time per customer means that more service operators must be on staff to handle the load.
     Management would like the average call time for these service operators to be estimated.
     A sample of 25 calls was collected and recorded with the intent of estimating the average time for all calls taken by the service operators.

  24. Problem 2
     Medlin & Associates is a CPA firm that is conducting an audit of a discount chain store. Management would like to have some measure of the amount of error that is occurring during checkout operations. A sample of 20 transactions is taken and the amount and direction of each mischarge are noted. Positive values indicate overcharges to the customers; negative values are undercharges. The problem asks for a 90% confidence interval. That means that our interval will be narrower, but our degree of certainty is less. We are trading accuracy for precision.
