TA: Zhen Zhang, zhangz19@stt.msu.edu. Office hour: (C500 WH) 3-4 PM Tuesday (office tel.: 432-3342). Help-room: (A102 WH) 9:00 AM-1:00 PM, Monday. Class meets on Tuesday: 12:40 – 1:30 PM in A224 WH (Section 23); 1:50 – 2:40 PM in A234 WH (Section 24).
STT 200 – Lecture 5, Sections 23 and 24, Recitation 11 (3/26/2013)
Main Goals • Understand the sampling distribution of the sample proportion $\hat{p}$. • The normal model $N\left(p, \sqrt{p(1-p)/n}\right)$, where $p$ is the population proportion and $n$ is the sample size.
Data • Here are data from a population of 400 people, indicating whether they do ("Yes") or don't ("No") have wireless internet service at home. Please copy the following chunk and paste it into R. • haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes", "No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","No","No","No","No","Yes","Yes","No","Yes","No","No","Yes","No","No","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes","No","No","No","No","Yes","No","Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","No","Yes","Yes","Yes","No","No","No","Yes","No","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","No","No","No","Yes","No","No","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","No","Yes","No","No","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","No","Yes","No")
Data • Here is a table of integers between 1 and 400 chosen at random. R chuck: • rd<- c(92,149,41,310,307,130,296,130,77,399,212,301,25,177,313,147,298,160,354,20,199, 191,104,164,216,399,25,99,28,91,211,357,350,301,39,372,61,67,304,333,174,321,191,157,316,172,5,277,78,396,208,126,162,311,17,287,138,160,124,266,177,209,361,41,398,9,79,299,257,315,40,278,2,225,206,383,254,74,335,159,37,360,9,393,143,246,305,152,90,312,208,172,117,277,93,399,226,8,231,386,136,75,38,56,37,267,381,63,52,231,287,94,50,77,179,337,387,318,112,219,17,356,77,183,259,258,141,198,30,36,61,306,65,330,161,348,19,20,61,275,365,241,115,4,338,205,108,241,190,374,323,243,146,318,217,375,267,44,373,185,341,283,200,178,266,390,232,263,386,36,270,50,315,83,90,281,260,41,305,136,116,185,25,338,4,367,296,183,103,290,208,170,143,158,198,132,155,144,26,104,281,150,240,68,67,339,389,345,141,268,349,99,147,65,170,375,317,251,185,278,80,250,4,378,175,130,359,319,400,59,166,147,130,107,123,304,234,41,20,165,96,115,272,149,142,75,262,235,106,107,354,362,2,81,89,309,371,10,282,202,203,156,386,130,252,26,387,143,237,183,328,306,27,187,310,321,183,109,198,200,281,70,394,378,203,42,34,318,156,255,354,53,196,20,382,97,292,188,179,69,151,14,348,311,389,298,399,104,300,243,163,316,328,65,167,200,301,305,27,176,69,301,188,192,242,350,92,86,42,373,195,118,64,289,329,131,156,252,169,299,191,302,19,83,220,326,229,285,267,351,333,101,128,146,307,304,245,264,149,163,353,276,296,243,8,127,31,210,263,33,384,176,125,275,76,45,60,59,143,324,281,376,298,54,62,170,295,293,27,183,126,375,21,294,242,364,145,138,52,267,26,308,391,352,78,98,211,174,277,176,74,295,64,315,171,135,159,111,79,348,88,23,348,111,188,16,152,212,104,349,14,272,209,73,238,146,50,113,103,204,389,158,260,344,207,329,184,250,38,231,292,300,34,170,343,233,275,14,15,244,104,96,234,297,113,270,369,202,37,310,294,64,183,253,299,287,225,166,260,125,198,2,180,219,117,358,191,301,310,254,230,296,2,134,67,186,265,161,130,257,166,339,33,332,137,61,340,16,212,209,42,315,
8,269,68,389,316,355,62,51,64,388,260,319,244,116,265,169,153,147,170,59,329,261,384,272,367,177,217,278,266,307,182,225,80,264,342,280,350,366,280,156,323,208,110,37,266,260,59,33,314,80,185,185,87,228,246,61,369,60,119,179,326,223,128,62,98,130,283,328,225,398,3,138,140,84,381,234,131,364,294,59,343,126,93,14,204,50,35,161,15,142,275,72,254,194,309,115,344,378,267,23,111,168,334,92,213,1,181,246,336,52,82,4,115,286,3,87,121,84,281,181,58,372,232,30,279,258,154,37,6,113,125,317,123,198,25,388,268,106)
Problems • Use the following procedure to choose 25 people at random from the population on the first page. For each person, record whether they do or do not have wireless internet service. • (a) Choose a starting point in the table of integers between 1 and 400 by closing your eyes and pointing at the table. • (b) Starting there, use the next 25 numbers to choose your sample of size 25. (There's a small chance that you'll pick the same person twice, but we won't worry about that.) • Record the 25 yeses and nos here: • What proportion of the 25 people in your sample said yes? This is $\hat{p}$, the sample proportion. • The population proportion who have wireless internet service is $p = 0.5575$. How far is your estimate $\hat{p}$ from the true value $p$?
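The procedure in (a) and (b) can be sketched in R as follows. This is only an illustration: `draw_sample`, `toy_pop`, and `toy_table` are hypothetical names, the toy values are made up (they are not the recitation data), and the helper counts the starting entry itself as the first of the n numbers, which is one possible reading of "starting there".

```r
# Hypothetical helper: take n consecutive entries of the random-number table,
# beginning at position `start`, and look up those people in the population.
draw_sample <- function(population, random_table, start, n) {
  stopifnot(start >= 1, start + n - 1 <= length(random_table))
  idx <- random_table[start:(start + n - 1)]
  population[idx]
}

# Toy stand-ins for haswi and rd (made-up values)
toy_pop   <- c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes")
toy_table <- c(3, 7, 2, 10, 5, 4, 9, 2, 6, 1, 8, 1)

smp  <- draw_sample(toy_pop, toy_table, start = 5, n = 5)
phat <- mean(smp == "Yes")      # sample proportion
p    <- mean(toy_pop == "Yes")  # population proportion
phat - p                        # how far the estimate is from the truth
```

With the real data, the same call would use `haswi`, `rd`, and `n = 25`.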
Simulation • Suppose many students each draw a sample of size 25 and compute a sample proportion $\hat{p}$. We plot the histogram of the $\hat{p}$'s obtained by all students and superimpose the density of $N\left(p, \sqrt{p(1-p)/n}\right)$ on it.
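A minimal sketch of such a simulation, under an assumption that differs from the table-based procedure: each student's count of yeses is modeled directly as a Binomial(n, p) draw, which treats the 25 picks as independent. The seed and the number of students are arbitrary choices.

```r
set.seed(1)                     # arbitrary seed, only for reproducibility
p <- 0.5575; n <- 25
nstudents <- 10000              # hypothetical number of students

# Assumption: each student's number of "Yes" answers is Binomial(n, p)
phats <- rbinom(nstudents, size = n, prob = p) / n

sd_theory <- sqrt(p * (1 - p) / n)
round(c(mean = mean(phats), sd = sd(phats), sd_theory = sd_theory), 4)

# Histogram of the simulated sample proportions with the normal density on top
hist(phats, freq = FALSE, main = "", xlab = expression(hat(p)))
curve(dnorm(x, mean = p, sd = sd_theory), add = TRUE, lwd = 2)
```

The simulated mean and standard deviation should land close to $p$ and $\sqrt{p(1-p)/n}$.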
Problems • Comment: $\hat{p}$ is (before the data are collected) a random variable, and we'll use what we know about its distribution to quantify how confident we are in its estimate of $p$. • Now we'll investigate $\hat{p}$ more generally. First, using the facts that: • the mean of $\hat{p}$ is $p$, and • the standard deviation of $\hat{p}$ is $\sqrt{p(1-p)/n}$, • compute the mean and standard deviation of $\hat{p}$ in our case, where the population proportion is $p = 0.5575$ and the sample size is $n = 25$. • Mean of $\hat{p}$ $= 0.5575$; standard deviation of $\hat{p}$ $= \sqrt{0.5575 \times 0.4425 / 25} \approx 0.0993$.
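A brief sketch of where the two facts come from, assuming the $n$ picks are independent (a reasonable approximation when repeats are rare). Write $X$ for the number of yeses in the sample; the symbol $X$ is introduced here only for illustration.

```latex
\begin{align*}
\hat{p} &= \frac{X}{n}, \qquad X \sim \mathrm{Binomial}(n, p),\\
E(\hat{p}) &= \frac{E(X)}{n} = \frac{np}{n} = p,\\
SD(\hat{p}) &= \frac{SD(X)}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}.
\end{align*}
```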
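Problems • Next, use the fact that a normal model is a good model for the distribution of $\hat{p}$ to compute the probability that $\hat{p}$ is within 0.10 of the actual value of $p$. • Approximately $\hat{p} \sim N(0.5575, 0.0993)$, thus the probability is $P(0.4575 < \hat{p} < 0.6575)$. • The z-score for 0.6575 under the normal model above is $z = (0.6575 - 0.5575)/0.0993 \approx 1.007$, • and the area below it is about 0.843. • Similarly, the z-score for 0.4575 under the normal model above is $z = (0.4575 - 0.5575)/0.0993 \approx -1.007$, • and the area below it is about 0.157. So the area in between is about $0.843 - 0.157 = 0.686$. • Or: normcdf(-1.007, 1.007) or normcdf(0.4575, 0.6575, 0.5575, 0.0993) in a calculator, or pnorm(1.007) - pnorm(-1.007) in R.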
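As a quick check, the z-score route and the direct route should give the same probability. The sketch below assumes the margin is 0.10, as in the appendix code; the variable names are illustrative.

```r
p <- 0.5575; n <- 25; margin <- 0.10

sd_phat <- sqrt(p * (1 - p) / n)   # standard deviation of p-hat
z       <- margin / sd_phat        # z-score of p + margin

# Route 1: standardize first, then use the standard normal
prob_z <- pnorm(z) - pnorm(-z)

# Route 2: work on the original scale with mean p and sd sd_phat
prob_raw <- pnorm(p + margin, mean = p, sd = sd_phat) -
            pnorm(p - margin, mean = p, sd = sd_phat)

round(c(sd_phat = sd_phat, z = z, prob_z = prob_z, prob_raw = prob_raw), 4)
```

Both routes agree at about 0.686, because standardizing only relabels the horizontal axis.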
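Problems • Repeat the two questions above, but this time with $n = 100$. • The mean will again be $p = 0.5575$, • but the standard deviation of $\hat{p}$ is $\sqrt{0.5575 \times 0.4425 / 100} \approx 0.0497$, smaller! And the probability that $\hat{p}$ is within 0.10 of $p$ rises to about 0.956.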
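The effect of the sample size can be seen side by side with two small helper functions. `sd_phat` and `prob_within` are hypothetical names introduced for this sketch, and the margin of 0.10 is taken from the appendix code.

```r
# Standard deviation of p-hat under the normal model
sd_phat <- function(p, n) sqrt(p * (1 - p) / n)

# Probability that p-hat falls within `margin` of p under the normal model
prob_within <- function(p, n, margin) {
  z <- margin / sd_phat(p, n)
  pnorm(z) - pnorm(-z)
}

p  <- 0.5575
ns <- c(25, 100)
data.frame(n    = ns,
           sd   = round(sd_phat(p, ns), 4),
           prob = round(prob_within(p, ns, 0.10), 4))
```

Quadrupling the sample size halves the standard deviation, which is why the probability of landing within 0.10 of $p$ goes up from about 0.686 to about 0.956.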
Appendix • R code for the problems.
# prob 4: standard deviation of p-hat for n = 25
n <- 25; p <- 0.5575
( sdphat <- sqrt(p*(1-p)/n) )
# prob 5: probability that p-hat is within 0.1 of p
( pnorm(p+0.1, p, sdphat) - pnorm(p-0.1, p, sdphat) )
# prob 6: same two questions with n = 100
n2 <- 100
( sdphat2 <- sqrt(p*(1-p)/n2) )
( pnorm(p+0.1, p, sdphat2) - pnorm(p-0.1, p, sdphat2) )
# comparison of n=25 and n=100
vec <- seq(0.01, 0.99, length.out=1000)
par(yaxt='n', mar=c(4,.3,.3,.3))
plot(dnorm(vec, p, sdphat2)~vec, type='n', ylab=' ', xlab=expression(hat(p)))
grid(col='gray80')
lines(dnorm(vec, p, sdphat)~vec, lty=1, lwd=2)
lines(dnorm(vec, p, sdphat2)~vec, lty=2, lwd=2)
abline(v=p, col='red', lty=2)
text(x=p, y=0, labels=paste("p =", round(p,4)), col='red')
legend('topleft',
       legend=c(paste('N(',round(p,4),', ',round(sdphat,4),'), n=25',sep=''),
                paste('N(',round(p,4),', ',round(sdphat2,4),'), n=100',sep='')),
       bg='gray90', inset=.02, lty=c(1,2), lwd=c(2,2))
Appendix (cont'd) • R code for the simulations.
(N <- length(haswi))
(L <- length(rd))
# prob 1:
set.seed(20); n <- 25
# start no later than L-n so that the next n entries stay inside the table
( mystart <- sample(1:(L-n), size=1) )
( myindex <- rd[mystart+c(1:n)] )
( mysample <- haswi[myindex] )
# prob 2:
( myphat <- sum(mysample=="Yes")/n )
# prob 3:
p <- 0.5575
( p - myphat )
# the above is for one student. For many students, we have phats
set.seed(241); phats <- numeric(nstudents <- 10000)
for (t in 1:nstudents){
  mystarts <- sample(1:(L-n), size=1)
  myindexs <- rd[mystarts+c(1:n)]
  mysamples <- haswi[myindexs]
  phats[t] <- sum(mysamples=="Yes")/n
}
phats <- na.omit(phats)  # no NAs are expected with the bounded start; kept as a safeguard
# prob 4:
( sdphat <- sqrt(p*(1-p)/n) )
hist(phats, xlab=expression(hat(p)), freq=FALSE, main='')
vec <- seq(min(phats), max(phats), length.out=1000); lines(dnorm(vec, p, sdphat)~vec)