250 likes | 370 Views
BIO503: Homework 2 Solutions. Harvard School of Public Health Wintersession 2009. Question 1A. Use mfrow to set up the layout for a 3 x 4 array of plots. > par(mfrow=c(3,4)) We now have 3 rows, 4 columns and 12 plots in total. For each individual plot, we need to:
E N D
BIO503: Homework 2 Solutions Harvard School of Public Health Wintersession 2009
Question 1A Use mfrow to set up the layout for a 3 x 4 array of plots. > par(mfrow=c(3,4)) We now have 3 rows, 4 columns and 12 plots in total. For each individual plot, we need to: • generate two sets of Normal random variables rnorm • construct a QQ plot qqplot • add a title, set the color and plotting character main, col, pch
By calling successive plot commands, R will fill in each of the 12 panels row-wise. The code we need to construct the first row: > qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[1], pch=1, lwd=2, main="10 Normal RVS, Set 1") > qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[2], pch=2, lwd=2, main="10 Normal RVS, Set 2") > qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[3], pch=3, lwd=2, main="10 Normal RVS, Set 3") > qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[4], pch=4, lwd=2, main="10 Normal RVS, Set 4")
For the remaining rows we can start with the same code, changing the sample size, color and plotting character and modifying the title of each plot. For the second row: > qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[5], pch=5, lwd=2, main="100 Normal RVS, Set 1") > qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[6], pch=6, lwd=2, main="100 Normal RVS, Set 2") > qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[7], pch=7, lwd=2, main="100 Normal RVS, Set 3") > qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[8], pch=8, lwd=2, main="100 Normal RVS, Set 4")
And the third row: > qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[9], pch=9, lwd=2, main="1000 Normal RVS, Set 1") > qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[10], pch=10, lwd=2, main="1000 Normal RVS, Set 2") > qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[11], pch=11, lwd=2, main="1000 Normal RVS, Set 3") > qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[12], pch=12, lwd=2, main="1000 Normal RVS, Set 4")
Question 1B Comment on how the appearance of the plots changes as the sample size changes. We see that we can assess more confidently the distributional assumption as the sample size grows. As the sample sizes get better it becomes clearer that the samples are normally distributed.
Question 2A We know the data file has the following properties: file name = cats.txt tab-delimited column headings Therefore we use the following code to read in the data: > cats <- read.table("cats.txt", sep="\t", header=T)
Question 2B The first column of the data.frame cats, i.e. cats[,1] contains the gender-specific information. This code will generate a logical vector where T corresponds to the rows for female cats. > cats[,1] == "F" We can create subsets of the cats object in the following way: > female.cats <- cats[cats[,1] == "F",] > male.cats <- cats[cats[,1] == "M",]
Question 2B To fit a linear regression model for heart weight on body weight, we need to let the lm function which data set to use. > female.mod <- lm(Hwt ~ Bwt, data=female.cats) > male.mod <- lm(Hwt ~ Bwt, data=male.cats)
Question 2C We can see the coefficients fitted in the model either using the summary function or more directly, the coefficients function. > summary(female.mod) > coefficients(female.mod) For the female cats, the slope estimate is 2.64 and the intercept is 2.98. > summary(male.mod) > coefficients(male.mod) For the male cats, the slope estimate is 4.31 and the intercept is -1.18.
Question 2D For the female cat model: Residual standard error: 1.162 on 45 degrees of freedom For the male cat model: Residual standard error: 1.557 on 95 degrees of freedom Good job to those of you who actually calculated the residual standard error!
Question 2E We can create a new data.frame to store the information for the new cats. > new.female.cat <- data.frame(Bwt = 2.5) > new.male.cat <- data.frame(Bwt = 2.9) Then we can use the predict function which will apply our linear model object to make a prediction about the new data. > predict(female.mod, new.female.cat) > predict(male.mod, new.male.cat) The female's heart weight is 9.57g, the male's heart weight is 11.32g.
Question 2F Construct a plot to examine the residuals for each model. There are two possible plots we could make to examine the residuals arising from the model. • QQ-plot • scatter plot
Question 2F – QQ Plots > par(mfrow=c(1,2)) > qqnorm(residuals(female.mod), lwd=3, pch=3, col="orange", main="QQ Plot for Female Cat Model Residuals") > qqline(residuals(female.mod), lwd=3, col="blue") > qqnorm(residuals(male.mod), lwd=2, col="purple", pch=4, main="QQ Plot for Male Cat Model Residuals") > qqline(residuals(male.mod), lwd=3, col="gold")
Question 2F – QQ Plots You should get something that looks like this:
Question 2F – Scatter Plots > par(mfrow=c(1,2)) > plot(residuals(female.mod), ylim=range(residuals(female.mod), residuals(male.mod)), pch=3, col="orange", lwd=3, main="Residuals for the Female Cat Model", ylab="Residual Values") > abline(a=0, b=0, lty=3, lwd=3) > plot(residuals(male.mod), ylim=range(residuals(female.mod), residuals(male.mod)), pch=4, col="purple", lwd=3, main="Residuals for the Male Cat Model", ylab="Residual Values") > abline(a=0, b=0, lty=3, lwd=3)
Question 2F – Scatter Plots You should get something like this:
Question 2F Interpretation: There aren't any strong patterns in the scatter plots. The residuals appear randomly distributed about mean 0. The QQ plots show that most of the residuals line up on the diagonal line meaning the residuals appear to be Normally distributed. These observations suggest that our model assumptions are valid.
Question 2G Construct a plot of the raw data for the male cats, add the regression line you've fitted. > plot(male.cats$Bwt, male.cats$Hwt, pch=4, lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats") > abline(male.mod)
Question 2G For the male cats, we get the following plot:
Question 2G We use similar code to produce the corresponding plot for the female cats. Given that the question asks us to put the two plots side by side, we need to use par and set the ylim argument. > par(mfrow=c(1,2)) > plot(female.cats$Bwt, female.cats$Hwt, pch=3, ylim=range(cats$Hwt), lwd=3, col="orange", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Female Cats") > abline(female.mod) > plot(male.cats$Bwt, male.cats$Hwt, pch=4, ylim=range(cats$Hwt), lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats") > abline(male.mod)
Question 2G We'll get a plot that looks like this:
Bonus Question > col.palette <- rainbow(12) > lwd.width <- 2 > pch.sym <- 1:20 > nsize <- rep(c(10,100,1000), each=4) > par(mfrow=c(3,4)) > for( i in 1:12 ){ what.set <- i%%4 if( what.set == 0 ){ title.text <- paste(nsize[i], " Normal RVS, Set 4", sep="") } else{ title.text <- paste(nsize[i], " Normal RVS, Set ", what.set, sep="") } qqplot(rnorm(nsize[i]), rnorm(nsize[i]), xlab="Quantiles", ylab="Quantiles", col=col.palette[i], pch=pch.sym[i], lwd=lwd.width, main=title.text) }
Bonus Question By the way, I saw some terrific and very clever variations from students. Congratulations and well done!