360 likes | 557 Views
STATS 330: Lecture 2. Graphics. Housekeeping matters. STATS 762 An extra test (details to be provided ). An extra assignment Class rep – ???? Office hours Alan: 10:30 – 12:00 Tuesday and Thursday in Rm 265, Building 303S Tutors: TBA Assignment 1: Due 2 August.
E N D
STATS 330: Lecture 2 Graphics 330 Lecture 2
Housekeeping matters • STATS 762 • An extra test (details to be provided). An extra assignment • Class rep – ???? • Office hours • Alan: 10:30 – 12:00 Tuesday and Thursday in Rm 265, Building 303S • Tutors: TBA • Assignment 1: Due 2 August 330 Lecture 2
Today’s lecture: Exploratory graphics Aim of the lecture: To give you a quick overview of the kinds of graphs that can be helpful in exploring data. Some of this material has been covered in 201/8. (We will discuss the R code used to make the plots in Lecture 4) 330 Lecture 2
Exploratory Graphics: topics • Exploratory Graphics for a single variable: aim is to show distribution of values • Histograms • Kernel density estimators • Qq plots • For 2 variables: aim is to show relationships • Both continuous: scatter plots • One of each: side by side boxplots • Both categorical: mosaic plots – see Ch 5 • 3 variables • Pairs plot • Rotating plot • coplots • 3D plots, contour plots 330 Lecture 2
Single variable:Exchange rate data • The data of interest: daily changes in log(exchange rate) for the US$/Kiwi. • Monthly date from June 1986 to may 2012 • Source: Reserve Bank • Questions: • What is the distribution of the daily changes in the logged exchange rate? • Is it normal? If not, how is it different? 330 Lecture 2
yt = exchange rate at time t Difference in logs is log(yt) – log(yt-1) = log(yt/yt-1) 330 Lecture 2
Data Analysis Suppose we have the data (3374 differences in the logs), in an R vector, Diff.in.Logs hist(Diff.in.Logs,nclass=100,freq=FALSE) # add density estimate lines(density(Diff.in.Logs),col="blue", lwd=2) # add fitted normal density xvec = seq(-0.1, 0.1, length=100) lines(xvec,dnorm(xvec,mean=mean(Diff.in.Logs), sd=sd(Diff.in.Logs)),col="red", lwd=2) 330 Lecture 2
Normal plot > qqnorm(Diff.in.Logs) Normal data? No – QQ plot indicates that the differences have longer tails than normal, since the plotted points are lower than the line for small values and higher for big ones 330 Lecture 2
Two Variables: Rats! Of interest: growth rates of 16 rats i.e. relationship between weight and time Want to explore the relationship graphically. Each rat was measured (roughly) every week for 11 weeks For weeks 1-6, all rats were on a fixed diet. Diet was changed after week 6. 330 Lecture 2
Two Variables: Rats! Data set rats.df has variables rat (1-16) growth (weight in grams) day (day since start of study, 11 values, at approximately weekly intervals group (litter, one of 3) change (has values 1 or 2 - diet was changed after 6 weeks, diet 1 for weeks 1-6, diet 2 for weeks 7-11 330 Lecture 2
Rats: the data • > rats.df • growth group rat change day • 1 240 1 1 1 1 • 2 250 1 1 1 8 • 3 255 1 1 1 15 • 4 260 1 1 1 22 • 5 262 1 1 1 29 • 6 258 1 1 1 36 • 7 266 1 1 2 43 • 8 266 1 1 2 44 • 9 265 1 1 2 50 • 10 272 1 1 2 57 • 11 278 1 1 2 64 • 12 225 1 2 1 1 • 12 230 1 2 1 8 • ... More data 330 Lecture 2
Rats (cont) • Could plot weight (i.e. the variable growth) versus the variable day: plot(day,growth) BUT…. 330 Lecture 2
Criticisms • Can’t tell which points belong to which rat • Seem to be 2 groups of points • In actual fact, the rats came from 3 different litters, is this relevant? • Could do better 330 Lecture 2
More rats: improvements • Join points representing the same rat with a line • Use different colours (or different line types e.g. dashed or dotted) for the different litters • Use a legend 330 Lecture 2
More improvements • Plot is too cluttered • Could plot each rat on a different graph – important to use same scales (axes) for each graph • This leads to the idea of “Trellis graphics” 330 Lecture 2
Two variables: one continuous, one categorical • Insurance data: data on 14,000 insurance claims. Want to explore relationship between the amount of the claim (a continuous variable) and the type of car (a categorical variable). • Use side-by side boxplots. 330 Lecture 2
Loess smooth 8 Log(ADINCUR) 6 4 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 11 11 13 13 15 15 17 17 CARGROUP Car Group 330 Lecture 2
More than 2 variables: • If all variables are continuous, we can explore the relationships between them using a pairs plot • If we have 3 variables, a rotating plot is a very useful tool 330 Lecture 2
Example: Cherry trees > cherry.df diameter height volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9 ... more data – 31 trees in all 330 Lecture 2
Cherry trees: pairs plots > pairs(cherry.df) 330 Lecture 2
3-d Rotating plots • The challenge: to represent a 3-dimensional object on a 2-dimensional surface (a TV screen, computer screen etc) • Traditional method uses projection, perspective • A powerful idea is to use motion, looking at the 3-d scene from different angles 330 Lecture 2
Perspective 330 Lecture 2
Diameter -height view Arbitrary view Projection Volume -height view Diameter -volume view 330 Lecture 2
Cherry trees: rotating plot 330 Lecture 2
Dynamic motion • By dynamically changing the angle of view, we get a better impression of the 3-dimensional structure of the data • “Dynamic graphics” is a very powerful tool 330 Lecture 2
A powerful idea: Coplots • Coplot shows relationship between x and y for selected values of z (usually a narrow range of z’s) • By showing separate plots for different z ranges, we can see how the relationship between x and y changes as z changes • Coplot: conditioning plot, shows relationship between x and y conditional on z (ie for fixed z) 330 Lecture 2
Cherry trees: coplots • To show the relationship between height and volume for different values of diameter: • Divide the range of diameter (8.3 to 20.6) up into 6 subranges 8-11, 10.5 -11.5 etc • Draw 6 plots, the first using all data whose diameter is between 8 and 11, the second using all data whose diameter is between 10.5 and 11.5, and so on 330 Lecture 2
Interpretation • Note that the lines are not of the same slope • This implies that the point configuration is not “planar” 330 Lecture 2
Other 3-d graphs 3-d scatter plot plot of surface Both can be rotated 330 Lecture 2