
Multivariate Analysis

Multivariate Analysis. Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland. Resources: Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.


Presentation Transcript


  1. Multivariate Analysis. Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland

  2. Resources • Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. • Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer

  3. Roadmap • PBL group assignments • Multivariate data graphics tutorials • Testing distributional assumptions • Principal components analysis • Cluster analysis • Summary

  4. PBL group assignments • Two groups

  5. Multivariate data graphics tutorials • Available on the module website • Covers both standard and lattice graphics

  6. Testing distributional assumptions • For these techniques to work, the data must follow a multivariate normal distribution. • There are two ways of testing this: • Examine each variable separately (normality of each variable on its own does not imply that the data are jointly multivariate normal) • Convert each observation to a single number (a generalised distance from the sample mean) and plot these numbers against the quantiles of an appropriate chi-squared distribution.
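The generalised distance in the second approach is usually taken to be the squared Mahalanobis distance of each observation from the sample mean. For n observations x_i with p variables, sample mean x̄ and sample covariance matrix S, it can be written as below. If the data are multivariate normal, each d_i^2 is approximately chi-squared with p degrees of freedom, so the ordered distances plotted against chi-squared quantiles should lie close to a straight line of slope 1 through the origin. The plotting positions (i - 1/2)/n are one common choice, assumed here.

```latex
d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}), \qquad i = 1, \dots, n

d_{(1)}^2 \le \dots \le d_{(n)}^2 \quad \text{plotted against} \quad \chi^2_{p}\!\left(\frac{i - \tfrac{1}{2}}{n}\right), \qquad i = 1, \dots, n
```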

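  7. Separate Examination • X has two columns, and the combined data are bivariate normal. Draw a normal Q-Q plot for each column:
    par(mfrow = c(1, 2))                              # two plots side by side
    qqnorm(X[, 1], ylab = "Ordered observations")    # first variable
    qqline(X[, 1])
    qqnorm(X[, 2], ylab = "Ordered observations")    # second variable
    qqline(X[, 2])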
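The slide assumes a two-column matrix X already exists. A minimal sketch, assuming simulated data: it generates a hypothetical bivariate normal X with MASS::mvrnorm (the covariance matrix is an arbitrary choice) and then applies the Shapiro-Wilk test to each column. The Shapiro-Wilk test is a swapped-in formal counterpart to the Q-Q plots, not part of the original slide.

```r
# Hypothetical example data: 100 draws from a bivariate normal distribution.
library(MASS)   # recommended package shipped with standard R installations

set.seed(1)                                   # reproducible simulation
Sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), nrow = 2)        # assumed covariance matrix
X <- mvrnorm(n = 100, mu = c(0, 0), Sigma = Sigma)

# Shapiro-Wilk test on each column separately; a large p-value gives no
# evidence against normality of that variable on its own
p_values <- apply(X, 2, function(col) shapiro.test(col)$p.value)
print(p_values)
```

As the slide before notes, passing these per-variable checks does not by itself establish joint multivariate normality.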

  8. Comparison to a chi-squared distribution • Same data, using the chisplot function available at http://biostatistics.iop.kcl.ac.uk/publications/everitt/ (it must be loaded into the R session first):
    par(mfrow = c(1, 1))   # back to a single plot
    chisplot(X)
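If chisplot is not to hand, the same check can be sketched in base R with mahalanobis() and qchisq(). The function name chisq_qq is hypothetical, and this is an independent sketch rather than a copy of chisplot; it assumes X is a numeric matrix with one row per observation.

```r
# Chi-squared Q-Q plot of squared generalised (Mahalanobis) distances
chisq_qq <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  # Squared distance of each row from the sample mean, using the sample covariance
  d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
  # Chi-squared quantiles with p degrees of freedom at plotting positions (i - 0.5) / n
  q <- qchisq((seq_len(n) - 0.5) / n, df = p)
  plot(q, sort(d2),
       xlab = "Chi-squared quantiles",
       ylab = "Ordered generalised distances")
  abline(0, 1)   # points close to this line support multivariate normality
  invisible(list(quantiles = q, distances = sort(d2)))
}

set.seed(2)
X <- matrix(rnorm(200), ncol = 2)   # hypothetical data: 100 bivariate normal rows
res <- chisq_qq(X)
```

A useful sanity check: with the sample covariance matrix, the squared distances always sum to (n - 1) p, so their mean here is 99 × 2 / 100 = 1.98.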

  9. Principal components analysis (PCA) • Describes the variation in a set of multivariate data in terms of a set of uncorrelated variables, each a linear combination of the original variables. • The goal is to reduce the data to a small number of these new variables that together summarise most of the variation in the data set. • Useful when the explanatory variables are highly correlated. • An example of the projection pursuit family of methods.
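A minimal PCA sketch using prcomp() on R's built-in USArrests data (50 states, 4 variables), chosen here only for illustration. Because the variables are on different scales, they are standardised with scale. = TRUE; the component scores in pca$x are the new uncorrelated variables, and pca$rotation holds the coefficients of each linear combination.

```r
# PCA of the built-in USArrests data on standardised variables
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)     # standard deviation and proportion of variance per component
pca$rotation     # loadings: each component is a linear combination of the variables

# The component scores are uncorrelated with one another
round(cor(pca$x), 10)
```

Retaining only the first one or two components, if they account for most of the variance, is what reduces the number of variables.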

  10. Cluster analysis • A classification tool that sorts the samples into a small number of groups, or clusters, which are usually non-overlapping. • These clusters may not be unique; the grouping depends on its purpose, for example: • Predictive clustering • Clustering based on causation • Hence a cluster analysis is neither true nor false, but simply useful.

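  11. Cluster analysis approaches • Agglomerative hierarchical clustering (fusion from the bottom up) • K-means type methods (partition from the top down) • Classification maximum likelihood methods (assume a model for the shape of the clusters) • Or you can simply use the tree package, which grows a classification or regression tree (ozone.pollution is a data frame with an ozone column, assumed to be loaded already):
    library(tree)
    model <- tree(ozone ~ ., data = ozone.pollution)   # model ozone on all other columns
    plot(model)   # draw the tree
    text(model)   # label the splits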
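A sketch of the first two approaches on the built-in USArrests data, standardised so that no single variable dominates the distances. The choice of average linkage and of 3 groups is an assumption for illustration only. Cross-tabulating the two results shows directly that different methods need not produce the same clusters.

```r
# Standardise the variables before computing distances
Z <- scale(USArrests)

# Agglomerative hierarchical clustering (bottom-up fusion), average linkage
hc <- hclust(dist(Z), method = "average")
plot(hc)                          # dendrogram
groups_hc <- cutree(hc, k = 3)    # cut the tree into 3 groups

# K-means partition into 3 groups; nstart tries several random starting
# configurations because different starts can lead to different clusters
set.seed(3)
km <- kmeans(Z, centers = 3, nstart = 25)

# Compare the two partitions; the group labels are arbitrary in both
table(hierarchical = groups_hc, kmeans = km$cluster)
```

Classification maximum likelihood methods need an additional package and are not sketched here.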

  12. Summary • Multivariate statistics is usually done from the point of view that there are no laws of scientific inference: ‘anything goes’. • First, you explore the data to come up with hypotheses (the models). • Then you confirm the models on a second data set. • If you have only a single data set, split it into two parts, one for exploration and one for confirmation. • Good data analysis is based on the skilful interpretation of evidence and the subsequent development of hunches.
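Splitting a single data set can be sketched with sample(). The example below uses USArrests as a stand-in data set and a 50/50 random split (both are assumptions). As an illustration, it fits a PCA to each half and puts the first-component loadings side by side to see whether the pattern found during exploration holds up.

```r
# Random split into an exploration half and a confirmation half
set.seed(4)
n <- nrow(USArrests)
explore_rows <- sample(n, size = floor(n / 2))   # random half of the row indices

explore <- USArrests[explore_rows, ]    # use this part to generate hypotheses
confirm <- USArrests[-explore_rows, ]   # hold this part back to test them

# Compare first-component loadings from each half; the sign of a component is
# arbitrary, so a whole column may appear with every sign flipped
pca_explore <- prcomp(explore, scale. = TRUE)
pca_confirm <- prcomp(confirm, scale. = TRUE)
cbind(explore = pca_explore$rotation[, 1], confirm = pca_confirm$rotation[, 1])
```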
