Overview
• Too many variables, and too many people or things, cause thorny problems in data analysis, and the issue is not simply the availability of sufficient computing power to handle all that data.
• Principal components analysis seeks to identify and quantify a small number of underlying components by analyzing the original, observable variables. In many cases, we can wind up working with just a few (on the order of, say, three to ten) principal components or factors instead of tens or hundreds of conventionally measured variables.
[Figure: observable variables X1, X2, X3 and components Z1, Z2, Z3]
Which component explains the most variance?
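The question on this slide can be explored with a minimal sketch in base R. The data below are simulated purely for illustration (the variable names X1 to X3 echo the figure, but the values, the seed, and the correlation structure are assumptions, not the deck's data). The sketch uses `prcomp()` from the `stats` package and computes each component's share of the total variance from its standard deviation.

```r
# Simulated data (an assumption for illustration): X1 and X2 share a
# common latent source, X3 is independent noise.
set.seed(42)
n <- 200
latent <- rnorm(n)
X <- data.frame(
  X1 = latent + rnorm(n, sd = 0.3),
  X2 = latent + rnorm(n, sd = 0.3),
  X3 = rnorm(n)
)

# Principal components on centered, standardized variables
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
names(var_explained) <- colnames(pca$x)
print(round(var_explained, 3))

# prcomp() orders components by decreasing variance,
# so the first component always explains the most
top <- names(which.max(var_explained))
print(top)
```

`summary(pca)` reports the same proportions along with their cumulative sum, which is a common way to decide how many components to keep.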
Data Structures

Numeric Vector:
    a <- c(1,2,5.3,6,-2,4)

Character Vector:
    b <- c("one","two","three")

Matrix:
    y <- matrix(1:20, nrow=5, ncol=4)

Dataframe:
    d <- c(1,2,3,4)
    e <- c("red", "white", "red", NA)
    f <- c(TRUE,TRUE,TRUE,FALSE)
    mydata <- data.frame(d,e,f)
    names(mydata) <- c("ID","Color","Passed")

List:
    w <- list(name="Fred", age=5.3)

Framework source: Hadley Wickham
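As a rough illustration of how these structures differ in practice, the sketch below rebuilds the slide's objects and inspects them with base R functions. The specific checks chosen here (type tests, indexing, counting missing values) are an assumption about what is worth demonstrating, not part of the original slide.

```r
# Objects from the slide
a <- c(1, 2, 5.3, 6, -2, 4)
b <- c("one", "two", "three")
y <- matrix(1:20, nrow = 5, ncol = 4)
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")
w <- list(name = "Fred", age = 5.3)

# Vectors hold a single type
is.numeric(a)
is.character(b)

# A matrix is filled column by column: column 3 holds 11..15,
# so the element in row 2, column 3 is 12
dim(y)
y[2, 3]

# A data frame mixes column types; columns are accessed by name
str(mydata)
n_missing <- sum(is.na(mydata$Color))  # one missing color

# A list can hold elements of different types and lengths
w$name
w[["age"]]
```

`str()` is often the quickest way to see the type of every column in a data frame at once.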
Identity Matrix Inverse
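A small sketch of both ideas in base R, assuming a hypothetical 2x2 matrix `A` chosen only for illustration: `diag(n)` builds an n x n identity matrix, and `solve(A)` returns the inverse of a square, non-singular matrix. Multiplying a matrix by its inverse should give back the identity, up to floating-point error.

```r
# Hypothetical invertible matrix (determinant 2*3 - 1*1 = 5)
A <- matrix(c(2, 1,
              1, 3), nrow = 2)

# 2 x 2 identity matrix
I2 <- diag(2)

# Multiplying by the identity leaves A unchanged
same_as_A <- isTRUE(all.equal(A %*% I2, A))

# Inverse of A; A times its inverse recovers the identity
A_inv <- solve(A)
recovers_identity <- isTRUE(all.equal(A %*% A_inv, I2))

print(A_inv)
print(c(same_as_A, recovers_identity))
```

`all.equal()` is used instead of `==` because the product is only equal to the identity within numerical tolerance.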