Visualizing the Legislature. Howard University, Systems and Computer Science. Mugizi Robert Rwebangira.
How do we get students interested? Show them something relevant! What is relevant right now? BIG DATA!
BIG DATA. 1,200 exabytes (= 1,200 billion gigabytes) of data generated in 2008, more than was generated in the first 6,000 years of human history, and growing at 60% per year. Source: The Economist, “The Data Deluge,” February 26, 2010.
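A quick worked check of what a 60% annual growth rate means, assuming the rate stays constant:

```latex
\begin{aligned}
\text{growth factor over 5 years} &= 1.6^{5} \approx 10.5 \\
\text{doubling time} &= \frac{\ln 2}{\ln 1.6} \approx \frac{0.693}{0.470} \approx 1.47 \text{ years}
\end{aligned}
```

At that rate the volume of data roughly doubles every year and a half and grows about tenfold every five years.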
Problems: storing this data; understanding and summarizing it. Emergence of the “data scientist”: someone with skills in both programming and math/statistics.
APPLICATION: POLITICS. Getting data on congressional votes used to be difficult; now it is easily downloadable from government websites. For example, the US Senate takes about 600 votes a session. Question: how do we present this data in a useful way?
Solution: Math. For each senator we have about 600 pieces of information (their votes), so we can view the Senate as a “cloud” of points in a high-dimensional space. We want to PROJECT these points into 2 dimensions while preserving the structure of the dataset (i.e., similar senators should end up close together). The projection should also be efficient to compute.
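To make the “cloud of points” idea concrete, here is a minimal Python/NumPy sketch. The four senators and six votes are invented for illustration, and the encoding (+1 yea, -1 nay, 0 did not vote) is an assumption, not something specified on the slides. Euclidean distance between rows then measures how differently two senators voted.

```python
import numpy as np

# Hypothetical toy data: 4 senators x 6 roll-call votes.
# Assumed encoding: +1 = yea, -1 = nay, 0 = did not vote.
votes = np.array([
    [ 1,  1, -1,  1, -1,  1],   # senator A
    [ 1,  1, -1,  1, -1,  0],   # senator B: votes like A, missed one vote
    [-1, -1,  1, -1,  1, -1],   # senator C: opposes A on every vote
    [-1, -1,  1,  1,  1, -1],   # senator D: votes like C, one crossover
])

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows (senators)."""
    diff = points[:, None, :] - points[None, :, :]   # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))

dist = pairwise_distances(votes)
print(np.round(dist, 2))
```

Senators A and B are at distance 1.0 and C and D at 2.0, while every cross pair is more than 4 apart: this is the “similar senators are close together” structure a good 2-D projection should preserve.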
Principal Component Analysis. Let X be the n × d data matrix, where n = number of senators and d = number of votes. We want to compute an n × 2 matrix Y. PCA chooses Y so that each of its dimensions captures as much of the variance in the data as possible (the first dimension the most, the second the most of what remains). It can be computed with the singular value decomposition (SVD) of X after centering its columns.
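A minimal NumPy sketch of PCA via the SVD, under stated assumptions: the function name `pca_2d` is hypothetical, and the two-bloc vote matrix is synthetic random data, not the Senate dataset. `np.linalg.svd` with `full_matrices=False` returns the thin SVD with singular values in decreasing order, so the first two rows of `Vt` are the top two principal directions.

```python
import numpy as np

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of an n x d matrix X onto its first two principal components.

    Returns an n x 2 matrix Y (one 2-D point per senator).
    """
    Xc = X - X.mean(axis=0)                       # center each column (vote)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values s in decreasing order.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # coordinates along top-2 directions

rng = np.random.default_rng(0)
# Hypothetical data: two blocs of 5 senators voting on 40 bills, with 10% noise.
bloc = rng.choice([-1, 1], size=40)
X = np.vstack([
    np.where(rng.random((5, 40)) < 0.9, bloc, -bloc),   # bloc 1 mostly follows `bloc`
    np.where(rng.random((5, 40)) < 0.9, -bloc, bloc),   # bloc 2 mostly opposes it
]).astype(float)

Y = pca_2d(X)
print(Y.shape)            # (10, 2)
print(np.sign(Y[:, 0]))   # first coordinate separates the two blocs
```

The sign of each principal component is arbitrary, so a plot of `Y` may come out mirrored; the separation between the blocs is what matters.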
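Principal Component Analysis (cont.) Center the columns of X and take its singular value decomposition, $X = U \Sigma V^{T}$, where U (n × n) and V (d × d) are orthogonal and Σ (n × d) is diagonal with the singular values in decreasing order. Then $Y = X V_{2}$, where $V_{2}$ is the d × 2 matrix formed by the first two columns of V. This can be computed quickly: the SVD of an n × d matrix takes $O(\min(n d^{2}, n^{2} d))$ time.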
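A short derivation, assuming the columns of X have already been centered, shows why projecting onto the first two right singular vectors gives Y, and how the SVD relates to the eigendecomposition of $X^{T}X$ (which is $n-1$ times the sample covariance matrix):

```latex
\begin{aligned}
X &= U \Sigma V^{T}, \qquad U^{T}U = I,\; V^{T}V = I \\
X V &= U \Sigma V^{T} V = U \Sigma \\
Y &= X V_{2} = U_{2} \Sigma_{2}, \qquad \Sigma_{2} = \operatorname{diag}(\sigma_{1}, \sigma_{2}) \\
X^{T} X &= V \Sigma^{T} U^{T} U \Sigma V^{T} = V \left(\Sigma^{T} \Sigma\right) V^{T}
\end{aligned}
```

Here $U_{2}$ and $V_{2}$ are the first two columns of U and V. The last line says the columns of V are the eigenvectors of $X^{T}X$ with eigenvalues $\sigma_{i}^{2}$, so the first two columns are the directions of greatest variance.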
Dataset: Senate roll calls, 110th Senate (2007–2008). n = 102 senators (more than 100 because some seats changed hands during the session), d = 634 votes. The data can be obtained from http://voteview.com/senate110.htm
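Before PCA can be applied to the downloaded file, each recorded vote has to be turned into a number. The sketch below assumes the Voteview cast-code convention (1–3 yea, 4–6 nay, 7–9 present or not voting, 0 not in office); check this against the codebook on the site before relying on it. Parsing the file itself is not shown, and the function name `encode_votes` and the sample codes are hypothetical.

```python
import numpy as np

def encode_votes(codes: np.ndarray) -> np.ndarray:
    """Map an n x d array of cast codes to +1 (yea), -1 (nay), 0 (otherwise).

    ASSUMPTION: Voteview-style codes 1-3 = yea, 4-6 = nay,
    7-9 = present / not voting, 0 = not in office. Check the codebook.
    """
    codes = np.asarray(codes)
    out = np.zeros(codes.shape, dtype=float)       # default: no yea/nay recorded
    out[(codes >= 1) & (codes <= 3)] = 1.0         # yea variants
    out[(codes >= 4) & (codes <= 6)] = -1.0        # nay variants
    return out

# Hypothetical 3 senators x 4 votes.
codes = np.array([
    [1, 6, 9, 1],
    [6, 1, 1, 0],   # 0: not yet in office for the last vote
    [1, 6, 7, 3],
])
X = encode_votes(codes)
print(X)
```

Coding a missed vote as 0 places it midway between yea and nay; other choices, such as filling in the average of the senators who did vote, are also possible.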
Conclusions. Visualizing high-dimensional data will become increasingly important as data proliferates. It is good motivation for the study of linear algebra and statistics.