Lecture 07: Data Transform II September 28, 2010 COMP 150-12 Topics in Visual Analytics
Lecture Outline • Data Retrieval • Methods for increasing retrieval speed: • Pre-computation • Pre-fetching and Caching • Levels of Detail (LOD) • Hardware support • Data transform (pre-processing) • Aggregate (clustering) • Sampling (sub-sampling, re-sampling) • Simplification (dimension reduction) • Appropriate representation (finding underlying mathematical representation)
Dimension Reduction • Lots of possibilities, but can be roughly categorized into two groups: • Linear dimension reduction • Non-linear dimension reduction • Related to machine learning…
Dimension Reduction • Can think of clustering as a dimension reduction mechanism: • Assume the dataset has n dimensions • Clustering produces k clusters (k < n) • Instead of representing the data as an n-dimensional vector • Represent the data using those k dimensions (e.g., distances to the k cluster centers), as sketched below
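A minimal sketch of this idea, assuming scikit-learn's KMeans (the slides do not name a library); each n-dimensional point is re-expressed by its distances to the k cluster centers:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 10)      # 200 data entries in n = 10 dimensions

k = 3                            # target number of clusters / dimensions
km = KMeans(n_clusters=k, n_init=10, random_state=0)

# fit_transform returns each point's distance to the k cluster centers,
# i.e. the data re-expressed in k dimensions instead of n
X_k = km.fit_transform(X)        # shape (200, 3)
print(X_k.shape)
```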
Some Common Techniques • Principal Component Analysis • demo • Multi-Dimensional Scaling • draw • Kohonen Maps / Self-Organizing Maps • demo • Isomap • draw
Principal Component Analysis • Quick refresher of PCA • Find the most dominant eigenvectors as principal components • Data points are re-projected into the new coordinate system • Used for reducing dimensionality • Used for finding clusters • Problem: PCA is easy to understand mathematically, but difficult to understand "semantically": e.g., what does a component like 0.5*GPA + 0.2*age + 0.3*height = ? actually mean? (figure: scatter plot over GPA, age, and height axes)
Principal Component Analysis • Pseudo code • Arrange the data such that each column is a dimension and each row is a data entry (an n x m matrix, n = rows, m = cols) • Subtract the mean of each dimension from its values • Compute the covariance matrix M • Compute the eigenvectors and eigenvalues of M • e.g., via singular value decomposition (SVD): M = U S V^T • where U and V are m x m orthogonal matrices and S is an m x m diagonal matrix of non-negative real numbers; since M is symmetric, the columns of U are the eigenvectors and the diagonal of S holds the eigenvalues • Sort the eigenvectors in U by their associated eigenvalues in S, from highest to lowest • Project your original (mean-centered) data onto the first k (highest-eigenvalue) eigenvectors
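A short NumPy sketch of the pseudo code above; the toy data and the choice of k are illustrative, not from the slides:

```python
import numpy as np

# n x m matrix: each row is a data entry, each column a dimension
X = np.random.rand(100, 5)

# Subtract each dimension's mean
Xc = X - X.mean(axis=0)

# Covariance matrix (m x m)
M = np.cov(Xc, rowvar=False)

# Eigenvectors / eigenvalues (eigh, since M is symmetric)
eigvals, eigvecs = np.linalg.eigh(M)

# Sort eigenvectors by eigenvalue, highest first
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Project the centered data onto the first k principal components
k = 2
X_reduced = Xc @ eigvecs[:, :k]   # n x k
print(X_reduced.shape)
```

(Equivalently, the right singular vectors from np.linalg.svd(Xc) give the same principal directions, matching the SVD step above.)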
Multi-Dimensional Scaling • Minimize the difference between distances in the low-d and high-d representations: • minimize the stress, sum over i &lt; j of ( ||xi − xj|| − δij )² • where xi is the position of point i in low-dimensional space, and δij is the distance between two points i and j in n dimensions
Multi-Dimensional Scaling Image courtesy of Jing Yang
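A rough sketch of metric MDS with scikit-learn, which minimizes a stress criterion like the one above; the library choice and the random toy data are assumptions, not from the slides:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

X = np.random.rand(50, 8)            # 50 points in n = 8 dimensions
D = squareform(pdist(X))             # high-dimensional distances delta_ij

# Find 2-D positions x_i whose pairwise distances approximate D
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
X_low = mds.fit_transform(D)         # shape (50, 2)
print(X_low.shape)
```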
Self-Organizing Maps • Pseudo code • Assume input of n rows of m dimensional data • Define some number of nodes (e.g. 40x40 grid) • Give each node m values (vector of size m) • Randomize those values • Loop k number of times: • Select one of the n rows of data as the "input vector" • Find within the 40x40 grid nodes the one most similar to the input vector (call this node the Best Matching Unit – BMU) • Find the neighbors of the BMU on the grid • Update the BMU and its neighbors based on the following equation: • Wv(t+1) = Wv(t) + θ(v,t) · α(t) · (D(t) − Wv(t)) • where θ(v,t) is the Gaussian function of grid distance to the BMU (decays over time) • α(t) is the learning function (decays over time) • D(t) is the input vector, and Wv(t) is the grid node's vector
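A compact NumPy sketch of the loop above; the grid size, decay schedules, and iteration count are illustrative choices (the slides use a 40x40 grid):

```python
import numpy as np

n, m = 500, 3                       # n data rows, m dimensions
data = np.random.rand(n, m)

grid = 10                           # grid x grid map of nodes
W = np.random.rand(grid, grid, m)   # one m-vector per node, randomized

gy, gx = np.mgrid[0:grid, 0:grid]   # grid coordinates of every node

iters = 2000
sigma0, alpha0 = grid / 2.0, 0.5

for t in range(iters):
    frac = t / iters
    sigma = sigma0 * np.exp(-3 * frac)   # neighborhood radius, decays over time
    alpha = alpha0 * np.exp(-3 * frac)   # learning rate alpha(t), decays over time

    D = data[np.random.randint(n)]       # input vector D(t)

    # Best Matching Unit: node whose vector is most similar to D
    by, bx = np.unravel_index(
        np.argmin(np.linalg.norm(W - D, axis=2)), (grid, grid))

    # Gaussian neighborhood theta(v, t) around the BMU on the grid
    theta = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))

    # Wv(t+1) = Wv(t) + theta(v, t) * alpha(t) * (D(t) - Wv(t))
    W += (theta * alpha)[:, :, None] * (D - W)

print(W.shape)   # trained map: grid x grid x m
```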
Isomap Image courtesy of Wikipedia: Nonlinear Dimensionality Reduction
Many Others! • To name a few: • Latent Semantic Indexing • Support Vector Machine • Linear Discriminant Analysis (LDA) • Locally Linear Embedding • "manifold learning" • Etc. • Consider the characteristics of the data, and choose the appropriate method. • e.g. are the data labeled? Apply supervised vs. unsupervised methods accordingly.