Multidimensional Scaling

Agenda • Multidimensional Scaling • Goodness of fit measures • Nosofsky, 1986

Proximities: $p_{\text{Amherst, Hadley}}$

Configuration (in 2-D): $x_i$

Configuration (in 1-D)

Formal MDS Definition • $f: p_{ij} \to d_{ij}(X)$ • MDS is a mapping from proximities to corresponding distances in MDS space. • After a transformation $f$, the proximities are equal to the distances in $X$.

Distances, $d_{ij}$: $d_{\text{Amherst, Hadley}}(X)$

Distances, $d_{ij}$

Distances, $d_{ij}$: $d_{\text{Amherst, Hadley}}(X) = 4.32$
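As a minimal sketch of how the distances $d_{ij}(X)$ follow from a configuration, the snippet below computes Euclidean distances between all points. The coordinates and the third town are invented for illustration; they are not the configuration from the slides, so the result does not reproduce the 4.32 shown above. Euclidean distance is also an assumption here, since other metrics are possible.

```python
import math

def distance_matrix(X):
    """Euclidean distances d_ij(X) between all rows of a configuration X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

# Hypothetical 2-D configuration: coordinates are made up for illustration.
config = {"Amherst": (0.0, 0.0), "Hadley": (3.0, 4.0), "Town C": (6.0, 0.0)}
names = list(config)
D = distance_matrix([config[name] for name in names])

# Distance between Amherst and Hadley in this made-up configuration.
print(names[0], names[1], D[0][1])
```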

Proximities and Distances (table of proximities shown beside table of distances)

The Role of f • $f$ relates the proximities to the distances. • $f(p_{ij}) = d_{ij}(X)$

The Role of f • $f$ can be linear, exponential, etc. • In psychological data, $f$ is usually assumed to be any monotonic function. • That is, if $p_{ij} < p_{kl}$ then $d_{ij}(X) \le d_{kl}(X)$. • Most psychological data are on an ordinal scale, e.g., rating scales.
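One way to make the monotonicity requirement concrete is to count how often a set of distances breaks the order of the proximities. This is a sketch only: the ratings and distances are hypothetical, and the proximities are assumed to be dissimilarities (larger means more different). For similarity data the comparison would be reversed.

```python
from itertools import combinations

def order_violations(p, d):
    """Count pairs (ij, kl) where p_ij < p_kl but d_ij > d_kl."""
    count = 0
    for a, b in combinations(list(p), 2):
        if p[a] < p[b] and d[a] > d[b]:
            count += 1
        elif p[b] < p[a] and d[b] > d[a]:
            count += 1
    return count

# Hypothetical dissimilarity ratings and two candidate sets of distances.
p = {("A", "B"): 1, ("A", "C"): 3, ("B", "C"): 2}
d_good = {("A", "B"): 0.5, ("A", "C"): 2.9, ("B", "C"): 1.7}
d_bad = {("A", "B"): 2.0, ("A", "C"): 2.9, ("B", "C"): 1.7}

print(order_violations(p, d_good))  # distances respect the rank order
print(order_violations(p, d_bad))   # A-B is rated closest but is not closest
```

Because only the rank order matters, any monotonic rescaling of the ratings leaves the count unchanged.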

Looking at Ordinal Relations (table of proximities shown beside table of distances)

Stress • It is not always possible to perfectly satisfy this mapping. • Stress is a measure of how close the model came to satisfying it. • Stress is essentially the scaled sum of squared error between $f(p_{ij})$ and $d_{ij}(X)$.
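The slides do not say which scaling is used. A common choice is Kruskal's Stress-1, which divides the squared error by the sum of squared distances and takes the square root. The sketch below assumes that formula and uses made-up values for $f(p_{ij})$ and $d_{ij}(X)$.

```python
import math

def stress1(fp, d):
    """Kruskal's Stress-1: sqrt( sum (f(p) - d)^2 / sum d^2 )."""
    num = sum((a - b) ** 2 for a, b in zip(fp, d))
    den = sum(b ** 2 for b in d)
    return math.sqrt(num / den)

# Hypothetical transformed proximities f(p_ij) and model distances d_ij(X),
# listed in the same pair order.
fp = [1.0, 2.0, 3.0]
d_perfect = [1.0, 2.0, 3.0]
d_off = [1.0, 2.0, 4.0]

print(stress1(fp, d_perfect))  # zero when the mapping is satisfied exactly
print(stress1(fp, d_off))      # grows as distances depart from f(p)
```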

Stress and “Correct” Dimensionality (plot: stress on the vertical axis against number of dimensions on the horizontal axis)

Distance Invariant Transformations • Scaling (all of X doubled in size, or flipped) • Rotation (X rotated 20 degrees left) • Translation (X moved 2 to the right)
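These invariances can be checked numerically. In the sketch below (points invented for illustration), rotation, translation, and flipping leave every pairwise distance unchanged, while doubling the configuration multiplies every distance by the same factor, so the relative distances and their rank order are preserved.

```python
import math

def pairwise(X):
    """All pairwise Euclidean distances for a list of 2-D points."""
    return [math.dist(a, b) for i, a in enumerate(X) for b in X[i + 1:]]

def rotate(X, degrees):
    """Rotate counterclockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in X]

def translate(X, dx, dy):
    return [(x + dx, y + dy) for x, y in X]

def reflect(X):
    """Flip the configuration across the vertical axis."""
    return [(-x, y) for x, y in X]

def scale(X, k):
    return [(k * x, k * y) for x, y in X]

def same(d1, d2):
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(d1, d2))

X = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]  # hypothetical configuration
base = pairwise(X)

rigid_ok = (same(base, pairwise(rotate(X, 20)))
            and same(base, pairwise(translate(X, 2, 0)))
            and same(base, pairwise(reflect(X))))

# Uniform scaling changes distances by a constant factor only.
ratios = [b / a for a, b in zip(base, pairwise(scale(X, 2)))]
print(rigid_ok, ratios)
```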

Configuration (in 2-D)

Rotated Configuration (in 2-D)

Uses of MDS • Visually look for structure in data. • Discover the dimensions that underlie data. • Psychological model that explains similarity judgments in terms of distance in MDS space.

Simple Goodness of Fit Measures • Sum-of-squared error (SSE) • Chi-Square • Proportion of variance accounted for (PVAF) • $R^2$ • Maximum likelihood (ML)

Sum of Squared Error

Chi-Square

Proportion of Variance Accounted For • $(SST - SSE)/SST = (34 - 7.96)/34 = 0.77$

$R^2$ • $R^2$ is PVAF, but… $(SST - SSE)/SST = (34 - 44.03)/34 = -0.295$
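The two worked values can be reproduced with a one-line function. The sketch uses only the SST and SSE figures quoted on the slides; the function name is illustrative. It shows that the ratio turns negative whenever SSE exceeds SST, that is, when the model fits worse than simply predicting the mean.

```python
def pvaf(sst, sse):
    """Proportion of variance accounted for: (SST - SSE) / SST."""
    return (sst - sse) / sst

# Values quoted on the slides.
good_fit = pvaf(34, 7.96)    # SSE well below SST
poor_fit = pvaf(34, 44.03)   # SSE above SST, so the result is negative

print(round(good_fit, 2), round(poor_fit, 3))
```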

Maximum Likelihood • Assume we are sampling from a population with probability $f(Y; \theta)$. • $Y$ is an observation and $\theta$ are the model parameters. • Example: with $\theta = [\mu = 0]$, $N(Y = -1.7; \mu = 0) = 0.094$.

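Maximum Likelihood • With independent observations, $Y_1 \dots Y_n$, the joint probability of the sample observations is: $f(Y_1, \dots, Y_n; \theta) = \prod_{i=1}^{n} f(Y_i; \theta)$ • Example with $\theta = [\mu = 0]$ and observations $Y_1, Y_2, Y_3$: $0.094 \times 0.2661 \times 0.3605 = 0.0090$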

Maximum Likelihood • Expressed as a function of the parameters, we have the likelihood function: $L(\theta) = \prod_{i=1}^{n} f(Y_i; \theta)$ • The goal is to maximize $L$ with respect to the parameters, $\theta$.

Maximum Likelihood (assuming $\sigma = 1$) • $\theta = [\mu = 0]$: for $Y_1, Y_2, Y_3$, $0.094 \times 0.2661 \times 0.3605 = 0.0090$ • $\theta = [\mu = -1.0167]$: for $Y_1, Y_2, Y_3$, $0.3159 \times 0.3962 \times 0.3398 = 0.0425$
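The comparison above can be sketched in a few lines. Only $Y_1 = -1.7$ is stated on the slides; the values used here for $Y_2$ and $Y_3$ (-0.9 and -0.45) are assumptions, back-calculated so that the densities agree with the ones shown. A normal density with standard deviation 1 is assumed throughout.

```python
import math

def normal_pdf(y, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def likelihood(ys, mu, sigma=1.0):
    """Joint density of independent observations: product of the densities."""
    return math.prod(normal_pdf(y, mu, sigma) for y in ys)

# Y1 is from the slides; Y2 and Y3 are back-calculated assumptions.
ys = [-1.7, -0.9, -0.45]

mu_hat = sum(ys) / len(ys)          # sample mean, about -1.0167
L_zero = likelihood(ys, 0.0)        # likelihood under mu = 0
L_hat = likelihood(ys, mu_hat)      # likelihood under mu = sample mean

print(mu_hat, L_zero, L_hat)
```

Under these assumptions the likelihood at the sample mean is several times larger than at zero, which is the sense in which $\mu = -1.0167$ is the better-supported parameter value.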

Maximum Likelihood • Preferred to other methods • Has very nice mathematical properties. • Easier to interpret. • We’ll see specifics in a few weeks. • Often harder (or impossible?) to calculate than other methods. • Often presented as log likelihood, ln(ML). • Easier to compute (sums, not products). • Better numerical resolution. • Sometimes equivalent to other methods. • E.g., same as SSE when calculating mean of a distribution.
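A small sketch can illustrate the last two points: the log likelihood is a sum rather than a product, and for the mean of a normal distribution the value that maximizes it is the same value that minimizes SSE. The data are illustrative ($Y_1 = -1.7$ appears on the slides; the other two values are assumptions), the standard deviation is fixed at 1, and a coarse grid search stands in for a proper optimizer.

```python
import math

def log_likelihood(ys, mu, sigma=1.0):
    """Sum of log normal densities (the log of the product of densities)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (y - mu) ** 2 / (2 * sigma ** 2) for y in ys)

def sse(ys, mu):
    """Sum of squared errors around a candidate mean."""
    return sum((y - mu) ** 2 for y in ys)

ys = [-1.7, -0.9, -0.45]  # illustrative data; last two values are assumptions

# Candidate means from -3 to 3 in steps of 0.001.
grid = [i / 1000 for i in range(-3000, 3001)]
best_ml = max(grid, key=lambda m: log_likelihood(ys, m))
best_sse = min(grid, key=lambda m: sse(ys, m))

print(best_ml, best_sse, sum(ys) / len(ys))
```

With the standard deviation held fixed, the log likelihood is a constant minus SSE divided by $2\sigma^2$, which is why both searches land on the same grid point next to the sample mean.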
