RadViz Extensions with Applications

RadViz Extensions with Applications • Dissertation Defense • John Sharko • October 26, 2009

Committee • Prof. Georges Grinstein (Advisor) • Prof. Kenneth Marx • Prof. Haim Levkowitz • Dr. Patrick Hoffman • Dr. Alex Gee

Outline • Introduction • RadViz • Cluster Ensembles • Fuzzy Clusters • Methodology • Contributions • Recommendations

RadViz Example

Description of Traditional RadViz Each dimension in a dataset is represented by a point, called an anchor point, on the circumference of a circle. Each record in the dataset is positioned as if it were being pulled by a spring attached to each anchor point where the strength of the spring is proportional to that record’s coordinate or value for the dimension related to that anchor point.

RadViz Example: All Coordinate Values Equal

RadViz Example: Two Coordinate Values Equal

RadViz Example: Range of Coordinate Values

Terminology • Dimensional Anchor (Anchor Point) – point on the circle representing a dimension • Point – representation of record(s) within the circle

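RadViz Mathematical Formulation $x_i = \frac{\sum_{j=1}^{d} a_{i,j} \cos\theta_j}{\sum_{j=1}^{d} a_{i,j}}$, $y_i = \frac{\sum_{j=1}^{d} a_{i,j} \sin\theta_j}{\sum_{j=1}^{d} a_{i,j}}$, for $i = 1, \dots, n$, where: • $x_i$ and $y_i$ are the resulting transformed coordinates for record $i$ • $\theta_j$ is the angular position on the circle corresponding to dimension $j$ • $a_{i,j}$ is the value for dimension $j$ for record $i$ • $d$ is the number of dimensions and $n$ is the number of records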
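The spring description can be read as a weighted average of the anchor positions, with each record's values acting as the weights. A minimal Python sketch under that reading follows; the function names `radviz_point` and `evenly_spaced_angles` are illustrative, and the sketch assumes non-negative values (typically normalized to [0, 1]) and a unit circle.

```python
import math


def radviz_point(values, angles):
    """Project one record onto the RadViz plane.

    values: non-negative values a_{i,j}, one per dimension (assumed normalized).
    angles: anchor angles theta_j in radians, one per dimension.
    """
    total = sum(values)
    if total == 0:
        # No spring pulls the record, so it rests at the center.
        return (0.0, 0.0)
    x = sum(a * math.cos(t) for a, t in zip(values, angles)) / total
    y = sum(a * math.sin(t) for a, t in zip(values, angles)) / total
    return (x, y)


def evenly_spaced_angles(d):
    """Place d anchors at equal angular steps, starting at angle 0."""
    return [2 * math.pi * j / d for j in range(d)]


angles = evenly_spaced_angles(4)
center = radviz_point([0.5, 0.5, 0.5, 0.5], angles)      # all values equal
near_first = radviz_point([1.0, 0.0, 0.0, 0.0], angles)  # one dominant value
```

With all four values equal the pulls cancel and the record lands at the center; with a single non-zero value the record sits on that dimension's anchor.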

Impact of Exchanging Dimensional Anchors [Figure: record (1, 0, 1, 0) shown with dimensional anchors A, B, C, D in two different orders around the circle]
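The effect of exchanging anchors can be checked numerically for the record (1, 0, 1, 0). The sketch below is illustrative only: the two anchor orders are assumptions chosen for the demonstration (the exact orders in the slide's figure are not recoverable), anchors are spaced evenly counter-clockwise from angle 0, and the record is assumed to have at least one non-zero value.

```python
import math


def radviz_point(record, anchor_order):
    """record: dict of dimension name -> value.
    anchor_order: dimension names placed evenly around the unit circle."""
    d = len(anchor_order)
    total = sum(record[name] for name in anchor_order)  # assumed non-zero
    x = y = 0.0
    for j, name in enumerate(anchor_order):
        theta = 2 * math.pi * j / d
        x += record[name] * math.cos(theta)
        y += record[name] * math.sin(theta)
    # Round to hide floating-point noise from cos/sin.
    return (round(x / total, 6), round(y / total, 6))


record = {"A": 1, "B": 0, "C": 1, "D": 0}

# A and C sit opposite each other: their pulls cancel.
opposite = radviz_point(record, ["A", "B", "C", "D"])

# B and C exchanged: A and C are now neighbors and pull the point outward.
adjacent = radviz_point(record, ["A", "C", "B", "D"])
```

In the first layout the record falls on the center, where it is indistinguishable from a record whose values are all equal; after the exchange it moves to (0.5, 0.5), between anchors A and C.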

Example of Repositioning Anchor Points Using Layout Algorithm [Figure: two panels, "Before repositioning" and "After repositioning"]

Multiple Clustered Datasets • Clustering algorithms are heuristic, not optimal • Different clustering algorithms tend to generate different clusters

Sample Multiple Clustered Dataset

Stable Group of Records

Uniquely Clustered Record

Fuzzy Clusters • A record belongs to multiple clusters • Varying strengths of association

Cluster Ensemble vs. Fuzzy Clustering

Using RadViz to Analyze Multiple Clustered Datasets • RadViz typically deals with real numbers • Cluster numbers are labels, so using them directly as values does not work • How do you produce a meaningful RadViz visualization?

Flattening of Categorical Data • Break up each original dimension into multiple dimensions • Each new dimension represents a value of the original dimension

Flattening a Dimension • Original dimensions: Manufacturer, Model, Type, Price • Flattened dimensions: Manufacturer, Model, Small, Large, Sporty, Van, Price • Original record: (Cadillac, Deville, Large, 33) • Flattened record: (Cadillac, Deville, 0, 1, 0, 0, 33)
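Flattening one categorical dimension amounts to replacing its value with a 0/1 indicator per category. The following Python sketch reproduces the car record from the slide; the helper name `flatten_dimension` and the tuple representation of a record are assumptions made for illustration.

```python
def flatten_dimension(record, index, categories):
    """Replace the categorical value at `index` with one 0/1 indicator
    per category, keeping all other fields in place."""
    indicators = tuple(1 if record[index] == c else 0 for c in categories)
    return record[:index] + indicators + record[index + 1:]


# Category order taken from the flattened dimensions on the slide.
car_types = ("Small", "Large", "Sporty", "Van")

original = ("Cadillac", "Deville", "Large", 33)
flattened = flatten_dimension(original, 2, car_types)
```

Here `flattened` matches the flattened record shown on the slide: the Type field expands to four indicators and only the "Large" indicator is set.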

Flattening Multi Cluster Dataset • Original dimensions: Algorithm A, Algorithm B, Algorithm C • Flattened dimensions: Algorithm A clusters 1, 2; Algorithm B clusters 1, 2, 3; Algorithm C clusters 1, 2, 3, 4 • Sample record: (2, 1, 4) • Flattened record: (0, 1, 1, 0, 0, 0, 0, 0, 1)
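The same idea applies when every original dimension is a cluster assignment from a different algorithm. A short Python sketch, assuming cluster labels are numbered from 1 and that the number of clusters per algorithm is known (the function name `flatten_cluster_labels` is hypothetical):

```python
def flatten_cluster_labels(labels, cluster_counts):
    """labels: one cluster number (1-based) per clustering algorithm.
    cluster_counts: number of clusters produced by each algorithm."""
    flat = []
    for label, k in zip(labels, cluster_counts):
        if not 1 <= label <= k:
            raise ValueError(f"label {label} is outside 1..{k}")
        # One indicator per cluster of this algorithm; exactly one is set.
        flat.extend(1 if c == label else 0 for c in range(1, k + 1))
    return tuple(flat)


# Algorithms A, B, C produce 2, 3, and 4 clusters respectively.
flattened = flatten_cluster_labels((2, 1, 4), (2, 3, 4))
```

Every flattened record contains exactly one 1 per algorithm, so all records have the same value sum and RadViz positions depend only on which anchors are active.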

Simple Example • Iris dataset • Three cluster sets • KM1: K-means clustering with 1000 iterations • KM2: K-means clustering with 100,000 iterations • HC: hierarchical clustering • Ten clusters per cluster set

Flattened Multi-cluster Iris Dataset [Figure legend: KM1 color scale, 1 to 10; labeled anchor: HC-6]

Flattened Multi-cluster Iris Dataset - Jittered [Figure legend: KM1 color scale, 1 to 10; labeled anchor: HC-6]

Flattened Multi-cluster Iris Dataset [Figure legend: KM1 color scale, 1 to 10; labeled anchor: HC-6]

Repositioning Dimensional Anchors • Move points away from the center • Separate points • Increase displayed information content

Class Discrimination Layout Algorithm • Select a dimension that classifies the records • Assign each dimension to the class with the highest values with respect to the other classes • Move the dimensional anchors assigned to the same class next to each other to form a classification sector
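The three steps above can be sketched in Python. This is a minimal illustration, not the dissertation's implementation: it interprets "highest values" as the highest per-class mean, which is an assumption (another statistic may have been used), and the names `class_discrimination_order`, `records`, and `dim_names` are hypothetical.

```python
from collections import defaultdict


def class_discrimination_order(records, classes, dim_names):
    """Group dimensions into classification sectors.

    records: list of equal-length value tuples.
    classes: class label of each record (the classifying dimension).
    Returns (sectors, order), where order lists anchors sector by sector.
    """
    sums = defaultdict(lambda: [0.0] * len(dim_names))
    counts = defaultdict(int)
    for rec, cls in zip(records, classes):
        counts[cls] += 1
        for j, value in enumerate(rec):
            sums[cls][j] += value

    class_names = sorted(counts)
    sectors = {cls: [] for cls in class_names}
    for j, name in enumerate(dim_names):
        # Assumption: a dimension belongs to the class with the highest mean.
        best = max(class_names, key=lambda c: sums[c][j] / counts[c])
        sectors[best].append(name)

    # Anchors of the same class end up next to each other.
    order = [name for cls in class_names for name in sectors[cls]]
    return sectors, order


records = [(0.9, 0.1, 0.8, 0.2), (0.8, 0.2, 0.7, 0.1),
           (0.1, 0.9, 0.2, 0.8), (0.2, 0.7, 0.3, 0.9)]
classes = [1, 1, 2, 2]
sectors, order = class_discrimination_order(
    records, classes, ["d1", "d2", "d3", "d4"])
```

In this toy data, d1 and d3 are highest for class 1 and d2 and d4 for class 2, so the anchors are reordered to d1, d3, d2, d4 and form two classification sectors.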

Example of Class Discrimination Layout Algorithm [Figure: "Before" and "After" panels; classes 1 and 2, with Classification Sector 1 and Classification Sector 2 marked]

After Repositioning Dimensional Anchors [Figure legend: KM1; Cluster Size: 30 records, 20 records, 10 records, 5 records]

After Repositioning Dimensional Anchors [Figure legend: KM1; Cluster Size: 30 records, 20 records, 10 records, 5 records; labeled anchor: KM1-2]

After Repositioning Dimensional Anchors [Figure legend: KM1; Cluster Size: 30 records, 20 records, 10 records, 5 records]

Moving Similar Classification Sectors Close to Each Other • Dimensions have been grouped together into classification sectors • Determine which record classes are most similar to each other using Euclidean distances • Move those dimension sectors closer to each other using a greedy algorithm • Records will tend to be moved away from the center
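One simple way to realize a greedy ordering is a nearest-neighbor chain over class centroids, sketched below in Python. The slide does not specify which greedy strategy was used, so this particular chain, the use of centroids to represent classes, and the name `greedy_sector_order` are all assumptions made for illustration.

```python
import math


def greedy_sector_order(centroids):
    """centroids: dict of class name -> centroid tuple.
    Returns class names ordered so that each sector is followed by the
    most similar sector that has not been placed yet."""
    remaining = dict(centroids)
    current = sorted(remaining)[0]  # deterministic starting sector
    order = [current]
    del remaining[current]
    while remaining:
        # Greedy step: pick the unplaced class closest in Euclidean distance.
        nearest = min(
            sorted(remaining),
            key=lambda c: math.dist(centroids[current], remaining[c]),
        )
        order.append(nearest)
        del remaining[nearest]
        current = nearest
    return order


centroids = {"c1": (0, 0), "c2": (5, 5), "c3": (1, 0), "c4": (4, 5)}
sector_order = greedy_sector_order(centroids)
```

For these made-up centroids the chain visits c1, c3, c4, c2, so the two pairs of similar classes end up in adjacent sectors. A greedy chain is not guaranteed to find the best overall arrangement.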

Repositioning Classification Sectors [Figure: dimensional anchors for the KM1, KM2, and HC clusters after repositioning; KM1 color scale, 1 to 10]

Interpreting Vectorized RadViz [Figure: Iris classes Setosa, Versicolor, and Virginica; dimensions Petal length and Sepal length]

Interpreting VRV [Figure: Iris classes Setosa, Versicolor, and Virginica; dimensions Petal length and Sepal length]

Salamander Gene Expression Levels [Figure: expression levels plotted against time]

Salamander Class 9 Genes [Figure: genes Nvg00226, Nvg00111, Nvg00155, Nvg00091]

Salamander Class 9 Genes • Nvg00111 – “Key” gene; CXC chemokine, ligand 10 • Nvg00226 – No homology • Nvg00155 – Keratin type II cytoskeletal • Nvg00091 – Annexin

Fuzzy Clusters

Description of Fuzzy Clusters • K-means clustering algorithm used • Four clusters • Applied to Iris dataset
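The slide does not say how strengths of association were derived from the K-means result. As one possible illustration only, the Python sketch below converts a record's distances to four cluster centroids into memberships using the standard fuzzy c-means membership formula; the choice of that formula, the fuzzifier `m = 2`, the centroid values, and the name `fuzzy_memberships` are all assumptions.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Return one membership per centroid; the memberships sum to 1.

    Uses u_k = d_k^(-p) / sum_l d_l^(-p) with p = 2 / (m - 1),
    where d_k is the Euclidean distance to centroid k.
    """
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0 for d in dists):
        # The record sits exactly on a centroid: full membership there.
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


# Four hypothetical centroids, matching the four clusters on the slide.
centroids = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (1.0, -2.0)]
memberships = fuzzy_memberships((1.0, 0.0), centroids)
```

Because the memberships are non-negative and sum to 1, they can be used directly as RadViz values with one dimensional anchor per cluster, so a record is drawn closest to the cluster it is most strongly associated with.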
