Structuring Interactive Cluster Analysis
Wayne Oldford, University of Waterloo
Structuring Interactive Cluster Analysis R.W. Oldford
This talk is about interactive cluster analysis, that is, about interactive tools for finding and identifying groups in data. But more than that, it is about stepping back and understanding the structure of this process so that software tools can be organized to simplify and aid the analysis.
Overview
The problem of “cluster analysis”, or of “finding groups in data”, is ill defined. So there can be no universal solution, and any claimed solution must necessarily solve some other, suitably constrained problem rather than the more general one. What we need instead are highly interactive tools that allow us to adapt to the peculiarities of the data and the problem at hand. These tools are usefully organized and integrated if we step back and treat the problem as one of exploratory data analysis, except that now the exploration takes place not only on the data itself but also on the space of partitions of the data. Existing algorithms need to be recast, and new ones developed, in terms of exploring the space of partitions. The algorithms can then be easily integrated with other interactive tools so that jointly they provide a broadly useful and easily adapted tool-set for finding and identifying groups in data.
Argument:
• ill-defined problem
• high-interaction desirable
• explore partitions
• recast algorithms
Overview
Argument:
• ill-defined problem
• high-interaction desirable
• explore partitions
• recast algorithms
Develop by example:
• problems
• resources
• interactive clustering
• partition moves
• implications
• prototype interface
Problem … geometric/visual structure: the visual system easily identifies groups … algorithms are often motivated and/or understood via visual intuition and geometric structure
Problem … consider grouping these points visually. Context matters … each point is a document, located by the frequency of each word within the document
Problem … two similar documents of different lengths should be “closer” … one of these has more text than the other.
Problem … green “closer” to orange than to red? … “distance” measured by angle?
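The length point can be made concrete with a minimal sketch. It assumes that documents are plain word-count vectors, and the counts below are made up for illustration. Euclidean distance puts a document far from a longer copy of itself. The angle between the two vectors treats them as identical.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two word-frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle(u, v):
    """Angle (radians) between two word-frequency vectors; ignores length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.acos(cos)

# Hypothetical counts of three words in three documents.
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]   # same word proportions, ten times as much text
other_doc = [0, 1, 2]    # different word usage, similar length to short_doc
```

By Euclidean distance, `short_doc` is closer to `other_doc` than to `long_doc`. By angle, the ordering reverses, and `short_doc` and `long_doc` coincide up to rounding.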
Problem … structure in context … segmentation in MRI … groups are spatially contiguous in the plane of the image and nearby in intensity … shape is not defined a priori
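The following is a minimal sketch of grouping that requires both spatial contiguity and similar intensity. It rests on these assumptions:
- a 2-d grey-level image is stored as a list of rows;
- adjacency is 4-neighbour;
- a hypothetical tolerance `tol` applies to the intensity difference between neighbouring pixels.

This is plain flood filling (region growing). It is not the segmentation method behind the slide.

```python
from collections import deque

def contiguous_groups(image, tol):
    """Label pixels so each group is 4-connected and neighbouring pixels in
    the same group differ in intensity by at most `tol` (hypothetical rule)."""
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            # Breadth-first flood fill from this unlabelled seed pixel.
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(image[nr][nc] - image[r][c]) <= tol):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels

image = [
    [10, 11, 50, 51],
    [10, 12, 52, 50],
    [90, 91, 11, 10],
]
labels = contiguous_groups(image, tol=5)
```

Pixel (2, 2) has almost the same intensity as the top-left group. It is not connected to that group, however, so it starts a group of its own. Because the tolerance is applied between neighbours, intensity can drift gradually within one group. That is one of many choices a real segmentation has to make.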
Problem … context-specific structure … an aneurysm presents as intensity in the blood vessels … groups are spatially contiguous tubes of similar intensity … shape is restricted a priori to 3-d tubes
Problem … some specific, some not … same slice, five different measurements at each location … spatial grouping as before; additional grouping possible across measurements
Problem … some specific, some not … 4-dimensional data from connected images: … 2-d spatial, with clear biological grouping, connected to … 2-d intensity measures with abstract structure/grouping
Problem
• Find groups in data
   • Similar objects are together
   • Groups are separated
• What do you mean by similar?
• Problem is ill defined:
   • e.g. what is contiguous structure?
   • When are groups separate?
   • Can we believe it?
Computational resources
1. Processing
2. Memory
3. Display
Computational resources (and response)
1. Processing
   • Gflops, Tflops, multiple processors
   • “computationally intensive” methods
   • problem constrained and optimized
2. Memory
3. Display
Computational resources (and response)
1. Processing
2. Memory
   • GBs, TBs, disk and RAM
   • try to analyze huge data-sets
   • data-sets larger than necessary?
3. Display
Computational resources (and response)
1. Processing
2. Memory
3. Display
   • high resolution, large
   • graphics processors, digital video
   • more data, more visual detail
Computational resources
1. Processing
2. Memory
3. Display
Exploit no one resource exclusively: balance and integrate.
High interaction (much overlooked by researchers)
• assume multiple displays
• integrate computational resources
• challenge is to design software to be simple, understandable, integrated and extensible
Example: image analysis … find groups via intensity (contours and two small unusual structures revealed)
Example: image analysis … other measurements may contain interesting structure
Example: image analysis … identify the location of the new structure in the original image
Example: image analysis … mark new groups by colour (hue, preserving lightness in original image)
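The rule that hue marks the group while lightness comes from the image can be realised in HLS colour space, using Python's standard `colorsys` module. This sketch assumes three things:
- grey levels are already scaled to [0, 1];
- each group gets one hue, supplied through the hypothetical `hues` list;
- pixels outside every group are marked with `None`.

```python
import colorsys

def colour_by_group(gray, labels, hues):
    """Return an RGB image (floats in [0, 1]) whose hue encodes the group and
    whose HLS lightness equals the original grey level."""
    out = []
    for gray_row, label_row in zip(gray, labels):
        row = []
        for g, lab in zip(gray_row, label_row):
            if lab is None:
                row.append((g, g, g))  # ungrouped pixel: leave it grey
            else:
                # Hue from the group, lightness from the image, full saturation.
                row.append(colorsys.hls_to_rgb(hues[lab], g, 1.0))
        out.append(row)
    return out

gray = [[0.2, 0.5], [0.8, 0.4]]
labels = [[0, 1], [None, 0]]
coloured = colour_by_group(gray, labels, hues=[0.0, 2 / 3])  # red, blue
```

Holding lightness fixed keeps the contours and texture of the original image readable after colouring.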
Example: image analysis … explore relation between old and new groups via contours in the image itself
Example: 8 dimensions from teeth measurements on species (+ sex)
[Figure: groups labelled humans; gorillas, orangutans; chimps; hominids; Proconsul africanus]
Example: apes, hominids, modern humans
• multiple and very different views
   • 3-d point clouds (of the first 3 discriminant co-ordinates)
   • cases identified in a list
   • each point represented as a smooth curve, by projecting it on a direction vector that moves smoothly around the surface of an 8-d sphere
   • all linked via colour through the cases being displayed
• context helps
   • knowing the species encourages grouping
   • grouping is based on context + the visual information
   • grouping is confirmed across different kinds of display
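The curve display can be sketched in a few lines. The path used here is an assumption: for an even number d of variates, a(t) = (sin t, cos t, sin 2t, cos 2t, …)/√(d/2). It was chosen because it has unit length for every t. It resembles Andrews' curves, but it is not necessarily the path used in the talk's display. Since each a(t) is a unit vector, cases that are close in all d dimensions give curves that stay close for every t.

```python
import math

def direction(t, d):
    """Unit vector a(t) in R^d (d even) tracing a smooth closed path on the
    sphere: (sin t, cos t, sin 2t, cos 2t, ...) / sqrt(d / 2). Assumed path."""
    scale = math.sqrt(d / 2)
    a = []
    for k in range(1, d // 2 + 1):
        a.extend((math.sin(k * t) / scale, math.cos(k * t) / scale))
    return a

def curve(x, n_steps=200):
    """Trace one case x as the curve t -> x . a(t) for t in [0, 2*pi)."""
    d = len(x)
    if d % 2:
        raise ValueError("this sketch assumes an even number of variates")
    ts = [2 * math.pi * i / n_steps for i in range(n_steps)]
    return [(t, sum(xi * ai for xi, ai in zip(x, direction(t, d)))) for t in ts]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # one hypothetical 8-d case
y = [v + 0.01 for v in x]                      # a nearby case
cx, cy = curve(x), curve(y)
```

By the Cauchy-Schwarz inequality, the two curves never differ by more than the distance between the cases.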
Example: mutual support and shapes … a 3-d projection; shape from all dimensions … How many groups?
Example: mutual support and shapes … groups found here; same in all dimensions? … How many groups?
Example: mutual support and shapes … observe effect here; split black group by shape … How many groups?
Example: mutual support and shapes … get new 3-d projection, coloured by shape … five groups corroborated
Example: exploratory data analysis … How many groups?
Example: exploratory data analysis … choose data to cut away … explore the rest … distinguish groups
Example: exploratory data analysis … bring data back … explore all together … some black with red? … focus on centre
Example: exploratory data analysis … explore separately … mark group … discard new view … explore all together … two groups
Interactive clustering
• visual grouping
   • location, motion, shape, texture, ...
   • linking across displays
• manual
   • selection
      • cases, variates, groups, ...
   • colouring
   • focus
• immediate and incremental
• context can be used to form groups
• multiple partitions
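The linking and colouring operations listed above suggest one simple software structure. Per-case state (colour and selection) lives in one shared object, and every display redraws from it when it changes. The class and method names below are hypothetical. This is an observer-style sketch, not the design of any particular system.

```python
class LinkedCases:
    """Shared state for n cases: a colour per case plus a selection set.
    Every registered display is notified after each change."""

    def __init__(self, n, colour="grey"):
        self.colours = [colour] * n
        self.selected = set()
        self._listeners = []

    def register(self, callback):
        self._listeners.append(callback)

    def _notify(self):
        for callback in self._listeners:
            callback(self)

    def select(self, cases):
        """Replace the current selection (e.g. after brushing in any display)."""
        self.selected = set(cases)
        self._notify()

    def colour_selected(self, colour):
        """Mark the selected cases as a group by giving them a colour."""
        for i in self.selected:
            self.colours[i] = colour
        self._notify()


class RecordingDisplay:
    """Stand-in display that just records the colours it was asked to draw."""

    def __init__(self, cases):
        self.drawn = None
        cases.register(self.redraw)

    def redraw(self, cases):
        self.drawn = list(cases.colours)


cases = LinkedCases(4)
scatter, case_list = RecordingDisplay(cases), RecordingDisplay(cases)
cases.select([1, 2])
cases.colour_selected("red")
```

Because displays only read the shared state, a new kind of display can be added without changing the others.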
Automated clustering: typical software
• resources dedicated to numerical computation
• teletype interaction
• runs to completion
• graphical “output”
• don’t always work so well (no universal solution)
• confirm via exploratory data analysis
Must be integrated with interactive methods.
Example: K-means clustering … K = 2 groups. The starting groups shown have the centre ball in one group; K-means moves one point at a time to “improve” the 2 groups.
Example: K-means clustering … K = 2 groups. The final groups shown maximize an F-like statistic (between/within). The central ball is lost: K-means is poor for this data configuration.
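The one-point-at-a-time improvement can be sketched directly. This is a simple local search assumed for illustration, not the exact implementation used on the slides. Each case moves to another group whenever the move raises the ratio of between-group to within-group sum of squares. The total sum of squares is fixed, so raising that ratio is the same as lowering the within-group sum of squares, which is the usual K-means criterion.

```python
def centroid(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def within_ss(points, labels, k):
    """Sum of squared distances of cases from their own group means."""
    total = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        if members:
            c = centroid(members)
            total += sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in members)
    return total

def f_ratio(points, labels, k):
    """Between/within sum of squares: an F-like statistic without df scaling."""
    total = within_ss(points, [0] * len(points), 1)
    within = within_ss(points, labels, k)
    return (total - within) / within if within > 0 else float("inf")

def single_moves(points, labels, k):
    """Move one case at a time while the ratio improves; never empty a group."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            best = f_ratio(points, labels, k)
            for g in range(k):
                if g == labels[i] or labels.count(labels[i]) == 1:
                    continue
                trial = labels[:]
                trial[i] = g
                score = f_ratio(points, trial, k)
                if score > best + 1e-12:
                    labels, best, improved = trial, score, True
    return labels

points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
final = single_moves(points, [0, 0, 0, 0, 1, 1], k=2)
```

Like any local search, it stops at the first partition that no single move improves.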
Example: VERI (Visual Empirical Regions of Influence) … join two points if no third point falls in this region
Visual Empirical Regions of Influence
• uses psychophysical experiments on human visual perception to join data points
   • very special circumstances (two lines of three equi-spaced points each)
   • works well on demonstration 2-d cases
• extends to higher dimensions
   • two points are joined or not depending on their joint configuration with a third point
   • each third point examined forms a plane with the candidate pair, so the VERI shape applies
   • works in high-d with published demonstration cases
Example: VERI … Each colour is a different group found by VERI. The central ball is lost. VERI fails for this data configuration (and also for small perturbations of the demonstration cases). There is no universal method, nor can there be.
Example: VERI (with parameters) … the VERI algorithm, now parameterized to shrink the region size. It becomes the minimal spanning tree in the limit (the MST gets 2 groups here). Again, no universal method is possible, but methods can be parameterized.
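The joining rule and its parameterization can be sketched with a pluggable region. The actual VERI region is derived from the psychophysical experiments and is not reproduced here. As a hypothetical stand-in, a third point r blocks the pair (p, q) when r lies within beta·|pq| of both p and q. With beta = 1 this is the lune of the relative neighbourhood graph, and larger beta enlarges the region. Groups are the connected components of the resulting graph.

```python
import math

def in_region(p, q, r, beta):
    """Hypothetical stand-in region (NOT the VERI shape): r blocks the pair
    (p, q) when it lies within beta * |pq| of both p and q."""
    d = math.dist(p, q)
    return math.dist(r, p) < beta * d and math.dist(r, q) < beta * d

def region_groups(points, beta=1.5):
    """Join each pair not blocked by any third point; return component labels."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            blocked = any(in_region(points[i], points[j], points[k], beta)
                          for k in range(n) if k != i and k != j)
            if not blocked:
                parent[find(i)] = find(j)
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
```

On these two well-separated runs of three points, beta = 1.5 blocks every link across the gap and returns two groups. With beta = 1, the same points form a single group, so the answer depends on the parameter.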
Integrating automatic methods: move about the space of partitions, $P_a \to P_b \to P_c \to \cdots$. Which operators $f$, with $f(P_a) = P_b$, are of interest?
Refine … Reduce … moves need not be nested; nesting produces a hierarchy
Reassign
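The three partition moves (refine, reduce, reassign) become small functions once a partition is stored as a list of group labels, one per case. The function names below are hypothetical. `is_refinement` checks the nesting that turns a sequence of refinements into a hierarchy.

```python
def refine_group(labels, moved):
    """Refine: split the cases in `moved` (all from one group) into a new group."""
    groups = {labels[i] for i in moved}
    if len(groups) != 1:
        raise ValueError("a refinement splits exactly one group")
    new_label = max(labels) + 1
    return [new_label if i in moved else lab for i, lab in enumerate(labels)]

def reduce_groups(labels, a, b):
    """Reduce: merge group b into group a."""
    return [a if lab == b else lab for lab in labels]

def reassign_case(labels, i, group):
    """Reassign: move case i into another group."""
    out = list(labels)
    out[i] = group
    return out

def is_refinement(finer, coarser):
    """True if every group of `finer` lies inside a single group of `coarser`."""
    home = {}
    for f, c in zip(finer, coarser):
        if home.setdefault(f, c) != c:
            return False
    return True

p1 = [0, 0, 0, 1, 1]
p2 = refine_group(p1, {2})      # [0, 0, 2, 1, 1]
p3 = reduce_groups(p2, 1, 2)    # [0, 0, 1, 1, 1]
p4 = reassign_case(p3, 0, 1)    # [1, 0, 1, 1, 1]
```

Each move returns a new partition and leaves its input untouched, so a sequence $P_a \to P_b \to \cdots$ can be kept and revisited.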
Refinement sequence (1): begin with the partition containing all points in one group.
Refinement sequence (1 -> 2): refine the partition to move to a new partition containing two groups. This refinement was obtained by projecting all points onto the eigenvector of the largest eigenvalue of the sample variance-covariance matrix and splitting at the largest gap between the projected points. Blue points are on the outer sphere.
Refinement sequence (1 -> 2 -> 3): refine partition 2 to move to a new partition containing three groups.
Refinement move:
• select the group whose sample var-cov matrix has the largest eigenvalue
• for that group, project and split as before.
Green points are also on the outer sphere.
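The refinement move above can be sketched in NumPy. The sketch assumes the data are an n × p array with p ≥ 2 and that at least one group has two or more cases. It follows the rule on the slides: project on the leading eigenvector of the sample variance-covariance matrix, then split at the largest gap. The function names are hypothetical. The sign of an eigenvector is arbitrary, so which side of the gap receives the new label can vary.

```python
import numpy as np

def split_by_largest_gap(X):
    """Project rows of X on the leading eigenvector of their sample
    covariance; return a boolean mask of the cases above the largest gap."""
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    z = X @ eigvecs[:, -1]               # projection on the leading direction
    order = np.sort(z)
    cut = order[np.argmax(np.diff(order))]  # last value below the largest gap
    return z > cut

def refine_once(X, labels):
    """Pick the group whose covariance has the largest leading eigenvalue
    and split it at its largest projected gap."""
    labels = np.asarray(labels).copy()
    best_group, best_val = None, -np.inf
    for g in np.unique(labels):
        members = X[labels == g]
        if len(members) < 2:
            continue
        top = np.linalg.eigvalsh(np.cov(members, rowvar=False))[-1]
        if top > best_val:
            best_group, best_val = g, top
    idx = np.flatnonzero(labels == best_group)
    upper = split_by_largest_gap(X[idx])
    labels[idx[upper]] = labels.max() + 1
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
lab1 = refine_once(X, [0] * 6)   # partition 1 -> 2
lab2 = refine_once(X, lab1)      # partition 2 -> 3
```

Each call is one refinement move in the sense above: it splits a single group and leaves the others intact. Repeated calls therefore produce a nested, hierarchical sequence of partitions.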