Chapter 4 Clustering: Neural Network Based Approaches
Neural Network Based Approaches
Figure 5.12 Single artificial neuron with three inputs
Table 5.4 All possible input patterns from Figure 5.12
Finding the Output Value
w1 = 2, w2 = -4, w3 = 1
Table 5.5 Input patterns for the neural network
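The slide's weight values can be checked with a short sketch, assuming binary inputs and a hard-threshold activation that outputs 1 when the weighted sum exceeds 0 (the activation and threshold are assumptions, since the slide does not state them):

```python
# Worked sketch of the single neuron in Figure 5.12.
# Assumption: binary inputs and a step activation with threshold 0.
from itertools import product

weights = [2, -4, 1]  # w1, w2, w3 from the slide

def neuron_output(inputs, weights, threshold=0):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

# Enumerate all 2^3 input patterns, as in Table 5.4.
for pattern in product([0, 1], repeat=3):
    print(pattern, neuron_output(pattern, weights))
```

For example, input (1, 0, 1) gives a weighted sum of 2 + 0 + 1 = 3 > 0, so the neuron fires 1, while (1, 1, 1) gives 2 - 4 + 1 = -1, so it outputs 0.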
Kohonen Self-Organizing Map • Invented by Teuvo Kohonen, a professor at the Academy of Finland, in 1979. • Provides a data visualization technique that helps in understanding high-dimensional data by reducing its dimensions to a map. • A SOM also embodies the clustering concept by grouping similar data together.
Kohonen Self-Organizing Map SOM reduces data dimensions and displays similarities among data.
Kohonen SOM • The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. • The procedure for placing a vector from data space onto the map is first to find the node whose weight vector is closest to the vector taken from data space. • Once the closest node is located, it is assigned the values from the vector taken from data space.
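The closest-node search described above can be sketched as a nearest-neighbour lookup over the node weights; the grid layout and node values below are illustrative:

```python
# Minimal sketch of locating the map node whose weight vector has the
# smallest Euclidean distance to the input vector (the "closest node").
import math

def best_matching_unit(input_vec, node_weights):
    """node_weights maps (x, y) grid position -> weight vector."""
    def dist(w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_vec, w)))
    return min(node_weights, key=lambda pos: dist(node_weights[pos]))

# Illustrative 2-D weight vectors on a tiny grid.
nodes = {(0, 0): [0.1, 0.9], (0, 1): [0.8, 0.2], (1, 0): [0.5, 0.5]}
print(best_matching_unit([0.9, 0.1], nodes))  # → (0, 1)
```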
How the algorithm works • Initialize the weights • Get the best matching unit • Scale neighbors • Determine neighbors • Learning
Training a SOM
Training occurs in several steps and over many iterations:
1. Each node's weights are initialized.
2. A vector is chosen at random from the set of training data and presented to the lattice.
3. Every node is examined to determine which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4. The radius of the neighbourhood of the BMU is calculated. This value starts large, typically set to the 'radius' of the lattice, and diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU's neighbourhood.
5. The weights of each neighbouring node (the nodes found in step 4) are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered.
6. Repeat from step 2 for N iterations.
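The steps above can be sketched as a compact training loop. The grid size, the Gaussian neighbourhood falloff, and the exponential decay schedules for radius and learning rate are illustrative assumptions, not prescribed by the slides:

```python
# Sketch of SOM training on random 3-D "color" vectors over a small
# 2-D grid. Hyperparameters below are illustrative assumptions.
import math
import random

random.seed(0)
SIDE, DIM, ITERS = 8, 3, 500
weights = {(x, y): [random.random() for _ in range(DIM)]
           for x in range(SIDE) for y in range(SIDE)}   # step 1: init
data = [[random.random() for _ in range(DIM)] for _ in range(50)]

sigma0, lr0 = SIDE / 2.0, 0.5
for t in range(ITERS):
    x = random.choice(data)                             # step 2: sample
    bmu = min(weights, key=lambda p: sum(
        (a - b) ** 2 for a, b in zip(x, weights[p])))   # step 3: BMU
    sigma = sigma0 * math.exp(-t / ITERS)               # step 4: radius shrinks
    lr = lr0 * math.exp(-t / ITERS)
    for pos, w in weights.items():                      # step 5: pull neighbours
        d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        if d2 <= sigma ** 2:
            h = math.exp(-d2 / (2 * sigma ** 2))        # closer → bigger pull
            for i in range(DIM):
                w[i] += lr * h * (x[i] - w[i])
```

After enough iterations, nearby grid nodes end up with similar weight vectors, which is what produces the smooth "map" of the data.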
Components of a SOM 1. Data • Colors are represented in three dimensions (red, green, and blue). The idea of the self-organizing map is to project the n-dimensional data (here it would be colors, with 3 dimensions) into something that can be better understood visually (in this case, a 2-dimensional image map).
Components of a SOM 2. Weight Vectors • Each weight vector has two components. • The first part of a weight vector is its data (R, G, B). • The second part of a weight vector is its natural location (x, y).
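A node's two-part weight vector can be sketched as a small record; the field names here are illustrative:

```python
# Sketch of one SOM node as described above: a data part (R, G, B)
# that is adjusted during training, and a fixed grid location (x, y).
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    x: int              # grid location: does not change during training
    y: int
    rgb: List[float]    # data part: pulled toward inputs during training

n = Node(x=2, y=3, rgb=[0.9, 0.1, 0.4])
print(n.x, n.y, n.rgb)
```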
The problem • With these two components (the data and the weight vectors), how can one order the weight vectors in such a way that they will represent the similarities of the sample vectors?
Algorithm
Initialize map
For t from 0 to 1:
    Randomly select a sample
    Get best matching unit
    Scale neighbours
    Increase t a small amount
End for
Scaling & Neighborhood Function
Equation (5.1): wi(t+1) = wi(t) + hck(t) [x(t) - wi(t)], where hck(t) is the neighbourhood function, centred on the winning node c, that shrinks over time.
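Equation (5.1) translates directly to code. A Gaussian neighbourhood scaled by a learning rate is a common choice for hck(t), but its exact form is an assumption here:

```python
# Equation (5.1) as code: each neighbour i of the winner c moves toward
# the input x by a fraction h given by the neighbourhood function.
import math

def update_weight(w_i, x, h):
    """wi(t+1) = wi(t) + h * (x(t) - wi(t))"""
    return [wi + h * (xi - wi) for wi, xi in zip(w_i, x)]

def gaussian_h(dist_to_winner, sigma, lr):
    # Assumption: Gaussian falloff combined with a learning rate lr.
    return lr * math.exp(-dist_to_winner ** 2 / (2 * sigma ** 2))

# The winner itself (distance 0) moves a fraction lr of the way to x.
w = update_weight([0.2, 0.8], [1.0, 0.0], gaussian_h(0.0, 1.0, 0.5))
print(w)
```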
Numerical Demonstration
Figure 5.14 SOM
Table 5.6 Values of the weights for SOM
Text Clustering • An increasingly popular technique for grouping similar documents • A search engine may retrieve documents and present them as groups • For example, the keyword "cricket" may retrieve two clusters: • documents related to the sport of cricket • documents related to the cricket insect
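As a toy sketch of the idea, documents can be grouped by word overlap (Jaccard similarity), separating the two senses of "cricket". Real search engines use TF-IDF vectors with k-means or similar; the documents and the similarity threshold below are illustrative assumptions:

```python
# Toy text clustering: put a document in the first cluster whose
# representative it sufficiently overlaps with, else start a new cluster.
def words(doc):
    return set(doc.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b)

docs = [
    "cricket match won by the batting team",
    "the cricket team scored many runs",
    "the cricket insect chirps at night",
    "an insect chirps in the grass",
]

clusters = []
for d in docs:
    for c in clusters:
        if jaccard(words(d), words(c[0])) > 0.2:  # threshold is assumed
            c.append(d)
            break
    else:
        clusters.append([d])
print(len(clusters))  # the sport documents and insect documents separate
```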