Chapter 4 Clustering: Neural Network Based Approaches
Neural Network Based Approaches
Figure 5.12 Single artificial neuron with three inputs
Table 5.4 All possible input patterns from Figure 5.12
Finding the Output Value
w1 = 2, w2 = -4, w3 = 1
Table 5.5 Input patterns for the neural network
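The slide's weight values can be checked with a short sketch, assuming binary inputs and a hard-threshold activation that outputs 1 when the weighted sum exceeds 0 (the activation and threshold are assumptions, since the slide does not state them):

```python
# Worked sketch of the single neuron in Figure 5.12.
# Assumption: binary inputs and a step activation with threshold 0.
from itertools import product

weights = [2, -4, 1]  # w1, w2, w3 from the slide

def neuron_output(inputs, weights, threshold=0):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

# Enumerate all 2^3 input patterns, as in Table 5.4.
for pattern in product([0, 1], repeat=3):
    print(pattern, neuron_output(pattern, weights))
```

For example, input (1, 0, 1) gives a weighted sum of 2 + 0 + 1 = 3 > 0, so the neuron fires 1, while (1, 1, 1) gives 2 - 4 + 1 = -1, so it outputs 0.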
Kohonen Self-Organizing Map • Invented by Teuvo Kohonen, a professor at the Academy of Finland, in 1979. • Provides a data visualization technique that helps in understanding high-dimensional data by reducing its dimensions to a map. • A SOM also embodies the clustering concept by grouping similar data together.
Kohonen Self-Organizing Map SOM reduces data dimensions and displays similarities among data.
Kohonen SOM • The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. • The procedure for placing a vector from data space onto the map is first to find the node whose weight vector is closest to the vector taken from data space. • Once the closest node is located, it is assigned the values from the vector taken from data space.
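The closest-node search described above can be sketched as a nearest-neighbour lookup over the node weights; the grid layout and node values below are illustrative:

```python
# Minimal sketch of locating the map node whose weight vector has the
# smallest Euclidean distance to the input vector (the "closest node").
import math

def best_matching_unit(input_vec, node_weights):
    """node_weights maps (x, y) grid position -> weight vector."""
    def dist(w):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_vec, w)))
    return min(node_weights, key=lambda pos: dist(node_weights[pos]))

# Illustrative 2-D weight vectors on a tiny grid.
nodes = {(0, 0): [0.1, 0.9], (0, 1): [0.8, 0.2], (1, 0): [0.5, 0.5]}
print(best_matching_unit([0.9, 0.1], nodes))  # → (0, 1)
```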
How the algorithm works • Initialize the weights • Get the best matching unit • Scale neighbors • Determine neighbors • Learning
Training a SOM
Training occurs in several steps and over many iterations:
1. Each node's weights are initialized.
2. A vector is chosen at random from the set of training data and presented to the lattice.
3. Every node is examined to determine which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4. The radius of the neighbourhood of the BMU is calculated. This value starts large, typically set to the 'radius' of the lattice, and diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU's neighbourhood.
5. The weights of each neighbouring node (the nodes found in step 4) are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered.
6. Repeat from step 2 for N iterations.
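The steps above can be sketched as a compact training loop. The grid size, the Gaussian neighbourhood falloff, and the exponential decay schedules for radius and learning rate are illustrative assumptions, not prescribed by the slides:

```python
# Sketch of SOM training on random 3-D "color" vectors over a small
# 2-D grid. Hyperparameters below are illustrative assumptions.
import math
import random

random.seed(0)
SIDE, DIM, ITERS = 8, 3, 500
weights = {(x, y): [random.random() for _ in range(DIM)]
           for x in range(SIDE) for y in range(SIDE)}   # step 1: init
data = [[random.random() for _ in range(DIM)] for _ in range(50)]

sigma0, lr0 = SIDE / 2.0, 0.5
for t in range(ITERS):
    x = random.choice(data)                             # step 2: sample
    bmu = min(weights, key=lambda p: sum(
        (a - b) ** 2 for a, b in zip(x, weights[p])))   # step 3: BMU
    sigma = sigma0 * math.exp(-t / ITERS)               # step 4: radius shrinks
    lr = lr0 * math.exp(-t / ITERS)
    for pos, w in weights.items():                      # step 5: pull neighbours
        d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        if d2 <= sigma ** 2:
            h = math.exp(-d2 / (2 * sigma ** 2))        # closer → bigger pull
            for i in range(DIM):
                w[i] += lr * h * (x[i] - w[i])
```

After enough iterations, nearby grid nodes end up with similar weight vectors, which is what produces the smooth "map" of the data.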
Components of a SOM 1. Data • Colors are represented in three dimensions (red, green, and blue). The idea of the self-organizing map is to project the n-dimensional data (here it would be colors, with 3 dimensions) into something that can be better understood visually (in this case, a 2-dimensional image map).
Components of a SOM 2. Weight Vectors • Each weight vector has two components. • The first part of a weight vector is its data (R, G, B). • The second part of a weight vector is its natural location (x, y).
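A node's two-part weight vector can be sketched as a small record; the field names here are illustrative:

```python
# Sketch of one SOM node as described above: a data part (R, G, B)
# that is adjusted during training, and a fixed grid location (x, y).
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    x: int              # grid location: does not change during training
    y: int
    rgb: List[float]    # data part: pulled toward inputs during training

n = Node(x=2, y=3, rgb=[0.9, 0.1, 0.4])
print(n.x, n.y, n.rgb)
```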
The problem • With these two components (the data and the weight vectors), how can one order the weight vectors in such a way that they will represent the similarities of the sample vectors?
Algorithm
Initialize map
For t from 0 to 1:
    Randomly select a sample
    Get best matching unit
    Scale neighbours
    Increase t a small amount
End for
Scaling & Neighborhood Function
Equation (5.1): wi(t+1) = wi(t) + hck(t) [x(t) - wi(t)], where hck(t) is the neighbourhood function, centred on the winning node c, that shrinks over time.
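Equation (5.1) translates directly to code. A Gaussian neighbourhood scaled by a learning rate is a common choice for hck(t), but its exact form is an assumption here:

```python
# Equation (5.1) as code: each neighbour i of the winner c moves toward
# the input x by a fraction h given by the neighbourhood function.
import math

def update_weight(w_i, x, h):
    """wi(t+1) = wi(t) + h * (x(t) - wi(t))"""
    return [wi + h * (xi - wi) for wi, xi in zip(w_i, x)]

def gaussian_h(dist_to_winner, sigma, lr):
    # Assumption: Gaussian falloff combined with a learning rate lr.
    return lr * math.exp(-dist_to_winner ** 2 / (2 * sigma ** 2))

# The winner itself (distance 0) moves a fraction lr of the way to x.
w = update_weight([0.2, 0.8], [1.0, 0.0], gaussian_h(0.0, 1.0, 0.5))
print(w)
```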
Numerical Demonstration
Figure 5.14 SOM
Table 5.6 Values of the weights for SOM
Text Clustering • An increasingly popular technique for grouping similar documents • A search engine may retrieve documents and present them as groups • For example, the keyword "cricket" may retrieve two clusters: • documents related to the sport of cricket • documents related to the cricket insect
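As a toy sketch of the idea, documents can be grouped by word overlap (Jaccard similarity), separating the two senses of "cricket". Real search engines use TF-IDF vectors with k-means or similar; the documents and the similarity threshold below are illustrative assumptions:

```python
# Toy text clustering: put a document in the first cluster whose
# representative it sufficiently overlaps with, else start a new cluster.
def words(doc):
    return set(doc.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b)

docs = [
    "cricket match won by the batting team",
    "the cricket team scored many runs",
    "the cricket insect chirps at night",
    "an insect chirps in the grass",
]

clusters = []
for d in docs:
    for c in clusters:
        if jaccard(words(d), words(c[0])) > 0.2:  # threshold is assumed
            c.append(d)
            break
    else:
        clusters.append([d])
print(len(clusters))  # the sport documents and insect documents separate
```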