370 likes | 645 Views
CSE5230/DMS/2004/8. Data Mining - CSE5230. S elf-Organizing Maps (SOMs). Lecture Outline. Motivation unsupervised learning the cortex topographic feature maps biological self-organizing maps Artificial self-organizing maps Kohonen’s self-organizing network learning algorithm examples
E N D
CSE5230/DMS/2004/8 Data Mining - CSE5230 Self-Organizing Maps (SOMs)
Lecture Outline • Motivation • unsupervised learning • the cortex • topographic feature maps • biological self-organizing maps • Artificial self-organizing maps • Kohonen’s self-organizing network • learning algorithm • examples • Data mining examples • Text mining • Customer Understanding
Lecture Objectives • By the end of this lecture you should be able to • Explain the principal differences between MLPs and SOMs • Describe the properties of a topological feature map, with particular attention to the notion of similarity in feature space being mapped to proximity in the SOM • Describe how Kohonen networks are trained • Give examples of how SOMs can be used in data mining
Motivation - 1 • The feed-forward back-propagation NNs discussed last week are an example of a supervised learning technique • In supervised learning, the aim is to discover a relationship between the inputs and outputs of a system • This relationship can be used for tasks such as prediction, estimation or classification • A known training set of input/output pairs is used to train the network
Motivation - Unsupervised Learning • Many data mining tasks are not suited to this approach • Often the data mining task is to discover structure in the data set, without any prior knowledge of what is there • This is an example of unsupervised learning (we have already seen the example of the K-means clustering algorithm) • A class of neural networks called Self-Organizing Maps (SOMs) can be used for this task
The Cortex - 1 • SOMs research was inspired by the observation of topologically correct sensory maps in the cortex (e.g. the retinotopic, somatotopic, tonotopic maps) • In humans, the cortex consists of a layer of nerve tissue about 0.2m2 in area and 2-3mm in thickness • It is highly convoluted to save space, and forms the exterior of the brain - it’s the folded, wrinkled stuff we see when we look at a brain
The Cortex - 2 Lateral (schematic) view of the human left-brain hemisphere. Various cortical areas devoted to specialized tasks can be distinguished [RMS1992, p. 18]
Sensory Surfaces • Most signals that the brain receives from the environment come from “sensory surfaces” covered with receptors: • skin (touch and temperature) • retina (vision) • cochlea [in the ear] (1-D sound sensor) • It is usually found that the “wiring” of the nervous system exhibits topographic ordering: • signals from adjacent receptors tend to be conducted to adjacent neurons in the cortex
Topographic Feature Maps - 1 • This neighbourhood-preserving organization of the cortex is called a topographic feature map • For touch, maps of the body are found in the somatosensory cortex • In the primary visual cortex, neighbouring neurons tend to respond to stimulation of neighbouring regions of the retina • As well as these simple maps, the brain also constructs topographic maps of abstract features: • In the auditory cortex of many higher brains, a tonotopic map is found, where the pitch of received sounds is mapped regularly
Topographic Feature Maps - 2 Map of part of the body surface in the somatosensory cortex of a monkey Direction map for sound signals in the so-called “optical tectum” of an owl [RMS1992, p. 21]
Biological Self-Organizing Maps - 1 • The subject of SOMs arose from the question of how such topology-preserving mappings might arise in neural networks • It is probable that in biological systems that much of the organization of such maps is genetically determined, BUT: • The brain is estimated to have ~1013 synapses (connections), so it would be impossible to produce this organization by specifying each connection in detail – the genome does not contain that much information
Biological Self-Organizing Maps - 2 • A more likely scenario is that there are genetically specified mechanisms of structure formation that result in the creation of the desired connectivity • These could operate before birth, or as part of later maturation, involving interaction with the environment • There is much evidence for such changes: • the normal development of edge-detectors in the visual cortex of newborn kittens is suppressed in the absence of sufficient visual experience • the somatosensory maps of adult monkeys have been observed to adapt following the amputation of a finger
Biological Self-Organizing Maps - 3 Readaptation of the somatosensory map of the hand region of an adult nocturnal ape due to the amputation of one finger. Several weeks after the amputation of the middle finger (3), the assigned region has disappeared and the adjacent regions have spread out. [RMS, p. 117]
Artificial Self-Organizing Maps - 1 • In the NN models we have seen so far, every neuron in a layer is connected to every neuron in the next layer of the network • The location of a neuron in a layer plays no role in determining its connectivity or weights • With SOMs, the ordering of neurons within a layer plays an important role: • How should the neurons organize their connectivity to optimize the spatial distribution of their responses within the layer?
Artificial Self-Organizing Maps - 2 • The purpose of this optimization is to achieve the mapping: • Such a mapping allows neurons with similar tasks to communicate over especially short connection paths - important for a massively parallel system • Moreover, it results in the formation of topographic feature maps: • most important similarity relationships among the input signals are converted into spatial relationships between responding neurons Similarity of features Proximity of excited neurons
Kohonen’s Self-Organizing Network - 1 • Kohonen [Koh1982] studied a system consisting of a two-dimensional layer of neurons, with the properties: • each neuron identified by its position vector r (i.e. its coordinates) • input signals to the layer represented by a feature vector x (usually normalized) • output of each neuron is a sigmoidal function of its total activation (as for MLPs last week):
Kohonen’s Self-Organizing Network - 2 • Each neuron r forms the weighted sum of the input signals. The external activation is:(the magnitudes of the weight vectors are usually normalized) • In addition to the input connections, the neurons in the layer are connected to each other • the layer has internal feedback • The weight from neuron r’ to neuron r is labelled grr’ • These lateral inputs are superimposed on the external input signal:
Kohonen’s Self-Organizing Network - 3 • The output of neuron r is this given by: • The neuron activities are the solutions of this system of non-linear equations The feedback due to the lateral connections grr’ is usually arranged so that it is excitatory at small distances and inhibitory at large distances. This is often called a “Mexican Hat” response
Kohonen’s Self-Organizing Network - 4 • The solution of such systems of non-linear equations is tedious and time-consuming. Kohonen avoided this by introducing a simplification. Kohonen’s model showing excitation zone around “winning” neuron [RMS p. 64]
Kohonen’s Self-Organizing Network - 5 • The response of the network is assumed to always be the same “shape”: • the response is 1 at the location of the neuron r* receiving maximal external excitation, and decreases to 0 as one moves away from r* • The excitation of neuron r is thus only a function of its distance from r*: • The model then proposes a rule for changing the weights to each neuron so that a topologically ordered map is formed. Weight change is:
Kohonen’s Self-Organizing Network - 6 • Experiments have shown that the precise shape of the response is not critical • A suitable function is thus simply chosen. The Gaussian is a suitable choice: • The parameter s determines the length scale on which input stimuli cause changes in the map • usually learn coarse structure first and then the fine structure. This is done by letting s decrease over time • e on the previous slide, which specifies the size of each change, usually also decreases over time
Learning Algorithm 0. Initialization: start with appropriate initial values for the weights wr. Usually just random 1. Choice of stimulus: Choose an input vector x at random from the data set 2. Response: Determine the “winning” neuron r* most strongly activated by x 3. Adaptation: Carry out a “learning” step by modifying the weights:(Normalize weights if required) 4. Continue with step 1 until specified number of learning steps are completed
Examples - 1 SOM that has learnt data uniformly distributed on a square SOM that has learnt data on a rotated square, where points are twice as likely to occur in a circle at the centre of the square (relationship to clustering)
Examples - 2 2-dimensional SOM that has learnt data uniformly distributed in a 3-dimensional cube
Examples - 3 1-dimensional SOM that has learnt data uniformly distributed in a 2-dimensional circle
Examples - 4 2-dimensional SOM that has learnt 2-dimensional data containing 3 clusters
The SOM for Data Mining • The SOM is a good method for obtaining an initial understanding of a set of data about which the analyst does not have any opinion (e.g. no need to estimate number of clusters) • The map can be used as an initial unbiased starting point for further analysis. Once the clusters are selected from the map, they are analyzed to find out the reasons for such clustering • It may be possible to determine which attributes were responsible for the clusters • It may also be possible to identify some attributes which do not contribute to the clustering
Example: Text Mining with a SOM - 1 • This example comes from the WEBSOM project in Finland: http://websom.hut.fi/websom/ • WEBSOM is a method for organizing miscellaneous text documents onto meaningful maps for exploration and search. WEBSOM automatically organizes the documents onto a two-dimensional grid so that related documents appear close to each other
Example: Text Mining with a SOM - 2 • This map was constructed using more than one million documents from 83 USENET newsgroups: • Color denotes the density or the clustering tendency of the documents • Light (yellow) areas are clusters and dark (red) areas empty space between the clusters • This is a little difficult to read, but WEBSOM allows one to zoom in
Example: Text Mining with a SOM - 3 • Zoomed view of the WEBSOM map: blues - rec.music.bluenotebooks - rec.arts.books classical - rec.music.classical humor - rec.humor lang.dylan - comp.lang.dylan music - music shostakovich - alt.fan.shostakovich
Example: Customer Understanding with a SOM - 1 • This example is from [YaZ2001], using KDD 2000 Cup data: • clickstream and purchase data from Gazelle.com, a retailer of legware and legcare products • On-line retailers are interested in understanding their customers, so that they can • Better organize the website • Better target marketing • Improve strategies for acquiring and retaining customers • Gazelle.com was interested in analysing the differences between light ( $12) and heavy spenders ( $12)
Example: Customer Understanding with a SOM - 2 • Data set and Feature Selection • Data set has more than 1700 records, each with 426 features and a variable indicating light or heavy spending. Features include: • age (discrete) • income band (ordered), e.g. • < $15,000, $15,000-$19,999, $20,000-$29,999,… • percentage of discounted items in purchase (continous) • [YaZ2001] compared a variety of methods for generating a reduced feature set. These were adapted from criteria used in other DM techniques: • Discriminant analysis, decision tree, naïve Bayes, Principal Components Analysis (PCA) • The different methods highlighted a variety of features, e.g.: • discount rate, average and total weight of items, minimum shipping order amount, geographic location, house value, vendor, main template views, etc.
Example: Customer Understanding with a SOM - 3 • [YaZ2001] selected the eight variables indicated by discriminant analysis Projection onto the first two components provided by PCA of these data did not show clear separation into two clusters: x: heavy spendero: light spender This could indicate the presence of a non-linear relationship
Example: Customer Understanding with a SOM - 4 • Then applied a modified self-organizing map, called a Generative Topographic Mapping (GTM) to produce another 2-D visualization of the data: Separation of classes into seven clusters now much better: 1: heavy 88%, light 12%2: heaving 93%, light 7%3: heavy 100%4: light 100%5: light 94%, heavy 6%6: light 93%, heavy 7%7: light 97%, heavy 3%
Example: Customer Understanding with a SOM - 5 • Analysis of the features corresponding to these clusters reveals facts such as: • Cluster 4 (100% light) are those customers with more than 40% discounted items in their purchases • Clusters 1-3: Those who heard about the company from friend/family are light spenders……but those who heard from a means other than news, e-mail, print ad, direct mail, or friend/family were heavy spenders • Clusters 6-7: people who frequently wear casual or athletic socks are light spenders • Insights such as these could be used for managing marketing, and also pricing policies (e.g. discounts)
References • [Koh1982] Teuvo Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 43:59-69, 1982 • [RMS1992] Helge Ritter, Thomas Martinetz and Klaus Schulten, Neural computation and self-organizing maps: an introduction, Addison-Wesley, 1992 • [YaZ2001] Jinsan Yang and Byoung-Tak Zhang, Customer Data Mining and Visualization by Generative Topographic Mapping Methods, In Simeon J. Simoff, Monique Noirhomme-Fraiture and Michael H. Böhlen eds., Proceedings of the International Workshop on Visual Data Mining (VDM@ECML/PKDD2001), Freiburg, Germany, pp. 55-66, 4 September 2001