Neural Nets and Genetic Algorithms in Data Mining Muhammad Ali Yousuf
References • Based on Lectures by M. Zaki • http://www.cs.rpi.edu/~zaki/dmcourse/ • and David Squire • http://www.csse.monash.edu.au
Lecture Outline • Why study neural networks? • What are neural networks and how do they work? • History of artificial neural networks (NNs) • Applications and advantages • Choosing and preparing data • An illustrative example
Why Study Neural Networks? - 1 • Two basic motivations for NN research: • to model brain function • to solve engineering (and business) problems
Why Study Neural Networks? - 1 • So far as modeling the brain goes, it is worth remembering:“… metaphors for the brain are usually based on the most complex device currently available: in the seventeenth century the brain was compared to a hydraulic system, and in the early twentieth century to a telephone switchboard. Now, of course, we compare the brain to a digital computer.”
Why Study Neural Networks? - 2 • Historically, NN theories were first developed by neurophysiologists. • For engineers (and others), the attractions of NN processing include: • inherent parallelism • speed (avoiding the von Neumann bottleneck) • distributed “holographic” storage of information • robustness • generalization • learning by example rather than having to understand the underlying problem (a double-edged sword!)
Why Study Neural Networks? - 3 • It is important to be wary of the black-box characterization of NNs as “artificial brains”
Why Study Neural Networks? - 3 • Beware of the anthropomorphisms common in the field (let alone in popular coverage of NNs!) • learning • memory • training • forgetting • Remember that every NN is a mathematical model. There is usually a good statistical explanation of NN behaviour
What Is a Neuron? - 1 • a (biological) neuron is a node that has many inputs and one output • inputs come from other neurons or sensory organs • the inputs are weighted
What Is a Neuron? - 1 • weights can be both positive and negative • inputs are summed at the node to produce an activation value • if the activation is greater than some threshold, the neuron fires
What Is a Neuron? - 2 • In order to simulate neurons on a computer, we need a mathematical model of this node • node i has n inputs $x_j$ • each connection has an associated weight $w_{ij}$ • the net input to node i is the sum of the products of the connection inputs and their weights: $\mathrm{net}_i = \sum_{j=1}^{n} w_{ij}\, x_j$
What Is a Neuron? - 2 • The output of node i is determined by applying a non-linear transfer function $f_i$ to the net input: $o_i = f_i(\mathrm{net}_i)$
What Is a Neuron? - 3 • A common choice for the transfer function is the sigmoid: $f(\mathrm{net}) = \dfrac{1}{1 + e^{-\mathrm{net}}}$
What Is a Neuron? - 3 • The sigmoid has similar non-linear properties to the transfer function of real neurons: • bounded below by 0 • saturates when input becomes large • bounded above by 1
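To make the model concrete, here is a minimal Python sketch of such a model neuron (not from the original slides; the function and variable names are illustrative):

```python
import math

def sigmoid(net):
    # Bounded below by 0, bounded above by 1, saturates for large |net|
    return 1.0 / (1.0 + math.exp(-net))

def neuron_output(inputs, weights):
    # Net input: sum of the products of the inputs and their weights
    net = sum(w * x for w, x in zip(weights, inputs))
    # Output: the non-linear transfer function applied to the net input
    return sigmoid(net)

# Example: three inputs; weights may be positive or negative
print(neuron_output([0.5, 1.0, 0.2], [0.8, -0.4, 1.5]))
```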
What Is a Neural Network? • Now that we have a model for an artificial neuron, we can imagine connecting many of them together to form an Artificial Neural Network: [Figure: layered network with an input layer, a hidden layer, and an output layer]
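As a sketch only (assuming a fully connected feed-forward network with one hidden layer; the weight matrices here are arbitrary), the layered computation can be written as:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W_hidden, W_output):
    # Each layer forms weighted sums of its inputs and applies the sigmoid
    hidden = sigmoid(W_hidden @ x)       # input layer -> hidden layer
    return sigmoid(W_output @ hidden)    # hidden layer -> output layer

# Example: 4 inputs, 3 hidden nodes, 1 output node (random weights)
rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.1, 0.9])
print(forward(x, rng.normal(size=(3, 4)), rng.normal(size=(1, 3))))
```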
Applications of NNs • Predicting financial time series • Diagnosing medical conditions • Identifying clusters in customer databases • Identifying fraudulent credit card transactions • Hand-written character recognition (cheques) • Predicting the failure rate of machinery • and many more….
Using a Neural Network for Prediction - 1 • Identify inputs and outputs • Preprocess inputs - often scale to the range [0,1] • Choose an NN architecture (see next slide) • Train the NN with a representative set of training examples (usually using BP) • Test the NN with another set of known examples • often the known data set is divided into training and test sets. Cross-validation is a more rigorous validation procedure. • Apply the model to unknown input data
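A hedged sketch of the preprocessing and splitting steps above (scaling inputs to [0,1] and dividing the known data into training and test sets); the function names and data are made up:

```python
import numpy as np

def scale_to_unit_range(X):
    # Min-max scaling of each input column to the range [0, 1]
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

def split_known_data(X, y, test_fraction=0.3, seed=0):
    # Randomly divide the known examples into a training set and a test set
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(len(X) * (1 - test_fraction))
    return (X[idx[:cut]], y[idx[:cut]]), (X[idx[cut:]], y[idx[cut:]])

# Hypothetical feature matrix (2 inputs) and known outputs
X = np.array([[1.0, 200.0], [2.0, 800.0], [3.0, 500.0], [4.0, 100.0]])
y = np.array([0, 1, 1, 0])
(X_train, y_train), (X_test, y_test) = split_known_data(scale_to_unit_range(X), y, 0.25)
```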
Using a Neural Network for Prediction - 2 • The network designer must decide the network architecture for a given application • In theory, a single hidden layer (with enough hidden nodes) can approximate any continuous function, so one hidden layer is sufficient for most situations of practical interest
Using a Neural Network for Prediction - 2 • The number of nodes in the hidden layer will determine the complexity of the NN model (and thus its capacity to recognize patterns) • BUT, too many hidden nodes will result in the memorization of individual training patterns, rather than generalization • Amount of available training data is an important factor - must be large for a complex model
An Example • [Figure: inputs such as living space, size of garage, age of house, heating type and other attributes feed into a neural network, which outputs an appraised value] • Note that here the network is treated as a “black box”
Issues in choosing the training data set • The neural network is only as good as the data set it is trained on • When selecting training data, the designer should consider: • Whether all important features are covered • What the important/necessary features are • The number of inputs • The number of outputs • Availability of hardware
Preparing Data • Preprocessing is usually the most complicated and time-consuming issue when working with NNs (as with any DM tool)
Preparing Data • Main types of data encountered: • Continuous data with known min/max values (range/domain known). There can be problems with skewed distributions: solutions include removing extreme values or applying a log transform • Ordered, discrete values: e.g. low, medium, high • Categorical values (no order): e.g. {“Male”, “Female”, “Unknown”} (use “1 of N coding” or “1 of N-1 coding”; see the sketch below)
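For illustration (not part of the slides), a log filter for a skewed continuous attribute and “1 of N coding” for a categorical one might be sketched as follows; the category list is only an example:

```python
import numpy as np

def log_filter(values):
    # Compress a skewed, non-negative continuous attribute with a log transform
    return np.log1p(np.asarray(values, dtype=float))

def one_of_n(value, categories):
    # "1 of N coding": one binary input per category
    return [1.0 if value == c else 0.0 for c in categories]

print(log_filter([10, 100, 10000]))                       # tames the skew
print(one_of_n("Female", ["Male", "Female", "Unknown"]))  # -> [0.0, 1.0, 0.0]
```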
Preparing Data • There will always be other problems where the analyst’s experience and ingenuity must be used
Illustrative Example - 1 (following http://www.geog.leeds.ac.uk/courses/level3/geog3110/week6/sld047.htm ff.) • Organization • a building society with 5 million customers, using a direct mailing campaign to promote a new investment product to existing savers
Illustrative Example - 1 (following http://www.geog.leeds.ac.uk/courses/level3/geog3110/week6/sld047.htm ff.) • Available data • The 5 million customer database • Results of an initial test mailing in which 50,000 customers (randomly selected) were mailed. There were 1000 responses (2%) in terms of product take-up • Objective • Find a way of targeting the mailing so that: • the response rate is doubled to 4% • at least 40,000 new investment holders are brought in
Illustrative Example - 2 • For simplicity we assume that only two attributes (features) of a customer are relevant for this situation: • TIMEAC: time (in years) that the account has been open • AVEBAL: average account balance over the past 3 months
Illustrative Example - 2 • Examining the data, it was clear to analysts that the pattern of respondents differs from that of the non-respondents. But what are the reasons for this? • We need to know the reasons in order to select/develop a model for identifying such responding customers
Illustrative Example - 3 • A neural network can be used to model this data without having to make any assumptions about the reasons for such patterns • Let a neural network learn the pattern from the data and classify the data for us • [Figure: AVEBAL and TIMEAC feed into a neural network, which outputs a SCORE]
Illustrative Example - 4: Preparing the training and test data sets • We have 1000 respondents. Randomly split them (together with an equal number of sampled non-respondents) into a training set and a test set: • Training set: 500 respondents + 500 non-respondents = 1000 records • Test set: 500 respondents + 500 non-respondents = 1000 records
Illustrative Example - 4: Preparing the training and test data sets • The network is trained by making repeated passes over the training data, adjusting weights using the BP algorithm
Illustrative Example - 5: Using the resultant network • Order the score values for the test set in descending order and plot the cumulative response against the number of customers mailed (see the sketch below) • The 45 degree line shows the results if random ranking is used (since the test set consists of 50% “good” customers) • The extent to which the graph deviates from the 45 degree line shows the power of the model to discriminate between good and bad customers • Now calculate the number of customers required to be mailed to achieve the company objective
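A small sketch (hypothetical names, made-up data) of how the ordered scores translate into such a chart:

```python
import numpy as np

def cumulative_gains(scores, responded):
    # Order the test-set customers by score, highest first
    order = np.argsort(scores)[::-1]
    hits = np.asarray(responded)[order]
    # Fraction of all respondents captured as progressively more customers are mailed
    gains = np.cumsum(hits) / hits.sum()
    mailed_fraction = np.arange(1, len(hits) + 1) / len(hits)
    return mailed_fraction, gains

# The 45 degree line corresponds to gains == mailed_fraction (random ranking)
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1])
responded = np.array([1, 0, 1, 0, 0, 1])
mailed_fraction, gains = cumulative_gains(scores, responded)
```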
Illustrative Example - 7 • Analysis shows that the company objectives are achievable: 40,000 product holders at a 4% response rate • This can save hundreds of thousands of dollars in mailing costs • It also performs better than the alternative model considered in this example
Unsupervised Learning - Motivation - 1 • The feed-forward back-propagation NNs discussed last week are an example of a supervised learning technique • In supervised learning, the aim is to discover a relationship between the inputs and outputs of a system
Unsupervised Learning - Motivation - 1 • This relationship can be used for tasks such as prediction, estimation or classification • A known training set of input/output pairs is used to train the network
Motivation - Unsupervised Learning • Many data mining tasks are not suited to this approach • Often the data mining task is to discover structure in the data set, without any prior knowledge of what is there
Motivation - Unsupervised Learning • This is an example of unsupervised learning (we have already seen the example of the K-means clustering algorithm) • A class of neural networks called Self-Organizing Maps (SOMs) can be used for this task
The Cortex - 1 • SOM research was inspired by the observation of topologically correct sensory maps in the cortex • In humans, the cortex consists of a layer of nerve tissue about 0.2 m² in area and 2-3 mm in thickness • It is highly convoluted to save space, and forms the exterior of the brain - it’s the folded, wrinkled stuff we see when we look at a brain
The Cortex - 2 • [Figure: lateral (schematic) view of the human left-brain hemisphere; various cortical areas devoted to specialized tasks can be distinguished [RMS1992, p. 18]]
Sensory Surfaces • Most signals that the brain receives from the environment come from “sensory surfaces” covered with receptors: • skin (touch and temperature) • retina (vision) • cochlea [in the ear] (1-D sound sensor) • It is usually found that the “wiring” of the nervous system exhibits topographic ordering: • signals from adjacent receptors tend to be conducted to adjacent neurons in the cortex
Kohonen’s Self-Organizing Network - 1 • Kohonen [Koh1982] studied a system consisting of a two-dimensional layer of neurons, with the following properties: • each neuron is identified by its position vector $\mathbf{r}$ (i.e. its coordinates) • input signals to the layer are represented by a feature vector $\mathbf{x}$ (usually normalized) • the output of each neuron is a sigmoidal function of its total activation (as for MLPs): $f_r = \sigma(a_r)$, with $\sigma(a) = 1/(1 + e^{-a})$
Kohonen’s Self-Organizing Network - 2 • Each neuron $\mathbf{r}$ forms the weighted sum of the input signals. The external activation is: $a_r^{\mathrm{ext}} = \sum_j w_{rj}\, x_j = \mathbf{w}_r \cdot \mathbf{x}$ (the magnitudes of the weight vectors are usually normalized) • In addition to the input connections, the neurons in the layer are connected to each other • the layer has internal feedback
Kohonen’s Self-Organizing Network - 2 • The weight from neuron $\mathbf{r}'$ to neuron $\mathbf{r}$ is labelled $g_{rr'}$ • These lateral inputs are superimposed on the external input signal: $a_r = \sum_j w_{rj}\, x_j + \sum_{r'} g_{rr'}\, f_{r'}$
Kohonen’s Self-Organizing Network - 3 • The output of neuron $\mathbf{r}$ is thus given by: $f_r = \sigma\!\left( \sum_j w_{rj}\, x_j + \sum_{r'} g_{rr'}\, f_{r'} \right)$ • The neuron activities $f_r$ are the solutions of this system of non-linear equations
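For illustration only (a hypothetical small layer with arbitrary weights; not from the slides), the neuron activities can be found by simple fixed-point iteration of this equation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical small layer: N neurons, M input signals (values are arbitrary)
rng = np.random.default_rng(0)
N, M = 25, 3
W = rng.normal(size=(N, M))              # input weights w_rj
G = rng.normal(scale=0.1, size=(N, N))   # lateral weights g_rr'
x = rng.normal(size=M)                   # input feature vector

# Iterate f_r = sigma( sum_j w_rj x_j + sum_r' g_rr' f_r' ) to a fixed point
f = np.zeros(N)
external = W @ x
for _ in range(200):
    f_new = sigmoid(external + G @ f)
    if np.max(np.abs(f_new - f)) < 1e-8:
        f = f_new
        break
    f = f_new
```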
Kohonen’s Self-Organizing Network - 3 The feedback due to the lateral connections grr’ is usually arranged so that it is excitatory at small distances and inhibitory at large distances. This is often called a “Mexican Hat” response
Kohonen’s Self-Organizing Network - 4 • The solution of such systems of non-linear equations is tedious and time-consuming. Kohonen avoided this by introducing a simplification. • [Figure: Kohonen’s model, showing the excitation zone around the “winning” neuron [RMS1992, p. 64]]
Kohonen’s Self-Organizing Network - 5 • The response of the network is assumed to always be the same “shape”: • the response is 1 at the location of the neuron $\mathbf{r}^*$ receiving maximal external excitation, and decreases to 0 as one moves away from $\mathbf{r}^*$ • The excitation of neuron $\mathbf{r}$ is thus only a function of its distance from $\mathbf{r}^*$: $f_r = h(\mathbf{r} - \mathbf{r}^*)$
Kohonen’s Self-Organizing Network - 5 • The model then proposes a rule for changing the weights to each neuron so that a topologically ordered map is formed. The weight change is: $\Delta w_{rj} = \varepsilon\, h(\mathbf{r} - \mathbf{r}^*)\,(x_j - w_{rj})$, where $\varepsilon$ is a small learning rate
Kohonen’s Self-Organizing Network - 6 • Experiments have shown that the precise shape of the response is not critical • A suitable function is thus simply chosen; the Gaussian is a common choice: $h(\mathbf{r} - \mathbf{r}^*) = \exp\!\left( -\dfrac{\|\mathbf{r} - \mathbf{r}^*\|^2}{2\sigma^2} \right)$
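Putting the simplified model together, a minimal SOM training sketch under these assumptions (small 2-D grid, random 2-D inputs; the winner is chosen by minimum Euclidean distance, which for normalized vectors is equivalent to maximal external excitation; all names and parameter values are illustrative):

```python
import numpy as np

def train_som(data, grid_shape=(10, 10), epochs=20, eps=0.1, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates r of each neuron, and random initial weight vectors w_r
    coords = np.array([[i, j] for i in range(rows) for j in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    for _ in range(epochs):
        for x in data:
            # Winning neuron r*: the one whose weight vector is closest to x
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood h(r - r*) on the grid
            d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            # Kohonen update: move weights toward x, scaled by the neighbourhood
            weights += eps * h[:, None] * (x - weights)
    return coords, weights

# Example: learn a topologically ordered map of uniform 2-D data
coords, weights = train_som(np.random.default_rng(1).random((500, 2)))
```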