Advanced Information Retrieval Chapter 02: Modeling - Neural Network Model
Neural Network Model • A neural network is an oversimplified representation of the neuron interconnections in the human brain: • nodes are processing units • edges are synaptic connections • the strength of a propagating signal is modelled by a weight assigned to each edge • the state of a node is defined by its activation level • depending on its activation level, a node might issue an output signal
Neural Networks [Figure: biological neuron with dendrites, cell body, and axon] • Complex learning systems recognized in animal brains • Single neuron has simple structure • Interconnected sets of neurons perform complex learning tasks • Human brain has about 10^15 synaptic connections • Artificial Neural Networks attempt to replicate the non-linear learning found in nature
Neural Networks (cont’d) • Dendrites gather inputs from other neurons and combine the information • Then generate non-linear response when threshold reached • Signal sent to other neurons via axon • Artificial neuron model is similar • Data inputs (x_i) are collected from upstream neurons and fed into a combination function (Σ, a weighted sum)
Neural Networks (cont’d) • Activation function reads the combined input and produces a non-linear response (y) • Response channeled downstream to other neurons • What problems are Neural Networks applicable to? • Quite robust with respect to noisy data • Can learn and work around erroneous data • However, results are opaque to human interpretation • Often require long training times
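To make this mechanism concrete, here is a minimal sketch of a single artificial neuron in Python; it is not from the original slides, the names `combine`, `sigmoid`, and `neuron` are my own, and the weighted-sum combination plus sigmoid activation anticipate the choices discussed later in this chapter.

```python
import math

def combine(inputs, weights, bias):
    # Combination function: weighted sum of upstream inputs plus a constant (bias) term
    return bias + sum(w * x for w, x in zip(weights, inputs))

def sigmoid(net):
    # Activation function: non-linear "squashing" of the combined input into (0, 1)
    return 1.0 / (1.0 + math.exp(-net))

def neuron(inputs, weights, bias):
    # One artificial neuron: combine the inputs, then apply the activation
    return sigmoid(combine(inputs, weights, bias))
```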
Input and Output Encoding • Neural Networks require attribute values encoded to [0, 1] • Numeric • Apply Min-max Normalization to continuous variables • Works well when Min and Max known • Also assumes new data values occur within Min-Max range • Values outside range may be rejected or mapped to Min or Max
Input and Output Encoding (cont’d) • Output • Neural Networks always return continuous values in [0, 1] • Many classification problems have two outcomes • Solution uses a threshold, established a priori, in the single output node to separate the classes • For example, target variable is “leave” or “stay” • Threshold rule is “leave if output >= 0.67” • Single output node value = 0.72 classifies record as “leave”
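A minimal sketch of both encoding steps, assuming the min and max come from the training data; clipping out-of-range values to the min or max is one of the two options mentioned above, the 0.67 threshold mirrors the "leave"/"stay" example, and the specific numeric values (an age on an 18–65 range) are purely illustrative.

```python
def min_max_normalize(x, x_min, x_max):
    # Map a numeric attribute value into [0, 1]; clip values outside the training range
    x = min(max(x, x_min), x_max)
    return (x - x_min) / (x_max - x_min)

def decode_output(y, threshold=0.67):
    # Turn the continuous output node value into one of the two classes
    return "leave" if y >= threshold else "stay"

print(min_max_normalize(25, x_min=18, x_max=65))  # hypothetical age attribute
print(decode_output(0.72))                        # -> "leave"
```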
Simple Example of a Neural Network [Figure: input layer (Nodes 1–3), hidden layer (Nodes A, B), output layer (Node Z), with connection weights W_1A, W_2A, W_3A, W_0A, W_1B, W_2B, W_3B, W_0B, W_AZ, W_BZ, W_0Z] • Neural Network consists of a layered, feedforward, completely connected network of nodes • Feedforward restricts network flow to a single direction • Flow does not loop or cycle • Network composed of two or more layers
Simple Example of a Neural Network (cont’d) • Most networks have Input, Hidden, Output layers • Network may contain more than one hidden layer • Network is completely connected • Each node in given layer, connected to every node in next layer • Every connection has weight (Wij) associated with it • Weight values randomly assigned 0 to 1 by algorithm • Number of input nodes dependent on number of predictors • Number of hidden and output nodes configurable
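As a sketch of this "completely connected" structure, the dictionary below assigns a random initial weight in [0, 1] to every connection of the 3-2-1 network from the figure (input Nodes 1–3, hidden Nodes A and B, output Node Z), including the constant-input weights W_0A, W_0B, and W_0Z; the data-structure choice is mine, not the slides'.

```python
import random

nodes_in, nodes_hidden, nodes_out = ["1", "2", "3"], ["A", "B"], ["Z"]

# Every node in one layer connects to every node in the next layer;
# node "0" stands for the constant input x0 = 1.0 feeding each non-input node.
weights = {
    (i, j): random.random()
    for layer_from, layer_to in [(["0"] + nodes_in, nodes_hidden),
                                 (["0"] + nodes_hidden, nodes_out)]
    for i in layer_from
    for j in layer_to
}
# e.g. weights[("1", "A")] is W_1A and weights[("0", "Z")] is W_0Z
```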
Simple Example of a Neural Network (cont’d) • Combination function produces a linear combination of the node inputs and connection weights as a single scalar value: net_j = Σ_{i=0..I} W_ij x_ij • For node j, x_ij is the ith input • W_ij is the weight associated with the ith input to node j • There are I + 1 inputs to node j • x_1, x_2, ..., x_I are inputs from upstream nodes • x_0 is a constant input with value 1.0 • Thus each node has an extra (bias) input term W_0j x_0j = W_0j
Simple Example of a Neural Network (cont’d) • The scalar value computed for hidden layer Node A equals net_A = Σ_i W_iA x_iA = 1.32 • For Node A, net_A = 1.32 is the input to the activation function • Neurons “fire” in biological organisms • Signals are sent between neurons when the combination of inputs crosses a threshold
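The slides give net_A = 1.32 but do not reproduce the underlying numbers here, so the input values and weights below are assumptions chosen only to be consistent with that result; the computation itself is just the combination function described above.

```python
# Hypothetical first observation (already normalized to [0, 1]) and weights into Node A;
# these specific numbers are assumptions that happen to reproduce net_A = 1.32.
x = [0.4, 0.2, 0.7]      # x1, x2, x3
w_A = [0.6, 0.8, 0.6]    # W_1A, W_2A, W_3A
w_0A = 0.5               # W_0A, the weight on the constant input x0 = 1.0

net_A = w_0A + sum(wi * xi for wi, xi in zip(w_A, x))
print(net_A)             # 1.32
```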
Simple Example of a Neural Network (cont’d) • Firing response not necessarily linearly related to increase in input stimulation • Neural Networks model this behavior using a non-linear activation function • Sigmoid function most commonly used: f(x) = 1 / (1 + e^(-x)) • In Node A, the sigmoid function takes net_A = 1.32 as input and produces output f(1.32) ≈ 0.7892
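The sigmoid step can be checked directly; this short snippet simply evaluates f(net_A) = 1 / (1 + e^(-1.32)), matching the 0.7892 carried forward on the next slide.

```python
import math

net_A = 1.32
f_net_A = 1.0 / (1.0 + math.exp(-net_A))  # sigmoid activation of Node A
print(round(f_net_A, 4))                  # 0.7892
```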
Simple Example of a Neural Network (cont’d) • Node A outputs 0.7892 along its connection to Node Z, where it becomes a component of net_Z • Before net_Z can be computed, the contribution from Node B is required • Node Z combines the outputs of Node A and Node B through net_Z = W_0Z + W_AZ (output of A) + W_BZ (output of B), then applies the sigmoid function
Simple Example of a Neural Network (cont’d) • Inputs to Node Z are not data attribute values • Rather, they are the sigmoid outputs of the upstream nodes • The value 0.8750 is output from the Neural Network on this first pass • It represents the predicted value for the target variable, given the first observation
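Putting the pieces together, here is a sketch of the complete first pass. The weights into Node B and Node Z are not given in the text, so the values below are placeholders chosen only so that the pass is numerically consistent with the 0.7892 and 0.8750 values quoted in the slides.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def node_output(inputs, weights, bias):
    # Combination function followed by the sigmoid activation
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

x = [0.4, 0.2, 0.7]                                   # hypothetical normalized inputs
out_A = node_output(x, [0.6, 0.8, 0.6], 0.5)          # ~0.7892, as computed above
out_B = node_output(x, [0.9, 0.8, 0.4], 0.7)          # placeholder weights for Node B
out_Z = node_output([out_A, out_B], [0.9, 0.9], 0.5)  # network prediction on this pass
print(round(out_Z, 4))                                # ~0.8750
```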
Sigmoid Activation Function • Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior depending on input value • Function nearly linear for domain values -1 < x < 1 • Becomes curvilinear as values move away from center • At extreme values, f(x) is nearly constant • Moderate increments in x produce variable increase in f(x), depending on location of x • Sometimes called “Squashing Function” • Takes real-valued input and returns values [0, 1]
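A quick way to see this "squashing" behavior is to evaluate the sigmoid over a range of inputs; this small sketch just prints a few values to show the nearly linear region around 0 and the nearly constant tails.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10, -1, -0.5, 0, 0.5, 1, 10]:
    print(x, round(sigmoid(x), 4))
# Near x = 0 the output changes almost linearly with x;
# at the extremes (e.g. x = -10 or x = 10) it is nearly constant at 0 or 1.
```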
Back-Propagation • Neural Networks are a supervised learning method • Require a target variable • Each observation passed through the network results in an output value • Output value is compared to the actual value of the target variable • Error = (Actual – Output) • Prediction error analogous to residuals in regression models • Most networks use the Sum of Squared Errors (SSE) to measure how well predictions fit target values
Back-Propagation (cont’d) • Squared prediction errors are summed over all output nodes and all records in the data set: SSE = Σ_records Σ_output nodes (actual − output)^2 • Model weights are constructed that minimize SSE • The true weight values that minimize SSE are unknown • Weights are estimated from the data set
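As a sketch of the error measure (the weight-update rule itself is not shown in these slides), the function below sums the squared prediction errors over all records and all output nodes; the nested-list representation of actual and predicted values, and the example numbers, are assumptions.

```python
def sse(actuals, outputs):
    # actuals[r][n] and outputs[r][n]: value at output node n for record r
    return sum((a - o) ** 2
               for actual_rec, output_rec in zip(actuals, outputs)
               for a, o in zip(actual_rec, output_rec))

# Illustrative example: a single output node and three records
print(sse([[1.0], [0.0], [1.0]], [[0.875], [0.2], [0.6]]))  # 0.215625
```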
Neural Network for IR • From the work by Wilkinson & Hingston, SIGIR’91 • [Figure: three-layer network with query term nodes (k_a, k_b, k_c), document term nodes (k_1, ..., k_a, k_b, k_c, ..., k_t), and document nodes (d_1, ..., d_j, d_j+1, ..., d_N)]
Neural Network for IR • Three-layer network • Signals propagate across the network • First level of propagation: • Query terms issue the first signals • These signals propagate across the network to reach the document nodes • Second level of propagation: • Document nodes might themselves generate new signals which affect the document term nodes • Document term nodes might respond with new signals of their own
Quantifying Signal Propagation • Normalize signal strength (MAX = 1) • Query terms emit an initial signal equal to 1 • Weight associated with an edge from a query term node k_i to a document term node k_i: W_iq = w_iq / sqrt( Σ_i w_iq^2 ) • Weight associated with an edge from a document term node k_i to a document node d_j: W_ij = w_ij / sqrt( Σ_i w_ij^2 )
Quantifying Signal Propagation • After the first level of signal propagation, the activation level of a document node d_j is given by: Σ_i W_iq W_ij = ( Σ_i w_iq w_ij ) / ( sqrt( Σ_i w_iq^2 ) * sqrt( Σ_i w_ij^2 ) ) • which is exactly the ranking of the Vector model • New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle • A minimum threshold should be enforced to avoid spurious signal generation
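A minimal sketch of the first level of propagation, assuming the query weights w_iq are given as a dictionary `w_q` and each document's index term weights w_ij as a dictionary in `w_docs`; the division by the Euclidean norms follows the edge-weight definitions above, so the resulting document activations equal the Vector model's cosine ranking. The example weights at the bottom are made up.

```python
import math

def first_level_activation(w_q, w_docs):
    """w_q: {term: w_iq}; w_docs: {doc: {term: w_ij}}.
    Returns the activation of each document node after the first signal propagation."""
    q_norm = math.sqrt(sum(w ** 2 for w in w_q.values()))
    scores = {}
    for doc, w_d in w_docs.items():
        d_norm = math.sqrt(sum(w ** 2 for w in w_d.values()))
        # Sum over terms of W_iq * W_ij = (w_iq / |q|) * (w_ij / |d_j|)
        scores[doc] = sum(w_q[t] * w_d.get(t, 0.0) for t in w_q) / (q_norm * d_norm)
    return scores

print(first_level_activation({"ka": 1.0, "kb": 0.5},
                             {"d1": {"ka": 0.8, "kc": 0.3},
                              "d2": {"kb": 0.6, "ka": 0.1}}))
```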
Conclusions • Model provides an interesting formulation of the IR problem • Model has not been tested extensively • It is not clear what improvements the model might provide