Modeling with IRENE: Integrated R-code for Engineered Neural Evolution
Trevor Grant and Olcay Akman, Department of Mathematics, Illinois State University
Overview
• Neural Evolution
• What is a Neural Network?
• Using genetic algorithms to find optimal parameters for nonlinear functions
• Neural evolution
• Special Population Attributes
• Jump Connections
• User-defined libraries and learning functions
• Mutating learning functions
• Engineered Genetic Algorithms
Starting out simple: We begin by modeling the data with a simple linear model. We then look at the sum of the squared residuals (SSR), and the model is assigned a value based on this SSR. [Figure: a linear model linking the inputs (X1, X2, …, Xn) to the output (Y) through coefficients β0, β1, β2, β3.]
Example: Life Expectancy
Statistics (variable: description, year measured):
• Income: per capita income (1974)
• Life Exp: life expectancy in years (1969–71)
• Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
• HS Grad: percent high-school graduates (1970)
• Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city
[Figure: life expectancy as the output (Y), modeled from the inputs Murder Rate, Income, HS Grad % and Frost (X1, X2, …, Xn) with coefficients β0 to β3.]
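As a minimal sketch in Python with NumPy (not IRENE's R code), the linear step is an ordinary least-squares fit. The resulting SSR becomes the model's value. The function name `linear_model_value` is hypothetical. The simulated data, with three inputs so that the coefficients are β0 to β3, stand in for the life-expectancy variables, which are not reproduced here.

```python
import numpy as np

def linear_model_value(X, y):
    """Fit y = b0 + b1*x1 + ... + bn*xn by least squares; return (coefficients, SSR)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that beta[0] is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ssr = float(residuals @ residuals)  # the model's "value"
    return beta, ssr

# Hypothetical simulated data: 20 observations of 3 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=20)
beta, ssr = linear_model_value(X, y)
print(beta.round(2), round(ssr, 3))
```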
Residuals: The difference between the observed value and the fitted value is known as the residual.
Sum of Squared Residuals: A linear model is estimated by minimizing the sum of squared residuals (SSR). Each residual is the vertical distance between a fitted value and the actual data point. [Figure: height plotted against age, with a fitted line.]
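In symbols, the residual and SSR of a linear model are shown below. These are the standard definitions, written out for reference. They use m observations and the inputs X1, …, Xn and coefficients β0, …, βn from the earlier slides.

```latex
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in}, \qquad
e_i = y_i - \hat{y}_i, \qquad
\mathrm{SSR} = \sum_{i=1}^{m} e_i^2 = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
```

Least squares chooses β0, …, βn to make SSR as small as possible.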
Relationship: Linear vs. Nonlinear
• Traditionally we estimate linear relationships, but the true relationship may be (and often is) nonlinear.
• Sometimes we know the form of the relationship and can use nonlinear regression methods such as neural networks or nonlinear least squares.
• Sometimes we don’t know the functional form of the relationship. IRENE explores functional forms while estimating parameters.
Sum of Squared Residuals: A nonlinear model can reduce the sum of squared residuals and better model the actual data. [Figure: height plotted against age, with a fitted nonlinear curve.]
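Here is a small sketch of this effect. It assumes made-up height-versus-age data with a smooth, saturating trend (not the slide's data). A quadratic in age stands in for "a nonlinear model". The quadratic contains the straight line as a special case, so its least-squares SSR can never be larger than the line's.

```python
import numpy as np

def ssr(y, fitted):
    """Sum of squared residuals between observed and fitted values."""
    r = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    return float(r @ r)

# Hypothetical, noise-free data with a curved trend.
age = np.linspace(1, 18, 30)
height = 50 + 120 * (1 - np.exp(-age / 6))

linear_fit = np.polyval(np.polyfit(age, height, 1), age)  # straight line
quad_fit = np.polyval(np.polyfit(age, height, 2), age)    # curved fit
print(ssr(height, linear_fit), ssr(height, quad_fit))
```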
Anatomy of a neural network [Figure: a neural network diagram with its layers and nodes labeled.]
What’s in a node? A node contains a learning function. The learning function takes the inputs and parameters and converts them to an output.
A model has parameter values α11, α12, α13, α14.
Let’s pretend the first observation contains these values. [Figure: the input nodes filled with the first observation’s values.]
Now say a model has these parameters. [Figure: the parameter values on the connections into node h1.]
And the learning function on this node is exponential. [Figure: node h1 with an exponential learning function.]
So the value of node h1 for the first observation is .1108.
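The slides do not spell out how a node combines its inputs with its α values. This sketch assumes one common choice: the learning function is applied to the weighted sum α1·x1 + … + αk·xk. The helpers `node_value` and `sigmoid` are hypothetical. The sketch computes one node's value for one observation under that assumption. It does not reproduce the .1108 example.

```python
import math

def node_value(inputs, alphas, learning_function=math.exp):
    """Value of one node for one observation.

    Assumption (not stated on the slides): the node applies its learning
    function to the weighted sum alpha_1*x_1 + ... + alpha_k*x_k.
    """
    if len(inputs) != len(alphas):
        raise ValueError("need one parameter per input")
    z = sum(a * x for a, x in zip(alphas, inputs))
    return learning_function(z)

def sigmoid(z):
    """Logistic sigmoid, another possible learning function."""
    return 1.0 / (1.0 + math.exp(-z))

print(node_value([1.0, 2.0], [0.5, -0.25]))           # exp(0.0) = 1.0
print(node_value([1.0, 2.0], [0.5, -0.25], sigmoid))  # sigmoid(0.0) = 0.5
```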
This is repeated for each observation. Each model has its own unique set of α values, so the fitted values of the output are functions of α. After this is complete, a linear model is estimated: the output is regressed on the values of the nodes in the last layer. The sum of the squared residuals is assigned as the model’s value.
The estimated linear model
• The sum of the squared residuals of the model (SSR) is referred to as the value of the model. We want a model that minimizes the sum of squared residuals (its value).
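Linear model estimated in a more complex neural network [Figure: a network with two hidden layers: nodes h11, h12, h13 in the first and h21, h22 in the second.] NOTE: h11, h12 and h13 are not included in the final linear model. Only the nodes in the final layer (here h21 and h22) are included in the linear model.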
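Putting the last few slides together, a creature's value can be sketched in two steps. First, push every observation through the hidden layers. Then regress the output on the final layer's node values only. The sketch makes three assumptions: an intercept in the final regression, one learning function per layer, and no bias terms inside the nodes. The names `forward` and `creature_value` are hypothetical, and NumPy's `tanh` serves only as an example learning function.

```python
import numpy as np

def forward(X, layers):
    """Propagate inputs through the hidden layers; return the final layer's node values.

    `layers` is a list of (A, f) pairs. A has one column of alphas per node
    in that layer; f is the layer's learning function.
    """
    H = np.asarray(X, dtype=float)
    for A, f in layers:
        H = f(H @ A)  # each node: learning function of a weighted sum of the previous layer
    return H

def creature_value(X, y, layers):
    """SSR of the linear model regressing y on the final-layer node values (plus intercept)."""
    H = forward(X, layers)  # earlier layers only feed H; they are not regressors
    design = np.column_stack([np.ones(len(y)), H])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = np.tanh(X[:, 0]) - X[:, 1] ** 2
layers = [(rng.normal(size=(2, 3)), np.tanh),  # first hidden layer: h11, h12, h13
          (rng.normal(size=(3, 2)), np.tanh)]  # final hidden layer: h21, h22
print(forward(X, layers).shape)  # (50, 2): only h21 and h22 enter the regression
print(creature_value(X, y, layers))
```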
Optimizing Parameters with Genetic Algorithms
• Step 1: A population of models is created, each with randomly assigned parameters.
• Step 2: Models ‘mate’ in the hope of creating ‘children’ models with better value (lower SSR).
• From now on, we will refer to each unique set of parameters in a model as a creature. A collection of creatures (models with identical topology but different parameters) is referred to as a species.
Copy this model 200 times; each copy has randomly assigned parameter values. Each individual collection of parameters is referred to as a creature. The collection of creatures for a given topology (arrangement of layers and nodes) is referred to as a species. [Figure: one creature highlighted within its species.]
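Step 1 can be sketched as follows. Each creature is represented by the list of its parameter values, keyed by an ID number. The function name `make_species`, the uniform draw, and the range [-10, 10] are illustrative assumptions, not IRENE's settings. Only the population size of 200 comes from the slide.

```python
import random

def make_species(n_params, size=200, low=-10.0, high=10.0, seed=None):
    """Create `size` creatures for one topology, each with random parameters.

    Returns a dict mapping an ID number to that creature's parameter list.
    The uniform range [low, high] is an illustrative assumption.
    """
    rng = random.Random(seed)
    return {i: [rng.uniform(low, high) for _ in range(n_params)]
            for i in range(1, size + 1)}

species = make_species(n_params=4, seed=42)  # four parameters: alpha11..alpha14
print(len(species), species[1])
```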
Species: A species has a unique arrangement of nodes, layers and learning functions. The two creatures shown have the same arrangement of layers and nodes, but they use different learning functions, so they belong to different species. [Figure: two networks with identical layers and nodes, one using a sigmoid learning function and the other an exponential learning function, marked ≠.]
Each creature then has its own computed value (SSR) and an assigned ID #, which are saved in a table:
Model ID    Sum Squared Resid. (SSR)
ID # 001    41,240
ID # 002    215,635
ID # 003    3,612
Two creatures are selected to mate, with selection probability weighted according to model fitness (lower SSR means higher fitness).
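The slides do not say how fitness is computed from SSR. This sketch assumes a creature's chance of being picked is proportional to 1/SSR, so a lower SSR gives a higher chance. The second parent is drawn from the remaining creatures so the two parents are distinct. `select_parents` is a hypothetical name. The SSR values are the ones in the slide's table.

```python
import random

def select_parents(table, rng=random):
    """Pick two distinct creature IDs, weighting each by 1/SSR (assumed fitness measure).

    `table` maps creature ID -> SSR.
    """
    ids = list(table)
    father = rng.choices(ids, weights=[1.0 / table[i] for i in ids], k=1)[0]
    # Draw the mother from the other creatures so a creature never mates with itself.
    rest = [i for i in ids if i != father]
    mother = rng.choices(rest, weights=[1.0 / table[i] for i in rest], k=1)[0]
    return father, mother

table = {1: 41_240, 2: 215_635, 3: 3_612}  # ID -> SSR, from the slide
print(select_parents(table, random.Random(0)))
```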
Each creature can be represented by its DNA: the sequence of its parameter values. [Figure: model structure with parameters α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2.]
Two methods of mating
• Average: each parameter in the child’s DNA is the average of that parameter in the mother’s and father’s DNA.
• Crossover: a ‘cut point’ is randomly determined. Every parameter before the cut point is inherited from the father, and every parameter after it is inherited from the mother.
DNA is selected from the two creatures chosen to mate:
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Average Method
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child:
α11 = (3.613 + 2.512)/2 = 3.0625
α12 = (26.252 + .105)/2 = 13.1785
α13 = (-25.12 + 51.25)/2 = 13.065
α14 = (104.4 - 15.2)/2 = 44.6
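The average method is a one-line operation on two DNA sequences. This minimal Python sketch (hypothetical `average_mate`) reproduces the child computed on the slide.

```python
def average_mate(father, mother):
    """Child DNA: each parameter is the mean of the parents' values at that position."""
    if len(father) != len(mother):
        raise ValueError("parents must share a topology (same number of parameters)")
    return [(f + m) / 2 for f, m in zip(father, mother)]

father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]
child = average_mate(father, mother)
print([round(a, 4) for a in child])  # [3.0625, 13.1785, 13.065, 44.6]
```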
Crossover Method
• A random number between one and the length of the parameter sequence is chosen. This is the ‘cut point’.
• The child inherits the father’s parameters up to and including the cut point, and the mother’s parameters after it.
Crossover Method: cut point at position two
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child: α11 = 3.613, α12 = 26.252 (from the father), α13 = 51.25, α14 = -15.2 (from the mother)
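Here is a matching sketch of the crossover method (hypothetical `crossover_mate`). Passing `cut=2` reproduces the slide's example. Leaving `cut` out draws it uniformly from 1 to the DNA length, as described above. A cut equal to the full length simply copies the father.

```python
import random

def crossover_mate(father, mother, cut=None, rng=random):
    """Child takes the father's parameters up to the cut point, the mother's after it."""
    n = len(father)
    if len(mother) != n:
        raise ValueError("parents must share a topology")
    if cut is None:
        cut = rng.randint(1, n)  # randint includes both endpoints
    return father[:cut] + mother[cut:]

father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]
print(crossover_mate(father, mother, cut=2))  # [3.613, 26.252, 51.25, -15.2]
```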
The least fit creatures are killed to make room for the new children:
Model ID    Sum Squared Resid. (SSR)
ID # 001    41,240
ID # 002    3,289
ID # 003    215,635
ID # 003, the creature with the largest SSR, is killed, leaving:
Model ID    Sum Squared Resid. (SSR)
ID # 001    41,240
ID # 002    3,289
The child’s DNA fills the model structure: α11 = 3.0625, α12 = 13.1785, α13 = 13.065, α14 = 44.6
The children are assigned new ID numbers and their value (SSR) is computed:
Model ID    Sum Squared Resid. (SSR)
ID # 001    41,240
ID # 002    3,289
ID # 004    6,755
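Killing the least fit creature and adding a child can be sketched with the ID-to-SSR table as a Python dict. `replace_least_fit` is a hypothetical helper. The numbers reproduce the slide's step: ID # 003 (SSR 215,635) is killed, and the child becomes ID # 004 with SSR 6,755.

```python
def replace_least_fit(table, child_ssr, next_id):
    """Kill the creature with the highest SSR and add the child under a new ID.

    `table` maps creature ID -> SSR. It is updated in place and returned
    together with the next free ID number.
    """
    worst = max(table, key=table.get)
    del table[worst]
    table[next_id] = child_ssr
    return table, next_id + 1

table = {1: 41_240, 2: 3_289, 3: 215_635}
table, next_id = replace_least_fit(table, child_ssr=6_755, next_id=4)
print(table)  # {1: 41240, 2: 3289, 4: 6755}
```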
This process repeats several times. At each step a new child replaces the least fit creature (Model ID: SSR):
ID # 001: 41,240 | ID # 002: 3,289 | ID # 004: 6,755
ID # 005: 4,242 | ID # 002: 3,289 | ID # 004: 6,755
ID # 005: 4,242 | ID # 002: 3,289 | ID # 007: 3,111
ID # 008: 4,841 | ID # 002: 3,289 | ID # 007: 3,111
Eventually there is convergence at an optimum (either local or global):
Model ID    Sum Squared Resid. (SSR)
ID # 239    2,015
ID # 159    2,015
ID # 412    2,015
At convergence, we kill all the extra creatures in the species (to free up memory).
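The whole loop, from a random population to convergence, can be sketched as a steady-state genetic algorithm. Each generation selects two parents weighted by 1/SSR, produces one child by either the average or the crossover method, and replaces the least fit creature. Several details are assumptions rather than IRENE's documented behavior:
• the 1/SSR weighting
• the 50/50 choice between mating methods
• the population size
• the initial parameter range
• the convergence test (all SSR values within a small relative tolerance)

The example fits the two parameters of a hypothetical curve a·exp(b·x) to noise-free toy data.

```python
import math
import random

def run_species(value, n_params, size=50, max_generations=2000, tol=1e-6, seed=0):
    """Steady-state GA for one species; returns (best DNA, best SSR).

    Assumptions: 1/SSR selection weights, a 50/50 choice between the average
    and crossover methods, uniform initial parameters in [-3, 3], and
    convergence once every SSR is within a small relative tolerance of the best.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-3, 3) for _ in range(n_params)] for _ in range(size)]
    ssr = [value(dna) for dna in pop]
    idx = list(range(size))
    for _ in range(max_generations):
        if max(ssr) - min(ssr) <= tol * (1 + min(ssr)):
            break  # converged: all creatures have (nearly) the same value
        weights = [1.0 / (s + 1e-12) for s in ssr]
        i = rng.choices(idx, weights=weights)[0]
        others = [k for k in idx if k != i]
        j = rng.choices(others, weights=[weights[k] for k in others])[0]
        father, mother = pop[i], pop[j]
        if rng.random() < 0.5:
            child = [(f + m) / 2 for f, m in zip(father, mother)]  # average method
        else:
            cut = rng.randint(1, n_params)                          # crossover method
            child = father[:cut] + mother[cut:]
        worst = max(idx, key=ssr.__getitem__)         # the least fit creature is killed
        pop[worst], ssr[worst] = child, value(child)  # and the child takes its place
    # Keep only the best creature; the extra creatures are discarded.
    best = min(idx, key=ssr.__getitem__)
    return pop[best], ssr[best]

# Hypothetical, noise-free toy data from y = 2 * exp(-1.5 * x).
xs = [k / 10 for k in range(11)]
ys = [2.0 * math.exp(-1.5 * x) for x in xs]

def curve_ssr(dna):
    a, b = dna
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

best, best_ssr = run_species(curve_ssr, n_params=2)
print(best, best_ssr)
```

Only the worst creature is ever removed, so the best SSR in the population can never increase from one generation to the next. This sketch has no mutation step. Every child parameter therefore stays within the range spanned by the initial population's values, so the search may settle on a local rather than a global optimum.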
What is neural evolution? Neural evolution: simultaneously explore new topologies while optimizing existing topologies. New species are born out of old species.
‘Growing’ new nodes (We don’t always wait for convergence to add new layers and nodes…)
We call each arrangement of layers, nodes and learning functions a species.
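The slides do not describe how IRENE stores a topology or grows it. As a hypothetical sketch, a species can be keyed by its structure: a tuple of layers, each listing one learning-function name per node. Growing a node or a layer then yields a different key, that is, a new species born from the old one. A change of learning function alone also gives a new species, as on the earlier Species slide. How the new species' creatures get their parameters is not covered here. `grow_node` and `grow_layer` are illustrative names.

```python
def grow_node(species, layer, learning_function):
    """Return a new species with one extra node (using `learning_function`) in `layer`."""
    layers = list(species)
    layers[layer] = layers[layer] + (learning_function,)
    return tuple(layers)

def grow_layer(species, learning_function):
    """Return a new species with an extra one-node layer after the last layer."""
    return species + ((learning_function,),)

parent = (("sigmoid", "sigmoid"),)            # one hidden layer, two sigmoid nodes
child1 = grow_node(parent, 0, "exponential")  # (('sigmoid', 'sigmoid', 'exponential'),)
child2 = grow_layer(parent, "sigmoid")        # (('sigmoid', 'sigmoid'), ('sigmoid',))
print(child1 != parent, child2 != parent)     # True True
```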
Who lives? Who dies?
• After each generation, a roster of all creatures is created and ordered according to value (lowest SSR first).
• If at least one creature of a species is in the top 60%* of the roster of all creatures, the species survives. Otherwise, the entire species is eradicated.
*60% is arbitrary; we can set it to other proportions. We’ll talk about this more in engineered genetic algorithms.
Example: [Figure: the roster of creatures from Species 1, 2 and 3, with the top 60% marked as survivors.] No creature of Species 1 is among the survivors, so Species 1 is eradicated.
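The survival rule can be sketched as a filter over the ranked roster. `surviving_species` is a hypothetical helper, and the SSR values in the example are invented for illustration. Rounding the cutoff up to a whole number of creatures is an assumption. The 60% default follows the slide and, as noted there, can be changed.

```python
import math

def surviving_species(roster, keep=0.60):
    """Species with at least one creature in the top `keep` fraction of the roster.

    `roster` is a list of (species_name, ssr) pairs for every creature.
    Lower SSR ranks higher.
    """
    ranked = sorted(roster, key=lambda pair: pair[1])
    n_keep = math.ceil(keep * len(ranked))  # rounding up is an assumption
    return {species for species, _ in ranked[:n_keep]}

roster = [("Species 2", 1_900), ("Species 3", 2_050), ("Species 2", 2_300),
          ("Species 1", 4_800), ("Species 1", 5_200)]
print(sorted(surviving_species(roster)))  # ['Species 2', 'Species 3']
```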
While each species searches for optima, new species are born and others die out.
We could search forever, but we stop our search based on time or generations elapsed.
Special Population Attributes: jump connections, user-defined libraries and learning functions, and mutating functional forms.