Building NeuroSearch – Intelligent Evolutionary Search Algorithm For Peer-to-Peer Environment
Master's Thesis by Joni Töyrylä, 3.9.2004
Mikko Vapa, researcher student
InBCT 3.2 Cheese Factory / P2P Communication, Agora Center
http://tisu.it.jyu.fi/cheesefactory
Contents
• Resource Discovery Problem
• Related Work
  • Peer-to-Peer Network
  • Neural Networks
  • Evolutionary Computing
• NeuroSearch
• Research Environment
• Research Cases
  • Fitness
  • Population
  • Inputs
  • Resources
  • Queriers
  • Brain Size
• Summary and Future
Resource Discovery Problem
• In the peer-to-peer (P2P) resource discovery problem, a P2P node decides, based on local knowledge only, which of its neighbors (if any) are the best targets for forwarding a query so that the needed resource is found
• A good solution locates a predetermined number of resources using a minimal number of packets
NeuroSearch
• The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:
  • a neural network decides, for each link, whether to pass the query further down that link or not
  • evolution breeds neural networks and finds the best one in a large class of local search algorithms
[Figure: an incoming query is independently either forwarded or not forwarded to each neighbor node]
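To make the per-link decision concrete, below is a minimal Python sketch of the forwarding rule. The two hidden layers match the structure described on the next slide; the function name `forward_decision`, the tanh/sigmoid activations, and the 0.5 threshold are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def forward_decision(inputs, W1, b1, W2, b2, w_out, b_out):
    """Decide whether to pass the query further down one link.

    inputs: local observations for this (node, neighbor) pair;
    W1/b1 and W2/b2: evolved weights of the two hidden layers;
    w_out/b_out: weights of the single output neuron.
    """
    x = np.asarray(inputs, dtype=float)
    h1 = np.tanh(W1 @ x + b1)                        # first hidden layer
    h2 = np.tanh(W2 @ h1 + b2)                       # second hidden layer
    y = 1.0 / (1.0 + np.exp(-(w_out @ h2 + b_out)))  # sigmoid output
    return y > 0.5                                   # forward the query or not
```

The same network is evaluated once per neighbor, and the query is forwarded to exactly those neighbors for which the decision is true.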
NeuroSearch's Inputs
[Figure: the internal structure of the NeuroSearch algorithm]
• Multiple layers enable the algorithm to express non-linear behavior
• With enough neurons the algorithm can universally approximate any decision function
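In equation form, the decision function computed by such a network is a composition of affine maps and a nonlinearity $\varphi$ (the exact activation used in the thesis is not shown on the slide, so $\varphi$ here stands for a generic sigmoid-like function):

$$
\mathbf{h}_1 = \varphi(W_1\mathbf{x} + \mathbf{b}_1), \qquad
\mathbf{h}_2 = \varphi(W_2\mathbf{h}_1 + \mathbf{b}_2), \qquad
y = \varphi(\mathbf{w}_{\mathrm{out}}^{\top}\mathbf{h}_2 + b_{\mathrm{out}}),
$$

with the query forwarded when $y$ exceeds a fixed threshold. It is this two-hidden-layer composition that gives the universal approximation property referred to above.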
NeuroSearch's Inputs
• Bias is always 1 and provides the means for a neuron to produce non-zero output from zero inputs
• Hops is the number of links the message has traversed so far
• Neighbors (also known as currentNeighbors or MyNeighbors) is the number of neighbor nodes the current node has
• Target's neighbors (also known as toNeighbors) is the number of neighbor nodes the message's target has
• Neighbor rank (also known as NeighborsOrder) tells the target's neighbor count relative to the current node's other neighbors
• Sent is a flag telling whether this message has already been forwarded to the target node by this node
• Received (also known as currentVisited) is a flag describing whether the current node has received this message earlier
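Below is a sketch of how these seven inputs could be assembled before the network is evaluated. The helper name `build_inputs`, the `neighbors` attribute, and the normalization of the rank are assumptions, since the slide only names the inputs.

```python
def build_inputs(hops, node, target, already_sent, already_received):
    """Assemble the 7 NeuroSearch inputs for one (node, target-neighbor) pair.

    `node` and `target` are assumed to expose a `neighbors` list; this
    helper and its normalization are illustrative, not the thesis code.
    """
    my_degree = len(node.neighbors)
    to_degree = len(target.neighbors)
    # NeighborsOrder: rank of the target's degree among this node's neighbors
    rank = sum(1 for n in node.neighbors if len(n.neighbors) <= to_degree) / my_degree
    return [
        1.0,                                # Bias: constant input
        float(hops),                        # Hops: links traversed so far
        float(my_degree),                   # Neighbors (currentNeighbors/MyNeighbors)
        float(to_degree),                   # Target's neighbors (toNeighbors)
        rank,                               # Neighbor rank (NeighborsOrder)
        1.0 if already_sent else 0.0,       # Sent flag
        1.0 if already_received else 0.0,   # Received flag (currentVisited)
    ]
```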
NeuroSearch's Training Program
• The neural network weights define how the neural network behaves, so they must be adjusted to the right values
• This is done using an iterative optimization process based on evolution and Gaussian mutation
[Figure: the training loop – define the network conditions, define the quality requirements for the algorithm, create candidate algorithms randomly, then iterate thousands of generations (select the best ones for the next generation, breed a new population) and finally select the best algorithm for these conditions]
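A minimal sketch of that loop, with the population size of 24 taken from the summary slide; the elite fraction, the mutation scale `sigma`, and the `score` callable (e.g. the fitness function defined on the later slides) are illustrative assumptions.

```python
import numpy as np

def evolve(score, n_weights, pop_size=24, generations=100_000, sigma=0.1):
    """Breed neural-network weight vectors by selection + Gaussian mutation.

    score: callable mapping a weight vector to its fitness value.
    """
    rng = np.random.default_rng(0)
    population = [rng.normal(0, 1, n_weights) for _ in range(pop_size)]
    for _ in range(generations):              # iterate thousands of generations
        ranked = sorted(population, key=score, reverse=True)
        elite = ranked[:pop_size // 4]        # select the best ones
        population = list(elite)
        while len(population) < pop_size:     # breed a new population
            parent = elite[rng.integers(len(elite))]
            population.append(parent + rng.normal(0, sigma, n_weights))
    return max(population, key=score)         # finally select the best algorithm
```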
Research Environment
• The peer-to-peer network being tested contained:
  • 100 power-law distributed P2P nodes with 394 links and 788 resources
  • Resources were distributed based on the number of connections a node has, meaning that high-connectivity nodes were more likely to answer queries
  • The topology was static, so nodes were not disappearing or moving
  • The querier and the queried resource were selected randomly, and 10 different queries were used in each generation (this was found to be enough to determine the overall performance of the neural network)
• Requirements for the fitness function were:
  • The algorithm should locate half of the available resources for every query (each located resource increased fitness by 50 points)
  • The algorithm should use as few packets as possible (each used packet decreased fitness by 1 point)
  • The algorithm should always stop (the stop limit for the number of packets was set to 300)
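As a sketch of how a comparable test network could be generated (the thesis does not specify its generator, so the Barabási–Albert model stands in here; it yields a power-law degree distribution with 384 rather than 394 links):

```python
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(100, 4)    # 100 nodes, power-law degree distribution

# Distribute 788 resources, favoring high-connectivity nodes as in the thesis setup
nodes, degrees = zip(*G.degree())
for _ in range(788):
    holder = random.choices(nodes, weights=degrees)[0]
    G.nodes[holder].setdefault("resources", []).append("resource")
```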
Research Cases - Fitness
• The fitness value determines how good a neural network is compared to others
• Even the smallest and simplest neural networks manage to reach fitness values over 10000
• The fitness value for a poor NeuroSearch is calculated as follows: Fitness = 50 * replies – packets = 50 * 239 – 1290 = 10660
• Note: because of a bug, the Steiner tree does not locate half of the replies and thus gets a lower fitness than HDS
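The fitness computation from this slide and the requirements above, as a sketch; `run_query` is a hypothetical simulator hook that returns the replies located and packets used for one query.

```python
STOP_LIMIT = 300  # stop limit for the number of packets per query

def fitness(weights, queries):
    """Score a weight vector: +50 per located resource, -1 per packet used."""
    total = 0
    for query in queries:                 # e.g. the 10 queries per generation
        replies, packets = run_query(weights, query, max_packets=STOP_LIMIT)
        total += 50 * replies - packets
    return total

# Worked example from the slide: 50 * 239 - 1290 = 11950 - 1290 = 10660
```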
Research Cases – Random Weights
• 10 million new neural networks were generated with random weights
• It seems that fitness values over 16000 cannot be obtained purely by guessing, and therefore an optimization method is needed
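The random-weights baseline then amounts to the following (reusing the hypothetical `fitness` sketch above and a numpy `rng`; `n_weights` and `queries` are assumed to be defined):

```python
# Best of 10 million randomly guessed weight vectors
best = max(
    (rng.normal(0, 1, n_weights) for _ in range(10_000_000)),
    key=lambda w: fitness(w, queries),
)
```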
Research Cases - Inputs
• Different inputs were tested individually and together to get a feeling for which inputs are important
• Using the inputs we can, for example, design rules such as:
  • Hops: "I have travelled 4 hops, I will not send further"
  • Target's neighbors: "The target node has 10 neighbors, I will send further"
  • Neighbor rank: "The target node has the largest number of neighbors compared to all my neighbors, I will not send further"
  • Neighbors: "I have 7 neighbors, I will send further"
  • Received: "I have received this query earlier, I will not send further"
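To see how such rules fall out of the network, here is a single-neuron example with hand-picked illustrative weights (3.5 on the bias input and -1 on Hops) that implements the first rule exactly:

```python
import math

def hops_rule(hops, w_bias=3.5, w_hops=-1.0):
    """One neuron implementing "I have travelled 4 hops, I will not send"."""
    activation = w_bias * 1.0 + w_hops * hops   # bias input is always 1
    output = 1.0 / (1.0 + math.exp(-activation))
    return output > 0.5                          # True exactly when hops <= 3

assert hops_rule(3) and not hops_rule(4)
```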
• The results indicate that using only one piece of topological information is more efficient than combining it with other topological information (the explanation for this behavior is still unclear)
• The results also indicate that using only one piece of query-related information is more efficient than combining it with other query-related information (the explanation for this behavior is likewise unclear)
Research Cases - Resources
• The needed percentage of resources was varied and the results were compared to other local search algorithms (Highest Degree Search and Breadth-First Search) and to near-optimal search trees (Steiner)
• Note: the Breadth-First Search curve needs to be halved, because the percentage was calculated against half of the resources and not against all available resources
Research Cases - Queriers
• The effect of lowering the number of queriers per generation used to calculate the fitness value of a neural network was examined
• It was found that the number of queriers can be dropped from 50 to 10 while still getting reliable fitness values, which speeds up the optimization process significantly
Research Cases – Brain Size
• The number of neurons on the first and second hidden layers was varied
• It was found that there exist many different kinds of NeuroSearch algorithms
Research Cases – Brain Size
• Optimization of larger neural networks also takes more time
Research Cases – Brain Size
• There also exists an interesting breadth-first search vs. depth-first search dilemma, where:
  • small networks obtain their best fitness values with a breadth-first search strategy,
  • medium-sized networks obtain their best fitness values with a depth-first search strategy, and
  • large networks obtain their best fitness values with a breadth-first search strategy
• Overall it seems that the best fitness, 18091.0, can be obtained with a breadth-first strategy using 5 hops and a neuron size of 25:10 (25 neurons on the first hidden layer and 10 on the second)
• 20:10 had the greatest average hops value; what happens if the number of neurons on the 2nd hidden layer is increased? Will the average number of hops decrease?
• 25:10 had the greatest fitness value; would more generations than 100,000 increase the fitness when the 1st hidden layer contains more than 25 neurons?
Summary and Future
• The main findings of the thesis were that:
  • A population size of 24 and a query amount of 10 are sufficient
  • An optimization algorithm needs to be used, because randomly guessing neural network weights does not give good results
  • Individual inputs give better results than combinations of two inputs (however, the best fitnesses can be obtained by using all 7 inputs)
  • By choosing a specific set of inputs, NeuroSearch may imitate any existing search algorithm, or it may behave as a combination of any of those
  • The optimal algorithm (Steiner) has an efficiency of 99%, whereas the best known local search algorithm (HDS) achieves 33% and NeuroSearch 25%
  • The breadth-first search vs. depth-first search dilemma exists, but no good explanation can be given for it yet
Summary and Future
• In addition to the problems shown this far, for the future work of NeuroSearch it is suggested that:
  • More inputs should be designed that provide useful information, e.g., the number of received replies, the inputs used by the Highest Degree Search algorithm, and inputs that describe how many forwarding decisions have already been made in the current decision round and how many are still left
  • A probability-based output could be tested instead of the threshold function (see the sketch below)
  • The correct neural network architecture and the size of the population could be adjusted dynamically during evolution to find an optimal structure more easily
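For the suggested probability-based output, the change to the decision sketch shown earlier would be small: instead of thresholding the sigmoid output at 0.5, the output is treated as a forwarding probability. This is a hypothetical illustration of the suggestion, not thesis code.

```python
import random

def forward_probabilistically(output):
    """Treat the network's sigmoid output as a forwarding probability."""
    return random.random() < output   # forward more often when output is high
```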