

  1. Adaptation of Neural Nets for Resource Discovery Problem in Dynamic and Distributed P2P Environment. Yevgeniy Ivanchenko, University of Jyväskylä, yeivanch@cc.jyu.fi

  2. OBJECTIVES (I) • Since nothing is known about the decision mechanism of NeuroSearch, we need to look inside the algorithm to understand its behavior. • Likewise, since nothing is known about the behavior of NeuroSearch in a dynamic environment, we need to study it under conditions that approximate a real-life situation.

  3. OBJECTIVES (II) • To understand the behavior of NeuroSearch, data analysis techniques were used. The Self-Organizing Map (SOM) is a well-known tool for data mining tasks. • A set of rules was obtained from the analysis of NeuroSearch and tested in a static environment. The question that arises here: is it possible to use an algorithm that exploited the properties of a static environment in a dynamic scenario?

  4. OBJECTIVES (III) • If we know the inner structure of the decision mechanism of NeuroSearch, we will be able to tell how much each input contributes to a particular decision of the algorithm. This can be used, for example, to remove unnecessary input information. • It can also help to evaluate the complexity and robustness of the algorithm.

  5. SOM (I) • SOM is a neural network model that maps a high-dimensional input space onto a low-dimensional (usually two-dimensional) output space. • After applying the SOM algorithm, similar vectors from the input space are located near each other in the output space. This makes it possible to investigate the properties of the resulting clusters and, consequently, the causes that produced them on the output map.

  6. SOM (II) • A SOM is usually a hexagonal or rectangular grid of neurons. In the figure, R1 and R2 denote different neighborhood sizes. • During training, the neighborhood size is gradually decreased to allow a more accurate adjustment of the neurons' weights. [Figure: grid of neurons with neighborhood radii R1 and R2]

  7. SOM (III) • The figure shows that the neurons 'covered' by the neighborhood kernel function move closer to the input vector. • The Best Matching Unit (BMU) is the neuron closest to the current input vector. • The weights of the neurons are updated according to the kernel function and the distance to the BMU. [Figure: neighborhood update around the BMU]
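The training loop described on slides 6-7 can be sketched as follows. This is a minimal illustrative implementation, not the one used in the study: the grid is rectangular (the slides also mention hexagonal grids), and the Gaussian kernel and linear decay schedules are common defaults assumed here.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, epochs=50,
              lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM training loop: find the BMU for each input vector
    and pull its grid neighbors toward the input, while the learning
    rate and neighborhood radius shrink over time (illustrative
    schedules, not the ones from the original work)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((grid_h, grid_w, dim))
    # Grid coordinates of every neuron, used by the neighborhood kernel.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)

    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            t = step / n_steps
            lr = lr0 * (1.0 - t)               # decaying learning rate
            sigma = sigma0 * (1.0 - t) + 0.5   # shrinking neighborhood
            # Best Matching Unit: neuron whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood kernel centered on the BMU.
            grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            kernel = np.exp(-grid_dist2 / (2 * sigma ** 2))
            weights += lr * kernel[..., None] * (x - weights)
            step += 1
    return weights
```

After training, similar input vectors map to nearby neurons, which is what makes the cluster analysis on the following slides possible.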

  8. DATA ANALYSIS (I) • NeuroSearch can be considered the main part of the information model of the system. To build this model, the black-box method was used: we model the external behavior of the system without knowing what causes any particular behavior. • To investigate the decision mechanism of NeuroSearch, its input-output pairs were analyzed using SOM.

  9. DATA ANALYSIS (II) • The analysis used the component plane and the U-matrix with the 'hit' distribution on it. The component plane visualizes the values of every component of the vectors over the output map. The U-matrix is one possible way to visualize the output map; the 'hits' on the U-matrix correspond to the decisions of NeuroSearch. • This approach allows us to investigate not only the contribution of each component to a particular decision, but also the correlations between components.
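The U-matrix mentioned here has a simple definition: for each map neuron, the average distance between its weight vector and those of its grid neighbors. Large values mark cluster borders. A sketch for a rectangular grid with 4-connected neighbors (an assumption; the slides also mention hexagonal grids):

```python
import numpy as np

def u_matrix(weights):
    """U-matrix of a rectangular SOM: for each neuron, the mean
    distance between its weight vector and those of its 4-connected
    grid neighbors. High values indicate cluster boundaries."""
    h, w, _ = weights.shape
    um = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um
```

Overlaying the NeuroSearch decisions ('hits') on this matrix then shows which clusters correspond to which decisions.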

  10. DATA ANALYSIS (III) • The figure shows the U-matrix (left side) and a fragment of the component plane (right side). • It is easy to see that the variable From is responsible for stopping further forwarding of the queries where it equals 1. • Other variables take different values in the area where From is 1; for example, the variable toUnsearchedNeighbors differs there. [Figure: U-matrix and component planes for From and toUnsearchedNeighbors]

  11. DATA ANALYSIS (IV) • The analysis found that four variables (From, toVisited, Sent and currentVisited) are responsible for stopping further forwarding of the queries. • The variables toUnsearchedNeighbors and Neighbors are correlated. • The variables packetsNow and Hops are highly correlated. • The variables fromNeighborAmount, packetsNow and Hops are correlated to some extent. • NeuroSearch mostly does not forward the queries further if Neighbors or toUnsearchedNeighbors is small.

  12. DATA ANALYSIS (V) • Further investigation of the algorithm is based on Hops, because only this variable shows the state of the algorithm in a particular time interval; in other words, by analyzing the intervals of this variable we can follow the queries along their path. • The maximum length of a query's path is 7, so there are 7 different cases to analyze. • The data for each case contains only the samples with the currently investigated value of the Hops variable. All samples where at least one of the From, Sent, currentVisited or toVisited variables equals 1 were removed as well, because the behavior of the algorithm in these areas is already known.
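The per-case filtering described above can be sketched as a small helper. The variable names come from the slides; the assumption that each sample is a dict of input-variable values is mine, made purely for illustration.

```python
def samples_for_hops(samples, hops_value,
                     stop_vars=("From", "Sent", "currentVisited", "toVisited")):
    """Select the samples for one Hops case, dropping every row where
    any of the known stop variables equals 1 (the algorithm's behavior
    in those regions is already understood). Each sample is assumed to
    be a dict mapping input-variable names to values."""
    return [s for s in samples
            if s.get("Hops") == hops_value
            and not any(s.get(v) == 1 for v in stop_vars)]
```

Running this once per Hops value in 0..7 yields the seven datasets analyzed on the next slide.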

  13. DATA ANALYSIS (VI) • After investigating the algorithm for the different values of Hops, we produced the Rule-Based Algorithm (RBA). RBA is based on rules extracted from the analysis of the U-matrix and the corresponding component plane. • The general strategy of the algorithm is quite simple: a decision is mostly based on the interplay between the Hops, Neighbors/toUnsearchedNeighbors and NeighborsOrder values. In the beginning the algorithm sends the queries to the most connected nodes; as the number of hops in a query grows, NeuroSearch gradually starts forwarding the queries to low-connected nodes as well.
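The slides do not list the extracted rules themselves, so the following is only a hypothetical sketch of the strategy as described: stop when any of the four stop variables fires, respect the path-length limit of 7, and widen the set of acceptable neighbors (by connectivity rank) as Hops grows. The `allowed_rank` schedule and all thresholds are illustrative, not the real extracted rules.

```python
def rba_forward(inputs, hops, neighbors_order, max_hops=7):
    """Hypothetical rule-based forwarding decision in the spirit of RBA.
    `inputs` holds the binary stop variables found in the SOM analysis;
    `neighbors_order` ranks the candidate neighbor by connectivity
    (0 = most connected). Thresholds here are illustrative only."""
    # Rules found to stop further forwarding of a query.
    if any(inputs.get(v) == 1
           for v in ("From", "toVisited", "Sent", "currentVisited")):
        return False
    if hops >= max_hops:  # maximum query path length is 7
        return False
    # Early in the path, forward only to the best-connected neighbors;
    # as Hops grows, progressively admit lower-connected ones.
    allowed_rank = 1 + hops  # illustrative widening schedule
    return neighbors_order <= allowed_rank
```

The point of the sketch is the shape of the decision, not its exact parameters: a handful of human-readable rules can reproduce the trained network's behavior, which is what the comparison on the next slide tests.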

  14. DATA ANALYSIS (VII) The table compares the efficiency of four algorithms. NeuroSearch and RBA show almost the same level of performance, which means that RBA captured the behavior of NeuroSearch and that SOM is well suited for analyzing NeuroSearch. Both of these algorithms perform better than BFS2 and BFS3. [Table: comparison between algorithms]

  15. DYNAMIC ENVIRONMENT (I) • Since RBA is based on the decision mechanism of NeuroSearch, the behavior of NeuroSearch in a dynamic environment can be evaluated using RBA. • As a simulation environment, a P2P extension for NS-2 was built. • The environment provides quite rapid dynamic changes. Two classes of probabilities define the dynamic changes in the network: the first class is set randomly before the simulation starts, and the second is defined by the formulas on the slide.

  16. DYNAMIC ENVIRONMENT (II) To evaluate performance qualitatively, RBA was compared to BFS2 and BFS3 in static and dynamic environments. The number of replies and the number of packets used in the static environment are shown in the figures.

  17. DYNAMIC ENVIRONMENT (III) • Analyzing the behavior of the algorithms in the static environment, one can see that RBA mostly locates more resources than BFS2 and significantly fewer than BFS3. • In general, RBA uses more packets than BFS2 and significantly fewer than BFS3. • This situation is satisfactory, because RBA is based on NeuroSearch's decision mechanism, which is trained to locate only half of the available resources. • At some points RBA locates more resources than BFS3 while using fewer packets. This means that if a resource is not common in the network, RBA, and consequently NeuroSearch, can find enough instances of it.

  18. DYNAMIC ENVIRONMENT (IV) The number of replies and the number of packets used in the dynamic environment are shown in the figures. Analyzing them, one can see that the performance of the algorithms did not suffer much in the dynamic environment.

  19. DYNAMIC ENVIRONMENT (V) The total numbers of located resources and used packets in the static and dynamic environments are shown in the table. The algorithms can still find enough resources in the dynamic environment. Two possible causes can explain why all investigated algorithms found slightly fewer resources: 1) some nodes in offline mode could contain the queried resources; 2) some nodes in offline mode could lie on a possible path of a query.

  20. DYNAMIC ENVIRONMENT (VI) • The algorithms used fewer packets in the dynamic environment than in the static one. • The BFS strategy is very sensitive to the size of the network: the BFS-based algorithms used significantly fewer packets in the dynamic environment, where the network was smaller throughout the simulation. • RBA used approximately the same number of packets in both environments, so RBA is not strongly sensitive to the size of the network.

  21. FUTURE WORK • Develop a supervised approach to train NeuroSearch. • Develop a modification of the algorithm for ad hoc wireless P2P networks. • Look more closely and deeply at the inner structure of the algorithm using knowledge discovery methods. • Investigate and exploit the properties of other P2P algorithms to answer the question of whether those properties should be added to NeuroSearch.

  22. Thank you!
