
Caching & Replacement of Multimedia Streaming Objects using Soft Computing in Fixed Networks

This article introduces a neural network-based approach for caching and replacement of multimedia streaming objects in fixed networks. The article discusses the characteristics of neural networks, their architectures, learning methods, and the use of backpropagation networks for training. The article also explores the calculation of derivatives in the context of neural network training.

Presentation Transcript


  1. Caching & Replacement of Multimedia Streaming Objects using Soft Computing in Fixed Networks

  2. Neural Networks: Introduction • A simplified model of the human brain. • A massively parallel distributed processing system. • Has the ability to learn and thereby acquire knowledge.

  3. Characteristics of neural networks • Generally appropriate for problems where the final answer depends heavily on combinations of many input features. • They exhibit mapping capabilities i.e. they can map input patterns to their associated output patterns.

  4. Characteristics of neural networks • They can be trained with known examples of a problem before they are tested for their ‘inference’ capability on unknown instances of the problem. • Robust and fault tolerant. • Can recall full patterns from incomplete, partial or noisy patterns.

  5. Model of an artificial neuron [diagram: weighted inputs summed and passed through a thresholding unit]

  6. Activation functions • A common activation function is the thresholding function: the net input is compared to a threshold value Ф. • Here Ф = 0, so the neuron outputs 1 when the net input exceeds 0 and 0 otherwise.
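The thresholding neuron described above can be sketched in a few lines of Python; the function name and the example weights are illustrative, not taken from the slides:

```python
def threshold_neuron(inputs, weights, phi=0.0):
    """Sum the weighted inputs and compare against the threshold phi."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > phi else 0

# With phi = 0 the neuron fires only when the net input is positive.
threshold_neuron([1, 1], [0.6, 0.6])   # net = 1.2 -> fires (1)
threshold_neuron([1, -1], [0.5, 0.5])  # net = 0.0 -> does not fire (0)
```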

  7. Activation functions

  8. Activation functions

  9. Neural network architectures • Single layer feedforward network • Multilayer feedforward network • Recurrent network.

  10. Single layer Feedforward Network • There are only two layers: an input layer and an output layer. • Only the output layer performs computation; the input layer merely transmits the signals.

  11. Multilayer Feedforward Network • In a feedforward network, information always moves in one direction; it never flows backwards.

  12. Recurrent Networks • There is at least one feedback loop from output to input.

  13. Learning Methods • Where do the weights come from? • The weights in a neural network are the most important factor in determining its function. • Learning is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function. • There are two main types of training: 1. Supervised Training • Supplies the neural network with inputs and the desired outputs, so that the error can be determined. • The weights are modified to reduce the difference between the actual and desired outputs.

  14. 2. Unsupervised Training • Only supplies inputs. • The neural network adjusts its own weights so that similar inputs cause similar outputs. • The network identifies the patterns and differences in the inputs without any external assistance. • Epoch: one iteration through the process of providing the network with an input and updating the network's weights. • Typically many epochs are required to train the neural network.

  15. Perceptrons • The first neural network with the ability to learn. • Made up of only input neurons and output neurons. • Input neurons typically have two states: ON and OFF. • Output: yk = f(netk) = 1 if netk > 0, and 0 otherwise, where netk = ∑ xi wik. [diagram: input neurons connected to an output neuron through weights such as 0.5, 0.2, 0.8]

  16. How do perceptrons learn?
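The transcript does not reproduce the rule from this slide, but the classic perceptron learning rule, w ← w + η(t − y)x, can be sketched as follows; the function name, learning rate, and epoch count are illustrative:

```python
def perceptron_train(samples, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - output) * x."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            net = sum(xi * wi for xi, wi in zip(x, w)) + b
            y = 1 if net > 0 else 0
            err = target - y                      # error drives the update
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so the perceptron can learn it.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = perceptron_train(data)
```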

  17. Perceptrons cannot handle tasks which are not linearly separable • Sets of points in two-dimensional space are linearly separable if the sets can be separated by a straight line. • A perceptron cannot find weights for problems that are not linearly separable. An example is the XOR problem.

  18. XOR: Not linearly separable XOR and its negation are the only Boolean functions of two arguments that are not linearly separable

  19. The XOR problem can be solved by a multilayer feedforward network

  20. Backpropagation Networks • While single neurons can perform certain simple pattern detection functions, the power of neural computation comes from neurons connected in a network structure. • For many years there was no theoretically sound algorithm for training multilayer artificial neural networks. • Backpropagation was one of the first general techniques developed to train multilayer networks. • Backpropagation networks use a gradient descent method to minimize the total squared error of the output.

  21. A form of supervised training • Simple, slow, prone to local-minima issues. • The most common measure of error is the mean square error: E = ∑p (tp − yp)², the sum over all points p in our data set of the squared difference between the target value tp and the model's prediction yp, calculated from the input value xp.

  22. Minimizing the error • The gradient G of E gives us the direction in which the error function, at the current setting of the weights w, has the steepest slope. • In order to decrease E, we take a small step in the opposite direction, −G. • By repeating this over and over, we move "downhill" in E until we reach a minimum, where G = 0, so that no further progress is possible.

  23. In order to train neural networks such as the ones shown above by gradient descent, we need to be able to compute the gradient G of the error function with respect to each weight wij of the network. It tells us how a small change in that weight will affect the overall error E. A small step (learning rate) in the opposite direction will result in the maximum decrease of the (local) error function: wnew = wold – α ∂E/∂wold where α is the learning rate Calculation of the derivatives flows backwards through the network, hence the name, backpropagation
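The update rule wnew = wold − α ∂E/∂wold can be illustrated on a one-weight model y = w·x with squared error; the data, learning rate, and function name below are illustrative:

```python
def gradient_step(w, data, lr=0.05):
    """One step of w_new = w_old - lr * dE/dw for the model y = w * x,
    with E = sum over (x, t) of (t - w * x) ** 2."""
    grad = sum(-2 * (t - w * x) * x for x, t in data)
    return w - lr * grad

# Repeated steps move w "downhill" in E toward the minimum (here w = 2).
data = [(1, 2), (2, 4), (3, 6)]
w = 0.0
for _ in range(100):
    w = gradient_step(w, data)
```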

  24. For an output-layer weight wij the derivative takes the form ∂E/∂wij = −(ti − yi) yj, where yj is the output of the unit feeding that weight. • An important consideration is the learning rate α, which determines by how much we change the weights w at each step. If α is too small, the algorithm will take a long time to converge. Conversely, if α is too large, we may end up bouncing around the error surface out of control: the algorithm diverges.

  25. Backpropagation Learning • If the squashing function is the sigmoid function, the derivative has the convenient form f'(u) = f(u)(1 − f(u)). • Another popular choice of squashing function is tanh, which takes values in the range (−1, 1) rather than (0, 1): tanh'(u) = 1 − tanh²(u).
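Both derivative identities are easy to check numerically; this sketch (function names illustrative) compares the closed forms against finite differences:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_deriv(u):
    """Convenient closed form: f'(u) = f(u) * (1 - f(u))."""
    s = sigmoid(u)
    return s * (1.0 - s)

def tanh_deriv(u):
    """tanh'(u) = 1 - tanh(u)**2."""
    return 1.0 - math.tanh(u) ** 2

# Numerical check against a central finite difference.
h = 1e-6
for u in (-1.0, 0.0, 0.5):
    fd = (sigmoid(u + h) - sigmoid(u - h)) / (2 * h)
    assert abs(fd - sigmoid_deriv(u)) < 1e-6
```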

  26. Adding a momentum term • Tends to aid convergence • Add a fraction of the previous weight change to the current weight change. Δwnew = βΔwold - α ∂E/∂wold β is the momentum coefficient wnew = wold + Δwnew Addition of such a term smoothes out the descent path by preventing extreme changes in the gradients due to local anomalies.
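The momentum update above can be sketched directly; the defaults below use the learning rate 0.2 and momentum coefficient 0.8 that the training slides later report as optimal (the function name is illustrative):

```python
def momentum_step(w, delta_w_old, grad, lr=0.2, beta=0.8):
    """Momentum update: Delta_w_new = beta * Delta_w_old - lr * grad,
    then w_new = w + Delta_w_new."""
    delta_w_new = beta * delta_w_old - lr * grad
    return w + delta_w_new, delta_w_new

# Under a persistent gradient, a fraction of the previous change carries over.
w, dw = momentum_step(1.0, 0.0, 0.5)   # dw = -0.10
w, dw = momentum_step(w, dw, 0.5)      # dw = -0.18 (momentum added)
```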

  27. Replacement Issues • Current replacement algorithms usually make a binary decision on the caching of an atomic object. • The object is cached or flushed in its entirety, based on time or frequency. • The optimal caching algorithm caches objects partially (i.e. only some of a video's frames are cached).

  28. Replacement Issues • A replacement algorithm that uses just time or frequency would not suffice. • The algorithm should also take size into account, because size is the most important factor in the case of multimedia objects. • In the case of videos, popularity is essential: once a video becomes popular, requests for it grow exponentially, while less popular videos are almost ignored.

  29. Neural Network Proxy Cache Replacement • In the Neural Replacement Policy, whenever the cache is full and a cache miss occurs, the NN algorithm determines the video frames to evict by computing a mathematical merit, called the cache metric, for each video, depending on three parameters: size, frequency, and access recency.

  30. Neural Network Proxy Cache Replacement • Parameters • Access Frequency (Highest priority) • Size • Access Recency (Lowest priority)

  31. Neural Network Proxy Cache Replacement • Access frequency: the lower the frequency, the higher the probability of that object being replaced. • Size: larger objects are given less priority so that more objects can fit inside the cache.

  32. Neural Network Proxy Cache Replacement • Access recency: every video object has an access recency field T(r). Every time a request is made, this field is recalculated by subtracting the proxy start time from the current time, and updated. A higher T(r) value therefore indicates a more recently accessed object.

  33. Neural Network Proxy Cache Replacement • A multilayer feedforward artificial neural network handles web proxy cache replacement decisions. • The weights of the network are adjusted using backpropagation. • The sigmoid function is used as the activation function for the neural network.

  34. Neural Network Proxy Cache Replacement • For approximating bounded continuous functions, as in our case, one hidden layer is sufficient. • If the number of input neurons is j, the number of neurons in the hidden layer should be between j and 2j. • The exact number of hidden-layer neurons can be determined from the error and computation time during training.

  36. Cache Metric Function • The neural network is used to approximate the following cache metric function: H = 1 / (1 + exp(−F)), where F = fi^f · Ti^r · si^s • fi = frequency of the ith video object/frame • Ti = access recency time of the ith video object/frame • si = size of the ith video object/frame • All these values are normalized. • The indices f, r and s are integer constants; they signify the relative priority given to the three parameters. • For this cache metric, the values of these indices determined after training are f = 5, r = 2 and s = −4.
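The cache metric can be computed directly with the trained indices f = 5, r = 2, s = −4; this sketch assumes the three inputs are already normalized to (0, 1], and the function name is illustrative:

```python
import math

def cache_metric(freq, recency, size, f=5, r=2, s=-4):
    """H = 1 / (1 + exp(-F)) with F = freq**f * recency**r * size**s.
    Inputs are assumed normalized to (0, 1]; f, r, s are the index
    values reported after training."""
    F = (freq ** f) * (recency ** r) * (size ** s)
    return 1.0 / (1.0 + math.exp(-F))

# A frequent, recently accessed, small object outranks a rare, stale,
# large one, so the latter is the better eviction candidate.
hot = cache_metric(0.9, 0.9, 0.2)
cold = cache_metric(0.2, 0.3, 0.9)
```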

  37. Neural Network Proxy Cache Replacement • Neural network has a single output (tag value) which is assigned to each video object and also to each of its frames • First, the video object with lowest tag value is selected and then the frames with lower tag values of that video are identified and evicted.
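The two-stage eviction order (lowest-tag video first, then its lowest-tag frames) might be sketched as follows; the data layout is an assumption for illustration, not from the slides:

```python
def choose_victim(videos):
    """videos maps video id -> (video_tag, {frame_id: frame_tag}).
    Returns the video with the lowest tag value, with its frames
    ordered so the lowest-tag frames are evicted first."""
    victim = min(videos, key=lambda vid: videos[vid][0])
    frame_tags = videos[victim][1]
    frames = sorted(frame_tags, key=frame_tags.get)
    return victim, frames

# 'b' has the lowest video tag; within it, frame 2 is evicted before frame 1.
catalog = {'a': (0.9, {1: 0.7}), 'b': (0.2, {1: 0.5, 2: 0.1})}
victim, order = choose_victim(catalog)
```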

  38. Training of neural network • A (3,4,1) multilayer feedforward NN with backpropagation. • Initially, randomized weights are assigned to each interconnection. [diagram: inputs Frequency, Access recency and Size feed the neural network; the output is the Tag Value]

  39. Training of neural network • Training set of 100 video objects. • Each video, with its frequency, access recency and size, is given as input to the neural network. • Each of the inputs was normalized before being fed to the NN.

  40. Training of neural network • The learning rate and momentum coefficient were found to be optimal at 0.2 and 0.8 respectively.

  41. Pseudo code for replacement • Initially we find the cache size required to fulfil all the requests without a replacement policy. Next we simulate the algorithm for different cache sizes. • At the start of the program the proxy server cache is empty, and we fix the cache size and cutoff. • The client requests are stored in a text file. The file containing frame sizes for each video object is stored on the origin server. • The proxy server starts reading the client requests one by one.

  42. Results • Neural Network Structure obtained after training (output written to a file)

  43. Results • Output of neural network for the training set

  44. The Proxy Cache • The proxy server contains the following items: 1) cached data of video objects, 2) a binary tree storing entries corresponding to each cached object, 3) a stack storing the frames of each video. • Each node of the binary tree holds the following values: 1) frequency, 2) size, 3) time stamp, 4) neural network output (tag value).

  45. The Proxy Cache • The ordering parameter for the binary tree is the tag value. • For each cache miss a new node is inserted in the tree. • For each cache hit, we search the video in the tree and update its parameters.

  46. Replacement Algorithm • As caching of the videos using the OC algorithm happens frame by frame, the replacement should also happen frame by frame. • Once the victim video is selected, frames are deleted starting from the last frame. • If a request for the video being deleted arrives, that video can be locked and another file chosen for deletion; the initial frames that remain can then be used to serve a future request, at least partially.

  47. Cache Hit • Start delivering the video file to the client and update all parameters related to the video object. • Cache Miss • If current_cache_size + current_request_size < Max_cache_size: send a request to the origin server; start caching the video using the optimal caching algorithm and transfer data to the client. • Else: run the replacement algorithm; find the victim video and remove it from the cache frame by frame until sufficient space is created for the new video.
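The hit/miss flow on this slide can be sketched in Python. This is a simplification: it evicts whole objects by lowest tag value rather than frame by frame, and all names and numbers are illustrative:

```python
def handle_request(cache, video_id, size, max_size, tag):
    """Sketch of the hit/miss flow; `tag` stands in for the neural
    network's output for this video. On a miss with no room, the
    lowest-tag video is evicted until the new video fits."""
    if video_id in cache:                        # cache hit: serve it
        cache[video_id]['freq'] += 1             # and update parameters
        return 'hit'
    used = sum(v['size'] for v in cache.values())
    while used + size > max_size and cache:      # miss with no room
        victim = min(cache, key=lambda v: cache[v]['tag'])
        used -= cache.pop(victim)['size']        # evict the victim
    cache[video_id] = {'size': size, 'freq': 1, 'tag': tag}  # fetch & cache
    return 'miss'

cache = {}
handle_request(cache, 'v1', 40, 100, tag=0.3)   # miss: cached
handle_request(cache, 'v1', 40, 100, tag=0.3)   # hit
handle_request(cache, 'v2', 80, 100, tag=0.9)   # miss: evicts v1 for room
```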
