Presented by Pooja Hegde CIS 525: Neural Computation Spring 2004 Instructor: Dr Vucetic

Support Vector Machine With Adaptive Parameters in Financial Time Series Forecasting, by L. J. Cao and Francis E. H. Tay. IEEE Transactions on Neural Networks, Vol. 14, No. 6, Nov 2003

Presentation Outline • Introduction • Motivation and introduction of a novel approach: SVM • Background • SVMs in Regression Estimation • Application of SVMs in financial forecasting • Experimental setup and results • Experimental analysis of SVM parameters and results • Adaptive Support Vector Machines (ASVM) • Experimental setup and results • Conclusions

Introduction • Financial time series forecasting is one of the most challenging applications of modern time series forecasting. • Characteristics: • Noisy: complete information from the past behavior of financial markets is unavailable, so the dependency between future and past prices cannot be fully captured. • Non-stationary: the distribution of financial time series changes over time. • The learning algorithm needs to incorporate this characteristic: information given by recent data points is given more weight than information given by distant data points.

Introduction • Back-propagation (BP) neural networks have been successfully used for modeling financial series. • BP neural networks are universal function approximators that can map any non-linear function without a priori assumptions about the properties of the data. • They are more effective in describing the dynamics of non-stationary time series due to their non-parametric, noise-tolerant and adaptive properties. • Then what's the problem? • Need for a large number of controlling parameters. • Difficulty in obtaining a stable solution. • Danger of overfitting: the neural network captures not just the useful information in the training data but also unwanted noise, which leads to poor generalization.

A Novel Approach: SVMs • Support Vector Machines are being used in a number of areas ranging from pattern recognition to regression estimation. • Reason: remarkable characteristics of SVMs • Good generalization performance: SVMs implement the Structural Risk Minimization Principle, which seeks to minimize an upper bound of the generalization error rather than only the training error. • Absence of local minima: training an SVM is equivalent to solving a linearly constrained quadratic programming problem. Hence the solution of an SVM is unique and globally optimal. • Sparse representation of the solution: in an SVM, the solution depends only on a subset of the training data points, called support vectors.

Background • Theory of SVMs in Regression Estimation • Given a set of data points (x1,y1), (x2,y2),…,(xl,yl) randomly and independently generated from an unknown function, SVM approximates the function using the following: • The coefficients w and b are estimated by minimizing the regularized risk function. • To estimate w and b, the regularized risk function is transformed into the primal function by introducing positive slack variables.
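The equations on this slide were not preserved. As a sketch only, the standard ε-insensitive support vector regression formulation is written out below; it is an assumption that the paper uses exactly this form. The symbols φ (feature map), ξ_i and ξ_i* (slack variables) and L_ε (ε-insensitive loss) are notation introduced here, and any 1/l factor on the empirical error is taken to be absorbed into C.

```latex
% Approximating function; \phi maps the input into a feature space
f(x) = w \cdot \phi(x) + b

% Regularized risk: empirical error (first term) plus regularization term
R(C) = C \sum_{i=1}^{l} L_\varepsilon\bigl(y_i, f(x_i)\bigr) + \frac{1}{2}\lVert w \rVert^2,
\qquad
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ \lvert y - f(x) \rvert - \varepsilon\bigr)

% Primal problem after introducing slack variables \xi_i, \xi_i^*
\min_{w,\, b,\, \xi,\, \xi^*}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{l} \bigl(\xi_i + \xi_i^*\bigr)
\quad \text{s.t.} \quad
\begin{cases}
y_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i \\
w \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\
\xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, l
\end{cases}
```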

Background • Theory of SVMs in Regression Estimation (contd.) • Introducing Lagrange multipliers and exploiting the optimality constraints, the decision function has the following explicit form: • α_i and α_i* are the Lagrange multipliers. They satisfy the equalities and are obtained by maximizing the dual function, which has the following form:
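The decision function and dual referred to on this slide were lost in extraction. A sketch of the standard ε-SVR dual is given below, under the assumption that the paper follows the textbook form. Here K(x_i, x_j) = φ(x_i)·φ(x_j) is the kernel, y_i are the targets, and C and ε are the regularization constant and tube size; the name W for the dual objective is introduced here.

```latex
% Decision function in terms of the Lagrange multipliers
f(x) = \sum_{i=1}^{l} \bigl(\alpha_i - \alpha_i^*\bigr) K(x_i, x) + b

% Conditions on the multipliers
\alpha_i \, \alpha_i^* = 0, \qquad \alpha_i \ge 0, \qquad \alpha_i^* \ge 0, \qquad i = 1, \dots, l

% Dual function to be maximized
W(\alpha, \alpha^*) =
  \sum_{i=1}^{l} y_i \bigl(\alpha_i - \alpha_i^*\bigr)
  - \varepsilon \sum_{i=1}^{l} \bigl(\alpha_i + \alpha_i^*\bigr)
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
      \bigl(\alpha_i - \alpha_i^*\bigr)\bigl(\alpha_j - \alpha_j^*\bigr) K(x_i, x_j)

% Constraints
\sum_{i=1}^{l} \bigl(\alpha_i - \alpha_i^*\bigr) = 0,
\qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i^* \le C
```

Only the training points with a non-zero α_i or α_i* contribute to f(x); these are the support vectors.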

Feasibility of Applying SVM in Financial Forecasting • Experimental Setup: • Data sets: • The daily closing prices of five real futures contracts from the Chicago Mercantile Market are used as datasets. • The original closing price is transformed into a five-day relative difference in percentage of price (RDP).
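The RDP transformation can be sketched as follows. This is a minimal illustration, assuming RDP is the percentage change of the closing price over a five-day horizon; the function name `rdp` is hypothetical, and any smoothing the paper may apply before differencing is omitted.

```python
def rdp(prices, horizon=5):
    """Relative difference in percentage over `horizon` days.

    Element k compares prices[k + horizon] with prices[k], so the result
    is `horizon` elements shorter than the input.
    """
    return [
        100.0 * (prices[i] - prices[i - horizon]) / prices[i - horizon]
        for i in range(horizon, len(prices))
    ]
```

Working with relative differences instead of raw prices makes the series closer to symmetric and removes most of the trend, which is the usual motivation for this kind of transformation.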

Feasibility of Applying SVM in Financial Forecasting • Input variables are four lagged RDP values based on 5-day periods (RDP-5, RDP-10, RDP-15, RDP-20) and one transformed closing price (EMA100). The output variable is RDP+5. • Z-score normalization is used to normalize the time series, which contain outliers. • A walk-forward testing routine is used to divide the whole dataset into 5 overlapping training-validation-testing sets.
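The normalization and walk-forward split described on this slide can be sketched as below. The window sizes and step are hypothetical parameters, not values taken from the paper, and computing the z-score statistics from a reference set (typically the training window) is an assumption made here to avoid leaking test information.

```python
from statistics import mean, pstdev

def zscore(values, reference):
    """Scale `values` using the mean and standard deviation of `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    return [(v - mu) / sigma for v in values]

def walk_forward(n, train, valid, test, step):
    """Return overlapping (train, valid, test) index ranges over n points.

    Each split starts `step` points after the previous one, so consecutive
    splits share most of their data.
    """
    splits = []
    start = 0
    while start + train + valid + test <= n:
        a = start
        b = a + train
        c = b + valid
        d = c + test
        splits.append((range(a, b), range(b, c), range(c, d)))
        start += step
    return splits
```

For example, `walk_forward(110, 50, 10, 10, 10)` produces five overlapping splits, matching the number of sets mentioned on the slide.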

Feasibility of Applying SVM in Financial Forecasting • Performance Criteria: • NMSE and MAE: measures of the deviation between the actual and predicted values. • Smaller values of NMSE and MAE indicate a better predictor. • DS: the correctness of the predicted direction of RDP+5, given as a percentage. • A larger value of DS indicates a better predictor. • A Gaussian kernel is used as the kernel function of the SVM. • The results on the validation set are used to choose the optimal SVM parameters (C, ε and δ²).
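The three criteria can be sketched as follows. The slide does not give the formulas, so these are assumed definitions: NMSE is the mean squared error divided by the sample variance of the actual values, MAE is the mean absolute error, and DS is the percentage of steps where the actual and predicted series move in the same direction. Dividing DS by the number of compared steps (n - 1) is a choice made here.

```python
def nmse(actual, predicted):
    """Mean squared error normalized by the sample variance of `actual`."""
    n = len(actual)
    mean_a = sum(actual) / n
    variance = sum((a - mean_a) ** 2 for a in actual) / (n - 1)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return squared_error / (variance * n)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def ds(actual, predicted):
    """Directional symmetry: percentage of steps with matching direction."""
    n = len(actual)
    hits = sum(
        1
        for i in range(1, n)
        if (actual[i] - actual[i - 1]) * (predicted[i] - predicted[i - 1]) >= 0
    )
    return 100.0 * hits / (n - 1)
```

Under this definition, a predictor that always outputs the mean of the actual values has an NMSE close to 1, which is why NMSE values near or above 1.0 indicate a series that is hard to predict.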

Feasibility of Applying SVM in Financial Forecasting • Benchmarks • Standard 3-layer BP neural network with 5 input nodes and 1 output node. • The number of hidden nodes, the learning rate and the number of epochs are chosen based on the validation set. • Sigmoid transfer function for the hidden nodes and linear transfer function for the output node. • The network is trained with the stochastic gradient descent method. • Regularized RBF Neural Network • It minimizes a risk function consisting of the empirical error and a regularized term. • The regularized RBF neural network software used was developed by Muller et al. and can be downloaded from http://www.kernel-machines.org. • Centers, variances and output weights are adjusted. • The number of hidden nodes and the regularization parameter are chosen based on the validation set.

Results • In all futures contracts, the largest values of NMSE and MAE are found in the RBF neural network. • In CME-SP, CBOT-US and EUREX-BUND, SVM has smaller NMSE and MAE values but BP has smaller values for DS. • The reverse is true for CBOT-BO and MATIF-CAC40. • All values of NMSE are near or larger than 1.0, indicating that the financial datasets are very noisy. • The smallest values of NMSE and MAE occur in SVM, followed by the RBF neural network. • In terms of DS, results are comparable among the three methods.

Results • In CME-SP, CBOT-BO, EUREX-BUND and MATIF-CAC40, the smallest values of NMSE and MAE are found in SVM, followed by the RBF neural network. • In CBOT-US, BP has the smallest NMSE and MAE, followed by RBF. • Paired t-test: SVM and RBF outperform BP at the α = 5% significance level for a one-tailed test. There is no significant difference between SVM and RBF.
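The paired t-test mentioned above compares two methods on the same datasets. A minimal sketch of the test statistic follows; the function name `paired_t` is hypothetical, and pairing the per-dataset error values of two methods is an assumption about how the test is set up, since the slide does not say which quantity is paired.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(errors_a, errors_b):
    """t statistic for the paired differences errors_a[i] - errors_b[i].

    A large positive value suggests method A has larger errors than
    method B. Degrees of freedom are len(errors_a) - 1.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# One-tailed critical value at the 5% level for 4 degrees of freedom
# (five paired datasets), taken from a standard t-table.
T_CRITICAL_5PCT_DF4 = 2.132
```

With five datasets, a statistic above `T_CRITICAL_5PCT_DF4` would reject, at the 5% level, the hypothesis that the two methods perform equally in favor of method B having smaller errors.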

Experimental Analysis of Parameters C and δ²

Results • Too small a value of δ² causes SVM to overfit the training data, while too large a value causes SVM to underfit the training data. • Too small a value of C causes SVM to underfit the training data. When C is too large, SVM overfits the training set, which degrades generalization performance. • δ² and C play an important role in the generalization performance of the SVM.
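The effect of δ² can be illustrated with the kernel itself. The sketch below assumes the Gaussian kernel has the form exp(-||x - y||² / δ²); the exact scaling used in the paper is not shown on the slides.

```python
from math import exp

def gaussian_kernel(x, y, delta2):
    """Gaussian kernel exp(-||x - y||^2 / delta2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-squared_distance / delta2)
```

With a very small δ², the kernel value between two distinct points is close to 0, so each training point is similar only to itself and the model can memorize the training data. With a very large δ², the value is close to 1 for every pair, so all points look alike and the model cannot separate them.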

Experimental Analysis of Parameter ε • NMSE on the training and validation sets is very stable and relatively unaffected by changes in ε. • The performance of SVM is insensitive to ε. This result cannot be generalized, however, because the effect of ε on performance depends on the input dimension of the dataset. • The number of support vectors is a decreasing function of ε. Hence a large ε reduces the number of support vectors without affecting the performance of the SVM.
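Why the number of support vectors falls as ε grows can be seen from the ε-insensitive loss: only points on or outside the ε-tube become support vectors. The sketch below counts such points for a fixed set of residuals. This is a simplification, since in a real SVM the fitted function (and therefore the residuals) also changes with ε; the function names are hypothetical.

```python
def eps_insensitive_loss(residual, eps):
    """Loss is zero inside the tube and grows linearly outside it."""
    return max(0.0, abs(residual) - eps)

def count_outside_tube(residuals, eps):
    """Number of points on or outside the eps-tube (support vector candidates)."""
    return sum(1 for r in residuals if abs(r) >= eps)
```

For any fixed list of residuals, `count_outside_tube` can only stay the same or decrease as `eps` increases, which mirrors the trend reported on the slide.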

Support Vector Machine with Adaptive Parameters (ASVM) • Modification of parameter C: • Regularized risk function = empirical error + regularized term. • Increasing the value of C increases the relative importance of the empirical error with respect to the regularized term. • The behavior of the weight function can be summarized as follows: • When a → 0, lim_{a→0} C_i = C. Hence E_ASVM = E_SVM. • When a → ∞: … • When a ∈ [0, ∞) and a increases, the weights for the first half of the training data points become smaller and those for the second half of the training data points become larger.
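The weight function itself did not survive on this slide. The sketch below assumes a sigmoid-shaped form, C_i = C · 2 / (1 + exp(a - 2a·i/l)), where i = 1..l indexes training points from oldest to most recent. This form is an assumption; it is used here because it reproduces the behaviors listed above (constant C when a = 0, smaller weights for the first half and larger for the second half as a grows). The function name is hypothetical.

```python
from math import exp

def ascending_c_weights(C, l, a):
    """Per-point regularization constants C_i for i = 1..l (assumed form).

    a = 0 gives C_i = C for every point; larger a shifts weight from
    the older half of the data to the more recent half.
    """
    return [C * 2.0 / (1.0 + exp(a - 2.0 * a * i / l)) for i in range(1, l + 1)]
```

Under this assumed form, a very large `a` drives the weights of the oldest points toward 0 and those of the most recent points toward 2C.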

Support Vector Machine with Adaptive Parameters (ASVM) • Modification of parameter ε: • To make the solution of SVM sparser, ε adopts the following form: • The proposed adaptive ε places more weight on recent training points than on distant ones. • Because the number of support vectors is a decreasing function of ε, recent training points receive more attention in the representation of the solution than distant points. • The behavior of the weight function can be summarized as follows: • When b → 0, lim_{b→0} ε_i = ε. Hence the weights on all training data points equal 1.0. • When b → ∞: … • When b ∈ [0, ∞) and b increases, the weights for the first half of the training data points become larger and those for the second half of the training data points become smaller.
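The adaptive ε formula is also missing from the slide. The sketch below assumes the form ε_i = ε · (1 + exp(b - 2b·i/l)) / 2, with i = 1..l running from the oldest to the most recent training point. This is an assumption chosen because it matches the listed behaviors: ε_i = ε when b = 0, and a wider tube for older points than for recent ones as b grows. The function name is hypothetical.

```python
from math import exp

def descending_eps(eps, l, b):
    """Per-point tube sizes eps_i for i = 1..l (assumed form).

    b = 0 gives eps_i = eps for every point; larger b widens the tube
    for older points and narrows it for recent points.
    """
    return [eps * (1.0 + exp(b - 2.0 * b * i / l)) / 2.0 for i in range(1, l + 1)]
```

A wider tube on old points means fewer of them end up as support vectors, so the solution is represented mainly by recent data.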

ASVM & Weighted BP Neural Network (WBP) • Regularized risk function in ASVM: • Corresponding dual function: • Weighted BP Neural Network: • Weight update:
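None of the four equations on this slide survived extraction. The block below is a sketch of what they plausibly look like, obtained by replacing the constants C and ε of standard ε-SVR with per-point values C_i and ε_i; it is an assumption that the paper's equations match this. The WBP part is a generic weighted squared-error criterion with gradient descent, where w_i (per-point weight), o_i (network output), θ (any network weight) and η (learning rate) are symbols introduced here.

```latex
% ASVM regularized risk with per-point C_i and \varepsilon_i
E_{\mathrm{ASVM}} = \sum_{i=1}^{l} C_i \bigl(\xi_i + \xi_i^*\bigr) + \frac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad
\begin{cases}
y_i - w \cdot \phi(x_i) - b \le \varepsilon_i + \xi_i \\
w \cdot \phi(x_i) + b - y_i \le \varepsilon_i + \xi_i^* \\
\xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, l
\end{cases}

% Corresponding dual function (to be maximized)
W(\alpha, \alpha^*) =
  \sum_{i=1}^{l} y_i \bigl(\alpha_i - \alpha_i^*\bigr)
  - \sum_{i=1}^{l} \varepsilon_i \bigl(\alpha_i + \alpha_i^*\bigr)
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
      \bigl(\alpha_i - \alpha_i^*\bigr)\bigl(\alpha_j - \alpha_j^*\bigr) K(x_i, x_j)

\sum_{i=1}^{l} \bigl(\alpha_i - \alpha_i^*\bigr) = 0,
\qquad 0 \le \alpha_i \le C_i, \qquad 0 \le \alpha_i^* \le C_i

% Weighted BP network: weighted squared error and gradient-descent update
E_{\mathrm{WBP}} = \frac{1}{2} \sum_{i=1}^{l} w_i \bigl(y_i - o_i\bigr)^2,
\qquad
\Delta\theta = -\eta \, \frac{\partial E_{\mathrm{WBP}}}{\partial \theta}
             = \eta \sum_{i=1}^{l} w_i \bigl(y_i - o_i\bigr) \frac{\partial o_i}{\partial \theta}
```

Setting C_i = C and ε_i = ε for all i recovers the standard SVM risk and dual, and setting w_i = 1 recovers standard BP training.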

Results of ASVM • ASVM and WBP have smaller NMSE and MAE but larger DS than their corresponding standard methods. • ASVM outperforms SVM at α = 2.5%. • WBP outperforms BP at α = 10%. • ASVM outperforms WBP at α = 5%. • ASVM converges to fewer support vectors.

Conclusions • SVM: a promising alternative tool to the BP neural network for financial time series forecasting. • Comparable performance between the regularized RBF neural network and SVM. • C and δ² have a great influence on the performance of SVM. The number of support vectors can be reduced by using a larger ε, resulting in a sparse representation of the solution. • ASVM achieves higher generalization performance and uses fewer support vectors than the standard SVM in financial forecasting. • Future work: investigate techniques to choose optimal values of the free parameters of ASVM. Explore sophisticated weight functions that closely follow the dynamics of the time series and further improve the performance of ASVM.

THANK YOU!!!!
