An automated trading system based on Recurrent Reinforcement Learning. Students: Lior Kupfer Pavel Lifshits Supervisor: Andrey Bernstein Advisor: Prof. Nahum Shimkin. Outline. Introduction Notations The System The Learning Algorithm Project Goals Results
An automated trading system based on Recurrent Reinforcement Learning Students: Lior Kupfer, Pavel Lifshits Supervisor: Andrey Bernstein Advisor: Prof. Nahum Shimkin
Outline • Introduction • Notations • The System • The Learning Algorithm • Project Goals • Results • Artificial Time Series (the AR case) • Real Foreign Exchange / Stock Data • Conclusions • Future work
L. Kupfer & P.Lifshits : “An automated trading system based on Recurrent Reinforcement Learning”, Technion - Israel Institute of Technology, Faculty of Electrical Engineering, Control and Robotics Laboratory.
Introduction • Using Machine Learning methods for trading • One relatively new approach to financial trading • Using learning algorithms to predict the rise and fall of asset prices before they occur • An optimal trader would buy an asset before the price rises, and sell the asset before its value declines
Introduction • Trading technique • An asset trader was implemented using recurrent reinforcement learning (RRL), as suggested by Moody and Saffell (2001) • It is a gradient ascent algorithm which attempts to maximize a utility function known as Sharpe’s ratio • A parameter vector completely defines the actions of the trader • By choosing an optimal parameter vector for the trader, we attempt to take advantage of asset price changes
Introduction • Transaction costs include commissions, bid/ask spreads, price slippage, and market impact • Because of these costs, our constraints are: we can’t trade arbitrarily frequently, and we can’t make large changes in portfolio composition • Model assumptions: fixed position size, single security
Notations • The fixed quantity of the security that is traded • The price series • The corresponding price changes • Our position at each time step • The system return at each time step
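For reference, these quantities can be written in the notation of Moody and Saffell [1]. The symbols $\mu$, $z_t$, $r_t$, $F_t$, $R_t$ and the transaction cost rate $\delta$ are assumed from that paper rather than taken from the slide, and the return is shown in its simplest form (zero risk-free rate):

```latex
\begin{align*}
\mu &: \text{fixed quantity of the security traded} \\
z_t &: \text{price at time } t, \quad t = 0, 1, \dots, T \\
r_t &= z_t - z_{t-1} \\
F_t &\in \{-1, 0, 1\} \quad \text{(short, neutral, long)} \\
R_t &= \mu \left( F_{t-1}\, r_t - \delta \left| F_t - F_{t-1} \right| \right)
\end{align*}
```

The position held over the interval, $F_{t-1}$, earns the price change $r_t$, and a cost proportional to $\delta$ is paid whenever the position changes.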
Notations • Additive profit accumulated over T time periods • Performance criterion • The marginal increase in the performance
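A minimal sketch of how the additive profit and a Sharpe-ratio performance criterion could be computed from a position sequence. It assumes the return form of Moody and Saffell [1], R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|); the function names and the parameters `mu` and `delta` are illustrative, not taken from the slides.

```python
import math


def system_returns(price_changes, positions, mu=1.0, delta=0.0):
    """Per-step returns R_1..R_T (assumed form, after Moody and Saffell).

    price_changes holds r_1..r_T; positions holds F_0..F_T (one more entry).
    """
    if len(positions) != len(price_changes) + 1:
        raise ValueError("positions must hold F_0..F_T, price_changes r_1..r_T")
    returns = []
    for t, r_t in enumerate(price_changes, start=1):
        prev_pos, pos = positions[t - 1], positions[t]
        # The previous position earns the price change; changing position costs delta.
        returns.append(mu * (prev_pos * r_t - delta * abs(pos - prev_pos)))
    return returns


def additive_profit(returns):
    """Additive profit accumulated over the T periods."""
    return sum(returns)


def sharpe_ratio(returns):
    """Mean return divided by its standard deviation (0.0 if the returns are constant)."""
    if not returns:
        raise ValueError("returns must not be empty")
    mean = sum(returns) / len(returns)
    var = sum((x - mean) ** 2 for x in returns) / len(returns)
    return mean / math.sqrt(var) if var > 0 else 0.0
```

For example, `additive_profit(system_returns([1.0, -2.0, 0.5], [0, 1, -1, -1], delta=0.1))` sums the three per-step returns, each of which includes the cost of the position change made at that step.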
The system • Parameter vector (which we attempt to learn) • Information available at time t (in our case, the price changes) • Stochastic extension (noise) whose level can be varied to control “exploration vs. exploitation” • Our system is a single-layer recurrent neural network
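A single-layer recurrent trader of this kind can be sketched as follows. This is an illustration under assumptions, not the authors' exact model: the tanh output, the layout of `theta` as [w_1..w_m, u, b], the Gaussian noise term, and the `quantize` helper are all hypothetical choices in the spirit of Moody and Saffell [1].

```python
import math
import random


def trader_position(theta, recent_changes, prev_position, noise_std=0.0, rng=random):
    """F_t = tanh(w . recent_changes + u * F_{t-1} + b + eps_t).

    theta is laid out as [w_1, ..., w_m, u, b] (a hypothetical layout);
    recent_changes holds the m most recent price changes.
    """
    m = len(recent_changes)
    if len(theta) != m + 2:
        raise ValueError("theta must have len(recent_changes) + 2 entries")
    w, u, b = theta[:m], theta[m], theta[m + 1]
    # Optional exploration noise; zero noise gives a deterministic trader.
    eps = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    activation = sum(wi * ri for wi, ri in zip(w, recent_changes))
    activation += u * prev_position + b + eps
    return math.tanh(activation)


def quantize(position, threshold=0.0):
    """Map a continuous output in (-1, 1) to a discrete position in {-1, 0, +1}."""
    if position > threshold:
        return 1
    if position < -threshold:
        return -1
    return 0
```

The recurrence enters through `prev_position`: because the previous position feeds the next decision, the trader can learn to avoid switching positions when the expected gain does not cover the transaction cost.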
The learning algorithm • We use reinforcement learning (RL) to adjust the parameters of the system to maximize our performance criterion of choice • RL sits between supervised and unsupervised learning • RL framework: • Agent • Environment • Reward • Expected Return • Policy Learning • RL modus operandi: the agent perceives the state of the environment s_t and chooses an action a_t. It subsequently observes the new state of the environment s_{t+1} and receives a reward r_t. • Aim: learn a policy π (a mapping from states to actions) which optimizes the expected return
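The agent-environment interaction described above can be sketched as a simple loop. The toy environment here is hypothetical and only serves to make the loop runnable: the state is the sign of the most recent price change, and the reward is the action multiplied by the next price change.

```python
def run_episode(policy, step, initial_state, n_steps):
    """Roll a policy forward and return the sum of the observed rewards."""
    state, total = initial_state, 0.0
    for _ in range(n_steps):
        action = policy(state)               # agent chooses a_t from s_t
        state, reward = step(state, action)  # environment returns s_{t+1} and r_t
        total += reward
    return total


def make_price_env(changes):
    """Hypothetical environment built from a fixed list of price changes."""
    it = iter(changes)

    def step(state, action):
        change = next(it)
        next_state = 1 if change > 0 else -1
        return next_state, action * change

    return step
```

A trend-following policy such as `lambda s: s` takes the position that matches the last observed move; `run_episode` then reports the total reward that policy collects on the given series.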
The learning algorithm • RL approaches • Direct RL: the policy is represented directly. The reward function (immediate feedback) is used to adjust the policy on the fly, e.g. policy search • Value function RL: values are assigned to each state (or state-action pair). Values correspond to estimates of future expected returns, or in other words, to the long-term desirability of states. These values help guide the agent towards the optimal policy, e.g. TD-Learning, Q-Learning • Actor-Critic: the model is split into two parts: the critic, which maintains the state value estimate V, and the actor, which is responsible for choosing the appropriate actions at each state
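For contrast with the direct approach used by RRL, the value-function family can be illustrated with the standard tabular Q-learning update. This sketch is generic and is not part of the trading system; the dictionary-based table and the parameter names `alpha` (step size) and `gamma` (discount factor) are illustrative choices.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Here the policy is implicit (pick the action with the largest value), whereas in direct RL the parameters of the policy itself are adjusted.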
The learning algorithm • In RRL we learn the policy by gradient ascent on the performance function • The performance function can be: • Profits • Sharpe’s ratio • Sterling ratio • Double deviation • Moody and Saffell suggest an additive and differentiable approximation of Sharpe’s ratio: the differential Sharpe’s ratio
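The differential Sharpe ratio can be computed online from exponential moving estimates of the first and second moments of the returns. The sketch below follows the form given by Moody and Saffell [1]; the class name, the zero initialization of the estimates, and the guard for a non-positive variance estimate are assumptions made for illustration.

```python
class DifferentialSharpe:
    """Online differential Sharpe ratio D_t with adaptation rate eta."""

    def __init__(self, eta=0.01):
        self.eta = eta
        self.a = 0.0  # moving estimate of the mean return
        self.b = 0.0  # moving estimate of the mean squared return

    def update(self, r):
        """Return D_t for the new return r, then update the moving estimates."""
        delta_a = r - self.a
        delta_b = r * r - self.b
        variance = self.b - self.a ** 2
        if variance > 0:
            d = (self.b * delta_a - 0.5 * self.a * delta_b) / variance ** 1.5
        else:
            d = 0.0  # estimates not yet informative (assumed convention)
        self.a += self.eta * delta_a
        self.b += self.eta * delta_b
        return d
```

Because D_t depends only on the current return and the previous estimates, it gives a per-step reward signal that can be differentiated with respect to the trader's parameters.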
The learning algorithm • We now develop the gradient of the performance criterion with respect to the parameter vector
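As a hedged reconstruction, the gradient used for the ascent step can be written in the form given by Moody and Saffell [1]. The symbols $U_T$ (performance criterion), $\theta$ (parameter vector), $\rho$ (learning rate), $\mu$ and $\delta$ are assumed from that paper, and the return is taken to be $R_t = \mu \left( F_{t-1} r_t - \delta \left| F_t - F_{t-1} \right| \right)$:

```latex
\begin{align*}
\frac{dU_T}{d\theta} &= \sum_{t=1}^{T} \frac{dU_T}{dR_t}
  \left( \frac{dR_t}{dF_t} \frac{dF_t}{d\theta}
       + \frac{dR_t}{dF_{t-1}} \frac{dF_{t-1}}{d\theta} \right) \\
\frac{dF_t}{d\theta} &= \frac{\partial F_t}{\partial \theta}
  + \frac{\partial F_t}{\partial F_{t-1}} \frac{dF_{t-1}}{d\theta} \\
\frac{dR_t}{dF_t} &= -\mu \delta \,\operatorname{sign}(F_t - F_{t-1}), \qquad
\frac{dR_t}{dF_{t-1}} = \mu r_t + \mu \delta \,\operatorname{sign}(F_t - F_{t-1}) \\
\theta &\leftarrow \theta + \rho \, \frac{dU_T}{d\theta}
\end{align*}
```

The second line is what makes the method recurrent: the derivative of the current position depends on the derivative of the previous one, so the gradient is accumulated forward in time.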
Project goals • Investigate Reinforcement Learning by policy gradient • Implement an automated trading system which learns its trading strategy using the Recurrent Reinforcement Learning algorithm • Analyze the system’s results & structure • Suggest and examine improvement methods
Results • The challenges we face • Model parameters: • If and how to normalize the learned weights? • How to normalize the input? • The averages change over time (non-stationary); we assume that the change is slower than “how far back we look”
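One possible way to normalize a non-stationary input is a rolling z-score over a trailing window, which relies on the same assumption that the averages drift more slowly than the window length. This is a sketch of that idea only; it is not necessarily the normalization the authors used, and the function name and `window` parameter are hypothetical.

```python
import math


def rolling_zscore(values, window):
    """Normalize each value by the mean and std of the trailing window ending at it."""
    out = []
    for t in range(len(values)):
        # Only current and past values are used, so the normalization is causal.
        recent = values[max(0, t - window + 1): t + 1]
        mean = sum(recent) / len(recent)
        var = sum((x - mean) ** 2 for x in recent) / len(recent)
        std = math.sqrt(var)
        out.append((values[t] - mean) / std if std > 0 else 0.0)
    return out
```

Using only trailing data matters in a trading setting: normalizing with statistics of the whole series would leak future information into the inputs.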
Results – Artificial series • The return series r_t is generated by an AR(p) process • We analyze the effect of transaction costs, quantization levels, and the number of autoregressive inputs on Sharpe’s ratio, trading frequency, and profits • Effect of initial conditions
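An artificial AR(p) return series of this kind can be generated as in the sketch below. The coefficients, the Gaussian noise, and the zero initial history are illustrative assumptions; the slides do not state which coefficients or noise level were used.

```python
import random


def generate_ar_series(coeffs, n, noise_std=1.0, seed=None):
    """Generate n samples of r_t = sum_i coeffs[i] * r_{t-1-i} + noise_t."""
    rng = random.Random(seed)
    p = len(coeffs)
    history = [0.0] * p  # r_{t-1}, ..., r_{t-p}, most recent first
    series = []
    for _ in range(n):
        value = sum(a * h for a, h in zip(coeffs, history))
        value += rng.gauss(0.0, noise_std)
        series.append(value)
        history = [value] + history[:-1]
    return series
```

For example, `generate_ar_series([0.6, -0.2], 1000, seed=1)` produces a reproducible AR(2) series that can be fed to the trader as its input.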
Results – Real Forex Data • The price series is the US Dollar vs. Euro exchange rate from 21/05/2007 to 15/01/2010, sampled at 15-minute intervals • We compare our trader with: • A random strategy drawn from a uniform distribution • A buy-and-hold strategy of Euro against US Dollar
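The two baselines can be sketched as follows. This assumes the random strategy picks a position uniformly from {-1, 0, +1} at each step and that the cost `delta` is charged per unit of position change; both are illustrative assumptions, as are the function names and the `mu` parameter.

```python
import random


def buy_and_hold_profit(prices, mu=1.0):
    """Profit from buying mu units at the first price and holding to the last."""
    return mu * (prices[-1] - prices[0])


def random_strategy_profit(prices, mu=1.0, delta=0.0, seed=None):
    """Additive profit of a trader choosing a uniform random position each step."""
    rng = random.Random(seed)
    profit, prev_pos = 0.0, 0
    for t in range(1, len(prices)):
        pos = rng.choice((-1, 0, 1))
        change = prices[t] - prices[t - 1]
        # The previous position earns the change; switching position costs delta.
        profit += mu * (prev_pos * change - delta * abs(pos - prev_pos))
        prev_pos = pos
    return profit
```

Averaging `random_strategy_profit` over many seeds gives a reference distribution against which a learned trader's profit on the same price series can be compared.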
Results – Real Forex Data • No commissions
Results – Real Forex Data • With commissions (0.1%)
Conclusions • RRL performs better than the random strategy • Positive Sharpe ratios achieved in most cases • RRL seems to struggle during volatile periods • Large variance is a major cause for concern • Can’t unravel complex relationships in the data • Changes in market conditions waste everything the system learned during the training phase (but most learning systems suffer from this)
Conclusions • When trading real data, the transaction cost is a killer • Normalizing the input series can be a real challenge • The input series are non-stationary • We assume the average changes slowly relative to the number of AR inputs to the system • The weights are normalized heuristically • The threshold method leads to the best results on both artificial & real data • Redundancy when the input series are ARMA processes • Long training sessions under constant market conditions lead to overfitting
Future work • Wrapping the system with a risk management layer (e.g. stop-loss, retraining trigger, shutting down the system under anomalous behavior) • Dynamic adjustment of external parameters (such as the learning rate) • Working with more than one security • Working with variable-size positions • Working in coordination with another expert system (based on other algorithms)
Acknowledgment • We would like to thank our project supervisor Andrey Bernstein for the guidance, and Prof. Nahum Shimkin for advising us, allowing us to pursue a research project of our interest, and sharing his experience with us. • Additionally, we would like to thank Prof. Ron Meir & Prof. Neri Merhav for their time spent consulting us. • Special warm thanks to Gabriel Molina from Stanford University and Tikesh Ramtohul from the University of Basel for their priceless help.
Questions?
References • [1] J. Moody, M. Saffell, Learning to Trade via Direct Reinforcement, IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001 • [2] C. Gold, FX Trading via Recurrent Reinforcement Learning, CIFE, Hong Kong, 2003 • [3] M.A.H. Dempster, V. Leemans, An Automated FX Trading System Using Adaptive Reinforcement Learning, Expert Systems with Applications 30, pp. 543-552, 2006