The GDB Cup: Applying “Real World” Financial Data Mining in an Academic Setting
Gary D. Boetticher
University of Houston - Clear Lake
Houston, Texas, USA
What is the GDB Cup?
• Modeled after the KDD Cup
• Start with $100,000 + Financial Data + Data Mining Techniques = Make As Much Money as Possible
Motivation
• Availability of Data
• Gain Experience with DM Process
• Synthesize ML + Domain Knowledge
• Pragmatic implications
Availability of Data
• Different Time Series Perspectives
  • 1 minute to monthly
• Different Financial Instruments
  • Stocks, Futures, Options, Mutual Funds
• Large Sample Size
  • 400 - 700 Stocks (Daily, 2.5 Years)
  • EMini Future (5 Minute, 2 Years)
• Inexpensive or Free Sources
  • www.anfutures.com
  • www.ashkon.com
  • Screen Scraping (finance.yahoo.com)
DM Process: Data Cleansing
• Low = 0
• Volume = 0
• Missing Data (e.g. no Open)
• Missing Time Periods
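The four cleansing checks above could be sketched roughly as follows. This is only an illustration, not the course's actual tooling: the function name `find_bad_bars`, the dictionary keys, and the fixed `step` between bars are all assumptions made for the example.

```python
from datetime import datetime, timedelta

PRICE_FIELDS = ("open", "high", "low", "close")

def find_bad_bars(bars, step):
    """Flag the problems listed on the Data Cleansing slide.

    bars: list of dicts sorted by "time", with the hypothetical keys
          time, open, high, low, close, volume.
    step: expected spacing between consecutive bars (a timedelta).
    Returns a list of (time, reason) tuples.
    """
    problems = []
    previous = None
    for bar in bars:
        t = bar["time"]
        if bar.get("low") == 0:
            problems.append((t, "low is zero"))
        if bar.get("volume") == 0:
            problems.append((t, "volume is zero"))
        missing = [k for k in PRICE_FIELDS if bar.get(k) is None]
        if missing:
            problems.append((t, "missing " + ", ".join(missing)))
        # Naive gap test: any spacing larger than `step` counts as a gap.
        if previous is not None and t - previous["time"] > step:
            problems.append((t, "gap after " + previous["time"].isoformat()))
        previous = bar
    return problems
```

The gap test here is deliberately naive. On real intraday or daily data it would also flag nights, weekends, and holidays, so a trading calendar would be needed to separate genuine missing periods from normal market closures.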
Build Models (Synthesize ML & Domain Knowledge)
• Machine Learners: Supervised NN, GP, SVM, Neuro Fuzzy, SOM, ILP, etc.
• Tech. Analysis: Moving Averages, RSI, MACD, Stochastics, PNF, etc. (www.equis.com/Education/TAAZ)
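One way to picture the synthesis of technical analysis and machine learning is to turn indicators into input features for a learner. The sketch below is an assumption about how that might look, not the models built in the course. The RSI here uses plain averages of gains and losses (Wilder's original formulation uses a smoothed average), and `feature_rows`, its window sizes, and the next-bar up/down label are hypothetical choices.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def rsi(closes, window=14):
    """RSI from plain averages of gains and losses over `window` changes."""
    out = [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    for i in range(window, len(closes)):
        recent = changes[i - window:i]  # the `window` changes ending at bar i
        gain = sum(c for c in recent if c > 0) / window
        loss = -sum(c for c in recent if c < 0) / window
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out

def feature_rows(closes, fast=3, slow=5, rsi_window=3):
    """Build ((ma_spread, rsi), label) rows; label is 1 if the next close is higher."""
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    strength = rsi(closes, rsi_window)
    rows = []
    for i in range(len(closes) - 1):  # the last bar has no next close to label
        if slow_ma[i] is None or strength[i] is None:
            continue
        label = 1 if closes[i + 1] > closes[i] else 0
        rows.append(((fast_ma[i] - slow_ma[i], strength[i]), label))
    return rows
```

Rows of this shape could then be fed to any of the supervised learners listed above. The features at bar i use only data up to bar i, which avoids leaking the label into the inputs.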
Validating Models
• Statistical Validation
• Financial Validation
• Ignore Market Conditions (Buy & Hold): Start Date Value, End Date Value
• Unrealistic Conditions (e.g. Drawdown)
• Standardize portfolio management
• Validate with EXCEL models
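Two of the financial checks named above can be made concrete with a few lines of arithmetic. The sketch assumes that the buy-and-hold benchmark simply compares the start-date value with the end-date value, and that drawdown means the largest peak-to-trough decline of the account equity. The slide does not spell out either definition, so both are assumptions, and the function names are hypothetical.

```python
def buy_and_hold_return(prices):
    """Return from buying at the first price and selling at the last one."""
    return prices[-1] / prices[0] - 1.0

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

A model whose return beats buy and hold only by tolerating a drawdown that no real account could survive would be an instance of the unrealistic conditions the slide warns about.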
Results - 1
• Fall 2002: 12/31/99 - 5/31/02, 452 stocks
• Spring 2003: 12/31/99 - 5/31/02, 712 stocks
• Fall 2003: 6/14/02 - 6/12/03, S&P EMini (5 Min.)
• Annual ROI values shown: 270%, 852%, 310%
Results - 2
• Spring 2004 (Train): 10/12/01 - 12/26/03, S&P EMini (5 Min.)
• Spring 2004 (Test): 12/29/03 - 04/16/04, S&P EMini (5 Min.)
• Annual ROI values shown: 23,300%, 2,172%
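The slides report annual ROI without saying how results from periods shorter or longer than a year were annualized. The sketch below shows two common conventions, compound and simple scaling. Which one, if either, was used here is unknown, and the function name and the 365.25-day year are assumptions.

```python
from datetime import date

def annualized_roi(start_value, end_value, start, end, compound=True):
    """Scale a total return over [start, end] to a one-year rate.

    compound=True  -> (1 + total) ** (1 / years) - 1
    compound=False -> total / years
    """
    years = (end - start).days / 365.25
    total = end_value / start_value - 1.0
    if compound:
        return (1.0 + total) ** (1.0 / years) - 1.0
    return total / years
```

For short test windows the two conventions diverge sharply, because compounding extrapolates a few months of gains across a full year. That is worth keeping in mind when comparing annualized figures from periods of different lengths.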
Conclusions
• Effective way to understand DM Process
  • Data Cleansing
  • Data Validation
• Very Good Results
  • ROI > 250% in all four cases
• Pragmatic implications