The GDB Cup: Applying “Real World” Financial Data Mining in an Academic Setting

Presentation Transcript


  1. The GDB Cup: Applying “Real World” Financial Data Mining in an Academic Setting • Gary D. Boetticher • University of Houston - Clear Lake • Houston, Texas, USA

  2. What is the GDB Cup? • Modeled after the KDD Cup • Start with $100,000 + Financial Data + Data Mining Techniques = Make As Much Money as Possible

  3. Motivation • Availability of Data • Gain Experience with DM Process • Synthesize ML + Domain Knowledge • Pragmatic implications

  4. Availability of Data • Different Time Series Perspectives • 1 minute to monthly • Different Financial Instruments • Stocks, Futures, Options, Mutual Funds • Large Sample Size • 400 - 700 Stocks (Daily, 2.5 Years) • EMini Future (5 Minute, 2 Years) • Inexpensive or Free Sources • www.anfutures.com • www.ashkon.com • Screen Scraping (finance.yahoo.com)

  5. DM Process: Data Cleansing • Low = 0 • Volume = 0 • Missing Data (e.g. no Open) • Missing Time Periods
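The four checks on the data cleansing slide can be sketched in a few lines of Python. This is only an illustration, not the course's actual procedure: the bar layout (dicts with `time`, `open`, `low`, `volume` keys) and the function name `clean_bars` are hypothetical, and the gap check assumes bars arrive on an unbroken fixed-interval grid, which real market data with overnight and weekend closes does not satisfy without a trading calendar.

```python
from datetime import datetime, timedelta


def clean_bars(bars, step):
    """Apply the slide's four checks to a list of price bars.

    bars: list of dicts with keys "time", "open", "low", "volume"
          (a hypothetical layout chosen for this sketch).
    step: expected timedelta between consecutive bars.
    Returns (kept, rejected, gaps), where rejected holds (bar, reason)
    pairs and gaps holds the expected timestamps that never appeared.
    """
    kept, rejected = [], []
    for bar in bars:
        if bar.get("open") is None:
            rejected.append((bar, "missing open"))
        elif bar.get("low") == 0:
            rejected.append((bar, "low = 0"))
        elif bar.get("volume") == 0:
            rejected.append((bar, "volume = 0"))
        else:
            kept.append(bar)

    # Missing time periods: walk the expected grid between the first
    # and last timestamp and record every slot with no bar at all.
    seen = {bar["time"] for bar in bars}
    gaps = []
    if seen:
        t, end = min(seen), max(seen)
        while t <= end:
            if t not in seen:
                gaps.append(t)
            t += step
    return kept, rejected, gaps
```

Returning the rejected bars with a reason, rather than silently dropping them, makes it possible to see how much of the sample each rule removes.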

  6. Build Models (Synthesize ML & Domain Knowledge) • Machine Learners: Supervised NN, GP, SVM, Neuro Fuzzy, SOM, ILP, etc. • Tech. Analysis: Moving Averages, RSI, MACD, Stochastics, PNF, etc. (www.equis.com/Education/TAAZ)
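One plausible reading of this slide is that technical indicators supply the domain-knowledge features that a machine learner is then trained on. As a rough sketch under that assumption, the two functions below compute a simple moving average and an RSI from a list of closing prices. The names `sma` and `rsi_simple` are made up for this example, and the RSI uses plain averages of gains and losses rather than Wilder's smoothing, so its values will differ from those of charting packages that use the smoothed form.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rsi_simple(closes, window=14):
    """RSI from plain averages of gains and losses (no Wilder smoothing).

    Returns None for positions with fewer than `window` price changes.
    """
    out = [None] * len(closes)
    # changes[k] is the move from closes[k] to closes[k + 1]
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    for i in range(window, len(closes)):
        recent = changes[i - window:i]  # the last `window` moves ending at i
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            out[i] = 100.0  # usual convention when there are no losses
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out
```

Both functions return lists aligned with the input, so the outputs can be placed side by side as feature columns for whichever learner is being trained.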

  7. Validating Models • Statistical Valid. / Financial Valid. • Ignore Market Conditions (Buy & Hold) • Start Date Value / End Date Value • Unrealistic Conditions (e.g. Drawdown) • Standardize portfolio management • Validate with EXCEL models
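Two of the financial checks named on this slide, the buy-and-hold baseline and drawdown, are simple to compute. The sketch below is a minimal illustration rather than the validation procedure used in the course; the function names are hypothetical, and drawdown is taken here to mean the largest peak-to-trough decline of an account's value, expressed as a fraction of the peak.

```python
def buy_and_hold_return(prices):
    """Fractional return from buying at the first price and selling at the last."""
    if not prices:
        raise ValueError("prices must not be empty")
    return prices[-1] / prices[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    if not equity:
        raise ValueError("equity must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # highest account value seen so far
        worst = max(worst, (peak - value) / peak)
    return worst
```

Comparing a model's return against `buy_and_hold_return` over the same dates shows whether it beat simply holding the instrument, while `max_drawdown` flags strategies whose final return hides a decline that a real account might not survive.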

  8. Results - 1 • Spring 2003: 12/31/99 - 5/31/02, 712 stocks • Fall 2003: 6/14/02 - 6/12/03, S&P EMini (5 Min.) • Fall 2002: 12/31/99 - 5/31/02, 452 stocks • Annual ROI = 270% • Annual ROI = 852% • Annual ROI = 310%

  9. Results - 2 • Spring 2004 (Test): 12/29/03 - 04/16/04, S&P EMini (5 Min.) • Spring 2004 (Train): 10/12/01 - 12/26/03, S&P EMini (5 Min.) • Annual ROI = 23,300% • Annual ROI = 2,172%

  10. Demo

  11. Conclusions • Effective way to understand DM Process • Data Cleansing • Data Validation • Very Good Results • ROI > 250% in all four cases • Pragmatic implications