The Effects of Domain Knowledge on the Performance of Data Mining Models for Bank Failures

Wei Ge, Atish P. Sinha, and Huimin Zhao, School of Business Administration, University of Wisconsin-Milwaukee.

Presentation Transcript


  1. The Effects of Domain Knowledge on the Performance of Data Mining Models for Bank Failures
     Wei Ge, Atish P. Sinha, Huimin Zhao
     School of Business Administration, University of Wisconsin-Milwaukee

  2. Research Objective
     • To examine the effects of domain knowledge on the performance of data mining models.

  3. Problem Domain
     • Bankruptcy: early warning of bank failures.
     • Data sets:
        • Banks that failed in 1991 and 1992.
        • Each failed bank matched with a surviving bank.
        • Balanced sample (half failed, half survived).
        • One-year-prior (480 cases) and two-year-prior (468 cases) prediction.
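Because the sample is balanced, the 480 one-year-prior cases correspond to 240 failed/surviving pairs and the 468 two-year-prior cases to 234 pairs. The slides do not say how each survivor was chosen. The sketch below assumes greedy one-to-one matching on total assets; the `Bank` record, the `total_assets` criterion, and `match_pairs` are all hypothetical illustrations, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bank:
    bank_id: str
    total_assets: float  # hypothetical matching variable (not from the study)
    failed: bool


def match_pairs(failed_banks, survivors):
    """Greedy one-to-one matching without replacement: each failed bank is
    paired with the unused survivor closest in total assets (an assumed rule)."""
    available = list(survivors)
    pairs = []
    for f in sorted(failed_banks, key=lambda b: b.total_assets):
        if not available:
            raise ValueError("fewer survivors than failed banks")
        best = min(available, key=lambda s: abs(s.total_assets - f.total_assets))
        available.remove(best)  # a survivor can be used only once
        pairs.append((f, best))
    return pairs


failed = [Bank("F1", 120.0, True), Bank("F2", 45.0, True)]
survivors = [Bank("S1", 50.0, False), Bank("S2", 118.0, False), Bank("S3", 300.0, False)]
pairs = match_pairs(failed, survivors)
sample = [bank for pair in pairs for bank in pair]  # balanced: half failed
print([(f.bank_id, s.bank_id) for f, s in pairs])
```

Here F2 is paired with S1 and F1 with S2, and the unmatched S3 is left out, so the resulting sample is exactly half failed banks.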

  4. Domain Knowledge
     • CAMEL financial ratios derived from raw accounting variables:
        • Capital adequacy
        • Asset quality
        • Management quality
        • Earnings
        • Liquidity
     • Over 100 ratios have been developed in past research.
     • 26 simple ratios are adopted in this study.
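The domain knowledge here takes the form of ratios computed from raw accounting variables. The study's 26 ratios are not listed in the slides, so the sketch below picks one commonly used ratio per CAMEL dimension purely as an illustration; every field name and ratio choice (for example, overhead to assets as a management-quality proxy) is an assumption.

```python
def camel_ratios(raw: dict) -> dict:
    """Derive one illustrative ratio per CAMEL dimension from raw accounting
    variables. Field names and ratio choices are hypothetical."""
    assets = raw["total_assets"]
    loans = raw["total_loans"]
    if assets <= 0 or loans <= 0:
        raise ValueError("total_assets and total_loans must be positive")
    return {
        # Capital adequacy: equity cushion per dollar of assets
        "equity_to_assets": raw["total_equity"] / assets,
        # Asset quality: share of the loan book that is nonperforming
        "npl_to_loans": raw["nonperforming_loans"] / loans,
        # Management quality: overhead per dollar of assets (a common proxy)
        "overhead_to_assets": raw["noninterest_expense"] / assets,
        # Earnings: return on assets
        "roa": raw["net_income"] / assets,
        # Liquidity: liquid assets per dollar of assets
        "liquid_to_assets": raw["liquid_assets"] / assets,
    }


bank = {
    "total_assets": 1000.0, "total_loans": 600.0, "total_equity": 60.0,
    "nonperforming_loans": 30.0, "noninterest_expense": 25.0,
    "net_income": 8.0, "liquid_assets": 150.0,
}
print(camel_ratios(bank))
```

Ratios like these put banks of very different sizes on a common scale, which is one plausible reason they help classifiers more than the raw dollar amounts do.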

  5. Methodology
     • Compare the performance of classifiers learned with and without domain knowledge:
        • Without: 93 original accounting variables
        • With: 26 financial ratios
     • 4 classification techniques: logistic regression, C4.5 decision tree, neural network, k-nearest neighbor
     • Performance measure: expected misclassification cost
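Expected misclassification cost weights each error type by how costly it is and how often each class occurs in the population. A minimal sketch, assuming the common form EC = p · r · FNR + (1 − p) · FPR, where p is the prior probability of failure, r is the cost of missing a failure divided by the cost of a false alarm (false alarm normalized to 1), FNR is the share of failed banks predicted to survive, and FPR is the share of surviving banks flagged as failing. The slides do not give the exact formula, so this form and the name `expected_cost` are assumptions.

```python
def expected_cost(y_true, y_pred, p, r):
    """Expected misclassification cost per bank, with a false alarm costing 1.

    y_true, y_pred: sequences of 1 (failed) / 0 (survived).
    p: assumed population prior probability of failure.
    r: assumed cost ratio, cost(missed failure) / cost(false alarm).
    Error rates are measured within each class, so the 50/50 mix of the
    balanced sample does not leak into the result; p supplies the real mix.
    """
    pos = [yp for yt, yp in zip(y_true, y_pred) if yt == 1]
    neg = [yp for yt, yp in zip(y_true, y_pred) if yt == 0]
    if not pos or not neg:
        raise ValueError("need at least one failed and one surviving bank")
    fnr = sum(1 for yp in pos if yp == 0) / len(pos)  # missed failures
    fpr = sum(1 for yp in neg if yp == 1) / len(neg)  # false alarms
    return p * fnr * r + (1 - p) * fpr


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(expected_cost(y_true, y_pred, p=0.02, r=50))
```

With one missed failure and one false alarm out of four banks each (FNR = FPR = 0.25), the cost is 0.25 + 0.245, about 0.495.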

  6. Results
     • Evaluated under various settings of the prior probability of failure (0.01, 0.02) and the cost ratio (10 to 100).
     • Lower cost is observed with domain knowledge for every classification method.
     • The size of the effect of domain knowledge depends on the method.
     • Larger improvement for logistic regression and the neural network than for C4.5 and k-nearest neighbor.
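Models are trained on a balanced sample (failure rate 0.5), yet evaluated at priors of 0.01 or 0.02 with cost ratios up to 100, so a score of 0.5 should not automatically be the cut-off. One standard way to handle this, shown here only as a sketch and not as the authors' method, is to shift the estimated failure probability to the population prior with Bayes' rule and then flag a bank when the expected cost of ignoring it exceeds that of a false alarm. The function names and the assumption that r = cost(missed failure) / cost(false alarm) are illustrative.

```python
def adjusted_posterior(score, p, train_prior=0.5):
    """Bayes' rule prior shift: convert a failure probability estimated on a
    sample with failure rate train_prior into one for population prior p."""
    num = score * p / train_prior
    den = num + (1 - score) * (1 - p) / (1 - train_prior)
    return num / den


def predict_failure(score, p, r, train_prior=0.5):
    """Flag a bank when the expected cost of ignoring it (q * r) exceeds the
    expected cost of a false alarm ((1 - q) * 1)."""
    q = adjusted_posterior(score, p, train_prior)
    return q * r > 1 - q


def score_threshold(p, r, train_prior=0.5):
    """Equivalent cut-off on the raw balanced-sample score: flag if score > t."""
    k = (1 - p) * train_prior / (p * (1 - train_prior) * r)
    return k / (1 + k)


for r in (50, 60):
    print(r, round(score_threshold(0.02, r), 3), predict_failure(0.46, p=0.02, r=r))
```

With p = 0.02, a bank scored 0.46 is not flagged at r = 50 (cut-off about 0.495) but is flagged at r = 60 (cut-off about 0.450), which shows why results are reported separately for each (p, r) setting.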

  7. Examples (figure panels)
     (a) One-year-prior, p = 0.02 and r = 50
     (b) One-year-prior, p = 0.02 and r = 60
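The two trivial rules give useful reference points for settings (a) and (b). A worked example, assuming EC = p · r · FNR + (1 − p) · FPR with a false alarm costing 1 and r the cost of a missed failure; the error rates in the last two lines are hypothetical and are not read from the figures.

```latex
\begin{aligned}
\mathrm{EC} &= p\,r\,\mathrm{FNR} + (1-p)\,\mathrm{FPR} \\
\text{flag no bank } (\mathrm{FNR}=1,\ \mathrm{FPR}=0):\quad
  \mathrm{EC} &= p\,r = 0.02 \times 50 = 1.00 \ (r=50), \qquad 0.02 \times 60 = 1.20 \ (r=60) \\
\text{flag every bank } (\mathrm{FNR}=0,\ \mathrm{FPR}=1):\quad
  \mathrm{EC} &= 1-p = 0.98 \\
\text{hypothetical } \mathrm{FNR}=0.2,\ \mathrm{FPR}=0.1:\quad
  \mathrm{EC} &= 0.02 \times 50 \times 0.2 + 0.98 \times 0.1 = 0.298 \ (r=50) \\
  \mathrm{EC} &= 0.02 \times 60 \times 0.2 + 0.98 \times 0.1 = 0.338 \ (r=60)
\end{aligned}
```

Under these assumptions, a classifier is only worth using in either panel if its expected cost falls below 0.98, the cheaper of the two trivial rules.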

  8. Conclusion
     • Incorporating domain knowledge improves classifier performance.
     • The effect of domain knowledge varies with the classification method used.
