The Effects of Domain Knowledge on the Performance of Data Mining Models for Bank Failures
Wei Ge, Atish P. Sinha, Huimin Zhao
School of Business Administration, University of Wisconsin-Milwaukee
Research Objective
• To examine the effects of domain knowledge on the performance of data mining models.
Problem Domain
• Bankruptcy prediction: early warning of bank failures.
• Data sets:
  • Banks that failed in 1991 and 1992.
  • Each failed bank matched with a surviving bank.
  • Balanced sample (half failed, half surviving).
  • One-year-prior (480 cases) and two-year-prior (468 cases) prediction.
Domain Knowledge
• CAMEL financial ratios derived from raw accounting variables:
  • Capital adequacy
  • Asset quality
  • Management quality
  • Earnings
  • Liquidity
• Over 100 ratios have been developed in past research.
• 26 simple ratios are adopted in this study.
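The slides do not list the 26 ratios. As a minimal sketch of what "derived from raw accounting variables" means, assuming a bank's raw variables arrive as a dictionary, the Python below computes one textbook-style ratio for four of the CAMEL dimensions. The field names (`total_equity`, `total_assets`, and so on), the function `camel_style_ratios`, and the choice of ratios are hypothetical illustrations, not the study's variable set.

```python
from typing import Dict


def camel_style_ratios(raw: Dict[str, float]) -> Dict[str, float]:
    """Derive illustrative CAMEL-style ratios from hypothetical raw accounting variables.

    Management quality is usually proxied indirectly and is omitted from this sketch.
    """

    def safe_div(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den != 0 else float("nan")

    return {
        # Capital adequacy: equity capital relative to total assets.
        "capital_to_assets": safe_div(raw["total_equity"], raw["total_assets"]),
        # Asset quality: share of loans that are nonperforming.
        "nonperforming_to_loans": safe_div(raw["nonperforming_loans"], raw["total_loans"]),
        # Earnings: return on assets.
        "return_on_assets": safe_div(raw["net_income"], raw["total_assets"]),
        # Liquidity: liquid assets relative to deposits.
        "liquid_to_deposits": safe_div(raw["liquid_assets"], raw["total_deposits"]),
    }


# Illustrative numbers only, not data from the study.
example_bank = {
    "total_equity": 8.0, "total_assets": 100.0, "nonperforming_loans": 3.0,
    "total_loans": 60.0, "net_income": 1.0, "liquid_assets": 20.0, "total_deposits": 80.0,
}
ratios = camel_style_ratios(example_bank)
```

Each ratio divides one raw variable by another, so a model trained on ratios sees scale-free quantities instead of the raw balances directly.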
Methodology
• Compare the performance of classifiers learned with and without domain knowledge:
  • Without domain knowledge: 93 original accounting variables
  • With domain knowledge: 26 financial ratios
• 4 classification techniques: logistic regression, C4.5 decision tree, neural network, k-nearest neighbor
• Performance measure: expected misclassification cost
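The slides do not define expected misclassification cost. One common formulation in the bankruptcy-prediction literature, assumed here, is EMC = p × r × e1 + (1 - p) × e2, where p is the population prior probability of failure, e1 is the type I error rate (failed banks classified as surviving), e2 is the type II error rate (surviving banks classified as failed), and r is the ratio of type I to type II cost with the type II cost normalized to 1. Because the sample is balanced, the 50/50 sample split is replaced by the assumed prior p. The function name and argument names below are hypothetical.

```python
def expected_misclassification_cost(
    failed_missed: int,     # type I errors: failed banks predicted to survive
    n_failed: int,          # failed banks in the test set
    survived_flagged: int,  # type II errors: surviving banks predicted to fail
    n_survived: int,        # surviving banks in the test set
    p: float,               # assumed population prior probability of failure
    r: float,               # assumed cost ratio: type I cost / type II cost
) -> float:
    """Expected misclassification cost with the type II cost normalized to 1 (an assumption)."""
    type_i_rate = failed_missed / n_failed
    type_ii_rate = survived_flagged / n_survived
    # Weight each error rate by the population prior, not the balanced sample's 50/50 split.
    return p * r * type_i_rate + (1 - p) * type_ii_rate


# Illustrative counts only, not results from the study.
cost = expected_misclassification_cost(
    failed_missed=6, n_failed=60, survived_flagged=12, n_survived=60, p=0.02, r=50
)
```

With these made-up counts the cost is 0.02 × 50 × 0.1 + 0.98 × 0.2 = 0.1 + 0.196 = 0.296. Because p is small, the type I term only carries comparable weight once r is large, which is one reason to report results across a range of cost ratios.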
Results
• Under various settings of the prior probability of failure (0.01, 0.02) and the cost ratio (10 to 100):
  • Lower cost is observed with domain knowledge for every classification method.
  • The effect size of domain knowledge depends on the method.
  • Logistic regression and the neural network improve more than the other methods.
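The claim that domain knowledge lowers cost for every method is a claim about every (p, r) setting. The sketch below, assuming the formulation EMC = p × r × e1 + (1 - p) × e2 (type II cost normalized to 1) and a cost-ratio grid from 10 to 100 in steps of 10 (the step size is an assumption; the slides give only the range), checks whether one pair of error rates yields a strictly lower cost than another at every setting. The names `emc` and `lower_cost_everywhere` are hypothetical.

```python
from itertools import product
from typing import Iterable, Tuple

ErrorRates = Tuple[float, float]  # (type I error rate, type II error rate)


def emc(rates: ErrorRates, p: float, r: float) -> float:
    # Expected misclassification cost, type II cost normalized to 1 (assumed formulation).
    e1, e2 = rates
    return p * r * e1 + (1 - p) * e2


def lower_cost_everywhere(
    with_knowledge: ErrorRates,
    without_knowledge: ErrorRates,
    priors: Iterable[float] = (0.01, 0.02),
    cost_ratios: Iterable[float] = range(10, 101, 10),  # assumed step of 10
) -> bool:
    """True if the ratio-based model has strictly lower EMC at every (p, r) setting."""
    return all(
        emc(with_knowledge, p, r) < emc(without_knowledge, p, r)
        for p, r in product(priors, cost_ratios)
    )
```

For instance, hypothetical rates of (0.10, 0.20) with ratios versus (0.15, 0.25) without return `True`, since both error rates are lower. Swapping which error is smaller, (0.20, 0.10) versus (0.10, 0.20), returns `False`: the first pair is cheaper at p = 0.01, r = 10 but costlier at p = 0.02, r = 100, because the type I term grows with r.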
Examples (charts not reproduced): (a) one-year-prior, p = 0.02 and r = 50; (b) one-year-prior, p = 0.02 and r = 60
Conclusion
• Incorporating domain knowledge improves classifier performance.
• The effect of domain knowledge varies with the classification method used.