A Novel Prediction Model for Credit Card Risk Management

Tsung-Nan Chou, Department of Finance, Chaoyang University of Technology, 168 Jifong E. Rd., Wufong Township, Taichung County, 41349, Taiwan. E-mail: tnchou@mail.cyut.edu.tw

Outline • Introduction • Integrated Model • Research Methodologies • Evolutional neural network • Grey incidence analysis • Dempster-Shafer theory • Experiment Results • Conclusion

Introduction Credit Card Business in Taiwan • Competition in Taiwan's credit card market is severe. • The number of issued credit cards has increased rapidly and is now sixteen times what it was a decade ago. • The annual amount of credit has also grown 9.54 times.

Introduction The Truth About Applying for Credit Cards • Anyone can easily apply for a credit card. • In promotion campaigns, one existing card can be used to apply for further cards without verification. • Platinum card holders are everywhere in Taiwan. • Most financial institutions focus on verification before a card is issued rather than on risk management after the card is issued.

Introduction What Do Card Issuers Need? • Credit risk management is a critical safeguard against possible losses. • A credit risk management alert system can freeze credit usage or reject an ongoing transaction to prevent potential bad debts beyond the credit limit. • The banking industry needs to construct an efficient credit risk prediction system that correctly detects defaults by card holders.

Introduction Research Work in Credit Assessment • Some studies applied traditional statistical regression models, such as Orgler (1970) and Steenackers & Goovaerts (1989). • Many statistical methods, such as regression analysis, variance analysis and principal component analysis, require a large number of samples and require the data to satisfy a certain probability distribution. • Recent studies use Artificial Intelligence (AI) methods for credit assessment. Neural networks have more recently been used to support both business and financial applications. • Many applications of AI can be found in (Brause, Langsdorf, and Hepp, 1999), (Aleskerov, Freisleben and Rao, 1997), (West, 2000) and (Donato et al., 1999).

Objective of the system • The aim of this study is to construct an efficient risk prediction system that correctly detects defaults by credit card holders. • The system collects the personal and financial information of credit card holders. It then applies an evolutional neural network, integrated with grey incidence analysis and the Dempster-Shafer theory of evidence, to predict default cases, trace the behavior of card holders and manage default risk.

System structure

Research Methodologies • Evolutional neural network • Grey incidence analysis • Ju-Long Deng, 1998, Essential Topics on Grey System, Theory and Application, China Ocean Press. • Kun-Li Wen, 2004, Grey System Modeling and Prediction, Yang's Scientific Research Institute, USA. • Dempster-Shafer theory • G. Shafer, "A Mathematical Theory of Evidence", Princeton, NJ, Princeton University Press, 1976. • G. Shafer, "Probability Judgement in Artificial Intelligence", in Uncertainty in Artificial Intelligence, L. N. Kanal and J. F. Lemmer (eds.), New York, Elsevier Science, 1986.

Evolutional neural network Limitations of Neural Network Training • The back-propagation learning algorithm cannot guarantee an optimal solution, as it might converge to a locally optimal set of weights. As a result, the neural network is often unable to find a desirable solution to a problem. • The other difficulty is selecting an optimal topology for the neural network. The network architecture for a particular problem is often chosen by means of heuristics.

Evolutional neural network Genetic algorithms are an effective optimisation technique that can be applied to guide the optimisation of both weights and topology.

By courtesy of Negnevitsky, Pearson Education, 2002

Evolutional neural network Execution Steps: • Step 1: Randomly generate an initial population of chromosomes, each of which represents the topology of a neural network. • Step 2: Calculate the fitness of each individual chromosome. • Step 3: Create a pair of offspring chromosomes by applying genetic operations such as reproduction, crossover and mutation. • Step 4: Replace the initial chromosome population with the new population. • Step 5: Go to Step 2, and repeat the process until the termination criterion is satisfied.
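The five steps above can be sketched as a small generational genetic algorithm. This is a minimal illustration under stated assumptions, not the study's implementation: the chromosome is a plain bit string, the function name `evolve` and all of its parameters are hypothetical, and the toy fitness (the number of 1-bits) stands in for the accuracy of a network decoded from the chromosome, which the slides do not specify.

```python
import random


def evolve(fitness, n_genes=8, pop_size=20, generations=30,
           crossover_rate=0.7, mutation_rate=0.01, seed=0):
    """Minimal generational GA over bit-string chromosomes.

    `fitness` must return a non-negative number (needed for roulette selection).
    """
    rng = random.Random(seed)
    # Step 1: randomly generate the initial population.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Step 5: repeat until the termination criterion (a fixed generation count).
    for _ in range(generations):
        # Step 2: calculate the fitness of each chromosome.
        scores = [fitness(c) for c in population]
        # Small offset keeps selection valid when every score is zero.
        weights = [s + 1e-9 for s in scores]
        new_population = []
        while len(new_population) < pop_size:
            # Step 3: reproduction (roulette-wheel selection of two parents),
            # then one-point crossover and bit-flip mutation.
            p1, p2 = rng.choices(population, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                new_population.append(child)
        # Step 4: replace the old population with the new one.
        population = new_population[:pop_size]
        # Remember the best chromosome seen so far.
        best = max(population + [best], key=fitness)
    return best


# Toy fitness (assumption): count of 1-bits in the chromosome.
best = evolve(fitness=sum)
print(best, sum(best))
```

In a real evolutional neural network, the fitness call would decode the chromosome into a network, train or evaluate it, and return a score such as validation accuracy; that step dominates the run time.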

Grey incidence analysis • The fundamental idea of grey incidence analysis is that the closeness of a relation is judged by the similarity of the geometrical patterns of the sequence curves. The more similar the curves are, the higher the degree of incidence between the sequences. • Generation techniques for grey sequences: pretreat the data by analysing the object system and applying sequence operators. • Grey incidence analysis techniques: find the relationship between sequences based on the geometric comparability of these sequences.

Grey incidence analysis • Execution Steps: • Step 1: Transform each sequence to grey generation sequences by four techniques. • Step 2: Calculate the incidence coefficient and the degree of grey incidence. Assume that the following sequence $x_0$ represents the characteristics of a system: $x_0 = (x_0(1), x_0(2), \ldots, x_0(n))$ • And the following are the sequences of relevant factors: $x_i = (x_i(1), x_i(2), \ldots, x_i(n))$, $i = 1, 2, \ldots, m$

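Grey incidence analysis • The degree of grey incidence $\gamma(x_0, x_i)$ is denoted as $\gamma_{0,i}$, and the incidence coefficient $\gamma(x_0(k), x_i(k))$ at point $k$ of the sequence as $\gamma_{0,i}(k)$. • For $\zeta \in (0, 1)$, where $\zeta$ is called the distinguishing coefficient, define $\gamma(x_0(k), x_i(k)) = \dfrac{\min_i \min_k |x_0(k) - x_i(k)| + \zeta \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \zeta \max_i \max_k |x_0(k) - x_i(k)|}$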

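Grey incidence analysis • The degree of grey incidence $\gamma(x_0, x_i)$ can be calculated by the following average approach, where $n$ is the length of the sequences: $\gamma(x_0, x_i) = \dfrac{1}{n} \sum_{k=1}^{n} \gamma(x_0(k), x_i(k))$ • Step 3: The order of grey incidences is defined according to their values: a factor $x_i$ with a larger $\gamma(x_0, x_i)$ is ranked as more closely related to $x_0$.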
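Steps 2 and 3 can be illustrated with a short sketch. It assumes the standard form of Deng's incidence coefficient (minimum plus ζ times maximum absolute difference, divided by the pointwise absolute difference plus ζ times the maximum), a distinguishing coefficient of 0.5, and sequences that have already been through the Step 1 generation. The function name `grey_incidence` and the sample sequences are hypothetical.

```python
def grey_incidence(x0, factors, zeta=0.5):
    """Degree of grey incidence between x0 and each factor sequence.

    Assumes the sequences are already normalised (Step 1) and equally long.
    """
    # Absolute differences |x0(k) - xi(k)| for every factor i and point k.
    deltas = [[abs(a - b) for a, b in zip(x0, xi)] for xi in factors]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every factor sequence coincides with x0.
        return [1.0 for _ in factors]
    degrees = []
    for row in deltas:
        # Incidence coefficient at each point k.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        # Average approach: mean of the coefficients over the sequence.
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Illustrative data (assumption): x1 follows x0 exactly, x2 moves against it.
x0 = [1.0, 2.0, 3.0]
x1 = [1.0, 2.0, 3.0]
x2 = [3.0, 2.0, 1.0]
degrees = grey_incidence(x0, [x1, x2])
# Step 3: order the factors by descending degree of grey incidence.
ranking = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
print(degrees, ranking)
```

For this data the first factor reaches a degree of 1 and the second 5/9, so the factor whose curve matches the reference sequence is ranked first.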

Dempster-Shafer theory • The Dempster-Shafer decision theory is considered a generalization of Bayesian theory, which is the traditional method for dealing with statistical problems. • The Dempster-Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to calculate the probability of an event. • The Dempster-Shafer (D-S) theory of evidence was created by Glenn Shafer [Shafer, 1976] at Princeton. He built on earlier work by Arthur P. Dempster. The theory is a broad treatment of probabilities, and includes classical probability and certainty factors as subsets.

Dempster-Shafer theory Consider the nature of evidence. • Some evidence is not reliable (the weatherman is wrong sometimes and right sometimes). • Some evidence is uncertain (an intermittent atmospheric reading). • Some is incomplete (the wind speed by itself does not tell us much). • Some evidence is contradictory (the weatherman's forecast and the atmospheric conditions). • Some evidence is incorrect (a broken atmospheric data source or a wrong weather forecast).

Dempster-Shafer theory • In the D-S theory of evidence, the set of all hypotheses that describes a situation is the frame of discernment. The hypotheses should be mutually exclusive and exhaustive, meaning that they must cover all the possibilities and that the individual hypotheses cannot overlap. • The D-S theory mirrors human reasoning by narrowing its reasoning gradually as more evidence becomes available. Two properties of the D-S theory permit this process: • the ability to assign belief to ignorance • the ability to assign belief to subsets of hypotheses.

Dempster-Shafer theory • Two special sets are used in D-S theory. The first is the null set, which cannot hold any value, and the second is the set that contains all elements. Assigning belief to the second set does not help distinguish anything and represents ignorance. Humans often give weight to the hypothesis "I don't know", which is not possible in classical probability. Assigning belief to "I don't know" allows us to delay a decision until more evidence becomes available. • Each data source, Si for example, contributes its observation by assigning its beliefs. This assignment function is called the "probability mass function" and is denoted by mi. From mi, the lower and upper bounds of a probability interval can be defined. This interval contains the precise probability of a set of interest in the classical sense, and its bounds are called belief and plausibility.

Dempster-Shafer theory • The lower bound of the confidence interval is the belief confidence, which accounts for all evidence $E_k$ that supports the given proposition "A": • $\mathrm{Belief}_i(A) = \sum_{E_k \subseteq A} m_i(E_k)$

Dempster-Shafer theory • The upper bound of the confidence interval is the plausibility confidence, which accounts for all the observations that do not rule out the given proposition: • $\mathrm{Plausibility}_i(A) = 1 - \sum_{E_k \cap A = \emptyset} m_i(E_k)$
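The two bounds can be sketched in a few lines. The sketch assumes a mass function stored as a dictionary from subsets of the frame of discernment (Python `frozenset` objects) to masses that sum to one, with no mass on the null set. Belief adds up the mass of every subset contained in A, and plausibility is one minus the mass of every subset that is disjoint from A. The good/bad frame and the mass values are illustrative only.

```python
def belief(mass, a):
    """Lower bound: total mass of the focal sets contained in `a`."""
    return sum(m for e, m in mass.items() if e <= a)


def plausibility(mass, a):
    """Upper bound: one minus the mass of the focal sets that rule `a` out."""
    return 1.0 - sum(m for e, m in mass.items() if not (e & a))


# Illustrative frame of discernment (assumption): a card holder is good or bad.
GOOD = frozenset({"good"})
BAD = frozenset({"bad"})
UNKNOWN = GOOD | BAD  # the whole frame, i.e. "I don't know"

# Illustrative mass function of one data source (values are made up).
m = {GOOD: 0.6, BAD: 0.1, UNKNOWN: 0.3}

print(belief(m, BAD), plausibility(m, BAD))
```

With these masses the interval for "bad" runs from 0.1 to about 0.4: only 0.1 of the evidence commits to a default, while 0.6 rules it out, and the remaining 0.3 is left on ignorance.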

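Dempster-Shafer theory • For each possible proposition (e.g., A), Dempster-Shafer theory gives a rule for combining data source Si's observation mi and data source Sj's observation mj: $(m_i \oplus m_j)(A) = \dfrac{\sum_{E_k \cap E_{k'} = A} m_i(E_k)\, m_j(E_{k'})}{1 - \sum_{E_k \cap E_{k'} = \emptyset} m_i(E_k)\, m_j(E_{k'})}$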
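Dempster's rule of combination can be sketched as follows. The sketch assumes each mass function is a dictionary from `frozenset` subsets of the frame to masses: products of masses are accumulated on the intersection of the two subsets, products that land on the null set are counted as conflict, and the remaining masses are renormalised by one minus the conflict. The function name `combine` and the sample masses are hypothetical.

```python
def combine(m_i, m_j):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for e1, v1 in m_i.items():
        for e2, v2 in m_j.items():
            inter = e1 & e2
            if inter:
                # Agreement: the product supports the intersection.
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                # Disjoint focal sets: the product is conflicting evidence.
                conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalisation discards the conflicting mass.
    return {e: v / (1.0 - conflict) for e, v in combined.items()}


# Illustrative frame and sources (assumption; the values are made up).
GOOD = frozenset({"good"})
BAD = frozenset({"bad"})
UNKNOWN = GOOD | BAD

m1 = {GOOD: 0.6, BAD: 0.1, UNKNOWN: 0.3}
m2 = {GOOD: 0.5, BAD: 0.2, UNKNOWN: 0.3}
fused = combine(m1, m2)
print(fused)
```

Here the conflicting mass is 0.17, so the fused masses are 0.63/0.83 for "good", 0.11/0.83 for "bad" and 0.09/0.83 for the whole frame. The mass left on ignorance shrinks as the two sources agree.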

Dempster-Shafer theory • Dempster's rule of combination is a generalization of Bayes' rule. This rule strongly emphasises the agreement between multiple sources and ignores all the conflicting evidence through a normalization factor. • Compared with Bayesian theory, the Dempster-Shafer theory of evidence is much more analogous to human perception and reasoning processes. Its capability to assign uncertainty or ignorance to propositions is a powerful tool for dealing with a large range of problems that would otherwise be intractable.

Experiment Results Data Processing • The raw data are segmented into good records and bad records for two successive terms and are then randomized to improve the performance of the training process. • To provide sufficient and adequate data for the system evaluation, a total of 4000 records are collected and subdivided into 1000 for the training set and 3000 for the cross validation set.
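The randomization and the 1000/3000 split can be sketched as below. The record contents are not given in the slides, so integer identifiers stand in for the 4000 records, and the helper name `shuffle_and_split` and the fixed seed are assumptions for illustration.

```python
import random


def shuffle_and_split(records, n_train, seed=0):
    """Shuffle a copy of the records, then split off the first n_train."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records (assumption): identifiers 0..3999 instead of real data.
records = range(4000)
train_set, validation_set = shuffle_and_split(records, n_train=1000)
print(len(train_set), len(validation_set))
```

Shuffling before the split prevents the good and bad records, which were segmented beforehand, from ending up concentrated in one of the two sets.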

Experiment Results Among the 24 explanatory variables, a total of 10 variables receive the highest rankings from the integrated model. • Minimum Payment Due (X21) • Last Minimum Payment (X23) • Gender (X07) • Number of Cards Held (X04) • Marital Status (X08) • Card Holder's Age (X05) • Revolving Credit (X17) • Account Duration (X01) • Available Credit (X20) • Annual Income (X10)

Experiment Results Result of the feature selection

Experiment Results Comparison of prediction accuracy

Conclusion • We have found that this integrated feature selection approach performs better, in terms of prediction accuracy, than applying grey incidence analysis alone. • The former correctly predicts the default cases in an average of nearly 86.33% of all cases, which is about 3.63% higher than the latter. As the methods result in different orderings of the variables, the DS method is able to combine the different outcomes of the grey incidence analysis and perform the task of data fusion. • We discovered that using grey incidence analysis reduces the number of variables during feature selection, and that additional variables do not improve the accuracy of the prediction.

Conclusion • This study also shows that the evolutional neural network with feature selection (GIA) is superior to the evolutional neural network with no feature selection. The former correctly predicts the default cases in an average of nearly 82.7% of all cases, which is about 5.4% higher than the latter. • This study collected real data from only one financial institution to evaluate the performance of the integrated model. Further research could follow this line by collecting more real data from other financial institutions located in different geographic regions, and could investigate whether the input variables are ranked differently and whether the results are inconsistent with this study.
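The accuracy figures quoted in the two conclusion slides can be lined up with a short piece of arithmetic. This assumes the quoted differences are percentage points rather than relative changes. The first difference then reproduces the 82.7% stated for grey incidence analysis alone, and the accuracy without feature selection follows as an implied value that the slides do not state directly.

```latex
\begin{align*}
\text{Integrated model (GIA + D-S)} &: 86.33\% \\
\text{GIA only} &: 86.33\% - 3.63\% = 82.70\% \\
\text{No feature selection (implied)} &: 82.7\% - 5.4\% = 77.3\%
\end{align*}
```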

Variables
