The Impact of Sample Bias on Consumer Credit Scoring Performance and Profitability
3. Overview
Introduction
Methodology
Data Description
Empirical study
Conclusion
Further research
4.
Application Scoring
as opposed to behavioral/profit scoring
Sample Bias
- ‘Population drainage’
- Biased estimates
Reject Inference
Application scoring: Binary classification of good versus bad classes
Comparison with a threshold and discretization
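To make the setup concrete, here is a minimal sketch (not from the paper) of application scoring as binary classification with a cutoff; the data, names, and the 0.05 threshold are illustrative assumptions.

```python
# Hypothetical illustration of application scoring: fit a logit model,
# obtain default probabilities, and discretize them with a threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))               # stand-in application variables
y = (rng.random(1000) < 0.02).astype(int)    # ~2% defaulters, as in this data

clf = LogisticRegression(max_iter=1000).fit(X, y)
p_default = clf.predict_proba(X)[:, 1]       # predicted default probability

cutoff = 0.05                                # illustrative threshold
decision = np.where(p_default < cutoff, "accept", "reject")
```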
5.
Calibration set
Creating a ‘proportional’ sample
3 Research Questions:
Q1: Does sample bias occur?
Q2: What is the impact of sample bias?
Q3: Proportionality versus sample size
Calibration set: a set of orders rejected by the credit score but accepted on a judgmental basis (“low side overrides”)
Useful through inclusion in:
the model building process
the holdout sample, which can be designed to be more representative of the ‘through-the-door’ applicant population
=> We will refer to this as a ‘proportional’ sample, i.e. a sample with the same ratio of orders accepted versus rejected according to the existing credit score, but where all outcomes are known.
Q1: a) model trained on accepted orders only and tested on accepted orders versus the calibration set
b) does a variable selection procedure select different variables when a proportional sample is used?
Q2: a) quantify the gain in predictive performance and profitability if the outcome of all rejected orders were available
b) the most important part of the study: a sensitivity analysis to check the external validity of our results
Q3: If there is a gain in performance, would it then be optimal to sample a small part of the rejected orders and use a proportional sample? In practice, creating a proportional sample often means reducing the sample size. In this last research question we investigate whether a reduction in sample size (in order to gain proportionality) is beneficial.
PS: the sample size will be reduced whenever α > σ (acceptance rate > percentage of rejects for which the outcome is available); the calculations can be found in the paper.
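As an illustration of the ‘proportional’ sample construction, here is a hedged sketch in Python; the DataFrame layout and helper name are our assumptions, not the paper's code.

```python
# Sketch: build a 'proportional' sample whose accepted/rejected mix matches
# the score's acceptance rate (86.8% here), with all outcomes known because
# the rejected side comes from the calibration set (low side overrides).
import pandas as pd

def proportional_sample(accepted: pd.DataFrame,
                        calibration: pd.DataFrame,
                        acceptance_rate: float = 0.868,
                        seed: int = 0) -> pd.DataFrame:
    # The calibration set is the scarce part, so keep all of it and draw
    # accepted orders to restore the through-the-door proportions.
    n_rej = len(calibration)
    n_acc = round(n_rej * acceptance_rate / (1 - acceptance_rate))
    acc = accepted.sample(n=min(n_acc, len(accepted)), random_state=seed)
    return pd.concat([acc, calibration], ignore_index=True)
```

With the deck's acceptance rate, roughly 6.6 accepted orders are drawn per calibration order (0.868 / 0.132 ≈ 6.6).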
6. Logistic Regression
Performance measurement
PCC, AUC and profitability
Why a logit model:
one of the most frequently used techniques in research and industry
Traditional statistical models, such as logistic regression, perform very well for credit scoring when compared to machine learning techniques (B. Baesens, JORS, June 2003).
Another technique often used in credit scoring, discriminant analysis, has been shown to introduce bias when used for extrapolation beyond the reject region (Hand and Henley, IMA Journal of Mathematics Applied in Business and Industry, 1994; Feelders, International Journal of Intelligent Systems in Accounting and Finance, 2000; Eisenbeis, Journal of Finance, 1977).
Performance measurement: PCC, AUC, Profit
PCC = (TP + TN) / (TP + TN + FP + FN), but it tacitly assumes equal misclassification costs for false positive versus false negative predictions, and presumes class distributions are constant and relatively balanced. It evaluates one specific cutoff, not a range of cutoffs, whereas AUC summarizes the classifier's performance over all possible cutoff values.
ROC chart: Y-axis = sensitivity = TP/(TP + FN), the fraction of actual positives predicted positive; X-axis = 1 − specificity, where specificity = TN/(FP + TN)
A 2-dimensional graphical illustration of the sensitivity versus the specificity for various values of the classification threshold. The area under this curve provides a simple figure of merit for the performance of the constructed classifier (cf. Gini coefficient: 2 × the area between the curve and the diagonal).
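A minimal sketch (toy data, not the paper's) of the two performance measures and their ROC connection:

```python
# PCC at one cutoff versus AUC over all cutoffs, plus the Gini relation.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])            # 1 = default
p_hat = np.array([.1, .2, .15, .7, .3, .4, .05, .1, .8, .2])  # model output

pcc = accuracy_score(y_true, p_hat >= 0.5)   # one specific cutoff (0.5)
auc = roc_auc_score(y_true, p_hat)           # summary over all cutoffs

fpr, tpr, _ = roc_curve(y_true, p_hat)       # x = 1 - specificity, y = sensitivity
gini = 2 * auc - 1                           # Gini = 2 x area between curve and diagonal
```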
7. Logistic Regression
Performance measurement
PCC, AUC and profitability
Resampling procedure
Stratified resampling
Sensitivity analysis
To what degree does the extent of truncation influence the results?
Resampling procedure: Assess the variance of the performance indicators by splitting the data into a training and a validation set (stratified sampling allocating an equal percentage of defaulters to training and holdout sample). This procedure is performed 100 times.
Sensitivity analysis: The value of reject inference is driven by the extent of truncation, i.e. the size of the reject region. Since we have historical scores, we can simulate the situation in which only the best 70% of the orders (instead of 86.8%) would have been accepted, considering Hand & Henley's (Journal of the Royal Statistical Society, 1997) observation that 70% is not unusual in mail-order consumer credit.
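A sketch of the resampling procedure under stated assumptions (synthetic data; the 50/50 split fraction is our illustrative choice):

```python
# Repeat a stratified train/holdout split 100 times and record the AUC,
# so the variability of the performance indicator can be assessed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))              # stand-in order characteristics
y = (rng.random(5000) < 0.02).astype(int)    # ~2% defaulters

aucs = []
for rep in range(100):
    X_tr, X_ho, y_tr, y_ho = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=rep)  # equal % defaulters
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_ho, model.predict_proba(X_ho)[:, 1]))

print(np.mean(aucs), np.std(aucs))           # spread over the 100 repetitions
```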
8. Furnival & Wilson (1974)
Leap-and-Bound algorithm
De Long, De Long & Clarke-Pearson (1988)
Comparing AUCs
We used the leap-and-bound algorithm of Furnival and Wilson (Technometrics, 1974), implemented in the SELECTION=SCORE option of the SAS LOGISTIC procedure, to detect the best model for every possible model size. The algorithm requires a minimum of arithmetic and finds the best subsets without examining all possible subsets. However, it generates a likelihood score (chi-square) statistic without significance tests, so we used the algorithm proposed by De Long, De Long and Clarke-Pearson (Biometrics, 1988) to investigate whether a model of a given size significantly differs in terms of AUC from the full model. We then selected the model with the lowest number of variables that does not differ significantly from the model using all characteristics at the 5% significance level.
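The AUC comparison can be sketched as follows; this is a generic implementation of the De Long, De Long & Clarke-Pearson covariance approach, not the authors' code.

```python
# DeLong test for two correlated AUCs computed on the same holdout sample.
import numpy as np
from scipy.stats import norm

def _placements(pos, neg):
    # V10[i]: fraction of negatives that positive i outranks (ties count 1/2);
    # V01[j]: fraction of positives that outrank negative j.
    cmp = (pos[:, None] > neg[None, :]).astype(float)
    cmp += 0.5 * (pos[:, None] == neg[None, :])
    return cmp.mean(axis=1), cmp.mean(axis=0)

def delong_test(y, s1, s2):
    """Two-sided p-value for H0: AUC(s1) == AUC(s2); y is 0/1, 1 = default."""
    pos, neg = (y == 1), (y == 0)
    placements = [_placements(s[pos], s[neg]) for s in (s1, s2)]
    v10 = np.vstack([p[0] for p in placements])   # per-positive components
    v01 = np.vstack([p[1] for p in placements])   # per-negative components
    aucs = v10.mean(axis=1)
    s10, s01 = np.cov(v10), np.cov(v01)
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / pos.sum()
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / neg.sum())
    z = (aucs[0] - aucs[1]) / np.sqrt(var)
    return aucs, 2 * norm.sf(abs(z))   # e.g. delong_test(y_ho, scores_a, scores_b)
```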
9.
Belgian Catalog Retailer
Orders between 1/7/2000 and 1/2/2002
Variable creation
Demographics, occupation, financial information, and default information
Scoring process
Catalog retailer offering consumer credit to its customers. Articles as diverse as furniture, electronics, gardening & DIY equipment, and jewelry. This analysis was performed at the moment when the previous score – constructed by an international company specialized in consumer credit scoring – was due to be updated, since it had been in use for 6 years.
Orders between July 1st 2000 and February 1st 2002, with follow-up until February 1st 2003, so the outcome could be tracked for all orders.
Variables were inspired by the ideas of the company's managers as well as by previous research.
Dependent variable: the third reminder, because (i) the customer is then charged for the delay, (ii) this reminder really urges the customer to pay, and (iii) it has historically been used by the company. Profitability of the order was not used as the dependent variable, since the class distribution then became even more skewed, which severely degraded the performance of all models.
Variables: it is a strategic decision to limit the information required upon application. Nevertheless, we computed 45 variables for this study, which can be found in the appendix of the full paper.
10. Automatic scoring procedure and an independent manual selection procedure: a rather large set of orders was investigated ‘manually’, regardless of their score.
Manual acceptance overrules
11. We coded defaulting orders as 1 and non-defaulting orders as 0, so the predicted probability is a default probability: the higher, the riskier.
R3: of no importance: rejected for strategic and/or legal reasons. Since these rules are long-term rules, these orders are of no importance for future credit scoring in the company.
Accepted by score: 36,039; rejected by score: 5,471 => acceptance rate of 86.8%. Of the rejected orders, 36.7% are overrides.
More than 95% of the orders rejected by the score were handled by a judgmental process.
12.
Does sample bias occur?
1A: Does a classifier trained on accepted orders only prove to be more erroneous on the calibration sample?
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
13. Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
14.
Average AUC difference
0.0812 points
t Value 48.02
p < 0.0001
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
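The deck does not spell out the test behind this t-value, but a paired t-test on the per-repetition AUC differences over the 100 splits would produce figures of this kind; a sketch with synthetic numbers calibrated to the reported 0.0812 difference (the 0.80 baseline and 0.017 spread are assumptions):

```python
# Paired t-test on 100 per-split AUC differences (illustrative data only).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
auc_accepted = rng.normal(0.80, 0.01, size=100)   # AUC on accepted-only holdout
diff = rng.normal(0.0812, 0.017, size=100)        # spread chosen so t lands near 48
auc_calibration = auc_accepted - diff             # AUC on the calibration set

t_stat, p_value = ttest_rel(auc_accepted, auc_calibration)
```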
15.
Does sample bias occur?
1A: Does a classifier trained on accepted orders only prove to be more erroneous on the calibration sample?
1B: Does a classifier trained on accepted orders only lead to the selection of other variables?
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
16. In order to detect whether sample bias influences the variable-selection process, we compared variable selection on a sample of 50% of the accept region (2) with that on a proportional sample (1 and 3).
17.
Model sizes
All variables: 45
Selected variables: 31
Overlap: 24
Difference: 7
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
By coincidence, in both settings a model with 31 variables was selected.
Now the interesting part follows: the degree to which the different variables influence credit scoring performance and profitability will now be investigated.
18.
What is the impact of sample bias on credit scoring performance and profitability for a given sample size?
Actual setting (86.8 % accepted)
Sensitivity analysis (70 % accepted)
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
In order to enhance the comparison between the actual setting and the sensitivity analysis, we always present the results side by side.
19. Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
Holdout sample is proportional: so we test the real-life situation (4 and 5)
Comparisons with a given sample size: 1+2 versus 1+3, i.e. only accepted orders compared to accepted and rejected orders
20. Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
As a sensitivity analysis, we recreate the situation in which the acceptance rate would have been 70%. Hence, the 6,833 orders with the highest default probabilities were appended to the calibration sample, whereby the impact of the previous calibration sample is reduced to 22% of the current sample, and the influence of the manual selection procedure is drastically reduced.
The holdout sample is again proportional to the through-the-door population, ensuring the 70% proportionality.
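A sketch of how the 70% truncation could be recreated from the historical scores; the column name and the score orientation (higher = riskier) are assumptions:

```python
# Simulate a stricter cutoff: accept only the best-scored 70% of orders;
# the worst 30% are treated as rejects whose outcome is still known.
import pandas as pd

def truncate_to_acceptance(orders: pd.DataFrame, rate: float = 0.70):
    cutoff = orders["historical_score"].quantile(rate)
    accepted = orders[orders["historical_score"] <= cutoff]   # best 70%
    rejected = orders[orders["historical_score"] > cutoff]    # simulated rejects
    return accepted, rejected
```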
21.
p · a + (1 − p) · (1 − a)
(Morrison, 1969)
The accuracy of a random model is defined by:
p = the true proportion of refunded orders
a = the proportion of applicants that will be accepted for credit (Morrison, JMR, 1969)
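For example, plugging in the deck's own figures (a = 0.868 and a default rate of 1.94%, so p ≈ 0.9806): 0.9806 · 0.868 + 0.0194 · 0.132 ≈ 0.851 + 0.003 ≈ 0.854, the accuracy a purely random acceptance policy would already achieve, and hence the floor any scoring model must beat.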
22. The second model clearly outperforms the other models in terms of PCC, but the impact on PCC that can be reached by including the calibration sample in a proportional way seems to be low (0.0003), especially when compared to the difference resulting from the update of the model (0.0042).
23. The results are again completely analogous to the PCC results, which strongly confirms the former observations:
1) the second model is the best model in terms of AUC, and the third performs significantly worse
2) the improvement in performance between models 2 and 1 seems relatively small when compared to the difference resulting from the update of the model.
24. Considering confidentiality, we do not reveal absolute profit information, but we present the relative profit changes.
In terms of profitability, it is clear that, in both the actual setting and the sensitivity analysis, model 3 outperforms the other models. However, it must clearly be stated that the maximal improvement attainable under perfect reject inference is only 1 to 3% in this setting. Additionally, this does not take into account the costs that would be incurred in determining the outcome of a sample of the orders that would normally be rejected, or the time cost of applying a reject inferencing procedure.
Again, it is confirmed that profit results differ from classification performance results, and it would be up to management to decide upon the model that optimally meets the business objectives.
The profit gain from including the calibration sample rises as the reject region becomes more important; this was not tested, but it seems logical that the impact of sample bias grows when the bias itself grows.
25. Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
In the previous sample composition, we clearly notice that a large part of the data (21,800 orders) was not used, in order to compare situations with equal sample sizes. Here, however, we enlarge the homogeneous sample with this part to investigate whether increasing the sample size of the homogeneous sample reduces the benefits due to proportionality.
PS: the sample size will be reduced whenever α > σ (acceptance rate > percentage of rejects for which the outcome is available); the calculations can be found in the paper.
26.
Does a trade-off exist between proportionality and sample size?
Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
27. Data partitioning repeated 100 times with a stratified sampling procedure (allocating an equal percentage of defaulters to both groups)
This group of 21,800 orders was not used as such, but split randomly into 50 samples of 436 orders each, with an average of 7.6 defaulters per sample. These samples were added successively.
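A sketch of this incremental experiment (helper and variable names are illustrative; the real study uses the retailer's orders):

```python
# Start from the equal-size accepted-only sample, then add the remaining
# 21,800 accepted orders in 50 random chunks (~436 each), re-fitting and
# re-evaluating on the fixed proportional holdout after every addition.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def incremental_curve(X_base, y_base, X_extra, y_extra, X_ho, y_ho,
                      n_chunks=50, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y_extra))
    chunks = np.array_split(idx, n_chunks)
    aucs, used = [], np.empty(0, dtype=int)
    for chunk in chunks:
        used = np.concatenate([used, chunk])
        X = np.vstack([X_base, X_extra[used]])
        y = np.concatenate([y_base, y_extra[used]])
        model = LogisticRegression(max_iter=1000).fit(X, y)
        aucs.append(roc_auc_score(y_ho, model.predict_proba(X_ho)[:, 1]))
    return aucs    # model quality as a function of sample size
```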
28. Comparison of the model with accepted orders only against a proportional model (but with variables selected on the accepted orders only), as the 50 data samples are added to the sample of accepted orders. The positive slope means that, as the size of the homogeneous sample rises, the quality of the model increases.
The point at the intersection with the vertical axis (leftmost) is the situation of equal sample sizes discussed before. There, the model with accepted orders only performed significantly worse than the proportional model (indicated by a black mark, as opposed to a non-significant white mark). Yet as the sample size of the homogeneous sample increases, this difference quickly becomes insignificant, and when all data are added, the quality of the model with only accepted orders is clearly higher than that of the proportional model.
The second graph: we saw earlier that, with equal sample sizes, PCC performance was not significantly different if we used the variables selected on the proportional sample. Here too, the quality of the model built on accepted orders only rises, and becomes significantly higher than that of the proportional sample as the size of the homogeneous sample increases.
29. Again, the AUC results are very comparable and a little more stable than the PCC graphs. This confirms that, in terms of classification accuracy, sample size is more important than sample bias.
30. Again, the profitability results show a different picture.
The graph on the right is important here, because this was the best model in the previous analysis in terms of profitability. Adding more data to the homogeneous sample of accepted orders does improve profitability slightly, but never up to the point where it exceeds the profitability of the proportional sample. Again, this seems to indicate that the impact of sample bias is somewhat higher in terms of profitability than in terms of predictive accuracy, although the impact is still modest (around 1% of profits).
31. Effect of Sample Bias
Significant yet modest improvements
Predictive performance differs from profitability
- Impact of the inclusion of the calibration set in the training sample and the variable selection procedure
Proportionality vs sample size
Unlike other studies, we have not proposed a reject inferencing technique; instead, we estimate the maximal improvement that could be reached if the reject inference procedure were flawless.
The improvements are modest, especially when compared to the improvements reached by updating the model and creating new variables, and once the cost of obtaining such a sample is accounted for (they are an upper limit for improvement).
It seems at least counter-intuitive that a mere expansion of the homogeneous sample can compensate, at least in terms of predictive accuracy, for the lack of a calibration sample.
To conclude: the effect of proportionality prevails, and enhancing proportionality can lead to improvements in classification accuracy and profitability. However, at least in this mail-order credit setting, the resulting benefits of any possible reject inferencing technique are low.
32.
Direct-mail company
- acceptance rate 86.8 %
- default percentage 1.94 %
- misclassification costs: ratio 2.58
Methodology of sensitivity analysis
While we tried to improve the external validity of this study through the sensitivity analysis, the study was executed on the data of a single direct-mail company.
The cost involved with a defaulter was only 2.58 times higher than the profit gained from a non-defaulter.
Further research depends largely on the availability of other credit scoring datasets.
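As a closing illustration, a hedged sketch of the profit criterion implied by the 2.58 cost ratio (per-order amounts are normalized to 1 because the actual figures are confidential; the function is ours, not the paper's):

```python
# Relative profit of an acceptance policy: each accepted non-defaulter
# earns +1, each accepted defaulter costs 2.58 times that profit.
import numpy as np

def relative_profit(y_true, p_default, cutoff, cost_ratio=2.58):
    accepted = p_default < cutoff
    goods = np.sum(accepted & (y_true == 0))   # accepted non-defaulters
    bads = np.sum(accepted & (y_true == 1))    # accepted defaulters
    return goods - cost_ratio * bads
```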