
Neuro-Fuzzy Modeling Techniques in Economics
ISSN 2415-3516
Research of the dependence of the efficiency of modeling the creditworthiness of borrowers on the method of forming a control subset
DOI: 10.33111/nfmte.2020.156
Abstract: This article investigates how to improve the stability of creditworthiness classification for a commercial bank's borrowers, using boosted decision trees and neural network algorithms, by applying stratified sampling. It proposes an improvement to the classical stratified sampling procedure: when the control subset is formed, the procedure takes into account not only the target variable but also the most significant predictors of the model.
Experimental calculations to test the proposed hypotheses were carried out with the LGBM and H2O software packages on data from the international consumer finance provider Home Credit. The study tested and confirmed that using stratified sampling to form the control subset during the training of machine learning models makes it possible to increase their stability and the accuracy of their forecasts on new data sets.
According to the results obtained, the authors' approach, which stratifies the control dataset by the target variable and the most significant model characteristics, demonstrates higher average accuracy for boosted decision trees on the test subset than the standard stratified sampling algorithm and random selection.
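The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the most significant predictors are numeric and already known, that each is cut into quantile bins, and that a stratum is the combination of the target class and those bins. The names `quantile_bin`, `stratified_control_split`, `control_frac`, and `n_bins`, as well as the synthetic `income` and `target` data, are hypothetical.

```python
import random
from collections import defaultdict


def quantile_bin(values, n_bins):
    """Map each value to a rank-based quantile bin in [0, n_bins - 1].

    Ties are broken by position, so equal values may fall into
    neighbouring bins; that is acceptable for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def stratified_control_split(target, key_features, control_frac=0.2,
                             n_bins=4, seed=0):
    """Return (train_idx, control_idx) stratified on target and key features.

    target       -- list of class labels (e.g. 0 = repaid, 1 = default)
    key_features -- list of numeric columns (each a list) ASSUMED to be the
                    most significant predictors; selecting them is not shown
    """
    binned = [quantile_bin(col, n_bins) for col in key_features]

    # A stratum is identified by the class label plus the bin of each feature.
    strata = defaultdict(list)
    for i, y in enumerate(target):
        key = (y,) + tuple(col[i] for col in binned)
        strata[key].append(i)

    rng = random.Random(seed)
    train_idx, control_idx = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        # Take the same fraction from every stratum for the control subset.
        n_control = round(len(members) * control_frac)
        control_idx.extend(members[:n_control])
        train_idx.extend(members[n_control:])
    return sorted(train_idx), sorted(control_idx)


# Synthetic, purely illustrative data: one predictor and a rare positive class.
data_rng = random.Random(42)
n = 1000
income = [data_rng.uniform(10, 100) for _ in range(n)]
target = [1 if data_rng.random() < 0.08 else 0 for _ in range(n)]

train_idx, control_idx = stratified_control_split(target, [income])
rate_all = sum(target) / n
rate_control = sum(target[i] for i in control_idx) / len(control_idx)
print(f"default rate: all={rate_all:.3f}, control={rate_control:.3f}")
```

Passing an empty list as `key_features` reduces the sketch to ordinary stratification by the target variable alone, which corresponds to the standard stratified baseline mentioned in the abstract.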
Keywords: decision tree, gradient boosting, neural network, stratified sampling
UDC: 330.4
JEL: C38 C45 C51 C52 C63
To cite paper
In APA style
Pyrohov, V., & Turchenko, S. (2020). Research of the dependence of the efficiency of modeling the creditworthiness of borrowers on the method of forming a control subset. Neuro-Fuzzy Modeling Techniques in Economics, 9, 156-174. http://doi.org/10.33111/nfmte.2020.156
In MON style
Pyrohov V., Turchenko S. Research of the dependence of the efficiency of modeling the creditworthiness of borrowers on the method of forming a control subset. Neuro-Fuzzy Modeling Techniques in Economics. 2020. No. 9. P. 156-174. http://doi.org/10.33111/nfmte.2020.156 (accessed: 18.06.2025).
With transliteration
Pyrohov, V., Turchenko, S. (2020) Doslidzhennia zalezhnosti efektyvnosti modeliuvannia kredytospromozhnosti pozychalnykiv vid sposobu formuvannia kontrolnoi vybirky [Research of the dependence of the efficiency of modeling the creditworthiness of borrowers on the method of forming a control subset]. Neuro-Fuzzy Modeling Techniques in Economics, no. 9. pp. 156-174. http://doi.org/10.33111/nfmte.2020.156 (accessed 18 Jun 2025).
