Journal of Statistics Applications & Probability

This paper explores churn prediction for savings account customers using various statistical and machine learning models. Because the churn rate in the data is imbalanced, under-sampling is used to improve the predictive power of these models. Model accuracy, area under the curve (AUC), the Gini coefficient, and the receiver operating characteristic (ROC) curve are used for model comparison. The results show that, among the machine learning models considered, Random Forest, which predicts churn with 78% accuracy, is the most powerful model for this scenario. Customer vintage, customer age, average balance, occupation code, population type, average debit amount, and average number of transactions are found to be the variables with high predictive power for the churn prediction model. Commercial banks can deploy the model to avoid customer churn and so retain the funds held by savings bank (SB) customers. The article suggests that commercial banks initiate customized campaigns to avoid SB customer churn. By improving customer satisfaction and experience, commercial banks can limit customer churn and maintain their deposits.
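As a rough illustration of two ideas named above, the following minimal sketch shows random under-sampling of the majority class and the computation of AUC and the Gini coefficient from model scores. It is not the paper's implementation: the function names, the toy labels and scores, and the choice of equal class sizes after under-sampling are all assumptions made for illustration, and the Gini coefficient is taken in its common scoring sense, Gini = 2 × AUC − 1.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.
    (Equal sizes are an illustrative assumption, not the paper's stated ratio.)"""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def auc(labels, scores):
    """AUC as the probability that a random churner (label 1) receives a
    higher score than a random non-churner (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    """Gini coefficient derived from AUC (assumed definition: 2 * AUC - 1)."""
    return 2 * auc(labels, scores) - 1

# Hypothetical toy data: 8 non-churners, 2 churners, with made-up model scores.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.8, 0.7, 0.9]
rows = list(range(len(labels)))  # stand-in for customer feature rows

balanced_rows, balanced_labels = undersample(rows, labels)
print(len(balanced_labels), sum(balanced_labels))
print(auc(labels, scores), gini(labels, scores))
```

In practice the under-sampled data would be used only for training, with AUC and Gini evaluated on held-out data that keeps the original class proportions.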
