Customer Churn Prediction in Telecommunication Using Machine Learning: A Comparison Study

Document Type : Research Article


Department of Electrical Engineering (Communication), Tarbiat Modares University


Telecommunication operators need to predict customer churn accurately to survive in the telecom market. Operators hold a huge volume of customer records, such as calls, SMS messages and Internet usage. These data contain rich and valuable information about customer behavior and consumption patterns. Machine learning is a powerful tool for extracting customer information that is useful for churn prediction. Although several researchers have studied some types of machine learning methods, there is no work that assesses different methods from various points of view. The aim of this work is to assess the performance of a wide range of machine learning methods for churn prediction in the form of a comparison study. In this paper, various machine learning methods are discussed: 7 classifiers, 7 target detectors, and 10 feature reduction methods (4 feature extraction algorithms and 6 feature selection algorithms). The performance of these methods is evaluated on three telecom datasets using 6 evaluation measures. The results show that the random forest and the feed-forward neural network, together with the genetic algorithm, outperform the other competitors. The superior methods achieve 97%, 62% and 93% prediction accuracy on the BigML, Kaggle and Telco customer churn datasets, respectively.

