Predicting Life Expectancy using Machine Learning Techniques

Authors

  • Antora Das, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
  • Md Mahfuz Uddin, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
  • Md Rezaul Karim, Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh

DOI:

https://doi.org/10.3329/ijss.v25i1.81045

Keywords:

Life expectancy, Boruta algorithm, Regularized Random Forest algorithm, Linear Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbor, Gradient Boosting, XGBoost, Neural Network (NN)

Abstract

Life expectancy is a key measure of a country's overall health, socioeconomic development, and quality of life. The main objectives of this study are to identify the key factors influencing life expectancy using the standardized ‘Cleaned-Life-Exp’ data from the World Health Organization (WHO) and to compare the performance of various machine learning algorithms in predicting life expectancy. The most influential features are selected using the Boruta and Regularized Random Forest (RRF) algorithms. Eight machine learning models, namely Linear Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting (GB), XGBoost (XGB), and Neural Network (NN), are evaluated for their predictive performance. Model performance is assessed using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The Boruta and RRF algorithms identified the same 20 significant predictors, with Income Composition of Resources, HIV/AIDS, Adult Mortality, and Schooling emerging as the most influential. Among the eight models, Random Forest achieves the highest performance (R² = 0.969, RMSE = 0.179, MAE = 0.116), highlighting the superiority of ensemble methods. Support Vector Machine (SVM) also performs well, Decision Tree and KNN show moderate performance, and Linear Regression and Neural Network have the lowest predictive performance. This study aims to provide a better predictive framework based on machine learning models, which can guide policymakers in improving life expectancy prediction.
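To make the three evaluation metrics concrete, the following is a minimal Python sketch using their common textbook definitions: R² = 1 − SS_res / SS_tot, RMSE = square root of the mean squared error, and MAE = mean of the absolute errors. The function names and the toy values are illustrative assumptions; they are not the authors' code or the WHO data, and the paper may compute these metrics with a statistical library instead.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate that both inputs are non-empty and equally long; return n."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    return len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _check(y_true, y_pred)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean of squared errors)."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean of |error|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Hypothetical toy values, not taken from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(f"R2={r2_score(observed, predicted):.3f}, "
          f"RMSE={rmse(observed, predicted):.3f}, "
          f"MAE={mae(observed, predicted):.3f}")
    # prints R2=0.800, RMSE=0.500, MAE=0.250
```

A higher R² and lower RMSE and MAE indicate a better fit, which is the basis on which Random Forest is ranked first. Because errors are squared before averaging, RMSE penalizes a few large errors more heavily than MAE does.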

IJSS, Vol. 25(1), March 2025, pp. 55–70

Published

2025-04-17

How to Cite

Das, A., Uddin, M. M., & Karim, M. R. (2025). Predicting Life Expectancy using Machine Learning Techniques. International Journal of Statistical Sciences, 25(1), 55–70. https://doi.org/10.3329/ijss.v25i1.81045

Section

Original Articles