APA Style
Rahibu Abdalla Abassi, Rocky Rajabu Akarro. (2026). A Comparative Evaluation of Imputation Techniques for Missing Data: A Simulation-Based Analysis. Computing&AI Connect, 3 (Article ID: 0033). https://doi.org/Registering DOI
MLA Style
Rahibu Abdalla Abassi, Rocky Rajabu Akarro. "A Comparative Evaluation of Imputation Techniques for Missing Data: A Simulation-Based Analysis." Computing&AI Connect, vol. 3, 2026, Article ID: 0033, https://doi.org/Registering DOI.
Chicago Style
Rahibu Abdalla Abassi, Rocky Rajabu Akarro. 2026. "A Comparative Evaluation of Imputation Techniques for Missing Data: A Simulation-Based Analysis." Computing&AI Connect 3 (2026): 0033. https://doi.org/Registering DOI.
Research Article
Volume 3, Article ID: 2026.0033
Rahibu Abdalla Abassi
r.bassi@suza.ac.tz
Rocky Rajabu Akarro
akarror@gmail.com
1 Department of Natural Sciences, State University of Zanzibar, P.O. Box 146, Zanzibar, Tanzania
2 Department of Statistics, University of Dar es Salaam, P.O. Box 35091, Dar es Salaam, Tanzania
* Author to whom correspondence should be addressed
Received: 05 Nov 2025 Accepted: 03 Apr 2026 Available Online: 03 Apr 2026
Missing data frequently occur in research and, if not properly addressed before analysis, can adversely affect the validity of findings. This article evaluates the efficiency of several imputation techniques as formal methods for replacing missing covariate data. The Root Mean Squared Error (RMSE) was calculated for each imputation technique under the missing completely at random (MCAR) and missing at random (MAR) mechanisms to identify the method that yields the lowest RMSE under various simulation conditions. The results show that the RMSE of every imputation technique increased as the proportion of missing data increased, under both the MAR and MCAR mechanisms. Under MCAR, simulated and non-simulated data showed quite similar trends for the three multiple-imputation-based techniques, namely Multiple Imputation by Chained Equations (MICE), Expectation-Maximisation with Bootstrapping (EMB), and Predictive Mean Matching (PMM), whereas the single-imputation technique (series mean) yielded substantially different RMSE values. Among the multiple imputation (MI) techniques, PMM yielded the lowest RMSE values: 5.80 for simulated data with a 15% missing rate under the MAR mechanism and 7.50 for non-simulated data. The study indicates that multiple imputation techniques are preferable for treating missing data, as they account for imputation uncertainty and improve efficiency. Researchers are advised to compare findings from the imputed and original datasets to evaluate how missing data influence the analysis. For clarity, researchers should also present the means and standard errors for both imputed and non-imputed data.
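The kind of comparison summarised above (single mean imputation versus a multiple-imputation method such as PMM, scored by RMSE under MCAR and MAR at several missing rates) can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the authors' code. The data-generating process, the MAR rule (rows with larger x1 are more likely to lose x2), the simplified PMM (a bootstrapped regression plus a random draw from the five nearest donors), the choice to average RMSE over m imputations, and all function names (simulate, make_missing, pmm_impute, compare) are hypothetical assumptions for illustration. The sketch does not reproduce the reported values of 5.80 and 7.50.

```python
import math
import random
import statistics


def simulate(n, rng):
    """Hypothetical data-generating process: covariate x2 depends linearly on x1."""
    x1 = [rng.gauss(50, 10) for _ in range(n)]
    x2 = [10 + 0.8 * a + rng.gauss(0, 5) for a in x1]
    return x1, x2


def make_missing(x1, rate, mechanism, rng):
    """Return a mask that marks round(rate * n) values of x2 as missing."""
    n = len(x1)
    k = round(rate * n)
    if mechanism == "MCAR":
        # Every row is equally likely to lose its x2 value.
        idx = rng.sample(range(n), k)
    elif mechanism == "MAR":
        # Assumed MAR rule: missingness depends only on the fully observed x1.
        score = [a + rng.gauss(0, 10) for a in x1]
        idx = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = set(idx)
    return [i in idx for i in range(n)]


def mean_impute(x2, mask):
    """Single imputation: replace every missing value with the observed mean."""
    m = statistics.fmean(v for v, miss in zip(x2, mask) if not miss)
    return [m if miss else v for v, miss in zip(x2, mask)]


def ols(xs, ys):
    """Intercept and slope of a simple least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute(x1, x2, mask, rng, donors=5):
    """One simplified PMM draw (assumption): bootstrap the regression, then copy
    an observed x2 from one of the `donors` cases with the closest prediction."""
    obs = [i for i, miss in enumerate(mask) if not miss]
    boot = [rng.choice(obs) for _ in obs]  # bootstrap adds between-imputation variation
    a, b = ols([x1[i] for i in boot], [x2[i] for i in boot])
    pred = [a + b * v for v in x1]
    out = list(x2)
    for i, miss in enumerate(mask):
        if miss:
            nearest = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:donors]
            out[i] = x2[rng.choice(nearest)]
    return out


def rmse(truth, imputed, mask):
    """RMSE between true and imputed values, computed over the missing cells only."""
    sq = [(t - v) ** 2 for t, v, miss in zip(truth, imputed, mask) if miss]
    return math.sqrt(sum(sq) / len(sq))


def compare(n=200, rates=(0.05, 0.15, 0.30), m=5, reps=20, seed=1):
    """Average RMSE of mean imputation and of PMM (mean over m draws) per condition."""
    rng = random.Random(seed)
    results = {}
    for mech in ("MCAR", "MAR"):
        for rate in rates:
            mean_r, pmm_r = [], []
            for _ in range(reps):
                x1, x2 = simulate(n, rng)
                mask = make_missing(x1, rate, mech, rng)
                mean_r.append(rmse(x2, mean_impute(x2, mask), mask))
                pmm_r.append(statistics.fmean(
                    rmse(x2, pmm_impute(x1, x2, mask, rng), mask) for _ in range(m)))
            results[(mech, rate)] = (statistics.fmean(mean_r), statistics.fmean(pmm_r))
    return results


if __name__ == "__main__":
    for (mech, rate), (r_mean, r_pmm) in compare().items():
        print(f"{mech} {rate:.0%}: mean RMSE={r_mean:.2f}  PMM RMSE={r_pmm:.2f}")
```

Because RMSE here is computed only over the imputed cells, mean imputation scores poorly even under MCAR, since it ignores the relationship between x1 and x2. Under the assumed MAR rule the observed mean is pulled toward rows with small x1, so the gap between mean imputation and PMM typically widens.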
Disclaimer: This is not the final version of the article. Changes may occur when the manuscript is published in its final format.