
Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data
Epidemiology and Health Data Insights, 1(4), 2025, ehdi015, https://doi.org/10.63946/ehdi/17159
Publication date: Sep 24, 2025
ABSTRACT
Chronic diseases remain a leading cause of global mortality, underscoring the need for reliable mortality prediction models to guide individualized treatment and optimize resource allocation. This methodological note presents a reproducible framework for predicting one-year mortality in chronic disease patients using large-scale administrative healthcare data. The approach employs a retrospective cohort design, year-specific subcohorts, and stratified 5-fold cross-validation across a broad range of machine learning models. Performance is assessed with multiple metrics, including AUC, sensitivity, specificity, and balanced accuracy, to account for class imbalance. Model interpretability is enhanced through SHapley Additive exPlanations (SHAP), enabling identification of key mortality predictors and their directional impact. The proposed framework is general and can be applied to different chronic diseases. It has already been demonstrated in nationwide cohorts of patients with diabetes mellitus and chronic viral hepatitis in Kazakhstan, achieving AUC values of 0.74–0.80, comparable to international benchmarks despite relying on administrative data alone. The method is scalable and adaptable, allowing integration of laboratory and clinical data with feature selection to address high-dimensionality challenges. Its generalizability and clinical relevance, however, should be validated in practice using enriched datasets across additional chronic diseases and diverse populations.
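The evaluation step described in the abstract (stratified 5-fold cross-validation scored with AUC, sensitivity, specificity, and balanced accuracy) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the synthetic data from `make_classification`, the logistic regression model, the 0.5 decision threshold, the class weighting, and the helper name `evaluate_cv` are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def evaluate_cv(model, X, y, n_splits=5, threshold=0.5, seed=0):
    """Hypothetical helper: mean of each metric over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, test_idx in skf.split(X, y):
        # Fit a fresh copy per fold so no state leaks between folds.
        fold_model = clone(model).fit(X[train_idx], y[train_idx])
        prob = fold_model.predict_proba(X[test_idx])[:, 1]
        pred = (prob >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y[test_idx], pred, labels=[0, 1]).ravel()
        rows.append({
            "auc": roc_auc_score(y[test_idx], prob),
            "sensitivity": tp / (tp + fn),   # recall on deaths (class 1)
            "specificity": tn / (tn + fp),   # recall on survivors (class 0)
            "balanced_accuracy": balanced_accuracy_score(y[test_idx], pred),
        })
    return {k: float(np.mean([r[k] for r in rows])) for k in rows[0]}


# Synthetic, imbalanced stand-in for a cohort (roughly 10% positive class).
# Real administrative data would replace X and y here.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
model = LogisticRegression(max_iter=1000, class_weight="balanced")
scores = evaluate_cv(model, X, y)
print(scores)
```

Stratification keeps the proportion of deaths roughly constant across folds, which matters when the outcome is rare. Balanced accuracy is the mean of sensitivity and specificity, so it is not inflated by the majority class the way plain accuracy can be. SHAP-based interpretation is not shown here.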
CITATION (Vancouver)
Arupzhanov I, Alimbayev A, Seyil T, Aimyshev T, Maulenkul T, Oshibayeva A, et al. Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data. Epidemiology and Health Data Insights. 2025;1(4):ehdi015. https://doi.org/10.63946/ehdi/17159
APA
Arupzhanov, I., Alimbayev, A., Seyil, T., Aimyshev, T., Maulenkul, T., Oshibayeva, A., & Gaipov, A. (2025). Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data. Epidemiology and Health Data Insights, 1(4), ehdi015. https://doi.org/10.63946/ehdi/17159
Harvard
Arupzhanov, I., Alimbayev, A., Seyil, T., Aimyshev, T., Maulenkul, T., Oshibayeva, A., and Gaipov, A. (2025). Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data. Epidemiology and Health Data Insights, 1(4), ehdi015. https://doi.org/10.63946/ehdi/17159
AMA
Arupzhanov I, Alimbayev A, Seyil T, et al. Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data. Epidemiology and Health Data Insights. 2025;1(4):ehdi015. https://doi.org/10.63946/ehdi/17159
Chicago
Arupzhanov, Iliyar, Aidar Alimbayev, Temirlan Seyil, Temirgali Aimyshev, Tilektes Maulenkul, Ainash Oshibayeva, and Abduzhappar Gaipov. "Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data." Epidemiology and Health Data Insights 1, no. 4 (2025): ehdi015. https://doi.org/10.63946/ehdi/17159
MLA
Arupzhanov, Iliyar, et al. "Methodological Note on Predicting One-Year Mortality for Chronic Diseases Using Administrative Data." Epidemiology and Health Data Insights, vol. 1, no. 4, 2025, ehdi015. https://doi.org/10.63946/ehdi/17159
LICENSE

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.