
Application of Machine Learning Algorithms in Predicting HIV Infection Among Men Who Have Sex with Men: Model Development and Validation

Overview
Specialty: Public Health
Date: 2022 Sep 12
PMID: 36091522
Abstract

Background: The continuously growing HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are increasingly used in disease diagnosis and prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions.

Methods: We extracted data from MSM sentinel surveillance in Zhejiang province from 2018 to 2020. Univariate logistic regression was used to select significant variables in the 2018-2019 data (P < 0.05). After data processing and feature selection, we divided the model development data into two groups by stratified random sampling: training data (70%) and testing data (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to address the class imbalance in the data. Model performance was evaluated by accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). We then compared three commonly used machine learning algorithms, decision tree (DT), support vector machines (SVM), and random forest (RF), with logistic regression (LR). Finally, the four models were validated prospectively with 2020 data from Zhejiang province.
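The stratified 70/30 split and the SMOTE step described in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The toy data, the function names `stratified_split` and `smote`, the neighbour count `k=5`, and the choice to oversample only the training portion are all assumptions made for illustration; the abstract does not state at which stage SMOTE was applied or how categorical survey variables were handled.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so both parts keep roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy data only (hypothetical): 200 negatives and 12 positives, two features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(12)]
y = [0] * 200 + [1] * 12

train_idx, test_idx = stratified_split(y, test_fraction=0.3)
train_pos = [X[i] for i in train_idx if y[i] == 1]
train_neg = [X[i] for i in train_idx if y[i] == 0]

# Assumption: oversample the training portion only, until classes are balanced.
new_pos = smote(train_pos, n_new=len(train_neg) - len(train_pos))

print(len(train_idx), len(test_idx))                 # 148 64
print(len(train_pos) + len(new_pos), len(train_neg))  # 140 140
```

In practice one would more likely reach for `train_test_split(..., stratify=y)` from scikit-learn and `SMOTE` from imbalanced-learn; the hand-written version above only makes the two ideas visible. Each synthetic point lies on a line segment between two real minority samples, so SMOTE adds interpolated cases rather than exact duplicates.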

Results: A total of 6,346 MSM were included in the model development data, 372 of whom were diagnosed with HIV. In feature selection, 12 variables were selected as model predictors. Compared with LR, the DT, SVM, and RF algorithms improved classification performance on SMOTE-processed data, with AUCs of 0.778 (LR), 0.856 (DT), 0.887 (SVM), and 0.942 (RF). RF was the best-performing algorithm (accuracy = 0.871, precision = 0.960, recall = 0.775, F-measure = 0.858, and AUC = 0.942). The RF model also performed well on prospective validation (AUC = 0.846).
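As a quick consistency check on the reported figures, the short Python sketch below recomputes the F-measure from the stated precision and recall and works out the class imbalance in the development data. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1, assumed here)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for random forest on SMOTE-processed data.
rf_precision, rf_recall = 0.960, 0.775
rf_f = f_measure(rf_precision, rf_recall)
print(round(rf_f, 3))  # 0.858

# Class imbalance in the development data: 372 HIV diagnoses among 6,346 MSM.
n_total, n_positive = 6346, 372
prevalence = n_positive / n_total
print(f"{prevalence:.1%}")      # 5.9%

# Accuracy of a trivial rule that labels everyone HIV-negative on the raw data.
print(f"{1 - prevalence:.1%}")  # 94.1%
```

The recomputed value matches the reported F-measure of 0.858. The last line shows why accuracy alone is a weak criterion on the unbalanced data: a rule that never predicts HIV infection would already be about 94% accurate while identifying no cases, which is the motivation for reporting recall, F-measure, and AUC alongside accuracy.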

Conclusion: Machine learning models performed substantially better than the conventional LR model, and RF should be considered for tools that predict HIV infection among Chinese MSM. Further studies are needed to optimize and promote these algorithms and to evaluate their impact on HIV prevention among MSM.

Citing Articles

Risk-based evaluation of machine learning-based classification methods used for medical devices.

Haimerl M, Reich C. BMC Med Inform Decis Mak. 2025; 25(1):126.

PMID: 40069689 PMC: 11895222. DOI: 10.1186/s12911-025-02909-9.


Prediction of new HIV infection in men who have sex with men based on machine learning: secondary analysis of a prospective cohort study from Western China.

Li K, Shi G, Zhang C, Lin B, Tao Y, Wang Q. Ann Med. 2025; 57(1):2476040.

PMID: 40059791 PMC: 11894746. DOI: 10.1080/07853890.2025.2476040.


Predicting the Risk of HIV Infection and Sexually Transmitted Diseases Among Men Who Have Sex With Men: Cross-Sectional Study Using Multiple Machine Learning Approaches.

Lin B, Liu J, Li K, Zhong X. J Med Internet Res. 2025; 27:e59101.

PMID: 39977856 PMC: 11888048. DOI: 10.2196/59101.


High security and privacy protection model for STI/HIV risk prediction.

Tang Z, Van Nguyen T, Yang W, Xia X, Chen H, Mullens A. Digit Health. 2024; 10:20552076241298425.

PMID: 39574801 PMC: 11580078. DOI: 10.1177/20552076241298425.


Role of HIV Serostatus Communication on Frequent HIV Testing and Self-Testing Among Men Who Have Sex With Men Who Seek Sexual Partners on the Internet in Zhejiang, China: Cross-Sectional Study.

Chen W, Chen L, Ni Z, He L, Pan X. JMIR Form Res. 2024; 8:e57244.

PMID: 39541583 PMC: 11605257. DOI: 10.2196/57244.

