Scientific publications

Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing

Sep 14, 2024 | Journal: Clinical & Translational Oncology

Andrés J Muñoz Martín, Ramón Lecumberri, Juan Carlos Souto, Berta Obispo, Antonio Sanchez, Jorge Aparicio, Cristina Aguayo, David Gutierrez, Andrés García Palomo, Diego Benavent, Miren Taberna, María Carmen Viñuela-Benéitez, Daniel Arumi, Miguel Ángel Hernández-Presa


Purpose: We developed a predictive model to assess the risk of major bleeding (MB) within 6 months of a primary venous thromboembolism (VTE) in cancer patients receiving anticoagulant treatment. We also aimed to describe the prevalence and incidence of VTE in cancer patients, as well as the baseline clinical characteristics and the bleeding events during follow-up of patients receiving anticoagulants.

Methods: This observational, retrospective, multicenter study used natural language processing (NLP) and machine learning (ML) to analyze unstructured clinical data from the electronic health records of nine Spanish hospitals between 2014 and 2018. All adult cancer patients with VTE receiving anticoagulants were included. Both clinically driven and ML-driven feature selection were performed to identify MB predictors. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train predictive models, which were validated on a hold-out dataset and compared with the previously developed CAT-BLEED score.

Results: Of the 2,893,108 cancer patients screened, the in-hospital VTE prevalence was 5.8% and the annual incidence ranged from 2.7% to 3.9%. We identified 21,227 patients with active cancer and VTE receiving anticoagulants (53.9% men; median age, 70 years). MB events occurred in 10.9% of patients within the first six months after VTE diagnosis. MB predictors included hemoglobin, metastasis, age, platelets, leukocytes, and serum creatinine. The LR, DT, and RF models had AUC-ROC values (95% confidence interval) of 0.60 (0.55, 0.65), 0.60 (0.55, 0.65), and 0.61 (0.56, 0.66), respectively. These models outperformed the CAT-BLEED score, which had an AUC-ROC of 0.53 (0.48, 0.59).
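The AUC-ROC values above can be read as the probability that a randomly chosen patient who had a major bleed receives a higher predicted risk than a randomly chosen patient who did not; 0.5 corresponds to chance and 1.0 to perfect discrimination. The sketch below illustrates that rank-based definition only. It is not the authors' code, and the function name `auc_roc` and the risk scores and outcomes in the usage example are hypothetical, not study data.

```python
from itertools import product


def auc_roc(scores, labels):
    """Rank-based AUC-ROC (hypothetical helper, for illustration only).

    Returns the fraction of (positive, negative) pairs in which the positive
    case (label 1, e.g. a patient with a major bleed) has the higher score.
    Tied scores count as half a concordant pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0   # positive ranked above negative
        elif p == n:
            concordant += 0.5   # tie: half credit
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = major bleed).
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
labels = [1, 0, 1, 0, 0, 1]
print(round(auc_roc(scores, labels), 3))  # 6 of 9 pairs concordant -> 0.667
```

This pairwise loop is quadratic in the number of patients, which is fine for a toy example; for cohorts the size of this study, a sort-based computation or a library routine would be the practical choice.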

Conclusions: Our study shows encouraging results in identifying anticoagulated patients with cancer-associated VTE who are at high risk of MB.

Citation: Clin Transl Oncol. 2024 Sep 14. doi: 10.1007/s12094-024-03586-2
