Major Article | Volume 50, Issue 4, P440-445, April 01, 2022

# Early prediction of central line associated bloodstream infection using machine learning

Open Access. Published: August 20, 2021

## Highlights

- CLABSIs are a major source of hospital-acquired infections and can add $46,000 in costs per patient.
- Three machine learning models to predict CLABSI were compared using EHR data.
- The XGBoost model obtained an AUROC of 0.762 for CLABSI risk prediction.

## Abstract

### Background

Central line-associated bloodstream infections (CLABSIs) are associated with significant morbidity, mortality, and increased healthcare costs. Despite the high prevalence of CLABSIs in the U.S., there are currently no tools to stratify a patient's risk of developing an infection as the result of central line placement. To this end, we have developed and validated a machine learning algorithm (MLA) that predicts a patient's likelihood of developing CLABSI using only electronic health record (EHR) data, in order to provide clinical decision support.

### Methods

We created three machine learning models to retrospectively analyze EHR data from 27,619 patient encounters. The models were trained and validated using an 80:20 split for the train and test data. Patients designated as having a central line procedure based on International Statistical Classification of Diseases and Related Health Problems 10 (ICD 10) codes were included.

### Results

XGBoost was the highest performing of the three MLAs, obtaining an AUROC of 0.762 for predicting CLABSI with onset at least 48 hours after the recorded time of central line placement.

### Conclusions

Our results demonstrate that MLAs may be effective clinical decision support tools for assessment of CLABSI risk and should be explored further for this purpose.

## Introduction

Central line-associated bloodstream infections (CLABSIs) are defined by the Centers for Disease Control and Prevention (CDC) as laboratory-confirmed bloodstream infections that cannot be attributed to a source other than the presence of a central line and that develop at least 48 hours after central line placement (Hallam et al., 2018). They can be attributed both to lines intended to be permanent and to lines that are not implanted and serve as temporary measures for patient management (NHSN Patient Safety Component Manual, 2021). An estimated 250,000 bloodstream infections occur in the United States annually, 80,000 of them in intensive care units (ICUs) (O'Grady et al., 2011). CLABSI rates in non-ICU settings, where central lines are also commonly used, are similar to those in ICU settings (Dumyati et al., 2014). Further, because these wards typically have more patients than ICUs, the number of patients at risk is also higher (Dumyati et al., 2014). CLABSIs are not only a major source of hospital-acquired infections (HAIs) across care settings (Latif et al., 2015); they also add approximately $46,000 in hospital costs per infection (Zimlichman et al., 2013) and have a mortality rate of 10%-30% (Ranji et al., 2007). Therefore, efforts to decrease the rate of CLABSIs and improve the quality of patient care are of critical importance.
Interventions outlined in CDC guidance can help mitigate the harmful sequelae of CLABSI. The guidelines have also supported national use of CLABSI rates as a quality metric for reporting. The US Centers for Medicare and Medicaid Services report CLABSI rates publicly and can deny hospitals reimbursement for high infection rates (Park et al., 2020). As a result, programs that use financial incentives to mitigate CLABSI have emerged, with varying results (Bastian et al., 2016; Vokes et al., 2018).
Because HAIs pose a continuous threat to both patient safety and the financial well-being of healthcare institutions (Vokes et al., 2018), there is a growing need for tools that can help reduce rates of infection.
Certain host factors (e.g., a history of immunodeficiency or neutropenia) (Beeler et al., 2018) and central line factors have been identified as significant risks for developing CLABSI. Even so, there are currently no CLABSI risk evaluation tools validated in nationally representative hospital populations that include both ICU and general ward patients. Identifying high-risk patients could benefit clinical practice by enabling earlier or more intensive treatment and monitoring, for example, by encouraging more timely replacement of catheters and catheter site dressings (Automated Surveillance for Healthcare-Associated Infections: Opportunities for Improvement, Clin Infect Dis).

To improve clinicians' ability to identify patients at high risk of developing CLABSI, we have developed and retrospectively validated an MLA. It uses data available before central line placement to predict whether a patient will develop a CLABSI at any later point during the hospital stay. The MLA uses readily available EHR data and requires minimal inputs, and thus does not disrupt clinical workflow. Our hypothesis is that this MLA can serve as an effective risk prediction clinical decision support (CDS) tool. We expect it to aid in identifying patients at risk of central line infections and to outperform existing CLABSI risk prediction tools.

## Materials and methods

### Dataset and data processing

This model was developed and validated on a proprietary national longitudinal EHR repository that incorporates clinical, claims, and other medical administrative data from over 700 inpatient and ambulatory care sites. After exclusion criteria were applied, data from 46 hospital sites nationwide were used for analysis of the MLA. Patient data were de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA). Data were limited to encounters occurring between October 1, 2015 and June 2020 and were aggregated across several different EHR systems. This timeframe was chosen to capture International Statistical Classification of Diseases and Related Health Problems (ICD) 10 diagnostic codes, which replaced ICD 9 codes in the United States on October 1, 2015.

The following information was extracted for analysis:

- demographics (gender, age, race, ethnicity);
- the number of days a patient had been hospitalized before central line placement;
- lab and vital sign values (white blood cell count, neutrophil count, hemoglobin, temperature);
- history of comorbidities (smoking, heart failure, chronic kidney disease, renal failure, sepsis, valvular disease, diabetes, arrhythmia, presence of a stoma, tumor, cirrhosis, trauma, peptic ulcer disease, peripheral vascular disease).
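
Table 1 reports categorical demographics as one-hot indicator features (for example, `Race_caucasian` or `Gender_male`). Below is a minimal sketch of how such indicators could be derived from a raw encounter record. The record layout, the level lists, the comorbidity flag names, and both helper functions are hypothetical; the source does not describe its encoding code.

```python
# Hypothetical category levels, mirroring the feature names reported in Table 1.
RACE_LEVELS = ["african american", "asian", "caucasian", "other/unknown"]
ETHNICITY_LEVELS = ["hispanic", "not hispanic", "unknown"]
GENDER_LEVELS = ["male", "female"]
# Hypothetical comorbidity flags (Table 1 abbreviations, lowercased).
COMORBIDITIES = ["sepsis", "ckd", "rf", "vd", "stoma", "prior_clabsi"]


def one_hot(prefix, value, levels, fallback=None):
    """Return {"<prefix>_<level>": 0 or 1} for every level.

    Values not in `levels` map to `fallback`; without a fallback,
    an unrecognized value raises ValueError (e.g., unknown gender).
    """
    value = value.strip().lower()
    if value not in levels:
        if fallback is None:
            raise ValueError(f"unrecognized {prefix} value: {value!r}")
        value = fallback
    return {f"{prefix}_{level}": int(level == value) for level in levels}


def encode_encounter(record):
    """Turn one raw encounter record (a dict) into a flat feature dict."""
    features = {"age": record["age"]}
    features.update(one_hot("Race", record["race"], RACE_LEVELS, fallback="other/unknown"))
    features.update(one_hot("Ethnicity", record["ethnicity"], ETHNICITY_LEVELS, fallback="unknown"))
    features.update(one_hot("Gender", record["gender"], GENDER_LEVELS))
    # `prior_diagnoses` is assumed to hold only diagnoses recorded before the index visit.
    history = {dx.strip().lower() for dx in record["prior_diagnoses"]}
    for name in COMORBIDITIES:
        features[name] = int(name in history)
    return features
```

Unrecognized race or ethnicity values map to an explicit "other/unknown" or "unknown" level, so every encounter stays encodable. Raising on an unknown gender mirrors the exclusion of encounters with unknown gender.
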

### Inclusion/exclusion criteria

All inpatient visits that included a central line procedure were considered. Central line procedures were identified using ICD 10 and procedural classification system codes (Supplementary Table 1).

From the dataset, all non-inpatient visits were excluded, as were all patients <18 years old and those without EHR data. Comorbidities were considered only if they existed before the visit in which the central line was placed. All EHR data recorded before placement of the central line were included.

Positive patients were identified and labeled using ICD 10 codes for central line infection. Any CLABSI diagnoses within 48 hours after central line placement were discarded to better align with the CDC standard. For patients whose hour of central line placement was not recorded, the earliest time point at which a CLABSI could occur was set to 48 hours after 11:59 pm on the day of the procedure. This ensured that at least 48 hours had elapsed after central line placement. CLABSI cases that occurred before the patient's current visit were marked as history of CLABSI.
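
The labeling rules above can be sketched as a small function. This is an illustration under assumptions: the function names and the representation of diagnosis times are hypothetical. Following the Gold standard subsection, an encounter with a CLABSI code inside the 48-hour window is treated as excluded (`None`) rather than relabeled.

```python
from datetime import date, datetime, time, timedelta

# CDC time criterion: a CLABSI can only be counted at least 48 hours after placement.
MIN_ONSET_DELAY = timedelta(hours=48)
# Anchor used when only the placement date (not the hour) is recorded.
END_OF_DAY = time(23, 59)


def earliest_onset(placement_date, placement_time=None):
    """Earliest datetime at which a CLABSI diagnosis may be counted."""
    anchor_time = placement_time if placement_time is not None else END_OF_DAY
    return datetime.combine(placement_date, anchor_time) + MIN_ONSET_DELAY


def label_encounter(placement_date, placement_time, clabsi_dx_times):
    """Return 1 (positive), 0 (negative), or None (encounter excluded).

    clabsi_dx_times: datetimes of CLABSI diagnosis codes in the current visit.
    """
    floor = earliest_onset(placement_date, placement_time)
    if any(t < floor for t in clabsi_dx_times):
        # A diagnosis inside the 48-hour window means the CDC time criterion
        # cannot be confirmed, so the encounter is excluded rather than labeled.
        return None
    return 1 if clabsi_dx_times else 0


placed = date(2019, 3, 1)  # hypothetical placement date, hour not recorded
print(earliest_onset(placed))  # 2019-03-03 23:59:00
```

For a line placed on March 1 with no recorded hour, the earliest countable onset is March 3 at 11:59 pm. A CLABSI code at 11:00 pm on March 3 therefore leads to exclusion.
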
In total, the dataset contained 9,599,481 inpatient encounters, among which there were 111,684 unique visits in which a central line procedure was carried out. Because of the extremely large size of the dimensional lab and observation tables, we randomly selected 50% of these visits, giving a more manageable sample of 55,842 encounters. Further excluding patients younger than 18 years, with missing EHR records, or with unknown gender reduced the dataset to 27,619 encounters. These steps are summarized in the attrition chart shown in Figure 1. For training and testing, the dataset was split in an 80:20 ratio. The features and their distribution in the test dataset are shown in Table 1. The feature distribution for the entire dataset (train and test sets) is shown in Supplementary Table 2. For definitions of the feature name abbreviations, please refer to Supplementary Table 3.
Table 1. Demographic information and feature distribution for the study sample (test dataset)

| Feature | Neg. CLABSI N (%) | Pos. CLABSI N (%) | P value |
|---|---|---|---|
| **Demographics**ᵃ | | | |
| Race_african american | 914 (16.73%) | 20 (33.33%) | .001 |
| Race_asian | 74 (1.35%) | 2 (3.33%) | .452 |
| Race_caucasian | 4069 (74.47%) | 35 (58.33%) | .007 |
| Race_other/unknown | 407 (7.45%) | 3 (5.00%) | .637 |
| Ethnicity_hispanic | 272 (4.98%) | 2 (3.33%) | .776 |
| Ethnicity_not hispanic | 4857 (88.89%) | 56 (93.33%) | .377 |
| Ethnicity_unknown | 335 (6.13%) | 2 (3.33%) | .529 |
| Gender_male | 2867 (52.47%) | 32 (53.33%) | .998 |
| Gender_female | 2597 (47.53%) | 28 (46.67%) | .998 |
| Age_18-30 | 241 (4.41%) | 3 (5.00%) | .924 |
| Age_31-40 | 332 (6.08%) | 5 (8.33%) | .649 |
| Age_41-50 | 558 (10.21%) | 6 (10.00%) | .873 |
| Age_51-60 | 1051 (19.23%) | 16 (26.67%) | .198 |
| Age_61-70 | 1363 (24.95%) | 15 (25.00%) | .888 |
| Age_≥71 | 1919 (35.12%) | 15 (25.00%) | .134 |
| **Comorbidities**ᵃ | | | |
| Smoke_0 | 5383 (98.52%) | 56 (93.33%) | <.001 |
| Smoke_1 | 1 (0.02%) | 0 (0.00%) | <.001 |
| Smoke_2 | 32 (0.59%) | 4 (6.67%) | <.001 |
| Smoke_3 | 48 (0.88%) | 0 (0.00%) | <.001 |
| Prior CLABSI | 29 (0.53%) | 6 (10.00%) | <.001 |
| Heart failure | 1681 (30.77%) | 22 (36.67%) | .399 |
| CKD | 1869 (34.21%) | 40 (66.67%) | <.001 |
| RF | 3001 (54.92%) | 43 (71.67%) | .014 |
| Sepsis | 1931 (35.34%) | 37 (61.67%) | <.001 |
| VD | 928 (16.98%) | 15 (25.00%) | .142 |
| Diabetes | 1854 (33.93%) | 31 (51.67%) | .006 |
| Arrhythmia | 2647 (48.44%) | 35 (58.33%) | .163 |
| Stoma | 208 (3.81%) | 6 (10.00%) | .033 |
| Cirrhosis | 288 (5.27%) | 4 (6.67%) | .849 |
| Trauma | 1093 (20.00%) | 14 (23.33%) | .632 |
| PUD | 349 (6.39%) | 7 (11.67%) | .164 |
| PVD | 601 (11.00%) | 12 (20.00%) | .045 |
| COPD | 1207 (22.09%) | 12 (20.00%) | .817 |
| Tumorᵇ | 590 (10.80%) | 4 (6.67%) | .413 |
| Leukemiaᵇ | 157 (2.87%) | 5 (8.33%) | .035 |
| HIV | 39 (0.71%) | 0 (0.00%) | .906 |
| Transplant | 167 (3.06%) | 3 (5.00%) | .623 |
| **Vitals and lab**ᶜ | | | |
| BMI | 30.14 | 28.33 | .089 |
| Temp (°C) | 36.69 | 36.97 | .008 |
| HGB (g/dL) | 11.53 | 10.57 | .004 |
| NEUT (×10³/µL) | 10.34 | 11.45 | .331 |
| WBC (×10³/µL) | 13.19 | 13.65 | .685 |
| STAY_PRE (d) | 0.21 | 0.58 | .173 |

ᵃ P values were calculated using a χ² test of independence between the feature and the classes.
ᵇ Values from the 12 months leading to the hospitalization.
ᶜ P values were calculated using a two-sample independent t test between the positive and negative classes.
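
The χ² P values in Table 1 can be checked from the reported counts. The sketch below assumes that each binary feature was tested as a 2×2 table with the Yates continuity correction. That is the default SciPy's `scipy.stats.chi2_contingency` applies to 2×2 tables, but the source does not state which correction was used. Under that assumption the function reproduces, for example, P = .033 for Stoma and P = .998 for Gender_male. The class sizes (5,464 negative and 60 positive test encounters) are the column totals implied by Table 1; the function names are hypothetical.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent. Columns: CLABSI negative / positive.
    Applies the Yates continuity correction and returns (statistic, p_value)
    for 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |O - E| by 0.5, but never below zero.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff * diff / expected
    # With 1 df, chi2 = Z**2, so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def feature_vs_class(neg_with, pos_with, n_neg=5464, n_pos=60):
    """Build the 2x2 table from Table 1 counts (feature present in each class)."""
    return chi2_2x2_yates(neg_with, pos_with, n_neg - neg_with, n_pos - pos_with)


_, p_stoma = feature_vs_class(208, 6)  # Stoma row of Table 1
```

Multi-level variables such as Smoke_0 to Smoke_3 would need a larger contingency table and are not covered by this 2×2 helper.
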

### Gold standard

The endpoint of interest followed the gold standard definition of CLABSI for an adult population (Bloodstream Infections, 2021). This definition covers permanent and non-permanent central lines in inpatient populations that lead to a laboratory-confirmed infection after the central line has been in place for two consecutive days. These infections are designated by ICD 10 codes in inpatient EHR records (see Supplementary Table 1).

Onset time was defined as at least 48 hours after the recorded time of central line placement.
If a diagnostic code for CLABSI was assigned within the initial 48-hour period, the patient was excluded, because it could not be determined whether the CDC time-based criterion for CLABSI was met.

### Machine learning model

Three machine learning classifiers were considered for this task. The XGBoost (XGB) classifier is a gradient boosting algorithm that builds an ensemble of decision trees sequentially. Each new tree is fit to reduce the errors left by the trees built before it, and the algorithm includes specific parameters for regularization and learning rate (Chen & Guestrin, 2016).
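
The sequential idea can be illustrated with a minimal sketch of plain gradient boosting for a binary outcome under log loss. The sketch uses one-split trees (stumps) on a single feature. It is not XGBoost: XGBoost also uses second-order gradient information, regularizes leaf weights, and grows deeper trees over many features. The function names and the toy temperature data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Fit a one-split regression stump to `targets` by least squares.

    Requires at least two distinct x values. Returns a function x -> leaf value
    (the mean target on that side of the split).
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Plain gradient boosting for labels ys in {0, 1} under log loss.

    Each stump is fit to the negative gradient (y - p) of the current ensemble,
    i.e., to the errors left by the trees built so far.
    """
    prior = sum(ys) / len(ys)  # requires both classes to be present
    base_logit = math.log(prior / (1.0 - prior))
    trees = []

    def logit(x):
        return base_logit + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - sigmoid(logit(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(logit(x))


temps = [36.5, 36.6, 36.8, 37.0, 37.5, 38.0, 38.2, 38.5]  # hypothetical feature values
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(temps, labels)
```

A smaller `learning_rate` shrinks each tree's contribution, so more trees are needed to reach the same fit. This shrinkage is the role of the learning rate parameter.
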
To ensure that a complex model such as XGB was warranted for the existing data, we also trained a single decision tree classifier and a logistic regression classifier and compared their performance with that of XGB. All three classifiers were trained and validated using a stratified 80:20 split into training and test data. The hold-out test dataset was never used in any part of the training process; it was used only to measure the trained models' performance on unseen data. The 80% training subset was further divided for 5-fold stratified cross-validation.

Two methods were initially considered to address the class imbalance: the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al., 2002) and class weights. SMOTE generates synthetic minority-class examples by interpolating between each minority-class example and its nearest minority-class neighbors. Class weights, in contrast, increase the weight of minority-class examples in the training loss, so that misclassifying them is penalized more heavily. SMOTE did not produce promising results, so it was abandoned in favor of class weights, which were incorporated as a hyperparameter in the models. Hyperparameters were tuned using grid search. The hyperparameter space for each classifier is given in Supplementary Table 4.
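
The stratified split, stratified folds, and class weighting described above can be sketched in pure Python. This is a hypothetical illustration, not the authors' code. Libraries such as scikit-learn provide equivalent utilities, and in the XGBoost library the positive-class weight corresponds to the `scale_pos_weight` parameter. The toy labels (50 positives, 950 negatives) are invented.

```python
import math
import random


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=5, seed=0):
    """Partition `indices` into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted({labels[i] for i in indices}):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return [sorted(fold) for fold in folds]


def positive_class_weight(labels):
    """Negatives per positive: a common starting value for the positive-class weight."""
    n_pos = sum(labels)
    return (len(labels) - n_pos) / n_pos


def weighted_log_loss(labels, probs, pos_weight):
    """Average over examples of the log loss, with each positive weighted by `pos_weight`."""
    total = 0.0
    for y, p in zip(labels, probs):
        weight = pos_weight if y == 1 else 1.0
        total -= weight * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1] * 50 + [0] * 950  # hypothetical, roughly 5% positive
train, test = stratified_split(labels)
folds = stratified_folds(train, labels)
```

With 50 positives and 950 negatives, the split places 10 positives in the 200-encounter test set. Each of the five training folds receives 8 positives, and the suggested positive-class weight is 19.
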

### Feature selection

We initially trained all three classifiers on these features using 5-fold cross-validation and hyperparameter tuning on the training dataset and compared their performance. The best-performing classifier, XGB, was then used to further prune the feature set. The goal was a more portable model that was less likely to overfit while still maintaining a high level of performance. Feature elimination was based primarily on each feature's impact on the model output, as measured by SHapley Additive exPlanations (SHAP) plots. SHAP values provide a consistent way of interpreting feature importance in a model, capturing both the magnitude and the direction of the impact of individual feature values (Lundberg & Lee, 2017).
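
SHAP builds on Shapley values from cooperative game theory. For a model with only a few features, Shapley values can be computed exactly by enumerating feature coalitions, with absent features set to baseline values. The sketch below does this. Its cost is exponential in the number of features, which is why the SHAP library relies on approximations or model-specific algorithms such as TreeSHAP for tree ensembles. The function name, the baseline-substitution convention, and the toy models are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition are set to their baseline value, so the
    values satisfy sum(phi) == f(x) - f(baseline). Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def coalition_value(members):
        z = [x[i] if i in members else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (coalition_value(members | {i}) - coalition_value(members))
    return phi


linear = lambda z: 2 * z[0] - z[1] + 0.5 * z[2]  # hypothetical toy model
phi = exact_shapley(linear, [1, 4, 2], [0, 0, 0])
```

For a linear model, each feature's value equals its weight times its deviation from baseline, here `[2, -4, 1]`. For a product model `z[0] * z[1]`, the interaction credit is split evenly between the two features.
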
We started with the features that the model determined to be most impactful and then attempted to reduce this set further through backward elimination. Elimination continued until we identified a final feature set with the same performance as the original set.
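
Backward elimination can be written as a greedy loop. In this hypothetical sketch, `score` is any caller-supplied evaluation of a feature list, such as the mean cross-validated AUROC of a retrained model. The `tolerance` argument sets how much performance loss is accepted. The toy scoring function and feature names are invented.

```python
def backward_elimination(features, score, tolerance=0.0):
    """Greedily drop features while the score stays within `tolerance` of the start.

    At each step, the feature whose removal hurts the score least is dropped,
    unless even that removal would fall below reference - tolerance.
    """
    current = list(features)
    reference = score(current)
    while len(current) > 1:
        best_feature, best_score = None, None
        for feature in current:
            trial = [g for g in current if g != feature]
            s = score(trial)
            if best_score is None or s > best_score:
                best_feature, best_score = feature, s
        if best_score < reference - tolerance:
            break
        current.remove(best_feature)
    return current


gains = {"temp": 0.05, "hgb": 0.04, "prior_clabsi": 0.06}  # hypothetical contributions
toy_score = lambda feats: 0.5 + sum(gains.get(f, 0.0) for f in feats)
kept = backward_elimination(["temp", "hgb", "noise1", "prior_clabsi", "noise2"], toy_score)
```

The toy run drops the two uninformative features and stops when every remaining removal would lower the score.
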
Although XGB is not very susceptible to correlation among features, we verified that the final features were not highly correlated. This check ensured that the selected features could not easily be reduced to a smaller set. For time series features (vital signs and lab measurements), we took the most recent value before the procedure time. This approach performed better than alternatives such as taking a sequence of values from the time series or its descriptive statistics (mean, median, etc.). Our classifiers therefore used the most recent vital sign and lab values.
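
Selecting the most recent vital or lab value before the procedure is a simple as-of lookup. The sketch assumes that each series is a list of `(timestamp, value)` pairs and that only values recorded strictly before the procedure time are eligible. The function names and data are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime


def most_recent_before(measurements, cutoff):
    """Latest value recorded strictly before `cutoff`, or None if there is none.

    measurements: iterable of (timestamp, value) pairs, in any order.
    """
    ordered = sorted(measurements, key=lambda m: m[0])
    times = [t for t, _ in ordered]
    pos = bisect_left(times, cutoff)  # first index with timestamp >= cutoff
    return ordered[pos - 1][1] if pos > 0 else None


def snapshot(series_by_name, cutoff):
    """Apply most_recent_before to every vital/lab series."""
    return {name: most_recent_before(series, cutoff) for name, series in series_by_name.items()}


temps = [  # hypothetical temperature readings (°C), deliberately unsorted
    (datetime(2020, 1, 5, 20), 38.4),
    (datetime(2020, 1, 5, 8), 36.8),
    (datetime(2020, 1, 5, 14), 37.9),
]
procedure_time = datetime(2020, 1, 5, 15)
```

A reading taken exactly at the procedure time is not used, and a series with no earlier reading yields `None`, which a downstream model would have to treat as missing.
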

### Statistical analysis

The performance of all three classifiers was assessed using the area under the receiver operating characteristic curve (AUROC). We also assessed model specificity with sensitivity fixed at approximately 0.80, along with the corresponding positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (DOR). Feature importance for all three classifiers was assessed using SHAP values. These values measure each feature's contribution to an individual prediction relative to a baseline (expected) model output. The results of the statistical evaluation of the models are presented in the following section.
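
The diagnostic metrics above follow directly from confusion-matrix counts: LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and DOR = LR+ / LR-. The sketch below computes them with Wald intervals for sensitivity and specificity and log-scale intervals for the likelihood ratios. These interval methods are our assumption; the source does not state them.

The counts in the usage line (48 true positives, 12 false negatives, 2,522 false positives, 2,942 true negatives) are a hypothetical reconstruction. They are consistent with the 60 positive and 5,464 negative test encounters in Table 1. With them, the function matches the XGB row of Table 2 for sensitivity, specificity, LR+, LR-, and their intervals to within about 0.001. The DOR it returns (about 4.666) differs slightly from the reported 4.671, which equals the ratio of the rounded LRs (1.733/0.371).

```python
import math

Z_95 = 1.96  # two-sided 95% standard-normal quantile


def wald_ci(p, n):
    """Normal-approximation (Wald) 95% CI for a proportion p estimated from n cases."""
    half = Z_95 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def log_ci(ratio, se_log):
    """95% CI for a ratio whose natural logarithm has standard error se_log."""
    return ratio * math.exp(-Z_95 * se_log), ratio * math.exp(Z_95 * se_log)


def diagnostic_metrics(tp, fn, fp, tn):
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)   # LR+ = sensitivity / (1 - specificity)
    lr_neg = (1 - sens) / spec   # LR- = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg        # equals (tp * tn) / (fp * fn)
    # Standard errors of log(LR+) and log(LR-).
    se_log_lr_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_log_lr_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    return {
        "sensitivity": (sens, wald_ci(sens, tp + fn)),
        "specificity": (spec, wald_ci(spec, tn + fp)),
        "LR+": (lr_pos, log_ci(lr_pos, se_log_lr_pos)),
        "LR-": (lr_neg, log_ci(lr_neg, se_log_lr_neg)),
        "DOR": dor,
    }


m = diagnostic_metrics(tp=48, fn=12, fp=2522, tn=2942)  # hypothetical reconstructed counts
```
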

## Results

The AUROC values of the three classifiers are presented in Figure 2. Table 2 summarizes the performance of the three classifiers in terms of AUROC, sensitivity, specificity, LR+, LR-, and DOR, with confidence intervals. All metrics were measured on the hold-out test set after model training was completed. Figure 3 shows the feature importance plot of the 20 most important features for the XGB classifier according to their SHAP values. SHAP summary plots for the most important features of all three classifiers are shown in Supplementary Figures 1, 2, and 3. As Figure 3 shows, there is a clear drop in feature impact after the first 13 features. We therefore trained a portable XGB classifier using only these 13 features, with performance summarized in Table 3. The loss of performance from limiting the model to these features was small (AUROC 0.746 vs 0.762). This minimal input model used the following features: temperature, hemoglobin, white blood cell count, neutrophil count, one-hot encoded race (Caucasian and African American), age, and any history of sepsis, chronic kidney disease, stoma, prior CLABSI, renal failure, and valvular disease.
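
The 13-feature cut above was read from a visible drop in SHAP importance. One way to make such a cut reproducible is to rank features by mean absolute SHAP value and cut where importance falls by the largest ratio between neighbors. This heuristic is our illustration, not the authors' stated procedure, and the SHAP matrix in the usage example is invented.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| across rows (one row per encounter)."""
    n_rows = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


def cut_at_sharpest_drop(ranked):
    """Keep the top-k features, placing k where importance falls by the largest ratio.

    ranked: list of (name, importance) sorted in descending order, at least two entries.
    """
    values = [v for _, v in ranked]
    ratios = [values[i + 1] / values[i] if values[i] > 0 else 1.0 for i in range(len(values) - 1)]
    k = ratios.index(min(ratios)) + 1
    return [name for name, _ in ranked[:k]]


shap_rows = [  # hypothetical SHAP values: 2 encounters x 4 features
    [0.30, 0.25, -0.20, 0.02],
    [-0.30, -0.25, 0.20, -0.02],
]
ranked = rank_by_mean_abs_shap(shap_rows, ["a", "b", "c", "d"])
```

A ratio-based cut is used here because absolute gaps tend to be largest among the top-ranked features even when no real elbow exists there.
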
Table 2. Performance metrics for CLABSI prediction at the time of central line placement for the three classifiers. Values are given as value (95% confidence interval).

| Metric | XGB | Logistic regression | Decision tree |
|---|---|---|---|
| AUROC | 0.762 (0.695, 0.831) | 0.633 (0.569, 0.700) | 0.690 (0.606, 0.761) |
| Sensitivity | 0.800 (0.699, 0.901) | 0.800 (0.699, 0.901) | 0.867 (0.781, 0.953) |
| Specificity | 0.538 (0.525, 0.552) | 0.443 (0.430, 0.456) | 0.177 (0.168, 0.188) |
| LR+ | 1.733 (1.522, 1.973) | 1.436 (1.262, 1.633) | 1.054 (0.954, 1.165) |
| LR- | 0.371 (0.224, 0.616) | 0.452 (0.272, 0.750) | 0.750 (0.393, 1.434) |
| DOR | 4.671 | 3.184 | 1.4 |

Abbreviations: AUROC, area under the receiver operating characteristic curve; DOR, diagnostic odds ratio; LR, likelihood ratio.
Table 3. Performance metrics for the minimal input classifier that included 13 features.

| Metric | Value (95% confidence interval) |
|---|---|
| AUROC | 0.746 (0.667, 0.820) |
| Sensitivity | 0.800 (0.699, 0.901) |
| Specificity | 0.482 (0.468, 0.495) |
| LR+ | 1.543 (1.356, 1.755) |
| LR- | 0.415 (0.250, 0.689) |
| DOR | 3.715 |

Abbreviations: AUROC, area under the receiver operating characteristic curve; DOR, diagnostic odds ratio; LR, likelihood ratio.

## Discussion

In our research, three MLAs and a minimal input model were developed and validated for their ability to predict CLABSIs. The XGB model outperformed the logistic regression and decision tree models in terms of AUROC. Recent studies modeling patient risk for conditions including sepsis and respiratory decompensation have also demonstrated the superiority of XGBoost over other ML models (Barton et al., 2019; Burdick et al., 2020; Ryan et al., 2020).
The models described in the current study may provide early warning of which patients may be more vulnerable to CLABSI if a central line is placed during the course of care. Our research team has previously demonstrated the utility of this approach with an algorithm designed to predict sepsis. Its use resulted in significantly reduced rates of sepsis, sepsis-related costs, and mortality (Burdick et al., 2020; Calvert et al., 2017).
Through multiple steps of eliminating the least important features that did not significantly affect model performance, we identified the most important subset of features for CLABSI prediction with XGBoost. These features are age, race, temperature, hemoglobin, white blood cell count, neutrophil count, and any history of sepsis, chronic kidney disease, stoma, renal failure, valvular disease, and previous CLABSI. Because medical history and prior diagnoses emerged as some of the most important predictors, this tool is convenient in a clinical setting: it can readily scan all available EHR records. Each patient can thus be classified as having or not having the pre-existing conditions, based on EHR records of historical diagnoses and/or diagnoses present on admission. This may give healthcare providers (HCPs) the opportunity to enhance clinical monitoring and modify treatment plans in a targeted fashion for patients flagged as high risk. Currently, HCPs rely on established measures to prevent and detect CLABSI, for example, daily inspection and cleansing of the central line insertion site.
Although these measures are evidence-based, they do not stratify risk among hospital patients; therefore, prediction of future CLABSI risk relies on clinical gestalt (DeVries, 2019).
To date, there are no validated, widely adopted prediction tools to identify patients with varied types of central lines at high risk of CLABSI in both ICU and non-ICU wards. Herc et al. (2017) described the development of a clinical decision tool to predict CLABSI among patients with peripherally inserted central catheters (PICCs). This rules-based tool was developed to predict PICC-line infections in both ICU and non-ICU patients and achieved AUROCs of 0.70 and 0.80, respectively. However, the tool's results are not generalizable to risk assessment for tunneled central lines, because PICCs are non-tunneled in terms of the method and location of placement (Annamaraju & Regunath, 2021).
An MLA CDS tool could be of great utility to HCPs if it generates CLABSI predictions across diverse types of central lines, without limiting predictions to a specific line type or requiring separate CDS tools for different line types. Such a tool would make CLABSI control in healthcare settings more efficient and effective. It would also help expand the capacity of infection prevention personnel to engage in other interventions against the spread of healthcare-associated infections (Sips et al., 2017). Deploying such tools would provide critical support to the field, especially given the high rates of turnover and retirement of trained infection preventionists anticipated in the coming years (Gilmartin et al., 2021; Vassallo & Boston, 2019).
Recently, research has explored the use of machine learning to predict CLABSI. Beeler et al. (2018) used a random forest MLA to predict CLABSI from retrospective EHR data. Their data consisted of ICD and diagnosis-related group codes and HCP notes for adult, pediatric, and neonatal inpatients. The model obtained an AUC of 0.87 during the study and 0.82 during an independent statistical validation. However, that model relied on HCP notes in the EHR data, which are subject to the individual assessments of HCPs. Our MLA, in contrast, uses only structured EHR data (ICD codes, demographics, and vital sign and laboratory values) and does not require assessment by an HCP. Further, Beeler et al. concluded that their results indicated the ability of MLAs to predict CLABSI in real time, but they did not validate prediction in advance of CLABSI onset, which limits the clinical utility of such a tool for CLABSI prevention.

Parreco et al. assessed the performance of three ML models (logistic regression, gradient boosted trees, and deep learning) in predicting patient mortality, CLABSI onset, and the need for central line placement. They used ICU patient EHR data comprising ICD 9 codes. For CLABSI prediction, the highest-performing model was logistic regression, with an AUC of 0.722. However, these models were trained and tested on a dataset of ICU patients only, and, as such, the results are not generalizable to other inpatient wards.
Although these results highlight the ability of ML to predict CLABSI, our research had several limitations. First, to ensure the portability of this tool, ICD codes were used despite their poor sensitivity for detecting CLABSI (Tukey et al.). However, the infrequent use of administrative codes to document CLABSI does not impact the utility of our MLA in a clinical setting.

Additionally, without access to National Healthcare Safety Network surveillance data, EHR data were the only means available to us to validate the algorithm using a large, national dataset. This approach was further justified because ICD codes eliminated the need for more intensive chart review to identify HCP notes indicating CLABSI. Such a review would have drastically limited the patient pool and the types of hospitals from which our data were derived.

Lastly, because this study used retrospective data, future work should include validation in live clinical settings, with MLA parameters tuned to individual hospitals. Future work should also include human factors surveys to assess clinical utility. Outcome measures should be examined to assess the MLA's impact on patient outcomes and the economic savings associated with its use. This could be achieved through a clinical trial in diverse patient populations. A post-marketing study could then combine ICD codes, HCP notes, and clinician feedback during algorithm training before the software is deployed for commercial use.

## Conclusion

Despite incentivized programs to reduce central line infections, there remains a need for tools that predict CLABSI risk, giving HCPs the opportunity to modify treatment protocols or monitor high-risk patients more closely. Prediction of CLABSI may be complicated by the heterogeneity of central line types, the range of patient populations in which central lines are used, and individual patient factors. However, our preliminary results indicate that, with further refinement, MLAs may serve as invaluable clinical decision support tools to identify patients at risk for CLABSI and to supplement existing protocols.

## Acknowledgments

We gratefully acknowledge Megan Handley for her assistance in writing and editing this manuscript.

## References

• Hallam C
• Jackson T
• Rajgopal A
• Russell B.
Establishing catheter-related bloodstream infection surveillance to drive improvement.
J Infect Prev. 2018; 19: 160-166
2021 NHSN patient safety component manual. 2021;428.

• O'Grady NP
• Alexander M
• Burns LA
• et al.
Summary of recommendations: guidelines for the prevention of intravascular catheter-related infections.
Clin Infect Dis off Publ Infect Dis Soc Am. 2011; 52: 1087-1099
• Dumyati G
• Concannon C
• van Wijngaarden E
• et al.
Sustained reduction of central line-associated bloodstream infections outside the intensive care unit with a multimodal intervention focusing on central line maintenance.
Am J Infect Control. 2014; 42: 723-730
• Latif A
• Halim MS
• Pronovost PJ.
Eliminating infections in the ICU: CLABSI.
Curr Infect Dis Rep. 2015; 17: 35
• Zimlichman E
• Henderson D
• Tamir O
• et al.
Health care-associated infections: a meta-analysis of costs and financial impact on the US health care system.
JAMA Intern Med. 2013; 173: 2039-2046
• Ranji SR
• Shetty K
• Posley KA
• et al.
Closing the Quality Gap: A Critical Analysis of Quality Improvement Strategies (Vol. 6: Prevention of Healthcare–Associated Infections) [Internet].
Agency for Healthcare Research and Quality (US), Rockville, MD; 2007 ([cited 2020 Sep 11]. (AHRQ Technical Reviews). Available from:)
Guidelines for the Prevention of Intravascular Catheter-Related Infections. 2017. 2011: 80
• Park JY
• Kwon KT
• Lee WK
• et al.
The impact of infection control cost reimbursement policy on central line-associated bloodstream infections.
Am J Infect Control. 2020; 48: 560-565
• Bastian ND
• Kang H
• Nembhard HB
• Bloschichak A
• Griffin PM.
The Impact of a pay-for-performance program on central line-associated blood stream infections in Pennsylvania.
Hosp Top. 2016; 94: 8-14
• Vokes RA
• Bearman G
• Bazzoli GJ.
Hospital-acquired infections under pay-for-performance systems: an administrative perspective on management and change.
Curr Infect Dis Rep. 2018; 20: 35
• Beeler C
• Dbeibo L
• Kelley K
• et al.
Assessing patient risk of central line-associated bacteremia via machine learning.
Am J Infect Control. 2018; 46: 986-991
Automated Surveillance for Healthcare-Associated Infections: Opportunities for Improvement | Clinical Infectious Diseases Oxford Academic [Internet]. Accessed Jun 10, 2021. Available from: https://academic.oup.com/cid/article/57/1/85/279509?login=true

Bloodstream Infections. 2021;50.

• Chen T
• Guestrin C.
XGBoost: a scalable tree boosting system.
in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, California, USA; 2016: 785-794 ([Internet] Accessed Mar 11, 2021. Available from:)
• Chawla NV
• Bowyer KW
• Hall LO
• Kegelmeyer WP.
SMOTE: synthetic minority over-sampling technique.
J Artif Intell Res. 2002; 16: 321-357
• Lundberg S
• Lee S-I.
A Unified Approach to Interpreting Model Predictions.
2017 (ArXiv170507874 Cs Stat [Internet]. Accessed May 3, 2021. Available at:)
• Barton C
• Chettipally U
• Zhou Y
• et al.
Evaluation of a machine learning algorithm for up to 48-hour advance prediction of sepsis using six vital signs.
Comput Biol Med. 2019; 109: 79-84
• Burdick H
• Lam C
• Mataraso S
• et al.
Prediction of respiratory decompensation in Covid-19 patients using machine learning: the READY trial.
Comput Biol Med. 2020; 124: 103949
• Ryan L
• Lam C
• Mataraso S
• et al.
Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: a retrospective study.
Ann Med Surg. 2020; 59: 207-216
• Burdick H
• Pino E
• Gabel-Comeau D
• et al.
Effect of a sepsis prediction algorithm on patient mortality, length of stay and readmission: a prospective multicentre clinical outcomes evaluation of real-world patient data from US hospitals.
BMJ Health Care Inform. 2020; 27
• Calvert J
• Hoffman J
• Barton C.
Cost and mortality impact of an algorithm-driven sepsis prediction system.
J Med Econ. 2017; 20: 646-651
• DeVries M.
CLABSI: Definition and Diagnosis.
in: Moureau NL. Vessel Health and Preservation: The Right Approach for Vascular Access [Internet]. Springer International Publishing, Cham, Switzerland; 2019: 163-168
• Herc E
• Patel P
• Washer LL
• Conlon A
• Flanders SA
• Chopra V.
A model to predict central-line–associated bloodstream infection among patients with peripherally inserted central catheters: the MPC score.
Infect Control Hosp Epidemiol. 2017; 38: 1155-1166
• Annamaraju P
• Regunath H.
Central Line Associated Blood Stream Infections.
in: StatPearls [Internet]. StatPearls Publishing, Treasure Island, FL; 2021
• Sips ME
• Bonten MJM
• van Mourik MSM.
Automated surveillance of healthcare-associated infections: state of the art.
Curr Opin Infect Dis. 2017; 30: 425-431
• Gilmartin H
• Smathers S
• Reese SM.
Infection preventionist retention and professional development strategies: insights from a national survey.
Am J Infect Control. 2021; 49: 960-962
• Vassallo A
• Boston KM.
The master of public health graduate as infection preventionist: Navigating the changing landscape of infection prevention.
Am J Infect Control. 2019; 47: 201-207
• Parreco JP
• Hidalgo AE