The utility of the Rapid Emergency Medicine Score (REMS) compared with SIRS, qSOFA and NEWS for predicting in-hospital mortality among patients with suspicion of sepsis in an emergency department

Many early warning scores (EWSs) have been validated to prognosticate adverse outcomes secondary to sepsis in the Emergency Department (ED). These EWSs include the Systemic Inflammatory Response Syndrome criteria (SIRS), the quick Sequential Organ Failure Assessment (qSOFA) and the National Early Warning Score (NEWS). However, the Rapid Emergency Medicine Score (REMS) has never been validated for this purpose. We aimed to assess and compare the prognostic utility of REMS with that of SIRS, qSOFA and NEWS for predicting mortality in patients with suspicion of sepsis in the ED. We conducted a retrospective study at the ED of Siriraj Hospital, Mahidol University, Thailand. Adult patients suspected of having sepsis in the ED between August 2018 and July 2019 were included. Their EWSs were calculated. The primary outcome was all-cause in-hospital mortality. The secondary outcome was 7-day mortality. A total of 1622 patients were included in the study; 457 (28.2%) died before hospital discharge. REMS yielded the highest discrimination capacity for in-hospital mortality (area under the receiver operating characteristic curve (AUROC) 0.62 (95% confidence interval (CI) 0.59, 0.65)), which was significantly higher than qSOFA (AUROC 0.58 (95% CI 0.55, 0.60); p = 0.005) and SIRS (AUROC 0.52 (95% CI 0.49, 0.55); p < 0.001) but not significantly superior to NEWS (AUROC 0.61 (95% CI 0.58, 0.64); p = 0.27). REMS was the best EWS in terms of calibration and association with the outcome. It also provided the highest net benefit in the decision curve analysis. Comparison of EWSs plus a baseline risk model showed similar results. REMS also performed better than the other EWSs for 7-day mortality. REMS was an early warning score with higher accuracy than sepsis-related scores (qSOFA and SIRS), similar to NEWS, and had the highest utility in terms of net benefit compared to SIRS, qSOFA and NEWS in predicting in-hospital mortality in patients presenting to the ED with suspected sepsis.


Introduction
Sepsis is a state of organ dysfunction caused by a dysregulated host response to infection [1,2]. It is a critical condition with a high mortality rate and is considered a major health problem worldwide [1][2][3]. In high-income countries, around 20 million people suffer from sepsis each year, with a mortality rate ranging between 17 and 26% [4]. In the middle-income country of Thailand, the mortality rate was estimated at 25-50% [5,6]. Thailand has limited universal coverage health care resources with increasing emergency department (ED) overcrowding. Thus, the burden of disease is higher than in high-income countries.
Early recognition of patients with sepsis is key to improving its management, especially in those with greater severity who are at risk of adverse outcomes. Developing early warning score (EWS) tools to identify these patients early may help clinicians accelerate treatment and could lead to improved outcomes. Sepsis was previously defined and identified using Systemic Inflammatory Response Syndrome criteria (SIRS). However, SIRS has been criticized for its low specificity [7][8][9], which led to the introduction of the quick Sequential Organ Failure Assessment (qSOFA) proposed by the third international consensus definition (Sepsis-3) [10]. qSOFA has been shown to have better specificity but lower sensitivity than SIRS. Consequently, it might not detect patients early enough in their course of disease and may not be beneficial for ED use [11][12][13][14].
Many EWSs have been developed for ED use. The National Early Warning Score (NEWS) was developed to assess and monitor hospitalized patients for early detection of clinical deterioration [15]. Despite being developed for clinical deterioration, it has been validated as a feasible predictor of adverse outcomes due to sepsis. NEWS has higher accuracy than qSOFA and SIRS for predicting mortality and intensive care unit (ICU) transfer of suspected septic patients [16][17][18][19]. The Rapid Emergency Medicine Score (REMS) was developed to predict in-hospital mortality in non-surgical ED patients [20]. It has not been validated against, or compared with, sepsis-related scoring systems and other EWSs for predicting adverse outcomes due to sepsis. Thus, we aimed to validate and compare the clinical utility of REMS, SIRS, qSOFA, and NEWS in predicting in-hospital mortality and mortality within 7 days of admission in ED patients with suspected sepsis.

Study design and setting
We conducted a retrospective study at the ED of Siriraj Hospital, Mahidol University, Bangkok, Thailand. Siriraj Hospital is the largest tertiary university hospital in Thailand, with over 20,000 Emergency Severity Index level 1-2 ED visits per year. The Siriraj Institutional Review Board approved the study. Informed consent was waived because patient data were de-identified.

Patients
Adult patients aged > 18 years were eligible if they were suspected of having sepsis by ED physicians using clinical judgement and had hemoculture taken, followed by prescribed intravenous antibiotics or vice versa. Patients transferred from outpatient units after having been treated with intravenous antibiotics were excluded.

Data collection
We assessed ED patients retrospectively for eligibility between 1 August 2018 and 31 July 2019. Triage nurses assessed patients visiting the ED and recorded their initial vital signs in the standing triage form, and then attending ED physicians assessed them.
We extracted physiologic variables, underlying conditions, management, and outcomes from electronic medical records. All components of each risk score were in the standing ED admission triage form, from which we used the initial values at presentation to retrospectively calculate all risk score values. If sepsis was suspected more than 4 h after the time of the ED visit, we used the values closest to the time of suspicion, defined as the time of culture or antibiotics, whichever came first. We calculated all risk scores using an online calculator (MDCalc).

Scoring systems
SIRS is a 4-item score consisting of pulse rate, respiratory rate, body temperature and white blood cell count, with each item worth 1 point (0-4 points). qSOFA has 3 items worth 1 point each: respiratory rate, mental status and systolic blood pressure (0-3 points). NEWS and REMS are scoring systems with multiple components and weighted score points. NEWS (0-20 points) consists of pulse rate, respiratory rate, body temperature, systolic blood pressure, oxygen saturation and need for oxygen supplement. REMS comprises pulse rate, respiratory rate, mean arterial pressure, mental status, pulse oximetry and age (0-26 points). The components and details of each risk score are presented in Table S1.
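The two unweighted scores can be sketched in a few lines. This is an illustration only, not the calculator used in the study: the thresholds below are the conventional published SIRS and qSOFA criteria rather than values taken from Table S1, the optional PaCO2 and band-form alternatives for SIRS are omitted, and the `Vitals` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """Initial triage values for one patient (hypothetical field names)."""
    pulse_rate: float                   # beats per minute
    respiratory_rate: float             # breaths per minute
    temperature_c: float                # degrees Celsius
    systolic_bp: float                  # mmHg
    gcs: int                            # Glasgow Coma Scale, 3-15
    wbc_per_ul: Optional[float] = None  # white blood cells per microlitre


def sirs_score(v: Vitals) -> int:
    """SIRS, 0-4 points: one point per abnormal item (conventional thresholds)."""
    points = 0
    points += v.temperature_c > 38.0 or v.temperature_c < 36.0
    points += v.pulse_rate > 90
    points += v.respiratory_rate > 20
    if v.wbc_per_ul is not None:
        points += v.wbc_per_ul > 12_000 or v.wbc_per_ul < 4_000
    return int(points)


def qsofa_score(v: Vitals) -> int:
    """qSOFA, 0-3 points: one point per abnormal item (conventional thresholds)."""
    points = 0
    points += v.respiratory_rate >= 22
    points += v.gcs < 15            # altered mental status
    points += v.systolic_bp <= 100
    return int(points)


if __name__ == "__main__":
    patient = Vitals(pulse_rate=112, respiratory_rate=24, temperature_c=38.6,
                     systolic_bp=95, gcs=14, wbc_per_ul=15_000)
    print(sirs_score(patient), qsofa_score(patient))
```

NEWS and REMS follow the same pattern but assign weighted points per band of each component, so they would need the band tables from Table S1.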

Outcomes
The primary outcome was all-cause in-hospital mortality, and the secondary outcome was all-cause mortality within 7 days of ED visit.

Statistical analysis
We reported categorical variables as frequency (percentage) and continuous variables as mean (SD) or median (interquartile range) as appropriate. We compared patient characteristic variables using Chi square, t-test, or Mann-Whitney U test as appropriate.
The predictive performance of SIRS, qSOFA, NEWS, and REMS for the primary and secondary outcomes was assessed. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). We estimated the 95% confidence interval (CI) of the AUROCs and made comparisons between EWSs using a bootstrap method with 10,000 replications. We evaluated calibration with calibration plots and the Hosmer-Lemeshow test, using a smoothed nonparametric method to fit the calibration curves [21,22]. Overall model performance was tested by scaled Brier score and Nagelkerke's R squared. A complementary analysis of EWSs was performed incorporating information an ED physician may have at assessment, including age, gender, and Charlson Comorbidity Index. Baseline mortality risk models were fitted for each outcome with age as a restricted cubic spline [21]. The additional predictive contribution of each EWS to the baseline risk model was assessed by likelihood ratio test. Comparisons between the baseline risk model plus an EWS versus the baseline risk model plus a different EWS were assessed by bootstrap test with 10,000 replications. Integrated discrimination improvement assessed whether a baseline plus EWS model had better discrimination than the baseline model alone, measured as the difference in discrimination slopes between the models [23].
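The discrimination analysis can be illustrated with a short sketch. It is not the study's R code: it assumes a percentile bootstrap (the text does not say which bootstrap interval was used), computes the AUROC through the Mann-Whitney statistic, and uses hypothetical function names and toy data. The default number of replications is kept small for speed; the study used 10,000.

```python
import numpy as np


def auroc(scores, died):
    """AUROC via the Mann-Whitney statistic; tied pairs count as one half."""
    scores = np.asarray(scores, dtype=float)
    died = np.asarray(died, dtype=bool)
    pos, neg = scores[died], scores[~died]
    diff = pos[:, None] - neg[None, :]  # every death versus every survivor
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def _percentile_ci(stat, died, n_boot, alpha, seed):
    """Percentile bootstrap CI of stat(idx) over patient-level resamples."""
    if died.all() or not died.any():
        raise ValueError("both outcome classes are required")
    rng = np.random.default_rng(seed)
    draws = []
    while len(draws) < n_boot:
        idx = rng.integers(0, died.size, size=died.size)
        if died[idx].all() or not died[idx].any():
            continue  # AUROC is undefined when a resample has one class only
        draws.append(stat(idx))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def auroc_ci(scores, died, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap CI for the AUROC of a single score."""
    s = np.asarray(scores, dtype=float)
    d = np.asarray(died, dtype=bool)
    return _percentile_ci(lambda i: auroc(s[i], d[i]), d, n_boot, alpha, seed)


def auroc_difference_ci(scores_a, scores_b, died, n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap CI for AUROC(a) - AUROC(b) on the same patients."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    d = np.asarray(died, dtype=bool)
    return _percentile_ci(lambda i: auroc(a[i], d[i]) - auroc(b[i], d[i]),
                          d, n_boot, alpha, seed)


if __name__ == "__main__":
    toy_scores = [0, 1, 2, 3, 4, 5, 6, 7]  # made-up score values
    toy_died = [0, 0, 1, 0, 1, 0, 1, 1]    # made-up outcomes
    print(auroc(toy_scores, toy_died), auroc_ci(toy_scores, toy_died))
```

Resampling the same patient indices for both scores keeps the comparison paired, which matters because the scores are calculated on the same patients and are correlated.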
Good discrimination and calibration may not reflect the clinical usefulness of an EWS because they assign equal weight to sensitivity, specificity, and prediction errors. ED physicians making decisions in clinical practice usually assign different weights to these based on the patient's characteristics and available resources. To reflect this, we calculated net benefit (NB) at each threshold probability from decision curve analysis [24]. For a patient with suspected sepsis, the ED physician weighs the harm/cost of overtreatment against the benefit of treatment (the harm/cost-to-benefit ratio). If the physician thinks that the harm/cost-to-benefit ratio is 1:9, this represents a threshold probability of 10%, and the number of patients that the physician is willing to treat (NWT) to prevent the mortality outcome is 10. We present a threshold probability range from 1 to 20% (NWT from 100 to 5, respectively), which is a plausible range over which a physician would use an EWS for a patient with suspected sepsis. A higher NB is desirable at any threshold probability and should be higher than that of the 'treat all' or 'treat none' strategies to have clinical utility.
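The arithmetic behind these quantities can be sketched as follows. The formulas are the standard decision curve analysis definitions; the function names are hypothetical and the counts in the demo are made up, so this is not the `dca` package used in the study.

```python
def threshold_from_ratio(harm: float, benefit: float) -> float:
    """Threshold probability implied by a harm:benefit ratio, e.g. 1:9 -> 0.10."""
    return harm / (harm + benefit)


def number_willing_to_treat(threshold: float) -> float:
    """NWT is the reciprocal of the threshold probability."""
    return 1.0 / threshold


def net_benefit(true_pos: int, false_pos: int, n: int, threshold: float) -> float:
    """True positives per patient minus false positives weighted by the odds
    of the threshold probability."""
    weight = threshold / (1.0 - threshold)
    return true_pos / n - weight * false_pos / n


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Treating everyone: every event is a true positive and every non-event
    is a false positive."""
    weight = threshold / (1.0 - threshold)
    return prevalence - weight * (1.0 - prevalence)


if __name__ == "__main__":
    pt = threshold_from_ratio(1, 9)
    print(pt, number_willing_to_treat(pt))
    # Made-up counts: 30 true positives and 20 false positives in 100 patients.
    print(net_benefit(30, 20, 100, threshold=0.20))
    print(net_benefit_treat_all(prevalence=0.282, threshold=0.20))
```

With the cohort's in-hospital mortality of 28.2%, the treat-all net benefit at a 20% threshold works out to 0.282 - 0.25 × 0.718 ≈ 0.10, which is the bar an EWS has to clear at that threshold. The 'treat none' strategy has a net benefit of zero at every threshold.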
Clinical usefulness at cutoff values was also assessed by sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR-), negative predictive value (NPV) and positive predictive value (PPV). These were calculated for SIRS and qSOFA at the recommended cutoffs from previous literature [10,25]. For NEWS and REMS, we reported the accuracy at the optimal cut-point according to the Youden index. Pre-specified subgroup analyses were performed by age ≥ 70 years and age < 70 years as well as by no comorbidities and at least one comorbidity. Comorbidities included chronic neurologic disease, acute stroke, chronic respiratory disease, chronic liver disease, neoplasia, chronic renal disease, diabetes mellitus, chronic heart failure, and immunocompromised status.
All statistical analyses were performed using R software version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) with the rms, Hmisc, foreign, pROC, sciplot, and dca packages, except for sensitivity, specificity, LR+, LR-, NPV and PPV, which were calculated using MedCalc for Windows version 19 (MedCalc statistical software, Mariakerke, Belgium).
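The cutoff-based measures all derive from one 2×2 table, which the following sketch makes explicit. It assumes that a score strictly greater than the cutoff counts as a positive test, mirroring the "NEWS > 8" notation used in the results; the function names and toy data are hypothetical, and undefined ratios are returned as NaN.

```python
import math

import numpy as np


def _ratio(num, den):
    """Division that yields NaN instead of raising when the denominator is 0."""
    return num / den if den else math.nan


def accuracy_at_cutoff(scores, died, cutoff):
    """Diagnostic accuracy when score > cutoff is treated as a positive test."""
    scores = np.asarray(scores, dtype=float)
    died = np.asarray(died, dtype=bool)
    positive = scores > cutoff
    tp = int(np.sum(positive & died))
    fp = int(np.sum(positive & ~died))
    fn = int(np.sum(~positive & died))
    tn = int(np.sum(~positive & ~died))
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sens, 1 - spec),
        "lr_neg": _ratio(1 - sens, spec),
        "youden": sens + spec - 1,
    }


def best_youden_cutoff(scores, died):
    """Observed score value that maximises sensitivity + specificity - 1."""
    candidates = np.unique(np.asarray(scores, dtype=float))
    return float(max(candidates,
                     key=lambda c: accuracy_at_cutoff(scores, died, c)["youden"]))


if __name__ == "__main__":
    toy_scores = [1, 2, 3, 4, 5, 6]  # made-up score values
    toy_died = [0, 0, 0, 1, 1, 1]    # made-up outcomes
    print(best_youden_cutoff(toy_scores, toy_died))
    print(accuracy_at_cutoff(toy_scores, toy_died, cutoff=4))
```

Because the Youden index weights sensitivity and specificity equally, the cut-point it selects is a statistical optimum, not necessarily the clinically preferred one.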

Study population
A total of 15,830 patients visited the ED during August 2018-July 2019. Of these, 1927 (12.2%) patients had suspected sepsis, and 305 were excluded because they had been treated and transferred from other units. Consequently, 1622 patients with suspected sepsis were included in the final analysis. Of these, 457 (28.2%) met the primary outcome of all-cause in-hospital mortality, and 280 (17.3%) died within 7 days of admission. A total of 1382 (85.2%) were diagnosed with sepsis at hospital disposition according to the Sepsis-3 definition. Patient characteristics are shown in Table 1. The study population's mean age ± SD was 72.6 ± 15.4 years, and 51.1% were female. Patients who died in hospital were older, had a greater prevalence of neoplasia, and more often had a history of recent hospital admission. They also had significantly more abnormal initial vital signs, higher serum white blood cell and band-form cell counts, and higher rates of positive hemoculture, inotropic drug prescription and ICU admission than patients discharged alive.

Scoring systems
None of the 1622 patients with suspected sepsis had missing EWS values. Mean values of all scores except SIRS were significantly higher in those who died (Table 1). qSOFA and REMS showed a clear association with both mortality outcomes whereas higher SIRS did not, and the all-cause in-hospital mortality rate did not increase between NEWS 0-5 and 6-7 (Fig. 1 and Fig. S1). Distributions of scores among the cohort are shown in Fig. 2.

Score performance
Overall performance assessed by scaled Brier score and Nagelkerke's R squared showed that REMS had the best performance [...] (Fig. 3). All EWSs had better discrimination by AUROC for all-cause mortality within 7 days of admission than for all-cause in-hospital mortality, although the trend of the AUROC results was similar (Table 2 and Fig. 3). In pairwise comparisons between EWSs, REMS had significantly better discrimination than all other EWSs except NEWS for both outcomes (Table 3). In subgroup analyses, all EWSs showed better discrimination for all-cause in-hospital mortality and mortality within 7 days in those aged 70 years or older than in those aged less than 70 years (Table S2). All EWSs showed better discrimination for all-cause mortality in those without chronic comorbidities compared with those with at least one chronic comorbidity, but an opposite trend was seen for all-cause mortality within 7 days of admission for SIRS and NEWS (Table S2). Calibration for SIRS showed underestimation of predicted mortality risk at the lowest and highest SIRS scores (Fig. 4 and S2). The other EWSs tended to be well-calibrated except for underestimation of all-cause in-hospital mortality risk at high predicted probabilities for NEWS, and of all-cause mortality within 7 days of admission for both NEWS and REMS (Fig. 4 and S2). However, only a few patients had very high NEWS and REMS scores (Fig. 2).

Additional contribution of EWSs to baseline mortality risk model
All baseline risk models plus an EWS had significantly better discrimination than the baseline risk model for all-cause in-hospital mortality (Table S3). For all-cause in-hospital mortality, the baseline risk model plus REMS showed the greatest improvement in discrimination over the baseline mortality risk model, followed by NEWS, qSOFA, and SIRS (Table S4 and Fig. S3). NEWS and REMS had significantly better discrimination than SIRS and qSOFA, but REMS was not significantly superior to NEWS (Table S3). The trend of results was generally similar for all-cause mortality within 7 days of admission except that NEWS did not have significantly better discrimination than qSOFA for both mortality outcomes (Table S3 and S4 and Fig. S3). Integrated discrimination improvement also showed that REMS had the greatest improvement over the baseline risk models, followed by NEWS, qSOFA, and SIRS (Table S4). REMS showed the greatest percentage improvement in sensitivity for all-cause in-hospital mortality and all-cause mortality within 7 days of admission compared with the baseline risk model (Table S4). Calibration plots for the baseline risk model plus EWSs for both mortality outcomes were generally well-calibrated up to a predicted probability of 0.5, except for SIRS for all-cause in-hospital mortality. Above a predicted probability of 0.5, some models showed some over- or under-estimation of mortality risk (Figs. S4 and S5).
The clinical usefulness of the EWSs was assessed by sensitivity, specificity, PPV, NPV, LR+, and LR- (Table 2). For all-cause in-hospital mortality, SIRS > 2 had the highest sensitivity but the lowest specificity. qSOFA > 2 had the highest specificity but the lowest sensitivity. At the optimal Youden index cut-points, NEWS > 8 and REMS > 9 had a balance of sensitivity and specificity that favored sensitivity. PPV and NPV were similar for all EWSs except for a much lower PPV of SIRS. qSOFA had the highest LR+, while the LR- of all EWSs were similar. For all-cause mortality within 7 days of admission, results of sensitivity and specificity were similar except that the optimal cut-off point for REMS was > 10, and results for PPV, NPV, LR+, and LR- were similar to the primary outcome.
In subgroup analysis for all-cause in-hospital mortality, results were generally similar to the full cohort except that REMS > 9 had higher specificity in those aged less than 70 years and higher sensitivity in those aged 70 years or older (Table S5). Subgroup analysis of all-cause mortality within 7 days of admission showed changes in sensitivity and specificity for REMS > 10 similar to those for all-cause in-hospital mortality (Table S5).
The NB analysis for all-cause in-hospital mortality showed that SIRS and qSOFA did not have an advantage over a treat-all strategy at any plausible threshold probability. The range of threshold probabilities over which there was any NB advantage over a treat-all strategy was 18-20% for NEWS and 14-20% for REMS. The number of avoided interventions per 100 patients at a threshold probability of 20% (NWT 5) using NEWS or REMS would be 1.1 and 2.6, respectively (Fig. 5). For all-cause mortality within 7 days of admission, all EWSs showed an NB advantage over a treat-all strategy within the plausible threshold probability range. NEWS and REMS had the lowest threshold probabilities at which an advantage over a treat-all strategy began, at 10% (NWT 10) and 4% (NWT 25), respectively. The number of avoided interventions per 100 patients in a hypothetical population at threshold probabilities of 10 and 20% would be 2.4 and 23, respectively, for NEWS and 2.4 and 25, respectively, for REMS (Fig. 5). Results of the NB analysis for the baseline risk model plus EWS were similar for both mortality outcomes (Fig. S6).
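The link between net benefit, NWT and avoided interventions can be written out as a short worked equation. It uses the standard decision curve analysis conversion; the symbols p_t (threshold probability) and ΔNB (net benefit of the EWS minus net benefit of treat-all) are notation introduced here for illustration, and the last line simply back-calculates the net benefit difference implied by the reported 2.6 avoided interventions for REMS.

```latex
\begin{align*}
\text{NWT} &= \frac{1}{p_t}:
  \qquad p_t = 0.04 \Rightarrow 25, \quad
         p_t = 0.10 \Rightarrow 10, \quad
         p_t = 0.20 \Rightarrow 5 \\[4pt]
\text{avoided interventions per 100 patients}
  &= \frac{\Delta\mathrm{NB}}{p_t / (1 - p_t)} \times 100 \\[4pt]
\text{at } p_t = 0.20:\quad
  2.6 &= \frac{\Delta\mathrm{NB}}{0.25} \times 100
  \;\Rightarrow\; \Delta\mathrm{NB} = 0.0065
\end{align*}
```

Small differences in net benefit therefore translate into modest numbers of avoided interventions at high thresholds, because the divisor p_t / (1 - p_t) grows with the threshold.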

Discussion
We validated REMS, SIRS, qSOFA, and NEWS as EWSs for patients with suspected sepsis in the ED. We found that REMS and NEWS performed better than qSOFA and SIRS in predicting adverse outcomes of patients with suspected sepsis in the ED. REMS had not previously been validated for this purpose; it had the highest discrimination and the highest clinical utility assessed by NB among all the studied EWSs and was well-calibrated.
Identifying patients with life-threatening infection early in the ED is very important. Earlier recognition can lead to earlier initiation of effective and appropriate management. Although it was developed as a component of the sepsis definition, SIRS was the first scoring system adapted for this purpose. However, it has been criticized for its low specificity [8,9]. Similarly, in our study SIRS yielded the highest sensitivity, but its sensitivity might have been too high and its specificity not high enough. Over 80% of all patients with suspected sepsis met the SIRS cutoff of 2, but only 29.5% of them had adverse outcomes, while its AUROC showed it to be a poor classifier, which may be worse than a random classifier. Furthermore, SIRS could not demonstrate a higher NB than the treat-all strategy for the in-hospital mortality outcome, similar to a previous study [26], and had the lowest NB for the outcome of mortality within 7 days of admission.
qSOFA was introduced by the Society of Critical Care Medicine and the European Society of Intensive Care Medicine task force in 2016 to prompt physicians to suspect sepsis outside the ICU because it could predict adverse outcomes more accurately than SIRS. However, qSOFA has been shown to have extremely high specificity at a cost of low sensitivity, especially in the ED, where sepsis suspicion usually begins. Our results were also concordant with previous studies [26][27][28], with qSOFA having a sensitivity below 50%. This could be explained by its low ability to detect sepsis early in the disease course. It also did not demonstrate any NB advantage over a treat-all strategy for in-hospital mortality. Thus, qSOFA may not be an appropriate tool to detect early sepsis at ED presentation.
Table 3 Pairwise comparisons of area under the receiver operating characteristic curve of early warning scores for in-hospital mortality and mortality within 7 days among patients with suspected sepsis
Our study found that general EWSs performed better than scores developed for sepsis in predicting adverse outcomes associated with sepsis. This could have been because these EWSs have weighted score points and incorporate more physiological components than SIRS and qSOFA. NEWS is widely used to identify clinically deteriorating patients, and it has been shown to have better accuracy both in wards and in the ED [16][17][18][19]. Similar to previous studies [16,17], we found that NEWS without the baseline risk model could outperform SIRS and qSOFA in AUROC. It also showed higher NB than qSOFA and SIRS over a narrow range of threshold probabilities.
Fig. 4 Calibration plots of early warning scores for all-cause in-hospital mortality in patients with suspected sepsis. a SIRS criteria. b qSOFA score. c NEWS score. d REMS score. Hollow circles denote groups of predicted risk. Vertical lines through hollow circles denote 95% confidence intervals. The distribution of non-events of the outcome (0) and events of the outcome (1) by expected probability is denoted by the rug plot (light grey) along the x axis. Abbreviations: SIRS, systemic inflammatory response syndrome; qSOFA, quick Sequential Organ Failure Assessment; NEWS, National Early Warning Score; REMS, Rapid Emergency Medicine Score
Interestingly, the EWS with the best accuracy for predicting adverse outcomes in our study was REMS, which was originally developed to predict mortality in general ED patients with non-specific non-surgical conditions. It had never been validated for predicting mortality outcomes after sepsis suspicion in the ED, and our study is the first to validate REMS for this purpose. We found that REMS performed the best over a wide range of statistical analytic methods. The AUROCs of REMS alone and REMS with the baseline risk model were the highest among all the EWSs. The score values were well-calibrated and associated with mortality outcomes. Moreover, it was the EWS with the highest NB over the widest range of threshold probabilities and the highest number of avoidable interventions per 100 patients at any particular threshold probability of interest. The superiority of REMS over other EWSs might have been because age is a component of REMS. Older patients might have had a higher risk of death secondary to sepsis. Subgroup analyses of patients older than 70 years also showed that all EWSs had better diagnostic and discrimination capacity for both mortality outcomes than in younger patients. REMS was also the best EWS in both subgroups of patients with and without comorbidities. However, we found an opposite trend of AUROCs between the two mortality outcomes in the subgroup analyses comparing patients with at least one comorbidity and those without comorbidities. This inconsistency might have been due to the small number of participants in the subgroup without comorbidities. In fact, this interpretation should be treated with caution as some of the subgroups contained < 100 outcome events, which might have been too few for an external validation study.
At the recommended cut-point or the best cut-point according to the Youden index, no early warning score had both high sensitivity and high specificity. However, given their wide score ranges, our exploratory analyses of all cut-points of each scoring system revealed that both good rule-in and rule-out properties could be achieved by using two cut-points. With one lower cut-point, more patients could be discharged with a lower false negative rate, and with another higher cut-point, more patients could be accurately diagnosed with a lower false positive rate. Although these more complex EWSs may be less convenient as a triage tool than the more easily calculated SIRS and qSOFA, their use could be feasible with the increasing adoption of electronic medical records that can provide automated score calculation as part of the triage process. Nonetheless, it is important to note that the overall prognostic accuracy of all scores was not high enough for them to be used without regard to clinical signs and symptoms. They should always be used in conjunction with clinical correlation.
Another issue in validating the performance of EWSs for sepsis is that outcome definitions differ, which affects results and interpretations. In-hospital and 30-day mortality are commonly used to provide sufficient numbers of observations for analyses. However, they may depend on health care service provisions and societal preferences/resources for patients and thus may not truly represent mortality associated with sepsis. Patients in middle-income countries, unlike those in high-income countries with hospice facilities and home care, may need to receive their end-of-life care in a hospital. This is supported by the much lower validation AUROCs in our study compared with other studies from higher-income countries, which shows poor transportability of their findings to our setting [16][17][18][19][20]. Moreover, hospital length of stay ranged up to 310 days and the median time to mortality in our cohort was 27 days (data not shown), indicating prolonged hospital stays that may have been due to the lack of hospice service provision in Thailand. Therefore, mortality within 7 days of admission may be a more valid outcome definition for sepsis-related mortality in our setting and other similar settings. It might also be the outcome definition most generalizable to dissimilar settings. NEWS scores were better associated with 7-day mortality than with in-hospital mortality. In addition, REMS had an NB performance that makes it a more suitable EWS for decision-making, with an NWT as low as 25 and overall superiority over a treat-all strategy. With the in-hospital mortality outcome, there is little clinical utility except for a narrow group of patients in terms of harm/cost-to-benefit ratio.

Limitations
There were several limitations to this study. First, it was conducted in a single tertiary care center in a middle-income country, which may limit the generalizability of the study. Second, we did not have a standardized criterion for sepsis suspicion. Physicians usually based their decisions on the patient's clinical picture, combined with either qSOFA or SIRS. Moreover, we only included patients suspected of sepsis in the ED and not patients in whom sepsis was initially missed and diagnosed only after admission. Including those patients and defining a clearer sepsis suspicion criterion may better represent the true prognostic value of early warning scores. Another limitation is that we only measured one EWS value closest to the time of sepsis suspicion. Using the highest score from repeated measures may improve the accuracy of scoring systems but may not represent the real clinical situation in the ED, where treatment decisions usually begin soon after ED arrival.

Conclusion
REMS was an early warning score with higher accuracy than sepsis-related scores (qSOFA and SIRS), similar to NEWS. It also had the highest utility in terms of net benefit compared to SIRS, qSOFA and NEWS in predicting in-hospital mortality in patients presenting to the ED with suspected sepsis.