The effectiveness of physiologically based early warning or track and trigger systems after triage in adult patients presenting to emergency departments: a systematic review

Background Changes to physiological parameters precede deterioration of ill patients. Early warning and track and trigger systems (TTS) use routine physiological measurements with pre-specified thresholds to identify deteriorating patients and trigger appropriate and timely escalation of care. Patients presenting to the emergency department (ED) are undiagnosed, undifferentiated and of varying acuity, yet the effectiveness and cost-effectiveness of using early warning systems and TTS in this setting are unclear. We aimed to systematically review the evidence on the use, development/validation, clinical effectiveness and cost-effectiveness of physiologically based early warning systems and TTS for the detection of deterioration in adult patients presenting to EDs.
Methods We searched for any study design in scientific databases and grey literature resources up to March 2016. Two reviewers independently screened results and conducted quality assessment. One reviewer extracted data with independent verification of 50% by a second reviewer. Only information available in English was included. Due to the heterogeneity of reporting across studies, results were synthesised narratively and in evidence tables.
Results We identified 6397 citations of which 47 studies and 1 clinical trial registration were included. Although early warning systems are increasingly used in EDs, compliance varies. One non-randomised controlled trial found that using an early warning system in the ED may lead to a change in patient management but may not reduce adverse events; however, this is uncertain, considering the very low quality of evidence. Twenty-eight different early warning systems were developed/validated in 36 studies. There is relatively good evidence on the predictive ability of certain early warning systems on mortality and ICU/hospital admission. No health economic data were identified.
Conclusions Early warning systems seem to predict adverse outcomes in adult patients of varying acuity presenting to the ED but there is a lack of high quality comparative studies to examine the effect of using early warning systems on patient outcomes. Such studies should include health economics assessments.
Electronic supplementary material The online version of this article (10.1186/s12873-017-0148-z) contains supplementary material, which is available to authorized users.


Background
Serious clinical adverse events are related to physiological abnormalities, and changes in physiological parameters, such as blood pressure, pulse rate, temperature, respiratory rate and level of consciousness, often precede the deterioration of patients [1][2][3][4]. Early intervention may improve patient outcomes, whereas failure to recognise acute deterioration in patients may lead to increased morbidity and mortality [5,6]. Early warning systems and track and trigger systems (TTS) use routine physiological measurements to generate a score with pre-specified alert thresholds. Their aim is to identify patients at risk of deterioration early and trigger appropriate and timely responses known as escalation of care.
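To make the idea of a score with pre-specified alert thresholds concrete, the following is a minimal sketch of an aggregate weighted score. The parameter names, the bands, the points and the alert threshold are all hypothetical placeholders chosen for illustration; they are not taken from MEWS, NEWS or any other validated system and must not be used clinically.

```python
import math

# Hypothetical bands per parameter: (lower bound inclusive, upper bound exclusive, points).
# Placeholder values for illustration only; not from any validated early warning score.
ILLUSTRATIVE_BANDS = {
    "respiratory_rate": [(0, 9, 2), (9, 21, 0), (21, 30, 2), (30, math.inf, 3)],
    "pulse_rate": [(0, 40, 2), (40, 101, 0), (101, 130, 2), (130, math.inf, 3)],
    "systolic_bp": [(0, 80, 3), (80, 100, 1), (100, 200, 0), (200, math.inf, 2)],
    "temperature": [(0, 35.0, 2), (35.0, 38.5, 0), (38.5, math.inf, 2)],
}

# Hypothetical pre-specified alert threshold for the aggregate score.
ALERT_THRESHOLD = 4


def parameter_points(name, value):
    """Return the points assigned to one measurement under the illustrative bands."""
    for low, high, points in ILLUSTRATIVE_BANDS[name]:
        if low <= value < high:
            return points
    raise ValueError(f"{name}={value} falls outside all defined bands")


def aggregate_score(observations):
    """Sum the points over all measured parameters (an aggregate weighted score)."""
    return sum(parameter_points(name, value) for name, value in observations.items())


def should_escalate(observations):
    """Trigger escalation of care when the aggregate score reaches the threshold."""
    return aggregate_score(observations) >= ALERT_THRESHOLD


stable = {"respiratory_rate": 16, "pulse_rate": 80, "systolic_bp": 120, "temperature": 37.0}
unwell = {"respiratory_rate": 32, "pulse_rate": 135, "systolic_bp": 85, "temperature": 39.0}
print(aggregate_score(stable), should_escalate(stable))
print(aggregate_score(unwell), should_escalate(unwell))
```

In this sketch each vital sign contributes points according to how far it lies from an assumed normal range, and escalation is triggered only by the total, which is the defining feature of an aggregate weighted score.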
Early warning systems are used increasingly in acute care settings and several countries have developed National Early Warning Scores (NEWS). In Ireland, the National Clinical Guideline on the use of NEWS for adult patients came into effect in 2013 [7]. In the UK, The Royal College of Physicians (RCoP) published a National Early Warning Score in 2012 [8], and the National Institute for Health and Care Excellence (NICE) recommends the use of a TTS to monitor hospital patients [9]. In Australia, the Early Recognition of Deteriorating Patient Program introduced a TTS [10]. Similarly, in the USA, Rapid Response Systems with fixed "Calling Criteria" are recommended to trigger adequate medical response [11].
Many acutely ill patients first present to the emergency department (ED). The ED is a complex environment, distinctly different from other hospital departments. Visits are unscheduled and patients attend with undiagnosed, undifferentiated conditions of varying acuity. Medical staff must care for several patients simultaneously, deal with constantly shifting priorities and respond to multiple demands due to the unpredictable nature of the ED environment [12,13]. Initial triage determines the priority of patients' treatments but following triage, continuous monitoring and prompt recognition of deteriorating patients is crucial to escalate care appropriately. Early warning systems are sometimes used as an adjunct to triage for early identification of deterioration in the ED, particularly in situations of crowding [14]. Common early warning systems such as the Modified Early Warning Score (MEWS) [15] are used frequently and validated against specific subgroups of patients (e.g. acute renal failure, myocardial infarction, etc.) but may not be directly transferable to an ED setting [14] where patients present with a variety of unspecified conditions. There was an urgent need to evaluate the use of early warning systems and TTS in the ED.
The review addressed five objectives:
1. To describe the use, including the extent of use, the variety of systems in use, and compliance with systems used, of physiologically based early warning systems or TTS for the detection of deterioration in adult patients presenting to the ED;
2. To evaluate the clinical effectiveness of physiologically based early warning systems or TTS in adult patients presenting to the ED;
3. To describe the development and validation of such systems;
4. To evaluate the cost effectiveness, cost impact and resources involved in such systems;
5. To describe the education programmes, including the evaluation of such programmes, established to train staff in the delivery of such systems.

Study design & scope
We conducted a systematic review, which we report according to the PRISMA guidelines [16]. The scope is presented in Table 1.

Study selection & extraction
Two reviewers (FW, and PM or SD) independently screened the titles/abstracts. For additional resources, the information specialist (AC) sifted through the search results for potentially eligible studies. Full text reports from databases and additional resources were assessed for inclusion by two reviewers independently (FW, PM) and discrepancies were resolved by discussion or by involving a third person (DD). Data extraction forms were designed for each of the six types of studies. Data extraction was completed by two reviewers (FW, PM). Each reviewer extracted data from half of the included reports and 50% of entries were checked by a second reviewer for accuracy. The data elements that were extracted are available in Additional file 2. Two reviewers (FW, and VS or DD) independently assessed the Risk of Bias (ROB)/methodological quality of the included reports, using the instruments listed in Table 2.

Data analysis
Data were summarised in evidence tables and synthesised narratively for use of warning systems, compliance, effects of systems on patient outcomes, development and validation of systems, and cost-effectiveness studies. For the effects of systems on patient outcomes, a meta-analysis was planned but was not performed due to the limited number of studies (n = 1). For validation studies, we provided results for the AUROC (area under the receiver operating characteristic curve) [17]. The AUROC equals 1 for a perfect test and 0.5 for a completely uninformative test. For health economics studies, we planned to examine the cost-effectiveness but no such studies were identified. The GRADE (Grades of Recommendation, Assessment, Development and Evaluation) approach was used to assess the certainty of the body of evidence for effects of systems on patient outcomes.
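As an illustration of how the AUROC relates a score to an outcome, the sketch below computes it as the proportion of (patient with outcome, patient without outcome) pairs in which the patient with the outcome has the higher score, counting ties as one half. This is one standard way to compute the AUROC and is not necessarily the method used in the included studies; the function name and the toy data are assumptions for illustration.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores:   early warning scores, one per patient
    outcomes: 1 if the reference outcome occurred (e.g. death), else 0
    """
    with_outcome = [s for s, y in zip(scores, outcomes) if y == 1]
    without_outcome = [s for s, y in zip(scores, outcomes) if y == 0]
    if not with_outcome or not without_outcome:
        # The AUROC is undefined unless both groups are present.
        raise ValueError("need patients both with and without the outcome")

    concordant = 0.0
    for pos in with_outcome:
        for neg in without_outcome:
            if pos > neg:
                concordant += 1.0   # score ranks the pair correctly
            elif pos == neg:
                concordant += 0.5   # ties count as half
    return concordant / (len(with_outcome) * len(without_outcome))


# Toy data (hypothetical): higher scores always belong to patients with the outcome.
print(auroc([1, 2, 5, 6], [0, 0, 1, 1]))  # 1.0, a perfect test
# Identical scores carry no information about the outcome.
print(auroc([3, 3, 3, 3], [0, 1, 0, 1]))  # 0.5, an uninformative test
```

The guard clause mirrors the requirement that a validation sample contain patients both with and without the reference outcome.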

Results
A total of 6397 records were identified. After removal of duplicates, 1147 database records were screened by title/abstract. Full texts of 83 records were assessed of which 43 studies (44 records) were included. The most common reason for exclusion was 'non ED setting' (n = 24). One study in Chinese was identified but the abstract was in English and presented relevant data that we included [18]. Five studies of the 56 screened additional resources were included. The results of the search/selection are presented in Fig. 1.

Table 1 Scope of the review (excerpt)
▪ Use of healthcare resources associated with early warning systems or TTS use, including direct medical resource costs (staff time, education time and cost, additional referrals), indirect costs (associated with loss of productivity) and other non-medical costs (e.g. patient out-of-pocket expenses)
▪ Cost savings, cost effectiveness measures such as Incremental Cost-Effectiveness Ratios (ICERs), Quality Adjusted Life Years (QALYs)
• Types of education programmes
• Strategies and methods to evaluate education programmes of early warning systems or TTS
The following six types of studies were included:
a. Descriptive studies (types and use of systems): Studies that described types or variety of early warning systems or TTS used and the extent to which they were used in clinical practice.
b. Descriptive studies (compliance): Studies that described compliance with early warning systems or TTS in clinical practice.
c. Descriptive studies (education programmes): Studies that described education programmes to train healthcare professionals in delivering early warning systems or TTS.
d. Effectiveness studies: Studies that examined the effectiveness of an early warning system or TTS on outcomes for adults admitted to the ED, following triage and that had a controlled design (i.e., RCTs, non-RCTs, controlled before-and-after studies, interrupted time series designs and cohort studies with historical controls). Studies that evaluated the effects of the system on relevant outcomes without control (e.g. case series, cohort studies without historical control) were included in the descriptive category.
e. Development and validation studies: Development studies were defined as studies that focused on the development of early warning systems or TTS while validation studies assessed the predictive ability of such systems. Studies in this category needed to include adult patients both with and without the reference outcome (such as admission to intensive care or mortality) or were otherwise considered a descriptive study. For the purpose of classification, we regarded studies as 'development' studies if reference ranges, parameters, and/or design of scoring systems were identified based on the outcomes of the study sample (for example, through the use of receiver operating characteristics [ROC] curves). In validation studies, such reference criteria were already determined and their predictive ability was evaluated in a new sample of patients.
f. Health economics: Full economic evaluation studies (cost-effectiveness analysis, cost-utility analysis and cost-benefit analysis), cost analysis and comparative resource use studies comparing early warning systems or TTS to one or more standard treatments. These may have included any study that met the eligibility criteria for the review of effectiveness; hence studies in other categories might also have been included here.
Extent of use and compliance with early warning systems and track and trigger systems (1)
Four studies described the use of early warning systems within the ED and five studies examined compliance.
The studies examining the extent of use collected data from medical records [19], a survey [20], a web-survey [21], and through participatory action research [22]. Considine et al. [19] described a pilot study of a 4-parameter system in the ED of a hospital in Australia and found that nurses made 93.1% of activations, the most common reasons being respiratory (25%) and cardiac (22.5%). The median time between documenting physiological abnormalities and ED early warning system activation was 5 min (range 0-20). A survey in 2012 of 145 (57% response rate) clinical leads of EDs in the UK showed that 71% used an early warning system, most commonly the MEWS (80%) [20]. A survey in seven jurisdictions in Australia found that 20 of 220 hospitals had a formal rapid response system in the ED but the prevalence of early warning systems in EDs was not reported [21]. Coughlan et al. [22] reported insufficient information in a conference abstract. The findings of these four studies demonstrate that multiple early warning systems are available and the extent of their use in the ED may vary geographically, but limited data preclude comparisons between countries.
Three retrospective studies [23][24][25], one prospective study [27] and one audit (before and after early warning system implementation) [26] examined compliance with recording early warning system parameters. There was large variation in compliance, ranging from 7% to 66%. Factors such as patients' triage category, age, gender, number of medications, length of hospital stay and the level of crowding in the ED affected compliance with early warning systems [24]. Christensen et al. [23] reported a rate of 7% (22/300) of calculated scores in the clinical notes; however, 16% of records included all five vital signs. Heart rate (HR), shortness of breath (SOB) and loss of consciousness (LOC) were reported in 90-95% of records. Compliance with escalation of care varied; all nine patients that met the trauma call activation criteria had triggered a trauma call but only 24 of the 48 emergency call activation criteria had been responded to. Austen et al. [25] found higher compliance, with 66% of records containing an aggregate score, although only 72.6% were accurate. In an audit, the pre-implementation rate (30%) of abnormal vital sign identification was significantly lower than the post-implementation rate (53.5%) (p = 0.007) but no details of the implementation strategy were described [26]. Wilson et al. [27] compared the TTS scores recorded in charts with scores calculated retrospectively and found that 60.6% of charts contained at least one calculated TTS score but 20.6% (n = 211) were incorrect. This was mainly because of incorrect assignment of the score to an individual vital sign, which led to underscoring and reduced escalation activation. Hudson et al. [26] found that using a standardised emergency activation chart resulted in a higher percentage of abnormal vital sign recording (p = 0.007).
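A chart audit of the kind described above, comparing the score recorded in the chart with a score recalculated retrospectively, can be sketched as follows. The function name, the data layout and the choice of denominators are assumptions made for illustration; the included studies did not all report their denominators, so this is not a reconstruction of any one study's method.

```python
def audit_recorded_scores(charts):
    """Summarise compliance and accuracy of recorded early warning scores.

    charts: list of (recorded_score, recalculated_score) pairs, where
            recorded_score is None if no score was written in the chart.
    """
    total = len(charts)
    recorded = [(rec, calc) for rec, calc in charts if rec is not None]
    incorrect = [(rec, calc) for rec, calc in recorded if rec != calc]
    # Underscoring: the chart shows a lower score than the recalculated one,
    # which could delay or prevent escalation of care.
    underscored = [(rec, calc) for rec, calc in incorrect if rec < calc]

    return {
        # Assumed denominator: all audited charts.
        "proportion_with_score": len(recorded) / total,
        # Assumed denominator: charts that contain a recorded score.
        "proportion_incorrect": len(incorrect) / len(recorded) if recorded else 0.0,
        "proportion_underscored": len(underscored) / len(recorded) if recorded else 0.0,
    }


# Hypothetical toy audit of five charts.
toy_charts = [(3, 3), (2, 4), (None, 5), (5, 5), (1, 0)]
summary = audit_recorded_scores(toy_charts)
print(summary)
```

Separating "no score recorded" from "score recorded but wrong" matters because the two failures call for different remedies: the first is a documentation problem, the second a calculation problem.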

Effects of early warning systems and track and trigger systems (2)
One non-randomised controlled study compared the effect of the MEWS (n = 269), recorded by emergency nurses every four hours, with clinical judgement (n = 275) in patients who were waiting for in-patient beds in the ED of a large hospital in Hong Kong [28]. It found that the MEWS might increase the rate of activating a critical pathway (1 in 10 patients with a MEWS >4 versus 1 in 20 patients based on clinical judgement) but might make little or no difference to the detection of deterioration or adverse events (0.4% in both groups). We assessed the overall body of evidence as very low quality (GRADE) due to serious imprecision and high ROB (Additional file 3).

Development & Validation studies of early warning systems and track and trigger systems (3)
A scoping review by Challen et al. [64] identified 119 tools related to outcome prediction in the ED; however, the majority were condition-specific tools (n = 94). They found the APACHE II score to have the highest reported AUROC (0.984) in patients with peritonitis.
Of the 36 primary development and/or validation studies, 13 were retrospective, 22 were prospective studies and one was a secondary analysis of a Randomised Controlled Trial (RCT) [48]. Eight studies developed and validated (in the same sample) an early warning system, while 28 validated an existing system in a different sample. Three studies included a random sample [30,39,43] and participants in the remaining studies were recruited consecutively or the sampling strategy was not stated clearly.
A total of 28 early warning systems were developed and/or validated. Churpek et al. [65] classified early warning systems into single-parameter systems, multiple-parameter systems and aggregate weighted scores. The early warning systems examined in the studies included primarily aggregate weighted scores (Table 3).
The most common outcomes examined were in-hospital mortality (n = 21), admission to ICU (n = 12), mortality (not specified where or during a specific follow up time frame possibly beyond hospital discharge) (n = 11), hospital admission (n = 7), and length of hospital stay (n = 5). Only one study measured the number of patients identified as critically ill as an outcome [50]. Overall, the APACHE II score, PEDS, VIEWS-L, and THERM scores appeared relatively better at predicting mortality and ICU admission. The MEWS was the most commonly assessed tool and the cut-off value used was 4 or 5, with the exception of Dundar et al. [41] who found an optimal cut-off of 3 for predicting hospitalisation. To synthesise the findings, studies were categorised into three groups according to the degree of differentiation of the ED patient group: a patient group in one or more specific triage categories, a patient group with a certain (suspected) condition or an undifferentiated patient group. Findings are presented in Tables 4, 5 and 6 and full details are provided in Additional file 4.
We did not identify studies that examined the cost effectiveness of early warning systems or TTS in EDs, nor did we find any studies evaluating related educational programmes (objectives (4) and (5)).

Discussion
Multiple early warning systems were identified but, in the nine included descriptive studies, the extent to which they are used in the ED seems to vary across the countries for which data were available. Moreover, incorrect score calculation was common. Compliance with recording aggregate scores was relatively low although the vital signs HR and BP were usually recorded. This finding emphasises the importance of effective implementation strategies. However, we did not identify any studies examining educational programmes for early warning systems. Existing guidelines regarding the use of early warning systems to monitor acute patients in hospital do include educational tools but are not specific to the ED [7,8]. Using early warning systems in the ED would likely require contextual adaptation to the ED environment, for example broadening of the ranges of physiological parameters to reflect acutely unwell patients' physiology. In implementing an early warning system in the ED, staff training could consist of a joint core package applicable to any service supplemented by an ED specific component. The performance of early warning systems in the ED will also depend on the time patients spend in the ED, which varies substantially between countries.
Evidence from 36 validation and development studies demonstrated that early warning systems used in ED settings seem to be able to predict adverse outcomes, based on the AUROC, but there is variability between studies. All but two early warning systems were aggregated scores, which limited the ability to compare between single, multiple parameter and aggregated scores. The APACHE II score, PEDS, VIEWS-L, and THERM scores were relatively best at predicting mortality and ICU admission, providing excellent discrimination ability (AUROC >0.8) [66].

Table 3 Early warning systems examined in the included studies
[...] [39]
Emergency severity index (ESI) [32]
Acute Physiology and Chronic Health Evaluation score (APACHE II) [31,33,52,59]
Assessment Score for Sick patient Identification and Step-up in Treatment (ASSIST) [50]
Bispebjerg EWS (BEWS) [30]
Charlson comorbidity index (CCI) [32,38,60]
Early Warning Score (EWS) [55]
Logistic Organ Dysfunction System (LODS) [48]
Mainz Emergency Evaluation Score (MEES) [35]
Modified Early Warning Score (MEWS) [18,29,31,32,35-38,41-44,50,51,54,56-58,60,63]
MEWS plus [43]
Modified REMS (mREMS) [45]
Mortality Probability Model at admission (MPM0 II) [48]
National Early Warning Score (NEWS) [35,40,47,49,53]
National Early Warning Score including Lactate (NEWS-L) [47]
Patient Status Index (PSI) [61]
Predisposition, Insult/Infection, Response, and Organ dysfunction model (PIRO) [59]
Prince of Wales ED Score (PEDS) [31,35]
Rapid Acute Physiology Score (RAPS) [33,34]
Rapid Emergency Medicine Score (REMS) [29,31,33-35,37]
Revised Trauma Score (RTS) [31]
Sequential Organ Failure Assessment (SOFA) [52]
Simple Clinical Score (SCS) [35]
New Simplified Acute Physiology Score (SAPS II) [48,52]
The Resuscitation Management score (THERM) [35]
Triage Early Warning Score (TEWS) [62]
VitalPAC Early Warning Score (VIEWS) [41]
VitalPAC Early Warning Score-Lactate (VIEWS-L) [46]
No multiple parameter systems were identified
The MEWS was the most commonly assessed system but findings suggest a relatively lower ability to predict mortality and ICU admissions compared to the four scores mentioned above, with only some studies indicating acceptable discriminatory ability (AUROC >0.7) and other studies indicating a lack of discriminatory ability (AUROC <0.7) [66], especially for the outcome of ICU admission. The exception was one low ROB study that found excellent discriminatory ability of MEWS for the outcome in-hospital mortality (AUROC 0.89) [41]. This was the only study that examined the MEWS in an undifferentiated sample, which could contribute to this observed difference. However, the ability of early warning systems to predict adverse outcomes does not mean that they are effective at preventing adverse outcomes through early detection of deterioration.
Only one study addressed this question and it found that the introduction of an early warning system may make little or no difference to detecting deterioration or adverse events; however, the evidence was of very low quality, making it impossible to draw any strong conclusions. The effectiveness of early warning systems also depends heavily on an appropriate response to such systems. If effective, the role of early warning systems in the ED could primarily be to assist with patient and resource management in the post-triage phase, when the time for patients to see a treating clinician is prolonged (overcrowding). They could also provide additional information to help determine who to refer to critical care admission or to guide discharge from the ED, but this is currently not generally their purpose in places where they have been implemented in the ED. Recent studies also show that additional laboratory data (e.g. D-dimer, lactate) might enhance the performance of early warning systems in predicting adverse outcomes [67,68].
The cost effectiveness of early warning systems remains unclear. While implementing early warning systems clearly requires a healthcare resource investment, the degree to which such systems may or may not result in cost savings is unknown, particularly since the effectiveness of early warning systems in the ED is uncertain. The limited evidence base suggests that early warning systems might be effective in, for example, identifying deteriorating patients. This could result in improved patient outcomes and, should these effects exist, the potential healthcare cost savings could go towards funding, at least to some degree, their implementation. While this theory is open to question, it highlights the need to conduct primary research studies that directly evaluate their cost effectiveness. Such studies should focus on the monitoring of resource use, costs and patient outcomes in order to determine whether early warning systems are likely to deliver good value for money.

Limitations
We did not translate reports, although only one non-English study was identified. We could not pool findings of the validation studies due to clinical heterogeneity; however, AUROC values were provided to inform the accuracy of the models. Strengths of the review lie in its thorough search strategy, its scope and inclusion of different designs to best address the objectives and in its rigorous methodology with dual independent screening and quality assessment.

Conclusions
There is a lack of high quality RCTs examining the effects of using early warning systems in the ED on patient outcomes. The cost-effectiveness of such interventions, compliance, the effectiveness of related educational programmes and barriers and facilitators to implementation also need to be examined and reported, as presently there is a clear lack of such evidence.