Trauma scoring systems are widely used to rapidly determine injury severity in order to facilitate triage and prediction of prognosis [12]. The aim of this study was to evaluate the prognostic abilities of the simpler GAP and MGAP scores as compared to the more complex RTS in predicting trauma mortality in a resource-deficient emergency department in an LMIC hospital. Validating the discriminatory ability of these scores will enable future studies on their utility for both prospective triage and retrospective QI projects.
Thus far, the discussion has focused primarily on the strengths of the RTS, including its potential for maximizing time efficiency. However, the RTS has potential weaknesses that have hindered its implementation. Firstly, its use can result in false negatives in cases of severe injury confined to a single body area [15]. It additionally neglects the impaired bodily resilience associated with aging. For these reasons, researchers have advocated the other studied scores (MGAP and GAP) to address these shortcomings. As previously noted, the MGAP scoring system has been validated by a study in France [11]; such validation is critically important prior to application in clinical practice in order to prevent adverse outcomes [16]. Sartorius et al. expanded on this characterization by demonstrating that the MGAP system can clearly delineate the differences in mortality outcomes between low-, intermediate-, and high-risk groups, even more specifically than the Triage-Revised Trauma Score (T-RTS), RTS, and TRISS [11]. Like the TRISS, the MGAP score incorporates two components that distinguish it from the RTS. The first is the mechanism of trauma, which helps to cover the largest subset of false negatives produced by the RTS [17, 18]. The second is age, an important factor in predicting mortality, which is significantly higher among the elderly, who often have weakened adaptive responses [19, 20].
Another area of the trauma score discussion that requires further exploration (especially in low-resource settings) is resource allocation, as this can be a major hindrance to feasibility. An example is the New Trauma Score (NTS) introduced by Jeong et al. in 2017, which improved on the RTS by using peripheral oxygen saturation (SpO2) instead of the RR as well as modified point values for the GCS and SBP components [21]. Despite the predictive success of the NTS, however, the MGAP and GAP scores were found to be superior to it in more than one respect for application in low-resource settings. For instance, the NTS depends on measuring the patient’s SpO2; however, pulse oximeters are often not available upon initial presentation to the emergency department in such settings, being reserved for the ICU (if available). The RTS similarly suffers from a reliance on accurate measurement of a patient’s RR, which at the time of a trauma code may require similar equipment. Accordingly, the MGAP and GAP scores can be more feasibly and accurately calculated for trauma patients in modest-resource trauma centers and, critically, at the time of presentation rather than at a delayed timepoint [21]. Jeong et al. concluded that the NTS is better than the RTS but does not surpass the efficacy and efficiency of the MGAP and GAP scores.
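As a minimal sketch of why the MGAP and GAP scores suit calculation at presentation, the Python below computes both from only the GCS, SBP, age, and (for MGAP) mechanism of injury, with no RR or SpO2 input. The point values are an assumption not stated in this paper: they follow the commonly published definitions of the two scores (Sartorius et al. for MGAP, Kondo et al. for GAP) and should be checked against those sources before any use. The function names are hypothetical.

```python
def _sbp_points(sbp_mmhg: float, high: int, mid: int) -> int:
    """Points for systolic blood pressure: >120, 60-120, or <60 mmHg."""
    if sbp_mmhg > 120:
        return high
    if sbp_mmhg >= 60:
        return mid
    return 0


def _check_gcs(gcs: int) -> None:
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")


def mgap_score(gcs: int, sbp_mmhg: float, age_years: float, blunt: bool) -> int:
    """MGAP under the assumed point values (range 3-29, higher is better)."""
    _check_gcs(gcs)
    score = gcs                              # GCS contributes its raw value
    score += _sbp_points(sbp_mmhg, high=5, mid=3)
    score += 4 if blunt else 0               # blunt (vs penetrating) mechanism
    score += 5 if age_years < 60 else 0      # age under 60 years
    return score


def gap_score(gcs: int, sbp_mmhg: float, age_years: float) -> int:
    """GAP under the assumed point values (range 3-24, higher is better)."""
    _check_gcs(gcs)
    score = gcs
    score += _sbp_points(sbp_mmhg, high=6, mid=4)
    score += 3 if age_years < 60 else 0
    return score
```

For example, under these assumed point values a 30-year-old blunt trauma patient with a GCS of 15 and an SBP of 130 mmHg would receive the maximum score on both scales (29 for MGAP, 24 for GAP).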
Our findings were consistent with the literature. The patient population was representative of the international trauma epidemic, in which younger males are known to be disproportionately affected at 2–3 times the rate of females (Table 2). Patients who ultimately survived showed higher average RTS, MGAP, and GAP scores, reflecting more stable vital signs and likely better overall condition (Table 4). Most importantly, all three scores showed good AUROC values (0.881 for the RTS, 0.890 for the GAP score, and 0.879 for the MGAP score), demonstrating efficacy as predictive measures. No statistically significant differences were detected between the scores using DeLong’s test. These results were consistent with those reported previously by Ahun et al. and Jeong et al. [7, 21]. Importantly, these results were also consistent with a similar study in another LMIC setting (Mumbai, India), where the authors calculated AUROC values of 0.85, 0.85, and 0.84 for the RTS, GAP, and MGAP scores respectively, which may suggest broader applicability [22].
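The AUROC can be read as the probability that a randomly chosen survivor has a higher score than a randomly chosen non-survivor, with ties counted as one half. The sketch below illustrates that interpretation with a direct pairwise count. The scores in the usage example are invented for illustration and are not data from this study, and this is not the software used for the reported analysis.

```python
def auroc(survivor_scores, nonsurvivor_scores):
    """Pairwise (Mann-Whitney) estimate of the AUROC.

    Assumes higher scores indicate better prognosis, as with the
    RTS, GAP, and MGAP scores.
    """
    if not survivor_scores or not nonsurvivor_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in survivor_scores:
        for d in nonsurvivor_scores:
            if s > d:
                wins += 1.0      # survivor correctly ranked above non-survivor
            elif s == d:
                wins += 0.5      # a tie counts as half
    return wins / (len(survivor_scores) * len(nonsurvivor_scores))


# Invented toy scores, for illustration only.
toy_survivors = [24, 22, 19, 23, 17]
toy_nonsurvivors = [12, 18, 9]
toy_auc = auroc(toy_survivors, toy_nonsurvivors)  # 14 of 15 pairs ranked correctly
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups, which is the scale on which the reported values should be read.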
Evaluating the prevalence-dependent statistics generated several further points of discussion regarding clinical utility. The use of more liberal score cutoff values resulted in excellent negative predictive values (NPVs) above 95%, demonstrating the ability to rule out mortality in a low-resource setting. In particular, the MGAP score captured almost the entirety of the mortality subgroup, with a sensitivity of 94%. The more liberal cutoff values, however, presented a secondary issue: with positive predictive values (PPVs) of around 50%, a large quantity of resources may be diverted to a significant volume of survivors misclassified as high risk. A confounding factor here is that many of these “false positives” had lower scores because of significant physiological derangement and morbidity that required ultimately successful intervention in the ICU. Accordingly, this probably represents a desirable means of both ruling out severe outcomes and accurately identifying critically ill patients. In contrast, the use of more conservative cutoff scores optimized the predictive values of the scores (perhaps improving resource efficiency); however, this resulted in close to 50% of the mortality subgroup being missed as false negatives, which is especially problematic given the high mortality rate (18%) of the studied trauma population. Overall, given that it can be reasonably and accurately applied at presentation while maintaining the best sensitivity with comparable predictive values, the MGAP score could improve triage in LMICs with a view to reducing morbidity and mortality in a cost-effective manner. This is especially important in high-volume, low-resource environments, where many critically ill patients may be missed by physicians due to time and attention constraints.
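The dependence of predictive values on prevalence follows from Bayes’ theorem. The sketch below is illustrative only: the 94% sensitivity and 18% mortality rate are taken from the text, whereas the 80% specificity is a hypothetical value chosen for illustration and is not a result reported here. The function name is likewise hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and prevalence from the text; specificity is an assumed value.
ppv_study, npv_study = predictive_values(0.94, 0.80, 0.18)

# Same assumed test characteristics in a hypothetical lower-mortality cohort.
ppv_low, npv_low = predictive_values(0.94, 0.80, 0.05)
```

Under these assumptions the function returns a PPV of roughly 51% and an NPV of roughly 98%, in line with the pattern described above. Lowering the prevalence to 5% while holding sensitivity and specificity fixed drops the PPV to roughly 20%, which is why predictive values from one cohort may not transfer to settings with a different mortality rate.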
Despite the apparent potential for prognosis-based triage, the implementation of these scoring systems in LMIC trauma care should be approached cautiously, with frequent quality assessment. Many studies have highlighted the importance of the motor component of the GCS score for prognosis. However, the absence of specific GCS details in a large percentage of trauma records prevents the evaluation of this hypothesis [23]. Ultimately, the efficacy of such scoring systems for prospective use in triage would depend on the reliability of these specific components as well as the time at which the scores are determined. Ideally, such scores could be calculated in the pre-hospital stage; however, in locations similar to Egypt, this may not be feasible due to the lack of trained ambulance staff and of electronic communication with the hospital. Accordingly, these scores would likely need to be determined after the application of initial Advanced Trauma Life Support (ATLS) guidelines, which may limit their utility in guiding disposition, though they can still serve as an evidence-based measure to buttress care plans.
Additionally, some limitations of the study should be mentioned. Firstly, this was a retrospective study, which is inherently subject to more confounding variables than a prospective study, which can be more standardized. Specifically, the data analyzed are limited to those already recorded in the existing registry. Here, a considerable number of records (23% of the eligible cases) required exclusion as they lacked the data (GCS scores, vitals, etc.) needed to calculate the scores in the study. Of note, this figure was substantially lower than that of similar studies in other LMICs, where the RTS could not be calculated retrospectively in 65–98% of cases, likely reflecting local differences in documentation [3]. Even so, this exposes the study to a sampling bias since, for instance, the assessing physicians may have felt the need to record certain data points only if they were critical to the patient’s management. The lack of uniform guidelines for triaging patients in the hospital could also affect the quality of medical care delivered to the patient, potentially skewing mortality statistics.
Retrospective studies additionally preclude certain analyses. For example, the Kampala Trauma Score (KTS) has been proposed as a potential triage tool, showing efficacy as a discriminator of trauma mortality in other resource-limited settings [3]. However, the score relies on an assessment of neurological status that could not be determined retrospectively in this study, as only total GCS scores are recorded in the existing records. In a similar vein, the lack of temperature data (found in only 22.1% of the records) precluded the study of the Worthing Physiological Scoring system, which was found to be superior to the RTS in predicting both mortality and morbidity in another study in Iran, though both scores still showed good discriminatory capacity [24]. Furthermore, the records used contain neither documentation of the ISS nor all the information required to calculate it. Thus, we were unable to use the ISS as an intermediate reference point, as has been done in many similar studies. Due to the established robustness of the ISS, other studies have often tested new scores’ mortality prediction abilities against it. The lack of ISS data accordingly prevented further validation here. Finally, the pediatric population was excluded, so the findings reported are only applicable to adults.
Another limitation of trauma assessment in general lies in the variability in measuring vital signs. In the hospital setting, intra-observer variability is up to 10–15% for pulse rate, up to 20–25% for SBP, and more than 30% for RR where these are not measured electronically. It is accordingly expected that the reproducibility of trauma scores that incorporate vital signs strongly depends on the reliability of their measurement [25]. This reliability can falter in low-resource settings for reasons such as low staff-to-patient ratios, dysfunctional equipment, and disorganized emergency rooms. Similarly, the data collected in this study come from a single study center, limiting the generalizability of the findings to settings with conditions similar to those described above. Specifically, predictive value statistics depend on the prevalence of the tested condition. Accordingly, differences in prevalence can affect the applicability of these results.
Differences between the study cohort and those of other regions can also affect general applicability. Given the relative youth of the cohort, it was not surprising that the rate of co-morbidities was low in this study (22.4%), though in a low-resource setting this can reflect underdiagnosis due to a lack of adequate primary care. It should be noted, however, that in populations with a higher prevalence of co-morbidities, the predictive statistics established here may be less applicable given the increased propensity of these patients for rapid deterioration. Additionally, 85.7% of the cases in this study were blunt trauma cases, which tend to predominate in the majority of areas internationally. However, in areas with a higher incidence of penetrating trauma (such as areas marred by gun violence), the discriminative ability of the scores used here may also be less applicable.
Moving forward, further studies are required to advance the discussion. Prospective studies to validate these scores can help to reduce the confounding variables of a retrospective study, as the data collected can be standardized. Such studies would additionally allow for assessment of the practicality of using scores for triaging decisions in real time, which is possibly the most important characteristic required of a trauma score in a high-volume, low-resource setting. For instance, one study validated the KTS as a retrospective classifier of injury but found that its predictive value may not be strong enough to merit use as a triage tool [26]. From the retrospective angle, an important area that requires further investigation is the implementation of these trauma scores as part of a QI process, as there is a dearth of literature available from LMICs [3]. Finally, it would be beneficial to assess these scores at multiple study centers across the region to enhance the generalizability of the results.