Study design, setting and participants
This study was approved by the Institutional Review Board of China Medical University (CMUH109-REC1-021). All methods were performed in accordance with relevant guidelines, and the requirement for individual informed consent was waived because of the retrospective design. This retrospective study used ED datasets from two hospitals in Taiwan. The dataset from January 2018 to December 2018 from China Medical University Hospital (CMUH) was used for model construction and internal validation, and the dataset from January 2018 to December 2019 from Asia University Hospital (AUH) was used for external validation. CMUH is a 1700-bed, urban, academic, tertiary care hospital with approximately 150,000 to 160,000 ED visits annually. AUH is a 482-bed regional hospital with approximately 36,000 ED visits annually.
The computerized TTAS system evaluates (a) trauma or nontrauma status; (b) chief complaints; (c) injury mechanisms; and (d) first-order modifiers, such as vital signs (including degree of respiratory distress, hemodynamic stability, level of consciousness, body temperature, and pain severity), to determine the triage level. Second-order modifiers are used if the triage level cannot be determined from these variables. The two main systems of TTAS are the traumatic and nontraumatic systems. The nontraumatic system contains 13 categories with 125 chief complaints (pulmonary, cardiovascular, digestive, neurological, musculoskeletal, genitourinary, ear, nose, and throat–related, ophthalmologic, dermatologic, obstetric and gynecologic, psychiatric, general, and other disorders) [25].
Adult patients (aged over 20 years) with TTAS level 3 were enrolled, and patients meeting any of the following criteria were excluded: 1) death on arrival, 2) trauma, 3) left without being seen, 4) discharge against medical advice, 5) admission to a ward or the ICU, 6) transfer to another hospital, 7) missing information, and 8) inconsistent data (i.e., systolic blood pressure (SBP) > 300 mmHg or < 30 mmHg, diastolic blood pressure (DBP) > 300 mmHg, SBP < DBP, pulse rate > 300/min or < 20/min, respiratory rate > 60/min, body temperature > 45 °C or < 30 °C, or body mass index (BMI) > 150 or < 5).
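The plausibility bounds in criterion 8 can be applied as a simple row filter. The sketch below illustrates this with pandas; the column names are hypothetical, as the study's actual data schema is not specified.

```python
import pandas as pd

# Plausibility bounds from exclusion criterion 8; column names are
# hypothetical -- the study's actual schema is not specified.
BOUNDS = {
    "sbp": (30, 300),    # systolic blood pressure, mmHg
    "dbp": (0, 300),     # diastolic blood pressure, mmHg
    "pulse": (20, 300),  # pulse rate, beats/min
    "rr": (0, 60),       # respiratory rate, breaths/min
    "temp": (30, 45),    # body temperature, degrees C
    "bmi": (5, 150),     # body mass index
}

def drop_inconsistent(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose values fall inside the plausibility bounds."""
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in BOUNDS.items():
        mask &= df[col].between(lo, hi)
    mask &= df["sbp"] >= df["dbp"]  # SBP below DBP is inconsistent
    return df[mask]

records = pd.DataFrame({
    "sbp": [120, 400], "dbp": [80, 90], "pulse": [72, 72],
    "rr": [16, 16], "temp": [36.5, 36.5], "bmi": [24, 24],
})
clean = drop_inconsistent(records)  # second row dropped (SBP > 300)
```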
Data collection
The triage data were recorded routinely by each triage nurse and were extracted from the electronic databases of the two hospitals. The information included age, gender, BMI, vital signs, consciousness, indwelling tubes, whether the patient was transferred and the facility the patient was transferred from, mode of arrival, bed request, comorbidity, pregnancy, frequent ED visits (> 2 visits a week or > 3 visits a month), 72-h unscheduled returns, and the chief-complaint system.
Machine learning models
In this study, we used the following five machine learning classification models: CatBoost, XGBoost, decision tree (DT), random forest (RF), and logistic regression (LR) [26,27,28,29,30]. We explored the parameter space and common variations of each model as thoroughly as was computationally feasible. XGBoost does not use weighted sampling techniques, which makes its splitting process slower than those of gradient-based one-side sampling and minimal variance sampling (MVS). CatBoost offers MVS, a weighted sampling version of stochastic gradient boosting, in which sampling occurs at the tree level rather than the split level: the observations for each boosting tree are sampled so as to maximize the accuracy of split scoring. The DT is one of the earliest and most prominent machine learning methods based on decision logic. A DT has multiple levels; the first or topmost node is called the root node, and each internal node represents a test on an input variable or attribute. Depending on the test outcome, the classification algorithm branches to the appropriate child node, where testing and branching repeat until a leaf node is reached. The RF is a classification algorithm that builds multiple DTs to train on and predict the output classes. A DT learns simple decision rules extracted from the data; the deeper the tree, the more complex the rules and the closer the fit to the training data. RFs mitigate the tendency of individual trees to overfit. LR can be considered an extension of ordinary regression that models a dichotomous variable, typically representing the occurrence or nonoccurrence of an event; it estimates the probability that a new instance belongs to a given class.
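The relationship between a single DT and an RF described above can be illustrated with scikit-learn on synthetic data (not the study's data): an unpruned tree fits the training set closely, whereas an ensemble of trees generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only, not the study's ED dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A single unpruned tree grows deep and tends to overfit the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# A forest averages many such trees, mitigating the overfitting of any one.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)
```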
Supervised learning learns a model from training data with multiple features and predicts the target variable through that model. The model is represented by a mathematical function: given the objective of predicting Y from X, the model's parameters are learned and adjusted from the data. Depending on the type of the predicted value, problems are classified as regression or classification.
The CMUH dataset was divided into two subsets: 80% of the sample was used as the training set, and the remaining 20% was used to test the trained model. In addition, all data from AUH were used for external validation. To assess prediction performance, we computed the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. All ML algorithms and performance analyses were implemented using scikit-learn and the XGBoost library.
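The 80/20 split and the reported metrics can be sketched as follows; this is a minimal illustration on synthetic data with a logistic regression stand-in, not the study's pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CMUH dataset (illustration only).
X, y = make_classification(n_samples=1000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
prob = clf.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)

# AUC from predicted probabilities; the other metrics from the
# confusion matrix at the 0.5 threshold.
auc = roc_auc_score(y_te, prob)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)  # positive predictive value
npv = tn / (tn + fn)  # negative predictive value
```

The same metric computations would be applied unchanged to the held-out AUH data for external validation.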
Feature selection
We filtered the data down to 32 remaining candidate features and began to train and evaluate our model, then discussed the results with another physician, who identified the least useful features. In general, some variables used in machine learning predictive models are not associated with the response, and including such irrelevant variables adds unnecessary complexity to the resulting model. Therefore, in this study, we used the feature importance tools in scikit-learn to select the attributes most effective for classifying the training data. This approach assesses the weight of each variable by evaluating its Gini-index contribution to the outcome and then ranks the variables by weight.
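A minimal sketch of this Gini-based ranking, assuming a random forest as the importance estimator (scikit-learn's impurity-based `feature_importances_` are Gini-derived for classification trees) and 32 synthetic features standing in for the study's candidates:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 32 synthetic features stand in for the study's 32 candidates.
X, y = make_classification(n_samples=1000, n_features=32,
                           n_informative=6, random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Impurity (Gini) based importances, ranked from most to least important.
ranked = sorted(zip(feature_names, rf.feature_importances_),
                key=lambda t: t[1], reverse=True)
```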
Parameter optimization
Machine learning algorithms involve several hyperparameters that must be fixed before the algorithms are run. Grid search is widely used in the machine learning literature to optimize these hyperparameters: it traverses every intersection of a grid whose dimension equals the number of hyperparameters. If there are k hyperparameters and each has m candidate values, m^k combinations must be traversed. Grid search yields good results at the expense of very slow execution. Bergstra and Bengio noted that random search is more efficient than grid search [31]. In this study, random search matched the grid search results only when the number of evaluations was the same; grid search was slower, but its optimization results were better. For the detailed hyperparameterization of the algorithms, please refer to the scikit-learn documentation [32].
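The two search strategies can be contrasted with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`; the estimator and parameter grid below are illustrative choices, not the study's actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Two hyperparameters with 4 and 3 candidates: 4 x 3 = 12 combinations.
param_grid = {"max_depth": [3, 5, 7, None], "min_samples_leaf": [1, 5, 10]}

# Grid search exhaustively evaluates all 12 combinations.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid, cv=3).fit(X, y)

# Random search evaluates only a fixed budget (here 6) of combinations
# sampled from the same space.
rand = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid, n_iter=6, cv=3,
                          random_state=0).fit(X, y)
```

Because random search here samples a subset of the same discrete grid, its best cross-validation score can never exceed the exhaustive grid search's, which matches the trade-off described above: grid search is slower but at least as good on the same space.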
Outcomes
The outcome was a short DLOS of < 4 h in the ED; DLOS was measured as the time interval between registration at ED triage and discharge from the ED.
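Deriving this binary outcome from timestamps is straightforward; a minimal sketch with pandas, using hypothetical column names:

```python
import pandas as pd

# Hypothetical timestamp columns; actual field names are not specified.
df = pd.DataFrame({
    "triage_time": pd.to_datetime(["2018-01-01 08:00", "2018-01-01 09:00"]),
    "discharge_time": pd.to_datetime(["2018-01-01 10:30", "2018-01-01 14:30"]),
})

# DLOS: interval from ED triage registration to ED discharge, in hours.
df["dlos_hours"] = (df["discharge_time"]
                    - df["triage_time"]).dt.total_seconds() / 3600

# Binary outcome: 1 if DLOS < 4 h (short stay), else 0.
df["short_dlos"] = (df["dlos_hours"] < 4).astype(int)
```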