Study design and setting
As we reported in abstract form, we developed the model using a data set that included all Emergency Department (ED) encounters (132,748 encounters, of which 549 were septic) from January 2006 through December 2008 from both LDS Hospital and Intermountain Medical Center. In that retrospective study, data available within 1 h of ED arrival (temperature, mean blood pressure, heart rate, and white blood cell count) were used for a model with an area under the curve (AUC) of 0.836. The prospective validation we report here covered April 2009 through June 2010 and was conducted at Intermountain Medical Center, a 456-bed teaching hospital affiliated with the University of Utah School of Medicine that replaced LDS Hospital in October 2007 as Intermountain Healthcare’s tertiary referral center and Level One trauma facility. (LDS Hospital remained in operation, transitioning to secondary referral center status.)
Selection of participants
Patients were identified as having severe sepsis or septic shock through manual chart review, by trained data and chart abstractors, of patients with compatible International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnosis codes as part of a broader quality improvement effort. Patients were selected as candidates for chart abstraction if they were admitted directly from the ED to the intensive care unit (ICU) and either had a discharge diagnosis code related to sepsis or infection or were identified separately by quality improvement coordinators in the ICU. The ICD-9 codes were chosen to include broad representation of diagnoses associated with sepsis or infection and thus to maximize sensitivity, with the knowledge that chart review would subsequently minimize potential false positive results. Encounters for all patients 14 years of age or older at the time of the encounter were included in the prospective validation.
Methods and measurements
A chart review datamart was populated daily based on discharge diagnoses, with subsequent chart review by trained data collectors. In addition to determining whether patients met consensus criteria for severe sepsis or septic shock, data collectors also determined whether care was compliant with the Intermountain sepsis management bundles. Results of compliance with the sepsis bundles among septic patients have been reported separately.
To support improved compliance and quality of care as part of the overall bundle-driven quality improvement effort, we developed the probabilistic model whose accuracy is described in this report for both the retrospective (previously reported) and prospective phases. Clinician collaborators initially identified 75 parameters that they suspected might be useful in the rapid identification of septic patients. These included basic demographic information, encounter details (e.g., time of day, day of week, method of arrival in the ED), results of laboratory tests, vital signs, other clinical measures, and the coded chief complaint. For any parameter with multiple available measures, the first measurement obtained in the ED was preferred. The components of the standard definition of the Systemic Inflammatory Response Syndrome (SIRS) (e.g., heart rate greater than 90 beats per minute) were also included as candidates within the model. Clinicians also described desired performance characteristics for the model based on their a priori beliefs about acceptable sensitivity and false positive rates.
Chart review from January 2006 through December 2008 was completed in January 2009. During February and March 2009, clinicians reviewed the model's overall performance characteristics (e.g., sensitivity, false positive rate) and individual cases to determine whether performance was satisfactory and at what threshold to trigger an alert. The decision was made to implement the sepsis alerting system, and a binary threshold of 0.05 was selected (though known calibration issues with naïve Bayes models do not support a simple interpretation of that threshold).
The alert system consisted of (a) routine clinical data entry, (b) pushing the data to a research decision support environment, (c) generating a probability estimate, from data available at 60 min after ED entry, using Netica™ (Norsys Software Corp), a tool for developing and deploying Bayesian systems, and (d) sending a page and email to the ED charge nurse for clinical consideration and action if the threshold was met. While training of charge nurses and other clinicians was ongoing throughout the process, we did not explicitly measure the effect of training on model performance.
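Steps (c) and (d) amount to a simple threshold check on the model's probability estimate. The sketch below illustrates this logic only; the function names and notification stub are invented for illustration, and the study's production system used Netica™ for scoring and a paging/email system for delivery:

```python
# Hypothetical sketch of the alert decision: score an encounter once its
# early ED data are available, then notify the charge nurse if the estimated
# probability of severe sepsis/septic shock crosses the chosen threshold.

ALERT_THRESHOLD = 0.05  # binary threshold selected by the clinical team

def process_encounter(encounter_data, score_fn, notify_fn):
    """score_fn and notify_fn stand in for the scoring engine and the
    paging/email system; both are placeholders, not real APIs."""
    p = score_fn(encounter_data)  # probability of severe sepsis/septic shock
    if p >= ALERT_THRESHOLD:
        notify_fn(f"Sepsis alert: estimated probability {p:.3f}")
    return p
```

In the deployed system the scoring step was initially run at 60 min after ED entry (later shifted to 90 min, as described below).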
Changes between model development and validation
There were several differences between the time periods of model development and model validation. Organizational changes included the movement of the primary center from LDS Hospital to Intermountain Medical Center midway through the development period and the implementation of the alerting system after the development period. From a system perspective, both the triage nurse identification system (described below) and the model-based alert were deployed after the development period. After initial implementation, the model-based alert was shifted from 60 min post ED admission to 90 min post ED admission in order to capture more laboratory results. Finally, the H1N1 influenza pandemic occurred during the model validation period and after model development.
Throughout these changes, the same data processes and procedures were maintained for the source data, chart abstraction, chart review, and model assessment.
Triage nurse identification system
To maximize the chances of rapid patient identification, a parallel effort was made to identify sepsis patients via a modification to the triage nurse screens in the hospital electronic medical record. This system prompted the nurse to consider whether the patient had a new infection if the patient displayed fever, tachycardia, hypotension, or tachypnea. The triage nurse information was not considered as a potential predictor for the model, as it was not available during the model development phase. Data on the triage nurse information are provided for comparison purposes only.
The outcome of interest for model development was a confirmed diagnosis of severe sepsis or septic shock. For assessment of model performance, the targets were sensitivity > 80 % and a false positive rate < 15 alerts per day.
The methodological details of the statistical modeling environment will be described in a separate publication. Briefly, it is a naïve Bayes framework that employs forward stepwise selection based on out-of-bag area under the receiver operating characteristic curve (AUC) derived via bootstrap sampling [12, 13]. The naïve Bayes model was selected because of substantial, non-random missingness (especially in laboratory values) and a desire to avoid imputation methods. Naïve Bayes handles missing data natively and performs better under non-random missingness than traditional regression methods: a missing variable simply does not affect the overall probability estimate of the model. Additionally, the modeling environment allows for direct clinician input regarding preferences for certain parameters over others, should certain parameters be of statistically equivalent utility but more easily attained, for instance. Among highly collinear variables, the naïve Bayes model we employed selected the variable most predictive of outcome. Because we were interested in prediction rather than model inference, we performed no analyses of collinearity. In a naïve Bayesian model, parameters do not have weights or coefficients; instead, each parameter is associated with a mean and standard deviation in septic and non-septic patients, and the differences between these distributions are used to build the prediction model.
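A minimal sketch of this scoring scheme, assuming Gaussian class-conditional distributions: each feature contributes its log-density under the septic and non-septic distributions, and a missing feature is simply skipped, leaving the estimate unchanged, as described above. The feature parameters and prior below are invented for illustration, not study values:

```python
import math

def gaussian_logpdf(x, mean, sd):
    # log of the normal density N(mean, sd) evaluated at x
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def sepsis_probability(obs, params, prior_septic):
    """obs: feature -> value (missing features absent or None).
    params: feature -> {"septic": (mean, sd), "non": (mean, sd)}."""
    log_septic = math.log(prior_septic)
    log_non = math.log(1 - prior_septic)
    for feat, value in obs.items():
        if feat not in params or value is None:
            continue  # missing data: skipped, posterior estimate unaffected
        m_s, s_s = params[feat]["septic"]
        m_n, s_n = params[feat]["non"]
        log_septic += gaussian_logpdf(value, m_s, s_s)
        log_non += gaussian_logpdf(value, m_n, s_n)
    # normalize the two class scores into a posterior probability
    denom = math.exp(log_septic) + math.exp(log_non)
    return math.exp(log_septic) / denom

# Invented class-conditional means/SDs for two vital-sign features
params = {
    "heart_rate": {"septic": (110, 20), "non": (85, 15)},
    "temperature": {"septic": (38.5, 1.0), "non": (37.0, 0.7)},
}
p = sepsis_probability({"heart_rate": 120, "temperature": 39.0}, params, 0.004)
```

With no observed features the function returns the prior unchanged, which is the missing-data behavior the framework relies on.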
All model development was based on the R statistical computing environment.
Model performance assessment
The primary success measures specified by clinicians were sensitivity (target > 80 %) and the false positive count (target < 15 patients per day). Based on ED visit volume, the target false positive rate equated to approximately 7.2 %. We calculated the AUC for both the continuous prediction model and the binary alert. For the binary alert, the positive predictive value (PPV), negative predictive value (NPV), and incremental (compared to the triage nurse system) true positive counts were calculated.
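These quantities follow directly from counts of alerts against chart-confirmed outcomes; the generic sketch below (with invented labels, not study data) shows the binary-alert metrics and a rank-based AUC for the continuous model:

```python
# Hypothetical sketch of the performance measures: confusion-matrix metrics
# for the binary alert and a Mann-Whitney (rank-based) AUC for the
# continuous probability output. Inputs are illustrative, not study data.

def alert_metrics(alerts, septic):
    """alerts, septic: parallel sequences of booleans per encounter."""
    tp = sum(a and s for a, s in zip(alerts, septic))
    fp = sum(a and not s for a, s in zip(alerts, septic))
    fn = sum((not a) and s for a, s in zip(alerts, septic))
    tn = sum((not a) and (not s) for a, s in zip(alerts, septic))
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(scores, septic):
    # probability that a random septic encounter outranks a random
    # non-septic one, counting ties as 0.5
    pos = [p for p, s in zip(scores, septic) if s]
    neg = [p for p, s in zip(scores, septic) if not s]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The study itself computed these measures in R; this sketch only makes the definitions concrete.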
Final probability estimates from the model were calculated in Netica and then incorporated into the alerting environment; all other analyses were conducted with R.