A practical method for predicting frequent use of emergency department care using routinely available electronic registration data
© Wu et al. 2016
Received: 25 July 2014
Accepted: 1 February 2016
Published: 9 February 2016
Accurately predicting future frequent emergency department (ED) utilization can support a case management approach and ultimately reduce health care costs. This study assesses the feasibility of using routinely collected registration data to predict future frequent ED visits.
Using routinely collected registration data in the state of Indiana, U.S.A., from 2008, we developed multivariable logistic regression models to predict frequent ED visits in the subsequent two years. We assessed the model’s accuracy using Receiver Operating Characteristic (ROC) curves, sensitivity, and positive predictive value (PPV).
Strong predictors of frequent ED visits included age between 25 and 44 years, female gender, close proximity to the ED (less than 5 miles traveling distance), total visits in the baseline year, and respiratory and dental chief complaint syndromes. The area under the ROC curve (AUC) ranged from 0.83 to 0.92 for models predicting patients with 8 or more visits to 16 or more visits in the subsequent two years, suggesting acceptable discrimination. With 25 % sensitivity, the model predicting frequent ED use, defined as 16 or more visits in 2009 and 2010, had a PPV of 59.5 % and a specificity of 99.9 %. The “adjusted” PPV of this model, which also counts flagged patients having 8 or more visits as correctly identified, is 81.9 %.
We demonstrate a strong association between predictor variables present in registration data and frequent ED use. The algorithm’s performance characteristics suggest that it is technically feasible to use routinely collected registration data to predict future frequent ED use.
With increasing medical costs, health care reformers and policy makers have turned to emergency department (ED) utilization as a potential source for cost savings. A relatively small number of patients, often called “frequent” or “high ED users”, have been an increasing focus because of their disproportionate share of ED visits and cost. When defined as 4 or more ED visits per year, frequent users accounted for 4.5 to 8 % of all ED patients and contributed 21 to 28 % of all ED visits. Prior interventions targeting frequent users did not achieve universally positive outcomes, although some studies demonstrate reduced ED use [2–6]. A clear framework including a consensus-based definition of frequent users and methods to accurately and consistently identify this population may improve the effectiveness of care management interventions.
Currently, however, a standardized definition for frequent ED users remains elusive. A single visit threshold has been used to differentiate frequent users from low ED users, and the visit threshold varies from as few as 3 to 12 or more annual visits, often without a clear rationale for the visit cut-point [7–13]. Further, the majority of prior studies on frequent ED users focused on identifying existing frequent ED users [7, 9, 11, 14], which is problematic as most frequent users in a given year will not remain frequent users in the next year. It has been shown that an individual who had 4 or more visits in a given year was only 28 to 38 % likely to be a frequent user the next year. Fertel et al. also showed that highly frequent use occurs for only a minority of ED patients, and then only for a discrete period. Roland et al. pointed out that the “regression to the mean” phenomenon should not be ignored when evaluating interventions for frequent ED users. Therefore, blindly targeting most current frequent ED users for future interventions is inefficient because their heavy use of ED services may decrease without intervention. Since health care resources are limited, it is essential that interventions target patients whose heavy ED use will likely persist. Thus, the capacity to predict which patients are likely to sustain frequent future ED utilization can help address this problem by identifying those most likely to generate future heavy ED use and costs.
Previously we reported that 2.8 million patients from 96 EDs in the state of Indiana in the United States generated 7.4 million ED visits from 2008 to 2010, and the average number of visits was 2.6 visits per patient. We found that patients cross over to other ED institutions with great frequency, and about 3.3 % of the patients made more than 10 visits to Indiana EDs from 2008 to 2010. In this study, we explore whether specific features contained within routinely gathered registration data could meaningfully predict a patient’s future ED utilization. If these features accurately predict future frequent ED users, then we can more effectively target limited health care resources on this group, maximizing the benefit of the intervention. The purpose of this study is to assess the feasibility of using routinely gathered registration data to predict patients who will visit EDs with high frequency.
This study was approved by the Indiana State Department of Health (ISDH) data release committee and the Indiana University Institutional Review Board (USA).
Data collected for this study were derived from the original Health Level-7 (HL7) version 2 registration transactions for ED encounters from 96 institutions participating in the Indiana Public Health Emergency Surveillance System (PHESS) between January 1, 2008 and December 31, 2010. The data are not publicly available but can be accessed through the Regenstrief Institute Data Core (https://www.regenstrief.org/hsr/research-programs/rcher/data-core/).
The processes for preparing ED encounter data as well as the details for each step were presented in our previous paper. Briefly, registration transactions were processed to ensure each transaction was unique and contained valid ED encounter data according to PHESS requirements and a set of heuristics drawn from Regenstrief’s long-term real-world experience operating a health information exchange. Unique ED encounters were established using data elements including person, place and time. The specific fields included (1) healthcare institution (HL7 MSH-4), (2) ED encounter date (HL7 PV1–44), and (3) medical record number (HL7 PID-3). Transactions missing any of these fields could not be definitively and uniquely identified as an encounter and were excluded from the analysis.
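The encounter-identification rule above can be illustrated with a minimal sketch. It assumes transactions have already been parsed into the three key fields; the `Transaction` type, the `unique_encounters` function, and the choice to compare dates at day resolution are illustrative assumptions, not details of the original processing pipeline.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class Transaction(NamedTuple):
    """One registration transaction reduced to its three key fields (hypothetical type)."""
    institution: Optional[str]     # place: HL7 MSH-4
    encounter_date: Optional[str]  # time: HL7 PV1-44, e.g. "200801151030"
    mrn: Optional[str]             # person: HL7 PID-3


def unique_encounters(transactions: Iterable[Transaction]) -> Set[Tuple[str, str, str]]:
    """Collapse transactions into unique (place, day, person) encounters.

    Transactions missing any key field are dropped, mirroring the
    exclusion rule described in the text.
    """
    encounters = set()
    for t in transactions:
        if not (t.institution and t.encounter_date and t.mrn):
            continue  # cannot be uniquely identified as an encounter
        # Assumption: the first 8 characters (YYYYMMDD) define the encounter date.
        encounters.add((t.institution, t.encounter_date[:8], t.mrn))
    return encounters
```

Using a set keyed on the three fields makes repeated transactions for the same visit collapse naturally, whatever order they arrive in.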
Unique patients were identified using various combinations of patient demographics, including social security number, last and first name, gender, date of birth, telephone number, and zip code as determined by an open-source probabilistic record linkage software package. In this manner all ED encounters belonging to the same patient were linked, forming a “patient group.” A unique global patient identifier was assigned to each patient group. In total, we identified 7,447,521 unique ED encounters. Data available for analysis included age, sex, chief complaints, ZIP codes of patients’ addresses, and hospital ZIP codes. Patients’ global identifier was used to link visits across different hospital databases, including all ED visits regardless of disposition.
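To give a sense of how probabilistic linkage scores a pair of records, the sketch below uses the classical Fellegi-Sunter agreement weights. This is one common formulation and is shown only as an illustration; the text does not state which scoring scheme the linkage package applies, and every m and u probability below is a made-up value, not an estimate from the study data.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "ssn": (0.95, 0.0001),
    "last_name": (0.93, 0.01),
    "first_name": (0.90, 0.02),
    "gender": (0.98, 0.5),
    "dob": (0.95, 0.001),
    "phone": (0.80, 0.0005),
    "zip": (0.85, 0.02),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total
```

Pairs whose total weight exceeds a chosen threshold would be treated as the same patient; the threshold itself is a tuning decision not covered here.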
We developed multivariable logistic regression models. Patients with at least one ED visit in 2008 were used to predict ED visits in the years of 2009 and 2010. Patients who died before January 1, 2009 or had missing values in one or more covariates were excluded (<4.30 %). The final sample size was 1,272,367 patients. All variables were summarized at the patient level for model development.
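In general form, a logistic regression model of this kind can be written as follows. The notation is introduced here for illustration and is not taken from the original analysis: $Y_i^{(k)} = 1$ if patient $i$ made at least $k$ ED visits in 2009 and 2010, $x_{ij}$ is the $j$-th covariate of patient $i$ measured in 2008, and one set of coefficients is estimated per visit cut-point $k$.

```latex
\[
\operatorname{logit}\, p_i^{(k)}
  = \log\frac{p_i^{(k)}}{1 - p_i^{(k)}}
  = \beta_0^{(k)} + \sum_{j=1}^{J} \beta_j^{(k)} x_{ij},
\qquad
p_i^{(k)} = \Pr\!\left(Y_i^{(k)} = 1 \mid x_{i1}, \dots, x_{iJ}\right),
\qquad k = 8, \dots, 16 .
\]
```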
All covariates were determined based on the ED utilization data in 2008.
Age: age was determined at the time of the first ED visit and divided into six subgroups: <5, 5–14, 15–24, 25–44, 45–64 and ≥65 years.
Sex: male and female.
Visits in 2008: the total number of ED visits made in 2008 for each patient.
Chief complaints: the chief complaint syndromes were grouped into 11 categories: respiratory, gastrointestinal (GI), undifferentiated infection (UDI), influenza-like illness (ILI), lymphatic, skin, neurological, pain, dental, alcohol and musculoskeletal syndromes. These categories were used by other surveillance programs with slight modification [19–21]. Chief complaints that could not be grouped into the above 11 syndromes were assigned to “unclassified”. The categories were then reviewed by two physicians (Grannis S, Finnell JT) and an epidemiologist. For each patient, the proportion of each chief complaint syndrome was determined by dividing the number of ED visits with a specific syndrome by the total number of ED visits that the patient had in 2008. Since one ED visit may have more than one syndrome, these percentages do not add up to 100 %.
Zip code centroid straight-line distances: The Perl library Geo::Distance was used to calculate the straight-line distance between each patient’s home and the hospital, based on the zip code centroids of the patient’s home address and the hospital address. Distance was then grouped into 3 categories: <=5 miles, 5–20 miles and >20 miles. Since one patient may have multiple ED visits at different distances, we determined the proportion of ED visits falling into each of the three categories by dividing the number of ED visits in a specific distance category by the total number of ED visits that the patient made in 2008. Because the proportions for these three distance categories add up to 100 %, only two categories (<5 miles and >20 miles) were included in the analytic model.
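The two proportion-based covariates described above can be sketched as follows. The study used the Perl library Geo::Distance; the Python below is a stand-in that assumes the haversine great-circle formula and an Earth radius of 3958.8 miles, and all function names and category labels are hypothetical. Boundary handling at exactly 5 and 20 miles is also an assumption, since the text does not specify it.

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_category(miles):
    """Map a distance to one of three categories (boundary handling is assumed)."""
    if miles <= 5:
        return "le5"
    if miles <= 20:
        return "5to20"
    return "gt20"


def distance_proportions(visit_distances):
    """Share of one patient's baseline-year visits in each distance category."""
    counts = Counter(distance_category(d) for d in visit_distances)
    n = len(visit_distances)  # every included patient has at least one visit
    return {cat: counts[cat] / n for cat in ("le5", "5to20", "gt20")}


def syndrome_proportions(visit_syndromes, syndromes):
    """Share of visits carrying each syndrome; one visit may carry several."""
    n = len(visit_syndromes)
    return {s: sum(s in v for v in visit_syndromes) / n for s in syndromes}
```

Because the three distance proportions sum to one, any one of them is determined by the other two, which is why only two enter the model; the syndrome proportions carry no such constraint.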
The outcome was measured as a dichotomized variable (frequent versus low ED user). Frequent ED users were investigated by using visit cut-points ranging from 8 to 16 visits over a two-year period (between 2009 and 2010). One model was fit for each cut-point. Patients were defined as frequent ED users if their ED visits were equal to or higher than the visit cut-point, and were otherwise defined as low ED users.
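The outcome definition amounts to one binary label per patient per cut-point. A minimal sketch, with hypothetical names, assuming a list holding each patient's total visit count for 2009 and 2010:

```python
CUT_POINTS = range(8, 17)  # 8, 9, ..., 16 visits over the two-year outcome window


def label_outcomes(visits_2009_2010):
    """Return {cut_point: [label per patient]}, where 1 = frequent and 0 = low ED user."""
    return {
        k: [1 if v >= k else 0 for v in visits_2009_2010]
        for k in CUT_POINTS
    }
```

Each entry of the returned dictionary would serve as the response variable for the model fit at that cut-point.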
Model performance evaluation
The model’s performance was assessed for discrimination using Receiver Operating Characteristic (ROC) curves. We balanced the goal of identifying all frequent ED utilizers with the intervention cost of incorrectly identifying frequent ED users by selecting a fixed sensitivity of 25 % to minimize the false positive rate. We then evaluated the specificity and positive predictive value (PPV) for each model at a fixed sensitivity of 25 %. We also combined the false positive (FP) patients who had 8 or more visits with the true positive (TP) patients to obtain the “adjusted” positive cohort. The “adjusted” PPV was determined by dividing the size of the “adjusted” positive group by the sum of TP and FP. Statistical analyses were conducted using SAS version 9.3 (SAS Institute Inc.; Cary, North Carolina).
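The evaluation at a fixed sensitivity can be sketched as below. This is an illustrative reading of the procedure, not the SAS code used in the study: it assumes patients are flagged in descending order of predicted probability until the target sensitivity is first reached, ignores tied scores, and uses hypothetical function and parameter names.

```python
import math


def evaluate_at_sensitivity(scores, future_visits, cut_point,
                            target_sensitivity=0.25, adjusted_cut=8):
    """Flag the highest-scoring patients until the target sensitivity is reached.

    scores        -- predicted probability of frequent use, one per patient
    future_visits -- observed visit count in the outcome window, one per patient
    """
    n_pos = sum(v >= cut_point for v in future_visits)
    if n_pos == 0:
        raise ValueError("no frequent users at this cut-point")
    need_tp = math.ceil(target_sensitivity * n_pos)

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = fp_adjusted = 0
    for i in order:
        if tp >= need_tp:
            break
        if future_visits[i] >= cut_point:
            tp += 1
        else:
            fp += 1
            if future_visits[i] >= adjusted_cut:
                fp_adjusted += 1  # false positive who still had >= adjusted_cut visits

    n_neg = len(scores) - n_pos
    return {
        "sensitivity": tp / n_pos,
        "specificity": (n_neg - fp) / n_neg if n_neg else float("nan"),
        "ppv": tp / (tp + fp),
        "adjusted_ppv": (tp + fp_adjusted) / (tp + fp),
    }
```

The adjusted PPV is never lower than the PPV, because it only moves some false positives into the numerator while leaving the denominator (TP + FP) unchanged.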
ED utilization and patients’ distribution by year
Characteristics of ED patients
Patients aged 25 to 44 years were the largest group and accounted for ~29 % of the total visits and ~26 % of total patients. The second largest age group was 45 to 64 years, contributing ~20 % of the total visits. Patients aged 5 to 14 years had the fewest visits and accounted for 8 to 9 % of total visits in each year. More than 53 % of patients were female and they accounted for 55 % of visits.
Distribution of patients and ED visits by visit cut-points in each year (table). Columns: ≥4, ≥8 and ≥16 visits. Patient rows: 2008 (n = 1,329,645), 2009 (n = 1,396,313), 2010 (n = 1,397,338). Visit rows: 2008 (n = 2,367,399), 2009 (n = 2,551,881), 2010 (n = 2,528,241).
Distribution of ED visits by travel distance (miles) and chief complaints in each year, 2008 to 2010 (table).
Table 2 captures chief complaint distributions. “Pain” was the most common chief complaint category, accounting for more than 40 % of total visits, while the “musculoskeletal” category contributed ~28 % of total ED visits. The “respiratory” and “gastrointestinal” syndrome categories accounted for more than 23 % and 17 % of total visits, respectively. The “alcohol” category accounted for 0.5 % of total visits each year. Nearly 15 % of visits were grouped into the “unclassified” category.
Multivariable logistic regression model predicting frequent use of ED care
Distribution of predictors in 2008 stratified by number of visits in 2009 and 2010 (table). Columns: visits in 2009 and 2010, from total (number of patients) up to ≥16. Rows: year 2008 patients (No.); age in years (patients, %), up to ≥65; sex (patients, %); visits in 2008 (patients, %), up to ≥16; chief complaints in 2008 (visits, %); distance in miles (visits, %).
Multivariable logistic regression models predicting frequent ED users having ≥8 visits and frequent ED users having ≥16 visits in 2009 and 2010 (table). Columns: ≥8 visits; ≥16 visits. Rows include: age (up to ≥65); visits in 2008; (visits in 2008)²; sex (ref. = “Male”); travel distance (miles).
Multivariable Logistic Regression Models (table). Columns: number of visits constituting ‘frequent use’, from ≥8 to ≥16 in steps of one. Rows: area under ROC curve (AUC); with sensitivity ≤25 %, probability > 0.5; false positive patients with ≥8 visits (No.); adjusted PPV for patients with ≥8 visits in subsequent two years (%).
The primary goals of this study were to evaluate the feasibility of using routinely available registration data to predict patients likely to use ED services frequently in the future and to develop strategies for improving the accuracy and efficiency of detecting frequent ED users. We demonstrate a strong association between predictor variables present in routine registration data and frequent ED use. The algorithm’s performance characteristics suggest that it is technically feasible to use routinely collected registration data to predict such use, and the model’s observed prediction accuracy may support identifying and intervening upon frequent ED users. Thus, such models may support more effective targeting of limited health care resources to patients who may maximally benefit from intervention.
Much of the literature studying frequent ED utilization has substantial limitations, which our study sought to address. First, some published studies used data from a limited number of EDs and thus their broad generalizability is unclear [7–12, 22, 23]. Although several statewide studies in the United States explored ED visits across age, gender, health insurance groups and clinical characteristics between frequent and infrequent ED users, most were descriptive in nature and few applied prediction models to identify frequent ED users [24–27]. Second, some studies used survey or interview data, and the quality and reliability of such data can be affected by survey response rates [8–10]. Further, the cost, time and other resources involved in interviews may be prohibitive. Third, some studies focused on specific cohorts such as asthmatics or the elderly, and this limits the ability of policy makers and providers to determine whether unifying factors that could be targeted for intervention exist amongst a more general population of patients with frequent ED utilization. Lastly and most importantly, in many cases researchers focused on identifying existing frequent ED users instead of predicting future frequent ED utilization [7, 9, 11, 14]. As shown in our study and others [1, 15, 16], most patients do not remain frequent ED users over time and many naturally reduce their ED use without intervention (regression toward the mean). Thus, predicting patients who are likely to sustain future frequent ED utilization will be necessary for improving the health of this vulnerable patient group.
Developing algorithms that accurately identify patients who are likely to frequently visit EDs in subsequent years is a first step toward developing potential interventions to mitigate overuse. However, few studies have leveraged any approach or method to identify future frequent ED users [4, 28–31]. In those studies, frequent ED users were defined with a threshold number of ED visits, e.g. 3 to 10 ED visits within the 12 months prior to the study period. In addition, the majority of the comparative cohort studies used a pre- and post-intervention design, where the population exposed to the intervention served as its own historical control group, without recognizing the regression toward the mean phenomenon, which might incorrectly inflate the effectiveness of interventions.
In our study, we developed a practical approach to predict future frequent ED users. The model predicting patients with 8 or more visits in the subsequent two years demonstrated reasonable discriminative power with an AUC of 0.84. As the threshold defining ‘frequent use’ increased, the corresponding AUC also increased. The model predicting frequent ED use of 16 or more visits in the subsequent two years showed good discrimination, with an AUC of 0.92. Strong predictor variables included visits in the baseline year, age, sex, ZIP code centroid straight-line distance between home and hospital, and specific chief complaints, including respiratory, dental and alcohol syndromes. When comparing false positives to true positives and false negatives to true negatives, respectively, we noted that the values of the variable “Number of visits in the baseline year” were very close between the compared groups, indicating that patients’ other features contained within routinely gathered registration data contributed additional discriminating power.
If the algorithm incorrectly flags patients as frequent utilizers, the resulting inefficiencies may offset potential savings from subsequent reduced ED utilization. Considering the trade-offs between (a) identifying the maximal number of subjects who are truly frequent ED use patients and (b) minimizing subjects incorrectly flagged as frequent ED use patients, we aimed to balance the cost of incorrectly identifying frequent ED patients by setting the prediction model’s sensitivity at 25 %. Although the models had PPVs around 60 %, a significant proportion of false positive patients actually had 8 or more ED visits in two years. The adjusted PPV for patients having 8 or more visits, in the model predicting frequent ED use defined as 16 or more visits, is 81.9 %. To our knowledge, this is the first study to employ routine registration data to develop a predictive algorithm for future frequent ED use. The prediction accuracy strongly suggests that it is feasible to apply routinely collected registration data to the prediction of future frequent ED utilization.
Limitations of our study include the following. First, we lacked comprehensive population-level data for persons who did not use the emergency department. Therefore our analysis is limited to characterizing those individuals who present to emergency departments. Second, we did not include data such as patients’ socioeconomic status, since such data are not routinely captured in ED registration data. Third, the applicability of our model to ED registration data from other sites was not assessed, and the predictive performance of the models might be overestimated. In the future, we seek to validate this approach against other datasets in a geographically distinct region. Finally, we only evaluated models with 25 % sensitivity as we aimed to balance the cost of ED utilization and the intervention support cost from incorrectly identified frequent ED users.
We demonstrate a strong association between predictor variables present in routine registration data and frequent ED use. This analysis suggests that it is technically feasible to use routinely collected registration data to identify such use, and the model’s observed prediction accuracy may support identifying frequent ED users and directing health care resources so that this group benefits maximally from intervention. Future work will include validating our algorithm using data sets from other states or organizations within the United States.
We acknowledge James Egg, Joe Kesterson and Jane Wang for their assistance with data extraction and processing.
This study is supported in part by the CDC through the Indiana Center of Excellence in Public Health Informatics (1P01HK000077-01) and award (T15OC000047-01) from the Office of the National Coordinator for Health Information Technology, OS, HHS, in the United States.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- LaCalle E, Rabin E. Frequent users of emergency departments: the myths, the data, and the policy implications. Ann Emerg Med. 2010;56(1):42–8.
- Shumway M, Boccellari A, O’Brien K, Okin RL. Cost-effectiveness of clinical case management for ED frequent users: results of a randomized trial. Am J Emerg Med. 2008;26:155–64.
- Spillane LL, Lumb EW, Cobaugh DJ, Wilcox SR, Clark JS, Schneider SM. Frequent users of the emergency department: can we intervene? Acad Emerg Med. 1997;4:574–80.
- Lee KH, Davenport L. Can case management interventions reduce the number of emergency department visits by frequent users? Health Care Manag (Frederick). 2006;25:155–9.
- Kne T, Young R, Spillane L. Frequent ED users patterns of use over time. Am J Emerg Med. 1998;16(7):648–52.
- Morgan SR, Chang AM, Alqatari M, Pines JM. Non-emergency department interventions to reduce ED utilization: a systematic review. Acad Emerg Med. 2013;20(10):969–85.
- Sun BC, Burstin HR, Brennan TA. Predictors and outcomes of frequent emergency department users. Acad Emerg Med. 2003;10(4):320–8.
- Zuckerman S, Shen YC. Characteristics of occasional and frequent emergency department users: do insurance coverage and access to care matter? Med Care. 2004;42(2):176–82.
- Hunt KA, Weber EJ, Showstack JA, Colby DC, Callaham ML. Characteristics of frequent users of emergency departments. Ann Emerg Med. 2006;48(1):1–8.
- Pines JM, Buford K. Predictors of frequent emergency department utilization in Southeastern Pennsylvania. J Asthma. 2006;43(3):219–23.
- Milbrett P, Halm M. Characteristics and predictors of frequent utilization of emergency services. J Emerg Nurs. 2009;35(3):191–8.
- Blank FS, Li H, Henneman PL, Smithline HA, Santoro JS, Provost D, et al. A descriptive study of heavy emergency department users at an academic emergency department reveals heavy ED users have better access to care than average users. J Emerg Nurs. 2005;31(2):139–44.
- Locker TE, Baston S, Mason SM, Nicholl J. Defining frequent use of an urban emergency department. Emerg Med J. 2007;24(6):398–401.
- Doupe MB, Palatnick W, Day S, Chateau D, Soodeen RA, Burchill C, et al. Frequent users of emergency departments: developing standard definitions and defining prominent risk factors. Ann Emerg Med. 2012;60(1):24–32.
- Fertel BS, Hart KW, Lindsell CJ, Ryan RJ, Lyons MS. Toward understanding the difference between using patients or encounters in the accounting of emergency department utilization. Ann Emerg Med. 2012;60(6):693–8.
- Roland M, Abel G. Reducing emergency admissions: are we on the right track? BMJ. 2012;8:345–e6017.
- Finnell JT, Overhage JM, Grannis S. All health care is not local: an evaluation of the distribution of Emergency Department care delivered in Indiana. AMIA Annu Symp Proc. 2011;409:16.
- Grannis S, Egg J, Ribeka N, RecMatch. Probabilistic Patient Record Matching. 2008.
- Tsui FC, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM. Technical Description of RODS: A Real-time Public Health Surveillance System. J Am Med Inform Assoc. 2003;10:399–408.
- Lewis MD, Pavlin JA, Mansfield JL, O’Brien S, Boomsma LG, Elbert Y, et al. Disease Outbreak Detection System Using Syndromic Data in the Greater Washington DC Area. Am J Prev Med. 2002;23(3):180–6.
- Brillman JC, Burr T, Forslund D, Joyce E, Picard R, Umland E. Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance. BMC Med Inform Decis Mak. 2005;5:4.
- Moe J, Bailey AL, Oland R, Levesque L, Murray H. Defining, quantifying, and characterizing adult frequent users of a suburban Canadian emergency department. CJEM. 2013;15:1–13.
- Geurts J, Palatnick W, Strome T, Weldon E. Frequent users of an inner-city emergency department. CJEM. 2012;14(5):306–13.
- Fuda KK, Immekus R. Frequent users of Massachusetts emergency departments: a statewide analysis. Ann Emerg Med. 2006;48(1):9–16.
- Cook LJ, Knight S, Junkins Jr EP, Mann NC, Dean JM, Olson LM. Repeat patients to the emergency department in a statewide database. Acad Emerg Med. 2004;11(3):256–63.
- South Carolina Public health institute. A Report on Frequent Users of Hospital Emergency Departments in South Carolina. 2011.
- Kilbreth B, Shaw B, Westcott D, Gray C. Analysis of emergency department use in Maine. January: Muskie School of Public Service; 2010.
- Bodenmann P, Velonaki VS, Ruggeri O, Hugli O, Burnand B, Wasserfallen JB, et al. Case management for frequent users of the emergency department: study protocol of a randomised controlled trial. BMC Health Serv Res. 2014;14:264.
- Reinius P, Johansson M, Fjellner A, Werr J, Ohlén G, Edgren G. A telephone-based case-management intervention reduces healthcare utilization for frequent emergency department visitors. Eur J Emerg Med. 2013;20(5):327–34.
- Crane S, Collins L, Hall J, Rochester D, Patch S. Reducing utilization by uninsured frequent users of the emergency department: combining case management and drop-in group medical appointments. J Am Board Fam Med. 2012;25(2):184–91.
- Hansagi H, Olsson M, Hussain A, Ohlén G. Is information sharing between the emergency department and primary care useful to the care of frequent emergency department users? Eur J Emerg Med. 2008;15(1):34–9.