Forecasting daily attendances at an emergency department to aid resource planning

Background Accurate forecasting of emergency department (ED) attendances can be a valuable tool for micro and macro level planning. Methods The data analysed were counts of daily patient attendances at the ED of an acute care regional general hospital from July 2005 to March 2008. Patients were stratified into three acuity categories, P1, P2 and P3, with P1 being the most acute and P3 the least acute. The autoregressive integrated moving average (ARIMA) method was applied separately to each of the three acuity categories and to total patient attendances. Independent variables included in the model were public holiday (yes or no), ambient air quality measured by the Pollutant Standards Index (PSI), daily average ambient temperature and daily relative humidity. The seasonal components of weekly and yearly periodicities in the time series of daily attendances were also studied. Univariate analysis by t-tests and multivariate time series analysis were carried out in SPSS version 15. Results By time series analysis, P1 attendances did not show any weekly or yearly periodicity and were predicted only by ambient air quality of PSI > 50. P2 and total attendances showed weekly periodicities, and were also significantly predicted by public holiday. P3 attendances were significantly correlated with day of the week, month of the year, public holiday, and ambient air quality of PSI > 50. When the developed models were applied to validation data, the mean absolute percentage error (MAPE) of prediction was 16.8%, 6.7%, 8.6% and 4.8% for P1, P2, P3 and total attendances, respectively. The models were able to account for most of the significant autocorrelations present in the data. Conclusion Time series analysis has been shown to provide a useful, readily available tool for predicting emergency department workload that can be used to plan staff rosters and resources.


Background
The ability to predict daily attendances at the emergency department (ED) of a hospital is valuable at a micro level for planning of staff rosters, and at a macro level for financial and strategic planning. Time series analysis has been applied in emergency medicine to forecast workload (patient volumes) and to study the impact of selected factors on the provision of patient care at ED [1][2][3][4][5][6][7][8][9][10]. A time series is a sequence of measurements made over time. If a forecasting method is used to predict the time series, the difference between the actual value and the predicted value measures the error in prediction. The ultimate test of any forecasting method is the size of these errors, and a best-fit model is a model which minimizes the error.
Most published studies using time series were based on seasonal factors only and were developed for forecasting overall demand for ED services [2][3][4][5][6][7]. Since there is wide variation in disease severity and acuity among patients presenting at the ED, clinical services and resources required will likewise vary considerably. The experiences gained from studies carried out in Western countries may not necessarily apply to local conditions, as there are multiple factors that might contribute to the fluctuation of the daily attendances at an ED in Singapore.
The purpose of this paper is to identify the local factors associated with the daily attendances at ED, and to make predictions based on these local factors. As resources are dependent on patient acuity levels, the forecast is also stratified by patient acuity categories (PAC).

Setting
The study was carried out in an emergency department in a major public sector acute care regional general hospital in Singapore. The hospital has the highest number of ED attendances and the highest proportion of acutely ill patients among five public sector acute care general hospitals in Singapore. Permission to conduct the study was granted by the Chairman, Medical Board of the hospital.

Data
The data used in the study were counts of daily patient attendances at ED between July 2005 and March 2008 (1,005 days), extracted from the ED administrative database. Patients who presented at the ED were classified as P1, P2 or P3 by the patient acuity category scale (PACS) used in all public sector hospital emergency departments in Singapore for resource allocation. P1 cases are the most acutely ill and need immediate clinical services and treatment; P2 cases are acutely ill but can wait to be treated; and P3 cases are the less acutely ill patients who can wait longer to receive services (Table 1). Other data collected for the study included public holiday and local weather factors (ambient temperature, ambient air quality measured by PSI, and relative humidity). The selection of the potential predictors was based on the literature, local observation and availability of data. Singapore is a tropical country where daily temperature varies little throughout the year, hence daily average temperature was used.

Study design and methods
Univariate analysis of daily ED attendances and their association with potential predictors was carried out using a general linear model, with significance tested by t-test; probabilities < 0.05 were considered statistically significant.
Time series analysis for identifying significant predictors as well as for forecasting daily ED attendances was carried out using established time series analysis procedures, the most popular technique being the autoregressive integrated moving average (ARIMA) model [11]. ARIMA is a class of models represented by (p, d, q)(P, D, Q)s, where p is the order of autoregression, d is the order of differencing (or integration), and q is the order of moving average; (P, D, Q) are their seasonal counterparts; and s is the seasonal period [12]. Both weekly and yearly seasonal periodicities were taken into account in this analysis.
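For reference, the (p, d, q)(P, D, Q)s notation corresponds to the multiplicative seasonal ARIMA model as it is commonly written in textbooks. The backshift operator B (with B y_t = y_{t-1}), the polynomial symbols and the error term below are standard notation and are not taken from this paper; sign conventions for the moving-average terms vary between texts and software.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t,
\qquad
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\quad
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}
```

Here y_t is the daily attendance count, \varepsilon_t is a white-noise error, and \Phi_P and \Theta_Q are the seasonal polynomials of order P and Q in B^s. For a weekly periodicity in daily data, s would be 7.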
ARIMA models were iteratively applied to P1, P2, P3 and total patient attendances using data of the first 24 months to train, data of the following 6 months to test, and that of the following 3 months to validate. Elsewhere, models are usually trained and their performance evaluated on the test data; finally the model with the least error is chosen as the best-fit model. This strategy, however, leads to an optimistic estimate of the performance of the chosen model, since the data used for training and testing are the same as the data used for performance evaluation. Therefore, in this study, we used a third data set for performance evaluation (model validation). The best-fit model was then used to forecast prospectively and validated. As far as we know, there is no specific definition of "good accuracy" of a model. It is usually taken to be a non-significant p-value of the model by the Ljung-Box test (p > 0.05) and a mean absolute percentage error (MAPE) of < 20%. If the MAPE is less than 5%, the model performance can be regarded as excellent.
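The three-way split and the MAPE criterion can be illustrated with a minimal Python sketch. The exact calendar boundaries, the function names and the placeholder counts are assumptions made for illustration; the paper states only the 24/6/3-month proportions and that the analysis itself was run in SPSS.

```python
from datetime import date, timedelta

# Assumed calendar boundaries matching the 24/6/3-month split described above.
TRAIN_END = date(2007, 6, 30)   # train: Jul 2005 to Jun 2007
TEST_END = date(2007, 12, 31)   # test: Jul 2007 to Dec 2007; the rest validates


def chronological_split(days):
    """Split (date, count) pairs into train, test and validation lists by date."""
    train = [d for d in days if d[0] <= TRAIN_END]
    test = [d for d in days if TRAIN_END < d[0] <= TEST_END]
    validate = [d for d in days if d[0] > TEST_END]
    return train, test, validate


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Placeholder series: one made-up count per day over the 1,005-day study period.
start = date(2005, 7, 1)
days = [(start + timedelta(days=i), 300 + i % 7) for i in range(1005)]
train, test, validate = chronological_split(days)
print(len(train), len(test), len(validate))
print(round(mape([100, 200, 400], [110, 190, 400]), 2))
```

Splitting by date rather than at random keeps the validation period strictly after the data used to choose the model, which is the point of holding out the final 3 months.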
Independent variables included in the model as potential predictors of daily ED attendances were public holiday (yes/no), ambient air quality measured by the Pollutant Standards Index (PSI), average daily ambient temperature and average daily relative humidity. The seasonal components of weekly and yearly periodicities in the time series of daily attendances were also studied. The National Environment Agency (NEA) of Singapore adopts the PSI developed by the US Environmental Protection Agency, which provides easily understandable information about daily levels of air pollution. A reading of 1-50 is considered good, 51-100 moderate, and above 100 unhealthy [14]. The readings on most days in Singapore were within the good range. Therefore, we dichotomized PSI (> 50 versus <= 50) for better statistical power.
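As a small illustration of how the categorical predictors could be coded before modelling, the sketch below maps a PSI reading to its band and to the binary PSI > 50 indicator. The function names and the dictionary keys are hypothetical, and treating a reading of exactly 100 as moderate is an assumption.

```python
def psi_band(psi):
    """Label a PSI reading with the bands quoted in the text.

    Assumption: a reading of exactly 100 is treated as moderate.
    """
    if psi <= 50:
        return "good"
    if psi <= 100:
        return "moderate"
    return "unhealthy"


def exposure_flags(psi, is_public_holiday):
    """Return the two binary predictors: PSI above 50 and public holiday."""
    return {"psi_over_50": int(psi > 50), "public_holiday": int(is_public_holiday)}


# Made-up reading for a single non-holiday day.
print(psi_band(63), exposure_flags(63, False))
```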
The predictors on preceding days may also affect current ED attendance; this is a lag association. It is defined as the correlational dependency of order k between each i'th element of the series and the (i-k)'th element, and is measured by autocorrelation (i.e. the correlation between the two terms), with k being the lag [15].
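The lag-k dependency defined above can be sketched as a sample autocorrelation function. This is the usual textbook estimator written in plain Python, not the SPSS routine used in the study, and the weekly counts are made up to show what a 7-day periodicity looks like.

```python
def autocorrelation(series, k):
    """Sample autocorrelation of a series at lag k (0 <= k < len(series))."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    # Pair each i'th element with the (i-k)'th element, as in the definition.
    numer = sum((series[i] - mean) * (series[i - k] - mean) for i in range(k, n))
    return numer / denom


# Made-up counts with a repeating 7-day pattern (8 weeks).
weekly = [320, 290, 285, 280, 282, 270, 300] * 8
print(round(autocorrelation(weekly, 7), 3))
print(round(autocorrelation(weekly, 3), 3))
```

A series with weekly periodicity shows a much larger autocorrelation at lag 7 than at lags that do not align with the weekly cycle.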
All statistical analyses were done in SPSS version 15, using automated identification of the best-fit model for each dependent variable based on performance measures; probabilities less than 0.05 were considered statistically significant. Lag association was also automated by SPSS.

Results
There was an increasing secular trend in total attendances, especially from 2006 onwards (Fig. 1). Fig. 2 shows weekly fluctuations. The higher total attendances on Monday were contributed mainly by P2 and P3 cases, while higher attendances on Sunday were contributed by P3 cases. Fig. 3 shows higher attendances from May to July, contributed mainly by P3 cases. There was no yearly fluctuation in P1 attendances. Table 3 shows a significant upward secular trend in the number of attendances; with a monthly increase of 2.

Time series analysis
As shown in Table 4, by Ljung-Box tests, the p-values of the best-fit models were not significant, which means that all four models closely represented the observed time series. The best-fit model for P1 was ARIMA(0,1,1), which is a non-seasonal, non-stationary moving average model. The best-fit model for P2 was ARIMA(1,1,1)(1,0,1), which is a seasonal, non-stationary autoregressive integrated moving average model. The best-fit models for P3 and total attendances were ARIMA(0,1,1)(1,0,1), which are seasonal, non-stationary moving average models.
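To make the simplest of these models concrete, the sketch below computes one-step-ahead forecasts from an ARIMA(0,1,1) model without a constant, the form identified for P1. The coefficient value and the counts are hypothetical (the fitted coefficients are not reported in this text), the moving-average term uses the plus-sign convention, and the function name is illustrative.

```python
def arima_011_forecasts(series, theta):
    """One-step-ahead forecasts from ARIMA(0,1,1) without a constant.

    Model: y[t] - y[t-1] = e[t] + theta * e[t-1].
    Returns one forecast per observation: entry i is the forecast for the
    day after series[i], so the last entry is for the next unseen day.
    """
    forecasts = []
    prev_error = 0.0  # assume no shock before the first observation
    for value in series:
        if forecasts:
            # Error of the forecast that was made for this observation.
            prev_error = value - forecasts[-1]
        forecasts.append(value + theta * prev_error)
    return forecasts


p1_counts = [30, 34, 32]  # made-up daily P1 counts
print(arima_011_forecasts(p1_counts, theta=-0.5))  # [30.0, 32.0, 32.0]
```

With this sign convention the recursion is equivalent to simple exponential smoothing with smoothing weight 1 + theta, which is why such a model tracks a slowly drifting level without any weekly or yearly pattern.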
All four data series had a linear trend, since d equals 1 in every best-fit model. P1 attendance did not show any weekly or yearly periodicity and was only predicted by ambient air quality of PSI > 50. P2 and total attendances showed weekly periodicities in the time series analyses, and were also significantly correlated with public holiday. P3 attendance was significantly correlated with day of the week, month of the year, public holiday, and ambient air quality of PSI > 50. The maximum lag between PSI > 50 and P1 cases was two days; there was no lag between PSI > 50 and P3 cases. The maximum lag between public holiday and P2, P3 and total cases was one day (Table 4).
The P1 model yielded a MAPE of 16.9% on validation; that is, its forecasts had an average error of 6 out of an average of 33 attendances per day. The models for P2, P3 and total attendances performed better in the daily prediction of attendances, with MAPEs of 6.7%, 8.6% and 4.8%, respectively. Fig. 4 shows that the observed and predicted time series for P1, P2, P3 and total attendances overlap with each other to a great degree. The scatter plots of observed vs predicted attendances for the four best-fit models show that the points are distributed along the diagonal line (Fig. 5); i.e. the models were successful in accounting for most of the significant autocorrelations present in the data.
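The Ljung-Box check used to judge whether a model has accounted for the autocorrelation can be sketched as follows. This computes only the Q statistic on a model's residuals; the p-value would come from a chi-square distribution, which is omitted here to keep the sketch free of external libraries. The function name, the choice of lags and the residuals are illustrative assumptions.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic for residual autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        numer = sum(
            (residuals[i] - mean) * (residuals[i - k] - mean) for i in range(k, n)
        )
        rho_k = numer / denom  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Made-up residuals that alternate in sign, i.e. strongly autocorrelated.
alternating = [1.0, -1.0] * 20
print(round(ljung_box_q(alternating, 1), 2))
```

A large Q indicates autocorrelation left in the residuals, whereas a small Q (a non-significant p-value) is consistent with the model having captured it.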

Discussion
Although emergencies are difficult to foresee, this study demonstrated that daily patient attendances at ED can be predicted with good accuracy using the modeling techniques of time series analysis. During the study period, the daily variations were considerable, with daily P1 attendances ranging from 10 to 72, P2 attendances from 96 to 239, and P3 attendances from 138 to 307. The model developed has identified factors associated with these variations. Unlike other studies [6,8], this study showed that daily total ED attendances were not predicted by weather conditions. This could be because Singapore is a tropical city with little variation in its hot and humid weather conditions throughout the year. While there was no seasonal fluctuation, higher P1 attendances were predicted by moderate or poor ambient air quality (PSI > 50). This could be due to severe respiratory and heart diseases among the vulnerable elderly population, who make up 70% of P1 cases, as reported by other studies [16][17][18]. On the other hand, PSI > 50 was significantly inversely correlated with P3 attendances; i.e. there were fewer P3 attendances on days with high PSI. Singapore's national advisory for days with moderate to poor PSI follows that of the US EPA: to reduce outdoor activities, especially among those with compromised heart and lung conditions. Reduced outdoor activities on days of poor PSI may account for this, as attendances for trauma associated with minor accidents also decreased.
There were predictably higher weekly attendances on Sundays and Mondays, contributed by P3 cases. This is attributed to the closure of primary care facilities, mainly those of the public sector, on Sundays and public holidays, and to the build-up of demand on Mondays. Similarly, public holidays, when the primary care facilities are closed, were also strongly correlated with higher P3 attendances. There were also higher monthly attendances from May to July, contributed by P3 cases. This is attributed to the perennial seasonal dengue outbreaks and mid-year influenza activity.
A similar modeling and prediction framework can be extended to time series analysis of different intervals, such as hourly, weekly, monthly or yearly, as well as to different disease groups. The model's performance is based on historical trends. It is imperative for the forecasts to be iterative and updated regularly as more data become available in order to improve prediction performance. In this case, the model is updated every 3 months and the framework has been put into practice: the model is run weekly to forecast the workload for the following week. The forecasts have been used by the ED management to plan its staff deployment on a weekly basis.
In addition to the immediate weekly forecasts, the model has also been used for longer-term planning. The study has shown higher daily P3 attendances due to the seasonal dengue and influenza outbreaks mid-year. Moreover, there were also higher P1 and P3 attendances associated with high PSI readings caused by transboundary air pollution from the seasonal forest fires in neighboring countries. These secular annual forecasts help the department plan staff headcounts and budget allocation a year in advance.
The study has helped us understand the factors associated with variation of daily ED attendances in a local setting and develop a model to forecast the daily attendances. To our knowledge, it is the first such study in Singapore. This study suffers from a few limitations. One is that there may be other factors affecting the daily ED attendances, like the availability of other primary care facilities and their workload which may predict ED attendances. Another limitation is the use of average daily temperature. Although the temperature range throughout the day may not be wide, maximum and minimum temperature could be more useful as a predictor. Also, we did not evaluate alternate forms of the predictor variables (e.g., squared, cubed or other non-linear forms) in this study, which may give better prediction of ED attendances.

Conclusion
Forecasting methods are useful in healthcare management. Accurate prediction of patient attendances will facilitate timely planning of staff deployment and allocation of resources within a department or a hospital. The hospital where the study was carried out is a regional hospital, with its catchment of patients geographically determined. The approach proposed and the lessons learned from this experience may assist the other four regional hospitals and their emergency departments to carry out their own analyses to aid planning and budgeting. Overall, it provides a basis for macro-planning and allocation of budget by the Ministry of Health, which up to now has been based on an average aggregated incremental percentage annual growth.