Rank aggregation methods comparison: A case for triage prioritization


Expert Systems with Applications 40 (2013) 1305–1311


Erica B. Fields a, Gül E. Okudan a,b,⇑, Omar M. Ashour a

a The Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, 102 Engineering Unit B, University Park, PA 16802, USA
b School of Engineering Design, The Pennsylvania State University, 213T Hammond Building, University Park, PA 16802, USA

Keywords: Group decision making; Rank aggregation; Utility intervals; Mathematical programming models; Ordered weighted averaging (OWA) operator weights

Abstract

This paper seeks to test and to determine a suitable aggregation method to represent a set of rankings made by individual decision makers (DMs). A case study for triage prioritization is used to test the aggregation methods. The triage is a decision-making process with which patients are prioritized according to their medical condition and chance of survival on arrival at the emergency department (ED). There is a lot of subjective decision-making in the process which leads to discrepancies among nurses. Four rank aggregation methods are applied to the prioritization data and then an expert evaluates the results and judges them on practicality and acceptability. The proposed recommendation for preference aggregation is the method of the estimation of utility intervals. Expert opinion is highly valued in a decision-making environment such as this, where experience and intuition are key to successful job performance and outcomes.

© 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding author at: School of Engineering Design, The Pennsylvania State University, 213T Hammond Building, University Park, PA 16802, USA. E-mail address: gkremer@psu.edu (G.E. Okudan).
http://dx.doi.org/10.1016/j.eswa.2012.08.060

1. Introduction

The task of ranking a list of alternatives based on one or more criteria is encountered in many situations. With a single criterion and a single decision maker, the task is easy. In contrast, this paper addresses the problem of identifying a consensus ranking of alternatives, given the individual ranking preferences of several decision makers. This problem is called the rank aggregation problem.

Selecting the most appropriate aggregation method for a new decision-making application is difficult because many different aggregation methods are available. In this paper, we reduce the problem of method selection to one of method testing by testing four rank aggregation methods on the problem at hand. As a case study, we consider triage rank aggregation in the emergency department (ED) setting.

A critical concern in health systems today is overcrowding, where more patients are waiting for service or treatment than the available resources or staff can attend to. This is prevalent in emergency departments (EDs). Furthermore, some patients arrive at the ED with non-urgent conditions (Buesching et al., 1985; Patel, Gutnik, Karlin, and Pusic, 2008). To maintain order in the ED and provide care first to the patients who need it most, most EDs use a triage system to sort patients in order of the severity of their condition (Andersson, Omberg, & Svedlund, 2006; Beveridge, 1998). While the most critical patients are seen immediately, others are initially assessed and then sent back to the waiting room until they can be seen. Thus, some patients end up waiting for long periods of time.

As patients wait, their condition may worsen or improve. This is the inherently dynamic nature of healthcare in the ED, and nurses and other staff must consider it when deciding who will receive treatment next. Frequent reassessment of patients is necessary to ensure that no patient reaches a life-threatening state without receiving proper care. In particular, vital signs such as temperature, blood pressure, and heart rate may be changing, and there is uncertainty in the nurse decision-making process as to which signs or combinations of signs (and their specific levels) are more important to base a decision on. Decision support aids that make decision-making easier or faster for nurses could help here.

The motivation for this research is to improve healthcare systems by applying decision-making tools to assist ED nurses. The working environment of an ED nurse can be very hectic and stressful, as it involves dealing with many patients in the waiting room, triaging and reassessing patients accordingly, and caring for patients who are receiving care at the ED. These conditions are amplified because staff is often limited. Nurses must make many judgments as they work, probably the most important being the triage category assigned to each patient, as this influences the priority the patient will have for receiving treatment. An aid can reduce the stress and strain on nurses and also help reduce their workload.


In a previous study, we investigated the discrepancies in decisions made across nurses in three clinical settings: Susquehanna Health Williamsport Hospital (S), Mount Nittany Medical Center (M), and Hershey Medical Center (H); nurses triaged and prioritized a set of patients according to the Emergency Severity Index (ESI) (Fields, Claudio, Okudan, Smith, & Freivalds, 2009). The ESI outlines five categories with clinically meaningful differences in projected resource needs and, therefore, associated operational needs (Gilboy et al., 2005; Zimmermann, 2001). Discrepancies exist among nurses in the same ED and across EDs. The preferences need to be aggregated because studies have shown that much of decision-making is based on a nurse's experience, knowledge, and intuition (Patel et al., 2008; Andersson et al., 2006; Cone & Murray, 2002). Since these aspects differ from nurse to nurse, their determination of an ESI category and subsequent prioritization of a patient may differ, especially for patients who do not have the most critical symptoms.

2. Aggregation methods

The aggregation of individual preferences into one overall preference representing the group, or a consensus, has been studied extensively (Wang, Yang, & Xu, 2005). The history of aggregation methods thus far can be categorized into four areas: early efforts using weighted sums, studies of the simple group consensus, the use and incorporation of distance measures, and alternative frameworks.

The earliest effort to study the problem of rank aggregation was made by Borda (1784); Kendall (1962) later studied it from a statistical viewpoint, arriving at the same conclusion. Borda (1784) explored an election problem and proposed determining the rank of candidates according to the sum of ranks given to them by voters.
If there were m candidates and each voter ranked each candidate with no ties, the highest rank received a weight of m, the next highest rank received a weight of m − 1, continuing so that the lowest rank received a weight of 1 (Wang, Chin, & Yang, 2007a). The final rankings are determined by a weighted sum, where the alternative with the highest sum is most preferred, followed by the other alternatives in descending order of their sums. Because this method determines weights to be used in a weighted sum, it is called a weight-determining method. With its simple calculations, the Borda–Kendall (BK) method, as it is commonly called, is the most widely used technique for rank aggregation. Many other weight-determining methods have been developed from it.

The simplest and perhaps most frequently used way to draw a group consensus is the majority rule. This rule dictates that the alternative receiving the most votes is declared the winner. Arrow (1951) argued that any aggregation or consensus drawn from individual preferences needs to satisfy certain social welfare axioms. Inada (1964) and others since have studied Arrow's axioms further and developed methods satisfying them, including one based on the majority rule idea.

Kemeny and Snell (1962) first studied the use of distance measures in rank aggregation and proposed their own set of axioms, similar to Arrow's. A distance measure quantifies how close two vectors are to each other. To illustrate, we describe the ℓ1-metric, also known as the Manhattan or city-block metric. Assuming that x and y are two n-dimensional vectors,

$$\ell_1(x, y) = \sum_{i=1}^{n} |x_i - y_i| \qquad (1)$$

Similar to this is the Spearman footrule distance, which is used for ordinal data, or ranks, instead of quantitative data. If A orders his preference among n items in vector x and B orders his preference on the same n items in vector y, the footrule distance of preference between A and B is:

$$d_{AB} = \sum_{i=1}^{n} |x_{Ai} - y_{Bi}| \qquad (2)$$
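The footrule distance of Eq. (2) can be sketched in a few lines of Python. The function names below are illustrative and not from the paper. As a check, the example uses two orderings that the paper reports later: the expert's prioritization (Section 4.4) and the all-data ranking from the utility-interval method (Section 4.1), for which the paper states a footrule distance of 16.

```python
def footrule_distance(rank_a, rank_b):
    """Spearman footrule: sum of absolute differences between two rank vectors."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rank vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(rank_a, rank_b))


def ranks_from_order(order):
    """Turn a priority order (patient IDs, first to last) into a rank vector.

    Position i-1 of the result holds the rank assigned to patient i.
    """
    ranks = [0] * len(order)
    for place, patient in enumerate(order, start=1):
        ranks[patient - 1] = place
    return ranks


# Orderings as reported in the paper (patient IDs, highest priority first).
expert_order = [2, 4, 1, 5, 7, 3, 8, 6]
method_order = [1, 4, 7, 6, 2, 3, 5, 8]

distance = footrule_distance(ranks_from_order(expert_order),
                             ranks_from_order(method_order))
print(distance)  # 16
```

Converting each ordering to a rank vector first matters: Eq. (2) compares the ranks that two decision makers give to the same item, not the items they place at the same position.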

Bogart (1973, 1975) studied distance measures applied to partial orderings. For a set of n items, a partial ordering is a ranking where only a subset of the n items is ranked. Cook and Seiford (1978) limited their study to rankings where no ties were allowed, called complete ordinal rankings, and developed axioms similar to Kemeny and Snell (1962). About two decades later, a general model for drawing a distance-based consensus was introduced (Cook, Kress, & Seiford, 1996). The last area identified concerning previous research on rank aggregation methods is the development of different frameworks. Researchers have developed heuristics and used methods such as data envelopment analysis (DEA) and extreme-point approaches to arrive at a consensus. Methods like these most often have mathematical programming as an essential part of the solution process. As an example of this kind of work, Cook and Kress (1990) proposed a DEA model for aggregating preference rankings, and found it to be equivalent to the BK method under certain circumstances. In this paper, the focus is on four aggregation methods from the literature. Three methods are outlined in the following subsections, and the fourth method is the BK method, discussed earlier. These techniques have been selected out of all methods studied because they are not overly specific; they are designed for broad applications. For example, the chosen methods basically require numerical ranking data, which are lists of permutations of 1, 2, 3, . . . , n, if there are n items to be ranked. On the other hand, a more specific method may require fuzzy preference relations to be defined, or bipolar preferences (Peneva & Popchev, 2007; Öztürk & Tsoukiàs, 2008). Our current problem would not fit very well into the more specific methods found in the literature. Further, the chosen methods are very adaptable and flexible. For example, some are able to handle a set of rankings where ties are present. 
In others, the decision maker can specify a desired result through defining a parameter or additional preference for the weights to satisfy, if it is a weight-determining method. Finally, we chose different types of methods: some weight-determining, some DEA-based, and some utilizing distance measures. Different types were chosen in order to obtain results according to different decision rules. This way, if similar results are achieved, it will not be due to unintentional, repeated evaluations under the same decision rule.

2.1. Aggregation through the estimation of utility intervals

In this method, constructed by Wang et al. (2005), individual preference rankings are viewed as constraints on utilities, and linear programming (LP) models are used to estimate the utility intervals. Then, a weighted average sum is used to aggregate the intervals for each alternative. Finally, a simple yet practical interval comparison method is used to determine the overall ranking. The interval comparison method also provides information on the degree to which one interval is preferred to another and, in the final ranking, gives a percentage of how much a higher-ranked alternative is preferred to a lower-ranked alternative. This method is suggested for group decision-making, social choice, and committee elections. It has been previously used in voting systems (Tamiz & Foroughi, 2007).

2.2. Aggregation using ordered weighted averaging (OWA) operator weights

For this method, the basic premise of a traditional rank aggregation method holds: each ranking place is assigned a weight representing its importance to the overall solution, and the overall aggregation is achieved through a simple weighted sum. The weights are also normalized. “OWA operators ... provide a unified framework for decision making under uncertainty, where different decision criteria ... are characterized by different OWA operator weights” (Wang, Luo, & Hua, 2007b, pp. 3357). Thus, the authors propose using OWA operator weights because they suggest a weighted average sum is similar enough to an OWA operator to warrant using its weights.

Orness, a measure associated with the weight vector, is a value in the interval [0, 1] and assesses the degree to which the DM emphasizes the higher ranking places. It is termed the optimism level, α, of the DM. For example, an optimism level of 1 means that all weight is placed on the 1st ranking place, and an optimism level of 0.5 ensures that all ranking places are equally considered. This method can be used with varying optimism levels toward the same problem, giving the DM the opportunity to choose an appropriate solution by optimism level and to view the stability of the solution in terms of the results given by other optimism levels. The paper also proves that the Borda–Kendall method corresponds to α = 2/3 (Wang et al., 2007b). This method provides more choices and flexibility for DMs than BK, and has previously been used in preferential voting and election systems, in parameterized estimation of fuzzy random variables (Liu, 2009), and in querying systems of a hospital's database (Wang, Chang, & Cheng, 2006). Further, OWA operators have been used in aggregating criteria functions in multi-criteria decision-making (Yager, 1988).
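The correspondence between the Borda–Kendall weights and an optimism level of 2/3 can be verified with a short calculation. The sketch below assumes the standard orness definition of Yager (1988) and normalized BK weights for m ranking places; the symbols $w_j$ and $k$ are introduced here for illustration and are not the notation of Wang et al. (2007b).

```latex
\mathrm{orness}(w) = \frac{1}{m-1}\sum_{j=1}^{m}(m-j)\,w_j ,
\qquad
w_j = \frac{m-j+1}{m(m+1)/2} \quad (j = 1,\dots,m)

\sum_{j=1}^{m}(m-j)(m-j+1) = \sum_{k=0}^{m-1} k(k+1) = \frac{(m-1)\,m\,(m+1)}{3}

\mathrm{orness}(w) = \frac{1}{m-1}\cdot\frac{(m-1)\,m\,(m+1)/3}{m(m+1)/2} = \frac{2}{3}
```

Under these assumptions the result does not depend on m, so the BK weights sit at the same optimism level regardless of how many alternatives are ranked.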

2.3. Three mathematical programming models

Wang et al. (2007a) suggest another weight-determining model for rank aggregation. Even though three models are proposed, two LP models and one nonlinear programming (NLP) model, we consider them as a set from which the DM should choose only one. The models are straightforward and simple to use, providing objective weights as well as final rankings, and not requiring any parameter to be specified by the DM. Further, the models put more emphasis on the 1st ranking place, using the strong ordering constraint $w_1 \ge 2w_2 \ge \cdots \ge m w_m$, where $w_j$ is the weight of the jth ranking place. Although no parameters are specified, which usually play a role in determining the 2nd through mth places, their results show that the models produce strong, stable final rankings. LP-1 and LP-2 maximize the minimum total scores of all n items. The models differ in that the LPs generate the same set of weights for all alternatives, whereas the NLP determines the most favorable weights for each alternative. LP-1 requires that the weights sum to 1, while LP-2 does not, and LP-2 requires that each alternative's score be less than or equal to 1, while LP-1 does not. All three are adapted from a data envelopment analysis (DEA) model by Cook and Kress (1990). They have previously been used in preferential voting and election systems.

3. Data and methodology

Fig. 1 shows the research methodology. The study was broken into pieces. The data were collected and the discrepancies were analyzed to justify the need for the aggregation methods (Fields et al., 2009). The research methodology then continues with the application of the methods to the data. An expert examines the results gained from applying the methods and suggests the method that performs best in practice. The expert's decision is considered in the recommendation of the best method for preference aggregation of ED nurse triage prioritizations.

3.1. Data collection

Data were collected in three clinical settings: Susquehanna Health Williamsport Hospital (SHWH), Mount Nittany Medical Center (MNMC), and Hershey Medical Center (HMC). In order not to impact clinical activities adversely, our team visited the EDs at these locations during their off-peak operational times (e.g., 4:00–6:00 am). Each interview took approximately 30 min. The interview ended with a 3-minute exercise, shown in Fig. 2, where interviewees were asked to provide the ESI level and priorities for 8 patients, for which we only provided the vital signs, age, and gender data. The vital signs included were temperature (°F), heart rate (beats/minute), respiration rate (breaths/minute), systolic blood pressure (mm Hg), and diastolic blood pressure (mm Hg). All the hospitals used ESI, and as nurses completed the exercise they could use the ESI algorithm, if desired.

The 8 patient scenarios of the exercise were constructed so that all patients would be viewed as semi-urgent or non-urgent, which most likely suggests an ESI level categorization of three, four, or five. There are fewer obvious distinguishing factors among these patients than among severely acute patients, especially since symptoms and conditions are not provided. As mentioned earlier, overcrowded ED waiting areas are mainly composed of patients who would fall in these categories. So, focusing on this group would be most beneficial to alleviate the problem in EDs. At the conclusion of data collection, the total number of nurses interviewed was 36: 14 nurses were interviewed at SHWH, 12 at MNMC, and 10 at HMC.

3.2. Methods

In order to justify the need for a preference aggregation method for nurse prioritization data, Spearman's footrule was used to measure the ranking (prioritization) differences (Fields et al., 2009). To get an accurate picture of the relationships, the footrule distances were calculated for all possible pairwise combinations of nurses in each hospital, and across each pair of hospitals. Discrepancies exist across nurses in the same ED and in different EDs (Fields et al., 2009).
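The pairwise comparison described for the discrepancy analysis can be sketched as follows. This is only an illustration: the rank vectors are hypothetical (four patients, not the study data), the function names are invented here, and summarizing the pairwise distances by their mean is an assumption based on the "average distance" the paper mentions in Section 4.4.

```python
from itertools import combinations, product
from statistics import mean


def footrule(x, y):
    """Spearman footrule distance between two rank vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mean_within(group):
    """Mean footrule distance over all pairs of nurses in one hospital."""
    return mean(footrule(x, y) for x, y in combinations(group, 2))


def mean_across(group_1, group_2):
    """Mean footrule distance over all nurse pairs drawn from two hospitals."""
    return mean(footrule(x, y) for x, y in product(group_1, group_2))


# Hypothetical rank vectors: entry i is the rank a nurse gives to patient i+1.
hospital_a = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
hospital_b = [[4, 3, 2, 1], [1, 2, 4, 3]]

within_a = mean_within(hospital_a)
across_ab = mean_across(hospital_a, hospital_b)
print(round(within_a, 2), round(across_ab, 2))
```

A within-hospital mean well above zero indicates disagreement among nurses at the same site, which is the kind of evidence the paper uses to motivate aggregation.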

3.2.1. Application of aggregation methods to data

The methods were applied to the data to explore their use in practice. First, they were applied to the data of each hospital separately, then to the set of all data combined. Microsoft Excel and Excel Solver were used for the calculations and for solving the linear and nonlinear programs. Since the BK method corresponds to a special case of the OWA operator weight method, it was only necessary to apply three methods: estimation of utility intervals, OWA operator weight determination, and the three mathematical programming models.

For ease of illustration, in determining the nurses' utility intervals, we assume the order relations of their preference rankings to be a complete weak order. This means that if alternative i is ranked immediately higher than alternative j, then i is not inferior to j, as opposed to being strictly preferred to j. Both LPs and the NLP from Wang et al. (2007a) are applied. Additionally, we determined OWA operator weights using the minimax disparity approach, developed by Wang and Parkan (2005), for DM optimism levels 1, 0.9, 0.8, 0.7, 0.6, and 2/3. This approach was chosen only for convenience; it “minimizes the maximum disparity between two adjacent weights under a given [optimism level]” (Wang et al., 2007b, pp. 3358).

3.2.2. Expert judgment on applied methods

Finally, we sought evaluation from a participant with expert judgment. The expert judge has over thirty years of experience in advanced assessment in diagnosis in primary care situations, and also trains new and inexperienced nurses on the process. Our expert prioritized the fictional patients as did the other participants. Following this, one method was presented and its results were explained. Then the expert was asked to comment on the suitability and acceptability of the method and results, in addition to providing any other thoughts. This was repeated for all the methods that were applied. The only results shown were those of all data combined, not the results separated by hospital.

Next, with all methods and results in front of her together, the expert was asked which method best performed the aggregation and why. We then explored how much of her response was due to the method used or the result obtained, and why. Opinions on the second-best method and least acceptable method were sought to reveal further insight. Finally, we explored her perceptions of the similarities or differences the results may have had to her own prioritization. The expert's opinion is weighted very highly in making a recommendation of the best preference aggregation method for our data.

Fig. 1. Research methodology.

| Patient # | Gender | Age | Temperature (°F) | Pulse (beats per min.) | Respiration Rate (breaths per min.) | Systolic Blood Pressure (mm Hg) | Diastolic Blood Pressure (mm Hg) | ESI | Rank |
|---|---|---|---|---|---|---|---|---|---|
| 1 | M | 18 | 101.8 | 97 | 26 | 125 | 79 | | |
| 2 | M | 40 | 98.0 | 110 | 20 | 150 | 92 | | |
| 3 | F | 25 | 99.1 | 94 | 28 | 120 | 80 | | |
| 4 | F | 7 | 98.3 | 115 | 18 | 145 | 90 | | |
| 5 | F | 33 | 101.2 | 75 | 23 | 130 | 85 | | |
| 6 | M | 24 | 97.8 | 92 | 29 | 120 | 81 | | |
| 7 | F | 3 | 100.7 | 80 | 25 | 128 | 83 | | |
| 8 | M | 55 | 97.8 | 80 | 18 | 125 | 96 | | |

Fig. 2. ESI and ranking exercise.

4. Results

This section presents and discusses the results of implementing the research methodology. The data are analyzed and aggregated rankings are determined. We also indicate which aggregation method is most recommended. The following three tables (Tables 1–3) show our prioritization data sets. Four rank aggregation methods are applied to the data and their results are analyzed. First, the method estimating utility intervals is applied. It is followed by the method that determines OWA operator weights, which simultaneously covers the BK method. Finally, the method using three mathematical programs is applied.

Table 1 Prioritization data set 1: Rankings from Susquehanna Health Williamsport Hospital.

Table 2 Prioritization data set 2: Rankings from Mount Nittany Medical Center.

Table 3 Prioritization data set 3: Rankings from Hershey Medical Center.

4.1. Results of the method utilizing the estimation of utility intervals

Table 4 shows the results for the method using the estimation of utility intervals. If a precedes b with the degree of preference P(a > b), it is denoted $a \overset{P(a>b)}{\succ} b$. Considering data from all three hospitals together, the aggregated ranking shows that Patient 1 should be seen first, followed by Patient 4, then Patients 7, 6, 2, 3, 5, and 8, in that order. Patient 1 is preferred to Patient 4 by 50.07%, which suggests that they are difficult to distinguish. By the rules of this method, if candidate a has a degree of preference to candidate b that is less than 50%, candidate a should be ranked lower than b. In examining the most highly prioritized patient across the results, Patient 1 is decided upon most often. The results for individual hospitals can be interpreted similarly.

The procedure of the method requires counting the number of times each ranking order is given. In this case, there are 8 items ranked, which means there are 8! = 40,320 possible ranking orders. Out of all the 36 ranking orders, only 2 were the same, and those happened to be at the same hospital. Because of this, the utility intervals are weighted fairly equally, so each patient's weighted average utility depended primarily on the ranking place it was assigned to most often. This shows that for weak order relations, the analyst may eliminate a step in preparing the data for use with this method. This would decrease the complexity of the method and make it easier to use.

Additionally, having the accuracy feature, degrees of preference, was very helpful in enhancing the credibility of the method. When making triage decisions, it is helpful for the DM to see how strong the preferences are between adjacently ranked patients in the results. For example, in the MNMC results, Patient 7 is prioritized ahead of Patient 4 with a degree of preference of 59.58%, which is fairly strong. Degrees of preference near 50%, such as with Patient 6 and Patient 5 for HMC, imply that the DM is nearly indifferent as to which of the two should be prioritized before the other.

4.2. Results of the method utilizing OWA operator weights, including the Borda–Kendall method

The following tables present the aggregation provided by the OWA operator weight-determination method; Table 5 shows the OWA operator weights while Table 6 provides the results. The method was applied for optimism levels (α) of 1, 0.9, 0.8, 0.7, 0.6, and 2/3, the last of which corresponds to the BK method. If Patient a precedes Patient b, it is written a ≻ b, and if Patient a is preferentially indifferent to Patient b, it is written a ∼ b.

For each hospital and for all data combined, the results generally became very stable around α = 0.8 and α = 0.7. This holds even though for α = 0.8, w7 = w8 = 0, which means the 7th and 8th ranking places were not considered in the aggregation at all. Although the votes for the 7th and 8th places had no effect on the results for α = 0.8, the outcome is still comparable with those where all ranking places contributed. The rankings for α = 1 are very unreliable; this setting essentially corresponds to a majority rule. There are still some ties produced, as seen in Table 6 with SHWH, which may be due to our use of the minimax disparity approach for determining the weights. This method turned out to be very simple to implement and interpret. The Borda–Kendall method performs very well except for the SHWH data, due to the presence of a tie in the aggregation.
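The weighted-sum aggregation behind these results can be illustrated with a small sketch. The rankings below are hypothetical (four items, five decision makers), not the nurse data, and the function name and the tie-breaking by item ID are choices made here for illustration only. The example contrasts normalized BK weights with an optimism level of 1, where only first-place votes count.

```python
def aggregate(rankings, weights):
    """Score each item by summing the weight of the place it receives.

    rankings: list of orderings, each listing item IDs from first to last place.
    weights:  weights[p] is the weight of ranking place p (0 = first place).
    Returns (items sorted by descending score, score per item).
    """
    n_items = len(weights)
    scores = {item: 0.0 for item in range(1, n_items + 1)}
    for order in rankings:
        for place, item in enumerate(order):
            scores[item] += weights[place]
    # Ties in score are broken by item ID purely to make the output deterministic.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered, scores


# Hypothetical orderings of 4 items by 5 decision makers (NOT the study data).
rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 3, 1, 4],
    [2, 3, 4, 1],
    [3, 2, 1, 4],
]

m = 4
bk_weights = [(m - j) / (m * (m + 1) / 2) for j in range(m)]  # m, m-1, ..., 1, normalized
first_place_only = [1.0] + [0.0] * (m - 1)                    # optimism level 1

bk_order, bk_scores = aggregate(rankings, bk_weights)
top_order, top_scores = aggregate(rankings, first_place_only)
print(bk_order)   # [2, 3, 1, 4]
print(top_order)  # [1, 2, 3, 4], with items 1 and 2 tied on first-place votes
```

In this toy example the first-place-only weights leave items 1 and 2 tied and ignore everything below first place, while the BK weights separate all four items. This is one way to see why aggregation at an optimism level of 1 behaves like a majority rule and can be unstable.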

Table 4 Estimation of utility intervals method: Results. Location

Aggregated ranking

SHWH

SE.43% 50.59% S7.42% SO.2% Patient 4  Patient 2  Patient 7  Patient 6  Patient 1  Patie?it 3  Patient 5  Patient 8 51.13% 50.26% 59.58% 55.68% Patient 1  Patient 6  Patient 7  Patient 4  Patient 3  Patient 2  Patient 5  Patient S Sa.7% 52.72% 51.52% 50.33% Patient 1 S- Patient 2  Patient 4  Patient 6  Patient 5  Patient 3  Patient 7  Patient

MNMC HMC All Data

59.37%

53.64%

59.99%

54.11%

51.33%

62.33%

54.6% 8

52.89%

53.27%

50.07% 53.17% 50.98% 51.9% 58.5% Patient 1 Patient 4  Patient 7  Patient 6  Patient 1  Patient 3  Patient 5  Patient 8

51.64%

61.04%

Table 5
Weights determined by OWA operator weights.

| α | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.9 | 0.49 | 0.33 | 0.17 | 0.01 | 0 | 0 | 0 | 0 |
| 0.8 | 0.32381 | 0.260952 | 0.198095 | 0.135238 | 0.072381 | 0.009524 | 0 | 0 |
| 0.7 | 0.241667 | 0.208333 | 0.175 | 0.141667 | 0.108333 | 0.075 | 0.041667 | 0.008333 |
| 0.666667 | 0.222222 | 0.194444 | 0.166667 | 0.138889 | 0.111111 | 0.083333 | 0.055556 | 0.027778 |
| 0.6 | 0.183333 | 0.166667 | 0.15 | 0.133333 | 0.116667 | 0.1 | 0.083333 | 0.066667 |
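The rows of Table 5 can be checked numerically against their optimism levels. The sketch below assumes the standard orness measure of Yager (1988), in which the jth of n ranking places contributes (n − j)/(n − 1) times its weight; the function and variable names are illustrative.

```python
def orness(weights):
    """Orness of an OWA weight vector (first entry = weight of the 1st place)."""
    n = len(weights)
    return sum((n - j) * w for j, w in enumerate(weights, start=1)) / (n - 1)


# Weights copied from Table 5, keyed by the stated optimism level.
table5 = {
    1.0: [1, 0, 0, 0, 0, 0, 0, 0],
    0.9: [0.49, 0.33, 0.17, 0.01, 0, 0, 0, 0],
    0.8: [0.32381, 0.260952, 0.198095, 0.135238, 0.072381, 0.009524, 0, 0],
    0.7: [0.241667, 0.208333, 0.175, 0.141667, 0.108333, 0.075, 0.041667, 0.008333],
    2 / 3: [0.222222, 0.194444, 0.166667, 0.138889, 0.111111, 0.083333, 0.055556, 0.027778],
    0.6: [0.183333, 0.166667, 0.15, 0.133333, 0.116667, 0.1, 0.083333, 0.066667],
}

computed = {alpha: orness(w) for alpha, w in table5.items()}
for alpha, value in computed.items():
    print(f"stated {alpha:.6f}  computed {value:.6f}  weight sum {sum(table5[alpha]):.6f}")
```

Each row sums to 1 and reproduces its stated optimism level to within the rounding of the table, and the 2/3 row matches the normalized BK weights 8/36, 7/36, ..., 1/36.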


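Table 6
OWA Operator Method and BK Method: Results.

| Location | Optimism level, α | Aggregated ranking |
|---|---|---|
| SHWH | 1 | 47261358 |
| SHWH | 0.9 | 42716358 |
| SHWH | 0.8 | 42176358 |
| SHWH | 0.7 | 42176358 |
| SHWH | 0.666666667 | 42176358 |
| SHWH | 0.6 | 42176358 |
| MNMC | 1 | 71642358 |
| MNMC | 0.9 | 16734258 |
| MNMC | 0.8 | 16734258 |
| MNMC | 0.7 | 16734528 |
| MNMC | 0.666666667 | 16734528 |
| MNMC | 0.6 | 16734528 |
| HMC | 1 | 16234587 |
| HMC | 0.9 | 12456738 |
| HMC | 0.8 | 12457368 |
| HMC | 0.7 | 12457368 |
| HMC | 0.666666667 | 12457368 |
| HMC | 0.6 | 12457368 |
| All Data | 1 | 71462358 |
| All Data | 0.9 | 14267358 |
| All Data | 0.8 | 14267358 |
| All Data | 0.7 | 14276358 |
| All Data | 0.666666667 | 14276358 |
| All Data | 0.6 | 14276358 |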

4.3. Results of the method of three mathematical programs

Table 7 presents the results determined by the method using three mathematical programs. In applying this method, we found the NLP somewhat tedious because it must be evaluated for each patient. Evaluating all three models is unnecessary because the aggregated rankings are exactly the same for any given hospital except Hershey Medical Center. Even this anomaly could be attributed to the fact that there were only ten nurse prioritizations at that location, instead of a larger number. Otherwise, the method is very easy to implement with the simple LPs.

Although there is no characteristic to show accuracy in 2 out of the 4 methods applied (Borda–Kendall and the method involving the three MPs), most of the results of these two exactly match the result from the estimation of utility intervals method, with only one rank reversal in each aggregation that did not match it. So, in summary, all methods gave similar results when only one hospital is considered. There were few rank reversals across methods for individual hospital results, and even fewer in the results from all data combined. Judging from that, the most variability for patient priority came down to the 3rd, 4th, and 5th patients to be seen. Thus, nurses are very much in agreement on who should be seen first, as well as who can wait the longest before being seen. This phenomenon should be explored further in the future. Since there is a lot of similarity in the results of the methods, a lot of emphasis is placed on the expert opinion for an overall recommendation.

Table 7
Three Mathematical Programs: Results.

| Location | MP | Aggregated ranking |
|---|---|---|
| SHWH | LP-1 | 42716358 |
| SHWH | LP-2 | 42716358 |
| SHWH | NLP-1 | 42716358 |
| MNMC | LP-1 | 16743258 |
| MNMC | LP-2 | 16743258 |
| MNMC | NLP-1 | 16743258 |
| HMC | LP-1 | 12645378 |
| HMC | LP-2 | 12456378 |
| HMC | NLP-1 | 12456378 |
| All Data | LP-1 | 14762358 |
| All Data | LP-2 | 14762358 |
| All Data | NLP-1 | 14762358 |

4.4. Expert opinion

In a meeting with the expert, she was asked to perform the prioritization exercise as done by the nurses in the study. The expert's patient prioritization is Patient 2 ≻ Patient 4 ≻ Patient 1 ≻ Patient 5 ≻ Patient 7 ≻ Patient 3 ≻ Patient 8 ≻ Patient 6. Although the expert's ranking did not coincide with any of the aggregated rankings of the combined data, she suggested that the utility interval method gave the most acceptable results and picked this method as the best. She said it is the best at clearly communicating the results and considers the percentage of how much one patient is preferred over another a very useful feature. We also note that the footrule distance between her ranking and the ranking determined by this method is 16, which is smaller than the smallest average distance obtained in the exploration of preference discrepancies for comparisons across hospitals. This distance is 17.01 and was observed from the SHWH vs. HMC data (Fields et al., 2009, Fig. 5, p. 702).

The difference between the expert's ranking and the results from all methods is not surprising. She attributes this to her expertise being in primary care rather than emergency medicine. Commenting on the disparities seen across the methods' aggregations, the expert believes the few rank reversals are not important. Overall, the expert believes the results show consistency. The expert nurse based her decision for the best choice on the method and the results. Although both were considered, the results weighed more heavily.
According to the expert, this is because nurses like to see statistics. The method approved as second-best is the BK method, due to its simplicity. Least favored are the three mathematical programs and the method which determines OWA operator weights, with the reason being their complexity. The expert says that what contributes most to the lack of appeal for these methods is no clear understanding of their need to be so complex. She does agree that these methods allow for individual nurse variations to be accounted for, though. Overall, these methods would not be the most advantageous in practice. If the expert was a triage nurse in the ED and needed to look to any method on the job as a decision-making aid to her own knowledge, she would prefer the method using the estimation of utility intervals. 4.5. Preference aggregation method recommendation According to this study, we recommend the method using estimation of utility intervals. In this method, it is easy to see how the weights are obtained and the expert stresses that nurses performing the aggregation need to have weights that make sense to them. Additionally, this method’s provision of degrees of preference is a very desirable quality that gained it a lot of the expert’s support and interest. Because of this, the method shows great representation of all nurse opinions. Moreover, the nurses are believed to appreciate how clearly the results are communicated via this method. 5. Conclusion The paper compares four aggregation methods in the area of nurse triage and prioritization in the emergency department of a hospital. In this environment, crucial and potential life-altering decisions are made by a skillful and caring staff of individuals
who sometimes work long hours in a fast-paced and ever-changing job, made so by those who arrive at their doors. A number of aspects contribute to the stress in that setting and complicate nurse decision-making processes. Additionally, prior studies have shown that nurses draw upon their personal experience, knowledge, and intuition in judging which category a patient should be triaged into and how those patients should be prioritized after they have been initially triaged. If enough time has elapsed, initial prioritizations may need to be altered to accommodate any changes in patient status. Since nurses bring their individual judgments to their decision-making, this work first shows that, when given the same set of fictional patient data, discrepancies do exist in patient prioritization. Thus, by selecting and recommending an appropriate preference aggregation method for this situation, this work seeks to benefit the healthcare industry by offering an instrument that could help increase productivity, whether it is implemented as part of a larger decision-support tool or on its own. Upon applying four of the methods to the ranking data collected, it was discovered that for each hospital individually, the aggregation results were fairly similar, with the exception of a few rank reversals, especially for SHWH. The OWA operator weights method with α = 1 performed the worst overall, which is expected because it is essentially an aggregation by majority rule. The most stable methods were the method of the three mathematical programs and the OWA operator weights method for α values between 0.6 and 0.8. An expert opinion was also solicited to help evaluate the results of applying the aggregation methods to the data and to obtain a suggestion from someone who would best know what may realistically work well if incorporated in some way into these nurses’ decision-making environment.
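The footrule distance used in Section 4.4 to compare the expert’s ranking with an aggregated ranking can be illustrated with a minimal sketch. It assumes the standard Spearman footrule (the sum, over patients, of the absolute difference in rank position). The function name `footrule_distance` and the `hypothetical_aggregate` ordering are illustrative assumptions; the second ordering is invented for the example and is not a result of this study.

```python
def footrule_distance(ranking_a, ranking_b):
    """Spearman footrule: sum over items of |position in a - position in b|.

    Each ranking is a list of item labels ordered from highest to lowest
    priority. Both rankings must contain the same items exactly once.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicate items")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    # Map each item to its position in the second ranking.
    position_in_b = {item: index for index, item in enumerate(ranking_b)}
    return sum(abs(index - position_in_b[item])
               for index, item in enumerate(ranking_a))


# Expert's ordering of patient numbers as reported in Section 4.4.
expert = [2, 4, 1, 5, 7, 3, 8, 6]

# Hypothetical aggregate ordering, invented for illustration only.
hypothetical_aggregate = [4, 2, 1, 7, 5, 3, 6, 8]

print(footrule_distance(expert, hypothetical_aggregate))
```

Under this definition, two fully reversed rankings of eight patients are 32 apart, which gives some scale for reading a distance of 16.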
The expert’s ranking of the patients was not reflected in any of the aggregation results of the combined data, but had a fairly small footrule distance of 16 from the ranking produced by the method that estimates utility intervals. The methods which had designations of accuracy were well received by the expert, and her final suggestion was the method that estimates utility intervals. Based on the expert’s opinion, this study recommends the method that estimates utility intervals as the most suitable preference aggregation method. The BK method is the second-best recommendation. Future work could expand this study to include more hospitals, nurses, methods, or a more extensive scenario set. New preference aggregation methods could also be developed specifically for this type of decision-making scenario. The final method recommended here can be implemented by itself and employed during peak or particularly stressful times if desired, or combined with a decision support tool for more impact. Regardless of the next steps inspired by this study, it is sure to add value to the US healthcare industry by increasing efficient practices.

References

Andersson, A. K., Omberg, M., & Svedlund, M. (2006). Triage in the emergency department – a qualitative study of the factors which nurses consider when making decisions. Nursing in Critical Care, 11(3), 136–145.
Arrow, K. J. (1951). Social Choice and Individual Values. New York: Wiley.
Beveridge, R. (1998). The Canadian triage and acuity scale: A new and critical element in health care reform. Journal of Emergency Medicine, 16(3), 507–511.
Bogart, K. P. (1973). Preference structures I: Distances between transitive preference relations. Journal of Mathematical Sociology, 3, 49–67.
Bogart, K. P. (1975). Preference structures II: Distances between asymmetric relations. SIAM Journal on Applied Mathematics, 29(2), 254–262.
Borda, J. C. (1784). Mémoire sur les élections au scrutin. Histoire de l’Académie Royale de Science 1784; Paris. (Translated in The political theory of Condorcet. Sommerlad, F., & McLean, I. Social studies. Working paper 1/89, Oxford, 1989).
Buesching, D. P., Jablonowski, A., Vesta, E., Dilts, W., Runge, C., Lund, J., et al. (1985). Inappropriate emergency department visits. Annals of Emergency Medicine, 14(7), 672–676.
Cone, K. J., & Murray, R. (2002). Characteristics, insights, decision making, and preparation of ED triage nurses. Journal of Emergency Nursing, 28(5), 401–406.
Cook, W. D., & Kress, M. (1990). A data envelopment model for aggregating preference rankings. Management Science, 36(11), 1302–1310.
Cook, W. D., Kress, M., & Seiford, L. M. (1996). A general framework for distance-based consensus in ordinal ranking models. European Journal of Operational Research, 96, 392–397.
Cook, W. D., & Seiford, L. M. (1978). Priority ranking and consensus formation. Management Science, 24(16), 1721–1732.
Fields, E., Claudio, D., Okudan, G. E., Smith, C., & Freivalds, A. (2009). Triage decision making: Discrepancies in assigning the Emergency Severity Index. In Proceedings of the 2009 Industrial Engineering Research Conference, May 30–June 3, 2009 (pp. 699–704).
Gilboy, N., Tanabe, P., Travers, D. A., Rosenau, A. M., & Eitel, D. R. (2005). Emergency Severity Index, Version 4: Implementation Handbook. AHRQ Publication No. 05-0046-2. Rockville, MD: Agency for Healthcare Research and Quality.
Inada, K. (1964). A note on the simple majority rule. Econometrica, 32(4), 525–531.
Kemeny, J. G., & Snell, J. L. (1962). Preference ranking: An axiomatic approach. In Mathematical Models in the Social Sciences (pp. 9–23). New York: Ginn.
Kendall, M. (1962). Rank Correlation Methods (3rd ed.). New York: Hafner.
Liu, X. (2009). Parameterized defuzzification with continuous weighted quasi-arithmetic means – An extension. Information Sciences, 179(8), 1193–1206.
Öztürk, M., & Tsoukiàs, A. (2008). Bipolar preference modeling and aggregation in decision support. International Journal of Intelligent Systems, 23, 970–984.
Patel, V. L., Gutnik, L. A., Karlin, D. R., & Pusic, M. (2008). Calibrating urgency: Triage decision making in a pediatric emergency department. Advances in Health Sciences Education, 13, 503–520.
Peneva, V., & Popchev, I. (2007). Aggregation of fuzzy preference relations to multicriteria decision making. Fuzzy Optimization and Decision Making, 6, 351–365.
Tamiz, M., & Foroughi, A. A. (2007). An enhanced approach to the ranked voting system. World Review of Entrepreneurship, Management and Sustainable Development, 3(3–4), 365–372.
Wang, J. W., Chang, J. R., & Cheng, C. H. (2006). Flexible fuzzy OWA querying method for hemodialysis database. Soft Computing, 10(11), 1031–1042.
Wang, Y. M., Chin, K. S., & Yang, J. B. (2007a). Three new models for preference voting and aggregation. Journal of the Operational Research Society, 58, 1389–1393.
Wang, Y. M., Luo, Y., & Hua, Z. (2007b). Aggregating preference rankings using OWA operator weights. Information Sciences, 177, 3356–3363.
Wang, Y. M., & Parkan, C. (2005). A minimax disparity approach for obtaining OWA operator weights. Information Sciences, 175, 20–29.
Wang, Y. M., Yang, J. B., & Xu, D. L. (2005). A preference aggregation method through the estimation of utility intervals. Computers & Operations Research, 32, 2027–2049.
Yager, R. (1988). On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Transactions on Systems, Man, and Cybernetics, 18, 183–190.
Zimmermann, P. G. (2001). The case for a universal, valid, reliable 5-tier triage acuity scale for US emergency departments. Journal of Emergency Nursing, 27(3), 246–254.