European Journal of Operational Research 219 (2012) 360–367
Decision Support
Can a linear value function explain choices? An experimental study ☆
Pekka J. Korhonen, Kari Silvennoinen, Jyrki Wallenius ⇑, Anssi Öörni
Aalto University, School of Economics, Department of Business Technology, P.O. Box 21220, 00076 Aalto, Helsinki, Finland

Article info
Article history: Received 1 July 2011; Accepted 20 December 2011; Available online 12 January 2012
Keywords: Linear value function; Inconsistency; Multiple criteria; Weights; Binary choices

Abstract
We investigate, in a simple bi-criteria experimental study, whether subjects are consistent with a linear value function while making binary choices. Many inconsistencies appeared in our experiment. However, the impact of inconsistencies on the linearity vs. non-linearity of the value function was minor. Moreover, a linear value function seems to predict choices for bi-criteria problems quite well. This ability to predict is independent of whether the value function is diagnosed linear or not. Inconsistencies in responses did not necessarily change the original diagnosis of the form of the value function. Our findings have implications for the design and development of decision support tools for Multiple Criteria Decision Making problems.
© 2012 Elsevier B.V. All rights reserved.
☆ The research was supported by the Academy of Finland (Grant Number 121980). All rights reserved. This study may not be reproduced in whole or in part without the authors' permission.
⇑ Corresponding author. Tel.: +358 9 47001. E-mail addresses: Pekka.Korhonen@aalto.fi (P.J. Korhonen), jyrki.wallenius@aalto.fi (J. Wallenius).
0377-2217/$ - see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2011.12.040

1. Introduction

The use of linear models, in one form or another, is common in decision making (Bigelow, 1887; Dawes and Corrigan, 1974; Zionts and Wallenius, 1976; Dawes, 1979; Saaty, 1980; Zionts, 1981; Phelps and Köksalan, 2003). One of the first uses of a linear value function model is due to Benjamin Franklin (in Bigelow, 1887). He advocated a simple scoring system, where he would list the pro and con arguments and then find "where the balance lies". Dawes and Corrigan (1974) and Dawes (1979) review the use of linear models in decision making and explore the reasons why such models have been successful. Their focus is, however, somewhat different from ours, namely how well linear models (such as regression models) perform in prediction. They argue that many decision contexts have structural characteristics which make linear models appropriate. In some contexts the use of randomly chosen weights outperformed expert judges.

Zionts and Wallenius (1976), Zionts (1981), and Phelps and Köksalan (2003) work with an approximate linear value function. They realized early on that humans may exhibit inconsistencies with it, and suggested periodically purging the oldest responses. Their argument is that a linear value function is not a bad first approximation. Saaty (1980), in his Analytic Hierarchy Process, uses a linear value function to aggregate criteria. He also realized early on that humans are seldom consistent with a linear value function in their responses and suggested the use of an inconsistency index to monitor the extent to which humans exhibit such inconsistency.

Many ranking and sorting schemes are based on the use of a linear aggregate value function (Köksalan et al., 2010; Köksalan and Ulu, 2003), and so are simple scoring models. See also Scheubrein and Bossert (2001). Data Envelopment Analysis, a popular performance measurement approach, is based on the use of linear programming to aggregate multiple outputs (Charnes et al., 1978), as is weighted Goal Programming (Charnes and Cooper, 1961).

Linear models have their proponents, yet they have also faced criticism both as descriptive and normative models of choice (Korhonen and Wallenius, 1989). We take a careful new look at this old issue, namely whether a linear value function can explain choices. We are aware of the behavioral literature, such as Simon (1956), Tversky (1969), and Einhorn and Hogarth (1981), which essentially challenges the existence of any utility (value) function. However, to our knowledge there is no prior empirical literature looking at how well a linear value function approximately explains choices.

One obvious finding from our study is that people behave inconsistently with respect to a linear value function, or any function. For instance, they choose a dominated alternative, make a different choice in an identical decision situation, or do not follow a certain decision rule (such as a linear value function), and so forth. There are many reasons for the appearance of such inconsistencies, for instance: (1) people do not pay enough attention to choices, (2) they are simply unable to be fully consistent with a decision rule (in our case, a linear value function), (3) they change their mind, or (4) people simply make errors. We have studied such inconsistencies and modeled their impact on the diagnosis of a linear value function.
Moreover, we have studied how well a linear value function enables us to predict subjects' choices. The weights of the linear function have been estimated using a formulation maximizing the minimum preference difference in choices. One of our important results is that a linear value function seems to predict choices quite well in the case of two criteria. This result is independent of whether the value function was diagnosed linear or not. However, if a linear value function is used to model a decision maker's choices (preferences), the estimation method has to tolerate inconsistent responses. Our $\varepsilon$ formulation does this. See also Zionts and Wallenius (1976) and the Analytic Hierarchy Process (AHP) by Saaty (1980).

This paper unfolds as follows. Section 2 provides preliminary theoretical considerations. Section 3 describes the experiment and Section 4 presents our findings. Section 5 discusses our findings and concludes the paper.

2. Basics

In this section, we introduce basic definitions and prove the results needed in setting up our experiment and understanding the findings of our empirical study. For instance, if the subject's value function is linear, we show how the original values can be scaled while the choices should remain the same. We also provide the details of our weight estimation procedure.
2.1. Some theory

Consider a discrete, finite, deterministic multiple criteria evaluation problem where a single decision maker (DM) compares a set of $n$ alternatives with respect to $p$ criteria. The set $S$ of alternatives includes vectors $X_i \in \mathbb{R}^p$, $i \in N = \{1, \ldots, n\}$. Without loss of generality, assume that for each criterion more is better. We define nondominance in $\mathbb{R}^p$ in the usual way.

Definition 1. A vector $X^* \in \mathbb{R}^p$ is nondominated iff (if and only if) there does not exist another $X \in \mathbb{R}^p$ such that $X \geq X^*$ and $X^* \neq X$.

In the following we use the symbol "$\succ$" to indicate the relationship "is preferred to." We assume the relation $\succ$ is transitive. The DM's preferences are expressed by the set $P = \{(X_r, X_s) \mid X_r \succ X_s,\ r, s \in N\}$,¹ called a preference set. Thus $P$ defines a strict partial order in $S$.

¹ If $X_i$ dominates $X_j$, $i, j \in N$, then $(X_i, X_j) \in P$.

Definition 2. Any strictly increasing function $v: \mathbb{R}^p \to \mathbb{R}$ is called a value function.

Definition 3. If $v(X_r) > v(X_s)$ for all $(X_r, X_s) \in P$, $v$ is said to be consistent with $P$.

Definition 4. Assume $P = \{(X_r, X_s) \mid X_r \succ X_s,\ r, s \in N\}$ consists of preference information available about alternatives $X_i$, $i \in N$. If there exists a weight vector $\lambda > 0$ such that $\sum_{j=1}^{p} \lambda_j x_{rj} > \sum_{j=1}^{p} \lambda_j x_{sj}$ for all $(X_r, X_s) \in P$, then a linear value function $\sum_{j=1}^{p} \lambda_j x_{ij}$, $i = 1, 2, \ldots, n$, is consistent with the DM's preferences in $P$.

Lemma 1. The consistency property of a linear value function is invariant under the linear transformation of the criteria: $x_{ij} \to \alpha x_{ij} + \beta_j$, $i \in N$, $j = 1, 2, \ldots, p$, and $\alpha > 0$.

Proof. Assume a linear value function $\sum_{j=1}^{p} \lambda_j x_{ij}$, $i = 1, 2, \ldots, n$, with vector $\lambda > 0$, is consistent with the DM's preferences. Replace $x_{ij}$ by $\alpha x_{ij} + \beta_j$, $i \in N$, $j = 1, 2, \ldots, p$, and $\alpha > 0$ in the linear value function. Hence we obtain

\[
\begin{aligned}
\sum_{j=1}^{p} \lambda_j x_{rj} > \sum_{j=1}^{p} \lambda_j x_{sj}
&\iff \alpha \sum_{j=1}^{p} \lambda_j x_{rj} > \alpha \sum_{j=1}^{p} \lambda_j x_{sj} \\
&\iff \sum_{j=1}^{p} \lambda_j \alpha x_{rj} + \sum_{j=1}^{p} \lambda_j \beta_j > \sum_{j=1}^{p} \lambda_j \alpha x_{sj} + \sum_{j=1}^{p} \lambda_j \beta_j \\
&\iff \sum_{j=1}^{p} \lambda_j (\alpha x_{rj} + \beta_j) > \sum_{j=1}^{p} \lambda_j (\alpha x_{sj} + \beta_j), \quad \text{for all } (X_r, X_s) \in P.
\end{aligned}
\]

In Lemma 2, we prove that if $\alpha$ (scaling coefficient) is not the same for all criteria, then the consistency property remains. The weights have to be changed accordingly.

Lemma 2. If a linear value function $\sum_{j=1}^{p} \lambda_j x_{ij}$, $i = 1, 2, \ldots, n$, with vector $\lambda \geq 0$, is consistent with the DM's preferences, then the linear value function $\sum_{j=1}^{p} \mu_j (\alpha_j x_{ij} + \beta_j)$, $i \in N$ and $j = 1, 2, \ldots, p$, where $\alpha_j > 0$, $\mu_j = \lambda_j / \alpha_j$, is consistent with all preferences $(X_r, X_s) \in P$.

Proof. Replace $x_{ij}$ by $\alpha_j x_{ij} + \beta_j$ in the linear value function:

\[
\begin{aligned}
\sum_{j=1}^{p} \mu_j (\alpha_j x_{rj} + \beta_j) > \sum_{j=1}^{p} \mu_j (\alpha_j x_{sj} + \beta_j)
&\iff \sum_{j=1}^{p} \mu_j \alpha_j x_{rj} > \sum_{j=1}^{p} \mu_j \alpha_j x_{sj} \\
&\iff \sum_{j=1}^{p} \lambda_j x_{rj} > \sum_{j=1}^{p} \lambda_j x_{sj},
\end{aligned}
\]

because $\alpha_j > 0$. We define $\lambda_j = \mu_j \alpha_j \Rightarrow \mu_j = \lambda_j / \alpha_j$. Thus $\sum_{j=1}^{p} \mu_j (\alpha_j x_{ij} + \beta_j)$ is consistent with all preferences $(X_r, X_s) \in P$. □

Corollary 1. If there exists no vector $\lambda > 0$ such that the linear value function $\sum_{j=1}^{p} \lambda_j x_{ij}$, $i = 1, 2, \ldots, n$, is consistent with preferences $(X_r, X_s) \in P$, then there exists no vector $\mu \geq 0$ such that the linear function of the form $\sum_{j=1}^{p} \mu_j (\alpha_j x_{ij} + \beta_j)$, $\alpha_j > 0$, is consistent with all $(X_r, X_s) \in P$.

Proof. The result follows directly from Lemmas 1 and 2. □

2.2. Basic model

A simple way to study the consistency of a linear value function with preference set $P$, and a way to find its weights, is to formulate the problem as an LP problem. For each pair $(X_r, X_s) \in P$, formulate the inequality $\sum_{j=1}^{p} \lambda_j x_{rj} - \varepsilon \geq \sum_{j=1}^{p} \lambda_j x_{sj}$, where $\sum_{j=1}^{p} \lambda_j = 1$ and $\lambda_j > 0$, $j = 1, 2, \ldots, p$, and maximize $\varepsilon$:

\[
\begin{aligned}
\max\ & \varepsilon \\
\text{s.t.}\ & \sum_{j=1}^{p} \lambda_j x_{rj} - \varepsilon \geq \sum_{j=1}^{p} \lambda_j x_{sj}, \quad \text{for all } (X_r, X_s) \in P, \\
& \sum_{j=1}^{p} \lambda_j = 1, \\
& \lambda_j \geq \delta, \quad j = 1, 2, \ldots, p,
\end{aligned}
\tag{1}
\]

where $\delta > 0$ is non-Archimedean. If $\varepsilon > 0$ in model (1), then there exists a linear value function consistent with all preferences $(X_r, X_s) \in P$. If $\varepsilon \leq 0$, there does not exist a linear value function consistent with the preference information. Note that the preference information only includes strict preferences.
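For two criteria, model (1) can be sketched without an LP solver. Substituting $\lambda_2 = 1 - \lambda_1$ turns every preference constraint into a line in $\lambda_1$, and $\varepsilon$ is the largest value of the lower envelope of those lines. The following Python sketch rests on assumptions that are not part of the paper: the function name `solve_model_1` is hypothetical, and the non-Archimedean $\delta$ is replaced by a small positive number. The data are the three preference statements of the paper's illustrative example.

```python
from fractions import Fraction


def solve_model_1(preferences, delta=Fraction(1, 1000)):
    """Solve model (1) for p = 2 criteria (illustrative sketch).

    preferences: list of ((xr1, xr2), (xs1, xs2)), where X_r is preferred to X_s.
    delta: small positive stand-in for the non-Archimedean constant (assumption).
    Returns (epsilon, (lambda1, lambda2)).
    """
    # With lambda2 = 1 - lambda1, each constraint reads
    #   epsilon <= slope * lambda1 + intercept.
    lines = []
    for (xr1, xr2), (xs1, xs2) in preferences:
        d1, d2 = Fraction(xr1 - xs1), Fraction(xr2 - xs2)
        lines.append((d1 - d2, d2))

    def eps_at(lam):
        return min(slope * lam + intercept for slope, intercept in lines)

    # eps_at is concave and piecewise linear, so its maximum on
    # [delta, 1 - delta] is at an end point or where two lines cross.
    candidates = [delta, 1 - delta]
    for i, (s1, c1) in enumerate(lines):
        for s2, c2 in lines[i + 1:]:
            if s1 != s2:
                lam = (c2 - c1) / (s1 - s2)
                if delta <= lam <= 1 - delta:
                    candidates.append(lam)
    best = max(candidates, key=eps_at)
    return eps_at(best), (best, 1 - best)


# The three strict preferences of the illustrative example.
pairs = [((3, 8), (2, 10)), ((2, 6), (5, 2)), ((4, 5), (6, 4))]
eps, weights = solve_model_1(pairs)

# Dropping the first pair, and reversing the second pair instead.
eps_without_first, _ = solve_model_1(pairs[1:])
eps_reversed, weights_reversed = solve_model_1(
    [pairs[0], ((5, 2), (2, 6)), pairs[2]]
)
```

With these data the sketch yields a negative `eps` with equal weights, a positive `eps_without_first`, and an unchanged solution when the second preference is reversed. For more than two criteria a general LP solver would be needed instead of this one-dimensional search.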
Fig. 1. Illustrating the use of model (1).
Example. To illustrate the use of the model, consider the following simple example. Suppose that we have three pairs of choices, each evaluated with two criteria: [(3, 8); (2, 10)], [(2, 6); (5, 2)], and [(4, 5); (6, 4)]. Suppose further that a subject has specified that the first alternative is preferred to the second in each of the three cases: $(3, 8) \succ (2, 10)$; $(2, 6) \succ (5, 2)$; $(4, 5) \succ (6, 4)$. When we insert these values and preferences into model (1), we obtain the solution $\varepsilon = -0.5$ and $\lambda_1 = \lambda_2 = 0.5$. Because $\varepsilon < 0$, it is not possible to find a linear value function which is consistent with the three preference statements.

Fig. 1 illustrates the situation. The preference direction is described with a pointed arrow (pointing towards the inferior alternative). The indifference contour of a linear value function with weights $\lambda_1 = \lambda_2 = 0.5$ is located at the more preferred alternative in each pair. As we can see, the estimated linear function violates the preference relation for pairs [(3, 8); (2, 10)] and [(4, 5); (6, 4)]. In both cases, the value of the linear function is higher at the less preferred alternative. The difference is 0.5. However, even if the consistency according to the solution of our model is violated in two pairs, dropping one pair, [(3, 8); (2, 10)], from the model restores consistency ($\varepsilon > 0$). Note that a change of mind does not necessarily affect the estimation. For instance, the preference for pair [(2, 6); (5, 2)] may be reversed without any impact on the consistency with a linear value function, or on the weights.

3. The experiment

3.1. Subjects

One hundred and forty-four sophomores at the Helsinki School of Economics² participated in the experiment. All subjects were students in an introductory Management Science course with some experience in using computer models. They were recruited on a voluntary basis and received a lunch coupon for participating. They were motivated to participate and the task was of interest, highly relevant, and important to them. We gathered the data in the spring of 2009.

² Currently Aalto University, School of Economics.

3.2. The experimental task

We developed a Web-based application for data collection. The program first queried for some background information related to past success with studies. Then the program asked for realistic lower and upper limits for both criteria. Based on the revealed preferences and the range of realistic values, the program then generated 20 pairs of bi-criteria alternatives as a basic choice set. The alternatives were presented as they appear in Table 1. We showed the value pairs on the computer screen, and the subjects were expected to choose the most preferred alternative. The choice situation was symmetric and there was no reference alternative. We simply asked: which one (pair) do you prefer?

Participants were asked to make pairwise choices between two-criteria alternatives. The two criteria for the problem were Credit Points (ECTS = European Credit Transfer and Accumulation System)³ and Grade Point Average (GPA) for the next academic year of studies. For instance, we asked them to choose a preferred alternative out of two alternatives: (40, 70) vs. (50, 65). In other words, they had to weigh whether the gain of 10 units in Credit Points compensates for the loss of 5 units in Grade Point Average. For each subject, we randomly generated a set of pairs of alternatives and asked him/her to choose the preferred one. The range for the criteria was individually obtained from each student and was based on his/her individual expected max/min values. The principles for generating the pairs of alternatives are described in detail in Table 1.

We felt that the decision context was relevant for the students. At least implicitly (if not explicitly), they have to make tradeoffs between grades and credit points when allocating time for their studies. The instrument was pre-tested in a pilot study with a number of graduate students. The pilot study revealed some issues with the wording of the questions, which necessitated changes to the data collection mechanism.
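The re-scaling principle listed in Table 1 can be illustrated with a short Python sketch. It applies $x_{ij} \to \alpha x_{ij} + \beta_j$ to pair 1 of the Table 1 example and checks, on a grid of weights, that a linear value function ranks the original and the re-scaled pair identically (Lemma 1). The helper names are hypothetical, and $\beta_1 = -36.7$ and $\beta_2 = -45.3$ are taken as negative, an assumption based on the fact that only negative constants reproduce pair 23 from pair 1 with $\alpha = 5/3$.

```python
def rescale(pair, alpha, betas):
    """Apply x_ij -> alpha * x_ij + beta_j to both alternatives of a pair."""
    return tuple(
        tuple(alpha * x + beta for x, beta in zip(alternative, betas))
        for alternative in pair
    )


def linear_value(alternative, weights):
    """Weighted sum of the criterion values."""
    return sum(w * x for w, x in zip(weights, alternative))


def prefers_first(pair, weights):
    """True if the linear value function ranks the first alternative higher."""
    first, second = pair
    return linear_value(first, weights) > linear_value(second, weights)


# Pair 1 from the Table 1 example: (ECTS credits, GPA) for each alternative.
pair_1 = ((49, 61), (64, 60))
alpha, betas = 5 / 3, (-36.7, -45.3)  # signs of the betas are an assumption

pair_23 = rescale(pair_1, alpha, betas)
rounded = tuple(tuple(round(v) for v in alt) for alt in pair_23)

# The same weights should give the same choice before and after re-scaling.
grid = [(i / 20, 1 - i / 20) for i in range(1, 20)]
all_same = all(prefers_first(pair_1, w) == prefers_first(pair_23, w) for w in grid)
```

Here `rounded` reproduces the values shown for pair 23 in Table 1, and `all_same` confirms that the ranking does not depend on the re-scaling when the same $\alpha$ is used for both criteria.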
Note that the re-scaling was done in such a way that a subject should choose the same alternative in a re-scaled situation, given that his/her decision rule is consistent with a linear value function. We used two kinds of re-scalings to check this.

3.3. Procedure

The experiment was organized as follows. The data collection application was installed on a web server and was accessed over the network with a web browser. The experiment took place in our university's computer lab, and each participant had a personal computer at his/her disposal. At the beginning of the experiment, the organizer outlined the experiment in broad terms and motivated the participants. He then displayed the web address of the data collection application and explained how to start the application. The application was self-documenting and, hence, little further information was provided to the participants besides assistance with technical problems.

3.4. Analysis

Our conclusions are mainly based on the binomial probabilities $p$ that a subject will make a certain choice. The computations are carried out based on the responses of the subjects. We use the maximum likelihood principle to estimate $p$, i.e. we maximize the likelihood function $L(p)$:
\[
\max L(p) = \prod_{i=0}^{k} \left( p^{i} (1 - p)^{k-i} \right)^{f_i} = p^{\sum_{i=0}^{k} i f_i} \, (1 - p)^{\sum_{i=0}^{k} (k - i) f_i},
\tag{2}
\]
where $k$ refers to the total number of choices of interest and $f_i$ refers to the number of subjects who make $i$ choices of interest ($i = 0, 1, \ldots, k$). For example, if we count how many times each subject responded to the re-scaled questions 23–27 differently than to the original ones (see, e.g., Table 5), we can estimate the probability that one response of an (average) subject is consistent with a linear value function. In Table 5, for instance, $k$ is 5 and 13 subjects ($f_i = 13$) made 2 ($i = 2$) choices inconsistent with a linear value function.

³ US credit hours can be transformed to ECTS credits by multiplying them by 1.5. Hence a 4 credit hour course corresponds to 6 ECTS credits.

Table 1
Principles for generating pairs of alternatives.

| Pairs | Description | Examples |
|---|---|---|
| 1–20 | Randomly generated nondominated pairs. The criterion values were within the subjective range provided by the subject. | The first criterion is ECTS credits and the second GPA. Which do you prefer? (40, 75) vs. (50, 60) |
| 21–22 | Randomly generated pairs, where one dominated the other. | Which do you prefer? (54, 85) vs. (60, 88) |
| 23–27 | The pairs were the same as pairs 1, 4, 11, 17, and 19, but multiplied with a positive number such that one value was on the lower or upper bound of the range given by the subject. | Pair 1: (49, 61) vs. (64, 60). Pair 23: (45, 56) vs. (70, 55). In pair 23, we have made a transformation $\alpha = 5/3$, $\beta_1 = -36.7$, and $\beta_2 = -45.3$.ᵃ |
| 28–32 | The pairs were the same as 1, 4, 11, 17, and 19, but the coefficient was chosen in such a way that one of the lower/upper bounds was exceeded by 10%. | Pair 1: (49, 61) vs. (64, 60). Pair 28: (36, 45) vs. (77, 43). Pair 28 was generated as above, but using the range enlarged by 10%. |
| 33–34 | Control questions. | The pairs were identical with pairs 4 and 6. |

ᵃ Constants $\beta_1$ and $\beta_2$ were found by multiplying the midpoints of the respective subjective criterion ranges by $1 - \alpha$, and $\alpha$ was chosen in such a way that one criterion value was at a lower/upper bound. (In the example, value 70 was the upper bound of the first criterion.) Pairs 23–32 are called re-scaled pairs.

The solution of (2) is simply
\[
p = \frac{\sum_{i=0}^{k} i f_i}{k n},
\tag{3}
\]

where $n = \sum_{i=0}^{k} f_i$ is the number of subjects who responded to the question. The confidence limits for $p$ can be computed in the usual way by using a standard normal distribution approximation:

\[
p \pm z \sqrt{\frac{p (1 - p)}{n - 1}},
\tag{4}
\]

where $z = z(1 - \alpha/2)$ is found from the standard normal distribution function, when the risk level $\alpha$ is chosen. In our considerations, we use a 5% risk level.

4. Findings

Our findings are discussed in more detail in the following subsections. We consider three research questions:

• consistency of the preferences with a linear value function
• impact of re-scalings on consistency of responses
• ability to predict choices with a linear value function

4.1. Consistency of preferences with a linear value function

First, we studied to what extent a linear value function could be used to model subjects' choices. Our basic analysis is based on the answers of subjects to the first 20 pairs of questions.⁴ We used model (1) for consistency checking. When the optimal value $\varepsilon > 0$, it was possible to find weights ($\lambda$) such that all answers were consistent with a linear value function; otherwise not. The results are given in Fig. 2.

Fig. 2. Consistency of responses with linear value function.

From Fig. 2, we can see that only 38.9% of the subjects made choices which were fully consistent with a linear value function. The 95-percent confidence interval for this ratio (expressed as a percentage) is (30.9%, 46.9%). Based on our simple bi-criteria experiment and the above confidence interval, we conclude that a majority of people do not make 20 binary choices which are fully consistent with a linear value function.

⁴ Not all subjects expressed a strict preference about all 20 questions. We only used those strict preferences, because we were not reliably able to make a difference between "indifference" and "not being able to choose".
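Formulas (3) and (4) can be sketched in a few lines of Python. The function names below are hypothetical, and the example treats "fully consistent with a linear value function" as a single choice of interest per subject ($k = 1$), which is an illustrative reading of the Fig. 2 result (56 of 144 subjects) and not a procedure stated in the paper.

```python
from math import sqrt
from statistics import NormalDist


def ml_estimate(freq, k):
    """Formula (3): freq[i] is the number of subjects with i choices of interest."""
    n = sum(freq.values())
    p = sum(i * f for i, f in freq.items()) / (k * n)
    return p, n


def confidence_limits(p, n, risk=0.05):
    """Formula (4): normal-approximation limits p +/- z * sqrt(p(1-p)/(n-1))."""
    z = NormalDist().inv_cdf(1 - risk / 2)
    half_width = z * sqrt(p * (1 - p) / (n - 1))
    return p - half_width, p + half_width


# 56 subjects fully consistent, 88 not (k = 1 is an illustrative assumption).
p, n = ml_estimate({1: 56, 0: 88}, k=1)
low, high = confidence_limits(p, n)
```

Rounded to three decimals, `p`, `low` and `high` agree with the 38.9% share and the (30.9%, 46.9%) interval reported above.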
It is interesting to try to understand why the empirical results show that the subjects' choices are not fully consistent with a linear value function. We recognize at least four different reasons:

1. A linear value function is too simple to describe the subjects' preferences.
2. Subjects do not pay enough attention to choices; the problem is not important enough for them.
3. Subjects change their mind or are simply not able to be fully consistent with a linear value function.
4. Subjects simply make errors in their choices.

We studied reasons 2–4 by focusing on the following questions:

1. How do subjects choose when one alternative dominates the other (questions 21–22)?
2. How do subjects choose when identical pairs (33–34) are presented to them?
3. How many binding constraints have to be removed from model (1) (one by one, focusing on the highest shadow price) before consistency with a linear value function is restored?

The subjects' commitment to our experiment was tested by asking them to make two choices in which one alternative dominated the other. In our simple situation, it is difficult to find reasons for a subject to prefer a dominated alternative. We call such choices "irrational". We believe that the reason for choosing a dominated alternative is that the subject did not pay enough attention to the choice. The results are given in Fig. 3. We use the maximum likelihood principle to estimate the probability $p$ that a subject notices the dominance relation. We find the value for $p$ as follows (formula 3):
\[
p = (133 \cdot 2 + 8 \cdot 1 + 3 \cdot 0) / (144 \cdot 2) = 274/288 = 0.951.
\]

The 95-percent confidence interval for this probability is (0.916, 0.987) (see formula 4). In about five percent of these choices, the subject did not notice that one alternative dominated the other. This may be regarded as an indication of how perceptive the subjects were after responding to 20 questions. In other words, subjects make errors in their choices, because they do not pay enough attention to a choice.

Fig. 3. Number of times a dominated alternative was preferred.

To study how consistently subjects choose, we asked them to evaluate two pairs (33, 34) which were identical with pairs 4 and 6 (see Fig. 4). Surprisingly, only half of the subjects (50.7%) repeated their original choice in both pairs. Note that these control questions were presented at the end of a long sequence of binary choices. Had these control questions been presented earlier, the percentage of subjects making the same choice twice might well have been higher. As before, using the maximum likelihood principle, we estimated the probability that a subject makes a consistent choice when two identical pairs are presented to him/her:

\[
P\{\text{subject chooses consistently when two identical pairs are presented to him/her}\} = 0.708.
\]

A 95-percent confidence interval for the probability is (0.656, 0.761). We notice that this probability is clearly smaller than the probability of choosing an alternative which dominates another. Hence we conclude that people quite often make inconsistent choices. This may happen accidentally and/or people may change their mind during the choice process and/or simply get tired.

Fig. 4. Consistency of choices between identical pairs.

Table 2
Consistency with linear model based on original questions vs. replacing two questions by control questions.

| Control questions | Original questions: Yes | Original questions: No | Total |
|---|---|---|---|
| Yes | 55 | 4 | 59 |
| No | 1 | 84 | 85 |
| Total | 56 | 88 | 144 |

Although only half (50.7%) of the subjects were fully consistent (Fig. 4), the impact of inconsistencies on the linearity vs. non-linearity of the value function was minor. As we can see from Table 2,
only one subject's value function from column one did not test linear with the control questions, and on the other hand only four subjects' value functions changed status from nonlinear to linear. A test did not reject the null hypothesis that the ratio of the change in status from linear to nonlinear (1/56 = 0.018) was the same as the ratio of the change in status from nonlinear to linear (4/88 = 0.045). As we discussed in the context of our example, a change of preference does not necessarily have an impact on the test of the functional form of the value function. If a change in a stated preference has no effect on the binding constraints of the optimal solution of model (1), it has no impact on the test concerning the functional form of the value function (see, e.g., Fig. 1).

Based on the above, we assume that subjects may choose according to a linear value function, but they are not fully consistent in their responses. That is why we wanted to study how many responses we had to purge to make the subjects' responses fully consistent with a linear value function.⁵ In Table 3, we report the results when binding constraints of model (1) were removed one by one in such a way that the constraint with the highest shadow price was purged in each step. The first column indicates the number of strict preference comparisons the subjects made. As reported already in Fig. 2, 56 subjects (38.9%) out of 144 were fully consistent with the linear value function. If we allow the subjects to make 5% inconsistent choices (lightly shaded area in the table), then we conclude that 93 subjects (64.6%) out of 144 are consistent with the linear value function. Correspondingly, allowing 10% inconsistencies in choices, the number of "consistent with linear value function" subjects increases to 120 (83.3%).

Let us consider more closely the cases (totaling 50), where the subjects provided a strict preference for all 20 pairs.
We use this distribution in estimating the probability that a single choice of a subject is consistent with a linear value function. Using the maximum likelihood principle, we obtain $p = 0.920$. A 95-percent confidence interval is (0.844, 0.996). When $n = 20$, the probability is 0.189 ($= 0.920^{20}$) that all choices of a person are consistent with a linear value function. Moreover, the probability is 0.517 (about fifty-fifty) that a person will make at most one inconsistent choice out of 20 choices. The corresponding probabilities for at most two and three inconsistent choices are 0.788 and 0.929, respectively. If the subjects are asked to make a sufficiently large number of choices, the probability approaches 1 that at least one of the choices is inconsistent with a linear value function.

The conclusion from the above is that we should accept some inconsistent choices. The justification for the use of a linear value function cannot solely be based on consistency of responses. A good strategy may be to eliminate choices which cause inconsistencies as long as there are not too many of them.⁶

⁵ When several binding constraints are purged one by one, the number of purged constraints is not necessarily minimal.
⁶ Note, however, that the "wrong" choice is not necessarily the choice which makes the choices inconsistent with the linear value function (see Fig. 1).

The results in Table 4 are based on an analysis, where we have estimated the weights of the linear value function (model (1))
Table 3
Number of binding constraints that had to be removed to make the model consistent with a linear model.ᵃ

| # of strictly preferred pairs | Removed: 0 | Removed: 1 | Removed: 2 | Removed: 3 or more | Sum | Consistency ratio (%), error rate 0% | Consistency ratio (%), error rate 5% | Consistency ratio (%), error rate 10% |
|---|---|---|---|---|---|---|---|---|
| 5 | 1 | | | | 1 | 100.0 | 100.0 | 100.0 |
| 7 | 1 | | | | 1 | 100.0 | 100.0 | 100.0 |
| 10 | 1 | 1 | | | 2 | 50.0 | 100.0 | 100.0 |
| 11 | 1 | | | | 1 | 100.0 | 100.0 | 100.0 |
| 12 | 1 | 2 | | | 3 | 33.3 | 100.0 | 100.0 |
| 13 | 5 | 3 | | | 8 | 62.5 | 100.0 | 100.0 |
| 14 | 6 | | 2 | 2 | 10 | 60.0 | 60.0 | 60.0 |
| 15 | 6 | 2 | 1 | | 9 | 66.7 | 88.9 | 100.0 |
| 16 | 5 | 2 | 1 | 3 | 11 | 45.5 | 63.6 | 72.7 |
| 17 | 4 | 4 | 3 | 1 | 12 | 33.3 | 66.7 | 91.7 |
| 18 | 6 | 3 | 4 | | 13 | 46.2 | 69.2 | 100.0 |
| 19 | 6 | 4 | 7 | 6 | 23 | 26.1 | 43.5 | 73.9 |
| 20 | 13 | 16 | 11 | 10 | 50 | 26.0 | 58.0 | 80.0 |
| Total | 56 | 37 | 29 | 22 | 144 | 38.9 | 64.6 | 83.3 |

ᵃ We only used strict preferences.
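The binomial probabilities quoted in Section 4.1 for a subject who answers all 20 pairs can be reproduced with a short sketch. It assumes, as the text does, that each choice is independently consistent with probability $p = 0.920$; the function name is hypothetical.

```python
from math import comb


def prob_at_most(m, n, p_consistent):
    """Probability of at most m inconsistent choices out of n independent choices."""
    q = 1 - p_consistent
    return sum(comb(n, i) * q**i * p_consistent ** (n - i) for i in range(m + 1))


# At most 0, 1, 2 and 3 inconsistent choices out of 20 with p = 0.920.
probs = [round(prob_at_most(m, 20, 0.920), 3) for m in range(4)]
```

The list `probs` matches the values 0.189, 0.517, 0.788 and 0.929 given in the text.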
Table 4
The predictability of the last 10 choices.

| # of correct predictions | Consistency with linear model: Yes | Consistency with linear model: No | Total |
|---|---|---|---|
| 10 | 10 | 1 | 11 |
| 9 | 11 | 8 | 19 |
| 8 | 7 | 5 | 12 |
| 7 | 4 | 1 | 5 |
| 6 | 1 | 1 | 2 |
| 3 | | 1 | 1 |
| Total | 33 | 17 | 50 |

Table 5
Linearly re-scaled criterion values (one on the bound) of pairs 1, 4, 11, 17, and 19.*

| # of inconsistent choices | Consistency with linear model: Yes | Consistency with linear model: No | Total |
|---|---|---|---|
| 0 | 16 | 15 | 31 |
| 1 | 24 | 32 | 56 |
| 2 | 13 | 23 | 36 |
| 3 | 1 | 13 | 14 |
| 4 | 2 | 5 | 7 |
| 5 | 0 | 0 | 0 |
| Total | 56 | 88 | 144 |

* See Table 1 and the explanation after it.

based on the first 10 choices and then used the estimated function to predict the subsequent 10 choices. Note that we use the estimated weights even if the form of the function is not diagnosed linear. The results are given in two columns according to whether the value function is diagnosed linear or not, based on the first 10 choices. Using the maximum likelihood principle, we find the probability that the estimated value function correctly predicts a choice. The probabilities are estimated separately for the linear ($p_1$) and nonlinear ($p_2$) case, respectively. We find $p_1 = 0.876$ and $p_2 = 0.812$. We tested whether the difference of these probabilities is zero in the "Yes" and "No" populations. We used a standard test for the difference between two population proportions (see, e.g., Neter et al., 1988, pp. 412–413). Because the sample is large enough, we used the standardized test statistic $z^*$, based on an approximate normal distribution. Because $z^* = 0.607 < 1.96$ (risk level 5%), we conclude that there is no difference between the population proportions. The probability estimate based on the sum distribution is 0.854. This result is interesting. The predictability of a correct choice is relatively high irrespective of whether the estimated value function is linear or not.

4.2. Consistency of responses with re-scalings

As we showed in Lemma 2 (and Corollary 1), if a subject is consistent with a linear value function, (s)he should remain consistent when the criterion values are re-scaled. This is, however, not necessarily true for subjects who do not make choices according to a linear value function. We used two different re-scalings. We now analyze whether subjects behaved according to our theory.

The principles for the re-scalings were explained in Table 1. In Table 5 we transformed the criterion values of five original pairs
in such a way that the same $\alpha$ for both criteria was used and $\beta_j$, $j = 1, 2$, was chosen in such a way that at least one out of the four criterion values was on the bound of the range ["most pessimistic" value, "most optimistic" value]. The other values were within the range. Table 5 describes the results. Using the maximum likelihood principle, we estimated the probability for both columns (consistent with linear models ("Yes") and inconsistent with linear models ("No")) that a subject chooses in the re-scaled case in the same way as in the original one. The corresponding probabilities are 0.782 (for "Yes") and 0.689 (for "No"), respectively. When we tested the hypothesis that the probabilities are the same in both populations (Neter et al., 1988, pp. 412–413) using the standard normal distribution approximation, we obtained 1.225 as the value of the test statistic, which is clearly lower than the critical value (1.96) at the 5% risk level. Thus we conclude that the difference in the distributions of the responses between re-scaled questions of the "Yes" population and the "No" population is not significant.

Although only 21.5% (= 100 · 31/144) of the subjects were fully consistent (Table 5), the impact of inconsistent responses on the linearity vs. nonlinearity of the value function was minor: 11.8% (= 100 · (10 + 7)/144) of the diagnoses changed (Table 6). The probability is 0.831 that a linear value function is still diagnosed linear, when 25% (5 out of 20) of the original questions are replaced by the re-scaled questions. The corresponding probability is 0.918 for a nonlinear value function to remain nonlinear. As before, we tested the difference in the populations (Neter et al., 1988, pp. 412–413), and obtained $z^* = 1.59 < 1.96$ (5% risk level). We conclude that the robustness difference between the proportions "Yes–Yes"/"Yes–Total" and "No–No"/"No–Total" in the two populations is not significant.
The test of the form of the value function is quite robust when using model (1) in estimation.
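The two test statistics quoted above can be approximately reproduced with a pooled two-proportion z-test. The following is a minimal sketch, not the exact procedure of Neter et al. (1988): the function name is hypothetical, and taking the numbers of subjects in each group (56 and 88 for the choice probabilities, 59 and 85 for the diagnosis proportions) as the sample sizes is an assumption made here because it yields values close to the reported ones.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z statistic for H0: the two proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return abs(p1 - p2) / std_err


CRITICAL_5_PERCENT = 1.96  # two-sided critical value used in the text

# Probability of repeating the original choice, "Yes" vs. "No" subjects
# (rounded probabilities from the text; group sizes are an assumption).
z_choices = two_proportion_z(0.782, 56, 0.689, 88)

# Proportion whose diagnosis is unchanged, from the counts in Table 6.
z_diagnosis = two_proportion_z(49 / 59, 59, 78 / 85, 85)

for label, z in [("choices", z_choices), ("diagnosis", z_diagnosis)]:
    verdict = "significant" if z > CRITICAL_5_PERCENT else "not significant"
    print(f"{label}: z = {z:.3f} ({verdict} at the 5% level)")
```

With the rounded probabilities the first statistic comes out slightly below the reported 1.225; the small gap is due to rounding of the inputs.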
Table 6
Consistency with linear model using original vs. re-scaled values I.

Original responses    Re-scaled questions (on the bound)
                      Yes           No            Total
Yes                   49 (34.0%)    7 (4.9%)      56 (38.9%)
No                    10 (6.9%)     78 (54.2%)    88 (61.1%)
Total                 59 (41.0%)    85 (59.0%)    144

In Table 7 we performed a similar re-scaling as previously, but we first expanded the above range in such a way that the lower bound was decreased and the upper bound was increased by 10% (see Table 1). We compared the results for the “Yes” and “No” distributions as before. The corresponding probabilities are 0.746 (“Yes”) and 0.675 (“No”). The test statistic for the difference in the population proportions is 0.914 < 1.96. Thus our conclusion is that the difference in the distributions of the responses between re-scaled questions of the “Yes” population and the “No” population is not significant. The test of the form of the value function is quite robust when using model (1) in estimation.
It is surprising that almost the same proportion of the subjects, 20.1% (= 100 × 29/144), were fully consistent (Table 7) as in the previous re-scaling situation (21.5%) (Table 5). The impact of inconsistent responses on the linearity vs. nonlinearity of the value function was also almost the same: 13.9% (= 100 × (7 + 13)/144) (Table 8); previously it was 11.8%. The probability is 0.860 that a linear value function is still diagnosed linear when 25% (5 out of 20) of the original questions are replaced by the re-scaled questions (over the bounds), and the probability is 0.862 for a nonlinear value function to remain nonlinear. The test statistic for the difference in the population proportions is z* = 0.028 < 1.96 (5% risk level). We conclude that the difference between the two population proportions “Yes–Yes”/“Yes–Total” and “No–No”/“No–Total” is not significant.

Table 7
Linearly re-scaled criterion values (one on the bound) of pairs 1, 4, 11, 17, and 19 within the 10% expanded range.

Consistency with      # of inconsistent choices
linear model          0     1     2     3     4     5     Total
Yes                   14    18    19    5     0     0     56
No                    15    26    28    15    4     0     88
Total                 29    44    47    20    4     0     144

Table 8
Consistency with linear model using original vs. scaled values II.

Original responses    Re-scaled values (over the bound)
                      Yes           No            Total
Yes                   43 (29.9%)    13 (9.0%)     56 (38.9%)
No                    7 (4.9%)      81 (56.3%)    88 (61.1%)
Total                 50 (34.7%)    94 (65.3%)    144

4.3. Gender and linear value function
Out of curiosity, we cross-tabulated gender and consistency with a linear value function and obtained an interesting result. From Table 9, we see that men’s responses were to a larger extent consistent with a linear value function than women’s.

Table 9
Consistency of women and men with linear value function model.

Consistency with      Gender
linear model          Women         Men           Total
Yes                   21 (14.6%)    35 (24.3%)    56 (38.9%)
No                    53 (36.8%)    35 (24.3%)    88 (61.1%)
Total                 74 (51.4%)    70 (48.6%)    144

We were able to explain 28.4% of the responses of women with a linear value
function, and 50.0% of those of men. The difference is significant, but we do not attempt to explain it here.
5. Discussion and conclusions
We have conducted an experiment to investigate whether subjects are consistent with a linear value function while making binary choices in a bi-criteria setting. Inconsistencies may appear for various reasons, for instance: (1) people do not pay enough attention to choices, (2) they are simply unable to be fully consistent (with a linear value function), (3) they change their mind, or (4) they simply make errors. We studied the prevalence of such inconsistencies, what kind of impact they have on the functional form of the value function, and whether a linear value function could explain choices.
We estimated the weights of the linear value function for each subject using a linear programming model. The model maximizes the minimum distance of the resulting weights from the preference constraints. When we were able to find weights such that all choices were correctly explained by a linear function, we concluded that the subject’s value function was linear; otherwise it was diagnosed nonlinear.
In our experiment, it turned out that people are not very often fully consistent with a linear value function, or any value function, when making binary choices. This is in line with behavioral decision theory research such as Tversky (1969) and Simon (1956). We estimated that the probability is only 0.708 that a person will make the same choice twice when there is a certain time period between the choices. The underlying reason might be a learning effect, fatigue, or simply the fact that the subjects were not able to be fully consistent. (Fatigue is probably one of the reasons for the low probability value in our case, because the two control questions were presented at the end of the experiment.)
We also estimated the probability that a subject chooses a dominated alternative over an alternative that dominates it: 0.049 (= 1 − 0.951), which is clearly smaller than 0.292 (= 1 − 0.708). Obviously, the probability that a subject will make 20 consistent choices (with a linear value function) must be lower than 0.37 (= 0.951^20). Note that our conclusions are limited by presenting the “control questions” rather late in the stream of questions. The situation might have been different had we presented them earlier.
In spite of many inconsistent choices, our weight estimation method is fortunately quite robust. The robustness argument is based on our finding that the impact of inconsistent responses on the functional form of the value function was minor.
In our experiment we also explored the impact of re-scaling the criterion values on the consistency of choices. We initially thought that the subjects’ choice model might become more lexicographic with choices outside “the expected range” of the subjects. It was surprising to discover that there was no significant difference in the consistency probability between subjects with a linear value function and subjects with no linear value function, when we analyzed the impact of two kinds of re-scalings. Theoretically, in the case of a linear value function, the choices with the original values and the re-scaled values should have been the same. However, the subjects provided inconsistent responses regardless of whether the value function was diagnosed linear or not.
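The theoretical invariance mentioned above follows in one line. The derivation below is a sketch under the assumption that the re-scaling has the form x'_j = a x_j + b_j with a common factor a > 0 for both criteria; the symbols v, w_j, x, y and the prime notation are introduced here for illustration only.

```latex
\begin{align*}
v(x') - v(y')
  &= \sum_{j=1}^{2} w_j \,(a x_j + b_j) - \sum_{j=1}^{2} w_j \,(a y_j + b_j) \\
  &= a \sum_{j=1}^{2} w_j \,(x_j - y_j)
   = a \,\bigl( v(x) - v(y) \bigr).
\end{align*}
```

The shifts b_j cancel, and because a > 0 the sign of the value difference is unchanged, so a subject who follows a linear value function exactly would make the same choice before and after the re-scaling.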
Interestingly, in general, there was no significant difference in the consistency probability between subjects who possessed a linear value function and subjects with no such function. This observation led us to study the use of a linear value function in explaining the preferences of the subjects. To do this we divided the original set of 20 pairs into two subsets of 10 pairs each. The first 10 pairs, or more accurately the corresponding choices, were used to estimate the weights of the linear value function for each subject, and the value function was then used to predict the preferences for the last 10 pairs for each subject. As we hypothesized, no significant difference in predictability was found between the subjects for whom the value function was diagnosed linear and those for whom it was diagnosed nonlinear. In both cases the probability of correctly predicting a choice was 0.85. This leads us to conclude that a linear value function may be used to convey information about the preferences – at least in a bi-criteria binary choice situation – provided that we have no a priori information about the functional form of the value function.
Apparently a critical issue is how to estimate the weights of the linear function. The method we used in this paper seems to be robust and gives good results. A topic for further research is to compare various estimation methods.
Finally, we would like to emphasize that our conclusions are drawn from a bi-criteria experiment, where the subjects were students at the Helsinki School of Economics (currently Aalto University, School of Economics). The subjects were asked to evaluate several alternative pairs and to choose the more preferred one. The alternatives were described with the criteria Credit Points (ECTS = European Credit Transfer and Accumulation System) and Grade Point Average (GPA). We believe – but do not know for sure – that our results generalize to higher dimensions. The question is open for further research.
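The split-sample check described above (estimate weights from the first choices, predict the remaining ones) can be sketched as follows. This is a minimal illustration and not the linear programming model used in the study: it assumes two criteria already scaled to comparable units and weights (w, 1 − w) with 0 ≤ w ≤ 1, and it replaces an LP solver by enumerating the breakpoints of the max-min problem, which works only in the bi-criteria case. All function names, the six alternative pairs and the generating weight 0.6 are hypothetical.

```python
from itertools import combinations


def value_gap(w, x, y):
    """Linear value of x minus linear value of y, with weights (w, 1 - w)."""
    return w * (x[0] - y[0]) + (1 - w) * (x[1] - y[1])


def fit_weight(choices):
    """Max-min fit for two criteria.

    choices: list of (chosen, rejected) alternatives, each a 2-tuple.
    Each choice defines a line in w; the minimum of these lines is concave
    and piecewise linear, so its maximum over [0, 1] lies at an endpoint or
    where two lines cross. Returns (w, eps), eps being the smallest gap.
    """
    lines = [(c[0] - r[0], c[1] - r[1]) for c, r in choices]
    candidates = {0.0, 1.0}
    for (a1, a2), (b1, b2) in combinations(lines, 2):
        denom = (a1 - a2) - (b1 - b2)
        if denom != 0:
            w = (b2 - a2) / denom
            if 0.0 <= w <= 1.0:
                candidates.add(w)

    def min_gap(w):
        return min(w * d1 + (1 - w) * d2 for d1, d2 in lines)

    w_best = max(candidates, key=min_gap)
    return w_best, min_gap(w_best)


# Hypothetical alternative pairs on a common 0..1 scale.
pairs = [
    ((0.9, 0.1), (0.2, 0.7)),
    ((0.4, 0.8), (0.7, 0.3)),
    ((0.6, 0.5), (0.3, 0.9)),
    ((0.1, 1.0), (0.8, 0.2)),
    ((0.5, 0.4), (0.2, 0.6)),
    ((0.7, 0.6), (1.0, 0.1)),
]

# Simulated subject who follows a linear value function with weight 0.6.
TRUE_W = 0.6
choices = [(x, y) if value_gap(TRUE_W, x, y) > 0 else (y, x) for x, y in pairs]

train, holdout = choices[:4], choices[4:]
w_hat, eps = fit_weight(train)
is_linear = eps > 0  # all training choices explained with positive slack

hits = sum(value_gap(w_hat, chosen, rejected) > 0 for chosen, rejected in holdout)
hit_rate = hits / len(holdout)
print(f"w = {w_hat:.3f}, eps = {eps:.3f}, linear = {is_linear}, hit rate = {hit_rate:.2f}")
```

A positive eps corresponds to the "diagnosed linear" case in the text; a non-positive eps means no weight explains all training choices, yet the fitted weight can still be used for prediction, which is the situation the paragraph above examines.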
An interesting question is whether inconsistent choices are related to the magnitude of the choice differences. It is possible that small differences elicit no more than a weak preference, or that large differences are too difficult to process cognitively. These questions are important research issues for the future.
We believe that our findings – despite their tentative nature – have implications for the design and development of decision tools and procedures for Multiple Criteria Decision Making. We believe that in estimating a value function it is important to use a method which tolerates some violations of a general rule (such as a linear value function) when information from the choices is collected. If this violation is not random, as in the case of learning,
it is important to model it. If there is no information about the functional form of a value function (this is the usual case), a linear form seems to work fine. A linear proxy also seems to work fine to identify the most preferred region of solutions. If we would like to find the most preferred alternative, then the problem is different (and more complex), and there are more efficient multiple criteria methods which could be used for this task (for a survey, see for example Korhonen et al., 1992).
We answer affirmatively the question posed in the title of our paper: can a linear value function explain choices?
References
Bigelow, J. (Ed.), 1887. The Complete Works of Benjamin Franklin, Vol. 4. Putnam, New York.
Charnes, A., Cooper, W.W., 1961. Management Models and Industrial Applications of Linear Programming. Wiley, New York.
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2, 429–444.
Dawes, R.M., 1979. The robust beauty of improper linear models in decision making. American Psychologist 34 (7), 571–582.
Dawes, R.M., Corrigan, B., 1974. Linear models in decision making. Psychological Bulletin 81, 95–106.
Einhorn, H.J., Hogarth, R.M., 1981. Behavioral decision theory: processes of judgment and choice. Annual Review of Psychology 32, 53–88.
Köksalan, M., Büyükbaşaran, T., Özpeynirci, Ö., Wallenius, J., 2010. A flexible approach to ranking with an application to MBA programs. European Journal of Operational Research 201, 470–476.
Köksalan, M., Ulu, C., 2003. An interactive procedure for placing alternatives in preference classes. European Journal of Operational Research 144, 429–439.
Korhonen, P., Moskowitz, H., Wallenius, J., 1992. Multiple criteria decision support – a review. European Journal of Operational Research 63, 361–375.
Korhonen, P., Wallenius, J., 1989. A careful look at efficiency and utility in multiple criteria decision-making: a tutorial. Asia-Pacific Journal of Operational Research 6, 46–62.
Neter, J., Wasserman, W., Whitmore, G.A., 1988. Applied Statistics, Third ed. Allyn and Bacon, Boston.
Phelps, S., Köksalan, M., 2003. An interactive evolutionary metaheuristic for multiobjective combinatorial optimization. Management Science 49, 1726–1738.
Saaty, T., 1980. The Analytic Hierarchy Process. McGraw-Hill, New York.
Scheubrein, R., Bossert, B., 2001. An Internet System to Apply the Balanced Scorecard Concept to Supply Chain Management. In: Köksalan, M., Zionts, S. (Eds.), MCDM in the New Millennium. Lecture Notes in Economics and Mathematical Systems, 507. Springer, Berlin, pp. 348–356.
Simon, H.A., 1956. Rational choice and the structure of the environment. Psychological Review 63, 129–138.
Tversky, A., 1969. Intransitivity of preferences. Psychological Review 76, 105–110.
Zionts, S., 1981. A multiple criteria method for choosing among discrete alternatives. European Journal of Operational Research 7, 143–147.
Zionts, S., Wallenius, J., 1976. An interactive programming method for solving the multiple criteria problem. Management Science 22, 652–663.