Behavioral influences on weight judgments in multiattribute decision making




European Journal of Operational Research 67 (1993) 1-12, North-Holland

Invited Review

Martin Weber
Institut für Betriebswirtschaftslehre, Universität Kiel, 2300 Kiel, Germany

Katrin Borcherding
Institut für Psychologie, Universität Darmstadt, 6100 Darmstadt, Germany

Received October 1992

Abstract: In multiattribute decision making the derivation of weights is a central step in eliciting the decision maker's preferences. As in other measurement tasks, weight judgments can depend on a variety of different factors. In our survey we review those studies that investigate behavioral influences on weight judgments. These results from descriptive research are of importance for the prescriptive use of decision analysis. Only if we know about behavioral influences might we be able to avoid or reduce them.

Keywords: Decision theory; Multiple criteria; Preferences; Weights; Behavior

1. Theoretical background: multiattribute value theory

Decision problems in everyday life or in economic settings often involve multiple criteria, thus making multiattribute decision making an important field in decision analysis. Practically all approaches to multiattribute decision making explicitly or implicitly make use of the concept of the relative importance of criteria, i.e. the weights of criteria or attributes. In prescriptive decision research, the decision maker's basic criteria are elicited and evaluated in terms of importance weights, and the resulting best decision is derived by a model.

Correspondence to: Professor Martin Weber, Institut für Betriebswirtschaftslehre, Universität Kiel, 2300 Kiel, Germany.

This paper focuses on the assessment of attribute weights, which are important inputs for the model. From a prescriptive point of view quite a few variables should not have an influence on weights, for example the description of the attributes (description invariance), structural components of the value tree (description invariance), or the procedure used to elicit the weights (procedure invariance). Weights should be sensitive, however, to the ranges of the attributes. In the following we will concentrate on whether these prescriptive requirements are fulfilled, i.e. we investigate behavioral influences on weight judgments.¹

¹ The question raised here has an important parallel in decision making under risk (Hershey, Kunreuther and Schoemaker, 1982; Tversky and Kahneman, 1986). The utility function can depend on the assessment procedure as well as on the presentation of the decision problem.




The concept of weight can only be defined with regard to a specific theory of preference. Many different theories have been proposed, e.g. the analytical hierarchy process (Saaty, 1990), Electre-type methods (Roy, 1990), multiattribute value theory (decision under certainty, Dyer and Sarin, 1979) and multiattribute utility theory (decision under risk, Keeney and Raiffa, 1976). Multiattribute value theory (MAVT) is the most widely used theory in solving multiattribute decision making problems. As will be seen below, the simple additive aggregation model is based on MAVT. In addition, MAVT has a thorough axiomatic basis, which is especially important when dealing with problems of measurement (of weights). Electre-type methods as well as the analytical hierarchy process so far do not offer a convincing axiomatic foundation for the meaning of weights and will not be discussed here. We will therefore investigate the behavioral influences on weights that are defined in the context of multiattribute value theory.

Let a = (a_1, ..., a_n) be an alternative described by the n-tuple of specific outcomes on the n relevant attributes (objectives). In MAVT, general properties of the decision maker's preferences determine the appropriate model to aggregate the outcomes. The simplest aggregation rule is defined by the additive model. For this model, the value of an alternative, denoted v(a), is given by

v(a) = \sum_{i=1}^{n} k_i v_i(a_i),        (1)

where v_i is a conditional value function for the i-th attribute, normalized on the interval [0,1]. The coefficient k_i is a scaling constant, also referred to as the weight of the i-th attribute. We have k_i > 0, and the weights sum up to one. The weight k_i shows the importance of the i-th attribute relative to a unit of the conditional value function; the weights represent tradeoffs between units of different conditional value functions. An alternative a is preferred to an alternative a' if and only if v(a) is greater than v(a'). If the decision maker's preferences, apart from some technical conditions like transitivity, satisfy the additive difference independence condition, preference can be modelled by the additive aggregation rule (see Dyer and Sarin, 1979).

This condition states that preference and strength of preference for outcomes on one attribute can be evaluated independently from the outcome levels of the other attributes. This independence condition is necessary if one wants to assess conditional value functions independently from one another. All simple rating and weighting techniques are based on this condition. In the following we will assume that these conditions for the additive model are met.

The determination of weights is a central part of multiattribute value analysis. If the attribute values are positively correlated across alternatives, decisions are pretty robust to changes in weights. If, however, as in most complex decision problems, outcomes are negatively correlated, small changes in weights can yield different optimal decisions (see, e.g., Einhorn and Hogarth, 1975; Wertenbroch, 1991). It is therefore worthwhile to investigate behavioral influences on weight elicitation.
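To make this concrete, here is a minimal sketch of the additive model (1) and of the sensitivity just described: with negatively correlated outcomes, a small change in the weights flips the optimal alternative. The alternatives, conditional values and weights are invented for illustration and do not come from any of the studies cited.

```python
# Minimal sketch of the additive model (1) and its sensitivity to weights.
# Alternatives, conditional values and weights are invented for illustration.

def additive_value(weights, partial_values):
    """v(a) = sum_i k_i * v_i(a_i), with the weights k_i summing to one."""
    return sum(k * v for k, v in zip(weights, partial_values))

# Conditional values v_i(a_i) in [0, 1]; outcomes are negatively correlated:
# alternative a is strong on attribute 1, alternative a' on attribute 2.
a       = [0.9, 0.2]
a_prime = [0.2, 0.8]

for weights in ([0.50, 0.50], [0.45, 0.55]):      # a small change in the weights
    va, vap = additive_value(weights, a), additive_value(weights, a_prime)
    best = "a" if va > vap else "a'"
    print(f"k = {weights}: v(a) = {va:.3f}, v(a') = {vap:.3f} -> best: {best}")
```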

2. Methods for weight elicitation

Numerous procedures for the determination of weights have been proposed (see Fishburn, 1967; von Nitzsch, 1992; von Winterfeldt and Edwards, 1986). They can be classified by whether the weighting procedure is statistical or algebraic, holistic or decomposed, and whether it is direct or indirect. Algebraic procedures calculate the n weights from a set of n-1 judgments, often using a simple system of equations; statistical procedures are based on a redundant set of judgments, and the weights are derived with some statistical procedure such as multiple regression analysis or maximum likelihood estimation. Holistic procedures require the decision maker to evaluate (holistic) alternatives, i.e. rate or rank alternatives; decomposed methods look at one attribute or attribute pair at a time. Direct procedures ask the decision maker to compare the ranges of two attributes in terms of ratio judgments, whereas indirect procedures infer weights from preference judgments. In the following we will present some methods for weight elicitation that are frequently used and that are important for analyzing behavioral influences. Although these methods can vary with respect to the three dimensions just described, only a few of these combinations are actually applied.


By no means do we attempt to give a complete overview of methods for weight elicitation.

Weight elicitation methods can be applied in a hierarchical and a non-hierarchical approach. In the non-hierarchical approach the weights are elicited across all attributes at the bottom of the value tree. If the elicitation of weights makes use of the hierarchical structure of the value tree, the methods are applied on the various levels of the tree. Branches at the lowest level are considered first. The weight elicitation continues on the next higher level, etc. The final weights are calculated by multiplying the weights of the different levels through the tree.

The ratio method (Edwards, 1977) requires the decision maker to first rank the relevant attributes according to their importance. The least important attribute is assigned a weight of 10 and all others are judged as multiples of 10. The resulting raw weights are normalized to sum to one. The ratio method is an algebraic, decomposed, direct procedure.

The swing procedure (von Winterfeldt and Edwards, 1986) starts from an alternative with the worst outcomes on all attributes. The decision maker is allowed to change one attribute from its worst outcome to its best. The decision maker is asked which 'swing' from the worst to the best outcome would result in the largest, second largest, etc., improvement. The attribute with the most preferred swing is most important and - arbitrarily - it is assigned 100 points. The magnitudes of all other swings are expressed as percentages of the largest swing. Again, the derived percentages are the raw weights that are normalized to yield the final weights. The swing method is an algebraic, decomposed, direct procedure.
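To make the two direct procedures concrete, the following minimal sketch turns raw ratio and swing judgments into normalized weights that could be used in the additive model (1). The attribute names and all judgments are invented for illustration.

```python
# Minimal sketch: normalizing raw ratio-method and swing-method judgments.
# All judgments below are invented for illustration.

def normalize(raw):
    """Turn raw importance points into weights k_i that sum to one."""
    total = sum(raw.values())
    return {attr: points / total for attr, points in raw.items()}

# Ratio method: least important attribute gets 10, others are multiples of 10.
ratio_raw = {"travel time": 10, "vacation days": 20, "salary": 40}

# Swing method: most preferred swing gets 100, others as percentages of it.
swing_raw = {"salary": 100, "vacation days": 60, "travel time": 25}

print("ratio weights:", normalize(ratio_raw))
print("swing weights:", normalize(swing_raw))
```

Both procedures differ only in how the raw points are elicited; the final normalization step is the same.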


The tradeoff procedure has the strongest theoretical foundation (Keeney and Raiffa, 1976). The key idea of the procedure is to compare two alternatives described on two attributes (on the remaining attributes both alternatives have identical values). One alternative has the best outcome on the first and the worst outcome on the second attribute, the other has the worst on the first and the best on the second attribute. By choosing the preferred alternative out of the two the decision maker decides on the more important attribute. The critical step is the adjustment of an attribute outcome in order to yield indifference between the two alternatives. This is typically done by either worsening the chosen alternative in the good outcome or improving the non-chosen alternative in the bad outcome. Such indifferences have to be elicited for n-1 meaningfully selected pairs of alternatives. If the conditional value functions are known, attribute weights can be derived. The tradeoff method can be classified as an algebraic, decomposed, indirect procedure.

We only briefly want to cite another important class of methods for weight elicitation. Conjoint procedures (statistical, holistic, indirect) require the decision maker to rank (Green and Srinivasan, 1978, 1990) or to rate (Barron and Person, 1979) alternatives. Using statistical procedures, weights are derived that best fit the alternatives' evaluations. Deriving weights using regression analysis is one example of a conjoint procedure.

In the basic MAVT model (see (1)) it is implicitly assumed that the decision maker has well defined preferences. No matter which way the weights are elicited or which way the attributes are described, the decision maker uses (in theory) his or her stable, well defined preferences to answer the questions of the elicitation procedure. Like using a mechanical device, well defined preferences lead to a set of weights that are independent of elicitation procedure and presentation mode.

3. Procedural invariance in weight determination

In this section we will investigate whether there is any influence on weights when different methods of weight determination or different implementations of the same method are used. We will argue that elicitation methods are suited to 'measure' subjective importance (i.e. to elicit numbers that reflect a decision maker's preference) depending on whether
- the decision maker is able to understand the method and use it consistently (Section 3.1),
- different weight elicitation methods yield similar weights (Section 3.2),
- context effects have no influence (Section 3.3).


3.1. Consistency within methods

Elicitation procedures are often applied redundantly. Weights for n attributes can usually



be determined from n-1 evaluations, but very often more evaluations are asked for. In multiple regression, weights are determined from many more than n-1 holistically evaluated alternatives. Redundancy in accordance with such a statistical approach is needed to get meaningful weights that are free of random error. The smaller the multiple correlation coefficient, the less well an overall evaluation can be predicted from a linear combination of the attribute values; instability of subjects' weights can be one of the many reasons for this. If weights are elicited with algebraic methods like the ratio, swing and tradeoff procedures, often a few more than the minimum number of estimates are asked for. Although subjects' consistency within such methods could thus be assessed, few researchers report on this. Inconsistencies are seen as mistakes, not intentionally (deliberately) made by the decision maker. They are fed back to enable the decision maker to learn about the procedure, and they are either neglected or reconciled by asking the person for a final judgment. If no feedback is possible or if the decision maker does not want to change his or her judgments, weights are sometimes derived mathematically through averaging or estimating the best-fitting weights.

Borcherding, Eppel and von Winterfeldt (1991) reported on inconsistencies within methods. They found their subjects to be inconsistent 30%/50%/67% of the time using the ratio/swing/tradeoff method. The mean difficulty ratings of the three methods were similar but corresponded to the previous results in that the ratio method was judged easiest and the tradeoff method hardest. Still, on an individual level, the correspondence almost disappeared. For each method, the correlation between the number of inconsistencies and the judged difficulty rating was computed. It was assumed that subjects who judge a method as being more difficult will also be more likely to show inconsistencies. This was not the case: all correlations were close to zero, namely r = 0.13/0.15/0.01 for the three methods. Interestingly enough, subjects using one method consistently were more likely to use the others consistently as well, but consistent subjects did not differ from inconsistent ones on other more relevant variables, i.e., no difference in

convergence between methods was observed. This result is in line with Schoemaker and Waid (1982), who did not find significant correlations between the degree of internal consistency of a weighting method and the percentage of correct predictions when applying the weights to predict actual choices among pairs of alternatives.

These studies show that decision makers can exhibit a considerable amount of inconsistency. It will definitely be interesting to further study how to cope with such inconsistencies. At this point we do not know if (or how) the degree of inconsistency can be related to the validity of the estimated weights.
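As mentioned above, when redundant judgments cannot be reconciled interactively, weights are sometimes derived mathematically by averaging or by estimating a best-fitting set of weights. The sketch below shows one simple way to do this for redundant ratio judgments, via a log-least-squares fit; the judgments and the choice of fitting procedure are illustrative assumptions, not the procedures used in the studies cited.

```python
# Minimal sketch: deriving best-fitting weights from redundant ratio judgments.
# Judgments r_ij are meant to approximate w_i / w_j; with more judgments than
# the minimal n-1, inconsistencies are resolved by a log-least-squares fit.
# All numbers are illustrative.
import numpy as np

attrs = ["salary", "vacation", "travel time"]
# (i, j, r) means: attribute i was judged r times as important as attribute j.
judgments = [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 5.0)]   # redundant, slightly inconsistent

# Solve log w_i - log w_j = log r for all judgments in the least-squares sense,
# fixing log w of the last attribute to 0 to pin down the scale.
n = len(attrs)
A = np.zeros((len(judgments), n - 1))
b = np.zeros(len(judgments))
for row, (i, j, r) in enumerate(judgments):
    if i < n - 1:
        A[row, i] += 1.0
    if j < n - 1:
        A[row, j] -= 1.0
    b[row] = np.log(r)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
w = np.exp(np.append(x, 0.0))
w /= w.sum()                                          # normalize to sum to one
print(dict(zip(attrs, np.round(w, 3))))
```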

3.2. Convergence between methods

There is no single way to determine the degree of correspondence between different sets of weights. Apart from the heterogeneity of the methods and structural aspects of the set of alternatives, the convergence between weights depends on the criterion of comparison. Meaningful criteria are direct correspondence between different sets of weights (discrepancy, correlation between and steepness within sets of weights), similarity of pairwise relations between attribute weights, and their strategic equivalence (i.e., whether they lead to the same preference order when applied to alternatives).

For a student admission decision problem with four attributes, Schoemaker and Waid (1982) compared five different weighting methods, namely weights from holistic judgments using multiple regression, direct pairwise attribute tradeoffs, the analytic hierarchy method, allocation of 100 points to attributes, and equal weights. The different methods yielded similar weights that corresponded in rank order but were steepest for multiple regression and flattest for direct point allocation. This is in accordance with earlier work by Slovic (1975), who showed that weights elicited through holistic procedures tend to be steeper than the ones elicited directly (see Weber, 1985, for similar results). In order to compare the effect of the different weighting methods, Schoemaker and Waid (1982) used subjects' choices for 20 randomly selected pairs of alternatives as a criterion. Additive aggregation models with the different weights as inputs were used.


All models did well and, with the exception of equal weighting, predicted more than 80% of the choices correctly. When strength of preference served as the criterion, predictions from the point allocation method were best, closely followed by the tradeoff, multiple regression and finally the analytical hierarchy method; again, equal weighting fared worst.

The results of Borcherding, Eppel and von Winterfeldt (1991) show larger discrepancies between more similar weighting methods. In a siting problem with 5 to 12 attributes, they compared four commonly used methods in MAVT analysis, namely the ratio, swing, tradeoff and pricing-out methods. The ratio method was used hierarchically, whereas the other methods were used non-hierarchically. For the 5 attributes used in all conditions, the mean individual rank correlations were only about r = 0.5: highest between the ratio and swing method, and lowest between the pricing-out (see Keeney and Raiffa, 1976, for this method) and the tradeoff method. Obviously, the relatively low rank correlation between differently elicited sets of weights is unsatisfying. Looking at the actual size of the weights, huge discrepancies occurred. The hierarchical ratio method resulted in the highest weight for a branch that was given only average importance by the other methods, but happened to be split off at the top level of the tree with no further subdivision on lower levels. For the pricing-out method the weight for the attribute 'Costs' was highest and overwhelmed the others, whereas the other methods gave minor weight to the cost attribute. The explanation is probably subjects' unfamiliarity with the large amounts of money involved. Across methods the most important attribute was 'Health and Safety Impacts to Human Beings'. This attribute got the highest weight within the swing and the tradeoff method. Although the swing weights for this attribute might correspond to what subjects wanted to express, the average tradeoff weight seems to be too high. It might stem from the fact that the required indifference adjustments had to be done for this attribute and people just wanted to avoid very bad outcomes on 'Health and Safety'. If the differences in weights were transformed to average equivalent costs, the difference between the ratio and pricing-out method was largest: in order to avoid the bad consequences of the non-monetary attributes, subjects were willing to pay 50 times as



much under the ratio weights as under the pricing-out weights.

In a set of experiments, Eppel (1990) analyzed the same methods as before. Among other things he was interested in how subjects would reconcile discrepancies between different elicitation methods when they learned about them and had the chance to change their weights. Probably partly due to an improvement in the procedure for the pricing-out method (subjects had to compare each non-monetary attribute with the cost attribute first, before specifying equivalent dollar amounts), the pricing-out method was not such an outlier. Partly due to a more familiar decision context, the methods yielded more similar weights and higher correlations between methods. In order to have subjects reconcile discrepancies between the weighting methods, subjects got feedback on their preceding assessments. The feedback was given in different formats, and reconciliation mostly followed the weights previously elicited in the same mode: the pricing mode resulted in reconciliation towards pricing-out weights; the tradeoff mode resulted in reconciliation towards tradeoff weights; and the ratio mode leaned towards ratio weights. If reconciliation is biased towards the method that is most compatible with the actual feedback format, this might be the influence of more fundamental cognitive biases, i.e., depending on the task, some weights are more prominent than others (Tversky, Sattath and Slovic, 1988). Eppel's results contain a clear warning: the effect of elicitation methods on weights remains even during reconciliation and is thus stable and resistant, violating the invariance condition all the more severely.

3.3. Invariance of decision mode

In identifying the best alternative (or subset of alternatives) different response modes are meaningful. Among others, the best alternative can be directly selected (choice), dominated or inferior alternatives can be successively rejected, the actual choice can be inferred from an evaluation of all alternatives (judgment), or choices can be inferred from constructed indifferences between alternatives (matching). Prescriptively, all strategies should lead to the same best alternative. Tversky, Sattath and Slovic (1988) analyzed preferences from matching and choice. In describing



alternatives on two relevant attributes, they found that attribute weights derived from choices are steeper than those derived from matching; choices are more congruent with a lexicographic decision rule. Slovic, Griffin and Tversky (1990) found compatibility effects in judgment and choice. When evaluating alternatives, the specific response format influences attribute weights insofar as attributes with response-compatible outcome descriptions receive a higher weight, a phenomenon that also explains preference reversals. Both effects are explained by the prominence hypothesis, which states that the primary dimension of a decision problem looms larger in choice than in matching. Accordingly, subjects tend to choose the option that is superior on the primary dimension - which might be either the more important dimension or the one with higher compatibility (Tversky, Sattath and Slovic, 1988).

Westenberg and Koele (1990) conceive of response mode as a continuum with judgment and choice as the two extremes, and classification into subcategories or ranking in between. The nature of such a scale is basically determined by the number of scale points, and the actual decision strategy differs as a function of the response format, i.e. the number of scale points it involves. The results correspond to the previous ones: the number of scale points may make different attributes compatible and thus salient; and the decision mode 'choosing' is best explained by the 'elimination by aspects' decision strategy.

The results of these behavioral studies suggest that, contrary to the assumptions of MAVT, decision makers' expressed preferences are determined not only by the alternatives' values but also by such normatively irrelevant factors as response format. Put simply, different alternatives may emerge as optimal due to procedural aspects of the elicitation procedure.

4. Description invariance: invariance in weights across structures

In the last section we found that weights were sensitive to the elicitation procedure. In this section we will consider different problem structures that are equivalent according to MAVT. A decision maker's preferences should be invariant across alternative ways of structuring the decision

problem. The question arises whether this invariance holds when decision makers specify weights. First we will report on three structure-dependent or structure-induced biases, namely the splitting bias, range effects, and hierarchical effects. At the end, a unifying frame and some generalizations will be presented.

4.1. The effect of attribute ranges

Think about the attribute 'earnings per year' in a job choice problem. How important is this attribute? Following MAVT, the answer clearly has to depend on the range of this attribute. As the conditional value function is normalized on [0,1], a larger (smaller) range of the attribute should result in a larger (smaller) weight of the attribute (assuming the conditional value function is monotonically increasing). A range of earnings from DM 50.000 to DM 70.000 has to be given less weight than a range from DM 30.000 to DM 100.000. The weight of an attribute as a function of its range (i.e. as a function of different normalizations) can easily be derived (see, e.g., von Nitzsch and Weber, 1991). It is important to note that in a correct prescriptive use of MAVT a decision maker intuitively has to adjust her weights to alternative ranges in order to have a stable preference. If the decision maker did not properly adjust her weights, preferences would depend on the range of attribute outcomes used for normalization.

Recently, three studies have investigated whether decision makers' weight judgments follow the prescriptive requirements of MAVT (Beattie and Baron, 1991; Fischer, 1991; von Nitzsch and Weber, 1991; see these papers for previous work). Von Nitzsch and Weber (1991) investigated a job choice problem using student subjects. First, each student was asked what range comes to his mind when he thinks about the attribute 'earnings per year'. For this attribute weights were elicited based on each subject's intuitive range and based on half or double the intuitive range. The ratio method and a conjoint procedure that required the decision maker to rate holistic alternatives were used. A range sensitivity index was defined that was equal to one if subjects adjusted the weights according to theory, and zero if subjects did not adjust the

weights at all. The results clearly indicated that people do not adjust weights properly: the average range sensitivity for the conjoint method was equal to 0.56; for the direct ratio method it was equal to 0.18. The results also issue a clear warning: if decision makers are only asked for the importance of an attribute, they obviously do not take the range sufficiently into account. To say it differently: do not use weighting methods that rely on importance judgments.

The same lack of range sensitivity can be found in other studies. Fischer (1991) compared the range sensitivity for a direct ratio procedure, the swing procedure, and the tradeoff method. He also compared job offers described by the attributes 'salary' and 'vacation days'. For the direct ratio method, his subjects did not adjust the weight of the attribute 'vacation days' when the range was changed from 5-10 days to 5-20 days. In contrast, for the swing and tradeoff methods, the weights were adjusted to some degree. Assuming linear value functions for vacation days, the range sensitivity index is equal for swing and tradeoff. As the absolute values of the sensitivity index heavily depend on the shape of the value functions (not reported in the paper), the results of both studies cannot be compared further.

Beattie and Baron (1991) used a standard holistic regression approach (similar to the conjoint method in von Nitzsch and Weber, 1991). Although their study does not investigate the range sensitivity question in the light of MAVT, some parts of their results can be compared to the results of von Nitzsch and Weber (1991). Their subjects also adjusted to the ranges to some degree; however, it seems as if they did not fully adjust their weights (range sensitivity assuming linear value functions for two experiments: 0.56 and 0.16).

The range bias could be related to the proxy bias (see Fischer et al., 1987). Fischer et al. found that decision makers tend to overweight proxy attributes compared to more fundamental attributes. In their study they used the level of dust emissions as a proxy attribute for the attribute 'person days per year of asthma' resulting from exposure to a factory's dust. Utility theory requires that the weight for the proxy should be smaller than the weight for the fundamental attribute: there is only a probabilistic link between the proxy and the fundamental attribute, i.e. only


'part' of the adverse impact of the proxy attribute is reflected by the fundamental attribute. However, decision makers do not adjust the weights properly when part of the attribute is taken away.
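Returning to the range effect: the sketch below shows the theoretically required rescaling of a weight when an attribute's range is changed, assuming a linear conditional value function, together with one simple way to express a range sensitivity index (1 = full adjustment, 0 = no adjustment). The numbers and the exact index formula are illustrative assumptions in the spirit of von Nitzsch and Weber (1991), not their published definitions.

```python
# Minimal sketch of the normative range adjustment of a weight and a simple
# range sensitivity index (1 = full theoretical adjustment, 0 = no adjustment).
# Linear conditional value functions and all numbers are assumed for illustration.

def adjusted_weight(k_old, range_ratio):
    """Theoretical weight after the attribute's range is multiplied by range_ratio,
    assuming a linear value function and unchanged ranges for the other attributes."""
    unnorm = k_old * range_ratio
    return unnorm / (unnorm + (1.0 - k_old))

def range_sensitivity(k_old, k_theory, k_elicited):
    """1 if the elicited weight matches the theoretical adjustment, 0 if unchanged."""
    return (k_elicited - k_old) / (k_theory - k_old)

k_salary = 0.40                             # weight elicited for the original salary range
k_theory = adjusted_weight(k_salary, 2.0)   # salary range doubled -> should rise to ~0.57
k_elicited = 0.45                           # what a subject actually states afterwards

print(f"theoretical weight: {k_theory:.2f}")
print(f"range sensitivity:  {range_sensitivity(k_salary, k_theory, k_elicited):.2f}")
```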

4.2. The effect of splitting

An influence on weight judgments closely related to the range bias is the so-called splitting bias (Weber, Eisenführ and von Winterfeldt, 1988). Suppose one attribute is split into two disjunct subattributes. According to MAVT, for the additive model, the weight for the attribute should be equal to the sum of the weights of the two subattributes. However, Weber, Eisenführ and von Winterfeldt (1988) showed that for the ratio, swing, and a decomposed statistical procedure the sum of the weights of the subattributes is significantly greater than the weight of the overall attribute. The effect still exists but is somewhat smaller for a conjoint procedure. The splitting bias also shows up in other studies (see Eppel, 1990, and Weber, 1989, who was able to partially explain the bias by availability).

Borcherding and von Winterfeldt (1988) investigated different influences of value tree structuring on weight judgments. In what they called 'substructure', they varied the degree of partition into subattributes, thus checking for a splitting bias. The swing and the pricing-out method were used non-hierarchically for all trees. The results for the different methods were affected by the number of attributes at the lowest level, thus showing a significant splitting bias. For the swing method, split attributes with consequently considerably reduced meaning basically got the same weight as the unsplit ones. They found no splitting bias for the ratio method, where the weights were elicited on each hierarchical level. We will discuss a possible effect of hierarchy on weights in the next section.
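A small numerical sketch of the splitting bias; all weights are invented for illustration and are not taken from the studies cited.

```python
# Minimal sketch of the splitting bias. All weights are invented for illustration.
# Unsplit tree: three attributes.
unsplit = {"costs": 0.5, "environment": 0.3, "health": 0.2}

# Normatively, splitting 'environment' into two disjunct subattributes should
# leave their summed weight at 0.3 after renormalization.
normative_split = {"costs": 0.5, "env: air": 0.18, "env: water": 0.12, "health": 0.2}

# Empirically, both parts tend to be judged too important; after normalization
# the split attribute gains weight at the expense of the unsplit ones.
raw_biased = {"costs": 0.5, "env: air": 0.25, "env: water": 0.20, "health": 0.2}
total = sum(raw_biased.values())
biased = {a: w / total for a, w in raw_biased.items()}

print("normative env weight:", normative_split["env: air"] + normative_split["env: water"])
print("biased env weight:   ", round(biased["env: air"] + biased["env: water"], 3))
```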

4.3. The effect of hierarchy

Value trees can be constructed using basically two different strategies. In the top-down approach one starts with very general top-level objectives that successively get subdivided and specified by lower-level objectives. In the bottom-up approach one starts with a list of all the differences between alternatives one cares about



and successively structures and combines them according to higher-level objectives. Both approaches are meaningful. As Adelman, Sticha and Donnell (1986) found, the top-down approach yields steeper value trees with more layers between the top and the bottom level, but there is no difference in the average number of lowest-level objectives nor in the quality of the trees as evaluated by independent experts. The question arises whether the hierarchy of a tree has an influence on the next step in decision analysis: the weight judgments.

Borcherding and von Winterfeldt (1988) analyzed weight judgments for value trees that systematically varied in hierarchical structure. They asked whether an objective received more weight when presented at a higher level of the value tree. In their study they added one branch to the value tree at different levels of the hierarchy. Using the ratio method they found that the higher in the tree a branch was added, the more weight it got.

As mentioned earlier, in a value tree weights can be derived for the lowest attributes (i.e. for a list of attributes) or for each level separately (hierarchical weights). In the latter case the lower-level weights are calculated by multiplying down the corresponding weights, as sketched below. Again, both ways of deriving weights yield different results. Using a direct, decomposed, algebraic procedure, Stillwell, von Winterfeldt and John (1987) showed that hierarchical weights were steeper, i.e. they had higher weight ratios.
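Here is a minimal sketch of how final attribute weights are obtained by multiplying hierarchical weights down a two-level value tree; the tree and all weights are invented for illustration.

```python
# Minimal sketch: multiplying hierarchical weights down a two-level value tree.
# The tree and all weights are invented for illustration.
tree = {
    "economics":   {"weight": 0.6, "children": {"investment": 0.7, "operating costs": 0.3}},
    "environment": {"weight": 0.4, "children": {"air": 0.5, "water": 0.5}},
}

final_weights = {}
for branch in tree.values():
    for attribute, w in branch["children"].items():
        # Final weight of a lowest-level attribute = product of the weights
        # along the path from the root to that attribute.
        final_weights[attribute] = branch["weight"] * w

print(final_weights)                 # sums to one because each level sums to one
print(sum(final_weights.values()))
```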

4.4. A unifying frame

The insensitivity to range and the sensitivity to splitting of attributes can be attributed to the same cognitive causes, and both are special cases of structural issues that influence weights. The splitting of attributes as well as the specification of attribute ranges are variations done in the bottom part of the value tree. When splitting an attribute further into subattributes, the subattributes usually are qualitatively different from each other and from the unsplit attribute. If range effects are seen in terms of the splitting bias, an attribute with the full range might be subdivided into parts of the range. For example, if severity of consequences is determined by the length of time it lasts, say a certain amount of

pollution for ten years, this attribute can be split into two attributes, namely the pollution during the first five years and the following five years. Such a split is done by cutting the range. The relation of cutting ranges and splitting attributes can also be shown for more general cases. Suppose you are interested in filling a professorship. Clearly the quality of the applicants, which can be split into teaching and research quality, is of importance. Conceptually the splitting is identical to having two subattributes with reduced content (ranges). As subjects do not adjust the importance judgments enough for reduced ranges (contents), the split attributes get overweighted with respect to the combined main attribute.

The similarity between both biases is also evident when an attribute is described by a compound event on the super level, and by the specific events on the sublevels. An example would be the 'Health and Safety' consequences in Eppel (1990) and Borcherding and von Winterfeldt (1988). In one case they are described as the number of workers and residents dying either by accidents or from cancer, and in the other case as four separate subattributes, health for workers, health for residents, safety for workers and safety for residents, with the same outcomes as before, respectively. In both cases precisely the same conceptual information was given, reflecting an attribute split by a range split.

Variations in the bottom and the top levels of a value tree have different effects on weighting procedures. Direct weighting procedures seem to be influenced by lower-level variations if applied in a non-hierarchical way. Most of the research reported here has found the splitting bias when using non-hierarchical methods. On the other hand, top-level variations influence weight assessments when methods are used hierarchically.

5. Description invariance: invariance in weights across reference points

Recently, reference points have become very popular in decision making under risk. Different descriptions of a decision problem can lead to different reference points. The same outcome (a 5% salary increase) may be framed as a gain (when you were expecting 2%) or as a loss (when you were expecting 10%), depending on the


[Figure 1. Indifference curves over the two attributes A1 and A2, with reference points X and Y and matched indifference points X' and Y'.]

reference point. The setting of a reference point (i.e. the framing of a decision) influences the risk attitude of decision makers. For losses decision makers tend to be risk seeking, for gains they tend to be risk averse (Kahneman and Tversky, 1979). The idea that reference points matter has recently been extended to multiattribute decision making (Tversky and Kahneman, 1991). The new theory is mainly descriptive. However, for prescriptive use it is important to recognize the potential behavioral influences described by the theory.

In one experiment Tversky and Kahneman (1991) convincingly demonstrated that weights depend on the status quo of the decision maker. Two jobs were compared: X = (limited social contact, 20 min daily travel time to work), Y = (moderately sociable, 60 min daily travel time to work). If the reference point, i.e. the present job, involved no social contact and 10 minutes travel time, 70% of the subjects preferred job X. If the present job involved much pleasant interaction and 80 minutes daily travel time, only 33% of the subjects preferred job X. The difference stems from the fact that subjects do not want to lose relative to what they currently have.

More generally, consider Figure 1. Let X be the reference point and X' the matched indifference point. The loss aversion of having to give up something on attribute A2 makes this attribute more important. A similar argument holds for the pair Y and Y'. It is interesting to note that, due to the different weighting, indifference curves can cross.

Delquié (1990) elicited indifference statements by matching one of two dimensions. He derived


indifference curves analogously to Figure 1; he also found significant changes in weight judgments. However, Delquié sees loss aversion as only one possible explanation of his results.

Obviously, there are different ways to set a reference point. Presenting people with phantom alternatives might be another possibility (Farquhar and Pratkanis, 1986). If a subject is indifferent between alternatives X and Y (see Figure 1), the presentation of an unavailable (phantom) alternative, e.g. X', dominating Y makes the decision maker (to a small degree) prefer Y over X. The loss of y1 (when exchanging Y for X) in the context of the phantom has less of an impact than in the context without the phantom. The general question of how additional alternatives affect the tradeoff between a given pair of alternatives is far from being resolved. For a recent approach to modelling context-dependent preferences see Tversky and Simonson (1991).

Some framing effects directly influence weight assessments. As Shapira (1981) has shown using the tradeoff procedure, it is not the same whether weights are derived from equivalent improvements (or wins) or from equivalent degradations (or losses), and effects like these might also occur with other methods, specifically the swing method, as well as with all kinds of anchoring and reference points.

Framing from a more general perspective is outlined in Keeney (1992). We tend to frame decisions as problems, not as opportunities, which they often are and can be. If a person looks for a job and gets several options, we might think of this person as having a decision problem. Why is this a problem and what is bad about having more than one job offer? If situations like the previous one are associated with opportunities and positive feelings, decisions might be different and might also be less stressful and even enjoyable.
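The reference-point effects described in this section can be made concrete with a simple piecewise-linear, loss-averse value function. The functional form, the loss-aversion coefficient and the outcomes below are illustrative assumptions in the spirit of Tversky and Kahneman (1991), not the model estimated in their experiments.

```python
# Minimal sketch of reference-dependent evaluation with loss aversion.
# Piecewise-linear value, loss-aversion coefficient and all outcomes are
# illustrative assumptions (in the spirit of Tversky and Kahneman, 1991).
LAMBDA = 2.0     # losses loom about twice as large as equal-sized gains
WEIGHTS = [0.5, 0.5]

def ref_value(alternative, reference):
    """Evaluate an alternative as gains/losses on each attribute relative to a reference."""
    total = 0.0
    for k, a, r in zip(WEIGHTS, alternative, reference):
        d = a - r
        total += k * (d if d >= 0 else LAMBDA * d)
    return total

# Attribute scores (already on comparable 0-100 value scales, by assumption):
X = (80, 30)     # strong on attribute 1 (e.g. short commute), weak on attribute 2
Y = (30, 80)     # the mirror image

for ref, name in ((X, "X"), (Y, "Y")):
    print(f"reference {name}: value of X = {ref_value(X, ref):6.1f}, "
          f"value of Y = {ref_value(Y, ref):6.1f}")
# With reference X, Y's loss on attribute 1 is amplified (and vice versa), so the
# status quo tends to be retained -- the pattern observed in the job experiment.
```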

6. Some psychological explanations for failures of invariance

6.1. Cognitive overload reduction strategy

When people have to consider many criteria in making a decision, they typically suffer from cognitive overload. In order to reduce the stress of



such decision tasks, they apply simplifying strategies or heuristics that mostly lead to good decisions but under specific conditions may lead to very biased ones. As mentioned earlier, attribute weights derived from choices or holistic evaluations of alternatives are steeper than those derived from matching or decomposed criteria evaluations. In these cases, alternatives are the focus of interest, and the complexity of the situation is reduced by emphasizing outcomes on important aspects and ignoring unimportant ones. The result is overly steep weights, increased for the already important attributes and decreased for the unimportant ones, thereby reducing the dimensionality of the evaluation task and deciding in accordance with some lexicographic decision rule or with prominent attributes.

In cases where attributes are salient and have to be judged in terms of importance, subjects do not sufficiently differentiate between the different attributes and attribute outcomes and spread the weight too evenly across all attributes. This explains why direct weight estimation usually is too flat and accounts for flatter weights in decomposed weighting and matching procedures; it also accounts for the insensitivity to attribute ranges as well as for the sensitivity to split attributes.

6.2. Communicating about importance weights

In making decisions, attribute importance can be inferred from holistic evaluations and vice versa. The extent to which this is true is explored by Goldstein and his associates (Goldstein, 1990). In learning about other people's beliefs and values we can either observe their evaluations and decision behavior or we can directly communicate the importance of criteria. The latter approach seems to be a lot more efficient. The topic is relevant when alternatives are still to be created, in public issues where decisions have to be justified, and in magistrate or deputy decisions where we want to instruct agents to decide and act in accordance with the goals of a company or the values of a stakeholder.

Two questions arise in this context: the extent to which people can adequately infer importance weights from other people's judgments, and the adequacy of preference orders of alternatives

derived from the communication of attribute weights. Whether translations from one set of information to the other are possible depends on whether people agree on the interpretation of the concept of weight, and different possible interpretations have been proposed (Goldstein, 1990; Goldstein and Beattie, 1991). In the global interpretation of subjective weight, subjects express and interpret weight as a general attitude toward proxy or general values. This interpretation is in accordance with the insensitivity of subjects' weights to the attributes' actual outcomes and ranges. In the local interpretation of weight, the meaning of weight depends on the specific situation and on context effects. There is evidence for both interpretations.

7. What to learn from this survey

In this article we have tried to convince the reader that there are a lot of behavioral influences on weight judgments in multiattribute decision making. We have presented evidence that weights can depend on the method used to determine them (violating procedural invariance), on the hierarchical structure, and on the reference point (violating description invariance). People, however, do not adjust properly for changes of ranges.

For those who apply MAVT, our results are good and bad news. Bad news, for one has to abandon the notion of weights that can be assessed easily. There are no criteria for determining which is the true weight, since it is not clear which elicitation procedure is the least biased. Good news, for we now know much more about potential shortcomings in assessing weights. Being aware of potential biases is the first and probably most important step toward overcoming them. Multiple assessments provide another way to avoid systematic biases: in case the results differ across assessment modes, the weights are fed back to the decision maker, who is then asked to reconsider her original statements. When weights are assessed in different ways it might also be helpful to apply the concept of partial information (Salo and Hämäläinen, 1992; Weber, 1987). Using this concept, a set of weights compatible with all information can be derived.


One can then check whether the optimal solution can be found with respect to all the weights elicited (see the sketch at the end of this section).

For future research the area of behavioral influences on weight judgments offers a lot of interesting perspectives. Researchers have definitely not found all biases relevant for weight elicitation. We are only aware of very few attempts to (descriptively) model the observed behavior. Empirical studies explaining and manipulating the biases would also be of great interest. The question of how to deal with biases in a prescriptive framework is also far from being settled. Clearly, what is needed is an error theory for multiattribute decision making.

With all these challenges to MAVT, should we not use other methods like AHP or others? Our answer is a clear 'no'. Other methods use procedures like the ratio procedure to derive the weights and thus are prone to the same biases. Other methods do not even offer a convincing answer as to how the decision maker should behave in the light of certain problems (e.g. AHP for the range effect). So far, MAVT is the only axiomatically founded theory which allows the user to carefully investigate the behavioral influences on weight judgments. The goal should be to learn more about these influences and to be able to cope with them in a prescriptive setting.
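A minimal sketch of such a check under partial information: given interval bounds on the weights as a stand-in for 'all weights compatible with the elicited information', one can test whether a candidate alternative remains at least as good as a competitor over the whole set. The interval representation and the linear-programming check below are illustrative simplifications, not the procedures of Salo and Hämäläinen (1992) or Weber (1987).

```python
# Minimal sketch: checking whether alternative a stays at least as good as b for
# every weight vector compatible with interval bounds (partial information).
# The bounds, value scores and the use of linear programming are illustrative.
from scipy.optimize import linprog

# Conditional values v_i(.) of the two alternatives on three attributes.
v_a = [0.9, 0.4, 0.6]
v_b = [0.5, 0.7, 0.5]

# Partial information: interval bounds on each weight, weights sum to one.
bounds = [(0.2, 0.5), (0.2, 0.5), (0.1, 0.4)]

# Minimize v(a) - v(b) over the feasible weight set; if the minimum is >= 0,
# a is at least as good as b for every compatible weight vector.
c = [va - vb for va, vb in zip(v_a, v_b)]
res = linprog(c, A_eq=[[1.0, 1.0, 1.0]], b_eq=[1.0], bounds=bounds)
print("worst-case v(a) - v(b):", round(res.fun, 3))
print("a dominates b over the whole weight set:", res.fun >= 0)
```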

Acknowledgments

We would like to thank Thomas Eppel, Rüdiger von Nitzsch, Stefanie Schmeer, Elke Weber and Klaus Wertenbroch, who gave us helpful comments on an earlier draft of the paper.

References

Adelman, L., Sticha, P.J., and Donnell, M.L. (1986), "An experimental investigation of the relative effectiveness of two techniques for structuring multiattribute hierarchies", Organizational Behavior and Human Decision Processes 37, 188-196.
Barron, F.H., and Person, H.B. (1979), "Assessment of multiplicative utility functions via holistic judgments", Organizational Behavior and Human Performance 24, 147-166.
Beattie, J., and Baron, J. (1991), "Investigating the effect of stimulus range on attribute weight", Journal of Experimental Psychology: Human Perception and Performance 17, 571-585.


Borcherding, K., Eppel, T., and von Winterfeldt, D. (1991), "Comparison of weighting judgments in multiattribute utility measurement", Management Science 37, 1603-1619.
Borcherding, K., and von Winterfeldt, D. (1988), "The effect of varying value trees on multiattribute evaluations", Acta Psychologica 68, 153-170.
Delquié, P. (1990), "Inconsistent trade-offs between attributes: New insights in preference assessment biases", Working paper, Graduate School of Business, University of Texas at Austin, Austin, TX.
Dyer, J.S., and Sarin, R.K. (1979), "Measurable multiattribute value functions", Operations Research 27, 810-822.
Edwards, W. (1977), "How to use multiattribute utility analysis for social decision making", IEEE Transactions on Systems, Man, and Cybernetics 7, 326-340.

Einhorn, H.J., and Hogarth, R.M. (1975), "Unit weighting schemes for decision making", Organizational Behavior and Human Performance 13, 171-192.
Eppel, T. (1990), "Eliciting and reconciling multiattribute utility weights", Doctoral Dissertation, University of Southern California, Los Angeles, CA.
Farquhar, P.H., and Pratkanis, A.R. (1986), "Phantom choices: The effect of unavailable alternatives on decision making", Working paper, GSIA, Carnegie Mellon University, Pittsburgh, PA.
Fischer, G.W. (1991), "Range sensitivity of attribute weights in multiattribute utility assessment", to appear in Organizational Behavior and Human Decision Processes.

Fischer, G.W., Damodaran, N., Laskey, K.B., and Lincoln, D. (1987), "Preferences for proxy attributes", Management Science 33, 198-214.
Fishburn, P.C. (1967), "Methods of estimating additive utilities", Management Science 13, 435-453.
Goldstein, W.M. (1990), "Judgments of relative importance in decision making: Global vs. local interpretations of subjective weight", Organizational Behavior and Human Decision Processes 47, 313-336.
Goldstein, W.M., and Beattie, J. (1991), "Judgments of relative importance in decision making: The importance of interpretation and the interpretation of importance", in: D.R. Brown and J.E.K. Smith (eds.), Frontiers of Mathematical Psychology, Springer-Verlag, New York, 110-137.
Green, P.E., and Srinivasan, V. (1978), "Conjoint analysis in consumer research: Issues and outlook", Journal of Consumer Research 5, 103-123.
Green, P.E., and Srinivasan, V. (1990), "Conjoint analysis in marketing: New developments with implications for research and practice", Journal of Marketing 54, 3-19.
Hershey, J.C., Kunreuther, H.C., and Schoemaker, P.J.H. (1982), "Sources of bias in assessment procedures for utility functions", Management Science 28, 936-954.
Kahneman, D., and Tversky, A. (1979), "Prospect theory: An analysis of decision under risk", Econometrica 47, 263-291.
Keeney, R.L. (1992), Value-Focused Thinking: A Path to Creative Decisionmaking, Harvard University Press, Cambridge, MA.
Keeney, R.L., and Raiffa, H. (1976), Decisions with Multiple Objectives, Wiley, New York.
Roy, B. (1990), "Decision-aid and decision-making", European Journal of Operational Research 45, 324-331.

Saaty, T.L. (1990), "How to make a decision: The Analytic Hierarchy Process", European Journal of Operational Research 48, 9-26.
Salo, A.A., and Hämäläinen, R.P. (1992), "Preference assessment by imprecise ratio statements", to appear in Operations Research.
Schoemaker, P.J.H., and Waid, C.C. (1982), "An experimental comparison of different approaches to determining weights", Management Science 28, 182-196.
Shapira, Z. (1981), "Making trade-offs between job attributes", Organizational Behavior and Human Performance 28, 331-355.
Slovic, P. (1975), "Consistency of choice between equally valued alternatives", Journal of Experimental Psychology: Human Perception and Performance 1, 280-287.
Slovic, P., Griffin, D., and Tversky, A. (1990), "Compatibility effects in judgment and choice", in: H.J. Einhorn (ed.), Insights in Decision Making, University of Chicago Press, Chicago, IL.
Stillwell, W.G., von Winterfeldt, D., and John, R.S. (1987), "Comparing hierarchical and nonhierarchical weighting methods for eliciting multiattribute value models", Management Science 33, 442-450.
Tversky, A., and Kahneman, D. (1986), "Rational choice and the framing of decisions", Journal of Business 59, S251-S278.
Tversky, A., and Kahneman, D. (1991), "Loss aversion in riskless choice: A reference-dependent model", Quarterly Journal of Economics 106, 1039-1061.
Tversky, A., Sattath, S., and Slovic, P. (1988), "Contingent weighting in judgment and choice", Psychological Review 95, 371-384.

Tversky, A., and Simonson, I. (1991), "Context-dependent preferences", to appear in Management Science.
von Nitzsch, R. (1992), Entscheidung bei Zielkonflikten, Gabler, Wiesbaden.
von Nitzsch, R., and Weber, M. (1991), "The effect of attribute ranges on weights in multiattribute utility measurements", to appear in Management Science.
von Winterfeldt, D., and Edwards, W. (1986), Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge.
Weber, M. (1985), "Entscheidung bei Mehrfachzielen und unvollständiger Information", Zeitschrift für betriebswirtschaftliche Forschung 37, 311-331.
Weber, M. (1987), "Decision making with partial information", European Journal of Operational Research 28, 44-57.
Weber, M. (1989), "Availability as an explanation for the multi-attribute splitting bias", Working Paper No. 88-5, Institut für Wirtschaftswissenschaften, RWTH Aachen, Aachen.
Weber, M., Eisenführ, F., and von Winterfeldt, D. (1988), "The effects of splitting attributes on weights in multiattribute utility measurement", Management Science 34, 431-445.
Wertenbroch, K. (1991), "The sensitivity of bivariate linear models to changes in the weighting scheme: A method of assessment", Working Paper, Center for Decision Research, University of Chicago, Chicago, IL.
Westenberg, M.R.M., and Koele, P. (1990), "Response modes and decision strategies", in: K. Borcherding, O.I. Larichev and D.M. Messick (eds.), Contemporary Issues in Decision Making, North-Holland, Amsterdam, 159-170.