Reduction of paradoxes in subjectively judged competitions

European Journal of Operational Research 35 (1988) 16-29 North-Holland

Theory and Methodology

Jesper S. FREDERIKSEN *
DIKU, Copenhagen

Robert E. MACHOL
J.L. Kellogg Graduate School of Management, Northwestern University, Evanston, IL, USA

Abstract: In competitions where a number of judges choose with some subjectivity among three or more alternatives, a wide variety of undesirable effects can occur. Examples of such effects are presented, with emphasis on figure skating and dance competitions, where the rule traditionally used allows for the existence of dictatorsituations, dependence on irrelevant alternatives, and intransitivity. Although not all paradoxes can be removed (as proven by Arrow), we propose an improved rule which alleviates many of the difficulties, and is superior to any weighted summation rule.

Keywords: Decision, sports, competitions, social choice, voting

1. Introduction

A wide variety of competitions are judged subjectively by several judges, each of whom either ranks the contestants in some order, or gives an absolute value to the performance of each contestant, or votes to decide which contestants will go on to more advanced levels of competition. These contests include a wide variety of Olympic sports, among which are diving, figure skating, and gymnastics, as well as other competitions, including ballroom dancing. Contestants are hereinafter called 'teams' for brevity.

In such competitions, it is possible to have a wide variety of undesirable effects such as intransitivity (team 1 is preferred to team 2 is preferred to team 3 is preferred to team 1); irrelevance (team 1 beats team 2 if team 3 is contending, but if team 3 is eliminated from consideration, team 2 beats team 1); and dictatorsituations (the result between two of the alternatives is in opposition to all but one judge).¹ The name dictatorsituations is given to these situations because, when examining the result in relation to the judging, it appears as if one of the judges has been dictating the result. A theorem by Arrow, discussed in Section 3, indicates that it is not possible to eliminate all of these undesirable characteristics. In this paper, we construct a scheme which eliminates dictatorsituations and some other undesirable characteristics, while permitting some minimal irrelevance.

We do not know of any discussion of this type of judging, although there is, of course, an extensive literature on voting (e.g., [2,3,4]). In a slightly different situation where the rankings are objective rather than subjective, specifically in the case of cross-country running, Sidney and Sidney [5] have discussed ways of reducing some of these same difficulties.

* Present address: Structura, Management Consultants, Antoinettevej 4, Copenhagen, Denmark.
Received December 1986

¹ Dictatorsituations should not be confused with the word 'dictator' as used in the social choice literature. Exact definitions of both are given in Section 3 of this paper.

0377-2217/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)

J.S. Frederiksen, R.E. Machol / Subjectively judged competitions

Table 1
Results of nine judges ranking six teams (each row gives one judge's marks for teams 6, 26, 44, 50, 53 and 79; 1 is best)

Cha-Cha
Judge     6   26   44   50   53   79
1         6    3    5    4    2    1
3         4    3    6    2    5    1
7         2    6    3    4    5    1
22        2    3    6    5    4    1
23        2    3    4    6    5    1
33        3    2    5    6    4    1
36        5    3    6    1    4    2
41        1    5    6    3    4    2
49        2    5    6    3    4    1
Place   2nd  3rd  6th  4th  5th  1st

Samba
Judge     6   26   44   50   53   79
1         6    4    5    3    2    1
2         2    6    5    3    4    1
4         2    5    6    3    4    1
8         3    2    6    4    5    1
12        2    4    6    5    3    1
14        3    5    4    6    2    1
20        2    4    5    6    3    1
21        3    2    6    5    4    1
40        2    6    5    4    3    1
Place   2nd  4th  6th  5th  3rd  1st

Rumba (judges 1, 3, 6, 10, 13, 18, 34, 38, 45; eight of the nine mark rows are in position)
          6   26   44   50   53   79
          6    3    5    4    2    1
          1    4    5    6    3    2
          2    4    6    5    3    1
          3    6    4    5    1    2
          3    4    5    6    2    1
          1    3    6    4    2    5
          3    4    6    5    2    1
          2    1    5    6    4    3
Place   3rd  4th  6th  5th  2nd  1st

Paso-Doble (judges 1, 9, 11, 19, 25, 37, 42, 43, 48; eight of the nine mark rows are in position)
          6   26   44   50   53   79
          6    4    5    2    3    1
          1    3    6    5    4    2
          1    5    6    2    3    4
          1    5    4    6    3    2
          4    3    5    6    2    1
          3    2    6    1    4    5
          2    5    4    6    3    1
          5    3    2    6    4    1
Place   2nd  3rd  6th  5th  4th  1st

Jive (judges 1, 15, 16, 17, 24, 30, 35, 44, 47; five of the nine mark rows are in position)
          6   26   44   50   53   79
          1    2    5    6    4    3
          4    2    5    6    3    1
          5    4    2    6    3    1
          3    5    4    2    6    1
          1    2    4    6    5    3
Place   4th  2nd  5th  6th  3rd  1st

The remaining six mark rows (one for Rumba, one for Paso-Doble and four for Jive) appear out of position in the source:
          6   26   44   50   53   79
          6    4    5    3    2    1
          4    3    6    2    5    1
          1    2    6    5    4    3
          4    1    6    5    3    2
          1    6    4    5    3    2
          4    1    6    5    3    2

Final analysis
Team      C    S    R    P    J   Total   Place
6         2    2    3    2    4     13     2nd
26        3    4    4    3    2     16     3rd
44        6    6    6    6    5     29     6th
50        4    5    5    5    6     25     5th
53        5    3    2    4    3     17     4th
79        1    1    1    1    1      5     1st

The basic problem is to obtain a resulting relative ranking (hereinafter called 'resulting ranking') of a set of objects from a set of relative rankings of these objects. Ranking the couples in a dance competition is an example. In the final round of the United Kingdom Championships in amateur Latin-American dancing, January 1983, in Hammersmith Palace, London, six teams competed, and for each dance a panel of nine judges ranked the six teams [1]. Table 1 shows the judges' rankings.

Sections 2 through 7 discuss ordinal ranking, in which each judge states which team he thinks is best (rank 'one'), which second best, and so on. A commonly used rule is presented, several criteria are defined, some theorems are proven, and an improved rule is developed. Section 8 discusses absolute ranking, in which each judge awards a (cardinal) numerical score to each team. Section 9 discusses binary scoring, in which each judge votes either to send that team on to the next round of competition or to eliminate it from the competition. Section 10 discusses the use of these rules to evaluate judges. Section 11 presents the summary and conclusions. We assume throughout this paper that the number of judges is odd (as is usually the case); extension to an even number of judges is trivial.

2. The skatingrule

The rule used in these competitions is called the skatingrule. It is a generalization of the principle that a team with a majority of firsts wins the competition, with subsidiary rules when no majority of firsts exists, and with similar rules for determining second, third, etc. In the algorithmic presentation of the rule, Table 2, each of the rankings (such as the numbers within the matrices of Table 1) is called a 'mark', with 1 being the best mark. If after application of the tie-breaking procedure (steps M + 2 and M + 4) the tie cannot be broken and the k tied teams should have been placed from a to a + k - 1, they are all given the rank

$$\frac{1}{k}\sum_{i=a}^{a+k-1} i = a + \frac{k-1}{2}.$$

Table 2
The skatingrule

Number of teams = M; number of judges = 2n - 1. Assume the first R - 1 teams have already been found (initially R = 1). w(STEP) is the number of teams among the remaining M - R + 1 teams with n or more marks less than or equal to STEP. The following algorithm is used to find the team(s) to be ranked from R to R + w(STEP) - 1:

        For STEP = 1 to M, DO IF w(STEP) > 0 THEN GO TO M + 1
M + 1   IF w = 1 THEN give rank R to this team (with n or more marks ≤ STEP) and GO TO M + 5.
M + 2   Of the w(STEP) ≥ 2 teams give best (lowest) rank(s) to those with the largest number of marks ≤ ...
M + 3   ...
M + 4   ...
M + 5   ...
Main example. The following example, shown in Table 3, with 6 teams and 2n - 1 judges, will be referred to frequently and will be called the 'main example'. It illustrates some interesting possibilities of paradoxes. Applying the skatingrule to the main example, we first want to find the winner or the best teams.

STEP = 1. No couples found.
STEP = 2. Teams 1 and 2 are found. Step M + 2 does not break the tie. Using the tie-breaking rule in step M + 4, team 1 is given rank 1 and team 2 is given rank 2.

Now teams 1 and 2 are deleted. To find the team to be placed 3rd repeat the algorithm:

STEP = 3. Teams 3 and 4 are found and selected. In step M + 2, team 3 is given rank 3 and team 4 is given rank 4.

Now teams 3 and 4 are deleted. To find the team to be placed 5th repeat the algorithm:

STEP = 5. Team 5 is found and given the rank 5.

After deleting team 5:

STEP = 6. Team 6 is found and given rank 6.

Table 3
Main example

Team no.   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Resulting ranking
1          1             2...2                 6...6                 1
2          2             4...4                 2...2                 2
3          3             1...1                 3...3                 3
4          4             3...3                 1...1                 4
5          5             5...5                 5...5                 5
6          6             6...6                 4...4                 6
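The walk-through above can be replayed with a short Python sketch. It is an illustration under stated assumptions, not the rule as printed: steps M + 3 to M + 5 of Table 2 are incomplete here, so the second tie-break (lowest sum of the marks ≤ STEP) is an assumption chosen to agree with the worked example, and the names `main_example` and `skating_rule` are hypothetical.

```python
from itertools import groupby


def main_example(n):
    """Marks of Table 3: judge A once, then n - 1 copies each of judges B and b."""
    a = [1, 2, 3, 4, 5, 6]
    big_b = [2, 4, 1, 3, 5, 6]
    small_b = [6, 2, 3, 1, 5, 4]
    return {team: [a[team - 1]]
                  + [big_b[team - 1]] * (n - 1)
                  + [small_b[team - 1]] * (n - 1)
            for team in range(1, 7)}


def skating_rule(marks):
    """marks[team] is the list of ordinal marks (1 = best), one per judge.

    Returns {team: rank}; teams whose tie cannot be broken share the
    average rank a + (k - 1)/2.
    """
    n_judges = len(next(iter(marks.values())))
    majority = n_judges // 2 + 1          # n, when there are 2n - 1 judges
    remaining = set(marks)
    result = {}
    next_rank = 1                         # R in Table 2
    while remaining:
        # Smallest STEP at which some remaining team has a majority of marks <= STEP.
        for step in range(1, len(marks) + 1):
            found = [t for t in remaining
                     if sum(m <= step for m in marks[t]) >= majority]
            if found:
                break

        def key(team):
            good = [m for m in marks[team] if m <= step]
            # More marks <= STEP first; then (assumed) the lower sum of those marks.
            return (-len(good), sum(good))

        found.sort(key=key)
        for _, group in groupby(found, key=key):
            tied = list(group)
            shared = next_rank + (len(tied) - 1) / 2
            for team in tied:
                result[team] = shared
            next_rank += len(tied)
        remaining -= set(found)
    return result


ranks = skating_rule(main_example(3))     # five judges: A, B, B, b, b
```

With n = 3 the sketch places the teams in the order 1 to 6, as in the resulting-ranking column of Table 3.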

The main example is constructed to illustrate the paradoxes that may occur when the skatingrule is used. First notice that team 1 beats team 3 although team 3 is preferred over team 1 by 2n - 2 judges (that is, all but one) and is strongly preferred by half of them. Similarly, team 2 beats team 4, although 2n - 2 judges prefer 4 over 2. So this example produces a dictatorsituation with all judges except judge A in opposition to the ranking of team 1 over team 3 and team 2 over team 4. Judge A happens to decide the full result. Note that in this example judge A's votes are not out of line. It is not as if the dictator were awarding 6 where everyone else was awarding 1; he differs by at most one rank from half of the other judges. We shall see that if the judges disagree a bit more, this paradox might have been even worse.

We do not know of any investigation dealing with the frequency of dictatorsituations in such judging. It is likely that dictatorsituations are rather infrequent with 5 or more judges; however the adjudicators often disagree with each other, and rather often one of the judges is out of line; note in Table 1 that team 6 was awarded every one of the possible ranks, from 1 (best) to 6 (worst), by at least one of the nine judges, in both cha-cha and paso-doble. This means that dictatorsituations, which could lead to widespread dissatisfaction, can happen in practice even with a large number of judges.

Definition. Let $X_1$ and $X_2$ be teams. $X_1 >_j X_2$ means that judge j prefers (gives lower rank to) $X_2$ over $X_1$. $X_1 \sim_j X_2$ means that judge j ties $X_1$ and $X_2$ (not permitted in the dancing case, but possible in figure skating). $X_1 > X_2$ means that $X_2$ beats $X_1$ in the resulting ranking. $X_1 \sim X_2$ means that $X_2$ and $X_1$ are given the same rank in the resulting ranking. $X_1 \geq X_2$ means $X_1 > X_2$ or $X_1 \sim X_2$. Note that this definition differs from that commonly used in utility theory (because here the smaller number is better).

Definition. A chaotic dictatorsituation occurs if, for a judge J, $X_1 >_j X_m$ for all judges $j \neq J$, where $X_1$ is the winner and $X_m$ is the lowest-ranked team in the resulting ranking.

Theorem 1. The skatingrule with 2n - 1 judges, n > 1, and m teams, m > 3, allows existence of chaotic dictatorsituations.

Proof. Use the skatingrule on the example shown in Table 4. Notice that if judge A exchanges the marks for teams 1 and m he also exchanges the resulting ranks of teams 1 and m.

Table 4

Team no.             A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Resulting ranking
1                    1             2...2                 m...m                 1
K (2 ≤ K ≤ m - 3)    K             (K+1)...(K+1)         K...K                 K
m - 2                m - 2         m...m                 (m-2)...(m-2)         m - 2
m - 1                m - 1         (m-1)...(m-1)         1...1                 m - 1
m                    m             1...1                 (m-1)...(m-1)         m

Example 1. It would seem that eliminating the best and worst marks for each team might avoid dictatorsituations, but this is not the case. If we cross out the best and worst mark for each team in the main example, we get Table 5. Still all judges except judge A (now with all his marks eliminated!) are in opposition to the ranking of team 1 over team 3 and of team 2 over team 4.

Example 2 shows instances of another kind of paradox called 'irrelevance', where deleting team i reverses the rankings of teams j and k. In the main example, suppose team 4 is not competing, but the judges have the same relative ranking of the remaining teams; see Example 2a in Table 6. Notice the change of the relative ranking between teams 1 and 3. If it were team 3 that was not competing, and the judges still have the same relative ranking of the remaining teams, the result is again totally different; see Example 2b in Table 7. Now team 4 beats team 2.

A particular example of irrelevance is intransitivity. If, in the main example, only teams 1 and 2 are competing, we get Example 2c, shown in Table 8. If only teams 2 and 3 are competing we get Example 2d (Table 9). If only teams 1 and 3 are competing, we get Example 2e (Table 10). Thus, team 1 is preferred over team 2, team 2 is preferred over team 3, and team 3 is preferred over team 1.

Table 5

Team number   n - 1 judges   n - 2 judges   Result
1             2              6              2
2             2              4              1
3             3              1              3½
4             3              1              3½
5             5              5              5
6             6              4              6

Table 6
Example 2a

Team number   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Result
1             1             2...2                 5...5                 3
2             2             3...3                 1...1                 2
3             3             1...1                 2...2                 1
5             4             4...4                 4...4                 4
6             5             5...5                 3...3                 5

Table 7
Example 2b

Team number   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Result
1             1             1...1                 5...5                 1
2             2             3...3                 2...2                 3
4             3             2...2                 1...1                 2
5             4             4...4                 4...4                 4
6             5             5...5                 3...3                 5

Table 8
Example 2c

Team number   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Result
1             1             1...1                 2...2                 1
2             2             2...2                 1...1                 2

Table 9
Example 2d

Team number   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Result
2             1             2...2                 1...1                 1
3             2             1...1                 2...2                 2

Table 10
Example 2e

Team number   A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)   Result
1             1             2...2                 2...2                 2
3             2             1...1                 1...1                 1
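Examples 2a to 2e all restrict the main example to a subset of teams while keeping each judge's preferences. The Python sketch below reproduces that restriction and the three two-team contests. It assumes n = 3, and uses the fact stated in Section 2 that a majority of firsts wins, which decides any two-team contest with an odd number of judges. The function names are hypothetical.

```python
def main_example(n):
    """Marks of Table 3: judge A once, then n - 1 copies each of judges B and b."""
    a = [1, 2, 3, 4, 5, 6]
    big_b = [2, 4, 1, 3, 5, 6]
    small_b = [6, 2, 3, 1, 5, 4]
    return {team: [a[team - 1]]
                  + [big_b[team - 1]] * (n - 1)
                  + [small_b[team - 1]] * (n - 1)
            for team in range(1, 7)}


def restrict(marks, teams):
    """Keep only `teams`; renumber every judge's marks 1..len(teams) in the same order."""
    n_judges = len(next(iter(marks.values())))
    out = {t: [] for t in teams}
    for j in range(n_judges):
        ordered = sorted(teams, key=lambda t: marks[t][j])
        for new_mark, team in enumerate(ordered, start=1):
            out[team].append(new_mark)
    return out


def pairwise_winner(marks, x, y):
    """Winner when only x and y compete: the team with a majority of firsts."""
    sub = restrict(marks, [x, y])
    return x if sub[x].count(1) > len(sub[x]) / 2 else y


marks = main_example(3)
example_2a = restrict(marks, [1, 2, 3, 5, 6])   # the mark columns of Table 6
cycle = [pairwise_winner(marks, 1, 2),          # Example 2c
         pairwise_winner(marks, 2, 3),          # Example 2d
         pairwise_winner(marks, 1, 3)]          # Example 2e
```

Here `cycle` evaluates to `[1, 2, 3]`: team 1 beats team 2, team 2 beats team 3, and team 3 beats team 1.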

3. Arrow's impossibility theorem in relation to possible improvements

Arrow [2] has discussed some reasonable conditions one might expect a voting rule to have. Slightly reformulated, these are:

PA    Positive Association of resulting and relative rankings. If a judge changes his ranking so that team $X_1$ is given a better ranking, while his preferences among the other teams are unchanged, then strong preferences (such as $X_s > X_1$) are all preserved, and indifference relations (such as $X_s \sim X_1$) are preserved or changed to $X_s > X_1$.

IIR   Independence of Irrelevant Alternatives. If the judges' rankings are changed, but no judge has changed his preferences between teams in some proper subset B, then resulting rankings among the teams in B will remain unchanged.

PO    The judges' sovereignty (Pareto-Optimality). If $x >_j y$ for all judges j then $x > y$.

ND    NonDictatorship. There is no judge J such that for all competitors x, y: $x >_J y \Rightarrow x > y$ regardless of the preferences of the other judges.

For a rule to be used in sports competitions it is natural to impose the following stronger condition.

NDS   No DictatorSituations. There is no judge J such that for any teams x and y, $y > x$ but $x >_j y$ for all judges $j \neq J$.

Note that NDS $\Rightarrow$ ND, but ND $\not\Rightarrow$ NDS. Arrow has proved that no rule can satisfy conditions PA, IIR, PO, and ND when 2 or more judges choose between 3 or more teams. Obviously, therefore, no rule can satisfy conditions PA, IIR, PO, and the stronger condition NDS. It is trivial that the skatingrule satisfies conditions PA and PO.

Application of the Arrow theorem shows that it is impossible to eliminate both the possible existence of dictatorsituations and the possible existence of irrelevance without taking away the judges' sovereignty or the positive association of social and individual preferences. The purpose of the following discussion will therefore be to eliminate the possible existence of dictatorsituations while PA and PO and most of IIR still hold, or at least hold in slightly weakened form.
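The NDS condition can be checked mechanically for any set of marks and any resulting ranking. The sketch below is one possible reading of the definition, with hypothetical function names: it reports each pair in which x is placed ahead of y although at most one judge marks x better than y. When no judge at all agrees with the result, the judge entry is `None`. It is applied to the main example with n = 3 and the resulting ranking of Table 3.

```python
def main_example(n):
    """Marks of Table 3: judge A once, then n - 1 copies each of judges B and b."""
    a = [1, 2, 3, 4, 5, 6]
    big_b = [2, 4, 1, 3, 5, 6]
    small_b = [6, 2, 3, 1, 5, 4]
    return {team: [a[team - 1]]
                  + [big_b[team - 1]] * (n - 1)
                  + [small_b[team - 1]] * (n - 1)
            for team in range(1, 7)}


def dictator_situations(marks, result):
    """Return (judge, x, y) triples violating NDS.

    x is placed ahead of y in `result` (smaller rank), yet every judge other
    than `judge` (a 0-based index) gives y the better mark.
    """
    teams = sorted(marks)
    n_judges = len(marks[teams[0]])
    found = []
    for x in teams:
        for y in teams:
            if result[x] >= result[y]:
                continue                  # only pairs where x is placed ahead of y
            agree = [j for j in range(n_judges) if marks[x][j] < marks[y][j]]
            if len(agree) <= 1:
                found.append((agree[0] if agree else None, x, y))
    return found


table_3_result = {team: team for team in range(1, 7)}
violations = dictator_situations(main_example(3), table_3_result)
```

For the main example this yields `[(0, 1, 3), (0, 2, 4)]`: judge A (index 0) is the only judge agreeing with team 1 over team 3 and with team 2 over team 4, which are the two dictatorsituations described in Section 2.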

4. A summation rule

If the number of judges exceeds the number of teams, it is easy to show that the following way to obtain the resulting ranking will satisfy NDS: For each team, add the rankings of the judges; give first place to the team with the lowest total score, second place to the team with the second lowest score, etc.

However, this rule does give rise to some other major problems, including a virulent form of irrelevance: the elimination of one team may cause a 'wild' permutation of the rankings of the others. This is dealt with in general by Saari [3,7] and can be illustrated by the following example. By deleting team 6 in Example 3 (Table 11), team 5, initially placed below all of the remaining teams, now becomes the winner.

Table 11
Example 3

Team number   A   B   C   D   E   F   G   Total   Result
1             4   6   1   2   1   6   4   24      2½
2             3   5   2   5   2   1   6   24      2½
3             2   2   5   6   3   5   1   24      2½
4             5   1   6   1   4   2   5   24      2½
5             1   4   4   4   5   4   3   25      5
6             6   3   3   3   6   3   2   26      6

After deleting team 6:

Team number   A   B   C   D   E   F   G   Total   Result
1             4   5   1   2   1   5   3   21      3
2             3   4   2   4   2   1   5   21      3
3             2   2   4   5   3   4   1   21      3
4             5   1   5   1   4   2   4   22      5
5             1   3   3   3   5   3   2   20      1

Another problem with this rule is that a team which has been ranked first by a majority of the judges will not necessarily win (and similarly for a team ranked last by a majority of the judges); see Example 4 (and also Examples 8-10). Of course these examples do require somewhat disparate performances among the judges. But such disagreements are not uncommon, as illustrated in Table 12, which is quite representative.

With 'dances' substituted for 'judges', the above-described rule is used when obtaining the final rankings from the results in the individual dances (Table 1). In this situation the fact that the result in the majority of the dances does not necessarily decide the full result is not objectionable; the only serious problem in this case is the dependence on irrelevant alternatives.

Table 12
Example 4

Team number   A   B   C   D   E   F   G   Total   Result
1             1   1   1   1   1   5   6   16      2
2             2   2   2   2   2   2   2   14      1
3             3   3   3   3   4   1   3   20      3
4             4   4   4   4   5   3   4   28      4
5             5   5   5   5   6   4   5   35      6
6             6   6   6   6   3   6   1   34      5

5. The updating principle and fair rules

In Example 3, excluding the lowest-placed team drastically changed the results among the remainder of the teams. An obvious way to resolve this paradox is to introduce the following updating principle:

1. Select the team among the considered teams to be placed lowest and eliminate this team.
2. If any teams are left, then update the marks for the rest of the teams accordingly and GO TO 1.

Updating means changing the marks for each judge so that the rest of the teams are ranked without changing the preferences of the judge, as we have done in Examples 2 and 3. Mathematically, of course, selecting the best team(s) first or the worst team(s) first are symmetrical. But it offends our sensibilities if the worst teams affect the relative rankings of first and second, whereas we are rather indifferent if the best teams affect the relative rankings of last and next-to-last. In Example 3, team number 6 is eliminated and the rest of the marks updated accordingly; then team 4 is given the next lowest rank and eliminated; etc.

Definition. A ranking procedure is said to have the exclude-bad-alternatives property if, for all t < m - 1, when the t lowest-placed alternatives are taken out of consideration, the ranking of the best placed (m - t) alternatives would not be affected, provided that the judges do not change their preferences.

It is obvious that any deterministic ranking procedure which uses the updating principle satisfies the exclude-bad-alternatives property.

Definition.

WPA   Weak Positive Association between resulting and relative rankings. If a judge changes his rankings so that team X is given a better ranking while his preferences among the other teams are unchanged, then team X will not be ranked last if it was not initially ranked last.

WPA, which is weaker than PA, is similar to the positive association condition for voting procedures. PA $\Rightarrow$ WPA; WPA $\not\Rightarrow$ PA; WPA and IIR together imply PA. When examining rules constructed using the updating principle it seems to be easier to check WPA than PA.
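The summation rule of Section 4 and the updating principle can be combined in a few lines of Python, using the marks of Example 3. This is a sketch under assumptions: the function names are hypothetical, and the paper does not say how the summation rule handles ties while updating, so tied lowest teams are removed together and share the average of the lowest open ranks, M - (k - 1)/2.

```python
EXAMPLE_3 = {            # Table 11: marks of judges A..G for each team
    1: [4, 6, 1, 2, 1, 6, 4],
    2: [3, 5, 2, 5, 2, 1, 6],
    3: [2, 2, 5, 6, 3, 5, 1],
    4: [5, 1, 6, 1, 4, 2, 5],
    5: [1, 4, 4, 4, 5, 4, 3],
    6: [6, 3, 3, 3, 6, 3, 2],
}


def totals(marks):
    """Sum of the judges' marks per team (lower is better)."""
    return {team: sum(ms) for team, ms in marks.items()}


def update(marks, removed):
    """Delete `removed` teams and renumber each judge's marks 1..M ('updating')."""
    teams = [t for t in marks if t not in removed]
    n_judges = len(next(iter(marks.values())))
    out = {t: [] for t in teams}
    for j in range(n_judges):
        ordered = sorted(teams, key=lambda t: marks[t][j])
        for new_mark, team in enumerate(ordered, start=1):
            out[team].append(new_mark)
    return out


def summation_with_updating(marks):
    """Repeatedly place the team(s) with the highest total last, then update."""
    result = {}
    while marks:
        tot = totals(marks)
        worst = max(tot.values())
        losers = [t for t in marks if tot[t] == worst]
        m, k = len(marks), len(losers)
        for team in losers:
            result[team] = m - (k - 1) / 2    # assumed tie handling
        marks = update(marks, set(losers))
    return result


before = totals(EXAMPLE_3)                    # 24, 24, 24, 24, 25, 26
after = totals(update(EXAMPLE_3, {6}))        # 21, 21, 21, 22, 20
final = summation_with_updating(EXAMPLE_3)
```

The two `totals` calls reproduce the Total columns of Table 11. Under the assumed tie handling, `final` places team 6 sixth and team 4 fifth, ties teams 1, 2 and 3 at rank 3, and makes team 5 the winner.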

Definition. Condition of Anonymity: judges are said to have 'equal power' if, given a set of marks, the resulting ranking is independent of which judges award the marks.

Definition. Condition of Neutrality: teams are said to have 'equal chances' if, given a set of relative rankings, the resulting rankings depend only on the marks awarded to the various teams.

These two conditions are discussed by Arrow in [2, pp. 100-102].

Definition. A rule is 'fair' if
1. It satisfies WPA and PO.
2. The judges have equal power.
3. The teams have equal chances.
4. No dictatorsituation can occur if only two teams are competing.

It is highly desirable to get a fair rule in the above sense. All rules discussed in this article are fair rules.

Theorem 2. There is no fair rule with 3 or more teams without the possibility of intransitivity.

Proof. Consider Example 5 (Table 13), with three teams and 2n - 1 judges; suppose we use a fair rule with transitivity. In the final result $X_1 > X_3$, since if $X_1$ and $X_3$ were the only teams competing, $X_1 > X_3$ follows from fairness condition 4. This leaves only the following possible types of final rankings:

CASE 1.  $X_1 > X_3 \geq X_2$.
CASE 2.  $X_1 > X_2 \geq X_3$.
CASE 3.  $X_1 \geq X_2 > X_3$.
CASE 4.  $X_2 \geq X_1 > X_3$.

In cases 1 or 2, if $X_1$ and $X_2$ were the only contestants competing, $X_1 > X_2$ because of transitivity. The updated marks would then be as shown in Table 14. If judge A then exchanges the votes for contestants 1 and 2, the result would be $X_2 > X_1$ (by fairness conditions 2 and 3). But then $X_2$ is placed lowest of the two contestants, despite the improved votes for contestant $X_2$; that is, condition 1 is not satisfied. Thus, the cases 1 and 2 can be excluded. But cases 3 and 4 can also be excluded by similar considerations of the pair $X_2$, $X_3$.

Table 13
Example 5

       A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)
X1     1             2...2                 3...3
X2     2             3...3                 1...1
X3     3             1...1                 2...2

Corollary. There is no fair rule which always guarantees IIR.

Proof. This follows from the definition of intransitivity as a special case of irrelevance.
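The cycle that drives the proof of Theorem 2 can be read directly off the columns of Table 13. As a worked count (the $\#$ notation for "number of judges" is introduced here for illustration only):

```latex
\begin{align*}
\#\{\text{judges preferring } X_1 \text{ over } X_2\} &= 1 + (n-1) = n
  && \text{(judge } A \text{ and judges } B\ldots\text{)} \\
\#\{\text{judges preferring } X_2 \text{ over } X_3\} &= 1 + (n-1) = n
  && \text{(judge } A \text{ and judges } b\ldots\text{)} \\
\#\{\text{judges preferring } X_3 \text{ over } X_1\} &= (n-1) + (n-1) = 2n-2
  && \text{(judges } B\ldots \text{ and } b\ldots\text{)}
\end{align*}
```

Each count is a majority of the 2n - 1 judges when n ≥ 2, and in the third line every judge except A prefers $X_3$, which is why fairness condition 4 forces $X_3$ to beat $X_1$ when only those two teams compete.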

Table 14

       A (1 judge)   B... (n - 1 judges)   b... (n - 1 judges)
X1     1             1...1                 2...2
X2     2             2...2                 1...1

6. The efficient skatingrule

It seems reasonable to look for a rule that uses the updating principle and also satisfies the following conditions:

1. It is a fair rule.
2. If a majority of the judges agree on a winner, that contestant is the winner; similarly, a majority of the judges decides the contestant to be placed lowest.
3. Using a sufficiently large number of judges, dictatorsituations will never occur.

A rule which uses the updating principle and the method for selection of the team(s) to be placed lowest presented in Table 15 satisfies the above three conditions.

Table 15
The efficient skatingrule

Number of teams = M. Number of judges = 2n - 1. α and β are nonnegative integers satisfying αβ ≤ n - 1. w(STEP) is the number of teams with at least n + β(STEP - 1) marks greater than or equal to M - (STEP - 1). The following algorithm is used to find the team(s) to be ranked lowest:

        For STEP = 1 to α + 1, DO
          IF w(STEP) ≠ 0 THEN select the teams among the w(STEP) teams with the largest number of marks greater than or equal to M - (STEP - 1). IF there is more than one such team select from these teams the team(s) with the highest total of the marks which are at least M - STEP + 1
          THEN GO TO α + 3
        ELSE GO TO α + 2
α + 2   Select the team(s) with the highest total of (all) their marks.
α + 3   Give the rank M - (k - 1)/2 to the k selected teams.

Let us apply this 'efficient skatingrule' to the main example with α = 1, β = 1, m = 6 and n ≥ 3. In step 1 team 6 is found and given rank 6. After updating, team 5 is found in step 1 and given rank 5. After updating, team 2 is found in step 3 and given rank 4. Then team 4 is found in step 1 and given rank 3. Finally team 1 is found in step 1 and given rank 2, leaving team 3 as winner.
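The run just described can be reproduced with a Python sketch of Table 15 combined with the updating principle. It is an illustrative reading rather than a definitive implementation: the function names are hypothetical, n = 3 is assumed for the main example, and the control flow follows the table as reconstructed above (a team found in some STEP goes straight to step α + 3, otherwise step α + 2 applies).

```python
def main_example(n):
    """Marks of Table 3: judge A once, then n - 1 copies each of judges B and b."""
    a = [1, 2, 3, 4, 5, 6]
    big_b = [2, 4, 1, 3, 5, 6]
    small_b = [6, 2, 3, 1, 5, 4]
    return {team: [a[team - 1]]
                  + [big_b[team - 1]] * (n - 1)
                  + [small_b[team - 1]] * (n - 1)
            for team in range(1, 7)}


def update(marks, removed):
    """Delete `removed` teams and renumber each judge's marks 1..M ('updating')."""
    teams = [t for t in marks if t not in removed]
    n_judges = len(next(iter(marks.values())))
    out = {t: [] for t in teams}
    for j in range(n_judges):
        ordered = sorted(teams, key=lambda t: marks[t][j])
        for new_mark, team in enumerate(ordered, start=1):
            out[team].append(new_mark)
    return out


def select_lowest(marks, n, alpha, beta):
    """Team(s) to be ranked lowest among the teams in `marks` (Table 15)."""
    m = len(marks)
    for step in range(1, alpha + 2):
        threshold = m - (step - 1)
        count = {t: sum(x >= threshold for x in ms) for t, ms in marks.items()}
        found = [t for t in marks if count[t] >= n + beta * (step - 1)]
        if found:
            most = max(count[t] for t in found)
            found = [t for t in found if count[t] == most]
            high = {t: sum(x for x in marks[t] if x >= threshold) for t in found}
            top = max(high.values())
            return [t for t in found if high[t] == top]
    # Step alpha + 2: highest total of all marks.
    total = {t: sum(ms) for t, ms in marks.items()}
    worst = max(total.values())
    return [t for t in marks if total[t] == worst]


def efficient_skating_rule(marks, alpha, beta):
    n = (len(next(iter(marks.values()))) + 1) // 2    # 2n - 1 judges
    result = {}
    while marks:
        m = len(marks)
        selected = select_lowest(marks, n, alpha, beta)
        for team in selected:
            result[team] = m - (len(selected) - 1) / 2   # step alpha + 3
        marks = update(marks, set(selected))
    return result


ranks = efficient_skating_rule(main_example(3), alpha=1, beta=1)
```

For n = 3 the sketch ranks team 6 sixth, team 5 fifth, team 2 fourth, team 4 third and team 1 second, leaving team 3 as winner, in agreement with the run described above.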

Theorem 3. Given a competition with m teams and 2n - 1 judges. When the efficient skatingrule is used with parameters α, β chosen such that

$$m\beta < 2n - 1 \quad\text{and}\quad \alpha\beta \leq n - 1,$$

then the following conditions are satisfied:

I.   It has the exclude-bad-alternatives property.
II.  It is a fair rule, and if a majority of the judges agree on a winner, that contestant is the winner; similarly, a majority of the judges decide the contestant to be placed lowest.
III. There is no situation where, for any set B of β judges, $X_1 > X_2$ although $X_2 >_j X_1$ for all judges $j \notin B$. In particular there are no dictatorsituations if $\beta \geq 1$.

Proof.

I. The exclude-bad-alternatives property follows from the use of the updating principle.

II. The weak positive association between resulting and individual rankings follows from the following argument: Assume a given set of rankings, with y the lowest-placed team selected in the tie-breaking procedure of the rule; if a judge then gives a better ranking to team $x \neq y$ without changing the preferences among the other teams, then since x wasn't selected earlier, x will not be selected now with improved marks. But since all the other teams now have the same or worse marks, some team will be chosen now or earlier. Hence x will not be placed lowest now either.

The equality and neutrality conditions are trivially satisfied. It is also a trivial observation that a majority of the judges, if they agree, can decide the lowest-placed team. Fairness condition 4 follows from this fact.

Suppose that team x is placed first by at least n of the judges, and team x is selected among the $m_1$ best of the teams left, where $1 \leq m_1 \leq m$. Then team x would be found in step α + 2. But all the rest of the $m_1$ considered teams have all their updated marks in the set $\{1, \ldots, m_1\}$. The total of all updated marks for the $m_1$ remaining teams is

$$N \sum_{i=1}^{m_1} i = \tfrac{1}{2} m_1 N (m_1 + 1),$$

which gives an average total $a = \tfrac{1}{2} N (m_1 + 1)$ over the $m_1$ remaining teams. The total of the updated marks for team x is less than or equal to

$$\Sigma = n \cdot 1 + (N - n)\, m_1 = \left(\tfrac{1}{2}N - n\right)(m_1 - 1) + \tfrac{1}{2} N (m_1 + 1) < a$$

for $m_1 > 1$, since $\tfrac{1}{2}N - n < 0$. But since team x is selected, $\Sigma \geq a$; hence $m_1 = 1$ and team x is the winner.

III. Assume that $x_2$ is a lowest-placed team among the $m_1 \leq m$ (best) teams left; $x_1$ is also among these teams and $x_2 > x_1$ ($m_1 > 1$) and $x_1 >_j x_2$ for $N - \beta$ judges j. These assumptions can be made with no loss of generality because of the updating principle. $x_2$ cannot have been selected in Step 1 because by assumption $x_2$ has at most β marks of $m_1$. (Since $\alpha \geq 1$ and $n + \alpha\beta \leq N$ it follows that $\beta < n$.) Suppose then that $x_2$ is selected in Step i + 1, $\alpha \geq i \geq 1$. Then $x_2$ has at least $n + i\beta$ marks in $\{m_1 - i, \ldots, m_1\}$. Since $x_1 >_j x_2$ for $N - \beta$ judges j, $x_1$ must have at least $n + (i - 1)\beta$ marks in $\{m_1 - i + 1, \ldots, m_1\}$. But then the selection of the lowest-placed alternative took place in step i or earlier, contradicting the assumption that $x_2$ was selected in step i + 1.

If $x_2$ were selected in step α + 2, the total $T_2$ of the (updated) marks for $x_2$ must be at least as high as the total $T_1$ of the (updated) marks for $x_1$. But

$$T_1 - T_2 \geq (N - \beta) - \beta(m_1 - 1) \geq (N - \beta) - \beta(m - 1) = N - \beta m > 0,$$

since $N - \beta$ of the N judges prefer $x_2$ over $x_1$ and the difference in marks is less than or equal to $(m_1 - 1)$. Hence $T_1 > T_2$, a contradiction.

7. The generalized efficient skatingrule

In some cases the number of judges, N = 2n - 1, is so low or the number of teams, m, so high that mβ ≥ 2n - 1 for any positive integer β. In these cases normal use of the efficient skatingrule does not guarantee absence of dictatorsituations. With the following modification of the efficient skatingrule it is possible to obtain a very good rule also for this type of competition.

First we define the closure of a set of teams: Let A be a set of teams. The 'closure of A' (cl A) will be the smallest set which includes all of A and also other teams such that for all $X_1 \in \mathrm{cl}\,A$ and $X_2 \in \{1, \ldots, m\} \setminus \mathrm{cl}\,A$ more than β judges prefer $X_2$ over $X_1$. We can generalize the efficient skatingrule by changing step α + 2 to the following:

α + 2   Call the set of team(s) with the highest total of the updated marks A. Select the teams in cl A.

Note that in cases where m < N/β, the above generalization reduces to step α + 2 of the original efficient skatingrule; this fact follows from Theorem 3.

Theorem 4. Given a competition with m teams and 2n - 1 judges where the generalized efficient skatingrule is used with parameters α, β chosen such that β < n and αβ ≤ n - 1, then the following conditions are satisfied:

I.   It has the exclude-bad-alternatives property.
II.  It is a fair rule and if a majority of the judges agree on the winner, that contestant is the winner; similarly, a majority of the judges decide the contestant to be placed lowest.
III. There is no situation where, for any set B of β judges, $X_1 > X_2$ although $X_2 >_j X_1$ for all judges $j \notin B$.

Proof. An obvious modification of the proof of Theorem 3.

Example 6. Consider the main example with n = 2, α = 0, and β = 1. Having found team 6 to be given the rank 6 and team 5 to be given the rank 5, teams 5 and 6 have been deleted and the marks of the rest of the teams are updated:

Team no.   Marks
1          1  2  4
2          2  4  2
3          3  1  3
4          4  3  1

STEP 1. No teams found.
STEP 2. Teams 2 and 4 each have the highest total (8); cl A := {2, 4}. Team 2 is preferred over team 3 by 2 judges so team 3 is appended to cl A. But team 3 is preferred over team 1 by 2 judges, so team 1 is then also appended to cl A, such that cl A = {1, 2, 3, 4}.

The four teams are all found, selected, and given the rank 2½ (tie).

Example 6 shows that the generalized efficient skatingrule works very well if ties are acceptable. To eliminate most of the ties the following refinement of the tie-breaking rule can be introduced: IF w teams are found in STEP γ, 1 ≤ γ ≤ (α + 1); THEN consider the team(s) with the highest number of marks in {m - γ + 1, ..., m} and from these select the team(s) with the highest total of marks in {m - γ + 1, ..., m}; ELSE IF w < m; THEN use the selection method of the generalized efficient skatingrule on updated marks for the w teams to select the team(s) to be placed lowest; ELSE (if w = m) select the w found teams. This recursive use of the selection method stops, since the number of tied teams goes down for each activation of the selection method, and the use of the more advanced tie-breaking rule does not violate Theorems 3 or 4.

8. Absolute ranking

Ordinal (relative) ranking is appropriate when judges watch several teams simultaneously and are thus able to make rational comparisons between them. When the judges can observe only one team at a time, it is better to have each judge score that team against some defined standard; that is, to apply cardinal (absolute) ranking. In such competitions each judge gives to each team a mark consisting of a real number; for example, in Olympic skating one of the 61 marks between 0.0 and 6.0, and in gymnastics one of the 101 marks between 0.0 and 10.0 (in practice, only the few highest marks are generally used). While these absolute marks are nominally objective, they are in practice subjective, based on the opinion of the judge concerning the performance observed.

When all teams have completed their performances, there is for each judge an 'underlying' set of ordinal scores obtained by simply ranking the cardinal scores. We indicate here that the best results are obtained by applying the efficient skatingrule to those relative rankings. In economics such cardinalities of utility are discarded, since they represent allowable divergences in tastes. However, for the type of judging considered here, we cannot consider large divergences to be quantitatively valid.

Several methods are commonly used to obtain a result from such absolute rankings. In the most obvious method, the marks for each team are summed; the team with the highest total is then ranked first, the team with second highest total second, and so on. The following example (Table 16) with 7 judges illustrates difficulties inherent in the summation method. As in Olympic figure skating, the judges give marks between 0.0 and 6.0.

Suppose that judges A and B now change some of their marks (see Table 17). Since no judge has changed the (relative) relation between teams 1 and 2, the result indicates that the IIR (Independence of Irrelevant Alternatives) condition is violated. Furthermore, judge B has given team 2 improved rank, but the result for team 2 is worse, hence the PA (Positive Association of resulting and relative rankings) condition is also violated. Note also that in both cases team 1 is considered to be worst by a majority of the judges (and so would be ranked last by the efficient skatingrule); but in neither case is team 1 ranked last, and in one it is actually ranked first.

The next example (see Table 18) shows how this rule and a similar rule behave in relation to the NDS (No Dictatorsituations) condition. Here all judges except A prefer team 2 over team 1; yet team 1 wins. Furthermore, judge A can still dictate the result (in this case with even higher difference in totals!) with all his marks crossed out, when the best and worst marks for each contestant are eliminated. See Table 19.

Paradoxes analogous to all those shown for the

Table 16
Example 7

Team number   A     B     C     D     E     F     G     Total   Result
1             6.0   5.9   6.0   5.6   5.7   5.8   5.6   40.6    2
2             5.9   5.7   5.9   5.7   5.9   6.0   5.7   40.8    1
3             5.5   5.8   5.7   5.9   5.8   5.9   5.8   40.4    3

Underlying relative ranking
1             1     1     1     3     3     3     3
2             2     3     2     2     1     1     2
3             3     2     3     1     2     2     1

Table 17 Team number

Judges A

B

C

D

E

F

G

1 2 3

6.0 5.7 5.5

5.9 5.6 5.5

6.0 5.9 5.7

5.6 5.7 5.9

5.7 5.9 5.8

5.8 6.0 5.9

5.6 5.7 5.8

Total

Result

40.6 40.5 40.1

1 2 3

Total

Result

38.6 38.5

1 2

Total

Result

28.0 27.7

1 2

Underlying relative ranking 1 2 3

1 2 3

1 2 3

1 2 3

3 2 1

3 1 2

3 1 2

3 2 1

Table 18 Example 8 Team number

Judge A

B

C

D

E

F

G

1 2

5.7 5.0

5.5 5.6

4.9 5.0

5.6 5.7

5.7 5.8

5.7 5.8

5.5 5.6

Table 19 Team number

Judge A

1 2

B 5.5 5.6

C

D

E

F

G

5.7

5.0

5.6 5.7

5.7 5.8

5.5 5.6

main example can easily be constructed if cardinal numbers are summed, or if they are converted to ordinal rankings and the skatingrule is then applied. As before, the situation is improved if the generalized efficient skatingrule is used.
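The arithmetic behind Examples 7 and 8 can be checked with a short sketch. The marks are copied from Tables 16 to 18; the function names (`totals`, `order`, `pairwise`) are illustrative choices, not part of the paper, and the trimmed variant assumes that exactly one best and one worst mark per team are dropped, as in Table 19.

```python
# Marks from Tables 16-18: one list per team, one entry per judge (A..G).
TABLE_16 = {1: [6.0, 5.9, 6.0, 5.6, 5.7, 5.8, 5.6],
            2: [5.9, 5.7, 5.9, 5.7, 5.9, 6.0, 5.7],
            3: [5.5, 5.8, 5.7, 5.9, 5.8, 5.9, 5.8]}
TABLE_17 = {1: [6.0, 5.9, 6.0, 5.6, 5.7, 5.8, 5.6],
            2: [5.7, 5.6, 5.9, 5.7, 5.9, 6.0, 5.7],
            3: [5.5, 5.5, 5.7, 5.9, 5.8, 5.9, 5.8]}
TABLE_18 = {1: [5.7, 5.5, 4.9, 5.6, 5.7, 5.7, 5.5],
            2: [5.0, 5.6, 5.0, 5.7, 5.8, 5.8, 5.6]}


def totals(marks, trim=False):
    """Sum each team's marks; with trim=True, drop one best and one worst mark first."""
    result = {}
    for team, row in marks.items():
        kept = sorted(row)[1:-1] if trim else row
        result[team] = round(sum(kept), 1)  # marks carry one decimal place
    return result


def order(marks, trim=False):
    """Teams listed from first place to last place by descending total."""
    t = totals(marks, trim)
    return sorted(t, key=lambda team: -t[team])


def pairwise(marks, a, b):
    """For each judge, True if that judge marks team a strictly above team b."""
    return [x > y for x, y in zip(marks[a], marks[b])]


print(order(TABLE_16), order(TABLE_17))   # summation results for Tables 16 and 17
print(pairwise(TABLE_16, 1, 2) == pairwise(TABLE_17, 1, 2))
print(totals(TABLE_18), totals(TABLE_18, trim=True))
```

No judge changes his or her preference between teams 1 and 2 from Table 16 to Table 17, yet the summation rule swaps the two teams; and in Table 18 a single judge's preference for team 1 outweighs six judges who prefer team 2, with or without trimming.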

9. Go-no-go voting

When the number of teams competing simultaneously is very large (in ballroom dancing it is often dozens), the judges may make a binary decision on each team: each judge votes either 0 or 1, i.e. either to eliminate the team or to advance it to the next stage of competition. The marks are added to give a total for each team. The teams with the highest totals are advanced to the next round. Since this is a weighted summation rule, the paradox of dependence on irrelevant alternatives can occur. This is indicated by Example 9 (Table 20).

Table 20. Example 9: 12 teams, 7 judges (one column per judge)

Team   Underlying relative ranking   Marks (1 if ranking ≤ 6, 0 if ≥ 7)   Total   Result
no.
 1      1   1   1   1   7   7   7    1  1  1  1  0  0  0                   4       0
 2      7   7   2   2   1   1   1    0  0  1  1  1  1  1                   5       1
 3      2   2   7   7   2   2   2    1  1  0  0  1  1  1                   5       1
 4      8   8   3   3   3   3   3    0  0  1  1  1  1  1                   5       1
 5      3   3   8   8   4   4   4    1  1  0  0  1  1  1                   5       1
 6      9   9   4   4   5   5   5    0  0  1  1  1  1  1                   5       1
 7      4   4   5   5   6   8   8    1  1  1  1  1  0  0                   5       1
 8      5   6   9   9   8   9   9    1  1  0  0  0  0  0                   2       0
 9      6   5  10  10   9  10  10    1  1  0  0  0  0  0                   2       0
10     10  10   6  11  10  11  11    0  0  1  0  0  0  0                   1       0
11     11  11  11   6  11   6  12    0  0  0  1  0  1  0                   2       0
12     12  12  12  12  12  12   6    0  0  0  0  0  0  1                   1       0

Only teams 2, 3, 4, 5, 6, and 7 are advanced to the next (final) round, although team 1 is considered to be the best by a majority of the judges. If team 12 (the worst team) did not participate, the result would be as shown in Table 21.

Table 21 (one column per judge)

Team   Underlying relative ranking   Marks (1 if ranking ≤ 6, 0 if ≥ 7)   Total   Result
no.
 1      1   1   1   1   7   7   6    1  1  1  1  0  0  1                   5       1
 2      7   7   2   2   1   1   1    0  0  1  1  1  1  1                   5       1
 3      2   2   7   7   2   2   2    1  1  0  0  1  1  1                   5       1
 4      8   8   3   3   3   3   3    0  0  1  1  1  1  1                   5       1
 5      3   3   8   8   4   4   4    1  1  0  0  1  1  1                   5       1
 6      9   9   4   4   5   5   5    0  0  1  1  1  1  1                   5       1
 7      4   4   5   5   6   8   7    1  1  1  1  1  0  0                   5       1
 8      5   6   9   9   8   9   8    1  1  0  0  0  0  0                   2       0
 9      6   5  10  10   9  10   9    1  1  0  0  0  0  0                   2       0
10     10  10   6  11  10  11  10    0  0  1  0  0  0  0                   1       0
11     11  11  11   6  11   6  11    0  0  0  1  0  1  0                   2       0

Now team 1 will be advanced to a 7-team final, and will win if the generalized efficient skatingrule is used on the same rankings!

We wish to devise a rule that will ensure that the best team is advanced, but we also wish to be able to predict the approximate number of teams to be advanced. Note that in Example 9, each judge chose only 6 teams to be advanced, and six teams were advanced in the first part of the example, but seven teams were tied for the highest score in the second part. It is easy to construct examples where the number of teams tied, and therefore advanced, is almost twice the number of ones awarded by each judge.

We propose an improved go-no-go rule. Let x be the total score of each team (the penultimate column in Example 9). Let a be the desired approximate number of teams to be advanced, and a′ the number of ones awarded by each judge. We define a function f_n(x) as follows:

\[
f_n(x) =
\begin{cases}
0 & \text{if } 0 \le x \le 1, \\
x & \text{if } 1 < x < n, \\
2n - 1 & \text{if } x \ge n.
\end{cases}
\]

Then let a′ be the largest integer smaller than n(a + j + 1)/(2n - 1), where j is any small integer, and advance those teams with largest f_n(x). Then the number of teams advanced will be at least a′ (at least a′ teams must get positive scores) and at most a + j (at most a + j teams will tie for the maximum f_n(x)). See Table 22.

Table 22. Example 10: n = 4; m = 12, a = 6, j = 1, a′ = 4 (one column per judge)

Team   Underlying relative ranking   Marks (1 if ranking ≤ 4, 0 if ≥ 5)   x    Result f_n(x)
no.
 1      1   1   1   1   7   7   7    1  1  1  1  0  0  0                   4    7
 2      7   7   2   2   1   1   1    0  0  1  1  1  1  1                   5    7
 3      2   2   7   7   2   2   2    1  1  0  0  1  1  1                   5    7
 4      8   8   3   3   3   3   3    0  0  1  1  1  1  1                   5    7
 5      3   3   8   8   4   4   4    1  1  0  0  1  1  1                   5    7
 6      9   9   4   4   5   5   5    0  0  1  1  0  0  0                   2    2
 7      4   4   5   5   6   8   8    1  1  0  0  0  0  0                   2    2
 8      5   6   9   9   8   9   9    0  0  0  0  0  0  0                   0    0
 9      6   5  10  10   9  10  10    0  0  0  0  0  0  0                   0    0
10     10  10   6  11  10  11  11    0  0  0  0  0  0  0                   0    0
11     11  11  11   6  11   6  12    0  0  0  0  0  0  0                   0    0
12     12  12  12  12  12  12   6    0  0  0  0  0  0  0                   0    0

Teams 1, 2, 3, 4, and 5 are advanced (or teams 1 through 7, if seven teams is deemed preferable to five in the final; note that six were desired). By forcing each judge to vote for fewer teams to advance (a′ < a), dependence on lower-ranked teams (as in Example 9) is generally reduced. But to avoid in any specific case having a bad team advanced on the vote of a single judge, f_n(1) is set equal to 0.
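The two go-no-go rules can be compared on the rankings of Example 9 with a minimal sketch. It rests on assumptions that should be read as such: the piecewise form of f_n(x) is the one that reproduces the printed values in Table 22 (f_n(1) = 0, f_n(2) = 2, f_n(4) = f_n(5) = 7 for n = 4), which also suggests that there are 2n - 1 judges, so that n is a majority. All function names are illustrative.

```python
# Underlying rankings of Example 9 (Table 20): one list per judge,
# entry i is the rank that judge gives to team i + 1.
RANKINGS = [
    [1, 7, 2, 8, 3, 9, 4, 5, 6, 10, 11, 12],
    [1, 7, 2, 8, 3, 9, 4, 6, 5, 10, 11, 12],
    [1, 2, 7, 3, 8, 4, 5, 9, 10, 6, 11, 12],
    [1, 2, 7, 3, 8, 4, 5, 9, 10, 11, 6, 12],
    [7, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12],
    [7, 1, 2, 3, 4, 5, 8, 9, 10, 11, 6, 12],
    [7, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 6],
]


def vote_totals(rankings, ones):
    """Total x per team when each judge gives a 1 to the teams ranked 1..ones."""
    teams = len(rankings[0])
    return [sum(r[t] <= ones for r in rankings) for t in range(teams)]


def f(x, n):
    """Assumed form of f_n(x): 0 for x <= 1, x below a majority, 2n - 1 from n upward."""
    if x <= 1:
        return 0
    return x if x < n else 2 * n - 1


def ones_per_judge(n, a, j):
    """a' = largest integer strictly smaller than n(a + j + 1)/(2n - 1)."""
    return (n * (a + j + 1) - 1) // (2 * n - 1)


def advance(scores):
    """Teams (numbered from 1) that share the highest score."""
    best = max(scores)
    return [t + 1 for t, s in enumerate(scores) if s == best]


def drop_team(rankings, team):
    """Rankings after a team withdraws: remove it and close the gap in each judge's order."""
    out = []
    for r in rankings:
        gone = r[team - 1]
        out.append([k - (k > gone) for i, k in enumerate(r) if i != team - 1])
    return out


# Plain rule, 6 ones per judge: team 1 is left out until team 12 withdraws.
print(advance(vote_totals(RANKINGS, 6)))
print(advance(vote_totals(drop_team(RANKINGS, 12), 6)))

# Improved rule with n = 4, a = 6, j = 1.
n, a, j = 4, 6, 1
a_prime = ones_per_judge(n, a, j)
scores = [f(x, n) for x in vote_totals(RANKINGS, a_prime)]
print(a_prime, scores, advance(scores))
```

Under these assumptions the plain rule advances teams 2 to 7 with team 12 present and teams 1 to 7 without it, while the improved rule advances teams 1 to 5 in the full field, in agreement with Tables 20 to 22.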

10. Scoring judges

For purposes of classifying and selecting judges it is desirable to have some method of scoring their performance in previous competitions. Since we are talking of subjective competitions, there can be no objective measure of a judge's performance other than how well he agrees with other judges. We propose such a measure, based on the efficient skatingrule and the improved go-no-go rule. This measure may not be as useful for detecting bias (for example, if a judge is suspected of showing preference for or against competitors from a particular nation or group of nations) as for the random deviations which might be expected from an incompetent judge. For the efficient skatingrule, a point is given to a judge each time a team is selected in step y, 1 ≤ ...

11. Summary and conclusions

When several judges independently grade several competitors, their assessments may differ. If so, any rule which attempts to combine the different grades to give a resulting ranking of the competitors is subject to numerous paradoxes, leading to undesirable results such as intransitivity, irrelevance, and dictatorsituations. It is not possible to avoid all such complications, but they can be minimized. Obvious or commonly used solutions lead to severe problems, as indicated in the examples above. It is shown that a new rule called the 'generalized efficient skatingrule', a modification of the commonly used 'skatingrule', eliminates many of these difficulties.

We give here an example of a family of such rules. Say there are six teams and nine judges. If any team has five or more 6's, it is last; if not, if any one team has six or more 5's or 6's, it is last. If two or more teams have six or more 5's or 6's, the team with the most 5's and 6's is last; if there is still a tie, these teams are ranked tied for last. If there is no team with six or more 5's or 6's, the team with the highest sum of all its marks is last, but tied with all teams necessary to ensure that no six judges prefer a team which is last over a team which is not last. Now the marks of the remaining teams are updated: if any team was ranked below the last team(s) by any judge, its mark from that judge is reduced by one (or two, etc.). Now repeat: look for a team with a majority of 5's (or 4's, etc.)... Alternative forms of the algorithm may be admissible: one can look for teams with seven or more 4's, 5's, or 6's before adding all the marks, etc.; or one can go from five or more 6's to seven or more (instead of six or more) 5's and 6's, etc. If the number of teams is very large, modifications are required.

While these principles have been applied thus far only to competitive sports and dancing, they may have more far-reaching implications. For example, examinations of students, or of applicants for professional licenses, are often graded by several judges. Applications for jobs are frequently graded by many interviewers. And of enormous possible significance is the application of these principles to awarding contracts; this is discussed in some detail in [6].

References

[1] Dance News, No. 800, October 27, 1983.
[2] Arrow, Kenneth J., Social Choice and Individual Values (2nd edition), Yale University Press, New Haven/London, 1963.
[3] Saari, Donald, "Inconsistencies of weighted summation voting systems", Mathematics of Operations Research 7/4 (1982).
[4] Blin, Jean-Marie, and Satterthwaite, Mark A., "Individual decisions and group decisions", Journal of Public Economics 10 (1978) 247-267.
[5] Sidney, B., and Sidney, S.J., "Intransitivity in the scoring of cross-country competitions", in: R.E. Machol et al. (eds.), Management Science Applications in Sports, North-Holland, Amsterdam, 1976.
[6] Machol, R.E., and Frederiksen, J., "Are proposals evaluated appropriately?", OR/MS Today 13/2 (1986) 6-7.
[7] Saari, Donald, "The ultimate of chaos resulting from weighted voting systems", Advances in Applied Mathematics 5 (1984) 286-308.