Concordance and consensus


Information Sciences 181 (2011) 2529–2549


Cees Elzinga (a,*), Hui Wang (b), Zhiwei Lin (b), Yash Kumar (c)

(a) Department of Sociology, PARIS Research Program, VU University Amsterdam, The Netherlands
(b) Computer Science Research Institute, School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
(c) Computer Science Department, International Institute of Information Technology, Hyderabad, India
(*) Corresponding author. E-mail addresses: [email protected] (C. Elzinga), [email protected] (H. Wang), [email protected] (Z. Lin), [email protected] (Y. Kumar).

Article history: Received 14 August 2009; Received in revised form 23 December 2010; Accepted 3 February 2011; Available online 17 February 2011.

Keywords: Consensus; Concordance; Preference ranking; Kendall's W; Kendall's τ; Likert item; All common subsequences; String kernels; Rank correlation.

Abstract

This paper deals with the measurement of concordance and the construction of consensus in preference data, either in the form of preference rankings or in the form of response distributions over Likert-items. We propose a set of axioms of concordance in preference orderings and a new class of concordance measures. The measures outperform classic measures like Kendall's τ and W and Spearman's ρ in sensitivity and apply to large sets of orderings instead of just to pairs of orderings. For sets of N orderings of n items, we present very efficient and flexible algorithms that have a time complexity of only O(Nn²). Remarkably, the algorithms also allow for fast calculation of all longest common subsequences of the full set of orderings. We experimentally demonstrate the performance of the algorithms. A new and simple measure for assessing concordance on Likert-items is proposed.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

The concept of concordance refers to agreement of judgements and occurs in at least three different contexts: in voting and decision making, in group attitude assessment using Likert items, and in statistics.

The first and oldest context is that of voting and decision making. Here, concordance refers to the degree to which voters or judges agree on the ordering or ranking of a set of items, the items standing for e.g. political parties, brands of produce, patients waiting for treatment or projects to be funded from limited resources. The more the voters agree, the higher the concordance; concordance is maximal when all judges order the items in the same way. Mostly, when the judges have ordered the items, concordance is not perfect, and then the problem arises of how to aggregate the various orderings into a consensus, i.e. an ordering that, in some sense, best represents the variety of observed orderings. Constructing a consensus, i.e. aggregating the different orderings into one ‘‘representative’’ ordering, was already studied during the second half of the eighteenth century [6,13] in the context of voting systems, and since then the problem has been studied in many contexts: in machine learning [21], in sports tournaments [43], in evaluating consumer preferences, and in web search strategies [3,12,57]. The problem is also known as the group ranking problem or the rank aggregation problem, and it is formally identical to the problem of multi-criteria decision making [29]. Consensus has also been studied in bio-informatics, as an alignment problem of spatial orderings of biochemical substances [53]. However, different aggregating methods will produce different consensus orderings [17,22], and when the observed orderings are widely different, the aggregated consensus ordering may not be very useful, since only few or even none of


the observed orderings may be similar to the consensus ordering. Therefore, given a consensus ordering, we need means to assess the representativeness of such a consensus. Here too, concordance has a role: if concordance is high, the consensus ordering should be a ‘‘good’’ representative of the full set of orderings, whereas this is impossible when concordance is low.

The second context in which the concept of concordance is relevant arises when judges have to express their attitude or strength of opinion by picking one of the ordered categories of a so-called Likert-item [39]. Such ordered categories normally are expressions like ‘‘Strongly agree’’, ‘‘Agree’’, ‘‘Neutral’’, ‘‘Disagree’’ or ‘‘Strongly disagree’’ and pertain to statements like ‘‘Abortion should be legally prohibited’’. By picking a category, judges order themselves by the extent to which they agree with the statement. If all judges pick the same category, concordance is maximal; if they do not, there is (much) less concordance. Recently, Tastle et al. [54] proposed an entropy-based index for this kind of concordance (which they call ‘‘consensus’’; we will confine the use of the word ‘‘consensus’’ to referring to aggregated preference-orderings). In this context, there are no preferences to be aggregated, so the problem of constructing a consensus does not arise.

The third context in which the issue of concordance arises is in statistics. In statistics, it can be useful to know how much large values of one random variable X correspond to large values of another random variable Y. Two observations (X = x1, Y = y1) and (X = x2, Y = y2) from a continuous bivariate vector (X, Y) are said to be concordant if either x1 < x2 and y1 < y2, or x1 > x2 and y1 > y2. The properties of such concordances can be gauged by a measure of concordance, and a unified theory of such measures was first formulated by Scarsini [18,50] using the theory of copulas. Although we will not deal with statistical problems, we mention Scarsini's work because, in one of the sections to follow, we will compare his concordance axioms for continuous random variables with the axioms that we will formulate for concordance of item-orderings. Furthermore, the best known measures of concordance for item-orderings, Spearman's ρ [52] and Kendall's τ [33,35], satisfy the Scarsini axioms and are widely used in a variety of applications. In recent years, we have seen applications of these measures in e.g. the study of species diversification [1], the study of ecological communities [37,46,49], in neuropsychiatry [44], in emergency medicine [23], in the study of consumer and stakeholder preferences [8] and in personnel psychology [36].

In this paper we take an axiomatic approach to the study of concordance. In Section 4.1, we propose a set of axioms that all proper concordance measures should satisfy and in Section 4.2 we confront the classical measures τ, W and ρ with those axioms. As will appear, none of Kendall's τ and W or Spearman's ρ satisfies all of the axioms proposed and, moreover, none of these measures is very sensitive to small perturbations of the orderings involved. Therefore, we propose a new class of concordance measures that do satisfy all of the proposed axioms and that are much more sensitive to perturbations of orderings. Finally, we discuss efficient methods to calculate the new measures and handle the problem of constructing consensus.
To achieve all this, we start in Section 2 by introducing the reader to some necessary concepts and notation. In Section 3, we concisely discuss related work: sequential pattern mining, Scarsini’s axioms and distance based consensus models, in particular the Cook–Seiford axioms. Once the stage is set, we discuss new axioms of concordance in Section 4, consider the classical measures of concordance and derive a class of new measures that satisfy the axioms. In Section 5, we discuss constructing consensus in the context of vector spaces and in Section 6 we amply discuss efficient algorithms with a time complexity that is linear in the number of orderings involved and quadratic in the size of the item set ordered. We demonstrate excellent performance of the algorithms in Section 7. In Section 8, we discuss assessing concordance in the context of survey research using Likert-items and we make some concluding remarks in Section 9.

2. Preliminaries

Concordance has been studied in different contexts, so different terminologies are used in the literature. In order to provide a unified account of concordance, we choose a context-neutral, string-based terminology. Below we introduce this terminology and the related notation, followed by a discussion of representing orderings as strings. We also briefly discuss representing orderings through vectors of rank numbers.

2.1. Strings: concepts and notation

Throughout this paper, we will often use the notation [n] to denote the first n natural numbers {1, ..., n}. Let Σ = {σ1, ..., σn} denote a set, also called an alphabet, of n characters. A string over Σ is a sequence x = x1x2...xn that arises by concatenation of characters from the alphabet: xi ∈ Σ for i ∈ [n]. We let Σ* denote the set of finite strings that are constructed from the characters of Σ by concatenation. We say that a string x = x1...xn has length |x| = n, or that x is n-long, if it consists of n, not necessarily distinct, characters from Σ. The empty string or empty sequence, which has a length of zero, is denoted by λ and we adopt the convention to write x⁰ = λ. The set Σⁿ denotes the set of all n-long strings over Σ. If a string x is n-long, it has n nonempty prefixes xⁱ = x1...xi (in particular, xⁿ = x), and the empty prefix x⁰ = λ. In the present context, where the strings represent transitive orderings, each character occurs at most once in each string. At most once, since we might allow judges to order only a subset of Σ. Thus, |x| ≤ |Σ| = n.


The reverse x̃ of the sequence x = x1...xn is the sequence x̃1...x̃n = xn...x1. A k-long string y = y1...yk is a subsequence of x if there exist k + 1, not necessarily distinct and possibly empty, strings v1, ..., v_{k+1} ∈ Σ* such that v1y1...vkykv_{k+1} = x, and we write y ⊑ x to denote this fact. The set of all subsequences of x is denoted by S(x) and we write φ(x) = |S(x)| for the size of this set. Let u ⊑ x with u = x_{j1}...x_{j|u|}. Then the gaps of u in x are defined as g_m(u|x) = j_{m+1} − j_m − 1 for m ∈ [|u| − 1]. Hence, as is intuitive, g_m(x|x) = 0. The width w(u|x) of u in x is defined as j_{|u|} − j_1 + 1 and w(x|x) = |x|. If u ⊑ x and u ⊑ y, we write u ⊑ (x, y), we say that u is a common subsequence of x and y, and we write S(x, y) to denote the set of all common subsequences of x and y, with φ(x, y) = |S(x, y)|. Let X denote a collection of strings: X = {x1, ..., xN} with x_i = x_i^1...x_i^{n_i}. Then we write S(X) = S(x1, ..., xN) for the set of all subsequences that are common to all strings in X, i.e. u ∈ S(X) implies u ⊑ x_i for all x_i ∈ X, i ∈ [N], and φ(X) = φ(x1, ..., xN) = |S(X)| denotes the size of that set.

A special kind of common subsequence is the longest common subsequence. We say that u is a longest common subsequence (for short: lcs) of X when there is no v ∈ S(X) such that |v| > |u|. We will write L(X) ⊆ S(X) to denote the set of all lcs's of X and llcs(X) for the length of these lcs's. Whenever we want to confine the length of the (common) subsequences to some k ≥ 0, we use the symbols S_k and φ_k. So, for example, S_k(x, y) denotes the set of all common k-long subsequences of x and y and there exist φ_k(x, y) of such subsequences.

2.2. Orderings of items as strings of characters

When we label a set of items with distinct characters from an alphabet, we can use the alphabet Σ = {σ1, ..., σn} as denoting the set of items. A judge who transitively orders a set of items thus produces a chain of observations

σ_{i1} ≻ σ_{i2} ≻ ... ≻ σ_{in},   (1)

wherein σ_j ≻ σ_k denotes the observation that the judge prefers the item labeled σ_j to the item labeled σ_k, and i1...in denotes a permutation of [n]. Dropping the preference relation ≻ from our notation leaves us with an n-long string over the alphabet Σ:

x = x1...xn = σ_{i1}...σ_{in}.   (2)

The reader notes that, since we assume the ordering to be fully transitive, each item or character appears precisely once in the string, implying that the string contains no repeating characters. Later, when we discuss calculating measures of concordance, this property of the strings – no repeating characters – will allow for very efficient algorithms. Given an item set or alphabet Σ of size |Σ| = n, we will allow for judges not ordering all items from Σ and thus generating k-long strings with 2 ≤ k ≤ n. Whenever necessary, we will write x_m = x_m^1...x_m^k to specifically denote the ordering as generated by the m-th judge and we use X to denote the collection of the orderings as generated by N ≥ 1 judges. Hence, with these conventions, we represent our data as strings over an n-sized alphabet Σ in which no character repeats. When a judge cannot order certain subsets of items with respect to each other, we write x = x1...(xj...xk)...xn, indicating that the items between the brackets could not be ordered with respect to each other, and we say that (xj...xk) represents a ‘‘tie’’.

2.3. Orderings of items as vectors of rank numbers

Instead of representing orderings as strings, orderings may be represented as vectors of rank numbers r = (r(σ1), ..., r(σn)), chosen such that r(σi) = k iff xk = σi in the string representation. Hence, the ordering x = σ2σ4σ1σ3 is equivalent to the rank vector r = (3, 1, 4, 2) and, if there is a tie as in x = σ2σ1(σ4σ3), we assign an average rank within the tie: r = (2, 1, 3.5, 3.5). Hence, there is a 1–1 correspondence between the string representation and the rank vector. Cook [14] mentions several other representations that we do not need in this paper. Although ‘‘(item) ordering’’ refers to observations and ‘‘string’’, ‘‘sequence’’ and ‘‘rank vector’’ refer to representations of these observations, we will not be very strict and often use these words as synonyms.

3. Related work

3.1. Sequential pattern mining

Sequential pattern mining (SPM) [2,47] originated from analyzing retail customer data and aims at finding (all) lists of items that occur in some minimum fraction of customer buying histories. For example, suppose that 80% of the customers that bought A also bought B, C and W within a week and bought E and Q within four weeks. The sequence of lists ((A), (B, C, W), (E, Q)) is then called a sequential pattern with a support of 80%. SPM is different from the problems that we deal with in this paper, as the data that SPM operates on are lists instead of sequences and the result is a list of items with minimum support.


Srikant and Agrawal [51] generalized the SPM-problem to include time constraints, no longer required that the items were bought in the same transaction, and advocated the GSP-algorithm as a successor of their famous AprioriAll-algorithm, therewith expanding SPM to find frequent item sequences instead of frequent item lists. Within the tradition of SPM and inspired by the basic idea of AprioriAll, Chen and Cheng [9] proposed their Maximum Consensus Subsequences algorithm (MCS), which generates the longest common subsequences that have support from a user-defined majority of judges. Furthermore, the algorithm identifies items that are the causes of most conflicting subsequences. Finding the most frequent subsequences, as GSP does, or, like MCS or PrefixSpan [47], finding maximum consensus sequences is very useful. However, such ‘‘majority rules’’ do not provide us with the complete picture of the full set of rankings that is provided by one or more consensus sequences and information about the concordance. So, SPM and the present approach are complementary, rather than conflicting.

3.2. Concordance and random variables: the Scarsini axioms

Scarsini [50] studied concordance of two continuous random variables through the concept of a 2-copula. The interested reader is referred to [45] for an introduction to the theory of copulas. Here, it suffices to state that a 2-copula is a continuous map C that uniquely relates the marginal distribution functions of two continuous random variables X and Y to their joint distribution function: F(X, Y) = C(F(X), F(Y)). Scarsini formulates [50, pp. 205–206] the following axioms (in our notation and in a different order) for a measure κ of concordance:

S1: −1 ≤ κ_XY ≤ 1,
S2: κ_XY = κ_YX,
S3: κ_XX = 1,
S4: κ_{(−X)Y} = −κ_XY,
S5: κ_XY = 0 when X and Y are stochastically independent,
S6: if C(F(X), F(Y)) ≥ C(F(W), F(Z)), then κ_XY ≥ κ_WZ,
S7: if the sequence C(F(X_n), F(Y_n)) converges to C(F(X), F(Y)) pointwise, then κ_{X_n Y_n} converges to κ_XY.

As we will not deal with item-orderings as continuous random variables, the last three of Scarsini's axioms are irrelevant here. However, discussion of the first four axioms seems warranted. Axiom S1 is on the range of κ: the closed interval [−1, 1]. This certainly makes sense in the context of two numerical variables, where one interprets κ_XY = −1 as meaning that X and Y are anti-monotonous. However, we will not confine concordance to apply to just two orderings and it would be hard to understand what κ(X) = −1 would mean for a set of orderings X containing three or more distinct orderings. Therefore, we will confine the range of a concordance measure to [0, 1]. Axiom S2 is a symmetry requirement; however, when κ is allowed to operate on a collection, such an axiom is unnecessary. Translated to item-orderings, Axiom S3 would mean that judges perfectly agree with themselves. This is intuitive and we will have a similar axiom. As we do not deal with numerical random variables, −X has no sensible interpretation in the present context. However, an immediate consequence of S3 and S4 is that κ_{(−X)X} = −1. In the present context, it is natural to compare x to the reversed ordering x̃ and state that the concordance between x and x̃ should be zero. Scarsini [50] showed that, even for the discrete case, both Kendall's index

τ_XY = Σ_{i>j} sign((X_i − X_j)(Y_i − Y_j)) / C(n, 2)   (3)

and Spearman’s index

ρ_XY = 1 − 6 Σ_i (rank(X_i) − rank(Y_i))² / (n(n² − 1)),   (4)

satisfy Scarsini’s axioms asymptotically. However, in Section 4.2, we will show that neither s nor q are satisfying measures of concordance. 3.3. Distance based consensus models: the Cook–Seiford axioms A consensus model prescribes how various orderings of the same item set should be aggregated into one single ranking of all items. Such models exist for partial rankings as arise when the judges only rank some of the items [15,24] as well as for full rankings. Although the methods proposed in this paper are by no means confined to full rankings by all judges, we will only explicitly deal with full rankings.


Constructing consensus orderings can be interpreted as the problem of finding an ordering that is ‘‘closest’’ to all observed orderings {x1, x2, ..., xN} = X, and this interpretation invites the construction of metrics over X × X so that closeness may be quantified. This idea has been studied by several authors [5,14,16,32] and different metrics have arisen out of different representations of ordering data. The general idea is then, given a metric d(x, y) over the set X × X, to find a consensus ordering c_X such that Σ_{x∈X} d(c_X, x) is minimized. The consensus construction method presented in this paper (to be discussed later) fits into this approach in the sense that we propose vector spaces to define concordance as an angle between vectors that represent orderings, and such angles are easily transformed into distances by normalizing the vectors to unit-length. Furthermore, we propose kernel functions to evaluate inner products of the representing vectors, and from these inner products it is only a small step to Euclidean distances.

However, we do not propose to minimize Σ_{x∈X} d(c_X, x), for a number of reasons. First, this minimization could prove to be quite an intractable integer programming problem [14]. Second, in case of little concordance between judges, a single consensus ordering may not be very informative about the full set of orderings, and taking decisions on the basis of such a consensus could be quite risky (this is the same objection that was formulated in [9] when these authors motivated the development of MCS). Instead of minimizing, we propose to use the full distance matrix to evaluate concordance and, if it is high, to use the observed ordering(s) closest to the centroid of the vectors as approximations to the consensus. If concordance is low, one might try to find a distance based partition of the set X into more homogeneous subsets and, again, use observed orderings that are closest to the centroids of the subsets as representative of the subsets of judges.

As we will also need to determine distances between orderings (judges), it is interesting to have a closer look at the distance axioms that were proposed by Cook and Seiford [16]. Cook and Seiford started from the problem of constructing a consensus ranking from rank vectors and proposed six axioms that distances between rank vectors should satisfy. As there is a 1–1 correspondence between rank vectors and sequences, we formulate the axioms in terms of sequences, after first discussing the concept of ‘‘betweenness’’: in general, sequence z is between sequences x and y, written as zB(x, y), whenever for all i ∈ [n]

r(x_i) ≤ r(z_i) ≤ r(y_i)  or  r(x_i) ≥ r(z_i) ≥ r(y_i).

An example is provided below, where z is between x and y because the ranks of the items in z are between the ranks of the same items in x and y:

x = acbd,  r(x) = (1, 3, 2, 4);
z = adbc,  r(z) = (1, 3, 4, 2);   (5)
y = bdac,  r(y) = (3, 1, 4, 2).
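Betweenness is easy to check mechanically. Below is a minimal Python sketch (the helper names rank_vector and between are ours, not the paper's) that verifies example (5):

```python
def rank_vector(x, items="abcd"):
    """Rank of each item in ordering x (1 = most preferred)."""
    return tuple(x.index(it) + 1 for it in items)

def between(x, z, y, items="abcd"):
    """True iff z is between x and y, i.e. zB(x, y): every item's rank
    in z lies between its ranks in x and y."""
    rx, rz, ry = (rank_vector(s, items) for s in (x, z, y))
    return all(a <= c <= b or a >= c >= b
               for a, c, b in zip(rx, rz, ry))

print(between("acbd", "adbc", "bdac"))  # -> True, as in example (5)
```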

As one of the axioms is about ‘‘irrelevant alternatives’’, we need two sets of judgments X and Y from the same set of judges: X pertaining to the item set Σ and Y generated from the item set Σ ∪ {a}. So, the sets X and Y differ because the orderings in Y contain an extra item a. Cook and Seiford [14,16] proposed that a rank distance function d(·,·) should satisfy, for each triple of sequences xi, xj, xk:

CS1: d(xi, xj) ≥ 0, equality holding for xi = xj,
CS2: d(xi, xj) = d(xj, xi),
CS3: d(xi, xj) ≤ d(xi, xk) + d(xk, xj), equality holding iff xk B(xi, xj),
CS4: d(xi, xj) = d(p(xi), p(xj)) for an arbitrary permutation p,
CS5: if yi = xia and yj = xja, then d(yi, yj) = d(xi, xj),
CS6: d(xi, xj) ≥ 1 for xi ≠ xj.

Equivalent axioms were proposed by Kemeny and Snell [32] for pairwise comparisons. The first three axioms, CS1–3, ensure that the distance function is a proper metric. The additivity implied by Axiom CS3 enforces equidistance on the rank numbers, since ‘‘betweenness’’ results from a condition on the rank vector. Axiom CS4 ensures that the metric in no way depends on the labeling of the items. Axiom CS5 is a ‘‘normalization’’ axiom: it ensures that distances are not affected by the size of the object set; the last axiom is a scaling convention. From these axioms, Cook and Seiford derive the unique distance function over the rank vectors r_i and r_j as

d_CS(r_i, r_j) = Σ_{k=1}^{n} |r_{ki} − r_{kj}|.   (6)

Interestingly, we can use the Cook–Seiford metric to derive a concordance measure κ_CS. Thereto we use the fact that, for rank vectors of n items, the maximal Cook–Seiford distance equals

2 Σ_{i=1}^{⌊n/2⌋} (n − (i − 1) − i) = 2 ⌊n/2⌋ ⌈n/2⌉ = M   (7)

and define

0 ≤ κ_CS(x, y) = 1 − d_CS(r_x, r_y)/M ≤ 1.   (8)
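Eqs. (6)–(8) are simple to compute; a small Python sketch (function names are ours) for rank vectors of n items:

```python
def cs_distance(rx, ry):
    """Cook-Seiford distance, Eq. (6): sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(rx, ry))

def kappa_cs(rx, ry):
    """Concordance derived from d_CS, Eq. (8), with M the maximal
    distance between two rank vectors of n items, Eq. (7)."""
    n = len(rx)
    M = 2 * (n // 2) * ((n + 1) // 2)   # 2 * floor(n/2) * ceil(n/2)
    return 1 - cs_distance(rx, ry) / M

print(kappa_cs((1, 2, 3, 4), (4, 3, 2, 1)))  # -> 0.0 (reversed ranking)
print(kappa_cs((1, 2, 3, 4), (1, 2, 4, 3)))  # -> 0.75 (one adjacent swap)
```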


4. Concordance in sets of orderings

4.1. Axioms of concordance

In many real life situations, there are usually more than two judges, voters or decision makers. Therefore, there is no reason to confine the concept of concordance to just two orderings or judges, unless one takes the concept of distance as a starting point. Therefore, we propose a definition of concordance, based on the sequence representation of orderings, that applies to finite sets of orderings.

Definition 1. Let Σ = {σ1, ..., σn} and W = {w1, ..., wn} be two sets of items and let B : Σ → W be a bijection of Σ and W. Let X = {x1, ..., xN} denote a set of orderings over Σ and let Y = {y1, ..., yN} refer to a set of orderings over W. Let U ⊆ X and B(U) ⊆ Y such that for each x_i ∈ U, ∃y_j ∈ B(U) with, ∀k ∈ [n], B(x_{ki}) = y_{kj}. A map C : U → ℝ is a concordance if, for all nonempty U, V, W ⊆ X, it satisfies the following axioms:

A1: C(U) ≥ 0,
A2: C(U) = C(B(U)),
A3: C(U) ≥ C(U ∪ V),
A4: C(U) = 1 iff |U| = 1,
A5: C(U) = 0 iff llcs(U) ≤ 1,
A6: C(U ∪ V) + C(V ∪ W) ≤ C(U ∪ W) + C(V),
A7: C(U) = C(V) = C(U ∪ V) iff U = V.

Axiom A1 sets an arbitrary but intuitive lower bound on concordance. Axiom A2 states that concordance only depends on the ordering of the items and not on the features of the items. This is reasonable, since we may assume that the judges generated the orderings taking the relevant item features into account. So, for example, we demand that the pairs of sequences (abcd, badc) and (pqrs, qpsr) have the same concordance. Axiom A3 states that concordance C(U) will not increase through adding more orderings to U. This may seem counterintuitive, since it implies that adding more and more equal orderings will not increase concordance. However, we do not want to confound the concept of concordance with the mechanism through which a consensus is created. In many situations, adding more equal judgments does not alter the decision, since some judges may have much more voting power than others. On the other hand, adding one ordering that is only slightly different from the others will decrease concordance, again without necessarily affecting the decision. Axiom A4 states that every judge is in perfect agreement with himself and, together with Axiom A3, it implies that C(U) ≤ 1. Hence the range of C is confined to the closed interval [0, 1]. Normally, judges order the same items, so even when their preferences are radically adverse, the preference orderings will have single items in common: llcs(U) = 1. By stating llcs(U) ≤ 1 in Axiom A5, we express that we allow that different judges order different parts of the item set; if these parts happen to be disjoint, the orderings do not share any item, in which case llcs(U) = 0. Consider the toy-sequences U = {abc, cab, bca}. It is almost immediate that llcs(U) = 1 with L(U) = {a, b, c}, yet no pair of these strings contains the reverse of the other string. So, according to Axiom A5, it is not sufficient for concordance to be positive that some judges agree on some subsequences with a length of 2 or longer: concordance can be positive only when llcs(U) ≥ 2. The role of Axiom A6 is similar to the role of the triangle inequality for distance metrics: d(x, y) ≤ d(x, z) + d(z, y) bounds d(x, y) by distances to an arbitrary object z. Here we say that C(U ∪ W) alone is not sufficient to bound C(U ∪ V) + C(V ∪ W). For example, we might have C(U ∪ W) = 0 while C(U ∪ V) ≠ 0 ≠ C(V ∪ W), so we need C(V). Finally, Axiom A7 states that C is sufficiently sensitive: even the smallest possible change in the set of orderings should be reflected in the value of C. Together, Axioms A1–3 generalize the definition of a similarity metric as proposed in [10], the essential difference between similarity and concordance being embodied in Axiom A2. With an eye on the concordance Axioms A1–7, let us now have a brief look at a few well known and widely used indices of concordance.

4.2. Classical indices

In this section we discuss three classical, often used indices of concordance. This discussion is by no means exhaustive, as we will not deal with all indices that have been proposed for all sorts of specific applications, like for example Cohen's κ [11]. We confront these three classical measures with the definition of concordance as presented in the previous section, compare them with each other, and look ahead by comparing one of them with a new measure of concordance γ that we will define and amply discuss in Section 4.3.

4.2.1. Kendall's τ

Kendall's τ was already mentioned in Section 3.2, Eq. (3), in the context of concordance between pairs of random variables. Here, we specify it again for pairs of sequences x and y as


−1 ≤ τ_{x,y} = [φ₂(x, y) − φ₂(x̃, y)] / C(n, 2) ≤ 1.   (9)
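For full orderings without ties, Eq. (9) can be checked by brute force, counting the common 2-long subsequences of (x, y) and of (x̃, y) directly. A sketch (function names are ours):

```python
from itertools import combinations

def phi2(x, y):
    """phi_2(x, y): number of common 2-long subsequences."""
    return sum(1 for a, b in combinations(x, 2) if y.index(a) < y.index(b))

def kendall_tau(x, y):
    """Eq. (9), for two full orderings over the same items."""
    n = len(x)
    return (phi2(x, y) - phi2(x[::-1], y)) / (n * (n - 1) / 2)

print(kendall_tau("abcd", "abdc"))  # -> 0.666..., one adjacent swap
print(kendall_tau("abcd", "dcba"))  # -> -1.0, the reversed ordering
```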

τ violates Axiom A1 of Definition 1, as its lower bound is negative: it is not a measure of concordance in the strict sense, as it balances concordant and disconcordant pairs of items. Furthermore, it is hard to see how it could be generalized to more than two sequences. From Eq. (9), it is evident that τ uses little information of the sequences, as only pairs of items are considered. Therefore, τ can have only C(n, 2) + 1 distinct values in its range [−1, 1]. So, even if it were an acceptable measure of concordance, it is only a crude measure. To demonstrate this, we compare τ to a measure of concordance γ that we will define and discuss in detail in the sections to come. A measure that is more sensitive should generate more equivalence classes over a set of pairs of orderings. Therefore, we generated all n! permutations of n items, 3 ≤ n ≤ 10, picked an arbitrary ordering x0 as the benchmark sequence and calculated γ(x0, xi) and τ(x0, xi) for the remaining n! − 1 sequences. In Table 1 we show the results of this illustrative exercise and in Fig. 1 we plot the results for n = 10. Clearly, γ is much more sensitive, and the more so when n increases.

4.2.2. Spearman's ρ and Kendall's W

Spearman's ρ [52] dates back to 1904 and can be defined, analogously to Eq. (4), as

−1 ≤ ρ_{xy} = 1 − 6 Σ_{i=1}^{n} (r(x_i) − r(y_i))² / (n³ − n) ≤ 1,   (10)

where we use a rank vector representation for the sequences x and y. The main difference between τ and ρ is that τ weighs all pairs of orderings in the same way, whereas ρ puts more weight on bigger rank differences. Numerous relations between τ and ρ have been described [34]; for example, we know that −1 ≤ 3τ − 2ρ ≤ 1 and that the expectation of a sample satisfies E(τ̂) = (2/π) sin⁻¹ ρ. Interestingly, ρ is a bit more sensitive than τ, but it does not beat γ except for small n, as is shown in Table 1. Like Kendall's τ, the lower bound of ρ equals −1 and it is hard to generalize ρ to more than two sequences. So, ρ is not a proper measure of concordance in the sense of Definition 1. However, the reason that we discuss it is that it was used to define a famous index for the concordance of N ≥ 2 sequences: Kendall's W [35]. Let ρ̄_X denote the average of ρ_{x_i x_j} over all pairs x_i, x_j ∈ X; then W is given by

Table 1
The number of equivalence classes generated with Kendall's τ, Spearman's ρ and γ on all possible orderings of n objects.

n    3    4    5    6    7    8    9    10
τ    4    7   11   16   22   29   37    46
ρ    4   10   20   35   56   84  120   165
γ    4    8   16   30   55   99  178   318

Fig. 1. Scatter-plot of Kendall's τ vs. γ as defined in Eq. (32) for all 10! = 3,628,800 orderings of 10 objects.


0 ≤ W = [(N − 1) ρ̄_X + 1] / N ≤ 1.   (11)
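A direct, if inefficient, rendering of Eqs. (10) and (11) in Python (full rankings without ties assumed; names are ours):

```python
from itertools import combinations

def spearman_rho(rx, ry):
    """Eq. (10), for two rank vectors without ties."""
    n = len(rx)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n**3 - n)

def kendall_w(rank_vectors):
    """Eq. (11): W from the average pairwise Spearman rho."""
    N = len(rank_vectors)
    rho_bar = (sum(spearman_rho(a, b)
                   for a, b in combinations(rank_vectors, 2))
               / (N * (N - 1) / 2))
    return ((N - 1) * rho_bar + 1) / N

print(kendall_w([(1, 2, 3, 4)] * 3))                          # -> 1.0
print(kendall_w([(1, 2, 3, 4), (4, 3, 2, 1), (1, 2, 4, 3)]))  # -> 0.111...
```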

Computationally, there are far more efficient formulas than Eq. (11), but this equation clearly shows that W will be positive when some judges agree on the orderings of some subsets of the items. In fact, it shows that W violates Axiom A3 of Definition 1: W will increase when more judges agree on the same items. So, apart from its poor sensitivity, W is not a proper measure of concordance either.

So, unfortunately, none of the classical measures is very satisfying, neither in terms of Axioms A1–7 nor in terms of sensitivity. Therefore, we discuss new, better measures in the next section.

4.3. Measures of concordance

Concordance is about agreement among orderings of items. All information about this agreement is stored in the set S(X) of all common subsequences of the set of orderings X. Hence, it is natural that we define a concordance as a function F of some property α(X) of S(X):

γ(X) = F(α(X)).   (12)

Below, we will discuss some measures of concordance wherein we take

0 ≤ γ(X) = F(α(X)) = [α(X) − min α(X)] / [max α(X) − min α(X)] ≤ 1.   (13)

For ease and clarity of presentation, we will assume that all sequences pertain to all items of Σ. Later, we will spend some lines on how to handle sequences of unequal lengths, i.e. orderings of different subsets of Σ.

4.3.1. Length of the longest common subsequence

By specifying α in Eq. (13) as

α(X) = max{|u| : u ∈ S(X)} = llcs(X),   (14)

one takes the length of the longest common subsequence of X as the fundamental quantity: the longer, the more concordance. Using Eq. (13) immediately yields:

γ(X) = (llcs(X) − 1) / (n − 1).   (15)
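For a pair of orderings without repeated characters, llcs(x, y) reduces to a longest increasing subsequence problem: map the items of x to their positions in y and find the longest increasing run. A Python sketch of this standard reduction (it is not the multi-sequence trail-algorithm of Section 6):

```python
from bisect import bisect_left

def llcs_pair(x, y):
    """llcs of two orderings without repeats, via the longest increasing
    subsequence of x's items mapped to their positions in y."""
    pos = {ch: i for i, ch in enumerate(y)}
    seq = [pos[ch] for ch in x if ch in pos]
    tails = []  # tails[k]: smallest tail of an increasing run of length k+1
    for v in seq:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

print(llcs_pair("abcdef", "baedcf"))  # -> 3 (e.g. the lcs "acf")
```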

We note that γ(X) as defined in Eq. (15) is not, according to Definition 1, a proper concordance measure, since it violates Axiom A7. To illustrate this, consider U = {u1 = abcd, u2 = cadb} and V = {v1 = abcd, v2 = dacb}: it is clear that llcs(U) = 2 = llcs(V) = llcs(U ∪ V), whereas U ≠ V.

llcs has been extensively studied since the seventies, when [27] proposed a dynamic programming algorithm to calculate llcs(x, y) in Θ(|x| · |y|) time. Since then, a number of algorithms have been proposed [4] to solve different variants of the longest common subsequence problem. For example, the dynamic programming algorithm can be extended to handle a set of N sequences, which has a complexity of Θ(N|x|^N). This is computationally intractable, even for moderate N. For arbitrary sets, the problem of calculating llcs(X) is known to be NP-hard [41]. In this paper we will present an algorithm that, as a side product, calculates llcs(X) and all longest common subsequences for arbitrarily sized X in polynomial time. This is possible since the sequences in X, while representing orderings, have the special property that no character occurs more than once.

However, llcs as a measure of concordance has a number of drawbacks. For example, it is possible that a particular X has a great number of longest common subsequences (for short: lcs's) of length k whereas another set Y has only one lcs of length k + 1, and it would then be difficult to say which of X or Y shows the most concordance. On the other hand, and complementary to a single, synthetic consensus, a list of all lcs's provides useful information on what partial orderings all judges agree. In terms of sequential pattern mining, such a list shows all patterns that have a support of 100%. Furthermore, such a list will contain a vast majority of all common subsequences. However, we note that there may exist common subsequences that do not occur as a subsequence of any of the lcs's. For example, consider the orderings apqbc and pabcq with lcs's abc and pbc: these lcs's do not contain the common subsequences q, pq and aq. We note that Levenshtein's edit distance [38] d_L is directly related to llcs through the identity d_L(x, y) = |x| + |y| − 2 llcs(x, y), but the edit distance is only defined for pairs of sequences, not for sets with more than two orderings.

4.3.2. All distinct common subsequences

As lcs's do not contain all the information about the common subsequences of strings, several authors have proposed similarity measures based on all common subsequences instead of just the longest ones. However, these proposals were confined to pairs of sequences. In its most simple form, this amounts to using

α(x, y) = Σ_{u∈S(x,y)} 1 · 1 = |S(x, y)| = φ(x, y).   (16)

Algorithms to evaluate this α were proposed in [20]. More sophisticated measures weigh the common subsequences according to their embedding frequency but, in the context of concordance of transitive orderings, this is irrelevant, since all subsequences


appear just once in any string. Ignoring embedding frequency, we mention a proposal made in [40] that employs a decay for subsequences with widely separated characters, using the width w(u|x) (see Section 2) of the subsequences:

α(x, y) = Σ_{u∈S(x,y)} λ^{w(u|x)} · λ^{w(u|y)}   (17)

for 0 < λ < 1. This α penalizes subsequences with wide gaps between the characters. Unfortunately, it also penalizes the occurrence of long, compact common subsequences, which is hard to defend in the context of the measurement of concordance. Therefore, we generalize and take α(X) = φ(X), i.e. we take the number of all distinct common subsequences of the set X as the fundamental quantity, and define

γ(X) = [φ(X) − (n + 1)] / [2ⁿ − (n + 1)].   (18)

As we assume that all judges order the full set Σ of alternatives and since the empty sequence satisfies λ ⊑ x for any sequence x, we have that

n + 1 ≤ φ(X) ≤ 2ⁿ   (19)

and hence that 0 ≤ γ(X) ≤ 1. Clearly, if φ(X) = n + 1, Axiom A5 of Definition 1 must be satisfied: there must be two sequences in X that order the items in reversed order, for if not, there would have been common pairs. Eq. (18) satisfies Axiom A1 since φ(X) ≥ n + 1. Axiom A2 is satisfied, as a 1–1 mapping Y = F(X) will not affect the number of common subsequences: φ(Y) = φ(F(X)) = φ(X). Furthermore, Axiom A3 must be satisfied, for if u ∈ S(U ∪ V), then u ∈ S(U), hence φ(U ∪ V) ≤ φ(U). To show that Eq. (18) also satisfies Axiom A6, we define, for a set X of n-long strings, the complement set X̄ = Σⁿ \ X. Now we have that

φ(U ∪ V) + φ(V ∪ W) = φ(U ∪ V ∪ W) + φ(U ∪ V ∪ W̄) + φ(U ∪ V ∪ W) + φ(Ū ∪ V ∪ W)   (20)
 ≤ φ(U ∪ W) + φ(V),   (21)

the inequality following from φ(U ∪ V ∪ W) ≤ φ(U ∪ W) and φ(U ∪ V ∪ W) + φ(U ∪ V ∪ W̄) + φ(Ū ∪ V ∪ W) ≤ φ(V). It is not difficult to show that φ(X) satisfies Axiom A7 and hence γ of Eq. (18), too, must satisfy Axiom A7. So, γ(X) as defined in Eq. (18) is an acceptable measure of concordance and it puts equal weight on all common subsequences, regardless of their properties. In the next subsection, we discuss some possibilities to weigh the subsequences according to some of their properties.

4.3.3. Weighing the subsequences

Setting α(X) = φ(X) means that we just count all distinct common subsequences, regardless of other properties that these subsequences might have. Therefore, it is worthwhile to discuss how to handle the weighing of subsequences. We start by considering the length of the subsequences as a property of relevance and therefore write S_k(X) to denote the set of k-long common subsequences of X and φ_k(X) = |S_k(X)| to denote its cardinality. Weighing according to length then amounts to defining a function W : ℕ² → ℝ and determining

α(X) = Σ_{k=0}^{n} W(k, φ_k(X)),   (22)

min α(X) = W(1, n) + W(0, 1),   (23)

max α(X) = Σ_{k=0}^{n} W(k, C(n, k)).   (24)

We can use these quantities to construct a measure γ_W that is analogous to γ as defined in Eq. (18):

γ_W(X) = [Σ_{k=0}^{n} W(k, φ_k(X)) − (W(1, n) + W(0, 1))] / [Σ_{k=0}^{n} W(k, C(n, k)) − (W(1, n) + W(0, 1))]   (25)

and it is not difficult to see that γ_W satisfies the concordance axioms of Definition 1 under mild restrictions on W:

R1: W(k, m) ≥ 0,
R2: ∀k, W(k, 0) = 0,
R3: W(k, m) is monotone increasing in m.

Clearly, if, ∀k, W(k, m) = m, Eq. (25) reduces to the familiar Eq. (18).
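To make Eq. (25) concrete, here is a brute-force Python sketch for small item sets; the two weight functions at the bottom are illustrative choices of ours, not taken from the paper:

```python
from itertools import combinations
from math import comb

def phi_k(X, k):
    """Brute-force phi_k(X): k-long subsequences common to all of X."""
    common = None
    for x in X:
        subs = {"".join(c) for c in combinations(x, k)}
        common = subs if common is None else common & subs
    return len(common)

def gamma_W(X, W):
    """Eq. (25), for a weight function W(k, m) obeying R1-R3."""
    n = len(X[0])
    a = sum(W(k, phi_k(X, k)) for k in range(n + 1))
    lo = W(1, n) + W(0, 1)
    hi = sum(W(k, comb(n, k)) for k in range(n + 1))
    return (a - lo) / (hi - lo)

# W(k, m) = m reproduces Eq. (18); W(k, m) = k*m stresses long agreements.
print(gamma_W(["abcd", "abdc"], lambda k, m: m))      # -> 0.636...
print(gamma_W(["abcd", "abdc"], lambda k, m: k * m))  # -> 0.571...
```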


Apart from their lengths, one might want to somehow account for the gaps in the subsequences. For example, one might want to confine the subsequences to those that have no gaps exceeding a fixed size d, the so-called d-subsequences (e.g. [30]), or to penalize subsequences that have big gaps. Thereto, we generalize the quantity φ_k(X) in Eq. (22) and write

α(X) = Σ_{k=0}^{n} W(k, ϑ_k(X)),   (26)

where

ϑ_k(X) = Σ_{u∈S_k(X)} g(u, X).   (27)

Clearly, when we define g(u, X) = 1 iff u ∈ S_k(X) and g(u, X) = 0 otherwise, Σ_u g(u, X) = Σ_{u∈S_k(X)} 1 = φ_k(X). This is certainly not the only way to define ϑ_k. For example, we might penalize subsequences for wide gaps by defining the ‘‘density’’ 0 < D_m(u) = |u| / w(u|x_m) ≤ 1 (if u has wide gaps in x_m, D_m will be small) and then set

ϑ_k(X) = Σ_{u∈S_k(X)} Π_{j=1}^{N} D_j(u).   (28)

Counting only d-subsequences could be accomplished by setting

D_j(u) = 0 if ∃m : g_m(u|x_j) > d, and D_j(u) = 1 otherwise.   (29)

Of course, many more variants are possible, but we will not dwell upon these. Mostly, determining min α(X) will be easy, but determining max α(X) may be a problem. However, given an algorithm to calculate α(X), this problem can be solved ‘‘empirically’’ by calculating max α(X) = α(x_m) for any x_m ∈ X.

4.3.4. Average pairwise concordance as a diagnostic

When γ or γ_W happens to be a small number, it is not clear whether most judges disagree on the ordering of the items or whether there is only a small minority of deviating judges confronting a vast, homogeneous majority of agreeing judges. So it is an asset to have a diagnostic in such cases: the average pairwise concordance

0 ≤ γ̄(X) = Σ_{x,y∈X} γ(x, y) / C(N, 2) ≤ 1.   (30)

Clearly, γ̄ is not a proper measure of concordance in the sense of Definition 1, since it violates Axiom A3: adding more judges that strongly agree will raise the value of γ̄. But because

0 ≤ γ(X) ≤ min_{x,y∈X} γ(x, y) ≤ γ̄(X),   (31)

γ̄ is a useful diagnostic in case γ(X) is small: a big difference γ̄(X) − γ(X) indicates that there is a small minority of deviating judges. Of course, the standard deviation of the γ(x, y) is an interesting tool too.

4.3.5. Scaling γ

There is a scaling problem with γ(X) as defined in Eq. (18) that is best illustrated with a small example. Consider the 7-long sequences x = abcdefg and y = abcdfeg. It is not difficult, only tedious, to verify that φ(x, y) = 96 and thus that γ(x, y) = (96 − 8)/(128 − 8) = 0.73. But this is quite a low number, quite remote from one, while the two sequences only differ by a swap of the items e and f: they could hardly differ less! In practice, γ will be quite close to zero because its denominator is so big and its numerator is so sensitive. Therefore, we propose to measure the subsequence counts on a log₂-scale and use

0 ≤ γ(X) = log₂(φ(X) − (n + 1)) / log₂(2ⁿ − (n + 1)) ≤ 1.   (32)
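The numbers in the example are easy to check by brute force; a small Python sketch (function names are ours) verifies φ(x, y) = 96 and evaluates Eq. (32):

```python
from itertools import combinations
from math import log2

def phi(x, y):
    """Brute-force phi(x, y), the empty sequence included; fine for
    the short strings used here."""
    def subseqs(s):
        return {"".join(c) for r in range(len(s) + 1)
                           for c in combinations(s, r)}
    return len(subseqs(x) & subseqs(y))

def gamma(x, y):
    """Eq. (32): log2-scaled concordance of a pair of full orderings."""
    n = len(x)
    return log2(phi(x, y) - (n + 1)) / log2(2**n - (n + 1))

x, y = "abcdefg", "abcdfeg"
print(phi(x, y))              # -> 96, as claimed above
print(round(gamma(x, y), 2))  # -> 0.94, against 0.73 on the linear scale
```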

With this slight adaptation, γ is still a concordance measure that is easy to interpret. We have now dealt extensively with the measurement of concordance, so in the next section we turn our attention to the concept and construction of consensus.

5. Consensus in sets of orderings

In Section 1, we stated that a consensus is an ordering that in some sense best represents the full set of orderings X. We define the consensus as the centroid of a Euclidean space wherein the N sequences are represented as vectors x1, ..., xN. As explained in the next subsection, the distances between the vectors are measured in terms of the number of (weighted) common subsequences, hence the distances are directly related to concordance: the smaller the distance between two vectors, the bigger the concordance between the pair of orderings represented. The centroid is ‘‘characteristic’’


for the set of vectors/orderings in the sense that the centroid c_X is the vector that minimizes the sum of squared distances to all other vectors:

c_X = N⁻¹ Σ_{j=1}^{N} x_j = arg min_z Σ_{j=1}^{N} (x_j − z)².   (33)

The ‘‘quality’’ of the centroid can be quantified as the size of the sum of squared distances it minimizes, i.e. by calculating Σ_{j=1}^{N} d²(x_j, c_X). However, we will not try to recover the consensus ordering from this centroid, as it may not represent an observable ordering. Instead, we will calculate the distances d(x_j, c_X) and consider the sequences whose representing vectors are closest to the centroid as approximations of the consensus orderings. That we only obtain one or more approximations is not a very serious problem, since when the closest vectors are very close, the small distances will imply that the concordance between the approximation(s) and the centroid is high. If, on the other hand, the closest distances are big, many of the orderings must be conflicting and, in that case, a consensus ordering is not a very useful thing. In fact, this possible lack of ‘‘representativeness’’ of a consensus inspired Chen and Cheng [9] to propose their MCS-algorithm, which determines common partial orderings with a user-defined support. We propose to use the set L(X) of lcs's as indicators of the representativeness of the (approximations of the) consensus. If L(X) contains only a few, relatively short lcs's, a single consensus cannot be very useful, since there must be many conflicting orderings. If, on the other hand, L(X) is a big set and/or the lcs's are long relative to the number of items n, the lcs's provide useful common partial orderings with a support of 100%. In Section 6.5, we will explain how L(X) can be determined.

5.1. Subsequence spaces and inner products

We construct a high-dimensional vector space where vector coordinates correspond to subsequences. We fix an arbitrary order on the sequences in Σ*, e.g. lexicographic, and then we use the mapping r : Σ* → ℕ to construct, for each x ∈ Σ*, a vector x = (x1, x2, ...) according to

∀u ∈ Σ*: x_{r(u)} = f(u, x) if u ⊑ x, and x_{r(u)} = 0 otherwise,   (34)

wherein f(u, x) is some function that subserves the application and satisfies f(λ, x) = 0 for all x ∈ Σ*, in order that λ → 0 = (0, 0, ...). There are several interesting ways to specify the coordinate values. The simplest way is to set f(u, x) = 1 for all u, x ∈ Σ* \ {λ}. Then Eq. (34) reduces to

∀u ∈ Σ*: x_{r(u)} = 1 if u ⊑ x, and x_{r(u)} = 0 otherwise,   (35)

with the result that the inner product x′y evaluates the number of common subsequences φ(x, y):

x′y = Σ_{u∈Σ*} f(u, x) · f(u, y) = Σ_{u∈S(x,y)} 1 · 1 = φ(x, y).   (36)

Analogously, x′x = φ(x) = 2^|x| since, in the present context, no x has repeated characters. Clearly, we can use these inner products to calculate squared Euclidean distances as

d²(x, y) = x′x + y′y − 2x′y   (37)
 = 2^|x| + 2^|y| − 2φ(x, y),   (38)

showing that the unit of distance equals (non-common) subsequences. It is convenient to use the generic name Linear Subsequence Metric, for short ‘‘LSM’’, for subsequence vector spaces defined through Eq. (34) and distance function (38). By choosing an appropriate specification for f(u, x) in Eq. (34), the subsequence weighings discussed in Section 4.3.3, and many more, can be formulated in terms of an LSM. We will not dwell upon this, as the vectors themselves will not be used in the actual calculations of their inner products. Interestingly, we can interpret φ(X) with |X| > 2 as a generalized inner product ⟨x1, x2, ...⟩:

φ(X) = ⟨x1, ..., xN⟩   (39)
 = Σ_i Π_{j=1}^{N} x_{ij},   (40)

wherein the x_{ij} now refer to the vector coordinates and ⟨x1, ..., xN⟩ is called an N-inner product. The properties of such inner products and their associated norms have been amply studied in e.g. [26,42].
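Although Section 6 shows that the vectors need never be built, for very short strings the 0/1 vectors of Eq. (35) can be constructed explicitly and Eq. (36) checked directly. A Python sketch (we restrict the coordinates to the subsequences that actually occur, which suffices for the inner product; since f(λ, x) = 0, the empty sequence is added back by hand):

```python
from itertools import combinations

def subseqs(s):
    """All non-empty subsequences of s (no repeated characters in s)."""
    return {"".join(c) for r in range(1, len(s) + 1)
                       for c in combinations(s, r)}

x, y = "abc", "bac"
coords = sorted(subseqs(x) | subseqs(y))             # fixed coordinate order
vx = [1 if u in subseqs(x) else 0 for u in coords]   # Eq. (35)
vy = [1 if u in subseqs(y) else 0 for u in coords]

dot = sum(a * b for a, b in zip(vx, vy))
print(dot + 1)                     # x'y plus lambda: phi(x, y) = 6
print(sum(v * v for v in vx) + 1)  # x'x plus lambda: 2**len(x) = 8
```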


5.2. LSM’s and the Cook–Seiford axioms The reader notes that an LSM satisfies most of the Cook–Seiford axioms. An LSM is a metric so it satisfies the first three axioms. However, the additivity d(xi, xj) = d(xi, xk) + d(xk, xj) as implied by the equality of Axiom CS 3 is not a property of LSM’s. For the reader easily verifies that for the example sequences of (5), we have that d2(x, y) = 18 while d2(x, z) = d2(z, y) = 16. The additivity implied by Axiom CS 3 in fact enforces equidistance on a structure that is generated from rank numbers that, by nature, lack such property. For an LSM, such an axiom is not necessary since an LSM is based on (a transformation of) a set count. A permutation of a sequence x does not affect /(x) and it is not difficult to see that the same permutation applied to both x and y will not affect /(x, y), hence Axiom CS 4 of Cook and Seiford is satisfied by LSM’s. Axiom CS 5 is a ‘‘normalization’’ axiom: it ensures that distances are not affected by the size of the object set. In general, LSM’s will not satisfy this axiom. For by elongating x and y with a new object a, we will have that /(xa) = 2/(x) and / (xa, ya) = 2/(x, y) and thus, in an LSM constructed through Eq. (35), the squared distances will double. However, by normalizing the vectors to unit-length and measuring distance on the unit-sphere, this can be easily mitigated [10]. The last axiom is a scaling convention that is automatically satisfied by an LSM constructed according to Eq. (35). Cook and Seiford showed [16] that their rank number based metric dCS, specified in Section 3.3 as Eq. (6), uniquely satisfies the above axioms. However, the LSM’s of the previous subsection satisfy almost all of the Cook–Seiford axioms and do not require an assumption that enforces equidistance on rank numbers. Of course, actually constructing the vectors as discussed here and multiplying them is not feasible, not even for small item P sets as the number of coordinates of these vectors would, for an object set of size n, equal ni¼0 ni ; even for small item sets, this is a colossal number. Section 6 therefore deals with efficient kernels to evaluate the inner products without having to explicitly construct the vectors. 5.3. Distances to the centroid Once the pairwise distances d(xi, xj) are available for the set of vectors {x1, . . . , xN} = X, we can easily evaluate the distances P d(xj, c) to the centroid c ¼ j xj =N. To explain how this is accomplished without actually using the vectors themselves, we fix an arbitrary vector y = xi 2 X and write 2

d²(y, c) = y′y − 2c′y + c′c   (41)
 = y′y − 2N⁻¹ Σ_{j=1}^{N} y′x_j + N⁻² (Σ_{j=1}^{N} x_j)².   (42)

Manipulating the latter expression then yields

d²(y, c) = N⁻¹ Σ_{j=1}^{N} d²(y, x_j) − N⁻² Σ_{j=1}^{N−1} Σ_{k=j+1}^{N} d²(x_j, x_k).   (43)
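Eq. (43) turns the matrix of pairwise squared distances directly into squared distances to the centroid, without ever touching the vectors. A minimal Python rendering (illustrated with three collinear points rather than orderings):

```python
def dists_to_centroid(D):
    """Eq. (43): squared distances to the centroid, from the matrix of
    pairwise squared distances D[j][k] = d^2(x_j, x_k)."""
    N = len(D)
    pair_term = sum(D[j][k] for j in range(N)
                            for k in range(j + 1, N)) / N**2
    return [sum(D[i]) / N - pair_term for i in range(N)]

# Points 0, 1, 2 on a line: the centroid is 1, so the squared
# distances to it are 1, 0 and 1.
D = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
print(dists_to_centroid(D))  # -> [1.0, 0.0, 1.0]
```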

So, once the distances are available, finding the vectors in X that are closest to the centroid c is straightforward.

6. Calculating concordance

Central to the calculation of the concordance γ is the calculation of the number of common subsequences φ(X). Counting common subsequences has been amply studied in [20,40,55,56] for the case |X| = 2, i.e. for pairs of sequences. However, not all of the algorithms presented by these authors are easily adapted to evaluate weighted subsequences or to handle ties, nor do they all lend themselves to generalization to the case |X| > 2. So we will focus on a particular algorithm, a so-called ‘‘grid-algorithm’’ proposed in [20], which evaluates φ(x, y), the number of common subsequences, and which can also evaluate ϑ(x, y), i.e. count weighted subsequences. It will appear that this grid-algorithm, although very flexible and efficient, cannot be generalized to deal with the more general problem of calculating φ(X) in polynomial time. However, some principles of the grid-algorithm can be used to develop a new algorithm that does handle φ(X) with a time complexity of at most O(n³): the ‘‘trail-algorithm’’. This trail-algorithm can be further adapted to calculate ϑ(X), to handle ties and to determine the set of all lcs's L(X).

6.1. Counting common subsequences: the grid-algorithm

Before giving a formal description of the grid-algorithm, we illustrate its principles with an example. Fig. 2 shows two grids representing different stages in the process of finding common subsequences for the two toy sequences x = abcdef and y = baedcf. In both grids, common subsequences are indicated by arrows. In the left grid, only the common 1-tuples are shown and hence the arrows point to the node from which they departed. In the right grid, the arrows point to all nodes that are to the South–East relative to the nodes from which they departed; hence the arrows represent all common 2-tuples. The common 3-tuples arise by combining two arrows with one common node. We constructed E1 according to the rule


Fig. 2. Counting 1-, 2- and 3-long common subsequences of x = abcdef and y = baedcf using a grid: common subsequences are represented by paths of arrows that are either self-referencing in case of 1-tuples (left grid) or that are pointing to the ‘‘South–East’’ (right grid). The resulting counts are stored in matrices E1, E2 and E3. There are no common 4-long subsequences, hence we conclude that φ(x, y) = 6 + 11 + 6 + 1 = 24, the ‘‘+1’’ counting λ, the empty sequence.

e¹_{ij} = 1 iff x_i = y_j and e¹_{ij} = 0 otherwise. Then we keep track of the number of k-long South–East going paths that depart from each node by adding the numbers South–East of each non-zero cell in E^{k−1} and storing these results in the corresponding cell of E^k. Adding the values in each matrix E^k then results in the number of common k-long subsequences of the pertaining sequences. The process stops as soon as all cells of E^{k+1} equal zero, implying that x and y have no subsequences in common with a length of k + 1 or longer and that all common subsequences have been counted but the empty λ. The process described and illustrated in Fig. 2 is formalized in the next lemma.

Lemma 1 ([19,20]). Let x, y ∈ Σⁿ, neither having repeated characters. Furthermore, let φ(x, y) denote the number of common distinct subsequences of the pair (x, y), let φ_k(x, y) denote the number of common k-long subsequences of x and y, and let E^{(k)} = (e_{ij}^{(k)}) denote n × n matrices defined as follows. We set e_{ij}^{(1)} = 1 if x_i = y_j, and e_{ij}^{(1)} = 0 otherwise. For 2 ≤ k ≤ n, we set e_{ij}^{(k)} = e_{ij}^{(1)} Σ_{a>i, b>j} e_{ab}^{(k−1)}. Then φ_k(x, y) = Σ_{ij} e_{ij}^{(k)} and

φ(x, y) = 1 + Σ_{k=1}^{n} φ_k(x, y).   (44)

Proof. By induction over k: e_{ij}^{(k)} equals the number of k-long common subsequences that start at position i in x and at position j in y and spell the same string. □

The reader notes that, as discussed in Section 5.1, φ(x, y) = x′y, an inner product of vectors in an LSM; hence the calculation of φ(x, y) without first constructing the vectors x and y implies a kernel function. Therefore, we say that Lemma 1 defines a kernel function. Clearly, the algorithmic complexity of this kernel function equals Θ(n³). However, the cubic bound will only be attained when the two sequences happen to be identical; in all other cases the algorithm will stop as soon as the common subsequences have been counted. Furthermore, the reader notes that at each next step, the matrices will have one more row and one more column filled with zeros only, implying that the adding process can be sped up considerably. Further technical improvements were discussed in [19]. In the next subsection, we discuss adaptations of this kernel that allow for weighting the common subsequences according to the more complicated LSM's discussed in Sections 4.3.3 and 5.1.
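A direct Python transcription of Lemma 1 (a sketch: it omits the cumulative-sum bookkeeping behind the Θ(n³) bound, so the inner sum here is naive; the function name is ours):

```python
def phi_pair(x, y):
    """Grid-algorithm of Lemma 1: number of distinct common subsequences
    of x and y (empty sequence included); no repeated characters assumed."""
    n, m = len(x), len(y)
    E = [[1 if x[i] == y[j] else 0 for j in range(m)] for i in range(n)]
    total = 1 + sum(map(sum, E))            # the "+1" counts lambda
    while True:
        # e_ij^(k) = e_ij^(1) * sum over the cells South-East of (i, j)
        nxt = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if x[i] == y[j]:
                    nxt[i][j] = sum(E[a][b] for a in range(i + 1, n)
                                            for b in range(j + 1, m))
        level = sum(map(sum, nxt))
        if level == 0:                      # no longer common subsequences
            return total
        total += level
        E = nxt

print(phi_pair("abcdef", "baedcf"))  # -> 24 = 6 + 11 + 6 + 1, as in Fig. 2
```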


6.2. A weighing grid kernel

Here we discuss ways to adapt the kernel function implied in Lemma 1 so as to allow for weighing the common subsequences as previously discussed. To prevent confusion, we will confine the use of the symbol φ to the case of unweighed subsequences and use ϑ where we use weighing of whatever nature.

Lemma 2. Let x, y ∈ Σⁿ, neither having repeated characters. Furthermore, let ϑ(x, y) denote the number of weighed common distinct subsequences of the pair (x, y), let ϑ_k(x, y) denote the number of weighed k-long common subsequences of x and y, and let E^{(k)} = (e_{ij}^{(k)}) denote n × n matrices defined as follows. We set e_{ij}^{(1)} = 1 if x_i = y_j, and e_{ij}^{(1)} = 0 otherwise. For 2 ≤ k ≤ n, we set

e_{ij}^{(k)} = e_{ij}^{(1)} Σ_{a>i, b>j} δ(i, a, j, b) e_{ab}^{(k−1)},   (45)

wherein δ is an arbitrary function. Then ϑ_k(x, y) = Σ_{ij} e_{ij}^{(k)} and

ϑ(x, y) = W(0, 1) + Σ_{k=1}^{n} W(k, ϑ_k(x, y))   (46)

with W an arbitrary function.

Proof. The proof is analogous to the proof of Lemma 1 and is left to the reader. □
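The weighted kernel differs from the plain grid-algorithm only in the two hooks δ and W. A naive Python sketch of Lemma 2 (names are ours; with δ ≡ 1 and W(k, t) = t it reproduces φ of Lemma 1):

```python
def theta_pair(x, y, delta, W):
    """Lemma 2 sketch: weighted subsequence count with gap-weight
    delta(i, a, j, b) and length-weight W(k, theta_k)."""
    n, m = len(x), len(y)
    E = [[1 if x[i] == y[j] else 0 for j in range(m)] for i in range(n)]
    total, k = W(0, 1), 1              # W(0, 1) weighs the empty sequence
    while any(any(row) for row in E):
        total += W(k, sum(map(sum, E)))
        k += 1
        E = [[(1 if x[i] == y[j] else 0)
              * sum(delta(i, a, j, b) * E[a][b]
                    for a in range(i + 1, n) for b in range(j + 1, m))
              for j in range(m)] for i in range(n)]
    return total

print(theta_pair("abcdef", "baedcf",
                 lambda i, a, j, b: 1, lambda k, t: t))  # -> 24
```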

Passing from Lemma 1 to Lemma 2, there are two differences. First, there is the additional function δ appearing in Eq. (45) and operating on the indices; this function allows us to weigh the gaps within the subsequences. The second difference is the function W appearing in Eq. (46), which allows for weighing the lengths of the subsequences. Clearly, by setting δ(i, a, j, b) = 1 everywhere and setting W(k, ϑ_k) = ϑ_k, we are back at evaluating φ. But by setting

δ(i, a, j, b) = 1 if a − i − 1 < d and b − j − 1 < d, and δ(i, a, j, b) = 0 otherwise,   (47)

dði; a; j; bÞ ¼ gai1þbj1

ð48Þ

and later averaging the gap-weights through setting 2

Wðk; #k ðx; yÞÞ ¼ #k ðx; yÞ=k :

ð49Þ
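As an illustration, the sketch below adds the two hooks to the previous grid kernel. All names are ours; $\delta$ and $\Psi$ are passed as callables, and since the lemma leaves the form of the empty-subsequence weight $\Psi(1)$ open, we supply it as a separate argument.

def weighing_grid_kernel(x, y, delta, Psi, psi_empty=1.0):
    """theta(x, y) of Eqs. (45)-(46); psi_empty plays the role of Psi(1)."""
    n = len(x)
    e1 = [[1.0 if x[i] == y[j] else 0.0 for j in range(n)] for i in range(n)]
    E, k = [row[:] for row in e1], 1
    theta = psi_empty + Psi(1, sum(map(sum, E)))
    while True:
        k += 1
        nxt = [[0.0] * n for _ in range(n)]
        for i in range(n - 1):
            for j in range(n - 1):
                if e1[i][j]:
                    nxt[i][j] = sum(delta(i, a, j, b) * E[a][b]   # Eq. (45)
                                    for a in range(i + 1, n)
                                    for b in range(j + 1, n))
        theta_k = sum(map(sum, nxt))
        if theta_k == 0:
            return theta
        theta += Psi(k, theta_k)                                  # Eq. (46)
        E = nxt

# Eq. (47): gaps bounded by d; Eq. (48): "resistance" weights with eta;
# Eq. (49): averaging over subsequence length.
delta_bounded = lambda d: lambda i, a, j, b: 1.0 if a - i - 1 < d and b - j - 1 < d else 0.0
delta_eta = lambda eta: lambda i, a, j, b: eta ** ((a - i - 1) + (b - j - 1))
Psi_average = lambda k, t: t / k ** 2

print(weighing_grid_kernel("abcdef", "baedcf", delta_bounded(1), lambda k, t: t))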

Of course, the time complexity of the algorithm implied by Lemma 2 depends upon the choice of the functions $\delta$ and $\Psi$. However, for most practical choices of these functions, the time complexity will not differ from that of the standard grid-algorithm. In the next subsection, we concisely deal with calculating $\phi(X)$, i.e. the number of (weighed) common subsequences in a set $X = \{x_1, \ldots, x_N\}$, for the general case $|X| \ge 2$.

6.3. Generalizing the kernel functions: the trail-algorithm

As already stated in Section 5.1, calculating $\phi(X)$ amounts to the evaluation of a generalized inner product $\langle x_1, \ldots, x_N \rangle$. So, an algorithm that determines $\phi(X)$ may be called a generalized kernel. Below, we discuss such generalized kernels. It is not difficult to generalize Lemma 2 to obtain an algorithm for evaluating the number of subsequences common to all sequences in the set $X$. Essentially, it is the same algorithm, but now operating on an $|X|$-dimensional grid instead of a 2-dimensional grid. This appears to require only slightly more sophisticated notation and indeed, a proof of correctness of such an algorithm is quite straightforward. However, such an algorithm is of complexity $\Theta(n^{N+1})$, which is consistent with the complexity $\Theta(n^3)$ of the algorithms implied by Lemmas 1 and 2. Clearly, this is unacceptable, even for moderate values of $N$. In this section, we present an algorithm that avoids most of the redundant work: the checking of zero-filled cells of the $E^{(k)}$. Instead, we focus on the trail of $n$ non-zero cells in the almost empty multidimensional grid. Thereto, we construct this trail as an array


$$I(X) = (I_1, \ldots, I_n)^T = \begin{pmatrix} i_{11} & \cdots & i_{1N} \\ \vdots & & \vdots \\ i_{n1} & \cdots & i_{nN} \end{pmatrix},$$

wherein the arrays $I_j$ point at the locations of the non-zero cells of the multidimensional grid, relative to an arbitrary reference ordering. We fix this reference ordering as the sequence $x_1 = x_{11} \ldots x_{n1}$ and set $i_{j1} = j$. So, the first column of $I(X)$ consists of the integers $1, 2, \ldots, n$. The next columns are used to indicate the position of the item $x_{j1}$ in the ordering $x_k$ as generated by the $k$th judge. This is accomplished by setting, for $k > 1$, $i_{jk} = m$ iff $x_{mk} = x_{j1}$. We illustrate this construction in Table 2 for the toy-sequences $X = \{x_1 = abcdef, x_2 = baedcf\}$, the same sequences that were also used in Fig. 2. Next, we define a precedence relation $\prec$ on the index-arrays $I_j$ by writing, $\forall i, j \in [n]$,

$$I_i \prec I_j \quad \text{iff} \quad \forall k \in [N]: i_{ik} < i_{jk},$$

and we assign a dominance-set $D_i$ to every index-array $I_i$ such that

$$j \in D_i \quad \text{iff} \quad I_i \prec I_j.$$

In effect, this implies that the $D_i$ contain the indices of the $(n, N)$-index arrays that are ‘‘South–East’’ of $I_i$. Constructing the $D_i$ is of time complexity $\Theta(n^2 N)$ since it involves $\binom{n}{2}$ comparisons of $N$ pairs of integers. Next we define an array $q_1 = (q_{11}, \ldots, q_{n1})$ with $q_{j1} = 1$ for all $1 \le j \le n$. Obviously, $\sum_i q_{i1} = n = \phi_1(X)$, the number of common 1-long subsequences of $X = \{x_1, x_2\}$. Now we calculate the next array from the previous one using the dominance-sets $D_i$ as follows:

$$q_{ik} = \sum_{j \in D_i} q_{j(k-1)} \qquad (50)$$

and from this array we obtain $\phi_k(X) = \sum_i q_{ik}$. As $\phi(X) = 1 + \sum_k \phi_k(X)$, evaluating $\phi(X)$ through the procedure sketched and illustrated in Table 2 is only of complexity $O(Nn^2 + n^3)$, since evaluating $I(X)$ and $\{D_i\}$ is of order $O(Nn^2)$ and calculating $Q = (q_{ij})_{n \times n}$ requires $O(n^3)$. In applications, we may expect to mostly encounter $n < N$. We formalize the above algorithm, including the weighing of the subsequences and covering $|X| \ge 1$, in the next Lemma:

Lemma 3. Let $X = \{x_1, \ldots, x_N\}$ denote a non-empty set of $N \ge 1$ sequences $x_i = x_{i1} \ldots x_{in}$ of $n$ items without repetitions and let $I(X) = (i_{jk})_{n \times N}$ denote a set of $n$ $N$-long arrays of integers such that

$$i_{jk} = \begin{cases} j & \text{if } k = 1, \\ m & \text{if } k > 1 \text{ and } x_{j1} = x_{mk}. \end{cases} \qquad (51)$$

Furthermore, let there be $n$ sets of integers $D_i$, $i \in [n]$, such that $k \in D_i$ iff $I_i \prec I_k$, and let $Q = (q_{ij})$ denote an $(n \times n)$-matrix with $q_{i1} = 1$ for $i \in [n]$ and, for $j > 1$,

$$q_{ij} = \sum_{k \in D_i} q_{k(j-1)}\, \delta(I_i, I_k), \qquad (52)$$

wherein $\delta$ is an arbitrary function. Then, for $k \in [n]$, $\vartheta_k(X) = \sum_{i=1}^{n} q_{ik}$ and hence $\vartheta(X) = \Psi(1) + \sum_{k=1}^{n} \Psi(k, \vartheta_k(X))$.

Proof. The proof is by induction and is left to the reader. □

By appropriately specifying $\delta$ and $\Psi$, Lemma 3 provides us with a very efficient algorithm to compute inner products of sequence-representing vectors and to calculate the number of (weighed) common subsequences of a set of sequences of arbitrary size. So, the proposed indices of concordance $\gamma(X)$ and $\gamma$ are easily calculated for a wide variety of weighing schemes. In the next subsection, we will adapt the algorithm of Lemma 3 to handle ties.
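The following sketch implements the unweighed instance of Lemma 3 ($\delta \equiv 1$, $\Psi(k, t) = t$). The function name and the 0-based indexing are ours; we also return the $\{D_i\}$ and $Q$, since Section 6.5 reuses them.

def trail_counts(X):
    """phi_k(X) for k = 1, 2, ... and phi(X) for a list X of orderings of the
    same n items without repetitions; also returns the trail output D and Q."""
    x1, n, N = X[0], len(X[0]), len(X)
    pos = [{item: m for m, item in enumerate(x)} for x in X]   # item -> rank, per judge
    I = [[pos[k][x1[j]] for k in range(N)] for j in range(n)]  # Eq. (51), 0-based
    # k is in D_i iff I_i precedes I_k in every coordinate ("South-East").
    D = [[k for k in range(n) if all(I[i][c] < I[k][c] for c in range(N))]
         for i in range(n)]
    Q = [[1] + [0] * (n - 1) for _ in range(n)]                # q_{i1} = 1
    phi_k = [n]                                                # phi_1(X) = n
    for j in range(1, n):
        for i in range(n):
            Q[i][j] = sum(Q[k][j - 1] for k in D[i])           # Eqs. (50)/(52)
        col = sum(Q[i][j] for i in range(n))
        if col == 0:
            break
        phi_k.append(col)
    return phi_k, 1 + sum(phi_k), D, Q

# Table 2's toy set: phi_k = [6, 11, 6], so phi(X) = 1 + 6 + 11 + 6 = 24.
phi_k, phi, D, Q = trail_counts(["abcdef", "baedcf"])
print(phi_k, phi)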

6.4. Ties in the trail

Handling ties is easy when we assign to each sequence $x_i = x_{1i} \ldots x_{ni}$ an array $t_i = (t_{1i}, \ldots, t_{ni})$ of ‘‘tie-labels’’ such that

$$\forall j, k \in [n]: \quad t_{ji} = t_{ki} \text{ iff } x_{ji} \sim x_{ki}, \qquad (53)$$

i.e. iff $x_{ji}$ and $x_{ki}$ are tied.

Table 2
Illustrating Lemma 3 with X = {x1 = abcdef, x2 = baedcf}. The reader is invited to compare this table with Fig. 2.

x1  x2   i  Ii      Di            qi1  qi2  qi3  qi4
a   b    1  (1, 2)  {3, 4, 5, 6}  1    4    3    0
b   a    2  (2, 1)  {3, 4, 5, 6}  1    4    3    0
c   e    3  (3, 5)  {6}           1    1    0    0
d   d    4  (4, 4)  {6}           1    1    0    0
e   c    5  (5, 3)  {6}           1    1    0    0
f   f    6  (6, 6)  {}            1    0    0    0
Σ                                 6    11   6    0


Of course, there are many ways to accomplish this; a particularly convenient labeling is illustrated below for an 8-long sequence with two ties:

x_i = a  b  [c  d  e]  f  [g  h],
t_i = 1  2   3  3  3   4   5  5.

So, a set of sequences $X = \{x_1, \ldots\}$ is assigned a set of tie-label arrays $T = \{t_1, \ldots\}$. We illustrate the use of these tie-label arrays by adding ties to the toy-sequences used in Table 2, as shown in Table 3 below. In Lemma 3, we only used the relative positions, in the guise of $(n, N)$-indices, to count subsequences. But as soon as we allow for ties, we need the tie-labels to ‘‘correct’’ for the positions of tied objects. This is attained by storing the tie-labels in $N$-long arrays $L_i = (\ell_{i1}, \ldots, \ell_{iN})$, $i \in [n]$, such that there is a direct correspondence between the index-arrays $I_i$ and the tie-label arrays $L_i$. So, in Lemma 4, the dominance-sets $D_i$ are determined by both the $L_i$ and the $I_i$. Thereto, we first define the relation $\perp$ on pairs of tie-label arrays: $L_i \perp L_j$ if, $\forall k \in [N]$, $\ell_{ik} \ne \ell_{jk}$.

Lemma 4. Let $X = \{x_1, \ldots\}$ denote a non-empty set of $N \ge 1$ sequences without repetitions $x_i = x_{i1} \ldots x_{in}$ of $n$ objects and let $T = \{t_1, \ldots\}$ denote the set of associated $n$-long tie-label arrays. Let $I(X) = (i_{jk})_{n \times N}$ denote a set of $n$ $N$-long arrays of integers such that

$$i_{jk} = \begin{cases} j & \text{if } k = 1, \\ m & \text{if } k > 1 \text{ and } x_{j1} = x_{mk}, \end{cases} \qquad (54)$$

and let the arrays $L_i = (\ell_{i1}, \ldots, \ell_{iN})$ be defined such that $\ell_{kj} = t_{i_{kj}\,j}$, i.e. $\ell_{kj}$ is the tie-label, in $t_j$, of the item at position $i_{kj}$ of $x_j$. Furthermore, let there be $n$ sets of integers $D_i$, $i \in [n]$, such that

$$k \in D_i \quad \text{iff} \quad (I_i \prec I_k \wedge L_i \perp L_k). \qquad (55)$$

Let $Q = (q_{ij})$ denote an $(n \times n)$-matrix with $q_{i1} = 1$ for $i \in [n]$ and, for $j > 1$,

$$q_{ij} = \sum_{k \in D_i} q_{k(j-1)}\, \delta(I_i, I_k), \qquad (56)$$

wherein $\delta$ is an arbitrary function. Then, for $k \in [n]$, $\vartheta_k(X) = \sum_{i=1}^{n} q_{ik}$ and hence $\vartheta(X) = \Psi(1) + \sum_{k=1}^{n} \Psi(k, \vartheta_k(X))$.

Proof. The proof is by induction and is left to the reader. □

Clearly, because of Eq. (53), the time complexity is not substantially affected by constructing and using the tie-labels and hence will equal $\Theta(Nn^2)$ for $N > n$. So, we conclude this section with a very efficient and flexible algorithm: efficient, since its complexity is linear in the number of judges $N$, and flexible in the sense that the lengths of the subsequences and the gap-widths can be weighed.

Table 3
Illustrating Lemma 4 with X = {x1 = [abc]def, x2 = b[ae]d[cf]}.

i  Ii      Li      Di         qi1  qi2  qi3  qi4
1  (1, 2)  (1, 2)  {4, 6}     1    2    1    0
2  (2, 1)  (1, 1)  {4, 5, 6}  1    3    2    0
3  (3, 5)  (1, 4)  {}         1    0    0    0
4  (4, 4)  (2, 3)  {6}        1    1    0    0
5  (5, 3)  (3, 2)  {6}        1    1    0    0
6  (6, 6)  (4, 4)  {}         1    0    0    0
Σ                             6    7    3    0
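A sketch of the tie-aware variant follows (again unweighed; the data structures and names are ours). Orderings are given as lists of tied groups, positions within a tie being arbitrary, and Eq. (55) simply adds the label test to the dominance check.

def trail_counts_ties(judges):
    """theta_k(X) and theta(X) of Lemma 4 (unweighed), with ties.
    Each judge's ordering is a list of tied groups, e.g. the Table 3
    ordering [abc]def is [["a", "b", "c"], ["d"], ["e"], ["f"]]."""
    pos, lab = [], []
    for groups in judges:
        p, l, cnt = {}, {}, 0
        for g_idx, group in enumerate(groups):
            for item in group:
                p[item] = cnt        # position; arbitrary within a tie
                l[item] = g_idx      # tie-label, Eq. (53)
                cnt += 1
        pos.append(p)
        lab.append(l)
    x1 = [item for group in judges[0] for item in group]
    n, N = len(x1), len(judges)
    I = [[pos[k][item] for k in range(N)] for item in x1]
    L = [[lab[k][item] for k in range(N)] for item in x1]
    # Eq. (55): dominance requires strict precedence AND never being tied.
    D = [[k for k in range(n)
          if all(I[i][c] < I[k][c] and L[i][c] != L[k][c] for c in range(N))]
         for i in range(n)]
    Q = [[1] + [0] * (n - 1) for _ in range(n)]
    theta_k = [n]
    for j in range(1, n):
        for i in range(n):
            Q[i][j] = sum(Q[k][j - 1] for k in D[i])
        col = sum(Q[i][j] for i in range(n))
        if col == 0:
            break
        theta_k.append(col)
    return theta_k, 1 + sum(theta_k)

# Table 3's toy set: theta_k = [6, 7, 3], so theta(X) = 1 + 6 + 7 + 3 = 17.
print(trail_counts_ties([
    [["a", "b", "c"], ["d"], ["e"], ["f"]],   # x1 = [abc]def
    [["b"], ["a", "e"], ["d"], ["c", "f"]],   # x2 = b[ae]d[cf]
]))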

6.5. Generating all lcs's

When we know that concordance is not too low, we often want to know what the consensus is. We already discussed the method that uses the centroid of the representing vectors and the usefulness of lcs's in judging the quality of the consensus. In this section, we describe an efficient procedure to find all distinct lcs's of a set of sequences $X$. For the case of $|X| = 2$, Greenberg [25] provided an elegant solution; here we provide an efficient one for the case where $|X| \ge 2$. The method uses the dominance sets $D_i$ and the matrix $Q$ as defined in Lemma 3, and we illustrate the method with the example shown in Table 4.

Table 4
Calculating the number of common subsequences of X = {x1, . . . , x4} as shown in the left part of the table.

x1  x2  x3  x4   i  Ii            Di                  q1  q2  q3  q4  q5
a   a   b   b    1  (1, 1, 2, 2)  {3, 4, 5, 6, 7, 8}  1   6   6   2   0
b   d   a   a    2  (2, 5, 1, 1)  {3, 5, 7}           1   3   3   1   0
c   f   c   d    3  (3, 6, 3, 4)  {5, 7}              1   2   1   0   0
d   h   e   c    4  (4, 2, 6, 3)  {6, 8}              1   2   1   0   0
e   b   g   e    5  (5, 7, 4, 5)  {7}                 1   1   0   0   0
f   c   d   f    6  (6, 3, 7, 6)  {8}                 1   1   0   0   0
g   e   f   h    7  (7, 8, 5, 8)  {}                  1   0   0   0   0
h   g   h   g    8  (8, 4, 8, 7)  {}                  1   0   0   0   0

To explain and prove the correctness of this solution, we have to refine our concept of longest common subsequence as follows. Let $v = v_1 \ldots v_k \in \mathcal{S}(X)$. When there exists no $v_1 u \in \mathcal{S}(X)$ such that $|v_1 u| > |v|$, we say that $v$ is a $k$-long $v_1$-lcs of $X$: longer common subsequences that start with the character $v_1$ do not exist in $X$. We will say that $v$ is a proper lcs of $X$ precisely when $v \in \mathcal{L}(X)$, i.e. precisely when $v$ is a longest common subsequence of $X$, regardless of its first character. Remember that we defined dominance sets $\{D_i\}$ that are indexed by integers referring to items of $x_1$ and that contain integers, also referring to items of $x_1$. Now, it will be convenient to also have dominance sets $\mathcal{D}$, indexed by and containing characters from $\Sigma$ instead of integers from $[n]$: we define $x_{j1} \in \mathcal{D}_{x_{i1}}$ iff $j \in D_i$, thus implying a bijection $\Sigma \leftrightarrow [n]$. So, we may write $\mathcal{D}_a$ for any $a \in \Sigma$. Now we use these concepts to formulate and prove the next Lemma:

Lemma 5. Let $X$, $|X| > 1$, denote a set of sequences without repeating characters over $\Sigma$ with $|\Sigma| = n$ and let $\{\mathcal{D}_a\}$ denote the dominance sets of the characters $a \in \Sigma$. Furthermore, let the length of the proper lcs's of $X$ be $\ell$, let $u_k = u_1 \ldots u_k$ denote a $k$-long prefix of a proper lcs and let $v = v_1 \ldots v_{\ell-k}$ be a $v_1$-lcs of $X$. Then $u_k v_1$ is a $(k+1)$-long prefix of a proper lcs of $X$ if $v_1 \in \mathcal{D}_{u_k}$.

Proof. Let $\mathcal{L}(X)$ denote the set of proper, $\ell$-long lcs's of $X$ and let $u = u_k v \in \mathcal{L}(X)$. Then $v = v_1 \ldots v_{\ell-k}$ and $|v| = \ell - k$. Suppose there would exist a common subsequence $w$, starting with $v_1$, with $|w| > \ell - k$. Then $u_k w$ would be a common subsequence of $X$ since $u_k \prec v_1$. But $|u_k w| > |u_k v|$, which is, by hypothesis, impossible since $u = u_k v \in \mathcal{L}(X)$. So, a $v_1$-subsequence longer than $|v| = \ell - k$ cannot exist and thus $v$ must be a $v_1$-lcs of $X$. Now suppose that $u_k w \in \mathcal{L}(X)$ and that $w_1 \ldots w_{\ell-k}$ is a $w_1$-lcs of $X$. Then $u_k w_1 \ldots w_{\ell-k} \in \mathcal{L}(X)$ if $w_1 \in \mathcal{D}_{u_k}$, and then $u_k w_1$ must be a $(k+1)$-long prefix of a proper lcs of $X$. □

Lemma 5 implies a recursive algorithm that generates $\mathcal{L}(X)$: find all 1-long prefixes of the lcs's and all $(\ell-1)$-long $v_j$-lcs's and therefrom construct 2-long prefixes using the dominance sets $D$ or $\mathcal{D}$; then use these 2-long prefixes and the $(\ell-2)$-long $v_j$-lcs's, etc. To accomplish this recursion, we use the matrix $Q$ as previously defined: $q_{ij} = k$ implies that there exist $k$ distinct common $j$-long subsequences in $X$ that have $x_{i1}$ as their first character. Thus, when $q_{ij} = k > 0$ and $q_{i(j+m)} = 0$ for all $m \ge 1$, there must exist $k$ distinct $x_{i1}$-lcs's. Therefore, when $q_{ij} = k > 0$ while $q_{h(j+m)} = 0$ for all $h \in [n]$ and $m \ge 1$, $x_{i1}$ must be the first prefix of $k$ distinct, $j$-long proper lcs's of $X$. For example, from Table 4 it is clear that there are $\sum_{i \in [n]} q_{i4} = 3$ distinct 4-long lcs's, two of which have $x_{11} = a$ as their first character and one of which has $x_{21} = b$ as a first prefix.

Now we put the recursion to work on Table 4 and find $q_{33} = 1 = q_{43}$ while $q_{34} = 0 = q_{44}$, implying that there exists one 3-long $c$-lcs and one 3-long $d$-lcs. Since $c, d \in \mathcal{D}_a$ and $c \in \mathcal{D}_b$, i.e. $3, 4 \in D_1$ and $3 \in D_2$, we find all distinct 2-prefixes as $p_2^1 = ac$, $p_2^2 = ad$ and $p_2^3 = bc$. Another two cycles then yield the proper 4-long lcs's as

$$p_4^1 = aceg, \quad p_4^2 = adfh, \quad p_4^3 = bceg.$$

More generally, the algorithm can be formulated as:

Step 1. Find the biggest integer $\ell$ such that $\sum_{i=1}^{n} q_{i\ell} = m > 0$ and define the set $\{p_1^1, \ldots, p_1^m\}$ of 1-long prefixes, setting $k$ of them equal to $x_{i1}$ iff $q_{i\ell} = k$. Set $r := 1$ and go to Step 2.
Step 2. Set $r := r + 1$. If $r > \ell$, stop. Else go to Step 3.
Step 3. Find $I = \{i \in [n] : (q_{i(\ell-r+1)} > 0) \wedge (q_{i(\ell-r+2)} = 0)\}$ and, $\forall j \in [m]$, $\forall i \in I$, elongate $p_{r-1}^j$ to $p_r^j = p_{r-1}^j x_{i1}$ if $x_{i1} \in \mathcal{D}_u$, with $u$ the last character of $p_{r-1}^j$. Then go to Step 2.

When the algorithm stops, the set $P_\ell$ contains all distinct lcs's of $X$. As $\mathrm{llcs}(X) \le n = |\Sigma|$ and $|D_i| < n$, the algorithm is, given $\{D_i\}$ and $Q$, of complexity $O(n^3)$.
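In code, the recursion reads as follows; this is a sketch reusing trail_counts from Section 6.3, and, as noted above, the test on the next column is automatically satisfied for valid extensions, so only the positive condition is checked.

def all_lcs(X):
    """All distinct proper lcs's of X, recovered from {D_i} and Q (Lemma 5)."""
    phi_k, _, D, Q = trail_counts(X)
    x1, n, ell = X[0], len(X[0]), len(phi_k)   # ell = llcs(X)

    def extend(prefix, last):
        depth = len(prefix)
        if depth == ell:
            yield "".join(x1[i] for i in prefix)
            return
        candidates = D[last] if last is not None else range(n)
        for k in candidates:
            if Q[k][ell - depth - 1] > 0:      # k starts an (ell-depth)-long lcs
                yield from extend(prefix + [k], k)

    return list(extend([], None))

# Table 4's toy set yields the three proper lcs's found in the text.
X = ["abcdefgh", "adfhbceg", "bacegdfh", "badcefhg"]
print(all_lcs(X))   # ['aceg', 'adfh', 'bceg']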

The trail-algorithm has now been amply discussed: it rapidly finds $\vartheta(X)$, it handles ties at almost no cost, and part of its output, the $\{D_i\}$ and $Q$, allows fast recovery of all lcs's. Therefore, we evaluate the performance of the trail-algorithm in a number of experiments with artificial data.

7. The trail-algorithm: experimental evaluation

In this section, to demonstrate the efficiency of our methods, we conduct an empirical evaluation of the algorithm implied by Lemma 4. The Java source code for this evaluation can be downloaded from http://home.fsw.vu.nl/ch.elzinga/. We used the Windows XP platform on a Dell OptiPlex 745 PC with an Intel Core™2 6700 CPU, running at 2.66 GHz with 3.0 GB RAM.


The time complexity of the algorithm has two parameters: the number of items $n$ and the number of judges $N$; we varied both parameters independently in two experiments. In the first experiment, we fixed $n$ at 10, 50, 100, 200, 300, 400, and 500 and, for each of these item set sizes, we randomly generated $N$ orderings, with $N$ increasing from $N = 5$ to $N = 500$ in steps of 15 judges. For each of these $(n, N)$-combinations, we replicated the calculations 100 times and averaged the computation times. We plotted these averages in Fig. 3. Clearly, and as expected, for fixed $n$ the computation time is linear in $N$. In the second experiment, we fixed $N$ at 10, 50, 100, 200, 300, 400, and 500 and, for each of these fixed values, varied the size of the item set from $n = 5$ to $n = 500$ in steps of 15 items, again replicating 100 times for each combination of parameter values. The results are shown in Fig. 4. Now there is a clear, convex non-linearity, showing that, in practice, the algorithm's time complexity is better than quadratic in the size $n$ of the item set.
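A minimal harness mirroring the first experiment can be sketched as follows; it assumes the Python trail_counts sketch from Section 6.3 rather than the authors' Java implementation, so absolute runtimes are not comparable.

import random
import time

def benchmark(n, N, reps=10):
    """Average runtime of trail_counts on N random orderings of n items."""
    items = list(range(n))
    elapsed = 0.0
    for _ in range(reps):
        X = []
        for _ in range(N):
            perm = items[:]
            random.shuffle(perm)
            X.append(perm)
        start = time.perf_counter()
        trail_counts(X)
        elapsed += time.perf_counter() - start
    return elapsed / reps

# Fix n and let N grow, as in the first experiment (here with small values).
for N in range(5, 101, 15):
    print(N, benchmark(50, N))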

8. Ordering judges

By now we have amply studied concordance and consensus in sets of item-orderings. However, this is not the only kind of data that is gathered to investigate agreement among judges. Often, in surveys and public opinion polls, respondents are asked about their opinions, attitudes or intentions by having them pick one of a number of ordered categories. In this section we concisely discuss two methods to assess concordance from such data. A common kind of question in many surveys is a so-called Likert-item [39]: a statement containing an opinion, an attitude or a belief, which the respondent is asked to evaluate by picking one from a number (mostly 5, 7 or 9) of ordered response levels. For example, the statement could be ‘‘Immigrants take away our jobs’’ and the response levels could be ‘‘Strongly disagree’’, ‘‘Disagree’’, ‘‘Neither agree nor disagree’’, ‘‘Agree’’ and ‘‘Strongly agree’’. It is presumed that there is a latent dimension to which the statement pertains and that the judges will pick the category, e.g. ‘‘Agree’’, that is perceived as closest to their own ‘‘position’’ on the latent dimension. So, if two judges pick the same category, their positions do not necessarily coincide.

Fig. 3. Runtime of the algorithm implied by Lemma 4 when the number of items n is fixed and N varies from 5 to 500 in steps of 15.

Fig. 4. Runtime of the algorithm implied by Lemma 4 when the number of sequences N is fixed and n varies from 5 to 500 in steps of 15.


Clearly, by picking a category, the judge orders himself with respect to all of the categories of the Likert-item, and the aggregated judgements of many judges generate a frequency distribution over the categories picked. From this frequency distribution, Tastle and Wierman [54] derive an index of concordance (which they call ‘‘consensus’’) about the content of the Likert-item. To discuss their proposal and an alternative, we first need some concepts and notation. Let us identify the set $L = \{c_1, \ldots, c_k\}$ with the categories of the Likert-item, with the convention that if $i < j$, then $c_j$ signifies more agreement with the item-content than does $c_i$. Without loss of generality, we may identify the $\{c_i\}$ with the first $k$ positive integers. We write $J_i = c_m = m$ to denote the observation that the $i$th judge picked alternative $c_m$ as the one most correctly reflecting his opinion about the content of the Likert-item. Interpreted as counts, the meaning of $J_i = m = J_l$ is clear: both judges ‘‘passed’’ an equal number of categories to express their opinion, and $J_a = J_i + 1$ expresses the observation that judge $a$ passed one more category than judge $i$. The frequencies $f_m = |\{J_i : J_i = m\}|$ are simply the sizes of the sets of judges picking a particular category. Clearly, the probability that a randomly chosen judge picks category $m$ equals $f_m/N = p_m$, and $\sum_{i=1}^{k} p_i \cdot i = E_p(J)$ denotes the expectation of the random variable $J$ with probability distribution $\{p_i\}$. With these concepts and notation, the proposal of Tastle and Wierman reads as

$$0 \le \mathrm{Cns}(J) = 1 + \sum_{i=1}^{k} p_i \log_2\left(1 - \frac{|i - E_p(J)|}{k-1}\right) \le 1. \qquad (57)$$

It is not difficult to see that $\mathrm{Cns}(J) = 1$ if there is perfect concordance. For if there is, all judges chose the same category, say $c_m$, so $p_i = 0$ if $i \ne m$, $p_m = 1$ and $E_p(J) = m$. Under these conditions, all summands in (57) equal zero, hence $\mathrm{Cns}(J) = 1$. On the other hand, if concordance is minimal, half of the judges must have chosen category 1 and the other half must have picked category $k$. But then $p_1 = 2^{-1} = p_k$ and $E_p(J) = (k-1)/2 + 1$, and hence $\mathrm{Cns}(J) = 0$.

Tastle et al. [54, pp. 538] argue that means and variances should not be used since $J$ only provides an ordinal measure. Indeed, interpreted as an ordering of judges on a latent continuum of strength of opinion or attitude, $J$ has ordinal properties only. However, here we do not use $J$ as an indicator of strength of opinion; we use $J$ as a counter variable, and $E(J) = 3.2$ means that the judges order themselves such that, on average, 3.2 categories have been passed. Therefore, in using $J$ for quantifying consensus, it is immaterial that it only has ordinal properties when used as a proxy for strength of opinion. No argument for the construction of (57) is given other than that it seems to work and produces numbers that are not counter-intuitive. However, the interpretation of Eq. (57) is not straightforward, to say the least: what would be the meaning of the finding that $\mathrm{Cns}(J) = 0.3$? Therefore, it is interesting to consider a simpler and easier to interpret alternative. Evidently, if there is full concordance, all judges picked the same category, with the result that the variance of the random variable $J$ is zero: $\sigma_J^2 = 0$. On the other hand, if there is maximal disagreement, half of the judges picked category 1 and the other half picked the other extreme, category $k$. But this implies that the variance of $J$ is maximal: $\sigma_J^2 = (k-1)^2/4$. Any other distribution of the judges over the categories generates a smaller variance. Therefore, an interesting alternative to the index defined in (57) is the variance of $J$ relative to its maximum:

$$C(J) = 1 - \frac{4\sigma_J^2}{(k-1)^2}. \qquad (58)$$

Now the index has a clear interpretation: $C(J) = 0.30$ simply means that the variance of the number of ordered categories actually used equals 70% of its maximum. We also know (e.g. [31]) that, given mild assumptions on the sampling process, the sampling distribution of $\frac{(k-1)^2}{4}(1 - C)$ is $\chi^2$ with $df = k - 1$. So, statistical inferences would be easy. To illustrate the behavior of Cns and $C$, we used the example distributions on a 5-point Likert-scale previously presented in [54, Table 1, pp. 539] and calculated both coefficients. The results are shown in Table 5. Evidently, Cns and $C$ order the concordance of these distributions almost identically, disagreeing only on the distributions E, F and G. Bootstrapping shows that the standard error of $C$ is slightly smaller than that of Cns.

Table 5
Coefficients Cns and C (see text) calculated for 10 toy-data sets for a 5-point Likert-scale, labeled A, . . . , J, as presented in [54, Table 1, pp. 539].

     1    2    3    4    5    Cns  C
A   .50  .00  .00  .00  .50  .00  .00
B   .25  .25  .00  .25  .25  .29  .38
C   .20  .20  .20  .20  .20  .43  .50
D   .19  .16  .26  .29  .10  .52  .60
E   .00  .50  .00  .50  .00  .58  .75
F   .10  .20  .40  .20  .10  .63  .70
G   .15  .00  .70  .00  .15  .70  .70
H   .50  .50  .00  .00  .00  .81  .94
I   .00  .50  .50  .00  .00  .81  .94
J   .00  .00  1.0  .00  .00  1.0  1.0
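Both indices are only a few lines of code. The sketch below (function names are ours) computes Eqs. (57) and (58) from a probability distribution over the categories and reproduces row H of Table 5.

import math

def cns(p):
    """Tastle-Wierman index, Eq. (57); p = probabilities over categories 1..k."""
    k = len(p)
    mu = sum((i + 1) * pi for i, pi in enumerate(p))      # E_p(J)
    return 1 + sum(pi * math.log2(1 - abs(i + 1 - mu) / (k - 1))
                   for i, pi in enumerate(p) if pi > 0)

def c_var(p):
    """Variance-based index C(J), Eq. (58)."""
    k = len(p)
    mu = sum((i + 1) * pi for i, pi in enumerate(p))
    var = sum(pi * (i + 1 - mu) ** 2 for i, pi in enumerate(p))
    return 1 - 4 * var / (k - 1) ** 2

# Row H of Table 5: half of the judges pick category 1, half category 2.
print(round(cns([.5, .5, 0, 0, 0]), 2), round(c_var([.5, .5, 0, 0, 0]), 2))
# 0.81 0.94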


In general, when measuring the uncertainty of probability distributions, entropy is to be preferred over variance, since entropy and variance may generate different orderings of the same distributions. However, in the case of concordance, we are interested in the accumulation of probability mass in a small, contiguous area of the support of the distribution, and this is what variance measures. Entropy also measures concentration of probability on a set of small measure, but this set need not be contiguous [7, pp. 78], [28], not even after the transformation of Eq. (57). Therefore, we prefer a variance-based measure that is easy to interpret. Unfortunately, a more fundamental discussion of the merits of variance and entropy is beyond the scope of the present paper.

9. Conclusion

Often, groups of decision makers or voters consist of more than two members, so it is only natural to have a concept of concordance that applies to more than two preference orderings. Therefore, we formulated our preference axioms in terms of sets and subsets of orderings. This is a definite advantage over the distance-based models, which cannot but start from considering pairs of orderings. Each of the axioms pertains to a precise, substantive idea about concordance. For example, we think that the concept of concordance should be separated from the decision making, and hence we demanded that adding more orderings never increases concordance. We showed that the classical, most often used measures of concordance are crude measures that do not satisfy the axioms, whereas the new measure $\gamma$ and its weighing variant $\vartheta$ do satisfy the axioms and are sensitive to small perturbations of the preference orderings. We also showed that concordance, so defined, is related to a geometry of preference orderings, the Euclidean LSM, and we explained how such a geometry can be used to define and approximate a consensus. The geometry itself arises in a natural way from the sequence representation of the preference orderings, and concordance appears in this geometry as a generalized inner product.

Finally, we presented two classes of efficient algorithms. First, we presented the grid-algorithms to calculate pairwise concordances and, from these, distances with which to approximate a consensus. Then we presented the flexible trail-algorithms, generalized kernels in the LSM, to calculate concordance. Surprisingly, from the output of the trail-algorithm, we were able to efficiently recover all lcs's in polynomial time. We demonstrated that the theoretical claim, a time complexity of $O(Nn^2)$ for $N$ judges and $n$ items, can be justified. So, the efficiency is much better than that of Chen and Cheng's program MCS, which has exponential complexity in $n$ [9, pp. 249]. Our approach does not, like MCS does, generate longest subsequences with a user-defined support, but it does generate all longest common subsequences that have a support of 100%. If more homogeneous subgroups of judges are to be found, then the pairwise LSM-distances can be used with one of the standard partitioning algorithms, e.g. principal component or K-means clustering [48], to construct such groups.

A few questions are still open. First, a more precise answer should be formulated to the question of which weighing functions can be applied to subsequence lengths and gaps without producing violations of the concordance axioms. We tried to answer this question in Section 4.3.3, but this answer certainly needs refinement. Second, we know nothing yet about the sampling distributions of the various concordance measures discussed. At least, we should have some statistical theory under the hypothesis that the orderings were generated by a random process. However, developing such theory is beyond the scope of the present paper.
Third, it would be interesting to see how we could expand the principles of the trail-algorithm to sequences in other domains, e.g. bio-informatics. This will probably amount to investigating how the condition that the sequences have no repeating characters can be relaxed.

Acknowledgements

We express our gratitude to the anonymous reviewers for their suggestions that helped to substantially improve our paper. A major part of this research was conducted during the first author's visit to the Computer Science Research Institute of the University of Ulster, Northern Ireland, in June 2010, funded by the University of Ulster.

References

[1] D.C. Adams, C.M. Berns, K.H. Kozak, J. Wiens, Are rates of species diversification correlated with rates of morphological evolution?, Proceedings of the Royal Society B (2009) 1–10.
[2] R. Agrawal, R. Srikant, Mining sequential patterns, in: Eleventh International Conference on Data Engineering, IEEE Computer Society Press, 1995.
[3] M.M.S. Beg, N. Ahmad, Soft computing techniques for rank aggregation on the World Wide Web, World Wide Web-Internet and Web Information Systems 6 (1) (2003) 5–22.
[4] L. Bergroth, H. Hakonen, T. Raita, A survey of longest common subsequence algorithms, in: Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE'00), 2000, pp. 39–48.
[5] J.M. Blin, A linear assignment formulation of the multiattribute decision problem, Revue Française d'Automatique, d'Informatique et de Recherche Operationelle 10 (6) (1976) 21–23.
[6] J.-C. de Borda, Memoires sur les élections au scrutin, Histoire de l'Académie Royale des Sciences, Paris, 1781.
[7] J.-B. Brissaud, The meaning of entropy, Entropy 7 (1) (2005) 68–96.
[8] S.W. Bunting, Horizontally integrated aquaculture development: exploring consensus on constraints and opportunities with a stakeholder Delphi, Aquaculture International 16 (2008) 153–169.
[9] Y.-L. Chen, L.-C. Cheng, Mining maximum consensus sequences from group ranking data, European Journal of Operational Research 198 (2009) 241–251.
[10] S. Chen, B. Ma, K. Zhang, On the similarity metric and the distance metric, Theoretical Computer Science 410 (24–25) (2009) 2365–2376.
[11] J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement 20 (1960) 37–46.
[12] W. Cohen, Learning to order things, The Journal of Artificial Intelligence Research 10 (1999) 43.


[13] M. Condorcet, Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, Paris, 1785.
[14] W.D. Cook, Distance-based and ad hoc consensus models in ordinal preference ranking, European Journal of Operational Research 172 (2006) 369–385.
[15] W.D. Cook, B. Golany, M. Kress, M. Penn, T. Raviv, Creating a consensus ranking of proposals from reviewers' partial ordinal rankings, Computers and Operational Research 34 (4) (2007) 954–965.
[16] W.D. Cook, L. Seiford, Priority ranking and consensus formation, Management Science 24 (16) (1978) 1721–1732.
[17] W.D. Cook, L. Seiford, S. Warner, Preference ranking models: conditions for equivalence, Journal of Mathematical Sociology 9 (1983) 125–137.
[18] H.H. Edwards, P. Mikusiński, M.D. Taylor, Measures of concordance determined by D4-invariant copulas, International Journal of Mathematics and Mathematical Sciences 70 (2004) 3867–3875.
[19] C.H. Elzinga, Combinatorial representations of token sequences, Journal of Classification 22 (1) (2005) 87–118.
[20] C. Elzinga, S. Rahmann, H. Wang, Algorithms for subsequence combinatorics, Theoretical Computer Science 409 (3) (2008) 394–404.
[21] R. Fagin, R. Kumar, D. Sivakumar, Efficient similarity search and classification via rank aggregation, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM, San Diego (Cal.), 2003.
[22] P.C. Fishburn, A comparative analysis of group decision methods, Behavioral Science 16 (1971) 538–544.
[23] M. Gill, K. Martens, E.L. Lynch, A. Salik, S.M. Green, Interrater reliability of 3 simplified neurologic scales applied to adults presenting to the emergency department with altered levels of consciousness, Annals of Emergency Medicine 49 (4) (2007) 403–407.
[24] J. González-Pachón, C. Romero, Aggregation of partial ordinal rankings: an interval goal programming approach, Computers and Operations Research 28 (2001) 827–834.
[25] R.I. Greenberg, Fast and simple computation of all longest common subsequences, 2002.
[26] H. Gunawan, Inner products on n-inner product spaces, Soochow Journal of Mathematics 28 (4) (2002) 389–398.
[27] D.S. Hirschberg, Algorithms for the longest common subsequence problem, Journal of the ACM 24 (4) (1977) 664–675.
[28] I.I. Hirschman Jr., A note on entropy, American Journal of Mathematics 79 (1) (1957) 152–156.
[29] D.S. Hochbaum, A. Levin, Methodologies and algorithms for group-rankings decision, Management Science 52 (9) (2006) 1394–1408.
[30] A. Iványi, On the d-complexity of words, Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös nominatae, Sectio Computatorica (8) (1987) 69–90.
[31] N.L. Johnson, S. Kotz, N. Balakrishnan, Continuous Univariate Distributions, second ed., Wiley, 1994.
[32] J.G. Kemeny, L.J. Snell, Preference ranking: an axiomatic approach, in: Mathematical Models in the Social Sciences, Ginn, 1962.
[33] M.G. Kendall, A new measure of rank correlation, Biometrika 30 (1938) 81–89.
[34] M.G. Kendall, Rank Correlation Methods, Griffin, 1948.
[35] M.G. Kendall, B. Babington Smith, The problem of m rankings, Annals of Mathematical Statistics 10 (1939) 275–287.
[36] A. Koch, A. Strobel, G. Kici, K. Westhoff, Quality of the Critical Incident Technique in practice: interrater reliability and users' acceptance under real conditions, Psychology Science Quarterly 51 (1) (2009) 3–15.
[37] P. Legendre, Species associations: the Kendall coefficient of concordance, Journal of Agricultural, Biological and Environmental Statistics 10 (2) (2005) 226–245.
[38] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Physics Doklady 10 (8) (1966) 707–710.
[39] R. Likert, A technique for the measurement of attitudes, Archives of Psychology 140 (1932) 1–55.
[40] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, C. Watkins, Text classification using string kernels, Journal of Machine Learning Research 2 (2002) 419–444.
[41] D. Maier, The complexity of some problems on subsequences and supersequences, Journal of the ACM 25 (2) (1978) 322–336.
[42] A. Misiak, n-inner product spaces, Mathematische Nachrichten 140 (1989) 299–319.
[43] J. Moon, Topics in Tournaments, Holt, Reinhart and Winston, 1968.
[44] A.K. Nair, B.E. Gavitt, M. Damman, W. Dekker, R.C. Green, A. Mandel, S. Auerbeck, E. Steinberg, E.J. Hubbard, A. Jefferson, R. Stern, Clock drawing test ratings by dementia specialists: interrater reliability and diagnostic accuracy, Journal of Neuropsychiatry and Clinical Neurosciences 22 (2010) 85–92.
[45] R.B. Nelsen, An Introduction to Copulas, Lecture Notes in Statistics, vol. 139, Springer Verlag, 1999.
[46] W.F.J. Parsons, J.G. Blockheim, R.L. Lindroth, Independent, interactive, and species-specific responses of leaf litter decomposition to elevated CO2 and O3 in a Northern hardwood forest, Ecosystems 11 (2008) 505–519.
[47] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M. Hsu, PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, in: ICDE '01: Proceedings of the 17th International Conference on Data Engineering, IEEE Computer Society Press, 2001.
[48] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[49] S. Roy, G.C. Smith, J.C. Russel, The eradication of invasive mammal species: can adaptive resource management fill the gaps in our knowledge?, Human-Wildlife Conflicts 3 (1) (2009) 30–40.
[50] M. Scarsini, On measures of concordance, Stochastica 8 (3) (1984) 201–218.
[51] R. Srikant, R. Agrawal, Mining sequential patterns: generalizations and performance improvements, in: Proceedings of the Fifth International Conference on Extending Database Technology (EDBT), Avignon, 1996.
[52] C. Spearman, The proof and measurement of association between two things, American Journal of Psychology 15 (1904) 72–101.
[53] V. Sperschneider, Bioinformatics: Problem Solving Paradigms, Springer, 2008.
[54] W.J. Tastle, M.J. Wierman, Consensus and dissention: a measure of ordinal dispersion, International Journal of Approximate Reasoning 45 (2007) 531–545.
[55] H. Wang, Nearest neighbors by neighborhood counting, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (6) (2006) 942–953.
[56] H. Wang, Z. Lin, A novel algorithm for counting all common subsequences, in: 2007 IEEE Conference on Granular Computing, 2007, pp. 502–505.
[57] G. You, S. Hwang, Search structures and algorithms for personalized rankings, Information Sciences 178 (20) (2008) 3925–3942.