Agricultural Sciences in China
August 2010
2010, 9(8): 1117-1126
A Rank-Sum Testing Method for Multi-Trait Comprehensive Ranking and Its Application
LUO Ru-jiu1,2, HU Zhi-qiu1, KONG Wen-qian1, SONG Wen1 and XU Chen-wu1
1 Jiangsu Provincial Key Laboratory of Crop Genetics and Physiology/Key Laboratory of Plant Functional Genomics, Ministry of Education/Yangzhou University, Yangzhou 225009, P.R.China
2 Lianyungang Technical College, Lianyungang 222006, P.R.China
Abstract
The rank-sum test is a nonparametric method used in variety evaluation. However, a hypothesis test based on this method has not yet been established for multi-trait comprehensive ranking. In this paper, under the null hypothesis H0 that a variety's ranking on each trait is random, the theoretical distribution of the sum of ranks (SR) was first derived and then used to obtain the critical values for multi-trait comprehensive evaluation by rank-sum testing. A new C++ class and its basic arithmetic operations were defined to avoid the miscounts caused by the limited precision of the built-in data types of common statistical software when the numbers of varieties and traits are large. Finally, the application of the theoretical results was demonstrated using five starch viscosity traits of 12 glutinous maize varieties. The proposed method is simple and convenient and can easily be used to rank varieties by multiple traits.
Key words: comprehensive evaluation, sum of ranks, theoretical distribution, critical value
INTRODUCTION
Multi-trait comprehensive evaluation converts the statistical values of different attributes into nonparametric relative values and integrates them, through mathematical and statistical methods, to grade the objects under evaluation. There are many approaches to comprehensive evaluation, such as the analytic hierarchy process (AHP), fuzzy comprehensive evaluation (FCE), principal component analysis (PCA), and the technique for order preference by similarity to ideal solution (TOPSIS). The AHP (Saaty 1980, 1990) is a structured technique for dealing with complex decisions. Rather than prescribing a 'correct' decision, the AHP helps decision makers find the solution that best suits their needs and their understanding of the problem. The method was developed by Saaty (1980) in the 1970s on the basis of mathematics and psychology, and it has been extensively studied and refined since then. The AHP provides a comprehensive and rational framework for structuring a decision problem, representing and quantifying its elements, relating those elements to overall goals, and evaluating alternative solutions. It is used for decision making in a variety of fields such as government, business, industry, healthcare, and education. The FCE is a comprehensive and systematic evaluation method derived from fuzzy mathematics, which turns fuzzy factors into quantitative ones by applying the principle of fuzzy relation composition. PCA (Gao 2007), invented in 1901 by Karl Pearson, is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. This analysis not only removes the overlap of information among traits but also simplifies the structure of the raw data. TOPSIS (Hwang 1981) is a decision-making technique that has been widely applied in multi-objective decision making and evaluation in recent years. Its basic concept is that the selected object should have the shortest distance from the ideal solution and the farthest distance from the negative-ideal solution.
Although the comprehensive evaluation methods mentioned above have been applied in different fields, they only provide mathematical means of producing synthesized values and the corresponding rankings; they cannot infer whether the differences between objects are significant. Therefore, in this paper, we propose a statistical hypothesis testing method, the nonparametric rank-sum test, for evaluating objects on multiple traits. Under the null hypothesis H0 that each variety's ranking on each trait is random, the theoretical distribution of the rank-sum is derived and used to obtain the critical values of the rank-sum test. Finally, an application is demonstrated using five starch viscosity traits of 12 glutinous maize varieties.

This paper is translated from its Chinese version in Scientia Agricultura Sinica.
Correspondence XU Chen-wu, Professor, Tel: +86-514-87979358, Fax: +86-514-87996817, E-mail: [email protected]
© 2010, CAAS. All rights reserved. Published by Elsevier Ltd. doi:10.1016/S1671-2927(09)60198-X
THEORY AND METHOD
Rank-sum distribution
Define V = {V1, V2, ..., Vv} as the set of evaluation objects and L = {L1, L2, ..., Ll} as the set of traits. Suppose that the v objects can be ranked on each trait. Assign the consecutive ranks 0, 1, ..., v - 1 to the objects in ascending or descending order, and let Rij be the rank of the ith object (i = 1, 2, ..., v) on the jth trait (j = 1, 2, ..., l). The sum of ranks of the ith object over the l traits is then

$$SR_i = \sum_{j=1}^{l} R_{ij} \tag{1}$$

Under the null hypothesis H0 that each object is ranked randomly on each trait, the distribution of SRi can be derived. This is the error distribution of the rank-sum of v objects over l traits, referred to below simply as the rank-sum distribution. There are T = v^l possible rank combinations in total. Table 1 lists these rank permutations and their sums for v evaluation objects and l traits; rearranging them gives the frequency distribution of the rank-sum shown in Table 2.

Table 1 Rank permutations and their sums under v evaluation objects and l traits
Serial number   Trait 1   Trait 2   ...   Trait l   SR
1               0         0         ...   0         0
2               0         0         ...   1         1
...             ...       ...       ...   ...       ...
v^l             v-1       v-1       ...   v-1       l(v-1)

Table 2 Frequency distribution of rank-sum under v evaluation objects and l traits
SR   0   1   2                                     ...   l(v-1)
f    1   l   C_{l+1}^2 (v ≥ 3) or C_l^2 (v = 2)    ...   1
Obviously, when all traits are given the minimum rank zero, SR reaches its minimum SRmin = 0, which occurs once. When all traits are given the maximum rank v - 1, SR reaches its maximum SRmax = l(v - 1), which also occurs once. Therefore, SR takes the values 0, 1, ..., l(v - 1), giving k = l(v - 1) + 1 possible values in total. Writing C_n^k for the number of combinations of n items taken k at a time, SR = 1 occurs only when one of the l traits is ranked one and all the others are ranked zero, so the frequency of SR = 1 is C_l^1 = l. SR = 2 can arise in two ways. In the first, two of the l traits are given rank one and the others rank zero; the frequency of this case is C_l^2 = l(l - 1)/2. In the second, one of the l traits is given rank two and the others rank zero; the frequency of this case is C_l^1 = l. Therefore, the total frequency is C_l^1 + C_l^2 = l(l + 1)/2 = C_{l+1}^2. Note that the second case requires rank two, which exists only when v ≥ 3; when v = 2, SR = 2 occurs only when two traits are ranked one and the rest are ranked zero, so its frequency is C_l^2. Generalizing this reasoning, the frequency of SR = SRC under v evaluation objects and l traits can be expressed as:

$$f(SR = SR_C) = \sum_{i=0}^{m} (-1)^{i}\, C_{l}^{i}\, C_{SR_C - iv + l - 1}^{\,l-1} \tag{2}$$
Here m = int(SRC/v), the integer part of the quotient SRC/v.
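As a quick numerical check of formula (2), the minimal C++ sketch below evaluates it with 64-bit integers and reproduces the frequency row of Table 3. It is an illustration under stated assumptions, not the authors' program: the function names `binom` and `rankSumFreq` are hypothetical, and plain 64-bit arithmetic is adequate only while the counts stay well below about 1.8 × 10^19.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k); returns 0 when n < 0, k < 0 or k > n.
// Each intermediate r equals C(n - k + i, i), so the division is always exact.
std::uint64_t binom(long long n, long long k) {
    if (n < 0 || k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;
    std::uint64_t r = 1;
    for (long long i = 1; i <= k; ++i)
        r = r * static_cast<std::uint64_t>(n - k + i) / static_cast<std::uint64_t>(i);
    return r;
}

// Frequency of SR = s for v objects and l traits, following formula (2):
// f = sum_{i=0}^{m} (-1)^i C(l, i) C(s - i*v + l - 1, l - 1), with m = int(s / v).
std::uint64_t rankSumFreq(int v, int l, int s) {
    assert(v >= 2 && l >= 1);
    if (s < 0 || s > l * (v - 1)) return 0;
    const int m = s / v;                 // integer part of the quotient s / v
    long long total = 0;                 // signed, because the terms alternate in sign
    for (int i = 0; i <= m; ++i) {
        const long long term = static_cast<long long>(
            binom(l, i) * binom(s - static_cast<long long>(i) * v + l - 1, l - 1));
        total += (i % 2 == 0) ? term : -term;
    }
    return static_cast<std::uint64_t>(total);
}
```

For v = 4 and l = 5, calling `rankSumFreq(4, 5, s)` for s = 0, ..., 15 yields 1 5 15 35 65 101 135 155 155 135 101 65 35 15 5 1, which sums to 4^5 = 1024. The small cases `rankSumFreq(3, 4, 2) = C(5, 2) = 10` and `rankSumFreq(2, 4, 2) = C(4, 2) = 6` illustrate the two SR = 2 cases discussed above.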
Features of the rank-sum distribution
(1) Since

$$f(SR = SR_C) = f[SR = l(v-1) - SR_C] \tag{3}$$

the rank-sum distribution is symmetric, and its center (arithmetic mean) is μ_SR = l(v - 1)/2. The mean, median and mode coincide when k is odd; when k is even, the mean equals the median.
(2) Under the null hypothesis, the rank Rj (j = 1, 2, ..., l) of an object on the jth trait follows a discrete uniform distribution on 0, 1, ..., v - 1, whose variance is (v^2 - 1)/12. Since the ranks on different traits are independent under H0, the variance of the rank-sum distribution is

$$\sigma_{SR}^2 = \operatorname{Var}(R_1 + R_2 + \cdots + R_l) = \frac{l(v^2 - 1)}{12} \tag{4}$$

(3) Since the rank-sum distribution is symmetric, its coefficient of skewness g1 is zero.
(4) The coefficient of kurtosis of the distribution is

$$g_2 = -\frac{6(v^2 + 1)}{5l(v^2 - 1)} \tag{5}$$

Fig. 1 shows the coefficient of kurtosis of the rank-sum distribution for different v and l. As Fig. 1 shows, the coefficients of kurtosis of the rank-sum distribution are all less than zero, i.e., the distribution is flatter than the normal distribution. When v and l are relatively small, especially when l is small, the rank-sum distribution differs noticeably from the normal distribution. However, as v and l increase, the rank-sum distribution approaches the normal distribution. For example, when v = l = 10, the rank-sum distribution differs negligibly from the normal distribution N(μ_SR, σ²_SR) (Fig. 2).

Fig. 2 Comparison between the normal distribution and the rank-sum distribution for v = l = 10.
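The moment properties above can be checked numerically. The sketch below, which is illustrative only, builds the distribution of SR by convolving l discrete uniform distributions on {0, ..., v - 1} (the H0 model) in double precision, computes its mean, variance and excess kurtosis directly, and compares them with the closed forms μ_SR = l(v - 1)/2, the variance in formula (4), and -6(v^2 + 1)/[5l(v^2 - 1)], the standard excess kurtosis of a sum of l independent discrete uniforms, assumed here to be what the kurtosis coefficient expresses. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Probability distribution of SR under H0: convolve l discrete uniform
// distributions on {0, 1, ..., v-1}, one per trait.
std::vector<double> rankSumProbabilities(int v, int l) {
    assert(v >= 2 && l >= 1);
    std::vector<double> p(1, 1.0);                 // zero traits: SR = 0 with probability 1
    for (int j = 0; j < l; ++j) {
        std::vector<double> next(p.size() + v - 1, 0.0);
        for (std::size_t s = 0; s < p.size(); ++s)
            for (int r = 0; r < v; ++r) next[s + r] += p[s] / v;
        p.swap(next);
    }
    return p;
}

struct Moments {
    double mean;
    double variance;
    double kurtosis;   // excess kurtosis g2 (0 for a normal distribution)
};

// Moments computed directly from the probability vector.
Moments momentsOf(const std::vector<double>& p) {
    double mean = 0.0;
    for (std::size_t s = 0; s < p.size(); ++s) mean += static_cast<double>(s) * p[s];
    double m2 = 0.0, m4 = 0.0;
    for (std::size_t s = 0; s < p.size(); ++s) {
        const double d = static_cast<double>(s) - mean;
        m2 += d * d * p[s];
        m4 += std::pow(d, 4) * p[s];
    }
    return {mean, m2, m4 / (m2 * m2) - 3.0};
}

// Closed forms: l(v-1)/2, l(v^2-1)/12 and -6(v^2+1)/(5l(v^2-1)).
Moments closedForm(int v, int l) {
    const double v2 = static_cast<double>(v) * v;
    return {l * (v - 1) / 2.0, l * (v2 - 1.0) / 12.0,
            -6.0 * (v2 + 1.0) / (5.0 * l * (v2 - 1.0))};
}
```

For v = 4 and l = 5 both routes give a mean of 7.5, a variance of 6.25 and an excess kurtosis of -0.272; for v = l = 10 the kurtosis is about -0.122, which is consistent with the near-normal shape described for Fig. 2.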
Fig. 1 Kurtosis (g2) of rank-sum distribution under different v and l.

Determination of critical values for the rank-sum significance test
The following example shows how to determine the critical values of the rank-sum significance test when v = 4 and l = 5.
(1) Calculate the frequency f (column 2 in Table 3) for each of the k = l(v - 1) + 1 = 16 possible values of SR, from 0 to l(v - 1) = 15, using formula (2).

Table 3 Rank-sum distribution of 4 evaluation objects and 5 traits
SR    f     F      P(SR ≤ SRC)
0     1     1      0.0010
1     5     6      0.0059
2     15    21     0.0205
3     35    56     0.0547
4     65    121    0.1182
5     101   222    0.2168
6     135   357    0.3486
7     155   512    0.5000
8     155   667    0.6514
9     135   802    0.7832
10    101   903    0.8818
11    65    968    0.9453
12    35    1003   0.9795
13    15    1018   0.9941
14    5     1023   0.9990
15    1     1024   1.0000
(2) The cumulative frequency F (column 3 in Table 3) can be calculated directly from the following formula, or by accumulating f:

$$F(SR \le SR_C) = \sum_{s=0}^{SR_C} f(SR = s) = \sum_{i=0}^{m} (-1)^{i}\, C_{l}^{i}\, C_{SR_C - iv + l}^{\,l} \tag{6}$$

(3) The cumulative probability (column 4 in Table 3) is obtained from the cumulative frequency:

$$P(SR \le SR_C) = F(SR \le SR_C)/T \tag{7}$$

(4) Determine the critical value SRα of the rank-sum test from the significance level and the cumulative probability. In this example, the left-tailed critical value is SR_l,0.05 = 2, since P(SR ≤ 2) = 0.0205 ≤ 0.05 < P(SR ≤ 3) = 0.0547, and by symmetry the right-tailed critical value is SR_r,0.05 = SRmax - SR_l,0.05 = 15 - 2 = 13. That is, at the 0.05 significance level, the left-tailed critical region is SR ≤ 2 and the right-tailed critical region is SR ≥ 13.
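Steps (1) to (4) can be combined in a short sketch. It builds the frequencies by repeated convolution (which yields the same counts as formula (2)) in 64-bit integers, so it is only valid while T = v^l stays below 2^64; the arbitrary-length class described in the next subsection is what removes that limit. The critical value is taken as the largest SRC with P(SR ≤ SRC) ≤ α, which matches the worked example. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exact frequencies f(SR = s), s = 0..l(v-1), by repeated convolution with
// 64-bit integers; valid only while T = v^l < 2^64.
std::vector<std::uint64_t> rankSumFrequencies(int v, int l) {
    assert(v >= 2 && l >= 1);
    std::vector<std::uint64_t> f(1, 1);
    for (int j = 0; j < l; ++j) {
        std::vector<std::uint64_t> next(f.size() + v - 1, 0);
        for (std::size_t s = 0; s < f.size(); ++s)
            for (int r = 0; r < v; ++r) next[s + r] += f[s];
        f.swap(next);
    }
    return f;
}

// Left-tailed critical value: the largest SR_C with P(SR <= SR_C) <= alpha,
// or -1 when even P(SR = 0) exceeds alpha (no critical value exists).
int leftCriticalValue(int v, int l, double alpha) {
    const std::vector<std::uint64_t> f = rankSumFrequencies(v, l);
    std::uint64_t total = 0;
    for (std::uint64_t x : f) total += x;          // T = v^l
    std::uint64_t cumulative = 0;                  // F(SR <= s), formula (6)
    int critical = -1;
    for (std::size_t s = 0; s < f.size(); ++s) {
        cumulative += f[s];
        const double p = static_cast<double>(cumulative) / static_cast<double>(total);  // formula (7)
        if (p > alpha) break;
        critical = static_cast<int>(s);
    }
    return critical;
}

// Right-tailed critical value by symmetry: l(v-1) minus the left-tailed value.
int rightCriticalValue(int v, int l, double alpha) {
    const int left = leftCriticalValue(v, l, alpha);
    return left < 0 ? -1 : l * (v - 1) - left;
}
```

With these definitions, `leftCriticalValue(4, 5, 0.05)` gives 2 and `rightCriticalValue(4, 5, 0.05)` gives 13, as in the example; for v = 12 and l = 5 the values are 14 and 41 (P(SR ≤ 14) = 11523/248832 ≈ 0.0463, P(SR ≤ 15) ≈ 0.0612), the pair used in the real-data analysis.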
Bitwise arithmetic
When both v and l are small, common statistical software can be used to obtain the critical values from formulae (2) and (6). However, the built-in floating-point data types (double and float) of common statistical software retain at most about 16 significant decimal digits (SAS OnlineDoc, http://support.sas.com/onlinedoc/913/; MATLAB Central, http://www.math.uic.edu/~hanson/MATLAB/MATLABformat.html). When T = v^l > 10^16, miscounts caused by this precision limitation appear, and they grow as v and l increase. Taking v = l = 20 as an example, the precision limitation leads to errors exceeding 10^11. To deal with this problem, a new C++ class and its basic arithmetic operations were defined, supporting integers of arbitrary length. Since commonly used computers operate on integers below 2^32, and intermediate products must be stored temporarily during multiplication, a base-10^4 number system is used: it makes full use of the computing ability of 32-bit computers while allowing base-10^4 numbers to be converted conveniently to the decimal system. All integers are first split into base-10^4 digits, which are then stored and operated on digit by digit ("bitwise") in dynamic arrays. Digit-wise operation is the core of arbitrary-length integer arithmetic; its main steps for each basic operation are described below.
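Both numerical facts behind this design can be checked directly. The sketch below assumes IEEE 754 doubles (a 53-bit significand, true on common platforms), so integers are exactly representable only up to 2^53 ≈ 9.0 × 10^15, which is where the limit of about 16 significant digits comes from. It shows that 10^16 + 1 does not survive a round trip through a double, and that the product of two base-10^4 digits, plus a carry, fits comfortably in 32 bits. The helper `exactInDouble` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");
static_assert(std::numeric_limits<double>::digits == 53, "53-bit significand assumed");

// True if n survives a round trip through double unchanged.
bool exactInDouble(std::uint64_t n) {
    assert(n < (std::uint64_t{1} << 63));   // keeps the cast back to uint64 well defined
    return static_cast<std::uint64_t>(static_cast<double>(n)) == n;
}

// Base-10^4 digits: the largest product of two digits is 9999 * 9999 = 99,980,001,
// so a product plus a carry stays far below the 32-bit limit of about 4.29e9.
constexpr std::uint32_t kBase = 10000;
constexpr std::uint32_t kMaxDigitProduct = (kBase - 1) * (kBase - 1);
static_assert(kMaxDigitProduct + kBase < std::numeric_limits<std::uint32_t>::max(),
              "digit product plus carry must fit in 32 bits");
```

Here `exactInDouble(10000000000000000)` holds while `exactInDouble(10000000000000001)` does not, because neighbouring doubles near 10^16 are 2 apart.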
Bitwise addition and subtraction of integers of any length (A = P ± Q)
(1) Convert the integers P and Q into base 10^4 and store them in the one-dimensional arrays P[z] and Q[w].
(2) Define an array A[max(z, w) + 1] (one extra digit for a final carry) and compute A[i] = P[i] ± Q[i], starting from the lowest digit.
(3) Digit-wise adjustment: if A[i] ≥ 10^4, then A[i + 1] = A[i + 1] + int(A[i]/10^4) and A[i] = A[i] % 10^4; if A[i] < 0, then A[i + 1] = A[i + 1] - 1 and A[i] = A[i] + 10^4.
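Steps (1) to (3) can be sketched in C++ as follows. This is an illustrative stand-in, not the authors' class: digits are kept least significant first in a `std::vector`, P ≥ Q is assumed for subtraction, and the names `Limbs`, `fromUint64`, `toString` and `addSub` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Limbs = std::vector<std::int32_t>;   // base-10^4 digits, least significant first
constexpr std::int32_t kBase = 10000;

Limbs fromUint64(std::uint64_t n) {
    Limbs a;
    do {
        a.push_back(static_cast<std::int32_t>(n % kBase));
        n /= kBase;
    } while (n > 0);
    return a;
}

std::string toString(const Limbs& a) {
    std::string s = std::to_string(a.back());              // most significant digit, unpadded
    for (std::size_t i = a.size() - 1; i-- > 0;) {
        const std::string d = std::to_string(a[i]);
        s += std::string(4 - d.size(), '0') + d;            // lower digits padded to 4 decimals
    }
    return s;
}

// A = P + Q (sign = +1) or A = P - Q (sign = -1, requires P >= Q).
Limbs addSub(const Limbs& p, const Limbs& q, int sign) {
    const std::size_t n = std::max(p.size(), q.size());
    Limbs a(n + 1, 0);                                      // spare digit for the final carry
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t pi = i < p.size() ? p[i] : 0;
        const std::int32_t qi = i < q.size() ? q[i] : 0;
        a[i] += pi + sign * qi;                             // step (2), on top of any pending carry/borrow
        if (a[i] >= kBase) {                                // step (3): carry
            a[i + 1] += a[i] / kBase;
            a[i] %= kBase;
        } else if (a[i] < 0) {                              // step (3): borrow
            a[i + 1] -= 1;
            a[i] += kBase;
        }
    }
    assert(a[n] >= 0 && "subtraction requires P >= Q");
    while (a.size() > 1 && a.back() == 0) a.pop_back();     // drop leading zero digits
    return a;
}
```

For example, adding 1 to 99999999 carries through both digits and yields 100000000, and subtracting 1 from 100000000 borrows back to 99999999.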
Bitwise multiplication of integers of any length (A = P × Q)
(1) Convert P and Q into base 10^4 and store them in the one-dimensional arrays P[z] and Q[w].
(2) Define an array A[z + w] and accumulate A[i] = A[i] + P[k] × Q[j] over all pairs (k, j) with i = k + j.
(3) Digit-wise adjustment, as for addition.
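A matching multiplication sketch, under the same hypothetical representation (unsigned 32-bit digits here). In this version the digit-wise adjustment is applied while each row of partial products is accumulated, rather than once at the end, which keeps every intermediate value below 10^8 and hence inside 32 bits. The hypothetical `power` helper shows that T = 20^20, which exceeds the 64-bit integer range, is computed exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Limbs = std::vector<std::uint32_t>;   // base-10^4 digits, least significant first
constexpr std::uint32_t kBase = 10000;

Limbs fromUint64(std::uint64_t n) {
    Limbs a;
    do {
        a.push_back(static_cast<std::uint32_t>(n % kBase));
        n /= kBase;
    } while (n > 0);
    return a;
}

std::string toString(const Limbs& a) {
    std::string s = std::to_string(a.back());
    for (std::size_t i = a.size() - 1; i-- > 0;) {
        const std::string d = std::to_string(a[i]);
        s += std::string(4 - d.size(), '0') + d;
    }
    return s;
}

// A = P x Q. Each row k adds P[k] x Q[j] into A[k + j] and normalizes on the fly:
// a[k + j] < 10^4, the product < 10^8 and carry < 10^4, so cur never exceeds 99,999,999.
Limbs multiply(const Limbs& p, const Limbs& q) {
    Limbs a(p.size() + q.size(), 0);                      // A[z + w]
    for (std::size_t k = 0; k < p.size(); ++k) {
        std::uint32_t carry = 0;
        for (std::size_t j = 0; j < q.size(); ++j) {
            const std::uint32_t cur = a[k + j] + p[k] * q[j] + carry;
            a[k + j] = cur % kBase;
            carry = cur / kBase;
        }
        a[k + q.size()] = carry;                            // this digit is still zero before row k
    }
    while (a.size() > 1 && a.back() == 0) a.pop_back();
    return a;
}

// base^exponent by repeated multiplication, e.g. T = v^l.
Limbs power(std::uint32_t base, int exponent) {
    assert(exponent >= 0);
    Limbs result = fromUint64(1);
    const Limbs b = fromUint64(base);
    for (int e = 0; e < exponent; ++e) result = multiply(result, b);
    return result;
}
```

For instance, `toString(power(20, 20))` gives 1048576 followed by 20 zeros, a 27-digit number.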
Bitwise division of integers of any length (A = P/Q, F = P % Q)
(1) Convert P and Q into base 10^4 and store them in the one-dimensional arrays P[z] and Q[w].
(2) Define an array A[max(z, w) - min(z, w) + 1] and initialize all its elements to zero.
(3) If P < Q, then F = P; go to (11).
(4) Assign the first w - 1 digits of P (from the most significant end) to F as the initial value of the remainder, and let i = z - w.
(5) Set the partial quotient T = 0 and bring down the next digit of P: F = F × 10^4 + P[i].
(6) If F < Q, go to (9).
(7) Estimate a trial quotient digit C from the leading digits of F and Q, using the two leading digits of F if F has one more digit than Q, and its leading digit otherwise.
(8) Let F = F - C × Q and T = T + C.
(9) If F ≥ Q, go to (7); otherwise A = A × 10^4 + T.
(10) If the units digit has not yet been reached, let i = i - 1 and go to (5).
(11) Output the quotient (A) and the remainder (F).
Here 'int' denotes taking the integer part (truncation) and '%' denotes the remainder (modulo) operation. Since integer division is not a closed operation, the value returned by division is the integer part of the quotient. To allow mixed operations with ordinary integer variables, the basic operators (=, +, -, ×, ÷ and %) are overloaded. Following the procedure for calculating critical values and the arbitrary-length integer algorithms, Borland C++BuilderX and Borland Delphi 7 were used to write a program that computes the critical values of the rank-sum test. For given v and l, the program outputs the critical values, mean, variance, kurtosis and percentiles of the rank-sum distribution. For practical use, Tables 4 and 5 list the critical values of the rank-sum test for 2-30 varieties and 2-30 traits.
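The long division in steps (1) to (11) can be sketched as below. This is a hedged illustration, not the authors' class: instead of estimating the trial quotient digit C from the leading digits as in step (7), it finds the largest C in [0, 9999] with Q × C ≤ F by binary search, which is slower but easy to verify. All names (`divMod`, `mulDigit`, `subInPlace`, `compare`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Limbs = std::vector<std::uint32_t>;   // base-10^4 digits, least significant first
constexpr std::uint32_t kBase = 10000;

void trim(Limbs& a) {
    while (a.size() > 1 && a.back() == 0) a.pop_back();
}

Limbs fromUint64(std::uint64_t n) {
    Limbs a;
    do {
        a.push_back(static_cast<std::uint32_t>(n % kBase));
        n /= kBase;
    } while (n > 0);
    return a;
}

std::string toString(const Limbs& a) {
    std::string s = std::to_string(a.back());
    for (std::size_t i = a.size() - 1; i-- > 0;) {
        const std::string d = std::to_string(a[i]);
        s += std::string(4 - d.size(), '0') + d;
    }
    return s;
}

// -1, 0 or 1 as a <, == or > b; both arguments must be trimmed.
int compare(const Limbs& a, const Limbs& b) {
    if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    for (std::size_t i = a.size(); i-- > 0;)
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    return 0;
}

// Q x C for a single base-10^4 digit C (each product plus carry stays below 2^32).
Limbs mulDigit(const Limbs& q, std::uint32_t c) {
    Limbs r(q.size() + 1, 0);
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        const std::uint32_t cur = q[i] * c + carry;
        r[i] = cur % kBase;
        carry = cur / kBase;
    }
    r[q.size()] = carry;
    trim(r);
    return r;
}

// a = a - b; requires a >= b.
void subInPlace(Limbs& a, const Limbs& b) {
    std::int64_t borrow = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::int64_t bi = i < b.size() ? b[i] : 0;
        const std::int64_t cur = static_cast<std::int64_t>(a[i]) - bi - borrow;
        borrow = cur < 0 ? 1 : 0;
        a[i] = static_cast<std::uint32_t>(cur + borrow * kBase);
    }
    trim(a);
}

// Long division: returns {A, F} with P = A x Q + F and 0 <= F < Q.
std::pair<Limbs, Limbs> divMod(const Limbs& p, const Limbs& q) {
    assert(!(q.size() == 1 && q[0] == 0) && "division by zero");
    Limbs a(p.size(), 0);
    Limbs f{0};
    for (std::size_t i = p.size(); i-- > 0;) {
        f.insert(f.begin(), p[i]);           // bring down the next digit: F = F x 10^4 + P[i]
        trim(f);
        std::uint32_t lo = 0, hi = kBase - 1;
        while (lo < hi) {                    // largest C with Q x C <= F
            const std::uint32_t mid = (lo + hi + 1) / 2;
            if (compare(mulDigit(q, mid), f) <= 0) lo = mid; else hi = mid - 1;
        }
        subInPlace(f, mulDigit(q, lo));
        a[i] = lo;
    }
    trim(a);
    return {a, f};
}
```

Because the remainder F is always smaller than Q before a digit is brought down, each quotient digit lies in [0, 9999], so the binary search never needs more than 14 probes per digit.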
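The rank-sum calculation used in the real-data analysis that follows can be reproduced with a short sketch. It assumes no ties within a trait (true for the Table 6 data), ranks 0 to v - 1 as in formula (1), trough viscosity ranked in ascending order and the other four traits in descending order, and the critical values 14 and 41 for v = 12 and l = 5 from Table 4. The function and constant names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Ranks 0..v-1 for one trait; with descending = true the largest value gets rank 0.
// Assumes no tied values.
std::vector<int> rankTrait(const std::vector<double>& x, bool descending) {
    std::vector<int> order(x.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return descending ? x[a] > x[b] : x[a] < x[b];
    });
    std::vector<int> rank(x.size());
    for (std::size_t r = 0; r < order.size(); ++r) rank[order[r]] = static_cast<int>(r);
    return rank;
}

// Formula (1): SR_i is the sum of the ranks of object i over all traits.
std::vector<int> rankSums(const std::vector<std::vector<double>>& traits,
                          const std::vector<bool>& descending) {
    assert(!traits.empty() && traits.size() == descending.size());
    std::vector<int> sr(traits.front().size(), 0);
    for (std::size_t j = 0; j < traits.size(); ++j) {
        const std::vector<int> r = rankTrait(traits[j], descending[j]);
        for (std::size_t i = 0; i < sr.size(); ++i) sr[i] += r[i];
    }
    return sr;
}

// Indices of objects whose SR falls in either critical region.
std::vector<std::size_t> significant(const std::vector<int>& sr, int left, int right) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < sr.size(); ++i)
        if (sr[i] <= left || sr[i] >= right) out.push_back(i);
    return out;
}

// Table 6 observations: rows PV, TV, FV, BD, SB; columns are the 12 varieties in table order.
const std::vector<std::vector<double>> kTable6 = {
    {1681.7, 1659.7, 2066.7, 2226.0, 2066.3, 2082.3, 1646.3, 1686.0, 1892.3, 1864.3, 1539.7, 1751.7},
    {757.7, 769.7, 1038.0, 958.3, 974.7, 954.0, 720.0, 783.3, 1427.0, 1298.3, 1131.0, 1365.3},
    {892.0, 887.7, 1292.7, 1300.7, 1157.0, 1221.3, 822.7, 884.0, 1522.0, 1471.0, 1253.0, 1517.7},
    {924.0, 890.0, 1028.7, 1267.7, 1091.7, 1128.3, 926.3, 902.7, 465.3, 566.0, 408.7, 386.3},
    {134.3, 118.0, 254.7, 342.3, 182.3, 267.3, 102.7, 100.7, 95.0, 172.7, 122.0, 152.3}};
// Trough viscosity (TV) is ranked in ascending order, the other four in descending order.
const std::vector<bool> kDescending = {true, false, true, true, true};
```

With these data, `rankSums(kTable6, kDescending)` reproduces the SR column of Table 6, and `significant(sr, 14, 41)` selects indices 3, 5 and 10, i.e. Suyunuo 11, Suyunuo 13 and Nannongziyunuo.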
REAL DATA ANALYSIS
Starch viscosity is a comprehensive reflection of the physical and chemical properties of starch and a vital criterion for judging the quality of fresh-eating glutinous maize. It can be measured with a rapid viscosity analyser (RVA), which records five parameters: peak viscosity (PV), trough viscosity (TV), final viscosity (FV), breakdown (BD) and setback (SB). A smaller trough viscosity indicates better quality, whereas for the other four parameters a greater value indicates better quality. Table 6 lists the five RVA parameters, the ranks and the rank-sums of 12 fresh-eating glutinous maize varieties planted in Jiangsu Province, China. The trough viscosity values are ranked 0, 1, 2, ..., 11 in ascending order and the other four traits in descending order, so the smaller the rank-sum, the better the quality of the variety. From Table 4, for 12 varieties and 5 traits the left-tailed critical value of the rank-sum test is SR_l,0.05 = 14 and the right-tailed critical value is SR_r,0.05 = 41. Table 6 therefore shows that Suyunuo 11 (SR = 8) and Suyunuo 13 (SR = 13) are superior to the average level of the 12 varieties, while Nannongziyunuo (SR = 41) is inferior to it.
To compare the proposed method with principal component analysis (PCA) in the comprehensive evaluation of multiple traits, the data were also analyzed by PCA. The results showed that the information in the five traits can be summarized by two principal components: the first accounts for 49.29% of the total variance and the second for 47.64%, so together they explain almost all of the variation in the 5 traits. We therefore ranked the 12 varieties by the first and by the second principal component separately; these rankings are also shown in Table 6. The rank correlation coefficients between the rank-sum and the two principal components are 0.6923 and 0.7168, which are significant at the 5% and 1% levels, respectively. This shows that the rankings produced by the rank-sum method agree with the PCA rankings, while the rank-sum method additionally allows the significance of differences among varieties to be tested.
DISCUSSION
In this paper, under the null hypothesis that each variety's ranking on each trait is random, the theoretical distribution of the sum of ranks (SR) was derived, and a statistical hypothesis testing method was proposed on this basis. The advantage of the method is that it can both rank the evaluation objects and test the significance of the difference between each object and the average level. The theoretical distribution of the difference between rank-sums can be further derived on this basis, so that the significance of the difference between any two objects can be tested; this will be published separately. Although the proposed method is not limited to the analysis of agricultural experimental data, several problems must be addressed in agricultural experiments because of their features, especially the specific nature of variety evaluation. First, crop variety trials are often conducted over several years or at several locations, in addition to involving multiple evaluation indexes; how to use the results of such experiments to conduct the above tests and comparisons remains to be studied. Secondly, if different varieties have the same observed value on a trait (a tie), these values should be given the same rank, which is the mean of the
Table 4 The left-tailed and right-tailed critical values of rank-sum at significance level α = 0.05 under different v and l
Each entry is written left/right (left-tailed critical value / right-tailed critical value). Each row gives one number of traits l; its entries correspond to consecutive numbers of varieties v over the stated range. For smaller v no critical value exists at this level.
l = 2 (v = 5 to 30): 0/8 0/10 0/12 1/13 1/15 1/17 2/18 2/20 2/22 2/24 3/25 3/27 3/29 4/30 4/32 4/34 5/35 5/37 5/39 6/40 6/42 6/44 7/45 7/47 7/49 8/50
l = 3 (v = 3 to 30): 0/6 0/9 1/11 2/13 2/16 3/18 4/20 4/23 5/25 6/27 6/30 7/32 8/34 8/37 9/39 10/41 10/44 11/46 12/48 12/51 13/53 14/55 14/58 15/60 16/62 16/65 17/67 18/69
l = 4 (v = 3 to 30): 0/8 1/11 2/14 3/17 4/20 5/23 6/26 8/28 9/31 10/34 11/37 12/40 13/43 14/46 15/49 16/52 17/55 18/58 19/61 20/64 21/67 22/70 23/73 24/76 25/79 26/82 27/85 28/88
l = 5 (v = 2 to 30): 0/5 1/9 2/13 4/16 5/20 7/23 8/27 9/31 11/34 12/38 14/41 15/45 17/48 18/52 19/56 21/59 22/63 24/66 25/70 27/73 28/77 30/80 31/84 32/88 34/91 35/95 37/98 38/102 40/105
l = 6 (v = 2 to 30): 0/6 2/10 4/14 5/19 7/23 9/27 11/31 13/35 14/40 16/44 18/48 20/52 22/56 24/60 25/65 27/69 29/73 31/77 33/81 35/85 36/90 38/94 40/98 42/102 44/106 46/110 47/115 49/119 51/123
l = 7 (v = 2 to 30): 0/7 2/12 5/16 7/21 9/26 11/31 14/35 16/40 18/45 20/50 22/55 25/59 27/64 29/69 31/74 34/78 36/83 38/88 40/93 43/97 45/102 47/107 49/112 52/116 54/121 56/126 58/131 61/135 63/140
l = 8 (v = 2 to 30): 1/7 3/13 6/18 8/24 11/29 14/34 16/40 19/45 22/50 24/56 27/61 30/66 32/72 35/77 38/82 40/88 43/93 45/99 48/104 51/109 53/115 56/120 59/125 61/131 64/136 67/141 69/147 72/152 75/157
l = 9 (v = 2 to 30): 1/8 4/14 7/20 10/26 13/32 16/38 19/44 22/50 25/56 28/62 31/68 35/73 38/79 41/85 44/91 47/97 50/103 53/109 56/115 59/121 62/127 65/133 68/139 71/145 74/151 78/156 81/162 84/168 87/174
l = 10 (v = 2 to 30): 1/9 5/15 8/22 12/28 15/35 19/41 22/48 26/54 29/61 33/67 36/74 40/80 43/87 46/94 50/100 53/107 57/113 60/120 64/126 67/133 71/139 74/146 78/152 81/159 85/165 88/172 92/178 95/185 99/191
l = 11 (v = 2 to 30): 2/9 6/16 9/24 13/31 17/38 21/45 25/52 29/59 33/66 37/73 41/80 45/87 48/95 52/102 56/109 60/116 64/123 68/130 72/137 76/144 80/151 84/158 88/165 92/172 96/179 99/187 103/194 107/201 111/208
l = 12 (v = 2 to 30): 2/10 6/18 11/25 15/33 19/41 24/48 28/56 32/64 37/71 41/79 45/87 50/94 54/102 58/110 63/117 67/125 71/133 76/140 80/148 84/156 89/163 93/171 98/178 102/186 106/194 111/201 115/209 119/217 124/224
l = 13 (v = 2 to 30): 3/10 7/19 12/27 17/35 21/44 26/52 31/60 36/68 40/77 45/85 50/93 55/101 60/109 64/118 69/126 74/134 79/142 83/151 88/159 93/167 98/175 103/183 107/192 112/200 117/208 122/216 127/224 131/233 136/241
l = 14 (v = 2 to 30): 3/11 8/20 13/29 18/38 23/47 29/55 34/64 39/73 44/82 50/90 55/99 60/108 65/117 70/126 76/134 81/143 86/152 91/161 96/170 102/178 107/187 112/196 117/205 123/213 128/222 133/231 138/240 143/249 149/257
l = 15 (v = 2 to 30): 3/12 9/21 14/31 20/40 26/49 31/59 37/68 43/77 48/87 54/96 59/106 65/115 71/124 76/134 82/143 88/152 93/162 99/171 105/180 110/190 116/199 122/208 127/218 133/227 139/236 144/246 150/255 156/264 161/274
l = 16 (v = 2 to 30): 4/12 10/22 16/32 22/42 28/52 34/62 40/72 46/82 52/92 58/102 64/112 70/122 76/132 83/141 89/151 95/161 101/171 107/181 113/191 119/201 125/211 131/221 137/231 144/240 150/250 156/260 162/270 168/280 174/290
l = 17 (v = 2 to 30): 4/13 10/24 17/34 23/45 30/55 36/66 43/76 49/87 56/97 63/107 69/118 76/128 82/139 89/149 95/160 102/170 108/181 115/191 121/202 128/212 134/223 141/233 148/243 154/254 161/264 167/275 174/285 180/296 187/306
l = 18 (v = 2 to 30): 5/13 11/25 18/36 25/47 32/58 39/69 46/80 53/91 60/102 67/113 74/124 81/135 88/146 95/157 102/168 109/179 116/190 123/201 130/212 137/223 144/234 151/245 158/256 165/267 172/278 179/289 186/300 193/311 200/322
l = 19 (v = 2 to 30): 5/14 12/26 19/38 27/49 34/61 42/72 49/84 56/96 64/107 71/119 79/130 86/142 94/153 101/165 108/177 116/188 123/200 131/211 138/223 146/234 153/246 160/258 168/269 175/281 183/292 190/304 198/315 205/327 212/339
l = 20 (v = 2 to 30): 5/15 13/27 21/39 29/51 36/64 44/76 52/88 60/100 68/112 76/124 84/136 91/149 99/161 107/173 115/185 123/197 131/209 139/221 147/233 154/246 162/258 170/270 178/282 186/294 194/306 202/318 210/330 217/343 225/355
l = 21 (v = 2 to 30): 6/15 14/28 22/41 30/54 39/66 47/79 55/92 64/104 72/117 80/130 88/143 97/155 105/168 113/181 122/193 130/206 138/219 147/231 155/244 163/257 172/269 180/282 188/295 197/307 205/320 213/333 222/345 230/358 238/371
l = 22 (v = 2 to 30): 6/16 15/29 23/43 32/56 41/69 50/82 58/96 67/109 76/122 85/135 93/149 102/162 111/175 120/188 128/202 137/215 146/228 155/241 163/255 172/268 181/281 190/294 199/307 207/321 216/334 225/347 234/360 242/374 251/387
l = 23 (v = 2 to 30): 7/16 16/30 25/44 34/58 43/72 52/86 61/100 71/113 80/127 89/141 98/155 107/169 117/182 126/196 135/210 144/224 154/237 163/251 172/265 181/279 190/293 200/306 209/320 218/334 227/348 237/361 246/375 255/389 264/403
l = 24 (v = 2 to 30): 7/17 16/32 26/46 36/60 45/75 55/89 65/103 74/118 84/132 94/146 103/161 113/175 122/190 132/204 142/218 152/232 161/247 171/261 181/275 190/290 200/304 210/318 219/333 229/347 239/361 248/376 258/390 268/404 277/419
l = 25 (v = 2 to 30): 7/18 17/33 27/48 37/63 47/78 58/92 68/107 78/122 88/137 98/152 108/167 118/182 128/197 138/212 149/226 159/241 169/256 179/271 189/286 199/301 209/316 219/331 230/345 240/360 250/375 260/390 270/405 280/420 290/435
l = 26 (v = 2 to 30): 8/18 18/34 29/49 39/65 50/80 60/96 71/111 81/127 92/142 102/158 113/173 124/188 134/204 145/219 155/235 166/250 176/266 187/281 198/296 208/312 219/327 229/343 240/358 250/374 261/389 272/404 282/420 293/435 303/451
l = 27 (v = 2 to 30): 8/19 19/35 30/51 41/67 52/83 63/99 74/115 85/131 96/147 107/163 118/179 129/195 140/211 151/227 162/243 173/259 184/275 195/291 206/307 217/323 228/339 239/355 250/371 261/387 272/403 283/419 294/435 305/451 316/467
l = 28 (v = 2 to 30): 9/19 20/36 31/53 43/69 54/86 66/102 77/119 89/135 100/152 111/169 123/185 134/202 146/218 157/235 169/251 180/268 192/284 203/301 215/317 226/334 238/350 249/367 261/383 272/400 284/416 295/433 307/449 318/466 330/482
l = 29 (v = 2 to 30): 9/20 21/37 33/54 44/72 56/89 68/106 80/123 92/140 104/157 116/174 128/191 140/208 152/225 164/242 176/259 188/276 200/293 211/311 223/328 235/345 247/362 259/379 271/396 283/413 295/430 307/447 319/464 331/481 343/498
l = 30 (v = 2 to 30): 10/20 22/38 34/56 46/74 59/91 71/109 83/127 96/144 108/162 120/180 133/197 145/215 158/232 170/250 182/268 195/285 207/303 220/320 232/338 244/356 257/373 269/391 282/408 294/426 306/444 319/461 331/479 344/496 356/514
different varieties' ranks that would apply in the absence of ties, so that non-integer ranks may occur. Besides, some researchers maintain that the significance of the differences among observed values should be considered in ranking, and have further proposed ranking based on multiple comparisons (Xu 1998; Jin and Bai 2001; Guo
Table 5 The left-tailed and right-tailed critical values of rank-sum at significance level α = 0.025 under different v and l
Each entry is written left/right (left-tailed critical value / right-tailed critical value). Each row gives one number of traits l; its entries correspond to consecutive numbers of varieties v over the stated range. For smaller v no critical value exists at this level.
l = 2 (v = 7 to 30): 0/12 0/14 0/16 0/18 1/19 1/21 1/23 1/25 1/27 2/28 2/30 2/32 2/34 3/35 3/37 3/39 3/41 3/43 4/44 4/46 4/48 4/50 5/51 5/53
l = 3 (v = 4 to 30): 0/9 0/12 1/14 1/17 2/19 2/22 3/24 3/27 4/29 4/32 5/34 6/36 6/39 7/41 7/44 8/46 8/49 9/51 9/54 10/56 10/59 11/61 11/64 12/66 12/69 13/71 13/74
l = 4 (v = 3 to 30): 0/8 1/11 2/14 2/18 3/21 4/24 5/27 6/30 7/33 8/36 8/40 9/43 10/46 11/49 12/52 13/55 14/58 15/61 16/64 16/68 17/71 18/74 19/77 20/80 21/83 22/86 23/89 23/93
l = 5 (v = 3 to 30): 1/9 2/13 3/17 4/21 5/25 7/28 8/32 9/36 10/40 12/43 13/47 14/51 15/55 16/59 18/62 19/66 20/70 21/74 23/77 24/81 25/85 26/89 28/92 29/96 30/100 31/104 33/107 34/111
l = 6 (v = 2 to 30): 0/6 1/11 3/15 4/20 6/24 7/29 9/33 11/37 12/42 14/46 16/50 17/55 19/59 20/64 22/68 24/72 25/77 27/81 29/85 30/90 32/94 33/99 35/103 37/107 38/112 40/116 42/120 43/125 45/129
l = 7 (v = 2 to 30): 0/7 2/12 4/17 6/22 8/27 10/32 12/37 14/42 16/47 18/52 20/57 22/62 24/67 26/72 28/77 30/82 32/87 34/92 36/97 38/102 40/107 42/112 44/117 46/122 48/127 50/132 52/137 54/142 56/147
l = 8 (v = 2 to 30): 0/8 3/13 5/19 7/25 10/30 12/36 14/42 17/47 19/53 22/58 24/64 26/70 29/75 31/81 34/86 36/92 38/98 41/103 43/109 46/114 48/120 50/126 53/131 55/137 58/142 60/148 62/154 65/159 67/165
l = 9 (v = 2 to 30): 1/8 3/15 6/21 9/27 12/33 14/40 17/46 20/52 23/58 26/64 28/71 31/77 34/83 37/89 40/95 42/102 45/108 48/114 51/120 54/126 56/133 59/139 62/145 65/151 68/157 70/164 73/170 76/176 79/182
l = 10 (v = 2 to 30): 1/9 4/16 7/23 10/30 13/37 17/43 20/50 23/57 26/64 29/71 33/77 36/84 39/91 42/98 46/104 49/111 52/118 55/125 58/132 62/138 65/145 68/152 71/159 75/165 78/172 81/179 84/186 87/193 91/199
l = 11 (v = 2 to 30): 1/10 5/17 8/25 12/32 15/40 19/47 23/54 26/62 30/69 34/76 37/84 41/91 44/99 48/106 52/113 55/121 59/128 63/135 66/143 70/150 73/158 77/165 81/172 84/180 88/187 92/194 95/202 99/209 102/217
l = 12 (v = 2 to 30): 2/10 6/18 9/27 13/35 17/43 21/51 26/58 30/66 34/74 38/82 42/90 46/98 50/106 54/114 58/122 62/130 66/138 70/146 74/154 78/162 82/170 86/178 90/186 94/194 98/202 102/210 106/218 110/226 114/234
l = 13 (v = 2 to 30): 2/11 6/20 11/28 15/37 19/46 24/54 28/63 33/71 37/80 42/88 46/97 51/105 55/114 60/122 64/131 68/140 73/148 77/157 82/165 86/174 91/182 95/191 100/199 104/208 109/216 113/225 118/233 122/242 127/250
l = 14 (v = 2 to 30): 2/12 7/21 12/30 17/39 22/48 26/58 31/67 36/76 41/85 46/94 51/103 56/112 61/121 65/131 70/140 75/149 80/158 85/167 90/176 95/185 100/194 105/203 109/213 114/222 119/231 124/240 129/249 134/258 139/267
l = 15 (v = 2 to 30): 3/12 8/22 13/32 18/42 24/51 29/61 34/71 39/81 45/90 50/100 55/110 61/119 66/129 71/139 77/148 82/158 87/168 93/177 98/187 103/197 108/207 114/216 119/226 124/236 130/245 135/255 140/265 146/274 151/284
l = 16 (v = 2 to 30): 3/13 9/23 14/34 20/44 26/54 31/65 37/75 43/85 49/95 54/106 60/116 66/126 71/137 77/147 83/157 89/167 94/178 100/188 106/198 112/208 117/219 123/229 129/239 135/249 140/260 146/270 152/280 158/290 163/301
l = 17 (v = 2 to 30): 4/13 9/25 16/35 22/46 28/57 34/68 40/79 46/90 52/101 59/111 65/122 71/133 77/144 83/155 89/166 96/176 102/187 108/198 114/209 120/220 126/231 133/241 139/252 145/263 151/274 157/285 163/296 170/306 176/317
l = 18 (v = 2 to 30): 4/14 10/26 17/37 23/49 30/60 36/72 43/83 50/94 56/106 63/117 69/129 76/140 83/151 89/163 96/174 102/186 109/197 116/208 122/220 129/231 135/243 142/254 149/265 155/277 162/288 168/300 175/311 182/322 188/334
l = 19 (v = 2 to 30): 4/15 11/27 18/39 25/51 32/63 39/75 46/87 53/99 60/111 67/123 74/135 81/147 88/159 95/171 102/183 109/195 116/207 123/219 130/231 137/243 144/255 151/267 158/279 166/290 173/302 180/314 187/326 194/338 201/350
l = 20 (v = 2 to 30): 5/15 12/28 19/41 27/53 34/66 42/78 49/91 56/104 64/116 71/129 79/141 86/154 94/166 101/179 109/191 116/204 124/216 131/229 139/241 146/254 154/266 161/279 168/292 176/304 183/317 191/329 198/342 206/354 213/367
l = 21 (v = 2 to 30): 5/16 13/29 20/43 28/56 36/69 44/82 52/95 60/108 68/121 76/134 84/147 91/161 99/174 107/187 115/200 123/213 131/226 139/239 147/252 155/265 163/278 171/291 178/305 186/318 194/331 202/344 210/357 218/370 226/383
l = 22 (v = 2 to 30): 5/17 14/30 22/44 30/58 38/72 47/85 55/99 63/113 72/126 80/140 88/154 97/167 105/181 113/195 122/208 130/222 138/236 147/249 155/263 163/277 172/290 180/304 188/318 197/331 205/345 214/358 222/372 230/386 239/399
l = 23 (v = 2 to 30): 6/17 14/32 23/46 32/60 40/75 49/89 58/103 67/117 76/131 84/146 93/160 102/174 111/188 119/203 128/217 137/231 146/245 155/259 163/274 172/288 181/302 190/316 199/330 207/345 216/359 225/373 234/387 243/401 251/416
l = 24 (v = 2 to 30): 6/18 15/33 24/48 33/63 43/77 52/92 61/107 70/122 79/137 89/151 98/166 107/181 116/196 126/210 135/225 144/240 153/255 163/269 172/284 181/299 190/314 199/329 209/343 218/358 227/373 236/388 246/402 255/417 264/432
l = 25 (v = 2 to 30): 7/18 16/34 26/49 35/65 45/80 54/96 64/111 74/126 83/142 93/157 103/172 112/188 122/203 132/218 141/234 151/249 161/264 170/280 180/295 190/310 199/326 209/341 219/356 228/372 238/387 248/402 257/418 267/433 277/448
l = 26 (v = 2 to 30): 7/19 17/35 27/51 37/67 47/83 57/99 67/115 77/131 87/147 97/163 108/178 118/194 128/210 138/226 148/242 158/258 168/274 178/290 188/306 199/321 209/337 219/353 229/369 239/385 249/401 259/417 269/433 280/448 290/464
l = 27 (v = 2 to 30): 7/20 18/36 28/53 39/69 49/86 60/102 70/119 81/135 91/152 102/168 112/185 123/201 134/217 144/234 155/250 165/267 176/283 186/300 197/316 207/333 218/349 229/365 239/382 250/398 260/415 271/431 281/448 292/464 302/481
l = 28 (v = 2 to 30): 8/20 19/37 29/55 40/72 51/89 62/106 73/123 84/140 95/157 106/174 117/191 128/208 139/225 150/242 161/259 172/276 183/293 194/310 205/327 216/344 227/361 238/378 249/395 260/412 271/429 282/446 293/463 304/480 315/497
l = 29 (v = 2 to 30): 8/21 19/39 31/56 42/74 54/91 65/109 76/127 88/144 99/162 111/179 122/197 134/214 145/232 156/250 168/267 179/285 191/302 202/320 214/337 225/355 237/372 248/390 260/407 271/425 282/443 294/460 305/478 317/495 328/513
l = 30 (v = 2 to 30): 9/21 20/40 32/58 44/76 56/94 68/112 79/131 91/149 103/167 115/185 127/203 139/221 151/239 163/257 175/275 186/294 198/312 210/330 222/348 234/366 246/384 258/402 270/420 282/438 294/456 306/474 317/493 329/511 341/529
2005). This method may also assign the same ranking position to different samples. In this paper, the theoretical distribution of the rank-sum was derived under the assumption that all traits can be ranked without ties, i.e., that all ranks are integers. Therefore, whether the results can be applied directly to significance tests on non-integer statistics remains to be proven. To eliminate non-integer ranks, Nassar and Huhn (1987) suggested randomly assigning the tied ranks. For example, if two varieties tie for the 3rd place on a trait (so that they would otherwise receive ranks 2 and 3), the non-integer ranking method gives each of them rank 2.5, whereas Nassar and Huhn (1987) suggest giving each variety rank 2 or 3 at random. In this paper, we suggest giving both varieties rank 2. With this method, the mean of the rank-sums will be lower than the theoretical mean μ_SR = l(v - 1)/2. It can easily be shown that if t observed values on the same trait are tied, the mean of the rank-sums falls below μ_SR by t(t - 1)/(2v), because the t tied varieties together lose 0 + 1 + ... + (t - 1) = t(t - 1)/2 rank units on that trait. The deviation grows as the number of ties increases. How the accumulation of ties affects the significance test, and how to deal with it, therefore needs further study.

Table 6 The observations of five RVA characteristic parameters of twelve glutinous maize varieties and their rankings by the proposed method and by principal component analysis
Obs., observed value; Rank, rank on the trait (TV in ascending order, the other traits in descending order); SR, rank-sum; PC1 and PC2, rankings on the first and second principal components.
Variety           PV Obs.  Rank  TV Obs.  Rank  FV Obs.  Rank  BD Obs.  Rank  SB Obs.  Rank  SR    PC1  PC2
Suyunuo 1         1681.7   8     757.7    1     892.0    8     924.0    5     134.3    6     28    4    8
Suyunuo 2         1659.7   9     769.7    2     887.7    9     890.0    7     118.0    8     35    7    10
Suyunuo 5         2066.7   2     1038.0   7     1292.7   4     1028.7   3     254.7    2     18    5    1
Suyunuo 11        2226.0   0     958.3    5     1300.7   3     1267.7   0     342.3    0     8*    0    0
Suyunuo 12        2066.3   3     974.7    6     1157.0   7     1091.7   2     182.3    3     21    2    4
Suyunuo 13        2082.3   1     954.0    4     1221.3   6     1128.3   1     267.3    1     13*   1    2
Sunuo 638         1646.3   10    720.0    0     822.7    11    926.3    4     102.7    9     34    3    11
Sunuo 528         1686.0   7     783.3    3     884.0    10    902.7    6     100.7    10    36    6    9
Jiangnanhuanuo    1892.3   4     1427.0   11    1522.0   0     465.3    9     95.0     11    35    10   3
Jiangnanzinuo     1864.3   5     1298.3   9     1471.0   2     566.0    8     172.7    4     28    8    5
Nannongziyunuo    1539.7   11    1131.0   8     1253.0   5     408.7    10    122.0    7     41*   9    7
Shenyunuo 1       1751.7   6     1365.3   10    1517.7   1     386.3    11    152.3    5     33    11   6
* Significant at the 5% level.

Thirdly, the critical values in Tables 4 and 5 are derived under the assumption that all traits are equally important. In practice, in the comprehensive evaluation of varieties, different traits may carry different weights, which evaluators can determine according to the actual situation; for example, the yield of a variety may weigh more than other characters. In this case, as long as the evaluator assigns each trait a weight according to its importance, the approach of this paper can still be applied, with the rank-sum of each variety calculated as a weighted sum over the traits. Formula (1) then becomes

$$SR_i = \sum_{j=1}^{l} w_j R_{ij}$$
$$SR = \sum_{j=1}^{l} w_j r_j,$$
where $w_j$ is the weight of trait $j$ and $r_j$ is the rank of the variety on trait $j$. In each of the $T = v^{l}$ possible rank arrangements, a new rank-sum can be calculated, and so can the left-tailed and right-tailed critical values.

Finally, different traits are likely to be correlated with one another. If every trait is ranked simply by the magnitude of its value, the correlation among traits will inevitably affect the comparison of rank-sums. We therefore suggest ranking each trait according to the preferred direction of its value. If larger values of a trait are better, low ranks are given to large values; otherwise, low ranks are given to small values. In either case, the smaller the rank-sum, the better the variety, and the effect of correlation among traits on the ranking is thereby eliminated. In other cases, an intermediate or optimal value may be best; the ranks are then based on the proximity of the observed value to the optimal value, with smaller differences receiving lower ranks. Again, a smaller rank-sum indicates a better variety.

More and more researchers focus on multi-trait comprehensive evaluation of crop varieties because ordinary single-trait evaluation cannot reflect the differences between varieties. Besides yield, the resistance and quality of a variety are also criteria that researchers should take into account. Even if yield is the only evaluation criterion, the yield at each location can be viewed as a single trait. In this way, yields at different locations can be regarded as multiple traits and ranked by the method proposed in this paper. This differs from the common method of simply working out the sum or mean of yield over many locations and then testing its significance. The relative advantages and disadvantages of the rank-sum method and the direct analysis-of-variance approach are worth investigating.

In this paper, we offer an attempt at a multi-trait comprehensive ranking method, which does not necessarily exclude other statistical methods such as analysis of variance (Mo 1992), linear regression analysis (Finlay and Wilkinson 1963), stability analysis (Eberhart and Russell 1966; Zhu et al. 1993), principal component analysis (Li and Chang 1998) and AMMI model analysis (Zhang et al. 1998). The difference is that these methods usually analyze a single trait, whereas the method in this paper considers multiple traits simultaneously. Apart from the rank-sum method, the non-parametric methods used to evaluate multiple traits are rank analysis and grey analysis, for which no effective statistical hypothesis test is available. In the identical-discrepant-contrary grey correlation analysis method, the critical value of significance is artificially set at 0.3 rather than derived from the error distribution. The rank analysis method treats the ranks of the same variety in different environments as replicates and tests the significance of differences in mean rank among varieties by one-factor ANOVA and LSD multiple comparisons (Jin and Bai 1999). However, because varieties differ in stability, the variance of ranks across environments differs among varieties, so the homogeneity-of-variance assumption of ANOVA may not hold.

Besides, among the four arithmetic operations on arbitrary-length integers, only division can produce a fractional result. Because the integers are not closed under exact division, integer division returns only the integral part of the quotient, and precision is lost. In practice, the dividend (numerator) of a division can be multiplied by 10^(N+1) so as to keep N decimal places.
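To illustrate the scaling trick for keeping N decimal places with integer division only, the following C++ sketch multiplies the dividend by 10^(N+1), divides once, and rounds on the extra digit. It is a hedged illustration rather than the paper's implementation: it uses the built-in `std::uint64_t` in place of the arbitrary-length integer class (so the scaled dividend must stay below 2^64), the helper names `pow10_u64`, `scaled_quotient` and `format_fixed` are hypothetical, and rounding half up on the (N+1)th digit is an assumption, since the text does not say how the extra digit is used.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 10^n as a 64-bit integer (valid for n <= 19).
std::uint64_t pow10_u64(unsigned n) {
    std::uint64_t p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

// a / b with n decimal places kept, returned as an integer scaled by 10^n.
// The dividend is multiplied by 10^(n+1) so one integer division keeps n+1
// decimal digits; the extra digit is then used to round half up.
// Assumption: a * 10^(n+1) fits in uint64_t (an arbitrary-length integer
// class would remove this limit).
std::uint64_t scaled_quotient(std::uint64_t a, std::uint64_t b, unsigned n) {
    std::uint64_t q = (a * pow10_u64(n + 1)) / b;  // truncated, n+1 decimals
    return (q + 5) / 10;                           // round on the extra digit
}

// Renders a value scaled by 10^n as a decimal string, e.g. (14286, 4) -> "1.4286".
std::string format_fixed(std::uint64_t scaled, unsigned n) {
    if (n == 0) return std::to_string(scaled);
    const std::uint64_t p = pow10_u64(n);
    std::string frac = std::to_string(scaled % p);
    frac.insert(0, n - frac.size(), '0');          // restore leading zeros
    return std::to_string(scaled / p) + "." + frac;
}
```

For example, `scaled_quotient(10, 7, 4)` yields 14286, which `format_fixed` renders as "1.4286" (10/7 = 1.428571...). Only one division is performed per quotient, in line with the advice to keep divisions to a minimum.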
Moreover, the bitwise division offered in this paper involves more complicated calculations than the other operations. Division should therefore be used as rarely as possible to improve computational efficiency. For instance, computing D = A/(B×C) gives the same result as D = A/B/C but is much more efficient, because it requires only one division. Under present computer software and hardware conditions, the basic idea of extending the numeric scale is an effective strategy for overcoming platform limitations on the storage, precision and size of numbers. In this paper, we offer an arbitrary-length integer class that enables integers of any length to be stored and calculated on a 32-bit computer system. Hence, after the basic operations are optimized with regard to powers and logarithms, this arbitrary-length integer type could play an important role in scientific and industrial computing fields that have special requirements for the precision and size of numbers.
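The idea of extending the numeric scale can be illustrated with a minimal arbitrary-length unsigned integer. The C++ sketch below is hypothetical: the class name `BigUint`, its base-10^9 limb layout and the helper `arrangements` are assumptions, not the authors' class. It implements only addition, multiplication by a 32-bit factor and decimal output, which is enough to count the T = v^l rank arrangements exactly even when v^l exceeds the 64-bit range.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal arbitrary-length unsigned integer (not the authors' class).
// Little-endian limbs in base 10^9: each limb fits in uint32_t, and a limb times a
// 32-bit factor plus carry fits in uint64_t.
class BigUint {
public:
    explicit BigUint(std::uint64_t x = 0) {
        do {
            limbs_.push_back(static_cast<std::uint32_t>(x % kBase));
            x /= kBase;
        } while (x != 0);
    }

    BigUint& operator+=(const BigUint& o) {
        std::uint64_t carry = 0;
        for (std::size_t i = 0; i < std::max(limbs_.size(), o.limbs_.size()) || carry; ++i) {
            if (i == limbs_.size()) limbs_.push_back(0);
            std::uint64_t s = carry + limbs_[i] + (i < o.limbs_.size() ? o.limbs_[i] : 0);
            limbs_[i] = static_cast<std::uint32_t>(s % kBase);
            carry = s / kBase;
        }
        return *this;
    }

    BigUint& operator*=(std::uint32_t m) {
        std::uint64_t carry = 0;
        for (std::uint32_t& limb : limbs_) {
            std::uint64_t p = static_cast<std::uint64_t>(limb) * m + carry;
            limb = static_cast<std::uint32_t>(p % kBase);
            carry = p / kBase;
        }
        while (carry != 0) {
            limbs_.push_back(static_cast<std::uint32_t>(carry % kBase));
            carry /= kBase;
        }
        while (limbs_.size() > 1 && limbs_.back() == 0) limbs_.pop_back();  // e.g. after *= 0
        return *this;
    }

    // Decimal representation; inner limbs are zero-padded to 9 digits.
    std::string str() const {
        std::string s = std::to_string(limbs_.back());
        for (std::size_t i = limbs_.size() - 1; i-- > 0;) {
            std::string part = std::to_string(limbs_[i]);
            s += std::string(9 - part.size(), '0') + part;
        }
        return s;
    }

private:
    static constexpr std::uint64_t kBase = 1000000000ULL;
    std::vector<std::uint32_t> limbs_;  // least significant limb first
};

// Hypothetical helper: T = v^l, the number of rank arrangements for l traits.
BigUint arrangements(std::uint32_t v, unsigned l) {
    BigUint t(1);
    for (unsigned j = 0; j < l; ++j) t *= v;
    return t;
}
```

For instance, `arrangements(12, 20).str()` returns "3833759992447475122176", well beyond the largest `std::uint64_t` value (18446744073709551615). Base 10^9 is chosen here only because it makes decimal output trivial; a production class might prefer a binary base.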
CONCLUSION In this paper, under the null hypothesis H0 that each variety's ranking on each trait is random, the theoretical distribution of the sum of ranks (SR) was first derived and then used to obtain the critical values for multi-trait comprehensive evaluation by the rank-sum test. An application of the theoretical results was demonstrated using five starch viscosity traits of 12 glutinous maize varieties, and the result was compared with that of principal component analysis. The two methods gave comparable rankings. The proposed method is simple and convenient and can easily be adopted to rank different varieties by multiple traits.
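The procedure summarized above can be sketched in C++: rank each trait in its preferred direction with 0-based ranks (tied values receive the lowest rank of their group, as suggested in the discussion), sum the ranks of each variety, enumerate the null distribution of SR by convolving l independent uniform rank distributions over {0, ..., v-1}, and read off the critical values. This is an illustrative sketch, not the authors' program. The function names are hypothetical, counts are held in `std::uint64_t` (so v^l must stay below 2^64), and applying alpha to each tail separately is an assumption, chosen because it reproduces the asterisks in Table 6.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: 0-based ranks for one trait. Rank 0 goes to the most
// preferred value; tied values share the lowest rank of their group
// (two varieties tied for ranks 2 and 3 both receive rank 2).
std::vector<int> rank_trait(const std::vector<double>& x, bool larger_is_better) {
    std::vector<int> order(x.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return larger_is_better ? x[a] > x[b] : x[a] < x[b];
    });
    std::vector<int> rank(x.size());
    for (std::size_t pos = 0; pos < order.size(); ++pos) {
        bool tied = pos > 0 && x[order[pos]] == x[order[pos - 1]];
        rank[order[pos]] = tied ? rank[order[pos - 1]] : static_cast<int>(pos);
    }
    return rank;
}

// Rank-sum SR of each variety; obs[j][i] is the value of trait j for variety i.
std::vector<int> rank_sums(const std::vector<std::vector<double>>& obs,
                           const std::vector<bool>& larger_is_better) {
    std::vector<int> sr(obs.at(0).size(), 0);
    for (std::size_t j = 0; j < obs.size(); ++j) {
        std::vector<int> r = rank_trait(obs[j], larger_is_better.at(j));
        for (std::size_t i = 0; i < sr.size(); ++i) sr[i] += r[i];
    }
    return sr;
}

// Null distribution of SR for v varieties and l traits: counts[s] is the number
// of the T = v^l equally likely rank combinations whose sum is s.
std::vector<std::uint64_t> sr_null_counts(int v, int l) {
    std::vector<std::uint64_t> counts{1};
    for (int j = 0; j < l; ++j) {  // convolve with the uniform ranks of one more trait
        std::vector<std::uint64_t> next(counts.size() + v - 1, 0);
        for (std::size_t s = 0; s < counts.size(); ++s)
            for (int r = 0; r < v; ++r) next[s + r] += counts[s];
        counts.swap(next);
    }
    return counts;
}

// Left critical value: largest k with P(SR <= k) <= alpha; the right one follows
// from the symmetry of SR about l(v - 1)/2. Assumption: alpha is applied to each
// tail separately. Returns left = -1 if no value qualifies.
std::pair<int, int> critical_values(int v, int l, double alpha) {
    const std::vector<std::uint64_t> counts = sr_null_counts(v, l);
    const double total = std::accumulate(counts.begin(), counts.end(), 0.0);
    const int max_sr = l * (v - 1);
    int left = -1;
    double cum = 0.0;
    for (int k = 0; k <= max_sr; ++k) {
        cum += static_cast<double>(counts[k]);
        if (cum / total > alpha) break;
        left = k;
    }
    return {left, max_sr - left};
}
```

Applied to the Table 6 observations, with PV, FV, BD and SB ranked from large to small and TV from small to large (the directions implied by the ranks in Table 6), `rank_sums` returns 28, 35, 18, 8, 21, 13, 34, 36, 35, 28, 41 and 33, and `critical_values(12, 5, 0.05)` returns (14, 41). Suyunuo 11 (SR = 8) and Suyunuo 13 (SR = 13) therefore fall in the left tail and Nannongziyunuo (SR = 41) in the right tail, while SR = 18 and SR = 36 do not.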
Acknowledgements We would like to express our gratitude to Prof. Mo Huidong of Yangzhou University, China. Without his inspiration, guidance and support, this study could not have been accomplished. The study was financially supported by the National Key Basic Research Program of China (2006CB101700) and the Program for New Century Excellent Talents in University of Ministry of Education of China (NCET2005-05-0502).
References
Eberhart S A, Russell W A. 1966. Stability parameters for comparing varieties. Crop Science, 6, 36-40.
Finlay K W, Wilkinson G N. 1963. The analysis of adaptation in a plant-breeding programme. Australian Journal of Agricultural Research, 14, 742-754.
Gao H X. 2007. Applied Multivariate Statistical Analysis. Peking University Press, Beijing. pp. 265-290. (in Chinese)
Guo R L. 2005. Identical-discrepant-contrary grey correlation analysis method and its application in wheat. System Sciences and Comprehensive Studies in Agriculture, 21, 170-174. (in Chinese)
Hwang C L, Yoon K. 1981. Multiple Attribute Decision Making: Methods and Applications. Springer, Berlin.
Jin W L, Bai Q Y. 1999. The analysis based on ranks of crop varieties in regional trials. Acta Agronomica Sinica, 25, 632-638. (in Chinese)
Jin W L, Bai Q Y. 2001. The rank analysis method of nonbalance data in mid-long run variety comparative trials of rolling way. Acta Agronomica Sinica, 27, 946-952. (in Chinese)
Li X H, Chang R Z. 1998. Cluster and principal component analysis of the spring soybean varieties in China. Acta Agronomica Sinica, 24, 325-332. (in Chinese)
Mo H D. 1992. Statistics for Agricultural Experiments. 2nd ed. Shanghai Science and Technology Press, Shanghai. (in Chinese)
Nassar R, Huhn M. 1987. Studies on estimation of phenotypic stability: tests of significance for nonparametric measures of phenotypic stability. Biometrics, 43, 45-53.
Saaty T L. 1980. The Analytic Hierarchy Process. McGraw Hill, New York.
Saaty T L. 1990. How to make a decision: the analytic hierarchy process. European Journal of Operational Research, 48, 9-26.
Xu C W. 1998. Inheritance of quality characters of rice grain. PhD thesis, Nanjing Agricultural University. (in Chinese)
Zhang Z, Lu C, Xiang Z H. 1998. Analysis of variety stability based on AMMI model. Acta Agronomica Sinica, 24, 304-309. (in Chinese)
Zhu J, Xu F H, Lai M G. 1993. Analysis methods for unbalanced data from regional trials of crop variety: Analysis for single trait. Journal of Zhejiang Agricultural University, 19, 7-13. (in Chinese)

(Managing editor ZHANG Yi-min)