A FORTRAN program for ranking and for calculation of Spearman's correlation coefficient

Computer Methods and Programs in Biomedicine 21 (1985) 123-125

Elsevier CPB 00721

Viktor Kempi

Department of Radiophysics, Sjukhuset, S-831 83 Östersund, Sweden

A FORTRAN IV program is presented for ranking data in ascending order. If requested, the ranks, corrected for ties, are printed together with the corresponding raw data. After ranking, Spearman's rank correlation coefficient, rs, and the corresponding t-value can be calculated. In these calculations corrections are made for tied values.

Keywords: Spearman's correlation coefficient; Ranking

0169-2607/85/$03.30 © 1985 Elsevier Science Publishers B.V. (Biomedical Division)

1. Introduction

In many fields of research the question arises whether or not two sets of values are related. If the two sets of values can be assumed to stem from a bivariate normal population, the usual measure of correlation is the product-moment correlation coefficient. When the assumption of normality seems questionable, Spearman's rank correlation coefficient, rs, offers a non-parametric alternative. Its calculation requires that the variables be measured on at least an ordinal scale. As the first non-parametric statistic based on ranks to be developed, it is perhaps the best known [1].

The calculation starts with the ranking procedure, an easy task when the number of values is small (e.g. fewer than 20) but increasingly cumbersome as the number grows. In the present study a FORTRAN IV program is presented which is able to rank up to 200 values, irrespective of the presence of ties. After ranking, the difference in ranks is obtained for each pair, and the correlation coefficient is then calculated. The presence of ties adds to the difficulty of manual ranking, and when ties are numerous they tend to decrease the strength of the correlation [1,2]. The program presented corrects for the presence of ties. To test the significance of the rs value obtained, the program finally gives the corresponding t-value.

Very little has been published about ranking or the subsequent calculation of Spearman's correlation coefficient with a FORTRAN program: a thorough search revealed only one publication [3], and the program described there includes neither a display of the calculated ranks nor a routine for correcting erroneously entered data. A list of ranks is useful when dealing with non-parametric tests, and a correction routine is helpful when large amounts of data are entered. Software for IBM-compatible machines is commercially available [4,5,6], but since the user is denied access to its source code, he cannot make any alterations he might need, nor can he check the calculations at a given stage. Moreover, IBM computers and compatible machines are not available in every scientific institution. A further disadvantage is that, with the exception of one version for SAS [5], this software does not print ranks corrected for ties.

In contrast, the program presented here may be used on any computer with a FORTRAN compiler. It requires minimal additional information once the paired values to be compared have been keyed in, and the number of pairs that can be handled is limited only by the memory size of the computer (the present version is dimensioned for 200 pairs). A user with a basic familiarity with FORTRAN programming may of course adapt the program to his needs.

Since it does not require normally distributed data, the Spearman rank correlation method is more widely applicable than its parametric alternative, Pearson's product-moment correlation method. In spite of this the method is not widely used, possibly owing to the lack of a generally available computer program. The FORTRAN program presented here may help to meet that need.
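The ranking step, including the correction for ties, can be illustrated with a short Python sketch. It is not a translation of the author's FORTRAN IV listing, which is available only on request; it assumes the conventional correction in which tied values share the mean of the ranks they would otherwise occupy, since that convention reproduces the ranks printed in the Appendix. The function name `midranks` is hypothetical.

```python
def midranks(values):
    """Return ascending ranks; tied values share the mean of their positions.

    Hypothetical helper illustrating ranking with correction for ties.
    """
    # Indices of the values, sorted by value (the sort is stable)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Advance j to the last position of the run of values tied with order[i]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 0-based positions i..j hold ranks i+1..j+1; every tied value gets their mean
        mean_rank = (i + j + 2) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Column 1 of the Appendix sample run (data from Kendall [2])
    print(midranks([1, 2, 2, 3, 3, 4, 4, 5, 6, 6]))
    # -> [1.0, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5, 8.0, 9.5, 9.5]
```

The routine sorts indices rather than the values themselves, so each rank can be written back to the row it came from; this is what makes it possible to print the ranks next to the raw data.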

2. Program description

The program presented has been implemented on an Intertechnique CINE-200/Multi-20 with a 16-bit word memory. Because it is written in FORTRAN IV, it can be used on other computer systems with a minimum of modification. The program is not claimed to have an optimal configuration: for example, the FORTRAN version used does not include logical expressions, which are often very useful. The fairly large size of the program (9397 bytes) is partly due to the number of cells reserved in the DIMENSION statements: if DIMENSION (i.e. the maximum number of paired values) is reduced to 100, the size falls to 6997 bytes. The program can of course be used for ranking data alone, i.e. without calculating rs and t, and can therefore also serve other non-parametric tests that require ranked data. Comments are inserted in the program to help identify its essential parts, and a procedure is embodied in the program that enables the user to correct erroneously entered values.

An example showing the inputs and outputs is given in the Appendix; the input data are taken from Kendall [2] for comparison. Significance can be checked by comparing the t-value with a table of critical values of t (with n - 2 degrees of freedom, n being the number of pairs). Alternatively, the significance of the rs value can be obtained directly from a table of critical rs values [1,7].

3. Availability

The program listing is available from the author on request.

4. Acknowledgements

My sincere thanks are due to hospital physicist Bengt Johansson of the Department of Radiation Physics, University of Umeå, Sweden, for expert advice.

References

[1] S. Siegel, Nonparametric Statistics for the Behavioral Sciences (McGraw-Hill, New York, 1956).
[2] M.G. Kendall, The treatment of ties in ranking problems, Biometrika 33 (1945) 239-251.
[3] R.L. Mason, FORTRAN Programs for Non-Parametric Studies (National Technical Information Service, U.S. Department of Commerce, Springfield VA, 1973).
[4] SAS Institute Inc., SAS User's Guide: Basics, 1982 ed., pp. 501-512 (SAS Institute Inc., Cary NC, 1982).
[5] SAS Institute Inc., SAS User's Guide: Statistics, 1982 ed., pp. 479-484 (SAS Institute Inc., Cary NC, 1982).
[6] SPSS Inc., SPSSx User's Guide, pp. 663-669 (McGraw-Hill, 1983).
[7] T. Colton, Statistics in Medicine (Little, Brown and Co., Boston MA, 1974).

Appendix: Sample run

RANKING OF DATA
COMMENT: MAY 13, 1985
NUMBER OF ROWS (MAX 200)
10
KEY IN PAIRED VALUES ROW BY ROW WITH SPACE BETWEEN VALUES
NOTE: ERRORS ARE CORRECTED LATER
ROW= 1
1 1
2 2
2 3
3 3
3 3
4 3
4 4
5 4
6 4
6 5
CORRECTIONS? 0=NO, KEY IN NUMBER OF ROW TO BE CORRECTED=YES
0
PRINT RANKS AND VALUES? 0=NO, KEY IN NUMBER OF COLUMN=YES
1
PRINT RANKS REQUIRED? 0=NO, KEY IN NUMBER OF COLUMN=YES
1
RANKING ACCORDING TO COLUMN: 1
COL 1   1.0   2.5   2.5   4.5   4.5   6.5   6.5   8.0   9.5   9.5
SPEARMAN CORRELATION? 0=NO, 1=YES
1
SPEARMAN RANK CORRELATION COEFFICIENT: RS
RS= 0.9171   T= 0.65084E+01
PRINT RANKS REQUIRED? 0=NO, KEY IN NUMBER OF COLUMN=YES
0
ALTERNATIVE RANKING REQUIRED? 0=NO, 1=YES
0
STOP

MAY 13, 1985
RANKING ACCORDING TO COLUMN: 1
COL 1   1.0   2.5   2.5   4.5   4.5   6.5   6.5   8.0   9.5   9.5
    1   1.00  2.00  2.00  3.00  3.00  4.00  4.00  5.00  6.00  6.00
COL 2   1.0   2.0   4.5   4.5   4.5   4.5   8.0   8.0   8.0  10.0
    2   1.00  2.00  3.00  3.00  3.00  3.00  4.00  4.00  4.00  5.00
SPEARMAN RANK CORRELATION COEFFICIENT: RS
RS= 0.9171   T= 0.65084E+01
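The RS and T values of the sample run can be reproduced with the following Python sketch. Since the program listing is not reproduced in this paper, it assumes the standard tie-corrected formula (as given, for example, by Siegel [1]): using midranks and d as the difference in ranks for each pair, Σx² = (n³ - n)/12 - ΣTx and Σy² = (n³ - n)/12 - ΣTy, where T = (t³ - t)/12 for each group of t tied values, and rs = (Σx² + Σy² - Σd²) / (2√(Σx²·Σy²)). The t-value is taken as t = rs·√((n - 2)/(1 - rs²)) with n - 2 degrees of freedom. All function names are hypothetical.

```python
import math


def midranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2.0
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of T = (t^3 - t)/12 over every group of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((t ** 3 - t) / 12.0 for t in counts.values())


def spearman_tie_corrected(x, y):
    """Return (rs, t) using the tie-corrected formula (assumed, see text)."""
    n = len(x)
    # Sum of squared rank differences over all pairs
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(midranks(x), midranks(y)))
    base = (n ** 3 - n) / 12.0
    sx2 = base - tie_sum(x)
    sy2 = base - tie_sum(y)
    rs = (sx2 + sy2 - d2) / (2.0 * math.sqrt(sx2 * sy2))
    # t with n - 2 degrees of freedom; undefined when |rs| == 1
    t = rs * math.sqrt((n - 2) / (1.0 - rs ** 2))
    return rs, t


if __name__ == "__main__":
    # Paired values keyed in during the sample run (data from Kendall [2])
    x = [1, 2, 2, 3, 3, 4, 4, 5, 6, 6]
    y = [1, 2, 3, 3, 3, 3, 4, 4, 4, 5]
    rs, t = spearman_tie_corrected(x, y)
    print(f"RS= {rs:.4f}  T= {t:.4f}")  # RS= 0.9171  T= 6.5084
```

For these data Σd² = 13, ΣTx = 2 and ΣTy = 7, giving rs ≈ 0.9171 and t ≈ 6.508, in agreement with the sample run. Without the tie correction, rs = 1 - 6Σd²/(n³ - n) would give about 0.9212. The corrected value is identical to Pearson's product-moment coefficient computed on the midranks, so `scipy.stats.spearmanr`, which correlates average ranks, should give the same rs.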