Computer Programs in Biomedicine 7 (1977) 247-250 © Elsevier/North-Holland Biomedical Press
TWO-SAMPLE KOLMOGOROV-SMIRNOV TEST FOR TRUNCATED DATA

N.B. GROVER
The Department of Experimental Medicine and Cancer Research, The Hebrew University-Hadassah Medical School, Jerusalem, Israel

A nonparametric statistical test to compare two cumulative frequency distribution functions is presented that can be used even when both samples include censored data, as is often the case when comparing the survival of two groups of laboratory animals under conditions in which the experiment is terminated before all the animals die. (Such a design can produce considerable savings and is to be recommended.) The program calculates exact probabilities for both the one-sided and two-sided alternatives to the null hypothesis, applicable to the case of equal group size, as well as the corresponding general asymptotic values; a continuity correction is employed that markedly improves the asymptotic approximation. Expressions are stated in terms of two different but related statistics, and the one that utilizes more information in any particular set of data is selected for the probability calculations. All basic equations and definitions are provided.

Truncated Kolmogorov-Smirnov statistics
Comparison of survival distributions
Test for censored samples
1. Introduction
It is common practice in biomedical research, as when comparing the survival of two groups of laboratory animals, to terminate the experiment before all the animals are dead. Such a design has the merits of expediency and economy and is termed right-truncated: no information is available on subjects surviving beyond a specified time. In the event that data in both groups are truncated, it is no longer possible to employ the usual ranking statistics and recourse must be had to special techniques. Gehan [1] has published an extension to the Mann-Whitney U-test suitable for comparing survival distributions from censored clinical trials, and Robertson and Gehan [2] have programmed the calculations involved. In animal experiments, where the time to censoring is usually the same for all surviving members in each group, Gehan's approach is less appropriate than one based on Kolmogorov-Smirnov type statistics. The present article describes such a test.

2. Basic expressions

We follow the notation of Conover [3], and let $X_1 < X_2 < \dots < X_n$ represent an ordered random sample from the continuous distribution function $F(x)$, with empirical cumulative distribution function

$$S_n(x) = k/n \quad \text{if } X_k \le x < X_{k+1}.$$

The second ordered sample, $Y_1 < Y_2 < \dots < Y_m$, has empirical cumulative distribution function $S'_m(x)$ defined in the same way. The truncated statistic is

$$d'_r = \max_{x \le \max(X_r, Y_r)} |S_n(x) - S'_m(x)|, \quad r \le \min(m, n);$$

Conover [3] introduced the analogous statistic

$$d''_r = \max_{x \le \min(X_r, Y_r)} |S_n(x) - S'_m(x)|, \quad r \le \min(m, n).$$

3. Computational methods

3.1. Preliminary calculations

The program first ranks the data of each group separately in ascending order. The next step is a decision as to whether to calculate $d'_r$ or $d''_r$, based on the wish to utilize as much of the input information as possible. Thus if the last recorded death occurs in the larger group (the one with more uncensored measurements), $d''_r$ is determined; otherwise $d'_r$ is used. The data from the two groups are then ranked together and the appropriate $d_r$ (either $d'_r$ or $d''_r$) calculated. At the same time, the $d_r$ values corresponding to both one-sided alternatives are also computed.
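The truncated statistics can be computed directly from the ranked uncensored observations. The following is a minimal sketch in Python, not the author's FORTRAN IV code: the function name `truncated_ks` and its arguments are hypothetical, ties are not treated specially, and an r-th order statistic that was not observed (because of censoring) is taken as plus infinity, which is an assumption of this sketch.

```python
from bisect import bisect_right
from math import inf


def truncated_ks(x_obs, y_obs, n, m, r, use_min=False):
    """Return (two_sided, x_above_y, y_above_x) for a truncated statistic.

    x_obs, y_obs : uncensored observations of groups 1 and 2
    n, m         : total group sizes (censored cases included)
    r            : truncation rank, r <= min(m, n)
    use_min      : False -> limit max(X_r, Y_r); True -> limit min(X_r, Y_r)
    """
    xs, ys = sorted(x_obs), sorted(y_obs)
    # r-th order statistics; treated as +infinity when not observed (assumption).
    x_r = xs[r - 1] if len(xs) >= r else inf
    y_r = ys[r - 1] if len(ys) >= r else inf
    limit = min(x_r, y_r) if use_min else max(x_r, y_r)
    if limit == inf:
        raise ValueError("truncation point was not observed")

    plus = minus = 0.0
    # Both step functions change only at observations, so the maximum
    # difference is attained at one of the observed values <= limit.
    for t in xs + ys:
        if t > limit:
            continue
        diff = bisect_right(xs, t) / n - bisect_right(ys, t) / m
        plus = max(plus, diff)     # one-sided: group 1 step function above
        minus = max(minus, -diff)  # one-sided: group 2 step function above
    return max(plus, minus), plus, minus
```

For example, with the illustrative (made-up) samples `[1, 5, 6]` and `[2, 3, 4]`, n = m = 3 and r = 2, the limit is 5 for the max form and 3 for the min form, so the two statistics differ.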
3.2. Exact probabilities

Exact two-sided probabilities (under $H_0$) are then determined for the case of equal total group size $n$ from the expression [3]

$$P(d'_r \le x) = 1 + 2 \sum_{i=1}^{i'} (-1)^i N_i \Big/ \binom{2n}{n},$$

where

$$i' = \min\left(\left[\frac{r+c}{c+1}\right], \left[\frac{n}{c+1}\right]\right),$$

$$N_i = \binom{2n}{n-ic-i} - \sum_{j=0}^{n-r-c-1} \binom{2n-2r-c-1-j}{n-r} \left[\binom{2r+c+j}{r+ic+i-1} - \binom{2r+c+j}{r-ic-i+c}\right],$$

and

$$c = [nx].$$

Where necessary, $d''_r$ is first transformed to $d'_r$ via the identity

$$P(d''_r \le x) = P(d'_{r-c} \le x).$$

One-sided probabilities are obtained from

$$P(d'_r \le x) = 1 - N_1 \Big/ \binom{2n}{n},$$

with $d'_r$ here denoting the one-sided statistic.

3.3. Asymptotic probabilities

For unequal group sizes ($n > m$, say), exact probabilities are not available and recourse must be had to the asymptotic distribution. This is given (under $H_0$) for the two-sided test by [3]

$$P(N^{1/2} d_r \le x) = \sum_{j=-\infty}^{\infty} (-1)^j \exp(-2j^2x^2) \, P\left(\left|z - 2jx[(1-b)/b]^{1/2}\right| < x(b-b^2)^{-1/2}\right),$$

where

$$N = mn/(m+n),$$

$$b = r/m \ \text{for } d_r = d'_r, \qquad b = r/n \ \text{for } d_r = d''_r,$$

and $z$ is the standardized normal random variable. The one-sided test is obtained similarly to the exact case by taking only the terms $j = 0$ and $j = 1$ in the above summation.

3.4. Continuity

Since $d_r$ assumes only discrete values, $P(d_r \ge x) \ne 1 - P(d_r \le x)$; rather, $P(d_r \ge x) = 1 - P(d_r < x) = 1 - P(d_r \le$ [...] exact multiple of $m$.

Fig. 1. Flowchart of main program.

4. The program

4.1. Input

The first data card contains the number of sets of data to be processed and the variable format specification for the input data (I5, A5, 7A10). Then come the data sets themselves, each one preceded by a card containing the number of cases, total and uncensored (2I5), and the time to censoring (F5) for each of the two groups in the set. A title (5A10, starting in column 31) can also be appended to each set and will appear in the output. The data themselves are read in according to the variable format, one group at a time.

4.2. Output

For every set the set number and (optional) title are printed, and for each group the total number of cases, number of uncensored cases and time to censoring are listed. The next line contains $n$, $r$ and $c$; where $m \ne n$, the harmonic mean is displayed. Whether the calculations are based on $d'_r$ or $d''_r$ is indicated by the flag PRIME (= 1 or 2, respectively), while the actual value of $N^{1/2} d_r$, corrected for continuity, is given for the one-sided tests next to the label Z1 or Z2. The exact probabilities for each of the one-sided tests and for the two-sided test then appear, and these are followed by the corresponding asymptotic probabilities.

4.3. Subroutines

Subroutine RANK is called three times in order to rank the data, first each group separately, then both together. Subroutine EXACT is then called to calculate the exact probabilities, once for the two-sided test and once for each of the one-sided tests. This subroutine uses the function F to compute binomial coefficients, in which logarithms are employed to decrease the range of the intermediate results. Subroutine PROB calculates the asymptotic probabilities, and utilizes the IMSL [6] subprogram MDNOR to obtain the value of the normal probability distribution function; users at installations without this facility can easily programme their own routine based on accepted algorithms [7].

4.4. Restrictions

The dimension specifications allow a maximum of 200 uncensored data points in each group; there is no limitation on the number of data sets run. In general, the arrays X, Y, XL, XS must be large enough to contain the input data of each group separately, and S, AM, LS large enough to contain them combined.
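The exact probabilities of Section 3.2 (equal group size n) lend themselves to a short illustration. The sketch below is in Python rather than the program's FORTRAN IV, and it uses exact integer binomial coefficients instead of the logarithms employed by the function F. The function names are hypothetical, the argument `c` is the integer [nx], and the form of N_i coded here is a reading of the expression quoted from [3], stated in the docstring so that it can be checked.

```python
from math import comb


def _binom(a, b):
    """Binomial coefficient, taken as zero outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def n_i(n, r, c, i):
    """N_i = C(2n, n-ic-i)
             - sum_{j=0}^{n-r-c-1} C(2n-2r-c-1-j, n-r)
               * [C(2r+c+j, r+ic+i-1) - C(2r+c+j, r-ic-i+c)]."""
    total = _binom(2 * n, n - i * c - i)
    for j in range(n - r - c):  # j = 0, ..., n-r-c-1 (empty if n-r-c <= 0)
        total -= _binom(2 * n - 2 * r - c - 1 - j, n - r) * (
            _binom(2 * r + c + j, r + i * c + i - 1)
            - _binom(2 * r + c + j, r - i * c - i + c)
        )
    return total


def exact_two_sided(n, r, c):
    """P(d'_r <= c/n) for two groups of equal size n under the null hypothesis."""
    i_max = min((r + c) // (c + 1), n // (c + 1))
    s = sum((-1) ** i * n_i(n, r, c, i) for i in range(1, i_max + 1))
    return 1 + 2 * s / comb(2 * n, n)


def exact_one_sided(n, r, c):
    """One-sided analogue: 1 - N_1 / C(2n, n)."""
    return 1 - n_i(n, r, c, 1) / comb(2 * n, n)


def exact_two_sided_min(n, r, c):
    """P(d''_r <= c/n), via the identity P(d''_r <= x) = P(d'_{r-c} <= x)."""
    return exact_two_sided(n, r - c, c)
```

As a check on this reading, n = 4, r = 1, c = 1 gives 40/70 for the two-sided probability and 55/70 for the one-sided one, which is what direct enumeration of the 70 equally likely arrangements yields.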
5. Specifications

This program and its subroutines are coded in FORTRAN IV. The sample run (2 sets) on the CYBER 74 at the Hebrew University Computation Center, Jerusalem, requires 1.8 s of central processor time and 0.9 s of I/O. It occupies 40 k octal locations during compilation, 17 k locations during execution, and 2.5 mass storage units.

6. Mode of availability

A source listing is available from the author.

Acknowledgement

This work was supported in part by the Shirley Grover Altman Memorial Fund (Montreal).

References

[1] E.A. Gehan, A generalized Wilcoxon test for comparing arbitrarily singly-censored samples, Biometrika 52 (1965) 203.
[2] C.O. Robertson and E.A. Gehan, A computer sub-program for calculating the generalized Wilcoxon test, Comp. Prog. Biomed. 1 (1970) 167.
[3] W.J. Conover, The distribution functions of Tsao's truncated Smirnov statistics, Ann. Math. Stat. 38 (1967) 1208.
[4] C.K. Tsao, An extension of Massey's distribution of the maximum deviation between two-sample cumulative step functions, Ann. Math. Stat. 25 (1954) 587.
[5] P.J. Kim, On the exact and approximate sampling distribution of the two-sample Kolmogorov-Smirnov criterion Dmn, m < n, J. Am. Stat. Assoc. 64 (1969) 1625.
[6] IMSL, IMSL Library 3 Reference Manual, Edition 5 (International Mathematical and Statistical Libraries, Houston, 1975) p. MERF/MERFC/MDNOR-1.
[7] M. Zelen and N.C. Severo, Probability functions, in: Handbook of Mathematical Functions, National Bureau of Standards, Applied Mathematics Series, Vol. 55, p. 932, eds. M. Abramowitz and I.A. Stegun (Dover, New York, 1965).
Sample run

Input

Two sets of data are run. The first has equal group sizes (n = 6) and so the exact probabilities may be used. In the second, m ≠ n and recourse must be had to the asymptotic approximations instead.
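For the unequal-size case, the two-sided asymptotic probability of Section 3.3 can be illustrated with a short Python sketch. It is not the PROB subroutine: the function names are hypothetical, the normal distribution function is obtained from `math.erf` rather than from IMSL's MDNOR, the infinite sum is cut off at a fixed number of terms (an assumption that is harmless because the exponential factor decays very quickly), and no continuity correction is applied.

```python
from math import erf, exp, sqrt


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def effective_n(m, n):
    """N = mn / (m + n)."""
    return m * n / (m + n)


def asymptotic_two_sided(x, b, terms=50):
    """P(sqrt(N) * d_r <= x) as
        sum_j (-1)^j exp(-2 j^2 x^2)
              * P(|z - 2 j x sqrt((1-b)/b)| < x / sqrt(b - b^2)),
    for 0 < b < 1, with j running over -terms, ..., terms."""
    if x <= 0.0:
        return 0.0
    shift = sqrt((1.0 - b) / b)
    half_width = x / sqrt(b - b * b)
    total = 0.0
    for j in range(-terms, terms + 1):
        centre = 2.0 * j * x * shift
        # P(|z - centre| < half_width) for standard normal z
        prob = norm_cdf(centre + half_width) - norm_cdf(centre - half_width)
        sign = -1.0 if j % 2 else 1.0
        total += sign * exp(-2.0 * j * j * x * x) * prob
    return total
```

Here `x` would be the observed value of N^{1/2} d_r with N from `effective_n(m, n)`, and `b` is r/m or r/n according to the statistic in use. As b approaches 1 (no truncation) the normal probability factor tends to 1 and the sum reduces to the familiar untruncated Kolmogorov-Smirnov limiting distribution.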
6 5 i.(0. 6 1 1GC.TEST RUH WITH EQUAL GROUP SIZE i N - 6 ) 6~ ~8 37 ~Z 4~ 58 23 11 4~, Z~ 22 3t%TEST RU~ WITH U~EQUAL GROUP SIZE 33 26 16 T4 T6 ~ 22 ~ 13 5 9 29 22 14 10 10 ~ 64*5 ~2,5 6 3
Output

This is the output for the above input as it appears on the line printer.
UNCENSORED TOTAL CUTOFF TIME
EXP~RIMF~,T ~UMSER ~ GROUP I GROUP 2 ~ ............ ---1 ............................................ 6 6 l~,r 1~,0
NBAR- 6 , ~ ~" ~ C~t~ PRXWE- 2 EXACT ~ R O B A ~ I L I T I E $ ( F q p EQUAL GROUP SIZE ONLY)! ASYMPTOTI~-~O~~RI-L-~r'~~-I . . . . . . . . . . .
UNCENSORED TOTAL CUTOFF TIME
TEST RUN WITH EQUAL GROUP SIZE (N = 6)
~XPFRIMEhT NUM~fR 2 GROUP ~-TROOP- 2 10 12 23 Z8 40e~ 3CeP
Zl" O,OO FOR 1>2 Z.OOOe ~OR 1 ) 2 l , O 0 0 e
Z2" FOR 2>1 FOR Z>X
1.16 eO30e ,OTbt
FOR 1~2 FOR X#Z
cob1 ,X49
F~R ; # Z FOR X # |
e|G2' el~
TEST RUN WITH UNEQUAL GROUP SIZE
NBARmZ~e2 ~5 C" 7.~ FRIHE" Z EXACT ~ R ~ ) - q ~ T L I T I ~ f | r ~ - ~ - Q ~ A [ GRCuP-~IZE ONLY)i ASYqPTOTIC D R O ~ A ~ I L I T I ~ S l
Zl" le12 ¢OR 1 ) 2 - - - . 0 4 1 e KOR 1>2 ,Obq~
' ZZ m OeO0 FOR Z)X |eOOOp FOR Z > l leOOOe