The analysis of clinical studies: The use of nonparametric techniques

By the Numbers


As stated in previous columns, the better-known parametric tests rely on normal theory. When data are not approximately normal, unless there is a specific test that is uniquely appropriate, an attempt should be made to transform the data so that after transformation there is an appearance of unimodality and symmetry. This allows analysis by using the statistical tests based on normal theory. When the data are sufficiently pathologic that no obvious transformation allows standard parametric statistical procedures to be used, the only remaining remedy is to use nonparametric techniques to analyze the data. The same is true when the sample size is too small to allow the determination of normality, or lack of it, to be made.

This column will focus on some of the better-known nonparametric methods that can be used, especially those available in statistical software packages. For context, each method will be identified with the parametric method that most closely corresponds to it, that is, a significance test of an intuitively related hypothesis.

A minor distinction is made between nonparametric procedures and distribution-free procedures. The terms are often used interchangeably, but the latter is correctly used only if no assumptions are made about the population sampled. Because most of the techniques discussed in this column implicitly rest on some distributional assumptions (e.g., related to a true, but unknown, mean value), the term nonparametric will be used here.

The main descriptive statistics in a nonparametric setting are the median, the interquartile range, and the full range of values observed. These are based on the rank order of the underlying values. In the preliminary examination of data, it is reasonable to calculate these descriptive statistics for the populations under consideration, as well as for relevant subgroups.

Although the mean value is the most representative single summary statistic in the parametric setting, the median is the most representative single summary statistic in the nonparametric setting. The median represents the midpoint of ordered values and is formally defined as follows: if the number of values, n, is odd, so that n = 2m + 1 for some whole number, m, then the median is the value that has a ranking of m + 1 when the values are ranked in ascending order. If the number of values, n, is even, so that n = 2m for some whole number, m, then the median is the average of the two values ranked m and m + 1 when the values are ranked in ascending order. The median has about half the values above it and about half the values below it. When ranks of individual values are examined as a percentage of the total number of values ranked (the so-called percentiles of a distribution), the median is the 50th percentile. Intuitively, when dealing with two populations of possibly different sizes, the median would seem to be the most representative single comparative statistic. As opposed to the mean value, the median is largely unaffected by the presence of large or small outlying values. However, in a population that is symmetric, the mean and median will take similar values. If a primary hypothesis of a research study would be addressed by a test of equality of means in the parametric setting, then a nonparametric substitute could be a test of equality of medians.

To describe the variability of values in a population, the standard deviation is cited in a parametric setting. The corresponding nonparametric expression is the interquartile range (IQR), defined as the interval (25th percentile–75th percentile) and reported as the two values separated by a hyphen.
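The definitions of the median and the IQR given above can be sketched in a few lines of Python. This is only an illustration: the function names and the data are hypothetical, and the percentile convention (the "inclusive" method of the standard library's statistics.quantiles) is an assumption, because software packages differ in how they interpolate percentiles.

```python
import statistics


def median_by_rank(values):
    """Median as defined in the text: the value ranked m + 1 when n = 2m + 1,
    or the average of the values ranked m and m + 1 when n = 2m."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    m = n // 2
    if n % 2 == 1:
        return ordered[m]  # rank m + 1 corresponds to 0-based index m
    return (ordered[m - 1] + ordered[m]) / 2  # ranks m and m + 1


def interquartile_range(values):
    """Return (25th percentile, 75th percentile).

    Assumption: the 'inclusive' interpolation method; other packages
    may report slightly different quartiles for the same data."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


# Hypothetical data: seven procedure times (minutes), one large outlier.
times = [12, 15, 11, 14, 13, 16, 95]

print("median:", median_by_rank(times))
print("mean:", round(statistics.mean(times), 1))
print("IQR: %s-%s" % interquartile_range(times))
```

With these made-up values the single outlier pulls the mean well above every other observation, whereas the median and the IQR stay within the bulk of the data.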
For very small populations, the IQR may be somewhat affected by outliers. The range is also of interest and is often presented; however, because it is strongly affected by the size of the population observed as well as by the presence of any outliers, its main purpose is to add completeness to the descriptive statistics of the population. Other percentiles of a distribution may also be given, but these usually arise in specific problem settings for comparison with published historical data.

Box plots are highly useful pictorial descriptions of the distribution of values. They show the IQR as a rectangular box, with an indication of where the median lies inside the box. Some box plots indicate the entire range beyond the quartiles by extended lines called whiskers. The latter figures are referred to as box-and-whisker plots. Some more elaborate box-and-whisker plots omit outliers from the whiskers and instead indicate them by a different symbol (e.g., the asterisk used by the Minitab statistical software package, Minitab Inc., State College, Pa.) beyond the whisker lines, to show a more useful picture of the range. These figures allow sample data to be shown nonparametrically in a way that parallels confidence intervals around a mean in the parametric setting.

After the preliminary examination of data, the issue is whether the observed data differ across groups. The situations with two groups will be discussed first. These nonparametric tests are comparable to the Student t tests of the parametric setting and, as in the parametric setting, the limitation to only 2 groups allows a wider range of alternatives than is available for 3 or more groups of interest.

If the data to be analyzed consist of observations in matched pairs, the situation is analogous to the parametric paired t test. The sign test is a simple way to test the null hypothesis that the median difference is zero. Intuitively, this test examines the signs of the paired differences and infers that the null hypothesis is false whenever a binomial test, interpreting a plus sign as heads and a minus sign as tails, would reject the null hypothesis of a fair coin (i.e., the null hypothesis that probability of heads = probability of tails = 1/2). If the two members of a pair have the same value, so that the difference is zero, that pair is omitted from the analysis. The sign test, however, has the undesirable property of not making full use of the available data: it uses only the direction of each difference, not its size. A nonparametric test for these data that makes greater use of the available data is the Wilcoxon matched-pairs signed-rank test, which also ranks the sizes of the differences. Although more complicated than the sign test, it is widely used to test the null hypothesis that the median difference is zero.
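As a rough illustration of the two paired tests just described, the sketch below computes an exact two-sided sign test and the rank sums used by the Wilcoxon matched-pairs signed-rank test. The function names and the before/after scores are hypothetical. Handling ties among the absolute differences by average ranks is a common convention, assumed here. No P value is computed for the signed-rank sums; in practice that comes from tables or a statistical package.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test of a zero median paired difference.

    Pairs with a zero difference are omitted, as described in the text.
    Returns (number of plus signs, number of minus signs, P value)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    k = min(plus, n - plus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, n - plus, min(1.0, 2 * tail)


def signed_rank_sums(before, after):
    """Rank sums (W+, W-) for the Wilcoxon matched-pairs signed-rank test.

    Absolute differences are ranked in ascending order; tied values share
    the average of the ranks they occupy (an assumed convention)."""
    diffs = sorted((a - b for b, a in zip(before, after) if a != b), key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i + 1 through j
        for d in diffs[i:j]:
            if d > 0:
                w_plus += midrank
            else:
                w_minus += midrank
        i = j
    return w_plus, w_minus


# Hypothetical symptom scores for 8 patients before and after treatment.
before = [8, 7, 6, 9, 5, 7, 8, 6]
after = [5, 6, 6, 4, 7, 3, 5, 2]

print("sign test (plus, minus, P):", sign_test(before, after))
print("signed-rank sums (W+, W-):", signed_rank_sums(before, after))
```

The sign test sees only one plus sign and six minus signs among the seven non-zero differences. The signed-rank sums additionally record that the single positive difference was among the smallest in size, which is the extra information the Wilcoxon test exploits.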
If the data to be analyzed are not paired observations, the situation is analogous to the parametric two-sample t test. An adaptation of the chi-square test for contingency table data, categorizing each data point as above or below the overall median, is simple to apply but, like the sign test for paired data, fails to make full use of the available data. A nonparametric test for these data that makes greater use of the available data is the rank sum test, or its equivalent, the Mann-Whitney U test. In these tests an overall ranking of the ungrouped data yields the sums of ranks within each group. The sums of ranks yield the test statistics, for which tables are readily available. It should be noted that although the null hypothesis tested is equality of central location, the rank sum test and Mann-Whitney U test are really tests of the null hypothesis that the two populations from which the samples arose have identical distributions, involving both central location and spread.

To compare two distributions, the Kolmogorov-Smirnov goodness-of-fit test may be an excellent nonparametric technique to use. It is often used to compare two sets of data with unknown true distributions, whereas the better-known chi-square goodness-of-fit test is appropriate to test whether data conform to a specified theoretical distribution. A type of generalization of the Kolmogorov-Smirnov goodness-of-fit test is the Kruskal-Wallis test for 3 or more populations, which tests whether the underlying distributions are the same.

Returning to the line of thought of testing equality of central location, the parametric situation moves from t tests when comparing 2 groups to analysis of variance (ANOVA) when comparing 3 or more groups. The parametric ANOVA tests of equality of means have nonparametric counterparts. The one-way ANOVA is used when there is only one factor defining the differences between the groups to be compared. The Kruskal-Wallis one-way ANOVA by ranks allows nonparametric testing of the null hypothesis of equality of medians for 3 or more groups distinguished by different levels of a single factor.

Sometimes the groups being compared are meaningfully distinguished simultaneously by different levels of two different factors. In the parametric setting a two-way ANOVA model would be appropriate. In the nonparametric setting, a frequently used analogous test is the Friedman two-way ANOVA by ranks. There is one notable difference from the parametric setting, in which added insight might be gained by investigating the potential interaction of the two factors in the model: treatment of interactions is less tractable in the nonparametric setting because, among other reasons, one factor is often assigned to the random effect of blocking, which is part of the experimental design.
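The rank-based statistics for unpaired groups described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the three groups of data are hypothetical, tied values receive average ranks, and the Kruskal-Wallis H statistic is computed without a correction for ties. P values are not computed. They would come from published tables or, for H, from the usual chi-square approximation with (number of groups - 1) degrees of freedom.

```python
def midranks(values):
    """Rank values in ascending order (1-based); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # average of ranks i + 1 .. j
        i = j
    return ranks


def rank_sums(groups):
    """Rank the pooled (ungrouped) data, then sum the ranks within each group."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums


def mann_whitney_u(x, y):
    """U statistic for group x: U = R_x - n_x (n_x + 1) / 2."""
    r_x, _r_y = rank_sums([x, y])
    return r_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), no tie correction."""
    n_total = sum(len(g) for g in groups)
    sums = rank_sums(groups)
    weighted = sum(r * r / len(g) for r, g in zip(sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical measurements in three independent groups of 4 patients.
group_a = [3, 5, 6, 8]
group_b = [7, 9, 10, 12]
group_c = [11, 13, 14, 15]

print("rank sums:", rank_sums([group_a, group_b, group_c]))
print("U (a vs b):", mann_whitney_u(group_a, group_b))
print("H (a, b, c):", round(kruskal_wallis_h([group_a, group_b, group_c]), 3))
```

Both statistics are built from the same ingredient, the sums of ranks within each group after one overall ranking, which is why the Kruskal-Wallis test is a natural extension of the rank sum test to 3 or more groups.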
This column has outlined a progression of nonparametric techniques corresponding to the increasingly complex progression of techniques available for analysis in the parametric setting. The analytical approach for a research study must be chosen, first, to address the hypotheses of primary importance to the study and, second, to conform to limitations of the data. Power calculations depend on the analytical strategy that will be used. The insights gained from examining the distributional properties of the key variables in preliminary data are essential to the researcher conceptualizing the analytical strategy. This is especially true when the nature of the data may narrow choices of tests of significance, because the calculations of needed sample size are correspondingly impacted. The next column will suggest some additional analytical tools for situations where standard techniques may not be appropriate.

Sara M. Debanne, PhD
Douglas Y. Rowland, PhD
Cleveland, Ohio
doi:10.1067/mge.2001.116630

GASTROINTESTINAL ENDOSCOPY

VOLUME 54, NO. 3, 2001