European Journal of Pain (2001) 5: 457–463 doi:10.1053/eujp.2001.0257, available online at http://www.idealibrary.com
A review of statistical methods for analysing pain measurements
Noori Akhtar-Danesh
Department of Community Medicine, Jahrom School of Medicine, Jahrom, Iran
In this article the usual methods of analysis of postoperative pain data, measured by a visual analogue scale at several time points, have been studied. These methods are mostly inappropriate or inefficient. Some better methods are discussed and the antedependence test is suggested as a suitable method. With this method, differences between overall profiles and at each time point are detectable and the method can be used efficiently even if some observations are missing. © 2001 European Federation of Chapters of the International Association for the Study of Pain
KEYWORDS: antedependence test, postoperative pain, repeated measures, missing data.
Paper received 10 October 2000 and accepted in revised form 22 May 2001.
Correspondence to: N. Akhtar-Danesh, Department of Community Medicine, Jahrom School of Medicine, Jahrom, Iran. Tel and Fax: 0098 791 31520; E-mail: [email protected]
1090-3801/01/040457 + 07 $35.00/0 © 2001 European Federation of Chapters of the International Association for the Study of Pain
INTRODUCTION
Providing relief for patients suffering acute pain is of great importance to the medical profession. Studies aimed at reducing the pain experienced by patients in the immediate postoperative stages, that is in the hours after they regain consciousness, can be found across a broad range of clinical medicine. The aim of such trials is to determine which of two or more treatments gives the greatest amount of pain relief or which surgical procedure for the same operation is the less painful. Pain is a subjective and private event (McGrath and Unruh, 1994) and, unlike many other variables such as heart rate or blood pressure, cannot be directly measured. A solution to this problem which is often seen in acute pain studies is to ask patients to assess their own pain level on an appropriately worded scale.
VISUAL ANALOGUE SCALE
Among the techniques which are used to measure postoperative pain, the Visual Analogue Scale (VAS) is perhaps the most sensitive (Huskisson, 1974) and the most familiar (Chapman et al., 1992). Usually it involves the use of a 10 cm straight line on a piece of white paper labelled at the left end as 'no pain' and at the right as 'worst pain imaginable' or similar descriptions (Revill et al., 1976; Max and Laska, 1991; Gracely, 1994). To use the VAS the patient is simply asked to mark on the line the amount of pain that he or she is feeling. Usually, in clinical trials pain is measured at several times for each subject in each treatment group: a prime example of a set of repeated measures data.
The nature of VAS and which methods to use for analysing VAS scores are matters of controversy. Compared with the other pain measures, an important advantage of VAS is that it provides an unlimited number of possible responses along a single continuum (Scott and Huskisson, 1976). One issue is whether the VAS represents ordinal, discontinuous data or interval, continuous information. An ordinal scale refers to simple rank ordering of data, usually specified by numbers. In contrast, an interval scale refers to equal distance between unit values. Most psychological tests such as the VAS provide more information than simple rank ordering, but equality of units cannot be ensured. The data thus lie somewhere between ordinal and interval values (Philip, 1990). Some statisticians consider VAS as an improved version of ordered categories (Altman, 1991). However, some papers argued that VAS measures can be treated not only as an interval scale but also as a ratio scale (Price et al., 1983; Price and Harkins, 1992; Price et al., 1994). Philip (1990) indicated that, according to some statistical sources on psychological tests, the parametric techniques are applicable for psychological test data that do not represent clearly interval values. In particular, Maxwell (1978) has validated the use of parametric statistics for VAS scores. In summary, the VAS represents data that, although not clearly interval values, are most appropriately analysed by parametric techniques. These techniques permit statistical inferences without altering the risk of type I and type II errors, whereas use of non-parametric techniques may increase the likelihood of type II errors (Philip, 1990).
In this paper, we first review the usual methods of analysing pain scores based on VAS measures and consider whether these methods are suitable. In suggesting suitable methods, we prefer the parametric methods for the reasons given above. In addition, in marking each score the patient states the amount of pain as a proportion of the VAS line, which might differ by up to 20 cm. Hence, if the distribution of scores is far from normal, an arcsine or logit transformation is likely to convert the distribution to normal. Furthermore, when the sample size in each group is sufficient, the central limit theorem means that there would be no need for any transformation. Nonetheless, when the sample size in each group is too small, using non-parametric methods seems to be appropriate.
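The two transformations mentioned above can be sketched as follows. This is only an illustration: it assumes that scores are recorded in millimetres on a 100 mm line (as the 0 to 100 values of Table 1 suggest), and the half-millimetre adjustment applied before the logit is a hypothetical choice made here so that scores of 0 and 100 stay finite.

```python
import math

VAS_LENGTH_MM = 100.0  # assumption: a 10 cm line scored in millimetres


def arcsine_transform(score):
    """Arcsine square-root transform of a VAS score expressed as a proportion."""
    return math.asin(math.sqrt(score / VAS_LENGTH_MM))


def logit_transform(score, edge=0.5):
    """Logit of the VAS proportion.

    `edge` (hypothetical choice) pulls scores of 0 and 100 half a
    millimetre inside the line so that the logarithm is defined.
    """
    clipped = min(max(score, edge), VAS_LENGTH_MM - edge)
    p = clipped / VAS_LENGTH_MM
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    for s in (0, 25, 50, 75, 100):
        print(s, round(arcsine_transform(s), 3), round(logit_transform(s), 3))
```

Both functions are monotone in the score, so the ordering of patients is preserved; only the spacing between scores changes.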
AN EXAMPLE
Table 1 shows pain measures from a randomized clinical trial (Seymour et al., 1996). Seventy-nine patients who had their third molar tooth extracted were in the study. The first 39 patients received placebo and the other 40 received 1000 mg paracetamol after tooth extraction. Pain was measured using VAS. Figure 1 shows a scatter plot of pain for the two groups for the first 90 min after tooth extraction.
TABLE 1. Example data: pain measures based on VAS, 0, 15, 30, 45, 60 and 90 min after treatment by placebo or paracetamol. Missing data are denoted by *.
Placebo                        |  Paracetamol
   0   15   30   45   60   90  |     0   15   30   45   60   90
  76   58   48   42   36   34  |    65   67   66   66   30   30
  54   54   25   13   13    9  |    44   42   42   37   29   29
  46   48   30   29   29   31  |    30   22    6    6    3    3
  61   53   41   37   37   38  |    36   36   27   15   16   18
  53   44   40   25   35   46  |    41   42   35   34   43   22
  45   39   24   12    0    0  |    86   59   54   54   36   26
  76   83   64   43   33   23  |    52   39   30   24   19   11
  47   42   41   35   37   41  |    53   53   43   41   41   20
  50   27   15   12    8   38  |    68   73   59   33   20   20
  47   47   18   16   17   38  |    66   56   49   45   27   24
  45   45   34   21    9    2  |    72   54   53   53   53   54
  69   73   59   54   54   39  |    41   38   55   55   44   27
  75   73   76   75   62   73  |    43   19    7    7    7    0
  43   26   11    7   28   38  |    52   53   38   28   26   24
  61   61   54   50   39   25  |    44   31   20   20   18    5
  63   63   62   60   60   58  |    58   55   53   58   46   37
  38   36   36   10    9   10  |    64   50   36   16   18    3
  72   61   48   47   28   46  |    30   29   31   29   19   28
  76   66   61   56   67   90  |    31   30   28   17   18   10
  68   69   71   70   70   74  |    44   44   23   24   24   12
  46   49   39   42   33   32  |    51   50   33   33   20   15
  54   45   30   30   15   10  |    68   69   54   60   58   44
  44   45   47   20   70   70  |    53   51   42   12    5    5
 100  100  100  100   81   86  |    52   45   41   33   28   19
  48   27   27   13   13   28  |    38   37   30   30   30   37
  36   36   37   35   37   39  |    39   32   30   29   29   25
  37   20   20   17   16   16  |    31   17   17    4   15   12
  45   44   44   43   41   41  |    55   55   42   17   17   21
  48   26   26   40   47    *  |    65   68   69   52   44   36
  78   78   75   63   63   63  |    63   61   53   37   37   36
  49   52   36    0    0    0  |    30   28   22   17   16   14
  41   34   29   26   22   34  |    83   83   68   70   43   45
  48   49   49   42   41   48  |    69   57   48   47   46   18
  72   70   61   56   48   42  |    76   77   65   29    7    0
  60   64   54   53   53   59  |    52   48   46   39   38   32
  80   79   67   66   61   38  |    45   45   44   44   43    *
  62   62   52   45   43   43  |    78   78   81   88   62   60
  42   41   44   44   45   45  |    83   82   82   74   81   89
  49   42   41   41   42   48  |    53   52   48   46   46   46
                               |    65   44   28   28   16   19
FIG. 1. Scatter plot of the VAS measures of Table 1.
USUAL METHODS OF ANALYSIS
In this section the statistical methods which are commonly used in the literature are discussed and exemplified using the data of Table 1. Wherever it is appropriate, a 95% confidence interval (CI95%) for the mean difference between two groups and the p value are provided.
In the time-by-time method the differences among treatment groups are examined at each time point. The most frequently used tests are the t-test and Mann–Whitney test for two treatment groups, and analysis of variance (ANOVA) and Kruskal–Wallis tests for more than two treatment groups. The advantage of this method is that it is easy to use and does not need much statistical background. However, it suffers from two substantial defects. First, successive tests are not independent, particularly when successive measurements are highly correlated. Second, successive tests cannot measure the treatment effects over time (Matthews et al., 1990; Armitage and Berry, 1994; Kenward, 1987). Using the t-test for the data of Table 1, the two groups differ only at time point 6 (CI95% (5.2, 23.4), p = 0.003).
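A single time-by-time comparison of the kind described above can be sketched as a pooled-variance two-sample t interval. The sketch uses only the standard library, so the critical value of the t distribution is passed in by the caller (an assumption of this example; 2.306 is the tabulated two-sided 5% value for 8 degrees of freedom). The ten scores are just the first five 90 min values of each group in Table 1, used for illustration, not a reanalysis of the trial.

```python
import math


def pooled_t_interval(group_a, group_b, t_crit):
    """Return (difference, lower, upper, t_statistic) for mean(a) - mean(b).

    Missing observations are passed as None and dropped. `t_crit` is the
    two-sided critical value for len(a) + len(b) - 2 degrees of freedom,
    looked up by the caller.
    """
    a = [x for x in group_a if x is not None]
    b = [x for x in group_b if x is not None]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    se = math.sqrt(pooled_var * (1.0 / len(a) + 1.0 / len(b)))
    diff = mean_a - mean_b
    return diff, diff - t_crit * se, diff + t_crit * se, diff / se


if __name__ == "__main__":
    placebo_90 = [34, 9, 31, 38, 46]       # illustrative subset of Table 1
    paracetamol_90 = [30, 29, 3, 18, 22]   # illustrative subset of Table 1
    print(pooled_t_interval(placebo_90, paracetamol_90, t_crit=2.306))
```

Repeating this call at each of the six time points reproduces the time-by-time approach, including its weakness: the six intervals are computed as if the time points were unrelated.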
In repeated measures ANOVA (RepANOVA), time, treatment and time-by-treatment interactions are modelled. This is considered as a split-plot design, with individuals as main plots and time points as subplots. Apart from time-by-time methods, RepANOVA is used more than other methods to analyse repeated measures data. The technical drawback of this method is that the subunits (time points) cannot be randomized within main units. In using this method it is supposed that the correlation coefficients are equal between all pairs of time points and that data at all time points have the same variance. This seems to be a naive assumption because in many situations the correlation between closer time points is higher than that between more distant time points. Consequently, different contrasts of subplots (time points) could have different variances. However, there are some useful methods for correction, for example the method suggested by Greenhouse and Geisser (1959). This analysis indicates that the difference between the two groups of Table 1 is significant (p < 0.001).
Summary measures are derived from all measurements that belong to each subject and are used instead of the raw data. In general, many types of summary measures can be made, for example mean, median, area under curve (AUC), maximum value and minimum value. Two particular examples for pain scores are summed pain intensity difference (SPID) and maximum pain relief. A suitable summary measure may or may not exist. In the following the use of AUC, SPID and maximum pain relief is explained.
Area under curve
This method is discussed by Matthews et al. (1990) in terms of general application to clinical data and by Max and Laska (1991) with direct reference to acute postoperative pain measurement. For the data of Table 1 the mean values of AUC for placebo and paracetamol groups are 64.5 and 56.6 respectively. Using a t-test, the difference between the two groups is not significant (CI95% (−3.3, 19.2), p > 0.10). It is worth mentioning that for the two patients who withdrew from the study the last measures before drop-out are used instead of missing values for calculating the AUC. The method of computing the AUC can be found in Matthews et al. (1990). AUC is the most frequent summary measure found in the literature.
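As an illustration of the AUC summary, the following sketch applies the trapezium rule to one patient's profile and carries the last observed score forward over missing values, as described above. The function names are hypothetical, and the result is in score units multiplied by minutes; the scaling behind the group means quoted in the text is not stated there, so no attempt is made to reproduce those figures.

```python
TIMES_MIN = [0, 15, 30, 45, 60, 90]  # measurement times of Table 1, in minutes


def carry_forward(scores):
    """Replace each missing value (None) by the last observed score."""
    filled, last = [], None
    for s in scores:
        if s is None:
            if last is None:
                raise ValueError("no earlier observation to carry forward")
            s = last
        filled.append(s)
        last = s
    return filled


def auc_trapezoid(times, scores):
    """Area under the pain-time curve by the trapezium rule."""
    ys = carry_forward(scores)
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(times, times[1:], ys, ys[1:]))


if __name__ == "__main__":
    first_placebo = [76, 58, 48, 42, 36, 34]   # first placebo row of Table 1
    withdrew = [48, 26, 26, 40, 47, None]      # placebo row with a missing value
    print(auc_trapezoid(TIMES_MIN, first_placebo))
    print(auc_trapezoid(TIMES_MIN, withdrew))
```

Once each patient is reduced to a single AUC value, the two groups can be compared with an ordinary two-sample test.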
Summed pain intensity differences
For each subject at each time, the pain intensity difference (PID) is the difference between the pain score and the baseline score, in the direction that a positive PID indicates that the treatment was effective. The SPID is the area under the curve of PIDs. For the data of Table 1 the mean values of the SPID for placebo and paracetamol groups are 20.3 and 24.8 respectively, and the difference between means is not significant (CI95% (−11.6, 2.6), p > 0.10).
Despite their simplicity, neither AUC nor SPID may be able to distinguish between different shapes of profiles. For instance, a short-acting, highly effective treatment could have nearly the same AUC as a long-acting, marginally effective treatment (Max and Laska, 1991). Furthermore, the way that missing data are dealt with is a matter of some controversy.
Maximum pain relief
Maximum pain relief is the maximum value of PID for each subject. For the data of Table 1 the mean of maximum pain relief is 25.0 for the placebo group and 30.1 for the paracetamol group. Using a t-test, the difference between the means of the two groups is not significant (CI95% (−12.1, 2.1), p > 0.10). This summary measure ignores rather a lot of the available data.
Multivariate ANOVA (MANOVA) is rarely found in the literature. This method is explained in the next section.
The routine application of RepANOVA, summary measures and MANOVA is severely hindered by the occurrence of missing data. Such data usually occur as a result of the ethical requirement, necessarily imposed on pain relief trials, that the patient has the right to withdraw from the study whenever he or she wants to, which is generally documented in the literature as when the pain relief is inadequate.
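The PID-based summaries defined in this section can be sketched in the same way. The example assumes a complete profile (missing values would first have to be handled, for instance by carrying the last observation forward), takes the maximum over all time points including baseline, and works in score units multiplied by minutes; these are choices made for illustration and the function names are hypothetical.

```python
TIMES_MIN = [0, 15, 30, 45, 60, 90]  # measurement times of Table 1, in minutes


def pain_intensity_differences(scores):
    """PID at each time: baseline minus current score, so positive means relief."""
    baseline = scores[0]
    return [baseline - s for s in scores]


def spid(times, scores):
    """Summed pain intensity difference: trapezium-rule area under the PID curve."""
    pids = pain_intensity_differences(scores)
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for t0, t1, p0, p1 in zip(times, times[1:], pids, pids[1:]))


def max_pain_relief(scores):
    """Largest PID observed for the subject."""
    return max(pain_intensity_differences(scores))


if __name__ == "__main__":
    first_placebo = [76, 58, 48, 42, 36, 34]  # first placebo row of Table 1
    print(pain_intensity_differences(first_placebo))
    print(spid(TIMES_MIN, first_placebo), max_pain_relief(first_placebo))
```

The example makes the loss of information visible: maximum pain relief keeps one number from the six recorded for each patient.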
APPROPRIATE METHODS FOR ANALYSIS
In this section some statistical methods which could be appropriate for analysis of pain data measured by VAS are mentioned and the results of applying them to the above example are presented.
Summary or derived measures
This method has been explained in the previous section.
Log-rank test
This method is based only on the time of withdrawals and therefore fails to use any of the pain scores. The null hypothesis of the test assumes that the chance of an event occurring is the same in all treatment groups (Armitage and Berry, 1994). Therefore, the number of events (withdrawals) in each group is proportional to the number of subjects at risk. In addition, drop-outs which occur for reasons other than insufficient treatment effect are considered as censored data. This method is not used for the data of Table 1 because there are not enough drop-outs in this table.
Multivariate analysis of variance
In this method several variables are measured for each subject. The aim is to find any difference between the means of the variables in the different groups. A significant result indicates that there are at least two variables whose means are different. In pain studies it is supposed that measurements at different time points are different variables. Using MANOVA the difference between the two groups of Table 1 is significant (p = 0.003). It also indicates that this difference is due to a difference in the sixth time point (minute 90).
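The log-rank calculation described earlier in this section can be sketched for two groups as follows. Each subject is a pair of a time and a flag that is True for a withdrawal because of inadequate relief and False for a censored observation. The data in the example are hypothetical, chosen only to show the mechanics, and the statistic is the usual one referred to a chi-squared distribution with one degree of freedom.

```python
def logrank_two_groups(group_a, group_b):
    """Return (observed_a, expected_a, chi_square) for a two-group log-rank test.

    Each group is a list of (time, event) pairs; event is True for a
    withdrawal counted as an event and False for a censored subject.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in group_a if u >= t)   # at risk in A at time t
        n_b = sum(1 for u, _ in group_b if u >= t)   # at risk in B at time t
        d_a = sum(1 for u, e in group_a if e and u == t)
        d_b = sum(1 for u, e in group_b if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        # Under the null hypothesis events are shared in proportion to numbers at risk.
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    return observed_a, expected_a, chi_square


if __name__ == "__main__":
    # Hypothetical withdrawal times in minutes; False marks censoring at 90 min.
    placebo = [(30, True), (45, True), (90, False), (90, False)]
    active = [(60, True), (90, False), (90, False), (90, False)]
    print(logrank_two_groups(placebo, active))
```

The function never looks at a pain score, which is exactly the limitation noted above.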
A REVIEW OF THE LITERATURE
To exemplify the usual methods in the literature, 73 papers from 12 journals which used VAS to measure postoperative pain were reviewed. Studies of chronic or experimental pain were not included. These papers were published from January 1993 to June 1994 and were available at the time of study, October 1994, in the Medical Library of the University of Newcastle upon Tyne, UK.
RESULTS
Table 2 shows the frequency of papers in different journals. Sixty-five (89%) papers were from seven professional analgesia journals. Table 3 shows the statistical tests that had been applied in these studies. As can be seen, time-by-time methods were used in 59 (81%) studies. In 11 studies (15%) RepANOVA was used. Summary measures and MANOVA were used in only two studies and one study respectively.
TABLE 2. Frequencies of journals.
Journal                                              Frequency   Percentage
Anaesthesia                                                 15         20.5
Anesthesia and Analgesia                                    15         20.5
British Journal of Anaesthesia                              12         16.4
Acta Anaesthesiologica Scandinavica                          7          9.6
Canadian Journal of Anaesthesia                              7          9.6
Anesthesiology                                               5          6.8
Anaesthesia and Intensive Care                               4          5.5
The Australian and New Zealand Journal of Surgery            2          2.7
British Journal of Surgery                                   2          2.7
Pain                                                         2          2.7
Annals of Royal College of Surgeons of England               1          1.4
The Annals of Thoracic Surgery                               1          1.4
Total                                                       73        100.0
TABLE 3. Statistical tests.
Test(s)               Frequency   Percentage
Time-by-time tests           59         80.8
RepANOVA                     11         15.1
Summary measures              2          2.7
MANOVA                        1          1.4
Total                        73        100.0
These results suggest that most pain studies do not use appropriate methods of analysis. In the next section we introduce the antedependence test as an alternative method for analysing pain data. This method has none of the theoretical shortcomings of time-by-time methods and RepANOVA. In addition, it can be used when there is a reasonable number of drop-outs.
ANTEDEPENDENCE TEST
This test is based on the fact that usually in longitudinal data each observation on each subject is positively correlated with previous observations. The antedependence method was first proposed by Gabriel (1961, 1962). In an antedependence structure of order r, the observation at occasion i (i > r), given the previous r observations, is independent of all earlier observations. In effect, the antedependence test uses the method of analysis of covariance at each time point and combines the results of all time points to make a χ² statistic. To avoid using mathematical expressions and to keep the paper at a reasonable size, the antedependence test is not discussed in detail and interested readers are referred to Kenward (1987) and Akhtar-Danesh and Appleton (2000).
Gabriel (1961) also argued that using a test based on an antedependence structure of order 1 is preferable even if it does not hold exactly but is a good approximation. Byrne and Arnold (1983) showed that a first-order antedependence test is more powerful than MANOVA. Kenward (1987) introduced a method to be used for data with the antedependence structure. This method consists of two tests: first, a test for finding the order of the antedependence structure; second, a test for comparing treatment groups.
We have applied the antedependence test to the data of Table 1. Using the test, an antedependence structure of order 1 fits the data well. Also, the difference between profiles is significant (p < 0.01). Knowing that the profiles differ, it may be of interest to detect at which time points the differences happen. This is done by the ordinary analysis of covariance which is carried out at each time point as part of the antedependence test, with the conclusion that the means of treatments differ at minutes 60 and 90 (compared with the t-test and MANOVA, in which the difference occurred only at minute 90). Details of calculations can be found in Akhtar-Danesh and Appleton (2000) and Akhtar-Danesh (1997).
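The covariance adjustment that the test applies at a single time point can be illustrated for an antedependence structure of order 1, where the only covariate is the observation at the preceding time point. The sketch below is a plain one-covariate analysis of covariance with a common within-group slope; it is not the full procedure of Kenward (1987), which also tests the order of the structure and combines the time points into one statistic. The function name and the data are hypothetical.

```python
def adjusted_difference(prev_a, curr_a, prev_b, curr_b):
    """Treatment difference at the current time, adjusted for the previous time.

    Returns (adjusted_difference, common_slope), where the slope is the
    pooled within-group regression of the current score on the previous one.
    """
    def centred_sums(x, y):
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        return mean_x, mean_y, sxx, sxy

    mxa, mya, sxx_a, sxy_a = centred_sums(prev_a, curr_a)
    mxb, myb, sxx_b, sxy_b = centred_sums(prev_b, curr_b)
    slope = (sxy_a + sxy_b) / (sxx_a + sxx_b)
    # Remove the part of the raw difference explained by the previous scores.
    return (mya - myb) - slope * (mxa - mxb), slope


if __name__ == "__main__":
    # Hypothetical scores at two successive time points for two small groups.
    placebo_prev, placebo_curr = [40, 55, 62, 70], [38, 50, 60, 71]
    active_prev, active_curr = [42, 50, 66, 68], [30, 41, 52, 55]
    print(adjusted_difference(placebo_prev, placebo_curr, active_prev, active_curr))
```

Because successive pain scores are strongly correlated, conditioning on the previous score removes much of the between-patient variation, which is why intervals from this approach can be narrower than those from unadjusted t-tests.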
COMPARING THE ANTEDEPENDENCE TEST WITH THE t-TEST AND REPEATED MEASURES ANALYSIS OF VARIANCE
As discussed earlier, in a time-by-time method the dependence between time points is ignored, so in the presence of significant serial correlation the results will not be the same as from an antedependence test. Table 4 gives CI95% for the difference between the means of the two treatment groups of Table 1 at each time point based on the antedependence method and the t-test. As can be seen from this table, the confidence interval for the antedependence test has about half the width of that for a t-test. The former may therefore be expected to be the much more powerful method. Also, comparison with RepANOVA shows that the power of the antedependence test is nearly the same as that of RepANOVA (Akhtar-Danesh and Appleton, 2000).
TABLE 4. Confidence intervals for the differences between two groups.
Time point   t-Test        Antedependence test
1            2.3 ± 6.9     2.3 ± 6.9
2            2.8 ± 7.8     2.8 ± 3.5
3            2.1 ± 8.3     2.1 ± 3.5
4            1.9 ± 9.2     1.9 ± 4.3
5            6.5 ± 8.6     6.5 ± 4.4
6            14.3 ± 9.1    14.3 ± 4.2
DISCUSSION
Analysis of acute pain measures based on a VAS needs to be handled carefully. As this investigation shows, it seems that in most pain studies inappropriate or inefficient methods are used (Table 3). The reasons may be, firstly, that the time-by-time method is very easy to use, needing only simple statistical calculations and little statistical background, and, secondly, that frequent missing data prevent the use of other suitable methods.
We suggest using the antedependence test for analysing acute pain measures in which there is usually a positive correlation between measurements at successive time points, particularly adjacent time points. Using this method the differences between overall profiles and at each time point are detectable. In addition, all the calculations can be done using the method of analysis of covariance, which is available in general statistical software. In a simulation study it has been shown that an antedependence structure of low order, 1 or 2, fits acute pain measures well. Moreover, when there are not too many missing data this method can still be used efficiently (Akhtar-Danesh, 1997), although the bias which may be imposed on estimation of the treatment effects by non-random missing data is not known.
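The comparison reported in Table 4 can be checked with a few lines of arithmetic. The sketch reads each Table 4 entry as a difference followed by the half-width of its 95% interval; this reading is an interpretation, supported by the fact that 14.3 with a half-width of 9.1 reproduces the interval (5.2, 23.4) quoted for the t-test at the sixth time point.

```python
# Table 4 read as (difference, half-width of the 95% interval) per time point.
T_TEST = {1: (2.3, 6.9), 2: (2.8, 7.8), 3: (2.1, 8.3),
          4: (1.9, 9.2), 5: (6.5, 8.6), 6: (14.3, 9.1)}
ANTEDEPENDENCE = {1: (2.3, 6.9), 2: (2.8, 3.5), 3: (2.1, 3.5),
                  4: (1.9, 4.3), 5: (6.5, 4.4), 6: (14.3, 4.2)}


def interval(diff, half_width):
    """Lower and upper limits of the interval diff ± half_width."""
    return diff - half_width, diff + half_width


def times_excluding_zero(table):
    """Time points whose interval lies entirely on one side of zero."""
    selected = []
    for time_point, (diff, half_width) in sorted(table.items()):
        lower, upper = interval(diff, half_width)
        if lower > 0 or upper < 0:
            selected.append(time_point)
    return selected


def width_ratios():
    """Antedependence interval width divided by t-test interval width."""
    return {t: ANTEDEPENDENCE[t][1] / T_TEST[t][1] for t in T_TEST}


if __name__ == "__main__":
    print(times_excluding_zero(T_TEST), times_excluding_zero(ANTEDEPENDENCE))
    print({t: round(r, 2) for t, r in width_ratios().items()})
```

At the first time point there is no earlier observation to condition on, so the two intervals coincide; from the second time point onwards the ratio of widths is close to one half.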
ACKNOWLEDGEMENTS
I am grateful to Professor R. A. Seymour for making available to me the data analysed herein. I am also thankful to Dr David Appleton for reading the first draft and Professor Peter Kelly for his useful comments. I would like to thank the referees for their valuable comments.
REFERENCES
Akhtar-Danesh N. Statistical aspects of studies measuring postoperative pain. PhD thesis, University of Newcastle upon Tyne, 1997.
Akhtar-Danesh N, Appleton DR. Using an antedependence test to analyse post-operative pain measurements. Stat Med 2000; 19: 1889–1899.
Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall, 1991.
Armitage P, Berry G. Statistical Methods in Medical Research. Oxford: Blackwell, 1994.
Byrne PJ, Arnold SF. Inference about multivariate means for a nonstationary autoregressive model. J Am Stat Assoc 1983; 78: 850–855.
Chapman CR, Donaldson GW, Jacobson RC. Measurement of acute pain states. In: Turk DC, Melzack R, editors. Handbook of Pain Assessment. New York: Guilford, 1992: 332–343.
Gabriel KR. The model of ante-dependence for data of biological growth. Bull Int Stat Inst, 33rd Sess 1961; 39: 253–264.
Gabriel KR. Ante-dependence analysis of an ordered set of variables. Ann Math Stat 1962; 33: 201–212.
Gracely RH. Studies of pain in normal man. In: Wall PD, Melzack R, editors. Textbook of Pain. Edinburgh: Churchill Livingstone, 1994: 315–336.
Greenhouse SW, Geisser S. On methods in the analysis of profile data. Psychometrika 1959; 24: 95–111.
Huskisson EC. Measurement of pain. Lancet 1974; 2: 1127–1131.
Kenward MG. A method for comparing profiles of repeated measurements. Appl Stat 1987; 36: 296–308.
Matthews JNS, Altman DG, Campbell MJ, Royston P. Analysis of serial measurement in medical research. Br Med J 1990; 300: 230–235.
Max MB, Laska EM. Single-dose analgesic comparisons. In: Max MB, Portenoy RK, Laska EM, editors. Advances in Pain Research and Therapy. New York: Raven, 1991: 55–95.
Maxwell C. Sensitivity and accuracy of the visual analogue scale: a psycho-physical classroom experiment. Br J Clin Pharmacol 1978; 6: 15–24.
McGrath PJ, Unruh AM. Measurement and assessment of paediatric pain. In: Wall PD, Melzack R, editors. Textbook of Pain. Edinburgh: Churchill Livingstone, 1994: 303–313.
Philip BK. Parametric statistics for evaluation of the visual analogue scale. Anesth Analg 1990; 71: 710.
Price DD, Harkins SW. Psychophysical approaches to pain measurement and assessment. In: Turk DC, Melzack R, editors. Handbook of Pain Assessment. New York: Guilford, 1992: 111–134.
Price DD, McGrath PA, Rafii A, Buckingham B. The validation of visual analogue scale as ratio measures for chronic and experimental pain. Pain 1983; 17: 45–56.
Price DD, Bush FM, Long S, Harkins SW. A comparison of pain measurement characteristics of mechanical visual analogue and simple numerical rating scales. Pain 1994; 56: 217–226.
Revill SI, Robinson JO, Rosen M, Hogg MIJ. The reliability of a linear analogue for evaluating pain. Anaesthesia 1976; 36: 1191–1198.
Scott J, Huskisson EC. Graphic representation of pain. Pain 1976; 2: 175–184.
Seymour RA, Kelly PJ, Hawkesford JE. The efficacy of ketoprofen and paracetamol (acetaminophen) in postoperative pain after third molar surgery. Br J Clin Pharmacol 1996; 41: 581–585.