Reliability of joint count assessment in rheumatoid arthritis: A systematic literature review


Peter P. Cheung, MBBS, FRACP, FAMS (a,b,d,*); Laure Gossec, MD, PhD (c); Anselm Mak, MD, FRCP, FAMS (a,b); Lyn March, MBBS, FRACP, MSc (Epidem & Biostat), FAFPHM, PhD (d)

(a) Division of Rheumatology, National University Health System, Singapore
(b) Yong Loo Lin School of Medicine, National University of Singapore, Singapore
(c) Department of Rheumatology, UPMC Univ Paris 06, GRC-UPMC 08 (EEMOIS), AP-HP, Pitié-Salpêtrière Hospital, Paris, France
(d) Department of Rheumatology, University of Sydney, Institute of Bone and Joint Research, Royal North Shore Hospital, St Leonards, Australia

(*) Correspondence to: Division of Rheumatology, University Medicine Cluster, National University Health System, 1E Kent Ridge Rd, Tower Block Level 10, 119228, Singapore. E-mail address: [email protected]

Keywords: Synovitis; Rheumatoid arthritis; Health care professional and patient joint counts; Reliability; Reproducibility; Clinical examination

Abstract

Background: Joint counts are central to the assessment of rheumatoid arthritis (RA), but reliability is an issue.

Objectives: To evaluate, through a systematic literature review, the intra-observer and inter-observer reliability and agreement of joint counts by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability.

Methods: Articles reporting joint count reliability or agreement in RA in PubMed, EMBASE, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability, such as intraclass correlation coefficients (ICCs), were extracted. Data analysis was primarily descriptive due to high heterogeneity.

Results: Twenty-eight studies on health care professionals (HCPs) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49–0.98). Inter-observer reliability between HCPs was higher for TJCs than for SJCs (range of ICC: 0.64–0.88 vs. 0.29–0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31–0.91) than for SJCs (0.16–0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs.

Conclusion: Intra- and inter-observer reliability was high for TJCs for HCPs and patients; among all groups, reliability was better for TJCs than for SJCs. Inter-observer reliability of SJCs was poorer for patients than for HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation of patient-reported joint counts as an outcome measure.

Introduction

Rheumatoid arthritis (RA) is an inflammatory disorder characterized by synovitis, a process of joint inflammation that leads to joint destruction and ultimately to physical disability [1]. Early detection and treatment of synovitis have been shown to reduce radiographic progression and are part of the overarching principles of “treating-to-target,” with the ultimate objective of achieving and remaining in remission [2]. Hence, it is important that physicians are able to reliably detect clinical synovitis.


Joint count assessment by physicians through swollen and tender joints is considered the most conventional way of detecting clinical synovitis [3], and its importance in disease activity assessment is supported by its inclusion in core data sets of disease activity indices such as the Disease Activity Score in 28 joints (DAS28) [4] and the American College of Rheumatology (ACR) response criteria [5] used in clinical trials, research, and clinical practice. In addition, it is part of the Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) RA Core Set [5]. As a measurement outcome, joint counts can measure disease activity in RA [5], are sensitive to change [6], are predictive of radiographic progression [5,7], and correlate with other surrogate markers of disease activity [8]. Although physicians traditionally perform joint counts, nurses and other health professionals (metrologists) have started to participate, either as part of extended roles or for clinical research [9]. In addition, the association of patient-reported swollen and tender joints with joint counts derived by physicians or metrologists has been evaluated [10]. This increased interest is due to the evidence that regular assessment of disease activity is important for patients to achieve and remain in remission [2]. Specifically, detection of residual swollen joints is clinically important for patients in clinical remission (DAS28 < 2.6), as residual swelling contributes significantly to continuing radiographic progression [7]. Hence, monitoring with joint counts, especially between clinic visits, could potentially optimize achievement of “treating-to-target” [11]. However, the reliability of joint counts both within the same assessor (intra-observer) and between different assessors (inter-observer) is debatable, particularly inter-observer reliability [12,13]. It is unclear how reliable physician- or patient-reported joint counts are, and whether the reliability of patient-reported joint counts is comparable to that between health care professionals (i.e., physicians or metrologists). Training through didactic teaching, or standardization through group consensus following set guidelines, has been used to improve reliability of various clinical outcome measures in rheumatology [14,15]. It is uncertain whether this can improve reliability of joint count assessments.
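
For context, the DAS28 referenced above combines the 28-joint counts with an acute-phase reactant and a patient global rating. The commonly cited ESR-based form of the formula is reproduced here from the general literature rather than from this review:

\[
\mathrm{DAS28} = 0.56\,\sqrt{\mathrm{TJC28}} + 0.28\,\sqrt{\mathrm{SJC28}} + 0.70\,\ln(\mathrm{ESR}) + 0.014\,\mathrm{GH}
\]

where TJC28 and SJC28 are the 28-joint tender and swollen counts, ESR is the erythrocyte sedimentation rate (mm/h), and GH is the patient's general health rating on a 0–100 visual analogue scale; DAS28 < 2.6 is the usual remission threshold cited above.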

The objective of this systematic review was to evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by physicians, metrologists, or patients, and to assess whether training or standardization may improve joint count reliability and agreement in these studies.

Methods

Literature search strategy

A systematic search was performed in PubMed, EMBASE, and Cochrane databases up to September 2012. Inclusion criteria were articles reporting reliability or agreement of joint counts or articular indices in RA derived by physicians, metrologists, or patients. The search was limited to humans and publications in English and French. The following exploded medical subject heading (MeSH) terms were used in PubMed: “Arthritis, Rheumatoid” AND either “Joints” OR “severity of illness index” OR “physicians” OR “patients” AND either “reproducibility of results” OR “observer variation” OR “reliability” OR “analysis of variance.”

Fig. 1. Literature search strategy. [Flow diagram: PubMed = 1387, Cochrane = 0, EMBASE = 85; duplicates = 20; publications screened = 1452. Excluded on the basis of title and abstract = 1402 (not on topic = 992; other outcome measures = 105; no reliability testing = 25; not rheumatoid arthritis = 100; editorial or review = 92; not adults = 88), leaving 50 publications; hand search added 4; 6 publications were excluded after obtaining the full text. Publications included = 48 (physician/metrologist = 28; patient = 20).]


The following terms were used in EMBASE: “rheumatoid arthritis”/exp OR “rheumatoid arthritis” AND “joint”/exp OR joint AND counts AND “reliability”/exp OR reliability OR agreement (Fig. 1). A hand-search of references was subsequently performed, including abstracts from meetings of the European League Against Rheumatism (EULAR) and the ACR from 2009 to 2012. Experts in the field were contacted to ensure that important publications were not missed. Reviews, editorials, and comments were excluded.


Data extraction

General data

One investigator (P.C.) selected the articles using the MeSH terms described, formulated with 2 university librarians. Data were extracted using a predetermined form on year of publication, study design, and type of joint count or articular index. The number of observers, the number of joints involved in the reliability testing, and the methodology used were recorded. If available, the level and method of training and standardization were recorded descriptively. Patient characteristics including age, disease duration, and level of disease activity were collected. The joint count has been described in many formats [16–24]. The joint count formats recorded, including the tender joint count (TJC), swollen joint count (SJC), Ritchie articular index (RAI), Thompson–Kirwan articular index (TKAI), and the American Rheumatism Association (ARA) index, are further described in the Appendix. Articular indices with weightings and semi-quantitative grading were analyzed separately from binary joint counts.

Reliability and agreement of joint counts

The main outcome was the reliability or agreement between observers (inter-observer) and within observers (intra-observer). Reliability was assessed at the following levels:

(i) Inter-observer:
- Health care professionals (HCPs): compared between HCPs, either between physicians, physicians vs. metrologists, or between metrologists. Metrologists could be nurses, research assistants, or other health professionals.
- Patients: compared to HCPs, either patients vs. physicians or patients vs. metrologists.
(ii) Intra-observer: test–retest reliability of one observer, either an HCP or the patient.

The different types of results reporting reliability and agreement were extracted from the studies. For the purposes of this review, correlations were also extracted. Statistical measures included the following (Appendix):

(i) Reliability: intraclass correlation coefficient (ICC) [25], variance, or coefficient of variation. An ICC > 0.8 was regarded as excellent, 0.61–0.8 as good, 0.4–0.6 as moderate, and < 0.4 as poor.
(ii) Agreement: kappa and limits of agreement [26,27]. Levels of agreement from kappa according to Landis and Koch [26] were as follows: poor 0–0.2, slight 0.21–0.4, fair 0.41–0.6, moderate 0.61–0.8, and > 0.8 as excellent.
(iii) Correlation: Pearson's or Spearman's product-moment coefficient (r).
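
As a concrete illustration of the two main statistics above (a minimal sketch, not drawn from any included study; the joint-count arrays below are hypothetical):

```python
import numpy as np

def icc_2_1(x):
    """Two-way random-effects, single-measure ICC(2,1) (Shrout & Fleiss [25]),
    for an n-patients x k-raters matrix of joint counts."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # patients
    ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def cohen_kappa(a, b):
    """Cohen's kappa for two raters' binary joint-level calls (swollen: 1/0)."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = (a == b).mean()                                     # observed agreement
    p_e = (a == 1).mean() * (b == 1).mean() + (a == 0).mean() * (b == 0).mean()
    return (p_o - p_e) / (1 - p_e)                            # chance-corrected

# Hypothetical data: 28-joint SJCs from 5 patients scored by 3 observers,
# and two observers' swollen/not-swollen calls on the same 28 joints.
sjc = [[4, 5, 3], [10, 12, 9], [2, 2, 4], [7, 6, 8], [0, 1, 0]]
print(f"ICC(2,1) = {icc_2_1(sjc):.2f}")
rater1 = np.array([1, 0, 1, 1, 0, 0, 1, 0] * 3 + [1, 0, 1, 1])
rater2 = np.array([1, 0, 0, 1, 0, 0, 1, 1] * 3 + [1, 0, 1, 0])
print(f"kappa    = {cohen_kappa(rater1, rater2):.2f}")
```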

Risk of bias

A qualitative assessment of the risk of bias was performed using the Quality Appraisal of Reliability Studies (QAREL) checklist [28], a checklist of 11 items that explores 7 principles considered important in diagnostic reliability studies (Appendix). Items cover the spectrum of subjects, spectrum of examiners, examiner blinding, order effects of examination, suitability of the time interval between repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis [28]. One author (P.C.) assessed the risk of bias in all the articles, with a blinded second assessment by another author (A.M.) in a random 25% of articles; excellent agreement was achieved, with a prevalence-adjusted bias-adjusted kappa (PABAK) of 0.85.
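
For concreteness, PABAK needs only the observed proportion of agreement (a minimal sketch; the 111-of-120 item split below is invented purely to show how an observed agreement of about 0.93 maps to the reported PABAK of 0.85):

```python
def pabak(p_observed: float, k_categories: int = 2) -> float:
    """Prevalence-adjusted bias-adjusted kappa: kappa computed as if
    expected chance agreement were uniform across categories (1/k)."""
    p_e = 1.0 / k_categories
    return (p_observed - p_e) / (1.0 - p_e)  # equals 2*p_o - 1 for 2 categories

# e.g., hypothetically, agreement on 111 of 120 dichotomous QAREL item ratings:
print(f"PABAK = {pabak(111 / 120):.2f}")     # -> 0.85
```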

Data analysis

A descriptive analysis was undertaken. Statistical analysis was performed with SPSS version 20 (SPSS, Chicago, IL). For indicative purposes, a pooled summary estimate was calculated for studies that evaluated reliability of the 28 TJCs and SJCs and reported ICCs. To our knowledge, there is no validated method for pooling kappa. ICCs were converted using Fisher's transformation and pooled using a random-effects model; the summary estimate was transformed back to the ICC scale using Fisher's inverse transformation [29]. Heterogeneity of effect sizes was assessed with the I² index, which expresses the between-study variation as a proportion of the total variation. Correlation coefficients other than the ICC, such as Pearson's or Spearman's coefficients, were not pooled, as these statistics describe association rather than reliability.
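
The pooling just described can be sketched in a few lines (an illustration only, not the authors' SPSS procedure; the DerSimonian–Laird estimator and the 1/(n − 3) variance approximation for Fisher's z are assumptions on my part, and the inputs are loosely based on the 28-joint TJC rows of Table 5):

```python
import numpy as np

def pool_icc_fisher(iccs, ns):
    """Pool ICCs via Fisher's z transformation under a random-effects
    (DerSimonian-Laird) model; returns pooled ICC and the I^2 index."""
    z = np.arctanh(np.asarray(iccs, dtype=float))   # Fisher transformation
    v = 1.0 / (np.asarray(ns, dtype=float) - 3.0)   # approximate variance of z
    w = 1.0 / v
    z_fixed = (w * z).sum() / w.sum()
    q = (w * (z - z_fixed) ** 2).sum()              # Cochran's Q
    df = len(z) - 1
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    w_re = 1.0 / (v + tau2)                         # random-effects weights
    z_pooled = (w_re * z).sum() / w_re.sum()
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return np.tanh(z_pooled), i2                    # back-transform to ICC

# Illustrative inputs: five studies reporting 28-joint TJC ICCs.
icc, i2 = pool_icc_fisher([0.64, 0.75, 0.85, 0.87, 0.86], [10, 12, 22, 50, 209])
print(f"pooled ICC = {icc:.2f}, I^2 = {i2:.0f}%")
```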

Results

Characteristics of the studies

Overall, 48 studies were identified (Fig. 1). Among them, 28 studies reported reliability of joint counts or articular indices by HCPs (Table 1), of which 17 examined inter-observer reliability, 4 reported intra-observer reliability, and 7 reported both. Two studies evaluated TJC reliability only, 3 reported SJC reliability only, and 16 reported both. Sixteen studies evaluated reliability of various articular indices such as the RAI, TKAI, Lansbury, and the ARA index (Table 1). A majority of the studies that reported inter-observer reliability were between physicians only, while 2 were between metrologists only and the remaining studies covered both. The most commonly reported results were on the 28 TJCs (n = 11) and 28 SJCs (n = 12).

Twenty studies evaluated reliability of patient self-reported joint counts or articular indices (Table 2). The majority reported inter-observer reliability between patients and physicians (n = 10), while 6 were between patients and metrologists and the remaining 4 studies included comparisons with both metrologists and physicians. The most commonly reported results were the 28 TJCs (n = 9) and 28 SJCs (n = 8), followed by the TKAI (n = 4) and the RAI (n = 1). Intra-observer reliability was evaluated in 8 studies, while inter-observer reliability was assessed in all patient studies.

Disease characteristics

Not all studies reported patient baseline characteristics. For reliability studies on HCPs, there was an average of 38 patients per study. Of the available data, 76% were females, with mean TJCs of 11 (95% CI: 5.8–16.2) and SJCs of 7.4 (95% CI: 5.3–9.6). Patient reliability studies had larger sample sizes, with an average of 93 patients, 76% females, mean TJCs of 9.4 (95% CI: 5.1–13.6), and SJCs of 6 (95% CI: 3.3–8.7). In these studies, patients reported higher mean TJCs of 10.3 (95% CI: 6.2–14.3) and lower mean SJCs of 5.4 (95% CI: 3.5–7.3).


Table 1
Summary of studies reporting reliability of joint counts/articular indices by health care professionals

Studies                          Patients (no.)  Raters (no.)  Rel.(a)  TJC  SJC  RAI  Thom.  Lan.  ARA
Lansbury [30]                    46              3             Inter    N    N    N    N      Y     N
Ritchie et al. [16]              72              4             Inter    N    N    Y    N      N     N
Eberl et al. [31]                66              3             Inter    N    N    Y    N      N     Y
Marks et al. [32]                20              3             Both     N    Y    N    N      N     N
Hansen et al. [33]               10              5             Inter    Y    Y    N    N      N     N
Hart et al. [34]                 18              3             Inter    Y    N    Y    N      N     N
Klinkhoff et al. [35]            6               9             Both     Y    N    N    N      N     N
Lewis et al. [36]                42              2             Both     N    N    Y    N      N     N
Archenholtz et al. [37]          30              3             Intra    N    N    Y    N      Y     Y
Bellamy et al. [38]              6               6             Inter    N    Y    Y    N      Y     Y
Thompson et al. [39]             4               4             Both     N    N    Y    N      Y     Y
Prevoo et al. [40]               97              NA            Intra    Y    Y    Y    Y      N     N
van den Brink et al. [41]        21              3             Inter    N    N    N    Y      N     N
Scott et al. [42]                8               8             Inter    Y    Y    N    N      N     N
Escalante [43]                   10              2             Inter    Y    Y    N    N      N     N
Hernandez-Cruz and Cardiel [44]  22              1             Intra    Y    Y    Y    N      N     N
Lassere et al. [45]              10              2             Inter    Y    Y    N    N      N     N
Naredo et al. [46]               94              2             Inter    Y    Y    N    N      N     N
Salaffi et al. [47]              44              2             Inter    Y    Y    N    N      N     N
Walsh et al. [48]                12              6             Both     Y    Y    N    N      N     N
Uhlig et al. [49]                28              1             Intra    Y    Y    N    N      N     N
Alegre et al. [50]               22              2             Inter    Y    Y    N    N      N     N
Cheung et al. [51]               50              2             Both     Y    Y    N    N      N     N
Grunke et al. [52]               41/71(b)        196/251(b)    Both     Y    Y    N    N      N     N
Marhadour et al. [53]            7               7             Both     Y    Y    N    N      N     N
Stamp et al. [54]                5               23            Inter    Y    Y    N    N      N     N
Radner et al. [55]               209             2             Inter    Y    Y    N    N      N     N
Cheung et al. [56]               9               18            Inter    N    Y    N    N      N     N

TJC = tender joint count; SJC = swollen joint count; RAI = Ritchie articular index; Thom. = Thompson–Kirwan articular index; Lan. = Lansbury; ARA = ARA index.
(a) Rel. = reliability; type of reliability reported: inter-observer, intra-observer, or both.
(b) 28 TJCs and SJCs / 68 TJCs and 66 SJCs.

Quality of studies

According to the QAREL checklist, only 19 of 48 (39%) studies fulfilled 7 or more of the 11 items of quality for diagnostic reliability studies (Appendix).

Intra-observer reliability, agreement, and association

The average time interval between repeated measurements used to test intra-observer reliability differed between HCP studies (2.2 days) and patient reliability studies (21 days). HCP TJCs showed moderate to near-perfect intra-observer reliability, with ICCs ranging from 0.49 to 0.98. SJCs also showed moderate to excellent reliability (Table 3). Agreement at the joint level for HCPs was at least moderate in one study (κ = 0.51) [53]. Although Pearson's/Spearman's correlation does not measure reliability but rather the strength of association between 2 variables, these results are included in Table 3. All joint count indices reported good to excellent association, with r between 0.69 and 0.99. For patient reliability studies (Table 4), TJCs had excellent reliability in 3 studies (ICC > 0.9) and SJCs had moderate to excellent reliability (ICC: 0.56–0.93). Both TJCs and SJCs appeared to have good to excellent association, with r between 0.63 and 0.96.

Table 2
Summary of studies on reliability of joint counts or articular indices reported by patients

Studies                Patients (no.)  Reliability(a)  TJC  SJC  RAI  Thom.  Lan.  ARA
Stewart et al. [57]    40              Both            N    N    N    Y      N     N
Abraham et al. [58]    32              Inter           Y    N    N    N      N     N
Hewlett et al. [59]    50              Inter           N    N    N    Y      N     N
Stucki et al. [60]     55              Both            Y    Y    N    N      N     N
Hanley et al. [61]     61              Inter           Y    Y    N    N      N     N
Prevoo et al. [62]     141             Both            Y    Y    Y    N      N     N
Taal et al. [63]       43              Both            N    N    N    Y      N     N
Escalante [43]         110             Both            Y    Y    N    N      N     N
Alarcon et al. [64]    67              Inter           Y    N    N    N      N     N
Calvo et al. [65]      60              Inter           Y    N    N    N      N     N
Houssein et al. [66]   100             Inter           Y    Y    N    N      N     N
Wong et al. [67]       27              Both            Y    Y    N    N      N     N
Greenwood et al. [68]  45              Inter           Y    N    N    N      N     N
Figueroa et al. [69]   82              Both            Y    Y    N    N      N     N
Levy et al. [70]       60              Inter           Y    Y    N    N      N     N
Tricta et al. [71]     126             Inter           Y    Y    N    N      N     N
Kavanaugh et al. [72]  447             Inter           Y    Y    N    N      N     N
Riazzioli et al. [73]  47              Inter           Y    Y    N    N      N     N
Cheung et al. [51]     50              Both            Y    Y    N    N      N     N
Radner et al. [55]     209             Inter           Y    Y    N    N      N     N

TJC = tender joint count; SJC = swollen joint count; RAI = Ritchie articular index; Thom. = Thompson–Kirwan articular index; Lan. = Lansbury; ARA = ARA index.
(a) Type of reliability reported: inter-observer, intra-observer, or both.

Table 3
Intra-observer reliability of tender and swollen joints by health care professionals

Studies                          Interval  Count  Statistic  Result  95% CI
Tender joint count (28 joint counts):
Walsh et al. [48]                72 h      28     ICC        0.99    NR
Uhlig et al. [49]                1 week    28     ICC        0.83    NR
Cheung et al. [51]               48 h      28     ICC        0.99    NR
Tender joint count (other than 28 joint counts):
Hernandez-Cruz and Cardiel [44]  48 h      61     ICC        0.49    NR
Swollen joint count (28 joint counts):
Walsh et al. [48]                72 h      28     ICC        0.95    NR
Uhlig et al. [49]                1 week    28     ICC        0.78    NR
Cheung et al. [51]               48 h      28     ICC        0.83    NR
Marhadour et al. [53]            6 h       28     Kappa      0.51    0.31–0.77
Prevoo et al. [40]               NR        28     Pearson's  0.81    NR
Swollen joint count (other than 28 joint counts):
Hernandez-Cruz and Cardiel [44]  48 h      61     ICC        0.47    NR

ICC = intraclass correlation coefficient; NR = not reported.


Table 4
Intra-observer reliability of patient-reported tender and swollen joints

Studies               Interval  Count  Statistic   Result     95% CI
Tender joint count (28 joint counts):
Cheung et al. [51]    48 h      28     ICC         0.94       0.86–0.98
Tender joint count (other than 28 joint counts):
Wong et al. [67]      48 h      50     ICC         0.96       NR
Figueroa et al. [69]  1 h       42     ICC         0.90       NR
Stucki et al. [60]    2 weeks   40     Spearman's  0.67–0.85  NR
Escalante [43]        1.5 h     40     Pearson's   0.89       NR
Swollen joint count (28 joint counts):
Cheung et al. [51]    48 h      28     ICC         0.56       0.16–0.81
Swollen joint count (other than 28 joint counts):
Wong et al. [67]      48 h      50     ICC         0.93       NR
Figueroa et al. [69]  1 h       42     ICC         0.89       NR
Stucki et al. [60]    2 weeks   40     Spearman's  0.63       NR

ICC = intraclass correlation coefficient; NR = not reported.

Inter-observer reliability, agreement, and association

TJCs by health care professionals

TJC reliability between physicians and metrologists (n = 7 physician vs. metrologist and n = 7 between physicians) (Table 5) was good in studies that reported the 28 joint counts. ICCs ranged from 0.64 to 0.83 between physicians and from 0.75 to 0.87 for physicians vs. metrologists. For indicative purposes, a pooled estimate for the 5 studies (patient n = 299) that evaluated the 28 joint counts showed excellent reliability (ICC = 0.86; 95% CI: 0.81–0.89), with a mild level of heterogeneity (I² = 14%). Only 2 studies evaluated the 68 TJCs, and variances ranged from 14 to 21. At the joint level, agreement on tender joints was variable, with κ ranging from 0.31 to 0.68.

SJCs by health care professionals

Reliability of SJCs between physicians and metrologists was evaluated in 8 studies, and between physicians in 7 studies (Table 5). The ICC was particularly wide-ranging for physicians vs. metrologists using the 28 joint counts (0.29–0.95).

Table 5
Inter-observer reliability of tender and swollen joints by health care professionals

Studies                  Comparator(a)  Count  Statistic  Result      95% CI
Tender joint count (28 joint count):
Lassere et al. [45]      Physician      28     ICC        0.64        NR
Walsh et al. [48]        Both           28     ICC        0.75        0.47–0.96
Alegre et al. [50]       Both           28     ICC        0.85        NR
Cheung et al. [51]       Both           28     ICC        0.87        0.70–0.95
Radner et al. [55]       Both           28     ICC        0.86        0.82–0.89
Scott and Houssein [42]  Both           28     COV        66          NR
Grunke et al. [52]       Both           28     Variance   7           NR
Stamp et al. [54]        Both           28     SD         5           NR
Prevoo et al. [40]       Both           28     Pearson's  0.89        NR
Tender joint count (other than 28 joint count):
Lassere et al. [45]      Both           74     ICC        0.81        NR
Klinkhoff et al. [35]    Physician      68     Variance   14          NR
Grunke et al. [52]       Both           68     Variance   21          NR
Naredo et al. [46]       Physician      60     Kappa      0.38–0.68   NR
Hart et al. [34]         Physician      52     ICC        0.83        NR
Escalante [43]           Physician      48     ICC        0.88        NR
Salaffi et al. [47]      Physician      42     Kappa      0.31–0.62   NR
Hansen et al. [33]       Physician      23     COV        38          NR
Swollen joint count (28 joint count):
Escalante [43]           Physician      28     ICC        0.84        NR
Lassere et al. [45]      Physician      28     ICC        0.52        NR
Walsh et al. [48]        Both           28     ICC        0.29        0.09–0.38
Alegre et al. [50]       Both           28     ICC        0.78        NR
Cheung et al. [51]       Both           28     ICC        0.31        −0.16 to 0.67
Radner et al. [55]       Both           28     ICC        0.95        0.93–0.96
Scott and Houssein [42]  Both           28     COV        82          NR
Grunke et al. [52]       Both           28     Variance   12          NR
Stamp et al. [54]        Both           28     SD         7           NR
Marhadour et al. [53]    Both           28     Kappa      0.54        0.4–0.62(**)
Cheung et al. [56]       Physician      28     Kappa      0.50        0.41–0.59
Swollen joint count (other than 28 joint count):
Lassere et al. [45]      Physician      68     ICC        0.52        NR
Bellamy et al. [38]      Physician      66     ICC        0.98        NR
Grunke et al. [52]       Both           66     Variance   28          NR
Naredo et al. [46]       Physician      60     Kappa      −0.01 to 1  NR
Salaffi et al. [47]      Physician      40     Kappa      0.2–0.76    NR
Hansen et al. [33]       Physician      14     COV        31          NR
Marks et al. [32]        Physician      2      Pearson's  0.52–0.57   NR

ICC = intraclass correlation coefficient; NR = not reported; COV = coefficient of variation; SD = standard deviation.
(a) “Both” indicates physician and metrologist.
(**) Range, rather than 95% CI.


This was less so in studies between physicians (ICCs ranging from 0.52 to 0.84). A pooled estimate for the 4 studies (patient n = 313) that evaluated the 28 joint counts gave an ICC of 0.67 (95% CI: 0.06–0.92), but these studies were highly heterogeneous (I² = 96%). Only 2 studies reported reliability of the 66 joint counts: one reported excellent reliability (ICC = 0.98) [38], whilst the other reported a high variance of 28 [52]. Agreement at the joint level for swollen joints was more variable than in studies reporting tender joint agreement, with κ as low as −0.01 and as high as 1.

Articular indices by health care professionals

Reliability of the RAI was excellent (ICC = 0.97) [38], and excellent association was seen between HCPs, with r ranging from 0.72 to 0.95 (Appendix). There were limited studies on the TKAI, but reliability was reported to be excellent in 1 study (ICC = 0.85) [34].

TJCs by patients

Patient-reported TJC reliability compared to HCPs was evaluated in 17 studies (Table 6). Of the 9 studies using the 28 joint count, 4 reported ICCs ranging from 0.70 to 0.92. For indicative purposes, the summary estimate for the 4 studies (patient n = 751) on the 28 joint counts was excellent (ICC = 0.82; 95% CI: 0.73–0.89), but I² = 84%, indicating significant heterogeneity. Patient-reported 28 TJCs also had a good association with HCP TJCs (r ranging from 0.62 to 0.87).

SJCs by patients

Patient SJC reliability compared to HCP SJCs was reported in 13 studies (Table 6). Three of the 7 studies using the 28 joint counts reported ICCs ranging from 0.31 to 0.55. The summary estimate for indicative purposes (patient n = 406) showed a moderate ICC of 0.44 (95% CI: 0.24–0.60), but with substantial heterogeneity (I² = 79%). Nevertheless, in the studies that evaluated correlation, there was a strong relationship between patient SJCs and HCP SJCs (r = 0.43–0.75).

Articular indices by patients

There were limited studies on other articular indices, but the RAI appeared to have better reliability than the TKAI (ICC = 0.83 vs. ICC = 0.44, respectively) [57,63] (Appendix).

Methods of standardization, training, and consensus

Nine studies reported training or consensus methods as an intervention for improving joint counts (n = 7 HCPs and n = 2 patients). Training was primarily based on published guidelines, such as the EULAR guideline for joint assessment [24], which was used in 7 studies. The length of training or standardization varied between HCPs and patients: HCP reliability studies had training or consensus lasting no more than 1 day, while patients had very limited training of 5 min.

Table 6
Inter-observer reliability of patient self-reported joints

Studies                Comparator(a)  Count  Statistic   Result            95% CI
Tender joint count (28 joint count):
Greenwood et al. [68]  Metrologist    28     ICC         0.92              NR
Kavanaugh et al. [72]  Physician      28     ICC         0.78              NR
Cheung et al. [51]     Both           28     ICC         0.85              0.65–0.94
Radner et al. [55]     Both           28     ICC         0.70              0.62–0.76
Houssein et al. [66]   Physician      28     Kappa       0.49–0.84         NR
Prevoo et al. [62]     Both           28     Pearson's   0.62              NR
Levy et al. [70]       Metrologist    28     Pearson's   0.79              NR
Tricta et al. [71]     Metrologist    28     Pearson's   0.72              NR
Riazzioli et al. [73]  Physician      28     Spearman's  0.87              NR
Tender joint count (other than 28 joint count):
Wong et al. [67]       Physician      50     ICC         0.65              NR
Figueroa et al. [69]   Physician      42     ICC         0.77              NR
Stucki et al. [60]     Physician      42     Pearson's   0.43              NR
Escalante [43]         Both           40     Pearson's   0.78              NR
Alarcon et al. [64]    Metrologist    38     ICC         0.55 (text)       0.35–0.69
                                                         0.64 (mannequin)  0.60–0.83
Calvo et al. [65]      Metrologist    36     Spearman's  0.76              NR
Hanley et al. [61]     Physician      20     ICC         0.31              NR
Abraham et al. [58]    Metrologist    20     Pearson's   0.89              NR
Swollen joint count (28 joint count):
Kavanaugh et al. [72]  Physician      28     ICC         0.55              NR
Cheung et al. [51]     Both           28     ICC         0.41              −0.05 to 0.72
Radner et al. [55]     Both           28     ICC         0.30              0.18–0.42
Houssein et al. [66]   Physician      28     Kappa       0.02–0.61         NR
Prevoo et al. [62]     Both           28     Pearson's   0.61              NR
Levy et al. [70]       Metrologist    28     Pearson's   0.64              NR
Tricta et al. [71]     Metrologist    28     Pearson's   0.43              NR
Riazzioli et al. [73]  Physician      28     Spearman's  0.75              NR
Swollen joint count (other than 28 joint count):
Wong et al. [67]       Physician      48     ICC         0.64              NR
Figueroa et al. [69]   Physician      42     ICC         0.43              NR
Escalante [43]         Both           40     Pearson's   0.31              NR
Stucki et al. [60]     Physician      32     Spearman's   0.42             NR
Hanley et al. [61]     Physician      20     ICC         0.16              NR

ICC = intraclass correlation coefficient; NR = not reported.
(a) “Both” indicates physician and metrologist.


Of the HCP studies, 4 (57%) used didactic methods, while 3 (43%) used consensus as the primary way to improve reliability. However, the exact details varied, as shown in Table 7. Small-group consensus was the most common method for reaching agreement among HCPs. The didactic method was used in the 2 studies on patients, mainly by verbal instruction rather than structured teaching. Practice in performing joint counts beyond a single evaluation after training was not carried out in either of the patient studies, but was applied in 3 of the 7 HCP studies. Despite this, there was improvement in TJC reliability in all studies except 1 patient study [55]. For SJCs, there was no overall improvement in 3 (60%) of the HCP studies, while both studies on patients reported improvement in SJCs.

Discussion

It appears that patient-reported joint counts have a potential role in the monitoring of disease activity in rheumatoid arthritis. From the included studies, the intra-observer reliability for both TJCs and SJCs, as well as the TJC inter-observer reliability of physicians, metrologists, and patients, was good. However, inter-observer SJC reliability was poor, especially for patients vs. HCPs. Overall, there was conflicting evidence that training improved SJC reliability between HCPs, although TJC reliability did improve after training or standardization. For patients, the limited number of available studies gave an indication that training improved reliability for SJCs, although the training described was brief and lacked longitudinal follow-up, indicating a need for more consensus and research in this area.

There were several limitations with the data available for this review. Firstly, the included studies were heterogeneous in statistical reporting. Several studies used statistics that measure association instead of measures more appropriate for reliability, such as ICCs. Hence interpretation of results was difficult, rendering a meta-analysis of the results, as well as a direct comparison of the reliability of joint counts by HCPs versus patients, impossible. Although pooled summary estimates were calculated for inter-observer reliability of the 28 joint counts, they needed to be interpreted carefully, as they did not account for other aspects contributing to heterogeneity in those studies. Secondly, for completeness, all of the joint indices were reviewed, and hence there were many different variations of joint counts even for tender and swollen joints. For practical purposes, the analysis focused on the 28 joint counts, which were more commonly reported in the studies and are widely used in clinical practice, and for indicative purposes, a meta-analysis of the results reporting ICCs was performed. However, these results were interpreted conservatively due to the significant statistical heterogeneity. Another potential limitation was the lack of data related to patient demographics and clinical disease activity. This made interpretation of reliability statistics such as the ICC difficult, as the ICC is falsely reduced if there is limited clinical heterogeneity. The ICC depends on the variation between study samples; hence, it is generalizable only to samples with a similar variation. This may contribute to the ICC differing between studies, rather than measurement error alone. The risk-of-bias assessment from the QAREL checklist [28] indicated an over-arching need for better methodology and reporting of results in diagnostic reliability studies of joint counts.

A critical appraisal of our review identified a few limitations specific to the review methodology [74]. Firstly, only one author performed the literature search; however, the keyword search was conducted in conjunction with 2 independent librarians, with partial validation through independent double reading (a 25% random sub-sample) of the data extraction and quality assessment. Although this revealed excellent agreement, further double reading was not pursued. In addition, not all publications reporting reliability may have been included, but several different databases were used, and publication bias was addressed by reviewing abstracts of the leading international scientific meetings as well as contacting experts in the field to identify unpublished reports.

According to the OMERACT filter [75], joint counts performed by physicians or metrologists have fulfilled the criteria of “truth” in the context of face, content, construct, and criterion validity [6].
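
The dependence of the ICC on between-patient variation noted above is easy to demonstrate with a small simulation (illustrative only; all numbers are invented, not taken from any included study). With identical rater measurement error, a clinically homogeneous sample yields a much lower ICC than a heterogeneous one:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_icc(sd_between, sd_error, n=200, k=2):
    """Simulate k raters scoring n patients whose true joint counts vary
    with sd_between; each rating adds independent error with sd_error.
    The one-way ICC approaches sd_b^2 / (sd_b^2 + sd_e^2)."""
    truth = rng.normal(10, sd_between, size=(n, 1))
    ratings = truth + rng.normal(0, sd_error, size=(n, k))
    ms_between = k * ratings.mean(axis=1).var(ddof=1)   # between-patient MS
    ms_within = ratings.var(axis=1, ddof=1).mean()      # within-patient MS
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Same measurement error (sd = 2 joints) in both samples:
print(f"heterogeneous sample (sd_b = 6): ICC ~ {simulated_icc(6, 2):.2f}")  # ~0.90
print(f"restricted sample    (sd_b = 2): ICC ~ {simulated_icc(2, 2):.2f}")  # ~0.50
```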

Table 7
Summary of reliability studies reporting effects of training or consensus

Studies                  Standard protocol  Length               Method of delivery                                            Demonstration  Practice  Time to re-assess  Improvement in reliability
Health care professional (physician or metrologist):
Hart et al. [34]         NR                 NR                   Consensus                                                     NR             No        NR                 NR
Klinkoff et al. [35]     Yes                3 h                  Consensus                                                     No(a)          NR        6 months           TJC: Yes
Bellamy et al. [38]      Yes                2 h pre, 2.5 h post  Consensus; objectives of training and consensus provided     No             No        2 days             SJC: No
Scott and Houssein [42]  Yes                1 h                  Didactic; objectives of training and consensus provided      Yes            Yes       1 day              TJC: Yes; SJC: No
Grunke et al. [52]       Yes                1 day                Consensus                                                     No             Yes       1/2 day            TJC: Yes; SJC: Yes
Stamp et al. [54]        Yes                1/2 day              Consensus; objectives of training and consensus provided     No             Yes       1/2 day            TJC: Yes; SJC: Yes
Cheung et al. [56]       Yes                1/2 day              Didactic; objectives of training and consensus provided      No             No        Next follow-up     SJC: No(b)
Patient:
Levy et al. [70]         Yes                5 min                Didactic; objectives of training and consensus provided      Yes            No        “Short”            TJC: Yes; SJC: Yes
Radner et al. [55]       Yes                NR                   Didactic; objectives of training and consensus provided      NR             No        3 months           TJC: No; SJC: Yes

NR = not recorded; TJC = tender joint count; SJC = swollen joint count.
(a) Photo was taken of joint location.
(b) Improvement only for newly qualified rheumatologists against their senior peers.


“Feasibility” is also apparent, with its easy application during clinical consultation. “Sensitivity to change” is also agreed [7,76]. Despite joint counts being a validated outcome measure, it is well known that not all physicians routinely perform a formal joint count in clinical practice [77]. Although patient-reported joint counts have not been subjected to the OMERACT filter to date, patients could potentially assist the physician in recording the joint count. It was reassuring that intra-observer reliability was good for both HCPs and patients. This highlights the potential use of joint counts as a method for disease monitoring, especially when performed by the same person, and particularly for patients who may be able to assist the physician in disease monitoring between clinic visits. However, inter-observer SJC reliability is still a problem. Unlike other measurements in medicine, such as blood pressure, swollen joint assessment is affected by a number of unique factors. These may include the level of experience of the assessors, who trained them, the level of disease activity, and the degree of joint deformity. A number of studies in this review that looked at either consensus or standardization and training did report an improvement in TJCs. However, the conflicting results for HCP SJCs may reflect heterogeneity in study design and patient sampling. In comparison, patient SJCs improved after training, despite patients receiving training of much shorter duration and intensity than the HCPs. However, caution needs to be exercised, as there were limited studies and the extent of training or practice was not extensive. Hence more studies on training, with a greater educational component for patients and longitudinal data collection, are needed.

In conclusion, the intra-observer reliability of TJCs and SJCs is good whether they are derived by physicians, metrologists, or patients. This shows some promise for the purposes of clinical trials and, more importantly, highlights the potential of patients acting as their own observer in measuring joint counts between clinic visits over time and as an outcome measure in clinical trials. However, inter-observer reliability, especially of SJCs, is still variable and is poor especially for patients. Although standardization and training should theoretically improve inter-observer reliability, the current results are conflicting and call for more rigorous research in this area.

Acknowledgment We thank Guillemette Utard from Bibliotheque Inter-Universitaire, Rene Descartes University, Paris and Jeremy Cullis from University of Sydney for assistance with keyword literature search. We also thank Dr. Charles Chen, University of Sydney for statistical advice.

Appendix. Supplementary information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.semarthrit.2013.11.003.

References

[1] Drossaers-Bakker KW, de Buck M, van Zeben D, Zwinderman AH, Breedveld FC, Hazes JM. Long term course and outcome of functional capacity in rheumatoid arthritis: the effect of disease activity and radiologic damage over time. Arthritis Rheum 1999;42:1854–60.
[2] Smolen JS, Aletaha D, Bijlsma JW, Breedveld FC, Boumpas D, Burmester G, et al. Treating rheumatoid arthritis to target: recommendations of an international task force. Ann Rheum Dis 2010;69:631–7.
[3] Scott DL, Antoni C, Choy EH, van Riel PC. Joint counts in routine practice. Rheumatology (Oxford) 2003;42:919–23.

[4] van der Heijde DM, van't Hof M, van Riel PL, van de Putte LB. Development of a disease activity score based on judgment in clinical practice by rheumatologists. J Rheumatol 1993;20:579–81.
[5] Felson DT, Anderson JJ, Boers M, Bombardier C, Chernoff M, Fried B, et al. The American College of Rheumatology preliminary core set of disease activity measures for rheumatoid arthritis clinical trials. The Committee on outcome measures in rheumatoid arthritis clinical trials. Arthritis Rheum 1993;36:729–40.
[6] Anderson JJ, Felson DT, Meenan RF, Williams HJ. Which traditional measures should be used in rheumatoid arthritis clinical trials? Arthritis Rheum 1989;32:1093–9.
[7] Aletaha D, Smolen JS. Joint damage in rheumatoid arthritis progresses in remission according to the disease activity score in 28 joints and is driven by residual swollen joints. Arthritis Rheum 2011;63:3702–11.
[8] Van Leeuwen MA, van der Heijde DM, van Rijswijk MH, Houtman PM, van Riel PL, van de Putte LB, et al. Interrelationship of outcome measures and process variables in early rheumatoid arthritis: a comparison of radiologic damage, physical disability, joint counts, and acute phase reactants. J Rheumatol 1994;21:425–9.
[9] van Eijk-Hustings Y, van Tubergen A, Bostrom C, Braychenko E, Buss B, Felix J, et al. EULAR recommendations for the role of the nurse in the management of chronic inflammatory arthritis. Ann Rheum Dis 2012;71:13–9.
[10] Barton JL, Criswell LA, Kaiser R, Chen YH, Schillinger D. Systematic review and metaanalysis of patient self-report versus trained assessor joint counts in rheumatoid arthritis. J Rheumatol 2009;36:2635–41.
[11] Dougados M, Perrodeau E, Fayet F, Gaudin P, Cerato M-H, Le Loët X, et al. Impact of a nurse led program of patient self-assessment of disease activity on the management of rheumatoid arthritis: results of a prospective, multicentre, randomised, controlled trial (COMEDRA). Ann Rheum Dis 2013;72(Suppl. 3):150 [abstract].
[12] Sokka T. How should rheumatoid arthritis disease activity be measured today and in the future in clinical care? Rheum Dis Clin North Am 2010;36:243–57.
[13] Scott DL, Choy EH, Greeves A, Isenberg D, Kassinor D, Rankin E, et al. Standardising joint assessment in rheumatoid arthritis. Clin Rheumatol 1996;15:579–82.
[14] Bellamy N, Muirden KD, Bendrups A, Boyden K, McColl G, Moran H, et al. Rheumatoid arthritis antirheumatic drug trials: effects of a standardised instructional videotape on the reliability of observer-dependent outcome measures. Inflammopharmacology 1997;5:273–84.
[15] Czirjak L, Nagy Z, Aringer M, Riemekasten G, Matucci-Cerinic M, Furst DE. The EUSTAR model for teaching and implementing the modified Rodnan skin score in systemic sclerosis. Ann Rheum Dis 2007;66:966–9.
[16] Ritchie DM, Boyle JA, McInnes JM, Jasani MK, Dalakos TG, Grieveson P, et al. Clinical studies with an articular index for the assessment of joint tenderness in patients with rheumatoid arthritis. Q J Med 1968;37:393–406.
[17] Cooperating Clinics Committee of the American Rheumatism Association. A seven-day variability study of 499 patients with peripheral rheumatoid arthritis. Arthritis Rheum 1965;8:302–35.
[18] van der Heijde DM, Klareskog L, Boers M, Landewe R, Codreanu C, Bolosiu HD, et al. Comparison of different definitions to classify remission and sustained remission: 1-year TEMPO results. Ann Rheum Dis 2005;64:1582–7.
[19] Egger MJ, Huth DA, Ward JR, Reading JC, Williams HJ. Reduced joint count indices in the evaluation of rheumatoid arthritis. Arthritis Rheum 1985;28:613–9.
[20] Fuchs HA, Brooks RH, Callahan LF, Pincus T. A simplified twenty-eight joint quantitative articular index in rheumatoid arthritis. Arthritis Rheum 1989;32:531–7.
[21] Sokka T, Pincus T. Eligibility of patients in routine care for major clinical trials of anti-tumour necrosis factor alpha agents in rheumatoid arthritis. Arthritis Rheum 2003;48:313–8.
[22] Thompson PW, Silman AJ, Kirwan JR, Currey HL. Articular indices of joint inflammation in rheumatoid arthritis: correlation with the acute phase response. Arthritis Rheum 1987;30:618–23.
[23] Sokka T, Pincus T. Quantitative joint assessment in rheumatoid arthritis. Clin Exp Rheumatol 2005;23:S58–62.
[24] Van Riel PLCM, Scott DL. EULAR handbook of clinical assessment in rheumatoid arthritis. Alphen Aan Den Rijn, The Netherlands: Van Zuiden Communications; 2000.
[25] Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
[26] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[27] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
[28] Lucas NP, Macaskill P, Irwig L, Bogduk N. The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). J Clin Epidemiol 2010;63:854–61.
[29] Fisher RA. Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika 1915;10:507–21.
[30] Lansbury J. A method for summation of the systemic indices of rheumatoid activity. Am J Med Sci 1956;232:300–10.
[31] Eberl DR, Fasching V, Rahlfs V, Schleyer I, Wolf R. Repeatability and objectivity of various measurements in rheumatoid arthritis. Arthritis Rheum 1976;19:1278–86.

[32] Marks JS, Palmer MK, Burke MJ, Smith P. Observer variation in examination of knee joints. Ann Rheum Dis 1978;37:376–7.
[33] Hansen TM, Keiding S, Lilholt Lauritzen S, Manthorpe R, Freiesleben Sørensen S, Wiik A. Clinical assessment of disease activity in rheumatoid arthritis. Scand J Rheumatol 1979;8:101–5.
[34] Hart LE, Tugwell P, Buchanan W, Norman GR, Grace EM, Southwell D. Grading of tenderness as a source of interrater error in the Ritchie articular index. J Rheumatol 1985;12:716–7.
[35] Klinkhoff A, Bellamy N, Bombardier C, Carette S, Chalmers A, Esdaile JM, et al. An experiment in reducing interobserver variability of the examination for joint tenderness. J Rheumatol 1988;15:492–4.
[36] Lewis PA, O'Sullivan MM, Rumfeld WR, Coles EC, Jessop JD. Significant changes in Ritchie scores. Br J Rheumatol 1988;27:32–6.
[37] Archenholtz B, Ahlmen M, Bengtsson C, Bjelle A, Hansson G, Lurie M, et al. Reliability of articular indices and function tests in a population study of rheumatic disorders. Clin Rheumatol 1989;8:215–24.
[38] Bellamy N, Anastassiades TP, Buchanan WW, Davis P, Lee P, McCain GA, et al. Rheumatoid arthritis antirheumatic drug trials. Effects of standardization procedures on observer dependent outcome measures. J Rheumatol 1991;18:1893–900.
[39] Thompson PW, Hart LE, Goldsmith CH, Spector TD, Bell MJ, Ramsden MF. Comparison of four articular indices for use in clinical trials in rheumatoid arthritis: patient, order and observer variation. J Rheumatol 1991;18:661–5.
[40] Prevoo ML, van Riel PLCM, van't Hof MA, van Rijswijk MH, van Leeuwen MA, Kuper HH, et al. Validity and reliability of joint indices, a longitudinal study in patients with recent onset rheumatoid arthritis. Rheumatology (Oxford) 1993;35:589–94.
[41] van den Brink HR, van der Heide A, Jacobs JW, van der Veen MJ, Bijlsma JW. Evaluation of the Thompson articular index. J Rheumatol 1993;20:28–32.
[42] Scott DL, Houssein DA. Joint assessment in rheumatoid arthritis. Br J Rheumatol 1996;35:14–8.
[43] Escalante A. What do self-administered joint counts tell us about patients with rheumatoid arthritis? Arthritis Care Res 1998:280–90.
[44] Hernandez-Cruz B, Cardiel MH. Intra-observer reliability of commonly used outcome measures in rheumatoid arthritis. Clin Exp Rheumatol 1998;16:459–62.
[45] Lassere M, van der Heijde D, Johnson KR, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomised controlled trials. J Rheumatol 2001;28:892–903.
[46] Naredo E, Bonilla G, Gamero F, Uson J, Carmona L, Laffon A. Assessment of inflammatory activity in rheumatoid arthritis: a comparative study of clinical evaluation with grey scale and power Doppler ultrasonography. Ann Rheum Dis 2005;64:375–81.
[47] Salaffi F, Filipucci E, Carotti M, Naredo E, Meenagh G, Ciapetti A, et al. Interobserver agreement of standard joint counts in early rheumatoid arthritis: a comparison with grey scale ultrasonography—a preliminary study. Rheumatology (Oxford) 2008;47:54–8.
[48] Walsh CA, Mullan RH, Minnock PB, Slattery C, FitzGerald O, Bresnihan B. Consistency in assessing the disease activity score 28 in routine clinical practice. Ann Rheum Dis 2008;67:135–6.
[49] Uhlig T, Kvien TK, Pincus T. Test-retest reliability of disease activity core set measures and indices in rheumatoid arthritis. Ann Rheum Dis 2009;68:972.
[50] Alegre J, Colazo M, Gonzalez-Diez C, Cuesta-Lasso M, Alegre-López MC, et al. Agreement between the tender joint counts, swollen joint counts, and resulting DAS28 score according to different assessors in patients with rheumatoid arthritis. Ann Rheum Dis 2010;69:670 [abstract].
[51] Cheung PP, Ruyssen-Witrand A, Gossec L, Paternotte S, Le Bourlout C, Mezieres M, et al. Reliability of patient self-evaluation of swollen and tender joints in rheumatoid arthritis: a comparison study with ultrasonography, physician and nurse assessments. Arthritis Care Res 2010;62:1112–9.
[52] Grunke M, Antoni CE, Kavanaugh A, Hildebrand V, Dechant C, Schett G, et al. Standardisation of joint examination technique leads to significant decrease in variability among different examiners. J Rheumatol 2010;37:860–4.
[53] Marhadour T, Jousse-Joulin S, Chales G, Grange L, Hacquard C, Loeuille D, et al. Reproducibility of joint swelling assessments in long-lasting rheumatoid arthritis: influence on disease activity score-28 values (SEA-Repro study part 1). J Rheumatol 2010;37:932–7.
[54] Stamp LK, Harrison A, Frampton C, Corkill MM. Does a joint count calibration exercise make a difference? Implications for clinical trials and training. J Rheumatol 2012;39:877–8.


[55] Radner H, Grisar J, Smolen JS, Stamm T, Aletaha D. Value of self-performed joint counts in rheumatoid arthritis patients near remission. Arthritis Res Ther 2012;14:R61.
[56] Cheung PP, Dougados M, Andre V, Balandraud N, Chales G, Chary-Valckenaere I, et al. Improving agreement in assessment of synovitis in rheumatoid arthritis. Joint Bone Spine 2013;80(2):155–9.
[57] Stewart MW, Palmer DG, Knight RG. A self-report articular index measure of arthritic activity: investigations of reliability, validity and sensitivity. J Rheumatol 1990;17:1011–5.
[58] Abraham N, Blackman D, Jackson JR, Bradley LA, Lorish CD, Alarcon GS. Use of self-administered joint counts in the evaluation of rheumatoid arthritis patients. Arthritis Care Res 1993:78–81.
[59] Hewlett SE, Haynes J, Shepstone L, Kirwan JR. Rheumatoid arthritis patients cannot accurately report signs of inflammatory activity. Rheumatology (Oxford) 1995;34:547–53.
[60] Stucki G, Stucki S, Bruhlmann P, Maus S, Michel BA. Comparison of the validity and reliability of self-reported articular indices. Rheumatology (Oxford) 1995;34:760–6.
[61] Hanley JG, Mosher D, Sutton E, Weerasinghe S, Theriault D. Self-assessment of disease activity by patients with rheumatoid arthritis. J Rheumatol 1996;23:1531–8.
[62] Prevoo ML, Kuper IH, van't Hof MA, van Leeuwen MA, van de Putte LB, van Riel PL. Validity and reproducibility of self-administered joint counts: a prospective longitudinal followup study in patients with rheumatoid arthritis. J Rheumatol 1996;23:841–5.
[63] Taal E, Abdel-Nasser AM, Rasker JJ, Wiegman O. A self-report Thompson articular index: what does it measure? Clin Rheumatol 1998;17:125–9.
[64] Alarcon GS, Tilley BC, Li SH, Fowler SE, Pillemer SR. Self-administered joint counts and standard joint counts in the assessment of rheumatoid arthritis. J Rheumatol 1999;26:1065–7.
[65] Calvo FA, Calvo A, Berrocal A, Pevez C, Romero F, Vega E, et al. Self-administered joint counts in rheumatoid arthritis: comparison with standard joint counts. J Rheumatol 1999;26:536–9.
[66] Houssien DA, Stucki G, Scott DL. A patient-derived disease activity score can substitute for a physician-derived disease activity score in clinical research. Rheumatology (Oxford) 1999;38:48–52.
[67] Wong AL, Wong WK, Harker J, Sterz M, Bulpitt K, Park G, et al. Patient self-report tender and swollen joint counts in early rheumatoid arthritis. J Rheumatol 1999;26:2551–61.
[68] Greenwood MC, Hakim AJ, Carson E, Doyle DV. Touch-screen computer systems in the rheumatology clinic offer a reliable and user-friendly means of collecting quality-of-life and outcome data from patients with rheumatoid arthritis. Rheumatology (Oxford) 2006;45:66–71.
[69] Figueroa F, Braun-Moscovici Y, Khanna D, Voon E, Gallardo L, Luinstra D, et al. Patient self-administered joint tenderness counts in rheumatoid arthritis are reliable and responsive to changes in disease activity. J Rheumatol 2007;34:54–6.
[70] Levy G, Cheetham G, Cheatwood A, Burchette R. Validation of patient-reported joint counts in rheumatoid arthritis and the role of training. J Rheumatol 2007;34:1261–5.
[71] Tricta J, Larche MJ, Beattie KA, Wong A, Benson WG, Pavlova V, et al. Rheumatoid arthritis patient self-assessment of disease activity as determined by joint count compared to clinical joint count assessment by a trained master's student [abstract]. Arthritis Rheum 2009;60:802.
[72] Kavanaugh A, Lee SJ, Weng HH, Chon Y, Huang XY, Lin SL. Patient-derived joint counts are a potential alternative for determining disease activity score. J Rheumatol 2010;37:1035–41.
[73] Riazzioli J, Nilsson JA, Teleman A, Petersson IF, Rantapää-Dahlqvist S, Jacobsson LT, et al. Patient-reported 28 swollen and tender joint counts accurately represent RA disease activity and can be used to assess therapy responses at the group level. Rheumatology 2010;49:2098–103.
[74] Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature—how to use an overview. J Am Med Assoc 1994;272:1367–71.
[75] Boers M, Brooks P, Strand V, Tugwell P. The OMERACT filter for outcome measures in rheumatology. J Rheumatol 1998;25:198–9.
[76] Zhang B, Lavalley M, Felson DT. The sensitivity to change for lower disease activity is greater than for higher disease activity in rheumatoid arthritis trials. Ann Rheum Dis 2009;68:1255–9.
[77] Pincus T, Segurado OG. Most visits of most patients with rheumatoid arthritis to most rheumatologists do not include a formal quantitative joint count. Ann Rheum Dis 2006;65:820–2.
Patient-derived joint counts are a potential alternative for determining disease activity score. J Rheumatol 2010;37:1035–41. [73] Riazzioli J, Nilsson JA, Teleman A, Petersson IF, Rantapp-Dahlqvist S, Jacobsson LT, et al. Patient-reported 28 swollen and tender joint counts accurately represent RA disease activity and can be used to assess therapy responses at the group level. Rheumatology 2010;49:2098–103. [74] Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature—how to use an overview. J Am Med Assoc 1994;272:1367–71. [75] Boers M, Brooks P, Strand V, Tugwell P. The OMERACT filter for outcome measures in rheumatology. J Rheumatol 1998;25:198–9. [76] Zhang B, Lavalley M, Felson DT. The sensitivity to change for lower disease activity is greater than for higher disease activity in rheumatoid arthritis trials. Ann Rheum Dis 2009;68:1255–9. [77] Pincus T, Segurado OG. Most visits of most patients with rheumatoid arthritis to most rheumatologists do not include a formal quantitative joint count. Ann Rheum Dis 2006;65:820–2.