RESEARCH
The use of outcome measures relating to the knee
David J Beard Kristina Knezevic Sami Al-Ali Jill Dawson Andrew J Price
Abstract Accurate outcomes assessment is one of the fundamental aspects of any reliable research. There are a vast number of outcome instruments currently available in orthopaedics. This plethora and diversity can lead to frustration when it comes to choosing the appropriate option. In this article we address the essential properties an outcome measure must possess, aiming to allow an informed selection of the most suitable instrument. We describe and illustrate key psychometric properties that are to be assessed when judging the suitability of an instrument, such as validity, reliability and responsiveness. We also review some of the most commonly used and recommended outcome measures available for different knee pathologies. With no intention of recommending a specific option, we briefly outline advantages and shortcomings of named instruments.
Keywords health status; knee; outcome measures; psychometrics
Introduction Outcome measures provide the basis for research, audit and clinical governance. Ultimately they are responsible for shaping the content and delivery of healthcare and are increasingly important. It is no longer adequate to be merely aware that outcome measures exist; practising clinicians and researchers alike will require a more comprehensive knowledge of this area. Measuring health outcome is an assessment of change, often from a complex clinical status, following either an intervention (such as an operation) or the passage of time (a period of observation). As a minimum, the information must be collected at two separate time points, with a specified interval between them (e.g. within 4 weeks pre-surgery and at 6 months post-surgery). The magnitude of change occurring within this interval represents the effect of a certain treatment. In terms of utility, the derived outcome can be simply evaluated on its own merits or compared with the outcome produced with other treatment(s), giving a relative value of a specific treatment. It is of paramount importance that the outcomes are measured under as similar circumstances as possible. There is an entire science concerned with the validity, stability, appropriateness and accuracy of outcome measures. Only by having constant parameters and adequate standardization is any variability sufficiently controlled to enable valid conclusions to be drawn. There are a number of methods that allow us to assess health status in orthopaedics. Generally, these tend to be classified as either ‘objective’ or ‘subjective’ measurements. Objective measurements are considered to generate “harder” information and more often involve higher-level data (ratio, interval, or sometimes ordinal). They may involve judgements or measurements made by a clinician on clinical or mechanical findings such as radiological changes, range of movement or joint laxity. The term ‘subjective’ is usually applied to evidence obtained about current health status based upon patients’ own perceptions and usually involves standard questionnaires.
The choice of an appropriate evaluation method will depend on a number of circumstances, such as the type of condition, task or setting.1 It has now become widely accepted that traditionally defined outcomes, such as clinical and laboratory measures, need to be complemented by measures that focus on the patient’s concerns in order to fully evaluate intervention and identify more appropriate forms of health care.2 Furthermore, it is now clear that well designed reports from the patient can be very good in determining clinical change. A substantial amount of research carried out in the last two decades has led to considerable growth in the number of patient reported outcome measures (PROMs) designed for measuring health-related quality of life and functional status.3 This article attempts to outline the requirements of a good outcome instrument by reviewing aspects of the methodological quality of instruments, including issues of standardization, validity, reliability and responsiveness. We discuss the current state of knowledge in outcome measurement of the knee and briefly review the methods available for patients with different knee pathologies. Strong recommendations are avoided and there is no intention to highlight the best measures, although some advantages and shortcomings of various systems are briefly discussed. The overall purpose is to allow the prospective user to better select an appropriate outcome instrument with an improved understanding of the science behind it.
David J Beard MSc DPhil University Research Lecturer, RCUK Fellow & Extended Scope Practitioner, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Oxford, UK. Kristina Knezevic BSc Research Fellow, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Oxford, UK. Sami Al-Ali BM MRCS Clinical Research Fellow, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Oxford, UK. Jill Dawson MSc DPhil Senior Research Scientist, Department of Public Health (HSRU), University of Oxford, Oxford, UK. Andrew J Price DPhil FRCS (Orth) Reader in Musculoskeletal Medicine & Consultant Orthopaedic Surgeon, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Nuffield Orthopaedic Centre, Oxford, UK.
ORTHOPAEDICS AND TRAUMA 24:4
© 2010 Elsevier Ltd. All rights reserved.
Features that make a good outcome instrument
(or characteristic that a test is designed to measure) that cannot be directly confirmed by an existing gold standard. There is no upper limit on hypotheses that can be tested this way, but no single experiment can definitely confirm the construct.5 Validity is not assessed by a single statistic that determines the relationship between the test and the condition it is intended to measure. When assessing the validity of an instrument, it should be completed in relation to a specific purpose (i.e. population of interest with particular impairment). Therefore, the instrument can be claimed to be valid only when used with a similar type of population and similar context.5
There are some essential properties an outcome measure must possess in order to be classified as a good instrument. Published evidence of satisfactory psychometric properties such as acceptable levels of reliability, validity and sensitivity1 ensures that the instrument has been developed (and applied) using sound methodology. The measurement of validity and reliability is not entirely straightforward and criteria for acceptable validity are not explicitly defined. Without discerning which instruments have adequate validity or reliability the choice of outcome measure becomes nearly impossible because of the abundance of instruments currently available. The use of multiple instruments, and a lack of standardization in application, has caused difficulties with the comparison of results throughout the orthopaedic world.4
Reliability Reliability is an estimation of the consistency and stability of a measuring instrument. It analyses the extent to which an instrument is free from measurement error.1 Ideally, a reliable instrument, measuring the same characteristic tested under similar conditions, would produce very similar results. The two most common classes of reliability estimate are internal consistency and test-retest reliability. The assessment of internal consistency involves a measurement of inter-correlation of the responses from a single administration (in the case of a PROM). The most commonly used method to measure internal consistency uses Cronbach’s alpha7 giving a value that ranges from zero to one (one indicating perfect reliability). Care should be taken when interpreting the results of the alpha, however, because when an instrument has a multidimensional structure, the alpha will usually be low regardless of its internal consistency. It is recommended that alpha should be not lower than 0.7 and not higher than 0.9.1,5 Another method of estimating reliability uses the test-retest method. Unlike internal consistency, which involves a single administration, the test-retest method requires two separate administrations of the same instrument to the same respondents, separated by a certain time interval. The reliability can be expressed in many ways (error standard deviation, coefficient of variation) but the most used and accepted method in recent times is an intra-class correlation coefficient (ICC).5 In order to avoid the learning bias that can occur in the retest, the interval between two administrations should not be too short, as considerably different estimates can be obtained depending on the interval. At the same time, the period between two administrations should not be too long because of the possibility of real clinical change having occurred. An interval of between 2 and 14 days is the most usual period adopted.5
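As a rough illustration of the internal consistency estimate described above, the following Python sketch computes Cronbach's alpha from a table of item responses using the standard formula (the number of items k, the sum of the item variances and the variance of the total scores). The function names, the example responses and the use of sample variances are assumptions made for illustration only; they are not taken from any of the instruments discussed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are needed")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def within_recommended_range(alpha, lower=0.7, upper=0.9):
    """Check alpha against the 0.7 to 0.9 range recommended in the text."""
    return lower <= alpha <= upper


# Hypothetical responses: 5 respondents answering a 4-item questionnaire (0-4 each).
example = [
    [4, 3, 4, 3],
    [2, 2, 3, 2],
    [1, 0, 1, 1],
    [3, 3, 2, 3],
    [0, 1, 1, 0],
]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}, within 0.7-0.9: {within_recommended_range(alpha)}")
```

A value above the upper bound is flagged as well as one below the lower bound, mirroring the recommendation that alpha should be neither lower than 0.7 nor higher than 0.9.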
Validity Validity is concerned with whether a measure measures what it purports to measure.5 Validity has been defined as: ‘The degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test’.6 Various sources of validity evidence are described: evidence based on test content, on the response process, on the internal structure, on relations to other variables and on the consequences of testing. Streiner and Norman describe several validation methods. The content validation of an instrument refers to examining the appropriateness of items on an instrument, with regard to its ability to adequately sample the domain of interest (by a selection of representative questions in the case of PROMs).5 For example, questions about intensive sport would be largely inappropriate for a very elderly patient who is attempting to retain mobility by undergoing knee replacement. The evaluation is qualitative, and the appropriateness of an item is based on expert judgements. For this reason some authors do not consider it to be an expression of validity.5 A PROM is most likely to have good content validity if patients are involved in its development,1 and this is an essential first step when developing new instruments. Criterion validation assesses the degree of correspondence between a measure and a criterion variable (the gold standard).5 For establishing this aspect of validity an objective, reliable standard measure is needed for comparison. For example, the findings at arthroscopy may be considered the gold standard when evaluating the ability of magnetic resonance imaging (MRI) to detect ligament damage within the knee. When it comes to PROMs, especially knee scores, there is often no appropriate gold standard measure, so validity testing must take more indirect approaches.
The only exception would be in the case of validating a concise version of an existing score, where the longer version would represent a ‘gold standard’. This approach would imply looking for a strong correlation between the short and the long version of the score. Two types of temporal designs are common when performing criterion validation of an instrument: (1) concurrent validation, which implies gathering both forms of information at the same time, and (2) predictive validation, when the criterion information becomes known at a later time point.5 Construct validation encompasses a number of methods of validity testing which can be used when assessing a construct
Responsiveness The ability of an instrument to detect meaningful clinical change over time is critical for the use and application of an instrument.1 This might be following an intervention, or just during a period of observation. There are many statistical methods described for quantifying the responsiveness of an instrument.8-10 In the context of knee replacement, changes in operative techniques (i.e. the use of unicompartmental rather than total knee replacement) and moves towards offering surgery to younger, more active individuals than used to be the case, could potentially raise issues around the responsiveness of some outcome measures devised before such developments occurred. Were this to be the case, some real changes resulting from an intervention might not be detected. This could be because
instruments were no longer so finely tuned to matters of importance to the relevant patients. The inability of instruments to detect change can occur for two reasons:
1. An instrument may have compromised validity for that specific population or context in which it is being applied i.e. the content of the questions does not tease out known or observed clinical changes (perhaps identified by a different outcome measure).
2. The instrument is adequate and is reflecting a real lack of change in status.
It is essential for the administrator of the instrument to be aware of any responsiveness issues before interpreting data. Any lack of responsiveness may be due to specific reasons such as the existence of a “floor or ceiling” effect. Terwee et al.11 proposed that floor and ceiling effects and interpretability should always be analysed to establish the methodological soundness of a health status instrument.
The issue of discerning what is considered a clinically relevant change (the minimum amount of change that is discerned as meaningful by patients and which is greater than the instrument’s measurement error12) is particularly important as it affects sample size calculations.
Different types of outcome instruments There is a growing demand for orthopaedic surgeons to record the outcome of their practice. Traditionally, the success or failure of an orthopaedic intervention was judged exclusively on the basis of empirical assessment of quantitative or technical variables. Measuring outcome using this approach was prone to bias, as objective success of an intervention does not necessarily reflect the improvement in the patient’s quality of life.13 Clinicians and researchers are now aware of the difficulty of trying to quantify an outcome. It is often the case that objective measures do not correlate with the patient’s pain and function levels. One of the first attempts to add patient satisfaction assessment to the objective milieu was by Aichroth et al.14 in their knee function assessment chart. Indeed, in the past it was not uncommon to find hybrid instruments combining clinician-based and functional outcome instruments, and these have been used extensively. However, such hybrid instruments rarely have full validation and often suffer low levels of inter- and intra-rater reliability, if assessed at all. In more recent times the importance of assessing symptoms and function based on patients’ perspective13 has been acknowledged, and it could be argued that if no other measures are taken, a self-reported outcome questionnaire, together with a rating of patients’ satisfaction with the outcome, will provide the essential information about success. The stated purpose for which the outcomes data are being obtained will naturally influence the choice of an appropriate outcome measure. The primary purpose might be to rule out or record catastrophic failure of an implant, demonstrate the benefits of a new intervention or merely record population averages. There are many types of PROMs, but one classification contrasts generic (‘general health’) with disease specific instruments.
Generic quality of life instruments are suitable for a broad range of patient groups across the general population. They are multidimensional, covering a broad range of daily activities represented by different sub-scales. Their advantage is that they have the ability to detect the influence of co-morbidities on overall health status and unforeseen effects of a health intervention (Table 1). Disease specific outcome instruments assess the quality of life of a patient who is suffering from a specific disease or health condition. These instruments generally have higher sensitivity than generic ‘quality of life’ instruments. Their targeted focus gives them the ability to detect clinically important changes and this makes them more clinically relevant when assessing changes in health status over time. However, they often lack the ability to detect unforeseen effects of a health intervention.15 At present we are not in a position to select or state the best instruments. The final choice of instrument should be based on current expert consensus and recommendations. Patients are likely to suffer from more than one condition at a time, so
Floor and ceiling effect If more than 20% of the respondents achieve the lowest or the highest possible score, then floor and ceiling effects are present. This may indicate that items at the lower or upper end of the scale are missing.1 Floor and ceiling effects can occur when, say, administering an instrument designed for an older population to a younger, fit sample of respondents who are most likely to score at the higher/better end of the scale. As previously mentioned, if floor and ceiling effects are present, responsiveness is also reduced as participants cannot be distinguished from each other, and measuring change in health status will be restricted. It should always be remembered that the measurement properties of an instrument are not simply ‘of the instrument’, but ‘of the instrument in relation to the characteristics of the study population and context to whom it is applied’; thus, for instance, apparent floor or ceiling effects might simply reflect the (inappropriate) timing of the assessment in relation to an intervention e.g. asking patients to complete a knee score when they had been told not to kneel, or weight bear, during the preceding 4 weeks.
Interpretability Interpretability is defined as the degree to which one can assign qualitative meaning to a quantitative score.11 This issue can concern what we consider to be a “good”, “bad” or “indifferent” outcome and what we consider to be a clinically relevant change. Thus, if a post-intervention score of 75/100 is generated in the absence of a context (particularly, knowledge of the pre-intervention or baseline score, and hence the amount of change that has occurred), the value may be relatively meaningless. This area fuels controversy, or certainly irritation. In order to provide context and a qualitative interpretation, some authors have categorized or graded groups of scores to represent ‘excellent’, ‘good’, and so on, to help guide interpretation.
However, measurement purists reject this practice, as any ‘cut-off points’ are always arbitrary. Instead, they recommend that the scores should stand alone with interpretation being made only in the context of each trial or experiment. The issue is likely to remain unresolved. However, at a minimum, it is useful to have baseline comparative values with which study results can be compared.
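The 20% criterion for floor and ceiling effects given earlier in this section can be checked directly on a set of scores. The Python sketch below is illustrative only: the function name, the dictionary keys and the example scores (on a hypothetical 0 to 48 scale) are assumptions, and the threshold is applied as "more than 20%", following the wording in the text.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.20):
    """Report the share of respondents at the lowest and highest possible score.

    An effect is flagged when the share is strictly greater than the threshold
    (20% by default, as stated in the text).
    """
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical sample: a young, fit group scoring near the top of a 0-48 scale.
sample_scores = [48, 48, 48, 47, 45, 48, 44, 46, 39, 41]
print(floor_ceiling_effects(sample_scores, min_score=0, max_score=48))
```

As the text stresses, a flagged effect describes the instrument in relation to the sample and the timing of assessment, so the same instrument may behave differently in another population.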
Table 1 Most commonly used generic instruments in orthopaedics

Instrument: Sickness Impact Profile (SIP)49
Domains: Physical subscale (ambulation, mobility, body care and movement); and psychosocial subscale (social interaction, communication, alertness, behaviour, emotional behaviour; sleep and rest, eating, home management, recreation and pastimes, employment). Yes/no, 136-items
Score: Overall, category or dimension profile (0 = best/100 = worst)
Method of administration: Patient interview/self administered
Time for administration: 20-30 min

Instrument: Nottingham Health Profile (NHP)50
Domains: Physical mobility, pain, social isolation, emotional reaction, energy, sleep. Yes/no, 38-items
Score: Domain profile (0 = best/100 = worst)
Method of administration: Self administered
Time for administration: 5-10 min

Instrument: Short Form 36-item Health Survey (SF-36)51
Domains: Bodily pain (BP), vitality (V), general health (GH), physical functioning (PF), social functioning (SF), mental health (MH), role emotional (RE), role physical (RP). Likert scale, 36-items; available as a shorter 12-item version SF-12
Score: Domain profile (0 = worst/100 = best); Two summary scores: physical component summary (PCS); mental component summary (MCS)
Method of administration: Patient interview/self administered
Time for administration: 5-10 min

Instrument: European Quality of Life Questionnaire (EQ-5D)27
Domains: Mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Self-rated health: EQ-VAS
Score: Domain profile. Three levels: (no problems, some problems, severe problems). EQ-VAS (100 = best imaginable state/0 = worst imaginable state). EQ-5D index
Method of administration: Patient interview/self administered
Time for administration: 5 min
consists of 12 questions with Likert-box type responses. As in the SF-36, the scores are calculated using weighted algorithms and summarized into Physical Component Summary (PCS) and Mental Component Summary (MCS). From April 2009, in the UK, all NHS-funded providers of unilateral knee replacement are reporting on PROMs.26 Data are collected pre-operatively and postoperatively, and comprise the OKS as a condition-specific measure and the EQ-5D27 as a generic measure. EQ-5D measures health-related quality of life by assessing five dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression) with three levels of severity (no problems/some or moderate problems/extreme problems) and has a visual analogue scale (VAS) which allows individuals to rate their own health state on the day of the assessment, ranging from 0 (worst imaginable state) to 100 (best imaginable state). Despite being a valuable tool, the OKS (and similar instruments) will never be perfect for measuring arthroplasty outcomes, and the interpretation of scores may not be straightforward. One example concerns discrepancies between patient satisfaction and the PROM score: a high ‘good’ absolute score may not be accompanied by high patient satisfaction. A recent study in shoulder replacement showed that one group of patients with high Oxford Shoulder Scores were not necessarily satisfied with their outcome compared to another group of patients. In this instance, it is likely that the discrepancy was related to pathology type and co-morbidity. The dissatisfied patients were those who had undergone replacement for rheumatoid arthritis rather than osteoarthritis.28 Whilst no similar finding has been reported for the knee, a similar picture could easily emerge.
The relationship between PROMs and measures of patient satisfaction, as well as patients’ global judgements of change (‘transition’: how is the pain in your knee now, compared with before surgery?) warrants further exploration.
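As a small illustration of the two OKS scoring conventions mentioned in this article (the original 12 to 60 total with 12 as the best score, and the revised 0 to 48 total with 48 as the best score, each of the 12 questions scored 0 to 4), the Python sketch below totals a set of item scores and converts an old-style total to the revised scale. The function names are hypothetical, and the conversion assumes a simple reversal of the scale (revised total = 60 minus original total), which follows from the two ranges quoted here but should be checked against the instrument's own scoring guidance before use.

```python
def oks_total(item_scores):
    """Sum twelve item scores on the revised convention (0 worst to 4 best each)."""
    if len(item_scores) != 12:
        raise ValueError("the OKS has 12 items")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(item_scores)


def oks_revised_from_original(original_total):
    """Convert an original-convention total (12 best, 60 worst) to 0-48 (48 best).

    Assumes the revised scale is a straight reversal of the original one.
    """
    if not 12 <= original_total <= 60:
        raise ValueError("original OKS totals range from 12 to 60")
    return 60 - original_total


# Hypothetical patient: the best answer on every item except two.
items = [4, 4, 3, 4, 4, 4, 2, 4, 4, 4, 4, 4]
print(oks_total(items))
print(oks_revised_from_original(15))
```

Keeping the two conventions in separate, explicitly named functions is one way to avoid mixing older published totals with scores collected on the revised scale.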
including a generic questionnaire together with the condition-specific assessment will ensure a more comprehensive evaluation.16 Furthermore, generic instruments have the ability to detect unforeseen side-effects of the intervention that cannot always be identified using narrow condition-specific instruments, and this makes them more suitable for economic related evaluations.1
Knee outcome scores according to disorders/intervention
Knee arthroplasty Observer based outcome scores have been developed for measuring the outcome of knee arthroplasty. The best known of these are the Hospital for Special Surgery (HSS) score17 and its successor the American Knee Society Score (AKSS).18 The more commonly used AKSS has a knee-specific assessment relating to pain and clinical examination, together with a separate assessment of function. As described earlier, like all assessor-based scoring systems, they suffer from observer bias and have been widely criticized for this. In the past, both scores were reported widely in the literature and the AKSS was the recommended outcome tool by the American Journal of Bone and Joint Surgery. We have previously mentioned that the focus on patient reported outcomes has been increasing in orthopaedic surgery in recent years and these will continue to be the primary outcome assessment methods for arthroplasty. From the patient’s point of view, goals of knee replacement surgery are chiefly pain relief and the restoration of function and mobility. The evolution and proliferation of outcome tools over the years may leave the reader confused when it comes to choosing an appropriate instrument for a data collection exercise or study. The quality requirements of a self reported instrument for assessing outcome of knee arthroplasty are the same as for assessing any other condition or intervention and include evidence of validity and reliability.
Unfortunately, many instruments do not adequately encompass the recommended quality criteria for instrument development.1,11 Indeed, only a few instruments have been shown to meet the recommended criteria.19-21 The Oxford Knee Score (OKS) (occasionally referred to as the Oxford-12 Knee Score) is a 12-item joint-specific self-administered questionnaire that has been developed with patients to assess the outcome after knee arthroplasty.22,23 It has been scientifically appraised and demonstrated to be valid, reliable and responsive to clinical change with minimal patient burden and maximal feasibility.23,24 The original scoring system was from 12 to 60 with 12 as the best/least severe score or outcome. More recently the scoring system has been altered to the more intuitive system of 0-48 with 48 being the maximum or best score representing minimal deficit in outcome or status.23 Each of the 12 questions is scored from 0 to 4 with 4 being the best possible outcome. In a study of 3600 patients the OKS has been recommended for use in arthroplasty patients in preference to the WOMAC 24 predominantly based on considerations of patient burden, feasibility, validity and reliability. Inclusion of a generic quality of life instrument provides a better overall picture of a patient’s health state. A combination of the OKS and the Medical Outcomes Study 12-Item Short Form (SF-12) is recommended 21,24 for patients who have undergone knee arthroplasty. The SF-12 25 is an abridged version of the SF-36 and
Knee osteoarthritis The measurement of outcome for patients with osteoarthritis, rather than arthroplasty, presents some different problems. Whilst many of the issues are similar, there are some aspects peculiar to measuring disease progress. Arthroplasty involves the assessment of a rather dramatic intervention which could allow the use of somewhat blunt outcome instruments (although not if long-term follow-up is envisaged). The assessment of gradual, often slow and subtle changes in function due to disease progression demands the use of precise and responsive tools. The universality of the problem also means that the standardization of the outcome instrument is of crucial importance in both clinical work and evidence-based medical studies. Several systems are in existence. The Outcome Measures in Rheumatology Clinical Trials (OMERACT) group advised on a core set of outcome dimensions for hip and knee OA phase III clinical trials, which should include physical function (performance and daily activities), pain, patient global assessment and standardized joint imaging (studies lasting one year or longer).29 A systematic review19 recommended the use of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) VA3.0 and WOMAC numeric scale30 for evaluating pain and physical function in patients with hip and/or knee OA. The WOMAC Index is a widely
recommended, valid, reliable and responsive instrument, developed for patients with osteoarthritis. The latest version of the instrument WOMAC 3.1 uses a battery of 24 items, assessing three dimensions (pain, disability and joint stiffness). The Knee Injury and Osteoarthritis Outcome Score (KOOS) is based on the WOMAC and assesses five dimensions in 42 questions (pain, symptoms, function in daily living (ADL), function in sports and recreation (Sport/Rec), and in knee related quality of life).31 Results are not combined in a single score, but presented as five different values representing five dimensions of the instrument. The KOOS has good evidence of reliability, validity and responsiveness, and has been recommended as a good choice for long and short term assessment of knee OA, anterior cruciate ligament (ACL) reconstruction and meniscus injury.19,31,32 It is recommended that a generic instrument should be used (Table 1) together with a condition-specific one.16 The Medical Outcomes Study 36-Item Short Form Health Survey (SF-36)33 is designed to measure general health status. It consists of 36 questions with Likert-box response options. Questions are grouped into eight domains with scores ranging from 0 (poor health) to 100 (good health). The SF-36 has been shown to have good psychometric properties and is applicable to patients suffering from a wide range of health conditions, although it may not be ideal for use with elderly people. The SF-36 has demonstrated good overall ratings in patients with OA of the knee.19,20 Together with the use of WOMAC, SF-36 is the most recommended for evaluating pain and function in patients with knee OA. A set of criteria to be used as an endpoint in clinical trials of disease-modifying osteoarthritic drugs was recently proposed by the ongoing OARSI/OMERACT initiative.34 These consist of three domains (pain, function and structure). 
With regard to the proposed domains, two new PROMs were developed. The KOOS Physical Function Short Form (KOOS-PS)35 consists of seven items devised to assess function of the knee with various degrees of OA severity. The Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP)36 consists of 11 items devised to measure the pain of people suffering from knee and hip OA. The rationale behind this initiative is to facilitate further research in this area. However, these instruments have not yet been subjected to satisfactory methodological review and will benefit from further validation studies.
Knee ligament injuries
To date, no single scoring system has achieved widespread acceptance as a PROM for the ligament-deficient (primarily anterior cruciate-deficient) knee. One reason for the lack of consensus is that the nature of knee ligament injuries is incompletely understood and their manifestations are not uniform. Some patients have mechanically based problems of instability, best measured using objective tests, whereas others have more perception-based problems and feelings of instability, which are best measured by self-report or activity assessment. This makes it difficult to establish and define the criteria that should be addressed when assessing outcome. As mentioned earlier, relying solely on clinical examination to measure outcome could compromise the findings, as this method is highly prone to intra- and inter-rater bias. It also requires patients to attend hospital appointments for assessment at particular points in time, which can be difficult. Any comprehensive tool that is used should satisfy established criteria for an outcome instrument.1,11

The Cincinnati Knee Rating System, published in 1983, was originally developed as an outcome measure for ACL injuries. Since then, it has undergone modifications and has been used for a variety of conditions. It is comprehensive and covers symptoms, sports, occupational activities, functional limitations with activities of daily living and 'objective' findings. It is well documented and appears to have the most supporting evidence of validity, but the inclusion of objective findings compromises its utility.21

The Activities of Daily Living Scale of the Knee Outcome Survey (ADLS of KOS)37 is a patient-administered, knee-specific outcome instrument designed for a variety of knee conditions. It was developed on the basis of clinician opinion and a review of relevant instruments. It has good evidence of reliability, validity and responsiveness for assessing the outcome of treatment of ACL-deficient knees. It is therefore recommended as a suitable option when assessing knee ligament injuries and patello-femoral disorders.21

A simple but useful tool is the Lysholm and Gillquist score, first reported in 1982.38 It consists of eight items on a 100-point scale and is best administered alongside the Tegner Activity Score.39 These Swedish systems have recently been revalidated40 and appear to offer a compact yet valid method of evaluating outcome for knee ligament injuries.

The International Knee Documentation Committee (IKDC)41 instrument is often cited as the most comprehensive assessment of the soft tissue injured knee. The system rates knee status from A to D, with A being the best. Whilst comprehensive, it can be unwieldy in a clinic situation and suffers from having to grade each section according to the lowest denominator. For example, an otherwise normal knee with grade D laxity (a finding that does not necessarily relate to symptoms) will end up being scored as an overall D, the worst possible outcome.

Meniscal injury
As noted above, the Lysholm Knee Scale was originally developed to evaluate patients after knee ligament surgery, but it has been widely used for assessing meniscal injuries. The Lysholm-Tegner system has been fully documented and found adequate in its test-retest reliability, floor and ceiling effects, criterion validity, construct validity and responsiveness to change for meniscal injuries.42 KOOS has been shown to have good properties and is recommended for assessing the outcome of meniscal surgery,21,32 together with the ADLS of KOS.21 The IKDC score has recently demonstrated good psychometric properties for assessing meniscal injuries.43 An injury-specific instrument that is able to detect subtle and small changes between groups or procedures would be most beneficial. The Western Ontario Meniscal Evaluation Tool (WOMET)44 was developed in 2005 specifically to measure quality of life in patients undergoing treatment for meniscal pathology. The instrument has 16 items that represent three domains: (1) physical symptoms, (2) sports, recreation/work/lifestyle and (3) emotions. The instrument has good psychometric properties, but has been used in only one clinical study so far. Additional comparison of existing instruments to assess the outcome of meniscal tears and interventions is needed.
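The IKDC grading behaviour noted above for ligament injuries, in which a single poor finding determines the overall grade, can be illustrated with a minimal sketch. It assumes only what the text states (grades run from A, best, to D, worst, and the overall grade follows the worst section grade). The function name `overall_grade` and the section names are hypothetical and are not taken from the IKDC form itself.

```python
# Illustrative sketch only: the overall grade is taken to be the worst
# (latest-letter) grade among the sections, as described in the text.
GRADES = ("A", "B", "C", "D")  # A is best, D is worst


def overall_grade(section_grades):
    """Return the worst grade found among the section grades."""
    if not section_grades:
        raise ValueError("at least one section grade is required")
    for grade in section_grades.values():
        if grade not in GRADES:
            raise ValueError(f"unknown grade: {grade!r}")
    # The grade with the highest position in GRADES is the worst one.
    return max(section_grades.values(), key=GRADES.index)


# Hypothetical section names; an otherwise normal knee with grade D laxity.
knee = {"symptoms": "A", "range_of_motion": "A", "laxity": "D"}
print(overall_grade(knee))  # D
```

Because the rule takes the worst grade rather than an average, the overall result cannot distinguish this knee from one graded D in every section, which is the limitation the text describes.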
ORTHOPAEDICS AND TRAUMA 24:4
© 2010 Elsevier Ltd. All rights reserved.
Isolated articular cartilage defect
A number of scores have been validated for use in the assessment of isolated cartilage damage in the knee. The International Cartilage Repair Society (ICRS) macroscopic cartilage evaluation score, a clinician-based instrument, has proven psychometric properties as an outcome instrument for research use.45 As previously described in this article, clinician-based instruments of this type are becoming less favoured than PROMs because of their many known shortcomings. The IKDC subjective knee form46 and the KOOS score47 are PROMs that have recently been validated for use in patients with focal cartilage lesions. In addition, the Lysholm score has been adapted for use with cartilage lesions, and this modified version has now been validated to assess articular cartilage defects.48 Unfortunately, there is still no general consensus as to which of the many currently available instruments should be used.
Summary
There is a noticeable and recent emphasis on measurement and outcome reporting for the knee. Clinicians and researchers are advised to increase their awareness and knowledge of the current and new tools coming into existence. Whilst the orthopaedic literature is awash with scoring systems and outcome measures to assess status and the effect of intervention on the knee, not all will be suitable or appropriate. Instruments should be chosen carefully after close inspection of evidence of their validity, reliability, context and purpose. An instrument deemed suitable to test hypotheses in a research study may not be as valuable for large-scale, population-based data collection. This rapidly expanding area of musculoskeletal medicine will increasingly influence most aspects of orthopaedics and those who choose to ignore it will likely be disadvantaged.

REFERENCES
1 Fitzpatrick R, Davey C, Buxton M, Jones D. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998; 2: 1-74.
2 Slevin ML, Plant H, Lynch D, Drinkwater J, Gregory WM. Who should measure quality of life, the doctor or the patient? Br J Cancer 1988; 57: 109-12.
3 Garratt A, Schmidt L, Mackintosh A, Fitzpatrick R. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ 2002; 324: 1417.
4 Davies AP. Rating systems for total knee replacement. Knee 2002; 9: 261-6.
5 Streiner D, Norman R. Health measurement scales: a practical guide to their development and use. 4th edn. Oxford University Press, 2008. pp. 431.
6 AERA, APA, NCME. Standards for educational and psychological testing. Washington: American Educational Research Association, 1999. pp. 194.
7 Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297-334.
8 Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997; 50: 239-46.
9 Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine 2000; 25: 3192-9.
10 Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 2003; 12: 349-62.
11 Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007; 60: 34-42.
12 Dawson J, Doll H, Boller I, et al. Comparative responsiveness and minimal change for the Oxford Elbow Score following surgery. Qual Life Res 2008; 17: 1257-67.
13 Jackowski D, Guyatt G. A guide to health measurement. Clin Orthop Relat Res 2003; 413: 80-9.
14 A knee function assessment chart. From the British Orthopaedic Association Research Sub-Committee. J Bone Joint Surg Br 1978; 60-B: 308-9.
15 Garratt AM, Klaber Moffett J, Farrin AJ. Responsiveness of generic and specific measures of health outcome in low back pain. Spine 2001; 26: 71-7; discussion 7.
16 Dieppe PA. Recommended methodology for assessing the progression of osteoarthritis of the hip and knee joints. Osteoarthritis Cartilage 1995; 3: 73-7.
17 Ranawat CS, Insall J, Shine J. Duo-condylar knee arthroplasty: hospital for special surgery design. Clin Orthop Relat Res 1976; 120: 76-82.
18 Insall JN, Dorr LD, Scott RD, Scott WN. Rationale of the Knee Society clinical rating system. Clin Orthop Relat Res 1989; 248: 13-4.
19 Veenhof C, Bijlsma JW, van den Ende CH, van Dijk GM, Pisters MF, Dekker J. Psychometric evaluation of osteoarthritis questionnaires: a systematic review of the literature. Arthritis Rheum 2006; 55: 480-92.
20 Garratt AM, Brealey S, Gillespie WJ. Patient-assessed health instruments for the knee: a structured review. Rheumatology 2004; 43: 1414-23.
21 Pynsent PB, Fairbank JCT, Carr AJ. Outcome measures in orthopaedics and orthopaedic trauma. 2nd edn. London: Arnold, 2004. pp. 381.
22 Dawson J, Fitzpatrick R, Murray D, Carr A. Questionnaire on the perceptions of patients about total knee replacement. J Bone Joint Surg Br 1998; 80: 63-9.
23 Murray DW, Fitzpatrick R, Rogers K, et al. The use of the Oxford hip and knee scores. J Bone Joint Surg Br 2007; 89-B: 1010-4.
24 Dunbar MJ, Robertsson O, Ryd L, Lidgren L. Appropriate questionnaires for knee arthroplasty. Results of a survey of 3600 patients from The Swedish Knee Arthroplasty Registry. J Bone Joint Surg Br 2001; 83: 339-44.
25 Ware Jr J, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34: 220-33.
26 DH/NHS Finance Performance and Operations. The NHS in England: the operating framework for 2008/09. Department of Health, 2007.
27 EuroQol. A new facility for the measurement of health related quality of life. The EuroQol Group. Health Policy 1990; 16(3): 199-208.
28 Rees JL, Dawson J, Hand GC, et al. The use of PROM’s and patient satisfaction ratings to assess outcome in elective orthopaedic surgery. BESS 2009. University College London, 2009.
29 Bellamy N, Kirwan J, Boers M. Recommendations for a core set of outcome measures for future phase III clinical trials in knee, hip, and hand osteoarthritis. Consensus development at OMERACT III. J Rheumatol 1997; 24: 799-802.
30 Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988; 15: 1833-40.
31 Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS) - development of a self-administered outcome measure. J Orthop Sports Phys Ther 1998; 28: 88-96.
32 Roos EM, Roos HP, Ekdahl C, Lohmander LS. Knee injury and Osteoarthritis Outcome Score (KOOS) - validation of a Swedish version. Scand J Med Sci Sports 1998; 8: 439-48.
33 Ware Jr JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992; 30: 473-83.
34 Dougados M, Hawker G, Lohmander S, et al. OARSI/OMERACT criteria of being considered a candidate for total joint replacement in knee/hip osteoarthritis as an endpoint in clinical trials evaluating potential disease modifying osteoarthritic drugs. J Rheumatol 2009; 36: 2097-9.
35 Perruccio AV, Stefan Lohmander L, Canizares M, et al. The development of a short measure of physical function for knee OA KOOS-Physical Function Shortform (KOOS-PS) - an OARSI/OMERACT initiative. Osteoarthritis Cartilage 2008; 16: 542-50.
36 Hawker GA, Davis AM, French MR, et al. Development and preliminary psychometric testing of a new OA pain measure - an OARSI/OMERACT initiative. Osteoarthritis Cartilage 2008; 16: 409-14.
37 Irrgang JJ, Snyder-Mackler L, Wainner RS, Fu FH, Harner CD. Development of a patient-reported measure of function of the knee. J Bone Joint Surg Am 1998; 80: 1132-45.
38 Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med 1982; 10: 150-4.
39 Tegner Y, Lysholm J. Rating systems in the evaluation of knee ligament injuries. Clin Orthop Relat Res 1985; 198: 42-9.
40 Briggs KK, Lysholm J, Tegner Y, Rodkey WG, Kocher MS, Steadman JR. The reliability, validity, and responsiveness of the Lysholm score and Tegner activity scale for anterior cruciate ligament injuries of the knee: 25 years later. Am J Sports Med 2009; 37: 890-7.
41 Hefti F, Muller W, Jakob RP, Staubli HU. Evaluation of knee ligament injuries with the IKDC form. Knee Surg Sports Traumatol Arthrosc 1993; 1: 226-34.
42 Briggs KK, Kocher MS, Rodkey WG, Steadman JR. Reliability, validity, and responsiveness of the Lysholm knee score and Tegner activity scale for patients with meniscal injury of the knee. J Bone Joint Surg Am 2006; 88: 698-705.
43 Crawford K, Briggs KK, Rodkey WG, Steadman JR. Reliability, validity, and responsiveness of the IKDC score for meniscus injuries of the knee. Arthroscopy 2007; 23: 839-44.
44 Kirkley A, Griffin S, Whelan D. The development and validation of a quality of life-measurement tool for patients with meniscal pathology: the Western Ontario Meniscal Evaluation Tool (WOMET). Clin J Sport Med 2007; 17: 349-56.
45 van den Borne MP, Raijmakers NJ, Vanlauwe J, et al. International Cartilage Repair Society (ICRS) and Oswestry macroscopic cartilage evaluation scores validated for use in Autologous Chondrocyte Implantation (ACI) and microfracture. Osteoarthritis Cartilage 2007; 15: 1397-402.
46 Greco NJ, Anderson AF, Mann BJ, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form in comparison to the Western Ontario and McMaster Universities Osteoarthritis Index, Modified Cincinnati Knee Rating System, and Short Form 36 in patients with focal art. Am J Sports Med, published online before print December 31, 2009, doi:10.1177/0363546509354163.
47 Bekkers JE, de Windt TS, Raijmakers NJ, Dhert WJ, Saris DB. Validation of the Knee Injury and Osteoarthritis Outcome Score (KOOS) for the treatment of focal cartilage lesions. Osteoarthritis Cartilage 2009; 17: 1434-9.
48 Smith HJ, Richardson JB, Tennant A. Modification and validation of the Lysholm Knee Scale to assess articular cartilage damage. Osteoarthritis Cartilage 2009; 17: 53-8.
49 Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981; 19(8): 787-805.
50 Hunt SM, McKenna SP, McEwan J, Backett EM, Williams J, Papp E. A quantitative approach to perceived health status: a validation study. J Epidemiol Community Health 1980; 34(4): 281-6.
51 Ware JE, Kosinski M. The SF-36 health survey (version 2.0) technical note. Boston: Health Assessment Lab, 1996 (updated 1997).
Acknowledgements
Financial support was received from the NIHR Biomedical Research Unit into Musculoskeletal Disease, Nuffield Orthopaedic Centre and University of Oxford.