(vi) Hip outcome measures


Mini-symposium: what’s new in hip replacement — BASIC PRINCIPLES


Miss E Ashby
MPW Grocott
FS Haddad

Abstract

In the UK the Department of Health published a report earlier this year entitled ‘High Quality Care For All’, which requires all hospitals to collect and publish surgical outcome data by 2010, including post-operative complication rates, surgical site infection rates and patient-centered quality-of-life assessments. This paper gives an overview of the outcome measures used to assess interventions on the hip.

Keywords: outcome measures; quality of life; questionnaires

Introduction

Outcome measures and scores are used to assess the impact of orthopaedic interventions for many purposes, such as clinical trials and comparisons of different interventions, alternative prostheses, different methods of fixation and surgical techniques. They are used to assess elements of peri-operative care such as the use of prophylactic antibiotics, increased physiotherapy input and different regimens of post-operative analgesia, as well as for audit purposes, comparing individuals, departments, hospitals and regions. This enables good practice to be highlighted and propagated, and remedial action to be instituted where practice is sub-standard.

Orthopaedic surgical outcomes can be assessed in several ways, e.g. generic clinical outcomes, radiological outcomes, post-operative complication rates, re-operation rates, length of in-patient stay and health-related quality of life. Generic clinical outcome measures assess overall mortality and morbidity following surgery. These include mortality at various time-points after surgery, length of post-operative in-patient stay, and incidence of specific complications (e.g. hip dislocation and periprosthetic fracture). Surgical site infection (SSI) is a major risk, giving rise not only to patient pain and discomfort, but also to wound dehiscence, deep infection and generalized sepsis, necessitating further surgery and even admission to the intensive care unit. As a patient with an SSI spends on average twice as long in hospital, SSI is not only distressing for the patient but also an economic burden for the health care provider. It is proposed that from 2010, all UK hospital SSI data will be available on the ‘NHS Choices’ website.

Historically, the success or failure of an orthopaedic intervention was assessed and reported by the operating surgeon. This is changing: more emphasis is now placed on patient-centered assessments and patient-reported quality of life. There are several quality of life assessments available, which fall into three broad categories: generic, disease-specific and joint-specific. Generic surveys aim to investigate all aspects of quality of life and can be used to assess any medical or surgical intervention. Disease-specific tools concentrate on disability relating to a particular condition or single disease entity. Joint-specific tools are used to assess the impact of disease in one particular joint.

Analysis of quality of life outcome data has previously been at group level. However, many recent studies have focused on individual patient outcomes, either by responder criteria or by state-attainment criteria. In the former case, each patient is classified as a responder or a non-responder to treatment based on whether the change in health status exceeds a pre-defined threshold. With state-attainment criteria, a patient is classified not on the basis of change, but on whether a certain level of low symptom severity is attained. Research in both areas is experimental but may provide more relevant results than group-level studies.

Patient co-morbidity can affect the results of outcome studies. The Physiological and Operative Severity Score for the enUmeration of Mortality and morbidity (POSSUM) [1] was developed to overcome this problem. Each patient is given a score according to physiological status and the severity of the surgery they are to undergo. In clinical trials, the mean POSSUM scores of all patient groups must be similar to ensure that results are not skewed by patient co-morbidity. A validated orthopaedic version of POSSUM is now available.

What makes a good outcome measure?

If they are to be good descriptors of clinical or quality-of-life phenomena, outcome measures must fulfill certain psychometric criteria. They must be reliable, valid and sensitive to change. Questionnaires should also be acceptable to patients, simple, easy to use and score, and preferably short.

Reliability is a term used inconsistently in the literature. It is a measure of the degree to which subjects can be distinguished from each other, and can be defined as the ratio of the variance between subjects to the total variance. A reliability value of zero indicates a completely unreliable measure, whereas a value of one indicates a perfectly reliable measure. Reliability therefore depends on the relationship between the measurement error and the variability between subjects. Internal consistency and reproducibility are both components of reliability.

Internal consistency determines whether a survey measures a single variable. The test for internal consistency is Cronbach’s alpha, which summarizes the inter-correlation of all questions in a survey on a single scale. The higher the alpha coefficient, the greater the likelihood that the questionnaire is tapping into a single variable and is free from random error.
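As an illustration of the internal-consistency statistic described above, the following is a minimal Python sketch of Cronbach's alpha computed from raw questionnaire responses. The function name, the data layout (one list of item scores per respondent) and the example values are hypothetical and are not taken from any of the instruments reviewed here. The formula used is the standard one: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total scores), where k is the number of questions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Hypothetical helper for illustration only.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses: four respondents answering a three-item survey.
example = [[3, 4, 3], [1, 2, 1], [4, 4, 5], [2, 2, 2]]
alpha = cronbach_alpha(example)
```

When every item ranks the respondents identically, the item variances account for as little of the total variance as possible and alpha approaches one. Items that vary independently of each other drive alpha towards zero.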

Miss E Ashby BA MB BChir MA(Cantab) MRCS is Orthopaedic Registrar at Chase Farm Hospital, Middlesex, UK. MPW Grocott BSc MBBS MRCP FRCA is Senior Lecturer in Critical Care Medicine at the Surgical Outcome Research Centre, Joint UCLH/UCL Comprehensive Biomedical Research Centre, London, UK. FS Haddad BSc MCh(Orth) FRCS Ed FRCS(Tr & Orth) is Consultant Orthopaedic Surgeon at University College London Hospital, UK.

ORTHOPAEDICS AND TRAUMA 23:1


© 2009 Elsevier Ltd. All rights reserved.


acceptable to patients. It was designed to identify morbidity of a type and severity that could delay discharge from hospital, using a simple data collection process that allows routine screening of large numbers of patients. It concentrates on indicators of clinically important dysfunction in key organ systems (e.g. inability to tolerate enteral diet) rather than traditional diagnostic categories (e.g. DVT). The survey assesses nine domains of morbidity (Table 1) using readily available data and requires no additional investigations; all data are obtained from observation charts, medication charts, patient notes, routine blood test results and direct questioning and observation of the patient.

The relationship between short-term generic clinical outcome and long-term quality of life outcome is not yet clearly defined. One significant problem with all quality of life outcome measures is the length of time taken to collect the data. If short-term outcome measures could predict longer-term quality of life, they could be used as early surrogate markers for longer-term function and patient satisfaction. While POMS provides early post-operative information, it is not yet known whether POMS data correlate with long-term quality of life data. Hence both short-term and long-term outcome measures are needed to assess the success of any intervention.

As well as being a scale to measure and compare post-operative morbidity, POMS can also be used as a tool to assess when patients are ready for discharge. When the POMS score is

Reproducibility investigates whether a questionnaire produces the same results when repeated under the same conditions. Inter-observer reliability (agreement between two or more observers on the same occasion), intra-observer reliability (same observer on separate occasions) and test-retest reliability (stability of the measure over time in the same subject) are all aspects of reproducibility. Paired sets of data can be compared using the kappa coefficient or the coefficient of reliability according to the method of Bland and Altman. A higher coefficient indicates a more reproducible questionnaire.

Validity examines whether a questionnaire measures what it is proposed to measure. Several types of validity exist: content and face validity, criterion (convergent/concurrent) validity and construct validity. Face and content validity assess whether a survey fully investigates the intended topic of interest. Content validity can be increased by conducting exploratory interviews with patients before writing the questionnaire, which elucidates the priorities and concerns of patients rather than imposing clinical assumptions. Face and content validity are subjective, with no statistical method to assess them. Criterion validity assesses how a new questionnaire compares with an established questionnaire on the same subject, i.e. the current ‘gold standard’. This approach is only tenable when such a ‘gold standard’ is available, and it raises the question of why a new measure is being developed. For measures where no ‘gold standard’ exists, construct validity examines the extent to which the results from the questionnaire support predefined hypotheses. It can be measured using Pearson correlation coefficients between the total score for the questionnaire and other measures considered to be associated with the underlying construct being investigated. Construct validity also investigates whether a single concept is being measured by the questionnaire; if this is proven, scores can be combined to produce one overall score. It is tested by calculating the correlation between scale scores.

Tests of sensitivity or responsiveness investigate whether a survey is capable of detecting clinically significant changes. Sensitivity is expressed as an effect size: the difference between the mean pre-operative and post-operative scores, divided by the standard deviation of the pre-operative scores. An effect size of one is equal to a change of one standard deviation in the sample.
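The effect-size definition of sensitivity given above can be written out as a short calculation. The following Python sketch is illustrative only: the function name and the example scores are hypothetical, the sign convention (post-operative minus pre-operative mean) is an assumption, and the sample standard deviation is assumed because the text does not specify sample or population form.

```python
from statistics import mean, stdev


def effect_size(pre_scores, post_scores):
    """Mean change divided by the standard deviation of the pre-operative scores.

    Assumptions: change is taken as post minus pre, and the sample
    standard deviation (n - 1 denominator) is used.
    """
    if len(pre_scores) < 2:
        raise ValueError("need at least two pre-operative scores")
    sd_pre = stdev(pre_scores)
    if sd_pre == 0:
        raise ValueError("pre-operative scores do not vary")
    return (mean(post_scores) - mean(pre_scores)) / sd_pre


# Made-up scores for three patients before and after surgery.
pre = [10, 20, 30]
post = [20, 30, 40]
change_in_sd_units = effect_size(pre, post)
```

In this made-up example the mean rises by 10 points and the pre-operative standard deviation is 10, so the change equals one standard deviation.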

Short-term clinical outcome measures

There are a variety of ways to describe the overall clinical impact of operative interventions and the associated physiological disturbance of undergoing major surgery, all of which have their limitations. Mortality is now so infrequent for most types of orthopaedic surgery that it is not a useful comparative index: the event rate is so low that very large numbers of operations would have to be compared to demonstrate a meaningful difference in outcome. Length of hospital stay is sometimes used as a surrogate for clinical outcome, but it is influenced by many factors other than the health status of the patient. Post-operative complications and morbidity following surgery are poorly recorded, which led to the development of the Post-Operative Morbidity Survey (POMS) [2]. It has been used in post-operative morbidity, outcomes and effectiveness research and has been shown to be reliable, valid and

Table 1. Criteria for a positive POMS score

Variable | Criteria for positive result
Pulmonary | Requires supplementary oxygen or ventilatory support
Infection | Currently on antibiotics or temperature >38 °C in the last 24 hours
Renal | Oliguria (<500 ml/day), elevated creatinine (>30% pre-op level), catheter in situ (for non-surgical reason)
Gastrointestinal | Unable to tolerate enteral diet for any reason
Cardiovascular | Diagnostic tests or treatment within the last 24 hours for: myocardial infarction, hypotension (requiring pharmacological therapy or fluids >200 ml/hour), atrial/ventricular arrhythmia or cardiogenic pulmonary oedema
Central nervous system | Presence of new focal deficit, coma, confusion, delirium
Wound complications | Wound dehiscence requiring surgical exploration or drainage of pus from operative wound with or without isolation of organisms
Haematological | Requirement of blood transfusion, platelets, fresh frozen plasma or cryoprecipitate within the last 24 hours
Pain | Wound pain requiring parenteral opioids or regional anaesthesia
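The nine POMS domains in Table 1 lend themselves to a simple per-patient record. The sketch below is a hypothetical illustration, not part of the published survey: it assumes that a clinician has already judged each domain positive or negative against the Table 1 criteria, and it assumes that the POMS score is the count of positive domains. A score of zero is flagged as "likely fit for discharge", following the use of POMS described in this section.

```python
# Domain names follow Table 1; the identifiers themselves are hypothetical.
POMS_DOMAINS = (
    "pulmonary",
    "infection",
    "renal",
    "gastrointestinal",
    "cardiovascular",
    "central nervous system",
    "wound complications",
    "haematological",
    "pain",
)


def poms_summary(findings):
    """Summarise one patient's POMS assessment.

    findings maps a domain name to True (criteria met) or False.
    Domains that are not listed are treated as negative (an assumption).
    """
    unknown = set(findings) - set(POMS_DOMAINS)
    if unknown:
        raise ValueError(f"unknown POMS domains: {sorted(unknown)}")
    positive = [d for d in POMS_DOMAINS if findings.get(d, False)]
    return {
        "positive_domains": positive,
        # Assumption: the score is the number of positive domains.
        "score": len(positive),
        "likely_fit_for_discharge": not positive,
    }


# Made-up example: a patient still needing parenteral opioids for wound pain.
day_three = poms_summary({"pain": True, "renal": False})
```

Recording the positive domains as well as the count keeps the reason for a delayed discharge visible, which is what an audit of discharge processes would need.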


erythema, serous and purulent exudates, and the clinical consequences of infection such as prolonged hospital stay and readmission. A score of over 10 indicates an increasing probability and severity of infection (Table 2). The original ASEPSIS scoring method was psychometrically tested and found to be objective and repeatable, but the most recent revised version has not undergone the same evaluation. Scoring methods provide more detailed and objective information regarding SSI than the CDC and NINSS definitions, but they are more costly, complicated and time-consuming to perform: the average time taken to collect the data and calculate an overall ASEPSIS score is 59 minutes.
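The interpretation bands for the total ASEPSIS score given in Table 2 can be expressed as a small lookup. The following Python sketch is illustrative: the function name is hypothetical, and it only classifies a total score that has already been calculated. It does not attempt to reproduce how the individual ASEPSIS components are weighted, which is not described in this article.

```python
def asepsis_category(score):
    """Map a total ASEPSIS score to the meaning listed in Table 2."""
    if score < 0:
        raise ValueError("ASEPSIS score cannot be negative")
    if score <= 10:
        return "No infection. Normal healing."
    if score <= 20:
        return "Disturbance of healing"
    if score <= 30:
        return "Minor infection"
    if score <= 40:
        return "Moderate infection"
    return "Severe infection"


# Made-up wound scores from a surveillance round.
categories = [asepsis_category(s) for s in (4, 18, 33)]
```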

zero, a patient is likely to be fit for discharge. Therefore, as well as providing useful clinical research and audit data, POMS may be useful to assess the efficiency of a hospital’s discharge processes.

Surgical site infection

Wound surveillance in orthopaedic surgery became mandatory in the NHS in England in 2004. Patient follow-up is essential in any wound surveillance programme, as half of surgical site infections (SSIs) present after hospital discharge. Reported SSI rates depend on the diagnostic criteria, case mix, the thoroughness of surveillance and documentation, and the length of patient follow-up. A reliable and repeatable method for diagnosing SSI is essential for audit and treatment purposes.

There is a misconception that SSIs are simple to define and diagnose. However, several definitions of SSI exist and the criteria used vary between surgeons. If SSI rates are to be published, as suggested in ‘High Quality Care For All’ published by the Department of Health [3], the same validated method for diagnosing SSIs must be used by all institutions. Traditionally SSIs were diagnosed by the clinical features of pain (dolor), redness (rubor), heat (calor), swelling (tumor) and impairment of function, but with the increasing emphasis on clinical governance and accountability in the NHS, more practical, reproducible and reliable methods of diagnosing SSI are necessary.

Three common SSI definitions in use today are the US Centers for Disease Control and Prevention (CDC) definition, the English Nosocomial Infection National Surveillance Scheme (NINSS) definition and the English ASEPSIS definition.

The CDC definition is used worldwide to classify wound infections. It includes any wound infection within 30 days of surgery, or one year if an implant is present. Infection is classified as ‘none’, ‘superficial’ or ‘deep’. The CDC definition comprises four criteria, only one of which must be fulfilled to diagnose infection:
1) Purulent discharge from the incision (incisional infection) or from a drain below the fascia (deep infection).
2) Surgeon’s diagnosis of infection.
3) For incisional infections, an organism isolated from fluid culture or a surgeon opening the wound, unless cultures are negative.
4) For deep infections, a spontaneous dehiscence or surgeon opening the wound in the presence of fever or local pain, unless cultures are negative or an abscess is present on direct examination.
Although widely used, the CDC definition is weak since three of the four criteria are subjective. On psychometric evaluation the CDC definition has been shown to be unreliable.

The UK NINSS definition of SSI is a modified version of the CDC definition. It requires the presence of pus cells for a wound culture to be classified as positive and excludes a surgeon’s diagnosis of infection as a sufficient criterion. These changes were intended to make the CDC definition more objective, but interpretation of NINSS has still been shown to vary between both hospitals and regions.

ASEPSIS is a quantitative wound scoring method developed in 1986 [4]. It provides a numerical score which indicates the severity of wound infection. The score is calculated using objective criteria based on the physical appearance of the wound such as


Table 2. ASEPSIS scores

ASEPSIS score | Meaning
0–10 | No infection. Normal healing.
11–20 | Disturbance of healing
21–30 | Minor infection
31–40 | Moderate infection
Over 40 | Severe infection

Generic quality of life outcome measures

Generic outcome measures are used by all medical and surgical specialties. They assess overall health-related quality of life and are not specific to age, disease or treatment group. They can be used to compare the relative burden of disease in general and specific populations and to quantify the overall health benefits from different treatments. The World Health Organisation Quality of Life Group recommended that generic surveys should explore five areas [5]:
• physical health
• psychological health
• social relationship perceptions
• function
• well-being

Commonly used generic outcome measures are:
• Medical Outcomes Study 36-Item Short Form Health Survey (SF-36)
• Medical Outcomes Study 12-Item Short Form Health Survey (SF-12)
• European quality-of-life 5 dimension questionnaire (EuroQol/EQ-5D).

The Medical Outcomes Study 36-Item Short Form Health Survey (SF-36)

SF-36 is a multi-purpose questionnaire used to measure general health status [6]. It was originally developed in American English but a United Kingdom English version is now available. It refers to health over the previous four weeks, but a more acute version, referring to health over the previous week, is available. The questionnaire contains 36 questions, each of which has between 2 and 6 answers. Each is scored between 0 (poor health) and 100 (good health). The questions are grouped into one of eight health domains: bodily pain (BP), physical functioning (PF), role limitations due to physical health (RP), general health (GH), mental health (MH), vitality (VT), social functioning (SF) and role limitations due to emotional health (RE). There is also a health transition question that does not contribute to any of the eight domains.

The domains can be amalgamated into two higher order groups, known as the Physical Component Summary (PCS) and the Mental Component Summary (MCS). The PCS is calculated from the BP, PF, RP and GH scores and is most responsive to treatments that alter physical symptoms, such as hip arthroplasty. The MCS is calculated from the MH, VT, SF and RE scores and is most responsive to drugs and therapies that target psychiatric disorders. Three of the scales (VT, GH and SF) have a significant correlation with both the physical and mental summary measures.

SF-36 takes approximately ten minutes to complete. It is proven to be suitable for self-administration, computerised administration or administration by an interviewer, either in person or by telephone. Scores are calculated by summated ratings and standardised SF-36 algorithms. Individual question scores are summated without standardisation or weighting. Standardisation is avoided by using questions with roughly similar means and standard deviations, and weighting is avoided by selecting equally representative questions.

SF-36 has been evaluated in several studies. It is proven to be valid, reliable, sensitive and acceptable to patients. It has been used in over 4,000 publications assessing over 200 different diseases and has been specifically investigated in patients undergoing hip arthroplasty, in whom it was shown to be both valid and reliable. However, these studies also showed that SF-36 has minor ‘floor’ and ‘ceiling’ effects. A ‘floor’ effect occurs when a questionnaire is unable to measure a value lower than the range provided in the choice of answers. In this situation, if a patient reports the lowest value for a question and then deteriorates further, the deterioration will not be detected by the questionnaire. A ‘ceiling’ effect is the opposite situation, where a questionnaire is unable to measure a value higher than the range provided in the choice of answers. In this situation, if a patient reports the highest value for a question and then improves, the improvement will not be detected by the questionnaire.
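One simple way to look for the floor and ceiling effects described above is to count how many respondents already sit at the lowest or highest possible score. The following Python sketch is a hypothetical illustration: the function name and example scores are invented, and the article does not state what proportion should be regarded as a problem, so no threshold is applied.

```python
def floor_ceiling_rates(scores, min_score, max_score):
    """Return the fraction of respondents at the minimum and at the maximum score."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling


# Made-up domain scores on a 0-100 scale for four patients.
floor_rate, ceiling_rate = floor_ceiling_rates([0, 50, 100, 100], 0, 100)
```

Patients counted in the ceiling fraction cannot register any further improvement on that scale, and those in the floor fraction cannot register further deterioration.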

The Medical Outcomes Study 12-Item Short Form Health Survey (SF-12)

SF-12 [7] is an abridged version of SF-36 containing 12 of the 36 questions. SF-12 questions can be amalgamated to produce profiles of the eight SF-36 health concepts, but only if the sample size is sufficiently large; if the sample is too small, there are insufficient data to calculate scores for the eight health profiles. SF-12 scores are calculated using weighted algorithms (i.e. the questions in SF-12 contribute different values to the overall score, unlike SF-36), for which a computer program is available. The main advantage of SF-12 over SF-36 is that it is shorter and therefore quicker for patients to complete and quicker for research personnel to record and analyse. One disadvantage is that a computer program is necessary for scoring each survey. A further disadvantage is that SF-12 has less construct validity and sensitivity than SF-36, producing less precise scores for the eight-scale health profile. This is less important in large group studies, since the confidence intervals are largely determined by sample size, but it could result in non-significant findings in smaller studies.

The European quality of life 5 dimension questionnaire (EuroQol/EQ-5D)

The EuroQol (EQ-5D) [8] questionnaire comprises two pages. The first page contains five questions, one on each of five aspects of general health: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each question has three possible answers: ‘no problem’, ‘moderate problem’ or ‘extreme problem’. The second page aims to elucidate how patients regard their overall health using a visual analogue scale, with 0 indicating the worst possible health and 100 indicating the best possible health. EuroQol was designed to be self-administered, takes five minutes to complete and has been shown in studies to be both valid and reliable. However, it suffers from ‘ceiling’ effects due to the restricted response format. This is partially overcome by the use of the visual analogue scale on the second page. Psychometric analysis of the questionnaire in patients undergoing lower limb arthroplasty is limited, although there is some evidence of construct validity, test-retest reliability and sensitivity.

Disease-specific quality of life outcome measures

Disease-specific outcome measures assess the impact of a disease on a patient’s quality of life. They are used in both research and clinical practice to assess and compare alternative surgical and medical treatments for the same disease entity. Two disease-specific questionnaires are commonly used to assess hip arthritis: the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index and the Arthritis Impact Measurement Scales (AIMS). These can be used to assess arthritis in any joint and are not restricted to assessment of the hip.

The Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index

The WOMAC Index was developed for patients with osteoarthritis [9]. The original version contained 5 subscales (WOMAC 5.0) but only three were retained for further development (WOMAC 3.0). Globalisation of WOMAC resulted in several refinements leading to WOMAC 3.1, which is now the standard. It was developed in Canadian English and designed to be self-administered. It comprises 24 questions covering three topics: joint pain, joint stiffness and physical function. Other versions are available with differing numbers of questions and dimensions to meet different measurement needs. The standard version uses a 48-hour timeframe, but it is sufficiently robust to tolerate variations from 24 hours to 1 month, and it is available in 5-point Likert, 100 mm visual analogue and 11-point numerical rating formats. Most clinical research uses the Likert and visual analogue versions of WOMAC 3.1.

The WOMAC Index has been extensively evaluated and shown to be valid, reliable and sensitive. It has been used in several hundred publications and has been translated and linguistically validated in over 65 languages. Its validity has been specifically tested in patients undergoing hip arthroplasty, where it has been shown to be sensitive and to have high internal consistency and acceptable test-retest reliability. However, it does show post-operative


ceiling effects for the pain and stiffness subscales in patients undergoing hip arthroplasty, as do SF-36 and EuroQol. One study showed WOMAC to have superior sensitivity to generic outcome measures. However, disease-specific and generic outcome measures are generally used for different purposes, and the use of both instruments together often provides more information than using either individually. It has been hypothesised that WOMAC could be used to predict future health status and health resource utilization, but this remains unproven.

The Arthritis Impact Measurement Scale (AIMS)

The Arthritis Impact Measurement Scale (AIMS) was originally developed to measure outcome in patients with rheumatic disease but has subsequently been evaluated in patients with osteoarthritis and shown to be sensitive to clinical improvement. AIMS2 [10] is a modified version of the original AIMS, developed in American English. It was designed to be self-administered and takes 20 minutes to complete. The questionnaire comprises 78 questions exploring 12 concepts: mobility, walking and bending, hand and finger function, arm function, self-care tasks, household tasks, social activity, support from family and friends, arthritis pain, work, level of tension and mood. The range of scores for each concept depends upon the number of questions it contains. The scores within each health concept are simply added. In order to express each scale in the same units, a normalization procedure is performed so that each concept is expressed as a value from 0 to 10, with 0 representing good health status and 10 representing poor health.

The AIMS scales can be combined into 3 or 5 component models of health status. The 3 component model groups the scales into general categories of physical function, psychological status and pain. The 5 component model combines the scales into measures of lower limb function, upper limb function, affect, symptoms and social interaction. AIMS2 has been psychometrically evaluated and shown to be both valid and reliable. A modified version of AIMS is available specifically for patients undergoing hip arthroplasty. This has 57 questions which can be grouped into four higher order profiles: physiologic function, self concept, role function and interdependence. Evaluation of this version proved the questionnaire to be sensitive and valid, but reliability remains unproven.

Hip-specific outcome measures

Historically the outcome after hip arthroplasty was assessed by the operating surgeon using tools such as the Harris Hip Score and the Charnley Score. These were derived from clinical and radiological data and ultimately depended on the judgment of the surgeon. Patients’ and surgeons’ opinions often differ, and it became apparent that methods were needed to elicit the patient’s perception of their hip surgery. This led to the design of patient-centered, hip-specific quality of life surveys such as the Oxford Hip Score (OHS), the Hip disability and Osteoarthritis Outcome Score (HOOS) and the University of California at Los Angeles (UCLA) hip score.

Harris Hip Score

The Harris Hip Score [11] was developed in American English to assess patients following hip arthroplasty. It is made up of 8 questions and a physical examination. The questions cover 3 dimensions: pain (with a maximum score of 44), function (with a maximum score of 33) and level of activity (with a maximum score of 14). The physical examination assesses hip range of motion and a maximum of 9 points can be awarded. The points in each section are added together to give a maximum possible score of 100. The score is rated:

90–100 | Excellent
80–89 | Good
70–79 | Fair
60–69 | Poor
<60 | Fail

The original version of the assessment was performed entirely by the surgeon. This has been modified to create a patient-reported measure of 7 questions regarding hip pain, walking aids, limping, walking distance, climbing stairs, putting on shoes and socks, and sitting. Each question has between 3 and 7 answers, which are expressed on a Likert-type scale. The scores are added to give a total score of between 0 and 100, where 0 is the best result. On psychometric evaluation the Harris Hip Score was found to be reliable but other forms of validity remain unproven in the literature.

Charnley Score

The Charnley Score [12] was developed in UK English and is another example of a surgeon-assessed outcome. Hip pain, mobility and walking are graded on a 6-point scale. Walking is only assessed in patients who have no other condition that may undermine their walking ability. Higher scores indicate better outcome. Scores for different treatment groups can either be averaged, or state-attainment criteria can be used, in which the number of patients scoring 5 or 6 in each group is compared. There are no studies in the literature validating use of the Charnley Score in hip arthroplasty patients.

Oxford Hip Score

The Oxford Hip Score (OHS) [13] is a joint-specific outcome measure designed to assess outcome in patients undergoing hip arthroplasty. It was developed in UK English and takes five minutes to complete. It contains 12 questions to assess hip pain and functional ability over the preceding four weeks. Each question has five possible answers, scored from 0 to 4, giving an overall score range of 0–48. A score of 0 is the best possible outcome, with higher scores indicating increasing problems. The OHS has been used extensively in the orthopaedic literature and its psychometric properties have been rigorously examined. It is internally consistent, reproducible and valid. It is proven to be sensitive in patients undergoing both primary and revision hip arthroplasty. Some studies have found the OHS to be more sensitive than generic measures such as SF-36 and disease-specific measures such as WOMAC.

The Hip Disability and Osteoarthritis Outcome Score

The Hip disability and Osteoarthritis Outcome Score (HOOS) [14] is a joint-specific survey that evolved from the Knee disability and Osteoarthritis Outcome Score (KOOS). It was designed to be self-administered and takes 10 minutes to complete. HOOS has 40 questions, each of which has five possible answers (scored 0–4). The questions can be grouped into 5 higher order dimensions: pain, other symptoms, activities of daily living, sport and hip-related quality of life. The scores from each dimension are added together and then transformed onto a scale of 0–100, where 100 represents the best outcome. HOOS has been evaluated and proven to be both valid and responsive. HOOS contains all the WOMAC Likert 3.0 questions and thus can be used to calculate WOMAC scores; indeed, two of the subscales (pain and other symptoms) have been shown to be more responsive than WOMAC.
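The scoring rules described above for the hip-specific instruments are simple sums, and a short Python sketch makes their different directions explicit: a high Harris Hip Score is good, a high Oxford Hip Score is bad, and HOOS is rescaled so that 100 is best. All function and key names below are hypothetical. The Harris section maxima, the rating bands and the OHS range are taken from the text. The HOOS transformation is an assumption: the text says only that summed scores are transformed onto 0–100, and the sketch assumes that 0 on each question means no problem and that the rescaling is linear.

```python
# Section maxima for the Harris Hip Score as listed in the text.
HARRIS_MAXIMA = {"pain": 44, "function": 33, "activity": 14, "range_of_motion": 9}


def harris_hip_score(sections):
    """Sum the four section scores after checking each against its maximum."""
    total = 0
    for name, maximum in HARRIS_MAXIMA.items():
        points = sections[name]
        if not 0 <= points <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
        total += points
    return total


def harris_rating(total):
    """Rate a Harris Hip Score total using the bands given in the text."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if total >= 90:
        return "Excellent"
    if total >= 80:
        return "Good"
    if total >= 70:
        return "Fair"
    if total >= 60:
        return "Poor"
    return "Fail"


def oxford_hip_score(answers):
    """Sum twelve answers scored 0-4; 0 is the best outcome, 48 the worst."""
    if len(answers) != 12 or any(a not in range(5) for a in answers):
        raise ValueError("expected twelve answers, each scored 0 to 4")
    return sum(answers)


def hoos_subscale(answers):
    """Rescale summed 0-4 answers to 0-100, where 100 is the best outcome.

    ASSUMPTION: 0 means no problem on every question and the rescaling is
    linear; the article does not give the exact transformation.
    """
    if not answers or any(a not in range(5) for a in answers):
        raise ValueError("expected one or more answers, each scored 0 to 4")
    return 100 - (sum(answers) * 100) / (4 * len(answers))


# Made-up patient: near-maximal Harris sections, mild problems on the OHS.
hhs = harris_hip_score({"pain": 40, "function": 30, "activity": 12, "range_of_motion": 8})
rating = harris_rating(hhs)
ohs = oxford_hip_score([1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
```

Keeping each instrument in its own function avoids accidentally comparing scores that run in opposite directions.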

References

1 Copeland GP, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg 1991; 78: 355–60.
2 Bennett-Guerrero E, Welsby I, Dunn TJ, et al. The use of a postoperative morbidity survey to evaluate patients with prolonged hospitalization after routine, moderate-risk, elective surgery. Anesth Analg 1999; 89: 514–19.
3 Darzi A. High quality care for all. Next stage NHS review. Final report. DOH June 30th 2008.
4 Wilson AP, Treasure T, Sturridge MF, Gruneberg RN. A scoring method (ASEPSIS) for post-operative wound infections for use in clinical trials of antibiotic prophylaxis. Lancet 1986; i: 311–13.
5 Study Protocol of the World Health Organisation project to develop a quality of life instrument (WHOQOL). Qual Life Res 1993; 2: 153–9.
6 Ware JE Jr, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36). Med Care 1992; 30: 473–83.
7 Ware JE, Kosinski M, Keller SD. A 12-item short-form health survey: construction of scales and preliminary tests of reliability and validity. Med Care 1996; 34: 220–33.
8 Brooks R. EuroQol: the current state of play. Health Policy 1996; 37: 53–72.
9 Bellamy N. Osteoarthritis - an evaluative index for clinical trials (MSc thesis). Hamilton, Ontario, Canada: McMaster University; 1982.
10 Meenan RF, Mason JH, Anderson JJ, Guccione AA, Kazis LE. AIMS2: the content and properties of a revised and expanded arthritis impact measurement scales health status questionnaire. Arthritis Rheum 1992; 35: 1–10.
11 Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty. An end-result study using a new method of result evaluation. J Bone Joint Surg Am 1969; 51: 737–55.
12 Charnley J. The long-term results of low-friction arthroplasty of the hip performed as a primary intervention. J Bone Joint Surg Br 1972; 54: 61–76.
13 Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br 1996; 78: 185–90.
14 Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS) - validity and responsiveness in total hip replacement. BMC Musculoskelet Disord 2003; 4: 10.
15 Amstutz HC, Thomas BJ, Jinnah R, Kim W, Yale C. Treatment of primary osteoarthritis of the hip. A comparison of total joint and surface replacement arthroplasty. J Bone Joint Surg Am 1984; 66: 228–41.

The University of California at Los Angeles hip scale

The University of California at Los Angeles (UCLA) hip scale15 is often used to assess post-operative outcome in arthroplasty patients. More recently it has also been used to assess hip arthroscopy outcomes. The scale explores four dimensions: pain, walking, function and activity. There are 10 points on the scale, with 10 indicating the best outcome and 1 indicating the worst. While there is no published psychometric evidence validating the UCLA hip scale, many studies still use it as a measure of outcome.

Conclusions

Knowledge of outcome measures is becoming increasingly important for assessing orthopaedic interventions, for both research and audit purposes. When outcome data are made available to the general public in 2010, it will be imperative that each institution uses the same validated method for collecting the data. If different institutions use different outcome measures, like will not be compared with like, and any comparisons will be invalid and misleading. Guidelines will therefore have to be given to health-care providers as to which method they should use to collect each data set. Short-term post-operative complications can be recorded using the POMS. Longer-term patient satisfaction can be assessed using generic, disease-specific or joint-specific patient-centered quality of life questionnaires. Of the several published quality of life questionnaires, the most evaluated generic measure is SF-36, the most evaluated disease-specific measure is WOMAC and the most evaluated hip-specific measure is the Oxford Hip Score. There are also several methods of measuring SSI, none of which has been shown to be definitively superior to the others.

ORTHOPAEDICS AND TRAUMA 23:1

© 2009 Elsevier Ltd. All rights reserved.