The art of choosing sound study endpoints

Injury, Int. J. Care Injured (2008) 39, 656–658

www.elsevier.com/locate/injury

Beate Hanson*
AO Clinical Investigation and Documentation, Stettbachstrasse 6, CH-8600 Dübendorf, Switzerland
Accepted 7 February 2008

KEYWORDS Clinical trials; Outcome; Function; Quality of life

Summary
The selection of appropriate endpoints is key to successful clinical studies. The chosen primary outcome measure ultimately determines the required sample size, significantly affects trial logistics, and governs acceptance of the results in the scientific community. Confirmatory statistical analysis should be restricted to the primary endpoint, whereas secondary endpoints are better analysed in an exploratory fashion. Although investigators may prefer surrogates (e.g., radiographic results, laboratory markers), clinical trials should no longer be planned without addressing function or quality of life. Validated outcome tools (e.g., the Short-Form 36) are needed to ensure that findings are comparable with those of other studies.
© 2008 Published by Elsevier Ltd.

Benefits and harms
Valuable clinical treatments are those with a favourable benefit/harm ratio (sometimes called the therapeutic index). For example, total joint replacement has proven efficacy and effectiveness in restoring mobility, alleviating symptoms, and improving quality of life in patients with osteoarthritis of the knee; these benefits clearly outweigh the risks and intermittent discomfort of surgery.1,2 Determining the benefit/harm ratio of a specific treatment closely resembles solving an economic equation: Porzsolt coined the term "clinical economics" to describe the process of weighing intangible payments (such as pain) against yields.3

* Tel.: +41 44 200 24 70; fax: +41 44 200 24 60.
0020-1383/$ - see front matter © 2008 Published by Elsevier Ltd. doi:10.1016/j.injury.2008.02.014

Many competing treatment options may exist for the same condition. For internal fixation of proximal humerus fractures, ten different implants are available, all of which promise good functional outcomes. At the same time, there are at least 30 different outcome assessment instruments for the shoulder (e.g., DASH, Constant-Murley score, Neer score). How can we decide whether a treatment is better than (i.e., superior to) others? Treatments can also be equivalent, i.e., the difference in outcomes is so small that it could not be detected even by the largest imaginable study, or is far below what is clinically meaningful. This is a two-tailed hypothesis: the treatment of interest is neither better nor worse than its competitor. Under a one-tailed hypothesis, the treatment may be shown to be at least as good as (i.e., non-inferior to) another one.

There are two major types of endpoints: (1) surrogates (such as laboratory measures and radiographs), and (2) patient-reported outcomes (PROs) (e.g., the Short-Form 36, SF-36). The selection of suitable endpoints depends on the purpose and scope of the planned study, the available evidence from previous trials, the clinical setting, the intervention(s) of interest, and the anticipated treatment effects. The classic chronology of clinical investigation, from phase I (e.g., pharmacodynamics, toxicity, and tolerated dosage of a drug) through phase II (e.g., preliminary data on success, such as increased bone mineral density) to phase III studies (i.e., outcome evaluation), is rarely followed in orthopaedic and trauma research.

Clinical trials must employ validated instruments with known psychometric properties. Self-developed, untested scales and scores, as well as "modifications" of available tools, must be avoided; instruments need to be tested and validated in separate studies. Key features of outcome instruments are (1) validity, (2) reliability, and (3) responsiveness. Validity means that the instrument measures what it is intended to measure. Reliability refers to technical measurement properties such as test-retest and interobserver reproducibility and the internal consistency of a scale; a reliable instrument ensures that measurements are made with minimum error. It must be stressed that an instrument can be reliable (i.e., precise) but still invalid if it fails to measure the concept it is supposed to measure. Responsiveness is the instrument's sensitivity to change in the measured concepts, that is, its ability to detect changes when they are present; it is an important characteristic for measuring the effects of treatments over time.

Many questionnaire-based measurement instruments are index measures: each answer is scored, and a total score is calculated using a formula that combines the individual scores.
For example, the Harris hip score has a maximum of 100 points and combines 0 to 44 points for pain, 0 to 47 points for function, 0 to 5 points for range of motion, and 0 to 4 points for absence of deformity. All questions in the instrument need to be answered to calculate the overall score; using only a few questions from the instrument is not appropriate, as this prevents calculation of the index. In addition to index-based measures, some instruments create a profile consisting of multiple uncombined scores from several related domains. For example, the American Spinal Injury Association outcome measure (ASIA) consists of two 0 to 100 index measures (ASIA motor and ASIA sensory) and one single rating for impairment from A to E.
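The difference between an index measure and a profile can be illustrated with a short Python sketch. It assumes that the four Harris hip score domain scores have already been obtained (the item-level scoring within each domain is not shown), and the dictionary keys and the `AsiaProfile` class are hypothetical names chosen for illustration. The index is computed only when every domain is present, while the ASIA profile keeps its components side by side, using the ranges given in the text.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Domain maxima of the Harris hip score as described in the text (sum = 100).
# The key names are hypothetical labels for this sketch.
HHS_MAXIMA = {"pain": 44, "function": 47, "range_of_motion": 5, "deformity": 4}


def harris_hip_score(domains: Mapping[str, float]) -> Optional[float]:
    """Combine domain scores into the 0-100 index.

    Returns None when any domain is missing, because an incomplete
    questionnaire cannot yield a valid index.
    """
    total = 0.0
    for name, maximum in HHS_MAXIMA.items():
        value = domains.get(name)
        if value is None:
            return None
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must lie in 0..{maximum}, got {value}")
        total += value
    return total


@dataclass(frozen=True)
class AsiaProfile:
    """A profile: related scores are reported side by side, never summed."""
    motor: int       # 0 to 100, as stated in the text
    sensory: int     # 0 to 100, as stated in the text
    impairment: str  # single grade from "A" to "E"

    def __post_init__(self) -> None:
        for name in ("motor", "sensory"):
            if not 0 <= getattr(self, name) <= 100:
                raise ValueError(f"{name} must lie in 0..100")
        if self.impairment not in ("A", "B", "C", "D", "E"):
            raise ValueError("impairment must be one of A to E")


print(harris_hip_score({"pain": 40, "function": 38, "range_of_motion": 4, "deformity": 4}))  # 86.0
print(harris_hip_score({"pain": 40, "function": 38}))  # None: incomplete questionnaire
```

Returning `None` rather than a partial sum mirrors the point that answering only some questions prevents calculation of the index.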


The primary endpoint
One of the endpoints serves as the primary endpoint, and its choice has to align with the main objective of the study. The primary endpoint determines the statistical design of the study: sample size and power are based on statistical assumptions about the behaviour of the primary endpoint variable. While sample size calculation should remain in the hands of the statistician, the qualitative and quantitative definition of the primary endpoint variable is the clinician's responsibility. As sample size and power are always tailored to the primary endpoint, the study may be underpowered (i.e., have too small a sample size) for the analysis of secondary endpoints. It is of particular importance that the primary endpoint can be measured with a valid, reliable, and responsive instrument.4
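To show why the primary endpoint drives the sample size, the following Python sketch uses the common normal-approximation formula for comparing two group means, n = 2(z_alpha + z_beta)^2 sigma^2 / delta^2 patients per group, where z_alpha is the critical value for the chosen significance level (alpha/2 when two-sided) and z_beta corresponds to the desired power. The standard deviation sigma and the difference or margin delta are assumptions the clinician would supply from prior data; the numbers in the usage lines are hypothetical. Setting `one_sided=True` corresponds to a one-tailed (e.g., non-inferiority) hypothesis, here assuming no true difference between treatments. It is a rough orientation only; the formal calculation remains the statistician's task.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma: float, delta: float, alpha: float = 0.05,
                power: float = 0.80, one_sided: bool = False) -> int:
    """Approximate patients per arm for a two-group comparison of means.

    sigma: assumed standard deviation of the endpoint
    delta: difference to detect (superiority) or margin (non-inferiority)
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical primary endpoint: SD of 15 points, relevant difference of 10 points.
print(n_per_group(15, 10))                  # 36
print(n_per_group(15, 10, one_sided=True))  # 28
# A secondary endpoint with a smaller expected effect needs far more patients,
# so a trial sized for the primary endpoint may be underpowered for it.
print(n_per_group(15, 5))                   # 142
```

Halving the detectable difference roughly quadruples the required sample size, which is why the quantitative definition of the primary endpoint matters so much.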

Secondary endpoints
Secondary endpoints combine outcome measures that provide important information about the benefits and harms of the investigated treatment. Evaluation of treatments commonly includes condition-specific and generic endpoints. Condition-specific instruments target areas of health affected by a specific condition (e.g., osteoarthritis). In orthopaedic surgery, condition-specific instruments most often relate to a certain anatomic region, and numerous instruments have been developed to assess the function of ankles, knees, hips, shoulders, elbows, hands, and wrists. Not all of them have been rigorously tested for validity, so investigators are advised to verify the scientific standing of the planned outcome instrument in advance. Generic instruments capture overall health, function, and well-being. They make it possible to show the overall impact and significance of an intervention on physical and mental health, and to compare treatments across different conditions and populations.

The most widely used generic health instrument is the SF-36 (Medical Outcomes Trust, Boston, MA), a multipurpose, generic, short-form health survey with 36 questions, available in several languages. It yields an eight-scale profile of scores: (1) limitations in physical activities because of health problems, (2) limitations in social activities because of physical or emotional problems, (3) limitations in usual role activities because of physical health problems, (4) bodily pain, (5) general mental health (psychological distress and well-being), (6) limitations in usual role activities because of emotional problems, (7) vitality (energy and fatigue), and (8) general health perceptions. In addition, two summary scores have been constructed using factor analysis, one for physical health (Physical Component Summary, PCS) and the other for mental health (Mental Component Summary, MCS). The scoring algorithm uses a linear T-score transformation based on population norms, which converts scores to a scale with a mean of 50 and a standard deviation of 10.

Safety, measured through complications and adverse effects, should always be included as a secondary endpoint if it is not the primary endpoint. Orthopaedic treatments may benefit the patient, but they may also cause harm, and it is the balance of benefits and harms that determines the true clinical value of the treatment. Planned monitoring of safety should therefore be part of any clinical study. Beyond its importance as the denominator of the benefit/harm ratio, the safety of subjects enrolled in a clinical study is an ethical imperative. The primary responsibility of the investigator is to ensure the safety of the study subjects, which starts with the planned collection and regular monitoring of safety endpoints throughout the study.
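The linear T-score transformation used in SF-36 scoring can be written as T = 50 + 10(x - m)/s, where x is a scale score and m and s are the mean and standard deviation of that scale in a reference population. The Python sketch below applies only this final rescaling step; the norm values in the usage lines are placeholders rather than published SF-36 norms, and the official algorithm (including the weighting used to build PCS and MCS) is not reproduced.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linearly rescale a score so the reference population has mean 50 and SD 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm in SD units
    return 50 + 10 * z


# Placeholder norms (mean 84, SD 22) for a hypothetical 0-100 scale.
for raw in (62, 84, 106):
    print(raw, t_score(raw, norm_mean=84, norm_sd=22))  # 40.0, 50.0, 60.0
```

A patient scoring one reference standard deviation below the norm mean thus receives a T-score of 40, which makes scales with different raw ranges directly comparable.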

Other topics to consider
Many instruments are copyright protected. Copyright protection requires investigators to obtain permission from the copyright holder to use the instrument, and copyrighted instruments cannot be altered without permission from the developer or copyright holder. Copyright protects the intellectual investment of the developer and the sponsor, and also ensures that instruments are used in a valid and acceptable fashion. Some instruments also require user fees, which are often higher for commercially sponsored studies and lower or even waived for academic studies. When applying for user rights, it is a good idea to specify clearly whether the study has an industrial or a non-profit sponsor. The absence of user fees does not mean that investigators need not obtain copyright permission.

Prospective clinical studies are time- and resource-consuming. Investigators are therefore easily tempted to pose many relevant research questions and to collect as much data as possible. This, however, may overwhelm patients, cause attrition, and make them lose interest in the study or decline further participation. The quantity of information collected therefore has to be rationalized; sometimes it is a good idea to pilot data collection on only a few subjects.

In conclusion, the selection of endpoints supports the main study objectives. Instruments need to be chosen from among validated and widely accepted outcome measures. Special consideration has to be given to the choice of the primary endpoint, which determines the sample size and other key logistic aspects of the study. Patient-centred, condition-specific, and generic health outcomes ultimately define the clinical value of treatments in orthopaedic and trauma surgery.

Conflict of interest
None.

References
1. NIH Consensus Statement on total knee replacement. NIH Consens State Sci Statements 2003;20:1–34.
2. Kane RL, Saleh KJ, Wilt TJ, Bershadsky B. The functional outcomes of total knee arthroplasty. J Bone Joint Surg Am 2005;87:1719–24.
3. Porzsolt F, Ackermann M, Amelung V. The value of health care – a matter of discussion in Germany. BMC Health Serv Res 2007;7:1–15.
4. Suk M, Hanson BP, Norvell DC, Helfet DL. AO handbook musculoskeletal outcomes measures and instruments. 1st ed. New York: Thieme; 2005.