CLINICAL THERAPEUTICS / VOL. 18, NO. 5, 1996
Evaluating Quality-of-Life and Health Status Instruments: Development of Scientific Review Criteria*

Kathleen N. Lohr, PhD,1 Neil K. Aaronson, PhD,2 Jordi Alonso, MD, PhD,3 M. Audrey Burnam, PhD,4 Donald L. Patrick, PhD, MSPH,5 Edward B. Perrin, PhD,5 and James S. Roberts, MD6

1Health Services and Policy Research Program, Research Triangle Institute, Research Triangle Park, North Carolina; 2Division of Psychosocial Research and Epidemiology, The Netherlands Cancer Institute, Amsterdam, The Netherlands; 3Department of Epidemiology and Public Health, Institut Municipal d'Investigacio Medica, Barcelona, Spain; 4The RAND Corporation, Santa Monica, California; 5Department of Health Services, University of Washington, Seattle, Washington; and 6Voluntary Hospitals of America, Irving, Texas
ABSTRACT

The Medical Outcomes Trust is a depository and distributor of high-quality, standardized health outcomes measurement instruments to national and international health communities. Every instrument in the Trust library is reviewed by the Scientific Advisory Committee against a rigorous set of eight attributes: (1) conceptual and measurement model; (2) reliability; (3) validity; (4) responsiveness; (5) interpretability; (6) respondent and administrative burden; (7) alternative forms; and (8) cultural and language adaptations. In addition to a full description of each attribute, we discuss uses of these criteria beyond the evaluation of existing instruments and lessons learned in the first few rounds of instrument review against these criteria.

*Presented at the first annual international meeting of the Association for Pharmacoeconomics and Outcomes Research, May 12-15, 1996, Philadelphia, Pennsylvania.
INTRODUCTION

This paper reports on work from the Medical Outcomes Trust and its Scientific Advisory Committee (SAC).1,2 The Trust is a nonprofit public service organization, established in 1994 to serve as a depository and distributor of high-quality, standardized health outcomes measurement instruments to national and international health communities. The goal of the Trust is to achieve universal adoption of health outcomes measurement to improve the value of health services. It believes that strategies for measuring, monitoring, and managing functional outcomes, and for integrating those approaches into quality improvement programs, offer the best hope of achieving value from the nation's investments in health care. Thus the Trust was established to ensure that advanced patient-based instruments and related support materials are easily available to all interested parties. The Trust does this through a variety of services intended to support the application of health outcomes assessment in everyday health care practice; those services include providing timely information and answers to questions, technical support, and ongoing educational activities.3

To ensure that the instruments accepted into the Trust meet rigorous scientific standards, the Trust established the SAC to evaluate all instruments submitted and to make recommendations to the Trust's board of trustees. The SAC evaluates these instruments in the context of their intended applications as well as their general significance and contributions to the field, and also in relation to explicit, public criteria that ensure consistent and defensible recommendations and decisions. To that end, the SAC in 1995 made public its current set of eight attributes used to review and evaluate instruments. These are briefly presented in the remainder of this paper in eight "pairs": first a brief definition of the attribute and then a short outline of the assessment criteria. The appendix gives the fully detailed set of attributes and criteria.
ATTRIBUTES AND RELATED CRITERIA

Conceptual and Measurement Model

The first attribute is the instrument's conceptual and measurement model. The conceptual model comprises the underlying rationale for and description of the concepts that the measure is intended to assess and the relationships between or among those concepts. The measurement model is reflected in the instrument's scale and subscale structure and the procedures that are used to create the scale scores.

At least four major questions must be answered about the conceptual and measurement model. First, what is the basis for combining items into scales? Second, what descriptive statistics for scales can be provided, and do the scales demonstrate adequate variability? Third, what evidence describes or supports the intended level of measurement (eg, ordinal, interval, or ratio scales) or the scaling assumptions for preference-weighted measures? Fourth, are the procedures for deriving scale scores from raw scores adequately specified and justified?
Reliability

Reliability is the degree to which the instrument is free from random error. This aspect of an instrument is examined in two ways: for internal consistency reliability, that is, high correlations among test items, and for reproducibility, that is, stability over time in test-retest circumstances or consistency of scores across raters at a point in time.

The SAC uses several criteria to evaluate reliability. These include reliability estimates and standard errors of measurement for all elements of an instrument; a clear, complete, and detailed description of how the reliability data were collected; reliability estimates for subpopulations (eg, persons with a specific chronic disease or groups with different languages or ethnic backgrounds); and the application of minimal reliability coefficients, depending on whether the comparisons are to be made for groups or for individuals. In addition, specific information about test and retest scores is expected, including the various coefficients appropriate for instruments yielding interval-level, nominal, or ordinal scale values.
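For reference, the internal consistency coefficient most often reported in this context is Cronbach's α. The formula below is the standard textbook definition, not a SAC-specific requirement:

\[
\alpha \;=\; \frac{k}{k-1}\left(1 \;-\; \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
\]

where \(k\) is the number of items, \(\sigma^{2}_{Y_i}\) is the variance of item \(i\), and \(\sigma^{2}_{X}\) is the variance of the total scale score.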
Validity

In many arenas, validity is the most important attribute. The SAC uses a classic definition: the degree to which an instrument measures what it purports to measure. We are concerned with three aspects of validity: content, construct, and criterion. The standards used by the SAC to evaluate validity include the methods used to collect evidence of content and construct validity, as well as a defense of any criteria measures when the instrument developers make a claim for criterion validity. SAC reviewers also look for validity evidence for every proposed use of the instrument and for different populations if it is believed that validity may differ for such groups.

Responsiveness

Responsiveness is an instrument's ability to detect change in outcomes that matter to persons with a health condition, to their significant others, or to their providers. It sometimes is referred to as sensitivity to change, and it is also sometimes regarded as an element of construct validation. Some experts think of it as the ratio of real change over time to variability in scores over time that is not associated with true change in status; more colloquially, responsiveness is the signal-to-noise ratio. The SAC assesses this attribute using longitudinal data, ideally from field tests that compare a group that is expected to change with one that is expected to stay the same, in which change scores are expressed as effect sizes. SAC reviewers look for clear descriptions of those populations and of the approaches used to calculate change and effect size.

Interpretability

Interpretability is defined as the degree to which one can assign qualitative meaning, that is, clinical or commonly understood connotations, to quantitative scores. This can be done with various types of information intended to aid in interpreting scores on the instrument. Examples include comparative data on the distribution of scores from other groups, including the general public, and information on the relationship of scores to disease conditions, to health care interventions, or to various events such as losing a job, graduating from college, or needing institutional care. Basically, in evaluating interpretability, the SAC seeks clear descriptions of the comparison populations and of the means by which the relevant data (eg, on clinical conditions or life events) were amassed, recorded, interpreted, and displayed.

Respondent and Administrative Burden

The sixth attribute is really a pair of properties: respondent burden and administrative burden. These involve the time, energy, financial resources, personnel, or other resources required of respondents or of those administering the instrument. With respect to respondent burden, the SAC is concerned that the instrument place no undue physical or emotional strain on the respondent. Among the important items of information are the following: the time needed to complete the form or interview; the levels of reading or comprehension assumed; any special requirements that might be placed on respondents, such as the need to consult health or financial records; and the acceptability of the instrument, as reflected in the level of missing data or refusal rates. Regarding administrative burden, the SAC requires information about the time needed for a trained interviewer to administer the instrument, the assumptions about the level of training or expertise needed to administer the instrument, and any special resources needed, such as specific computer equipment to administer, score, or analyze the instrument.
Alternative Forms

Alternative forms, the seventh attribute, refers to all the ways in which the instrument might be administered other than the original way; this characteristic can refer to many different modes of applying the instrument. In addition, alternative forms can include proxy versions of the original instrument. Two aspects of evaluating alternative forms of an instrument are important. The first is to evaluate them using most of the criteria already mentioned, that is, evidence of reliability, validity, responsiveness, interpretability, and burden. In addition, SAC reviewers want to see information indicating how comparable each alternative form is to the original document.
Cultural and Language Adaptations

Finally, the eighth attribute involves cultural and language adaptations. Here the focus is on conceptual and linguistic equivalence between the original instrument and its adaptations: equivalence in the relevance and meaning of the various concepts in an instrument, as well as equivalence of the wording and meaning of items and response choices. In addition, the independent psychometric properties of the adaptations are significant aspects of the review, so again the SAC requires information about reliability, validity, responsiveness, interpretability, and burden. In judging this attribute, SAC reviewers seek material about how the developers sought to achieve conceptual equivalence, ideally in terms of content validity. Common rules concerning translations are also applied: at least two forward translations and a pooled forward translation; at least one (more would be better) backward translation and another pooled translation; review of these versions by both lay and expert panels, with revisions as necessary; and field tests to gather evidence of comparability. Certainly any significant differences between the original and translated versions should be pointed out and explained.
LESSONS LEARNED IN EARLY ROUNDS OF INSTRUMENT REVIEW

By early 1996, the SAC had reviewed perhaps two dozen instruments using these criteria and a wide array of information supplied by the developers or their representatives. The main purpose has been to make determinations about recommendations to the Trust's board concerning instruments to be included in its library. (For instruments in the Trust's library as of September 1996, see the table.)

Table. Medical Outcomes Trust-approved instruments (as of September 1996).

  London Handicap Scale
  Quality of Well-Being Scale
  Seattle Angina Questionnaire
  SF-12 Health Survey
  SF-36 Health Survey (standard and acute versions)
    Languages available: Australia/New Zealand (English), Canada (English), Germany (German), Spain (Spanish), Sweden (Swedish), United Kingdom (English), United States (English)
  Sickness Impact Profile

This effort has provided several lessons. First, the psychometric analyses may well have been done for much of the work reviewed, but they often are not fully reported. The SAC often has to defer decisions and ask specifically for the missing information in a second round. In particular, two key pieces of information are often lacking altogether or are only partially available for review. One area involves data on the responsiveness of instruments to changes in health status over time, and the other concerns data that will facilitate (or even enable) the interpretation of scores in terms that are meaningful for clinicians or for lay persons and policy makers. This gap appears particularly worrisome for large-scale clinical trials, where statistically significant differences in scores between groups (over time) may be found but may not be of any real importance to either the patients or the clinicians. Furthermore, data related to the clinical interpretation of scores are critical when health status or quality-of-life instruments are to be introduced into clinical practice settings, again because of the central importance of the patients' and the clinicians' perspectives.

A second lesson is that, notwithstanding these gaps, developers tend to be focused on the conceptual and statistical properties of their measures, and issues of the practicality of the instrument get short shrift. Even when information on reliability and validity is available, for instance, facts and figures on administrative or respondent burden are not given. For users wanting good guidance from the Trust about applications of different instruments, this is an unfortunate omission.

A third point is an inevitable tension between insisting on scientific rigor and passing instruments that may not achieve the level of scientific validity that we would like to see but that appear to have real promise or to fill an important gap in the measurement armamentarium. Generally, the SAC has had the primary objective of applying stringent standards to all instruments as a means of ensuring the quality of those contained in the Trust's library. When the information needed may well be available or obtainable with relative ease, the SAC asks those submitting instruments to provide or collect and forward it, as a means of encouraging the full development of their work and of this field.
CONCLUSIONS

By consistently applying publicly known criteria, the SAC can contribute to the Trust's objective of assuring widespread availability of a comprehensive library of high-quality health outcomes instruments. By making the criteria well known, we believe that others may use them for evaluating and revising their own instruments, for developing new measures, and for choosing functional outcomes instruments for both clinical trials and health services research. Finally, the criteria may prove valuable for others to use in assessing the adequacy of research, in evaluating claims for new diagnostic or therapeutic modalities, and in reviewing publications in this field. The SAC and the Trust welcome both submissions of health status and quality-of-life instruments and comments and suggestions about the instrument review criteria and their applications.
Address correspondence to: Kathleen N. Lohr, PhD, Director, Health Services and Policy Research Program, Research Triangle Institute, PO Box 12194, 3040 Cornwallis Drive, Research Triangle Park, NC 27709-2194.

REFERENCES

1. Perrin EB. SAC instrument review process. Med Outcomes Trust Bull. 1995;3:1.

2. Scientific Advisory Committee. Instrument review criteria. Med Outcomes Trust Bull. 1995;3:I-IV.

3. SourcePages. Vol. 1, No. 1. Boston: Medical Outcomes Trust; 1996.
Appendix.

Scientific Advisory Committee
INSTRUMENT REVIEW CRITERIA

The Scientific Advisory Committee of the Medical Outcomes Trust has established criteria by which to evaluate instruments that are submitted to the Trust for inclusion in the library. The Committee identified eight instrument attributes to serve as the principal foci of its review. The relative importance of criteria for instrument review may differ depending on the intended use(s) and application(s) of the instrument. Instruments may be intended to distinguish between two or more groups, assess change over time, or predict future status. Instruments will be reviewed in the context of documented applications as stated in the Instrument Submission Form, available by contacting the Trust.

The properties of an instrument are context specific. An instrument that works well for one purpose or in one setting or population may not do so when applied for another purpose or in another setting or population. Ideally, one would want to have evidence of the measurement properties of an instrument for each of its intended applications.
I. CONCEPTUAL AND MEASUREMENT MODEL

DEFINITION

A conceptual model is a rationale for and description of the concept(s) that the measure is intended to assess and the relationship between those concepts. A measurement model is defined as an instrument's scale and subscale structure and the procedures followed to create scale and subscale scores. The adequacy of the measurement model can be evaluated by examining evidence: (1) that a scale measures a single conceptual domain or construct; (2) that multiple scales measure distinct domains; (3) that variability in the domain is adequately represented by the scale; and (4) that the intended level of measurement of the scale and its scoring procedures are well justified.
REVIEW CRITERIA

(1) The conceptual and empirical basis for combining multiple items into a single scale score and/or multiple scale scores should be provided. This might include information on factor structure, distinctiveness of multiple scales, and internal consistency of scales, as generated by methods such as factor analysis and Rasch or structural equation modeling.

(2) Descriptive statistics for each scale should be provided, including information on central tendency and dispersion, skewness, and frequency of missing data. Any other evidence that the scale demonstrates adequate variability in a range that is relevant to its intended use should be provided.
(3) A description of the intended level of measurement, for example, ordinal, interval, or ratio scales, should be provided, along with available supportive evidence. For preference-weighted measures, a rationale and evidence supporting scaling assumptions should be provided.

(4) Procedures for deriving scale scores from raw scores should be clearly specified, and a clear description of and rationale for transformations (such as weighting and standardization) should be provided.
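To make criterion (4) concrete, here is a minimal sketch, not drawn from the Trust's materials, of one common scoring procedure: a Likert-summated scale rescaled linearly to 0 to 100, with a half-scale rule for missing items. The function name, item range, and missing-data rule are illustrative assumptions that a developer would need to document and justify.

```python
def scale_score(item_responses, item_min=1, item_max=5):
    """Derive a 0-100 scale score from raw Likert item responses.

    Hypothetical example: equal item weights, a linear transformation,
    and mean imputation over answered items. None marks a missing item.
    """
    answered = [r for r in item_responses if r is not None]
    # Half-scale rule (one common convention): score only if at least
    # half of the items were answered; otherwise treat the score as missing.
    if len(answered) < len(item_responses) / 2:
        return None
    raw = sum(answered) / len(answered)  # mean of answered items
    return 100 * (raw - item_min) / (item_max - item_min)

print(scale_score([1, 3, None, 5, 4]))  # 56.25 on the 0-100 metric
```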
II. RELIABILITY

DEFINITION

The principal definition of test reliability is the degree to which an instrument is free from random error. This succinct definition implies homogeneity of content on multi-item tests and internal consistency (ie, high correlations) among test items. The two approaches recommended for examining test reliability are coefficient α (Cronbach's alpha) and alternative-form correlations. Because the latter approach is seldom used in health status assessment, the coefficient α can be considered the most relevant approach to reliability estimation.

A second definition of reliability is reproducibility, or stability of an instrument over time (test-retest) and interrater agreement at one point in time. The two definitions are largely independent of one another.
A. Internal Consistency

Coefficient α provides an estimate of reliability based on all possible correlations between two sets of items within a test. For instruments employing dichotomous response choices, an alternative formula, the Kuder-Richardson formula 20 (KR-20), is available.
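As a computational illustration of the definitions above, a minimal sketch of coefficient α follows; it uses the standard formula with hypothetical variable names and assumes complete data. Applied to dichotomous (0/1) items, the same computation yields KR-20.

```python
def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha for a k-item scale.

    item_scores: k lists, each holding one item's scores across the
    same n respondents. Implements the standard formula
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    """
    k, n = len(item_scores), len(item_scores[0])

    def var(xs):  # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(var(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))
```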
B. Reproducibility

Test-retest reproducibility: Test-retest reproducibility is the degree to which an instrument yields stable scores over time among respondents who are assumed not to have changed on the domains being assessed. The influence of test administration on the second administration may overestimate reliability. Conversely, variations in health, learning, reaction, or regression to the mean may yield test-retest data underestimating reliability. Despite these cautions, information on test-retest reproducibility is important for the evaluation of the instrument.

Inter-observer (interviewer) reproducibility: For instruments administered by an interviewer, test-retest reproducibility may refer to both intra-observer and inter-observer agreement.
REVIEW CRITERIA

(1) Reliability estimates and standard errors should be reported for all elements of an instrument, including both the total score and subscale scores, where appropriate.

(2) A clear description should be provided of the methods employed to collect reliability data. This should include (a) methods of sample accrual and sample size; (b) characteristics of the sample (eg, sociodemographics, clinical characteristics if drawn from a patient population, etc.); (c) the testing conditions (ie, where and how the instrument of interest was administered); and (d) descriptive statistics for the instrument under study (ie, means, standard deviations, floor and ceiling effects, etc.).

(3) Where there are reasons to believe that reliability estimates or standard errors of measurement will differ substantially for the various populations in which an instrument is to be used, these data should be presented for each major population of interest (eg, different chronic disease populations, different language or cultural groups, etc.).

(4) Test-retest reproducibility information should be provided as a complement to, not as a substitute for, internal consistency. Reproducibility is more important when repeated measures with the instrument are proposed.

(5) A well-argued rationale should support the time elapsed between first and second administration, as well as the design of the study to ensure that changes in health status were minimal, ie, including transitional questions on general and specific health or functional status.

(6) Information about test and retest scores should include the appropriate central tendency and dispersion measures of both test and retest administrations.

(7) For instruments yielding interval-level data, information on test-retest reliability (reproducibility) and interrater reliability should include intraclass correlation coefficients (ICCs). For nominal or ordinal scale values, kappa and weighted kappa, respectively, are recommended (see the sketch following this list).

(8) Commonly accepted minimal standards for reliability coefficients are 0.70 for group comparisons and 0.90-0.95 for individual comparisons.
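As a sketch of the agreement statistics named in criterion (7), the following computes Cohen's kappa for nominal ratings; this is the standard formula, with hypothetical argument names. (For interval-level data, the recommended intraclass correlation coefficient would be estimated instead, typically from an ANOVA decomposition, omitted here for brevity.)

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning nominal categories.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by
    chance from each rater's marginal category frequencies.
    """
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)
```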
III. VALIDITY

DEFINITION

The validity of an instrument is defined as the degree to which the instrument measures what it purports to measure. There are three ways of accumulating evidence for the validity of an instrument:

A. Content-related: evidence that the content domain of an instrument is appropriate relative to its intended use. Methods commonly used to obtain evidence about content-related validity include the use of lay and expert panel judgments of the clarity, comprehensiveness, and redundancy of the items and scales of an instrument.

B. Construct-related: evidence that supports a proposed interpretation of scores on the instrument based on theoretical implications associated with the constructs. Common methods to obtain construct-related validity include an examination of the logical relations that should exist with other measures and/or patterns of scores across groups of individuals (a toy sketch follows this list).

C. Criterion-related: evidence that shows the extent to which scores of the instrument are related to a criterion measure. Criterion measures are measures of the target construct that are widely accepted valid measures of that construct. In the area of health status assessment, criterion-related validity is rarely tested because of the absence of widely accepted criterion measures.
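As a toy illustration of the construct-related logic in B, shown with hypothetical data and names: a developer might verify that scores on a new scale correlate more strongly with a theoretically related measure (convergent evidence) than with an unrelated one (discriminant evidence). Pearson's r is one common choice:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical check: physical-function scores should track a related
# mobility measure more closely than a theoretically unrelated trait.
# convergent = pearson_r(physical_function, mobility)        # expect high
# discriminant = pearson_r(physical_function, unrelated_trait)  # expect low
```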
REVIEW CRITERIA

(1) Evidence of content validity should be presented for every major proposed use of the instrument. Information about the methods for developing the content of the instrument should be provided.

(2) Evidence of construct validity should be presented for every major proposed use of the instrument. When data related to criterion validity are presented, a clear rationale and support for the choice of criteria measures should be stated. A rationale should be provided to support the particular mix of evidence presented for the intended uses.

(3) A clear description should be provided of the methods employed to collect validity data. This should include (a) methods of sample accrual and sample size; (b) characteristics of the sample (eg, sociodemographics, clinical characteristics if drawn from a patient population, etc.); (c) the testing conditions (ie, where and how the instrument of interest was administered); and (d) descriptive statistics for the instrument under study (ie, means, standard deviations, floor and ceiling effects, etc.).
(4) The composition of the validation sample should be described in sufficient detail to make clear the population(s) to which it applies. Available data on selective factors that might reasonably be expected to influence validity, such as gender, age, ethnicity, and language, should be described.

(5) Where there are reasons to believe that validity will differ substantially for the various populations in which an instrument is to be used, these data should be presented for each major population of interest (eg, different chronic disease populations, different language or cultural groups, etc.).
IV. RESPONSIVENESS

DEFINITION

Sometimes referred to as sensitivity to change, responsiveness is viewed as an important part of the construct validation process. Responsiveness refers to an instrument's ability to detect change, often defined as the minimal change considered to be important by the persons with the health condition, their significant others, or their providers. The criterion of responsiveness requires asking whether the measure can detect differences in outcomes that are important, even if those differences are small.

Responsiveness can be conceptualized also as the ratio of a signal (the real change over time that has occurred) to the noise (the variability in scores seen over time that is not associated with true change in status). Common methods of evaluating responsiveness include comparing scale scores before and after an intervention that is expected to affect the construct, and comparing changes in scale scores with changes in other related measures that are assumed to move in the same direction as the target measure.

Assessment of responsiveness often involves estimation of the effect size. Effect size is an estimate of the magnitude of change in health status. Effect size translates the before-and-after changes into a standard unit of measurement. Different methods may be used to calculate effect size. Interpretation of effects is discussed in Section V.
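For illustration only, one common effect-size formulation (mean change divided by the standard deviation of baseline scores) can be sketched as follows; the function and variable names are hypothetical, and, as noted above, other formulations exist:

```python
def effect_size(before, after):
    """Effect size as mean change over baseline standard deviation.

    before, after: paired score lists for the same respondents.
    This is one common formulation; alternatives (eg, the standardized
    response mean) divide by the SD of the change scores instead.
    """
    n = len(before)
    mean_change = sum(a - b for a, b in zip(after, before)) / n
    mean_before = sum(before) / n
    sd_before = (sum((b - mean_before) ** 2 for b in before) / (n - 1)) ** 0.5
    return mean_change / sd_before
```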
REVIEW CRITERIA

(1) For any claim that an instrument is responsive, evidence should be provided on the change scores found in field tests of the instrument. These change scores should be expressed ideally as effect sizes, with information provided on the methods used to calculate the effect size.

(2) Claims for an instrument's responsiveness should be derived from longitudinal data, preferably comparing a group that is expected to change with a group that is expected to remain stable.

(3) The population(s) on which responsiveness has been tested should be clearly identified, including the time intervals of assessment, the interventions or measures involved in evaluating change, and the populations assumed to be stable.
V. INTERPRETABILITY

DEFINITION

Interpretability is defined as the degree to which one can assign qualitative meaning to an instrument's quantitative scores. Interpretability of a measure is facilitated by information that translates a quantitative score or change in scores to a qualitative category that has clinical or commonly understood meaning.

There are several types of information that can aid in the interpretation of scores:

(1) comparative data on the distribution of scores derived from a variety of defined population groups, including, when possible, a representative sample of the general population (see the sketch following this list),

(2) information on the relationship of scores to clinically recognized conditions or need for specific treatments,

(3) information on the relationship of scores or changes in scores to commonly recognized life events (such as the impact of losing a job), and

(4) information on how well scores predict known relevant events (such as death or need for institutional care).
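As a small sketch of type (1), comparative data are often used to re-express an individual's score relative to a reference population; the normative mean and standard deviation below are hypothetical values that a developer would supply from a representative sample:

```python
def norm_based_z(score, norm_mean=50.0, norm_sd=10.0):
    """Express a score as a z score against a reference population.

    norm_mean and norm_sd are illustrative normative values (eg, from
    a representative general-population sample).
    """
    return (score - norm_mean) / norm_sd

print(norm_based_z(65.0))  # 1.5 SDs above the reference-group mean
```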
REVIEW CRITERIA

(1) A clear description should be provided of the rationale for the selection of populations for purposes of comparison and interpretability of data. This should include (a) methods of sample accrual and sample size; (b) characteristics of the sample (eg, sociodemographics, clinical characteristics if drawn from a patient population, etc.); (c) the testing conditions (ie, where and how the instrument of interest was administered); and (d) descriptive statistics for the instrument under study (ie, means, standard deviations, floor and ceiling effects, etc.).

(2) Provide any information regarding the various ways in which the data have been reported and displayed in order to facilitate interpretation.
VI. BURDEN

DEFINITION

Respondent burden is defined as the time, energy, and other demands placed on those to whom the instrument is administered. Administrative burden is defined as the demands placed on those who administer the instrument.
REVIEW CRITERIA: RESPONDENT BURDEN

(1) Evidence should be provided that the instrument places no undue physical or emotional strain on the respondent. Developers should indicate when or under what circumstances their instrument is not suitable for respondents.

(2) Information should be provided on the average time and range of time needed to complete the instrument, on a self-administered basis or as an interviewer-administered instrument, for all population groups for which the instrument is intended.

(3) Information should be provided about the reading and comprehension level assumed for all population groups for which the instrument is intended.

(4) Information should be provided about any special requirements or requests that might be placed on respondents, such as the need to consult health care records or copy information about medications used.

(5) Information should be provided on the acceptability of the instrument, for example, by indicating the level of missing data and refusal rates and the reasons for both.

REVIEW CRITERIA: ADMINISTRATIVE BURDEN

(1) For interviewer-administered instruments, information should be provided on the average time and range of time required of a trained interviewer to administer the instrument. If appropriate (eg, if the times differ significantly), the information should be given for face-to-face interview, telephone, and computer-assisted formats/applications.

(2) Information should be provided on the amount of training and level of education or professional expertise and experience needed by administrative staff to administer, score, or otherwise use the instrument.

(3) Information should be provided about any resources required for administration of the instrument, such as the need for special or specific computer hardware or software to administer, score, or analyze the instrument.

VII. ALTERNATIVE FORMS
DEFINITION

Alternative forms of an instrument include all modes of administration other than the original source instrument. Depending on the nature of the original source instrument, alternative forms can include self-administered self-report, interviewer-administered self-report, trained observer rating, computer-assisted self-report, computer-assisted interviewer-administered report, and performance-based measures. In addition, alternative forms may include proxy versions of the original source instrument, such as self-administered proxy report and interviewer-administered proxy report.
REVIEW CRITERIA

Alternative forms of an instrument will be evaluated employing the same criteria used for the original source instrument. This will include evidence of reliability, validity, responsiveness, interpretability, and burden. An additional criterion for evaluation of alternative forms will be comparability with the original instrument.
VIII. CULTURAL AND LANGUAGE ADAPTATIONS

DEFINITION

The cross-cultural adaptation of an instrument involves two primary steps: (1) assessment of conceptual and linguistic equivalence, and (2) evaluation of measurement properties. Conceptual equivalence refers to equivalence in the relevance and meaning of the same concepts being measured in different cultures and/or languages. Linguistic equivalence refers to equivalence of question wording and meaning in the formulation of items, response choices, and all aspects of the instrument and its applications. For evaluation of measurement properties, each cultural and/or language adaptation will be reviewed separately by the Scientific Advisory Committee for evidence of reliability, validity, responsiveness, interpretability, and burden.
REVIEW CRITERIA

(1) Information about methods to achieve conceptual equivalence should be provided. It is commonly recommended that the content validity of the instrument be assessed in each cultural or language group to which the instrument is to be applied.

(2) Information about methods to achieve linguistic equivalence should be provided. It is commonly recommended that there be: (a) at least two forward translations from the source language, preferably by persons experienced in translations in health status research, resulting in a pooled forward translation; (b) at least one, and preferably more, backward translations to the source language, resulting in another pooled translation; (c) a review of the translated versions by lay and expert panels, with revisions; and (d) field tests to provide evidence of comparability.

(3) Any significant differences between the original and translated versions should be identified and explained.
Appendix. Scientific Advisory Committee instrument review criteria. Reprinted, with permission, from the Medical Outcomes Trust, Boston, Massachusetts.