Measuring patient knowledge of the risks and benefits of prostate cancer screening

Patient Education and Counseling 54 (2004) 143–152
doi:10.1016/S0738-3991(03)00207-6

David M. Radosevich a,∗, Melissa R. Partin b, Sean Nugent b, David Nelson b, Ann Barry Flood c, Jeremy Holtzman a, Nancy Dillon b, Michele Haas b, Timothy J. Wilt b

a Clinical Outcomes Research Center, School of Public Health, University of Minnesota, Minneapolis, MN, USA
b Center for Chronic Disease Outcomes Research, Minneapolis Veterans Affairs Medical Center, Minneapolis, MN, USA
c Center for the Evaluative Clinical Sciences, Dartmouth Medical School, Hanover, NH, USA

Received 6 February 2003; received in revised form 2 June 2003; accepted 2 June 2003

∗ Corresponding author. Present address: Transplant Information Services, Department of Surgery, 925 Delaware Street S.E., Dinnaken 150, Minneapolis, MN 55414-2932, USA. Tel.: +1-612-626-4701; fax: +1-612-625-9467. E-mail address: [email protected] (D.M. Radosevich).

Abstract

This manuscript describes the development and validation of measures assessing patient knowledge about the risks and benefits of prostate cancer (CaP) screening. The measures described include a 10-item knowledge index and four single-item measures, used in previous studies, that assess knowledge of: CaP natural history and treatment efficacy, expert disagreement over the value of CaP screening, and the accuracy of the prostate specific antigen (PSA) test for CaP. We assessed the validity and reliability of these measures on a sample of 1152 male veteran patients age 50 and older. All knowledge index items had acceptable levels of discrimination, difficulty, and reliability. The index demonstrated strong evidence for construct and criterion validity. Much weaker validity evidence was found for the four single-item knowledge questions. The 10-item index developed in this study provides a valid and reliable tool for assessing patient knowledge of the risks and benefits of CaP screening.
© 2003 Elsevier Ireland Ltd. All rights reserved.

Keywords: Prostate specific antigen; Prostatic neoplasms; Decision making; Risk assessment; Patient education

1. Introduction

Prostate cancer (CaP) is the most common cancer among males in the United States and the second leading cause of cancer death in this population after lung cancer [1,2]. Its impact on morbidity and mortality certainly warrants a search for effective treatment strategies and the hope that earlier treatment of CaP detected by screening will reduce the burden of this disease. To date, however, the efficacy of available strategies has not been established, and convincing evidence that mass screening of asymptomatic men reduces the CaP disease burden is lacking [1]. For these reasons, most organizations publishing CaP screening guidelines do not recommend mass CaP screening, but rather encourage clinicians to provide their male patients with balanced information about the risks and benefits of CaP screening and to involve them in the decision of whether or not to be screened [3–5].

Approaches to providing medical care that advocate involving patients in this way are often referred to as "shared decision making". Shared decision making is a process where patients and clinicians work together to reach decisions about patient care. The purpose of the process is not only to inform patients about available options, but also to help them identify and understand the value they place on the possible outcomes of these options. A shared decision making approach is most appropriate in situations where the evidence regarding efficacy is uncertain, available options carry different and non-trivial risks, and/or decisions can have different long- and short-term implications [6]. In other words, the shared decision making approach is most appropriate when there is clearly a decision to make, and its resolution is neither constant across all individuals nor immediately obvious for any given individual. Since these conditions all apply to CaP screening, it is an ideal target for a shared decision making approach.

Patient education is a crucial component of the shared decision making process since patients need to comprehend the options and their relative risks and benefits before they can consider the personal value they place on the possible outcomes of the options available [7]. Hence, patient knowledge of the risks and benefits of CaP screening should be an intermediate endpoint in assessing the effectiveness of efforts to promote shared decision making for CaP.

Previous research has consistently shown that individuals will not attend to health messages unless they are first persuaded that the message has some personal and important relevance for them. Perhaps in recognition of this consistent finding, all measures of CaP screening knowledge employed in previous studies have included questions on CaP screening risk factors and natural history. However, if patients are to meaningfully participate in the decision making process, they must be aware of and understand not only the importance of the health message, but also the risks and benefits of the available options under consideration and why there is a decision to make. These observations imply that patients will be best equipped to make an informed choice about CaP screening if they understand not only the natural history and risk factors of CaP, but also PSA accuracy and the nature of follow-up diagnostic tests, the efficacy and side effects of CaP treatment options, and the controversy over the value of CaP screening. Recommendations from qualitative interviews with medical experts, patients, and their families regarding what should be included in informed consent for CaP screening confirm that patients and providers agree with this assertion [8]. However, most studies evaluating interventions to facilitate CaP screening decisions that have measured some aspect of CaP screening knowledge have only examined responses to three categorical measures assessing CaP natural history, treatment efficacy, and the predictive value of the prostate specific antigen (PSA) test, which were developed by Flood and others [9–11]. Those measures exclude some aspects of knowledge identified by patients and providers as important to CaP screening decision making [8].

This manuscript summarizes results from a study that developed and tested a multi-item CaP screening knowledge assessment tool. The approach used to develop this knowledge index will be described, and the reliability and validity of the index will be assessed and contrasted with results obtained for four knowledge questions used in several prior studies. These measurement development activities were conducted as part of a larger randomized trial evaluating two different approaches to facilitating shared decision making about CaP screening.

2. Methods

2.1. Knowledge item generation and piloting

A multi-stage process was used to select items for consideration in the final knowledge index measure. First, members of our research group (which included two general internal medicine physicians, two behavioral scientists, a nurse health educator, an epidemiologist, and a biostatistician) identified the following content domains, corresponding to the educational objectives of the interventions tested: (1) CaP natural history and risk factors, (2) PSA accuracy and follow-up tests, (3) CaP treatment efficacy, and (4) expert disagreement about the value of mass CaP screening. We then solicited input on the relevance of these domains to CaP from three focus groups with patients from the target population (including a separate focus group with African Americans). Next, we developed individual items to assess patient understanding of each of the content domains. Many of the selected items were adapted from questionnaires used in previous studies [9,12,13]. We then prepared a pool of 18 closed-ended questions to test. Most questions were true–false and tested simple recall of basic facts that could be gleaned from the pamphlet and video educational materials evaluated in the larger intervention trial. Several of the questions tested more complex integration of factual content or advanced knowledge of PSA test accuracy. We included many redundant items employing different question formats (i.e. true–false versus multiple choice) to test more complex concepts, such as the risk of dying from CaP and the predictive value of the PSA test. Counting the number of correct responses to these questions permitted scalability consistent with a Likert-type model [14].

Experts from our research group next reviewed our pool of questions for content validity. The project coordinator (Haas) then conducted six cognitive interviews [15] with typical patients considering PSA testing. These interviews, conducted in a thinking-out-loud format, were used to refine our questions and to clarify the educational material presented in the pamphlet. As recommended by Dillman [16], the interviews focused on the understandability of the words used in asking about PSA screening, on similarity in interpretation of the items, on the difficulty level of the questions, and on the amount of time necessary to complete the questions. Results from these interviews led to the rewording of two knowledge items (to make them more understandable) and the deletion of another.

Our revised set of 17 knowledge items (see Table 2) was then piloted in a telephone survey of 115 respondents from the target population. Results from this pilot telephone survey were used to make preliminary assessments regarding item scalability, redundancy, and discriminatory power, and to gauge the time necessary to complete the questionnaire. Results showed that four of the questions were perceived as ambiguous or confusing by a significant proportion of subjects. These questions were reworded for the final set of knowledge items included in the patient follow-up survey.

We used psychometric methods to develop a summated knowledge index measure from the 17 knowledge questions included in our patient survey and to assess its validity and reliability. To enable comparisons with previous studies on CaP screening shared decision making, we also separately assessed the validity of the questions numbered 1, 9, 16, and 17 in Table 2, since they have been used in several previous studies.

2.2. Data collection

The target population for the larger randomized trial (the Prostate Cancer Screening Education, or PROCASE, Study) consisted of male patients receiving primary care at four participating Veterans Affairs (VA) Medical Centers in the Midwest who were potentially eligible for making a screening decision (i.e. were at least 50 years old and had no prior diagnosis of CaP). A total of 1152 eligible patients with primary care appointments between April and June 2001 were enrolled. To minimize inclusion of patients with appointments for acute problems (where screening decisions are less likely to be routinely discussed), eligible patients were identified 2 months in advance from administrative databases at each of the four participating centers. In addition to ascertaining patient eligibility, administrative data were used to assess the presence of comorbidities and the use of prescription medications that can cause, or are used to treat, urologic symptoms.

All participants in our project were asked to complete a telephone interview 1 week after their scheduled primary care appointment (or 2 weeks after receiving the pamphlet or video intervention). The telephone interview was designed to assess CaP screening knowledge and other measures. Of the 1152 participants in the larger study, 40 were excluded from our project sample because they died (n = 8), had CaP (n = 29), or were female (n = 3). Of the remaining 1112 eligible male patients, 895 were interviewed (overall response rate, 80%). A total of 20 interviewed cases were excluded because they completed less than 80% of the original 17 knowledge items on the survey, leaving 875 subjects in our final analyses.

2.3. Analysis

The development and testing of the knowledge measures (including the multi-item knowledge index, called the PROCASE Knowledge Index, and the four additional single-item questions) concentrated on four psychometric dimensions: item difficulty, item discrimination, reliability, and validity. This approach is consistent with the methods proposed by Crocker and Algina for developing educational tests [17].

2.3.1. Item difficulty and uncertainty

Because the design of all our knowledge questions included "don't know" as a response option (this option was not read to respondents), we applied two separate methods for measuring item difficulty. In the first method, a "don't know" response was treated as an incorrect response. Using this method, item difficulty was defined as the proportion of respondents who answered the item incorrectly, or one minus the proportion of correct responses. In the second method, the proportion of "don't know" responses was treated as a distinct response category that reflected the respondent's level of uncertainty related to the content area. Intuitively, the greater the proportion of "don't know" responses, the greater the level of confusion related to an item. Using this second method, we defined the proportion of "don't know" responses as item uncertainty. As a general rule, items of moderate difficulty (i.e. approximately 50% of respondents answering correctly) tend to maximize the reliability of a multi-item scale [17] because they discriminate better among respondents than do simpler questions.

2.3.2. Index of discrimination

We quantified item discrimination using two approaches: a corrected item–total correlation and an index of discrimination (D). The corrected item–total correlation was computed by separately correlating each item with the total score after removing that item from the total. Items with high discrimination (r > 0.3) add more to the overall reliability of the test, but make the test more difficult for respondents [18]. Hence, as a practical consideration, we selected items on the basis of both their ease of completion and their discrimination. Corrected item–total correlation values greater than 0.10 were generally accepted for inclusion in our set of knowledge items. The index of discrimination compares, for each item, the proportion of correct responses among respondents with total scores in the upper 50th percentile versus the lower 50th percentile. The D has no known sampling distribution; therefore, only general guidelines can help with interpretation. In general, D-values greater than 0.40 are acceptable for discrimination, and those less than 0.20 are considered inadequate. For the multi-item PROCASE Knowledge Index, an item was accepted for inclusion if it had a D-value greater than 0.30 [19].

2.3.3. Reliability

The preferred method for measuring reliability is replicate testing, also called test-retest reliability. Using this method, the respondent completes the same instrument twice, separated by an interval of 10 days to 2 weeks. In general, measures of health and other attributes are relatively stable and unbiased by re-measurement. When assessing knowledge, however, the second test is unavoidably influenced by the earlier test experience, and hence can positively bias statistical tests of agreement between the first and second test. Due to this susceptibility to the potentially biasing effects of re-measurement [20], we did not employ the test-retest method to assess the reliability of our knowledge measures. As an alternative, we approximated reliability using a measure of agreement restricted to the multi-item PROCASE Knowledge Index [21]. This measure, based on the average intercorrelation of items with dichotomous responses (i.e. correct versus incorrect), is known as the Kuder–Richardson 20 (KR-20) [22,23]. It is equivalent to the average correlation across all split-half versions of the PROCASE Knowledge Index. For our project, we also computed KR-20 with each item deleted in turn.
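To make these definitions concrete, the sketch below computes each item statistic described in this section from a respondents-by-items matrix of coded answers. It is a minimal illustration in Python with NumPy, using simulated responses and hypothetical variable names rather than the PROCASE data. KR-20 is computed from its standard form, KR-20 = k/(k − 1) · (1 − Σ pᵢ(1 − pᵢ)/σ²), where k is the number of items, pᵢ is the proportion correct on item i, and σ² is the variance of the total score.

```python
import numpy as np

# Simulated coded answers (NOT the PROCASE data): rows = respondents,
# columns = items; 1 = correct, 0 = incorrect, -1 = "don't know".
rng = np.random.default_rng(0)
resp = rng.choice([1, 0, -1], size=(875, 17), p=[0.6, 0.25, 0.15])

correct = (resp == 1).astype(float)      # "don't know" scored as incorrect
total = correct.sum(axis=1)              # summated knowledge score

difficulty = 1.0 - correct.mean(axis=0)  # proportion answering incorrectly
uncertainty = (resp == -1).mean(axis=0)  # proportion answering "don't know"

# Corrected item-total correlation: each item against the total score
# with that item removed.
item_total_r = np.array([np.corrcoef(correct[:, j], total - correct[:, j])[0, 1]
                         for j in range(correct.shape[1])])

# Index of discrimination D: difference in proportion correct between
# respondents above vs. at-or-below the median total score (the paper
# reports D as a percentage; here it is a proportion).
high = total > np.median(total)
D = correct[high].mean(axis=0) - correct[~high].mean(axis=0)

# Kuder-Richardson 20 for the full item set.
k = correct.shape[1]
p = correct.mean(axis=0)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total.var(ddof=1))

# Screening rules paraphrased from the paper: drop items that are too easy
# (difficulty < 0.10) or too hard (difficulty > 0.80), too uncertain
# (uncertainty > 0.20), or weakly discriminating (D < 0.20).
keep = (difficulty >= 0.10) & (difficulty <= 0.80) & (uncertainty <= 0.20) & (D >= 0.20)
print(f"KR-20 = {kr20:.2f}; candidate items retained: {np.flatnonzero(keep) + 1}")
```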


For the single-item knowledge measures examined, a lower bound estimate for the reliability of an item can be obtained from the correlation of that item with a multi-item scale measuring the same attribute [24]. However, since we did not construct multi-item scales measuring each knowledge domain, we did not assess the reliability of the single-item measures of knowledge.

2.3.4. Validity

Our project tested three forms of validity: content, construct, and criterion. Content validity, also referred to as face validity, refers to the judgment that a measure looks sensible, or that it measures what it purports to measure [25]. We evaluated content validity as part of the development process of our instrument. Experts from our research team agreed that the knowledge items made sense at face value and measured the critical domains for being informed about CaP screening choices.

Construct validity refers to the extent to which the measure corresponds to the concepts studied. We tested construct validity for convergent and discriminant validity [21]. "Convergent" refers to the existence of a relationship between knowledge and other variables expected to be positively associated with knowledge; "discriminant" refers to the lack of a relationship between knowledge and other variables expected to be unrelated to knowledge. For convergent validity, we hypothesized a positive association between knowledge and the respondent's level of formal education, history of abnormal PSA test results, and exposure to CaP screening educational materials. We used the odds ratio as a measure of association between respondent characteristics and either being a high scorer on the knowledge index (i.e. greater than the 50th percentile) or having a correct response to the single knowledge questions. For discriminant validity, we expected no association or a weak association between knowledge and both comorbidity and medication use.

Criterion validity compares the results of the test with a gold standard. We administered a written version of the knowledge questions to 29 health professionals: registered nurses, advanced practice nurses, and physicians working in the internal medicine and urology services at the Minneapolis VA Medical Center. Each of these clinicians was asked to indicate the correct response to the knowledge questions. In evaluating criterion validity, we compared the responses of our experts to the scoring rules for the knowledge items (i.e. the correct response). Evidence for criterion validity of the separate knowledge items was measured as the percent agreement between our experts and the correct response for each knowledge item as determined by the PROCASE research team.

We used the results from our analyses to construct a valid and reliable CaP screening knowledge index. We then compared the psychometric properties of the index with those of the four knowledge questions used in several prior studies [9–11].
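The convergent and discriminant validity tests described above reduce to odds ratios from 2×2 tables (characteristic present or absent, crossed with high or low score, or with correct or incorrect response). As a minimal sketch of that computation (Python with NumPy; the counts below are hypothetical, not the study data), the odds ratio and its Wald 95% confidence interval can be obtained as follows.

```python
import numpy as np

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table:
    a = characteristic present & high score, b = present & low score,
    c = characteristic absent & high score, d = absent & low score."""
    or_ = (a * d) / (b * c)
    se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo, hi = np.exp(np.log(or_) + np.array([-z, z]) * se_log_or)
    return or_, lo, hi

# Hypothetical counts: history of abnormal PSA vs. a high (> 50th
# percentile) knowledge index score.
or_, lo, hi = odds_ratio_ci(a=62, b=35, c=380, d=398)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

On this scale, an odds ratio of 2.00 doubles the odds of a high score, an odds ratio of 0.41 corresponds to 59% lower odds, and a confidence interval that excludes 1.00 indicates a statistically significant association.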

3. Results

Table 1 provides a profile of the 875 patients interviewed as part of the PROCASE patient survey. The sample comprised an older (average age 68.2 years), predominantly white, married veteran patient population. Approximately 78% had received at least a high school diploma. Most of the respondents reported being in good to excellent health, but the burden of chronic disease in this population is relatively high. For instance, nearly one-fourth (25.2%) of enrolled subjects had diabetes mellitus, and about 21% had chronic obstructive pulmonary disease (COPD). Most (74.8%) reported having undergone a prior PSA test; almost 18% were on medications to treat benign diseases of the prostate (alpha blockers); more than one-third reported moderate to severe urologic symptoms, based on the American Urological Association Symptom Index [26,27]; and almost 30% were on medications that could cause urologic symptoms (diuretics).

Table 1
Personal characteristics of veterans interviewed as part of the baseline PROCASE survey (N = 875); values are n (%) unless otherwise noted

| Characteristic | Baseline PROCASE survey (N = 875) |
| --- | --- |
| Age in years, mean ± standard deviation | 68.2 ± 9.34 |
| Married | 628 (71.8) |
| Formal education | |
| – Less than high school diploma | 187 (22.1) |
| – High school diploma | 314 (37.0) |
| – Some college or trade school | 347 (40.9) |
| Non-white race | 43 (5.1) |
| Perceived health | |
| – Good, very good, or excellent | 547 (62.9) |
| – Fair or poor | 323 (37.1) |
| Comorbidity | |
| – Congestive heart failure | 80 (9.3) |
| – Chronic obstructive pulmonary disease | 179 (20.7) |
| – Asthma | 33 (3.8) |
| – Diabetes mellitus | 218 (25.2) |
| – Substance abuse | 60 (6.9) |
| – Depression | 138 (16.0) |
| Prostate-related history | |
| – Ever had PSA test | 619 (74.8) |
| – History of abnormal PSA test | 97 (11.1) |
| – History of prostate problems | 180 (20.6) |
| Lower urinary tract symptoms a | |
| – None | 140 (16.5) |
| – Mild | 412 (48.5) |
| – Moderate | 252 (29.6) |
| – Severe | 46 (5.4) |
| Selected medication use | |
| – Alpha blocker | 153 (17.5) |
| – Diuretic | 261 (29.8) |
| Exposure to educational material about PSA screening | |
| – Did not receive printed material | 425 (48.6) |
| – Received printed material | 450 (51.4) |
| – Read printed material | 336 (38.4) |

a Based on the American Urological Association Symptom Index.

3.1. Item response characteristics

Table 2 provides the item response characteristics for all 17 items originally considered for our knowledge index. Item difficulty (proportion of incorrect responses) ranged from 9 to 85%. Item uncertainty (proportion of "don't know" responses) ranged from 3 to 44%. Of the 17 original items, 6 did not meet minimum item difficulty and uncertainty criteria for inclusion (items 2, 6, 9, 13, 16, and 17, Table 2) and one (item 1) was eliminated from the final index because it was redundant with a higher performing item (item 3). Items that were too easy (i.e. greater than 90% correct, or a difficulty score of less than 0.10, as was the case with items 2 and 6 in Table 2) or too difficult (i.e. less than 20% correct, as was the case with items 9, 16, and 17) were dropped because they provided little information regarding variability in participant knowledge level. In addition, questions that elicited a high proportion of "don't know" responses were excluded (i.e. uncertainty levels greater than 20% across all participants, as was the case for item 13). Four of the items excluded from the final knowledge index (items 1, 9, 16, and 17) were analyzed separately for validity and reliability since they have been used in several previous studies.

The above item analysis supported a 10-item test measuring subjects' global knowledge of CaP screening. The 10 remaining questions from the original 17 represented the subset of items that were most strongly correlated with one another and that contributed positively to the overall reliability of the summative index. Item response characteristics for the 10-item PROCASE Knowledge Index are shown in Table 3. Our analysis showed that difficulty ranged from relatively easy (for the question concerning the need for a biopsy after an abnormal PSA test result) to moderately difficult (for the question concerning CaP being the most common underlying cause of problems with urination). Overall, between 1-in-12 and 1-in-5 respondents were uncertain how to respond to each question. The greatest uncertainty was for the question concerning the effects of CaP treatment on sexual function. Item uncertainty greater than 20% was deemed unacceptable for inclusion as a measure of knowledge.

All but one of the items used to compute the multi-item PROCASE Knowledge Index had acceptable levels of discrimination, as measured by D-values greater than 0.20 [19]. The questions concerning CaP patients dying from other causes (item 3, Table 3) and PSA testing finding all CaP (item 9, Table 3) were moderately high in discriminating between high and low scorers; the percent difference in correct responses between high and low scorers for those two questions was over 40%. The question concerning an abnormal PSA test leading to a biopsy (item 7, Table 3) had the lowest discrimination among the ten questions (i.e. a D-value less than 0.20).


Despite its marginal value, that question was retained because it was relatively easy and respondents were more certain in their responses to it; in effect, it provided a lower bound for discrimination and item difficulty.

3.2. Reliability

In computing reliability, "don't know" responses were treated as incorrect responses. For the PROCASE Knowledge Index, the value of KR-20 was 0.68. Dropping respondents with "don't know" responses failed to improve reliability above this value.

3.3. Validity

3.3.1. Convergent validity

Table 4 summarizes the evidence for the construct validity of the measures of CaP screening knowledge. For convergent validity, we found high scores on the 10-item PROCASE Knowledge Index to be significantly associated, as predicted, with a higher level of formal education, a history of abnormal PSA test results, and exposure to material about PSA screening. For formal education, we noted a clear gradient in the association with the global measure of knowledge across educational categories. Respondents with no post-high school education were less likely to score in the upper 50th percentile, as compared with those with some post-high school education. The odds of a high score were 59% lower (O.R. = 0.41) for veterans without a high school diploma, as compared with high school graduates (reference). Among respondents with some college or trade school, the odds of a high score were 81% greater (O.R. = 1.81). A previous history of abnormal PSA test results doubled the odds (O.R. = 2.00) of a high score. Receiving and reading printed material increased the likelihood of scoring higher in global knowledge, increasing the odds of high scores by 63% (receiving) and 93% (reading).

The evidence for convergent validity was weaker for the four single-item knowledge measures. Expert agreement about annual PSA testing (item 17, Table 2) was associated with formal education, history of abnormal PSA test results, and exposure to materials about PSA screening. A correct response to this question was more likely among respondents who had some college or trade school (O.R. = 1.52), had an abnormal PSA test (O.R. = 1.47), and had received the pamphlet (O.R. = 1.45) or read it (O.R. = 1.73). Exposure to material about PSA screening was the only factor associated with correct responses to the CaP treatment efficacy question (item 16, Table 2; see column 3 of Table 4). Correct responses to the CaP natural history question (item 1, Table 2; see column 4 of Table 4) were positively associated with a higher level of formal education and exposure to material about PSA screening. Knowledge of the accuracy of a PSA test (item 9, Table 2; see column 5 of Table 4) was unrelated to any of the factors used to test for convergent validity.


Table 2
Content domains and knowledge items considered for inclusion in the PROCASE Knowledge Index

| Domain | Item no.a | Itemb | Item difficulty | Item uncertainty | Corrected item–total correlation | Index of discrimination (D) (%) | Cronbach's coefficient alpha (item deleted) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Natural history of prostate cancer and prostate cancer risk factors | 1 | Based on what you have heard or read, about how many men diagnosed as having prostate cancer will actually die because of prostate cancer? [Would you say most die because of prostate cancer, about half die because of prostate cancer, or most die because of something else?] | 0.42 | 0.15 | 0.1438 | 39.7 | 0.6558 |
| | 2 | The chance of getting prostate cancer increases with age | 0.10 | 0.04 | 0.0626 | 13.1 | 0.6698 |
| | 3 | Most men diagnosed as having prostate cancer die of something else | 0.39 | 0.17 | 0.1497 | 42.3 | 0.6552 |
| | 4 | Men are more likely to die because of prostate cancer than because of heart disease | 0.29 | 0.12 | 0.1951 | 38.1 | 0.6506 |
| | 5 | Prostate cancer is the most common cause of problems with urination | 0.68 | 0.15 | 0.1209 | 37.3 | 0.6606 |
| | 6 | Prostate cancer is a potentially serious disease that can cause death | 0.09 | 0.03 | 0.0569 | 8.3 | 0.6740 |
| | 7 | Prostate cancer never causes problems with urination | 0.13 | 0.08 | 0.2661 | 24.2 | 0.6512 |
| | 8 | Prostate cancer is one of the least common cancers among men | 0.25 | 0.08 | 0.1367 | 30.3 | 0.6600 |
| PSA test accuracy and diagnostic tests | 9 | Based on what you have heard or read, how many men with abnormal prostate specific antigen (PSA) test results have prostate cancer? [Would you say most don't have prostate cancer, about half have prostate cancer, or most do have prostate cancer?] | 0.76 | 0.22 | −0.0407 | 16.2 | 0.6826 |
| | 10 | If you have an abnormal prostate specific antigen (PSA) test result, your doctor may recommend that you have a prostate biopsy | 0.12 | 0.07 | 0.1622 | 18.0 | 0.6631 |
| | 11 | The prostate specific antigen (PSA) test will pick up all prostate cancers | 0.30 | 0.14 | 0.2269 | 43.8 | 0.6452 |
| | 12 | A prostate biopsy can tell you with more certainty whether you have prostate cancer than a prostate specific antigen (PSA) test can | 0.20 | 0.12 | 0.1886 | 25.3 | 0.6554 |
| Treatment efficacy and complications | 13 | Persistent headaches are a common side effect of prostate cancer treatments | 0.58 | 0.44 | 0.0192 | 27.2 | 0.6747 |
| | 14 | Loss of sexual function is a common side effect of prostate cancer treatments | 0.32 | 0.20 | 0.1457 | 32.9 | 0.6570 |
| | 15 | Problems with urination are common side effects of prostate cancer treatments | 0.22 | 0.12 | 0.1218 | 22.2 | 0.6629 |
| | 16 | Prostate cancer treatments have been shown to extend the life of a man with prostate cancer | 0.85 | 0.11 | −0.0220 | 10.0 | 0.6803 |
| Expert agreement | 17 | All experts agree that men should get annual PSA tests | 0.82 | 0.07 | 0.0802 | 20.3 | 0.6690 |

a Item number: the 10-item PROCASE Knowledge Index comprises items 3, 4, 5, 7, 8, 10, 11, 12, 14, and 15. The four single-item knowledge measures are: most men diagnosed with CaP do not die of CaP (item 1), positive predictive value of PSA tests (item 9), treatment efficacy (item 16), and expert agreement about annual PSA tests (item 17).
b With the exception of items 1 and 9, the response format for all items is true–false. Response choices for items 1 and 9 appear in square brackets [ ]; for these two items, the correct choices are "most die because of something else" (item 1) and "most don't have prostate cancer" (item 9).


Table 3
Item response characteristics for the 10-item PROCASE Knowledge Index

| Annotated description | Item number | Item difficulty | Item uncertainty | Corrected item–total correlation | Index of discrimination (D) (%) | Cronbach's coefficient alpha (item deleted) |
| --- | --- | --- | --- | --- | --- | --- |
| Treatment affects sexual function | 1 | 0.32 | 0.20 | 0.1256 | 32.9 | 0.6656 |
| Treatment affects urination | 2 | 0.22 | 0.12 | 0.1039 | 22.2 | 0.6718 |
| Have cancer but die other reason | 3 | 0.39 | 0.17 | 0.1033 | 42.3 | 0.6669 |
| More likely die cancer | 4 | 0.29 | 0.12 | 0.1389 | 38.1 | 0.6637 |
| Cancer causes urination problems | 5 | 0.68 | 0.15 | 0.0430 | 37.3 | 0.6787 |
| Cancer never causes urination | 6 | 0.13 | 0.08 | 0.2554 | 24.2 | 0.6589 |
| Abnormal PSA equals biopsy | 7 | 0.12 | 0.07 | 0.1469 | 18.0 | 0.6737 |
| CaP least common cancer | 8 | 0.25 | 0.08 | 0.1231 | 30.3 | 0.6674 |
| PSA finds all prostate cancers | 9 | 0.30 | 0.14 | 0.1869 | 43.8 | 0.6546 |
| Biopsy more certain than PSA | 10 | 0.20 | 0.12 | 0.1701 | 25.3 | 0.6640 |
| Average | | 0.29 | 0.13 | 0.1397 | 31.4 | 0.6780 |

3.3.2. Discriminant validity

We evaluated discriminant validity for the PROCASE Knowledge Index separately for comorbidities and medication use. With the exception of chronic obstructive pulmonary disease, the odds for all comorbidities and medications were, as expected, not statistically significantly different between high and low scorers on the index.

Discriminant validity tests for the single item on expert disagreement about annual PSA testing (item 17, Table 2) revealed a trend similar to that observed for the PROCASE Knowledge Index. Respondents with COPD were less likely than those without COPD to respond correctly (O.R. = 0.63). Other comorbidities and medication use were unrelated to item 17.

Discriminant validity for the remaining individual knowledge questions examined was relatively weak. Contrary to what was predicted, respondents with asthma and diabetes mellitus were more likely to correctly answer the question on treatment efficacy (item 16, Table 2; see column 3 of Table 4). Respondents with diabetes mellitus were more likely to have correct responses to the CaP natural history question (item 1, Table 2; see column 4 of Table 4), and those with COPD were more likely to have incorrect responses. Except for an association between depression and correct responses, knowledge of the accuracy of a PSA test (item 9, Table 2; see column 5 of Table 4) was unrelated to any of the factors used to test for discriminant validity.

Table 4
Odds ratios and 95% confidence intervals for scoring eight or more correct on the knowledge index (high score), and for a correct response to each single-item knowledge question, according to subject characteristic

| Characteristic | Knowledge index (high vs. low scores) | Experts agree about PSA tests | Treatment efficacy | Natural history | PSA accuracy |
| --- | --- | --- | --- | --- | --- |
| Convergent validity | | | | | |
| Formal education | | | | | |
| – Less than high school diploma | 0.41 (0.30, 0.57) | 0.73 (0.50, 1.06) | 1.14 (0.72, 1.76) | 0.63 (0.40, 0.99) | 0.75 (0.49, 1.14) |
| – High school graduate | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| – Some college or trade school | 1.81 (1.37, 2.39) | 1.52 (1.10, 2.09) | 1.00 (0.68, 1.46) | 1.62 (1.14, 2.29) | 1.14 (0.74, 1.61) |
| History of abnormal PSA test | 2.00 (1.43, 2.78) | 1.47 (1.00, 2.16) | 0.67 (0.41, 1.09) | 0.74 (0.48, 1.14) | 1.12 (0.75, 1.61) |
| Exposure to material about PSA screening | | | | | |
| – Did not receive material | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| – Received material | 1.63 (1.25, 2.12) | 1.45 (1.06, 1.98) | 1.86 (1.25, 2.77) | 2.35 (1.64, 3.39) | 1.05 (0.75, 1.48) |
| – Read printed material | 1.93 (1.44, 2.59) | 1.73 (1.27, 2.44) | 2.31 (1.54, 3.48) | 2.99 (2.05, 4.37) | 1.16 (0.81, 1.66) |
| Discriminant validity | | | | | |
| Comorbidity | | | | | |
| – Congestive heart failure | 1.00 (0.64, 1.57) | 0.91 (0.54, 1.52) | 1.44 (0.81, 2.56) | 0.84 (0.46, 1.54) | 1.17 (0.65, 2.10) |
| – COPD | 0.59 (0.43, 0.82) | 0.63 (0.44, 0.92) | 1.20 (0.76, 1.88) | 0.56 (0.35, 0.91) | 0.84 (0.55, 1.29) |
| – Asthma | 1.00 (0.52, 1.93) | 1.36 (0.60, 3.08) | 3.24 (1.58, 6.65) | 0.89 (0.36, 2.20) | 0.39 (0.13, 1.17) |
| – Diabetes mellitus | 0.75 (0.55, 1.01) | 1.01 (0.71, 1.45) | 1.51 (1.01, 2.27) | 1.47 (1.01, 2.15) | 1.05 (0.71, 1.55) |
| – Substance abuse | 0.68 (0.40, 1.14) | 0.98 (0.54, 1.77) | 0.70 (0.31, 1.58) | 0.63 (0.29, 1.35) | 1.50 (0.78, 2.89) |
| – Depression | 0.71 (0.49, 1.01) | 0.88 (0.58, 1.34) | 1.19 (0.72, 1.96) | 0.97 (0.61, 1.54) | 2.23 (1.39, 3.55) |
| Medication use | | | | | |
| – Alpha blocker | 1.01 (0.72, 1.44) | 0.71 (0.48, 1.05) | 1.06 (0.64, 1.73) | 0.69 (0.42, 1.12) | 0.70 (0.44, 1.11) |
| – Diuretic | 0.90 (0.68, 1.20) | 1.05 (0.75, 1.48) | 1.46 (0.99, 2.16) | 0.87 (0.59, 1.27) | 1.11 (0.77, 1.61) |


Table 5
Agreement among clinical experts on the 14 items used to measure CaP screening knowledge

| Annotated description | Item number | Registered nurse (n = 4) (%) | Nurse practitioner/clinical nurse specialist (n = 7) (%) | General internist (n = 11) (%) | Urologist (n = 5) (%) | Total (n = 28) (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Ten-item PROCASE Knowledge Index | | | | | | |
| – CaP treatment affects sex function | 1 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| – CaP treatment affects urination | 2 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| – Have CaP but die for another reason | 3 | 75.0 | 100.0 | 100.0 | 100.0 | 96.3 |
| – More likely to die of CaP than heart disease | 4 | 100.0 | 100.0 | 100.0 | 60.0 | 92.6 |
| – CaP causes urination problems | 5 | 100.0 | 85.7 | 100.0 | 80.0 | 92.6 |
| – CaP never causes urination | 6 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| – Abnormal PSA, MD recommends a biopsy | 7 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| – CaP least common cancer | 8 | 100.0 | 100.0 | 100.0 | 80.0 | 100.0 |
| – PSA finds all CaP | 9 | 100.0 | 100.0 | 100.0 | 80.0 | 100.0 |
| – Biopsy more certain than PSA | 10 | 100.0 | 71.4 | 90.9 | 80.0 | 85.2 |
| Single-item knowledge measures | | | | | | |
| – Experts agree about annual PSA tests | 11 | 75.0 | 100.0 | 100.0 | 40.0 | 85.2 |
| – Treatment efficacy | 13 | 75.0 | 100.0 | 100.0 | 100.0 | 96.3 |
| – Natural history | 14 | 50.0 | 85.7 | 100.0 | 40.0 | 85.7 |
| – PSA accuracy | 12 | 50.0 | 100.0 | 81.8 | 60.0 | 77.8 |

3.3.3. Criterion validity

We evaluated criterion validity by examining the percent agreement for each item among the 29 clinical experts completing a written version of the knowledge questions (Table 5). In general, we found a relatively high level of concordance among the clinical experts for the ten items used to construct the 10-item PROCASE Knowledge Index. The least agreement was found for the question concerning the accuracy of the biopsy (over a PSA test), at 85.2% agreement among the raters.

Less concordance was found for our four single-item knowledge questions. Among clinical experts, we found less agreement on the value of annual PSA tests (85.2% agreement), the proportion of PSA tests positive for CaP (77.8% agreement), and the natural history of prostate cancer (85.7% agreement). The greatest concordance among clinical experts was for the question concerning whether most persons diagnosed with CaP do not die of CaP (96.3% agreement). Across clinical experts, registered staff nurses and urologists were slightly less likely to agree with the correct responses than were nurse specialists and general internists.

4. Discussion

Current prostate cancer screening guidelines recommend neither promoting nor discouraging screening, but rather involving patients in shared decisions about the issue. Since patients cannot meaningfully participate in decisions about their care unless they are adequately informed about the risks and benefits of the choices they are confronted with, enhancing patient knowledge about the risks and benefits of CaP screening is an essential component of this process. Yet prior research on CaP screening decision aids has not produced standard, validated measures of CaP knowledge. To our knowledge, ours is the first study to systematically develop, test, and report the psychometric properties of knowledge questions used in evaluating the effectiveness of CaP screening decision making aids.

We systematically examined the content validity of the CaP screening knowledge items as part of a multistage development process. In addition to the expertise of our research group, we used cognitive interviews and pre-testing with patients to cull the items to our final set of 17. The 10-item PROCASE Knowledge Index had minimally acceptable reliability and demonstrated strong construct and criterion validity. In contrast, construct validity evidence was weak and inconclusive for several of the single-item knowledge questions used in previous studies. Of the four single-item questions we tested, the item pertaining to PSA accuracy performed the least well; it was also the question with the least agreement among our clinical experts as to the correct response. Future work would benefit from testing parallel versions of instruments measuring the same constructs, since replicate testing of knowledge is neither practical nor valid.

The measures examined in this study address CaP knowledge but do not assess other important components of the shared decision making process, such as patient clarity regarding the relative value they place on the various potential outcomes of CaP screening and treatment; patient satisfaction with the decision making process; and provider receptivity to and proficiency with the shared decision making approach. Measures that tap these aspects of the shared decision making process have been developed in previous studies [28–30], and we and others are employing them in our evaluative studies of decision aids.

The lack of consensus among experts regarding the value of PSA testing was clearly mirrored in our survey of the 29 clinical experts consulted for the criterion validity analyses. It should come as no surprise that the greatest discordance among our experts was over the value of annual PSA testing and the accuracy of the PSA test. Fewer than 80% of our experts agreed on the positive predictive value of PSA tests. In contrast, on the other knowledge questions, their agreement was greater than 85% and, most frequently, unanimous.

One of the main limitations of our project was that it was conducted on a population that is relatively homogeneous with regard to race and ethnicity. However, our sample provided considerable variation on comorbidities, educational background, and prior experience with PSA testing. Hence, our results support the use of this measure in a range of populations. Finally, it is not possible to determine from this project whether mode of administration (i.e. telephone versus self-administered survey) has an effect on assessing prostate cancer screening knowledge. Some previous work suggests that telephone interviews result in decreased depth of cognitive processing compared to mailed self-administered questionnaires [31–33]. This finding suggests that telephone interviews may elicit a greater number of "don't know" responses than self-administered questionnaires. However, it is unclear whether the psychometric properties of the knowledge measures examined in our analyses would be significantly modified by this type of difference.


4.1. Practice implications


Effective patient education is crucial to the process of shared decision making. But in evaluating the effects of patient education, standardized tests to measure patient knowledge before and after an intervention are necessary. Two recent studies of CaP screening decision making interventions employed knowledge measures made up of 10–18 items representing domains of information prioritized by study investigators and content experts [12,13]. These knowledge measures cover a broader range of potentially relevant information than the three questions employed by Flood and others [9–11], but still exclude some potentially relevant domains of information. For instance, neither index included a question on expert disagreement about the value of CaP screening, one did not cover the accuracy of the PSA test [12], and the other did not cover treatment efficacy [13]. Furthermore, the validity and reliability of these indexes have not been thoroughly examined.

The availability of the standard, reliable CaP screening knowledge assessment tool developed and validated as part of this study will facilitate more rigorous evaluation of future CaP decision making interventions, advance our understanding of the effects of the shared decision making process, and enhance the comparability of results across studies. The index is currently being used as an outcome in at least one randomized trial of educational interventions designed to facilitate informed patient decisions about CaP screening and could be used as an outcome in future studies evaluating other educational interventions in this area. The approach used to develop this index could serve as a model for developing knowledge measures in other areas. Future studies should assess whether mode of administration has any impact on the psychometric properties of this index, and should assess the reliability and validity of the PROCASE Knowledge Index in a more racially diverse population. Studies assessing the reliability and validity of the index in a predominantly African American population would be particularly helpful since this population is disproportionately affected by CaP and hence a priority target for interventions designed to facilitate informed decision making about CaP screening.

Acknowledgements

Funded by VA Health Services Research and Development Service grant #IIR 99-277-1 to the Center for Chronic Disease Outcomes Research, Veterans Affairs Medical Center, Minneapolis, MN. The authors thank Drs. Ann Barry Flood, Robert Volk, and Marilyn Schapira for contributing versions of their knowledge instruments for use in this study. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Department of Veterans Affairs.

References

[1] Sarma AV, Schottenfeld D. Prostate cancer incidence, mortality, and survival in the United States: 1981–2001. Semin Urol Oncol 2002;20(February):3–9.
[2] American Cancer Society. Cancer Facts and Figures—2001. Atlanta, GA: American Cancer Society; 2001.
[3] Office of Technology Assessment. Costs and effectiveness of prostate cancer screening in elderly men. Washington, DC: US Government Printing Office; 1995.
[4] Harris R, Lohr KN. Screening for prostate cancer: an update of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med 2002;137(11):917–29.
[5] von Eschenbach A, Ho M, Cunningham M, Lins N. American Cancer Society guidelines for the early detection of prostate cancer: update. Cancer 1997;80:1805–7.
[6] O'Connor A. Using patient decision aids to promote evidence-based decision making. ACP J Club 2001;July–August:A11–2.
[7] O'Connor AM, Rostom A, Fiset V, Tetroe J, Entwistle V, Llewellyn-Thomas H, et al. Decision aids for patients facing health treatment or screening decisions: systematic review. Br Med J 1999;319:731–4.
[8] Chan ECY, Sulmasy DP. What should men know about prostate-specific antigen screening before giving informed consent? Am J Med 1998;105:266–74.
[9] Flood AB, Wennberg JE, Nease RF, Fowler FJ, Ding J, Hynes LM, et al. The importance of patient preference in the decision to screen for prostate cancer. J Gen Intern Med 1996;11:342–9.


[10] Wilt TJ, Paul J, Murdoch M, Nelson D, Nugent S, Rubins HB. Educating men about prostate cancer screening: a randomized trial of a mailed pamphlet. Eff Clin Pract 2001;4:112–20.
[11] Frosch DL, Kaplan RM, Felitti V. Evaluation of two methods to facilitate shared decision making for men considering the prostate-specific antigen test. J Gen Intern Med 2001;16:391–6.
[12] Schapira MM, VanRuiswyk J. The effect of an illustrated pamphlet decision-aid on the use of prostate cancer screening tests. J Fam Pract 2000;49(5):418–24.
[13] Volk RJ, Cass AR, Spann SJ. A randomized controlled trial of shared decision making for prostate cancer screening. Arch Fam Med 1999;8:333–40.
[14] McIver JP, Carmines EG. Unidimensional scaling. Beverly Hills, CA: Sage; 1981.
[15] Dillman DA. The total design method. New York: Wiley; 1978.
[16] Dillman DA. Mail and internet surveys: the tailored design method. New York: Wiley; 2000.
[17] Crocker L, Algina J. Introduction to classical and modern test theory. Fort Worth: Harcourt Brace Jovanovich College Publishers; 1986.
[18] Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
[19] Ebel RL. Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall; 1965.
[20] Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin Company; 2002.
[21] Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 1989.
[22] Kuder GF, Richardson MW. The theory of estimation of test reliability. Psychometrika 1937;2:151–60.
[23] Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297–334.
[24] Stewart AL, Ware JE, editors. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham: Duke University Press; 1992.
[25] Feinstein AR. Clinimetrics. New Haven: Yale University Press; 1987.
[26] Barry MJ, Fowler FJ, O'Leary MP, Bruskewitz RC, Holtgrewe HL, Mebust WK, et al. Correlation of the American Urological Association Symptom Index with self-administered versions of the Madsen-Iversen, Boyarsky and Maine Medical Assessment Program Symptom Indexes. J Urol 1992;148:1558–63.
[27] Fowler FJ, Wennberg JE, Timothy RP, Barry MJ, Mulley AG, Hanley D. Symptom status and quality of life following prostatectomy. J Am Med Assoc 1988;259(20):3018–22.
[28] O'Connor AM. Validation of a decisional conflict scale. Med Decis Making 1995;15:25–30.
[29] Lerman CE, Brody DS, Caputo GC, Smith DG, Lazaro CG, Wolfson HG. Patients' perceived involvement in care scale: relationship to attitudes about illness and medical care. J Gen Intern Med 1990;5:29–33.
[30] Holmes-Rovner M, Kroll J, Schmitt N, Rovner DR, Breer L, Rohert ML, et al. Patient satisfaction with health care decisions: the satisfaction with decision scale. Med Decis Making 1996;16:58–64.
[31] Dillman DA, Sangster RL, Tarnai J, Rockwood TH. Understanding differences in people's answers to telephone and mail surveys. In: Braverman MT, Slater JK, editors. Advances in survey research. San Francisco: Jossey-Bass; 1996. p. 110.
[32] Rockwood TH, Sangster RL, Dillman DA. The effect of response categories on questionnaire answers: context and mode effects. Sociol Methods Res 1997;26:118–40.
[33] Schwarz N, Strack F, Mai H-P. Assimilation and contrast effects in part-whole question sequences: a conversational logic analysis. Public Opin Q 1991;55:3–23.

[24] Stewart AL, Ware JE, editors. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham: Duke University Press; 1992. [25] Feinstein AR. Clinimetrics. New Haven: Yale University Press; 1987. [26] Barry MJ, Fowler FJ, O’Leary MP, Bruskewitz RC, Holtgrewe HL, Mebust WK, et al. Correlation of the American Urological Association Symptom Index with self-administered version of the Madsen-Iversen, Boyarsky and Maine Medical Assessment Program Symptom Indexes. J Urol 1992;148:1558–63. [27] Fowler FJ, Wennberg JE, Timothy RP, Barry MJ, Mulley AG, Hanley D. Symptom status and quality of life following prostatectomy. J Am Med Assoc 1988;259(20):3018–22. [28] O’Connor AM. Validation of a decisional conflict scale. Med Decis Making 1995;15:25–30. [29] Lerman CE, Brody DS, Caputo GC, Smith DG, Lazaro CG, Wolfson HG. Patients’ perceived involvement in care scale: relationship to attitudes about illness and medical care. J Gen Intern Med 1990;5:29– 33. [30] Holmes-Rovner M, Kroll J, Schmitt N, Rovner DR, Breer L, Rohert ML, et al. Patient satisfaction with health care decisions: the satisfaction with decision scale. Med Decis Making 1996;16:58– 64. [31] Dillman DA, Sangster RL, Tarnai J, Rockwood TH. Understanding differences in people’s answers to telephone and mail surveys. In: Braverman MT, Slater JK, editors. Advances in survey research. San Francisco: Jossey-Bass; 1996. p. 110. [32] Rockwood TH, Sangster RL, Dillman DA. The effect of response categories on questionnaire answers: context and mode effects. Sociol Methods Res 1997;26:118–40. [33] Schwarz N, Strack F, Mai H-P. Assimilation and contrast effects in part-whole question sequences: a conversational logic analysis. Pub Opin Q 1991;55:3–23.