The validity and reliability of an interactive computer tobacco and alcohol use survey in general practice

Addictive Behaviors 35 (2010) 492–498

B. Bonevski a,⁎, E. Campbell b, R.W. Sanson-Fisher c

a Centre for Health Research and Psycho-oncology (CHeRP), Cancer Council NSW and The University of Newcastle, Level 2, David Maddison Building, Callaghan 2308, NSW, Australia
b Population Health, Hunter New England Area Health Service, Locked Bag 10, Wallsend 2289, NSW, Australia
c Health Behaviour Research Group, The University of Newcastle, Callaghan 2308, NSW, Australia

Keywords: Validity, Reliability, Computer, Health risk assessment

Abstract

Background: Uncertainty regarding the accuracy of the computer as a data collection or patient screening tool persists. Previous research evaluating the validity of computer health surveys has tended to compare computer responses with those of a paper survey or clinical interview (as the gold standard). This approach is limited because it assumes that the paper version of the self-report survey is valid and an appropriate gold standard.

Objectives: First, to compare the accuracy of computer and paper methods of assessing self-reported smoking and alcohol use in general practice, with biochemical measures as the gold standard. Second, to compare the test re-test reliability of computer administration, paper administration and mixed methods of assessing self-reported smoking status and alcohol use in general practice.

Methods: A randomised cross-over design was used. Consenting patients were randomly assigned to one of four groups: Group 1 (C–C), completing a computer survey at the time of the consultation (Time 1) and a computer survey 4–7 days later (Time 2); Group 2 (C–P), completing a computer survey at Time 1 and a paper survey at Time 2; Group 3 (P–C), completing a paper survey at Time 1 and a computer survey at Time 2; and Group 4 (P–P), completing a paper survey at Time 1 and Time 2. At Time 1 all participants also completed biochemical measures to validate self-reported smoking status (expired air carbon monoxide breath test) and alcohol consumption (ethyl alcohol urine assay).

Results: Of the 618 who were eligible, 575 (93%) consented to completing the Time 1 surveys. Of these, 71% (N = 411) completed Time 2 surveys. Compared to CO, the computer smoking self-report survey demonstrated 91% sensitivity, 94% specificity, 75% positive predictive value (PPV) and 98% negative predictive value (NPV). The equivalent paper survey demonstrated 86% sensitivity, 95% specificity, 80% PPV, and 96% NPV. Compared to urine assay, the computer alcohol use self-report survey demonstrated 92% sensitivity, 50% specificity, 10% PPV and 99% NPV. The equivalent paper survey demonstrated 75% sensitivity, 57% specificity, 6% PPV, and 98% NPV. Agreement between smoking self-reports at Time 1 and Time 2 yielded kappa coefficients ranging from 0.95 to 0.98 in each group, and agreement between hazardous alcohol use self-reports at Time 1 and Time 2 yielded kappa coefficients ranging from 0.90 to 0.96 in each group.

Conclusion: In the general practice setting, the collection of self-reported health risk information via a computer interface is as accurate and reliable as a traditional paper survey. The computer survey appears highly reliable and accurate for the measurement of smoking status. Further research is needed to confirm the adequacy of the quantity/frequency measure in detecting those who drink alcohol. Interactive computer administered health surveys offer a number of advantages to researchers and clinicians and further research is warranted.

⁎ Corresponding author. Tel.: +61 2 49138 619; fax: +61 2 49138 601. E-mail addresses: [email protected] (B. Bonevski), [email protected] (E. Campbell), rob.sanson-fi[email protected] (R.W. Sanson-Fisher).
doi:10.1016/j.addbeh.2009.12.030

1. Introduction

The use of interactive computers to collect health data directly from patients has several advantages (Fricker & Schonlau, 2002).

Studies indicate that computers are a confidential, acceptable, feasible, user-preferred, and cost-effective mechanism for collecting health information (Bernhardt, Strecher, Bishop, Potts, Madison & Thorp, 2001; Wright, Aquilino & Supple, 1998; Hibbert, Hamill, Rosier, Caust, Patton & Bowes, 1996; Shakeshaft, Bowman & Sanson-Fisher, 1998; Shakeshaft & Frankish, 2003), particularly if a touch screen format is used (Westman, Hampel & Bradley, 2000). Touch screen computer interfaces have achieved high acceptability rates in settings such as general practice (Bonevski, Sanson-Fisher, Campbell & Ireland, 1997), community drug and alcohol clinics (Shakeshaft, Bowman & Sanson-Fisher, 1998), and cancer treatment centres (Newell, Girgis & Sanson-Fisher, 1997).


Computerised surveys offer better flexibility in questionnaire design due to automatic tailoring and branching of items based on responses, reducing item redundancy and missing data (Fricker & Schonlau, 2002). An additional advantage of computerised data collection is that it can provide tailored 'real time' results immediately available to users and their doctors (Bonevski, Sanson-Fisher, Campbell, Carruthers, Reid & Ireland, 1999). This quality in particular has motivated the proliferation of computer-delivered and web-based health behaviour change interventions (Portnoy, Scott-Sheldon, Johnson & Carey, 2008; Walters, Wright & Shegog, 2006).

The practicality and acceptability of computerised self-report measures cannot always compensate for other important characteristics. Uncertainty regarding the accuracy and reliability of the computer as a data collection or patient screening tool persists despite the importance of these characteristics (Fricker & Schonlau, 2002). Inaccurate self-report measures may lead to misclassification of patients' health risk status, which may in turn result in the use of inappropriate interventions. Given that many factors, including mode of delivery (Bowling, 2005), may affect the accuracy of a self-report measure, it is imperative that new methods of collecting health-related self-report information are thoroughly evaluated.

The accuracy of self-report measures can be assessed in several ways (Nunnally, 1978). One desirable quality that a measure should demonstrate is concurrent validity. Assessing concurrent validity usually involves the comparison of self-reported health behaviours to a gold standard (or true) measure of the behaviour. Another important characteristic is demonstrated test re-test reliability. Reliability is the extent to which a measure produces results which are free of random error; it can be assessed by measuring the degree of discrepancy between responses when a scale is administered to a sample on two separate occasions.

Previous research evaluating the validity of computer-delivered health behaviour surveys has tended to compare the responses of the computer survey to those of a paper survey or clinical interview (as the gold standard) (Ahmed, Hogg-Johnson & Skinner, 2008; Wu, Thorpe, Ross, Micevski, Marquez & Straus, 2009; Hayward et al., 1992; Davis, Hoffman, Morse & Luehr, 1992). This approach is limited because it assumes that the paper version of the self-report survey is valid and an appropriate gold standard. For example, there is some evidence that responses to computer surveys can be more valid than responses to face-to-face interviews, particularly for socially undesirable behaviours (Beck, Steer & Ranieri, 1988; Greist, Klein, Van Cura & Erdman, 2000).

Surprisingly few studies have compared commonly assessed health risk behaviours using computerised surveys against biochemical measures. One such study compared self-reports from 52 general practice patients using a computer survey on diet and nutrition with blood and urine assay results as the gold standard (O'Donnell, Nelson, Wise & Walker, 1991). The authors did not provide any information on the appropriateness of this gold standard as a measure of dietary intake; information on the sensitivity and specificity of the assays and the half-life of the biochemical variables in these mediums is necessary.
The study also failed to provide a comparison of accuracy with non-computerised methods. Another study compared computer survey self-reported and accelerometer-measured average daily time spent performing moderate to vigorous physical activity (Wong, Leatherdale & Manske, 2006). Self-reported and measured BMI were also compared. The study found that computer-collected self-report responses significantly correlated with the objective gold standard measures. However, important indices of agreement, such as the sensitivity and specificity of the self-report measure, were not reported. The study also failed to compare the computer survey results with non-computerised alternative survey methods.


Finally, the generalisability of those results is restricted as the study was set in secondary schools with students in grades 6 to 12.

A number of studies have examined the test re-test reliability of computer delivery of health risk surveys and have generally concluded that computer surveys are reliable (Hayward et al., 1992; Bernadt, Daniels, Blizard & Murray, 1989; Miller et al., 2002). However, many of these studies have had small sample sizes (Bernadt, Daniels, Blizard & Murray, 1989; Miller et al., 2002) or short periods (1–4 h) between the first and second administration of the survey (Hayward, Smittner, Meyer et al., 1992).

The aims of this study are two-fold. First, to compare the accuracy of computer and paper methods of assessing self-reported smoking and alcohol use in general practice, with biochemical measures as the gold standard. Second, to compare the test re-test reliability of computer administration, paper administration and administration using mixed methods of assessing self-reported smoking status and alcohol use in general practice.

2. Methods

2.1. Study design

A randomised cross-over design was used (see Fig. 1). Consenting patients were randomly assigned to one of four groups: Group 1 (C–C), completing a computer survey at the time of the consultation (Time 1) and a computer survey 4–7 days later (Time 2); Group 2 (C–P), completing a computer survey at Time 1 and a paper survey at Time 2; Group 3 (P–C), completing a paper survey at Time 1 and a computer survey at Time 2; and Group 4 (P–P), completing paper surveys at Time 1 and Time 2.
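For illustration only, the sketch below (in Python, which this study did not use) represents the four crossover groups and the survey mode administered at each time point. The simple random selection and the function name are assumptions made for the example; they are not a description of the randomisation procedure actually used in the study.

```python
import random

# Illustrative sketch of the four crossover groups described above.
# Each entry gives the survey mode at Time 1 and at Time 2.
GROUPS = {
    "C-C": ("computer", "computer"),
    "C-P": ("computer", "paper"),
    "P-C": ("paper", "computer"),
    "P-P": ("paper", "paper"),
}

def allocate() -> str:
    """Randomly assign a consenting patient to one of the four groups (illustration only)."""
    return random.choice(list(GROUPS))

group = allocate()
time1_mode, time2_mode = GROUPS[group]
print(group, time1_mode, time2_mode)  # e.g. C-P computer paper
```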

2.2. Recruitment

A consecutive sample of patients was recruited from four general practice surgeries in one region of New South Wales, Australia. Trained interviewers were located in each surgery waiting area. As patients waited to see the doctor, the interviewers approached adult patients using a standardised protocol. Each patient was assessed for eligibility: aged over 18 years, not too ill to participate, able to comprehend English, and had not previously completed the survey at this or another surgery. Each patient was given written information about the study and asked for written consent. A two-step consent process was used. Patients were initially asked only if they were willing to complete a health behaviour survey, either on computer or paper, on two occasions: before their consultation that day and within the next 4–7 days. Once they had completed their first survey, they were asked for consent to perform the biochemical tests.

2.3. Procedure

At Time 1, all patients completing the health survey were also asked for consent to provide the interviewer with a urine sample, using a sample jar provided to them, and to participate in a breath analysis before leaving the surgery. All urine samples were placed immediately on ice and kept frozen until transportation to the pathology laboratory. Patients were unaware of which type of survey they would be administered at Time 2. Many patients were returning to the surgery within the required timeframe for a follow-up consultation and were therefore able to complete the Time 2 survey at the surgery (that is, prior to their consultation). Patients randomised to receive the paper survey at Time 2 who were not returning to the surgery were mailed the survey to be completed at home and returned. An incentive to return to the surgery at Time 2 was provided (an instant lottery ticket).

Fig. 1. Flow diagram of patient recruitment.

An acceptability survey, described in detail elsewhere (Bonevski, Sanson-Fisher, Campbell & Ireland, 1997), was distributed to participants in groups C–P and P–C at Time 2. The study was approved by the University of Newcastle Human Research Ethics Committee.

2.3.1. Computer administration

The touch screen computer program was designed to be useable by those with no prior computer experience. The computer hardware sat on a raised protective stand with a screening enclosure and was located in a private position within the surgery waiting area. Patients stood to complete the survey, with one question presented at a time. The patient responded by touching the area of the screen showing their answer. The program re-set itself for each new patient.

2.3.2. Paper administration

Patients completed the paper survey in the waiting area by circling appropriate options or writing responses.

2.4. Materials

2.4.1. Health risk survey

The health behaviour survey was identical in content regardless of method or time of administration. The survey contained items on smoking status, alcohol consumption and benzodiazepine use.

2.4.1.1. Smoking status. A single item was used to determine smoking status. Patients were asked “Which of the following best describes your smoking status?” and responded by selecting from “I'm a smoker, I smoke daily”, “I'm a smoker, I smoke occasionally”,

“I'm an ex-smoker, I never smoke now”, and “I'm a non-smoker, I have never smoked”. Patients who indicated that they smoked daily or occasionally were defined as smokers. The item has demonstrated accuracy in paper form in past research (Dickinson, Wiggers, Leeder & Sanson-Fisher, 1989). Smokers were also asked “How long ago was the last time you had smoked a cigarette, cigar or pipe?” and responded by indicating one of the following: “In the last 4 hours”, “More than 4 and up to 8 hours ago”, “More than 8 and up to 12 hours ago”, or “More than 12 hours ago”.

2.4.1.2. Alcohol consumption. The two-item quantity/frequency measure of alcohol consumption was used (Straus & Bacon, 1953; Wyllie, Zhang & Casswell, 1994). Patients were asked “How many days in an average week do you usually drink alcohol?” and “On a day when you drink alcohol, how many standard drinks do you usually have?” A standard drinks graphic was displayed with this item. The measure has demonstrated adequate test re-test reliability in paper surveys in past research (Webb, Redman, Gibberd & Sanson-Fisher, 1991). National guidelines defined hazardous drinking as consumption of more than 28 (males) or 14 (females) standard drinks per week (National Health and Medical Research Council of Australia, 1992). Patients who indicated having had at least one drink were also asked “When was the last time you had a standard drink (i.e., beer, wine, spirits or fortified wine)?” and responded by indicating “In the last 6 hours”, “More than 6 and up to 12 hours ago”, “More than 12 and up to 24 hours ago”, or “More than 24 hours ago”.
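As a rough illustration of how the two quantity/frequency items map onto the hazardous-drinking definition above, the following Python sketch applies the 1992 NHMRC thresholds (more than 28 standard drinks per week for males, more than 14 for females). The function names are illustrative and are not taken from the study's survey software.

```python
# Minimal sketch (assumed, not the study's code) of scoring the two
# quantity/frequency items against the 1992 NHMRC hazardous-drinking thresholds.

def weekly_drinks(days_per_week: int, drinks_per_day: float) -> float:
    """Usual weekly consumption implied by the two survey items."""
    return days_per_week * drinks_per_day

def is_hazardous(days_per_week: int, drinks_per_day: float, sex: str) -> bool:
    """More than 28 standard drinks/week (males) or 14 (females) is classed as hazardous."""
    threshold = 28 if sex == "male" else 14
    return weekly_drinks(days_per_week, drinks_per_day) > threshold

print(is_hazardous(3, 4, "male"))    # 12 drinks/week -> False
print(is_hazardous(5, 4, "female"))  # 20 drinks/week -> True
```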


2.4.2. Biochemical measures

To validate self-reported current smoking status, useable expired air carbon monoxide (CO) samples were taken. To validate self-reported alcohol use, urine assays were conducted on the collected samples. Patients were told that breath and urine samples were required to test their exposure to passive smoking and their medication use.

2.4.2.1. Smoking status. Measuring CO in expired air has demonstrated high sensitivity and specificity, both around 90% (Benowitz, 2002). Analyses of CO levels were undertaken using a Bedfont EC50 micro hand-held Smokerlyzer. To record expired air CO, the Jarvis protocol was followed (Jarvis, Russell & Saloosee, 1980). In accordance with the manufacturer's instructions, the CO monitors were calibrated with a gaseous mixture containing 50 ppm of CO prior to the start of the study. A cut-off of 9 ppm was used to validate the smoking status of participants. This level has been used previously and is within the range recommended by the SRNT Biochemical Validation Working Group (Benowitz, 2002; Stookey, Katz, Olsen et al., 1987).

2.4.2.2. Alcohol consumption. All urine samples were assayed qualitatively for alcohol using the Enzyme Multiplied Immunoassay Technique (EMIT) ETS Plus ethyl alcohol assay (Syva Diagnostics, Palo Alto, CA). The positive presence of ethyl alcohol was defined using a cut-off of 10 mg/dL. A critical evaluation of EMIT by Sutheimer et al. (1982) found it to be class specific, simple to perform, time efficient and more than sensitive enough to detect drug levels associated with heavy use.

2.5. Analyses

Chi-square tests were performed to examine demographic differences between groups. Prevalences of smoking and alcohol consumption are presented as proportions with 95% confidence intervals (95% CI). The prevalence of each health behaviour was estimated from both self-report and biochemical measures. For the purposes of statistical analyses, agreement between the two estimates of prevalence was assessed using CO or urinalysis as the standard. Diagnostic statistical tests were conducted to assess the performance of self-report (Sackett, Haynes, Guyatt & Tugwell, 1991). These tests included assessing the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) with 95% CIs. To calculate the level of agreement between self-reported risk factors at Time 1 and Time 2, kappa analyses were performed for each of the four groups. Kappa analyses were adjusted for bias and prevalence effects. All analyses were conducted using the SAS statistical package and the Microsoft Excel spreadsheet package.
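For readers less familiar with these diagnostic indices, the sketch below shows how sensitivity, specificity, PPV and NPV are derived from a 2 × 2 cross-classification of self-report against the biochemical standard. It is a minimal Python illustration using invented counts; it is not the SAS or Excel code used in the study.

```python
# Illustrative sketch of the diagnostic indices described above, computed from
# a 2x2 table of self-report vs biochemical standard (counts below are made up).

def diagnostic_indices(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # biochemically positive cases correctly self-reported
        "specificity": tn / (tn + fp),  # biochemically negative cases correctly self-reported
        "ppv": tp / (tp + fp),          # self-reported positives confirmed by the standard
        "npv": tn / (tn + fn),          # self-reported negatives confirmed by the standard
    }

print(diagnostic_indices(tp=45, fp=10, fn=5, tn=190))
# e.g. sensitivity 0.90, specificity 0.95, PPV ~0.82, NPV ~0.97 for these illustrative counts
```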


3. Results

3.1. Sample

A total of 878 patients were approached by the interviewers. Of these, 260 (30%) were ineligible: 58 were under 18 years of age, 77 had limited or no English, 11 were too ill, and 114 had previously participated. Of the 618 who were eligible, 575 (93%) consented to completing the Time 1 surveys. Of the participants who completed the Time 1 surveys, 95% (N = 549) completed the CO test, 87% (N = 499) completed the urinalysis and 71% (N = 411) completed Time 2 surveys. Fig. 1 shows the numbers for the various components of the study by group.

Table 1 summarises the basic participant demographics in each group at Time 1. There were no differences between groups in health risk factors and demographic characteristics. There were also no differences between patients who returned at Time 2 and those who did not.

3.2. Preferences for mode of administration

In summary, 213 participants who had completed both paper and computer surveys completed an acceptability survey, providing a response rate of 76%.

Table 1
Demographic and health risk characteristics of patients completing surveys at Time 1.

Characteristic (%)                Group C–C    Group C–P    Group P–C    Group P–P    X2, df, p value
                                  (N = 154)    (N = 143)    (N = 138)    (N = 140)
Gender
  Male                            30           37           44           41           7.067, 3, 0.070
Age
  18–29 years                     18           21           17           20           3.431, 9, 0.945
  30–49 years                     39           40           35           41
  50–69 years                     26           25           29           24
  70–86 years                     17           14           19           15
Marital status
  Single                          18           15           19           19           3.257, 6, 0.776
  Separated/divorced/widowed      8            5            4            7
  Married/de-facto                74           80           77           74
Employment status
  Full-time/home duties/student   53           49           50           51           2.787, 9, 0.972
  Part-time/casual                11           15           16           12
  Unemployed                      26           24           24           27
  Retired/unable/other            10           12           10           10
Secondary education
  Not complete primary            2            5            1            2            9.391, 9, 0.402
  Primary school                  8            10           12           13
  Up to year 10                   63           61           66           60
  Up to year 12                   27           24           21           25
Tertiary education
  None                            63           68           59           62           17.611, 12, 0.128
  Certificate/diploma             12           18           21           15
  Graduate degree                 16           12           10           14
  Postgraduate degree             3            1            4            4
  Other                           6            1            6            4
Smoking status
  Smoker                          20           20           22           22           0.241, 3, 0.971
Alcohol use
  Hazardous drinker               4            5            5            5            0.127, 3, 0.988
  No/safe/responsible             96           95           95           95


Ninety-seven per cent (97%) agreed with the statement “The computer program was enjoyable”, only 4% felt the computer survey “took too long”, and 99% of respondents claimed to have “answered all questions in the computer survey truthfully”. The computer survey was rated easier to read (42%) and quicker to finish (68%) than the paper survey (8% and 6% respectively) and was preferred by 68% of respondents who completed both types of assessment.

3.3. Comparison of smoking self-report with biochemical measure

Analyses were conducted on the data from those participants who completed the smoking status item in the survey and a CO test. As a result, 507 participants were eligible for validation of their smoking status, 254 completing the computer survey and 253 completing the paper survey. Table 2 details the two estimates of smoking status prevalence, and the sensitivity, specificity, PPV and NPV with 95% CIs for computer and paper administration. Compared to the CO results, the computer survey demonstrated 91% sensitivity, 94% specificity, 75% PPV and 98% NPV. The equivalent paper survey demonstrated 86% sensitivity, 95% specificity, 80% PPV, and 96% NPV. More detailed analysis using the paper survey data only showed that of the 11 self-reported smokers who were not confirmed by the CO test results, 10 (91%) reported having last smoked more than 4 h before completing the survey. Only 5% (2 of the 44) of the self-reported smokers who were confirmed by CO indicated having last smoked more than 4 h before.

3.4. Comparison of alcohol consumption self-report with biochemical measure

Analyses were conducted on the data from those participants who completed the alcohol use measure in the survey and provided a urine sample. As a result, 433 participants were eligible for validation of their self-reported alcohol use, 215 completing the computer survey and 218 completing the paper survey. Given the low prevalence of ‘hazardous’ alcohol use (4%), participants were instead classified according to ‘any’ alcohol use, defined as at least one standard drink per week (the minimal amount of alcohol use able to be reported on the quantity/frequency measure). Table 3 summarises the alcohol use prevalence, sensitivity, specificity, PPV and NPV with 95% CIs for computer and paper administration of the survey. More detailed analysis of the paper survey results shows that of the 90 participants reporting any alcohol use who were not confirmed by urinalysis, 79 (88%) reported having had their last drink of alcohol more than 24 h prior to completing the survey. In comparison, all participants who reported drinking any alcohol and who were confirmed by urinalysis reported having had a drink in the last 24 h.

Table 2
Self-reported smoking status using computer survey and paper survey compared to CO test results.

Self-reported             Computer                        Paper
smoking status            CO ≥ 9 ppm     CO < 9 ppm       CO ≥ 9 ppm     CO < 9 ppm
  Yes (N)                 39             13               44             11
  No (N)                  4              198              7              191

                             Computer, % (95% CI)         Paper, % (95% CI)
Prevalence by self-report    20 (15, 25)                  22 (17, 27)
Prevalence by CO             17 (12, 22)                  20 (15, 25)
Sensitivity                  91 (82, 99)                  86 (77, 96)
Specificity                  94 (91, 97)                  95 (91, 98)
PPV                          75 (63, 87)                  80 (69, 91)
NPV                          98 (96, 100)                 96 (93, 99)
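As a quick arithmetic check, the indices reported for the computer survey can be reproduced directly from the counts in Table 2 (39 true positives, 13 false positives, 4 false negatives, 198 true negatives); the snippet below is an illustration only, not the study's analysis code.

```python
# Reproducing the computer-survey row of Table 2 from its 2x2 counts.
tp, fp, fn, tn = 39, 13, 4, 198
print(round(100 * tp / (tp + fn)))  # sensitivity -> 91
print(round(100 * tn / (tn + fp)))  # specificity -> 94
print(round(100 * tp / (tp + fp)))  # PPV -> 75
print(round(100 * tn / (tn + fn)))  # NPV -> 98
```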

Table 3
Self-reported alcohol use (any amount) using computer survey and paper survey compared to urine assay results.

                             Computer                       Paper
                             Alcohol detected in urine      Alcohol detected in urine
Self-reported alcohol use    Yes           No               Yes           No
  Yes (N)                    11            101              6             90
  No (N)                     1             102              2             118

                             Computer, % (95% CI)           Paper, % (95% CI)
Prevalence by self-report    52 (45, 59)                    44 (37, 51)
Prevalence by urinalysis     6 (3, 9)                       4 (1, 7)
Sensitivity                  92 (76, 100)                   75 (45, 100)
Specificity                  50 (43, 57)                    57 (50, 63)
PPV                          10 (4, 16)                     6 (1, 11)
NPV                          99 (98, 100)                   98 (96, 100)

3.5. Comparison of test re-test reliability of computer and paper survey administration

To be included in the analyses of test re-test reliability, participant responses from both the Time 1 and Time 2 surveys were required. Table 4 summarises the number of participants completing surveys at both times and the prevalence of each health behaviour for each group. Agreement between smoking self-reports at Time 1 and Time 2 yielded kappa coefficients ranging from 0.95 to 0.98 in each group. Agreement between hazardous alcohol use self-reports at Time 1 and Time 2 yielded kappa coefficients ranging from 0.90 to 0.96 in each group.

Table 4
Time 1 and Time 2 self-report prevalence estimates of smoking status and alcohol use for each group and level of agreement using kappa statistics.

                             Total N    Time 1 mode, % (95% CI)    Time 2 mode, % (95% CI)    Kappa coefficient
Group C–C
  Smoking status: smoker     84         Computer, 17 (9, 25)       Computer, 17 (9, 25)       0.95
  Alcohol use: hazardous     84         Computer, 4 (0, 8)         Computer, 2 (0, 5)         0.93
Group P–P
  Smoking status: smoker     95         Paper, 18 (10, 26)         Paper, 17 (9, 25)          0.98
  Alcohol use: hazardous     94         Paper, 5 (0, 9)            Paper, 4 (0, 8)            0.90
Group C–P
  Smoking status: smoker     96         Computer, 16 (9, 23)       Paper, 15 (8, 22)          0.98
  Alcohol use: hazardous     75         Computer, 2 (0, 5)         Paper, 2 (0, 5)            0.96
Group P–C
  Smoking status: smoker     101        Paper, 21 (13, 29)         Computer, 20 (12, 28)      0.98
  Alcohol use: hazardous     102        Paper, 3 (0, 6)            Computer, 4 (0, 8)         0.92
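For clarity, the sketch below shows how a kappa coefficient, and a prevalence- and bias-adjusted kappa (PABAK), can be computed from a 2 × 2 table of paired Time 1/Time 2 classifications. The counts are invented for illustration and are not the study data; the adjusted-kappa calculations actually used in the study may differ in detail.

```python
# Illustrative agreement statistics for paired Time 1 / Time 2 classifications.
# a = yes/yes, b = yes/no, c = no/yes, d = no/no.

def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa from a 2x2 table of paired classifications."""
    n = a + b + c + d
    po = (a + d) / n                                      # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    return (po - pe) / (1 - pe)

def pabak(a: int, b: int, c: int, d: int) -> float:
    """Prevalence- and bias-adjusted kappa (2 * observed agreement - 1)."""
    po = (a + d) / (a + b + c + d)
    return 2 * po - 1

# Made-up counts: 15 smokers and 83 non-smokers classified the same both times,
# with one disagreement in each direction.
print(round(cohen_kappa(15, 1, 1, 83), 2), round(pabak(15, 1, 1, 83), 2))  # 0.93 0.96
```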


4. Discussion

This study aimed to examine the psychometric qualities of interactive computer administered patient health risk surveys in the general practice setting and to compare those qualities with traditional paper administration of surveys. The study found no notable differences between administration modes in prevalence estimates of smoking and alcohol use, as indicated by overlapping 95% confidence intervals. The results are in accord with previous studies, but provide new information on the sensitivity, specificity, positive predictive and negative predictive values of the self-report measures.

4.1. The accuracy of computer health risk survey

Of the two behaviours examined, the self-reported prevalence estimates for smoking status were in the best agreement with the biochemical measures, suggesting that self-report of smoking is accurate regardless of mode of administration. The results reflect those reported in the literature comparing paper surveys with CO validation (Benowitz, 2002; Stookey, Katz, Olsen et al., 1987) and provide new data comparing computer survey with CO validation results. Only 7% of patients completing the computer survey were misclassified, and only 9% of smokers were missed. Similar results were obtained for the paper administered surveys, where 7% of patients were misclassified and 14% of CO-defined smokers failed to be identified by the surveys. This is likely because an individual's current smoking status is a relatively stable and concrete risk factor which requires little recall of distant events (Benowitz, 2002).

The quantity/frequency measure of alcohol use requires the patient to estimate the average amount of drinking in which they normally engage. The definition of hazardous drinking used in the study was consumption of more than 28 (males) or 14 (females) standard drinks per week, based on the national guidelines published at the time of the study (National Health and Medical Research Council of Australia, 1992). Guidelines for alcohol consumption have recently come under review and definitions of low and high risk drinking have changed (National Health and Medical Research Council, 2009). Nonetheless, the study showed that self-reported consumption of any amount of alcohol displayed poor accuracy when compared to urinalysis, regardless of computer or paper method. In particular, the specificity and positive predictive values of both computer and paper methods of self-report were low. The computer survey accurately identified only 50% of those who did not have alcohol in their urine assay results (specificity). The paper survey specificity was similar at 57%. The positive predictive values of the computer and paper surveys were particularly low, 10% and 6% respectively, suggesting that while a positive self-report is a poor confirmation of recent alcohol use, the surveys will pick up 92% (computer) or 75% (paper) of all alcohol users (sensitivity). The usefulness of the positive predictive value is limited since, as it is not intrinsic to the test, it is highly influenced by the prevalence of the measured behaviour, which may be affecting the values obtained in this study (Sackett, Haynes, Guyatt & Tugwell, 1991). Also, alcohol is a notoriously difficult drug for which to obtain reasonably accurate biological markers, due mostly to its high metabolic rate and subsequent short half-life, which may be contributing to the results (Conigrave, Saunders & Whifield, 1995; Dawe & Mattick, 1997).

The low accuracy rates of computer administration of the quantity/frequency measure obtained here have been found by others (Shakeshaft, Bowman & Sanson-Fisher, 1999; Barry & Fleming, 1990). However, it would be hasty to dismiss the measure as an unsuitable survey of alcohol intake based on the current findings, as other factors need to be considered. For example, some comparisons of reported consumption levels using quantity/frequency with other self-report measures, such as a retrospective seven-day diary, have revealed reasonable comparative results (Wyllie, Zhang & Casswell, 1994), supporting the view that the comparison biochemical measure may be a strong influence on the poor accuracy results.
Also, the two-item quantity/frequency measure is attractive in the clinical setting due to its brevity and ease of use, and it continues to be useful at detecting true positives.
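The prevalence-dependence of the PPV noted above can be illustrated with a short calculation: holding the computer survey's reported sensitivity (92%) and specificity (50%) fixed, the PPV changes markedly as the true prevalence of recent alcohol use varies. The function below is the standard identity relating these quantities, shown here for illustration only.

```python
# PPV as a function of sensitivity, specificity and true prevalence.

def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value implied by sensitivity, specificity and prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(round(ppv(0.92, 0.50, 0.056), 2))  # ~0.10 at the ~6% urinalysis prevalence, as in Table 3
print(round(ppv(0.92, 0.50, 0.30), 2))   # ~0.44 if the behaviour were far more prevalent
```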


4.2. The reliability of computer health risk survey

Test re-test reliability was equally high across the two health behaviours and survey administration formats. The time period between the first and second administration was long enough to suggest that participants were not merely recalling previous responses, but short enough to reduce the likelihood of true changes in behaviour. The findings suggest that both survey methods yield consistent results for the behaviours under investigation. The current results concur with previous research which has reported high test re-test reliability in computer surveys of smoking and alcohol use (Hayward, Smittner, Meyer et al., 1992; Bernadt, Daniels, Blizard & Murray, 1989; Miller et al., 2002). In addition to high reliability for repeated computer administration or repeated paper administration, repeated administration using mixed computer and paper formats resulted in high agreement scores. This suggests that either format may be used interchangeably during a project without compromising the reliability of the results obtained.

4.3. Implications for research and practice

Despite some remaining questions regarding the accuracy of alcohol use self-report measures, it appears that computerised health-related data collection is at least as reliable and valid as the paper survey for the collection of data regarding smoking status and alcohol use. Given that there are many other advantages to interactive computer data collection, such as cost-efficiency, standardised data collection and acceptability to users, the current study adds weight to the case for computerised health behaviour surveys. It is important, though, to continue validating computer and other methods of self-report data collection in order to improve accuracy. A number of reasons have been outlined in the literature as to why inaccuracies in self-report occur (Baranowski, 1985; Sudman & Bradman, 1982), such as poor recall, practice effects and the social desirability effect. It is important to continue exploring methods to minimise inaccuracies and to assist computer survey respondents in providing true responses.

The results of the study have practical implications for clinicians. One of the major barriers cited by clinicians as limiting their ability to carry out routine health risk screening is lack of time (Zwar & Richmond, 2006). An interactive touch screen computer in the waiting room, as reported in this paper, can reduce the amount of time the clinician spends assessing patients' health risks while also providing a prompt for further probing if necessary. A randomised controlled trial of the effectiveness of a feedback mechanism for general practitioners attached to the computer health survey showed that it is also potentially a strategy for increasing rates of preventive care delivery (Bonevski, Sanson-Fisher, Campbell, Carruthers, Reid & Ireland, 1999).

4.4. Strengths and limitations of the study

Four important methodological issues need to be considered when interpreting the results of the study. First, the biochemical measures utilised in this study were those which were pragmatically and financially feasible within the general practice setting. As such, they present a number of limitations. Alternative biochemical measures of smoking status, including salivary cotinine, have been reported to have greater sensitivity and specificity and longer half-lives than expired air CO (Benowitz, 2002). Attempts were made to explore the possible effects of the short half-lives (4 h for CO and 12 h for ethyl alcohol) with additional survey items determining the time-span between the last smoke/drink and survey completion. Due to a programming error, those data were not preserved for the computer survey. Using the available paper survey data, the analyses indicate that time since last smoking and the short CO half-life may be influencing the results. Second, and related to the previous point, some values were missing. Our use of pairwise deletion to address missing values reduced the sample sizes used in the validation analyses.
However, this only affected a small number of cases (9% of respondents), and there were no differences between the computer and paper groups in the amount of missing data. Third, as indicated in the methods, it was necessary to allow a number of patients randomised to receive the paper survey at Time 2 during the reliability study to complete the survey at home. The different setting may have affected the responses provided. The small number of surveys completed in this manner (10 in group C–P and 28 in group P–P) and the high agreement coefficients obtained suggest that any such effects were minimal.


Finally, the generalisability of the current findings is limited to the clinical setting. Further evaluations in other settings, such as workplaces and schools, are warranted. Notwithstanding those limitations, the study had a number of methodological strengths which maximise the representativeness of the study sample and minimise potential biases. Relatively high participation rates were achieved for each component of the study, which is unusual considering the intrusiveness of the procedures. No differences in demographic characteristics or prevalence of health behaviours were found between groups, or between those who returned to complete the Time 2 measures and those who did not. Participants were unaware, at the time of giving consent, of the group to which they had been randomised and that their self-report responses would be validated by objective measures.

5. Conclusion

In conclusion, the collection of self-reported health risk information using a computer administered survey in the general practice setting is as accurate and reliable as a paper administered survey. Computer health risk assessment appears highly reliable and accurate for the measurement of smoking status. Further research is needed to confirm the adequacy of the quantity/frequency measure in detecting those who drink alcohol: whilst the sensitivity and test re-test reliability of the computer quantity/frequency measure were high, its specificity was limited. Interactive computer administered health surveys offer a number of advantages to researchers and clinicians and further research is warranted.

Role of Funding Source

This research was funded by the Commonwealth General Practice Evaluation Program and Cancer Council New South Wales' Centre for Health Research & Psychooncology (CHeRP). The funders played no role in the conduct of the study.

Contributors

All authors contributed to the conceptual and planning phases, data collection, data analysis and writing of the paper.

Conflict of Interest

The authors declare no known conflicts of interest.

Acknowledgements

This research was funded by the Commonwealth General Practice Evaluation Program and Cancer Council New South Wales' Centre for Health Research & Psychooncology (CHeRP). The views expressed are not necessarily those of The Cancer Council. We gratefully acknowledge the assistance of Professor A.L.A. Reid and Professor M. Ireland in the recruitment of general practitioners and Ms L. Adamson in the recruitment of patients.

References Ahmed, F., Hogg-Johnson, S., & Skinner, H. A. (2008). Assessing patient attitudes to computerized screening in primary care: psychometric properties of the computerized lifestyle assessment scale. Journal Medical Internet Research, 10(2), e11, doi:10.2196/jmir.955. Baranowski, T. (1985). Methodological issues in self-report of health behaviour. Journal of School Health, 55, 179−182. Barry, K. L., & Fleming, M. F. (1990). Computerised administration of alcoholism screening tests in a primary care setting. Journal of the American Board of Family Practice, 3, 93−98. Beck, A. T., Steer, R. A., & Ranieri, W. F. (1988). Scale for suicide ideation: psychometric properties of self report version. Journal of Clinical Psychology, 44, 499−505. Benowitz, N. L. (2002). for the SRNT Subcommittee on Biochemical Verification. Biochemical verification of tobacco use and cessation. Nicotine & Tobacco Research, 4, 149−159. Bernadt, M. W., Daniels, O. J., Blizard, R. A., & Murray, R. M. (1989). Can a computer reliably elicit an alcohol history? British Journal of Addiction, 84, 405−411. Bernhardt, J. M., Strecher, V. J., Bishop, K. R., Potts, P., Madison, E. M., & Thorp, J. (2001). Handheld computer-assisted self interviews: user comfort level and preferences. American Journal of Health Behaviour, 25(6), 557−563. Bonevski, B., Sanson-Fisher, R. W., Campbell, E. M., & Ireland, M. C. (1997). Do general practice patients find computer health risk surveys acceptable? A comparison with pen-and-paper method. Health Promotion Journal of Australia, 7, 100−106. Bonevski, B., Sanson-Fisher, R. W., Campbell, E., Carruthers, A., Reid, A. L. A., & Ireland, M. (1999). Randomized controlled trial of a computer strategy to increase general practitioner preventive care. Preventive Medicine, 29, 478−486. Bowling, A. (2005). Mode of questionnaire administration can have serious effects on data quality. Journal of Public Health, 27(3), 281−291.

Conigrave, K. M., Saunders, J. B., & Whifield, J. B. (1995). Diagnostic tests for alcohol consumption. Alcohol and Alcoholism, 30, 13−26. Davis, L. J., Hoffman, N. G., Morse, R. M., & Luehr, J. G. (1992). Substance use disorder diagnostic schedule (SUDDS): the equivalence and validity of a computeradministered and an interviewer-administered format. Alcoholism, Clinical and Experimental Research, 16, 250−254. Dawe, S., & Mattick, R. P. (1997). Review of Diagnostic Screening Instruments for Alcohol and Other Drug Use and Other Psychiatric Disorders Canberra: Australian Government Publishing Service. Dickinson, J. A., Wiggers, J., Leeder, S. R., & Sanson-Fisher, R. W. (1989). General practitioners' detection of patients' smoking status. Medical Journal of Australia, 150, 420−426. Fricker, R. D., & Schonlau, M. (2002). Advantages and disadvantages of internet research surveys: evidence from the literature. Field Methods, 14(4), 347−367, doi:10.1177/ 152582202237725. Greist, J. H., Klein, M. H., Van Cura, L. J., & Erdman, H. P. (2000). Computer interview questionnaires for drug use/abuse. In D. J. Lettieri (Ed.), Predicting Adolescent Drug Abuse: A Review of Issues, Methods and Correlates, DHEW Publication No ADM-76-299 (pp. 147−164). Washington DC: US Government Printing Office. Hayward, R. S. A., Smittner, J. P., Meyer, P., et al. (1992). Computer versus interview administered preventive care questionnaire: does survey medium affect response reliability? Clinical Research, 40, 608A. Hibbert, M. E., Hamill, C., Rosier, M., Caust, J., Patton, G., & Bowes, G. (1996). Computer administration of a school-based adolescent health survey. Journal of Pediatric Child Health, 32, 372−377. Jarvis, M. J., Russell, M. A., & Saloosee, Y. (1980). Expired air carbon monoxide: a simple breath test of tobacco smoke intake. BMJ, 16, 484−485. Miller, E. T., Neal, D. J., Roberts, L. J., Baer, J. S., Cressler, S. O., et al. (2002). Test–retest reliability of alcohol measures: is there a difference between internet-based assessment and traditional methods? Psychology of Addictive Behaviours, 16(1), 56−63. National Health and Medical Research Council. (2009). Australian Guidelines to Reduce Health Risks from Drinking Alcohol Canberra: Government Publishing Service. National Health and Medical Research Council of Australia. (1992). Is There a Safe Level of Daily Consumption of Alcohol for Men and Women?, 2nd Ed Canberra: Australian Government Publishing Service. Newell, S., Girgis, A., & Sanson-Fisher, R. W. (1997). Are touchscreen computer surveys acceptable to medical oncology patients? Journal of Psychosocial Oncology, 15, 37−46. Nunnally, J. C. (1978). Psychometric Theory: Second Ed. New York: McGraw-Hill. O'Donnell, M. G., Nelson, M., Wise, P. H., & Walker, D. M. (1991). A computerised diet questionnaire for use in diet health education. 1. Development and validation. British Journal of Nutrition, 66, 3−15. Portnoy, D. B., Scott-Sheldon, L. A. J., Johnson, B. T., & Carey, M. P. (2008). Computerdelivered interventions for health promotion and behavioural risk reduction: a meta-analysis of 75 randomized controlled trials, 1988–2007. Preventive Medicine, 47, 3−16. Sackett, D. L., Haynes, R. B., Guyatt, G. H., & Tugwell, P. (1991). Clinical Epidemiology. A Basic Science for Clinical Medicine. Boston: Little, Brown and Co. Shakeshaft, A. P., & Frankish, C. J. (2003). Using patient-driven computers to provide cost-effective prevention in primary care: a conceptual framework. 
Health Promotion International, 18(1), 67−77. Shakeshaft, A. P., Bowman, J. A., & Sanson-Fisher, R. W. (1998). Computers in community-based drug and alcohol clinical settings: are they acceptable to respondents? Drug & Alcohol Dependence, 50, 177−180. Shakeshaft, A. P., Bowman, J. A., & Sanson-Fisher, R. W. (1999). A comparison of two retrospective measures of weekly alcohol consumption: diary and quantity/ frequency index. Alcohol and Alcoholism, 34(4), 636−645. Stookey, G. K., Katz, B. P., Olsen, B. L., et al. (1987). Evaluation of biochemical validation measures in determination of smoking status. Journal of Dental Research, 66, 1597−1601. Straus, R., & Bacon, S. D. (1953). Drinking in College. New Haven, CT: Yale University Press. Sudman, S., & Bradman, N. M. (1982). Asking Questions. A Practical Guide to Questionnaire Design. San-Fransisco: Josey Bass. Sutheimer, C., Hepler, B. R., & Sunshine, I. (1982). Clinical application and evaluation of the EMIT-st drug detection system. American Journal of Clinical Pathways, 77, 731−735. Walters, S. T., Wright, J. A., & Shegog, R. (2006). A review of computer and internetbased interventions for smoking behaviour. Addictive Behaviours, 31, 264−277. Webb, G. R., Redman, S., Gibberd, R. W., & Sanson-Fisher, R. W. (1991). The reliability and stability of a quantity–frequency method and a diary method of measuring alcohol consumption. Drug and Alcohol Dependence, 27, 223−231. Westman, J., Hampel, H., & Bradley, T. (2000). Efficacy of a touchscreen computer based family cancer history questionnaire and subsequent cancer risk assessment. Journal of Medical Genetics, 37, 354−360, doi:10.1136/jmg.37.5.354. Wong, S. L., Leatherdale, S. T., & Manske, S. R. (2006). Reliability and validity of a schoolbased physical activity questionnaire. Medicine & Science in Sports & Exercise, 38(9), 1593−1600. Wright, D. L., Aquilino, W. S., & Supple, A. J. (1998). A comparison of computer-assisted and paper-and-pencil self-administered questionnaires in a survey on smoking, alcohol, and drug use. Public Opinion Quarterly, 62, 331−353. Wu, R. C., Thorpe, K., Ross, H., Micevski, V., Marquez, C., & Straus, S. E. (2009). Comparing administration of questionnaires via the internet to pen-and-paper in patients with heart failure: randomized controlled trial. Journal of Medical Internet Research, 11(1), e3, doi:10.2196/jmir.1106. Wyllie, A., Zhang, J., & Casswell, S. (1994). Comparison of six alcohol consumption measures from survey data 1994. Addiction, 89, 425−430. Zwar, N. A., & Richmond, R. L. (2006). Role of the general practitioner in smoking cessation. Drug and Alcohol Review, 25(1), 21−26.