Int. J. Man-Machine Studies (1977) 9, 69-86
A study of patients' attitudes to computer interrogation
R. W. Lucas
Research Centre for Diagnostic Methodology, Southern General Hospital, 1345, Govan Road, Glasgow G51 4TF

(Received 28 November 1975 and in revised form 20 May 1976)

In the evaluation of techniques for the automatic acquisition of medical history data, the attitudes of patients, the "users", are of the utmost importance. While patients' opinions have been sought in several studies, there has previously been no objective measurement of attitudes in this field. In the present paper, the development and evaluation of two scales for the measurement of patients' attitudes are described, one being of the traditional "Thurstone" type based on attitude statements, and the other being a Semantic Differential scale. A reliability coefficient of 0.90 was obtained for the "Thurstone" scale, and the results from this and the Semantic Differential were found to correlate well with each other (r = 0.70), supporting the validity of the measures. In a study of 75 patients, each was interrogated by computer and then asked to take home a questionnaire containing the attitude scales to complete and return anonymously through the post. Of the 67 patients who returned their questionnaires, 82% had favourable attitudes toward computer interrogation, and 49% had more favourable attitudes toward medical interviews with a computer than toward medical interviews with a doctor. Male patients had more favourable attitudes than female patients; younger patients were more favourable toward the computer than older patients; and manual workers were more favourable toward the computer than non-manual workers.
Introduction

In the evaluation of techniques of computer interrogation†, the attitudes of patients are of the utmost importance. However, the idea that the attitudes of patients, the "consumers" of medical care, should be measured is of fairly recent origin. In medicine in general, various approaches have now been tried to gather information concerning patients' attitudes and opinions (Reader et al., 1957; Cabal, 1962; Jaco, 1963; Deisher, 1965; Koos, 1965; Apostle, 1967; Franklin et al., 1967; Korsche et al., 1968; Francis et al., 1969; Hulka et al., 1970), but with just two exceptions (Franklin et al., 1967; Hulka et al., 1970), these have all involved direct questioning, a method which has two severe limitations. Firstly, patients rarely express negative attitudes and tend to reply in a manner which they perceive to be acceptable to the investigator, or according to social stereotypes and expectations. Secondly, subjective interpretations of such data can never produce unbiased quantitative scores for patients' attitudes. Similar criticisms can be made of the short questionnaires used almost exclusively to assess patients' attitudes toward computer interrogation in previous studies (Mayne et al., 1968; Slack et al., 1968; Evans et al., 1971; Grossman et al., 1971; Doddington et al., 1972; Yarnall et al., 1972). Further, it is only possible to analyse responses to the individual questions of such questionnaires; no rationale exists for the combination of individual responses into a single score for the patient's attitude.

Franklin et al. (1967) were the first investigators to make use of an attitude scale, developed by recognized psychological scaling techniques, in the study of patients' attitudes toward health care. They adapted Thurstone's "Method of Equal Appearing Intervals" (Thurstone et al., 1929) to the development of a scale for the measurement of attitudes toward student health services. Hulka et al. (1970) more recently developed a scale for the measurement of attitudes toward primary medical care.

In the present paper, the construction and evaluation of a "Thurstone" scale for the measurement of patients' attitudes toward computer interrogation, constructed by a variation of Edwards' "Scale Discrimination Method" (Edwards, 1948), is described. This scale was used in the estimation of the proportions of patients with favourable‡ and unfavourable attitudes toward computer interrogation, and in the investigation of the effects of patient variables, i.e. individual differences such as age and sex, on favourability.

The development of a second "Thurstone" scale for the measurement of patients' attitudes toward medical interviews with doctors would be one method of comparing patients' attitudes toward interviews with computers and doctors. This method would not only be costly in time and effort, but would also result in a comparison based on measurements made by different instruments. However, when the same instrument is used to measure both attitudes, then assuming that the generality of the instrument is valid, such direct comparisons become more tenable.

An instrument which has been used as a generalized attitude scale is the Semantic Differential, referred to below as the "SD", developed by Osgood and his associates (Osgood et al., 1957). The construction, evaluation, and use of an SD for the measurement of patients' attitudes toward "Medical Interviews with a Computer", "Medical Interviews with a Doctor", and "The Ideal Medical Interview" are also described in this paper.

†"Computer interrogation" is the questioning of patients by their interaction with a computer in order to elicit medical history data.
‡A "favourable" attitude is one which predisposes the subject to behave positively toward the object of the attitude.

Construction of the "Thurstone" scale

Approximately 200 attitude statements concerning the use of computers in hospitals for interrogating patients, representing as far as possible all strengths and directions of attitude, were developed. Sources were doctors, patients, and other observers, as well as the lay and scientific literature (e.g. Mayne et al., 1968; Slack et al., 1968; Kanner, 1969). These statements were then extensively edited according to Edwards' criteria for attitude statements (Edwards, 1957), after which 100 statements remained. These were then submitted to a large group of judges who were to assign to each statement a rating on a continuum from "extremely favourable" to "extremely unfavourable" toward computer interrogation. The importance of the instructions used for the judging procedure has been emphasized by Rambo (1968). Thus the instructions used were a synthesis of those used by Hovland & Sherif (1952), and those of Jones et al. (1965), with the suggestions of Rambo (1968) incorporated. The final instructions were as follows:

"The statements that you will find on the following pages are all about the use of computers in hospitals for patient interviewing. You will find your task easier if you read over a number of them, chosen haphazardly, before you begin your judgements.
To the right of each statement you will find the numbers 1 to 9. After those statements which you think are most favourable to the use of computers in hospitals, put a ring around number 1. After those statements which you think express a neutral position with regard to the use of computers in hospitals, put a ring around number 5, and after those which you think are most unfavourable, put a ring around number 9. Assign various numbers to the rest of the statements in accordance with the degree of favourability or unfavourability expressed in them, i.e. scale them using...

    1    2    3    4    5    6    7    8    9
    |                   |                   |
Most favourable      Neutral        Most unfavourable

Whether you agree or disagree with the statements should NOT enter into your judgements of them. You are only to judge the degree of favourability or unfavourability toward the use of computers in hospitals, for the purpose described above, that is expressed in a statement, and NOT the extent that you are willing to endorse the opinion expressed. For example, consider the statement: "Computers are superior to men". It is unlikely that many people would agree with this statement, but most would agree that the statement itself expresses favourability toward computers, and so, in this present task, would put a ring around one of the numbers 1 to 4 on the favourable side of "neutral". The statements are in no particular order of favourability or otherwise, they are randomly ordered. Do not worry about assigning each number to an equal number of statements, there are no right and wrong answers, but PLEASE, TRY NOT TO ALLOW YOUR DECISIONS TO BE INFLUENCED BY WHETHER OR NOT YOU PERSONALLY AGREE OR DISAGREE WITH THE SENTIMENTS EXPRESSED IN THE STATEMENTS."

The judges were 85 postgraduate students of the Jordanhill College of Education, Glasgow. The median of all judgements, or "scale value", and the inter-quartile range, or "Q value", were computed for each statement. All statements with Q values greater than 1.5 were rejected. From the remaining statements two parallel forms of a scale were developed, "Form A" and "Form B", each containing 14 statements spread evenly throughout the favourable-unfavourable continuum according to their scale values. The 28 statements selected to make up the two parallel forms are presented in Fig. 1, together with their scale values.

The 28 statements comprising the whole scale were randomly ordered and the following instructions to patients were used:

"On the following pages you will find 28 statements about the use of computers in hospitals. For each and EVERY one, please:
(a) Read the statement.
(b) Decide whether you AGREE or DISAGREE with it.
(c) Decide how strongly you feel about it, that is, decide whether you:
    1. STRONGLY AGREE (or strongly disagree) or
    2. AGREE (or disagree) or
    3. MILDLY AGREE (or mildly disagree)
(d) Show what you have decided by putting a tick in the box of your choice under the statement."

These instructions were followed by the 28 statements. Under each statement was a scale of six divisions marked "strongly agree", "agree", "mildly agree", "mildly disagree", "disagree", and "strongly disagree". An example of the layout of the statements and scales is presented in Fig. 2.
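The statement-selection step of the judging procedure (a median "scale value", an inter-quartile "Q value", and rejection when Q exceeds 1.5) can be sketched as follows. This is only an illustration: it assumes, as is common in Thurstone-type scaling but not stated in the text, that medians and quartiles are interpolated within unit-wide rating intervals, and the function names and the ten sample ratings are hypothetical.

```python
from collections import Counter


def grouped_quantile(ratings, p, width=1.0):
    """Interpolated p-quantile, treating rating r as the interval r +/- width/2."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    target = p * len(ratings)  # number of judgements lying below the quantile
    cumulative = 0
    for value in sorted(counts):
        frequency = counts[value]
        if cumulative + frequency >= target:
            lower_bound = value - width / 2
            return lower_bound + (target - cumulative) / frequency * width
        cumulative += frequency
    raise ValueError("p must lie between 0 and 1")


def scale_and_q(ratings):
    """Return (scale value, Q value) for one statement's judge ratings."""
    median = grouped_quantile(ratings, 0.50)
    q_value = grouped_quantile(ratings, 0.75) - grouped_quantile(ratings, 0.25)
    return median, q_value


# Hypothetical ratings (1 = most favourable, 9 = most unfavourable) from ten judges.
example_ratings = [7, 8, 8, 8, 9, 9, 8, 7, 8, 9]
scale_value, q_value = scale_and_q(example_ratings)
keep_statement = q_value <= 1.5  # rejection rule from the text: Q greater than 1.5
```

Interpolation is what allows non-integer scale values from integer ratings; a plain median of the raw ratings would be a different, coarser choice.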
Statements (in order of presentation to patients), with equivalent form and scale value

No.  Form  Scale value  Statement
 1.  A     7.967        Most people would not trust a computer with their children's health.
 2.  B     8.441        The computer will not be good enough to do any of a doctor's work.
 3.  B     2.786        If a computer can ask questions for a doctor, he will spend more time thinking about patients' answers.
 4.  B     8.022        People will eventually become the slaves of machines.
 5.  B     7.879        If computers are used a lot, we will soon have to pay to go into hospital.
 6.  B     8.768        Computers are a menace to society.
 7.  A     5.000        Patients should at least be able to choose whether they are interviewed by a doctor or by a computer.
 8.  A     3.679        When you are interviewed by a computer it still seems as though someone is paying attention to your answers.
 9.  A     3.022        Computers make you less nervous than doctors.
10.  B     1.820        The standard of treatment in hospitals will improve if computers are used a lot.
11.  A     8.447        It will soon be better to go to a chemist for advice than to bother with a hospital that is using a computer.
12.  B     8.818        Nothing but harm will be done if computers are used a lot in hospitals.
13.  B     2.333        More people will be able to see medical specialists if computers are used a lot in hospitals.
14.  A     7.719        Doctors will neglect their patients more if computers are helping them.
15.  B     5.065        Doctors should choose whether they want computers or not.
16.  A     1.382        Most people will be sure of first class treatment if computers are used a lot in hospitals.
17.  A     1.711        Standards of treatment are sure to go up if computers are used a lot in hospitals.
18.  B     4.674        No two doctors would agree about what is wrong with you, so computers cannot do any harm.
19.  A     6.978        Doctors will be less devoted to their patients if computers are helping them.
20.  B     7.580        Computers will make relationships between doctors and patients worse than they are now.
21.  B     2.038        Most patients will benefit from the high accuracy of computers.
22.  B     8.802        Computers should not be trusted with anyone's health.
23.  A     7.455        If computers are used a lot in hospitals, it will be harder to get medical help in your own home.
24.  B     6.929        Older people will find it more difficult to answer a computer.
25.  A     2.500        Computers in hospitals will save everyone's time.
26.  B     3.086        You feel more at ease sat in front of a computer than you do in front of a doctor.
27.  A     2.042        More people will be able to get expert treatment if computers are used a lot in hospitals.
28.  A     4.318        Doctors hardly ever tell you what is wrong with you anyway, so you might as well talk to a computer.

FIG. 1. The 28 statements of Forms A and B, and their scale values.

Please tick only ONE box for each statement:

1. MOST PEOPLE WOULD NOT TRUST A COMPUTER WITH THEIR CHILDREN'S HEALTH.
   strongly agree [ ]   agree [ ]   mildly agree [ ]   mildly disagree [ ]   disagree [ ]   strongly disagree [ ]

2. THE COMPUTER WILL NOT BE GOOD ENOUGH TO DO ANY OF A DOCTOR'S WORK.
   strongly agree [ ]   agree [ ]   mildly agree [ ]   mildly disagree [ ]   disagree [ ]   strongly disagree [ ]

FIG. 2. The layout of the attitude statements and the rating scale of the "L22" attitude scale.

Fifty-one patients who had been interrogated by computer completed the scale. One statement, number 7 in Fig. 1, was endorsed by all 51 patients and was thus rejected as recommended by Edwards (1957). The remaining 27 statements were scored 1 to 6, "6" being the most favourable response, and the correlation between each statement's scores and the scores for the whole scale, less the statement under consideration, was computed for each statement. The correlations are presented in Table 1. Statements for which this correlation was not significantly greater than zero were rejected from the scale. The statements thus rejected, marked with an asterisk in Table 1, were numbers 1, 6, 15, 18, and 28 of Fig. 1. The scale at this stage consisted of 22 statements, and will be referred to below as the "L22 scale".

TABLE 1
The correlation between each of the 27 statements' scores and the whole scale (minus the statement under consideration) scores for 51 patients

Statement   Correlation (Pearson r) between statement score
number      and total score minus the statement
 1           0.224*
 2           0.472
 3           0.458
 4           0.742
 5           0.402
 6           0.191*
 7           (rejected: endorsed by all 51 patients)
 8           0.527
 9           0.540
10           0.670
11           0.591
12           0.744
13           0.731
14           0.473
15          -0.091*
16           0.737
17           0.793
18           0.216*
19           0.585
20           0.825
21           0.698
22           0.502
23           0.465
24           0.291
25           0.568
26           0.498
27           0.742
28          -0.031*

* Correlation not significantly greater than zero; statement rejected.
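The item analysis described above, correlating each statement's scores with the total of the remaining statements, can be sketched as below. This is a minimal illustration under stated assumptions: responses are taken to be already scored 1 to 6 with 6 the most favourable, the function names are hypothetical, and the small data set is invented purely to show the call.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation; NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """responses[p][i] is the score of patient p on statement i.

    Returns, for each statement, the correlation between its scores and
    the scale total with that statement removed.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    correlations = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        rest_scores = [total - score for total, score in zip(totals, item_scores)]
        correlations.append(pearson_r(item_scores, rest_scores))
    return correlations


# Invented scores for five patients on four statements (illustration only).
example_responses = [
    [6, 5, 6, 2],
    [5, 5, 4, 3],
    [3, 4, 3, 3],
    [2, 1, 2, 4],
    [4, 3, 5, 1],
]
item_rest_correlations = corrected_item_total(example_responses)
```

Removing the statement from the total before correlating avoids inflating the coefficient with the statement's correlation with itself.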
Properties of the L22 scale

SCALABILITY

The 51 patients' responses to the 22 statements of the L22 scale were subjected to Guttman Scalogram Analysis using the computer program of Dixon (1973), which uses the Cornell technique (Guttman, 1947). This analysis is used to test whether a given set of items (attitude statements), as a group, constitute a scale in the sense that from the "rank-order" score it is possible to reproduce a subject's responses to individual items. The degree to which this is possible is expressed by the "coefficient of reproducibility". In practice, scales with coefficients greater than 0.85 have been used as efficient approximations to perfect, unidimensional, scales. A coefficient of reproducibility of 0.89 was obtained for the whole L22 scale, indicating that the 22 statements constitute a scale in the Guttman sense.

RELIABILITY

Measures of reliability can be divided into two main classes: measures of stability, in which a measure is correlated across time, and measures of equivalence, which measure the equivalence of each item as an indicator of an underlying attitude. Measures of stability require at least two administrations of the scale, which is difficult to implement with patient populations, whereas measures of equivalence require only one administration. Developing equivalent forms of a test or scale and examining the correlations between these has been a common method of evaluating the reliability of a scale and was used in this study. The correlation between the scores for the 14-item equivalent forms, Form A and Form B ($r_{AB}$), was 0.85 (Pearson r). The standard deviation of the Form A scores was 9.64 and that of the Form B scores was 11.75. The ratio of these was therefore 1.22, a value outside the range acceptable to assume that $\sigma_A = \sigma_B$ (Cronbach, 1951; p. 302). The Flanagan formula (Kelly, 1942) for equivalent form reliability was therefore used, giving an estimate of equivalent form reliability for Forms A and B of:

$$\frac{4\sigma_A \sigma_B r_{AB}}{\sigma_A^2 + \sigma_B^2 + 2\sigma_A \sigma_B r_{AB}} = 0.91.$$

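As a check on the arithmetic, the Flanagan estimate can be recomputed from the three values reported above. The sketch below is illustrative only; the function and variable names are hypothetical.

```python
def flanagan_reliability(sd_a, sd_b, r_ab):
    """Equivalent-form reliability allowing the two forms unequal variances."""
    cross_term = 2 * sd_a * sd_b * r_ab
    return 2 * cross_term / (sd_a ** 2 + sd_b ** 2 + cross_term)


# Values reported in the text for Forms A and B.
sd_form_a, sd_form_b, r_ab = 9.64, 11.75, 0.85

sd_ratio = sd_form_b / sd_form_a
reliability = flanagan_reliability(sd_form_a, sd_form_b, r_ab)
print(f"SD ratio = {sd_ratio:.2f}, reliability = {reliability:.2f}")
# prints "SD ratio = 1.22, reliability = 0.91"
```

When the two standard deviations are equal the expression reduces to 2r/(1 + r), the familiar Spearman-Brown form, which is why the ratio of the standard deviations decides which formula is appropriate.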
If the two forms are exactly equivalent, this estimate is exactly the percentage of scale variance which is accounted for by the principal common factor and other common factors, as opposed to item specific factors and chance error. The fact that the two forms can never be exactly equivalent means that this coefficient is always an under-estimate, that is, the total percentage of variance accounted for by the common factors is an upper bound for this coefficient of equivalence, which is itself dependent on the two forms chosen. A measure which is independent of the "split" of the items was first suggested by Kuder & Richardson (1937). Their approach, called "internal consistency", examines the covariance among all of the items simultaneously rather than that in a particular split.
The most frequently used generalization of their original formulae is that of Cronbach (1951), known as "coefficient α". Coefficient α is the mean of all possible split-half coefficients; it is a lower bound for the coefficient of precision (the instantaneous accuracy of the scale with the particular items); it estimates, and is a lower bound to, the proportion of scale variance attributable to common factors among the items; and it is an upper bound to the concentration in the scale of the first factor among the items. Cronbach concluded that "for reasonably long tests not divisible into a few factorially distinct subtests, α is very little greater than the exact proportion of variance due to the first factor". The higher the value of α, therefore, the more interpretable the test or scale is likely to be, because it is likely to have a higher first factor concentration, intended, for the scale under consideration, to be attitude toward computer interrogation. α is given by:

$$\alpha = \frac{n}{n-1} \cdot \frac{\sum_{i}\sum_{j} C_{ij}}{V_t}; \quad (i, j = 1, 2, \ldots, n;\ i \neq j)$$

where: $n$ is the number of statements in the scale, $C_{ij}$ is the covariance of two items $i$ and $j$, and $V_t$ is the variance of the whole scale. For the L22 scale, α = 0.904.

To summarize, the mean of the coefficients of equivalence over all splits of the L22 scale was 0.90 (α), and for one of these splits, that into the equivalent Forms A and B, the coefficient was 0.91, slightly higher than the mean as would be expected. Considerations of the coefficient α and the coefficient of reproducibility suggest that up to 90% of the variance in scale scores can be attributed to the first common factor, which, from the examination of content alone, is likely to be attitude toward computer interrogation, and that 89% might be a good approximation.
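The covariance form of coefficient α given above can be sketched directly. The code below is a minimal illustration, assuming sample covariances with an (N − 1) denominator (the choice cancels in the ratio); the function names and the small data set are hypothetical.

```python
def covariance(x, y):
    """Sample covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def cronbach_alpha(responses):
    """responses[p][i] is the score of patient p on statement i."""
    n_items = len(responses[0])
    items = [[row[i] for row in responses] for i in range(n_items)]
    totals = [sum(row) for row in responses]
    # Sum of C_ij over all ordered pairs of distinct items (i != j).
    off_diagonal = sum(
        covariance(items[i], items[j])
        for i in range(n_items)
        for j in range(n_items)
        if i != j
    )
    total_variance = covariance(totals, totals)  # V_t, variance of whole scale
    return n_items / (n_items - 1) * off_diagonal / total_variance


# Invented scores for five patients on three statements (illustration only).
example_scores = [[6, 5, 4], [5, 5, 3], [3, 2, 2], [2, 1, 1], [4, 4, 4]]
alpha = cronbach_alpha(example_scores)
```

Because the variance of the whole scale is the sum of the item variances and the off-diagonal covariances, this is algebraically the same as the more common form n/(n − 1) · (1 − ΣV_i / V_t).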
Construction of the Semantic Differential ("SD") scale

The SD measures people's reactions to various concepts in terms of ratings on bipolar scales defined at their extremes by contrasting adjectives. An example of an SD scale is:

GOOD : _________ : _____ : ________ : _______ : ________ : _____ : _________ : BAD
       extremely   quite   slightly   neutral   slightly   quite   extremely

This type of scale measures directionality and intensity, and, typically, a person is presented with some concept of interest, e.g. computer interrogation, and asked to rate it on a number of such scales. Such ratings tend to be correlated, and three orthogonal dimensions seem to account for most of the covariance in ratings. These three dimensions have been called Evaluation (E), Potency (P), and Activity (A), and have been confirmed in a very wide range of studies (e.g. Heise, 1965; Jacobovitz, 1966; Heise, 1969). The question of the generality of SD scales across concepts has been thoroughly discussed by Heise (1969, pp. 416-420), who concludes that true concept-scale interactions can and do exist in which the loadings of some scales on the EPA factors vary with the concepts being rated. If the concepts are widely different, Heise recommends the careful "tailoring" of the SD to each concept domain, possibly on the basis of ad hoc factor analyses. In the present study, however, the concepts to be rated are all types of medical interview, and are thus extremely similar and closely related.
TABLE 2
Summed ranks of 12 patients and the rank-order of the factorially "pure" scales of the Evaluation, Potency, and Activity dimensions with respect to their relevance to medical interviews

Factor       Scale                 Summed ranks      Rank order
                                   of 12 patients
Evaluation   Good-Bad                    29              1*
             Beautiful-Ugly             118             10
             Sweet-Sour                 127             11
             Clean-Dirty                106              9
             Valuable-Worthless          41              3*
             Fair-Unfair                 66              6
             Pleasant-Unpleasant         35              2*
             Bitter-Sweet               129             12
             Happy-Sad                   83              8
             Nice-Awful                  77              7
             Honest-Dishonest            65              5
             Kind-Cruel                  60              4*

Potency      Large-Small                 64              4*
             Hard-Soft                   68              6
             Strong-Weak                 37              2*
             Loud-Soft                   72              8
             Deep-Shallow                22              1*
             Heavy-Light                 65              5
             Thick-Thin                  91              9
             Rugged-Delicate             69              7
             Wide-Narrow                 52              3*

Activity     Sharp-Dull                  30              2*
             Hot-Cold                    43              4*
             Angular-Rounded             51              5
             Active-Passive              20              1*
             Fast-Slow                   36              3*

* Scale selected for the SD.

Medical Interviews with a Computer

UNPLEASANT : ___ : ___ : ___ : ___ : ___ : ___ : ___ : PLEASANT
WEAK       : ___ : ___ : ___ : ___ : ___ : ___ : ___ : STRONG
VALUABLE   : ___ : ___ : ___ : ___ : ___ : ___ : ___ : WORTHLESS
FAST       : ___ : ___ : ___ : ___ : ___ : ___ : ___ : SLOW
CRUEL      : ___ : ___ : ___ : ___ : ___ : ___ : ___ : KIND
HOT        : ___ : ___ : ___ : ___ : ___ : ___ : ___ : COLD
WIDE       : ___ : ___ : ___ : ___ : ___ : ___ : ___ : NARROW
DULL       : ___ : ___ : ___ : ___ : ___ : ___ : ___ : SHARP
SHALLOW    : ___ : ___ : ___ : ___ : ___ : ___ : ___ : DEEP
GOOD       : ___ : ___ : ___ : ___ : ___ : ___ : ___ : BAD
LARGE      : ___ : ___ : ___ : ___ : ___ : ___ : ___ : SMALL
ACTIVE     : ___ : ___ : ___ : ___ : ___ : ___ : ___ : PASSIVE

FIG. 3. The 12 scales of the SD in the order in which they were presented to patients.
Scales are selected for the SD according to two main criteria: factorial composition and scale relevance.

Factorial composition

The selection of factorially "pure" scales can be done with the help of ad hoc factor analyses, but the most common procedure is to select scales from published factor analyses, and assume that their factorial compositions will not vary significantly in the content area to which they are to be applied. This assumption is usually tenable if the scales selected have high loadings which have been confirmed in other studies using other concepts.

Scale relevance

Subjects find it easier to use scales which are meaningful to the concepts being judged and which make distinctions which are familiar (Triandis, 1959). Further, relevant scales provide more sensitive measurements, with more variance in ratings and less random error (Mitsos, 1961; Koltuv, 1962).

The scales for this study were initially selected from a published factor analysis study (Osgood et al., 1957; p. 53). Twelve scales were selected as being factorially "pure" on the Evaluation dimension, ten on the Potency dimension, and six on the Activity dimension. Twelve patients were asked to rank-order each set of scales with respect to their relevance to "medical interviews with computers and doctors". No further judges were sought as the degree of concordance among the first twelve judges was already very significant (Kendall's coefficients of concordance for the rankings of the scales of the three dimensions were: Evaluation = 0.66, χ² = 79.86, p<0.001; Potency = 0.385, χ² = 36.96, p<0.001; and Activity = 0.39, χ² = 37.7, p<0.001). The scales and their respective summed ranks and rank-orders are shown in Table 2. Those marked with an asterisk were selected to make up the SD and are shown in Fig. 3 in the order in which they were presented to patients in the attitudes questionnaire.
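Kendall's coefficient of concordance can be recovered from the summed ranks listed in Table 2. The sketch below assumes the standard formula without a correction for tied ranks, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the rank sums from their mean, m the number of judges, and n the number of scales ranked; the function name is hypothetical.

```python
def kendalls_w(rank_sums, n_judges):
    """Kendall's coefficient of concordance from per-item summed ranks (no ties)."""
    n_items = len(rank_sums)
    mean_sum = sum(rank_sums) / n_items
    squared_deviations = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * squared_deviations / (n_judges ** 2 * (n_items ** 3 - n_items))


# Summed ranks of the 12 patients, as listed in Table 2.
evaluation_sums = [29, 118, 127, 106, 41, 66, 35, 129, 83, 77, 65, 60]
potency_sums = [64, 68, 37, 72, 22, 65, 91, 69, 52]
activity_sums = [30, 43, 51, 20, 36]

w_evaluation = kendalls_w(evaluation_sums, n_judges=12)
w_potency = kendalls_w(potency_sums, n_judges=12)
w_activity = kendalls_w(activity_sums, n_judges=12)
```

With these inputs the three coefficients round to 0.66, 0.385, and 0.39, in agreement with the values quoted in the text.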
The scales were randomly ordered on each page, with the favourable pole randomly allocated to the left or right side of the scales, and the concepts to be rated printed in bold capital letters at the top of the page. Three concepts were to be rated: 1. Medical Interviews with a Doctor, 2. Medical Interviews with a Computer, and 3. The Ideal Medical Interview. The three concepts were included in the questionnaire in the above order, since order appears to be immaterial (Sommer, 1965). The instructions to patients for completing the SD ratings were based on the recommendations of Osgood et al. (1957) and those of Wells & Smith (1960), and were as follows:

"WE WOULD LIKE TO KNOW JUST HOW YOU FEEL ABOUT THREE KINDS OF MEDICAL INTERVIEWS, WE WOULD LIKE TO KNOW:
1. What you feel about interviews with doctors, what does being interviewed by a doctor mean to you?
2. What you feel about interviews with computers, what does being interviewed by a computer mean to you?
3. What do you think the IDEAL medical interview would be like?

We would like you to answer these questions by making a number of simple choices about these different types of interviews. At the top of each of the next three pages you will find the type of interview you are to think about, this will be printed in large letters. Below this you will find the 12 choices you have to make about that particular type of interview. For example, the first one is like this:

UNPLEASANT : _________ : _____ : ________ : _______ : ________ : _____ : _________ : PLEASANT
             extremely   quite   slightly   neutral   slightly   quite   extremely

On the first page we would like you to think about all the times you have been interviewed by a doctor, and try to decide whether, on the whole, they were PLEASANT or UNPLEASANT, and just HOW pleasant or unpleasant they were. Then please put a tick in the position on the line which shows what you have chosen. Next you decide whether they were WEAK or STRONG, and then, just HOW weak or strong they were, put a tick somewhere on the line, and carry on with the next one, and so on for each of the twelve choices. Here are two examples.

Example 1. If you think interviews with doctors are usually "QUITE PLEASANT" you should put a tick like this:

UNPLEASANT : _________ : _____ : ________ : _______ : ________ : __✓__ : _________ : PLEASANT
             extremely   quite   slightly   neutral   slightly   quite   extremely

Example 2. If you think interviews with doctors are usually "EXTREMELY UNPLEASANT" you should put a tick like this:

UNPLEASANT : ____✓____ : _____ : ________ : _______ : ________ : _____ : _________ : PLEASANT
             extremely   quite   slightly   neutral   slightly   quite   extremely

PLEASE COMPLETE ALL 12 CHOICES FOR EACH OF THE NEXT THREE PAGES. THANK YOU."
Properties of the SD

RELIABILITY

Measures of equivalence are extremely difficult to obtain for SD factor scores, the sum of the scores of each individual scale (-3, -2, -1, 0, 1, 2, 3; from most unfavourable to most favourable) representing one factor, because of the small number of scales normally used. Measures of stability of SD scores are easier to assess; however, the difficulties of test-retest procedures with strict control of interval using outpatients make such a test, for the ad hoc purpose of estimating the reliability of the SD used in this study, of doubtful value, particularly in view of the many published studies. Tannenbaum (Osgood et al., 1957; p. 192) reported correlations between test and retest factor scores ranging from 0.87 to 0.97, with a mean of 0.91, and Osgood et al. (1957; p. 193) reported correlations between test and retest factor scores consistently greater than correlations between test and retest "Thurstone" scale scores. Again, for the SD, these were quite high, ranging from 0.83 to 0.91.

VALIDITY

The validity of SDs has usually been investigated by correlating the results with those from a "traditional" attitude scale of the Thurstone type, a variation of criterion related validity. Osgood et al. (1957) originally proposed that the Evaluation dimension was the only dimension needed to measure attitudes. However, many empirical studies have shown that attitude is only sometimes pure Evaluation, and most often it is a function of Evaluation, Potency, and Activity (Heise, 1970).
For the first 51 patients to complete both the SD scales and the L22 scale, the multiple-correlation (R) between E, P, and A factor scores and L22 scores was 0.63† (F = 10.35; df = 3, 47; p<0.001). The matrix of correlations between E, P, and A factor scores and L22 scores is presented in Table 3. It can be seen that all of the scores were significantly intercorrelated.
TABLE 3
Intercorrelations between E, P, and A factor scores and L22 scores

         E       P       A       L22
E       1.00    0.53    0.48    0.55
P       0.53    1.00    0.58    0.49
A       0.48    0.58    1.00    0.34
L22     0.55    0.49    0.34    1.00

Correcting the multiple-correlation coefficient for attenuation due to unreliability according to the formula:

$$\rho_{T_1 T_2} = \frac{\rho_{x_1 x_2}}{\sqrt{\rho_{x_1 x_1}\,\rho_{x_2 x_2}}}$$

where: $\rho_{T_1 T_2}$ = correlation between the true scores of scales 1 and 2; $\rho_{x_1 x_2}$ = correlation between observed scores of scales 1 and 2; $\rho_{x_1 x_1}$ = reliability of scale 1; $\rho_{x_2 x_2}$ = reliability of scale 2;

and taking the reliability of the L22 scale to be 0.91, the reliability of the SD to be 0.90, and the multiple-correlation between E, P, and A factor scores and L22 scores to be 0.63, the corrected multiple-correlation is 0.70. Thus approximately 50% of the variance in L22 scores could be predicted from E, P, and A factor scores for the concept "Medical Interviews with a Computer" in the ideal situation of no measurement error, which indicates that the two scales are measuring essentially the same phenomenon, i.e. attitude toward computer interrogation.
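The correction for attenuation can be recomputed from the three figures quoted above. The sketch is illustrative only; the function and variable names are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_observed, reliability_1, reliability_2):
    """Estimated correlation between true scores, given both reliabilities."""
    return r_observed / sqrt(reliability_1 * reliability_2)


# Observed multiple-correlation and the two reliabilities taken from the text.
corrected = correct_for_attenuation(0.63, 0.91, 0.90)
shared_variance = corrected ** 2
print(f"corrected R = {corrected:.2f}, shared variance = {shared_variance:.2f}")
# prints "corrected R = 0.70, shared variance = 0.48"
```

The squared corrected coefficient, about 0.48, is the basis for the statement that approximately 50% of the variance in L22 scores could be predicted in the absence of measurement error.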
The attitude scales in use

METHODS

Seventy-five out-patients attending the Gastro-Intestinal Clinic of the Southern General Hospital, Glasgow, were interrogated by computer. The mechanism for interaction, or "interface", between the patient and the computer‡ consisted of a standard 15 in video picture monitor, on which the questions were presented at 10 or 15 characters per second, and a simple keyboard with 3 push-buttons labelled "YES", "NO", and "DON'T UNDERSTAND", for the patient's responses. The first part of the computer "interview" explained the experiment and instructed the patient, and the medical questions which followed all pertained to symptoms of dyspepsia. After a brief introduction, all patients were left alone in the room with the apparatus (Lucas, 1974).

†This is also, of course, a lower bound for the scale's reliability.
‡The computer system used was the Honeywell Mark III time-sharing service through a Moore Reed VT 107 Video Data Terminal.

On completion of their "interviews", patients were given a copy of the attitude questionnaire, and asked to read it over and ask questions if there was anything in the instructions which was not understood. No major difficulties were reported. Each patient was then given a stamped addressed envelope and asked to take the questionnaire home to complete and return anonymously, for it was thought that the benefits of complete anonymity would outweigh the possible losses due to non-return. It was stressed that the questionnaire was to be completed by the patient himself without the aid of any other person. The front page of the questionnaire took the form of an explanatory letter; part one of the questionnaire contained the SD attitude scales for ratings of "Medical Interviews with a Doctor", "Medical Interviews with a Computer", and "The Ideal Medical Interview"; part two contained the 22 attitude statements of the L22 scale; and on the final page the patient was asked to supply details of his age, sex, and occupation.†

RESULTS

Of the 75 questionnaires distributed, 67 were returned, a response rate of 89%. The distributions of non-respondents with respect to age and Registrar General's Social Class Classification (Office of Population Censuses and Surveys, 1970) were not significantly different from those of the patients who returned their questionnaires. However, the sex distributions of respondents and non-respondents were significantly different, with a higher proportion of female patients failing to respond (χ² = 4.29; df = 1; p < 0.05). Appropriate weightings for results from male and female patients would be 1.094 and 1.412 respectively. These weights will be used below where appropriate.
Favourability of patients' attitudes

Distinguishing between favourable and unfavourable attitudes is a problem because of statement bias. However, as Guttman writes: "No matter how questions are worded or "loaded", use of the intensity function will yield the same proportion of the group as favourable and unfavourable" (Guttman, 1947). This "intensity function" is obtained empirically by plotting intensity scores against content scores.

For each of the responses to the 22 items of the L22 scale, the intensity score was either 1, 2, or 3, for "mildly agree"/"mildly disagree", "agree"/"disagree", or "strongly agree"/"strongly disagree", respectively. Each patient's overall intensity score was the sum of these individual item intensity scores. During the development of the L22 scale, 85 judges were asked to decide whether each statement was "favourable" or "unfavourable" toward computer interrogation, and only those statements on which there was a high consensus among the judges were retained in the final scale. For each item of the L22 scale a content score of 1 was allocated if the patient "agreed" with a "favourable" statement or "disagreed" with an "unfavourable" statement, and a content score of 0 was allocated if the patient "disagreed" with a "favourable" statement or "agreed" with an "unfavourable" statement. Again, each patient's overall content score was the sum of his individual item content scores.

Figure 4 shows the graph of intensity scores against content scores for the 67 patients. The general "U" shape of the graph is clear in Fig. 4, and the point at which this function reaches a minimum is the cutting point between favourable and unfavourable attitudes. This point is usually determined by computing median intensity scores for equal intervals of content scores. For Fig. 4, it does not matter whether 3 or 4 content scores are included in an interval, or where the intervals have their boundaries: the interval with the lowest median intensity always contains the content score "16". The point of inflexion of this empirical intensity function was therefore taken as that point with an x co-ordinate, i.e. content score, of "16", and all points to the right of a vertical line drawn through the content score "16" in Fig. 4 were "favourable", N_fav = 51, and all those points to the left of this line in Fig. 4 were "unfavourable", N_unf = 10. Thus, with the weightings for non-response suggested above, an estimate of the proportion of patients with favourable attitudes toward computer interrogation is 82%.

†Copies of the questionnaire are available from the author.
FIG. 4. Graph of intensity scores (vertical axis) against content scores (horizontal axis). ▽, Male patient; ○, female patient.
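The scoring rules and the search for the cutting point described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the signed response coding (+1 to +3 for agreement, −1 to −3 for disagreement), the function names, and the fixed interval boundaries at multiples of the interval width are all hypothetical, and the sample points are invented.

```python
from statistics import median

# Assumed coding (not from the paper): +1, +2, +3 = mildly agree, agree,
# strongly agree; -1, -2, -3 = the corresponding degrees of disagreement.

def item_scores(response, favourable):
    """Return (content, intensity) for one attitude statement.

    `favourable` is True when the judges classed the statement as
    favourable toward computer interrogation.
    """
    intensity = abs(response)            # 1, 2 or 3
    agrees = response > 0
    # Content is 1 when the patient sides with the computer:
    # agrees with a favourable item or disagrees with an unfavourable one.
    content = 1 if agrees == favourable else 0
    return content, intensity

def patient_scores(responses, keys):
    """Sum item scores into an overall (content, intensity) pair."""
    pairs = [item_scores(r, k) for r, k in zip(responses, keys)]
    return sum(c for c, _ in pairs), sum(i for _, i in pairs)

def cutting_interval(points, width):
    """Find the content-score interval with the lowest median intensity.

    `points` is a list of (content, intensity) pairs, one per patient.
    Intervals start at multiples of `width` (an assumption; the text notes
    that the boundaries did not matter for the study data).
    """
    groups = {}
    for content, intensity in points:
        start = (content // width) * width
        groups.setdefault(start, []).append(intensity)
    best = min(groups, key=lambda s: median(groups[s]))
    return best, best + width - 1

# Invented example: a four-item scale and seven made-up patients.
example = patient_scores([3, -1, 2, -3], [True, False, True, True])
made_up = [(2, 60), (3, 58), (14, 45), (16, 40), (17, 42), (21, 62), (22, 66)]
low, high = cutting_interval(made_up, width=4)
```

With real data, `patient_scores` would be applied to each patient's 22 responses and `cutting_interval` to the resulting 67 points; repeating the search with different widths is one way to check that the minimum is stable.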
Patient variables and favourability

The mean of the L22 scores, the overall attitude scores, of male patients was significantly greater than the mean of the L22 scores of female patients (means: 98.25 and 88.895; t = 8.37; df = 65; p < 0.001). Thus female patients appear to have less favourable attitudes toward computer interrogation than male patients. There was a significant negative correlation between age and L22 scores (Pearson r = −0.24; n = 67; p < 0.05). Thus age appears to be inversely related to favourability of attitude toward computer interrogation.

Two specific hypotheses concerning patient variables were also tested: (1) that non-manual workers, Registrar General's Social Classes I, II, and IIIN (Office of Population Censuses and Surveys, 1970), would have less favourable attitudes toward computer interrogation than manual workers, Social Classes IIIM, IV, and V; and (2) that persons under 30 years of age, those who have grown up with computers, would have more favourable attitudes toward computer interrogation than older persons. The mean of the L22 scores of non-manual workers was significantly lower than the mean of the L22 scores of manual workers (means: 93.9 and 95.9; t = 1.70; df = 65; p < 0.05 (one-tailed)). The mean of the L22 scores of patients under 30 years of age was significantly higher than the mean of the L22 scores of patients over 30 years of age (means: 101.4 and 94.3; t = 5.47; df = 65; p < 0.001 (one-tailed)). Thus both of the above hypotheses were supported by the results from the L22 scale.
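The 65 degrees of freedom reported above (67 − 2) are consistent with a pooled-variance two-sample t-test, although the text does not name the exact procedure. The following is a minimal sketch of that statistic and of the Pearson correlation, using only the Python standard library; the function names and the tiny data sets are hypothetical and are not the study data.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented scores for two small groups, for illustration only.
t_value, dof = pooled_t([1, 2, 3], [2, 3, 4])
r_value = pearson_r([1, 2, 3], [6, 4, 2])
```

A one-tailed p value would then be read from the t distribution with `dof` degrees of freedom; that lookup is left out here to keep the sketch free of third-party dependencies.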
Semantic Differential attitude scale ratings

The solution of the regression equation of L22 scores on E, P and A factor scores gives the appropriate weights for these scores in the prediction of attitude scores. However, the regression equation:

\[ L22_i = \mu + \beta_1 E_i + \beta_2 P_i + \beta_3 A_i + e_i \]

when solved for male and female patients together, gave systematic variations in the error term with respect to sex, but not to age or social class. It is therefore likely that the attitude dimensions of male and female patients are different, and so the above regression equation was solved independently for male and female patients to give the following predictive equations:

for male patients:
\[ L22_i = 79.94 + 2.45 E_i + 0.58 P_i + 0.19 A_i + e_i, \]
for female patients:
\[ L22_i = 73.63 + 2.27 E_i + 2.92 P_i - 1.03 A_i + e_i. \]

The multiple-correlation coefficient (R) for male patients was 0.64, and for female patients was 0.61. The weighted SD scores predicted from these equations for the concepts "Medical Interviews with a Computer", "Medical Interviews with a Doctor", and "The Ideal Medical Interview" will be referred to below as "SD Computer", "SD Doctor", and "SD Ideal" scores respectively.

For 22 out of the 45 male patients, and 11 out of the 17 female patients, "Medical Interviews with a Computer" was rated nearer "The Ideal Medical Interview" than "Medical Interviews with a Doctor". Corrected for non-response bias, 49.9% of all patients rated the Computer interview nearer the Ideal. The numbers of patients for whom SD Computer scores were greater than SD Doctor scores were slightly different from those above, 23 male patients and 9 female patients. Again, correcting for non-response bias, 47.5% of all patients had SD Computer scores higher than SD Doctor scores. The difference between this proportion and the proportion rating the Computer nearer the Ideal is quite small, and the "true" proportion of patients having more favourable attitudes toward "Medical Interviews with a Computer" than toward "Medical Interviews with a Doctor" is probably between these two proportions, i.e. approximately 49%.
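As an illustration of how the predictive equations and the non-response weights might be applied, here is a minimal Python sketch. The coefficients and the weights 1.094 and 1.412 are those printed in the text; the function names are hypothetical, and the group sizes of 47 male and 20 female respondents are an assumption taken from the sample sizes given in the Fig. 5 caption. Under that assumption the weighted calculation reproduces the 47.5% figure quoted for patients whose SD Computer score exceeded their SD Doctor score.

```python
# Regression coefficients as printed: (intercept, E, P, A).
COEFFS = {
    "male": (79.94, 2.45, 0.58, 0.19),
    "female": (73.63, 2.27, 2.92, -1.03),
}

# Non-response weights suggested in the Results section.
WEIGHTS = {"male": 1.094, "female": 1.412}

def sd_score(sex, e, p, a):
    """Predicted attitude score from E, P and A factor scores (error term omitted)."""
    intercept, b_e, b_p, b_a = COEFFS[sex]
    return intercept + b_e * e + b_p * p + b_a * a

def weighted_percentage(counts, totals, weights=WEIGHTS):
    """Percentage of patients in a category after weighting each sex."""
    numerator = sum(weights[sex] * counts[sex] for sex in counts)
    denominator = sum(weights[sex] * totals[sex] for sex in totals)
    return 100 * numerator / denominator

# Assumed respondent totals (47 male, 20 female), see the lead-in above.
totals = {"male": 47, "female": 20}

# 23 male and 9 female patients had SD Computer > SD Doctor.
higher_computer = weighted_percentage({"male": 23, "female": 9}, totals)
print(round(higher_computer, 1))  # prints 47.5
```

The same function can be applied to any other count by sex; the result depends on the respondent totals assumed, so the totals should be checked against the original data before the sketch is reused.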
Patient variables and Semantic Differential ratings

"Deviation scores", defined as:

\[ |\text{SD Ideal} - \text{SD Doctor}| - |\text{SD Ideal} - \text{SD Computer}| \]

(negative when SD Doctor nearer SD Ideal, and positive when SD Computer nearer SD Ideal), were computed. The correlation between deviation scores and age was not significantly different from zero for either male or female patients. However, deviation scores for males under 30 years of age were significantly higher than for males over 30 years of age (means: 6.48 and −0.55; t = 5.45; df = 45; p < 0.001 (one-tailed)) and the corresponding difference for female patients was also significant (means: 0.6 and −5.4; t = 1.94; df = 18; p < 0.05 (one-tailed)). Thus patients of both sexes under 30 years of age rated "Medical Interviews with a Computer" more favourable compared to "Medical Interviews with a Doctor" than patients over 30 years of age.

Deviation scores for male non-manual workers were significantly lower than for male manual workers (means: −0.9 and 1.39; t = 2.13; df = 45; p < 0.05 (one-tailed)). The corresponding difference for female patients was also significant (means: −17.2 and 2.35; t = 8.88; df = 18; p < 0.001 (one-tailed)). Thus manual workers of both sexes rated "Medical Interviews with a Computer" more favourable compared to "Medical Interviews with a Doctor" than non-manual workers.
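The deviation score defined above is simple enough to state as a short function. This is a sketch only; the function name and the example values are hypothetical and are not taken from the study.

```python
def deviation_score(sd_ideal, sd_doctor, sd_computer):
    """Distance of Doctor from Ideal minus distance of Computer from Ideal.

    Positive when the Computer score lies nearer the Ideal, negative when
    the Doctor score lies nearer the Ideal.
    """
    return abs(sd_ideal - sd_doctor) - abs(sd_ideal - sd_computer)

# Invented scores: here the computer interview is rated closer to the ideal.
computer_nearer = deviation_score(sd_ideal=100, sd_doctor=90, sd_computer=98)

# Invented scores: here the doctor interview is rated closer to the ideal.
doctor_nearer = deviation_score(sd_ideal=100, sd_doctor=95, sd_computer=90)
```

Because both distances are absolute values, the score is unaffected by whether a concept is rated above or below the Ideal, only by how far away it is.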
The results from the Semantic Differential attitude scale are thus in very close agreement with the results from the L22 scale, supporting the validity of both measures.

The "Ideal" Medical Interview

"SD Profiles" are a convenient method of displaying mean SD ratings on the individual scales of the SD. Figure 5 shows the SD profile of "The Ideal Medical Interview" for 47 male patients and 20 female patients. It would appear that the "Ideal" is a quite different concept for male and female patients, for female patients rated it significantly more "Pleasant", whereas male patients rated it as significantly more "Wide", "Deep", "Large", and "Active".
FIG. 5. SD profiles of "The Ideal Medical Interview" for 47 male patients (▽) and 20 female patients (○). (Significance levels of differences indicated where appropriate.) Scales shown include UNPLEASANT/PLEASANT, WORTHLESS/VALUABLE, SLOW/FAST, CRUEL/KIND, COLD/HOT, NARROW/WIDE, DULL/SHARP, SHALLOW/DEEP, BAD/GOOD, SMALL/LARGE and PASSIVE/ACTIVE.
Discussion

In the development and evaluation of many new man-machine interaction systems, study of the user's attitude is likely to be important. Where quantitative measurement is required, psychological attitude scaling techniques provide suitable measuring instruments, yet they have been little used. In the present paper, two techniques have been described which have been shown to give reliable and valid measurement in the above field of application. The "Thurstone" type of scale requires some effort to construct, but apart from its direct use in the measurement of attitudes and the differentiation of favourable and unfavourable attitudes, it can also be used to locate the dimension of attitude in SD ratings, i.e. in the determination of weights for E, P and A factor scores, and, because of its typically high first factor concentration, can be used as a standard against which the validity of SD ratings can be assessed. In contrast, the SD is extremely quick and easy to construct; however, generality must be assumed when comparisons between attitudes toward more than one object are being made. The validity of the SD would appear to have been adequately demonstrated in this study and possibly would not need confirmation in future studies of attitudes related to computer interrogation. Results from the subsequent use of the scales were extremely similar for the two types, lending further support to the validity of the measures.

Use of the attitude scales showed a high level of acceptability of computer interrogation by patients, with 82% having favourable attitudes. Further, approximately half of the patients in this study had more favourable attitudes toward medical interviews with a computer than toward medical interviews with a doctor, implying that this sample of patients' overall level of satisfaction with its medical care would have been the same whether it had been received in a clinic in which computer interrogation was routinely used or in a "traditional" clinic, assuming all other factors were equal in both clinics. The "other factors" include the doctor-patient interaction, which should take a modified form if the routine questioning were previously done by a computer. The doctor in such a situation might have a considerable amount of information available to him before the interview, including a record of the symptomatology and a list of possible diagnoses with associated probabilities, so that the most likely effect on the doctor-patient interaction would be to make it more specific to the individual patient from the very beginning. Further research into attitudes of both patients and medical staff would be necessary in a situation where computer interrogation was part of the initial routine.
Accuracy, the extent of agreement between the patient's answers concerning individual symptoms and the "true" state of nature (Card et al., 1974); monetary costs, the calculable computing costs per interview as compared with the cost of paying a doctor; and acceptability are the three general criteria by which computer interrogation is best evaluated. Since the first attempt to assess patient acceptability by Slack & Van Cura (1968), in which it was found that over 90% of patients thought computer interrogation was "Interesting", "Likeable", and "Not Difficult", and over 80% thought it was "Enjoyable", the results of many other studies in several areas of medicine have supported this finding of a high level of patient favourability (Mayne et al., 1968; Evans et al., 1971; Grossman et al., 1971; Doddington & King, 1972; Marg et al., 1972; Yarnall et al., 1972). This high level of favourability has now been quantified in this study and, through the objective attitude scaling techniques used and the complete anonymity given to patients, an apparently reliable and valid assessment in support of the results of previous studies has been obtained.

This work was undertaken as part of a project funded by the Scottish Home and Health Department. I would like to thank Professor W. I. Card for directing this research; Mr S. Hobbs of Jordanhill College of Education, Glasgow, for making students available for the judging task; Dr G. P. Crean and Dr R. P. Knill-Jones for making patients available; and The National Physical Laboratory for the loan of equipment.
References

APOSTLE, D. A. (1967). Factors that influence the public view of medical care. Journal of the American Medical Association, 202, 592.
CAHAL, M. F. (1962). What the public thinks of the family doctor. General Practitioner, 25, 146.
CARD, W. I., NICHOLSON, M., CREAN, G. P., WATKINSON, G., EVANS, C. R., WILSON, J. & RUSSELL, D. (1974). A comparison of doctor and computer interrogation of patients. International Journal of Bio-medical Computing, 5, 175.
CRONBACH, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297.
DEISHER, R. W., ENGEL, W. L., STIELHOLZ, R. & STANDFAST, S. J. (1965). Mothers' opinions of their pediatric care. Pediatrics, 82.
DIXON, W. J. (1973). BMD-Biomedical Computer Programs. London: University of California Press Ltd.
DODDINGTON, R. D. & KING, T. L. (1972). Automated history taking in child psychiatry. American Journal of Psychiatry, 129, 52.
EDWARDS, A. L. & KILPATRICK, F. P. (1948). A technique for the construction of attitude scales. Journal of Applied Psychology, 32, 374.
EDWARDS, A. L. (1957). Techniques of Attitude Scale Construction. New York: Appleton-Century-Crofts, Inc.
EVANS, C. R., WILSON, J., CARD, W. I., CREAN, G. P., WILSON JAMES, B., NICHOLSON, M. & WATKINSON, G. (1971). A study of on-line interrogation of hospital patients by a time-sharing terminal with computer/consultant comparison analysis. National Physical Laboratory Computer Science Divisional Report, 52.
FRANCIS, V., KORSCH, B. M. & MORRIS, M. J. (1969). Gaps in doctor-patient communication. Patients' response to medical advice. New England Journal of Medicine, 280, 535.
FRANKLIN, B. J. & McLEMORE, S. D. (1967). A scale for measuring attitudes toward student health services. Journal of Psychology, 66, 143.
GROSSMAN, J. H., McGUIRE, M. T., BARNETT, G. O. & SWEDLOW, D. B. (1971). Evaluation of computer acquired patient histories. Journal of the American Medical Association, 215, 1286.
GUTTMAN, L. (1947). The Cornell technique for scale and intensity analysis. Educational and Psychological Measurement, 7, 247.
HEISE, D. R. (1965). Semantic differential profiles for 1000 most frequent English words. Psychological Monographs, 79.
HEISE, D. R. (1969). Some methodological issues in semantic differential research. Psychological Bulletin, 72, 406.
HEISE, D. R. (1970). The semantic differential and attitude research. In G. SUMMERS, Ed., Attitude Measurement. Chicago: Rand McNally.
HOVLAND, C. I. & SHERIF, M. (1952). Judgemental phenomena and scales of attitude measurement; item displacement in Thurstone scales. Journal of Abnormal and Social Psychology, 47, 822.
HULKA, B. S., CASSEL, J. C., ZYZANSKI, S. J. & THOMPSON, S. J. (1970). Scale for the measurement of attitudes toward physicians and primary medical care. Medical Care, 8, 429.
JACO, E. G. (1963). Medical care: its social and organizational aspects, twentieth-century attitudes toward health and their effect on medicine. New England Journal of Medicine, 269, 18.
JACOBOVITZ, L. A. (1966). Comparative psycholinguistics in the study of cultures. International Journal of Psychology, 1, 15.
JONES, F. N., KORR, A. S. & HUMPHREY, G. (1965). A direct scale of attitude toward the church. Perceptual and Motor Skills, 2, 319.
KANNER, I. F. (1969). Programmed medical history-taking, with or without computer. Journal of the American Medical Association, 207, 317.
KELLY, T. L. (1942). The reliability coefficient. Psychometrika, 7, 75.
KOLTOV, B. B. (1962). Some characteristics of intrajudge trait intercorrelations. Psychological Monographs, 75, 33.
KOOS, E. L. (1965). Metropolis: what city people think of their medical services. American Journal of Public Health, 45, 1551.
KORSCH, B. M., GOZZI, E. K. & FRANCIS, V. (1968). Gaps in doctor-patient communication. I. Doctor-patient interaction and patient satisfaction. Pediatrics, 42, 855.
KUDER, G. F. & RICHARDSON, M. W. (1937). Theory of the estimation of reliability. Psychometrika, 2, 135.
LUCAS, R. W. (1974). The development of a computer based system of patient interrogation. University of Glasgow, Ph.D. Thesis.
MARG, E., CROSSMAN, E. R. F. W., GOODEVE, P. J. & WAKAMATSU, H. (1972). An automated case history taker for eye examination. Journal of Optometry and Archives of the American Academy of Optometry, p. 105.
MAYNE, J. G., WEKSEL, W. & SHOLTZ, P. N. (1968). Toward automating the medical history. Mayo Clinic Proceedings, 43, 1.
MITSOS, S. B. (1961). Personal constructs and the Semantic Differential. Journal of Abnormal and Social Psychology, 62, 433.
Office of Population Censuses and Surveys (1970). Classification of Occupations. London: H.M.S.O.
OSGOOD, C. E., SUCI, G. J. & TANNENBAUM, P. H. (1957). The Measurement of Meaning. Illinois: University of Illinois Press.
RAMBO, W. W. (1968). Equal appearing interval scales, own-attitude, and experimental instructions. Perceptual and Motor Skills, 26, 839.
READER, G. G., PRATT, L. & MUDD, M. C. (1957). What patients expect from their doctors. Modern Hospital, 89, 88.
SLACK, W. V. & VAN CURA, L. J. (1968). Patient reaction to computer-based medical interviewing. Computers and Biomedical Research, 1, 527.
SOMMER, R. (1965). Anchor effects and the semantic differential. American Journal of Psychology, 78, 317.
THURSTONE, L. L. & CHAVE, E. J. (1929). The Measurement of Attitude. Chicago: University of Chicago Press.
TRIANDIS, H. C. (1959). Differential perception of certain jobs and people by managers, clerks, and workers in industry. Journal of Applied Psychology, 43, 221.
WELLS, W. D. & SMITH, G. (1960). Four semantic rating scales compared. Journal of Applied Psychology, 44, 393.
YARNALL, S. R., SAMUELSON, P. & WAKEFIELD, J. S. (1972). Clinical evaluation of an automated screening history. Northwest Medicine, p. 186.