Form Effect in the Measurement of Feeling States

SOCIAL SCIENCE RESEARCH 11, 301-317 (1982)

RAE R. NEWTON, Indiana University
DAVID PRENSKY, University of Chicago
AND KARL SCHUESSLER, Indiana University

This study investigates the relation between the responses given to survey items and the manner in which those items are given. Results are based on responses of a random sample of 1522 American adults to 202 items expressing social life feelings (SLFs). SLF items were selected from a domain of over 1000 such items appearing in over 100 scales used in American sociology during the last 50 years. Three different methods of administration were used: self-administration, interview, and card-sort. Each method was applied to each item about 500 times, and to each respondent approximately 67 times. The analysis consisted of comparing mean scores and missing response rates for items across methods, investigating the characteristics of items subject to form effects, investigating the possibility that form effects were spurious and might disappear after controlling on background of respondent, and investigating the presence of form effects in summary scale scores based on several or more items. Results indicate that for most items, patterns of responding and techniques of testing were statistically independent.

This paper concerns the relation between the way in which dichotomous items are given in a sample survey and the answers respondents give. The results are based on the responses of a random sample of 1522 American adults to 202 agree-disagree social life feeling items,¹ 10 true/false acquiescing items, 20 true/false social desirability items, and five background items: age, education, race, family income, marital status. Results are pertinent to the possibility that differences between sociological findings represent not differences between sampled populations in feelings and/or attitudes, but rather differences in measuring methods.

Support for this research was provided in part by NIMH Training Grant in Measurement PHS T32 MH 15789-02 and U.S. Public Health Service Grant MH 22294.

¹ A social life feeling (SLF) is a sentiment about the social world (a person born at the bottom has no chance of getting ahead) or an affect-state that comes from daily living in that world (I feel good about my job). A social life feeling item (SLF item) expresses such a feeling.

0049-089X/82/040301-17$02.00/0 Copyright © 1982 by Academic Press, Inc. All rights of reproduction in any form reserved.

The 202 social life feeling (SLF) items were administered in three different ways: by interview (INT), by self-administration (SELF), and by a card-sorting process (CARD). In the survey sample, each of the 202 items was assigned to each method an approximately equal number of times (1522 ÷ 3 ≈ 507), and each method was assigned to each respondent an approximately equal number of times (202 ÷ 3 ≈ 67). Thus, on any given item, the 1522 sample cases were split evenly between INT, SELF, and CARD; and for any given case, the 202 items were similarly evenly divided between these three categories. The 10 acquiescing items and the 20 social desirability items were administered by SELF or CARD but never by INT; also, each respondent answered all 30 of these items in only one way, by CARD or SELF.

LITERATURE REVIEW

Numerous studies of how method may affect response pattern appear in the methodological literature of disciplines which utilize survey research. In their review of response effects in sample surveys, Sudman and Bradburn (1974) cite 935 references pertaining to the topic of error in surveys. These studies demonstrate that response differences are associated with many variables, including the type of information being obtained (abilities, attitudes, behaviors), the setting in which one obtains the information (in the home, on the job), the characteristics of the interviewer and/or the interviewee (age, race, education, sex, occupation, etc.), characteristics of the questions (length, number of letters, position of question in survey), method of recall, salience of topic, time period being assessed, subject of respondent's report (self, other, environment), method of administration (face-to-face, telephone, self-administered, diary, group discussion), and finally, the possibility for a socially desirable answer. In their analysis, Sudman and Bradburn suggest that all such studies may be usefully organized into three categories: (1) task variables, (2) variables associated with the interviewer role, and (3) variables associated with the respondent role. Our focus on method of administration clearly falls into the task-variable category. Sudman and Bradburn list method of administration among task variables to which "surprisingly little attention has been given" and state that "the nature of the task and the conditions under which it is performed are among the variables which have the strongest effects on response" (p. 28).

Most studies on methods of administration have been designed to assess their effects conditional on one or more factors such as question content or respondent background (age, sex, etc.). In a study comparing self-administered, face-to-face, and telephone interviews, Hochstim (1967) found that questions pertaining to discussions between husband and wife about women's most intimate problems received the highest proportion of responses confirming discussion of these topics in the least public reporting method. Comparable studies show a similar distribution of responses along a public-private dimension when the questions relate to sensitive information about self. This relation appears to hold for reported deviant acts by college students (Clark and Tifft, 1966); for nonnormative sexual codes among unmarried mothers (Knudsen, Pope, and Irish, 1967); and for socially sensitive questions regarding abortion and birth control in a sample of residents of a Boston suburb (Wiseman, 1972). Sudman and Bradburn (1974) suggest that when the questioning process engages the respondent's desire to present him/herself in a favorable light, distortion in response is likely to occur. They note that methods of administration vary along a "privacy" dimension, with self-administration at one pole and face-to-face administration at the other. Their "privacy hypothesis" suggests that less private methods of administration are more likely to create problems of self-presentation and thus greater response distortion. Referring specifically to attitudinal items, they hypothesize that less private methods of administration may induce respondents to converge on a socially desirable response. Evidence is cited for both behavioral items and attitudinal items which supports this hypothesis. Recently, Bradburn and Sudman (1979) have again addressed questions regarding self-presentational strategies and methods of administration. Their results show no clear relation between method of administration and social desirability of reported behavior.
They offer the hypothesis that "overreporting of socially desirable acts might be highest for the more personal methods [face-to-face, telephone] whereas underreporting of socially undesirable acts might be highest for the more anonymous methods [self-administered, random response]" (p. 9). Their analysis does not support this hypothesis; it does indicate that when question threat is high, reports of socially undesirable behavior are consistently less frequent than available records indicate, regardless of form.

The possible dependence of form effects on respondent characteristics has also been studied. Wiseman (1972) found form effects significantly related to the religious preference of respondents. For questions concerning belief in restrictive premarital sexual norms, Knudsen, Pope, and Irish (1967) found greater differences between questionnaire and interview forms among lower SES respondents than among middle SES respondents. They attribute this apparent interaction between form and social class to greater "deference" to college-educated middle-class interviewers among women interviewed from low-blue-collar and farm-related families. In validating interview reports by hospital records, Cannell and Fowler (1972) found a statistical interaction between characteristic of respondent and accuracy of recall in the interview, but not on the self-administered form.

A number of studies (for example, Hochstim, 1967; Schmiedeskamp, 1962) have focused on the relation between method of administration and tendency of respondent to give no answer (NA) or to respond "don't know" (DK). A consistent finding is a smaller number of DK or NA values in the personal interview. The ability of the interviewer to develop rapport and to maintain task involvement in the respondent has been given as an explanation (Rogers, 1976). This finding touches on the question of whether one method of giving items is more accurate than another because it has fewer DKs and NAs. Our findings, mentioned here in passing, indicate that this question may have little practical importance for agree/disagree SLF items. In our sample of 1522 cases, method differences in the missing response rate were much smaller than item differences: item missing response rates ranged from 1 per 100 to 10 per 100, around a mean of approximately 4 per 100, whereas differences between methods within items were seldom larger than 5 per 1000 responses. The implication of this contrast is that efforts to reduce the missing response rate should concentrate on the item as such rather than on the method of giving it. Schuman and Presser (1979) cite evidence that the level of "don't know" or "no opinion" responses is affected by both method of administration (telephone vs face-to-face) and form of question (filtered vs standard). A telephone sample yielded more "don't know" responses on filtered questions and fewer on standard questions.
They suggest that "telephone respondents tended to seize on DKs when they were offered and to avoid giving them spontaneously when they were not offered, in both cases to shorten the interview."

The present study addresses the above issues in a manner distinctive in two main respects: (1) The number and variety of SLF items permitted us to explore more generally the possible interaction between method of administration and type of item; that is to say, whether some categories of items are more subject to form effects than others. (2) In analogous manner, the broad sampling of American adults permitted us to consider more extensively whether some subpopulations are more subject to form effects than others; for example, whether older respondents are more subject to form effects than younger.

METHOD AND ANALYSIS

The general method of this study consisted in giving selected social life feeling items to a broad sampling of American adults in three specific ways.² The 202 questionnaire items were selected from a list of approximately 1000 appearing in over 100 social life feeling tests in use in American sociology during the last 50 years. The selection of INT, CARD, and SELF was based not on theoretical grounds but rather on practical ones. These three techniques were practicable in terms of survey time and energy, and, because of their differing demands, it was supposed that they would add variety to the interview and help to sustain respondent interest.³ That they represent differing degrees of interviewer involvement in the interviewee's response was not a factor in their initial selection, although use was made of this relation in interpreting the results of this study. The interviewer was most involved in INT, and least in SELF.⁴ The idea of interviewer involvement as a factor in interviewee response is a commonplace in writing on interview methodology; see, for example, Bradburn and Sudman (1979) or Phillips (1971).

² Interviewer instructions for each of the three methods of administration were as follows. INTERVIEW: "Now I am going to read you some statements that have been made about this country and its government. Please tell me whether you mostly agree or mostly disagree with each statement. Many of these issues are complicated but we just want your first general impression on each statement." SELF: Interviewer hands respondent questionnaire and pen, and then says: "Here are some statements that have been made about this country and its government. Please indicate whether you mostly agree or mostly disagree with each statement on the questionnaire by circling number '1' for 'agree' and number '2' for 'disagree'." When respondent is finished, interviewer takes back questionnaire. CARD: Interviewer shuffles deck of cards and hands it to respondent along with sort board, and then says: "The statements on these cards have been made about this country and its government. Please sort the cards on this board according to whether you mostly agree or mostly disagree with each statement." When respondent is finished, interviewer asks respondent to read the number of cards in each box. Unsorted cards are recorded "Don't Know."

³ One referee suggests that using different methods within the same interview may create difficulties and cause confusion. We have no way of checking this with our data, because we have no control group made up of respondents answering all items in the same way.

⁴ It should be noted that the interviewer was physically present during all three methods of administration. Thus, our SELF form does not represent the same degree of interviewer noninvolvement or apparent anonymity as other methods which may also be called self-administration (for example, group or mail self-administration).

The sampling plan was consistent with the object of the larger research project, namely to establish the dimensionality of a large and representative collection of social life feeling items and to assess the feasibility of representing these dimensions by 1-factor scales (Schuessler and Freshnock, 1978). The survey population was defined as all persons 18 years of age or older resident in households in the continental United States at the time of the survey (September, 1974); the sample was defined as a random sample of 1500 persons from that population. Both sampling and interviewing were done by Response Analysis, Inc., Princeton, New Jersey. Households were drawn at each of 200 sample points (locations) in such numbers as to yield the sample quota of 1500. It was supposed at the start that 10 assignments (drawings) would yield seven completed cases, and that 2150 assignments would yield slightly more than 1500 cases. But owing to a lower than expected completion rate, it became necessary to enlarge the number of assignments from 2150 to 2657. For all 2657 drawings, no household was present in 152 cases. Of the 2505 cases with a household present, 1522 yielded an interview, for a completion rate of 0.608. The relatively low completion rate was ascribed by Response Analysis, Inc. to both the length of the questionnaire and its content, which was apparently uninteresting to many potential respondents.

The SLF items were arranged on the survey questionnaire in seven sections of around 30 items each. Sections were randomly assigned to items, and items were randomly ordered within sections. Within questionnaires, INT, SELF, and CARD were assigned to sections as required to meet the criterion that items be evenly divided between methods for any given case; between questionnaires, methods were rotated within sections as required to meet the criterion that cases be evenly divided between methods on any given item. This procedure of rotating methods within and across questionnaires eliminates the problem of confounding due to differential refusal rates (Groves and Kahn, 1976).

SLF items were classified as negative or positive according to whether they reflected a negative or positive sentiment. Agree was scored 1 if the item was negative, 0 if the item was positive; disagree was scored 1 if the item was positive, 0 if the item was negative.
An item's mean score is simply the proportion of responses (agree or disagree as the case may be) reflecting the negative feeling. The missing response rate of an item is the proportion of responses that were neither disagree nor agree; it is a crude rate in that it makes no distinction among "don't know, no answer, pass, undecided, don't understand," and so forth. Items for measuring acquiescing were keyed true regardless of content; items for measuring social desirability were keyed true if they had a relatively high social desirability value, and keyed false if they had a relatively low social desirability value.

Data analysis followed the main research questions and consisted of these operations: (1) comparing response patterns for items across methods, (2) investigating the characteristics of items subject to form (method of administration) effects, (3) investigating the possibility that form effects were spurious and might disappear after controlling on background of respondents, and (4) investigating the presence of form effects in summary measures (scale scores) based on several or more items.
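The scoring rules above can be stated compactly in code. The sketch below is illustrative rather than the study's processing script: the item, the responses, and the function names are invented here, and the mean score is computed over non-missing answers, which is our reading of the definition above.

```python
# Scoring one agree/disagree SLF item under the rule described above:
# agree scores 1 on a negative item and 0 on a positive item;
# disagree scores 1 on a positive item and 0 on a negative item;
# any other reply (DK, NA, pass, undecided, ...) is a missing response.

def score(response, polarity):
    """Return 1, 0, or None (missing); polarity is 'negative' or 'positive'."""
    if response == "agree":
        return 1 if polarity == "negative" else 0
    if response == "disagree":
        return 1 if polarity == "positive" else 0
    return None  # missing response: DK, NA, etc.

def item_stats(responses, polarity):
    """Mean score (proportion of non-missing answers reflecting the
    negative feeling) and crude missing response rate for one item."""
    scores = [score(r, polarity) for r in responses]
    answered = [s for s in scores if s is not None]
    mean_score = sum(answered) / len(answered)
    missing_rate = (len(scores) - len(answered)) / len(scores)
    return mean_score, missing_rate

# Hypothetical negative item ("I am lonely") given to 10 respondents:
responses = ["agree", "disagree", "agree", "dk", "disagree",
             "agree", "na", "disagree", "agree", "agree"]
mean_score, missing_rate = item_stats(responses, "negative")
assert mean_score == 5 / 8   # 5 of 8 answers endorse the negative feeling
assert missing_rate == 0.2   # 2 of 10 replies were neither agree nor disagree
```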

RESULTS

Comparing Methods

In comparing methods, these quantities were first calculated: (1) the mean score for each social life feeling item for each method (INT, CARD, and SELF), (2) the mean score for each item for CARD and SELF combined, and (3) the missing response rate for each of the 202 social life feeling items for each method. Next, these differences were tested for significance: (1) between INT/CARD/SELF mean scores, (2) between INT and CARD/SELF combined mean scores, and (3) between INT/CARD/SELF missing response rates. Lastly, in comparing methods, the consistency of differences across INT/CARD/SELF was assessed.

TABLE 1
Distribution of Items by Level of Significance for Item Mean Comparisons (a) and Missing Response Rate Comparisons (b)

(a) Number of items with significant mean differences for interview/self/card comparisons and verbal/nonverbal comparisons

                     Interview/self/card     Verbal/nonverbal
                        %         N             %         N
p < .01                10.9       22            8.9       18
p < .05                11.9       24            8.9       18
Not significant        82.2      156           82.2      166
Total                 100.0      202          100.0      202

(b) Number of items with significant response rate differences for interview/self/card comparisons

                     Interview/self/card
                        %         N
p < .01                27.2       55
p < .05                14.4       29
Not significant        58.4      118
Total                 100.0      202

Table 1a, first column, shows that differences between INT/CARD/SELF mean scores were significant on 46 (22 + 24) of the 202 SLF items at the .05 level; the second column shows that differences between INT and CARD/SELF combined were significant on 36 of the 202 items. The sense of this latter analysis was to contrast answers given verbally (INT) with nonverbal replies (CARD or SELF). Table 1b shows that differences between INT/SELF/CARD missing response rates were significant on 84 of the 202 items. It thus appears that the missing response rate (as defined) is more subject to form effects than item mean score, or that the ratio of agree to disagree responses
across forms is more stable than the proportion of all cases answering agree or disagree. In other words, the missing response rate, which may differ significantly between forms, tends not to differ for agree and disagree respondents within forms. We may speculate that while a change in form may make an item less meaningful for more respondents, and thereby raise its missing response rate, it is not likely to change the ratio of agree to disagree responses among those for whom the item retains its meaning by the criterion of answering agree or disagree.

For those items demonstrating significant mean differences (Table 1a), we calculated the proportion of variance attributable to form of administration (eta squared). None of these 82 (46 + 36) values was larger than .05 and only 11 were larger than .01. It thus appears that form has little or no capacity for predicting item response, even in those cases where it produces significant differences between item means.

Table 2a addresses the question of whether items with significant mean differences across forms are consistent in their pattern. In other words, when significant mean differences exist, does one method consistently yield a higher mean score than another? The first column of Table 2a shows that INT had the highest mean score on 21 of 46 items, and that SELF was highest on 17 of these 46 items. The second column shows that INT had the higher mean score on 24 of 36 items.

TABLE 2
Distribution of Significant Items by Form with Highest Mean Response (a) and Form with Highest Missing Response Rate (b)

(a) Distribution of items with significantly different means across form

Interview/self/card comparison          %         N
  Interview                            45.7       21
  Self-administered                    37.0       17
  Card sort                            17.4        8
  Total                               100.0       46

Verbal/nonverbal comparison             %         N
  Interview (verbal)                   66.7       24
  Nonverbal                            33.3       12
  Total                               100.0       36*

(b) Distribution of items with significantly different response rates across form

Interview/self/card comparison          %         N
  Interview                            81.0       68
  Self                                  1.2        1
  Card                                 17.9       15
  Total                               100.0       84**

* p < .05. ** p < .001.

Chi-square tests were used to examine the hypothesis that the items with significant mean differences were randomly distributed across forms of administration. The second column of Table 2a was significant (p < .05) and the first column, although showing a similar pattern, was not. Respondents appear to show a greater willingness to endorse a negative item or to reject a positive item in INT than in CARD/SELF. Table 2b shows that INT had the highest missing response rate on 68 of the 84 items in which form effects were present. In other words, respondents were more likely to give no answer or to say "don't know" in INT than in CARD or SELF. By and large, INT drew both more negative responses and more missing responses than either CARD or SELF.

Item Characteristics

The possibility that items in one category are more subject to form effects than items in another was next considered. Prior to their use in the survey, items had been classified by (1) social desirability, (2) sentence subject, and (3) direction of wording. The association between each of these variables and the classification of items by presence or absence of form effects was investigated. The sense of this investigation was that there might be something about the items themselves that accounted for the presence of form effects, or that form effects were conditional on item characteristics.

Social desirability scale values (SDSV) were determined by ratings from an independent survey (Schuessler et al., 1978). This survey instructed respondents to rate the social desirability of SLF items along a 9-point scale ranging from low to high. For present purposes, an item was classified as high in social desirability if its SDSV was above the median, and low in social desirability if its SDSV was below the median. Dropping items not given in the independent survey left 182 for analysis.
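The two screening statistics used in the preceding analysis, the Pearson chi-square statistic and eta squared, can be sketched in a few lines. This is an illustrative reconstruction, not the study's code: the observed counts are those reported for Table 2a's verbal/nonverbal column, while the null model (each item equally likely to peak under any one of the three methods) and the grouped 0/1 scores are assumptions made here for illustration.

```python
# Pearson goodness-of-fit chi-square and eta squared (correlation ratio),
# the two screening statistics described in the text above.

def chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def eta_squared(groups):
    """Share of total score variance lying between groups:
    SS_between / SS_total."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Table 2a, verbal/nonverbal column: 24 of 36 significant items peaked
# under INT. Under the null model assumed here for illustration (an item
# equally likely to peak under any one of the three methods), the expected
# split is 12 verbal : 24 nonverbal.
assert chi_square([24, 12], [12, 24]) == 18.0  # well beyond the .05 cutoff of 3.84

# Eta squared for invented 0/1 item scores grouped by form (INT, SELF, CARD):
groups = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
assert abs(eta_squared(groups) - 8 / 35) < 1e-9
```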
Assertions about self (I am lonely) were classified as PERSON (term of convenience) and assertions about society (the poor have only themselves to blame) were classified as PEOPLE. Items having neither self nor the social world as clear subject were dropped in this analysis, as were seemingly ambiguous items that were inconsistently judged. There were 74 such items in all; dropping them left 128 for analysis. (For detail see Schuessler and Wallace, 1979.)

An item was regarded as positive if it expressed a feeling that would generally be preferred to its opposite, which was then classified as negative. For example, if "I like my job" was classified as positive (preferred), "I dislike my job" was classified as negative.

Each of these three dichotomies was paired with each of the three presence/absence distributions of Table 1, namely, 46:156, 36:166, and 84:118, for a total of nine cross tabulations. These nine cross tabulations appear in Table 3.

TABLE 3
Cross Classification of Item Characteristics with Presence/Absence of Form Effects

(a) Distributions of items with significant mean differences across form

Interview/self/card comparison

                            Present          Absent           Total
Item characteristics        %     N          %     N          %     N
Direction of wording
  Negative (Agree)         20.6   21        79.4   81        100   102
  Positive (Disagree)      25.0   25        75.0   75        100   100
SDSV
  Low                      16.9   15        83.1   74        100    89
  High                     26.9   25        73.1   68        100    93
Subject class
  Person                   18.4    9        81.6   40        100    49
  People                   31.6   25        68.4   54        100    79

Verbal/nonverbal comparison

                            Present          Absent           Total
Item characteristics        %     N          %     N          %     N
Direction of wording
  Negative                 19.6   20        80.4   82        100   102
  Positive                 16.0   16        84.0   84        100   100
SDSV
  Low                      16.9   15        83.1   74        100    89
  High                     19.4   18        80.6   75        100    93
Subject class*
  Person                    8.2    4        91.8   45        100    49
  People                   25.3   20        74.7   59        100    79

(b) Distribution of items with significantly different missing response rates (interview/self/card comparisons)

                            Present          Absent           Total
Item characteristics        %     N          %     N          %     N
Direction of wording
  Negative                 39.2   40        60.8   62        100   102
  Positive                 44.0   44        56.0   56        100   100
SDSV
  Low                      46.1   41        53.9   48        100    89
  High                     43.0   40        57.0   53        100    93
Subject class**
  Person                   24.5   12        75.5   37        100    49
  People                   53.2   42        46.8   37        100    79

* p < .05. ** p < .001.

This table shows that form effects were independent (by the chi-square test) of social desirability (SDSV) in all three tables; similarly, form effects were independent of the negative-positive dichotomy. The cross tabulation of form effect and subject class (PERSON or PEOPLE) was against the hypothesis of independence two times in three. Of 79 PEOPLE items, 25 (32%) showed form effects by INT/CARD/SELF, compared with 9 (18%) of the 49 PERSON items. Roughly the same ratios hold for the comparison by INT versus CARD and SELF combined. Forty-two (53%) of the 79 PEOPLE items showed form effects in the missing response rate, compared with 12 (24%) of the 49 PERSON items. These figures justify the conclusion that items about people, which tend to be somewhat abstract, impersonal, and vague, are more subject to form effects than items about the person, which tend by contrast to be concrete, personal, and meaningful.

At this juncture, we considered the possibility that the effect of method on response to any given item was dependent on that item's position in the questionnaire. To check this idea, we counted the number of form effects in the first 20 items, the number in the second 20, and so on, through the last 20 (plus the remainder of two). A comparison of observed and expected frequencies favored the null hypothesis of independence between form and location (χ²(9) = 11.52, p > 0.30). As a second check, we compared the observed number of runs in the entire sequence of 202 items with the number expected by chance. This comparison also favored the null hypothesis of independence between form and position (z = -.46). The general idea that the probability of a form effect is conditional on an item's location in the sequence was not supported by our findings.

Respondent Characteristics

The possibility that differences between mean scores by INT/CARD/ SELF might be due to differences in the composition of form subsamples was next considered. We speculated, for example, that form subsamples might differ by level of education and those differences might account for the differences between score means. In fact, we had little reason to suppose that this was the case, because we had previously verified that respondents between form samples were very nearly identical in their demographic characteristics, consistent with the random assignment of forms to cases. It was, however, at least possible that small differences between forms might become even smaller and possibly lose in significance after correcting for differences in the composition of form samples, and/or for an interaction between form and respondent characteristics. To investigate this problem, we carried out the following analysis. For each of the 46 INT/CARD/SELF items whose mean score differences were significant (Table la), we retested for significance after correcting

312

NEWTON,

PRENSKY.

AND

SCHUESSLER

for age, race, education, and marital status, one at a time.’ In the event of no interaction, we tested for the significance of form after adjusting for the effect of respondent characteristic alone. If an interaction was present we tested for significance after controlling on both that interaction and respondent characteristic. The results of all 46 x 4 = 184 separate analyses are summarized in Table 4. This table shows that 11 of 184 interactions were significant at the .05 level: 2 for race, 6 for marital status, 1 for age, and 2 for education. This total is about what one would expect by chance and is against the hypothesis that method effects are dependent on respondent characteristics. Differences between means lost in significance on 3 items after controlling for age, on 4 items after controlling for race, on 7 for education, and on 8 for marital status. This total of 22 changes does not correspond to 22 separate items, because 7 items lost in significance on two factors. The 22 changes thus represent 15 different items ((7 x 2) + 8). Changes in p values before and after adjustment tended to be small. In 15 of the 22 cases, the change was from significant at the .05 level to significant at the .07 level. Such small changes do not justify reversing the original decision against the nufl hypothesis of no differences between method means. By contrast, in 7 cases, corresponding to as many different items, the change was from significant at the .05 level to not significant at the .25 level. These 7 changes were in line with our original hunch that small differences between method means might become smaller and possibly lose in significance after adjusting for differences between subsamples in race, education, marital status, or age. In these 7 cases, there would seem to be some justification for reversing the original decision against the null hypothesis of no differences between method means. 
That reversal would lower the estimated number of items subject to form effects from 46 to 39. This lower figure suggests that perhaps no more than 1 in 5 social life feeling items are subject to form effects, all other things equal. The Effect

of Form on Scale Statistics

TABLE 4
Effect of Respondent Characteristic on Significance of Form Effect

                                  Race         Marital status     Age          Education
Statistical effect of control    %      N       %      N          %      N      %      N
Interaction absent
  p remained < .05              93.0    42     86.0    37        91.0    42    82.5    37
  p changed > .05                3.5     2      3.5     3         5.5     3    12.0     7
Interaction present
  p remained < .05               0.0     0      2.0     1         3.5     1     5.5     2
  p changed > .05                3.5     2      8.5     5         0.0     0     0.0     0
Totals                         100.0    46    100.0    46       100.0    46   100.0    46

A further question to which these data may provide an answer concerns the possibility that form effects exert their influence cumulatively over a number of items composing a scale, or composite. Since items were randomly distributed within the questionnaire, it was generally not possible to score two or more items together with form constant. Analysis, however, was possible for at least two social life feeling scales⁶ and also three response set scales.⁷ These scales consist of items which were asked within the same form for each individual. Questions of multivariate scale structure across form may be addressed in a wide variety of ways, from a simple summing of items to questions of factorial invariance (cf. Alwin and Jackson, 1981). An exploratory approach was selected for use in this analysis. Five summary statistics readily available from the Statistical Package for the Social Sciences (SPSS) and in wide use in the literature were selected as representative scale descriptors. These were (1) scale mean, (2) scale standard deviation, (3) mean interitem correlation, (4) alpha reliability, and (5) the percent of variance explained by the first eigenvalue.⁸ The results of this analysis are presented in Table 5. Table 5 is constructed so that form effects may be examined by comparisons within columns. A number of methodologically relevant results may be isolated in this table. First, scale means and standard deviations vary little within scales across forms; they are usually slightly larger when the self-administered method is used. The mean interitem correlation, alpha reliability, and percentage of variance explained are also consistently larger in SELF. It appears that scores will more nearly approximate scale criteria under SELF than under CARD.

⁵ Family income had originally been included as a respondent characteristic but was dropped because the sample after adjustment had 179 fewer cases than the sample before adjustment, owing to information missing on family income.
⁶ A 9-item job satisfaction scale and a 6-item concern with career scale. Social life feeling scales were based on a factor analysis of 237 items (202 agree/disagree and 35 often/sometimes/seldom/never) all at a time. Twelve scales were constructed in all. Each corresponds to a factor in the common-factor space, but not all factors were represented by an SLF scale.
⁷ Ten items from the Marlowe-Crowne (1960) Social Desirability Scale; 10 items from Jackson's (1967) Social Desirability Scale; 10 items from the Jackson-Messick (1961) Acquiescing Scale.
⁸ Because scales differ in length, comparisons between eigenvalues are not easily made. For that reason, we give the percent of total variance corresponding to the first eigenvalue.
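The five scale descriptors can be computed directly from a respondents-by-items score matrix. Below is a minimal numpy sketch; the function name is ours, and the use of the Spearman-Brown form for the standardized item alpha is our assumption about the SPSS quantity rather than something taken from the paper:

```python
import numpy as np

def scale_descriptors(X):
    """Compute five descriptors from a respondents-by-items matrix X of
    dichotomous (0/1) item scores: scale mean, scale standard deviation,
    mean interitem correlation, standardized item alpha, and the percent
    of total variance carried by the first eigenvalue of the interitem
    correlation matrix."""
    X = np.asarray(X, dtype=float)
    n_resp, k = X.shape
    score = X.sum(axis=1)                        # each respondent's scale score
    R = np.corrcoef(X, rowvar=False)             # k x k interitem correlations
    rbar = (R.sum() - k) / (k * (k - 1))         # mean of the off-diagonal r's
    alpha = k * rbar / (1 + (k - 1) * rbar)      # standardized item alpha
    pct1 = 100 * np.linalg.eigvalsh(R)[-1] / k   # largest eigenvalue over trace
    return score.mean(), score.std(ddof=1), rbar, alpha, pct1
```

As a quick sanity check, a matrix of perfectly correlated items returns a mean interitem correlation and alpha of 1 with 100% of the variance on the first eigenvalue.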


TABLE 5
Analysis of Scale Statistics by Form

                                           Scale statisticsᵃ
Scale (k items) / Form               N      X̄       σ       α      r̄      %
Jackson-Messick responding
desirably (10)
  Self-administered                 770    7.98    1.96    .65    .16    25.4
  Card-sort                         752    7.87    1.89    .61    .13    22.8
Jackson-Messick acquiescing (10)
  Self-administered                 770    5.82    2.14    .58    .12    23.3
  Card-sort                         752    6.02    2.01    .52    .10    20.5
Crowne-Marlowe need for social
approval (10)
  Self-administered                 770    6.47    2.46    .73    .21    29.0
  Card-sort                         752    6.37    2.44    .71    .20    28.3
Concern with career (6)
  Self-administered                 745    4.08    1.62    .64    .23    36.5
  Card-sort                         777    4.45*   1.51    .66    .24    37.3
Bradburn's negative sentiment (8)
  Self-administered                 745    3.62    2.18    .70    .23    33.6
  Card-sort                         777    3.50    2.10    .68    .20    31.7
Bradburn's positive sentiment (4)
  Self-administered                 745     .71    1.01    .62    .29    46.7
  Card-sort                         777     .66     .93**  .55    .24    42.8
Job satisfaction (9)
  Self-administered                 245    2.60    2.35    .77    .27    35.9
  Card-sort                         233    2.20    2.11    .72    .22    31.2
  Interview                         245    2.07    2.13    .76    .26    35.1

Note. Scale means were compared using t tests for two-way comparisons and F tests for three-way comparisons. All standard deviations were compared using F tests. Tests for alpha are equivalent to tests for mean interitem correlations; these tests were made using Feldt's (1969, 1980) test statistic W, which is well approximated by the F distribution in large samples. Differences in variance explained were examined by comparing the χ² goodness-of-fit statistics for 1-factor models using the ratio F = (χ₁²/df₁)/(χ₂²/df₂).
ᵃ X̄ = scale mean; σ = scale standard deviation; α = standardized item alpha; r̄ = mean interitem correlation; % = percentage of variance explained by the first eigenvalue.
* p < .01.
** p < .05.
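The two less familiar tests in the note to Table 5 can be sketched as follows. The function names are ours, and the degrees-of-freedom convention shown for Feldt's W is one common presentation that may differ in detail from the paper's computations:

```python
# Hedged sketches of two comparison statistics for scale descriptors.

def feldt_w(alpha1, n1, alpha2, n2):
    """Feldt's (1969) W for comparing the alpha reliabilities of a scale
    administered to two independent subsamples of sizes n1 and n2.
    W = (1 - alpha1) / (1 - alpha2) is referred to an approximate F
    distribution; (n1 - 1, n2 - 1) df is one common convention."""
    return (1.0 - alpha1) / (1.0 - alpha2), (n1 - 1, n2 - 1)

def chisq_ratio_f(chi1, df1, chi2, df2):
    """Ratio of per-df chi-square goodness-of-fit statistics for two
    1-factor models: F = (chi1/df1) / (chi2/df2), referred to F(df1, df2)."""
    return (chi1 / df1) / (chi2 / df2)

# Example with the Table 5 acquiescing-scale alphas:
# .58 (n = 770, SELF) versus .52 (n = 752, CARD).
w, df = feldt_w(0.58, 770, 0.52, 752)   # w = 0.42 / 0.48 = 0.875
```

A W near 1 (as here) indicates little difference between the two reliabilities; the observed W would then be compared against the appropriate F critical value.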

The 9-item job satisfaction scale provided the one instance in which study design permitted an analysis across all three methods of administration. Consistent with comparisons between SELF and CARD, job satisfaction scores under SELF more nearly met scale criteria, although the difference between forms was slight. INT was closely similar to SELF in its score statistics. Here, as in the other tabulations, differences in score statistics by form appear to be consistent in pattern but negligible in magnitude. In the following discussion, we comment on the substantive importance of these results and the implications for the theory and practice of social science research.


DISCUSSION

Rotating methods of administration within questionnaires (cases) was initially proposed as a technique for maintaining respondent interest; rotating methods across questionnaires was proposed as a scheme for comparing methods in their response distributions. Methods differed in their item means about 2 times in 9 (Table 2); the figure is somewhat lower (1 time in 5) after making an allowance for differences between form subsamples in respondent characteristics (Table 4). An implication of this result is that method of administration should be considered in the planning of research based on social life feeling items, beyond the issues of cost effectiveness and respondent motivation. Such planning should rest to the degree possible on whatever is reliably known about the relation between methods of administration and response patterns. The remainder of this discussion is directed to that relation.

In our work we posed these questions: (1) Is there a consistent relation between form of administration and mean score? (2) Is there a consistent relation between form of administration and missing response rate? With respect to the first question, no form was consistently highest in its mean, but INT was highest in a plurality of cases (21 of 46 and 24 of 36). If negative responses to attitudinal items are less socially desirable than positive responses, then our findings conflict with those of Sudman and Bradburn (1974), who suggest that for attitudinal items responses tend to converge on a socially desirable response. On Question 2, Table 3b shows that 68 of 84 items had the highest missing response rate in the interview, given that differences were significant. Previous research (Hochstim, 1967; Schmiedeskamp, 1962; Bradburn and Sudman, 1979) has shown that the missing response rate increases when question content is socially sensitive and/or probes illicit behavior, and that these effects are larger in the more public methods of administration.
Thus, when items are administered verbally, a higher proportion of nonresponses might be expected. Our findings are in line with this expectation.

Several previous studies found a correlation between item characteristic and form effect. We found that form effects were independent of item social desirability and also of direction of wording (Table 3) but dependent in some measure on item subject as PERSON or PEOPLE. In general, PERSON items were less subject to form effects than PEOPLE items. A possible explanation is that responses to PERSON items are less subject to random disturbances than responses to PEOPLE items because they are based on feelings that to a greater degree are stable, concrete, and well-defined. A practical implication is that form effects might be reasonably ignored in the case of PERSON items, while account of them would have to be taken in research based on PEOPLE social life feeling items. In our work, we found practically no interaction between method of administration and respondent background (Table 4). We did find that
small but significant differences between means occasionally lost in significance after adjusting for age, race, marital status, or education. The explanation of this shift lies in the correlation between response to social life feeling items and respondent social background. When form subsamples differ in their respondent characteristics, differences between subsample means reflect not the effect of form but rather the hidden effect of respondent background.

Finally, we considered the presence of form effects in measures based on several or more items. We found that SELF did better than CARD, with the mean interitem correlation (or alpha reliability) as criterion (see Table 5). Possibly respondents were distracted by the novelty of CARD and, for that reason, less consistent in their responses to similar items. SELF differentiated among respondents to a greater degree than CARD, as indicated by its slightly larger standard deviation. With few exceptions, however, significance testing supported the hypothesis that CARD and SELF are identical in their parameters. Only the 9-item job satisfaction scale could be examined across all three methods of administration. This three-way comparison showed a slight difference between SELF and CARD, as above, but practically none between INT and SELF. No difference was statistically significant. The finding of no form effects in composite scores was anticipated. First, the majority of items individually showed no form effects, and second, differences between item means, when significant, were somewhat inconsistent in their pattern (Table 2). The implication of this latter result is that differences between composite scores will tend to zero, even though differences between component (item) scores may be relatively large.

SUMMARY POINTS

The purpose of this study was to determine whether responses to social life feeling items are affected by the manner in which they are administered. For most items, patterns of responding and techniques of testing were statistically independent; moreover, form effects were absent in composites based on several or more items. For that group of items showing form effects, we found that
(1) Missing responses are more frequent in INT than in CARD or SELF.
(2) Negative responses are more common in INT than in CARD or SELF.
(3) PEOPLE items are more subject to form effects than PERSON items.
(4) Items high in social desirability are no more subject to form effects than items low in social desirability; and negative items are no more subject to form effects than positive items.
(5) Form effects are uncorrelated with respondent's social background.

All of these findings bear on the planning of research based on responses to social life feeling statements. Together they suggest that, given the kind of item utilized in this study, variations in form of administration may serve to reduce respondent (and interviewer) fatigue and boredom without significantly altering the results.

REFERENCES

Alwin, D. F., and Jackson, D. J. (1981), "Application of simultaneous factor analysis to issues of factorial invariance," in Factor Analysis and Measurement in Sociological Research (David J. Jackson, Ed.), Sage, London.
Bradburn, N., and Caplovitz, D. (1965), Reports on Happiness: A Pilot Study of Behavior Related to Mental Health, Aldine, Chicago.
Bradburn, N., and Sudman, S. (1979), Improving Interview Method and Questionnaire Design: Response Effects to Threatening Questions in Survey Research, Jossey-Bass, San Francisco.
Cannell, C. F., and Fowler, F. J. (1974), Comparison of Hospitalization Reporting in Three Survey Procedures, National Center for Health Statistics, Vital and Health Statistics, Series 2, Number 8.
Clark, J., and Tifft, L. (1966), "Polygraph and interview validation of self-reported deviant behavior," American Sociological Review 31, 516-523.
Crowne, D., and Marlowe, D. (1960), "A new scale of social desirability independent of psychopathology," Journal of Consulting Psychology 24, 349-354.
Feldt, L. S. (1969), "A test of the hypothesis that Cronbach's alpha or Kuder-Richardson coefficient twenty is the same for two tests," Psychometrika 34, 363-373.
Feldt, L. S. (1980), "A test of the hypothesis that Cronbach's alpha reliability coefficient is the same for two tests administered to the same people," Psychometrika 45, 99-105.
Groves, R. M., and Kahn, R. L. (1979), Surveys by Telephone: A National Comparison with Personal Interviews, Academic Press, New York.
Hochstim, J. R. (1967), "A critical comparison of three strategies of collecting data from households," Journal of the American Statistical Association 62 (September), 976-989.
Jackson, D. N. (1967), Personality Research Form Manual, Research Psychologists Press, Goshen, New York.
Jackson, D. N., and Messick, S. (1961), "Acquiescence and desirability as response determinants on the MMPI," Educational and Psychological Measurement 21, 771-790.
Knudson, D. D., Pope, H., and Irish, D. P. (1967), "Response differences to questions on sexual standards," Public Opinion Quarterly 31 (Summer), 290-297.
Marlowe, D., and Crowne, D. P. (1960), "A new scale of social desirability independent of psychopathology," Journal of Consulting Psychology 24, 349-354.
Phillips, D. L. (1971), Knowledge from What?, Rand McNally, Chicago.
Rogers, T. F. (1976), "Interviews by telephone and in person: Quality of responses and field performance," Public Opinion Quarterly 40 (Spring), 51-56.
Schmiedeskamp, J. (1962), "Reinterviews by telephone," Journal of Marketing 26, 28-34.
Schuessler, K., and Freshnock, L. (1978), "Measuring attitudes towards self and others in society: State of the art," Social Forces 56, 1228-1244.
Schuessler, K., Hittle, D., and Cardascia, J. (1978), "Measuring responding desirably with attitude-opinion items," Social Psychology 41, No. 3, 224-235.
Schuessler, K., and Wallace, M. (1979), "Components in the communality of mental attitude items," Sociological Focus 12, 247-261.
Schuman, H., and Presser, S. (1979), "The assessment of 'no opinion' in attitude surveys," in Sociological Methodology 1979 (K. F. Schuessler, Ed.), Jossey-Bass, San Francisco.
Sudman, S., and Bradburn, N. M. (1974), Response Effects in Surveys, Aldine, Chicago.
Wiseman, F. (1972), "Methodological bias in public opinion surveys," Public Opinion Quarterly 36 (Spring), 105-108.