Print versus electronic surveys: a comparison of two data collection methodologies


Journal of Operations Management 20 (2002) 357–373

Kenneth K. Boyer a,∗, John R. Olson b,1, Roger J. Calantone a, Eric C. Jackson a

a Department of Marketing & Supply Chain Management, The Eli Broad Graduate School of Management, Michigan State University, N370 North Business Complex, East Lansing, MI 48824-1122, USA
b Management Department, Kellstadt Graduate School of Commerce, DePaul University, 1 E. Jackson Blvd., Chicago, IL 60604-2287, USA

Received 7 September 2001; accepted 2 December 2001

∗ Corresponding author. Tel.: +1-517-353-6381; fax: +1-517-432-1112. E-mail address: [email protected] (K.K. Boyer).
1 Tel.: +1-312-362-6061; fax: +1-312-362-6973; [email protected]

Abstract

This paper compares the responses of consumers who submitted answers to a survey instrument focusing on Internet purchasing patterns both electronically and using traditional paper response methods. We present the results of a controlled experiment within a larger data collection effort. The same survey instrument was completed by 416 Internet customers of a major office supplies company, with approximately 60% receiving the survey in paper form and 40% receiving the electronic version. In order to evaluate the efficacy of electronic surveys relative to traditional, printed surveys, we conduct two levels of analysis. On a macro-level, we compare the two groups for similarity in terms of fairly aggregate, coarse data characteristics such as response rates, proportion of missing data, scale means and inter-item reliability. On a more fine-grained, micro-level, we compare the two groups for aspects of data integrity such as the presence of data runs and measurement errors. This deeper, finer-grained analysis allows an examination of the potential benefits and flaws of electronic data collection. Our findings suggest that electronic surveys are generally comparable to print surveys in most respects, but that there are a few key advantages and challenges that researchers should evaluate. Notably, our sample indicates that electronic surveys have fewer missing responses and can be coded/presented in a more flexible manner (namely, contingent coding with different respondents receiving different questions depending on the response to earlier questions) that offers researchers new capabilities. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Print; Electronic surveys; Internet

1. Introduction

The use of the Internet as a data collection and exchange tool has greatly expanded over the past 5 years.

However, many details of how the exchange of electronic information should be, or is, employed to best advantage in either a business or an academic setting are yet to be determined. A strong argument for the use of electronic survey methods is the lower relative cost of e-mail surveys when compared with traditional mail surveys. However, questions other than expense emerge. In particular, questions relating to the quality of responses in electronic media have been raised, as have questions about the actual response rates to electronic surveys as opposed to traditional paper questionnaires.



This paper addresses these questions by examining the responses of consumers who submitted answers to a survey instrument both electronically and using traditional paper response methods. By comparing response rates, it is possible to determine whether the two methods elicit equivalent participation. We compare the responses of the two groups and examine whether their answers are effectively the same on two levels. On a macro-level, we compare the two groups for similarity in terms of fairly aggregate, coarse data characteristics such as response rates, proportion of missing data, scale means and inter-item reliability. On a more fine-grained, micro-level, we compare the two groups for aspects of data integrity such as the presence of data runs and measurement errors. This deeper, finer-grained analysis allows an examination of the potential benefits and flaws of electronic data collection. We seek to build on a tradition within the field of operations management of both adapting proven research techniques from other fields and developing customized applications where appropriate. In recent years, the specialty areas of operations strategy, quality management and technology management have all adopted, improved or developed new techniques for both survey-based research (Flynn et al., 1990; Malhotra and Grover, 1998) and qualitative, case-based research (McCutcheon and Meredith, 1993; Meredith et al., 1989). Over the past decade there has been a substantial increase in both the rigor and quality of survey-based operations management research (Pannirselvam et al., 1999; Boyer and Pagell, 2000). Thus, to support the continuing effort to refine and evaluate data collection methods, we compare print and electronic data collection.

This paper is organized in several sections. We first provide a brief review of the extant literature on electronic surveys in order to position our efforts within the body of existing knowledge. In particular, this review points out that there is little consensus regarding the relative benefits and drawbacks of electronic versus print surveys and that there has been little application of electronic surveys to operations management research questions. The next section describes our data collection process, including the similarities and differences between the two data collection methods. Our data are drawn from a study of 416 customers who have purchased via the Internet from Office Depot.

In the data analysis section we conduct two levels of comparison: a macro-level analysis of aggregate, surface characteristics of the two data collection methods and a micro-level analysis of deeper, finer-grained data characteristics. We conclude the paper with a discussion and a set of suggestions for future researchers considering electronic data collection methods.

2. Related literature

While electronic data collection techniques are fairly new, there have been several studies in a variety of fields that evaluate their efficacy. Many of these studies evaluate individual dimensions of data collection, but there is no clear consensus regarding the advantages, disadvantages and the best ways to maximize the utility of electronic surveys. We, therefore, present a brief overview of the literature to highlight what has been found to date and to illustrate where further analysis is required.

Much of the existing literature notes that electronic surveys are attractive to researchers, both academic and commercial, because of their potential to reduce the expense of survey work. Bachmann et al. (1996) and Weible and Wallace (1998) both note that electronic surveys offer considerable cost advantages over traditional mail surveys. While cost is certainly important, it should not necessarily be the driving factor in choosing one method over the other. The relative quality of the data collected is a fundamental concern to researchers and in many ways should be the driver of all methodological considerations, although this principle is often overlooked, much as purchasing agents may lose sight of the need to improve the quality of purchased goods when offered a cheap price.

In an early example of electronic surveys, Kiesler and Sproull (1986) examined the use of electronic surveys and compared them to traditional mail surveys. They found that when administered in an organized setting the response rates to an electronic survey were good and that the survey turnaround time was shorter than for a paper survey. They determined that there were fewer incomplete responses to an electronic survey format than to paper surveys. They also found that while responses in the two media were similar, paper and electronic responses could not be used interchangeably.


However, their work was done in a drastically different era, prior to the extensive use of networked computers that is prevalent today. Their data were collected at Carnegie-Mellon University from students and employees who had both access to computers at the university and an interest in using what was then a new computer technology. Thus, their findings need to be interpreted carefully given the radical changes that have occurred over the past 15 years.

In a more recent study, Walsh et al. (1992) examine the idea that broadly available network surveys, such as those on the Internet, have the potential to improve response rates, increase self-disclosure and encourage unsolicited contributions, or self-selected responses. They randomly selected 300 subscribers of the Ocean Division of SCIENCEnet. They contacted this sample by e-mail and attached the survey to the e-mail message. They also posted the survey on an electronic bulletin board. They found that the respondents who chose to respond to the survey from the bulletin board gave more complete responses and gave responses of higher quality. It is important to note that the WWW was still not in general use when this work was done and that the respondents generally had extensive computer literacy and interest for the time.

Mehta and Sivadas (1995a) conducted research to directly compare the response rates and response content in mail and electronic surveys using the Internet. They divided their sample set into the following five segments:

1. Mail surveys with no pre-notification, no incentives to respond and no reminders.
2. Mail surveys with pre-notification, incentives to respond and reminders.
3. E-mail surveys sent domestically with conditions corresponding to (1).
4. E-mail surveys sent domestically with conditions corresponding to (2).
5. An international group of e-mail surveys with conditions equivalent to (4).

Mehta and Sivadas (1995a) found that the response rates for group 2 were the highest, with 83% responding. The equivalent e-mail group (4) and the international group (5) had response rates of 63 and 65%, respectively. They also discovered that much of group 3 (the e-mail target group) objected to having been sent the survey.


This objection to unsolicited e-mail, or "spam", has been observed in other studies of electronic surveys. Thus, this study suggests that electronic surveys generally have lower response rates and that researchers should seek methods to counteract the "spam effect".

Similar lower response rates to e-mail surveys have been found in other recent literature. In a study predating the WWW, Schuldt and Totten (1994) found that response rates for electronic media were low relative to response rates for traditional methods. They felt that this was in part due to limitations of the medium and that further investigation was warranted. Tse (1998) found that response rates using e-mail were lower than those of traditional mail survey methods and speculated that some of the lower response rates were due to a resistance of e-mail account holders to use the medium. In a more general examination of marketing on the Internet, Mehta and Sivadas (1995b) concluded that consumers respond very negatively to untargeted electronic marketing methods; in other words, consumers objected to "being spammed".

As with traditional print surveys, the ability to target data collection efforts to a specific list of respondents whose interests mesh well with the survey's items is very important. For example, Cheyne and Ritter (2001) suggest that newsgroups may be a good method of selecting audiences that have some interest in the subject matter of the survey instrument and that, as a result, this will lessen the negative response many Internet users have to unsolicited e-mail. Similarly, Rogelberg et al. (2001) found that a relationship exists between attitudes toward surveys and respondent behavior on surveys. This suggests that Internet respondents who have a positive attitude toward participating in a survey will be more apt to respond and will respond with better information. In another study, Crawford et al. (2001) relate the lower response rates of web-based surveys to the design of the survey instrument, or how burdensome it is for the respondent to complete the survey. They studied a group of college students using a survey designed to examine their attitudes toward affirmative action and related issues. They concluded that the challenge for Web survey designers is two-fold: first, to get respondents to initiate the survey and, second, to maintain their interest until it is completed. Couper (2000) examined the design of the survey instrument relative to its usability, a design perspective that focuses on the users of the system as opposed to the system itself.


Fig. 1. Data analysis map.

The most recent and applicable work for our purposes is a study by Klassen and Jacobs (2001) that focused on gathering data on operations management, specifically forecasting practices. This paper consisted of a study within a larger study of forecasting, with surveys administered in four treatment groups: mail, fax, PC disk-by-mail and web-page survey. The general findings of Klassen and Jacobs (2001) were, first, that surveys administered via web page had a significantly lower response rate and, second, that the three treatment groups other than mail all had better individual item completion rates. These findings suggest that electronic survey methods (either PC disk-by-mail or web-page surveys) lead to more complete data (i.e. better item completion rates), but that web-page surveys may have substantially lower response rates. These important insights lay the foundation for our second level of analysis, which addresses data quality in addition to data quantity.

As may be seen from the foregoing discussion, there are many questions regarding the response rate to electronic surveys, the quality of the responses made to electronic surveys, whether or not the responses of people who respond to electronic surveys are equivalent to the responses to traditional surveys and, if there are differences, how the design of the survey instrument may be used to correct any problems that may exist. Table 1 provides an overview of the literature reviewed and the findings to date. Many of the cited studies have been conducted on college campuses or using students as respondents to the survey instruments.

In contrast, there has been little application of electronic data collection techniques to business research in general, or operations management research in particular. Therefore, we address some of these questions and examine them in a business setting. More specifically, we examine the use of print versus electronic surveys to assess user views of on-line purchasing of office supplies. In the next section, our research hypotheses are developed. Section 4 describes the data collection process, including the similarities and differences between the two data collection methods. Next, the data analysis section is broken into two parts: a macro-level analysis of aggregate, surface characteristics of the two data collection methods and a micro-level analysis of deeper, finer-grained data characteristics. Fig. 1 provides an overview of these two types of data analysis. We conclude the paper with a discussion and a set of suggestions for future researchers considering electronic data collection methods.

3. Research hypotheses

The hypotheses that we test in this paper all center on differences in data collection methods (print versus electronic) and are grouped into two levels. As shown in Fig. 1, at a macro-level we are concerned with fairly aggregate measures of data quality. In particular, we test the following four hypotheses:

H1a. Response rates are equivalent for individuals receiving print and electronic surveys.


H1b. The proportion of missing data within a given study is equivalent for individuals receiving print and electronic surveys.

H1c. The mean value for scales/constructs is equivalent for individuals receiving print and electronic surveys.

H1d. Inter-item reliability (Cronbach's alpha) is equivalent for individuals receiving print and electronic surveys.

At the micro-level, we examine finer-grained characteristics of the two data collection methodologies. Specifically, we test for the presence of data runs, by respondent, for each construct. In addition, we evaluate the measurement model for two sets of questions that are coded and presented differently for the two data collection techniques. This test is done to assess the relative value of using electronic surveys to present questions in a contingent manner not possible on printed surveys. A simple example of a contingent question is: Q1, "Do you watch television for more than 10 h per week?" and, if the answer is "yes", Q2, "What is your favorite show?" An electronic survey can be coded contingently so that if the answer to Q1 is no, then Q2 is skipped (a brief sketch of this skip logic appears at the end of this section). While a print survey can instruct the respondent to follow the same procedure, the printed survey is non-contingent in the sense that a respondent cannot be forced to skip a particular question. In particular, we test the following hypotheses:

H2a. The likelihood of data runs is equivalent for individuals receiving print and electronic surveys.

H2b. The measurement model is equivalent for individuals receiving print and electronic surveys when the questions are coded/presented in a non-contingent versus contingent manner.

We define data runs as situations where respondents fall into a pattern of marking numerous responses in a string with very similar values. When this occurs because the respondent is not using the entire response scale, there is a risk that the data collected will suffer from attenuation problems. Further discussion regarding this effect is provided in Section 5.2.2.
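To make the contingent-coding idea concrete, the following is a minimal Python sketch of the skip logic described above, using the hypothetical television questions. The function names and the way a structural skip is recorded are illustrative assumptions rather than details of any particular survey package.

    def ask(prompt):
        # Present one question at a time, as an electronic survey does.
        return input(prompt + " ").strip()

    def administer_contingent_pair():
        responses = {}
        responses["Q1"] = ask("Q1. Do you watch television for more than 10 h per week? (yes/no)")
        if responses["Q1"].lower() == "yes":
            responses["Q2"] = ask("Q2. What is your favorite show?")
        else:
            # The program enforces the skip; a printed survey can only request it.
            responses["Q2"] = None  # recorded as a legitimate, structural skip
        return responses

    if __name__ == "__main__":
        print(administer_contingent_pair())

The key point is that the branch is enforced by the software, whereas a printed survey relies on the respondent to follow the skip instruction.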

4. Data collection

Data were collected through a survey of customers who had purchased products online from Office Depot. The goal of our study was to examine Internet usage patterns in order to identify and assess factors that facilitated increased user satisfaction and maximized the performance improvement for the retailer. The survey was designed to assess factors such as specific characteristics of the technology user (the purchaser of office products), the impact of web site characteristics and respondent comfort levels with computers. The current paper focuses not on content issues, but on methodological issues associated with different data collection methodologies.

The survey was conducted using two methods, paper and electronic. First, approximately 60% of the sample was contacted with a traditional four-page printed survey, accompanied by a cover letter explaining that we would provide participants with a survey summary and a US$ 15 rebate bonus. The cover letter also explained that we would keep all results anonymous and only report aggregate findings. Office Depot also provided a letter stating their interest in the study and explaining that the authors were acting as independent, non-biased third party researchers. The second data collection method involved a computer survey program named "Sensus" (available from Sawtooth Technologies, http://www.sawtooth.com/). This program allows a written survey instrument to be coded onto a floppy disk using fairly simple programming rules. The mechanical advantages of a computer-administered survey include the ability to add more detailed descriptions to questions, to tailor the questions more precisely (for example, a seven point Likert scale was used that allowed respondents to answer in half point increments) and to add pictures and color formatting. This is consistent with the idea that survey instruments should be interesting to the respondents (Couper, 2000; Crawford et al., 2001; Mehta and Sivadas, 1995b; Rogelberg et al., 2001). "Sensus" also records the data directly on the disk. The data can then be read directly into a spreadsheet or statistics package such as Excel or SPSS.


The inclusion of a disk with the survey instrument differs from many of the historical examinations of electronic surveys in that a tactile presence is included, one that is not present in surveys conducted over the Internet.

The specific details for both data collection methods were identical, except in a few specific ways. First, written surveys were sent to 631 people (262 responded for a 41.4% response rate) and electronic surveys were sent to 414 people (155 responded for a 37.4% response rate). Both of these groups were:

1. pre-contacted by e-mail and, if they responded that they did not want to participate, removed from the contact list;
2. sent a survey, either written on paper or contained on a floppy disk;
3. sent a letter of introduction, a letter from Office Depot endorsing the study, a promise of a US$ 15 rebate for participating and a business reply envelope;
4. sent a follow-up letter 2 weeks after the initial letter, and a second follow-up package with a replacement survey 4 weeks after the initial letter;
5. sent a thank you letter with the US$ 15 rebate, followed by a summary of research findings, upon completion of the survey.

To summarize, the mechanics of contacting our samples were exactly the same except for Step 2: the participants received either a written survey or a computer disk with associated instructions for completion. We chose to include the computer disk rather than sending the survey by e-mail or over the Internet because a computer disk seems less likely to be thrown out because of its tangibility and may also reduce respondents' fears of security lapses due to the interception of electronic communications. Our goal was to put a disk in respondents' hands that they might feel guilty throwing out, but that, once they opened the appropriate file, would make filling out the survey easy. In designing our study, we were concerned that a "pure" Internet survey or e-mail survey might suffer from low response rates similar to those experienced by Klassen and Jacobs (2001). This choice involves important trade-offs; for example, sending a computer disk rather than an intangible e-mail certainly negates any potential cost advantages of an electronic survey over a paper one. Similarly, the recent panics over anthrax in the US mail raise concerns regarding the reliability of the US mail (Simon, 2001).


Furthermore, there is a risk that people may be reluctant to participate due to fears of receiving a computer virus from the disk.

Office Depot's privacy policy, which prohibits customer information from being sold or given away, dictated that customers be contacted by e-mail in advance to ask whether it was acceptable for us to contact them. This is also consistent with the concept that electronic survey takers who are pre-contacted will be more likely to respond to the survey instrument (Cheyne and Ritter, 2001; Mehta and Sivadas, 1995b; Tse, 1998). All of the potential participants in the study were pre-contacted by e-mail. Less than 5% of the customers pre-contacted by e-mail noted that they did not want to be included in the study. This low refusal rate supports the idea that interested or targeted groups will be receptive to requests to participate in electronic surveys (Crawford et al., 2001; Mehta and Sivadas, 1995b; Rogelberg et al., 2001). Those customers who responded that they did not want to participate in the survey were deleted from the list.

The survey instrument was also pre-tested. We contacted three companies that agreed to fill out the pre-test, participate in a short interview regarding its readability and provide suggestions for improvement. Several useful suggestions were obtained regarding the wording of questions and the presentation of the survey.

We sent out a total of 1045 surveys in our first round of mailings. Several steps were taken to increase the response rate, including the inclusion of a business reply envelope, an incentive to complete the survey and the use of several follow-up letters. The first reminder letter was mailed 2 weeks later, re-emphasizing the confidential nature and importance of the survey. A second follow-up letter and a second copy of the survey were mailed to companies that had not returned the original after 6 weeks. Very few (<5%) of the mailings were returned due to incorrect addresses or the contact person having left the company. This high accuracy rate is due to the currency of the information in the database we received: most of our contact list had conducted business with Office Depot within the last 6 months. The final tally consisted of 416 usable responses out of 1045 total surveys, representing a 39.8% response rate.

To assess non-response bias, we conducted a Chi-square test on the proportion of positive responses for 17 industry membership categories (self-typed by customers when ordering).
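The paper does not report the underlying category counts, so the following is only an illustrative sketch of the kind of non-response bias check described above: a Chi-square test comparing the industry profiles of respondents and non-respondents. The 17 category counts are randomly generated placeholders.

    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(0)
    respondents = rng.integers(5, 60, size=17)      # hypothetical counts per industry category
    non_respondents = rng.integers(5, 60, size=17)  # hypothetical counts per industry category

    # 2 x 17 contingency table: rows are respondent status, columns are industries.
    chi2, p_value, dof, _ = chi2_contingency(np.vstack([respondents, non_respondents]))
    print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
    # A non-significant result, as reported here, gives no evidence that the industry
    # mix of respondents differs from that of non-respondents.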


This test did not reveal a significant difference. Thus, based on this test and the relatively high response rate, there is no evidence indicating a response bias in the data. Table 2 provides an overview of the respondents. The companies tend to be very small, since our initial contact sample was chosen to represent companies with <100 employees. In the process of collecting the data, five respondents who were sent electronic surveys returned their answers in paper format. Since these responses could not be clearly categorized as either electronic or paper responses, they were not used in the analysis that follows. It is important to note that the results in the current paper represent a methodological sub-study, a partial presentation of data gathered for other purposes. Other applications of these data have examined relationships between website design, employee work environments, Internet strategy and purchasing performance (Boyer and Olson, 2001) and have examined the strategy–execution–performance relationship for companies utilizing the Internet as a purchasing channel; more specifically, companies with more clearly defined, explicit strategies are linked with improved transactional performance (Boyer et al., 2001).

Table 2
Profile of survey respondents

Company data                   Mean     S.D.      Median
Total employment               202.9    1103.1    13
Purchasing employment          3.3      17.4      (97.8% <10)
Years in workforce             16.8     10.9      15.0
Years with current company     6.0      5.9       4.0
Years in current position      4.8      4.8       3.5

Business located in
Single location                276      68.8%
Multiple locations             125      31.2%
Total                          401      100.0%

Individual respondent data: education level
High school                    131      32.7%
Two year degree                102      25.4%
Four year degree               114      28.4%
Graduate degree                54       13.5%
Total                          401      100.0%

5. Data analysis

Our analytical approach consists of two levels of specificity. First, we analyze the macro-level characteristics of the data set as a whole and then of each component, print and electronic, individually to determine the basic issues of reliability and validity. In the second, micro-level of analysis we assess the relative quality of the two data gathering techniques. Our goal is to assess whether electronic surveys allow for, or facilitate, a greater or lesser level of quality or thoughtfulness on the part of the respondent.

5.1. Macro-level data analysis

In assessing reliability and quality relative to respondent thoughtfulness, we simultaneously assess the impact of several findings from extant research. As mentioned above, this data set conforms to the suggestion of Mehta and Sivadas (1995a) that subjects on the Internet be pre-contacted before the survey is sent in order to increase the likelihood that they will respond to the survey. Since the subjects were existing users of Office Depot's website, we also adopt the suggestion of Cheyne and Ritter (2001) and Mehta and Sivadas (1995b) that a selection of interested subjects will improve response to electronic data collection. Within the confines of these preconditions, a comparison of the response rates for the printed survey (261/631 = 41.4%) and the computer version of the survey (155/414 = 37.4%) reveals that they are almost identical. The combined response rate for both samples is higher than that seen in similar operations management studies (Boyer et al., 1997; Duray et al., 2000; Kathuria, 2000). This suggests the possibility that electronic surveys conducted using the tactile enhancement of a disk gain a better response than the rates seen in studies using the Internet alone (Schuldt and Totten, 1994; Tse, 1998). Our evidence suggests that it is possible to obtain response rates for electronic surveys that are comparable to those for printed surveys. Our experience suggests that accepted data collection practices based on Dillman's (1978) total design method for mail survey research are effective for electronic surveys.
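As a sketch of how this comparison can be formalized, the response counts reported above (261 of 631 print, 155 of 414 electronic) can be placed in a 2 x 2 table and tested with a Chi-square test; the code below is illustrative and is not part of the original analysis.

    from scipy.stats import chi2_contingency

    print_returned, print_sent = 261, 631
    elec_returned, elec_sent = 155, 414

    table = [
        [print_returned, print_sent - print_returned],
        [elec_returned, elec_sent - elec_returned],
    ]
    chi2, p_value, dof, _ = chi2_contingency(table)
    print(f"print rate = {print_returned / print_sent:.3f}, "
          f"electronic rate = {elec_returned / elec_sent:.3f}, p = {p_value:.3f}")
    # A non-significant p-value is consistent with the statement that the two
    # response rates are almost identical.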


However, we also believe that our decision to send the electronic survey in a tangible form (i.e. on a computer disk) rather than through e-mail or the web was a major factor in improving the response rate to the electronic survey. As noted above, a disk is harder to discard because of its tangibility and may also ease respondents' fears about the interception of electronic communications.

Our next step was to examine the descriptive and reliability data for the scales in aggregate and as separate sub-samples (print and electronic data). Table 3 provides the means, standard deviations and Cronbach's alphas for the aggregate sample and each of the individual samples. There is an excellent correspondence between the print, electronic and total data.


Cronbach’s alphas meet the generally accepted standard of 0.60 for new scales (Nunnally, 1978). The means for the print and electronic data are very similar to those for the aggregate data. In all but two cases a t-test for differences is insignificant. One area that differs is the number of hours spent on a computer per week (26.04 for the print survey and 28.56 for the computer survey), which makes intuitive sense. Although both sub-samples come from the same population, it is likely that there is a small response bias such that, within the sub-sample that received electronic surveys people with more computer experience are more likely to return the survey. The only other scale that shows a significant difference is DELIVERY. This difference could be a type I error, while the difference for computer hours per week has an easy intuitive explanation. Thus, the data suggest that there is no reason to believe that the data collected via print

Table 3
Descriptive and reliability data for scales

                                 Individual sub-samples         Aggregate sample      Cronbach's alpha
Scale (number of items)          Print mean   Electronic mean   Mean       S.D.       All       Print     Electronic

Strategy
  Goals (3)                      5.70         5.75              5.72       1.13       0.63      0.63      0.64
  Admin (4)                      5.04         5.04              5.04       1.37       0.79      0.79      0.81
  Delivery (3)                   5.59         5.86*             5.69       1.28       0.80      0.80      0.80

Environmental
  Comfort (4)                    5.12         5.23              5.16       1.24       0.90      0.89      0.92
  Techsupport (1)                4.71         4.83              4.76       1.95       -         -         -
  Adoptlevel (2)                 2.59         2.89              2.70       1.95       0.85**    0.92**    0.74**
  Computer hours per week (1)    26.04        28.56*            26.96      11.54      -         -         -
  Training hours per year (1)    8.80         7.82              8.42       23.95      -         -         -

Site specific
  Siteease (5)                   5.40         5.33              5.37       1.18       0.89      0.90      0.87
  Accuracy (5)                   5.38         5.34              5.36       1.08       0.87      0.81      0.84
  Transact (3)                   5.34         5.35              5.34       1.27       0.65      0.70      0.56
  System (7)                     5.30         5.16              5.25       1.08       0.87      0.87      0.88

Internet specific
  Percuse (8)                    4.88         5.02              4.93       1.49       0.96      0.96      0.97
  Percease (4)                   5.37         5.35              5.36       1.32       0.93      0.94      0.91
  Attitude (4)                   5.39         5.53              5.44       1.33       0.89      0.88      0.90

Performance
  Intimprv (5)                   4.47         4.73              4.57       1.53       0.90      0.91      0.82
  Costperf (2)                   4.18         4.08              4.14       0.81       0.42**    0.50**    0.29**
  Perfacc (2)                    4.57         4.46              4.53       0.88       0.50**    0.47**    0.57**

* Significant difference between the print and electronic means based on a t-test (p < 0.05).
** Scale with only two items, for which Cronbach's alpha is not applicable; the values shown are correlations, which are not equal to zero (p < 0.01).


Table 4
Missing data analysis

                  Print survey (%)    Electronic survey (%)    t-test
Numerical data    2.62                0.86                     p < 0.01
Likert scales     1.64                0.05                     p < 0.01
All               1.76                0.15                     p < 0.01

Researchers with experience collecting survey data by any means (print, electronic or telephone) are always concerned about missing responses. Respondents will often deliberately, and sometimes accidentally, skip a question. This poses a challenge to the researcher, who must then decide how to treat the missing data. Thus, we examine our two sub-samples of respondents to assess whether one methodology leads to fewer missing responses. Table 4 shows a comparison of missing value rates for both numerical questions (where respondents were asked to write in their own number, e.g. how many employees does your company employ?) and Likert-type scale questions. For both types of questions, the electronic survey had a significantly lower proportion of missing values. The 2.62% of numerical responses that were missing on the printed survey is a fairly substantial amount. Similarly, 1.64% of the Likert-type scale questions were missing responses for the printed survey versus a very small 0.05% for the electronic survey. In short, the electronic survey appears to be superior in terms of eliciting complete responses from respondents.
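The Table 4 comparison can be sketched as follows. The paper does not specify the exact form of the t-test, so the version below, a paired test across the per-item missing rates of the two versions of the survey, is one reasonable reading; the data frame layout and column names are assumptions.

    import pandas as pd
    from scipy.stats import ttest_rel

    def missing_rate_comparison(df: pd.DataFrame, item_cols: list) -> None:
        """df: one row per respondent, with a 'method' column and one column per item."""
        rates = df.groupby("method")[item_cols].apply(lambda g: g.isna().mean())
        print_rates, elec_rates = rates.loc["print"], rates.loc["electronic"]
        print("mean % missing, print:     ", round(100 * print_rates.mean(), 2))
        print("mean % missing, electronic:", round(100 * elec_rates.mean(), 2))
        t, p = ttest_rel(print_rates, elec_rates)  # paired: each item appears in both versions
        print(f"t = {t:.2f}, p = {p:.3f}")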

5.2. Micro-level data analysis

On a surface level the print and electronic surveys are consistent with respect to response rates, reliability (Cronbach's alphas) and means. In our data, electronic surveys have significantly fewer missing values. Given this evidence suggesting that the two data collection methods are comparable with respect to fairly standard measures of reliability and in terms of aggregate, descriptive information, we turn to an examination of more complex and subtle data properties.

In order to assess the more complex aspects of data quality or respondent thoughtfulness we employ two techniques. First, we apply a structural equation modeling approach to two linked sets of questions that the electronic survey allowed us to code in a more precise manner than the printed survey. The second analysis presented in this section examines the relative tendency of the two data collection methodologies to allow or encourage runs of sequential values within various constructs of the survey instrument. This analysis examines the amount of variance within each construct, within each respondent, compared across the two data collection methodologies. These two analytical techniques are presented separately in the sections below.

5.2.1. Assessing data quality in the presence of special coding for electronic versus print surveys

This section analyzes data from two sets of questions that the electronic survey allowed us to code in a more effective manner (we believe) than the print survey. These consist of two paired questions relating to six unique features of Office Depot's website: (1) one that asks respondents to rate the value of each feature and (2) one that asks how much they use each feature. The print version of the survey placed these questions side by side. Yet, it seems unproductive, and potentially biased, to ask someone the value they place on a feature they do not use. For example, if the respondent has never used a custom-shopping list, it would seem likely that they do not place much value on this feature. The electronic survey offered us the capability of coding these matched questions in a contingent manner such that, if a respondent answered that they did not use a feature at all (i.e. gave a use rating of either 1 or 1.5 on the seven point scale), the survey skipped the associated question on the value of that feature and went directly to a question about the use of the next feature. In order to test hypothesis 2b, that this method of presenting and collecting data is "better" for a question such as this, we develop a confirmatory factor analysis (CFA) model to test the impact of data collection methodology.

As shown in Fig. 2, we wish to test a model with two factors. The Use factor is a combination of all six features, while Value includes only three of the related questions regarding the value of these same features.


Fig. 2. Model of relationship between Value and Use for website features.

The reason Value only includes three items is that there was a great deal of missing data: a majority of the respondents to the electronic survey did not answer the questions on the value of custom-shopping lists (Q22a), multiple billing and shipping options (Q22c) and store pick-up/return (Q22e). Fig. 2 indicates that Use is an exogenous variable, while Value is an endogenous variable and data collection methodology is included as a control variable.

Two data sets may be compared in EQS by evaluating the equivalence of each set through CFA. This comparison can be done by setting the coefficients of the independent variables equal and constraining the error terms of the independent variable to also be equal across groups. The covariance between constructs is also constrained to be equal across groups. The Chi-square (and associated p-value) of this constraint then indicates whether or not the constraint is reasonable. Where differences across groups are significant, sensitivity analysis can reveal where the constraints should be released. The variables Use and Value are related in that it was assumed that the respondent would value a function that they used more highly than a function they did not use. The electronic survey instrument was coded so that if a respondent answered that they did not use a feature at all (i.e. gave a use rating of either 1 or 1.5 on the seven point scale), the survey skipped the associated question on the value of that feature and went directly to a question about the use of the next feature. The printed survey did not have this instruction or capability; therefore, most respondents answered both sets of questions whether they used a feature or not. In order to test hypothesis 2b regarding the equivalency of the measurement models for the two methodologies, we employ structural equation modeling to estimate values for the model shown in Fig. 2.
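The original analysis was carried out in EQS with cross-group equality constraints. As a rough, simplified stand-in, the sketch below fits the same two-factor model separately to the print and electronic sub-samples using the open-source semopy package (lavaan-style model syntax) and prints the parameter estimates so the Use-to-Value path can be compared by eye. The item column names (q22b, ..., q23f) and the "method" column are placeholders, and this is not equivalent to the constrained multi-group test reported in the paper.

    import pandas as pd
    import semopy

    MODEL_DESC = """
    Use   =~ q23a + q23b + q23c + q23d + q23e + q23f
    Value =~ q22b + q22d + q22f
    Value ~ Use
    """
    ITEM_COLS = ["q23a", "q23b", "q23c", "q23d", "q23e", "q23f", "q22b", "q22d", "q22f"]

    def fit_by_method(df: pd.DataFrame) -> None:
        """df: one row per respondent, item columns plus a 'method' column."""
        for method, grp in df.groupby("method"):
            model = semopy.Model(MODEL_DESC)
            model.fit(grp[ITEM_COLS])            # maximum-likelihood estimation
            print(f"--- {method} ---")
            print(model.inspect())               # look for the 'Value ~ Use' row in each group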

The conceptual model shown in Fig. 2 was tested using EQS (Byrne, 1994). The initial model provided adequate fit, but a close examination indicated that there were significant covariances between error terms, indicating possible instrumentation-sourced variance. Examination of the individual questions indicated that the majority of problems were connected to the "Quick Order by Item #" question (Q22f) regarding the value of this feature. One explanation for this comes from our discussions with both Office Depot managers and customers. Office products come in an extremely wide variety of options, but have tremendously close descriptions, even within a single manufacturer's product line. For example, a search for Avery mailing labels yields 355 different products with the main differentiating feature being the size of the label. This makes ordering items over the web, or even over the phone, without an order item list virtually impossible. Our subjective assessment is that repeat customers of Office Depot's website are much more likely to realize the value of being able to order by item # (Q22f) because of their familiarity with the difficulty in finding a specific item. Therefore, it is intuitively logical that there is a correlation between the errors for this question and other questions, and the covariance was not deemed to be an aberration. Similarly, the covariance between the "Custom Shopping List" question and the "Multiple Billing and Shipping Options" question and the covariance between the "Multiple Billing and Shipping Options" question and the "Store Pick-up/Return" question were deemed not to be violations of the hypothesis either. Therefore, the covariances were incorporated into the model. Fig. 3 shows the standardized solution and the loadings for the print and electronic surveys.


Fig. 3. Structural equation model—Value and Use.

We test for measurement differences (hypothesis 2b) by running the model shown in Fig. 3 for the print and electronic data sets while constraining the error terms of the two respondent groups to be equal to each other. This analysis was conducted by creating a dummy variable for methodology (electronic or print), a procedure in EQS for running multi-trait, multi-method analyses. The resulting CFI was 0.885 and the RMSEA was 0.078, indicating an acceptable fit of the model to the data (Bagozzi and Yi, 1988; Bagozzi and Yi, 1989). In order to test the null hypothesis that there was no difference between the two measurement instruments, all of the primary construct, secondary construct and error covariances were constrained to be equal. A Lagrange Multiplier test was then used to determine whether these constraints were significant. It was found that three constraints had p-values <0.05. These constraints between the two instruments were for the Value-to-Use relationship, the error term (E3) for Q22f and the error term (E7) for Q23d.

All of the primary construct values (between Value and Use and the individual questions {Q22b, d, f} and {Q23a–f}) were found to have p-values >0.05 and, thus, the equality constraint could not be rejected. This indicates that there are small instrument differences between the two media. The three equality constraints that were found not to hold were released and the model was rerun. The standardized values for the model and the fit indices are shown in Fig. 3. The secondary construct loading (Use ⇒ Value) of 0.828 for respondents using the electronic instrument was significantly different (p < 0.05) from the loading of 0.637 for those who used the paper instrument. This may be interpreted as meaning that a one standard deviation change in an electronic respondent's usage of a website function is associated with a 0.828 standard deviation change in the perceived value of that function, whereas for paper respondents the corresponding change is 0.637 standard deviations.


The stronger relationship between Use and Value for the respondents to the electronic instrument makes intuitive sense because the contingent coding essentially prevents respondents who do not use a feature at all from providing a rating for the value of that feature. While it is not necessarily true that a stronger relationship is better, we believe that the significant difference in loadings implies that the capability to contingently code questions on electronic surveys contributes to improved data. To summarize the results of our measurement model evaluation, the data do not provide evidence to reject hypothesis 2b. In other words, the measurement model shown in Fig. 2 can be fit equally well to data collected via either print or electronic surveys. However, the predictive strength of the relationship between the fitted constructs (Value and Use) differs between the two methods because of the use of contingent versus non-contingent coding.

5.2.2. Assessing the presence of data runs or error effects

A second concern we had regarding the quality of the data from the two data collection methodologies was the possibility of data runs. A common concern of researchers is that respondents will fall into a pattern of marking all answers with similar scores, tending to be in either the high (i.e. mostly 6 and 7's), low (1 or 2's) or moderate (3, 4 or 5's) ranges on Likert-type scale items. This can occur across an entire survey (i.e. the respondent tends to rate all questions high or low) or within individual sets of scale items. For example, operations management researchers have long noted the tendency of respondents to state that several competitive priorities are important or that all respondents have "above average" financial performance (Boyer and Pagell, 2000). One of the underlying problems is that there are "true" effects and "error" effects within any scale or construct. We would like to assess one element of error effects: the tendency of respondents to assign consistent values within a scale due to patterned response making (i.e. an auto-pilot effect) rather than carefully thought out, or "true" effect, responses. In the most egregious case, many researchers have run across the situation of a respondent who circles (with one circle) all 6's or all 7's for an entire group of questions, or worse yet, an entire page of questions. This response is clearly flawed, but how do we separate less obvious cases?


For example, what if a respondent circles (individually) five responses of "1" for our use questions (questions 23a–f in Appendix A) and one "2"? How much of this response pattern is because the respondent does not use these features (i.e. a true effect) and how much is because the respondent's eyes glazed over after the first two or three questions and their hand went on auto-pilot (i.e. creating runs, or error effects)? We expected that the electronic surveys might reduce the effect of runs because the medium presents questions individually, one on screen at a time. Our expectation was that this helps differentiate the questions in respondents' minds, thus reducing the likelihood of runs or error effects.

In order to assess the presence of runs within scales for our two data collection methodologies, we calculated standard deviations for each scale for each respondent. For example, suppose Respondent A has the following responses to questions 23a–f (1, 3, 4, 2, 5, 3) while Respondent B has these responses (1, 2, 1, 2, 2, 2). Respondent A's standard deviation for the Use construct is 1.4, while Respondent B's is 0.5. The lower standard deviation could indicate the presence of an error effect caused by Respondent B falling into a run of low responses for this series of questions. Alternatively, this could be a true effect indicating what the question purports to measure: whether the respondent uses these features or not. Of course, we cannot definitively say whether the lower standard deviation for Respondent B is an error or a true effect. However, an aggregate assessment of all of our scales for both methodologies can provide some insight. We have already shown that our scales possess good inter-item reliability (i.e. Cronbach's alphas); thus, the standard deviation within a scale for a respondent serves as a rough indicator of the presence of runs, with lower numbers indicating the possibility that the respondent answered the questions in a biased manner. While high Cronbach's alphas indicate that the questions comprising a scale are internally consistent (correlate highly and possibly converge on a common underlying latent construct), researchers do not expect (or want) these questions to have zero variability within the scale and respondent. In the extreme case, a respondent who answers the same number for every question (i.e. five 7's) has zero variability within that scale. As researchers we do not like to see this case, since the purpose of a scale composed of multiple items is to address the same underlying issue in slightly different ways; if all the responses are identical, there is no reason to have multiple questions.
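The within-respondent standard deviation described above is straightforward to compute. The short sketch below reproduces the Respondent A and Respondent B figures (sample standard deviation, ddof = 1) and shows how the same quantity would be obtained for a whole scale; the data frame and column names are hypothetical.

    import numpy as np
    import pandas as pd

    respondent_a = [1, 3, 4, 2, 5, 3]
    respondent_b = [1, 2, 1, 2, 2, 2]
    print(round(np.std(respondent_a, ddof=1), 1))  # 1.4
    print(round(np.std(respondent_b, ddof=1), 1))  # 0.5

    # For a full sample: one standard deviation per respondent for a given scale.
    use_items = pd.DataFrame([respondent_a, respondent_b], index=["A", "B"],
                             columns=["q23a", "q23b", "q23c", "q23d", "q23e", "q23f"])
    print(use_items.std(axis=1, ddof=1))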



Table 5
Evaluation of runs within scales

Scale                          Print S.D. (mean: n = 260)    Electronic S.D. (mean: n = 148)

Strategy
  Goals                        0.93                          0.84
  Admin                        1.06                          0.97
  Delivery                     0.71                          0.63

Environmental
  Comfort                      0.88                          0.79

Site specific
  Siteease                     0.69                          0.77
  Accuracy                     0.67                          0.65
  Transact                     0.87                          1.02
  System                       0.86                          0.87
  Value                        1.13                          1.13
  Use                          1.70                          1.65

Internet specific
  Percuse                      0.69                          0.64
  Percease                     0.48                          0.58
  Attitude                     0.83                          0.74

Performance
  Intimprv*                    0.90                          0.75
  Costperf                     0.37                          0.40
  Perfacc                      0.41                          0.36

All (mean of all 16 scales)    0.83                          0.79

* The S.D. values of the print and electronic sub-samples are different based on a one-way ANOVA (p < 0.05).

Thus, the ideal situation is to have a scale and data collection method that exhibit high inter-item reliability as well as a fair amount of variability, indicating that individual items assess a similar underlying construct, but not in an identical manner.

Table 5 presents the standard deviations for each scale in our study, by data collection method. The numbers represent the average of the standard deviation for each respondent on each construct. We test for differences between the means for the printed and electronic surveys using an ANOVA for each scale, as well as for the overall mean (the average of the standard deviations for all 16 scales by respondent). Only one of the scales has a significant difference (INTIMPRV has a lower standard deviation for the electronic survey than for the print survey). This is counter to our expectation that the electronic survey should be less prone to runs. The data do not provide any evidence that either data collection methodology is more or less prone to runs.
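For completeness, the scale-by-scale comparison behind Table 5 can be sketched as a one-way ANOVA on the per-respondent standard deviations of the two groups; as before, the data frame layout and the "method" column are illustrative assumptions.

    import pandas as pd
    from scipy.stats import f_oneway

    def compare_runs(df: pd.DataFrame, item_cols: list) -> None:
        """df: one row per respondent, item columns for one scale plus a 'method' column."""
        per_respondent_sd = df[item_cols].std(axis=1, ddof=1)
        groups = [per_respondent_sd[df["method"] == m].dropna()
                  for m in ("print", "electronic")]
        f_stat, p_value = f_oneway(*groups)
        print(f"print mean S.D. = {groups[0].mean():.2f}, "
              f"electronic mean S.D. = {groups[1].mean():.2f}, p = {p_value:.3f}")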

6. Discussion

This paper has presented a comparison and evaluation of two different survey data collection methods. The traditional method of survey data collection involves mailing a printed survey to a selected sample. The relative strengths and weaknesses of this approach are fairly well known and established, both in terms of general business research (Dillman, 1978) and within the more specialized sphere of operations management research (Flynn et al., 1990). In contrast, electronic surveys (whether administered by electronic mail, over the Internet or via computer disk or other specialized application) are less established as a tried and true methodology. While electronic surveys appear to offer several improved capabilities, they also present significant challenges. A necessary first step in evaluating the efficacy of electronic surveys involves comparing the reliability, validity and quality of information gathered using this technique versus data gathered with a traditional printed survey. Our general findings indicate that, with careful design, the two data collection techniques provide very comparable data, with a few notable exceptions. In general, our study confirms the findings of Klassen and Jacobs (2001): both studies found that disk-by-mail surveys yielded response rates comparable to traditional print-based mail surveys while also yielding higher individual item response rates. The current study extends the findings of Klassen and Jacobs by examining the reliability and validity of the data in addition to more surface-level characteristics. In particular, we find that electronic surveys offer important capabilities in terms of contingently coding questions.

In terms of data collection efforts, there are important differences between the two data collection methods employed in our study. In general, printed surveys involve less pre-investment of time and resources to develop. Electronic surveys are much more challenging to develop because of the need to learn the software package and to carefully pre-test and debug the survey to make it as simple and transparent as possible for the respondent. Printed surveys are much easier because what you see is what you get; there are fewer hidden intricacies. Similarly, electronic surveys require different handling procedures. In particular, we found that it took 1–2 min per survey to copy the necessary files onto each floppy disk.


In contrast, printed surveys are printed in bulk and mailed, and have a much lower risk of being damaged in transit. However, many of these difficulties with electronic surveys are counterbalanced by efficiencies gained after the researchers receive the completed surveys. Data entry is automated, which results in both greater efficiency (a savings of between 2 and 4 min per survey) and greater data accuracy, since there is no need for a human to transcribe the data. Thus, there are benefits and drawbacks that must be considered in adopting either approach.

Our analysis of the relative data reliability, validity and quality of the two data collection methods suggests that they are largely comparable, with a couple of minor exceptions. On a macro or aggregate level, both methods had statistically similar response rates, scale/construct means and inter-item reliabilities. This suggests that the two methods are largely interchangeable if careful research design is employed prior to starting data collection. The one significant difference was in terms of missing data, where electronic surveys have substantially fewer missing responses. This can be a valuable benefit for researchers, who are often plagued by problems with missing data such that sample sizes are severely depleted or vary substantially depending on the scales/variables included in different analyses (Miller and Roth, 1994; Vickery et al., 1993; Duray et al., 2000; Kathuria, 2000). Thus, we reject hypothesis H1b while failing to reject hypotheses H1a, H1c and H1d.

On a micro or detailed level, we tested for measurement differences and the presence of data runs. There is no evidence to indicate that electronic surveys help reduce the tendency of respondents to fall into a pattern or zone where their responses become fairly repetitive. Thus, there is no evidence to reject hypothesis H2a. This finding runs counter to our a priori expectation. We had thought that the ability to present questions one at a time on an electronic survey (rather than as an entire page or section of questions grouped together) would help create separation or independence between questions due to the greater ability to focus the respondent's attention. A reduction in data runs or repetitiveness would improve the quality or true effect of measurement while reducing error effects.


While our analysis does not indicate any differences between the two methods, it is important to examine this question further to ascertain whether electronic surveys can be administered in a manner that does reduce data runs or errors. Finally, the comparison of measurement models shown in Figs. 2 and 3 indicates that there is no difference in the ability to apply identical measurement models across the two data collection methodologies. However, there are important differences in the loadings for this model, particularly in terms of predictive value. This finding supports our a priori belief that electronic surveys provide an important capability to design surveys in a contingent manner. Specifically, the relationship between Use and Value of several Internet features studied differed significantly for the two methodologies. Given that we coded the electronic survey to ask the corresponding value question only if the respondent actually used that feature, it is reassuring that there is a stronger relationship between these two constructs for the contingently coded electronic survey than for the non-contingent printed survey. The implication is that the data from the electronic survey are better suited to assess this relationship because of the more precise coding.

The bottom line is that electronic surveys offer a viable alternative to printed surveys, but researchers must carefully consider their goals and objectives. As with any survey, careful design and implementation can prevent or ameliorate many potential problems. On a macro-level, the two data collection techniques offer comparable results, but there are important differences at a more detailed level. Future research should experiment further with electronic surveys to refine and develop the best applications. In many cases, electronic surveys offer dramatically new capabilities, but researchers must explicitly consider the trade-offs and work to maximize the efficacy of this new technique. For example, results can be expected to differ depending on the data collection method used (web, e-mail, fax, print or disk by post). Alternatively, the contact sample may influence outcomes; for example, our sample is likely to be fairly computer literate because we chose consumers who had actually placed orders via the Internet. In short, researchers should explore these issues further through controlled experiments and, as with all research designs, must carefully consider the a priori trade-offs that must be made in designing any data collection effort.


Appendix A. Questions used to assess Value and Use

Please rate whether the following features of Office Depot's Internet site are valuable and how frequently you use them.

References

Bachmann, D., Elfrink, J., Vazzana, B., 1996. Tracking the progress of e-mail versus snail mail. Marketing Research 8 (2), 31–35.
Bagozzi, R.P., Yi, Y., 1988. On the evaluation of structural equation models. Journal of the Academy of Marketing Science 16 (1), 74–94.
Bagozzi, R.P., Yi, Y., 1989. On the use of structural equation models in experimental designs. Journal of Marketing Research 26 (3), 271–284.
Boyer, K.K., Pagell, M., 2000. Measurement issues in empirical research: improving measures of operations strategy and advanced manufacturing technology. Journal of Operations Management 18 (3), 361–374.
Boyer, K.K., Olson, J.R., 2001. Drivers of Internet purchasing success. Production and Operations Management, in press.
Boyer, K.K., Keong, L.G., Ward, P.T., Krajewski, L.J., 1997. Unlocking the potential of advanced manufacturing technologies. Journal of Operations Management 15 (4), 331–347.
Boyer, K.K., Olson, J.R., Talluri, S., 2001. Operations Strategy and Internet Purchasing: A Contingent Model. Working paper.
Byrne, B.M., 1994. Structural Equation Modeling with EQS and EQS/Windows. Sage, London.
Cheyne, T.L., Ritter, F.E., 2001. Targeting audiences on the Internet. Communications of the ACM 44 (4), 94–98.
Couper, M.P., 2000. Usability evaluation of computer-assisted survey instruments. Social Science Computer Review 18 (4), 384–396.
Crawford, S.D., Couper, M.P., Lamias, M.J., 2001. Web surveys: perceptions of burden. Social Science Computer Review 19 (2), 146–162.
Dillman, D.A., 1978. Mail and Telephone Surveys: The Total Design Method. Wiley, New York, NY.
Duray, R., Ward, P.T., Milligan, G.W., Berry, W.L., 2000. Approaches to mass customization: configurations and empirical validation. Journal of Operations Management 18 (6), 605–626.
Flynn, B.B., Sakakibara, S., Schroeder, R.G., Bates, K.A., Flynn, E.J., 1990. Empirical research methods in operations management. Journal of Operations Management 9 (2), 250–284.
Kathuria, R., 2000. Competitive priorities and managerial performance: a taxonomy of small manufacturers. Journal of Operations Management 18 (6), 627–642.
Kiesler, S., Sproull, L.S., 1986. Response effects in the electronic survey. Public Opinion Quarterly 50, 402–413.
Klassen, R.D., Jacobs, J., 2001. Experimental comparison of web, electronic and mail survey technologies in operations management. Journal of Operations Management 19 (6), 713–728.
Malhotra, M.K., Grover, V., 1998. An assessment of survey research in POM: from constructs to theory. Journal of Operations Management 16 (4), 407–425.
McCutcheon, D.M., Meredith, J.R., 1993. Case study research in operations management. Journal of Operations Management 11, 239–256.
Mehta, R., Sivadas, E., 1995a. Comparing response rates and response content in mail versus electronic mail surveys. Journal of the Market Research Society 37 (4), 429–439.
Mehta, R., Sivadas, E., 1995b. Direct marketing on the Internet: an empirical assessment of consumer attitudes. Journal of Direct Marketing 9 (3), 21–32.
Meredith, J.R., Raturi, A., Amoako-Gyampah, K., Kaplan, B., 1989. Alternative research paradigms in operations. Journal of Operations Management 8 (4), 297–326.
Miller, J.G., Roth, A.V., 1994. A taxonomy of manufacturing strategies. Management Science 40 (3), 285–304.
Nunnally, J., 1978. Psychometric Theory. McGraw-Hill, New York.
Pannirselvam, G.P., Ferguson, L.A., Ash, R.C., Siferd, S.P., 1999. Operations management research: an update for the 1990s. Journal of Operations Management 18 (1), 95–112.
Rogelberg, S.G., Fisher, G.G., Maynard, D.C., Hakel, M.D., Horvath, M., 2001. Attitudes toward surveys: development of a measure and its relationship to respondent behavior. Organizational Research Methods 4 (1), 3–25.
Schuldt, B.A., Totten, J.W., 1994. Electronic mail versus mail survey response rates. Marketing Research 6 (1), 36–39.
Simon, R., 2001. Anthrax Nation; there are few things as universal as the mail, and that's why we worry now when the postman rings. Should we? US News & World Report, 5 November, p. 16.
Tse, A.C.B., 1998. Comparing the response rate, response speed and response quality of two methods of sending questionnaires: e-mail versus mail. Journal of the Market Research Society 40 (4), 353–361.
Vickery, S.K., Droge, C., Markland, R.E., 1993. Production competence and business strategy: do they affect business performance? Decision Sciences 24 (2), 435–455.
Walsh, J.P., Kiesler, S., Sproull, L.S., Hesse, B.W., 1992. Self-selected and randomly selected respondents in a computer network survey. Public Opinion Quarterly 56, 241–244.
Weible, R., Wallace, J., 1998. Cyber research: the impact of the Internet on data collection. Marketing Research 10 (3), 19–31.