An examination of the WHOQOL-BREF using four popular data collection methods


Computers in Human Behavior 55 (2016) 446–454


Zared Shawver a, James D. Griffith a, *, Lea T. Adams a, James V. Evans b, Brian Benchoff a, Rikki Sargent a

a Shippensburg University, USA
b Binghamton University, USA


Article history: Received 20 April 2015; received in revised form 7 September 2015; accepted 21 September 2015; available online 5 November 2015.

Abstract

The rapid increase in researchers utilizing internet sites for conducting research has not yet been matched by an adequate investigation of the degree to which the psychometric properties of instruments remain intact. Although using the internet for data collection is now commonplace, only a limited number of studies have investigated the quality of data provided by online subject pools compared to the methods used to create these instruments. The present study collected data using the World Health Organization's WHOQOL-BREF instrument across four data collection methods: Amazon.com's Mechanical Turk, Craigslist.org, college students completing the survey online, and college students completing the survey in a traditional face-to-face format. Feldt tests comparing observed Cronbach's alphas to alphas from a published validation study, together with confirmatory factor analyses, corroborated the use of online data collection for measuring quality of life in the general public and in college students. However, the face-to-face data collection method did not provide as consistent results across the scales. Further research should investigate the lack of internal consistency in face-to-face responses.

Keywords: Mechanical Turk; Craigslist; Online data collection; Psychometrics; WHOQOL-BREF

It is generally accepted that undergraduate students are vital to the process of data collection that fuels much of research. The reliance on this status quo is not without controversy and objection, as researchers have identified issues with this practice (e.g., Henry, 2008; Sears, 1986). To reduce potential biases that a mostly homogeneous college sample may introduce, researchers have adopted data collection strategies with a greater reach into the general population. For example, researchers have used telephone surveys (Senior et al., 2007), mail-in surveys (Kerin & Peterson, 1977), and, most recently, the internet (Granello & Wheaton, 2004). With very swift technological advances, it is appealing for researchers to capitalize on the internet to reach participants in the comfort of their own homes, on a medium with which more people are becoming familiar and comfortable. Internet data collection is now extremely popular, as hundreds of papers have been published that used it as the primary data collection medium.

This influx of researchers capitalizing on the internet's utility as a data collection source appears to have progressed faster than research on its efficacy. For example, it is still unclear whether online data collection can produce results as valid and reliable as traditional face-to-face (FTF) collection. The present study examined the World Health Organization Quality of Life – Brief (WHOQOL-BREF) scale and collected data using two popular online data collection methods (i.e., Amazon.com's Mechanical Turk and Craigslist.org), compared to a sample of college students completing a survey online as well as a sample of college students providing data in a FTF format. One way to compare data collection methods is on the basis of online versus FTF formats. Like all data collection methods, each has both advantages and disadvantages, and these should be taken into consideration by researchers.

1. Advantages of online data collection

* Corresponding author. Psychology Department, Shippensburg University, Franklin Science Center 101, 1871 Old Main Drive, Shippensburg, PA 17257, USA. E-mail address: [email protected] (J.D. Griffith). http://dx.doi.org/10.1016/j.chb.2015.09.030

There are advantages for researchers who use online data collection in at least three different areas of the data collection process.


First, conducting studies online has cost and efficiency benefits. It is considerably cheaper than traditional methods using paper-and-pencil surveys and mail delivery (Ahern, 2005; Douglas et al., 2005; Sue & Ritter, 2007). It has also been shown that putting studies online can decrease the time needed to conduct a study compared to telephone and mail surveys (Ahern, 2005; Beling, Libertini, Sun, Masina, & Albert, 2011; Granello & Wheaton, 2004). Second, reaching and communicating with participants becomes more feasible through the internet. With more than 78% of people in North America connected to the internet in 2012 (Internet World Stats, 2012), researchers can gain access to a much larger participant pool (Ahern, 2005). This includes participants from traditionally hard-to-reach populations, for example, people in rural areas, shift workers, and those unable to get out of their homes (Whitehead, 2007). Because of the internet and email, communication between researchers and participants is improved as well, which can increase response rates (Wharton, Hampl, Hall, & Winham, 2003). Third, data management is drastically improved through online data collection. Most online studies employ data collection software (e.g., SurveyMonkey), which exports data to researchers in a ready-to-use format that can easily be disseminated to research teams (Jones, Murphy, Edwards, & James, 2008b; Sue & Ritter, 2007). This software also allows for advanced methodologies, such as customizing questionnaires based on participants' responses (Jones, Murphy, Edwards, & James, 2008a; Loescher, Hibler, Hiscox, Hla, & Harris, 2011; Sue & Ritter, 2007). The most distinct data management benefit is the removal of the need for manual data entry, which reduces the time required for data entry and the prevalence of human error during entry, and minimizes problems associated with reading respondents' handwriting (Granello & Wheaton, 2004; Gregory & Pike, 2011; Stewart, 2003).

Conducting studies online may not only benefit researchers; participants also have advantages compared to more traditional data collection methods. They are free to participate at their own pace with no perceived time pressure from a researcher and can complete the survey or experiment at a time convenient to them (Buhrmester, Kwang, & Gosling, 2011; Jones et al., 2008a, 2008b). The ease and novelty of participating in a study online also increase participants' interest and response rates (Ahern, 2005; Douglas et al., 2005; Sue & Ritter, 2007). Finally, participants feel that their anonymity is enhanced online because of the social distance created, and are more comfortable and apt to give honest answers on certain topics (Beling et al., 2011; Brindle, Douglas, van Teijlingen, & Hundley, 2005; Katz, Fernandez, Chang, Benoit, & Butler, 2008; Wharton et al., 2003). These advantages may make online data collection appealing, but there are several disadvantages found in the literature as well.

2. Disadvantages of online data collection

Prior work has also highlighted several disadvantages of conducting studies online. First is the lack of control that researchers have over the environment in which the study is taken (Riva, Teruzzi, & Anolli, 2003). This lack of control is perhaps the greatest barrier that researchers need to consider. It is possible that participants could provide false data on demographic questions or take the survey repeatedly (Whitehead, 2007). There is also the possibility that respondents may complete the survey while they are at work, on the bus or train, or in another environment where there are distractions. This lack of control extends to ethical issues that can arise with the anonymity of being online, specifically if a minor were to complete a study under the guise of a legally consenting adult (Holden, Dennie, & Hicks, 2013). Care also must be taken to ensure that respondents are actually human


and are not computer programs (e.g., bots; Prince, Litovsky, & Friedman-Wheeler, 2012). Second, technological issues are a concern for both researchers and participants. For instance, it can be very costly and inefficient for researchers to format or create questionnaires online (Jones et al., 2008a, 2008b). This is especially true for researchers using Mechanical Turk (MTURK), as it requires the use of basic HTML, and learning the nuances of the program can be burdensome. This would be especially problematic if technology issues affect the quality of one's research (Holden et al., 2013). Third, there are issues concerning whether the population accessed in studies is representative of the general population, given that respondents must be computer-literate and financially well off enough to afford an internet connection (Brindle et al., 2005). Finally, there are questions concerning the reliability and validity of measures collected online. The validation literature on surveys conducted online is scant, and it is unclear whether certain scales and surveys are accurate when completed online (Whitehead, 2007).

There are a variety of ways to access the general population online. Two platforms which researchers have used are MTURK and Craigslist (CL).

3. Mechanical Turk

MTURK was introduced by Amazon in 2005 as a "marketplace for work that requires human intelligence" (www.mturk.com). It was originally developed for human computation, that is, for humans to do tasks that were difficult or impossible for computers, such as extracting data from images, audio transcription, and filtering adult content; such tasks are sometimes referred to as microtasks. As such, it also serves as a platform for recruiting and compensating participants for online studies. MTURK allows individuals to create accounts to post various tasks (requestors) for other people to work on for compensation (workers, also referred to as Turkers). Requestors may put any variety of items on MTURK for completion, such as surveys. Compensation is sometimes very low for participants (at least 1¢ must be paid), and many workers have been shown to be intrinsically motivated to complete the tasks, known as Human Intelligence Tasks (HITs). The population on MTURK is very diverse, with more than 500,000 individuals from more than 190 countries (Paolacci & Chandler, 2014), although most workers are from the US and India. Demographically, most workers report being female, having an average age of roughly 32, and making about $30 thousand per year (for a full review see Mason & Suri, 2012).

4. Craigslist

CL has also become a desirable option for researchers attempting to access a more representative sample of the population. CL is a free classified advertisements website started in 1995 that operates in more than 570 cities across more than 50 countries (www.craigslist.org). The site has over 20 billion page views per month, ranking it the 61st most popular website in the world and the 11th most popular in the United States (Alexa, 2015). People interested in buying or selling goods and services can place or respond to ads in categories on CL devoted to specific purposes (e.g., automobiles, jobs, volunteers). CL has recently been used to recruit subjects for studies on smoking cessation (Ybarra, Prescott, & Holtrop, 2014), qualitative interviews with obese individuals (Worthen, 2014), and aggression among youth (Richmond, Cheney, & Soyfer, 2013). Data have also been collected directly through CL on topics such as the sexual behavior of gay men (Grove, Rendina, & Parsons, 2014), youth alcohol use (Siegel et al., 2011), and cigarette smoking among young adults (Ramo, Hall, & Prochaska, 2010).


However, the authors are unaware of any scientific examination of the typical demographics of CL users. Very limited research has been conducted on the efficacy of using CL as a viable data collection medium, and the authors are unaware of any studies that have examined the psychometric properties of any instruments using CL as a data collection mechanism.

5. Prior research comparing online data methods

There have been some studies that examined the integrity of online data collection as a means of administering survey instruments. An oft-cited article by Buhrmester et al. (2011) examined factors on MTURK that influenced participation and any subsequent changes in reliability involving the completion of six personality questionnaires. The Cronbach's alphas ranged from .73 to .93, which the researchers reported as being in the good to excellent range. Johnson and Borden (2012) also looked at six different personality measures and reported reliability differences from .01 to .14 comparing Turkers to a college sample. It should be noted that neither of these studies conducted analyses to statistically compare the reliabilities between the samples, which is certainly a shortcoming of both. Another study (Holden et al., 2013) reported strong 3-week test-retest reliabilities using scores on a measure of the Five Factor Model of personality. A recent study (Rouse, 2015) compared Turkers and a college sample on a personality scale and found that reliabilities were lower for Turkers than for college students. Finally, responses to a body-satisfaction questionnaire were similar to data collected in the laboratory (Gardner, Brown, & Boice, 2012). All of these studies lack quantitative comparisons between what was observed online and what is published in the literature. That is, no statistical analyses have confirmed that online data are no different from FTF data. Further, only a limited scope of psychological domains (i.e., personality and self-perception) has been examined using online collection. Although the use of MTURK for collecting data on personality is not uncommon, there is no clear pattern as to whether the psychometric properties of personality assessments remain intact in an online study (Rouse, 2015).

All of the studies reviewed thus far have focused on MTURK rather than CL, simply because there appears to be no published research on the psychometric properties of data collected via CL. One study was found, however, that compared CL to social networking and email recruitment for its data collection potential. CL was found to be the most cost effective because it is free, but it was also reported to be the least apt to access the desired target population for a survey on tobacco use compared to the other two online sources (Ramo et al., 2010). At present, the authors are unaware of any study that has undertaken a comprehensive investigation of different online collection mediums (e.g., MTURK and CL) compared to a FTF sample and the population parameters for a widely used scale.

In addition to MTURK and CL, conducting research on student samples online is also popular, and validation studies confirming the reliability of that collection method are also necessary. As it stands, the literature generally supports conducting studies on students via the internet using proprietary services such as SurveyMonkey or SurveyGizmo. When internal consistency statistics (i.e., Cronbach's alphas) of scales collected from adolescents online were compared to those from a large FTF sample, there were no statistical differences between the two methods of data collection. In fact, some of the alphas were higher using the online survey (Vosylis, Malinauskienė, & Žukauskienė, 2012). Some recent research has examined the psychometric properties of scales measuring social values completed by a mixed sample of college students and people from the general population in their natural habitat, or Facebook, for the lay reader.

While no analyses were conducted to compare the alphas from the Facebook group and a FTF group, they were descriptively the same (Grieve, Witteveen, & Tolan, 2014). Again, this was not a clean comparison of college students online to a traditional research method, but the acceptable Cronbach's alphas observed for a group mixed of both college students and non-college students suggest that there were no reliability issues with either sample. One study with a repeated measures design varied the data collection method between FTF and online administration of several scales popular in aggression research. It found that test-retest reliability was stronger for the online method than for FTF, and that content responses were the same across groups (Brock et al., 2015). This was one of the better controlled studies comparing measurement equivalence, and it was valuable in that it emphasized the need for validation of scales within limited domains. However, it was lacking in that there were no comparisons of internal consistency between the methods.

6. WHOQOL-BREF

In an effort to compare different methods of data collection, it is essential to select an instrument that has a solid psychometric foundation. Certainly, the WHOQOL-BREF qualifies as such an instrument. There are many quality of life measures, but few provide as much data across a wide range of cultures, languages, countries, and populations as the quality of life instruments developed by the World Health Organization. The WHOQOL-BREF was developed as a brief version of the World Health Organization Quality of Life assessment, referred to as the WHOQOL-100. The WHOQOL-BREF validation used a cross-sectional design in 23 countries to assess the psychometric properties of the instrument (Skevington, Lotfy, & O'Connell, 2004). The development of the instrument examined responses from over 10,000 participants and found sound internal consistency as well as discriminant and construct validity, suggesting that it is a sound cross-cultural instrument measuring four dimensions of quality of life: physical, psychological, social, and environmental domains. Since it was developed, the WHOQOL-BREF has been used in hundreds of studies. Because it is a reliable and valid cross-cultural measure of quality of life, the WHOQOL-BREF was thought to be an ideal instrument for comparing different methods of data collection.

The present study was designed to test whether collecting data online is appropriate for a psychometrically sound quality of life scale. Samples were collected using four different methodologies: MTURK, CL, an online survey using SurveyMonkey completed by college students, and a traditional paper-and-pencil face-to-face format completed by college students. Most of the research sampling from MTURK and college students has made reliability comparisons to FTF students sampled simultaneously (e.g., Feitosa, Joseph, & Newman, 2015; Grieve et al., 2014). These studies assumed that the psychometric properties of the scales would be stable and that the face-to-face method would be no different from previously published scale validation studies. The present study collected data face-to-face rather than making that assumption, and made all internal consistency comparisons to a large-scale validation study on the WHOQOL-BREF.

It should be stated that variance found between the samples could have come from two sources: the collection methodology (i.e., collecting data online) and differences in samples (i.e., college students versus the general population). The assumption of this study was that the WHOQOL-BREF was valid


both for college and non-college adult samples, or the general population. Therefore, because the WHOQOL-BREF has demonstrated outstanding cross-cultural applications across very diverse samples (e.g., Skevington et al., 2004), it was assumed that observed variance would be due to the collection method. Further, this study was not intended to compare reliability between data collection methods, but rather to validate the collection methods independently of one another. The simultaneous presentation of results below is a product of feasibility and aesthetics; it is not meant to imply or invite comparisons or some ordinal ranking of the methods. Therefore, it was predicted that if a particular data collection method is reasonable for conducting scale-based research, then internal consistencies and scale factor structures should not differ from what has previously been reported in the literature.


7. Method

7.1. Participants

7.1.1. Mechanical Turk
A total of 258 participants responded to a HIT placed on MTURK for participation in a research study; however, nine participants did not provide viable data, so the final sample size was 249. Participants were credited 10¢ to their MTURK account upon completion. Demographic information for all four samples is presented in Table 1. The sample had a mean age of 33.25 (SD = 10.57), was relatively evenly split between genders, primarily reported being either Asian or Caucasian, and mostly held a Bachelor's degree.

7.1.2. Craigslist
A total of 189 participants responded to a SurveyMonkey.com link placed in the "volunteers" section on CL in Boston, MA; Chicago, IL; Los Angeles, CA; and Philadelphia, PA. Seventeen participants did not provide any viable data, so the final sample size was 172. CL participants did not receive any compensation. The sample had a mean age of 34.88 (SD = 14.50), was mostly female, primarily reported being Caucasian, and was relatively mixed between having a high school diploma, Bachelor's degree, or graduate degree.

7.1.3. Online college students
A total of 237 undergraduate students responded to a SurveyMonkey.com link to the survey placed on the Psychology Department's online research recruitment web platform. The student participants received extra credit for their participation. Eleven participants did not provide any viable data, so the final sample size was 226. The sample had a mean age of 19.55 (SD = 2.13), was mostly female, and primarily reported being Caucasian and having a high school diploma.

7.1.4. Face-to-Face
A total of 246 undergraduate students were recruited from General Psychology courses. The student participants received extra credit for their participation and completed the surveys in a group setting, with groups ranging from 8 to 24. The sample had a mean age of 18.71 (SD = 1.89), was mostly female, primarily reported being Caucasian, and almost all reported having a high school diploma.

7.2. Measures

7.2.1. World Health Organization Quality of Life brief scale
The WHOQOL-BREF (Skevington et al., 2004) is a 26-item scale measuring level of agreement with various statements regarding quality of life. There are two general QOL questions (e.g., How would you rate your quality of life?) on a scale from 1 to 5 (Very poor to Very good), and four domains captured in the remaining 24 questions: physical health (e.g., Do you have enough energy for everyday life?), psychological (e.g., How much do you enjoy life?), social relationships (e.g., How satisfied are you with your personal relationships?), and environment (e.g., How safe do you feel in your daily life?). The possible scores for each domain are as follows: physical health 7 to 35, psychological 6 to 30, social relationships 3 to 15, and environment 8 to 40. Higher scores indicate a higher level of quality of life within each particular domain. Internal consistency estimates for each domain are as follows: physical health α = .82, psychological α = .81, social relationships α = .68, and environment α = .80 (Skevington et al., 2004).

7.3. Procedure
In all four collection methods, participants were given a survey that contained a demographics questionnaire and the WHOQOL-BREF.
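To make the scoring and reliability computations concrete, the following is a minimal Python sketch (not the authors' code) of how domain scores and Cronbach's alphas might be computed, assuming a pandas DataFrame with hypothetical columns q1 through q26 holding the 1-5 item responses. The item-to-domain key and the three reverse-scored items follow the standard published WHOQOL-BREF scoring instructions.

    import pandas as pd

    # Standard WHOQOL-BREF item-to-domain key; items 3, 4, and 26 are
    # reverse-scored (x -> 6 - x) before summing.
    DOMAINS = {
        "physical":      [3, 4, 10, 15, 16, 17, 18],
        "psychological": [5, 6, 7, 11, 19, 26],
        "social":        [20, 21, 22],
        "environment":   [8, 9, 12, 13, 14, 23, 24, 25],
    }
    REVERSED = (3, 4, 26)

    def cronbach_alpha(items: pd.DataFrame) -> float:
        # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

    def score_domains(df: pd.DataFrame) -> pd.DataFrame:
        recoded = df.copy()
        for q in REVERSED:
            recoded[f"q{q}"] = 6 - recoded[f"q{q}"]
        scores = {}
        for name, items in DOMAINS.items():
            cols = recoded[[f"q{i}" for i in items]]
            scores[name] = cols.sum(axis=1)  # e.g., physical health ranges 7-35
            print(f"{name}: alpha = {cronbach_alpha(cols):.2f}")
        return pd.DataFrame(scores)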

Table 1
Sociodemographic variables for the four samples.

Variable                MTURK (n = 249)          Craigslist (n = 172)     Online CS (n = 226)      Face-to-Face (n = 246)
Age                     M = 33.25 (SD = 10.76)   M = 34.88 (SD = 14.50)   M = 19.55 (SD = 2.13)    M = 18.72 (SD = 1.89)
Gender
  Male                  130 (52.2%)              46 (26.7%)               56 (24.8%)               102 (41.5%)
  Female                117 (47.0%)              124 (72.1%)              161 (71.2%)              144 (58.5%)
Ethnicity
  African American      12 (4.8%)                17 (9.9%)                28 (12.4%)               28 (11.4%)
  Asian                 105 (42.2%)              10 (5.8%)                3 (1.3%)                 –
  Caucasian             115 (46.2%)              114 (66.3%)              173 (76.5%)              205 (83.3%)
  Hispanic/Latino       4 (1.6%)                 16 (9.3%)                10 (4.4%)                8 (3.3%)
  Native American       10 (4.0%)                8 (4.7%)                 1 (.4%)                  2 (.8%)
  Other                 3 (1.2%)                 5 (2.9%)                 5 (2.2%)                 3 (1.2%)
Education
  Some High School      9 (3.6%)                 7 (4.1%)                 1 (.4%)                  –
  High School Diploma   61 (24.5%)               48 (27.9%)               204 (90.3%)              246 (99.6%)
  Associate's Degree    18 (7.2%)                21 (12.2%)               7 (3.1%)                 –
  Bachelor's Degree     121 (48.6%)              53 (30.8%)               7 (3.1%)                 1 (.4%)
  Graduate Degree       40 (16.1%)               43 (25.0%)               –                        –

Note. Some demographic data were incomplete and percentages may not sum to 100%. MTURK = Mechanical Turk; Online CS = online college students; – indicates that no data were reported for this variable.


They were instructed that they were to complete the survey assessing their psychological well-being, so as not to influence their responses, and received a debriefing statement upon completion that explained the true nature of the study. The two college groups received extra credit and the MTURK group received 10¢ for their participation, whereas the Craigslist group did not receive any form of incentive. It was assumed that the small value of the incentives would not be related to the quality of responses, as has been reported in prior studies (e.g., O'Neil & Penrod, 2001). The only other differences between the collection methods were the recruitment method and the survey administration medium. The online participants, who responded from their respective platforms, were redirected to a SurveyMonkey.com link that contained the survey, while FTF participants were given a paper packet instead.

8. Results

Table 1 shows the sample sizes and sociodemographics for each collection method. As can be seen, the MTURK and CL samples provide a much more diverse distribution of age, ethnicity, and education compared to the college samples. Also of note is the large percentage of people who identify as Asian in the MTURK sample relative to the Craigslist sample.

8.1. Time

A simple metric of the feasibility of a data collection method is how long it takes to gather a large enough sample. It should be noted that the authors actively worked to promote each study both to the student population (e.g., advertising in classrooms and directly to students through email) and to the CL population (e.g., constantly updating the advertisements to keep them at the top of the page in multiple cities); there was no way to actively promote the MTURK study, as the interface relies on the market to work itself out. It took 10, 15, 78, and 126 days to collect large enough samples for FTF, MTURK, OCS, and CL, respectively. Thus, traditional, active recruitment methods for a FTF format in an academic setting were the most time effective, followed closely by MTURK, which required much less effort and only a relatively small fee. The OCS and CL samples were much less efficient, requiring constant updates and reminders for little time benefit compared to FTF and MTURK.

8.2. Reliability

One of the main purposes of this study was to compare the internal consistency of the WHOQOL-BREF collected across different methodologies to what was reported when the scale was initially developed. To test this, Feldt tests (Feldt, 1969) were conducted to compare the observed Cronbach's alphas from the four samples to the Cronbach's alphas reported in a large-sample validation study (Skevington et al., 2004). Feldt tests are tests of equality of Cronbach's alphas between samples and have the same assumptions as a two-factor ANOVA (Feldt, 1969): the samples must be independent, the populations sampled from must be normally distributed, and the samples' variances must be homogeneous. The data were analyzed with an Excel spreadsheet that calculates the W statistic and p value (Suen, 2009). Because each sample was compared to a Cronbach's alpha published in a validation study (Skevington et al., 2004), the samples were independent. Tests of normality and homogeneity of variance were conducted, which did yield some significant results.
Theoretical and simulation studies have shown that violations of normality are of little consequence when mild (i.e., skewness between −1.0 and +1.0; see Bachard & Hakstian, 1997; Woodruff & Feldt, 1986). In the current study, skewness ranged from −.81 to .06 across samples and measures. Although there were incidences of heteroscedasticity, Novick and Lewis (1967) showed that heterogeneous variances do not invalidate coefficient alpha. Furthermore, Hakstian and Whalen (1976) indicated that the homoscedasticity assumption may be important with sample sizes less than 50, and that the test is fairly robust with larger sample sizes. Kim and Feldt (2008) suggested that differences in sample variances may be due to external variables, such as the duration it took individuals to complete the instrument or whether it was completed in the presence of the experimenter. The authors further recommended that Feldt tests be conducted on unadjusted coefficients, because adjusting for differences in sample variances would require a unique sampling theory, which in turn would be related to revisions of the instruments. Thus, the present study did find assumption violations, and that should be kept in mind, although prior research suggests that results computed using Feldt tests under these violations are still meaningful.

Table 2 presents the observed Cronbach's alphas for each scale across the four samples as well as the Cronbach's alphas for comparison from the original WHOQOL-BREF scale development study (Skevington et al., 2004). Because multiple comparisons were being made to the same alpha, the rejection level was adjusted to .0125 using a Bonferroni correction to control against Type I error. The results showed a consistent pattern. For the FTF sample, the domains of physical health and social relationships had lower Cronbach's alphas compared to the values reported in the scale development study. In addition, there was a significantly higher Cronbach's alpha for the social relationships dimension for the MTURK group relative to the comparison study. Thus, all of the internal consistencies for the MTURK, CL, and online student groups were within what would be expected by chance, or better, compared to the values provided by the Skevington et al. (2004) study, whereas the FTF condition provided the worst results, with two dimensions yielding values lower than expected.

One assumption of this study was that differences in reliabilities would be due to collection method rather than the population sampled, because samples were collected both from college adults and non-college adults. Therefore, Feldt tests were conducted to compare the Cronbach's alphas between the online and FTF college students. All of the Cronbach's alphas for the OCS were higher than those for the FTF students, but only the alphas for the physical health and social relationships domains were significantly higher, W = .71, p = .006, and W = .66, p = .0009, respectively; for the psychological and environmental domains, W = .82, p = .07, and W = .87, p = .15, respectively. Although only two of the comparisons were significantly different between online and FTF students, the Cronbach's alphas for FTF were, in general, low, and additional variance caused by methodological differences (e.g., presence of an experimenter in FTF, group vs. individual settings, environmental and psychological distractions in both conditions) could have made the differences between the two groups less consistent. In general, however, it appears that differences were due to methodologies rather than samples, given the high reliability of all online samples.
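For readers who want to replicate this style of analysis, below is a rough Python sketch of a Feldt-style comparison of two independent Cronbach's alphas. The authors used Suen's (2009) spreadsheet; this sketch uses one common formulation in which W = (1 − α1)/(1 − α2) is referred to an F distribution with (n1 − 1, n2 − 1) degrees of freedom, so the W values it produces need not match those in Table 2 exactly.

    from scipy import stats

    def feldt_test(alpha1: float, n1: int, alpha2: float, n2: int):
        # W is referred to an F distribution; return W and a two-tailed p value.
        w = (1 - alpha1) / (1 - alpha2)
        p_lower = stats.f.cdf(w, n1 - 1, n2 - 1)
        return w, 2 * min(p_lower, 1 - p_lower)

    # Example: FTF physical health alpha vs. the Skevington et al. (2004)
    # value, judged against the Bonferroni-adjusted level of .05 / 4 = .0125.
    w, p = feldt_test(0.72, 246, 0.82, 11830)
    print(f"W = {w:.2f}, p = {p:.4f}, significant: {p < 0.0125}")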
8.3. Validity

In addition to comparing the internal consistency of the scales to what was previously reported in the literature, confirmatory factor analyses were conducted on the WHOQOL-BREF. The original validation study tested four- and six-factor models and slightly favored the four-factor model; thus, structural equation modeling (i.e., LISREL) was used here to test whether the data fit the expected four-factor loading model (Skevington et al., 2004).


Table 2
Comparisons between Cronbach's alphas on the WHOQOL-BREF observed across samples and from the literature using Feldt tests.

             Population        MTURK                   Craigslist              Online student          Face-to-face
Measure      α      N          α      N      W         α      N      W         α      N      W         α      N      W
Phys         .82    11,830     .78    242    .59       .86    148    .93       .80    214    .65       .72*   246    .46
Psych        .81    11,830     .81    242    .68       .89    148    1.18      .86    214    .93       .83    246    .76
Social       .68    11,830     .79*   242    1.48      .75    148    1.24      .73    214    1.15      .59*   246    .76
Enviro       .80    11,830     .83    242    .94       .80    148    .80       .80    214    .80       .77    246    .70

Note. Values marked with * (boldface in the original) represent alphas that are significantly different from the population value at p < .0125. Phys = Physical health domain; Psych = Psychological domain; Social = Social relationships domain; Enviro = Environmental domain; MTURK = Mechanical Turk.

Fig. 1. WHOQOL-BREF 4-domain confirmatory factor model.

Fig. 1 shows the model of the factor structure that was tested, and fit indices for each collection method are presented in Table 3. Although recommendations for which fit indices to use are debated, a few tend to be reported regularly throughout the literature, with the following specifications for reasonable fit: χ²/df between 2 and 3, upper bound of the Root Mean Square Error of Approximation (RMSEA) confidence interval < .08, Standardized Root Mean Square Residual (SRMR) < .08, Comparative Fit Index (CFI) ≥ .95, and Non-Normed Fit Index (NNFI) ≥ .95 (Shreiber, Nora, Stage, Barlow, & King, 2006). As can be seen in Table 3, the fit indices were reasonable for all data collection methods relative to what was reported by Skevington et al. (2004), and fit indices were fairly consistent across methods.
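The CFAs reported here were run in LISREL. As an illustration only, a comparable four-factor model could be specified in Python with the semopy package (an assumption on our part; any SEM package with lavaan-style syntax would do), reusing the hypothetical q1-q26 columns from the scoring sketch above.

    import pandas as pd
    import semopy

    # Four latent domains, each measured by its WHOQOL-BREF items.
    FOUR_FACTOR = """
    physical      =~ q3 + q4 + q10 + q15 + q16 + q17 + q18
    psychological =~ q5 + q6 + q7 + q11 + q19 + q26
    social        =~ q20 + q21 + q22
    environment   =~ q8 + q9 + q12 + q13 + q14 + q23 + q24 + q25
    """

    def fit_whoqol_cfa(df: pd.DataFrame) -> pd.DataFrame:
        model = semopy.Model(FOUR_FACTOR)
        model.fit(df)
        # calc_stats reports chi-square, RMSEA, CFI, and TLI (the NNFI),
        # which can be checked against the cutoffs listed above.
        return semopy.calc_stats(model)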

9. Discussion

Four different data collection methods were tested on whether they can produce psychometrically sound data on a frequently used measure of quality of life. This study extends the narrowly focused literature on using an online arena for data collection, which has primarily been concerned with personality scales (e.g., Buhrmester et al., 2011; Feitosa et al., 2015; Holden et al., 2013). This study is also the first, to the authors' knowledge, to test the quality of data collected directly from participants on CL.


Table 3
Fit indices from confirmatory factor analysis of the WHOQOL-BREF across the samples.

Fit index    Population   MTURK     CL        OCS       FTF
N            5133         242       148       214       246
χ²           6830.80      766.07    519.27    547.47    540.95
χ²/df        27.4         3.08      2.11      2.22      2.20
RMSEA        .07          .09       .08       .08       .07
CFI          .86          .94       .95       .96       .94
SRMR         NR           .08       .07       .07       .06
NNFI         NR           .93       .95       .95       .94

Note. Population = data from Skevington et al. (2004), split-half sample A; MTURK = Mechanical Turk; CL = Craigslist; OCS = online college students; FTF = face-to-face; RMSEA = Root Mean Square Error of Approximation; CFI = Comparative Fit Index; SRMR = Standardized Root Mean Square Residual; NNFI = Non-Normed Fit Index; NR = not reported.

Finally, responses from college students were examined to check the status quo of using college students as subjects and generalizing those results to the general population. Overall, it appears that all methods are viable mechanisms for collecting data using the WHOQOL-BREF. Compared to the Cronbach's alphas initially published in the literature on the instrument (Skevington et al., 2004), only subjects in the FTF condition provided values significantly lower than expected. This is a curious finding, and of particular concern are the differences in the pattern of responses between the two college samples. In essence, the online college student sample provided internal consistencies similar to the originally reported data, as did the MTURK and CL samples, suggesting that a college student population may be used to generalize to the broader population, at least on this measure of quality of life. This finding may offer support for investigators (Ahern, 2005; Buhrmester et al., 2011; Douglas et al., 2005; Jones et al., 2008a, 2008b; Sue & Ritter, 2007) who indicated the benefits to subjects who participate in research studies using an online format. Namely, they indicated that subjects complete surveys at their own pace without perceived time pressure from a researcher or others, and show an increase in interest because of the ease and novelty of participating in a study online on their own accord.

Examination of the participants reveals an interesting pattern of who began the study and who actually completed it. Listwise deletion was used, so if a respondent did not complete the entire survey, their data were not used (see the sketch below). The completion rates for the MTURK, CL, online student, and FTF groups were 242/258 (93.8%), 148/189 (78.3%), 214/237 (90.3%), and 246/246 (100%), respectively. The FTF condition had a 100% completion rate. Although not directly tested, this may suggest that in a group setting surrounded by other study participants and a researcher, subjects in this condition may have completed the instrument under some type of time pressure or constraint. If others around them completed the survey first, they may have responded by completing the instrument faster than they would have in a different testing environment. It may be the case that when college students are required to physically show up at a given time and location in order to participate in research, the quality of the data they provide is worse than if they completed the instrument online. This somewhat reflects the call of Sears (1986) and Henry (2008) that sampling only college students is problematic. It may not be that sampling college students is problematic, but how college students complete instruments like the WHOQOL-BREF may be cause for concern. This may not have been the case 20 years ago, when fewer college students had internet access. Now, college students can complete these types of surveys from their phones and may not even need access to a personal computer or computer lab, perhaps making it even more inconvenient for them to complete a survey in a FTF format. This notion is further supported when considering how long it took to collect large enough samples.
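As a worked illustration of the completion rates above (a sketch, not the authors' code), listwise deletion amounts to dropping any respondent with a missing item before analysis:

    import pandas as pd

    # Counts reported in the text: respondents who started vs. completed.
    started   = {"MTURK": 258, "CL": 189, "Online CS": 237, "FTF": 246}
    completed = {"MTURK": 242, "CL": 148, "Online CS": 214, "FTF": 246}

    for group in started:
        print(f"{group}: {completed[group]}/{started[group]} = "
              f"{completed[group] / started[group]:.1%}")

    # Listwise deletion on a hypothetical response DataFrame `df`:
    # complete_cases = df.dropna(axis=0, how="any")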

Despite the quickness and relative ease of sampling the college students FTF compared to sampling them online, the FTF data were much less reliable, suggesting that while it was a convenient time for them to take a survey for some quick extra credit points, they did not answer as diligently, but rather responded arbitrarily to finish quickly and leave. The differences in attitudes and motivation of FTF versus online college student responding, and their psychometric effects, should be investigated in more depth. In addition, because there was a researcher present in the FTF condition, some degree of anonymity was lost, and the subjects may have perceived some sort of obligation to complete the survey rather than leave it incomplete, as others did in all three other conditions. There is evidence suggesting that anonymity is increased in an online environment because of the social distance created, and that participants are more likely to provide honest answers (Beling et al., 2011; Brindle et al., 2005; Katz et al., 2008; Wharton et al., 2003). The findings from this study lend support to the potential importance of anonymity.

Another issue related to the completion rate is incentive. As noted, only the CL group did not have an incentive; the other three groups had a very modest incentive. Although not directly tested, it appears that incentive was not related to the quality of data (CL), although there does appear to be a lower rate of completion relative to the other groups. However, the data that were collected came from a diverse group of individuals and were psychometrically sound.

In addition to the Cronbach's alpha comparisons, the factor structure of the WHOQOL-BREF was tested to ensure that the observed data fit its proposed four-factor model. The results of the confirmatory factor analyses were straightforward and revealed that data from all groups had reasonable fit and were consistent with what was initially reported by Skevington et al. (2004).

This study may be the first to test whether CL can be used as a viable option for collecting survey data in a scientific manner. Ramo et al. (2010) did not find CL to be effective in reaching their target population compared to other approaches. Consistent with their troubles sampling a targeted population, even with constant monitoring and updating according to CL's policies and procedures, it took a very long time to collect a large enough sample. This is likely because traffic to the volunteers section of CL is considerably lower than to the buying and selling sections, and as much as the authors tried to encourage traffic to the advertisement for this study, it reached only a small number of people. In today's competitive market, time is incredibly valuable, and it might not be feasible to wait more than four months to finish data collection. Another obvious shortcoming of the present study is that there is no easy way to consistently confirm the demographic information reported by the participants. This issue revisits the greatest pitfall of online data collection; that is, researchers have no control over participants' responses (Whitehead, 2007). Certainly, many of the questions that could be asked regarding MTURK could apply to CL as well. Of particular interest is that CL participants did not receive any compensation for participating, but still provided generally reliable data, although the time it took to collect a large enough sample was considerably longer than for any other method. A recommendation for researchers interested in putting studies on CL is to choose many cities to sample from and cycle through them, as CL limits the number of simultaneous advertisements a single email address can post.

In general, the use of MTURK to collect data has strong support from this study.


It only took a few more days to collect a large sample than aggressively recruiting FTF college students, time that was easily made up when considering the time and error cost of manually inputting data versus downloading it directly into Excel or SPSS. Further, the data collected had good reliability compared to the literature and replicated the expected factor structure of the WHOQOL-BREF. This confirms previous studies reporting high data quality on MTURK (e.g., Buhrmester et al., 2011; Holden et al., 2013). However, understanding what factors affect the total time needed to collect a large enough sample, and how those factors affect data quality, should be investigated in much more depth, as this is missing from the literature.

There are a few additional limitations that should be pointed out regarding the present study. First, the sample sizes were relatively small for CFA, especially for CL, owing to difficulties collecting large samples and high dropout rates. Therefore, caution should be exercised when interpreting the results of the CFAs, and they should be replicated with larger samples, although the fit indices were similar to, or in some cases better than, what was reported with much larger sample sizes. Second, differences between means were not examined. This was not done because the research questions driving this project concerned the psychometric properties of the WHOQOL-BREF, not differences in scores between the groups. Comparing mean responses on the scales between samples introduces the possibility of re-framing the research question into something like, "Do MTURK users have higher levels of psychological quality of life compared to a norm or those using CL?" There were theoretical underpinnings to the present study, so the analyses were directed accordingly. Third, it is possible that some MTURK and CL respondents were the same individuals. Although they are separate websites, respondents could have participated on both platforms. Fourth, incentive was not the same across the four conditions. Although the incentive was modest, an incentive was offered to three of the groups. The incentive appeared to be related to completion rate, but not necessarily to the psychometrics of the instrument.

The present study was conducted to test how different methods of data collection relate to the psychometric properties of a frequently used instrument measuring quality of life, in this case the WHOQOL-BREF. The data suggest that online formats using MTURK, CL, and SurveyMonkey.com for college students all appear to provide data similar to what was reported in the large international study that initially developed the instrument. This study represents a step, which can be built upon, toward ensuring that quality of life data collected via an online platform are in concert with prior studies that collected data in a FTF format. This is important because studies using online data collection can then attribute differences to factors of interest rather than to the method of data collection. Another aspect to consider is that there do not appear to be differences between college students and the general population as long as data are collected in some type of online format, whereas differences do appear when college students complete the instrument in a FTF format. It may be the case that online data collection actually yields higher quality data for college students, at least on this measure of quality of life.

References

Ahern, N. R. (2005). Using the internet to conduct research. Nurse Researcher, 13, 55–70.
Alexa. (2015). Top sites: United States. Retrieved 5 April 2015, from http://www.alexa.com/topsites/countries/US.
Bachard, K. A., & Hakstian, A. R. (1997). The robustness of confidence intervals for coefficient alpha under violation of essential parallelism. Multivariate Behavioral Research, 32, 169–191.
Beling, J., Libertini, L. S., Sun, Z., Masina, V. M., & Albert, N. M. (2011). Predictors for electronic survey completion in healthcare research. Computers, Informatics, Nursing, 29, 297–301.
Brindle, S., Douglas, F., van Teijlingen, E., & Hundley, V. (2005). Midwifery research: questionnaire surveys. Midwives, 8, 156–158.


Brock, R. L., Barry, R. A., Lawrence, E., Rolffs, J., Cerretani, J., & Zarling, A. (2015). Online administration of questionnaires assessing psychological, physical, and sexual aggression: establishing psychometric equivalence. Psychology of Violence, 5, 294–304.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
Douglas, F., van Teijlingen, E., Brindle, S., Hundley, V., Bruce, J., & Torrance, N. (2005). Designing questionnaires for midwifery research. Midwives, 8, 212–215.
Feitosa, J., Joseph, D. A., & Newman, D. A. (2015). Crowdsourcing and personality measurement equivalence: a warning about countries whose primary language is not English. Personality and Individual Differences, 75, 47–52.
Feldt, L. (1969). A test of the hypothesis that Cronbach's alpha or Kuder-Richardson coefficient twenty is the same for two tests. Psychometrika, 34, 363–373.
Gardner, R. M., Brown, D. L., & Boice, R. (2012). Using Amazon's Mechanical Turk website to measure accuracy of body size estimation and body dissatisfaction. Body Image, 9, 532–534.
Granello, D. H., & Wheaton, J. E. (2004). Online data collection: strategies for research. Journal of Counseling and Development, 82, 387–393.
Gregory, V. L., & Pike, C. K. (2011). Missing data: a comparison of online and classroom data collection methods with social work students. Journal of Social Service Research, 38, 351–365.
Grieve, R., Witteveen, K., & Tolan, A. G. (2014). Social media as a tool for data collection: examining equivalence of socially value-laden constructs. Current Psychology, 33, 532–544.
Grove, C., Rendina, J. H., & Parsons, J. T. (2014). Comparing three cohorts of MSM sampled via sex parties, bars/clubs, and Craigslist.org: implications for researchers and providers. AIDS Education and Prevention, 26, 362–382.
Hakstian, A. R., & Whalen, T. E. (1976). A k-sample significance test for independent alpha coefficients. Psychometrika, 41, 219–231.
Henry, P. J. (2008). College sophomores in the laboratory redux: influences of a narrow data base on social psychology's view of the nature of prejudice. Psychological Inquiry, 19, 49–71.
Holden, C. J., Dennie, T., & Hicks, A. D. (2013). Assessing the reliability of the M5-120 on Amazon's Mechanical Turk. Computers in Human Behavior, 29, 1749–1754.
Internet World Stats. (2012). Internet users in the world. Retrieved 5 April 2015, from www.internetworldstats.com/stats2.
Johnson, D. R., & Borden, L. A. (2012). Participants at your fingertips: using Amazon's Mechanical Turk to increase student-faculty collaborative research. Teaching of Psychology, 39, 245–251.
Jones, S., Murphy, F., Edwards, M., & James, J. (2008a). Using online questionnaires to conduct nursing research. Nursing Times, 104, 66–69.
Jones, S., Murphy, F., Edwards, M., & James, J. (2008b). Doing things differently: advantages and disadvantages of web questionnaires. Nurse Researcher, 15, 15–26.
Katz, N., Fernandez, K., Chang, A., Benoit, C., & Butler, S. F. (2008). Internet-based survey of nonmedical prescription opioid use in the United States. The Clinical Journal of Pain, 24, 528–535.
Kerin, R. A., & Peterson, R. A. (1977). Personalization, respondent anonymity, and response distortion in mail surveys. Journal of Applied Psychology, 62, 86–89.
Kim, S., & Feldt, L. S. (2008). A comparison of tests of equality of two or more independent alpha coefficients. Journal of Educational Measurement, 45, 179–193.
Loescher, L. J., Hibler, E., Hiscox, H., Hla, H., & Harris, R. B. (2011). Challenges of using the internet for behavioral research. Computers, Informatics, Nursing, 29, 445–448.
Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods, 44, 1–23.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1–13.
O'Neil, K. M., & Penrod, S. D. (2001). Methodological variables in Web-based research may affect results: sample type, monetary incentives, and personal information. Behavior Research Methods, Instruments, and Computers, 33, 226–233.
Paolacci, G., & Chandler, J. (2014). Inside the Turk: understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23, 184–188.
Prince, K. R., Litovsky, A. R., & Friedman-Wheeler, D. G. (2012). Internet-mediated research: beware of bots. The Behavior Therapist, 35, 85–88.
Ramo, D. E., Hall, S. M., & Prochaska, J. J. (2010). Reaching young adult smokers through the internet: comparison of three recruitment mechanisms. Nicotine & Tobacco Research, 12, 768–775.
Richmond, T. S., Cheney, R., Soyfer, L., et al. (2013). Recruitment of community-setting youth into studies on aggression. Journal of Community Psychology, 41, 425–434.
Riva, G., Teruzzi, T., & Anolli, L. (2003). The use of the internet in psychological research: comparison of online and offline questionnaires. CyberPsychology & Behavior, 6, 73–80.
Rouse, S. V. (2015). A reliability analysis of Mechanical Turk data. Computers in Human Behavior, 43, 304–307.
Sears, D. O. (1986). College sophomores in the laboratory: influences of a narrow data base on social psychology's view of human nature. Journal of Personality and Social Psychology, 51, 515–530.
Senior, A. C., Kunik, M. E., Rhoades, H. M., Novy, D. M., Wilson, N. L., & Stanley, M. A. (2007). Utility of telephone assessments in an older population. Psychology and Aging, 22, 392–397.


Shreiber, J. B., Nora, A. M., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: a review. The Journal of Educational Research, 99, 323–337.
Siegel, M., DiLoreto, J., Johnson, A., Fortunato, E. K., & DeJong, W. (2011). Development and pilot testing of an internet-based survey instrument to measure the alcohol brand preferences of U.S. youth. Alcoholism: Clinical and Experimental Research, 35, 765–772.
Skevington, S. M., Lotfy, M., & O'Connell, K. A. (2004). The World Health Organization's WHOQOL-BREF quality of life assessment: psychometric properties and results of the international field trial. A report from the WHOQOL group. Quality of Life Research, 13, 299–310.
Stewart, S. (2003). Casting the net: using the internet for survey research. British Journal of Midwifery, 11, 543–546.
Sue, V. M., & Ritter, L. A. (2007). Conducting online surveys. Thousand Oaks, CA: Sage Publications.
Suen, H. K. (2009). Significance tests to compare two independent Cronbach alpha values [Software]. Unpublished instrument. Retrieved from www.bgu.ac.il/~baranany/Feldt.xls.
Vosylis, R., Malinauskienė, O., & Žukauskienė, R. (2012). Comparison of internet-based versus paper-and-pencil administered assessment of positive development indicators in adolescents' sample. Psichologija, 45, 7–21.
Wharton, C. M., Hampl, J. S., Hall, R., & Winham, D. M. (2003). PCs or paper-and-pencil: online surveys for data collection. Journal of the American Dietetic Association, 103, 1458–1460.
Whitehead, L. C. (2007). Methodological and ethical issues in internet-mediated research in the field of health: an integrated review of the literature. Social Science and Medicine, 65, 782–791.
Woodruff, D. J., & Feldt, L. S. (1986). Tests for equality of several alpha coefficients when their sample estimates are dependent. Psychometrika, 51, 393–413.
Worthen, M. G. F. (2014). An invitation to use Craigslist ads to recruit respondents from stigmatized groups for qualitative interviews. Qualitative Research, 14, 371–383.
Ybarra, M. L., Prescott, T. L., & Holtrop, J. S. (2014). Steps in tailoring a text messaging-based smoking cessation program for young adults. Journal of Health Communication, 19, 1393–1407.