Digital game-based second-language vocabulary learning and conditions of research designs: A meta-analysis study


Computers & Education 125 (2018) 345–357

Contents lists available at ScienceDirect

Computers & Education journal homepage: www.elsevier.com/locate/compedu

Digital game-based second-language vocabulary learning and conditions of research designs: A meta-analysis study


Yu-Ling Tsai a, Chin-Chung Tsai b,c,∗

a Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, #43, Sec. 4, Keelung Rd., Taipei 106, Taiwan
b Program of Learning Sciences, National Taiwan Normal University, Taipei City, Taiwan
c Institute for Research Excellence in Learning Sciences, National Taiwan Normal University, Taipei City, Taiwan

ARTICLE INFO

Keywords: Applications in subject areas; Evaluation of CAL systems; Improving classroom teaching; Pedagogical issues

ABSTRACT

Second language (L2) vocabulary learning has been deemed a daunting task for many students. This meta-analysis study aimed to explore the effectiveness of applying digital games to L2 vocabulary learning. A total of 26 published studies (2001–2017) conformed to the inclusion/exclusion criteria. Given the diverse findings of previous meta-analysis research in the field, we propose a framework of four conditions of research designs to differentiate the empirical studies, in an attempt to uncover possible explanations for the diversity and to connect specific learning mechanisms with the research evidence. The overall effect sizes of the studies in the four conditions are reported as follows: large for Condition 1 (10 studies; experimental groups playing digital games versus control groups receiving alternative activities), medium for Condition 2 (10 studies; experimental groups playing digital games with a feature added or changed versus control groups playing base-version games), medium to large for Condition 3 (two studies; experimental groups playing digital games versus control/comparison groups receiving identical content via conventional means), and non-significant for Condition 4 (four studies; all participants playing the same digital games but grouped by a non-game-related variable). Next, a structure diagram is developed in which the four conditions of research design are connected with their respective game-related factors based on their locality. Further, we conducted moderator analyses to examine how eight potential moderator variables (game design, educational level, L2 proficiency level, linguistic distance, intervention setting, assessment type, game source and intervention length) influenced the effect sizes in Conditions 1 and 2 to illustrate various digital game-based L2 vocabulary learning scenarios.
Finally, suggestions and implications are provided for game designers, educational practitioners, and researchers in the field.

1. Introduction

The importance of vocabulary knowledge in language learning has been reiterated (Ghanbaran & Ketabi, 2014; Saville-Troike, 1984). It is estimated that good comprehension of a written text requires coverage of at least 95%–99% of its lexical items (Laufer & Ravenhorst-Kalovski, 2010; I. P. Nation & Waring, 1997; I. S. Nation, 2001). That is, there should be fewer than five words that are unfamiliar to the reader in a 100-word paragraph, and 8000–9000 word families are needed for an adult's general reading



Corresponding author. 162, Section 1, Heping E. Rd., Taipei City 106, Taiwan. E-mail addresses: [email protected] (Y.-L. Tsai), [email protected], [email protected] (C.-C. Tsai).

https://doi.org/10.1016/j.compedu.2018.06.020 Received 19 August 2017; Received in revised form 23 April 2018; Accepted 20 June 2018

Available online 21 June 2018 0360-1315/ © 2018 Published by Elsevier Ltd.


(Schmitt, 2008). With such a large vocabulary size required, second or foreign language (L2) learners are often frustrated and bored in the process of vocabulary learning (Long, 1996). On the one hand, being aware of the central role of vocabulary knowledge in language learning (Atay & Ozbulgan, 2007), researchers and practitioners in the field of language education have been intensively testing, developing, or comparing factors and means that can enhance vocabulary learning efficiency. On the other hand, digital games have been applied as a learning tool in nearly all domains of education over the past two decades due to their capability to foster "cognitive and behavioral change" (Steinkuehler, Squire, & Sawyer, 2014, p. 1). Hence, abundant studies have been conducted either to test the effects or to compare the effectiveness of different types of digital games on L2 vocabulary learning (Moreno-Ger, Burgos, Martínez-Ortiz, Sierra, & Fernández-Manjón, 2008). While some empirical studies have reported positive effects of digital games on L2 vocabulary gains, others have revealed negative or mixed results (Sundqvist & Wikström, 2015; deHaan, Reed, & Kuwada, 2010). To cope with this issue, various studies have applied the meta-analysis method to generate an overall effect size (the difference in the L2 vocabulary gains of the experimental group and the control/comparison group) to offer a quantitative view of the subject matter. In Chiu's (2013) meta-analysis of 16 computer-assisted language learning (CALL) studies (2005–2011), CALL without games produced a significantly better vocabulary learning result (n = 9, d = 1.113, p = 0.003) than CALL with games (n = 7, d = 0.495, p = 0.001).
On the contrary, Chen, Tseng, and Hsiao (2018) meta-analyzed 10 selected empirical studies (2003–2014) and reported a large overall effect size (n = 10, fixed-effect model d = 0.784, p = 0.000; random-effects model d = 1.027, p = 0.000) of L2 vocabulary gains under the umbrella research design "game versus traditional instruction." To advance digital game research on learning, however, Mayer (2015) argued that "broad doctrines should be replaced with testable theoretical models that contain specific learning mechanisms linked to research evidence on games for learning" (p. 350). To link the "specific learning mechanisms" and the research evidence, "the impact of playing the game on learning outcomes" (p. 350), Mayer further proposed a research model in which three conditions of research designs were presented to connect three types of research evidence (the cognitive-consequence type, the value added-or-changed type, and the media-comparison type). Adapted and transformed into L2 vocabulary-learning scenarios, the aim of the first condition, comparing the pretest-to-posttest L2 vocabulary gains of an experimental group that plays a video game and a control group that receives traditional instruction, is to assess the effect size between the two groups (the cognitive-consequence type). The intention of the second condition, in which the control/comparison group plays the base version of a game and the experimental group plays the same game with one specific feature added or changed, is to gain further understanding of the effects of the specific features embedded in games on L2 vocabulary gains (the value added-or-changed type). The third condition, in which the experimental group plays a digital game and the control/comparison group receives the equivalent content of the game via a conventional medium (p. 351), aims to explore the effect size between the digital and the conventional media (the media-comparison type).
Therefore, there was a demand for a more nuanced differentiation of research designs on digital game-based L2 vocabulary learning. Drawing on Mayer's (2015) contention on digital game-based learning (DGBL) and the purposes of this study, the contributions of this meta-analysis study on digital game-based L2 vocabulary learning are examined from three aspects. First, this meta-analysis study, taking advantage of the number of empirical studies published in the field to date, advances the topic by decomposing the conditions (types) of the research designs applied by the empirical studies on digital game-based L2 vocabulary learning so as to connect the specific learning mechanisms of DGBL with the research evidence (the impacts on L2 vocabulary learning outcomes). Second, the respective overall effect sizes of the conditions are calculated to further understand the potential effects of the mechanisms on L2 vocabulary learning. Third, previous meta-analysis studies on digital game-based L2 vocabulary learning have used game types, age, and cognates as potential moderators to investigate their influence on the strength of the effect sizes (Chen et al., 2018). As one of the desired promises of meta-analysis is to "determine the effects of moderators that have never been examined in an original empirical study" (Guzzo, Jackson, & Katzell, 1987, p. 414), and "any study-level variable that may exert a systematic influence on the outcome measure can be considered a potential moderator" (Viechtbauer, 2007, p. 110), this study, in addition to the previous findings, intended to extend the scope of the potential moderators so as to better depict the scenarios of digital game-based L2 vocabulary learning. Our research questions were composed accordingly:

1. What conditions of research designs have been applied by the empirical studies on digital game-based L2 vocabulary learning?
2. What is the overall effect size in each condition?
3. What moderator variables have significant influences on the between-study variation in each condition?

2. Method

In order to address the three research questions, the Comprehensive Meta-Analysis software (version 2.2.064; Biostat, Englewood, NJ) was employed for the computation. Four discrete steps were conducted: 1) searching for and identifying potential target studies, an iterative process; 2) developing a codebook based on the studies' research conditions and their study characteristics (moderator variables); 3) calculating the overall effect size for each condition; and 4) testing the influences of the potential moderators under each condition. A colleague in the same research field was invited as the co-coder for the searching and coding processes. These steps are described in detail in the following.


2.1. Searching for and identifying potential target studies

In the first screening stage, we used two keywords, vocabulary and game, for the Boolean search to screen the literature in the following databases: Web of Science Core Collection (WOS), ERIC (EBSCOhost), and Scopus. Since the first document comprehensively depicting game-based learning was published in 2001 (Prensky, 2001), the search window was set from 2001 to the end of January 2017, and the articles were limited to those written in English. The search resulted in more than 300 studies, which were then imported into EndNote X7, a citation management software package. Next, articles that did not use digital games for L2 vocabulary enhancement, as well as duplicates, were deleted. Then the first author and the co-coder worked separately to screen each study by the theme indicated in its title, and the results of the two coders were compared. If the title of a study held opaque meaning, the study was retained. This process reduced the pool of potential candidates to 87 articles. The remaining articles in EndNote were sifted following the inclusion/exclusion criteria below to address the research questions:

1. A digital vocabulary learning game (not multimedia such as online glosses) was claimed to be implemented in the study as the key independent variable for L2 vocabulary learning.
2. Only experimental studies and quasi-experimental studies were included. Research reviews, case studies, qualitative research, and survey research were excluded.
3. Studies which did not include both a game group and a control/comparison group were excluded.
4. Articles focusing on first language (L1) vocabulary acquisition were excluded.
5. Studies should report sufficient data from both the experimental and the control/comparison group for calculating the treatment effect sizes.
6. Only published studies were included. Unpublished studies were excluded as they had not been subject to peer review (Mostert, 2001). Although this exclusion might suggest a risk of publication bias, we chose to decrease the risk by using the standard and widely accepted method of calculating the fail-safe number, so as to avoid other unpredictable biases (Kaminski, Valle, Filene, & Boyle, 2008; Schoemaker, Mulder, Deković, & Matthys, 2013).
7. The candidate study should have an English-written e-file available online for retrieval.
8. To increase the comparability of the experimental results, studies focusing on learners with learning disabilities were excluded.

By the end of the screening procedure, 26 studies remained.

2.2. Developing the codebook

Drawing on Mayer's (2015) taxonomy of research designs on DGBL, we distinguished the conditions of the research designs applied by the 26 empirical studies. If the research design in a study compared the L2 vocabulary gains (from pretest to immediate posttest) of a group that plays digital games (experimental group) with those of a group that receives an alternative activity (control group), the study was coded as Condition 1, a research design for detecting the effects of digital games in general. If the research design compared the L2 vocabulary gains of a group that plays the base version of a game (control/comparison group) with those of a group that plays the same game or games with one feature added or changed (experimental group), the study was coded as Condition 2, a research design for detecting the effectiveness of values added or changed in digital games. Studies with a research design comparing a group that plays a digital game (experimental group) with a group that receives the equivalent content via conventional media or paper games (control/comparison group) were coded as Condition 3. The rest were coded as "Others" (as displayed in Table 1). The next phase was to generate a codebook revealing the potential moderator variables and their respective subgroups for the later moderator analysis.
A moderator in a meta-analysis is a study characteristic with different levels found throughout the target studies that is possibly predictive of the outcomes (Borenstein, Hedges, Higgins, & Rothstein, 2009). As one of our research goals was to explore potential moderator variables comprehensively, the moderator variables were coded following Cooper's (2009, 2015) coding procedure, including codebook construction, coder training, discussion and negotiation on variation, and reliability estimation. The processes are comparable to Stock's (1994) eight-step training and coding procedure: 1) an overview of the synthesis by the principal coder, 2) consensus on forms and descriptions between the coders, 3) describing the method to organize the forms, 4) testing the forms on five to 10 studies, 5) estimating the total time for the coding process, 6) negotiation to achieve consensus on coded forms, 7) revision of the forms and the codebook, and 8) repeating the process on further studies until consensus is achieved (pp. 134–135).

Table 1
Codebook for the conditions of research design on DGBL.

| Condition | Experimental group (E group) | Control/comparison group (C group) | Comparison |
|---|---|---|---|
| Condition 1 | Playing an off-the-shelf game | Engaging in alternative non-game-related activities | Video games vs. traditional instruction |
| Condition 2 | Playing the same game with one feature added or changed | Playing the base version of a game | A video game with a specific feature vs. the game without that specific feature |
| Condition 3 | Playing a digital game | Receiving the same content of the game via conventional media | The effectiveness of the learning media between video games and others |
| Condition 4 | Others | | |

Following the coding processes, seven categorical moderator variables (game type, educational level, L2 proficiency level, linguistic distance, intervention setting, assessment type, game source) and one continuous moderator variable (intervention duration) were extracted for the moderator analyses. The eight potential moderator variables and their respective subgroup criteria are described as follows.

2.2.1. Game type

The game types were dichotomized into the drill and the task-based subgroups, similar to the taxonomy applied by previous meta-analysis studies in the same field (Chen et al., 2018; Chiu, Kao, & Reynolds, 2012). Drill games provide L2 vocabulary learners with repetitive practice of words in different texts, such as matching games and grammar- and vocabulary-related games (e.g., Aghlara & Tamjid, 2011; Jalali & Dousti, 2012). They feature scores, challenges, multimedia, and so on, but offer no meaningful task to work on. A task-based game, on the contrary, is a game with "a goal-oriented activity in which learners use language to achieve a real outcome" (Willis, 1996, p. 53). Players succeed in the game by completing the assigned tasks. These games' activities involve critical thinking and problem solving wrapped up in the form of meaningful tasks (Homer, Plass, Raffaele, Ober, & Ali, 2018), such as role-play games (Fahim & Sabah, 2012), strategy games (Saffarian & Gorjian, 2012), or adventure games (Vahdat & Behbahani, 2013), in which learners' focus is on meaning rather than on form (Breen, 1987; Estaire & Zanón, 1994).
2.2.2. Educational level

Due to the limited number of studies, educational levels were combined into three subgroups: preschool and elementary school students were coded as the primary group (Aghlara & Tamjid, 2011; AlShaiji, 2015; Aslanabadi & Rasouli, 2013; Saffarian & Gorjian, 2012), junior and senior high school students as the middle group (Jalali & Dousti, 2012; Muhanna, 2012), and university students as the high group (Ashraf, Motlagh, & Salami, 2014; Fahim & Sabah, 2012; Vahdat & Behbahani, 2013; Yip & Kwan, 2006).

2.2.3. L2 proficiency level

A difficulty in applying learners' initial L2 proficiency levels for the moderator analysis was that the primary studies did not use a standardized language assessment instrument, such as the TOEFL or TOEIC, to position learners' L2 proficiency levels (Norris & Ortega, 2000). To cope with this problem, we grouped learners' L2 proficiency into three levels based on the descriptions in the studies: the beginning, beyond-beginning, and mixed groups. Studies with participants having no prior knowledge of the target language or at a primary level (e.g., L2 kindergarten students) were coded as the beginning group (Aghlara & Tamjid, 2011; AlShaiji, 2015; Aslanabadi & Rasouli, 2013; Jalali & Dousti, 2012). Studies in which learners had lower, pre-intermediate, or intermediate L2 proficiency were put into the beyond-beginning group (Ashraf et al., 2014; Saffarian & Gorjian, 2012; Vahdat & Behbahani, 2013). Studies using pretests as the covariate without indicating learners' L2 proficiency were put into the mixed-level group.

2.2.4. Linguistic distance

As either the L1 or the L2 in the 26 studies was English, we coded the linguistic distance between the two languages in each study based on the score of the other language in Chiswick and Miller's (2005, pp. 12–13) index, a coding approach previously adopted by Chen et al. (2018).
The score of a language in the index is the averaged outcome of a specific English achievement test taken by a group of learners sharing that mother tongue after 24 weeks of English language training. If the score of the language in the index was larger than or equal to 2, suggesting a shorter learning distance from English, the language was coded as "Close", such as Farsi (2) (Aghlara & Tamjid, 2011) and Dutch (2.75) (Segers & Verhoeven, 2003); conversely, Mandarin (Huang & Huang, 2015) and Arabic (AlShaiji, 2015; Muhanna, 2012) were coded as "Far" for their scores of less than 2, suggesting a greater learning distance.

2.2.5. Assessment type

Learners' vocabulary knowledge has been differentiated into the passive/receptive type (the knowledge of a word's meaning) and the active/productive type (the knowledge of using words in context) (Laufer & Paribakht, 1998). Drawing on the descriptions of the target studies, we coded assessments examining students' passive vocabulary knowledge, such as multiple-choice questions, as the receptive/passive type. Those examining students' active vocabulary knowledge, such as filling in blanks (Letchumanan, Tan, Paramasivam, Sabariah, & Muthusamy, 2015), answering questions (Sandberg, Maris, & Hoogendoorn, 2014), composition, and presentation, were coded as the productive/active type (Sandberg et al., 2014; Shintani, 2014).

2.2.6. Intervention setting

Since the participants in the target studies are students, formal settings (in class) usually involve compulsory learning under teachers' supervision, while informal settings (e.g., home, out-of-school, or after-school classroom settings) (Franciosi, 2017; Sandberg et al., 2014; Yen, Chen, & Huang, 2016, pp. 255–262) often represent interest-powered or self-motivated learning (Davis & Fullerton, 2016) in which students receive less instruction and pressure.
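The linguistic-distance coding rule in Section 2.2.4 (index score ≥ 2 coded as "Close", otherwise "Far") can be sketched in a few lines. This is an illustrative sketch, not part of the original study's tooling; the 1.5 below is a hypothetical sub-threshold score, since the text reports only that Mandarin and Arabic scored below 2:

```python
def code_linguistic_distance(score: float) -> str:
    """Code a language's distance to English as 'Close' or 'Far'
    using the threshold of 2 on Chiswick and Miller's (2005) index."""
    return "Close" if score >= 2 else "Far"

# Scores reported in the text: Farsi = 2, Dutch = 2.75.
print(code_linguistic_distance(2.0))   # Close (Farsi)
print(code_linguistic_distance(2.75))  # Close (Dutch)
print(code_linguistic_distance(1.5))   # Far (hypothetical sub-threshold score)
```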


Table 2
The codebook for the moderators and their subgroup definitions.

| Category | Subgroups | Definition |
|---|---|---|
| Game type | 1. Drill type | The drill-and-practice types of games that provide exposure to words through multiple texts |
| | 2. Task-based type | Games involving problem-solving, simulations, decision-making (Breen, 1987), with learners' focus on meanings rather than on word forms (Estaire & Zanón, 1994) |
| Educational level | 1. Primary | Preschool and elementary school students |
| | 2. Middle | Junior and senior high school students |
| | 3. High | University students |
| L2 proficiency | 1. Beginning | Primary level, no prior knowledge, kindergartens |
| | 2. Beyond-beginning | Pre-, lower-level, intermediate-level |
| | 3. Mixed | Studies using pretest as covariate without grouping participants' language proficiency |
| Linguistic distance | 1. Close | The language scored ≥ 2 |
| | 2. Far | The language scored < 2 |
| Intervention setting | 1. Formal | Playing games in class |
| | 2. Informal | Playing games after class or at home |
| Assessment type | 1. Receptive | Tests such as multiple-choice, which examine students' passive vocabulary knowledge |
| | 2. Productive | Filling in the blank, composition, presentation, etc., which test students' active vocabulary knowledge |
| Game source | 1. Custom-design | Games developed for the research |
| | 2. Web | Games offering free access online |
| | 3. Software | CD-ROM, off-the-shelf software |
| Intervention duration | | The duration was counted by the day. One week is counted as seven days. |

2.2.7. Game source

Regarding the sources of the games, if the game was specifically designed for the study, it was put into the custom-design group. If the game was borrowed from a website offering free access to the public, it belonged to the web group. If the source of the game was a CD-ROM or off-the-shelf software, it was put into the software group.

2.2.8. Intervention duration

Because the lengths of the treatments varied widely across the 26 empirical studies (from one day to 15 weeks), instead of dividing them into an arbitrary number of subcategories, we coded each treatment duration by the day (one week equals seven days). The codebook for the moderators and their subgroup definitions is summarized in Table 2.

2.3. Conducting the meta-analysis

Meta-analysis, a term coined by Glass (1976) and widely applied as a supplement to traditional research reviews (Hedges, 1982), is a statistical approach that provides figures generated by extracting and integrating the data of a large number of empirical studies within the same research domain. One of the main goals of conducting a meta-analysis is to pursue an overall effect size (the standardized mean difference between the experimental group and the control/comparison group) to "quantify the magnitude of the difference between groups" (Maher, Markey, & Ebert-May, 2013, p. 345). Another goal involves "the comparisons of the mean effects of groups of studies that have different characteristics" (Hedges & Pigott, 2004, p. 426), that is, to conduct moderator analyses. Two effect-size models, the fixed-effect model and the random-effects model, have been developed for the calculation of effect sizes based on different inference models (Hedges & Vevea, 1998).
If the aim of the analyst is to make inferences about the observed studies, or about a set of identical studies (where any variation between studies would be due to sampling error), the fixed-effect model is considered appropriate; however, if the analyst aims to generalize the inferences beyond the observed studies to a population, the random-effects model is desired. Based on our study goals, the statistical values under the random-effects model were employed for further analysis. To achieve the overall effect size, we first used the Comprehensive Meta-Analysis software (version 2.2.064) to convert the descriptive statistics and the independent outcome data (different outcome scores from independent assessments, or the outcome scores of an assessment from independent groups) into effect sizes (hence, the number of effect sizes was sometimes larger than the number of studies), which were then combined into the overall effect size. Next, the significance of the overall effect size was examined by a test of the null hypothesis. If the p value of the null hypothesis test was significant (p < 0.05), the null was rejected, suggesting a significant difference between the two groups in L2 vocabulary gains. The metric for the effect sizes is Cohen's d. The Cohen's d values were transformed to Hedges's g, "a corrected standardized mean effect size" (Perez, Van den Noortgate, & Desmet, 2013, p. 726), using sample weights to address the issue of small-sample bias (Hedges, 1985; Lipsey & Wilson, 2001). According to Cohen's rules (Hedges & Olkin, 2014), an effect size of less than 0.2 is considered a small effect, between 0.2 and 0.8 a medium effect, and larger than 0.8 a large effect. In addition, the heterogeneity of the effect sizes in each condition was examined by the Q-test (Cochran's Q).
The significance of the Q-value (Qtotal) (p < 0.05) suggests significant dispersion of the effect sizes, indicating that there were true differences among them, and moderator analyses were thus suggested. A non-significant Qtotal (p > 0.05), on the contrary, suggests that differences among the effect sizes were attributable to sampling error (Hedges, 1992).
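The computations described above can be illustrated with a short sketch: the small-sample correction that turns Cohen's d into Hedges's g, and a generic DerSimonian–Laird-style random-effects pooling that also yields Cochran's Q and I². This is an illustrative sketch with hypothetical inputs, not the algorithm implemented in the Comprehensive Meta-Analysis software:

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Small-sample correction factor J applied to Cohen's d."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)
    return j * d

def random_effects_pool(effects, variances):
    """Pool per-study effect sizes under a random-effects model.

    Returns (pooled effect, Cochran's Q, I^2 in percent) using the
    DerSimonian-Laird estimate of the between-study variance tau^2.
    """
    w = [1 / v for v in variances]                   # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical example: two studies with effects 0.2 and 1.0 and equal variances.
pooled, q, i2 = random_effects_pool([0.2, 1.0], [0.05, 0.05])
print(round(pooled, 3), round(q, 3), round(i2, 3))  # 0.6 6.4 84.375
```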


The statistical technique Classic Fail-safe N was implemented to address the question of whether the effect sizes were affected by publication bias. To conduct the moderator analyses, we adopted the Q-test for each moderator variable to examine the heterogeneity of the effect sizes between the subgroups (Qb) under the suggested mixed-effects model (a random-effects model is applied to combine studies within each subgroup, and a fixed-effect model to combine subgroups to produce the overall effect) (Borenstein et al., 2009, p. 183). As the intervention durations of the studies were coded as a continuous moderator variable, we applied the meta-regression method, under the method of moments (a mixed-effects model), to explore the relationship between the durations and the effect sizes.

3. Results

The results are presented in a sequence corresponding to the three research questions.

3.1. Research condition

Following Mayer's (2015) three conditions of research designs, we coded 10 studies with 10 effect sizes and 642 participants as Condition 1 (a group that plays digital games versus a group that receives an alternative activity), and 10 studies with 14 effect sizes and 837 participants as Condition 2 (a control/comparison group that plays the base version of a game versus an experimental group that plays the same game with one specific feature added or changed). The added or refined features tested include storylines, challenges and rewards, adaptive learning, embedded questions (passive versus active), a refined feedback system, embedded quizzes, simulation games, guiding strategies, ranking systems, and scaffolding. Two studies with two effect sizes and 129 participants were coded as Condition 3 (the experimental group plays a digital game versus the control/comparison group receives the equivalent content via conventional means). It should be noted that the focus of our coding criteria for Condition 3 was the equivalence of the content of the treatments of the two groups.
Therefore, means such as the paper booklet plus audio clips in the comparison group in Calvo-Ferrer's (2017) study and the paper flash-card games in Letchumanan et al.'s (2015) study were considered adequate for inclusion in the condition. In addition, Condition 4 was created to replace the "Others" category. The reason for the substitution was that an additional condition of research designs beyond the three mentioned above was identified. The common ground of the studies in this condition was that all the participants received exactly the same game as the treatment but were grouped and compared by a non-game-related factor, such as by position (playing the game versus watching the game) (Ali Mohsen, 2016; deHaan et al., 2010), by motivation (voluntary versus compulsory) (Ma & Kelly, 2006), or by gaming frequency (Sundqvist & Wikström, 2015); the focus of these studies was to test the influences of a non-game-related variable on digital game-based L2 vocabulary learning. Four studies (six effect sizes and 203 participants) were involved in Condition 4. The framework of the four-condition research designs for game learning research and the corresponding articles on digital game-based L2 vocabulary learning are displayed in Table 3.

Table 3
The framework of the four-condition research designs for digital game learning research and the corresponding articles on digital game-based L2 vocabulary learning.

| Research design - purpose | Definition | Example | Study |
|---|---|---|---|
| Condition 1 - Effectiveness of digital games in general | Game play experimental group versus an alternative activity (control group) | Game vs. traditional instruction (such as memorizing, no treatment); Game vs. placebo (reading a text-only web page) | Yip and Kwan (2006); Muhanna (2012); Aghlara and Tamjid (2011); Jalali and Dousti (2012); Aslanabadi and Rasouli (2013); Vahdat and Behbahani (2013); Ashraf et al. (2014); AlShaiji (2015); Fahim and Sabah (2012); Saffarian and Gorjian (2012) |
| Condition 2 - Effectiveness of values added-or-changed in games | Experimental group playing games with one specific feature added or changed versus control group playing base versions of games | Enhanced version vs. original version; Game with scaffolding vs. game without; Game with active questions vs. game with passive questions; Mobile app with game vs. app without game features; Double loop vs. single loop | Segers and Verhoeven (2003); Sandberg et al. (2014); Young and Wang (2014); Huang and Huang (2015); Wang, Hwang, and Chen (2015); Franciosi, Yagi, Tomoshige, and Ye (2016); Hwang and Wang (2016); Lu and Chang (2016); Yen et al. (2016); Franciosi (2017) |
| Condition 3 - Effectiveness of media | Experimental group playing a digital game vs. control/comparison group receiving the same content via conventional means | Game vs. booklet with equivalent content + audio; Game vs. paper-based game with equivalent content | Calvo-Ferrer (2017); Letchumanan et al. (2015) |
| Condition 4 - Effectiveness of non-game-related factors | All participants receiving the same digital game but grouped by a non-game-related variable | Player vs. watcher; Out of class vs. in class; High gaming frequency vs. low frequency | Ali Mohsen (2016); deHaan et al. (2010); Ma and Kelly (2006); Sundqvist and Wikström (2015) |

Note: k = the number of studies; Nes = the number of effect sizes; Npart = the number of participants.


3.2. Overall effect size

To answer the second research question, we used the Comprehensive Meta-Analysis software (version 2.2.064) to compute the overall effect sizes for the four conditions respectively. The results suggest a large overall effect size (d = 0.986, CI [0.590–1.382], p = 0.000) for Condition 1, indicating that DGBL significantly outperformed alternative activities on students' L2 vocabulary gains. The Qtotal value in a further step shows that there was significant dispersion between the 10 effect sizes (Qtotal = 47.732, df = 9, p < 0.001), and that over 80% of the dispersion was attributable to true differences rather than simply to sampling errors (I2 = 81.145).

The results for Condition 2 (d = 0.445, CI [0.218–0.672], p = 0.000) indicate that the added-or-changed features had the overall potential to significantly increase the effectiveness of digital games by a medium effect size compared to the digital games in their base versions. The heterogeneity test (Qtotal = 30.120, df = 13, p < 0.01, I2 = 56.840) also suggests further moderator analyses.

Based on the suggestion that a meta-analysis can still be conducted meaningfully with a minimum of two studies (Koricheva, Gurevitch, & Mengersen, 2013), we also calculated the overall effect sizes for Conditions 3 and 4. For Condition 3, a significant medium-to-large overall effect size (d = 0.733, CI [0.376–1.091], p = 0.000) is reported, indicating that digital games were more effective for L2 vocabulary learning compared to other means delivering equivalent content. The heterogeneity test suggests no true variance between the two effect sizes (Qtotal = 0.597, df = 1, p > 0.05, I2 = 0.000). The p value (d = 0.503, CI [-0.840–1.846], p = 0.463) shows a non-significant overall effect size for Condition 4. However, the heterogeneity test (Qtotal = 125.194, df = 5, p < 0.001, I2 = 96.006) indicates a high degree of dispersion among the effect sizes, which could be explained partially by two particular studies.
The similarity of the two studies was that the game groups in both played a game while the comparison groups watched the same game being played. Under this similar research design, however, players in deHaan et al.'s (2010) study recalled significantly less vocabulary than the watchers did (d = −2.43, CI [-3.010 to -1.854]), whereas players recalled significantly more vocabulary (d = 2.232, CI [1.471–2.994]) than the watchers in Ali Mohsen's (2016) study. The details of the overall effect sizes and the heterogeneity tests in Conditions 1, 2, 3, and 4 are summarized in Table 4 (the individual-study details are displayed in the Appendices).

As we adopted only published studies for the meta-analysis, we ran the classic fail-safe N analysis to address publication bias. Rosenthal (1979) suggests that if the "tolerance level" (the fail-safe N) is larger than the requisite number 5Nes + 10 (Nes = the total number of reported effect sizes), the result of the meta-analysis is considered "resistant to the file drawer problem" of unreported null effects (p. 640). The fail-safe N is 283 for Condition 1 and 117 for Condition 2, both larger than the respective requisite numbers (5 × 10 + 10 = 60 and 5 × 14 + 10 = 80).

3.3. Moderator analysis

To address the third research question, we produced Table 5 to display the statistical results of the moderator analyses for Conditions 1 and 2, but not for Conditions 3 and 4, due to their limited study numbers.

3.3.1. Significant moderator variables

Under the mixed-effects model, three categorical moderators (game type, educational level, and L2 proficiency level) had a significant influence on learning in Condition 1, and two (intervention setting and assessment type) in Condition 2. The details are displayed and discussed in the following.

3.3.1.1. Game type.
Table 4
The overall effect sizes and the heterogeneity tests in Conditions 1, 2, 3, and 4 (random-effects model).

Condition 1: k = 10; Nes = 10; n = 642; d = 0.986 [95% CI 0.590–1.382]; g = 0.970 [0.581–1.359]; p (test of null) = 0.000; Qtotal = 47.732***, df = 9, I2 = 81.145; classic fail-safe N (number of missing studies that would bring the p-value to > 0.05) = 283.
Condition 2: k = 10; Nes = 14; n = 837; d = 0.445 [0.218–0.672]; g = 0.438 [0.215–0.662]; p = 0.000; Qtotal = 30.120**, df = 13, I2 = 56.840; classic fail-safe N = 117.
Condition 3: k = 2; Nes = 2; n = 129; d = 0.733 [0.376–1.091]; g = 0.724 [0.371–1.077]; p = 0.000; Qtotal = 0.597, df = 1, I2 = 0.000.
Condition 4: k = 4; Nes = 6; n = 203; d = 0.503 [-0.840–1.846]; g = 0.490 [-0.828–1.808]; p = 0.463; Qtotal = 125.194***, df = 5, I2 = 96.006.

Note. k = the total number of studies; Nes = the total number of effect sizes; n = total sample size; d = Cohen's d; g = Hedges's g; **p < 0.01; ***p < 0.001. Fail-safe N criterion: N > 5Nes + 10.

Table 5
Effect sizes of the moderator variables at different levels in Conditions 1 and 2 (mixed-effects model).

Game type
Condition 1: Drill (Nes = 7): d = 0.711, p = .000, CI [0.384–1.038]; Task (Nes = 3): d = 1.669, p = .000, CI [0.817–2.521]; Qb = 4.235*, df = 1.
Condition 2: Drill (Nes = 5): d = 0.264, p = .067, CI [-0.019–0.547]; Task (Nes = 9): d = 0.523, p = .001, CI [0.221–0.825]; Qb = 1.506, df = 1.

Educational level
Condition 1: Low (Nes = 4): d = 1.128, p = .003, CI [0.387–1.869]; Middle (Nes = 2): d = 0.310, p = .023, CI [0.043–0.577]; High (Nes = 4): d = 1.241, p = .000, CI [0.868–1.615]; Qb = 17.420***, df = 2.
Condition 2: Low (Nes = 7): d = 0.441, p = .001, CI [0.183–0.699]; Middle (Nes = 0); High (Nes = 7): d = 0.485, p = .024, CI [0.064–0.906]; Qb = 0.031, df = 1.

L2 proficiency level
Condition 1: Beginning (Nes = 4): d = 0.637, p = .000, CI [0.306–0.967]; Beyond-beginning (Nes = 3): d = 1.744, p = .000, CI [1.011–2.478]; Mixed (Nes = 3): d = 0.801, p = .016, CI [0.148–1.455]; Qb = 7.284*, df = 2.
Condition 2: Beginning (Nes = 7): d = 0.441, p = .001, CI [0.183–0.699]; Beyond-beginning (Nes = 5): d = 0.337, p = .217, CI [-0.197–0.871]; Mixed (Nes = 2): d = 0.834, p = .000, CI [0.398–1.271]; Qb = 2.801, df = 2.

Linguistic distance
Condition 1: Close (Nes = 7): d = 1.050, p = .000, CI [0.502–1.698]; Far (Nes = 3): d = 0.868, p = .000, CI [0.212–1.523]; Qb = 0.175, df = 1.
Condition 2: Close (Nes = 4): d = 0.568, p = .001, CI [0.222–0.914]; Far (Nes = 10): d = 0.397, p = .008, CI [0.106–0.688]; Qb = 0.550, df = 1.

Intervention setting
Condition 1: Formal (Nes = 10); Informal (Nes = 0); no between-group comparison possible.
Condition 2: Formal (Nes = 9): d = 0.219, p = .117, CI [-0.055–0.492]; Informal (Nes = 5): d = 0.771, p = .000, CI [0.554–0.989]; Qb = 9.619**, df = 1.

Assessment type
Condition 1: Receptive (Nes = 10); Productive (Nes = 0); no between-group comparison possible.
Condition 2: Receptive (Nes = 11): d = 0.332, p = .010, CI [0.080–0.584]; Productive (Nes = 3): d = 0.839, p = .000, CI [0.532–1.145]; Qb = 6.273*, df = 1.

Game source
Condition 1: Custom-design (Nes = 0); Web (Nes = 5): d = 0.760, p = .000, CI [0.340–1.179]; Software (Nes = 5): d = 1.221, p = .000, CI [0.507–1.935]; Qb = 1.190, df = 1.
Condition 2: Custom-design (Nes = 8): d = 0.419, p = .035, CI [0.030–0.808]; Web (Nes = 4): d = 0.553, p = .001, CI [0.240–0.866]; Software (Nes = 2): d = 0.228, p = .287, CI [-0.192–0.647]; Qb = 1.494, df = 2.

Intervention duration (mixed-effects regression)
Condition 1: slope = −0.008, p = .258, CI [-0.022–0.006]; Qm = 1.279, df = 1.
Condition 2: slope = −0.003, p = .370, CI [-0.009–0.004]; Qm = 0.803, df = 1.

Note: Nes = number of effect sizes; df = degrees of freedom; d = Cohen's d; CI = 95% confidence interval; Qb = Q-value between subgroups; Qm = Q-value under the mixed-effects regression model; slope = slope of the regression line; *p < 0.05; **p < 0.01; ***p < 0.001.

The results of the heterogeneity test (Qb = 4.235, df = 1, p < 0.05) indicate that game type was a significant moderator variable influencing the L2 vocabulary learning outcomes in Condition 1. The effect size of task-based games (d = 1.669, df = 2) was significantly larger than that of drill-type games (d = 0.711, df = 6). However, we could not claim a moderator effect of game type in Condition 2 (Qb = 1.506, df = 1, p > 0.05).

3.3.1.2. Educational level. The effect sizes of the subgroups by educational level also differed significantly in Condition 1 (Qb = 17.420, df = 2, p < 0.001). There was a large effect size for university students (d = 1.241, df = 3, p < 0.001) as well as for preschool and elementary students (d = 1.128, df = 3, p < 0.01), and a small-to-medium effect size for junior and senior high school students (d = 0.310, df = 1, p < 0.05), indicating that digital games worked better at the two ends of the educational spectrum than for junior and senior high school students. It should be noted that no junior or senior high school participants were involved in Condition 2 (Qb = 0.031, df = 1, p > 0.05).

3.3.1.3. L2 proficiency level. Students' L2 proficiency levels also played a significant moderating role in L2 vocabulary learning in Condition 1 (Qb = 7.284, df = 2, p < 0.05). The results suggest that digital games had the potential to produce a large effect size when the students held a certain degree of L2 vocabulary knowledge (beyond-beginning learners) (d = 1.744, df = 2, p = 0.000), but a medium effect size when students were beginning learners (d = 0.637, df = 3, p = 0.000). In other words, students with some L2 prior knowledge achieved better L2 vocabulary gains from playing digital vocabulary games than beginning learners did. Nevertheless, L2 proficiency level did not have a significant moderating influence on the effect sizes in Condition 2 (Qb = 2.801, df = 2, p > 0.05).

3.3.1.4. Intervention setting. While Condition 1 lacked informal settings for statistical evaluation, intervention setting played a significant role in moderating the effect sizes of the added-or-changed values in games in Condition 2 (Qb = 9.619, df = 1, p < 0.01). When students played the more sophisticated games in informal settings, the effect size was medium to large (d = 0.771, df = 4, p < 0.001), but the added features had no significant influence in formal settings (d = 0.219, df = 8, p > 0.05).

3.3.1.5. Assessment type. Although productive assessments were absent in Condition 1, three cases were identified in Condition 2. The results indicate that the effect sizes of the added-or-changed values were significantly moderated by assessment type (Qb = 6.273, df = 1, p < 0.05): the effect size was large when students were tested on their productive (active) vocabulary knowledge (d = 0.839, df = 2, p < 0.001), but small (d = 0.332, df = 10, p < 0.05) when they were tested on their receptive (passive) vocabulary knowledge.
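As a quick check on the statistics reported in Table 4, the I2 values and the fail-safe N criterion can be reproduced from the Q statistics and effect-size counts alone. The sketch below is ours (the function names `i_squared` and `requisite_n` are illustrative, not from the paper), assuming the standard definition I2 = (Q − df)/Q × 100, floored at zero, and Rosenthal's (1979) requisite number 5Nes + 10:

```python
# Reproduce the heterogeneity percentages (I2) and the classic fail-safe N
# criterion reported in Table 4, using the Q statistics from the paper.

def i_squared(q: float, df: int) -> float:
    """Share of dispersion attributable to true heterogeneity, in percent."""
    return max(0.0, (q - df) / q) * 100.0

def requisite_n(n_es: int) -> int:
    """Rosenthal's (1979) tolerance level: 5 * Nes + 10."""
    return 5 * n_es + 10

# (Q_total, df, Nes) per condition, as reported in Table 4.
conditions = {
    "Condition 1": (47.732, 9, 10),
    "Condition 2": (30.120, 13, 14),
    "Condition 3": (0.597, 1, 2),
    "Condition 4": (125.194, 5, 6),
}
for name, (q, df, n_es) in conditions.items():
    print(f"{name}: I2 = {i_squared(q, df):.3f}, requisite N = {requisite_n(n_es)}")
# The printed I2 values agree with the reported 81.145, 56.840, 0.000, and
# 96.006 to within rounding of the Q statistics, and the reported fail-safe
# Ns for Conditions 1 and 2 (283 and 117) exceed the requisite 60 and 80.
```

The fact that Condition 3's Q falls below its degrees of freedom is exactly why its I2 is floored at zero: there is no detectable true variance between its two effect sizes.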


3.3.2. Non-significant variables

Under the mixed-effects model, linguistic distance (Qb = 0.175, df = 1, p > 0.05 in Condition 1; Qb = 0.550, df = 1, p > 0.05 in Condition 2) and game source (Qb = 1.190, df = 1, p > 0.05 in Condition 1; Qb = 1.494, df = 2, p > 0.05 in Condition 2) were categorical variables that posed no moderating influence in either condition. The regression coefficient for the relation between the continuous variable of intervention duration and the vocabulary effect sizes was negative (r = −0.008, z = −1.131, p = 0.258 in Condition 1). The p value suggests no significant relationship between intervention duration and L2 vocabulary gains, and the Qmodel value (Qmodel = 1.279, df = 1, p > 0.05) indicates no significant dispersion of the effect sizes. The same non-significant influence was found in Condition 2 (r = −0.003, z = −0.896, p > 0.05; Qmodel = 0.803, df = 1, p > 0.05).

To wrap up the findings: firstly, four conditions of research designs were identified from the 26 empirical studies on digital game-based L2 vocabulary learning. Secondly, the four conditions varied to a large degree in terms of their overall effect sizes, the numbers of studies included (from two to 10), and the degree of dispersion among the effect sizes. Thirdly, we retested two previously examined potential moderators (game type and linguistic distance) and explored six new ones (educational level, L2 proficiency level, intervention setting, assessment type, game source, and intervention duration) regarding their influence on digital game-based L2 vocabulary learning. The results yielded are harnessed to address the research questions.

4. Discussion and implications

4.1.
Conditions and overall effect size

The identification of the four conditions regarding research designs for digital game-based L2 vocabulary learning not only supports and extends Mayer's (2015) three research questions on digital game-based learning, but also gives more weight to the existing phenomena. In comparison with the results of previous meta-analysis studies in the field, this study again supports the finding that DGBL is superior to traditional instruction on L2 vocabulary achievement by a large effect size, but lends no evidence to Chiu's (2013) report that CALL without games has better effects than CALL with games. As Chiu et al.'s (2012) findings were also in favor of DGBL, the possible reasons accounting for the contradiction might be whether players' L2 vocabulary learning experiences could be accommodated comfortably in the digital games, or whether players' experience of DGBL meets the educational purposes (Reinhardt & Sykes, 2014). That is, successful DGBL rests not only on the satisfaction of gameplay, but also on the realization of the learning goals.

In addition to the issues mentioned above, the relationship between research designs and game-related factors has not as yet been well described. Based on the locales of the factors in each condition, we juxtaposed the four conditions with the factors by their locality. Condition 1, which includes an experimental group playing a digital game accompanied by traditional language instruction and a control group receiving traditional language instruction, explores the total effect of factors from all possible locales. It might include game-design factors (the game-internal factors), such as game genre, topic, game characters, game contextual information, sound, music, graphics, and game rules, which have the potential to enhance or diminish players' motivation, engagement, immersion, and learning (Cairns, Cox, & Nordin, 2014; Prensky, 2001).
It might also include the interface, or factors from outside the game (the game-external factors), such as spatial factors (e.g., learning environment), temporal factors (e.g., frequency), players' attitudes (e.g., active versus passive players), and social expectations (e.g., teachers' attitudes, parental expectations), which directly or indirectly influence the total learning effectiveness. While Condition 1 examines the total effects of all potential factors influencing DGBL, Conditions 2, 3, and 4 focus on the game-internal, interface, and game-external factors, respectively. Condition 2 includes a control/comparison group playing a base form of a game and an experimental group playing the game with a feature added or changed; that is, research in Condition 2 investigates the game-internal factors. Condition 3 includes an experimental group playing a digital game and a control/comparison group playing a game with equivalent content via a conventional medium; the effect size is credited to the interface (i.e., the medium) through which the game is played. In Condition 4, the effects are the results of the non-game factors, which, examined by their localities, are located outside the game per se. In other words, they are game-external factors.
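The condition-to-locality mapping described above can be made concrete as a small lookup table. The sketch below is purely illustrative (the `Locality` enum, `CONDITION_FOR_LOCALITY` table, and `suggest_design` helper are our own names, not constructs from the paper), showing how a researcher might choose a research design from the locality of the factor under study:

```python
# Illustrative encoding of the condition/locality mapping described above.
from enum import Enum

class Locality(Enum):
    ALL = "all locales (total effect)"            # Condition 1
    GAME_INTERNAL = "game-internal factors"       # Condition 2 (e.g., scaffolding)
    INTERFACE = "interface / medium"              # Condition 3 (digital vs. paper)
    GAME_EXTERNAL = "game-external factors"       # Condition 4 (e.g., player vs. watcher)

CONDITION_FOR_LOCALITY = {
    Locality.ALL: "Condition 1",
    Locality.GAME_INTERNAL: "Condition 2",
    Locality.INTERFACE: "Condition 3",
    Locality.GAME_EXTERNAL: "Condition 4",
}

def suggest_design(factor_locality: Locality) -> str:
    """Return the research-design condition matching a factor's locality."""
    return CONDITION_FOR_LOCALITY[factor_locality]

print(suggest_design(Locality.GAME_INTERNAL))  # -> Condition 2
```

For instance, a question about a scaffolding feature (game-internal) maps to a Condition 2 design, whereas a question about play frequency (game-external) maps to a Condition 4 design.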

Fig. 1. Structure diagram of the relationship between research conditions and game-related factors by their localities.


Accordingly, the relationship between the four conditions and the localities of the game-related factors is visualized through the structure diagram displayed in Fig. 1. The contributions of the structure diagram are discussed correspondingly. First, by recognizing the locality of the issues to be solved, game designers and researchers can cooperate more easily to locate problems using a shared language. For example, if players show negative emotion toward a character in a game, the character is a game-internal issue, whereas the players' emotion is a game-external factor. The game designer could locate the game-internal factors, find the corresponding game-external factors, and cooperate with researchers and educational practitioners to seek solutions based on game theory and empirical evidence. The diagram could also be utilized to assist novice researchers in deciding on an appropriate research design for the issues under focus. Further, it helps identify the gaps in DGBL research. For example, while most studies in Condition 2 investigated augmented features added or changed, few have explored the effects of changing content features, such as text-type or voice-type narratives in the game for task-based learning based on cognitive load theory (Sweller, 2011). The implication from Condition 3 is that, with the advance of digital technologies, the interface might be switched to virtual reality (VR), augmented reality (AR), or beyond; their potential to enhance the impact of DGBL on L2 vocabulary gains is highly anticipated and remains to be explored. The identification of Condition 4, exploring the game-external factors, invites research not just on spatial and temporal factors, but also on other potential game-external factors such as socio-cultural influences on players' motivation and engagement in DGBL (Prensky, 2001).

4.2.
Moderator variables

The extensive search for moderator variables enables us to present a better outline of the research trend in digital game-based L2 vocabulary learning; the results of the moderator analyses extend our knowledge of the potential factors influencing the effects of DGBL even when they are not the foci of the research.

4.2.1. Significant moderator variables

4.2.1.1. Condition 1. Among the eight moderator variables analyzed, game type was the only game-internal factor, which indicates the limited information offered on game-internal factors for further analysis. Game type, educational level, and L2 proficiency level are the three moderator variables with significant influences on the effect size of digital game-based learning versus alternative activities on L2 vocabulary gains. Drawing on the results listed in Table 5, the best scenario for digital game-based L2 vocabulary learning might be when university students with beyond-beginning L2 proficiency play task-oriented digital games.

With the limited information provided, games were categorized as task-based if a task was involved in the game, and as drill-practice if no task was involved. Our statistical results indicate that task-based games significantly outperformed drill-practice games. This implies that while drill-practice games might meet the learning purpose, they might also be seen as less gameful for encouraging engagement. Research has reported on the capability of task-based games, as they offer task goals to stimulate players' critical thinking, problem solving, and task engagement (Baralt & Gómez, 2017; Chen et al., 2018). It is believed that task-based games provide more meaningful and engaging situations to increase learners' motivation for learning as well as for meaning negotiation (Chiu et al., 2012). In other words, they are more capable of carrying out both gameplay and learning missions.
Yet, as different tasks draw on different second-language (SL) features, and different game genres tend to offer different tasks (for example, the Second Life game genre offers a venue facilitating SL communication; Chen, 2016), our results lend more support to the contention that whether the tasks are well designed to engage players and at the same time meet the purpose of L2 vocabulary learning will determine the learning outcomes.

From Table 5 we also recognize that the low educational level students (primary school and kindergarten) and university students had better vocabulary gains than the middle-level (junior and senior high school) students. One plausible reason might be that most of the middle-level students in the primary studies faced pressure from national school entrance examinations. One consequence was that there were few studies on middle-level students; another was that, under this pressure, the stereotypical impression that games distract from academic learning minimized the pleasure of digital game learning and therefore decreased the learning outcomes. This rationale could also lend support to the phenomenon that there was no study on middle-level students in Condition 2.

In the digital game-based L2 vocabulary learning scenarios, students with a certain amount of L2 learning experience (beyond-beginning level) had better vocabulary gains than those without (beginning level). One possible reason might be that prior L2 vocabulary knowledge enables learners to find connections between words and therefore accelerates their L2 vocabulary gains (Abraham, 2008; Pulido, 2003).

4.2.1.2. Condition 2. The results of the moderator analyses in Condition 2 illustrate the scenarios of L2 vocabulary learning influenced by the added-or-changed features of digital games.
Based on our findings, digital games with features added or changed for educational purposes (e.g., games with scaffolding) work better than their base forms when the intervention setting is more informal and the instructors employ productive vocabulary knowledge assessments. This suggests that students are more self-motivated in an environment with less instruction and pressure, lending support to Norris and Ortega's (2000) research finding that "less instruction is more effective" (p. 474). In addition, while some interventions (e.g., computer-mediated glosses) have been reported as favoring learners' passive/receptive vocabulary gains (Abraham, 2008), our results reveal the potential of a well-designed feature to enhance students' productive vocabulary knowledge (the capability of using vocabulary in context). This result is in concordance with the premise of effective DGBL: successful DGBL lies in a well-designed game in which the engagement of gameplay meets the needs of the educational goals.


In sum, to optimize digital game-based L2 vocabulary learning, it is suggested that game designers focus more attention on how language is used in the tasks or contexts. On the other side, instructors are encouraged to create a pressure-free environment and to apply assessments of productive vocabulary knowledge when employing digital games for L2 vocabulary learning.

4.2.2. Non-significant variables

The non-significant influence of linguistic distance in both conditions conflicts with cross-linguistic interference theory, in which learners' L1 plays a significant role in learning an L2 (Grosjean, 2012; Selinker, 1972), but is nevertheless consistent with Chen et al.'s (2018) finding. One conceivable reason might be that linguistic features (e.g., phonology, morphology, or semantics) (VanPatten, 1994) had not been a target in the game-based vocabulary learning, or that our coding criterion for linguistic distance, the score in Chiswick and Miller's (2005) index, was the result of a 24-week language training program. It should be noted that the longest intervention duration in the 26 studies was 105 days (Segers & Verhoeven, 2003). Our results also suggest that there might be no significant divergence among vocabulary games from custom design, the web, or off-the-shelf software. It is also noteworthy that while digital games are superior to various alternative activities in terms of motivating L2 vocabulary learning, the effectiveness of digital games is not necessarily accelerated by longer durations. Fatigue, boredom (Segers & Verhoeven, 2003), and short-term memory loss are elements that have been considered to contribute to this non-significance (Chiu, 2013; Cowan & AuBuchon, 2008).

5. Summary and critical evaluation

A wide range of variability has been highlighted regarding the research designs and moderator variables in the empirical studies. The contributions of this meta-analysis can be summarized in three aspects.
The first is the identification of the four conditions derived from Mayer's (2015) taxonomy of research designs in DGBL, and of their respective effect sizes. It enables further understanding of the status quo, and helps us to connect each research question with the corresponding research design. The second is the development of a structure diagram of the relationship between the four conditions and the game-related factors by their locality. It connects the conditions of research designs and their respective factors through a visualized structure, which helps us identify the gaps in research and enhances communication between game designers, researchers, and educational practitioners. The third is the examination of eight moderator variables that directly or indirectly influenced the effectiveness of DGBL under Conditions 1 and 2. It indicates the influence of covariate variables when conducting research, and implies a future research direction: the use of the moderator variables as independent variables in future research. The only game-internal moderator variable is game type, which not only advocates detailed descriptions of game-related factors in research, but also implies a direction for further work, namely research on the effects of content-related variables.

Nevertheless, it should be noted that there are also limitations along with the contributions. Breaking down the 26 empirical studies into four categories leads to an insufficient quantity of studies for solid evidence-based theory. Therefore, our findings should be treated as the inception of research rather than a complete account of the phenomenon. It is recommended that more effort be made not only to increase the empirical evidence for each condition and moderator, but also to augment their numbers. In addition, it is suggested that researchers apply digital games for research with caution.
Detailed descriptions of all possible variables are encouraged to constrain undesired influences on the outcomes. On the other side, it is suggested that educational practitioners consider the quality and the functions of educational digital games before use. After all, successful DGBL should satisfy both the gameplay and the purposes of education.

6. Conclusion

L2 vocabulary learning has been deemed a daunting process. The results of this meta-analysis study suggest that the use of digital games can effectively motivate and enhance students' L2 vocabulary learning. Our research findings also illustrate various game-learning scenarios in which different factors might lead to profoundly different learning outcomes. Although previous studies have reported the benefits of using digital games to enhance L2 vocabulary learning, this study advances the research by offering more and newer empirical evidence to support the argument. Further, drawing on Mayer's (2015) theoretical framework on digital game learning, this study develops a framework of four-conditional research designs based on the empirical evidence. Under the framework, research designs in the field were differentiated into four conditions: Condition 1, examining the effect sizes of L2 vocabulary gains when playing a digital game versus receiving alternative activities; Condition 2, examining the effect sizes when playing a game with added or changed features versus the game in its base form; Condition 3, examining the effect sizes when playing a digital game versus receiving an intervention with equivalent content through different means; and Condition 4, examining the effect sizes of L2 vocabulary gains influenced by a non-game related variable. Further, the four conditions are connected with their respective game-related factors by their locality, and the relationship is visualized through the structure diagram.
That is, Condition 1 investigates the total effect of all potential factors, Condition 2 investigates the game-internal factors, Condition 3 investigates the potential effect of the interface, and Condition 4 investigates the game-external factors. Researchers are encouraged to test and extend the four-conditional research designs in diverse academic disciplines within the scope of digital game-based learning.

What is more, this study investigated eight potential moderators in Conditions 1 and 2. The results depict the effectiveness of digital games in a variety of L2 vocabulary learning scenarios. They suggest that the best effects of digital game-based L2 vocabulary learning might take place when university and elementary students with prior L2 learning experience play task-based games. The results from


Condition 2 suggest that the added features in games generated a medium effect size on L2 vocabulary gains, and that games with added or changed features outperform their base forms in accelerating students' productive vocabulary knowledge when they are implemented in a less time-constrained environment.

Finally, our study results illustrate the status quo of research on digital game-based L2 vocabulary learning. On the one hand, a limitation of this study is that there are not enough studies for better confirmation of the effect sizes for the four conditions, and there has been a shortage of well-defined terms regarding the crucial concepts of DGBL, including the definition of what a game is and how a task game is defined; on the other hand, it suggests that DGBL can be a very effective tool for L2 vocabulary learning if it is well designed to meet the pedagogical purposes and the learners' needs, echoing the call for the educational value of digital games (Hong, Cheng, Hwang, Lee, & Chang, 2009).

Acknowledgement

This work was financially supported by the Ministry of Science and Technology, Taiwan, under grant number MOST 105-2511-S-003-052-MY3, and by the "Institute for Research Excellence in Learning Sciences" of National Taiwan Normal University (NTNU) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. In addition, we would like to give our thanks to the reviewers for their insightful advice.

Appendices. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.compedu.2018.06.020.

References

deHaan, J., Reed, W. M., & Kuwada, K. (2010). The effect of interactivity with a music video game on second language vocabulary recall. Language, Learning and Technology, 14(2), 74–94.
Abraham, L. B. (2008).
Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21(3), 199–226. http://dx.doi.org/10.1080/09588220802090246.
Aghlara, L., & Tamjid, N. H. (2011). The effect of digital games on Iranian children's vocabulary retention in foreign language acquisition. Procedia-Social and Behavioral Sciences, 29, 552–560.
Ali Mohsen, M. (2016). The use of computer-based simulation to aid comprehension and incidental vocabulary learning. Journal of Educational Computing Research, 54(6), 863–884.
AlShaiji, O. A. (2015). Video games promote Saudi children's English vocabulary retention. Education, 136(2), 123–132.
Ashraf, H., Motlagh, F. G., & Salami, M. (2014). The impact of online games on learning English vocabulary by Iranian (low-intermediate) EFL learners. Procedia-Social and Behavioral Sciences, 98, 286–291.
Aslanabadi, H., & Rasouli, G. (2013). The effect of games on improvement of Iranian EFL vocabulary knowledge in kindergartens. International Review of Social Sciences and Humanities, 6(1), 186–195.
Atay, D., & Ozbulgan, C. (2007). Memory strategy instruction, contextual learning and ESP vocabulary recall. English for Specific Purposes, 26(1), 39–51.
Baralt, M., & Gómez, J. M. (2017). Task-based language teaching online: A guide for teachers. Language, Learning and Technology, 21(3), 28–43.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2009). Introduction to meta-analysis. Wiley Online Library.
Breen, M. (1987). Learner contributions to task design. Language Learning Tasks, 7, 23–46.
Cairns, P., Cox, A., & Nordin, A. I. (2014). Immersion in digital games: Review of gaming experience research. Handbook of Digital Games, 1, 767.
Calvo-Ferrer, J. R. (2017). Educational games as stand-alone learning tools and their motivational effect on L2 vocabulary acquisition and perceived learning gains. British Journal of Educational Technology, 48(2), 264–278.
http://dx.doi.org/10.1111/bjet.12387. Chen, J. C. (2016). The crossroads of English language learners, task-based instruction, and 3D multi-user virtual learning in Second Life. Computers & Education, 102, 152–171. Chen, M. H., Tseng, W. T., & Hsiao, T. Y. (2018). The effectiveness of digital game-based vocabulary learning: A framework-based view of meta-analysis. British Journal of Educational Technology, 49, 69–77. http://dx.doi.org/10.1111/bjet.12526. Chiswick, B. R., & Miller, P. W. (2005). Linguistic distance: A quantitative measure of the distance between English and other languages. Journal of Multilingual and Multicultural Development, 26(1), 1–11. Chiu, Y. H. (2013). Computer-assisted second language vocabulary instruction: A meta-analysis. British Journal of Educational Technology, 44(2), E52–E56. http://dx. doi.org/10.1111/j.1467-8535.2012.01342.x. Chiu, Y. H., Kao, C. w, & Reynolds, B. L. (2012). The relative effectiveness of digital game-based learning types in English as a foreign language setting: A metaanalysis. British Journal of Educational Technology, 43(4), E104–E107. Cooper, H. (2015). Research synthesis and meta-analysis: A step-by-step approach, Vol. 2. Sage publications. Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis. Russell Sage Foundation. Cowan, N., & AuBuchon, A. M. (2008). Short-term memory loss over time without retroactive stimulus interference. Psychonomic Bulletin & Review, 15(1), 230–235. Davis, K., & Fullerton, S. (2016). Connected learning in and after school: Exploring technology's role in the learning experiences of diverse high school students. The Information Society, 32(2), 98–116. Estaire, S., & Zanón, J. (1994). Planning classwork: A task based approach. Oxford: Macmillan Heinemann. Fahim, M., & Sabah, S. (2012). An ecological analysis of the role of role-play games as affordances in Iranian EFL pre-university students' vocabulary learning. 
Theory and Practice in Language Studies, 2(6), 1276–1284. http://dx.doi.org/10.4304/tpls.2.6.1276-1284. Franciosi, S. J. (2017). The effect of computer game-based learning on FL vocabulary transferability. Educational Technology & Society, 20(1), 123–133. Franciosi, S. J., Yagi, J., Tomoshige, Y., & Ye, S. (2016). The effect of a simple simulation game on long-term vocabulary retention. CALICO Journal, 33(3), 355–379. http://dx.doi.org/10.1558/cj.v33i2.26063. Ghanbaran, S., & Ketabi, S. (2014). Multimedia games and vocabulary learning. Theory and Practice in Language Studies, 4(3), 489–496. http://dx.doi.org/10.4304/tpls. 4.3.489-496. Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. Grosjean, F. (2012). An attempt to isolate, and then differentiate, transfer and interference. International Journal of Bilingualism, 16(1), 11–21. Guzzo, R. A., Jackson, S. E., & Katzell, R. A. (1987). Meta-analysis analysis. Research in Organizational Behavior, 9(1), 407–442. Hedges, L. V. (1982). Statistical methodology in meta-analysis. Princeton, NJ: ERIC Clearinghouse on Tests. Measurement and Evaluation, Educational Testing Service. Hedges, L. V. (1985). Statistical methodology in meta-analysis. Journal of Educational Statistics, 20. Hedges, L. V. (1992). Meta-analysis. Journal of Educational and Behavioral Statistics, 17(4), 279–296.
Hedges, L. V., & Olkin, I. (2014). Statistical methods for meta-analysis. Academic Press.
Hedges, L. V., & Pigott, T. D. (2004). The power of statistical tests for moderators in meta-analysis. Psychological Methods, 9(4), 426.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486.
Homer, B. D., Plass, J. L., Raffaele, C., Ober, T. M., & Ali, A. (2018). Improving high school students' executive functions through digital game play. Computers & Education, 117, 50–58.
Hong, J. C., Cheng, C. L., Hwang, M. Y., Lee, C. K., & Chang, H. Y. (2009). Assessing the educational values of digital games. Journal of Computer Assisted Learning, 25(5), 423–437.
Huang, Y.-M., & Huang, Y.-M. (2015). A scaffolding strategy to develop handheld sensor-based vocabulary games for improving students' learning motivation and performance. Educational Technology Research and Development, 63(5), 691–708. http://dx.doi.org/10.1007/s11423-015-9382-9.
Hwang, G.-J., & Wang, S. Y. (2016). Single loop or double loop learning: English vocabulary learning performance and behavior of students in situated computer games with different guiding strategies. Computers & Education, 102, 188–201.
Jalali, S., & Dousti, M. (2012). Vocabulary and grammar gain through computer educational games. GEMA Online Journal of Language Studies, 12(4), 1077–1088.
Kaminski, J. W., Valle, L. A., Filene, J. H., & Boyle, C. L. (2008). A meta-analytic review of components associated with parent training program effectiveness. Journal of Abnormal Child Psychology, 36(4), 567–589.
Koricheva, J., Gurevitch, J., & Mengersen, K. (2013). Handbook of meta-analysis in ecology and evolution. Princeton University Press.
Laufer, B., & Paribakht, T. S. (1998). The relationship between passive and active vocabularies: Effects of language learning context. Language Learning, 48(3), 365–391.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15.
Letchumanan, K., Tan, B. H., Paramasivam, S., Sabariah, M. R., & Muthusamy, P. (2015). Incidental learning of vocabulary through computer-based and paper-based games by secondary school ESL learners. Pertanika Journal of Social Science and Humanities, 23(3), 725–740.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis (Vol. 49). Thousand Oaks, CA: Sage Publications.
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. Handbook of Second Language Acquisition, 2(2), 413–468.
Lu, F. C., & Chang, B. (2016). Role-play game-enhanced English for a specific-purpose vocabulary-acquisition framework. Educational Technology & Society, 367–377.
Ma, Q., & Kelly, P. (2006). Computer assisted vocabulary learning: Design and evaluation. Computer Assisted Language Learning, 19(1), 15–45.
Maher, J. M., Markey, J. C., & Ebert-May, D. (2013). The other half of the story: Effect size analysis in quantitative research. CBE-Life Sciences Education, 12(3), 345–351.
Mayer, R. E. (2015). On the need for research evidence to guide the design of computer games for learning. Educational Psychologist, 50(4), 349–353.
Moreno-Ger, P., Burgos, D., Martínez-Ortiz, I., Sierra, J. L., & Fernández-Manjón, B. (2008). Educational game design for online education. Computers in Human Behavior, 24(6), 2530–2540.
Mostert, M. P. (2001). Facilitated communication since 1995: A review of published studies. Journal of Autism and Developmental Disorders, 31(3), 287–313.
Muhanna, W. (2012). Using online games for teaching English vocabulary for Jordanian students learning English as a foreign language. Journal of College Teaching & Learning, 9(3), 235.
Nation, I. S. (2001). Learning vocabulary in another language. Ernst Klett Sprachen.
Nation, I. P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. Vocabulary: Description, Acquisition and Pedagogy, 14, 6–19.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50(3), 417–528.
Perez, M. M., Van den Noortgate, W., & Desmet, P. (2013). Captioned video for L2 listening and vocabulary learning: A meta-analysis. System, 41(3), 720–739. http://dx.doi.org/10.1016/j.system.2013.07.013.
Prensky, M. (2001). Digital natives, digital immigrants part 1. On the Horizon, 9(5), 1–6.
Pulido, D. (2003). Modeling the role of second language proficiency and topic familiarity in second language incidental vocabulary acquisition through reading. Language Learning, 53(2), 233–284.
Reinhardt, J., & Sykes, J. (2014). Special issue commentary: Digital game and play activity in L2 teaching and learning. Language Learning & Technology, 18(2), 2–8.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638.
Saffarian, R., & Gorjian, B. (2012). Effect of computer-based video games for vocabulary acquisition among young children: An experimental study. Journal of Comparative Literature and Culture, 1(3), 44–48.
Sandberg, J., Maris, M., & Hoogendoorn, P. (2014). The added value of a gaming context and intelligent adaptation for a mobile learning application for vocabulary learning. Computers & Education, 76, 119–130. http://dx.doi.org/10.1016/j.compedu.2014.03.006.
Saville-Troike, M. (1984). What really matters in second language learning for academic achievement? TESOL Quarterly, 199–219.
Schmitt, N. (2008). Review article: Instructed second language vocabulary learning. Language Teaching Research, 12(3), 329–363.
Schoemaker, K., Mulder, H., Deković, M., & Matthys, W. (2013). Executive functions in preschool children with externalizing behavior problems: A meta-analysis. Journal of Abnormal Child Psychology, 41(3), 457–471.
Segers, E., & Verhoeven, L. (2003). Effects of vocabulary training by computer in kindergarten. Journal of Computer Assisted Learning, 19(4), 557–566.
Selinker, L. (1972). Interlanguage. IRAL-International Review of Applied Linguistics in Language Teaching, 10(1–4), 209–232.
Shintani, N. (2014). The effectiveness of processing instruction and production-based instruction on L2 grammar acquisition: A meta-analysis. Applied Linguistics, 36(3), 306–325.
Steinkuehler, C., Squire, K., & Sawyer, K. (2014). Videogames and learning. Cambridge Handbook of the Learning Sciences, 377–396.
Stock, W. A. (1994). Systematic coding for research synthesis. The Handbook of Research Synthesis, 236, 125–138.
Sundqvist, P., & Wikström, P. (2015). Out-of-school digital gameplay and in-school L2 English vocabulary outcomes. System, 51, 65–76.
Sweller, J. (2011). Cognitive load theory. Psychology of Learning and Motivation, Vol. 55 (pp. 37–76). Elsevier.
Vahdat, S., & Behbahani, A. R. (2013). The effect of video games on Iranian EFL learners' vocabulary learning. Reading, 13(1).
VanPatten, B. (1994). Evaluating the role of consciousness in second language acquisition: Terms, linguistic features & research methodology. Consciousness in Second Language Learning, 27–36.
Viechtbauer, W. (2007). Accounting for heterogeneity via random-effects models and moderator analyses in meta-analysis. Zeitschrift für Psychologie/Journal of Psychology, 215(2), 104–121.
Wang, S. Y., Hwang, G. J., & Chen, S. F. (2015). Development of a contextual game for improving English vocabulary learning performance of elementary school students in Taiwan. 2015 IIAI 4th International Congress on Advanced Applied Informatics (IIAI-AAI) (pp. 268–272). http://dx.doi.org/10.1109/iiai-aai.2015.161.
Willis, J. (1996). A flexible framework for task-based learning. Challenge and Change in Language Teaching, 52–62.
Yen, L., Chen, C. M., & Huang, H. B. (2016). Effects of mobile game-based English vocabulary learning app on learners' perceptions and learning performance: A case study of Taiwanese EFL learners.
Yip, F. W., & Kwan, A. C. (2006). Online vocabulary games as a tool for teaching and learning English vocabulary. Educational Media International, 43(3), 233–249.
Young, S. S. C., & Wang, Y. H. (2014). The game embedded CALL system to facilitate English vocabulary acquisition and pronunciation. Educational Technology & Society, 17(3), 239–251.