Digital game-based second-language vocabulary learning and conditions of research designs: A meta-analysis study

Computers & Education 125 (2018) 345–357 Contents lists available at ScienceDirect Computers & Education journal homepage: www.elsevier.com/locate/c...



Computers & Education journal homepage: www.elsevier.com/locate/compedu


Yu-Ling Tsai (a), Chin-Chung Tsai (b, c, ∗)

(a) Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, #43, Sec. 4, Keelung Rd., Taipei 106, Taiwan
(b) Program of Learning Sciences, National Taiwan Normal University, Taipei City, Taiwan
(c) Institute for Research Excellence in Learning Sciences, National Taiwan Normal University, Taipei City, Taiwan

ARTICLE INFO

Keywords: Applications in subject areas; Evaluation of CAL systems; Improving classroom teaching; Pedagogical issues

ABSTRACT

Second language (L2) vocabulary learning has been deemed a daunting task for many students. This meta-analysis study aimed to explore the effectiveness of applying digital games for L2 vocabulary learning. A total of 26 published studies (2001–2017) conformed with the inclusion/exclusion criteria. Due to the diverse findings of previous meta-analysis research in the field, we propose a framework of four-condition research designs to differentiate the empirical studies in an attempt to disclose possible explanations for the diversity and to connect the specific learning mechanisms with the research evidence. The overall effect sizes of the studies in the four conditions are reported as follows: large for Condition 1 (experimental groups playing digital games versus control groups receiving alternative activities) (10 studies), medium for Condition 2 (experimental groups playing digital games with a feature added or changed versus control groups playing base-version games) (10 studies), medium to large for Condition 3 (experimental groups playing digital games and control/comparison groups receiving identical content via conventional means) (two studies), and non-significant for Condition 4 (all participants playing the same digital games but being grouped by a non-game related variable) (four studies). Next, a structure diagram is developed in which the four conditions of the research design are connected with their respective game-related factors based on their locality. Further, we conducted moderator analyses to examine how the eight potential moderator variables (game design, educational level, L2 proficiency level, linguistic distance, intervention setting, assessment type, game source and intervention length) influenced the effect sizes in Conditions 1 and 2 to illustrate various digital game-based L2 vocabulary learning scenarios. Finally, suggestions and implications are provided for game designers, educational practitioners, and researchers in the field.

1. Introduction

The importance of vocabulary knowledge in language learning has been stressed repeatedly (Ghanbaran & Ketabi, 2014; Saville-Troike, 1984). It is estimated that good comprehension of a written text requires at least 95%–99% coverage of the lexical items (Laufer & Ravenhorst-Kalovski, 2010; I. P. Nation & Waring, 1997; I. S. Nation, 2001). That is, there should be fewer than five words that are unfamiliar to the reader in a 100-word paragraph, and 8000–9000 word families are needed for an adult's general reading

∗ Corresponding author. 162, Section 1, Heping E. Rd., Taipei City 106, Taiwan. E-mail addresses: [email protected] (Y.-L. Tsai), [email protected], [email protected] (C.-C. Tsai).

https://doi.org/10.1016/j.compedu.2018.06.020 Received 19 August 2017; Received in revised form 23 April 2018; Accepted 20 June 2018

Available online 21 June 2018 0360-1315/ © 2018 Published by Elsevier Ltd.


(Schmitt, 2008). Faced with such a large vocabulary size, second or foreign language (L2) learners are often frustrated and bored in the process of vocabulary learning (Long, 1996). On the one hand, being aware of the central role of vocabulary knowledge in language learning (Atay & Ozbulgan, 2007), researchers and practitioners in the field of language education have been intensively testing, developing, or comparing factors and means that can enhance vocabulary learning efficiency. On the other hand, digital games have been applied as a learning tool in nearly all domains of education over the last two decades due to their capability to foster “cognitive and behavioral change” (Steinkuehler, Squire, & Sawyer, 2014, p. 1). Hence, abundant studies have been conducted either to test the effects or to compare the effectiveness of different types of digital games on L2 vocabulary learning (Moreno-Ger, Burgos, Martínez-Ortiz, Sierra, & Fernández-Manjón, 2008). While some empirical studies have reported positive effects of digital games on L2 vocabulary gains, some have revealed negative or mixed results (Sundqvist & Wikström, 2015; deHaan, Reed, & Kuwada, 2010). To cope with the issue, various studies have applied the meta-analysis method to generate an overall effect size (the difference in the L2 vocabulary gains of the experimental group and the control/comparison group) to offer a quantitative view of the subject matter. In Chiu's (2013) meta-analysis research on 16 computer-assisted language learning (CALL) studies (2005–2011), CALL without games produced a significantly better result of vocabulary learning (n = 9, d = 1.113, p = 0.003) than CALL with games (n = 7, d = 0.495, p = 0.001).
On the contrary, Chen, Tseng, and Hsiao (2018) meta-analyzed 10 selected empirical studies (2003–2014) and reported a large overall effect size (n = 10, fixed-effect model d = 0.784, p = 0.000, random-effects model d = 1.027, p = 0.000) of L2 vocabulary gains under the umbrella research design “game versus traditional instruction.”

To advance the digital game research on learning, however, Mayer (2015) argued that “broad doctrines should be replaced with testable theoretical models that contain specific learning mechanisms linked to research evidence on games for learning” (p. 350). To link the “specific learning mechanisms” and the research evidence, “the impact of playing the game on learning outcomes” (p. 350), Mayer further proposed a research model in which three conditions of research designs were presented to connect three types of research evidence (the cognitive-consequence type, the feature added-or-changed type, and the media-comparison type). Adapted and transformed into L2 vocabulary-learning scenarios, the aim of the first condition, comparing pretest-to-posttest gains of the L2 vocabulary of the experimental group that plays a video game and the control group that receives traditional instruction, is to assess the effect size between the two groups (cognitive-consequence type). The intention of the second condition, in which the control/comparison group plays the base version of a game and the experimental group plays the same game with one specific feature added or changed, is to gain further understanding of the effects of the specific features embedded in games on L2 vocabulary gains (the value added-or-changed type). The third condition, in which the experimental group plays a digital game and the control/comparison group receives the equivalent content of the game via a conventional medium (p. 351), aims to explore the effect size between the digital and the conventional media (the media-comparison type).
Therefore, there was a demand for a more nuanced differentiation of research designs on digital game-based L2 vocabulary learning. Drawing on Mayer's (2015) contention on digital game-based learning (DGBL) and the purposes of this study, the contributions of this meta-analysis study on digital game-based L2 vocabulary learning are examined from three aspects. First, this meta-analysis study, taking advantage of a number of empirical studies published in the field to date, advances the topic by decomposing the conditions (types) of the research designs applied by the empirical studies on digital game-based L2 vocabulary learning so as to connect the specific learning mechanisms of DGBL with the research evidence (the impacts on L2 vocabulary learning outcomes). Second, the respective overall effect sizes of the conditions are calculated to further understand the potential effects of the mechanisms on L2 vocabulary learning. Third, previous meta-analysis studies on digital game-based L2 vocabulary learning have used game types, age, and cognates as the potential moderators to investigate their influence on the strength of the effect sizes (Chen et al., 2018). As one of the desired promises of meta-analysis is to “determine the effects of moderators that have never been examined in an original empirical study” (Guzzo, Jackson, & Katzell, 1987, p. 414) and “any study-level variable that may exert a systematic influence on the outcome measure can be considered a potential moderator” (Viechtbauer, 2007, p. 110), in addition to the previous findings, this study intended to extend the scope of the potential moderators so as to better depict the scenarios of digital game-based L2 vocabulary learning. Our research questions were composed accordingly:

1. What have been the conditions of research designs applied by the empirical studies on digital game-based L2 vocabulary learning?
2. What is the overall effect size in each condition?
3. What are the moderator variables that have significant influences on the between-study variation in each condition?

2. Method

To address the three research questions, Comprehensive Meta-Analysis software (version 2.2.064; Biostat, Englewood, NJ) was employed for the computation. Four discrete steps were conducted: 1) searching for and identifying potential target studies, an iterative process; 2) developing a codebook based on the studies' research conditions and their study characteristics (moderator variables); 3) calculating the overall effect size for each condition; and 4) testing the influences of the potential moderators under each condition. A colleague in the same research field was invited as the co-coder for the searching and coding processes. The steps are described in detail below.


2.1. Searching for and identifying potential target studies

In the first screening stage, we used two keywords, vocabulary and game, in a Boolean search to screen the literature in three databases: Web of Science Core Collection (WOS), ERIC (EBSCOhost) and Scopus. Since the first document comprehensively depicting game-based learning was published in 2001 (Prensky, 2001), the search period was set from 2001 to the end of January 2017, and the articles were limited to those written in English. The search resulted in more than 300 studies, which were then imported into EndNote X7, a citation management software package. Next, the articles that did not use digital games for L2 vocabulary enhancement and the duplicates were deleted. Then the first author and the co-coder worked separately to screen each study by the theme evident in its title. The results of the two coders were compared. If the title of a study was ambiguous, the study was retained. The process reduced the potential candidates to 87 articles. The remaining articles in EndNote were sifted following the inclusion/exclusion criteria below to address the research questions.

1. A digital vocabulary learning game (not multimedia such as online glosses) was claimed to be implemented in the study as the key independent variable for L2 vocabulary learning.
2. Only experimental studies and quasi-experimental studies were included. Research reviews, case studies, qualitative research, and survey research were excluded.
3. Studies which did not include a game group and a control/comparison group were excluded.
4. Articles focusing on first language (L1) vocabulary acquisition were excluded.
5. Studies should report data from both the experimental and the control/comparison group sufficient for calculating the treatment effect sizes.
6. Only published studies were included. Unpublished studies were excluded as they had not been subject to peer review (Mostert, 2001). Although this exclusion might introduce a risk of publication bias, we chose to address the risk with the standard and widely accepted fail-safe number calculation, to avoid other unpredictable bias (Kaminski, Valle, Filene, & Boyle, 2008; Schoemaker, Mulder, Deković, & Matthys, 2013).
7. The candidate study should have an English-written e-file available online for retrieval.
8. To increase the comparability of the experimental results, studies focusing on learners with learning disabilities were excluded.

By the end of the screening procedure, 26 studies remained.

2.2. Developing the codebook

Drawing on Mayer's (2015) taxonomy of research designs on DGBL, we distinguished the conditions of the research designs applied by the 26 empirical studies. If the research design in a study was to compare the L2 vocabulary gains (from pretest to immediate posttest) for a group that plays digital games (experimental group) with a group that receives an alternative activity (control group), the study was coded as Condition 1, a research design for detecting the effects of digital games in general. If the research design was to compare L2 vocabulary gains of a group that plays the base version of a game (control/comparison group) with a group that plays the same game or games with one feature added or changed (experimental group), the study was coded as Condition 2, a research design for detecting the effectiveness of added or changed values in digital games. The studies with a research design of comparing a group that plays a digital game (experimental group) with a group that receives the equivalent content via conventional media or paper games (control/comparison group) were coded as Condition 3. The rest were coded as “Others” (as displayed in Table 1). The next phase was to generate a codebook to reveal potential moderator variables and the respective subgroups for later moderator analysis.
A moderator in a meta-analysis is a study characteristic with different levels found throughout the target studies that is possibly predictive of the outcomes (Borenstein, Hedges, Higgins, & Rothstein, 2009). As one of our research goals was to explore potential moderator variables comprehensively, the processes in coding the moderator variables were conducted following Cooper's (2009, 2015) coding procedure, including the codebook construction, coder training, discussion and negotiation on variation, and estimating reliability. The processes are comparable to Stock's (1994) eight-step training and coding procedures: 1) an overview of the synthesis by the principal coder, 2) consensus on forms and descriptions between the coders, 3) describing the method to organize the forms, 4) testing the forms on five to 10 studies, 5) estimating the total time for the coding process, 6) negotiation to achieve consensus on coded forms, 7) revision of the forms and the codebook, and 8) repeating the process on another study and so on until consensus is achieved (pp. 134–135).

Table 1
Codebook for the conditions of research design on DGBL.

Condition | Experimental group (E group) | Control/comparison group (C group) | Comparison
Condition 1 | Playing an off-the-shelf game | Engaging in alternative non-game-related activities | Video games vs. traditional instruction
Condition 2 | Playing the same game with one feature added or changed | Playing the base version of a game | A video game with a specific feature vs. the game without that specific feature
Condition 3 | Playing a digital game | Receiving the same content of the game via conventional media | The effectiveness of the learning media between video games and others
Condition 4 | Others | |

Following the coding processes, seven categorical moderator variables (game type, educational level, L2 proficiency level, linguistic distance, intervention setting, assessment type, game source) and one continuous moderator variable (intervention duration) were extracted for the moderator analyses. The eight potential moderator variables and their respective subgroup criteria are described as follows.

2.2.1. Game type

The game types were dichotomized into drill and task-based subgroups, similar to the taxonomy applied by previous meta-analysis studies in the same field (Chen et al., 2018; Chiu, Kao, & Reynolds, 2012). The drill games provide L2 vocabulary learners with repetitive practice of the words in different texts, as in matching games and grammar- and vocabulary-related games (e.g., Aghlara & Tamjid, 2011; Jalali & Dousti, 2012). They have scores, challenges, multimedia, and so on, but no meaningful task to work on. A task-based game, on the contrary, is a game with “a goal-oriented activity in which learners use language to achieve a real outcome” (Willis, 1996, p. 53). Players succeed in the game by completing the assigned tasks. These games' activities involve critical thinking and problem solving wrapped up in the form of meaningful tasks (Homer, Plass, Raffaele, Ober, & Ali, 2018), such as role-play games (Fahim & Sabah, 2012), strategy games (Saffarian & Gorjian, 2012) or adventure games (Vahdat & Behbahani, 2013), in which learners' focus is on meaning rather than on form (Breen, 1987; Estaire & Zanón, 1994).

2.2.2. Educational level

Due to the limited number of studies, educational levels were combined into three subgroups in which preschool and elementary school students were coded as the primary group (Aghlara & Tamjid, 2011; AlShaiji, 2015; Aslanabadi & Rasouli, 2013; Saffarian & Gorjian, 2012), junior and senior school students as the middle group (Jalali & Dousti, 2012; Muhanna, 2012), and university students as the high group (Ashraf, Motlagh, & Salami, 2014; Fahim & Sabah, 2012; Vahdat & Behbahani, 2013; Yip & Kwan, 2006).

2.2.3. L2 proficiency level

The difficulty in using learners' initial L2 proficiency levels for the moderator analysis was that the primary studies did not apply a standardized language assessment instrument, such as the TOEFL or TOEIC, to position learners' L2 proficiency levels (Norris & Ortega, 2000). To cope with this problem, we grouped learners' L2 proficiency into three levels based on the descriptions in the studies: beginning, beyond-beginning, and mixed groups. The studies with participants having no prior knowledge of the target language or with a primary level (e.g., L2 kindergarten students) were coded as the beginning group (Aghlara & Tamjid, 2011; AlShaiji, 2015; Aslanabadi & Rasouli, 2013; Jalali & Dousti, 2012). The studies in which learners had lower-, pre-intermediate, to intermediate L2 proficiency were put into the beyond-beginning group (Ashraf et al., 2014; Saffarian & Gorjian, 2012; Vahdat & Behbahani, 2013). Studies using pretests as the covariate variable without indicating learners' L2 proficiency were put into the mixed-level group.

2.2.4. Linguistic distance

As one of the L1 or the L2 in the 26 studies was English, we coded the linguistic distance between the two languages in each study based on the score of the other one in Chiswick and Miller's (2005, pp. 12–13) index, a coding approach previously adopted by Chen et al. (2018).
The score of a language in the index was the averaged outcome of a specific English achievement test taken by a group of learners with the same mother language after they received 24 weeks of English language training. If the score of the language in the index was larger than or equal to 2, suggesting a shorter learning distance of the language to English, the language was coded as “Close”, such as Farsi to English (2) (Aghlara & Tamjid, 2011) and Dutch to English (2.75) (Segers & Verhoeven, 2003); conversely, Mandarin to English (Huang & Huang, 2015) and Arabic to English (AlShaiji, 2015; Muhanna, 2012) were coded as “Far” for their scores of less than 2, suggesting a farther learning distance.

2.2.5. Assessment type

Learners' vocabulary knowledge has been differentiated into the passive/receptive type (the knowledge of a word's meaning) and the active/productive type (the knowledge of using words in context) (Laufer & Paribakht, 1998). Drawing on the descriptions of the target studies, we coded the assessments examining students' passive vocabulary knowledge, such as multiple-choice questions, as the receptive/passive assessment type. Those examining students' active vocabulary knowledge, such as filling in the blank (Letchumanan, Tan, Paramasivam, Sabariah, & Muthusamy, 2015), answering a question (Sandberg, Maris, & Hoogendoorn, 2014), composition, and presentation, were coded as the productive/active assessment type (Sandberg et al., 2014; Shintani, 2014).

2.2.6. Intervention setting

Because the participants are students engaged in learning, formal settings (in class) usually involve compulsory learning under teachers' supervision, while informal settings (e.g., home, out-of-school, or after-school classroom settings) (Franciosi, 2017; Sandberg et al., 2014; Yen, Chen, & Huang, 2016, pp. 255–262) often reflect interest-powered or self-motivated learning (Davis & Fullerton, 2016) in which students receive less instruction and pressure.


Table 2
The codebook for the moderators and their subgroup definitions.

Category | Subgroups | Definition
Game type | 1. Drill type | The drill-and-practice types of games that provide exposure to words through multiple texts.
Game type | 2. Task-based type | Games involving problem-solving, simulations, decision-making (Breen, 1987) with learners' focus on meanings rather than on word forms (Estaire & Zanón, 1994)
Educational level | 1. Primary | Preschool and elementary school students
Educational level | 2. Middle | Junior and senior school students
Educational level | 3. High | University students
L2 proficiency | 1. Beginning | Primary level, no prior knowledge, kindergartens
L2 proficiency | 2. Beyond-beginning | Pre-, lower-level, intermediate-level
L2 proficiency | 3. Mixed | Studies using pretest as covariate without grouping participants' language proficiency
Linguistic distance | 1. Close | The language scored ≥ 2
Linguistic distance | 2. Far | The language scored < 2
Intervention setting | 1. Formal | Playing games in class
Intervention setting | 2. Informal | Playing games after class or at home
Assessment type | 1. Receptive | Tests such as multiple-choice, which examine students' passive vocabulary knowledge
Assessment type | 2. Productive | Filling the blank, composition, presentation, etc., which test students' active vocabulary knowledge
Game source | 1. Custom-design | Games developed for the research
Game source | 2. Web | Games offering free access online
Game source | 3. Software | CD-ROM, off-the-shelf software
Intervention duration | (continuous) | The duration was counted by the day. One week is counted as seven days.
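The two numeric coding rules in Table 2 can be written out as a short sketch. This is only an illustration of the stated rules (a linguistic-distance score of 2 or more is "Close", and durations are counted in days with one week equal to seven days); the function names are hypothetical, and the score of 1.5 is an assumed placeholder for a language scoring below 2, not a value taken from the index.

```python
def code_linguistic_distance(score: float) -> str:
    """Table 2 rule: an index score >= 2 is coded 'Close', otherwise 'Far'."""
    return "Close" if score >= 2 else "Far"


def duration_in_days(weeks: float = 0, days: float = 0) -> float:
    """Table 2 rule: durations are counted by the day; one week is seven days."""
    return weeks * 7 + days


# Scores quoted in the text: Farsi (2) and Dutch (2.75) are both 'Close'.
print(code_linguistic_distance(2.0))    # Farsi to English
print(code_linguistic_distance(2.75))   # Dutch to English
# Assumed placeholder score below 2 (the text gives no exact value).
print(code_linguistic_distance(1.5))

# The shortest and longest treatments mentioned: one day and 15 weeks.
print(duration_in_days(days=1), duration_in_days(weeks=15))
```

Coding duration as a single number of days is what allows it to enter a meta-regression as a continuous moderator rather than as arbitrary bins.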

2.2.7. Game source

Regarding the sources of the games, if the game was specifically designed for the study, it was put into the custom-design group. If the game was borrowed from a website which offers free access to the public, it belonged to the web group. If the source of the game was a CD-ROM or off-the-shelf software, the game was put into the software group.

2.2.8. Intervention duration

Because the lengths of the treatments varied to a large degree in the 26 empirical studies (from one day to 15 weeks), instead of dividing them into an arbitrary number of subcategories, we coded each treatment duration by the day (one week equals seven days). The codebook for the moderators and their subgroup definitions is summarized in Table 2.

2.3. Conducting the meta-analysis

Meta-analysis, a term coined by Glass (1976) and widely applied as a supplemental method to traditional research reviews (Hedges, 1982), is a statistical approach aiming to provide statistical figures generated by extracting and integrating data from a large number of empirical studies within the same research domain. One of the main goals for conducting the meta-analysis is to pursue an overall effect size (the standardized mean difference between the experimental group and the control/comparison group) to “quantify the magnitude of the difference between groups” (Maher, Markey, & Ebert-May, 2013, p. 345). Another goal in the meta-analysis involves “the comparisons of the mean effects of groups of studies that have different characteristics” (Hedges & Pigott, 2004, p. 426), which, in other words, is to conduct moderator analyses. Two effect-size models, the fixed-effect model and the random-effects model, have been developed for the calculation of the effect sizes based on different inference models (Hedges & Vevea, 1998).
If the aim of the analyst is to make inferences about the observed studies, or about a set of identical studies (any variation between studies would be due to sampling errors), the fixed-effect model is considered appropriate; however, if the analyst aims to generalize the inferences beyond the observed studies to a population, the random-effects model is preferred. Based on our study goals, the statistical values under the random-effects model were employed for further analysis.

To obtain the overall effect size, first, we used Comprehensive Meta-Analysis (version 2.2.064) to convert the descriptive statistics and the independent outcome data (different outcome scores from independent assessments or the outcome scores of an assessment from independent groups) to effect sizes (hence, the number of the effect sizes was sometimes larger than the number of the studies), which were then combined into the overall effect size. Next, the significance of the overall effect size was examined by the test of the null hypothesis. If the p value of the test was significant (p < 0.05), the null was rejected, suggesting a significant difference between the two groups on L2 vocabulary gains. The metric for the effect sizes is Cohen's d. The Cohen's d values were transformed to Hedges's g, “a corrected standardized mean effect size” (Perez, Van den Noortgate, & Desmet, 2013, p. 726) using sample weights to address the issue of small-sample bias (Hedges, 1985; Lipsey & Wilson, 2001). According to Cohen's rules (Hedges & Olkin, 2014), an effect size of less than 0.2 is considered a small effect, between 0.2 and 0.8 is a medium effect, and larger than 0.8 is a large effect.

In addition, the heterogeneity of the effect sizes in each condition was examined by the Q-test (Cochran's Q). A significant Q-value (Qtotal) (p < 0.05) suggests significant dispersion of the effect sizes, indicating that there were true differences among the effect sizes, and moderator analyses were thus suggested. A non-significant Qtotal (p > 0.05), on the contrary, suggests that differences among the effect sizes were subject to sampling errors (Hedges, 1992).
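The conversion and pooling steps described above can be sketched as follows. This is a minimal illustration rather than the Comprehensive Meta-Analysis implementation: it assumes the usual small-sample correction factor J = 1 − 3/(4df − 1) for Hedges's g and the DerSimonian-Laird (method-of-moments) estimator for the between-study variance. The function names and the example inputs are hypothetical.

```python
import math


def hedges_g(d, n1, n2):
    """Convert Cohen's d to Hedges's g; return (g, variance of g)."""
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor (assumed form)
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects(effects, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird)."""
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2, "i2": i2}


# Hypothetical inputs: (Cohen's d, n experimental, n control) per study.
studies = [(1.2, 20, 20), (0.4, 30, 30), (0.9, 25, 25)]
gs, vs = zip(*(hedges_g(d, n1, n2) for d, n1, n2 in studies))
print(random_effects(gs, vs))
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effect one, which matches the reading of a non-significant Qtotal given above.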


The classic fail-safe N technique was applied to examine whether the effect sizes were affected by publication bias. To conduct the moderator analyses, we also adopted the Q-test for each moderator variable to examine the heterogeneity of the effect sizes between the subgroups (Qb) under the suggested mixed-effects model (a random-effects model is applied to combine studies within each subgroup, and a fixed-effect model to combine subgroups to produce the overall effect) (Borenstein et al., 2009, p. 183). As the intervention durations of the studies were coded as a continuous moderator variable, we applied the meta-regression method, under the method of moments (a mixed-effects model), to explore the relationship between the durations and the effect sizes.

3. Results

The results are presented in a sequence related to the three research questions.

3.1. Research condition

Following Mayer's (2015) three conditions of research designs, we coded 10 studies with 10 effect sizes and 642 participants as Condition 1 (a group that plays digital games versus a group that receives an alternative activity), and 10 studies with 14 effect sizes and 837 participants as Condition 2 (the control/comparison group that plays base versions of games versus the experimental group that plays games with one specific feature added or changed). The added or refined features tested include storylines, challenges and rewards, adaptive learning, embedded questions (passive versus active), a refined feedback system, embedded quizzes, simulation games, guiding strategies, ranking systems, and scaffolding. Two studies with two effect sizes and 129 participants were coded as Condition 3 (the experimental group that plays a digital game versus the control/comparison group that receives the equivalent content via conventional means). It should be noted that the focus of our coding criteria for Condition 3 was the equivalence of the content of the treatments received by the two groups.
Therefore, the means such as a paper booklet plus audio clips in the comparison group in Calvo-Ferrer's (2017) study, and the paper flash-card games in Letchumanan et al.’s (2015) study, were considered adequate to be included in the condition. In addition, Condition 4 was created to substitute the “Others” category. The reason for the substitution was that an additional condition of research designs other than the three mentioned above was identified. The common ground of the studies in this condition was that all the participants received exactly the same game as the treatment, but were grouped and compared by a non-game related factor, such as by positions (playing the game versus watching the game) (Ali Mohsen, 2016; deHaan et al., 2010), by motivation (voluntary versus compulsory) (Ma & Kelly, 2006), or by gaming frequencies (Sundqvist & Wikström, 2015), and the focus of the studies was to test the influences of a non-game related variable on digital game-based L2 vocabulary learning. Four studies (six effect sizes and 203 participants) were involved in Condition 4. The framework of the four-condition research designs for game learning research and the corresponding articles on digital game-based L2 vocabulary learning are displayed in Table 3.

Table 3
The framework of the four-condition research designs for digital game learning research and the corresponding articles on digital game-based L2 vocabulary learning.

Research design - purpose | Definition | Example | Study
Condition 1 - Effectiveness of digital games in general | Game play experimental group versus an alternative activity (control group) | Game vs. traditional instruction (such as memorizing, no treatment); Game vs. placebo (reading a text-only web page) | Yip and Kwan (2006); Muhanna (2012); Aghlara and Tamjid (2011); Jalali and Dousti (2012); Aslanabadi and Rasouli (2013); Vahdat and Behbahani (2013); Ashraf et al. (2014); AlShaiji (2015); Fahim and Sabah (2012); Saffarian and Gorjian (2012)
Condition 2 - Effectiveness of values added-or-changed in games | Experimental group playing games with one specific feature added or changed versus control group playing base versions of games | Enhanced version vs. original version; Game with scaffolding vs. game without; Game with active questions vs. game with passive questions; Mobile app with game vs. app without game features; Double loop vs. single loop | Segers and Verhoeven (2003); Sandberg et al. (2014); Young and Wang (2014); Huang and Huang (2015); Wang, Hwang, and Chen (2015); Franciosi, Yagi, Tomoshige, and Ye (2016); Hwang and Wang (2016); Lu and Chang (2016); Yen et al. (2016); Franciosi (2017)
Condition 3 - Effectiveness of media | Experimental group playing a digital game vs. control/comparison group receiving the same content via conventional means | Game vs. booklet with equivalent content + audio; Game vs. paper-based game with equivalent content | Calvo-Ferrer (2017); Letchumanan et al. (2015)
Condition 4 - Effectiveness of non-game related factors | All participants receiving the same digital game but grouped by a non-game related variable | Player vs. watcher; Out of class vs. in class; High gaming frequency vs. low frequency | Ali Mohsen (2016); deHaan et al. (2010); Ma and Kelly (2006); Sundqvist and Wikström (2015)

Note: k = the number of studies; Nes = the number of effect sizes; Npart = the number of participants.


3.2. Overall effect size

To answer the second research question, we ran Comprehensive Meta-Analysis (version 2.2.064) to compute the overall effect sizes for the four conditions respectively. The results suggest a large overall effect size (d = 0.986, 95% CI [0.590, 1.382], p < 0.001) for Condition 1, indicating that DGBL significantly outperformed alternative activities on students' L2 vocabulary gains. The Qtotal value shows that there was significant dispersion among the 10 effect sizes (Qtotal = 47.732, df = 9, p < 0.001), and that over 80% of the dispersion was attributable to true differences rather than to sampling error (I² = 81.145). The results for Condition 2 (d = 0.445, 95% CI [0.218, 0.672], p < 0.001) indicate that the added-or-changed features had an overall potential to significantly increase the effectiveness of digital games, by a medium effect size, compared to digital games in their base versions. The heterogeneity test (Qtotal = 30.120, df = 13, p < 0.01, I² = 56.840) also calls for further moderator analyses.

Based on the suggestion that a meta-analysis can still be conducted meaningfully with a minimum of two studies (Koricheva, Gurevitch, & Mengersen, 2013), we also calculated the overall effect sizes for Conditions 3 and 4. A significant medium-to-large overall effect size (d = 0.733, 95% CI [0.376, 1.091], p < 0.001) is reported for Condition 3, indicating that digital games were more effective for L2 vocabulary learning than other means with equivalent content. The heterogeneity test suggests no true variance between the two effect sizes (Qtotal = 0.597, df = 1, p > 0.05, I² = 0.000). The overall effect size for Condition 4 was non-significant (d = 0.503, 95% CI [−0.840, 1.846], p = 0.463). However, the heterogeneity test (Qtotal = 125.194, df = 5, p < 0.001, I² = 96.006) indicates a high degree of dispersion among the effect sizes, which could be explained partially by two particular studies. In both studies, the game groups played a game while the comparison groups watched the same game being played. Despite the similar research design, players in deHaan et al.'s (2010) study recalled significantly less vocabulary than the watchers did (d = −2.43, 95% CI [−3.010, −1.854]), whereas players in Ali Mohsen's (2016) study recalled significantly more vocabulary than the watchers (d = 2.232, 95% CI [1.471, 2.994]). The details of the overall effect sizes and the heterogeneity tests in Conditions 1, 2, 3, and 4 are summarized in Table 4 (the individual-study details are displayed in the Appendices).

As we included only published studies in the meta-analysis, we ran the classic fail-safe N analysis to address publication bias. Rosenthal (1979) suggests that if the "tolerance level" (the fail-safe X) is larger than the requisite number 5Nes + 10 (Nes = the total number of reported effect sizes), the result of the meta-analysis is considered "resistant to the file drawer problem" of unreported null effects (p. 640). The fail-safe N is 283 for Condition 1 and 117 for Condition 2, both larger than the requisite numbers (5 × 10 + 10 = 60 and 5 × 14 + 10 = 80, respectively).

3.3. Moderator analysis

To address the third research question, we produced Table 5 to display the statistical results of the moderator analyses for Conditions 1 and 2, but not for Conditions 3 and 4 due to the limited study numbers.

3.3.1. Significant moderator variables

Under the mixed-effects model, three categorical moderators (game types, educational levels, L2 proficiency levels) had a significant influence on learning in Condition 1, and two (intervention settings and assessment types) in Condition 2. The details are displayed and discussed in the following.

3.3.1.1. Game type.
The results of the heterogeneity test (Qb = 4.235, df = 1, p < 0.05) indicate that game types were a significant moderator variable influencing the L2 vocabulary learning outcomes in Condition 1. The effect size of task-based games (d = 1.669, df = 2) was significantly larger than that of drill-type games (d = 0.711, df = 6). However, we could not claim a moderating effect of game types in Condition 2 (Qb = 1.506, df = 1, p > 0.05).

3.3.1.2. Educational level.

The effect sizes of the subgroups in educational levels also differed significantly in Condition 1 (Qb = 17.420, df = 2, p < 0.001). There was a large effect size for university students (d = 1.241, df = 3, p < 0.001) as well as for preschool and elementary students (d = 1.128, df = 3, p < 0.01), and a small to medium effect size for junior and senior high school students (d = 0.310, df = 1, p < 0.05). This indicates that digital games worked better at the two ends of the educational levels than for junior and senior high school students. It should be noted that no junior or senior high school participants were involved in Condition 2, where the remaining two levels did not differ significantly (Qb = 0.031, df = 1, p > 0.05).

3.3.1.3. L2 proficiency level.

Students' L2 proficiency levels also played a significant moderating role in L2 vocabulary learning in Condition 1 (Qb = 7.284, df = 2, p < 0.05). The results suggest that digital games had the potential to produce a large effect size in L2 vocabulary learning when the students held a certain degree of L2 vocabulary knowledge (beyond-beginning learners) (d = 1.744, df = 2, p < 0.001). In contrast, they produced a medium effect size when students were beginning learners (d = 0.637, df = 3, p < 0.001). In other words, students with some degree of prior L2 knowledge had better L2 vocabulary gains from playing digital vocabulary games than beginning learners did. Nevertheless, L2 proficiency levels did not have a significant moderating influence on the effect sizes in Condition 2 (Qb = 2.801, df = 2, p > 0.05).

3.3.1.4. Intervention setting.

While Condition 1 included no informal settings for statistical evaluation, intervention settings played a significant role in moderating the effect sizes of the added-or-changed values in games in Condition 2 (Qb = 9.619, df = 1, p < 0.01). This suggests that when students played the more sophisticated games in informal settings, the effect size was medium to large (d = 0.771, df = 4, p < 0.001), but the added or changed features had no significant influence in formal settings (d = 0.219, df = 8, p > 0.05).

3.3.1.5. Assessment type.

Although productive assessments were absent in Condition 1, three cases were identified in Condition 2. The results indicate that the effect sizes of the added-or-changed values were significantly moderated by the assessment types (Qb = 6.273, df = 1, p < 0.05). The effect size was large when students were tested on their productive (active) vocabulary knowledge (d = 0.839, df = 2, p < 0.001), but small (d = 0.332, df = 10, p < 0.05) when they were tested on their receptive (passive) vocabulary knowledge.

Table 4. The overall effect size and the heterogeneity test in Conditions 1, 2, 3, and 4.

| | Condition 1 | Condition 2 | Condition 3 | Condition 4 |
|---|---|---|---|---|
| Model | RE | RE | RE | RE |
| k | 10 | 10 | 2 | 4 |
| Nes | 10 | 14 | 2 | 6 |
| n | 642 | 837 | 129 | 203 |
| d [CI (95%)] | 0.986 [0.590, 1.382] | 0.445 [0.218, 0.672] | 0.733 [0.376, 1.091] | 0.503 [−0.840, 1.846] |
| g [CI (95%)] | 0.970 [0.581, 1.359] | 0.438 [0.215, 0.662] | 0.724 [0.371, 1.077] | 0.490 [−0.828, 1.808] |
| p (test of null) | < 0.001 | < 0.001 | < 0.001 | 0.463 |
| Heterogeneity: Qtotal | 47.732*** | 30.120** | 0.597 | 125.194*** |
| Heterogeneity: df | 9 | 13 | 1 | 5 |
| Heterogeneity: I² | 81.145 | 56.840 | 0.000 | 96.006 |
| Classic fail-safe N (number of missing studies that would bring p-value to > 0.05) | 283 | 117 | - | - |

Note. RE = random-effects model; k = the total number of studies; Nes = the total number of effect sizes; n = total sample size; d = Cohen's d; g = Hedges's g. p < .05: Fail-safe N > 5 Nes + 10.

Table 5. Effect sizes of moderator variables at different levels in Conditions 1 and 2 (mixed-effects model). C1 = Condition 1; C2 = Condition 2.

| Moderator | Level | C1 Nes | C1 d | C1 p | C1 CI (95%) | C1 Qb (df) | C2 Nes | C2 d | C2 p | C2 CI (95%) | C2 Qb (df) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Game type | Drill | 7 | 0.711 | < .001 | [0.384, 1.038] | 4.235* (1) | 5 | 0.264 | .067 | [−0.019, 0.547] | 1.506 (1) |
| | Task | 3 | 1.669 | < .001 | [0.817, 2.521] | | 9 | 0.523 | .001 | [0.221, 0.825] | |
| Educational level | Low | 4 | 1.128 | .003 | [0.387, 1.869] | 17.420*** (2) | 7 | 0.441 | .001 | [0.183, 0.699] | 0.031 (1) |
| | Middle | 2 | 0.310 | .023 | [0.043, 0.577] | | 0 | - | - | - | |
| | High | 4 | 1.241 | < .001 | [0.868, 1.615] | | 7 | 0.485 | .024 | [0.064, 0.906] | |
| L2 proficiency level | Beginning | 4 | 0.637 | < .001 | [0.306, 0.967] | 7.284* (2) | 7 | 0.441 | .001 | [0.183, 0.699] | 2.801 (2) |
| | Beyond-beginning | 3 | 1.744 | < .001 | [1.011, 2.478] | | 5 | 0.337 | .217 | [−0.197, 0.871] | |
| | Mixed | 3 | 0.801 | .016 | [0.148, 1.455] | | 2 | 0.834 | < .001 | [0.398, 1.271] | |
| Linguistic distance | Close | 7 | 1.050 | < .001 | [0.502, 1.698] | 0.175 (1) | 4 | 0.568 | .001 | [0.222, 0.914] | 0.550 (1) |
| | Far | 3 | 0.868 | < .001 | [0.212, 1.523] | | 10 | 0.397 | .008 | [0.106, 0.688] | |
| Intervention setting | Formal | 10 | - | - | - | - | 9 | 0.219 | .117 | [−0.055, 0.492] | 9.619** (1) |
| | Informal | 0 | - | - | - | | 5 | 0.771 | < .001 | [0.554, 0.989] | |
| Assessment type | Receptive | 10 | - | - | - | - | 11 | 0.332 | .010 | [0.080, 0.584] | 6.273* (1) |
| | Productive | 0 | - | - | - | | 3 | 0.839 | < .001 | [0.532, 1.145] | |
| Game source | Custom-design | 0 | - | - | - | 1.190 (1) | 8 | 0.419 | .035 | [0.030, 0.808] | 1.494 (2) |
| | Web | 5 | 0.760 | < .001 | [0.340, 1.179] | | 4 | 0.553 | .001 | [0.240, 0.866] | |
| | Software | 5 | 1.221 | < .001 | [0.507, 1.935] | | 2 | 0.228 | .287 | [−0.192, 0.647] | |
| Intervention duration | Mixed-effects regression | | Slope −0.008 | .258 | [−0.022, 0.006] | Qm 1.279 (1) | | Slope −0.003 | .370 | [−0.009, 0.004] | Qm 0.803 (1) |

Note: k = Number of studies; Nes = Number of effect sizes; df = degrees of freedom; d = Cohen's d; Slope = slope of regression line; Qm = Q-value under mixed-effects regression model. A dash marks a cell with no reported value. *p < 0.05; **p < 0.01; ***p < 0.001.
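The heterogeneity statistics reported in this section can be cross-checked from the published summary values alone. The following is a minimal sketch, not the procedure used by Comprehensive Meta-Analysis: the function names are hypothetical, I² is computed with the standard definition 100 × (Q − df) / Q (floored at zero), the chi-square tail probability is only implemented for df = 1 and df = 2 (which covers the significant Qb values), and the p-value helper assumes that the reported 95% CI is a symmetric normal-theory interval.

```python
import math


def i_squared(q_total: float, df: int) -> float:
    """Return I^2 in percent: the share of dispersion beyond sampling error."""
    if q_total <= 0:
        return 0.0
    return max(0.0, (q_total - df) / q_total) * 100.0


def chi2_upper_tail(q: float, df: int) -> float:
    """Upper-tail chi-square probability using closed forms (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")


def p_from_ci(d: float, lower: float, upper: float) -> float:
    """Two-sided p-value for d, assuming a symmetric normal-theory 95% CI."""
    se = (upper - lower) / (2.0 * 1.959964)  # assumed: CI = d +/- 1.96 * SE
    z = d / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# (Qtotal, df) per condition, copied from Table 4.
TABLE_4_Q = {1: (47.732, 9), 2: (30.120, 13), 3: (0.597, 1), 4: (125.194, 5)}

# (Qb, df) for the significant moderators, copied from Table 5.
TABLE_5_QB = {
    "C1 game type": (4.235, 1),
    "C1 educational level": (17.420, 2),
    "C1 L2 proficiency level": (7.284, 2),
    "C2 intervention setting": (9.619, 1),
    "C2 assessment type": (6.273, 1),
}

for condition, (q, df) in TABLE_4_Q.items():
    print(f"Condition {condition}: I^2 = {i_squared(q, df):.3f}")

for label, (qb, df) in TABLE_5_QB.items():
    print(f"{label}: p = {chi2_upper_tail(qb, df):.4f}")

# Condition 4 overall effect (d and CI from Table 4).
print(f"Condition 4: p = {p_from_ci(0.503, -0.840, 1.846):.3f}")
```

Under these assumptions the recomputed I² values agree with Table 4 to within rounding, and each Qb p-value falls in the significance band marked by its asterisks in Table 5.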


3.3.2. Non-significant variables

Under the mixed-effects model, linguistic distance (Qb = 0.175, df = 1, p > 0.05 in Condition 1; Qb = 0.550, df = 1, p > 0.05 in Condition 2) and game source (Qb = 1.190, df = 1, p > 0.05 in Condition 1; Qb = 1.494, df = 2, p > 0.05 in Condition 2) were categorical variables that had no moderating influence in either condition. The regression coefficient for the relation between the continuous variable of intervention duration and the vocabulary effect sizes was negative (slope = −0.008, z = −1.131, p = 0.258 in Condition 1). The p value suggests no significant relationship between intervention duration and L2 vocabulary gains, and the Qmodel value (Qmodel = 1.279, df = 1, p > 0.05) indicates that the regression model explained no significant share of the dispersion among the effect sizes. The same non-significant influence was found in Condition 2 (slope = −0.003, z = −0.896, p > 0.05; Qmodel = 0.803, df = 1, p > 0.05).

To wrap up the findings, firstly, four conditions of research designs were identified from the 26 empirical studies on digital game-based L2 vocabulary learning. Secondly, the four conditions varied to a large degree in terms of the overall effect sizes, the numbers of studies included (from two to 10 studies), and the degree of dispersion among the effect sizes. Thirdly, we retested two old moderators (game types and linguistic distances) and explored six new potential moderators (educational levels, L2 proficiency levels, intervention settings, assessment types, game sources, and intervention durations) regarding their influences on digital game-based L2 vocabulary learning. These results are used to address the research questions.

4. Discussion and implications

4.1. Conditions and overall effect size

The identification of the four conditions regarding research designs for digital game-based L2 vocabulary learning not only supports and extends Mayer's (2015) three research questions on digital game-based learning, but also gives more weight to the existing phenomena.
In comparison with the results of previous meta-analysis studies in the field, this study again supports the claim that DGBL is superior to traditional instruction on L2 vocabulary achievement by a large effect size, but lends no evidence for Chiu's (2013) report that CALL without games has better effects than CALL with games. As Chiu et al.'s (2012) findings were also in favor of DGBL, the possible reasons accounting for the contradiction might be whether players' L2 vocabulary learning experiences could be accommodated comfortably in the digital games, or whether players' experience of the DGBL meets the educational purposes (Reinhardt & Sykes, 2014). That is, successful DGBL depends not only on the satisfaction of gameplay, but also on the realization of the learning goals.

In addition to the issues mentioned above, the relationship between research designs and game-related factors has not as yet been well described. Based on the locales of the factors in each condition, we juxtaposed the four conditions with the factors by their locality. Condition 1, which includes an experimental group playing a digital game accompanied by traditional language instruction and a control group receiving traditional language instruction, explores the total effect of factors from all possible locales. It might include game-design factors (the game-internal factors), such as game genre, topic, game characters, game contextual information, sound, music, graphics, game rules, and so on, which have the potential to enhance or diminish players' motivation, engagement, immersion, and learning (Cairns, Cox, & Nordin, 2014; Prensky, 2001).
It might also include the interface, or factors from outside the game (the game-external factors), such as spatial factors (e.g., learning environment), temporal factors (e.g., frequency), players' attitudes (e.g., active versus passive players), and social expectations (e.g., teachers' attitudes, parental expectations), which directly or indirectly influence the total learning effectiveness.

While Condition 1 examines the total effects of all potential factors influencing DGBL, Conditions 2, 3, and 4 focus on the game-internal, interface, and game-external factors, respectively. Condition 2 includes a control/comparison group playing a base form of a game and an experimental group playing the game with a feature added or changed. That is, research in Condition 2 investigates the game-internal factors. Condition 3 includes an experimental group playing a digital game and a control/comparison group receiving the same content through a conventional medium. The effect size is credited to the interface (i.e., the medium) through which the game is played. In Condition 4, the effects result from the non-game factors, which, examined by their localities, are located outside the game per se. In other words, they are game-external factors.

Fig. 1. Structure diagram of the relationship between research conditions and game-related factors by their localities.


Accordingly, the relationship between the four conditions and the localities of the game-related factors is visualized in the structure diagram displayed in Fig. 1. The contributions of the structure diagram are as follows. First, by recognizing the locality of the issues to be solved, game designers and researchers could cooperate more easily to locate problems using a shared language. For example, if players express negative emotions toward a character in a game, the character is a game-internal issue, whereas the players' emotion is a game-external factor. The game designer could locate the game-internal factors, find the corresponding game-external factors, and cooperate with researchers and educational practitioners to seek solutions based on game theory and empirical evidence. Second, the diagram could assist novice researchers in deciding on an appropriate research design for the issues under focus. Third, it helps identify gaps in DGBL research. For example, while most studies in Condition 2 investigated augmented features that were added or changed, few have explored the effects of changing the content features, such as text-type or voice-type narratives in the game for task-based learning based on cognitive load theory (Sweller, 2011). The implication from Condition 3 is that, with the advance of digital technologies, the interface might be switched to virtual reality (VR), augmented reality (AR), or beyond. Their potential to enhance the impact of DGBL on L2 vocabulary gains is highly anticipated and remains to be explored. The identification of Condition 4, which explores the game-external factors, invites research not just on spatial and temporal factors, but also on other potential game-external factors such as socio-cultural influences on players' motivation and engagement in DGBL (Prensky, 2001).

4.2. Moderator variables

The extensive search for moderator variables enables us to present a better outline of the research trend in digital game-based L2 vocabulary learning; the results of the moderator analyses improve our knowledge of the potential factors influencing the effects of DGBL even when they are not the focus of the research.

4.2.1. Significant moderator variables

4.2.1.1. Condition 1.

Among the eight moderator variables analyzed, game type was the only game-internal factor, which indicates how little information the primary studies offered on game-internal factors for further analysis. Game types, educational levels, and L2 proficiency levels are the three moderator variables with significant influences on the effect size of digital game-based learning relative to alternative activities on L2 vocabulary gains. Drawing on the results listed in Table 5, the best scenario for digital game-based L2 vocabulary learning might be when university students with beyond-beginning L2 proficiency play task-oriented digital games. Given the limited information provided, games were categorized as task-based games if a task was involved in the game, and as drill-practice games if no tasks were involved. Our statistical results indicate that task-based games significantly outperformed drill-practice games. This implies that while drill-practice games might meet the learning purpose, they might also be seen as less gameful and therefore less engaging. Research has reported the capability of task-based games, as they offer task goals that stimulate players' critical thinking, problem-solving, and task engagement (Baralt & Gómez, 2017; Chen et al., 2018). It is believed that task-based games provide more meaningful and engaging situations that increase learners' motivation for learning as well as for meaning negotiation (Chiu et al., 2012). In other words, they are more capable of carrying out both gameplay and learning missions.
Yet, as different tasks draw on different SL features, and different game genres tend to offer different tasks, such as the Second Life game genre offering a venue that facilitates SL communication (Chen, 2016), our results lend more support to the contention that the learning outcomes depend on whether the tasks are well designed to engage players and at the same time meet the purpose of L2 vocabulary learning.

From Table 5 we also recognize that the low-educational level students (primary school and kindergarten) and university students had better vocabulary gains than the middle-level (junior and senior high school) students. One plausible reason might be that most of the middle-level students in the primary studies had to face the pressure of national school entrance examinations. One consequence was that there were few studies on middle-level students; another was that, under such pressure, the stereotypical view of games as a distraction from academic learning reduced the pleasure of digital game learning and therefore decreased the learning outcomes. This rationale could also explain why there was no study on middle-level students in Condition 2.

In the digital game-based L2 vocabulary learning scenarios, students with a certain amount of L2 learning experience (beyond-beginning level) had better vocabulary gains than those without (beginning level). One possible reason might be that prior L2 vocabulary knowledge enables learners to find connections between words and therefore accelerates their L2 vocabulary gains (Abraham, 2008; Pulido, 2003).

4.2.1.2. Condition 2.

The results of the moderator analyses in Condition 2 illustrate the scenarios of L2 vocabulary learning influenced by the added-or-changed features of digital games.
Based on our findings, digital games with features added or changed for educational purposes (e.g., games with scaffolding) work better than their base forms when the intervention setting is more informal and the instructors employ productive vocabulary knowledge assessments. This suggests that students are more self-motivated in an environment with less instruction and pressure, lending support to Norris and Ortega's (2000) research finding that "less instruction is more effective" (p. 474). In addition, while some interventions (e.g., computer-mediated glosses) have been reported as favoring learners' passive/receptive vocabulary gains (Abraham, 2008), our results reveal the potential of a well-designed feature to enhance students' productive vocabulary knowledge (the capability of using vocabulary in context). This result accords with the premise of effective DGBL: successful DGBL lies in a well-designed game in which the engagement of gameplay meets the needs of the educational goals.


In sum, to optimize digital game-based L2 vocabulary learning, it is suggested that game designers pay more attention to how language is used in the tasks or in the contexts. For their part, instructors are encouraged to create a pressure-free environment and to assess productive vocabulary knowledge when employing digital games for L2 vocabulary learning.

4.2.2. Non-significant variables

The non-significant influence of linguistic distance in both conditions conflicts with cross-linguistic interference theory, in which learners' L1 plays a significant role in learning an L2 (Grosjean, 2012; Selinker, 1972), but is consistent with Chen et al.'s (2018) finding. One conceivable reason might be that linguistic features (e.g., phonology, morphology, or semantics) (VanPatten, 1994) were not a target in the game-based vocabulary learning. Another might be that our coding criterion for linguistic distance, the score in Chiswick and Miller's (2005) index, was derived from a 24-week language training program, whereas the longest intervention duration in the 26 studies was 105 days (Segers & Verhoeven, 2003). Our research results also suggest that there might be no significant divergence among vocabulary games, whether custom-designed, web-based, or off-the-shelf software. It is also noteworthy that while digital games are superior to various alternative activities in terms of motivating L2 vocabulary learning, the effectiveness of digital games is not necessarily increased by longer durations. Fatigue, boredom (Segers & Verhoeven, 2003), and short-term memory loss are elements that have been considered to contribute to this non-significance (Chiu, 2013; Cowan & AuBuchon, 2008).

5. Summary and critical evaluation

A wide range of variability has been highlighted regarding the research designs and moderator variables in the empirical studies. The contributions of this meta-analysis can be summarized in three aspects.
The first is the identification of the four conditions derived from Mayer's (2015) taxonomy of research designs in DGBL, and their respective effect sizes. It enables further understanding of the status quo, and helps us to connect the research questions with the corresponding research designs. The second is the development of a structure diagram on the relationship between the four conditions and the game-related factors by their locality. It connects the conditions of research designs and their respective factors through the visualized structure diagram, which helps us identify gaps in research and enhance communication between game designers, researchers, and educational practitioners. The third is the examination of eight moderator variables that directly or indirectly influenced the effectiveness of DGBL under Conditions 1 and 2. It indicates the influences of covariates when conducting research, and implies a future research direction: the use of the moderator variables as independent variables. The only game-internal moderator variable was game type, which not only argues for detailed descriptions of game-related factors in research reports, but also points to a direction for further research, namely the effects of content-related variables.

Nevertheless, it should be noted that there are also limitations along with the contributions. Breaking down the 26 empirical studies into four categories leaves too few studies in each category to provide solid evidence for theory building. Therefore, our findings should be treated as the inception of research rather than the completion of a phenomenon. It is recommended that more efforts are made not only to increase the empirical evidence for each condition and moderator, but also to increase the number of conditions and moderators examined. In addition, it is suggested that researchers apply digital games for research with caution.
Detailed descriptions of all possible variables are encouraged in order to constrain undesired influences on the outcomes. For their part, educational practitioners are advised to consider the quality and the functions of educational digital games before use. After all, successful DGBL should satisfy both the gameplay and the purposes of education.

6. Conclusion

L2 vocabulary learning has been deemed a daunting process. The results of this meta-analysis study suggest that the use of digital games can effectively motivate and enhance students' L2 vocabulary learning. Our research findings also illustrate various game-learning scenarios in which different factors might lead to profoundly different learning outcomes. Although previous studies have reported the benefits of using digital games to enhance L2 vocabulary learning, this study advances the research by offering more and newer empirical evidence to support the argument. Further, drawing on Mayer's (2015) theoretical framework on digital game learning, this study develops a framework of four-conditional research designs based on the empirical evidence. Under the framework, research designs in the field were differentiated into four conditions: Condition 1, examining the effect sizes of L2 vocabulary gains when playing a digital game versus receiving alternative activities; Condition 2, examining the effect sizes of L2 vocabulary gains when playing a game with added or changed features versus a game in its base form; Condition 3, examining the effect sizes of L2 vocabulary gains when playing a digital game versus receiving an intervention with equivalent content through different means; and Condition 4, examining the effect sizes of L2 vocabulary gains influenced by a non-game related variable. Further, the four conditions are connected with their respective game-related factors by their locality, and the relationship is visualized through the structure diagram.
That is, Condition 1 investigates the total eﬀect of all potential factors, Condition 2 investigates game-internal factors, Condition 3 investigates the potential eﬀect of the interface, and Condition 4 investigates the game-external factors. Researchers are encouraged to test and extend the four-conditional research designs in diverse academic disciplines in the scope of digital game-based learning. What is more, this study investigated eight potential moderators in Conditions 1 and 2. The results depict the eﬀectiveness of digital games in a variety of L2 vocabulary learning scenarios. It suggests that the best eﬀect of L2 vocabulary game-based learning might take place when university and elementary students with prior L2 learning experiences play task-based games. The results from 355

Computers & Education 125 (2018) 345–357

Y.-L. Tsai, C.-C. Tsai

Condition 2 suggest that the added features in games have generated a medium eﬀect size on L2 vocabulary gains, and the games with added and changed features outperform their base forms in accelerating students' productive vocabulary knowledge when they are implemented in a less time-constrained environment. Finally, our study results illustrate the status quo of research on digital game-based L2 vocabulary learning. On the one hand, it is the limitation of this study that there are not enough studies for better conﬁrmation of the eﬀect sizes for the four conditions, and there has been a shortage of well-deﬁned terms regarding the crucial concepts of DGBL, including the deﬁnition of what a game is, and how a task game is deﬁned; on the other hand, it suggests that DGBL can be a very eﬀective tool for L2 vocabulary learning, if it is well designed to meet the pedagogical purposes and the learners' needs, echoing the call for the educational value of digital games (Hong, Cheng, Hwang, Lee, & Chang, 2009). Acknowledgement This work was ﬁnancially supported by the Ministry of Science and Technology, Taiwan under grant number MOST 105-2511-S003 -052 -MY3, and by the “Institute for Research Excellence in Learning Sciences” of National Taiwan Normal University (NTNU) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. In addition, we would like to give our thanks to the reviewers for their insightful advice. Appendices. Supplementary data Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.compedu.2018.06.020. References deHaan, J., Reed, W. M., & Kuwada, K. (2010). The eﬀect of interactivity with a music video game on second language vocabulary recall. Language, Learning and Technology, 14(2), 74–94. Abraham, L. B. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. 
Computer Assisted Language Learning, 21(3), 199–226. http://dx.doi.org/10.1080/09588220802090246.
Aghlara, L., & Tamjid, N. H. (2011). The effect of digital games on Iranian children's vocabulary retention in foreign language acquisition. Procedia-social and Behavioral Sciences, 29, 552–560.
Ali Mohsen, M. (2016). The use of computer-based simulation to aid comprehension and incidental vocabulary learning. Journal of Educational Computing Research, 54(6), 863–884.
AlShaiji, O. A. (2015). Video games promote Saudi children's English vocabulary retention. Education, 136(2), 123–132.
Ashraf, H., Motlagh, F. G., & Salami, M. (2014). The impact of online games on learning English vocabulary by Iranian (low-intermediate) EFL learners. Procedia-social and Behavioral Sciences, 98, 286–291.
Aslanabadi, H., & Rasouli, G. (2013). The effect of games on improvement of Iranian EFL vocabulary knowledge in kindergartens. International Review of Social Sciences and Humanities, 6(1), 186–195.
Atay, D., & Ozbulgan, C. (2007). Memory strategy instruction, contextual learning and ESP vocabulary recall. English for Specific Purposes, 26(1), 39–51.
Baralt, M., & Gómez, J. M. (2017). Task-based language teaching online: A guide for teachers. Language, Learning and Technology, 21(3), 28–43.
Borenstein, M., Hedges, L. V., Higgins, J., & Rothstein, H. R. (2009). Introduction to meta-analysis. Wiley Online Library.
Breen, M. (1987). Learner contributions to task design. Language Learning Tasks, 7, 23–46.
Cairns, P., Cox, A., & Nordin, A. I. (2014). Immersion in digital games: Review of gaming experience research. Handbook of Digital Games, 1, 767.
Calvo-Ferrer, J. R. (2017). Educational games as stand-alone learning tools and their motivational effect on L2 vocabulary acquisition and perceived learning gains mobile. British Journal of Educational Technology, 48(2), 264–278. http://dx.doi.org/10.1111/bjet.12387.
Chen, J. C. (2016). The crossroads of English language learners, task-based instruction, and 3D multi-user virtual learning in Second Life. Computers & Education, 102, 152–171.
Chen, M. H., Tseng, W. T., & Hsiao, T. Y. (2018). The effectiveness of digital game-based vocabulary learning: A framework-based view of meta-analysis. British Journal of Educational Technology, 49, 69–77. http://dx.doi.org/10.1111/bjet.12526.
Chiswick, B. R., & Miller, P. W. (2005). Linguistic distance: A quantitative measure of the distance between English and other languages. Journal of Multilingual and Multicultural Development, 26(1), 1–11.
Chiu, Y. H. (2013). Computer-assisted second language vocabulary instruction: A meta-analysis. British Journal of Educational Technology, 44(2), E52–E56. http://dx.doi.org/10.1111/j.1467-8535.2012.01342.x.
Chiu, Y. H., Kao, C. W., & Reynolds, B. L. (2012). The relative effectiveness of digital game-based learning types in English as a foreign language setting: A meta-analysis. British Journal of Educational Technology, 43(4), E104–E107.
Cooper, H. (2015). Research synthesis and meta-analysis: A step-by-step approach, Vol. 2. Sage publications.
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.
Cowan, N., & AuBuchon, A. M. (2008). Short-term memory loss over time without retroactive stimulus interference. Psychonomic Bulletin & Review, 15(1), 230–235.
Davis, K., & Fullerton, S. (2016). Connected learning in and after school: Exploring technology's role in the learning experiences of diverse high school students. The Information Society, 32(2), 98–116.
Estaire, S., & Zanón, J. (1994). Planning classwork: A task based approach. Oxford: Macmillan Heinemann.
Fahim, M., & Sabah, S. (2012). An ecological analysis of the role of role-play games as affordances in Iranian EFL pre-university students' vocabulary learning. Theory and Practice in Language Studies, 2(6), 1276–1284. http://dx.doi.org/10.4304/tpls.2.6.1276-1284.
Franciosi, S. J. (2017). The effect of computer game-based learning on FL vocabulary transferability. Educational Technology & Society, 20(1), 123–133.
Franciosi, S. J., Yagi, J., Tomoshige, Y., & Ye, S. (2016). The effect of a simple simulation game on long-term vocabulary retention. CALICO Journal, 33(3), 355–379. http://dx.doi.org/10.1558/cj.v33i2.26063.
Ghanbaran, S., & Ketabi, S. (2014). Multimedia games and vocabulary learning. Theory and Practice in Language Studies, 4(3), 489–496. http://dx.doi.org/10.4304/tpls.4.3.489-496.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.
Grosjean, F. (2012). An attempt to isolate, and then differentiate, transfer and interference. International Journal of Bilingualism, 16(1), 11–21.
Guzzo, R. A., Jackson, S. E., & Katzell, R. A. (1987). Meta-analysis analysis. Research in Organizational Behavior, 9(1), 407–442.
Hedges, L. V. (1982). Statistical methodology in meta-analysis. Princeton, NJ: ERIC Clearinghouse on Tests, Measurement and Evaluation, Educational Testing Service.
Hedges, L. V. (1985). Statistical methodology in meta-analysis. Journal of Educational Statistics, 20.
Hedges, L. V. (1992). Meta-analysis. Journal of Educational and Behavioral Statistics, 17(4), 279–296.
Hedges, L. V., & Olkin, I. (2014). Statistical methods for meta-analysis. Academic press.
Hedges, L. V., & Pigott, T. D. (2004). The power of statistical tests for moderators in meta-analysis. Psychological Methods, 9(4), 426.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486.
Homer, B. D., Plass, J. L., Raffaele, C., Ober, T. M., & Ali, A. (2018). Improving high school students' executive functions through digital game play. Computers & Education, 117, 50–58.
Hong, J. C., Cheng, C. L., Hwang, M. Y., Lee, C. K., & Chang, H. Y. (2009). Assessing the educational values of digital games. Journal of Computer Assisted Learning, 25(5), 423–437.
Huang, Y.-M., & Huang, Y.-M. (2015). A scaffolding strategy to develop handheld sensor-based vocabulary games for improving students' learning motivation and performance. Etr&D-Educational Technology Research and Development, 63(5), 691–708. http://dx.doi.org/10.1007/s11423-015-9382-9.
Hwang, G.-J., & Wang, S. Y. (2016). Single loop or double loop learning: English vocabulary learning performance and behavior of students in situated computer games with different guiding strategies. Computers & Education, 102, 188–201.
Jalali, S., & Dousti, M. (2012). Vocabulary and grammar gain through computer educational games. GEMA Online Journal of Language Studies, 12(4), 1077–1088.
Kaminski, J. W., Valle, L. A., Filene, J. H., & Boyle, C. L. (2008). A meta-analytic review of components associated with parent training program effectiveness. Journal of Abnormal Child Psychology, 36(4), 567–589.
Koricheva, J., Gurevitch, J., & Mengersen, K. (2013). Handbook of meta-analysis in ecology and evolution. Princeton University Press.
Laufer, B., & Paribakht, T. S. (1998). The relationship between passive and active vocabularies: Effects of language learning context. Language Learning, 48(3), 365–391.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15.
Letchumanan, K., Tan, B. H., Paramasivam, S., Sabariah, M. R., & Muthusamy, P. (2015). Incidental learning of vocabulary through computer-based and paper-based games by secondary school ESL learners. Pertanika Journal of Social Science and Humanities, 23(3), 725–740.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis, Vol. 49. Thousand Oaks, CA: Sage publications.
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. Handbook of Second Language Acquisition, 2(2), 413–468.
Lu, F. C., & Chang, B. (2016). Role-play game-enhanced English for a specific-purpose vocabulary-acquisition framework. Educational Technology & Society, 367–377.
Maher, J. M., Markey, J. C., & Ebert-May, D. (2013). The other half of the story: Effect size analysis in quantitative research. Cbe-life Sciences Education, 12(3), 345–351.
Ma, Q., & Kelly, P. (2006). Computer assisted vocabulary learning: Design and evaluation. Computer Assisted Language Learning, 19(1), 15–45.
Mayer, R. E. (2015). On the need for research evidence to guide the design of computer games for learning. Educational Psychologist, 50(4), 349–353.
Moreno-Ger, P., Burgos, D., Martínez-Ortiz, I., Sierra, J. L., & Fernández-Manjón, B. (2008). Educational game design for online education. Computers in Human Behavior, 24(6), 2530–2540.
Mostert, M. P. (2001). Facilitated communication since 1995: A review of published studies. Journal of Autism and Developmental Disorders, 31(3), 287–313.
Muhanna, W. (2012). Using online games for teaching English vocabulary for Jordanian students learning English as a foreign language. Journal of College Teaching & Learning, 9(3), 235.
Nation, I. S. (2001). Learning vocabulary in another language. Ernst Klett Sprachen.
Nation, I. P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. Vocabulary: Description, Acquisition and Pedagogy, 14, 6–19.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50(3), 417–528.
Perez, M. M., Van den Noortgate, W., & Desmet, P. (2013). Captioned video for L2 listening and vocabulary learning: A meta-analysis. System, 41(3), 720–739. http://dx.doi.org/10.1016/j.system.2013.07.013.
Prensky, M. (2001). Digital natives, digital immigrants part 1. On the Horizon, 9(5), 1–6.
Pulido, D. (2003). Modeling the role of second language proficiency and topic familiarity in second language incidental vocabulary acquisition through reading. Language Learning, 53(2), 233–284.
Reinhardt, J., & Sykes, J. (2014). Special issue commentary: Digital game and play activity in L2 teaching and learning. Language, Learning and Technology, 18(2), 2–8.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638.
Saffarian, R., & Gorjian, B. (2012). Effect of computer-based video games for vocabulary acquisition among young children: An experimental study. Journal of Comparative Literature and Culture, 1(3), 44–48.
Sandberg, J., Maris, M., & Hoogendoorn, P. (2014). The added value of a gaming context and intelligent adaptation for a mobile learning application for vocabulary learning. Computers & Education, 76, 119–130. http://dx.doi.org/10.1016/j.compedu.2014.03.006.
Saville-Troike, M. (1984). What really matters in second language learning for academic achievement? Tesol Quarterly, 199–219.
Schmitt, N. (2008). Review article: Instructed second language vocabulary learning. Language Teaching Research, 12(3), 329–363.
Schoemaker, K., Mulder, H., Deković, M., & Matthys, W. (2013). Executive functions in preschool children with externalizing behavior problems: A meta-analysis. Journal of Abnormal Child Psychology, 41(3), 457–471.
Segers, E., & Verhoeven, L. (2003). Effects of vocabulary training by computer in kindergarten. Journal of Computer Assisted Learning, 19(4), 557–566.
Selinker, L. (1972). Interlanguage. Iral-international Review of Applied Linguistics in Language Teaching, 10(1–4), 209–232.
Shintani, N. (2014). The effectiveness of processing instruction and production-based instruction on L2 grammar acquisition: A meta-analysis. Applied Linguistics, 36(3), 306–325.
Steinkuehler, C., Squire, K., & Sawyer, K. (2014). Videogames and learning. Cambridge handbook of the Learning Sciences, 377–396.
Stock, W. A. (1994). Systematic coding for research synthesis. The Handbook of Research Synthesis, 236, 125–138.
Sundqvist, P., & Wikström, P. (2015). Out-of-school digital gameplay and in-school L2 English vocabulary outcomes. System, 51, 65–76.
Sweller, J. (2011). Cognitive load theory. Psychology of learning and motivation, Vol. 55 (pp. 37–76). Elsevier.
Vahdat, S., & Behbahani, A. R. (2013). The effect of video games on Iranian EFL learners' vocabulary learning. Reading, 13(1).
VanPatten, B. (1994). Evaluating the role of consciousness in second language acquisition: Terms, linguistic features & research methodology. Consciousness in Second Language Learning, 27–36.
Viechtbauer, W. (2007). Accounting for heterogeneity via random-effects models and moderator analyses in meta-analysis. Zeitschrift für Psychologie/Journal of Psychology, 215(2), 104–121.
Wang, S. Y., Hwang, G. J., & Chen, S. F. (2015). Development of a contextual game for improving English vocabulary learning performance of elementary school students in Taiwan. 2015 Iiai 4th international congress on advanced applied informatics (Iiai-Aai) (pp. 268–272). http://dx.doi.org/10.1109/iiai-aai.2015.161.
Willis, J. (1996). A flexible framework for task-based learning. Challenge and Change in Language Teaching, 52–62.
Yen, L., Chen, C. M., & Huang, H. B. (2016). Effects of mobile game-based English vocabulary learning app on learners' perceptions and learning performance: A case study of Taiwanese EFL learners. 2016-January.
Yip, F. W., & Kwan, A. C. (2006). Online vocabulary games as a tool for teaching and learning English vocabulary. Educational Media International, 43(3), 233–249.
Young, S. S. C., & Wang, Y. H. (2014). The game embedded CALL system to facilitate English vocabulary acquisition and pronunciation. Educational Technology & Society, 17(3), 239–251.
