Effective vocabulary learning tasks: Involvement Load Hypothesis versus Technique Feature Analysis


System 56 (2016) 28–39


Hsueh-chao Marcella Hu a,*, Hossein Nassaji b,1

a Department of Applied English, The Overseas Chinese University, 100 Chiao Kwang Road, Taichung 407, Taiwan, ROC
b Department of Linguistics, University of Victoria, PO Box 1700, Victoria, BC V8W 2Y2, Canada

Article info

Article history: Received 5 March 2015; received in revised form 5 November 2015; accepted 9 November 2015; available online xxx

Keywords: Involvement Load Hypothesis; Technique Feature Analysis; Depth of processing; Elaboration; Vocabulary learning; Task-based learning; Predictive power

Abstract

L2 vocabulary learning is a complex process involving not only understanding the meanings of words but also being able to retain, retrieve, and use them in production. To this end, learners need not only to pay deliberate attention to the target words but also have to deeply process the various aspects of the words to learn them effectively. This has been referred to as “elaborate processing.” Two frameworks have been proposed to operationalize the construct of elaborate processing for L2 vocabulary learning: Involvement Load Hypothesis (ILH) and Technique Feature Analysis (TFA). However, the two frameworks vary in the ways they conceptualize elaborate learning and also in terms of their attentional components. The present study was designed to empirically compare these two frameworks and their predictability for effective L2 vocabulary learning tasks. Ninety-six adult EFL learners were divided into four groups, and were required to learn the meanings of 14 unknown words. Each group performed one of four vocabulary tasks ranked differently by the two frameworks. The results showed that the TFA had a better explanatory power in predicting vocabulary learning gains than the ILH. The implications of the findings for designing effective L2 vocabulary tasks will be discussed. © 2015 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +886 427016855x2162; fax: +886 427075420.
E-mail addresses: [email protected] (M. Hu), [email protected] (H. Nassaji).
1 Tel.: +1 2507216634; fax: +1 2507217423.
http://dx.doi.org/10.1016/j.system.2015.11.001
0346-251X/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The importance of vocabulary for L2 acquisition cannot be disputed. Many studies have shown that vocabulary is an important predictor of both reading comprehension and L2 development (Nation, 2001; Pulido, 2007, 2009). However, how vocabulary is learned or what processes are involved has been the focus of much theoretical discussion (Laufer & Hulstijn, 2001; Nation & Webb, 2011). One debate has concerned the distinction between incidental and intentional learning. Incidental vocabulary learning is often defined as learning vocabulary with no deliberate intention or when learners' attention is on learning something else, whereas intentional vocabulary learning refers to learning with conscious intention and awareness (Laufer, 2001). L1 learners acquire most of their vocabulary incidentally (Nagy, Anderson, & Herman, 1987; Nagy, Herman, & Anderson, 1985; Nagy & Herman, 1987; Sternberg, 1987). However, there have been uncertainties about the extent to which incidental learning contributes to L2 acquisition. L1 learners encounter words frequently in a variety of contexts and this extensive exposure helps them acquire the words effectively. Such exposure opportunities do not exist for L2 learners. L2 learners, in particular those with low to intermediate levels, may be unable to benefit from incidental learning in the same way as L1 learners do (Hu & Nassaji, 2012; Hulstijn & Laufer, 2001; Laufer, 2005; Nassaji, 2003, 2004; Nassaji & Hu, 2012; Schmidt, 2001), and as a result they need opportunities for both incidental and intentional learning. In this respect, a number of L2 researchers have also argued that L2 learners need not only to pay deliberate attention to the target word but also to process its different aspects deeply in order to learn it effectively (Hu & Nassaji, 2012; Hulstijn & Laufer, 2001; Laufer, 2005; Nassaji, 2003, 2004; Nassaji & Hu, 2012; Schmidt, 2001). This is what has been referred to as “elaborate processing”, and it has been emphasized as essential for L2 vocabulary learning (Ellis, 1994; Hulstijn & Laufer, 2001; Laufer, 2005, 2006; Laufer & Hulstijn, 2001; Pulido, 2009; Schmidt, 2001).

The concept of elaborate processing was originally introduced by Craik and Lockhart (1972, 1975) in their “depth of processing” model. The depth of processing model suggests that the degree to which new information is retained and stored in long-term memory depends on how the information is processed. In this model, elaboration is the key to learning and retention of vocabulary. In their revised version, Lockhart and Craik (1990) further expanded those ideas by highlighting at least two stages for effective learning: an input analysis stage whereby sensory features, such as orthographic and phonological features of word forms, are analyzed, and a retrieval stage in which semantic and conceptual features are retrieved with deeper analysis (Eckerth & Tavakoli, 2012). In this model, not only are the initial attention to, noticing of, and processing of words essential, but the subsequent retrieval and consolidation of the semantic encoding of the word features in memory are also critical for learning.
The present study was designed to examine and compare the predictions yielded by two frameworks that have attempted to operationalize the construct of elaborate processing for L2 vocabulary learning: Involvement Load Hypothesis (Laufer & Hulstijn, 2001) and Technique Feature Analysis (Nation & Webb, 2011). The aim was to find out which of the two frameworks provided greater explanatory power in predicting the effectiveness of different vocabulary learning tasks.

2. Literature review

A number of studies on L2 vocabulary acquisition have highlighted the importance of lexical elaboration (Pulido, 2007, 2009; Rott, 2007; Schmidt, 2001). However, an important issue has been how to operationalize depth of processing. As just noted, in the context of L2 vocabulary learning, there are currently two theoretical frameworks that have attempted to operationalize and measure depth of processing: the Involvement Load Hypothesis and the Technique Feature Analysis. These two frameworks differ in the way they conceptualize depth of processing and in the parameters they propose for elaborate learning. These differences lead to varying weights given to different attentional components, resulting in variations in prediction about what vocabulary tasks or activities are more effective in L2 learning (Nation & Webb, 2011). In what follows, we describe the two frameworks.

2.1. The Involvement Load Hypothesis

The Involvement Load Hypothesis (ILH) conceptualizes depth of processing and elaborative learning in terms of three major task components: need, search, and evaluation (Laufer & Hulstijn, 2001). Each of the three components is suggested to vary in terms of its strength. ‘Need’, for example, is hypothesized to be either moderate or strong. Need is considered to be moderate if it is externally imposed by the teacher (e.g., the teacher wants the learner to find the meaning of a word). However, need is strong when it is intrinsically motivated or self-imposed by the learners (e.g., the need to look up the meaning of a word in a dictionary when reading a text). There is no need for search if the meanings are provided in the margins. Search can be either moderate or strong depending on whether it is receptive retrieval or productive retrieval (Nation & Webb, 2011). Search is moderate if the learner has to look for or retrieve the meaning of a word, and it is strong if the learner needs to find the word form. As for evaluation, it is moderate if the learner needs to compare the specific meaning of a word with other meanings. Evaluation is strong if there is a need to assess whether a word meaning fits a specific linguistic context.

The ILH suggests that the degree to which a vocabulary task helps L2 learners acquire new target words depends on how much the task promotes each of the above involvement load components. It predicts that the greater the involvement load in a given task, the better the vocabulary learning and retention. Laufer and Hulstijn (2001) provided the following examples of two tasks and how they differ in terms of their involvement load. One task is when the learner is required to create sentences with a series of new words whose meanings are given by the teacher. They argued that this task induces no search because the meanings are provided. However, it induces a moderate need and a strong evaluation because the learner needs to evaluate the suitability of the words in context. In terms of the overall involvement load, they hypothesized that the task has an involvement index of 3 [0 (search) + 1 (need) + 2 (evaluation)]. The second task is when the learner is required to read a text and answer comprehension questions with the meaning of the words being provided in the margins. Here the task involves neither evaluation nor search but a moderate need because the learner needs to look at the glosses.
This task, they argued, has an overall involvement index of 1 [0 (search) + 1 (need) + 0 (evaluation)]. According to the researchers, Task One would be more effective for vocabulary learning than Task Two.

A number of recent studies have examined the efficacy of ILH and have found some evidence for its predictive power (Hulstijn & Laufer, 2001; Keating, 2008; Kim, 2008; Nassaji & Hu, 2012; Peters, Hulstijn, Sercu, & Lutjeharms, 2009; Rott, 2007). One of the initial studies is by Hulstijn and Laufer (2001), which examined the effects of involvement load on the retention of ten English words by young adult ESL learners. To this end, they designed an experimental study with three tasks involving different degrees of involvement load (i.e., reading comprehension, comprehension plus filling in target words, and composition writing with target words). They then measured the effects of each task on the retention of the target words. Their results indicated that retention was related to the amount of task-induced involvement load: it was highest in the composition task, lower in reading plus fill-in, and lowest in the reading comprehension. It was argued that the composition task yielded the highest retention because it involved a higher involvement load than the other two tasks.

In addition to the effect of type of task, researchers have also examined the role of time-on-task (Folse, 2006; Keating, 2008; Kim, 2008). The underlying idea behind this principle is that the more time you spend on something, the more likely you will become good at it (Nation & Webb, 2011). Time-on-task can blur the distinction between the effects of quality and quantity on learning, as it is sometimes hard to tell whether the learning is caused by the design or type of the task, the time spent on the task, or both. Assuming that time could be the factor in involvement load contributing to different performance levels, Kim (2008) conducted an experiment by partially replicating Hulstijn and Laufer's (2001) study to investigate whether different levels of task-induced involvement load affected the initial learning and retention of target words by L2 learners. The results were consistent with the assumption of the ILH in that the higher involvement load induced by the task resulted in both higher initial learning and also better retention of new words. However, Folse (2006) and Keating (2008) obtained different results from those by Kim (2008). In Folse's study, participants were required to do three tasks: one fill-in-the-blank exercise, three fill-in-the-blanks exercises, and one original sentence writing exercise.
From the perspectives of involvement load theory, the third task was assumed to yield the best learning, but the results indicated that participants doing the second task outperformed those doing the other two tasks when time spent on the task was controlled. Excluding the effect of time-on-task, Keating (2008) also found that the predictive power of involvement load was weakened or even disappeared. Nassaji and Hu (2012) investigated the effects of task-induced involvement load on Chinese ESL learners' use of lexical inferencing strategies and vocabulary retention. The results showed a complex interaction between successful inferencing, learner involvement, and word retention. Both successful inferencing and word retention depended not only on task factors, but also on how learners used inferential strategies. The use of multiple strategies, including verifying and evaluating, led to better retention than the use of single strategies. The study also found that the degree of word retention was related to both the degree of involvement and task form-focusedness. These findings were consistent with Pulido's (2009) results, which indicated that the strategies that help learners focus on the target word form and making a form-meaning connection were essential for learning. To extend the above line of research, studies have also examined whether tasks with the same involvement load but different combinations of the three involvement factors (i.e., need, search, and evaluation) had any differential effects (Kim, 2008; Laufer, 2003). Laufer (2003) conducted an experiment in which 90 Arabic learners of English were asked to complete three tasks with the same involvement indexes and they were tested on word retention after each task. The three groups differed significantly in their posttest scores, suggesting that each of the three components might contribute differently to the learning tasks. 
Kim (2008) also examined whether two tasks (i.e., writing a composition and writing sentences), which were assumed to involve the same theoretical level of task-induced involvement, would have the same effects on initial learning and subsequent retention of new words. The results of this study provided evidence that tasks with the same involvement loads were equally beneficial for vocabulary learning. However, Kim further suggested that different degrees (i.e., moderate and strong) of each individual component (i.e., need, search, and evaluation) might not carry the same weight and that strong evaluation might be the most influential factor for learners' initial vocabulary acquisition. Kim called for more studies to investigate the value of each individual component and also those with multiple treatments for each task.

2.2. Technique Feature Analysis (TFA)

Technique Feature Analysis (TFA) is a theoretical framework proposed by Nation and Webb (2011). This framework was intended to address the inadequacy mentioned above for the Involvement Load Hypothesis by introducing more criteria for the operationalization of depth of processing than those included in the ILH. TFA is basically a modification of an earlier vocabulary-learning framework, which suggested that vocabulary learning involves three components: noticing, retrieval, and generation (Nation, 2001). TFA adds two additional components (i.e., motivation and retention). Nation (2001) argued that the earlier system did not allow for the quantification of the elaboration features. The TFA framework, therefore, includes specific components that not only increase the number of elaboration parameters, but also propose criteria to assess each component. This leads to a 5-component and 18-criterion framework. The TFA components and their criteria are displayed in Table 1.
As seen in Table 1, the factor of ‘motivation’ concerns whether the vocabulary activity has a clear learning goal and motivates learning. The ‘noticing’ factor focuses on whether the activity gives attention to the target words and raises awareness of new word learning, and also whether it involves negotiation. It occurs when learners have to look up a word in the dictionary, deliberately study a word, guess from context, or have a word explained to them (Nation, 2001). The factor ‘retrieval’ consists of receptive and productive retrieval, and involves recall rather than recognition, and whether there are multiple retrievals or spacing between each interval. According to Baddeley (1990), retrieval can be enhanced by repetition. The fourth factor, ‘generation,’ can be divided into either receptive or productive processes (Nation, 2001). Receptive generation involves meeting a word while listening to or reading an unfamiliar context, whereas productive generation refers to using the word in new contexts. The final factor, ‘retention,’ mainly refers to whether a vocabulary activity ensures successful linking of form and meaning; whether it involves instantiation, imaging, and avoids interference.


Table 1. Technique Feature Analysis (adopted from Nation & Webb, 2011, p. 7).

| Criteria | Scores |
|---|---|
| Motivation | |
| Is there a clear vocabulary learning goal? | 0 / 1 |
| Does the activity motivate learning? | 0 / 1 |
| Do the learners select the words? | 0 / 1 |
| Noticing | |
| Does the activity focus attention on the target words? | 0 / 1 |
| Does the activity raise awareness of new vocabulary learning? | 0 / 1 |
| Does the activity involve negotiation? | 0 / 1 |
| Retrieval | |
| Does the activity involve retrieval of the word? | 0 / 1 |
| Is it productive retrieval? | 0 / 1 |
| Is it recall? | 0 / 1 |
| Are there multiple retrievals of each word? | 0 / 1 |
| Is there spacing between retrievals? | 0 / 1 |
| Generation | |
| Does the activity involve generative use? | 0 / 1 |
| Is it productive? | 0 / 1 |
| Is there a marked change that involves the use of other words? | 0 / 1 |
| Retention | |
| Does the activity ensure successful linking of form and meaning? | 0 / 1 |
| Does the activity involve instantiation? | 0 / 1 |
| Does the activity involve imaging? | 0 / 1 |
| Does the activity avoid interference? | 0 / 1 |
| Maximum score | 18 |

As noted earlier, a number of recent studies have examined the effectiveness of ILH and have found some evidence for its predictive power (Hulstijn & Laufer, 2001; Keating, 2008; Kim, 2008; Nassaji & Hu, 2012). No empirical studies, however, have yet examined the predictive power of TFA. The only examination, which was not an empirical study, is a scoring comparison that Nation and Webb (2011) made between ILH and TFA on several vocabulary learning tasks. In their analysis, they found some disagreements between the two frameworks in how the tasks were ranked: some tasks were given a higher learning index by the ILH but a lower index by the TFA. For example, they calculated and compared the ILH and TFA scores of the following two vocabulary tasks: ‘word cards’ and ‘fill in the blanks.’ ‘Word cards’ is a vocabulary learning activity in which some information about the word (e.g., word form) is put on one side of the card and the other information (e.g., word meaning) is on the other side. While reviewing information on one side, the learner has to retrieve the information on the other side. Nation and Webb (2011) calculated the involvement load index of this task (based on the ILH) and came up with a score of 3 out of 6 (which they considered to be a moderate involvement). They also calculated the TFA index based on the TFA framework, and they came up with a score of 11 out of 18 (which is a relatively higher index). They followed the same comparison procedure for another vocabulary learning activity: filling in the blanks. They came up with a higher involvement index of 4 out of 6 but a lower TFA score of 8 out of 18.
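The disagreement between the two frameworks on these two activities can be made concrete with a short sketch. It uses only the four scores reported above; the dictionary layout, the function names, and the idea of expressing each score as a proportion of its maximum are illustrative assumptions, not part of Nation and Webb's (2011) analysis.

```python
from fractions import Fraction

# Scores reported in the text: (score, maximum) under each framework.
REPORTED = {
    "word cards": {"ILH": (3, 6), "TFA": (11, 18)},
    "fill in the blanks": {"ILH": (4, 6), "TFA": (8, 18)},
}


def ranking(framework):
    """Order the activities from highest to lowest score under one framework."""
    return sorted(REPORTED, key=lambda name: REPORTED[name][framework][0], reverse=True)


def proportion(name, framework):
    """Score as a fraction of the framework's maximum (hypothetical normalization)."""
    score, maximum = REPORTED[name][framework]
    return Fraction(score, maximum)


for framework in ("ILH", "TFA"):
    ordered = ranking(framework)
    print(framework, [(name, str(proportion(name, framework))) for name in ordered])
```

Under these figures the ILH places filling in the blanks first, whereas the TFA places word cards first, which is the kind of reversal the present study set out to test.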
Since there are differences between the ILH and TFA in the way they operationalize depth of processing and the varying weights they give to each attentional component, the two frameworks can result in differences in prediction about which tasks or activities are more effective for vocabulary teaching and learning. However, no studies have yet empirically examined and compared these predictions. Thus, Nation and Webb (2011) called for empirical research in this area. They pointed out that “there is certainly scope for experimentally comparing ILH and TFA” (p. 26), as there appears to be some disagreement between the two models. The present study aimed to address this call by examining and comparing the predictions yielded by these two models. More specifically, it aimed to see which of the two frameworks provides greater explanatory power in predicting the effectiveness of different vocabulary tasks. We selected four vocabulary learning tasks that differ in their ranking and the extent to which they promote the different components of the ILH and TFA (see the methods section). We first examined the features of these tasks and calculated their scores based on the two frameworks and then compared their effectiveness in terms of the participants' vocabulary knowledge gains. The following two research questions were investigated:

1. To what extent do the vocabulary tasks used contribute to L2 vocabulary learning?
2. To what extent is their contribution, if any, predicted by the ILH versus TFA frameworks?

3. Method

3.1. Participants

A total of 96 EFL learners participated in this study. They were Taiwanese college-level second-year business majors, who had at least 6–7 years of English learning experience. They were enrolled in the General English classes at a university of technology in Central Taiwan. Their age range was between 19 and 21. Each had a CSEPT (College Student English Proficiency Test) score between 130 and 170, which was approximately equivalent to the low-intermediate level (https://www.lttc.ntu.edu.tw/CSEPT_main.htm). To ensure that participants had sufficient vocabulary knowledge to take part in this study, they were first given the 2000 Vocabulary Levels Test (Nation, 2001), which consists of the most frequent 2000 words. All the participants satisfied the threshold level of the test (i.e., 12 out of 18 points on the test) suggested by Nation (2001).

3.2. Vocabulary tasks

As noted earlier, the main purpose of this study was to examine to what extent vocabulary tasks with similar and different rankings between the ILH and TFA contribute to vocabulary learning. To compare the learning effects between the ILH and TFA, we used four vocabulary tasks that differ in their rankings and in the extent to which they promote the different components of the two frameworks. Because the aim was to compare tasks with similar and different rankings in the Involvement Load Hypothesis and Technique Feature Analysis, we had to choose tasks consistent with both frameworks. The tasks for comparison were therefore those suggested in Nation and Webb (2011): 1) reading a text with multiple-choice items, 2) reading a text and choosing definitions, 3) reading plus fill in the blanks, and 4) reading and rewording the sentences. Task 1 had an ILH index of 3 and a TFA score of 6; task 2 had an ILH index of 3 and a TFA score of 6; task 3 had an ILH index of 2 and a TFA score of 7; and task 4 had an ILH index of 3 and a TFA score of 6. It was not possible to choose tasks with a larger gap of involvement load because the maximum involvement load for the tasks listed in Nation and Webb (2011) is 4.

Since all the tasks involved initially reading a text, a reading text first had to be developed for the tasks. For that purpose an economics text was used. The text was selected from an ESP textbook used for Taiwanese college-level business majors. One reason for using this text was that it was relevant to the students' area of interest. Another reason was that the participants in this study had to take an introductory economics class, and they were supposed to have some basic understanding of the concepts mentioned in the text. Therefore, the text was related content-wise to students' academic work. The text length was 596 words.
To determine whether the vocabulary tasks that students did after reading the text helped them learn new words, fourteen target words unknown to the learners were selected and used in the text. As the text used in this study was an academic text, the target words were selected from the Academic Word List (Coxhead, 2000) and were also screened by using the AWL Highlighter to make sure that the words were appropriate for the participants' level. The AWL Highlighter is a program that screens a text and picks up the academic words from level 1 to 10 of the Academic Word List (Coxhead, 2000). Originally 33 AWL words were selected by the highlighter. A pilot study with a similar pool of participants was then conducted, and based on the results, 14 words, which were unknown to all pilot participants, were selected. The results of the main study later showed that most of the words were unknown to students at that level. All the target words were highlighted in the text. Our vocabulary tasks were then developed based on the text discussed above and the target words. Each task is described below.

Task 1: Reading a text with multiple-choice questions: In this task, the target words were highlighted, and participants were required to read and understand the text. Upon finishing reading the text, they had to answer several multiple-choice questions focusing on the comprehension of a section of text containing the target word. The questions followed Nation and Webb's (2011) format and were designed in a way that the participants had to understand the meaning of the target word in order to be able to correctly answer the multiple-choice question. Four distractor items were also included in each task described below. Here is an example from Nation and Webb (2011, p. 322).

Choose the correct answer
1. What is the story about?
a. A girl who gets a novel about technology in the mail.
b. A basketball coach who tells jokes.
c. A man who takes a direct trip to the local store.
d. A girl who can control other people.

Task 2: Reading a text and choosing definitions: In this task, the target words were highlighted. Upon finishing reading the text, the participants were required to choose the correct definition of each target word (Nation & Webb, 2011, p. 322).

Choose the correct definition for each word
century a) first b) hundred c) school d) man

Task 3: Reading plus fill in the blanks: In this task, the learners read the text with some blanks in it. The target words were provided on a separate piece of paper with L1 translations and L2 explanations (or synonyms), as well as a sample sentence containing the target word. Some extra words were offered as distractors. The example information for the target word is as below (Hulstijn & Laufer, 2001, p. 537).

wrath (noun, U). Strong, fierce anger (憤怒)
Ex. The wrath of the opponents to the proposed bill caused it to fail.


Task 4: Reading a text and rewording the sentences: In this task, the target words were highlighted. Upon finishing reading the text, the learners had to rewrite the sentences drawn from the text containing the target words. The example is as below (Nation & Webb, 2011, p. 322).

Reword the sentences without changing their meanings. Use an appropriate form of the words in parentheses.
I strongly dislike jazz. (stand)

The rankings of each of the tasks were based on the comparisons between the ILH and TFA made by Nation and Webb (2011). A summary of the ranking is presented in Table 2 (adapted from Nation & Webb, 2011). As can be seen, tasks 1, 2, and 4 have consistent rankings between the ILH and TFA (i.e., an involvement index of 3 and a technique feature score of 6). Task 3, however, has inconsistent rankings between the two frameworks: a lower ILH index of 2 but a higher TFA score of 7. During the process, learners' performance was monitored and the four tasks were found to be conducted by the learners as intended.

3.3. The study design and the pre- and posttest measures

To investigate how each task affected learners' vocabulary learning, the participants were divided into four groups and each performed one of the four tasks. Their vocabulary learning performance was then compared across the four conditions. Although it was assumed that the target words were unfamiliar to the learners, all participants were still given a vocabulary pretest measuring their receptive knowledge of the target words prior to performing the tasks (see Appendix 1). The pretest consisted of the 14 target words, for which the participants had to provide either the Chinese translations or English synonyms. Then after they completed the tasks, the learners were again tested by the same test measuring their receptive knowledge of the target words.
The word order was rearranged so that it differed from the order in the pretest and in the text. Both the pretest and the posttest were scored with the same criteria, with a score of 1 being assigned to a correct response and 0 to an incorrect one. In addition to measuring the learners' knowledge of the vocabulary items in the pretest and the posttest, their during-task success was also measured. This was done by checking the participants' responses to the target words when they had to perform each of the four tasks. To this end, for tasks 1, 2, and 3, the learners' performance on the multiple-choice questions that followed the task was checked and also scored based on their success. For scoring the during-task performance, the learners received a score of 1 for a correct answer and 0 for an inaccurate response. For task 4 (i.e., reading a text and sentence rewording), the same scoring system was used to evaluate the accuracy of the reworded sentences. They received a score of 1 for a grammatically correct sentence containing the synonym of the original target word, and a score of 0 if the answer was wrong. Two independent raters read and judged their rewritten sentences, and an inter-rater reliability of .98 was achieved.

Table 2. Four tasks analyzed using TFA and ILH.

| Criteria | Task 1: Reading a text and multiple-choice items on text | Task 2: Reading a text and choosing definitions | Task 3: Reading plus fill in the blanks | Task 4: Reading a text and sentence re-wording |
|---|---|---|---|---|
| Motivation | | | | |
| Is there a clear vocabulary learning goal? | 0 | 1 | 1 | 1 |
| Does the activity motivate learning? | 1 | 1 | 1 | 0 |
| Do the learners select the words? | 0 | 0 | 0 | 0 |
| Noticing | | | | |
| Does the activity focus attention on the target words? | 1 | 1 | 1 | 1 |
| Does the activity raise awareness of new vocabulary learning? | 0 | 1 | 1 | 1 |
| Does the activity involve negotiation? | 0 | 0 | 0 | 0 |
| Retrieval | | | | |
| Does the activity involve retrieval of the word? | 1 | 1 | 0 | 0 |
| Is it productive retrieval? | 0 | 0 | 0 | 0 |
| Is it recall? | 1 | 0 | 0 | 0 |
| Are there multiple retrievals of each word? | 0 | 0 | 0 | 0 |
| Is there spacing between retrievals? | 0 | 0 | 0 | 0 |
| Generation | | | | |
| Does the activity involve generative use? | 1 | 0 | 1 | 1 |
| Is it productive? | 0 | 0 | 0 | 1 |
| Is there a marked change that involves the use of other words? | 0 | 0 | 0 | 0 |
| Retention | | | | |
| Does the activity ensure successful linking of form and meaning? | 0 | 0 | 1 | 1 |
| Does the activity involve instantiation? | 0 | 0 | 0 | 0 |
| Does the activity involve imaging? | 0 | 0 | 0 | 0 |
| Does the activity avoid interference? | 1 | 1 | 1 | 0 |
| Total score | 6 | 6 | 7 | 6 |
| Involvement load index (need, search, evaluation) | 1 + 1 + 1 = 3 | 1 + 1 + 1 = 3 | 1 + 0 + 1 = 2 | 1 + 0 + 2 = 3 |
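The totals in Table 2 follow directly from summing the 0/1 ratings. The sketch below transcribes the ratings and involvement components from Table 2 and recomputes the TFA totals, per-component subtotals, and ILH indices. The variable and function names (`TFA_RATINGS`, `tfa_subtotals`, `involvement_index`) are hypothetical, introduced here only for illustration.

```python
# Number of criteria in each TFA component, in the order used in Table 2.
COMPONENT_SIZES = [
    ("motivation", 3), ("noticing", 3), ("retrieval", 5),
    ("generation", 3), ("retention", 4),
]

# 0/1 ratings for the 18 criteria, transcribed column by column from Table 2.
TFA_RATINGS = {
    "task1": [0, 1, 0,  1, 0, 0,  1, 0, 1, 0, 0,  1, 0, 0,  0, 0, 0, 1],
    "task2": [1, 1, 0,  1, 1, 0,  1, 0, 0, 0, 0,  0, 0, 0,  0, 0, 0, 1],
    "task3": [1, 1, 0,  1, 1, 0,  0, 0, 0, 0, 0,  1, 0, 0,  1, 0, 0, 1],
    "task4": [1, 0, 0,  1, 1, 0,  0, 0, 0, 0, 0,  1, 1, 0,  1, 0, 0, 0],
}

# (need, search, evaluation) components from the last row of Table 2.
ILH_COMPONENTS = {
    "task1": (1, 1, 1), "task2": (1, 1, 1),
    "task3": (1, 0, 1), "task4": (1, 0, 2),
}


def tfa_subtotals(ratings):
    """Split an 18-item rating list into per-component subtotals."""
    subtotals, start = {}, 0
    for name, size in COMPONENT_SIZES:
        subtotals[name] = sum(ratings[start:start + size])
        start += size
    return subtotals


def involvement_index(task):
    """Involvement load index: need + search + evaluation."""
    return sum(ILH_COMPONENTS[task])


for task, ratings in TFA_RATINGS.items():
    print(task, "TFA total:", sum(ratings), tfa_subtotals(ratings),
          "ILH index:", involvement_index(task))
```

Breaking the totals down by component shows where the two frameworks part ways on task 3: it gains TFA points for motivation, noticing, generation, and retention, while its ILH index drops because no search is involved.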

4. Results

To answer the research questions, a one-way analysis of variance (ANOVA) was first performed on the learners' pretest scores (Table 3). Before conducting the ANOVA, the homogeneity of variance was checked using Levene's test, and this assumption was met. The results of the ANOVA showed no significant difference among the groups (p > 0.05) in their knowledge of the target words before the treatment. Their mean scores were also quite low, suggesting that they did not have much knowledge of the target words.

The first question examined to what extent the vocabulary tasks with similar and different rankings between the ILH and TFA contributed to vocabulary learning. To explore this question, the tasks were first classified according to the degree of task-induced involvement in the ILH (high and low IL) and the technique feature score in the TFA framework (high and low TFA). To this end, the tasks that received an involvement load score of 3 (see Table 4) were classified as high involvement and the one that received a score of 2 was classified as lower involvement. Similarly, the task that received a technique feature score of 7 was classified as high TFA and the ones that received a score of 6 were classified as lower TFA. Accordingly, the reading plus multiple-choice items, reading and choosing definitions, and reading plus sentence rewording tasks were classified as tasks with high involvement load indexes but lower technique feature scores, whereas the reading plus fill-in-the-blank task was classified as having a low involvement load index but a higher technique feature score (Table 4). A one-way ANOVA was conducted to examine how students differed in their task performance and vocabulary learning gains across the four tasks.
Thus, analyses were conducted on both the learners' performance during each task (their correct responses to the target words when performing the task) and the amount of gains from the pretest to the posttest. The results of their task performance can be seen in Table 5. Participants doing task 2 (reading a text and choosing definitions) had the best performance with a mean of 7.167, followed by task 3 (reading plus fill-in-the-blanks, with a mean of 6.750), task 1 (multiple-choice items on text, with a mean of 5.250), and task 4 (sentence rewording, with a mean of 2.417). A one-way ANOVA of during-task performance was then conducted to examine whether there were significant differences across the four tasks. The result showed a statistically significant difference (F = 9.888; p = .000, see Table 6).

An LSD post hoc test was then conducted to examine how the tasks differed from one another (Table 7). Of the four comparisons with significant differences, three were consistent with the assumptions of neither the ILH nor the TFA. That is, participants doing task 2 (reading a text and choosing definitions) performed better than those doing task 1 (reading plus multiple-choice items on text), participants doing task 1 performed better than those doing task 4 (reading a text plus sentence rewording), and participants doing task 2 performed better than those doing task 4. The one significant comparison consistent with the TFA but not with the ILH was that participants doing task 3 (reading plus fill-in-the-blanks) performed better than those doing task 4. Of the comparisons with non-significant differences, one was consistent with the TFA (T3 > T1) and one was consistent with the ILH (T2 > T3). Judged by the consistency between the assumptions and the results, the TFA appeared to account better for the during-task learner performance. The cross-task comparisons are summarized in Table 8.
To measure vocabulary learning gains, the learners' performance on the pretest and the posttest was examined. The analysis calculated the vocabulary gains from the pretest to the posttest and then examined which of the two frameworks accounted for more variance in those gains. For the latter, weighted scores for the ILH and TFA were first calculated for each task from the learners' test scores, using the weight given to the different components in each framework. Then, a hierarchical multiple regression was conducted to see which of the two frameworks contributed more to the amount of gains from the pretest to the posttest.

Table 3
ANOVA of the participants' pretest scores across the four tasks.

Tasks    Means   Standard deviations   F       p-Value
Task 1   1.250   1.260                 0.095   0.963
Task 2   1.167   1.404
Task 3   1.042   1.429
Task 4   1.125   1.361


Table 4
Task classification based on the indexes provided by ILH and TFA.

Tasks    Degree of task-induced involvement load   Degree of TFA
Task 1   High (3)                                  Low (6)
Task 2   High (3)                                  Low (6)
Task 3   Low (2)                                   High (7)
Task 4   High (3)                                  Low (6)
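The classification in Table 4 follows from the component scores: each framework's index is the sum of its components, and a task is labelled high when it reaches the top score observed in this study (3 for the ILH, 7 for the TFA). The following is a minimal Python sketch of that bookkeeping using the component scores reported in this paper. The dictionary layout, the function name `classify`, and the `high_cutoff` parameter are illustrative assumptions, not part of either framework.

```python
# Component scores per task as reported in the paper's tables.
# ILH components: (need, search, evaluation).
ILH_COMPONENTS = {1: (1, 1, 1), 2: (1, 1, 1), 3: (1, 0, 1), 4: (1, 0, 2)}
# TFA components: (motivation, noticing, retrieval, generation, retention).
TFA_COMPONENTS = {
    1: (1, 1, 2, 1, 1),
    2: (2, 2, 1, 0, 1),
    3: (2, 2, 0, 1, 2),
    4: (1, 2, 0, 2, 1),
}


def classify(components, high_cutoff):
    """Sum each task's components and label the total High or Low.

    `high_cutoff` is a hypothetical parameter: the score at or above which
    a task counts as High in this study (3 for the ILH, 7 for the TFA).
    """
    result = {}
    for task, scores in components.items():
        total = sum(scores)
        result[task] = (total, "High" if total >= high_cutoff else "Low")
    return result


ilh = classify(ILH_COMPONENTS, high_cutoff=3)
tfa = classify(TFA_COMPONENTS, high_cutoff=7)
for task in sorted(ilh):
    print(f"Task {task}: ILH {ilh[task]}, TFA {tfa[task]}")
```

Only task 3 comes out as low on the ILH and high on the TFA, which is what makes it the critical task for comparing the two frameworks.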

Table 5
Descriptive statistics of the during-task performance per condition.

Tasks    N    Mean    SD
Task 1   24   5.250   2.111
Task 2   24   7.167   2.548
Task 3   24   6.750   5.277
Task 4   24   2.417   2.466
Total    96   5.396   3.791

Table 6
ANOVA of during-task performance.

                     df   Mean square   F       p-Value
Task                 3    110.931       9.888   .000
Within-group error   92   11.219
Total                95

Table 7
Post hoc multiple comparisons across the four tasks.

Task     Between-task   Mean difference   Standard error   Sig.
Task 1   task 2         1.9167            .9669            .050
         task 3         1.5000            .9669            .124
         task 4         2.8333*           .9669            .004
Task 2   task 1         1.9167            .9669            .050
         task 3         .4170             .9669            .124
         task 4         4.7500*           .9669            .004
Task 3   task 1         1.5000            .9669            .124
         task 2         .4170             .9669            .668
         task 4         4.3333*           .9669            .000
Task 4   task 1         2.8333*           .9669            .004
         task 2         4.7500*           .9669            .000
         task 3         4.3333*           .9669            .000

*The mean difference is significant at the 0.05 level.
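The F ratio in Table 6 and the standard error in Table 7 can be recovered from the reported summary statistics. The sketch below assumes equal group sizes (24 per task) and uses the standard LSD standard error, the square root of MS_within × (1/n + 1/n); the variable names are illustrative. It is a consistency check on the published figures, not a re-analysis of the raw data.

```python
import math
from itertools import combinations

# Values reported in Tables 5 and 6.
MEANS = {1: 5.250, 2: 7.167, 3: 6.750, 4: 2.417}
N_PER_GROUP = 24
MS_BETWEEN = 110.931
MS_WITHIN = 11.219

# Omnibus F ratio: between-group mean square over within-group mean square.
f_ratio = MS_BETWEEN / MS_WITHIN

# LSD standard error of the difference between two group means (equal n).
se_diff = math.sqrt(MS_WITHIN * (1 / N_PER_GROUP + 1 / N_PER_GROUP))

# Absolute mean difference and t ratio for every pair of tasks.
pairwise = {}
for a, b in combinations(sorted(MEANS), 2):
    diff = abs(MEANS[a] - MEANS[b])
    pairwise[(a, b)] = (round(diff, 3), round(diff / se_diff, 2))

print(f"F = {f_ratio:.3f}, SE of a difference = {se_diff:.4f}")
for pair, (diff, t_ratio) in pairwise.items():
    print(f"Task {pair[0]} vs task {pair[1]}: diff = {diff}, t = {t_ratio}")
```

In an LSD test each t ratio is then compared with the critical value of t on the within-group degrees of freedom (92 here).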

Table 8
Cross-task comparisons checked against the assumptions of the ILH and TFA.

Cross-task comparisons   Mean differences (Sig.)   Assumptions of the ILH   Assumptions of the TFA
Task 2 > Task 1          1.9167*                   ✗                        ✗
Task 1 > Task 4          2.8333*                   ✗                        ✗
Task 2 > Task 4          4.7500*                   ✗                        ✗
Task 3 > Task 4          4.3333*                   ✗                        √
Task 3 > Task 1          1.5000                    ✗                        √
Task 2 > Task 3          .4170                     √                        ✗

Note. Numbers marked with an asterisk (*) indicate a significant difference at the 0.05 level.
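The consistency judgments in Table 8 can be reproduced mechanically from the indexes in Table 4. The sketch below assumes that an observed ordering counts as consistent with a framework only when that framework assigns the better-performing task a strictly higher index; this rule is an inference from the table, and the function name `is_consistent` is hypothetical.

```python
# Framework indexes per task, as listed in Table 4.
ILH_INDEX = {1: 3, 2: 3, 3: 2, 4: 3}
TFA_INDEX = {1: 6, 2: 6, 3: 7, 4: 6}

# Observed orderings from Table 8: (better-performing task, weaker task).
OBSERVED = [(2, 1), (1, 4), (2, 4), (3, 4), (3, 1), (2, 3)]


def is_consistent(index, better, weaker):
    """True if the framework ranks `better` strictly above `weaker`.

    Assumed rule: equal indexes predict no difference, so an observed
    difference between equally ranked tasks counts as inconsistent.
    """
    return index[better] > index[weaker]


for better, weaker in OBSERVED:
    print(
        f"Task {better} > Task {weaker}: "
        f"ILH {is_consistent(ILH_INDEX, better, weaker)}, "
        f"TFA {is_consistent(TFA_INDEX, better, weaker)}"
    )

ilh_hits = sum(is_consistent(ILH_INDEX, b, w) for b, w in OBSERVED)
tfa_hits = sum(is_consistent(TFA_INDEX, b, w) for b, w in OBSERVED)
print(f"Consistent comparisons: ILH {ilh_hits}, TFA {tfa_hits}")
```

Under this rule the TFA matches two of the six observed orderings and the ILH one, in line with the pattern of check marks in Table 8.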

To calculate the weighted scores, learners' test scores for each task were first converted into percentile scores, and the percentile scores were then converted into weighted scores based on the number of components in the two models, the ILH and the TFA (see below). The percentile score was calculated for each task using the following formula:

n/14 × 100 = p% (n = score on the task; 14 = the number of target words).

The percentile score was then assigned to a percentile rank based on the components proposed by the ILH or the TFA. For the ILH, the scores of each task were converted into percentile ranks based on the three components (1–3) proposed


in the ILH for each task. For example, if a participant doing the second task received a gain score of 6, then his or her percentile score was 6/14 × 100 = 42.86%. Based on the three components of the ILH, a score of 42.86% falls within the 34%–67% percentile rank (see the note below Table 9), which was taken to be approximately equal to an involvement index of 2. Similarly, for the TFA, the scores of each task were converted into percentile ranks based on the five components (1–5) proposed in the TFA for each task. If a participant doing the second task got a score of 7, the converted score was 7/14 × 100 = 50%. As 50% falls within the 33–65 percentage rank, it was taken to be approximately equal to a TFA index of 2 (Table 10).

A hierarchical multiple regression was then conducted to see which of the two frameworks contributed more to the amount of gains from the pretest to the posttest (Table 11). Separate forced-entry hierarchical multiple regressions were performed with the amount of gains from the pretest to the posttest as the dependent variable and the ILH and TFA index scores as predictor variables. To determine the contribution of each predictor variable over and above the contribution of the other, the two predictor variables were entered into the regression model in different orders. First, ILH was entered into the equation, and it explained 38.8% of the variance in the amount of gains from the pretest to the posttest, which was significant. Then TFA was entered on the second step to see whether it could explain any additional variance in the amount of gains after the variance attributable to ILH was partialled out. When TFA was entered on this step, it explained an additional 13% of the variance, which was statistically significant. Conversely, TFA was then entered first into the equation, followed by ILH. When TFA was entered first, it explained 51% of the variance in the amount of gains, which was significant.
When ILH was entered on the next step, it accounted for only an additional 1% of the variance, which was not statistically significant. The results of the above analyses suggest that of the two predictor variables (ILH and TFA), the latter was a significantly stronger predictor of lexical gains than the former. In other words, the TFA had greater explanatory power for lexical gains.

5. Discussion

This study examined to what extent vocabulary tasks with similar and different rankings given by the ILH and TFA contribute to L2 vocabulary learning. The results will be discussed in terms of task effectiveness for during-task performance and for pretest-posttest vocabulary learning gains. To compare task effectiveness under the ILH and TFA, the four tasks used were first categorized based on their ILH and TFA indexes and then their effects were compared.

In terms of the during-task performance, the findings indicated that the TFA appeared to predict performance better than did the ILH. The results showed that participants doing task 2 had the best performance, followed by tasks 3, 1, and 4. In addition, participants scored better in the second and third tasks than they did in the first task. One characteristic of the second and third tasks was that they were both form-focused, in contrast to the first task. In the second task, the participants were required to choose a correct definition for each target word based on their reading. In the third task, learners had to read the text with some blanks in it. The first task, however, did not target any specific word. The difference in learners' performance between these tasks suggests that students were more likely to learn new words when the exercises were form-focused, that is, directly related to the target words.
This finding is consistent with previous research in which reading-plus tasks were shown to be more effective than reading-only ones (Eckerth & Tavakoli, 2012; Keating, 2008; Kim, 2008; Laufer & Rozovski-Roitblat, 2011; Peters, 2012; Peters et al., 2009).

As for the pretest-posttest vocabulary gains, the means for all four tasks improved from the pretest to the posttest, which suggests that all four task types facilitated the participants' vocabulary learning. The results also showed that the task scored higher by the TFA (i.e., reading plus fill-in-the-blanks) led to better task performance than other tasks (e.g., sentence rewriting). One reason for this advantage could have been that the participants had to spend more time on the task and focus on the original sentences containing the target words while doing it. Participants doing this task were provided with some extra information (e.g., L2 synonyms and L1 translations) and thus might have been willing to spend more time on the task with the information at hand. On the other hand, many participants doing the writing task gave up constructing the sentences during the task and thus spent very little time on it. The results of the

Table 9
Percentage of distribution of the three components in the ILH.

Tasks    Need      Search    Evaluation
Task 1   1 (33%)   1 (33%)   1 (33%)
Task 2   1 (33%)   1 (33%)   1 (33%)
Task 3   1 (50%)   0         1 (50%)
Task 4   1 (33%)   0         2 (67%)

Note.
Task 1 = 100/3 = 33.333 ≈ 33, Task 1: 0–33 = 1; 34–67 = 2; 68–100 = 3
Task 2 = 100/3 = 33.333 ≈ 33, Task 2: 0–33 = 1; 34–67 = 2; 68–100 = 3
Task 3 = 100/2 = 50, Task 3: 0–50 = 1; 51–100 = 3
Task 4 = 100/3 = 33.333 ≈ 33, Task 4: 0–33 = 1; 34–100 = 3


Table 10
Percentage of distribution of the five components in the TFA.

Task type   Motivation   Noticing   Retrieval   Generation   Retention
Task 1      1 (17%)      1 (17%)    2 (32%)     1 (17%)      1 (17%)
Task 2      2 (33%)      2 (33%)    1 (17%)     0            1 (17%)
Task 3      2 (28%)      2 (28%)    0           1 (14%)      2 (28%)
Task 4      1 (16%)      2 (32%)    0           2 (32%)      1 (16%)

Note.
Task 1 = 100/6 = 16.666 ≈ 16, Task 1: 0–16 = 1; 17–33 = 2; 34–66 = 3; 67–83 = 4; 84–100 = 5
Task 2 = 100/6 = 16.666 ≈ 16, Task 2: 0–32 = 1; 33–65 = 2; 66–82 = 3; 83–100 = 5
Task 3 = 100/7 = 14.285 ≈ 14, Task 3: 0–28 = 1; 29–57 = 2; 58–72 = 4; 72–100 = 5
Task 4 = 100/6 = 16.666 ≈ 16, Task 4: 0–16 = 1; 17–49 = 2; 50–82 = 4; 82–100 = 5
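The conversion from a raw score to a weighted index, as described in the text and in the notes to Tables 9 and 10, can be sketched in Python as follows. The band boundaries are transcribed from those notes. The function names and the tie-breaking rule (a score is assigned to the first band whose upper bound it does not exceed) are assumptions made for illustration; the paper does not state how values falling between two printed bounds were handled.

```python
N_TARGET_WORDS = 14  # number of target words in the study

# (inclusive upper bound in percent, component index) pairs per task,
# transcribed from the notes to Tables 9 and 10.
ILH_BANDS = {
    1: [(33, 1), (67, 2), (100, 3)],
    2: [(33, 1), (67, 2), (100, 3)],
    3: [(50, 1), (100, 3)],
    4: [(33, 1), (100, 3)],
}
TFA_BANDS = {
    1: [(16, 1), (33, 2), (66, 3), (83, 4), (100, 5)],
    2: [(32, 1), (65, 2), (82, 3), (100, 5)],
    3: [(28, 1), (57, 2), (72, 4), (100, 5)],
    4: [(16, 1), (49, 2), (82, 4), (100, 5)],
}


def percent_score(score):
    """Convert a raw score out of 14 into a percentage (n/14 x 100)."""
    return score / N_TARGET_WORDS * 100


def weighted_index(score, task, bands):
    """Return the index of the first band whose upper bound covers the score.

    Assumption: bands are checked in ascending order and a score belongs
    to the first band whose upper bound is greater than or equal to it.
    """
    if not 0 <= score <= N_TARGET_WORDS:
        raise ValueError("score must lie between 0 and 14")
    pct = percent_score(score)
    for upper, index in bands[task]:
        if pct <= upper:
            return index
    raise ValueError("the last band must end at 100")


print(weighted_index(6, 2, ILH_BANDS))
print(weighted_index(7, 2, TFA_BANDS))
```

Both calls return 2, matching the two worked examples in the text (a score of 6 on task 2 under the ILH, and a score of 7 on task 2 under the TFA).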

Table 11
Multiple regression analysis of variables predicting word gains.

Model     Step     R      R²     ΔR²    ΔF       df        Sig. F change
Model 1   1. ILH   .623   .388   .388   59.623   (1, 94)   .000
          2. TFA   .722   .521   .133   25.863   (1, 93)   .000
Model 2   1. TFA   .714   .510   .510   97.881   (1, 94)   .000
          2. ILH   .722   .521   .011   2.163    (1, 93)   .145

Note. ΔR² = change in R²; ΔF = change in F.
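The F-change values in Table 11 follow from the R² values through the standard formula for comparing nested regression models, ΔF = (ΔR² / k) / ((1 − R²_full) / df_residual), where k is the number of predictors added at that step. The sketch below applies this formula to the published R² values; the function name `f_change` is illustrative, and small discrepancies from the table are expected because the inputs are rounded to three decimals.

```python
def f_change(r2_full, r2_reduced, n_added, df_residual):
    """F statistic for the R-squared gained by adding predictors.

    r2_full:      R-squared of the model after the step
    r2_reduced:   R-squared before the step (0.0 for the first step)
    n_added:      number of predictors entered at this step
    df_residual:  residual degrees of freedom of the fuller model
    """
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / n_added) / ((1 - r2_full) / df_residual)


# R-squared values and degrees of freedom as reported in Table 11.
steps = {
    "Model 1, step 1 (ILH)": f_change(0.388, 0.0, 1, 94),
    "Model 1, step 2 (TFA added)": f_change(0.521, 0.388, 1, 93),
    "Model 2, step 1 (TFA)": f_change(0.510, 0.0, 1, 94),
    "Model 2, step 2 (ILH added)": f_change(0.521, 0.510, 1, 93),
}

for label, value in steps.items():
    print(f"{label}: F change = {value:.2f}")
```

The asymmetry is visible in the two second steps: adding TFA after ILH yields a large F change, whereas adding ILH after TFA yields a small one.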

hierarchical multiple regression also confirmed that the TFA predicted the effectiveness of the vocabulary learning tasks better than the ILH did. Altogether, the results indicated that the TFA was better able to predict vocabulary learning gains than the ILH.

In addition, the study did not support the assumption of the Involvement Load Hypothesis that tasks with equal involvement loads lead to similar learning gains. For example, tasks 1, 2, and 4 did not lead to similar learning gains in the during-task stage despite their identical ILH indexes. This might be because the distinction between low and high involvement load tasks was not large enough to create differences in the learning outcomes. Another possibility is that the three components of the ILH carry different weights, as also proposed by Laufer (2003); this needs confirmation in future research.

On the other hand, the TFA appears to offer more sensitive factors for measuring vocabulary learning. In this study, tasks 2 and 3 yielded the best during-task performance. When the distribution of the five TFA factors was compared between the two tasks, it was observed that both of them included motivation, noticing, and retention but lacked either retrieval or generation. Thus, it is possible that the three factors of motivation, noticing, and retention contributed to learners' initial vocabulary learning. Among the tasks, however, task 4, which involved productive generation by asking students to rewrite the target sentences, yielded the best retention performance. This could have been because the productive component (i.e., productive generation) of the task enhanced the quality of the link between the meaning and the form of the target words and hence increased their subsequent learning (Nation & Webb, 2011). The other three tasks did not have this productive component.

6. Conclusions and implications

Overall, the results suggest that of the two frameworks, namely the ILH and the TFA, the latter was a significantly stronger predictor of lexical gains than the former. This was, for example, evidenced by the finding that the task that scored higher on the TFA (i.e., reading plus fill-in-the-blanks) led to better during-task performance than other tasks (e.g., the multiple-choice task). Among the tasks, task 4, with its productive component, led to the best word retention in the posttest, suggesting that generation plays an important role in vocabulary acquisition. This finding also suggests that form-focusedness may be a significant factor contributing to the learning of a new word (Laufer, 2005, 2006).

Pedagogically, these results suggest that when designing vocabulary tasks, instructors could use the TFA as a helpful framework by checking task features against it and trying to use tasks that have more of the TFA features. Also, teachers should use tasks that provide more opportunities for learner engagement, and one way of doing so would be to incorporate a productive form-focused dimension into the task (note that the most effective task in this study was task 4). This productive generation can be included in tasks such as composition writing and sentence writing to see its effect on learners with different proficiency levels and how the two variables (i.e., task and proficiency level) interact with each other. Having a productive component would not only help learners better notice gaps in their knowledge (Swain, 2005), but it would also help their learning by providing opportunities for retrieval and rehearsal, which would consequently consolidate vocabulary knowledge (Keating, 2008; Laufer, 2005, 2006). Tasks with productive retrieval shown in audiovisual presentation (e.g.,


crossword puzzles) should also be strongly encouraged as they strengthen form-meaning connections (Nation & Webb, 2011).

There are a few points that should be considered when interpreting the results of this study. One is that the four tasks used in the study had close involvement load indexes and technique feature scores. Future research could use more tasks with similar and different involvement indexes and technique feature scores. Although the overall indexes of each task used in this study were based on the two frameworks and were calculated and compared according to what has been proposed in Nation and Webb (2011), the actual weight of each factor contributing to vocabulary learning in both the ILH and TFA could not be measured exactly. This is a limitation of the two frameworks, and as a result the present study was not able to evaluate them with completely accurate measures. This limitation should be borne in mind when interpreting the results.

In this study, time on task was not controlled, and this can be considered a limitation. However, time on task did not seem to have had any effect on the overall comparison in this study, as the higher involvement load tasks that demanded more time did not necessarily result in better performance. For example, participants spent more time on tasks 2 and 3 than on task 4, but task 4 still led to the highest vocabulary gain in the posttest. This suggests that other factors such as form-focusedness, retrieval, and generative production played more important roles than time on task.

Another limitation is that the four tasks examined in this study had very close involvement load indexes. This was mainly because each of the three involvement load indexes varies from 1 to 3. Future research could use more tasks with similar and different involvement loads, which might lead to a more cumulative effect and perhaps larger differences in the learning gains.
Finally, more qualitative observation and analysis of vocabulary learning activities are strongly needed to examine whether their use matches their goals as well as to offer guidelines for adapting and improving the techniques (Nation & Webb, 2011).

Acknowledgment

This study was supported by a grant from the Ministry of Science and Technology in Taiwan (formerly known as the National Science Council). The grant number is NSC101-2410-H-240-007.

Appendix 1

Vocabulary Pretest and Posttest

Please fill in the blanks with the correct Chinese translation or English synonym of the target word (請在空格內填入等同於前面單字的中文意義或是英文同義詞)

1. scandal (n.) ______________________
2. committed (adj.) ______________________
3. asset (n.) ______________________
4. debt (n.) ______________________
5. consumption (n.) ______________________
6. priorities (n.) ______________________
7. invest (v.) ______________________
8. obsessed (adj.) ______________________
9. emerging (adj.) ______________________
10. entitlement (n.) ______________________
11. pales (v.) ______________________
12. promotion (n.) ______________________
13. differentiate (v.) ______________________
14. display (v.) ______________________

References

Baddeley, A. D. (1990). Human memory: Theory and practice. London: Lawrence Erlbaum Associates.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Craik, F. M., & Lockhart, R. S. (1972). Levels of processing: a framework for memory research. Journal of Verbal Learning and Verbal Behaviour, 11, 671–684.
Craik, F. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology, 104, 268–294.
CSEPT. Retrieved Nov 25, 2014 from http://www.lttc.ntu.edu.tw/CSEPT.aspx.
Eckerth, J., & Tavakoli, P. (2012). The effects of word exposure frequency and elaboration of word processing on incidental L2 vocabulary acquisition through reading. Language Teaching Research, 16(2), 227–252.
Ellis, N. (1994). Consciousness in L2 learning: psychological perspectives on the role of conscious processes in vocabulary acquisition. AILA Review, 11(1), 37–56.
Folse, K. (2006). The effect of type of written exercise on L2 vocabulary retention. TESOL Quarterly, 40(2), 273–293.
Hu, M., & Nassaji, H. (2012). Ease of inferencing, learner inferential strategies, and their relationship with retention of word meanings inferred from context. Canadian Modern Language Review, 68, 54–77.
Hulstijn, J. H., & Laufer, B. (2001). Some empirical evidence for the involvement load hypothesis in vocabulary acquisition. Language Learning, 51(3), 539–558.


Keating, G. (2008). Task effectiveness and word learning in a second language: the involvement load hypothesis on trial. Language Teaching Research, 12(3), 365–386.
Kim, Y. J. (2008). The role of task-induced involvement and learner proficiency in L2 vocabulary acquisition. Language Learning, 58(2), 285–325.
Laufer, B. (2001). Reading, word-focused activities and incidental vocabulary acquisition in a second language. Prospect, 16(3), 44–54.
Laufer, B. (2003). Vocabulary acquisition in a second language: do learners really acquire most vocabulary by reading? Some empirical evidence. Canadian Modern Language Review, 59, 567–587.
Laufer, B. (2005). Focus on form in second language vocabulary learning. In S. H. Foster-Cohen, M. Garcia-Mayo, & J. Cenoz (Eds.), Eurosla yearbook (Vol. 5, pp. 223–250). Amsterdam: John Benjamins.
Laufer, B. (2006). Comparing focus on form and focus on forms in second-language vocabulary learning. Canadian Modern Language Review, 63, 149–166.
Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: the construct of task-induced involvement. Applied Linguistics, 22(1), 1–26.
Laufer, B., & Rozovski-Roitblat, B. (2011). Incidental vocabulary acquisition: the effects of task type, word occurrence and their combination. Language Teaching Research, 15(4), 391–411.
Lockhart, R. S., & Craik, F. I. M. (1990). Levels of processing: a retrospective commentary on a framework for memory research. Canadian Journal of Psychology, 44(1), 87–112.
Nagy, W. E., Anderson, R. C., & Herman, P. A. (1987). Learning word meanings from context during normal reading. American Educational Research Journal, 24(2), 237–270.
Nagy, W. E., & Herman, P. A. (1987). Breadth and depth of vocabulary knowledge: implications for acquisition and instruction. In M. G. McKeown, & M. E. Curtis (Eds.), The nature of vocabulary acquisition (pp. 19–35). Mahwah, NJ: Lawrence Erlbaum Associates.
Nagy, W. E., Herman, P., & Anderson, R. C. (1985). Learning words from context. Reading Research Quarterly, 20(2), 233–253.
Nassaji, H. (2003). L2 vocabulary learning from context: Strategies, knowledge sources, and their relationship with success in L2 lexical inferencing. TESOL Quarterly, 37(4), 645–670.
Nassaji, H. (2004). The relationship between depth of vocabulary knowledge and L2 learners' lexical inferencing strategy use and success. The Canadian Modern Language Review / La revue canadienne des langues vivantes, 61(1), 107–134.
Nassaji, H., & Hu, M. (2012). The relationship between task-induced involvement load and learning words from context. International Review of Applied Linguistics in Language Teaching (IRAL), 5, 69–86.
Nation, P. (2001). Learning vocabulary in another language. Cambridge, UK: Cambridge University Press.
Nation, P., & Webb, S. (2011). Researching and analyzing vocabulary. Boston: Heinle.
Peters, E. (2012). The differential effects of two vocabulary instruction methods on EFL word learning: a study into task effectiveness. International Review of Applied Linguistics in Language Teaching, 50(3), 213–238.
Peters, E., Hulstijn, J. H., Sercu, L., & Lutjeharms, M. (2009). Learning L2 German vocabulary through reading: the effect of three enhancement techniques compared. Language Learning, 59(1), 113–151.
Pulido, D. (2007). The effects of topic familiarity and passage sight vocabulary on L2 lexical inferencing and retention through reading. Applied Linguistics, 28(1), 66–86.
Pulido, D. (2009). How involved are American L2 learners of Spanish in lexical input processing tasks during reading? Studies in Second Language Acquisition, 31, 31–58.
Rott, S. (2007). The effect of frequency of input-enhancements on word learning and text comprehension. Language Learning, 57(2), 165–199.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge, UK: Cambridge University Press.
Sternberg, P. (1987). Most vocabulary is learned from context. In M. G. McKeown, & M. E. Curtis (Eds.), The nature of vocabulary acquisition (pp. 89–105). Hillsdale, NJ: Lawrence Erlbaum Associates.
Swain, M. (2005). The output hypothesis: theory and research. In E. Heinkel (Ed.), Handbook of research in second language teaching and learning (pp. 471–483). Mahwah, NJ: Lawrence Erlbaum Associates.