Adaptive feedback selection for intelligent tutoring systems

Expert Systems with Applications 38 (2011) 6146–6152

Fernando Gutierrez, John Atkinson
Department of Computer Sciences, Universidad de Concepcion, Concepcion, Chile

Keywords: Intelligent tutoring systems; Feedback strategies; Machine learning; Classification

Abstract: In this work, an adaptive method for feedback strategy selection is proposed in the context of intelligent tutoring systems. It uses a combination of machine learning methods to automatically select the best feedback strategy for students in a foreign-language learning context. Experiments show that our adaptive multi-strategy feedback model allows students to reach correct answers by reducing their errors, and demonstrate the promise of the method compared with traditional methods of feedback generation. The approach is not only capable of dynamically adapting a feedback strategy, but also of guiding the tutorial conversation so that a student's correct answer can be obtained with minimum feedback. Our results also suggest that combining SVM and CRF models is a promising way to obtain effective feedback correction in student tutoring: our multi-strategy selection approach outperformed traditional feedback strategies based on meta-linguistic rules. Experiments also showed a good correlation between the best strategy generated by our model and the decision taken by a human tutor.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, particular attention has been paid to intelligent tutoring systems (ITSs): sophisticated software systems that can provide personalized instruction to students, in some respects similar to one-on-one tutoring. Many of these systems have been shown to be very effective in procedural domains such as algebra and physics. In many experiments, ITSs induced learning gains higher than those measured in a classroom environment, but lower than those obtained with one-on-one interactions with human tutors. Successful ITSs make significant use of basic natural language processing (NLP) techniques to understand students' answers and provide an effective interaction.

An important form of student-tutor interaction is feedback. Negative feedback can be provided by the tutor in response to students' mistakes. An effective use of negative feedback can help the student correct a mistake and prevent him/her from repeating the same or a similar mistake, effectively providing a learning opportunity. While there have been advances in providing feedback strategies for ITSs in procedural domains (Song, Hahn, Tak, & Kim, 1997), this has not been the case for non-procedural contexts such as language learning and teaching. Researchers in computer-assisted language learning (CALL) stress two key issues that should be addressed when designing feedback strategies: defining the specific errors to be indicated to students and determining how extensive the

information should be about these errors. In addition, language-acquisition researchers focus on whether corrections must be implicit or explicit and, given this, whether the correction should be delivered through indications or questions. With these concerns in mind, some approaches have exploited single-strategy feedback generation in ITSs for foreign language learning based on empirical observational studies (Ferreira, Moore, & Mellish, 2007). However, the feedback provided is based on static, simple meta-rules and is strongly tied to particular tutoring scenarios; more adaptive and dynamic approaches are therefore required to provide automatic feedback to students as the tutorial dialogue interaction goes on.

Accordingly, this work proposes a new adaptive model to automatically generate feedback strategies for language learning in ITS contexts. It is strongly based on training different machine learning models on sample student-tutor interactions from real classrooms. The approach uses a mechanism that allows for the selection of multiple corrective feedback strategies within the same foreign-language session.

This paper is organized as follows: Section 2 discusses the main approaches to feedback handling and highlights some machine learning methods; in Section 3 a new approach for adaptive feedback selection is presented; Section 4 describes the main experiments and results of applying our model in tutorial classrooms; and finally, the main conclusions are drawn in Section 5.

2. Related work

[Corresponding author: J. Atkinson ([email protected]). doi:10.1016/j.eswa.2010.11.058]

Intelligent tutoring systems (ITSs) have been pursued for decades by researchers in education, psychology, and artificial intelligence. The goal of an ITS is to provide the benefits of one-on-one instruction automatically. ITSs enable students to practice their skills by carrying out tasks within highly interactive learning environments. Unlike other computer-based training technologies, ITSs assess each learner's actions within these interactive environments and develop a model of his/her knowledge, skills, and expertise. Based on the learner model, ITSs tailor instructional strategies, in terms of both content and style, and provide explanations, hints, examples, and practice problems as needed.

ITSs for foreign language (FL) learning have incorporated NLP techniques, e.g., analyzing learners' language production or modeling their knowledge of a second language, to provide learners with more flexible feedback and guidance in their learning process (Cumming, Sussex, & Cropp, 1993; Micarelli & Boylan, 1997). An outstanding example is the E-Tutor, an online intelligent language tutoring system developed for German (Heift & Schulze, 2007). It provides error-specific and individualized feedback by performing a linguistic analysis of student input and adjusting feedback messages to learner expertise. The experience with E-Tutor supports the need for a CALL system that addresses multiple errors by considering language teaching pedagogy. Another successful ITS, developed for Japanese (Nagata, 1996), employs NLP technology to enable learners to freely produce sentences and to provide detailed feedback concerning the specific nature of the learner's errors. The tutor incorporates several lessons covering the grammatical constructions encountered in a standard undergraduate curriculum. In addition, the system allows the learner to produce any sentence because it can identify parts of speech and syntactic patterns in the learner's sentence on the basis of the general principles of grammar.
Based on these grammatical principles, the system determines whether the sentence is grammatical and generates intelligent feedback targeted to specific deficiencies in the learner's performance.

It is thus clear that errors and corrective feedback constitute a natural part of the teaching-learning process in FL teaching. Errors can be defined as deviations from the norms of the target language; they reveal the patterns of learners' developing inter-language systems, showing where they have over-generalized a language rule or inappropriately transferred a first-language rule to the foreign language. Corrective feedback is an indication to a learner that his/her use of the target language is incorrect, and includes the variety of responses that a language learner receives. Corrective feedback can be explicit (e.g., "No, you should say goes, not go.") or implicit (e.g., "Yes, he goes to school every day."), and may or may not include meta-linguistic information (e.g., "Do not forget to make the verb agree with the subject."). In general, several types of corrective feedback strategies have been identified and divided into two groups of meta-strategies (Ferreira et al., 2007):

1. Giving-Answer Strategies (GAS): the teacher directly gives the target form corresponding to the error in a student's answer. These include the following specific strategies:
   (a) Repetition: repetition of the error or of the portion of the learner's phrase containing the error.
   (b) Recast: reformulation of all or part of the student's answer, providing the target form.
   (c) Explicit correction: the teacher provides the correct target form. This differs from recast because the teacher directly corrects the error without rephrasing or reformulating the student's answer.
   (d) Give answer: used when the student does not know or is unsure of the answer.

2. Prompting-Answer Strategies (PAS): the teacher pushes students to notice a language error in their response and to repair it for themselves. These include the following specific strategies:


(a) Meta-linguistic cues: the teacher provides information or asks questions regarding the correctness of the student's utterance, without explicitly providing the target form.
(b) Clarification requests: questions intended to indicate to the student that his/her answer has been misunderstood due to a student error.
(c) Elicitation: the teacher encourages the student to give the correct form, e.g., by pausing to allow the student to complete the teacher's utterance or by asking the student to reformulate the utterance.

Different studies on second-language acquisition suggest that feedback should be handled as in reality, precise and concise; otherwise, extensive feedback is usually not read and becomes useless (Sams, 1995; Van der Linden, 1993). Some approaches showed that almost 40% of analyzed students' answers had more than one error (Heift, 2001). However, foreign language teachers usually focus only on the relevant errors, discussing them in depth. As a consequence, feedback generation by the tutor is carried out by selecting the best strategy based on the student's performance (Heift & Schulze, 2007). Some studies indicate that error-specific feedback (based on meta-linguistic rules) is more effective than traditional feedback in CALL, in which the feedback only establishes whether the answer is incomplete or unexpected (Nagata, 1996). Nevertheless, this kind of feedback is not typically used by foreign language teachers in classrooms. There, the question-answer-feedback cycle arises from the dialogue interaction between two agents: the teacher (tutor) and the student. For intelligent CALL (iCALL) systems, the teacher is replaced with the system, hence conceiving effective conversational dialogue models becomes a key issue. The aim of this teaching-learning scenario is to generate suitable corrective feedback so that students become aware of their errors.
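The two meta-strategy groups map naturally onto a small lookup structure. The sketch below is illustrative only (the paper does not prescribe an encoding); the strategy names follow Ferreira et al. (2007):

```python
# The GAS/PAS taxonomy from Ferreira et al. (2007); the dict encoding itself
# is an illustrative assumption, not the paper's implementation.
FEEDBACK_STRATEGIES = {
    "GAS": ["repetition", "recast", "explicit_correction", "give_answer"],
    "PAS": ["metalinguistic_cues", "clarification_request", "elicitation"],
}

def meta_strategy_of(strategy: str) -> str:
    """Return the meta-strategy (GAS or PAS) a specific strategy belongs to."""
    for meta, strategies in FEEDBACK_STRATEGIES.items():
        if strategy in strategies:
            return meta
    raise ValueError(f"unknown strategy: {strategy}")

print(meta_strategy_of("recast"))  # → GAS
```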
For this, a dialogue model should answer questions such as: what should the system say so as to obtain a student's correct answer? In this kind of dialogue, conversation is carried out by communicating utterances, each containing an explicit/implicit structure of meaning. In order to uncover this underlying structure, several predictive and classification methods have been used, such as hidden Markov models (HMM) and conditional random fields (CRF). On the other hand, detecting the meaning of a dialogue utterance by itself has been seen as a classification over candidate senses, hence supervised machine learning methods have been applied, including support vector machines (SVM), maximum entropy, and Gaussian mixture models (GMM) (Abe, 2005; Campbell, Campbell, Reynolds, Singer, & Torres-Carrasquillo, 2006). Once individual senses for the utterances have been detected, the underlying dialogue relationships are looked for (Blaylock & Allen, 2006).

Unlike state-of-the-art dialogue models, the task of dialogue planning and recognition for ITSs is naturally restricted to a simple kind of tutor-student interaction in which the dialogue's objective is clearly defined (i.e., learn a language). In addition, the interaction is mainly based on questions and answers in which dialogue turns are explicitly carried out once the question is made (i.e., question-answer adjacent pairs).

2.1. Machine learning methods

A hidden Markov model (HMM) is a Markov model in which the states cannot be observed, but the symbols consumed or produced by transitions are observable. Each state of an HMM is associated with a probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution, in which


the observations must be independent. Only the outcome, not the state, is visible to an external observer, and therefore the states are hidden from the outside. HMMs have successfully been applied in several NLP tasks, including named-entity recognition, information extraction, part-of-speech tagging, etc. (Blaylock & Allen, 2006; Jurafsky & Martin, 2008; McCallum & Li, 2003). However, several features observed in linguistic data (i.e., dialogue moves) cannot usually be assumed independent (Lafferty, McCallum, & Pereira, 2001), hence more powerful representations and inference mechanisms are required. A recent approach that aims to deal with this issue uses the notion of conditional random fields (CRFs). These model, in a compact form, the conditional probability of a sequence of labels given a sequence of observations. While HMMs require observations to be independent, CRFs can incorporate complex features of the observations without violating the independence assumption while ensuring tractable inference. Specifically, CRFs are a probabilistic framework for labeling and segmenting structured data such as sequences, trees, and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. Additionally, CRFs avoid the label-bias problem, a weakness exhibited by maximum-entropy Markov models and other conditional Markov models based on directed graphical models. CRFs outperform HMMs on a number of real-world tasks in many fields, including opinion mining (Choi, Cardie, Riloff, & Patwardhan, 2005), morphology (Kudo, Yamamoto, & Matsumoto, 2004; McCallum & Li, 2003), and semantic role labeling (Yang, Lin, & Chen, 2007). Despite their advantages, these methods do not cope naturally with multi-dimensional classification problems.
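Decoding the most probable hidden-state sequence of an HMM reduces to the Viterbi algorithm. A minimal pure-Python sketch follows; the hidden states (tutor feedback strategies), observations (student answer types), and all probabilities are made-up toy values, not the paper's trained model:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for an observation sequence
    (log-space to avoid numerical underflow)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][o]), p)
                for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Toy model: tutor actions are hidden, answer types are observed.
states = ["elicitation", "accept"]
start_p = {"elicitation": 0.8, "accept": 0.2}
trans_p = {"elicitation": {"elicitation": 0.6, "accept": 0.4},
           "accept": {"elicitation": 0.3, "accept": 0.7}}
emit_p = {"elicitation": {"grammar": 0.7, "correct": 0.3},
          "accept": {"grammar": 0.1, "correct": 0.9}}
print(viterbi(["grammar", "grammar", "correct"], states, start_p, trans_p, emit_p))
# → ['elicitation', 'elicitation', 'accept']
```

Here two grammar errors followed by a correct answer decode to two elicitations followed by an accept, mirroring the question-structure sequences described above.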
A kind of technique that has successfully been applied in combination with the previous methods in many NLP tasks is the support vector machine (SVM), a machine learning method used for classification and regression. A support vector machine constructs a hyperplane, or set of hyperplanes, in a high-dimensional space, which can then be used for classification. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier (Abe, 2005). An SVM is characterized by the use of kernels to construct maximum-margin hyperplanes, which appear to be among the most significant features of machine learning methods. Applications have recently been reported in text categorization, topic detection, speech-act tagging (Liu, Wang, & Xiao, 2007), and summarization (Hoefel & Elkan, 2008; Yang et al., 2007).
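The maximum-margin idea can be demonstrated in a few lines. The sketch below uses scikit-learn's SVC on toy data; the library choice and the data are assumptions for illustration, not part of the paper:

```python
# Minimal maximum-margin classification sketch (scikit-learn assumed).
from sklearn.svm import SVC

# Toy 2-D points from two linearly separable classes.
X = [[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")  # fits the maximum-margin separating hyperplane
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # → [0 1]
```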

3. An adaptive model for feedback multi-strategy selection

In this work, a new approach for adaptive feedback multi-strategy selection is proposed as part of an ITS. The overall structure of the three-stage model, comprising the error detection component and the feedback generator based on Criswell, Byrnes, and Pfister (1991) and Heift and Schulze (2007), can be seen in Fig. 1. First, the error detection component identifies the type of the student's error when answering the system's question. Based on this error, the feedback strategy selection determines the best meta-strategy (GAS or PAS) and then the specific strategy to be passed on to the feedback generation component. This strategy selection mechanism is responsible for determining the feedback contents to be provided to the student based on previous experience and the current state of the dialogue. The specific strategy is then sent to the feedback generator module so that a correction can be created for the student's error. Note that the strategy is automatically selected according to the decisions of the machine learning methods used for this purpose. In addition, multiple strategies are generated within one tutor-student session, depending on each student's error, so that the system can adapt to different FL tutorial scenarios.

3.1. Feedback strategy selection

Recent approaches to feedback generation use simple and static decision rules in which the type of detected error triggers a particular strategy (Ferreira et al., 2007). While this kind of approach is easy to implement, its rigid structure does not enable multiple feedback interactions between the system (tutor) and the student. In addition, the decision rules must be known before implementing the system; as a consequence, no unexpected feedback interaction is allowed. A more feasible approach involves generating feedback classifiers by automatically learning from previous dialogue interactions using probabilistic methods. These can also adapt to new tutoring scenarios and new student behavior. Thus, the feedback becomes a natural consequence of the teacher-student interaction, which allows the learning cycle to advance.

Each set of interactions is triggered by the system (the teacher), which asks the questions, producing a question structure as seen in Table 1. A student's error is supposed to receive only one feedback (Garrett, 1987), hence question structures are represented as sequences of answer-feedback pairs (e.g., (verbal error, elicitation), (correct, rephrasing)). For our approach, this sequence is modeled using an HMM in which:

 Hidden states represent the system's actions.
 Observable states represent the student's answers.

Thus, feedback strategies can be predicted from the student's errors based on previous observations. In order to restrict the scope of error handling, the types of errors taken as input for strategy selection include lexical, grammar, and pronunciation errors. Based on previous empirical and observational studies (Ferreira et al., 2007), the two meta-strategies (PAS and GAS) are handled in a first stage of the classification, and the specific strategies are dealt with in a second stage (i.e., Metalinguistic cues, Clarification, and

Fig. 1. The model for adaptive feedback strategy selection.
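The three-stage flow in Fig. 1 can be sketched as a pipeline of stubs. All function names and the toy rules inside each stub are illustrative assumptions; the real system replaces the middle stub with the SVM/CRF selection described below:

```python
def detect_error(answer: str) -> str:
    """Stage 1: classify the student's answer (hypothetical toy rule)."""
    return "correct" if answer.endswith(".") else "grammar"

def select_strategy(error: str, history: list) -> str:
    """Stage 2: pick a feedback strategy from the error and dialogue history.
    The paper's model uses SVM + CRF here; a fixed mapping stands in."""
    return "accept" if error == "correct" else "elicitation"

def generate_feedback(strategy: str, answer: str) -> str:
    """Stage 3: realize the selected strategy as a tutor utterance (stub)."""
    return f"[{strategy}] {answer}"

def tutor_turn(answer: str, history: list) -> str:
    """One question-answer-feedback cycle through the three components."""
    error = detect_error(answer)
    strategy = select_strategy(error, history)
    history.append((error, strategy))
    return generate_feedback(strategy, answer)

history = []
print(tutor_turn("they have changed", history))  # → [elicitation] they have changed
```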

Table 1
Sample of question structure pairs.

Answer           Feedback
QUESTION NO. 1
LEXICAL          ELICITATION
GRAMMAR          CLARIFICATION
PRONUNCIATION    CLARIFICATION
CORRECT          ACCEPT
QUESTION NO. 2
GRAMMAR          ELICITATION
PRONUNCIATION    REPETITION
CORRECT          ACCEPT

Elicitation). In addition, either a correct answer or an accept feedback indicates the end of a question structure. An example of a typical student-tutor interaction using these strategies is as follows:

<Student> (incomplete): they have changed...
<Tutor> (elicitation): Sandra?
<Student> (incomplete): they have changed...
<Tutor> (elicitation): they have changed...
<Student> (correct): they have changed over the years.
<Tutor> (accept): they have changed over the years, right?

Formally, the strategy selection component (SE) receives the error type (error_t) from the error detection component and computes, using the HMM, the most probable strategy for this error (S_t): S_t = SE(error_t). In general, the task of strategy selection is divided into two stages: one predicts a general meta-strategy (i.e., GAS or PAS) and the other identifies the specific feedback based on the previous error and this meta-strategy (see Fig. 2).

1. Identifying a meta-strategy: in order to guide the selection process for the strategies, an intermediate step is included which leads to what is called a meta-strategy (MS). This corresponds to one of the two groups of strategies presented earlier (Ferreira et al., 2007): GAS and PAS. The selection of this meta-strategy is done through an SVM with a polynomial kernel (Hoefel & Elkan, 2008; Liu et al., 2007; Yang et al., 2007), which labels and classifies the linguistic structures. The input received by the SVM corresponds to the vector I_t, defined as:

    I_t = [error_t, error_{t-1}, MS_{t-1}]^T    (1)

where error_t is the last error made by the student, error_{t-1} is the immediately preceding error, and MS_{t-1} is the last meta-strategy selected by the system. The output of the SVM corresponds to the meta-strategy MS_t, which can be either GAS (G) or PAS (P).

2. Determining the specific feedback strategy: in this stage the feedback strategy is obtained by using CRF predictors with the vector M_t as input. This contains both the meta-strategy and the error for times t and t-1. The information is received by the CRF in the form

    M_t = [error_t, MS_t, error_{t-1}, MS_{t-1}]^T    (2)

where error_t and MS_t are the error made by the student and the meta-strategy at instant t, respectively, while error_{t-1} and MS_{t-1} are the corresponding values for instant t-1. As output, the CRF produces the feedback strategy S_t, which can be one of those proposed by Ferreira et al. (2007). This output S_t of the CRF, which is the output of the strategy selection system itself, also constitutes the specific feedback for the error error_t made by the student.

Note that while the specific strategy S_t is strongly tied to a meta-strategy MS_t (Ferreira et al., 2007), this does not imply that whenever MS_t = G then S_t is a strategy of the GAS type. This is mainly due to the nature of the training data and the fact that the CRF prediction method uses the whole sequence of observations, not only that of the current time t. Hence, for some scenarios the final strategy might not belong to the selected meta-strategy (S_t ∉ MS_t). Furthermore, since changing from one meta-strategy to another can be quite usual, it can be the case that in the first stage of SE the most important feature of the model at time t for meta-strategy MS_t is error_t; hence, for the SVM, the features MS_{t-1} and error_{t-1} are complementary. In the second stage, however, the sequence of observations allows the method to determine S_t. This, in turn, provides major flexibility to the strategy selection mechanism.

4. Experiments and results

In order to assess the effectiveness of our model, a prototype was built with which different settings and FL classroom experiments were tested. Results of applying our approach to automatically generate feedback strategies were compared against traditional feedback strategies based on meta-linguistic rules used in iCALL.
The experiments were organized into two activities: one for parameter setting of our multi-strategy selection method, and the other for assessing the learning effectiveness of the whole approach and comparing it with other meta-linguistic-based mechanisms. For training, two datasets were used:

Fig. 2. Task of multi-strategy feedback selection.


 Transcriptions of 10 Spanish lectures for English native speakers.
 Transcriptions of 10 English lectures for Spanish native speakers.

These were annotated with the question-answer adjacent structures present in the conversations, and then filtered and formatted, removing all information not directly related to answers, feedback, etc. Each answer was correlated with one of the defined feedback strategies, producing 1721 answer-feedback pairs and 482 question structures, each containing between 1 and 17 answer-feedback pairs. Of these, 80% were used for training the predictors and classifiers, and 20% were used for testing purposes. The overall accuracy of the methods was evaluated using a typical m-fold cross-validation method with m = 5, as the sample data was not very large.

4.1. Setting up the methods

Two kinds of methods were used to evaluate the accuracy of the strategy selection: binary classifiers (SVM) to predict the meta-strategy, and structural probabilistic methods (HMM and CRF) to classify the specific feedback strategies. The methods were first evaluated individually by receiving the students' answers and predicting the correct strategy, as compared with the testing data. This first experiment (Table 2(a)) showed that the individual accuracy of the methods was not very high, so we then assessed the precision of combining different methods, with the SVM first predicting the meta-strategy. The resulting accuracy of these combinations can be seen in Table 2(b). The combination SVM/CRF achieved the best results (Hoefel & Elkan, 2008; Liu et al., 2007; Yang et al., 2007), although it suffered from some ambiguity in the annotation of the transcriptions, such as the corrective feedback. This best configuration can be used in real tutoring experiments in order to predict the feedback strategies (t-test = 2.64, p < 0.01).
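The first-stage classification and its m-fold evaluation can be sketched with scikit-learn. The library choice, the synthetic I_t-style feature vectors, and the labeling rule below are all illustrative assumptions, not the study's data:

```python
import random
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def one_hot(i, n):
    return [1 if j == i else 0 for j in range(n)]

ERRORS = {"lexical": 0, "grammar": 1, "pronunciation": 2, "correct": 3}
META = {"G": 0, "P": 1}

# Synthetic I_t = (error_t, error_{t-1}, MS_{t-1}) vectors, labeled with a
# made-up rule (grammar errors -> PAS, otherwise GAS) purely for illustration.
random.seed(0)
X, y = [], []
for _ in range(200):
    e_t = random.randrange(3)      # current error type
    e_prev = random.randrange(4)   # previous answer type
    ms_prev = random.randrange(2)  # previous meta-strategy
    X.append(one_hot(e_t, 4) + one_hot(e_prev, 4) + [ms_prev])
    y.append(META["P"] if e_t == ERRORS["grammar"] else META["G"])

clf = SVC(kernel="poly", degree=2, coef0=1)  # the paper reports a polynomial kernel
scores = cross_val_score(clf, X, y, cv=5)    # m-fold cross-validation, m = 5
print(scores.mean())
```

With this deterministic toy rule, the classifier recovers the mapping almost perfectly; on the real annotated data the SVM reached the 75.2% reported in Table 2(a).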

Table 2
Accuracy of (a) individual and (b) combined machine learning methods.

Method             Accuracy
(a)
CRF                71.8%
CRF (linear)       68.5%
HMM                67.8%
SVM                75.2%
(b)
SVM/CRF            79.4%
SVM/CRF (linear)   78.9%
SVM/HMM            75.6%

4.2. Assessing the effectiveness of adaptive strategy selection

In order to assess the effectiveness of our approach, two real lectures were used, both with the same teacher. One classroom was used as the control group and the other as the test group, which used our adaptive method for strategy selection. The experiments aimed to evaluate the extent to which our strategy selection approach outperforms traditional methods in terms of reducing students' errors when learning in ITS contexts (Criswell et al., 1991; Heift & Schulze, 2007). Accordingly, the following activities were carried out:

1. The teacher asks a question and one of the students gives an answer. If the answer is correct, the teacher asks another question.
2. If the student's answer is not correct, the teacher classifies it according to one of the four groups of possible incorrect answers delivered by our system. The system then provides the teacher with a specific strategy, which is interpreted by the teacher and given to the students.
3. If the new answer is correct, the teacher goes on to another question. Otherwise, the teacher repeats step 2 until the student provides the correct answer.

For the test group, we compared the feedback strategy generated by our model against the best strategy selected by the teacher for a given student answer. For the control group, the same procedure was followed; however, the teacher only considered the meta-linguistic strategy, as this is what is usually applied in iCALL tasks. Note that this kind of strategy is far more difficult to apply in classrooms than in iCALL systems. The student's answer and the strategy given to the teacher (who interpreted it before feeding it back to the student) were kept as answer-feedback pairs. The 90-minute lecture was stored in the form of 146 answer-feedback pairs, which led to 46 question structures for the test group, whereas for the control group 101 answer-feedback pairs, which led to 21 question structures, were extracted, as seen in Table 3.

Table 3
Data summary for experiments.

                        Control group   Test group
Duration (min)          80              90
Question structures     21              46
Answer-feedback pairs   101             146
No. of students         21              18

The real experiments aimed to establish whether adaptive feedback multi-strategy selection is more effective than single-strategy approaches. Effectiveness is seen as a reduction in the number of student errors before a correct answer is obtained. This number will be referred to as the Length of the Question Structure (LQS).

Fig. 3. Frequency of LQS for different target groups.
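The kind of normalized LQS distribution plotted in Fig. 3, together with the group averages discussed below, can be derived from raw LQS counts. A sketch over hypothetical data (the lists are illustrative, not the study's):

```python
from collections import Counter

def lqs_stats(lqs_values):
    """Mean LQS and normalized LQS frequency distribution: each question
    structure contributes its number of corrective feedbacks."""
    n = len(lqs_values)
    dist = {k: v / n for k, v in Counter(lqs_values).items()}
    return sum(lqs_values) / n, dist

# Hypothetical LQS samples for two groups (illustrative only).
control = [1, 2, 3, 3, 4, 5]
test = [1, 1, 1, 2, 2, 3]
print(lqs_stats(control)[0], lqs_stats(test)[0])  # → 3.0 1.6666666666666667
```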

Table 4
Summary of obtained LQS for the different groups.

                              Control   Test
Average errors                3.2       2.2
Maximum LQS                   7         5
Frequency in maximum LQS      18%       6%
Frequency in minimum LQS      24%       37%

Fig. 4. Correlation between the model and the FL teacher.
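The rank correlation behind Fig. 4 can be computed with SciPy's spearmanr; the tooling choice and the encoded strategy sequences below are illustrative assumptions, not the study's data:

```python
from scipy.stats import spearmanr

# Strategy choices per answer, encoded as integer IDs (illustrative data).
model_choice = [1, 2, 2, 3, 1, 4, 3, 2]
teacher_choice = [1, 2, 3, 3, 1, 4, 2, 2]

rho, pvalue = spearmanr(model_choice, teacher_choice)
print(round(rho, 2))  # → 0.84
```

spearmanr handles tied values by averaging ranks, which matters here since the same strategy code recurs across answers.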

For each question, the model is more effective for smaller values of LQS. It should be clear that each question structure ends with the pair correct answer-positive feedback. Thus, the LQS equals the number of corrective feedbacks provided in each question structure. Note, however, that one-to-one tutorial interactions differ from the several-students-one-teacher setting used here. Accordingly, two observations are highlighted:

1. Several (correct and incorrect) answers are possible within the same level of a question structure, so that one question might give rise to a series of question structures. All such question structures are considered independently.
2. Each question structure ends with the correct answer, so that the number of correct answers per question is 1, whereas the number of corrective feedbacks is equal to or larger than 1.

All the relevant information is contained in the LQS distributions for both groups, shown in Fig. 3. In order to compare groups with different numbers of answer-feedback pairs, these distributions are normalized to unity. In addition, question structures which do not contain a corrective feedback (or which were cut short without reaching a correct answer) have been removed. From the graphics, the following results can be observed:

1. For the control group:
 The range of LQS (between 1 and 7 errors per question) is larger than that for the test group (i.e., 1-5).
 There is a significant number of cases (18%) with 7 errors, whereas for the test group there are 5 corrective feedbacks in only 7% of the cases.
2. For the test group:
 The first corrective feedback (after the student's first error) leads to the correct answer in 37% of the cases; for the control group, it leads to the correct answer in only 24% of the cases.
 In 36% of the cases, more than two corrective feedbacks were required before the correct answer was obtained, whereas for the control group this happened in 52% of the cases.

This suggests that our multi-strategy selection method might be more effective than the single-strategy approach. In order to confirm this hypothesis globally, the average value of the LQS was computed for both target groups as ⟨LQS⟩ = (Σ_{i=1}^{n} LQS_i)/n, where LQS_i is the Length of the Question Structure for question i, and n is the number of question structures in each group (46 for
This suggests that our multi-strategy strategy selection method might be more effective than the single-strategy approach. In order to globally confirm this hypothesis, the average value of the LQS Pn ðLQSi Þ was computed for both target groups as: hLQSi ¼ i¼1n , where (LQSi) is the Length of the Question Structure for the question i, and n is the number of question structures for each group (46 for

test group and 21 for control group). A simple calculation yields ⟨LQS⟩_test = 2.2 and ⟨LQS⟩_control = 3.2, with χ² = 12.27 and p < 0.01, which confirms the greater effectiveness of the proposed method over the single-strategy one. Further details of errors and LQS can be seen in Table 4.

In order to determine the extent to which the model's selection correlates with the decisions made by the human tutor (an English teacher), a correlation analysis was carried out, as shown in the graphics of Fig. 4. These show that the teacher and the model are well correlated ((Spearman) r = 0.73), suggesting that our model is a good predictor of the feedback strategies to be delivered in ITSs for foreign language.

5. Conclusions

This work proposed a new approach for adaptive feedback multi-strategy selection in ITSs for foreign language, based on a combination of machine learning methods. These learn question-structure pairs from historical annotated transcriptions of classroom interactions so as to predict the best feedback strategy to be provided to students. Different experiments show the promise of the approach compared with traditional methods of feedback generation. The approach is not only capable of dynamically adapting the feedback strategy to the tutorial dialogue state, but also of guiding the tutorial conversation so that the student's correct answer can be obtained with minimum feedback.

While individual machine learning methods did not show very high accuracy, our results suggest that a hybrid architecture combining SVM and CRF models is more promising. By analyzing metrics such as the Length of the Question Structure (LQS), the experiments assessed the number of feedbacks given to students before a correct answer was obtained. These showed that our multi-strategy selection approach outperformed the typical feedback strategies based on meta-linguistic rules (Ferreira et al., 2007).
A good correlation was also found between the best strategy generated by our model and the decisions taken by the teacher. However, it is clear that further tuning experiments are required in order to find the best learning parameters for the classification methods, as these are still dependent on the size of the annotated dataset (transcripts and question-structure pairs) and its quality (ambiguities in annotations should be dealt with).

Acknowledgement

This research is partially sponsored by the Universidad de Concepcion, Chile, under grant number DIUC No. 210.093.015-1.0:


"Shallow Adaptive Planning for Intelligent Web-based Natural-Language Dialogue".

References

Abe, S. (2005). Support vector machines for pattern classification. Springer.
Blaylock, N., & Allen, J. (2006). Fast hierarchical goal schema recognition. In AAAI.
Campbell, W., Campbell, J., Reynolds, D., Singer, E., & Torres-Carrasquillo, P. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, 20(2–3), 210–229. Odyssey 2004: The speaker and language recognition workshop.
Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In HLT '05: Proceedings of the conference on human language technology and empirical methods in natural language processing (pp. 355–362). Morristown, NJ, USA: Association for Computational Linguistics. doi:10.3115/1220575.1220620.
Criswell, E., Byrnes, H., & Pfister, G. (1991). Intelligent automated strategies of teaching foreign language in context. In Intelligent tutoring systems for foreign language learning: The bridge to international communication.
Cumming, G., Sussex, R., & Cropp, S. (1993). Learning English as a second language: Towards the "mayday" intelligent educational system. Computers & Education, 20(1), 119–126.
Ferreira, A., Moore, J., & Mellish, C. (2007). A study of feedback strategies in foreign language classrooms and tutorials with implications for intelligent computer-assisted language learning systems. International Journal of Artificial Intelligence and Education, 17, 389–422.
Garrett, N. (1987). A psycholinguistic perspective on grammar and CALL. In W. F. Smith (Ed.), Modern media in foreign language education: Theory and implementation (pp. 169–196). Lincolnwood, IL: National Textbook.
Heift, T. (2001). Error-specific and individualised feedback in a Web-based language tutoring system: Do they read it? ReCALL, 13(1), 99–109. ISSN 09583440. doi:10.1017/S095834400100091X.
Heift, T., & Schulze, M. (2007). Errors and intelligence in computer-assisted language learning: Parsers and pedagogues. Series: Routledge studies in computer-assisted language learning. New York: Routledge.
Hoefel, G., & Elkan, C. (2008). Learning a two-stage SVM/CRF sequence classifier. In CIKM '08. Napa Valley, California, USA.
Jurafsky, D., & Martin, J. (2008). Speech and language processing (2nd ed.). Prentice Hall.
Kudo, T., Yamamoto, K., & Matsumoto, Y. (2004). Applying conditional random fields to Japanese morphological analysis. In EMNLP.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th international conference on machine learning, Williamstown, MA.
Liu, J., Wang, Z., & Xiao, X. (2007). A hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital speech recognition. Pattern Recognition Letters, 28(8), 912–920. ISSN 0167-8655. doi:10.1016/j.patrec.2006.12.007.
McCallum, A., & Li, W. (2003). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Seventh conference on natural language learning (CoNLL).
Micarelli, A., & Boylan, P. (1997). Conversation rebuilding: From the foreign language classroom to implementation in an intelligent tutoring system. Computers & Education, 29(4), 163–180.
Nagata, N. (1996). Computer vs. workbook instruction in second language acquisition. CALICO, 14, 53–75.
Sams, M. (1995). Advanced technologies for language learning: The BRIDGE project within the ARI language tutor program. In M. Holland, J. Kaplan, & M. Sams (Eds.), Intelligent language tutors: Theory shaping technology (pp. 7–21). Hillsdale, NJ: Lawrence Erlbaum Associates.
Song, J., Hahn, S., Tak, K., & Kim, J. (1997). An intelligent tutoring system for introductory C language course. Computers & Education, 28(2), 93–102.
Van der Linden, E. (1993). Does feedback enhance computer-assisted language learning? Computers & Education, 21(1–2), 61–65. ISSN 0360-1315. doi:10.1016/0360-1315(93)90048-N.
Yang, C., Lin, K. H.-Y., & Chen, H.-H. (2007). Emotion classification using Web blog corpora. In WI '07: Proceedings of the IEEE/WIC/ACM international conference on Web intelligence (pp. 275–278). Washington, DC, USA: IEEE Computer Society. ISBN 0-7695-3026-5. doi:10.1109/WI.2007.50.