Interactive and dynamic review course composition system utilizing contextual semantic expansion and discrete particle swarm optimization


Expert Systems with Applications 36 (2009) 9663–9673



Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Tzone I. Wang, Kun Hua Tsai *

Laboratory of Intelligent Network Applications, Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

Article info

Keywords: Course composition; e-Learning; Contextual semantic expansion; Discrete particle swarm optimization

Abstract

In the learning cycle, learning new knowledge and reviewing known knowledge are two important processes. Currently, most e-Learning systems are devoted to promoting learners' efficiency in learning new knowledge, and only a few address the review of known knowledge. Hence, this paper proposes a review course composition system that adopts discrete particle swarm optimization to quickly pick suitable materials and that can be customized in accordance with the learner's intention. Furthermore, a greedy-like materials sequencing approach is proposed to smooth the reading order of the course. As a result, the composition system satisfies the majority of learners with customized review courses based on their needs.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Numerous tutoring systems have been developed in the fields of e-Learning and distance learning. Their ultimate objective is a mature tutoring system that not only contains multiple, standardized teaching materials but also promotes learning efficiency for different kinds of learners. In learning any course, however, two processes deserve equal attention: the learning of new concepts and the review of known concepts. In the related previous research, most e-Learning systems have concentrated on promoting the learning efficiency of new concepts, decomposing a course into learning units arranged in a linear order or in a graph. The systems of Chen and Duh (2008), Huang, Huang, and Chen (2007), Wang, Wang, and Huang (2008) and Yang and Wu (2009) all adopt a linear learning approach associated with the mastery theory to tutor learners. After a learner finishes a unit, a relevant exam evaluates whether he/she is qualified to go on to the next unit; otherwise he/she has to stay in the current unit until the exam result meets the requirement. Huey-Ing and Min-Num (2005) utilize dynamic multiple paths, by which each learner is directed to different units. In their system, a course is considered as a directed graph in which each node represents a learning unit and each directed edge between two nodes represents the difficulty degree. After learning a unit, if the exam score is greater than the passing threshold between the current unit and the next target unit, the learner can go to the next target unit directly; otherwise, the learner has to take another path to the target unit. In other words, the learner needs to learn additional units in order to move to the next target unit. Similarly, the systems by Huang, Huang, and Cheng (2008) and Wang, Huang, Jeng, and Wang (2008) utilize the above learning strategies and combine them with auxiliary materials such as blogs to assist the learner. If the exam result is not qualified, these systems recommend auxiliary materials associated with the current subjects instead of the original teaching materials. The experiments reported for the above systems indicate that they did promote the learning efficiency of the learners.

Unfortunately, the above systems mostly focus on the learning of new concepts in a course, and few of them are concerned with retaining the memory of known concepts. According to previous research (Chen & Chung, 2008; Waugh & Norman, 1965), a learner easily forgets what he/she has studied after a period of time. In other words, a learner's memory retention decreases gradually with time. To overcome this problem, a review approach is essential and practical. For this reason, this paper proposes an Interactive and Dynamic Review Course Composition System, which automatically composes and plans review materials based on a learner's intention with discrete particle swarm optimization. First, the system adopts the notion of contextual semantic expansion to expand the learner's intention through the Concept Semantic Map (CSM). Next, it utilizes discrete particle swarm optimization (DPSO) to decide which materials should be selected and composed. In addition, it employs a greedy-like approach to plan a suitable reading order for the materials in the customized review course.

The rest of this paper is structured as follows. In Section 2, related works, including the introduction of CSM and the original version of particle swarm optimization (PSO), are presented. Section 3 presents the framework of the proposed system and illustrates the process of composing a customized review course. The discussions and evaluations of the proposed system are given in Section 4. Finally, a brief conclusion is given in Section 5.

* Corresponding author. E-mail addresses: [email protected] (T.I. Wang), [email protected] (K.H. Tsai).
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.12.010

2. Related works

To specify a customized review course that contains suitable materials in accordance with a learner's desire, this paper adopts two technologies. The first is the Concept Semantic Map (CSM), which guides the system to find out which concepts should be recommended to a learner for review. The second is a combinatorial optimization algorithm, PSO; its original version is introduced in this section, and the modified version is described in Section 3.3.

2.1. Concept semantic map

Comprehending a learner's intention is very important if the system is to produce a precisely customized review course for the learner. For this reason, this paper proposes the CSM, which clearly describes which concepts exist in the course domain and also indicates the contextual relations among the concepts. Guided by the CSM, the system can easily find out which concepts are close to the learner's intention. CSM construction approaches are generally divided into manual and semi-automatic ones. The former takes much more expert time than the latter, although it can provide more precise information. This paper adopts a semi-automatic construction approach based on Formal Concept Analysis (FCA) (Formica, 2008; Hashemi, De Agostino, Westgeest, & Talburt, 2004; Huaiguo, 2006) to construct the CSM in a specific course domain. The proposed approach constructs the CSM quickly and easily from information predefined by experts and the modified FCA (MFCA) algorithm (Weng, Tsai, Liu, & Hsu, 2006; Yaohua & Yiyu, 2005).

The CSM contains two elements: the concept, which represents a notion that a learner needs to study or review, and the relation, which describes a relationship between two concepts. A relation is one of two distinct relationships, the hierarchical relationship and the interactive relationship, and each carries a relation value that indicates the intensity between the two concepts. Besides, each relationship is asymmetric; namely, the relation value of R(C1, C2) is not equal to that of R(C2, C1), where C1 and C2 represent two different concepts. The definitions of the relationships are as follows.

Definition 1 (Hierarchical relationship). In a course, a general concept may cover several specialized concepts. In CSM, hierarchical relationships exist between a general concept and its specialized concepts. The general concept can be considered as a parent concept, while a specialized concept represents a child concept (or subconcept).

Definition 2 (Interactive relationship). Different concepts in a course may have no hierarchical relationship, yet an interactive relationship may exist among them. The interactive relationship between two concepts is estimated according to the frequency at which the concepts appear in the same sentence.

For the above hierarchical and interactive relationships, the strength values are calculated by formula (1):

$$RV(c_i, c_j) = \frac{\text{the number of sentences which contain } c_i \text{ and } c_j}{\text{the number of sentences which contain } c_i} \tag{1}$$

where RV(ci, cj) represents the degree of a hierarchical relationship or an interactive relationship. For a hierarchical relationship, RV(ci, cj) is defined as the probability that cj is the child of ci in the hierarchy, while for an interactive relationship, RV(ci, cj) represents the probability that cj appears in a sentence which contains ci. Based on the MFCA and the two defined relationships, the CSM is constructed in three steps, detailed as follows:

Step 1: Determine relevant concepts. For the CSM construction, the first task is to collect the complete materials of a course. For example, the case studied in this paper is JAVA, so the documents are sampled from the Java Tutorial provided by Sun (Sun). According to the opinions of experts, the relevant concepts are chosen and put into a concept set. Next, the content of each material is decomposed automatically into sentences, each of which is regarded as an analysis unit, and the concepts it includes are tagged according to the sentence content.

Step 2: Produce the context of concepts and sentences. The context of concepts and sentences is represented as the matrix shown in Table 1. The columns C1–C6 stand for the relevant concepts which belong to the specific course. The rows S1–S5 are the sentences obtained from the contents of the materials in the course. In the matrix, the element value is set to 1 if the concept c appears in the sentence s (indicated by a mark in the table). Otherwise, the value is set to zero.

Step 3: Construct CSM. The CSM construction is based on the MFCA approach. Let S(Ca) and S(Cb) be the sets of sentences in which the concepts Ca and Cb appear, respectively. Under the MFCA, the concept Ca is a parent concept and the concept Cb is a child concept if S(Cb) is a subset of S(Ca). The underlying principle is that a general concept should appear in more documents or sentences than a specific concept. For instance, considering the concepts C1 and C4 in Table 1, the concept C1 is included in S1, S2 and S3, while the concept C4 appears in S1 and S2. According to the above notion, C1 is a parent concept of C4. After the hierarchy of the CSM is constructed, the degrees of the hierarchical and interactive relationships between concepts are calculated by formula (1).
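As a minimal sketch of formula (1) and the MFCA subset rule, the following Python fragment uses only the two sentence sets that the text states explicitly (C1 appears in S1, S2 and S3; C4 appears in S1 and S2). The function names `relation_value` and `is_parent` and the dictionary `sentences_of` are illustrative assumptions, not part of the original system.

```python
# Sentence sets S(C) for two concepts, taken from the Table 1 example in the text:
# C1 appears in S1, S2 and S3; C4 appears in S1 and S2.
sentences_of = {
    "C1": {"S1", "S2", "S3"},
    "C4": {"S1", "S2"},
}


def relation_value(ci, cj, sentences_of):
    """Formula (1): |S(ci) & S(cj)| / |S(ci)|. The value is asymmetric."""
    s_i = sentences_of[ci]
    if not s_i:
        # Assumption: a concept that appears in no sentence has no relations.
        return 0.0
    return len(s_i & sentences_of[cj]) / len(s_i)


def is_parent(ca, cb, sentences_of):
    """MFCA rule: Ca is a parent of Cb if S(Cb) is a subset of S(Ca).

    Ties (identical sentence sets) are not resolved in this sketch.
    """
    return ca != cb and sentences_of[cb] <= sentences_of[ca]


rv_c1_c4 = relation_value("C1", "C4", sentences_of)  # 2 shared / 3 with C1
rv_c4_c1 = relation_value("C4", "C1", sentences_of)  # 2 shared / 2 with C4
print(rv_c1_c4, rv_c4_c1, is_parent("C1", "C4", sentences_of))
```

The two calls give different values for RV(C1, C4) and RV(C4, C1), which illustrates the asymmetry of the relation value noted above.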

Table 1. The context of a course. The rows are the sentences S1–S5 and the columns are the concepts C1–C6; a marked cell means that the concept appears in the sentence.

2.2. Particle swarm optimization

The original version of particle swarm optimization (PSO) for optimal combination problems was proposed by Kennedy and Eberhart (Eberhart & Kennedy, 1995; Kennedy & Eberhart, 1995). The approach was developed from a simulation of social behavior models. PSO keeps a swarm of particles, resembling a flock of birds, in which each particle represents a potential solution to an optimization problem. The primary notion of PSO is that each particle keeps track of its coordinates in an N-dimensional problem space, which are associated with the best solution it has achieved so far. A designed fitness function provides a quantitative value for a particle's location. In the beginning, PSO initializes a swarm of particles with random positions and velocities to search for the optimal solution. In every generation, each particle moves to a new position according to the velocity function, which depends on three values: the inertia of the particle itself, the best position of the particle in its past experience, called pbest, and the best solution of all particles in the past iterations, called gbest. The PSO process terminates when a termination condition, such as a maximum number of iterations, is reached. PSO has been shown to solve optimization problems effectively (Cheng, Lin, & Huang, 2009; Huang et al., 2008; Zhao & Yang, 2009). Unfortunately, Kennedy and Eberhart (1997) find that PSO can only be applied to problems whose solutions lie in a continuous space, whereas the solutions of many real-world optimal combination problems exist in a discrete space. To cope with this difficulty, this paper applies the solution of Chen and Wang (2007) to the discrete problem; the details are given in Section 3.3.

3. Review course composition system

In this section, several important processes for composing a customized review course are introduced. Section 3.1 first gives an overview of the proposed system. Subsequently, the concept expansion approach based on contextual semantics is described in Section 3.2. Next, materials picking with DPSO is presented in detail in Sections 3.3 and 3.4. Finally, the greedy-like materials sequencing approach is explained in Section 3.5.

3.1. An overview of proposed system

In this subsection, the proposed system, which dynamically composes a review course in accordance with a learner's intention, is introduced. Fig. 1 shows the basic architecture, which contains four main modules for dynamic review course composition. The first module, Concept Analysis and Expansion, is responsible for receiving the query that a learner issues and for decomposing and transferring it into several meaningful concepts, called primitive concepts. For the purpose of smooth review, this module expands the primitive concepts through the guidance of CSM and produces an extended concept set that includes the concepts to be reviewed. Subsequently, the Materials Picking with DPSO module searches for the review materials associated with the concepts in the expanded concept set. Then the Course Composition module plans the order of the selected materials by the greedy-like approach, which completes a customized review course for the learner. In addition, the proposed system provides the Interactive Tagger module, which enables the learner to tag the studied material. When the tagger finds that a material is frequently tagged with an extra concept, this concept is added to the tag set of the material. Through the tag function, the tags of each material conform more closely to its content.

Fig. 1. The framework of interactive and dynamic review course composition system.

3.2. Contextual semantic expansion

The first issue in customizing a review course is which materials should be picked. This problem can be transferred into the question of which concepts the learner should review, because the teaching materials can be considered as instances of concepts in the course domain. An intuitive way of obtaining the concepts is to analyze the learner's query. Unfortunately, the majority of learners find it difficult to issue a precise query that reflects their intention, and the provided materials may not satisfy the learner if the system does not receive adequate input. Hence, this paper presents a concept expansion algorithm to solve this problem. The proposed algorithm determines which concepts need to be reviewed according to their contextual semantics in CSM, and also decides the weights with which they need to be reviewed.

The process of expanding concepts contains four steps, detailed as follows:

Step 1: Decide primitive concepts and map to CSM. When a learner issues a query according to his/her intention, the system receives this query and decomposes it into several primitive concepts by tokenization, lowercasing, stop-word removal, and stemming. An issued query can be treated as a simple document that contains many useless terms, such as definite articles and prepositions. A stop-word list is consulted when removing those terms. Stemming reduces inflected (or sometimes derived) words to their stem, base or root form. For example, endings such as "ed", "ing", or "ly" are removed. This research uses Porter's stemming algorithm (Porter). Subsequently, each primitive concept is mapped into CSM. The graph in Fig. 2 represents a simple CSM, in which the gray nodes are the concepts mapped by the primitive concepts. In Fig. 2, two concepts {G, H} are mapped.

Step 2: Find out the hierarchical concepts of primitive concepts. In the course domain, a general concept contains several specialized concepts. From the viewpoint of review, the specialized concepts should be reviewed if their general concept is the review target. According to this notion, the subconcepts of each primitive concept are found from CSM. For each primitive concept c, the system finds all its subconcepts and builds a hierarchical graph of c. In the expanding process, a threshold Sh can be set to decide the depth of a hierarchical graph. Formula (1) gives the strength value of the hierarchical relationship between two concepts. For a concept c1, its subconcept c2 is expanded if RV(c1, c2) is greater than or equal to Sh. An example is shown in Fig. 3. The hierarchical graphs in Fig. 3 are built from Fig. 2 with Sh set to zero, which means that all subconcepts are expanded. For the primitive concept G, the hierarchical graph includes {G, I, J, E, F}, and for the other primitive concept H, it contains {H, J, K, E}.

Step 3: Find out the interactive concepts of primitive concepts. In the proposed CSM, the relations contain not only the hierarchical relationship but also the interactive relationship, which expresses how strongly a concept c1 and a concept c2 appear together. From the viewpoint of review, if the concepts c1 and c2 frequently appear together in many materials, then the concept c2 should be provided if the concept c1 is the learner's review target. Similarly, a threshold Si is set to decide which interactive concepts should be expanded. It is suggested that this threshold be assigned a value greater than zero; if the threshold is zero, a large number of interactive concepts are expanded. Fig. 4 shows the interactive graphs with Si = 0. For the primitive concepts G and H, the interactive graphs include {C, K, A, D} and {D, A, B, I}, respectively.

Step 4: Merge the hierarchical graph and interactive graph. In steps 2 and 3, two graphs are built from two different viewpoints of review. Finally, they are merged into a complete graph of concept expansion, which contains the concepts for the learner to review. Fig. 5 shows the result after merging, where Si (the threshold for expanding interactive concepts) is set to 0.3.

Fig. 2. The CSM with two mapped concepts.

Fig. 3. The hierarchical graphs of the primitive concepts G and H.

Fig. 4. The interactive graphs of primitive concepts G and H.

Fig. 5. The complete expansion concept graph.

Next, the suggested weights of all concepts in the expansion concept graph are calculated. Several related definitions and formulas for the weight calculation are introduced as follows.

Definition 3 (Primitive Concept Weight (PCW)). PCW(ci) represents the original weight of a primitive concept ci. In this paper, the PCW of a concept in CSM is set to 1 if the concept is mapped by the query as a primitive concept. Otherwise, the PCW is set to zero.

Definition 4 (Hierarchical Concept Weight (HCW)). HCW(ci) represents the summation of the hierarchical weights that a concept ci receives from the other hierarchical concepts in its hierarchical graph. The calculation of HCW is expressed as formula (2), in which HC denotes the set of those other hierarchical concepts. Two factors affect the HCW of a concept. The first is the PCWs of the other hierarchical concepts in the same hierarchical graph: the higher the summation of these PCWs, the higher the value of HCW(ci). The second is the concept distance: the influence between two concepts needs to decrease as the concept distance between them increases. Hence, the PCW of each hierarchical concept is reduced gradually as it is passed on to the concept ci; the longer the concept distance, the smaller the portion of the PCW that is passed on.

$$HCW(c_i) = \sum_{k \in HC} \frac{PCW(k)}{2 \times distance(c_i, k)} \tag{2}$$

Definition 5 (Interactive Concept Weight (ICW)). ICW(ci) refers to the weight that the concept ci receives from other interactive concepts in the expansion concept graph. If a concept ck is a primitive concept and has an interactive relation with the concept ci, the concept ci can receive ICW from the concept ck. The calculation of ICW is expressed as formula (3), in which IC represents the set of the interactive concepts of the concept ci. Each interactive concept s can pass its PCW on to the concept ci, but the concept ci obtains only a part of that PCW. The ratio is determined by RV(s, ci): the stronger the interactive relationship between the concept ci and the concept s, the higher the ratio of PCW(s) that ci obtains.

$$ICW(c_i) = \sum_{s \in IC} \left( RV(s, c_i) \times PCW(s) \right) \tag{3}$$

Definition 6 (Enhanced Concept Weight (ECW)). ECW(ci) represents the weight that the provided materials should carry for the expanded concept ci. The ECW(ci) values of all expanded concepts together form the enhanced concept vector. The materials picking strategy employs this vector to select more suitable teaching materials to compose a review course. Formula (4) shows how each ECW is calculated.

$$ECW(c_i) = PCW(c_i) + HCW(c_i) + ICW(c_i) \tag{4}$$
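The expansion steps and the weight definitions above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the toy graph and its RV values are invented (they are not the CSM of Fig. 2), the concept distance is taken as the number of hierarchical edges between a primitive concept and a subconcept, formula (2) is read with the denominator 2 × distance, and only direct interactive neighbours of a primitive concept are expanded. The names `hierarchy_distances` and `enhanced_concept_weights` are hypothetical.

```python
from collections import defaultdict, deque


def hierarchy_distances(root, children, rv, s_h):
    """Breadth-first expansion of the subconcepts of `root`.

    A child is expanded only if RV(parent, child) >= s_h (threshold Sh).
    Returns {concept: number of hierarchical edges from root}.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, ()):
            if child not in dist and rv.get((parent, child), 0.0) >= s_h:
                dist[child] = dist[parent] + 1
                queue.append(child)
    return dist


def enhanced_concept_weights(primitives, children, interactive, rv,
                             s_h=0.0, s_i=0.3):
    """Return {concept: ECW} for every concept in the expansion graph."""
    pcw = {p: 1.0 for p in primitives}      # Definition 3: PCW = 1 if mapped
    hcw = defaultdict(float)
    icw = defaultdict(float)
    for p in primitives:
        # Formula (2), assumed reading: PCW(k) / (2 * distance(ci, k)).
        for c, d in hierarchy_distances(p, children, rv, s_h).items():
            if c != p:
                hcw[c] += pcw[p] / (2 * d)
        # Formula (3): RV(s, ci) * PCW(s), expanded only if RV >= Si.
        for c in interactive.get(p, ()):
            strength = rv.get((p, c), 0.0)
            if strength >= s_i:
                icw[c] += strength * pcw[p]
    concepts = set(pcw) | set(hcw) | set(icw)
    # Formula (4): ECW = PCW + HCW + ICW.
    return {c: pcw.get(c, 0.0) + hcw[c] + icw[c] for c in concepts}


# Invented toy data: G is the only primitive concept.
children = {"G": ["I", "J"], "I": ["E"]}     # hierarchical edges
interactive = {"G": ["C", "K"]}              # interactive neighbours
rv = {("G", "I"): 0.5, ("G", "J"): 0.4, ("I", "E"): 0.6,
      ("G", "C"): 0.5, ("G", "K"): 0.2}

ecw = enhanced_concept_weights(["G"], children, interactive, rv)
print(sorted(ecw.items()))
```

With Si = 0.3 the weakly related concept K is left out of the expansion, while E, two edges below G, receives half the hierarchical weight of the direct subconcepts I and J.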

3.3. Review materials picking with DPSO

Picking suitable materials to compose a customized review course from a huge number of teaching materials is difficult. If there are n materials in the repository and the customized course needs to contain m materials in accordance with a learner's intention, there exist $C_m^n$ combinations. Because the original PSO only copes with problems whose solutions lie in a continuous space, this paper builds on Chen and Wang (2007) to propose a discrete PSO (DPSO) approach for materials composition. In the following, it is first explained how DPSO is applied to the materials composition problem. Subsequently, the operations that DPSO needs are defined. Finally, the designed fitness function and the process of DPSO are described.

3.3.1. Problem description

The problem of review materials composition is to pick subject-related materials from the material repository such that the picked materials correspond to a learner's intention as closely as possible. Formulated for DPSO, each picking set, which contains some materials, is regarded as a feasible solution. If there are N materials in the repository, the search space is N-dimensional and the value of a feasible solution in each dimension is 1 or 0. In other words, the value in the ith dimension represents whether the ith material is selected. In the search space, each feasible solution is regarded as the position of a particle. The related formulas in DPSO are defined as follows:

$$\text{DPSO}: \begin{cases} V_i^{t+1} = (\omega \times V_i^{t}) + (c_1 \times (P_i^{lbest} - P_i^{t})) + (c_2 \times (P_i^{gbest} - P_i^{t})) \\ P_i^{t+1} = P_i^{t} + V_i^{t+1} \end{cases} \tag{5}$$

where $P_i = (x_{i,1}, x_{i,2}, x_{i,3}, \ldots, x_{i,N})$, $V_i = (v_{i,1}, v_{i,2}, v_{i,3}, \ldots, v_{i,N})$, and N is the number of materials. In the DPSO formula, $P_i$ represents the current position of the ith particle. Each particle moves to its next position in each generation according to three models. The first is the inertia model, which is produced by the current velocity vector and the inertia weight, $(\omega \times V_i^{t})$. The second is the cognition-only model, which draws the particle toward the local best position in its past experience by means of a learning coefficient $c_1$ and the difference vector between the current position and the local best position. The last is the social-only model, which guides the particle toward the global best position among all its neighbors by calculating the difference vector between the current position and the global best position, where the learning coefficient $c_2$ is used to accelerate the movement of the particle. From the three models, each particle obtains a new velocity and moves toward the best position.

3.3.2. DPSO operations

As mentioned above, the original PSO cannot cope with discrete problems. In a continuous solution space, the movement of a particle is decided by its new velocity, which is interpreted as a distance and a direction of movement. If the notion of velocity is to be applied to a discrete space, it needs to be defined anew. In Kennedy and Eberhart (1997), the velocity is interpreted as a probability. This paper extends a similar approach and redefines the operations on the velocity and the position for the discrete space as follows:

• Difference vector of positions. In formula (5), the local and global best positions are two important factors that affect the next position of the current particle. The position difference vector of a particle, $P_i^{d}$, can be expressed as $P_i^{d} = P_i^{1} - P_i^{2}$, where i denotes the ith particle. $P_i^{1}$ can be the local best position or the global best position, and $P_i^{2}$ is the current position. The subtraction is redefined as formula (6), where k represents the dimension. $p_{i,k}^{1}$ and $p_{i,k}^{2}$ are the values of the kth dimension in $P_i^{1}$ and $P_i^{2}$, respectively, while $p_{i,k}^{d}$ is the value of the kth dimension in the difference vector $P_i^{d}$.

$$p_{i,k}^{d} = \begin{cases} 0, & \text{if } p_{i,k}^{1} = p_{i,k}^{2} \\ p_{i,k}^{1}, & \text{otherwise} \end{cases} \tag{6}$$

• Multiplication vector of a position or a velocity. The inertia model, the cognition-only model and the social-only model use different coefficients in formula (5). A multiplication of a coefficient and a vector is expressed as $P_i^{new} = c \times P_i^{now}$, where $P_i^{now}$ may be a velocity vector or a difference vector between two positions. The multiplication in DPSO is defined as formula (7), where $x_{i,k}^{now}$ is the value of the kth dimension in $P_i^{now}$ and $x_{i,k}^{new}$ is the value of the kth dimension in $P_i^{new}$. The value rand is a random value in [0, 1].

$$x_{i,k}^{new} = \begin{cases} x_{i,k}^{now}, & \text{if } rand < c \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

• Velocity update. A new velocity is calculated from the inertia vector and the two difference vectors. A velocity calculation can be expressed as $V_i^{new} = V_i^{1} + V_i^{2}$. It means that the new velocity is decided according to the sum of several velocities in different aspects. The new velocity is calculated by formula (8), where $v_{i,k}^{1}$ and $v_{i,k}^{2}$ represent the velocity values of the kth dimension in $V_i^{1}$ and $V_i^{2}$, and $v_{i,k}^{new}$ is the new velocity value of the kth dimension in $V_i^{new}$:

$$v_{i,k}^{new} = \begin{cases} v_{i,k}^{2}, & \text{if } v_{i,k}^{2} \neq 0 \\ v_{i,k}^{1}, & \text{otherwise} \end{cases} \tag{8}$$

• Position update. A particle moves from the current position to the next position by its new velocity. The new position can be expressed as $P_i^{new} = P_i^{now} + V_i^{new}$. The position update is defined as formula (9), where $p_{i,k}^{new}$ and $p_{i,k}^{now}$ are the kth values in the new position and the current position, respectively, and $v_{i,k}^{new}$ is the kth new velocity value in $V_i^{new}$:

$$p_{i,k}^{new} = \begin{cases} v_{i,k}^{new}, & \text{if } v_{i,k}^{new} \neq 0 \\ p_{i,k}^{now}, & \text{otherwise} \end{cases} \tag{9}$$
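A minimal Python sketch of the four operations, combined into one particle move in the shape of formula (5), may make the definitions easier to follow. It is written under stated assumptions: positions and velocities are lists of 0/1 values, formula (8) is read so that a non-zero component of the second velocity takes precedence over the first, and the function names (`difference`, `multiply`, `add_velocities`, `move`, `dpso_step`) are illustrative rather than taken from the original system. Read literally on 0/1 vectors, a zero component means "no change", so these operators can switch a bit on but not off; the sketch reproduces the formulas as stated and does not try to resolve that.

```python
import random


def difference(p1, p2):
    """Formula (6): 0 where the two positions agree, otherwise the value of p1."""
    return [0 if a == b else a for a, b in zip(p1, p2)]


def multiply(c, vec, rng):
    """Formula (7): keep each component if rand < c, otherwise set it to 0."""
    return [x if rng.random() < c else 0 for x in vec]


def add_velocities(v1, v2):
    """Formula (8), assumed reading: take v2 where it is non-zero, else v1."""
    return [b if b != 0 else a for a, b in zip(v1, v2)]


def move(position, velocity):
    """Formula (9): take the velocity value where non-zero, else keep the bit."""
    return [v if v != 0 else p for p, v in zip(position, velocity)]


def dpso_step(position, velocity, lbest, gbest, w, c1, c2, rng):
    """One particle update following the structure of formula (5)."""
    inertia = multiply(w, velocity, rng)
    cognitive = multiply(c1, difference(lbest, position), rng)
    social = multiply(c2, difference(gbest, position), rng)
    new_velocity = add_velocities(add_velocities(inertia, cognitive), social)
    return move(position, new_velocity), new_velocity


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
position, velocity = [0, 0, 1, 0], [0, 0, 0, 0]
lbest, gbest = [1, 0, 1, 0], [0, 1, 1, 0]
# With all coefficients equal to 1.0 every component is kept, because
# random() returns a value in [0, 1) and is therefore always below 1.0.
new_position, new_velocity = dpso_step(position, velocity, lbest, gbest,
                                       w=1.0, c1=1.0, c2=1.0, rng=rng)
print(new_position, new_velocity)
```

Coefficients below 1.0 make each component of the inertia, cognitive and social terms survive only with the corresponding probability, which is where the randomness of the search comes from.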

3.3.3. Fitness function

To compose a review course that conforms to a learner's expectation, the system takes three objectives into account: the relevance degree of topics E, the difficulty degree of the customized course D, and the number of provided teaching materials I. They are defined as follows.

$$E = \sum_{i=1}^{n} (ECW_i - OCW_i)^2, \quad \text{where } OCW_i = \sum_{j=1}^{n} (x_{ij} \times m_j) \left( \sum_{j=1}^{n} m_j \right)^{-1} \tag{10}$$

$$D = \sum_{i=1}^{t} (d_i - d^{*})^2 \tag{11}$$

$$I = \left( \sum_{i=1}^{t} (ECW_i)^2 \right) \times \left( \sum_{j=1}^{n} m_j - S \right)^2 \tag{12}$$

$$Z = E + D + I \tag{13}$$

The first objective, the relevance degree of topics, compares each concept weight in the selected materials with the corresponding weight in the enhanced concept vector. In formula (10), ECWi represents the enhanced weight of the concept i, and OCWi is the weight of the concept i in the selected materials. OCWi is normalized over all the weights xij in the selected materials. As for the difficulty degree of materials, each material is assigned a difficulty value di in advance. The difference between the difficulty degrees of the selected materials and the difficulty d* that the learner specifies is estimated by formula (11). The last objective considers the number of selected materials. If too few materials are selected, the learner may not obtain enough review, yet too many may bore the learner, so a suitable number of materials should be decided. Formula (12) estimates how appropriate the number of materials is, where S is the number that the learner wants and the penalty coefficient is the summation of the squares of the enhanced weights. If the selected materials are too few or too many, the penalty value I increases. Finally, the fitness function consists of the three objective functions E, D and I, as shown in formula (13), and the recommended materials are those that minimize the fitness value.
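The fitness evaluation of one candidate selection can be sketched as follows. The sketch rests on several assumptions that the text does not settle: `weights[i][j]` stands for the weight of concept i in material j, the difficulty term sums over the selected materials only, the normalization divides by the number of selected materials, and an empty selection is given zero concept coverage. The function name `fitness` and all example numbers are invented for illustration.

```python
def fitness(selected, weights, ecw, difficulty, target_difficulty, wanted):
    """Z = E + D + I for one candidate selection (lower is better).

    selected:           list of 0/1 values m_j, one per material
    weights:            weights[i][j] = weight of concept i in material j
    ecw:                enhanced concept weights ECW_i, one per concept
    difficulty:         difficulty value d_j of each material
    target_difficulty:  difficulty d* requested by the learner
    wanted:             number of materials S requested by the learner
    """
    count = sum(selected)
    if count == 0:
        # Assumption: with no material selected, no concept is covered.
        ocw = [0.0] * len(ecw)
    else:
        # Formula (10): selected weights normalized by the number selected.
        ocw = [sum(x * m for x, m in zip(row, selected)) / count
               for row in weights]
    e = sum((a - b) ** 2 for a, b in zip(ecw, ocw))
    # Formula (11), assumed to range over the selected materials only.
    d = sum((dj - target_difficulty) ** 2
            for dj, m in zip(difficulty, selected) if m)
    # Formula (12): penalty coefficient times squared deviation from S.
    penalty = sum(a * a for a in ecw) * (count - wanted) ** 2
    return e + d + penalty


# Invented example: two concepts, three materials.
ecw = [1.0, 0.5]
weights = [[1.0, 0.0, 1.0],
           [0.0, 1.0, 0.0]]
difficulty = [0.5, 0.5, 0.9]

z_two = fitness([1, 1, 0], weights, ecw, difficulty, 0.5, wanted=2)
z_three = fitness([1, 1, 1], weights, ecw, difficulty, 0.5, wanted=2)
print(z_two, z_three)
```

In this toy case the three-material selection is penalized both for the harder third material and for exceeding the requested number of materials, so the two-material selection has the lower fitness value.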

3.4. Review materials recommendation
In this subsection, the DPSO recommendation algorithm is described. The algorithm consists of five steps to find an approximate solution. The detailed process is as follows:

Step 1: Enhanced concept weights calculation. First, the learner's query is transferred into some primitive concepts. Subsequently, the expanded concept set is produced according to the concept expansion algorithm. Each concept in the expanded concept set is attached to an enhanced weight. The weights are presented as the enhanced weight vector.

Step 2: Swarm initialization. All materials are encoded using the binary approach shown in Fig. 6. If the ith bit is set to 1, the ith material is selected; otherwise, it is not selected. Next, the number of particles and the number of generations are set. The initial position and velocity of each particle are set randomly.

Step 3: Evaluate fitness function. Each particle calculates the fitness value of its current position by the fitness function of formula (13). The position is interpreted as the set of selected materials, and formulas (10)–(12) are applied to evaluate it. If the fitness value of the current position is smaller than that of the particle's past experience, the current position becomes its Plbest. Subsequently, each particle picks the best (lowest-fitness) Plbest among its neighbors and itself as its Pgbest.

Step 4: Update position and velocity. After the Plbest and Pgbest of each particle are updated, every particle calculates its new velocity by formula (5). The formula combines the notions of particle inertia, the social-only model and the cognition-only model. Formulas (6)–(8) are applied in this velocity calculation. Similarly, the new position of each particle is determined by formula (9).

Step 5: Particles iteration. After its velocity is updated, each particle moves to its new position. Steps 3 and 4 are then repeated for the configured number of generations.

In each generation, the Plbest and Pgbest are updated according to the particles' experience. The final approximate solution is obtained when the iteration terminates. The materials whose corresponding bits are 1 are picked as the elements of the customized course.

Fig. 6. The encoding of materials.

3.5. Materials sequencing with greedy-like materials composition
By the DPSO, some review materials are picked for a learner. To smooth the reading of the materials, the system offers a suitable reading order for the learner through the greedy-like materials sequencing algorithm. The notion of materials sequencing is based on the similarity among the materials picked by DPSO. The complete pseudo-code is as follows:

Input: selected materials M, enhanced concept vector ECV
Output: a customized course with smooth order
Algorithm Name: Greedy-like materials composition
    Build a matrix SM whose dimension equals the number of materials in M;
    For each pair of materials i, j in M
        Calculate sim(i, j), the similarity of materials i and j, by formula (14);
        Assign sim(i, j) to SM[i, j];
    End for
    Find all roots in the complete expanded concept graph and add them to list R;
    Create an empty set of lists L;
    For each root r in R
        Create a new list list_r and add the material r to list_r;
        Add list_r to L;
    End for
    For each list_m in L
        While (size of list_m < the number of materials in M)
            Find the material k that has the highest similarity sim(m, k) and is not in list_m;
            Add material k to list_m;
        End while
    End for
    Find the best list_m in L, the one with the highest sum of similarities;
    Return the best list_m;

The material sequencing algorithm first calculates the similarity of any two selected materials by formula (14).

$$\mathrm{sim}(i,j) = \frac{\sum_{c \in ECV} \left( w_{i,c} \times w_{j,c} \right)}{\sqrt{\sum_{c \in ECV} w_{i,c}^{2}} \times \sqrt{\sum_{c \in ECV} w_{j,c}^{2}}} \tag{14}$$

where $w_{i,c}$ and $w_{j,c}$ represent the weights of concept c in materials i and j, respectively, and each similarity between materials i and j is stored in the matrix SM. Subsequently, for each candidate beginning material m, the algorithm finds the next material k that has the highest similarity and is not yet in list_m. The same process is repeated until all materials in M are included in list_m. Finally, the list with the highest sum of similarities is picked as the order of the provided materials.
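The similarity of formula (14) and the greedy ordering can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that every selected material may serve as a starting point (the paper starts from roots of the expanded concept graph), and that each next material is compared with the most recently placed one, although the pseudo-code's sim(m, k) could also be read as a comparison with the starting material. The names `cosine_similarity` and `greedy_order` are hypothetical.

```python
import math


def cosine_similarity(wi, wj, concepts):
    """Formula (14): cosine similarity of two materials over the ECV concepts."""
    dot = sum(wi.get(c, 0.0) * wj.get(c, 0.0) for c in concepts)
    norm_i = math.sqrt(sum(wi.get(c, 0.0) ** 2 for c in concepts))
    norm_j = math.sqrt(sum(wj.get(c, 0.0) ** 2 for c in concepts))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # a material with no ECV concept is similar to nothing
    return dot / (norm_i * norm_j)


def greedy_order(materials, concepts):
    """Return the ordering with the highest sum of adjacent similarities.

    materials maps a material name to its {concept: weight} dictionary.
    """
    names = list(materials)
    # Similarity matrix SM, stored as a dictionary keyed by ordered pairs.
    sim = {(a, b): cosine_similarity(materials[a], materials[b], concepts)
           for a in names for b in names if a != b}
    best_order, best_score = [], -1.0
    for start in names:  # assumption: every material is a candidate start
        order, score = [start], 0.0
        while len(order) < len(names):
            last = order[-1]
            # Greedy step: the unplaced material most similar to the last one.
            nxt = max((k for k in names if k not in order),
                      key=lambda k: sim[(last, k)])
            score += sim[(last, nxt)]
            order.append(nxt)
        if score > best_score:
            best_order, best_score = order, score
    return best_order


# Toy example: B shares concepts with both A and C, so it is read in between.
toy = {
    "A": {"class": 1.0},
    "B": {"class": 0.8, "inheritance": 0.6},
    "C": {"inheritance": 1.0},
}
print(greedy_order(toy, ["class", "inheritance"]))
```

The sketch costs on the order of n³ similarity look-ups for n selected materials, which is negligible for the course sizes used in the experiments (15 materials).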

4. Experiment and evaluation

4.1. System introduction
The proposed system, designed for learners to review courses they already know, composes a customized review course that meets a learner's requirements and presents it in a smooth reading order. Fig. 7 displays the interface of the customized review course. As marked in Fig. 7, each learner first chooses the course domain, the number of materials he/she hopes to review, and the difficulty degree of the materials. Next, the learner enters a specific query according to his/her intention. The search function then starts the concept expansion described in Section 3.2. Finally, the review materials picked by the DPSO are displayed in the center of the interface. Each material is shown with its title, its issue date, and an abstract of its content. For reviewing, the system provides two modes, smooth reading and free reading. In smooth reading, the learner reviews the recommended materials in the order planned by the greedy-like materials sequencing approach. In free reading, the learner can read them in any order. On the left of the interface, a Quick Browser lists the titles of the recommended materials in the default order, so that the learner can quickly select the material he/she wants to review. When the learner selects a material from either the central list or the Quick Browser, the interface of the recommended material, shown in Fig. 8, displays the content of this material. The top of Fig. 8 shows the title of the material, its difficulty degree, and its issue date. The material content is presented in its own area. In addition, the system offers a tagger for the learner to tag the material, and shows the tags the material currently contains. The learner can add a new tag through the tag selection box, which has two parts, Tag Name and Tag Relevance Degree.

Fig. 7. The interface of customized review course.

Fig. 8. The interface of the recommended material.

4.2. Experiments and discussions

4.2.1. Measures
The effectiveness of an information retrieval system is usually measured by two quantities and one combined measure. Recall and precision are the two quantities, and the f-measure is the combined measure. In this paper, they are applied to measure how appropriate the customized review course is. From the recall perspective, shown in Fig. 9a, Ne is the set of expanded concepts and Np is the set of concepts in the customized review course; in the ratios below, each set is counted by its number of concepts. The larger the intersection of Ne and Np, the more of the expanded concepts the customized review course covers. Recall is defined as the ratio of Ne ∩ Np to Ne; a higher ratio means that the course conforms more closely to the expanded concepts. From the precision perspective, shown in Fig. 9b, the smaller the set Np − (Ne ∩ Np), the higher the precision, because the course contains few concepts that the learner does not care about. Precision is defined as the ratio of Ne ∩ Np to Np. The measures are expressed as follows, where the f-measure combines recall and precision into a single value.

$$\text{Measure 1:}\quad \mathrm{Recall} = \frac{N_e \cap N_p}{N_e}$$

$$\text{Measure 2:}\quad \mathrm{Precision} = \frac{N_e \cap N_p}{N_p}$$

$$\text{Measure 3:}\quad f\text{-measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$

Fig. 9. The notions of the recall and precision.

4.2.2. Experimental setting
Before the experimental results are discussed, several experimental settings must be introduced. The related parameters are listed in Table 2. Each material is tagged with several related concepts, each with a relevance degree. The relevance degrees are used in the similarity calculations of formulas (10) and (14). The thresholds of primitive concept expansion are set to hierarchical threshold (Sh) = 0.1 and interactive threshold (Si) = 0.3. The third parameter is the difficulty degree, for which learners can select the level that suits them from five degrees: very hard, hard, common, easy and very easy. Finally, in DPSO, two swarm sizes, 10 and 20 particles, are used to find the optimal solution. Different termination conditions are selected to evaluate the variations of fitness values. In addition, five material repositories with different numbers of materials are used in the experiments.

Table 2
Experimental parameters setting.

4.2.3. Experimental result 1 – evaluate the variation of fitness values
To evaluate the efficiency of DPSO, two different numbers of particles are applied. Fig. 10 shows the curves of the average fitness values with particles = 20 and particles = 10 over different numbers of generations. In this experiment, there are 400 candidate materials and the number of materials in the customized course is set to 15. The curve for particles = 20 converges at around generation 60, while the curve for particles = 10 converges at around generation 85. The real optimal solution is 0.192, while the best approximate fitness value found by DPSO is 0.21 when the number of materials is 400. The swarm of 20 particles clearly converges in fewer generations than the swarm of 10 particles. Although using 20 particles raises the cost of searching, the larger swarm accelerates convergence. Hence, the proposed system uses 20 particles to search for the best solution in DPSO. Next, let us observe the variation of fitness values with 20 particles for different numbers of materials. Fig. 11 shows three curves, for runs terminated at 30, 50, and 100 generations. When there are fewer than 400 materials, the fitness values are easily affected by the material categories and the number of recommended materials. As the number of materials grows, however, the DPSO can pick suitable materials easily.
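A comparison of this kind can be sketched with a small binary swarm. The sketch below does not use the paper's formulas (5)–(9), which are defined earlier in the paper; it substitutes the standard binary PSO update of Kennedy and Eberhart (1997), uses a single global best instead of a neighborhood best, and minimizes a hypothetical `toy_fitness` that only rewards selecting five items. All parameter values (`inertia`, `c1`, `c2`, `v_max`) are illustrative assumptions.

```python
import math
import random


def binary_pso(fitness, n_bits, n_particles, generations, seed=0,
               inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Minimize `fitness` over bit vectors; return (best bits, best-so-far history)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_bits)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = []
    for _ in range(generations):
        for i in range(n_particles):
            for d in range(n_bits):
                # Inertia + cognition (own best) + social (swarm best) terms.
                v = (inertia * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, history


def toy_fitness(bits):
    """Hypothetical stand-in for formula (13): prefer exactly five selected items."""
    return abs(sum(bits) - 5) / 5.0


# Compare two swarm sizes on the same toy problem.
for n in (10, 20):
    _, hist = binary_pso(toy_fitness, n_bits=40, n_particles=n, generations=30)
    print(n, "particles: first", hist[0], "last", hist[-1])
```

Replacing `toy_fitness` with a function that decodes the bit vector into selected materials and evaluates formula (13) would give curves comparable in kind to Fig. 10, although the exact values depend on the update rule and parameters actually used by the authors.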

Fig. 10. Fitness value evaluations with different number of particles.

Fig. 11. Fitness value evaluations with different number of generations.
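The three measures of Section 4.2.1, which the second experiment relies on, can be computed with a few set operations. This is a minimal sketch; the function names and the toy concept sets are hypothetical, and the zero-division guards are an added assumption.

```python
def f_measure(recall, precision):
    """Measure 3: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def recall_precision(expanded, course):
    """Measures 1 and 2 over the expanded concepts (Ne) and course concepts (Np)."""
    expanded, course = set(expanded), set(course)
    common = expanded & course
    recall = len(common) / len(expanded) if expanded else 0.0
    precision = len(common) / len(course) if course else 0.0
    return recall, precision


# Toy example: two of four expanded concepts are covered by a three-concept course.
ne = {"class", "inheritance", "interface", "package"}
np_ = {"class", "inheritance", "variable"}
r, p = recall_precision(ne, np_)
print(r, p, f_measure(r, p))

# Consistency check against the first row of Table 3 (Recall 0.79, Precision 0.62).
print(round(f_measure(0.79, 0.62), 3))
```

The last line recomputes the f-measure from the recall and precision reported in the first row of Table 3, which is a quick way to confirm that the table follows Measure 3.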

4.2.4. Experiment result 2 – evaluations of recall, precision, and f-measure
The second experiment is evaluated by the three measures, Recall, Precision, and f-measure. It compares two approaches, No Concept Expansion and Concept Expansion, with different numbers of particles and generations. The measured values are shown in Table 3. In both approaches, Recall increases gradually as the number of particles or generations increases. However, the Recall values with and without concept expansion are almost equal, which indicates that the proposed expansion approach yields only a small improvement in Recall. Precision, in contrast, improves clearly with concept expansion, which shows that the concepts of the recommended materials conform more closely to the expanded concepts. Together, the three measures show that the DPSO works well, because the Recall and Precision of the customized review course are satisfactory. Next, let us observe the results obtained from learners' feedback. The learners were asked the question, "Do you think the materials which are included in the customized review course are conformed to your intention?" For the degree of satisfaction with the recommended materials,

Table 3
The evaluations of three measures with no concept expansion and concept expansion.

| Particles | Generation | Recall (no expansion) | Precision (no expansion) | f-measure (no expansion) | Recall (expansion) | Precision (expansion) | f-measure (expansion) |
|---|---|---|---|---|---|---|---|
| 10 | 30 | 0.79 | 0.62 | 0.695 | 0.773 | 0.79 | 0.781 |
| 10 | 50 | 0.874 | 0.651 | 0.746 | 0.869 | 0.812 | 0.84 |
| 10 | 100 | 0.891 | 0.68 | 0.771 | 0.879 | 0.85 | 0.864 |
| 20 | 30 | 0.886 | 0.662 | 0.758 | 0.895 | 0.837 | 0.865 |
| 20 | 50 | 0.92 | 0.721 | 0.808 | 0.914 | 0.843 | 0.877 |
| 20 | 100 | 0.92 | 0.74 | 0.82 | 0.923 | 0.861 | 0.891 |


Fig. 12. The satisfied degrees with no concept expansion and concept expansion.

five levels are used: Very Satisfied, Satisfied, General, Bad, and Very Bad. Fig. 12 illustrates the satisfaction degrees with No Concept Expansion and with Concept Expansion. With No Concept Expansion, the majority of learners give the General value (52%) to their customized courses. With Concept Expansion, however, about half of the General ratings are promoted to Very Satisfied (26%) and Satisfied (38%). This result shows that learners appreciate Concept Expansion.

4.2.5. Experimental result 3 – evaluate satisfied degree of reading order
The third experiment estimates the degree of satisfaction with the reading order of the customized review course under smooth reading. The satisfaction degree takes three values: Satisfied, General, and Bad. The customized review courses composed from learners' queries are classified into subjects, as shown in Table 4. There are eight single-subjects and three multi-subjects. Multi-subject A covers Object-oriented Programming Concepts, Class Concepts and Inheritance. Multi-subject B contains Variables, Expressions, Statements and Blocks, and Control Flow Statements. Multi-subject C includes Class Concepts, Inheritance and Interface.

Table 4
The satisfied degrees in different subjects with no smooth reading and smooth reading.

| Subject name | No smooth reading: Satisfied (%) | No smooth reading: General (%) | No smooth reading: Bad (%) | With smooth reading: Satisfied (%) | With smooth reading: General (%) | With smooth reading: Bad (%) |
|---|---|---|---|---|---|---|
| Object-oriented programming concepts | 23 | 52 | 25 | 33 | 58 | 9 |
| Variable | 31 | 49 | 20 | 40 | 53 | 7 |
| Expression, statements, and blocks | 32 | 59 | 9 | 45 | 48 | 7 |
| Control flow statements | 21 | 53 | 26 | 36 | 47 | 13 |
| Class concepts | 27 | 46 | 27 | 37 | 52 | 11 |
| Inheritance | 31 | 62 | 7 | 43 | 47 | 10 |
| Interface | 35 | 58 | 7 | 46 | 52 | 2 |
| Packages | 35 | 49 | 16 | 47 | 50 | 3 |
| Multi-subjects A | 19 | 36 | 45 | 46 | 42 | 12 |
| Multi-subjects B | 17 | 32 | 51 | 49 | 38 | 13 |
| Multi-subjects C | 13 | 42 | 45 | 38 | 46 | 16 |

Fig. 13. The satisfied degrees of reading order. (a) Without smooth reading. (b) With smooth reading.

For the eight single-subjects, the General feedback lies between 46% and 62%, which indicates that the reading order without smooth reading is still acceptable, while the Satisfied and Bad feedback are each around 20–30%. In the multi-subjects, however, the General feedback is about 10% lower than in the single-subjects. Moreover, the Satisfied feedback in the multi-subjects drops below 20%, while the Bad feedback rises to around 45%. Fig. 13a shows the average rates of the three feedback types in single-subjects and in multi-subjects, respectively. These results suggest that learners are dissatisfied when they read customized review courses that cover multiple subjects without smooth reading. The right part of Table 4 gives the feedback with smooth reading. For the single-subjects, the General feedback stays at rates similar to those without smooth reading, while the Satisfied feedback increases by about 10% and the Bad feedback is reduced. In the multi-subjects, the rates of Satisfied feedback increase markedly, by around 20–25%, and the rates of Bad feedback are reduced to 12–16%. Fig. 13b, for smooth reading, shows the average rates of the three feedback types: the Satisfied feedback increases, especially in the multi-subjects, and the Bad feedback is reduced substantially. These experimental results indicate that the smooth reading order produced by the greedy-like materials composition approach gives learners a smoother and more suitable review order.

5. Conclusion
In this paper, an interactive and dynamic review course composition system is proposed. Each learner can interact with the system by entering queries. When the proposed system receives a


query, it composes a customized review course dynamically. Three approaches are applied in the composition process. The first is contextual semantic expansion, which utilizes the CSM to find out which concepts should be recommended for review. The second, DPSO, is adopted in materials picking to select better and more subject-related materials, so that the materials conforming to the learner's intention are selected. Finally, smooth reading is applied to the order of the materials in the customized review course, using the greedy-like approach to sequence them. In the experimental part, the results of three experiments are presented and discussed. The first probes the DPSO performance, where two numbers of particles and three numbers of generations are used to evaluate the variation of fitness values. A larger swarm converges in fewer generations but raises the cost of searching. Considering the computing overhead and the time needed to customize a review course, particles = 20 and generations = 100 are adopted in the proposed system. The experiments also analyze the degrees of satisfaction with the recommended materials and with the reading order. Contextual semantic expansion offers more subject-related materials to the learner, as most of the feedback lies in Very Satisfied, Satisfied, and General. Similarly, comparing smooth reading with no smooth reading, the rate of Satisfied feedback rises while the rate of Bad feedback falls, which indicates that the greedy-like materials sequencing approach improves the reading order of the customized review course. The experimental results indicate that the proposed system provides suitable and smooth review courses for learners.
Acknowledgement
This work is supported by the National Science Council of Taiwan under the contract NSC95-2221-E-006-158-MY3.

References
Chen, C.-M., & Chung, C.-J. (2008). Personalized mobile English vocabulary learning system based on item response theory and learning memory cycle. Computers & Education, 51(2), 624–645.
Chen, C.-M., & Duh, L.-J. (2008). Personalized web-based tutoring system based on fuzzy item response theory. Expert Systems with Applications, 34(4), 2298–2315.
Chen, M., & Wang, Z.-W. (2007). An approach for web services composition based on QoS and discrete particle swarm optimization. In Proceedings of eighth ACIS international conference on the software engineering, artificial intelligence, networking, and parallel/distributed computing, 2007, SNPD 2007.
Cheng, S.-C., Lin, Y.-T., & Huang, Y.-M. (2009). Dynamic question generation system for web-based testing using particle swarm optimization. Expert Systems with Applications, 36(1), 616–624.
Eberhart, R., & Kennedy, J. (1995). A new optimizer using particle swarm theory. In Proceedings of the sixth international symposium on the micro machine and human science, 1995, MHS '95.
Formica, A. (2008). Concept similarity in formal concept analysis: An information content approach. Knowledge-Based Systems, 21(1), 80–87.
Hashemi, R. R., De Agostino, S., Westgeest, B., & Talburt, J. R. (2004). Data granulation and formal concept analysis. In Proceedings of IEEE annual meeting of the Fuzzy Information, 2004: Processing NAFIPS '04.
Huaiguo, F. (2006). Formal concept analysis for digital ecosystem. In Proceedings of fifth international conference on the machine learning and applications, 2006: ICMLA '06.
Huang, M.-J., Huang, H.-S., & Chen, M.-Y. (2007). Constructing a personalized e-Learning system based on genetic algorithm and case-based reasoning approach. Expert Systems with Applications, 33(3), 551–564.
Huang, T.-C., Huang, Y.-M., & Cheng, S.-C. (2008). Automatic and interactive e-Learning auxiliary material generation utilizing particle swarm optimization. Expert Systems with Applications, 35(4), 2113–2122.
Huey-Ing, L., & Min-Num, Y. (2005). QoL guaranteed adaptation and personalization in e-Learning systems. IEEE Transactions on Education, 48(4), 676–687.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of IEEE international conference on the neural networks, 1995.
Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In Proceedings of IEEE international conference on the systems, man, and cybernetics, 1997, Computational cybernetics and simulation, 1997.
Porter, M. The Porter stemming algorithm. http://www.tartarus.org/martin/PorterStemmer/.
Sun. The Java Tutorials. http://java.sun.com/docs/books/tutorial/.
Wang, K. T., Huang, Y.-M., Jeng, Y.-L., & Wang, T.-I. (2008). A blog-based dynamic learning map. Computers & Education, 51(1), 262–278.
Wang, T.-I., Wang, K.-T., & Huang, Y.-M. (2008). Using a style-based ant colony system for adaptive learning. Expert Systems with Applications, 34(4), 2449–2464.
Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72, 89–104.
Weng, S.-S., Tsai, H.-J., Liu, S.-C., & Hsu, C.-H. (2006). Ontology construction for information classification. Expert Systems with Applications, 31(1), 1–12.
Yang, Y. J., & Wu, C. (2009). An attribute-based ant colony system for adaptive learning object recommendation, Part 2. Expert Systems with Applications, 36(2), 3034–3047.
Yaohua, C., & Yiyu, Y. (2005). Formal concept analysis and hierarchical classes analysis. In Paper presented at the Fuzzy Information Processing Society, 2005, NAFIPS 2005: Annual Meeting of the North American.
Zhao, L., & Yang, Y. (2009). PSO-based single multiplicative neuron model for time series prediction, Part 2. Expert Systems with Applications, 36(2), 2805–2812.