Fuzzy swarm diversity hybrid model for text summarization


Mohammed Salem Binwahlan (a), Naomie Salim (b), Ladda Suanmali (c)

(a) Faculty of Applied Sciences, Hadhramout University of Science & Technology, Yemen
(b) Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
(c) Faculty of Science and Technology, Suan Dusit Rajabhat University, 10300 Dusit, Bangkok, Thailand

Article history: Received 1 June 2009; Received in revised form 10 February 2010; Accepted 14 March 2010; Available online 10 April 2010

Keywords: Diversity; Feature; Fuzzy logic; Particle swarm optimization; Summarization

Abstract

A high quality summary is the target of, and the challenge for, any automatic text summarization system. In this paper, we introduce a hybrid model for the automatic text summarization problem. We exploit the strengths of different techniques in building our model: a diversity-based method filters out similar sentences and selects the most diverse ones, a swarm-based method differentiates between the more important and less important features, and fuzzy logic makes the risks, uncertainty, ambiguity and imprecise values of the text feature weights flexibly tolerated. The diversity-based method addresses the redundancy problem, while the other two techniques concentrate on the sentence scoring mechanism. We present the proposed model in two forms. In the first form, the diversity measures dominate the behavior of the model. In the second form, the diversity constraint is no longer imposed on the model behavior; the diversity-based method contributes in the same way as the fuzzy swarm-based method. The results show that the second form of the proposed model performs better than the first form, the swarm model, the fuzzy swarm method and the benchmark methods. Overall, the results show that the combination of diversity measures, swarm techniques and fuzzy logic can generate a good summary containing the most important parts of the document.

1. Introduction

Automatic text summarization is the creation of a summary by machine. It has become a very important capability in fields that deal with huge amounts of data, even in the field of law (Moens, 2007). The aim of automatic text summarization is to condense the source text by extracting its most important content in a way that meets a user's or application's needs (Mani, 2001). A high quality summary is the target and challenge of any automatic text summarization system.

The selection of the distinct ideas included in the document is called diversity-based selection. Diversity is important to control redundancy in the summarized text and produce a more appropriate summary. Many diversity-based approaches have been proposed for text summarization. The pioneering work in diversity-based text summarization is maximal marginal relevance (MMR), introduced by Carbonell and Goldstein (1998). MMR maximizes marginal relevance in retrieval and summarization: a sentence with high marginal relevance is highly relevant to the given query and has low similarity to the already selected sentences. Sweeney et al. (2008) studied two different approaches to determine whether the focus on extracting the most diverse information has a positive or negative effect on the quality of the generated summary. The first approach generates a summary by extracting the most dissimilar sentences, while the other adds some additional information to the summary, besides the most diverse content, to keep the context of the document. The researchers concluded that the presence of additional information alongside the most diverse content was not so important


and the summary without such information could represent the original document effectively. In our previous work (Binwahlan, Salim, & Suanmali, 2009a), we introduced a modified version of MMR, which maximizes the marginal importance and minimizes the relevance. That approach treats a sentence with high maximal importance as one that is highly important in the document and less relevant to the already selected sentences. Machine learning approaches (Conroy & O'leary, 2001; Fattah & Ren, 2008; Kupiec, Pedersen, & Chen, 1995; Lin, 1999; Lin & Hovy, 1997; Osborne, 2002; Svore, Vanderwende, & Burges, 2007; Yeh, Ke, Yang, & Meng, 2005) have also proven their ability to improve summarization performance. In another work (Binwahlan, Salim, & Suanmali, 2009b), we used particle swarm optimization (PSO) (Kennedy & Eberhart, 1995) for the feature selection problem in order to study the effect of feature structure on feature selection. One of the results obtained from that study is the learned weight of each feature. Later, we applied the produced feature weights (Binwahlan, Salim, & Suanmali, 2009c) and found that summarization performance was improved. The motivation to use PSO for the text summarization problem was its successful application in related problems such as text classification and data clustering (Cui, Potok, & Palathingal, 2005; Merwe & Engelbrecht, 2003; Wang, Zhang, & Zhang, 2007; Ziegler & Skubacz, 2007). A few studies have applied fuzzy logic to text summarization (Kiani & Akbarzadeh, 2006; Kyoomarsi, Khosravi, Eslami, Dehkordy, & Tajoddin, 2008). In our previous work (Binwahlan, Salim, & Suanmali, 2009d), we found that the integration of fuzzy logic and swarm intelligence could give better summarization performance.

Improvement of summary quality remains a key research problem and needs much work. The good performance of swarm intelligence (PSO), the diversity-based method and fuzzy logic in the above studies can be incorporated in a hybrid fuzzy swarm diversity model, which could identify appropriate sentences to be included in a text summary that minimally represents large texts for storage and retrieval purposes.

The rest of this paper is organized as follows: Section 2 presents work related to this study. Section 3 describes maximal marginal importance (MMI) diversity-based text summarization. Section 4 introduces swarm-based summarization. Section 5 presents swarm diversity-based text summarization. Section 6 introduces fuzzy swarm-based text summarization. Section 7 presents the fuzzy swarm diversity hybrid model for automatic text summarization. Section 8 discusses the generalization of the proposed method's results via confidence limits. Section 9 describes the experimental design. Section 10 presents the experimental results. Section 11 discusses the results. Section 12 draws the conclusion and outlines future work.

2. Related work

Works similar to ours in terms of a hybrid model for the automatic text summarization problem are few. Aretoulaki (1994) proposed a hybrid system based on four modules, where each module looks for specific features and information in the input text; the outputs of those modules are then passed to an artificial neural network (ANN), which scores the text units as important or unimportant. Alemany and Fort (2003) presented a summarizer based on lexical chains, in which the cohesive properties of the text were combined with coherence relations to produce good summaries.
A different hybrid model was introduced by Cunha et al. (2007), which combines three systems; each system produces its own extract, then an algorithm creates the final summary by selecting the sentences with the highest scores from the three extracts after scoring the extracted sentences.

In this paper, we introduce a different hybrid model for the automatic text summarization problem. We try to exploit the advantages of different resources in building our model: the advantage of the diversity-based method (Binwahlan et al., 2009a), which can filter similar sentences and select the most diverse; the advantage of differentiating between the more important and less important features using the swarm-based method (Binwahlan et al., 2009c); and the advantage of fuzzy logic, which can make the risks, uncertainty, ambiguity and imprecise values of the text feature weights flexibly tolerated. First we discuss each method separately and then we combine them in the hybrid models.

3. Maximal marginal importance (MMI) diversity-based text summarization

Maximal marginal importance (MMI) (Binwahlan et al., 2009a) is a diversity-based text summarization method for summary generation. It depends on the extraction of the most important sentences from the original text. Most features used in this method are combined in a linear combination to express the importance of a sentence. The reason for including the importance of the sentence in the method is to emphasize the information richness of the sentence as well as information novelty. The features used are as follows:

3.1. Sentence centrality

The sentence centrality is the sum of three features, normalized by n - 1, where n is the number of sentences in the document: the similarity between the sentence in hand si and each document sentence sj, the shared friends (the group of sentences which are similar to both si and sj) and the shared n-grams (the group of n-grams contained in both si and sj):

$$SC(s_i) = \frac{\sum_{j=1}^{n-1} sim(s_i, s_j) + \sum_{j=1}^{n-1} \text{n-friends}(s_i, s_j) + \sum_{j=1}^{n-1} \text{n-grams}(s_i, s_j)}{n-1}, \quad i \neq j \text{ and } sim(s_i, s_j) \geq \theta \qquad (1)$$


where sj is any document sentence other than si, n is the number of sentences in the document, and θ is the similarity threshold, determined empirically in an experiment run to find the best similarity threshold value; we have found that the best similarity threshold is either 0.03 or 0.16. sim is the similarity between the sentence in hand si and each document sentence sj (calculated using the cosine similarity measure (Erkan & Radev, 2004)), n-friends is the shared friends (the group of sentences which are similar to both sentences si and sj, calculated using Eq. (2)) and n-grams is the shared n-grams (the group of n-grams contained in both sentences si and sj, calculated using Eq. (3)).

$$\text{n-friends}(s_i, s_j) = \frac{|s_i(\text{friends}) \cap s_j(\text{friends})|}{|s_i(\text{friends}) \cup s_j(\text{friends})|}, \quad i \neq j \qquad (2)$$

$$\text{n-grams}(s_i, s_j) = \frac{|s_i(\text{n-grams}) \cap s_j(\text{n-grams})|}{|s_i(\text{n-grams}) \cup s_j(\text{n-grams})|}, \quad i \neq j \qquad (3)$$

where si(friends) is the group of sentences similar to si, sj(friends) is the group of sentences similar to sj, si(n-grams) is the group of terms contained in sentence si and sj(n-grams) is the group of terms contained in sentence sj.
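Both Eq. (2) and Eq. (3) are Jaccard coefficients over sentence-level sets. The following is a minimal illustrative sketch (our own, not the authors' code), assuming sentences are preprocessed into sets of stemmed terms and a `sim` similarity callback (e.g. cosine similarity) is supplied; the threshold default of 0.16 is one of the two empirically found values mentioned above:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def friends(idx, sentences, sim, theta):
    """Indices of sentences whose similarity to sentence `idx` is >= theta."""
    return {j for j in range(len(sentences))
            if j != idx and sim(sentences[idx], sentences[j]) >= theta}

def n_friends(i, j, sentences, sim, theta=0.16):
    """Eq. (2): overlap of the two sentences' friend sets."""
    return jaccard(friends(i, sentences, sim, theta),
                   friends(j, sentences, sim, theta))

def n_grams(si_terms: set, sj_terms: set) -> float:
    """Eq. (3): overlap of the two sentences' term sets."""
    return jaccard(si_terms, sj_terms)
```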

3.2. Title feature

This feature is formed as the average of two features, title-help sentence (THS) and title-help sentence relevance sentence (THSRS), as in Eq. (4):

$$SS\_NG(s_i) = \frac{THS(s_i) + THSRS(s_i)}{2} \qquad (4)$$

where THS is the title-help sentence score: a title-help sentence is a sentence containing n-gram terms of the title (calculated using Eq. (5)); and THSRS is the title-help sentence relevance sentence score: a title-help sentence relevance sentence is a sentence containing n-gram terms of any title-help sentence (calculated using Eq. (6)).

$$THS(s_i) = \frac{|s_i(\text{n-grams}) \cap T(\text{n-grams})|}{|s_i(\text{n-grams}) \cup T(\text{n-grams})|} \qquad (5)$$

where si(n-grams) are terms of any sentence si in the document and T(n-grams) are terms of the document title.

$$THSRS(s_j) = \frac{|s_j(\text{n-grams}) \cap THS(s_i(\text{n-grams}))|}{|s_j(\text{n-grams}) \cup THS(s_i(\text{n-grams}))|} \qquad (6)$$

where sj(n-grams) are the terms of any sentence sj in the document and THS(si(n-grams)) are the terms of a sentence si which was previously marked as a title-help sentence.

3.3. Word sentence score (WSS)

It is calculated as follows:

$$WSS(s_i) = 0.1 + \frac{\sum_{t_j \in s_i} W_{ij}}{HTFS} \;\Bigg|\; \text{no. of sentences containing } t_j \geq \frac{1}{2} LS \qquad (7)$$

where:
- 0.1 is the minimum score the sentence gets in case its terms are not important;
- Wij, as in Eq. (8), is the term weight (TF–ISF) of the term tij in the sentence si. TF–ISF stands for term frequency–inverse sentence frequency. Term frequency (tf) is the number of times the term tij occurs in the sentence si. Inverse sentence frequency is calculated as isf = 1 - log(sf(tij) + 1)/log(n + 1), so TF–ISF = tfij × isf, where sf is the number of sentences containing the term tij and n is the total number of sentences in the document;
- LS is the summary length and HTFS is the highest term-weight (TF–ISF) summation of any sentence in the document.

$$W_{ij} = tf_{ij} \times isf = tf(t_{ij}, s_i)\left(1 - \frac{\log(sf(t_{ij}) + 1)}{\log(n + 1)}\right) \qquad (8)$$
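As a small worked sketch of Eqs. (7) and (8) (our own illustration, with sentences assumed to be lists of stemmed terms; the reading of the constraint in Eq. (7) as a term filter is our interpretation):

```python
import math
from collections import Counter

def tf_isf(term, sentence, sentences):
    """Eq. (8): term frequency times inverse sentence frequency."""
    n = len(sentences)
    sf = sum(1 for s in sentences if term in s)  # sentence frequency
    isf = 1.0 - math.log(sf + 1) / math.log(n + 1)
    return Counter(sentence)[term] * isf

def wss(sentence, sentences, summary_len):
    """Eq. (7): word sentence score with the 0.1 floor.

    HTFS is the highest per-sentence sum of TF-ISF weights in the document;
    only terms occurring in at least summary_len / 2 sentences contribute.
    """
    def weight_sum(s):
        return sum(tf_isf(t, s, sentences) for t in set(s))
    htfs = max(weight_sum(s) for s in sentences)
    score = sum(tf_isf(t, sentence, sentences) for t in set(sentence)
                if sum(1 for s in sentences if t in s) >= summary_len / 2)
    return 0.1 + score / htfs if htfs else 0.1
```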

3.4. Key word feature

The top 10 words with the highest TF–ISF scores are chosen as key words.

3.5. Similarity to first sentence

This feature scores the sentence based on its similarity to the first sentence in the document (calculated using the cosine similarity measure (Erkan & Radev, 2004)); in a news article, the first sentence is considered very important.


Hurricane Gilbert Heads Toward Dominican Coast. Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains and high seas. The storm was approaching from the southeast with sustained winds of 75 mph gusting to 92 mph. ``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a television alert shortly before midnight Saturday. Cabral said residents of the province of Barahona should closely follow Gilbert's movement. An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about 125 miles west of Santo Domingo. Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night. The National Hurricane Center in Miami reported its position at 2 a.m Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo. The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating around the center of the storm. The weather service issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday. Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds and up to 12 feet to Puerto Rico's south coast. There were no reports of casualties. San Juan, on the north coast, had heavy rains and gusts Saturday, but they subsided during the night. On Saturday, Hurricane Florence was downgraded to a tropical storm and its remnants pushed inland from the U.S. Gulf Coast. Residents returned home, happy to find little damage from 80 mph winds and sheets of rain. Florence, the sixth named storm of the 1988 Atlantic storm season, was the second hurricane. The first, Debby, reached minimal hurricane strength briefly before hitting the Mexican coast last month.

Fig. 1a. An example of original document.

The summarization steps for creating a summary of a given document D using this method are described in Algorithm 1.

Algorithm 1. Maximal Marginal Importance (MMI) diversity-based method

1. Input document D: take the document D as input, D = {T, s1, s2, s3, s4, s5, ..., sn}.
2. Preprocessing: segment document D into separate sentences D = {T, s1, s2, s3, s4, s5, ..., sn}, remove stop words and then stem the words.
3. Features extraction: extract the features; for each si there is a set of eight features F, F = {WSS, SC, SS_NG, sim_fsd, kwrd, nfriends, ngrams, sim}.
4. Sentence clustering and binary tree building:
   4.1 Cluster the document sentences into a number of clusters equal to the summary length.
   4.2 Find friendsNo(si): calculate the similarity between each sentence and the other sentences in the document (using the cosine similarity measure (Erkan & Radev, 2004)), then for each sentence select the sentences having a similarity degree greater than or equal to the threshold.
   4.3 Pick the sentence having the highest number of friends (similar sentences) and present it in the binary tree.
   4.4 Pick the sentences most similar to the sentence picked in step 4.3 and present them in the same binary tree.
   4.5 Repeat steps 4.2 to 4.4 while there are sentences remaining in the current cluster.
   4.6 Repeat steps 4.2 to 4.5 for each cluster.
5. Sentence order in the binary tree: calculate the sentence score in the binary tree using Eq. (9)

$$Score_{BT}(s_i) = impr(s_i) + (impr(s_i) \times friendsNo(s_i)) \qquad (9)$$

where

$$impr(s_i) = avg(WSS(s_i) + SC(s_i) + SS\_NG(s_i) + sim\_fsd(s_i) + kwrd(s_i)) \qquad (10)$$

6. Summary generation:
   6.1 Apply MMI (Eq. (11)) on the binary tree of each sentence cluster to select one sentence to be included in the final summary.

$$MMI(s_i) = \arg\max_{s_i \in CS \setminus SS} \left[ (Score_{BT}(s_i) - \beta(s_i)) - \max_{s_j \in SS} rel(s_i, s_j) \right] \qquad (11)$$


where:

$$rel(s_i, s_j) = avg(\text{n-friends}(s_i, s_j) + \text{n-grams}(s_i, s_j) + sim(s_i, s_j)) \qquad (12)$$

   6.2 Order the summary sentences in the same order as in the original document.

For step 1, the input document D is, for example, as shown in Fig. 1a. For step 2, the separated document sentences are as shown in Fig. 1b. For step 3, WSS is the word sentence score, SC is the sentence centrality, SS_NG is the average of the THS and THSRS features, sim_fsd is the similarity of the sentence si with the first document sentence calculated using the cosine similarity measure (Erkan & Radev, 2004), kwrd is the key word feature, n-friends is the shared friends (the group of sentences which are similar to both sentences si and sj), n-grams is the shared n-grams (the group of n-grams contained in both sentences si and sj) and sim is the similarity between those two sentences.

For step 4, the document sentences are clustered (using the k-means clustering algorithm (Jain & Dubes, 1988)) such that each cluster contains the most similar sentences. The number of clusters is determined automatically by the summary length (the number of sentences in the final summary, which equals 20% of the total number of original document sentences). For example, the input document D contains 16 sentences, so its summary length will be three sentences and the number of clusters should be 3. Each sentence cluster is represented as one binary tree or more; Fig. 1c shows an example of the binary trees of the clusters of the input document D. The first sentence, presented as the root of the binary tree, is the sentence with the highest number of friends (number of similar sentences, calculated using the cosine similarity measure (Erkan & Radev, 2004)). After that, the sentences which are most similar to the already presented sentence are selected and presented in the same binary tree. If sentences remain in the same cluster, a new binary tree is built for them in the same way.

For step 5, the sentences in the binary tree are ordered based on their scores, as shown in Fig. 1d. The score of a sentence in the binary tree building process is calculated based on the importance of the sentence and the number of its friends using Eq. (9), where ScoreBT(si) is the score of the sentence si in the binary tree building process, impr(si) is the importance of the sentence si calculated using the normal features (Eq. (10)) and friendsNo(si) is the number of sentences which are similar to sentence si. Here WSS is the word sentence score, SC the sentence centrality, SS_NG the average of the THS and THSRS features,

T: Hurricane Gilbert Heads Toward Dominican Coast S1: Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains and high seas. S2: The storm was approaching from the south east with sustained winds of 75 mph gusting to 92 mph. S3: ``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a television alert shortly before midnight Saturday. S4: Cabral said residents of the province of Barahona should closely follow Gilbert's movement. S5: An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about 125 miles west of Santo Domingo. S6: Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night. S7: The National Hurricane Center in Miami reported its position at 2 a.m Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo. S8: The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating around the center of the storm. S9: The weather service issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m Sunday. S10: Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds and up to 12 feet to Puerto Rico's south coast. S11: There were no reports of casualties. S12: San Juan, on the north coast, had heavy rains and gusts Saturday, but they subsided during the night. S13: On Saturday, Hurricane Florence was downgraded to a tropical storm and its remnants pushed inland from the U.S. Gulf Coast. S14: Residents returned home, happy to find little damage from 80 mph winds and sheets of rain. S15: Florence, the sixth named storm of the 1988 Atlantic storm season, was the second hurricane. S16: The first, Debby, reached minimal hurricane strength briefly before hitting the Mexican coast last month. Fig. 1b. The separated sentences of the original document d061j: AP880911-0016 after the segmentation process.


Fig. 1c. Binary trees of sentence clusters: A for cluster 1, B for cluster 2 and C for cluster 3, the sentences of each cluster are unordered.

Fig. 1d. Binary trees of sentence clusters: A for cluster 1, B for cluster 2 and C for cluster 3, the sentences of each cluster are ordered based on the sentence importance in the binary tree.

Fig. 1e. The summary of the original document d061j: AP880911-0016 generated by the proposed model in the first form (fuzzy swarm diversity hybrid model 1st form for automatic text summarization).

sim_fsd the similarity of the sentence si with the first document sentence calculated using the cosine similarity measure (Erkan & Radev, 2004) and kwrd(si) the key word feature. Each level in the binary tree contains 2^ln of the higher-scored sentences, where ln is the level number, ln = 0, 1, 2, ..., n; the top level contains one sentence, the sentence with the highest score.

For step 6, MMI (Eq. (11)) is applied on all binary trees shown in Fig. 1d. In each binary tree, a level penalty β is imposed on each level of sentences, equal to 0.01 times the level number. We set the initial value of β to a very small value so that sentences in the highest levels receive high scores; going down the tree, β grows as 0.01 times the level number, because we aim to decrease the scores of sentences in the lowest levels, regarded as unimportant sentences. The summary sentence is selected from the binary tree by traversing all levels and applying MMI on the sentences in each level. The summary sentences are then ordered in the same order as in the original document. Fig. 1e shows the generated summary of the input document shown in Fig. 1a.

In Eq. (11), rel(si, sj) is the relevance between the two competing sentences calculated using Eq. (12), si is an unselected sentence in the current binary tree, sj is an already selected sentence, SS is the list of already selected sentences, CS is the set of competing sentences of the current binary tree and β is the level penalty (0.01 times the level number). In Eq. (12), n-friends(si, sj) is the shared friends (the group of sentences which are similar to both sentences si and sj), n-grams(si, sj) is the shared n-grams (the group of n-grams contained in both sentences si and sj) and sim(si, sj) is the similarity between those two sentences calculated using the cosine similarity measure (Erkan & Radev, 2004).
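A minimal sketch of the MMI selection step (Eq. (11)), under our own assumptions: each candidate carries its precomputed Score_BT and its tree level, and a `rel` function implementing Eq. (12) is available. This is an illustration, not the authors' code:

```python
def mmi_select(candidates, selected, rel):
    """Pick the next summary sentence per Eq. (11).

    candidates: list of (sentence, score_bt, level) tuples for one binary tree.
    selected:   list of already selected sentences (SS).
    rel:        relevance function implementing Eq. (12).
    """
    def mmi(cand):
        sentence, score_bt, level = cand
        beta = 0.01 * level                       # level penalty
        redundancy = max((rel(sentence, s) for s in selected), default=0.0)
        return (score_bt - beta) - redundancy
    best = max(candidates, key=mmi)
    return best[0]
```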


Fig. 2. Structure of a particle position.

4. Swarm-based summarization

The swarm-based text summarization method (Binwahlan et al., 2009c) generates a summary of the original document by picking the top n sentences with the highest scores, where n is equal to the predefined summary length and the sentences are scored using the same features presented in Section 3. A weight is assigned to each feature score. Assigning weights to text features is an attempt to differentiate between more and less important features. An earlier attempt was done manually by Edmundson (1969); assigning such weights in the absence of a principled mechanism makes it difficult to know whether each feature received a suitable weight.

For example, D is an input document, D = {T, s1, s2, s3, s4, s5, ..., sn}. For each si, there is a set of features F, F = {f1, f2, f3, f4, f5, ..., fm}. For each feature fj, there is a corresponding weight wj, W = {w1, w2, w3, w4, w5, ..., wm}. WF is the set of features after adjusting each feature by its corresponding weight, WF = {w1f1, w2f2, w3f3, w4f4, w5f5, ..., wmfm}.

The mechanism we propose is based on a well-known soft computing technique, particle swarm optimization (PSO) (Kennedy & Eberhart, 1997). The role of PSO is to find and optimize the corresponding weight wj of each feature fj. The binary PSO was used, in which the particle position is represented as a bit string: if a bit has value 1, the corresponding feature is selected; otherwise it is unselected. The first bit refers to the first feature, the second bit to the second feature and so on; Fig. 2 illustrates the structure of a particle position. The velocity of the particle is represented in the same way, where the value of each bit is derived from the sigmoid function.

In each iteration, each particle selects a specific number of features. Based on the selected features, a summary for the current document is created and used as input to the fitness function; we use ROUGE-1 (Lin, 2004), shown in Eq. (17), as the fitness function. By the end of each iteration, there are five evaluation values because there are five particles (the number of particles can be set to any value, but a higher number of particles increases the computational time (Kiran, Jetti, & Venayagamoorthy, 2006), so we keep the number of particles small). In the first iteration, the evaluation value of each summary is selected as pbest for the corresponding particle and the best of the five evaluation values is selected as gbest. From the second iteration onward, the new pbest and gbest are selected by comparing the new evaluation values with the previous pbests: if a new evaluation value is better than the current pbest, it is selected as pbest; if a pbest changes and is better than the current gbest, it becomes the new gbest. By the end of each run, the position of the particle with the gbest value is selected as the vector of the best selected features for the current document. The feature weights W = {w1, w2, w3, w4, w5, ..., wm} of the current document are calculated as the average of the vectors created in each run, and the final feature weights vector is calculated over the feature weight vectors of all documents in the data collection.
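The following sketch shows the skeleton of a binary PSO of this kind. It is a simplified illustration under our own assumptions (a generic `fitness` callback stands in for the ROUGE-1 evaluation of the summary built from the selected features), not the authors' implementation:

```python
import math
import random

def binary_pso(fitness, n_features=5, n_particles=5, n_iters=30):
    """Binary PSO: positions are bit strings; bit j = 1 means feature j selected.

    fitness: callable mapping a bit list to a score (in the paper, the
    ROUGE-1 score of the summary generated with the selected features).
    """
    pos = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                # Velocity update pulled toward pbest and gbest.
                vel[i][j] += (2.0 * random.random() * (pbest[i][j] - pos[i][j])
                              + 2.0 * random.random() * (gbest[j] - pos[i][j]))
                # Sigmoid of the velocity gives the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if random.random() < prob else 0
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```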
We formed the data set by selecting 100 documents from the first document sets in DUC 2002 (D061j, D062j, D063j, D064j, D065j, D066j, D067f, D068f, D069f, D070f, D071f, D072f and D073b) to be used as training and testing data. The swarm method is defined as a combination of the adjusted feature scores, as in Eq. (13):

$$swarm\_impr(s_i) = \sum_{j=1}^{5} w_j \times score\_f_j(s_i) \qquad (13)$$

where swarm_impr(si) is the score of the sentence si, wj is the weight of feature j produced by PSO (j = 1, ..., 5, the number of features) and score_fj(si) is the score of feature j given to sentence si. The steps of the summarization process using this method are given in Algorithm 2.

Algorithm 2. Swarm-based summarization method

1. Input document D: take the document D as input, D = {T, s1, s2, s3, s4, s5, ..., sn}.
2. Preprocessing: same as step 2 in Algorithm 1.
3. Features extraction: same as step 3 in Algorithm 1.
4. Feature scores modification:
   4.1 Use the optimized weights W = {w1, w2, w3, w4, w5} to adjust the feature scores F = {WSS, SC, SS_NG, sim_fsd, kwrd}.
   4.2 Get the set of modified feature scores WF = {w1 × WSS, w2 × SC, w3 × SS_NG, w4 × sim_fsd, w5 × kwrd}.
5. Sentence score calculation: use Eq. (13) to calculate the score of each sentence si in the document D:

$$swarm\_impr(s_i) = \sum_{j=1}^{5} w_j \times score\_f_j(s_i) \qquad (13)$$


6. Summary generation:
   6.1 Order the sentences by score in descending order.
   6.2 Select the top n sentences as summary sentences.
   6.3 Order the summary sentences in the same order as in the original document.

For step 6, the top n sentences equal the summary length (three is the summary length of document D in the running example; see Section 3).

We chose particle swarm optimization for learning the text feature weights for several reasons. It is simple: it has simple parameters, no genetic operators such as crossover and mutation, and a simple implementation. From a computational point of view, it has low memory and time requirements. In PSO, unlike other evolutionary computation techniques, each particle flies through the search space with a velocity that is dynamically adjusted according to its own flying experience and that of its companions, and it retains the best position it has ever encountered in memory. The particles have memory, which is important to the algorithm. Compared with other evolutionary computation techniques such as GA, the mechanism of information sharing is quite different: in GA, chromosomes share information with each other, so the whole population moves like one group towards an optimal area, whereas in PSO only the particle with gbest announces information to the other particles. Compared to an artificial neural network (ANN) or any other gradient method: ANN is a local search method depending on gradient information to find the optimal solution, and the optimal ANN weights that lead to the optimal minimum are affected by the initial weights; in some cases, inappropriate initial weights cause the method to get stuck at a sub-optimal solution near a local minimum. PSO is a global search method, adequate for problems with many local minima; it does not use gradient information and therefore does not suffer from the local minima problem (Braik, Sheta, & Arieqat, 2008; Ercan, 2008; Kiran et al., 2006; Xu, He, Zhu, Liu, & Li, 2008).

5. Swarm diversity-based text summarization

The swarm diversity-based method (Binwahlan, Salim, & Suanmali, 2009e), shown in Fig. 3, is an integration of the two methods presented in the previous sections (MMI diversity-based text summarization and swarm-based text summarization). In the MMI method, the score of the sentence in the binary tree is calculated using Eq. (9); in that equation, the importance of the sentence (impr(si)) appears in two positions, calculated using a simple combination of the text feature scores. We replace the sentence importance in the second position by the sentence importance swarm_impr(si), calculated using Eq. (13). The new formula for scoring a sentence in the binary tree is:

$$Score_{BT}(s_i) = impr(s_i) + (swarm\_impr(s_i) \times friendsNo(s_i)) \qquad (14)$$

where impr(si) is the importance of the sentence si calculated using the normal features (Eq. (10)), swarm_impr(si) is the importance of the sentence si calculated using Eq. (13) and friendsNo(si) is the number of sentences which are similar to sentence si. The steps of the summarization process using this method are given in Algorithm 3.

Algorithm 3. Swarm diversity-based summarization method

1. Input document D: take the document D as input, D = {T, s1, s2, s3, s4, s5, ..., sn}.
2. Preprocessing: same as step 2 in Algorithm 1.
3. Features extraction: same as step 3 in Algorithm 1.
4. Sentence clustering and binary tree building: same as step 4 in Algorithm 1.
5. Sentence order in the binary tree:
   5.1 Calculate the score of the sentence in the binary tree using Eq. (14):

$$Score_{BT}(s_i) = impr(s_i) + (swarm\_impr(s_i) \times friendsNo(s_i)) \qquad (14)$$

where:

$$impr(s_i) = avg(WSS(s_i) + SC(s_i) + SS\_NG(s_i) + sim\_fsd(s_i) + kwrd(s_i)) \qquad (10)$$

and

$$swarm\_impr(s_i) = \sum_{j=1}^{5} w_j \times score\_f_j(s_i) \qquad (13)$$

   5.2 Order the sentences in the binary tree based on their scores.
6. Summary generation: same as step 6 in Algorithm 1, with a different generated summary.

For step 5, the sentences in the binary tree are ordered based on their scores, much as shown in Fig. 1d. Each level in the binary tree contains 2^ln of the higher-scored sentences, where ln is the level number, ln = 0, 1, 2, ..., n; the top level contains one sentence, the one with the highest score.
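As a small illustration of this level structure (our own sketch, not the authors' code): sorting the scored sentences and slicing them into levels of size 2^ln reproduces the tree layout described above.

```python
def tree_levels(scored_sentences):
    """Arrange (sentence, score) pairs into binary-tree levels of size 2**ln.

    Level 0 holds the single highest-scored sentence, level 1 the next two,
    level 2 the next four, and so on.
    """
    ordered = sorted(scored_sentences, key=lambda x: x[1], reverse=True)
    levels, ln = [], 0
    while ordered:
        levels.append(ordered[:2 ** ln])
        ordered = ordered[2 ** ln:]
        ln += 1
    return levels
```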


Fig. 3. Integration of the MMI diversity-based text summarization and swarm-based text summarization.

The reason for making the features the central point for integrating the two methods is that the features are the cornerstone of generating the text summary: the summary quality is sensitive to how the sentences are scored based on the features used. Therefore, exploiting the advantages of different resources can be a good way to evaluate the sentences.

6. Fuzzy swarm-based text summarization

In text summarization, the data used for training a machine learning algorithm is mostly prepared by humans: summaries from a number of humans are used for this purpose. The agreement among those humans on the selection of specific sentences to form the summary is fuzzy and low (Zha, 2002; Lin & Hovy, 2002; Murad & Martin, 2007), leading to inconsistency in the training data, a common problem of machine learning approaches. As mentioned in Section 4, PSO is trained using that data to find and optimize the weights assigned to each feature. In each iteration of the training process, new summaries are created for each document and evaluated against a human summary. By the end of the training process, the optimized weights of the text features are obtained, W = {w1, w2, w3, w4, w5, ..., wm}. Such weights can be considered imprecise values due to the inconsistency of the training data, which is caused by the fuzzy and low agreement among the humans who created the training summaries. This problem has not received serious attention in the existing supervised machine learning approaches applied to the text summarization problem. Using the text feature scores adjusted by those imprecise weights, WF = {w1f1, w2f2, w3f3, w4f4, w5f5, ..., wmfm}, as inputs to a fuzzy inference system can give more accurate sentence scores, which can lead to the creation of higher quality summaries. For this problem, we model human knowledge in the form of IF-THEN rules. We believe that the integration of soft computing techniques can be more effective than the manual way (Edmundson, 1969). By incorporating fuzzy logic with swarm intelligence, the risks, uncertainty, ambiguity and imprecise values can be flexibly tolerated. Fuzzy swarm-based text summarization (Binwahlan et al., 2009d) was implemented using the Matlab fuzzy logic toolbox, which contains the built-in Mamdani fuzzy inference method (Mamdani & Assilian, 1975). The steps of the summarization process using this method are given in Algorithm 4.

Algorithm 4. Fuzzy swarm-based text summarization method

1. Input document D: take the document D as input, D = {T, s1, s2, s3, s4, s5, ..., sn}.
2. Preprocessing: same as step 2 in Algorithm 1.
3. Features extraction: same as step 3 in Algorithm 1.
4. Feature scores modification:
   4.1 Use the optimized weights W = {w1, w2, w3, w4, w5} to adjust the feature scores F = {WSS, SC, SS_NG, sim_fsd, kwrd}.
   4.2 Get the set of modified feature scores WF = {w1 × WSS, w2 × SC, w3 × SS_NG, w4 × sim_fsd, w5 × kwrd}.


5. Sentence score calculation: calculate the score of each sentence si through the fuzzy inference system:
   5.1 Fuzzification: use the trapezoidal membership function (Eq. (15a) or (15b)) to fuzzify the crisp numerical values of the text features WF = {w1 × WSS, w2 × SC, w3 × SS_NG, w4 × sim_fsd, w5 × kwrd}.

$$A_{ij}(x_j) = \begin{cases} \dfrac{x_j - a_{ij}}{b_{ij} - a_{ij}}, & \text{if } a_{ij} < x_j < b_{ij} \\ 1, & \text{if } b_{ij} \le x_j < c_{ij} \\ \dfrac{d_{ij} - x_j}{d_{ij} - c_{ij}}, & \text{if } c_{ij} \le x_j < d_{ij} \\ 0, & \text{otherwise} \end{cases} \qquad (15a)$$

where a_ij ≤ b_ij ≤ c_ij ≤ d_ij must hold. Or, in short form:

$$A_{ij}(x_j; a_{ij}, b_{ij}, c_{ij}, d_{ij}) = \max\left(\min\left(\frac{x_j - a_{ij}}{b_{ij} - a_{ij}},\, 1,\, \frac{d_{ij} - x_j}{d_{ij} - c_{ij}}\right),\, 0\right) \qquad (15b)$$

   5.2 Inference:
      5.2.1 Take the facts resulting from the fuzzification in step 5.1 and merge them with a series of production rules (IF-THEN rules) to perform the fuzzy reasoning process.
      5.2.2 Use the trapezoidal membership function illustrated in Fig. 5 as the output fuzzy membership.
   5.3 Defuzzification:
      5.3.1 Defuzzify the fuzzy results of the inference into a crisp output using Eq. (16):

$$Z = \frac{\sum_{j=1}^{q} z_j\, u_c(z_j)}{\sum_{j=1}^{q} u_c(z_j)} \qquad (16)$$

      5.3.2 Take the value of Z as the final score of the sentence.
6. Summary generation:
   6.1 Reorder the sentences in descending order based on their scores produced by the fuzzy inference system.
   6.2 Select the top n sentences as summary sentences.
   6.3 Order the summary sentences in the same order as in the original document.

For step 5.1, the features are those presented in Section 3, with their values adjusted using the weights obtained in the training of the particle swarm optimization (PSO) explained in Section 4; this process forms the central point of the merging of fuzzy logic with swarm intelligence. To determine the degree to which the input values belong to each of the appropriate fuzzy sets, we use the trapezoidal membership function, due to its simplicity and wide use. Three fuzzy sets are used: low, medium and high. The trapezoidal membership function has four parameters (a, b, c and d), the four breakpoints of the trapezium, which determine the shape of the function. The membership function is further described by two indices i and j: for example, the membership function Aij(aij, bij, cij, dij) belongs to the ith fuzzy set and the jth input variable, and Bi(ai, bi, ci, di) is the output membership function of the ith fuzzy set. The trapezoidal curve is a function of a vector x (the jth fuzzy variable) in the ith fuzzy set and depends on the four scalar parameters a, b, c and d, as given in Eq. (15a) or (15b). The parameters a and d locate the "feet" of the trapezoid and the parameters b and c locate the "shoulders". The output of the trapezoidal membership function is a fuzzy degree of membership (in the range [0, 1]) in the fuzzy set. Fig. 4 illustrates the membership functions for fuzzification of the input value of the sentence centrality feature (SC).

Fig. 4. The trapezoidal membership functions of the sentence centrality feature (SC).
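A compact sketch of the fuzzification (Eq. (15b)) and centroid defuzzification (Eq. (16)) steps follows. The low/medium/high breakpoint values are made-up placeholders for illustration, not the paper's tuned parameters:

```python
def trapezoid(x, a, b, c, d):
    """Eq. (15b): trapezoidal membership degree of x, in [0, 1]."""
    if x <= a or x >= d:
        return 0.0
    left = (x - a) / (b - a) if b > a else 1.0
    right = (d - x) / (d - c) if d > c else 1.0
    return max(min(left, 1.0, right), 0.0)

# Hypothetical low/medium/high sets for one feature (placeholder breakpoints).
SETS = {"low": (0.0, 0.0, 0.2, 0.4),
        "medium": (0.2, 0.4, 0.6, 0.8),
        "high": (0.6, 0.8, 1.0, 1.0)}

def fuzzify(x):
    """Membership degree of a crisp feature value in each fuzzy set."""
    return {name: trapezoid(x, *p) for name, p in SETS.items()}

def centroid(zs, memberships):
    """Eq. (16): crisp output Z as the membership-weighted mean of the zs."""
    denom = sum(memberships)
    return sum(z * u for z, u in zip(zs, memberships)) / denom if denom else 0.0
```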


For step 5.2, for the inference process, around 200 IF-THEN rules were defined by human experts. Three human experts were asked to define IF-THEN rules based on their experience; we received 100 rules from the first expert, none from the second and 100 from the third, so the total number of IF-THEN rules obtained is 200. The following is an example of those rules:

If (WSS is H) and (SC is H) and (S_FD is M) and (SS_NG is H) and (KWRD is H) then (output is important)

For step 5.3, defuzzification converts the fuzzy results of the inference into a crisp output representing the final score of the sentence. We used the centroid method (Sivanandam, Sumathi, & Deepa, 2006) in Eq. (16), which returns the center (one crisp number) of the area under the curve of the output fuzzy set. In Eq. (16), uc is the membership in class c at value zj; the class c represents an output fuzzy set and zj is the value resulting from the application of a fuzzy rule, considered as a fuzzy sentence score value. After performing the calculation in Eq. (16) over all fuzzy sentence scores and their membership degrees, the final score Z of the sentence is obtained.

7. Fuzzy swarm diversity hybrid model for automatic text summarization

Our hybrid model consists of three components: the diversity-based method (presented in Section 3), the fuzzy swarm-based method (introduced in Section 6) and a third component which takes two different forms: in the first form, an integration of the swarm-based method with the diversity-based method (swarm diversity, discussed in Section 5); in the second form, the swarm-based method alone (presented in Section 4). This section discusses the combination of the three components into a hybrid model; Fig. 6a shows the proposed model in which the diversity, swarm diversity and fuzzy swarm-based methods are combined.

In this form of the model, diversity dominates the behavior of the model: a sentence selected only by the fuzzy swarm method will not be included in the final summary; for inclusion, it must also be selected by the diversity-based method, the swarm diversity-based method or both. The central part combining the three components is the sentence selector. First, each component creates its own summary of the input document, and then the three summaries are passed as input to the sentence selector, which picks the sentences with high scores. The score of a sentence in this phase depends on which methods selected it for inclusion in the summary: the scores are 1, 1.5 and 2 for the diversity-based method, the swarm diversity-based method and the fuzzy swarm-based method, respectively. The swarm diversity-based method carries a higher sentence score than the diversity-based method because it integrates two methods. The fuzzy swarm-based method carries a higher sentence score than the other two so that a sentence selected by the two diversity-based methods (score 2.5) gets a lower score than a sentence selected by one of the diversity-based methods plus the fuzzy swarm-based method (score 3 or 3.5); this ensures that a sentence selected by a diversity-based method and the non-diversity method is treated as more important than a sentence selected by the two diversity-based methods. In case all remaining sentences have equal scores, the sentences selected by the swarm diversity-based method are chosen, because this method has the advantages of the other two methods.
In the second form of the proposed model (Fig. 6b), the fuzzy swarm-based method replaces the swarm diversity-based method and the swarm-based method replaces the fuzzy swarm-based method, so the model consists of the diversity-based method, the fuzzy swarm-based method and the swarm-based method. In this structure, the diversity constraint is no longer imposed on the model behavior; the diversity-based method contributes in the same way as the fuzzy swarm-based method. Accordingly, a sentence selected by the fuzzy swarm-based method alone, or by the fuzzy swarm-based method plus the swarm-based method, can be included in the final summary if it has a high score, but a sentence selected only by the swarm-based method will not be included. The new sentence scores are 1, 1.5 and 2 for the diversity-based

Fig. 5. The trapezoidal membership function of the output.


method, the fuzzy swarm-based method and the swarm-based method, respectively. Algorithm 5 illustrates the full steps of the proposed model in the first form.

Algorithm 5. Fuzzy swarm diversity hybrid model for automatic text summarization (first form)

1. Input document D: for example, the input document D is as shown in Fig. 1a.
2. Preprocessing: same as step 2 in Algorithm 1.
3. Features extraction: same as step 3 in Algorithm 1.
4. Summary generation using the MMI diversity-based method:
   4.1 Sentence clustering and binary tree building: same as step 4 in Algorithm 1.
   4.2 Sentence order in the binary tree: same as step 5 in Algorithm 1.
   4.3 Summary generation: same as step 6 in Algorithm 1.
5. Summary generation using the swarm diversity-based method:
   5.1 Sentence clustering and binary tree building: same as step 4 in Algorithm 3.
   5.2 Sentence order in the binary tree: same as step 5 in Algorithm 3.
   5.3 Summary generation: same as step 6 in Algorithm 3.
6. Summary generation using the fuzzy swarm-based method:
   6.1 Feature scores modification: same as step 4 in Algorithm 4.
   6.2 Sentence score calculation: same as step 5 in Algorithm 4.
   6.3 Summary generation: same as step 6 in Algorithm 4.
7. Final summary generation:
   7.1 Pass the three summaries created in steps 4 ("summary 1"), 5 ("summary 2") and 6 ("summary 3") as inputs to the sentence selector.
      7.1.1 Give each sentence a score based on the methods which selected it for inclusion in the summary:
         7.1.1.1 If the sentence was selected by the diversity-based method, give it score 1, otherwise 0.
         7.1.1.2 If the sentence was selected by the swarm diversity-based method, give it score 1.5, otherwise 0.
         7.1.1.3 If the sentence was selected by the fuzzy swarm-based method, give it score 2, otherwise 0.
      7.1.2 Sum up the scores of each sentence:
         7.1.2.1 Check each sentence in each summary for the methods which selected it.
         7.1.2.2 Sum up the scores given by the methods which selected it.
   7.2 Reorder the sentences of each summary.
   7.3 Pick the top n sentences with the highest scores from summary 1 and summary 2.
   7.4 In case all remaining sentences in the two summaries have equal scores, pick the sentences selected by the swarm diversity-based method.
   7.5 Order the summary sentences in the same order as in the original document.

For step 7.3, the sentences of summary 3 are used only to help the sentences of the other two summaries get higher scores.
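A minimal sketch of the sentence selector's scoring for the first form, under our reading of step 7 (an illustration, not the authors' code): each candidate accumulates 1, 1.5 and 2 for membership in the diversity, swarm diversity and fuzzy swarm summaries, respectively, and only sentences from the two diversity-based summaries are eligible for the final summary.

```python
def select_sentences(div_sum, swarm_div_sum, fuzzy_swarm_sum, n):
    """Sentence selector for the first form of the hybrid model.

    div_sum, swarm_div_sum, fuzzy_swarm_sum: sets of candidate sentences
    produced by the three component methods. Returns the top n sentences.
    """
    scores = {}
    for s in div_sum | swarm_div_sum | fuzzy_swarm_sum:
        scores[s] = (1.0 * (s in div_sum)
                     + 1.5 * (s in swarm_div_sum)
                     + 2.0 * (s in fuzzy_swarm_sum))
    # Only sentences selected by a diversity-based method are eligible;
    # summary 3 merely boosts scores (step 7.3).
    eligible = div_sum | swarm_div_sum
    # Ties fall to swarm diversity candidates (step 7.4).
    ranked = sorted(eligible,
                    key=lambda s: (scores[s], s in swarm_div_sum),
                    reverse=True)
    return ranked[:n]
```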

8. Generalizing the proposed model results via confidence limits

The aim of generalization is to obtain one value which can express all values in the population of results. For each summary, evaluation values (recall, precision and F-measure) are created using the evaluation measure ROUGE (Lin, 2004). To measure the performance of the proposed model, we would need to check each evaluation value separately; doing so is a tough job and a waste of resources. The solution is to use the sample of results (summary evaluation values) to calculate a range within which any value in the population is 95% likely to fall (the 95% confidence interval). The minimum and maximum values of that range are called the confidence limits; the interval is all values between the confidence limits. In our study, the generalization task is accomplished by the ROUGE package (Lin, 2004), which generalizes the evaluation results using the bootstrapping (resampling) method.

9. Experimental design

The proposed model is evaluated using 100 documents selected from the first document sets (D061j, D062j, D063j, D064j, D065j, D066j, D067f, D068f, D069f, D070f, D071f, D072f and D073b) of DUC 2002. The Document Understanding Conference (DUC) data collection is a standard data set for testing any summarization method. The DUC 2002 data consist of a training set and a test set. The training set comprises 30 sets of approximately 10 documents each, together with two human-written summaries for each document; the whole set of human-written summaries was written by 10 human experts. The test set comprises 30 document sets.

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit (Lin, 2004) is used for evaluating the text summarization methods: ROUGE compares a system generated summary against a human generated summary to measure the


[Fig. 6a. Fuzzy swarm diversity hybrid model, first form, for automatic text summarization: the document passes through preprocessing (segmentation, stop words removal, stemming) and features extraction; the MMI diversity-based, swarm MMI diversity-based and fuzzy swarm-based methods each produce a summary, and the sentence selector combines them into the final summary.]

quality. ROUGE is found to be the most appropriate evaluation metric; it is the main metric in the DUC text summarization evaluations. ROUGE has the following measures: ROUGE-N (N is the n-gram length), ROUGE-L, ROUGE-W (with weighting factor α = 1.2), ROUGE-S and ROUGE-SU (maximum skip distance dskip = 1, 4 and 9). ROUGE-N calculates the n-grams shared between a system generated summary and one or a set of human generated summaries, producing a recall score:

$$\text{ROUGE-N} = \frac{\sum_{S \in \{\text{Reference summaries}\}} \sum_{gram_n \in S} count_{match}(gram_n)}{\sum_{S \in \{\text{Reference summaries}\}} \sum_{gram_n \in S} count(gram_n)} \qquad (17)$$

where n is the length of the n-gram (gramn) and countmatch is the maximum number of n-grams shared between a system generated summary and a set of reference summaries produced by humans. ROUGE-L calculates the longest common subsequence (LCS): given two sentences X and Y, the LCS is a common subsequence of maximum length. In our experiment, we use ROUGE-N (N = 1 and 2) and ROUGE-L; these measures were selected because they work well for single document summarization (Lin, 2004).

In the DUC 2002 document sets, each document set contains two model (human generated) summaries for each document, which we name H1 and H2. The human summary H2 is used as a benchmark to measure the quality of our proposed model's summaries, while the human summary H1 is used as the reference summary. Besides the human-with-human benchmark (H2–H1, i.e. H2 evaluated against H1), we also use another benchmark, the MS Word summarizer (Msword). In addition to these two benchmarks, we also compare the performance of our model with the best system (sys19) (Harabagiu et al., 2002) and the worst system (sys30) (Zajic, Dorr, & Schwartz, 2002) that participated in DUC 2002 (Nenkova, 2005).
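A bare-bones sketch of the ROUGE-N recall in Eq. (17) (our own illustration; the actual evaluation in the paper uses the ROUGE package, with bootstrapping for the confidence intervals):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(system_tokens, reference_token_lists, n=1):
    """Eq. (17): clipped n-gram matches over total reference n-grams."""
    sys_counts = ngrams(system_tokens, n)
    matched = total = 0
    for ref in reference_token_lists:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(c, sys_counts[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0
```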

10. Experimental results

Tables 1–3 compare the proposed model and the other six methods (swarm model (M1), fuzzy swarm (M2), Msword, H2–H1, sys19 and sys30) based on the average recall, precision and F-measure using ROUGE-1, ROUGE-2 and ROUGE-L, respectively, where the averages were generalized using the 95%-confidence interval. Figs. 7–9 visualize the


[Fig. 6b. Fuzzy swarm diversity hybrid model, second form, for automatic text summarization: the same pipeline as Fig. 6a, with the MMI diversity-based, fuzzy swarm-based and swarm-based methods each producing a summary that the sentence selector combines into the final summary.]

Table 1
The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-1 at the 95%-confidence interval.

Method   AVG-R     AVG-P     AVG-F
Msword   0.39306   0.48487   0.42477
Sys19    0.40259   0.50244   0.43642
Sys30    0.06705   0.68331   0.12090
M1       0.43028   0.47741   0.44669
M2       0.43622   0.49126   0.45524
M3       0.42753   0.49493   0.44947
M4       0.43962   0.49548   0.45897
H2–H1    0.49657   0.49613   0.49605

same results as Tables 1–3, respectively. The purpose of using the human summarizer (H2–H1) as a benchmark is to show how acceptable the performance of the two forms of the proposed model, the swarm model, the fuzzy swarm method, Msword, sys19 and sys30 is compared to the performance of a human (H2–H1). As abbreviations, we refer to the swarm model as M1, fuzzy swarm as M2, the first form of the proposed model as M3 and the second form of the proposed model as M4.

Based on the generalization of the results in Tables 1 and 3 for ROUGE-1 and ROUGE-L, the second form of the proposed model (M4: composed of the fuzzy swarm-based method, the swarm-based method and the diversity-based method) performed better than the first form (M3: composed of the fuzzy swarm-based method, the swarm diversity-based method and the diversity-based method). It is also better than the swarm model (M1), fuzzy swarm (M2) and the three benchmarks (Msword, Sys19 and Sys30). The results in Table 2 for ROUGE-2 show that the fuzzy swarm-based method (M2) outperforms all other methods (except H2–H1), including the proposed model. In general, the proposed model provides a good enhancement, indicating that the combination of soft computing techniques with the diversity-based method has a positive effect on performance. Although the first form of the proposed model performed worse than the second form, we consider it preferable because it takes the redundancy problem into account: the diversity dominates the behavior of the model, keeping the final summary free of redundant information.


Table 2
The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-2 at the 95%-confidence interval.

Method   AVG-R     AVG-P     AVG-F
Msword   0.16325   0.21066   0.17947
Sys19    0.18420   0.24516   0.20417
Sys30    0.03417   0.38344   0.06204
M1       0.18828   0.21622   0.19776
M2       0.19702   0.23037   0.20847
M3       0.18721   0.22930   0.20073
M4       0.19287   0.22618   0.20420
H2–H1    0.20957   0.20940   0.20938

Table 3
The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-L at the 95%-confidence interval.

Method   AVG-R     AVG-P     AVG-F
Msword   0.36605   0.45272   0.39604
Sys19    0.37233   0.46677   0.40416
Sys30    0.06536   0.66374   0.11781
M1       0.39674   0.44143   0.41221
M2       0.40144   0.45355   0.41937
M3       0.39296   0.45772   0.41387
M4       0.40463   0.45769   0.42291
H2–H1    0.46524   0.46490   0.46479

[Fig. 7. The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-1 (bar chart of the values in Table 1).]

11. Discussion

Three observations can be inferred from the experimental results. First, selecting appropriate features using particle swarm optimization (PSO) and assigning appropriate weights to them can give better performance; scoring sentences on the weight-adjusted features using fuzzy logic can also improve performance; and filtering redundant sentences using the diversity selection enhances performance further. Combining all these criteria gives better performance compared to considering individual criteria and compared to the other methods individually.

Second, the structure of the model, in terms of how its constituents were combined, played an important role in improving the performance. Based on the generalization of the results, the second form of the proposed model (M4: composed of the fuzzy swarm-based method, the swarm-based method and the diversity-based method) performed better than the first form (M3: composed of the fuzzy swarm-based method, the swarm diversity-based method and the diversity-based method). It is also better than the swarm model (M1), fuzzy swarm (M2), Msword, sys19 and sys30 summarizers. The experimental results supported the incorporation of fuzzy logic with swarm intelligence to make the risks, uncertainty, ambiguity and imprecise values for


[Fig. 8: grouped bar chart of AVG_Recall, AVG_Precision and AVG_F-measure per method; y-axis AVG (R, P and F), range 0–0.45.]

Fig. 8. The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-2.

[Fig. 9: grouped bar chart of AVG_Recall, AVG_Precision and AVG_F-measure per method; y-axis AVG (R, P and F), range 0–0.7.]

Fig. 9. The proposed model (M3 and M4), swarm model (M1), fuzzy swarm (M2), Sys19, Sys30, Msword summarizer and H2–H1 comparison: average recall, precision and F-measure using ROUGE-L.

Finally, the low overlap between the summaries generated manually by the humans made achieving high evaluation values difficult. For instance, we found that the overlap between the two human summaries (H1 and H2) used in this study is 49%. Another point to note is that the methods used in this study, including the benchmarks, create the summary by selecting 20% of the total number of sentences in the original document, and only a few of those sentences fit into the final summary because of the limited summary length. Furthermore, sentences in the human summaries are not the same length as the corresponding original sentences: the human summarizer selects a small part of each sentence for inclusion, which causes the number of sentences in a human summary to exceed the number in a system summary. This explains why the results based on average recall are lower than the results based on average precision: in the recall evaluation, the total number of n-grams shared between the human summary and the system summary is divided by the human summary length, whereas in the precision evaluation it is divided by the system summary length.
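In terms of the standard ROUGE-N definitions (Lin, 2004), this asymmetry can be written as

\[
\mathrm{ROUGE\text{-}N}_{\mathrm{recall}}
  = \frac{\sum_{g \in \mathrm{ref}} \min\bigl(c_{\mathrm{sys}}(g),\, c_{\mathrm{ref}}(g)\bigr)}
         {\sum_{g \in \mathrm{ref}} c_{\mathrm{ref}}(g)},
\qquad
\mathrm{ROUGE\text{-}N}_{\mathrm{precision}}
  = \frac{\sum_{g \in \mathrm{sys}} \min\bigl(c_{\mathrm{sys}}(g),\, c_{\mathrm{ref}}(g)\bigr)}
         {\sum_{g \in \mathrm{sys}} c_{\mathrm{sys}}(g)},
\]

where \(g\) ranges over the N-grams of the reference (human) or system summary and \(c(g)\) counts occurrences. Because the reference summaries here contain more n-grams than the system summaries, the recall denominator is larger, pulling recall below precision, consistent with Tables 1–3.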


12. Conclusion and future work

In this paper, we introduced a different hybrid model for the text summarization problem based on fuzzy logic, swarm intelligence (PSO) and diversity-based selection. The purpose of employing PSO to produce the text feature weights was to deal with the text features fairly, according to their importance. The weights suggested by PSO were used to adjust the text feature scores, which played an important role in differentiating between the more important and less important features. This adjustment of the feature values by the weights obtained from PSO training is the central point at which fuzzy logic is merged with swarm intelligence. In the fuzzy logic component, the trapezoidal membership function fuzzifies the crisp numerical values of the text features, determining the degree to which each input value belongs to the appropriate fuzzy sets; three sets are used: low, medium and high. In the inference process, the facts resulting from the fuzzification step are combined with a series of production (IF-THEN) rules to perform fuzzy reasoning. Defuzzification, which converts the fuzzy results of the inference into a crisp output representing the final sentence score, uses the centroid method. Once the fuzzy inference system has produced scores for all sentences, the sentences are reranked by score in descending order, and the top n sentences are selected as the summary, where n is determined by the compression rate.
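The following sketch illustrates this fuzzification–inference–defuzzification chain for a single sentence. It is a simplified, Mamdani-style illustration, not the authors' rule base: the published model fuzzifies each weighted feature separately and fires IF-THEN rules over their combinations, whereas here a single aggregate input keeps the example short, and all set boundaries are assumed values.

```python
import numpy as np

def trapmf(x, a, b, c, d):
    """Trapezoidal membership: rises over a..b, flat over b..c, falls over c..d."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9),
                              (d - x) / (d - c + 1e-9)), 0.0, 1.0)

# Illustrative low/medium/high fuzzy sets over [0, 1], used for input and output.
LOW, MED, HIGH = (-0.1, 0.0, 0.2, 0.4), (0.2, 0.4, 0.6, 0.8), (0.6, 0.8, 1.0, 1.1)

def fuzzy_sentence_score(features, weights, resolution=101):
    """Score one sentence from its PSO-weight-adjusted feature values (in [0, 1])."""
    x = float(np.dot(features, weights) / np.sum(weights))  # aggregate input
    # Fuzzification: firing strength of each (illustrative) rule antecedent.
    fire = {name: float(trapmf(np.array([x]), *mf)[0])
            for name, mf in (("low", LOW), ("med", MED), ("high", HIGH))}
    # Inference: clip each output set by its rule's firing strength
    # (IF input is low THEN score is low, etc.), aggregate with max.
    y = np.linspace(0.0, 1.0, resolution)
    agg = np.maximum.reduce([np.minimum(fire["low"], trapmf(y, *LOW)),
                             np.minimum(fire["med"], trapmf(y, *MED)),
                             np.minimum(fire["high"], trapmf(y, *HIGH))])
    # Defuzzification: centroid of the aggregated output set.
    return float((y * agg).sum() / agg.sum()) if agg.sum() else 0.0
```

For an aggregate input of, say, 0.72, the "high" rule fires most strongly and pulls the centroid toward the upper end of [0, 1], giving that sentence a high rank in the subsequent reordering step.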
We presented the proposed model in two forms that differ in structure: in the first form, diversity dominates the behavior of the model; in the second form it does not, and the diversity-based method works in the same way as the fuzzy swarm-based method. The results showed that the proposed model in the second form performs well, outperforming the first form, the swarm model, the fuzzy swarm method and the benchmark methods. This indicates that selecting appropriate features using PSO and assigning appropriate weights to them gives better performance, that scoring sentences based on the weight-adjusted features using fuzzy logic improves performance, and that filtering redundant sentences through diversity-based selection also helps; combining all these criteria performs better than considering any individual criterion, and better than the other three methods individually.

For future work, we plan to improve the model's performance by adding an annotator module that provides semantic annotations of the sentences using semantic role labeling and the lexical database for the English language (WordNet). The aim is to avoid including unimportant semantic concepts in the summary, since these consume the summary length and prevent important information from being included. We also plan to apply the proposed model to the multi-document summarization problem, to find out whether it gives better results there.

Acknowledgment

This project is sponsored in part by the Ministry of Science, Technology and Innovation, Malaysia, under E-Science grant 01-01-06-SF0502.

References

Alemany, A. L., & Fort, M. F. (2003). Integrating cohesion and coherence for automatic summarization. In EACL 2003 student session (pp. 1–8). Budapest: ACL.
Aretoulaki, M. (1994). Towards a hybrid abstract generation system. In International conference on new methods in language processing, Manchester (pp. 220–227).
Binwahlan, M. S., Salim, N., & Suanmali, L. (2009a). MMI diversity based text summarization. IJCSS International Journal of Computer Science and Security, 3(1), 23–33.
Binwahlan, M. S., Salim, N., & Suanmali, L. (2009b). Swarm based features selection for text summarization. IJCSNS International Journal of Computer Science and Network Security, 9(1), 175–179.
Binwahlan, M. S., Salim, N., & Suanmali, L. (2009c). Swarm based text summarization. In Proceedings of the IACSIT spring conference, April 17–20, Singapore (pp. 145–150).
Binwahlan, M. S., Salim, N., & Suanmali, L. (2009d). Fuzzy swarm based text summarization. Journal of Computer Science, 5(5), 338–346.
Binwahlan, M. S., Salim, N., & Suanmali, L. (2009e). Integrating of the diversity and swarm based methods for text summarization. In The 5th postgraduate annual research seminar (PARS), 17–19 June, Johor, Malaysia (pp. 523–527).
Braik, M., Sheta, A., & Arieqat, A. (2008). A comparison between GA and PSO in training ANN to model the TE chemical process reactor. In Proceedings of the AISB 2008 symposium on swarm intelligence algorithms and applications, 1–4 April (Vol. 11, pp. 24–30). University of Aberdeen.
Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR '98: Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval, 24–28 August, Melbourne, Australia (pp. 335–336).
Conroy, J. M., & O'Leary, D. P. (2001). Text summarization via hidden Markov models. In Proceedings of SIGIR '01, 9–12 September, New Orleans, Louisiana, USA (pp. 406–407).
Cui, X., Potok, T. E., & Palathingal, P. (2005). Document clustering using particle swarm optimization. In IEEE swarm intelligence symposium, 8–10 June, Pasadena, California (pp. 185–191).
Cunha, I. D., Fernández, S., Morales, P. V., Vivaldi, J., SanJuan, E., & Torres-Moreno, J. M. (2007). A new hybrid summarizer based on vector space model, statistical physics and linguistics. In A. Gelbukh & A. F. Kuri Morales (Eds.), MICAI 2007, LNAI 4827 (pp. 872–882). Berlin, Heidelberg: Springer-Verlag.
Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2), 264–285.
Ercan, M. F. (2008). A performance comparison of PSO and GA in scheduling hybrid flow-shops with multiprocessor tasks. In Proceedings of the 2008 ACM symposium on applied computing, SAC '08, 16–20 March, Fortaleza, Ceará, Brazil (pp. 1767–1771).
Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457–479.
Fattah, M. A., & Ren, F. (2008). GA, MR, FFNN, PNN and GMM based models for automatic text summarization. Computer Speech and Language, 23(1), 126–144.
Harabagiu, S. M., & Lacatusu, F. (2002). Generating single and multi-document summaries with GISTEXTER. In Proceedings of the workshop on text summarization, 11–12 July, Philadelphia, PA, USA.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs, NJ, USA: Prentice Hall.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks, 27 November–1 December, Perth, Australia (pp. 1942–1948).
Kennedy, J., & Eberhart, R. C. (1997). A discrete binary version of the particle swarm algorithm. In IEEE international conference on systems, man, and cybernetics: Computational cybernetics and simulation (Vol. 5, pp. 4104–4108). New York.


Kiani, A., & Akbarzadeh, M. R. (2006). Automatic text summarization using hybrid fuzzy GA–GP. In IEEE international conference on fuzzy systems, 16–21 July, Vancouver, BC, Canada (pp. 977–983).
Kiran, R., Jetti, S. R., & Venayagamoorthy, G. K. (2006). Online training of a generalized neuron with particle swarm optimization. In Proceedings of the international joint conference on neural networks, July 16–21, Vancouver, BC, Canada.
Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable document summarizer. In Proceedings of the ACM SIGIR conference, July, New York, USA (pp. 68–73).
Kyoomarsi, F., Khosravi, H., Eslami, E., Dehkordy, P. K., & Tajoddin, A. (2008). Optimizing text summarization based on fuzzy logic. In Proceedings of the seventh IEEE/ACIS international conference on computer and information science, 14–16 May (pp. 347–352). Washington, DC, USA: IEEE Computer Society. doi:10.1109/ICIS.2008.46.
Lin, C. Y. (1999). Training a selection function for extraction. In Proceedings of the eighteenth annual international ACM conference on information and knowledge management (CIKM), 2–6 November, Kansas City, Kansas (pp. 55–62).
Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Proceedings of the workshop on text summarization branches out, 42nd annual meeting of the Association for Computational Linguistics, 25–26 July, Barcelona, Spain (pp. 74–81).
Lin, C. Y., & Hovy, E. (1997). Identifying topics by position. In Proceedings of the fifth conference on applied natural language processing, March, San Francisco, CA, USA (pp. 283–290).
Lin, C. Y., & Hovy, E. (2002). Manual and automatic evaluation of summaries. In Proceedings of the ACL-02 workshop on automatic summarization, July, Morristown, NJ, USA (pp. 45–51).
Mamdani, E. H., & Assilian, S. (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man–Machine Studies, 7(1), 1–13.
Mani, I. (2001). Automatic summarization. Amsterdam: John Benjamins Publishing Company.
Merwe, V. D., & Engelbrecht, A. P. (2003). Data clustering using particle swarm optimization. In Proceedings of the IEEE congress on evolutionary computation, 8–12 December, Canberra, Australia (pp. 215–220).
Moens, M. (2007). Summarizing court decisions. Information Processing and Management, 43(6), 1748–1764.
Murad, M. A. A., & Martin, T. P. (2007). Similarity-based estimation for document summarization using fuzzy sets. International Journal of Computer Science and Security, 1(4), 1–12.
Nenkova, A. (2005). Automatic text summarization of newswire: Lessons learned from the Document Understanding Conference. In Proceedings of the American Association for Artificial Intelligence, Pittsburgh, USA.
Osborne, M. (2002). Using maximum entropy for sentence extraction. In Proceedings of the ACL'02 workshop on automatic summarization, July, Morristown, NJ, USA (pp. 1–8).
Sivanandam, S. N., Sumathi, S., & Deepa, S. N. (2006). Introduction to fuzzy logic using MATLAB (1st ed.). New York: Springer-Verlag.
Svore, K., Vanderwende, L., & Burges, C. (2007). Enhancing single-document summarization by combining RankNet and third-party sources. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, June (pp. 448–457). Prague: Association for Computational Linguistics.
Sweeney, S., Crestani, F., & Losada, D. E. (2008). 'Show me more': Incremental length summarization using novelty detection. Information Processing and Management, 44(2), 663–686. doi:10.1016/j.ipm.2007.03.012.
Wang, Z., Zhang, Q., & Zhang, D. (2007). A pseudo-based web document classification algorithm. In IEEE eighth ACIS international conference on software engineering, artificial intelligence, networking, and parallel/distributed computing, 30 July–1 August, Qingdao, China (pp. 659–664).
Xu, S., He, Y., Zhu, K., Liu, T., & Li, Y. (2008). A PSO-ANN integrated model of optimizing cut-off grade and grade of crude ore. In Proceedings of the fourth international conference on natural computation, 18–20 October (Vol. 7, pp. 275–279).
Yeh, J., Ke, H., Yang, W., & Meng, I. (2005). Text summarization using a trainable summarizer and latent semantic analysis. Information Processing and Management, 41(1), 75–95. doi:10.1016/j.ipm.2004.04.003.
Zajic, D., Dorr, B., & Schwartz, R. (2002). Automatic headline generation for newspaper stories. In Proceedings of the workshop on text summarization, 11–12 July, Philadelphia, Pennsylvania, USA.
Zha, H. (2002). Generic summarization and key phrase extraction using mutual reinforcement principle and sentence clustering. In Proceedings of the 25th ACM SIGIR, 11–15 August, Tampere, Finland (pp. 113–120).
Ziegler, C., & Skubacz, M. (2007). Content extraction from news pages using particle swarm optimization on linguistic and structural features. In IEEE/WIC/ACM international conference on web intelligence, 2–5 November, Silicon Valley, USA (pp. 242–249).