Computer assisted writing system


Expert Systems with Applications 38 (2011) 804–811


Chien-Liang Liu *, Chia-Hoang Lee, Ssu-Han Yu, Chih-Wei Chen
Department of Computer Science, 1001 University Road, Hsinchu 300, Taiwan, ROC
* Corresponding author. Tel.: +886 3 5131503; fax: +886 3 5734935. E-mail address: [email protected] (C.-L. Liu).

Keywords: Computer assisted writing; Natural language processing; Keywords

Abstract: In this paper, we design and implement a computer assisted writing system whose application domain is love letters. The system includes a text generation module, a synonym substitution module and a simile expression module. The text generation model is built from a keyword generation model and a sentence generation model. The keyword generation model extracts important keywords from the corpus, and these keywords become the backbone of the template. The sentences between keywords form the content of the template, and candidate sentences are retrieved from the corpus by statistical analysis. Synonym substitution and simile expression are two modules that enrich the content of the text: synonym terms are retrieved from the Internet, and a simile expression discovery mechanism is proposed to collect related simile expressions. The prototype system works well on the love letter domain, and the concept of this research could be extended to other domains with minor modification.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

In essence, the ability to write plays an important role in language learning. Not only can it improve users' writing skills, but it also helps them develop the ability to communicate. In recent years, essays have become a major part of formal education, and many exams expect students to be able to write. Although writing is very important, writing an article from scratch is a difficult job for many users. Writing also matters beyond school: for example, when people would like to write a love letter, writing ability helps them compose a good one. In general, reading and writing are closely related, since reading many articles gives users enough material to compose their own. Learning from examples also helps: users can take the work of a master writer and reuse its structure and patterns in their own articles. With the popularity of the Internet, the Internet has become a new knowledge source, and the concept of "Web 2.0", the business revolution in the computer industry caused by the move to the Internet as a platform, facilitates communication, information sharing, interoperability and collaboration on the World Wide Web. Many people are willing to share their ideas and works with other people through new services such as social-networking sites, video-sharing sites, Wikipedia and blogs. Therefore, the Internet can be regarded as a huge database, and many literary works can be obtained from the Internet.

Meanwhile, computers have become essential equipment, so it is appropriate to build interactive computer assisted writing tools that integrate Internet data to assist users in essay writing. In practice, enhanced word processors, spelling checkers and grammar checkers already offer assistance in the process of writing. Over the last few decades, much research has been done on spelling and grammar checkers, and these checkers have been integrated into many word processors. These tools can correct users' writing errors, but they cannot assist users in writing an article from scratch. For example, if a user would like to compose a love letter, the biggest problem is how to organize the content and how to use appropriate sentences to express his or her feelings, not spelling and grammar errors. In addition, people tend to reuse sentences or terms that have appeared in other articles. Moreover, templates can provide a framework and reduce the physical effort spent on writing, so that people can pay attention to organization and content. These observations motivate us to construct a computer assisted writing system that helps users compose a love letter; the concept could be extended to other domains with minor modification. The template content is generated by a text generation model based on statistical analysis. In essence, producing high-quality, fluent text requires understanding the meaning of the text. However, it is still infeasible to apply a natural language understanding approach to text generation, because natural language understanding would require extensive knowledge about the outside world and the ability to manipulate it. In theory, the task


of a text generation system can be characterized as mapping from some input data to an output text. Meanwhile, the job of machine translation is to render in one language the meaning expressed by a passage of text in another language. The input data of a text generation system is therefore similar to the source language in machine translation, and the text generation process is similar to the translation process. Many problems within natural language processing apply to both generation and understanding, and statistical natural language processing (Manning & Schuetze, 1999) uses stochastic, probabilistic and statistical methods to resolve some of the difficulties. A statistical approach can be applied to any specific pair of languages without linguistic rules. Rule-based or grammar-based translation systems, on the other hand, require the manual development of linguistic or grammatical rules, so these approaches are costly and cannot be transferred to other languages. The success of the statistical approach in machine translation therefore motivates us to adopt a statistical approach to text generation.

Based on this approach and the observations above, the computer assisted writing system proposed in this paper is text generation based. The text generation model builds on a keyword generation model, and a keyword extraction algorithm is proposed to discover special keywords from the text corpus. A keyword expansion model is then proposed to expand the core keywords. The expanded keywords act as the backbone of the text, and a statistical mechanism selects appropriate sentences from the corpus to fill in the content between keywords. Moreover, synonyms and simile expressions enrich the content and enhance the variety of the articles, so a synonym substitution module and a simile expression module are used to decorate the text. In the synonym substitution module, synonyms are retrieved from the Internet. In the simile expression module, we propose a simile expression discovery mechanism that adopts 14 simile terms as seeds to collect related simile expressions. The experiment shows that the expressions collected with this approach are interesting and usable. Our experience with a first system shows that the computer assisted text generation system works well and can help students develop essay writing skills by learning from examples.

The rest of the paper is structured as follows. Section 2 surveys related research on spelling and grammar checkers and on natural language generation. Section 3 describes the text generation model, which is decomposed into a keyword generation model and a sentence generation model. Section 4 presents the system architecture and design. Section 5 describes the experiment and the evaluation results. Finally, Section 6 contains the conclusion.

2. Related work


2.1. Spelling checker and grammar checker

Over the last few decades, much research has been done on spelling and grammar checkers, and these checkers have been integrated into many word processors. Genthial and Courtin (1992) proposed an architecture for a computer assisted writing system that includes morphological parsing and generation, lexical correction techniques, a syntactic parser, and document editing and exporting. Kukich (1992) focused on non-word error correction, isolated-word error correction and context-dependent word correction to correct words in text. Bustamante and Leon (1996) presented a grammar and style checker for Spanish and Greek native writers. Paggio (2000) developed a spelling and grammar corrector for Danish and addressed in particular how a form of shallow parsing is combined with error detection and correction for the treatment of context-dependent spelling errors. Although spelling and grammar checkers can help users correct spelling and grammar errors, these tools cannot help users organize and compose an article from scratch.

2.2. Text generation

In practice, a text generation system, which investigates how computer programs can be made to produce high-quality natural language text, can provide users with a template of the article and thereby help them finish it. Text generation techniques have been applied to many application domains. Goldberg, Driedger, and Kittredge (1994) proposed to generate textual weather forecasts from representations of graphical weather maps. Reiter, Mellish, and Levine (1995) proposed to use natural language generation (NLG) techniques to automatically produce technical documentation from a domain knowledge base and from linguistic and contextual models. Buchanan et al. (1995) built an intelligent explanation module that produces an interactive information sheet containing explanations in everyday language tailored to individual patients, and responds intelligently to follow-up questions about topics covered in the information sheet. Williams and Reiter (2008) proposed SkillSum, an NLG system that generates a personalized feedback report on basic skills for low-skilled readers. In the following sections, we give an overview of text generation systems that adopt different approaches.

2.2.1. Corpus-based

Langkilde and Knight (1998) introduced Nitrogen, a system that implements a new style of generation in which corpus-based n-gram statistics are used in place of deep, extensive symbolic knowledge to provide very large scale generation. However, the quality of the output is limited by the use of only bigram word statistics, which cannot handle long-distance agreement or distinguish likely collocations from unlikely grammatical structure. The Nitrogen experiments showed that corpus-based knowledge greatly reduces the need for deep, hand-crafted knowledge: this knowledge, in the form of n-gram (word-pair) frequencies, can be applied to a set of semantically related sentences to help sort the good ones from the bad ones.

2.2.2. Keyword-based

Uchimoto, Isahara, and Sekine (2002) proposed to generate sentences from "keywords" or "headwords". This model considers not only n-gram information but also dependency information between words. The construction part generates sentences in the form of dependency trees, using complementary information to replace information that is missing, so that natural sentences can be generated from a particular monolingual corpus. The evaluation part consists of a model that selects an appropriate text when given keywords.
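To make the corpus-based idea concrete, here is a minimal sketch, not Nitrogen itself, of overgenerate-and-rank with bigram counts; the toy corpus, the candidate realizations and the smoothing constant are invented for illustration.

```python
from collections import Counter

def bigrams(tokens):
    """Consecutive token pairs of a sentence."""
    return list(zip(tokens, tokens[1:]))

# Toy corpus standing in for the large n-gram resource a real system would use.
corpus = [
    "i miss you so much".split(),
    "i want to be with you".split(),
    "you are always in my heart".split(),
]
unigram_counts = Counter(tok for sent in corpus for tok in sent)
bigram_counts = Counter(pair for sent in corpus for pair in bigrams(sent))

def fluency(tokens, alpha=0.5):
    """Product of smoothed bigram probabilities; higher means more corpus-like."""
    score, vocab = 1.0, len(unigram_counts)
    for w1, w2 in bigrams(tokens):
        score *= (bigram_counts[(w1, w2)] + alpha) / (unigram_counts[w1] + alpha * vocab)
    return score

# Overgenerate several candidate realizations, then rank and keep the best one.
candidates = [
    "i you miss much so".split(),
    "i miss you so much".split(),
    "much so you miss i".split(),
]
print(" ".join(max(candidates, key=fluency)))  # prints the well-ordered candidate
```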

2.2.3. Template-based

In practice, some simple approaches such as canned text systems and template systems can be used to generate high-quality text. Canned text systems can be used to produce error messages, warnings, letters, and so on. Template systems can be used when a message must be produced several times with slight alterations; they are often adopted for form letters, in which a few open fields are filled in specified, constrained ways. The template approach is used mainly for multi-sentence generation, particularly in applications whose texts are fairly regular in structure. Templates only work in very controlled or limited situations: they cannot provide the expressiveness, flexibility or scalability that many real domains need (Langkilde & Knight, 1998).
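For contrast with the statistical model developed in Section 3, the template style can be sketched in a few lines; the template string and field values below are invented examples, not part of the paper's system.

```python
# A form letter with a few open fields that are filled in constrained ways.
TEMPLATE = (
    "Dear {receiver},\n"
    "Ever since we met at {place}, I have thought of you {frequency}.\n"
    "Yours, {sender}"
)

def render(**fields):
    """Fill the fixed template; expressiveness is limited to the open slots."""
    return TEMPLATE.format(**fields)

print(render(receiver="Alice", place="the library",
             frequency="every single day", sender="Bob"))
```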


3. Text generation model

The job of machine translation is to render in one language the meaning expressed by a passage of text in another language. Statistical machine translation is a machine translation paradigm in which translations are generated from statistical models whose parameters are derived from the analysis of bilingual text corpora. Meanwhile, the task of a natural language generation system can be characterized as mapping from some input data to an output text. The input data of a text generation system is therefore similar to the source language in machine translation, and the text generation process is similar to the translation process. In machine translation, the statistical approach has been widely used and has shown its capability. Essentially, a statistical approach can be applied to any specific pair of languages without linguistic rules, whereas rule-based or grammar-based translation systems require the manual development of linguistic or grammatical rules, which is costly and does not transfer to other languages. The ideas behind statistical machine translation come out of information theory. Given a French string f, the job of the translation system is to find the string e that the native speaker had in mind when he produced f. In other words, the translation process can be characterized using Bayes' theorem, as shown in Eq. (1) (Brown, Pietra, Pietra, & Mercer, 1993).

\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e \frac{P(e)\,P(f \mid e)}{P(f)} \qquad (1)

Since the denominator is independent of e, finding ê is the same as finding the e that maximizes the product P(e)P(f | e), and Eq. (2) shows this fundamental equation of machine translation.

\hat{e} = \arg\max_e P(e)\,P(f \mid e) \qquad (2)

According to our observation, when a student wants to write an essay about a specific subject, he or she tends to start from the concepts related to that subject. In addition to the key concepts, people also tend to reuse sentences or terms that have appeared in other articles. Motivated by these observations and by statistical machine translation, text generation in this paper adopts the model shown in Eq. (3), where T and K represent text and keywords, respectively. P(T | K) is a text generation model indicating that text T is generated given a set of keywords K. As in machine translation, this model can be rewritten as Eq. (4), where P(K | T) is a keyword production model and P(T) is a language model. The keyword production model outputs the main keywords given the text, and the sentence generation model is used as the language model in this paper.

\hat{T} = \arg\max_T P(T \mid K) = \arg\max_T \frac{P(T)\,P(K \mid T)}{P(K)} \qquad (3)

\hat{T} = \arg\max_T P(T)\,P(K \mid T) \qquad (4)
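To make Eq. (4) concrete, the following sketch ranks a few candidate texts by log P(T) + log P(K | T) using toy scoring functions; the candidate texts, the keyword set and the probabilities are illustrative assumptions, not the models actually trained in the paper.

```python
import math

# Candidate texts T (drafted elsewhere) and the keyword set K supplied by the user.
candidates = [
    "i miss you and think of you every day",
    "the weather is nice today",
    "i miss you",
]
keywords = {"miss", "you"}

def log_p_t(text):
    """Toy language model P(T): a mild per-token penalty stands in for
    the sentence generation model estimated from the corpus."""
    return -0.1 * len(text.split())

def log_p_k_given_t(text, kws):
    """Toy keyword production model P(K | T): keyword coverage with a
    heavy penalty for keywords the text fails to produce."""
    tokens = set(text.split())
    covered = len(kws & tokens)
    missed = len(kws) - covered
    return covered * math.log(0.9) + missed * math.log(0.05)

# Eq. (4): choose the text T maximizing log P(T) + log P(K | T).
best = max(candidates, key=lambda t: log_p_t(t) + log_p_k_given_t(t, keywords))
print(best)  # "i miss you": covers the keywords with the least unexplained text
```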

In this paper, we propose a text generation model and develop a text generation system that uses it. Based on the above equations, the text generation model has two parts: a keyword generation model and a sentence generation model, which are described in the following sections.

4. System architecture

Fig. 1 shows the text generation system flow, which includes a keyword extraction module, a keyword expansion module, a candidate sentence selection module and a text generation module.

Fig. 1. System flow.

The keyword extraction module extracts candidate keywords and core keywords. Core keywords come from the special keywords and become the input of the system. In this paper, a modified strict phrase likelihood ratio (SPLR) algorithm is proposed to retrieve the special keywords of a text. These special keywords provide an overview of the template, and the user chooses an appropriate template based on them. Meanwhile, the candidate keywords are used to construct the skeleton of the text in the keyword expansion module, and appropriate sentences are selected to complete the content of the text. Moreover, synonym terms and simile expressions are adopted to enrich the content. The process and structure of each module are described in the following sections.

4.1. Keyword extraction

Fig. 2 shows the keyword extraction flow, which includes term segmentation, term extraction and a special keyword scoring mechanism.

Fig. 2. Keyword extraction flow.

Unlike English, Chinese does not use spaces as word boundaries, so Chinese word segmentation is required at this stage. In this paper, the maximum matching method, which repeatedly extracts the longest meaningful substring, is used to perform Chinese word segmentation. In practice, the noun and verb terms of an article roughly represent its meaning, so noun and verb terms are selected as the candidate keyword list in the term extraction step. Meanwhile, the computer assisted writing system in this paper is keyword based, and the user determines the template from the keyword list. However, the noun and verb terms extracted from articles of the same domain tend to be similar, so it is difficult for users to differentiate template content from similar keyword lists. Thus, in addition to the keyword list extracted from noun and verb terms, special terms are retrieved from the articles and become the input of the system; the user determines the template content from these special terms.

Chang and Lee (2003) presented a strict phrase likelihood ratio (SPLR) approach to extract Chinese unknown words more efficiently and precisely. In practice, these unknown words can be used to differentiate articles because of their rareness. A modified SPLR approach is proposed in this paper to calculate term scores, and the higher-scoring terms are selected as candidates for the core keyword list. Eq. (5) shows the computation model.

\mathrm{SPLR}(KW_i) = \frac{tf(KW_i)}{\max\bigl(tf(KW_{i\_L}),\, tf(KW_{i\_R})\bigr)}, \quad KW_{i\_len} > 1 \qquad (5)

In Eq. (5), tf(KW_i), tf(KW_{i_L}) and tf(KW_{i_R}) represent the term frequency of the keyword, of its left-hand part and of its right-hand part, respectively. Since a single Chinese character rarely expresses an important concept on its own, only terms with a word length greater than 1 are taken into account. As shown in Fig. 3, the SPLR score of the keyword "the president of student association" can be obtained from this frequency information.

Fig. 3. SPLR scoring example.

In the modified SPLR computation model, a normalization step maps the scores into the range from 0 to 1. The keywords with the five highest scores become the core keywords; if fewer than five keywords are available, candidate keywords are selected as core keywords. The benefit of core keywords is that users can get a rough idea of the final content from them.
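A minimal sketch of the two steps just described: forward maximum matching segmentation and the SPLR score of Eq. (5). The word-level dictionary and the frequency counts are toy stand-ins for the Chinese lexicon and corpus, and reading KW_i_L and KW_i_R as the keyword with its last or first unit removed is an assumption about the notation.

```python
from collections import Counter

# --- Maximum matching: repeatedly take the longest dictionary entry. ---
# A word-level dictionary stands in for the Chinese lexicon used by the system.
DICTIONARY = {"student", "association", "president", "student association"}
MAX_SPAN = 2  # length of the longest dictionary entry, in tokens

def max_match(tokens):
    """Greedy longest-match segmentation (the maximum matching method)."""
    segments, i = [], 0
    while i < len(tokens):
        for span in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            piece = " ".join(tokens[i:i + span])
            if span == 1 or piece in DICTIONARY:
                segments.append(piece)
                i += span
                break
    return segments

print(max_match("the president of student association".split()))
# ['the', 'president', 'of', 'student association']

# --- Modified SPLR of Eq. (5) over term frequencies from the segmented corpus. ---
def splr(kw_tokens, tf):
    """tf(KW) / max(tf(KW_L), tf(KW_R)); single-unit keywords are skipped."""
    if len(kw_tokens) < 2:
        return 0.0
    kw = " ".join(kw_tokens)
    left = " ".join(kw_tokens[:-1])   # keyword with its last unit removed
    right = " ".join(kw_tokens[1:])   # keyword with its first unit removed
    return tf[kw] / max(tf[left], tf[right], 1)

tf = Counter({"president of student association": 12,
              "president of student": 15,
              "of student association": 13})
print(round(splr("president of student association".split(), tf), 2))
# 0.8: the full phrase occurs almost whenever its parts do, so it is kept
```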

4.2. Keyword expansion

As described above, noun and verb terms are selected as the candidate keyword list, and the modified SPLR computation model is used to choose the core keywords. In practice, the core keywords alone are not enough to generate an article, so a keyword expansion process is adopted to expand them; Fig. 4 shows the keyword expansion model. In Fig. 4, W1, W2, ..., W10 represent the candidate keywords, and W3 and W8 represent core keywords. The expansion model starts from the core keywords and includes neighbouring keywords as text generation keywords. If all candidate keywords were included in the text generation keyword set, the variation of the final text would be limited; on the other hand, if the number of keywords is too small, it is not easy to find appropriate sentences to bridge the gaps between them. In the expansion model, the two keywords before and the two keywords after each core keyword are selected as generation keywords. Thus, as shown in Fig. 4, W1, W2, W4 and W5 are expanded from W3, and W6, W7, W9 and W10 are expanded from W8. If the expanded sets overlap, only one copy of each overlapping keyword is kept. Hence, as shown in Fig. 5, keyword W5 is an overlapping keyword and the final generation keywords are W1 through W9. Based on the core keyword generation model and the keyword expansion model, the user determines the text to generate from the core keywords, and the system generates text from the expanded keywords.

4.3. Candidate sentence selection and text generation

In an article, the keywords are like a backbone, and the sentences between keywords make up the content. Once the expanded keywords have been extracted by the above process, appropriate sentences between each pair of adjacent keywords must be determined as well. Eq. (6) shows how candidate sentences between two keywords are selected:

\mathrm{Candidate}(u) \supseteq \bigl\{ u \in \mathrm{Unit}(i,j) \;\big|\; |\mathit{Pre\_KW} \cap \mathrm{Word}(i,j) \cap \mathit{Next\_KW}| > 0,\; \mathrm{Word}(i,j)\_{len} \le \mathit{Threshold} \bigr\} \qquad (6)

where i is the index of the article, j is the position index within article i, Pre_KW is the previous keyword, Next_KW is the next keyword, Unit(i, j) is the sentence unit that appears at the jth position of the ith article, Word(i, j) is the text that appears between Pre_KW and Next_KW in the corpus, and Word(i, j)_len is the length of Word(i, j).
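The expansion window and the test of Eq. (6) can be sketched as follows. The window of two keywords on each side and the overlap merge follow the description above; treating each article as a flat token list and applying a token-level length threshold are simplifications made for the example.

```python
def expand_keywords(candidates, core, window=2):
    """Take `window` keywords on each side of every core keyword; merge overlaps."""
    chosen = set()
    for kw in core:
        idx = candidates.index(kw)
        chosen.update(range(max(0, idx - window), min(len(candidates), idx + window + 1)))
    return [candidates[i] for i in sorted(chosen)]

candidates = ["W1", "W2", "W3", "W4", "W5", "W6", "W7", "W8", "W9", "W10"]
print(expand_keywords(candidates, core=["W3", "W7"]))
# ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8', 'W9']: the windows meet at W5
# and W10 is left out (cf. Fig. 5)

def candidate_sentences(articles, pre_kw, next_kw, threshold=12):
    """Eq. (6): keep the text between pre_kw and next_kw when it is short enough.
    Each article is a flat token list; i indexes the article, j the position."""
    hits = []
    for tokens in articles:
        if pre_kw in tokens and next_kw in tokens:
            start, end = tokens.index(pre_kw) + 1, tokens.index(next_kw)
            if 0 < end - start <= threshold:
                hits.append(" ".join(tokens[start:end]))
    return hits

articles = ["you are so sensitive that you always avoid the spotlight".split()]
print(candidate_sentences(articles, "sensitive", "spotlight"))
# ['that you always avoid the']
```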


Fig. 4. Keyword expansion model.

Fig. 5. Keyword expansion model with overlapped keywords.

Fig. 6. Sentence selection from corpus based on keywords.

In Eq. (6), the sentences that appear between two keywords and whose lengths are below the threshold value are selected as candidate sentences. Fig. 6 shows that the sentences appearing between "sensitive" and "spotlight" become candidate sentences. From this process a candidate sentence list is obtained, and these candidate sentences can be mixed with the keywords to generate different texts. The main goal of the system is to provide a draft version of the text for the user; in addition, users can improve their writing skills by learning from examples. Fig. 7 shows a text generated by the system, and a rough English translation of its content is given in the Appendix. The content generated by the system provides a reference template, and the user can modify this love letter to meet his or her requirements.

4.4. Synonym substitution and simile expression

In essence, synonyms play an important role in essay writing. Different terms with similar meanings add variety to the content, and it is better to vary the wording within one article. Thus, the system provides a synonym substitution mechanism to enrich the content. Fig. 8 shows the synonym extraction flow. Taking love letter generation as an example, the data sources are the love letter corpus and the Sinica corpus.[1] As described above, the Chinese text is segmented first. The noun and verb terms are collected as the term set, and these terms are sent to the Sinica "Image Reflection Lake" system[2] to obtain their synonyms. Duplicate terms are filtered out first; the total number of noun and verb terms is 80,972. The terms in the synonym sets can then be used to replace terms with similar meanings. Fig. 9 shows a screenshot of the synonym substitution scenario, where different terms can be used to express "deeply".

Generally speaking, a simile is an expression that describes something by comparing it with something else.

[1] http://www.sinica.edu.tw/SinicaCorpus/.
[2] http://www.sinica.edu.tw/wen/Dictionary/sym-asym-demo.html.
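The substitution step itself is straightforward once a synonym resource exists. The sketch below uses a small local synonym dictionary as a stand-in for the Sinica synonym service; the entries and the example sentence are invented.

```python
# A small local stand-in for the synonym resource built from the noun/verb term set.
SYNONYMS = {
    "deeply": ["profoundly", "intensely"],
    "brave": ["courageous", "fearless"],
}

def suggest_substitutions(tokens):
    """For every token with known synonyms, list the alternatives for the user to pick."""
    return {tok: SYNONYMS[tok] for tok in tokens if tok in SYNONYMS}

sentence = "i miss you deeply and i will be brave".split()
print(suggest_substitutions(sentence))
# {'deeply': ['profoundly', 'intensely'], 'brave': ['courageous', 'fearless']}
```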

Fig. 7. Love letter generation result.

With the help of a simile, the content of the text becomes more vivid. For example, "as white as snow" describes the colour white by comparing it with snow. In English, a simile is a figure of speech comparing two different things, often introduced with the word "like" or "as". It is therefore important to find the terms that typically appear in simile expressions. In this paper, we conducted a survey, and Fig. 10 shows the Chinese terms that commonly introduce simile expressions. All of these terms are used as seeds, and the text appearing after each seed is collected as a simile expression. As shown in Fig. 11, the sentences following the seeds can be used as simile expressions; in this way, simile expressions related to "smile", "will" and "freedom" are obtained and can be used to enrich the text. Fig. 12 shows the simile expression scenario, where "brave" can be replaced by "as brave as a warrior".
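A sketch of the seed-based collection just described: scan the corpus for simile seed markers and keep the text that follows each one. English seed words stand in for the 14 Chinese seed terms of Fig. 10, and the corpus lines are invented.

```python
import re

# English stand-ins for the Chinese simile seed terms of Fig. 10.
SEEDS = ["as if", "as though", "like"]
SEED_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(s) for s in SEEDS) + r")\b\s+(.+)")

def collect_similes(sentences):
    """Keep the text that follows a seed term as a candidate simile expression."""
    expressions = []
    for sent in sentences:
        match = SEED_PATTERN.search(sent)
        if match:
            expressions.append(match.group(1).strip(" ."))
    return expressions

corpus = [
    "her smile is like sunshine after the rain.",
    "he fought as though nothing could stop him.",
    "the report is due on friday.",
]
print(collect_similes(corpus))
# ['sunshine after the rain', 'nothing could stop him']
```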


Fig. 8. Synonym extraction flow.

Fig. 9. Synonym substitution screen shot.

5. Experiment and result

5.1. Data set

The prototype system developed in this paper is applied to the love letter domain. The corpus consists of 446 love letters collected from the Internet. In addition to the application domain corpus, the Sinica corpus is adopted to enlarge the corpus. Based on this corpus, this paper proposes and develops a computer assisted writing system.

Fig. 11. Simile expression examples.

5.2. System design

As shown in Fig. 13, the input of the text generation system is a keyword list. Each keyword list contains at most five keywords, and these keywords give the user a rough idea of the final content. Moreover, the system allows users to generate text with short or long content. For a love letter, users can provide the receiver's name and the sender's name, and the system uses this name information in the text. Fig. 14 shows a text generated by the system. As described above, the goal of the system is to provide a computer assisted writing tool, so users can adjust the content according to their specific requirements.

Moreover, synonym substitution and simile expression are two important modules that make the article more interesting and enrich its content. The synonym substitution module focuses on noun and verb terms, and terms with similar meanings are stored in a database for reference. As shown in Fig. 15, in the sentence "I want to be with you bravely", "bravely" is similar to "courageously", so these two terms can be interchanged. The system offers terms with similar meanings and the user chooses the best one. The simile expression module likewise provides simile expressions to enrich the content; for example, the term "courageous" can be enriched to "as courageous as a brave warrior". Fig. 16 shows the final text after synonym substitution and simile expression have been applied. The system also allows users to define their own synonym terms and simile expressions, so it can obtain feedback data from users and thereby enrich the corpus.
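As a rough illustration of how the pieces fit together at generation time, the sketch below interleaves expanded keywords with the bridge sentences selected between them and adds the user-supplied names; the greeting, closing and glue logic are assumptions for illustration only, not the system's actual assembly code.

```python
def assemble_letter(keywords, bridges, receiver, sender):
    """Interleave the expanded keywords with the sentence fragments selected
    between each adjacent keyword pair, then wrap them with the user's names."""
    parts = []
    for i, kw in enumerate(keywords):
        parts.append(kw)
        if i < len(bridges):          # bridge chosen for the pair (kw, next keyword)
            parts.append(bridges[i])
    return f"Dear {receiver},\n" + " ".join(parts) + f"\nYours, {sender}"

keywords = ["miss", "brave", "together"]
bridges = ["you so much that i want to be", "enough to stay"]
print(assemble_letter(keywords, bridges, receiver="Alice", sender="Bob"))
# Dear Alice,
# miss you so much that i want to be brave enough to stay together
# Yours, Bob
```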

Fig. 10. Simile expression seeds.

Fig. 12. Simile expression screen shot.

Fig. 13. Keyword list selection.


Fig. 14. Love letter generated by the system.

Fig. 16. System screen shot.

Table 1
Evaluation result.

                                  Readability   Relevance   Rhetoric
Average score                     2.925         3.575       3.075
Proportion of scores >= 3 (%)     63.5          87.0        73.0

Fig. 15. Synonym replacement and simile expression functions.

5.3. Evaluation

The evaluation of this system covers readability, relevance and rhetoric. Ten people, nine males and one female, were invited to try the system and asked to give a score for each item; scores range from 1 to 5, with 5 the highest. Each user reviewed 20 love letters generated by the system and evaluated their readability, relevance and rhetorical expression. Table 1 reports the average score and the proportion of scores of at least 3 for each item. The average score for every item is above 3, which suggests that the computer assisted writing system can indeed help users write a love letter.

In essence, producing high-quality, fluent text requires understanding the meaning of the text. However, it is still infeasible to apply a natural language understanding approach to text generation, because such an approach would require extensive knowledge about the outside world and the ability to manipulate it. In this paper, special keywords provide a clue about the template, and a statistical approach is adopted to choose appropriate sentences between keywords. Meanwhile, synonym terms and simile expressions enrich the content, and the system provides enough flexibility for users to alter the content to meet their requirements.

6. Conclusion

A computer assisted writing system, which includes text generation, synonym substitution and simile expression suggestion, is presented in this paper. The text generation model is built from a keyword generation model and a sentence generation model. The keyword generation model extracts important keywords from the corpus, and these keywords become the backbone of the template; the sentences between keywords form the content of the template, and candidate sentences are retrieved from the corpus by statistical analysis. The benefit of this approach is that the template provides a framework and reduces the physical effort spent on writing, so that people can pay attention to organization and content. In addition, people tend to reuse sentences or terms that have appeared in other articles, so the candidate sentences coming from the corpus provide more material for users to compose a love letter. Moreover, the synonym substitution module and the simile expression module both enrich the content of the articles and enhance their variety. A simile expression discovery scheme is proposed to obtain simile expressions, and it works well according to the experiment. The prototype system works well on the love letter domain, and the concept of this research can be extended to other domains with minor modification.

Acknowledgment

This work was supported in part by the National Science Council under Grants NSC-97-2221-E-009-135 and NSC-97-2811-E-009-019.


Appendix

Rough translation of the love letter in Fig. 7.

I am so scared, because I have never written a love letter before and I do not know how to let you know my heart. Somehow it feels like losing you before completely having you. I truly feel blessed that you have become a part of my life, and I cannot wait for the day that we can join our lives together. I have found someone with whom I can share my feelings. I cry with her and perceive her nervousness. She is only a substitute for you, and I wish you were her. I really like you, and how could I let you know I miss you so much? I am suffering from the pain you left. Even though we have had some quarrels, happiness still exists in my heart and I still miss the time with you.

References

Brown, P. E., Pietra, V. J. D., Pietra, S. A. D., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19, 263–311.

Buchanan, B. G., Moore, J. D., Forsythe, D. E., Carenini, G., Ohlsson, S., & Banks, G. (1995). An intelligent interactive system for delivering individualized information to patients. Artificial Intelligence in Medicine, 7(2), 117–154.

Bustamante, F. R., & Leon, F. S. (1996). GramCheck: A grammar and style checker. In Proceedings of the international conference on computational linguistics (COLING-96) (pp. 175–181).


Chang, T.-H., & Lee, C.-H. (2003). Automatic Chinese unknown word extraction using small-corpus-based method. In Proceedings of the international conference on natural language processing and knowledge engineering (pp. 459–464).

Genthial, D., & Courtin, J. (1992). From detection/correction to computer aided writing. In Proceedings of the 14th conference on computational linguistics (pp. 1013–1018). Association for Computational Linguistics.

Goldberg, E., Driedger, N., & Kittredge, R. I. (1994). Using natural-language processing to produce weather forecasts. IEEE Expert: Intelligent Systems and Their Applications, 9(2), 45–53.

Kukich, K. (1992). Technique for automatically correcting words in text. ACM Computing Surveys, 24, 377–439.

Langkilde, I., & Knight, K. (1998). Generation that exploits corpus-based statistical knowledge. In ACL-36: Proceedings of the 36th annual meeting of the Association for Computational Linguistics and 17th international conference on computational linguistics (pp. 704–710). Morristown, NJ, USA: Association for Computational Linguistics.

Manning, C. D., & Schuetze, H. (1999). Foundations of statistical natural language processing. The MIT Press.

Paggio, P. (2000). Spelling and grammar correction for Danish in SCARRIE. In Proceedings of the sixth conference on applied natural language processing (pp. 255–261).

Reiter, E., Mellish, C., & Levine, J. (1995). Automatic generation of technical documentation. Applied Artificial Intelligence, 9(3), 259–287.

Uchimoto, K., Isahara, H., & Sekine, S. (2002). Text generation from keywords. In Proceedings of the 19th international conference on computational linguistics (pp. 1–7).

Williams, S., & Reiter, E. (2008). Generating basic skills reports for low-skilled readers. Natural Language Engineering, 14, 495–525.