Improving knowledge of patient skills thanks to automatic analysis of online discussions



Patient Education and Counseling 92 (2013) 197–204

Contents lists available at SciVerse ScienceDirect

Patient Education and Counseling journal homepage: www.elsevier.com/locate/pateducou

Thierry Hamon a,*, Rémi Gagnayre b

a Laboratory of Medical Informatics and BioInformatics (EA3969), University Paris 13, Bobigny, France
b Laboratory of Pedagogy of the Health (EA3412), University Paris 13, Bobigny, France

ARTICLE INFO

Article history:
Received 27 July 2012
Received in revised form 15 May 2013
Accepted 17 May 2013

Keywords:
Natural Language Processing
Text mining
Patient skills
Online discussion fora
Taxonomy of patient skills
Therapeutic patient education

ABSTRACT

Objective: Automatically analyze the online discussions related to diabetes and extract information on patient skills for managing this disease.
Methods: Two collections of about 7000 and 23,000 messages from online discussion fora and 174 skills from an available taxonomy are processed with Natural Language Processing methods and semantically enriched. Skills are projected on the messages to detect those skills which are mentioned by patients. Quantitative and qualitative evaluation is performed.
Results: The method recognizes almost all the aimed skills in fora. The quality of the skills’ recognition varies with the method’s parameters. Most of the selected messages are relevant to at least one of the associated skills. Manual analysis shows a substantial number of messages is dedicated to daily self-care and psychosocial skills.
Conclusion: Study of real exchanges between patients leads to a better understanding of their skills in daily self-management of diabetes.
Practice implications: Our experiments can be useful for a better understanding and better knowledge of self-management of diseases by patients. They can also refine existing patient education programs.

© 2013 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

For several years now, the Internet has been the main source used by patients to find medical knowledge [1,2]. Different kinds of medical documents are accessible on the Internet (web pages, online discussion fora, support group websites, news articles, scientific papers, etc.) and these can help patients to understand and manage their diseases [3–6]. We assume online discussion fora (ODFs) are a precious source of information about exchanges between patients. It is therefore crucial to analyze them in order to identify the skills patients put into practice and the patients’ needs regarding self-management of their diseases.

1.1. Medical context

French legislation establishes therapeutic patient education (TPE) in the French healthcare system (Hospital, Patient, Health and Territories law). Moreover, healthcare professionals must help patients to acquire and to maintain their skills for a better management of chronic diseases. Thus, designing educational contents and pedagogical methods requires a better understanding of the daily requirements and current skills of patients [7].

Diabetes mellitus (DM) is a chronic disease for which one of the recurrent problems is how well patients understand it. This is particularly true for type 1 diabetes mellitus (DM1), which appears at the early stages of life. TPE aims at helping patients with the management of insulin dosage, glycemia, hyperglycemia and hypoglycemia in various contexts (sport, diet, work, etc.). Diabetes is the first priority in France: the first TPE content addressed diabetes, and currently 30% of the 2660 TPE programs authorized by the French Regional Health Agencies are related to diabetes [8,9].

1.2. Background and objectives

* Corresponding author at: Laboratoire d’Informatique Médicale et de BioInformatique (EA3969), Université Paris 13, 74 rue Marcel Cachin, 93017 Bobigny Cedex, France. Tel.: +33 148387307; fax: +33 148387355. E-mail address: [email protected] (T. Hamon).

Research on TPE mainly focuses on creating documents adapted to patients. These are usually derived from texts written for physicians [10]. Such an approach requires control of their accessibility [11], readability [12–14] and relevance [15]. Before being able to create such documents, physicians have to first

0738-3991/$ – see front matter © 2013 Elsevier Ireland Ltd. All rights reserved. http://dx.doi.org/10.1016/j.pec.2013.05.012


T. Hamon, R. Gagnayre / Patient Education and Counseling 92 (2013) 197–204

correctly understand what the patients’ real needs are. To avoid bias which may be introduced by physicians, surveys of patients’ needs should rely on the analysis of patients’ own discussions, such as those from ODFs. Indeed, the Internet has become an essential means for patients in health management and support [16,17]. It is also noteworthy that a real public health area has appeared on the Internet [18]. There, patients can exchange information among themselves and help each other in facing complex situations and in understanding and managing their pathology.

Development of ODFs, blogs and social networks leads to several benefits. First of all, it helps patient empowerment by enabling patients to look up and share information about their disease [2,19,20]. Moreover, it motivates research on patient education and the study of patient skills [21,22]. ODFs are becoming an important source for understanding other aspects: patient behavior [18,23], dialogue between doctors and patients [24,25], improvement of education programs [26], interaction among patients [27], rules underlying the organization of patient exchanges [28,29], differentiation among patient profiles and motivations [18,30] and information trustworthiness in ODFs [28,31–33]. Besides, patient vocabulary has been compared with the terms from the Unified Medical Language System (UMLS) [34], which is a terminology collection traditionally used in the medical domain. The experiment shows that they have only a small overlap and that an effort should be made to define a patient knowledge model in order to improve dialogue between physicians and patients [35]. In another work, exchanges between patients and physicians have been characterized with a taxonomy [36], which was helpful to identify patients’ skills and to improve communication.

In an education context, ODFs are generally analyzed manually: to evaluate the model designed for improving patient communication skills [37], to assess the impact of patients’ empowerment [38], and to analyze online discussions dedicated to pregnancy problems [39]. When performed automatically, such analysis is usually based on statistical or machine learning approaches [40,41] in contexts related to information retrieval: identifying fora which propose the most relevant information [42], identifying topics of patients’ discussions [43], etc.

In our work, we propose to automatically analyze ODFs dedicated to diabetes. We aim to identify patients’ skills and to map these skills to an existing taxonomy [44–46]. This taxonomy has been created manually and describes the pedagogical content of training programs dedicated to diabetic patients. It relies on the principles of constructivism [47] and of the pedagogy of integration [48]. Its content has been defined thanks to recommendations on the skills patients should have. For instance, self-care skills cover decisions related to the management of treatments and their consequences, while adaptation skills describe life changes possibly induced by chronic diseases and their management [46,49,50]. A special place is given to metacognitive skills because they play a crucial role in self-confidence, self-evaluation during the learning process [51] and acquisition of other skills [52] (see Section 2.1.2). Previously, this taxonomy was used as a framework for skill inference from inductive analysis of ODFs [53], during which some difficulties appeared [54]:
• When patients mention their skills in the ODFs, they do not always use the taxonomy terms;
• Some skills mentioned by patients are not described in the taxonomy;
• While some skills are very frequent, other skills are missing in ODFs.

In previous manual work, two thirds of the processed ODF messages were associated with this taxonomy [55]. The observations remain incomplete and inconclusive because only a few messages were analyzed. Our current objective is to go beyond these studies and to perform a systematic survey of the messages posted on French ODFs. We propose to use Natural Language Processing (NLP) approaches to semantically code and analyze messages and patients’ skills. We perform quantitative and qualitative analyses to obtain a better knowledge of patients’ skills.

2. Methods

2.1. Material

2.1.1. Corpora of forum discussion

We chose the two most active French-language ODFs on diabetes (messages posted before June 2011):
• the Les diabétiques forum (http://www.lesdiabetiques.com/modules.php?name=Forums) contains 839 threads and 6982 messages (624,571 words);
• the Diabetes Doctissimo forum (http://forum.doctissimo.fr/sante/diabete/liste_sujet-1.htm) contains 22,532 threads and 560,066 messages (35,059,868 words).

2.1.2. Skill taxonomy

174 patient skills have been grouped into nine categories (Fig. 1) and are hierarchically structured [46]. Most of the skills are verbal phrases which briefly describe the skill, as in Fig. 2 (each level specifies its upper level).

2.1.3. Linguistic resources

Because the vocabulary used in the skill taxonomy differs from patient vocabulary [35,54], we expand its coverage with “vague” linguistic resources providing synonymy and associative relations. We assume that a certain degree of vagueness in the expansion may be useful given the different expertise levels of contributors:
• the French general dictionary Le Robert [56], which provides 149,309 synonyms for 48,859 words and may contain fuzzy see-also relations;
• three distributional resources:
– FreDist (http://fredist.gforge.inria.fr/) [57], built from the newspaper L’Est Républicain and the French version of Wikipedia. It contains 24,749 words (nouns, adjectives, adverbs and verbs) and their 1,853,475 semantic neighbors;
– Les voisins de Le Monde (VdLM) (http://redac.univ-tlse2.fr/applications/vdlm.html) [58], built from the newspaper Le Monde. It contains 145,164 words (nouns, adjectives, phrases and prepositions) with 2,762,739 semantic neighbors;

Fig. 1. List of skill categories.


– Les voisins de Wikipedia (VdW) (http://redac.univ-tlse2.fr/applications/vdw.html), built from the French version of Wikipedia. It provides 173,853 words with 43,690 semantic neighbors.

Differences between these distributional resources lie in the exploitation of different source texts and methods. The core part of the methods is distributional analysis [59] helped by statistical measures [60], which aims at grouping words or phrases that share common contexts. For instance, symptom and pain may be grouped together because they share several common contexts (relieve, appear, treatment, etc.). It is assumed that such groups of words or phrases also have some semantic relations among them, although these relations remain unspecified and vague.

Fig. 2. An excerpt from the skill taxonomy.

2.2. Mapping between skill taxonomy and corpora

We perform a linguistic and semantic enrichment of the skills to expand their coverage and to map them to messages. Skills and messages are processed with the same NLP methods.

2.2.1. Pre-processing and converting the skill taxonomy into keywords

The skill taxonomy is first pre-processed: part-of-speech identification and lemmatisation [61] (the word problems is automatically identified as a noun and associated with the dictionary entry problem). Functional items (determiners, prepositions, etc.) and some very general words (fonction (function), contexte (context)) are not considered. We keep nouns (alimentation (diet)), adjectives (glycérique (glycaemic)) and verbs (réajuster (readjust)) as keywords. For some experiments, we also automatically extract terms [62], which are also considered as keywords. For instance, the skill réajuster l’alimentation en fonction du déséquilibre glycérique (readjust diet according to a poor glycaemic control) is converted into four keywords (réajuster (readjust), alimentation (diet), déséquilibre (poor control), glycérique (glycaemic)), to which the automatically extracted term déséquilibre glycérique (poor glycaemic control) may be added.

2.2.2. Keyword enrichment with linguistic resources

We expand keywords with linguistic resources: each keyword and the related words from the resources form a word subset that we call a cluster. Keywords are considered as the semantic labels of their clusters. For the example given above, the keyword alimentation (diet) is enriched with related words, such as aliment (food), ressource (resource), fourniture (supply), nutrition (nutrition), and nourriture (food/diet). This cluster receives the semantic label diet. In this way, each skill may be described by n clusters, according to the number of keywords it contains.

2.2.3. Pre-processing of messages and their selection

Messages are pre-processed in the same way as the skill taxonomy (Section 2.2.1). Words and terms from the clusters are mapped to words and terms from the corpora. This is the most important step of the method: beyond expansion and mapping, selection of the most relevant messages for a given skill is performed. For instance, in Fig. 3, we recognize the skill readjust diet because we can identify the clusters readjust and diet thanks to the words aliment (staple) and calculer (compute).

2.2.4. Evaluation

Reference data contain 105 randomly selected messages manually annotated by an expert in health pedagogy. The objective is to appraise the quality and the exhaustiveness of the automatically obtained results, and to give a judgment on:
• the suitability of the automatic method for helping healthcare educators;
• the usefulness of these results for better understanding and contextualizing patient skills, both those used and those needed.

3. Results

3.1. Experiments

We performed several experiments combining and varying several parameters (term extraction, linguistic resources, etc.) to select messages corresponding to skills. We present the most important experiments (Table 1):
• baseline: clusters contain only basic keywords (nouns, adjectives, verbs);
• baseline + terms: the baseline is expanded with extracted terms;
• baseline + synonymy: the baseline is expanded with the synonymy resources;
• baseline + terms + synonymy: the baseline is expanded with extracted terms and synonyms;
• baseline + terms + freDist, baseline + terms + VdLM, baseline + terms + VdW: the baseline is expanded with extracted terms and one of the distributional resources.

3.2. Keyword enrichment

In this section, we present the influence of the linguistic resources on keyword enrichment, whose purpose is to improve skill identification in the ODFs. We illustrate their contribution in Table 2 regarding lexical richness (number of words per cluster) and ambiguity (number of clusters associated with each word). However, the real impact of the resources will be analyzed in the following sections.

We performed keyword enrichment and built clusters according to the presented experiments. With the baseline, the number of clusters per skill varies between 1 and 8 (mean 3.40 clusters/skill). Term extraction adds at most two terms per skill and increases the number of clusters from 591 to 701 (average of 4.03 clusters/skill).
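The keyword and cluster construction of Sections 2.2.1 and 2.2.2 can be illustrated with a minimal sketch. It is not the authors' implementation: the stop list, the part-of-speech labels, the function names `keywords` and `build_clusters`, and the small synonym dictionary are all assumptions made for illustration, and the skill is given already tagged and lemmatised rather than run through a real tagger.

```python
# Hypothetical stop list: functional items and very general words are dropped.
STOP_LEMMAS = {"fonction", "contexte"}
# Assumed tag names; only nouns, adjectives and verbs are kept as keywords.
KEPT_POS = {"NOUN", "ADJ", "VERB"}


def keywords(tagged_skill, extracted_terms=()):
    """Turn a tagged, lemmatised skill into keywords.

    tagged_skill: list of (lemma, part_of_speech) pairs.
    extracted_terms: multi-word terms from a term extractor, added as keywords.
    """
    kept = [lemma for lemma, pos in tagged_skill
            if pos in KEPT_POS and lemma not in STOP_LEMMAS]
    return kept + list(extracted_terms)


def build_clusters(kws, resource):
    """One cluster per keyword: the keyword (its label) plus its related words."""
    return {kw: {kw} | set(resource.get(kw, ())) for kw in kws}


# The example skill from Section 2.2.1, tagged by hand for this sketch.
skill = [("réajuster", "VERB"), ("le", "DET"), ("alimentation", "NOUN"),
         ("en", "ADP"), ("fonction", "NOUN"), ("de", "ADP"),
         ("déséquilibre", "NOUN"), ("glycérique", "ADJ")]

# Toy stand-in for a synonymy resource (words taken from Section 2.2.2).
synonyms = {"alimentation": ["aliment", "ressource", "fourniture",
                             "nutrition", "nourriture"]}

kws = keywords(skill, extracted_terms=["déséquilibre glycérique"])
clusters = build_clusters(kws, synonyms)
print(kws)
print(sorted(clusters["alimentation"]))
```

Keeping the keyword as the key of its cluster mirrors the idea that keywords act as semantic labels; a keyword with no entry in the resource simply yields a one-word cluster.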
The size of the enriched clusters varies significantly according to the linguistic resource: from 136 (average 23.58) with the synonymy resource to 2449 (average 673.11) with the distributional resource VdLM. As a matter of fact, some clusters are heavily enriched because of the word “faire” (to do) which appears in the verbal

Fig. 3. Excerpt from a message with annotated words.


phrase “faire valoir” (assert). In future experiments, the phrase “faire valoir” (assert) will be considered as an idiomatic expression. On average, the extension of clusters varies from 17.32 words with VdW to 673.11 words with VdLM. We also observe that distributional resources built from newspapers (VdLM and freDist) are linguistically richer and provide a larger vocabulary. Moreover, VdW contributes less to cluster expansion than the synonymy resource, although its content is larger.

The total number of words which may appear in expanded clusters (Table 2, column 2) can be associated with cluster richness, while the number of clusters per word can be associated with the notion of word ambiguity (Table 2, columns 3–5). These two notions are related: from two opposite points of view, they shed light on the advantages and limitations of the exploited resources:
• With the baseline, words are associated with at most 36 clusters (average 2.14 or 1.9 according to experiments);
• With the synonymy resource, up to 56 clusters are associated with a given word (average 3.8);
• Distributional resources have a variable influence: (1) the average number of clusters is similar with freDist and VdW, while the vocabulary size of the former is four times bigger than that of the latter; (2) with VdLM, the ambiguity of words increases, as we obtain on average almost 66 clusters/word. Several of them are associated with analyze, such as explore, exploit, carry out, readjust, check, which indeed correspond to different meanings of the verb analyze.

The total number of words also provides interesting information on vocabulary:
• As expected, the linguistic resources help to expand the vocabulary but their contribution is variable: the vocabulary multiplies tenfold with the synonymy resource, and twentyfold with VdLM and FreDist;
• Use of distributional resources increases the lexical richness of skills but adds semantic vagueness. This also has a positive influence on message selection and on the number of identified skills, which was one of the objectives of our work.

Table 1
Summary of the parameter sets.

Parameter sets                 Keywords                          Resource-based expansion
Baseline                       Noun, adjectives, verbs
Baseline + terms               Noun, adjectives, verbs, terms
Baseline + synonymy            Noun, adjectives, verbs, terms    Synonymy
Baseline + terms + synonymy    Noun, adjectives, verbs, terms    Synonymy
Baseline + terms + freDist     Noun, adjectives, verbs, terms    freDist
Baseline + terms + VdLM        Noun, adjectives, verbs, terms    VdLM
Baseline + terms + VdW         Noun, adjectives, verbs, terms    VdW

Table 2
Number of clusters per word.

Parameter sets                 Total number of words    Number of clusters per word
                                                        Min    Max    Average
Baseline                       276                      1      36     2.14
Baseline + terms               368                      1      36     1.90
Baseline + synonymy            3582                     1      56     3.89
Baseline + terms + synonymy    3674                     1      56     3.82
Baseline + terms + freDist     6127                     1      138    8.83
Baseline + terms + VdLM        7124                     1      359    66.23
Baseline + terms + VdW         1391                     1      97     8.73
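The ambiguity measure reported in Table 2 (number of clusters per word, with its minimum, maximum and average) can be computed along the following lines. This is a sketch under assumptions: clusters are represented as a dictionary from a cluster label to its set of words, the function names are hypothetical, and the toy clusters are invented for illustration rather than taken from the resources used in the paper.

```python
from collections import defaultdict
from statistics import mean


def clusters_per_word(clusters):
    """Count, for each word, how many clusters it belongs to.

    clusters: dict mapping a cluster label to the set of words it contains.
    """
    membership = defaultdict(set)
    for label, words in clusters.items():
        for word in words:
            membership[word].add(label)
    return {word: len(labels) for word, labels in membership.items()}


def ambiguity_summary(clusters):
    """Summarise vocabulary size and ambiguity in the shape of a Table 2 row."""
    counts = list(clusters_per_word(clusters).values())
    return {
        "total_words": len(counts),
        "min": min(counts),
        "max": max(counts),
        "average": mean(counts),
    }


# Invented toy clusters: "analyser" is ambiguous because it occurs in all three.
toy_clusters = {
    "analyser": {"analyser", "explorer", "vérifier"},
    "réajuster": {"réajuster", "analyser"},
    "vérifier": {"vérifier", "analyser", "contrôler"},
}

summary = ambiguity_summary(toy_clusters)
print(summary)
```

A word that belongs to many clusters, as faire does with VdLM, raises both the maximum and the average, which is why these two columns are a convenient signal for deciding where filtering is needed.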

3.3. Skill identification in corpora

In this section we analyze the skill identification results in the two ODF corpora. We present the impact of linguistic resources on the selection of relevant threads and messages (Tables 3 and 4), and then on skill coverage and skill mentions in patient messages (additional tables are available at http://wwwlimbio.smbh.univ-paris13.fr/membres/hamon/Files/2013PEC-appendix). Results show that NLP methods and resources offer an efficient way to systematically process ODF content and access the mentioned skills.

Tables 3 and 4 present results on skill identification at thread and message levels. At the thread level:
• The baseline selects about 35% of threads;
• Term extraction reduces the number of threads (5%) and messages (9%);
• Linguistic resources increase the thread coverage (94–99%).

However, at the message level, these observations vary according to corpora:
• LesDiabétiques: only 7.5% of messages are selected with the baseline, while with synonyms we reach 70% of the messages, and 85–93.4% with distributional resources (respectively freDist/VdW and VdLM);
• Doctissimo: only 3% of messages are extracted with the baseline, 41% with synonyms and 50–66% with distributional resources.

On the whole, the results on skill identification indicate that, even with the baseline, a greater number of skills is identified in Doctissimo than in LesDiabétiques. Unsurprisingly, the top-level skills are the ones with the best coverage.

Skills are not identified homogeneously:
• Skills belonging to category 1 (express the needs regarding the pathology) are retrieved in both corpora;
• Most of the skills from categories 5 (management of an emergency situation) and 9 (metacognition, anticipate and plan actions, and self-evaluation) may be associated with no messages;
• Other skill categories are more or less covered according to corpora: categories 2 (understand and explain the pathology to himself/herself), 3 (identify the symptoms, analyze a risk situation, interpret and measure the values) and 6 (practice, know how) have better coverage in Doctissimo than in LesDiabétiques.

We assume these variations are due to the fact that skills are not mentioned by patients or, when mentioned, are not expressed with the expected words. The last columns of Tables 3 and 4 indicate the influence of resources: synonymy identifies more skills than any other resource, while distributional resources increase the number of selected messages.
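The selection step behind these coverage figures (Section 2.2.3) can be sketched as follows. The exact selection criterion is not spelled out in the text, so this sketch assumes one: a message is selected for a skill when every cluster of that skill shares at least one word with the lemmatised message, in line with the readjust diet example. The function names, the toy clusters and the toy messages are hypothetical.

```python
def matches(skill_clusters, message_lemmas):
    """Assumed criterion: every cluster of the skill occurs in the message."""
    lemmas = set(message_lemmas)
    return all(words & lemmas for words in skill_clusters.values())


def select(skills, messages):
    """Return {message_id: [skill names]} for messages matching some skill."""
    selected = {}
    for message_id, lemmas in messages.items():
        hits = [name for name, cl in skills.items() if matches(cl, lemmas)]
        if hits:
            selected[message_id] = hits
    return selected


def coverage(selected_count, total_count):
    """Share of selected items, as a percentage rounded to one decimal."""
    return round(100 * selected_count / total_count, 1)


# Toy skill with two clusters; "calculer" is assumed to come from expansion.
skills = {
    "readjust diet": {
        "réajuster": {"réajuster", "calculer"},
        "alimentation": {"alimentation", "aliment", "nourriture"},
    }
}

# Toy lemmatised messages, invented for illustration.
messages = {
    "m1": ["je", "calculer", "le", "glucide", "de", "chaque", "aliment"],
    "m2": ["mon", "pompe", "à", "insuline", "fonctionner", "bien"],
}

selected = select(skills, messages)
print(selected)
print(coverage(len(selected), len(messages)))
```

Applied to the counts reported for LesDiabétiques, `coverage(524, 6982)` gives the 7.5% of messages selected by the baseline. Requiring all clusters to match is the strictest reading; a looser threshold would select more messages at the cost of precision.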


Table 3
Skill identification overall results in the corpus LesDiabétiques.

Parameter sets                 Number of threads    Number of messages    Average number of messages per thread    Number of skills in corpora
Baseline                       291 (35%)            524 (7.5%)            1.80                                     30 (17%)
Baseline + terms               277 (33%)            476 (7%)              1.72                                     28 (16%)
Baseline + synonymy            799 (95%)            4956 (71%)            6.2                                      127 (73%)
Baseline + terms + synonymy    795 (95%)            4914 (70%)            6.18                                     82 (47%)
Baseline + terms + freDist     825 (98%)            6044 (86.5%)          7.33                                     95 (54.5%)
Baseline + terms + VdLM        827 (98%)            6519 (93.4%)          7.88                                     94 (54%)
Baseline + terms + VdW         821 (98%)            5942 (85%)            7.24                                     73 (42%)

Table 4
Skill identification overall results in the corpus Doctissimo.

Parameter sets                 Number of threads    Number of messages    Average number of messages per thread    Number of skills in corpora
Baseline                       8082 (36%)           16,729 (3%)           2.07                                     109 (62%)
Baseline + terms               7711 (34%)           15,211 (2.7%)         1.97                                     85 (49%)
Baseline + synonymy            21,289 (94.4%)       232,611 (41.5%)       10.94                                    163 (93%)
Baseline + terms + synonymy    21,263 (94%)         231,155 (41%)         10.87                                    113 (65%)
Baseline + terms + freDist     22,156 (98%)         318,414 (57%)         14.37                                    128 (73.5%)
Baseline + terms + VdLM        22,240 (99%)         356,939 (64%)         16.05                                    121 (69.5%)
Baseline + terms + VdW         21,976 (97%)         272,270 (48%)         12.39                                    114 (65.5%)

Term extraction influences results from Doctissimo: it helps to select messages that have been missed by the baseline (i.e. explain the causes and consequence of the hypoglycemia, use of insulin pump and contextualization process).

Results of the baseline + synonymy experiment have been compared to the reference data:
• 63% of messages contain at least one correctly identified skill from high or low hierarchical levels of the taxonomy;
• 54% of skills are correctly coded.

An example of a coded message is given in Fig. 4. It appears that the automatically extracted results are also more complete than the reference data. While additional examination remains important, the expert felt that such results are very helpful for understanding and contextualizing patient skills within the ocean of forum discussions. Still, some skills, such as those related to metacognition, are missed by our method: although they occur in ODFs, specific linguistic resources are required to reach them.

Fig. 4. Excerpt of a message with manual and system driven message coding. (We show here only the English translation of the original message in French. Common skills are in bold. The spelling errors are genuine. Line-breaks are removed.)
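The comparison between manual and system-driven coding can be made concrete with a small sketch. It rests on assumptions: the two scores below are one possible reading of the reported figures (share of messages with at least one correctly identified skill, and share of system-assigned skill codes confirmed by the reference), the function name `evaluate` is hypothetical, and the annotations are invented toy data, not the 105 reference messages.

```python
def evaluate(reference, system):
    """Compare system skill codes with reference codes, message by message.

    reference, system: dicts mapping a message id to a set of skill names.
    """
    message_ids = list(reference)
    # Messages for which at least one system skill is also in the reference.
    with_hit = sum(1 for m in message_ids
                   if reference[m] & system.get(m, set()))
    # All skill codes assigned by the system, and those the reference confirms.
    assigned = sum(len(system.get(m, set())) for m in message_ids)
    correct = sum(len(reference[m] & system.get(m, set()))
                  for m in message_ids)
    return {
        "messages_with_correct_skill": with_hit / len(message_ids),
        "correct_skill_codes": correct / assigned if assigned else 0.0,
    }


# Invented toy annotations for three messages.
reference = {
    "m1": {"readjust diet", "express emotions"},
    "m2": {"use insulin pump"},
    "m3": {"express emotions"},
}
system = {
    "m1": {"readjust diet"},
    "m2": {"readjust diet", "use insulin pump"},
    "m3": {"adjust insulin to sport"},
}

scores = evaluate(reference, system)
print(scores)
```

Skills found by the system but absent from the reference count as errors here, which may be too strict given the observation that the automatic results can be more complete than the reference data.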


4. Discussion and conclusion

4.1. Discussion

First of all, the results confirm that it is possible to automatically identify skills mentioned by patients in online discussions in which physicians are not involved [63]. This meets our main objective. Moreover, from the analysis of the selected messages responding to the 174 skills, we observe that two sets of categories are mainly mentioned in discussions: self-care skills (categories 3, 6 and 7) and psychosocial skills (categories 1 and 2) [46]. Self-care skills are the most frequent because patients give precise descriptions and refer to protocols which correspond to the core treatment management (i.e., overall diabetes pathophysiology, diet influence, physical activity, alcohol and glycaemic control). Thus, in numerous threads, we observe discussions about treatment procedures, and especially insulin pump setup and its daily use. Self-care skills are always mentioned together with psychosocial skills. Among psychosocial skills, we clearly identify express his/her emotions regarding the pathology, encourage and support the motivation of the other internet user, and improve self-confidence [20]. Among frequent skills, we can also find those asserting health choices and rights, and especially financial and insurance responsibility for the treatment [20].

Finally, such message analysis goes beyond skill identification. For instance, we observed that patients wonder about the experience of the pathology, expect to get answers on daily therapeutic problems, and want to be reassured that they manage their disease correctly. Their needs also concern their relation with healthcare professionals: patients share their expectations about the healthcare system and the information it should offer [2].

The proposed NLP methods appear to be helpful for mining ODFs. The use of linguistic resources improves skill identification in ODF messages. We observe an important increase in the selected messages and the identified skills. This demonstrates that such an approach can be useful for improving the mapping between patient and physician vocabularies, which has been shown to be deficient up to now [35].

When skill keywords can be found in discussions, our method has good coverage, in particular with distributional resources. However, in some cases, distributional resources are too ambiguous (e.g. faire – to do) or semantically too vague (e.g. affichage – display is related to alimentation – diet in VdLM). To avoid this limitation, we will apply additional filtering (number of neighbors, association strength, etc.). We also face difficulties when skills are too general. In this case, other resources should be exploited, such as hierarchical relations (e.g. sport and more specific activities such as football, hiking, or cycling), to add precision to some parts of the skills. For instance, hierarchical relations from the UMLS [34] can be useful to describe sport and physical activities for the skills adjust the insulin treatment to the sport and adjust the insulin treatment to the physical activity.

Additional investigations will be performed to improve precision: we plan to process some specific message parts (e.g. the thread subject, which seems to provide important information) and remove other parts (e.g. the signature). Finally, because health fora may contain some specific vocabulary or meanings not covered by the exploited resources, we plan to build additional resources from ODFs.

4.2. Conclusion

We exploited NLP methods and resources to identify patient skills in two ODF corpora. The proposed approach is based on linguistic and semantic skill enrichment. Mapping the enriched skills to forum messages selects messages, most of which are relevant to at least one of the associated skills. Manual

analysis of a subset of the selected messages shows that a substantial number of these messages are about daily self-care and psychosocial skills. Two main limitations of our work are the small size of the reference data and the fact that only one expert analyzed the results. We are currently addressing these by building more complete reference data and looking for more experts. This work demonstrates that the use of NLP can be helpful for identifying and coding patient skills in an unsupervised way. On the whole, this approach is very useful for understanding the real behavior and needs of patients without any interference from healthcare professionals, and for improving our knowledge of patients’ needs.

4.3. Practice implications

Our results and observations have several practice implications for medical care and especially for therapeutic patient education. These implications are strengthened by the fact that this study was done directly on ODF messages, without interference from healthcare professionals. First of all, the analysis of ODF messages helps to identify the topics which are the most important for patients in their daily life. Our work is also very helpful for understanding patient needs and for making sure TPE programs are adapted to patients. Analysis of some problematic situations mentioned by patients also gives information about their learning difficulties, which may make it possible to increase the didactic value of TPE programs by strengthening the patient training time dedicated to reasoning and decision making in situations of variable complexity. We assume this is the main challenge of educational programs. Finally, analysis of discussions between patients illustrates the learning process between peers: the content of messages and the modality of their creation (asynchronicity, daily/nightly temporality, confidentiality, etc.) are specific to ODFs. Our study, done in situ, reveals patients’ willingness and need to share their “lifestyle with a pathology” with others and to learn together. If the results and knowledge extracted in these experiments were available to healthcare professionals, this would make it possible to reinforce patients’ skills in appraising the relevance of answers and suggestions, and thus to increase the value of the forum exchanges.

Conflict of interest

The two authors have no conflict of interest.

Acknowledgments

This work is funded by the University Paris 13 (project BQR Bonus Quality Research, 2011). We would like to thank Natalia Grabar for her help with the preliminary version of the paper. We also thank Didier Bourigault and Frank Sajous for providing the distributional resources Les voisins de Le Monde and Les voisins de Wikipedia. We are grateful to Bert Cappelle for his help with the editing of this article.

References

[1] Murray E, Lo B, Pollack L, Donelan K, Catania J, White M, et al. The impact of health information on the internet on the physician–patient relationship: patient perceptions. Arch Intern Med 2003;163:1727–34.
[2] Wald HS, Dube CE, Anthony DC. Untangling the web. The impact of internet use on health care and the physician–patient relationship. Patient Educ Couns 2007;68:218–24.
[3] Plougmann S, Hejlesen OK, Caven DA. Diasnet: a diabetes advisory system for communication and education via the internet. Int J Med Inform 2001;64:319–30.
[4] Diaz JA, Griffith RA, Ng JJ, Reinert SE, Friedmann PD, Moulton AW. Patients’ use of the internet for medical information. J Gen Intern Med 2002;17:180–5.

[5] Ralston JD, Revere D, Robins LS, Goldberg HI. Patients’ experience with a diabetes support programme based on an interactive electronic medical record: qualitative study. Brit Med J 2004;328:1159–62.
[6] McMullan M. Patients using the internet to obtain health information: how this affects the patient-health professional relationship. Patient Educ Couns 2006;63:24–8.
[7] Golay A, Lagger G, Giordan A. Motivating patient with chronic diseases. J Med Pers 2007;5:57–63.
[8] de Penanster D. Présentation de la problématique institutionnelle [Presentation of the institutional problematic]. Demi-Journée de préparation de l’appel à projets ETP; 2012.
[9] Boisseau MT. Plan pour l’amélioration de la qualité de vie des personnes atteintes de maladies chroniques 2007–2011 [Program for the improvement of the life quality of the patients suffering from chronic diseases 2007–2011]. In: Rapport annuel du comité de suivi 2011 [Annual report of the follow-up committee 2011]. Ministère des Affaires Sociales et de la Santé; 2012.
[10] Zeng QT, Tse T. Exploring and developing consumer health vocabularies. J Am Med Inform Assoc 2006;13:24–9.
[11] Zeng X, Parmanto B. Evaluation of web accessibility of consumer health information websites. In: Proceedings of AMIA 2003 Symposium; 2003. p. 743–7.
[12] Kandula S, Zeng-Treitler Q. Creating a gold standard for the readability measurement of health texts. In: Proceedings of the AMIA 2008 Symposium; 2008. p. 353–7.
[13] Leroy G, Helmreich S, Cowie JR, Miller T, Zheng W. Evaluating online health information: beyond readability formulas. In: Proceedings of the AMIA 2008 Symposium; 2008. p. 394–8.
[14] Wang Y. Automatic recognition of text difficulty from consumers health information. In: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems (CBMS’06); 2006. p. 131–6.
[15] Berland GK, Elliott MN, Morales LS, Algazy JI, Kravitz RL, Broder MS, et al. Health information on the internet: accessibility, quality, and readability in English and Spanish. J Amer Med Assoc 2001;285:2612–21.
[16] Hardey M. Internet et société: reconfigurations du patient et de la médecine ? [Internet and Society: reconfiguration of the patient and the medicine?] Sciences sociales et Santé 2004;22:21–44.
[17] Nabarette H. L’infomédiation en santé. exemple d’orphanet dans les maladies rares [Health infomediation. The example of Orphanet for the rare diseases]. Université de Paris I Panthéon Sorbonne; 2003 [PhD thesis].
[18] Akrich M, Méadel C. Les échanges entre patients sur l’internet [The communication between patients on the Internet]. La Presse Médicale 2009;38:1484–90.
[19] Lau DH. Patient empowerment – a patient-centred approach to improved care. Hong Kong Med J 2002;8:372–4.
[20] Hallett J, Brown B, Maycock B, Langdon P. Changing communities, changing spaces: the challenges of health promotion outreach in cyberspace. Promot Educ 2007.
[21] Doumont D, Aujoulat I. L’empowerment et l’éducation du patient [The empowerment and the education of the patient]; vol. 1 of Série de dossiers techniques. Bruxelles, UCL – RESO; 2002. 24.
[22] Glasgow RE, Kurz D, King D, Dickman JM, Faber AJ, Halterman E, et al. Twelve-month outcomes of an internet-based diabetes self-management support program. Patient Educ Couns 2012;87:81–92.
[23] Dickerson S, Reinhart AM, Feeley TH, Bidani R, Rich E, Garg VK, et al. Patient internet use for health information at three urban primary care clinics. J Am Med Inform Assoc 2004;11:499–504.
[24] Kharrazi H. Improving healthy behaviors in type 1 diabetic patients by interactive frameworks. In: Proceedings of AMIA 2009 Symposium; 2009. p. 322–6.
[25] Yu C, Parsons J, Mamdani M, Lebovic G, Shah B, Bhattacharyya O, et al. Designing and evaluating a web-based self-management site for patients with type 2 diabetes – systematic website development and study protocol. BMC Med Inform Decis 2012;12:57–66.
[26] Mulvaney SA, Rothman RL, Osborn CY, Lybarger C, Dietrich MS, Wallston KA. Self-management problem solving for adolescents with type 1 diabetes: intervention processes associated with an internet program. Patient Educ Couns 2010;85:140–2.
[27] Thoër C, Aumond S. Construction des savoirs et du risque relatifs aux médicaments détournés dans les forums sur internet [Building of the knowledge and the risk related to the drugs on the Internet fora]. Anthropologie et sociétés – Numéro spécial “Cyberespace et Anthropologie: transmission des savoirs et des savoir-faire” 2011;35.
[28] Romeyer H. TIC et santé: entre information médicale et information de santé [ICT and health: between medical information and health information]. TIC&Société 2008;2.
[29] Akrich M, Méadel C. Policing exchanges as self-description in internet groups. In: Brousseau E, Marzouki M, Méadel C, editors. Governance, regulations and powers on the Internet. 2012. p. 232–56.
[30] Paganelli C, Clavier V. Le forum de discussion: une ressource informationnelle hybride entre information grand public et information spécialisée [The discussion forum: a hybrid informational resource between general and specialised information]. In: Yasri-Labrique E, editor. Les forums de discussion: agoras du XXIe siècle? Théories, enjeux et pratiques discursives. Langue et Parole; L’Harmattan; 2011. p. 39–55.


[31] Senis F. Pourquoi accéder à l’information médicale sur internet par le biais des groupes de discussions ? Qualité, centres d’intérêt et motivations des participants aux forums médicaux. À propos du forum usenet fr.bio.medecine [Why to access to the medical information on Internet through the newsgroups? Quality, interests and motivations of the contributors of medical fora. About the Usenet group Fr.bio.medecine]. Université Bordeaux 2 Faculté de Médecine; 2003 [Thèse de doctorat de médecine générale].
[32] Quémeras C. Intérêt des listes de discussion destinées aux patients concernés par une pathologie rare, grave ou chronique: comparaison du point de vue de la population générale et du point de vue médical [Interest of the discussion list for patients with rare, serious or chronic disease: comparison from the point of view of the general population and medical one]. Université de Brest-Bretagne occidentale, Faculté de médecine; 2003 [Thèse de doctorat en médecine].
[33] Romeyer H. La santé en ligne: des enjeux au-delà de l’information [Online health: challenges beyond the information]. Communication 2012;30.
[34] National Library of Medicine, editor. UMLS Knowledge Source. 13th ed.; 2003.
[35] Smith CA, Wicks PJ. Patientslikeme: consumer health vocabulary as a folksonomy. In: Proceedings of the AMIA 2008 Symposium; 2008. p. 682–6.
[36] Chan CV, Matthews LA, Kaufman DR. A taxonomy characterizing complexity of consumer ehealth literacy. In: Proceedings of the AMIA 2009 Symposium; 2009. p. 86–90.
[37] Tran AN, Haidet P, Street RL, O’Malley KJ, Martin F, Ashton CM. Empowering communication: a community-based intervention for patients. Patient Educ Couns 2004;52:113–21.
[38] van Uden-Kraan CF, Drossaert C, Taal E, Seydel ER, van de Laar M. Participation in online patient support groups endorses patients’ empowerment. Patient Educ Couns 2009;74:61–9.
[39] Fredriksen EH, Moland KM, Sundby J. “Listen to your body”. A qualitative text analysis of internet discussions related to pregnancy health and pelvic girdle pain in pregnancy. Patient Educ Couns 2008;73:294–9.
[40] Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv 2002;34:1–47.
[41] Li YM, Liao TF, Lai CY. A social recommender mechanism for improving knowledge sharing in online forums. Inform Process Manag 2012;48:978–94.
[42] Li N, Wu DD. Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis Support Syst 2010;48:354–68.
[43] Lin FR, Hsieh LS, Chuang FT. Discovering genres of online discussion threads via text mining. Comput Educ 2009;52:481–95.
[44] d’Ivernois JF, Gagnayre R. Mettre en œuvre l’éducation thérapeutique [Implementing the therapeutic education]. Actualité et Dossier en Santé Publique 2001;36:11–3.
[45] d’Ivernois JF, Gagnayre R. Vers une démarche qualité en éducation thérapeutique du patient [Towards a quality process in therapeutic patient education]. Actualité et Dossier en Santé Publique 2002;39:14–6.
[46] d’Ivernois JF, Gagnayre R, the members of the CPEM working group. Compétences d’adaptation à la maladie du patient: une proposition [The patient’s psychosocial skills: a proposal]. Educ Ther Patient/Ther Patient Educ 2011;3:S201–5.
[47] Scallon G. L’évaluation des apprentissages dans une approche par compétence [The evaluation of the learning in a skill-based approach]. Bruxelles: De Boeck; 2004.
[48] Roegiers X. Une pédagogie de l’intégration. Compétences et intégration des acquis dans l’enseignement [A pedagogy of the integration. Skills and integration of the knowledge in the teaching]. Bruxelles: De Boeck; 2000.
[49] HAS-INPES. Structuration d’un programme d’éducation thérapeutique du patient dans le champ des maladies chroniques [Structuring a program of therapeutic patient education in the field of the chronic diseases]. Guide méthodologique; 2007.
[50] Funnell MM, Brown TL, Childs BP, Haas LB, Hosey GM, Jensen B, et al. National standards for diabetes self-management education. Diabetes Care 2012;35:S101–8.
[51] Bandura A. Social foundations of thought and action: a social cognitive theory. Englewood Cliffs, NJ: Prentice-Hall; 1986.
[52] Johnston-Brooks CH, Lewis MA, Garg S. Self-efficacy impacts self-care and HbA1c in young adults with type 1 diabetes. Psychosom Med 2002;64:43–51.
[53] Bruce CD. Questions arising about emergence, data collection and its interaction with analysis in a grounded theory study. Int J Qual Meth 2007;6:1–2.
[54] Laghmari N. Analyse des échanges écrits entre personnes atteintes de diabète ou vivant auprès d’une personne diabétique sur les forums de discussions sur des sites internet dédiés: Pré-étude méthodologique [Analysis of the written communication between diabetic patients or their relatives on dedicated online discussion fora: methodological pre-study]. Université Paris 13; 2009 [Thèse pour l’obtention du diplôme d’état de docteur en médecine].
[55] Harry I, Gagnayre R, d’Ivernois JF. Analyse des échanges écrits entre patients diabétiques sur les forums de discussion [Analysis of the written communication between diabetic patients on the online discussion fora]. Distances et savoirs 2008;6:393–412.
[56] Le petit Robert. 1990.
[57] Anguiano EH, Denis P. Fredist: automatic construction of distributional thesauri for French. In: Proceedings of the conference TALN 2011; 2011. p. 119–24.


[58] Bourigault D. Upery: un outil d’analyse distributionnelle étendue pour la construction d’ontologies à partir de corpus [Upery: a software based on an extended distributional analysis for the ontology building]. In: Proceedings of the 9th conference TALN 2002; 2002. p. 75–84.
[59] Harris Z, Gottfried M, Ryckman T, Mattick Jr P, Daladier A, Harris T, et al. The Form of Information in Science, Analysis of Immunology Sublanguage; vol. 104 of Boston Stud Philos Sci. Dordrecht, The Netherlands: Kluwer Academic Publisher; 1989.
[60] Curran JR. From distributional to semantic similarity. University of Edinburgh; 2004 [PhD thesis].
[61] Schmid H. Probabilistic part-of-speech tagging using decision trees. In: Jones D, Somers H, editors. New methods in language processing. Studies in computational linguistics. UCL Press; 1997. p. 154–64.
[62] Aubin S, Hamon T. Improving term extraction with terminological resources. In: Salakoski T, Ginter F, Pyysalo S, Pahikkala T, editors. Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006). No. 4139 in LNAI. Springer; 2006. p. 380–7.
[63] Fergusson T, Frydman G. The first generation of e-patients. Brit Med J 2004;328:1148–9.