Lexical use in emotional autobiographical narratives of persons with schizophrenia and healthy controls


Psychiatry Research 225 (2015) 40–49


Kai Hong a, Ani Nenkova a, Mary E. March b, Amber P. Parker c, Ragini Verma d, Christian G. Kohler b,*

a Department of Computer and Information Science, University of Pennsylvania School of Engineering and Applied Science, USA
b Schizophrenia Research Center, Department of Psychiatry, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
c University of Pennsylvania School of Arts and Sciences, Philadelphia, PA 19104, USA
d Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA

Article info

Article history: Received 7 October 2013; received in revised form 26 September 2014; accepted 4 October 2014; available online 3 December 2014.

Keywords: Emotion; Lexical features; LIWC; Diction; Learning-based analyses; Text classification

Abstract

Language dysfunction has long been described in schizophrenia, and most studies have focused on characteristics of structure and form. This project focuses on the content of language, based on autobiographical narratives of five basic emotions. In persons with schizophrenia and healthy controls, we employed a comprehensive automated analysis of lexical use, and we identified specific words, as well as semantically or functionally related words derived from dictionaries, that occurred significantly more often in the narratives of either group. Patients employed a similar number of words but differed in lower expressivity and complexity, more self-reference and more repetitions. We developed a classification method for predicting subject status and tested its accuracy in a leave-one-subject-out evaluation procedure. We identified a set of 18 features that achieved 65.7% accuracy in predicting clinical status based on single emotion narratives, and 74.4% accuracy based on all five narratives. Subject clinical status could be determined automatically more accurately from narratives related to anger or happiness experiences, and there were a larger number of lexical differences between the two groups for these emotions compared to the others.

© 2014 Published by Elsevier Ireland Ltd.

1. Introduction

Narratives of emotional experiences contain rich personal information in linguistic form. It has been suggested that evolutionary brain development has led to structural asymmetries that underlie the components of human language (Geschwind and Galaburda, 1987), and that these asymmetries relate to the emergence of schizophrenia as a human disorder (Crow, 1997). Disturbed functions of language have long been described in persons with schizophrenia, for example in the areas of phonetic intonation (dysprosody), lack of volume or content (alogia) and disturbed content or relatedness of speech production (thought disorder or schizophasia). Considered core phenomenologic characteristics of the illness, these functions relate to different linguistic categories, as summarized in reviews (DeLisi, 2001; Covington et al., 2005; McKenna and Oh, 2005).

* Correspondence to: Neuropsychiatry Section, Department of Psychiatry, 10th Floor, Gates Building, University of Pennsylvania School of Medicine, 3400 Spruce Street, Philadelphia, PA 19104, USA. Tel.: +1 215 614 0161; fax: +1 215 662 7903. E-mail addresses: [email protected] (K. Hong), [email protected] (A. Nenkova), [email protected] (M.E. March), [email protected] (A.P. Parker), [email protected] (R. Verma), [email protected] (C.G. Kohler).

http://dx.doi.org/10.1016/j.psychres.2014.10.002. 0165-1781/© 2014 Published by Elsevier Ireland Ltd.

Examinations of spontaneous and conversational speech in schizophrenia have tended to focus on measures of coherence, which represent the semantic relatedness of expressed ideas. Later analyses include the cloze procedure (Manschreck et al., 1981; Newby, 1998), ambiguous referents (Docherty et al., 1996) and unusual word combinations (Solovay et al., 1987; Niznikiewicz et al., 2002). All of these studies have detected abnormalities related to the production of coherent discourse. The findings hold even when the comparison group is patients with mood disorders rather than healthy persons (Docherty et al., 1996). These prior studies have also confirmed the relationship between peculiarities of language use and cognitive dysfunction involving attention and executive abilities (Docherty, 2005; Marini et al., 2008). Despite these compelling findings, analysis of patients' language production on a large scale remains a practical problem. Human annotation of patient speech is time consuming and rather subjective for some categories of linguistic expression. Patients often do not talk much, so there is a relatively short sample of language available for analysis. To address these challenges in prior work, Elvevåg et al. (2007) applied Latent Semantic Analysis (LSA) to automatically compute semantic similarities between answers to a set of standardized questions involving different thematic areas.


The similarities were computed at three levels of granularity: words, sentences and entire answers. In that study, LSA scores for answer coherence correlated reasonably well with human ratings of thought disorder, and the proposed method was able to discriminate patients with high levels of thought disorder from controls. Cretchley et al. (2010) conducted a qualitative analysis of conversations between caretakers and schizophrenic patients using Leximancer, a toolkit that employs word-association information to extract concepts and makes it feasible to generate a tailored taxonomy for each dataset (Smith, 2003). Cretchley et al. (2010) found that the carers used different strategies to communicate with the schizophrenia patients, depending on the conversational tendencies and relationship context of the patients. In our work we focus exclusively on examining lexical features in personal narratives about past experiences related to five basic emotions. Our goal is to identify differences in lexical use between schizophrenia patients and healthy controls. We explore four types of lexical features to characterize narratives: generic, word identity, dictionary and language model (LM) features. Generic features include the number of words per sentence, the number of letters (graphemes) per word, the number of sentences per narrative and the number of repetitions of any words. Generic features have traditionally been used in the readability literature, in the context of finding reading material appropriate for a given grade level of reading competency (Heilman et al., 2007); they reflect the complexity of the analyzed language. Word identity features, on the other hand, track the frequency of individual words. These features have been employed in analyses of cancer concerns (Ando et al., 2007) and detection of post-traumatic stress disorder (He et al., 2012). In work specifically related to schizophrenia, word identity features derived from outpatient consultations between patients and psychiatrists have been shown useful for predicting positive and negative syndrome scale (PANSS) scores and adherence to treatment for schizophrenic patients (Howes et al., 2012a, 2012b, 2013). Dictionary features provide a more robust way to analyze lexical use by grouping words into semantic and grammatical categories: one can track the occurrences of these more interpretable and general categories rather than the occurrences of individual words. A popular dictionary-based package for psychometric analysis is the Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2007), which groups words into psychologically meaningful categories. LIWC has been applied to analyses of various forms of text, including written accounts of personal emotional experiences and transcripts of spoken narratives. It has been shown that the use of pronouns and function words is a good indicator of whether the narrator is deceptive or honest (Newman et al., 2003), of the narrator's personality type (Pennebaker and King, 1999) and of mental health status (Tausczik and Pennebaker, 2010). LIWC and word identity features can elucidate personality styles, such as introversion, openness and conscientiousness, from analysis of e-mails (Gill et al., 2006), blogs (Gill et al., 2009) and conversations (Mairesse et al., 2007).
One of the main findings based on LIWC dictionary features is that increased use of first-person pronouns is a reliable indicator of emotional distress (Rude et al., 2004) and suicide intent (Stirman and Pennebaker, 2001). Finally, we experiment with language model (LM) features. In these models the probability of word sequences (n-grams) is first estimated from a training corpus. We train one LM for patients and one for controls, using the narratives from the corresponding groups as training data. The LMs are then used to estimate the likelihood of new narratives, effectively summarizing in a single number the similarity of a narrative to the previously seen narratives from each of the two groups.


In previous research, LMs have been used to detect language dominance in bilingual children (Solorio et al., 2011), autism (Prud'hommeaux et al., 2011) and language impairment in monolingual and bilingual children (Gabani et al., 2009). Our project focuses on discovering differences in lexical use between persons with schizophrenia and healthy controls, based on narratives in which the subjects described past emotional experiences that evoked happiness, sadness, anger, fear and disgust. We performed automated analyses of generic, word identity, dictionary and LM features in the autobiographical narratives of the subjects in our study. From the full set of available features we identified a limited set that was sufficient to distinguish the clinical status of the subject. We observed that persons with schizophrenia offer autobiographical descriptions of experiences that contain: (1) shorter words, fewer words per sentence and more sentences per narrative, (2) a greater number of references to themselves, (3) a higher number of word repetitions and (4) more adverbs that denote intensity, which were included in the psychiatrist's questions. We developed a comprehensive machine learning model by identifying a set of lexical features for each training fold. Our supervised classifier can differentiate persons with schizophrenia from normal controls with 74.4% accuracy. To examine the effectiveness of each feature group, we performed ablation experiments; removing any feature group led to a decrease in performance, with the largest decrease occurring when word identity features were removed. The significant features detected in our work provide a candidate list of linguistic features that can be tested robustly in future work.

2. Method

2.1. Participants

As part of our standardized method of obtaining evoked facial expressions (Kohler et al., 2010), we collected autobiographical narratives from 39 participants (19 male, 20 female; 23 persons with schizophrenia and 16 controls), group matched for age, gender and ethnicity. Demographic and clinical information is provided in Table 1. All patients were deemed clinically stable, with no hospitalization in the past 6 months and no change in antipsychotic medications in the past 3 months. Participants were asked to narrate autobiographical experiences of five universal emotions (anger, disgust, fear, happy and sad) in pseudorandomized order. Subjects were instructed "to narrate the experience, as if telling to a good friend", describing the general setting and proceeding to include moments when they experienced the target emotion to a mild, moderate or extreme degree. Narratives lasted between 30 and 90 s and contained 188 words on average. We obtained 120 narratives from patients and 81 from controls, as several participants offered partial stories that were recorded as two narratives.1

Table 1
Demographic information for schizophrenic patients and healthy controls.

Variables                     Full sample (N = 39)   Schizophrenia (N = 23)   Control (N = 16)
Gender: male                  19                     12                       7
Ethnicity
  White                       12                     8                        4
  Black/African American      23                     12                       11
  Asian/Hybrid                4                      3                        1
Mean age (S.D.)               33.21 (8.58)           33.81 (9.65)             32.29 (6.59)
Education (S.D.)              13.63 (2.21)           13.08 (2.21)             14.47 (2.00)
Mother education (S.D.)       13.39 (3.12)           13.33 (3.40)             13.06 (2.79)
Father education (S.D.)       13.63 (3.33)           13.65 (3.83)             13.71 (2.58)
SANS (S.D.)***                18.90 (17.22)          28.04 (15.85)            4.87 (6.40)
SAPS (S.D.)***                7.82 (12.03)           12.52 (13.40)            0.6 (2.24)
HAM-D (S.D.)                  –                      5.95 (4.78)              –
LevFun (S.D.)***              26.75 (8.31)           24.52 (8.07)             33.43 (4.96)

None of the other differences were significant at the 95% confidence level.
*** p < 0.001, two-tailed t-test.


Audio recordings of the narratives were manually transcribed in plain text format by a transcriber who was blinded to the status of the narrator. Semantic and syntactic completeness were considered the most important factors for splitting sentences during transcription. Periods, question marks or exclamation points were used to indicate the end of a sentence. Commas within sentences were used in the following cases: (1) long or short pauses; (2) where a comma is semantically or syntactically necessary, as in normal English texts (including lists, separation of clauses, etc.).2 Spontaneous speech contains fillers (e.g. like, you know, I mean, kind of) and disfluencies (e.g. stuttering, hm, uh, um, and er) that can be informative of a person's speech patterns. Fillers and disfluencies were meticulously transcribed so that they could be treated as words in the analysis. For example, when the word "like" was used as a filler, it was coded as "rrlike" in the transcript. Quality control on all narratives was performed by a second trained transcriber to ensure that the audio recordings were transcribed and edited to the highest standards. Any differences in transcription were resolved in an adjudicating discussion.

2.2. Lexical features

Lexical features are grouped into four classes: (1) generic features that capture basic text properties, (2) word identity features that track the occurrence of specific words in the narratives, (3) dictionary features derived on the basis of pre-existing dictionaries and (4) language model features.

2.2.1. Generic features

The generic features include:

- Type-token ratio: the number of unique words divided by the total number of words in the narrative. A lower value indicates more repetitions of the same words in the text. This ratio is highly correlated with clinical judgments of thought disorder, especially in schizophrenia (Mann, 1944; Manschreck et al., 1981).
- Mean word-length: the average number of letters per word. Longer words are usually more complex and their use indicates better mastery of language. For example, the Coleman–Liau Index (Coleman and Liau, 1975) uses letters-per-word statistics as an important component of evaluating the understandability of a text.
- Average number of words per sentence: longer sentences tend to be syntactically more complex and express more complex ideas. Previous work has employed mean length of utterance for analyzing narrative language in aphasia (Marini et al., 2011).
- Number of sentences per narrative: for texts of similar length in number of words, more sentences indicate the use of shorter, less complex sentences.
- Total number of words per narrative.

In addition, we define the following three features related to repetition in the narrative; a minimal implementation sketch follows the list below.

- Word repetitions: previous work has shown that schizophrenics tend to repeat the same words frequently in the same passage (Maher, 1972). To capture this phenomenon, we define a word repetition as the occurrence of the same word, not including punctuation, in a sliding window of five words within the same sentence. The window size is heuristically chosen. The value of this feature is the number of repetitions of any word, regardless of which specific word was repeated, divided by the total number of words in the narrative. For example, in the sentence "I am, am, afraid, that something bad would, would happen.", "am" is repeated once and "would" is repeated once, giving a total of two repetitions.
- Presence of multiple commas: more commas might indicate more pauses within a sentence, higher sentence complexity or word/phrase enumeration.
- Sentence repetitions: this feature captures the amount of overlap between two adjacent sentences. We define sentence overlap to be equal to the number of words at the beginning of two adjacent sentences that are exactly the same. The sentence repetition feature is computed as the average sentence overlap value over all adjacent sentences within the narrative.
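The following Python sketch illustrates one possible implementation of the generic and repetition features above. The tokenization, the simple sentence splitter and all names are our own simplifications; the paper does not publish its implementation.

```python
import re

def sentences_of(narrative):
    """Split a transcript into sentences on '.', '?' and '!' (the transcription rules)."""
    return [s for s in re.split(r"[.?!]", narrative) if s.strip()]

def words_of(text):
    """Lowercase word tokens; punctuation is excluded."""
    return re.findall(r"[a-z']+", text.lower())

def generic_features(narrative, window=5):
    sents = sentences_of(narrative)
    words = words_of(narrative)
    n = len(words)
    # Word repetitions: the same word re-occurring within a sliding window of
    # five tokens inside one sentence, normalized by narrative length.
    repeats = 0
    for sent in sents:
        toks = words_of(sent)
        for i, tok in enumerate(toks):
            if tok in toks[max(0, i - window + 1):i]:
                repeats += 1
    return {
        "type_token_ratio": len(set(words)) / n if n else 0.0,
        "mean_word_length": sum(len(w) for w in words) / n if n else 0.0,
        "words_per_sentence": n / len(sents) if sents else 0.0,
        "sentences_per_narrative": len(sents),
        "words_per_narrative": n,
        "word_repetitions": repeats / n if n else 0.0,
    }

# The example sentence above yields two repetitions ("am" and "would").
print(generic_features("I am, am, afraid, that something bad would, would happen."))
```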

2.2.2. Word identity features

The word identity features track the frequency of words in the narrative. The frequency of a given word is equal to the number of times that word appears in the narrative divided by the total number of words in the narrative.

1 The number of narratives from each group and each emotion is presented in Table A1 in the Supplement.
2 Sample snippets are shown in Supplement B. Since there was mixed use of commas during transcription, we do not focus on analysis of features related to commas in this work.

For our general analysis of lexical use, we employ all words (1004 in total) that appear at least five times in the transcripts. For training and testing a classifier for the prediction of subject status, we employ only the words that appear at least five times in the training data. In addition, we compute repetition features similar to the generic repetition features but computed for individual words. Here we track whether a particular word was repeated in a span of five consecutive words within a narrative. There are 288 words that are repeated at least once in the entire collection of narratives.

2.2.3. Dictionary features

We used two dictionary-based systems to score transcripts based on the occurrence of categories of words. (a) Linguistic Inquiry and Word Count 2007 (LIWC) (Pennebaker et al., 2007): several manually compiled dictionaries corresponding to different categories are at the heart of the LIWC application. In LIWC, word categories are defined as a list of words or word stems, and each word is mapped to the list of categories associated with it. For instance, the word "cried" is part of the following categories: sadness, negative emotion, overall affect, verb, and past tense verb. When a narrative contains the word "cried", the scores corresponding to these five sub-categories are incremented. For each narrative, LIWC outputs 69 real-valued scores, one for each of its categories. (b) DICTION 6.0 (Hart, 2012): similar to LIWC, the DICTION tool outputs scores that characterize each narrative based on classes of words defined according to manually compiled dictionaries. There are in total 45 features derived by DICTION, belonging to three categories: basic scores (nine features), sub-variables (31 features) and master variables (five features). Each sub-variable corresponds to a dictionary included in the tool. The five master variables (activity, certainty, commonality, optimism and realism) are calculated as combinations of scores from sub-variables. For example, Certainty is calculated as: Certainty = [Tenacity + Leveling + Collectives + Insistence] − [Numerical Terms + Ambivalence + Self-Reference + Variety]. Examples of sub-variables include Cognition (words referring to cerebral processes, both functional and imaginative) and Satisfaction (terms associated with positive affective states).3

2.2.4. Language model features

Language models (LMs) are tables of the probabilities of word sequences that can be easily estimated from sample narratives. For each training fold, we train one LM for narratives from patients and one LM for narratives from controls. We employ a bigram LM in which we estimate the probability of a word given the immediately preceding word, using add-one smoothing (Jurafsky and Martin, 2008). Then we compute the perplexity of a new transcript. Perplexity is defined as two to the minus average log probability of the words in the text: the less probable the text is under the language model, the higher its perplexity will be. Let w_1 w_2 ... w_n denote the text which we wish to classify as produced by a patient or by a control, and P(w_i | w_{i−1}) the probability of a word given its preceding word. The perplexity of the given text according to the bigram LM is

PP(w_1 w_2 ... w_n) = [ ∏_{i=1}^{n} 1 / P(w_i | w_{i−1}) ]^{1/n}

We train bigram LMs for both words (word-LM) and part-of-speech (POS) sequences (POS-LM).
The part-of-speech tags were automatically assigned by the Stanford POS tagger (Toutanova et al., 2003). We defined eight features to characterize each narrative with respect to LM information; they indicate whether on average the story looks more like one produced by patients or one told by controls. A minimal implementation sketch follows the feature list.

- LM1 is the ratio between the narrative perplexity under the control and patient word-LMs.
- LM2 is the ratio between the narrative perplexity under the control and patient POS-LMs.
- LM3 is equal to 1 if the perplexity of the narrative according to the control word-LM is smaller than that of the patient word-LM; otherwise LM4 is equal to 1.
- LM5 and LM6 are the analogous features for the POS-LMs.
- LM7 and LM8 are versions of LM1 and LM2 scaled to the [0, 1] interval.
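A minimal sketch of the bigram LMs and the derived word-LM features is given below. We assume add-one smoothing over the training vocabulary and a sentence-start token; `patient_train_narratives` and `control_train_narratives` are hypothetical placeholders for the training-fold transcripts, and all names are ours.

```python
import math
from collections import Counter

class BigramLM:
    def __init__(self, narratives):
        tokens = [w for n in narratives for w in ["<s>"] + n.split()]
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.V = len(self.unigrams)

    def prob(self, prev, word):
        # Add-one (Laplace) smoothed conditional probability P(word | prev).
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.V)

    def perplexity(self, narrative):
        words = ["<s>"] + narrative.split()
        logp = sum(math.log2(self.prob(p, w)) for p, w in zip(words, words[1:]))
        # Two to the minus average log-probability of the words.
        return 2 ** (-logp / (len(words) - 1))

patient_lm = BigramLM(patient_train_narratives)   # hypothetical training data
control_lm = BigramLM(control_train_narratives)

def lm_features(narrative):
    pp_c = control_lm.perplexity(narrative)
    pp_p = patient_lm.perplexity(narrative)
    return {"LM1": pp_c / pp_p,               # control-to-patient perplexity ratio
            "LM3": 1 if pp_c < pp_p else 0,   # narrative looks more control-like
            "LM4": 1 if pp_c >= pp_p else 0}  # ... or more patient-like
```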

2.3. Feature selection and automatic classification

We trained a supervised classifier to predict clinical status (schizophrenia versus control) based on lexical use in the personal narratives. Automatic prediction of subject status (patient or control) was performed in leave-one-subject-out fashion, where the data from one subject was held out for testing and the data from the remaining subjects was used for feature selection and training. We employ a Support Vector Machine (SVM) (Cortes and Vapnik, 1995) as our learning model, due to its accuracy and effectiveness in text classification and its robustness for multi-dimensional and continuous features (Joachims, 1998; Schölkopf and Smola, 2001; Kotsiantis, 2007).

3 We give definitions for some of the LIWC and DICTION categories in Table A3 in the Supplement.

SVM has also been applied to clinically related tasks such as predicting adherence to treatment for schizophrenia (Howes et al., 2012a, 2012b), predicting psychological disorder (DeVault et al., 2013) and identifying subjects with depression and post-traumatic stress disorder (Yu et al., 2013). Specifically, we use the SVM-light tool (Joachims, 1999). Training and testing are performed 39 times, with the narratives from each subject serving as the test data once. We compute two accuracy scores. One is the accuracy of predicting the status of the subject who told the story, without taking into account that the same subject recounted multiple stories (per-story accuracy). The other is the accuracy of predicting the status of a subject based on all stories the subject told (per-subject accuracy): patient status was assigned to a subject if at least half of his/her stories were predicted as being from a patient. We performed feature normalization using two approaches (binary normalization and feature scaling4) for all but the LM features and the repetitions of specific words. We performed two-tailed t-tests to quantify differences in usage between patients and controls. We examined the performance of the classifier based only on the top-ranked features at different significance levels, i.e. different p-value cutoffs. Because we perform leave-one-subject-out experiments, where one subject is left out for testing and the machine learning and feature selection are performed over the training data, the features selected in different training folds may vary. We trained a classifier based only on the features whose p-values were below a predefined cutoff. The goal was to identify a small set of features that led to good discrimination accuracy, in order to narrow down the analysis to the most significant differences between the two groups. We also examined the interactions between subject status and emotions and found that there were more significant differences for certain emotions than for others. Fig. 1 describes the process of feature ranking and automatic classification in leave-one-subject-out fashion; a sketch of the per-fold feature selection follows.
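Because the reported degrees of freedom are fractional (e.g. d.f. = 169.9), we assume Welch's unequal-variance two-tailed t-test in the sketch below; function names are ours, not the paper's.

```python
import numpy as np
from scipy import stats

def select_features(X_train, y_train, cutoff=0.0007):
    """X_train: stories-by-features matrix; y_train: 1 = patient, 0 = control.
    Returns indices of features whose group difference has p < cutoff."""
    patients = X_train[y_train == 1]
    controls = X_train[y_train == 0]
    selected = []
    for j in range(X_train.shape[1]):
        # Two-tailed, unequal-variance (Welch) t-test on one feature's values.
        _, p = stats.ttest_ind(patients[:, j], controls[:, j], equal_var=False)
        if not np.isnan(p) and p < cutoff:
            selected.append(j)
    return selected
```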

2.4. Evaluation metrics

Several popular evaluation metrics capture different aspects of the performance of the classifier. They are all combinations of the four possible outcomes in prediction for a test example: true positive (TP) means the status of a subject is correctly identified as belonging to a target group; true negative (TN) means the status of a subject is correctly determined as not belonging to a target group; false positive (FP) means a subject is incorrectly identified as belonging to a target group; false negative (FN) means a subject from the target group is incorrectly predicted as belonging to the other group. We report precision, recall, F-score and accuracy, which are defined as

Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
F-score = 2 · Recall · Precision / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall for the patient and control groups, respectively, is called sensitivity and specificity in the literature. We also report macro F-score, which is equal to the average of the F-scores for patients and controls. By using precision, recall, F-score and macro-F as metrics, one can better interpret the results when there is an imbalance between the data of the two groups (Kotsiantis et al., 2006). A sketch of these scores and of the per-subject aggregation is given below.
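The definitions above, together with the per-subject majority vote described in Section 2.3, can be implemented as follows (function names are ours):

```python
def metrics(y_true, y_pred, target=1):
    """Precision, recall, F-score and accuracy for one target group (0 or 1)."""
    tp = sum(t == target and p == target for t, p in zip(y_true, y_pred))
    tn = sum(t != target and p != target for t, p in zip(y_true, y_pred))
    fp = sum(t != target and p == target for t, p in zip(y_true, y_pred))
    fn = sum(t == target and p != target for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity for the target group
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f_score, accuracy

def subject_label(story_predictions):
    # A subject is labeled a patient (1) if at least half of his/her
    # stories are predicted as coming from a patient.
    return 1 if sum(story_predictions) >= len(story_predictions) / 2 else 0
```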

3. Results

3.1. Significant features

We have a total of 2558 features from the four feature groups described above. To investigate the predictive power of these features, we performed two-tailed t-tests based on the full set of narratives. Among all features, 173 showed group differences at the 95% confidence level. Of these, 114 were more prevalent in patients and 59 were more prevalent in controls. Table 2 shows the complete list of significant features for each of the four feature groups. We divide these features into three levels of significance according to p-values: (i) high significance: p < 0.0005 (15 features); (ii) medium significance: 0.0005 ≤ p < 0.005 (20 features); (iii) low significance: 0.005 ≤ p < 0.05 (138 features).

4 Details about binary normalization and feature scaling can be found in Supplement C.


Each normalized feature has two associated p-values: one from the t-test on the feature values under binary normalization and one under feature scaling. In such cases, we report the significance level of the feature according to its smaller p-value. We next discuss the significant features according to feature group.5

Generic features: since this is a small class, we list the average values of all features for patients and controls before normalization in Table 3. We show detailed information on the number of words per story by emotion and by status in Fig. 2. When comparing overall story length by emotion, disgust stories (mean words = 157.83) were significantly shorter than both anger (mean words = 206.95; t = −2.080, d.f. = 79.1, p = 0.041) and sad (mean words = 213.5; t = −2.030, d.f. = 70.4, p = 0.046) stories. The patient and control groups did not differ in the number of words per story (t = 0.622, d.f. = 169.9, p = 0.460), either overall or within the five specific emotions. Based on analysis after feature normalization, there was no significant difference in the type-token ratio (t = −1.295, d.f. = 169.9, p = 0.153). Patients employed significantly fewer words per sentence (t = −2.593, d.f. = 171.1, p = 0.010), had more sentences per narrative (t = 2.088, d.f. = 198.68, p = 0.038) and used shorter words (t = −3.046, d.f. = 158.3, p = 0.003). Patients employed more word repetitions (t = 4.46, d.f. = 199.0, p = 1.33e−5) and there were fewer multiple commas transcribed (t = −4.17, d.f. = 161.7, p = 4.88e−5).

Word identity features: from the features with high and medium significance levels, we see that patients expressed more subjective terms (I), talked more about money, and used the words got, took, way and couldn't significantly more often. From the features with low significance level, we find that patients talked significantly more about dogs (e.g., dog and dogs), friends (e.g., friends and hanging) and family (e.g., grandfather, sister, and son). Patients used more adverbs that denote intensity (p < 0.005), including extremely, moderately and mildly. Patients also quoted more questions while talking about their experiences or posed more questions to the psychiatrists, as indicated by a significantly higher rate of question marks in their narratives. Across narratives of all emotions, repetitions were found more often in schizophrenia. The repetitions of I and and were of high significance, while the repetitions of a and was and of the filler um were at the low significance level. Controls used the word very much more often than patients (p < 0.005). At the low significance level, the words more commonly employed by controls included adverbs such as actually, basically and really, the word sorry, as well as personal pronouns such as she's and theirs.

Dictionary features: the predominant usage of self-reference among patients was one of the most significant features, measured by both LIWC (t = 4.31, d.f. = 176.2, p = 2.73e−5) and DICTION (t = 4.62, d.f. = 172.9, p = 7.343e−6). The schizophrenia group also scored higher on the LIWC categories subjective pronouns and insight, and on the DICTION categories Cognition, Past, Insistence and Satisfaction (0.005 ≤ p < 0.05). Among the medium significance features, controls utilized more adverbs and exclusive words according to LIWC. Controls also expressed at significantly higher rates the DICTION categories Certainty, Cooperation, Diversity, Familiarity and Realism (0.005 ≤ p < 0.05).

5 The total number of features in each feature group can be found in Table A2 in the Supplement. We also show sample snippets for some of the significant features in Supplement B.


Fig. 1. The supervised learning framework for story-level status prediction and subject-level status prediction.

Language model features: four features were identified as significantly different between the groups. LM1, LM4 and LM7 had higher values for patients, and LM3 for controls. All of the features derived from word-LMs were significant, while none of those from POS-LMs were.

3.2. Status prediction

We now turn to the accuracy of subject status prediction (patient versus control) using different feature sets, determined by their levels of significance for association with one of the classes computed on the training data. Fig. 3 shows the prediction accuracy for individual emotions (by story) and for all emotions combined per subject (by subject), using different significance levels (p-value cutoffs) for feature selection. We tested the statistical significance of our results against the majority baseline (59.7%) using a one-tailed t-test. Values significantly better than the majority at the 95% confidence level are marked in yellow in Fig. 3. Our model predicts the correct subject status with at least 65% accuracy over a range of cutoffs (0.0006 ≤ p ≤ 0.01). Peak performance for both story- and subject-level prediction was achieved by selecting features with p-values smaller than 0.0007. The prediction accuracy evaluated by story was 65.2% (t = 1.327, n = 200, p = 0.093), while the accuracy evaluated by subject was 74.4% (t = 1.966, n = 38, p = 0.028). Table 4 includes the detailed results at the best p-value cutoff. Different features were selected in different training folds, even with the same p-value cutoff. In Table 5 we show the features selected at least 10 times at the threshold p < 0.0007, together with the number of times they were selected. Some of the features were selected in every training fold (39 times): repetition of words, self-reference, money, extremely, etc.

3.3. Feature ablation experiment

We study the contribution of features from the different groups by performing group ablation experiments at the p < 0.0007 threshold. The results are shown in Table 6. Each row in the table corresponds to the performance of a classifier in which the specified class of features was removed from the full set of features. Removing any of the four feature groups led to a decrease in performance. The largest degradation was noted when word identity features were removed: story-level accuracy decreased by 11%. This is consistent with previous observations that word identity features contributed the most to the accuracy of predicting adherence to treatment for schizophrenia (Howes et al., 2012a, 2012b). Story-level accuracy also dropped with the removal of general features (by 1.0%), dictionary features (by 4.5%) and LM features (by 1.0%). These results suggest that all classes of features we proposed contribute to the accuracy of prediction and that they carry complementary information about subject status. To evaluate the predictive power of individual features, we examined the accuracy of a one-rule classifier. A one-rule classifier (decision stump) is a one-level decision tree (Iba and Langley, 1992; Holte, 1993). Using a threshold value determined on the training data, the decision stump makes its prediction based on the value of a single feature at the root node (a minimal sketch is given below). The features related to self-reference had the best discriminative power (see Table 7). In fact, by using the DICTION version of the self-reference feature after binary normalization, the classifier could predict subject status with 79.5% accuracy, even higher than the SVM model. The difference, however, is not statistically significant (p > 0.2) according to a one-tailed t-test.
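The sketch below illustrates such a decision stump; the threshold search strategy is our own assumption, as the paper does not specify it.

```python
import numpy as np

def fit_stump(x_train, y_train):
    """x_train: values of a single feature; y_train: 0/1 labels.
    Returns the (threshold, polarity) pair maximizing training accuracy."""
    best_thr, best_pol, best_acc = x_train[0], 1, 0.0
    for thr in np.unique(x_train):
        # pol = 1: predict patient when x >= thr; pol = -1: when x <= thr.
        for pol in (1, -1):
            acc = ((pol * x_train >= pol * thr).astype(int) == y_train).mean()
            if acc > best_acc:
                best_thr, best_pol, best_acc = thr, pol, acc
    return best_thr, best_pol

def predict_stump(x, thr, pol):
    return int(pol * x >= pol * thr)
```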


Table 2
Significant features between schizophrenic patients and healthy controls.

Features more common in schizophrenia:
  Generic features. High-S: word repetitions. Low-S: sentences/narrative.
  Word identity features (words). High-S: extremely, I, mildly, money, got. Medium-S: couldn't, moderately, took, way, ?. Low-S: ain't, alone, at, aw, became, before, behind, care, chance, confused, December, dog, dogs, extreme, feeling, forty, friends, grandfather, god, guess, guy, hanging, hard, hearing, hundred, increased, looking, loved, me, mental, met, mild, moderate, my, myself, outside, paper, passed, piece, remember, sister, son, stand, step, story, take, taken, then, throwing, trouble, use, wake.
  Word identity features (word repetitions). High-S: and, I. Low-S: a, um, was.
  Dictionary features (LIWC). High-S: self-reference. Low-S: insight, personal pronoun.
  Dictionary features (DICTION). High-S: self-reference. Low-S: cognition, insistence, past concern, satisfaction.
  Language model features. High-S: LM1, LM7. Low-S: LM4.

Features more common in controls:
  Generic features. High-S: presence of multiple commas. Medium-S: mean word-length. Low-S: words/sentence.
  Word identity features (words). Medium-S: ",", very. Low-S: able, actually, are, basically, be, being, every, get's, in, late, not, really, relationship, result, she's, sleep, sorry, tell, their, there's, weeks, with.
  Word identity features (word repetitions). Low-S: very.
  Dictionary features (LIWC). Medium-S: adverb, exclusive words. Low-S: inhibition, ≥ six-letter words.
  Dictionary features (DICTION). Medium-S: mean word-length, word-complexity. Low-S: certainty, cooperation, diversity, familiarity, realism.
  Language model features. Low-S: LM3.

Note. High-S (high significance): p < 0.0005; Medium-S (medium significance): 0.0005 ≤ p < 0.005; Low-S (low significance): 0.005 ≤ p < 0.05. Definitions for some of the significant dictionary-based feature categories are listed in Table A3 in the Supplement. Sample snippets for the features I, dogs, money, sorry, very and relationship are listed in Supplement B.

Table 3
Average values and statistical significance of generic features before normalization, for schizophrenic patients and healthy controls.

Feature                                  Schizophrenia (N = 23)   Control (N = 16)   p
Type/token ratio                         0.466                    0.485              0.153
Mean word-length                         3.788                    3.891              0.003
Words/sentence                           18.89                    21.09              0.010
Sentences/narrative                      10.34                    8.77               0.038
Words/narrative                          192.22                   180.79             0.460
Word repetitions                         0.059                    0.041              0.00001
Presence of multiple commas              0.040                    0.056              0.001
Adjacent sentence initial overlapping    3.876                    3.692              0.425

To determine whether the predictive power of the SVM classifier was due only to the presence of self-reference features, we removed all self-reference related features at the p < 0.0007 threshold (see Table 6). The accuracy decreased to 60.7% at the story level and 64.1% at the subject level, but remained considerably higher than chance and the majority baseline. Our findings reveal that self-reference features are an even stronger indicator of subject status than we had anticipated based on findings from prior work. They also reveal that the other features capture salient lexical differences between the groups.

3.4. Status prediction by specific emotions

The SVM classifier has an accuracy of 65.2% by story at the p < 0.0007 cutoff.

Fig. 2. The average number of words for narratives across five emotions (S = schizophrenia, C = control). For this box-whisker plot, we use the default settings of the R function boxplot: the box indicates the lower quartile (Q1), the median (Q2) and the upper quartile (Q3), and the whiskers extend to Q3 + 1.5·IQR and Q1 − 1.5·IQR, where IQR = Q3 − Q1.

We further investigated story-level accuracy based on the emotion that a narrative conveyed, as shown in Table 4. The accuracy was similar for anger (68.3%), fear (67.5%), happy (66.7%) and sad (67.5%) stories, but relatively low for disgust stories (60.9%). The accuracies for all five emotions were higher than the majority baseline, but none significantly so.



Fig. 3. Prediction accuracy for single-emotion narratives (by story) and for all emotions combined per subject (by subject) at different significance levels. Values significantly better than the majority baseline (p < 0.05) are marked in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Next, we performed exploratory two-tailed t-tests for each feature on the narratives for a specific emotion only. In Fig. 4, we show the number of significant features at different p-value cutoffs per emotion. At larger p-value cutoffs (i.e., 0.03, 0.04, and 0.05), more significant features were identified from anger and sad narratives. At smaller p-value cutoffs (p ≤ 0.005), there were more significant features from happy and anger stories, while there were almost no differences in lexical use for the other emotions. We further compared the significant features (at the 95% confidence level) by emotion.6 Some features were significant for only one emotion. In anger stories, patients used words from the ingest and cognition dictionary categories at higher rates. In disgust stories, patients more commonly talked about dogs and health issues, while controls showed a higher communication score according to DICTION. In fear stories, patients talked about money more often than controls, while controls used more inhibition words (for example, block, constrain and stop). In happy stories, patients had a higher tendency of being ambivalent and used more past tense according to DICTION. When talking about sad experiences, patients' stories scored higher in the future, insight and motion categories according to DICTION, and controls used the word working more often. Some features were significant for more than one emotion. These include word repetitions and self-reference among the patient group. Patients talked more about friends in happy and sad stories. There were also more question marks transcribed in the fear and sad stories told by patients. Patients used the filler word "um" more often in disgust and happy stories, which indicates a higher degree of disfluency in the narratives for these emotions. Features with significantly higher values for controls include mean word-length in the anger and disgust emotions.

6 A complete list of significant features for each emotion can be found in Table A5 in Supplement A.

4. Discussion

Aspects of structure, form and content of language can be gleaned from categories that relate to the lexicon as the central constituent. Behavioral studies have elucidated widespread language dysfunction, supported by functional imaging (Li et al., 2009) and electrophysiological studies (Strik et al., 2008). Research in schizophrenia has tended to focus on measures of coherence, which reflect communicatory

dysfunction and formal thought disorder. Even more than other clinical symptoms, disordered language may evidence itself readily during clinical progression and, depending on persistence and severity, will have marked effects on communicative abilities and social functioning. The purpose of our project was to examine differences in lexical use between matched groups of persons with schizophrenia and healthy controls who narrated experiences of five universal emotions. As part of our established procedure to assess evoked expressions of emotions, we asked participants to recount autobiographical experiences of happy, sad, angry, fearful and disgusted emotions. Automated lexical analysis focused on four main feature groups: generic features, which reflect basic information regarding the number of letters and words; word identity features, which track specific words; dictionary features, which group words into semantic categories; and language models, which track probabilities of word sequences. Employing all available information, we examined group differences in lexical use. We also tested an automated classifier predicting subject status, i.e. patient versus control, based on features extracted from the narratives. Automated analyses revealed over 170 features that differed between groups. As the model did not correct for multiple comparisons, further discussion is limited to those features that differed at either the high (p < 0.0005) or medium (0.0005 ≤ p < 0.005) significance level, altogether 35 features. Overall, based on the automated analysis of autobiographical narratives using multiple lexical categories, we found that persons with stable schizophrenia produced a similar quantity of speech, but differed in lower expressivity and complexity. Patients also had significantly higher rates of self-reference and repetitions, consistent with prior work. The automated classifier was able to distinguish the clinical status of patient versus control with promising accuracy. The methodology of feature analysis according to generic features, dictionary features (LIWC and DICTION) and word identity features is easy to replicate for analyzing lexical differences in a broader scope of applications. Our SVM-based text classification model can also be applied to investigate classifying the presence and absence of other mental disorders and predicting the degree of symptoms. Regarding generic features, the groups did not differ with respect to the number of words per story, either overall or within specific emotions; i.e., narratives were of roughly comparable length between the two groups. This was somewhat surprising given the common notion that verbal output is more limited in schizophrenia. Patients produced sentences with a higher number of word repetitions, consistent with previous research (Maher, 1972). They also used shorter words and shorter sentences than controls. As anticipated, patients made more references to themselves, in line with previous findings for persons with depression and those with emotional distress (Rude et al., 2004). Patients mentioned money more often than controls did. The significance of money as an emotional theme may relate to patients' more limited access to finances and therefore its greater emotional value. References to money were widespread among patients: money was mentioned in 19 stories by 13 subjects, in contrast to appearing in only one story from controls. Patients employed adverbs which denote intensity, such as mildly, moderately and extremely. The use of those adverbs relates to the exact instructions participants received before narrating autobiographical experiences and, along with repetition of words and filler pauses (um), indicates a more constricted use of language. Controls, in contrast, employed longer sentences and longer words. A significantly higher number of multiple commas was transcribed in the narratives of controls. Since "," was used in transcription both when pauses appeared and for syntactic/semantic reasons, the sources of this feature's significance may be confounded, and future studies will have to examine pauses and the syntactic use of commas specifically to disentangle their contributions.
Patients employed adverbs which denote intensity such as mildly, moderately and extremely. The use of those adverbs relates to the exact instructions participants received before narrating autobiographical experiences and, along with repetition of words and filler pauses (um), indicates more constricted use of language. Controls, in contrast, employed longer sentences and longer words. A significantly higher number of multiple commas were transcribed among the narratives of controls. Since “,” is used for transcription when either pauses appeared or for syntax/semantic reasons, the reasons for the significance of this feature may be confounded and future studies will have to study pauses and the


Table 4
Story-level and subject-level prediction performance using different feature selection approaches.

                  Schizophrenia                  Control                        General
Measurement       Precision  Recall  F-score     Precision  Recall  F-score     Accuracy  p      Macro-F
By story
  Random          0.597      0.500   0.544       0.405      0.500   0.446       0.500     –      0.495
  Majority        0.597      1       0.748       0          0       0           0.597     –      0.374
  p < 0.0007      0.689      0.758   0.722       0.581      0.494   0.533       0.652     0.093  0.628
By subject
  Random          0.590      0.500   0.541       0.410      0.500   0.450       0.500     –      0.496
  Majority        0.590      1       0.742       0          0       0           0.590     –      0.371
  p < 0.0007      0.724      0.913   0.808       0.800      0.500   0.615       0.744     0.028  0.712
By emotion
  Anger           0.677      0.875   0.763       0.700      0.412   0.519       0.683     0.105  0.641
  Disgust         0.652      0.600   0.625       0.444      0.500   0.471       0.609     0.322  0.548
  Fear            0.656      0.913   0.763       0.750      0.353   0.480       0.675     0.080  0.622
  Happy           0.720      0.750   0.735       0.570      0.533   0.551       0.667     0.300  0.643
  Sad             0.762      0.667   0.711       0.579      0.688   0.629       0.674     0.237  0.670

Note: F-score = 2 × Precision × Recall / (Precision + Recall); Macro-F = average of the F-scores for schizophrenics and controls. We test the significance of prediction accuracy against the majority baseline using a one-tailed t-test; the alternative hypothesis is that our model is better than the majority.

Table 5
Top features selected at the p < 0.0007 threshold in at least 10 training folds.

Feature name                     Times selected
Presence of multiple ","^a       39
Self-reference (LIWC)^a          39
Money^a                          39
Extremely^a                      39
Self-reference (DICTION)^a       39
Repeat of words^b                39
Self-reference (LIWC)^b          39
Self-reference (DICTION)^b       39
Repeat of "and"                  38
LM7^b                            37
LM1^a                            37
Repeat of "I"^a                  36
Mildly^a                         36
I^b                              35
Mildly^b                         34
Got^a                            26
Adverb (LIWC)^b                  18
","^b                            12
Couldn't                         10

^a Feature derived by binary normalization. ^b Feature derived by feature scaling, which scales features to real numbers in the [0, 1] interval.

Controls also more commonly employed adverbs (e.g., actually, basically, really) and words of exclusion (e.g., but, without, exclude). The usage of the adverb very (e.g., very happy, very respectable; see also Supplement B for examples) was more prominent among controls. In sum, our findings indicate higher complexity and more expressive language in controls. Most differences were found in the generic and word identity features. Dictionary features, such as the LIWC and DICTION categories that place specific words in semantic classes, did not yield additional information at the medium or high significance levels. The language model features capture typical lexical usage in patients and controls; their significance indicates that the two groups utilized different words and topics in the expression of emotion narratives. Our classifier for predicting clinical status, utilizing an average of 17 features, showed sensitivity of over 90% and specificity of 50% in determining the group affiliation of schizophrenia subjects. The prediction accuracy was similar for all emotions except disgust, where accuracy was much lower. Analyzed by the number of significant features across emotions at different cutoffs, the two groups did not show as many lexical differences when describing fearful and disgust experiences as for the other emotions (see Fig. 4).

This finding suggests that the narratives of happy, sad and anger emotions might reflect the lexical and topical differences between the two groups better than the other emotions. The main limitation of our work is the limited sample size. It did not permit us to examine whether lexical features were related to common clinical symptoms and functioning, as patients were deemed clinically stable and their symptoms fell within a narrow range. Although our model demonstrated its effectiveness in distinguishing patients from healthy controls, the performance varied with different p-value cutoffs, especially with very small or very large cutoffs. Ideally, we would like to have a dedicated test set, with the p-value cutoff selected using a held-out development set; however, this is difficult given the small sample size. We were surprised to find that individual features related to self-reference result in prediction accuracy similar to that of the complete SVM model. To quantify the degree to which the self-reference features and the remaining features capture complementary information about the difference between the two groups, we compared the prediction errors of the SVM classifier at the p < 0.0007 threshold and the one-rule classifier using self-reference in DICTION as the only feature. At the story level, the two classifiers made 70 and 69 wrong predictions respectively, but only 44 of these were errors on the same stories. At the subject level, the two classifiers made 10 and eight wrong predictions respectively, six of which were for the same subjects. In sum, the predictions made by the SVM were quite different from those of the one-rule classifier. These results indicate that the information encoded in the self-reference features and in the other features we studied is complementary, and the respective classifiers make errors on different instances. We leave for future work the development of a decision-level combination of the two classifiers, which in many other applications has been shown to be more powerful than simply combining the features in a single classifier. Furthermore, the ablation experiments we performed demonstrated the effectiveness of the features not related to self-reference. Since machine learning requires large amounts of data, it is likely that the SVM will outperform the one-rule classifier if more data become available; we leave testing this possibility for future work. For investigating lexical differences based on dictionary features, we employed two toolkits: LIWC and DICTION. Of these two, DICTION is not as popular as LIWC for psychological analysis.


Table 6
Feature ablation experiment. Feature selection is performed first at the p < 0.0007 threshold.

                    Schizophrenia                  Control                        General
Features removed    Precision  Recall  F-score     Precision  Recall  F-score     Accuracy  p      Macro-F
By story
  None              0.689      0.758   0.722       0.580      0.494   0.533       0.652     0.091  0.628
  General           0.685      0.742   0.712       0.563      0.494   0.526       0.642     0.143  0.619
  Dictionary        0.652      0.733   0.690       0.515      0.420   0.463       0.607     0.404  0.577
  Word-identity     0.607      0.617   0.612       0.418      0.407   0.413       0.532     0.912  0.513
  LM                0.682      0.750   0.714       0.565      0.481   0.520       0.642     0.143  0.617
  Self-reference    0.661      0.700   0.680       0.514      0.469   0.490       0.607     0.409  0.585
By subject
  None              0.724      0.913   0.808       0.800      0.500   0.615       0.744     0.028  0.712
  General           0.700      0.913   0.792       0.787      0.438   0.560       0.718     0.048  0.676
  Dictionary        0.679      0.826   0.745       0.636      0.438   0.519       0.667     0.187  0.632
  Word-identity     0.643      0.783   0.706       0.544      0.376   0.444       0.615     0.384  0.575
  LM                0.724      0.913   0.808       0.800      0.500   0.615       0.744     0.028  0.712
  Self-reference    0.667      0.783   0.720       0.583      0.438   0.500       0.641     0.285  0.610

Note: features related to self-reference achieve the highest accuracy in a one-rule classifier; we therefore also exclude those features as an additional ablation experiment. We test the significance of prediction accuracy against the majority baseline using a one-tailed t-test; the alternative hypothesis is that our model is better than the majority.

Table 7
Performance of the one-rule classifier using the most discriminative features (self-reference).

                      By story                     By subject
Feature               Accuracy  p      Macro-F     Accuracy  p      Macro-F
Self (LIWC)^a         0.642     0.186  0.638       0.744     0.080  0.739
Self (LIWC)^b         0.667     0.061  0.654       0.744     0.055  0.729
Self (DICTION)^a      0.657     0.111  0.652       0.795     0.022  0.788
Self (DICTION)^b      0.657     0.096  0.645       0.744     0.055  0.729
Repeat of "I"         0.652     0.096  0.630       0.769     0.016  0.745
I^a                   0.607     0.419  0.601       0.615     0.400  0.598
I^b                   0.672     0.023  0.634       0.744     0.016  0.699

^a Feature derived by binary normalization. ^b Feature derived by feature scaling, which scales features to real numbers in the [0, 1] interval.

Fig. 4. The number of significant features for the five basic emotions (anger, disgust, fear, happy, sad) at different p-value cutoffs.

In our work, however, the word categories of DICTION revealed several statistically significant differences between the two groups. In fact, the p-value of the DICTION-based self-reference feature was smaller than that of its LIWC counterpart. The word categorizations of the dictionaries in the two toolkits are also different. These findings indicate that future studies will benefit from the use of both dictionaries.

Future directions of this research may include the potential relatedness of language measures to clinical symptoms that reflect acuity of illness and the more enduring negative symptoms, and to quality of life measures. Our computational model focused mainly on differences in the content of language. In schizophrenia, both abnormal lexicon and abnormal thought progression, or formal thought disorder, are common clinical features, and our approach can be extended to include both aspects of language. Recent studies have shown that both positive and negative symptoms of schizophrenia map onto certain topics generated during a nonclinical interview that relate to quality of life measures (Howes et al., 2013), and that the verbal discourse between caretakers and persons with schizophrenia (Cretchley et al., 2010) is modified by the conversation profile of the patient. Therefore, in combination with syntactic measures of coherence, an automated model of both form and content of thought may allow for the characterization of discourse in dyadic interactions, which could assist in remediation approaches that focus on more effective use of language and social interactions.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.psychres.2014.10.002.

References

Ando, M., Morita, T., O'Connor, S.J., 2007. Primary concerns of advanced cancer patients identified through the structured life review process, a qualitative study using a text mining technique. Palliative & Supportive Care 5 (3), 265–271.
Coleman, M., Liau, T.L., 1975. A computer readability formula designed for machine scoring. Journal of Applied Psychology 60, 283–284.
Cortes, C., Vapnik, V., 1995. Support vector networks. Machine Learning 20, 273–297.
Covington, M.A., He, C., Brown, C., Naci, L., McClain, J.T., Fjordbak, B.S., Semple, J., Brown, J., 2005. Schizophrenia and the structure of language: the linguist's view. Schizophrenia Research 77, 85–98.
Crow, T.J., 1997. Is schizophrenia the price that Homo sapiens pays for language? Schizophrenia Research 28, 127–141.


Cretchley, J., Gallois, C., Chenery, H., Smith, A., 2010. Conversations between carers and people with schizophrenia: a qualitative analysis using Leximancer. Qualitative Health Research 20 (12), 1611–1628.
DeLisi, L.E., 2001. Speech disorder in schizophrenia: review of the literature and exploration of its relation to the uniquely human capacity for language. Schizophrenia Bulletin 27 (3), 481–496.
DeVault, D., Georgila, K., Artstein, R., Morbini, F., Traum, D., Scherer, S., Rizzo, A., Morency, L., 2013. Verbal indicators of psychological distress in interactive dialogue with a virtual human. In: Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2013), Metz, France, pp. 193–202.
Docherty, N.M., 2005. Cognitive impairments and disordered speech in schizophrenia: thought disorder, disorganization, and communication failure perspectives. Journal of Abnormal Psychology 114, 269–278.
Docherty, N.M., DeRosa, M., Andreasen, N.C., 1996. Communication disturbances in schizophrenia and mania. Archives of General Psychiatry 53 (4), 358–364.
Elvevåg, B., Foltz, P.W., Weinberger, D.R., Goldberg, T.E., 2007. Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia. Schizophrenia Research 93, 304–316.
Gabani, K., Sherman, M., Solorio, T., Liu, Y., Bedore, L., Peña, E., 2009. A corpus-based approach for the prediction of language impairment in monolingual English and Spanish-English bilingual children. In: Proceedings of the 2009 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2009), Boulder, Colorado, USA, pp. 46–55.
Geschwind, N., Galaburda, A.M., 1987. Cerebral Lateralization: Biological Mechanisms, Associations and Pathology. The MIT Press, Cambridge, MA.
Gill, A.J., Nowson, S., Oberlander, J., 2009. What are they blogging about? Personality, topic and motivation in blogs. In: Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media (ICWSM 2009), pp. 18–25.
Gill, A.J., Oberlander, J., Austin, E., 2006. Rating e-mail personality at zero acquaintance. Personality and Individual Differences 40 (3), 497–507.
Hart, R., 2012. DICTION 6.0: The Text-Analysis Program User's Manual. Scolari Software, Sage Press. Retrieved from: 〈http://www.dictionsoftware.com〉.
He, Q., Veldkamp, B.P., de Vries, T., 2012. Screening for posttraumatic stress disorder using verbal features in self narratives: a text mining approach. Psychiatry Research 198, 441–447.
Heilman, M.J., Collins, K., Callan, J., 2007. Combining lexical and grammatical features to improve readability measures for first and second language texts. In: Proceedings of the 2007 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2007), Rochester, New York, USA, pp. 460–467.
Holte, R.C., 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–90.
Howes, C., Purver, M., McCabe, R., Healey, P.G.T., Lavelle, M., 2012a. Helping the medicine go down: repair and adherence in patient-clinician dialogues. In: Proceedings of the 16th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL 2012), Paris, France.
Howes, C., Purver, M., McCabe, R., Healey, P.G.T., Lavelle, M., 2012b. Predicting adherence to treatment for schizophrenia from dialogue transcripts. In: Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2012), Seoul, South Korea, pp. 79–83.
Howes, C., Purver, M., McCabe, R., 2013. Using conversation topics for predicting therapy outcomes in schizophrenia. Biomedical Informatics Insights 6 (Suppl. 1), S39–S50.
Iba, W., Langley, P., 1992. Induction of one-level decision trees. In: Proceedings of the Ninth International Conference on Machine Learning (ICML 1992), Aberdeen, Scotland, pp. 233–240.
Joachims, T., 1998. Text categorization with support vector machines. In: Proceedings of the European Conference on Machine Learning (ECML 1998), Chemnitz, Germany.
Joachims, T., 1999. Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (Eds.), Advances in Kernel Methods—Support Vector Learning. The MIT Press, Cambridge, MA, pp. 169–184.
Jurafsky, D., Martin, J.H., 2008. Speech and Language Processing, 2nd ed. Prentice-Hall, Upper Saddle River, NJ.
Kohler, C.G., Walker, J.B., Martin, E.A., Healey, K.M., Moberg, P.J., 2010. Facial emotion perception in schizophrenia: a meta-analytic review. Schizophrenia Bulletin 36 (5), 1009–1019.
Kotsiantis, S., Kanellopoulos, D., Pintelas, P., 2006. Handling imbalanced datasets: a review. GESTS International Transactions on Computer Science and Engineering 30 (1), 25–36.
Kotsiantis, S.B., 2007. Supervised machine learning: a review of classification techniques. Informatica 31, 249–268.


Li, X., Branch, C.A., DeLisi, L.E., 2009. Language pathway abnormalities in schizophrenia: a review of fMRI and other imaging studies. Current Opinion in Psychiatry 22 (2), 131–139.
Maher, B.A., 1972. The language of schizophrenia: a review and interpretation. British Journal of Psychiatry 120, 3–17.
Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K., 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research 30, 457–500.
Mann, M.B., 1944. The quantitative differentiation of samples of spoken language. Psychological Monographs 56, 41–74.
Manschreck, T.C., Maher, B.A., Ader, D.N., 1981. Formal thought disorder, the type-token ratio, and disturbed voluntary motor movement in schizophrenia. British Journal of Psychiatry 139, 7–15.
Marini, A., Andreetta, S., Del Tin, S., Carlomagno, S., 2011. A multi-level approach to the analysis of narrative language in aphasia. Aphasiology 25, 1372–1392.
Marini, A., Spoletini, I., Rubino, I.A., Ciuffa, M., Bria, P., Martinotti, G., Banfi, G., Boccascino, R., Strom, P., Siracusano, A., Caltagirone, C., Spalletta, G., 2008. The language of schizophrenia: an analysis of micro- and macrolinguistic abilities and their neuropsychological correlates. Schizophrenia Research 105 (1–3), 144–155.
McKenna, P.J., Oh, T.M., 2005. Schizophrenic Speech: Making Sense of Bathroots and Ponds that Fall in Doorways. Cambridge University Press.
Newby, D., 1998. 'Cloze' procedure refined and modified: 'modified Cloze', 'reverse Cloze' and the use of predictability as a measure of communication problems in psychosis. British Journal of Psychiatry 172, 136–141.
Newman, M.L., Pennebaker, J.W., Berry, D.S., Richards, J.M., 2003. Lying words: predicting deception from linguistic styles. Personality and Social Psychology Bulletin 29, 665–675.
Niznikiewicz, M.A., Shenton, M.E., Voglmaier, M., Nestor, P.G., Dickey, C.C., Frumin, M., Seidman, L.J., Allen, C.G., McCarley, R.W., 2002. Semantic dysfunction in women with schizotypal personality disorder. American Journal of Psychiatry 159, 1767–1774.
Pennebaker, J.W., King, L.A., 1999. Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology 77, 1296–1312.
Pennebaker, J.W., Booth, R.J., Francis, M.E., 2007. Linguistic Inquiry and Word Count (LIWC2007): a text analysis program. Austin, Texas. Retrieved from: 〈http://www.liwc.net〉.
Prud'hommeaux, E.T., Roark, B., Black, L.M., Santen, J.V., 2011. Classification of atypical language in autism. In: Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland, Oregon, USA, pp. 88–96.
Rude, S., Gortner, E.M., Pennebaker, J.W., 2004. Language use of depressed and depression-vulnerable college students. Cognition & Emotion 18 (8), 1121–1133.
Schölkopf, B., Smola, A.J., 2001. Learning with Kernels. The MIT Press, Cambridge, MA.
Smith, A.E., 2003. Automatic extraction of semantic networks from text using Leximancer. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2003): Companion Volume, Edmonton, Canada, Demo pp. 23–24.
Solorio, T., Sherman, M., Liu, Y., Bedore, L., Peña, E., Iglesias, A., 2011. Analyzing language samples of Spanish-English bilingual children for the automated prediction of language dominance. Natural Language Engineering 17 (3), 367–395.
Solovay, M.R., Shenton, M.E., Holzman, P.S., 1987. Comparative studies of thought disorder: I. Mania and schizophrenia. Archives of General Psychiatry 44, 13–20.
Stirman, S.W., Pennebaker, J.W., 2001. Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine 63, 517–522.
Strik, W., Dierks, T., Hubl, D., Horn, H., 2008. Hallucinations, thought disorders, and the language domain in schizophrenia. Clinical EEG & Neuroscience 39 (2), 91–94.
Tausczik, Y.R., Pennebaker, J.W., 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29 (1), 24–54.
Toutanova, K., Klein, D., Manning, C., Singer, Y., 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2003), Edmonton, Canada, pp. 252–259.
Yu, Z., Scherer, S., DeVault, D., Gratch, J., Stratou, G., Morency, L., Cassell, J., 2013. Multimodal prediction of psychological disorder: learning nonverbal commonality in adjacency pairs. In: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL 2013), Amsterdam, Holland.