Accepted Manuscript

Likeableness and Meaningfulness Ratings of 555 (+486) Person-Descriptive Words

Jesse Chandler

PII: S0092-6566(16)30084-8
DOI: http://dx.doi.org/10.1016/j.jrp.2016.07.005
Reference: YJRPE 3582
To appear in: Journal of Research in Personality
Received Date: 1 July 2015
Revised Date: 4 July 2016
Accepted Date: 14 July 2016
Please cite this article as: Chandler, J., Likeableness and Meaningfulness Ratings of 555 (+486) Person-Descriptive Words, Journal of Research in Personality (2016), doi: http://dx.doi.org/10.1016/j.jrp.2016.07.005
Running Head: 555(+486) PERSON-DESCRIPTIVE WORDS
Likeableness and Meaningfulness Ratings of 555 (+486) Person-Descriptive Words Jesse Chandler Mathematica Policy Research University of Michigan (5451 words)
Author Note Jesse Chandler, University of Michigan. I am grateful to Danielle Shapiro and an anonymous reviewer for helpful comments on an earlier draft of this manuscript. Correspondence concerning this article should be addressed to Jesse Chandler, Institute for Social Research, University of Michigan, Ann Arbor, MI, 48103. Contact:
[email protected]
Abstract

The present study renorms and expands upon a list of person descriptive words originally compiled by Anderson (1968). Anderson observed that person descriptive words had a bimodal and slightly negative distribution. Averill (1980) localized this negativity to emotion words, finding no difference for non-emotional words. Likability ratings observed in the original study and the present study were highly correlated. Despite these similarities, significant differences in likability were observed across a large proportion of words. There was some evidence that words describing emotions and temporary states were unusually negative, suggesting that either negative behavior is categorized differently or that the granularity of negative behavior differs across kinds of person descriptive words.

Keywords: Lexical hypothesis, impression formation, negativity bias
1. Introduction

Almost fifty years ago, Norman Anderson (1968) published normed data on 555 person descriptive words. This list was constructed through an iterative process in which 2200 person descriptive words deemed “likely to be useful” to describe people were selected by Anderson from a much longer list of person descriptive words assembled by Gordon Allport and Henry Odbert (1936). These researcher-nominated words were further reduced through college student ratings of appropriateness to a final list of words that were then normed for likability and meaningfulness. Anderson (1968) has been enormously influential: according to Google Scholar this paper has been cited over 1700 times. Researchers across all disciplines of psychology have explicitly used his norms to develop experimental stimuli (e.g., Adolphs, Tranel & Damasio, 1998), dependent measures (e.g., Markus, 1977), and codebooks to quantify descriptive text (e.g., Moretti & Higgins, 1990). Other researchers have indirectly relied on his norms by replicating studies that selected stimuli based on these norms or by adapting these experiments to test other questions (e.g., variations of the “Donald” paradigm; Higgins, Rholes & Jones, 1977).

Anderson’s normed ratings have also led to two important observations about the likability of person descriptive words. First, likability ratings of person descriptive words are bimodal, with many positive and negative words and few neutral words. This suggests that person descriptive words are not impartial characterizations of human behavior but rather almost always include some blend of description and evaluation, reflecting the importance, and perhaps even the primacy, of evaluation in person perception (Murphy & Zajonc, 1993). This tendency is
most widely discussed in the context of problems this distribution may cause in the measurement of underlying dimensions of personality (for discussions see Ashton, Lee, Goldberg & de Vries, 2009; Block, 1995).

Second, and of broader interest, there are more negative person descriptive words than positive person descriptive words. Anderson (1968) initially characterized the increased prevalence of negative words as “slight,” but Averill (1980) noted that it differed significantly from an equal distribution. To the extent that a larger number of words about a topic reflects its greater psychological importance (see John, Angleitner, & Ostendorf, 1988), the preponderance of negative words may reflect the importance of attending to negative information (Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001; Rozin & Royzman, 2001). Descriptions of negative behavior may be more granular (and thus more plentiful) because people think about negative information more carefully (e.g., Schwarz, 2002; Unkelbach, Fiedler, Bayer, Stegmuller, & Danner, 2008). More recently, this tendency has also been interpreted within an evolutionary framework as reflecting the importance of correctly identifying and communicating information about interpersonal threats (Leising, Ostrovski, & Borkenau, 2012; Leising, Scharloth, Lohse, & Wood, 2014).

These interpretations overlook a potentially important caveat to the general tendency for person descriptive words to be negative. Averill (1980) found that the preponderance of negative words was explained entirely by negativity within words that described emotional states. The remaining non-emotional words were more likely to be positive (see also Kloumann et al., 2012). Accounts that emphasize the importance of attending to environmental threats or carefully processing negative stimuli are typically silent on why only negative emotions are described with such nuance.
Averill’s (1980) explanation of the unique negativity of emotion words started with the observation that people not only evaluate others but are themselves the target of evaluation, both by others and by themselves (Bem, 1972). In a variation of the self-serving bias (Campbell & Sedikides, 1999), he further speculated that the motivation to appear favorable to the self and others drives the increased specificity of words used to describe negative emotions. Negative behavior is threatening to the self, and granular distinctions between negative terms compartmentalize undesirable behavior. Emotions in particular are protective because they are often thought of as beyond control and external to the self, insulating actors from blame (see also Pizarro, Uhlmann & Salovey, 2003).

1.1 Motivations for Replication

The present study both directly (or at least closely; Stroebe & Strack, 2014) and conceptually replicates both Anderson’s original study and Averill’s (1980) secondary analysis. It directly replicates Anderson by renorming words included in his original list and examining agreement between norms obtained in the original study and the replication. It conceptually replicates these findings by norming a new set of person descriptive words that are selected in a manner that is free of researcher bias. The proportion of negative words overall, including both emotion and non-emotion words, will be examined within both the direct and conceptual replication.

Adding new words to Anderson’s list is of practical importance. The original word list is small enough that it can be difficult to use it to develop sets of stimuli that satisfy multiple constraints. Renorming Anderson’s original words is important because there have been few systematic investigations of the stability of ratings for individual person descriptive words (but see Conolley & Knight, 1976; Dumas, Johnson & Lynch, 2002). Many factors can change the
likability of a word. First, how people evaluate the common sense in which a word is used can change. A resurgence in the value placed on handmade objects can lead people to appreciate “craftiness.” At the same time, the dominant sense in which a word is used can change. For example, people may think “craftiness” refers to skill at macramé, or they may think of it as a synonym for “devious.”

Direct and conceptual replication also allow key assumptions that underlie Anderson’s (1968) method to be tested. Direct replication makes it possible to identify whether tacit assumptions of the original study can be reconstructed (Collins, 1974) and whether a finding generalizes across contexts, people or time. Conceptual replication strengthens confidence that research findings are robust to decisions made in experimental design and are not a result of experimental artifacts (Schmidt, 2009).

Of particular relevance for the present study, recall that Anderson’s first step was to restrict the universe of possible person descriptive words to those deemed “likely to be useful.” This step included a deliberate effort to eliminate “extreme words...and other words not considered suitable for the impression formation task” (Anderson, 1968, pg. 272). Consequently, it is difficult to know the extent to which observed patterns reflect the true distribution of person descriptive words and the extent to which they inadvertently reflect Anderson’s own beliefs, actions, or motives (for a related discussion of trait taxonomies see Block, 1991). For example, Anderson may have searched for antonyms as a strategy for identifying words, either because this was consistent with the assumptions of dominant personality theories of the time (John & Srivastava, 1999), or because positive and negative words were of greatest relevance for his own research (e.g., Anderson, 1965).
For similar reasons, the proportion of negative person descriptive words reported by Anderson may not reflect the proportion of negative person descriptive words in English. The negativity of traits could have been amplified by psychology’s historic focus on negative aspects of the human experience (Carlson, 1966; Peterson, 2006; Seligman & Csikszentmihalyi, 2000), or attenuated by Anderson’s exclusion of extremely evaluative traits or other (unspecified) efforts to “balance” the list. The conceptual replication avoids researcher artifacts by using a more systematic method of determining traits for inclusion.

2. Method

The replication effort updates or replicates efforts published in three different papers: the initial listing of person descriptive words collected and classified into four broad categories by Allport and Odbert (1936), Anderson’s (1968) reduction of this list to a smaller subset of words that he then normed for likability and meaningfulness, and Averill’s (1980) secondary analysis of the overall proportion of negative words and the proportions of negative emotional and nonemotional words. Using the process described below, 960 words, including 473 of the words on Anderson's (1968) original list, were selected (referred to as the “conceptual replication”). The remaining 82 words from Anderson's original list were added to allow for a direct replication of his original 555 words (referred to as the “direct replication”). The final list consisted of 1042 words (referred to as the “total sample”). At times, the direct and conceptual replications are compared to the earlier replication conducted by Dumas and colleagues (2002; referred to as “the Dumas replication”).

2.1 Generating a Comprehensive List of Person Descriptive Words

Anderson (1968) used Allport and Odbert's (1936) list of person descriptive words as a starting point for a comprehensive list of words. For the present study, this list was supplemented
with additional person descriptive words added to the Oxford English Dictionary after 1935. The final list consisted of 21,342 words.

2.2 Reducing the Comprehensive List

In Anderson’s original study, students were asked to rate a list of 2200 words he selected according to how appropriate they were for describing college students. If at least two respondents indicated that they did not know the meaning of a word, it was excluded from further consideration. The most appropriate words (that is, those scoring above an unspecified cutoff score) were selected for inclusion in the final published list.

In the present study, all words were rated for appropriateness for describing people. Ratings were provided by US Mechanical Turk workers who had completed at least 500 HITs with a HIT approval rating of greater than 95% (for a discussion of using Mechanical Turk workers to rate items see Chandler, Paolacci & Mueller, 2013). Workers were paid 10 cents for each HIT that they completed. Each HIT consisted of a block of 20-25 words. Workers could rate as many blocks as they wished. Following the original study, for each word, workers were asked to:

“Select X if you personally do not have a sense of what the word means. If you have a sense of what the word means, select a number from 0 to 3 according to its appropriateness for describing people from 0 (can never be used to describe people) to 3 (can only be used to describe people).”

A word was excluded from further consideration if three respondents indicated that they did not know what it meant, or if two respondents indicated that they did not know what it meant and one participant rated the word as a '0' in meaningfulness.
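The exclusion rule above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function name `is_excluded` and the use of `None` for the "X" response are hypothetical, and the sketch assumes that "three" and "one" mean "at least three" and "at least one."

```python
from typing import Optional, Sequence

# Hypothetical encoding: None stands in for the "X" (meaning unknown) response;
# integers 0-3 are appropriateness ratings.
UNKNOWN = None


def is_excluded(responses: Sequence[Optional[int]]) -> bool:
    """Return True if a word would be dropped under the stated rule.

    Assumption: a word is dropped if at least three raters chose "X", or if
    exactly two chose "X" and at least one other rater gave a rating of 0.
    """
    unknown = sum(1 for r in responses if r is UNKNOWN)
    zeros = sum(1 for r in responses if r == 0)
    return unknown >= 3 or (unknown == 2 and zeros >= 1)
```

For instance, under these assumptions a word with responses `[None, None, 0, 3]` would be excluded, whereas `[None, None, 1, 3]` would be kept.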
As a quality-control measure, each block included two pseudowords (strings of letters that phonetically appear to be words, but have no meaning). A block of word ratings was replaced with new ratings if the worker assigned an average appropriateness to the two pseudowords greater than 0.5. In total, 5.6% of responses were rejected and replaced in this manner. Average appropriateness ratings were calculated for each remaining word. Words were selected for the conceptual replication if they fell above the arbitrary cutoff of 2.36.

2.3 Likableness Ratings

In the original study, 100 college students rated each word for likableness. In the present study, approximately 50 participants from a university subject pool and 50 workers recruited through MTurk rated each word for likableness. Subject pool participants were each asked to rate between 50 and 500 words as a part of larger packets of questionnaires. Each word was rated approximately 50 times. MTurk workers who had completed at least 50 HITs with a HIT approval rating of greater than 95% rated words in blocks of 20-25 words at a time. Workers were paid 10 cents for each block of words that they rated. Workers could rate as many blocks as they wished. Following the original study, for each word participants were asked to:

“Think of a person as being described by each of these words. Rate the word according to how much you would like the person described by this trait. Try to use a range of different ratings, rather than just a few across all of the words. Most importantly, rate each trait according to your own personal opinion.”

Each word was rated on a scale from 0 (least favorable or desirable) to 6 (most favorable or desirable).
Quality control measures were not applied to college student participants. Word ratings for each HIT submitted by MTurk workers were correlated with the mean rating of those words. A block of ratings was rejected and replaced with new responses if they were uncorrelated or negatively correlated with the average ratings of other workers. Less than 1% of responses were rejected and replaced in this manner.

2.4 Meaningfulness Ratings

In the original study, 50 college students rated each word for meaningfulness. In the present study, words were rated by 50 MTurk workers. Workers were eligible to rate words if they had completed at least 50 HITs with a HIT approval rating of greater than 95%. Workers were presented with blocks of 20 words and two pseudowords. Workers were paid 10 cents for each block of words that they rated. Each block was rated by approximately 50 US workers. Workers could rate as many blocks as they wished, but could only rate each block once. Following the original study, participants were asked to:

“Try to think of a person as being described by each of these words. Rate the word according to how well you know its meaning as a description of people.”

Each word was rated on a scale ranging from 0 ("I have almost no idea of the meaning of this word") to 4 ("I have a very clear and definite understanding of the meaning of this word"). A block of ratings was replaced with new responses if they were uncorrelated or negatively correlated with the average ratings of other workers (pseudowords were included to avoid issues with correlating words with a potentially restricted range of meaningfulness ratings). Less than 2% of responses were rejected and replaced in this manner.

2.5 Word Categorization
2.5.1 Emotionality. Averill (1980) classified words in the original study according to whether they are descriptive of emotional states or tendencies by first identifying all words that overlapped with an existing taxonomy of emotions (Averill, 1975) and asking three graduate students to classify remaining words according to whether they described an emotion or emotional predisposition (Averill, 1980, pg. 12). Words were classified as emotional if two raters indicated such. Individual word ratings and reliability statistics were not reported, but depending on the judge, between 226 and 318 words were categorized as emotional.

In the present study, each word was categorized by 11 MTurk workers. Eleven workers were used because prior research has demonstrated that data quality for majority-rule crowd ratings asymptotes at around 10 responses (Carvalho, Dimitrov & Larson, 2016) and that crowds of this size are often equivalent to or superior to the aggregated judgments of smaller groups of experts (Benoit et al., 2014; Byun et al., 2014). Ratings were provided by US MTurk workers who had completed at least 500 HITs with a HIT approval rating of greater than 95%. Workers were paid 10 cents for each HIT that they completed. Each HIT consisted of a block of 19 words. Workers could rate as many blocks as they wished. Emotional words were defined as those that "could describe either a temporary emotion (e.g., ‘angry’), or a long term tendency to experience a particular type of emotion (e.g., ‘ill-tempered’)." Workers were then asked to indicate whether each word was emotional or not using a binary choice measure. A block of ratings was replaced with new responses if they were uncorrelated or negatively correlated with the average ratings of other workers. In total, 4.6% of responses were rejected and replaced in this manner.
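The block-level screens described in Sections 2.2 through 2.5.1, together with a majority-rule aggregation of the 11 binary judgments, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually run: all function names are hypothetical, "uncorrelated" is operationalized here as a Pearson correlation of zero or below (the manuscript does not say how it was determined), and a block with no variance is treated as uncorrelated.

```python
from statistics import mean
from typing import Sequence


def fails_pseudoword_check(pseudoword_ratings: Sequence[float],
                           threshold: float = 0.5) -> bool:
    """Flag a block whose mean pseudoword rating exceeds the 0.5 threshold."""
    return mean(pseudoword_ratings) > threshold


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: no variance is treated as "uncorrelated"
    return sxy / (sxx * syy) ** 0.5


def fails_agreement_check(block_ratings: Sequence[float],
                          mean_of_others: Sequence[float]) -> bool:
    """Flag a block that is uncorrelated or negatively correlated with the
    mean ratings other workers gave to the same words (assumed: r <= 0)."""
    return pearson_r(block_ratings, mean_of_others) <= 0


def majority_label(votes: Sequence[bool]) -> bool:
    """Label a word as emotional if more than half of the votes say so."""
    return sum(votes) > len(votes) / 2
```

With an odd number of raters such as 11, `majority_label` can never tie, which is one practical reason to prefer an odd crowd size for binary judgments.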
The final categorization of each word was determined by the majority of worker responses. In total, 274 words were categorized as emotion words (210 from Anderson’s original list) and 768 words were categorized as nonemotion words.

2.5.2 Traits, states, evaluations and other terms. Words were classified using the categories assigned by Allport & Odbert (1936): personal traits (permanent internal qualities), temporary states (transient internal qualities), social evaluations (descriptions of the effect an individual has on others that do not correspond to a singular internal disposition) and miscellaneous, primarily metaphorical or doubtful words (a catch-all category of remaining words). The original classifications were retained for the 927 words in this list that were included in Allport and Odbert (1936). I classified new words using the system developed by Allport and Odbert after the collection of appropriateness ratings and prior to the collection of all other data. Due to the overlapping nature of these categories and the low reliability attained by the original authors, this classification should be considered purely exploratory.

3. Results

3.1 Preliminary Analysis

Likability ratings provided by the MTurk workers and college subject pool participants were highly reliable: MTurk single measure ICC(1) = .51 [95% CI = .48, .54], average ICC(1) = .980 [95% CI = .979, .983]; Subject Pool single measure ICC(1) = .62 [95% CI = .59, .64], average ICC(1) = .988 [95% CI = .987, .989]. In other words, each individual MTurk worker's ratings were less correlated with the ratings of other individual MTurk workers, but the correlation between individual ratings and the mean rating within both samples was quite high. Further, the average MTurk worker ratings and average college pool ratings were highly
correlated, r(1039) = .96, and were thus aggregated into a single rating for each word. Meaningfulness ratings were somewhat less reliable, average ICC(1) = .888 [95% CI = .782, .953]. Means and standard deviations of likeableness and meaningfulness ratings for each word are provided in the supplementary material.

3.2 Comparing Past and Present Measures of Likability and Meaningfulness

Likability ratings in the direct replication were highly correlated with likability ratings in the original study and likability ratings reported in the Dumas replication (Table 1). Meaningfulness ratings in the original study were only moderately correlated with meaningfulness ratings in the direct replication and the Dumas replication (rs = .49). The correlation between meaningfulness ratings in the Dumas replication and the direct replication was much higher (r = .81), z = 9.82, p < .0001.

Ratings of likability and meaningfulness were weakly correlated in both the direct replication and the Dumas replication (Table 1). A similar relationship was observed in the conceptual replication, r(1040) = .24. These findings are contrary to the zero correlation in the original study.

3.3 Changes in the Likability of Specific Items

Individual likability ratings from the original study, the Dumas replication and the direct replication were mean centered and compared using planned contrasts (Rosenthal, Rosnow & Rubin, 2000). Ratings from each study were compared pairwise to assess overall consistency across studies. To minimize alpha inflation from multiple comparisons, family-wise alpha for each contrast was set at .01 using a sequential Bonferroni adjustment (Holm, 1979). Three hundred twenty-five words did not significantly differ in likability across any of the samples (see supplementary materials). The average absolute difference in likability for these
325 words across all three samples was 0.22 (SD = 0.10). One hundred twenty-seven words differed in one of the three pairwise comparisons, with 40 words differing between the original study and Dumas replication, 80 words differing between the Dumas replication and the direct replication and 7 words differing between the original study and the direct replication. The average absolute difference in likability for these 127 words between all three samples was 0.40 (SD = 0.10). One hundred words differed across two comparisons, with the original study differing from both the Dumas replication and the direct replication in 49 cases, the Dumas replication differing from the original study and direct replication in 45 cases and the direct replication differing from the original study and the Dumas replication in 6 cases. The average absolute difference in likability for these words between all three samples was 0.62 (SD = 0.19). Three words differed significantly across all three comparisons. The average absolute difference in likability for these words between all three samples was 0.95 (SD = 0.06). These findings highlight that despite the strong overall correlations between samples, there is considerable variation in the likability of individual items.

The linear effect of time on likability was tested using planned contrasts for each word. Contrast weights were assigned to each study that equaled the number of years by which it differed from the mean publication year of all three studies (original study -29, Dumas 7, direct replication 22). Again, family-wise alpha was set at .01 using a sequential Bonferroni adjustment (Holm, 1979). The null hypothesis was rejected in favor of the linear effect of time for 87 words. When the null hypothesis is rejected in favor of the linear effect of time it does not mean that it provides the best explanation of the data, only that it provides a better explanation than the null hypothesis.
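For readers unfamiliar with the sequential Bonferroni adjustment (Holm, 1979), the step-down logic can be sketched as below. The function name `holm_reject` is hypothetical, and how the family of tests was defined in the present analyses is an assumption left to the caller; the sketch only shows the general procedure at a family-wise alpha of .01.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Holm's step-down procedure.

    Sort p-values from smallest to largest, compare the k-th smallest
    (k = 0, 1, ...) against alpha / (m - k), and stop at the first
    comparison that fails. Returns reject/retain flags in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject
```

For three hypothetical p-values of .001, .2 and .004, the thresholds are .01/3, .01/2 and .01, so the first and third tests are rejected and the second is retained.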
Strong evidence of a truly linear effect would also provide a significantly better
explanation of the data than alternative hypotheses. In particular, since both the Dumas replication and the direct replication are contrast coded with the same sign, it is important to distinguish between a true linear effect of time and cases in which the mean observed in the direct replication fell between the mean in the Dumas replication and the mean in the original study. Words for which the null hypothesis was rejected were subjected to a second test in which the linear effect of time was compared against the hypothesis that the Dumas replication differed from both the original study and the direct replication (contrast weights: original study -1, Dumas 2, direct replication -1). Following the procedure outlined by Rosenthal and colleagues (2000), first the linear contrast and comparison of the Dumas replication to the other studies were standardized so that the standard deviation of the contrast weights equaled one (contrast weights -1.36, 0.33, 1.03 and -.71, 1.41, -.71 respectively). The difference between the standardized contrast weights was then calculated (difference contrast weights: original study -.65, Dumas replication -1.09, direct replication 1.74). This difference contrast serves as a test of whether one theory better explains the data than the other.

For 28 words the linear time contrast was a better fit to the data than the contrast comparing the Dumas replication to the original study and direct replication, suggesting that these words have the clearest evidence for consistent and sustained change over time. Among the words that have changed linearly over time (listed in Table 2), likability judgments tended to become more extreme: 11 of the 15 initially negative words became more negative and 11 of the 13 initially positive words became more positive. Qualitatively, it is unusual that of the twelve words that became more negative, eight were related to disagreeableness.
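The standardization and differencing of contrast weights can be checked with a short calculation. The sketch below assumes that the comparison contrast uses weights of -1, 2, -1 (weights that sum to zero and that reproduce the reported difference contrast) and that "standard deviation" refers to the population standard deviation of the three weights; the helper name `standardize` is hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def standardize(weights: Sequence[float]) -> List[float]:
    """Rescale contrast weights to mean 0 and population SD 1."""
    m = sum(weights) / len(weights)
    sd = sqrt(sum((w - m) ** 2 for w in weights) / len(weights))
    return [(w - m) / sd for w in weights]


# Order of studies: original study, Dumas replication, direct replication.
linear = standardize([-29, 7, 22])      # linear effect of time
dumas = standardize([-1, 2, -1])        # Dumas differs from the other two (assumed weights)
difference = [l - d for l, d in zip(linear, dumas)]

print([round(w, 2) for w in linear])      # [-1.36, 0.33, 1.03]
print([round(w, 2) for w in dumas])       # [-0.71, 1.41, -0.71]
print([round(w, 2) for w in difference])  # [-0.65, -1.09, 1.74]
```

Under these assumptions the rounded difference weights match the values reported in the text, which suggests that the population standard deviation was the scaling actually used.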
Consistent with the disagreeableness theme, "calm" and "easygoing" became more positive (although "non-confrontational" became more negative). Of the 16 words that became more positive, many were related to optimism (i.e., positive, confident, hopeful), conviction (i.e., headstrong, bold, moral, tough) and self-sufficiency (i.e., self-contented, self-sufficient).

3.4 The Proportion of Negative Person Descriptive Words

In the original study, 302 of the 555 words fell below the midpoint of the likability scale, significantly more than would be expected from an equal distribution according to a sign test, Z = 2.04, p = .04. In the direct replication, more than half of the words (N = 288) included in Anderson’s original list fell below the midpoint of the likability scale, but this was not significantly different from an equal distribution, Z = .85, p = .40. However, the proportion of negative words in the original study and the direct replication did not differ, χ²(1, N = 555) < 1. In the conceptual replication, 554 words were negative, significantly more than expected from an equal distribution, Z = 4.74, p < .001.

3.5 Word Categorization

3.5.1 Emotionality. In the original study, there were more negative (58.1%) emotion words than positive emotion words, Z = 2.28, p = .02. There were equal proportions of negative (47.8%) and positive non-emotion words, Z = .75, p = .45, ns. However, the proportion of negative emotion words was not higher than the proportion of negative non-emotion words, χ²(1, N = 555) = 1.85, p = .17.

A similar pattern was observed in the direct replication. There were more negative emotion words (58.1%) than positive emotion words, Z = 2.28, p = .02. There were equal proportions of negative (48.1%) and positive non-emotion words, Z = .75, p = .45. The
proportion of negative words was higher for emotion words than for non-emotion words, χ²(1, N = 555) = 5.21, p = .02.

In the conceptual replication, there were more negative emotion words (62.7%) than positive emotion words, Z = 4.01, p < .001. There were also more negative non-emotion words (55.9%) than positive non-emotion words, Z = 3.09, p = .01. The proportion of negative words was slightly higher for emotion words than for non-emotion words, χ²(1, N = 960) = 3.61, p = .06.

3.5.2 Traits, states, evaluations and other words. The proportion of negative words was examined within each of the four categories of descriptive words used by Allport & Odbert (1936) in both the direct and conceptual replications to explore other potential drivers of negativity. As can be seen in Table 3, in the direct replication, negative words were only more common within temporary states, a category that contains a high proportion of emotion words. In the conceptual replication, all word categories have a greater proportion of negative words, but the proportion is greatest for temporary states, followed by social evaluations, which in turn contains significantly more negative words than the trait and metaphorical categories.

3.6 The Bimodality of Person Descriptive Words

The distribution of likability ratings was examined with a dip test, which tests for multimodality against the null hypothesis of a unimodal distribution (Hartigan & Hartigan, 1985). Anderson’s original likability ratings are not unimodal, p < .001, nor were likability ratings in the direct replication, p < .001, but in the conceptual replication the null hypothesis was retained,
p = .52, suggesting that for this sample it would be more accurate to describe the distribution of negativity ratings as unimodal and negative (Figure 1).

Each of Allport and Odbert’s word categories was tested for unimodality individually. Social evaluations are not unimodal in either the direct replication, p < .00001, or the conceptual replication, p < .01. Traits are not unimodal in the direct replication, p < .00001, but are unimodal in the conceptual replication, p = .18. Temporary states and other words were unimodally distributed across both the direct and conceptual replications.

4. Discussion

The present study provides updated likability and meaningfulness ratings for Anderson's (1968) list of person descriptive words. In conjunction with other replications (Conolley & Knight, 1976; Dumas et al., 2002) these data provide some insight into the stability of meaningfulness and likability ratings over the past fifty years. There was only a modest correlation between meaningfulness ratings in the original study and in the direct replication. Further, unlike the original study, in the direct replication meaningfulness and likability were correlated. A similar correlation was observed by Dumas and colleagues (2002). It is likely that unspecified methodological differences led Anderson to obtain uncorrelated likability and meaningfulness ratings. He remarked that it took “considerable” initial pilot testing to stop participants from ascribing greater meaningfulness to more positive words (Anderson, 1968, pg. 272). While Anderson clearly thought that it was important to force orthogonality on meaningfulness and likability ratings, researchers may find people’s spontaneous ratings to be more useful when equating stimuli to each other.
The distribution of likability ratings observed in the conceptual replication, which used words selected based on the objective criterion of their appropriateness for describing people, was unimodal and negative. This pattern is somewhat different from the bimodal and less negative distribution observed in Anderson's (1968) study and the direct replication. These differences may be due to Anderson's selection of words that are "likely to be useful." The original list included a larger proportion of trait words, which tend to be less negative. Further, within each of Allport and Odbert’s (1936) categories of person descriptive words, there were proportionately fewer negative words in Anderson's (1968) list than in the conceptual replication. The preponderance of negative person descriptive words is consistent with claims that people process negative information about others in more detail (e.g., Baumeister et al., 2001; Rozin & Royzman, 2001; Schwarz, 2002; Unkelbach et al., 2008). It is also consistent with evolutionary accounts that emphasize the importance of attending to and explaining the behavior of potentially threatening individuals (e.g., Leising et al., 2014). The present study complicates earlier interpretations of this negativity bias by demonstrating that it is particularly strong for descriptions of states, somewhat weaker for social evaluations, and absent for descriptions of traits. This finding is consistent with earlier research demonstrating a correlation between ratings of "traitness" and likability within a corpus of words that people spontaneously use to describe others (Leising et al., 2014). There could be many reasons why state words and social evaluations are more negative than trait words.
The detail-oriented processing style afforded by negative affect may not only lead to the description of behaviors in more granular terms but also lead to more state-based rather than trait-based explanations of behavior. Consistent with this hypothesis, negative mood
tends to reduce the number of dispositional attributions people make when explaining behavior (Forgas, 1998). People may also think differently about different kinds of person descriptive words. If lay theories of personality assume that personality traits exist on bipolar continua, the requirement that each trait continuum have words representing both poles could result in a more equal distribution of positively and negatively valenced words. Alternatively, people may think more about some kinds of person descriptive words than others. Social evaluations of others are a frequent topic of conversation (Foster, 2004). People are also likely to think more about states, both because they are vivid phenomenological experiences and because changes (in the self and perhaps in others) demand more attention than baseline states and attributes (for an overview see Diener, Lucas & Scollon, 2006). Spending more time thinking about something tends to amplify the effects of the cognitive style or strategy used to think about it (Larrick, 2004), which should result in more pronounced differences in the granularity of negative and positive words. Averill's (1980) hypothesis that the preponderance of negative words was a result of self-protective attributions of negative behavior made specific predictions about differences across kinds of words. In the present study, there was some support for his claim that emotion words are more negative than non-emotion words. In the present replication, the truth of this claim hinges on the analytic strategy used (i.e., comparing both proportions to zero, or each proportion to the other; Silberzahn & Uhlmann, 2015), and one's conviction that God loves p = .06 (Rosnow & Rosenthal, 1989). Notably, his explanation is also compatible with the distributions of negative words observed within states and traits.
The state/trait distinction may even be a cleaner test of self-protective attributions because non-emotional states can also provide self-protective
attributions for bad behavior (e.g., "tired") while emotional dispositions (e.g., “ill-tempered”) do not. Future research could fruitfully tease apart the contributions of these various processes by examining spontaneous use of person descriptive words under different conditions or through more detailed content coding of words. Evolutionary accounts emphasize that the perceived threat posed by others is a key moderator of the granularity of descriptive words, suggesting that threatening others should be described in more granular terms than nonthreatening but negative others. Mood-based theories predict that experiencing negative affect should increase the granularity of person descriptive terms and possibly lead to a greater emphasis on states rather than traits. Finally, if self-protective biases play a role, self-descriptions (but not descriptions of others) should shift to emphasize temporary states when the self is under threat (Campbell & Sedikides, 1999). One major caveat to these results is that they are based on a relatively small sample of state words. State words make up a small proportion of both Anderson's (1968) original study (5%) and the conceptual replication (7%) relative to the proportion of state words (compared to all other person descriptive words) in the English language (25%; Allport & Odbert, 1936). This discrepancy might result from the constraint, imposed in each of the studies, that words should be uniquely applicable to people. A large number of states that are applicable to animals in general (e.g., hunger, fear) are thus missing from both the direct and conceptual replications. Future research could examine the distribution of likability ratings across a sample of words that did not impose this constraint. Perhaps the most surprising finding is that despite the overall consistency observed across the likability ratings of all words, a large proportion of words exhibited considerable variance
across replications. For almost 40% of the rated words, there was at least one statistically significant difference in likability ratings observed across Anderson's (1968) original study, Dumas and colleagues' (2002) replication, and the present study. These differences were observed despite a sample that was powered to detect only relatively large effects (d > 0.35). Approximately 10% of the words that differed across samples changed linearly over time. Words related to disagreeableness became markedly more negative, while words related to individuality and self-expression became more positive. These findings suggest that people today might be more likely to censure displays of aggression and more likely to value self-expression than people were in the past (see also Twenge, 1997). These findings are exploratory and correlational, and should be interpreted as suggestive of promising areas of future cross-temporal research rather than as definitive evidence of change. Many of the likability differences observed across samples were not explained by a linear time trend. These differences reflect an unknown combination of measurement error (McKillip, 1978), non-linear temporal or cohort effects (e.g., oscillation in likability over time), changes in meaning or likability due to temporarily accessible information (e.g., news and current events), differences in the traits that are valued by the populations that were used in each sample, or incidental differences in methodology (e.g., Dumas et al. [2002] used a six-point likability scale anchored at 1 rather than a seven-point scale anchored at 0). A practical takeaway from these findings is that a large proportion of words may not elicit reliable perceptions of likability across samples and, by implication, across individuals within a sample.
This is particularly true within the Mechanical Turk sample, which correlated quite well with the student sample in aggregate, but also demonstrated greater heterogeneity in individual ratings, perhaps reflecting the greater diversity of individuals within this sample.
These findings should serve as a note of caution for researchers relying on these lists as evidence that words elicit a particular degree of positivity or negativity without appropriate pretesting or manipulation checks.
References
Adolphs, R., Tranel, D., & Damasio, A. R. (1998). The human amygdala in social judgment. Nature, 393, 471.
Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1).
Anderson, N. H. (1965). Averaging versus adding as a stimulus-combination rule in impression formation. Journal of Experimental Psychology, 70(4), 394-400.
Anderson, N. H. (1968). Likableness ratings of 555 personality-trait words. Journal of Personality and Social Psychology, 9(3), 272-279.
Ashton, M. C., Lee, K., Goldberg, L. R., & de Vries, R. E. (2009). Higher order factors of personality: Do they exist? Personality and Social Psychology Review, 13(2), 79-91.
Averill, J. R. (1975). A semantic atlas of emotional concepts. JSAS: Catalog of Selected Documents in Psychology, 5, 330. Ms. No. 421.
Averill, J. R. (1980). On the paucity of positive emotions. In K. R. Blankstein, P. Pliner & J. Polivy (Eds.), Assessment and Modification of Emotional Behavior (pp. 7-45). New York, NY: Plenum Press.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., & Vohs, K. D. (2001). Bad is stronger than good. Review of General Psychology, 5(4), 323-370.
Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological Bulletin, 117(2), 187-215.
Bem, D. J. (1972). Self-perception theory. Advances in Experimental Social Psychology, 6, 1-62.
Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2015). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review.
Byun, T. M., Halpin, P. F., & Szeredi, D. (2015). Online crowdsourcing for efficient rating of speech: A validation study. Journal of Communication Disorders, 53, 70-83.
Campbell, W. K., & Sedikides, C. (1999). Self-threat magnifies the self-serving bias: A meta-analytic integration. Review of General Psychology, 3(1), 23-43.
Carlson, E. R. (1966). The affective tone of psychology. Journal of General Psychology, 75, 65-78.
Carvalho, A., Dimitrov, S., & Larson, K. (2016). How many crowdsourced workers should a requester hire? Annals of Mathematics and Artificial Intelligence. DOI: 10.1007/s10472-015-9492-4
Chandler, J., Paolacci, G., & Mueller, P. (2013). Risks and rewards of crowdsourcing marketplaces. In P. Michelucci (Ed.), Handbook of Human Computation (pp. 377-392). New York: Springer.
Collins, H. M. (1974). The TEA set: Tacit knowledge and scientific networks. Science Studies, 4(2), 165-185.
Conolley, E. S., & Knight, G. P. (1976). Anderson's personality-trait words: Has their likableness changed? Personality and Social Psychology Bulletin, 2(3), 303-306.
Diener, E., Lucas, R. E., & Scollon, C. N. (2006). Beyond the hedonic treadmill: Revising the adaptation theory of well-being. American Psychologist, 61(4), 305-314.
Dumas, J. E., Johnson, M., & Lynch, A. M. (2002). Likableness, familiarity, and frequency of 844 person-descriptive words. Personality and Individual Differences, 32(3), 523-531.
Forgas, J. P. (1998). On being happy and mistaken: Mood effects on the fundamental attribution error. Journal of Personality and Social Psychology, 75(2), 318-331.
Foster, E. K. (2004). Research on gossip: Taxonomy, methods, and future directions. Review of General Psychology, 8(2), 78.
Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. The Annals of Statistics, 13(1), 70-84.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65-70.
Higgins, E. T., Rholes, W. S., & Jones, C. R. (1977). Category accessibility and impression formation. Journal of Experimental Social Psychology, 13(2), 141-154.
John, O. P., Angleitner, A., & Ostendorf, F. (1988). The lexical approach to personality: A historical review of trait taxonomic research. European Journal of Personality, 2(3), 171-203.
John, O. P., & Srivastava, S. (1999). The Big Five trait taxonomy: History, measurement, and theoretical perspectives. Handbook of personality: Theory and research, 2(1999), 102-138.
Kloumann, I. M., Danforth, C. M., Harris, K. D., Bliss, C. A., & Dodds, P. S. (2012). Positivity of the English language. PloS one, 7(1), e29484.
Larrick (2004). Debiasing. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 316-338). New York: John Wiley & Sons.
Leising, D., Ostrovski, O., & Borkenau, P. (2012). Vocabulary for describing disliked persons is more differentiated than vocabulary for describing liked persons. Journal of Research in Personality, 46(4), 393-396.
Leising, D., Scharloth, J., Lohse, O., & Wood, D. (2014). What types of terms do people use when describing an individual’s personality? Psychological Science, 25(9), 1787-1794.
Markus, H. (1977). Self-schemata and processing information about the self. Journal of Personality and Social Psychology, 35(2), 63-78.
McKillip, J. (1978). Comment on "Anderson's personality trait words: Has their meaning changed?". Personality and Social Psychology Bulletin, 4(2), 289-291.
Moretti, M. M., & Higgins, E. T. (1990). Relating self-discrepancy to self-esteem: The contribution of discrepancy beyond actual-self ratings. Journal of Experimental Social Psychology, 26(2), 108-123.
Murphy, S. T., & Zajonc, R. B. (1993). Affect, cognition, and awareness: Affective priming with optimal and suboptimal stimulus exposures. Journal of Personality and Social Psychology, 64(5), 723-739.
Peterson, C. (2006). A primer in positive psychology. Oxford University Press.
Raghunathan, R., & Pham, M. T. (1999). All negative moods are not equal: Motivational influences of anxiety and sadness on decision making. Organizational Behavior and Human Decision Processes, 79(1), 56-77.
Pizarro, D., Uhlmann, E., & Salovey, P. (2003). Asymmetry in judgments of moral blame and praise: The role of perceived metadesires. Psychological Science, 14(3), 267-272.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. New York: Cambridge University Press.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.
Rozin, P., & Royzman, E. B. (2001). Negativity bias, negativity dominance, and contagion. Personality and Social Psychology Review, 5(4), 296-320.
Seligman, M. E., & Csikszentmihalyi, M. (2000). Positive psychology: An introduction. American Psychologist, 55(1), 5-14.
Silberzahn, R., & Uhlmann, E. L. (2015). Crowdsourced research: Many hands make tight work. Nature, 526(7572), 189.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13(2), 90.
Schwarz, N. (2002). Situated cognition and the wisdom of feelings: Cognitive tuning. In L. Feldman Barrett & P. Salovey (Eds.), The wisdom in feelings (pp. 144-166). New York: Guilford.
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9(1), 59-71.
Twenge, J. M. (1997). Changes in masculine and feminine traits over time: A meta-analysis. Sex Roles, 36, 305-325.
Unkelbach, C., Fiedler, K., Bayer, M., Stegmüller, M., & Danner, D. (2008). Why positive information is processed faster: The density hypothesis. Journal of Personality and Social Psychology, 95, 36-49.
Figure 1. Histogram of Words by Likability. Distribution of words by mean likability of words selected by Anderson (Direct Replication) and words selected through a systematic process (Conceptual Replication).
Table 1
Correlation between Ratings of Meaningfulness and Likability for Anderson's 555 Words across three studies

| | Anderson (1968) Likable | Anderson (1968) Meaningful | Dumas et al. (2002) Likable | Dumas et al. (2002) Meaningful | Present Study Likable | Present Study Meaningful |
|---|---|---|---|---|---|---|
| Anderson (1968) Likable | 2.93(1.46) | .01 | .94** | .20** | .96** | .23** |
| Anderson (1968) Meaningful | | 3.54(.20) | .03 | .49** | .01 | .49** |
| Dumas et al. (2002) Likable | | | 5.64(1.35) | .26** | .98** | .29** |
| Dumas et al. (2002) Meaningful | | | | 5.59(.36) | .20** | .81** |
| Present Study Likable | | | | | 2.95(1.54) | .25** |
| Present Study Meaningful | | | | | | 3.68(.26) |

Note: M(SD) reported on the diagonal. ** p < .01.
Table 2
Words that have significantly different likability ratings in Anderson (1968) and the present study

| Word | Anderson (1968) M(SD) | Dumas et al. (2002) M(SD) | Present Study M(SD) | Linear Contrast t |
|---|---|---|---|---|
| crafty | -0.70(1.98) | 0.96(1.31) | 1.06(1.45) | 10.95 |
| headstrong | -0.97(1.17) | 0.12(1.53) | 0.60(1.31) | 9.96 |
| positive | 1.10(1.28) | 2.10(0.75) | 2.30(0.89) | 9.26 |
| comical | 0.96(1.09) | 1.82(1.01) | 2.00(0.95) | 7.81 |
| easygoing | 1.19(1.20) | 1.98(0.82) | 2.15(1.04) | 7.22 |
| tough | -0.65(1.74) | 0.01(1.42) | 0.55(1.24) | 6.97 |
| confident | 1.08(1.04) | 1.71(0.89) | 2.02(1.02) | 6.93 |
| old-fashioned | -0.54(1.39) | 0.24(1.31) | 0.47(1.30) | 6.56 |
| calm | 1.13(0.84) | 1.52(1.00) | 1.89(0.96) | 5.51 |
| hopeful | 1.13(0.92) | 1.58(1.03) | 1.88(1.06) | 5.37 |
| bold | 0.43(1.22) | 0.92(1.17) | 1.24(1.22) | 5.30 |
| moral | 1.18(1.67) | 1.62(1.10) | 2.01(1.04) | 5.19 |
| self-sufficient | 1.19(1.30) | 1.44(1.06) | 1.96(0.95) | 4.87 |
| sociable | 1.36(0.85) | 1.70(1.00) | 2.02(0.88) | 4.83 |
| generous | 1.66(0.89) | 1.94(0.81) | 2.27(0.81) | 4.59 |
| self-contented | 0.31(2.04) | 0.54(1.37) | 1.18(1.67) | 4.35 |
| unaccommodating | -1.19(0.68) | -1.24(1.20) | -1.87(0.88) | -4.36 |
| ungracious | -1.53(0.71) | -1.81(0.75) | -2.14(0.94) | -4.68 |
| self-righteous | -0.06(2.46) | -0.15(1.54) | -1.21(1.40) | -5.29 |
| discontented | -0.56(1.00) | -1.08(0.98) | -1.30(1.05) | -5.40 |
| nonconfronting | 0.76(1.33) | -0.87(1.11) | 0.23(1.52) | -5.47 |
| materialistic | -0.33(1.66) | -0.99(1.22) | -1.27(1.31) | -5.86 |
| abusive | -1.93(0.83) | -2.15(0.68) | -2.70(0.63) | -5.97 |
| angry | -1.24(0.90) | -1.81(0.74) | -1.99(0.93) | -6.05 |
| disturbed | -1.04(0.97) | -1.60(0.90) | -1.88(1.08) | -6.17 |
| unagreeable | -1.09(1.98) | -1.69(0.87) | -1.91(0.91) | -6.21 |
| neglectful | -1.34(0.59) | -1.95(0.65) | -2.09(0.85) | -6.81 |
| aggressive | 0.11(1.43) | -0.67(1.34) | -0.99(1.27) | -6.98 |

Note. Means are centered within study. All words on this list have significantly different means across studies, p < .01 after applying a sequential Bonferroni correction. Words with an asterisk are significantly different at this level after controlling for measurement error implied by SD.
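The sequential Bonferroni correction mentioned in the note to Table 2 can be illustrated with a minimal sketch of the step-down procedure described by Holm (1979). The function name `holm_reject` and the example p-values are hypothetical and are not taken from the study's data; the sketch only shows the logic of the correction, not the analysis code actually used.

```python
def holm_reject(p_values, alpha=0.01):
    """Return a list of booleans marking which hypotheses are rejected
    under Holm's sequentially rejective (step-down) procedure."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained as well.
            break
    return reject


# Hypothetical p-values for four words (illustration only).
example_p = [0.010, 0.040, 0.030, 0.005]
print(holm_reject(example_p, alpha=0.05))
```

With many words tested at once, the threshold for the smallest p-value is as strict as an ordinary Bonferroni correction and relaxes step by step for the remaining tests.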
Table 3
Distribution of positive and negative person descriptive words across categories

| Category | Direct Replication: Pct. Negative (N) | Direct Replication: Exact binomial | Conceptual Replication: Pct. Negative (N) | Conceptual Replication: Exact binomial |
|---|---|---|---|---|
| Personal Traits | 51% (380)a | Z = 0.26, p = .80 | 55% (529)c | Z = 1.74, p = .08 |
| Temporary States | 70% (30)b | Z = 2.01, p = .05 | 76% (63)d | Z = 4.03, p < .001 |
| Social Evaluations | 53% (131)ab | Z = 0.70, p = .49 | 62% (253)e | Z = 3.77, p < .001 |
| Metaphorical/Doubtful | 29% (14)a | Z = 1.34, p = .18 | 55% (115)ce | Z = 1.12, p = .26 |

Note: Ratios in the same column with different letter subscripts are significantly different, exact binomial test, p < .05. Median likability does not differ between groups, Mann-Whitney U, p < .05
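The Z statistics in Table 3 test whether the share of negative words in a category departs from 50%. The sketch below is one plausible reading of how such a value could be obtained: a normal approximation to the binomial with a continuity correction. This is an assumption, since the manuscript labels the test only as "exact binomial". The function name `binomial_z` is hypothetical, and the counts passed to it (4 of 14 and 21 of 30 negative words) are inferred from the rounded percentages in the table rather than reported in the text.

```python
import math


def binomial_z(k_negative, n_words, p_null=0.5):
    """Continuity-corrected normal approximation to a two-sided binomial test.

    k_negative: number of negative words in the category (assumed count)
    n_words:    total number of words in the category
    p_null:     proportion expected under the null hypothesis
    """
    expected = n_words * p_null
    sd = math.sqrt(n_words * p_null * (1 - p_null))
    # Shrink the absolute deviation by 0.5 (continuity correction), floored at 0.
    z = max(abs(k_negative - expected) - 0.5, 0.0) / sd
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(z / math.sqrt(2))
    return z, p_two_sided


# Metaphorical/Doubtful, direct replication: 29% of 14 words, assumed to be 4 words.
z_meta, p_meta = binomial_z(4, 14)
print(round(z_meta, 2), round(p_meta, 2))

# Temporary States, direct replication: 70% of 30 words, assumed to be 21 words.
z_state, p_state = binomial_z(21, 30)
print(round(z_state, 2))
```

Under these assumed counts the sketch gives Z values that round to the 1.34 and 2.01 shown in the table, which suggests but does not confirm that a continuity-corrected approximation was used.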
Highlights
Anderson's (1968) ratings of trait likability and meaningfulness are directly and conceptually replicated.
Likability ratings in the original study and the replication were highly correlated, but significant differences in ratings were observed for a substantial number of items.
In both the original study and replication there were more negative person descriptive words than positive person descriptive words.
The preponderance of negative person descriptive words was driven by social evaluations and temporary states: traits were equally likely to be positive or negative.