Human language reveals a universal positivity bias
Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved January 9, 2015 (received for review June 23, 2014)
Significance
The most commonly used words of 24 corpora across 10 diverse human languages exhibit a clear positive bias, a big data confirmation of the Pollyanna hypothesis. The study’s findings are based on 5 million individual human scores and pave the way for the development of powerful language-based tools for measuring emotion.
Abstract
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
Human language, our great social technology, reflects that which it describes through the stories it allows to be told and us, the tellers of those stories. Although language’s shaping effect on thinking has long been controversial (1–3), we know that a rich array of metaphor encodes our conceptualizations (4), word choice reflects our internal motives and immediate social roles (5–7), and the way a language represents the present and future may condition economic choices (8).
In 1969, Boucher and Osgood (9) framed the Pollyanna hypothesis: a hypothetical, universal positivity bias in human communication. From a selection of small-scale, cross-cultural studies, they marshaled evidence that positive words are likely more prevalent, more meaningful, more diversely used, and more readily learned. However, in being far from an exhaustive, data-driven analysis of language, which is the approach we take here, their findings could only be regarded as suggestive. Indeed, studies of the positivity of isolated words and word stems have produced conflicting results, some pointing toward a positivity bias (10) and others toward the opposite (11, 12), although attempts to adjust for use frequency tend to recover a positivity signal (13).
Materials and Methods
To explore the positivity of human language deeply, we constructed 24 corpora spread across 10 languages. Our global coverage of linguistically and culturally diverse languages includes English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese (Simplified), Russian, Indonesian, and Arabic. The sources of our corpora are similarly broad, spanning books (14), news outlets, social media, the web (15), television and movie subtitles, and music lyrics (16). Our work here greatly expands upon our earlier study of English alone, where we found strong evidence for a use-invariant positivity bias (17). In SI Appendix, we provide full details of our corpora (SI Appendix, Table S1), survey, and participants (SI Appendix, Table S2).
We address the social nature of language in two important ways: (i) We focus on the words people most commonly use, and (ii) we measure how those same words are received by individuals. We take word use frequency as the primary organizing measure of a word’s importance. Such a data-driven approach is crucial for both understanding the structure of language and creating linguistic instruments for principled measurements (18, 19). By contrast, earlier studies focusing on meaning and emotion have used “expert” generated word lists, and these word lists fail statistically to match frequency distributions of natural language (10–12, 20), confounding attempts to make claims about language in general. For each of our corpora, we selected between 5,000 and 10,000 of the most frequently used words, choosing the exact numbers so that we obtained ∼10,000 words for each language.
Of our 24 corpora, we received 17 already parsed into words by the source: the Google Books Project (six corpora), the Google Web Crawl (eight corpora), and movie and television subtitles (three corpora). For the other seven corpora (five Twitter corpora, the New York Times, and music lyrics), we extracted words by standard white space separation. Twitter was easily the most variable and complex of our text sources and required additional treatment. In parsing Twitter, we required strings to contain at least one Unicode character and no invisible control characters, and we excluded strings representing web links as well as strings beginning with @, an ampersand (&), or other punctuation (e.g., Twitter IDs), but we kept hashtags. Finally, for all corpora, we converted words to lowercase. We observed that common English words appeared in the Twitter corpora of other languages, and we chose simply to acknowledge this reality of language and allow these commonly used borrowed words to be evaluated.
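For concreteness, the following is a minimal sketch of the kind of white space tokenization and filtering described above; the regular expressions, the function name, and the reading of “at least one Unicode character” as at least one word character are illustrative assumptions rather than the study’s actual parsing code.

```python
import re

# Illustrative tokenizer following the parsing rules described above: split on
# white space, drop web links, @-strings, &-strings, pure punctuation, and
# strings containing invisible control characters; keep hashtags; lowercase all.
URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)
CONTROL_RE = re.compile(r"[\u0000-\u001f\u007f-\u009f]")  # invisible control characters
WORD_CHAR_RE = re.compile(r"\w")                          # at least one word character

def tokenize_tweet(text: str) -> list[str]:
    tokens = []
    for tok in text.split():                # standard white space separation
        if URL_RE.match(tok):               # exclude web links
            continue
        if tok.startswith(("@", "&")):      # exclude @-strings and &-entities
            continue
        if CONTROL_RE.search(tok):          # exclude invisible control characters
            continue
        if not WORD_CHAR_RE.search(tok):    # exclude strings of pure punctuation
            continue
        tokens.append(tok.lower())          # hashtags pass through; lowercase all
    return tokens

print(tokenize_tweet("Feliz!! http://t.co/x @amigo #alegria &amp; :)"))
# -> ['feliz!!', '#alegria']
```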
Although there are many complications with inflections and variable orthography, we have found merit, for our broad analysis, in not collapsing related words. For example, allowing different conjugations of verbs to stand in our corpora is valuable because human evaluations of these forms have proved to be distinguishable [e.g., present vs. past tense (18)]. As should be expected, a more nuanced treatment beyond the bounds of the present paper, involving stemming and word type, for example, may lead to minor corrections (21); however, our central observations will remain robust, and such corrections will in no way change the behavior of the instruments we generate.
There is no single, principled way to merge corpora to create an ordered list of words for a given language. For example, it is impossible to weight the most commonly used words in the New York Times against the most commonly used words in Twitter. Nevertheless, we are obliged to choose some method of doing so to facilitate comparisons across languages and for the purposes of building adaptable linguistic instruments. For each language where we had more than one corpus, we created a single quasi-ranked word list by finding the smallest integer r such that the union of all words with a rank of at most r in at least one corpus contained approximately 10,000 words.
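As an illustration of this merging step, the sketch below grows r until the union of each corpus’s top-r words reaches a target vocabulary size; the function name, data layout, and the default target of 10,000 words are assumptions for illustration.

```python
# Sketch of the corpus-merging step: grow r until the union of each corpus's
# top-r words reaches the target vocabulary size for the language.
def merge_corpora(ranked_lists: list[list[str]], target: int = 10_000) -> set[str]:
    union: set[str] = set()
    max_len = max(len(lst) for lst in ranked_lists)
    for r in range(max_len):
        for lst in ranked_lists:
            if r < len(lst):
                union.add(lst[r])          # add each corpus's word of rank r + 1
        if len(union) >= target:
            break                           # smallest r whose union meets the target
    return union
```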
We then paid native speakers to rate how they felt in response to individual words on a nine-point scale, with 1 corresponding to most negative or saddest, 5 to neutral, and 9 to most positive or happiest (10, 18) (SI Appendix). This happy–sad semantic differential (20) functions as a coupling of two standard five-point Likert scales. Participants were restricted to certain regions or countries (e.g., Portuguese was rated by residents of Brazil). Overall, we collected 50 ratings per word for a total of around 5 million individual human assessments. We provide all datasets as part of SI Appendix.
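A word’s happiness score is simply the mean of its individual ratings. The short sketch below illustrates this aggregation; the comma-separated “word,rating” file layout and the function name are assumptions, not the study’s actual processing pipeline.

```python
import csv
from collections import defaultdict

# Sketch: average ~50 individual ratings per word on the 1 (saddest) to
# 9 (happiest) scale. The "word,rating" row layout is an assumption.
def average_happiness(path: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for word, rating in csv.reader(f):
            totals[word] += float(rating)   # individual scores are integers 1..9
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}
```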
Results and Discussion
In Fig. 1, we show distributions of the average happiness scores for all 24 corpora, leading to our most general observation of a clear positivity bias in natural language. We indicate the above-neutral part of each distribution with yellow and the below-neutral part with blue, and we order the distributions moving upward by increasing median (vertical red line). For all corpora, the median clearly exceeds the neutral score of 5. The background gray lines connect deciles for each distribution. In SI Appendix, Fig. S1, we provide the same distributions ordered instead by increasing variance.
Fig. 1. Distributions of perceived average word happiness for all 24 corpora.
As is evident from the ordering in Fig. 1 and SI Appendix, Fig. S1, although a positivity bias is the universal rule, there are minor differences between the happiness distributions of languages. For example, Latin American-evaluated corpora (Mexican Spanish and Brazilian Portuguese) exhibit relatively high medians and, to a lesser degree, higher variances. Among the other languages, those with multiple corpora have more variable medians, and specific corpora are not ordered by median in the same way across languages (e.g., Google Books has a lower median than Twitter for Russian, but the reverse is true for German and English). In terms of emotional variance, all four English corpora are among the highest, whereas Chinese and Russian Google Books seem especially constrained.
We now examine how individual words themselves vary in their average happiness score between languages. Owing to the scale of our corpora, we were compelled to use an online service, choosing Google Translate. For each of the 45 language pairs, we translated isolated words from one language to the other and then back. We then found all word pairs that (i) were translationally stable, meaning the forward and back translation returns the original word, and (ii) appeared in our corpora for each language.
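The following sketch illustrates the translation-stability test; the translate(word, src, dst) callable is a hypothetical placeholder for whichever translation service is used (the study used Google Translate), and the function and variable names are illustrative.

```python
# Sketch of the translation-stability test. The translate(word, src, dst)
# argument is a placeholder for a translation service call, not a real API.
def translation_stable_pairs(words_a, words_b, lang_a, lang_b, translate):
    pairs = []
    for w in words_a:
        fwd = translate(w, lang_a, lang_b)       # forward translation
        back = translate(fwd, lang_b, lang_a)    # back translation
        if back == w and fwd in words_b:         # stable and present in both corpora
            pairs.append((w, fwd))
    return pairs
```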
We provide the resulting comparison between languages at the level of individual words in Fig. 2. We use the mean of each language’s word happiness distribution derived from its merged corpora to generate a rough overall ordering, acknowledging that frequency of use is no longer meaningful, and moreover is not relevant, because we are now investigating the properties of individual words. Each cell shows a heat map comparison with word density increasing as shading moves from gray to white. The background colors reflect the ordering of each pair of languages: yellow if the row language had a higher average happiness than the column language and blue for the reverse. Also, in each cell, we display the number of translation-stable words between language pairs, N, along with the difference in average word happiness between the row and column languages.
Fig. 2. Scatter plots of average happiness for words measured in different languages. We order languages from relatively most positive (Spanish) to relatively least positive (Chinese); a yellow background indicates the row language is more positive than the column language, and a blue background indicates the converse. The overall plot matrix is symmetrical about the leading diagonal, with the redundancy allowing for easier comparison between languages. In each scatter plot, the key gives the number of translation-stable words for each language pair, N; the average difference in translation-stable word happiness between the row language and the column language; and the Pearson correlation coefficient for the pair.
A linear relationship is clear for each language–language comparison and is supported by Pearson’s correlation coefficient r lying in the range 0.73–0.89 (P < 10⁻¹¹⁸ across all pairs; Fig. 2 and SI Appendix, Tables S3–S5). Overall, this strong agreement between languages suggests that approximate estimates of word happiness for unscored languages could be generated at no expense from our existing dataset. Some words will, of course, translate unsatisfactorily, with the dominant meaning changing between languages. For example, “lying” in English, most readily interpreted as speaking falsehoods by our participants, translates to “acostado” in Spanish, meaning recumbent. Nevertheless, happiness scores obtained by translation will be serviceable for purposes where the effects of many different words are incorporated.
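Given scored translation-stable pairs such as those produced above, the language-pair comparison reduces to a correlation between the two sets of happiness scores. A minimal sketch, assuming per-language dictionaries of average word happiness:

```python
from scipy.stats import pearsonr

# Sketch: correlate average word happiness across a language pair over its
# translation-stable words. happiness_a / happiness_b map each language's
# words to their average happiness scores (illustrative names).
def language_pair_correlation(pairs, happiness_a, happiness_b):
    x = [happiness_a[w_a] for w_a, _ in pairs]
    y = [happiness_b[w_b] for _, w_b in pairs]
    r, p = pearsonr(x, y)    # Pearson correlation coefficient and its P value
    return r, p
```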
Stepping back from examining interlanguage robustness, we return to a more detailed exploration of the rich structure of each corpus’s happiness distribution. In Fig. 3, we show how average word happiness varies with word use frequency for four example corpora.
Fig. 3. Examples of how word happiness varies little with use frequency. (A–D) Above each plot is a histogram of average word happiness for the corresponding corpus.
We chose the four example corpora shown in Fig. 3 to be disparate in nature, covering diverse languages (French, Egyptian Arabic, Brazilian Portuguese, and Chinese), regions of the world (Europe, the Middle East, South America, and Asia), and texts [Twitter, movies and television, the web (15), and books (14)]. The remaining 20 corpora yield similar plots (SI Appendix), and all corpora also exhibit an approximate self-similarity in SD for word happiness.
Across all corpora, we observe visually that the deciles tend to stay fixed or move slightly toward the negative, with some expected fragility at the 10% and 90% levels (due to the distributions’ tails), indicating that the overall happiness distribution of each corpus holds approximately independent of word use frequency. In Fig. 3, for example, we see that both the Brazilian Portuguese and French examples show a small shift to the negative for increasingly rare words, whereas there is no visually clear trend for the Arabic and Chinese cases. Fitting average word happiness as a function of use frequency rank confirms that these trends are weak, consistent with the frequency independence described above.
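A sketch of the kind of frequency-independence check underlying Fig. 3 follows: bin words by use frequency rank and compute happiness deciles within each bin. The bin width and function name are illustrative assumptions.

```python
import numpy as np

# Sketch of a frequency-independence check: bin words by use frequency rank
# and compute happiness deciles within each bin. Bin width is illustrative.
def deciles_by_rank(ranked_words, happiness, bin_size=1_000):
    results = []
    for start in range(0, len(ranked_words), bin_size):
        scores = [happiness[w] for w in ranked_words[start:start + bin_size]]
        deciles = np.percentile(scores, list(range(10, 100, 10)))  # 10th..90th
        results.append((start + 1, deciles))   # leading rank of the bin, its deciles
    return results
```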
In constructing language-based instruments for measuring expressed happiness, such as our hedonometer (18), this frequency independence allows for a way to “increase the gain” in a fashion resembling standard physical instruments. Moreover, we have earlier demonstrated the robustness of our hedonometer for the English language, showing, for example, that measurements derived from Twitter correlate strongly with Gallup well-being polls and related indices at the state and city level for the United States (19).
Here, we provide an illustrative use of our hedonometer in the realm of literature, inspired by Vonnegut’s shapes of stories (23, 24). In Fig. 4, we show “happiness time series” for three famous works of literature, evaluated in their original languages of English, Russian, and French, respectively: Melville’s Moby Dick (www.gutenberg.org), Dostoyevsky’s Crime and Punishment (25), and Dumas’ The Count of Monte Cristo (www.gutenberg.org). We slide a 10,000-word window through each work, computing the average happiness using a “lens” for the hedonometer in the following manner. We capitalize on our instrument’s tunability to obtain a strong signal by excluding all words whose average happiness score lies close to the neutral value of 5.
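A minimal sketch of this sliding-window measurement follows; the step size and the exact neutral-exclusion band are illustrative assumptions (the paper tunes the lens, and its precise bounds are not reproduced here).

```python
# Sketch of the sliding-window "happiness time series": average the scores of
# the words in each 10,000-word window, applying a lens that drops words near
# neutral. The step size and exclusion band (4 <= h <= 6) are assumptions.
def happiness_series(words, happiness, window=10_000, step=1_000, lens=(4.0, 6.0)):
    lo, hi = lens
    series = []
    for start in range(0, max(1, len(words) - window + 1), step):
        scores = [happiness[w] for w in words[start:start + window]
                  if w in happiness and not (lo <= happiness[w] <= hi)]
        if scores:
            series.append(sum(scores) / len(scores))  # window average happiness
    return series
```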
Fig. 4. Emotional time series for three famous 19th-century works of literature: Melville’s Moby Dick (Top), Dostoyevsky’s Crime and Punishment (Middle), and Dumas’ The Count of Monte Cristo (Bottom). Each point represents the language-specific happiness score for a window of 10,000 words (converted to lowercase), with the window translated throughout the work. The overlaid word shifts (A–I) show example comparisons between different sections of each work. Word shifts indicate which words contribute the most toward and against the change in average happiness between two texts (SI Appendix, pp. S9–S10). Although the hedonometer is a robust instrument in general, we acknowledge its potential failure for individual words due to both language evolution and words possessing more than one meaning. For Moby Dick, we excluded “cried” and “cry” (to speak loudly rather than weep) and “Coffin” (a surname, still common in Nantucket). Such alterations, which can be done on a case-by-case basis, do not noticeably change the overall happiness curves, while leaving the word shifts more informative. We provide online, interactive visualizations of the emotional trajectories of over 10,000 books at hedonometer.org/books.html.
The three resulting happiness time series provide interesting, detailed views of each work’s narrative trajectory, revealing numerous peaks and troughs throughout and at times clearly dropping below neutral. Both Moby Dick and Crime and Punishment end on low notes, whereas The Count of Monte Cristo culminates with a rise in positivity, accurately reflecting the finishing arcs of all three. The “word shifts” overlaying the time series compare two distinct regions of each work, showing how changes in word abundances lead to overall shifts in average happiness. Such word shifts are essential tests of any sentiment measurement and are made possible by the linear form of our instrument (16, 18) (a full explanation is provided in SI Appendix, pp. S9–S10). As one example, the third word shift for Moby Dick shows why the average happiness of the last 10% of the book is well below the average happiness of the first 25%. The major contribution is an increase in relatively negative words, including “missing,” “shot,” “poor,” “die,” and “evil.”
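For readers wishing to reproduce a basic word shift, the sketch below uses one exact linear decomposition of the change in average happiness into per-word contributions, consistent with the linear form of the instrument; the paper’s precise normalization and display conventions are described in SI Appendix, pp. S9–S10, and the function name is illustrative.

```python
from collections import Counter

# Sketch of a word shift: decompose the change in average happiness between a
# reference and a comparison text into exact per-word contributions
# (h_w - h_ref_avg) * (p_w_comp - p_w_ref), exploiting the instrument's
# linearity. The paper's own normalization and display are in SI Appendix.
def word_shift(ref_words, comp_words, happiness):
    ref = [w for w in ref_words if w in happiness]
    comp = [w for w in comp_words if w in happiness]
    p_ref, p_comp = Counter(ref), Counter(comp)
    h_ref_avg = sum(happiness[w] for w in ref) / len(ref)   # reference average
    contributions = {}
    for w in set(p_ref) | set(p_comp):
        dp = p_comp[w] / len(comp) - p_ref[w] / len(ref)    # change in frequency
        contributions[w] = (happiness[w] - h_ref_avg) * dp
    # sort by absolute contribution to the overall shift
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
```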
By adjusting the lens, many other related time series can be formed, such as those produced by focusing on only positive or negative words. Emotional variance as a function of text position can also be readily extracted. At hedonometer.org/books.html, we provide online, interactive emotional trajectories for over 10,000 works of literature where different lenses and regions of comparisons may be easily explored. Beyond this example tool we have created here for the digital humanities and our hedonometer for measuring population well-being, the datasets we have generated for the present study may be useful in creating a great variety of language-based instruments for assessing emotional expression.
Overall, our major scientific finding is that when experienced in isolation and weighted properly according to use, words, which are the atoms of human language, present an emotional spectrum with a universal, self-similar positive bias. We emphasize that this apparent linguistic encoding of our social nature is a system-level property, and in no way asserts all natural texts will skew positive (as exemplified by certain passages of the three works in Fig. 4) or diminishes the salience of negative states (26). Going forward, our word happiness assessments should be periodically repeated and carried out for new languages, tested on different demographics, and expanded to phrases both for the improvement of hedonometric instruments and to chart the dynamics of our collective social self.
Acknowledgments
We thank M. Shields, K. Comer, N. Berry, I. Ramiscal, C. Burke, P. Carrigan, M. Koehler, and Z. Henscheid, in part, for their roles in developing hedonometer.org. We also thank F. Henegan, A. Powers, and N. Atkins for conversations. P.S.D. was supported by National Science Foundation CAREER Award 0846668.
Footnotes
- 1To whom correspondence may be addressed. Email: peter.dodds@uvm.edu, btivnan@mitre.org, or chris.danforth@uvm.edu.
Author contributions: P.S.D., B.F.T., and C.M.D. designed research; P.S.D., E.M.C., S.D., M.R.F., A.J.R., J.R.W., L.M., K.D.H., I.M.K., J.P.B., K.M., M.T.M., and C.M.D. performed research; P.S.D., E.M.C., A.J.R., K.M., and M.T.M. contributed new reagents/analytic tools; P.S.D., E.M.C., S.D., M.R.F., A.J.R., J.R.W., J.P.B., and C.M.D. analyzed data; and P.S.D. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: Data are available in Dataset S1 and at www.uvm.edu/storylab/share/papers/dodds2014a/index.html.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1411678112/-/DCSupplemental.
Freely available online through the PNAS open access option.
References
1. Whorf BL
2. Chomsky N
3. Pinker S
4. Lakoff G, Johnson M
5. Campbell RS, Pennebaker JW
7. Pennebaker JW
9. Boucher J, Osgood CE
10. Bradley MM, Lang PJ
11. Stone PJ, Dunphy DC, Smith MS, Ogilvie DM
12. Pennebaker JW, Booth RJ, Francis ME
14. Michel J-B, et al.
15. Brants T, Franz A
16. Dodds PS, Danforth CM
20. Osgood C, Suci G, Tannenbaum P
22. Zipf GK
23. Vonnegut K Jr
24. Vonnegut K (2010) Kurt Vonnegut on the shapes of stories. Available at www.youtube.com/watch?v=oP3c1h8v2ZQ. Accessed May 15, 2014.
25. Dostoyevsky F (1866) Crime and Punishment. Original Russian text. Available at ilibrary.ru/text/69/p.1/index.html. Accessed December 15, 2013.
26. Forgas JP