Linguistic positivity in historical texts reflects dynamic environmental and psychological factors
Contributed by Robert Axelrod, August 30, 2016 (sent for review July 21, 2016; reviewed by Molly Ireland, Matthias Mehl, and Kenneth W. Wachter)
This article has a Correction.

Significance
For nearly 50 y social scientists have observed that across cultures and languages people use more positive words than negative words, a phenomenon referred to as “linguistic positivity bias” (LPB). Although scientists have proposed multiple explanations for this phenomenon—explanations that hinge on mechanisms ranging from cognitive biases to environmental factors—no consensus on the origins of LPB has been reached. In this research, we derive and test, via natural language processing and data aggregation, divergent predictions from dominant explanations of LPB by examining it across time. We find that LPB varies across time and therefore cannot be explained simply as the product of cognitive biases and, further, that these variations correspond to fluctuations in objective circumstances and subjective mood.
Abstract
People use more positive words than negative words. Referred to as “linguistic positivity bias” (LPB), this effect has been found across cultures and languages, prompting the conclusion that it is a panhuman tendency. However, although multiple competing explanations of LPB have been proposed, there is still no consensus on what mechanism(s) generate LPB or even on whether it is driven primarily by universal cognitive features or by environmental factors. In this work we propose that LPB has remained unresolved because previous research has neglected an essential dimension of language: time. In four studies conducted with two independent, time-stamped text corpora (Google books Ngrams and the New York Times), we found that LPB in American English has decreased during the last two centuries. We also observed dynamic fluctuations in LPB that were predicted by changes in objective environment, i.e., war and economic hardships, and by changes in national subjective happiness. In addition to providing evidence that LPB is a dynamic phenomenon, these results suggest that cognitive mechanisms alone cannot account for the observed dynamic fluctuations in LPB. At the least, LPB likely arises from multiple interacting mechanisms involving subjective, objective, and societal factors. In addition to having theoretical significance, our results demonstrate the value of newly available data sources in addressing long-standing scientific questions.
For a naive observer, human language use often appears to be a function of utility and context. We craft our enunciations according to our intentions and, aside from correlations between language and environmental and experiential cues, we seem to be masters of our words. However, as social scientists have long recognized, linguistic patterns are also a function of social, cultural, and cognitive factors (1–5).
One of the more striking examples of such linkages comes from research on the affective content of language revealing that people use positive words more frequently than negative words, an effect first reported by Zajonc in 1968 (6) and subsequently replicated many times (7–11). This phenomenon, initially referred to as the “Pollyanna hypothesis” (7) and more recently as “linguistic positivity bias” (LPB) (9), has been demonstrated across numerous languages and corpora (7, 11, 12), leading to the conclusion that it is a panhuman tendency.
However, although the existence of LPB is well-established, there is surprisingly little consensus on what mechanisms are responsible for the effect. Among the mechanisms proposed are universal cognitive biases (6, 7), affective states (12), objective circumstances (8), and societal norms (9, 10, 12). However, research has been unable to establish which of these mechanisms drives LPB and whether the effect might be driven by interaction between a subset of these or other factors. One reason for this uncertainty is that previous investigations of LPB have taken a synchronic approach to language and were unable to offer any insight into whether or to what extent LPB is stable across time and context within a given language.
In this work we adopt an alternative approach in which LPB is treated as a dynamic phenomenon. Specifically, using two time-stamped American English corpora (13, 14), we study longitudinal variation in LPB as a function of subjective, objective, and societal factors. This approach allows us not only to investigate diachronic variation in LPB, an unexplored dimension of the effect, but also to adjudicate between previously proposed explanations of LPB.
Background
In a seminal paper on affective asymmetries, Zajonc (6) found that the degree to which a word is liked is positively associated with its frequency of appearance; this association was cited as evidence supporting the “mere exposure effect.” Boucher and Osgood (7) replicated this observation, but they proposed a different causal explanation and noted several major flaws in Zajonc’s model, arguing that although the mere exposure effect can account for the preference of previously unknown words, repeated exposure to both positive and negative words over time should wash out the effect of their disproportionate frequencies. In their view, LPB was a reflection of a “universal human tendency” to prefer positive over negative language. Although initial empirical tests were conducted only on English, subsequent work found the same effect in multiple other languages.*
Since the work of Zajonc (6) and Boucher and Osgood (7), several additional explanations of LPB have been proposed. Matlin and Stang (18) formulated the Pollyanna principle, which maintained that positive words appear more frequently than negative words because positive information is cognitively privileged, easier to remember, and more likely to be recalled. Others have argued that the evaluative content of language might express people’s current affective state (19–21). Because humans tend to show an affective bias toward positivity (22–24), a phenomenon often referred to as the “positivity offset” (25), it is plausible to see LPB as a reflection of a lower-level bias toward subjective positive affect (12, 26, 27). We refer to this position as the “subjective mood framework.” However, researchers have also questioned whether, more simply, LPB might be a function of objective circumstances. In this view, which we refer to as the “objective circumstances framework,” the higher relative frequency of positive words reflects the higher relative frequency of positive events in speakers’ environments (8). Finally, in recent years, social scientists have proposed that it is human sociality that drives LPB (9, 10, 12). Positive messages facilitate group interactions, cooperation, and social bonding; thus groups and individuals using more positive language might be favored over those who talk persistently about the “ugly things in life” (7). We refer to this theory as the “prosociality framework.”
Current Research
Explanations of LPB vary considerably in the focal mechanisms they propose, which span multiple levels of analysis. Although the mere exposure effect (6), the Pollyanna principle (18), and the subjective mood framework interpret LPB as the consequence of a universal characteristic of the human mind, the objective circumstances and prosociality frameworks see LPB as a function of the environment and social organization, respectively. However, despite the diversity of these explanations, when language is treated as a static phenomenon they make indistinguishable predictions; these uniform predictions are a major reason why synchronic investigations of LPB have not been able to adjudicate among theories.
In contrast, when language is treated as a dynamic phenomenon, diverging predictions can be extracted from these frameworks. Both the mere exposure effect and the Pollyanna principle treat LPB as a result of shared cognitive architecture and therefore predict not only cultural/linguistic universalism but also temporal stability. A subjective mood framework, however, can account for various temporal fluctuations. For example, if for some reason a large proportion of a language community becomes simultaneously less happy, one could expect a temporary decrease in LPB, followed by at least partial adaptation to the positive offset point. Although how fast and to what degree such recovery to overall positive mood should happen remain open questions (28), this framework nonetheless allows for dynamic changes in LPB. Similarly, the objective circumstances framework predicts that various short- or long-term changes in the environment influence LPB: Deteriorating conditions should decrease the amount of positivity in language, and improving conditions should increase it. Finally, the prosociality framework suggests that changing societal principles can dynamically influence LPB. If a society develops interpersonal norms that promote greater social closeness, one can expect higher LPB; however, the opposite trend, the development of norms promoting greater social distance, should lead to decreased LPB (see Table 1 for an overview of LPB frameworks and their predictions).
Table 1. Summary and predictions of LPB frameworks
In this work, we present a diachronic investigation of LPB in American English that examines the effect as a function of time, objective circumstances, and group-level subjective states. Our central hypothesis is that LPB is not a static phenomenon but instead varies over time. Further, we consider two classes of temporal variation, long-term linear trends (study 1) and short-term fluctuations (studies 2, 3, and 4), and we test two sets of hypotheses regarding specific mechanisms that might drive either class of variation.
Linear Trends in LPB.
In study 1 we investigate linear trends in LPB. Predictions about the linear longitudinal trend of LPB vary according to which explanatory framework is used to inform the prediction. If the effect is driven entirely by universal cognitive characteristics, LPB should be relatively stable across time and follow neither a positive nor negative trend. However, if LPB is driven by subjective, objective, or societal factors, and these factors follow either a positive or negative linear trend, LPB should shift in their direction. Accordingly, by investigating whether LPB shows a positive, negative, or null trend, we can assemble discriminative evidence that will help adjudicate between explanations of LPB.
One weakness in this approach is that, because some linear LPB predictions depend on the direction of proposed explanatory variables, any uncertainty regarding the direction of the explanatory variables transfers to the LPB predictions. This issue has the greatest bearing on predictions made from the objective circumstances and subjective mood frameworks. Most problematically, there is no consensus, or even an agreed-upon metric, regarding whether objective circumstances in the United States have generally improved or degraded over time. Consequently, we do not test linear LPB predictions based on the objective circumstances in this study.
In contrast, deriving predictions from the subjective framework is more feasible because a well-researched variable, happiness, can be identified. However, there are at least two problems with using longitudinal happiness to test the subjective mood framework directionally. First, because the earliest measurement of happiness in the United States dates back only to 1946, the time span might not be long enough to capture a gradual linear trend. Second, there is not a consensus on the direction of happiness in the United States over time. Although the prevailing view is that the aggregated time series of the happiness in the United States suggests a nil trend (29), there have been reports of both positive (30) and negative (31) statistically reliable trends. Accordingly, the presence or absence of a particular trend in LPB cannot necessarily be interpreted as evidence for or against the subjective mood framework, because the literature is unclear about the direction of happiness in the United States.
With these caveats in place, we consider the following hypotheses:
Hypothesis 1.1: LPB has a nil linear trend. Many researchers agree that the historical trend for happiness in the United States is nil. If LPB is strictly linked to subjective happiness, we can expect temporal fluctuations but no linear trend. This hypothesis is also consistent with the universalist explanations of LPB, such as the mere exposure effect and the Pollyanna principle.
Hypothesis 1.2: LPB is increasing over time. Veenhoven and Hagerty (32) suggested that there is indeed a positive trend for happiness in the United States, but such a trend is hard to detect statistically because of the relatively short time series being used. Because our data cover a much longer time period than the longest time series for happiness in the United States, we might be able to detect an increase in LPB that is not observable in national happiness studies alone. Thus, an increase in LPB could support the subjective mood framework if we take a positive trend as a reference.
Hypothesis 1.3: LPB is decreasing over time. There is growing evidence that American society has been changing in the direction of lower social cohesion. Social scientists using automated text analysis have found that Americans have gradually become less concerned with virtues, duties, moral obligations, and social values (33, 34). According to the prosociality framework, increased social distance should be reflected in a decreasing LPB trend. According to the subjective mood framework, a negative trend in happiness in the United States, as observed in ref. 31, could also lead to this effect.
Short-Term Fluctuations in LPB.
In investigating short-term fluctuations in LPB, we consider two LPB frameworks, the objective circumstances framework (studies 2 and 3) and the subjective mood framework (study 4). However, because subjective mood and objective circumstances are often tightly coupled (35), we distinguish between the two variables in terms of types of measurement but not in terms of independent effects on LPB. Disentangling the potential effects of these two variables is important, but this task is beyond the scope of our data and must be left to future research. Specifically, we test two hypotheses regarding short-term fluctuations in LPB:
Hypothesis 2.1: Changes in LPB will be predicted by changes in the objective circumstances, and a deteriorating environment will be associated with a decrease in LPB. Two salient and well-documented phenomena relevant to well-being are war and the economy. Accordingly, using the number of military casualties in war and Okun’s Misery Index (36) to operationalize these factors, we tested the hypotheses that LPB would be negatively associated with larger wars (study 2) and higher Misery Index scores (study 3). Previous work produced mixed results, finding decreased language positivity during World War II but not during World War I (37).
Hypothesis 2.2: Changes in LPB will be predicted by changes in the collective affect, so that decreasing happiness on the national level will be associated with a decline in LPB. Previous work has shown that language positivity is aligned with the mood of the author, i.e., happier people use more positive language (20, 21, 38, 39). Although these findings were based on the writings of individual subjects, we hypothesize that the same pattern should be observable in group-level time series. More specifically, we predict that happier periods will be associated with higher LPB on a national level.
To test our hypotheses regarding long-term linear trends and short-term fluctuations in LPB, we conducted four studies using time-stamped corpora of American English. The first study tests if there is a general linear trend, comparing the empirical support for hypotheses 1.1, 1.2, and 1.3. In the second study we measured whether the LPB is associated with military casualties in war, testing hypothesis 2.1. In a third study, we further tested hypothesis 2.1, comparing the trends in LPB and economic prosperity. In the fourth study, we examined the association of LPB with a direct measure of subjective happiness, testing hypothesis 2.2.
Study 1: Linear Trends
In study 1 we measured LPB in two time-stamped historical corpora of American English—Google books and the New York Times—and tested for any linear trends.
Results.
First we analyzed the Google books corpus (1800–2000). For each year we computed the total number of positive words and the total number of negative words and divided them by the total number of words for that year. There was a downward trend both for positive [β = −0.96, 95% confidence interval (CI) (−1.00, −0.93), t(199) = −50.40, P < 0.001] and negative [β = −0.80, 95% CI (−0.89, −0.72), t(199) = −18.97, P < 0.001] words, suggesting that affective language in American books has been in decline. (All slopes are standardized beta coefficients. All significance tests are two-tailed.)
Next, we computed an LPB index as the ratio of positive to negative words. Notably, this index is not a suitable measure of LPB as a static construct, because we do not control for the total count of positive and negative words (types) or for their average frequencies (tokens). However, this index can detect whether LPB changes over time. If we observe that the index is reliably higher at one time point than at another, we can conclude that LPB has changed. Using this index, we tested our general hypothesis that LPB changes with time and found a significant negative linear trend [β = −0.83, 95% CI (−0.91, −0.76), t(199) = −21.18, P < 0.001] (Fig. 1).
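The index and trend test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `lpb_index` and `standardized_slope` and the toy counts are hypothetical. Because the per-year total word count cancels in the ratio, the index can be computed directly from raw positive and negative counts. With a single predictor (year), the standardized beta equals the Pearson correlation.

```python
from math import sqrt

def lpb_index(pos_counts, neg_counts):
    """Per-year ratio of positive to negative word counts (hypothetical helper)."""
    return [p / n for p, n in zip(pos_counts, neg_counts)]

def standardized_slope(x, y):
    """Standardized beta of a simple regression of y on x.

    With one predictor this is the Pearson correlation between x and y.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy, made-up counts for illustration only (not the paper's data).
years = [1800, 1850, 1900, 1950, 2000]
positive = [520, 500, 470, 450, 420]
negative = [200, 200, 200, 200, 200]

index = lpb_index(positive, negative)
beta = standardized_slope(years, index)
print(index)           # one LPB value per year
print(round(beta, 3))  # negative when the index declines over time
```

A negative `beta` here only indicates that the toy index falls over the toy years; the confidence intervals and t statistics reported in the text would require a full regression routine.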
Fig. 1. LPB by year for Google books and the New York Times. The straight lines represent the estimated linear trends.
Fig. 2. Standardized residuals of LPB and number of casualties of war with linear trends removed. Solid lines represent LPB, and dashed lines represent the averaged casualties of war for a given year.
We then repeated these analyses using the New York Times corpus (1851–2015). However, because the New York Times Chronicle website returns the number of articles that contain a queried word rather than the actual frequency of a word, we had to use a slightly different measure of affective language frequency. To construct these measures, we calculated the total number of articles that mentioned any of the 5,000 most frequent English words (www.wordfrequency.info) and used these frequencies to normalize the positive and negative words time series. Specifically, for each year, we computed the ratio of the sum of all articles mentioning a positive word to the sum of all articles mentioning any of the 5,000 common English words. We computed the same ratio for the negative words. Using this affective language metric, we tested for linear trends in the normalized positive and negative time series in the New York Times corpus. Again, there was a decreasing trend both for positive [β = −0.64, 95% CI (−0.76, −0.53), t(163) = −10.75, P < 0.001] and negative [β = −0.41, 95% CI (−0.55, −0.27), t(163) = −5.73, P < 0.001] words. Next we computed an LPB index by dividing the positive time series by the negative time series and tested for a linear trend; this computation yielded a marginally significant decrease in positivity over time [β = −0.13, 95% CI (−0.28, 0.03), t(163) = −1.63; P = 0.10].
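The article-count normalization described for the New York Times corpus can be illustrated as follows. This is a sketch under assumptions: `normalized_rate` is a hypothetical helper, the hit counts are invented, and three baseline words stand in for the 5,000 common words used in the text.

```python
def normalized_rate(word_article_counts, baseline_article_counts):
    """Summed article hits for a word list divided by summed hits for baseline words."""
    return sum(word_article_counts) / sum(baseline_article_counts)

# Hypothetical article counts for a single year (illustration only).
pos_hits = [120, 80, 40]         # articles mentioning each positive word
neg_hits = [60, 30, 30]          # articles mentioning each negative word
baseline_hits = [900, 700, 400]  # stands in for the 5,000 common words

pos_rate = normalized_rate(pos_hits, baseline_hits)
neg_rate = normalized_rate(neg_hits, baseline_hits)
lpb = pos_rate / neg_rate        # LPB index for that year

print(pos_rate, neg_rate, lpb)
```

Repeating this for every year yields the positive, negative, and LPB time series to which the linear trend test is applied. Note that the same baseline cancels in the final ratio, so the normalization matters for the separate positive and negative series rather than for the index itself.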
Discussion.
This study yielded two primary results. First, we observed that affective word use in American English has decreased over time. This finding converges with a similar observation by Acerbi et al. (37) and might correspond to the recently discovered rise in rational language use in English (40). Second, we found compelling evidence for a decreasing longitudinal trend in LPB. This trend was quite strong in the Google Ngrams corpus and was marginally significant in the New York Times corpus.
Taken together, these results provide strong evidence against hypotheses 1.1 and 1.2, which propose that LPB follows a null or positive longitudinal trend. Further, the observed negative effect indicates that LPB cannot be solely a function of universal cognitive mechanisms such as the mere exposure effect or Pollyanna principle. If LPB were driven by these mechanisms alone, the effect should be relatively stable across time, because such mechanisms are theorized to be hardwired and largely constant characteristics of human cognition.
As indicated by hypothesis 1.3, which predicted a negative trend in LPB, the findings of this study suggest that LPB is at least partially a function of one or more dynamic factors. One strong candidate factor is decreasing social cohesion in the United States (33, 34), according to the prosociality framework. However, because of the uncertainty about the direction of happiness in the United States, it is also possible that a negative trend in happiness in the United States (31) exists and that this factor is also partly responsible for the observed decline in LPB. Similarly, although there are no widely accepted views regarding the direction of objective circumstances in the United States, it is possible that, to the extent that objective circumstances might be steadily declining, the objective circumstances framework could at least partially account for our results. Accordingly, to understand better whether objective circumstances and subjective happiness in the United States might contribute to LPB, we directly investigated these effects in the next three studies.
Study 2: Casualties of War
In study 2 we tested hypothesis 2.1, which predicts that changes in LPB will be associated with changes in objective circumstances. We did so by investigating whether, while controlling for the linear trends observed in study 1, LPB can be linked to objective environmental factors that have a high impact on human well-being. There are many potentially relevant environmental factors that could be used to test this hypothesis, but the historical data for many of these factors are insufficient. Further, many seemingly relevant objective factors might be complicated by unanticipated boundary conditions such as multimodal group-level distributions of valence and public salience. Accordingly, the variable selection process for this study required that the target variable satisfy three constraints: availability of extensive historical data, minimally ambiguous valence, and sufficient public awareness.
One highly relevant candidate variable that we identified, given these requirements, was engagement in war, with the added information of the number of casualties. Most importantly, there is ample historical record of war and casualties. Further, although wars certainly can have polyvocal socio-cultural significance, we would argue that, on average, war is negatively valenced and that the negative valence of a given war increases with its number of casualties. Finally, although there is some variation in public awareness of wars, media coverage and the political prominence of wars during wartime suggest that the floor of this awareness is at least somewhere above liminal. Accordingly, in this study, we investigate whether LPB fluctuates with war and peace by testing the hypothesis that conflicts with larger casualty counts will be associated with decreased LPB.
Results.
We computed the same LPB index as in study 1 and regressed this index on time and number of casualties per year (1800–2010).† Including time in the models allowed us to test if there is an observable relationship between LPB and wars independent of the overall linear trend detected in study 1. The number of military casualties was a significant predictor of LPB [β-casualties = −0.14, 95% CI (−0.22, −0.07), t(198) = −3.71, P < 0.001; β-time = −0.82, 95% CI (−0.90, −0.75), t(198) = −21.49, P < 0.001]: A higher number of casualties was associated with lower LPB. The same pattern was observed in the New York Times corpus [β-casualties = −0.45, 95% CI (−0.59, −0.31), t(157) = −6.30, P < 0.001; β-time = −0.18, 95% CI (−0.31, −0.04), t(157) = −2.47, P < 0.05].‡
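Regressing LPB on time and a second predictor, as described above, can be sketched with the closed-form expression for standardized coefficients in a two-predictor model. The helper names and the constructed data are assumptions for illustration; the paper does not state which software or routine was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardized_betas(y, x1, x2):
    """Standardized OLS coefficients for y ~ x1 + x2 from pairwise correlations."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Constructed toy data: LPB falls with time and with casualties (not real data).
years = [1, 2, 3, 4, 5, 6]
casualties = [0, 5, 0, 0, 8, 0]
lpb = [3.0 - 0.1 * t - 0.02 * c for t, c in zip(years, casualties)]

beta_time, beta_casualties = standardized_betas(lpb, years, casualties)
print(round(beta_time, 3), round(beta_casualties, 3))
```

Including time as a predictor means the casualties coefficient reflects the association that remains after the overall linear decline is accounted for, which is the logic of the analysis in this study.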
Discussion.
As predicted by hypothesis 2.1, this study shows that LPB is associated with objective circumstances. After accounting for the longitudinal decline in LPB, we found that LPB also decreases as a function of wars’ casualty counts. Further, this effect was found in two independent corpora using different measures of LPB, a replication that indicates that this effect is robust and reliable. Perhaps even more importantly than providing supporting evidence for the objective circumstances hypothesis, these results make a strong case against explanations of LPB as an exclusive function of static, built-in cognitive biases. Although the finding that objective circumstances are associated with dynamic shifts in LPB is not evidence against the involvement of universal cognitive mechanisms in LPB, it does directly oppose the view that LPB emerges solely from our cognitive architecture. At the very least, this study demonstrates that short-term fluctuations in LPB are not random noise but rather are at least partially a function of dynamic phenomena such as variations in objective circumstance.
Although this study provided initial evidence that fluctuations in LPB can be linked to the current state of the environment, it has a noteworthy shortcoming. We purposefully chose an extremely valenced factor of the environment, maximizing our chances of detecting meaningful patterns in the LPB fluctuations. It is not clear, however, if these findings are generalizable to factors less salient than the distinction between war and peace. We address this issue in the next study.
Study 3: Economic Misery
In study 2 we found that changes in LPB can be predicted by changes in objective circumstances. In study 3 we further tested the generalizability of hypothesis 2.1. Instead of using casualties of war as a measure of an adverse environment, we represented objective circumstance as economic well-being. This factor was operationalized using Okun’s Misery Index (36), a combination of unemployment and inflation rates, which indicates how harsh the economic environment is for the average person. According to hypothesis 2.1, we expect that, when controlling for linear trends, an increase in the Misery Index will be accompanied by a decrease in LPB.
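As a small illustration of the predictor used here, the sketch below assumes the commonly cited form of the Misery Index, the unemployment rate plus the inflation rate. The text itself says only that the index combines the two rates, so the simple sum is an assumption, and the rates shown are invented.

```python
def misery_index(unemployment_pct, inflation_pct):
    """Assumed definition: unemployment rate plus inflation rate, in percent."""
    return unemployment_pct + inflation_pct

# Invented (label, unemployment %, inflation %) triples for illustration only.
yearly_rates = [(1, 4.0, 2.0), (2, 6.5, 3.5), (3, 9.0, 6.0)]

series = {label: misery_index(u, i) for label, u, i in yearly_rates}
print(series)
```

A yearly series built this way would then serve as the second predictor, alongside time, in the regression of LPB.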
Results.
The LPB index was calculated as in studies 1 and 2. As expected, when controlling for time, the Misery Index (1948–2015) significantly predicted LPB in Google books [β-misery = −0.35, 95% CI (−0.49, −0.21), t(50) = −4.92, P < 0.001; β-time = −0.70, 95% CI (−0.84, −0.56), t(50) = −9.83, P < 0.001], such that deteriorating economic environments were associated with lower levels of LPB. The same pattern was observed in the New York Times corpus [β-misery = −0.42, 95% CI (−0.62, −0.22), t(65) = −4.20, P < 0.001; β-time = −0.38, 95% CI (−0.57, −0.18), t(65) = −3.76, P < 0.001] (Fig. 3).
Fig. 3. Standardized residuals of LPB and Misery Index measures with linear trends removed. Solid lines represent LPB, and dashed lines represent the Misery Index for a given year.
Discussion.
This study provided further support for hypothesis 2.1, demonstrating that LPB can be predicted by a less extreme measure of objective circumstances. After controlling for time, we found that years with a higher Misery Index tended to have lower levels of LPB in both corpora. Not only the distinction between war and peace but also the difference between economic hardship and economic prosperity seems to be relevant when predicting the rate of LPB.
Notably, however, although LPB clearly appears to be associated with objective circumstances, this association has little bearing on whether and to what extent other dynamic factors might influence it. Accordingly, in study 4 we investigated the relationship between LPB and another dynamic factor, subjective happiness, which has been considered as an explanation of LPB.
Study 4: Subjective Happiness
In the previous studies we found that LPB can fluctuate in accordance with the valence of the objective circumstances. In study 4 we tested whether LPB can be predicted from direct measures of subjective happiness.
Results.
We followed the analysis strategy used in the previous study and predicted changes in LPB from changes in subjective happiness (1946–2014) while controlling for linear trends. Happiness was a significant predictor both for the Google books corpus [β-happiness = 0.33, 95% CI (0.15, 0.51), t(38) = 3.59, P < 0.001; β-time = −0.77, 95% CI (−0.95, −0.59), t(38) = −8.45, P < 0.001] and for the New York Times corpus [β-happiness = 0.31, 95% CI (0.06, 0.55), t(47) = 2.48, P < 0.05; β-time = −0.47, 95% CI (−0.72, −0.23), t(47) = −3.84, P < 0.001]. That is, in the years when the level of national subjective happiness in the United States was lower, LPB tended to be lower also (Fig. 4).
Fig. 4. Scatterplots of standardized residuals of LPB and subjective happiness with linear trends removed. The data points are labeled by the year for which the measurement was taken.
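The standardized residuals with linear trends removed can be reproduced in outline as follows. This is a hedged sketch: `detrend` and `standardize` are hypothetical helpers, the yearly values are invented, and the sample standard deviation (n − 1) is an assumed choice that the text does not specify.

```python
from math import sqrt

def detrend(t, y):
    """Residuals of y after removing its least-squares linear trend in t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum((a - mt) ** 2 for a in t)
    intercept = my - slope * mt
    return [b - (intercept + slope * a) for a, b in zip(t, y)]

def standardize(values):
    """Z-scores using the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

# Invented yearly values for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005]
happiness = [7.0, 7.2, 6.9, 7.1, 6.8, 7.0]
lpb = [2.50, 2.52, 2.44, 2.46, 2.38, 2.40]

z_happiness = standardize(detrend(years, happiness))
z_lpb = standardize(detrend(years, lpb))

# Correlation of the detrended series: do the fluctuations move together?
r = sum(a * b for a, b in zip(z_happiness, z_lpb)) / (len(years) - 1)
print(round(r, 3))
```

Plotting `z_lpb` against `z_happiness` gives a scatterplot of the kind described in the caption; a positive `r` means that years with above-trend happiness also show above-trend LPB.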
Discussion.
In this study we found strong support for hypothesis 2.2, which proposed that LPB is a function of affective states. Specifically, we found that short-term fluctuations in LPB varied with nation-wide measures of happiness. As in studies 2 and 3, this effect was replicated in a second corpus, increasing the certainty that the observed association between national happiness and LPB is a reliable effect. In addition to supporting the subjective mood framework, these results further corroborate our view that LPB cannot be explained simply as a function of universal cognitive mechanisms. Rather, similar to studies 2 and 3, this study indicates that short-term trends in LPB are meaningful reflections of the relationship between the effect and dynamic phenomena, in this case affective states. Beyond providing evidence against exclusively universalist explanations for LPB, this study, taken in conjunction with the previous two studies, offers tentative support to the view that LPB is a function of multiple dynamic factors. That is, LPB might not be a function of either objective circumstances or affective states alone but rather a function of a set of factors that includes these factors and potentially others, such as societal norms (9, 10).
General Discussion
More than four decades ago Zajonc (6) discovered that positive words are used disproportionately more often than negative words. Although there was no consensus about why this imbalance exists, subsequent research demonstrated repeatedly that the phenomenon is remarkably robust, appearing across contexts, cultures, and languages (7–12). However, although this research has yielded multiple compelling explanations for LPB, such as in-built cognitive biases (6, 18), affect (12), objective circumstances (8), and societal norms (9, 10, 12), there has been little attempt to determine which explanations are valid and which are erroneous. One reason for the lack of such studies is simply that discriminant hypothesis testing was not feasible for previous investigations of LPB. Despite the substantive variability between competing explanations of the effect, each explanation generated the same prediction, that positive words are used more than negative words.
However, by introducing a previously unexplored dimension to LPB—time—we were able not only to provide evidence that this is indeed a dimension that needs to be accounted for but also to devise studies in which previous explanatory frameworks yielded divergent predictions. Specifically, if LPB were purely a function of static, universal cognitive mechanisms (6, 7, 11, 18), LPB should be relatively stable across time, and variations in LPB should be minimal and centered around a relatively stable reference point. In contrast, if LPB changes across time, then universalist frameworks must be, at the very least, incomplete.
Across four studies, we found strong evidence for the latter conclusion, that universalist frameworks alone cannot account for LPB. In study 1, we found that LPB has been steadily decreasing over time, a trend that cannot be accounted for by the universalist position alone. Notably, this negative linear trend also cannot be accounted for by the original formulation of the subjective mood framework, because in its original form that framework assumes a partially homeostatic model in which dynamic fluctuations are centered around a relatively stable positivity offset. Although we know that nation-level subjective happiness can change reliably over time (41), there is no clear evidence that in the United States this positivity offset was higher in the past. Therefore, a downward shift in the positivity offset is unlikely to explain why LPB is lower now. Similarly, the negative trend is difficult to explain as a function of objective circumstances, because during the last two centuries the United States has prospered in many domains related to human well-being, such as food and energy availability, transportation, health care, education, and overall national wealth. Of course, if one accepts the assumption that in the United States there has been a steady decline in subjective mood or in objective circumstances, then either of these frameworks might serve as a plausible explanation of the decline in LPB. However, it is far from clear that such declines exist; indeed, there is little evidence to suggest so. Thus it is unlikely that the observed negative trend is a function of either of these factors.
We suggest that, of all five theoretical frameworks, the prosociality explanation is most consistent with the observed negative linear trend. Researchers using cross-sectional design have found direct evidence that positive language is related to prosociality, such that increased cooperation (42) and increased need for belonging (43) are associated with a greater prevalence of positive language. Further, researchers using longitudinal designs have found evidence that the social cohesion of the American society is becoming weaker over time. Compared with their predecessors, Americans seem to be more individualistic (34) and less conformist (44, 45), to need less social approval (46), to be less concerned with moral virtues and social duties (33), and to be less trusting (47) and less empathetic (48). If LPB is indeed related to the prosocial values in a society, the observed decrease in LPB could be driven by the general decline of prosociality in American culture.
However, although the prosociality framework can account for the steady decrease in LPB that we observed in study 1, it cannot explain the dynamic fluctuations that are also present in the longitudinal trend. Shifts in cultural norms tend to be relatively slow processes in which the unit of time is more often a generation than a year; accordingly, although social change is a strong candidate for explaining long-term variation in LPB, it is not clear how it could account for the strong short-term shifts observed in the time series. On the other hand, both the objective circumstances and subjective mood frameworks can account for such dynamic fluctuations. Objective circumstances can fluctuate between positive and negative, with war and peace being among the most dramatic examples of such temporary changes. If objective circumstances are related to LPB, such fluctuations should be reflected in the variations in LPB. Similarly, although the subjective mood explanation relies on a relatively stable positive offset, it does not preclude temporary upward or downward shifts. Strikingly, in studies 2–4 we found strong evidence for an association between each of these factors and LPB.
The mapping between our results and the predictions from the five theoretical frameworks suggests that no single theory can account for all the patterns observed. Accordingly, we propose that subjective affect, objective circumstances, and societal principles each play an important, dynamic role in LPB. Additionally, although we provide strong evidence that universal cognitive mechanisms alone cannot account for the dynamic patterns we observed in LPB, the results do not contradict the view that such mechanisms also might play a role in LPB. Our findings thus suggest that LPB is a complex phenomenon that emerges from the intersection of multiple factors occurring at various levels.
However, although this research answers important questions about the dynamics of LPB, it also raises many new questions. Do objective circumstances and subjective mood have independent roles in the fluctuations of LPB? Do other languages reveal diachronic patterns similar to those we found in American English? Will happier countries (e.g., Finland) show stronger LPB than less happy countries (e.g., Russia) when controlling for historical period? Are changes from one social model into another (e.g., the fall of socialism in Eastern Europe) marked by a change in LPB? Will cultures with different concepts of happiness (49) have different trends of LPB? The recent sharp growth in the availability of data (50, 51) and the progress in developing more sophisticated tools for measuring affect in language (11, 15, 52, 53) will help us address these and other related questions in the future.
In addition to having theoretical relevance, our results also have methodological significance. In recent years there has been an upsurge in the availability of time-stamped text corpora, and social scientists have been quick to use them as a source of empirical data (33, 34, 40, 54, 55). By measuring historical changes in word frequencies, authors have drawn conclusions about social, moral, and cognitive changes in the culture. Such comparisons typically measure the linear trend of a set of words and check whether the trend is directionally consistent with an a priori prediction based on a particular theory. For example, if previous research has found increasing individualism in the society, researchers might predict that this trend will be reflected in language, either as a corresponding linear trend in content words (34) or in pronouns (55). In the current work we went a step further and measured the statistical association between patterns derived from historical texts and a direct measure of a psychological construct (happiness), showing time series correspondence between self-reported and text-derived psychological measures. Thereby our results demonstrate the value of automated text analysis in social and behavioral sciences and encourage the development of more precise tools for inferring psychological states from historical texts.
Conclusion
The recent boom in data availability and computational resources has provided social scientists with tools to address empirically questions about which previously we could only speculate. In this work, we used automated text analysis tools to investigate whether LPB is a static property of language or varies as a function of time. We found a reliable downward trend as well as short-term fluctuations predicted by environmental factors and national-level subjective states. These results have important theoretical implications for our understanding of the role of affect in language. In addition, the results demonstrate the role that novel methodological approaches can play in resolving long-lasting theoretical problems.
Materials and Methods
Corpora.
Google provides the largest available time-stamped corpus (50). We included word-frequency data only for books printed in the United States. Following a recommendation by the creators of the corpus (www.culturomics.org/Resources/faq), we included data from 1800–2000. Data earlier than 1800 are scarcer and produce unstable patterns, and data after 2000 are less representative because of changes in the sampling procedure. The corpus contained 192 billion words from 1.3 million books. In addition, we used the New York Times Chronicle web service to assemble a second corpus covering the period 1851–2015. It should be noted that the Ngrams and New York Times corpora were partially assembled using optical character recognition technology, which is known to make word-recognition errors at low frequencies. However, there is no evidence that such errors are systematically related to word valence; thus we do not expect that recognition errors would have any effect on our results. One difference between the two data sources is that the New York Times provides the proportion of articles per year containing a given word instead of the frequency of a given word relative to all words for that year. The New York Times corpus was based on 14.9 million articles.§ These two corpora were used in all of the studies.
Preprocessing.
Because the Google corpus contained multiple nonwords, we filtered out all entries that were not part of the Unix standard English dictionary. (The dictionary is commonly used by spell-checking programs; for Ubuntu 14.04 it consisted of 99,171 words.)
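The filtering step can be illustrated with a minimal sketch. The function name, the toy counts, and the decision to fold case before matching are assumptions made for illustration; the text above does not specify how capitalization was handled. On many Unix systems the word list is found at /usr/share/dict/words, but here a small in-memory list stands in for it.

```python
from collections import Counter


def filter_to_dictionary(word_counts, dictionary_words):
    """Keep only entries whose lowercased form is in the reference word list.

    Case folding is an assumption of this sketch, not a documented step.
    """
    allowed = {w.strip().lower() for w in dictionary_words if w.strip()}
    kept = Counter()
    for word, count in word_counts.items():
        key = word.lower()
        if key in allowed:
            kept[key] += count  # merge case variants of the same word
    return kept


# Toy data (hypothetical): two misrecognized strings and one case variant.
raw_counts = {"happy": 120, "Happy": 30, "hapyy": 2, "tbe": 5, "grief": 40}
word_list = ["happy", "grief", "the"]  # stand-in for the system word list

filtered = filter_to_dictionary(raw_counts, word_list)
print(dict(filtered))
```

A set lookup keeps the filter linear in the number of corpus entries, which matters for a corpus of this size.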
Affect Dictionary.
We used the positive and negative emotion word categories from the Linguistic Inquiry and Word Count (LIWC) dictionary (56), which contained 907 words and stems. There were 408 entries in the positive category, such as “awesome,” “relief,” “pretty,” “clever,” “confident,” “easy,” “helpful,” and “grace,” and 499 entries in the negative category, such as “abandon,” “fear,” “suffer,” “hatred,” “lost,” “overwhelm,” “boring,” and “grief.” It has been shown that these two categories correspond to (i) the affective content of the written text, as rated by human judges (19); (ii) generalized depression, as measured by Beck’s Depression Inventory (20) or by self-selection in internet forums (39); and (iii) short-term mood fluctuations induced by experimental manipulations (21). This dictionary was used in all the studies. One drawback to this approach is that using a fixed set of dictionary terms to operationalize expressions of affect does not account for possible shifts in word meanings over time. For example, some words with historically negative valence, such as “awesome” and “terrific,” are now used to indicate positive affect (53). However, research suggests that only a small percentage of affect words have actually shifted in valence (53), and there is no evidence that such shifts are biased in one direction or the other. Consequently, although semantic drift might contribute noise to longitudinal analyses such as those reported in the current research, this noise most likely simply increases the difficulty of detecting a signal and does not introduce a systematic bias.
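Matching tokens against a dictionary of words and stems can be sketched as follows. This is an illustration under stated assumptions, not the LIWC software itself: stems are marked with a trailing asterisk, the entries below are a toy subset, and which of them are stems is hypothetical. The sketch reports the share of positive and negative tokens separately and does not commit to a particular formula for combining them into an LPB index.

```python
def build_matcher(entries):
    """Return a predicate that tests a token against words and stems.

    Entries ending in '*' are treated as stems (prefix match);
    all other entries must match the token exactly.
    """
    exact = {e for e in entries if not e.endswith("*")}
    stems = tuple(e[:-1] for e in entries if e.endswith("*"))

    def matches(token):
        # str.startswith accepts a tuple of prefixes; an empty tuple is False.
        return token in exact or token.startswith(stems)

    return matches


def category_shares(token_counts, positive_entries, negative_entries):
    """Share of all tokens that fall in the positive and negative categories."""
    is_pos = build_matcher(positive_entries)
    is_neg = build_matcher(negative_entries)
    total = sum(token_counts.values())
    pos = sum(c for t, c in token_counts.items() if is_pos(t))
    neg = sum(c for t, c in token_counts.items() if is_neg(t))
    return pos / total, neg / total


# Toy dictionary and counts (hypothetical, for illustration only).
positive = ["awesome", "relief", "helpful", "confiden*"]
negative = ["fear", "grief", "suffer*", "abandon*"]
counts = {"confident": 6, "confidence": 4, "relief": 10,
          "suffering": 5, "fear": 5, "table": 70}

pos_share, neg_share = category_shares(counts, positive, negative)
print(pos_share, neg_share)
```

Because the counts are token counts, a frequent word contributes in proportion to its frequency, which corresponds to studying the phenomenon on the level of tokens rather than types.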
Casualties of War Time Series.
In study 2, we used data from a fact sheet from the US Department of Veterans Affairs (57) and computed the average number of United States military casualties for the wars in which the United States was engaged during the last two centuries. In the fact sheet the number of casualties was reported by war rather than by year. For simplicity we assumed a uniform distribution of casualties over time for each of the wars. For the period 2001–2010 we used data from ref. 58, which provided exact numbers of military deaths caused by hostile action for each year.
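The uniform-distribution assumption can be made concrete with a short sketch. The calendar-year granularity, the inclusive treatment of start and end years, and the summing of overlapping wars are assumptions of this illustration; the war spans and totals below are invented toy values, not figures from the fact sheet.

```python
def spread_uniformly(wars):
    """Convert per-war totals into a per-year series.

    wars: iterable of (start_year, end_year, total_casualties), with both
    years inclusive. Each war's total is divided equally among its years;
    years covered by more than one war receive the sum of the shares.
    """
    per_year = {}
    for start, end, total in wars:
        n_years = end - start + 1
        share = total / n_years
        for year in range(start, end + 1):
            per_year[year] = per_year.get(year, 0.0) + share
    return per_year


# Toy input (hypothetical): a four-year war and an overlapping two-year war.
toy_wars = [(1900, 1903, 4000), (1903, 1904, 1000)]
series = spread_uniformly(toy_wars)

for year in sorted(series):
    print(year, series[year])
```

Years that fall outside every war are simply absent from the result and can be treated as zero casualties when the series is aligned with the text-derived measures.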
Misery Index Time Series.
In study 3, we used data from www.miseryindex.us, which contained Misery Index measures for the United States in the period 1948–2015.
Happiness Time Series.
In study 4, we used survey data on happiness in the United States available from the World Database of Happiness (59). [Veenhoven’s database provides two types of well-being measures: subjective happiness and life satisfaction. Here we use only the happiness measures because previous research found that LIWC’s affect categories are correlated with subjective emotions and not with life satisfaction (60).] The United States portion of the database contains measures from various questions related to happiness from different years, covering the period from 1946 to 2014. For some years there are multiple data points, but for others there are none. The answer scales also vary; Veenhoven (59) transformed them to a common 0–10 metric. We combined all available questions on happiness in the United States into a single time series. [Criticism has been directed at combining data from different surveys (29, 61) and from different scales (62), but our aim was to obtain the longest time series possible, so we used the maximum span of Veenhoven’s data.] When there was more than one data point per year, we took the average; years with no data points were treated as missing data. During the period 1946–2014 there were data points for 50 y.
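The aggregation rule described above (average within a year, missing otherwise) can be sketched as follows. The function name and the use of NaN to represent missing years are choices made for this illustration, and the scores are toy values on the common 0–10 metric rather than entries from the database.

```python
import math
from collections import defaultdict


def yearly_happiness(observations, first_year, last_year):
    """Build one value per year from (year, score) survey observations.

    Years with several observations get their mean; years with none are
    marked as missing with NaN rather than being interpolated.
    """
    by_year = defaultdict(list)
    for year, score in observations:
        by_year[year].append(score)

    return {
        year: (sum(by_year[year]) / len(by_year[year])
               if year in by_year else math.nan)
        for year in range(first_year, last_year + 1)
    }


# Toy observations (hypothetical): two surveys in 1946, none in 1947.
toy = [(1946, 7.0), (1946, 7.4), (1948, 6.8)]
happiness = yearly_happiness(toy, 1946, 1948)
print(happiness)
```

Keeping missing years explicit means that any later correlation with a text-derived series has to drop those years pairwise instead of silently treating them as observed.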
Acknowledgments
We thank Richard Easterlin, Selin Kesebir, and John Monterosso for useful comments and suggestions. This research was supported in part by National Science Foundation Interdisciplinary Behavior and Social Sciences Grant 1520031 (to M.D.). R.I. is the recipient of a University of Michigan Research Assistantship.
Footnotes
- 1 To whom correspondence may be addressed. Email: axe{at}umich.edu or rumen.i.iliev{at}gmail.com.
Author contributions: R.I. and R.A. designed research; R.I. and R.A. performed research; R.I., J.H., M.D., and R.A. analyzed data; and R.I., J.H., M.D., and R.A. wrote the paper.
Reviewers: M.I., Texas Tech University; M.M., University of Arizona; and K.W.W., University of California, Berkeley.
The authors declare no conflict of interest.
*In addition to testing the cross-linguistic generalizability of LPB, researchers studying the effect have considered questions about whether LPB occurs on the level of word types or tokens. Word types are the individual words within a language that constitute its vocabulary (e.g., “happy,” “sad,” “hopeful”); word tokens are actual occurrences of words. For example, if the word “joy” is noted five times in an email, it would be counted as one type and five tokens. Although there is general agreement that LPB is observable in multiple languages, there has been an ongoing debate about whether LPB happens on the level of types or on the level of tokens (10, 11, 15–17). In this paper we study the phenomenon on the level of tokens only.
†When choosing a time lag, we faced two conceptual challenges. First, we do not know the average time lag between the writing of a piece of text and its publication date. Furthermore, this time lag might vary across time. Second, changes in writing might either follow environmental and societal changes or precede them. Writers, being in the vanguard of culture, might anticipate or even cause societal changes. Because we lack data to address these concerns and to adjust our measures accordingly, we work with a time lag of 0 for both corpora across all subsequent studies.
‡These findings agree only partially with the observations of Acerbi et al. (37), who found less positivity in American books during World War II but not during World War I. In Fig. 2 we can see a dip in LPB for each major military conflict in the history of the United States except the Korean War (1950–1953), for which the pattern is less consistent. The disparity most likely results from the different dictionaries used to measure affective content.
§During initial data analysis we observed abnormally high positivity (7.6 SDs) for the year of 1964. We traced the problem back to two words, “please” and “original,” which were 130 and 44 times more frequent, respectively, for the year 1964 than for neighboring years. We replaced the 1964 frequencies for these two words with the interpolated frequencies from years 1963 and 1965.
Freely available online through the PNAS open access option.
References
1. Freud S
2. Rorschach H
3.
4. Iliev R, Dehghani M, Sagi E
5. Iliev R, Smirnova A
6. Zajonc RB
7.
8. Rozin P, Berman L, Royzman E
9. Augustine AA, Mehl MR, Larsen RJ
10. Warriner AB, Kuperman V
11. Dodds PS, et al.
12.
13.
14. New York Times Research and Development Group
15.
16. Garcia D, Garas A, Schweitzer F
17. Dodds PS, et al.
18. Matlin MW, Stang DJ
19. Alpers GW, et al.
20. Rude S, Gortner EM, Pennebaker J
21.
22. Parducci A
23. Diener E, Diener C
24. Biswas-Diener R, Vittersø J, Diener E
25. Ito T, Cacioppo J
26.
27. Diener E, Kanazawa S, Suh EM, Oishi S
28. Lucas RE
29.
30.
31. Marsden PV; Firebaugh G, Tach L
32.
33. Kesebir P, Kesebir S
34. Greenfield PM
35.
36. Brookings Institute
37. Acerbi et al.
38.
39. Ramirez-Esparza N, Chung CK, Kacewicz E, Pennebaker JW
40. Iliev R, Axelrod R
41. Elmendorf DW, Mankiw NG, Summers LH; Stevenson B, Wolfers J
42.
43. Williams FM
44. Perrin S, Spencer C
45.
46.
47.
48. Konrath SH, O’Brien EH, Hsing C
49. Oishi S, Graham J, Kesebir S, Galinha IC
50. Michel JB, et al., Google Books Team
51. Frank MR, Mitchell L, Dodds PS, Danforth CM
52. Jurafsky D, Martin HJ
53. Hamilton WL, Clark K, Leskovec J, Jurafsky D
54.
55. Twenge JM, Campbell WK, Gentile B
56. Pennebaker JW, Booth RJ, Francis ME
57. Department of Veterans Affairs
58. DeBruyne NF, Leland A
59. Veenhoven R
60.
61. Stevenson B, Wolfers J
62. Batz C, Parrigon S, Tay L