Postmortem memory of public figures in news and social media

Significance
Who is remembered by society after they die? Scholars and the broader public have speculated about this question since ancient times. Still, we lack a detailed understanding of what happens when a public figure dies and their media image solidifies and is committed to the collective memory. To close this gap, we leverage a comprehensive 5-y dataset of online news and social media posts with millions of documents per day. By tracking mentions of thousands of public figures during the year following their death, we reveal and model the prototypical patterns and biographic correlates of postmortem media attention. We also show systematic differences in how news and social media remember deceased public figures.

"Included" refers to the 2,362 individuals included in the study (exclusion was mostly due to below-threshold pre-mortem mention frequencies; cf. Materials and Methods in the main paper). "Regression" is a subset of "Included" and refers to the 870 individuals included in the regression analysis (exclusion was due to unknown age, gender, or manner of death). "N/A" refers to the percentage of individuals for whom the respective property was not available in Freebase. The remaining relative frequencies were computed over the set of individuals for whom the property was available, so they sum to 100%.

Figure 2: Comparison of ten models $S(t)$ for fitting empirical mention frequencies, for (a) the news and (b) Twitter. All y-axes are logarithmic. The left and right plots in each pair show the same fits. The only difference is that the left plots have linear x-axes, whereas the right plots have logarithmic x-axes. Fits are nonlinear least-squares fits obtained in log space, i.e., logarithms were taken of both the empirical data and the model $S(t)$ before performing the least-squares optimization (see equation 1 in the paper for the case of the shifted power function). The biexponential function was introduced by Candia et al. ("The universal decay of collective memory and attention." Nature Human Behaviour. 2019;3(1):82-91). It is parameterized by $N, p, q, r > 0$, as follows:
• Biexponential: $S(t) = \frac{N}{p+r-q}\left[(p-q)\,e^{-(p+r)t} + r\,e^{-qt}\right]$.
Based on the biexponential function, we define the (novel) bipower function by replacing exponentials with powers:
• Bipower: $S(t) = \frac{N}{p+r-q}\left[(p-q)\,t^{-(p+r)} + r\,t^{-q}\right]$.
By fixing $q = 0$ and defining $a = \frac{Np}{p+r}$, $b = p + r$, and $c = \frac{Nr}{p+r}$, we obtain the shifted power function as a special case of the bipower function:
• Shifted power: $S(t) = a\,t^{-b} + c$.
Rubin and Wenzel give theoretical motivations for six of the seven remaining functions ("One hundred years of forgetting: A quantitative description of retention." Psychological Review. 1996;103(4):734-760). Four of these functions are parameterized by two parameters $a, b > 0$, as follows:
• Exponential: $S(t) = a\,e^{-bt}$, i.e., $\log S(t) = \log a - b\,e^{\log t}$.
• Hyperbolic: $S(t) = (a + bt)^{-1}$, i.e., $\log S(t) = -\log\left(a + b\,e^{\log t}\right)$.
• Logarithmic: $S(t) = a - b\log t$, i.e., $\log S(t) = \log(a - b\log t)$.
• Power: $S(t) = a\,t^{-b}$, i.e., $\log S(t) = \log a - b\log t$.
When plotted on log-log axes, these four functions are either concave (exponential, hyperbolic, logarithmic) or linear (power). In other words, $\log S(t)$ is a concave or linear function of $\log t$ (cf. right columns). The empirical curves, by contrast, are convex on log-log axes.
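The special-case relationship and the log-log curvature claims can be checked numerically. The following Python sketch uses only the standard library. The function names are hypothetical; parameter names follow the text. `log_log_curvature` is an illustrative helper, not part of the paper's method: it approximates the second derivative of $\log S$ with respect to $\log t$ by a central finite difference. `log_space_sse` shows the kind of log-space least-squares objective described in the Figure 2 caption. An actual fit would pass it to a numerical optimizer, which is omitted here.

```python
import math

# Illustrative versions of several models S(t) compared in Figure 2
# (t = days since death, t >= 1). Function names are hypothetical;
# parameter names follow the text.

def biexponential(t, N, p, q, r):
    return N / (p + r - q) * ((p - q) * math.exp(-(p + r) * t) + r * math.exp(-q * t))

def bipower(t, N, p, q, r):
    # Same as the biexponential, with each e^{-x t} replaced by t^{-x}
    return N / (p + r - q) * ((p - q) * t ** (-(p + r)) + r * t ** (-q))

def shifted_power(t, a, b, c):
    return a * t ** (-b) + c

def exponential(t, a, b):
    return a * math.exp(-b * t)

def hyperbolic(t, a, b):
    return 1.0 / (a + b * t)

def logarithmic(t, a, b):
    return a - b * math.log(t)

def power(t, a, b):
    return a * t ** (-b)

def bipower_to_shifted_power(N, p, r):
    """Map bipower parameters with q = 0 to shifted-power parameters (a, b, c)."""
    return N * p / (p + r), p + r, N * r / (p + r)

def log_log_curvature(S, t, h=0.1):
    """Central second difference of log S(t) with respect to x = log t.

    Negative: concave on log-log axes; about zero: linear; positive: convex.
    """
    def g(x):
        return math.log(S(math.exp(x)))
    x = math.log(t)
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / h ** 2

def log_space_sse(model, params, ts, ys):
    """Least-squares objective in log space: sum over t of (log y_t - log S(t))^2."""
    return sum((math.log(y) - math.log(model(t, *params))) ** 2 for t, y in zip(ts, ys))

# Usage: the bipower function with q = 0 reproduces the shifted power function.
N, p, r = 100.0, 0.5, 0.1
a, b, c = bipower_to_shifted_power(N, p, r)
print(all(math.isclose(bipower(t, N, p, 0.0, r), shifted_power(t, a, b, c))
          for t in (1, 10, 100, 400)))  # True
```

For example, `log_log_curvature(lambda t: exponential(t, 1.0, 0.1), 10.0)` is negative (concave), whereas the shifted power function with $c > 0$ gives a positive value (convex), matching the shape of the empirical curves.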
Rubin and Wenzel also proposed generalized versions of the exponential and hyperbolic functions, called the exponential-power and hyperbolic-power functions, respectively. In these versions, $t$ is replaced by the power $t^c$, where $c$ is a third parameter. (Analogous generalized versions of the logarithmic and power functions are not necessary, since the plain logarithmic and power functions can already express them: $b\log t^c = (bc)\log t$.) For $c > 0$, the exponential-power and hyperbolic-power functions, too, are concave in log-log space. When $b, c < 0$ are allowed, however, they can be made convex and are thus better suited for fitting the empirical data. (In the following specifications, we keep $a, b, c > 0$ but replace $b$ by $-b$ and $c$ by $-c$.)
• Exponential-power: $S(t) = a\,e^{b t^{-c}}$, i.e., $\log S(t) = \log a + b\,e^{-c\log t}$.
• Hyperbolic-power: $S(t) = \left(a - b\,t^{-c}\right)^{-1}$, i.e., $\log S(t) = -\log\left(a - b\,e^{-c\log t}\right)$.
Finally, as the last function, we consider what Candia et al. refer to as the "log-normal" function, defined as $S(t) = \exp\left(\log a - b\log t - c(\log t)^2\right)$. This function takes the functional form of the log-normal distribution, but it is not used here to describe a probability distribution. We therefore refer to it as "log-normal-based". The log-normal-based function, too, is concave in log-log space. It can be made convex by replacing $c$ by $-c$, i.e., $S(t) = \exp\left(\log a - b\log t + c(\log t)^2\right)$. Note, however, that this makes $S(t)$ an increasing function of $t$ as $t \to \infty$. This is unlike the empirical data, and unlike what one would require from a sound theoretical model of collective memory. We hence constrain the parameters such that the fitted function is monotonically decreasing over the modeled time range (days 1 to $t_{\max} = 400$).
When fitted to the empirical data, the unconstrained function assumes a minimum at $1 < t < t_{\max}$. The monotonicity constraint is therefore equivalent to requiring the minimum to occur at $t = t_{\max}$. The derivative of $\log S(t)$ with respect to $\log t$ is $-b + 2c\log t$. Setting it to zero at $t = t_{\max}$ yields $b = 2c\log t_{\max}$, which gives rise to the following model:
• Log-normal-based (constrained): $S(t) = \exp\left(\log a - 2c\log t_{\max}\,\log t + c(\log t)^2\right)$.

We quantify goodness of fit using two measures (results in the legends of the right plots):
(1) The coefficient of determination ($R^2$), computed as the squared correlation between observed and predicted values on the log scale. Larger is better.
(2) Akaike's information criterion (AIC), which accounts for the varying model complexity (the models have between two and four parameters). Smaller is better.
$R^2$ and AIC produce the same ordering of the 10 models, and the ordering is identical across the two media (news and Twitter). In the figure, the models are sorted top-down in increasing order of goodness of fit. The shifted power model provides the best fit according to both measures and for both media, with $R^2 = 0.989$ for the news and $R^2 = 0.985$ for Twitter. The bipower model yields essentially the same fit as the shifted power model. As mentioned above, the shifted power model is a special case of the bipower model with $q = 0$ fixed (optimal bipower fit: $q = 0.0068$ for the news, $q = 0$ for Twitter).

Figure 4: Relative length increase of documents that mention deceased public figures (excluding Twitter posts), as a function of days since death. The increase is measured with respect to the respective public figure's pre-mortem mean document length. All means in this analysis are geometric means. Error bars are 95% confidence intervals approximated as ±2 standard errors. Left: Direct mean estimates of relative length increase. Right: Estimates adjusted for population drift (since certain groups of people are more likely to be mentioned post-mortem than others).
Each day $t$'s adjusted estimate was obtained from a separate linear regression model for day $t$ only. Each person $i$ mentioned that day is one data point. The dependent variable is $\log(L_{it}/P_i)$, where $L_{it}$ is $i$'s mean document length on day $t$ and $P_i$ is $i$'s pre-mortem mean document length. The independent variables are the predictors previously found to be significantly associated with short-term post-mortem mention frequency: language, pre-mortem mean mention frequency rank, manner of death, and age group. These are coded such that estimates are for anglophones of median pre-mortem popularity who died an unnatural death at age 70–79 years. The adjusted estimate of the (geometric) mean relative length increase for day $t$ is then $e^{a_t} - 1$, where $a_t$ is the intercept of the regression for day $t$. Both with and without adjustment, documents that mentioned a person on the day of their death were on average about 40% shorter than documents that mentioned the person before their death. The pre-mortem level is reached again (and then surpassed) after about one month.
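The per-day estimate $e^{a_t} - 1$ can be sketched in dependency-free Python. This is a sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. The predictors are assumed to be already coded so that an all-zero row corresponds to the reference group described above. Ordinary least squares is solved here via the normal equations and Gauss-Jordan elimination, an illustrative choice to avoid external libraries.

```python
import math

def solve_linear(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(n):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [m - factor * mc for m, mc in zip(M[row], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_with_intercept(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, slopes...]."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(XtX, Xty)

def adjusted_relative_increase(day_lengths, premortem_lengths, covariates):
    """Estimate e^{a_t} - 1 for one day t.

    day_lengths[i]: L_it, person i's mean document length on day t
    premortem_lengths[i]: P_i, person i's pre-mortem mean document length
    covariates[i]: predictor values for person i, coded (hypothetically) so that
        an all-zero row is the reference group; pass empty rows for the
        unadjusted (direct) geometric-mean estimate
    """
    y = [math.log(L / P) for L, P in zip(day_lengths, premortem_lengths)]
    a_t = ols_with_intercept(covariates, y)[0]  # intercept of the day-t regression
    return math.exp(a_t) - 1.0
```

With empty covariate rows, the intercept reduces to the mean of the log ratios. The function then returns the direct geometric-mean estimate (left panel). With covariates, the intercept is the model's prediction for the reference group, which corresponds to the adjusted estimate (right panel).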

Regression modeling
Note: the variable names in the code may differ from those in the paper; see the mapping in the lists below.

Independent variables:
• pre-mortem mean (mean_before)
• age at death (age_group)
• manner of death (death_type)
• notability type (type_group)
• language (anglo)
• gender (gender)

Dependent variables:
• short-term boost (peak_mean_boost)
• long-term boost (perm_boost)

We report results for two variants of each model, which differ in how the dependent variables are treated. Model variants are marked via the "Transformation on dependent variable" descriptor in the header of each model. The two variants are the following:
• NONE: dependent variables were used as-is. This variant is used in the main paper.
• RELATIVE RANKS: dependent variables were transformed to relative ranks, i.e., they were rank-transformed and then shifted and scaled to the interval [−0.5, 0.5]. Dependent variables must therefore be interpreted in relative terms. A value of 0 corresponds to the median, and positive [negative] values correspond to ranks above [below] the median.
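The RELATIVE RANKS transformation can be sketched as follows. This is a hypothetical helper, not the code used to produce the reported models. The text does not specify how ties are handled or exactly how ranks are mapped onto [−0.5, 0.5]. The sketch assumes average ranks for ties and the linear map (rank − 1)/(n − 1) − 0.5. This map sends the smallest value to −0.5, the largest to 0.5, and the median (for odd n) to 0.

```python
def relative_ranks(values):
    """Rank-transform values and rescale the ranks linearly to [-0.5, 0.5].

    Ties receive the average of the ranks they span (an assumption; the tie
    rule is not specified in the text). Requires at least two values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Map rank 1 to -0.5 and rank n to +0.5
    return [(rank - 1) / (n - 1) - 0.5 for rank in ranks]
```

For instance, `relative_ranks([3.0, 1.0, 2.0])` returns `[0.5, -0.5, 0.0]`, so the median value maps to 0.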