Cross-correlations of American baby names
Contributed by Giorgio Parisi, April 27, 2015 (sent for review October 11, 2014; reviewed by R. Alexander Bentley)
Significance
Societal and cultural transformations are very general and widely debated topics, both among scientists (e.g., sociologists) and in public opinion (e.g., artists, music producers, brand manufacturers, and advertising agencies). Although almost everyone could express a position on such topics, it is much more difficult to support such an opinion with scientific evidence. In this work we analyze the case of American baby names and describe how parents' tastes in choosing names evolved over the last century. Using quantitative methods, we find that a deep transformation occurred at the end of the 20th century, and we suggest that this transformation could be studied from a quantitative sociological point of view.
Abstract
The quantitative description of cultural evolution is a challenging task. The most difficult part of the problem is probably finding appropriate measurable quantities that can make more quantitative such elusive concepts as, for example, the dynamics of cultural movements, behavioral patterns, and the traditions of people. A strategy to tackle this issue is to observe particular features of human activities, i.e., cultural traits, such as the names given to newborns. We study the names of babies born in the United States from 1910 to 2012. Our analysis shows that groups of different correlated states naturally emerge in different epochs, and we are able to follow and decrypt their evolution. Although these groups of states are stable across many decades, a sudden reorganization occurs in the last part of the 20th century. We unambiguously demonstrate that the cultural evolution of society can be observed and quantified by looking at cultural traits. We think that this kind of quantitative analysis can possibly be extended to other cultural traits: although databases covering more than one century (such as the one we used) are rare, cultural evolution on shorter timescales can be studied because many human activities are now routinely recorded in the digital era.
Cultural traits are behavioral patterns shared by the members of social communities. Traditions, religions, beliefs, language, and values are some examples. Far from being static and isolated, they continuously evolve and interact with the external environment (e.g., other communities and mass media), and they can be transmitted among members of communities on timescales much shorter than those characterizing cultural movements. Whereas changes in cultural movements may occur over decades or centuries, changes in cultural traits may be observed on a daily to a yearly basis, depending on the trait. An accurate analysis of existing public data can teach a lot about their reciprocal influence. A cultural trait may promote or prevent the rise in popularity of others, a past cultural trait may influence current and future ones, and the rise or fall of a cultural trait in a certain area may influence cultural traits in other areas. Cultural traits can be considered the fundamental building blocks of the culture of communities, and their evolution can be used to describe the evolution of society.
Some of the most important progress in understanding the evolutionary process of cultures is described in a number of texts that are by now classic references, among them refs. 1–3. Many cultural traits have also been studied in the past. Among them are traits with negligible differences from each other in terms of intrinsic costs and benefits, which are usually referred to as neutral traits. They play a special role in our study, as we explain below. Examples of such traits are skirt lengths (4), pop songs (5), dog breeds (6), and pottery decorations in the archaeological record (7). Keywords in academic vocabulary have also been the focus of recent interest (8). Data about names given to newborns have been investigated for similar reasons (9, 10), and they are the focus of our investigation.
Names come and go in society, as does any other cultural trait. Most of them have a popularity peak and then disappear. They carry important information on the transformation of the social structure (11, 12). Several quantitative approaches to analyze what can be learned from names have been proposed, and we briefly describe them below. Compared with other relevant traits, neutral traits, and names in particular, appear very appropriate for studying cultural changes, because the success of a name depends mainly on the influence that the surrounding culture wields on the parents of the newborns. Other traits are subject, for example, to the influence of external forces, such as producers, who may artificially shape the tastes of consumers. This is particularly evident in the fashion and music markets (13).
The frequency distributions of names given to newborns have fat tails, typical of many complex physical problems (14). Fat-tailed distributions can be generated by different mechanisms (7, 15, 16), and in the case of names several explanations have been given. A scale-free network was used to study a fashion diffusion process (9) in which each node could take one of many values, imitating “popular” nodes and avoiding “nonpopular” nodes. A stochastic model for cultural evolution has been proposed recently in which names are chosen according to both individual preferences and social influence (10). These different mechanisms are all able to reproduce a fat-tailed distribution and were shown to reproduce several features of the real data. The popularity of a name was also shown to be correlated with the popularity of similar names in previous years (13). Furthermore, names were analyzed in terms of activation and inhibition processes (17) to explain their popularity. The rates of the rise and of the fall of the popularity of names were found to be correlated in refs. 11 and 18. The same phenomena are also being studied in the context of collective behavior (19, 20), where limited attention seems to play a crucial role (21, 22), and in the context of citation dynamics (23), where a universal temporal pattern is found.
Some very interesting phenomena in the dynamics of American baby names have recently been discussed in ref. 24, where the authors introduced a new model for the choice of names. The model is defined in terms of a population of agents (babies), each of which holds a single variant (name); the names of new generations are given mainly by copying from previous generations but also, occasionally, by inventing new names. Comparing real-world data with the model in ref. 24 reveals a considerable increase in the innovation of names in the last part of the century. This is consistent with the main findings of this paper, as we discuss further below.
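To make the copying-with-innovation idea concrete, here is a minimal sketch of a generic neutral model in the same spirit: each baby either copies the name of a random baby from the previous generation or, with probability `mu`, receives a brand-new name. This is an illustration under stated assumptions, not the model of ref. 24; the function name and its parameters (`n_agents`, `n_generations`, `mu`) are hypothetical, and ref. 24's actual model (e.g., how far back in time names are copied from) may differ.

```python
import numpy as np

def simulate_copy_with_innovation(n_agents, n_generations, mu, seed=0):
    """Generic neutral copying with innovation (hypothetical sketch, not ref. 24).

    Each generation has n_agents babies. With probability mu a baby gets a
    brand-new name; otherwise it copies the name of a randomly chosen baby of
    the previous generation. Names are represented by integer labels.
    """
    rng = np.random.default_rng(seed)
    names = np.arange(n_agents)        # initial generation: all names distinct
    next_new_name = n_agents           # label assigned to the next invented name
    history = [names]
    for _ in range(n_generations):
        # Copy: sample parents' names uniformly with replacement.
        copied = rng.choice(names, size=n_agents, replace=True)
        # Innovate: overwrite a random subset with fresh, never-used labels.
        innovate = rng.random(n_agents) < mu
        n_new = int(innovate.sum())
        copied[innovate] = np.arange(next_new_name, next_new_name + n_new)
        next_new_name += n_new
        names = copied
        history.append(names)
    return history

# Usage: track name diversity (number of distinct names) per generation.
history = simulate_copy_with_innovation(n_agents=1000, n_generations=100, mu=0.01)
distinct_per_generation = [len(np.unique(g)) for g in history]
```

Raising `mu` in such a sketch increases the number of distinct names per generation, which is the kind of "increase in innovation" that ref. 24 infers from the real data.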
The mechanisms behind the spreading of cultural traits are still debated. The original hypothesis of Simmel (25) was that a fashion arises because individuals of lower social status copy those of perceived higher status. This is the idea underlying the analysis in ref. 9. This approach differs from the neutral model proposed in ref. 26, where naming was considered in close connection with the infinite-allele model of population genetics with random genetic drift. A preference model of fashion (18), where individuals can copy the preferences of other agents, was said to better reproduce the empirical features of American baby names. These studies on names mainly focused on global distributions, not on the relations between the local distributions of names in different states of the United States (i.e., the distributions in single states). We believe that much can be learned from these relations. Our main working hypothesis is that local changes of names convey a large body of information on the mutual cultural influences that communities (states) wield on each other.
We focus our correlation analysis on the different states of the United States during the 20th century. Statistics on names given to newborns in the United States can be downloaded from the webpage of the US Social Security Administration (SSA) (27). Each name has a different popularity curve in each state (Fig. 1), although many of the common names rise and fade with very similar behavior. The overlap between these curves could be used to describe the similarities between US states. Instead of considering these overlaps in time, we consider the correlations between states on a yearly basis, by studying the whole distribution of baby names in every state (Fig. 2). This analysis gives robust results, as we show in the following.
Fig. 1.

Fig. 2.

Methods
For all available years [which, in the SSA archives (27), range from 1910 to 2012] we study how names given in a state i are correlated with names given in a state j. The distribution of these names has already been analyzed (28), and it is further described in SI Text (Figs. S1 and S2). For each pair of states i and j, with $i, j = 1, \dots, M$ (the federal district of Washington, DC, is considered by itself), we evaluate a correlation coefficient $C_{ij}$, computed as follows. Let us consider a generic year y and let $n_{qS}(y)$ be the number of girls named q born in the state S in the year y (we limit ourselves to describing the girls’ case, as we have verified that analyzing baby boys’ names leads to the same conclusions). In each year, we have a rectangular $N \times M$ matrix, where N is the total number of different girl names present in the database and M is the number of US states. The entries of the matrix, $n_{qS}$, are the occurrences of the baby girl names, with $q = 1, \dots, N$ and $S = 1, \dots, M$; in what follows we drop the explicit dependence on y. Because the information provided by the SSA includes only names that occur at least five times, if the name q has been used fewer than five times in the state S in the year y, then we have $n_{qS} = 0$. These matrices are sparse, and on average only a small fraction of their entries are different from zero. The frequency of the name q in the state S is given by
$$f_{qS} = \frac{n_{qS}}{\sum_{q'=1}^{N} n_{q'S}}. \tag{1}$$
Fig. S1.

Fig. S2.

The average frequency of the name q over the states is
$$\bar{f}_q = \frac{1}{M} \sum_{S=1}^{M} f_{qS}. \tag{2}$$
It is useful to define the quantities
$$\delta f_{qS} = f_{qS} - \bar{f}_q, \tag{3}$$
which are related to the fluctuations of the frequencies of the names over the states. The average of $\delta f_{qS}$ over all of the names is zero in each state S, as can be seen explicitly from their definition:
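$$\frac{1}{N} \sum_{q=1}^{N} \delta f_{qS} = \frac{1}{N} \left( \sum_{q=1}^{N} f_{qS} - \sum_{q=1}^{N} \bar{f}_q \right) = \frac{1}{N} (1 - 1) = 0, \tag{4}$$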
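given that in each state S, $\sum_{q=1}^{N} f_{qS} = 1$ (and hence also $\sum_{q=1}^{N} \bar{f}_q = 1$). To compute how the names in the state i are correlated with the names in the state j, we analyzed, year after year, the Pearson correlation between the variables $\delta f_{qi}$ and $\delta f_{qj}$ (with q running over all names), which is the $M \times M$ square matrix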
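$$C_{ij} = \frac{\sum_{q=1}^{N} \delta f_{qi} \, \delta f_{qj}}{\sqrt{\sum_{q=1}^{N} \delta f_{qi}^{2}} \, \sqrt{\sum_{q=1}^{N} \delta f_{qj}^{2}}}. \tag{5}$$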
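The pipeline from raw counts to the state-by-state correlation matrix (Eqs. 1–5) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the counts for one year are already loaded into an N × M integer array `counts` (names in rows, states in columns); the function name `state_correlation_matrix` is hypothetical, and parsing the SSA files is not shown.

```python
import numpy as np

def state_correlation_matrix(counts: np.ndarray) -> np.ndarray:
    """M x M correlation matrix C from an N x M matrix of name counts.

    counts[q, S] is the number of babies given name q in state S in one year
    (names below the SSA reporting threshold simply appear as 0).
    Assumes every state has at least one recorded name.
    """
    counts = np.asarray(counts, dtype=float)
    # Eq. 1: frequency of each name within its state (each column sums to 1).
    f = counts / counts.sum(axis=0, keepdims=True)
    # Eq. 2: average frequency of each name over the M states.
    f_bar = f.mean(axis=1, keepdims=True)
    # Eq. 3: fluctuations around that average; each column has zero mean (Eq. 4).
    delta = f - f_bar
    # Eq. 5: normalized scalar products between the columns of delta.
    norms = np.sqrt((delta ** 2).sum(axis=0))
    return (delta.T @ delta) / np.outer(norms, norms)

# Usage on synthetic counts (200 names, 5 states); real input would be one SSA year.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=20, size=(200, 5))
C = state_correlation_matrix(toy_counts)
print(C.shape)  # (5, 5)
```

Because each column of `delta` already has zero mean (Eq. 4), the result coincides with `np.corrcoef(delta, rowvar=False)`; the explicit form is kept to mirror Eq. 5.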
This matrix can be used to capture the emergence of complex correlations between clusters of states and to study their evolution in time. However, separating the interesting information from the underlying noise is a nontrivial problem. Similar issues have already been faced when analyzing biological problems (29–32) and financial stock markets (33–36) as well as in Internet traffic analysis (37) and in the statistics of atmospheric correlations (38). The main point is that even though the empirical correlation matrix is noisy, it does have stable properties (Figs. S3 and S4). We checked these properties, such as the eigenvalue spectrum and eigenvector localization, and compared them to the ones implied by a null hypothesis, i.e., to the properties of random matrices (Fig. S5).
Fig. S3.

Fig. S4.

Fig. S5.

Here we apply two general methods for the analysis of correlation matrices, i.e., principal component analysis (PCA) and hierarchical clustering (HC). PCA is based on the selection of the eigenvectors corresponding to the largest eigenvalues of the cross-correlation matrix. This choice relies on the hypothesis that smaller eigenvalues are related to noise whereas larger ones are related to the true system dynamics. HC, on the other hand, starts from M clusters formed of one state each and allows one to set up a hierarchy of clusters by merging clusters according to their distances, which can be defined in several ways from their mutual correlations. These two methods give very similar results, both for male and for female names, year after year and for different choices of the metrics in the HC algorithm. These methods are further described in SI Text.
SI Text
Distribution of Names.
We show in Fig. S1 the cumulative distribution of the occurrences of the different baby names, averaged over years. Although Fig. S1 clearly shows that the distribution of names has fat tails, it also shows that the distribution is more complex than a power law. As discussed in ref. 28, this distribution can be fitted with a combination of a beta function and a power law. The vast majority of names appear and spread fast through the states. They stay popular for a few years and then disappear (or, more precisely, their frequency drops to very low, endemic levels), without keeping a high popularity for a long time. Rise and fall in popularity is a general mechanism in social science, and various preferential attachment models with noise and/or limited attention have been proposed to capture these rise/fall curves (21–23). The popularity rise and fall of the most representative names can be observed in Fig. S2. Although this process is not completely symmetric (17), the rate of ascent is very similar to the rate of descent. The main difference is the leftover tail at very long times (a name that has appeared does not completely disappear).
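A fat-tailed distribution like the one in Fig. S1 is usually inspected through its empirical complementary cumulative distribution, P(X ≥ x), on log-log axes. The sketch below computes it for the name occurrences of a single year; the function name is hypothetical, and we assume (without confirmation from the source) that this is the kind of cumulative distribution plotted. Averaging over years, as done for Fig. S1, is not shown.

```python
import numpy as np

def empirical_ccdf(occurrences):
    """Return (values, P(X >= value)) for a 1-D array of name occurrence counts."""
    x = np.sort(np.asarray(occurrences))
    n = x.size
    # np.unique on the sorted array gives each distinct value and the index of
    # its first occurrence; every sample from that index on is >= the value,
    # so the fraction of samples >= value is (n - first_index) / n.
    values, first_idx = np.unique(x, return_index=True)
    return values, (n - first_idx) / n

# Usage: occurrences of each name in one year (hypothetical small example).
values, ccdf = empirical_ccdf([5, 1, 3, 3, 10])
```

A straight line on a log-log plot of `ccdf` against `values` would indicate a pure power law; the curvature reported in Fig. S1 is what motivates the beta-function-plus-power-law fit of ref. 28.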
Comparison with Random Matrices.
In all years of the time period under investigation, the first eigenvalues of the correlation matrix are well separated from all of the other eigenvalues, as can be seen in Fig. S3. However, Fig. S4 shows that in the last part of the 20th century there is no clear separation between the first and the second eigenvalue. This is not an effect of random noise: the issue is rather subtle, and we comment on it further in the section What Are the Principal Components Made of? In this section we discuss how to deal with the noise and how to identify system-specific, nonrandom correlations. One possible method is to compare the spectrum of the correlation matrices with that of random Wishart–Laguerre matrices (34). This method provides bounds for the random bulk of the spectrum, so the eigenvalues (and the corresponding eigenvectors) outside this interval are thought to carry information on the genuine correlations of the underlying system. The largest eigenvalues have been shown to play an important role in many situations (35, 40), and we will see below that this is also the case in our problem. We also used a slightly different method to establish that noise is not affecting our findings. We compared the spectrum of C with that obtained from random correlation matrices: in each year, we made a random permutation of the occurrences of the names inside each state, and we used it to compute the correlation matrix $C^{\mathrm{rand}}$. Because $N \gg M$, the matrices $C^{\mathrm{rand}}$ are similar to identity matrices, whose eigenvalues are all equal to one. The spectrum of $C^{\mathrm{rand}}$ is different from that of C, even though the widely used unfolded nearest-neighbor spacing distribution (see, for example, ref. 34) is similar. To rule out a role of random noise, we compared the inverse participation ratio (IPR) of our data with that of the null model. This test provides a direct evaluation of the nonrandom part of the spectrum. The IPR is defined as follows.
Let $u_i^k$ be the ith component of the (normalized) eigenvector $u^k$, such that $\sum_{i=1}^{M} (u_i^k)^2 = 1$, and
$$I^k = \sum_{i=1}^{M} (u_i^k)^4, \tag{S1}$$
where M is the total number of components. This quantity is often used in localization problems, because its reciprocal equals the number of states over which a vector is localized. An easy way to understand this point is to use the normalization condition $\sum_{i=1}^{M} (u_i^k)^2 = 1$, from which we see that $(u_i^k)^2 \approx 1/c$ for a vector localized over c states, implying that $I^k \approx c \cdot (1/c)^2 = 1/c$. A localized vector is therefore a vector for which $I^k \gg 1/M$ (up to $I^k = 1$ for a vector concentrated on a single state), whereas for nonlocalized vectors $I^k \approx 1/M$. We compared the IPR of all of the eigenvectors of C and $C^{\mathrm{rand}}$ and found that, although there is no complete agreement between the respective $I^k$ in the right region of the spectrum (small eigenvalues), there is a clear separation in the left part of the spectrum, which can be interpreted as a deviation from random-matrix behavior (Fig. S5). The first eigenvalues carry relevant information for the detection of collective modes in the system: their eigenvectors are found to be nonlocalized and to give important information on the correlation between states. The fact that the distribution of name occurrences is fat tailed is crucial for the qualitative agreement we find in the right region of the spectrum, as it is known that in cross-correlation matrices of signals sampled from a fat-tailed distribution, eigenvectors corresponding to small eigenvalues are localized (41), whereas this does not happen in the Gaussian case, where the IPR is flat for all λ (34, 37, 40).
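The noise checks described above (the permutation null model, the IPR of Eq. S1, and the random-matrix bulk) can be sketched as follows. This is an illustration under stated assumptions: all function names are hypothetical, and the Wishart–Laguerre comparison is represented here by the standard Marchenko–Pastur edges for the correlation matrix of M independent Gaussian variables estimated from N samples. Name counts are fat tailed, so those edges should be taken only as a rough guide.

```python
import numpy as np

def correlation_from_counts(counts):
    """Eqs. 1-5 of the main text: N x M counts -> M x M correlation matrix.

    Subtracting the across-state mean (Eq. 3) makes each row of delta sum to
    zero, so C always has one (numerically) zero eigenvalue.
    """
    counts = np.asarray(counts, dtype=float)
    f = counts / counts.sum(axis=0, keepdims=True)
    delta = f - f.mean(axis=1, keepdims=True)
    return np.corrcoef(delta, rowvar=False)

def shuffled_counts(counts, rng):
    """Null model: permute the name occurrences independently inside each state."""
    return np.column_stack(
        [rng.permutation(counts[:, s]) for s in range(counts.shape[1])]
    )

def inverse_participation_ratios(C):
    """Eigenvalues (descending) and IPR I^k = sum_i (u_i^k)^4 of each eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending, unit-norm columns
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, (eigvecs ** 4).sum(axis=0)

def marchenko_pastur_bounds(n_samples, n_variables):
    """Edges of the random bulk for a pure-noise correlation matrix (M < N)."""
    r = n_variables / n_samples
    return (1 - np.sqrt(r)) ** 2, (1 + np.sqrt(r)) ** 2

# Usage sketch: compare the spectra and IPRs of C and of C^rand for one year.
# rng = np.random.default_rng(0)
# lam, ipr = inverse_participation_ratios(correlation_from_counts(counts))
# lam_r, ipr_r = inverse_participation_ratios(
#     correlation_from_counts(shuffled_counts(counts, rng)))
```

Permuting within each state preserves every state's total and its set of counts, so the null model destroys only the alignment of names across states, which is exactly the information C is meant to capture.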
Principal Component Analysis and Hierarchical Clustering.
Let us introduce here principal component analysis (PCA) and hierarchical clustering (HC), the two methods we used to investigate the (genuine) correlations between the states through the years. A simple way to extract information from the correlation matrix C is PCA (42), which consists of a partial eigendecomposition of the correlation matrix: only the largest eigenvalues and their corresponding eigenvectors are used to reconstruct the collective modes of the system. It can be easily described by means of an example. Suppose that the off-diagonal elements of row i of the correlation matrix are all very small except for a single element $C_{ij}$, which is almost one, and that all of the other off-diagonal terms are also very small (except, by symmetry, $C_{ji}$). This situation clearly corresponds to some kind of connection between i and j. In this case the eigenvector v of C that corresponds to the largest eigenvalue contains this information in its components: it can be shown that the only components of v significantly different from zero are $v_i$ and $v_j$. Now suppose that the correlation matrix is slightly more complicated and that many off-diagonal elements are large. Although in some situations this is purely due to noise, it can also indicate that complicated patterns (more complex than pairwise correlations) are hidden in the system. It is reasonable to assume that the eigenvectors of C take this information into account and that the eigenvectors corresponding to the largest eigenvalues are localized on such patterns. Thus, PCA may lead to a reconstruction of the complex correlations among many states. To get rid of the noise contained in the correlation matrices, and to be sure that the eigenvectors corresponding to large eigenvalues contain genuinely nonrandom information, we proceed as explained in the previous section.
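The pairwise example above can be checked numerically. The toy matrix below is an illustrative assumption (six states, a weak uniform background correlation of 0.05, and one strongly correlated pair, states 1 and 4, at 0.95); it is not taken from the data.

```python
import numpy as np

# Toy correlation matrix for M = 6 states: every off-diagonal entry is small
# (0.05) except the pair (1, 4), which is strongly correlated (0.95).
M = 6
C = np.full((M, M), 0.05)
np.fill_diagonal(C, 1.0)
C[1, 4] = C[4, 1] = 0.95

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
# The overall sign of an eigenvector is arbitrary, so look at magnitudes:
# components 1 and 4 dominate, while the others are small but, because of the
# weak background correlation, not exactly zero.
print(np.round(np.abs(v), 2))
```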
Taking the first two eigenvectors of C, $u^1$ and $u^2$, a two-dimensional representation of these eigenvectors can be obtained by plotting the points of coordinates $(u_i^1, u_i^2)$, one for each state i. This can be done year after year, and the result is shown in Fig. S6. In these scatter plots, clusters of states that are similar, in a sense that is discussed below, naturally emerge.
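A minimal sketch of how the points $(u_i^1, u_i^2)$ of Fig. S6 can be obtained from a yearly correlation matrix. The function name is hypothetical, and the sign convention is our own assumption: eigenvectors are defined only up to a sign, so some convention is needed for scatter plots of different years to be visually comparable.

```python
import numpy as np

def pca_coordinates(C):
    """Return an M x 2 array whose row i is (u_i^1, u_i^2).

    u^1 and u^2 are the eigenvectors of the symmetric matrix C belonging to
    its two largest eigenvalues.
    """
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    top_two = np.argsort(eigvals)[::-1][:2]
    coords = eigvecs[:, top_two]                  # fancy indexing returns a copy
    # Arbitrary convention: flip each eigenvector so that its largest-magnitude
    # component is positive.
    for k in range(2):
        if coords[np.argmax(np.abs(coords[:, k])), k] < 0:
            coords[:, k] *= -1
    return coords
```

Each row of the returned array is one state's point in the plane; states whose points lie close together are the candidates for a common cluster.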
Fig. S6.

The notion of similarity can be understood with the following qualitative description. Let us define , which is a positive definite matrix, whose elements satisfy . Let us assume that the state i has large correlations with a group of states , whereas it has small correlations with the rest of the states; i.e., when . Let us also assume that another state k has a large correlation with a group of states whose overlap with is big enough. In this case, i and k are said to be similar, even if the correlation coefficient between them is small, and those two states belong to the same cluster.
These results have been compared with those obtained using an agglomerative hierarchical clustering based on this matrix. HC is a method to build a hierarchy of clusters starting from M clusters of one state each, such that clusters are merged as one moves up in the hierarchy. This can be done in many ways. First, one has to define a metric $d(i, j)$ between the columns of the correlation matrix, for example the Euclidean one. Then one has to define a distance $D(X, Y)$ between the clusters X and Y: at each step, the two clusters separated by the shortest distance are merged. There are several possible choices for $D(X, Y)$, each one defining a possible criterion to perform the clustering. We used the complete criterion, which is defined by
$$D(X, Y) = \max_{i \in X, \, j \in Y} d(i, j) \tag{S2}$$
and tends to find compact clusters of approximately equal diameters. This procedure leads to the construction of a tree diagram called a dendrogram, which can be cut at any chosen height. This introduces some arbitrariness in the number of clusters that are considered. Looking at the distribution of the points in the scatter plots of Fig. S6, we decided to fix the number of clusters to three.
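Complete-linkage clustering as in Eq. S2 is available in SciPy. The sketch below is a minimal illustration, assuming that each state is represented by its column of the correlation matrix C (as the text describes for the metric d) and that the dendrogram is cut to a fixed number of clusters; the function name `cluster_states` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_states(C, n_clusters=3):
    """Complete-linkage clustering of states from their correlation profiles.

    State i is represented by column i of C; d(i, j) is the Euclidean distance
    between columns, and clusters are merged with the complete criterion
    D(X, Y) = max d(i, j) over i in X, j in Y (Eq. S2). The dendrogram is then
    cut so that at most n_clusters groups remain.
    Returns (labels, Z): cluster labels 1..n_clusters and the linkage matrix.
    """
    profiles = np.asarray(C).T          # row i = column i of C
    Z = linkage(profiles, method="complete", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, Z
```

The linkage matrix `Z` encodes the full dendrogram, so cutting it at a different number of clusters does not require recomputing the hierarchy.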
It can be noted that states that are similar in the sense specified above are found to belong to the same cluster. In each year, to measure the quality of the clustering obtained, we computed the cophenetic coefficient. Given the Euclidean distance $d(i, j)$ between the points i and j and the corresponding distance $t(i, j)$ that these points have on the dendrogram, the cophenetic coefficient c is defined as
$$c = \frac{\sum_{i<j} \left[ d(i, j) - \bar{d} \right] \left[ t(i, j) - \bar{t} \right]}{\sqrt{\sum_{i<j} \left[ d(i, j) - \bar{d} \right]^2 \, \sum_{i<j} \left[ t(i, j) - \bar{t} \right]^2}}, \tag{S3}$$
where $\bar{d}$ and $\bar{t}$ are, respectively, the average values of the Euclidean and of the dendrogram distances. Values of c close to one indicate a good clustering, because the correlation between the actual distances and the dendrogram distances is large. In all years c is found to be close to 0.85 and never smaller than 0.7. The results from PCA and HC can be found in Figs. 3 and 4 of the main text.
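The cophenetic coefficient of Eq. S3 is the Pearson correlation between the pairwise Euclidean distances and the dendrogram (cophenetic) distances. The sketch below computes it explicitly and, as a cross-check, with SciPy's `cophenet`; the function name is hypothetical, and `points` stands for whatever representation of the states is being clustered (for example, the columns of C).

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

def cophenetic_coefficient(points, method="complete"):
    """Cophenetic correlation c (Eq. S3), computed explicitly and via SciPy."""
    d = pdist(points, metric="euclidean")        # d(i, j) for all pairs i < j
    Z = linkage(points, method=method, metric="euclidean")
    c_scipy, t = cophenet(Z, d)                  # t(i, j): dendrogram distances
    # Eq. S3: Pearson correlation between the two sets of pairwise distances.
    dd, tt = d - d.mean(), t - t.mean()
    c_explicit = (dd * tt).sum() / np.sqrt((dd ** 2).sum() * (tt ** 2).sum())
    return c_explicit, c_scipy
```

Both values use the same condensed pair ordering produced by `pdist`, so they agree up to floating-point error.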
What Are the Principal Components Made of?
As we already noted, Fig. S4 shows that the first eigenvalues are always well separated during the first decades of the 20th century, showing that persistent patterns in the states’ interrelations survive over the years. We also noted that the situation is less clear at the end of the 20th century, because in that period there is no clear separation between the first two eigenvalues. Here we discuss this point further. In each year, we considered the first two eigenvectors of the correlation matrix and assigned a color to each state, corresponding to its position in the plane spanned by $u^1$ and $u^2$, i.e., the eigenvectors corresponding to the two largest eigenvalues. Although persistent patterns are clear in the first decades, indicating a separation between northern and southern states, the situation changes at the end of the 20th century, when states start to form two main groups, one formed by the coastal states and the other by the central states (a better visualization of this transition can be obtained from Movies S1–S4, which show data for all of the available years). To better understand what happened in the transition years, we studied how the eigenvectors changed over the years. Fig. S4 suggests choosing, as a reference, the eigenvectors of the correlation matrix in 1950, because it is the year with the largest difference $\lambda_1 - \lambda_2$ between the two largest eigenvalues. We then studied how the first three eigenvectors of the correlation matrix in the year y are related to those of 1950. For each year we evaluated the projections of the kth eigenvector on the jth 1950 eigenvector,
$$p_{kj}(y) = \sum_{i=1}^{M} u_i^k(y) \, u_i^j(1950). \tag{S4}$$
The first eigenvector turns out to be rather stable until the 1990s, when it becomes mostly a superposition of the second and third 1950 eigenvectors. Conversely, after this transition, the second eigenvector is basically replaced by the first 1950 eigenvector, as can be seen in Figs. S7 and S8.
These figures provide a clear interpretation of the evolution of the correlations between states: for many decades of the 20th century there was a stable configuration of groups (clusters) of states that eventually broke down at the end of the century. We do not further investigate the reasons behind this change. Looking at the distribution of baby names allows a sensible and realistic clustering of states to emerge, and this clustering is related to cultural influences. The deep motivations behind this change deserve to be studied by sociologists, perhaps aided by further quantitative analyses of large datasets.
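The comparison with the 1950 reference eigenvectors can be sketched as a small matrix of overlaps between the top eigenvectors of two years. As an assumption of this sketch, the projections are squared, which removes the arbitrary sign of each eigenvector and makes each row sum to at most one, so a row can be read as "how much of this year's eigenvector is made of the reference eigenvectors". The function names are hypothetical.

```python
import numpy as np

def top_eigenvectors(C, k=3):
    """Unit-norm eigenvectors of the k largest eigenvalues of C, as columns."""
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def projections_on_reference(C_year, C_ref, k=3):
    """P[a, b] = (u^a(year) . u^b(reference))^2 for the top k eigenvectors.

    Squaring removes the arbitrary eigenvector signs; since the reference
    eigenvectors are orthonormal, each row sums to at most 1.
    """
    U = top_eigenvectors(C_year, k)
    V = top_eigenvectors(C_ref, k)
    return (U.T @ V) ** 2

# Usage sketch with hypothetical yearly matrices C_by_year[y]:
# P = projections_on_reference(C_by_year[1995], C_by_year[1950])
# P[0] then shows how the 1995 first eigenvector decomposes on the 1950 ones.
```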
Fig. S7.

Fig. S8.

Results
Both of our algorithms show a clear division of the states into a number of homogeneous groups. A group is qualitatively defined as a set of states that share some level of similarity in their distributions of names, and it is natural to associate each group with a common cultural area. In Fig. 3, states in the same group are assigned similar colors. This group structure is robust over timescales of the order of a few years, so it is worth looking at its evolution over longer timescales. At the beginning of the 20th century, states were divided into a group of northern states and a group of southern ones, and this separation remained stable across many years. This structure suddenly broke down in the last decades of the 20th century, and a new configuration of groups emerged. The evolution of these groups of states is clear in Fig. 3.
Fig. 3.

In the new stable configuration that emerges at the end of the 20th century, some states of the Atlantic and Pacific coasts share common features and belong to the same group, different from the group to which many of the central states belong. To better identify these groups of states, we used a hierarchical clustering algorithm. This method allows a precise and quantitative definition of the groups mentioned above and leads to the formation of clusters, identified by different colors in Fig. 4 (see also Fig. S6). A better visualization of this transition can be observed in Movies S1–S4. The two different methods give the same answer and reveal a very interesting social cross-fertilization. This approach is able to describe the emergence of clusters of states through the analysis of the mutual correlations of their newborns’ names, and it extracts interesting information on the evolution of these clusters (Figs. S7 and S8).
Fig. 4.

Discussion
The study of name distributions at the state level enabled us to avoid the effects of strong fluctuations on smaller scales due to local socioeconomic factors, such as economic segregation or ethnicity (39), and to capture macroscopic changes in the structure of mutual correlations. We do not discuss here the origin of these correlations or the mechanisms by which names are given to newborn babies; some steps in this direction have been taken in refs. 10 and 24. We also do not study why the clusters reorganize in the last decades of the 20th century, compared with the relatively stable situation of the first half of the century. These are two very interesting issues that deserve more specific study. Irregularities in the time-lagged (retarded) cross-correlations between the total distributions of American names were found in ref. 10 to appear in the 1970s; this effect is very probably connected to the reorganization of the clustering of the states that we unveil here. As suggested by the authors of ref. 10, this effect is probably due to the deep cultural transformation that occurred in the United States after the Vietnam War. The same authors also proposed a model for the generation of names, showing, very interestingly, that in recent years the inequality between names has been decreasing. The decrease over time of the probability that two individuals chosen at random from a set of newborn babies share the same name (and the corresponding increase in the diversity of names) is shown in ref. 9 to be related to the shorter lifetime of names, whereas the authors of ref. 10 explained how the weakening of social influence could also be a relevant factor.
This effect is very closely related to the increase in innovation found by Bentley and Ormerod (24), who detected a steep increase in the coefficient that regulates the amount of invention of new names, compared with the number of names copied from previous generations. One possible explanation of this fact is the much deeper interconnectedness of society in recent decades, compared with that of earlier decades of the century, whose cultural sources were mainly the mass media. Much more could be said on this, but it would take us far from the main aims of this work. We mention only that our analysis can be adapted to capture which states influence a given state and which states are influenced by it. This amounts to identifying a directed network, which may be relevant for studying time correlations and culture propagation. In the Internet era one can obtain large, statistically rich datasets on an extremely large number of behaviors and cultural traits: applying our approach to the combined ensemble of such abundant datasets should allow one to develop a precise quantitative understanding of how cultural influences function and evolve.
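The probability that two newborns chosen at random share the same name, discussed above in connection with refs. 9 and 10, can be computed directly from the name counts of one year. This is a minimal sketch under one stated assumption: the two babies are drawn without replacement, giving $\sum_q n_q (n_q - 1) / [T (T - 1)]$, where T is the total number of babies. The function name is hypothetical; drawing with replacement would instead give $\sum_q (n_q / T)^2$.

```python
import numpy as np

def same_name_probability(counts):
    """Probability that two distinct babies drawn at random share a name.

    counts[q] is the number of babies given name q. Drawing without
    replacement, the probability is sum_q n_q (n_q - 1) / (T (T - 1)),
    where T is the total number of babies (assumed to be at least 2).
    """
    n = np.asarray(counts, dtype=float)
    total = n.sum()
    return float((n * (n - 1)).sum() / (total * (total - 1)))

# Usage: a decreasing value over the years signals growing name diversity.
p = same_name_probability([2, 1, 1])
```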
Acknowledgments
The research leading to these results has received funding from the Italian Research Ministry through the Futuro in Ricerca Project RBFR086NN1.
Supporting Information
Supporting Information (PDF), 623.96 KB
pnas.1507143112.sm01.mov, 15.54 MB
pnas.1507143112.sm02.mov, 15.74 MB
pnas.1507143112.sm03.mov, 12.72 MB
pnas.1507143112.sm04.mov, 12.86 MB
References
1
MW Feldman, LL Cavalli-Sforza Cultural Transmission and Evolution: A Quantitative Approach (Princeton Univ Press, Princeton, NJ) Vol 16 (1981).
2
R Boyd, PJ Richerson Culture and the Evolutionary Process (Univ of Chicago Press, Chicago, 1985).
3
A Mesoudi Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences (Univ of Chicago Press, Chicago, 2011).
4
BD Belleau, Cyclical fashion movement: Women’s day dresses: 1860-1980. Cloth Text Res J 5, 15–20 (1987).
5
RA Bentley, CP Lipo, HA Herzog, MW Hahn, Regular rates of popular culture change reflect random copying. Evol Hum Behav 28, 151–158 (2007).
6
HA Herzog, RA Bentley, MW Hahn, Random drift and large shifts in popularity of dog breeds. Proc Biol Sci 271, S353–S356 (2004).
7
FD Neiman, Stylistic variation in evolutionary perspective: Inferences from decorative diversity and interassemblage distance in Illinois woodland ceramic assemblages. Am Antiq •••, 7–36 (1995).
8
RA Bentley, Random drift versus selection in academic vocabulary: An evolutionary analysis of published keywords. PLoS ONE 3, e3057 (2008).
9
MJ Krawczyk, A Dydejczyk, K Kulakowski, The Simmel effect and babies’ names. Physica A Stat Mech Appl 395, 384–391 (2014).
10
N Xi, et al., Cultural evolution: The case of babies’ first names. Physica A Stat Mech Appl 406, 139–144 (2014).
11
J Berger, G Le Mens, How adoption speed affects the abandonment of cultural tastes. Proc Natl Acad Sci USA 106, 8146–8150 (2009).
12
RL Goldstone, TM Gureckis, Collective behavior. Top Cogn Sci 1, 412–438 (2009).
13
J Berger, ET Bradlow, A Braunstein, Y Zhang, From Karen to Katie: Using baby names to understand cultural evolution. Psychol Sci 23, 1067–1073 (2012).
14
HJ Jensen Self-Organized Criticality: Emergent Complex Behavior in Physical and Biological Systems (Cambridge Univ Press, Cambridge, UK) Vol 10 (1998).
15
R Albert, AL Barabási, Statistical mechanics of complex networks. Rev Mod Phys 74, 47 (2002).
16
MW Feldman, LL Cavalli-Sforza, Further remarks on Darwinian selection and “altruism”. Theor Popul Biol 19, 251–260 (1981).
17
DH Zanette, Dynamics of fashion: The case of given names. arXiv:1208.0576. (2012).
18
A Acerbi, S Ghirlanda, M Enquist, The logic of fashion cycles. PLoS ONE 7, e32541 (2012).
19
JP Onnela, F Reed-Tsochas, Spontaneous emergence of social influence in online systems. Proc Natl Acad Sci USA 107, 18375–18380 (2010).
20
JP Gleeson, D Cellai, JP Onnela, MA Porter, F Reed-Tsochas, A simple generative model of collective online behavior. Proc Natl Acad Sci USA 111, 10411–10415 (2014).
21
L Wu, A Flammini, A Vespignani, F Menczer, Competition among memes in a world with limited attention. Sci Rep 2, 335 (2012).
22
F Wu, BA Huberman, Novelty and collective attention. Proc Natl Acad Sci USA 104, 17599–17601 (2007).
23
D Wang, C Song, AL Barabási, Quantifying long-term scientific impact. Science 342, 127–132 (2013).
24
RA Bentley, P Ormerod, Accelerated innovation and increased spatial diversity of US popular culture. Adv Complex Syst 15, 1150011 (2012).
25
G Simmel, Fashion. Am J Sociol 62, 541–558 (1957).
26
MW Hahn, RA Bentley, Drift as a mechanism for cultural change: An example from baby names. Proc Biol Sci 270, S120–S123 (2003).
27
US Social Security Administration, The Official Website of the US Social Security Administration. Available at www.ssa.gov.
28
W Li, Analyses of baby name popularity distribution in US for the last 131 years. Complexity 18, 44–50 (2012).
29
O Sporns, DR Chialvo, M Kaiser, CC Hilgetag, Organization, development and function of complex brain networks. Trends Cogn Sci 8, 418–425 (2004).
30
F Luo, J Zhong, Y Yang, RH Scheuermann, J Zhou, Application of random matrix theory to biological networks. Phys Lett A 357, 420–423 (2006).
31
F Luo, et al., Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory. BMC Bioinformatics 8, 299 (2007).
32
S Jalan, N Solymosi, G Vattay, B Li, Random matrix analysis of localization properties of gene coexpression network. Phys Rev E Stat Nonlin Soft Matter Phys 81, 046118 (2010).
33
L Laloux, P Cizeau, JP Bouchaud, M Potters, Noise dressing of financial correlation matrices. Phys Rev Lett 83, 1467 (1999).
34
V Plerou, P Gopikrishnan, B Rosenow, LAN Amaral, HE Stanley, Universal and non-universal properties of cross correlations in financial time series. Phys Rev Lett 83, 1471 (1999).
35
V Plerou, et al., Random matrix approach to cross correlations in financial data. Phys Rev E Stat Nonlin Soft Matter Phys 65, 066126 (2002).
36
T Conlon, HJ Ruskin, M Crane, Cross-correlation dynamics in financial time series. Physica A Stat Mech Appl 388, 705–714 (2009).
37
M Barthélemy, B Gondran, E Guichard, Large scale cross-correlations in Internet traffic. Phys Rev E Stat Nonlin Soft Matter Phys 66, 056110 (2002).
38
MS Santhanam, PK Patra, Statistics of atmospheric correlations. Phys Rev E Stat Nonlin Soft Matter Phys 64, 016102 (2001).
39
RG Fryer, SD Levitt, The causes and consequences of distinctively Black names. Q J Econ 119, 767–805 (2004).
40
A Edelman, Eigenvalues and condition numbers of random matrices. SIAM J Matrix Anal Appl 9, 543–560 (1988).
41
P Cizeau, JP Bouchaud, Theory of Lévy matrices. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 50, 1810–1822 (1994).
42
IM Johnstone, High dimensional statistical inference and random matrices. Proceedings of the International Congress of Mathematicians, Madrid, August 22–30, 2006, eds Sanz-Solé M, Soria J, Varona JL, Verdera J (EMS Publishing House, Zürich), pp 307–333 (2007).
Submission history
Published online: June 11, 2015
Published in issue: June 30, 2015
Acknowledgments
The research leading to these results has received funding from the Italian Research Ministry through the Futuro in Ricerca Project RBFR086NN1.
Competing Interests
The authors declare no conflict of interest.
Cite this article
Cross-correlations of American baby names, Proc. Natl. Acad. Sci. U.S.A. 112 (26) 7943–7947 (2015). https://doi.org/10.1073/pnas.1507143112