# Persistence and uncertainty in the academic career

^{a}Laboratory for the Analysis of Complex Economic Systems, Institutions Markets Technologies (IMT) Lucca Institute for Advanced Studies, 55100 Lucca, Italy;^{b}Laboratory of Innovation Management and Economics, IMT Lucca Institute for Advanced Studies, 55100 Lucca, Italy;^{c}Crisis Lab, IMT Lucca Institute for Advanced Studies, 55100 Lucca, Italy;^{d}Department of Managerial Economics, Strategy and Innovation, Katholieke Universiteit Leuven, 3000 Leuven, Belgium; and^{e}Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215

Contributed by H. Eugene Stanley, December 28, 2011 (sent for review December 14, 2011)

## Abstract

Understanding how institutional changes within academia may affect the overall potential of science requires a better quantitative representation of how careers evolve over time. Because knowledge spillovers, cumulative advantage, competition, and collaboration are distinctive features of the academic profession, both the employment relationship and the procedures for assigning recognition and allocating funding should be designed to account for these factors. We study the annual production *n*_{i}(*t*) of a given scientist *i* by analyzing longitudinal career data for 200 leading scientists and 100 assistant professors from the physics community. Our empirical analysis of individual productivity dynamics shows that (*i*) there are increasing returns for the top individuals within the competitive cohort, and that (*ii*) the distribution of production growth is a leptokurtic “tent-shaped” distribution that is remarkably symmetric. Our methodology is general, and we speculate that similar features appear in other disciplines where academic publication is essential and collaboration is a key feature. We introduce a model of proportional growth that reproduces these two observations and additionally accounts for the significantly right-skewed distributions of career longevity and achievement in science. Using this theoretical model, we show that short-term contracts can amplify the effects of competition and uncertainty, making careers more vulnerable to early termination, not necessarily due to lack of individual talent and persistence, but because of random negative production shocks. We show that fluctuations in scientific production are quantitatively related to a scientist’s collaboration radius and team efficiency.

Institutional change could alter the relationship between science and scientists as well as the longstanding patronage system in academia (1, 2). Some recent shifts in academia include the changing business structure of research universities (3), shifts in the labor supply-demand balance (4), a bottleneck in the number of tenure-track positions (5), and a related policy shift away from long-term contracts (3, 6). Along these lines, significant factors to consider are the widening range of research team sizes (7), the economic organization required to fund and review collaborative research projects, and the evolving definition of the role of the academic research professor (3).

The role of individual performance metrics in career appraisal, in domains as diverse as sports (8, 9), finance (10, 11), and academia, is increasing in this data-rich age. In the case of academia, as the typical size of scientific collaborations increases (7), the allocation of funding and the assignment of recognition at the varying scales of science [individual ⇆ group ⇆ institution (12)] have become more complex. Indeed, scientific achievement is becoming increasingly linked to online visibility in a considerable reputation tournament (13).

Here we seek to identify (*i*) quantitative patterns in the scientific career trajectory towards a better understanding of career dynamics and achievement (14–20), and (*ii*) how scientific production responds to policies concerning contract length. Using rich productivity data available at the level of single individuals, we analyze longitudinal career data keeping in mind the roles of spillovers, group size, and career sustainability. Although our empirical analysis is limited to careers in physics, our approach is general. We speculate that similar features describe other disciplines where academic publication is a primary indicator and collaboration is a key feature.

Specifically, we analyze production data for 300 physicists *i* = 1…300 who are distributed into 3 groups: (*i*) Group A corresponds to the 100 most cited physicists with average *h*-index , (*ii*) Group B corresponds to 100 additional highly cited physicists with , and (*iii*) Group C corresponds to 100 assistant professors in 50 US physics departments with . We define the annual production *n*_{i}(*t*) as the number of papers published by scientist *i* in year *t* of his/her career. We focus on academic careers from the physics community to approximately control for significant cross-disciplinary production variations. Using the same set of scientists, a companion study has analyzed the rank-ordered citation distribution of each scientist with a focus on the statistical regularities underlying publication impact (17). We provide further description of the data and present a parallel analysis of 21,156 sports careers in *SI Appendix*.

We begin this paper with an empirical analysis of longitudinal career data. Our empirical results serve as statistical benchmarks for the final section, where we develop a stochastic proportional growth model. In particular, our model shows that a short-term appraisal system can result in a significant number of “sudden” early deaths due to unavoidable negative production shocks. This result is consistent with a Matthew effect model (16) and a recent survival analysis of academic careers (21), which demonstrate how young careers can be stymied by the difficulty of overcoming early achievement barriers. Altogether, our results indicate that short-term contracts may increase the strength of the “rich-get-richer” mechanism in science (22, 23) and may hinder the upward mobility of young scientists.

## Results

### Scientific Production and the Career Trajectory.

The academic career depends on many factors, such as cumulative advantage (16, 19, 22, 23), the “sacred spark” (24, 25), and other complex aspects of knowledge transfer manifest in our techno-social world (26). To exemplify this complexity, a recent case study on the impact trajectories of Nobel Prize winners shows that “scientific career shocks” marked by the publication of an individual’s “magnum opus” work(s) can trigger future recognition and reward, resembling the cascading dynamics of earthquakes (27).

We model the career trajectory as a sequence of scientific outputs that arrive at the variable rate *n*_{i}(*t*). Because the reputation of a scientist is typically a cumulative representation of his/her contributions, we consider the cumulative production *N*_{i}(*t*) as a proxy for career achievement. Fig. 1*A* shows the cumulative production *N*_{i}(*t*) of six notable careers, which display a temporal scaling relation $N_i(t) \approx A_i\, t^{\alpha_i}$, where *α*_{i} is a scaling exponent that quantifies the career trajectory dynamics. The average and standard deviation of the *α*_{i} values calculated for each dataset are [A], 1.44 ± 0.26 [B], and 1.30 ± 0.31 [C]. We justify this two-parameter model in the *SI Appendix* text using scaling methods and data collapse.
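As a hedged illustration of how an exponent *α*_{i} could be estimated for a single career, the sketch below assumes the two-parameter power-law form $N_i(t) \approx A_i\, t^{\alpha_i}$ (with $A_i$ a career-specific prefactor) and fits a straight line to ln *N*_{i}(*t*) versus ln *t* by ordinary least squares. The function name `fit_career_exponent`, the handling of zero output, and the synthetic career are illustrative assumptions; this is not the scaling and data-collapse procedure described in the *SI Appendix*.

```python
import numpy as np

def fit_career_exponent(annual_output):
    """Estimate (A_i, alpha_i) in N_i(t) ~ A_i * t**alpha_i by OLS on log-log axes.

    annual_output: n_i(1), ..., n_i(L_i), papers per career year.
    Assumption: ages with zero cumulative output are dropped because ln(0) is undefined.
    """
    n = np.asarray(annual_output, dtype=float)
    t = np.arange(1, len(n) + 1)            # career age t = 1, ..., L_i
    N = np.cumsum(n)                         # cumulative production N_i(t)
    keep = N > 0
    alpha, log_A = np.polyfit(np.log(t[keep]), np.log(N[keep]), 1)
    return np.exp(log_A), alpha

# Synthetic check: a career built so that N(t) = 2 * t**1.4 exactly.
t = np.arange(1, 31)
N_true = 2.0 * t**1.4
n_synthetic = np.diff(N_true, prepend=0.0)   # annual output n(t) = N(t) - N(t-1)
A_hat, alpha_hat = fit_career_exponent(n_synthetic)
print(f"A = {A_hat:.2f}, alpha = {alpha_hat:.2f}")  # prints "A = 2.00, alpha = 1.40"
```

An exponent above 1, as in this toy career, corresponds to the accelerating growth discussed in the text.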

There are also numerous cases of *N*_{i}(*t*) which do not exhibit such regularity (see *SI Appendix: Fig. S1*), but instead display marked nonstationarity and nonlinearity arising from significant exogenous career shocks. Positive shocks, possibly corresponding to just a single discovery, can spur significant productivity and reputation growth (24, 27). Negative shocks, such as in the case of scientific fraud, can end the career rather suddenly. We also acknowledge that the end of the career is a difficult phase to analyze, because such an event can occur quite abruptly, and so our analysis is mainly concerned with the growth phase and not the termination phase.

In order to analyze the average properties of *N*_{i}(*t*) for all 300 scientists in our sample, we define the normalized trajectory $N'_i(t) \equiv N_i(t)/\langle n_i \rangle$. The quantity $\langle n_i \rangle \equiv N_i(L_i)/L_i$ is the average annual production of author *i*, with $N'_i(L_i) = L_i$ by construction (*L*_{i} corresponds to the career length of individual *i*). Fig. 1*B* shows the characteristic production trajectory obtained by averaging together the 100 $N'_i(t)$ belonging to each dataset,

$$\langle N'(t) \rangle \equiv \frac{1}{100} \sum_{i=1}^{100} N'_i(t). \tag{1}$$

The standard deviation *σ*(*N*^{′}(*t*)) shown in *SI Appendix: Fig. S2 B* begins to decrease after roughly 20 y for dataset [A] and [B] scientists. Over this horizon, the stochastic arrival of career shocks can significantly alter the career trajectory (20, 24, 27, 28).

Each $\langle N'(t) \rangle$ exhibits robust scaling corresponding to the scaling law $\langle N'(t) \rangle \sim t^{\alpha}$. This regularity reflects the abundance of careers with *α*_{i} > 1 corresponding to accelerated career growth. This acceleration is consistent with increasing returns arising from knowledge and production spillovers.
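The normalization and averaging steps can be sketched as follows, assuming $N'_i(t) = N_i(t)/\langle n_i \rangle$ with $\langle n_i \rangle = N_i(L_i)/L_i$. Careers have different lengths, and how the original analysis handles that is not stated here; the sketch simply keeps careers with *L*_{i} at least as long as the averaging horizon, which is an assumption. Function names are hypothetical.

```python
import numpy as np

def normalized_trajectory(annual_output):
    """N'_i(t) = N_i(t) / <n_i>, where <n_i> = N_i(L_i) / L_i, so that N'_i(L_i) = L_i."""
    n = np.asarray(annual_output, dtype=float)
    N = np.cumsum(n)                  # cumulative production N_i(t)
    mean_rate = N[-1] / len(n)        # average annual production <n_i>
    return N / mean_rate

def average_trajectory(careers, horizon):
    """<N'(t)> for t = 1..horizon, averaged over careers with L_i >= horizon.

    Assumption (not from the paper): shorter careers are excluded so that every
    age is averaged over the same set of individuals.
    """
    rows = [normalized_trajectory(c)[:horizon] for c in careers if len(c) >= horizon]
    return np.mean(rows, axis=0)

# Toy cohort: one steady career and one accelerating career.
careers = [[2, 2, 2, 2], [1, 2, 3, 4]]
print(normalized_trajectory(careers[1]))   # accelerating career lies below the diagonal N' = t early on
print(average_trajectory(careers, 4))
```

Because every normalized trajectory ends at $N'_i(L_i) = L_i$, a steady career follows the diagonal $N' = t$, while an accelerating career (*α*_{i} > 1) bends below it at early ages.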

### Fluctuations in Scientific Output over the Academic Career.

Individuals are constantly entering and exiting the professional market, with birth and death rates depending on complex economic and institutional factors. Due to competition, decisions and performance at the early stages of the career can have long-lasting consequences (16, 29). To better understand the career uncertainty captured by the common saying “publish or perish” (30), we analyze the outcome fluctuation

$$r_i(t) \equiv n_i(t + \Delta t) - n_i(t) \tag{2}$$

of career *i* in year *t* over the time interval Δ*t* = 1 y. Fig. 2 *A* and *B* show the unconditional probability density function (pdf) of *r* values, which is leptokurtic but remarkably symmetric, illustrating the endogenous frequencies of positive and negative output growth. Output fluctuations arise naturally from the lulls and bursts in both the mental and physical capabilities of humans (31, 32). Moreover, the annual production change distribution bears a striking statistical resemblance to the growth rate distribution of countries, firms, and universities (33, 34).
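A minimal sketch of how the pooled changes behind the unconditional pdf *P*(*r*) could be computed and summarized, assuming the fluctuation is the one-year change $r_i(t) = n_i(t+1) - n_i(t)$. Sample skewness (near 0 for a symmetric pdf) and excess kurtosis (0 for a Gaussian, positive for a leptokurtic pdf) are used here as simple numerical stand-ins for the visual inspection of Fig. 2; the careers and function names are made up for illustration.

```python
import numpy as np

def production_changes(annual_output, dt=1):
    """r_i(t) = n_i(t + dt) - n_i(t) for one career; dt is a positive number of years."""
    n = np.asarray(annual_output, dtype=float)
    return n[dt:] - n[:-dt]

def skew_and_excess_kurtosis(x):
    """Sample skewness (0 if symmetric) and excess kurtosis (0 for a Gaussian, > 0 if leptokurtic)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**3), np.mean(z**4) - 3.0

# Pool the one-year changes of several (made-up) careers, as for the unconditional pdf P(r).
careers = [[3, 5, 4, 8, 6], [10, 12, 9, 15], [1, 0, 2, 1, 3]]
pooled_r = np.concatenate([production_changes(c) for c in careers])
print(pooled_r)
print(skew_and_excess_kurtosis(pooled_r))
```

With real data one would pool all careers in a dataset; a tent-shaped (Laplace-like) pdf has excess kurtosis near 3.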

To better account for individual growth factors, we next define the normalized production change

$$r'_i(t) \equiv \frac{r_i(t) - \langle r_i \rangle}{\sigma_i(r)}, \tag{3}$$

which is measured in units of the fluctuation scale *σ*_{i}(*r*) unique to each career. We measure the average $\langle r_i \rangle$ and the standard deviation *σ*_{i}(*r*) of each career using the first *L*_{i} available years for each scientist *i*. The normalized change $r'_i(t)$ is a better measure for comparing career uncertainty, because individuals have production factors that depend on the type of research, the size of the collaboration team, and the position within the team. Fig. 2*C* shows that *P*(*r*^{′}), the pdf of *r*^{′} measured in units of standard deviation, is well approximated by a Gaussian distribution with unit variance. The data collapse of each *P*(*r*^{′}) onto the predicted Gaussian distribution (solid green curve) indicates that individual output fluctuations are consistent with a proportional growth model. We note that the remaining deviations in the tails for |*r*^{′}| ≥ 3 are likely signatures of exogenous career shocks that are not accounted for by an endogenous proportional growth model.
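To see why per-career normalization matters, the hedged sketch below builds a hypothetical cohort in which every career fluctuates as a Gaussian but with very different scales. Pooling the raw changes *r* yields a strongly leptokurtic distribution, whereas pooling $r' = (r - \langle r_i \rangle)/\sigma_i(r)$ gives an approximately Gaussian one. The toy cohort (Gaussian output around 20 papers per year, exponentially distributed scales) is an assumption for illustration only, not a model of the data.

```python
import numpy as np

def normalized_changes(annual_output):
    """r'_i(t) = (r_i(t) - <r_i>) / sigma_i(r), with both moments taken over the whole career."""
    r = np.diff(np.asarray(annual_output, dtype=float))   # r_i(t) = n_i(t+1) - n_i(t)
    return (r - r.mean()) / r.std()

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

# Hypothetical cohort: Gaussian careers whose fluctuation scales differ widely.
rng = np.random.default_rng(0)
scales = rng.exponential(1.0, size=2000)
careers = [20.0 + s * rng.standard_normal(40) for s in scales]

raw = np.concatenate([np.diff(c) for c in careers])
normalized = np.concatenate([normalized_changes(c) for c in careers])
print("raw r: excess kurtosis", round(excess_kurtosis(raw), 2))
print("r':    excess kurtosis", round(excess_kurtosis(normalized), 2))
```

The heavy tails of the raw pooled changes here come entirely from mixing heterogeneous scales, which is the mechanism formalized next.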

The ability to collaborate on large projects, both in close working teams and, in extreme examples, as remote agents [e.g., Wikipedia (35)], is one of the foremost properties of human society. In science, the ability to attract future opportunities is strongly related to production and knowledge spillovers (28, 36, 37) that are facilitated by the collaboration network (7, 12, 38–42). Indeed, there is a tipping point in a scientific career that occurs when a scientist’s knowledge investment reaches a critical mass that can sustain production over a long horizon, and when a scientist becomes an attractor (as opposed to a pursuer) of new collaboration/production opportunities. To account for collaboration, we calculate for each author the number *k*_{i}(*t*) of distinct coauthors per year and then define his/her collaboration radius *S*_{i} as the median of the set of his/her *k*_{i}(*t*) values, *S*_{i} ≡ *Med*[*k*_{i}(*t*)]. We use the median instead of the average because extremely large *k*_{i}(*t*) values can occur in specific fields such as high-energy physics and astronomy.
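A minimal sketch of the collaboration radius *S*_{i} = *Med*[*k*_{i}(*t*)], assuming (hypothetically) that each career year is given as a list of already-disambiguated coauthor identifiers. The example includes one very large collaboration year to show why the median is preferred to the mean.

```python
import numpy as np

def collaboration_radius(coauthors_by_year):
    """Return S_i = Med[k_i(t)] and the list of k_i(t).

    coauthors_by_year: one list of coauthor identifiers per career year
    (hypothetical input format; assumes names are already disambiguated).
    """
    k = [len(set(names)) for names in coauthors_by_year]   # distinct coauthors per year
    return float(np.median(k)), k

years = [
    ["A", "B"],
    ["A", "B", "B", "C"],           # "B" appears on two papers but is counted once
    ["D"],
    [f"x{j}" for j in range(500)],  # one large-collaboration year
]
S, k = collaboration_radius(years)
print(k, S, np.mean(k))   # median 2.5 versus mean 126.5
```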

Given the complex scientific coauthorship network, we ask the question: what is the typical number of unique coauthors per year? Fig. 2*D* shows the cumulative distribution function *CDF*(*S*_{i}) of *S*_{i} values for each dataset. The approximately linear form on log-linear axes indicates that *S*_{i} is exponentially distributed, *P*(*S*_{i}) ∼ exp[-*λS*_{i}]. We calculate *λ* = 0.15 ± 0.01 [A], *λ* = 0.11 ± 0.01 [B], and *λ* = 0.11 ± 0.01 [C]. The exponential size distribution has been shown to emerge in complex systems where linear preferential attachment governs the acquisition of new opportunities (43). This result shows that the leptokurtic “tent-shaped” distribution *P*(*r*) in Fig. 2 follows from the exponential mixing of heterogeneous conditional Gaussian distributions (44).
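Two simple ways to estimate *λ* are sketched below under stated assumptions. For an exponential distribution the complementary CDF is $P(S \ge s) = e^{-\lambda s}$, so the logarithm of the empirical complementary CDF is linear in *s* with slope −*λ* (the log-linear reading of Fig. 2*D*); alternatively, the maximum-likelihood estimate for an exponential supported on [0, ∞) is $\hat\lambda = 1/\bar S$. Which estimator produced the reported values is not stated here, and the synthetic sample is for illustration only.

```python
import numpy as np

def lambda_from_ccdf(S):
    """Fit ln P(S >= s) = -lambda * s + const on log-linear axes (OLS slope)."""
    s = np.sort(np.asarray(S, dtype=float))
    ccdf = 1.0 - np.arange(len(s)) / len(s)     # empirical P(S >= s_j), in (0, 1]
    slope, _ = np.polyfit(s, np.log(ccdf), 1)
    return -slope

def lambda_mle(S):
    """Maximum-likelihood estimate for an exponential with support [0, inf)."""
    return 1.0 / np.mean(S)

# Synthetic collaboration radii drawn with lambda = 0.15 (illustrative only).
rng = np.random.default_rng(42)
S = rng.exponential(scale=1 / 0.15, size=5000)
print(lambda_from_ccdf(S), lambda_mle(S))
```

Real *S*_{i} values are medians of integer counts, so they are discrete; the continuous model is an approximation in that case.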

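The exponential mixture of Gaussians decomposes the unconditional distribution *P*(*r*) into a mixture of conditional Gaussian distributions

$$P(r \mid \sigma_i) = \frac{1}{\sqrt{2\pi\sigma_i^{2}}}\exp\!\left(-\frac{r^{2}}{2\sigma_i^{2}}\right), \tag{4}$$

each with a fluctuation scale *σ*_{i}(*r*) depending on *S*_{i} by the scaling relation

$$\sigma_i(r) \sim S_i^{\psi/2}. \tag{5}$$

Hence, the mixture is parameterized by *ψ*:

$$P_\psi(r) = \int_0^\infty \lambda e^{-\lambda S}\, P\big(r \mid \sigma(S)\big)\, dS, \qquad \sigma(S) \propto S^{\psi/2}. \tag{6}$$

The independent case *ψ* = 0 results in a Gaussian *P*_{ψ}(*r*) and the linear case *ψ* = 1 results in a Laplace (double-exponential) *P*_{ψ}(*r*). See *SI Appendix* and ref. 44 for further discussion of the *ψ* dependence of *P*_{ψ}(*r*).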
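The mixture can be checked by direct sampling. The sketch assumes *S* ∼ Exp(*λ*) and a conditional Gaussian with zero mean and scale $\sigma(S) = S^{\psi/2}$ (unit prefactor, an assumption; kurtosis does not depend on it). Analytically, the excess kurtosis is 0 for *ψ* = 0 (Gaussian) and 3 for *ψ* = 1 (Laplace), with intermediate values in between; the sample estimates should land close to these.

```python
import numpy as np

def sample_mixture(psi, lam=0.15, size=200_000, seed=0):
    """Draw r from P_psi(r): S ~ Exp(lam), then r | S ~ N(0, sigma(S)^2) with sigma = S**(psi/2)."""
    rng = np.random.default_rng(seed)
    S = rng.exponential(scale=1.0 / lam, size=size)
    sigma = S ** (psi / 2.0)
    return rng.normal(0.0, sigma)

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

for psi in (0.0, 0.5, 1.0):
    print(psi, round(excess_kurtosis(sample_mixture(psi)), 2))
```

Only the variance-scaling exponent *ψ* changes between runs, which is why *ψ* alone controls how tent-shaped *P*_{ψ}(*r*) becomes.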

### The Size-Variance Relation and Group Efficiency.

The values of *ψ* for scientific and athletic careers reflect the different combinations of physical and intellectual inputs that enter the production functions of the two professions. Academic knowledge is typically a nonrival good, and so knowledge-intensive professions are characterized by spillovers, both over time and across collaborations (36, 37), consistent with *α*_{i} > 1 and *ψ* > 0. Interestingly, Azoulay et al. show evidence for production spillovers in the 5–8% decrease in output by scientists who were close collaborators of “superstar” scientists who died suddenly (28).

We now formalize the quantitative link between scientific collaboration (38, 39) and career growth given by the size-variance scaling relation in Eq. **5**, visualized in the scatter plot in Fig. 3*B*. Using ordinary least squares (OLS) regression of the data on log-log scale, we calculate *ψ*/2 ≈ 0.40 ± 0.03 (*R* = 0.77) for dataset [A], *ψ*/2 ≈ 0.22 ± 0.04 (*R* = 0.51) [B], and *ψ*/2 ≈ 0.26 ± 0.05 (*R* = 0.45) [C]. Interdependent tasks that are characteristic of group collaborations typically involve partially overlapping efforts. Hence, the empirical *ψ* values are significantly less than the value *ψ* = 1 that one would expect from the sum of *S*_{i} independent random variables with approximately equal variance *V*. Collectively, these empirical results motivate the preferential capture growth model that we propose in the following section.
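The benchmark *ψ* = 1 can be illustrated numerically. In the hedged sketch below, each hypothetical "career" produces the sum of *S* independent unit-variance contributions, so its standard deviation is $\sqrt{S V}$ and an OLS fit of ln *σ* on ln *S* should return a slope *ψ*/2 ≈ 0.5. The function name and the Gaussian contributions are illustrative assumptions; *R* is computed here as the Pearson correlation of the log-transformed variables.

```python
import numpy as np

def size_variance_exponent(S, sigma):
    """OLS of ln sigma_i on ln S_i; the slope estimates psi/2 and R is the Pearson correlation."""
    x, y = np.log(S), np.log(sigma)
    slope, _ = np.polyfit(x, y, 1)
    R = np.corrcoef(x, y)[0, 1]
    return slope, R

# Benchmark: output is the sum of S independent unit-variance contributions.
rng = np.random.default_rng(0)
S_values = np.arange(1, 41)
sigma_values = np.array([rng.standard_normal((2000, S)).sum(axis=1).std() for S in S_values])
half_psi, R = size_variance_exponent(S_values, sigma_values)
print(f"psi/2 = {half_psi:.3f}, R = {R:.3f}")
```

The empirical slopes reported above (0.40, 0.22, 0.26) fall below this independent-contribution benchmark of 0.5.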

Alternatively, it is also possible to estimate *ψ* using the relation between the average annual production $\langle n_i \rangle$ and the collaboration radius *S*_{i}. The input-output relation $\langle n_i \rangle \sim S_i^{\psi}$ quantifies the collaboration efficiency, with *ψ* = 0.74 ± 0.04 (*R* = 0.87) for dataset [A] and *ψ* = 0.25 ± 0.04 (*R* = 0.37) for dataset [B]. If the autocorrelation between sequential production values *n*_{i}(*t*) and *n*_{i}(*t* + 1) is relatively small, then we expect the values of *ψ* estimated from *σ*_{i}(*r*) and from $\langle n_i \rangle$ to be approximately equal. This expectation follows from considering *r*_{i}(*t*) as the convolution of an underlying production distribution *P*_{i}(*n*) for each scientist that is approximately stable. Interestingly, the larger *ψ* values calculated for dataset [A] scientists suggest that prestige is related to increasing returns in the scientific production function (45).
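The convolution argument can be made concrete under an extra assumption not stated in the text: that each scientist's annual output is i.i.d. Poisson with mean $\langle n_i \rangle = S_i^{\psi}$. Then the variance of the one-year change is $2\langle n_i \rangle$, so $\sigma_i(r) = \sqrt{2}\, S_i^{\psi/2}$, and both regressions recover the same *ψ*. The sketch below simulates this hypothetical case; it illustrates the reasoning step and is not a fit to the data.

```python
import numpy as np

def estimate_psi_both_ways(psi_true=0.7, S_max=50, years=2000, seed=1):
    """Simulate i.i.d. Poisson output with mean S**psi_true (hypothetical assumption)
    and estimate psi from <n_i> ~ S**psi and from sigma_i(r) ~ S**(psi/2)."""
    rng = np.random.default_rng(seed)
    S_values = np.arange(1, S_max + 1)
    mean_n, sigma_r = [], []
    for S in S_values:
        n = rng.poisson(S ** psi_true, size=years)   # annual output n_i(t)
        mean_n.append(n.mean())
        sigma_r.append(np.diff(n).std())             # r_i(t) = n_i(t+1) - n_i(t)
    x = np.log(S_values)
    psi_from_mean, _ = np.polyfit(x, np.log(mean_n), 1)
    half_psi_from_sigma, _ = np.polyfit(x, np.log(sigma_r), 1)
    return psi_from_mean, 2.0 * half_psi_from_sigma

print(estimate_psi_both_ways())
```

With correlated years or over-dispersed output the two estimates need not coincide, which is one reading of the gap between the two estimates for dataset [B].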

Next we use an alternative method to estimate the annual collaboration efficiency by relating the number of publications *n*_{i}(*t*) in a given year to the number of distinct coauthors *k*_{i}(*t*) over the same year. We use a single-factor production function,

$$n_i(t) = q_i\, k_i(t)^{\gamma_i}, \tag{7}$$

to quantify the relation between output and labor inputs with a scaling exponent *γ*_{i}. We estimate *q*_{i} and *γ*_{i} for each author using OLS regression, and define the normalized output measure $Q_i(t) \equiv n_i(t)/\big[q_i\, k_i(t)^{\gamma_i}\big]$ using the best-fit *q*_{i} and *γ*_{i} values calculated for each scientist *i*. Fig. 3*C* shows the efficiency parameter *γ* calculated by aggregating all careers in each dataset, and indicates that this aggregate *γ* is approximately equal to the average calculated from the *γ*_{i} values in each career dataset: *γ* = 0.68 ± 0.01 [A], *γ* = 0.52 ± 0.01 [B], and *γ* = 0.51 ± 0.02 [C]. Furthermore, the *ψ* and *γ* values are approximately equal, which is not surprising, because both scaling exponents are efficiency measures that quantify how output *n*_{i}(*t*) scales with input *k*_{i}(*t*).
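A minimal per-author sketch of the production-function fit, assuming Eq. **7** is fitted as a straight line on log-log axes and that $Q_i(t) = n_i(t)/[q_i k_i(t)^{\gamma_i}]$. Years with *n*_{i}(*t*) = 0 or *k*_{i}(*t*) = 0 (e.g., single-author output) have no logarithm; excluding them and marking their *Q* as NaN is an assumption of this sketch, as is the function name.

```python
import numpy as np

def fit_production_function(n, k):
    """Fit n_i(t) = q_i * k_i(t)**gamma_i by OLS on log-log axes and return (q, gamma, Q).

    Assumption: years with n = 0 or k = 0 are excluded from the fit and get Q = NaN.
    """
    n = np.asarray(n, dtype=float)
    k = np.asarray(k, dtype=float)
    keep = (n > 0) & (k > 0)
    gamma, log_q = np.polyfit(np.log(k[keep]), np.log(n[keep]), 1)
    q = np.exp(log_q)
    Q = np.full(n.shape, np.nan)
    Q[keep] = n[keep] / (q * k[keep] ** gamma)   # normalized output Q_i(t)
    return q, gamma, Q

# Synthetic author whose output follows n = 1.5 * k**0.6 exactly.
k = np.array([2, 4, 8, 16, 32], dtype=float)
n = 1.5 * k**0.6
q, gamma, Q = fit_production_function(n, k)
print(q, gamma, Q)
```

A value *γ*_{i} < 1 means that doubling the number of coauthors less than doubles the author's annual output, the decreasing marginal return discussed later.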

### A Proportional Growth Model for Scientific Output.

We develop a stochastic model as a heuristic tool to better understand the effects of long-term vs. short-term contracts. In this competition model, opportunities (i.e., new scientific publications) are captured according to a general mechanism whereby the capture rate depends on the appraisal *w*_{i}(*t*) of an individual’s record of achievement over a prescribed history. We define the appraisal to be an exponentially weighted average over a given individual’s history of production,

$$w_i(t) \equiv \sum_{\Delta t = 0}^{t-1} n_i(t - \Delta t)\, e^{-c\,\Delta t}, \tag{8}$$

which is characterized by the appraisal horizon 1/*c*. We use the value *c* = 0 to represent a long-term appraisal (tenure) system and a value *c* ≫ 1 to represent a short-term appraisal system. Each agent *i* = 1…*I* simultaneously attracts new opportunities at a rate

$$g_i(t) \equiv \frac{w_i(t)^{\pi}}{\sum_{j=1}^{I} w_j(t)^{\pi}} \tag{9}$$

until all *P* opportunities for a given period *t* are captured. We assume that each agent has the production potential of one unit per period, and so the total number of opportunities distributed per period *P* is equal to the number of competing agents, *P* ≡ *I*.
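The appraisal and capture rule can be sketched as below, assuming the appraisal is the exponentially discounted sum $w_i(t) = \sum_{\Delta t=0}^{t-1} n_i(t-\Delta t)\, e^{-c\Delta t}$ and the capture rate is $w_i^{\pi}/\sum_j w_j^{\pi}$ (written here as $g_i$, a symbol introduced for illustration). Because every agent in a cohort has the same age, normalizing the weighted sum into an average would cancel in this ratio. The sketch checks that the direct sum equals the recursion $w(t) = e^{-c}\, w(t-1) + n(t)$, which is what makes a per-period update cheap; function names are hypothetical.

```python
import numpy as np

def appraisal_direct(n, c):
    """w(t) = sum_{dt=0}^{t-1} n(t - dt) * exp(-c * dt), evaluated for every career year."""
    n = np.asarray(n, dtype=float)
    return np.array([sum(n[t - dt] * np.exp(-c * dt) for dt in range(t + 1))
                     for t in range(len(n))])

def appraisal_recursive(n, c):
    """Same quantity via w(t) = exp(-c) * w(t-1) + n(t), starting from w = 0."""
    w, out = 0.0, []
    for x in n:
        w = np.exp(-c) * w + x
        out.append(w)
    return np.array(out)

def capture_rates(w, pi):
    """g_i = w_i**pi / sum_j w_j**pi (hypothetical symbol g_i for the capture rate)."""
    weights = np.asarray(w, dtype=float) ** pi
    return weights / weights.sum()

history = [3, 0, 2, 5, 1]
print(appraisal_recursive(history, 0.0))   # c = 0: cumulative output N(t) (long-term appraisal)
print(appraisal_recursive(history, 2.0))   # large c: dominated by the most recent year
print(capture_rates([1, 1, 2], 1.0))       # linear preferential capture
```

With *π* = 0 every agent has the same rate regardless of record, which is the random-capture null model described below.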

We use Monte Carlo (MC) simulation to analyze this two-parameter model over *t* = 1…*T* sequential periods. In each production period (i.e., representing a characteristic time to publication), a fixed number *P* of production units is captured by the competing agents. At the end of each period, we update each *w*_{i}(*t*) and then proceed to simulate the next preferential capture period *t* + 1. Because the capture rate depends on the relative achievements of every agent, the relative competitive advantage of one individual over another is determined by the parameter *π*. In the *SI Appendix* we describe in more detail the results of our simulations of synthetic career dynamics. We vary *π* and *c* for a labor force of size *I* ≡ 1,000 and maximum lifetime *T* ≡ 100 periods, as a representative size and duration of a real labor cohort. Our results are general: for sufficiently large system size, the qualitative features of the results do not depend significantly on the choice of *I* or *T*.
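A compact MC sketch of the preferential-capture dynamics, under several assumptions that are ours rather than the paper's: each period's *P* = *I* opportunities are assigned by one multinomial draw with probabilities $w_i^{\pi}/\sum_j w_j^{\pi}$, the appraisal is updated as $w_i \leftarrow e^{-c} w_i + n_i$, and every agent starts with $w_i = 1$ (equal initial potential). The career-termination rule used to build the lifetime distributions is not specified in this section, so the sketch reports only how concentrated cumulative output becomes, via the share held by the top 10% of agents. `simulate` and `top_share` are hypothetical names.

```python
import numpy as np

def simulate(pi, c, I=1000, T=100, seed=0):
    """Hypothetical sketch of the preferential-capture model; returns cumulative output N_i(T).

    Each period, P = I opportunities are assigned by a multinomial draw with
    probabilities w_i**pi / sum_j w_j**pi; then w_i <- exp(-c) * w_i + n_i.
    Assumption: every agent starts with w_i = 1.
    """
    rng = np.random.default_rng(seed)
    w = np.ones(I)
    N = np.zeros(I)                      # cumulative production N_i(t)
    decay = np.exp(-c)
    for _ in range(T):
        weights = w ** pi
        n = rng.multinomial(I, weights / weights.sum())   # output captured this period
        N += n
        w = decay * w + n
    return N

def top_share(N, frac=0.1):
    """Share of all output captured by the top `frac` of agents."""
    s = np.sort(N)[::-1]
    return s[: int(len(s) * frac)].sum() / s.sum()

for pi, c in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    print(f"pi={pi}, c={c}: top-10% share = {top_share(simulate(pi, c)):.2f}")
```

With *π* = 0 each agent captures roughly one unit per period, the Poisson-like null model; with *π* = 1 and *c* = 1, a few early winners end up holding most of the output, in line with the qualitative picture described next.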

The case *π* = 0 corresponds to a random capture model that has (*i*) no appraisal and (*ii*) no preferential capture. Hence, in this null model, opportunities are captured at a Poisson rate *λ*_{p} = 1 per period. The results of this model (see *SI Appendix: Fig. S13*) show that almost all careers reach the maximum career length *T* with a typical career trajectory exponent *α*_{i} ≈ 1. Compared with simulations with *π* > 0 and *c* ≥ 0, the null model is similar to a “long-term” appraisal system (*c* → 0) with sublinear preferential capture (*π* < 1). In such systems, the long-term appraisal time scale averages out fluctuations, so careers are significantly less vulnerable to periods of low production and hence more sustainable, because they are not determined primarily by early career fluctuations.

However, as *π* increases, the strength of competitive advantage in the system increases, and so some careers are “squeezed out” by larger, more dominant careers. This effect is compounded by short-term appraisal corresponding to *c* ≈ 1. In systems with superlinear capture rates and/or relatively large *c*, most individuals experience “sudden death” termination relatively early in the career. Meanwhile, a small number of “stars” survive the initial selection process, which is governed primarily by random chance, and dominate the system.

We found drastically different lifetime distributions when we varied the appraisal (contract) length (see *SI Appendix: Figs. S12–S16*). In the case of linear preferential capture with a long-term appraisal system (*c* = 0), we find that 10% of the labor population terminates before reaching career age 0.94*T* (where *T* is the maximum career length or “retirement age”), and only 25% of the labor population terminates before reaching career age 0.98*T*. By contrast, in a short-term appraisal system with *c* = 1, we find that 10% of the labor population terminates before reaching age 0.01*T*, and 25% of the labor population dies before reaching age 0.02*T* (see *SI Appendix: Table S1*). Hence, in the model’s short-contract systems, the longevity, output, and impact of careers are largely determined by fluctuations and not by persistence.

Fig. 4 shows the MC results for *π* = 1. For *c*≥1 we observe a drastic shift in the career longevity distribution *P*(*L*), which becomes heavily right-skewed with most careers terminating extremely early. This observation is consistent with the predictions of an analytically solvable Matthew effect model (16) which demonstrates that many careers have difficulty making forward progress due to the relative disadvantage associated with early career inexperience. However, due to the nature of zero-sum competition, there are a few “big winners” who survive for the entire duration *T* and who acquire a majority of the opportunities allocated during the evolution of the system. Quantitatively, the distribution *P*(*N*) becomes extremely heavy-tailed due to agents with *α* > 2 corresponding to extreme accelerating career growth. Despite the fact that all the agents are endowed initially with the same production potential, some agents emerge as superstars following stochastic fluctuations at relatively early stages of the career, thus reaping the full benefits of cumulative advantage.

## Discussion

An ongoing debate involving academics, university administrators, and educational policy makers concerns the definition of professorship and the case for lifetime tenure, as changes in the economics of university growth have now placed *tenure* under review (3, 6). Critics of tenure argue that it places too great a financial risk on the modern competitive research university and diminishes the university’s ability to adapt to shifting economic, employment, and scientific markets. In response to these changes, universities and other research institutes have, over the last thirty years, shifted away from tenure at all levels of academia, meeting staff needs with short-term and non-tenure-track positions (3).

For knowledge-intensive domains, production is characterized by *long-term* spillovers both through time and through the knowledge network of associated ideas and agents. A potential drawback of professions designed around short-term contracts is the implicit expectation of sustained annual production, which effectively discounts the cumulative achievements of the individual. Consequently, short-term contracts may reduce the incentives for a young scientist to invest in human and social capital accumulation. Moreover, we highlight the importance of an employment relationship that combines positive competitive pressure with adequate safeguards against the career hazards and endogenous production uncertainty that an individual is likely to encounter in his/her career.

In an attempt to render a more objective review process for tenure and other lifetime achievement awards, quantitative measures for scientific publication impact are increasing in use and variety (17–20, 24, 27, 46, 47). However, many quantifiable benchmarks such as the *h*-index (17) do not take into account collaboration size or discipline-specific factors. Measures for the comparison of scientific achievement should at least account for variable collaboration, publication, and citation factors (19, 46, 47). Hence, such open problems call for further research into the quantitative aspects of scientific output using comprehensive longitudinal data for not just the extremely prolific scientists, but the entire labor force.

Current scientific trends indicate further increases in typical team sizes, which will add to the emergent complexity arising from group dynamics (7, 12, 42), and, overall, a remarkable growth of science. There is an increasing need for individual/group production measures, such as the output measure *Q* following from Eq. **7**, that account for group efficiency factors. Normalized production measures that account for coauthorship factors have been proposed in refs. 19 and 46, but those measures do not account for variations in team productivity.

The complexity of large collaborations raises open questions concerning scientific productivity and the organization of teams. We measure a decreasing marginal return (*γ* < 1) with increasing group size, which underscores the importance of team management. A theory of labor productivity can help improve our understanding of institutional growth for organizations ranging in size from scientific collaborations to universities, firms, and countries (33, 34, 44, 47–50).

## Acknowledgments

We thank D. Helbing, N. Dimitri, O. Penner, and an anonymous PNAS Board Member for insightful comments. We gratefully acknowledge support from the IMT and Keck Foundations, the Defense Threat Reduction Agency (DTRA) and Office of Naval Research (ONR), and the National Science Foundation (NSF) Chemistry Division (Grants CHE 0911389 and CHE 0908218).

## Footnotes

^{1}To whom correspondence may be addressed. E-mail: hes{at}bu.edu, Alexander M. Petersen alexander.petersen{at}imtlucca.it, or Fabio Pammolli f.pammolli{at}imtlucca.it.

Author contributions: A.M.P., M.R., H.E.S., and F.P. designed research; A.M.P., M.R., H.E.S., and F.P. performed research; A.M.P., M.R., H.E.S., and F.P. analyzed data; and A.M.P., M.R., H.E.S., and F.P. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1121429109/-/DCSupplemental.

## References

- (1) David PA
- (3) Chait RP
- (4) Austin J
- (7) Wuchty S, Jones BF, Uzzi B
- (8) Petersen AM, Jung W-S, Stanley HE
- (10) Coates JM, Herbert J
- (11) Saavedra S, Hagerty K, Uzzi B
- (12) Börner K, et al.
- (14) Shockley W
- (16) Petersen AM, Jung W-S, Yang J-S, Stanley HE
- (17) Petersen AM, Stanley HE, Succi S
- (20) Jones BF, Weinberg BA
- (21) Kaminski D, Geisler C
- (22) Merton RK
- (26) Vespignani A
- (34) Fu D, et al.
- (35) Capocci A, Rao F, Caldarelli G
- (36) Romer PM
- (37) Owen-Smith J, Powell WW
- (38) Börner K, Maru JT, Goldstone RL
- (40) Newman MEJ
- (42) Guimerà R, Uzzi B, Spiro J, Amaral LAN
- (45) Nelson RR; Arrow KJ
- (46) Radicchi F, Fortunato S, Castellano C
- (49) Riccaboni M, Pammolli F, Buldyrev SV, Ponta L, Stanley HE
- (50) Podobnik B, Horvatic D, Petersen AM, Njavro M, Stanley HE

## Article Classifications

- Social Sciences
- Economic Science

- Physical Sciences
- Applied Mathematics