Accelerating Scientific Publication in Biology

Scientific publications enable results and ideas to be transmitted throughout the scientific community. The number and type of journal publications also have become the primary criteria used in evaluating career advancement. Our analysis suggests that publication practices have changed considerably in the life sciences over the past thirty years. More experimental data is now required for publication, and the average time required for graduate students to publish their first paper has increased and is approaching the desirable duration of Ph.D. training. Since publication is generally a requirement for career progression, schemes to reduce the time of graduate student and postdoctoral training may be difficult to implement without also considering new mechanisms for accelerating communication of their work. The increasing time to publication also delays potential catalytic effects that ensue when many scientists have access to new information. The time has come for life scientists, funding agencies, and publishers to discuss how to communicate new findings in a way that best serves the interests of the public and the scientific community.

Most biologists have become frustrated with the current state of scientific publishing. Attention has been drawn to flaws in using journal impact factors for evaluating scientific merit (1), the hypercompetitive environment created by scientists seeking to publish their work in the top journals (2), and the extensive revisions required by reviewers and editors (3,4). In this Perspective, I wish to focus on another issue that has received less attention-the increasing amount of data and time required to publish a paper.
As a consumer of scientific literature, I enjoy reading the comprehensive scientific studies that are being published today. However, the foundation of today's data--rich articles is acquired at a cost, which is the time that graduate students and postdoctoral fellows spend in collecting and analyzing data. Indeed, as I will discuss later, the length of time required to produce and then publish a scientific work is likely impacting the duration and quality of Ph.D. and postdoctoral training. Furthermore, as laboratories wait to accumulate more experimental data before they feel that a benchmark for publication is met, crucial results are being sequestered from the scientific community for longer periods of time. In this Perspective, I will argue that creating new outlets for faster and more nimble scientific communication could have positive outcomes on professional training, catalyzing scientific progress, and improving the culture of communication within the life sciences as a whole.

A trend toward increasing data required for publication
Many senior scientists feel that the amount of data required for publication has increased over their careers (for example, see ref. 4). But is this actually true? Quantifying the amount of experimental data in a publication is non--trivial, as data can take many different forms and varies in the amount of time required for its acquisition. Furthermore, comparing the amount of data in contemporary versus prior papers is difficult. For example, the time required to obtain certain types of information has decreased; as an extreme example, sequencing an entire genome now requires less time than cloning and sequencing a single gene 40 years ago. However, scientists always push technical limits, and many of the experiments performed today also are difficult and require a long time to master and execute. Thus, I would argue that truly informative experimental data are not vastly easier to obtain now than in the past. Practices in data inclusion, however, may have changed; for example, experiments previously described as "data not shown" would now likely be included in a supplemental figure, and figures also are easier to prepare now with computer programs compared with more cumbersome manual methods in the past. the amount of data per paper is even more substantial when supplemental information, which began to appear ~1997, is taken into consideration (Fig. 1B, C). In particular, the number of supplemental figures and their panels were comparable to (Cell) or exceeded (Nature) those that were published in the print version (Fig. 1C). Consistent with this trend of more data and the likely use of more diverse and complex techniques, today's papers in Cell, Nature and JCB have 2--4 fold more authors than those from 1984 (Fig.  1D).
However, enlisting more authors is probably not the sole mechanism for acquiring the additional data needed for contemporary papers. As will be discussed later, it also appears to take a longer period of time to publish a paper now than in the past.

Factors driving an increasing amount of data per publication
What factors have driven the increasing amount of data per publication over the past few decades? One likely factor is supply and demand-- more scientists are competing for the same or less real estate (space in top journals, Fig. 1A) compared to thirty years ago.
Over the past 30 years, the US scientific workforce (e.g. postdoctoral fellows and graduate students) has increased by almost three--fold (5,6) While the elite journals are important driving forces in the scientific market place, the trend towards more data is felt throughout the publication ecosystem. One reason is that non--elite journals want to improve their status, and, as a consequence, strive to be selective and seek more mature stories. This is perhaps why JCB accepts fewer papers now than it did in the 1980s (Fig. 1A). Second, scientists feel pressured to aim high and acquire the data that they think will be needed for publication in an elite journal. But alas, when it comes time for journal courtship, they find their work editorially rejected not once, but thrice, and then eventually publish their large body of work in a lower tier journal. It is not easy to obtain information on journal rejections from the 1980s, although I speculate that the frequency has increased considerably in the past thirty years. Thus, in addition to the time invested in acquiring data, the time spent in finding a home for a paper through sequential journal submissions also significantly delays the transmission of results to the scientific community.

What is a minimal unit for publication?
Most scientific papers, now and in the past, usually have one or two key findings.
But with the trend towards publishing more mature scientific stories, it has become harder to publish just a key initial finding or a bold hypothesis. Let's consider the Watson and Crick publications, perhaps the most famous in modern biology, and imagine how they might fare in today's publishing environment.
Many people may be unaware that Watson and Crick published not one but two papers on DNA in Nature in successive months. The first paper published on April 25, 1953 described a structural model for the DNA double helix (8). Despite having a single figure (a model figure without data), it was listed as an "Article" rather than a "Letter", based upon the magnitude of the idea. The first Watson/Crick paper was accompanied by two other Articles on the X--ray diffraction pattern of DNA; the paper by Maurice Wilkins had two figures (9) and the one by Rosalind Franklin displayed a single figure (10). The second Watson/Crick Nature paper (also an Article published on May 30) was entitled "Genetic Implications of the Structure of Deoxyribonucleic Acid". It described, without any data, a hypothesis for the hydrogen bonding of the "Watson--Crick" base pairs and speculated how the two DNA strands might each provide a template for the replication of genetic information (11). Several months later, Wilkins and Franklin each independently published second Nature Articles describing more complete analyses of the structure of DNA (12,13 -1985) were published in five papers in 1985 (22--26). In all of the above examples, the information could have been delayed and compacted into fewer publications, as no doubt would occur today. However, by unfolding these breakthroughs in a series of papers, the progression of results could be quickly disseminated to the scientific community, the value of which will be discussed in the next section. Today, two opposing factors come into play in deciding when to publish a paper. On one hand, scientists want to get their work published as fast as possible, both for advancing their careers as well as claiming priority for their discovery and avoiding getting "scooped".
However, publishing in a top journal has become an equally compelling consideration for many scientists, and this latter factor can tip the balance towards delaying submission until more experimental data can be obtained.

Consequences on the exchange of information within the scientific community
The "comprehensive" paper enables authors to build a convincing argument for

Consequences for Training
In 1990, the average age at which scientists received their first R01 NIH grant was less than 38 years; in 2013, that same milestone was reached at an average age of over 45 years (27). This trend is of great concern for many obvious reasons (2,27), including the fact that it is making a career in biomedical research less attractive to young people (28).
In an attempt to reverse this trend, efforts are now being made to accelerate the career track of young scientists. Many graduate schools require regular thesis committee meetings to promote timely graduation, and a recent Perspective in PNAS suggests limiting funds for graduate training to 5 years (29). Some institutions and granting agencies limit the length of postdoctoral training to 5 years, which is also strongly recommended by a recent National Research Council report (30) and others (29). In addition, new grant schemes, such as the NIH K99, seek to promote the transition of postdoctoral fellows to junior faculty positions. All of these measures are worthy, but for them to succeed in reducing training time, they must be accompanied by changes in the publication system. Placing term limits on graduate and postdoc training would be a perfect solution if PIs were always responsible for keeping their trainees for too long in their laboratories. While this no doubt occurs, graduate students and postdocs also are asking their PIs if they can stay for a longer period of time. To understand why this is happening, one has to appreciate the connection between publication and career advancement. Scientific papers are required for obtaining a job, a promotion, or a grant, and thus have become a primary currency for professional advancement. Furthermore, papers in elite journals have become particularly valuable in the career marketplace. Graduate students and postdocs understand the "paper economy", and they want to publish as many papers as possible and ideally publish a paper in Cell, Science or Nature. But it seems as though publishing many papers and being published in elite journals is harder now than it was in the past. I examined the publication records for Ph.D. students at University of California San Francisco (UCSF) who graduated in the 1980s (n = 71) versus those that graduated in the past three years (n= 104; Table  1; Fig.  S3 and S4). The average time for acquiring a Ph.D. increased slightly between the past (5.7 years) and current (6.3 years) student groups; these times to degree are largely consistent with national trends (5, 29). However, even though the contemporary group of graduate students was in school for one--half year longer, they published fewer first/second author papers and published much less frequently in the three most prestigious journals.
Consistent with the notion of more data being required for publication, the contemporary students also took an additional 1.3 years, on average, to publish their first, first--author paper compared with students from the 1980s. Strikingly, the average time to a first author publication for the current cohort (6 years for students who publish) is just below the average time of their graduation (6.3 year) and at the desired upper boundary for training in these graduate programs (6 years or less). These general trends also are apparent when comparing the top 1/3 rd of students with the best publication records, suggesting that the differences cannot be explained by admitting a pool of less capable students now than in the past (Table 1). UCSF also remains a highly sought--after graduate school, and its reputation has gotten stronger since the 1980s. This type of analysis should be extended to larger numbers of students from many different universities, but these preliminary data suggest that it has become harder for graduate students to publish. The increasing time to publication poses difficulties in reaching milestones for career advancement. Graduate students often need to apply for a postdoctoral position 9--12 months prior to graduation and thesis committees frequently recommend having a first--author paper accepted for publication prior to initiating the application process. Postdocs seeking a job or grant support face a similar predicament. For example, let's consider the timing of the highly sought--after NIH K99 Pathway to Independence Award, which provides 1--2 years of postdoctoral training and 3 years of independent support. The postdoc likely requires 2 months to write a successful grant and then it can take 9 months from submission to the time when funding is received. Importantly, a K99 grant will be considered much more competitive if the postdoc has a prior publication; a "manuscript in submission" cannot be listed in an NIH grant application. If it takes a postdoc three years to have a paper accepted before submitting a competitive K99 application (often a best case scenario), then a talented young scientist will spend ~5--6 years in a postdoc before getting a job (three years to publish a paper, an additional year from grant writing to funding, followed by a ~1--2 year training period). In summary, the ability of thesis, grant, and job committees to access a formal and publicly accessible paper could accelerate career transitions towards the end of graduate and postdoctoral training.
Providing young scientists with more opportunities to publish also has other advantages for training. Preparing and publishing a scientific paper is a critical part of the apprenticeship of becoming a scientist. This experience not only promotes skills in writing, but also in organizing experimental data and learning how to convey ideas effectively. The process of completing a scientific paper also teaches young scientists how to be more efficient in planning and executing experiments in their future projects. However, with the increasing time involved in acquiring data and publishing, young scientists get fewer chances to write papers and thus arguably are less well trained in these skills than trainees in the past (Table  1). Furthermore, if a critical study reaches the point of publication after 4--5 years of work, all too often the PI, who has more experience, takes over the process of writing from a graduate student or postdoc. In such cases, neither the young scientist nor the PI are willing to take chances with the paper being accepted in today's competitive publication environment. Another value of publishing earlier is that it allows a graduate student or a postdoc to explore more options for utilizing their remaining training period. Rather than myopically focusing on getting their one paper out, trainees can decide whether they want expand their first study, move on to another research question, or spend some time pursuing additional career training (e.g. teaching). , 2) succeed in capturing the very best work in the field, 3) be able to launch and co--exist with existing journals, and 4) be cost--effective and be possible to implement on a time scale of years rather than decades.

Lessons from the Physics Community: Should Biologists Adopt an Internet Pre--Print System?
A mechanism for accelerating scientific communication that meets the above criteria has been developed already by the physical science community. Physicists, mathematicians, and computer scientists typically deposit their scientific manuscripts prior to journal publication in an open access e--print service called arXiv (pronounced "archive"), which was founded by Paul Ginsparg and is now operated by Cornell Library. At first created for the high energy physics community, arXiv usage has spread over time to other sectors of physics, mathematics, computer science, and quantitative biology. This repository of electronic pre--prints is searchable, and many physicists have developed a habit of checking for alerts from arXiv first thing in the morning. Generally, although not always, a paper uploaded onto arXiv is then submitted to a journal. Importantly, the public disclosure through arXiv is accepted by the physical science/mathematics community as priority for a discovery, and an arXiv posting is acceptable as a reference in a journal, book or grant application. After the original paper is posted in arXiv, new versions can be uploaded, for example after a paper has been revised through the journal review process or in response to other comments received by the community. However, earlier versions of the paper are retained and the nature of the changes are indicated in revised uploads. ArXiv evolved from a common practice in the physics community, beginning several decades ago, of mailing unpublished manuscripts to colleagues in the field. This also was more common in the early years of molecular biology, a famous example being Watson and Crick obtaining a pre--print from Linus Pauling that proposed the erroneous triple helix Press) as well as PeerJ and F1000 Research, for--profit companies that also offer platforms for peer review. However, pre--prints in biology have not achieved a critical mass for take-off. Last year, for example, bioRxiv received 888 pre--prints compared to 97,517 for arXiv, even though many more papers are published in the life sciences.
Having never used a pre--print server myself, I tried the experiment of submitting this Perspective to bioRxiv and PNAS on the same day (July 10, 2015); after initial screening, the article was posted as a PDF on bioRxiv on July 11 (33). Fig. 2 shows the number of views of the bioRxiv article and social media exchanges ("tweets") from the time of pre--print posting until the receipt of two peer reviews and an editorial decision from PNAS (August 21). The data show the pre--print reached a large audience (views of the abstract were over twice that of whole article) and also reveal how social media can drive viewership. Importantly, even prior to the receipt of two anonymous referee reports, I received extensive feedback on the article through comments posted on bioRxiv, direct emails from readers, and numerous personal discussions. Such feedback helped me to formulate a set of the pros, cons, and uncertainties surrounding pre--prints, as discussed below (for a more extensive discussion of these issues, see the Q&A in the Supporting Information).
The Pros: Fast, Free and Feasible 1) Submission to a pre--print repository allows a paper to be seen and evaluated by colleagues and search/grant committees immediately after its completion. This could enable trainees to apply for postdoctoral positions, grants, or jobs earlier than waiting for the final journal publication. It also allows independent investigators to transmit their latest work in a reliable manner to grant review committees, without an unknown delay imposed by the journal publication process. A recent study of several journals found an average delay of ~7 months from acceptance to publication (34), but some journals take longer (34) and this time does not take into account journal rejections and the increasingly prevalent need to "shop" for a journal that will publish the work.
2) A primary objective of a pre--print repository is to transmit scientific results more rapidly to the scientific community, which should appeal to funding agencies whose main objective is to catalyze new discoveries overall. Furthermore, authors can receive faster and broader feedback on their work than occurs through peer review, as I have discussed as a case in point with this article (also see an experience from a junior faculty member in the Q&A, SI).
3) If widely adopted, a pre--print repository (which acts as an umbrella to collect all scientific work and is not associated with any specific journal) could have the welcoming effect of having colleagues read and evaluate scientific work before it has been branded with a journal name. For grants, jobs and awards, physicists will read and evaluate science posted on arXiv. The life science community needs to return to a culture of evaluating scientific merit from reading manuscripts, rather than basing judgment on where papers are published.

4) A pre--print repository is good value in terms of impact and information transferred
per dollar spent. Compared to operating a journal, the cost of running arXiv is low, with most of its operating costs covered from modest subscription payments from 175 institutions and a matching grant from the Simons Foundation. Unlike a journal, submissions to arXiv (and currently bioRxiv) are free. Cons: Lack of peer review and information overload 1) The lack of peer review might invite lower quality or irreproducible data to be disseminated. While a risk particularly for certain medical research (see Q&A in SI), several factors mitigate such concerns. First, arXiv and bioRxiv each have an initial screening mechanism that helps to eliminate overtly "unscientific" articles. Second, the major factor for ensuring quality is that the reputation of the investigator is at stake, and achieving a good reputation within the community is a primary motivating factor for scientists. Indeed, a pre--print submission is immediately visible to the entire community, whereas a journal submission is seen confidentially by only a couple of referees. Thus, posting of a poor quality paper on a pre--print server will be widely visible and reflect poorly on the investigator and his/her lab. Scientists take pride in their work and will be guided by their own internal standards in deciding when their work is ready to be released to the community. Third, the paper can receive input (as this article has) from more than 2--3 referees, which could help authors correct flawed experiments/statements and help produce a better final product published in the journal. Fourth, peer review by journals, while helpful, is certainly not a fool--proof mechanism for identifying problems or eliminating scientific irreproducibility, especially since the referees' first task is to assess whether the work is "exciting enough" rather than "accurate enough". If a recent fictitious method for preparing pluripotent stem cells (35) had first surfaced as a pre--print, many scientists would have likely noted its flaws well before journal publication. Thus, the buyer always must beware and exercise appropriate judgment for scientific quality, regardless of whether a study appears in an elite journal or an electronic pre--print server. In addition, one could imagine an option of incorporating author--initiated peer evaluations as part of a pre--print, which most scientists do informally before submitting their work to a journal and is not unlike the mechanism by which National Academy of Science members submit papers to the PNAS.
2) Pre--prints could expand the problem of information overload in biology by opening the door to less interesting reports that are not being published by journals. While this could be true, certain "unpublishable" studies, such as a negative result or whether a prior finding can be reproduced, might provide useful information to some scientists.
Furthermore, scientists are already living in a world of information overload. Rather than suppressing pre--prints, the answer may lie in better search filters such as key words, colleagues of interest, social media cues, and potentially even other measures of validation (such as whether the work was supported by a grant from NIH, NSF, or other major agencies).
Uncertainties: culture, priority, and government and journal support If the pros seem attractive and the cons manageable, then why are pre--prints not being used by biologists? One reason is that most biologists simply don't know about pre-print servers. But there are other reasons as well. Many believe that biology has a different culture from physics, which will make it impossible for the success of arXiv to be extended into biology. "Culture" refers to the moral fabric of the community--how credit for a discovery is assigned, how information is shared, and how a scientist's work is evaluated. Currently, many issues regarding pre--prints, which are clear for physicists, are clouded by uncertainty in the biology community (see also Q&A, SI). In the fast moving world of experimental biology, will a pre--print publication result in an increased risk of losing credit and getting "scooped"? Will a pre--print put a journal submission at risk for automatic rejection. Will a pre--print be recognized by grant agencies, thesis committees, etc.? These uncertainties create considerable barriers to use of pre--prints in the biology community. The following leadership and policy changes could eliminate these barriers: 1) Pre--prints become accepted as evidence for establishing priority of a discovery, as is true in physics.
2) Pre--prints become accepted as evidence of productivity in grant applications.
Currently, NIH only allows listing of accepted peer--reviewed papers in a grant.
However, grant reviewers are "peer reviewers" and should be able to judge the quality of a scientist's most recent work in the form of a pre--print.
3) Pre--prints become accepted by life science journals. Currently, many journals (Science, Nature, eLIFE, PNAS, others) allow prior pre--print submissions; however, some journals still have ambiguous policies, which constitutes an overall deterrent.

Help from the journals: creating a new "Key Finding" format
A pre--print server provides a solution for improving the ease and speed of communicating a paper, but it does not necessarily address the escalating amount of data needed for publications in journals (Fig.  1). Here, journals themselves could take the lead. Changing the status quo appears daunting if not impossible, particularly to many young scientists who feel frustrated by the present publication system. It is easy to assign the fault to the journals, but such blame is misplaced and diverts attention from where the lion's share of the responsibility lies-in our own life sciences community. As scientists, we need to define our culture and take ownership in developing a system for communicating research results that best suits our needs as well as the needs of the public.
We have not done so, at least not yet. Optimistically, change can happen if our community sets its mind to the task, recognizing that universal consensus may not be achievable and that certain subfields of biology will likely embrace new ideas more readily than others.
Young scientists, who have grown up in a culture of sharing information on the internet, also may embrace a new opportunity, if it is presented to them.
As is often the case, it is easier to articulate the problem than derive an effective solution. One idea discussed here for accelerating publication in the life sciences is the wide--spread adoption of electronic pre--prints. Mechanisms for submitting pre--prints already exist; however, with everyone standing at the shore and very few people willing to jump in, the water looks cold and uninviting. Thus, a challenge for this idea becomes changing behavior on a massive scale, which first requires removing barriers and providing better incentives for pre--print publishing; only then can the experiment be done properly of establishing whether pre--prints serve the needs of biologists. Others may feel that reform of the existing journal system (better and more transparent reviewing, better evaluation metrics) might suffice without resorting to a pre--print server or other new model. But how effective will these reforms be without implementing new incentives for currently overwhelmed scientific referees and will they be sufficient to truly change the "daily lives" of graduate students and postdoctoral fellows? Others feel that journals and      Table  Footnote

Scoring of Panels and Data
Panels were scored by simply counting the lettering (a, b, etc)

designations in figures.
Data--containing tables and figure schematics were counted as panels. Videos in the supplemental material were not counted. Panels are an imprecise proxy for the experimental data contained within a paper, and we therefore we attempted to estimate the amount of distinct pieces of data in Fig. S2. For example, a single experiment may be displayed in multiple panels with separate letters, such as different views of a fluorescence micrograph. Conversely, a single labeled panel may contact multiple different types of experiments. Therefore panels were scored as to whether they contained distinct pieces of data. To provide examples, if a representative image in one panel and quantification of the same experiment was provided in another panel, then both panels would be counted as a single piece of data. Also, if the same experiment was quantified in multiple ways (e.g. analysis of different organelle sizes or multiple kinetic parameters from the same experiment) and presented in multiple panels, then it would still be counted as a single piece of data. Different views or slices of the same sample, views of the same crystal structure, and multiple probes (for DNA or protein) used for the same sample also were considered as a single piece of information. Identical experiments applied to two different cell lines were also considered as one piece of data. Sequence alignments were counted as a one piece of data as were tables. Differentiation of separate pieces of data only were evaluated and scored between panels in a single figure and not between figures.
Schematics and model figures were also not counted as "data" in this analysis. Two graduate students independently quantified the data presented in January and February 1984 articles in Cell to determine whether these criteria led to consistent scoring. The average pieces of distinct data per article were 7.33 and 7.16, indicating good overall agreement between two independent scorers. The other months of Jan--June from 1984 and 2014 for Cell and Nature were scored by a single person.

Analysis of UCSF Graduate Student Publications
Several basic science graduate programs in the 1980s have disappeared or merged with other programs and new graduate programs have formed more recently. To make a fair comparison of graduate student work between the 1980s and current times, we analyzed student data from four basic science PhD degree granting programs that have spanned both time periods: Biochemistry and Molecular Biology, Biophysics, Genetics, and Neuroscience. Since this study was focused on experimental science, students conducting exclusively theory or modeling studies were not counted in this analysis (5 students in 2012--4 in this category). Information on the time of entering graduate school and the time at which the degree was granted was obtained from the UCSF student registrar's office.
Publication references and dates for the students were obtained by searching PubMed.
Reviews or methods papers that were largely more detailed descriptions of previously published methods were not counted. "Shared authorship" represents a difficult issue, since this designation did not exist in the 1980s. While acknowledging the drawbacks of doing so, we only scored the order of authorship; thus a shared first author in the second position was counted as a second authorship in our analysis. The reason for doing so is to allow a more direct comparison with data from the 1980s, which did not employ co--first or co--second authorship as a credit sharing strategy. However, an exception was made for students that only published a single co--first author in their graduate work; in this case, this second--position work was counted as a first--author paper (6 student in this category).
A second complication was scoring papers that were published a year or more after a degree was awarded. We directly emailed faculty or students from the 1980s to inquire whether such late publications were a product of their thesis work or primarily from a subsequent postdoctoral period (which were not scored). With only a couple of exceptions, these late publications were from thesis work; in many cases, difficulties in communication after leaving the laboratory between student and PI in the "pre--internet" era was cited as reasons for the delay in publication. However, papers published ~2 years beyond their graduation date were not scored in our analysis, unless it was their sole paper (1 student).
For the recent UCSF graduate students, we contacted the PIs of students who graduated between June 2013--December 2014 to inquire whether the student was working on additional first or second author publications and whether the paper was in preparation, submission, revision, or in press. We added all anticipated publications to the student's data profile (17 students), estimating an approximate, best circumstance time of publication based upon the status described by the PI (~9 months for in preparation, 6 months for submitted, and 3 months for revision). It is possible that some of these anticipated papers may not be published or published with a longer time frame. If a student did not produce a first or a first/second author publication, then a "0" was entered for that category of publications. In the 1979--1989 group, there were 8 students without a first author publication and 4 students for whom we could not find a record of any publication in PubMed, although supporting evidence on the internet confirmed that they graduated.
In the 2013--14 group, there were 9 students without an anticipated first author publication and 4 students without an anticipated first/second author publication. Students who did not publish a first--author paper were not included in the analysis of time to first author publication. Figures   Fig. S1: Breakdown of information for long and short format papers. A) Data for Nature: long format (Articles) and short format (Letters). B) Data for Journal of Cell Biology (JCB): long format (Articles) and short format (Rapid Communications (1984 name) or Reports (2014 name). These data from long and short format papers were combined together in the analysis in Fig. 1.

Q&A Regarding Pre--prints
The following questions or concerns (paraphrased here in italics) were raised by others in response to the initial posting of this article on bioRxiv. My responses are presented below each question.

Reproducibility and Quality
We already have a looming problem of irreproducibility. Pre--prints will just encourage more irreproducible results to be spread throughout the community. This issue is indeed important, since pre--prints open up the possibility of wide-spread science communication prior to peer review. Pre--prints might allow work to be disseminated, before mistakes are caught by peer review and thus lead researchers down wrong tracks. On the other hand, many peer--reviewed articles have proven to be inaccurate, and there is no clear data indicating how successful peer review is in filtering out irreproducible, inaccurate or fraudulent data. It might be better to have many people see the work right away, allowing the possibility of inadvertent mistakes to be caught and helping peer reviewers and the authors themselves to produce an accurate final product. Furthermore, a high profile result will likely be replicated right away and thus validated before it is published in a high profile journal. A good commenting system on pre--prints might help this process.
The immediate exposure of preprints also will likely be a motivating factor for accuracy. Many researchers intentionally do not complete all of their experiments in their first journal submission, since the journals emphasize "impact" in their first round of screening. Thus, mistakes in an initial journal submission and peer review are "invisible" and have no or minimal negative consequences for the author if the paper is rejected. This contrasts with a pre--print submission, in which all of the data is immediately transparent to the science community. This transparency will cause good scientists to be very cautious about their submission to a pre--print server, since that work will be seen and judged by their peers immediately. Having the scientist decide when his/her work is ready for dissemination will be an empowering action and also one filled with a sense of responsibility.
The subject of reproducibility, however, is a very complex one and should be taken into careful consideration. I would recommend collecting data on how pre--prints impact scientific reproducibility, but would argue that the disincentives for pre--prints (described in this Perpectives) should be removed first to allow increased use. Journal filters are good. I don't have time to sort through work in a massive pre--print server. I also am more assured of quality if I read work in top journals.
Pre--prints will not replace the journals and instead will exist alongside them. You might prefer reading journals in order to learn about a new field, where the speed of access to new information might be less important. However, pre--prints would allow you to access to information faster in your own field, which might help to advance your research program. Thus, pre--prints and journal articles together can serve different needs in the scientific community. As discussed in the Perspective, it is also possible to experiment with filters that will allow users to sort through the content of pre--print servers for benchmarks someone rushes out an incomplete paper and then subsequently wants to correct mistakes, they can upload a new version, but the original version remains on the site for all to see.

How are news and publicity handled if there is a preprint submission as well as a subsequent journal publication
Historical examples from arXiv reveal various ways in which this has been handled. In some cases, a journal or press will "find" a pre--print on arXiv and run a story on the work prior to journal publication. In some cases, the pre--print will be posted on arXiv at the time of acceptance to a journal (but prior to publication), and the press will cite the arXiv pre--print and name the journal in which it will be ultimately published. Even government agencies such as NSF have issued publicity surrounding a pre--print. In other cases, publicity only arises with the greater attention associated with the journal publication. A critical issue is that the authors need to follow the embargo policy of the journal to which they intend to submit, which usually prohibits the authors from speaking about their work directly with the press themselves prior to publication. See Nature's guidelines on publicity": http://www.nature.com/authors/policies/confidentiality.html. In general, news and publicity has been managed successfully with arXiv and the journal system.

A bigger issue to me is open access.
Pre--prints are free for anyone in the world. Use of this system will therefore ensure that there is always a version of manuscript that is freely available, regardless of what journal it is eventually published in. However, for certain journals, the accepted version of an article cannot be posted as pre--print for up to six months from the time of publication (e.g. see Nature guidelines cited above).

Having pre--prints listed on PubMed would be helpful as one--stop shopping to find science content
Currently PubMed is only for peer--reviewed articles. To facilitate content discovery, one could imagine developing a new biologist--friendly search engine that will search for content on PubMed, bioRxiv and arXiv. On the other hand, such functions could be integrated into PubMed. Both solutions are workable, and the community and NIH can decide on the best course of action.

Ethical and Practical Issues for Biology
Experimental biology is moving so fast. I am worried that if I post on bioRxiv or arXiv then someone will scoop me by rushing a paper to a journal and perhaps be luckier in the publication process.
The possibility that results/ideas might be "stolen" from a pre--print, resulting in the loss of credit for researcher, seems to be a prevalent concern in the biology community. This is why some argue that pre--prints simply will not work in biology as they have in physics. Here is an excerpt from a reviewer's comment on this Perspective from PNAS: "Should the author choose to continue to push the prepublication format, he might anticipate the following criticism of his logic. He poses that prepublication works for experimental physics so it can work for experimental biology. This analogy appears flawed. Physics today is like biology 40 years ago. The experimental systems needed to address a problem are unique, for example a synchrotron to address a problem in subatomic physics (like a bicoid mutant that nobody but Ed Lewis had). Hence, a prepublication is safe. Nobody can quickly generate the data of the prepublication or has preliminary data similar to the prepublication. What makes current biology so exciting is the lightening fast connections that are made between very rapidly moving systems. These same connections generate problems for the prepublication concept. Here is the scenario that critics will bring forward. One has a very nice unpublished discovery and talks about it at a meeting or University. A member of the audience has some preliminary results in another system that in the context of the talk all of sudden make sense. With much greater confidence the member of the audience adds a few experiments, publishes these results and common conclusion in a prepublication. This minimal publication is much weaker than the lecture but nonetheless gets priority. This scenario can't or would rarely happen in physics but would be the fear of every biologist talking about unpublished results. Nobody would share unpublished results because the speed at which unrefereed results could be published. In this light the author might use the text to probe a little deeper why biology did not move to prepublication format if in fact biology and physics are interchangeable. As it is, he will get much criticism for comparing apples to oranges." These remarks are thoughtful and reflect many people's concerns. My response is that not all biology experiments are so lightening fast to repeat. Some are, but most papers are fairly complex and not trivial to repeat in a few weeks even by a well--established competing lab. However, talking about work in a lecture constitutes a problem for establishing priority, as the referee indicates. Physicists tend to acknowledge information transmitted in public talks. But part of the motivation for establishing arXiv was to create a common access point where a discovery could be announced to the community, since not everyone can attend a lecture. Because arXiv is so widely viewed by the community, it is very difficult for an individual to "steal and run" with an idea/experiment with the excuse that they never saw it on arXiv. If pre--prints are going to be successful, they must carry with them the gravitas of priority. Finally, I asked Paul Ginsparg, founder of arXiv, if there were examples of transgressions where a result was posted on arXiv and then someone copied it and published it faster in a journal to claim priority. He could not think of a single example where this happened and also thought that the physics community would not tolerate such behavior. They also would not tolerate someone publishing a cheap paper on arXiv in response to hearing an outstanding work or idea in a public lecture. Furthermore, work appearing close in time as pre--prints (e.g. within a couple of months) will be compared based upon quality and acknowledged as co--discoveries if they deserve to be, just as is the case with journal publications. Perhaps physicists are not behind biologists (see the referee's comment), but rather are 40 years ahead of us in science communication and ethics. The interesting question is how does one define priority and associated ethical practices in biology? Perhaps the signing of a declaration by leaders in the biology might be helpful in initiating the process and setting a new tone. Ultimately, however, it will have to be further propagated by investigators themselves and how they teach their trainees. Biologists develop specialized reagents and strains for their work; there is an obligation to release these reagents immediately to the community upon publication. Will this obligation apply to preprint postings?
Some investigators may be happy to release their reagents or share software at the pre--print stage. Others may be reluctant to do so until after journal publication, especially given current concerns described above. Thus, the community may wish to develop a considerable resistance and thus likely fail. Pre--prints, on the other hand, represent a viable evolutionary intermediate. Pre--prints can co--exist with the journal system, and thus do not represent an either--or choice for scientists. Also, supporting pre--prints should not prevent other desirable changes in the science communication system that our community would like to establish later on (e.g. changes in pre--or post--publication review and evaluation). Indeed, a short--term success with pre--prints would convey a message to our community that we are not locked into the status quo and that other changes are possible over time. F1000 R has a complete publishing platform that communicates the "pre--print" but also initiates transparent peer review and then indexes successfully peer reviewed papers on PubMed. What about systems like this? F1000 R has an interesting and new publishing process. However, submitting a work to F1000 R precludes submitting the same work in another journal (unlike bioRxiv or arXiv). Individual scientists will have to decide on a publishing mechanism that makes sense for them--submission to F1000 R, or through a pre--print server (bioRxiv/arXiv) plus a subsequent journal of their choice, or through other mechanisms. Are you sure that pre--prints will work in biology?
No. I am only sure of death and taxes. However, we have to try experiments in scientific communication. This seems to be a relatively easy one to try, since the potential harm, cost, and infrastructure are minimal and the current barriers are not so difficult to overcome. If pre--prints are tried but do not succeed, then the answer will be clear and new ideas can be investigated.