Research Article

Choosing experiments to accelerate collective discovery

Andrey Rzhetsky, Jacob G. Foster, Ian T. Foster, and James A. Evans
  a. Departments of Medicine and Human Genetics, University of Chicago, Chicago, IL 60637;
  b. Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL 60637;
  c. Institute of Genomic and Systems Biology, University of Chicago, Chicago, IL 60637;
  d. Department of Sociology, University of California, Los Angeles, CA 90095;
  e. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60637;
  f. Department of Sociology, University of Chicago, Chicago, IL 60637

PNAS first published November 9, 2015; https://doi.org/10.1073/pnas.1509757112
Author affiliations (letters refer to the list above):
  • Andrey Rzhetsky: a, b, c
  • Jacob G. Foster: d
  • Ian T. Foster: b, e
  • James A. Evans: b, f
For correspondence: arzhetsky@uchicago.edu or jevans@uchicago.edu
Edited by Yu Xie, University of Michigan, Ann Arbor, MI, and approved September 8, 2015 (received for review May 18, 2015)

Significance

Scientists perform a tiny subset of all possible experiments. What characterizes the experiments they choose? And what are the consequences of those choices for the pace of scientific discovery? We model scientific knowledge as a network and science as a sequence of experiments designed to gradually uncover it. By analyzing millions of biomedical articles published over 30 years, we find that biomedical scientists pursue conservative research strategies exploring the local neighborhood of central, important molecules. Although such strategies probably serve scientific careers, we show that they slow scientific advance, especially in mature fields, where more risk-taking and less redundant experimentation would accelerate discovery of the network. We also consider institutional arrangements that could help science pursue these more efficient strategies.

Abstract

A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.

  • complex networks
  • computational biology
  • science of science
  • innovation
  • sociology of science
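
The two network properties named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy edge list, the node names, and the helper functions `build_adjacency`, `degree`, and `distance` are all hypothetical. It only shows how an entity's importance might be read off as its degree, and a problem's difficulty as the shortest-path distance between the two entities a candidate link would join.

```python
from collections import deque
import math

# Hypothetical toy knowledge network (an assumption for illustration):
# nodes stand for chemicals, edges for already-published relationships.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("F", "G")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Degree centrality in its simplest form: the number of neighbors."""
    return len(adj.get(node, ()))


def distance(adj, source, target):
    """Shortest-path length by breadth-first search.

    Returns math.inf when the two nodes lie in different
    connected components.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf


adj = build_adjacency(EDGES)
print(degree(adj, "A"))         # importance proxy for the hub node
print(distance(adj, "B", "E"))  # difficulty proxy for linking B and E
print(distance(adj, "A", "F"))  # nodes in different components
```

In this toy network, linking B and C (distance 2, both adjacent to the hub A) would count as a conservative choice, whereas linking A and F bridges two separate components.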

Footnotes

  • ¹To whom correspondence may be addressed. Email: arzhetsky@uchicago.edu or jevans@uchicago.edu.
  • Author contributions: A.R., J.G.F., and J.A.E. designed research; A.R., J.G.F., and J.A.E. analyzed data; and A.R., J.G.F., I.T.F., and J.A.E. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • *The notion that linking distant literatures is hard but potentially fruitful underwrites Swanson’s work on literature-based discovery (31).

  • †Scientists often study several entities in combination. This complicates the modeling, so we approximate the discovery process with dyadic strategies.

  • ‡Some values of αμ and αι describe a mechanism analogous to preferential attachment (21, 33), in which researchers choose concepts in proportion to the product of their degrees. Our model encodes many types of preferential attachment, e.g., versions that are superlinear in the degrees. We find that such preferential attachment strategies can be much more efficient for discovery.

  • §Observed behavior is generated by the interaction between preferences and the evolving set of opportunities. This makes interpretation subtle. For example, when considering chemicals in different connected components, a specific opportunity to combine them would be preferred (i.e., has a higher probability than an opportunity to connect similar chemicals at finite distance). Over time, however, more nodes enter the giant component. Hence, fewer opportunities exist to connect nodes in different components, leading to their small absolute number (Figs. S3B and S6).

  • ¶We assume that published research reflects the underlying distribution of research effort in a relatively undistorted way. Recent survey data on scientific choice are consistent with this assumption (41). Although we interpret Fig. 1 and Fig. S6D to imply that scientists pursue less risky projects over time, it is possible that scientists pursue such projects with the same intensity, but that fewer succeed and are published in later periods. We cannot tackle this issue directly, but consider how effort is screened by experimental failure, publication bias, etc., to produce the distribution of published choices. Our interpretation assumes that although a priori “risky” strategies (like combining two distant, low-degree chemicals) may fail more often than conservative alternatives, the risk is not so high that the published record no longer reflects the underlying distribution of effort. It also requires that risky strategies do not become much riskier over time. If the selection process has these plausible properties—i.e., it is well behaved and near stationary—then changes in the observed distribution and inferred parameters will track changes in the unobserved distribution of research effort and scientific choice.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1509757112/-/DCSupplemental.
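
The preferential-attachment mechanism mentioned in the footnotes, in which concepts are chosen in proportion to the product of their degrees, can be sketched as follows. This is an illustrative assumption rather than the paper's model: the exponent `alpha`, the function `pair_probabilities`, and the toy degrees are hypothetical and do not correspond to the paper's fitted parameters. Setting `alpha=1.0` gives the linear case, and `alpha>1.0` gives a superlinear variant.

```python
from itertools import combinations


def pair_probabilities(degrees, linked, alpha=1.0):
    """Probability of choosing each not-yet-linked pair of nodes.

    Each candidate pair (u, v) gets weight (k_u * k_v) ** alpha, where
    k is the node degree. `alpha` is a hypothetical exponent introduced
    for this sketch only.
    """
    weights = {}
    for u, v in combinations(sorted(degrees), 2):
        if frozenset((u, v)) in linked:
            continue  # this relationship is already known
        weights[(u, v)] = (degrees[u] * degrees[v]) ** alpha
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate pair has positive weight")
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical toy network: A is a hub, D has two links, the rest have one.
known = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
linked = {frozenset(pair) for pair in known}
degrees = {"A": 3, "B": 1, "C": 1, "D": 2, "E": 1}

linear = pair_probabilities(degrees, linked, alpha=1.0)
superlinear = pair_probabilities(degrees, linked, alpha=2.0)

for pair in linear:
    print(pair, round(linear[pair], 3), round(superlinear[pair], 3))
```

Raising `alpha` concentrates the probability mass on pairs that involve high-degree nodes, which is the sense in which a rule can be superlinear in the degrees.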

Freely available online through the PNAS open access option.

Accelerating collective discovery
Andrey Rzhetsky, Jacob G. Foster, Ian T. Foster, James A. Evans
Proceedings of the National Academy of Sciences Nov 2015, 201509757; DOI: 10.1073/pnas.1509757112

Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490