Research Article

Identifying personal microbiomes using metagenomic codes

Eric A. Franzosa, Katherine Huang, James F. Meadow, Dirk Gevers, Katherine P. Lemon, Brendan J. M. Bohannan, and Curtis Huttenhower
  a. Biostatistics Department, Harvard School of Public Health, Boston, MA 02115;
  b. Microbial Systems and Communities, Genome Sequencing and Analysis Program, The Broad Institute, Cambridge, MA 02142;
  c. Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403;
  d. Department of Microbiology, The Forsyth Institute, Cambridge, MA 02142; and
  e. Division of Infectious Diseases, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115

PNAS June 2, 2015 112 (22) E2930-E2938; first published May 11, 2015; https://doi.org/10.1073/pnas.1423854112
Author affiliations (letters refer to the affiliation list above):
  • Eric A. Franzosa (a, b)
  • Katherine Huang (b)
  • James F. Meadow (c)
  • Dirk Gevers (b)
  • Katherine P. Lemon (d, e)
  • Brendan J. M. Bohannan (c)
  • Curtis Huttenhower (a, b); for correspondence: chuttenh@hsph.harvard.edu
Edited by Ralph R. Isberg, Howard Hughes Medical Institute, Tufts University School of Medicine, Boston, MA, and approved April 6, 2015 (received for review December 15, 2014)

Figures

  • Fig. 1.

    Metagenomic codes (overview). (A) Three individuals and their metagenomic features (represented by capital letters) are shown. For each individual, a subset of features is highlighted that is unique among the three individuals. We refer to these sets as metagenomic codes. (B) The same three individuals reevaluated after weeks to months. Individual 1’s microbiome has remained stable, and his code still uniquely identifies him among the population (a true positive). Individual 2 has lost metagenomic feature C, and his code no longer identifies him (a false negative). Individual 3 has lost feature B and gained feature C. Individual 3 is still a true positive with respect to his own code, but also matches individual 2’s code (a false positive). (C) Illustration of the four metagenomic feature types considered in this work: OTUs, species, kilobase windows from reference genomes (kbwindows), and species-specific marker genes (markers) (see Methods and Table 1 for details).
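
The definition in panel A can be made concrete with a small Python sketch. It treats each individual as a set of features and searches, by brute force, for the smallest subset that no other individual fully carries. The feature sets and the names `find_code` and `time1` are hypothetical, and the exhaustive search is an assumption chosen for clarity; the caption does not state how codes were actually constructed, and this approach would not scale to real feature counts.

```python
from itertools import combinations

def find_code(owner, population):
    """Return the smallest feature subset of `owner` that no one else fully carries.

    Brute-force search, practical only for toy inputs. Returns None when every
    subset of the owner's features is also contained in another individual.
    """
    own = sorted(population[owner])
    others = [feats for name, feats in population.items() if name != owner]
    for size in range(1, len(own) + 1):
        for candidate in combinations(own, size):
            code = frozenset(candidate)
            # A code is unique if no other individual carries all of its features.
            if not any(code <= feats for feats in others):
                return code
    return None

# Hypothetical time-1 feature sets (capital letters, as in the figure).
time1 = {
    "ind1": {"A", "B", "C", "E"},
    "ind2": {"A", "C", "D"},
    "ind3": {"A", "B", "D", "E"},
    "ind4": {"A", "C"},  # a subset of ind1, so no unique code exists
}
codes = {name: find_code(name, time1) for name in time1}
```

In this toy population no single feature is unique to anyone, so each code needs two features, and `ind4` receives no code at all because its features are a subset of another individual's.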

  • Fig. 2.

    Properties associated with microbiome feature stability. For each (body site, feature type) combination, we counted cases of features confidently detected across subjects’ first sampling visits (time 1). The fraction of these cases that remained confidently detected at subjects’ second sampling visits (time 2; weeks to months later) provided a measure of feature stability. Stability was positively and strongly correlated with (A) feature abundance and (B) feature prevalence. (C) Highly prevalent features that were not detected in subjects’ time 1 samples had a high probability of being acquired by time 2, particularly at more exposed sites (e.g., skin). (D) Sampling time interval had a less marked effect on stability. NA, a (body site, feature type) combination with <10 confident detection events at time 1. Abundance values for OTUs and species reflect relative abundance; abundance values for markers and kbwindows reflect RPKM units.
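
The stability measure described in this caption is a simple ratio, which the following Python sketch illustrates. The function name `feature_stability`, the nested-dictionary input layout, and the sample values are assumptions made for illustration; the default threshold of 1e-3 is the confident detection threshold for relative abundance from Table 1, and `min_events` mirrors the caption's rule of reporting NA below 10 detection events.

```python
def feature_stability(time1, time2, threshold=1e-3, min_events=10):
    """Fraction of (subject, feature) detections at time 1 still detected at time 2.

    `time1` and `time2` map subject -> {feature: abundance}.
    Returns None (NA) when fewer than `min_events` detections exist at time 1.
    """
    detected = retained = 0
    for subject, abundances in time1.items():
        later = time2.get(subject, {})
        for feature, value in abundances.items():
            if value > threshold:                      # confidently detected at time 1
                detected += 1
                if later.get(feature, 0.0) > threshold:  # still detected at time 2
                    retained += 1
    if detected == 0 or detected < min_events:
        return None
    return retained / detected

# Hypothetical relative abundances for two subjects at two visits.
visit1 = {"s1": {"X": 0.02, "Y": 0.005, "Z": 0.0}, "s2": {"X": 0.01, "Y": 0.0002}}
visit2 = {"s1": {"X": 0.03, "Y": 0.0001}, "s2": {"X": 0.004, "Y": 0.002}}

stability = feature_stability(visit1, visit2, min_events=1)
```

Here three detections occur at the first visit and two of them persist, giving a stability of 2/3; with the default `min_events=10` the same data would be reported as NA.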

  • Fig. 3.

    Temporal stability of metagenomic codes. (A) We identified unique metagenomic codes for individuals based on their first sampling visits (time 1); an individual whose microbial features were a subset of a second individual’s features had no unique code (black bars). Red bars represent true positives (TPs): codes that uniquely identified their owners at time 1 and again at the second sampling visit (time 2; weeks to months later). Blue bars represent false negatives (FNs): codes that matched no one at time 2. Pink and cyan bars represent false positives (FPs): codes that matched someone other than their owner at time 2, either in addition to their owner (TP+FP) or instead of their owner (FN+FP). (B) Average and SD of metagenomic code size. A target size (seven features) was imposed to reduce FPs. (C) Distribution of sampling time intervals for TPs and FNs, with each individual represented by a hash mark. FNs were weakly associated with longer sampling time intervals than TPs in a few body sites and very weakly in aggregate (Mann–Whitney U test). Green numbers indicate the number of individuals profiled at time 1 and time 2 for each (body site, feature type) combination (see Methods for an explanation of why kbwindows numbers differ from species and markers numbers).
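
The four outcome categories in panel A follow directly from who a code matches at time 2. The Python sketch below is one way to express that logic, assuming features are reduced to presence or absence; the function name `classify_code`, the codes, and the time-2 feature sets are hypothetical and loosely mirror the three-individual scenario of Fig. 1.

```python
def classify_code(owner, code, time2):
    """Label a time-1 code by the individuals it matches at time 2.

    A code matches an individual when all of its features are present.
    """
    matched = {name for name, feats in time2.items() if code <= feats}
    owner_hit = owner in matched
    other_hit = bool(matched - {owner})
    if owner_hit and not other_hit:
        return "TP"      # matches only its owner
    if owner_hit:
        return "TP+FP"   # matches its owner and someone else
    if other_hit:
        return "FN+FP"   # misses its owner but matches someone else
    return "FN"          # matches no one

# Hypothetical codes defined at time 1.
codes = {
    "ind1": frozenset({"A", "D"}),
    "ind2": frozenset({"C", "E"}),
    "ind3": frozenset({"E", "F"}),
}
# Hypothetical time-2 features: ind2 lost C; ind3 lost B and gained C.
time2 = {
    "ind1": {"A", "B", "D"},
    "ind2": {"A", "E"},
    "ind3": {"C", "E", "F"},
}
outcomes = {name: classify_code(name, code, time2) for name, code in codes.items()}
```

With these inputs, the codes of `ind1` and `ind3` remain true positives, while the code of `ind2` no longer matches its owner and instead matches `ind3`, which places it in the FN+FP category.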

  • Fig. 4.

    Influence of strain-level variation on marker gene-based codes. (A) Species varied greatly in their likelihood to contribute marker genes to a code (vertical axis) and the numbers of marker genes thus contributed (horizontal axis). Samples from the anterior nares and posterior fornix body sites were typically identified by individual strains (several markers each) of a few dominant taxa, whereas stool and oral sites were instead identified by combinations of species within (e.g., Bacteroides) or across genera, respectively. (B) Each row depicts the abundance of 293 Prevotella copri-specific marker genes in a stool metagenome. The three dark gray rows correspond to three sampling visits from one subject (HMP identifier 158802708) and the two light gray rows correspond to two visits from a second subject (159166850). Certain markers were consistently absent in one subject across visits and consistently present in the other, indicative of stable carriage of subject-specific strains of P. copri. Red markers were included in the subjects’ codes; triangles indicate encoded markers that differentiated the first subject from the second subject (or vice versa). Heights of marker genes within each row vary with gene abundance (binned according to the confident detection, relaxed detection, and confident nondetection thresholds used in the construction and evaluation of metagenomic codes; see inset key). (C) This panel uses the same format as B to explore marker profiles of Leptotrichia buccalis from the supragingival plaque (oral) samples of HMP subjects 159591683 and 159207311. Here, an open triangle represents an encoded marker gene that was acquired between time points (in a potential lateral transfer or strain replacement event), which could contribute to a possible false positive match. (D) This panel uses the format from B and C to explore marker profiles of Lactobacillus crispatus from the posterior fornix (vaginal) samples of HMP subjects 160502038 and 764042746.

Tables

    Table 1.

    Properties of metagenomic features and detection thresholds

    Feature description | Short name | Sequencing basis | Units | Confident detection threshold | Relaxed detection threshold | Confident nondetection threshold | Body sites | Paired samples per body site
    Operational taxonomic units | OTUs | 16S rRNA gene | Relative abundance | >1e−3 | >1e−4 | <1e−5 | 18 | 25–105
    Microbial species | Species | Whole metagenome shotgun | Relative abundance | >1e−3 | >1e−4 | <1e−5 | 6 | 14–50
    Species-specific marker genes | Markers | Whole metagenome shotgun | RPKM | >5 | >0.5 | <0.05 | 6 | 14–50
    Kilobase windows from microbial reference genomes | kbwindows | Whole metagenome shotgun | RPKM | >5 | >0.5 | <0.05 | 6 | 9–45
    • In analyses of per-feature stability, a feature was considered detected if its abundance exceeded the confident detection threshold; a feature was considered acquired if it initially fell below the confident nondetection threshold and then later exceeded the confident detection threshold. When defining a metagenomic code, features with abundance between the confident detection and confident nondetection thresholds were considered ambiguous. When reevaluating a code at a later time point, the relaxed feature detection thresholds were used to add robustness to temporal variation.
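
The three thresholds in Table 1 and the rules in the note above can be sketched in Python as follows. The names `Thresholds`, `call_feature`, `was_acquired`, and `code_matches` are hypothetical, as are the sample abundances; the numeric thresholds are taken from the table, and treating an abundance exactly equal to a threshold as ambiguous is an assumption based on the table's strict inequalities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    confident: float     # detected if abundance > confident
    relaxed: float       # used when re-evaluating a code at a later time point
    nondetection: float  # confidently absent if abundance < nondetection

# Values from Table 1.
RELATIVE_ABUNDANCE = Thresholds(confident=1e-3, relaxed=1e-4, nondetection=1e-5)  # OTUs, species
RPKM = Thresholds(confident=5.0, relaxed=0.5, nondetection=0.05)  # markers, kbwindows

def call_feature(abundance, t):
    """Classify one abundance value when defining a code."""
    if abundance > t.confident:
        return "detected"
    if abundance < t.nondetection:
        return "absent"
    return "ambiguous"

def was_acquired(before, after, t):
    """True if a feature went from confidently absent to confidently detected."""
    return before < t.nondetection and after > t.confident

def code_matches(code, abundances, t):
    """Re-evaluate a code at a later time point using the relaxed threshold."""
    return all(abundances.get(feature, 0.0) > t.relaxed for feature in code)

# Hypothetical marker abundances (RPKM) at a second visit.
later_visit = {"m1": 0.8, "m2": 12.0}
still_matches = code_matches({"m1", "m2"}, later_visit, RPKM)
```

In this example marker `m1` has dropped to 0.8 RPKM, below the confident detection threshold of 5 but above the relaxed threshold of 0.5, so the code still matches; this is the robustness to temporal variation that the relaxed threshold is meant to provide.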

Data supplements

  • Supporting Information

    • Download Appendix (PDF)
Article Classifications

  • Biological Sciences
  • Microbiology

See related content:

  • Microbiome codes
    - May 26, 2015
Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490