Research Article

Visualizing probabilistic models and data with Intensive Principal Component Analysis

Katherine N. Quinn, Colin B. Clement, Francesco De Bernardis, Michael D. Niemack, and James P. Sethna
Department of Physics, Cornell University, Ithaca, NY 14853-2501

PNAS July 9, 2019 116 (28) 13762-13767; first published June 24, 2019; https://doi.org/10.1073/pnas.1817218116
For correspondence: Katherine N. Quinn, knq2@cornell.edu
Edited by William Bialek, Princeton University, Princeton, NJ, and approved May 31, 2019 (received for review October 5, 2018)


Figures

  • Fig. 1.

    (A–C) Hypersphere embedding, illustrating an embedding of the 2D Ising model. Points were generated through Monte Carlo sampling and visualized by projecting the probability distributions onto the first three principal components (28). The points are colored by magnetic field strength. As the system size increases from 2×2 to 4×4, the orthogonality problem is demonstrated by an increase in “wrapping” around the hypersphere. This effect can also be produced by instead considering four replicas of the original system, motivating the replica trick, which takes the embedding dimension or number of replicas to zero.

  • Fig. 2.

    Replicated Ising model illustrating the derivation of our intensive embedding. All points are colored by magnetic field strength. (A) Large dimensions are characterized by large system sizes; here we mimic a 128×128 Ising model which is of dimension $2^{128^2}$. The orthogonality problem becomes manifest as all points are effectively orthogonal, producing a useless visualization with all points clustered in the cusp. (B) Using replica theory, we tune the dimensionality of the system and consider the limit as the number of replicas goes to zero. In this way, we derive our intensive embedding. Note that the z axis reflects a negative-squared distance, a property which allows violations of the triangle inequality and is discussed in the text.

  • Fig. 3.

    Stages of training a CNN. Each point in the 3D projections represents one of 10,000 test images supplied to the CNN (29). At the first epoch, the neural network is untrained and so is unable to reliably classify images, with about a 90% error rate—an effect reflected in the cloud of points. As training progresses and error rate decreases, the cloud begins to cluster as shown by InPCA at the 20th epoch. Finally, when completely trained, the clustered regions are manifest at the 2000th epoch with 10 clusters representing the 10 digits.

  • Fig. 4.

    InPCA visualization of biased coins (30). (A) The first two InPCA components correspond to the coin bias and variance, yet the first one is real and the second one is imaginary (the aspect ratio between axes is one). The contour lines represent constant distances from a fair coin and are hyperbolas: Points can be a finite distance from a fair coin yet an infinite distance from each other. (B) The ordered eigenvalues correspond to the manifold lengths, illustrating the hierarchical nature of the components extracted from InPCA.

  • Fig. 5.

    Model manifold of the six-parameter ΛCDM cosmological model predictions of temperature and polarization power spectra in the CMB using InPCA, t-SNE, and the diffusion map. Axes reflect the true aspect ratio from extracted components in all cases. Here the model manifold is colored by the primordial fluctuation amplitude, the most prominent feature in CMB data. (A) InPCA extracts, as the first and second component, this amplitude term as well as the Hubble constant. These parameters control the two most dominant features in the Planck data and so reflect a physically meaningful hierarchy of importance. In contrast, (B) t-SNE extracts only the amplitude term and (C) the diffusion map extracts the amplitude term and a different parameter, the scalar spectral index η, which reflects the scale variance of the density fluctuations in the early universe. In all plots, the orange point represents our universe, as represented by Planck 2015 data.
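
The captions for Figs. 1, 2, and 4 describe three steps: an orthogonality problem as the number of replicas grows, a zero-replica limit that yields an intensive distance, and an eigendecomposition whose components can be real or imaginary. The sketch below walks through those steps for single flips of biased coins. It is an illustration under stated assumptions, not the authors' code: the radius-2 hypersphere distance 8(1 - BC^N), the intensive distance -8 log BC (with BC the Bhattacharyya overlap between two distributions), and the double-centering recipe are not stated in the captions and are assumed here; all function names and the choice of biases are hypothetical.

```python
import numpy as np

def bhattacharyya(probs):
    """Pairwise overlap sum_x sqrt(P_i(x) * P_j(x)) for the rows of `probs`."""
    roots = np.sqrt(probs)
    # Clip guards against overlaps marginally above 1 from rounding.
    return np.clip(roots @ roots.T, 0.0, 1.0)

def hypersphere_dist2(bc, n_replicas):
    """Assumed squared distance on a radius-2 hypersphere for N independent
    replicas: the overlap of N replicas is the single-system overlap to the N."""
    return 8.0 * (1.0 - bc ** n_replicas)

def intensive_dist2(bc):
    """Assumed N -> 0 limit of hypersphere_dist2(bc, N) / N."""
    return -8.0 * np.log(bc)

def inpca(dist2):
    """Multidimensional-scaling style embedding that keeps negative eigenvalues.

    Returns eigenvalues sorted by decreasing magnitude and the matching
    coordinates. A negative eigenvalue marks a component whose squared
    contribution to the distance is negative (an "imaginary" axis)."""
    n = dist2.shape[0]
    center = np.eye(n) - np.ones((n, n)) / n
    w = -0.5 * center @ dist2 @ center      # double-centered cross-product matrix
    w = 0.5 * (w + w.T)                     # symmetrize against rounding
    eigvals, eigvecs = np.linalg.eigh(w)
    order = np.argsort(-np.abs(eigvals))
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs * np.sqrt(np.abs(eigvals))
    return eigvals, coords

# Hypothetical sample: coins with head probability from 0.05 to 0.95.
biases = np.linspace(0.05, 0.95, 19)
coins = np.column_stack([biases, 1.0 - biases])   # P(heads), P(tails)
bc = bhattacharyya(coins)

# Orthogonality problem: with many replicas every pair sits near the
# maximal squared distance of 8, so the embedding carries no contrast.
saturated = hypersphere_dist2(bc, 10_000)

# Intensive embedding: eigenvalue signs separate real from imaginary axes.
eigvals, coords = inpca(intensive_dist2(bc))
```

Plotting `coords[:, 0]` against `coords[:, 1]` and coloring by `biases` gives a picture in the spirit of Fig. 4A. The sign of each entry of `eigvals` indicates whether the corresponding axis should be read as real or imaginary.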

Data supplements

  • Supporting Information

    • Download Appendix (PDF)
    • Download Movie_S01 (GIF) - Stages of training a convolutional neural network (CNN). Each point in this animation represents one of the 10,000 handwritten digits in the MNIST test data set. Each of the ten colors represents one of the ten digits (dark blue = 0, ..., yellow = 9). The time evolution, or epoch, reflects the training of the neural network. At the initial frame, the untrained network is unable to classify images – it has a 90% error rate, just as one would expect from random guessing. As the training progresses and the error rate decreases, the digits in the cloud begin to cluster. (The network was trained on a separate set of 55,000 images, so the clustering represents success in recognizing new digits.)
Article Classifications

  • Physical Sciences
  • Applied Mathematics

Copyright © 2021 National Academy of Sciences. Online ISSN 1091-6490