RT Journal Article
SR Electronic
T1 Visualizing probabilistic models and data with Intensive Principal Component Analysis
JF Proceedings of the National Academy of Sciences
JO Proc Natl Acad Sci USA
FD National Academy of Sciences
SP 13762
OP 13767
DO 10.1073/pnas.1817218116
VO 116
IS 28
A1 Quinn, Katherine N.
A1 Clement, Colin B.
A1 De Bernardis, Francesco
A1 Niemack, Michael D.
A1 Sethna, James P.
YR 2019
UL http://www.pnas.org/content/116/28/13762.abstract
AB We introduce Intensive Principal Component Analysis (InPCA), a widely applicable manifold-learning method to visualize general probabilistic models and data. Using replicas to tune dimensionality in high-dimensional data, we use the zero-replica limit to discover a distance metric, which preserves distinguishability in high dimensions, and an embedding with superior visualization performance. We apply InPCA to the model of cosmology which predicts the angular power spectrum of the cosmic microwave background, allowing visualization of the space of model predictions (i.e., different universes). Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the “curse of dimensionality” in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is intensive embedding, which not only is isometric (preserving local distances) but also allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter (ΛCDM) model as applied to the cosmic microwave background.