Optimal errors and phase transitions in high-dimensional generalized linear models

Edited by David L. Donoho, Stanford University, Stanford, CA, and approved January 18, 2019 (received for review February 14, 2018)
March 1, 2019
116 (12) 5451-5460

Significance

High-dimensional generalized linear models are basic building blocks of current data analysis tools, including multilayer neural networks. They arise in signal processing, statistical inference, machine learning, communication theory, and other fields. We rigorously establish the intrinsic information-theoretic limits of inference and learning for a class of randomly generated instances of generalized linear models, thereby settling several decades-old conjectures. Moreover, we delimit regions of parameters for which the optimal error rates are efficiently achievable with currently known algorithms. Our proof technique is able to handle the output nonlinearity and is hence of independent interest, opening the way to establishing similar results for models of neural networks in which nonlinearities are essential but in general difficult to account for.

Abstract

Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as is relevant in problems such as compressed sensing, error-correcting codes, and benchmark models in neural networks. We evaluate the mutual information (or “free entropy”), from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies in the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., the perceptron, derived in statistical physics using the so-called replica method. The present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of the performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves optimal performance, and we locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
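To make the setting concrete, here is a minimal sketch (not the authors' code; see their GitHub repository for the actual implementation) of how one random GLM instance of the kind analyzed in the paper can be generated. It uses the sign perceptron channel, y = sign(Xw/√n), with an i.i.d. Gaussian data matrix and signal, and a fixed measurement ratio α = m/n:

```python
# Hedged sketch of a random GLM instance: labels are produced by passing
# random Gaussian projections of a hidden signal through a nonlinearity
# (here the sign function, i.e., the noiseless perceptron / 1-bit channel).
import math
import random

def random_glm_instance(n, alpha, seed=0):
    """n: signal dimension; alpha: ratio m/n, held fixed as n grows."""
    rng = random.Random(seed)
    m = int(alpha * n)
    # Ground-truth signal with i.i.d. standard Gaussian entries (the prior).
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    X, y = [], []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = sum(x_i * w_i for x_i, w_i in zip(row, w)) / math.sqrt(n)
        # Nonlinear output channel: sign activation.
        y.append(1 if z >= 0 else -1)
        X.append(row)
    return X, y, w

X, y, w = random_glm_instance(n=100, alpha=2.0)
```

The Bayes-optimal errors and the phase transitions discussed in the paper are statements about inferring w from (X, y) in the limit n → ∞ with α fixed; other output channels (e.g., phase retrieval's y = |Xw/√n|) fit the same template by swapping the activation.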

Data Availability

Data deposition: Code related to this paper has been deposited on GitHub (https://github.com/sphinxteam/GeneralizedLinearModel2017).

Acknowledgments

F.K. and L.Z. acknowledge funding from the European Research Council under the European Union’s 7th Framework Program Grant Agreement 307087-SPARCS and under Horizon 2020 Research and Innovation Programme Grant Agreement 714608-SMile. F.K. and N.M. acknowledge support from the PAIL bilateral grant of Agence Nationale de la Recherche and Swiss National Foundation for Science (200021E-175541).

Supporting Information

Appendix (PDF)


Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 116 | No. 12
March 19, 2019
PubMed: 30824595


Submission history

Published online: March 1, 2019
Published in issue: March 19, 2019

Keywords

high-dimensional inference | generalized linear model | Bayesian inference | perceptron | approximate message-passing algorithm


Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Jean Barbier [2,1] [email protected]
Quantitative Life Sciences, International Center for Theoretical Physics, 34151 Trieste, Italy;
Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, 75005 Paris, France;
Florent Krzakala [1]
Laboratoire de Physique de l’Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, 75005 Paris, France;
Communication Theory Laboratory, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland;
Léo Miolane [2,1] [email protected]
Département d’Informatique de l’Ecole Normale Supérieure, Université Paris-Sciences-et-Lettres, Centre National de la Recherche Scientifique, Inria, 75005 Paris, France;
Lenka Zdeborová [1]
Institut de Physique Théorique, Centre National de la Recherche Scientifique et Commissariat à l’Energie Atomique, Université Paris-Saclay, 91191 Gif-sur-Yvette, France

Notes

[2] To whom correspondence may be addressed. Email: [email protected] or [email protected].
Author contributions: J.B., F.K., N.M., L.M., and L.Z. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
[1] J.B., F.K., N.M., L.M., and L.Z. contributed equally to this work.

Competing Interests

The authors declare no conflict of interest.

