## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Inference on graphs via semidefinite programming

Inference problems on graphs arise naturally when trying to make sense of network data. Oftentimes, these problems are formulated as intractable optimization programs. This renders the need for fast heuristics to find adequate solutions and for the study of their performance. For a certain class of problems, Javanmard et al. (1) successfully use tools from statistical physics to analyze the performance of semidefinite programming relaxations, an important heuristic for intractable problems.

## Inverse Problems On Graphs

A particularly interesting class of inverse problems in graphs is that of synchronization problems (2, 3). These include registration of multiple pictures of the same scene, alignment of signals, community detection, and many others. After associating objects (for example, images or signals) to nodes of a graph, the goal is to estimate labels of the nodes (viewing direction of the image, shift of the signal, or community membership) from pairwise information about labels of nodes sharing edges (obtained, for example, by comparing two images). It is productive to think of the labels as being in a group (for example, of transformations) and the pairwise information being about group ratios (or relative transformations). An important example is synchronization of 3D rotations, which is a crucial step in the reconstruction problem in cryoelectron microscopy (4).

Community detection under the binary stochastic block model has recently received significant attention (5⇓⇓⇓⇓–10). In this model, a random graph is drawn on vertices belonging to two communities of the same size. Each pair of nodes share an edge, independently, with a certain probability: *p* if they belong to the same community and *q* otherwise (oftentimes

A natural approach to efficiently estimate the memberships of the nodes of a graph generated from this model is belief propagation. In a nutshell, the idea is to have each node record its confidence of belonging to one of the communities and, in each iteration, have each node observe the confidences of all its neighbors and update its own confidence accordingly (through Bayes rule). In fact, Decelle et al. (5) used tools from statistical physics to make remarkably precise predictions of the performance of belief propagation in community detection and, based on those, conjectured the exact location of a phase transition below which it is information theoretically impossible to make an estimate of the community memberships that outperforms an uninformed random guess. This conjectured phase transition was mathematically proved in a recent series of works (6⇓–8) that also proposed ingenious estimation procedures that provably outperform a random guess above the phase transition. However, the exact performance of belief propagation has yet to be fully rigorously understood [there has, nevertheless, been remarkable progress (12⇓–14)].

## Semidefinite Programming

From the point of view of worst-case approximation in theoretical computer science, semidefinite programming [and the so-called sum-of-squares hierarchy (15⇓⇓–18)] seems to provide an “off the shelf” hierarchy of optimal algorithms for a large class of problems (19) [this is related to the famous unique games conjecture (20)]. In fact, semidefinite programming has been used to highlight another phase transition (10) in the binary stochastic block model, below which it is information theoretically impossible to exactly recover the community memberships of all of the nodes (although it may be possible to outperform a random guess). Using estimates in random matrix theory, it was recently shown that the semidefinite programming heuristic succeeds at exact recovery above this threshold (21, 22). It is worth noting that belief propagation can be viewed as attempting to approximate the marginal distributions of the node memberships and not necessarily directly solving an optimization problem, whereas the semidefinite program is obtained via the relaxation of an optimization problem.

This raises the natural question of understanding the performance of the semidefinite programming approach in the sparse regime, where exact recovery is impossible. A first step toward this was taken in ref. 23, and the results were further improved in ref. 24. However, a precise rigorous understanding of its performance seems outside the scope of our current techniques. Remarkably, Javanmard et al. (1) use nonrigorous tools from statistical physics to make very precise predictions, confirmed with numerical simulations, of the performance of this approach. These predictions suggest that semidefinite programming performs extremely well in this regime. In my view, this is particularly exciting for a few reasons: Firstly, it suggests that algorithms from the sum-of-squares hierarchy (the hierarchy of semidefinite programs believed to be optimal from the worst-case approximation point of view) may be nearly optimal also for a certain class of inference problems. Secondly, the results in Javanmard et al. (1) suggest that these tools may be used to produce precise predictions for the performance of algorithms in this hierarchy. Finally, this makes a fascinating connection between hierarchies of semidefinite programs (which are often believed, in theoretical computer science, to yield optimal algorithms for a certain class of problems) and belief propagation [which is tightly connected with the analysis in Javanmard et al. (1) and is often believed, in statistical physics, to be optimal for certain inference problems]. It is worth noting that the predictions suggest that, for a fixed average degree, this algorithm does not fully reach the described phase transition (although getting very close to it). In fact, a recent paper of Moitra et al. provides another argument suggesting that this is the case (25).

Javanmard et al. (1) address not only the community detection problem but also a

Although semidefinite programming is solvable in polynomial time (26), it tends to be computationally expensive. Javanmard et al. (1) used a low-rank factorization heuristic, proposed over a decade ago in ref. 27, to solve fairly large problem instances. Interestingly, they also made use of this factorization in their analysis. With the factorization, the semidefinite program can be seen as an optimization problem where each node is associated with a vector variable, which makes it more suitable for the statistical physics tools that are used. This suggests that the predictions obtained are valid for the heuristic itself, rendering it extremely effective at solving these types of semidefinite programs. In fact, this was recently backed up by rigorous guarantees for this heuristic in refs. 28 and 29.

## Acknowledgments

A.S.B. was supported by National Science Foundation Grant DMS-1317308.

## Footnotes

- ↵
^{1}Email: bandeira{at}mit.edu.

Author contributions: A.S.B. wrote the paper.

The author declares no conflict of interest.

See companion article 10.1073/pnas.1523097113.

## References

- ↵.
- Javanmard A,
- Montanari A,
- Ricci-Tersenghi F

- ↵
- ↵.
- Bandeira AS,
- Chen Y,
- Singer A

- ↵
- ↵
- ↵.
- Mossel E,
- Neeman J,
- Sly A

- ↵.
- Mossel E,
- Neeman J,
- Sly A

- ↵.
- Massoulié L

*Proceedings of the 46th Annual ACM Symposium on Theory of Computing*(Assoc Comput Machinery, New York), pp 694–703 - ↵.
- Mossel E,
- Neeman J,
- Sly A

- ↵.
- Abbe E,
- Bandeira AS,
- Hall G

- ↵
- ↵.
- Mossel E,
- Neeman J,
- Sly A

- ↵.
- Bordenave C,
- Lelarge M,
- Massoulié L

- ↵.
- Abbe E,
- Sandon C

- ↵.
- Parrilo PA

- ↵
- ↵.
- Shor N

- ↵.
- Nesterov Y

*High Performance Optimization*, Applied Optimization Series (Springer, New York), Vol 33, pp 405−440 - ↵.
- Barak B,
- Steurer D

- ↵.
- Khot S

*Proceedings of the 2010 IEEE 25th Annual Conference on Computational Complexity*(IEEE Comput Soc, Washington, DC), pp 99–121 - ↵.
- Bandeira AS

- ↵.
- Hajek B,
- Wu Y,
- Xu J

- ↵.
- Guedon O,
- Vershynin R

- ↵.
- Montanari A,
- Sen S

- ↵.
- Moitra A,
- Perry W,
- Wein AS

*Proceedings of the 48th Annual ACM Symposium on Theory of Computing*(Association for Computing Machinery, New York) - ↵
- ↵
- ↵.
- Bandeira AS,
- Boumal N,
- Voroninski V

- ↵.
- Montanari A

## Citation Manager Formats

### More Articles of This Classification

### Related Content

### Cited by...

- No citing articles found.