The impossibility of low-rank representations for triangle-rich complex networks

Significance Our main message is that the popular method of low-dimensional embeddings provably cannot capture important properties of real-world complex networks. A widely used algorithmic technique for modeling these networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to common wisdom, we argue that such graph embeddings do not capture salient properties of complex networks. We mathematically prove that low-dimensional embeddings cannot generate graphs with both low average degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. This establishes that popular low-dimensional embedding methods fail to capture significant structural aspects of real-world complex networks.

graph embeddings | graph representations | low-dimensional embeddings | low-rank representations | singular value decomposition C omplex networks (or graphs) are a fundamental object of study in modern science, across domains as diverse as the social sciences, biology, physics, computer science, and engineering (1)(2)(3). Designing good models for these networks is a crucial area of research, and also affects society at large, given the role of online social networks in modern human interaction (4)(5)(6). Complex networks are massive, high-dimensional, discrete objects, and are challenging to work with in a modeling context. A common method of dealing with this challenge is to construct a low-dimensional Euclidean embedding that tries to capture the structure of the network (see ref. 7 for a recent survey). Formally, we think of the n vertices as vectors v1, v2, . . . , vn ∈ R d , where d is typically constant (or very slowly growing in n). The likelihood of an edge (i, j ) is proportional to (usually a nonnegative monotone function in) vi · vj (8,9). This gives a graph distribution that the observed network is assumed to be generated from.
The most important method to get such embeddings is the singular value decomposition (SVD) or other matrix factorizations of the adjacency matrix (8). Recently, there has also been an explosion of interest in using methods from deep neural networks to learn such graph embeddings (9-12) (refer to ref. 7 for more references). Regardless of the specific method, a key goal in building an embedding is to keep the dimension d small-while trying to preserve the network structure-as the embeddings are used in a variety of downstream modeling tasks such as graph clustering, nearest-neighbor search, and link prediction (13). Yet a fundamental question remains unanswered: To what extent do such low-dimensional embeddings actually capture the structure of a complex network?
These models are often justified by treating the (few) dimensions as "interests" of individuals, and using similarity of interests (dot product) to form edges. Contrary to the dominant view, we argue that low-dimensional embeddings are not good representations of complex networks. We demonstrate mathematically and empirically that they lose local structure, one of the hallmarks of complex networks. This runs counter to the ubiquitous use of SVD in data analysis. The weaknesses of SVD have been empirically observed in recommendation tasks (14)(15)(16), and our result provides a mathematical validation of these findings.
Let us define the setting formally. Consider a set of vectors v1, v2, . . . , vn ∈ R d (denoted by the d × n matrix V ) used to represent the n vertices in a network. Let GV denote the following distribution of graphs over the vertex set [n]. For each index pair i, j , independently insert (undirected) edge (i, j ) with probability max(0, min( vi · vj , 1)). (If vi · vj is negative, (i, j ) is never inserted. If vi · vj ≥ 1, (i, j ) is always inserted.) We will refer to this model as the "embedding" of a graph G, and focus on this formulation in our theoretical results. This is a standard model in the literature, and subsumes the classic Stochastic Block Model (17) and Random Dot Product Model (18,19). There are alternate models that use different functions of the dot product for

Significance
Our main message is that the popular method of lowdimensional embeddings provably cannot capture important properties of real-world complex networks. A widely used algorithmic technique for modeling these networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to common wisdom, we argue that such graph embeddings do not capture salient properties of complex networks. We mathematically prove that low-dimensional embeddings cannot generate graphs with both low average degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. This establishes that popular low-dimensional embedding methods fail to capture significant structural aspects of real-world complex networks. the edge probability, which are discussed in Alternate Models. Matrix factorization is a popular method to obtain such a vector representation: The original adjacency matrix A is "factorized" as V T V , where the columns of V are v1, v2, . . . , vn .
Two hallmarks of real-world graphs are 1) sparsity, where the average degree is typically constant with respect to n, and 2) triangle density, where there are many triangles incident to low-degree vertices (5,(20)(21)(22). The large number of triangles is considered a local manifestation of community structure. Triangle counts have a rich history in the analysis and algorithmics of complex networks. Concretely, we measure these properties simultaneously as follows.
Definition 1. For parameters c > 1 and ∆ > 0, a graph G with n vertices has a (c, ∆)-triangle foundation if there are at least ∆n triangles contained among vertices of degree, at most, c. Formally, let Sc be the set of vertices of degree, at most, c. Then, the number of triangles in the graph induced by Sc is at least ∆n.
Typically, we think of both c and ∆ as constants. We emphasize that n is the total number of vertices in G, not the number of vertices in S (as defined above). Refer to real-world graphs in Table 1. In Fig. 1, we plot the value of c vs. ∆. (Specifically, the y axis is the number of triangles divided by n.) This is obtained by simply counting the number of triangles contained in the set of vertices of degree, at most, c. Observe that, for all graphs, for c ∈ [10, 50], we get a value of ∆ > 1 (in many cases, ∆ > 10).
Our main result is that any embedding of graphs that generates graphs with (c, ∆)-triangle foundations, with constant c, ∆, must have near-linear rank. This contradicts the belief that low-dimensional embeddings capture the structure of real-world complex networks. Theorem 1. Fix c > 4, ∆ > 0. Suppose the expected number of triangles in G ≈ GV that only involve vertices of expected degree c is at least ∆n. Then, the rank of V is at least min(1, poly(∆/c))n/ lg 2 n.
Equivalently, graphs generated from low-dimensional embeddings cannot contain many triangles only on low-degree vertices. We point out an important implication of this theorem for Stochastic Block Models. In this model, each vertex is modeled as a vector in [0, 1] d , where the ith entry indicates the likelihood of being in the ith community. The probability of an edge is exactly the dot product. In community detection applications, d is thought of as a constant, or at least as much smaller than n. On the contrary, Theorem 1 implies that d must be Ω(n/ lg 2 n) to accurately model the low-degree triangle behavior.

Empirical Validation
We empirically validate the theory on a collection of complex networks detailed in Table 1. For each real-world graph, we compute a 100-dimensional embedding through SVD (basically, the top 100 singular vectors of the adjacency matrix). We generate 100 samples of graphs from these embeddings, and compute their c vs. ∆ plot. This is plotted with the true c vs. ∆ plot. (To account for statistical variation, we plot the maximum value of ∆ observed in the samples, over all graphs. The variation observed was negligible.) Fig. 1 shows such a   1. Plots of degree c vs. ∆: For a High Energy Physics coauthorship network, we plot c versus the total number of triangles only involving vertices of degree, at most, c. We divide the latter by the total number of vertices n, so it corresponds to ∆, as in Definition 1. We plot these both for the original graph (in thick blue) and for a variety of embeddings (explained in Alternate Models). For each embedding, we plot the maximum ∆ in a set of 100 samples from a 100-dimensional embedding. The embedding analyzed by our main theorem (TDP) is given in thick red. Observe how the embeddings generate graphs with very few triangles among low-degree vertices. The gap in ∆ for low degree is two to three orders of magnitude. The other lines correspond to alternate embeddings, using the NODE2VEC vectors and/or different functions of the dot product.
plot for a physics coauthorship network. More results are given in SI Appendix.
Note that this plot is significantly off the mark at low degrees for the embedding. Around the lowest degree, the value of ∆ (for the graphs generated by the embedding) is two to three order of magnitude smaller than the original value. This demonstrates that the local triangle structure is destroyed around low-degree vertices. Interestingly, the total number of triangles is preserved well, as shown toward the right side of each plot. Thus, a nuanced view of the triangle distribution, as given in Definition 1, is required to see the shortcomings of low dimensional embeddings.

Alternate Models
We note that several other functions of dot product have been proposed in the literature, such as the softmax function (10,12) and linear models of the dot product (7). Theorem 1 does not have direct implications for such models, but our empirical validation holds for them as well. The embedding in Theorem 1 uses the truncated dot product (TDP) function max(0, min( vi · vj , 1)) to model edge probabilities. We construct other embeddings that compute edge probabilities using machine learning models with the dot product and Hadamard product as features. This subsumes linear models as given in ref. 7. Indeed, the TDP can be smoothly approximated as a logistic function. We also consider (scaled) softmax functions, as in ref. 10 For each of these models (softmax, LRDP, and LRHP), we perform the same experiment described above. Fig. 1 also shows the plots for these other models. Observe that none of them capture the low-degree triangle structure, and their ∆ values are all two to three orders of magnitude lower than the original.
In addition (to the extent possible), we compute vector embeddings from a recent deep learning-based method [node2vec (12)]. We again use all of the edge probability models discussed above, and perform an identical experiment (in Fig. 1, these are denoted by "n2v"). Again, we observe that the low-degree triangle behavior is not captured by these deep learned embeddings.

Broader Context
The use of geometric embeddings for graph analysis has a rich history, arguably going back to spectral clustering (23). In recent years, the Stochastic Block Model has become quite popular in the statistics and algorithms community (17), and the Random Dot Product Graph model is a generalization of this notion [refer to recent surveys (19,24)]. As mentioned earlier, Theorem 1 brings into question the standard uses of these methods to model social networks. The use of vectors to represent vertices is sometimes referred to as latent space models, where geometric proximity models the likelihood of an edge. Although dot products are widely used, we note that some classic latent space approaches use Euclidean distance (as opposed to dot product) to model edge probabilities (25), and this may avoid the lower bound of Theorem 1. Beyond graph analysis, the method of Latent Semantic Indexing also falls in the setting of Theorem 1, wherein we have a low-dimensional embedding of "objects" (like documents), and similarity is measured by dot product (https://en.wikipedia.org/wiki/Latent semantic analysis).

High-Level Description of the Proof
In this section, we sketch the proof of Theorem 1. The sketch provides sufficient detail for a reader who wants to understand the reasoning behind our result, but is not concerned with technical details. We will make the simplifying assumption that all vi have the same length L. We note that this setting is interesting in its own right, since it is often the case, in practice, that all vectors are nonnegative and normalized. In this case, we get a stronger rank lower bound that is linear in n. Dealing with Varying Lengths provides intuition on how we can remove this assumption. The full details of the proof are given in Proof of Theorem 1.
First, we lower-bound L. By Cauchy-Schwartz, vi · vj ≤ L 2 . Let Xi,j be the indicator random variable for the edge (i, j ) being present. Observe that all Xi,j are independent, and E[Xi,j ] = min( vi · vj , 1) ≤ L 2 .
The expected number of triangles in G ≈ GV is [3] Note that j E[Xi,j ] = E[ j Xi,j ] is, at most, the degree of i, which is, at most, c. (Technically, the Xi,i term creates a selfloop, so the correct upper bound is c + 1. For the sake of cleaner expressions, we omit the additive +1 in this sketch.) The expected number of triangles is at least ∆n. Plugging these bounds in, Thus, the vectors have a length of at least √ ∆/c. Now, we lowerbound the rank of V . It will be convenient to deal with the Gram matrix M = V T V , which has the same rank as V . Observe that Mi,j = vi · vj ≤ L 2 . We will use the following lemma stated first by Swanapoel (26), but which has appeared in numerous forms previously Lemma 1 (Rank lemma). Consider any square matrix M ∈ R n×n . Then Note that Mi,i = vi · vi = L 2 , so the numerator | i Mi,i | 2 = n 2 L 4 . The denominator requires more work. We split it into two terms.
vi · vj ≤ cn. [5] If, for i =j , vi · vj > 1, then (i, j ) is an edge with probability 1. Thus, there can be, at most, (c − 1)n such pairs. Overall, there are, at most, cn pairs such that vi · vj > 1. So, Overall, we lower-bound the denominator in the rank lemma by cn(L 4 + 1).
We plug these bounds into the rank lemma. We use the fact that f (x ) = x /(1 + x ) is decreasing for positive x , and Dealing with Varying Lengths. The math behind Eq. 4 still holds with the right approximations. Intuitively, the existence of at least ∆n triangles implies that a sufficiently large number of vectors have a length of at least √ ∆/c. On the other hand, these long vectors need to be "sufficiently far away" to ensure that the vertex degrees remain low. There are many such long vectors, and they can only be far away when their dimension/rank is sufficiently high.
The rank lemma is the main technical tool that formalizes this intuition. When vectors are of varying length, the primary obstacle is the presence of extremely long vectors that create triangles. The numerator in the rank lemma sums Mi,i , which is the length of the vectors. A small set of extremely long vectors could dominate the sum, increasing the numerator. In that case, we do not get a meaningful rank bound.
But, because the vectors inhabit low-dimensional space, the long vectors from different clusters interact with each other. We prove a "packing" lemma (Lemma 5) showing that there must be many large positive dot products among a set of extremely long vectors. Thus, many of the corresponding vertices have large degree, and triangles incident to these vertices do not contribute to low-degree triangles. Operationally, the main proof uses the packing lemma to show that there are few long vectors. These can be removed without affecting the low-degree structure. One can then perform a binning (or "rounding") of the lengths of the remaining vectors, to implement the proof described in the above section.

Proof of Theorem 1
For convenience, we restate the setting. Consider a set of vectors v1, v2, . . . , vn ∈ R d , that represent the vertices of a social network. We will also use the matrix V ∈ R d×n for these vectors, where each column is one of the vi . Abusing notation, we will use V to represent both the set of vectors and the matrix. We will refer to the vertices by the index in [n].
Let GV denote the following distribution of graphs over the vertex set [n]. For each index pair i, j , independently insert (undirected) edge (i, j ) with probability max(0, min( vi · vj , 1)).
The Basic Tools. We now state some results that will be used in the final proof. Lemma 2 is an existing result. For all other statements, the proofs are provided in SI Appendix. (26)] Consider any square matrix A ∈ R n×n . Then
Recall that an independent set is a collection of vertices that induce no edge. Lemma 4. Any graph with h vertices and maximum degree b has an independent set of at least h/(b + 1).
A key component of dealing with arbitrary-length vectors is the following dot product lemma. This is inspired by results of Alon (27) and Tao (28), who get a stronger lower bound of 1/ √ d for absolute values of the dot products.
Lemma 5. Consider any set of 4d unit vectors u1, u2, . . . , u 4d in R d . There exists some i =j such that ui · uj ≥ 1/4d . The Main Argument. We prove by contradiction. We assume that the expected number of triangles contained in the set of vertices of expected degree, at most, c is at least ∆n. We remind the reader that n is the total number of vertices. For convenience, we simply remove the vectors corresponding to vertices with expected degree of at least c. LetV be the matrix of the remaining vectors, and we focus on GV . The expected number of triangles in G ≈ GV is at least ∆n.
The overall proof can be thought of in three parts. Part 1, remove extremely long vectors: Our final aim is to use the rank lemma (Lemma 2) to lower bound the rank of V . The first problem we encounter is that extremely long vectors can dominate the expressions in the rank lemma, and we do not get useful bounds. We show that the number of such long vectors is extremely small, and they can be removed without affecting too many triangles. In addition, we can also remove extremely small vectors, since they cannot participate in many triangles. Part 2, find a "core" of sufficiently long vectors that contains enough triangles: The previous step gets a "cleaned" set of vectors. Now, we bucket these vectors by length. We show that there is a large bucket, with vectors that are sufficiently long, such that there are enough triangles contained in this bucket.
Part 3, apply the rank lemma to the "core": We now focus on this core of vectors, where the rank lemma can be applied. At this stage, the mathematics shown in High-Level Description of the Proof can be carried out almost directly. Now for the formal proof. For the sake of contradiction, we assume that d = rank(V ) < α(∆ 4 /c 9 ) · n/ lg 2 n (for some sufficiently small constant α > 0).

Part 1: Removing extremely long (and extremely short) vectors
We begin by showing that there cannot be many long vectors inV . Formally, for any edge (i, j ) in H , vi · vj ≥ vi 2 vj 2 /4n ≥ (2 √ n) 2 /4n = 1. So (i, j ) is an edge with probability 1 in G ≈ GV . The degree of any vertex in H is, at most, c. By Lemma 4, H contains an independent set I of a size of at least 5cd /(c + 1) ≥ 4d . Consider an arbitrary sequence of 4d (normalized) vectors in I u1, . . . , u 4d . Applying Lemma 5 to this sequence, we deduce the existence of (i, j ) in I (i =j ) such that v i v i 2 · v j v j 2 ≥ 1/4d ≥ 1/4n. Then, the edge (i, j ) should be present in H , contradicting the fact that I is an independent set.
Denote by V the set of all vectors inV with length in the range [n −2 , 2 √ n]. Proposition 2. The expected degree of every vertex in G ≈ G V is, at most, c, and the expected number of triangles in G is at least ∆n/2.
Proof. Since removal of vectors can only decrease the degree, the expected degree of every vertex in G V is, naturally, at most, c. It remains to bound the expected number of triangles in G ≈ G V . By removing vectors in V \ V , we potentially lose some triangles. Let us categorize them into those that involve at least one "long" vector (length ≥ 2 √ n) and those that involve at least one "short" vector (length ≤ n −2 ) but no long vector.
We start with the first type. By Lemma 6, there are, at most, 5cd long vectors. For any vertex, the expected number of triangles incident to that vertex is, at most, the expected square of the degree. By Proposition 1, the expected degree squares is, at most, c + c 2 ≤ 2c 2 . Thus, the expected total number of triangles of the first type is, at most, 5cd × 2c 2 ≤ ∆n/ lg 2 n.
Consider any triple of vectors ( u, v , w ) where u is short and neither of the others are long. The probability that this triple forms a triangle is, at most, Summing over all such triples, the expected number of such triangles is, at most, 4. Thus, the expected number of triangles in G ≈ G V is at least ∆n − ∆n/ lg 2 n − 4 ≥ ∆n/2.

Part 2: Finding core of sufficiently long vectors with enough triangles
For any integer r , let Vr be the set of vectors { v ∈ V | v 2 ∈ [2 r , 2 r +1 )}. Observe that the Vr form a partition of V . Since all lengths in V are in the range [n −2 , 2 √ n], there are, at most, 3 lg n nonempty Vr . Let R be the set of indices r such that |Vr | ≥ (∆/60c 2 )(n/ lg n). Furthermore, let V be r ∈R Vr . Proposition 3. The expected number of triangles in G ≈ G V is at least ∆n/8.
Proof. The total number of vectors in r / ∈R Vr is, at most, 3 lg n × (∆/60c 2 )(n/ lg n) ≤ (∆/20c 2 )n. By Proposition 1 and linearity of expectation, the expected sum of squares of degrees of all vectors in r / ∈R Vr is, at most, (d + c 2 ) × (∆/20c 2 )n ≤ ∆n/10. Since the expected number of triangles in G ≈ G V is at least ∆n/2 (Proposition 2) and the expected number of triangles incident to vectors in V \ V is, at most, ∆n/10, the expected number of triangles in G ≈ G V is at least ∆n/2 − ∆n/10 ≥ ∆n/8.
We now come to an important proposition. Because the expected number of triangles in G ≈ G V is large, we can prove that V must contain vectors of at least constant length. Proposition 4. maxr∈R 2 r ≥ √ ∆/4c. Proof. Suppose not. Then every vector in V has a length of, at most, √ ∆/4c. By Cauchy-Schwartz, for every pair u, v ∈ V , u · v ≤ ∆/16c 2 . Let I denote the set of vector indices in V (this corresponds to the vertices in G ≈ G V ). For any two vertices i =j ∈ I , let Xi,j be the indicator random variable for edge (i, j ) being present. The expected number of triangles incident Observe that E[X j ,k ] is, at most, vj · v k ≤ ∆/16c 2 . Furthermore, i ] (recall that Di is the degree of vertex i). By Proposition 1, this is, at most, c + c 2 ≤ 2c 2 . The expected number of triangles in G ≈ G V is, at most, n × 2c 2 × ∆/16c 2 = ∆n/8. This contradicts Proposition 3.
Part 3: Applying the rank lemma to the core We are ready to apply the rank bound of Lemma 2 to prove the final result. The following lemma contradicts our initial bound on the rank d , completing the proof. We will omit some details in the following proof, and provide a full proof in SI Appendix. Lemma 7. rank(V ) ≥ (α∆ 4 /c 9 )n/ lg 2 n. Proof. It is convenient to denote the index set of V be I . Let M be the Gram matrix (V ) T (V ); so, for i, j ∈ I , Mi,j = vi · vj . By Lemma 2, rank(V ) = rank(M ) ≥ ( i∈I Mi,i ) 2 / i,j ∈I |Mi,j | 2 . Note that Mi,i is vi 2 2 , which is at least 2 2r for vi ∈ Vr . Let us denote maxr∈R 2 r by L, so all vectors in V have a length of, at most, 2L. By Cauchy-Schwartz, all entries in M are, at most, 4L 2 .
We lower-bound the numerator.
A series of technical calculations are needed to upper-bound the denominator, i,j ∈I |Mi,j | 2 . These details are provided in SI Appendix. The main upshot is that we can prove i,j ∈I |Mi,j | 2 ≤ 128cn(1 + L 4 ).

Details of Empirical Results
Data Availability. The datasets used are summarized in Table 1. We present here four publicly available datasets from different domains. The ca-HepPh is a coauthorship network, Facebook is a social network, and cit-HepPh is a citation network, all obtained from the SNAP graph database (29). The String hs dataset is a protein-protein interaction network obtained from ref. 30. (The citations provide the link to obtain the corresponding datasets.) We first describe the primary experiment, used to validate Theorem 1 on the SVD embedding. We generated a ddimensional embedding for various values of d using the SVD. Let G be a graph with the n × n (symmetric) adjacency matrix A, with eigendecomposition ΨΛΨ T . Let Λ d be the matrix with the d × d diagonal matrix with the d largest magnitude eigenvalues of A along the diagonal. Let Ψ d be the n × d matrix with the corresponding eigenvectors as columns. We compute the matrix d and refer to this as the d spectral embedding of G. This is the standard principal components analysis (PCA) approach.
From the spectral embeddings, we generate a graph from A d by considering every pair of vertices (i, j ) and generate a random value in [0, 1]. If the (i, j )th entry of A d is greater than the random value generated, the edge is added to the graph. Otherwise, the edge is not present. This is the same as taking A d and setting all negative values to 0 and all values >1 to 1 and performing Bernoulli trials for each edge with the resulting probabilities. In all of the figures, this is referred to as the "SVD TDP" embedding.
Triangle Distributions. To generate Figs. 1 and 2, we calculated the number of triangles incident to vertices of different degrees in both the original graphs and the graphs generated from the embeddings. Each of the plots shows the number of triangles in the graph on the vertical axis and the degrees of vertices on the horizontal axis. Each curve corresponds to some graph, and each point (x , y) in a given curve shows that the graph contains y triangles if we remove all vertices with a degree of at least x . We then generate 100 random samples from the 100-dimensional embedding, as given by SVD (described above). For each value of c, we plot the maximum value of ∆ over all of the samples. This is to ensure that our results are not affected by statistical variation (which was quite minimal).
Alternate Graph Models. We consider three other functions of the dot product, to construct graph distributions from the vector embeddings. Details on parameter settings and the procedure used for the optimization are given in SI Appendix. LRDP. We consider the probability of an edge (i, j ) to be the logistic function L(1 + exp(−k ( vi · vj − x0))) −1 , where L, k , x0 are parameters. Observe that the range of this function is [0, 1], and hence can be interpreted as a probability. We tune these parameters to fit the expected number of edges, to the true number of edges. Then, we proceed as in the TDP experiment. Fig. 2. Plots of degree c vs. ∆: For each network, we plot c versus the total number of triangles only involving vertices of degree of, at most, c. We divide the latter by the number of vertices, so it corresponds to ∆, as in the main definition. In each plot, we plot these for both the original graph and the maximum ∆ in a set of 100 samples from a 100-dimensional embedding. Observe how the embeddings generate graphs with very few triangles among low-degree vertices. The gap in ∆ for low degree is two to three orders of magnitude in all instances. We note that the TDP can be approximated by a logistic function, and thus the LRDP embedding is a "closer fit" to the graph than the TDP embedding. LRHP. This is inspired by linear models used on low-dimensional embeddings (7). Define the Hadamard product vi vj to be the d -dimensional vector where the r th coordinate is the product of the r th coordinates. We now fit a logistic function over linear functions of (the coordinates of) vi vj . This is a significantly richer model than the previous model, which uses a fixed linear function (sum). Again, we tune parameters to match the number of edges. Softmax. This is inspired by low-dimensional embeddings for random walk matrices (10,12). The idea is to make the probability of edge (i, j ) proportional to softmax, exp( vi · vj )/ k ∈[n] vi · v k . This tends to push edge formation even for slightly higher dot products, and one might imagine this helps triangle formation. We set the proportionality constant separately for each vertex to ensure that the expected degree is the true degree. The probability matrix is technically undirected, but we symmetrize the matrix. node2vec experiments. We also applied NODE2VEC, a recent deep learning-based graph embedding method (12), to generate vector representations of the vertices. We use default parameters to run NODE2VEC. (More details are provided in SI Appendix.) The NODE2VEC algorithm tries to model the random walk matrix associated with a graph, not the raw adjacency matrix. The dot products between the output vectors vi · vj are used to model the random walk probability of going from i to j , rather than the presence of an edge. It does not make sense to apply the TDP function to these dot products, since this will generate (in expectation) only n edges (one for each vertex). We apply the LRDP or LRHP functions, which use the NODE2VEC vectors as inputs to a machine learning model that predicts edges. In Figs. 1 and 2, we show results for all of the datasets. We note that, for all datasets and all embeddings, the models fail to capture the low-degree triangle behavior.
Degree Distributions. We observe that the low-dimensional embeddings obtained from SVD and TDP can capture the degree distribution accurately. In Fig. 3, we plot the degree distribution (in loglog scale) of the original graph with the expected degree distribution of the embedding. For each vertex i, we can compute its expected degree by the sum i pij , where pij is the probability of the edge (i, j ). In all cases, the expected degree distribution is close to the true degree distributions, even for lower degree vertices. The embedding successfully captures the "first-order" connections (degrees), but not the higher-order connections (triangles). We believe that this reinforces the need to look at the triangle structure to discover the weaknesses of low-dimensional embeddings.