Network structure of production
Edited by Lars Peter Hansen, University of Chicago, Chicago, IL, and approved February 2, 2011 (received for review October 15, 2010)

Abstract
Complex social networks have received increasing attention from researchers. Recent work has focused on mechanisms that produce scale-free networks. We theoretically and empirically characterize the buyer–supplier network of the US economy and find that purely scale-free models have trouble matching key attributes of the network. We construct an alternative model that incorporates realistic features of firms’ buyer–supplier relationships and estimate the model’s parameters using microdata on firms’ self-reported customers. This alternative framework is better able to match the attributes of the actual economic network and aids in further understanding several important economic phenomena.
Firms’ interconnections through buyer–supplier relationships affect economic phenomena ranging from the spread of innovative ideas (1) to the transmission of economic shocks (2) to trade patterns (3). Recognizing this, economists have started to pay explicit attention to firm network structures (refs. 4 and 5 and the studies discussed in ref. 6). However, no one has theoretically or empirically characterized the actual firm network structure in any large economy. Here, we establish basic features of the buyer–supplier network of firms in the United States and develop a model of firm birth, death, and input–output link formation that closely replicates the observed network.
Earlier research modeled the formation and structure of complex social networks more broadly. Examples include links on the worldwide web (7), job-search networks (8), and friendships (9); refs. 10 and 11 have recent surveys. Much of this recent work was spurred by seminal work (7) documenting the scale-free nature of many networks. We show, however, that scale-free network models miss important elements of the US economy’s firm network. In particular, the fat-tail nature of scale-free networks overstates the connectivity of the economy’s most central vertices—that is, the most vertically interconnected firms. At the same time, it overpredicts the number of minimally connected firms.
We propose an alternative model of network formation that better matches the connectivity distribution of US firms. Following the model in ref. 12, our model adds processes for vertex (firm) death and reattachment of those edges (buyer–supplier relationships) among surviving firms. It also allows new edges to be formed through a mix of the preferential attachment mechanisms emblematic of scale-free network models (where new edges are more likely to be formed with vertices that already have more edges) and random attachment (similar to that in ref. 13). Although these extensions are sparsely parameterized, they considerably extend the ability of network formation models to match observed firm network structures. Importantly, they also embody realistic features of the actual firm network: firms often go out of business, and many suppliers actively prefer to work with less-connected downstream firms because of product specialization and long-term contracting issues. We estimate our model’s parameters using microdata on firms’ self-reported buyer–supplier links. This approach shows that the model, despite being estimated using variation at the micro level, is able to closely match the macro distribution of firm connectedness. Using the model, we can predict economic phenomena such as the transmission of economic shocks throughout the network.
Modeling the In-Degree Distribution
Denote by N(t) the number of vertices, which represent firms, in the network at time t. Each vertex has an in-degree, k; these k edges represent links with each of the firm’s suppliers. Let n(k, t) denote the number of vertices of in-degree k at time t, so that ∑_k n(k, t) = N(t). Let m(t) be the average number of customers (or suppliers) per firm. At each t, three distinct processes act to change the network structure.
i) Death of existing firms. Firms uniformly and permanently exit the network with probability q.* This results in the destruction of q(2 − q)N(t)m(t) edges.† Of these destroyed edges, q(1 − q)N(t)m(t) have the receiving vertex survive to the next period.
ii) Rewiring of surviving firms. q(1 − q)N(t)m(t) of the edges that were destroyed because of firm death are reformed among surviving vertices as firms attempt to replace existing customers. We assume a fraction r of these rewired edges is allocated uniformly across the surviving vertices, that is, with probability 1/[(1 − q)N(t)] for each. The remaining fraction 1 − r is allocated by preferential attachment: a vertex with k surviving edges receives a rewired edge from another surviving firm with probability k/[(1 − q)²N(t)m(t)], the vertex’s share of surviving edges in the network.
iii) Birth of new firms. (g + q)N(t) new vertices enter the network, each forming m(t) edges. A fraction δ of these edges extends to existing firms. Of these δ(q + g)N(t)m(t) edges, a fraction 1 − r is allocated by a preferential attachment rule and the remaining fraction r is allocated uniformly across the existing vertices. Finally, the other fraction 1 − δ of the (q + g)N(t)m(t) new edges is assumed to be distributed uniformly and independently among the (q + g)N(t) new firms that entered at the same time. Note that, because q is the average probability of vertex death, g is the net average growth rate of the number of vertices in the network.
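The three processes above can be sketched as a small simulation. This is a minimal illustration rather than the authors' procedure: the function name `simulate_step`, the (supplier, customer) edge representation, the rule that an edge is rewired when its supplier survives and its customer exits, and the rounding of the non-integer m(t) are all assumptions made here for concreteness.

```python
import random

def simulate_step(firms, edges, next_id, q, g, m, r, delta, rng):
    """Advance the network one period; edges are (supplier, customer) pairs."""
    # i) Death: each firm exits independently with probability q.
    survivors = [f for f in firms if rng.random() >= q]
    alive = set(survivors)
    kept = [(s, c) for s, c in edges if s in alive and c in alive]
    # Assumption: an edge is rewired when the supplier survives but its customer exits.
    orphaned = [s for s, c in edges if s in alive and c not in alive]

    def pick_existing(current_edges):
        # Uniform with probability r; otherwise preferential, i.e. proportional to
        # in-degree (drawing a random edge and taking its customer achieves this).
        if rng.random() < r or not current_edges:
            return rng.choice(survivors)
        return rng.choice(current_edges)[1]

    # ii) Rewiring among survivors (self-links are not excluded in this sketch).
    rewired = [(s, pick_existing(kept)) for s in orphaned] if survivors else []
    base = kept + rewired

    # iii) Birth: (g + q) N entrants forming m edges each on average.
    n_new = round((g + q) * len(firms))
    entrants = list(range(next_id, next_id + n_new))
    born = []
    for _ in range(round(n_new * m)):  # m need not be an integer, so round the total
        supplier = rng.choice(entrants)
        if survivors and rng.random() < delta:
            customer = pick_existing(base)      # share delta goes to existing firms
        else:
            customer = rng.choice(entrants)     # share 1 - delta stays among entrants
        born.append((supplier, customer))
    return survivors + entrants, base + born, next_id + n_new

# Illustrative run with the parameter values reported in the Estimation section.
rng = random.Random(0)
firms = list(range(200))
edges = [(rng.choice(firms), rng.choice(firms)) for _ in range(212)]
next_id = 200
for _ in range(20):
    firms, edges, next_id = simulate_step(
        firms, edges, next_id, q=0.24, g=0.04, m=1.06, r=0.18, delta=0.75, rng=rng)
```

Counting how often each firm appears as a customer in `edges` after many periods gives a simulated in-degree distribution that can be compared with the analytical one.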
The structure of a network with these growth and decay features, in which edges and nodes appear and disappear probabilistically, can be approximated by the following partial differential equation (12) (Eq. 1):
∂n(k, t)/∂t + ∂[γ(k, t)n(k, t)]/∂k = (g + q)N(t)β(k, t) − q n(k, t).
Here, γ(k, t) is the in-degree growth rate of a vertex, and β(k, t) is the in-degree distribution of entering vertices. We derive these expressions below. This mass balance equation says that the internally accumulated change in the network’s in-degree structure must equal the net change caused by birth and death.
The expressions β(k, t) and γ(k, t) are determined as follows. Recall that (q + g)N(t)(1 − δ)m(t) new edges are distributed uniformly among the (q + g)N(t) new firms at t. Because these edges are allocated independently, the in-degree distribution for an entering vertex is binomial, with mean (1 − δ)m(t). To obtain a continuous approximation to this distribution, we use the exponential density with the same mean, β(k, t) = exp{−k/[(1 − δ)m(t)]}/[(1 − δ)m(t)].
Each period, the in-degree of a vertex can change in one of three ways. It can lose edges because of the exit of other vertices, receive new edges from existing vertices through the rewiring process, or form edges with new vertices. Putting the three processes together, a vertex of in-degree k adds, on average, γ(k, t) edges per time step.
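One way to assemble these three contributions is sketched below. It is a reconstruction under stated assumptions, not the paper's own expression: the rewired edges are spread over the (1 − q)N(t) survivors, whereas the entrants' edges are assumed, as a simplification, to be spread over N(t) vertices holding N(t)m(t) edges. The exact expression may therefore differ by factors of (1 − q).

```latex
\begin{align*}
\text{loss from supplier exit:} &\quad -qk \\
\text{rewired edges:} &\quad q(1-q)Nm\left[\frac{r}{(1-q)N} + (1-r)\frac{(1-q)k}{(1-q)^{2}Nm}\right]
  = q\,[rm + (1-r)k] \\
\text{edges from entrants:} &\quad \delta(q+g)Nm\left[\frac{r}{N} + (1-r)\frac{k}{Nm}\right]
  = \delta(q+g)\,[rm + (1-r)k] \\
\gamma(k,t) &\approx \bigl[\delta(q+g)(1-r) - qr\bigr]\,k + \bigl[q + \delta(q+g)\bigr]\,rm
\end{align*}
```

Under these assumptions γ is linear in k, with a slope that falls as r rises and as δ falls.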
Let p(k, t) = n(k, t)/N(t) be the density of firms with in-degree k at time t. Dividing Eq. 1 by N(t) and rearranging gives Eq. 2.
We want to solve for the stationary distribution of p(k, t). Letting p(k) denote the limit of p(k, t) as t → ∞ and substituting our expressions for γ(k, t) and β(k, t) into Eq. 2 yields Eq. 3.
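The step from the mass balance equation to the stationary condition can be sketched as follows. The sketch assumes that Eq. 1 has the usual continuity form, with entry flow (g + q)N(t)β(k, t) and exit flow q n(k, t), and that N(t) grows at the net rate g, so that dN/dt = gN(t) in the continuous approximation. Both forms are inferred from the surrounding definitions.

```latex
\begin{align*}
\frac{\partial n}{\partial t} + \frac{\partial(\gamma n)}{\partial k}
  &= (g+q)N\beta - qn, \qquad n = Np \\
N\frac{\partial p}{\partial t} + gNp + N\frac{\partial(\gamma p)}{\partial k}
  &= (g+q)N\beta - qNp \\
\frac{\partial p}{\partial t} + \frac{\partial(\gamma p)}{\partial k}
  &= (g+q)\,\bigl[\beta - p\bigr] \\
\frac{d\,[\gamma(k)\,p(k)]}{dk}
  &= (g+q)\,\bigl[\beta(k) - p(k)\bigr] \qquad \text{(stationary case, } \partial p/\partial t = 0\text{)}
\end{align*}
```

The last line is a first-order linear ordinary differential equation in p(k) once γ(k) and β(k) are substituted.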
The solution to Eq. 3 (Eq. 4) can be expressed in terms of Γ, the upper incomplete gamma function.‡
It is useful to compare this model to the predicted in-degree distribution of a pure preferential attachment model, as in ref. 7. There, the cumulative distribution function of vertex in-degree, F(k), is Pareto, so the slope of log(1 − F(k)) vs. log(k) is constant. Departures from a linear relationship in our model occur when δ decreases or r increases. Intuitively, for smaller δ or larger r, a larger fraction of the edges is allocated to vertices independent of the vertices’ in-degrees.
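The constant-slope property is easy to check numerically. The sketch below draws synthetic Pareto values with 1 − F(k) = 1/k (illustrative data, not the Compustat sample), builds the empirical complementary distribution, and fits a least-squares line in log-log space. The helper names are hypothetical, and P(K ≥ k) is used in place of 1 − F(k) so that the logarithm is defined at the largest observation.

```python
import math
import random

def empirical_ccdf(values):
    """Return sorted (k, P(K >= k)) pairs for strictly positive values."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]

def loglog_slope(points):
    """Ordinary least-squares slope of log(ccdf) on log(k)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(p) for _, p in points]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

rng = random.Random(0)
# Inverse-transform draws from a Pareto law with 1 - F(k) = 1/k for k >= 1.
sample = [1.0 / (1.0 - rng.random()) for _ in range(20000)]
slope = loglog_slope(empirical_ccdf(sample))
print(f"fitted log-log slope: {slope:.3f}")  # should be close to -1
```

Applying the same two helpers to observed in-degrees would show curvature as a slope that changes across ranges of k.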
We will use our microdata on buyer–supplier relationships to estimate the model’s parameters, solve for the implied steady-state in-degree distribution using Eq. 4, and compare the result with the observed distribution in the data.
Empirical Approach and Results
Data.
We estimate the parameters of our model using yearly firm-level data from the Compustat database. These data contain accounting and operations information compiled from publicly listed firms’ financial disclosures. Our firm panel spans 1979 to 2007 and contains over 39,000 firm-year observations. The longitudinal nature of the data lets us track individual firms’ operations over time. Critically for our use here, Compustat contains firms’ own reports of their major customers in accordance with Financial Accounting Standards No. 131. A major customer is defined as a firm that accounts for more than 10% of the reporting seller’s revenue, although firms sometimes also report customers that account for less than this. Although this reporting threshold obviously truncates the number of edges that we can identify downstream of a firm, the reports allow us to compile much more comprehensive lists of firms’ suppliers and, through this, a measure of each firm’s degree of connectedness in the network.
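As an illustration of how customer reports translate into in-degrees, the sketch below counts, for each firm, the number of distinct suppliers that name it as a customer in a given year. The record layout and firm names are hypothetical and do not reflect the actual Compustat file structure.

```python
from collections import Counter

def in_degrees(reports):
    """Map each firm to its in-degree given (supplier, customer) reports for one year."""
    unique_links = set(reports)  # a supplier naming the same customer twice counts once
    counts = Counter(customer for _, customer in unique_links)
    all_firms = {firm for link in unique_links for firm in link}
    return {firm: counts.get(firm, 0) for firm in all_firms}

# Hypothetical reports: each pair is (reporting supplier, named major customer).
reports_2006 = [
    ("SupplierA", "RetailerX"),
    ("SupplierB", "RetailerX"),
    ("SupplierB", "AutomakerY"),
    ("SupplierA", "RetailerX"),  # duplicate report, ignored
]
degrees = in_degrees(reports_2006)
share_zero = sum(1 for d in degrees.values() if d == 0) / len(degrees)
```

In this toy example `RetailerX` has in-degree 2, and the two suppliers, which nobody names as a customer, have in-degree 0.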
In SI Text, we show that the truncation issue does not affect the shape of the in-degree distribution; we argue that the probability that an edge is observed is similar for edges with a large or small receiving firm. Therefore, for firms that appear as customers in our dataset, the fraction of edges that we miss because of the 10% rule is similar for low and high in-degree firms.
Table 1 shows the 10 most connected firms in our data for the two 5-y intervals at each end of our sample period. The results are intuitive. The early period is dominated by large manufacturers like the Big Three automakers, Boeing, and McDonnell Douglas; the conglomerate GE; large retailers like Sears and JCPenney; and AT&T. By the end of the sample, the shift of US economic activity away from manufacturing and to services (and particularly, health services) during the past several decades is apparent. The Big Three are still in the 10 most connected firms, although at a lower rank. The most connected retailers have changed to Wal-Mart, Home Depot, and Target; Hewlett-Packard is now the most central technology company; and medical goods and service providers Cardinal Health, AmerisourceBergen, and McKesson have entered the top 10.
Table 1. Top 10 firms from 1979 to 1983 and from 2003 to 2007
These basic patterns reassure us that our measures of firms’ connectedness are meaningful. That said, there are some limitations to the Compustat dataset, primarily that it contains only publicly listed firms and that different firms do not follow uniform listing criteria for their buyers. However, listed firms account for a very large share of private sector gross domestic product and span virtually every sector of the US economy, a span of coverage that few datasets can match.
Estimation.
Our model has five parameters: q, m, r, δ, and g. We use our microdata on buyer–supplier links to estimate their values in the US firm network. Four parameters can be measured directly in the data. The vertex exit rate, q, is 0.24. The average number of edges per vertex, m, is 1.06. The fraction of edges connecting new vertices to previously existing firms, δ, is 0.75, and the average growth rate of the number of vertices in the network, g, is 0.04. The remaining parameter is r, the fraction of edges that are assigned across existing vertices with uniform probability rather than through preferential attachment. This is not directly observable in the data: we can see which links are formed but cannot directly observe their ex ante probability of being assigned to a particular vertex. However, our model gives an expression for the expected probability that a vertex of in-degree k received a particular link from another surviving vertex, an observable event. We use this probability expression to estimate r using maximum likelihood. We find that r = 0.18.§
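A maximum likelihood estimate of this kind can be sketched as follows. The sketch assumes that the probability of a rewired link landing on a given vertex is the mixture implied by the rewiring step: r times a uniform share plus (1 − r) times the vertex's share of surviving edges. The function names, the ternary search, and the synthetic data are all illustrative choices made here, not the authors' estimation code.

```python
import math
import random
from collections import Counter

def log_likelihood(r, degree_counts, n_vertices, n_edges):
    """Log-likelihood of observed link recipients, grouped by recipient in-degree."""
    return sum(count * math.log(r / n_vertices + (1 - r) * k / n_edges)
               for k, count in degree_counts.items())

def estimate_r(hit_degrees, n_vertices, n_edges, iters=60):
    """Ternary search for the maximizer; valid because the log-likelihood is concave in r."""
    counts = Counter(hit_degrees)
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if (log_likelihood(a, counts, n_vertices, n_edges)
                < log_likelihood(b, counts, n_vertices, n_edges)):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

# Synthetic check: generate link recipients with a known r and recover it.
rng = random.Random(1)
degrees = [0] * 400 + [2] * 90 + [32] * 10   # hypothetical surviving in-degrees
n_vertices, n_edges = len(degrees), sum(degrees)
true_r = 0.18
hits = []
for _ in range(5000):
    if rng.random() < true_r:
        hits.append(rng.choice(degrees))                         # uniform allocation
    else:
        hits.append(rng.choices(degrees, weights=degrees)[0])    # preferential allocation
r_hat = estimate_r(hits, n_vertices, n_edges)
```

Identification here comes mainly from links that land on vertices with few or no surviving edges, which preferential attachment alone would almost never select.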
Substituting these parameter estimates into Eq. 4 gives us the model’s prediction of the in-degree distribution of the US firm network.
Results.
Fig. 1A overlays the distribution predicted by the preferential attachment model in ref. 7,¶ on the empirical distribution. The line drawn has a slope of −1, with the intercept chosen to provide the best fit to the data. The Pareto distribution predicted by the model has more mass in the right tail than does the actual network: the most central firms in the network (e.g., Wal-Mart, GE, and Cardinal Health) have fewer buyer–supplier links than the model would predict. Furthermore, the Pareto distribution overpredicts the mass of firms that have low in-degrees.‖ Both of these deviations from the actual distribution are potentially important for evaluating the importance of firm interconnectedness. The roles of the most central firms are, of course, the focus of much research. For example, refs. 16–18 study whether—depending on the structure of the network—a shock to one financial institution can cause a systemic crisis. Although the less-connected firms are individually less critical to the operation of the production network, their sheer joint mass makes them an important aggregate force as well.
Fig. 1. Model fit. (A) Preferential attachment model, with data for 2005 (squares) and 2006 (triangles). (B) Model of section two (dashed line) and preferential attachment model (solid line), with data for 2005 (squares) and 2006 (triangles).
Fig. 1B adds the predicted in-degree distribution from our model. Its features introduce a curvature in the relationship between log(1 − F(k)) and log(k) that fits the data better than the linear relationship of the standard preferential attachment model. Our model has a direct departure from the preferential attachment mechanism in that a fraction r of the rewired edges and a fraction 1 − δ + δr of the edges from entering vertices are allocated uniformly across existing vertices. The possibility that not all edges are allocated on the “big get bigger” basis of the preferential attachment mechanism helps capture this curvature. As discussed in the Introduction, this departure from preferential attachment captures realistic features of buyer–supplier networks in an economy where product specialization and vertical contracting considerations may reward tight connections between small numbers of vertically connected firms.
We note that we estimate the model’s parameters from the relationships existing within the microdata and then use the model to project the implications of these parameters out to the cumulative distribution function for the network. We do not simply choose the parameters to find the best-fit curve to the cumulative distribution function. Thus, the model is consistent with both the micro- and macroattributes of the buyer–supplier network.
Our model still preserves the feature of pure preferential attachment models that the probability that a firm adds new suppliers is positively and significantly related to its number of vertical links with existing firms. We verify that this property holds in the data using a logistic regression, where the dependent variable is the probability that a new link forms between two vertices in a given year (the full regression results are available in SI Text). A 1-SD increase in the previous in-degree of vertex j is associated with a 0.03% increase in the probability that a new link forms from vertex i to vertex j (the unconditional probability that a new link forms to vertex j from vertex i is 0.26%). However, these results indicate that many other factors affect the probability that two firms are linked. Firms in the same industry are more likely to be linked to one another, and firms that are geographically close to one another are more likely to be linked to one another. The influence of these other factors is not accounted for in a pure preferential attachment model, and this could be one reason why such models miss important empirical features of the observed buyer–supplier network.
One useful application of mapping the network structure is that it facilitates assessment of the US production system’s vulnerability to shocks. Taking as motivation the recent turmoil in the US automotive industry, we consider the effects of a negative shock to the Big Three auto manufacturers. Fig. 2 shows the 2006 firm network, with the Big Three in red, their immediate suppliers in orange, and all other firms in gray.** The Big Three were responsible for $82 billion in purchases from their suppliers in 2006. Assuming a 45% drop in the Big Three’s purchases (commensurate with their drop in unit sales during 2007–2009), these immediate suppliers would suffer a short-run loss of business of $37 billion. The network map indicates that this immediate spillover impact would affect a substantial but not overwhelming portion of the production network.
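The first-round loss follows directly from the two figures quoted above:

```latex
\[
0.45 \times \$82\ \text{billion} = \$36.9\ \text{billion} \approx \$37\ \text{billion}
\]
```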
Fig. 2. Buyer–supplier network in 2006. GM, Ford, and Chrysler are colored red. Their suppliers are colored orange. All other firms are gray.
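The figure is restricted to the giant weakly connected component of the network (see the footnotes). A minimal way to extract such a component from a list of directed links is a union-find pass that ignores edge direction. The function name and the toy edge list below are hypothetical.

```python
def giant_weak_component(edges):
    """Return the largest set of firms connected to one another, ignoring edge direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Merge the components of the two endpoints of every edge.
    for supplier, customer in edges:
        root_s, root_c = find(supplier), find(customer)
        if root_s != root_c:
            parent[root_s] = root_c

    # Group firms by the root of their component and keep the largest group.
    groups = {}
    for firm in list(parent):
        groups.setdefault(find(firm), set()).add(firm)
    return max(groups.values(), key=len) if groups else set()

toy_edges = [("a", "b"), ("c", "b"), ("d", "e")]
giant = giant_weak_component(toy_edges)
```

In the toy example, firms a, b, and c form the giant component, and the isolated pair d and e is dropped.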
Conclusion
We have theoretically and empirically characterized the buyer–supplier network of the US economy. Scale-free frameworks that have seen increasing use in modeling social networks have trouble matching the network’s empirical in-degree distribution. We propose an alternative model that parsimoniously incorporates realistic features of firms’ buyer–supplier relationships. Estimating the model from microdata on firms’ self-reported customers, we find that our alternative framework is better able to match the attributes of the actual economic network.
Besides its obvious connection to other work on social networks, we see this research as also being related to investigations into the firm-size distribution (15, 19–22). Those investigations have tied features of firm growth to issues of broader economic importance, such as the ability (or inability) of the macroeconomy to absorb idiosyncratic shocks. An application of the current paper’s framework, which is a topic also considered in ref. 23, is to explore the potentially more direct roles that firm connectedness might play in explaining such issues.
Footnotes
1To whom correspondence should be addressed. E-mail: hortacsu@uchicago.edu.
Author contributions: E.A., A.H., J.R., and C.S. designed research, performed research, analyzed data, and wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
*This exit process can be extended to allow for the probability of exit to depend on a vertex’s degree of connectivity, although this considerably complicates the solution.
†There are qN(t)m(t) edges that have the sending vertex exit the network, qN(t)m(t) edges that have the receiving vertex exit the network, and q²N(t)m(t) edges that have both the receiving and sending vertex exit the network. Combining these terms, there are (2q − q²)N(t)m(t) edges that are destroyed each period.
‡The upper incomplete gamma function is given by Γ(a, x) = ∫_x^∞ t^(a−1) e^(−t) dt. The limit of Γ(a, x) as x → ∞ is 0.
§Our estimate of r is the value that maximizes the likelihood implied by this probability expression.
¶Refs. 14 and 15 also propose models of city and firm growth, respectively, that generate this predicted distribution.
‖From Fig. 1A, we see that a pure preferential attachment model would predict that 84% of firms reported no major suppliers. In 2005, only 62% of the firms reported no major suppliers.
**In Fig. 2, we include only vertices in the giant weakly connected component (the largest subset of firms that are connected to one another).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1015564108/-/DCSupplemental.
References
- ↵
- ↵
- ↵
- ↵
- Kranton RE,
- Minehart DF
- ↵
- Alfaro L,
- Chen M
- ↵
- Jackson MO
- ↵
- ↵
- Granovetter MS
- ↵
- ↵
- Schweitzer F,
- et al.
- ↵
- Borgatti SP,
- Mehra A,
- Brass DJ,
- Labianca G
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Axtell RL
- Rossi-Hansberg E,
- Wright MLJ
- ↵
- Gabaix X
- ↵
- Carvalho V
Article Classifications
- Social Sciences
- Economic Sciences