# Role of design complexity in technology improvement

^{a}Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501; ^{b}Department of Physics, Boston University, Boston, MA 02215; ^{c}Libera Università Internazionale degli Studi Sociali, Guido Carli, Viale Pola 12, 00198 Rome, Italy; ^{d}Center for Polymer Studies, Boston University, Boston, MA 02215; and ^{e}Engineering Systems Division, Massachusetts Institute of Technology, Cambridge, MA 02139

Edited by Hans-Joachim Schellnhuber, Potsdam Institute for Climate Impact Research (PIK), Potsdam, Germany, and approved March 8, 2011 (received for review November 22, 2010)

## Abstract

We study a simple model for the evolution of the cost (or more generally the performance) of a technology or production process. The technology can be decomposed into *n* components, each of which interacts with a cluster of *d* - 1 other components. Innovation occurs through a series of trial-and-error events, each of which consists of randomly changing the cost of each component in a cluster, and accepting the changes only if the total cost of the cluster is lowered. We show that the relationship between the cost of the whole technology and the number of innovation attempts is asymptotically a power law, matching the functional form often observed for empirical data. The exponent α of the power law depends on the intrinsic difficulty of finding better components, and on what we term the design complexity: the more complex the design, the slower the rate of improvement. Letting *d* as defined above be the connectivity, in the special case in which the connectivity is constant, the design complexity is simply the connectivity. When the connectivity varies, bottlenecks can arise in which a few components limit progress. In this case the design complexity depends on the details of the design. The number of bottlenecks also determines whether progress is steady, or whether there are periods of stasis punctuated by occasional large changes. Our model connects the engineering properties of a design to historical studies of technology improvement.

The relation between a technology’s cost *c* and the cumulative amount produced *y* is often empirically observed to be a power law of the form

$$c \propto y^{-\alpha}, \tag{1}$$

where the exponent α characterizes the rate of improvement. This rate is commonly expressed as the progress ratio 2^{-α}, which is the factor by which costs decrease with each doubling of cumulative production. A typical reported value (1) is 0.8 (corresponding to *α* ≈ 0.32), which implies that the cost of the 200th item is 80% that of the 100th item. Power laws have been observed (or at least assumed to hold) for a wide variety of technologies (1–3), although other functional forms have also been suggested and in some cases provide plausible fits to the data^{*}. We give examples of historical performance curves for several different technologies in Fig. 1.
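The arithmetic linking the exponent and the progress ratio can be checked directly. The short Python sketch below assumes only that cost scales as a power of cumulative production with exponent −α; the function names are illustrative and do not come from the paper.

```python
import math

def progress_ratio(alpha):
    """Factor by which cost falls with each doubling of cumulative production."""
    return 2.0 ** (-alpha)

def alpha_from_progress_ratio(ratio):
    """Invert progress_ratio: alpha = -log2(ratio)."""
    return -math.log2(ratio)

def relative_cost(y_new, y_old, alpha):
    """Cost at cumulative production y_new relative to y_old, assuming c ~ y**(-alpha)."""
    return (y_new / y_old) ** (-alpha)

alpha = alpha_from_progress_ratio(0.8)
print(round(alpha, 3))                            # 0.322
print(round(relative_cost(200, 100, alpha), 3))   # 0.8
```

Because the ratio depends only on the factor by which production grows, the same 80% figure applies to any doubling, for instance from the 1,000th to the 2,000th item.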

The relationship between cost and cumulative production goes under several different names, including the “experience curve,” the “learning curve,” or the “progress function.” The terms are used interchangeably by some, whereas others assign distinct meanings (1, 4). We use the general term performance curve to denote a plot of any performance measure (such as cost) against any experience measure (such as cumulative production), regardless of the context. Performance curve studies first appeared in the 19th century (5, 6), but their application to manufacturing and technology originates from the 1936 study by Wright on aircraft production costs (7). The large literature on this subject spans engineering (8), economics (4, 9), management science (1), organizational learning (16), and public policy (17). Performance curves have been constructed for individuals, production processes, firms, and industries (1).

The power law assumption has been used by firm managers (18) and government policy makers (17) to forecast how costs will drop with cumulative production. However, the potential for exploiting performance curves has so far not been fully realized, in part because there is no good theory explaining the observed empirical relationships. Why do performance curves tend to look like power laws, as opposed to some other functional form? What factors determine the exponent α, which governs the long-term rate of improvement? Why are some performance curves steady and others erratic? By suggesting answers to these questions, the theory we develop here can potentially be used to guide investment policy for technological change.

An example of the possible usefulness of such a theory is climate change mitigation. Good forecasts of future costs of low-carbon energy technologies could help guide research and development funding and climate policy. Our theory suggests that based on the design of a technology we might be able to better forecast its rate of improvement, and therefore make better investments and better estimates of the cost of achieving low-carbon energy conversion.

There have been several previous attempts to construct theories to explain the functional form of performance curves (19–21). Muth constructed a model of a single-component technology in which innovation happens by proposing new designs at random (21). Using extreme value theory he derived conditions under which the rate of improvement is a power law. An extension to multiple components, called the production recipe model, was proposed by Auerswald et al. (19). In their model each component interacts with other components, and if a given component is replaced, it affects the cost of the components with which it interacts. They simulated their model and found that under some circumstances the performance curves appeared to be power laws. Other models include Bendler and Schlesinger, who derive a power law based on the assumption that barriers to improvement are distributed fractally (22), and Huberman, who represents the design process as a random evolving graph (20). More recently Frenken has used the Auerswald model to interpret and address questions such as the efficacy of outsourcing (23, 24). Other related models that use random search to model technological progress (but which do not directly address performance curves) are those of Silverberg and Verspagen (25, 26) and Thurner et al. (27).

In this paper we both simplify and extend the production recipe model of Auerswald et al. (19). The simplifications allow us to derive the emergence of a power law, and most importantly, to derive its exponent α. We find that *α* = 1/(*γd*^{∗}), where γ measures the intrinsic difficulty of finding better components and *d*^{∗} is what we call the design complexity. When the connectivity of the components is constant the design complexity *d*^{∗} is equal to the connectivity. When connectivity is variable, the complexity can also depend on the detailed properties of the design, in ways that we make clear. We also show that when costs are spread uniformly across a large number of components, the whole technology undergoes steady improvement. In contrast, when costs are dominated by a few components, the total cost undergoes erratic improvement. Our theory thus potentially gives insight into how to design a technology so that it will improve more rapidly and more steadily.

We should emphasize that many factors besides design can affect costs—for example, the cost of input materials or fuels may change due to market dynamics rather than technology design (10). Furthermore, design is generally focused not just on reducing costs, but also on improving other properties such as environmental performance or reliability. The variable “cost” in the theory here can be interpreted as any property that depends on technology design.

## The Model

The production design consists of *n* components, which can be thought of as the parts of a technology or the steps in an industrial process^{†}. Each component *i* has a cost *c*_{i}. The total cost *κ* of the design is the sum of the component costs: *κ* = *c*_{1} + *c*_{2} + ⋯ + *c*_{n}. A component’s cost changes as new implementations for the component are found. For example, a component representing the step “move a box across a room” may initially be implemented by a forklift, which could later be replaced by a conveyor belt. Cost reductions occur through repeated changes to one or more components.

Components are not isolated from one another, but rather interact as parts of the overall design. Thus changing one component not only affects its cost, but also the costs of other dependent components. Components may be viewed as nodes in a directed network, with links from each component to those that depend on it. The relationship between the nodes and links can alternatively be characterized by an adjacency matrix. In systems engineering and management science this matrix is known as the design structure matrix (DSM) (28–30). A DSM is an *n* × *n* matrix with an entry in row *i* and column *j* if a change in component *j* affects component *i* (Fig. 2). The matrix is usually binary (31, 32); however, weighted interactions have also been considered (33). DSMs have been found to be useful in understanding and improving complex manufacturing and technology development processes.
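To make the row and column convention concrete, a binary DSM can be held as a nested list in which `dsm[i][j]` is nonzero when a change in component *j* affects component *i*. The sketch below is a minimal illustration with 0-indexed components; the helper names and the three-component example are hypothetical.

```python
def outset(dsm, j):
    """Components whose cost changes when component j is modified (column j)."""
    return [i for i in range(len(dsm)) if dsm[i][j]]

def inset(dsm, i):
    """Components whose modification changes the cost of component i (row i)."""
    return [j for j in range(len(dsm)) if dsm[i][j]]

def out_degree(dsm, j):
    """Number of components affected by a change in j, counting j itself."""
    return len(outset(dsm, j))

# Hypothetical 3-component design; the diagonal is 1 because every
# component affects its own cost.
dsm = [
    [1, 0, 0],   # component 0 is affected only by itself
    [1, 1, 0],   # component 1 is affected by 0 and by itself
    [1, 0, 1],   # component 2 is affected by 0 and by itself
]
print(outset(dsm, 0))                          # [0, 1, 2]
print(inset(dsm, 1))                           # [0, 1]
print([out_degree(dsm, j) for j in range(3)])  # [3, 1, 1]
```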

The model is simulated as follows:

1. Pick a random component *i*.
2. Use the DSM to identify the set of components $\mathcal{A}_i$ whose costs depend on *i* (the outset of *i*).
3. Determine a new cost $c'_j$ for each component $j \in \mathcal{A}_i$ from a fixed probability distribution *f*.
4. If the sum of the new costs, $a'_i = \sum_{j \in \mathcal{A}_i} c'_j$, is less than the current sum, *a*_{i}, then each *c*_{j} is changed to $c'_j$. Otherwise, the new cost set is rejected.

This process is repeated for *t* steps. The costs are defined on [0,1]. We assume a probability density function that for small values of *c*_{i} has the form $f(c_i) \propto c_i^{\gamma - 1}$; i.e., the cumulative distribution $F(c_i) \propto c_i^{\gamma}$. The exponent γ specifies the difficulty of reducing costs of individual components, with higher γ corresponding to higher difficulty. This functional form is fairly general in that it covers any distribution with a power-series expansion at *c* = 0.
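The simulation steps above can be sketched in a few lines of Python. This is an illustrative implementation rather than the authors' code: it assumes the cumulative distribution is exactly *F*(*c*) = *c*^γ on the whole interval [0,1] (the text only fixes this form for small costs), it starts every component at cost 1/*n*, and the names `draw_cost` and `simulate` are hypothetical.

```python
import random

def draw_cost(gamma, rng):
    """Sample a cost on [0, 1] with cumulative distribution F(c) = c**gamma.

    Assumption: this exact form holds on all of [0, 1]; the model only
    requires it for small c.
    """
    return rng.random() ** (1.0 / gamma)

def simulate(dsm, gamma=1.0, steps=10000, seed=0):
    """Run the trial-and-error process; return the total cost after each step."""
    rng = random.Random(seed)
    n = len(dsm)
    costs = [1.0 / n] * n                      # total initial cost is 1
    # outsets[j] lists the components whose costs depend on j (column j).
    outsets = [[i for i in range(n) if dsm[i][j]] for j in range(n)]
    history = []
    for _ in range(steps):
        i = rng.randrange(n)                   # step 1: pick a random component
        cluster = outsets[i]                   # step 2: its outset
        new = [draw_cost(gamma, rng) for _ in cluster]   # step 3: propose costs
        if sum(new) < sum(costs[j] for j in cluster):    # step 4: accept if cheaper
            for j, c in zip(cluster, new):
                costs[j] = c
        history.append(sum(costs))
    return history

# Five independent components: the DSM is the identity matrix.
identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
history = simulate(identity, gamma=1.0, steps=2000, seed=1)
print(history[0], history[-1])
```

Because a proposal is accepted only when the cluster becomes cheaper, the returned total cost never increases from one step to the next.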

## Independent Components

We first consider the simple but unrealistic case of a technology with *n* independent components. This generalizes the one component case originally studied by Muth (21). The cost of a given component at time *t* is equivalent to the minimum of *t* independent, identically distributed random variables. In *SI Text*, we use extreme value theory to show that to first order in *n*/*t* the expected cost *E*[*κ*(*t*)] is, for a cost distribution normalized so that $F(c) = c^{\gamma}$,

$$E[\kappa(t)] \approx \Gamma\!\left(1 + \frac{1}{\gamma}\right) n \left(\frac{n}{t}\right)^{1/\gamma}, \tag{2}$$

where Γ(*a*) is Euler’s gamma function.

To understand intuitively why the expected cost decreases as a power law in time, consider the simple example where *γ* = 1. In this case at each innovation attempt a new cost *κ*^{′} is drawn uniformly from [0,1], and a successful reduction occurs if *κ*^{′} is less than the current cost *κ*. Because the distribution of new costs is uniform on [0,1] the probability Prob(*κ*^{′} < *κ*) that *κ*^{′} represents a reduction simply equals *κ*. When a reduction does occur, the average value of *κ*^{′} is *κ*/2. Making the approximation that *E*[*κ*^{2}] = *E*[*κ*]^{2}, in continuous time the rate of change of the average component cost is

$$\frac{dE[\kappa]}{dt} = -\frac{1}{2} E[\kappa]^{2}. \tag{3}$$

The solution to Eq. **3** gives the correct scaling *κ*(*t*) ∼ 1/*t* as *t* → ∞. The cost reductions are proportional to the cost itself, leading to an exponential decrease in cost with each reduction; however, each reduction takes exponentially longer to achieve as the cost decreases. The competition between these two exponentials yields a power law.
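The 1/*t* scaling follows from a short integration. The steps below are a sketch under the same mean-field approximation, writing *κ* for the average cost and $\kappa_0$ for its value at *t* = 0 (a symbol introduced here for illustration).

```latex
\[
\frac{d\kappa}{dt}
  = -\underbrace{\kappa}_{\text{probability of a reduction}}
     \times \underbrace{\frac{\kappa}{2}}_{\text{mean size of a reduction}}
  = -\frac{\kappa^{2}}{2}
\quad\Longrightarrow\quad
\frac{d}{dt}\!\left(\frac{1}{\kappa}\right) = \frac{1}{2}
\quad\Longrightarrow\quad
\kappa(t) = \frac{\kappa_0}{1 + \kappa_0 t / 2} \;\approx\; \frac{2}{t}
\quad (t \gg 2/\kappa_0).
\]
```

For comparison, the exact expected minimum of *t* independent uniform draws is 1/(*t* + 1), so the approximation recovers the exponent but not the prefactor.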

## Interacting Components, Fixed Out-Degree

Now consider an *n*-component process with fixed out-degree, where each component affects exactly *d* - 1 other components, in addition to affecting itself. Whether or not a given component will improve in a given trial strongly depends on the other components in its cluster. Consequently, the costs are no longer independent. If the design structure matrix *D*_{ij} is invertible the total cost *κ* can be decomposed as

$$\kappa = \sum_{i=1}^{n} a_i \sum_{j=1}^{n} \left(D^{-1}\right)_{ij}, \tag{4}$$

where $a_i = \sum_{j} D_{ji}\, c_j$ is the cost of cluster *i* and $D^{-1}$ is the inverse of the design structure matrix. Because the interaction of components inside the same cluster is much stronger than that of components in different clusters, we can make the approximation that clusters evolve independently. In *SI Text*, we derive the approximate behavior using two different methods, one based on extreme value theory and the other based on a differential equation for *E*[*κ*]. In the latter case we find [5] where [6]. We compare our prediction in Eq. **5** to simulations in Fig. 3. Initially each component cost *c*_{i} is set to 1/*n*, so that the total initial cost *c*(0) = 1, and we choose *γ* = 1 for simplicity. Eq. **5** correctly predicts the asymptotic power law scaling of the simulated performance curves, as well as the deviation from power law behavior for short times. (As shown in *SI Text*, the extreme value method also predicts the correct asymptotic scaling.)
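As a concrete check of the cluster decomposition, consider a hypothetical three-component design (chosen here for illustration, not taken from the paper) in which each component affects itself and its successor on a ring, so that *d* = 2. The sketch assumes the cluster cost $a_i$ is the summed cost of the components affected by *i*, that is $a_i = \sum_j D_{ji} c_j$, with $D_{ij} = 1$ when a change in *j* affects *i*.

```latex
\[
D = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},
\qquad \det D = 2,
\qquad
a_1 = c_1 + c_2, \quad a_2 = c_2 + c_3, \quad a_3 = c_3 + c_1 .
\]
% Every row of D sums to 2, so D 1 = 2 1 and therefore D^{-1} 1 = (1/2) 1:
% each row of D^{-1} sums to 1/2.
\[
\sum_{j} \left(D^{-1}\right)_{ij} = \tfrac{1}{2}
\quad\Longrightarrow\quad
\sum_{i} a_i \sum_{j} \left(D^{-1}\right)_{ij}
  = \tfrac{1}{2}\,(a_1 + a_2 + a_3)
  = c_1 + c_2 + c_3 = \kappa .
\]
```

Each component cost appears in exactly two clusters, and the weight of 1/2 on every cluster removes the double counting.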

The salient result is that the exponent *α* = 1/(*γd*) of the performance curve is directly and simply related to the out-degree *d*, which can be viewed as a measure of the complexity of the design, and γ, which characterizes the difficulty of improving individual components in the limit as the cost goes to zero. If *γd* = 1 then *α* = 1 and the progress ratio 2^{-1/(γd)} is 50%. If *γd* = 3 then *α* = 1/3 and the progress ratio is approximately 80%, a common value observed in empirical performance curves.

This *d* dependence has a simple geometric explanation. Consider the case where *γ* = 1. Drawing *d* new costs independently is equivalent to picking a point with uniform probability in a *d*-dimensional hypercube. The combinations of component costs that reduce the total cost lie within the simplex defined by $\sum_{j=1}^{d} c'_j < a_i$, where $c'_j$ are the new costs and *a*_{i} is the current cost of the cluster. The probability of reducing the cost is therefore the ratio of the simplex volume to the hypercube volume,

$$\mathrm{Prob}\!\left(\sum_{j=1}^{d} c'_j < a_i\right) = \frac{a_i^{d}}{d!}, \tag{7}$$

which is a decreasing function of *d*. Thus a component with higher out-degree (greater connectivity) is less likely to be improved when chosen.
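The geometric argument can be checked numerically for *γ* = 1. The sketch below compares the volume of the simplex in which *d* uniform costs sum to less than a threshold *a* (which equals *a*^{d}/*d*! when *a* ≤ 1) with a Monte Carlo estimate; the function names, the threshold 0.5, and the trial count are illustrative choices.

```python
import math
import random

def simplex_probability(a, d):
    """Probability that d independent uniform [0, 1] costs sum to less than a.

    Valid for 0 <= a <= 1, where the simplex lies inside the unit hypercube.
    """
    return a ** d / math.factorial(d)

def estimate_probability(a, d, trials=200000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if sum(rng.random() for _ in range(d)) < a)
    return hits / trials

for d in (1, 2, 3):
    print(d, simplex_probability(0.5, d), estimate_probability(0.5, d))
```

The exact values for *a* = 0.5 are 1/2, 1/8, and 1/48, showing how quickly the chance of a successful trial shrinks as the cluster grows.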

## Interacting Components, Variable Out-Degree

When the out-degree of each component is variable the situation is more interesting and more realistic because components may differ in their rate of improvement (31). Slowly improving components can create bottlenecks that hinder the overall rate of improvement. In this case it is no longer a good approximation to treat clusters as evolving independently.

As illustrated in Fig. 4, there are two ways to reduce the cost of a given component *i*:

1. Pick *i* and improve cluster $\mathcal{A}_i$ (the outset of *i*).
2. Pick component *j* in the inset of *i* and improve cluster $\mathcal{A}_j$ (the outset of *j*).

From Eq. **7**, if component *i* has a large out-degree *d*_{i}, it is relatively unlikely to be improved by process 1. Nonetheless, if *j* has low out-degree, then *i* will improve more rapidly via process 2. Let $d_j$ be the out-degree of component *j*, which is in the inset of *i* (the inset of *i* includes *i* itself, because every component affects its own cost). Then the overall improvement rate of component *i* is determined by $d_i^{\min} = \min_{j \in \mathrm{inset}(i)} d_j$; i.e., it is driven by the out-degree of the component *j* in its inset whose associated cluster is most likely to improve. In *SI Text*, we demonstrate numerically that asymptotically $E[c_i(t)] \sim t^{-1/(\gamma d_i^{\min})}$. As *t* becomes large, the difference in component costs can become quite dramatic, with the components with the largest values of $d_i^{\min}$ dominating. The overall improvement rate for the whole technology is then determined by the slowest-improving components, governed by the design complexity

$$d^{*} = \max_{i} d_i^{\min} = \max_{i} \min_{j \in \mathrm{inset}(i)} d_j. \tag{8}$$

We call any component with $d_i^{\min} = d^{*}$ a bottleneck. When *t* is large one can neglect all but the bottleneck components, and as we show in *SI Text*, the average total cost scales as $E[\kappa] \sim t^{-1/(\gamma d^{*})}$. Note that in the case of constant out-degree *d* Eq. **8** reduces to *d*^{∗} = *d*.
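The design complexity can be computed mechanically from a DSM. The sketch below assumes a binary DSM stored as a nested list with `dsm[i][j]` nonzero when a change in *j* affects *i*, a filled diagonal (every component affects itself), and the max-min reading of the design complexity: for each component take the smallest out-degree found in its inset, then take the largest of those values over all components. The function name and the example matrix are hypothetical.

```python
def design_complexity(dsm):
    """Return (d_star, bottlenecks) for a square 0/1 DSM given as nested lists."""
    n = len(dsm)
    # Out-degree of j: number of components affected by j (nonzero entries in column j).
    out_deg = [sum(1 for i in range(n) if dsm[i][j]) for j in range(n)]
    # For each component i, the easiest route to improvement is through the
    # member of its inset (row i) that has the lowest out-degree.
    d_min = [min(out_deg[j] for j in range(n) if dsm[i][j]) for i in range(n)]
    d_star = max(d_min)
    bottlenecks = [i for i in range(n) if d_min[i] == d_star]
    return d_star, bottlenecks

# Hypothetical design: component 0 affects everything (out-degree 3),
# component 1 affects 0 and itself (out-degree 2), component 2 only itself.
dsm = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(design_complexity(dsm))   # (2, [0, 1])
```

In this example component 0 has out-degree 3, yet it is not limited by that value, because it can also be improved through component 1, whose cluster is smaller.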

To test this hypothesis we randomly generated 90 different DSMs with values of *d*^{∗} ranging from 1 to 9 and *γ* = 1, simulated the model 300 times for each DSM, measured the corresponding average rate of improvement, and compared with that predicted from the theory. We find good agreement in every case, as demonstrated in *SI Text*.

## Fluctuations

The analysis we have given provides insight not only about the mean behavior, but also about fluctuations about the mean. These can be substantial, and depend on the properties of the DSM. In Fig. 5 we plot two individual trajectories of cost vs. time for each of three different DSMs. The trajectories fluctuate in every case, but the amplitude of fluctuations is highly variable. In Fig. 5 *Left* the amplitude of the fluctuations remains relatively small and is roughly constant in time when plotted on double logarithmic scale (indicating that the amplitude of the fluctuations is always proportional to the mean). For Fig. 5 *Center* and *Right*, in contrast, the individual trajectories show a random staircase behavior, and the amplitude of the fluctuations continues to grow for a longer time.

This behavior can be explained in terms of the improvement rates for each component. The maximum value of $d_i^{\min}$ (the lowest out-degree among the components in the inset of *i*) determines the slowest-improving components. In Fig. 5 *Left* this maximum value occurs for four components. After a long time these four components dominate the overall cost. However, because they have the same values of $d_i^{\min}$ their contributions remain comparable, and the total cost is averaged over all four of them, keeping the fluctuations relatively small. (See Fig. 5 *Lower*.)

In contrast, in Fig. 5 *Center* we illustrate a DSM where a single component (number 7) is the slowest-improving, followed by component number 6. With the passage of time component 7 comes to dominate the cost. This component is slow to improve because it is rarely chosen for improvement. But in the rare cases that component 7 is chosen the improvements can be dramatic, generating large downward steps in its trajectory. Fig. 5 *Right* illustrates an intermediate situation where two components are dominant.

Another striking feature of the distribution of trajectories is the difference between the top and bottom envelopes of the plot of the distribution vs. time. In every case the top envelope follows a straight line throughout most of the time range. The behavior of the bottom envelope is more complicated; in many cases, such as Fig. 5 *Left*, this bottom envelope also follows a straight line, but in others (for example, Fig. 5 *Center*) the bottom envelope changes slope over a large time range. A more precise statement can be made by following the contour corresponding to a given quantile through time. All quantiles eventually approach a line with slope −1/(*γd*^{∗}). However, the upper quantiles converge to this line quickly, whereas in some cases the lower quantiles do so much later. This slower convergence stems from the difference in improvement rates of different components. Whenever there is a dramatic improvement in the slowest-improving component (or components), there is a period where the next slowest-improving component (or components) becomes important. During this time the lower value of the second component temporarily influences the rate of improvement. After a long time the slowest-improving component becomes more and more dominant, large updates become progressively more rare, and the slope becomes constant.

The model therefore suggests that properties of the design determine whether a technology’s improvement will be steady or erratic. Homogeneous designs (with constant out-degree) are more likely to show an inexorable trend of steady improvement. Heterogeneous designs (with larger variability in out-degree) are more likely to improve in fits and starts. High variability among individual trajectories can be interpreted as indicating that historical contingency plays an important role. By this we mean that the particular choice of random numbers, rather than the overall trend, dominates the behavior. In this case progress appears to come about through a series of punctuated equilibria.

To summarize, in this section we have shown that the asymptotic magnitude of the fluctuations is determined by the number of bottlenecks; i.e., the number of components *i* whose lowest inset out-degree $d_i^{\min} = \min_{j \in \mathrm{inset}(i)} d_j$ equals the design complexity *d*^{∗}. The fluctuations decrease as the number of bottlenecks increases. In the constant out-degree case all of the components are equivalent, and this number is just *n*. In the variable out-degree case, however, this number depends on the details of the DSM, which determine $d_i^{\min}$ for each component.

## Testing the Model Predictions

Our model makes the testable prediction that the rate of improvement of a technology depends on the design complexity, which can be determined from a design structure matrix. The use of DSMs to analyze designs is widespread in the systems engineering and management science literature. Thus, one could potentially examine the DSMs of different technologies, compute their corresponding design complexities, and compare to the value of α based on the technology’s history^{‡}. Thus we are able to make a quantitative prediction about learning curves. This is in contrast to previous work, which did not make testable predictions about α^{§}. This test is complicated by the fact that α also depends on γ, which describes the inherent difficulty of improving individual components, which in turn depends on the inherent difficulty of the innovation problem as well as the collective effectiveness of inventors in generating improvements. The exponent γ is problematic to measure independently. Nonetheless, one could examine a collection of different technologies and either assume that γ is constant or that the variations in γ average out. Subject to these caveats, the model then predicts that the design complexity of the DSM should be inversely proportional to the estimated α of the historical trajectory. A byproduct of such a study is that it would yield an estimate of γ in different technologies.
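The proposed test can be sketched as two small steps: estimate α from a historical trajectory, then back out the γ implied by a given design complexity. The Python below is a minimal illustration that uses an ordinary least-squares fit on logarithmic axes, which is a simple estimator chosen here for brevity and not a method prescribed by the paper; the function names and the synthetic data are hypothetical.

```python
import math

def fit_alpha(experience, costs):
    """Least-squares slope of log(cost) against log(experience), returned as alpha = -slope."""
    xs = [math.log(x) for x in experience]
    ys = [math.log(c) for c in costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

def implied_gamma(alpha, d_star):
    """Solve alpha = 1 / (gamma * d_star) for gamma."""
    return 1.0 / (alpha * d_star)

# Synthetic, noise-free trajectory with alpha = 1/3 (illustrative data only).
experience = [10, 100, 1000, 10000]
costs = [x ** (-1.0 / 3.0) for x in experience]
alpha = fit_alpha(experience, costs)
print(round(alpha, 3))                    # 0.333
print(round(implied_gamma(alpha, 3), 3))  # 1.0
```

With real data the experience variable would be whichever proxy for innovation attempts is adopted, and the fitted α would carry estimation error that propagates into γ.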

To compare the model predictions to real data one must relate the number of attempted improvements to something measurable. It is not straightforward to measure the effort that has gone into improving a technology, and to compare to real data one must use proxies. The most commonly used proxy is cumulative production, but other possibilities include cumulative investment, installed capacity, research and development expenditure, or even time. The best proxy for innovation effort is a subject of serious debate in the literature (34–38).

## Possible Extensions to the Model

There are a variety of possible ways to extend the model to make it more realistic. For example, the model currently assumes that the design network described by the DSM is constant through time, but often improvements to a technology come about by modifying the design network. One can potentially extend the model by adding an evolutionary model for the generation of new DSMs.

The possibility that the design complexity *d*^{∗} changes through time suggests another empirical prediction. According to our theory, when *d*^{∗} changes, α changes as well. One can conceivably examine a historical sequence of design matrices, compute their values of *d*^{∗}, and compare the predicted *α* ∼ 1/*d*^{∗} to the values of α observed in the corresponding periods. Our theory predicts that these should be positively correlated.

We have assumed a particular model of learning in which improvement attempts are made at random, with no regard to the history of previous improvements or knowledge of the technology. An intelligent designer should be able to do (as well or) better, drawing on his or her knowledge of science, engineering, and present and past designs. (We note that for particularly complex design problems, random search may be computationally the most efficient option.) The model we study here can be viewed as a worst case, which should be an indicator of the difficulty of design under any approach: A problem that is harder for a designer to solve under random search is also likely to be more difficult to solve with a more directed search.

## Discussion

We have developed a model that both simplifies and generalizes the original model of Auerswald et al. (19), which predicts the improvement of cost as a function of the number of innovation attempts. Whereas we have formulated the model in terms of cost, one could equally well have used any performance measure of the technology that has the property of being additive across the components. Our analysis makes clear predictions about the trajectories of technology improvement. The mean behavior of the cost is described by a power law with exponent *α* = 1/(*γd*^{∗}), where *d*^{∗} is the design complexity and γ describes the intrinsic difficulty of improving individual components. In the case of constant connectivity (out-degree) the design complexity is just the connectivity, but in general the complexity can depend on details of the design matrix, as spelled out in Eq. **8**. In addition, the range of variation in technological improvement trajectories depends on the number of bottlenecks. This number coincides with the total number of components *n* in the case of constant connectivity, but in general the number of bottlenecks is smaller, and depends on the detailed arrangement of the interactions in the design.

Many studies in the past have discussed effects that contribute to technological improvement, such as learning-by-doing, research and development, or capital investment. Our approach here is generic in the sense that it could apply to any of these mechanisms. As long as these mechanisms cause innovation attempts that can be modeled as a process of trial and error, any of them can potentially be described by the model we have developed.

Our analysis makes a unique contribution by connecting the literature on the historical analysis of performance curves to that on the engineering design properties of a technology. We make a prediction about how the features of a design influence its rate of improvement, focusing attention on the interactions of components as codified in the design structure matrix. Perhaps most importantly, we pose several falsifiable propositions. Our analysis illustrates how the same evolutionary process can display either historical contingency or steady change, depending on the design. Our theory suggests that it may be possible to influence the long-term rate of improvement of a technology by reducing the connectivity between the components. Such an understanding of how the design features of a technology affect its evolution could aid engineering design, as well as science and technology policy.

## Acknowledgments

We thank Yonathon Schwarzkopf and Aaron Clauset for helpful conversations and suggestions, and Sidharth Rupani and Daniel Whitney for useful correspondence. J.M., J.D.F., and J.E.T. gratefully acknowledge financial support from National Science Foundation (NSF) Grant SBE0738187; S.R. acknowledges support from NSF Grant DMR0535503.

## Footnotes

^{1}To whom correspondence should be addressed. E-mail: trancik{at}mit.edu.

Author contributions: J.M., J.D.F., S.R., and J.E.T. performed research and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1017298108/-/DCSupplemental.

^{*}Koh and Magee (35, 36) claim an exponential function of time (Moore’s law) predicts the performance of several different technologies. Goddard (34) claims costs follow a power law in production rate rather than cumulative production. Multivariate forms involving combinations of production rate, cumulative production, or time have been examined by Sinclair et al. (38) and Nordhaus (37).

^{†}The original production recipe model (19) contained six parameters. We eliminated four of them as follows: length of production run *T* → ∞; output-per-attempted-recipe-change ; available implementations per component *s* → ∞; search distance *δ* → 1.

^{‡}One problem that must be considered is that of resolution. As an approximation, a DSM can be constructed at the coarse level of entire systems (e.g., “electrical system,” “fuel system”) or it can be constructed more accurately at the microscopic level in terms of individual parts. In general these will give different design complexities.

^{§}A possible exception is Huberman (20), who presents a theory in terms of an evolving graph, and gives a formula that predicts power law scaling as a function of “the number of new shortcuts.” It is not clear, however, whether this could ever be measured.

Freely available online through the PNAS open access option.

## References

- (2) Argote L, Epple D
- (4) Thompson P
- (6) Ebbinghaus H
- (7) Wright TP
- (10) McNerney J, Farmer JD, Trancik JE
- (12) Maycock PD
- (14) Strategies-Unlimited
- (15) Moore GE
- (16) Argote L
- (17) Wene C
- (18) Boston Consulting Group
- (28) Baldwin CY, Clark KB
- (30) Steward DV
- (32) Whitney DE, Dong Q, Judson J, Mascoli G
- (37) Nordhaus WD

## Article Classifications

- Social Sciences
- Economic Sciences

- Physical Sciences
- Engineering