# Optimal decision making and matching are tied through diminishing returns

Edited by Ranulfo Romo, Universidad Nacional Autónoma de México, Mexico City, D.F., Mexico, and approved July 3, 2017 (received for review March 1, 2017)

## Significance

Decisions critically affect the well-being of individuals and societies. However, how to suitably model, understand, and predict people’s decisions has been a long-standing challenge. Economic models view individuals as optimal decision makers who maximize their overall reward income. Psychologists and ecologists have observed that decision makers tend to use a simpler “matching” strategy, distributing their behavior in proportion to the relative worth of their options. This article demonstrates that matching can be an optimal strategy for decision makers when the returns from their options diminish with the invested effort, a relationship known as the law of diminishing returns. Because diminishing returns are prevalent in natural and social settings, the commonly observed matching behavior is not only simple but also efficient and rational.

## Abstract

How individuals make decisions has been a matter of long-standing debate among economists and researchers in the life sciences. In economics, subjects are viewed as optimal decision makers who maximize their overall reward income. This framework has been widely influential but requires complete knowledge of the reward contingencies associated with a given choice situation. Psychologists and ecologists have observed that individuals tend to use a simpler “matching” strategy, distributing their behavior in proportion to the relative rewards associated with their options. This article demonstrates that the two dominant frameworks of choice behavior are linked through the law of diminishing returns. The relatively simple matching strategy can in fact provide maximal reward when the rewards associated with decision makers’ options saturate with the invested effort. Such saturating relationships between reward and effort are hallmarks of the law of diminishing returns. Given the prevalence of diminishing returns in nature and social settings, this finding can explain why humans and animals so commonly behave according to the matching law. The article underscores the importance of the law of diminishing returns in choice behavior.

People’s decisions shape the future of individuals and social groups. How to suitably model, understand, and predict individuals’ choice behavior has therefore been the subject of intense research across multiple disciplines.

Economic models provide normative prescriptions of how individuals should make decisions. According to these models, individuals make decisions in order to maximize their expected reward, utility, or income (1–3). For example, in prospect theory (4), subjects maximize the expected utility of potential decision outcomes. The expected utility is computed as the sum of the outcomes’ values weighted by the probabilities that the individual outcomes will occur. Within such maximization models, Bayesian decision theories can be used to dictate how subjects should optimally compute the individual probability terms (5). Despite the normative appeal of such maximization models, it has been unclear whether organisms are capable of implementing and acting on the complex computations prescribed by these models (6–8).
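The weighted-sum computation described above can be sketched as follows. This is a minimal, hypothetical illustration of expected utility in general, not of prospect theory specifically (prospect theory additionally uses a reference-dependent value function and probability weighting, neither of which is modeled here); the gamble and the square-root utility are arbitrary choices.

```python
import math


def expected_utility(outcomes, utility=lambda value: value):
    """Sum of utilities of (value, probability) outcomes weighted by their probabilities."""
    total_probability = sum(p for _, p in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * utility(value) for value, p in outcomes)


# Hypothetical choice: a 50/50 gamble between 100 and 0, or 40 for sure.
gamble = [(100.0, 0.5), (0.0, 0.5)]
sure_thing = [(40.0, 1.0)]

# With a linear utility the gamble has the higher expected value ...
print(expected_utility(gamble), expected_utility(sure_thing))  # 50.0 40.0
# ... whereas a concave utility (square root, an arbitrary choice) favors the sure option.
print(expected_utility(gamble, math.sqrt), expected_utility(sure_thing, math.sqrt))
```

The same outcomes can therefore be ranked differently depending on the utility function that converts values into utilities before weighting.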

Researchers in psychology, ecology, sociology, and neuroscience have found evidence for a simpler model of decision making: decision makers tend to distribute their behavior in proportion to the relative rewards associated with their options (3, 8–16). The match between the behavioral and reward distributions,

Which of these models better captures and predicts choice behavior has been the subject of substantial debate (7, 8, 18–22). In some cases, subjects maximize and do not match (23, 24), whereas in other cases subjects match even though maximization would be a better strategy (3, 20, 25, 26). Psychologists have compared matching and maximization in tasks that used specific schedules of reinforcement (24, 27, 28), but whether the two models can be linked analytically at a general level has remained unclear. The present article brings matching and maximization face to face at a general level. By doing so, it identifies a connection between the economic and psychological frameworks, and the nature of this connection offers an explanation for why humans and animals so often follow the matching strategy.

## Results

### Optimal Decision Making.

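An optimal decision maker distributes her effort across options so as to maximize the total harvested reward. For a fixed total effort, a reward maximum requires (*Methods*) that the marginal reward per unit of effort be equal across all options.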
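For a fixed total effort, an interior reward maximum requires the marginal reward per unit of effort to be equal across options. A minimal numerical sketch for two options follows. The two contingencies are hypothetical concave functions chosen only for illustration (they are not taken from the article): the code finds, by bisection, the split of a total effort $X$ at which the marginal rewards are equal, and then confirms against a brute-force grid that this split yields the largest total reward.

```python
import math

# Hypothetical concave reward-effort contingencies (illustration only):
# each option is a pair (reward function, derivative of the reward function).
option_1 = (lambda x: 2.0 * math.log1p(x), lambda x: 2.0 / (1.0 + x))
option_2 = (lambda x: 3.0 * (1.0 - math.exp(-x / 2.0)), lambda x: 1.5 * math.exp(-x / 2.0))


def optimal_split(opt1, opt2, total_effort, tol=1e-12):
    """Effort on option 1 at which marginal rewards per effort are equal (two options)."""
    d1, d2 = opt1[1], opt2[1]
    # For concave rewards this gap is decreasing in x1, so bisection applies.
    gap = lambda x1: d1(x1) - d2(total_effort - x1)
    if gap(0.0) <= 0.0:
        return 0.0  # corner solution: all effort on option 2
    if gap(total_effort) >= 0.0:
        return total_effort  # corner solution: all effort on option 1
    lo, hi = 0.0, total_effort
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_reward(x1, opt1, opt2, total_effort):
    return opt1[0](x1) + opt2[0](total_effort - x1)


X = 4.0
x1_star = optimal_split(option_1, option_2, X)
# Brute-force check over a fine grid of allocations.
grid_best = max(total_reward(i * X / 40000, option_1, option_2, X) for i in range(40001))
print(round(x1_star, 4), total_reward(x1_star, option_1, option_2, X) >= grid_best - 1e-9)
```

Note that the bisection relies on knowing the derivatives of both contingencies, which is exactly the information an optimizing agent must somehow acquire.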

### Matching Behavior.

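Humans and animals often follow a matching strategy (3, 8–16), distributing their behavior in proportion to the relative rewards associated with their options. Under matching, the average reward per unit of effort (the value of an option) is equal across all options.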
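Matching can be checked directly from observed effort and reward totals. The sketch below, with hypothetical helper names and session numbers, compares each option's share of effort with its share of reward and computes the values (reward per effort), which are all equal exactly when the shares match.

```python
import math


def choice_and_reward_fractions(efforts, rewards):
    """Fraction of total effort and of total reward attributed to each option."""
    total_effort, total_reward = sum(efforts), sum(rewards)
    return [e / total_effort for e in efforts], [r / total_reward for r in rewards]


def is_matching(efforts, rewards, rel_tol=1e-9):
    """True when every option's share of effort equals its share of reward."""
    behavior, reward_share = choice_and_reward_fractions(efforts, rewards)
    return all(math.isclose(b, r, rel_tol=rel_tol) for b, r in zip(behavior, reward_share))


def values(efforts, rewards):
    """Average reward per unit of effort (value) of each option."""
    return [r / e for e, r in zip(efforts, rewards)]


# Hypothetical sessions: the same rewards earned with two different effort allocations.
print(is_matching([20.0, 40.0], [10.0, 20.0]), values([20.0, 40.0], [10.0, 20.0]))
print(is_matching([30.0, 30.0], [10.0, 20.0]), values([30.0, 30.0], [10.0, 20.0]))
```

In the first session both options return 0.5 reward per unit of effort; in the second the values differ and matching fails.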

### Maximization–Matching Relation.

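Given these observations, maximization and matching align when the marginal reward per effort associated with an option is a strictly monotonic function of the average reward per effort associated with that option (*Methods*). Because such a function maps equal averages onto equal marginals, equalizing the values (matching) then also equalizes the marginals (maximization). The marginal and average quantities for an example relationship between reward and effort are illustrated as dotted and dashed lines, respectively, in Fig. 1.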
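To make the relation concrete, the sketch below assumes Cobb–Douglas contingencies $R_i(x) = c_i x^{a_i}$ (the function family discussed in *Methods*; the parameter values are arbitrary). With a common exponent $a$, the marginal reward $a c_i x^{a-1}$ is simply $a$ times the average reward $c_i x^{a-1}$, a strictly increasing function of the average, so the split that equalizes marginals (maximization) and the split that equalizes averages (matching) should coincide. With a different exponent per option, the marginal is no longer the same function of the average for both options, and the two splits separate.

```python
def bisect_decreasing(g, lo, hi, tol=1e-12):
    """Root of a continuous, strictly decreasing g on [lo, hi] with g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocations(c, a, total_effort):
    """Effort on option 1 under maximization and under matching, for R_i(x) = c_i * x**a_i.

    Assumes 0 < a_i < 1, so that marginals and averages both decrease with effort.
    """
    (c1, c2), (a1, a2) = c, a
    marginal = lambda ci, ai, x: ci * ai * x ** (ai - 1.0)
    average = lambda ci, ai, x: ci * x ** (ai - 1.0)
    eps = 1e-9 * total_effort  # stay away from x = 0, where x**(a - 1) diverges
    max_gap = lambda x1: marginal(c1, a1, x1) - marginal(c2, a2, total_effort - x1)
    match_gap = lambda x1: average(c1, a1, x1) - average(c2, a2, total_effort - x1)
    x_max = bisect_decreasing(max_gap, eps, total_effort - eps)
    x_match = bisect_decreasing(match_gap, eps, total_effort - eps)
    return x_max, x_match


print(allocations((2.0, 1.0), (0.5, 0.5), 1.0))  # shared exponent: both close to 0.8
print(allocations((2.0, 1.0), (0.5, 0.8), 1.0))  # different exponents: the splits differ
```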

This formulation empowers us to identify the relationships (henceforth referred to as contingencies) between reward and effort *Methods*) on *i*) *ii*) it must be *Methods*) yield contingencies including

These derived contingencies provide tight fits to the relationship between reward and effort in a dominant task used to study choice behavior (8, 9, 13). In particular, in the variable-interval schedule task (Fig. 2), an increase in a subject’s effort generally leads to an increase in the harvested reward. However, at a certain point the reward saturates and shows diminishing returns with additional effort (Fig. 2). The derived

This, in fact, turns out to be a general finding. For matching to provide a reward maximum, the reward–effort contingencies of the choice options must exhibit diminishing returns (see *Methods* for proof). This finding ties the matching law, deemed by many a fundamental principle underlying choice behavior (3, 11, 12, 14), to economic maximization through a surprising link: the law of diminishing returns.
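A numerical illustration of this claim, again assuming Cobb–Douglas contingencies $c_i x^{a}$ with a shared exponent $a$ (parameter values arbitrary): matching equalizes $c_i x_i^{a-1}$, which gives $x_1/x_2 = (c_1/c_2)^{1/(1-a)}$ for $a \neq 1$, and with a shared exponent this split is also a critical point of the total reward. The sketch estimates the curvature of the total reward along the effort constraint at the matching split. It comes out negative (a maximum) for diminishing returns ($a = 0.5$) and positive (a minimum) for increasing returns ($a = 2$).

```python
def matching_split(c1, c2, a, total_effort):
    """Effort on option 1 that equalizes c_i * x_i**(a - 1) (matching); assumes a != 1."""
    ratio = (c1 / c2) ** (1.0 / (1.0 - a))  # x1 / x2 under matching
    return total_effort * ratio / (1.0 + ratio)


def total_reward(x1, c1, c2, a, total_effort):
    return c1 * x1 ** a + c2 * (total_effort - x1) ** a


def curvature(x1, c1, c2, a, total_effort, h=1e-4):
    """Central finite-difference second derivative of total reward along the constraint."""
    f = lambda x: total_reward(x, c1, c2, a, total_effort)
    return (f(x1 + h) - 2.0 * f(x1) + f(x1 - h)) / h ** 2


for a in (0.5, 2.0):
    x1 = matching_split(2.0, 1.0, a, 1.0)
    kind = "maximum" if curvature(x1, 2.0, 1.0, a, 1.0) < 0 else "minimum"
    print(f"a = {a}: matching puts {x1:.3f} of the effort on option 1 -> reward {kind}")
```

With increasing returns the matching split is still a critical point, but moving effort away from it in either direction raises the total reward.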

This conclusion also holds in environments in which rewards are stochastic and for maximization metrics that explicitly incorporate probabilistic outcomes. For instance, when rewards *SI Methods*). In addition, the *SI Methods*).

## Discussion and Conclusions

The matching law has been an influential description of the choice behavior of animals and humans (3, 8–15), but how this principle relates to the maximization apparatus of economic theories, and why individuals so often adopt matching, have been unclear. This article notes that reward depends on the invested effort and identifies the reward–effort contingencies for which matching is an optimal choice strategy. It is found that the contingencies for which matching provides a reward maximum share one striking commonality: They exhibit diminishing returns. In these cases (e.g., Fig. 2), the harvested reward increases as a function of the invested effort up to a point where additional investment confers a relatively small marginal benefit. The law of diminishing returns (29, 30) turns out to be an intriguing link that connects the psychological matching law with economic maximization.

The finding that matching is crucially based on diminishing returns was not obviously derivable from previous literature. The present study arrives at this finding by developing a framework that relates matching and optimization at a general level, by providing a complete space of solutions for which matching is optimal and by deriving the requirements for matching to attain reward maxima. Previous studies identified only two specific solutions, *Methods*).

The finding that matching critically rests on situations with diminishing returns raises the question of how commonly diminishing returns figure in nature and human society. The law of diminishing returns (29, 30) traces its history back to Turgot, who observed that each successive increment of capital and labor invested in agriculture yields a progressively smaller increase in output (31). The idea was subsequently elaborated by economists such as Malthus and Ricardo. The law of diminishing returns now lies at the heart of many branches of economics, including production, investment, and economic growth theories (32–35). For example, the present article identifies the Cobb–Douglas function (*Methods*), just as prescribed by the Cobb–Douglas function. This constitutes diminishing returns. Exemplified in production, if two or more production processes can be described by a Cobb–Douglas function with equal exponents

The findings of this article also apply to ecology. Consider a predator who must decide how to distribute her foraging effort between sources of prey. Foraging or harvest situations commonly involve diminishing returns (36). For example, either the predator gradually depletes the prey or the regeneration of a resource saturates due to factors such as crowding. This article shows that situations of diminishing returns allow the predator to distribute her effort effectively according to the matching law, equalizing the value (the obtained reward per invested effort) of each source. If source 2 yields twice as much prey as source 1, she spends twice as much effort on source 2 as on source 1, so that both sources return the same prey per unit of effort. The major benefit of this approach is its simplicity. The predator visits the sources to determine their value and distributes her effort to harvest equal value from both (37, 38). This is in contrast to a reward-maximizing agent, which must learn the outcomes of all possible allocations of effort across the sources and compute derivatives to evaluate the marginal reward per effort across the sources (39). In situations of diminishing returns like this, matching represents an effective heuristic for maximizing reward income.

Diminishing returns also apply to situations that rest on temporal, financial, and mental effort. For instance, when effort involves time (

Matching is not a ubiquitous phenomenon. There are situations, such as those modeled by random (variable) ratio schedules, in which the reward rate grows in proportion to the invested effort and therefore shows no diminishing returns (*Methods*). Indeed, subjects in such cases commonly converge on almost exclusively choosing the richer alternative (23, 24). Therefore, the presence or absence of diminishing returns in a given task can be used as an indicator of whether subjects should or should not exhibit matching.
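The contrast with ratio schedules can be sketched under the usual idealization that each unit of effort on option $i$ is rewarded with a fixed probability $p_i$, so that the expected reward $p_i x_i$ grows linearly with effort (the probabilities below are hypothetical). The value of each option then stays at $p_i$ whatever the allocation, so shifting effort cannot equalize values, and a grid search finds the reward maximum at the corner that assigns all effort to the richer option.

```python
def best_split(p1, p2, total_effort, steps=1000):
    """Grid search over effort on option 1 when expected reward is p_i * x_i (linear returns)."""
    candidates = [total_effort * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda x1: p1 * x1 + p2 * (total_effort - x1))


# Hypothetical reward probabilities per response of 0.10 and 0.05.
print(best_split(0.10, 0.05, 100.0))  # 100.0: all effort on the richer option
print(best_split(0.05, 0.10, 100.0))  # 0.0: all effort on the (now richer) option 2
```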

Diminishing returns embody a necessary condition for matching to provide maximal total reward harvested. This is not at the same time a sufficient condition; one can find examples of saturating functions for which matching does not imply a reward maximum. Such saturating functions can nonetheless be approximated with functions derived using specific forms of the generator function

In addition to illuminating the relationship between matching and reward maximization, this article also contributes to the body of research on effort-based decision making, in two ways. First, in the present framework, effort is treated as a resource instead of a variable with negative valence. Second, it is shown that computing value

Reward maximization represents a behavioral equilibrium characterized by equalized marginal reward per unit of effort across the choice options. Which optimization strategy may result in such an equilibrium has been a matter of debate (16, 18, 44). In stochastic environments, one of the main candidate frameworks that can lead to reward maximization has been Bayesian decision theory. According to this formalism, subjects make decisions so as to maximize the expected utility, which incorporates probability terms that model uncertain relationships between decisions and outcomes (5, 45). The Bayesian framework provides an optimal prescription for how individuals should update their probability estimates given prior experience and recent evidence. Although the neuronal computations underlying Bayesian inference appear to be biologically plausible (5, 46), it remains to be seen whether such computations can be combined with utility representations to provide the maximization metrics necessary to guide optimal decisions in complex choice situations (4, 5).
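As a minimal illustration of Bayesian updating (a standard textbook case chosen here, not a model taken from this article), a belief about the probability that an option is rewarded can be represented as a Beta distribution and updated after each rewarded or unrewarded choice.

```python
def update_beta(alpha, beta, outcomes):
    """Conjugate update of a Beta(alpha, beta) belief about a reward probability."""
    for rewarded in outcomes:
        if rewarded:
            alpha += 1  # one more observed success
        else:
            beta += 1  # one more observed failure
    return alpha, beta


def posterior_mean(alpha, beta):
    """Mean of the Beta(alpha, beta) distribution, i.e., the estimated reward probability."""
    return alpha / (alpha + beta)


# Start from a uniform prior, Beta(1, 1), and observe three choices of one option.
alpha, beta = update_beta(1, 1, [True, True, False])
print(alpha, beta, posterior_mean(alpha, beta))  # 3 2 0.6
```

Estimates of this kind supply the probability terms; combining them with utilities to choose an allocation is the further step whose biological implementation the paragraph above questions.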

Matching also represents a behavioral equilibrium, characterized by equalized average reward per effort across the choice options. Several behavioral strategies have been shown to result in matching (16, 47–50). One of the leading candidates, directly derived from matching, has been melioration (47, 51). According to melioration, decision makers assess, over a certain time period, the value (reward per effort) of each option and, at a certain frequency, shift their effort toward the option with the highest value. Equilibrium is achieved and subjects match when the values (reward per effort) are equalized across the options. In contrast, optimization, the process that leads to reward maximization, continuously reallocates effort to the option with the highest marginal reward per effort. Equilibrium is achieved when the marginals are equal across the choice options (*Optimal Decision Making*). Melioration has two major advantages over optimization. First, evaluating reward that has accumulated over a certain time period reduces noise in the reward income in stochastic environments. Second, because melioration operates over a certain time period, effort can be adjusted at a frequency corresponding to that time period; during optimization, effort is adjusted at each point in time. Evaluating cumulative values every so often is biologically much more plausible than computing local derivatives at each point in time. This paper shows that making decisions based on a certain time period, which is inherent to melioration and matching, can constitute a good strategy in choice environments with diminishing returns.
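The melioration dynamic can be sketched as a simple deterministic simulation. Assumptions, all for illustration only: two options with Cobb–Douglas contingencies $R_i(x) = c_i x^{a}$ and a shared exponent $a = 0.5$, noise-free rewards, and an update rule that shifts effort toward the higher-value option by an amount proportional to the value difference (the step size `rate` is a hypothetical parameter). The rule never computes a derivative, yet it settles where the values are equal, which for this contingency is also where the total reward is maximal.

```python
def meliorate(c, a, total_effort, x1=None, rate=0.05, periods=200):
    """Deterministic melioration between two options with R_i(x) = c_i * x**a.

    In each period the value (reward per effort) of both options is evaluated and
    effort is shifted toward the higher-value option in proportion to the value gap.
    """
    c1, c2 = c
    x1 = 0.5 * total_effort if x1 is None else x1
    eps = 1e-6 * total_effort  # keep both efforts strictly positive
    for _ in range(periods):
        x2 = total_effort - x1
        value_1 = c1 * x1 ** a / x1  # reward per effort on option 1
        value_2 = c2 * x2 ** a / x2  # reward per effort on option 2
        x1 = min(max(x1 + rate * (value_1 - value_2), eps), total_effort - eps)
    return x1


print(round(meliorate((2.0, 1.0), 0.5, 1.0), 4))  # 0.8
print(round(meliorate((2.0, 1.0), 0.5, 1.0, x1=0.95), 4))  # 0.8 again, approached from above
```

The update uses only the values of the two options, which could be estimated from rewards accumulated over a period; the marginal rewards of the contingencies never enter the rule.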

Deciding between different classes of options, such as whether to cook at home or eat out at a restaurant, involves a comparison of utilities and efforts associated with the options. In this regard, melioration as a simple behavioral strategy can be generalized to represent utilities instead of reward rates (14, 17). In this generalized model, subjects choose the option that provides the highest utility per effort (or per economic cost). In equilibrium, such a strategy leads to matching of effort (or financial resources) to the relative utilities.

In sum, the present article links two dominant frameworks of choice behavior and finds that matching can be an efficient and effective instantiation of economic maximization as long as the choice environment features diminishing returns. The observation that humans and animals so often behave according to the matching law now finds footing in the law of diminishing returns. In light of diminishing returns, matching becomes an efficient heuristic for optimal decision making.

## Methods

### Reward and Effort in Variable-Interval Schedules of Reinforcement.

The relationship between reward and effort in Fig. 2 was obtained using a simulation of the variable-interval schedule task. In this task, a reward of a certain magnitude is delivered at random intervals but at a constant overall rate. For example, the data shown in Fig. 2 use a rate of
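A variable-interval schedule of this kind can be simulated in a few lines. The sketch assumes (these details are illustrative, not the parameters behind Fig. 2) that rewards are armed after exponentially distributed intervals, that an armed reward is held until the next response collects it, and that responses form a Poisson process. Under these assumptions the obtained reward rate has the closed form $\lambda x/(\lambda + x)$ for arming rate $\lambda$ and response rate $x$, which rises with effort and saturates at $\lambda$, that is, it shows diminishing returns.

```python
import random


def simulate_vi(arm_rate, response_rate, duration, seed=0):
    """Obtained reward rate on a simulated variable-interval schedule.

    Assumptions (illustrative): rewards become available after exponential intervals
    with mean 1 / arm_rate, an available reward is held until the next response
    collects it, the interval timer restarts at collection, and responses form a
    Poisson process with rate response_rate.
    """
    rng = random.Random(seed)
    t = 0.0
    next_arm = rng.expovariate(arm_rate)  # time at which the next reward becomes available
    rewards = 0
    while True:
        t += rng.expovariate(response_rate)  # time of the next response
        if t > duration:
            break
        if t >= next_arm:  # a reward is waiting: this response collects it
            rewards += 1
            next_arm = t + rng.expovariate(arm_rate)
    return rewards / duration


def vi_rate(arm_rate, response_rate):
    """Closed form under the same assumptions: 1 / (mean arming wait + mean response wait)."""
    return arm_rate * response_rate / (arm_rate + response_rate)


# Reward rate (simulated, closed form) as the response rate increases; arming rate 1 per s.
for responses_per_s in (0.5, 1.0, 2.0, 8.0):
    print(responses_per_s, round(simulate_vi(1.0, responses_per_s, 50_000.0), 3),
          round(vi_rate(1.0, responses_per_s), 3))
```

Doubling the response rate from 1 to 2 per second raises the closed-form reward rate only from 0.5 to about 0.67, and no response rate can push it above the arming rate.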

### Optimal Decision Making.

This section derives how effort should be distributed among choice options to maximize the total reward harvested, **3** dictates how effort should be allocated to attain a critical point in the reward landscape. This can be a maximum, a minimum, or a saddle point. To obtain a reward maximum, the leading principal minors of the bordered Hessian matrix corresponding to Eq. **1**, evaluated at critical points, must alternate in sign, with the first minor (of order 3) showing a positive sign (34).

Let us label **1** is

The same two arguments also hold for choice situations that feature three and more options. For example, for three options, the leading principal minor of order 4 is equal to
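The sign conditions can be checked numerically. Writing $R_i(x_i)$ for the reward from option $i$ (our notation), the objective $\sum_i R_i(x_i)$ is separable and the constraint $\sum_i x_i = X$ is linear, so the Hessian of the Lagrangian is diagonal with entries $R_i''(x_i)$, and the constraint gradient contributes a border of ones. The sketch builds this bordered Hessian from given second derivatives and returns its leading principal minors of order 3 and higher; the second-derivative values are arbitrary examples.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return det


def bordered_hessian(second_derivatives):
    """Bordered Hessian for maximizing sum_i R_i(x_i) subject to sum_i x_i = X."""
    n = len(second_derivatives)
    h = [[0.0] + [1.0] * n]
    for i, d in enumerate(second_derivatives):
        h.append([1.0] + [d if j == i else 0.0 for j in range(n)])
    return h


def leading_minors(second_derivatives):
    """Leading principal minors of the bordered Hessian, of order 3 up to n + 1."""
    h = bordered_hessian(second_derivatives)
    return [determinant([row[:k] for row in h[:k]]) for k in range(3, len(h) + 1)]


print(leading_minors([-1.0, -2.0, -3.0]))  # diminishing returns: signs +, - (maximum)
print(leading_minors([1.0, 2.0, 3.0]))  # increasing returns: signs -, - (minimum)
```

For diminishing returns (all $R_i'' < 0$) the minors alternate in sign starting with a positive one, whereas for increasing returns (all $R_i'' > 0$) they are all negative.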

### Matching Behavior.

Behavior according to the matching law,

### Maximization–Matching Relation.

Matching and optimization align when marginal reward **4** into Eq. **3** leads to

Having turned the matching law and optimization face to face, we can identify the contingencies between reward and effort,
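One way to see how a generator function produces a contingency is to integrate the relation numerically. The sketch assumes the form $dR/dx = f(R/x)$, our formalization of the statement that the marginal reward per effort is a strictly monotonic function $f$ of the average reward per effort; the starting points and parameters are arbitrary. A linear generator $f(z) = a z$ yields the power law $R = C x^{a}$, and a quadratic generator $f(z) = z^2$ yields the saturating hyperbola $R = L x/(L + x)$. Under simple Poisson assumptions for a variable-interval schedule (rewards armed at rate $L$, responses at rate $x$), the obtained reward rate takes this same hyperbolic form.

```python
def integrate_contingency(generator, x0, r0, x_end, steps=10_000):
    """Integrate dR/dx = generator(R / x) from (x0, r0) to x_end with classical RK4.

    Assumes the generator maps average reward per effort (R / x) to marginal reward
    per effort (dR/dx); x0 must be positive.
    """
    h = (x_end - x0) / steps

    def slope(xv, rv):
        return generator(rv / xv)

    x, r = x0, r0
    for _ in range(steps):
        k1 = slope(x, r)
        k2 = slope(x + h / 2, r + h * k1 / 2)
        k3 = slope(x + h / 2, r + h * k2 / 2)
        k4 = slope(x + h, r + h * k3)
        r += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return r


# Linear generator f(z) = 0.5 z: power law R = 2 * x**0.5 (starts at R(1) = 2).
print(integrate_contingency(lambda z: 0.5 * z, 1.0, 2.0, 9.0), 2.0 * 9.0 ** 0.5)
# Quadratic generator f(z) = z**2: hyperbola R = L * x / (L + x) with L = 2.
L = 2.0
print(integrate_contingency(lambda z: z * z, 1.0, L / (L + 1.0), 5.0), L * 5.0 / (L + 5.0))
```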

### Reward–Effort Contingencies for Estimated Reward Returns.

Let us assign the matching terms **4**, now becoming

### Reward–Effort Contingencies for Actual Reward Returns.

Let us now consider the cases in which subjects match their response distribution to the actual returns obtained from choosing an option. In such cases, the value function takes the form **4**, now becoming **6** embodies the Cobb–Douglas function that was originally applied to model diminishing returns in production output as a function of the input labor (32). This function has also been identified to relate reinforcement and behavior in psychology (21).

As another example, for **4** generates a reward–effort contingency

All other reward–effort contingencies can be generated by defining a specific

### Reward Maxima and Diminishing Returns.

Matching provides a critical point in the reward landscape for every solution of Eq. **4**. It will be shown here that whether the critical point is a maximum, a minimum, or a saddle point depends solely on the properties of the generator function

#### Estimated reward returns.

Applying the chain rule, the derivative of Eq. **4** with respect to effort for estimated rewards *Optimal Decision Making* showed that for our constrained optimization problem to yield a reward maximum, it must be

#### Actual reward returns.

The proof proceeds in a similar fashion for actually obtained reward returns **4** with respect to effort is**4**, *Optimal Decision Making* showed that for a critical point to represent a reward maximum, it must be *i*) *ii*)

For matching,

### Specific Examples.

The previous section showed that matching provides a reward maximum when **5** and **6** for which **6**, in addition to **7**, the

The previous section proved that this is a general finding. For matching to deliver a reward maximum, the reward–effort contingencies of all choice options must show diminishing returns.

### Increasing Returns.

It is also worth considering the complement, i.e., the possibility that matching might be optimal under situations of increasing (and not diminishing) returns. Increasing returns for an option *Reward Maxima and Diminishing Returns* showed that when Eq. **4** is satisfied and matching holds, *Optimal Decision Making*) become all negative. But negative principal minors constitute a sufficient condition for a reward minimum (34). Thus, increasing returns do not allow matching to be optimal; diminishing returns are indeed required. An additional consequence of *Optimal Decision Making*) either alternate in sign (which is a sufficient condition for a reward maximum) or are all negative (which is a sufficient condition for a minimum). A third possible outcome, a saddle point, would require a distinct pattern of signs of the principal minors (34).

## SI Methods

### Probabilistic Reward Outcomes.

Rewards in natural settings are often stochastic. The stochasticity can manifest in two ways. Either the reward itself is delivered with a certain probability or the conversion of a subject’s decision to a choice is probabilistic. In both cases, a decision

### Optimal decision making with probabilistic rewards.

When rewards

Using the method of Lagrange multipliers to solve this problem, the Lagrangian is in this case formulated as**S3** dictates how effort should be allocated to attain a critical point in the reward landscape. This can be a maximum, a minimum, or a saddle point. To obtain a reward maximum, the leading principal minors of the bordered Hessian matrix corresponding to Eq. **S2**, evaluated at critical points, must alternate in sign, with the first minor (of order 3) showing a positive sign (34). As in *Methods* in the main text, let us label **S2** is

### Matching Behavior.

Matching can also be generalized to accommodate probabilistic reward, with subjects matching the expected values of rewards over the individual options:

### Matching–Optimization Relation.

Analogously to choice situations in the main text, matching and optimization in the probabilistic reward settings align when**S5** into Eq. **S3** leads to **S4**).
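A numerical sketch of this alignment, assuming (for illustration only) Cobb–Douglas contingencies $c_i x^{a}$ with a shared exponent and hypothetical reward probabilities $p_i$. The split at which $p_i c_i x_i^{a-1}$ is equal across options should both maximize the expected reward $\sum_i p_i c_i x_i^{a}$ and satisfy matching on expected values, i.e., each option's share of effort equals its share of expected reward.

```python
import math


def expected_reward(x1, p, c, a, total_effort):
    """Expected total reward sum_i p_i * c_i * x_i**a for two options."""
    x = (x1, total_effort - x1)
    return sum(pi * ci * xi ** a for pi, ci, xi in zip(p, c, x))


def closed_form_split(p, c, a, total_effort):
    """Effort on option 1 at which (p_i c_i) * x_i**(a - 1) is equal (assumes 0 < a < 1)."""
    w1, w2 = ((pi * ci) ** (1.0 / (1.0 - a)) for pi, ci in zip(p, c))
    return total_effort * w1 / (w1 + w2)


p, c, a, X = (0.5, 0.9), (2.0, 1.0), 0.5, 1.0
x1 = closed_form_split(p, c, a, X)

# Matching on expected values: effort share equals share of expected reward.
expected_values = (p[0] * c[0] * x1 ** a, p[1] * c[1] * (X - x1) ** a)
print(math.isclose(x1 / X, expected_values[0] / sum(expected_values)))

# Brute-force maximum of the expected reward on a grid of interior allocations.
grid_best = max((i / 10000 * X for i in range(1, 10000)),
                key=lambda y: expected_reward(y, p, c, a, X))
print(round(x1, 4), round(grid_best, 4))
```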

The contingencies between reward and effort,

### Reward–Effort Contingencies for Estimated Reward Returns.

It is easy to see that for a simple case of **S5**, now becoming

### Reward–Effort Contingencies for Actual Reward Returns.

Analogous findings are obtained for cases in which subjects match their response distribution to the actual returns obtained from choosing an option. For **S5**, now becoming **S5** generates a reward–effort contingency

As in the main text, all other reward–effort contingencies can be generated by defining specific forms of

### Reward Maxima and Diminishing Returns.

It will be shown here that for matching to deliver a maximum in the expected reward when rewards are probabilistic, the reward–effort profiles of all choice options must show diminishing returns. As in the main text, the proof follows from evaluating the second derivative of

#### Estimated reward returns.

Applying the chain rule, the derivative of Eq. **S5** with respect to effort for estimated rewards *Optimal Decision Making* showed that for our constrained optimization problem to yield a reward maximum, it must be

#### Actual reward returns.

The proof proceeds in a similar fashion for actually obtained reward returns **S5** with respect to effort is**S5**, *Optimal Decision Making* showed that for a critical point to represent a reward maximum, it must be

For matching,

Together, all findings obtained for deterministic rewards in the main text also hold in situations in which reward is probabilistic and in which subjects operate on expected values of rewards.

### Relationship to Prospect Theory.

The formalism of maximizing the expected value of reward (Eq. **S1**),

Note that in *SI Methods*, *Reward Maxima and Diminishing Returns*, the proof of matching resting on diminishing returns is very similar to that of the main text, with the exception of an additional multiplier

## Acknowledgments

I thank Drs. Leonard Green, Lawrence Snyder, Nina Miolane, and Julian Brown for helpful comments. This work was supported by NIH Grant K99NS100986 and by the McDonnell Center for Systems Neuroscience.

## Footnotes

¹Email: kubanek@stanford.edu.

Author contributions: J.K. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1703440114/-/DCSupplemental.

## References

- [1] Friedman M
- [2] Becker G
- [3] Mazur JE
- [5] Körding K
- [6] Hollis M, Nell EJ
- [10] Staddon J
- [12] Baum WM
- [13] Davison M, McCarthy D
- [14] Herrnstein RJ
- [15] Sugrue LP, Corrado GS, Newsome WT
- [16] Loewenstein Y, Seung HS
- [19] Binmore K
- [20] Staddon JER
- [21] Rachlin H
- [25] Otto AR, Taylor EG, Markman AB
- [26] Kubanek J, Snyder LH
- [28] Heyman GM
- [29] Shephard RW, Färe R
- [30] Brue SL
- [31] Turgot ARJ
- [32] Cobb CW, Douglas PH
- [34] Baumol WJ
- [35] Jones C
- [36] Real LA
- [37] Fagen R
- [42] Croxson PL, Walton ME, O’Reilly JX, Behrens TE, Rushworth MF
- [43] Prévost C, Pessiglione M, Météreau E, Cléry-Melin ML, Dreher JC
- [50] Soltani A, Wang XJ
- [51] Herrnstein RJ, Prelec D
