# Compressive mapping of number to space reflects dynamic encoding mechanisms, not static logarithmic transform


Edited by Wilson S. Geisler, The University of Texas at Austin, Austin, TX, and approved April 15, 2014 (received for review February 18, 2014)

## Significance

The ability to map numbers onto space is fundamental to measurement and mathematics. The mental “numberline” is an important predictor of math ability, thought to reflect an internal, native logarithmic representation of number that is later linearized by education. Here we demonstrate that the nonlinearity results not from a static logarithmic transformation but from dynamic processes that incorporate past history into numerosity judgments. We show strong and significant correlations between the response on the current trial and the magnitude of previous stimuli, and that subjects respond with a weighted average of current and recent stimuli, which completely explains the logarithmic-like nonlinearity. We suggest that this behavior reflects a general strategy, akin to predictive coding, to cope adaptively with environmental statistics.

## Abstract

The mapping of number onto space is fundamental to measurement and mathematics. However, the mapping of young children, unschooled adults, and adults under attentional load shows strong compressive nonlinearities, thought to reflect intrinsic logarithmic encoding mechanisms, which are later “linearized” by education. Here we advance and test an alternative explanation: that the nonlinearity results from adaptive mechanisms incorporating the statistics of recent stimuli. This theory predicts that the response to the current trial should depend on the magnitude of the previous trial, whereas a static logarithmic nonlinearity predicts trialwise independence. We found a strong and highly significant relationship between numberline mapping of the current trial and the magnitude of the previous trial, in both adults and school children, with the current response influenced by up to 15% of the previous trial value. The dependency is sufficient to account for the shape of the numberline, without requiring a logarithmic transform. We show that this dynamic strategy results in a reduction of reproduction error, and hence an improvement in accuracy.

Humans have a strong intuition of the spatial nature of numbers, usually (but not always) a horizontal “mental numberline,” with numbers increasing from left to right (1–4). However, the nature of number mapping is not identical for all, but changes during development, starting from a nonlinear representation, well characterized as logarithmic (placing, for example, the number 10 near the midpoint of a 1–100 scale), then becoming more linear over the first years of schooling (3, 5, 6). Similarly, logarithmic-like numberlines have been demonstrated in indigenous Amazonian populations without formal mathematical schooling (4).

Several recent studies have shown that under certain circumstances even the math-educated tend to reproduce numbers logarithmically. For example, we showed that depriving attentional resources leads to logarithmic-like numberline responses (7), consistent with the possibility that the native logarithmic encoding emerges when attention is deprived. Other studies have shown that the use of unfamiliar numerical format (such as exponential) can induce a switch from a linear to a logarithmic-like response, even in math-educated adults (7, 8). Most recently, Dotan and Dehaene (9) have devised a clever technique to record the whole trajectory of the pointing response (across the face of a touchscreen), rather than just the endpoint: The response begins quite logarithmically, then corrects toward linear mapping by the time contact is made. All these studies have led many to interpret the logarithmic map as the direct reflection of the internal native number representation (4, 10–12) that becomes corrected over time by education but can emerge under special circumstances.

Whereas the nonlinear numberline is consistent with intrinsic logarithmic processes, other explanations have been suggested. One promising possibility is that the nonlinearity results from a “central tendency of judgment” or “regression toward the mean” (7, 13, 14), which has been successfully applied to many perceptual tasks (15) and recently well described within the Bayesian framework (7, 16): Under conditions of uncertainty—such as under attentional load or unfamiliar numerical format—responses tend to be biased toward the mean of the stimulus distribution. Importantly, strategies of this sort can lead to reductions in error, particularly under conditions of high uncertainty or noise (7, 16–19). In the numberline task, regression toward the mean predicts a logarithmic pattern of results (7, 13), with a goodness of fit similar to a log-linear model (7).

How can we distinguish between these two plausible classes of explanations of the numberline nonlinearities? One major difference is that whereas the logarithmic transform is a static nonlinearity, regression to the mean is a dynamic process that requires continuous online updating. A strong prediction, therefore, is that there should exist serial dependencies between the response to the current stimulus and the magnitude of previous stimuli. In this study we measure intertrial dependencies and demonstrate strong and significant correlations between the current response and previous stimuli, clearly favoring the central tendency explanation of the nonlinear numberline. We go on to develop a simple Bayesian integration model to show how the intertrial dependencies can explain completely the nonlinear numberline, without resorting to logarithmic encoding or other static nonlinearities.

## Results

We measured numberline mapping under high and low attentional demand. Five subjects were asked to indicate the numerosity of clouds of dots on a line demarcated by a single dot on the left and a 100-dot cloud on the right. In the single-task condition they simply responded to the numerosity; in the dual-task condition they first responded to a difficult visual conjunction task, then to the numerosity (*Methods* and ref. 7). The symbols of Fig. 1 show the average results for single-task (Fig. 1*A*) and dual-task (Fig. 1*B*) conditions. Agreeing with previous studies (7, 14), the data show a less linear pattern under conditions of high attentional load. The thin black curves show the best linear and log fits, and the red curve a mixed model combining linear and logarithmic components:

$$R = a\left[\lambda\,\frac{N_{\max}}{\ln N_{\max}}\,\ln N + (1-\lambda)\,N\right] \tag{1}$$

where *R* is the response to numerosity *N*, *a* is a scaling factor, *N*_{max} the end of the numberline (equal to 100), and *λ* a factor describing the logarithmic nonlinearity (0 for pure linear, 1 for pure logarithmic). This two-parameter fit was quite good (fitting parameters are given in the legend to Fig. 1). The logarithmic component *λ* was 0.11 for the single-task condition, and 0.38 for the double task, reflecting the nonlinear number mapping under attentional load. The blue curves show the predictions of a Bayesian integration model, described later.
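The mixed model of Eq. **1** can be sketched numerically. Below is a minimal illustration, assuming the standard mixed form in which *λ* blends a linear term with a logarithm scaled to span the whole line; the parameter values shown are illustrative defaults, not the fitted ones:

```python
import math

def mixed_model(N, a=1.0, lam=0.38, N_max=100):
    """Mixed linear-logarithmic numberline mapping (Eq. 1).
    lam = 0 gives a purely linear map, lam = 1 a purely logarithmic one.
    Parameter values here are illustrative, not the fitted values."""
    log_part = N_max * math.log(N) / math.log(N_max)  # log term scaled to span the line
    return a * (lam * log_part + (1 - lam) * N)
```

With *λ* = 1 the number 10 maps to the midpoint of a 1–100 line (since ln 10/ln 100 = 0.5), reproducing the characteristic placement described in the introduction; with *λ* = 0 the mapping is veridical.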

Fig. 1 *C* and *D* show the SDs for the numberline judgments under the two conditions (on log–log coordinates). These can be taken as estimates of the thresholds for localization on the numberline. The thick lines show the output of two models (discussed later), and the thin straight lines best-fitting regressions (for use in modeling). In fitting the data, we excluded the subitizing range, because there is very good evidence that different mechanisms operate over that range (2). Furthermore, subitizing mechanisms are attention-dependent, operating only when there are sufficient attentional resources (20, 21). For that reason, it seemed safest to exclude those measurements from the fits of single-task data.

### Trialwise Dependencies.

The simplest direct prediction of the central-tendency explanation is that responses to trials preceded by a less numerous stimulus should on average be lower than responses to trials preceded by a more numerous stimulus. Fig. 2 shows that this prediction is borne out. The red points show response errors when the previous trial was more numerous (by at least 7), the blue points when it was at least 7 less numerous, and the green points when it was similar. For the dual-task condition the red and blue curves are clearly separated, and the difference is highly significant [*F*_{(1,4)} = 29.9, *P* < 0.005]. For the single-task condition (which showed little logarithmic tendency), the curves again separate (although less obviously), and again the difference is significant [*F*_{(1,4)} = 15.9, *P* = 0.016].

We next looked for more quantitative dependencies between the magnitude of the current responses and previous stimuli. Fig. 3*A* shows responses at four sample numerosities as a function of the numerosity of the previous trial for the single- (*Left*) and dual-task (*Right*) conditions. At low subitizing numerosities, the previous trials had very little effect, but at higher numerosities there was a clear dependency on the previous trial, with responses varying almost monotonically with the magnitude of previous stimulus. The dashed, color-coded lines show the robust linear regressions of the data, all of which have positive slope. The thick lines (in this and the other figures) are model predictions, discussed later.

We take the slope of these regressions as an index of the response dependency on previous stimuli and plot them in Fig. 3*B* as a function of the numerosity of the current trial (black squares). The dependencies are highest at mid-to-high numerosities, falling off at the lowest and the highest numerosities. We averaged the weights of Fig. 3*B* for the range of numerosities greater than 5 (outside the subitizing range) and plot them as a function of past and future trials in Fig. 3*C*. The average weights of trial *i*–1 were strong (w = 0.08 for single-task, w = 0.12 for dual-task), and highly significant (*P* < 10^{−5}, bootstrap sign test). For the dual-task condition, there was also a significant dependency for trial *i–*2 (w = 0.04, *P* = 0.04, bootstrap sign test). That for the single-task condition was also positive (w = 0.01) but not significantly greater than zero. Importantly, there were no significant dependencies on future trials, a strong control against statistical artifacts.
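The regression index used here can be sketched in a few lines. This is a simplified least-squares stand-in for the robust, per-numerosity regressions of Fig. 3 (it pools all numerosities and regresses response error on the previous stimulus), intended only to illustrate the logic of the analysis:

```python
import numpy as np

def serial_dependence_weight(stimuli, responses):
    """Slope of the current response error against the previous-trial
    numerosity: an index of how strongly trial i-1 leaks into trial i.
    Simplified illustration, not the paper's exact robust regression."""
    stim = np.asarray(stimuli, dtype=float)
    resp = np.asarray(responses, dtype=float)
    err = resp[1:] - stim[1:]   # response error on trials 2..n
    prev = stim[:-1]            # numerosity of trial i-1
    slope, _intercept = np.polyfit(prev, err, 1)
    return slope
```

A synthetic observer who adds 10% of the previous stimulus to each response yields a weight of exactly 0.1, of the same order as the 0.08–0.12 weights reported above.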

### Bayesian Integration Model.

To show how intertrial dependencies can predict the nonlinear numberline, we implement a simple Bayesian integration model whose output is shown by the blue lines in Figs. 1 and 3. This model should be considered more of an existence proof, showing that correlations can lead to nonlinearities, than an attempt to describe actual physiological mechanisms. The model resembles a Kalman filter, in that we assume that the expected response to any given trial *i* is given by the weighted average of the numerosity of the current stimulus and the estimate of that of the past:

$$R_i = w_{i-1}\,\hat{R}_{i-1} + w_i\,N_i \tag{2}$$

where $\hat{R}_{i-1}$ is the estimate of the numerosity of the previous trial (approximated by the response to it) and the weights sum to unity:

$$w_i = 1 - w_{i-1} \tag{3}$$

How do we define the weight $w_{i-1}$? Optimal integration weights each estimate by its reliability, the inverse of its variance:

$$w_{i-1} = \frac{\sigma_i^2}{\sigma_i^2 + \sigma_{i-1}^2}$$

This formula expresses ideal weighting when trials *i* and *i−*1 sample the same physical stimulus. In this case they do not, and the probability that they are different can be shown to vary with the square of their separation (22, 23). The formula now becomes

$$w_{i-1} = \frac{\sigma_i^2}{\sigma_i^2 + \sigma_{i-1}^2 + (N_i - N_{i-1})^2} \tag{4}$$

We assume that the SDs (thresholds) will, like most sensory discriminations, follow a power law:

$$\sigma_i = k\,N_i^{\alpha} \tag{5}$$

where *α* is the index of the power law and *k* a free constant (the only degree of freedom in the model), giving the overall level of noise. Substituting Eq. **5** in Eq. **4**, the weight given to the previous trial is

$$w_{i-1} = \frac{k^2 N_i^{2\alpha}}{k^2 N_i^{2\alpha} + k^2 N_{i-1}^{2\alpha} + (N_i - N_{i-1})^2} \tag{6}$$
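The weighting rule of Eqs. **4**–**6** can be written as a one-line function. This sketch assumes the reliability-plus-distance form described above, with power-law noise σ = *kN*^α; the values of *k* and *α* are illustrative, not the fitted ones:

```python
def weight_previous(N_i, N_prev, k=0.3, alpha=0.36):
    """Weight given to the previous trial (Eq. 6): it grows with the
    noise on the current estimate and shrinks with the squared distance
    between the two numerosities. k and alpha are illustrative values."""
    var_i = (k * N_i ** alpha) ** 2        # variance of current estimate (Eq. 5)
    var_prev = (k * N_prev ** alpha) ** 2  # variance of previous estimate
    return var_i / (var_i + var_prev + (N_i - N_prev) ** 2)
```

When the two numerosities are equal the weight is exactly 0.5 (equal reliabilities, zero separation); as the separation grows, the squared-distance term drives the weight toward zero, so only similar recent stimuli pull the response.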

Fig. 4 shows the general behavior of the model for various values of *α* and *k*. Note that *α* = 1 (threshold proportional to *N*) describes Weber’s law, *α* = 0.5 describes a square-root relationship, commonly referred to as “shot-noise” or “Poisson noise,” and *α* = 0 describes constant noise, invariant with numerosity. The curves for different *α* (Fig. 4*A*) all show strong logarithmic-like compression. The major effect of changing the index is that for high *α* the deviation from veridicality is mainly at high numbers, whereas for lower values there is also a deviation toward the mean at low numbers (clearer in the expanded inset). It is clear that the general pattern of results does not depend on a specific noise regime. Even the implausible assumption of constant noise (*α* = 0) causes a strong regression to the mean, logarithmic-like over much of the range. Simply adding the constraint of veridicality in the subitizing range would make the function quite like a logarithmic transform.
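The compression itself can be illustrated with a deterministic one-step approximation: average the weighted response over equally likely previous stimuli. This is only a sketch (the full model is recursive and stochastic), and *k* and *α* are illustrative values chosen to make the pull visible:

```python
def expected_response(N, k=2.0, alpha=0.36, n_max=100):
    """One-step, noise-free expected response to numerosity N, averaged
    over a uniform distribution of previous stimuli. Large numbers are
    pulled down toward the mean and small ones up, giving the
    logarithmic-like compression illustrated in Fig. 4 (sketch only)."""
    total = 0.0
    for n_prev in range(1, n_max + 1):
        var_i = (k * N ** alpha) ** 2
        var_prev = (k * n_prev ** alpha) ** 2
        w = var_i / (var_i + var_prev + (N - n_prev) ** 2)  # Eq. 6 weight
        total += w * n_prev + (1 - w) * N                   # Eq. 2 average
    return total / n_max
```

Because most previous stimuli lie below a high numerosity and above a low one, the expected response is compressed toward the center of the range, without any logarithmic term in the model.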

In the simulations of Fig. 4*A* the free parameter *k* was adjusted to best fit the data in the dual-task condition (which showed the most nonlinearity). Fig. 4*B* shows the effect of varying noise level *k* (fixing *α* at 0.36, the measured value for the dual-task condition). With increasing *k*, the functions become more curved, deviating more from veridicality in a logarithmic-like manner.

### Modeling the Data.

To model our numberline data, we calculated the value of the power-law index (*α* in Eqs. **5** and **6**) from the data (Fig. 1 *C* and *D*) by linear regression. This yielded a value of 0.52 (close to shot noise) for the single-task data and 0.36 for the dual-task condition. Note, however, that although using the data to determine *α* seemed the most assumption-free strategy, Fig. 4*A* shows that the choice of *α* is not fundamental for the general pattern of results. We then adjusted parameter *k* (the overall level of noise) to give the best fits to the data. With only this one degree of freedom, the model captures the nature of the numberline data with a comparable or better fit than the two-parameter log-linear model (fit parameters in the legend to Fig. 1).

The blue lines of Fig. 3 show Monte Carlo simulations of the model, obtained by simulating 1,000 virtual subjects, assuming that their estimates of *N*_{i} were corrupted by Gaussian noise of SD given by Eq. **5** (*α* and *k* given by data and best fit, respectively). Fig. 3*A* shows simulated responses as a function of the magnitude of the previous stimulus. In general, the pattern of results is similar to the data, with clear dependencies at all except the lowest numerosity. We then calculated linear regressions of the simulations and plotted these alongside those of the data in Fig. 3*B*. Again the model captures the trend in the data: low for low numerosities, higher for mid and high levels. It even captures the tendency of the single-task data to show weaker dependencies at the highest numbers. Fig. 3*C* shows the predicted overall dependencies, again capturing the trend in the data, predicting strong dependencies in both conditions for trial *i*–1 and also a measurable dependency at trial *i*–2, as was observed. The red curves of Fig. 3 *B* and *C* show the simulations of a memory-free system with partial logarithmic encoding (using the values of the fit of Eq. **1**, and the same values of *α* and *k* as for the Bayesian model). As may be expected, the static nonlinearity predicted zero correlations in all conditions. Although this prediction is obvious, it serves as a sanity check to ensure that the programs functioned correctly without introducing spurious, artifactual correlations.

Finally, we obtained fits to the data by simulating the behavior of 1,000 virtual subjects performing experimental sessions like the subjects, randomizing trial order for each simulation. For each trial the response was calculated as a weighted average of the current stimulus (corrupted by Gaussian noise with SD given by Eq. **5**) and the previous response:

$$R_i = w_{i-1}\,R_{i-1} + (1 - w_{i-1})\,\hat{N}_i$$

where $\hat{N}_i$ is the noise-corrupted estimate of the current numerosity and *w*_{i−1} follows Eq. **6**. Responses that would have exceeded the numberline (because of a large draw of noise) were clipped at the boundary of the line. The current response is stored and retrieved on the following simulated trial without further noise corruption.
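The simulation loop just described can be sketched for a single virtual subject. This is a minimal re-implementation under stated assumptions (reliability-plus-distance weighting of Eq. **6**, power-law noise of Eq. **5**, clipping to the line); *k*, *α*, and the seed are illustrative:

```python
import numpy as np

def simulate_session(stimuli, k=0.3, alpha=0.36, n_max=100, seed=0):
    """One virtual subject: each response is a weighted average of the
    noisy current estimate and the stored previous response, with the
    weight shrinking as the squared distance between current and previous
    numerosity grows; responses are clipped to the numberline."""
    rng = np.random.default_rng(seed)
    responses = []
    r_prev = None   # previous response (stored without extra noise)
    n_prev = None   # previous stimulus numerosity
    for N in stimuli:
        n_hat = N + rng.normal(0.0, k * N ** alpha)  # noisy estimate (Eq. 5)
        if r_prev is None:
            r = n_hat                                # first trial: no history
        else:
            var_i = (k * N ** alpha) ** 2
            var_p = (k * n_prev ** alpha) ** 2
            w = var_i / (var_i + var_p + (N - n_prev) ** 2)  # Eq. 6
            r = w * r_prev + (1.0 - w) * n_hat
        r = min(max(r, 1.0), float(n_max))           # clip to the line
        responses.append(r)
        r_prev, n_prev = r, N
    return responses
```

Averaging such sessions over many virtual subjects (with randomized trial orders) yields the blue curves of Fig. 1: responses stay on the line, track the stimuli, and inherit the one-back pull toward recent numerosities.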

The blue curves of Fig. 1 show the simulated responses (*A* and *B*) and SDs (*C* and *D*). Clearly, the simulation describes the data well, particularly in the dual-task condition where it captures 97% of the variance (see the Fig. 1 legend for fit parameters). The thick blue lines of Fig. 1 *C* and *D* show the SDs of the simulated responses. These follow a trend similar to the measured SDs, approximating a power law of similar slope to the data, but slightly higher: by a factor of 1.2 for the single task and 1.7 for the dual task. This implies that the system assumes a slightly higher noise level than actually exists. The red lines show the predictions of the log-linear model, assuming that precision is dictated by the inverse of the slope of the log-linear Eq. **1**. In practice this leads to Weber’s law when the model is fully logarithmic (*λ* = 1) and to constant noise when the mixture model is fully linear (*λ* = 0). However, we do not take the failure in predicted noise levels as strong evidence against the logarithmic model.

### Potential Advantages of Bayesian Integration.

Bayesian strategies are usually thought to be statistically advantageous, reducing the variability of sensory estimates (e.g., ref. 16). Fig. 5 shows how this may apply in this case. We simulated rms error as a function of the noise constant (*k* of Eq. **5**) for a memoryless linear system and the Bayesian integration model. The Bayesian model predicts less error than the memoryless model, and the difference increases with noise level. The symbols superimposed on the lines show the noise level that best fit the dual-task adult condition.

Fig. 5*B* gives an intuition of how the weighted average can lead to an advantage. We take the example of adult responses under attentional load. Following Jazayeri and Shadlen (16), we partition the error into two orthogonal components: the bias (average accuracy), given by the distance of the average response from veridicality (the distance of the points of Fig. 1 from the equality line), and the precision, given by the SD of the scatter of responses around that mean. The Pythagorean sum of these two components gives the total error, on this graph represented by the distance from the origin. While a memoryless linear system is essentially bias-free (near-vertical red curve), both the Bayesian model and the data show increasing bias for larger numbers. The result is that the total error (distance from the origin) is about 33% less than that predicted by a memoryless system with the same amount of noise. The Bayesian model predicts the nature of the error, with positive bias for low numbers and negative for high. The total error predicted is about 1.7 times more than that actually observed (as we have seen in Fig. 1), suggesting that the system overestimates its internal noise. Apart from this overestimation, however, the model captures the pattern of the data.
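The Pythagorean partition described above amounts to a quadrature sum of the two error components, which can be stated in one line:

```python
import math

def total_error(bias, precision_sd):
    """Pythagorean partition of rms error (as in Fig. 5B): bias is the
    distance of the mean response from veridicality, precision_sd the
    scatter around that mean; total rms error is their quadrature sum."""
    return math.hypot(bias, precision_sd)
```

For example, a response with a bias of 3 numberline units and a scatter SD of 4 has a total rms error of 5; trading a small bias for a large reduction in scatter can therefore lower the total error, which is exactly the trade the Bayesian model makes.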

### Nonlinear Number Mapping in Children.

Finally, we asked whether the current model may be useful in explaining previously reported numberline data for school-aged children. To this end, we reanalyzed the previously published (24) numberline data of 68 primary school children (ages 8–11 y). As with the adult data, we found a clear and statistically significant dependency of the current trial on trial *i*−1 (*P* = 0.002). The dependency on trial *i*–2 was not significant. A split-half analysis like that of Fig. 2 also revealed a significant difference between trials preceded by a lower numerosity and those preceded by a higher numerosity [*F*_{(1,3)} = 35.9, *P* = 0.009].

## Discussion

Two general conclusions can be drawn from this study. The first is that the characteristic nonlinearities in numberline mapping of young or unschooled subjects, or adults under attentional load, need not result from logarithmic encoding of number. We are able to account completely for the nonlinearities observed in three sets of data with a simple Bayesian model that performs a linear weighted sum between present and past stimuli. Obviously we cannot exclude that static nonlinearities also occur, but the nonlinear numberline cannot be taken as evidence for them. Second—and perhaps more importantly—we suggest that taking stimulus history into account may reflect a very general strategy of optimizing behavior to take into account environmental statistics. Because the physical world is largely stable and continuous over time, the recent past is a good predictor of the present (25). As Fischer and Whitney (26) have recently suggested for orientation, serial dependence of estimates of numerosity may reflect a basic mechanism to improve the efficiency of numerosity estimation.

### Logarithmic Transformation of Stimulus Magnitude.

Much evidence has been thought to reflect logarithmic encoding of number, including the approximation to Weber’s law (thresholds proportional to numerosity) over much of the numerosity range for both humans (27) and monkeys, and the logarithmic bandwidth of neurons selective to number (11, 28, 29). However, compressive nonlinearities do not necessarily implicate intrinsic logarithmic encoding (30–32). This is a very old controversy in psychophysics, dating back to Weber and Fechner, who interpreted the proportionality of sensory thresholds to stimulus intensity (now called the Weber–Fechner law) as reflecting logarithmic processing (as the derivative of the logarithm is 1/*x*) (33, 34). However, this was famously challenged by Stevens (35), who showed that the logarithmic-like behavior holds only at threshold. Weber–Fechner behavior for sensory attributes is now rarely interpreted to imply logarithmic encoding, but rather adaptation or gain control, mechanisms that have been linked to optimal behavior (26, 36–38). So the strongest evidence for logarithmic coding was the logarithmic numberline: Because that now has a more plausible explanation, there exists no evidence at all for logarithmic encoding of number in primate brains. Just as the dependency of increment thresholds on magnitude led to erroneous assumptions of logarithmic transformation of sensations, rather than dynamic gain control, so too has the logarithmic-like shape of the numberline led to assumptions of logarithmic encoding of number, rather than dynamic adaptive coding.

We also point out that although Weber’s law is often assumed, it is seldom actually observed. In a recent study (39) we showed that for discrimination of numerosity Weber’s law applied over only a very limited interval, with the square-root law (α = 0.5) operating over most of the range. In the current study, which required a pointing response (which presumably also introduced some noise), the power-law exponents ranged from 0.36 to 0.52, well outside the Weber range. Although this is not particularly strong evidence against logarithmic encoding, it certainly does not support the concept. And recent evidence that monkeys can perform simple additions almost linearly (40) also speaks against logarithmic encoding.

### Modeling.

We model our results with a simple Bayesian-like model, which incorporates a weighted estimate of past stimuli in a recursive way, so responses tend to be drawn toward the mean. The degree to which they deviate depends on three factors: the estimated reliability (inverse variance) of the current stimulus, that of the previous stimulus, and the difference in magnitudes between the two (Eq. **4**).

The model (Eq. **2**) resembles closely a Kalman filter, with the weighting to previous stimuli acting like Kalman gain. However, we do not attach any particular significance to this similarity. Kalman filters are typically used to stabilize and maintain calibration in control systems by minimizing the difference between predicted and observed states. The filter does not completely recalibrate each time a difference between prediction and observation occurs, because that would render it unstable. In our case it is not clear why a Kalman filter should apply, or what is being stabilized or calibrated. Perhaps the apparent similarity of our model to a Kalman filter merely results from the fact that both incorporate a running average in the final estimate of magnitude. We believe that the particular form of modeling we have chosen is not unique, but acts as an existence proof to show how trialwise correlations can lead to logarithmic-like distortions.

The model could certainly be refined. For example, there is no active memory in estimating the mean. Stimuli older than 1-back affect the predictions only because of the innate recursiveness of the model: The best estimate of the magnitude of trial *i*−1 is considered to be the response to it, which includes a weighting from trial *i*−2, etc. Indeed, in the condition with the highest noise (dual task), we do find a significant dependency on trial *i*−2. However, it is conceivable that the system may actively construct an estimate of the mean as the prior, as has been suggested in previous work (7, 16, 18). It would be relatively simple to extend the model to incorporate a running average of the mean, which may lead to further improvements in the fits.

Perhaps the most important question is, Why should the current responses depend on the magnitude of previous stimuli? Within the Bayesian framework, the use of priors has typically been considered to be optimal. Indeed, also in this case, the prior does lead to an improvement of overall accuracy (Fig. 5). However, the improvement is not great, only about 33%. Is this a large enough advantage to be driving this effect? Possibly, but it is also possible that other factors are at work. For example, the dependencies may reflect an inherent hysteresis of the system. Indeed, priming (positive dependency on past events) is a ubiquitous phenomenon in psychology (41, 42), which may well reflect a general strategy to cope adaptively in the natural environment (37, 43).

Some readers may wonder why we did not look for correlations between responses, rather than stimuli. Because Eq. **2** predicts a dependency on the estimate of the previous numerosity, approximated by the response to it (*R*_{i−1}), it may seem sensible to search for dependencies on previous responses, rather than stimuli. However, this is psychophysically impractical, because response biases could lead to spurious correlations between successive responses. For example, a subject first responding consistently too low, then consistently too high, would produce a spurious correlation between neighboring trials. However, no such behavior can lead to spurious correlations between previous stimuli and current responses, because each stimulus was drawn independently. Because *N*_{i−1} is the main component of the previous response *R*_{i−1}, showing a dependency of *R*_{i} on *N*_{i−1} is the strongest support possible for the model. That there is no dependency on *N*_{i+1} is very strong evidence against artifacts producing spurious correlations.

In our study we chose to use clouds of dot stimuli, because their sensory characteristics are more easily defined. However, many numberline tasks use symbolic stimuli, such as Arabic digits, and the logarithmic form of the numberline has been shown to be predictive of math performance in school-age children. We have yet to investigate whether the coding of symbolic numbers may be influenced by past history. Because the model does not depend on any particular power-law relationship, but works even with constant noise (*α* = 0), there is no reason why the model should not predict performance in this situation too. If similar effects are observed, it may provide insights into why nonlinear numberline performance is associated with poor math performance.

## Conclusions

To summarize, we have shown that when mapping number to space subjects adjust responses to take into account recent history, fitting well with suggestions that the spatial representation of numbers is not an inert map but a highly dynamic process (8). If this notion is correct, we are in a position to make predictions for a wide range of research. For example, our model predicts the serial dependency recently reported for perception of orientation (26). We also make predictions for other paradigms, such as Dotan and Dehaene’s (9) demonstration that numberline mapping responses head initially toward the “logarithmic” position before being corrected toward the linear response: We predict that the early responses should be highly correlated with prior stimuli, reflecting the intrinsic tendency to incorporate the prior, which is steadily reduced as evidence accumulates. We also expect to find strong trialwise correlations in many other experimental conditions, such as time reproduction (16) and even causality (44).

## Methods

Five adults naïve to the purpose of the study (mean age 26 y), all with normal or corrected-to-normal vision, participated in the study. Participants gave written informed consent. The experiments were approved by the local ethics committee (Azienda Ospedaliero-Universitaria Pisana n. 45060). Stimuli and procedures were similar to those in typical numberline experiments, described in detail elsewhere (e.g., ref. 7). Briefly, participants were presented with a cloud of dots and asked to indicate the quantity on a line demarcated by two sample numerosities. Each trial started with participants viewing a 22-cm “numberline” that remained visible throughout the trial with sample dot clouds representing the extremes: one dot on the left of the numberline and 100 on the right. Dot stimuli (half black, half white) were presented for 240 ms (in a circular region of 8° diameter) and were followed by a random-noise mask. The numerosities were 2, 3, 6, 18, 25, 42, 67, 71, 86; subjects responded by mouse click. As described in ref. 7, in the dual-task condition adults performed the task together with a color-orientation conjunction task on the central squares, before making the number-line judgment. In the single-task condition, everything was identical except they ignored the color-orientation stimuli and responded only to the number task. In each session, subjects were presented with two repetitions of each stimulus, a total of 18 trials, presented in random order. All subjects performed eight blocks of 18 trials each, randomly intermingling single- and dual-task conditions. No feedback was provided in any condition.

## Acknowledgments

This work was supported by the European Research Council (FP7; Space Time and Number in the Brain; and Early Sensory Cortex Plasticity and Adaptability in Human Adults) and the Italian Ministry of Research.

## Footnotes

^{1}To whom correspondence should be addressed. E-mail: dave{at}in.cnr.it.

Author contributions: G.M.C., G.A., and D.C.B. designed research; G.M.C. and G.A. performed research; G.M.C., G.A., and D.C.B. analyzed data; and G.M.C., G.A., and D.C.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

## References

- Dehaene S
- Dehaene S, Izard V, Spelke E, Pica P
- Siegler RS, Opfer JE
- Dehaene S
- Haggard P, Rossetti Y, Kawato M
- Dehaene S
- Hollingworth HL
- Petzschner FH, Glasauer S
- Cicchini GM, Arrighi R, Cecchetti L, Giusti M, Burr DC
- Burr DC, Turi M, Anobile G
- Roach NW, Heron J, McGraw PV
- Ernst MO
- Jevons WS
- Nieder A, Merten K
- Brannon EM, Wusthoff CJ, Gallistel CR, Gibbon J
- Holyoak KJ, Morrison R
- Gallistel CR, Gelman R
- Weber EH
- Fechner GT
- Gepshtein S, Lesmes LA, Albright TD
- Anobile G, Cicchini GM, Burr DC
- Livingstone MS, et al.
- Friston K, Kiebel S
## Article Classifications

- Biological Sciences
- Psychological and Cognitive Sciences