Stochastic sampling provides a unifying account of working memory limits

Attempts to characterize the limits of human working memory have differed on whether internal representations are discrete or continuous, with models of each type competing to best capture the errors observers make in delayed reproduction of elementary stimulus features. Here we show that discretization only weakly discriminates between models; the critical distinction is instead between deterministic (fixed) and stochastic (randomly varying) limits, with only the latter compatible with observed human performance and the underlying biological system. Reconceptualizing existing models in terms of sampling reveals strong commonalities between seemingly opposing accounts: adding stochasticity to a discrete model brings it into closer correspondence with theories of neural coding and puts its quality of fit on a par with continuous models, but also eliminates the stability and dependencies between items implied by a fixed set of “slots”. A probabilistic limit on the number of items successfully retrieved is an emergent property of stochastic sampling, with no explicit mechanism required to enforce it. These findings resolve discrepancies between previous accounts and establish a unified computational framework for further investigating working memory.

level γ and distributed evenly among N stimuli (as proposed in [2]), the mean number of samples available to recover each item is γ/N, but the actual number varies from one retrieval to the next according to a Poisson distribution. Decoding of population activity can therefore be interpreted as stochastic sampling of stimulus features.
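As a minimal sketch of this idea, the Python below computes the Poisson distribution of the number of samples available for one item when a total activity level γ is split evenly among N items. It shows that the count has mean and variance both equal to γ/N, and that the chance of an item receiving no samples at all grows with N. The function names and the illustrative value γ = 8 are hypothetical; the only modeling assumption taken from the text is Poisson counts with mean γ/N.

```python
import math


def sample_count_pmf(k, gamma_total, n_items):
    """P(an item receives k samples) when total activity gamma_total is split
    evenly over n_items and the count is Poisson with mean gamma_total / n_items."""
    lam = gamma_total / n_items
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def count_mean_and_variance(gamma_total, n_items, k_max=400):
    """Mean and variance of the per-item sample count (series truncated at k_max)."""
    pmf = [sample_count_pmf(k, gamma_total, n_items) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    gamma_total = 8.0  # hypothetical activity level
    for n_items in (1, 2, 4, 8):
        mean, var = count_mean_and_variance(gamma_total, n_items)
        p_zero = sample_count_pmf(0, gamma_total, n_items)
        print(f"N={n_items}: mean={mean:.3f} var={var:.3f} P(no samples)={p_zero:.4f}")
```

Because the variance equals the mean, the count for a given item is never fixed in advance: the same set size can yield very different numbers of samples on different retrievals.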
This stochasticity stands in contrast with most previous sampling-based models in the attentional and memory literature, and with the influential slots+averaging model [13], which can also readily be interpreted in terms of sampling (Fig. 1E-G). Each slot is postulated to hold a representation of a single object with a fixed precision, and thus provides a noisy sample of the object's feature values. Multiple slots, or samples, that correspond to the same object are averaged at retrieval to improve recall precision. Independent of whether working memory representations are feature- or object-based [14,15], the critical difference from stochastic sampling is that the total number of samples available for all items is fixed at a value K.
We now consider the distribution of representational precision in these models. For any given set of samples, the information they provide about the stimulus is described by the likelihood function. The width of the likelihood function is a measure of uncertainty in the estimate that also reflects trial-to-trial variability (see SI Sec. 4.2 and Fig. S1), so we define the precision of an individual estimate in terms of this width. If samples are drawn from a Gaussian distribution, precision increases linearly with the number of samples.
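The linear relationship can be checked with a small sketch, assuming independent Gaussian samples of a stimulus value θ, each with precision ω. It defines precision as the negative curvature of the log-likelihood at its peak (a width-based measure, in the spirit of the definition above). The maximum-likelihood estimate is simply the average of the samples, as in slots+averaging, and for n samples the precision comes out as nω regardless of the particular sample values. The function names and the values θ = 0.3 and ω = 4 are hypothetical.

```python
import math
import random


def log_likelihood(theta, samples, omega):
    """Gaussian log-likelihood of a stimulus value theta given independent
    samples, each with precision omega (additive constants dropped)."""
    return -0.5 * omega * sum((x - theta) ** 2 for x in samples)


def likelihood_precision(samples, omega, h=1e-3):
    """Precision of the estimate, defined as the negative curvature of the
    log-likelihood at its peak (central second difference)."""
    theta_hat = sum(samples) / len(samples)  # ML estimate: average of the samples

    def f(t):
        return log_likelihood(t, samples, omega)

    return -(f(theta_hat + h) - 2.0 * f(theta_hat) + f(theta_hat - h)) / h ** 2


if __name__ == "__main__":
    rng = random.Random(1)
    theta_true, omega = 0.3, 4.0  # hypothetical stimulus value and sample precision
    for n in (1, 2, 5, 10):
        samples = [rng.gauss(theta_true, 1.0 / math.sqrt(omega)) for _ in range(n)]
        print(n, round(likelihood_precision(samples, omega), 6))  # close to n * omega
```

Since the Gaussian log-likelihood is exactly quadratic, the finite-difference curvature recovers nω up to floating-point rounding.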
In the stochastic sampling model, precision has a Poisson distribution scaled by the precision of a single sample (Fig. 1C). The distribution of decoding errors can be described as a scale mixture of normal distributions with precision proportional to the sample count (Fig. 1D; for the circular stimulus spaces typically used experimentally, this is a close approximation rather than an exact description; see SI Sec. 5.2). The dispersion of errors increases with decreasing activity (e.g. as a result of increasing set size; black curve vs. red curve in Fig. 1D), and their distribution is leptokurtic, with long tails evident at lower activity levels (red curve).
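The leptokurtosis of such a scale mixture can be computed exactly in a short sketch. For a zero-mean normal the fourth moment is three times the squared variance, so the kurtosis of the mixture is 3·E[σ⁴]/E[σ²]², which exceeds 3 whenever the variance σ² = 1/(ωk) varies across trials. This sketch works on an unbounded line and, as a loudly flagged assumption, conditions on at least one sample (k ≥ 1), because zero samples give no finite-variance normal component; it therefore should not be read as modeling very low activity levels. Function names and the activity levels used are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def error_kurtosis(mean_samples, omega=1.0, k_max=400):
    """Kurtosis of decoding errors drawn from a scale mixture of zero-mean
    normals with precision omega * k, where k ~ Poisson(mean_samples).

    Assumption (for this linear-space sketch): trials with k = 0 carry no
    information and are excluded, i.e. the mixture is conditioned on k >= 1.
    """
    p_nonzero = -math.expm1(-mean_samples)  # P(k >= 1)
    m2 = m4 = 0.0
    for k in range(1, k_max + 1):
        w = poisson_pmf(k, mean_samples) / p_nonzero
        var = 1.0 / (omega * k)  # error variance given k samples
        m2 += w * var  # E[e^2] accumulates E[var]
        m4 += w * 3.0 * var ** 2  # E[e^4 | k] = 3 * var^2 for a normal
    return m4 / m2 ** 2  # 3.0 for a single normal; > 3 means leptokurtic


if __name__ == "__main__":
    for mean_samples in (20.0, 10.0, 5.0):  # hypothetical activity levels
        print(mean_samples, round(error_kurtosis(mean_samples), 3))
```

The result does not depend on ω, only on how variable the sample count is relative to its mean; in this range, lower activity produces heavier tails.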
In the fixed sampling model, making the common assumption that samples are distributed as evenly as possible among items [13,16], we obtain a discrete distribution over at most two precision values (Fig. 1F), which are multiples of the sample precision. As in the stochastic model, mean precision is inversely proportional to set size, but because the distributions over precision differ, the fixed and stochastic models make distinct, testable predictions for error distributions (Fig. 1G).
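A minimal sketch of the fixed sampling model's precision distribution, assuming K samples of precision ω split as evenly as possible among N items: each item receives either ⌊K/N⌋ or ⌈K/N⌉ samples, so at most two precision values occur, and the mean precision is ωK/N. Exact fractions are used so the mean can be compared without rounding; the function names are hypothetical.

```python
from fractions import Fraction


def fixed_precision_distribution(n_samples, n_items, omega=1):
    """Distribution of per-item precision when n_samples (K) samples of
    precision omega are split as evenly as possible among n_items (N)."""
    base, extra = divmod(n_samples, n_items)
    dist = {base * omega: Fraction(n_items - extra, n_items)}
    if extra:  # `extra` items receive one additional sample
        dist[(base + 1) * omega] = Fraction(extra, n_items)
    return dist


def mean_precision(dist):
    return sum(prec * prob for prec, prob in dist.items())


if __name__ == "__main__":
    for n_items in (1, 2, 3, 5, 10):
        dist = fixed_precision_distribution(8, n_items)  # hypothetical K = 8
        print(n_items, dist, mean_precision(dist))
```

For K = 8 and N = 3, for example, two thirds of items have precision 3ω and one third 2ω; once N exceeds K, the lower value is 0, i.e. some items receive no samples.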
We fit the stochastic and fixed sampling models to a large dataset of single-report and whole-report tasks (see SI Sec. 1 and Fig. S2). In the latter, participants reported the feature values of all presented items, providing additional information about subjective confidence (which we equate with likelihood width) and error correlations between items, for which fixed and stochastic models make differing predictions. The stochastic model fit the data substantially better than the fixed sampling model for both types of task (Fig. 2A and B, Fig. S3), indicating that stochasticity is critical for capturing behavioral performance. Contrary to previous interpretations [17], model comparison on whole-report data did not support a slot-like mechanism with a fixed item limit. Intermediate models, in which either a fixed number of samples was randomly allocated to items (random-fixed model) or a Poisson-distributed random number of samples was distributed as evenly as possible between items (even-stochastic model), produced intermediate quality of fit overall (Fig. 2C); the even-stochastic model's advantage over the fully stochastic model on single-report data was outweighed by its significantly worse fit to whole-report data.
[Figure caption fragment: “… [16,18]). The Fano factor is fixed at ω₁ across all set sizes and levels of discretization. Note that discrete precision values for different set sizes are slightly shifted for visibility. For finite thresholds smaller than the base precision ω₁, the number of above-threshold items saturates with increasing set size for all model parameters.”]
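To make the four allocation schemes concrete, the sketch below draws per-item sample counts under each of them and measures the correlation between two items' counts across trials. It is a simulation under stated assumptions, not the fitting procedure used here: the model labels follow the text, but the implementation, the Knuth Poisson sampler, and the illustrative values (6 samples or 6 expected samples, 4 items) are hypothetical choices. Whenever the total is fixed, items compete for the same pool and their counts are negatively correlated; in the fully stochastic model the counts are independent, which is one source of the differing predictions for between-item error correlations in whole-report data.

```python
import math
import random


def poisson(rng, lam):
    """Poisson random variate by Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def allocate(model, total, n_items, rng):
    """Sample counts for n_items on one trial. `total` is the fixed number of
    samples K (fixed models) or the expected total (stochastic models)."""
    if model == "fixed":  # K split as evenly as possible
        base, extra = divmod(total, n_items)
        counts = [base + 1] * extra + [base] * (n_items - extra)
        rng.shuffle(counts)  # which items get the extra samples varies by trial
        return counts
    if model == "random-fixed":  # each of the K samples goes to a random item
        counts = [0] * n_items
        for _ in range(total):
            counts[rng.randrange(n_items)] += 1
        return counts
    if model == "even-stochastic":  # Poisson total, split as evenly as possible
        return allocate("fixed", poisson(rng, total), n_items, rng)
    if model == "stochastic":  # independent Poisson count for every item
        return [poisson(rng, total / n_items) for _ in range(n_items)]
    raise ValueError(f"unknown model: {model}")


def count_correlation(model, total=6, n_items=4, n_trials=20_000, seed=0):
    """Pearson correlation between the sample counts of items 0 and 1."""
    rng = random.Random(seed)
    pairs = [allocate(model, total, n_items, rng)[:2] for _ in range(n_trials)]
    mx = sum(a for a, _ in pairs) / n_trials
    my = sum(b for _, b in pairs) / n_trials
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n_trials
    vx = sum((a - mx) ** 2 for a, _ in pairs) / n_trials
    vy = sum((b - my) ** 2 for _, b in pairs) / n_trials
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    for model in ("fixed", "random-fixed", "even-stochastic", "stochastic"):
        print(f"{model:>16}: r = {count_correlation(model):+.3f}")
```

With equal allocation probabilities, the random-fixed counts are multinomial, for which the expected correlation between two items is -1/(N-1).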
As the precision of each individual sample decreases, we find that the precision distribution approaches a continuous Gamma distribution (Fig. 3D; see SI Sec. 6.3). Two previous studies [16,18] independently proposed a continuous scale mixture of normal distributions with Gamma-distributed precision to account for behavioral data, but could not motivate this choice theoretically. We can now account for these variable precision models as a limiting case of stochastic sampling with a very large number of very low-precision samples. We found that the Gamma model (p → 0) fit single-report data more poorly than the Poisson model (p = 1), and the best fits were obtained at intermediate levels of discretization (maximum likelihood at p = 0.39; Fig. 2E, top). However, individuals varied considerably in their estimated discretization parameter (Fig. 2E, bottom), and differences in fit (measured in log likelihood) were on average an order of magnitude smaller than those between fixed and stochastic sampling (compare with Fig. 2A), reflecting the limited effect of sampling discreteness on predicted error distributions (insets in Fig. 3).
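The Gamma limit can be illustrated numerically under explicit assumptions. This sketch parameterizes discreteness by a sample precision ω_s (hypothetical notation) rather than by p, and assumes that sample counts follow a negative binomial distribution whose Fano factor is ω₁/ω_s, so that precision J = ω_s·k keeps mean μ and a precision Fano factor of ω₁, consistent with the figure note that the Fano factor is fixed at ω₁. The matching Gamma distribution has shape μ/ω₁ and scale ω₁. The sketch reports the largest gap between the discrete distribution, rescaled to a density, and the Gamma density; the negative binomial choice, function names, and values μ = 4, ω₁ = 1 are illustrative assumptions.

```python
import math


def nb_log_pmf(k, r, q):
    """log P(k) for a negative binomial count: k failures before r successes,
    success probability q (r may be non-integer)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(q) + k * math.log1p(-q))


def gamma_log_pdf(x, shape, scale):
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def max_density_gap(mean_precision, sample_precision, omega_1=1.0):
    """Largest gap, over precision values up to 5 * mean, between the discrete
    precision distribution (pmf / sample_precision) and the moment-matched Gamma."""
    if not 0 < sample_precision < omega_1:
        raise ValueError("need 0 < sample_precision < omega_1")
    fano = omega_1 / sample_precision  # count Fano factor (> 1: overdispersed)
    r = (mean_precision / sample_precision) / (fano - 1.0)
    q = 1.0 / fano
    shape, scale = mean_precision / omega_1, omega_1
    gap = 0.0
    for k in range(1, int(5 * mean_precision / sample_precision) + 1):
        x = sample_precision * k
        discrete = math.exp(nb_log_pmf(k, r, q)) / sample_precision
        continuous = math.exp(gamma_log_pdf(x, shape, scale))
        gap = max(gap, abs(discrete - continuous))
    return gap


if __name__ == "__main__":
    for s in (0.5, 0.1, 0.01):  # decreasing sample precision
        print(s, round(max_density_gap(4.0, s), 5))
```

By construction the mean and variance of J match the Gamma at every ω_s; only the higher moments and the discreteness differ, and both vanish as ω_s → 0.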
One prediction of working memory models with a fixed number of samples [13,17] is the appearance of random "guesses" once the number of items exceeds that limit (Fig. 4A-B). In the stochastic sampling account, by contrast, the number of samples available for each item varies probabilistically and independently of every other item. Nonetheless, if a small positive precision value is chosen as a threshold, the expected number of items that exceed that precision threshold saturates as set size increases, irrespective of the level of discretization (Fig. 4C-D; SI Sec. 6.1). The asymptote depends on both the threshold precision and the level of discretization, and does not correspond in any direct way to the number of samples in the model. Thus, a probabilistic item limit is an emergent property of stochastic sampling that neither requires an explicit mechanism nor implies a particular number of samples.
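A short sketch of this saturation, under stated assumptions: the total expected precision γ·ω₁ is split evenly over N items, counts are Poisson when the sample precision ω_s equals ω₁ and otherwise negative binomial with Fano factor ω₁/ω_s (an assumed parameterization, with ω_s as hypothetical notation), and the threshold τ lies below ω_s, so an item exceeds it exactly when it receives at least one sample. The expected number of such items, N·P(k ≥ 1), rises with N toward the finite limit γ·ω₁·ln(ω₁/ω_s)/(ω₁ − ω_s), which reduces to γ in the Poisson case; the limit changes with ω_s even though the total expected precision is held fixed. Function names and values are illustrative.

```python
import math


def p_at_least_one_sample(mean_precision, sample_precision, omega_1=1.0):
    """P(k >= 1), i.e. P(precision > tau) for any threshold 0 < tau < sample_precision."""
    m = mean_precision / sample_precision  # expected sample count for the item
    fano = omega_1 / sample_precision
    if math.isclose(fano, 1.0):
        return -math.expm1(-m)  # Poisson counts: 1 - exp(-m)
    r = m / (fano - 1.0)
    return -math.expm1(-r * math.log(fano))  # negative binomial: 1 - fano**(-r)


def expected_items_above_threshold(n_items, gamma_total, sample_precision, omega_1=1.0):
    mean_precision = gamma_total * omega_1 / n_items  # resource split evenly
    return n_items * p_at_least_one_sample(mean_precision, sample_precision, omega_1)


def asymptote(gamma_total, sample_precision, omega_1=1.0):
    """Limit of the expected count as n_items -> infinity."""
    if math.isclose(sample_precision, omega_1):
        return gamma_total
    return (gamma_total * omega_1 * math.log(omega_1 / sample_precision)
            / (omega_1 - sample_precision))


if __name__ == "__main__":
    for s in (1.0, 0.5, 0.25):  # hypothetical sample precisions
        row = [expected_items_above_threshold(n, 4.0, s) for n in (1, 2, 4, 8, 16, 64)]
        print(s, [round(v, 2) for v in row], "limit", round(asymptote(4.0, s), 2))
```

For comparison, when K samples are split evenly in the fixed model, the number of items receiving at least one sample is exactly min(N, K), a hard cap rather than a smooth, probabilistic one.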
In order for the Fano factor relating the variance and mean of precision to be held constant as sample counts increase, the sample counts themselves must become "overdispersed" compared to Poisson variability (i.e. FF > 1). Overdispersion of spike counts is a common observation in visual cortical neurons, typically with FF in the range 1.5-3 (e.g. [19]), corresponding in our model to discretization p in the range 0.33-0.75. Additionally, several other factors present in real neural populations could have effects similar to decreasing discretization in the generalized model. Heterogeneity in tuning functions [20] leads to variation in the information carried by each spike, which smooths out the discrete distributions over precision predicted by a homogeneous Poisson model. This has consequences for estimation error similar to those of decreasing p in the generalized model. Consistent with this idea, incorporating biologically realistic heterogeneity into the population model improved fits to data (see SI Sec. 5.1 and Fig. S4).
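The overdispersion requirement follows in one step. Writing J for an item's precision, k for its sample count, and ω_s for the precision of a single sample (notation introduced here for illustration), and requiring the Fano factor of precision to stay at ω₁:

```latex
\begin{aligned}
J &= \omega_s k, \qquad \mathbb{E}[J] = \omega_s\,\mathbb{E}[k], \qquad \operatorname{Var}[J] = \omega_s^{2}\,\operatorname{Var}[k],\\
\frac{\operatorname{Var}[J]}{\mathbb{E}[J]} &= \omega_s\,\frac{\operatorname{Var}[k]}{\mathbb{E}[k]} = \omega_s\,\mathrm{FF}_k = \omega_1
\quad\Longrightarrow\quad
\mathrm{FF}_k = \frac{\omega_1}{\omega_s} > 1 \ \text{ whenever } \omega_s < \omega_1 .
\end{aligned}
```

A Poisson count (FF_k = 1) satisfies this only when ω_s = ω₁; any smaller sample precision, and hence larger sample count at the same mean precision, requires overdispersed counts.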
Spikes in real neural populations are not independent events, as assumed by the sampling interpretation, but are correlated within and between neurons. This will tend to produce deviations from the simple additivity assumed by sampling. Implementing short-range correlations in the population model greatly increased the number of decoded spikes required to reproduce behavioral data, without changing quality of fit (see SI Sec. 5.1). We note, however, that the exact consequences of spike correlations for decoding depend on details of correlation structure that are difficult to measure experimentally [21][22][23], and that suboptimal inference (in the form of a mismatched decoder) could also play a part [24].
The degree to which working memory samples are discretized versus continuous has only very weak effects on predicted retrieval errors under the stochastic sampling model (e.g. insets of Fig. 3). Importantly, discrete representations are compatible with an underlying continuous memory resource that can be distributed according to behavioral goals [25][26][27]. Indeed, in the stochastic model the integer number of samples available for each item at retrieval is unpredictable, and so cannot be the basis of prioritization. Instead, the resource distributed between items corresponds to the mean or expected total number of samples, which is constant and continuous-valued. In the neural model [2] this is equated with the instantaneous firing rate or membrane potential, while decoding is based on the expression of this rate in discrete spikes.
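A minimal sketch of what prioritizing a continuous resource could look like, assuming the expected total sample count γ is divided among items in proportion to hypothetical priority weights and each item's realized count is Poisson around its share. The shares are continuous-valued and sum exactly to γ, while the integer counts on any given retrieval remain random; the function names, the weights, and γ = 5.3 are illustrative assumptions.

```python
import math
import random


def allocate_resource(gamma_total, priorities):
    """Split the expected total sample count among items in proportion to
    hypothetical priority weights; returns per-item Poisson means."""
    weight_sum = sum(priorities)
    return [gamma_total * w / weight_sum for w in priorities]


def p_no_samples(mean_count):
    return math.exp(-mean_count)  # Poisson P(k = 0)


def draw_counts(means, rng):
    """One retrieval: Poisson counts via Knuth's multiplication method."""
    counts = []
    for lam in means:
        threshold, k, prod = math.exp(-lam), 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        counts.append(k)
    return counts


if __name__ == "__main__":
    means = allocate_resource(5.3, [2.0, 1.0, 1.0])  # hypothetical values
    for lam in means:
        print(f"expected samples {lam:.3f}, P(no samples) {p_no_samples(lam):.3f}")
    rng = random.Random(0)
    print("realized counts on three retrievals:", [draw_counts(means, rng) for _ in range(3)])
```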
The stochastic sampling model can be understood at multiple levels: in purely descriptive terms, as a form of mixture model (like the normal+uniform model [13]); at a cognitive level, in terms of averaging samples; and at a neurocomputational level, via its implementation in population coding. The neural interpretation provides the link to another recent proposal for understanding recall errors, psychophysical scaling [28], which has an alternative expression as a Gaussian-noise population model [29].