Attention model of binocular rivalry

Significance. Binocular rivalry provides a unique opportunity to characterize intrinsic neural dynamics of cortical processing. A computational model was developed as a parsimonious explanation of the empirical phenomena of rivalry for which there was no previous explanation. The key idea in the model is that rivalry relies on interactions between sensory processing and attentional modulation with distinct dynamics and selectivity. Bifurcation theory was used to identify the parameter regime in which the behavior of the model was consistent with empirical findings. The model explained a wide range of phenomena, including (i) that binocular rivalry requires attention, (ii) that different perceptual states emerge when the two images are swapped between the eyes, and (iii) how dominance duration changes as a function of stimulus input strength.

When the corresponding retinal locations in the two eyes are presented with incompatible images, a stable percept gives way to perceptual alternations in which the two images compete for perceptual dominance. As perceptual experience evolves dynamically under constant external inputs, binocular rivalry has been used for studying intrinsic cortical computations and for understanding how the brain regulates competing inputs. Converging behavioral and EEG results have shown that binocular rivalry and attention are intertwined: binocular rivalry ceases when attention is diverted away from the rivalry stimuli. In addition, the competing image in one eye suppresses the target in the other eye through a pattern of gain changes similar to those induced by attention. These results require a revision of the current computational theories of binocular rivalry, in which the role of attention is ignored. Here, we provide a computational model of binocular rivalry. In the model, competition between two images in rivalry is driven by both attentional modulation and mutual inhibition, which have distinct selectivity (feature vs. eye of origin) and dynamics (relatively slow vs. relatively fast). The proposed model explains a wide range of phenomena reported in rivalry, including the three hallmarks: (i) binocular rivalry requires attention; (ii) various perceptual states emerge when the two images are swapped between the eyes multiple times per second; (iii) the dominance duration as a function of input strength follows Levelt’s propositions. With a bifurcation analysis, we identified the parameter space in which the model’s behavior was consistent with experimental results.

binocular rivalry | bistable perception | visual attention | computational model | dynamical system

Binocular rivalry is a visual phenomenon in which perception alternates between incompatible monocular images presented to the two eyes.
During binocular rivalry, perceptual experience evolves dynamically while the external inputs are held constant. Binocular rivalry thereby provides an opportunity to gain insights about the intrinsic cortical computations underlying visual perception (1,2).
In conventional models of binocular rivalry, the competition between two percepts has been characterized as mutual inhibition between two populations of neurons selective for each of the two stimuli (3-11). Notwithstanding the differences in their details, these models consider the neural processing underlying binocular rivalry to be an automatic process. These models predict, therefore, that the dynamics of binocular rivalry are influenced mainly by bottom-up sensory inputs.
Converging experimental evidence has shown, however, that binocular rivalry also depends on attention (for a review, see ref. 12). First, EEG has been used to measure a neural correlate of the perceptual alternations during binocular rivalry when observers pay attention to the rival stimuli (13). However, this rivalry-induced modulation of the EEG signal is largely or entirely eliminated when attention is diverted away from the stimuli (14). Second, behavioral experiments comparing the perceptual consequences induced by attended and unattended rival stimuli also support the notion that binocular competition in general, and binocular rivalry in particular, requires attention (15-18). These findings demonstrate that visual attention, the neural and cognitive process that selectively prioritizes information under natural viewing conditions, is critical for binocular rivalry.
Here, we propose a computational model of binocular rivalry in which there are two processes that drive perceptual competition: attention and mutual inhibition. (i) Attention: According to the model, the two rival stimuli compete for attentional resources. Attention is modeled as multiplicative gains (attention gain factors; ref. 19) that fluctuate between neural populations selective for the two rival stimuli. At a given moment, the stimulus associated with stronger sensory responses attracts a greater share of attention and reduces the attention allocated to the other stimulus. This stimulus-driven attentional modulation is presumed to be active when observers attend to the stimuli, but silent when attention is diverted. (ii) Mutual inhibition: In the model, mutual inhibition is mediated through opponency neurons (20). The opponency neurons take conflicting information between two eyes as inputs and suppress the activity of monocular neurons that respond to one or the other eye.
We show that the model exhibits three experimental hallmarks of binocular rivalry: (i) Rivalry (i.e., response alternations between two competing neural representations) occurs for attended stimuli with interocular conflict, but not without interocular conflict and not for unattended stimuli (12, 14-16). (ii) When the stimuli are rapidly swapped back and forth between the two eyes, the simulated percept either follows one image across the swapping or it follows the stimuli in one eye, depending on the temporal characteristics of the stimuli (21-24). (iii) The simulated dominance duration changes as a function of stimulus strength, following Levelt's propositions (25,26). Bifurcation analysis was used to explore all of the possible behaviors of the model as a function of the strength of sensory inputs, the amount of attentional modulation, and the amount of mutual inhibition. We identify the parameter regime in which the behavior of the model is consistent with empirical findings. Finally, we show that previous computational models of bistable perception fall short of explaining this full suite of phenomena.

Methods
Model. The model had three classes of neurons responsible for sensory representation, attentional modulation, and mutual inhibition, respectively (Fig. 1). The response of each individual neuron in the model is intended to represent the mean activity (instantaneous firing rate) of an ensemble of neurons with similar response properties. The responses of the neurons were computed by the same canonical motif, divisive normalization (27). The model is intended to characterize neural activity in visual cortex in terms of the signal-processing computations that are performed, not in terms of the underlying circuit, cellular, molecular, and biophysical mechanisms. For example, we do not intend for individual elements in the model to be interpreted as specific cell types, and the various model parameters (e.g., orientation selectivity, time constants) might emerge from a neural circuit rather than being intrinsic biophysical properties of individual neurons. See Table S1 for the values of all of the model parameters. Sensory representations. Sensory representations consisted of three populations: two monocular [left eye (LE) and right eye (RE)] populations and one binocular-summation population. Each population contained two neurons, each selective for one of two orthogonal orientations.
The response of the left-eye monocular neuron R_l1 selective for orientation 1 was computed as follows (Eq. 1):

The responses of all four monocular neurons (R_l1, R_l2, R_r1, R_r2) are characterized by similar expressions in which the subscripts l and r specify left eye and right eye, respectively, and the subscripts 1 and 2 specify the two orientations. The first line is an equation for calculating the response over time in terms of excitatory drive (E), suppressive drive (S), and adaptation (H). The subsequent lines provide expressions for the excitatory drive, suppressive drive, and adaptation, respectively. The excitatory drive (E) was determined by the input (D). The amplitude of the input was assumed to increase monotonically with stimulus contrast. The excitatory drive was modulated by two factors. First, the pooled responses of the opponency neurons (O_r) that responded to the opposite eye (see Mutual inhibition) were subtracted from the input. Second, the resulting activity after this subtractive suppression was multiplied by an attention gain factor (1 + w_a R_a1), in which 1 was the baseline attention gain, and R_a1 was the response of the attention neuron that was selective for the same orientation (see Attentional modulation). The values of w_o and w_a determined the amount of subtractive mutual inhibition and attentional modulation, respectively. The notation [·]+ represents half-wave rectification. The suppressive drive (S) of each monocular neuron was the sum of the excitatory drives of all of the monocular neurons. The values of n and σ determined the slope and the contrast gain of the contrast-response functions of the neurons, and α was a scaling factor that determined the maximum response. The value of τ_s was the time constant of the monocular and binocular summation neurons. The sensory neurons slowly self-adapted through the adaptation term H with time constant τ_h and magnitude w_h.
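One form of Eq. 1 that is consistent with the verbal description above can be written as follows. This is a reconstruction under stated assumptions rather than the recovered original: where the adaptation term H enters (here, the normalization pool), where the half-wave rectification is applied, and the exponent on S are all guesses.

```latex
\begin{aligned}
\tau_s \frac{dR_{l1}}{dt} &= -R_{l1}
  + \alpha\,\frac{[E_{l1}]_+^{\,n}}{S^{n} + [H_{l1}]_+^{\,n} + \sigma^{n}}
  && \text{(response; placement of } H \text{ is assumed)} \\
E_{l1} &= [\,D_{l1} - w_o O_r\,]_+ \,(1 + w_a R_{a1})
  && \text{(excitatory drive)} \\
S &= E_{l1} + E_{l2} + E_{r1} + E_{r2}
  && \text{(suppressive drive)} \\
\tau_h \frac{dH_{l1}}{dt} &= -H_{l1} + w_h R_{l1}
  && \text{(adaptation)}
\end{aligned}
```

With w_o = 0 and w_a = 0, the excitatory drive reduces to the rectified input, so the unit behaves as a standard normalized contrast-response function with slow self-adaptation.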
The response of the binocular summation neuron R_b1 selective for orientation 1 was computed as follows: where the responses of both binocular neurons (R_b1, R_b2) are characterized by similar expressions in which the subscript b specifies that it is a binocular neuron, and the subscripts 1 and 2 specify the two orientations. The excitatory drive (E) summed the responses of monocular neurons selective for the same orientation. The suppressive drive (S) was the same as the excitatory drive. Similar to monocular neurons, binocular summation neurons self-adapted through the adaptation term H. Unlike a previous model (4) that used mutual inhibition between binocular neurons to model a swapping experiment, there was no mutual inhibition between binocular neurons in our model. The functions of the binocular summation neurons in our model were only to represent sensory responses and provide inputs to neurons in the attention layer.

Attentional modulation. According to the model, attention gain fluctuated between two sensory representations when observers attended to the rival stimuli. Whichever orientation had stronger sensory responses, at any moment, received a greater share of attention gain. We implemented the model in this way because a competing stimulus in one eye can suppress a target in the other eye by increasing the attention gain for the orientation of the competitor while decreasing the attention gain of the target (17,18). There were two neurons in the attention layer selective for orthogonal orientations (Fig. 1 A and B). The response of the attention neuron R_a1 that preferred orientation 1 was computed as follows: We used a similar expression to compute the responses of the other attention neuron R_a2. The excitatory drive (E_a1) was the difference between the responses of two binocular-summation neurons. The two orientation-selective neurons in the attention layer showed a trade-off.
When the response of one neuron (e.g., R_a1) was positive, the response of the other neuron (R_a2) was negative, and vice versa. Hence, depending on their preferred orientation, monocular neurons in sensory layers received attention gain that fluctuated around a baseline value (Eq. 1). The negative responses of the attention neurons can be accommodated with complementary pairs of neurons that are each half-wave rectified, analogous to standard models of ON- and OFF-center retinal ganglion cells. τ_a (=150 ms) was the time constant of attentional modulation.

Mutual inhibition. Mutual inhibition was mediated through opponency neurons (20). There were four opponency neurons in total: a pair of left-minus-right (LE-RE) opponency neurons and a pair of right-minus-left (RE-LE) opponency neurons, each with a pair of neurons selective for orthogonal orientations (Fig. 1 A and C). The response of the RE-LE opponency neuron R_or1 selective for orientation 1 was computed as follows: where the subscript o denotes that it is an opponency neuron, the subscript r indicates that it is the RE-LE opponency neuron (not an LE-RE opponency neuron), and the subscript 1 specifies the orientation preference. We used a similar expression to compute the responses of the other opponency neurons (R_or2, R_ol1, R_ol2). The excitatory drive of the opponency neurons was the difference of the responses between LE and RE monocular neurons. The value of S_or was the suppressive drive of both RE-LE opponency neurons. τ_o (=20 ms) was the time constant of the opponency neurons. The responses of the two RE-LE opponency neurons were pooled as O_r and subtracted from the left-eye monocular neurons (Eq. 1). The subtractive suppression (−w_o O_r in Eq. 1) was analogous to mutual inhibition in previous models (3-8). However, previous models exhibited response alternations (of a magnitude similar to the response alternations for rivalry stimuli) for stationary, monocular-plaid stimuli.

Noise. A stochastic term, an Ornstein-Uhlenbeck process (8, 10), simulating neural noise, was added to the input drive for some of the simulations (Figs. 3A and 5B and Figs. S7 and S8): $\tau_n \frac{dn}{dt} = -n + \sigma\sqrt{2\tau_n}\,\xi(t)$, where τ_n = 100 ms, σ = 0.02, and ξ(t) was a Gaussian white-noise process. We assumed that the EEG signal was a noisy version of the simulated neural responses. In addition to the neural noise, we simulated the EEG measurements (14) by adding lowpass-filtered Gaussian noise, representing measurement noise, to the responses of the binocular summation neurons (Fig. 3A).
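As an illustration of the neural-noise term, the Ornstein-Uhlenbeck process can be discretized with an Euler-Maruyama step at the 1-ms time step used for the simulations. This is a minimal sketch in Python with NumPy rather than the authors' MATLAB code; the function name `simulate_ou_noise`, the seed, and the zero initial value are assumptions made for the example.

```python
import numpy as np


def simulate_ou_noise(n_steps, dt=1.0, tau_n=100.0, sigma=0.02, seed=0):
    """Euler-Maruyama discretization of tau_n * dn/dt = -n + sigma * sqrt(2 * tau_n) * xi(t).

    Times are in ms. The stationary standard deviation of n is approximately sigma.
    """
    rng = np.random.default_rng(seed)
    n = np.zeros(n_steps)
    drift = dt / tau_n                              # deterministic decay per step
    diffusion = sigma * np.sqrt(2.0 * dt / tau_n)   # noise amplitude per step
    for k in range(1, n_steps):
        n[k] = n[k - 1] - drift * n[k - 1] + diffusion * rng.standard_normal()
    return n


# Example: 200 s of noise at a 1-ms step (hypothetical duration).
noise = simulate_ou_noise(200_000)
```

The per-step noise amplitude scales with the square root of the time step, so that halving `dt` leaves the stationary variance and the correlation time of the process essentially unchanged.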
Inputs. The inputs (D) to the model were assumed to be the responses of subcortical visual neurons. When the stimuli were swapped or flashed, the inputs exhibited an onset transient response [modeled by an α function $(t/\tau_\alpha)\,e^{(1 - t/\tau_\alpha)}$, τ_α = 3 ms], which was 1.5 times greater than the designated sustained input strength (Fig. S1A). The transient response dropped rapidly to a sustained level of input strength. After stimulus offset, a decay was modeled by a hyperbolic tangent function with a half-life of 15 ms. The onset transient and the offset decay captured the delay and the time course of the responses of subcortical neurons (29,30).
Simulations. The model can be simulated as a system of 18 ordinary differential equations (four monocular variables and two binocular variables, each with an intrinsic adaptation variable, two attention variables, and four opponency variables). Neural responses were simulated in MATLAB using the forward Euler method with a time step of 1 ms. Further reducing the time step did not change the results. Parameter values are listed in Table S1.
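To make the integration scheme concrete, the sketch below applies forward Euler with a 1-ms step to a single rate unit that has divisive normalization and slow self-adaptation. It is written in Python with NumPy, not the MATLAB code used for the paper, and it is not the full 18-variable model: the exact form of the update, the function name `euler_single_unit`, and every parameter value are illustrative assumptions rather than the entries of Table S1.

```python
import numpy as np


def euler_single_unit(drive, dt=1.0, tau_s=5.0, tau_h=2000.0,
                      w_h=2.0, n=2.0, sigma=0.5, alpha=1.0):
    """Forward-Euler integration of one normalized rate unit with adaptation.

    Assumed (illustrative) dynamics, times in ms:
        tau_s * dR/dt = -R + alpha * E^n / (E^n + H^n + sigma^n)
        tau_h * dH/dt = -H + w_h * R
    where E is the half-wave rectified input drive.
    """
    r = np.zeros(len(drive))
    h = np.zeros(len(drive))
    for k in range(len(drive) - 1):
        e = max(drive[k], 0.0)                      # half-wave rectification
        h_pos = max(h[k], 0.0)
        target = alpha * e**n / (e**n + h_pos**n + sigma**n)
        r[k + 1] = r[k] + (dt / tau_s) * (-r[k] + target)
        h[k + 1] = h[k] + (dt / tau_h) * (-h[k] + w_h * r[k])
    return r, h


# Example: 5 s of constant input; the response rises quickly, then adapts.
drive = np.full(5000, 0.5)
r, h = euler_single_unit(drive)
```

A simple convergence check in the same spirit as the one reported above is to rerun the integration with a smaller `dt` and confirm that the response time course does not change appreciably.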
Rivalry Index. We followed the procedure in a previous EEG study (14) to simulate the rivalry index. The simulated responses of two binocular summation neurons (with neural and measurement noise; Methods, Noise) were first low-pass filtered with a Gaussian kernel (1.2-s SD). We then searched for local peaks in the time course of the neural responses. We segmented the time course into 6-s epochs, centered at the local peaks. For each epoch, the time course of the neuron associated with stronger responses (with the local peak) was defined as the "aligned signal," and the response of the other neuron (with weaker response) was defined as the "rival signal." The aligned signals were averaged across all epochs. The rival signals were also averaged across epochs. The rivalry index was computed by the amplitude (distance between the peak and trough) of the averaged aligned signal divided by the amplitude of the averaged rival signal.
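The procedure above can be sketched as follows, in Python with NumPy. This is an illustration under assumptions, not the analysis code of the study: the helper names (`gaussian_smooth`, `local_peaks`, `rivalry_index`), the sampling rate argument `fs`, the strict three-point definition of a local peak, and the choice to skip peaks closer than half an epoch to either end of the recording are all hypothetical. The ratio is computed in the direction worded in the text.

```python
import numpy as np


def gaussian_smooth(x, sd_samples):
    """Low-pass filter x with a normalized Gaussian kernel (SD given in samples)."""
    half = int(4 * sd_samples)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sd_samples) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def local_peaks(x):
    """Indices of samples strictly greater than both neighbors."""
    return np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1


def rivalry_index(sig_a, sig_b, fs, sd_s=1.2, epoch_s=6.0):
    """Peak-aligned averaging of two response time courses sampled at fs Hz."""
    a = gaussian_smooth(np.asarray(sig_a, dtype=float), sd_s * fs)
    b = gaussian_smooth(np.asarray(sig_b, dtype=float), sd_s * fs)
    half = int(epoch_s * fs / 2)
    aligned, rival = [], []
    for own, other in ((a, b), (b, a)):
        for p in local_peaks(own):
            # Keep peaks of the stronger signal with a full epoch around them.
            if own[p] > other[p] and half <= p < len(own) - half:
                aligned.append(own[p - half:p + half + 1])
                rival.append(other[p - half:p + half + 1])
    aligned_mean = np.mean(aligned, axis=0)
    rival_mean = np.mean(rival, axis=0)
    amplitude = lambda s: s.max() - s.min()   # peak-to-trough distance
    index = amplitude(aligned_mean) / amplitude(rival_mean)
    return index, aligned_mean, rival_mean


# Example with synthetic counterphase signals (hypothetical data).
fs = 100
t = np.arange(0, 60, 1 / fs)
sig_a = 1 + np.sin(2 * np.pi * t / 8)
sig_b = 1 - np.sin(2 * np.pi * t / 8)
index, aligned_mean, rival_mean = rivalry_index(sig_a, sig_b, fs)
```

Returning the two averaged time courses alongside the ratio makes it easy to plot the aligned peak against the rival trough, which is the counterphase signature the index is meant to summarize.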
Bifurcation Analysis. We used bifurcation analysis to investigate the dynamics of the model, as a function of the strength of input, the weight of attentional modulation, and the weight of mutual inhibition. The steady-state responses were tracked as the model parameters were varied, and the boundaries where the system's steady state changed qualitatively (bifurcations) were identified. This approach produced phase diagrams showing parameter values for which the model exhibited qualitatively different dynamical behaviors. Bifurcation analysis was performed using the freely available software AUTO-07p (31).

Results
Binocular Rivalry Requires Attention. We simulated neural responses under two attention conditions, attended and unattended. In the attended condition, attention gain fluctuated between monocular neurons selective for different orientations. In the unattended condition, the attention gain of all of the monocular neurons stayed at baseline (=1). This was accomplished by setting the weight of attention feedback (w_a) to 0. The simulations were performed for two types of stimuli: dichoptic gratings (a pair of stationary, orthogonal gratings, presented in different eyes; Fig. 2A) and plaids (a pair of stationary, orthogonal gratings, presented simultaneously to one or both eyes; Fig. 2B and Fig. S1C).
The simulated responses of the binocular summation neurons and monocular neurons exhibited response alternations over time, consistent with perceptual alternations ( Fig. 2C and Fig. S2A), when the stimuli were dichoptic gratings and were attended. For unattended dichoptic gratings, the responses alternated only briefly following stimulus onset, after which the competing neurons (the two binocular summation neurons or the two monocular neurons receiving inputs) exhibited responses that converged to the same steady-state level ( Fig. 2E and Fig. S2). The initial phase of response alternations (Fig. 2E) is consistent with a psychophysical study reporting that withdrawing attention does not eliminate onset rivalry (32), even though withdrawing attention is effective in abolishing ongoing rivalry.
Neither monocular plaids ( Specifically, the steady-state response of a neuron selective for the target is given by Eq. 1, which reduces to E l1 /(S m + σ n ) in the absence of mutual inhibition and attentional modulation. Adding a cross-orientation mask suppresses the target through an increment of the normalization pool (S m ), which is computed as the summed excitatory drive across orientations and eyes (Eq. 1).
In the model, attention and mutual inhibition both facilitated competition, but their effects interacted nonlinearly. The attention gain was found to fluctuate only for attended dichoptic gratings, not for attended monocular plaids (Fig. S2 A and B). That is, the initiation of attentional modulation required an imbalance between the two orientations that was triggered by mutual inhibition. In addition, when the orientations presented to the two eyes were different, the presence of attention amplified the responses of the opponency neurons. The attention-dependent dynamics depicted in Fig. 2 were robust with respect to changes in the time constants: The model exhibited similar results (only attended dichoptic gratings generated ongoing rivalry) when τ_s, τ_a, and τ_o were independently adjusted to be nearly instantaneous (1 ms), or when τ_a and τ_o were changed to be double their original values. The values of the time constants were further constrained when simulating the eye-swapping experiments. In Operating Regime for Binocular Rivalry, we fully characterize how the strength of attentional modulation and mutual inhibition determined the dynamics of binocular rivalry in the model.
The model simulated human EEG measurements of neural activity when rival stimuli were attended and unattended (Fig. 3). To simulate the empirical EEG results (14), we added both neural noise to the input drive and measurement noise to the responses of the binocular summation neurons (Methods, Noise). We then analyzed the noisy responses of the binocular summation neurons using the same analysis as the EEG study. Specifically, we computed a rivalry index that quantified the amplitude of competition in rivalry (see Methods, Rivalry Index, and ref. 14 for details). In the attended condition, the peak of the response of one binocular summation neuron (Fig. 3A, aligned signal) was accompanied by the trough of the response of the other neuron (Fig. 3A, rival signal). This counterphase time course represented a neural signature of binocular rivalry. This pattern was greatly reduced in the unattended condition and resulted in a substantially smaller rivalry index (Fig. 3A, Middle and Right). The simulated attended and unattended responses resemble the empirical findings (Fig. 3B). Following the empirical study by Zhang et al. (14), we also simulated a replay condition, in which the two stimuli physically alternated irregularly to simulate rivalry. Consistent with that study, the simulated counterphase time courses were not influenced by attention in the replay condition.

Mutual Inhibition Supports Eye Dominance and Attention Stabilizes Perceptual State. Swapping the stimuli between the two eyes rapidly and repetitively (Fig. 4) has been used to dissect the neural processing contributing to binocular rivalry (21-24). Two types of percept have been reported under these conditions. In fast alternation (FA), one eye dominates for a period, and observers report perceiving a rapid alternation between two images at a frequency equal to the swap rate. In slow alternation (SA), the perceptual dominance of one stimulus persists across swapping, and observers report seeing one image for a few seconds similar to conventional binocular rivalry. SA has been taken as evidence that binocular rivalry cannot be explained simply by mutual inhibition between neurons that respond selectively to each of the two eyes. The proportion of time that observers experience FA and SA depends on the temporal characteristics of the stimuli. If the stimuli are static images, presented to alternate eyes immediately when swapped (Fig. 4A), then observers report FA most of the time. Using flickering images (Fig. 4B), with an on-off flicker rate higher than the swap rate, increases the proportion of SA (21-24).
Neural responses simulated with our model exhibited slow alternations (SA) for some stimulus conditions and fast alternations (FA) for other stimulus conditions, consistent with empirical results. We simulated the eye-swapping experiment with a swap rate of 3.3 Hz, and with flicker rates of either 18 or 0 Hz (i.e., static). For static stimuli, the simulated percept followed the stimuli in one eye for a few seconds, such that the orientation "perceived" by the model changed rapidly with each swap (Fig.  4D, top row, alternating blue and green peaks). For flickering stimuli, the "perceived" orientation was maintained across swapping (Fig. 4E, top row, extended periods during which the green curve is above the blue curve and vice versa), resembling the SA percept. During simulated SA, the dominant orientation can be observed in both binocular-summation neurons (Fig. 4E, top row), and in monocular neurons (Fig. 4E, second and third rows). This result was consistent with the empirical findings that monocular populations are involved in the maintenance of the dominant percept in SA (33). According to the model, the temporal dependence of FA and SA resulted from the different temporal characteristics of mutual inhibition and attention. The opponency neurons (mutual inhibition), when activated, suppressed the weaker eye regardless of the preferred orientations of the monocular neurons. This process captured the feature-invariant component in rivalry (34), and thereby supported FA. The attentional modulation, on the other hand, supported SA because the attention neurons were selective for orientation but not for eye-of-origin. The responses of the opponency neurons had a short time-constant, and thus mutual inhibition decayed rapidly after stimulus offset. The attention neurons had a comparatively long time-constant. 
For flicker stimuli, there was a gap before the swap, which allowed the eye-specific suppression from the opponency neurons to decay, while the activity of the attention neurons was sustained. So the alternations were primarily controlled by the attentional modulation under these conditions (Fig. S3D, fourth row), simulating SA. With static stimuli and no gap, the alternations were dominated by the activity of the opponency neurons (Fig. S3C, bottom row), simulating FA.
Due to the temporal dynamics described above, using static stimuli with a short blank before the swap could also induce SA (Fig. 4C). Our model showed SA for a range of blank durations from 35 ms to 150 ms, with a fixed swap rate (Fig. 4F). This result is consistent with previous studies reporting that blank duration around 100-150 ms gave rise to the greatest proportion of SA; blank durations longer than about 200 ms resulted in plaid percepts (22,23).
Perceptual states in the swapping experiments depend on the timing of the stimuli, so the empirical results from these experiments constrain the time constants in the model. The results in Fig. 4 held when τ_s was ≤10 ms, and when the time constant of attentional modulation (τ_a) was doubled, but the response alternations in the 150-ms-blank-only condition (Fig. 4C) disappeared if τ_a dropped below 70 ms. The time constant of opponency neurons (τ_o) had to be shorter than the time constant of attentional modulation, as explained above. However, for the condition with a static image and 3.3-Hz swap rate, if τ_o was too short (<10 ms), the dominant eye in FA remained dominant for only one swap, rather than over a few seconds as in Fig. 4D (second and third rows).
Two previous models (4,33) also aimed to explain perception in swapping experiments. We found that they were able to capture the perceptual dynamics in some, but not all, of the conditions demonstrated here. Specifically, Wilson's model (4) generated SA with flicker stimuli, but did not generate FA with static images. Instead, the model predicted that the "perceived" orientation alternated with a frequency half of, but not equal to, the swap rate (Fig. S4 A-C). With blank intervals inserted before each swap, the model did not exhibit SA for either short or long blank duration (Fig. S4 D and E). The model proposed by Brascamp et al. correctly predicted FA with static images (Fig. S5B) and SA with 18-Hz flicker (Fig. S5C). However, their model generated SA only when the blank was short (35 ms in Fig. S5D), not when the blank duration was lengthened (100 ms in Fig. S5E). In contrast, empirical studies have reported that SA increases with longer blank durations (22,23). In addition, these two models exhibited strong competition for stationary plaid stimuli. Specifically, in response to monocular plaids, Wilson's model exhibited behavior in which one of the gratings suppressed the other indefinitely (Fig. S6C). For binocular plaids, Wilson's model exhibited response alternations, similar to dichoptic gratings (Fig. S6E), and Brascamp's model exhibited behavior in which the representation of the plaid in one eye suppressed the plaid in the other eye indefinitely (Fig. S6F). These results are inconsistent with empirical observations that rivalry does not occur when the images in the two eyes are compatible (35,36). See details of the

Increasing stimulus strength in both eyes while keeping it equal between eyes will generally increase the perceptual alternation rate, but this effect may reverse at near-threshold stimulus strengths.
We performed bifurcation analysis to characterize the steady-state responses of the binocular summation neurons as a function of input strength (D). The model exhibited three types of behavior (regimes) when the input strength was varied. (i) At the lowest and the highest input strength, the two binocular summation neurons maintained equal activity [black solid curves, labeled as both-down (BD) and both-up (BU) in Fig. 5A; also see Fig. S1B]. (ii) The BD state became unstable when input strength increased through a critical value (∼0.15), and a winner-take-all (WTA) state emerged, a so-called steady-state bifurcation (37). In the WTA regime, depending on the initial conditions, one binocular summation neuron dominated over the other indefinitely (green curves in Fig. 5A; also see Fig. S1B). (iii) There was an oscillatory regime in which the responses of two summation neurons alternated, resembling the perceptual alternations in rivalry (solid blue curves, labeled as "Oscillation" in Fig. 5A; also see Fig. 2C). In more detail, the BU state became unstable as input strength decreased through a critical value (Hopf bifurcation in Fig. 5A). Here, the dynamic response, locally near the BU state, changed from damped to growing oscillations and a small-amplitude oscillation state emerged, a so-called Hopf bifurcation (37). In this case, the emergent oscillation was unstable (dashed blue curves). It merged with the stable large-amplitude oscillation regime at a higher input strength (fold of limit cycles in Fig. 5A Fig. 5B]. This corresponded to the lower part of the WTA regime with no noise. We suggest that input strengths corresponding to this short ID branch were below the threshold for visibility (see Operating Regime for Binocular Rivalry).
Operating Regime for Binocular Rivalry. Bifurcation analysis over a wide range of parameter values allowed us to explore all possible behaviors of the model. We characterized the dynamics of the model as a function of three parameters that controlled input strength (D), attentional modulation (w a ), and mutual inhibition (w o ). In a subspace of the 3D volume defined by these three parameters, the behavior of the model was consistent with empirical findings. Here, we illustrate this subspace by depicting 2D slices of the parameter space, and by indicating the boundaries that differentiate distinct model behaviors.
We defined a boundary between different regimes in the parameter space as follows. In some cases, one regime turned into the other through the loss of stability of one state and the emergence, via bifurcation, of a stable state of a different type (e.g., the transition from BD to WTA in Fig. 5A) or of oscillatory responses [e.g., the transition from WTA to DD, the Hopf bifurcation (HB), in Fig. 5A]. We traced the bifurcation points and plotted them as boundaries in Figs. 6 and 7. In other cases, the stable oscillatory response of the neurons in the oscillatory region was not directly connected to the steady-state response of the equal-activity regime at the transition (e.g., the transition from oscillation to BU regimes in Fig. 5A). We identified such boundaries by tracing the outermost bound of the oscillatory responses (fold of limit cycles in Fig. 5A; blue curves in Figs. 6 and 7). In still other cases, for example close to HB in Fig. 5A, bifurcations on several branches occurred within a small parameter range. This remained true along the Hopf bifurcation boundaries in Figs. 6 and 7. In the interest of parsimony, we plot only the Hopf curves, while noting that the explicit bifurcation mechanism for the transition between different regions can also involve other bifurcations.
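As a rough cross-check on boundaries of this kind, one could label the model's behavior on a grid of parameter values and mark where neighboring grid points disagree. The sketch below illustrates that brute-force alternative. It is not the continuation-based tracing of bifurcation points described above, and the function name and integer regime codes are hypothetical.

```python
import numpy as np

def regime_boundaries(labels):
    """Mark grid cells that touch a cell with a different regime label.

    labels: 2D array of regime codes on a parameter grid, e.g. rows for
            attentional modulation and columns for mutual inhibition
            (hypothetical layout, chosen only for illustration).
    Returns a boolean array of the same shape.
    """
    labels = np.asarray(labels)
    boundary = np.zeros(labels.shape, dtype=bool)
    # Horizontal neighbors that differ: flag both cells of the pair.
    h = labels[:, 1:] != labels[:, :-1]
    boundary[:, 1:] |= h
    boundary[:, :-1] |= h
    # Vertical neighbors that differ: flag both cells of the pair.
    v = labels[1:, :] != labels[:-1, :]
    boundary[1:, :] |= v
    boundary[:-1, :] |= v
    return boundary

if __name__ == "__main__":
    grid = np.array([[0, 0, 1],
                     [0, 0, 1]])  # 0 and 1 are arbitrary regime codes
    print(regime_boundaries(grid))
```

The resolution of a boundary found this way is limited by the grid spacing, and closely spaced bifurcations, such as those near HB, would not be separated.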
Under a fixed input strength (D = 0.5, same as in the simulations in Figs. 2-4), the binocular summation neurons responded equally, and there were no response alternations when the attentional modulation and mutual inhibition were both weak (Fig. 6, gray zone). The model exhibited oscillations or WTA as the weight of attentional modulation or mutual inhibition increased (Fig. 6, blue and red zones). The boundary between these regimes had a negative slope, indicating that attention and mutual inhibition both facilitate competition in rivalry, and jointly determined the model's behavior. The shapes of these regimes also depended on input strength (Fig. 7). The WTA and oscillatory regimes extended to larger input strengths with increasing mutual inhibition (from Fig. 7A to Fig. 7C) or attentional modulation (within each panel). There were two distinct equal-activity regimes. When the input strength was low, the binocular summation neurons responded equally with small responses (BD, gray strip on the Left of each panel in Fig. 7). When the input strength was high, and when attentional modulation was small, the binocular summation neurons exhibited equally large responses (BU, gray region at the bottom of each panel in Fig. 7).
There was a 3D region within the parameter space for which the model exhibited the critical phenomenology of binocular rivalry (Figs. 6 and 7, in which the black star corresponds to the parameters used for the simulation results in Figs. 2-5). First, the weight of attentional modulation (w a ) was chosen so that the model did not exhibit oscillations for monocular or binocular plaids (Figs. 6 and 7B, black star below dashed gray boundaries). Second, the weight of mutual inhibition (w o ) was chosen so that there were no response alternations for unattended dichoptic gratings (Fig. 6, black star to the Left of red dashed line, so as to be in an equal-activity regime when w a = 0 directly below the black star). Third, there was a lower bound on the input strength to avoid response alternations for unattended stimuli (Fig. 7B, black star to the Right of the red dashed line, so as to be in an equal-activity regime when w a = 0 directly below the black star). Input strengths below this level were assumed to be below the threshold for visibility. Fourth, within the volume defined by the previous three criteria, we chose a combination of attentional modulation and mutual inhibition (black star in Figs. 6 and 7) such that the model exhibited distinctly different behaviors corresponding to the FA and SA percepts in the swapping experiment. We found that to fulfill this criterion the model had to operate close to the boundary between oscillatory and WTA regimes (Figs. 6 and 7, black star near solid red curve). Fifth, binocular rivalry has been reported up to the highest contrast level testable, imposing an upper bound on the input strength (Fig. 7B, input strength increases along the green dashed line but cannot exceed the value where the green dashed line intersects the blue curve). Input strengths above this level were presumed to be physically unrealizable.
In summary, when the rival stimuli were attended, the model operated close to the boundary between oscillatory and WTA regimes (Figs. 6 and 7B, black star). A previous study, in which the authors independently varied the strength of adaptation and noise to fit the statistics of dominance durations (9), also led to a similar conclusion. When attention was diverted, the model moved to an equal-activity regime that was near the boundary of an oscillatory regime. We only specified the relative position of this volume and did not aim to illustrate its size and shape. The phases and regimes in the bifurcation diagram stretch or squeeze depending on the other parameters in the model including the exponent (n) and suppression constant (σ). For example, we assumed that the input to the neurons increased monotonically with stimulus intensity. Changing the mapping between stimulus intensity (e.g., contrast) and neural input strength would scale and/or warp the x axis in Figs. 5 and 7 (38,39).
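To make the roles of the exponent (n) and the suppression constant (σ) concrete, the following sketch evaluates a generic static divisive-normalization formula, in which each unit's exponentiated drive is divided by σ^n plus the pooled exponentiated drive. This is the textbook form of normalization and is assumed here only for illustration. It is not claimed to match the model's dynamic equations, and the function name and default parameter values are hypothetical.

```python
import numpy as np

def normalized_response(drive, n=2.0, sigma=0.5):
    """Generic divisive normalization: E_i^n / (sigma^n + sum_j E_j^n).

    drive: excitatory drive of each unit (negative values are rectified).
    n:     exponent (default is an illustrative assumption).
    sigma: suppression constant (default is an illustrative assumption).
    """
    e = np.maximum(np.asarray(drive, dtype=float), 0.0) ** n
    return e / (sigma ** n + e.sum())

if __name__ == "__main__":
    # Two equally driven units share the normalization pool.
    print(normalized_response([1.0, 1.0], n=2.0, sigma=1.0))
    # A single unit whose drive equals sigma responds at half maximum.
    print(normalized_response([0.5], n=2.0, sigma=0.5))
```

In this static form, σ sets the drive at which a lone unit reaches half of its maximal response and n sets the steepness of the response function, which gives some intuition for why changing either parameter would stretch or squeeze the regimes along the input-strength axis.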

Discussion
We propose a computational model of rivalry, in which perceptual competition is driven by both attentional modulation and mutual inhibition. Attention (with a relatively slow time constant) recurrently amplifies the imbalance between the two rival stimuli triggered by the mutual inhibition (with a relatively fast time constant). This model captures three signatures of binocular rivalry simultaneously. (i) Diverting attention greatly reduces or eliminates response alternations (14)(15)(16). The model exhibits this phenomenon because of the recurrent amplification from attention. (ii) When rival stimuli are swapped rapidly between the two eyes, the dominant percept either follows the same eye or the same image, depending on the temporal characteristics of the stimuli (21)(22)(23)(24). The model exhibits these phenomena because of the different temporal dynamics of mutual inhibition and attention. (iii) The relationship between dominance duration and input strength follows Levelt's propositions (10,25,26). These propositions are satisfied because of a combination of competition (from attentional modulation and mutual inhibition), recurrent excitation (from attentional modulation), and slow adaptation (6,38,39). Levelt's proposition IV, in particular, depends on having some degree of recurrent excitation (38,39).
Attentional Modulation. In our model, attentional modulation depends on the sensory responses, a form of stimulus-driven attention. Reducing stimulus-driven attention mimics the effect of diverting attention away from the rival stimuli. This is consistent with findings that an attention-demanding task diminishes the stimulus-driven attention triggered by the stimuli outside the focus of attention (40)(41)(42). The notion that bottom-up inputs strongly influence the deployment of attention in rivalry may explain why binocular rivalry is only weakly biased by instructions [e.g., attend to the left-tilted grating (43)].
Attentional modulation in the model is selective for orientation but not for eye of origin. In previous psychophysical experiments, we measured how a competing image in one eye modulated the discriminability of a target image in the other eye. We found that a model with feature-selective attention, not eye-based attention, best explained the data (18). Some behavioral studies have suggested that attention can modulate eye-specific information (44,45). However, these studies placed different stimuli not only in different eyes but also in different retinal locations. This is different from binocular rivalry in which conflicting information is presented in corresponding retinal locations. Moreover, with such experimental conditions, an apparent attention effect might be the result of a combination of spatial attention and interocular divisive normalization (46)(47)(48)(49).
The present model extends the normalization model of attention (19) to a dynamical system. The time constant of the attentional modulation chosen here (∼150 ms) is consistent with the temporal dynamics of stimulus-driven attention measured in psychophysical experiments. Stimulus-driven attention peaks around 100-120 ms after a trigger stimulus, whereas goal-driven attention requires more time (∼300 ms) to be deployed (50)(51)(52). The speed of stimulus-driven attention in rivalry could be slower than that typically measured in studies of exogenous attention, because the changes in neural activity during rivalry are less abrupt than those evoked by a high-contrast brief cue typically used to summon attention. In any case, the temporal dynamics of attention are much slower than the speed of the mutual inhibition [<50 ms in the present and previous models (reviewed in ref. 6)]. Because the attentional modulation is computed recurrently (via feedback) with a time constant longer than the sensory responses, the model is in line with neurophysiological findings that stimulus-driven attentional modulation has little or no impact on the early transient part of the response evoked by stimulus onset (53).
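The consequence of this separation of time scales can be illustrated with a first-order low-pass filter, τ dx/dt = −x + u, driven by a unit step. The sketch below is only an illustration under that assumption, not the model's equations. It uses 150 ms for the slow (attention-like) process and 50 ms as an upper-bound stand-in for the fast (inhibition-like) process, and the function name is hypothetical.

```python
import math

def step_response(tau_ms, t_ms, dt_ms=1.0):
    """Response of a first-order filter to a unit step after t_ms.

    Uses the exact discrete update x += (1 - exp(-dt/tau)) * (u - x),
    so the result equals 1 - exp(-t/tau) up to rounding error.
    """
    x = 0.0
    alpha = 1.0 - math.exp(-dt_ms / tau_ms)
    for _ in range(int(round(t_ms / dt_ms))):
        x += alpha * (1.0 - x)  # input u is a unit step
    return x

if __name__ == "__main__":
    fast = step_response(tau_ms=50.0, t_ms=50.0)   # inhibition-like process
    slow = step_response(tau_ms=150.0, t_ms=50.0)  # attention-like process
    print(f"after 50 ms: fast = {fast:.3f}, slow = {slow:.3f}")
```

After 50 ms the fast process has covered about 63% of its final change (1 − e^−1), whereas the slow process has covered about 28% (1 − e^−1/3), which is in keeping with the idea that a slow, recurrent modulation contributes little to the early transient part of a response.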
The simulation results indicated that the prevalence of two types of percepts (FA vs. SA) in swapping experiments depends on the balance between attention and mutual inhibition. This idea may explain two additional aspects of phenomenology observed in swapping experiments. First, observers usually report that the perceptual state fluctuates between FA and SA during a single trial, even when the stimuli are optimized for one particular percept (22,23,54). This might result from fluctuations in the strength of attention and arousal. Attention and arousal are known to vary over time during an experiment (55,56). Increasing the strength of attention would amplify SA and reduce FA, and vice versa. Second, the depth of suppression is weaker (57,58), and the dominance durations are shorter (54) in swapping experiments compared with conventional binocular rivalry. Likewise in our simulations, SA responses had shorter dominance durations than conventional rivalry responses, indicating a weaker competition in SA (Figs. 2C and 4E).
Future work might extend the current model to explain empirical observations regarding the deployment of endogenous attention in rivalry. Manipulating observers' endogenous attention by asking observers to "hold on" to one of the two images can bias the percept in ongoing rivalry (43,59,60). However, the effects of biasing endogenous attention (to one of the rivalry stimuli) are not as pronounced as those of withdrawing attention, which can abolish ongoing rivalry (14)(15)(16). This difference in the magnitude of the attentional effect might result from differences in experimental design. First, the perceptual report when observers are instructed to hold on to one percept has usually been compared with a neutral condition (in which observers are not instructed to hold on). It has been assumed that observers' voluntary attention is equally deployed to the two images in this neutral condition, but this might not be the case. Given the stochastic nature of the perceptual alternations in rivalry, voluntary attention may participate when observers attempt to track and report their percept. Consequently, the effect of endogenous attention could be underestimated. Second, in the hold-on condition, the to-be-attended image went through a period in which it was less visible than the to-be-ignored image presented at the same location (43,59,60). In contrast, in the withdraw-attention procedure the to-be-ignored rivalry stimuli were presented at other locations (14)(15)(16). Thus, the withdraw-attention procedure, involving spatial and feature-based attention, could be stronger than the hold-on condition, involving only feature-based attention.
Extensions and Limitations of the Model. By modeling neurons selective for dimensions other than orientation (e.g., motion), one can extend the current model to investigate other forms of bistable phenomena, such as motion-plaid rivalry (10, 61-63) and ambiguous structure from motion (64). The strength of mutual inhibition might vary across different bistable stimuli. If the mutual inhibition is strong enough, response alternations and thus bistable percepts can exist without attention (Fig. 6). This may explain why some bistable phenomena, such as motion-induced blindness and ambiguous structure from motion, persist when attention is diverted or withdrawn (64,65).
For a stationary plaid (Fig. 2B), observers can sometimes experience weak perceptual alternations [monocular rivalry (66)]. The model can capture this effect by adjusting the strength of attention to a value above the gray dashed lines in Figs. 6 and 7. In that regime, attentional modulation alone induces response alternations for a plaid stimulus.
We remain agnostic as to the mechanism contributing to the stochastic characteristics of binocular rivalry. Similar to previous oscillator models (3-6), we focused primarily on characterizing the deterministic response alternations exhibited by our model. Adding noise to this type of system generates a gamma-like distribution of dominance durations (7,9). One could finely tune the strength of adaptation and the amplitude of the noise to reproduce the dominance duration statistics found in behavioral data. Alternatively, a mechanism based on assemblies of stochastic, bistable neurons can generate gamma-like distributions of dominance durations and can account for the scaling properties (i.e., constant skewness and coefficient of variation) of the dominance durations observed across input strengths and across different bistable phenomena (67).
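The scaling properties mentioned above can be checked on a set of dominance durations by computing the coefficient of variation (CV) and the skewness. For a gamma distribution with shape k, the CV is 1/√k and the skewness is 2/√k, so both depend on the shape but not on the scale (mean duration). The sketch below is illustrative only: the function name is hypothetical, and the gamma-distributed samples are synthetic stand-ins, not data or model output.

```python
import numpy as np

def duration_stats(durations):
    """Return (CV, skewness) of a sample of dominance durations."""
    d = np.asarray(durations, dtype=float)
    mean = d.mean()
    sd = d.std()  # population standard deviation
    cv = sd / mean
    skew = np.mean((d - mean) ** 3) / sd ** 3
    return cv, skew

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic durations (assumption): same gamma shape, different scales,
    # standing in for different input strengths.
    for scale in (0.5, 2.0):
        sample = rng.gamma(shape=4.0, scale=scale, size=200_000)
        cv, skew = duration_stats(sample)
        print(f"scale={scale}: mean={sample.mean():.2f}, "
              f"CV={cv:.2f}, skewness={skew:.2f}")
```

With shape 4, both samples should give a CV near 0.5 and a skewness near 1 even though their means differ fourfold, which is the sense in which the CV and skewness remain constant across conditions.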
Introducing noise to the model does not change the qualitative organization of the bifurcation diagrams: In general, noise induces small fluctuations in the equal-activity regime, response alternations in the WTA regimes [if sufficiently close to the oscillation regime (9)], and some variability to the alternations in the oscillatory regime. Simulations with noise showed that withdrawing attention reduced the strength of competition and the probability (proportion of time) that rivalry was observed (Fig. S8 and SI Effect of Noise on Model Behavior).
The current model assumes only that withdrawing attention abolishes the fluctuations of attention gain, but distributing attention can also lead to a decrease of effective or perceived contrast (68). Because our model follows Levelt's fourth proposition, one can extend the model [for example, by reducing the baseline of attention gain or input drive (69) while still allowing the attention neurons to be active] to simulate the finding that adding a concurrent task could slow down the reported perceptual switches (70).
Neural Mechanisms. In the present model, the neurons in the attention layer were selective for orientation, and their inputs were linear combinations of the responses of the early sensory neurons. Visual neurons in areas downstream from V1 (e.g., V2-V4 and LO) could be responsible for such computations. Stimulus-driven attention in the model might reflect enhanced communication between early and late visual cortical areas. This idea is supported by studies that demonstrated greater interarea correlations of activity between visual areas with attention (71)(72)(73). An alternative, but not mutually exclusive, hypothesis is that attentional modulation is mediated by neural signals from frontoparietal cortices (74,75). Neurons in frontoparietal regions not only exhibit control signals but also feature-selective representations (76,77). However, whether the frontoparietal cortices are directly involved in binocular rivalry is controversial (78,79).
Neural activity in visual cortex, measured with single-cell electrophysiology, exhibits rivalry-like alternations (80,81), and neural responses in visual cortex also exhibit interocular suppression (82,83). However, some of these experiments were performed with anesthetized animals. One should be careful about comparing these results with those from human neuroimaging and psychophysics because responses of neurons in visual cortex depend on brain state (84)(85)(86). The dynamics of our model depend on multiple factors (stimulus strength, attentional modulation, and mutual inhibition; Fig. 7), which could have very different effects under anesthesia.

Binocular Rivalry as a Gateway for Understanding Perceptual Inference. Perception is unconscious inference (87). Sensory stimuli are inherently ambiguous so there are multiple (often infinite) possible interpretations of a sensory stimulus. Multistable phenomena (e.g., binocular rivalry) can be used to probe the intrinsic neural dynamics of cortical processing and the neural processes underlying perceptual inference (88)(89)(90), and neural networks with mutual inhibition as the main ingredient have been designed to perform perceptual inference (89). The present model adds attentional modulation as a critical component, not only to explain a large body of literature on the phenomenon of binocular rivalry but also toward developing a neural-based computational theory of perceptual inference.
Binocular rivalry is rare in everyday visual experience. Outside the laboratory, discrepancies between monocular images from the two eyes often occur when a foreground object occludes the background of a visual scene, leaving some regions of the background visible to only one eye and other regions visible to only the other eye. At corresponding retinal locations with conflicting inputs in the two eyes, the retinal image with lower strength (e.g., contrast) is usually suppressed for a prolonged period. Even though perceptual alternations are rarely experienced in this case, the interocular suppression might arise from the same processes that drive the suppression in binocular rivalry (91).

Conclusions
Attention has long been known to affect binocular rivalry (59,92). However, not until recently was it recognized that attention is necessary for rivalry (14)(15)(16)(17)(18). These findings require a revision of the computational framework for binocular rivalry. In the model we propose here, attention plays a role that amplifies visual competition by biasing attention gain toward one of the rival stimuli. This is similar to the role of attention in natural viewing: attention regulates competing information and allocates limited neural resources (93,94).
Our model exhibited attention-dependent dynamics and captured the dynamics of binocular rivalry in a wide range of experimental conditions. The computations in the model (divisive normalization, mutual inhibition, and attentional modulation) have been hypothesized to be canonical motifs underlying information processing at multiple stages of the visual processing hierarchy (19,27,38). Consequently, this framework can be extended to understand the spatiotemporal dynamics of interactions between attention and perception in conditions other than binocular rivalry.