# Receptor arrays optimized for natural odor statistics

Edited by Leslie B. Vosshall, The Rockefeller University, New York, NY, and approved March 22, 2016 (received for review January 11, 2016)

## Significance

Natural odors typically consist of many molecules at different concentrations, which together determine the odor identity. This information is collectively encoded by olfactory receptors and then forwarded to the brain. However, it is unclear how the receptor activity can encode both the composition of the odor and the concentrations of its constituents. We study a simple model of the olfactory receptors from which we derive design principles for optimally communicating odor information in a given natural environment. We use these results to discuss biological olfactory systems, and we propose how they can be used to improve artificial sensor arrays.

## Abstract

Natural odors typically consist of many molecules at different concentrations. It is unclear how the numerous odorant molecules and their possible mixtures are discriminated by relatively few olfactory receptors. Using an information theoretic model, we show that a receptor array is optimal for this task if it achieves two possibly conflicting goals: (*i*) Each receptor should respond to half of all odors and (*ii*) the response of different receptors should be uncorrelated when averaged over odors presented with natural statistics. We use these design principles to predict statistics of the affinities between receptors and odorant molecules for a broad class of odor statistics. We also show that optimal receptor arrays can be tuned to either resolve concentrations well or distinguish mixtures reliably. Finally, we use our results to predict properties of experimentally measured receptor arrays. Our work can thus be used to better understand natural olfaction, and it also suggests ways to improve artificial sensor arrays.

Discrimination of olfactory signals occurs in a high-dimensional space of odor stimuli in which a large number of distinct molecules and their mixtures can be distinguished by a much smaller number of receptors (1–3). For example, humans have about 300 distinct olfactory receptors (4), which can sense at least 2,100 odorant molecules (5), and the real number might be much larger (1). Moreover, humans can differentiate between mixtures of up to 30 odorants (6). Such remarkable molecular discrimination is thought to use a combinatorial code (7, 8), where typical odorant molecules bind to receptors of multiple types (1, 3). Each receptor type is expressed in many cells (9), and the information from all receptors of the same type is accumulated in corresponding glomeruli in the olfactory bulb (10, 11) (see Fig. 1*A*). The activity of a single glomerulus is thus the total signal of the associated receptor type, so the information about the odor is encoded in the activity pattern of the glomeruli (11, 12). This activity pattern is interpreted by the brain to learn about the composition and the concentration of the inhaled odor. We here study how receptor arrays can maximize the transmitted information.

It is known (13, 14) that the input−output characteristics of sensory apparatuses of many organisms are tailored to the statistics of the organism’s natural environment to maximize information transmission. For example, in the visual circuit of the fly, the input−output relationship of neurons is matched to the cumulative distribution of the input distribution (13). Similar observations have since been made in many sensory systems (14, 15) and even in transcriptional regulation (16). In all these cases, the distinguishable outputs of the sensory system must be dedicated to equal parts of the input distribution, which is known as Laughlin’s principle (13) or histogram equalization (17). Intuitively, more of the response range is dedicated to common stimuli, at the expense of less frequent stimuli (13).
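Histogram equalization can be illustrated with a minimal sketch, assuming a one-dimensional stimulus and a response function equal to the empirical cumulative distribution of previously seen stimuli. The exponential stimulus distribution, the number of output levels, and the helper names `make_response` and `output_histogram` are illustrative assumptions and are not taken from the cited studies.

```python
import bisect
import random


def make_response(samples):
    """Return a response function equal to the empirical CDF of `samples`."""
    ordered = sorted(samples)

    def response(x):
        # Fraction of observed stimuli that are <= x, a value in [0, 1].
        return bisect.bisect_right(ordered, x) / len(ordered)

    return response


def output_histogram(response, stimuli, levels):
    """Count how often each of `levels` equally wide output bins is used."""
    counts = [0] * levels
    for x in stimuli:
        level = min(int(response(x) * levels), levels - 1)
        counts[level] += 1
    return counts


rng = random.Random(0)
# Hypothetical skewed stimulus distribution (an assumption for illustration).
training = [rng.expovariate(1.0) for _ in range(20000)]
test_stimuli = [rng.expovariate(1.0) for _ in range(20000)]

response = make_response(training)
counts = output_histogram(response, test_stimuli, levels=8)
print(counts)  # each output level is used about equally often
```

Because the response follows the cumulative distribution, frequent stimuli are spread over a larger share of the output range than rare ones, which is the intuition stated above.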

Similarly, the binding affinities of olfactory receptors might reflect the natural statistics of odors in an organism’s environment. Odors vary across environments and differ in both their frequency and composition (18). For example, some molecules might frequently appear together because they originate from the same source, whereas others are rarely found in the same odor. Additionally, some odors are more important to recognize than others, which corresponds to considering an increased frequency for these odors. Together, the frequencies and correlations constitute the natural olfactory scene.

It is not clear how olfactory receptors can account for natural odor statistics. Merely dedicating more receptors to common odors is not optimal, given the small number of available receptors and the many-to-many relationship between receptors and odors. Further, the value of a receptor is strongly dependent on how it complements the other receptors in the array; many “good” receptors can still create a poor array. Finally, the concentrations of molecules composing an odor can vary widely. Odors need to be distinguished in both quality and quantity; hence receptors must vary in both what molecules they respond to and how strongly they do this. Given the statistics of an olfactory scene, what combination of odorants should different receptors in an array respond to?

We use an information theoretic approach to quantify how well a receptor array is matched to given odor statistics. We generalize Laughlin’s principle to the high-dimensional case and show that optimal receptor arrays should obey two general principles: (*i*) Each receptor should be active half the time when odors are presented with natural statistics. (*ii*) The activities of any pair of receptors should be uncorrelated when averaged over all odors presented with natural statistics. If both conditions are satisfied for an array of *B*). The two basic principles may be obvious with some thought, but they usually cannot be satisfied simultaneously. We thus also determine the relative costs of violating the two conditions and use this to carry out numerical and analytical optimizations to determine conditions for optimal receptor arrays. Furthermore, our model implies relationships between the typical ligand concentrations and the ability to discriminate mixtures that have been missed before.
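The effect of the two principles on the entropy of the activity patterns can be checked on a toy array. The following sketch assumes three binary receptors and compares an array that satisfies both conditions with one that violates condition (*i*) and one that violates condition (*ii*); the specific probabilities and the helper names are hypothetical.

```python
import itertools
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def independent_array(mean_activities):
    """Pattern probabilities for receptors that are active independently."""
    probs = []
    for pattern in itertools.product([0, 1], repeat=len(mean_activities)):
        p = 1.0
        for a, mean in zip(pattern, mean_activities):
            p *= mean if a == 1 else 1.0 - mean
        probs.append(p)
    return probs


# Both conditions hold: uncorrelated receptors, each active half the time.
ideal = entropy_bits(independent_array([0.5, 0.5, 0.5]))
# Condition (i) violated: uncorrelated, but each receptor is active 10% of the time.
biased = entropy_bits(independent_array([0.1, 0.1, 0.1]))
# Condition (ii) violated: the third receptor always copies the second,
# so only four patterns occur, each with probability 1/4.
correlated = entropy_bits([0.25, 0.25, 0.25, 0.25])

print(ideal, biased, correlated)
```

Only the first array uses all eight activity patterns with equal probability and therefore reaches the maximum of 3 bits for three binary receptors.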

After introducing our general framework below, we first discuss general properties of optimal receptor arrays. We then consider two different classes of natural statistics, for which we find optimal receptors in terms of random matrices. Here, our information theoretic approach provides a combined measure of the array’s performance in multiple aspects—from the resolution of ligand concentrations to the discrimination of mixture composition. We thus finally discuss the trade-off between such potentially mutually exclusive goals and compare our results to experimentally measured receptor arrays.

## Results

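Odors are mixtures of odorant molecules that are ligands of olfactory receptors. Any odor can be described by a vector $\boldsymbol{c}$ whose components $c_i$ are the concentrations of the individual ligands. We consider the simple case where the sensitivity of receptor *n* to ligand *i* can be described by a single number $S_{ni}$, such that the excitation $e_n$ of receptor *n* is given by (19, 20)

$$e_n = \sum_i S_{ni}\, c_i . \tag{1}$$

The activity $a_n$ of receptor *n* is given by

$$a_n = \begin{cases} 0 & e_n < 1 \\ 1 & e_n \ge 1 . \end{cases} \tag{2}$$

Eqs. **1** and **2** describe the mapping of the odor $\boldsymbol{c}$ to the activity pattern $\boldsymbol{a}$ (see Fig. 1*C*). This activity pattern is then analyzed by the brain to infer the odor $\boldsymbol{c}$.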
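The mapping from an odor to an activity pattern can be sketched in a few lines. The sketch assumes the linear excitation and step-function activity described in the text, with the threshold rescaled to 1; the function names and the example sensitivity matrix are hypothetical.

```python
def excitations(sensitivities, concentrations):
    """Linear excitation of each receptor: e_n = sum_i S[n][i] * c[i]."""
    return [sum(s * c for s, c in zip(row, concentrations))
            for row in sensitivities]


def activities(sensitivities, concentrations, threshold=1.0):
    """Binary activity: 1 if the excitation reaches the threshold, else 0."""
    return [1 if e >= threshold else 0
            for e in excitations(sensitivities, concentrations)]


# Hypothetical array of 3 receptors and 4 ligands (values chosen for illustration).
S = [
    [2.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.2],
    [0.5, 0.5, 0.0, 0.0],
]
odor = [0.4, 0.0, 1.0, 2.0]  # ligand concentrations; the second ligand is absent

print(excitations(S, odor))
print(activities(S, odor))  # -> [1, 0, 0]
```

Only the first receptor is excited above threshold, so the odor is reported to downstream processing as the pattern `[1, 0, 0]`.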

We assume that the structure of natural odors in a given environment can be captured by a probability distribution *i* occurs in a random odor. The correlations between the occurrences of ligands are captured by a covariance matrix *i* is present, we assume its concentration *i* and a covariance matrix

### Optimal Receptor Arrays.

An optimal receptor array must tailor receptor sensitivities **1** and **2** dedicates more activity patterns to more frequent or more important odors as specified by *I* can be written as the entropy of the output distribution *I* depends on *I* is maximized by sensitivities

The mutual information *I* can be approximated (27) in terms of the mean activities **3** and **4**, the maximal mutual information of
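The quantities that enter such an approximation, the mean activities and the covariances between receptors, can be estimated directly from a set of activity patterns. The sketch below computes them together with the sum of single-receptor entropies, which is a general upper bound on the joint entropy and is used here as a stand-in rather than as the approximation from ref. 27. The example patterns and function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a receptor that is active with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def activity_statistics(patterns):
    """Mean activity per receptor and covariance matrix between receptors."""
    count = len(patterns)
    size = len(patterns[0])
    means = [sum(p[n] for p in patterns) / count for n in range(size)]
    cov = [[sum(p[n] * p[m] for p in patterns) / count - means[n] * means[m]
            for m in range(size)]
           for n in range(size)]
    return means, cov


def independent_bound(means):
    """Sum of single-receptor entropies; the joint entropy cannot exceed it."""
    return sum(binary_entropy(m) for m in means)


# Four equally likely patterns of a hypothetical three-receptor array.
patterns = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
means, cov = activity_statistics(patterns)
print(means)                     # every receptor is active half the time
print(cov[0][2])                 # receptors 0 and 2 are anticorrelated
print(independent_bound(means))  # 3 bits, although only 4 patterns (2 bits) occur
```

In this example every receptor satisfies the half-activity condition, yet the array transmits one bit less than the bound because two receptors are perfectly anticorrelated.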

These design principles follow from very general considerations, but they may not always be simultaneously achievable. To understand such constraints, we study how microscopic properties of receptor arrays (the sensitivities *Supporting Information*). The covariance *I* (see *Supporting Information*). These statistics of **1** and read

Combining Eqs. **4** and **6** to estimate mutual information, we can quantify how well an array’s sensitivities

### Random Sensitivity Matrices.

We next study which sensitivity matrices **5** for given odor statistics. Here, we will show that random

#### Narrow concentration distributions.

We begin with the simple case where the concentration distributions are narrow, *n* reacts to ligand *i* and **2** and **6**, as shown in *Supporting Information*. In the simple case of uncorrelated mixtures (*Supporting Information*, we also calculate corrections due to the correlated appearance of ligands (

In the case of uncorrelated mixtures, we find, using Eq. **5**, that *i*) the occurrence probabilities *ii*) no ligand activates multiple receptors. Because any given ligand is rare in natural odors, **4** gives the relative cost of violating these two possibly conflicting requirements.

This partition problem can be solved approximately using random binary sensitivity matrices. The ensemble of such matrices is characterized by a single parameter, the fraction of nonzero entries or sparsity *ξ*. Fig. 2*A* shows that there is an optimal sparsity *I* is maximized. It follows from *Supporting Information*). This condition for random matrices agrees well with the sparsity found from numerical optimization over all binary matrices (see Fig. 2*B*). However, for small *s*, the sparsity **8** for small mixture sizes *s* (see Fig. 2*B*).
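The half-activity condition alone already constrains the sparsity of a random binary matrix. The sketch below assumes mixtures of exactly *s* ligands, independent binary entries that are nonzero with probability *ξ*, and a receptor that responds whenever it is sensitive to at least one ligand in the mixture, so that the mean activity is 1 − (1 − *ξ*)^*s*. It ignores the cost of correlations between receptors and is therefore not a reproduction of Eq. **8**; all names and parameter values are hypothetical.

```python
import random


def half_activity_sparsity(mixture_size):
    """Sparsity at which 1 - (1 - xi)**s equals 1/2."""
    return 1.0 - 0.5 ** (1.0 / mixture_size)


def simulated_mean_activity(xi, num_ligands, mixture_size, trials, rng):
    """Fraction of trials in which one random binary receptor responds."""
    active = 0
    for _ in range(trials):
        # One row of a random binary sensitivity matrix with sparsity xi.
        row = [rng.random() < xi for _ in range(num_ligands)]
        mixture = rng.sample(range(num_ligands), mixture_size)
        # The receptor responds if it is sensitive to any ligand in the mixture.
        if any(row[i] for i in mixture):
            active += 1
    return active / trials


rng = random.Random(1)
s = 5
xi = half_activity_sparsity(s)
estimate = simulated_mean_activity(xi, num_ligands=50, mixture_size=s,
                                   trials=20000, rng=rng)
print(xi, estimate)  # the simulated mean activity is close to 1/2
```

Under these assumptions the required sparsity decreases as mixtures become larger, because each additional ligand gives the receptor another chance to respond.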

#### Wide concentration distributions.

In reality, odor concentrations vary widely, and receptor arrays must thus measure both odor composition and concentrations. The concentration of a single ligand can be measured if many receptors react to it with different sensitivities (7). The receptor array is optimal for this task if all possible outputs occur with equal frequency. This is the case if the inverse of the sensitivities follows the same distribution as the ligand concentrations (13), which is known as Laughlin’s principle. However, it is not clear how this principle can be generalized for measuring the concentration of multiple ligands simultaneously.

We study this problem by considering random sensitivities that are lognormally distributed. This choice is motivated by the complex interaction between receptors and ligands, which typically leads to normally distributed binding energies (28). We will show later that experimentally measured sensitivities indeed appear to be lognormally distributed. Lognormal distributions are characterized by two parameters, the mean *λ* of the underlying normal distribution. We thus next ask how these parameters have to be chosen to maximize the mutual information *I*. To estimate *I*, we need to consider the excitations **6** and read **2** and find that the receptor array is optimal (*Supporting Information*)*I* as a function of *λ*. Fig. 3*A* shows that Eq. **9** predicts the optimal parameters of lognormally distributed sensitivities very well. Fig. 3*B* shows that this result also predicts the mean
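A lognormal sensitivity matrix can be generated as follows. This is a minimal sketch that parameterizes the ensemble by the mean `log_mean` and the width `width` (playing the role of *λ*) of the underlying normal distribution; the array size and parameter values are arbitrary assumptions.

```python
import math
import random
import statistics


def lognormal_sensitivities(num_receptors, num_ligands, log_mean, width, rng):
    """Matrix whose entries have normally distributed logarithms."""
    return [[rng.lognormvariate(log_mean, width) for _ in range(num_ligands)]
            for _ in range(num_receptors)]


rng = random.Random(2)
log_mean, width = -1.0, 1.5  # hypothetical parameters of the underlying normal
S = lognormal_sensitivities(num_receptors=30, num_ligands=200,
                            log_mean=log_mean, width=width, rng=rng)

logs = [math.log(value) for row in S for value in row]
print(statistics.mean(logs), statistics.stdev(logs))  # close to log_mean and width

# For a lognormal distribution, the mean of the entries themselves is
# exp(log_mean + width**2 / 2), which grows quickly with the width.
print(math.exp(log_mean + width ** 2 / 2))
```

The same check, comparing the mean and SD of the logarithms with a normal distribution, is what the histograms of measured sensitivities are based on.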

Lognormally distributed sensitivities perform poorly if the distribution width *λ* is small (see Fig. 3*A*). This is expected because receptors with narrowly distributed *I*. Interestingly, for large enough *λ*, the correlations are so small that the exact value of *λ* does not influence *I* significantly (see Fig. 3*A*). In fact, for very large *λ*, the **9**, receptors can thus only detect whether ligands are present or not, corresponding to the binary sensitivities discussed above, which cannot resolve the concentration of the ligands. Consequently, *λ* must influence how well such receptor arrays can resolve concentrations.

#### Trade-off between concentration resolution and mixture discriminability.

When the distribution width *λ* is large, the receptor arrays have similar performance *I*, so they are equally good at the combined problem of resolving concentrations and discriminating mixtures. However, the performance in the individual problems can vary widely. Because, in many contexts, we might wish to trade off performance, say, by sacrificing some ability to discriminate mixtures in favor of a better concentration resolution, we next investigate these properties in detail.

We define the concentration resolution *R* as the ratio of the concentration *c* at which a single ligand is presented and the concentration change *η* additional receptors have to be excited to register a change in concentration. *R* is a function of the concentration *c* at which it is measured and its maximal value*Supporting Information*).

The range of concentrations that can be detected by the receptor array is given by the ratio of the largest concentration *η*, the logarithm of the concentration range *Supporting Information*)**11** shows that *λ* determines the number of concentration decades over which the receptor array is sensitive.

Taken together, *λ* has opposing effects on the resolution and the range of concentration measurements (see Fig. 4*A*). Consequently, *λ* can be tuned either for receptors that resolve concentrations well or cover a large concentration range. If only single ligands are measured, the optimal *λ* only depends on the concentration distribution *I* can be calculated from the resolution function *I* (31). For odor mixtures, *I* accounts for a combination of the concentration resolution and the mixture discrimination, and maximizing *I* does not uniquely determine an optimal receptor array. We thus next study how the distribution width *λ* influences the ability to discriminate mixtures.

We first consider mixtures of *s* ligands, each at concentration *c*, and determine the maximal size *s* that obeys (see *Supporting Information*)*μ* and variance *A* shows that

Not all mixtures with fewer than *h* of the activity patterns *s* ligands, sharing *h* (see *Supporting Information*). Fig. 5*B* shows that this approximation (solid lines) agrees well with numerical calculations (symbols). The figure also shows that mixtures can only be distinguished well if the concentration of the constituents is in the right range. This is because receptors are barely excited for too small concentrations, whereas they are saturated for large concentrations. The distance *h* also strongly depends on the number *B* shows that a single different ligand can be sufficient to distinguish mixtures in the right concentration range (green line). This range increases with the width *λ* of the sensitivity distribution, similar to the range over which concentrations can be measured (see Eq. **11**). The suitable concentration range is also a function of the mean sensitivity **9**). Consequently, our model predicts that only mixtures with total concentrations near the average concentration in natural mixtures can be distinguished well.
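The dependence of the distance *h* on concentration can be reproduced qualitatively with a small simulation. The sketch assumes lognormal sensitivities, a linear excitation with a threshold of 1, and two mixtures of five ligands that differ in a single ligand; the array size, the parameters, and the three test concentrations are hypothetical.

```python
import random


def activity_pattern(sensitivities, ligands, concentration):
    """Binary pattern for a mixture with all listed ligands at one concentration."""
    return [1 if sum(row[i] for i in ligands) * concentration >= 1.0 else 0
            for row in sensitivities]


def hamming_distance(pattern_a, pattern_b):
    """Number of receptors whose activity differs between two patterns."""
    return sum(a != b for a, b in zip(pattern_a, pattern_b))


rng = random.Random(3)
# Hypothetical lognormal sensitivities: 100 receptors, 40 ligands.
S = [[rng.lognormvariate(0.0, 1.0) for _ in range(40)] for _ in range(100)]

mixture_a = [0, 1, 2, 3, 4]
mixture_b = [0, 1, 2, 3, 5]  # shares four of its five ligands with mixture_a

distances = {}
for c in (1e-4, 0.2, 1e3):
    distances[c] = hamming_distance(activity_pattern(S, mixture_a, c),
                                    activity_pattern(S, mixture_b, c))
print(distances)
```

At the lowest concentration no receptor is excited and at the highest all receptors are saturated, so the two patterns coincide in both cases; only at the intermediate concentration do the patterns differ.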

### Experimentally Measured Receptor Arrays.

The response of receptors to individual ligands has been measured experimentally for flies (33) and humans (34). We use these published data to estimate the statistics of realistic sensitivity matrices as described in *Supporting Information*. Fig. 6 shows the histograms of the logarithms of the sensitivities for flies and humans. Both histograms are close to a normal distribution, with similar SDs *Supporting Information*). Consequently, these interaction energies exhibit a similar variation on the order of 1

We next use the measured lognormal distribution for the sensitivities to compare the concentration resolution *R* predicted by Eq. **10** to measured “just noticeable relative differences” **11** for

Our theory also predicts the maximal number of ligands that can be distinguished as a function of the concentration *c* of the individual ligands. For *A*). Experimental studies report similar numbers, e.g., *A* shows that *B* shows that the concentration range over which mixtures can be distinguished is less than an order of magnitude for

## Discussion

We studied how arrays of olfactory receptors can be used to measure odor mixtures, focusing on the combinatorial code of olfaction, i.e., how the combined response of multiple receptors can encode the composition (quality) and the concentration (quantity) of odors. Such arrays are optimal if each receptor responds to half of the encountered odors and the receptors have distinct ligand binding profiles to minimize correlations.

Our simple model of binary receptors can, in principle, distinguish a huge number of odors, because there are *ξ* in the simple case of binary sensitivities. If *ξ* is small, combining different ligands typically leads to unique output patterns that allow identification of the mixtures, but the concentration of isolated ligands cannot be measured reliably, because only a few receptors are involved. Conversely, if *ξ* is large, mixtures of multiple ligands will excite almost all receptors, such that neither the odor quality nor the odor quantity can be measured reliably. However, here, the concentration of an isolated ligand can be measured precisely. We discussed this property in detail for sensitivities that are lognormally distributed, where the width *λ* controls whether mixtures can be distinguished well or concentrations can be measured reliably. Interestingly, experiments find that individual ligands at moderate concentration only excite a few glomeruli (37), but natural odors at native concentrations can excite many (38). This could imply that the sensitivities are indeed adapted such that each receptor is excited about half the time for natural odors.

Our model implies that having more receptor types can improve all properties of the receptor array. In particular, both the concentration resolution *R* and the typical distance *h* between mixtures are proportional to *B*).

Our results also apply to artificial chemical sensor arrays known as “artificial noses” (40, 41). Having more sensors improves the general performance of the array, but it is also important to tune the sensitivity of individual sensors. Here, sensors should be as diverse as possible while still responding to about half the incoming mixtures. Unfortunately, building such chemical sensors is difficult, and their binding properties are hard to control (41). If the sensitivity matrix of the sensor array is known, our theory can be used to estimate the information *n* contributes as **4**). This can then be used for identifying poor receptors that contribute only a little information to the overall results.

Our focus on the combinatorial code of the olfactory system certainly neglects intricate details of the system. For instance, we do not consider the dynamics of sniffing and odor absorption, which are the first processing steps and influence the perception (42). Further, our simple model of the binding of odorants to receptors, described by sensitivity matrices with independent entries, neglects biophysical constraints that will cause chemically similar ligands to excite similar receptors (8, 43). This is important because it makes it difficult to distinguish similar ligands (44), and it might thus be worthwhile to dedicate more receptors to such a part of chemical space. Additionally, receptors or glomeruli might interact with each other, e.g., causing inhibition reducing the signal upon binding a ligand (45). We can, in principle, discuss inhibition in our model by allowing for negative sensitivities, but more complicated features cannot be captured by the linear relationship in Eq. **1**. One important nonlinearity is the dose–response curve of individual receptor neurons (21), which we approximate by a step function (see Eq. **2**). This simplification reduces the information capacity of a single glomerulus to 1 bit, whereas it is likely higher in reality. However, we expect that allowing for multiple output levels would only increase the concentration resolution and not change the discriminability of mixtures very much (23). Additionally, these perceptual quantities could be influenced by other processes, e.g., lateral inhibition between glomeruli (11, 46) and top-down modulation that adjusts the sensory system based on behavior (46). Besides such enhancements of olfactory sensing, further processing can only remove information, so our results provide an upper bound for the ability to recognize odors.

## Receptor Sensitivities

### Equilibrium Binding Model.

We consider a simple model where receptors *n* and ligand *i*. In equilibrium, the concentrations denoted by square brackets obey *n* is proportional to the concentration of bound ligands,*n*. As discussed in the Introduction, the excitations of all receptors of a given type are accumulated in the respective glomeruli, whose excitation *n*. In the simple case of binary outputs, a glomerulus becomes active if its excitation exceeds a threshold **S2** and introduce the rescaled quantities**1** and **2**.

A simple theory (28) predicts that the interaction energies **S1**). In this case, the sensitivities **S4**).

### Measured Receptor Sensitivities.

Response matrices have been measured experimentally for flies (33) and humans (34). The fly database has been constructed by merging data from many studies that used various methods to measure receptor responses (33). It contains a nonzero response for 5,482 receptor−ligand pairs, covering all 52 receptors that are present in flies. Fig. 6*A* shows the histogram of the logarithm of the associated sensitivities together with a normal distribution with the same mean and variance as the data.

The only comprehensive study of human olfactory receptors used a luciferase assay to measure receptor responses in vitro (34). It reports the intensity of clones of 511 human olfactory receptors in response to various concentrations of 73 ligands. Typically, the intensity of a given receptor−ligand pair is monotonically increasing as a function of ligand concentration *c*. We normalize the intensity to lie between 0 and 1 and fit a hyperbolic tangent function to determine the concentration *B* shows the histogram of the logarithm of these sensitivities together with a normal distribution with the same mean and variance as the data.

## Receptor Response

We next discuss the statistics of the receptor responses as a function of the odor statistics

### Narrow Concentration Distribution.

In this case, we are only interested in measuring the composition of an odor *i* (related to *i* and *j* (related to

#### Uncorrelated mixtures.

For uncorrelated mixtures (**S7** provides the mapping between the commonness **S5** and the ligand frequency **2**, is a function of the excitation **2** can be approximated by**7**.

The receptor activity for binary sensitivity matrices with independent and identically distributed entries is described by*ξ* denotes the sparsity of *I* is maximized is given by the condition **S14a** and solving for *ξ*, we obtain*s*, becomes

#### Correlated mixtures.

We consider weakly correlated mixtures, where we expand all results to linear order in *i* appears reads**S8**. Hence,**S20** and **S23** provide the mapping between **S5**.

The statistics of the receptor activity

Expanding the fractions in Eq. **S25**, we obtain**S26**, this becomes*ξ*, we obtain

### Wide Concentration Distribution.

We next consider mixtures where the concentrations of the individual ligands are drawn from a continuous distribution (*μ_i*, and their SD *σ_i*. In the case where a receptor is excited by many ligands, its excitation **6**. The associated mean receptor activity **S30** around the optimal point **S33**. Consequently, we have **S30** and **S35**) **S34**.

### Numerical Simulations.

We use a simple two-step procedure to draw odors **S5**. Here, **S20** and **S22**, respectively. In the case of narrow concentration distributions, the odor *i* that appears in a mixture (
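For the uncorrelated case, a two-step draw can be sketched as follows: first decide which ligands are present, then assign a concentration to each ligand that is present. The sketch assumes independent ligand occurrence and lognormally distributed concentrations, which is one possible choice of continuous distribution and an assumption of this example; the function name and parameters are hypothetical, and the correlated case is not covered.

```python
import random


def draw_odor(presence_probs, log_means, log_sds, rng):
    """Draw one odor as a vector of ligand concentrations."""
    odor = []
    for p, mu, sigma in zip(presence_probs, log_means, log_sds):
        # Step 1: decide whether the ligand occurs in this odor.
        present = rng.random() < p
        # Step 2: draw a concentration only for ligands that are present.
        odor.append(rng.lognormvariate(mu, sigma) if present else 0.0)
    return odor


rng = random.Random(5)
num_ligands = 20
presence_probs = [0.1] * num_ligands  # each ligand occurs in 10% of odors
log_means = [0.0] * num_ligands
log_sds = [1.0] * num_ligands

odors = [draw_odor(presence_probs, log_means, log_sds, rng) for _ in range(5000)]
mean_size = sum(sum(1 for c in odor if c > 0) for odor in odors) / len(odors)
print(mean_size)  # close to the expected mixture size of 2 ligands
```

For narrow concentration distributions, step 2 reduces to assigning a fixed concentration to every ligand that is present.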

Given this odor statistics *I* can, in principle, be calculated from Eqs. **1**−**3**. Calculating **3** involves an integral over **2**. We approximate this integral using Monte Carlo sampling of the odor statistics *I* is not exact. Consequently, we use the stochastic, derivative-free numerical optimization method covariance matrix adaptation evolution strategy (CMA-ES) (47) to optimize the sensitivity matrix *I* to produce Fig. 3*B*.
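A Monte Carlo estimate of this kind can be sketched with a plug-in entropy estimator: sample odors, map each to its activity pattern, and compute the entropy of the observed pattern frequencies. The sketch assumes binary activities with a threshold of 1; the sampler `coin_flip_odor` and the two-receptor example are hypothetical, and the optimization step itself is not shown.

```python
import math
import random
from collections import Counter


def estimate_information(sensitivities, draw_odor, samples, rng):
    """Plug-in estimate (in bits) of the entropy of the activity patterns."""
    counts = Counter()
    for _ in range(samples):
        odor = draw_odor(rng)
        pattern = tuple(
            1 if sum(s * c for s, c in zip(row, odor)) >= 1.0 else 0
            for row in sensitivities
        )
        counts[pattern] += 1
    return -sum((k / samples) * math.log2(k / samples)
                for k in counts.values())


def coin_flip_odor(rng):
    """Hypothetical odor statistics: each of two ligands is present half the time."""
    return [1.0 if rng.random() < 0.5 else 0.0 for _ in range(2)]


# Each receptor responds to exactly one ligand, so both design principles hold.
S = [[1.0, 0.0],
     [0.0, 1.0]]
info = estimate_information(S, coin_flip_odor, samples=20000,
                            rng=random.Random(4))
print(info)  # close to the maximum of 2 bits for two binary receptors
```

Because the estimate depends on a finite random sample, repeated evaluations of the same sensitivity matrix give slightly different values, which is why a stochastic, derivative-free optimizer is a natural choice for this objective.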

## Properties of Arrays with Random Sensitivities

We study properties of receptor arrays characterized by random sensitivity matrices *λ*, which is the SD of the underlying normal distribution. Note that all following calculations could also be performed for other sensitivity distributions.

### Concentration Resolution.

The fraction *c* reads*η* additional receptor is then defined by the condition *c*, the solution for **10**.

### Concentration Range.

The minimal concentration **11**.

### Maximal Number of Distinguishable Ligands.

In the simple case of a mixture with *s* ligands, all at concentration *c*, the fraction *A* shows that Eq. **S44** approximates the numerically determined

We next consider the maximal number of ligands that can be distinguished. Here, we, for simplicity, consider the case where mixtures can be distinguished when they excite activity patterns that differ for at least *η* receptors. Because a mixture with *s* components on average excites *s*, this condition can be approximated by*B* shows that this function has a single peak. Mixtures with *c* is above the odor detection threshold, and the second condition ensures that the two largest mixtures excite sufficiently different activity patterns.

### Discriminability of Two Mixtures of Equal Size.

We next consider how well two mixtures can be discriminated. For simplicity, we consider two mixtures, each with *s* ligands of which *c*. The excitations can be rewritten as**S51** can also be written as**S50** can be expressed as*s* and **1** and **2**, and determine the associated difference. Fig. S1 shows that Eq. **S54** agrees well with these numerical results. Although *h* is a function of *s*, *c*, *λ*, and *s*, *λ*, because *h* as a function of *s* and *η* receptors, mixtures can typically be distinguished if

## Acknowledgments

We thank Carl Goodrich, Venkatesh N. Murthy, and Michael Tikhonov for helpful discussions and a critical reading of the manuscript. This research was funded by the National Science Foundation (NSF) through DMR-1435964, DMR-1420570, and DMS-1411694. M.P.B. is an investigator of the Simons Foundation. D.Z. was also funded by the German Science Foundation through ZW 222/1-1, the NSF through PHY11-25915, the National Institutes of Health Award 5R25GM067110-07, and the Moore Foundation Award 2919.

## Footnotes

^{1}To whom correspondence may be addressed. Email: brenner@seas.harvard.edu or amurugan@uchicago.edu.

Author contributions: D.Z., A.M., and M.P.B. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1600357113/-/DCSupplemental.

## References

- (5) Dunkel M, et al.
- (6) Weiss T, et al.
- (7) Hopfield JJ
- (16) Tkacik G, Callan CG Jr, Bialek W
- (18) Wright GA, Thomson MG. *Integrative Plant Biochemistry*, Recent Advances in Phytochemistry (Elsevier, New York), Vol 39, pp 191–226
- (21) Reisert J, Restrepo D
- (22) Lowe G, Gold GH
- (23) Koulakov A, Gelperin A, Rinberg D
- (24) Stevens CF
- (28) Lancet D, Sadovsky E, Seidemann E
- (30) Abraham MH, Sánchez-Moreno R, Cometto-Muñiz JE, Cain WS
- (31) Bialek W
- (32) Bushdid C, Magnasco MO, Vosshall LB, Keller A
- (35) Cain WS
- (36) Jinks A, Laing DG
- (37) Saito H, Chi Q, Zhuang H, Matsunami H, Mainland JD
- (45) Ukhanov K, Corey EA, Brunert D, Klasen K, Ache BW
- (47) Hansen N