Communicating artificial neural networks develop efficient color-naming systems

Significance Color names in human languages are organized into efficient systems optimizing an accuracy/complexity trade-off. We show that artificial neural networks trained with generic deep-learning methods to play a color-discrimination game develop color-naming systems whose distribution on the accuracy/complexity plane is strikingly similar to that of human languages. We proceed to show that efficiency and narrow complexity crucially depend on the discrete nature of communication, acting as an information bottleneck on the emergent code. This suggests that efficient categorization of colors (and possibly other semantic domains) in natural languages does not depend on specific biological constraints of humans, but is instead a general property of discrete communication systems.


Salience-weighted source distribution
The salience-weighted (SW) prior is shown in Figure S3. We re-ran the analysis described in the main paper, now sampling both targets and distractors according to the SW distribution. Figure S4 confirms that our results do not depend on the uniformity assumption made in the main paper: with this alternative skewed input distribution as well (see Figure S3), NN systems are as efficient as the human ones, lying just below the same segment of the IB curve.

[Figure caption fragment: … vs. their rotated hypothetical variants. The continuous curve represents averages across systems for each rotation degree, and the colored region marks the standard deviation across systems.]

[Figure caption fragment: … attained using two color words only.]

To construct the first game, we use the FCM clustering algorithm (see main text) to partition the color space into two clusters optimized for minimal intra-cluster distance (FCM is a fuzzy clustering algorithm, but we discretize its outcome to obtain a hard partition). As shown in Figure S7a (left), this leads roughly to a yellow/other distinction. For the second game, we partition the color space into dark and light regions (Figure S7b, left). This partition is in line with the basic distinction found in human languages with two color terms (such as Dani) (3).
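The clustering step above can be sketched with a minimal fuzzy c-means implementation followed by an argmax over the soft memberships. This is an illustration only, run on toy input: the actual analysis operates on WCS chip coordinates in 3-dimensional CIELAB space, and we make no claim that this matches the exact FCM implementation used for the paper.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and the soft
    membership matrix U (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster means
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))              # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy stand-in for color-chip coordinates: two well-separated blobs.
rng = np.random.default_rng(1)
chips = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
_, U = fuzzy_c_means(chips, k=2)
hard = U.argmax(axis=1)  # discretize the fuzzy outcome into a hard partition
```

The argmax in the last line is the discretization step mentioned in the text: each chip is assigned to its highest-membership cluster, turning the fuzzy solution into a hard two-way partition.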
(a) Color space partition obtained with FCM clustering with 2 clusters (left panels), and the 3 successful NN systems trained on a game where targets and distractors are always sampled from the two distinct clusters (next 3 panels).
(b) Dark/light partition of the color space (left), and the 3 successful NN systems trained on a game where targets and distractors are always sampled from the two distinct regions (next 3 panels).

We conjecture that the low success rate of GS with τ = 10 stems from reasons different from those behind failures in the more discrete settings. In particular, we expect that, in the more discrete settings, the complexity of the emergent system after training is systematically lower in failed runs, because failures stem from the difficulty of establishing a sufficiently complex protocol through the discrete channel. This should not be the case for τ = 10, where complexity should be comparable in failed and successful runs. We verify this hypothesis quantitatively in Table S1.

A. Estimating sep. We aim to quantify how redundant/separate each word is. For example, in English, the word "scarlet" is in a sense redundant, as its meaning is included in that of "red", and using both words makes the color-naming system less efficient. Indeed, "red" is a fine word to refer to scarlet tonalities, and adding "scarlet" only slightly increases communication accuracy at the cost of an increase in complexity ("scarlet" might still be useful as a specialized word, of course). Formally, to measure whether a word w is redundant, we need to find a word w′ that covers the same references. To do so, we define C_w = {c : c is denoted by w}, the set of colors/references denoted by the word w. Moreover, for two given words w and w′, p(C_w ⊄ C_w′) is the probability that the denotation of w is separate from that of w′.

[…] All systems in the smooth setting (i.e., trained with high τ = 10) include some words with low sep, confirming that a smooth channel leads to the emergence of redundant words, at the cost of efficiency.

[…] NN systems partition the color space into convex regions, but they do not rely on the dark/light dimension as the core axis along which to partition colors. Also, they appear to stay closer to a purely perception-based partition of the color space than human languages do, which actually makes them more convex than human languages.
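The denotation sets C_w and an inclusion-based separateness test can be sketched as follows. This is a toy illustration with a hypothetical soft naming distribution and a hypothetical membership threshold; the paper's exact sep estimator may differ.

```python
import numpy as np

# Toy soft naming distribution p(word | chip); rows are chips, columns words.
# Hypothetical numbers, chosen so that "scarlet" only covers chips that
# "red" also covers, while "green" does not.
words = ["red", "scarlet", "green"]
p_w_given_c = np.array([
    [0.9, 0.1, 0.0],   # chip 0: clearly red
    [0.8, 0.2, 0.0],   # chip 1: red
    [0.5, 0.5, 0.0],   # chip 2: scarlet-ish, but "red" applies as well
    [0.0, 0.0, 1.0],   # chip 3: green
])

def denotation(w, thresh=0.3):
    """C_w: the set of chips that word w can denote (probability >= thresh)."""
    return set(np.flatnonzero(p_w_given_c[:, words.index(w)] >= thresh))

def separate(w1, w2):
    """1.0 if the denotation of w1 is NOT included in that of w2, else 0.0."""
    return float(not denotation(w1) <= denotation(w2))
```

Here `separate("scarlet", "red")` returns 0.0 (every chip "scarlet" denotes is also denoted by "red", so "scarlet" is redundant), while `separate("green", "red")` returns 1.0.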

To quantify the similarity between NN and human color partitioning, we frame it as a clustering problem. The first row of Table S2 reports averaged best F1 for the NN systems when using the WCS names as ground-truth labels. ‡
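One plausible way to compute such an averaged best F1 between two labelings of the same chips is the following sketch (the exact matching procedure used for Table S2 may differ): for each ground-truth word, take the best F1 achieved by any NN word, and average these weighted by ground-truth label frequency.

```python
import numpy as np

def best_f1(pred, gold):
    """Frequency-weighted average, over gold labels, of the best F1
    achieved by any predicted label against that gold label."""
    pred, gold = np.asarray(pred), np.asarray(gold)
    score = 0.0
    for g in np.unique(gold):
        in_g = gold == g
        best = 0.0
        for p in np.unique(pred):
            in_p = pred == p
            tp = np.sum(in_g & in_p)
            if tp == 0:
                continue
            prec, rec = tp / in_p.sum(), tp / in_g.sum()
            best = max(best, 2 * prec * rec / (prec + rec))
        score += best * in_g.mean()
    return score

# Identical partitions (up to renaming the clusters) score 1.0.
assert best_f1([0, 0, 1, 1], [5, 5, 7, 7]) == 1.0
```

Because the score only compares cluster memberships, it is invariant to how either system names its clusters, which is what is needed when comparing emergent NN words to WCS color terms.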

To make sense of these numbers, we compare them to an upper bound and a baseline in the next two rows. The upper bound is given by averaging the same score across WCS languages, when using the nearest language to each as ground truth. NN naming schemes are clearly farther away from the nearest natural languages than natural languages are from each other. The baseline is obtained by generating, for each WCS language, 100 pseudo-naming-systems with the same label frequencies. The F1 score with respect to the reference WCS language is computed for each of the 100 pseudo-naming-systems, and the best one is retained for each WCS language. This is an informed baseline, because it has access to the ground-truth label distribution. Across all cardinalities, NN systems are much closer to actual WCS languages than the informed baseline is.
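The informed baseline can be sketched as follows, assuming a best-F1 score of the kind described above (the function names and details here are our own illustration): randomly permuting the gold labels preserves the label frequencies while destroying all spatial structure, and we keep the best score over 100 such samples.

```python
import numpy as np

def best_f1(pred, gold):
    """Frequency-weighted average, over gold labels, of the best F1 any
    predicted label achieves (F1 = 2*tp / (|pred cluster| + |gold cluster|))."""
    pred, gold = np.asarray(pred), np.asarray(gold)
    total = 0.0
    for g in np.unique(gold):
        in_g = gold == g
        best = max(2 * np.sum(in_g & (pred == p)) / (in_g.sum() + np.sum(pred == p))
                   for p in np.unique(pred))
        total += best * in_g.mean()
    return total

def informed_baseline(gold, n_samples=100, seed=0):
    """Best F1 achieved by random pseudo-naming-systems with the same label
    frequencies as gold: each sample is a random permutation of the labels."""
    rng = np.random.default_rng(seed)
    gold = np.asarray(gold)
    return max(best_f1(rng.permutation(gold), gold) for _ in range(n_samples))
```

For a gold labeling with several balanced clusters, the baseline score lands well below 1.0, since a frequency-matched but spatially unstructured system rarely aligns with the gold clusters.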

Where does the difference between natural and NN naming schemes come from? A partial answer is provided by the next two rows of Table S2, where we evaluate to what degree natural and NN systems match the partitions obtained through fuzzy c-means (FCM) clustering (7) in CIELAB space. The latter should approximate color-space partitions that are optimal on purely perceptual grounds (FCM returns partitions that minimize within-cluster distance in color space). We compare each NN/human naming system to the (discretized) FCM solution with K equal to the naming-system cardinality.

[…] that are not part of our language emergence simulations.
Figure S12 provides a more qualitative insight into how NN and natural-language color partitions differ, by visualizing emergent naming systems of cardinality 3 and 5 together with their nearest WCS languages. To avoid cherry-picking, we chose, for each cardinality, the naming systems with median best F1. The results are generally representative, although for higher cardinalities a qualitative comparison becomes problematic due to considerable noise in the WCS data.

Wobé, the reference 3-color-term human system (top right of Figure S12), illustrates the near-universal 3-way split into "light", "dark" and "red" (5). The corresponding NN system (top left) does not encode the dark/light split. While unnatural in this respect, the partitioning is, like those found in human languages, clearly convex (8, 9). The NN system's clusters correspond, moreover, to the other basic colors attested in low-complexity human languages, once we exclude the dark/light distinction: red, green and a yellow/brown patch. Bauzi (bottom right of Figure S12) […]

Table S3. Average degree of convexity by cardinality for different naming systems (standard deviation in parentheses; the latter is NA when there is only one tested naming system of the corresponding cardinality).
The degree of convexity of NN systems is extremely high, approaching that of FCM clustering (which naturally favors convexity because of its distance-minimizing objective). Remarkably, the degree of convexity of NN systems is higher than that of the natural languages in WCS. This might be due, again, to the fact that humans must optimize communicative constraints that are not entirely perception-driven, or, more simply, to the noise inherent in the WCS surveying methodology. We leave this intriguing question to further work. §
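A simple proxy for the degree of convexity of a naming system can be sketched as follows. This midpoint-based measure is our own illustration, not necessarily the exact metric reported in Table S3: a region is convex to the extent that the midpoint of any two same-named chips falls closest to a chip with the same name.

```python
import numpy as np

def degree_of_convexity(coords, labels, n_pairs=2000, seed=0):
    """Fraction of same-label chip pairs whose midpoint is nearest to a chip
    carrying the same label. A midpoint-based convexity proxy (a sketch)."""
    rng = np.random.default_rng(seed)
    coords, labels = np.asarray(coords, float), np.asarray(labels)
    hits, total = 0, 0
    for _ in range(n_pairs):
        i, j = rng.integers(len(coords), size=2)
        if i == j or labels[i] != labels[j]:
            continue
        mid = (coords[i] + coords[j]) / 2           # midpoint in color space
        nearest = np.linalg.norm(coords - mid, axis=1).argmin()
        hits += labels[nearest] == labels[i]
        total += 1
    return hits / total

# Toy check: a threshold partition of points on a line is perfectly convex.
xs = np.linspace(0, 1, 50)[:, None]
labels = (xs[:, 0] > 0.5).astype(int)
```

On real data, `coords` would hold CIELAB chip coordinates and `labels` the words assigned by the naming system; non-convex (e.g., disconnected) regions drive the score below 1.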

In sum, NN color-naming systems, like (and perhaps more than) human ones, show a clear tendency to partition the color space into convex regions. However, these regions depart to some extent from those typically defined by human color naming. NN systems might stay closer to a purely perceptual partitioning of the color space. Moreover, qualitatively, they do not seem to enforce the distinction between white (light) and black (dark), which is instead universally present in human languages. Note, however, that, as we report in Supplementary 5, NNs are in principle able to discover the dark/light distinction if we encourage it in the design of the game.