From likely to likable: The role of statistical typicality in human social assessment of faces
- (a) Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093;
- (b) Department of Psychology, University of California San Diego, La Jolla, CA 92093;
- (c) Department of Cognitive Neuroscience, Maastricht University, 6229 ER Maastricht, The Netherlands;
- (d) Faculty of Psychology, SWPS University of Social Sciences and Humanities, 03-815 Warsaw, Poland;
- (e) Department of Cognitive Science, University of California San Diego, La Jolla, CA 92093;
- (f) Halicioğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093
Edited by Wilson S. Geisler, The University of Texas at Austin, Austin, Texas, and approved July 29, 2020 (received for review January 10, 2020)

Abstract
Humans readily form social impressions, such as attractiveness and trustworthiness, from a stranger’s facial features. Understanding the provenance of these impressions has clear scientific importance and societal implications. Motivated by the efficient coding hypothesis of brain representation, as well as Claude Shannon’s theoretical result that maximally efficient representational systems assign shorter codes to statistically more typical data (quantified as log likelihood), we suggest that social “liking” of faces increases with statistical typicality. Combining human behavioral data and computational modeling, we show that perceived attractiveness, trustworthiness, dominance, and valence of a face image linearly increase with its statistical typicality (log likelihood). We also show that statistical typicality can at least partially explain the role of symmetry in attractiveness perception. Additionally, by assuming that the brain focuses on a task-relevant subset of facial features and assessing log likelihood of a face using those features, our model can explain the “ugliness-in-averageness” effect found in social psychology, whereby otherwise attractive, intercategory faces diminish in attractiveness during a categorization task.
Humans readily form social impressions, such as attractiveness and trustworthiness, from a brief glance at a stranger’s face (1–4). While the accuracy of these social judgments is actively debated (4), such social impressions clearly exert a powerful influence on daily human social life (4), whether choosing a life partner, assessing eyewitness testimony, interviewing a job candidate, or choosing whom to befriend. One of the most robust and curious findings in the study of social judgment of faces is the so-called “beauty-in-averageness” (BiA) effect, whereby blends of two or more face images are generally perceived to be more attractive than the “parent” face images (5, 6). A number of qualitative hypotheses have been put forth to explain the provenance of BiA, such as a human preference for symmetry (7, 8) or lack of blemishes (9). However, symmetry and blemishes appear at most to provide an incomplete explanation, as controlling for these factors does not eliminate BiA (10). Alternatively, it has been suggested that humans have a preference for highly prototypical stimuli over more unusual stimuli (11), possibly as a cue to mate value or reproductive health (12, 13). However, humans exhibit BiA effects not only for strangers’ faces but also for a variety of natural and artificial object categories, such as dogs, birds, butterflies, fish, automobiles, watches, and even synthetic dot patterns (11, 14, 15). Beyond attractiveness, perceived trustworthiness also appears greater for more typical-looking faces (16). Other recent studies have shown that attractiveness and trustworthiness perception are culturally dependent (17) and can be rapidly modified via exposure to specific types of faces (18). Altogether, these results suggest that there may be a general human social preference for more typical-looking objects, and this effect depends on rather general cognitive mechanisms beyond those specific to beauty or mating.
Here, we propose a statistically grounded theory of human “liking” of more typical-looking faces (and other objects) (16, 19, 20). We suggest that the perceived appeal of a face is, at least in part, monotonically driven by a measure of “typicality” of the face with respect to the face distribution, specifically in terms of the log likelihood (LL) of the face under the distribution. We refer to our measure as statistical typicality to distinguish it from previous nonstatistical proposals of the link between typicality and attractiveness (16, 19, 20). Intuitively, as long as the face distribution is unimodal (Fig. 1A), the blend (average) of two parent faces tends to be closer to the mean, and thus have higher LL than the parent faces, resulting in BiA (5, 6, 16). Formally, we model faces as points in a vector space (“face space”) (21), defined by their latent feature values from a widely used computer vision model (22–24), and estimate the face distribution using a demographically balanced, publicly available face dataset, Chicago Face Database (CFD) (25). As we will show, attractiveness ratings from the CFD correlate with LL of these faces, and simulations based on CFD faces exhibit expected BiA effects. Moreover, we report an experimental finding that model-estimated LL of face stimuli linearly predicts not only perceived attractiveness, but also trustworthiness, dominance, and valence, traits thought by social psychologists to be particularly fundamental in face-based social judgments (4). We also demonstrate that statistical typicality may even partially explain the role of symmetry in attractiveness perception (7, 8), as symmetry correlates with LL among CFD faces, and symmetrization manipulations increase LL.
BiA vs. UiA. (A) In a unimodal distribution, the blend (
If human liking of a face indeed relates to statistical typicality, then where the distribution is bimodal, we would expect a cross-modal blend to be less likable than the parent stimuli (Fig. 1B). Indeed, empirical studies show that cross-category blends [e.g., biracial (26), bigender (27) faces, and cross-category synthetic stimuli (28)] elicit lower attractiveness ratings than same-category blends, but only in the relevant categorization context (outside the relevant categorization context, these stimuli still elicit BiA). To explain ugliness in averageness (UiA), we make one additional assumption: attention restricts processing to a task-relevant subspace of facial features (e.g., gender-discriminating features in the gender categorization task) (29–32). As in BiA, attractiveness perception is linked to LL but now defined over this task-relevant subspace instead of the full feature space. Whereas same-category blends exhibit the standard BiA (Fig. 1C), cross-category blends tend to fall in between modes, resulting in low LL and attractiveness. We will show how UiA arises naturally from the decrease in LL of cross-category faces when the representation is restricted to the task-relevant subspace, and that model-estimated LL of individual face images within that subspace is predictive of attractiveness rating in the categorization context (27).
One may well question why the brain should have an affect signal related to statistical typicality defined in terms of LL. For theoretical motivation, we appeal to Claude Shannon’s classical result (33) that a maximally efficient coding system (minimizing the code length needed to represent data) should assign to each item a code length that is exactly its negative LL (−LL; i.e., shorter codes for more frequently encountered data). For example, Morse code is a fairly efficient code, consisting of short dots and long dashes, that assigns a dot to the most frequently used English letter “E” and much longer codes to rare letters such as “Q” (dash–dash–dot–dash). Relatedly, the “efficient coding hypothesis” posits that information representation and processing in the brain are highly efficient (34–36), such that less neuronal response is allocated to encode more probable stimuli (34, 37–41). In support, empirical evidence indicates that face-responsive areas in the human brain indeed decrease their response to more typical faces (42, 43). To attain and maintain coding efficiency in the brain, inefficiently coded stimuli should be aversive, so as to encourage representational updating that minimizes long-term coding cost averaged across observed data (44, 45). Interestingly, reducing the average negative LL of observed data under the assumed model, known as cross-entropy minimization, is also a popular and effective tool for representation learning in modern machine learning and artificial intelligence (46). Separately, empirical evidence indicates that face stimuli whose representation in the brain requires greater neural activity are indeed aversive (47–49).
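The link between probability and code length can be made concrete with a small sketch. The symbol probabilities below are toy values chosen for illustration (powers of 1/2, so that the ideal lengths come out as whole bits); they are not actual English letter frequencies, and the function names are hypothetical.

```python
import math

def ideal_code_lengths(probs):
    """Ideal Shannon code length in bits for each symbol: -log2 p(symbol)."""
    return {symbol: -math.log2(p) for symbol, p in probs.items()}

def expected_length(probs, lengths):
    """Average number of bits per symbol when symbols occur with probability probs."""
    return sum(probs[s] * lengths[s] for s in probs)

# Toy distribution (an assumption for illustration, not real letter frequencies).
probs = {"E": 0.5, "T": 0.25, "A": 0.125, "Q": 0.125}

lengths = ideal_code_lengths(probs)

# Frequent symbols get short codes, rare symbols get long ones.
average_bits = expected_length(probs, lengths)

# A fixed-length code for four symbols would spend 2 bits on every symbol.
fixed_bits = math.log2(len(probs))

print(lengths)
print(average_bits, fixed_bits)
```

In this toy case the likelihood-matched code averages fewer bits per symbol than the fixed-length code, which is the sense in which assigning a cost of −LL to each item is efficient.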
In summary, diverse empirical and theoretical results support efficient face coding in the brain and theoretically motivate a positive influence of LL-based statistical typicality on social liking of faces. However, this paper primarily focuses on the relationship between LL and social liking, and thus, its findings stand independently of the explanatory role played by efficient coding. In the following, we demonstrate how our model assumptions and predictions are validated by a combination of face data, statistical modeling, human behavioral data, and model simulations. We begin with an analysis of BiA effects and then proceed to UiA.
Results
We hypothesize that BiA arises because the blend of two faces tends to be closer to the mean of the distribution and have higher LL than the parent faces (Fig. 1A). To examine the statistical distribution of faces (Materials and Methods and SI Appendix), we utilize a demographically balanced, publicly available dataset (CFD) of 597 face images (25) and retrieve their latent feature representation from a commonly used computer vision model, the active appearance model (AAM) (22–24). The AAM has previously been used to model human face representation (20, 50), and AAM feature dimensions (axes) appear to be encoded linearly by face-selective neurons in the primate brain (51).
Simulation: BiA.
To check whether statistical typicality (LL) of CFD faces and their blends can reproduce experimentally observed BiA effects, we need a parametric density model of faces. Due to the relatively small number of CFD faces (597) compared with the large number of AAM features (90), we fit a multivariate normal distribution to the CFD data, although we find that LL is highly correlated whether we fit a single Gaussian or a mixture of Gaussians corresponding to demographic subgroups (e.g., genders) (SI Appendix), and the two models correlate with CFD attractiveness ratings similarly well (SI Appendix). Taking 500 random one-dimensional (1D) projections of the CFD data in the AAM feature space (Fig. 2A shows an example projection), we find that normality cannot be rejected (Anderson–Darling test, significance level
BiA and statistical typicality. (A) Empirical face distribution projected into a random axis is unimodal and bell shaped. (B) BiA: humans perceive more “evenly” blended faces as more attractive (data from ref. 26). (C) Simulated statistical typicality captures a similar trend as seen in data. Five hundred pairs of faces are randomly selected from CFD, each pair’s coordinates are averaged in varying proportions, and LLs for blends of the same proportion are averaged across pairs. Error bars indicate SEM.
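The blending simulation described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the feature matrix is synthetic Gaussian data standing in for the 597 × 90 AAM features of the CFD faces, and all variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 597 x 90 matrix of AAM features (assumption:
# the real CFD features are not reproduced here).
n_faces, n_dims = 597, 90
mixing = rng.normal(size=(n_dims, n_dims)) / np.sqrt(n_dims)
faces = rng.normal(size=(n_faces, n_dims)) @ mixing

# Fit a single multivariate normal density to the "faces".
mu = faces.mean(axis=0)
cov = np.cov(faces, rowvar=False)
cov_inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

def log_likelihood(x):
    """Log density of each row of x under the fitted multivariate normal."""
    d = np.atleast_2d(x) - mu
    maha_sq = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    return -0.5 * (maha_sq + logdet + n_dims * np.log(2 * np.pi))

# Draw 500 random pairs and blend each pair in varying proportions.
n_pairs = 500
idx_a = rng.integers(0, n_faces, n_pairs)
idx_b = rng.integers(0, n_faces, n_pairs)
proportions = np.linspace(0.0, 1.0, 11)

# Mean LL across pairs for each blending proportion.
mean_ll = np.array([
    log_likelihood(w * faces[idx_a] + (1 - w) * faces[idx_b]).mean()
    for w in proportions
])

for w, ll in zip(proportions, mean_ll):
    print(f"proportion {w:.1f}: mean LL {ll:.2f}")
```

On such synthetic data the mean LL is higher for the even (50/50) blend than for the unblended parents, mirroring the inverted-U trend the caption describes; the exact values depend on the assumed covariance and carry no empirical meaning.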
Our simulations (SI Appendix) indicate that the blend of two randomly sampled CFD faces has higher LL (thus, presumably higher attractiveness) than parent face images (two-sample t test,
Experiment: Social Liking Varies Linearly with Statistical Typicality.
If statistical typicality modulates the affective experience of perceptual stimuli, then it should not only be restricted to the perception of attractiveness but also influence other desirable attributes such as trustworthiness (16). Here, we report results from an experiment (Materials and Methods) in which subjects rate face images (SI Appendix, Fig. S2 shows example stimuli) for attractiveness, trustworthiness, dominance, and valence (how “positive” a face appears) (Materials and Methods)—traits thought by social psychologists to be the most fundamental in face-based social trait perception (4). We find that face image ratings, averaged across subjects, of all four traits increase monotonically with LL of the face stimuli under the estimated normal density model (attractiveness: Pearson
Trait rating increases linearly and monotonically against statistical typicality (LL) for all four traits: attractiveness (A), trustworthiness (B), dominance (C), and valence (D). Data binning ensures an equal number of samples in each bin. Linear regression line (using binned data) is superimposed for visualization. The text has correlation coefficients for raw data. Error bars indicate SEM over samples in each bin.
One corollary of the statistical typicality account of attractiveness is that, if the distribution of faces is Gaussian over the underlying face feature space, then LL is a particular parameter-free quadratic function of the underlying face features (SI Appendix). This is consonant with previous work showing that a freely fitted quadratic model of human perception of attractiveness improves over a pure linear model (52, 53), except we make a more refined claim that the quadratic component is precisely LL and has no free parameters.
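For reference, the quadratic form in question is the standard log density of a multivariate normal. The notation below is an assumption for illustration (the symbols x, μ, Σ, and d are not taken from the SI Appendix): x is the face feature vector, μ and Σ are the mean and covariance of the face distribution, and d is the number of feature dimensions.

```latex
\[
\mathrm{LL}(\mathbf{x})
  = \log \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}
      \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})
    \;-\; \tfrac{1}{2}\log\det\boldsymbol{\Sigma}
    \;-\; \tfrac{d}{2}\log(2\pi)
\]
% Because LL is a concave quadratic in x, the even blend of two faces satisfies
\[
\mathrm{LL}\!\left(\tfrac{1}{2}\mathbf{x}_1 + \tfrac{1}{2}\mathbf{x}_2\right)
  \;\ge\; \tfrac{1}{2}\,\mathrm{LL}(\mathbf{x}_1) + \tfrac{1}{2}\,\mathrm{LL}(\mathbf{x}_2).
\]
```

Once μ and Σ are estimated from the face data, nothing in this expression is fitted to the ratings, which is the sense in which the quadratic component is parameter free. The second line is a general property of concave functions: it guarantees that a blend is at least as likely as the average of its parents, not that it exceeds both parents in every case.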
We hardly expect statistical typicality to be the sole contributor to attractiveness. Previous work suggested that human perception of attractiveness, as well as other traits such as trustworthiness and dominance, has both linear and squared dependence on the face feature space (52, 53). Here, we adapt this idea and jointly fit a multiple linear regression (MLR) model (“linear + LL” model) consisting of all of the linear terms (
Model illustration and comparison. (A) We use a two-dimensional simplified illustration to visualize the linear + LL model (in actuality, there are 90 dimensions). Oval: equi-LL contour of an axis-aligned normal density function, with
The astute reader might have noticed that the quadratic component (LL) is shared by all social traits, while the linear component (LTA) is fit to each trait individually. Indeed, using 10-fold cross-validation, we find that the LTA for each trait makes better prediction about the same trait on held-out test data than the LTA for any other trait (paired t test,
Trait-specific LTAs
Simulation: Symmetry and Statistical Typicality.
Revisiting the role of symmetry in attractiveness perception (7, 8), we suggest the possibility that symmetry elevates attractiveness perception at least in part through statistical typicality. Fig. 5A shows that the statistical typicality (LL) of CFD face images is negatively correlated (Pearson
Symmetry and statistical typicality. (A) LL negatively correlates with shape asymmetry. (B and C) Shape symmetrization increases LL of most face images. (A) Data binning ensures an equal number of samples in each bin. Linear regression line (using binned data) is superimposed for visualization. The text has the correlation coefficient for raw data. Error bars indicate SEM over faces in each bin.
UiA.
We hypothesize that UiA arises because attentional mechanisms focus on the task-relevant feature subspace, such that statistical typicality and subjective liking are both assessed within this subspace instead of the full face space. If the data distribution projected into this subspace is bimodal, then statistical typicality and subjective liking of an average between two samples from two different modes should be lower than if the parent faces were from the same mode (Fig. 1B). When we project CFD faces into the 1D subspace that best discriminates race [Caucasian vs. Asian (26, 27), found by regularized linear discriminant analysis] (SI Appendix), we find the face distribution to be indeed bimodal (Fig. 6C). Likewise, projecting CFD faces into the gender-informative 1D subspace also exhibits clear bimodality (Fig. 7B and SI Appendix). It may seem odd that the face distribution is both approximately Gaussian and a mixture distribution. The reason is that the mixture components only appear as distinct components (multimodal) when viewed from a small number of very particular feature dimensions (e.g., important for discriminating race or gender), but still Gaussian in the great majority of dimensions.
Simulation: UiA due to bimodality in the race-informative subspace. (A) Data are adapted from ref. 26: single-race (SR; Asian–Asian, White–White) face blends are rated as more attractive than mixed-race (MR; Asian–White) face blends, when a race categorization task precedes attractiveness rating (two-sample t test,
UiA of bigender blends induced by gender categorization. (A) AAM reconstruction of example stimuli used in ref. 27: blends of varying proportions of male and female parent faces. (B) The empirical distribution of male and female faces (25) projected into the gender-informative subspace is a mixture of two approximately normal distributions. X: mean location of actual experimental stimuli (27) for each percentage of blend. (C) Data: attractiveness rating in experimental condition minus control condition as a function of percentage blend (27). (D) Model-predicted statistical typicality assessed within the gender-discriminating subspace for the same faces as in C (27). (E) Model-predicted statistical typicality assessed in the full (original) space for the same faces as in C (27). Error bars in C–E are SEM over all stimuli used in the experiment for each percentage blend.
Simulation: UiA Reproduced by LL in Task-Relevant Subspace.
Human subjects have been found to rate single-race face blends more attractive than mixed-race face blends when they are required to categorize race before rating attractiveness (Fig. 6A) (26). We simulate the effect of this race categorization task on statistical typicality as follows. First, we randomly sample face images from CFD (25) to create 100 single-race (Asian–Asian, White–White) and 100 mixed-race (Asian–White) blends. Then, we project all face blends into the race-informative 1D subspace. The model-predicted statistical typicality for each face blend is its LL under the Bayesian a posteriori most probable race category (SI Appendix), consistent with a body of work showing that people often use category membership to predict features of, and reason about, members of a category (55–58). We find that model-predicted statistical typicality reproduces the empirically observed UiA effects (Fig. 6B).
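The steps of this simulation can be sketched on synthetic data. Everything below is an illustrative assumption rather than the authors' implementation: two synthetic Gaussian categories stand in for the CFD subgroups, the ridge value of the regularized discriminant is arbitrary, equal category priors are assumed (so the most probable category is simply the one with the higher likelihood), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic categories that differ along one hypothetical feature.
n_per_class, n_dims = 150, 20
shift = np.zeros(n_dims)
shift[0] = 4.0
class_a = rng.normal(size=(n_per_class, n_dims))
class_b = rng.normal(size=(n_per_class, n_dims)) + shift

# Regularized linear discriminant direction (ridge value is an assumption).
mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
within = 0.5 * (np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False))
ridge = 0.1
direction = np.linalg.solve(within + ridge * np.eye(n_dims), mean_b - mean_a)
direction /= np.linalg.norm(direction)

def project(x):
    """Project feature vectors onto the 1D category-informative subspace."""
    return x @ direction

# One 1D Gaussian per category in the projected subspace.
z_a, z_b = project(class_a), project(class_b)
params = [(z_a.mean(), z_a.std(ddof=1)), (z_b.mean(), z_b.std(ddof=1))]

def gaussian_logpdf(z, m, s):
    return -0.5 * ((z - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)

def ll_under_map_category(z):
    """LL under the most probable category, assuming equal priors."""
    lls = np.stack([gaussian_logpdf(z, m, s) for m, s in params])
    return lls.max(axis=0)

def random_blends(x, y, n):
    """Even (50/50) blends of n randomly drawn pairs from x and y."""
    return 0.5 * (x[rng.integers(0, len(x), n)] + y[rng.integers(0, len(y), n)])

# 100 same-category blends and 100 cross-category blends.
single = np.vstack([random_blends(class_a, class_a, 50),
                    random_blends(class_b, class_b, 50)])
mixed = random_blends(class_a, class_b, 100)

ll_single = ll_under_map_category(project(single))
ll_mixed = ll_under_map_category(project(mixed))
print(ll_single.mean(), ll_mixed.mean())
```

In this toy setting the cross-category blends land between the two modes of the projected distribution and receive lower LL than same-category blends, which is the qualitative pattern the text attributes to UiA.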
We can make more refined predictions by smoothly varying the percentage of blending in each pair of faces. We first randomly draw 60 Asian and White face images (with replacement) from the face dataset (25) and then blend them at
Image-Level Comparison: Model vs. Data.
As a more stringent test of our model, we investigate the relationship between individual faces’ attractiveness ratings and model-predicted statistical typicality in the attended, task-relevant [linear discriminant analysis (LDA)] subspace. We reanalyze the stimuli and behavioral data from the gender categorization study (27), in which subjects rated the attractiveness of blends from male and female parent faces (Fig. 7A) in different proportions (
Discussion
In this paper, we proposed a statistically grounded account of human liking of high-dimensional objects, which in the case of faces, manifests itself as positive social evaluation across multiple traits. We showed that human perception of attractiveness and other positive traits of a face image depends on its statistical typicality, defined as its LL relative to an internal representation of the face distribution. This hypothesis is motivated by statistical and information-theoretic arguments that a good or efficient representation should maximize the average LL (or equivalently, minimize the average code lengths or cross-entropy) of observed data, and is related to the efficient coding hypothesis of neural representation (34). While our analysis is inherently correlational in nature, some existing findings can be reinterpreted to imply that statistical typicality has a causal effect on subjective liking. For example, in a neural phenomenon known as “repetition suppression,” repeated presentation of the same stimulus, such as a face image, has been shown to increase predicted likelihood of observing the stimulus (60) and leads to a robust decrease in evoked neural response (60–63). Relatedly, in the psychological “mere exposure” phenomenon, repeated exposure to a novel stimulus, such as a face image, leads subjects to report greater liking (64) and more positive perceived valence associated with the face (65). Together, these findings indicate that an empirically increased frequency of a stimulus is sufficient to induce both lower neural representational cost and greater subjective liking. Whether the increase in liking is causally mediated by the decrease in neural coding cost is an important direction of future research.
Additionally, we demonstrated that categorization-induced UiA effects (26, 27) naturally arise when statistical typicality is dynamically redefined over the task-relevant featural subspace via attentional modulations (29–32, 66). For example, while categorizing gender, attention dynamically enhances the processing of the gender-discriminative features (formally, by restriction to the relevant featural subspace). The statistical distribution of faces is redefined within this subspace (appearing bimodal), thus leading to systematic reassignment of LL to each face. In particular, the faces that straddle the boundary between two categories (e.g., bigender blends) tend to have high LL in the full face space but low LL in the dynamically restricted representation, thus resulting in UiA. We showed that this theory can indeed quantitatively capture the categorization-induced changes in liking on an image-by-image basis. Critical for this theory is the assumption that the brain can dynamically alter its representation of faces in a task-dependent manner. Consistent with this, neural receptive fields for faces are known to be rapidly and dynamically modified by attention and task context (67). Notably, we expect rapid dynamic modulation of stimulus representation to be primarily applicable in the case of featural dimensions that are ecologically relevant (such as those discriminating race and gender). Such dynamic modulation may also be possible for arbitrary featural dimensions but, as logic would suggest and empirical evidence concurs (18, 28), would require extensive additional training. It would be interesting to test in future work whether UiA can also be causally induced by newly learned multimodal distributions (18, 28).
Supporting our suggestion that energy allocation in the brain should be efficient and thus proportional to negative LL of the face stimulus, human functional magnetic resonance imaging (fMRI) studies have shown that energy expenditure across multiple face-responsive areas in the brain, indexed by blood-oxygenation-level-dependent (BOLD) response, is indeed approximately quadratic (negative LL of a multivariate normal distribution is quadratic) (42, 43). While these empirical findings (42, 43, 68) were originally interpreted to implicate the relevant brain regions as encoding typicality or “distinctiveness” of faces, we suggest instead that any efficient face representation must respond in this quadratic manner given the approximate normal distribution of faces; this is the case whether or not there is an explicit encoding or decoding of typicality or distinctiveness in a brain area exhibiting a quadratic response to faces. We also note here the important distinction between statistical typicality and subjective typicality. While statistical typicality, of the kind of quadratic signal found in face areas (42, 43), might well contribute to subjective typicality, we have found (50) that human judgment of face typicality and distinctiveness (related to memorability) both have a strong linear component in the face space, just like the four social traits reported here, and thus cannot be driven only by statistical typicality or the quadratic signal found in face-responsive areas.
While a detailed neurocomputational theory is outside the scope of this paper, we briefly discuss one plausible, although obviously greatly simplified, neural implementation of our computational-level theory (69). Various studies have shown that familiar and unfamiliar faces are represented differently in the brain, both in terms of brain regions (62, 70, 71) and coding scheme (51, 72). In particular, familiar faces appear to involve dedicated feature detectors (72), while unfamiliar faces appear to be encoded via a dimensional scheme (51). Within our framework, one may well ask how the brain encodes a distribution over faces, which is necessary to represent the LL of a new face. One possibility, related to a sampling representation of distributions (73) and the notion of landmark points for manifold learning in machine learning (74), is for the brain to represent the face manifold (and distribution) using a sparsely sampled representation consisting of well-known faces. When a novel face is encountered, the brain could first identify the closest known face (55–58) and then use a dimensional coding scheme (51) to encode the discrepancy between the retrieved prototype and the novel face. This scheme builds on both prototype- and norm-based face representations (75) and is consistent with the predictive coding hypothesis (76, 77). Within this framework, a statistically atypical face incurs high coding cost because it tends to be far from the retrieved exemplar, and thus, the discrepancy will be large and expensive to represent in a dimensional coding scheme (51). In the UiA-inducing categorization setting, attention enhances the neural response to task-relevant features relative to task-irrelevant features. This has the effect of increasing the overall coding cost of a category-straddling stimulus, since the featural discrepancy (and thus, coding cost) between the stimulus and the closest retrieved prototype is highest in the task-relevant dimensions, which are enhanced by attentional modulation. Importantly, this example also illustrates that attentional modulation and statistical typicality alone are sufficient to explain UiA at the neural level, regardless of whether the relationship between statistical typicality and liking is due to efficient coding or not.
The above is but one plausible neural coding scheme, but it illustrates the important distinction between dynamic coding cost, which we hypothesize drives liking, and long-term structural cost (e.g., forming a new feature detector). Well-known faces, which are statistically more typical, are given greater structural representational resources in order to minimize their dynamic coding cost; conversely, statistically atypical faces incur high dynamic coding cost as a consequence of little dedicated structural representation (no dedicated feature detectors). While atypical faces in a stable, well-known environment tend to cancel each other out in terms of driving learning since they are inevitably and symmetrically distributed in the fringes of an approximately normal distribution, a sudden influx of atypical faces (such as that induced by immigration) could drive systematic representational plasticity so as to reduce average long-term coding cost.
Statistical atypicality (negative LL) is related to the theoretical notion of “unexpected uncertainty,” which we earlier proposed to reflect a confluence of unexpected deviations between expectation and observations and signal the need for representational learning (44). We proposed that unexpected uncertainty, signaled by the neuromodulator norepinephrine, acts in concert with expected uncertainty, signaled by the neuromodulator acetylcholine, to assist the neocortex in learning and maintaining appropriate representations of environmental statistics as well as selecting the appropriate behavioral responses (44). This theory has received considerable empirical support in the intervening years (78–83). In the original formulation of expected and unexpected uncertainty (44), we had in mind rather simple kinds of statistical inference and learning such as those related to associative learning; the current work suggests that similar mechanisms may also apply to highly complex stimuli such as faces (68). Among other implications, this leads to the interesting hypothesis that atypical faces might lead to an elevation of norepinephrine release. Consistent with this, there is evidence that an early and rapid pupil constriction predicts high attractiveness rating for faces, and experimental manipulation of pupil size causally affects perceived facial attractiveness in the expected direction (84). Combined with the findings that phasic increase in pupil size is associated with elevated norepinephrine release in the brain (83) and that phasic pupil dilation enhances learning (both correlationally and causally) in a manner consistent with unexpected uncertainty (79, 85), this suggests that facial atypicality may indeed modulate norepinephrine-mediated control over attractiveness perception and representational learning in a computationally principled manner.
It may puzzle some readers that we assert greater learning about atypical faces despite their associated negative affect, instead of less attention or less approach behavior. While negative affect can lead to physical avoidance of a face (although not always) (86), it is but one contributing factor to approach/avoidance behavior. In a classical study, it has actually been found that there is a strong negative correlation (
We do not claim statistical typicality to be the sole determinant of attractiveness or other social trait evaluations. For example, our model does not explain certain aspects of facial preferences (e.g., sexual dimorphism in face perception) (52, 90) or systematic differences in preference judgment for faces vs. other categories of objects (89). Even in our own analysis [consistent with prior findings (50, 54)], we find a separate linear component, apparently unique to each social trait. Relatedly, we recently found that the liking “function” over the stimulus space may be modified by positive/negative encounters with specific exemplars, which are then extrapolated to the rest of the space depending on the clustering (categorical) structure of the data (91). Nevertheless, statistical typicality already provides a parsimonious and normative account of several previously proposed causal factors of attractiveness. For example, we found that popular methods for symmetrizing face images also tend to increase statistical typicality. There is also recent work showing that higher coding cost at the level of low-level image statistics (e.g., related to small image patches) decreases attractiveness (20, 92), which is expected since low-level statistical typicality (in an efficient coding scheme) is a subcomponent of general statistical typicality. From this perspective, it also makes sense that blemishes (9) should decrease attractiveness, as they are statistically irregular and thus expensive to encode.
Previously, it has been suggested that “averageness” contributes to facial attractiveness (93, 94), but the effect size was found to be quite modest (18, 20, 52). Because LL of a multivariate normal distribution peaks at the mean face and falls off monotonically along any particular dimension, it can also be thought of as a measure of averageness; our results indicate a larger effect of averageness on attractiveness perception. In previous studies, averageness was quantified as negative Euclidean distance (20) or negative standardized Euclidean distance (normalized by the SD in each dimension) (52). In contrast, LL under a multivariate normal density is, up to an additive constant and a scaling factor, the negative squared Mahalanobis distance, which takes into account not only the SD of each feature but also correlations between features (as are present between shape and texture features). A larger quadratic effect might also be present in our study because the face stimuli were designed to vary systematically in LL (Materials and Methods and SI Appendix), although Fig. 3 makes it clear that it is not only the most extreme stimuli that are driving the effects but the full range of faces. In contrast to previous studies (16, 18) that constrained the analysis to a single dimension, our model allows the possibility of using stimuli that vary along all (90) face feature dimensions and thus of arriving at more accurate and general conclusions about human face processing. For example, Euclidean distance, standardized Euclidean distance, and Mahalanobis distance (−LL) are all confounded on a single dimension but differ from one another nontrivially when a larger number of dimensions is considered. A greater differentiation between statistical typicality and averageness arises in the case of multimodal distributions, since LL and averageness are no longer well correlated. In this vein, our UiA analyses suggest that statistical typicality is a more precise and general concept than averageness for explaining facial liking.
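The difference between the three distance measures can be seen in a two-feature toy example. The covariance matrix and the two test points below are assumptions chosen for illustration (two features with SDs of 2 and 1 and a correlation of 0.8); they are not estimated from face data, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical two-feature distribution: SDs of 2 and 1, correlation 0.8.
mu = np.zeros(2)
cov = np.array([[4.0, 1.6],
                [1.6, 1.0]])
cov_inv = np.linalg.inv(cov)

def euclidean_sq(x):
    """Squared Euclidean distance from the mean."""
    return np.sum((x - mu) ** 2, axis=-1)

def standardized_sq(x):
    """Squared Euclidean distance after dividing each feature by its SD."""
    return np.sum((x - mu) ** 2 / np.diag(cov), axis=-1)

def mahalanobis_sq(x):
    """Squared Mahalanobis distance, which also accounts for correlations."""
    d = x - mu
    return np.einsum("...i,ij,...j->...", d, cov_inv, d)

def log_likelihood(x):
    """Multivariate normal log density: a constant minus half mahalanobis_sq."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (mahalanobis_sq(x) + logdet + len(mu) * np.log(2 * np.pi))

along = np.array([2.0, 1.0])     # deviates in the direction of the correlation
against = np.array([2.0, -1.0])  # deviates against the correlation

for name, fn in [("euclidean", euclidean_sq),
                 ("standardized", standardized_sq),
                 ("mahalanobis", mahalanobis_sq),
                 ("LL", log_likelihood)]:
    print(name, fn(along), fn(against))
```

The two points are indistinguishable by Euclidean and standardized Euclidean distance, but the point that violates the feature correlation has a much larger Mahalanobis distance and therefore a lower LL. With a single feature there is no correlation to violate, which is why the three measures are confounded in one dimension.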
Another prominent notion related to statistical typicality is prototypicality. Prototypicality is a measure of “closeness” of a stimulus to the “prototype” (11, 21), implying there are clear, fixed modes in the stimulus distribution. Our account differs in two ways: firstly, it does not assume the prototypes to be fixed and predefined but rather, task dependent (e.g., the average male face can be considered a prototype in a gender categorization task but not in a race categorization task); secondly, statistical typicality is well defined even for distributions that have no distinct modes, such as for the fairly flat (close to uniform) distribution of age, in which case LL is approximately a constant.
Our statistical typicality account is also related to a rather different explanation of BiA and UiA, known as the fluency account (11, 26, 27, 95). The fluency account hypothesizes that stimuli, such as category-specific prototypes, may be processed more “fluently” than other stimuli, and human liking of faces and other objects decreases in response to “disfluency” in processing. At a broad level, the fluency account shares with our statistical typicality hypothesis the concept of a human preference for efficiency. However, fluency is not mathematically defined but empirically measured as categorization response time (26, 27), which is undefined for the BiA setting where there is no categorization task (11). There is also a subtle but important distinction between statistical typicality and fluency (11, 26, 27, 95): the latter assumes that the disfluency (and thus, disliking) of intercategory faces is specifically related to the difficulty of categorizing faces that are close to the decision boundary. So far, all experimental results related to UiA have been obtained using categories that correspond to statistical modes in the data, as is the case with race (26) and gender (27). In such cases, the intercategory stimuli are both close to the decision boundary and have low LL. However, one can disentangle the two factors by using a categorization task in which the categories do not correspond to natural statistical clusters: for example, by asking subjects to decide whether a face is above or below a certain age threshold (say 40). As the statistical distribution along the age-relevant dimension is not obviously bimodal, statistical typicality would not predict a UiA effect in this case, while the fluency account still would. More broadly, processing fluency can be expected to be influenced not only by coding efficiency but also by, for example, computational complexity, motor delay or effort, attention, or motivational factors unrelated to coding efficiency. 
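The contrast between categories that form statistical clusters and categories that do not can be illustrated with a one-dimensional sketch. It assumes that the task-relevant feature follows either a two-component Gaussian mixture (standing in for clustered categories such as race or gender) or a single broad Gaussian (standing in for a dimension without natural clusters); all means, SDs, weights, and probe values are hypothetical.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

def mixture_log_likelihood(x, means, sds, weights):
    """LL under a one-dimensional Gaussian mixture, computed with log-sum-exp."""
    x = np.asarray(x, dtype=float)
    comps = np.stack([np.log(w) + normal_logpdf(x, m, s)
                      for m, s, w in zip(means, sds, weights)])
    top = comps.max(axis=0)
    return top + np.log(np.exp(comps - top).sum(axis=0))

# Hypothetical positions on the task-relevant axis:
# category A prototype, intercategory blend, category B prototype.
probe = np.array([-2.0, 0.0, 2.0])

# Clustered categories: two well-separated modes.
bimodal_ll = mixture_log_likelihood(probe, means=[-2.0, 2.0],
                                    sds=[0.7, 0.7], weights=[0.5, 0.5])

# No natural clusters: one broad mode centered on the category boundary.
unimodal_ll = mixture_log_likelihood(probe, means=[0.0], sds=[2.0], weights=[1.0])

print("bimodal :", np.round(bimodal_ll, 3))
print("unimodal:", np.round(unimodal_ll, 3))
```

Under the bimodal assumption the intercategory blend has a lower LL than either prototype, which is the pattern associated with UiA. Under the unimodal assumption the same blend has the highest LL of the three points, so statistical typicality predicts no penalty even though the blend sits on the decision boundary.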
Other qualitative concepts related to fluency and statistical typicality are “simplicity” (96), in the sense that processing can become more efficient after a simple explanation/representation is learned, and effort-based decision making (97, 98), as processing cost at the biophysical level may contribute to cognitively perceived effort. More broadly, future empirical and theoretical work is needed to clarify the formal relationship among computational concepts such as statistical typicality, coding efficiency (34, 39), and processing efficiency, and more qualitative psychological concepts such as fluency (95), effort (97, 98), and simplicity (96).
In addition to providing a statistically grounded explanation of contextual dependence of human attractiveness judgment, our work also provides some general insight as to how high-dimensional data can be analyzed and stored efficiently in an intelligent system. All elements of the model presented here can be easily generalized to nonface stimuli, such as dogs, birds, butterflies, fish, automobiles, watches, and synthetic dot patterns (11, 14, 15). Minimizing the average coding cost of statistically distributed data is a desirable goal for any efficient representational system, whether natural or artificial. Another important computational insight is that a system can overcome limitations in its representational and computational capacity by dynamically shifting its featural representation to focus on task-relevant dimensions according to the behavioral context. As such, our work sheds light on one possible functional role played by attentional selection in the brain: it is one way to dynamically construct subspaces that emphasize feature dimensions that are most relevant or salient for performing the task at hand (29–32, 66). A productive direction of future research would be to see whether our hypothesized role of attention can help to shed light on the large and confusing attention literature.
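The idea of assessing typicality within a task-relevant subspace can be sketched as follows. This is a simplified illustration that assumes attention selects a subset of feature axes and that the face distribution is multivariate normal, so that the LL over the attended features is the corresponding Gaussian marginal; the construction used in our analyses may differ (SI Appendix). The three-feature covariance, the two example faces, and the function names are hypothetical.

```python
import numpy as np

def gaussian_ll(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + len(mean) * np.log(2.0 * np.pi))

def subspace_ll(x, mean, cov, attended):
    """LL computed only over the attended feature indices (Gaussian marginal)."""
    idx = np.asarray(attended)
    return gaussian_ll(x[idx], mean[idx], cov[np.ix_(idx, idx)])

# Hypothetical three-feature space; features 0 and 1 are mildly correlated.
mean = np.zeros(3)
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

face_a = np.array([0.1, 0.0, 3.0])  # unusual only on feature 2
face_b = np.array([2.5, 0.0, 0.1])  # unusual only on feature 0

# Task 1 attends to features 0 and 1; task 2 attends to feature 2.
task1 = {"a": subspace_ll(face_a, mean, cov, [0, 1]),
         "b": subspace_ll(face_b, mean, cov, [0, 1])}
task2 = {"a": subspace_ll(face_a, mean, cov, [2]),
         "b": subspace_ll(face_b, mean, cov, [2])}

print("task 1 (features 0, 1):", task1)
print("task 2 (feature 2):   ", task2)
```

In this toy example the ranking of the two faces by LL reverses when the attended features change, which is the kind of context dependence the subspace account relies on. LL values should be compared only within a task, since the number of attended dimensions changes the scale.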
Our line of reasoning sheds light on a broader computational understanding of how the brain dynamically encodes and processes complex, high-dimensional data. Faces provide excellent stimuli for investigating such processes, because they are informationally rich, ecologically important, and amenable to neurally plausible, quantitative modeling via methods such as AAM. We therefore used faces to implement and test concrete ideas about information representation and its contextual modulation in this work. The attractiveness literature also provided a convenient empirical test of our theory. However, we expect that the dynamic representational framework we hypothesize here also affects other cognitive processes, such as working memory, learning, decision making, and problem solving, in the sense that all these cognitive processes can benefit from attentional enhancement of task-relevant feature dimensions over irrelevant dimensions. For example, learning to memorize a set of items should be easier if one’s attention is focused on the features that make these items easier to organize. Another example is that two stimuli that differ along an attended featural dimension should be easier to discriminate and later recall, and those that differ along an unattended dimension (orthogonal to the attended dimension or dimensions) should be harder to discriminate and later recall. In general, the benefits of cognitive expediency and behavioral accuracy derived from focusing on task-relevant features are broad and multifaceted, pointing to a promising direction for future research.
Materials and Methods
Formal Model.
We assume humans have an internal d-dimensional representation of faces X (21, 52, 99), in which each face is represented by a vector of d feature values.
AAM.
We model faces using the AAM, due to its neural relevance (51), previous success in modeling human face space and predicting human responses (20, 50, 100), ability to output feature values for novel faces and generate realistic-looking synthetic faces for any feature setting, and high transparency and customizability in contrast to commercial software. We train the AAM (SI Appendix) using CFD, a public dataset of 597 face images (25). We obtain for each face 30 shape features related to the geometric layout of invariant elements of faces (e.g., eyes, eyebrows, nose, mouth, contour of the face) and 60 texture features related to pixel variations within and among these elements. (Results are insensitive to the exact number of features; see SI Appendix.)
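To make the dimensions concrete, the following sketch fits a multivariate normal distribution to a feature matrix with the same shape as the AAM output (597 faces by 30 + 60 features) and scores each face by LL. The feature values here are random placeholders, not real AAM coefficients, and the Gaussian fit and function name are assumptions made for illustration; the actual training and fitting procedure is described in SI Appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FACES, N_SHAPE, N_TEXTURE = 597, 30, 60

# Placeholder features: random numbers standing in for real AAM coefficients.
features = rng.standard_normal((N_FACES, N_SHAPE + N_TEXTURE))

# Maximum-likelihood-style Gaussian fit: sample mean and sample covariance.
mean = features.mean(axis=0)
cov = np.cov(features, rowvar=False)

def log_likelihood(X, mean, cov):
    """Row-wise multivariate normal log density, via a Cholesky factor of cov."""
    d = mean.shape[0]
    diff = X - mean
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = np.linalg.solve(chol, diff.T)       # whitened deviations, shape (d, n)
    quad = (z ** 2).sum(axis=0)             # squared Mahalanobis distances
    logdet = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))

ll = log_likelihood(features, mean, cov)
mean_face_ll = log_likelihood(mean[None, :], mean, cov)[0]

print("LL range over faces:", ll.min(), ll.max())
print("LL of the mean face:", mean_face_ll)
```

Because the number of faces (597) exceeds the number of features (90), the sample covariance is generically full rank and the Cholesky factorization succeeds. With fewer faces than features, some form of regularization or dimensionality reduction would be needed.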
Experiment.
Participants.
Forty-one (mean age 20.6 y, 24 female) University of California (UC) San Diego undergraduate students participated in the study in exchange for course credit. Participants gave informed consent before taking part in the study. Approval for the study was given by the UC San Diego Human Research Protection Program.
Stimuli.
AAM was used to generate 2,520 synthetic face images (SI Appendix, Fig. S2 shows example stimuli).
Procedure.
Participants rated faces “intuitively” on a Likert scale (one to five) for how “attractive,” “trustworthy,” “dominant,” and positive (valence) faces appeared. Each image received on average 2.35 ratings per trait.
Data Availability.
Anonymized data from the experiment where participants rated attractiveness, trustworthiness, dominance, and valence are available in the Open Science Framework (101).
Acknowledgments
We thank Jamin Halberstadt for sharing data from ref. 26. This project was partially funded by University of California San Diego Academic Senate grants (to P.W. and A.J.Y.).
Footnotes
- ¹To whom correspondence may be addressed. Email: ajyu@ucsd.edu.
Author contributions: C.K.R., S.G., P.W., and A.J.Y. designed research; C.K.R., S.G., and A.J.Y. performed research; C.K.R. and A.J.Y. contributed new reagents/analytic tools; C.K.R. and A.J.Y. analyzed data; and C.K.R., S.G., P.W., and A.J.Y. wrote the paper.
The authors declare no competing interest.
This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Brain Produces Mind by Modeling,” held May 1–3, 2019, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. NAS colloquia began in 1991 and have been published in PNAS since 1995. From February 2001 through May 2019, colloquia were supported by a generous gift from The Dame Jillian and Dr. Arthur M. Sackler Foundation for the Arts, Sciences, & Humanities, in memory of Dame Sackler’s husband, Arthur M. Sackler. The complete program and video recordings of most presentations are available on the NAS website at http://www.nasonline.org/brain-produces-mind-by.
This article is a PNAS Direct Submission.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1912343117/-/DCSupplemental.
Published under the PNAS license.
References
- I. Gauthier, M. Tarr, D. Bub
- F. Galton
- D. Symons
- C. Sofer et al.
- R. Dotsch, A. Todorov, R. R. Hassin
- I. J. Holzleitner et al.
- H. Burkhardt, B. Neumann
- G. J. Edwards, T. F. Cootes, C. J. Taylor
- G. Tzimiropoulos, M. Pantic
- H. E. Owen, J. Halberstadt, E. W. Carr, P. Winkielman
- T. Vogel, E. W. Carr, T. Davis, P. Winkielman
- L. K. Saul, Y. Weiss, L. Bottou
- A. J. Yu, P. Dayan
- C. Gratton, K. K. Sreenivasan, M. A. Silver, M. D’Esposito
- N. Davidenko, C. Q. Vu, N. H. Heller, J. M. Collins
- W. A. Rosenblith
- H. B. Barlow
- L. Luo
- T. M. Cover, J. A. Thomas
- Z. Wang, X.-X. Wei, A. A. Stocker, D. D. Lee
- N. Qian, J. Zhang
- G. Mattavelli, T. J. Andrews, A. U. R. Asghar, J. R. Towler, A. W. Young
- W. Cohen, A. Moore
- R. Caruana, A. Niculescu-Mizil
- L. T. Trujillo, J. M. Jankowitsch, J. H. Langlois
- O. K. Kaminska et al.
- J. Guan, C. Ryali, A. J. Yu
- A. Todorov, N. Oosterhof
- A. Todorov, R. Dotsch, D. H. Wigboldus, C. P. Said
- S. Y. Chen, B. H. Ross, G. L. Murphy
- A. L. Jones, B. Jaeger
- R. Elliott, R. J. Dolan
- W. Koustaal et al.
- E. W. Carr, T. F. Brady, P. Winkielman
- S. Kastner
- A. C. Nobre
- A. J. Yu
- K. Grill-Spector, K. S. Weiner, J. Gomez, A. Stigliani, V. S. Natu
- D. Marr, Vision
- S. M. Landi, W. A. Freiwald
- Y. Weiss, B. Schölkopf, J. C. Platt
- J. Silva, J. Marques, J. Lemos
- D. A. Ross, M. Deroche, T. J. Palmeri
- P. Zmarz, G. B. Keller
- M. R. Nassar, R. C. Wilson, B. Heasly, J. I. Gold
- F. Meyniel, S. Dehaene
- H.-I. Liao, M. Kashino, S. Shimojo
- J. Park, E. Shimojo, S. Shimojo
- T. Vogel, M. N. Ingendahl, P. Winkielman
- J. P. Renoult, J. Bovet, M. Raymond
- A. J. O’Toole, T. Price, T. Vetter, J. C. Bartlett, V. Blanz
- N. N. Oosterhof, A. Todorov
- S. Denison, M. Mack, Y. Xu, B. C. Armstrong
- C. K. Ryali, X. Wang, A. J. Yu
- C. K. Ryali, S. Goffin, P. Winkielman, A. J. Yu
Article Classifications
- Social Sciences
- Psychological and Cognitive Sciences
- Biological Sciences
- Neuroscience