A wealth of literature has posited that faces are processed differently from other categories of visual stimuli. Specifically, deriving a rapid and accurate representation of an individual face relies on “holistic processing” (HP), the tendency to process all face parts together, as a whole or gestalt (for different interpretations of HP, see Behrmann, Richler, Avidan, & Kimchi, 2015; Farah, Wilson, Drain, & Tanaka, 1998; Rossion & Boremanse, 2008; Tanaka & Farah, 1993).

Several hypotheses have been proposed to account for the mechanisms underlying holistic face perception. One is the template hypothesis or, more broadly, the hypothesis that an internal representation exists for faces (Rossion, 2013; Tanaka & Farah, 1993). According to this view, faces are represented as a whole by virtue of this internal face representation. A second proposed mechanism is the automated attention hypothesis (Richler, Palmeri, & Gauthier, 2012; Richler, Wong, & Gauthier, 2011). This hypothesis posits that perceiving faces as a whole results from an attentional strategy that has become automated with experience. That is, although the face parts are represented independently, human observers are compelled to attend to all parts at once because of their natural expertise with face perception. Additional mechanisms have been proposed (for reviews, see Piepers & Robbins, 2012; Richler et al., 2012), but the central controversy common to these approaches concerns whether information about the face parts is available when a face is perceived, or whether faces are fundamentally represented as a whole, rendering the parts inaccessible.

Several paradigms have been proposed to investigate these questions, including the widely used composite effect (Hole, 1994; Young, Hellawell, & Hay, 1987) and the face inversion effect (FIE; Yin, 1969; see also the part–whole paradigm; Tanaka & Farah, 1993). In the composite paradigm, top and bottom halves taken from different faces are joined, separated by a small horizontal gap, creating the illusion of a new face. Typically, participants are unable to attend selectively to the top half of the face while ignoring the bottom half (Ramon, Busigny, & Rossion, 2010; Rossion, 2013), and this dependency between the face parts is considered a hallmark of HP. The second commonly used paradigm for testing HP is the FIE: inverting a face disrupts its processing disproportionately compared with inverting other objects (Yin, 1969). Although the FIE is a dramatic perceptual effect, whether it can serve as an index of disrupted HP is disputed (Richler & Gauthier, 2014; Rossion, 2013). Some researchers have argued that the deterioration in behavioral performance for inverted faces indicates that they are processed in a qualitatively different fashion from upright faces. According to this view, holistic face processing depends on an upright representation, which is consistent with our lifelong daily experience or may even be innate (Reid et al., 2017), whereas in the inverted condition individual face parts are processed independently; this supports the face internal representation hypothesis (Farah, Tanaka, & Drain, 1995; Rossion, 2013; Tanaka & Farah, 1993). Others, however, claim that face parts are encoded and represented independently and that HP is the consequence of an attentional strategy in which all face parts are attended as a whole (Richler & Gauthier, 2014). On this account, HP emerges for inverted faces only after a relatively long presentation duration, but this reflects inefficient processing rather than a different representation (Richler, Mack, Palmeri, & Gauthier, 2011). This finding supports the automated attention hypothesis (Richler & Gauthier, 2014; Richler, Mack, et al., 2011; Sekuler, Gaspar, Gold, & Bennett, 2004).

Two versions of the composite paradigm are commonly used to assess HP. Each employs a different operational definition of this construct, and the inconsistent findings obtained with the two approaches have generated some controversy (Richler & Gauthier, 2014; Richler, Mack, et al., 2011; Rossion, 2013). In the partial (or standard) design, two faces are presented sequentially in each trial. The bottom halves of the two faces always differ, whereas the top halves can be either the same or different. In addition, the face halves can be spatially aligned or misaligned (e.g., the top half can be shifted sideways so that its center lies above the left side of the bottom half). When the halves are aligned and the two top halves are identical, participants have difficulty recognizing the top halves as the same because of interference from the differing bottom halves. This interference is substantially reduced when the halves are misaligned, and the difference in performance between aligned and misaligned trials is used to assess HP at the single-subject level. Because the holistic measure in the partial design relies only on the “same” condition, and the bottom halves always differ, the design has several potential limitations (Richler & Gauthier, 2013). One is that “same” trials are always incongruent and “different” trials are always congruent, which may produce a response bias. Furthermore, because often only “same” trials are analyzed, potential response biases cannot be separated from sensitivity.

The second version of the paradigm is termed the complete or full design (Farah et al., 1998; Richler & Gauthier, 2014). This design includes all four combinations of stimulus pairs: in each trial two faces are presented, and both the top and bottom halves can be either the same or different. The combinations are classified by the similarity of the top halves (same/different) and by the congruency between the top and bottom halves (congruent: the bottom halves are the same when the top halves are the same and different when the top halves are different; incongruent: otherwise; see Fig. 1). HP in this design is typically measured by the interaction between congruency and alignment: the congruency effect is evident in aligned but not in misaligned trials. In contrast to the measure used in the partial design, this approach yields a group-level, rather than a subject-level, measure of HP. Overall, the complete design appears to have the best construct validity, and many consider it the gold standard for measuring HP (Richler, Cheung, & Gauthier, 2011; Richler & Gauthier, 2014). Specifically, the complete design addresses potential confounds inherent in the partial design, such as response bias, and has been shown to yield nearly three times the effect size of the partial design. Moreover, HP as measured by the partial design can be modulated by, for example, participants’ beliefs about the proportion of trials belonging to the “same” condition. This result contradicts the definition of HP as obligatory, regardless of participants’ criterion or beliefs (Richler, Cheung, & Gauthier, 2011; Richler & Gauthier, 2014). Finally, the effect size of the complete composite design was found to be larger than that of the part–whole paradigm (Wang, Li, Fang, Tian, & Liu, 2012).
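The trial classification used in the complete design can be made concrete with a small sketch. The class and method names below are hypothetical, and the face labels follow the A/B (top) and C/D (bottom) lettering of Fig. 1; congruency is coded as whether the bottom halves call for the same response as the top halves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeTrial:
    """Identities of the face halves in one target-probe pair (illustrative only)."""
    target_top: str
    target_bottom: str
    probe_top: str
    probe_bottom: str

    @property
    def top_same(self) -> bool:
        # The task-relevant judgment: are the two top halves identical?
        return self.target_top == self.probe_top

    @property
    def bottom_same(self) -> bool:
        return self.target_bottom == self.probe_bottom

    @property
    def congruent(self) -> bool:
        # Congruent: the to-be-ignored bottom halves agree with the
        # attended top halves (both same or both different).
        return self.top_same == self.bottom_same

    def condition(self) -> str:
        top = "same" if self.top_same else "different"
        cong = "congruent" if self.congruent else "incongruent"
        return f"{top}/{cong}"


# The four cells of the complete design.
trials = [
    CompositeTrial("A", "C", "A", "C"),  # same top, same bottom
    CompositeTrial("A", "C", "A", "D"),  # same top, different bottom
    CompositeTrial("A", "C", "B", "D"),  # different top, different bottom
    CompositeTrial("A", "C", "B", "C"),  # different top, same bottom
]
for t in trials:
    print(t.condition())
```

In the partial design the bottom halves always differ, so only the second and third pairs occur; “same” then always coincides with incongruent and “different” with congruent, which is the response-bias confound described above.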

Fig. 1

Experimental design. In the four possible stimulus combinations, the top halves of the faces (labeled A or B) could be either the same or different, and could be either congruent or incongruent with the bottom halves (labeled C or D). In addition (not shown in this figure), the faces could be aligned or misaligned and could be presented at seven orientations (0, 30, 60, 90, 120, 150, and 180 deg)

The two versions of the composite task have previously been used in conjunction with the FIE to assess whether the latter effect also reflects HP. Rossion and Boremanse (2008) used the partial design to test HP for faces presented at seven angles between upright and inverted (0 to 180 deg). They found that HP declined sharply when faces were rotated from 60 to 90 deg, that is, when faces were rotated away from their usual orientation. According to these results, HP is evident only when the stimulus orientation matches an internal representation of the upright face that arises from common human visual experience. Conversely, studies of mental rotation of faces have revealed a linear relation between face rotation and recognition performance (Bruyer, Galvez, & Prairial, 1993; Collishaw & Hole, 2002; Valentine & Bruce, 1988). These seemingly contradictory findings can be reconciled on the basis of recent studies showing that the correlations between general face recognition ability and performance in tasks related to HP are moderate at best (inversion effect) and nonexistent at worst (partial composite task; Rezlescu, Susilo, Wilmer, & Caramazza, 2017). Along similar lines, Rossion (2013) suggested that efficient HP depends on an upright template that cannot be effectively rotated.

In a different study, Richler, Mack, et al. (2011) used upright and inverted faces (0 and 180 deg) and tested HP with the complete composite design, which these authors consider better suited to capturing HP. They found that, although inverted faces were not perceived holistically at a short exposure duration (50 ms), they were perceived holistically when the presentation duration was sufficiently long (>183 ms). On the basis of this finding, the authors argued against using the FIE as a measure of HP. Furthermore, the dependence of HP on exposure duration was taken to support the automated attention hypothesis: if HP can develop over time for inverted faces, the process is not orientation-specific, which in turn would rule out the internal representation hypothesis. The contradictory conclusions derived from the two versions of the composite paradigm might be reconciled by a more inclusive method that combines all of these conditions within a single experiment.

The purpose of the present work was to test the two opposing hypotheses described above (face internal representation vs. automated attention) by directly evaluating the effect of face inversion on HP using the well-established complete composite design. Specifically, we tested HP for faces presented at different viewing angles using the complete design of the composite paradigm, while providing a relatively long presentation duration (up to 2,500 ms) for the face stimuli. This allowed us to examine whether face rotation yields a nonlinear drop in HP and whether HP is orientation-specific, and thereby to further our understanding of the mechanisms underlying face perception and holistic perception.

Method

Participants

Sixty-seven individuals (54 females, 13 males; mean age ± SD = 23.11 ± 1.47 years) with normal or corrected-to-normal vision participated in the experiment.¹ The data from three additional participants were discarded; the exclusion criterion was an accuracy rate at least two SDs below the mean.
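The accuracy-based exclusion rule can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: the sample SD is used, and the mean and SD are computed over the full sample, including the participants being evaluated. The function name and the accuracy values are hypothetical.

```python
from statistics import mean, stdev


def low_accuracy_outliers(accuracy: dict, n_sd: float = 2.0) -> list:
    """Return IDs whose accuracy is at least n_sd SDs below the group mean."""
    values = list(accuracy.values())
    cutoff = mean(values) - n_sd * stdev(values)  # sample SD (assumed)
    return [pid for pid, acc in accuracy.items() if acc <= cutoff]


# Hypothetical accuracy rates (proportion correct) for ten participants.
rates = {f"p{i:02d}": 0.80 for i in range(1, 10)}
rates["p10"] = 0.40
print(low_accuracy_outliers(rates))  # ['p10']
```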

Stimuli

The stimuli consisted of grayscale line drawings of top and bottom face halves in frontal view with neutral expression (84 images of each half, 168 in total; see Fig. 1), obtained from the Face Database of the Max Planck Institute for Biological Cybernetics (Max Planck Institute, Tübingen, Germany; Troje & Bülthoff, 1996). The top and bottom halves were combined, separated by a five-pixel gap, to form 512 different composite faces; each composite face appeared a maximum of three times. Misaligned faces were created by shifting the top half 50 pixels to the left. The images were normalized for contrast, cropped into a 190 × 130 oval window, and presented on a black background. Twenty-five stimuli were presented for each combination of angle (0, 30, 60, 90, 120, 150, and 180 deg), alignment (aligned/misaligned), congruency (congruent/incongruent), and correct response (“same”/“different”), yielding a total of 1,400 stimuli.²
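The total of 1,400 stimuli follows from fully crossing the four factors with 25 stimuli per cell (7 × 2 × 2 × 2 = 56 cells; 56 × 25 = 1,400). The sketch below simply enumerates the cells; the factor labels follow the text, and the variable names are our own.

```python
from itertools import product

ANGLES = (0, 30, 60, 90, 120, 150, 180)  # deg
ALIGNMENT = ("aligned", "misaligned")
CONGRUENCY = ("congruent", "incongruent")
RESPONSE = ("same", "different")
TRIALS_PER_CELL = 25

# Every combination of angle, alignment, congruency, and correct response.
cells = list(product(ANGLES, ALIGNMENT, CONGRUENCY, RESPONSE))
print(len(cells))                    # 56
print(len(cells) * TRIALS_PER_CELL)  # 1400
```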

Procedure

Trial structure

Each trial consisted of a fixation cross presented for 500 ms, followed by the target face for 500 ms, a 500-ms scrambled-image mask, and finally a probe face, which remained on screen until response or for a maximum of 2,500 ms. In each trial, the target and probe composite faces were presented at the same angle and with the same alignment. Participants were instructed to decide whether the top halves of the target and probe faces were the same or different while ignoring the bottom halves. See Fig. 2 for example trials. The experiment was built using OpenSesame 2.9 (Mathôt, Schreij, & Theeuwes, 2012).

Fig. 2

Experimental trials. The upper row depicts an example of the aligned congruent condition at the 30-deg orientation, and the bottom row depicts the misaligned incongruent condition at the upright (0-deg) orientation. The correct response is “different” for the trial in the upper row and “same” for the trial in the bottom row. In all trials, the target and probe faces were presented with the same alignment and orientation. Alignment was blocked, and all other conditions were presented in random order

Practice run

Participants completed a practice run of ten trials. A prerequisite for participation was achieving at least 70% accuracy in the practice run, and participants had three attempts to reach this threshold. Participants who failed the practice runs (n = 2) were excluded from further analyses.

Experimental session

The main experimental session consisted of separate runs of aligned and misaligned composite faces; all other conditions were randomized within runs. This design was employed to avoid context-dependent effects, for example, the induction of HP in misaligned trials by preceding aligned trials (see Richler, Bukach, & Gauthier, 2009). The order of the runs was counterbalanced across participants.
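The blocked structure can be sketched as a trial-list generator: alignment is fixed within each run, the remaining factors are fully crossed and shuffled within the run, and run order alternates across participants. The alternation rule by participant index, the function name, and the seed parameter are assumptions for illustration; the text states only that run order was counterbalanced, and the experiment itself was built in OpenSesame.

```python
import random
from itertools import product
from typing import Optional

ANGLES = (0, 30, 60, 90, 120, 150, 180)  # deg
CONGRUENCY = ("congruent", "incongruent")
RESPONSE = ("same", "different")


def build_session(participant_index: int,
                  trials_per_cell: int = 25,
                  seed: Optional[int] = None) -> list:
    """Return two runs (lists of trial dicts), one per alignment condition."""
    rng = random.Random(seed)
    # Hypothetical counterbalancing rule: even-indexed participants start
    # with the aligned run, odd-indexed participants with the misaligned run.
    if participant_index % 2 == 0:
        order = ("aligned", "misaligned")
    else:
        order = ("misaligned", "aligned")
    session = []
    for alignment in order:
        # Alignment is fixed within a run; every other factor is fully crossed.
        run = [
            {"alignment": alignment, "angle": a, "congruency": c, "response": r}
            for a, c, r in product(ANGLES, CONGRUENCY, RESPONSE)
            for _ in range(trials_per_cell)
        ]
        rng.shuffle(run)  # random order of angle/congruency/response within the run
        session.append(run)
    return session


runs = build_session(participant_index=1, seed=0)
print([run[0]["alignment"] for run in runs])  # ['misaligned', 'aligned']
print([len(run) for run in runs])             # [700, 700]
```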

Results

In the complete design, HP is operationalized as the two-way interaction between alignment (aligned/misaligned) and congruency (congruent/incongruent) as within-subjects factors (Cheung, Richler, Palmeri, & Gauthier, 2008). We tested this interaction at each of the seven orientations used in the experiment (0, 30, 60, 90, 120, 150, and 180 deg), thus testing a three-way Alignment × Congruency × Angle interaction (2 × 2 × 7; see Table 1). Because we wanted to distinguish the contributions of sensitivity and response bias, the dependent variables were sensitivity (d') and criterion (c), computed from the accuracy data (Macmillan & Creelman, 1991). In addition, to allow comparisons with previous studies, we also analyzed reaction time (RT). To remove outliers, trials with RTs more than two SDs above the mean were discarded (2% of trials); this was done separately for each participant in each condition (congruency, alignment, and orientation angle).
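For readers unfamiliar with the signal detection measures, d' and c can be computed per participant and cell as d' = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H and F are the hit and false-alarm rates (Macmillan & Creelman, 1991). The sketch below assumes that a hit is a “same” response on a same-top trial and a false alarm is a “same” response on a different-top trial, and it applies a log-linear correction for rates of 0 or 1; the text does not specify either choice, and the function name is hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_measures(hits: int, n_same: int,
                 false_alarms: int, n_different: int) -> tuple:
    """Return (d', c) for one participant in one design cell.

    Assumed coding (not stated in the text): a hit is a "same" response
    on a same-top trial; a false alarm is a "same" response on a
    different-top trial. With this coding, c > 0 indicates a bias
    toward responding "different".
    """
    # Log-linear correction (assumed): avoids infinite z scores when a
    # hit or false-alarm rate is exactly 0 or 1.
    hit_rate = (hits + 0.5) / (n_same + 1)
    fa_rate = (false_alarms + 0.5) / (n_different + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, criterion


# Hypothetical counts: 25 "same" and 25 "different" trials in one cell.
d_unbiased, c_unbiased = sdt_measures(20, 25, 5, 25)  # symmetric errors, c near 0
d_liberal, c_liberal = sdt_measures(24, 25, 10, 25)   # many "same" responses, c < 0
```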

Table 1 Accuracy (A) and response time for correct trials (B) for all combinations of alignment, congruency, and rotation angle

Sensitivity

With sensitivity (d') as the dependent measure, we found a three-way Alignment × Congruency × Angle interaction [F(6, 66) = 34.97, p < .001, η² = .343]. Probing this interaction revealed a significant two-way interaction of alignment and congruency [F(1, 66) = 86.8, p < .001, η² = .564]. As noted above, this interaction term serves as the operational definition of HP in the complete design of the composite paradigm. As expected, planned simple-effects comparisons showed that in aligned trials sensitivity was higher for congruent than for incongruent face stimuli [F(1, 66) = 147.8, p < .001, η² = .688], whereas no such effect was found in misaligned trials [F(1, 66) = 0.11, p = .74]. Finally, we also found a main effect of angle [F(6, 66) = 88.72, p < .001, η² = .570] and a main effect of congruency [F(1, 66) = 75.6, p < .001, η² = .530], with lower sensitivity in incongruent trials. There was no main effect of alignment [F(1, 66) = 0.059, p = .81].
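Because both factors have two levels, the Alignment × Congruency interaction at a single angle reduces to one contrast score per participant: the congruency effect in aligned trials minus the congruency effect in misaligned trials. When an angle is analyzed on its own, the interaction F(1, n − 1) equals the square of a one-sample t test of these contrast scores against zero. The sketch below illustrates that equivalence; it is not the authors' analysis code, and the function names and d' values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hp_contrast(d: dict) -> float:
    """Alignment x Congruency interaction contrast for one participant at one angle.

    `d` maps (alignment, congruency) to that participant's d' score.
    """
    congruency_aligned = d[("aligned", "congruent")] - d[("aligned", "incongruent")]
    congruency_misaligned = d[("misaligned", "congruent")] - d[("misaligned", "incongruent")]
    return congruency_aligned - congruency_misaligned


def interaction_f(contrasts: list) -> float:
    """F(1, n - 1) for the 2 x 2 within-subjects interaction, computed as t squared."""
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t * t


example = {
    ("aligned", "congruent"): 2.0, ("aligned", "incongruent"): 1.0,
    ("misaligned", "congruent"): 1.5, ("misaligned", "incongruent"): 1.5,
}
print(hp_contrast(example))            # 1.0
print(interaction_f([1.0, 2.0, 3.0]))  # about 12.0 (t = 2 * sqrt(3))
```

A positive contrast means the congruency effect is larger for aligned than for misaligned faces, which is the signature of HP in the complete design.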

As is evident in Fig. 3, the F value quantifying the two-way Alignment × Congruency interaction decreased as the face was rotated from 0 to 180 deg. Most notably, a sharp decline in F value was observed between 0 and 30 deg. We compared each pair of adjacent angles using contrasts of the three-way Alignment × Congruency × Angle interaction. The only comparison that remained statistically significant after Bonferroni correction for multiple comparisons (p < .05) was that between 0 and 30 deg [F(1, 66) = 11.53, p = .001, η² = .147]; all other comparisons were not significant [30–60: F(1, 66) = 0.657, p = .42; 60–90: F(1, 66) = 0.244, p = .623; 90–120: F(1, 66) = 0.013, p = .91; 120–150: F(1, 66) = 2.25, p = .14; 150–180: F(1, 66) = 0.132, p = .72]. Importantly, the two-way Alignment × Congruency interaction was significant at each angle individually [0: F(1, 66) = 94.69, p < .001; 30: F(1, 66) = 21.91, p < .001; 60: F(1, 66) = 19.41, p < .001; 90: F(1, 66) = 8.073, p = .006; 120: F(1, 66) = 10.8, p = .002], except for faces rotated 150 deg and fully inverted faces (180 deg) [150: F(1, 66) = 0.767, p = .384; 180: F(1, 66) = 3.098, p = .083].
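The Bonferroni step can be checked directly from the reported p values. The sketch assumes the correction family consisted of the six adjacent-angle comparisons, which the text implies but does not state explicitly; with six comparisons, the per-comparison threshold is .05 / 6 ≈ .0083.

```python
# p values for the six adjacent-angle contrasts, as reported above.
adjacent_p = {
    "0-30": 0.001, "30-60": 0.42, "60-90": 0.623,
    "90-120": 0.91, "120-150": 0.14, "150-180": 0.72,
}

# Bonferroni: divide the familywise alpha by the number of comparisons.
alpha = 0.05 / len(adjacent_p)
significant = [pair for pair, p in adjacent_p.items() if p < alpha]
print(round(alpha, 4), significant)  # 0.0083 ['0-30']
```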

Fig. 3

F values for sensitivity (d'), which serves as the operational definition of holistic processing in the complete design of the composite paradigm (Congruency × Alignment), were calculated for all seven orientation angles. Significant F values (corresponding to p < .05) are marked in dark gray. The results for angles of 150 and 180 deg did not reach significance (p > .05). The observed sharp decline between 0 and 30 deg was the only statistically significant comparison in the three-way interaction for two adjacent angles [F(1, 66) = 11.53, p = .001, η² = .147]

Can a shift in the visual field account for the drop in HP between 0 and 30 deg rotation?

To rule out the possibility that the difference obtained between the upright and 30-deg orientations resulted from a shift in the stimulus position within the visual field, we calculated the change in visual angle between these two orientations. To capture the maximal displacement, this calculation was carried out on the misaligned faces, from the far top-left edge of the face stimulus to the same point following a clockwise 30-deg rotation. This yielded a shift of merely 1.1 deg of visual angle. Therefore, the stimuli occupied approximately the same region of the visual field at the two most relevant orientations, and this displacement is unlikely to have produced the nonlinear decline in HP.
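One way to carry out this kind of calculation is sketched below: a point at radius r from the rotation center moves along a chord of length 2r·sin(θ/2) when rotated by θ, and the chord is converted to visual angle at the viewing distance. The sketch assumes rotation about the stimulus center and treats the chord as centered on the line of sight (an approximation). Pixel size and viewing distance are not given in this section, so the example values are hypothetical and are not intended to reproduce the reported 1.1 deg.

```python
from math import atan, degrees, hypot, radians, sin


def rotation_shift_deg(x_px: float, y_px: float, angle_deg: float,
                       cm_per_px: float, distance_cm: float) -> float:
    """Visual angle (deg) of the displacement of a point rotated about the stimulus center.

    (x_px, y_px) is the point's offset from the rotation center in pixels.
    """
    radius_px = hypot(x_px, y_px)
    # A rotation by theta moves a point at radius r along a chord of length 2 r sin(theta / 2).
    chord_cm = 2 * radius_px * sin(radians(angle_deg) / 2) * cm_per_px
    # Visual angle subtended by that chord at the viewing distance.
    return degrees(2 * atan(chord_cm / (2 * distance_cm)))


# Hypothetical geometry, for illustration only.
shift = rotation_shift_deg(x_px=-115, y_px=95, angle_deg=30,
                           cm_per_px=0.025, distance_cm=60)
```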

Criterion (c)

A three-way analysis of variance (ANOVA), analogous to the one used for the sensitivity analysis, was conducted with criterion scores as the dependent measure. The three-way Alignment × Congruency × Angle interaction was significant [F(6, 66) = 8.24, p = .005, η² = .110], but the two-way Alignment × Congruency interaction was not [F(1, 66) = 1.131, p = .291]. A main effect of congruency [F(1, 28) = 20.23, p < .001, η² = .232] revealed that participants were more likely to respond “different” in congruent trials. Finally, we also found a main effect of angle [F(1, 66) = 9.01, p < .004, η² = .119] and a main effect of alignment [F(1, 66) = 7.84, p = .007, η² = .104], with participants more likely to respond “different” in misaligned trials.

Response time (RT)

As in the sensitivity analysis, an Alignment × Congruency × Angle ANOVA was conducted, here with RT as the dependent measure and only correct trials included. Neither the three-way interaction [F(6, 66) = 0.56, p = .442] nor the two-way Alignment × Congruency interaction [F(1, 66) = 3.24, p = .076] was significant. We found a main effect of congruency [F(1, 66) = 26.31, p < .001, η² = .282], with shorter RTs in congruent than in incongruent trials, and a main effect of angle [F(1, 66) = 10.00, p = .002, η² = .130]; the main effect of alignment was not significant [F(1, 66) = 0.03, p = .87].
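The RT outlier rule described at the start of the Results (discarding trials more than two SDs above the mean, separately for each participant × congruency × alignment × angle cell) can be sketched as follows. The function name and data layout are hypothetical, and the text does not say whether trimming preceded or followed the restriction to correct trials; here the caller is assumed to pass whichever set of trials applies.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_slow_rts(trials: list, n_sd: float = 2.0) -> list:
    """Drop trials whose RT exceeds the cell mean by more than n_sd SDs.

    Each trial is a dict with keys "participant", "congruency",
    "alignment", "angle", and "rt" (ms). Only the slow tail is trimmed.
    """
    cells = defaultdict(list)
    for trial in trials:
        key = (trial["participant"], trial["congruency"],
               trial["alignment"], trial["angle"])
        cells[key].append(trial)

    kept = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:  # SD is undefined for a single trial; keep it
            kept.extend(cell)
            continue
        cutoff = mean(rts) + n_sd * stdev(rts)
        kept.extend(t for t in cell if t["rt"] <= cutoff)
    return kept


# Hypothetical cell: nine 500-ms trials and one 2,000-ms outlier.
demo = [{"participant": 1, "congruency": "congruent", "alignment": "aligned",
         "angle": 0, "rt": rt} for rt in [500] * 9 + [2000]]
kept = trim_slow_rts(demo)
```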

Discussion

The goal of this study was to test two opposing hypotheses, face internal representation versus automated attention, by examining the influence of face inversion on HP. Two previous studies had reached contradictory conclusions regarding this manipulation. Using the partial composite design, Rossion and Boremanse (2008) found that HP was drastically disrupted when a face stimulus was rotated away from its standard, upright orientation, and concluded that processing faces holistically depends on an upright internal face representation. In contrast, using the complete composite design, Richler, Mack, et al. (2011) found that although HP is not apparent for inverted faces presented briefly (50 ms), it is evident for both upright and inverted faces when the presentation time is sufficiently long. These results imply that face inversion merely reduces the efficiency of the automated attention mechanism, which can still operate when participants are given sufficient time. The two studies used different methodologies, which might account for their contradictory results. In the present study, we combined the two approaches, testing HP for faces shown at different orientations using the full design of the composite paradigm.

We observed a sharp decline in HP when faces were rotated away from the upright position (0–30 deg). The difference in the full-design HP index between 0 and 30 deg was statistically significant, whereas the differences between all other adjacent angles were not, attesting to the privileged processing of upright faces. The attentional hypothesis of HP predicts a linear change in holistic perception with rotation, similar to the linear relation observed in studies of mental rotation of faces (Bruyer et al., 1993; Collishaw & Hole, 2002; Valentine & Bruce, 1988). In contrast, the observed nonlinear drop, together with the significant three-way interaction between alignment, congruency, and angle, indicates that the HP index of the complete design decreases as faces are rotated away from 0 deg, most steeply between 0 and 30 deg. Thus, our findings support the notion of an internal representation that is highly specific to the common, upright face orientation. Finally, in contrast to the results of Richler, Mack, et al. (2011), no significant HP effect was found for inverted faces (180 deg), even though, according to those authors' findings, the presentation duration here was sufficiently long for such an effect to occur.

We note that in our study this sharp decline already occurred at a rotation of 30 deg, whereas in the results reported by Rossion and Boremanse (2008) the drop occurred only between 60 and 90 deg. This disparity may reflect the methodological differences between the complete and partial composite designs. The complete design is quantified with sensitivity, which is considered independent of decision bias and may therefore better capture perceptual processes. This is important with respect to the study by Rossion and Boremanse, in which the decision criterion could have accounted for the results. Such an interpretation is supported by the significant three-way Alignment × Congruency × Angle interaction for criterion (c) found in the present study. The analysis of correct-trial RTs did not yield results as conclusive as those of the d' analysis, possibly because of high variance; a previous study also reported a more substantial composite effect for accuracy than for RT (see Rossion, 2008).

Our paradigm differed in several ways from the full-design version employed by Richler, Mack, et al. (2011). In the original composite design (see Rossion, 2013, for a review), participants are instructed to attend to a certain half of the face while ignoring the irrelevant half, so the effect measures a failure of selective attention. In contrast, the participants in Richler, Mack, et al.’s study were given a visual cue indicating whether the top or the bottom half was the target for comparison; the cue appeared during the first mask, that is, only after the target face had been presented. This design forced participants to attend to both halves of the target face and hence does not necessarily demonstrate a failure of selective attention (see Richler & Gauthier, 2013; Rossion, 2013). This attentional confound could account for the HP that Richler, Mack, et al. found for inverted faces presented for a sufficiently long duration, whereas a brief exposure (50 ms) may not have allowed such an attentional mechanism to exert its effect. To control for this possible confound, in the present study the top half of the face was always the target. Moreover, the target and probe faces always shared the same alignment (aligned or misaligned), whereas in Richler, Mack, et al.’s study the study face was always aligned.

Using the full design of the composite paradigm, we found a nonlinear drop in HP when faces were rotated away from the upright orientation. These results support the face internal representation hypothesis and the theory that this internal face representation is highly orientation-specific.

Author note

This work was supported by the Israel Science Foundation (ISF), Grant No. 296/15 to GA.