Introduction

Eye gaze is one of the most important social cues in daily communication. Direct gaze captures the observer’s visual attention and serves as a signal to initiate social interaction (see review, Hamilton, 2016). On the other hand, averted gaze triggers attentional shifts and conveys information about the surrounding space and objects (Driver et al., 1999; Marotta et al., 2012; Oyama & Okubo, 2022). The fast and accurate processing of gaze direction is crucial for establishing appropriate and meaningful communication given the importance of eye gaze in navigating the social world.

Recently, spatial Stroop tasks have been used to investigate the mechanisms of gaze processing and gaze-triggered attention (i.e., the gaze spatial Stroop task; Cañadas & Lupiáñez, 2012; Edwards et al., 2020; Ishikawa et al., 2021; Marotta et al., 2018). In this task, participants discriminate the gaze direction of a peripherally presented face while ignoring its location. In contrast to the typical spatial Stroop task, which produces a spatial Stroop effect (with shorter responses when the target direction and its location are congruent than when they are incongruent; for a review, see Lu & Proctor, 1995), Cañadas and Lupiáñez (2012) found a reversed congruency effect, with shorter responses when the gaze direction and its location were incongruent (incongruent trial; e.g., a right-looking face presented on the left side of the visual field) than when they were congruent (congruent trial; e.g., a right-looking face presented on the right side of the visual field). The reversal of the spatial Stroop effect suggests a unique attentional mechanism dedicated to eyes that serves social interactions such as eye contact, joint attention, and perspective-taking, and that differs qualitatively from the mechanisms applied to other directional stimuli such as arrows (Cañadas & Lupiáñez, 2012; Edwards et al., 2020; Hemmerich, 2018; Ishikawa et al., 2021; Marotta et al., 2018).

The reversed congruency effect is modulated by facial contexts, such as facial expressions (Jones, 2015; Marotta et al., 2022). For example, Jones (2015) found a larger reversed congruency effect for angry and happy faces than for neutral ones, and found that the effect disappeared for fearful faces (see Marotta et al., 2022 for similar results for happy faces). These results suggest that gaze direction is processed through interaction and integration with facial context, modulating the reversed congruency effect.

Face inversion, which hinders various kinds of face processing, can affect the reversed congruency effect since the effect is modulated by the facial context in the gaze spatial Stroop task. Face inversion disrupts the holistic processing of faces and interferes with face recognition and the judgments of facial expressions and traits (the face inversion effect; Tanaka & Gauthier, 1997; Wilson et al., 2018; Yin, 1969; Young & Hugenberg, 2010). However, in contrast to the robust effect of inversion on face processing, results regarding the effect of inversion on gaze-triggered attention have been mixed. For example, in visual search, a direct-gaze target was detected faster than an averted-gaze target when faces were upright (i.e., the stare-in-the-crowd effect; Von Grünau & Anston, 1995). This facilitation disappeared for inverted faces (Böckler et al., 2015; Senju et al., 2005, 2008). As face inversion impairs the processing of configural information (the spatial relations between features in the whole face; Tanaka & Gauthier, 1997; Wilson et al., 2018; Yin, 1969; Young & Hugenberg, 2010), the disruptive effects of face inversion support the idea that holistic face processing affects gaze processing. By contrast, Riechelmann et al. (2021) found that an averted-gaze advantage (faster gaze discrimination for averted-gaze targets than for direct-gaze targets) emerged regardless of face orientation. This result suggests that people process gaze information in a part-based rather than a holistic manner (Ganel et al., 2005; Langton et al., 2004). This interpretation is consistent with the traditional account of gaze processing, which emphasizes the role of local eye features (i.e., the contrast and the positional relationship between the dark iris/pupil and the large white sclera; Ando, 2002; Anstis et al., 1969; Sinha, 2000). Studies using a gaze cueing paradigm have also supported the part-based processing of gaze information. Tipples’s (2005) comprehensive study demonstrated that the gaze cueing effect was unaffected by face inversion (but also see Kingstone et al., 2000; Langton & Bruce, 1999 for vertical cues). These mixed results preclude any unequivocal conclusion regarding the effects of face inversion on gaze processing.

The present study examined the face inversion effect in the gaze spatial Stroop task to clarify whether holistic face processing or part-based processing of the eyes is responsible for the reversed congruency effect. If holistic face processing is involved (Böckler et al., 2015; Senju et al., 2005, 2008), face inversion should reduce the reversed congruency effect. In contrast, if part-based processing of features is involved (Ganel et al., 2005; Langton et al., 2004), face inversion should have little effect on the reversed congruency effect.

Experiment 1

Methods

Participants

Forty students participated in Experiment 1 (eight women and 32 men, Mage = 19.13, SD = 0.79). An a priori power analysis using G*Power (Faul et al., 2007), assuming an effect size of d = 0.50 based on the reversed congruency effect reported by Jones (2015), indicated that a sample of 34 participants would be sufficient to replicate the reversed congruency effect (power = .80, α = .05).
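The sample-size figure can be checked roughly by hand. The sketch below is not the G*Power computation; it assumes a two-tailed paired-samples (one-sample) t test, which the text does not state, and uses the normal approximation with a small-sample correction term. The function name `approx_paired_n` is introduced here for illustration only.

```python
from math import ceil
from statistics import NormalDist


def approx_paired_n(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-tailed paired (one-sample) t test.

    Normal approximation plus the correction term z_alpha^2 / 2.
    This approximates, but does not reproduce, an exact noncentral-t routine.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return ceil(n)


print(approx_paired_n(0.50))  # 34
```

Under these assumptions the approximation agrees with the 34 participants reported above; an exact calculation should still be run in dedicated power-analysis software.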

Material

We used two types of facial stimuli: upright and inverted. These stimuli consisted of two full-color male and two full-color female face photographs (i.e., four photographs in total) selected from the ATR Facial Expression Image Database (ATR-promotions, 2006). Examples of facial photographs are shown in Fig. 1. Each face photograph measured 300 × 356 pixels on the display and showed a face turned straight toward the camera with a neutral expression. The gaze in the eye region of the photograph was averted to the left or right. The face photograph was presented upright or upside-down in either the left or right visual field. Stimulus presentation, timing, and data collection were controlled by jsPsych 6.3.1 (De Leeuw, 2015).

Fig. 1

An example of stimuli and a trial sequence. The top figure illustrates a congruent trial of the upright face condition (the right-looking target was presented in the right visual field). The bottom figure illustrates an incongruent trial of the inverted face condition (the right-looking target was presented in the left visual field)

Procedure

Participants completed the experiment online on their own computers. The trial sequence is illustrated in Fig. 1. Each trial began with a white fixation cross presented at the center of the display for 1000 ms. The target face was then presented for 2000 ms, either to the left or right of the fixation cross. The distance from the center of fixation to the center of the target was 174 pixels on the display. Participants judged the gaze direction as quickly and accurately as possible while ignoring its location. Participants pressed the “F” key when the target face was looking to the left and the “J” key when it was looking to the right. If the answer was incorrect, the word “incorrect” was presented for 700 ms. The gaze direction and location were randomized throughout the experiment. Participants performed 16 practice trials, followed by two experimental blocks of 64 trials, one for each condition (upright face: 64 trials; inverted face: 64 trials). The upright and inverted faces were presented in separate blocks. The order of the experimental blocks was counterbalanced among the participants.
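The block structure described above can be sketched as a trial list. This is an illustration, not the experiment script (which ran in jsPsych): it assumes that each block fully crosses the four photographs with gaze direction and location and repeats each of the 16 cells equally often, which the text does not specify. The photograph identifiers and the function name `build_block` are hypothetical.

```python
import random
from itertools import product

GAZE = ("left", "right")        # gaze direction of the face
LOCATION = ("left", "right")    # visual field in which the face appears
FACES = ("female_1", "female_2", "male_1", "male_2")  # hypothetical photo IDs
KEY_FOR_GAZE = {"left": "f", "right": "j"}            # response mapping from the text


def build_block(orientation, n_trials=64, seed=None):
    """Return one randomized block of trials for a single face orientation."""
    cells = list(product(FACES, GAZE, LOCATION))  # 4 x 2 x 2 = 16 cells
    reps, remainder = divmod(n_trials, len(cells))
    if remainder:
        raise ValueError("n_trials must be a multiple of the number of cells")
    trials = [
        {
            "orientation": orientation,
            "face": face,
            "gaze": gaze,
            "location": location,
            "congruent": gaze == location,   # congruency is defined by gaze vs. location
            "correct_key": KEY_FOR_GAZE[gaze],
        }
        for face, gaze, location in cells
        for _ in range(reps)
    ]
    random.Random(seed).shuffle(trials)      # randomize trial order within the block
    return trials
```

With full crossing, half of the trials in each block are congruent and half are incongruent, regardless of the shuffle.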

Results and discussion

One participant who did not complete the task was excluded. Overall accuracy was high (94.25%) and therefore susceptible to a ceiling effect; thus, we did not analyze accuracy further. Based on Marotta et al.’s (2018) criteria, responses faster than 200 ms (0.02%), responses slower than 1300 ms (0.66%), and incorrect responses (5.75%) were excluded from the analysis. We calculated the mean reaction time (RT) for the four experimental conditions defined by an orthogonal combination of face orientation and congruency. Fig. 2 and Table 1 show the mean RT of correct responses for Experiment 1.
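The exclusion criteria and the cell means can be expressed compactly. The following is a minimal sketch for one participant, assuming a hypothetical trial record with the fields `orientation`, `gaze`, `location`, `rt` (in ms), and `correct`; these field names are not taken from the original data files.

```python
from collections import defaultdict
from statistics import mean

RT_MIN_MS = 200    # responses faster than this are excluded
RT_MAX_MS = 1300   # responses slower than this are excluded


def mean_rt_by_condition(trials):
    """Mean RT of retained trials per (orientation, congruency) cell."""
    cells = defaultdict(list)
    for t in trials:
        if not t["correct"]:
            continue  # incorrect responses are excluded
        if not (RT_MIN_MS <= t["rt"] <= RT_MAX_MS):
            continue  # RT outside the 200-1300 ms window is excluded
        congruency = "congruent" if t["gaze"] == t["location"] else "incongruent"
        cells[(t["orientation"], congruency)].append(t["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}
```

Applying the function to each participant's trials yields the four cell means that enter the analysis of Experiment 1.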

Fig. 2

Means of reaction time for the spatial Stroop task as a function of face orientation and congruency in Experiment 1. Error bars show standard errors. Plots in dark red represent individual data

Table 1 Means and standard deviations of reaction time for each experimental condition

The RT data were subjected to a two-factor repeated-measures ANOVA with face orientation (upright vs. inverted faces) and congruency (congruent vs. incongruent). The main effect of face orientation was significant, with shorter responses observed for the upright face condition than for the inverted face condition, F (1, 38) = 23.30, p < .001, \({\eta}_p^2\) = 0.38, BFincl = 845.371. The main effect of congruency was also significant, with shorter responses observed in incongruent trials than in congruent trials, F (1, 38) = 65.00, p < .001, \({\eta}_p^2\) = 0.63, BFincl < .001. The interaction between face orientation and congruency was not significant, F (1, 38) = 0.03, p = .859, \({\eta}_p^2\) = 0.00, BFincl = 0.950.
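In a 2 × 2 fully within-participant design, each effect has one numerator degree of freedom, so each F value equals the squared one-sample t statistic computed on a per-participant contrast score. The sketch below illustrates this equivalence and the conversion from F to partial eta squared; it is not the software used for the reported analysis, it does not compute Bayes factors, and the dictionary keys and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores):
    """Return (F, (df_effect, df_error)) for a 1-df within-participant contrast."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))  # one-sample t against zero
    return t ** 2, (1, n - 1)


def partial_eta_squared(f, df_effect, df_error):
    """Convert an F ratio to partial eta squared."""
    return f * df_effect / (f * df_effect + df_error)


def anova_2x2_within(cell_means):
    """cell_means: one dict per participant with mean RTs (ms) under the
    hypothetical keys 'up_con', 'up_inc', 'inv_con', 'inv_inc'."""
    orientation = [(p["inv_con"] + p["inv_inc"] - p["up_con"] - p["up_inc"]) / 2
                   for p in cell_means]
    congruency = [(p["up_con"] + p["inv_con"] - p["up_inc"] - p["inv_inc"]) / 2
                  for p in cell_means]
    interaction = [(p["up_con"] - p["up_inc"]) - (p["inv_con"] - p["inv_inc"])
                   for p in cell_means]
    return {
        "orientation": contrast_f(orientation),
        "congruency": contrast_f(congruency),
        "interaction": contrast_f(interaction),
    }
```

With 39 participants the error degrees of freedom are 38, matching the F (1, 38) values above, and `partial_eta_squared(23.30, 1, 38)` rounds to the reported 0.38.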

Overall, the responses were slower for the inverted face than for the upright face. These results are generally consistent with previous studies on the face-inversion effect (e.g., Riechelmann et al., 2021; Senju et al., 2008). Face inversion disrupts holistic face processing, slowing down overall performance. Most importantly, the reversed congruency effect was observed regardless of the face orientation with an equivalent magnitude. Reaction times were significantly shorter in incongruent trials than in congruent trials for both inverted and upright faces. These results suggest that the reversed congruency effect arises independently of the holistic processing of the face and instead reflects part-based processing of the eye region (Ganel et al., 2005; Langton et al., 2004).

To verify the replicability of our results, we repeated Experiment 1 with a newly recruited sample (N = 20). All the results in Experiment 1 were replicated, indicating high replicability (see supplemental data, Footnote 1).

Experiment 2

Experiment 1 suggested that part-based processing of gaze direction was responsible for the reversed congruency effect. However, Jones (2015) reported that the reversed congruency effect was larger for happy and angry faces than for neutral and fearful faces (see Marotta et al., 2022 for similar results) and suggested that facial expressions containing approach signals heightened the reversed congruency effect (Adams Jr & Kleck, 2003). The modulation by facial expression indicates that gaze direction is integrated with other facial features in the spatial Stroop task, suggesting that holistic face processing contributes to the reversed congruency effect. However, the effects of facial expressions observed in previous studies need to be re-evaluated. Morphological features within the eye region vary with facial expressions (e.g., the sclera of sad faces is relatively small in the vertical direction compared to other facial expressions). To address this issue, we conducted the gaze spatial Stroop task manipulating the facial expressions of stimuli (i.e., angry, happy, neutral, and sad) in Experiment 2. As in Experiment 1, the face stimuli were presented in upright or inverted orientation. If morphological features of the eyes can influence the gaze direction judgments, the reversed congruency effect would emerge with equivalent magnitude independently of the face orientation across facial expressions.

Methods

The methods were identical to those of Experiment 1 except for the following differences.

Participants

One hundred and three students participated in Experiment 2 (16 women, 85 men, and two others, Mage = 19.46, SD = 0.92). An a priori power analysis using G*Power (Faul et al., 2007), assuming an effect size of \({\eta}_p^2\) = 0.11 based on the interaction between facial expression and congruency reported by Jones (2015), indicated that a sample of 93 participants would be sufficient to replicate the effect (power = .80, α = .05).

Material

Examples of facial photographs are shown in Fig. 3. We used four types of facial expressions: angry, happy, neutral, and sad. These stimuli consisted of two full-color male and female face photographs (i.e., sixteen photographs in total) selected from the ATR Facial Expression Image Database (ATR-promotions, 2006).

Fig. 3

An example of stimuli and a trial sequence. The top figure illustrates a congruent trial of the upright sad face condition (the right-looking target was presented in the right visual field). The bottom figure illustrates an incongruent trial of the inverted happy face condition (the right-looking target was presented in the left visual field)

Procedure

Participants performed 16 practice trials, followed by four experimental blocks of 64 experimental trials. The upright and inverted faces were presented randomly within the same experimental block.

Results and discussion

Two participants who did not complete the task were excluded. Thus, data from 101 participants were used for the final analysis. Overall accuracy was high (94.41%) and therefore susceptible to a ceiling effect; thus, we did not analyze accuracy further. Based on Marotta et al.’s (2018) criteria, responses faster than 200 ms (0.02%), responses slower than 1300 ms (0.51%), and incorrect responses (5.56%) were excluded from the analysis. We calculated the mean reaction time (RT) for the sixteen experimental conditions defined by an orthogonal combination of face orientation, facial expression, and congruency. Fig. 4 and Table 1 show the mean RT of correct responses for Experiment 2.

Fig. 4

Means of reaction time for the spatial Stroop task as a function of face orientation and congruency in Experiment 2. Error bars show standard errors. Plots in dark red represent individual data

The RT data were subjected to a three-factor repeated-measures ANOVA with face orientation (upright vs. inverted faces), facial expression (angry vs. happy vs. neutral vs. sad), and congruency (congruent vs. incongruent). The main effect of face orientation was significant, with shorter responses for the upright face condition than for the inverted face condition, F (1, 100) = 204.00, p < .001, \({\eta}_p^2\) = 0.67, BFincl = 2.513E+13. The main effect of congruency was also significant, with shorter responses observed in incongruent trials than in congruent trials, F (1, 100) = 152.09, p < .001, \({\eta}_p^2\) = 0.60, BFincl = 2.513E+13. The main effect of facial expression was significant, F (3, 300) = 80.17, p < .001, \({\eta}_p^2\) = 0.45, BFincl = 2.513E+13. Multiple comparisons (with the Holm correction) revealed shorter responses for neutral faces than for angry, happy, and sad faces (adj.ps < .001). Responses were shorter for happy faces than for angry and sad faces (adj.ps < .001), and shorter for angry faces than for sad faces (adj.p < .001). The interaction between facial expression and congruency was significant, F (3, 300) = 3.11, p = .027, \({\eta}_p^2\) = 0.03, BFincl = 0.607. The two-way interactions between face orientation and facial expression, F (3, 300) = 0.85, p = .470, \({\eta}_p^2\) = 0.01, BFincl = 0.045, and between face orientation and congruency, F (1, 100) = 1.37, p = .245, \({\eta}_p^2\) = 0.01, BFincl = 0.484, and the three-way interaction between face orientation, facial expression, and congruency, F (3, 300) = 0.96, p = .410, \({\eta}_p^2\) = 0.01, BFincl < .001, were not significant.

Analysis of the reversed congruency effect

To clarify the interaction between facial expression and congruency, we compared the magnitude of the reversed congruency effect for four facial expressions by face orientation. Figure 5 represents the mean of the reversed congruency effect for each facial expression in Experiment 2. The RT data were subjected to a two-factor repeated-measures ANOVA with face orientation (upright vs. inverted faces) and facial expressions (angry vs. happy vs. neutral vs. sad). The main effect of facial expressions was significant, F (3, 300) = 3.11, p = .027, \({\eta}_p^2\) = 0.03, BFincl = 0.168. The reversed congruency effect tended to become larger for sad faces than for happy (adj.p = .094) and neutral faces (adj.p = .118), although it did not reach statistical significance. The main effect of face orientation, F (1, 100) = 1.37, p = .245, \({\eta}_p^2\) = 0.01, BFincl = 0.126, and interaction, F (3, 300) = 0.96, p = .410, \({\eta}_p^2\) = 0.01, BFincl = 0.004, were not significant.
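The dependent variable of this analysis is a difference score. The sketch below computes it for one participant, assuming that the reversed congruency effect is defined as congruent RT minus incongruent RT so that positive values indicate faster responses on incongruent trials; the sign convention, the key layout, and the function name are assumptions made for illustration.

```python
def reversed_congruency_effects(cell_means):
    """cell_means maps (orientation, expression, congruency) -> mean RT in ms
    for one participant. Returns congruent-minus-incongruent differences
    keyed by (orientation, expression)."""
    effects = {}
    for (orientation, expression, congruency), rt in cell_means.items():
        if congruency != "congruent":
            continue  # each pair is handled once, starting from the congruent cell
        incongruent_rt = cell_means[(orientation, expression, "incongruent")]
        effects[(orientation, expression)] = rt - incongruent_rt
    return effects
```

Because the score already subtracts the two congruency levels, an ANOVA on it tests the same effects as the interactions involving congruency in the three-factor ANOVA, which is why the F values in this analysis coincide with those interaction terms.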

Fig. 5

Means of the reversed congruency effect for the spatial Stroop task as a function of facial expressions in Experiment 2. Error bars show standard errors

As in Experiment 1, face inversion had little effect on the reversed congruency effect, while the overall reaction time was slower for inverted faces than for upright faces. These results support the part-based account of the reversed congruency effect (Ganel et al., 2005; Langton et al., 2004). Facial expressions slowed the overall reaction time, with the slowest reaction times for the sad expression. The reversed congruency effect tended to be larger for sad faces than for happy and neutral faces, although the difference did not reach significance. It is noteworthy that these results were observed regardless of face orientation and, thus, were inconsistent with the idea of holistic face processing because the perception of facial expression is orientation-sensitive (Tanaka & Gauthier, 1997; Wilson et al., 2018; Yin, 1969; Young & Hugenberg, 2010). We further discuss these results in the General Discussion.

General Discussion

The present study examined the effect of face inversion on the gaze spatial Stroop task. Across two experiments, the overall reaction time was slower for inverted faces than for upright faces, suggesting that face inversion disrupts holistic processing and delays overall performance (Jenkins & Langton, 2003; Riechelmann et al., 2021). By contrast, the magnitude of the reversed congruency effect was almost equivalent for upright and inverted faces; the reversed congruency effect was thus independent of face inversion manipulation. These results indicate that, while holistic face processing affects overall performance, part-based processing is responsible for the reversed congruency effect. Judgments on gaze direction may rely on the local features of the eyes (e.g., contrast and the positional relationship between the dark iris/pupil and a large white sclera). Thus, the results of this study are consistent with the traditional account of gaze processing (Ganel et al., 2005; Langton et al., 2004).

In this study, face inversion substantially affected the overall response. This result is consistent with previous studies reporting the face inversion effect on gaze processing, wherein inverting the eye region impeded gaze discrimination performance (e.g., Jenkins & Langton, 2003; Riechelmann et al., 2021; Senju et al., 2008). Face inversion may disrupt the configural holistic processing of the face and thus affect the processing stage of extracting features (i.e., embedded eyes), resulting in the deterioration of overall performance.

Most importantly, the reversed congruency effect emerged regardless of the face orientation. While face inversion may affect the processing stage of extracting features, simple left-right discrimination of gaze direction can be performed independently of face orientation because extracted information can be processed in a part-based manner (Ganel et al., 2005; Langton et al., 2004). Previous studies have also demonstrated a dissociation between the processing of face and gaze (e.g., Haxby et al., 2000; Hietanen & Leppänen, 2003; Tipples, 2005). For example, Tipples (2005) found a null effect of face inversion on gaze cueing and explained that face and gaze were processed in different systems. Adopting the gaze spatial Stroop paradigm, we observed that the reversed congruency effect was not affected by face inversion in either experiment. These results support the separate systems for invariant and changeable features proposed by Haxby et al. (2000). Haxby et al. (2000) conducted brain imaging experiments and proposed a face perception model emphasizing the distinction between the representation of invariant and changeable facial features. In their model, the former system is dedicated to face identity and is vulnerable to face inversion, while the latter, which is rather resistant to face inversion, is dedicated to facial expressions, gaze direction, and lip movements. Because gaze direction is a changeable facial feature, face inversion produced no effect on the reversed congruency effect in the present study.

Experiment 2 examined the interaction between facial expression and face inversion. Participants responded more slowly to sad faces than to faces with any other expression. However, these results were observed regardless of face orientation and were inconsistent with the idea of holistic face processing (Tanaka & Gauthier, 1997; Wilson et al., 2018; Yin, 1969; Young & Hugenberg, 2010). These results can be attributed to morphological features varying among the facial expressions. For example, the eyes were narrower in the sad expression than in the other expressions. Such narrowed eyes might have made the judgment of gaze direction difficult and delayed the overall reaction time. This interpretation fits well with the part-based account of the reversed congruency effect. While face inversion disrupts holistic face processing, it has little effect on the processing of local features (e.g., the size of the sclera), resulting in the equivalent magnitude of the reversed congruency effect, which relies mainly on part-based processing (Ganel et al., 2005; Langton et al., 2004).

Facial expression modulated the reversed congruency effect, with a larger effect for sad faces than for happy and neutral faces, although the differences did not reach statistical significance. As our sample size (i.e., 101) was much larger than those of the previous studies (30 in Jones, 2015; 18 in the low AQ group of Marotta et al., 2022), statistical power cannot explain the differences between the results. In addition, while sad faces produced a non-significant increase in the reversed congruency effect in the present study, previous studies observed an increase for happy faces (Jones, 2015; Marotta et al., 2022) and angry faces (Jones, 2015). More importantly, the non-significant effect of facial expression was observed even when the face was inverted in the present study. We suspect that the effects of facial expression on the gaze spatial Stroop task may not be as robust and stable as researchers have assumed.

The results for inverted faces in the present study have theoretical implications for the reversed congruency effect. Previous studies explained the reversed congruency effect in terms of social facilitation by direct gaze (Cañadas & Lupiáñez, 2012) and joint attention (Edwards et al., 2020). Given that face inversion can reduce or eliminate these facilitations (Böckler et al., 2015; Kingstone et al., 2000; Langton & Bruce, 1999; Senju et al., 2005, 2008, but also see Tipples, 2005), the reversed congruency effect for inverted faces may not be consistent with these explanations. Alternatively, the reversed congruency effect can be explained by perspective-taking (see Footnote 2), which is the ability to recognize another person’s point of view (Hemmerich, 2018). Marotta explained that switching the viewpoint from the observer to the gaze target reversed the direction of the gaze, producing the reversed congruency effect (e.g., the right-looking target in the observer’s perspective looks left in the target’s perspective). However, the present results did not support this account, because the perspective-taking account also predicts a decrease or disappearance of the reversed congruency effect for inverted faces: switching the viewpoint does not reverse the gaze direction for inverted faces (e.g., the right-looking inverted face in the observer’s perspective looks right in the target’s perspective). While previous studies have focused on how the gaze target is interpreted and processed (e.g., eye contact, joint attention, and perspective-taking), our study highlights the importance of part-based processing in eye gaze. We believe that investigating these perceptual processes will help elucidate the unique attentional effects triggered by eye gaze.