Abstract
Humans actively sample their environment with saccadic eye movements to bring relevant information into high-acuity foveal vision. Despite being lower in resolution, peripheral information is also available before each saccade. How the pre-saccadic extrafoveal preview of a visual object influences its post-saccadic processing is still an unanswered question. The current study investigated this question by simultaneously recording behavior and fixation-related brain potentials while human subjects made saccades to face stimuli. We manipulated the relationship between pre-saccadic “previews” and post-saccadic images to explicitly isolate the influences of the former. Subjects performed a gender discrimination task on a newly foveated face under three preview conditions: scrambled face, incongruent face (different identity from the foveated face), and congruent face (same identity). As expected, reaction times were faster after a congruent-face preview compared with a scrambled-face preview. Importantly, intact face previews (either incongruent or congruent) resulted in a massive reduction of post-saccadic neural responses. Specifically, we analyzed the classic face-selective N170 component at occipitotemporal electroencephalogram electrodes, which was still present in our experiments with active looking. However, the post-saccadic N170 was strongly attenuated following intact-face previews compared with the scrambled condition. This large and long-lasting decrease in evoked activity is consistent with a trans-saccadic mechanism of prediction that influences category-specific neural processing at the start of a new fixation. These findings constrain theories of visual stability and show that the extrafoveal preview methodology can be a useful tool to investigate its underlying mechanisms.
SIGNIFICANCE STATEMENT Neural correlates of object recognition have traditionally been studied by flashing stimuli to the central visual field. This procedure differs in fundamental ways from natural vision, where viewers actively sample the environment with eye movements and also obtain a low-resolution preview of soon-to-be-fixated objects. Here we show that the N170, a classic electrophysiological marker of the structural encoding of faces, also occurs during a more natural viewing condition but is strongly reduced due to extrafoveal preprocessing (preview benefit). Our results therefore highlight the importance of peripheral vision during trans-saccadic processing in building a coherent and stable representation of the world around us.
Introduction
Visual processing takes place primarily during periods of fixation, which are separated by fast eye movements known as saccades. Unlike in laboratory experiments, in which stimuli appear suddenly, the image present on the fovea during natural viewing is typically the result of a choice to fixate that item based on a peripheral preview of that object. Whether this peripheral preview influences visual processing during the new fixation, and how this might fit into competing theories regarding why visual perception seems stable and continuous across saccades, remains an important question.
One set of theories of visual stability emphasizes the role of a prediction about which visual information will be available after the saccade (for review, see Melcher and Colby, 2008; Melcher, 2011). In reading, a classic behavioral finding is the preview benefit effect (Rayner, 1975): when an upcoming word was visible in extrafoveal vision before a saccade, subsequent fixations on the word are shorter compared with an invalid preview condition. For other complex visual objects, such as faces, there is also evidence that pre-saccadic information can influence post-saccadic percepts (Melcher, 2005; Wolfe and Whitney, 2014) and facilitate the post-saccadic processing of the previewed stimulus (Edwards et al., 2018).
In terms of neural mechanisms, post-saccadic visual processing might be facilitated by neurons that change their tuning toward the “future receptive field” even before the eye movement occurs, a process called predictive remapping (Duhamel et al., 1992; Melcher and Colby, 2008; Melcher, 2011). This prediction signal might be the result of feedback connections between higher-level visual areas anticipating the post-saccadic responses in lower visual areas as well as feedforward connections transmitting information that was not predicted, i.e., the prediction errors (Srinivasan et al., 1982; Rao and Ballard, 1999; Clark, 2013). Together, these neural mechanisms might support trans-saccadic predictions.
There is converging evidence for a reduction in neural responses when a stimulus is predictable compared with when it is unexpected (for review, see de Lange et al., 2018). When looking at fixation-related brain potentials (fERPs), the behavioral preview benefits in reading are associated with a reduction of the evoked, word-specific neural response, an effect termed “preview positivity” (Dimigen et al., 2012; Kornrumpf et al., 2016). Importantly, in reading, preview positivity effects are much stronger when readers execute a saccade toward a word (active condition) compared with control conditions with passive extrafoveal stimulation (Kornrumpf et al., 2016). Along these lines, recent fMRI studies have shown a reduction in BOLD response when the features and location of a stimulus are consistent across a saccade (Dunkley et al., 2016; Zimmermann et al., 2016; Fairhall et al., 2017). These preview effects can be similarly explained by prediction mechanisms or by repetition suppression, i.e., the dampening of a neural signal when a stimulus category is viewed before (peripherally) and after (foveally) the eye movement. It is still debated whether repetition suppression effects represent a signature of prediction (Rostalski et al., 2019) or not (Tang et al., 2018).
An alternative view of visual stability focuses on the role of the spatial shift of attention toward the peripheral target before saccade execution (Hoffman and Subramaniam, 1995; Deubel and Schneider, 1996; Zhao et al., 2012; Buonocore et al., 2017), with this attentional shift playing a preeminent role (Mathôt and Theeuwes, 2011; Melcher, 2011). The key idea is that selective attention is already present at the beginning of the new fixation, leading to attentional facilitation of post-saccadic processing (for review, see Mathôt and Theeuwes, 2011). In contrast to prediction, which typically results in reduced evoked responses, selective attention tends to amplify neural responses (for review, see Thiele and Bellgrove, 2018). In the case of face stimuli, for example, selective attention enhances evoked responses in the electroencephalogram (EEG; Mohamed et al., 2009; Sreenivasan et al., 2009; Churches et al., 2010).
Testing whether there is a decrease in neural activity (due to prediction) versus an increase (due to attention) has, therefore, been suggested to be an important marker to differentiate between these two mechanisms (Kok et al., 2012; Spaak et al., 2016; de Lange et al., 2018). This raises the question of what happens in the case of preview effects with saccades. If the shift in attention plays a preeminent role in post-saccadic visual processing, then post-saccadic fixation-related ERPs would be expected to be larger in amplitude when a salient preview was available, due to the target receiving attentional enhancement. The aim of the current study was to investigate whether a peripheral preview of a face image would influence post-saccadic processing of that face and, if so, whether it would lead to an increase (attention) or reduction (via prediction or repetition suppression) of the neural response.
Materials and Methods
Participants.
Fifteen participants (10 females, age range: 20–31 years, M = 24.1) who reported no neurological or visual impairments were included in the data analysis. Two additional participants were recorded but had to be excluded based on their behavioral performance (i.e., excessive trial loss >60%; see Behavioral screening and analysis). The experiment was conducted in accordance with the Declaration of Helsinki (2008) and approved by the University of Trento Research Ethics Committee. Participants provided informed written consent and received a compensation of €10 per hour.
Apparatus.
Stimuli were presented on a 24-inch LED monitor (resolution: 1920 × 1080 pixels, subtending 44° × 25.9°) at a vertical refresh of 120 Hz. To reduce head movements, participants were seated with their head stabilized by a chin and forehead rest. The eyes were horizontally and vertically aligned with the center of the screen at a viewing distance of 63 cm. Eye movements were recorded with a video-based eye tracker (EyeLink 1000 with desktop mount; SR Research) at a sampling rate of 1000 Hz (detection algorithm: pupil and corneal reflection; thresholds for saccade detection: 30°/s velocity and 9500°/s² acceleration). A five-point calibration and validation of the eye tracker on a standard rectangular grid was run at the beginning of the experiment and whenever necessary during the experiment. Programs for stimulus presentation and data collection were written in MATLAB (MathWorks) using the Psychophysics Toolbox v3 (Brainard, 1997; Pelli, 1997) and EyeLink Toolbox extensions (Cornelissen et al., 2002). Participants' manual responses were recorded on a standard keyboard.
The EEG was recorded from 64 Ag/AgCl electrodes (Brain Products) placed at standard locations of the International 10-10 system. Signals were recorded with a time constant of 10 s and a high cutoff of 250 Hz, referenced online against the left mastoid, and digitized at a rate of 1000 Hz. The system was set up with a parallel port splitter so that trigger pulses were sent simultaneously to the EyeLink and EEG acquisition computers.
Procedure.
Participants were seated in a dimly lit room and then briefly familiarized with the task by the experimenter. Figure 1 illustrates the trial scheme. Participants started each trial by pressing the space bar while maintaining their gaze at a central fixation cross (0.5° wide, shown in white on a black background). One second after this button press, two circular placeholders (white rings, diameter 4°, line width 1 pixel) appeared to the left and right of the central fixation cross. Placeholders were centered at eccentricities of ±8° and indicated the positions of the upcoming preview stimuli. Once the eye tracker detected a stable fixation for 1000 ms within an area of 2° around the central fixation cross, the preview display was triggered. Depending on the condition, the preview display consisted either of two different scrambled faces (scrambled-face preview condition) or two different intact faces (intact-face preview condition) that appeared at the previous positions of the placeholders (Fig. 1, Preview). After 500 ms of preview, the fixation cross changed its color and turned either green or red, thereby cueing the participant to execute a saccade toward the left or right stimulus, respectively (Fig. 1, Saccade cue). Participants were instructed to respond as quickly and accurately as possible to the cue with a single saccade. Saccadic reaction times (SRTs) were defined as the interval between cue onset and the onset of the first saccade executed toward the peripheral target.
Trial scheme. At the beginning of each trial, participants fixated a central fixation cross for 1000 ms. Afterward, two placeholders appeared in the periphery at ±8° to the left and right of fixation (Placeholders). After 1000 ms, two preview stimuli appeared at the position of the placeholders for 500 ms (Preview). These stimuli could be either scrambled faces (blue outline) or intact faces (dashed green/pink outline). After the preview interval, the central cross turned either green (left) or red (right), thereby cueing the participant to execute a saccade toward the left or right placeholder, respectively (Saccade cue). During the saccade, the preview was first changed into a scrambled image patch for one display cycle (8.3 ms) to introduce a peri-saccadic transient in all three conditions (Transient). Afterward, the stimulus changed to the target face in all conditions (Target). The relationship between the preview stimulus and the target face yielded three conditions for the behavioral and fixation-related EEG analysis: a scrambled preview condition (blue outline), an incongruent preview condition (green outline; different face seen before and after saccade), and a congruent preview condition (pink outline; same face seen before and after saccade). Participants were asked to discriminate the gender (male/female) of the face visible after the saccade with a button press. Note that stimuli are not drawn to scale.
During the saccade, once gaze position crossed an invisible vertical boundary placed at a distance of 1° from the fixation cross, a scrambled version of the preview face (that was always different from those shown as previews in the scrambled-face preview condition) was transiently presented for just a single display cycle (8.3 ms; Fig. 1, Transient). The purpose of this gaze-contingent display change was to introduce an intra-saccadic visual transient in all experimental conditions, that is, also in the congruent-face preview condition in which the same face was presented before and after the saccade. After the transient was displayed, and still during the saccade, the preview stimulus always changed into an intact face (Fig. 1, Target).
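The logic of the gaze-contingent display change can be illustrated with a small frame-level sketch. This is not the authors' MATLAB/Psychophysics Toolbox code; the function name `frame_states` and the idea of sampling gaze once per display frame are simplifying assumptions made for illustration (a real implementation polls the eye tracker at 1000 Hz and has to account for display latency).

```python
def frame_states(gaze_x_per_frame, boundary_deg=1.0):
    """Return what is shown on each display frame.

    gaze_x_per_frame: horizontal gaze position (deg) relative to the
    fixation cross, one value per display frame (hypothetical input).
    """
    states = []
    crossed_at = None  # frame on which gaze first crossed the boundary
    for frame, x in enumerate(gaze_x_per_frame):
        if crossed_at is None and abs(x) > boundary_deg:
            crossed_at = frame
        if crossed_at is None:
            states.append("preview")
        elif frame == crossed_at:
            # Scrambled patch shown for exactly one display cycle.
            states.append("transient")
        else:
            states.append("target")
    return states


states = frame_states([0.0, 0.2, 1.5, 5.0, 8.0])
```

Because the transient occupies a single frame in every condition, the congruent-face condition also contains an intra-saccadic visual change, as described above.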
Participants then responded with a button press whether the face that they had landed on with their eyes was male or female. Responses were given with the index fingers of the left and right hand using two keyboard buttons. Manual reaction time (RT) was defined as the interval between the saccade-contingent presentation of the target face (triggered by the saccade toward the face) and the button press. With this methodology, any potential difference in SRTs was excluded from the computation of manual RTs.
The experimental design comprised three main conditions: scrambled-face preview, incongruent-face preview, and congruent-face preview (Fig. 1, Preview). Each condition comprised 160 trials, leading to a total of 480 trials. Conditions differed in terms of the stimulus shown before the saccade (preview stimulus). In the scrambled-face preview condition, the stimuli presented during the preview interval were scrambled faces. In contrast, in both the incongruent- and congruent-face preview conditions, the stimuli shown as previews were intact faces. After the saccade, participants always looked at a face as the target stimulus. This means that in the scrambled-face preview condition, the scrambled face shown as a preview changed into a face.
In the incongruent-face preview condition, the target face shown after the saccade was different from the preview face seen before the saccade (in this condition, the face shown at the irrelevant screen location opposite the cued saccade direction remained the same). This incongruent-face preview condition included both pure changes in facial identity without a change of the gender (i.e., male-to-different-male, female-to-different-female) as well as changes in both identity and gender (male-to-female, female-to-male) in equal proportion. Finally, in the congruent-face preview condition, the target stimulus was identical to the face presented at this position before the saccade. The face seen after the saccade was equiprobably male and female and the gender of the target face was counterbalanced with the preview condition.
Stimuli.
Forty-two grayscale images were selected from the Nottingham face database (http://pics.stir.ac.uk/zips/nottingham.zip), each showing a frontal view of a face (21 female, 21 male) with a neutral facial expression. To standardize the images and to reduce differences between the genders, a black mask with a circular aperture was applied to each face to cover the external facial features (e.g., hair; Fig. 1). The aperture was centered on the nose, spanned from the forehead to the chin, and subtended a diameter of 4° of visual angle at the viewing distance of 63 cm.
For each original face stimulus, we also generated a scrambled counterpart that was used as the pre-saccadic preview stimulus in the scrambled-face preview condition (see Procedure). For this purpose, we calculated the 2D Fourier transform of each face image and then added a matrix of random phase angles to the existing phase information of the image. We then performed an inverse Fourier transform, thereby preserving the original power spectrum of the image. The same circular aperture as for the intact faces was also applied to the scrambled images.
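The phase-scrambling procedure can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions rather than the authors' MATLAB code: the function name `phase_scramble` is hypothetical, the random phase matrix is derived from the Fourier transform of white noise (one common way to keep the result real-valued), and the phase offset of the mean (DC) term is set to zero so that mean luminance is unchanged. The source does not say how the random phases were generated.

```python
import numpy as np


def phase_scramble(image, rng):
    """Return a phase-scrambled copy of a 2D grayscale image."""
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Phases of the FFT of real-valued noise are antisymmetric, so adding
    # them keeps the spectrum conjugate-symmetric (real-valued image).
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    random_phase[0, 0] = 0.0  # assumption: leave the mean luminance untouched
    scrambled_spectrum = amplitude * np.exp(1j * (phase + random_phase))
    return np.real(np.fft.ifft2(scrambled_spectrum))


rng = np.random.default_rng(0)
face = rng.uniform(0, 255, size=(64, 64))  # stand-in for a face image
scrambled = phase_scramble(face, rng)
```

Only the phase spectrum is altered, so the amplitude (and hence power) spectrum of the scrambled image matches that of the original, which is the property the stimulus construction relies on.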
Finally, for each face image, we selected a second face stimulus that served as the saccade target in the condition with an incongruent preview as well as a third scrambled-face stimulus, which was used as a transient during the saccade. Specifically, to control for low-level differences between the face stimuli shown before and after the saccade, we randomly selected for each image another face stimulus from the pool of 42 face images, such that their difference in average image luminance (estimated via their RGB gray values) was <4% (i.e., difference <11 in 8-bit gray values) and not statistically significant (as confirmed by a one-way ANOVA). In addition, possible differences in image luminance between the stimulus shown before and after the saccade were also controlled by adding luminance as a predictor in the statistical analysis of the EEG (see Single-subject GLM).
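The luminance-matched random pairing can be sketched as follows. The function name `pick_partner` and the example gray values are hypothetical; only the criterion (a difference of fewer than 11 gray values on an 8-bit scale) comes from the text.

```python
import random


def pick_partner(mean_gray, index, rng, max_diff=11.0):
    """Randomly pick another image whose mean gray value is within max_diff."""
    candidates = [
        j for j, gray in enumerate(mean_gray)
        if j != index and abs(gray - mean_gray[index]) < max_diff
    ]
    if not candidates:
        raise ValueError("no luminance-matched partner available")
    return rng.choice(candidates)


rng = random.Random(0)
mean_gray = [100.0, 105.0, 130.0, 108.0]  # made-up mean gray values
partner = pick_partner(mean_gray, 0, rng)
```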
Behavioral screening and analysis.
In an initial analysis step, trials were screened for incorrect oculomotor behavior. Specifically, we removed all trials in which no saccade was executed toward either stimulus (0.1% of trials) or an eye blink occurred around the time of saccade execution (−200 to 600 ms around saccade onset; 1.1%). Furthermore, we removed trials in which the eyes deviated from the central fixation cross by >2° during the preview interval (1.9%), the saccadic reaction time was extremely short (<100 ms; 0.8%) or long (>530 ms; 19.6%), saccade amplitude was extremely small (<3°; 1.9%) or large (>10°; 2.6%), or in which the saccade went in the wrong direction (5.5%). Finally, we excluded trials in which the saccade-contingent display change was triggered prematurely by drift movements or microsaccades during the preview interval (0.2%) or in which the main saccade to the target was followed by a secondary saccade >3° within ≤150 ms (0.2%).
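The screening criteria listed above amount to a conjunction of simple checks per trial. The following sketch encodes them in Python; the `Trial` data class and its field names are hypothetical and do not reflect the authors' data format, whereas the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical field names; units follow the text.
    saccade_made: bool
    blink_near_saccade: bool            # blink within -200 to 600 ms of saccade onset
    max_fixation_deviation_deg: float   # during the preview interval
    srt_ms: float
    amplitude_deg: float
    correct_direction: bool
    premature_display_change: bool
    large_secondary_saccade: bool       # >3 deg within 150 ms of the main saccade


def keep_trial(t):
    """Return True if the trial passes all oculomotor screening criteria."""
    if not t.saccade_made or t.blink_near_saccade:
        return False
    if t.max_fixation_deviation_deg > 2.0:
        return False
    if not (100.0 <= t.srt_ms <= 530.0):
        return False
    if not (3.0 <= t.amplitude_deg <= 10.0):
        return False
    if not t.correct_direction:
        return False
    if t.premature_display_change or t.large_secondary_saccade:
        return False
    return True


good = Trial(True, False, 0.5, 250.0, 7.8, True, False, False)
slow = Trial(True, False, 0.5, 600.0, 7.8, True, False, False)
```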
Manual RTs and response accuracies in the gender discrimination task were then submitted to repeated-measures ANOVAs on the three-level factor Preview (the incongruent-face preview condition included both changes in gender, i.e., male-to-female and female-to-male, and changes in identity only, i.e., male-to-different-male or female-to-different-female). For the analysis of the button presses, trials with an extreme manual RT (<200 or >1000 ms) were ignored as outliers. Furthermore, one participant was dropped from the manual RT analysis because of very slow manual RTs and therefore too few remaining trials.
Electrophysiological data analysis.
For the electrophysiological analysis, the EEG was first synchronized with the eye-tracking channels based on the shared trigger pulses using the EYE-EEG toolbox (Dimigen et al., 2011). The synchronized EEG was then downsampled to 500 Hz, bandpass-filtered from 0.1 to 40 Hz (passband edges) using EEGLAB's (Delorme and Makeig, 2004) finite impulse response filter (pop_eegfiltnew.m) with default settings, and digitally re-referenced to an average reference. In the next step, ocular EEG artifacts were removed using an optimized eye-tracker-guided variant of Infomax ICA in EEGLAB. To optimize the ICA decomposition and the suppression of the myogenic spike potential peaking at saccade onset (Keren et al., 2010), the ICA was trained on a copy of the data high-pass filtered at 2 Hz (Winkler et al., 2015) in which EEG sampling points occurring around saccade onsets (−20 to +10 ms) were overweighted (Dimigen, 2020). The resulting unmixing weights computed on this high-pass filtered and optimized training data were then applied to the original unfiltered recording, and ocular components were automatically flagged using the eye-tracker-guided procedure by Plöchl et al. (2012) with the saccade–fixation variance ratio threshold set to 1.1.
Based on the trials with correct oculomotor behavior, we then extracted two sets of 1000-ms-long epochs (−300 to 700 ms) from the artifact-corrected EEG. The first set was cut around the onset of the preview stimuli on the screen [traditional event-related potential (ERP) average]. The second set was cut around the onset of the first fixation on the target face following the saccade (fERP average). To exclude segments with residual non-ocular artifacts, we removed all epochs containing peak-to-peak voltage differences >120 μV in any channel (2.3% of ERP and 2.8% of fERP epochs). Epochs were then baseline-corrected by subtracting the mean channel voltages in the 200 ms interval before stimulus/fixation onset, respectively.
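The epoch rejection and baseline correction steps can be sketched with NumPy. This is not the authors' EEGLAB pipeline; the function name `clean_epochs`, the array layout (epochs × channels × samples), and the simulated data are assumptions for illustration. The 120 μV criterion, the 500 Hz sampling rate, and the 200 ms pre-event baseline come from the text.

```python
import numpy as np


def clean_epochs(epochs, times_ms, ptp_limit_uv=120.0, baseline_ms=(-200.0, 0.0)):
    """Drop epochs with large peak-to-peak voltages, then baseline-correct.

    epochs: array of shape (n_epochs, n_channels, n_samples) in microvolts.
    Returns the cleaned epochs and a boolean mask of retained epochs.
    """
    ptp = epochs.max(axis=2) - epochs.min(axis=2)   # (n_epochs, n_channels)
    keep = (ptp <= ptp_limit_uv).all(axis=1)        # reject if any channel exceeds
    kept = epochs[keep]
    in_baseline = (times_ms >= baseline_ms[0]) & (times_ms < baseline_ms[1])
    baseline = kept[:, :, in_baseline].mean(axis=2, keepdims=True)
    return kept - baseline, keep


rng = np.random.default_rng(0)
times_ms = np.arange(-300, 700, 2.0)                 # 500 Hz, -300 to 700 ms
epochs = rng.normal(0.0, 5.0, size=(4, 3, times_ms.size))
epochs[1, 0, 250] += 200.0                           # simulated artifact
cleaned, keep = clean_epochs(epochs, times_ms)
```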
Single-subject GLM (first-level analysis).
Stimulus- and fixation-related potentials were analyzed using a mass univariate model (Smith and Kutas, 2015a) in which a GLM was fitted on each electrode and time point separately using the unfold toolbox (Ehinger and Dimigen, 2019). Analysis of EEG data with mass univariate models has advantages in terms of higher sensitivity (Rousselet et al., 2011; Smith and Kutas, 2015a) and makes it possible to control for the effects of continuous covariates on the waveform. For ERPs, the model only contained the intercept term and one categorical predictor coding whether the preview stimuli consisted of two scrambled (0) or two intact faces (1). For the fERP analysis, the predictors in the regression model were a three-level categorical predictor coding the type of preview shown before the saccade (scrambled, incongruent, congruent) as well as two continuous linear covariates: saccade amplitude and the preview-target luminance difference. Saccade amplitude (in degrees of visual angle) was added to the model because the size of the incoming saccade has a well established and strong influence on the amplitude of the post-saccadic neural response (Thickbroom et al., 1991; Dandekar et al., 2012). Including saccade amplitude as a nuisance variable in the model therefore controlled for the slight difference in incoming saccade amplitude (∼0.3°; see Results) between preview conditions. In addition, we also found that the fERP was modulated by the difference in mean luminance between the stimulus shown as preview and the post-saccadic target. The mean luminance difference between both stimuli was therefore also included as a continuous covariate.
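The structure of the fERP model can be sketched as an ordinary least-squares fit that is solved for all electrodes and time points at once. This is not the unfold toolbox, which is a MATLAB package; the function name `fit_mass_univariate`, the dummy coding with the scrambled preview as reference level, and the mean-centering of the covariates are assumptions made for this illustration.

```python
import numpy as np


def fit_mass_univariate(epochs, preview, sacc_amp, lum_diff):
    """Fit one GLM per channel and time point.

    epochs:   (n_trials, n_channels, n_times)
    preview:  integer codes, 0 = scrambled, 1 = incongruent, 2 = congruent
    Returns betas of shape (n_predictors, n_channels, n_times).
    """
    n_trials = epochs.shape[0]
    X = np.column_stack([
        np.ones(n_trials),              # intercept (scrambled preview)
        preview == 1,                   # incongruent vs scrambled
        preview == 2,                   # congruent vs scrambled
        sacc_amp - sacc_amp.mean(),     # centered saccade amplitude (assumption)
        lum_diff - lum_diff.mean(),     # centered luminance difference (assumption)
    ]).astype(float)
    Y = epochs.reshape(n_trials, -1)    # every channel/time point is one column
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas.reshape(X.shape[1], *epochs.shape[1:])


# Noise-free toy data with known effects, used only to check the fit.
rng = np.random.default_rng(1)
preview = np.repeat([0, 1, 2], 20)
sacc_amp = rng.uniform(7.0, 9.0, size=60)
lum_diff = rng.uniform(-5.0, 5.0, size=60)
trial_value = (2.0 + 3.0 * (preview == 1) + 3.5 * (preview == 2)
               + 0.5 * (sacc_amp - sacc_amp.mean()))
epochs = np.tile(trial_value[:, None, None], (1, 2, 4))
betas = fit_mass_univariate(epochs, preview, sacc_amp, lum_diff)
```

With this coding, `betas[1]` and `betas[2]` are the preview effects relative to the scrambled condition after adjusting for saccade amplitude and luminance difference.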
As a control analysis, we repeated our analysis of the fERP using a GLM-based linear deconvolution technique (also called continuous-time regression; Dandekar et al., 2012; Smith and Kutas, 2015b; Ehinger and Dimigen, 2019) that is also implemented in the unfold toolbox. In the current experiment, SRTs were ∼30 ms longer for the scrambled-face preview than for the intact-face preview conditions (see Results). This means that the temporal overlap between the ERP evoked by the onset of the saccade cue (red/green fixation cross) and the fERP evoked at saccade offset differed systematically between conditions, potentially biasing the results. GLM-based deconvolution allows us to control this overlapping activity by modeling the response to both types of events (cue and fixation onset) in the same statistical model. However, because the results were virtually identical to those obtained with the simpler mass univariate model, we only report the results of the latter here.
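The idea behind linear deconvolution can be illustrated with a toy example in which two overlapping responses (cue-locked and fixation-locked) are recovered from a continuous signal. This sketch is not the unfold implementation; the function names `time_expand` and `deconvolve`, the onset times, and the random response kernels are all made up for illustration.

```python
import numpy as np


def time_expand(onsets, n_samples, n_lags):
    """Design matrix with one column per post-event lag (stick functions)."""
    X = np.zeros((n_samples, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_samples:
                X[onset + lag, lag] += 1.0
    return X


def deconvolve(signal, onsets_by_event, n_lags):
    """Jointly estimate one response kernel per event type."""
    X = np.hstack([time_expand(o, len(signal), n_lags) for o in onsets_by_event])
    betas, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return betas.reshape(len(onsets_by_event), n_lags)


rng = np.random.default_rng(0)
n_lags, n_samples = 10, 200
true_kernels = rng.normal(size=(2, n_lags))       # row 0: cue, row 1: fixation
cue_onsets = [10, 60, 110, 160]
fix_onsets = [13, 65, 112, 168]                   # variable cue-to-fixation delay
design = np.hstack([time_expand(cue_onsets, n_samples, n_lags),
                    time_expand(fix_onsets, n_samples, n_lags)])
signal = design @ true_kernels.ravel()            # overlapping responses summed
estimated = deconvolve(signal, [cue_onsets, fix_onsets], n_lags)
```

The two responses are separable only because the delay between cue and fixation varies across trials; with a constant delay the corresponding columns of the design matrix would be collinear.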
Group statistics (second-level analysis).
Second-level statistical analyses were performed using the threshold-free cluster enhancement method (TFCE; Smith and Nichols, 2009; Mensen and Khatami, 2013), a permutation test (Maris and Oostenveld, 2007) that controls for multiple testing across electrodes and time points without the need to define an arbitrary cluster-forming threshold. Analyses were run using the MATLAB implementation of TFCE (http://github.com/Mensen/ept_TFCE-matlab) based on 2000 random permutations. For ERPs, we compared the response following an intact-face versus scrambled-face preview. For fERPs, we used the ANOVA variant of the TFCE algorithm, followed up by Bonferroni-corrected pairwise comparisons between the three preview conditions, again using the TFCE method. For visualization of the TFCE results in Figures 2 and 4, p values were thresholded at p < 0.05, p < 0.01, and p < 0.005.
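The general principle of a permutation test that corrects for multiple comparisons across electrodes and time points can be sketched as follows. Note that this is a deliberately simpler stand-in, a sign-flip test with a maximum-statistic correction, and not the TFCE algorithm used in the study: it omits the cluster-enhancement step entirely. The function name `max_stat_sign_flip_test` and the simulated data are hypothetical.

```python
import numpy as np


def max_stat_sign_flip_test(diff, n_perm=2000, seed=0):
    """Paired permutation test with max-statistic correction.

    diff: (n_subjects, n_channels, n_times) per-subject condition differences.
    Returns the observed t map and corrected p values of the same shape.
    """
    rng = np.random.default_rng(seed)
    n_sub = diff.shape[0]

    def t_map(d):
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n_sub))

    observed = t_map(diff)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Under the null hypothesis, the sign of each subject's difference
        # is exchangeable, so flip it at random per subject.
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1, 1))
        max_null[i] = np.abs(t_map(diff * signs)).max()
    max_null.sort()
    # Number of permutations whose maximum |t| reaches the observed |t|.
    n_ge = n_perm - np.searchsorted(max_null, np.abs(observed), side="left")
    p = (n_ge + 1) / (n_perm + 1)
    return observed, p


rng = np.random.default_rng(42)
diff = rng.normal(0.0, 1.0, size=(15, 2, 5))   # 15 simulated subjects
diff[:, 0, 2] += 3.0                           # one simulated true effect
t_obs, p = max_stat_sign_flip_test(diff, n_perm=500)
```

Comparing each observed statistic against the permutation distribution of the maximum across all electrodes and time points controls the family-wise error rate; TFCE follows the same permutation logic but first transforms the statistic map so that spatially and temporally extended effects are enhanced.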
Results
In the following, we first report the neural response evoked by the onset of the preview stimuli (ERP to the pair of intact vs scrambled faces). This is followed by an analysis of the behavior and fERP to the post-saccadic face stimulus.
Preview stimulus onset: evoked response (ERP)
The goal of this analysis was to ensure that our stimuli were effective in eliciting typical face-related ERP components. Figure 2A shows scalp-topographic difference maps for extrafoveal intact-face previews (i.e., 2 faces presented bilaterally at ±8° eccentricity) minus scrambled-face previews (2 scrambled faces presented at ±8°). Topographies are shown at three latencies after preview stimulus onset, corresponding to the peaks of the P1 (124 ms), N1 (226 ms), and P3 (350 ms) components. White dots in the scalp maps indicate electrodes that showed significant differences between intact- and scrambled-face previews at the given latency (in a pairwise TFCE-based t test). Grand-mean waveforms in Figure 2B show the stimulus-ERP elicited by the onset of the bilateral preview display, averaged across two occipitotemporal electrodes over the left (PO7) and right hemisphere (PO8).
ERPs aligned to the onset of the preview display. A, Topographic difference maps of intact-face previews minus scrambled-face previews for three latencies after stimulus onset that represent the peak latencies of the P1, N1, and P300, respectively. White dots represent electrodes that show significant differences between the two preview conditions in the TFCE statistic at this latency. B, Grand-mean stimulus-locked ERP, averaged over occipitotemporal electrodes PO7 and PO8 for intact-face previews (green/pink) and scrambled-face previews (blue). C, Results of the TFCE statistic comparing face- and scrambled-face previews at all time points and channels. For visualization, p values are thresholded at 0.05, 0.01, and 0.005 with different shades of blue.
At the earliest latencies, during the P1 component, there was not yet a clear difference between the ERP responses to the two types of stimuli (intact vs scrambled faces) besides a small cluster of activation at right frontocentral sites. However, in the following N1 time window, a strong bilateral negativity emerged at occipitotemporal electrode sites that was slightly larger over the right hemisphere, as is typical for N170 face effects (Eimer, 2011). Over frontocentral sites, the posterior N170 effect was accompanied by a corresponding “vertex-positive potential,” a broad positive potential generally taken to reflect the positive poles of a bilateral dipole pair generating the occipitotemporal N170 (Eimer, 2011). These results clearly show that the bilateral presentation of the face preview (dashed green/pink line) led to a markedly different evoked response than that of the scrambled-face images (blue line; Fig. 2B), with faces eliciting a much more pronounced occipitotemporal N170 component (Halgren et al., 2000; Hoshiyama et al., 2003; Deffke et al., 2007; Gao and Wilson, 2013). In contrast, only a smaller frontocentral cluster was observed during the earlier P1 component (Fig. 2A). With a peak at ∼226 ms, the N170 peaked ∼50 ms later than typically observed (Bentin et al., 1996). A likely reason for this delay is that the two face stimuli were presented bilaterally in the extrafoveal visual field, rather than in the fovea. The full matrix of TFCE p values depicted in Figure 2C shows that clusters of significant activation arose at ∼160 ms after stimulus onset, over both frontocentral and occipitotemporal areas. Although the difference between the intact-face and scrambled-face preview condition reached its maximum at 226 ms, this effect remained topographically stable and statistically significant throughout the entire stimulus-locked analysis period (i.e., until 600 ms after stimulus onset).
Preview effects: behavioral results
Figure 3 summarizes behavioral performance in the task. A first finding is that saccadic reaction times were affected by the preview condition: SRTs were ∼30 ms faster in trials with an intact compared with a scrambled-face preview (intact vs scrambled: t(14) = −4.673; p < 0.0004; Fig. 3, left). The same pattern was also reflected in saccade amplitudes, which were slightly larger (∼0.3°) when the preview was an intact rather than a scrambled face (t(14) = 8.259; p < 0.000001; Fig. 3, center). This pattern of results indicates that seeing a possible target stimulus, i.e., a face, in the periphery enhanced the preparation of the oculomotor response toward the target.
Behavioral results. The average saccadic reaction time (left panel), saccadic amplitude (center panel), and manual RT (right panel) for the scrambled-, incongruent-, and congruent-face preview condition, respectively. *p < 0.05. Error bars denote ±1 SEM.
For the gender discrimination task following the saccade, response accuracy was generally high (89% correct) and did not differ between preview conditions: F(2,26) = 0.475, p = 0.627. However, like SRTs, manual RTs for the button press depended strongly on the preview condition (main effect: F(2,26) = 8.535, p < 0.001), with numerically shorter RTs observed in the two conditions in which a congruent or an incongruent face was shown as a preview compared with the scrambled-face condition (Fig. 3, right). Bonferroni-corrected post hoc tests confirmed that congruent-face previews produced significantly shorter RTs than scrambled previews: t(13) = −3.802; p < 0.007 (Bonferroni). Importantly, this effect replicates the classic trans-saccadic preview benefit also observed with other types of stimuli, in particular words (Rayner, 1975). When the preview was an incongruent face, there was only a statistical trend for faster RTs compared with the scrambled-preview condition: t(13) = −2.546; p < 0.07 (Bonferroni). Manual RTs did not differ significantly between the congruent and incongruent preview condition. We also tested whether, within the incongruent-preview condition, manual RTs in the gender discrimination task differed according to whether the gender of the faces remained the same across the saccade (i.e., male-to-different-male and female-to-different-female change) or not (male-to-female and female-to-male change). Although participants responded numerically faster (by M = 15.2 ms) if the gender remained the same across the saccade, this difference was not significant (t(13) = 1.227, p = 0.2416), indicating that participants did not benefit more from correct-gender previews. For this reason, we did not differentiate between these sub-conditions in the subsequent analyses.
Together, these results replicate a robust trans-saccadic benefit for previewed human faces compared with a non-informative scrambled-preview condition. Both the initial oculomotor response toward the peripheral face as well as the subsequent foveal processing of the facial features (necessary for the gender discrimination task) were significantly enhanced if the extrafoveal preview provided before the saccade was also a human face, supporting the hypothesis of preview facilitation for the processing of face stimuli.
Preview effects: evoked response (fERP)
The main goal of the current study was to compare the fixation-related brain response elicited by the first direct fixation on the target face as a function of the extrafoveal information available during the preceding fixation: a scrambled face, a different person's face, or the same face. Figure 4 summarizes the fERP elicited by the first direct fixation on the target face after the end of the critical saccade. Figure 4A shows the topographic difference maps for the three contrasts at the peaks of the fixation-related P1 (106 ms), N1 (180 ms), and P3 (350 ms) components. Figure 4B shows the corresponding fERP waveforms, averaged again across occipitotemporal electrodes PO7 and PO8. Figure 4C and D present the corresponding statistical comparisons (TFCE) between the congruent- and the scrambled-face preview conditions (Fig. 4C) and between the incongruent- and the scrambled-face preview conditions (Fig. 4D).
Fixation-related potentials (fERP). A, Topographic difference maps for the congruent minus incongruent (top row), incongruent minus scrambled (middle row), and congruent minus scrambled (bottom row) preview conditions at three latencies after fixation onset on the target face. The latencies correspond to the P1, N1, and P3, respectively. B, Grand-mean fERP averaged across occipitotemporal electrodes PO7 and PO8 for the scrambled (blue), incongruent (green), and congruent (pink) face preview conditions. Note that the three conditions only differ in terms of the stimulus seen before the saccade, whereas the target face fixated at time 0 of this plot was the same in all three conditions. C, D, TFCE results for the pairwise comparison between the congruent- and scrambled-face preview conditions and between the incongruent- and scrambled-face preview conditions, respectively.
The first interesting observation is that when contrasting the activity following a congruent-face preview with that following an incongruent-face preview (Fig. 4A, top row), there was no sign of a significant difference across the entire scalp at any time point. In Figure 4A, second and third rows, we contrasted the activity of the incongruent- or congruent-face preview, respectively, against the scrambled-face preview. Unlike in the previous comparison, it is now evident that seeing an intact face rather than a scrambled-face stimulus in the periphery led to a completely different response pattern at the time of the new fixation, once the target face was foveated. Whereas the fixation-related P1 did not differ between conditions, the following N1 was strongly influenced by the type of preview visible during the preceding fixation. In particular, we report a strong attenuation of the fixation-related N170 in the conditions in which a congruent or incongruent face was visible before the saccade. This effect was more pronounced over the right hemisphere, with a corresponding negative pole over central frontal regions, consistent with the activation pattern observed for the ERP time-locked to stimulus onset (see previous section).
This is especially clear in the three waveforms for the electrodes PO7/PO8, where the incongruent- (green) and congruent- (pink) face preview conditions showed a strong reduction in the post-saccadic response at the time of the N170, i.e., a preview positivity effect, compared with the scrambled-face preview condition (Fig. 4B, blue). Figure 4C visualizes the p-value matrix for the contrast between the congruent-face and the scrambled-face preview conditions across the entire epoch. This plot suggests that the preview positivity began at ∼160 ms and persisted up to ∼300 ms after fixation onset. This was then followed by a later and weaker cluster of activation between ∼360 and 420 ms, which had a scalp topography similar to that of the initial N170 effect. In Figure 4D, we also report the TFCE p-value matrix for the contrast between the incongruent-face and the scrambled-face preview conditions. Landing on a different face from the one that was available during the preview led to an almost identical pattern of activation over all the electrodes for the entire epoch as in the congruent preview condition.
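To make the logic of the preview positivity concrete, the sketch below averages waveforms across the two occipitotemporal electrodes and then subtracts the scrambled-preview condition from the congruent-preview condition. All amplitude values are invented for illustration (they are not data from this study), and the function names are hypothetical.

```python
def channel_average(*channels):
    """Average several equally long waveforms sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]

def difference_wave(cond_a, cond_b):
    """Sample-wise difference cond_a minus cond_b."""
    return [a - b for a, b in zip(cond_a, cond_b)]

# Hypothetical amplitudes (microvolts) at three samples: P1, N170, later window
scrambled_po7 = [1.0, -4.0, -2.0]
scrambled_po8 = [1.0, -6.0, -2.0]
congruent_po7 = [1.0, -1.0, 0.0]
congruent_po8 = [1.0, -3.0, 0.0]

scrambled = channel_average(scrambled_po7, scrambled_po8)
congruent = channel_average(congruent_po7, congruent_po8)

# A smaller (less negative) N170 after a face preview yields a positive difference
preview_effect = difference_wave(congruent, scrambled)
print(preview_effect)
```

In this toy example the difference is zero at the first sample and positive at the later two, mirroring the pattern in which the P1 is unaffected while the N170 is attenuated after an intact-face preview.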
In a follow-up analysis, we also tested whether the preview positivity effect in fERPs was modulated by saccadic response time. For this purpose, the mass univariate model was expanded to include an additional predictor dummy-coding whether the SRT in the trial was below (0) or above (1) the participant's median SRT and this predictor was allowed to interact with Preview. The TFCE statistic provided no evidence for a significant interaction (all p values > 0.05).
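The dummy-coded SRT predictor described above can be sketched as a per-participant median split. This is only an illustration: the SRT values are invented, the function name is hypothetical, and coding trials exactly at the median as 0 is an assumption, since the treatment of ties is not specified.

```python
from statistics import median

def median_split(srts):
    """Code each trial 0 if its SRT is at or below the participant's median
    (ties coded 0 by assumption) and 1 if it is above the median."""
    m = median(srts)
    return [1 if srt > m else 0 for srt in srts]

# Hypothetical saccadic response times (ms) for one participant
srts = [180, 200, 220, 240]
srt_slow = median_split(srts)
print(srt_slow)
```

The resulting 0/1 vector would then enter the model as an additional predictor that is allowed to interact with Preview.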
Discussion
Face and object recognition have traditionally been studied by flashing stimuli to the central visual field during fixation. In contrast, natural vision typically affords an extrafoveal preview of soon-to-be fixated items before they are brought into the fovea by a saccade. Here we show that the extrafoveal preview of a face stimulus leads to a strong reduction in the post-saccadic evoked response compared with a control condition in which a meaningful preview was withheld by scrambling its spectral phase. In particular, the N170 component, which is classically linked to the structural processing of faces, was substantially reduced in trials with a face preview compared with those with a scrambled preview. These results are consistent with a “preview benefit” (i.e., reduction) in the evoked response as previously observed for visual words (Dimigen et al., 2012) and, more generally, with the notion that information about the saccadic target can influence post-saccadic processing (Edwards et al., 2018; Ehinger et al., 2015; for review, see Melcher and Colby, 2008; Melcher and Morrone, 2015).
Given that face processing and visual word recognition both involve highly specialized processing streams, it is perhaps not surprising that the timing and magnitude of our effects differ from the preview positivity effect in reading (Dimigen et al., 2012; Kornrumpf et al., 2016). For faces, the effect peaked earlier than that reported previously for words, was two to three times larger at occipitotemporal channels and also lateralized differently, with a stronger effect over the right (rather than left) hemisphere.
Current theories suggest that face processing involves several stages of neural processing that differ in terms of their feature-selectivity, neural substrate, and associated ERP components. The occipital face area (OFA) has been implicated in processing parts of faces, such as eyes or mouth. The P100 shows similar modulations, suggesting a link with OFA (Pitcher et al., 2007; Sadeh et al., 2010). The fusiform face area (FFA) is associated with configural or holistic face processing and linked to the N170 (Halgren et al., 2000; Hoshiyama et al., 2003; Deffke et al., 2007; Gao and Wilson, 2013). Here we found preview effects only after the end of the P1 component, ∼160 ms after fixation onset, which is consistent with an effect at the level of structural encoding of the face (Sadeh et al., 2010). Likewise, the lack of difference between congruent and incongruent previews is consistent with processing at the level of facial configuration rather than specific local features.
Beyond the N170 effect, a relative positivity for intact-face previews persisted throughout the later parts of the fERP epoch (as well as in the stimulus-onset ERP), which in the context of face stimuli might be associated with processing of dynamic facial expressions in the superior temporal sulcus (Itier and Taylor, 2004; Sadeh et al., 2010; Dalrymple et al., 2011). However, interpreting the pattern of activation in terms of brain regions is complicated. For example, N170 generators potentially involve a larger network than the FFA alone, such as the superior temporal sulcus (Henson et al., 2003). Moreover, preview faces in our study were presented in the periphery as part of a bilateral pair of stimuli, which may have influenced the timing of the evoked responses.
Our experimental design was motivated by the logic of distinguishing between an increase in neural activity, due to attentional enhancement, and a reduction due to prediction, i.e., an expectation driven by eye movement preparation about which information will be present at the future receptive field location (Kok et al., 2012; Spaak et al., 2016; de Lange et al., 2018). Although the exact mechanisms that cause this reduction in neural activity remain a matter of debate, reduced neural response can be considered a hallmark of prediction. In contrast, spatial attention allocated to the saccade target before the movement (Hoffman and Subramaniam, 1995; Deubel and Schneider, 1996; Deubel, 2008; Zhao et al., 2012; van Koningsbruggen and Buonocore, 2013; Buonocore et al., 2017) should be associated with an enhancement of the P1 and N1 components of the fixation-related ERP (Eimer, 2000; Mohamed et al., 2009; Sreenivasan et al., 2009; Churches et al., 2010; Meyberg et al., 2015). Although attention is deployed to the saccade target before the execution of the eye movement, our results clearly show a reduction, rather than enhancement, of the N170 component when a face preview was available.
The timing and topography of the effects observed here argue against a role for surprise, or a change in context, because such effects are more typically reflected in the later centroparietal P3 component (Sutton et al., 1965; Duncan-Johnson and Donchin, 1977; Donchin, 1981). Also, there was a face present after the saccade on every trial, so the face was never a “surprise” in that sense. In contrast, the relatively larger N1 in the scrambled-face preview condition might share some features with the visual mismatch negativity (Stefanics et al., 2014; Kornrumpf et al., 2016). Specifically, one could argue that in the scrambled-face preview condition (i.e., the condition where the prediction is not matched) the fixation of the face might lead to a “mismatch” between prediction and stimulus compared with a reduced or absent mismatch negativity in the conditions with an intact-face preview.
Interestingly, the pattern of effects observed in our experiment also resembles another well-known effect reported in the ERP literature, repetition suppression (Kovács et al., 2006; Maurer et al., 2008; Kloth et al., 2010; Amihai et al., 2011). A number of previous studies have found a reduction of the N170 when a face is preceded by another face stimulus in foveal vision (for review, see Schweinberger and Neumann, 2016). It is therefore important to relate these foveal findings to the results of our trans-saccadic viewing paradigm. One interpretation of our results, following the repetition suppression literature, is that a “face detector” mechanism (Schweinberger and Neumann, 2016) would activate whenever a face appears in the periphery. In both the congruent and the incongruent preview conditions, the face detector would then be suppressed when another face is foveated post-saccadically, leading to an attenuated N170.
Along these lines, previous studies using fMRI have shown a reduction in BOLD response when a stimulus was stable (i.e., its features and screen position remained constant) across a saccade (Dunkley et al., 2016; Zimmermann et al., 2016; Fairhall et al., 2017). In these studies, following a relatively long adaptation interval (e.g., 2 s), a saccade moved the stimulus from one hemifield (and, therefore, brain hemisphere) to the other. The results have been interpreted as a transfer of spatiotopic adaptation between retinotopic visual areas that is driven by an active mechanism linked to saccade execution (Zimmermann et al., 2016). The lack of a difference between incongruent- and congruent-face previews would be consistent with these findings.
Our results add to this literature by demonstrating that the ERP effect can be measured across a saccade and with both faces at different retinotopic locations. They also add to the fMRI findings by showing that the effect develops rapidly, i.e., within 200 ms of the new fixation, after a comparatively brief preview interval. Note, however, that a lack of difference between the incongruent and congruent face preview might also be expected under a prediction mechanism. In our design, participants were likely unable to extract detailed local features from the preview faces appearing at 8° eccentricity. This is also supported by our finding that behavioral performance in the gender classification task did not improve if the gender of the preview face was correct (see Results, Preview effects: behavioral results). If the predictions about the post-saccadic input were only based on coarse configural information, both preview conditions would also not be expected to differ under a prediction view.
In practice, distinguishing whether the reduction in neural signal is due to repetition suppression or prediction might be difficult. There is an open debate about whether repetition suppression effects are a signature of a prediction error (Rostalski et al., 2019) or whether the two are separable processes (Tang et al., 2018). Some support for trans-saccadic prediction is reported in a recent study showing both a congruency benefit in behavior and reduced evoked responses when the orientation of a face, rather than just the presence of a face, was maintained across the saccade (Huber-Huber et al., 2019). Additional work is therefore necessary to directly test which of the different effects reported in our current study are driven by prediction and which can be explained by a mere trans-saccadic repetition of a face stimulus.
More generally, we demonstrate that fixation-related ERPs elicited by faces show an N170 component that is broadly similar to that traditionally observed for a sudden stimulus onset (Soto et al., 2018). Given that face perception in real life typically involves a face brought into the fovea from the periphery, rather than a face appearing out of nowhere, it is important to show the ecological validity of such category-specific components. Overall, the current study provides a proof-of-concept for the usefulness of the fERP paradigm for studying visual stability. Follow-up studies could use this technique to investigate the mechanisms underlying trans-saccadic perception and to distinguish between competing theories of how our impression of the visual world remains stable and continuous.
Footnotes
This work was supported by a grant from the National Institute of Mental Health (R21MH117787) to D.M.
The authors declare no competing financial interests.
Correspondence should be addressed to Antimo Buonocore at antimo.buonocore{at}cin.uni-tuebingen.de