Proc Natl Acad Sci U S A. 2023 Feb 21;120(8):e2212735120. doi: 10.1073/pnas.2212735120. Epub 2023 Feb 14.

Encoding of dynamic facial information in the middle dorsal face area

Zetian Yang et al. Proc Natl Acad Sci U S A. 2023.

Abstract

Faces in motion reveal a plethora of information through visual dynamics. Faces can move in complex patterns while transforming facial shape, e.g., during the generation of different emotional expressions. While motion and shape processing have been studied extensively in separate research enterprises, much less is known about their conjunction during biological motion. Here, we took advantage of the discovery in brain-imaging studies of an area in the dorsal portion of the macaque monkey superior temporal sulcus (STS), the middle dorsal face area (MD), with selectivity for naturalistic face motion. To gain mechanistic insights into the coding of facial motion, we recorded single-unit activity from MD, testing whether and how MD cells encode face motion. The MD population was highly sensitive to naturalistic facial motion and facial shape. Some MD cells responded only to the conjunction of facial shape and motion, others were selective for facial shape even without movement, and yet others were suppressed by facial motion. We found that this heterogeneous MD population transforms face motion into a higher dimensional activity space, a representation that would allow for high sensitivity to relevant small-scale movements. Indeed, we show that many MD cells carry such sensitivity for eye movements. We further found that MD cells encode motion of head, mouth, and eyes in a separable manner, requiring the use of multiple reference frames. Thus, MD is a bona fide face-motion area that uses highly heterogeneous cell populations to create codes capturing even complex facial motion trajectories.

Keywords: fMRI-targeted electrophysiology; face-processing systems; facial motion; gaze; superior temporal sulcus.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Recordings of MD cells’ responses to dynamic stimuli reveal selectivity to natural face motion. (A) Schematic of electrophysiological recording in MD. MD: the middle dorsal face area; STS: superior temporal sulcus. (B) Schematic of movies used to test face motion processing in MD. “Natural face” movies were 600-ms videos (25 fps) of monkey facial movements. In the two “down-sampled” conditions, videos were down-sampled to 5 fps and 8.3 fps, respectively. “Jumbled” movies were created by randomizing the frame order of the natural movies. Slowed-down and sped-up videos were played at speeds three times slower or faster than the natural movies, respectively. In “dephased” conditions, we scrambled the phase of the video content by applying the same randomization matrix to the phase component of each frame, which destroyed shape but retained motion. “Object” movies were videos (25 fps) of man-made object movements. Numbers in the lower right corner of each image indicate the corresponding frames in the natural face condition. See also Movie S1 for example videos. (C) Spike density function to movies (Top) and static images (Bottom) of an example face-motion-selective cell. The averaged response to all stimuli within each condition is shown. Gray band indicates the time period of stimulus presentation. Minor ticks on the X-axis indicate the duration of each movie frame (40 ms). Firing rates (Hz) are color coded. Time-averaged response (baseline-subtracted) is shown on the right. Error bars represent mean ± SEM across trials within each category. (D) Mean population responses of all recorded MD cells to dynamic stimuli. **P < 10⁻⁴, ***P < 10⁻¹⁰, ns: P > 0.05 (Wilcoxon signed-rank test, false discovery rate corrected). Normalization was computed by first subtracting baseline activity and then dividing by the maximum absolute response magnitude across all stimuli. Error bars represent mean ± SEM across cells. (E) Distribution of dynamic face selectivity indices of the entire population, calculated from responses to natural face and object movies. Dashed line at 0.33, corresponding to a response twice as high to face movies as to object movies.
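For illustration, a minimal Python sketch of the conventions described for panels D and E, assuming the selectivity index takes the standard contrast form (R_face − R_object)/(R_face + R_object); under that assumption, a response twice as high to face movies as to object movies gives an index of 1/3 ≈ 0.33, matching the dashed line.

import numpy as np

def normalize_responses(responses, baseline):
    # Panel D convention: subtract baseline activity, then divide by the
    # maximum absolute response magnitude across all stimuli.
    r = np.asarray(responses) - baseline
    return r / np.max(np.abs(r))

def selectivity_index(r_face, r_object):
    # Assumed contrast form of the dynamic face selectivity index (panel E).
    return (r_face - r_object) / (r_face + r_object)

# A face-movie response twice the object-movie response yields an index of ~0.33.
print(selectivity_index(r_face=20.0, r_object=10.0))  # 0.333...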
Fig. 2.
MD shows highly heterogeneous responses to facial motion. (A) Spike density function to movies (Top) and static images (Bottom) of an example face-motion-suppressive cell. Gray band indicates the time period of stimulus presentation. Minor ticks on the X-axis indicate the duration of each movie frame (40 ms). Firing rates (Hz) are color coded. Time-averaged response (baseline-subtracted) is shown on the right. Error bars denote mean ± SEM. (B) Distribution of face motion selectivity indices (FMSIs) across all face-selective cells. Proportions of cell tuning types are shown in the inset pie chart. FMSIs were calculated based on the averaged response to natural face videos. The triangle at the top marks the mean FMSI. FM+: face motion preferred, FM−: face motion suppressed. Red and blue denote FM+ and FM− cells, respectively, and gray denotes cells that are not tuned. (C) Scatter plot of SFSI versus FMSI for all face-selective cells. SFSIs were calculated based on the averaged response to monkey face images. Triangle and square mark the example cells in Figs. 1C and 2A, respectively. Color coding as in B.
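The exact FMSI and SFSI formulas are not given in this caption; the sketch below illustrates, under the assumption of a generic dynamic-versus-static contrast index and a rank-sum tuning test, how cells could be sorted into the FM+, FM−, and untuned groups of panel B. All variable and function names are hypothetical.

import numpy as np
from scipy.stats import ranksums

def contrast_index(r_dynamic, r_static):
    # Hypothetical stand-in for the FMSI/SFSI contrast between mean responses.
    return (r_dynamic - r_static) / (np.abs(r_dynamic) + np.abs(r_static))

def classify_cell(trials_dynamic, trials_static, alpha=0.05):
    # Label a cell FM+ / FM- / untuned from single-trial responses (illustrative only).
    fmsi = contrast_index(np.mean(trials_dynamic), np.mean(trials_static))
    _, p = ranksums(trials_dynamic, trials_static)
    if p >= alpha:
        return "untuned", fmsi
    return ("FM+" if fmsi > 0 else "FM-"), fmsi

rng = np.random.default_rng(0)
print(classify_cell(rng.normal(15, 3, 20), rng.normal(8, 3, 20)))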
Fig. 3.
MD captures physical similarities of face motion and encodes facial dynamics by dimensionality expansion. (A) Example movie frame (Top) and the estimated optic flow (orange arrows) overlaid on the frame (Bottom). (B) Schematic of dimensionality reduction analyses on optic flow and neural responses. First three PCs are shown for demonstration. (C) For face motion, distances between movie trajectories in optic-flow space were correlated with distances in the neural space (r: Pearson correlation). (D) For object motion, distances between movie trajectories in optic-flow space were not correlated with distances in the neural space. (E and F) Distribution of variance explained by first ten PCs in optic-flow and neural spaces for face motion (E) and object motion (F). Broader distribution suggests higher dimensionality. In E and F, P values are from Kolmogorov–Smirnov tests.
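A minimal sketch of the analysis logic in panels B–F, under simplifying assumptions: each movie yields a (time × feature) trajectory in optic-flow space and in neural space, pairwise distances between PCA-reduced trajectories are compared with a Pearson correlation, and dimensionality is compared via the distribution of variance explained by the first ten PCs (Kolmogorov–Smirnov test). Variable names are hypothetical.

import numpy as np
from itertools import combinations
from scipy.stats import pearsonr, ks_2samp
from sklearn.decomposition import PCA

def trajectory_distances(trajectories, n_pcs=10):
    # trajectories: list of (time x feature) arrays, one per movie, equal length.
    X = np.concatenate(trajectories, axis=0)          # stack all time points
    pca = PCA(n_components=n_pcs).fit(X)
    reduced = [pca.transform(t) for t in trajectories]
    dists = [np.mean(np.linalg.norm(a - b, axis=1))   # mean frame-by-frame distance
             for a, b in combinations(reduced, 2)]
    return np.array(dists), pca.explained_variance_ratio_

# flow_feats, neural_feats: hypothetical per-movie (time x feature) arrays.
# d_flow, v_flow = trajectory_distances(flow_feats)
# d_neur, v_neur = trajectory_distances(neural_feats)
# r, p = pearsonr(d_flow, d_neur)       # panels C/D: distance correlation
# ks, p_ks = ks_2samp(v_flow, v_neur)   # panels E/F: broader spread = higher dimensionality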
Fig. 4.
MD is sensitive to small movements of the eyes. (A) Example stimuli of dynamic and static gaze conditions with condition labels. Dynamic stimuli were gaze shifts (indicated by arrows) between two static gaze conditions. Dynamic conditions are labeled as “head orientation: first gaze direction -> second gaze direction”. Static conditions are labeled as “head orientation: gaze direction”. L: left, F: front, R: right. (B) Poststimulus spike density functions of an example cell showing tuning to gaze shifts. The cell showed strong tuning to the four gaze shifts within the frontal face (Left) but responded similarly to the three static gaze directions (Right). Shading of each line denotes mean ± SEM across trials. Gray band illustrates the time period of stimulus presentation. (Top Left) Temporal structure of an example gaze shift: one frame (40 ms) of the first static gaze followed by 160 ms of the second static gaze. (Top Right) Temporal structure of an example static gaze stimulus. (C) Distribution of face-selective cells modulated by gaze shifts. All face-selective cells (N = 141) were included. (D) Population-averaged response to the best dynamic gaze stimulus of each cell and to the two composing static gaze stimuli. The 40 dynamic-gaze-modulated cells were included. Shading of each line denotes mean ± SEM across cells. (E and F) Peak response to the best dynamic gaze plotted against that to the first (E) or second (F) composing static gaze across the 40 dynamic-gaze-modulated cells. P values are from Wilcoxon signed-rank tests on cell responses at the population level. Red denotes cells that showed a significantly higher response to gaze shifts than to the corresponding static gaze at the single-cell level (right-tailed rank-sum tests on time-averaged single-trial responses). (G) Hypothesized neural dynamics for gaze shifts: a gaze shift could induce a transition of the neural state from the trajectory evoked by the first static gaze to the trajectory evoked by the second static gaze. Open circles denote starting states. (H) Neural population trajectories of gaze-modulated cells for one example gaze-shift condition (red) and the two composing static gaze conditions, shown in the neural space of the first three PCs. Open circles denote starting states and filled dots indicate 40-ms intervals. The analysis was based on responses of the 49 gaze-modulated cells (modulated by either dynamic or static gaze). See SI Appendix, Fig. S2 for results of all other gaze-shift conditions.
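The statistical comparisons described for panels E and F can be sketched as follows, assuming time-averaged single-trial firing rates and per-cell peak responses as input; the data in this example are simulated and the function names are hypothetical.

import numpy as np
from scipy.stats import ranksums, wilcoxon

def single_cell_gaze_shift_test(trials_dynamic, trials_static):
    # Right-tailed rank-sum test: does this cell respond more strongly to the
    # gaze shift than to the composing static gaze (single-cell level)?
    _, p = ranksums(trials_dynamic, trials_static, alternative="greater")
    return p

def population_gaze_shift_test(peak_dynamic, peak_static):
    # Wilcoxon signed-rank test across cells on paired peak responses (population level).
    _, p = wilcoxon(peak_dynamic, peak_static)
    return p

rng = np.random.default_rng(1)
peak_dyn = rng.normal(25, 5, 40)              # simulated peak responses of 40 cells
peak_sta = peak_dyn - rng.normal(3, 2, 40)    # simulated weaker static-gaze responses
print(population_gaze_shift_test(peak_dyn, peak_sta))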
Fig. 5.
MD encodes face motion through multiple reference frames. (A) Workflow for motion energy estimation. For each natural face video in our dynamic stimulus set (12 videos in total, same as in Figs. 1 and 2), facial landmarks were tracked using DeepLabCut (23, 24). ROIs were then created based on the tracked landmarks to capture global head motion and motion of the eyes and mouth. Motion energy of the head was estimated by the magnitude of the averaged optic-flow vector (Bottom) within the head ROI. Within the eye and mouth ROIs, the head optic-flow vector was first subtracted to obtain head-referenced motion vectors, after which the vector magnitudes were averaged to obtain the energy values for the eyes and mouth. See Movie S3 for full results of motion estimation. (B) Averaged motion energy of the head, eyes, and mouth across all 12 natural face movies. Error bars indicate mean ± SEM across all video frames. *P < 0.05, ***P < 10⁻²² (Wilcoxon signed-rank test, FDR corrected). (C) Face-motion sensitivity matrices for cells showing sensitivity to global head motion (Leftmost), to face-part motion of the eyes or mouth (Middle three panels), and to motion of both the head and the eyes or mouth (Rightmost). All FM+ cells (N = 80, Fig. 2B) were included in the analyses. Only cells with significant sensitivity are shown. The color code shows the neural sensitivity (standardized regression coefficients) of each cell to each motion energy. Overlaid stars mark significant sensitivity. *P < 0.05, **P < 0.01, ***P < 0.001 (t test). (D) Z-scored neural response traces (gray) from three example motion-sensitive cells. Overlaid are linear regression model fits using the energy of head motion (red), mouth motion (light blue), or eye motion (dark blue). Cell responses to all 12 face videos are concatenated. The correlation between each model fit and the neural trace is shown using the same color code as the lines. (E) Schematic of the control analysis for eye-motion sensitivity. The fixed-eye ROIs were fixed rectangles at the averaged position of the original eye ROIs. (F) Sensitivity to fixed-eye motion energy for the nine eye-motion-sensitive cells in C. Conventions as in C. No eye-motion-sensitive cell showed significant sensitivity to fixed-eye motion.
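A minimal sketch of the motion-energy and sensitivity computations described in panels A, C, and D, assuming per-frame dense optic-flow fields (e.g., from an external optic-flow estimator) and boolean ROI masks; all variable and function names are hypothetical.

import numpy as np

def head_motion_energy(flow, head_mask):
    # Magnitude of the mean optic-flow vector within the head ROI for one frame.
    # flow: (H, W, 2) optic-flow field; head_mask: boolean (H, W) ROI mask.
    return np.linalg.norm(flow[head_mask].mean(axis=0))

def part_motion_energy(flow, part_mask, head_mask):
    # Head-referenced energy for an eye/mouth ROI: subtract the mean head vector,
    # then average the magnitudes of the residual vectors.
    residual = flow[part_mask] - flow[head_mask].mean(axis=0)
    return np.linalg.norm(residual, axis=1).mean()

def sensitivity(neural_trace, energies):
    # Standardized regression coefficients of a z-scored response trace on
    # z-scored motion-energy regressors (e.g., head, eye, and mouth columns).
    Z = (energies - energies.mean(axis=0)) / energies.std(axis=0)
    X = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(X, neural_trace, rcond=None)
    return beta[1:]  # drop the intercept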
