Abstract
Humans are unique in developing large lexicons as their communication tool. To achieve this, they are able to learn new words rapidly. However, the neural bases of this rapid learning, which may be an expression of a more general cognitive mechanism, are not yet understood. To address this, we exposed our subjects to familiar words and novel spoken stimuli in a short passive perceptual learning session and compared automatic brain responses to these items throughout the learning exposure. Initially, we found enhanced activity for known words, indexing the ignition of their underlying memory traces. However, after just 14 min of learning exposure, the novel items exhibited a significant increase in response magnitude, matching in size the response to real words. This activation increase, we propose, reflects rapid mapping of new word forms onto neural representations. Similar to familiar words, the neural activity subserving rapid learning of new word forms was generated in the left-perisylvian language cortex, especially anterior superior-temporal areas. This first report of a neural correlate of rapid learning suggests that our brain may effectively form new neuronal circuits online as it gets exposed to novel patterns in the sensory input. Understanding such fast learning is key to the neurobiological explanation of the human language faculty and learning mechanisms in general.
Introduction
In childhood, a large vocabulary of words is learnt rapidly with multiple words acquired daily; similar processes can take place later in life in second language acquisition. Dubbed “fast mapping” (Carey and Bartlett, 1978), rapid word learning has been explored in numerous behavioral studies over the recent decades (Heibeck and Markman, 1987; Gershkoff-Stowe and Hahn, 2007). While claims of learning just after a single exposure (Dollaghan, 1985) remain controversial, less so are suggestions that some dozens of trials may be sufficient for successful learning with further gain still possible through application of a behavioral routine containing up to 150 exposures within a short learning session (Pittman, 2008). Even more interestingly, while some experiments prompted suggestions of a unique human word-learning mechanism (Waxman and Booth, 2000), others have argued that rapid word learning may efficiently exploit general neurobiological learning mechanisms that are not necessarily language-specific (Markson and Bloom, 1997; Bloom, 2002) and may even be shared with other species (Kaminski et al., 2004).
Despite a substantial body of behavioral research, neural indexes of such rapid learning have not yet been identified and its implementation at the brain level remains obscure. Until now, most neuroscience research into long-term memory trace formation and language learning has concentrated on investigating learning-induced changes in the brain over days or weeks of practice and on overnight memory consolidation (for review, see Davis and Gaskell, 2009). In the present study, however, we look into plastic changes occurring in the brain within minutes of passive perceptual exposure to novel spoken stimuli.
To record activation of the putative emerging neural memory trace and compare it to a preexisting neural circuit, we used a passive oddball paradigm and presented our experimental participants with previously unfamiliar but phonologically regular meaningless “pseudo-word” and familiar word stimuli. Earlier research revealed that, given that stimuli are matched for acoustic and phonological features, passively presented oddball stimuli produce larger event-related brain responses for known words than unknown pseudo-words at 100–200 ms after the acoustic information is sufficient for identifying stimulus words (for review, see Pulvermüller and Shtyrov, 2006; Shtyrov and Pulvermüller, 2007). This activity enhancement was attributed to activation of neuronal circuits distributed over left-perisylvian areas which may serve as cortical memory traces of familiar language items (Näätänen et al., 1997; Pulvermüller and Shtyrov, 2006). As previous behavioral results (Pittman, 2008) suggested that performance on newly acquired items is improved by exposing the subjects to up to 150 trials in a short learning session, we presented our subjects with 160 trials of the novel word-form as an oddball stimulus in a short (∼14 min) auditory exposure session using a matched known word as control. Whole-head high-density electroencephalography was applied to map neuronal mass activity with high temporal resolution (Nunez and Srinivasan, 2006). Most importantly, contrary to the common practice of averaging activity over numerous trials, we scrutinized minute changes in the neural dynamics throughout the session.
Materials and Methods
Sixteen healthy right-handed native British English speakers participated in the experiments. They were seated in an electrically and acoustically shielded chamber while their electroencephalogram was recorded using a whole-scalp 65-channel set-up (Compumedics Neuroscan).
During the recording, spoken linguistic stimuli were presented binaurally at 50 dB above individual hearing threshold. Previous neurophysiological research showed that activation of individual word memory traces may be recorded as an early increased negative potential when linguistic material is presented passively in pseudorandom oddball sequences (Pulvermüller and Shtyrov, 2006). Notably, these word-specific potentials are generated automatically, in the absence of attention to the stimuli or stimulus-related tasks (Näätänen et al., 1997). Another advantage of the oddball design is that it allows one to incorporate identical acoustic contrasts in different stimulus combinations, thus fully controlling the contribution of acoustic-phonetic stimulus features to the brain response and making it possible to link minute differences in event-related potentials (ERPs) to psycholinguistic properties. Finally, this design allows one to specify precisely the time when the rare oddball stimulus diverges from its frequent competitors and when the phonological contrast can be perceived, thus allowing for stimulus recognition; in this way, it becomes possible to time exactly the brain responses linked to potential memory trace activation (Shtyrov and Pulvermüller, 2007). Given these advantages of the oddball response and previous behavioral linguistic research indicating that word learning reaches a plateau at ∼150 repetitions within ¼ h (Pittman, 2008), we presented our experimental subjects with a novel spoken pseudo-word (pite, [pait]) 160 times over ∼14 min as rare oddball stimuli, randomly interspersed (stimulus onset asynchrony 850 ms) with 660 standard filler stimuli (pipe, [paip]), in a passive task. In a second control block, an acoustically similar word (bite, [bait]) was presented against the background of a frequent stimulus (bipe, [baip]) using identical experimental settings. The order of the two blocks was counterbalanced across the subject group.
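As an illustration of the oddball sequence described above, the following is a minimal sketch rather than the script used in the experiment. The function name, the seed, and the constraints that the sequence begins with a standard and that at least one standard separates any two rare stimuli are assumptions made for the example; the text specifies only the stimulus counts and the 850 ms stimulus onset asynchrony.

```python
import random


def make_oddball_sequence(n_deviant=160, n_standard=660, min_gap=1, seed=0):
    """Return a pseudorandom list of 'standard'/'deviant' labels.

    Assumption (not stated in the text): every deviant is preceded by at
    least `min_gap` standards, so the sequence starts with a standard and
    no two deviants are adjacent.
    """
    required = n_deviant * min_gap
    if n_standard < required:
        raise ValueError("not enough standards to satisfy min_gap")
    rng = random.Random(seed)
    # Spread the remaining standards at random over the slots before each
    # deviant and the one slot after the last deviant.
    extra = [0] * (n_deviant + 1)
    for _ in range(n_standard - required):
        extra[rng.randrange(n_deviant + 1)] += 1
    sequence = []
    for i in range(n_deviant):
        sequence.extend(["standard"] * (min_gap + extra[i]))
        sequence.append("deviant")
    sequence.extend(["standard"] * extra[n_deviant])
    return sequence


def onset_times_ms(sequence, soa_ms=850):
    """Stimulus onset times for a fixed stimulus onset asynchrony."""
    return [i * soa_ms for i in range(len(sequence))]


sequence = make_oddball_sequence()
onsets = onset_times_ms(sequence)
```

Fixing the seed makes a given sequence reproducible; a different seed per block or per subject would yield different orders with the same stimulus counts.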
The initial consonant-diphthong parts in both blocks were identical for the critical and filler items (see supplemental Fig. 1, available at www.jneurosci.org as supplemental material); this design guaranteed that in both blocks the critical oddball stimuli could only be identified from the competitor environment at the last phoneme ([t], as opposed to [p], in the frequent stimulus). These final phonemes were made identical across both conditions using cross-splicing to remove any associated acoustic confounds. This technique allowed us to avoid any acoustic differences before the onset of the final plosive and to control exactly the point in time when the acoustic contrast occurred and, consequently, when each item could be recognized as such with its memory trace (if present) activated. All stimuli were 460 ms in duration and were matched to have the same peak sound energy and fundamental frequency; the divergence point was at 345 ms, following the 85 ms silent closure typical of English stop-consonants (Fig. 1; supplemental Fig. 1, available at www.jneurosci.org as supplemental material). All analyses of the naturally spoken stimuli and their modifications were carried out in the Cool Edit 2000 program (Syntrillium Software Corp.).
Initial inspection of the brain responses time-locked to the stimulus onset (Fig. 1) showed similar brain responses to frequent and rare stimuli up until the time point where they first diverged acoustically (divergence point) and stimulus words and pseudo-words could therefore be recognized. To concentrate on the time period when stimulus-specific recognition could occur, we therefore shortened the epochs (thereby also reducing rejection rate and thus improving signal-to-noise ratio) and timed them to the onset of the final plosion, i.e., the divergence point. After bandpass filtering (1–20 Hz, 12 dB/octave), epoching (−50 to 400 ms) and baseline correction (−50 to 0 ms), we removed all trials with excessive noise (gradient >100 μV/100 ms) separately for each channel. Following this, three types of analysis were used. We first compared the subsets covering the initial and final 25% of the learning session. Notably, these amounted to 40 or fewer trials, which is substantially below the number used in standard auditory ERP studies, which typically average in excess of 100 trials; as we hypothesized that rapid learning could occur within the short time interval, we had to limit the number of trials to see any potential learning effects. To overcome the low signal-to-noise ratio resulting from the inherently small number of trials, we also averaged together data from all midline electrodes where the auditory evoked response is typically maximal (Fz, FCz, Cz, CPz). Mean amplitudes in a 20 ms window around the first negative peak (∼120 ms) were extracted and submitted to ANOVAs including the factors stimulus type (word/pseudo-word) and exposure time (early/late in the session).
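The per-trial steps described above (baseline correction, gradient-based rejection, averaging, and extraction of a mean amplitude around the peak) can be sketched as follows. This is an illustrative sketch, not the Matlab code used in the study: the function names and the data layout (one list of μV samples per trial and channel, with a matching list of sample times in ms) are assumptions, and the gradient criterion is implemented under one possible reading, namely that a trial is rejected if the signal changes by more than 100 μV between any two samples at most 100 ms apart. Bandpass filtering is omitted.

```python
def baseline_correct(epoch, times_ms, t0=-50.0, t1=0.0):
    """Subtract the mean of the baseline samples (t0 <= t < t1)."""
    base = [v for v, t in zip(epoch, times_ms) if t0 <= t < t1]
    offset = sum(base) / len(base)
    return [v - offset for v in epoch]


def exceeds_gradient(epoch, times_ms, max_uv=100.0, span_ms=100.0):
    """True if the signal changes by more than max_uv within span_ms.

    Assumed reading of the 'gradient >100 uV/100 ms' criterion.
    """
    for i in range(len(epoch)):
        for j in range(i + 1, len(epoch)):
            if times_ms[j] - times_ms[i] > span_ms:
                break
            if abs(epoch[j] - epoch[i]) > max_uv:
                return True
    return False


def average_accepted(epochs, times_ms):
    """Baseline-correct all trials, drop noisy ones, average the rest."""
    kept = [baseline_correct(e, times_ms) for e in epochs]
    kept = [e for e in kept if not exceeds_gradient(e, times_ms)]
    if not kept:
        raise ValueError("all trials were rejected")
    return [sum(samples) / len(kept) for samples in zip(*kept)]


def mean_window_amplitude(erp, times_ms, centre_ms=120.0, half_width_ms=10.0):
    """Mean amplitude in a window of +/- half_width_ms around centre_ms."""
    vals = [v for v, t in zip(erp, times_ms) if abs(t - centre_ms) <= half_width_ms]
    return sum(vals) / len(vals)
```

Applied to the first and last 25% of trials for each stimulus type, `mean_window_amplitude(average_accepted(trials, times_ms), times_ms)` would give one value per subject and cell of the stimulus type by exposure time design.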
Our second analysis, aimed at finer-scale temporal changes in the responses over the course of the session, applied linear regression to individual subjects' peak amplitude data obtained from consecutive 10% (i.e., 16 or fewer trials) intervals in both recordings. Having fitted the least-squares line to individual amplitude measurements, we submitted regression coefficients to ANOVAs to verify the significance of any differences from zero and between conditions. The Matlab 7.0 programming environment (MathWorks) was used for all above operations on the EEG signal; statistical analysis was implemented in Statistica 7 (Statsoft).
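The regression step can be illustrated with a short sketch. It is not the original Matlab or Statistica analysis: the function names are hypothetical, the input is assumed to be one amplitude value per accepted trial in presentation order, and the group-level test is written as a squared one-sample t statistic, which for a single within-subject contrast is equivalent to an F statistic with (1, n − 1) degrees of freedom.

```python
import math


def sub_block_means(values, n_blocks=10):
    """Average trial-wise values within n_blocks consecutive sub-blocks."""
    n = len(values)
    if n < n_blocks:
        raise ValueError("fewer trials than sub-blocks")
    means = []
    for i in range(n_blocks):
        chunk = values[i * n // n_blocks:(i + 1) * n // n_blocks]
        means.append(sum(chunk) / len(chunk))
    return means


def least_squares_line(y):
    """Slope and intercept of the least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def f_against_zero(slopes):
    """F(1, n-1) for the mean slope differing from zero (squared one-sample t)."""
    n = len(slopes)
    mean = sum(slopes) / n
    var = sum((s - mean) ** 2 for s in slopes) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t * t
```

With one slope per subject and condition, `f_against_zero` applied to the pseudo-word slopes tests for a change over the session, and applied to the per-subject differences between pseudo-word and word slopes it compares the two conditions.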
In the final analysis, aimed at localizing cortical sources of the observed effects, we performed L2 minimum-norm current estimation using the trials collected at the beginning and end (25%) of the exposure session. This distributed source analysis does not make a priori assumptions about underlying generators and attempts to minimize the overall activity that can account for the recorded electric potentials (Ilmoniemi, 1993). A realistic head shape and a three-layer (skin-skull-cortex) boundary-element model (BEM) were used to account for current spread through the head tissues. Source locations were constrained to the gray matter surface. Two analyses were performed to estimate the source dynamics with learning. First, using the CURRY 6.1 software package (Compumedics Neuroscan), source reconstruction was applied to the grand-average data, which benefit from an increased signal-to-noise ratio; this was done at peak response intervals indicated by the ERP analysis above. As this source reconstruction indicated that the most prominent learning-related changes between the initial and the final quarters of exposure occurred in the perisylvian areas (Fig. 2; supplemental Fig. 2, available at www.jneurosci.org as supplemental material), we repeated the procedure on individual subjects' data, extracted average dipole moments in perisylvian regions-of-interest in both cerebral hemispheres, and submitted these to ANOVAs with factors stimulus type (word/pseudo-word) and exposure time (early/late in the session).
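For orientation, the general form of an L2 minimum-norm estimate can be written as below. This is the textbook formulation, not a description of the specific CURRY implementation: the symbols (lead-field matrix L derived from the BEM, measured potentials v, source currents j) and the Tikhonov regularization parameter λ are assumptions introduced here, as the text does not state whether or how the solution was regularized.

```latex
% Forward model: scalp potentials v arise from source currents j via the lead field L
\mathbf{v} = \mathbf{L}\,\mathbf{j} + \boldsymbol{\varepsilon}

% L2 minimum-norm estimate: smallest overall source activity that explains the data
\hat{\mathbf{j}}
  = \arg\min_{\mathbf{j}} \;
    \lVert \mathbf{v} - \mathbf{L}\,\mathbf{j} \rVert_2^2
    + \lambda \lVert \mathbf{j} \rVert_2^2
  = \mathbf{L}^{\top}\left(\mathbf{L}\mathbf{L}^{\top} + \lambda \mathbf{I}\right)^{-1}\mathbf{v}
```

The penalty on the norm of j is what expresses the stated aim of minimizing the overall activity that can account for the recorded potentials.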
Results
As expected, ERP responses time-locked to the stimulus onsets showed no clear differences between the rare and frequent stimuli up until the divergence point (345 ms) when each stimulus could be uniquely identified from its environment (Fig. 1). After the divergence point, however, although acoustic contrasts were identical, the critical rare stimuli diverged in their temporal dynamics, which was also differential for the novel and familiar items early and late in the exposure session. This was further explored in the analysis that concentrated on the ERP to the critical stimuli following the divergence points. The results (Fig. 2) showed that the early (∼90–140 ms after the disambiguation point) enhancement of word-elicited brain activation over that of pseudo-words was present in the beginning of the session (F(1,15) = 4.14; p < 0.03), but vanished ∼10 min later (p > 0.3). Whereas the word response remained stable (with a nonsignificant (p > 0.5) tendency to decline, attributable to habituation of repeatedly activated memory circuits), the novel pseudo-word response significantly increased with time (F(1,15) = 4.32, p < 0.027; see also Fig. 3, left). A significant interaction of stimulus type by exposure time (F(1,15) = 4.55, p < 0.049) further confirmed different time evolutions of brain responses to novel and familiar items.
Figure 1. Electric brain responses, recorded early and late in the learning session, to rarely presented critical word and pseudo-word stimuli and their corresponding frequently presented standard stimuli. Responses recorded from the vertex (electrode Cz) and time-locked to stimulus onsets are presented here; waveforms and spectrograms of critical stimulus items are overlaid on the brain response plots (see also supplemental Fig. 1, available at www.jneurosci.org as supplemental material, for complete stimulus information). Note that the responses to frequent and rare stimuli in each set are similar until the divergence point at 345 ms, when the final stop occurs. Note also that following the divergence point the temporal dynamics of word and pseudo-word responses differ: larger responses to words early in the session (left) and more similar responses at the end, after pseudo-word responses had increased (right).
Figure 2. Electric brain response registered at the vertex (Cz) for word and pseudo-word stimuli early and late in the learning session. Responses are time-locked to the stimulus divergence points (final plosion onsets), when the final phoneme could be perceived and the word or pseudo-word stimulus first be recognized. Note the larger word response early in the session (left) and the similar responses at the end, after the pseudo-word response had increased (right). Cortical source distributions (L2 minimum-norm topographies at 120 ms after the divergence point, left view) are displayed in the insets. Note that known words elicited perisylvian language-area activation early and late in the session. In contrast, novel pseudo-words elicited anterior-superior-temporal activation only at the end of the session (brighter colors indicate higher dipole moment, activation thresholded at top 20%; for the full unthresholded activation landscape, see supplemental Fig. 2, available at www.jneurosci.org as supplemental material).
Figure 3. Statistical assessment of response change through the exposure session. Note the absence of statistical change in the word response and the significant increase in the pseudo-word response, confirmed by both factorial comparison of early vs late 25% of trials (left) and linear regression over consecutive 10% sub-blocks (right). Individual subjects' data points and regression lines are shown in black, group average amplitudes and regression lines in color.
To follow the development of language-evoked brain activity on a finer time scale, linear regression analysis was applied to word- and pseudo-word-elicited activation calculated for successive mini-averages (10% of the stimuli, 16 trials) obtained from each individual (Fig. 3, right). Regression coefficients calculated for words did not significantly differ from zero (p > 0.17). For the newly learnt pseudo-words, however, the regression analysis showed a significant increase in event-related activity with exposure time (F(1,15) = 4.098; p < 0.039). The specific increase of brain responses to pseudo-words, but not words, was further confirmed by a statistical comparison of regression slopes (β values) obtained from each subject individually and entered into group analysis (F(1,15) = 8.26; p < 0.006).
To localize the cortical sources potentially underlying the rapid emergence of memory traces for words, L2 minimum-norm current estimation was applied (Ilmoniemi, 1993) to the electric brain response, in the interval indicated by the sensor-level analysis. Sources of word-evoked responses were localized in left-perisylvian neocortex, in superior-temporal and frontal areas, and this pattern remained constant throughout the recording. Pseudo-words, which did not elicit pronounced cortical sources initially, strongly activated left anterior superior-temporal cortex toward the end of training (see source reconstruction maps, Fig. 2; supplemental Fig. 2, available at www.jneurosci.org as supplemental material). The activation of superior-temporal perceptual-acoustic areas, which lacked the frontal component seen for words, is consistent with the perceptual type of learning induced by the present experiment. Statistical comparison of average dipole moments over left perisylvian regions-of-interest extracted from individual subjects' data confirmed the increase in novel pseudo-word activation with exposure (F(1,15) = 3.70; p < 0.048) but showed no significant changes in the amount of activation elicited by the word (p > 0.5). Sources in the right hemisphere, although suggestive of differential temporal patterns (supplemental Fig. 2, available at www.jneurosci.org as supplemental material), did not yield any statistically significant results.
Discussion
This is, to our knowledge, the first report of a cortical correlate of learning emerging within minutes of passive perceptual exposure to a new spoken pseudo-word. The data, showing an increase in brain response as an immediate result of learning, suggest that our brain may be capable of forming new neuronal circuits for linguistic events rapidly as it gets exposed to novel patterns of human speech. Understanding such fast learning is key to the neurobiological explanation of the human language faculty, as only humans are capable of acquiring large word vocabularies rapidly.
The brain structures engaged by such rapid passive word-form learning are part of those also effective in the processing of meaningful words, specifically anterior superior-temporal cortex included in the “what” stream of auditory processing (Rauschecker and Scott, 2009). Fast learning can be explained by general neurobiological principles, most notably by Hebbian synaptic strengthening following correlated neuronal activity (Pulvermüller, 1999). This suggestion is therefore well in line with claims that rapid learning is not specific to language function or even to human species (Markson and Bloom, 1997; Kaminski et al., 2004) and may, as such, be an expression of a more general neurobiological learning mechanism. The extremely efficient application of this mechanism to the learning of vocabularies of thousands of words is, of course, a human feature that is potentially facilitated by neuroanatomical advantages in the form of efficient connections within left temporofrontal perisylvian networks (Catani et al., 2005; Saur et al., 2008).
Fast storage of novel word forms, which has long been known from behavioral data (Carey and Bartlett, 1978), is also implied by everyday observations: e.g., in the context of a language lesson or when being exposed to new specialist terms, newly learnt words can be used almost immediately, without a need to wait for long-term consolidation to take place. What we document here specifically is a potential neurophysiological correlate of the learning of novel word forms via repetitive perceptual exposure, along with the relevant neocortical structures and their activation time course. Importantly, a certain level of caution is necessary in drawing parallels between everyday language learning in infants or adults and our present results, which were obtained with meaningless pseudo-words in passive presentation and in a somewhat artificial experimental context aimed at optimizing the signal-to-noise ratio of neurophysiological responses. While the current paradigm, which was based on previous behavioral and neurophysiological findings, appears to be an efficient tool for observing fast changes in cortical activation patterns, future research should further explore the exact influence of learning regime (e.g., passive vs active), stimulus psycholinguistic properties (e.g., semantic meaning) and ecological validity of task contexts on emergent memory circuits in the brain.
Footnotes
- This work was supported by the Medical Research Council, UK (U.1055.04.014.00001.01 and U.1055.04.003.00001.01). We thank Olaf Hauk, Bundy Mackintosh, and Sally Butterfield for their help at various stages of this work.
- Correspondence should be addressed to Yury Shtyrov, Medical Research Council (MRC), Cognition and Brain Sciences Unit, 15 Chaucer Road, CB2 7EF Cambridge, UK. Yury.Shtyrov{at}mrc-cbu.cam.ac.uk