Elife. 2018 Apr 3:7:e33468.

doi: 10.7554/eLife.33468.

Large-scale replication study reveals a limit on probabilistic prediction in language comprehension

Mante S Nieuwland¹,², Stephen Politzer-Ahles³,⁴, Evelien Heyselaar⁵, Katrien Segaert⁵, Emily Darley⁶, Nina Kazanina⁶, Sarah Von Grebmer Zu Wolfsthurn⁶, Federica Bartolozzi², Vita Kogan², Aine Ito²,⁴, Diane Mézière², Dale J Barr⁷, Guillaume A Rousselet⁷, Heather J Ferguson⁸, Simon Busch-Moreno⁹, Xiao Fu⁹, Jyrki Tuomainen⁹, Eugenia Kulakova¹⁰, E Matthew Husband⁴, David I Donaldson¹¹, Zdenko Kohút¹², Shirley-Ann Rueschemeyer¹², Falk Huettig¹

Affiliations

¹ Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
² School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, United Kingdom.
³ Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong.
⁴ Faculty of Linguistics, Philology and Phonetics, University of Oxford, Oxford, United Kingdom.
⁵ School of Psychology, University of Birmingham, Birmingham, United Kingdom.
⁶ School of Experimental Psychology, University of Bristol, Bristol, United Kingdom.
⁷ Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom.
⁸ School of Psychology, University of Kent, Canterbury, United Kingdom.
⁹ Division of Psychology and Language Sciences, University College London, London, United Kingdom.
¹⁰ Institute of Cognitive Neuroscience, University College London, London, United Kingdom.
¹¹ Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom.
¹² Department of Psychology, University of York, York, United Kingdom.

PMID: 29631695
PMCID: PMC5896878
DOI: 10.7554/eLife.33468

Mante S Nieuwland et al. Elife. 2018.

Abstract

Do people routinely pre-activate the meaning and even the phonological form of upcoming words? The most acclaimed evidence for phonological prediction comes from a 2005 Nature Neuroscience publication by DeLong, Urbach and Kutas, who observed a graded modulation of electrical brain potentials (N400) to nouns and preceding articles by the probability that people use a word to continue the sentence fragment ('cloze'). In our direct replication study spanning 9 laboratories (N=334), pre-registered replication-analyses and exploratory Bayes factor analyses successfully replicated the noun-results but, crucially, not the article-results. Pre-registered single-trial analyses also yielded a statistically significant effect for the nouns but not the articles. Exploratory Bayesian single-trial analyses showed that the article-effect may be non-zero but is likely far smaller than originally reported and too small to observe without very large sample sizes. Our results do not support the view that readers routinely pre-activate the phonological form of predictable words.

Keywords: N400; human; language comprehension; neuroscience; prediction.

Conflict of interest statement

MN, SP, EH, KS, ED, NK, SV, FB, VK, AI, DM, DB, GR, HF, SB, XF, JT, EK, EH, DD, ZK, SR, FH: No competing interests declared.

Figures

**Figure 1. Replication analysis.**
Correlations between N400 amplitude and article/noun cloze probability per laboratory. N400 amplitude is the mean voltage in the 200–500 ms time window after word onset. A positive value corresponds to the canonical finding that N400 amplitude became smaller (less negative, that is, more positive) with increasing cloze probability. Here and in all further plots, negative voltages are plotted upwards. Upper graph: Scatter plots showing the correlation between cloze and N400 activity at electrode Cz, for each lab. The position of Cz and the other electrodes is displayed in the head plot between the upper and lower graphs. Lower graph: Scalp distribution of the r-values for each lab. Asterisks (*) indicate electrodes that showed a statistically significant correlation (two-tailed p<0.05, not corrected for multiple comparisons). Exact r- and p-values for each laboratory and EEG channel are available as source data (Figure 1—source data 1–4) and on https://osf.io/eyzaq.

**Figure 2. Replication analysis.**
Scalp distribution and r-values at each channel based on data pooled from all laboratories, using the 500 ms baseline correction procedure used by DeLong et al. (2005). Data were pooled after computing bin-averages per laboratory as in the original study, treating the laboratories as multiple observations of each bin-average. Asterisks (*) indicate electrodes that showed a statistically significant correlation (two-tailed, not corrected for multiple comparisons). Exact r- and p-values for each EEG channel are available as source data (Figure 2—source data 1–4).
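The replication analysis described in Figures 1 and 2 correlates cloze probability with mean N400 amplitude. A minimal sketch of that kind of computation is given below. The cloze bin midpoints and voltages are hypothetical illustration values, not data from the study, and the function name `pearson_r` is an assumption introduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: cloze bin midpoints (%) and mean voltage (microvolts)
# in the N400 window at one electrode. Not taken from the study.
cloze_bins = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
mean_uv = [-1.9, -1.7, -1.8, -1.3, -1.2, -0.9, -1.0, -0.5, -0.3, 0.1]

# A positive r means amplitude became less negative as cloze increased.
r = pearson_r(cloze_bins, mean_uv)
print(f"r = {r:.3f}")
```

In this made-up example the voltage becomes less negative as cloze rises, so r is positive, which is the direction the captions call the canonical finding.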

**Figure 3. Single-trial analysis.**
Grand-average ERPs elicited by relatively expected and unexpected words (cloze higher/lower than 50%) and the associated difference waveforms (low minus high cloze) at electrode Cz. Dotted lines indicate one standard deviation above or below the grand average.

**Figure 4. Single-trial analysis.**
Relationship between cloze and ERP amplitude for articles and nouns in the N400 spatiotemporal window, as illustrated by the mean ERP values per cloze value (number of observations reflected in circle size), along with the regression line and 95% confidence interval. A change in article cloze from 0 to 100 is associated with a change in amplitude of 0.296 µV (95% confidence interval: −0.08 to 0.67). A change in noun-cloze from 0 to 100 is associated with a change in amplitude of 2.22 µV (95% confidence interval: 1.75 to 2.69). The data for these analyses were pooled across all nine labs.
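The two intervals in the Figure 4 caption can be read side by side with a small calculation. The sketch below assumes the reported intervals behave like symmetric normal-approximation (Wald) intervals, which the caption does not state, so the recovered standard errors and z-scores are rough approximations only. The helper name `wald_summary` is hypothetical.

```python
def wald_summary(estimate, ci_low, ci_high, z_crit=1.96):
    """Approximate SE and z-score from a 95% interval, assuming it is a
    symmetric normal-approximation interval (an assumption, not a fact
    stated in the caption)."""
    se = (ci_high - ci_low) / (2.0 * z_crit)
    return {
        "se": se,
        "z": estimate / se,
        "excludes_zero": ci_low > 0.0 or ci_high < 0.0,
    }

# Estimates and intervals as quoted in the Figure 4 caption (microvolts
# per change in cloze from 0 to 100).
article = wald_summary(0.296, -0.08, 0.67)
noun = wald_summary(2.22, 1.75, 2.69)

print("article:", article)
print("noun:   ", noun)
```

Under this assumption the article interval includes zero while the noun interval does not, which matches the abstract's report of a significant single-trial effect for the nouns but not the articles.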

**Figure 5. Exploratory single-trial analyses.**
The relationship between cloze and ERP amplitude as illustrated by the mean ERP values per cloze value (number of observations reflected in circle size), along with the regression line and 95% confidence interval, from two exploratory analyses. The first analysis used a longer baseline time window (500 ms, left panel) to better control for pre-article voltage levels. This reduced the initially observed effect of article-cloze (β = 0.14, CI [−0.25, 0.53], χ²(1) = 0.46, p = 0.50). The second analysis, in the 500 to 100 ms time window *before* article-onset (right panel), revealed a non-significant effect of cloze that resembled the pattern observed *after* article-onset (β = 0.16, CI [−0.07, 0.39], χ²(1) = 1.82, p = 0.18), casting doubt on the conclusion that the observed results are due to the presentation of the articles.

**Figure 6. Exploratory replication Bayes factor analysis.**
This analysis quantifies the obtained evidence for the null hypothesis (H₀) that N400 is not impacted by cloze, or for the alternative hypothesis (H₁) that N400 is impacted by cloze with the direction *and* size of effect reported by DeLong et al. Scalp maps show the common logarithm of the replication Bayes factor for each electrode, capped at log(100) for presentation purposes. Electrodes that yielded at least moderate evidence for or against the null hypothesis (Bayes factor of ≥3) are marked by an asterisk. At posterior electrodes where DeLong et al. found their effects, our article data yielded strong to extremely strong evidence for the null hypothesis, whereas our noun data yielded extremely strong evidence for the alternative hypothesis (upper graphs). These results were obtained with the procedure described in DeLong et al. (no baseline correction), and with a 500 ms pre-word baseline correction (lower graphs), the procedure later described by DeLong and colleagues.

**Figure 7. Exploratory Bayesian mixed-effects model analyses.**
Posterior density distributions for the effect of cloze on ERP amplitudes in the N400 window. The x-axis shows cloze effect sizes (i.e. changes in microvolts associated with an increase from 0% cloze probability to 100% cloze probability). The black line indicates the posterior distribution of effects; higher values of the posterior density at a given effect size indicate higher probability that this is the true effect size in the population. The peak of the posterior distribution roughly corresponds to the point estimate of the effect size (the regression coefficient) fitted from the Bayesian mixed-effects model, i.e., the most likely value of the true effect size. The middle 95% of the posterior distribution, shaded in orange, corresponds to a two-tailed 95% credible interval for the effect size, i.e., an interval that we can be 95% confident contains the true effect. The green dotted line indicates the prior distribution (i.e., our expectation about where the true effect would lie before the data were collected). For the articles, this prior is centred on 1.25 μV, an approximation of the effect observed by DeLong et al. (2005), and for the nouns it is centred on 3.5 μV. The black connected dots illustrate the ratio between the posterior and prior distribution (i.e. the Bayes factor) at the effect size of 0 μV; for example, a Bayes factor of 4 suggests we can be four times more certain that the true effect is zero after having conducted this experiment than before, or, in other words, that the data increased our confidence in the null effect of zero fourfold. We ran these analyses for each of our linear mixed-effects model analyses. We note that in all the article-analyses, the posterior probability of the estimated effect being greater than zero is around 80 or 90%, although this is also true for the pre-stimulus variable, casting doubt on the conclusion that the observed results are due to presentation of the articles. In none of our article-analyses did zero lie outside the obtained credible interval, whereas for the nouns, zero lay outside the credible interval. These results are consistent with a failure to replicate the size of the article-effect reported by DeLong et al. and a successful replication of the noun-effect.
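The Figure 7 caption describes a Bayes factor as the ratio of posterior to prior density at an effect size of 0 μV. A minimal sketch of that ratio follows, assuming both distributions can be approximated as normal. Only the article prior centre of 1.25 μV comes from the caption; the prior standard deviation and the posterior mean and standard deviation are hypothetical values chosen for illustration, and the function names are introduced here.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def density_ratio_bf01(prior_mean, prior_sd, post_mean, post_sd, point=0.0):
    """Posterior density divided by prior density at `point`.

    Values above 1 mean the data made the point value (here, an effect
    of zero) more credible than it was under the prior.
    """
    return normal_pdf(point, post_mean, post_sd) / normal_pdf(point, prior_mean, prior_sd)

# Prior centre 1.25 (from the caption). Prior SD 1.0, posterior mean 0.3
# and posterior SD 0.2 are hypothetical, not values from the study.
bf01 = density_ratio_bf01(prior_mean=1.25, prior_sd=1.0, post_mean=0.3, post_sd=0.2)
print(f"BF01 at 0 = {bf01:.2f}")
```

With these illustrative numbers the ratio is above 1, meaning the posterior places more density on a zero effect than the prior did, which is how the caption reads a Bayes factor in favour of the null.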

**Figure 8. Control experiment.**
P600 effects at electrode Pz per lab associated with flouting of the English a/an rule. Plotted ERPs show the grand-average difference waveform and standard deviation for ERPs elicited by ungrammatical expressions (‘an kite’) minus those elicited by grammatical expressions (‘a kite’).

Cited by

On the Mathematical Relationship Between Contextual Probability and N400 Amplitude.
Michaelov JA, Bergen BK. Open Mind (Camb). 2024 Jun 28;8:859-897. doi: 10.1162/opmi_a_00150. eCollection 2024. PMID: 39077107 Free PMC article.
Understanding words in context: A naturalistic EEG study of children's lexical processing.
Levari T, Snedeker J. J Mem Lang. 2024 Aug;137:104512. doi: 10.1016/j.jml.2024.104512. Epub 2024 Mar 8. PMID: 38855737 Free PMC article.
Explaining the Sentence Superiority Effect and N400s Elicited by Words and Short Sentences with OB1-Reader.
Seijdel N, Stolwijk G, Janicas B, Snell J, Meeter M. J Cogn. 2024 Apr 17;7(1):34. doi: 10.5334/joc.358. eCollection 2024. PMID: 38638462 Free PMC article.
Language prediction in monolingual and bilingual speakers: an EEG study.
Momenian M, Vaghefi M, Sadeghi H, Momtazi S, Meyer L. Sci Rep. 2024 Mar 21;14(1):6818. doi: 10.1038/s41598-024-57426-y. PMID: 38514713 Free PMC article.
A predictive coding model of the N400.
Nour Eddine S, Brothers T, Wang L, Spratling M, Kuperberg GR. Cognition. 2024 May;246:105755. doi: 10.1016/j.cognition.2024.105755. Epub 2024 Feb 29. PMID: 38428168

Grants and funding

ERC Starting grant 636458/European Research Council/International
