Abstract
Reward representation in ventral striatum is boosted by perceptual novelty, although the mechanism of this effect remains elusive. Animal studies indicate a functional loop (Lisman and Grace, 2005) that includes hippocampus, ventral striatum, and midbrain as being important in regulating salience attribution within the context of novel stimuli. According to this model, reward responses in ventral striatum or midbrain should be enhanced in the context of novelty even if reward and novelty constitute unrelated, independent events. Using fMRI, we show that trials with reward-predictive cues and subsequent outcomes elicit higher responses in the striatum if preceded by an unrelated novel picture, indicating that reward representation is enhanced in the context of novelty. Notably, this effect was observed solely when reward occurrence, and hence reward-related salience, was low. These findings support a view that contextual novelty enhances neural responses underlying reward representation in the striatum and concur with the effects of novelty processing as predicted by the model of Lisman and Grace (2005).
Introduction
The basal ganglia, together with their dopaminergic afferents, provide a mechanism to learn about reward value of different behavioral options (Berridge and Robinson, 2003; Frank et al., 2004; Pessiglione et al., 2006). In line with this view, fMRI studies show that reward and reward-predictive cues elicit brain activity in the striatum (Delgado et al., 2000; Knutson et al., 2000; O'Doherty et al., 2003, 2004) and midbrain (Aron et al., 2004; Wittmann et al., 2005). However, the midbrain dopaminergic system also responds to nonrewarding novel stimuli in monkeys (Ljungberg et al., 1992) and humans (Bunzeck and Duzel, 2006; Wittmann et al., 2007). From a computational perspective, it has been suggested that novelty itself may act as a motivational signal that boosts reward representation and drives exploration of an unknown, novel choice option (Kakade and Dayan, 2002).
Although novelty processing and reward processing share common neural mechanisms, the neural substrate that supports an interaction between novelty and reward remains poorly understood. Research in animals reveals that hippocampal novelty signals regulate the ability of dopamine neurons to show burst firing activity. Given that burst firing is the main dopaminergic response pattern coding for rewards, and possibly other salient events, there is good reason to suspect that hippocampal novelty signals have the potential to regulate reward processing and salience attribution (Lisman and Grace, 2005). Hippocampal novelty signals are conveyed to ventral tegmental area (VTA) through the subiculum, ventral striatum, and ventral pallidum, where they cause disinhibition of silent dopamine neurons to induce a mode of tonic activity (Grace and Bunney, 1983; Lisman and Grace, 2005). Importantly, only tonically active but not silent dopamine neurons transfer into burst firing mode and show phasic responses (Floresco et al., 2003). In this way, hippocampal novelty signals have the potential to boost phasic dopamine signals and facilitate encoding of new information into long-term memory.
Although recent research has shown that stimulus novelty enhances a striatal reward prediction error (Wittmann et al., 2008), this finding does not address a physiological hypothesis that contextual novelty exerts an enhancing effect upon subsequent reward signals (Lisman and Grace, 2005). Testing this requires an independent manipulation of the level of novelty and reward such that novelty (and familiarity) act as temporally extended contexts preceding rewards. We investigated the expression of striatal modulation of reward processing in the context of novelty by presenting a novel stimulus preceding the presentation of cues that predict rewards. Furthermore, we manipulated both factors (novelty and reward) independently; this allowed us to distinguish between their corresponding neural representations. We presented subjects with one of three different fractal images that cued reward delivery with a given probability [no reward (p = 0), low reward probability (p = 0.4), and high reward probability (p = 0.8)]. In this way, our design also enabled us to investigate whether the influence of contextual novelty on reward responses depended on the probability of reward occurrence. A probability-dependent effect of novelty on reward processing would provide strong support for the prediction that novelty and reward processing functionally interact. In contrast, an effect of novelty on reward-related brain activity that is independent of reward probability and magnitude would indicate that novelty and reward share brain regions and produce additive neural activity without a functional interaction.
Materials and Methods
Subjects.
Sixteen adults participated in the experiment (nine female and seven male; age range, 19–32 years; mean = 23.8, SD = 3.84 years). All subjects were healthy, right-handed, and had normal or corrected-to-normal acuity. None of the participants reported a history of neurological, psychiatric, or medical disorders or any current medical problems. All experiments were run with each subject's written informed consent and according to the local ethics clearance (University College London, London, UK).
Experimental design and task.
The task was divided into three phases. In phase 1, subjects were familiarized with a set of 10 images (five indoor, five outdoor). Each image was presented 10 times for 1000 ms with an interstimulus interval of 1750 ± 500 ms. Subjects indicated the indoor/outdoor status using their right index and middle fingers. In phase 2, three fractal images were paired, under different probabilities (0, 0.4, and 0.8), with a monetary reward of 10 pence in a conditioning session. Each fractal image was presented 40 times. On each trial, one of three fractal images was presented on the screen for 750 ms and subjects indicated the detection of the stimulus presentation with a button press. The probabilistic outcome (10 or 0 pence) was presented as a number on the screen 750 ms later for another 750 ms and subjects indicated whether they won any money or not using their index or middle finger. The intertrial interval (ITI) was 1750 ± 500 ms. Finally, in a test phase (phase 3), the effect of contextual novelty on reward-related responses was determined in four 11 min sessions (Fig. 1). Here, an image was presented for 1000 ms and subjects indicated the indoor/outdoor status using their right index and middle fingers. Responses could be made while the scene picture and subsequent fractal image were displayed on the screen (1750 ms in total). The image was either from the familiarized set of pictures from phase 1 (referred to as “familiar images”) or from another set of pictures that had never been presented (referred to as “novel images”). In total, 240 novel images were presented to each subject. Thereafter, one of the three fractal images from phase 2 (referred to as reward-predictive cue) was presented for 750 ms (here, subjects were instructed not to respond). As in the second phase, the probabilistic outcome (10 or 0 pence) was presented 750 ms later for another 750 ms and subjects indicated whether they won money or not using their index or middle finger. 
Responses could be made while the outcome was displayed on the screen and during the subsequent intertrial interval (2500 ± 500 ms in total). The ITI was 1750 ± 500 ms. During each session, each fractal image was presented 20 times following a novel picture and 20 times following a familiar picture, resulting in 120 trials per session. The presentation order of the six trial types was fully randomized. All three experimental phases were performed inside the MRI scanner but blood oxygenation level-dependent (BOLD) data were acquired only during the test phase (phase 3). Subjects were instructed to respond as quickly and as correctly as possible and were told that they would be paid their earnings up to £20. Participants were also told that 10 pence would be subtracted for each incorrect response; these trials were excluded from the analysis. Total earnings were displayed on the screen only at the end of the fourth block.
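The trial structure of one test session can be sketched as follows. This is a minimal illustration, not the original presentation script: the function and variable names are hypothetical, and the assumption that outcomes were delivered as fixed proportions within each cell (rather than as independent random draws) is not stated in the text.

```python
import random

NOVELTY = ("novel", "familiar")
REWARD_PROBS = (0.0, 0.4, 0.8)
TRIALS_PER_CELL = 20  # per session, as described in the text


def build_session(rng):
    """Return a shuffled list of (novelty, reward_prob, rewarded) trials."""
    trials = []
    for novelty in NOVELTY:
        for p in REWARD_PROBS:
            # Assumption: a fixed proportion of wins per cell,
            # not an independent draw on every trial.
            n_win = round(p * TRIALS_PER_CELL)
            outcomes = [True] * n_win + [False] * (TRIALS_PER_CELL - n_win)
            trials.extend((novelty, p, won) for won in outcomes)
    rng.shuffle(trials)  # fully randomized order of the six trial types
    return trials


session = build_session(random.Random(0))
```

Each call yields the 2 × 3 × 20 = 120 trials of one session; four such sessions use 4 × 60 = 240 novel pictures, matching the number of novel images reported above.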
Figure 1. Experimental design. Trial time line of the test task used during fMRI data acquisition. Beforehand, subjects underwent a familiarization and a conditioning phase inside the scanner, but fMRI data were not acquired during these phases.
All images were gray-scaled and normalized to a mean gray-value of 127 and an SD of 75. None of the scenes depicted human beings or human body parts (including faces) in the foreground. Stimuli were projected onto the center of a screen and the subjects watched them through a mirror system mounted on the head coil of the fMRI scanner.
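The gray-value normalization described above can be illustrated with a short sketch. It assumes a simple linear rescaling of pixel values and uses a hypothetical function name; the text does not say how values falling outside the displayable 0–255 range were handled, so no clipping is applied here.

```python
from statistics import fmean, pstdev


def normalize_gray(pixels, target_mean=127.0, target_sd=75.0):
    """Linearly rescale gray values to a target mean and SD."""
    mean = fmean(pixels)
    sd = pstdev(pixels)
    if sd == 0:
        raise ValueError("a constant image has no contrast to rescale")
    # Standardize, then map onto the target mean and SD.
    return [(v - mean) / sd * target_sd + target_mean for v in pixels]


# Hypothetical gray values, for illustration only.
example = normalize_gray([10, 50, 90, 200, 240])
```

Clipping the result to the 0–255 range, if needed for display, would shift the mean and SD slightly away from the targets.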
fMRI data acquisition.
fMRI was performed on a 3 tesla Siemens Allegra magnetic resonance scanner (Siemens) with echo planar imaging (EPI). In the functional session, 48 T2*-weighted images per volume (covering the whole head) with BOLD contrast were obtained (matrix, 64 × 64; 48 oblique axial slices per volume angled at −30° in the anteroposterior axis; spatial resolution, 3 × 3 × 3 mm; TR = 2880 ms; TE = 30 ms). The fMRI acquisition protocol was optimized to reduce susceptibility-induced BOLD sensitivity losses in inferior frontal and temporal lobe regions (Weiskopf et al., 2006). For each subject, functional data were acquired in four scanning sessions containing 224 volumes per session. Six additional volumes at the beginning of each series were acquired to allow for steady-state magnetization and were subsequently discarded. Anatomical images of each subject's brain were collected using multiecho three-dimensional FLASH for mapping proton density, T1, and magnetization transfer (MT) at 1 mm³ resolution (Weiskopf and Helms, 2008), and by T1-weighted inversion recovery prepared EPI sequences (spatial resolution, 1 × 1 × 1 mm). Additionally, individual field maps were recorded using a double-echo FLASH sequence (matrix size, 64 × 64; 64 slices; spatial resolution, 3 × 3 × 3 mm; gap, 1 mm; short TE, 10 ms; long TE, 12.46 ms; TR, 1020 ms) for distortion correction of the acquired EPI images (Weiskopf et al., 2006). Using the “FieldMap toolbox” (Hutton et al., 2002), field maps were estimated from the phase difference between the images acquired at the short and long TEs.
fMRI data analysis.
Preprocessing included realignment, unwarping using individual field maps, spatial normalization to Montreal Neurological Institute (MNI) space, and finally smoothing with a 4 mm Gaussian kernel. The fMRI time series data were high-pass filtered (cutoff = 128 s) and whitened using an AR(1) model. For each subject, a statistical model was computed by applying a canonical hemodynamic response function combined with time and dispersion derivatives (Friston et al., 1998).
Our 2 × 3 factorial design included six conditions of interest, which were modeled as separate regressors: familiar image with reward probability 0, familiar image with reward probability 0.4, familiar image with reward probability 0.8, novel image with reward probability 0, novel image with reward probability 0.4, and novel image with reward probability 0.8. The temporal proximity of the reward-predictive cues (i.e., fractal image) and the reward outcome itself pose problems for the separation of BOLD signals arising from these two events. Therefore, we modeled each trial as a compound event, using a mini-boxcar that included the presentation of both the cue and the outcome. This technical limitation was not problematic for our factorial analysis, which concentrated on the interaction between novelty and reward processing and co-occurrences of reward and novelty effects. Error trials were modeled as a regressor of no interest. To capture residual movement-related artifacts, six covariates were included (three rigid-body translations and three rotations resulting from realignment) as regressors of no interest. Regionally specific condition effects were tested by using linear contrasts for each subject and each condition (first-level analysis). The resulting contrast images were entered into a second-level random-effects analysis. Here, the hemodynamic effects of each condition were assessed using a 2 × 3 ANOVA with the factors novelty (novel, familiar) and reward probability (0, 0.4, 0.8).
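To make the compound-event modeling concrete, the following sketch builds one regressor by convolving a mini-boxcar with a double-gamma hemodynamic response function and sampling it at each scan. Several details are assumptions rather than reported facts: the HRF parameters (peak shape 6, undershoot shape 16, ratio 1/6) are commonly used defaults, the boxcar duration of 2.25 s is inferred from the cue, delay, and outcome timings (3 × 750 ms), the time and dispersion derivatives are omitted, and all function names are hypothetical.

```python
import math


def canonical_hrf(t):
    """Double-gamma HRF at time t (s); parameter values are assumed defaults."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def compound_regressor(onsets, duration, n_scans, tr, dt=0.1):
    """Convolve boxcars (cue through outcome) with the HRF, one value per scan."""
    n_fine = int(round(n_scans * tr / dt))
    box = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_fine)):
            box[i] = 1.0
    # HRF kernel sampled on the fine time grid, truncated at 32 s.
    kernel = [canonical_hrf(k * dt) * dt for k in range(int(32 / dt))]
    samples = []
    for scan in range(n_scans):
        idx = int(round(scan * tr / dt))
        n_terms = min(idx + 1, len(kernel))
        samples.append(sum(kernel[k] * box[idx - k] for k in range(n_terms)))
    return samples


# One hypothetical trial with cue onset at 0 s, sampled at TR = 2.88 s.
regressor = compound_regressor([0.0], 2.25, 12, 2.88)
```

Because cue and outcome fall within a single boxcar, the resulting regressor cannot attribute variance to either event alone, which is the limitation noted above.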
We focused our analysis on three anatomically defined regions of interest (ROIs) (striatum, midbrain, and hippocampus) where interactions between novelty and reward processing were hypothesized based on previous studies (Lisman and Grace, 2005; Wittmann et al., 2005; Bunzeck and Duzel, 2006). For completeness, we also report whole-brain results in the supplemental material (available at www.jneurosci.org). Both the striatum and hippocampus ROIs were defined based on the Pick Atlas toolbox (Maldjian et al., 2003, 2004). While the striatal ROI included the head of caudate, caudate body, and putamen, the hippocampal ROI excluded the amygdala and surrounding rhinal cortex. Finally, the substantia nigra (SN)/VTA ROI was manually defined, using the software MRIcro and the mean MT image for the group. On MT images, the SN/VTA can be distinguished from surrounding structures as a bright stripe (Bunzeck and Duzel, 2006). It should be noted that in primates, reward-responsive dopaminergic neurons are distributed across the SN/VTA complex and it is therefore appropriate to consider the activation of the entire SN/VTA complex rather than focusing on its subcompartments (Duzel et al., 2009). For this purpose, a resolution of 3 mm³, as used in the present experiment, allows sampling of 20–25 voxels of the SN/VTA complex, which has a volume of 350–400 mm³.
Results
Behaviorally, subjects showed high accuracy in task performance during the indoor/outdoor discrimination task [mean hit rate = 97.1%, SD = 2.8% for familiar pictures, mean hit rate = 96.8%, SD = 2.1% for novel pictures, t(15) = 0.38, not significant (n.s.)], as well as for the win/no win discrimination at the outcome time (mean hit rate = 97.8%, SD = 2.3% for win events, mean hit rate = 97.7%, SD = 2.2% for no win events, t(15) = 0.03, n.s.). Subjects discriminated indoor and outdoor status faster for familiar compared with novel images [mean reaction time (RT) = 628.2 ms, SD = 77.3 ms for familiar pictures; mean RT = 673.8 ms, SD = 111 ms for novel pictures; t(15) = 4.43, p = 0.0005]. There was no RT difference for the win/no win discrimination at the outcome time (mean RT = 542 ms, SD = 82.2 ms for win trials; mean RT = 551 ms, SD = 69 ms for no win trials; t(15) = 0.82, n.s.). Similarly, during conditioning, there were no RT differences for the three different fractal images (0.8-probability: RT = 370.1 ms, SD = 79 ms; 0.4-probability: RT = 354.4 ms, SD = 73.8 ms; 0-probability: RT = 372.2 ms, SD = 79.3 ms; F(1,12) = 0.045, n.s.). The latter RT analysis excluded three subjects due to technical problems during data acquisition.
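The within-subject comparisons reported above are of the form of a paired-samples t test, which can be sketched as below. The function name is hypothetical and the example values are made up purely for illustration; they are not data from this study.

```python
import math
from statistics import fmean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired-samples t test of condition a against b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample SD of the within-subject differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = fmean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up values for four hypothetical subjects (not study data).
t_value, df = paired_t([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0])
```

With 16 subjects, as in the present sample, the test has 15 degrees of freedom, which corresponds to the t(15) values reported above.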
In the analysis of the fMRI data, a 2 × 3 ANOVA with factors novelty (novel, familiar) and reward probability (p = 0, p = 0.4, p = 0.8) showed a main effect of novelty bilaterally in the hippocampus (Fig. 2A) and right striatum, false discovery rate (FDR)-corrected for the search volume of the ROIs. A simple main effect of reward (“p = 0.8 > p = 0”) was observed within the left SN/VTA complex (Fig. 2B) and within bilateral striatum (Fig. 2C). See Table 1 for all activated brain regions.
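As an illustration of FDR correction within a search volume, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption for illustration: the text does not name the specific FDR procedure that was used, and the function name and example p values are hypothetical.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest p value passing the Benjamini-Hochberg criterion, or None."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = None
    for rank, p in enumerate(ranked, start=1):
        # Step-up rule: keep the largest p with p <= q * rank / m.
        if p <= q * rank / m:
            threshold = p
    return threshold


# Hypothetical voxelwise p values within one ROI (not study data).
roi_p = [0.001, 0.008, 0.039, 0.041, 0.20]
cutoff = fdr_threshold(roi_p)
```

Voxels with p values at or below the returned cutoff would be declared significant; restricting the list to voxels inside an ROI is what makes the correction specific to that search volume.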
Figure 2. fMRI results. A, Results of the contrast novel versus familiar in the hippocampus ROI, and parameter estimate at the peak voxel of the activated cluster displayed in the map. B, Results of the contrast high reward probability (p = 0.8) versus no reward (p = 0) in the midbrain ROI, and parameter estimate at the peak voxel of the activated cluster displayed in the map. C, Results of the contrast high reward probability versus no reward in the striatum ROI, and parameter estimate at the peak voxel of the three activated clusters displayed in the map. Data are thresholded at p < 0.05 FDR. Activation maps are superimposed on a T1 group template (A, C) and on an MT group template (B).
Table 1. fMRI results: details of fMRI activations within the hippocampus, striatum, and midbrain ROIs
We did not observe a novelty × reward probability interaction when correcting for multiple tests over the entire search volume of our ROIs. However, when performing a post hoc analysis (t test) of the three peak voxels showing a main effect of reward in the striatum, we found (orthogonal) effects of novelty and its interaction with reward: one voxel also showed a main effect of novelty and a novelty × reward interaction, whereas another voxel also showed a main effect of novelty.
As shown in Figure 2C (middle), in the first voxel ([8 10 0]; main effect of reward, F(2,30) = 8.12, p = 0.002; main effect of novelty, F(1,15) = 7.03, p = 0.02; novelty × reward interaction, F(2,30) = 3.29, p = 0.05), this effect was driven by higher BOLD responses to trials with reward probability 0.4 that were preceded by a novel picture (post hoc t test: t(15) = 3.48, p = 0.003). In the second voxel (Fig. 2C, right) ([−10 14 2]; main effect of reward, F(2,30) = 13.13, p < 0.001; main effect of novelty, F(1,15) = 9.19, p = 0.008; no significant interaction, F(2,30) = 1.85, n.s.), post hoc t tests again demonstrated that the main effect of novelty was driven by differences between novel and familiar images at the two low probabilities of reward delivery (t(15) = 2.79, p = 0.014; t(15) = 2.19, p = 0.045, for probability p = 0 and p = 0.4, respectively) (Fig. 2C). In contrast, the third voxel (Fig. 2C, left) ([−22 4 0]; main effect of reward, F(2,30) = 9.1, p = 0.001) showed neither a main effect of novelty (F(1,15) = 2.33, n.s.) nor an interaction (F(2,30) = 1.54, n.s.).
In the midbrain, the voxel with maximal reward-related responses ([−8 −14 −8]; F(2,30) = 12.19, p < 0.001) also showed a trend toward a main effect of novelty (F(1,15) = 4.18, p = 0.059) in the absence of a significant interaction (F(2,30) = 0.048, n.s.).
Discussion
Novel images of scenes enhanced striatal reward responses elicited by subsequent and unrelated rewarding events (predicting abstract cues and reward delivery). As expected, novel images also activated the hippocampus. These findings provide first evidence, to our knowledge, for a physiological prediction that novelty-related hippocampal activation should exert a contextually enhancing effect on reward processing in the ventral striatum (Lisman and Grace, 2005; Bunzeck and Duzel, 2006).
Due to the properties of the BOLD signal, the temporal proximity of the reward-predictive cue and the outcome delivery prevented an estimation of the effects of novelty on these events separately. Rather, we considered the cue-outcome sequence as a compound event and found that the effect of novelty on reward processing varied as a function of the probability of reward occurrence. An enhancement was observed solely when the probability of predicted reward was low (0 or 0.4) and was absent for high reward probability (0.8) (Fig. 2C). It is important to note that this pattern of results cannot be explained by independent effects of novelty and reward in the same region. BOLD effects caused by two functionally distinct but spatially overlapping neural populations would be additive regardless of reward probability and hence lead to a novelty effect also in the 0.8 probability condition. Therefore, these probability-dependent effects of novelty on reward processing argue against the possibility that they reflect a contamination by BOLD responses elicited by novel stimuli themselves. Rather, the findings indicate that contextual novelty increased reward processing per se, albeit only in the low probability condition.
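The additivity argument can be made explicit with a small numerical sketch. All values below are hypothetical response amplitudes chosen for illustration, not parameter estimates from this study, and the function names are likewise illustrative.

```python
PROBS = (0.0, 0.4, 0.8)


def additive_means(reward_part, novelty_part):
    """Cell means if reward and novelty responses simply sum."""
    return {(n, p): reward_part[p] + novelty_part[n]
            for n in ("novel", "familiar") for p in PROBS}


def novelty_effects(cell_means):
    """Novel-minus-familiar difference at each reward probability."""
    return [cell_means[("novel", p)] - cell_means[("familiar", p)]
            for p in PROBS]


def has_interaction(cell_means, tol=1e-9):
    """True if the novelty effect differs across reward probabilities."""
    effects = novelty_effects(cell_means)
    return max(effects) - min(effects) > tol


# Hypothetical amplitudes (arbitrary units), not study data.
additive = additive_means({0.0: 0.0, 0.4: 1.0, 0.8: 2.0},
                          {"novel": 0.5, "familiar": 0.0})

# Pattern resembling the reported result: no novelty effect at p = 0.8.
selective = dict(additive)
selective[("novel", 0.8)] = selective[("familiar", 0.8)]
```

Under the additive model the novelty effect is identical at every reward probability, so a pattern in which it vanishes at p = 0.8 cannot arise from two independent, spatially overlapping responses.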
As explained above, we could not disambiguate BOLD responses between reward anticipation (cues) and reward delivery (outcomes). Novelty may have selectively increased the processing of nonrewarding outcomes (no win trials). This would be consistent with the fact that we did not observe any significant novelty effect on trials with high reward probability because 80% of these trials resulted in reward being delivered. Alternatively, novelty may have influenced reward anticipation for cues that predicted reward delivery with low probability (i.e., 0 and 0.4). In either case, contextual novelty enhanced brain representation for those events that were objectively less rewarding. Moreover, the lack of novelty modulation of reward signals in the high probability condition is unlikely to be due to a ceiling effect in reward processing. Previous work has shown that reward-related responses in the human striatum are scaled adaptively in different contexts, resulting in a signal that represents whether an outcome is favorable or unfavorable in a particular setting (Nieuwenhuis et al., 2005). It can thus be expected that reward responses should also be capable of accommodating a novelty bonus under conditions of high reward probability.
It is well established that the primate brain learns about the value of different stimuli paired with reward in classical conditioning experiments, as measured by increased anticipation of the outcome (e.g., increased licking). In the present experiment, we measured reaction times during the conditioning phase but did not find differences across the different levels of predictive cue strengths. Considering the simplicity of the task and the speed at which subjects responded (<375 ms for all conditions), this lack of a differential response may be due to a ceiling effect. Despite the lack of an objective behavioral measure for conditioning, the successful use of this cue type in previous studies (O'Doherty et al., 2003) suggests that subjects still formed an association between the cues and the different probabilities of reward delivery.
In previous work, reward signals in the striatum have been linked to a variety of reward-related properties both in humans and nonhuman primates, including probability (Preuschoff et al., 2006; Tobler et al., 2008), magnitude (Knutson et al., 2005), uncertainty (Preuschoff et al., 2006), and action value (Samejima et al., 2005). This diversity of reward-related variables expressed in the striatum fits well with its role as a limbic/sensorimotor interface with a critical role in the organization of goal-directed behaviors (Wickens et al., 2007). Both the SN/VTA and the striatum, one of the major projection sites of the midbrain dopamine system, also respond to reward and reward-predictive cues in classical conditioning paradigms (Delgado et al., 2000; Knutson et al., 2000; Fiorillo et al., 2003; Knutson et al., 2005; Tobler et al., 2005; Wittmann et al., 2005; D'Ardenne et al., 2008). According to several computational perspectives, dopamine transmission originating in the SN/VTA teaches the striatum about the value of conditioned stimuli via a prediction error signal (Schultz et al., 1997).
Although in classical conditioning studies, reward and nonreward representations expressed in the striatum do not always have obvious behavioral consequences (O'Doherty et al., 2003; den Ouden et al., 2009), fMRI studies have systematically shown that changes in striatal BOLD activity correlate with prediction errors related to the value of choice options as characterized by computational models fit to behavioral data (O'Doherty et al., 2004; Pessiglione et al., 2006). Striatal state value representations not linked to an action may be related to signals of reward availability that are translated into preparatory responses, for example approach or invigorating effects as seen in pavlovian-instrumental transfer (Cardinal et al., 2002; Talmi et al., 2008). Our data suggest that novelty modulates such state value representations by increasing the expectancy of reward or the response to nonrewarding outcomes. The consequence of this interaction between novelty and reward could be the generation of unconditioned preparatory responses. In the real world, such responses would lead to enhanced approach when novelty is identified with a cue (Wittmann et al., 2008) or to random exploration of the environment when novelty is detected but not associated with a specific cue, as observed in the animal literature (Hooks and Kalivas, 1994). This view is also consistent with influential computational models (Kakade and Dayan, 2002).
One critical structure that is likely involved in the contextually enhanced reward responses in the striatum is the hippocampus. As in previous studies (Tulving et al., 1996; Strange et al., 1999; Bunzeck and Duzel, 2006; Wittmann et al., 2007), we show that contextual novelty activated the hippocampus more strongly than familiarity. Given its strong (indirect) projections to the SN/VTA, we suggest that this structure is the likely source for a novelty signal to the midbrain dopaminergic system (Lisman and Grace, 2005; Bunzeck and Duzel, 2006). The dopaminergic midbrain also receives input from other brain areas, such as the prefrontal cortex, that could also have conveyed novelty signals to it (Fields et al., 2007). Given the evidence to date, however, we consider the hippocampus as the most likely candidate for driving a novelty-related disinhibition of midbrain dopamine neurons that would explain an amplification of striatal reward signals in the context of novelty. The probability-dependent moderation of the contextual novelty effect, in contrast, may have originated in the prefrontal cortex (PFC). Physiological studies show that increasing PFC drive to SN/VTA neurons enhances dopaminergic modulation of PFC regions only, but not dopaminergic input to the ventral striatum (Margolis et al., 2006). Through such a mechanism, PFC could regulate the probability-dependent contextual effects of novelty on SN/VTA and ventral striatal reward representation.
To conclude, the present results demonstrate that contextual novelty increases reward processing in the striatum in response to unrelated cues and outcomes. These findings are compatible with the predictions of a polysynaptic pathway model (Lisman and Grace, 2005) in which hippocampal novelty signals provide a mechanism for the contextual regulation of salience attribution to unrelated events.
Footnotes
- This work was supported by Wellcome Trust Project Grant 81259 (to E.D. and R.J.D.). R.J.D. is supported by a Wellcome Trust Programme Grant. M.G.-M. holds a Marie Curie Fellowship. K.E.S. acknowledges support by the SystemsX.ch project NEUROCHOICE.
- Correspondence should be addressed to Marc Guitart-Masip at the above address. m.guitart{at}ucl.ac.uk