Introduction

Multisensory information processing is vital in the maintenance of many critical daily tasks. Multisensory integration refers to individuals integrating information from different sensory modalities into a unified, coherent, stable, and meaningful perception (Stein & Stanford, 2008; Tang et al., 2016). When multiple sensory inputs are received simultaneously, it is hypothesized that the human brain does not give equal weight to each sensory modality, resulting in some cases where information from one sensory modality receives prioritized processing, leading to the sensory dominance effect (Callan et al., 2015; Hirst et al., 2018; Zhou et al., 2010).

The sound-induced flash illusion (SiFI) is a multisensory phenomenon that demonstrates the influence of sound on visual perception: participants misperceive the number of visual flashes as equal to the number of auditory beeps when the flashes are accompanied by an unequal number of beeps presented within 100 ms (Shams et al., 2000, 2002). It encompasses the fission illusion and the fusion illusion. In the former, one visual flash accompanied by two auditory beeps is perceived as two visual flashes (Shams et al., 2000, 2002). In the latter, two visual flashes presented with one auditory beep are incorrectly perceived as one flash (Andersen et al., 2004). A large number of empirical studies have explored the top-down and bottom-up factors associated with the perception of the illusion (Kamke et al., 2012; Keil, 2020; Wang et al., 2019). For example, our previous studies found that top-down reward and cognitive expectations could reduce the fission illusion, which improved the accuracy of judgment (Wang et al., 2019; Yu et al., 2022). Moreover, increasing the spatial frequency and visual complexity of the visual stimulus reduces illusion perception, whereas increasing the depth of auditory stimuli has the opposite effect (Sun et al., 2022; Takeshima & Gyoba, 2013, 2015). Recently, our studies found that auditory repetition suppression (RS), an adaptive effect caused by stimulus repetition, reduced the size of the fission illusion (Sun et al., 2020; Wang et al., 2021).

Previous studies have discovered that individual perceptual sensitivity is one of the major causes of the magnitude of the illusion (Y.-C. Chen et al., 2017; McCormick & Mamassian, 2008; Vanes et al., 2016). Measures of the individual susceptibility to the illusion based on signal detection theory have been proposed (Watkins et al., 2006; Whittingham et al., 2014). Information from different congruent and incongruent stimulus combinations was combined to compute the sensitivity (d′) and response criteria. Vanes et al. (2016) described how sensitivity and criteria could be computed in the analysis of the fission and fusion illusions. Some investigations have found that the visual perceptual sensitivity of both the fission and fusion illusions was significantly lower than that of the nonillusion condition through the above approach (McGovern et al., 2014; Vanes et al., 2016). Complementarily, Kumpik et al. (2014) proposed that SiFI was determined by individual perceptual sensitivity to the presentation of flashes in the visual field (Kumpik et al., 2014). Consistent with this is evidence that the SiFI in different visual fields is related to discrimination and criteria (Y.-C. Chen et al., 2017). Furthermore, our studies suggested that lower perceptual sensitivity based on auditory RS could weaken the SiFI effect (Sun et al., 2020; Wang et al., 2021). These findings highlight that SiFI was induced by a decrease in perceptual sensitivity, which allowed individuals to perceive the illusion (Kumpik et al., 2014; McCormick & Mamassian, 2008).

Visual adaptation has been shown to change individuals’ visual sensitivity (Tseng et al., 2004; von der Twer & MacLeod, 2001). It usually refers to the improvement of our visual system’s ability to process the environment through an adaptation effect. Neurons in the retina or visual cortex can become less sensitive after adaptation (Kohn, 2007), and they can show an increase in the efficiency of neuronal coding (Clifford et al., 2007). Karjack et al. (2021) had participants adapt to a horizontally drifting grating for 2 minutes before judging whether a drifting test grating moved to the left or the right. They discovered that visual motion adaptation decreases the sensitivity of direction discrimination. Additionally, existing studies have demonstrated that adaptation to low-level visual properties, such as contrast (Baker & Meese, 2012), and to high-level properties, such as face view (Chen, 2010; Rhodes et al., 2010), decreases perceptual sensitivity to the characteristics of the visual world we encounter. Adaptation can also enhance differential sensitivity, enabling the observer to detect smaller differences around the adapted stimulus (Barraclough et al., 2017). At the neural level, there is increasing evidence that adaptation involves neurons adjusting their sensitivity to achieve more efficient coding (Schwartz et al., 2007).

Previous studies have found that SiFI could be influenced by interindividual perceptual sensitivity. Given visual adaptation has been shown to change individuals’ visual sensitivity (Tseng et al., 2004; von der Twer & MacLeod, 2001), the present study used the classic SiFI paradigm to investigate whether the SiFI effect could be modified with visual adaptation by reducing perceptual sensitivity. Recently, we added repeated visual stimuli prior to the presentation of audiovisual stimuli to investigate the visual RS effect on the SiFI and discovered that there was no effect of the number of preceding visual stimuli on the size of the fission and fusion illusions. The possible explanation was that the number of preceding visual stimuli was not enough to affect the size of the illusion (Wang et al., 2021). Hence, we added prolonged adapting visual stimuli prior to the presentation of audiovisual stimuli to investigate whether the bottom-up factor of adaptation affects the SiFI. Moreover, adaptation improves the discrimination of stimuli similar to the adapted stimulus (Oruç & Barton, 2011) while impairing the discrimination of stimuli dissimilar to the adapted stimulus (Phinney et al., 1997). Therefore, the adapting visual stimuli consisted of one or two of the same visual stimuli that lasted for 2 minutes in succession, which is the same patterned flashes as in the SiFI measurement part of the experiment. Since previous studies have shown that visual adaptation can occur over a wide range of times, from milliseconds (Glasser et al., 2011; Pavan et al., 2012) to hours (Bao & Engel, 2012; Zhang et al., 2009) or even days (Delahunt et al., 2004), we chose different groups of participants for separate experiments to eliminate the errors caused by participants’ adaptation more than once. We hypothesized that the adaptation of visual stimuli could affect the SiFI and that the number of adapting flashes would affect the magnitude of the SiFI differently. 
Specifically, we expected adapting to a single flash to enhance the fusion illusion and adapting to double flashes to enhance the fission illusion.

Methods

Participants

In the adapting single-flash condition, 32 participants aged 19 to 24 years (25 females; M = 21.38, SD = 1.67) were included in the statistical analysis. In the adapting double-flash condition, 32 participants aged 19 to 27 years (10 males, 22 females; M = 21.67, SD = 1.93) were included in the statistical analysis. All participants had normal hearing, normal or corrected-to-normal vision, and no visual or psychiatric disorders. Participants received some remuneration after the experiment. Prior to the experiment, all participants gave written informed consent in accordance with the Declaration of Helsinki. The study protocol was approved by the Ethics Committee of Soochow University. The appropriate sample size was calculated using G*Power (Version 3.1.9.7). For a mixed design of 2 (number of adapting visual stimuli: 1 vs. 2) × 2 (illusion type: F1B2 vs. F2B1) with a medium effect size (ƒ = 0.25), a Type I error probability α of 0.01, and power (1 − β) of 0.80, the required sample size was at least 52.

Apparatus and materials

All stimuli were presented on a View Sonic P220f VS10284 monitor with a screen resolution of 1,920 × 1,080 pixels and a refresh rate of 60 Hz. Participants’ heads were stabilized in a chin rest at a distance of 60 cm from the screen. All visual flash stimuli were presented on a black background using Presentation software (Neurobehavioral Systems Inc.). The visual flashes were white disks (2° of visual angle) presented 5° of visual angle below the central fixation point for 17 ms. The flashes were placed in the periphery because the illusory effect is greatest when the visual flash stimuli are located in the peripheral visual field in conjunction with the auditory beep stimuli (Shams et al., 2002). The auditory beeps were presented through a headset (HD200 PRO) at 70 dB and 3.5 kHz for 7 ms.

Experimental design and procedure

A mixed design of 2 (number of adapting visual stimuli: 1 vs. 2) × 2 (number of visual flash stimuli: 1 vs. 2) × 3 (number of auditory beep stimuli: 0 vs. 1 vs. 2) was used. The number of adapting visual stimuli was the between-group variable, and the others were within-group variables. Twelve conditions comprised the experiment (V1_F1, V1_F2, V1_F1B1, V1_F1B2, V1_F2B1, V1_F2B2, V2_F1, V2_F2, V2_F1B1, V2_F1B2, V2_F2B1, and V2_F2B2). For ease of understanding, these conditions are expressed using abbreviations: F stands for flash, B stands for beep, and V stands for adapting flash. For example, V1_F1B2 represents adapting to a single flash followed by a presentation of a single visual flash and two auditory beeps. V1_F1B2 and V2_F1B2 are the fission illusion conditions, and V1_F2B1 and V2_F2B1 are the fusion illusion conditions.
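The condition naming scheme can be made concrete with a short sketch. The Python below is illustrative only: the function names condition_labels and illusion_type are hypothetical and are not part of the original experiment scripts (the experiment itself was run in Presentation). It enumerates the 12 cells of the 2 × 2 × 3 design and marks which cells are the fission and fusion conditions.

```python
from itertools import product


def condition_labels():
    """Enumerate the 2 x 2 x 3 design as labels such as 'V1_F1B2'.

    V = number of adapting flashes, F = number of target flashes,
    B = number of beeps (omitted from the label when there are none).
    """
    labels = []
    for adapt, flashes, beeps in product((1, 2), (1, 2), (0, 1, 2)):
        stimulus = f"F{flashes}" + (f"B{beeps}" if beeps else "")
        labels.append(f"V{adapt}_{stimulus}")
    return labels


def illusion_type(label):
    """Classify a condition label as fission, fusion, or nonillusory."""
    stimulus = label.split("_")[1]
    return {"F1B2": "fission", "F2B1": "fusion"}.get(stimulus, "nonillusory")


labels = condition_labels()
for label in labels:
    print(label, illusion_type(label))
```

Only the incongruent audiovisual cells (one flash with two beeps, two flashes with one beep) are classified as illusion conditions; the remaining eight cells serve as nonillusory baselines.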

At the beginning of the experiment, participants completed a practice session to confirm that they understood the task and were able to discriminate flashes and beeps. The formal experimental procedure consisted of a visual adaptation period followed by the classic SiFI experiment period (see Fig. 1); participants completed the adaptation task before the SiFI task. In the visual adaptation period, participants were presented with the same patterned flashes as in the SiFI measurement part of the experiment. The adapting visual stimuli consisted of one or two identical visual stimuli presented repeatedly for 2 minutes. In the adapting single-flash condition, a single flash was presented repeatedly for 2 minutes, with an interval of 500 ms between successive adapting trials, yielding 232 trials. In the adapting double-flash condition, double flashes were presented repeatedly for 2 minutes, with an interval of 500 ms between successive adapting trials, yielding 200 trials. To ensure that participants remained focused on the adapting stimuli, they were asked at the end of the adaptation period to report how many times the single-flash stimulus (adapting single-flash condition) or the double-flash stimulus (adapting double-flash condition) had occurred. Participants then performed the classic SiFI experiment (Shams et al., 2002). One or two visual target stimuli (duration 17 ms) were presented together with zero, one, or two auditory stimuli (duration 7 ms); the numbers of flashes and beeps were independent of the number of preceding adapting visual stimuli. The interval between the two visual flash stimuli was 66 ms, and that between the auditory sound stimuli was 76 ms. Participants were asked to report the number of visual flash stimuli within 2,500 ms after the stimuli were presented. Each participant completed 384 trials (64 trials per block, six blocks in total), ordered pseudorandomly. The interval between trials varied randomly from 1,400 ms to 1,700 ms in steps of 100 ms.
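The reported trial counts for the adaptation period follow from the stimulus timing. The sketch below reproduces that arithmetic under two assumptions that the text does not state explicitly: the 66 ms interval is the blank gap between the offset of the first flash and the onset of the second, and the 500 ms interval runs from the end of one adapting trial to the start of the next. The constant and function names are hypothetical.

```python
ADAPT_MS = 2 * 60 * 1000   # 2-minute adaptation period
FLASH_MS = 17              # duration of each flash
GAP_MS = 66                # assumed blank gap between the two flashes
ITI_MS = 500               # assumed offset-to-onset interval between adapting trials


def adapting_trials(n_flashes):
    """Number of whole adapting trials that fit into the adaptation period."""
    stimulus_ms = n_flashes * FLASH_MS + (n_flashes - 1) * GAP_MS
    return ADAPT_MS // (stimulus_ms + ITI_MS)


single = adapting_trials(1)   # 120000 // 517
double = adapting_trials(2)   # 120000 // 600
total_test_trials = 64 * 6    # 64 trials per block, six blocks

print(single, double, total_test_trials)
```

Under these assumptions, one single-flash cycle lasts 517 ms and one double-flash cycle lasts 600 ms, which gives 232 and 200 adapting trials in 2 minutes, matching the counts reported above.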

Fig. 1

Experimental procedure and temporal profile of presentation of the stimuli. A visual adaptation period was presented preceding the classic SiFI experiment. V1 represents adapting a single flash (duration 17 ms) and V2 represents adapting double flashes (duration 17 ms), in which the interval between the double flashes was 66 ms. For the adapting a single-flash condition (V1), a single flash presented consecutively and lasted for 2 minutes (232 trials), with each two adapting trials at an interval of 500 ms. For the adapting double-flash condition (V2), double-flashes presented consecutively and lasted for 2 minutes (200 trials), with each two adapting trials at an interval of 500 ms. Participants were asked to indicate the number of times flash stimuli occurred at the end of the adaptation period. Then, the participants were asked to conduct the classic SiFI experiment. Each trial contained a single or two visual flashes (17 ms) accompanied by zero, one or two sounds (7 ms). The interval between the two visual flash stimuli was 66 ms, and that between the auditory sound stimuli was 76 ms. Participants were asked to determine the number of visual flash stimuli within 2500 ms after the stimuli were presented. The interval between trials was randomized, from 1,400 ms to 1,700 ms, in steps of 100 ms. (Color figure online)

Results

Accuracy

Before the statistical analysis, we compared accuracy across all conditions (see Table 1). Accuracy in the nonillusory conditions (i.e., F1, F2, F1B1, and F2B2) was higher than 90% after adapting to either a single flash or double flashes, indicating that participants perceived the number of flashes accurately when only visual flashes were presented or when flashes and beeps were equal in number. To explore the effect of adaptation on accuracy in the SiFI, we performed a 2 (number of adapting visual stimuli: 1 vs. 2) × 2 (number of visual flash stimuli: 1 vs. 2) × 3 (number of auditory beep stimuli: 0 vs. 1 vs. 2) repeated-measures analysis of variance (ANOVA). The main effect of number of adapting visual stimuli was significant, F(1, 62) = 6.00, p = .02, η2 = 0.09. The main effect of number of visual flash stimuli was significant, F(1, 62) = 19.24, p < .001, η2 = 0.24. The main effect of number of auditory beep stimuli was significant, F(2, 61) = 68.53, p < .001, η2 = 0.69. The interaction between number of adapting visual stimuli and number of auditory beep stimuli was significant, F(2, 61) = 3.61, p = .03, η2 = 0.11. The interaction between number of visual flash stimuli and number of auditory beep stimuli was significant, F(2, 61) = 83.46, p < .001, η2 = 0.73. In contrast, the interaction between number of adapting visual stimuli and number of visual flash stimuli was not significant, F(1, 62) = 2.55, p = .12. The three-way interaction among number of adapting visual stimuli, number of visual flash stimuli, and number of auditory beep stimuli was significant, F(2, 61) = 3.83, p = .03, η2 = 0.11.

Table 1 Mean accuracy (%) and standard deviation (%) for all conditions between the adapting a single-flash condition and the adapting double-flash condition

To investigate whether the number of adapting visual stimuli could be affected by the interaction between number of visual flash stimuli and number of auditory beep stimuli, a 2 (number of visual flash stimuli: 1 vs. 2) × 3 (number of auditory beep stimuli: 0 vs. 1 vs. 2) repeated-measures ANOVA was conducted to analyze the results of the accuracy from the number of adapting visual stimuli conditions. When adapting a single flash condition, the results revealed that the main effects of number of visual flash stimuli, F(1, 31) = 5.51, p = .03, η2 = 0.15, and number of auditory beep stimuli, F(2, 30) = 26.77, p < .001, η2 = 0.64, were significant. There was a significant interaction between number of visual flash stimuli and number of auditory beep stimuli, F(2, 30) = 34.67, p < .001, η2 = 0.70. To examine the interactions, we used a series of paired-samples t tests with multiple comparison corrections based on Bonferroni comparisons in the adapting a single flash condition. The results showed that for the one flash stimulus condition, the difference in accuracy between the F1 condition and F1B1 condition was not significant, t(31) = 0.30, p = 1.0, Cohen’s d = 0.07, 95% CI [−0.09, 0.11]. The difference in accuracy between the F1 condition and F1B2 condition was significant, t(31) = 10.46, p < .001, Cohen’s d = 2.52, 95% CI [0.25, 0.44]. The difference in accuracy between the F1B1 condition and F1B2 condition was significant, t(31) = 10.75, p < .001, Cohen’s d = 2.59, 95% CI [0.26, 0.45]. Notably, the accuracies of the nonillusory conditions (i.e., F1 and F1B1) were significantly higher than the illusory condition (i.e., F1B2) on the adapting a single flash condition (ps < .001), which showed the fission illusion. For the two-flashes stimuli condition, the difference in accuracy between the F2 condition and F2B1 condition was significant, t(31) = 3.79, p = .004, Cohen’s d = 0.91, 95% CI [0.03, 0.22]. 
The difference in accuracy between the F2 condition and F2B2 condition was not significant, t(31) = 0.58, p = 1.0, Cohen’s d = 0.14, 95% CI [−0.08, 0.12]. The difference in accuracy between the F2B1 condition and F2B2 condition was significant, t(31) = 4.37, p < .001, Cohen’s d = 1.05, 95% CI [0.05, 0.24]. Notably, the accuracies of the nonillusory conditions (i.e., F2 and F2B2) were significantly higher than the illusory condition (i.e., F2B1) on the adapting a single flash condition (ps < .001), which manifested the fusion illusion (see Table 2). According to Cohen’s guidelines for interpreting effect sizes (Cohen, 1988), d > 0.8 represents a large effect size. Therefore, our results revealed significant fission illusion and fusion illusion for the adapting a single flash condition. In addition, we conducted paired-sample t tests on the illusion conditions. The accuracy of F2B1 was significantly higher than that of F1B2, t(31) = 5.79, p < .001, Cohen’s d = 1.39, 95% CI [0.09, 0.29], which indicated that the fission illusion had a greater amplitude than the fusion illusion.

Table 2 Multiple comparison results of interaction between the number of visual flash stimuli and the number of auditory beep stimuli.

When adapting double-flash condition, the results revealed that the main effects of number of visual flash stimuli, F(1, 31) = 13.83, p = .001, η2 = 0.31, and number of auditory beep stimuli, F(2, 30) = 41.04, p < .001, η2 = 0.73, were significant. There was a significant interaction between number of visual flash stimuli and number of auditory beep stimuli, F(2, 30) = 48.73 , p < .001, η2 = 0.77. To examine the interactions, we used a series of paired-samples t tests with multiple comparison corrections based on Bonferroni comparisons in the adapting double-flash condition. The results showed that for the one flash stimulus condition, the difference in accuracy between the F1 condition and F1B1 condition was not significant, t(31) = 0.25, p = 1.0, Cohen’s d = 0.06, 95% CI [−0.12, 0.14]. The difference in accuracy between the F1 condition and F1B2 condition was significant, t(31) = 12.83, p < .001, Cohen’s d = 3.02, 95% CI [0.41, 0.66]. The difference in accuracy between the F1B1 condition and F1B2 condition was significant, t(31) = 13.08, p < .001, Cohen’s d = 3.08, 95% CI [0.42, 0.67]. Notably, the accuracies of the nonillusory conditions (i.e., F1 and F1B1) were significantly higher than the illusory condition (i.e., F1B2) on the adapting double-flash condition (ps < .001), which showed the fission illusion. For the two flashes stimuli condition, the difference in accuracy between the F2 condition and F2B1 condition was significant, t(31) = 4.79, p < .001, Cohen’s d = 1.13, 95% CI [0.08, 0.33]. The difference in accuracy between the F2 condition and F2B2 condition was not significant, t(31) = 0.53, p = 1.0, Cohen’s d = 0.13, 95% CI [−0.10, 0.15]. The difference in accuracy between the F2B1 condition and F2B2 condition was significant, t(31) = 5.33, p < .001, Cohen’s d = 1.25, 95% CI [0.10, 0.35]. 
Notably, the accuracies of the nonillusory conditions (i.e., F2 and F2B2) were significantly higher than the illusory condition (i.e., F2B1) on the adapting double-flash condition (ps < .001), which manifested the fusion illusion (see Table 2). Moreover, we conducted paired-sample t tests on the illusion conditions. The accuracy of F2B1 was significantly higher than that of F1B2, t(31) = 7.48, p < .001, Cohen’s d = 1.80, 95% CI [0.19, 0.45], which indicated that the fission illusion had a greater amplitude than the fusion illusion.

To further examine the effects of adaptation in the two illusion conditions, we tested adapting a single visual flash and double flash conditions under the F1B2 and F2B1 conditions by independent-samples t test. Figure 2 shows that the accuracy of the V2_F1B2 (M = 43%, SD = 0.35) condition was significantly lower than that of the V1_F1B2 (M = 64%, SD = 0.28) condition, t(62) = 2.55, p = .01, Cohen’s d = 0.64, 95% CI [0.13, 1.14]. However, there was no significant difference between the accuracy of the V1_F2B1 condition and the V2_F2B1 condition, t(62) = 1.31, p = .20.
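For readers who want to see how an independent-samples t statistic and Cohen's d relate to the summary statistics, the sketch below computes both from group means, standard deviations, and sample sizes using the pooled standard deviation. This is a generic textbook calculation, not the authors' analysis script, and the function names are hypothetical. Because the published means and standard deviations are rounded, the output only approximates the reported values (t = 2.55, d = 0.64).

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Return (t, Cohen's d, degrees of freedom) from summary statistics."""
    sp = pooled_sd(sd1, n1, sd2, n2)
    d = (m1 - m2) / sp
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, d, n1 + n2 - 2


# Rounded summary values for F1B2 accuracy: V1 (single flash) vs. V2 (double flash)
t_value, cohens_d, df = independent_t(0.64, 0.28, 32, 0.43, 0.35, 32)
print(f"t({df}) = {t_value:.2f}, d = {cohens_d:.2f}")
```

With equal group sizes of 32, t equals d multiplied by 4 (the square root of n/2), which is a quick consistency check between a reported t value and its effect size.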

Fig. 2

Accuracy (%) of the fission illusion condition and the fusion illusion condition. V1 represents adapting a single-flash condition, and V2 represents adapting double-flash condition. The different color bars mark the number of adapting visual stimuli. Silver bar: The proportion of trials adapting to a single-flash stimulus condition, Gray bar: The proportion of trials adapting to double flashes. F1B2 represents a flash and two auditory beeps, and F2B1 represents two flashes and an auditory beep. Error bars represent the standard error of the mean *p < .05

Signal detection theory analysis of the fission and fusion illusions

According to signal detection theory (McCormick & Mamassian, 2008; Violentyev et al., 2005), differences in illusion magnitude between the adapting conditions are attributable to a change in sensitivity to the flashes and/or in the criterion for reporting the number of flashes induced by the presentation of the beeps. Here, d′ represents the sensitivity (Witt et al., 2015), which reflects the ability to distinguish one flash from two flashes, and ln(β) represents the response criterion (Stanislaw & Todorov, 1999). According to Vanes et al. (2016) and Keil (2020), information from congruent trials with two beeps and two flashes (F2B2) and incongruent trials with two beeps and one flash (F1B2) can be used to analyze the fission illusion. For the fission illusion, two flashes were defined as the signal. Therefore, the response of “2” on F2B2 was a “hit,” and the response of “1” was a “miss.” Likewise, the response of “2” on F1B2 was a “false alarm,” and the response of “1” was a “correct rejection.” Accordingly, information from congruent trials with one beep and one flash (F1B1) and incongruent trials with one beep and two flashes (F2B1) can be used to analyze the fusion illusion. For the fusion illusion, one flash was defined as the signal. Therefore, the response of “1” on F1B1 was a “hit,” and the response of “2” was a “miss.” Likewise, the response of “1” on F2B1 was a “false alarm,” and the response of “2” was a “correct rejection.” Importantly, a positive ln(β) reflects the overall tendency toward responding “1” for the fission condition, whereas it reflects the overall tendency toward responding “2” for the fusion condition. Therefore, in further analysis, ln(β) in the fusion condition was inverted to represent the tendency toward responding “1.” The d′ and ln(β) were calculated separately for the fission illusion (F1B2 and F2B2) and the fusion illusion (F1B1 and F2B1), as follows (Vanes et al., 2016):

$$\begin{aligned} d' &= z(\text{hit rate}) - z(\text{false alarm rate}) \\ \ln(\beta) &= \frac{z(\text{false alarm rate})^{2} - z(\text{hit rate})^{2}}{2} \end{aligned}$$

To avoid extreme d′ scores, proportions of 0 and 1 were replaced with 0.5/n and (n − 0.5)/n, respectively, where n is the number of trials (Macmillan & Kaplan, 1985). We compared the d′ scores of the adapting single-flash and double-flash conditions for the fission (F1B2) and fusion (F2B1) illusions with independent-samples t tests. Figure 3a shows the d′ scores under the illusory conditions. The effect of the number of adapting visual stimuli was significant for the fission illusion (F1B2) condition, t(62) = 2.05, p = .045, Cohen’s d = 0.51, 95% CI [0.01, 1.01], with lower d′ scores in the adapting double-flash condition (M = 1.91, SD = 1.22) than in the adapting single-flash condition (M = 2.49, SD = 1.03). However, there was no significant effect of the number of adapting visual stimuli for the fusion illusion (F2B1) condition, t(62) = 1.24, p = .22.
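The sensitivity and criterion formulas, together with the correction for proportions of 0 and 1, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the function names are hypothetical, the example counts are invented, and z is taken to be the inverse of the standard normal cumulative distribution function, computed here with Python's standard-library statistics.NormalDist.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard normal CDF (z transform)


def corrected_rate(count, n):
    """Proportion count/n, with 0 and 1 replaced by 0.5/n and (n - 0.5)/n."""
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n


def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d', ln(beta)) from hit and false-alarm counts."""
    z_hit = _z(corrected_rate(hits, n_signal))
    z_fa = _z(corrected_rate(false_alarms, n_noise))
    d_prime = z_hit - z_fa
    ln_beta = (z_fa ** 2 - z_hit ** 2) / 2
    return d_prime, ln_beta


# Fission example with invented counts: signal = F2B2 trials, noise = F1B2 trials.
# A "2" response on F2B2 is a hit; a "2" response on F1B2 is a false alarm.
d_prime, ln_beta = sdt_measures(hits=60, n_signal=64, false_alarms=20, n_noise=64)
print(round(d_prime, 2), round(ln_beta, 2))
```

For the fusion illusion the same function applies with F1B1 as the signal trials and F2B1 as the noise trials, counting “1” responses; as described above, the sign of ln(β) would then be inverted so that both illusions share the same response direction.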

Fig. 3

Mean sensitivity and criterion values are shown for flashes presented to the adapting a single flash condition (silver bars) and the adapting double-flash condition (gray bars) for the different illusions. V1 represents adapting a single flash, and V2 represents adapting double-flashes. a Perceptual sensitivity d′ in the fission illusion and the fusion illusion. The d′ score in the adapting a single flash condition (M = 2.49, SD = 1.03) was significantly higher than that of the adapting double-flash condition (M = 1.91, SD = 1.22). b Response criterion ln(β) in the fission illusion and the fusion illusion. There was no significant difference in the number of adapting visual stimuli for the fission and fusion illusion conditions. Error bars represent the standard error of the mean *p < .05

We analyzed the ln(β) scores of adapting single-flash and double-flash conditions under F1B2 and F2B1 conditions by an independent-samples t test. Figure 3b shows the ln(β) scores under the illusory conditions. There was no significant difference in the number of adapting visual stimuli for the fission illusion (F1B2) condition, t(62) = 0.38, p = .70. Meanwhile, there was no significant difference in the number of adapting visual stimuli for the fusion illusion (F2B1) condition, t(62) = 0.64, p = .52.

Discussion

SiFI is induced by a reduction in perceptual sensitivity, which allows individuals to perceive the illusion (Kumpik et al., 2014; McCormick & Mamassian, 2008). Visual adaptation can adjust the neuronal gain to alter the sensitivity, resulting in more efficient coding (Kohn, 2007; Webster, 2011). Therefore, in the present study, based on the classical SiFI paradigm (Shams et al., 2000, 2002), we added adapting visual stimuli prior to the presentation of the audiovisual stimuli to determine whether the bottom-up factor of adaptation affects the SiFI. Our findings confirmed previous findings that the participants’ accuracy was lower in the incongruent conditions than in the congruent conditions (Andersen et al., 2004; Shams et al., 2000, 2002; Shams et al., 2005; Wozny et al., 2008), demonstrating the fission and fusion illusions. In addition, we demonstrated that visual adaptation could affect SiFI whether adapting a single flash or double flashes, especially for the fission illusion. This finding gains additional support from our signal detection analysis, and visually adapting double-flashes reduced the participants’ discriminability (d′) to visual flashes compared with adapting a single flash, enhancing the illusion effects. These results indicated that the reduced perceptual sensitivity based on visual adaptation could enhance the fission illusion in multisensory integration.

Our results mirrored previous studies’ results (Shams et al., 2000, 2002), which showed that the accuracy for the illusory conditions (F1B2 and F2B1) was significantly lower than those for the nonillusory conditions (F1, F2, F1B1 and F2B2) regardless of the number of adapting stimuli, which indicated that visual adaptation, a bottom-up factor, could affect the SiFI effect. Empirical evidence has suggested that adaptation can improve differential sensitivity, allowing the participants to detect smaller changes surrounding the adapted stimulus (Barraclough et al., 2017). As a result, visual adaptation can adjust the neurons in the retina or visual cortex to alter changes in sensitivity (Clifford et al., 2000; Kohn, 2007), improving the SiFIs by reducing perceptual sensitivity, especially in the form of multisensory integration. Meanwhile, this result may be explained by the fact that the SiFI was influenced by experience (Hirst et al., 2020). Our previous studies have indicated that long-term training played an important role in ameliorating the performance of the fission and fusion illusions, suggesting that perceptual experience has a strong effect on the SiFI effect (Huang et al., 2022). We proposed that the visual adaptation session can be considered an experience, which might improve the participants’ familiarity with SiFI and reduce the illusion to some extent.

The most obvious finding was a significant difference between the two adaptation conditions for the fission illusion: accuracy after adapting to double flashes was significantly lower than after adapting to a single flash. Based on previous studies, adaptation improves the discrimination of stimuli similar to the adapted stimulus (Oruç & Barton, 2011) while impairing the discrimination of stimuli dissimilar to the adapted stimulus (Phinney et al., 1997). Therefore, when the number of adapting flashes is unequal to the number of visual target flashes, as when adapting to double flashes in the fission illusion, adaptation impairs the discrimination of the visual target stimuli. Moreover, greater SiFI effects mean lower sensitivity; the decreased perceptual sensitivity indexed by d′ can be regarded as a measure of susceptibility to the illusion (Kumpik et al., 2014; McCormick & Mamassian, 2008). In the present study, Fig. 3a shows that adapting to double flashes decreased the perceptual sensitivity to visual target stimuli for the fission illusion, which resulted in higher auditory dominance. Meanwhile, Fig. 3b shows that the individual response criterion is not the major cause of the magnitude of the illusion. This is why perceptual sensitivity for the fission illusion was lower after adapting to double flashes than after adapting to a single flash.

The duration of the adaptation stimuli was equal (2 minutes) in the single- and double-flash adaptation conditions. However, the number of presentations in the adaptation session differed between the two conditions (V1: 232 trials; V2: 200 trials). If the adaptation effect found in the current study were caused by this difference, then adapting to double flashes should result in lower accuracy and sensitivity (d′) than adapting to a single flash, regardless of the illusion type. However, our findings indicated that visual adaptation only affected the fission illusion. Besides, previous studies have shown that visual adaptation can occur over a wide range of times, from milliseconds (Glasser et al., 2011; Pavan et al., 2012) to hours (Bao & Engel, 2012; Zhang et al., 2009) or even days (Delahunt et al., 2004). Therefore, the adaptation effect in the current study did not reflect adaptation to the number of adaptation stimuli; rather, visual adaptation could increase the fission illusion by reducing perceptual sensitivity.

Nevertheless, in contrast to the fission illusion, visual adaptation had less effect on the fusion illusion. We also found that the fission illusion had a greater amplitude than the fusion illusion, in line with previous studies (Shams et al., 2000, 2002; Shams et al., 2005; Watkins et al., 2006, 2007; Wozny et al., 2008). The possible explanation was that different underlying mechanisms and influencing factors may have contributed to the fission and fusion illusions (Kostaki & Vatakis, 2016; Mishra et al., 2008). As mentioned in the literature review, unlike the fission illusion, which emerges from primary sensory cortices (Hirst et al., 2020), the fusion illusion involves unisensory (e.g., V1 and A1) and multisensory brain areas. Moreover, alpha frequency correlates with the fission illusion (Venskus & Hughes, 2021), while beta frequency correlates with the fusion illusion (Noguchi, 2022). These results suggest that the fission and fusion illusions represent different neural mechanisms and can be separated in a frequency domain.

Previous research demonstrated that the fission illusion was more stable and susceptible to bottom-up factors, resulting in varying degrees of changes in the illusion effects (Bolognini et al., 2011; Shams et al., 2000; Wozny et al., 2008). The present findings showed that the fusion illusion exhibited relatively stable characteristics, whereas the fission illusion was significantly influenced by the bottom-up factor of visual adaptation. This suggests that the fission and fusion effects reflect different underlying mechanisms. Visual adaptation usually refers to the improvement of our visual system’s ability to process the environment through an adaptation effect. Neurons in the retina or visual cortex can become less sensitive after adaptation (Kohn, 2007), and they can show an increase in the efficiency of neuronal coding (Clifford et al., 2007). In the primary visual cortex (V1), adaptation to the cell’s preferred stimulus produces the largest response change at this level (Dragoi et al., 2000; Movshon & Lennie, 1979). Using retinotopic mapping, Watkins et al. (2006) investigated whether the fission illusion can occur in early visual cortical areas. The fission illusion was found to be associated with a greater activation level in V1 (Watkins et al., 2006). In contrast, their research on the fusion illusion found that illusory visual flashes were associated with lower V1 activation levels (Watkins et al., 2007). This suggests that, by altering the activation of V1, the bottom-up factor of adaptation could affect the fission illusion rather than the fusion illusion.

Conclusion

In summary, the present study revealed that the SiFI effect could be regulated by bottom-up visual adaptation, which might act through perceptual sensitivity. Visual adaptation had different effects on the two illusion conditions. For the fission illusion, adapting to double flashes enhanced the size of the illusion more than adapting to a single flash. This is attributed to a lower d′ after adapting to double flashes than after adapting to a single flash. However, for the fusion illusion, neither visual adaptation condition had an effect on the size of the illusion. A possible reason is that fusion effects are less reliable than fission effects. The enhancement of the fission illusion by double-flash adaptation therefore suggests that the SiFI effect could be modified with visual adaptation by reducing perceptual sensitivity.