English spoken word segmentation activates the prefrontal cortex and temporo-parietal junction in Chinese ESL learners: A functional near-infrared spectroscopy (fNIRS) study

Yadan Li a,1, Yilong Yang b,c,1,*, Akaysha C. Tang d,e, Nian Liu f, Xuewei Wang a, Ying Du a, Weiping Hu a,g,*

a MOE Key Laboratory of Modern Teaching Technology, Shaanxi Normal University, Xi’an, China
b Research Center for Linguistics and Applied Linguistics, Xi’an International Studies University, Xi’an, China
c School of English Studies, Xi’an International Studies University, Xi’an, China
d The Laboratory of Neuroscience for Education, University of Hong Kong, China
e The Mind Research Network, Albuquerque, N.M., U.S.A.
f Department of Modern Languages, Literatures, and Linguistics, University of Oklahoma, Norman, U.S.A.
g Shaanxi Normal University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University, Xi’an, China

1 Equal first author.
* Corresponding authors: Yilong Yang, School of English Studies, Xi’an International Studies University, Xi’an, China (email: [email protected], [email protected]); Weiping Hu, MOE Key Laboratory of Modern Teaching Technology, Shaanxi Normal University, Xi’an, China (email: [email protected]).

Abstract


A direct measure of spoken lexical processing based on neuroimaging technology would provide useful information for understanding the neural mechanisms underlying speech and auditory language processing. The neural mechanisms of spoken word segmentation in English as a second language (ESL) learners remain elusive. The present study, using functional near-infrared spectroscopy (fNIRS), addresses this issue by measuring hemodynamic responses in the temporo-parietal junction (TPJ) and the prefrontal cortex (PFC) during a word-spotting task designed with two task conditions (easy vs. difficult). Thirty participants, divided into a high listening proficiency group (HLG) and a low listening proficiency group (LLG), were tested. Results revealed significantly less TPJ activation in the HLG than in the LLG. Further analyses supported this result by showing that activation in the TPJ was negatively correlated with listening proficiency. This association appears to reflect a more efficient use of processing resources in a bottom-up fashion for accurate and efficient sensory representations in highly proficient language learners. In contrast, cortical activation in the PFC increased with listening proficiency and was stronger in the difficult task condition than in the easy task condition, implying that recruitment of top-down cognitive control functions might play a role in word segmentation. Our results suggest that the combination of functions mediated via bottom-up sensory input processing (demonstrated by the TPJ activation) and top-down cognitive processing (demonstrated by the PFC activation) is crucial for ESL listeners’ spoken word segmentation.

Keywords: spoken word segmentation; word-spotting task; temporo-parietal junction; prefrontal cortex; functional near-infrared spectroscopy

1. Introduction


For second language (L2) learners, listening may be both the most important and the most challenging of the four general language skills (i.e. listening, speaking, reading, and writing), because in speech there are no obvious pauses between words to tell the listener where the speech stream should be segmented. In contrast, native speakers seem able to segment native speech streams into words with great ease. In the case of Chinese-speaking English learners, Goh (2000) reported that one of the most difficult problems in their English listening was that they “do not recognize words they know.” One important ability in speech perception is spoken word segmentation. Previous behavioral studies on spoken word segmentation have identified cues that participants use to segment speech streams (Sanders & Neville, 2003), such as metrical cues (e.g. Norris et al., 1995; Vroomen et al., 1996), cues to stress and accent (e.g. Banel & Bacri, 1995; Yerkey & Sawusch, 1992, 1993), allophonic cues (e.g. Yerkey & Sawusch, 1993), phonotactic cues (e.g. Banel & Bacri, 1995; McQueen & Cox, 1995), and statistical learning (e.g. Johnson & Tyler, 2010), all of which were classified into lexical cues and prelexical cues by Gow and Gordon (1995). Studies in phonological processing have demonstrated that children’s literacy levels affect their phonological awareness (e.g. Castles et al., 2003; Ehri & Wilce, 1980; Stuart, 1990), which in turn promotes speech perception (Gillon, 2005). These studies imply that language proficiency may influence the performance of spoken word segmentation, an effect that would be prominent and crucial for second language learners. However, there is relatively little research in this direction, and this speculation requires further, direct evidence.

Only a limited number of studies have investigated the neural mechanisms of spoken word segmentation from the perspective of cognitive neuroscience. Most of these studies used event-related potentials (ERP) and focused on phonological and semantic factors (e.g. Goyet et al., 2010; Kooijman et al., 2009; Newman et al., 2012; Sanders & Neville, 2003; Sanders et al., 2002). Burton et al. (2000) used fMRI to investigate the role of segmentation in phonological processing and proposed that frontal activation was the result of either word segmentation or the use of working memory. Furthermore, evidence from neuroimaging studies suggests that not only the supramarginal gyrus (SMG) and the superior temporal brain regions (e.g. Bitan et al., 2007; Chang et al., 2010; Hickok & Poeppel, 2007; Price et al., 1992; Wilson et al., 2007) but also the temporo-parietal junction (TPJ) (e.g. Dai et al., 2018; Perner & Aichhorn, 2008; Saxe & Kanwisher, 2003; Saxe & Wexler, 2005) are responsible for speech/auditory processing. However, there are very few neuroimaging studies on spoken word segmentation. The underlying neural mechanisms of spoken word segmentation in second language learners remain largely unknown, and the patterns of cortical activation during spoken word segmentation are yet to be explored.

Neuroscience studies have further shown that both bottom-up and top-down processes are used in phonological processing (e.g. Bitan et al., 2009; Bonte et al., 2005; Newman & Connolly, 2009). Auditory brain areas (e.g. the supramarginal gyrus and superior temporal gyrus) are recruited for bottom-up phonological processing (Bitan et al., 2009). In particular, it has been reported that the TPJ (including the SMG and superior temporal brain regions) is involved in stimulus-driven or bottom-up induced auditory attention, auditory perception, and the processing of semantic information (e.g. Hall & Moore, 2003; Molholm et al., 2004; Seifritz et al., 2002). Moreover, the activation of top-down phonological processes is confirmed to be a component of prefrontal cortical functions (Bitan et al., 2009). The prefrontal cortex (PFC) has been reported to be activated in speech intelligibility tests, reflecting possible compensatory or cognitive activities in more difficult listening conditions (e.g. Bisconti et al., 2012; Davis & Johnsrude, 2003; Lawrence et al., 2018; Wijayasiri et al., 2017; Wild et al., 2012). The PFC is also responsible for planning cognitive behavior in a top-down fashion, including cognitive inhibition (e.g. Chmielewski et al., 2014; Rubia et al., 2003). A few behavioral studies have implied that cognitive inhibition plays a role in spoken word segmentation (e.g. Norris et al., 1995; Vroomen et al., 1996), and that inhibition is also a cue (i.e. an inhibitory cue) that participants use (Luce & Cluff, 1998). However, these claims are limited to behavioral investigations, and thus can neither provide neural evidence nor indicate the possible engagement of cognitive inhibition in word segmentation. Whether both bottom-up and top-down processing are engaged in spoken word segmentation remains unknown.

To the best of our knowledge, there has been no prior research using functional near-infrared spectroscopy (fNIRS) to explore the effects of L2 listening proficiency and task difficulty on spoken word segmentation and its underlying neural mechanisms. fNIRS, a noninvasive technique for recording hemodynamic responses in the human brain based on optical measurements, is drawing more and more attention as a tool for functional brain imaging (Boas et al., 2014). Research in auditory language processing using fNIRS has grown rapidly in recent years (e.g. Chen et al., 2015; Hassanpour et al., 2015; Hong & Santosa, 2016; Lawrence et al., 2018; Pollonini et al., 2014; van de Rijt et al., 2016; Wiggins et al., 2016; Wijayasiri et al., 2017), because fNIRS has several important strengths for such studies. In the present study, we used fNIRS rather than fMRI for three reasons. First, fNIRS is much quieter and thus does not interfere with the audio stimuli in the experiment. Second, fNIRS places fewer restrictions on participants’ motion and is more tolerant of the movement artifacts produced by verbal responses, thus allowing participants to behave more naturally. Third, the temporal resolution of fNIRS is considerably higher than that of fMRI.

The purpose of the current research is to investigate the neural mechanisms of English spoken word segmentation in Chinese ESL (English as a second language) learners using a word-spotting paradigm and fNIRS. We chose the word-spotting paradigm because it not only provides a measure of the competition between different lexical hypotheses but also has higher ecological validity, better imitating listeners’ segmentation of continuous speech (McQueen, 1996). In the current study, Chinese ESL learners with different English listening proficiency levels performed a word-spotting task (Cutler & Shanley, 2010; Farrell, 2015) designed with two task conditions (easy vs. difficult) while their cortical activation was recorded. As previous studies suggest that the TPJ is responsible for auditory language processing in a bottom-up fashion while the PFC is involved in top-down cognitive inhibition, we hypothesized that both the TPJ (reflecting bottom-up processing) and the PFC (reflecting top-down processing) would be activated during L2 learners’ spoken word segmentation, and that the activation patterns in these two brain regions would differ depending on participants’ L2 listening proficiency and the task condition.

2. Results

2.1. Behavioral data

2.1.1. L2 listening proficiency
Participants’ ESL listening proficiency was tested (M=29.47, SD=4.68; full mark=40). The high listening proficiency group (HLG) consisted of the 15 participants with the highest scores (M=33.13, SD=2.26), and the low listening proficiency group (LLG) consisted of the 15 participants with the lowest scores (M=25.80, SD=3.38) (Fig. 1a). The two groups differed significantly in ESL listening proficiency [F(1, 28)=48.650, p<0.001, η2=0.635].

2.1.2. Word-spotting task
The participants’ behavioral performance in the word-spotting task was recorded (M=57.80, SD=9.14; full mark=94) and analyzed by group (HLG: M=61.13, SD=9.55; LLG: M=54.45, SD=7.61) (Fig. 1b). A two-way repeated-measures ANOVA (RM-ANOVA) confirmed a significant main effect of participant group [F(1, 28)=5.598, p<0.05, η2=0.167]. A post-hoc comparison showed that the HLG performed significantly better than the LLG (p<0.05). The participants’ performance was also analyzed in the two task conditions (easy vs. difficult). Both groups achieved better results in the easy (M=33.87, SD=5.16; full mark=47) than in the difficult task condition (M=23.93, SD=5.12; full mark=47) (Fig. 1c). An RM-ANOVA revealed a significant main effect of task condition [F(1, 28)=131.617, p<0.001, η2=0.825]. However, no significant interaction was found between participant group (HLG vs. LLG) and task condition (easy vs. difficult) [F(1, 28)=0.717, p=0.404, η2=0.025]. Post-hoc comparisons showed that the two groups (HLG and LLG) performed similarly in the easy task condition (p=0.122) but differed significantly in the difficult task condition (p<0.05).

Figure 1 Mean behavioral task scores. (a) ESL listening proficiency scores of the high and low ESL listening proficiency groups. (b) Word-spotting task scores in the high and low ESL listening proficiency groups (both task conditions). (c) Word-spotting task scores in the easy and difficult task conditions (both participant groups). Error bars show standard deviations (SD). *** p<0.001, * p<0.05.

2.1.3. Correlation analysis of L2 listening proficiency and performance of spoken word segmentation
A Pearson correlation analysis was carried out to check whether participants’ ESL listening proficiency was related to their performance in the word-spotting task. The results revealed a significant positive correlation between ESL listening proficiency and word-spotting performance (n=30, r=0.420, p<0.05).
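To make the analysis pipeline of Sections 2.1.2 and 2.1.3 concrete, the following is a minimal sketch in Python of a mixed group-by-condition ANOVA and the proficiency-performance correlation. The file name, column names, and the use of the pingouin package are illustrative assumptions; the original analyses were run in SPSS.

```python
# Minimal sketch of the behavioral analyses (Sections 2.1.2-2.1.3).
# Assumes a long-format table with one row per participant x condition;
# file and column names are illustrative, not those of the original analysis.
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Columns: subject (ID), group ('HLG'/'LLG'), condition ('easy'/'difficult'),
# score (word-spotting score), listening (ESL listening proficiency, 0-40)
df = pd.read_csv("word_spotting_long.csv")  # hypothetical file

# Mixed ANOVA: condition is within-subject, group is between-subject
aov = pg.mixed_anova(data=df, dv="score", within="condition",
                     subject="subject", between="group")
print(aov[["Source", "F", "p-unc", "np2"]])

# Pearson correlation between listening proficiency and total word-spotting score
per_subject = df.groupby("subject").agg({"score": "sum", "listening": "first"})
r, p = pearsonr(per_subject["listening"], per_subject["score"])
print(f"r = {r:.3f}, p = {p:.3f}")
```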

2.2. fNIRS data

GLM beta weights of HbO were used to identify differences in the fNIRS response between task conditions (easy vs. difficult) and participant groups (LLG vs. HLG). An RM-ANOVA with Greenhouse-Geisser correction for non-sphericity (Geisser & Greenhouse, 1958) was performed channel by channel. No significant interaction was found between condition and group, while the main effects of condition and group were significant in several channels (Table 1, Fig. 3).
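As a rough sketch of this channel-by-channel contrast analysis, the loop below fits the same group-by-condition model to the HbO beta weights of each channel and flags effects against a Bonferroni-adjusted threshold. The data layout, names, and the per-channel correction shown here are assumptions made for illustration; this is not the NIRS_SPM/SPSS code actually used, and the original correction scheme may have differed.

```python
# Sketch of the channel-wise RM-ANOVA on HbO beta weights with Bonferroni control.
# betas: long-format table with columns subject, group, condition, channel, beta.
# Layout and names are illustrative assumptions.
import pandas as pd
import pingouin as pg

betas = pd.read_csv("hbo_beta_weights.csv")      # hypothetical file
n_channels = betas["channel"].nunique()
alpha_bonf = 0.05 / n_channels                   # Bonferroni-adjusted threshold

rows = []
for ch, sub in betas.groupby("channel"):
    aov = pg.mixed_anova(data=sub, dv="beta", within="condition",
                         subject="subject", between="group")
    for _, eff in aov.iterrows():
        rows.append({"channel": ch, "effect": eff["Source"],
                     "F": eff["F"], "p": eff["p-unc"],
                     "significant": eff["p-unc"] < alpha_bonf})

print(pd.DataFrame(rows).sort_values(["effect", "p"]))
```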

Table 1 RM-ANOVA results of beta weights of hemodynamic activation

Channel  Brain area     Condition (easy, difficult)  Group (LLG, HLG)
Ch4      r-SMG (BA40)   0.025*                       0.025*
Ch5      r-SMG (BA40)   0.013*                       0.045*
Ch6      r-SMG (BA40)   -                            0.044*
Ch7      r-SMG (BA40)   0.041*                       0.045*
Ch9      l-PS (BA3)     -                            0.015*
Ch10     l-SMG (BA40)   -                            0.016*
Ch11     l-SMG (BA40)   -                            0.017*
Ch12     l-PS (BA2)     0.048*                       0.050*
Ch13     l-SMG (BA40)   -                            0.012*
Ch14     l-STG (BA22)   0.014*                       0.024*
Ch18     r-FPC (BA10)   0.022*                       -
Ch19     l-FPC (BA10)   0.010**                      -
Ch27     FPC (BA10)     0.017*                       -
Ch35     r-OFC (BA11)   0.030*                       -
Ch36     l-OFC (BA11)   0.033*                       -

Abbreviations: SMG, supramarginal gyrus; PS, primary somatosensory cortex; STG, superior temporal gyrus; FPC, frontopolar cortex; OFC, orbitofrontal cortex. Cell values are p-values for the main effects of task condition and participant group; * p<0.05, ** p<0.01, Bonferroni corrected; - indicates a non-significant effect.

2.2.1. Neural response differences between the two task conditions
An RM-ANOVA was conducted on the beta values channel by channel across the two participant groups. The results revealed a main effect of task condition at Ch4, 5, and 7 in the right supramarginal gyrus part of Wernicke’s area (r-SMG, BA40), Ch12 in the left primary somatosensory cortex (l-PS, BA2), Ch14 in the left superior temporal gyrus (l-STG, BA22), Ch18 in the right frontopolar cortex (r-FPC, BA10), Ch19 in the left FPC (BA10), Ch27 in the FPC (BA10), Ch35 in the right orbitofrontal cortex (r-OFC, BA11), and Ch36 in the left OFC (BA11) (Table 2, Fig. 3a).

Table 2 RM-ANOVA results of beta weights of hemodynamic activation between easy and difficult task conditions

Channel  Estimated MNI (X, Y, Z)  Brain area     Probability  F(1, 28)  Sig.     η2
Ch4      63, -39, 51              r-SMG (BA40)   0.948        5.633     0.025*   0.167
Ch5      61, -52, 46              r-SMG (BA40)   0.791        7.035     0.013*   0.201
Ch7      69, -41, 28              r-SMG (BA40)   0.882        4.603     0.041*   0.141
Ch12     -69, -28, 30             l-PS (BA2)     0.726        4.199     0.050*   0.130
Ch14     -68, -41, 21             l-STG (BA22)   0.673        6.808     0.014*   0.196
Ch18     16, 72, 16               r-FPC (BA10)   1            5.841     0.022*   0.173
Ch19     -13, 72, 14              l-FPC (BA10)   1            7.681     0.010**  0.215
Ch27     3, 70, 2                 FPC (BA10)     0.994        6.498     0.017*   0.188
Ch35     16, 72, -8               r-OFC (BA11)   0.781        5.204     0.030*   0.157
Ch36     -12, 72, -8              l-OFC (BA11)   0.784        5.000     0.033*   0.152

Abbreviations: SMG, supramarginal gyrus; PS, primary somatosensory cortex; STG, superior temporal cortex; FPC, frontopolar cortex; OFC, orbitofrontal cortex. * p<0.05, ** p<0.01, Bonferroni corrected.

We also conducted post-hoc comparisons of beta values of HbO concentrations in response to different task conditions, and the results suggested that brain activation was significantly more intense in the difficult task condition than in the easy task condition in the following channels: Ch4 (easy: 0.010±0.002, difficult: 0.012±0.002, p<0.05), Ch5 (easy: 0.010±0.002, difficult: 0.012±0.002, p<0.05), Ch7 (easy: 0.010±0.002, difficult: 0.012±0.002, p<0.05), Ch12 (easy: 0.009±0.002, difficult: 0.011±0.003, p<0.05), Ch14 (easy: 0.012±0.002, difficult: 0.014±0.002, p<0.05), Ch18 (easy: 0.008±0.001, difficult: 0.011±0.002, p<0.05), Ch19 (easy: 0.009±0.001, difficult: 0.012±0.002, p<0.01), Ch27 (easy: 0.013±0.002, difficult: 0.016±0.002, p<0.05), Ch35 (easy: 0.006±0.001, difficult: 0.009±0.002, p<0.05), and Ch36 (easy: 0.007±0.001, difficult: 0.010±0.002, p<0.05).

To observe the overall temporal responses under each experimental condition, event-related HbO and HbR hemodynamic response changes were grand-averaged across the two participant groups in the channels with a significant difference (Fig. 2). Overall, participants’ hemodynamic responses showed a canonical increase in HbO and a decrease in HbR. The experimental tasks activated hemodynamic responses in both the TPJ and the FPC. In the TPJ, the r-SMG (BA40) channels (Ch4, 5, 7) and the l-STG (BA22) channel (Ch14) had higher sensitivity than did the l-PS (BA2) channel (Ch12). In the FPC, the FPC (BA10) channels (Ch18, 19, 27) had higher response intensity than did the OFC (BA11) channels (Ch35, 36).
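The grand-averaged time courses summarized above (Fig. 2) amount to event-locked epoch averages of the continuous HbO/HbR traces. Below is a minimal sketch of that averaging step, assuming traces sampled at 10 Hz and a list of stimulus-onset sample indices; the array names and the epoch window are illustrative and not taken from the original processing.

```python
# Sketch of event-related averaging of hemodynamic time courses (cf. Fig. 2).
# Assumes hbo/hbr is a (n_samples, n_channels) array sampled at 10 Hz and that
# onsets holds stimulus-onset sample indices; names and window are illustrative.
import numpy as np

FS = 10                      # sampling rate of the fNIRS system (Hz)
PRE, POST = 2 * FS, 20 * FS  # epoch window: -2 s to +20 s around onset

def event_related_average(signal, onsets, pre=PRE, post=POST):
    """Cut epochs around each onset, baseline-correct, and average them."""
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > signal.shape[0]:
            continue                                  # skip truncated epochs
        epoch = signal[onset - pre:onset + post, :]   # samples x channels
        baseline = epoch[:pre, :].mean(axis=0)        # mean of pre-stimulus part
        epochs.append(epoch - baseline)
    return np.mean(epochs, axis=0)                    # trial-averaged time course

def grand_average(participants):
    """participants: list of (signal, onsets) pairs for one condition."""
    return np.mean([event_related_average(sig, ons) for sig, ons in participants],
                   axis=0)
```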


Figure 2 Grand-averaged event-related hemodynamic time courses. The red and blue lines represent the estimated changes in the concentration of HbO and HbR respectively across participant groups. Solid lines indicate hemodynamic responses under the difficult task condition, while dashed lines show hemodynamic responses under the easy task condition.

2.2.2. Neural response differences between the two participant groups
Results from the ANOVA contrast analyses revealed the group main effect at Ch4, 5, 6, 7 in the right SMG (BA40), Ch9 in the left PS (BA3), Ch10, 11 in the left SMG (BA40), Ch12 in the left PS (BA2), Ch13 in the left SMG (BA40), and Ch14 in the left STG (BA22) (Table 3, Fig. 3b).

Table 3 RM-ANOVA results of beta weights of hemodynamic activation between LLG and HLG

Channel  Estimated MNI (X, Y, Z)  Brain area     Probability  F(1, 28)  Sig.    η2
Ch4      63, -39, 51              r-SMG (BA40)   0.948        5.624     0.025*  0.167
Ch5      61, -52, 46              r-SMG (BA40)   0.791        4.406     0.045*  0.136
Ch6      70, -30, 37              r-SMG (BA40)   0.690        4.453     0.044*  0.137
Ch7      69, -41, 28              r-SMG (BA40)   0.882        4.401     0.045*  0.136
Ch9      -59, -22, 52             l-PS (BA3)     0.506        6.784     0.015*  0.195
Ch10     -46, -44, 62             l-SMG (BA40)   0.753        6.523     0.016*  0.189
Ch11     -63, -38, 48             l-SMG (BA40)   0.912        6.475     0.017*  0.188
Ch12     -69, -28, 30             l-PS (BA2)     0.726        4.126     0.050*  0.128
Ch13     -61, -52, 43             l-SMG (BA40)   0.756        7.188     0.012*  0.204
Ch14     -68, -41, 21             l-STG (BA22)   0.673        5.730     0.024*  0.170

Abbreviations: SMG, supramarginal gyrus; PS, primary somatosensory cortex; STG, superior temporal cortex. * p<0.05, Bonferroni corrected.

We conducted post-hoc comparisons on the beta values of the HbO concentrations between the two groups. Results showed that brain activation was significantly more intense in the LLG than in the HLG in the following channels: Ch4 (LLG: 0.016±0.003, HLG: 0.006±0.003, p<0.05), Ch5 (LLG: 0.015±0.003, HLG: 0.007±0.003, p<0.05), Ch6 (LLG: 0.011±0.002, HLG: 0.004±0.002, p<0.05), Ch7 (LLG: 0.016±0.003, HLG: 0.006±0.003, p<0.05), Ch9 (LLG: 0.014±0.002, HLG: 0.007±0.002, p<0.05), Ch10 (LLG: 0.014±0.003, HLG: 0.003±0.003, p<0.05), Ch11 (LLG: 0.014±0.004, HLG: 0.001±0.004, p<0.05), Ch12 (LLG: 0.015±0.003, HLG: 0.006±0.003, p<0.05), Ch13 (LLG: 0.017±0.004, HLG: 0.003±0.004, p<0.05), and Ch14 (LLG: 0.018±0.003, HLG: 0.008±0.003, p<0.05).


Figure 3 RM-ANOVA results of beta weights representing condition differences and group differences. Channels with significant main effects are marked by circles. (a) Contrast between easy and difficult task conditions. (b) Contrast between HLG and LLG.

2.2.3. The correlation between behavioral and fNIRS data
Pearson correlation analyses revealed that participants’ ESL listening proficiency was significantly correlated with the beta values of HbO in the word-spotting task in some channels. Both positive and negative correlations were found. Under the easy task condition, the correlations were negative in Ch7 (r-SMG, BA40) (r=-0.370, p<0.05), Ch9 (l-PS, BA3) (r=-0.413, p<0.05), Ch33 (right inferior prefrontal gyrus, BA47) (r=-0.412, p<0.05), and Ch34 (r-FPC, BA10) (r=-0.458, p<0.05), and positive in Ch25 (r-FPC, BA10) (r=0.432, p<0.05), Ch26 (r-FPC, BA10) (r=0.409, p<0.05), Ch27 (FPC, BA10) (r=0.453, p<0.05), and Ch28 (l-FPC, BA10) (r=0.430, p<0.05). Under the difficult task condition, negative correlations were found in Ch2 (r-SMG, BA40) (r=-0.382, p<0.05), Ch7 (r-SMG, BA40) (r=-0.396, p<0.05), and Ch9 (l-PS, BA3) (r=-0.380, p<0.05), and positive correlations in Ch26 (r-FPC, BA10) (r=0.767, p<0.001), Ch27 (FPC, BA10) (r=0.677, p<0.001), and Ch28 (l-FPC, BA10) (r=0.789, p<0.001).

3. Discussion

The current research is a preliminary attempt to use fNIRS to investigate the neural mechanisms of English spoken word segmentation in Chinese ESL learners with a word-spotting paradigm. Its major findings either confirm or extend the results of previous behavioral and neuroscience studies, and also have clinical implications. Specifically, participants’ performance in spoken word segmentation was positively correlated with their ESL listening proficiency. We found that the TPJ (l-STG, r-SMG, l-SMG) and PFC (FPC, r-FPC, l-FPC, r-OFC, l-OFC) were both involved in spoken word segmentation; however, the activation patterns in these brain regions were different depending on participants’ ESL listening proficiency levels and on task conditions. The behavioral and fNIRS findings suggest that higher ESL listening proficiency implies a more efficient use of processing resources in spoken word segmentation. From the perspective of cognitive neuroscience, this study’s results also suggest the possible engagement of cognitive inhibition in spoken word segmentation.

3.1. Behavioral results

Our behavioral analysis showed that the ESL participants’ performance in word-spotting under the easy task condition was significantly better than under the difficult task condition, a result aligned with that of Cutler and Shanley (2010). This result is also consistent with findings reported by McQueen (1998) and Weber and Cutler (2006) for first language (L1) participants using a similar paradigm. In the overall performance of word-spotting, the difference between the two groups (HLG vs. LLG) was significant, and the difference between the two task conditions (easy vs. difficult) was also significant. Specifically, we found a statistically significant difference between the two groups in the difficult task condition but not in the easy task condition. This finding indicates that the group difference in the difficult task condition was probably the main contributor to the overall group difference across the two task conditions. However, no significant interaction between participant group and task condition was found; such a result might be due to the relatively small sample size.

Studies in phonological processing have shown that the literacy level of native-speaker children affects their phonological awareness (e.g. Castles et al., 2003; Ehri & Wilce, 1980; Stuart, 1990), which further influences their speech perception. By testing the participants’ L2 listening proficiency and using a word-spotting paradigm with ESL participants, the current study replicates and extends previous findings. Our finding of a positive correlation between ESL listening proficiency and participants’ performance in the word-spotting task suggests that better ESL listening skills are associated with better performance in spoken word segmentation.

3.2. fNIRS results

The fNIRS data showed that the TPJ and PFC were involved in spoken word segmentation. Through contrast analysis of the two task conditions, we found significant differences not only in the SMG (BA40) and STG (BA22), which are auditory brain areas, but also in brain areas involved in top-down cognitive control, i.e. the FPC (BA10) and OFC (BA11).

It has been well established in the literature that both the SMG (BA40) and the STG (BA22) are responsible for auditory language processing in a bottom-up fashion (e.g. Dai et al., 2018; Hall & Moore, 2003; Perner & Aichhorn, 2008; Saxe & Wexler, 2005), including on a lexical level (e.g. Capek et al., 2008; San José-Robertson et al., 2004). Spoken word segmentation in the present study is a type of lexical language processing; therefore, the activation we observed in the SMG (BA40) and STG (BA22) was expected. In the current study, the RM-ANOVA results further suggested that brain activation differed significantly in the SMG (BA40) and STG (BA22) between the two task conditions and between the two listening proficiency levels. These results imply that both task condition and ESL listening proficiency played a role in the participants’ English spoken word segmentation.

As for the condition difference, the post-hoc comparisons of beta values further showed that brain activation in both the SMG (BA40) and STG (BA22) was significantly stronger in the difficult task condition than in the easy task condition. This finding was expected because more cognitively demanding tasks require more neural resources (e.g. Gandour et al., 2007). Previous studies have suggested that the SMG and STG appear to be involved in processing the segmental structure (phonology) of language, and that these two brain regions may be engaged in detecting spoken phonetic features, identifying spoken words accurately, and facilitating perceptual identification when phonetic contrast is ambiguous (e.g. Campbell et al., 1999). Similarly, the behavioral data in the present study showed a significant difference in word segmentation performance between the two groups in the difficult but not the easy task condition.

As for the group difference, the results showed that neural activation in the SMG (BA40) and STG (BA22) was less intense for the more proficient ESL listeners than for their less proficient counterparts under both task conditions. This finding indicates that the more proficient participants made better use of their processing resources to achieve better performance in spoken word segmentation. In contrast, the less proficient ESL listeners generally required stronger bottom-up processing to perceptually separate the target words from the competing nonsense strings, especially in the difficult task condition. They therefore had to engage the brain networks involved in bottom-up processing (i.e. the TPJ) more often and more fully than did their more proficient counterparts in order to help gate and enhance sensory input processing. Similar results were reported by Bunce et al. (2011) and Marian and Shook (2012), who claimed that greater expertise was associated with relatively less neural activity. This finding is also supported by numerous neuroscience experiments examining expertise and task difficulty (e.g. Marian & Shook, 2012).

Evidence from previous studies indicates that both the FPC (BA10) and the OFC (BA11) are involved in cognitive inhibition (e.g. Chmielewski et al., 2014; Rubia et al., 2003), and that cognitive inhibition is probably related to word-spotting (e.g. Luce & Lyons, 1999; McClelland & Elman, 1986; Norris, 1994). Therefore, the activation of the FPC (BA10) and OFC (BA11) observed in the present study likely reflected the presence of an inhibitory function in spoken word segmentation, and may provide potential neural evidence for the engagement of this function. The contrast analyses showed significant differences in brain activation in the FPC (BA10) and OFC (BA11) between the two task conditions; however, no significant difference in cortical activation between the two participant groups (HLG vs. LLG) was found in these two brain areas. Further analyses of the beta values of HbO showed that neural activation in the FPC (BA10) and OFC (BA11) in both participant groups was more intense in the difficult task condition than in the easy task condition. One possible explanation is that because the word boundaries were less transparent in the difficult task condition, lexical selection was more effortful and attention-demanding and involved more conflict; thus, the deployment of greater cognitive inhibition (e.g. efficiently suppressing competing dominant but irrelevant cognitive processing or mental activities) became more necessary. Therefore, the activation of brain regions involved in top-down cognitive processes (including the FPC and OFC) increased under the difficult condition. Such increased activation might be associated with increased cognitive control demands. Accordingly, the increased activation in the FPC and OFC might suggest that these two brain regions are particularly important for effective cognitive inhibitory functioning in successful spoken word segmentation, especially under the difficult task condition. In addition, the behavioral results suggested that the difference between the HLG and LLG mainly came from their performance in the difficult task condition. It seemed that the more proficient ESL listeners were better at using this top-down cognitive mechanism to help them achieve better performance under challenging conditions.

It should also be noted that Broca’s area in the left inferior frontal gyrus (IFG), including the pars opercularis (BA44) and pars triangularis (BA45), was covered in the current study. However, no statistical difference in the activation of Broca’s area was found between either the two task conditions or the two participant groups. Based on an extensive literature review, Ardila et al. (2016) showed that Broca’s area is primarily responsible for language production. The language production required by the task in the current study was not demanding; therefore, it is reasonable that activation in Broca’s area did not differ significantly across the task conditions or participant groups.

3.3. Behavioral and neural correlates


We also performed correlation analyses to gain further insight into the relationship between ESL listening proficiency and the beta values of fNIRS-measured cortical activation during spoken word segmentation. The results showed significant negative correlations in the SMG (BA40) (Ch2, 7), indicating that the higher the ESL listeners’ proficiency was, the lower the beta values in the SMG (BA40) were. This result confirms our findings regarding the differences in behavioral performance and brain activation between the two participant groups. Our preceding analysis showed that the HLG participants were better at word-spotting and used their neural resources more efficiently than did the LLG participants. The result is also consistent with those of numerous studies on language processing (e.g. Abutalebi, 2008). In addition, Alain et al. (2018) found an association between higher expertise and lower brain activity among professional musicians, who showed lower brain activity than a control group during music tasks requiring both executive functioning and working memory, reflecting the professionals’ more efficient use of neural resources.

In contrast, we found that ESL listening proficiency was positively correlated with beta values in some channels (Ch25, 26, 27, 28) in the FPC (BA10), implying that brain activation in those regions during spoken word segmentation was more intense in the more proficient ESL participants (HLG). Correlation analysis of the behavioral data revealed that the HLG participants were also better at spoken word segmentation. Increased activation in the FPC (BA10) and OFC (BA11) might reflect an increase in the top-down processing involved in the monitoring of word recognition, leading to better performance in spoken word segmentation. Taken together, this evidence and the results for the SMG (BA40) and STG (BA22) probably indicate that the more proficient ESL participants were able to make good use of top-down cognitive resources to help them segment spoken words (demonstrated by the PFC activation) while also being more efficient in using neural resources in a bottom-up fashion in the auditory-related TPJ. This result is also consistent with studies on the interactive functioning of different brain regions in language processing (e.g. Bonte et al., 2005; Noesselt et al., 2003; Perlovsky, 2011).

Moreover, spoken word segmentation in the present study can be seen as a process of listening to words, and thus our results might contribute to this line of research by supporting one of the proposed models of L2 listening. In regard to L2 listening comprehension, Flowerdew and Miller (2010) summarized three popular models of the listening process: the bottom-up model, the top-down model, and the interactive model. In the bottom-up model, listeners decode acoustic meaning through the sequence of phonemes, words, phrases, and sentences, while in the top-down model they do so through an inverted sequence, starting from contextual knowledge. The interactive model combines the two, considering listening as a process of simultaneous bottom-up and top-down interpretation of auditory input. Our results support the interactive model at the lexical level by showing that both bottom-up processing (demonstrated by the TPJ activation) and top-down processing (demonstrated by the FPC activation) took place during the ESL listeners’ spoken word segmentation.

3.4. Potential clinical applications


The present study has practical implications for Auditory Processing Disorder (APD) and auditory/language interventions. Children with APD can pass a hearing test but still struggle to process and make sense of the words they hear, especially in the presence of other noise (Putter-Katz et al., 2002). APD can pose lifelong difficulties if it is not diagnosed and treated properly, causing problems in communication, academic achievement, and the development of social skills. One important step in diagnosing APD is to test children’s language listening skills and cognitive abilities (Ferguson et al., 2011). The current study indicates that spoken word segmentation is positively correlated with language listening proficiency. Therefore, patients’ performance in word segmentation could be used to predict their language listening skills, which, if designed properly, would be a much easier diagnostic method than those currently in use. In addition, if an association between spoken word segmentation and cognitive inhibition were established, a single word segmentation task could provide therapists with information on patients’ abilities in both language listening and cognition.

Upon diagnosis of APD, therapists employ auditory/language interventions for treatment. Auditory treatments are usually performed by manipulating auditory components of speech and non-speech stimuli, such as rate, intensity, frequency, inter-stimulus interval, and background noise (Fey et al., 2011). These variables are easy to control and manipulate in a word-spotting paradigm using audio-editing software. Moreover, the validity of the spoken word segmentation paradigm used in the current study has been verified in educational scenarios by previous research (e.g. Al-jasser, 2008; Cutler & Shanley, 2010; Delvaux et al., 2015). Therefore, speech therapists could use spoken word segmentation tasks as a preliminary method to train patients to identify distinct words in speech, on the basis of which patients can further develop their conversational and language listening abilities.

It should be noted that the clinical implications of the current study for APD diagnosis and treatment would apply to patients in their L1 rather than in their L2. Because APD patients already have difficulty comprehending sounds and syllables in their native language, learning a second language would likely be of little benefit to them and might even undermine the recovery of their native language.

3.5. Limitations

Several limitations of the present study, which suggest directions for further research, should be noted. First, only young and healthy Chinese participants, most of whom were female, took part in the study, limiting its generalizability to other populations. Future studies should seek a gender balance among participants and extend to more diverse participant groups; cross-cultural studies comparing Chinese English learners and native English speakers should also be carried out. Second, although we found neural evidence that led us to tentatively suggest the engagement of cognitive inhibition in word segmentation, the present study did not test cognitive inhibition directly. Future studies should directly investigate the influence of cognitive inhibition on spoken word segmentation.

3.6. Summary

In summary, the present study is a preliminary attempt to employ fNIRS to investigate the neural mechanisms of English spoken word segmentation in Chinese ESL participants using a word-spotting paradigm. We found that the Chinese participants’ performance in English word segmentation was positively correlated with their ESL listening proficiency. Both the TPJ and the PFC were activated during the participants’ segmentation of words, indicating that both bottom-up and top-down processing were taking place, and the activation was more intense in the difficult task condition than in the easy task condition. For the TPJ, which is related to bottom-up auditory language processing, activation was negatively correlated with ESL listening proficiency, suggesting that the more proficient L2 listeners were better and more efficient at using processing resources in a bottom-up fashion for accurate sensory representations than the less proficient L2 listeners. Moreover, the activation of the PFC and its positive correlation with ESL listening proficiency suggested that efficient top-down cognitive inhibitory functioning might subserve spoken word segmentation, especially under the difficult task condition.

4. Experimental procedures

4.1. Participants

A total of 133 Mandarin Chinese–speaking university juniors (12 males, 121 females, mean age: 19.96 ± 1.04 years, range: 18–25 years) majoring in English participated in an English listening proficiency test. However, 72 of them could not participate in the fNIRS scan, mainly because of travel distance or time conflicts; as a result, only 61 participants (5 males, 56 females, mean age: 19.79 ± 0.80 years, range: 18–21 years) confirmed that they could also take part in the brain imaging experiment. The current study used an extreme group design based on this sample. According to Feldt (1961) and Preacher (2014), such a design best reveals group differences in a relatively small sample, instead of testing a larger sample, and the most valid proportion in an extreme group design is 25–27% for each of the two extremes. Accordingly, 32 participants were selected from the 61 volunteers based on their English listening proficiency scores: the 16 participants with the highest scores were assigned to the high listening proficiency group (HLG), and the 16 participants with the lowest scores were assigned to the low listening proficiency group (LLG). Two participants (one in each group) were excluded from further analysis because they did not finish the required research procedure. Therefore, there were 15 participants in each group, and the data of a total of 30 participants were analyzed (2 males, 28 females, mean age: 19.60 ± 0.77 years, range: 18–21 years).

An analysis of their background information showed no statistical differences between the two groups in age (HLG: 19.80 ± 0.56, LLG: 19.40 ± 0.91, p=0.158) or in length (in years) of ESL learning (HLG: 12.63 ± 0.52, LLG: 12.77 ± 0.70, p=0.560). A 60-item Chinese version of the Raven’s Standardized Reasoning Test (Zhang & Wang, 1985) also suggested that there was no statistical difference in general intelligence between the two groups (HLG: 56.40 ± 3.16, LLG: 56.33 ± 2.26, p=0.947). All participants had passed the TEM-4 (Test for English Majors-Band 4), a national English proficiency test for English majors in China. All participants were right-handed, with no known cognitive or psychomotor impairment, with normal or corrected-to-normal vision, and with self-reported normal hearing. They all reported English to be their second language (ESL). The study was approved by the Ethics Committee of Shaanxi Normal University in China, and all participants signed a written informed consent form.
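As a simple illustration of the extreme-group selection described above (roughly the top and bottom quarter of the listening-proficiency distribution), consider the sketch below; the file and column names are hypothetical.

```python
# Sketch of the extreme-group design: keep roughly the top and bottom ~26% of the
# listening-proficiency distribution (16 of 61 volunteers per extreme).
# File and column names are illustrative assumptions.
import pandas as pd

scores = pd.read_csv("listening_scores.csv")       # columns: subject, listening
n_extreme = round(0.26 * len(scores))              # ~26% per tail (cf. Feldt, 1961)

ranked = scores.sort_values("listening", ascending=False)
hlg = ranked.head(n_extreme).assign(group="HLG")   # high listening proficiency
llg = ranked.tail(n_extreme).assign(group="LLG")   # low listening proficiency
selected = pd.concat([hlg, llg], ignore_index=True)
print(selected.groupby("group")["listening"].describe())
```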

4.2. Equipment

The word-spotting task was conducted in a sound-attenuated room. Participants were seated about 75 cm away from a 28-inch LED computer monitor. Two loudspeakers mounted beside the monitor were used to play the audio stimuli. Brain activity was recorded by a Shimadzu (Kyoto, Japan) NS-1005 continuous-wave fNIRS system at three wavelengths (780 nm, 805 nm, and 830 nm; sampling rate 10 Hz). The experiment was implemented in E-Prime 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA), and participants responded by pressing the space bar on the keyboard. To ensure accurate positioning of each optode, a 3D digitizer (FASTRAK; Polhemus, Colchester, VT, USA) was used (Singh et al., 2005).

4.3. L2 listening proficiency

A test taken from authentic examination papers in Cambridge English IELTS 9 (2013) was used to assess the participants’ ESL listening proficiency. The test has a total of 40 questions in four sections. A pre-test question showed that no participant in the current research had previously taken this particular version of the IELTS listening test. The test was carried out in an audiovisual classroom in which every participant had a computer terminal and a headset. Participants were asked to answer the questions as they listened to the test audio within the given time limit, and the audio was played only once. The questions required the test-takers either to mark the correct choices or to write down answers in no more than three words. The whole test took about 40 minutes. The testing and rating of English listening proficiency were carried out by experienced university lecturers of English listening, following the IELTS test instructions and answer keys.

4.4. Spoken word segmentation

Participants’ performance in spoken word segmentation was measured using one of the most influential paradigms, the word-spotting paradigm (Cutler & Norris, 1988), since it is a promising tool for studying the segmentation and recognition of spoken language across different populations and languages (McQueen, 1996). In a word-spotting task, participants hear a series of nonsense word strings that contain embedded words; their job is to find the real words in the word strings within a limited time. We adopted the stimuli for the word-spotting task from Cutler and Shanley (2010) and Farrell (2015). The stimuli consisted of 100 multisyllabic strings (six of which were used for a practice session) made of real words (target words) and nonsense strings (i.e. a word plus a nonsense syllable, e.g. westej, lencool). Several manipulations were made for the purpose of variable control, covering the participants’ familiarity with the real words in the stimuli, word distribution and task conditions, and the speed and acoustic features of the stimuli.

First, the familiarity of the test stimuli to the participants was carefully checked. As all participants had passed TEM-4, we chose an initial 180 words from the exam vocabulary, which were also target words of the stimuli used in Cutler and Shanley (2010) and Farrell (2015). In addition, a word familiarity test based on a 5-point scale was administered to an independent sample of 30 students (2 males, 28 females, mean age: 19.83 ± 0.65 years, range: 18–21 years) who were in the same year of university, with the same major and TEM-4 level. They were required to rate the familiarity of each target word (a total of 180 words) on a scale from 1 (extremely unfamiliar) to 5 (extremely familiar). One hundred words that were evaluated as familiar (mean score > 4) were chosen for use in the stimuli.

Second, word distribution and task conditions were balanced. According to Gitt (2006), 71.5% of English words are monosyllabic, suggesting that the other 28.5% are multisyllabic. To correspond to this word distribution, we chose stimuli consisting of 72 monosyllabic and 28 multisyllabic target words. Next, we balanced the positions of the target words, so that 50 of them were initial and the other 50 were final in the word strings. Following Cutler and Shanley (2010), we employed two types of syllable boundary transitions, creating the two task conditions, i.e. the easy and difficult task conditions. An easy-condition item has an unambiguous word boundary, for example, boss in bossthet and cloth in detcloth. In the difficult task condition, the word boundary is ambiguous and involves liaison, such as pack in packom and agree in veamagree. The experiment utilized 50 word strings for each task condition. A t-test showed that the familiarity of the target words did not differ between the easy and difficult task conditions (p=0.672). The participants did not know beforehand whether the target word was in the initial or the final position, as the stimuli were played in random order.

Last, to ensure appropriate speed and acoustic features, the audio stimuli were recorded by a female native speaker of American English using a digital audio recording pen (44.1 kHz, 16 bit, mono). The speaker was asked to read the stimuli at a normal speed. The audio recordings were then further processed with Cool Edit Pro 2.1 (Adobe Inc., San Jose, CA) and Praat 6.1.04 (Boersma & Weenink, 2019). The final version of the stimuli had varying length (mean duration: 956.31 ± 159.90 ms, range: 765–1276 ms) and varying pitch (mean pitch: 177.96 ± 12.71 Hz, range: 161.05–207.94 Hz), and was normalized for peak intensity (70 dB). The stimuli for the easy task condition varied in length (mean duration: 967.00 ± 175.85 ms, range: 765–1276 ms) and in pitch (mean pitch: 180.03 ± 12.44 Hz, range: 172.10–207.94 Hz). The stimuli for the difficult task condition also varied in length (mean duration: 945.63 ± 153.66 ms, range: 765–1275 ms) and in pitch (mean pitch: 175.89 ± 13.62 Hz, range: 161.05–202.77 Hz). However, there was no statistical difference between the stimuli of the two task conditions in length or pitch (length: p=0.625; pitch: p=0.696).
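The balancing checks reported in this section (target-word familiarity, stimulus duration, and pitch compared between the easy and difficult sets) are independent-samples t-tests. A minimal sketch is shown below; the per-item table and its column names are assumptions.

```python
# Sketch of the stimulus balancing checks: compare familiarity, duration, and pitch
# between the easy and difficult item sets with independent-samples t-tests.
# The per-item table (items.csv) and its column names are assumptions.
import pandas as pd
from scipy.stats import ttest_ind

items = pd.read_csv("items.csv")   # columns: item, condition, familiarity, duration_ms, pitch_hz
easy = items[items["condition"] == "easy"]
difficult = items[items["condition"] == "difficult"]

for measure in ["familiarity", "duration_ms", "pitch_hz"]:
    t, p = ttest_ind(easy[measure], difficult[measure])
    print(f"{measure}: t = {t:.3f}, p = {p:.3f}")
```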

4.5. The fNIRS task

After informed consent was obtained, the optode holder and optodes were attached to the participant. Participants were asked to sit as still as possible to reduce motion artifacts during the whole fNIRS task. Prior to the experiment, participants were instructed to rest and relax for a period of 180 seconds while keeping their eyes fixed on a fixation cross on the screen. This resting state allowed participants to become accustomed to the optode holder and was also recorded as a baseline. Participants were then given instructions and demonstrations for the word-spotting task, followed by six practice trials to ensure that they understood the procedure (Fig. 4). The formal task consisted of 94 trials with the two task conditions (easy vs. difficult) presented in random order. In each trial, an 8 s fixation cross was displayed, and then an audio stimulus (approximate duration 800–1200 ms) was played through the two loudspeakers placed on each side of the monitor. Participants were asked to find the target word as quickly and accurately as possible within a 3 s time limit; if they recognized the target word within the time limit, they were required to respond immediately by pressing the space bar. Only when a keyboard response was made within the 3 s were participants given 2 s to say the target word aloud, e.g. the English words west and boss in response to westej and bossthet, respectively. Their verbal answers were recorded by a digital audio recording pen (44.1 kHz, 16 bit, mono) placed right in front of them so that their verbal responses could be rated after the experiment. If the participant did not identify a target word within the 3 s time limit, the next trial started automatically. The fNIRS task lasted approximately 25 minutes.
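As a back-of-the-envelope check on the reported task duration, the sketch below adds up the trial timing described above. The assumptions that the 3 s response window starts at stimulus onset and that a verbal response follows about 62% of trials (roughly the mean word-spotting score of 57.8/94) are ours, made only for illustration.

```python
# Rough estimate of the fNIRS task duration from the trial timing described above.
# Assumes the 3 s response window starts at stimulus onset and that a 2 s verbal
# response follows ~62% of trials; both assumptions are illustrative only.
N_TRIALS, N_PRACTICE = 94, 6
REST_S = 180            # initial rest/baseline period
FIXATION_S = 8          # fixation cross before each stimulus
RESPONSE_WINDOW_S = 3   # time limit to press the space bar
VERBAL_S = 2            # time given to speak the detected word
HIT_RATE = 0.62         # assumed proportion of trials with a key press

per_trial = FIXATION_S + RESPONSE_WINDOW_S + HIT_RATE * VERBAL_S
total_s = REST_S + (N_TRIALS + N_PRACTICE) * per_trial
# ~12.2 s per trial and ~23 min in total; with instructions and demonstrations this
# is broadly consistent with the reported duration of approximately 25 minutes.
print(f"~{per_trial:.1f} s per trial, ~{total_s / 60:.1f} min in total")
```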

Figure 4 fNIRS task design.

4.6. fNIRS data acquisition

Following previous studies (e.g. Dai et al., 2018; Perner & Aichhorn, 2008; Saxe & Kanwisher, 2003; Saxe & Wexler, 2005; Seifritz et al., 2002), we selected two regions of interest (ROIs) in which to measure brain activation: (i) the temporo-parietal junction (TPJ) and (ii) the prefrontal cortex (PFC). fNIRS data acquisition was designed accordingly. Data were collected through a total of 39 channels, formed by 30 optodes distributed in one 2×9 array (9 emitters and 9 detectors, 3 cm optode separation) and two 2×3 arrays (3 emitters and 3 detectors each, 3 cm optode separation) (Fig. 5). To maintain consistent optode placement across participants, the International 10-20 System (Jasper, 1958) was used for guidance, with the lower row of optodes of the 2×9 array placed along the Fp1-Fpz-Fp2 line. The exact locations of all optodes and of four reference points (left and right mastoid points, nasion, and Cz) were then measured for each participant with a 3D digitizer (FASTRAK; Polhemus, Colchester, VT, USA) and converted into channel locations in an estimated Montreal Neurological Institute (MNI) standard brain space (Singh et al., 2005). Following Lawrence et al. (2018) and Wijayasiri et al. (2017), the mean positions of the optodes were used for data analysis (Fig. 5).


Figure 5 The 3D localization and configuration of the 39 channels over the two hemispheres, covering the prefrontal cortex and the temporo-parietal junction. (a) Photo of the right-side optode arrays of the holder (optodes not inserted) worn on the head of one of the authors; the left-side arrays are anatomically symmetrical to the right. (b) The 39 channels, formed by 15 emitters and 15 detectors distributed in one 2×9 and two 2×3 optode arrays; the distance between adjacent optodes within each array is 3 cm. (c, d) The estimated locations of the 39 channels on the cortex.

4.7. fNIRS data analysis

The software package NIRS_SPM (Tak & Ye, 2014; Ye et al., 2009), custom scripts in MATLAB (MathWorks, Natick, MA), and SPSS 19 (IBM Corp., Armonk, NY) were used to analyze the fNIRS data. We used the general linear model (GLM), a statistical linear model that explains the data as a linear combination of explanatory variables plus an error term, to assess the covariance between the theoretical hemodynamic response and the measured response (Cloud, 2014). The theoretical response was created by convolving the boxcar function with the hemodynamic response function; the boxcar function is an essential component of the GLM model function that reflects the temporal structure of the experimental paradigm (Uga et al., 2014). One key output of the GLM analysis in NIRS_SPM is the beta value, which indicates the intensity of the actual hemodynamic response. Because the GLM measures the temporal variational pattern of the signals rather than their absolute magnitude, it is more robust in many cases, even those with an incorrect differential pathlength factor (DPF) or with severe optical signal attenuation due to scattering or poor contact (Ye et al., 2009).
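To make the boxcar-convolved-with-HRF construction concrete, the following minimal sketch (not NIRS_SPM itself) builds one task regressor with a canonical double-gamma HRF and estimates its beta by ordinary least squares. The sampling rate, trial onsets, and the simulated HbO trace are placeholder assumptions.

```python
# Minimal GLM sketch: boxcar for one task condition, convolved with a
# double-gamma HRF to form the theoretical response; the beta value is then
# estimated by ordinary least squares. All timing values are placeholders.
import numpy as np
from scipy.stats import gamma

fs = 10.0                                  # Hz, assumed fNIRS sampling rate
t = np.arange(0, 600, 1 / fs)              # 10-minute recording

def canonical_hrf(dt, length=32.0):
    """SPM-style double-gamma hemodynamic response function."""
    tt = np.arange(0, length, dt)
    h = gamma.pdf(tt, 6) - gamma.pdf(tt, 16) / 6.0
    return h / h.sum()

# Boxcar: 1 during stimulation, 0 otherwise (onsets/durations are placeholders).
boxcar = np.zeros_like(t)
for onset in np.arange(10, 580, 14.0):     # one trial roughly every 14 s
    boxcar[(t >= onset) & (t < onset + 1.0)] = 1.0

regressor = np.convolve(boxcar, canonical_hrf(1 / fs))[: t.size]

# Design matrix: task regressor + constant term; solve for the betas.
X = np.column_stack([regressor, np.ones_like(t)])
hbo = 0.5 * regressor + np.random.default_rng(0).normal(0, 0.2, t.size)  # simulated HbO
beta, _, _, _ = np.linalg.lstsq(X, hbo, rcond=None)
print(f"estimated task beta: {beta[0]:.2f}")
```

In the actual analysis there would be one such regressor per task condition (easy and difficult), yielding one beta value per condition and channel.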


According to the modified Beer-Lambert Law (Cope & Delpy, 1988), both HbO and HbR signals represent changes in cerebral blood flow. However, because the HbO signal is more sensitive to changes in cerebral blood flow (e.g. Hoshi, 2016; Jiang et al., 2012; Lu et al., 2018), only HbO signals were analyzed in the current research. The fNIRS data of each participant were analyzed as follows. First, a baseline correction was applied. Next, a hemodynamic response function filter and wavelet-MDL (minimum description length) detrending were employed to remove breathing, cardiac, and motion noise and artifacts (e.g. Brigadoi et al., 2014; Tang et al., 2015; Ye et al., 2009). Then, a general linear model, with the two task functions convolved with the hemodynamic response function as regressors, was used to estimate the beta values (regression weights) of the GLM. Beta values represent the amplitudes of the hemodynamic responses and serve as indicators of neural activation, with higher (absolute) beta values indicating a higher level of cortical activation (e.g. Köchel et al., 2013; Plichta et al., 2011; Plichta et al., 2007). Finally, an RM-ANOVA for contrast analysis was carried out at each channel with the within-subject factor of task condition (two levels: easy vs. difficult) and the between-subject factor of group (two levels: high vs. low proficiency). Wherever the assumption of sphericity was violated, a Greenhouse-Geisser correction (Geisser & Greenhouse, 1958) was applied. To control for false positives, all p-values were Bonferroni-corrected with α = 0.05.


Declarations of interest


None.


Notes


Shaanxi Normal University and Xi’an International Studies University made an equal contribution to and share equal credit for the current research.


Acknowledgments


The authors are deeply grateful to Dr. Anne Cutler and Dr. Janise Farrell for kindly providing the stimuli for the word-spotting task, and to Dr. Shengting Zhang for participant recruitment. This research was supported by the National Natural Science Foundation of China [Grant Nos. 31700976 and 31871118]; the Humanity and Social Science Youth Foundation of the Ministry of Education of China [Grant No. 17XJC190002]; the Social Science Fund Project of Shaanxi [Grant No. 2017P005]; the China Post-doctoral Science Foundation [Grant Nos. 2017M623099 and 2018T111009]; the Shaanxi Post-doctoral Science Foundation [Grant No. 2017BSHEDZZ128]; the Research Funds of Xi'an International Studies University [Grant No. 18XWA04]; the Applied Linguistics Research Project: Multidimensional and Multidisciplinary Perspectives [Grant No. 20180101]; the Major Project of the National Social Science Foundation of China [Grant No. 14ZDB160]; and the Research Program Funds of the Collaborative Innovation Center of Assessment toward Basic Education Quality at Beijing Normal University [Grant No. 2019-05-002 BZPK01].



References Abutalebi, J. (2008). Neural aspects of second language representation and language control. Acta Psychologica, 128(3), 466-478. Al-jasser, F. (2008). The effect of teaching English phonotactics on the lexical segmentation of English as a foreign language. System, 36(1), 94-106. Alain, C., Khatamian, Y., He, Y., Lee, Y., Moreno, S., Leung, A. W., & Bialystok, E. (2018). Different neural activities support auditory working memory in musicians and bilinguals. Annals of the New York Academy of Sciences, 1423(1), 435-446. Ardila, A., Bernal, B., & Rosselli, M. (2016). How localized are language brain areas? A review of Brodmann areas involvement in oral language. Archives of Clinical Neuropsychology, 31(1), 112-122. Banel, M., & Bacri, N. (1995). Do metrical and phonotactic segmentation cues cooperate in spoken word recognition. Paper presented at the Proceedings of the 13th International Congress of Phonetic Sciences. Bisconti, S., Di Sante, G., Ferrari, M., & Quaresima, V. (2012). Functional near-infrared spectroscopy reveals heterogeneous patterns of language lateralization over frontopolar cortex. Neuroscience research, 73(4), 328-332. Bitan, T., Burman, D. D., Chou, T. L., Lu, D., Cone, N. E., Cao, F., . . . Booth, J. R. (2007). The interaction between orthographic and phonological information in children: an fMRI study. Human Brain Mapping, 28(9), 880-891. Bitan, T., Cheon, J., Lu, D., Burman, D. D., & Booth, J. R. (2009). Developmental increase in top–down and bottom–up processing in a phonological task: An effective connectivity, fMRI Study. Journal of Cognitive Neuroscience, 21(6), 1135-1145. Boas, D. A., Elwell, C. E., Ferrari, M., & Taga, G. (2014). Twenty years of functional near-infrared spectroscopy: introduction for the special issue. Neuroimage, 85(15), 1-5. Boersma, P., & Weenink, D. (2019). Praat: doing phonetics by computer (Version 6.1.04). Retrieved from http://www.praat.org/ Bonte, M., Parviainen, T., Hytönen, K., & Salmelin, R. (2005). Time course of top-down and bottom-up influences on syllable processing in the auditory cortex. Cerebral Cortex, 16(1), 115-123. Brigadoi, S., Ceccherini, L., Cutini, S., Scarpa, F., Scatturin, P., Selb, J., . . . Cooper, R. J. (2014). Motion artifacts in functional near-infrared spectroscopy: a comparison of motion correction techniques applied to real cognitive data. Neuroimage, 85, 181-191. Bunce, S. C., Izzetoglu, K., Ayaz, H., Shewokis, P., Izzetoglu, M., Pourrezaei, K., & Onaral, B. (2011). Implementation of fNIRS for monitoring levels of expertise and mental workload. Paper presented at the International Conference on Foundations of Augmented Cognition. Burton, M. W., Small, S. L., & Blumstein, S. E. (2000). The role of segmentation in phonological processing: an fMRI investigation. Journal of Cognitive Neuroscience, 12(4), 679-690. Campbell, R., Calvert, G., Brammer, M., MacSweeney, M., Surguladze, S., McGuire, P., . . . David, A. S. (1999). Activation in auditory cortex by speechreading in hearing people: fMRI studies. Paper presented at the International Conference on Auditory-Visual Speech Processing (AVSP'99), Santa Cruz, CA, USA. Capek, C. M., Waters, D., Woll, B., MacSweeney, M., Brammer, M. J., McGuire, P. K., . . . Campbell, R. (2008). Hand and mouth: Cortical correlates of lexical processing in British Sign Language


and speechreading English. Journal of Cognitive Neuroscience, 20(7), 1220-1234. Castles, A., Holmes, V. M., Neath, J., & Kinoshita, S. (2003). How does orthographic knowledge influence performance on phonological awareness tasks? The Quarterly Journal of Experimental Psychology, 56A(3), 445-467. Chang, E. F., Rieger, J. W., Johnson, K., Berger, M. S., Barbaro, N. M., & Knight, R. T. (2010). Categorical speech representation in human superior temporal gyrus. Nature neuroscience, 13(11), 1428-1432. Chen, L.-C., Sandmann, P., Thorne, J. D., Herrmann, C. S., & Debener, S. (2015). Association of concurrent fNIRS and EEG signatures in response to auditory and visual stimuli. Brain topography, 28(5), 710-725. Chmielewski, W. X., Mückschel, M., Roessner, V., & Beste, C. (2014). Expectancy effects during response selection modulate attentional selection and inhibitory control networks. Behavioural brain research, 274, 53-61. Cloud, M. A. (2014). Reliable Frontal Cortex Activity For An Oral Stroop Task Using Functional Near Infrared Spectroscopy. (M.A.), The University of Texas at Arlington, Arlington. Cope, M., & Delpy, D. T. (1988). System for long-term measurement of cerebral blood and tissue oxygenation on newborn infants by near infra-red transillumination. Medical and Biological Engineering and Computing, 26(3), 289-294. Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14(1), 113-121. Cutler, A., & Shanley, J. (2010). Validation of a training method for L2 continuous-speech segmentation. Paper presented at the 11th Annual Conference of the International Speech Communication Association (Interspeech 2010). Dai, B., Chen, C., Long, Y., Zheng, L., Zhao, H., Bai, X., . . . Lu, C. (2018). Neural mechanisms for selectively tuning in to the target speaker in a naturalistic noisy situation. Nature communications, 9(1), 1-12. Davis, M. H., & Johnsrude, I. S. (2003). Hierarchical processing in spoken language comprehension. Journal of Neuroscience, 23(8), 3423-3431. Delvaux, V., Huet, K., Calomme, M., Harmegnies, B., & Piccaluga, M. (2015). Teaching listening in L2: A successful training method using the word-spotting task. Paper presented at the ICPhS. Ehri, L. C., & Wilce, L. S. (1980). The influence of orthography on readers' conceptualization of the phonemic structure of words. Applied Psycholinguistics, 1(4), 371-385. Farrell, J. (2015). Training L2 speech segmentation with word-spotting. (PhD), Western Sydney University, Sydney. Retrieved from http://researchdirect.westernsydney.edu.au Feldt, L. S. (1961). The use of extreme groups to test for the presence of a relationship. Psychometrika, 26(3), 307-316. Ferguson, M. A., Hall, R. L., Riley, A., & Moore, D. R. (2011). Communication, listening, cognitive and speech perception skills in children with auditory processing disorder (APD) or specific language impairment (SLI). Journal of Speech, Language, and Hearing Research, 54(1), 211227. Fey, M. E., Richard, G. J., Geffner, D., Kamhi, A. G., Medwetsky, L., Paul, D., . . . Schooling, T. (2011). Auditory processing disorder and auditory/language interventions: An evidence-based systematic review. Language, Speech, and Hearing: Services in Schools, 42, 246-264. Flowerdew, J., & Miller, L. (2010). Listening in a second language. In A. D. Wolvin (Ed.), Listening and


human communication in the 21st century (pp. 158-177). Oxford: Wiley-Blackwell. Gandour, J., Tong, Y., Talavage, T., Wong, D., Dzemidzic, M., Xu, Y., . . . Lowe, M. (2007). Neural basis of first and second language processing of sentence‐level linguistic prosody. Human Brain Mapping, 28(2), 94-108. Geisser, S., & Greenhouse, S. W. (1958). An extension of box's results on the use of the F distribution in multivariate analysis. The Annals of Mathematical Statistics, 29(3), 885-891. Gillon, G. T. (2005). Phonological awareness: effecting change through the integration of research findings. Language, speech, and hearing services in schools, 36(4), 346-349. Gitt, W. (2006). In the beginning was information: A scientist explains the incredible design in nature. Green Forest: New Leaf Publishing Group. Goh, C. C. M. (2000). A cognitive perspective on language learners' listening comprehension problems. System, 28(1), 55-75. Gow, D., & Gordon, P. C. (1995). Lexical and prelexical influences on word segmentation: Evidence from priming. Journal of Experimental Psychology: Human Perception and Performance, 21(2), 344-359. Goyet, L., de Schonen, S., & Nazzi, T. (2010). Words and syllables in fluent speech segmentation by French-learning infants: An ERP study. Brain Res, 1332, 75-89. Hall, D. A., & Moore, D. R. (2003). Auditory neuroscience: The salience of looming sounds. Current Biology, 13(3), R91-R93. Hassanpour, M. S., Eggebrecht, A. T., Culver, J. P., & Peelle, J. E. (2015). Mapping cortical responses to speech using high-density diffuse optical tomography. Neuroimage, 117, 319-326. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393. Hong, K.-S., & Santosa, H. (2016). Decoding four different sound-categories in the auditory cortex using functional near-infrared spectroscopy. Hearing research, 333, 157-166. Hoshi, Y. (2016). Hemodynamic signals in fNIRS. Progress in brain research, 225, 153-179. Jiang, J., Dai, B., Peng, D., Zhu, C., Liu, L., & Lu, C. (2012). Neural synchronization during face-to-face communication. Journal of Neuroscience, 32(45), 16064-16069. Johnson, E. K., & Tyler, M. D. (2010). Testing the limits of statistical learning for word segmentation. Developmental Science, 13(2), 339-345. Köchel, A., Schöngassner, F., & Schienle, A. (2013). Cortical activation during auditory elicitation of fear and disgust: a near-infrared spectroscopy (NIRS) study. Neuroscience Letters, 549, 197200. Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten‐month‐olds. Infancy, 14(6), 591-612. Lawrence, R. J., Wiggins, I. M., Anderson, C. A., Davies-Thompson, J., & Hartley, D. E. (2018). Cortical correlates of speech intelligibility measured using functional near-infrared spectroscopy (fNIRS). Hearing research, 370, 53-64. Lu, K., Xue, H., Nozawa, T., & Hao, N. (2019). Cooperation makes a group be more creative. Cerebral Cortex, 20(8), 1-14. Luce, P. A., & Cluff, M. S. (1998). Delayed commitment in spoken word recognition: Evidence from cross-modal priming. Perception & Psychophysics, 60(3), 484-490. Luce, P. A., & Lyons, E. A. (1999). Processing lexically embedded spoken words. Journal of Experimental Psychology: Human Perception and Performance, 25(1), 174-183.


Marian, V., & Shook, A. (2012). The cognitive benefits of being bilingual. Paper presented at the Cerebrum: the Dana forum on brain science. McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive psychology, 18(1), 1-86. McQueen, J. (1996). Word spotting. Language and Cognitive Processes, 11(6), 695-699. McQueen, J. (1998). Segmentation of continuous speech using phonotactics. Journal of memory and language, 39(1), 21-46. McQueen, J., & Cox, E. (1995). The use of phonotactic constraints in the segmentation of Dutch. Paper presented at the Fourth European Conference on Speech Communication and Technology. Molholm, S., Martinez, A., Ritter, W., Javitt, D. C., & Foxe, J. J. (2004). The neural circuitry of preattentive auditory change-detection: an fMRI study of pitch and duration mismatch negativity generators. Cerebral Cortex, 15(5), 545-551. Newman, R. L., & Connolly, J. F. (2009). Electrophysiological markers of pre-lexical speech processing: Evidence for bottom–up and top–down effects on spoken word processing. Biological Psychology, 80(1), 114-121. Newman, R. L., Forbes, K., & Connolly, J. F. (2012). Event-Related Potentials and Magnetic Fields Associated with Spoken Word Recognition In M. J. Spivey, K. McRae, & M. F. Joanisse (Eds.), The Cambridge Handbook of Psycholinguistics (pp. 61-75). Cambridge: Cambridge University Press. Noesselt, T., Shah, N. J., & Jäncke, L. (2003). Top-down and bottom-up modulation of language related areas–an fMRI study. BMC neuroscience, 4(1), 4-13. Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52(3), 189-234. Norris, D., McQueen, J., & Cutler, A. (1995). Competition and segmentation in spoken-word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(5), 1209-1228. Perlovsky, L. (2011). Language and cognition interaction neural mechanisms. Computational Intelligence and Neuroscience, 2011, 1-13. Perner, J., & Aichhorn, M. (2008). Theory of mind, language and the temporoparietal junction mystery. Trends in cognitive sciences, 12(4), 123-126. Plichta, M. M., Gerdes, A. B., Alpers, G. W., Harnisch, W., Brill, S., Wieser, M. J., & Fallgatter, A. J. (2011). Auditory cortex activation is modulated by emotion: a functional near-infrared spectroscopy (fNIRS) study. Neuroimage, 55(3), 1200-1207. Plichta, M. M., Heinzel, S., Ehlis, A.-C., Pauli, P., & Fallgatter, A. J. (2007). Model-based analysis of rapid event-related functional near-infrared spectroscopy (NIRS) data: a parametric validation study. Neuroimage, 35(2), 625-634. Pollonini, L., Olds, C., Abaya, H., Bortfeld, H., Beauchamp, M. S., & Oghalai, J. S. (2014). Auditory cortex activation to natural speech and simulated cochlear implant speech measured with functional near-infrared spectroscopy. Hearing research, 309, 84-93. Preacher, K. (2015). Extreme groups designs. In R. L. Cautin & S. O. Lilienfeld (Eds.), The encyclopedia of clinical psychology (pp. 1-4). Indianapolis: Wiley. Price, C., Wise, R., Ramsay, S., Friston, K., Howard, D., Patterson, K., & Frackowiak, R. (1992). Regional response differences within the human auditory cortex when listening to words. Neuroscience Letters, 146(2), 179-182. Putter-Katz, H., Said, L. A.-B., Feldman, I., Miran, D., Kushnir, D., Muchnik, C., & Hildesheimer, M.


(2002). Treatment and evaluation indices of auditory processing disorders. Seminars in Hearing, 23(4), 357-364. Rubia, K., Smith, A. B., Brammer, M. J., & Taylor, E. (2003). Right inferior prefrontal cortex mediates response inhibition while mesial prefrontal cortex is responsible for error detection. Neuroimage, 20(1), 351-358. San José‐Robertson, L., Corina, D. P., Ackerman, D., Guillemin, A., & Braun, A. R. (2004). Neural systems for sign language production: mechanisms supporting lexical selection, phonological encoding, and articulation. Human Brain Mapping, 23(3), 156-167. Sanders, L. D., & Neville, H. J. (2003). An ERP study of continuous speech processing: I. Segmentation, semantics, and syntax in native speakers. Cognitive Brain Research, 15(3), 228-240. Sanders, L. D., Newport, E. L., & Neville, H. J. (2002). Segmenting nonsense: an event-related potential index of perceived onsets in continuous speech. Nature neuroscience, 5(7), 700-703. Saxe, R., & Kanwisher, N. (2003). People thinking about thinking people: the role of the temporo-parietal junction in “theory of mind”. Neuroimage, 19(4), 1835-1842. Saxe, R., & Wexler, A. (2005). Making sense of another mind: the role of the right temporo-parietal junction. Neuropsychologia, 43(10), 1391-1399. Seifritz, E., Neuhoff, J. G., Bilecen, D., Scheffler, K., Mustovic, H., Schächinger, H., . . . Di Salle, F. (2002). Neural processing of auditory looming in the human brain. Current Biology, 12(24), 2147-2151. Singh, A. K., Okamoto, M., Dan, H., Jurcak, V., & Dan, I. (2005). Spatial registration of multichannel multi-subject fNIRS data to MNI space without MRI. Neuroimage, 27(4), 842-851. Stuart, M. (1990). Processing strategies in a phoneme deletion task. The Quarterly Journal of Experimental Psychology, 42(2), 305-327. Tak, S., & Ye, J. C. (2014). Statistical analysis of fNIRS data: a comprehensive review. Neuroimage, 85, 72-91. Tang, H., Mai, X., Wang, S., Zhu, C., Krueger, F., & Liu, C. (2015). Interpersonal brain synchronization in the right temporo-parietal junction during face-to-face economic exchange. Social cognitive and affective neuroscience, 11(1), 23-32. Uga, M., Dan, I., Sano, T., Dan, H., & Watanabe, E. (2014). Optimizing the general linear model for functional near-infrared spectroscopy: an adaptive hemodynamic response function approach. Neurophotonics, 1(1), 015004. van de Rijt, L. P., van Opstal, A. J., Mylanus, E. A., Straatman, L. V., Hu, H. Y., Snik, A. F., & van Wanrooij, M. M. (2016). Temporal cortex activation to audiovisual speech in normal-hearing and cochlear implant users measured with functional near-infrared spectroscopy. Frontiers in Human Neuroscience, 10, 1-14. Vroomen, J., Van Zon, M., & De Gelder, B. (1996). Cues to speech segmentation: Evidence from juncture misperceptions and word spotting. Memory & Cognition, 24(6), 744-755. Weber, A., & Cutler, A. (2006). First-language phonotactics in second-language listening. The Journal of the Acoustical Society of America, 119(1), 597-607. Wiggins, I. M., Anderson, C. A., Kitterick, P. T., & Hartley, D. E. (2016). Speech-evoked activation in adult temporal cortex measured using functional near-infrared spectroscopy (fNIRS): Are the measurements reliable? Hearing research, 339, 142-154. Wijayasiri, P., Hartley, D. E., & Wiggins, I. M. (2017). Brain activity underlying the recovery of meaning from degraded speech: A functional near-infrared spectroscopy (fNIRS) study. Hearing


research, 351, 55-67. Wild, C. J., Yusuf, A., Wilson, D. E., Peelle, J. E., Davis, M. H., & Johnsrude, I. S. (2012). Effortful listening: the processing of degraded speech depends critically on attention. Journal of Neuroscience, 32(40), 14010-14021. Wilson, S. M., Molnar-Szakacs, I., & Iacoboni, M. (2007). Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cerebral Cortex, 18(1), 230-242. Ye, J. C., Tak, S., Jang, K. E., Jung, J., & Jang, J. (2009). NIRS-SPM: statistical parametric mapping for near-infrared spectroscopy. Neuroimage, 44(2), 428-447. Yerkey, P. N., & Sawusch, J. R. (1992). On the weakness of using strong syllables as word boundary markers. The Journal of the Acoustical Society of America, 91(4), 2338-2338. Yerkey, P. N., & Sawusch, J. R. (1993). The influence of stress, vowel, and juncture cues on segmentation. The Journal of the Acoustical Society of America, 94(3), 1880-1880. Zhang, H. C., & Wang, X. P. (1985). Chinese Version of Raven’s IQ Reasoning Standardized Test. Beijing: Beijing Normal University Press.


Highlights

• The first fNIRS study to test neural activation patterns in Chinese university ESL (English as a Second Language) learners while they engaged in English spoken word segmentation.
• Activation patterns in the PFC and TPJ differed depending on listening proficiency and task condition.
• Evidence from the PFC showed that cognitive inhibition might contribute to successful word segmentation; this evidence was more robust under the difficult task condition.
• Highly proficient English listeners used the TPJ, a classical auditory processing area, more efficiently in segmenting words than did less proficient listeners.


Author Contributions Section


Yadan Li and Yilong Yang conducted the experiments, performed the data analysis, and drafted the manuscript; Akaysha C. Tang and Weiping Hu supervised the revisions; Nian Liu proofread the manuscript and provided suggestions for revision; Xuewei Wang guided the fNIRS experiment and data analysis; Ying Du proofread the manuscript.
