Tonotopic representation of loudness in the human cortex


Accepted Manuscript

Andrew Thwaites, Josef Schlittenlacher, Ian Nimmo-Smith, William D. Marslen-Wilson, Brian C.J. Moore

PII: S0378-5955(16)30330-6
DOI: 10.1016/j.heares.2016.11.015
Reference: HEARES 7280
To appear in: Hearing Research
Received Date: 28 July 2016
Revised Date: 24 November 2016
Accepted Date: 29 November 2016

Please cite this article as: Thwaites, A., Schlittenlacher, J., Nimmo-Smith, I., Marslen-Wilson, W.D., Moore, B.C.J., Tonotopic representation of loudness in the human cortex, Hearing Research (2016), doi: 10.1016/j.heares.2016.11.015.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Andrew Thwaites*1,2, Josef Schlittenlacher1, Ian Nimmo-Smith2, William D. Marslen-Wilson1,2, Brian C.J. Moore1

1 Department of Psychology, University of Cambridge, Cambridge, UK
2 MRC Cognition and Brain Sciences Unit, Cambridge, UK

Email addresses:
AT: [email protected]
JS: [email protected]
INS: [email protected]
WMW: [email protected]
BM: [email protected]

*Corresponding author. Department of Psychology, Downing Site, University of Cambridge, Cambridge, CB2 3EB, United Kingdom
E-mail address: [email protected]

Keywords: tonotopy, magnetoencephalography, loudness, temporal integration, model expression, entrainment

Abbreviations: CF, characteristic frequency; DLS, dorso-lateral sulcus; EMEG, electro- and magneto-encephalographic; HG, Heschl's gyrus; KID, Kymata identifier

Abstract

A prominent feature of the auditory system is that neurons show tuning to audio frequency; each neuron has a characteristic frequency (CF) to which it is most sensitive. Furthermore, there is an orderly mapping of CF to position, which is called tonotopic organization and which is observed at many levels of the auditory system. In a previous study (Thwaites et al., 2016) we examined cortical entrainment to two auditory transforms predicted by a model of loudness, instantaneous loudness and short-term loudness, using speech as the input signal. The model is based on the assumption that neural activity is combined across CFs (i.e. across frequency channels) before the transform to short-term loudness. However, it is also possible that short-term loudness is determined on a channel-specific basis. Here we tested these possibilities by assessing neural entrainment to the overall and channel-specific instantaneous loudness and the overall and channel-specific short-term loudness. The results showed entrainment to channel-specific instantaneous loudness at latencies of 45 and 100 ms (bilaterally, in and around Heschl's gyrus). There was entrainment to overall instantaneous loudness at 165 ms in dorso-lateral sulcus (DLS). Entrainment to overall short-term loudness occurred primarily at 275 ms, bilaterally in DLS and superior temporal sulcus. There was only weak evidence for entrainment to channel-specific short-term loudness.

1. Introduction

The loudness of a sound is a subjective attribute corresponding to the impression of its magnitude. Glasberg and Moore (2002) proposed that there are two aspects of loudness for time-varying sounds such as speech and music. One is the short-term loudness, which corresponds to the loudness of a short segment of sound such as a single word in speech or a single note in music. The second is the long-term loudness, which corresponds to the overall loudness of a relatively long segment of sound, such as a whole sentence or a musical phrase. Glasberg and Moore (2002) proposed a model in which transformations and processes that are assumed to occur at relatively peripheral levels in the auditory system (i.e. the outer, middle, and inner ear) are used to construct a quantity called "instantaneous loudness" that is not available to conscious perception, although it appears to be represented in the brain (Thwaites et al., 2016). At later stages in the auditory system the neural representation of the instantaneous loudness is assumed to be transformed into the short-term loudness and long-term loudness, via processes of temporal integration.

In a previous study (Thwaites et al., 2016), we investigated whether the time-varying instantaneous loudness or short-term loudness predicted by the loudness model of Glasberg and Moore (2002) is 'tracked' by cortical current, a phenomenon known as cortical entrainment. The extent of cortical entrainment was estimated from electro- and magneto-encephalographic (EMEG) measures of cortical current, recorded while normal-hearing participants listened to continuous speech. There was entrainment to instantaneous loudness bilaterally at 45 ms, 100 ms and 165 ms, in Heschl's gyrus (HG), dorso-lateral sulcus (DLS) and HG, respectively. Entrainment to short-term loudness was found in both the DLS and superior temporal sulcus at 275 ms. These results suggest that short-term loudness is derived from instantaneous loudness, and that this derivation occurs after processing in sub-cortical structures.

Missing from this account is how overall instantaneous loudness is constructed from the information that flows from the cochlea. A prominent feature of the auditory system is that neurons show tuning to audio frequencies; each neuron has a characteristic frequency (CF) to which it is most sensitive. This is a consequence of the filtering that occurs in the cochlea, which decomposes broadband signals like speech and music into multiple narrow frequency channels. The information in each channel is transmitted, in parallel, along the afferent fibers making up the auditory nerve (Helmholtz, 1863; Fuchs, 2010; Meyer and Moser, 2010; von Bekesy, 1949), and information encoded in these channels reaches the cortex in separate locations (Saenz and Langers, 2014). Furthermore, there is an orderly mapping of CF to position, which is called tonotopic organization and which is observed at many levels of the auditory system (Palmer, 1995).

The model of Glasberg and Moore (2002) is based on the assumption that neural activity is combined across CFs, i.e. across frequency channels, before the transformation to short-term loudness. It is not known where in the brain such a combination might occur, but if there is cortical entrainment to channel-specific activity, the model leads to the prediction that this entrainment should occur with a shorter latency than for short-term loudness. In this study, we aimed to test this by assessing neural entrainment to the instantaneous loudness predicted by the loudness model for nine frequency sub-bands, hereafter called "channels", spanning the range from low to high frequencies.

The sequence of transforms assumed in the model of Glasberg and Moore (2002) (channel-specific instantaneous loudness, followed by overall instantaneous loudness, followed by overall short-term loudness, outlined in figure 1) is not the only plausible sequence leading to the neural computation of overall short-term loudness. Short-term loudness may instead be derived separately for each channel, and these short-term loudness estimates may then be combined across channels to give the overall short-term loudness, as assumed in the models of Chalupper and Fastl (2002) and Moore et al. (2016). To test this alternative, we created a modified version of the model in which the short-term loudness was calculated separately for each channel and we assessed whether there was cortical entrainment to the channel-specific short-term loudness values. If neural responses are combined across CFs before the derivation of short-term loudness, then there might be cortical entrainment to the channel-specific instantaneous loudness values but not to the channel-specific short-term loudness values. If the short-term loudness is derived separately for each CF and then the short-term loudness estimates are combined across CFs, then cortical entrainment might be observed both to the channel-specific instantaneous loudness values and to the channel-specific short-term loudness values. Furthermore, the cortical entrainment to the channel-specific short-term loudness values should occur earlier in time than the cortical entrainment to the overall short-term loudness values.

In addition to standard graphic representations, an interactive representation of this study's results can be viewed on the online Kymata Atlas (http://kymata.org). For easy reference, each model in this paper (referred to as a 'function' in Kymata) is assigned a Kymata ID [KID].

2. Defining candidate models

To measure cortical entrainment, some constraints must be imposed on the models/transforms that can be tested. Any model that takes a time-varying signal as input and produces a time-varying signal as output can be used, with a function f() characterizing the mechanism by which the information (in this case the acoustic waveform) is transformed before it produces cortical entrainment. Thus, if both input x1,...,xt and output y1,...,yt are of duration t, the model takes the form:

f(x1, x2, x3, ..., xt) = (y1, y2, y3, ..., yt),    (eq 1)

where f() is bounded by a set of formal requirements (Davis et al., 1994) and a requirement that yi cannot be dependent on any xk where k > i (this last requirement avoids hypothesizing a non-causal f(), where a region would express an output before it has received the appropriate input).
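The causality requirement can be checked mechanically: perturbing future input samples must leave earlier outputs unchanged. A minimal sketch, in which the three-tap FIR kernel is a hypothetical stand-in for f() (not the loudness model itself):

```python
import numpy as np

def causal_transform(x, kernel):
    """Apply a causal FIR transform: y[i] depends only on x[0..i].

    Illustrative f() satisfying the constraint that y_i must not
    depend on any x_k with k > i; the kernel is hypothetical.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Full convolution truncated to len(x) uses only past/present samples.
    return np.convolve(x, kernel)[: len(x)]

# Perturbing future samples must leave earlier outputs unchanged.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(100)
x2 = x1.copy()
x2[50:] += 1.0                       # change only samples k >= 50
k = np.array([0.5, 0.3, 0.2])        # hypothetical 3-tap kernel
y1 = causal_transform(x1, k)
y2 = causal_transform(x2, k)
assert np.allclose(y1[:50], y2[:50])  # outputs before the change agree
```

A non-causal f() (e.g. a centered smoothing window) would fail this check, which is exactly what the requirement above is meant to exclude.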


Previously, three 'auditory magnitude' models were tested using the same dataset as the one used here: instantaneous loudness, short-term loudness, and the Hilbert envelope (KIDs: QRLFE, B3PU3 and ZDSQ9, respectively). The instantaneous loudness model (Moore et al., 1997; Glasberg and Moore, 2002) and short-term loudness model (Glasberg and Moore, 2002) represent successive transformations that approximate physiological processing in the peripheral and central auditory system. The third model, the Hilbert envelope model, was uninformed by physiological processing, and was included as a naïve comparison model.

The loudness model of Glasberg and Moore (2002) is based on a series of stages that mimic processes that are known to occur in the auditory system. These stages are: (1) A linear filter to account for the transfer of sound from the source (e.g. a headphone or loudspeaker) to the tympanic membrane; (2) A linear filter to account for the transfer of sound pressure from the tympanic membrane through the middle ear to the pressure difference across the basilar membrane within the cochlea; (3) An array of level-dependent bandpass filters, resembling the filters that exist within the cochlea (Glasberg and Moore, 1990). These filters are often called the "auditory filters" and they provide the basis for the tonotopic organization of the auditory system; (4) A compressive nonlinearity, which is applied to the output of each auditory filter, resembling the compression that occurs in the cochlea (Robles and Ruggero, 2001).

The model of Glasberg and Moore (2002) is based on the assumption that the compressed outputs of the filters are combined to give a quantity proportional to instantaneous loudness. The moment-by-moment estimates of instantaneous loudness are then smoothed over time to give the short-term loudness. However, here we also consider a model based on the assumption that the smoothing over time occurs on a channel-specific basis, and that the overall short-term loudness is derived by summing the channel-specific short-term loudness values across channels. The estimates of overall instantaneous loudness and channel-specific instantaneous loudness were updated every 1 ms. Examples of the structure of these models (and the models' predicted activity) are shown in figure 1.

We chose to use nine channels, each 4 Cams wide on the ERBN-number scale (Glasberg and Moore, 2002; Moore, 2012). The ERBN-number scale is a perceptually relevant transformation of the frequency scale that corresponds approximately to a scale of distance along the basilar membrane. The equation relating ERBN-number in Cams to frequency, f, in Hz is (Glasberg and Moore, 1990):

ERBN-number = 21.4 log10(0.00437f + 1)    (eq 2)
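Eq. 2 is easily inverted, which is convenient for moving between the Hz and Cam values quoted throughout this paper. A minimal sketch (function names are ours):

```python
import math

def hz_to_cam(f_hz: float) -> float:
    """ERB_N-number in Cams for frequency f in Hz (eq 2; Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def cam_to_hz(cam: float) -> float:
    """Inverse of hz_to_cam: frequency in Hz for an ERB_N-number in Cams."""
    return (10.0 ** (cam / 21.4) - 1.0) / 0.00437

print(hz_to_cam(1000.0))  # about 15.6 Cam
print(cam_to_hz(1.75))    # about 47 Hz, the low edge used in section 2.1
```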


The normal auditory system uses channels whose bandwidths are approximately 1-Cam wide (Moore, 2012). However, cortical entrainment to the activity in 1-Cam wide channels might be very weak, because each channel contains only a small amount of energy relative to the overall energy. We used 4-Cam wide channels so as to give reasonable frequency specificity while avoiding channels that were so narrow that they would contain little energy and therefore lead to very "noisy" cortical representations. A second reason for not using 1-Cam wide channels is that the magnitudes of the outputs of closely spaced frequency channels in response to speech are highly correlated, whereas more widely spaced channels show lower correlations in their outputs (Crouzet and Ainsworth, 2001). The analysis method used in this paper does not work well when the models that are being compared give highly correlated outputs. Analyses using narrower channels may be possible in future studies if methods can be found for reducing the noise in the measured cortical activation.

In total, twenty-one candidate models were tested, each based on a different transform of the input signal. These were: the overall instantaneous loudness model; nine channel-specific instantaneous loudness models; the short-term loudness model; nine channel-specific short-term loudness models; and the Hilbert envelope model. The following sections outline these models, with the exception of the Hilbert envelope model, which is described in Thwaites et al. (2016).


Figure 1. Example of the stimulus and model predictions. A. The hypothesized pathway in Thwaites et al. (2016), with the predicted activity for the first 1 second of the stimulus. B. The hypothesized pathway when channel-specific instantaneous loudnesses (purple) are added to the model. C. The hypothesized pathway when channel-specific short-term loudnesses (pink) are calculated as a prerequisite before the overall short-term loudness.

2.1. Overall instantaneous loudness (KID: QRLFE)

The specific loudness is a kind of loudness density. It is the loudness derived from auditory filters with CFs spanning a 1-Cam range. To obtain the overall instantaneous loudness, the specific instantaneous loudness is summed for filters with CFs between 1.75 and 39.0 Cam (corresponding to 47.5 Hz and 15108 Hz, respectively), as described by Glasberg and Moore (2002) and Thwaites et al. (2016). To achieve high accuracy, the CFs of the filters are spaced at 0.25-Cam intervals, and the sum is divided by 4.

the filters are spaced at 0.25-Cam intervals, and the sum is divided by 4.

202 203

2.2. Channel-specific instantaneous loudness

Thwaites et al

204

Tonotopic representation of loudness

10

ACCEPTED MANUSCRIPT

The operation of the model for the calculation of channel-specific instantaneous loudness can be summarized using an equation that characterizes spectral integration. For

206

each channel employed here, the specific loudness values were summed over a range of 4

207

Cams. For example, channel 4 used CFs between 13.75 and 17.5 Cam, corresponding to

208

frequencies from 766 to 1302 Hz. Table 1 gives the center frequency of each channel in Hz

209

and in Cams. The compressed outputs of all auditory filters with center frequencies within the

210

span of a given channel were summed to give the channel-specific instantaneous loudness.

RI PT

205

Model name                         Center frequency (Hz)  Center frequency (Cam)  Kymata ID
Instantaneous loudness channel 1   109                    3.625                   EUXJZ
Instantaneous loudness channel 2   292                    7.625                   PWMD2
Instantaneous loudness channel 3   573                    11.625                  5LHYD
Instantaneous loudness channel 4   1005                   15.625                  KAEKQ
Instantaneous loudness channel 5   1670                   19.625                  YYB73
Instantaneous loudness channel 6   2694                   23.625                  EN7SE
Instantaneous loudness channel 7   4270                   27.625                  UC4DR
Instantaneous loudness channel 8   6687                   31.625                  A8QRP
Instantaneous loudness channel 9   10415                  35.625                  UJU6C


Table 1. The center frequencies of the nine channels in Hz and in Cams, and their respective KIDs for channel-specific instantaneous loudness.
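Given eq. 2, the Hz column of Table 1 can be approximately reproduced from the Cam column. In our recomputation the values agree with the published ones to within about 1%; the small residual presumably reflects rounding in the original calculation:

```python
import math

def cam_to_hz(cam: float) -> float:
    """Invert ERB_N-number = 21.4 log10(0.00437 f + 1) (Glasberg & Moore, 1990)."""
    return (10.0 ** (cam / 21.4) - 1.0) / 0.00437

# Channel centers in Cams from Table 1: 3.625, 7.625, ..., 35.625 (4-Cam spacing).
table_hz = [109, 292, 573, 1005, 1670, 2694, 4270, 6687, 10415]
for k in range(1, 10):
    cam = 3.625 + 4.0 * (k - 1)
    hz = cam_to_hz(cam)
    print(f"channel {k}: {cam:.3f} Cam -> {hz:.0f} Hz (table: {table_hz[k - 1]} Hz)")
```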


2.3. Overall short-term loudness


The overall short-term loudness was calculated from the overall instantaneous loudness, as described by Glasberg and Moore (2002) and Thwaites et al. (2016), by taking a running average of the instantaneous loudness. The averager resembled the operation of an automatic gain control system with separate attack and release times.
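An averager of this kind can be sketched as a one-pole smoother whose coefficient switches between attack and release values. The coefficients below are illustrative assumptions, not the published model parameters:

```python
import numpy as np

def short_term_loudness(inst, attack=0.045, release=0.02):
    """Asymmetric running average of instantaneous loudness.

    Sketch of an attack/release averager of the kind described in the
    text; `inst` holds instantaneous loudness at 1-ms steps, and the
    attack/release coefficients are illustrative assumptions.
    """
    inst = np.asarray(inst, dtype=float)
    stl = np.empty_like(inst)
    prev = 0.0
    for i, x in enumerate(inst):
        a = attack if x > prev else release  # rise quickly, decay slowly
        prev = a * x + (1.0 - a) * prev
        stl[i] = prev
    return stl
```

Applying the same averager to each channel's instantaneous loudness, rather than to the overall value, gives the channel-specific short-term loudness of section 2.4.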


2.4. Channel-specific short-term loudness values

The channel-specific short-term loudness values were calculated in the same way as for the overall short-term loudness, except that the averager was applied to the channel-specific instantaneous loudness values. The KIDs of the channel-specific short-term loudness values are PH5TU, 9T98H, U5CM6, EFFZT, YRKEG, K3NT5, 5DS7S, PPVLF, and 9ZYZ4, for channels 1-9, respectively.

SC

225

230 231

3. The analysis procedure

229

The reconstructed distributed source current of the cortex is specified as the current of 10242 cortical regions (sources), spaced uniformly over the cortex. The testing procedure

233

involves examining each of these sources, looking for evidence that the current predicted by a

234

model is similar to the current observed (figure 2A). This procedure is repeated at 5-ms

235

intervals (Figure 2B) across a range of time lags (−200 < l < 800 ms), covering the range of

236

plausible latencies (0 to 800 ms) and a short, pre-stimulation range (−200 to 0 ms) during

237

which we would expect to see no significant match (The 0-800 ms range was chosen because

238

the study of Thwaites et al. (2015) showed little significant expression for instantaneous

239

loudness after a latency of 500 ms; the 5-ms interval step was chosen because it is the

240

smallest value that can be used given current computing constraints). This produces a

241

statistical parametric map that changes over time as the lag is varied, revealing the evolution

242

of similarity for a given model’s predicted behavior with observed behavior over cortical

243

location and time. Evidence of a model’s similarity between its predicted behavior and

244

cortical activity is expressed as a p-value, which is generated through the match-mismatch

245

technique described in Thwaites et al. (2015), where evidence for similarity is described as

246

significant if the p-value is less than a pre-defined value, α*. We refer to the observation of

247

significant matches at a specific lag as ‘model expression’.

248

AC C

EP

TE D

232


Figure 2. Technique overview. First (A), the electrophysiological activity of the brain in response to a given stimulus (measured using EMEG) is matched to the pattern of neural activity predicted by the model being evaluated. Predicted and observed activity are tested for similarity and the resulting statistical parametric map displays the regions (sources) where the match is statistically significant. Second (B), this procedure is repeated at different lags (illustrated here from 0-150 ms) between the onset of the observed neural activity and the onset of the predicted output. The similarity is highest at a specific lag (highlighted). This produces a statistical parametric map that changes over time.


Setting α* so that it accurately reflects what is known about the data being tested can be difficult. In the current study, some of the measurements used in the tests are dependent on other measurements (because of spatial and temporal similarities between neighboring sources and lags). However, it is very difficult, if not impossible, to get accurate estimations of, for instance, the spatial dependencies between sources. In the present study, rather than accept assumptions about the dependencies that are hard to justify, we assumed that the data at each source and lag were independent (a 'worst case' scenario). As a result, the type II error rate is likely to be high, making the reported results 'conservative'.

at each source and lag were independent (a ‘worst case’ scenario). As a result, the type II

265

error rate is likely to be high, making the reported results ‘conservative’.

Thwaites et al

266

Tonotopic representation of loudness

13

ACCEPTED MANUSCRIPT

We used an exact formula for the familywise false-alarm rate to generate a ‘corrected’

267

α, α* of approximately 3×10-13 (see Thwaites et al., 2015, for the full reasoning); p-values

268

greater than this are deemed to be not significant.

269

The results are presented as expression plots, which show the latency at which each of the 10242 sources for each hemisphere best matched the output of the tested model (marked

271

as a ‘stem’). The y-axis shows the evidence supporting the match at this latency: if any of the

272

sources have evidence, at their best latency, indicated by a p-value lower than α*, they are

273

deemed ‘significant matches’ and the stems are colored red, pink or blue, depending on the

274

model.

SC

RI PT

270

The expression plots also allow us to chart which models are most likely at a

276

particular source through a model selection procedure, using p-values as a proxy for model

277

likelihood. For each model’s expression plot we retain only those sources where the p-value

278

is lower (has higher likelihood) than for the other models tested. Accordingly, each plot has

279

fewer than 10242 sources per hemisphere, but the five plots (Figs 3A, 3B, 3C, 6A and 6B)

280

taken together make up the full complement of 10242 sources per hemisphere. It is important

281

to note that this model selection procedure does not indicate that any one model is

282

significantly better than another for some source. It indicates only that one model is better

283

than another by some amount, even if the evidence may not differ strongly between models.

284

We take this approach as we are only interested in the trend of which models explain the

285

activity best in each source, and our aim is to distinguish between models that may be

286

correlated over time.

TE D

EP

AC C

287

M AN U

275

In what follows, the input signal for all transformations is the estimated waveform at

288

the tympanic membrane. This waveform was estimated by passing the digital representation

289

of the signal waveform through a digital filter representing the effective frequency response

290

of the earpieces used. This frequency response (called the transfer function) was measured

291

using KEMAR KB0060 and KB0061 artificial ears, mounted in a KEMAR Type 45DA Head

292

Assembly (G.R.A.S. Sound & Vibration, Holte, Denmark). This frequency response,

293

measured as gain relative to 1000 Hz, was estimated to be ([frequency:left gain:right gain]):

Thwaites et al

Tonotopic representation of loudness

14

ACCEPTED MANUSCRIPT

294

[125 Hz:-1.5:-0.8], [250 Hz:1. 5:1.3], [500 Hz:1.5:2.6], [1000 Hz:0:0], [2000 Hz: 1.6: 0.5],

295

[3000 Hz: -2.0:-0.5] [4000 Hz:-5.6:-3.7], [5000 Hz:-6.4:-5.1], [6000 Hz:-12.3:-13.3].

296 4.

298 299

Methods and materials The methods and materials are the same as those used in Thwaites et al. (2016). The

reader is referred to that paper for full details.

300 301

4.1.

Participants: there were 15 right-handed participants (7 men, mean age = 24 years,

SC

302

Participants and stimuli

RI PT

297

range=18-30). All gave informed consent and were paid for their participation. The study was

304

approved by the Peterborough and Fenland Ethical Committee (UK). For reasons

305

unconnected with the present study, all participants were native speakers of Russian.

M AN U

303

306

Stimuli: A single 6 minute 40 second acoustic stimulus (a Russian-language BBC radio interview about Colombian coffee) was used. This was later split in the analysis

308

procedure into 400 segments of length 1000 ms. The stimulus was presented at a sampling

309

rate of 44.1 kHz with 16-bit resolution. A visual black fixation cross (itself placed over a

310

video of slowly fluctuating colors) was presented during the audio signal to stop the

311

participant’s gaze from wandering.

312

314

4.2.

Procedure

Each participant received one practice stimulus lasting 20 seconds. Subsequent to this,

AC C

313

EP

TE D

307

315

the continuous 6 minute 40 second stimulus was presented four times, with instructions to

316

fixate on the cross in the middle of the screen while listening. After each presentation, the

317

participant was asked two simple questions about the content of the stimulus, which they

318

could answer using the button box. Having made a reply, they could rest, playing the next

319

presentation when ready, again using the button box. Presentation of stimuli was controlled

320

with Matlab, using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997;

321

Kleiner et al., 2007). The stimuli were binaurally presented at approximately 65 dB SPL via

322

Etymotic Research (Elk Grove Village, Illinois) ER3 earpieces with 2.5 m tubes.

Thwaites et al

Tonotopic representation of loudness

15

ACCEPTED MANUSCRIPT

323 324

4.3.

325

EMEG recording Continuous MEG data were recorded using a 306 channel VectorView system

(Elekta-Neuromag, Helsinki, Finland) containing 102 identical sensor triplets (two

327

orthogonal planar gradiometers and one magnetometer) in a hemispherical array situated in a

328

light magnetically-shielded room. The position of the head relative to the sensor array was

329

monitored continuously by four Head-Position Indicator (HPI) coils attached to the scalp.

330

Simultaneous EEG was recorded from 70 Ag-AgCl electrodes placed in an elastic cap

331

(EASYCAP GmbH, Herrsching-Breitbrunn, Germany) according to the 10/20 system, using

332

a nose electrode as reference. Vertical and horizontal EOG were also recorded. All data were

333

sampled at 1 kHz and were band-pass filtered between 0.03 and 330 Hz. A 3-D digitizer

334

(Fastrak Polhemus Inc, Colchester, VA) recorded the locations of the EEG electrodes, the

335

HPI coils and approximately 50-100 ‘headpoints’ along the scalp, relative to three anatomical

336

fiducials (the nasion and left and right pre-auricular points).

M AN U

SC

RI PT

326

337 4.4.

Data pre-processing

TE D

338

Static MEG bad channels were detected and excluded from subsequent analyses

340

(MaxFilter version 2, Elektra-Neuromag, Stockholm, Sweden). Compensation for head

341

movements (measured by HPI coils every 200 ms) and a temporal extension of the signal–

342

space separation technique (Taulu et al., 2005) were applied to the MEG data. Static EEG

343

bad channels were visually detected and removed from the analysis (MNE version 2.7,

344

Martinos Center for Biomedical Imaging, Boston, Massachusetts). The EEG data were re-

345

referenced to the average over all channels. The continuous data were low-pass filtered at 100

346

Hz (zero-phase shift, overlap-add, FIR filtering). The recording was split into 400 epochs of

347

1000 ms duration. Each epoch included the 200 ms from before the epoch onset and 800 ms

348

after the epoch finished (taken from the previous and subsequent epochs) to allow for the

349

testing of different latencies. Epochs in which the EEG or EOG exceeded 200 μV, or in which

350

the value on any gradiometer channel exceeded 2000 fT/m, were rejected from both EEG and

351

MEG datasets. Epochs for each participant were averaged over all four stimulus repetitions.

AC C

EP

339

Thwaites et al

Tonotopic representation of loudness

16

ACCEPTED MANUSCRIPT

4.5. Source reconstruction

The locations of the cortical current sources were estimated using minimum-norm estimation (MNE) (Hämäläinen and Ilmoniemi, 1994; Gramfort et al., 2013, 2014), neuroanatomically constrained by MRI images obtained using a GRAPPA 3D MPRAGE sequence (TR = 2250 ms; TE = 2.99 ms; flip angle = 9 degrees; acceleration factor = 2) on a 3T Tim Trio (Siemens, Erlangen, Germany) with 1-mm isotropic voxels. For each participant, a representation of their cerebral cortex was constructed using FreeSurfer (FreeSurfer 5.3, Martinos Center for Biomedical Imaging, Boston, Massachusetts). The forward model was calculated with a three-layer boundary element model using the outer surface of the scalp and the outer and inner surfaces of the skull identified in the structural MRI. Anatomically constrained source activation reconstructions at the cortical surface were created by combining the MRI, MEG, and EEG data. The MNE representations were downsampled to 10242 sources per hemisphere, roughly 3 mm apart, to improve computational efficiency. Representations of individual participants were aligned using a spherical morphing technique (Fischl et al., 1999). Source activations for each trial were averaged over participants. We employed a loose-orientation constraint (0.2) to improve the spatial accuracy of localization. Sensitivity to neural sources was improved by calculating a noise covariance matrix based on a 1-s pre-stimulus period. Reflecting the reduced sensitivity of MEG sensors to deeper cortical activity (Hauk et al., 2011), sources located on the cortical medial wall and in subcortical regions were not included in the analyses reported here.
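Minimum-norm estimation resolves the underdetermined mapping from a few hundred sensors to thousands of cortical sources by choosing the source vector with the smallest L2 norm that reproduces the measurements. A toy sketch with one sensor and two candidate sources (all numbers invented; the real computation additionally involves noise-covariance whitening, regularization, and orientation constraints):

```python
# Toy minimum-norm estimate: one sensor, two candidate sources.
# For y = g1*x1 + g2*x2, the minimum-L2-norm solution is x = G^T y / (G G^T).

g = [3.0, 4.0]   # illustrative leadfield row (sensor sensitivity to each source)
y = 10.0         # measured sensor value

gram = sum(gi * gi for gi in g)          # G G^T (a scalar here) = 25
x_hat = [gi * y / gram for gi in g]      # minimum-norm source estimate

# The estimate reproduces the measurement exactly...
assert abs(sum(gi * xi for gi, xi in zip(g, x_hat)) - y) < 1e-9
# ...and among all exact solutions it has the smallest norm
# (e.g. [10/3, 0] also reproduces y but has a larger norm).
print(x_hat)  # [1.2, 1.6]
```

The same least-norm property is what produces the 'point spread' discussed later: energy is shared among all sources the sensors cannot distinguish.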

4.6. Visualization

The cortical slices in Fig. 3 were produced with the visualization software MRIcron (Georgia State Center for Advanced Brain Imaging, Atlanta, Georgia), with results mapped to the high-resolution colin27 brain (Schmahmann et al., 2000). The HG label in Fig. 5 is taken from the Desikan-Kilkenny-Tourville-40 cortical labelling atlas (Klein and Tourville, 2012), and the probabilistic anatomical parcellation of HG in Fig. 3 is based on the probabilistic atlases of Rademacher et al. (1992), Fischl et al. (2004) and Morosan et al. (2001).


5. Results

5.1. Channel-specific instantaneous loudness models

The regions where expression for one of the nine channel-specific instantaneous loudness models was the most significant of all the models tested, and below the α* threshold, were located mainly bilaterally at latencies of 45 ms, 100 ms and 165 ms (Fig. 3A).
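The per-source selection rule implied here (keep, at each source, only the single most significant model, and report it only if it beats the threshold) can be sketched as follows. The threshold is the stipulated value from Fig. 3; the source names, model names and p-values are invented for illustration:

```python
# Sketch of per-source model selection: at each source, keep the model with the
# smallest minimum p-value (over latencies), and report it only if that p-value
# is below the alpha* threshold. All values below are invented.

ALPHA_STAR = 3e-13

p_values = {  # source -> {model: minimum p-value over all latencies}
    "src_a": {"channel_3": 1e-15, "overall_IL": 5e-14},
    "src_b": {"channel_3": 1e-10, "overall_IL": 2e-11},
}

winners = {}
for src, models in p_values.items():
    best_model = min(models, key=models.get)
    if models[best_model] < ALPHA_STAR:
        winners[src] = best_model

print(winners)  # {'src_a': 'channel_3'}  (src_b fails the threshold)
```

This is also why each source appears in only one of the expression plots: a source is assigned to exactly one winning model, or to none.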


Figure 3. Expression plots for the channel-specific instantaneous loudness, instantaneous loudness and short-term loudness models. (A) Plot overlaying the expression of the nine channel-specific instantaneous loudness models across processing lags from -200 to +800 ms, relative to the acoustic signal. Results for the left and right hemispheres are plotted separately, with the right-hemisphere plots inverted to allow comparison between hemispheres. The minimum p-values at a given source, over all latencies, are marked as ‘stems’. Stems at or above the stipulated value (p = 3 × 10⁻¹³) indicate significant expression of the channel-specific instantaneous loudness models that is not explained better by the other models tested. The cortical locations of significant sources at latencies W (45 ms), X (100 ms) and Y (165 ms) (labeled in black boxes) are indicated on the coronal and axial slices to the right of the plot. (B) Plot of the expression for the overall instantaneous loudness model across processing lags from -200 to +800 ms, relative to the acoustic signal. Again, the cortical locations of significant sources at latency Y (165 ms) (labeled in a black box) are indicated on the coronal and axial slices to the right of the plot. (C) Plot of the expression for the short-term loudness model across processing lags from -200 to +800 ms, relative to the acoustic signal. The cortical locations of significant sources at latency Z (275 ms) (labeled in a black box) are indicated on the coronal and axial slices to the right of the plot. All three expression plots (together with those in Fig. 6) implement model selection such that each source appears only once in the five plots. The boundary of the probabilistic anatomical parcellation of HG is shown in green (Rademacher et al., 1992; Fischl et al., 2004; Morosan et al., 2001).


This picture is complicated somewhat by the fact that significant expression was not found for all channels at all three latencies. Broadly speaking, the ‘center’ channels (that is, channels 3, 4, 5 and 6) were expressed at 165 ms, while the ‘outermost’ channels (1, 2, 7, 8 and 9) were expressed at 45 and 100 ms (Fig. 4).


Figure 4. Expression plot illustrating the differences between the ‘center’ and ‘outermost’ instantaneous loudness channels. The figure reproduces data from Fig. 3A, but with expressions for the ‘center’ instantaneous loudness channels (3, 4, 5 and 6) colored purple and those for the ‘outermost’ channels (1, 2, 7, 8 and 9) colored green. The plot shows that expression at ‘W’ and ‘X’ (labeled in black boxes) is strongest for the outermost channels, while expression at ‘Y’ is strongest for the channels in the center.

The reported positions of the expression of these channels are uncertain as a consequence of the error introduced by the point-spread function inherent in EMEG source localization (see Discussion). The expression of channels 1, 2, 3 and 5 at 45 ms appears to be bilateral and dispersed loosely around the lateral sulcus, while the locations of the expression at 100 ms fall more clearly in or around HG (Fig. 5), although these are again somewhat dispersed.


Figure 5. Positions of the expression for the channel-specific instantaneous loudnesses at 100 ms. The figure shows an inflated brain with the significant regions of entrainment to each of the nine tested instantaneous loudness channels at 100 ms. Channels with no significant expression at this latency are ghosted in the key. For display purposes, only the seven most significant vertices for each channel are shown. The boundary of HG, taken from the Desikan-Kilkenny-Tourville-40 cortical labeling atlas (Klein and Tourville, 2012), is also shown.

The locations of the expression at 165 ms were bilateral, like the expression of the channels found at 45 ms, and dispersed loosely around the lateral sulcus (see the accompanying cortical slices to the right of Fig. 3A). An interactive representation of these nine channels (and all models tested in this paper) can be viewed using the Kymata Atlas (2016).

5.2. Instantaneous loudness model

The regions where expression for the instantaneous loudness model was the most significant of the models tested, and below the α* threshold, were located mainly bilaterally with a latency of 165 ms (Fig. 3B). This expression was centered on the DLS.

5.3. Short-term loudness model

The regions where expression for the short-term loudness model was the most significant of the models tested, and below the α* threshold, were located mainly bilaterally with a latency of 275 ms (Fig. 3C). This expression was centered on the DLS and the superior temporal sulcus.

5.4. Hilbert envelope model

Small but significant expression was found for the Hilbert envelope model at 155 ms (Fig. 6A). The locations of this expression were similar to the locations of the sources entrained to instantaneous loudness. While significant, the evidence (in terms of p-values) for this expression was several orders of magnitude weaker than that for the most significant instantaneous or short-term loudness expressions. The Hilbert envelope model is unrealistic, as it takes no account of the physical transformations occurring between the sound source and the cochlea or of the physiological transformations occurring in the cochlea. The entrainment found to the Hilbert envelope probably reflects its high correlation with the instantaneous loudness, and the noisiness of the data resulting from source estimation error.


Figure 6. Expression plots for the Hilbert envelope and channel-specific short-term loudness models. (A) Plot of the expression of the Hilbert envelope model (across processing lags from -200 to +800 ms, relative to the acoustic signal), showing the latencies of significant expression for the Hilbert model. (B) Plot of the expression of the nine channel-specific short-term loudness models. The nine expression plots, one per channel, are overlaid in a single plot.

5.5. Channel-specific short-term loudness models

Small but significant expression was found for the nine channel-specific short-term loudness models between 0 and 130 ms and around 300 ms (Fig. 6B). The locations of this expression were similar to the locations of the sources entrained to the channel-specific instantaneous loudnesses. While significant, the evidence (in terms of p-values) for this expression was several orders of magnitude weaker than that for the most significant instantaneous or short-term loudness expressions.

6. Discussion


The study of Thwaites et al. (2016) assessed entrainment to only three transforms: instantaneous loudness, short-term loudness and the Hilbert envelope. As a result, it ascribed entrainment in HG and the DLS at 45, 100 and 165 ms latency to instantaneous loudness, and entrainment in the DLS and superior temporal sulcus at 275 ms latency to short-term loudness, with little evidence of entrainment to the Hilbert envelope. The current study, which assessed entrainment to these three transforms plus eighteen new transforms, re-ascribes the early entrainment at 45 ms and 100 ms latency to the nine channel-specific instantaneous loudnesses. This supports the view that auditory information is split into frequency channels in the cochlea, that this information is transmitted, in parallel, along the tonotopically organized afferent fibers making up the auditory nerve (Helmholtz, 1863; Fuchs, 2010; Meyer and Moser, 2010; von Bekesy, 1949), and that the information encoded in these channels reaches the cortex at separate cortical locations (Saenz and Langers, 2014).


The integration of these channels into a single ‘combined’ instantaneous loudness value appears to occur between 100 ms and 165 ms: expression for the channel-specific instantaneous loudnesses was predominantly found at 45 ms and 100 ms latency, and expression for overall instantaneous loudness was predominantly found at 165 ms. This interpretation is, at first glance, seemingly contradicted by the additional expression of channel-specific instantaneous loudness at 165 ms: if both channel-specific instantaneous loudness and instantaneous loudness are expressed at 165 ms, then there is no time for the conversion from the former to the latter. However, Figure 4 shows that this 165-ms expression for channel-specific instantaneous loudness is likely a false positive, as it is mostly the central four channels (channels 3-6) that are associated with expression at this latency. The outputs of these four channels are, in this dataset, highly correlated with the overall instantaneous loudness (ρ = 0.9, 0.9, 0.8, and 0.6 for channels 3, 4, 5, and 6, respectively). As a result, it is difficult to assess whether this expression is primarily related to the four channel-specific instantaneous loudnesses or to the overall instantaneous loudness. The fact that the entrainment at 45 and 100 ms was largely to the outermost bands, which are mostly less correlated with overall instantaneous loudness (ρ = 0.5, 0.8, 0.4, -0.1, and -0.1 for channels 1, 2, 7, 8 and 9, respectively), lends credence to the view that the middle bands were also entrained at this time, and we assume this to be the case for the rest of this discussion. In future studies, we aim to reduce correlations between the outputs of these transforms by including naturalistic non-speech sounds (e.g. musical sounds, rain, running water) in the stimulus.

The inherent insolvability of the inverse problem in EMEG source reconstruction (Grave de Peralta-Menendez et al., 1996; Grave de Peralta-Menendez and Gonzalez-Andino, 1998) means that significant ‘point spread’ of localization data was present, and the localization of entrainment (and the tonotopic map of the channel-specific instantaneous loudnesses in Fig. 5 in particular) should thus be treated with caution. However, evidence from other studies suggests that tonotopic mapping of frequency channels occurs in HG (see Saenz and Langers, 2014, for a review), and the results found in this study are consistent with this. All expression at 100 ms latency (and, to a lesser extent, 45 ms) was found in or around HG, and the channels that showed entrainment in regions closer to HG (e.g. channels 1 and 2) had low (i.e. more significant) p-values. This is to be expected: where the localization is more accurate, the signal-to-noise ratio is presumably higher, resulting in lower p-values. Nevertheless, it is difficult to infer a tonotopic layout of CFs from the current data, beyond a possible low-frequency/anterior, high-frequency/posterior orientation (Fig. 5; cf. Baumann et al., 2013; Moerel et al., 2014; Saenz and Langers, 2014; Su et al., 2014). Improvements in source reconstruction (through the gathering of more data or improved inverse techniques) may reveal this layout more accurately.

may reveal this layout more accurately.

SC

M AN U

TE D

EP

It cannot be determined from these results why the channel-specific instantaneous

AC C

526

RI PT

510

527

loudness information is ‘moved’ or ‘copied’ (i.e. with no intervening transform, signaled in

528

Fig. 7 by a null() function) from 45 ms to 100 ms to a different region of the brain. Storage,

529

or preparation for integration with other information, are both possibilities. Alternatively, one

530

of these periods of entrainment may reflect entrainment to the output of a correlated, but

531

untested, transform, perhaps one carried out as part of the construction of channel-specific

532

instantaneous loudness.

Some evidence was found of entrainment to the channel-specific short-term loudnesses, but this evidence was several orders of magnitude weaker than that for the most significant instantaneous or short-term loudness expressions. The weak significant entrainment to channel-specific short-term loudness may reflect a genuine effect or it may reflect false positives, since the channel-specific short-term loudness is correlated with the channel-specific instantaneous loudness for the same channel (ρ = 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7, 0.8 and 0.7 for channels 1-9, respectively), and, for the middle channels, the channel-specific short-term loudness is correlated with the overall short-term loudness (ρ = 0.9, 0.8 and 0.8 for channels 3-5, respectively). Hence the data do not allow a clear conclusion about whether channel-specific instantaneous loudness is summed across channels to give the overall instantaneous loudness and the overall short-term loudness is determined from the overall instantaneous loudness, as assumed in the loudness model of Glasberg and Moore (2002), or whether short-term loudness is determined separately for each channel from the instantaneous loudness for that channel, and then short-term loudness values are summed across channels to give the overall short-term loudness, as assumed in the models of Chalupper and Fastl (2002) and Moore et al. (2016). The evidence from this study appears to favour the former.
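The two architectures differ empirically because a short-term loudness integrator is nonlinear (it responds differently to rising and falling input), so summing before smoothing and smoothing before summing give different outputs. A toy illustration with a made-up asymmetric attack/release smoother (coefficients and signals are invented; this is not either published model):

```python
# Illustrative asymmetric (attack/release) one-pole smoother, loosely in the
# spirit of a short-term loudness integrator. Coefficients are made up.
def smooth(x, attack=0.7, release=0.1):
    out, s = [], 0.0
    for v in x:
        a = attack if v > s else release   # rise quickly, decay slowly
        s = a * v + (1 - a) * s
        out.append(s)
    return out

ch1 = [0.0, 1.0, 0.0, 0.0]   # two invented channel signals
ch2 = [0.0, 0.0, 1.0, 0.0]

# Architecture 1: sum channels first, then smooth (Glasberg & Moore ordering).
order1 = smooth([a + b for a, b in zip(ch1, ch2)])

# Architecture 2: smooth each channel, then sum (Chalupper & Fastl ordering).
order2 = [a + b for a, b in zip(smooth(ch1), smooth(ch2))]

# Because attack != release, the two orderings disagree:
assert order1 != order2
```

Were the smoother linear (attack == release), the two orderings would be identical, and the present data could never distinguish the architectures at all.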

Overall, these findings suggest a set of pathways whereby auditory magnitude information is moved through regions of the cortex (Fig. 7). Under this interpretation, the instantaneous loudness transform is determined in a number of frequency bands, each of which represents the information from a restricted region of the basilar membrane, with the output of this transformation being entrained in or around HG at 45 ms. This information is then moved or copied (seemingly untransformed) to other areas located in HG or its environs at a latency of 100 ms. From there, the information in each of these channels is combined, with the output of this transformation entrained in the DLS at 165 ms. This instantaneous loudness is transformed to short-term loudness, to be entrained in both the DLS and superior temporal sulcus at 275 ms latency.


Figure 7. Implied pathway of loudness information. The figure illustrates one interpretation of the pathway suggested by the findings of this study, with latencies W, X, Y and Z (labeled in black boxes) corresponding to the same labels in Figs. 3A, 3B and 3C. First, the channel-specific instantaneous loudness transforms are applied, with the result of these transformations being entrained in or around HG at 45 ms (locations are approximate due to source localization error). This information is then moved or copied (seemingly untransformed) to other areas located in HG or its environs (at 100-ms latency). The information in each of these channels is then combined to form instantaneous loudness, expressed in the DLS at 165 ms. From here, the instantaneous loudness is transformed to the short-term loudness, to be entrained in both the DLS and superior temporal sulcus (at 275 ms latency).

7. Conclusions

The results of this study suggest that the channel-specific instantaneous loudness transforms of incoming sound take place before 45 ms latency, and that entrainment to the output of these transforms occurs in different regions of the cortex at latencies of 45 and 100 ms (bilaterally, in and around HG). Between 100 ms and 165 ms, these channel signals are combined, forming instantaneous loudness, entrained at 165 ms in the DLS. The short-term loudness transform is then applied, with entrainment primarily at 275 ms, bilaterally in the DLS and superior temporal sulcus. The locations of this entrainment are approximate due to the inherent error in source estimation of EMEG data. More work is needed to improve the accuracy of these reconstructions in order to improve the certainty of these locations.

Acknowledgements

This work was supported by an ERC Advanced Grant (230570, ‘Neurolex’) to WMW, by MRC Cognition and Brain Sciences Unit (CBU) funding to WMW (U.1055.04.002.00001.01), and by EPSRC grant RG78536 to JS and BM. Computing resources were provided by the MRC-CBU, and office space for AT by the Centre for Speech, Language and the Brain. We thank Brian Glasberg, Russell Thompson, Lorraine Tyler, Caroline Whiting, Elisabeth Fonteneau, Anastasia Klimovich-Smith, Gary Chandler, Maarten van Casteren, Eric Wieser, Andrew Soltan, Alistair Mitchell-Innes and Clare Cook for invaluable support.

References

Baumann, S., Petkov, C. I., and Griffiths, T. D. (2013). A unified framework for the organization of the primate auditory cortex. Front. Syst. Neurosci. 7:11. doi: 10.3389/fnsys.2013.00011

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision 10, 433-436. doi: 10.1163/156856897X00357

Chalupper, J., and Fastl, H. (2002). Dynamic loudness model (DLM) for normal and hearing-impaired listeners. Acta Acustica united with Acustica 88, 378-386.

Crouzet, O., and Ainsworth, W. A. (2001). On the various instances of envelope information on the perception of speech in adverse conditions: an analysis of between-channel envelope correlation. In Workshop on Consistent and Reliable Cues for Sound Analysis (pp. 1-4). Aalborg, Denmark.

Davis, M., Sigal, R., and Weyuker, E. J. (1994). Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science. Burlington, MA: Morgan Kaufmann.

Fischl, B., Sereno, M. I., Tootell, R. B., and Dale, A. M. (1999). High-resolution inter-subject averaging and a coordinate system for the cortical surface. Hum. Brain Mapp. 8, 272-284.

Fuchs, P. (Ed.) (2010). The Oxford Handbook of Auditory Science: The Ear. Oxford, UK: Oxford University Press.

Glasberg, B. R., and Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hear. Res. 47, 103-138. doi: 10.1016/0378-5955(90)90170-T

Glasberg, B. R., and Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. J. Audio Eng. Soc. 50, 331-342.

Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., Brodbeck, C., Goj, R., Jas, M., Brooks, T., Parkkonen, L., and Hämäläinen, M. (2013). MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7. doi: 10.3389/fnins.2013.00267

Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., Brodbeck, C., Parkkonen, L., and Hämäläinen, M. (2014). MNE software for processing MEG and EEG data. NeuroImage 86, 446-460. doi: 10.1016/j.neuroimage.2013.10.027

Grave de Peralta-Menendez, R., Gonzalez-Andino, S. L., and Lutkenhoner, B. (1996). Figures of merit to compare linear distributed inverse solutions. Brain Topogr. 9, 117-124. doi: 10.1007/BF01200711

Grave de Peralta-Menendez, R., and Gonzalez-Andino, S. L. (1998). A critical analysis of linear inverse solutions to the neuroelectromagnetic inverse problem. IEEE Trans. Biomed. Eng. 45, 440-448. doi: 10.1109/10.664200

Hämäläinen, M. S., and Ilmoniemi, R. J. (1994). Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput. 32, 35-42. doi: 10.1007/BF02512476

Helmholtz, H. L. F. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (On the Sensations of Tone as a Physiological Basis for the Theory of Music). Braunschweig, Germany: F. Vieweg.

Klein, A., and Tourville, J. (2012). 101 labeled brain images and a consistent human cortical labeling protocol. Front. Neurosci. 6, article 171, 1-12. doi: 10.3389/fnins.2012.00171

Kleiner, M., Brainard, D., and Pelli, D. (2007). What’s new in Psychtoolbox-3? Perception 36, ECVP Abstract Supplement.

The Kymata Atlas, Cambridge University (2016). https://kymata.org/

Meyer, A. C., and Moser, T. (2010). Structure and function of cochlear afferent innervation. Curr. Opin. Otolaryngol. Head Neck Surg. 18, 441-446. doi: 10.1097/MOO.0b013e32833e0586

Moerel, M., De Martino, F., and Formisano, E. (2014). An anatomical and functional topography of human auditory cortical areas. Front. Neurosci. 8:225. doi: 10.3389/fnins.2014.00225

Moore, B. C. J. (2012). An Introduction to the Psychology of Hearing, 6th Ed. Leiden, The Netherlands: Brill.

Moore, B. C. J., Glasberg, B. R., and Baer, T. (1997). A model for the prediction of thresholds, loudness, and partial loudness. J. Audio Eng. Soc. 45, 224-240.

Moore, B. C. J., Glasberg, B. R., Varathanathan, A., and Schlittenlacher, J. (2016). A loudness model for time-varying sounds incorporating binaural inhibition. Trends Hear. (in press).

Morosan, P., Rademacher, J., Schleicher, A., Amunts, K., Schormann, T., and Zilles, K. (2001). Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. Neuroimage 13, 684-701. doi: 10.1006/nimg.2000.0715

Palmer, A. R. (1995). Neural signal processing. In B. C. J. Moore (Ed.), Hearing (pp. 75-121). San Diego, CA: Academic Press.

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10, 437-442. doi: 10.1163/156856897X00366

Rademacher, J., Galaburda, A. M., Kennedy, D. N., Filipek, P. A., and Caviness, V. S. J. (1992). Human cerebral cortex: localization, parcellation, and morphometric analysis with magnetic resonance imaging. J. Cognitive Neurosci. 4, 352-374.

Robles, L., and Ruggero, M. A. (2001). Mechanics of the mammalian cochlea. Physiol. Rev. 81, 1305-1352.

Saenz, M., and Langers, D. R. M. (2014). Tonotopic mapping of human auditory cortex. Hear. Res. 307, 42-52. doi: 10.1016/j.heares.2013.07.016

Schmahmann, J. D., Doyon, J., Toga, A. W., Petrides, M., and Evans, A. C. (2000). MRI Atlas of the Human Cerebellum. San Diego, CA: Academic Press.

Su, L., Zulfiqar, I., Jamshed, F., Fonteneau, E., and Marslen-Wilson, W. (2014). Mapping tonotopic organization in human temporal cortex: representational similarity analysis in EMEG source space. Front. Neurosci. 8:368. doi: 10.3389/fnins.2014.00368

Taulu, S., Simola, J., and Kajola, M. (2005). Applications of the signal space separation method. IEEE Trans. Sig. Proc. 53, 3359-3372. doi: 10.1109/TSP.2005.853302

Thwaites, A., Glasberg, B. R., Nimmo-Smith, I., Marslen-Wilson, W. D., and Moore, B. C. J. (2016). Representation of instantaneous and short-term loudness in the human cortex. Front. Neurosci. 10:183. doi: 10.3389/fnins.2016.00183

Thwaites, A., Nimmo-Smith, I., Fonteneau, E., Patterson, R. D., Buttery, P., and Marslen-Wilson, W. D. (2015). Tracking cortical entrainment in neural activity: auditory processes in human temporal cortex. Front. Comput. Neurosci. 9:5. doi: 10.3389/fncom.2015.00005

von Bekesy, G. (1949). The vibration of the cochlear partition in anatomical preparations and in models of the inner ear. J. Acoust. Soc. Am. 21, 233-245. doi: 10.1121/1.1906502


Highlights

• The latency of cortical entrainment to various aspects of loudness was assessed.
• For channel-specific instantaneous loudness the latencies were 45 and 100 ms.
• For overall instantaneous loudness the latency was 165 ms.
• For overall short-term loudness the latency was 275 ms.
• Entrainment to channel-specific short-term loudness was weak.