Optimized loudness-function estimation for categorical loudness scaling data


Hearing Research 316 (2014) 16–27


Hearing Research journal homepage: www.elsevier.com/locate/heares

Research paper

Optimized loudness-function estimation for categorical loudness scaling data

Dirk Oetting a, b, *, Thomas Brand b, Stephan D. Ewert b

a Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany
b Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany

Article info

Article history: Received 7 November 2013; received in revised form 3 July 2014; accepted 9 July 2014; available online 21 July 2014

Abstract

Individual loudness perception can be assessed using categorical loudness scaling (CLS). The procedure does not require any training and is frequently used in clinics. The goal of this study was to investigate different methods of loudness-function estimation from CLS data in terms of their test-retest behaviour and to suggest an improved method compared to Brand and Hohmann (2002) for adaptive CLS. Four different runs of the CLS procedure were conducted using 13 normal-hearing and 11 hearing-impaired listeners. Several approaches to loudness-function estimation (fitting) by minimising the error between the data and the loudness function were compared. First, errors were defined either in the level direction or in the loudness direction. Second, the hearing threshold level (HTL) was extracted from the CLS data by splitting the responses into an audible and an inaudible category, and the extracted HTL was used as a fixed starting point of the loudness function. Third, the uncomfortable loudness level (UCL) was estimated if presentation levels were not sufficiently high to yield responses in the upper loudness range, as often observed in practice. Compared to the original fitting method, the modified estimation of the HTL was closer to the pure-tone audiometric threshold. Results of a computer simulation for UCL estimation showed that the estimation error was reduced for data sets with sparse or absent responses in the upper loudness range. Overall, the suggested modifications lead to a better test-retest behaviour. If CLS data are highly consistent over the whole loudness range, all fitting methods lead to almost equal loudness functions. A considerable advantage of the suggested fitting method is observed for data sets where the responses either show high standard deviations or where responses are not present in the upper loudness range. Both cases regularly occur in clinical practice. © 2014 Elsevier B.V. All rights reserved.

* Corresponding author: D. Oetting, Project Group Hearing, Speech and Audio Technology of the Fraunhofer IDMT and Cluster of Excellence Hearing4all, Marie-Curie-Str. 2, 26129 Oldenburg, Germany. Fax: +49 441 2172 450.
http://dx.doi.org/10.1016/j.heares.2014.07.003
0378-5955/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Loudness is a fundamental perceptual measure that can be assigned to any stimulus. While depending on several physical stimulus properties, like level, duration, bandwidth, and spectral content, listeners are typically able to perform loudness judgements without prior training or accommodation to the task. An overview of different methods to measure individual loudness perception can be found in Launer (1995). Categorical loudness scaling (CLS; Heller, 1985; Hellbrück and Moser, 1985; Allen et al., 1990; Hohmann and Kollmeier, 1995; Humes et al., 1996; Cox et al., 1997; Keidser et al., 1999; Brand and Hohmann, 2002) is a commonly used method to quantify individual loudness perception, particularly with respect to hearing deficits. The CLS method is standardised in ISO 16832 (2006). Heeren et al. (2013) analysed the relation between loudness measured in categorical units and classical sone and phon units (Fletcher and Munson, 1933; Stevens, 1936, 1959) and provide conversion formulae.

In CLS measurements, the loudness perception for stimuli of different levels is rated on a scale ranging from “not heard” to “too loud”. Signals of different intensities are presented in randomised order and the subject has to choose a category like “soft”, “medium”, or “loud” on a response scale (see Fig. 1, 11-point scale according to Hohmann and Kollmeier, 1995; Brand and Hohmann, 2002). In normal-hearing (NH) listeners, categorical loudness impressions from “very soft” to “very loud” span a large level range of about 80 dB and more, depending on the stimulus. The loudness perception in sensorineural hearing-impaired (HI) listeners changes, and a steep rise of loudness from the elevated threshold to “very loud” is often observed. The category “very loud” generally occurs at levels comparable to those in NH listeners.

Fig. 1. Response scale (left panel) and typical data (right panel) of the ACALOS procedure. The circles indicate responses of the subject according to the presentation level in dB HL. The black line shows the loudness function estimated by the reference fitting model BY. [Response scale: “not heard” (0 CU), “very soft” (5 CU), “soft” (15 CU), “medium” (25 CU), “loud” (35 CU), “very loud” (45 CU), “too loud” (50 CU). Axes: Level (dB HL), 0 to 100, against Loudness (CU), 0 to 50.]

The CLS procedure is fast and does not require any training, which makes it applicable for routine use in the clinic for assessment of categorical loudness perception in hearing-impaired listeners. Elberling (1999) mentioned that one difficulty in using loudness scaling procedures is that different measurement procedures for CLS (e.g. increasing vs. decreasing level presentation) produce different loudness functions. Additionally, he stated that the inter-subject variance of loudness judgements for NH listeners is considerable and that the 95% confidence interval covers a range of 32–42 dB at the comfortable level. A major step to eliminate these difficulties was the standardisation in ISO 16832 (2006). However, a further improvement of the method for clinical application, for diagnosis and particularly for hearing-aid fitting, is still desired.

CLS has been used for different diagnostic purposes. The “recruitment phenomenon” (Fowler, 1950) was quantified in individual listeners using the CLS procedure (Launer et al., 1996; Cox et al., 1997; Al-Salim et al., 2010; Van Esch et al., 2013). Furthermore, Launer et al. (1994) measured the reduced loudness summation effect of broad-band signals in HI listeners compared to NH listeners using CLS. Jürgens et al. (2011) used CLS to estimate inner and outer hair cell loss and to quantify the individual nonlinear processing on the basilar membrane. The most prominent application of CLS is to assess the supra-threshold loudness perception of an individual listener for different signals. Brand and Hohmann (2001) analysed the effect of hearing loss, centre frequency, and bandwidth on the shape of the loudness function using the CLS procedure. Anweiler and Verhey (2006) used the CLS procedure to study spectral loudness summation for short and long signals.
Rennies et al. (2013) investigated loudness of speech and speech-like signals using a loudness matching procedure and the CLS procedure. They concluded that both procedures are suitable to measure loudness of speech and speech-like signals at intermediate levels. Gallégo et al. (1999) used the CLS procedure to measure the loudness growth function in cochlear implant subjects when altering the pulse widths. In a recent study, Theelen-van den Hoek et al. (2014) found that the reproducibility of CLS in electrical hearing was comparable to the reproducibility for acoustical stimulation. Stenfelt and Zeitooni (2013) used CLS to measure loudness functions with air and bone conduction stimulation. Loudness functions of bone-conducted sounds were steeper than those of air-conducted sounds.

Supra-threshold loudness perception data provided by the CLS procedure have been used in various studies to fit hearing aids (Kiessling et al., 1996; Valente and Van Vliet, 1997; Herzke and Hohmann, 2005; Kreikemeier, 2011; Ewert and Grimm, 2012). Nevertheless, CLS has not replaced the measurement of audiometric thresholds for fitting hearing aids, but can be regarded as a valuable addition.

The adaptive categorical loudness scaling (ACALOS) procedure (Brand and Hohmann, 2002) considered here consists of two phases and uses a scale with 11 response categories as shown in Fig. 1. In the first phase, the auditory dynamic range of the subject is roughly estimated. The starting stimulus level is 75 dB HL and is increased by 10 dB until the response “too loud” is reached. The step size is reduced to 5 dB in this increasing track for levels above 90 dB HL. In a decreasing track, the level is reduced from the starting level in steps of 15 dB until the response is “not heard”. The level is then increased by 5 dB until the signal becomes audible. The increasing track stops at 105 dB HL and the decreasing track stops at 0 dB HL. Levels are presented randomly interleaved between the increasing and the decreasing track. Assuming that the last levels of the first phase correspond to 5 and 50 CU, the levels for 15, 25, 35, and 45 CU are estimated by a linear function for the second phase. These levels are presented in a randomised order in the second phase. A linear function is fitted to all previously given responses and the levels for 5, 15, 25, 35, and 45 CU are again estimated using the fitted linear function. This estimation and presentation step is then repeated a second time.

The ACALOS procedure has two main advantages compared with other CLS procedures. First, the dynamic range is estimated during the measurement procedure. Hence, no pre-measurement phase is required. Other CLS procedures require this pre-measurement to ensure that the number of inaudible and too loud stimuli is low (Allen et al., 1990; Elberling and Nielsen, 1993; Ricketts and Bentler, 1996; Rasmussen et al., 1998). Second, the levels are presented in pseudo-randomised order. CLS procedures using ascending levels (Pascoe, 1988; Cox et al., 1997; Keidser et al., 1999) cause significant bias effects in the loudness function compared to descending or randomly selected level sequences (Kollmeier, 1997; Jenstad et al., 1997). The ACALOS procedure is a standard procedure in the “auditory profile”, a standardised test battery of audiological tests for clinical applications (Dreschler et al., 2008). The “auditory profile” was evaluated in a multi-centre study and several tests were conducted at equal subjective loudness measured with the ACALOS procedure (Van Esch et al., 2013). Jesteadt and Joshi (2013) compared different procedures for loudness scaling and showed that the ACALOS procedure was the most robust procedure compared to magnitude estimation and magnitude production.

A typical data set resulting from the ACALOS procedure consists of 22–25 responses per frequency, which means that about 2 responses per category are available. An example is provided in Fig. 1. Responses between “very soft” and “medium” have the highest intra-subject standard deviation (Robinson and Gatehouse, 1996; Keidser et al., 1999), which makes it difficult to estimate the underlying loudness function precisely. Al-Salim et al. (2010) found mean intra-subject standard deviations per loudness category between sessions of 6.6 dB for NH listeners and 7.8 dB for HI listeners. To increase the precision of the estimated loudness perception, responses of the different loudness categories are combined in a loudness function described by only a few parameters (e.g. straight line, two lines, power function).
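The level selection in the second ACALOS phase described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, the line fit is an ordinary least-squares fit, and the limits on the presentation level (0 to 105 dB HL) are not modelled.

```python
def fit_line(levels_db, loudness_cu):
    """Ordinary least-squares line: loudness = slope * level + intercept."""
    n = len(levels_db)
    mean_x = sum(levels_db) / n
    mean_y = sum(loudness_cu) / n
    sxx = sum((x - mean_x) ** 2 for x in levels_db)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(levels_db, loudness_cu))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def levels_for_targets(slope, intercept, targets_cu):
    """Invert the line: level (dB HL) expected to produce each target loudness."""
    return [(cu - intercept) / slope for cu in targets_cu]


def first_block_levels(lowest_level_db, highest_level_db):
    """First estimate: the last levels of phase 1 are taken as 5 CU and 50 CU."""
    slope, intercept = fit_line([lowest_level_db, highest_level_db], [5.0, 50.0])
    return levels_for_targets(slope, intercept, (15, 25, 35, 45))


def next_block_levels(levels_db, loudness_cu):
    """Later estimates: refit the line to all responses collected so far."""
    slope, intercept = fit_line(levels_db, loudness_cu)
    return levels_for_targets(slope, intercept, (5, 15, 25, 35, 45))
```

For example, if the first phase ended at 10 and 100 dB HL, `first_block_levels(10, 100)` places the four intermediate targets at 30, 50, 70, and 90 dB HL; the levels would then be presented in randomised order.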
An overview of different loudness functions applicable to CLS is given in Brand (2000). A loudness function monotonically relates the stimulus level on the x-axis to the loudness perception scale on the y-axis. The fitting error, which measures the deviation between the data points and the loudness function, can be minimised either in the level or in the loudness direction. In many of the studies using CLS, the loudness functions were fitted by minimising deviations on the loudness perception scale (y-direction; Allen et al., 1990; Brand and Hohmann, 2002). This minimises the error in predicting the perceived loudness category for a given presentation level. Depending on the desired application, the situation might change, e.g. if the estimated loudness function is to be used for fitting hearing aids. A loudness-based hearing-aid fitting generally compares predicted levels required for the same loudness perception based on a (reference) NH loudness function and an individually fitted HI loudness function. The hearing-aid gain is derived from the estimated level difference. Thus, for fitting hearing aids, the goal is to predict the corresponding level for a specific loudness category. Consequently, deviations should be minimised on the level scale (x-direction) so that the error for the predicted gain is minimised. Another CLS application is to determine parameters characterising the loudness perception on the level scale, like the medium loudness level (MLL, defined as the level at the loudness rating “medium”) or the uncomfortable loudness level (UCL, the level at the loudness rating “too loud”). Here, deviations should also be minimised on the level scale to minimise the prediction error for these levels.

An important factor in clinical use of the CLS procedure is the maximum presentation level that can be used to reach the UCL of the subject. In Brand and Hohmann (2002) the maximum presentation level was 115 dB HL. The UCL was not reached in 9% of the measurement runs with 10 NH listeners and in 10% of the runs with 10 HI listeners.
For clinical application, the maximum presentation level is typically reduced to 105 dB HL (HörTech, 2010). In Keller (2006), the grand mean loudness discomfort level (LDL) for pure tones for NH and HI listeners was about 100 dB HL with a standard deviation of 10 dB. Assuming these values, the UCL cannot be reached in at least 30% of the cases. If the UCL is not reached, the data set of the CLS can be regarded as incomplete. Such incomplete data sets can influence the results as, for example, subjects tend to map all available loudness categories to the presented range of levels (Hohmann et al., 1997). Moreover, fitting a loudness function to incomplete data sets often leads to problems for ACALOS: loudness functions bending towards very high UCL values are regularly observed because the fitted function is extrapolated over a large loudness range.

For diagnostic purposes, simple and intuitive parameters to describe the subject's loudness perception are of interest. These parameters should be independent of the different CLS procedures (Heller, 1985; Allen et al., 1990; Cox et al., 1997; Keidser et al., 1999; Brand and Hohmann, 2002), independent of the different response scales, and independent of the mathematical parameters describing the loudness function. All mentioned scales include at least four categories, namely “soft”, “ok” (or “medium” or “comfortable”), “loud”, and “too loud” (or “uncomfortably loud”), which makes at least the resulting levels at these named categories comparable. Thus the levels at these distinct categories appear particularly suited as intuitive parameters. Furthermore, the loudness function is expected to be linked to the hearing threshold for the same signal. It appears reasonable to expect the hearing threshold level to lie between the lowest category where loudness is perceived and the category “not heard” or “no response”.
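The figure of at least 30% quoted above can be checked with a short calculation. The sketch below assumes that the discomfort levels are normally distributed with the stated mean (100 dB HL) and standard deviation (10 dB) and that the LDL can stand in for the UCL; the function name is hypothetical.

```python
from statistics import NormalDist


def fraction_ucl_not_reached(max_level_db, mean_ucl_db=100.0, sd_ucl_db=10.0):
    """Share of listeners whose UCL lies above the maximum presentation level,
    assuming normally distributed UCL values (an assumption of this sketch)."""
    return 1.0 - NormalDist(mean_ucl_db, sd_ucl_db).cdf(max_level_db)


if __name__ == "__main__":
    for limit in (105.0, 115.0):
        share = fraction_ucl_not_reached(limit)
        print(f"maximum level {limit:.0f} dB HL: UCL not reached in {share:.1%}")
```

With a limit of 105 dB HL the limit lies half a standard deviation above the mean, which leaves roughly 31% of the distribution above it, in line with the statement in the text.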
In this work, different loudness-function estimation (fitting) methods for analysing data from the ACALOS procedure were investigated. The motivation for the suggested modifications was based on the typical fitting problems mentioned above, which frequently occur in clinical practice and sometimes make the result of the CLS procedure useless. The goal of this study is to i) develop a more robust fitting function using the 22–25 responses of the ACALOS procedure for application in hearing-aid fitting and to ii) characterise the fitted loudness function by meaningful and descriptive diagnostic parameters, providing benefit for use in clinical practice. Three aspects were assessed in the investigated loudness-function fitting models: a) minimising the fitting error in x-direction, b) using the estimated hearing threshold level (HTL) of the CLS data as a starting point of the loudness function, and c) handling of incomplete data sets, if not enough responses in the upper part of the loudness scale could be gathered during data collection.

2. Methods

2.1. Subjects

Thirteen NH listeners and 11 HI listeners participated in the study. The NH group was aged between 21 and 65 years (mean: 31.9 years, standard deviation: 11.1 years) and had audiometric thresholds of 20 dB HL or better at the test frequencies of 500, 2000, and 6000 Hz (ANSI S3.6, 2010). The HI group was aged between 25 and 77 years (mean: 60.5 years, standard deviation: 18.9 years) with slight-to-moderate sensorineural hearing losses (PTA between 16.3 and 56.3 dB, mean 40.7 dB). All listeners received an hourly compensation for their participation in the study.

2.2. Procedure and stimuli

Subjects participated in three experimental sessions, each lasting about 1 h. All measurements were conducted in a sound-insulated booth. Pure-tone hearing thresholds were determined using manual pure-tone audiometry at the beginning of each session. The procedure described by the American Speech-Language-Hearing Association (2005) with a minimum step size of 5 dB was used. CLS data using the ACALOS procedure (Brand and Hohmann, 2002) were obtained in four repetitions on at least two different days. Three NH listeners and one HI listener conducted three repetitions only.
The ACALOS procedure used 1000-ms, one-third-octave stationary low-noise noise at centre frequencies of 500 Hz, 2 kHz, and 6 kHz, presented monaurally to the listeners. The subject's task was to rate the loudness of each stimulus using 11 different loudness categories from “not heard” to “too loud” on a touch screen as shown in Fig. 1. For further data processing, each loudness category is assigned a number in categorical units (CU), which was not visible to the listener. The scale starts at 0 CU for “not heard” and increases in 5-CU steps to 50 CU for “too loud”, including 4 intermediate categories between the labelled ones. All experimental procedures were approved by the Ethics Committee of the University of Oldenburg.

2.3. Apparatus

Audiograms were measured using a Siemens Unity audiometer with Sennheiser HDA200 headphones. The ACALOS procedure was conducted using the HörTech (2010) Oldenburg Measurement Applications, research edition 1.3. Signals were presented using an RME DIGI96 sound card, an RME ADI-8 pro D/A converter, a Tucker-Davis HB7 headphone driver, and Sennheiser HDA200 headphones. Calibration was performed using a B&K artificial ear 4153, a B&K 0.5-inch microphone 4134, a B&K microphone preamplifier 2669, and a B&K measuring amplifier 2610.


2.4. Loudness function

In Brand (2000), 10 different loudness functions describing CLS data were analysed. They were compared in terms of their bias (compare Section 3.3) against the average ratings for each loudness category and the intra-subject standard deviations between different runs. Ten NH and 10 HI listeners participated and performed 10 CLS tracks on each ear. Brand (2000) and later Brand and Hohmann (2002) suggested a model loudness function F(L) yielding the smallest bias (as defined in Section 3.3) and an intra-ear standard deviation of 4–5 dB. This function consists of two linear parts with slope values mlow and mhigh and a juncture Lcut at a fixed value of 25 CU. A fixed break point between the upper and the lower part at the “medium” category was also found and used for fitting in Al-Salim et al. (2010). Both linear parts are smoothed between 15 and 35 CU using a quadratic Bézier curve. The equation describing the loudness function is

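$$
F(L) =
\begin{cases}
25\,\mathrm{CU} + m_{\mathrm{low}}\,(L - L_{\mathrm{cut}}) & \text{for } L \le L_{15} \\
\mathrm{bez}(L, L_{\mathrm{cut}}, L_{15}, L_{35}) & \text{for } L_{15} < L < L_{35} \\
25\,\mathrm{CU} + m_{\mathrm{high}}\,(L - L_{\mathrm{cut}}) & \text{for } L \ge L_{35}
\end{cases}
\tag{1}
$$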

with F denoting the loudness in CU for a stimulus with the level L. The smoothing Bézier function bez, including the inverse function bez^{-1}, is given in the appendix of Brand and Hohmann (2002). Fig. 1 shows an example of this function indicated by the black line. For describing the same loudness function with more intuitive parameters see Section 2.5.5.

2.5. Loudness-function fitting methods

This section describes the reference fitting method for estimating the loudness function using the responses of the ACALOS procedure of Brand and Hohmann (2002). The later sections describe the modifications of the fitting method as motivated in the introduction.

2.5.1. Reference fitting method

This fitting method is referred to as BY, with B indicating the Bézier loudness function as given in Section 2.4, and with Y indicating that deviations are minimised on the loudness scale (y-scale). A modified least-squares fit was performed in the y-direction, minimising the sum of squared distances, $\sum_i D_i^2$, between the data Ri(Li) and the loudness function F(Li), as indicated by the vertical grey bars in Fig. 1. The equation for the distance Di is

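$$
D_i =
\begin{cases}
0\,\mathrm{CU} & \text{for } F(L_i) < 0\,\mathrm{CU} \text{ and } R_i(L_i) = 0\,\mathrm{CU} \\
0\,\mathrm{CU} & \text{for } F(L_i) > 50\,\mathrm{CU} \text{ and } R_i(L_i) = 50\,\mathrm{CU} \\
R_i(L_i) - F(L_i) & \text{otherwise}
\end{cases}
\tag{2}
$$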

as given in Brand and Hohmann (2002). The parameters mlow, mhigh, and Lcut were unconstrained. An example of a typical data set and the corresponding loudness function fitted with the BY model is given in Fig. 1.

2.5.2. Modification I: Minimising deviation on the level scale

The first alternative approach for estimating loudness functions considers the direction in which the deviation is minimised during fitting. Using the same underlying loudness function as in BY, the least-squares fit was now performed in the direction of presentation level (x-direction), referred to as BX. The distance between the presentation level of the response Li(Ri) and the estimated level for the given loudness category F^{-1}(Ri) was minimised using the inverse loudness function F^{-1}. Hence, the distance

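$$
D_i =
\begin{cases}
0\,\mathrm{dB} & \text{for } F(L_i) < 0\,\mathrm{CU} \text{ and } R_i(L_i) = 0\,\mathrm{CU}, \\
0\,\mathrm{dB} & \text{for } F(L_i) > 50\,\mathrm{CU} \text{ and } R_i(L_i) = 50\,\mathrm{CU}, \\
L_i(R_i) - F^{-1}(R_i) & \text{otherwise}
\end{cases}
\tag{3}
$$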

is now defined over the horizontal distance between the data points and the model function. To restrict the effect of possible outliers, a limitation for the absolute value of Di was applied. The results showed that a value of 40 dB leads to the lowest bias using the BX fitting method for the current data. Only 4 responses (0.01%) in the whole data set of this study had absolute values of Di greater than 40 dB. Additionally, mlow and mhigh were constrained to values between 0.2 and 5 CU/dB which limits the possible dynamic range. The minimum dynamic range assuming a slope of 5 CU/dB would be 10 dB between 0 and 50 CU. The maximum dynamic range would be 250 dB for a slope of 0.2 CU/dB. Slopes below 0.2 CU/dB did not occur for the current data. Slopes above 5 CU/dB occurred in 7% of the runs mainly in incomplete runs where no responses at 50 CU were provided. If the fitting constraints are reached, the suggested method issues a warning to indicate a possible problem with the data set.
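The loudness function of Eqn. (1) and the level-direction distance of Eqn. (3) can be sketched as follows. The exact Bézier formulae are given in Brand and Hohmann (2002) and are not reproduced in this text, so the sketch assumes a standard quadratic Bézier curve with control points (L15, 15 CU), (Lcut, 25 CU), and (L35, 35 CU). The function names and the hard clipping used for the 40 dB limit are likewise assumptions of this illustration.

```python
import math


def _knots(m_low, m_high, l_cut):
    """Levels at 15 and 35 CU, where the linear parts meet the smoothed part."""
    return l_cut - 10.0 / m_low, l_cut + 10.0 / m_high


def inverse_loudness(f_cu, m_low, m_high, l_cut):
    """Level (dB HL) at which the model predicts the loudness f_cu (CU)."""
    l15, l35 = _knots(m_low, m_high, l_cut)
    if f_cu <= 15.0:
        return l_cut + (f_cu - 25.0) / m_low
    if f_cu >= 35.0:
        return l_cut + (f_cu - 25.0) / m_high
    # Assumed Bezier segment: loudness is linear in the curve parameter t.
    t = (f_cu - 15.0) / 20.0
    return (1 - t) ** 2 * l15 + 2 * t * (1 - t) * l_cut + t ** 2 * l35


def loudness(level_db, m_low, m_high, l_cut):
    """Loudness (CU) predicted by the model for a level in dB HL, Eqn. (1)."""
    l15, l35 = _knots(m_low, m_high, l_cut)
    if level_db <= l15:
        return 25.0 + m_low * (level_db - l_cut)
    if level_db >= l35:
        return 25.0 + m_high * (level_db - l_cut)
    # Solve a*t^2 + b*t + c = 0 for the curve parameter t in (0, 1).
    a = l15 - 2.0 * l_cut + l35
    b = 2.0 * (l_cut - l15)
    c = l15 - level_db
    t = -2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))
    return 15.0 + 20.0 * t


def distance_x(level_db, response_cu, m_low, m_high, l_cut, limit_db=40.0):
    """Horizontal distance of Eqn. (3), limited in absolute value to limit_db."""
    f = loudness(level_db, m_low, m_high, l_cut)
    if (f < 0.0 and response_cu == 0) or (f > 50.0 and response_cu == 50):
        return 0.0
    d = level_db - inverse_loudness(response_cu, m_low, m_high, l_cut)
    return max(-limit_db, min(limit_db, d))
```

A fitting routine would sum the squares of `distance_x` over all responses and search mlow, mhigh, and Lcut for the minimum, with the slopes restricted to the range of 0.2 to 5 CU/dB stated above.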

2.5.3. Modification II: Inclusion of estimated hearing threshold

This modification, referred to as BTX, addresses the common problem of under- or overestimation of the HTL at frequencies where the HTL is normal. An example of underestimating the HTL is given in Fig. 1, and an example of overestimating the hearing threshold is given in Fig. 2. Here, the grey lines show the loudness function using the fitting methods BY and BX, respectively. The estimated level for 0 CU (“not heard”) is 28.0 and 33.9 dB HL for BY and BX, respectively, although responses that were heard obviously exist below these values. The HTL is a stable point in the auditory system (Burns and Hinchcliffe, 1957). It is used as the starting point of the loudness function (Buus et al., 1998) in this modification. A logistic function similar to Lecluyse and Meddis (2009) was used to estimate the HTL: the data sets were rearranged in two categories labelled “heard” for all responses between 5 and 50 CU and “not heard” for all responses at 0 CU.

Fig. 2. Example of estimating the HTL using a logistic function in the fitting method BTX. Upper panel: The solid and dashed grey lines indicate the loudness function estimated using the BY and BX fitting methods, respectively. The fitting method BTX (black line) uses the HTL as a fixed point at 2.5 CU indicated by the square. Lower panel: The HTL is estimated from the “not heard” and “heard” responses using the logistic function. [Legend: subject responses, BY, BX, BTX, hearing threshold. Axes: Level (dB HL), 0 to 100, against Loudness (CU) in the upper panel and against the categories “heard” and “not heard” in the lower panel.]

With this rearrangement (see lower panel of Fig. 2), a logistic function of the form

$$
p(L) = \frac{1}{1 + e^{-k\,(L - Q)}}
\tag{4}
$$

was fitted to the data using the least-squares method in y-direction. In Eqn. (4), p(L) is the proportion of audible responses, L is the level of the stimulus, and k is a slope parameter with a limited range from 0.4 to 1 (compare Figs. 5 and 6 in Lecluyse and Meddis, 2009). The level Q indicates where the probability of “heard” responses is 50%. The estimated value for Q was used as a fixed point of the loudness function at 2.5 CU. This point was interpreted as the HTL and all responses at 0 CU were removed from the data set for fitting the loudness function. If a listener never responded “not heard” even for the lowest presentation level of 0 dB HL, an HTL of 5 dB HL was assumed. There are data sets where the listener responded “very soft” at a certain level and “not heard” at a higher level. For these data sets, the estimated HTL might exceed the range of the presented levels. In these cases, the modification was not applied and the fitting method BX was used instead (a warning is issued).

With the inclusion of the threshold estimate at 2.5 CU, the lower part of the loudness function is strongly constrained. Because of the fixed intersection point of the lower and upper part at 25 CU, this constraint also affects the behaviour of the upper part. Thus a high number of data points in the lower loudness range can strongly influence the behaviour of the upper part of the loudness function. For NH listeners, the spread of responses in the lower part of the loudness function was often quite large, which is illustrated by the example in Fig. 2 at 5 CU. To counterbalance the resulting strong effect of many high-variance data points in the lower loudness region on the upper part of the loudness function, an inverse variance weighting was applied (Bevington and Robinson, 2003). This compensates for the differences in the intra-subject standard deviation over the loudness range (Robinson and Gatehouse, 1996), as shown in Fig. 3. Each weighted squared distance $\tilde{D}_i^2$ is now defined as

$$
\tilde{D}_i^2 = \frac{D_i^2}{s_i^2} = \frac{1}{s_i^2}\left(L_i(R_i) - F^{-1}(R_i)\right)^2 .
\tag{5}
$$
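The two ingredients of BTX introduced so far, the logistic threshold estimate of Eqn. (4) and the weighted squared distance of Eqn. (5), can be sketched as follows. The text does not state which optimiser was used for the least-squares fit of Eqn. (4), so a plain grid search over Q and k stands in for it here; the function names, the grid resolution, and the restriction of Q to the presented level range are assumptions of this sketch. The special cases described in the text (a listener who never responds “not heard”, or a threshold estimate outside the presented levels) are not handled.

```python
import math

# Slope values searched; the range 0.4 to 1 is taken from the text, the grid is assumed.
K_VALUES = (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)


def logistic(level_db, k, q):
    """Proportion of audible responses at a given level, Eqn. (4)."""
    return 1.0 / (1.0 + math.exp(-k * (level_db - q)))


def estimate_htl(levels_db, responses_cu, q_step_db=0.5):
    """Least-squares fit of Eqn. (4) by grid search; returns (Q, k)."""
    heard = [1.0 if r > 0 else 0.0 for r in responses_cu]
    if min(heard) == max(heard):
        raise ValueError("needs both 'heard' and 'not heard' responses")
    q_low, q_high = min(levels_db), max(levels_db)
    steps = int(round((q_high - q_low) / q_step_db))
    best = None
    for i in range(steps + 1):
        q = q_low + i * q_step_db
        for k in K_VALUES:
            err = sum((h - logistic(lv, k, q)) ** 2
                      for lv, h in zip(levels_db, heard))
            if best is None or err < best[0]:
                best = (err, q, k)
    return best[1], best[2]


def weighted_squared_distance(distance_db, sd_db):
    """Inverse-variance weighted squared distance, Eqn. (5)."""
    return (distance_db / sd_db) ** 2
```

As a usage example, `estimate_htl([0, 5, 10, 15, 20, 25, 30], [0, 0, 0, 0, 5, 5, 10])` places Q midway between the highest “not heard” level and the lowest “heard” level; Q would then serve as the fixed point of the loudness function at 2.5 CU.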

Thus, each squared distance is weighted with the inverse of the squared averaged intra-subject standard deviation $s_i^2$ of the i-th loudness category as given in Fig. 3. The least-squares fit now minimises $\sum_i \tilde{D}_i^2$. Fig. 2 shows the differences between the loudness functions resulting from the fitting methods BY (solid grey line), BX (dashed grey line), and BTX (black line) for an example data set. The fitting methods BY and BX result in loudness functions predicting thresholds at about 30 dB HL, although audible responses at 20 dB HL exist.

2.5.4. Modification III: Limitation of UCL

In practical applications of the CLS method, responses in the upper loudness domain often cannot be reached because of a limited maximum presentation level (typical limit: 105 dB HL). An example is shown in Fig. 4. Here, no responses above 35 CU were collected and the whole loudness function was fitted based on the responses at and below 35 CU (“loud”). Therefore, the shape of the upper part was determined by the sparse data between 25 and 35 CU, given that the intersection of the underlying lines is fixed at 25 CU. The resulting loudness function between 35 and 50 CU is a pure extrapolation. This can lead to very high UCL values, like 177 and 124 dB HL in the example in Fig. 4 for the BY and BX fitting models, respectively.

To address possible problems with very high UCL estimates for data sets with sparse data in the upper CU range, two different methods for estimating the UCL value are proposed and compared. The first method is based on a suggestion by Brand (2007): if fewer than four responses between 35 and 50 CU were present, the UCL was derived from the HTL estimated with the loudness function, using the data of Pascoe (1988), which relate the HTL to the mean UCL. The loudness function was then fitted using this derived value for the UCL. For hearing thresholds up to 40 dB HL, the mean UCL is 100 dB HL; it then increases linearly to 140 dB HL for hearing thresholds of 120 dB HL.
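The relation between HTL and mean UCL described above can be written as a small piecewise-linear function. This sketch uses only the two anchor points quoted in the text; the function names, the inclusive treatment of the range from 35 to 50 CU, and the clamping for thresholds above 120 dB HL are assumptions of this illustration.

```python
def ucl_from_htl(htl_db):
    """Mean UCL (dB HL) for a given HTL (dB HL): 100 dB HL up to an HTL of 40 dB HL,
    then rising linearly to 140 dB HL at an HTL of 120 dB HL."""
    if htl_db <= 40.0:
        return 100.0
    # Thresholds above 120 dB HL are not covered by the text; clamped here (assumption).
    htl_db = min(htl_db, 120.0)
    return 100.0 + (htl_db - 40.0) * (140.0 - 100.0) / (120.0 - 40.0)


def needs_ucl_estimate(responses_cu, min_count=4):
    """True if fewer than four responses lie between 35 and 50 CU (inclusive)."""
    upper = sum(1 for r in responses_cu if 35 <= r <= 50)
    return upper < min_count
```

The slope of the rising part is 0.5 dB of UCL per dB of HTL, so an HTL of 80 dB HL, for example, maps to a UCL of 120 dB HL.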
It should be noted that for all hearing thresholds up to 40 dB HL a constant UCL value of 100 dB HL is assumed. This estimation of the UCL based on Pascoe's data is also used if mhigh is below 0.25 CU/dB. If neither of these conditions is met and the fitted loudness function exceeds a UCL value of 140 dB HL, a fixed UCL of 140 dB HL is assumed and a re-fit is performed. This fitting method is referred to as BTPX. The estimated UCL for the example in Fig. 4 was 113 dB HL.

The second method for estimating the UCL assumed a fixed slope of 1.53 CU/dB for mhigh (for details see Section 3.1) if fewer than four responses between 35 and 50 CU were present. This fitting method is referred to as BTUX. In the example data set in Fig. 4 the estimated UCL using the BTUX fitting method is 110 dB HL. BTPX and BTUX equal BTX for data sets having at least four responses between 35 and 50 CU. Therefore, these modifications only affect data sets with “missing values”.

Fig. 3. Mean intra-subject standard deviation of the responses of NH and HI listeners for each loudness category (black symbols). The standard deviation is higher in the lower loudness domain. Mean intra-subject standard deviation from computer simulations of the individual subjects (grey symbols) matched quite well with the measured data. [Legend: NH data, NH model, HI data, HI model. Axes: Loudness / CU, 0 to 50, against Standard deviation / dB.]

Fig. 4. Example of estimating the uncomfortable level (UCL at 50 CU) when fewer than four responses are given between 35 and 50 CU (light grey area). The solid and dashed grey lines indicate the loudness function estimated using the BY and BX fitting methods, respectively. The black lines show the results of the BTPX and BTUX fitting methods, which both estimate the upper part of the loudness function. [Legend: BY, BX, BTPX, BTUX. Axes: Level / dB HL against Loudness / CU, 0 to 50.]

2.5.5. Descriptive parameters for the loudness function

The parameters to describe the loudness function as used in Brand and Hohmann (2002) are the lower slope mlow, the upper slope mhigh, and the juncture between both lines Lcut. These three parameters originate from the mathematical definition of the loudness function, but do not provide an intuitive relationship to common parameters of loudness and dynamic ranges. For diagnostic purposes and for fitting hearing aids, intuitively accessible and descriptive parameters would provide an advantage compared to the former mathematically motivated parameters. The parameters HTL and UCL limit the auditory dynamic range (Smeds and Leijon, 2011) and are used by many prescription rules (for an overview see Table 1 in Kiessling, 2001). Additionally, several hearing-aid fitting procedures use the level at 15, 20, or 25 CU (“medium”), which is often associated with the “most comfortable loudness level” (MCL), as an additional parameter (Kiessling et al., 1996; Kreikemeier, 2011; Van Esch et al., 2013). Here, the “medium-loudness level” (MLL) is defined at 25 CU as a more descriptive parameter. Therefore, the parameters HTL at 2.5 CU, MLL at 25 CU, and UCL at 50 CU are defined and can be used as intuitive parameters to describe the loudness perception. These values are independent of the mathematical definition of the underlying loudness function. The transformation for the loudness function of Section 2.4 is given by the following equations:

$$\mathrm{HTL} = L_{2.5} = L_{\mathrm{cut}} - \frac{22.5\ \mathrm{CU}}{m_{\mathrm{low}}} \qquad (6)$$

$$\mathrm{MLL} = L_{25} = \left[\text{expression in } m_{\mathrm{low}},\ m_{\mathrm{high}},\ \text{and } L_{\mathrm{cut}};\ \text{right-hand side not legible in the source}\right] \qquad (7)$$

$$\mathrm{UCL} = L_{50} = L_{\mathrm{cut}} + \frac{25\ \mathrm{CU}}{m_{\mathrm{high}}} \qquad (8)$$

Additionally derived parameters such as the complete dynamic range DR = UCL − HTL, the lower dynamic range D_low = MLL − HTL, or the higher dynamic range D_high = UCL − MLL might be used. For a NH listener, the parameters m_low = 0.3 CU/dB, L_cut = 81 dB, and m_high = 1.4 CU/dB are transformed into HTL = 6.1 dB, MLL = 74.5 dB, and UCL = 99.0 dB. Loudness perception of a HI listener with m_low = 1.0 CU/dB, L_cut = 78.2 dB, and m_high = 1.4 CU/dB would be described by HTL = 56.3 dB, MLL = 78.2 dB, and UCL = 96.5 dB. Obviously, an increased threshold can be detected. The lower dynamic range of the HI listener is D_low = 21.9 dB compared to D_low = 68.4 dB of the NH listener. For signals above the MLL both listeners made similar loudness judgements.

2.6. Computer simulations

To assess the effect of different loudness fitting functions on UCL estimation with sparse loudness judgements in the upper CU range, Monte-Carlo computer simulations were carried out. The simulation requires an underlying “true” loudness function F(L) including assumptions about the statistics of the response behaviour during the CLS measurements. The response behaviour was modelled by using the loudness function defined in Section 2.4. A normal distribution with a standard deviation of 4 CU was assumed for the statistical behaviour of the listeners. Brand (2000) analysed the response behaviour of 9 NH and 10 HI listeners and derived a typical value of 4 CU for the standard deviation over the whole loudness range for both groups. For a given presentation level L, the simulated loudness judgement was obtained by drawing a random number from a normal distribution with the mean value F(L) and 4 CU standard deviation. The normal distribution was truncated at −2.49 and 52.49 CU and appropriately rescaled. The selected random number on the continuous scale was rounded to the nearest possible response from 0 to 50 CU in 5 CU steps.

To validate the described method for the computer simulations, the underlying loudness function was estimated for each NH and HI listener using the pooled data of all four CLS measurements. The loudness function was fitted to the data set using the reference least-squares fit (BY) described in Section 2.5.1. This fitting method minimises the deviation between the loudness function and the given responses on the loudness scale (y-scale). Four CLS runs for each listener were then conducted with simulated responses based on the individual “true” loudness function. The average intra-subject standard deviation was calculated from the simulations and extracted from the data. Both values per loudness category are shown in Fig. 3. Differences between data and simulations of more than 2 dB only occurred at 0 and 5 CU for NH and HI listeners and at 25 CU for NH listeners. However, both deviations will not influence the simulations for UCL estimations.

Fig. 5. Distribution of slopes m_low (left) and m_high (right) of the BY method fitted to the 279 CLS data sets collected here. For NH listeners (dark grey) m_low is typically 0.3 CU/dB.

3. Results

For an evaluation of the proposed fitting methods, the data of 13 NH and 11 HI listeners who conducted four runs of the CLS procedure at three frequencies (500, 2000, and 6000 Hz) were used. The reference fitting method BY was used in Sections 3.1 and 3.2 to analyse the data. Comparisons between the different fitting methods are described in Sections 3.3–3.5.

3.1. Distribution of slopes of the loudness function

In Fig. 5 the values for the slopes of the two linear parts using the BY fitting method to the CLS data of the current study are shown. Typical slopes for the lower loudness function, m_low, for NH listeners are 0.3–0.4 CU/dB. HI listeners show quite a large spread


of m_low from 0.3 up to 1.4 CU/dB. The slope of the upper loudness function, m_high, does not show such a distinct behaviour: values are between 0.3 and 3.3 CU/dB for both listener groups. Values above 3.6 CU/dB rarely occurred in the NH and in the HI group and were not consistent over different repetitions. This finding supports limiting the slopes to a range between 0.2 and 5 CU/dB (Section 2.5.2). The fixed slope m_high = 1.53 CU/dB in the BTUX fitting method was estimated using data sets with responses at 45 or 50 CU (206 out of 279). The interquartile range was between 0.9 and 1.8 CU/dB. Mean values were m_high = 1.56 CU/dB for NH and m_high = 1.50 CU/dB for HI listeners, estimated by method BX.

3.2. Intra-subject standard deviation

Fig. 3 shows the intra-subject standard deviations of the responses of NH and HI listeners of the current study in each loudness category as circles. Values of about 10 dB in the lower loudness domain for NH listeners and about 7 dB for HI listeners are observed. The intra-subject standard deviation decreased to about 4 dB in the upper loudness domain for both groups. Additionally, the values of the computer simulations of the NH and HI listeners are shown as crosses in Fig. 3. The standard deviations produced by the computer simulation were in good agreement with the measured values. At 0 CU and 5 CU, the simulated standard deviations differed from the measured values, indicating that the response behaviour at the threshold was not appropriately modelled.

3.3. Bias and root-mean-square error (RMSE)

To assess the deviations between the fitted loudness functions and the data, the bias and the RMSE were calculated. Responses of all four runs were pooled and for each loudness category x the median stimulus level \hat{L}_x was calculated. This served as the reference loudness-function estimate for each listener. The mean bias B based on a single run is given by

$$B = \frac{1}{11}\sum_{x=0}^{10}\left(F^{-1}(5\cdot x) - \hat{L}_{5\cdot x}\right) \qquad (9)$$

with F^{-1}(x) denoting the inverse of the fitted loudness function for a single run. This bias reveals possible systematic mismatches between the fitting method and the data (which may be caused by the small sample size or problems of the model function or fitting method). The RMSE between the median stimulus level and the loudness function is

$$\mathrm{RMSE} = \sqrt{\frac{1}{11}\sum_{x=0}^{10}\left(F^{-1}(5\cdot x) - \hat{L}_{5\cdot x}\right)^{2}} \qquad (10)$$

For each fitting method, the mean bias and RMSE values are given in Table 1. The BY and BX fitting methods lead to a mean bias in level (x-direction) of 1.2 and 1.5 dB, respectively. The bias of the BY fitting method has negative values at lower loudness categories and positive values at higher loudness categories. The BX fitting method has a positive bias at the lower loudness categories. The BTX fitting method has a lower bias of 0.2 dB. BTUX showed the smallest bias of 0.1 dB. The RMSE for the BY fitting method is 5.7 dB and is around 5 dB for all other fitting methods.

Table 1
Bias and root-mean-square errors (RMSE) in level direction for the five different fitting methods.

             BY          BX          BTX         BTPX        BTUX
Bias x/dB    1.2 (±2.7)  1.5 (±2.6)  0.2 (±2.6)  0.2 (±2.5)  0.1 (±2.6)
RMSE x/dB    5.7 (±2.9)  4.9 (±2.0)  4.9 (±2.1)  5.0 (±2.2)  4.9 (±2.1)
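As an illustration of Eqs. (9) and (10), the following sketch computes the mean bias and the RMSE of a single-run fit against the pooled median levels. It is a minimal example and not the authors' implementation; the function name `bias_and_rmse` and all level values are hypothetical.

```python
import math


def bias_and_rmse(fitted_levels, median_levels):
    """Mean bias (Eq. 9) and RMSE (Eq. 10) in dB over the loudness categories.

    fitted_levels: levels F^-1(5*x) predicted by a single-run fit, x = 0..10
    median_levels: median stimulus levels of the pooled runs, same categories
    """
    if len(fitted_levels) != len(median_levels) or not fitted_levels:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [f - m for f, m in zip(fitted_levels, median_levels)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Hypothetical levels (dB HL) for the 11 categories 0, 5, ..., 50 CU.
median = [0, 15, 30, 45, 55, 65, 72, 78, 84, 90, 95]
fitted = [m + 2 for m in median]  # a fit lying 2 dB above every median level
bias, rmse = bias_and_rmse(fitted, median)
```

For a fit that lies a constant 2 dB above the reference, both measures equal 2 dB. Deviations that scatter symmetrically around the reference cancel in the bias but still contribute to the RMSE, which is why both values are reported.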

3.4. Comparison to audiometric threshold

A common problem of the BY fitting method by Brand and Hohmann (2002) is that it under- or overestimates the HTL. The HTL is located between “not heard” (0 CU), which is obviously at or below the hearing threshold, and “very soft” (5 CU), which is above the threshold. Here, the HTL was arbitrarily defined at 2.5 CU, corresponding to the mean value of both loudness categories. An example for a single subject is given in Fig. 6. The four different CLS runs and the loudness functions are shown. The estimated HTL of the BY fitting method at 2.5 CU ranges between −0.9 and 19.4 dB HL across the four CLS runs. The main reason for the different loudness functions and HTLs is the small number of responses in the lower loudness domain, which entail a high standard deviation (compare Fig. 3). This leads to large differences between these functions in the lower loudness domain, especially in the first and third run. The BTX method estimates the HTL using the logistic function and yields the same HTL of 7.5 dB HL for each of the four runs. This level is always estimated if the first “not heard” response occurred at 0 dB and the lowest audible response is at 15 dB HL. The step size of 15 dB in the decreasing track results in a coarse quantisation of the HTL (discussed in Section 4.1). Fig. 7 shows the difference between the estimated HTL from the CLS data at 2.5 CU and the audiometric pure-tone thresholds as “gold standard” for 500 Hz, 2000 Hz, and 6000 Hz. The audiometric thresholds are the mean values of four repetitions of the manual audiometry. The four repetitions were in 78% of the cases within a range of 5 dB (97% of the cases within a 10 dB range). It should be noted that the loudness scaling was measured with narrow-band noises (low-noise noise) and audiometric thresholds with pure tones. The HTLs derived from the BY and BX methods show the largest spread of threshold differences over all three frequencies compared to the audiometric threshold.
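The quantised value of 7.5 dB HL mentioned above can be reproduced with a deliberately simplified stand-in for the threshold step. The sketch below does not fit the logistic function used by the BTX method; it only splits the responses into inaudible (0 CU) and audible (above 0 CU) and, as an assumption made here for illustration, takes the midpoint between the highest inaudible and the lowest audible level. The function name and the example track are hypothetical.

```python
def htl_midpoint(levels_db, responses_cu):
    """Midpoint between the highest inaudible and the lowest audible level.

    A response of 0 CU ("not heard") counts as inaudible, any response above
    0 CU as audible. Returns None if one category is empty or if the two
    categories overlap in level, because the midpoint rule does not apply then.
    """
    inaudible = [lvl for lvl, r in zip(levels_db, responses_cu) if r == 0]
    audible = [lvl for lvl, r in zip(levels_db, responses_cu) if r > 0]
    if not inaudible or not audible:
        return None
    highest_inaudible = max(inaudible)
    lowest_audible = min(audible)
    if lowest_audible <= highest_inaudible:
        return None
    return (highest_inaudible + lowest_audible) / 2


# Hypothetical descending track with 15 dB steps that ends at 0 dB HL.
levels = [60, 45, 30, 15, 0]
responses = [25, 15, 10, 5, 0]
htl = htl_midpoint(levels, responses)  # 7.5
```

With a single “not heard” response at 0 dB HL and the lowest audible response at 15 dB HL, any rule that places the threshold halfway between the two categories returns 7.5 dB HL, independent of the responses at higher levels.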
Median values at 500 Hz for NH listeners are 3.1 and 8.2 dB and the interpercentile ranges between the 10th and 90th percentile of BY and BX are 22.6 and 22.3 dB. The BTX fitting method showed a median value of 1.7 dB and an almost halved interpercentile range of 11.2 dB. A similar result was observed for 2000 and 6000 Hz. The BTX method showed significantly smaller variance, as indicated in Fig. 7, compared to BY and BX for NH listeners (F-test, **p < 0.01, ***p < 0.001). A significant threshold shift between BY and BX (Mann–Whitney U, p < 0.01 for 500 and 2000 Hz and p = 0.02 for 6000 Hz) was observed for NH listeners. Improvements for HI listeners using the BTX fitting method are only significant at 6000 Hz. All three fitting methods showed a small interquartile range of 5–8 dB at all frequencies in the case of HI listeners. The overall squared correlation between the hearing threshold using pure-tone audiometry and the HTL estimated from the CLS data was r² = 0.89 for BY and BX and r² = 0.95 for the BTX fitting method.

3.5. Computer simulations for testing UCL estimation

Computer simulations were used to assess the accuracy of the UCL estimation using the different fitting methods when the dynamic range of the CLS procedure is not sufficient to reach the individual UCL. The maximum presentation level was limited to 105 dB HL as in typical clinical measurements. The underlying

Fig. 6. Example of four different runs of a NH subject. The grey line shows the BY fitting method. The black line shows the BTX fitting method that uses the estimated threshold as a fixed value at 2.5 CU. Each panel shows loudness (CU) over level (dB HL) with HTL (2.5 CU), MLL (25 CU), and UCL (50 CU) marked. Fitted values given in the panels:
Run 1: BY: HTL −0.9 dB HL, MLL 65.4 dB, UCL 91.3 dB HL; BTX: HTL 7.5 dB HL, MLL 63.0 dB, UCL 91.8 dB HL.
Run 2: BY: HTL 5.2 dB HL, MLL 66.2 dB, UCL 88.9 dB HL; BTX: HTL 7.5 dB HL, MLL 65.9 dB, UCL 88.4 dB HL.
Run 3: BY: HTL 19.4 dB HL, MLL 68.7 dB, UCL 82.4 dB HL; BTX: HTL 7.5 dB HL, MLL 69.4 dB, UCL 81.6 dB HL.
Run 4: BY: HTL 10.9 dB HL, MLL 62.6 dB, UCL 85.7 dB HL; BTX: HTL 7.5 dB HL, MLL 60.3 dB, UCL 85.6 dB HL.

“true” loudness functions had HTL values between 50 and 70 dB HL and lower slopes between 0.6 and 1 CU/dB according to typical values in the current data. The UCL was chosen between 10 dB above the MLL and 125 dB HL. Different loudness functions were generated from a set of randomly selected parameter values (uniform distribution in the above-mentioned ranges). Subjective responses were simulated using these underlying loudness functions. Only simulated data sets with the highest response “loud” (35 CU) or below were taken into account for further analysis. Overall, 2000 loudness functions and their corresponding data sets with a maximum response of 35 CU were generated. The differences between the underlying “true” UCL values and the estimated UCL values for the different fitting models were calculated and are shown in Fig. 8. The reference fitting model BY showed large deviations between the “true” UCL values and the estimated UCL values. Differences of more than 60 dB occurred in 26 cases, which are not shown in Fig. 8. The BTPX and BTUX methods showed a narrower variance of the UCL prediction


(Siegel–Tukey, median adjusted, p < 0.001) compared to the BY, BX, and BTX fitting methods. The range between the 5th and the 95th percentile was reduced from 47.8 dB for BY to 21.8 and 16.6 dB for the BTPX and BTUX fitting method, respectively. The BTUX fitting method resulted in a UCL estimate deviation with a median value of 3.5 dB compared to 8.2 dB of the BTPX fitting method. The variance of the BTUX fitting method was also narrower compared to the BTPX fitting method (Siegel–Tukey, median adjusted, p < 0.001).

3.6. Test–retest behaviour

Estimated loudness functions require a high precision in test–retest performance to be applicable for, e.g., hearing-aid fitting. Therefore, the similarity of the loudness functions estimated for different runs was analysed. In clinical applications, usually only a single run would be available for hearing-aid fitting. Hence, the range between the minimum and maximum levels of the loudness


functions fitted to the different runs was compared. The mean range r_{x,i} between the minimum and maximum predicted levels for each of the 11 loudness categories was calculated using

$$r_{x,i} = \max_{n}\left(F^{-1}_{i,n}(x)\right) - \min_{n}\left(F^{-1}_{i,n}(x)\right), \qquad (11)$$

with F^{-1}_{i,n}(x) denoting the estimated level for loudness category x of the n-th run using the i-th fitting method (i = 1: BY, 2: BX, 3: BTX, 4: BTPX, 5: BTUX). The results for r_{x,i} for NH and HI listeners were averaged across frequency and are shown in Fig. 9. The mean range in level estimation for the loudness categories 20–30 CU was almost identical for all fitting methods for NH listeners. Differences occurred at lower categories, where the BY and BX fitting methods had higher values compared to BTX, BTPX, and BTUX (using threshold estimation). For HI listeners, the fitting methods using threshold estimation also showed lower mean ranges below 20 CU. For higher loudness categories, the differences are caused by the different methods of UCL estimation. The BTUX and BTPX fitting methods resulted in the lowest values for NH and HI listeners, respectively.

Fig. 7. Difference between the audiometric threshold and the HTL estimated with the CLS procedure for the different fitting methods.

Fig. 8. Difference between the assumed UCL of the underlying loudness function and the UCL estimated from the simulated responses. The ends of the whiskers indicate values within 1.5 interquartile ranges of the first and third quartile, respectively. Outliers are marked with an “x”.

Fig. 9. The mean range between the maximum and minimum levels for each loudness category from the estimated loudness functions by the different fitting methods for different runs.

The within-subject standard deviation of m measurements and n subjects is given by

$$s_w = \sqrt{\frac{1}{n}\sum_{i=1}^{n} s_i^{2}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m-1}\sum_{k=1}^{m}\left(x_{i,k} - \bar{x}_i\right)^{2}\right)}. \qquad (12)$$

Here, s_i is the standard deviation for the i-th subject, x_{i,k} is the measured value of the k-th repetition for the i-th subject, and \bar{x}_i denotes the respective mean. If only two measurement repetitions are considered per subject, x_{i,1} and x_{i,2}, Eq. (12) can be simplified using the difference Δx_i = x_{i,1} − x_{i,2} (Bland and Altman, 1996) and leads to

$$s_w = \sqrt{\frac{1}{2n}\sum_{i=1}^{n}\Delta x_i^{2}}. \qquad (13)$$

In Table 2 values for s_w for intersession (data collected on two different days) and intrasession repetitions using the loudness scaling procedure and the different fitting methods are given. The slope of the lower part m_low was accurately assessed with all fitting methods, resulting in a standard deviation of about 0.03 CU/dB for NH and 0.1 CU/dB for HI listeners. The upper slope m_high showed a standard deviation exceeding 0.4 CU/dB for all fitting methods. This corresponds to more than 26% of the mean value for m_high (1.53 CU/dB; compare Section 3.1). The standard deviation for 2.5, 25, and 50 CU usually had the lowest value for BTPX and BTUX (between 2.29 and 5.52 dB). Only BTPX and BTUX showed a lower standard deviation over the whole loudness range, which did not exceed 6 dB.

Table 2
Within-subject standard deviation of different parameters using the different fitting methods. In the original table, bold figures indicate lowest values per row per session. The asterisks (for m_high) indicate that values above 5 CU/dB (limit for BX and BTX, see Section 2.5.2) were excluded. That was the case for two fits in NH listeners and a single fit in HI listeners for BY for the intrasession and intersession comparison. Eight fits in HI listeners for BTPX for the intrasession and intersession comparison were excluded. 12 fits in NH for the intrasession and 10 fits for the intersession comparison were excluded for the BTPX fitting method.

Value         Unit    Intrasession                          Intersession
                      BY     BX     BTX    BTPX   BTUX      BY     BX     BTX    BTPX   BTUX
m_low (NH)    CU/dB   0.03   0.06   0.03   0.03   0.03      0.03   0.06   0.03   0.03   0.03
m_low (HI)    CU/dB   0.12   0.10   0.08   0.06   0.08      0.14   0.11   0.11   0.13   0.12
m_high (NH)*  CU/dB   0.41   0.92   1.16   0.86   0.97      0.61   1.06   1.28   1.08   1.20
m_high (HI)*  CU/dB   0.56   0.63   0.50   0.46   0.46      0.61   0.81   0.82   0.48   0.61
HTL (NH)      dB      5.04   4.92   4.29   4.29   4.29      6.03   6.68   4.10   4.10   4.10
HTL (HI)      dB      3.05   2.92   2.94   3.00   2.94      4.84   4.54   2.29   2.29   2.29
MLL (NH)      dB      3.23   3.58   3.54   3.70   3.47      3.13   3.09   3.19   3.11   3.08
MLL (HI)      dB      3.19   2.63   2.41   2.33   2.44      5.45   3.70   4.01   4.23   4.26
UCL (NH)      dB      14.67  9.08   7.86   4.66   3.90      14.98  10.08  8.49   5.52   5.12
UCL (HI)      dB      16.81  3.30   3.20   2.39   2.48      15.06  6.15   5.77   2.42   3.40

4. Discussion

4.1. Accuracy of HTL estimation and use in the loudness function

The accuracy of the HTL estimation compared to the audiometric thresholds as the “gold standard” was determined. While there was almost no change in median values between the fitting

models (see Fig. 7), the interquartile range was considerably reduced for BTX. This modification is especially beneficial in listeners having low slopes in the lower loudness domain. Here, the HTL estimation based on the loudness function is less accurately defined because of the high standard deviation of the responses (see Figs. 3 and 7). In many runs the “not heard” category consists only of a single response, because the CLS procedure stops detecting the hearing threshold when 0 dB HL is reached (Brand and Hohmann, 2002). Although many HTL estimations in this study are based on a single response (126 out of 279 runs), the comparison with the audiometric thresholds (see Fig. 7) supports this approach. For hearing thresholds above 20 dB HL, the procedure is not interrupted and a step size of 5 dB is used to detect the threshold. Here, on average 2.8 responses were collected in the “not heard” category. To provide a solid basis for the HTL estimation, the ACALOS procedure should present more levels near the HTL. It is thus recommended to lower the minimum presentation level to −10 dB HL to also cover lower HTLs by the procedure. If the response is “not heard” at the minimum presentation level, the sequence should continue by presenting increasing levels until the response is audible again. The sequence should only stop if the response is still audible for the minimum presentation level. Versions of the HörTech (2013) Oldenburg Measurement Applications from release 1.5 include a modification of presenting levels below 0 dB HL. Any further increase of responses near the HTL would not lead to a more precise HTL estimation by the BY and BX fitting methods because of the boundary effects in the responses. There are only a few responses close to the HTL in the “not heard” category but widely spread responses in the “very soft” to “soft” categories. This asymmetric distribution leads to an overestimation of the HTL.
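One way to illustrate why an HTL read off the fitted loudness function is fragile for shallow lower slopes is the slope dependence of Eq. (6), HTL = L_cut − 22.5 CU / m_low. The sketch below is an illustration added here, not an analysis from the study: it assumes that only m_low changes while L_cut stays fixed, which is a simplification because both parameters vary together in an actual fit. The function names and the perturbation of 0.03 CU/dB are hypothetical choices.

```python
def htl_from_parameters(l_cut_db, m_low):
    """HTL (level at 2.5 CU) from Eq. (6): L_cut - 22.5 CU / m_low."""
    if m_low <= 0:
        raise ValueError("m_low must be positive")
    return l_cut_db - 22.5 / m_low


def htl_shift(l_cut_db, m_low, delta_m):
    """Change of the HTL in dB when the lower slope is raised by delta_m."""
    return (htl_from_parameters(l_cut_db, m_low + delta_m)
            - htl_from_parameters(l_cut_db, m_low))


# L_cut and m_low are taken from the NH and HI examples of Section 2.5.5;
# the slope perturbation is an illustrative assumption.
shift_shallow = htl_shift(81.0, 0.3, 0.03)  # shallow lower slope (NH example)
shift_steep = htl_shift(78.2, 1.0, 0.03)    # steep lower slope (HI example)
```

Under these assumptions the same slope change moves the HTL by several dB for the shallow slope but by less than 1 dB for the steep slope, which is in line with the larger benefit of a separately estimated threshold for listeners with low slopes.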
The additional threshold estimation by the BTX fitting method is still required to ensure that the loudness function converges towards the HTL. The mean intra-subject standard deviation for estimating the HTL from the CLS data was about 4.1–4.3 dB using the BTX method. Atherley and Dingwall-Fordyce (1963) measured the test–retest accuracy of audiometric thresholds, resulting in a mean standard deviation of 3.6 dB. Thus the repeatability of HTL estimation using the CLS measurement is slightly worse but in the same range. Today, the HTL is still the most relevant parameter for hearing-aid fitting and is used by prescriptive rules like NAL-NL2 (Keidser et al., 2008) or DSL (Polonenko et al., 2010). Also in hearing diagnosis and in clustering HI listeners, the HTL serves as a fundamental criterion. Hence, the loudness function should relate to the subject's HTL. Following the definition of loudness as the perceptual strength of a sound that ranges from very soft to very loud (Scharf, 2007), the HTL can be used as the starting point of the loudness function. Here, 2.5 CU was used as this starting point.

4.2. Supra-threshold parameters

The intersession, intra-subject standard deviation for the fitted loudness function using the ACALOS method for 2.5, 25, and 50 CU was improved to 3–6 dB using the suggested modifications. Elberling (1999) reported higher values of 8.0–10.5 dB in different CLS studies. Values of 7.6–10.5 dB for the intersubject standard deviation for the UCL were mentioned by Keller (2006). Bentler and Cooley (2001) showed only a small correlation between the UCL and the pure-tone audiometric threshold (r² = 0.006) for thresholds between 20 and 60 dB HL. The correlation of the audiometric thresholds and the UCL in the current data (fitting method BTUX) was also small (r² = 0.002). These findings are consistent with the data of Pascoe (1988), who also showed no correlation for hearing losses up to 40 dB HL. Therefore, a precise estimate of the individual


UCL from the audiometric thresholds is not possible. The squared correlation between the audiometric thresholds and the MLL was r² = 0.45. Hence, 45% of the variance of the MLL was explained by the audiometric thresholds. Elberling (1999) mentioned that the slope of the loudness function can be predicted from the hearing loss in 70% of the cases. Al-Salim et al. (2010) also reported a squared correlation between r² = 0.7 and r² = 0.8 for different frequencies. Similar correlations between the audiometric threshold and m_low (r² = 0.75) were found here, meaning that 75% of the variance of the lower slopes can be explained by the audiometric threshold. Thus the mean UCL does not systematically change with hearing loss and the remaining dynamic range is reduced with increasing audiometric threshold. For more severe hearing losses (thresholds between 60 and 80 dB HL), the squared correlation between audiometric threshold and UCL was r² = 0.18 in the current data. This observation is similar to the value of r² = 0.11 mentioned by Bentler and Cooley (2001). Reasons for the positive correlation might be the increasing amount of inner hair cell loss and an increasing contribution of conductive hearing loss. Taken together, CLS can be used as a tool to measure the individual loudness perception and to categorise subjects by their HTL, MLL, and UCL.

4.3. Limited data in the upper loudness range

A lack of responses in the upper loudness domain can be a problem in clinical practice. If the upper loudness categories of the CLS procedure are not reached, a change in the response behaviour for loudness judgements was observed (Hohmann et al., 1997). Al-Salim et al. (2010) used pure tones and a maximum presentation level of 105 dB SPL using an in-the-ear calibration. About 30% of the listeners at 2 and 4 kHz did not use two or more categories above medium, so that the upper slope could not be determined.
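Whether a data set falls into this situation can be checked before fitting. The sketch below counts the responses between 35 and 50 CU and flags data sets with fewer than four of them, which is the condition under which the BTPX and BTUX methods estimate the upper part of the loudness function (see Fig. 4). The function name, its parameters, and the example responses are hypothetical.

```python
def has_sparse_upper_range(responses_cu, min_count=4, lower_cu=35, upper_cu=50):
    """True if fewer than min_count responses lie between lower_cu and upper_cu."""
    n_upper = sum(1 for r in responses_cu if lower_cu <= r <= upper_cu)
    return n_upper < min_count


# Hypothetical runs: the first never exceeds "loud" (35 CU), the second
# covers the upper loudness range with four responses.
sparse_run = [0, 5, 15, 25, 30, 35, 35]
covered_run = [20, 35, 40, 45, 50]
needs_ucl_estimate = has_sparse_upper_range(sparse_run)
fit_directly = not has_sparse_upper_range(covered_run)
```

A check of this kind only decides which branch of the fitting method applies; it says nothing about how the upper loudness function is then estimated.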
If enough responses in the upper loudness range were available in the current data, all fitting methods (BY, BX, BTX, BTPX, and BTUX) led to similar loudness functions. To demonstrate this, the CLS results were divided into “incomplete” and “complete” data sets. Complete data sets (81.7% of the data sets) had at least 4 responses at 35 CU or above, so no UCL estimation method was applied. All other results were regarded as incomplete data sets. The maximum range among the five loudness functions in x-direction for the categories between “very soft” and “too loud” was measured. For 68.2% of the complete data sets, the maximum range among the loudness functions was below 10 dB. For the incomplete data sets only 12.2% were within a range of 10 dB because of the undefined behaviour in the upper loudness range. The suggested BTPX and BTUX methods use additional assumptions if data are missing in the upper loudness domain. In computer simulations, the BTUX approach had a higher accuracy than the BTPX method. The BTPX method uses the mean UCL of data from Pascoe (1988). This results in a UCL of 100 dB HL for hearing losses up to 40 dB HL although the CLS procedure presents levels up to 105 dB HL. Therefore, the result had a negative bias in UCL estimation as shown in Fig. 8. For more severe hearing losses Keller (2006) showed that the mean UCL values of Pascoe lead to higher UCL values compared to their data. The reason might be found in the procedure used by Pascoe (1988) that shifts the UCL towards higher levels than initially chosen. The upper slope m_high of the loudness function has a high relative variability and cannot be measured with sufficient reliability across sessions, as observed in Table 2. The same result was mentioned by Al-Salim et al. (2010). Not surprisingly, no correlation was found by Jürgens et al. (2011) between m_high and the compression ratio of the estimated input–output function of the


basilar membrane. Also no correlation was found by Al-Salim et al. (2010) between m_high and the audiometric threshold. The average slope of 1.53 CU/dB for the BTUX fitting method was determined using narrow-band noises. This slope was frequency dependent and was about 1 CU/dB for 500 Hz and about 1.5 CU/dB for 2000 and 6000 Hz. The selected slope of 1.53 CU/dB thus was a reasonable value for a frequency-independent slope for the loudness function. Changes of this value (between 1.2 and 1.8 CU/dB) had only minor effects on the results of the UCL estimation. The median value was 2.9 dB higher when a slope of 1.2 CU/dB was used instead of 1.53 CU/dB and 1.6 dB lower when using 1.8 CU/dB. Interquartile ranges stayed almost unaffected and were between 6.9 and 7.1 dB. For broad-band stimuli, the slope of the upper part of the loudness function was shallower compared to narrow-band signals (Brand and Hohmann, 2001). In this case the value of 1.53 CU/dB might not be appropriate and a signal-dependent value has to be determined.

4.4. Recommended fitting methods

The fitting method BTUX provided the lowest intra-subject standard deviation and had the lowest values for bias and RMSE. The accuracy for estimating the UCL in computer simulations was higher for the BTUX method compared to the BTPX method. Therefore the BTUX model is recommended to analyse CLS data for diagnostic and hearing-aid fitting purposes. The use of the original (reference) fitting method BY for analysing ACALOS data cannot be generally recommended. Its results have to be interpreted with care because of the known problems (as described earlier) with this fitting method. For hearing-aid fitting and evaluation, the loudness function used in this work was so far only assessed for non-aided CLS measurements. Due to the nonlinear signal processing in hearing aids (e.g. dynamic compression with output limiting), the aided loudness function may result in different shapes compared to the loudness function used here.
The modifications suggested in this work might be adapted to fitting methods for other CLS procedures. But the applicability of HTL and UCL estimation for CLS procedures which have a pre-measurement phase is not clear and has to be evaluated for each CLS method separately.

5. Summary and conclusions

Four loudness-function estimation methods for categorical loudness scaling were compared using empirical and simulated data. The suggested fitting methods including a MATLAB version of the CLS procedure by Brand and Hohmann (2002) are freely available as part of the AFC toolbox (Ewert, 2013) under http://medi.uni-oldenburg.de/afc. The following conclusions can be drawn:

• If responses of the subject are consistent over the whole loudness range, all suggested fitting methods and the original procedure by Brand and Hohmann (2002) produce almost equal loudness functions. For general purposes, including diagnoses and hearing-aid fitting, the suggested BTUX fitting method is recommended.
• An improved performance of the modified fitting methods becomes obvious in CLS data with few responses in the lower loudness domain which spread over a large level range. A large spread of 50 dB or more, distributed over the categories between “very soft” and “medium”, occurred in 93% of the runs in normal-hearing listeners and in 26% of the runs in hearing-impaired listeners.
• The accuracy of the HTL estimation using the CLS procedure improves if data are analysed in terms of “heard” and “not heard”. Furthermore, using the estimated HTL from the CLS data as a fixed starting point of the loudness function improves test–retest repeatability.
• The performance was improved for CLS data sets with only few responses in the upper loudness domain using BTUX. In 18% of the runs in normal-hearing and in hearing-impaired listeners, fewer than four responses between “loud” and “too loud” were collected.
• Brand and Hohmann (2002) reported an intra-subject standard deviation of 4–5 dB for the ACALOS method. These values were only achieved in the data of this study when using the modified fitting models BTPX or BTUX.
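The intra-subject standard deviations referred to in the last conclusion are within-subject standard deviations in the sense of Eqs. (12) and (13). The sketch below shows both forms and that they agree for two repetitions per subject. It is a minimal illustration; the function names and the test–retest values are hypothetical.

```python
import math
from statistics import variance


def within_subject_sd(measurements):
    """Eq. (12): square root of the mean per-subject sample variance.

    measurements: one sequence of repeated values per subject, each with at
    least two repetitions.
    """
    variances = [variance(values) for values in measurements]
    return math.sqrt(sum(variances) / len(variances))


def within_subject_sd_pairs(pairs):
    """Eq. (13): shortcut for exactly two repetitions per subject."""
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs)))


# Hypothetical test-retest HTL estimates (dB HL) of three listeners.
pairs = [(7.5, 12.5), (55.0, 52.0), (30.0, 30.0)]
sw_general = within_subject_sd(pairs)
sw_pairs = within_subject_sd_pairs(pairs)
```

`statistics.variance` uses the divisor m − 1, which matches the inner term of Eq. (12); with m = 2 each per-subject variance equals half the squared difference, so both functions return the same value.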

Conflict of Interest None of the authors has a financial or other conflict of interest. Acknowledgements We are very grateful to Birger Kollmeier for his substantial support. We thank Ray Meddis for comments on earlier versions and discussions. This work was supported by the BMBF 13EZ1127D €rsysteme”) (“Modellbasierte Ho and the Deutsche Forschungsgemeinschaft (DFG FOR 1732 “Individualisierte € rakustik”, TPE). Ho References Allen, J.B., Hall, J.L., Jeng, P.S., 1990. Loudness growth in 1/2-octave bands (LGOB)ea procedure for the assessment of loudness. J. Acoust. Soc. Am. 88, 745e753. Al-Salim, S.C., Kopun, J.G., Neely, S.T., et al., 2010. Reliability of categorical loudness scaling and its relation to threshold. Ear Hear. 31, 567e578. American Speech-Language-Hearing Association, 2005. Guidelines for Manual Pure-tone Threshold Audiometry. ANSI S3.6, 2010. Specification for Audiometers. Anweiler, A.-K., Verhey, J.L., 2006. Spectral loudness summation for short and long signals as a function of level. J. Acoust. Soc. Am. 119, 2919. Atherley, G.R.C., Dingwall-Fordyce, I., 1963. The reliability of Repeated auditory threshold determination. Br. J. Ind. Med. 20, 231e235. Bentler, R.A., Cooley, L.J., 2001. An examination of several characteristics that affect the prediction of OSPL90 in hearing aids. Ear Hear. 22, 58e64. Bevington, P.R., Robinson, D.K., 2003. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York. Bland, J.M., Altman, D.G., 1996. Statistics notes: measurement error. Bmj 313, 744. Brand, T., 2000. Analysis and Optimization of Psychophysical Procedures in Audiology. Universit€ at Oldenburg, Germany. PhD thesis. Brand, T., 2007. Loudness scaling. In: 8th EFAS Congr. Congr. Ger. Soc. Audiol. Deutsche Gesellschaft für Audiologie e.V. CD-ROM, Heidelberg. Brand, T., Hohmann, V., 2001. Effect of hearing loss, centre frequency, and bandwidth on the shape of loudness functions in categorical loudness scaling. Int. J. Audiol. 40, 92e103. 
Brand, T., Hohmann, V., 2002. An adaptive procedure for categorical loudness scaling. J. Acoust. Soc. Am. 112, 1597–1604.
Burns, W., Hinchcliffe, R., 1957. Comparison of the auditory threshold as measured by individual pure tone and by Bekesy audiometry. J. Acoust. Soc. Am. 4, 1274–1277.
Buus, S., Müsch, H., Florentine, M., 1998. On loudness at threshold. J. Acoust. Soc. Am. 104, 399–410.
Cox, R.M., Alexander, G.C., Taylor, I.M., Gray, G.A., 1997. The contour test of loudness perception. Ear Hear. 18, 388–400.
Dreschler, W.A., Esch Van, T.E., Larsby, B., et al., 2008. Charactering the individual ear by the “Auditory Profile”. J. Acoust. Soc. Am. 123, 3714.
Elberling, C., 1999. Loudness scaling revisited. J. Am. Acad. Audiol. 10, 248–260.
Elberling, C., Nielsen, C., 1993. The dynamics of speech and the auditory dynamic range in sensorineural hearing impairment. In: Beilin, J., Jensen, G. (Eds.), Recent Dev. Hear. Instrum. Technol. Danavox Foundation Kolding, Denmark, pp. 99–131.
Ewert, S.D., 2013. AFC – a modular framework for running psychoacoustic experiments and computational perception models. In: Int. Conf. Acoust. AIA-DAGA, pp. 1326–1329.
Ewert, S.D., Grimm, G., 2012. Model-based hearing aid gain prescription rule. In: Dau, T., Jepsen, M.L., Cristensen-Dalsgaard, J., Poulsen, T. (Eds.), Proc. ISAAR 2011 Speech Percept. Audit. Disord. The Danavox Jubilee Foundation, Nyborg, pp. 393–400.
Fletcher, H., Munson, W.A., 1933. Loudness, its definition, measurement and calculation. J. Acoust. Soc. Am. 5, 82–108.
Fowler, E.P., 1950. The recruitment of loudness phenomenon. Laryngoscope 60, 680–695.
Gallégo, S., Garnier, S., Micheyl, C., et al., 1999. Loudness growth functions and EABR characteristics in Digisonic cochlear implantees. Acta Otolaryngol. 119, 234–238.
Heeren, W., Hohmann, V., Appell, J.E., Verhey, J.L., 2013. Relation between loudness in categorical units and loudness in phons and sones. J. Acoust. Soc. Am. 133, EL314–EL319.
Hellbrück, J., Moser, L.M., 1985. Hörgeräte-Audiometrie. Ein computergestütztes psychologisches Verfahren zur Hörgeräteanpassung. Psychol. Beiträge 27, 494–508.
Heller, O., 1985. Hörfeldaudiometrie mit dem Verfahren der Kategorienunterteilung (KU). Psychol. Beiträge 27, 478–493.
Herzke, T., Hohmann, V., 2005. Effects of instantaneous multiband dynamic compression on speech intelligibility. EURASIP J. Appl. Signal Process. 18, 3034–3043.
Hohmann, V., Kollmeier, B., 1995. Weiterentwicklung und klinischer Einsatz der Hörfeldskalierung. Audiol. Akust. 34, 48–59.
Hohmann, V., Kollmeier, B., Müller-Deile, J., 1997. Festlegung der Parameter. In: Kollmeier, B. (Ed.), Hörflächenskalierung - Grundlagen und Anwendung der Kateg. Lautheitsskalierung für Hördiagnostik und Hörgeräte-Versorgung. Median-Verlag, pp. 81–102.
HörTech, 2010. Operation Manual Categorical Loudness Scaling (Rev. 1.3.b).
HörTech, 2013. Bedienungsanleitung “Kategoriale Lautheitsskalierung (Version 1.6)”.
Humes, L.E., Pavlovic, C., Bray, V., Barr, M., 1996. Real-ear measurement of hearing threshold and loudness. Trends Amplif. 1, 121–135.
ISO 16832, 2006. Acoustics – Loudness Scaling by Means of Categories.
Jenstad, L.M., Cornelisse, L.E., Seewald, R.C., 1997. Effects of test procedure on individual loudness functions. Ear Hear. 18, 401–408.
Jesteadt, W., Joshi, S.N., 2013. Reliability of procedures used for scaling loudness. In: Proc. Meet. Acoust. Acoustical Society of America, pp. 050023–050023.
Jürgens, T., Kollmeier, B., Brand, T., Ewert, S.D., 2011. Assessment of auditory nonlinearity for listeners with different hearing losses using temporal masking and categorical loudness scaling. Hear Res. 280, 177–191.
Keidser, G., Seymour, J., Dillon, H., et al., 1999. An efficient, adaptive method of measuring loudness growth functions. Scand. Audiol. 28, 3–14.
Keidser, G., Dillon, H.R., Flax, M.R., et al., 2008. The NAL-NL2 prescription procedure. Audiol. Res. 1, 88–90.
Keller, J.N., 2006. Loudness Discomfort Levels: a Retrospective Study Comparing Data from Pascoe (1988) and Washington University School of Medicine. Washington University School of Medicine. PhD thesis.
Kiessling, J., 2001. Hearing aid fitting procedures – state-of-the-art and current issues. Scand. Audiol. 30, 57–59.
Kiessling, J., Schubert, M., Archut, A., 1996. Adaptive fitting of hearing instruments by category loudness scaling (ScalAdapt). Int. J. Audiol. 25, 153–160.
Kollmeier, B., 1997. Hörflächenskalierung - Grundlagen und Anwendung der kategorialen Lautheitsskalierung für Hördiagnostik und Hörgeräte-Versorgung, 256 pp.
Kreikemeier, S., 2011. Verfahren zur lautheitsbasierten Anpassung von Hörgeräten mit instantanem Insitu-Perzentil-Monitoring. Universität Gießen. PhD thesis.
Launer, S., 1995. Loudness Perception in Listeners with Sensorineural Hearing Impairment. Universität Oldenburg. PhD thesis.
Launer, S., Hohmann, V., Kollmeier, B., 1994. Experimente und Modellvorstellungen zur Lautheitsskalierung bei Schwerhörenden. DAGA 1994-Fortschritte der Akust. Bad Honnef, pp. 1409–1412.
Launer, S., Holube, I., Hohmann, V., Kollmeier, B., 1996. Categorical loudness scaling in hearing-impaired listeners: can loudness growth be predicted from the audiogram? Audiol. Akust. 4, 156–163.
Lecluyse, W., Meddis, R., 2009. A simple single-interval adaptive procedure for estimating thresholds in normal and impaired listeners. J. Acoust. Soc. Am. 126, 2570–2579.
Pascoe, D.P., 1988. Clinical measurements of the auditory dynamic range and their relation to formulas for hearing aid gain. In: Jensen, J. (Ed.), Hear. Aid Fitting Theor. Pract. Views. Proc. 13th Danavox Symp. Stougaard Jensen, Copenhagen, pp. 129–152.
Polonenko, M.J., Scollie, S.D., Moodie, S., et al., 2010. Fit to targets, preferred listening levels, and self-reported outcomes for the DSL v5.0a hearing aid prescription for adults. Int. J. Audiol. 49, 550–560.


Rasmussen, A.N., Olsen, S.O., Borgkvist, B.V., Nielsen, L.H., 1998. Long-term test-retest reliability of category loudness scaling in normal-hearing subjects using pure-tone stimuli. Scand. Audiol. 27, 161–167.
Rennies, J., Holube, I., Verhey, J.L., 2013. Loudness of speech and speech-like signals. Acta Acust. United Acust. 99, 268–282.
Ricketts, T.A., Bentler, R.A., 1996. The effect of test signal type and bandwidth on the categorical scaling of loudness. J. Acoust. Soc. Am. 99, 2281–2287.
Robinson, K., Gatehouse, S., 1996. Test-retest reliability of loudness scaling. Ear Hear. 17, 120–123.
Scharf, B., 2007. Loudness. Encycl. Acoust. 3, 1481–1495.
Smeds, K., Leijon, A., 2011. Loudness and hearing loss. In: Florentine, M., Fay, R.R., Popper, A.N. (Eds.), Loudness. Springer, New York, pp. 223–259.
Stenfelt, S., Zeitooni, M., 2013. Loudness functions with air and bone conduction stimulation in normal-hearing subjects using a categorical loudness scaling procedure. Hear Res. 301, 85–92.
Stevens, S.S., 1936. A scale for the measurement of a psychological magnitude: loudness. Psychol. Rev. 43, 405–416.
Stevens, S.S., 1959. On the validity of the loudness scale. J. Acoust. Soc. Am. 31, 995–1003.
Theelen-van den Hoek, F.L., Boymans, M., Stainsby, T., Dreschler, W.A., 2014. Reliability of categorical loudness scaling in the electrical domain. Int. J. Audiol. 53, 409–417.
Valente, M., Van Vliet, D., 1997. The independent hearing aid fitting forum (IHAFF) protocol. Trends Amplif. 2, 6–35.
Van Esch, T.E.M., Kollmeier, B., Vormann, M., et al., 2013. Evaluation of the preliminary auditory profile test battery in an international multi-centre study. Int. J. Audiol. 52, 305–321.

Glossary

ACALOS: adaptive categorical loudness scaling
ANSI: American National Standards Institute
B&K: Brüel & Kjaer
B: bias
bez: Bézier function
BTPX: fitting method including HTL estimation and UCL estimation using UCL data of Pascoe (1988)
BTUX: fitting method including HTL estimation and UCL estimation using a fixed value for mhigh
BTX: fitting method including HTL estimation
BX: fitting method that minimises the error in x-direction
BY: reference fitting method
CLS: categorical loudness scaling
CU: categorical units
dB: decibel
DR: dynamic range
DRhigh: dynamic range between MLL and UCL
DRlow: dynamic range between HTL and MLL
F(L): level (L) dependent loudness function
HI: hearing-impaired
HL: hearing level
HTL: hearing threshold level (at 2.5 CU on the loudness function)
ISO: International Organisation for Standardisation
Lcut: level at intersection of linear parts of the loudness function
Lx: level at x CU
MCL: most comfortable loudness level
mhigh: slope of the upper portion of the loudness function
MLL: median-loudness level (at 25 CU on the loudness function)
mlow: slope of the lower portion of the loudness function
NH: normal-hearing
OMA: Oldenburg Measurement Applications
PTA: pure-tone average
r: Pearson's correlation coefficient
Ri(Li): i-th response on loudness scale for stimuli with level Li
RMSE: root-mean-square error
rx: range between the minimum and maximum level for loudness category x
SPL: sound pressure level
sw: within-subject standard deviation
UCL: uncomfortable loudness level (at 50 CU on the loudness function)