A psychoacoustical study of wind buffeting noise



Applied Acoustics 95 (2015) 1–12


Applied Acoustics journal homepage: www.elsevier.com/locate/apacoust

Guillaume Lemaitre a,⇑, Christian Vartanian b, Christophe Lambourg a, Patrick Boussard a

a Genesis Acoustics, Domaine du Petit Arbois, 13045 Aix-en-Provence, France
b Soufflerie GIE S2A, 2 avenue Volta, 78180 Montigny-le-Bretonneux, France

Article info

Article history: Received 19 February 2014; received in revised form 19 December 2014; accepted 19 February 2015; available online 10 March 2015.

Keywords: Psychoacoustics; Sound quality; Wind buffeting; Aerodynamic noise; Fluctuation strength

Abstract

Aerodynamic noise resulting from a vehicle moving through air at high speed is one of the most important sources of noise perceived by the passengers. It consists of a stationary broadband signal and of modulated and fluctuating components, particularly emphasized by gusts of wind and turbulence generated by the interaction with nearby vehicles. The article reports on a study of the perception of this latter phenomenon, wind buffeting, which is potentially deleterious to the sound quality of a vehicle. Binaural recordings of nineteen cars were made in a wind tunnel with a specific module designed to simulate mild or severe buffeting. Naive and expert participants first rated the unpleasantness of the recordings played at their real levels. There were large differences of loudness between the sounds, resulting mostly from the car designs, and loudness was the main factor contributing to the unpleasantness of the sounds. Participants then rated the unpleasantness of the recordings equalized to the same loudness. In that case, unpleasantness was mostly influenced by the buffeting module and related to fluctuation strength, a psychoacoustical descriptor of perceived loudness modulations. We propose an indicator of the unpleasantness of wind buffeting based on fluctuation strength in several frequency bands as well as other descriptors of the spectral balance of the sounds.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

A variety of sources contribute to the sounds emitted by a car: engine and exhaust, contact of tires and road, aerodynamic flow as the vehicle moves through air, vibrations of light elements in the cabin (steering wheel, seats, dashboard, etc.), alarms and warning signals emitted inside and outside the vehicle, air conditioning systems, etc. These different sources contribute to different aspects of the overall sound quality of the vehicle, that is, to different components of a person’s appraisal of the vehicle based on its sounds.

Car noises impinge on three categories of persons: the driver and the passengers inside the car, the other road users outside the car, and the neighboring communities. Accordingly, interior and exterior vehicle noises have different effects and functions. On the one hand, exterior vehicle noise is mainly an issue regarding annoyance to the community [1–3], although sounds are also necessary to signal the presence of the vehicle to road users [4–6]. On the other hand, interior sounds contribute to the comfort or discomfort of the driver and the passengers and to the appraisal of the character of the car, and are a source of information for the driver.

Many aspects of car interior noise have been studied: the booming sound caused by the excitation of the passenger cavity by the engine noise [7], sounds of closing doors [8,9], light switches [10], starters [11], the roughness of the engine noise [12,13] and the identity of a car [14], the influence of the exhaust system on the identity of the car [15], the sound of anti-lock braking systems [15], the influence of direct and indirect fuel injection on diesel engine noise [16], the influence of various events on the appraisal of long sequences [17,18], etc. (see [19] for a review). For instance, Cerrato noted that road-tire and aerodynamic noises contribute to the pleasantness and comfort of the interior noise [19], and Bodden et al. suggested that high-frequency aerodynamic noise may be used to balance aggressive low-frequency engine noise and increase the “elegance” of interior noise [20].

The study focused on this particular source of interior noise: aerodynamic noise caused by the vehicle moving at high speed through air, isolated from other sources of noise in a car. At speeds above 120 kph, aerodynamic noise becomes the most important source of noise in most vehicles. In France, the speed limit is 130 kph on highways, and the average measured speed is 117 kph.1 Drivers are thus often exposed to loud aerodynamic noise, and car manufacturers pay a great deal of attention to the design of the hood, the position of the pillars, the shape of the windshield and the side view mirrors to tailor the aerodynamic noise experienced by the driver and the passengers.

In modern vehicles, aerodynamic noise consists of a quasi-stationary broadband signal, fluctuations caused by aerodynamic turbulence, irregularities of the air flow, cavity resonances, and aspiration leaks, and low-frequency beating noise caused by open windows, sunroofs, and gusts of wind [21,19]. Aerodynamic noise is often experimentally studied by placing a test vehicle in a wind tunnel, which simulates the motion of the vehicle through steady air. But this fails to capture two other phenomena known to be deleterious to the comfort of the driver in real driving situations: gusts of wind, and turbulence generated by perturbations of the air flow resulting from the presence of other vehicles. For instance, Peric et al. showed that orienting the vehicle with different yaw angles in a wind tunnel resulted in turbulence that created audible fluctuations of the aerodynamic noise [22]. Such a fluctuating noise is called “wind buffeting”.

Wind buffeting contributes to the unpleasantness of the interior car sounds [20]. In particular, for sounds with the same loudness, sounds with fluctuations are rated as more unpleasant than sounds without fluctuations [23]. More generally, fluctuations of broadband noises are perceived as unpleasant in a variety of applications. Diesel engines at idle, for instance, make a typical sound whose slow amplitude modulations are perceived as very unpleasant [24–26]. Heating, ventilation, and air conditioning (HVAC) systems are another example of broadband signals whose slow modulations are perceived as unpleasant [27]. In a different context, the “swishing” character (i.e. abrupt and periodic modulations of amplitude) of wind turbines contributes to their unpleasantness [28].

⇑ Corresponding author. E-mail address: [email protected] (G. Lemaitre). http://dx.doi.org/10.1016/j.apacoust.2015.02.011 0003-682X/© 2015 Elsevier Ltd. All rights reserved.

1 http://www.securite-routiere.gouv.fr/la-securite-routiere/l-observatoire-national-interministeriel-de-la-securite-routiere/comportement-des-usagers/observationde-la-circulation, last retrieved on December 12, 2014.
Most product sound quality studies use a psychoacoustical framework: quality, preference, or unpleasantness is measured using various methods (questionnaires, magnitude estimation scales, semantic differentials, pair comparisons, multidimensional scaling, etc.; see [29] for a review). These judgements are then correlated with psychoacoustic descriptors (or indicators) using linear or multilinear correlations (see [30] for a review). Most of these studies use the psychoacoustical descriptors developed by Zwicker and Fastl [31,32], or develop their own metric. Almost all of these studies find that loudness correlates best with sound quality judgements (listeners prefer quieter sounds). Another common result is that quality judgements are negatively correlated with roughness (rough sounds are generally evaluated as unpleasant, see for instance [13,33]), sharpness or spectral gravity center2 (listeners tend to find sharp sounds unpleasant, e.g. [34]), tone-to-noise ratio and related metrics (prominence ratio, etc.; sounds with prominent tonal components tend to be judged as unpleasant, although some listeners prefer tonal sounds over noisy sounds and others prefer the opposite [35]), and fluctuation strength ([24,33,36,27]; see below for a discussion).

2 Sharpness describes the sensation associated with spectral balance; sharp sounds have much energy at high frequencies.

Regarding wind buffeting, car manufacturers generally evaluate aerodynamic noise with intensity- or loudness-based indicators (A-weighted sound pressure level, ISO 532B model of loudness [31], etc.). For instance, Otto and Feng showed that annoyance caused by steady aerodynamic noise was very well correlated with the loudness of the sounds [37]. By definition, such indicators are related to stationary features of the sounds, and cannot capture the potential influence of fluctuations. Blommer et al. developed an indicator for measuring the influence of buffeting and gusting noises based on the loudness of detected impulses in the noise [21]. Such an indicator has two drawbacks: detecting impulsive events in noise is far from trivial, and the standard loudness model (ISO 532B) is not relevant for short impulsive sounds [31]. Hoshino and Kato used a different strategy: they developed an indicator based on the loudness of the part of the sound coming from the direction of the driver window [38]. They found this indicator well correlated with judgments of the “loudness of the wind noise”, but it is uncertain whether it actually captures the influence of the fluctuations.

In fact, the perception of modulated tones and broadband noises has been experimentally investigated in psychoacoustical studies (see [39–41] for experimental studies published in English and [42] for a summary). Rapid modulations (greater than 20 Hz) result in sounds perceived as rough [43–45]. For slower modulations, listeners perceive fluctuations of loudness. The sensation of fluctuating loudness is called fluctuation strength. Fluctuation strength is maximal for modulation frequencies of about 4 Hz, and increases with loudness and modulation depth. Zwicker and Fastl have proposed a unit for fluctuation strength: the vacil [42]. A 1000 Hz tone 100% modulated in amplitude by a 4 Hz sine wave has a fluctuation strength of one vacil. Zwicker and Fastl have also proposed a model to predict fluctuation strength from a signal’s properties. It is based on the modulation index of the envelope of the signal, taking into account masking phenomena, and bandpass filtered around 4 Hz. Such a model successfully predicts the perceived fluctuation strength of modulated tones, but is ineffective for broadband noises. By definition, random noises have random fluctuations that Zwicker and Fastl’s model inappropriately considers as contributing to the fluctuation strength, whereas these sounds are in fact perceived as stationary. A more complex model was devised by Sontacchi [46]; it is a transposition of a similar model developed for the roughness of broadband signals by Aures and by Daniel and Weber [44,47–49].
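As an illustration of the reference signal that defines one vacil, the following Python sketch synthesizes a 1000 Hz tone fully amplitude-modulated by a 4 Hz sine wave and recovers the modulation depth from the extrema of its envelope. The sampling rate, the duration, and the function names are assumptions made for this example. The sketch only builds the reference signal and does not implement Zwicker and Fastl's model of fluctuation strength.

```python
import math


def am_tone(carrier_hz=1000.0, mod_hz=4.0, depth=1.0, fs=48000, duration_s=1.0):
    """Return (envelope, signal) for a sinusoidally amplitude-modulated tone.

    All default values are illustrative assumptions, except the 1000 Hz
    carrier, the 4 Hz modulation and the 100% depth quoted in the text.
    """
    n = int(fs * duration_s)
    envelope = [1.0 + depth * math.sin(2.0 * math.pi * mod_hz * i / fs)
                for i in range(n)]
    signal = [env * math.sin(2.0 * math.pi * carrier_hz * i / fs)
              for i, env in enumerate(envelope)]
    return envelope, signal


def modulation_depth(envelope):
    """Modulation depth m = (Emax - Emin) / (Emax + Emin) of an envelope."""
    e_max, e_min = max(envelope), min(envelope)
    return (e_max - e_min) / (e_max + e_min)


envelope, signal = am_tone()
m = modulation_depth(envelope)  # close to 1.0, i.e. 100% modulation
```

Lowering `depth` in `am_tone` gives a quick way to generate tones that, according to the text, should be perceived with a smaller fluctuation strength than the reference.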
In this model the input signal is first filtered to account for the effect of the middle ear and passed through a set of band-pass filters modeling the auditory filtering occurring in the cochlea. A modulation index is computed from the envelope at the output of each auditory channel. The contributions of the modulation indices in each auditory channel are weighted by a coefficient taking into account the correlations between adjacent bands and finally integrated into a single number for each time frame. This latter step is crucial: in a random noise, the envelopes in the different auditory channels are uncorrelated and do not contribute to the global fluctuation strength. For a broadband signal modulated in amplitude, each auditory channel is similarly modulated and contributes maximally to the global perception of fluctuation.

The goal of the study was to investigate the perception of wind buffeting noise and design an indicator of the unpleasantness of these sounds. The overarching principle of the study was to have subjects rate the unpleasantness of the aerodynamic noises of different cars recorded in different buffeting conditions, and to relate acoustic features to the unpleasantness judgements using multilinear regressions and bootstrap. We analyzed the aerodynamic sounds by computing a large set of acoustic features, including the features found in studies of product sound quality (loudness, sharpness, roughness, tone-to-noise ratio, etc.). In particular, we included the fluctuation strength calculated according to the method described in the previous paragraph. Multilinear regression and bootstrap then revealed which features contributed significantly to the unpleasantness judgements.

The study used a set of different cars recorded in a wind tunnel, under three conditions: in a steady flow of air, and with a “buffeting” module placed at two positions. The module was designed to simulate buffeting generated by another vehicle running in front of the test vehicle.
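The role of the cross-channel weighting described above can be illustrated with a toy Python sketch. Two channel envelopes that share a common 4 Hz modulation are strongly correlated, whereas two envelopes with independent random fluctuations are nearly uncorrelated, so a weight derived from the correlation suppresses the latter. The envelope generators, the envelope sampling rate, and the weighting rule (negative correlations clipped to zero) are assumptions chosen for illustration; this is not Sontacchi's model [46].

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def modulated_envelope(n, fs, mod_hz, depth):
    """Envelope of a channel carrying a coherent sinusoidal modulation."""
    return [1.0 + depth * math.sin(2.0 * math.pi * mod_hz * i / fs)
            for i in range(n)]


def random_envelope(n, rng):
    """Envelope of a channel with independent random fluctuations (toy model)."""
    return [1.0 + rng.uniform(-0.5, 0.5) for _ in range(n)]


def correlation_weight(env_a, env_b):
    """Hypothetical weight: correlation of adjacent channels, clipped at zero."""
    return max(0.0, pearson(env_a, env_b))


rng = random.Random(0)
fs, n = 100, 400  # envelopes sampled at 100 Hz for 4 s (assumed values)

# Two adjacent channels of an amplitude-modulated broadband signal.
w_buffeting = correlation_weight(modulated_envelope(n, fs, 4.0, 0.8),
                                 modulated_envelope(n, fs, 4.0, 0.6))
# Two adjacent channels of a stationary random noise.
w_noise = correlation_weight(random_envelope(n, rng), random_envelope(n, rng))
```

In this sketch `w_buffeting` is essentially 1 while `w_noise` stays close to 0, which mirrors the argument that uncorrelated channel envelopes should not add to the global fluctuation strength.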
We conducted an experiment in which participants rated the unpleasantness of a set of sounds. “Pleasantness” describes the hedonic value of auditory sensations and is therefore directly related to the sound’s properties (in contrast, “annoyance” for instance is a broader concept that also relates to non-acoustical factors, attitudes of participants towards the sources of noise, effects of noise on participants’ activities, etc. [30]). Subjects rated “unpleasantness” rather than pleasantness because the sounds were actually unpleasant (see [50] for a similar approach).

The first part of the experiment played sounds at their real loudness levels. However, we hypothesized that such a setting would not optimally assess the influence of amplitude fluctuations. First, most studies find that judgments of unpleasantness (or preference) are massively driven by loudness differences when these differences are large (which was the case for our recordings). Second, loudness and fluctuation strength are quasi-confounded variables, since fluctuation strength increases with the loudness of tones and broadband noises [39,40,42]. It is therefore very difficult to disentangle the contributions of loudness and fluctuation strength to the perception of the annoyance caused by sounds with different loudness values. Thus, in the second part of the experiment, participants judged the unpleasantness of the same sounds, equalized in loudness. In fact, equalizing all sounds to the same loudness is a method that makes it possible to study, in a laboratory setting, more subtle aspects of sound quality that are otherwise masked by large loudness differences [35,51].

Eventually, the goal of the study was to provide car manufacturers with an indicator capable of comparing the influence of car designs on the fluctuations perceived inside the vehicle.

2. Experimental study

2.1. Recordings

Nineteen cars ranging from A-segment small cars to D-segment large cars and including M-segment multi-purpose cars3 were recorded in a full-scale 3/4 open wind tunnel (nozzle section 24 m²). A rotating belt between the wheels and a boundary layer suction system simulated the moving ground to approach the actual conditions of the vehicle on the road. The testing room was semi-anechoic (cut-off frequency 125 Hz) and the air duct was acoustically treated to reduce the noise level of the air flow. Vehicles were positioned in the axis of the air flow, with a simulated speed of 120 kph, corresponding to the average speed on French highways. Recordings were made with the engine off, the wheels static and all windows closed.

Each car was recorded three times (totaling 57 stimuli), each recording corresponding to a position of a “buffeting module”.4 The buffeting module consists of a piece of equipment simulating a car driving in front of the test vehicle. The module can be adjusted laterally, so as to vary the amount of buffeting. There were three positions of the module. In the parking position the module is completely removed from the air flow and does not generate any buffeting. In the center position, the module is positioned just in front of the test vehicle, simulating a vehicle driving in the same lane as the test vehicle. In the reference position the module is at a position that was empirically determined to create the maximum amount of buffeting, corresponding for instance to another vehicle overtaking the test vehicle. This situation corresponds to a worst-case scenario. Fig. 2 represents a vehicle in the wind tunnel with the buffeting module at the reference position.

The stimuli were binaurally recorded with an artificial head and torso (HEAD Acoustics GmbH, Herzogenrath-Kohlscheid, Germany) at the driver’s position, with a 48 kHz sampling rate and 32-bit resolution. Stimuli were edited to 10 s sequences and digitally filtered to compensate for the direction-independent frequency response of the head and torso and for the frequency response of the headphones used in the experiment. The loudness (ISO 532B) of the stimuli varied from 78 to 91 phons (61–74 dB(A)). In the second part of the experiment the amplitudes of the sounds were modified so as to set the loudness of each sound to 81 phons. Fig. 1 represents the spectrograms of three recordings of the same car in the parking, center and reference positions. Amplitude modulations are clearly visible in the reference position.

3 European market segments.

4 http://www.soufflerie2a.com/en/simulation-de-suivi-de-vehicule-2/, last retrieved on December 8, 2014.

2.2. Method

2.2.1. Stimuli and apparatus

The experiment used the 57 previously described recorded sounds. The sound stimuli were played through an RME Fireface 400 audio interface and Sennheiser HD650 headphones in a quiet room. In addition, the left channel of the recordings was played through a Cabasse TSA 100 W amplified subwoofer. The crossover frequency between headphones and subwoofer was set to 60 Hz (stimuli played over headphones were high-pass filtered and the subwoofer had a low-pass filter set to 60 Hz). The subwoofer was placed at the participant’s feet (about 1 m away from the participant’s ears to minimize propagation delay), and its precise position was empirically determined to ensure a flat frequency response at the participant’s ears. Stimulus presentation and response collection were programmed on a Dell XPS laptop running Windows 7 with Matlab 7.13.0.664 and the Psychophysics Toolbox extensions [52].

2.2.2. Participants

Forty-eight persons volunteered as participants in two groups. Thirty French-speaking lay participants (15 male and 15 female), between 20 and 60 years of age (median 40 years old), formed Group 1 and were paid 40 Euros in coupons. They were screened with questionnaires. The requirements to participate were to own a car and to have minimal audio expertise. Group 1 was selected to roughly match the demographics of car owners in France.
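To put the loudness levels reported in Section 2.1 on a ratio scale, a loudness level in phons can be converted to a loudness in sones with the standard relation N = 2^((L_N − 40)/10), which holds for levels of 40 phons and above. The Python sketch below applies it to the values quoted for the stimuli. The function name is an assumption, and the computation of the loudness level itself (ISO 532B) is not shown.

```python
def phons_to_sones(loudness_level_phons):
    """Convert a loudness level (phons) to loudness (sones).

    Uses N = 2 ** ((L_N - 40) / 10), which is only valid for levels of
    40 phons and above; lower levels are rejected in this sketch.
    """
    if loudness_level_phons < 40.0:
        raise ValueError("relation only valid at or above 40 phons")
    return 2.0 ** ((loudness_level_phons - 40.0) / 10.0)


quietest = phons_to_sones(78)   # quietest recording reported in the text
loudest = phons_to_sones(91)    # loudest recording reported in the text
equalized = phons_to_sones(81)  # level used for the equalized stimuli
ratio = loudest / quietest      # equals 2 ** 1.3
```

Under this relation the loudest recording is roughly two and a half times as loud as the quietest one, which gives a sense of how large the loudness differences between the unequalized stimuli were.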
Eighteen French-speaking expert participants (17 male and one female), between 25 and 57 years of age (median 45 years old), formed Group 2. Participants of Group 2 were engineers from the wind tunnel facility or car manufacturers.

2.2.3. Procedure

The experimental session was divided into two parts. In the first part participants judged the unpleasantness of the 57 sounds played at their recorded level. In the second part they judged the 57 sounds equalized to the same loudness (81 phons). The design of the experiment took inspiration from the MUSHRA procedure (MUltiple Stimuli with Hidden Reference and Anchor) developed by the International Telecommunication Union for the assessment of audio codecs [53].

The experiment consisted of 11 steps. Each step consisted of a graphical interface with 11 sliders corresponding to 11 sounds. Participants listened to each sound by pressing a button at the top of each slider. Participants judged each sound with the different sliders on a continuous scale ranging from “the least unpleasant” to “the most unpleasant”. Participants could play each sound as many times as they wished, switch between sounds, or stop the playback whenever they wished. Unbeknown to the subjects, two sounds were repeated at every step. These sounds had been selected in pilot studies as being the least unpleasant (positive reference) and the most unpleasant (negative reference) sounds. The interface was locked until the participant had listened to every sound at least once and had positioned at least one slider at the minimal value and one slider at the maximal value of the unpleasantness scale.

Such a procedure has several advantages. First, participants usually find comparisons easier than absolute judgments. Second, each sound is systematically compared to the worst and the best sounds, which provides participants with stable references for the comparisons. Third, it constrains the subjects to use the full scale, and no normalization of the results is thus needed. The result of such a procedure is an interval scale whose absolute value is determined by the reference sounds used in the experiment. The main drawback is that the results of experiments made with different references cannot be compared.

Participants made a total of 121 judgments in each part. The two reference sounds were systematically presented at each step. In addition, each of the reference sounds was presented an additional three times, in steps that were randomly selected throughout the experiment (30 presentations in total). Each of the 19 cars in the parking and reference positions of the buffeting module was presented twice (72 presentations in total). The 19 cars in the center position were presented once throughout the experiment. The order of the sounds was randomized for each subject. This presentation scheme ensured that the experiment was not too long and that the number of sounds to be compared at each step was manageable by the participants (these numbers had been determined in pilot studies). Repetitions of the recordings made at the parking and reference positions (test–retest) made it possible to check the participants’ consistency throughout the experiment. Participants received written instructions and a training session before the two parts of the experiment.

The negative reference for the sounds at real loudness corresponded to a supermini car (market segment B) manufactured in France in the 1990s (labeled B2 in the following), with the buffeting module in the reference position. The positive reference for the sounds at real loudness corresponded to a large family car (market segment D) produced in France from the 2000s on (labeled D3 in the following), with the buffeting module in the parking position. The negative reference for the sounds with equalized loudness corresponded to a supermini car (market segment B) produced in France throughout the 2000s (labeled B1 in the following), with the buffeting module in the reference position. The positive reference for the sounds with equalized loudness corresponded to a large family car (market segment D) manufactured in France in the first part of the 2000s (labeled D6 in the following), with the buffeting module in the parking position. The results of the two parts of the experiment (real and equalized loudness) are analyzed separately, because they used different references to anchor the judgments and thus cannot be directly compared.

Fig. 1. Spectrograms of three recordings of the same car, with the module in the parking (top), center (middle), and reference (bottom) positions. Only three seconds of sound are represented, in the 10–10000 Hz range, to emphasize amplitude modulations.

3. Results: sounds with different loudness values

Participants took 36 min on average to perform the experiment (from 10 to 97 min). Ratings of unpleasantness were coded between zero (minimal unpleasantness) and one (maximal unpleasantness). The results of one participant were excluded from the analyses because of a failure of the interface.

Fig. 2. Schematic representation of a vehicle in the wind tunnel with the buffeting module at the reference position.


3.1. Individual differences and outliers

There were three reasons to suspect that participants may have used different strategies to respond. First, there were large discrepancies between the times spent on the experiment, and we conjectured that some participants may not have devoted enough attention to the task. Second, the two groups of participants had different levels of expertise and familiarity with the sounds, and we have shown that listeners with different sound expertise may engage in different listening strategies [54]. Finally, informal preliminary tests and interviews had suggested that listeners may be sensitive to modulations of different frequency bands of the sound spectra.

Three methods were used to identify potential outliers and groups of participants: consistency of the ratings between tests and retests (within-participant consistency), absolute ratings of the two reference sounds, and correlations and distances between the participants (between-participant consistency). None of these methods provided us with a clear-cut rule to exclude participants. Rather, we combined them to single out potential outliers, and the decision to exclude some participants resulted from individually examining the data of these potential outliers.

3.1.1. Within-participant consistency

Each participant evaluated 38 sounds at least twice (test–retest). Differences between tests and retests ranged from −0.12 to 0.09 (i.e. approximately ±10% of full scale), with an average of 0.03. The distributions of differences between test and retest were submitted to a series of two-tailed t-tests. The average test–retest difference was significantly different from zero for 13 subjects at an α level of .05 (see footnote 5). These participants were selected for further inquiry.

3.1.2. Reference sounds

All participants rated the negative reference sound as the most unpleasant sound: ratings ranged from 0.97 to 1.00 with an average of 0.99.
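The within-participant check of Section 3.1.1 amounts to a one-sample, two-tailed t-test of a participant's test–retest differences against zero. The Python sketch below computes the t statistic with the standard library and compares it with a tabulated critical value. The two sets of differences are made-up examples, not data from the study, and the hard-coded critical value (about 2.026 for 37 degrees of freedom at α = .05) is an assumption that only applies to 38 differences.

```python
import math
import statistics

# Two-tailed critical value of Student's t for df = 37 and alpha = .05
# (tabulated value, hard-coded here as an assumption of the sketch).
T_CRIT_DF37 = 2.026


def one_sample_t(differences):
    """t statistic of the mean of `differences` tested against zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation
    return mean / (sd / math.sqrt(n))


# Hypothetical participant whose 38 test-retest differences average zero.
consistent = [0.04, -0.04] * 19
# Hypothetical participant whose retest ratings drift upward by 0.03.
drifting = [d + 0.03 for d in consistent]

t_consistent = one_sample_t(consistent)
t_drifting = one_sample_t(drifting)

flag_consistent = abs(t_consistent) > T_CRIT_DF37  # not flagged
flag_drifting = abs(t_drifting) > T_CRIT_DF37      # flagged for inquiry
```

A flagged participant is only a candidate for further inquiry, in line with the text: the test is one of several screening criteria and does not by itself justify exclusion.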
The ratings of the positive reference were more diverse across subjects, and this sound was not systematically judged as the most pleasant sound. They ranged from 0.05 to 0.5 with an average of 0.25. The average rating of the positive reference exceeded one third of full scale (this value was arbitrarily chosen) for 13 participants. They were selected for further inquiry.

3.1.3. Between-participant consistency: correlations

The correlation between each pair of participants across the 57 ratings was considered as a measure of the similarity of the participants’ ratings of the 57 sounds. There were no negative correlations, which indicates that no subject had misunderstood the meaning of the scales. We examined the similarities of the participants’ ratings with a hierarchical tree representation of the correlations (UPGMA). There was no clear clustering of different groups of subjects. In particular, lay and expert participants did not form two separate clusters, which suggests that the two groups used similar listening strategies. Eight participants had a lower correlation with all the other subjects and were selected for further inquiry.

3.1.4. Between-participant consistency: distances

Finally, we checked the consistency between participants by considering the Euclidean distance between the 57 ratings of each subject and the median ratings across the participants. If participants had similarly rated the sounds, their ratings should be equivalently distant from the average judgement and the distribution of distances should be unimodal and compact. The distribution of the distances between the participants appeared clearly unimodal, but nine participants were more distant from the average and were selected for further inquiry.

5 Note that we did not use any correction for multiple comparisons here. Since the goal was to exclude outliers, correcting for multiple comparisons would in fact be less conservative.

3.1.5. Excluding outliers

Overall, these different tests did not reveal any systematic clustering of participants, but 12 participants were singled out by at least two tests. Examining the data for these participants highlighted two phenomena. First, eight subjects overemphasized their ratings: they systematically rated sounds that were negatively rated more negatively than the average of the other participants, and rated sounds that were positively rated more positively than the average of the other participants. This difference of strategy is not really a problem because it preserves the order of the ratings between sounds. Subsequent analyses did not exclude these eight participants. Second, four participants had produced a few ratings that were inverted when compared to the average ratings of the other participants: for instance, they negatively rated a sound that was positively rated by the other participants. The sounds with inverted ratings were different across these four participants. Since we could not interpret these differences of ratings, these four participants were excluded from subsequent analyses.

3.2. Influence of experimental factors

Ratings for the 43 selected participants were averaged across repetitions and submitted to a repeated-measures analysis of variance (ANOVA) with the 19 cars and the three positions of the buffeting module as the within-subject factors and the two groups of participants as the between-subject factor. Here and in the following, all statistics are reported after correcting the degrees of freedom to account for violations of the hypothesis of sphericity of the data (Geisser–Greenhouse correction).
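The Geisser–Greenhouse correction multiplies both degrees of freedom of a within-subject F test by an estimate ε of the departure from sphericity, with 1/(k − 1) ≤ ε ≤ 1 for k conditions. The Python sketch below computes ε from a covariance matrix of the k conditions, using the usual formula based on the double-centered covariance matrix. The function names and the example matrices are assumptions for illustration; the covariance matrices of the study are not available here, so the ε value applied to the degrees of freedom at the end is purely hypothetical.

```python
def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix.

    epsilon = trace(S)^2 / ((k - 1) * sum of squared entries of S),
    where S is the double-centered covariance matrix.
    """
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand_mean = sum(row_means) / k
    centered = [[cov[i][j] - row_means[i] - col_means[j] + grand_mean
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(df_effect, df_error, epsilon):
    """Apply the correction to both degrees of freedom of an F test."""
    return epsilon * df_effect, epsilon * df_error


# Sphericity holds exactly: equal variances, zero covariances.
eps_spherical = gg_epsilon([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]])

# Worst case for k = 3: all the variance lies along a single contrast.
eps_minimum = gg_epsilon([[1.0, 0.0, -1.0],
                          [0.0, 0.0, 0.0],
                          [-1.0, 0.0, 1.0]])

# Hypothetical epsilon applied to the degrees of freedom of a
# three-level factor tested with 2 and 82 degrees of freedom.
df_effect, df_error = corrected_df(2, 82, eps_minimum)
```

The two example matrices reach the upper bound (ε = 1) and the lower bound (ε = 1/(k − 1) = 0.5) for three conditions, which brackets the range of corrections that can occur for the module factor.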
The main effect of the cars was significant (F(18,738) = 208.343, p < .01) and the most important in size (g2 = 52.7%). The top panel of Fig. 3 represents the ratings averaged across subjects for the 19 cars and the three positions of the module. It shows that the different cars were rated with large differences of unpleasantness. Car B2 (a supermini manufactured in France in the 1990s) was used as the negative reference (with the buffeting module in the reference position) and was rated as the most unpleasant car, whatever the position of the module (average rating 0.99). Car D3 (a large family car produced in France from the 2000s on) that was used a the positive reference (with the buffeting module in the parking position) was rated on the positive end of the scale (average rating 0.24), but was not the most positively-rated car. Car D5 (a large family car produced in France in the first half of the 2000s) with the module in the center position was rated the most positively (average rating 0.02). The main effect of the position of the module was also significant (F(2,82) = 121.887, p < .01, g2 = 9.0%). As expected, a contrast analysis showed that cars recorded with the module in the parking position (average rating 0.46) were significantly less unpleasant than in the reference position (average rating 0.55, F(1,41) = 39.645, p < .01). More surprisingly, cars with the module in the center position were also significantly less unpleasant than in the parking position (average rating 0.33, F(1,41) = 70.610, p < .01). There was a small but significant interaction between the cars and the position of the module (F(36,1476) = 9.053 p < .01, g2 = 3.2%). The top panel of Fig. 3 shows that the effect of the module on unpleasantness slightly depended on the cars. Whereas the difference between the parking, center, and reference positions was approximately the same for most cars, there was for instance

Fig. 3. Ratings of unpleasantness for the sounds played at their real loudness. The top panel represents the ratings averaged across subjects for the 19 cars and the three positions of the buffeting module (parking, center, reference). The first letter in the reference of the cars corresponds to the market segment. The letters between parentheses correspond to the nationality of the manufacturer. Cars were sorted according to the average ratings for the “parking” position. Filled symbols represent the two references (negative and positive) used throughout the experiment. The bottom panel represents the ratings averaged across subjects for the two groups of subjects (naive and expert participants) and the three positions of the module. Vertical bars represent the 95% confidence interval for the averages.

almost no difference between the three positions of the module for car B2 and a much larger difference for car D5. The ratings were not significantly different between the two groups of participants (F(1,41) = 0.028, p = .869) but there was a significant interaction between the groups of participants and the module (F(2,82) = 11.415, p < .01). The bottom panel of Fig. 3 shows that lay participants rated the parking position as slightly less unpleasant than the expert participants did. This effect was however much smaller than the other effects (η² = 0.8%) and will not be further discussed. Similarly, the three-way interaction was significant (F(36,1476) = 1.884, p < .05) but very small in size (η² = 0.7%).

3.3. Correlation with acoustic features

Regression analyses searched for acoustic features correlated with the ratings of unpleasantness. Psychoacoustical descriptors developed to account for elementary auditory sensations were the primary candidates. In particular, pilot tests and comments suggested that loudness had a strong influence on the unpleasantness judgments, and we expected Zwicker’s and Moore’s models of loudness of stationary sounds (ISO 532B and ANSI S3.4-2007 [31,55]) to account for a good proportion of the unpleasantness ratings. We also expected descriptors of roughness and fluctuation strength to play an important role, because the effect of the buffeting module was precisely to generate fluctuations of the signals. The 57 sounds consisted of broadband noises with no audible tonal components. Besides loudness and fluctuation strength, we finally hypothesized that the unpleasantness ratings may be based on the overall balance of energy in the spectrum, and that the ratings could also be correlated with descriptors of sharpness. Three types of software calculated a large number of acoustic features for the 57 sounds, including models of loudness, sharpness, roughness, and fluctuation strength:

• Commercial pieces of software calculating usual psychoacoustical descriptors [42] (e.g. Genesis’s LEA (footnote 6) and Genesis’s loudness toolbox (footnote 7)). They include Zwicker’s and Moore’s models of stationary loudness [31,55], percentile loudness N5, N10, L5, L10, a loudness model of impulsive sounds [56], Zwicker and Fastl’s models of sharpness, roughness, fluctuation strength and psychoacoustical annoyance [31], etc.
• The “Timbre toolbox”, calculating acoustic features generally used in the context of music information retrieval, including different implementations of psychoacoustical descriptors and statistical moments of the sounds [57].
• Custom-made implementations of Daniel and Weber’s model of roughness [47] and Sontacchi’s model of fluctuation strength [46].

The algorithms calculated the features both for the whole signals and for different frequency bands across different time frames. This is particularly important since some participants had for instance indicated that they were sensitive to fluctuations in the higher or lower ends of the spectrum. In particular, fluctuation strength was calculated in 47 Bark bands. These frequency bands had a bandwidth of one Bark and an overlap of 0.5 Bark [31]. A total of 274 features were calculated. Loudness in phons (ISO 532B; loudness was calculated with a 2 ms time step; here the indicator corresponds to the maximum value) was best correlated with the unpleasantness ratings (r(55) = 0.92, p < .01, i.e. 85% of the variance of the ratings). The left panel of Fig. 4 represents the ratings as a function of the loudness

6 http://www.genesis-acoustics.com/en/lea_the_sound_lab_for_industry-14.html, last retrieved on October 28, 2013. 7 http://genesis-acoustics.com/sonie_en_ligne-32.html, last retrieved on October 28, 2013.


values for the 57 sounds. Loudness predicts the unpleasantness ratings fairly well for the louder sounds but the prediction is coarser for the quieter sounds. Thus, the data were submitted to a multilinear regression analysis. Data were entered in the model with a forward stepwise method (the α threshold to enter a new parameter was set to 0.1). Since the number of potential predictors (274) greatly exceeds the number of data points used to fit the model, such an approach is very sensitive to the initial set of predictors fed to the algorithm, and produces several models with an equivalent prediction quality. Therefore, a bootstrap procedure ran the regression analysis with different initial conditions (see footnote 8). Ten thousand initial combinations of 50 predictors randomly chosen from the 274 indicators were tested. We selected only the solutions that used fewer than 5 parameters (see footnote 9). The best models usually included one of the different variants of loudness, sharpness, fluctuation strength, or a statistical moment of the spectrum. We report here one model that was consistent with participants’ comments. This model had three parameters: ISO 532B loudness, percentile loudness N10 (the loudness value exceeded 10% of the time), and fluctuation strength in the 37th Bark band ([5216–6360 Hz]). The quality of this 3-parameter model was slightly better than simply using loudness as a predictor (r(55) = 0.96, p < .01, i.e. 92% of the variance of the ratings). The right panel of Fig. 4 represents the ratings as a function of the values predicted by this model. Prediction is improved mostly for the quieter sounds.
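The random-restart selection described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it runs on synthetic data, uses a toy pool of 12 hypothetical features named `f0` to `f11` instead of the 274 indicators, and replaces the α = 0.1 entry threshold with a minimum gain in R², because the Python standard library provides no F distribution. All function names are illustrative.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y[t] for t, row in enumerate(design)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations; assumes non-collinear predictors
    mean_y = sum(y) / n
    ss_res = sum((y[t] - sum(b * v for b, v in zip(beta, design[t]))) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(features, y, candidates, min_gain=0.01, max_terms=4):
    """Greedy forward selection; min_gain stands in for the alpha-to-enter rule."""
    chosen, best = [], 0.0
    while len(chosen) < max_terms:  # "fewer than 5 parameters"
        scored = [(r_squared([features[c] for c in chosen + [name]], y), name)
                  for name in candidates if name not in chosen]
        if not scored:
            break
        r2, name = max(scored)
        if r2 - best < min_gain:
            break
        chosen.append(name)
        best = r2
    return chosen, best

def random_restarts(features, y, n_draws=50, subset_size=10, seed=0):
    """Run the stepwise search on random predictor subsets and rank the distinct models."""
    rng = random.Random(seed)
    names = sorted(features)
    models = {}
    for _ in range(n_draws):
        subset = rng.sample(names, min(subset_size, len(names)))
        chosen, r2 = forward_stepwise(features, y, subset)
        models[tuple(sorted(chosen))] = r2
    return sorted(models.items(), key=lambda kv: -kv[1])

# Synthetic demonstration: 57 "sounds", ratings driven by f0 and f3 plus a little noise.
rng = random.Random(1)
features = {f"f{i}": [rng.gauss(0, 1) for _ in range(57)] for i in range(12)}
ratings = [2 * a - b + 0.1 * rng.gauss(0, 1)
           for a, b in zip(features["f0"], features["f3"])]
ranked = random_restarts(features, ratings)
print(ranked[0])
```

Because several subsets can lead to models of similar quality, the ranked list (rather than a single winner) is what would be inspected, which mirrors the point made above that the procedure yields several equivalent models.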

3.4. Discussion

The first part of the experiment showed that ratings were fairly homogeneous across the selected participants, and the differences between the two groups of participants were marginal. The different cars were the main source of unpleasantness when participants rated the sounds at their real loudness and loudness explained a large part of the ratings. For instance, the sounds of car B2 (rated as the most unpleasant car whatever the position of the module) were notably louder than the other sounds. The influence of the module was smaller in size than the influence of the cars. Altogether, this suggests that buffeting created by the module was not the main cause of unpleasantness, although it also played a role. Instead, the direct influence of the car on the sound created by the flow of air was the main contributor to unpleasantness. Since ratings were mainly correlated with loudness, this suggests that different car designs strongly influence the loudness of the aerodynamic noises perceived inside the car, and that louder sounds are rated as more unpleasant than quieter ones. The module had nevertheless a smaller but significant influence on the ratings of unpleasantness, and this influence also depended on the different cars. Overall, cars were quieter with the buffeting module in the center position and louder in the reference position. In fact, when the module is in the center position, it protects the test vehicle from the flow of air, and the aerodynamic noise is quieter. It is however also interesting to compare cars D3 and D4. These cars were two instances of the same model (a large family car produced in France from the 2000s on). Ratings of unpleasantness were approximately the same in the parking, center, and reference positions for car D4 (0.11, 0.19, 0.22). They were however higher for car D3 in the reference position (0.45) than in the parking and center positions (0.24 and 0.12). This suggests that the influence of the module may be sensitive to small variations in the production of the cars and in the positions of the cars and the module in the flow of air.

Loudness is the simplest model to predict the unpleasantness of the sounds. This simple model provides a fair prediction for the louder sounds but is less precise for the quieter sounds. Adding two parameters (fluctuation strength and percentile loudness N10) improves the prediction, but the generality of this improvement is questionable. The right panel of Fig. 4 shows that prediction improves for the quieter sounds. Analyses of individual results showed that data were noisier for the less unpleasant (i.e. quieter) sounds: for instance, there was less agreement regarding the positive than the negative reference. For the quieter sounds, more subtle differences of timbre may also have had an influence, but participants were probably focused on the larger differences of loudness. Therefore, there is a possibility that the added parameters simply fitted the noise in the data. However, the largest 95% confidence interval for the regression analysis was 0.15, whereas the maximal 95% confidence interval for the ratings averaged across subjects was 0.07. This suggests that dispersion of data across participants alone does not account for the variance of the data not explained by the model. Analyses of the second part of the experiment (where loudness was equalized and participants could concentrate on other differences) will clarify this point, by removing loudness from the equation.

8 This procedure was loosely inspired by the method of bagging predictors used in machine learning [58]. 9 A rule of thumb is to use no more than one parameter per 10 data points [59].

4. Analysis: sounds with equalized loudness

The second part of the experiment (sounds with equal loudness) was analyzed following the same steps as the first part. Participants took on average 28 min to perform the experiment (from 9 to 50 min).

4.1. Individual differences and outliers

The data from the 48 subjects were first analyzed to detect potential outliers and groups of subjects.

4.1.1. Within-participant consistency

Differences between tests and retests ranged from 0.01 to 0.20, with an average of 0.04. The distributions of differences between test and retest were submitted to a series of two-tailed t-tests. The average test–retest difference was significantly different from zero for 9 subjects (2 expert and 7 lay participants), with an α value of .05. These participants were selected for further inquiry.

4.1.2. Reference sound

The average rating across subjects was 0.82 for the negative reference and 0.16 for the positive reference. In addition, differences between participants were larger than in the first part of the experiment. Fig. 5 represents the participants’ ratings for the two reference sounds. It shows that, whereas there was a good agreement between expert participants, there were large differences between lay participants. Ratings for the positive reference ranged from 0.04 to 0.63 for the lay participants (average 0.21) and from 0.00 to 0.17 for the expert participants (average 0.07). Ratings for the negative reference ranged from 0.36 to 0.99 for the lay participants (average 0.75) and from 0.79 to 1 for the expert participants (average 0.93). This suggests that the task was more difficult for the lay participants than for the expert participants, whose ratings were in agreement with the selection of references made before the experiment. Ratings exceeded 0.33 (one third of the full scale, an arbitrary threshold) for the positive reference or were smaller than 0.66 (two thirds of the full scale) for the negative reference for 12 participants (they were all lay participants). These participants were selected for further inquiry.
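The within-participant check of Section 4.1.1 amounts to a one-sample t-test on each participant's test–retest differences. The sketch below is a hedged illustration only: the data are made up, the number of test–retest pairs per participant (57) is an assumption that the text does not state, and the two-tailed critical value of about 2.003 for 56 degrees of freedom at α = .05 is hard-coded because the standard library has no t distribution.

```python
import math
import statistics

# Assumed two-tailed critical value of Student's t for df = 56 and alpha = .05.
T_CRIT_DF56 = 2.003

def one_sample_t(differences):
    """Return (t, df) for testing whether the mean difference is zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n)), n - 1

def is_flagged(differences, t_crit=T_CRIT_DF56):
    """True when the mean test-retest difference differs significantly from zero."""
    t, _ = one_sample_t(differences)
    return abs(t) > t_crit

# Hypothetical participants: one whose differences cancel out, one with a steady drift.
consistent = [0.05 if i % 2 == 0 else -0.05 for i in range(57)]
drifting = [0.10 + d for d in consistent]

print(is_flagged(consistent), is_flagged(drifting))
```

A participant flagged this way is only a candidate for closer inspection, in line with the text, which selects such participants "for further inquiry" rather than excluding them outright.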

Fig. 4. Correlation analyses for the sounds played at their real loudness. The left panel represents the ratings of unpleasantness as a function of the loudness of the sounds (in phons). The right panel represents the ratings as a function of a 3-parameter model (including loudness, N10 loudness, and fluctuation strength in the [5216–6360 Hz] band). Symbols distinguish the parking, reference, and center positions of the module. The dotted lines represent the 95% interval of the regression models. Vertical bars represent the 95% interval of the average ratings.

Fig. 5. Ratings of unpleasantness for the 47 participants for the two reference sounds (negative and positive), averaged across the different presentations. Vertical bars represent the 95% confidence interval of the mean. Participants with an index smaller than 100 were lay participants. Participants with an index greater than 100 were expert participants.

4.1.3. Between-participant consistency: correlations

Overall, correlations between each pair of participants were larger for expert than for lay participants: pairwise correlations ranged from 0.27 to 0.82 for lay participants (average 0.41) and from 0.28 to 0.82 for expert participants (average 0.62). This confirms that there was a better agreement between expert than between lay participants. Even if there were very small and negative correlations for some pairs of participants, no participant had produced ratings that were systematically uncorrelated with the other ones.

4.1.4. Between-participant consistency: distances

The distribution of distances between participants was unimodal. Inspecting the data for the seven participants that were the most distant from the mode of the distribution did not reveal any strategy that would have set them apart.

4.1.5. Excluding outliers

Five lay participants were singled out by at least two tests of consistency and were excluded from further analysis. These analyses also strongly suggest that the ratings for the two groups of participants (lay and expert) were different. The consistency checks reported in the previous paragraphs showed that expert participants were in good agreement between themselves (regarding the tests/retests and the patterns of ratings across sounds) and with the preselection of the most unpleasant and least unpleasant sounds conducted by the experimenters. This was not the case for the lay participants: they were less correlated between themselves and many of them disagreed about the positive and negative references being the least and the most unpleasant sounds. Overall, this suggests that the second part of the experiment (when sounds all had the same loudness) was more difficult than the first part. The next section will analyze whether the observed disagreements resulted from lay subjects appraising the sounds differently from the expert participants or from a difficulty for lay participants to produce consistent ratings for small sound differences (i.e. noisier data).
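The rule of singling out participants flagged by at least two consistency tests can be sketched as a simple count. The participant indices and the contents of each set below are hypothetical and only illustrate the bookkeeping; the test names are labels chosen here for the four checks of Section 4.1.

```python
from collections import Counter

def flagged_by_several_tests(flags_by_test, min_tests=2):
    """Return the participants flagged by at least `min_tests` different tests."""
    counts = Counter()
    for flagged in flags_by_test.values():
        counts.update(set(flagged))  # each test counts once per participant
    return sorted(p for p, n in counts.items() if n >= min_tests)

# Hypothetical flags, one set of participant indices per consistency test.
flags = {
    "test_retest": {3, 7, 12},
    "reference_sounds": {7, 12, 21},
    "pairwise_correlation": {12, 25},
    "distance": {3, 30},
}

candidates = flagged_by_several_tests(flags)
print(candidates)  # [3, 7, 12]
```

In the first part of the experiment the flagged participants were examined individually before any exclusion, so a list like `candidates` is best read as a shortlist for inspection rather than an automatic exclusion list.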

4.2. Influence of experimental factors

Ratings for the selected 43 participants were averaged across repetitions and submitted to a repeated-measure ANOVA with the 19 cars and the three positions of the buffeting module as the within-subject factors and the two groups of participants as the between-subject factor. As in the first part of the experiment, the main effect of the cars was significant (F(18,738) = 30.225, p < .01) though not the most important in size (η² = 14.6%). The main effect of the position of the module was the largest significant effect (F(2,82) = 245.359, p < .01, η² = 27.5%). A contrast analysis showed that the ratings for the reference position (average rating 0.64) were significantly higher than the ratings for the parking position (average rating 0.34, F(1,41) = 269.130, p < .01) but the ratings for the center and reference positions were not significantly different (average rating 0.64 in both cases, F(1,41) = 0.021, p = .885). The interaction between the cars and the position of the module was significant (F(36,1476) = 18.996, p < .01, η² = 9.9%). The top panel of Fig. 6 represents the ratings averaged across subjects for the cars and the different positions of the module. Car D6 in the parking position (the positive reference) was rated as the least unpleasant sound. Car B1 in the reference position (the negative reference) was rated as one of the most unpleasant sounds. The effect of the module in the reference and center positions was very similar for most of the cars: overall, cars were rated more unpleasant with the module in the reference or center positions than in the parking position. This was however not the case for a few cars (e.g. B1, D5, C2) that drove the significance of the interaction. The difference between the two groups of subjects was not significant (F(1,41) = 3.262, p = .078) and did not interact with the cars (F(18,738) = 1.712, p = .102) nor with the position of the module (F(2,82) = 2.380, p = .117). The three-way interaction was significant (F(36,1476) = 1.965, p < .05) but very small in size (η² = 1.0%) and will not be further discussed.

4.3. Correlation with acoustic indicators

Ratings averaged across participants were best correlated with fluctuation strength in the 37th Bark band ([5216–6360 Hz], r(55) = 0.79, p < .01, i.e. 62% of the variance of the ratings). The left panel of Fig. 7 represents the ratings of unpleasantness as a function of fluctuation strength in this band. Fluctuation strength was overall weaker for cars recorded with the buffeting module in the parking position and larger for cars recorded with the module in the center or reference positions. Fluctuation strength alone almost perfectly classifies the two groups (parking vs. center and reference) but is not sufficient to predict the ratings within the two groups. Ratings were therefore submitted to the same procedure of multiple linear regression analysis with bootstrap as in the first part of the experiment. Several models resulted in the same (and equivalently good) predictive power. Most of them used descriptors of fluctuation strength, sharpness, and some description of the spectral envelope. The most meaningful model is represented in the right panel of Fig. 7 (r(55) = 0.91, p < .01, i.e. 83% of the variance of the ratings). It is based on fluctuation strength in the 37th Bark band ([5216–6360 Hz], also found for the sounds at their real loudness), in the 30th ([2730–3246 Hz]) and the 32nd Bark bands ([3252–3903 Hz]), sharpness (DIN 45692), and the A-weighted sound pressure level. The largest 95% confidence interval for the regression analysis was 0.17, whereas the maximal 95% confidence interval for the ratings averaged across subjects was 0.09. This again suggests that variability between subjects alone does not account for the variance of the data not explained by the model. Since the analysis of individual results showed that there was less agreement between lay participants than between expert participants, we also ran the regression for the ratings averaged only across expert participants. The regression procedure found a similar model and improvement was only marginal.
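The percentages of variance quoted alongside the correlation coefficients in Sections 3.3 and 4.3 are simply the squared coefficients. The short check below reproduces that arithmetic; the dictionary labels are descriptive names chosen here, and the helper name is illustrative.

```python
def variance_explained(r):
    """Percentage of variance accounted for by a correlation coefficient r."""
    return 100.0 * r * r

# Correlation coefficients as reported in the text.
reported = {
    "loudness, real level": 0.92,
    "3-parameter model, real level": 0.96,
    "fluctuation strength in band 37, equal loudness": 0.79,
    "5-parameter model, equal loudness": 0.91,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, {variance_explained(r):.1f}% of the variance")
```

Seen this way, the step from 62% to 83% for the equal-loudness sounds is a larger relative gain than the step from 85% to 92% for the sounds at their real level.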


4.4. Discussion

The most striking difference between the second (equal loudness) and the first part of the experiment (real loudness) was that there was a clear effect of the module on unpleasantness ratings when loudness was equalized and that this effect was related to signal fluctuations. For instance, the ratings of the sounds of car B2 played at real loudness did not differ across the positions of the module, but they did differ when the sounds were played at the same loudness. Equalizing loudness has therefore “unmasked” subtle effects that were unnoticeable in the first part of the experiment because the differences of loudness were too large. The influence of the module was the largest effect but there was also a smaller effect of the different cars. The ratings of the different cars were not completely different from the ratings in the first part of the experiment. For instance, car B2 was the most unpleasant car when the sounds were played with their real loudness, and was still rated as one of the most unpleasant cars when the sounds were played with the same loudness. Overall, the results show that the second part of the experiment was more difficult than the first part, especially for the lay participants. However, there was no evidence that the two groups of participants used different strategies or preferred different sounds: there simply was less agreement between lay than between expert participants but the average ratings were similar. The indicator of fluctuation strength [46] captured a good proportion of the fluctuations created by the buffeting module. More precisely, ratings were best correlated with fluctuations applied to a frequency band in the higher end of the sound spectra. This confirms some of the comments recorded in preliminary informal tests (some participants indicated that they had found fluctuations in higher frequencies particularly unpleasant). However, we found no evidence that some participants had been more annoyed by fluctuations in low-frequency regions.
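The band layout used for the band-wise fluctuation strength (Section 3.3: bands one Bark wide, overlapping by 0.5 Bark) can be enumerated as in the sketch below. It assumes a 0 to 24 Bark scale starting at 0 Bark, which is an assumption made here and is consistent with the 47 bands mentioned in the text; the conversion of band edges to Hz depends on the Bark formula used by each tool and is deliberately left out.

```python
def overlapping_bark_bands(z_max=24.0, width=1.0, step=0.5):
    """List (lower, upper) band edges in Bark for bands of `width` spaced by `step`."""
    bands = []
    k = 0
    while k * step + width <= z_max + 1e-9:
        lower = k * step
        bands.append((lower, lower + width))
        k += 1
    return bands

bands = overlapping_bark_bands()
print(len(bands))           # number of bands
print(bands[0], bands[-1])  # first and last band edges in Bark
```

With these assumptions consecutive bands share half of their width, so a fluctuation localized in frequency shows up in two adjacent bands, which is worth keeping in mind when several neighbouring bands enter the same regression model.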

5. General discussion

The experiment reported in this article studied acoustic factors influencing the perceived unpleasantness of aerodynamic noises generated by a car moving through a flow of air. In particular, recordings made with a specific “buffeting module” allowed us to study the influence of fluctuations created by the interaction with other cars. Results of the first part of the experiment (when participants rated sounds played at their real level) showed that loudness was a major factor influencing unpleasantness. Different cars placed in a flow of air and different positions of the buffeting module produced sounds with large differences of loudness (about 10 phons) and participants rated the louder sounds as the more unpleasant. Analyses of the first part of the experiment showed that models of loudness alone could predict about 85% of the variance of the ratings. Such a result is not surprising. Many studies using sounds with large differences of loudness have typically found loudness to be the major contributor to ratings of unpleasantness, annoyance, or quality [23,60,33,61,18,37,27,7,62,63]. The results also suggested that more subtle timbral aspects (amplitude modulations, spectral balance) partially influenced the perceived unpleasantness of the sounds. The second part of the experiment specifically addressed this point by using the same sounds as in the first part but equalized to the same loudness. In this case, fluctuation strength (a measure of amplitude modulations) was the major contributor to unpleasantness. More precisely, unpleasantness ratings were best correlated with fluctuation strength in a frequency region between 2 and 6 kHz, thus confirming the comments of participants who indicated that fluctuations in the

Fig. 6. Ratings of unpleasantness for the sounds played with equalized loudness. The top panel represents the ratings averaged across subjects for the 19 cars and the three positions of the buffeting module (parking, center, reference). The first letter in the reference of the cars corresponds to the market segment. The letters between parentheses correspond to the country of the manufacturer. Cars were sorted according to the average ratings for the “parking” position. Filled symbols represent the two references (negative and positive) used throughout the experiment. The bottom panel represents the ratings averaged across subjects for the two groups of subjects (naive and expert participants) and the three positions of the module. Vertical bars represent the 95% confidence interval for the averages.

Fig. 7. Correlation analysis for the sounds with equal loudness. The left panel represents the ratings of unpleasantness as a function of fluctuation strength ([5216–6360 Hz] band, in vacil). The right panel represents the ratings as a function of a 5-parameter model (including fluctuation strength in three frequency bands, sharpness, and dB(A)). Symbols distinguish the parking, reference, and center positions of the module. The dotted lines represent the 95% interval of the regression models. Vertical bars represent the 95% interval of the average ratings.

higher end of the spectrum were the most unpleasant. This indicator could separate cars recorded with the module in the parking position (creating no fluctuations) from cars recorded in the center or reference positions (creating maximal fluctuations). This indicator of fluctuation strength alone accounted for only 62% of the variance of the ratings. A model using additional indicators of fluctuation strength in other frequency bands, sharpness and A-weighted sound pressure level achieved better prediction (83% of the variance of the ratings). Since sounds had the same loudness level, differences of A-weighted sound pressure levels have to be interpreted as reflecting differences in the low and high ends of the spectrum (where the frequency weighting is more pronounced). This indicates that unpleasantness ratings were also influenced by the spectral balance of energy.

The study took great care in identifying outliers and analyzing potential differences between participants, since preliminary tests had suggested that different groups of participants may be sensitive to different aspects of the sounds. A few outliers were excluded from the analyses, but the analyses were unable to highlight different strategies or preferences between groups of subjects. Only a difference of between-participant consistency was found between expert and lay participants in the second part of the experiment: there was less agreement between lay participants than between expert participants. This probably reflects the fact that participants could focus on finer acoustic aspects when sounds had the same loudness. In fact, differences between expert and lay participants are usually found for tasks for which experts have been specifically trained (e.g. musical tasks for musicians) or regarding the vocabulary used to describe sounds (see Lemaitre et al. for a review [54]). Differences of preference are also rarely reported: to our knowledge only Susini et al. have reported convincing evidence of groups of subjects with contradictory preferences [35].

The results of the analyses of the second part of the experiment allow us to propose a metric to assess unpleasantness caused by wind buffeting, based on fluctuation strength in several frequency bands, sharpness and A-weighted sound pressure level. This metric has several advantages when compared to the other metrics we are aware of [21,38]. First, it uses only widespread monaural indicators found in common commercial pieces of software (fluctuation strength, sharpness, and sound pressure level) and does not require a formal detection of transient modulations.
Second, it was based on data measured for sounds that were equalized in loudness, which prevented the influence of buffeting from being masked by large loudness differences. For instance, Blommer et al. and Hoshino et al. used sounds with large differences of loudness and both found that about 80% of the variance of the data was explained by loudness alone. The development of their specific metric to quantify the perception of buffeting was therefore based on only 20% of the variance of the data. Third, our model was developed for a larger number of sounds (57) than other studies (10 in [38], 9 in [21]), which is a better guarantee of its generality. Fluctuation strength is therefore a useful indicator to describe the perception of buffeting noise. This indicator was however developed and tested for tones or broadband noises modulated by sinusoids. Further work is needed to test and revise the model for other types of sounds, in particular when the modulation is not sinusoidal (see for instance an example of such a study of roughness [45]).

References

[1] Berglund B, Lindvall T, Schwela DH, editors. Guidelines for community noise. Geneva (Switzerland): World Health Organization; 1995.
[2] Västfjäll D, Gulbol M-A, Kleiner M, Gärling T. Affective evaluations of and reactions to exterior and interior vehicle auditory quality. J Sound Vib 2002;255(3):501–18.
[3] Morel J, Marquis-Favre C, Dubois D, Pierrette M. Road traffic in urban areas: a perceptual and cognitive typology of pass-by noises. Acta Acust United Acust 2012;98:166–72.
[4] Lemaitre G, Susini P, Winsberg S, Letinturier B, McAdams S. The sound quality of car horns: designing new representative sounds. Acta Acust United Acust 2009;95(2):356–72.
[5] Misdariis N, Cera A, Levallois E, Locqueteau C. Do electric cars have to make noise? An emblematic opportunity for designing sounds and soundscapes. In: Proceedings of the 11th Congrès Français d’Acoustique and the 2012 Annual IOA Meeting, Acoustics 2012, Nantes, France, Société Française d’Acoustique, Paris, France; 2012. p. 1045–50.
[6] Parizet E, Robart R, Chamard J-C, Schlittenlacher J, Pondrom P, Ellermeier W, et al. Detectability and annoyance of warning sounds of electric vehicles. In: Proceedings of the 21st international congress on acoustics, Montreal, Canada. Melville (NY): Acoustical Society of America; 2013. paper 2aNSa5.
[7] Shin S-H, Ih J-G, Hashimoto T, Hatano S. Sound quality evaluation of the booming sensation for passenger cars. Appl Acoust 2009;70:309–20.
[8] Van der Auweraer H, Wyckaert K, Hendricx W. From sound quality to the engineering of solution for NVH problems: case studies. Acta Acust United Acust 1997;83:796–804.
[9] Parizet E, Guyader E, Nosulenko V. Analysis of car door closing sound quality. Appl Acoust 2008;69:12–22.
[10] Widmann U. Three examples for sound quality design using psychoacoustic tools. Acta Acust United Acust 1997;83:819–26.
[11] Otto NC, Wakefield GH. A subjective evaluation and analysis of automotive starter sounds. Noise Control Eng J 1993;41(3):377–82.
[12] Pflueger M, Hoeldrich R, Brandl FK, Biermayer W. Subjective assessment of roughness as a basis for objective interior noise quality evaluation. In: Proceedings of the noise and vibration conference and exhibition, Traverse City, MI, Society of Automotive Engineers International, Warrendale, PA; 1999. SAE Technical paper series 1999-01-1850.
[13] Wang YS, Shen GQ, Tang XL, Hamade T. Roughness modelling based on human auditory perception for sound quality evaluation of vehicle interior noise. J Sound Vib 2013;332:3893–904.
[14] Bezat M-C. Qualification acoustique du typage sonore. Application au typage sport. Master’s thesis. ATIAM (Paris VI, ENST, Aix-Marseille II, UJF Grenoble I); 2003.
[15] Genuit K. Background and practical examples of sound design. Acta Acust United Acust 1997;83(5):805–12.
[16] Beidl CV, Stücklschwaiger W. Application of the AVL-annoyance index for engine noise quality development. Acta Acust United Acust 1997;83:789–95.
[17] Parizet E, Hamzaoui N, Ségaud L, Koch J. Continuous evaluation of noise uncomfort in a bus. Acta Acust United Acust 2003;89:900–7.
[18] Kuwano S, Namba S, Hayakawa Y. Comparison of the loudness of inside car noises from various sound sources in the same context. J Acoust Soc Jpn (E) 1997;18(4):191–5.
[19] Cerrato G. Automotive sound quality – powertrain, road and wind noise. Sound Vib 2009:16–24.
[20] Bodden M, Booz G, Heinrichs R. Interior vehicle sound composition: wind noise perception. In: Proceedings of the joint congress, Congrès Français d’Acoustique/Tagung der Deutschen Arbeitsgemeinschaft für Akustik (CFA/DAGA), Strasbourg, France, Société Française d’Acoustique, Paris, France; 2004.
[21] Blommer M, Amman S, Abhyankar S, Dedecker B. Sound quality metric development for wind buffetting and gusting noise. In: Proceedings of the noise and vibration conference and exhibition, Traverse City, MI, Society of Automotive Engineers International, Warrendale, PA; 2003. SAE Technical paper series 2003-01-1509.
[22] Peric C, Watkins S, Lindqvist E. Wind turbulence effects on aerodynamic noise with relevance to road vehicle interior noise. J Wind Eng Ind Aerodyn 1997:69–71.
[23] Amman S, Greenberg J, Gulker B, Abhyankar S. Subjective quantification of wind buffeting noise. In: Proceedings of the noise and vibration conference and exhibition, Traverse City, MI, Society of Automotive Engineers International, Warrendale, PA; 1999. SAE Technical paper series 1999-01-1821.
[24] Patsouras C, Fastl H, Patsouras D, Pfaffelhuber K. Psychoacoustic sensation magnitudes and sound quality ratings of upper middle class car’s idling noise. In: Proceedings of the international conference on acoustics. Rome, ICA 2001; 2001.
[25] Kantarelis C, Walker JG. The identification and subjective effect of amplitude modulation in diesel engine exhaust noise. J Sound Vib 1988;120(2):297–302.
[26] Frère A, Susini P, Misdariis N, Weber R, Péteul-Brouillet C, Guyader G. Vibrations’ influence on dieselness perception. Appl Acoust 2014;77:59–70.
[27] Penna Leite R, Paul S, Gerges SNY. A sound quality-based investigation of the HVAC system noise of an automobile model. Appl Acoust 2009;70:636–45.
[28] Waye KP, Öhrström E. Psychoacoustic characters of relevance for annoyance of wind turbine noise. J Sound Vib 2002;250(1):65–73.
[29] Lemaitre G, Susini P, Winsberg S, Letinturier B, McAdams S. The sound quality of car horns: a psychoacoustical study of timbre. Acta Acust United Acust 2007;93(3):457–68.
[30] Guski R. Psychological methods for evaluating sound quality and assessing acoustic information. Acta Acust United Acust 1997;83:765–74.
[31] Zwicker E, Fastl H, Widmann U, Kurakata K, Kuwano S, Namba S. Program for calculating loudness according to DIN 45631 (ISO 532B). J Acoust Soc Jpn 1991;12(1).
[32] Fastl H. The psychoacoustics of sound-quality evaluation. Acta Acust United Acust 1997;83:754–64.
[33] Jeon JY, You J, Chang HY. Sound radiation and sound quality characteristics of refrigerator noise in real living environments. Appl Acoust 2007;68:1118–34.
[34] Takada M, Takeno A, Iwamiya S-I. Effects of vehicle horn acoustic properties on auditory impressions and interpretation of reason for horn use by other drivers. Noise Control Eng J 2010;58(3):259–72.
[35] Susini P, McAdams S, Winsberg S, Perry I, Vieillard S, Rodet X. Characterizing the sound quality of air-conditioning noise. Appl Acoust 2004;65(8):763–90.
[36] Sato S, You J, Jeon J. Sound quality characteristics of refrigerator noise in real living environments with relation to psychoacoustical and autocorrelation function parameters. J Acoust Soc Am 2007;122(1):314–25.
[37] Otto N, Feng BJ. Wind noise sound quality. In: Proceedings of the noise and vibration conference, Traverse City, MI, Society of Automotive Engineers International, Warrendale, PA; 1995. SAE Technical paper series 951369.
[38] Hoshino H, Kato H. A new objective evaluation method of wind noise in a car based on human hearing properties. Acoust Sci Technol 2002;23(1):17–24.

[39] Fastl H. Fluctuation strength and temporal masking patterns of amplitude-modulated broadband noise. Hear Res 1982;8:59–69.
[40] Fastl H. Fluctuation strength of modulated tones and broadband noise. In: Hearing, physiological bases and psychophysics. Proceedings of the 6th international symposium on hearing, Bad Nauheim, Germany. Heidelberg: Springer-Verlag; 1983. p. 282–8.
[41] Fleischer H. Calculating psychoacoustic parameters of amplitude modulated narrow noise bands. Biol Cybern 1982;44:177–84.
[42] Zwicker E, Fastl H. Psychoacoustics: facts and models. Springer-Verlag; 1990.
[43] Terhardt E. On the perception of periodic sound fluctuations (roughness). Acustica 1974;30:201–13.
[44] Aures W. Ein Berechnungsverfahren der Rauhigkeit (a procedure for calculating auditory roughness). Acustica 1985;58:268–81.
[45] Pressnitzer D. Perception de rugosité psychoacoustique: d’un attribut élémentaire de l’audition à l’écoute musicale. PhD thesis. Université Paris 6; 1998.
[46] Sontacchi A. Entwicklung eines Modulkonzeptes für die psychoakustische Geräuschanalyse unter MATLAB. Diplomarbeit. Institut für Elektronische Musik der Kunstuniversität Graz, Graz, Austria; 1999.
[47] Daniel P, Weber R. Psychoacoustical roughness: implementation of an optimized model. Acta Acust United Acust 1997;83:113–23.
[48] Daniel P. Psychoacoustical roughness. In: Havelock D, Kuwano S, Vorländer M, editors. Handbook of signal processing in acoustics, 1st ed., vol. 1. New York: Springer; 2008. p. 263–74. ch. 19.
[49] Hoeldrich R, Pflueger M. A generalized psychoacoustical model of modulation parameters (roughness) for objective vehicle noise quality evaluation. In: Proceedings of the noise and vibration conference and exhibition, Traverse City, MI, Society of Automotive Engineers International, Warrendale, PA; 1999. SAE Technical paper series 1999-01-1817.
[50] Lavandier C, Defréville B. The contribution of sound source characteristics in the assessment of urban soundscapes. Acta Acust United Acust 2006;92:912–21.
[51] Barbot B, Lavandier C, Cheminée P. Perceptual representation of aircraft sounds. Appl Acoust 2008;69:1003–16.
[52] Brainard DH. The psychophysics toolbox. Spatial Vis 1997;10:433–6.
[53] Recommendation ITU-R BS.1534-1. Method for the subjective assessment of intermediate quality level of coding systems. International Telecommunication Union, Geneva, Switzerland; 2001–2003.
[54] Lemaitre G, Houix O, Misdariis N, Susini P. Listener expertise and sound identification influence the categorization of environmental sounds. J Exp Psychol: Appl 2010;16(1):16–32.
[55] Moore BCJ, Glasberg BR. A revision of Zwicker’s loudness model. Acta Acust United Acust 1996;82:335–45.
[56] Boulet I. La sonie des sons impulsionnels: perception, mesures et modèles. PhD thesis. Université de la Méditerranée – Aix-Marseille II; 2006.
[57] Peeters G, Giordano BL, Susini P, Misdariis N, McAdams S. The timbre toolbox: extracting audio descriptors from musical signals. J Acoust Soc Am 2011;130(5):2902.
[58] Breiman L. Bagging predictors. Mach Learn 1996;24:123–40.
[59] Howell DC. Statistical methods for psychology. PWS-Kent; 1992.
[60] Ih J-G, Lim D-H, Shin S-H, Park Y. Experimental design and assessment of product sound quality: application to a vacuum cleaner. Noise Control Eng J 2003;51(4):244–52.
[61] Ko NWM, Ho WF, Un WK. Response to air-conditioning system noise. J Sound Vib 1978;57(4):595–602.
[62] Tang SK. Performance of noise indices in air-conditioned landscaped office buildings. J Acoust Soc Am 1997;102(3):1657–63.
[63] Tang SK, Wong MY. On noise indices for domestic conditioners. J Sound Vib 2004;274:1–12.