Assessment of the subjective perception of music quality using psychoacoustic approach


International Journal of Industrial Ergonomics 53 (2016) 219–228


Kuo Hao Tang a,*, Shih Ting Chen a, Yu Ting Tsai b

a Department of Industrial Engineering and System Management, Feng Chia University, Taichung 40724, Taiwan, ROC
b Bachelor's Program in Precision System Design, Feng Chia University, Taichung 40724, Taiwan, ROC

Article history: Received 18 August 2014; received in revised form 22 November 2015; accepted 17 January 2016; available online xxx.

Keywords: Psychoacoustics; Roughness; Music quality; Subjective perception; Prediction model

Abstract

It is generally agreed that sound quality is one of the most difficult characteristics of electroacoustic products, such as earphones or loudspeakers, to measure. A conventional approach to measuring people's subjective perception of these sound reproduction products is to conduct a jury test on a group of experiment participants; however, jury tests require considerable costs, including those of effort and time. As development speed and cost become strategic competitive dimensions, the electroacoustic industry needs a more efficient approach to assessing the subjective sound quality of its newly developed products. This study developed and validated a quantitative model, the tonal harmony level (THL), that can effectively predict people's subjective perceptions of music quality. Participants' subjective perception and preference were measured for four music genres by listening to short music excerpts (8 s), using both ordinal and interval scales. The purpose of using two scales was to examine the consistency between subjective perceptions and to determine the robustness of the subjective measurements. The experimental results were very stable over the two assessment procedures, and the objective THL measure is highly correlated with subjective preference. The analysis suggests that the construction of subjective music quality prediction models should also consider music genre. Among the four types of music, musical solos consisting of human vocals accompanied by a few instruments showed a distinct pattern from the other three types. Thus, while the R² value of the overall regression model is 0.707, the R² values are 0.955 and 0.901 when the four music genres are categorized into two groups according to their patterns. According to the results of this study, the two-group categorization can be adopted when efficiency and accuracy are considered simultaneously. © 2016 Elsevier B.V. All rights reserved.

Relevance to Industry

The results suggest that using psychoacoustic parameters can accurately assess the subjective perception of music quality. This information is particularly useful for the electroacoustic industry. The prediction models developed in this paper can expedite the development of sound reproduction products by avoiding excess testing time and effort.

* Corresponding author. E-mail address: [email protected] (K.H. Tang).
http://dx.doi.org/10.1016/j.ergon.2016.01.004
0169-8141/© 2016 Elsevier B.V. All rights reserved.

1. Research background and motivation

Although music quality is influenced by numerous factors such as the recording and audio compression technology (Fazekas and Sandler, 2007; Kowalgin and Gamage, 2002), music quality is

primarily determined by subjective feelings (Usher, 2006), and some research uses music samples as an emotion measurement scale (Lu and Petiot, 2014). Because people's tastes in music differ, during the process of developing electroacoustic products such as earphones or loudspeakers, a conventional approach to measuring people's subjective perception of the newly developed products is to conduct a jury test on a group of experiment participants for data collection; however, jury tests require considerable costs, including those of effort and time (Kahana et al., 1997). One of the current trends in the electroacoustic industry is to increase product specifications to meet the needs of consumers while reducing development time and costs. A quick response time has become an important competitive advantage for the electroacoustic industry, as in many other industries. Therefore, establishing a model that can rapidly and accurately assess users' preference for the sound quality of music is crucial for the electroacoustic industry. Based on the concept of auditory roughness proposed by


Vassilakis (2005), this study developed and validated a quantitative model, the tonal harmony level (THL), that can effectively predict people's subjective perceptions of music quality. Experiments using both ordinal and interval scales were adopted for verification during the model construction process to enhance the robustness and validity of the measurement results (Rossi et al., 2005; Poirson et al., 2010). The physical vibration of sound waves stimulates the human auditory system, resulting in the sense of hearing, which is closely related to individual physiological conditions and psychological states. Because the same sound exerts distinct psychological effects on different people, the field of psychoacoustics has developed (Fastl and Zwicker, 2007). By describing the ear structure and applying auditory physiology, researchers have constructed models that can depict people's auditory perceptions to quantify their subjective feelings about sounds and reflect the differences in subjective hearing perceptions. Common parameters used in psychoacoustics are loudness, sharpness, roughness, and fluctuation strength (Fastl and Zwicker, 2007). Numerous studies have developed sound quality assessment models by using psychoacoustic parameters and regression or other quantitative models, such as neural network models. The sounds involved in these assessment models included noise, traffic noise, and sounds caused by machinery and appliances. For example, Raggam et al. (2007) conducted a jury test to investigate the relationship between the subjective perception of traffic noise and objective psychoacoustic and physiological parameters. The psychoacoustic parameters (i.e., loudness, roughness, sharpness, sound level, tonality, and fluctuation strength) and physiological parameters (i.e., heart rate and respiratory rate) in each circumstance were measured.
Regression analysis results suggested that the subjective assessment was highly correlated with loudness, roughness, sharpness, and sound level; the correlation between the subjective assessment and the heart rate was also significant. Jeon and Sato (2008) assessed people's subjective perceptions of the noise produced by household refrigerators by adopting a semantic differential scale. Linear regression analysis was performed using psychoacoustic parameters and subjective perception scores. The results implied that loudness, roughness, and fluctuation strength were significantly correlated with the subjective scores, and thus, the parameters were used to construct an index to predict the subjective sound quality of household refrigerators. Yoon et al. (2012) conducted a jury test by adopting the semantic differential scale to investigate people's subjective perception of air conditioning noise in cars. The analysis involved using linear regression and neural network models, revealing that loudness, sharpness, and roughness were highly correlated with the subjective perception. Wang et al. (2013) conducted a jury test to examine the relationship between the auditory perception and roughness of car noise. In addition, Moon et al. (2015) conducted a jury test to determine the detectability of ringtones by adjusting frequency and sound level. Overall, in most of the models created to evaluate subjective sound quality, psychoacoustics has been applied to determine the subjective perception of noise rather than to measure how pleasant the music was to the ear. According to Western music theory (Helmholtz, 1885), Vassilakis (2005) reinterpreted the notion of auditory roughness to examine the sound harmony perceived by the human ear. Fig. 1 displays the standard curve for the consonance and dissonance obtained empirically by subjective rating of pairs of sine waves (Plomp and Levelt, 1965). 
For example, assuming that the two monotones were at frequencies f1 and f2, when f1 = f2, the frequency difference was 0 (at unison) and the consonance level was at its maximum of 1. When f1 remained the same and f2 gradually increased, so that the frequency difference increased, the

Fig. 1. The standard curve of two monotones at dissimilar frequencies; the X axis indicates the frequency differences measured by critical bandwidth (Plomp and Levelt, 1965).

Fig. 2. The consonance and dissonance of two monotones with dissimilar basic frequencies; the X axis indicates the number of semitones, and the Y axis represents the consonance and dissonance (Sethares, 1993).

consonance level of the two monotones initially decreased, and then increased up towards, but never quite reached, the consonance of the unison. Sethares (1993) employed the Plomp–Levelt curve to investigate the consonance and dissonance of a chord of two monotones at dissimilar frequencies (i.e., f1 and f2). From Fig. 2, for example, when f1 = 125 Hz, the chord was perceived as dissonant when the tone pair was five semitones apart. By contrast, when f1 = 2000 Hz, the chord was perceived as consonant when the tone pair was only three semitones apart. Figs. 1 and 2 suggest that the sensitivity of the human ear to dissonant chords differs between low and high frequencies. Therefore, Sethares (1993) proposed a model that can be used to calculate the level of dissonance of a chord comprising two monotones at various frequencies, as expressed in Eq. (1).

d(x) = e^(-3.5x) - e^(-5.75x)   (1)

where d(x) is the level of dissonance of the two monotones, and x can be calculated using Eq. (2).
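As a concrete illustration, Eqs. (1) and (2) can be written out in a few lines of Python. The function and variable names are ours, and x is used exactly as printed in Eq. (2); note that in Sethares' fuller model the frequency difference is additionally scaled by a factor that depends on the critical bandwidth around f1:

```python
import math

def dissonance(f1, f2):
    """Pair dissonance of two sine waves, Eq. (1) with x from Eq. (2)."""
    x = abs(f1 - f2)  # Eq. (2): frequency difference of the two monotones
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)
```

dissonance(f, f) is exactly 0 at unison; the value rises to a maximum at a small frequency separation and then decays back toward 0, mirroring the Plomp–Levelt curve of Fig. 1.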


Table 1
Music types and the 8 pieces of music used in the roughness experiment (each excerpt is 8 s long; start–stop playing times are given).

Type I (instruments only; more instruments, higher energy):
1. Kunzel/Cincinnati Pops Orchestra — Ein Straussfest II / Artists Quadrille (Kunstler Quadrille), Op. 201 — 00:30–00:38
2. Karajan — Dvorak: Symphonies 8 & 9 / Symphony No. 9 in E minor, Op. 95 "From the New World", IV: Allegro con fuoco — 00:48.4–00:56.4

Type II (instruments only; few instruments, lower energy):
3. Jean-Guihen Queyras — Dvorak: Dumky Trio, Lento maestoso — 00:48.6–00:56.6
4. Itzhak Perlman/Vladimir Ashkenazy — Beethoven: Violin Sonatas 9 "Kreutzer" & 5 "Spring" / Violin Sonata No. 9 in A major, Op. 47 "Kreutzer" — 10:15.9–10:23.9

Type III (instruments and vocals; more instruments, higher energy):
5. Extreme II — Pornograffitti / Decadence Dance — 01:59.2–02:07.2
6. Roxette — Look Sharp / Chances — 01:10.2–01:18.2

Type IV (instruments and vocals; few instruments, lower energy):
7. Silje Vige — Alle Mine Tankar / Ut Av Mitt Liv — 00:20.7–00:28.7
8. Paul Potts — Passione / Piano (Memory) — 00:21.5–00:29.5

x = |f1 - f2|   (2)

where f1 and f2 are the frequencies of the two sine waves. Because the two frequencies exhibited dissimilar energies, Vassilakis (2005) considered the degree and depth of amplitude fluctuation, which represent the amplitude change rates of sound signals within a unit time interval. Therefore, the amplitudes (a1 and a2) of the two monotones at various frequencies were incorporated into the model to reflect the level of dissonance, which can be expressed as follows:

R_Vassilakis = (a1·a2)^0.1 · 0.5 · (2·min(a1, a2)/(a1 + a2))^3.11 · d(x)   (3)

The auditory roughness model proposed by Vassilakis (2005) is a psychoacoustic parameter built on the basis of Western music and chords. The primary objective of the present study was to verify whether the auditory roughness model can reflect people's subjective perception of music quality, and to construct a prediction model for the sound quality of music by conducting correlation analysis. The measurement of subjective preference required a robust approach. Following the approach adopted by Rossi et al. (2005), preference was measured in this study by using both interval and ordinal scales; the results produced by the two scales were compared to determine the robustness of the measured preference.

2. Research method

2.1. The production of a music sample

The music assessed in this study was categorized into four types according to the instrument usage complexity and whether the human voice was included. The first music type (Type I) included symphonies and concertos, which were played by a wide range of instruments. The second type of music (Type II), including violin

Fig. 3. Time-frequency spectrograms for the four types of music; the X axis denotes time (sec), and the Y axis indicates frequency (Hz). Grey level signifies sound pressure levels (dB).


and piano solos, was played by few instruments. The third type of music (Type III), such as rock and dance music, comprised human vocals and various instruments. The fourth music type (Type IV), including vocal solos and Broadway musical solos, consisted mainly of human vocals accompanied by a few instruments. Two pieces of music from each of these four types were selected, and an 8-s segment from each piece was used as the music piece in the experiment (Table 1). Fig. 3 illustrates the differences among the four types of music on time–frequency spectrograms. The spectral distribution of audio signals from the music that involved numerous instruments featured high density and energy. In addition, the spectral distribution of signals from music comprising human singing and instrumentation was wider than that of signals from music involving only instruments. After the eight music pieces were selected, a specific method was used to systematically change the sound quality of each music piece; therefore, each piece had several versions of varying quality. ISO MPEG audio coding (ISO, 1993), an audio compression format, was used in this study to change the quality of the compressed files, thereby influencing people's perception of music quality. MPEG audio file compression, which directly influences the exhibited sound quality, is based on the masking effect of the human ear on the perception of sound frequency, and it employs a psychoacoustic model to filter the sound frequencies that are imperceptible to human ears. Only essential frequency information is retained, to reduce the size of audio files. The masking index for the tonal masking components is expressed in Eq. (4) with a constant c (ISO, 1993):

av_tm = -2.025 - 0.275·z(j) - c   (in dB)   (4)

where av_tm is the masking index of the tonal masking components, z(j) is the frequency index on the Bark scale, and j is the index label of the tonal component. Based on the MPEG audio coding standard applied to audio file compression, when the value of c increases, more sound details are filtered, leading to a higher degree of sound distortion. The MP3 format file is produced with c set to 3.5, whereby the frequency details that are not easily detected by human ears are filtered to compress the audio file size (ISO, 1993). In the current study, the c value was forcibly adjusted, leading to a change in the masking threshold and thereby influencing people's subjective sense of hearing. A large c value caused a high amount of sound distortion and a low music quality. To alter the music quality of the eight music pieces listed in Table 1, an amateur music lover determined the c values according to the principle that differences in the subjective perception of music quality should be exhibited. By adjusting the c value within the range of 0–50, each music piece was rendered in seven versions, with each version having a distinct sound quality categorized into Levels 1 to 7. Level 1 represents the original CD sound quality, that is, c = 0. Consequently, a total of 48 altered music pieces based on the eight original music pieces were produced and, together with the eight original pieces, served as the experiment samples in this study.
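As a quick sanity check on the stimulus set just described, the counts (8 pieces × 7 quality levels; 48 altered versions plus 8 originals; and the 21 pairs per piece used later in the paired comparison) can be reproduced in a few lines. This is only bookkeeping, and all names are ours:

```python
from itertools import combinations

pieces = range(1, 9)   # the 8 source excerpts of Table 1
levels = range(1, 8)   # Level 1 = original CD quality (c = 0)

stimuli = [(p, lv) for p in pieces for lv in levels]        # all versions
altered = [(p, lv) for (p, lv) in stimuli if lv > 1]        # re-encoded versions
pairs_per_piece = list(combinations(levels, 2))             # paired-comparison pairs

print(len(stimuli), len(altered), len(pairs_per_piece))     # 56 48 21
```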

2.2. The calculation of tonal harmony level

According to Sethares (1993), as pitch increases, the human ear's ability to distinguish pitches decreases, resulting in a lower hearing sensitivity to the consonance of sounds. In this study, the Bark scale was adopted to differentiate and calculate auditory roughness. The Bark scale is a psychoacoustical scale proposed by Zwicker (1961), in which each cochlear filter is spaced equidistantly on the critical band scale. The scale ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing, representing a frequency scale on which equal distances correspond to perceptually equal distances. The average values of MPEG audio coding for auditory roughness on the Bark scale were used to reflect the auditory characteristics of the human ear. To calculate the average auditory roughness for the ith Bark within the time interval ΔT (= T′ - T), the auditory roughness proposed by Vassilakis (2005) was summed from frequency f_L to f′_L and then divided by the bandwidth Δf_B (= f′_L - f_L) in the ith Bark:

R_t(i) = (1/ΔT) · Σ_{t=T}^{T′} [ (1/Δf_B) · Σ_{j=f_L}^{f′_L} R_Vassilakis(f_j, f_{j+1}, a_j, a_{j+1}) ]   (5)

where T is the initial time point and T′ is the final time point of the observed time interval of one music piece; and f_L is the initial frequency bin and f′_L is the final frequency bin of the observed frequency region corresponding to the ith Bark. The total auditory roughness of each Bark obtained by Eq. (5) indicates the tonal harmony perceived by the human ear from that specific Bark. The tonal harmony can be calculated by summing R_t(i) over the 24 Barks:

T_harmony = Σ_{i=1}^{24} R_t(i)   (6)

where T_harmony is the tonal harmony level (THL).

2.3. Methods for the measurement and analysis of subjective sound quality

Participants' subjective perception and preference were measured according to the structure proposed by Rossi et al. (2005) shown in Fig. 4, in which both ordinal and interval scales were employed. The results were compared to examine the consistency between subjective perceptions and to determine the robustness of the subjective measurements. Specifically, for the interval scale, the interval estimation method was adopted, in which all the stimuli (i.e., music pieces) were presented to the participants simultaneously, and the participants assigned subjective scores to every stimulus. This is classified as a global method. For the ordinal scale, paired comparison was conducted, in which two stimuli were presented simultaneously in one test; the participants selected the stimulus they perceived as having

Fig. 4. Procedures for verifying ordinal and interval scales (Rossi et al., 2005).


superior sound quality. All stimuli were paired and tested using this local method. Subsequently, the ordinal scale used in the paired comparison experiment was converted into the z values by employing the model proposed by Thurstone (1927). A correlation analysis was subsequently performed on the two sets of values to cross check the consistency (Fig. 4). The structure proposed by Rossi et al. (2005) was employed to ensure the robustness of the measurement methods. Using the correlation analysis, the reliability of the jury test results was verified. Thurstone (1927) proposed an analysis model for paired comparison, employing the law of comparative judgment. The model assumes that for any two stimuli presented to a person, each stimulus elicits an unobserved continuous preference, which exhibits a normal distribution, and the person will select the stimulus with a greater preference. This process is called a discriminal process. Thurstone developed Cases I to V models for various ranking patterns, the equation of which is expressed in Eq. (7):

p(i>j) = Φ( (μ_i - μ_j) / √(σ_i² + σ_j² - 2·ρ_ij·σ_i·σ_j) )   (7)

where p(i>j) is the probability that the continuous preference of stimulus i is larger than that of stimulus j; μ_i and μ_j are the mean values of the continuous preferences of stimuli i and j, respectively; σ_i² and σ_j² are the variances of the continuous preferences of stimuli i and j, respectively; and ρ_ij is the correlation between the continuous preferences of stimuli i and j. Assuming that the continuous preferences in Eq. (7) are independent of each other (i.e., ρ_ij = 0) and that the variances of the continuous preferences of the two stimuli are equal (i.e., σ_i² = σ_j²), Eq. (7) can be converted into Eq. (8), which is Thurstone's Case V model, the one most commonly used.

p(i>j) = Φ( (μ_i - μ_j) / √(2σ²) )   (8)

where σ² denotes the variance of the continuous preference of stimuli i and j. Based on the Case V model, Woods et al. (2009) used Eqs. (9)–(11) to generate a z value of the normal distribution for every stimulus.

p(i>j) = f(i>j) / (f(i>j) + f(j>i))   (9)

where f(i>j) is the frequency with which stimulus i was chosen in the paired comparison of stimuli i and j; f(j>i) is the frequency with which stimulus j was chosen in the paired comparison of stimuli i and j; and p(i>j) is the probability that the continuous preference of stimulus i is superior to that of stimulus j. According to Thurstone, p(i>j) follows a normal distribution, and its z value is calculated using Eq. (10):

Z(i>j) = Φ⁻¹( f(i>j) / (f(i>j) + f(j>i)) )   (10)

where Z_i is the average z value of stimulus i and can be calculated using Eq. (11):

Z_i = (1/n) · Σ_j Z(i>j)   (11)

where n is the total number of stimuli.

Table 2
The measured THLs for the eight music pieces at seven sound quality levels.

Music piece   Sound quality level
              1       2       3       4       5       6       7
1             47.54   44.68   41.83   38.96   32.24   23.34   14.81
2             48.37   45.46   42.60   39.30   32.65   23.36   15.00
3             45.16   44.03   40.38   37.37   30.63   22.79   14.19
4             45.86   44.75   41.07   37.75   31.24   23.86   14.64
5             47.86   44.80   40.61   34.86   27.38   19.01   11.61
6             47.80   44.90   40.41   34.36   27.05   18.86   11.32
7             44.99   32.98   21.91   13.41    9.17    7.90    5.97
8             45.78   33.29   22.43   13.84    9.13    8.35    6.30

Table 3
The mean z values for the levels of subjective preference for music at various quality levels.

Z value            Sound quality level
Music piece/Type   1        2        3        4        5        6        7
1                  0.987    0.707    0.479    0.083   -0.643   -1.133   -1.420
2                  0.927    0.889    0.523    0.137   -0.450   -1.151   -1.511
3                  0.306    0.525    0.287    0.055   -0.089   -0.314   -0.841
4                  0.179    0.374    0.333    0.405   -0.179   -0.076   -0.627
5                  0.873    1.048    0.591    0.180   -0.392   -0.769   -1.162
6                  1.108    1.016    0.392    0.242   -0.452   -1.297   -1.470
7                  1.013    1.017    0.643    0.271   -0.025   -0.408   -0.857
8                  0.651    0.707    0.049   -0.244   -0.676   -0.530   -0.814
Type I             0.957    0.798    0.501    0.110   -0.547   -1.142   -1.465
Type II            0.243    0.450    0.310    0.230   -0.134   -0.195   -0.734
Type III           0.990    1.032    0.492    0.211   -0.422   -1.033   -1.316
Type IV            0.832    0.862    0.346    0.013   -0.350   -0.469   -0.836
Overall            0.755    0.785    0.412    0.141   -0.296   -0.710   -1.088

2.4. Experiment procedures for the measurement of subjective sound quality

The interval estimation and the paired comparison conducted to measure subjective sound quality were both within-subject experiments. A total of 13 university students participated in the experiments; their average age was 22 years, and none of them had hearing problems. The experiments were conducted in a quiet room in which the background noise was maintained below 20 dB. The participants wore headphones to block out sounds from the external environment and to concentrate on the sounds from the headphones (Kallinen and Ravaja, 2007). Audio-Technica ATH-M50 headphones, which are professional studio monitor headphones with a frequency response of 15 Hz–28 kHz, were used in the experiments. The music player was developed using Visual Basic; all the tested music pieces (i.e., the seven versions of each of the eight music pieces described in Section 2.1) were loaded into the player in advance.

In the interval estimation experiment, the seven quality versions of one music piece were randomly listed on the music player (which also served as the measurement platform) with seven buttons. Music was played through the headphones after participants pressed any of the buttons. A trackbar control was placed next to each button; participants moved the slider to any position on the trackbar, based on their personal auditory perception, to signify their preference for the sound quality of a specific version. The left end of the trackbar indicated the lowest sound quality, and the right end indicated the highest. For data collection, the lowest sound quality had a value of 0 and the highest a value of 100; intermediate levels of sound quality corresponded linearly to the slider position. Participants could repeatedly listen to any of the seven versions. After the seven versions of one music piece were rated, the next music piece appeared on the platform. The experiment was completed when the participants finished rating all eight music pieces, which appeared on the music player in random order.

In the paired comparison experiment, as in the interval estimation experiment, each music piece was regarded as a block, and the eight music pieces were presented on the platform in random order. For each piece, the 21 pair combinations generated from the seven quality versions appeared on the platform in random order. A pair of quality versions was played through the headphones after participants pressed the corresponding button; the playing sequence within a pair was also random. Participants then conducted the paired comparison and selected the version they perceived as having superior sound quality. To prevent the participants from being influenced by the temporal masking effect, the two quality versions were played separately with a time interval of 1 s. No time limit was imposed on the two experiments. To account for auditory fatigue, participants could adjust their own pace in the experiments and take breaks as required. On average, each participant spent approximately 3 h on the interval estimation and paired comparison experiments.
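For concreteness, Eqs. (1)–(3), (5) and (6) can be assembled into a minimal THL sketch. The data layout (a list of per-frame (frequency, amplitude) partials) and all names are our own assumptions; a real implementation would first extract tonal partials from a time–frequency analysis of each excerpt:

```python
import math

def dissonance(f1, f2):
    """Pair dissonance of two sine partials, Eqs. (1)-(2)."""
    x = abs(f1 - f2)
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)

def pair_roughness(f1, f2, a1, a2):
    """Amplitude-weighted roughness of one partial pair, Eq. (3)."""
    weight = (a1 * a2) ** 0.1 * 0.5 * (2.0 * min(a1, a2) / (a1 + a2)) ** 3.11
    return weight * dissonance(f1, f2)

def tonal_harmony_level(frames, bark_bands):
    """THL, Eqs. (5)-(6): time-averaged per-Bark roughness summed over Barks.

    frames     : list of frames; each frame is a list of (freq, amp) tuples
                 sorted by frequency.
    bark_bands : list of (f_lo, f_hi) edges, one tuple per Bark band.
    """
    thl = 0.0
    for f_lo, f_hi in bark_bands:                        # i-th Bark band
        band_total = 0.0
        for frame in frames:
            in_band = [p for p in frame if f_lo <= p[0] < f_hi]
            # inner sum of Eq. (5): roughness of adjacent partials in the band
            r = sum(pair_roughness(p[0], q[0], p[1], q[1])
                    for p, q in zip(in_band, in_band[1:]))
            band_total += r / (f_hi - f_lo)              # divide by bandwidth
        thl += band_total / len(frames)                  # time average: R_t(i)
    return thl                                           # Eq. (6): sum over Barks
```

With frames and Bark-band edges obtained from an actual analysis of the 56 stimuli, Table 2 would correspond to evaluating this function on each stimulus.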

3. Results

3.1. Tonal harmony level

According to Eq. (6), the THLs were calculated for the eight music pieces at the seven quality levels, as listed in Table 2, which shows that changing the c value in Eq. (4) influenced the THL values. Notably, the THLs for Level 1 are values measured from the eight music pieces with the original CD sound quality.

3.2. The experiment of interval estimation

The minimal value for the interval estimates was 0, and the maximal value was 100. To avoid bias among the participants, within-subject standardization was performed. The within-subject normalization was conducted across all 8 pieces of music and 7 levels of music quality within one participant. In other words, the z values for one participant were calculated based on 56 observations for the interval estimation. Table 3 lists the standardized mean z values of participant preferences for the music pieces at various


Fig. 5. a. Regression analysis of acoustic roughness (THL) and the preference in interval estimates for the eight music pieces as a whole. b. Regression analysis of acoustic roughness (THL) and the preference in interval estimates for the four types of music.


quality levels. Positive values here indicate that the perceived music quality is above the average of all music pieces, whereas negative values imply inferior music quality. Overall, regardless of music type, and except for Levels 1 and 2, when the sound quality was low, the level of preference for the sound quality was also low. The THLs obtained in Section 3.1 for the eight music pieces at seven quality levels and the standardized interval estimates of subjective music preference were used to construct a scatter plot and conduct regression analysis, to observe how well THL predicts the interval estimates of subjective music preference. According to Fig. 5a, the R² value obtained in the simple linear regression analysis for the eight music pieces as a whole was 0.618. When the music was categorized into four types, the first, second, and third types of music were subjected to simple linear regression analysis, and the fourth type of music was subjected to logarithmic regression. The obtained R² values ranged between 0.861 and 0.970. According to Fig. 5b, for the first and third types of music, which involve numerous instruments and have complex structures, the prediction conducted using THL was relatively accurate; the R² values for these two types were higher than those for the second and fourth types, which involve few instruments and have simple structures. For the fourth type of music, which contained human vocals and few instruments, the relationship between THL and preference was nonlinear, particularly when the THL was low. Thus, the performance of logarithmic regression (R² = 0.861) was superior to that of linear regression (R² = 0.766).

3.3. The experiment of paired comparison

The aforementioned approach adopted by Woods et al. (2009) was used in this study to calculate the preference measured in the paired comparison.
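This scaling step (Eqs. (9)–(11), together with the clamping of unanimous choices to 0.98/0.02 described in this section) can be sketched as follows; the matrix layout and all names are our own:

```python
from statistics import NormalDist

def case_v_scores(wins):
    """Thurstone Case V scale values from paired-comparison counts.

    wins[i][j] = how often stimulus i was preferred over stimulus j.
    Returns the average z value of each stimulus, per Eqs. (9)-(11).
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            p = wins[i][j] / (wins[i][j] + wins[j][i])  # Eq. (9)
            p = min(max(p, 0.02), 0.98)                 # clamp unanimous picks
            total += inv_cdf(p)                         # Eq. (10)
        scores.append(total / n)                        # Eq. (11)
    return scores
```

Feeding in the 7 × 7 count matrix pooled over all pieces would reproduce the average z values reported for the eight music pieces as a whole in Table 5.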
Table 4 displays the probabilities that each music quality version was selected as the preferred music quality; the probabilities were calculated by using Eq. (9) based on the measurement results of the paired comparison. For example, the probability that Level 1 was perceived to be superior to Level 5 by all experiment participants was 0.82. To facilitate meaningful z-

Table 4
The probabilities of preferences for various sound qualities for the eight music pieces as a whole. Each entry is the probability that the row level was preferred over the column level.

Prob.   Sound quality level
Level   1      2      3      4      5      6      7
1       –      0.59   0.63   0.71   0.82   0.85   0.91
2       0.41   –      0.67   0.71   0.76   0.81   0.89
3       0.38   0.33   –      0.59   0.73   0.84   0.88
4       0.29   0.29   0.41   –      0.76   0.83   0.87
5       0.18   0.24   0.27   0.24   –      0.77   0.87
6       0.15   0.19   0.16   0.17   0.23   –      0.87
7       0.09   0.11   0.12   0.13   0.13   0.13   –

value conversion, the probability of 1 was converted into 0.98, and the probability of 0 was converted into 0.02. Moreover, based on Table 4, the average z values of the sound quality preference were calculated using Eqs. (10) and (11), as shown in Table 5. For example, since Φ⁻¹(0.82) = 0.91, the z value of preference in Table 5 for the case in which Level 1 was perceived to be superior to Level 5 is 0.91. In addition, the average z values of the sound quality preference for the four music types and the eight music pieces were calculated using the same method, as shown in Table 6. Note that the bottom row "Overall" in Table 6 corresponds to the rightmost column of averaged z values in Table 5. Regardless of music type, when the sound quality was low, the perceived music quality was low. Similar to the approach adopted for producing Fig. 5, the THLs for the various qualities of music and the standardized z values (Table 6) for the subjective music preference were used to construct a scatter plot and conduct regression analysis. According to Fig. 6a, the R² value obtained from the simple linear regression analysis of the eight music pieces as a whole was 0.707. In addition, for the four types of music, Fig. 6b suggests that, for the prediction of music preference in the paired comparison, the THL had superior predictability for the first, second, and third types of music in the simple linear regression analysis; the R² values ranged from 0.952 to 0.989. However, for the fourth type of music, which contained human singing and few instruments, the relationship between the THLs and the preference values was nonlinear. Thus, logarithmic regression analysis was conducted, and the obtained R² was 0.901. Fig. 6b suggests that the simple linear regression model can be used to predict preference for the first, second, and third music types, and that the logarithmic regression model is suitable for the fourth music type.
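The two regression forms used here (linear in THL for Types I–III, logarithmic for Type IV) can be sketched with numpy; the function name and the data passed to it are illustrative placeholders, not the study's measurements:

```python
import numpy as np

def fit(thl, z, log_transform=False):
    """Least-squares fit z ~ b0 + b1*THL (or b1*ln(THL)); returns (coeffs, R^2)."""
    x = np.asarray(thl, dtype=float)
    z = np.asarray(z, dtype=float)
    if log_transform:
        x = np.log(x)                       # logarithmic model for Type IV
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((z - z.mean()) ** 2)
    return beta, r2
```

For example, fit(thl_type4, z_type4, log_transform=True) would mirror the Type IV fit, where thl_type4 and z_type4 are hypothetical arrays of THLs and mean preference z values.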
This result is consistent with that described in Section 3.2.

3.4. Model construction for the subjective music preference

Based on the verification procedures for ordinal and interval scales depicted in Fig. 4 (Rossi et al., 2005), correlation analysis was conducted on the results of interval estimation (Section 3.2) and paired comparison (Section 3.3) (Fig. 7). The reliability of the jury test results was verified using this correlation analysis. According to Fig. 7, the Pearson correlation coefficient (r) was 0.896, indicating a strong correlation between the results of interval estimation and those of paired comparison. Thus, the data provided by the participants during the jury test were robust and reliable. Although paired comparison required numerous comparisons, each comparison was local and therefore imposed less mental workload than the global judgments used in interval estimation. According to Sections 3.2 and 3.3, the results of paired comparison had superior

Table 5
The z values of preferences relative to various sound qualities and the average z values of the preferences for the eight music pieces as a whole.

Sound quality level     1      2      3      4      5      6      7     Avg. z
1                       –     0.22   0.32   0.56   0.91   1.02   1.36   0.626
2                      0.22    –     0.45   0.56   0.71   0.87   1.25   0.516
3                      0.32   0.45    –     0.22   0.62   0.98   1.20   0.321
4                      0.56   0.56   0.22    –     0.71   0.94   1.10   0.203
5                      0.91   0.71   0.62   0.71    –     0.74   1.10   0.156
6                      1.02   0.87   0.98   0.94   0.74    –     1.10   0.492
7                      1.36   1.25   1.20   1.10   1.10   1.10    –     1.018


Table 6
The mean z values of the subjective preference for music at various quality levels.

                       Sound quality level
                1      2      3      4      5      6      7
Music piece 1  0.772  0.803  0.284  0.099  0.005  0.847  1.106
Music piece 2  0.798  0.720  0.659  0.477  0.161  0.757  1.760
Music piece 3  0.726  0.970  0.719  0.562  0.235  0.982  1.760
Music piece 4  1.108  0.737  0.787  0.000  0.045  0.923  1.665
Music piece 5  1.342  1.127  0.665  0.232  0.527  1.078  1.760
Music piece 6  1.120  1.366  0.784  0.059  0.491  1.078  1.760
Music piece 7  0.953  1.085  0.789  0.209  0.198  1.174  1.665
Music piece 8  1.265  1.144  0.920  0.036  0.629  1.072  1.665
Type Ⅰ         0.788  0.740  0.482  0.290  0.083  0.799  1.418
Type Ⅱ         0.101  0.070  0.160  0.327  0.034  0.154  0.470
Type Ⅲ         1.065  0.930  0.505  0.361  0.309  0.874  1.679
Type Ⅳ         0.850  0.658  0.300  0.055  0.182  0.472  1.099
Overall        0.626  0.516  0.321  0.203  0.156  0.492  1.018


K.H. Tang et al. / International Journal of Industrial Ergonomics 53 (2016) 219e228

Fig. 6. a. Regression analysis of the acoustic roughness (THL) and the preference in paired comparison for the eight music pieces as a whole. b. Regression analysis of the acoustic roughness (THL) and the preference in paired comparison for the four types of music.

predictability in the regression analysis; thus, the results from paired comparison were used to construct the model of subjective music preference. The upper part of Table 7 shows the prediction models for the subjective quality levels of the four music types; the R2 values ranged between 0.901 and 0.989. Because the data points for the first, second, and third types of music in Fig. 6b overlapped, the four types of music were divided into two groups: Group I comprised the first, second, and third types, and Group II contained the fourth type. The middle part of Table 7 lists the regression models for the two groups, and Fig. 8 depicts the scatter plot. The R2 value for Group I was 0.955. The lower part of Table 7 shows the overall regression model; its R2 value of 0.707 was significantly lower than those for Groups I and II.

Fig. 8 shows that the relationship between the THL and the subjective sound quality was linear for Group I but nonlinear for Group II. For Group II, before the THL reached a threshold, a change in the THL within the range from 5 to 15 corresponded to a substantial change in the subjective music quality; conversely, as the THL increased from 20 to 50, the subjective music quality increased only slightly. This suggests that the participants' sensitivity to changes in the THL was high when the THL was small and decreased after the THL exceeded the threshold.

4. Conclusion

The results of interval estimation and paired comparison showed that the measurement of the THL for the various music pieces reflected the participants' subjective perception of the sound quality: when the THL decreased, the perceived preference decreased. Thus, the THL reflects the subjective perception of music quality and can therefore be used to construct a model for predicting subjective sound quality.

Differences in the participants' perception of the four types of music caused the variance among the prediction models. When categorized


Fig. 7. Comparison of results from interval estimation and paired comparison regarding preference for music quality.
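The agreement between the two procedures reported above (r = 0.896) is an ordinary Pearson product-moment correlation. A self-contained sketch, using made-up paired scores rather than the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-stimulus scores from the two jury-test procedures.
interval_est = [2.1, 3.4, 4.0, 4.8, 5.9, 6.5]    # interval estimation
paired_cmp = [-1.0, -0.4, 0.1, 0.3, 0.9, 1.2]    # paired-comparison z
r = pearson_r(interval_est, paired_cmp)
```

An r close to 1 indicates that the two scales rank and space the stimuli consistently, which is the reliability check performed in this section.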

Table 7
Models for the prediction of subjective music quality and the R2 values.

           Prediction model for subjective music quality     R2
Type Ⅰ     SMQ = 0.069 × THL − 2.408                         0.952
Type Ⅱ     SMQ = 0.086 × THL − 2.925                         0.969
Type Ⅲ     SMQ = 0.085 × THL − 2.748                         0.989
Type Ⅳ     SMQ = 1.403 × ln(THL) − 3.839                     0.901
Group Ⅰ    SMQ = 0.080 × THL − 2.682                         0.955
Group Ⅱ    SMQ = 1.403 × ln(THL) − 3.839                     0.901
Overall    SMQ = 0.060 × THL − 1.810                         0.707
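The fitted group models can be applied directly; a minimal sketch using the published Group I and Group II coefficients (the sensitivity comment is an illustrative reading of the logarithmic form, not a result stated by the authors):

```python
import math

def smq_group1(thl):
    """Group I (music types I-III): linear model, SMQ = 0.080*THL - 2.682."""
    return 0.080 * thl - 2.682

def smq_group2(thl):
    """Group II (type IV, vocal-dominated): SMQ = 1.403*ln(THL) - 3.839."""
    return 1.403 * math.log(thl) - 3.839

# For the log model, the sensitivity d(SMQ)/d(THL) = 1.403 / THL, so a
# unit change in THL matters about ten times more at THL = 5 than at 50.
sens_low, sens_high = 1.403 / 5, 1.403 / 50
```

In practice, a new stimulus would be routed to the Group I or Group II model according to its music type before its THL is converted into a predicted quality score.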

into two groups to construct the prediction models, the R2 values for the Group I and Group II models (0.955 and 0.901, respectively) were both significantly higher than that of the overall regression model (R2 = 0.707), implying that the construction of subjective music quality prediction models should also consider music type. When efficiency and accuracy were simultaneously considered,

according to the results of this study, the two-group categorization approach can be adopted. The sound quality prediction model in this study was developed using the auditory roughness model proposed by Vassilakis (2005) and can be applied to product development in the electroacoustic industry. For example, after electroacoustic product prototypes are developed, the prediction model can be employed to rapidly assess users' preferences for the music sound quality by measuring the electroacoustic parameters without the presence of the users. Moreover, after acquiring the capability to simulate the physical characteristics of product components, manufacturers who apply computer-aided engineering techniques can, to some extent, predict the quality of music played by electroacoustic products by employing the proposed prediction model during the engineering design stage. This can substantially reduce the time and cost spent on repeatedly developing product prototypes and testing user preferences.

Fig. 8. Scatter plot and regression models for Group I (first–third types of music) and Group II (fourth type of music).
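The two regression forms compared in Fig. 8 (linear in THL versus linear in ln THL) can be fitted by ordinary least squares and compared via R2; the data points below are illustrative, not the study's measurements.

```python
import numpy as np

# Illustrative (THL, preference-z) pairs; not the study's data.
thl = np.array([5.0, 8.0, 12.0, 20.0, 30.0, 45.0])
z = np.array([-1.6, -0.9, -0.3, 0.3, 0.8, 1.3])

# Degree-1 least-squares fits for the two candidate model forms.
a_lin, b_lin = np.polyfit(thl, z, 1)           # z = a*THL + b
a_log, b_log = np.polyfit(np.log(thl), z, 1)   # z = a*ln(THL) + b

def r_squared(y, y_hat):
    """Coefficient of determination used to compare the two fits."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

r2_lin = r_squared(z, a_lin * thl + b_lin)
r2_log = r_squared(z, a_log * np.log(thl) + b_log)
```

Selecting the form with the higher R2 for each music type mirrors the model-selection step that produced the linear Group I model and the logarithmic Group II model in Table 7.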


Acknowledgment

The authors would like to thank the National Science Council, Taiwan, R.O.C. for the financial support of this research under Grant No. NSC-100-2221-E-035-078-MY3.

References

Fastl, H., Zwicker, E., 2007. Psychoacoustics: Facts and Models, third ed. Springer-Verlag Berlin Heidelberg, New York.
Fazekas, G., Sandler, M., 2007. Intelligent editing of studio recordings with the help of automatic music structure extraction. In: 122nd Audio Engineering Society Convention, May, pp. 5–8.
Helmholtz, H.L.F., 1885. On the Sensations of Tone as a Physiological Basis for the Theory of Music, 2nd English Edition. Dover, New York.
ISO, 1993. ISO/IEC 11172-3 Information Technology – Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1,5 Mbit/s – Part 3: Audio. International Standards for Business, Government and Society.
Jeon, J.Y., Sato, S.I., 2008. Annoyance caused by heavyweight floor impact sounds in relation to the autocorrelation function and sound quality metrics. J. Sound Vib. 311, 767–785.
Kahana, Y., Nelson, P.A., Kirkeby, O., 1997. Objective and subjective assessment of systems for the production of virtual acoustic images for multiple listeners. In: 103rd Audio Engineering Society Convention, Sep, pp. 26–29.
Kallinen, K., Ravaja, N., 2007. Comparing speakers versus headphones in listening to news from a computer – individual differences and psychophysiological responses. Comput. Hum. Behav. 23, 303–317.
Kowalgin, Y., Gamage, Y., 2002. Algorithms of digital audio data compression: standards, problems and perspectives of development. In: 21st Audio Engineering Society Conference, June, pp. 1–3.
Lu, W., Petiot, J.F., 2014. Affective design of products using an audio-based protocol: application to eyeglass frame. Int. J. Ind. Ergon. 44, 383–394.
Moon, H., Han, S.H., Chung, J., 2015. Applying signal detection theory to determine the ringtone volume of a mobile phone under ambient noise. Int. J. Ind. Ergon. 47, 117–123.
Plomp, R., Levelt, W.J.M., 1965. Tonal consonance and critical bandwidth. J. Acoust. Soc. Am. 38, 548–560.
Poirson, E., Petiot, J.F., Richard, F., 2010. A method for perceptual evaluation of products by naive subjects: application to car engine sounds. Int. J. Ind. Ergon. 40, 504–516.
Raggam, R.B., Cik, M., Holdrich, R.R., Fallast, K., Gallasch, E., Fend, M., Lackner, A., Marth, E., 2007. Personal noise ranking of road traffic: subjective estimation versus physiological parameters under laboratory conditions. Int. J. Hyg. Environ. Health 210, 97–105.
Rossi, G.B., Crenna, F., Panero, M., 2005. Panel or jury testing methods in a metrological perspective. Metrologia 42, 97–109.
Sethares, W.A., 1993. Local consonance and the relationship between timbre and scale. J. Acoust. Soc. Am. 94, 1.
Thurstone, L.L., 1927. A law of comparative judgment. Psychol. Rev. 34, 273–286.
Usher, J.S., 2006. Subjective Evaluation and Electroacoustic Theoretical Validation of a New Approach to Audio Upmixing. McGill University, Canada.
Vassilakis, P.N., 2005. Auditory Roughness as a Means of Musical Expression. Selected Reports in Ethnomusicology: Perspectives in Systematic Musicology, 12.
Wang, Y.S., Shen, G.Q., Guo, H., Tang, X.L., Hamade, T., 2013. Roughness modelling based on human auditory perception for sound quality evaluation of vehicle interior noise. J. Sound Vib. 332, 3893–3904.
Woods, R.L., Satgunam, P., Bronstad, P.M., Peli, E., 2009. Statistical Analysis of Subjective Preferences for Video Enhancement. Harvard Medical School, Boston, MA, USA.
Yoon, J.H., Yang, I.H., Jeong, J.E., Park, S.G., Oh, J.E., 2012. Reliability improvement of a sound quality index for a vehicle HVAC system using a regression and neural network model. Appl. Acoust. 73, 1099–1103.
Zwicker, E., 1961. Subdivision of the audible frequency range into critical bands. J. Acoust. Soc. Am. 33, 248.