Neural network predictions of speech levels in university classrooms


Applied Acoustics 62 (2001) 749–767 www.elsevier.com/locate/apacoust

Joseph Nannariello a,*, Murray Hodgson b, Fergus R. Fricke c

a Department of Architectural and Design Science, University of Sydney, NSW 2006, Australia
b School of Occupational and Environmental Hygiene and Department of Mechanical Engineering, University of British Columbia, 2206 East Mall, Vancouver, British Columbia V6T 1Z3, Canada
c Department of Architectural and Design Science, University of Sydney, NSW 2006, Australia

Received 8 August 2000; accepted 2 October 2000

Abstract

At the schematic design stage of a classroom there is a need for an expeditious and accurate method of predicting the distribution of sound levels (speech levels). The objective of this work is to investigate the possibility of developing a method of predicting the sound propagation (SP) in university classrooms, using artificial neural networks. Constructional and acoustical data for 34 randomly chosen unoccupied University of British Columbia (UBC) classrooms were used for the neural network analyses. One source position, both directional and omnidirectional sources, and a number of listener positions were chosen in each classroom making a combination of 182 cases available to train the neural networks. Assessments have been made of the method by comparing the predicted sound propagation obtained using neural networks with measured values, with predictions made using Barron's revised theory and the Hopkins–Stryker equation. The results indicate that there is a good basis for using trained neural networks to predict the distribution of sound levels in empty classrooms. The results also indicate that neural networks trained with variables which have a causal relationship to the acoustical quality of the UBC classrooms produce reliable and accurate predictions. RMS errors for sound propagation, in each of the frequency bands, are within the subjective difference limen for steady-state sound pressure levels, which is about 1 dB (i.e. ΔE/E = 0.26, where E is energy density). © 2001 Elsevier Science Ltd. All rights reserved.

* Corresponding author. Fax: +61-2-9351-3031. E-mail address: [email protected] (J. Nannariello).
0003-682X/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0003-682X(00)00088-8


J. Nannariello et al. / Applied Acoustics 62 (2001) 749–767

1. Introduction

Classrooms are primarily auditory-verbal learning environments in which students must have access to acoustic signals (speech) in order to understand and learn. University classrooms are designed and purpose-built for this sole reason, that is, for speech communication from a source to a number of listeners. These rooms should be designed to ensure that adequate acoustical conditions are a practical reality. This may be done by planning for the highest possible degree of speech intelligibility, while maintaining a natural sound quality. The two important ingredients of an acoustically optimized classroom are a sufficiently low (relative to the speech level) background noise level and a sufficiently low reverberation time. The background noise may be from a heating/ventilation unit, activity in an adjacent classroom or corridor, traffic or aircraft noise, student activity within the classroom, or any combination of these. Reverberation is the multiple reflection of sounds within a room that can prolong and distort the original sound components. Research has suggested that, to obtain high intelligibility for normal-hearing adults working in their first language, the optimal mid-frequency reverberation time should be 0.4–0.75 s. Furthermore, the optimal background noise level is 35 dB(A) [1–4]. Values which exceed these levels will result in an inadequate signal-to-noise ratio, S/N (S/N must exceed 15 dB), and hence a poor teaching environment [2].

There is an extensive body of literature which documents the effects of poor classroom acoustics on speech intelligibility. Nevertheless, testing for speech intelligibility remains a difficult task. Not only are there acoustical variables to be considered, but the skills of the speaker and listeners, as well as the speech content, need to be considered. In the past 20 years much has been written about speech intelligibility, and the literature is replete with conflicting data and conclusions [3–9].

Speech intelligibility, and the various predictors [7–10] of it, will not be discussed here. However, speech intelligibility is dependent upon the reverberation time (RT60), the background noise level, and the sound level. The accurate prediction of the distribution of the sound level in university classrooms, at the conceptual design stage, is the main concern of this study.

1.1. Distribution of sound levels in rooms

There are a number of methods available for predicting the distribution of sound levels in rooms. The use of empirical equations allows the prediction of the variation of sound level with distance, the absorption of the internal surfaces and reverberation time [11,12]. Acoustic scale models and ray-tracing techniques are two other methods for predicting the distribution of sound levels in an enclosure [13–15]. In addition, a number of theoretical models exist that can be used to calculate sound level in rooms. The most widely used is the Hopkins–Stryker equation [16]. Speech intelligibility scores can be predicted at the drawing board stage from estimated source levels and the characteristics of the room. This is done by first calculating, in octave bands and at each receiver position, long-time averaged speech levels using [9]:

$$L_p(r) = L_w + 10 \log\left(\frac{Q}{4\pi r^2} + \frac{4}{A}\right) + 0.1 \ \text{dB} \qquad (1)$$

where Lp(r) is the sound pressure level in dB at a distance r (m) from the source, Q is the directivity of the source, Lw is the sound power level of the source in dB, and A is the total sound absorption in the room in square metres. The predicted reverberant speech level [second term of Eq. (1)] is an acoustic parameter pertinent to understanding how speech intelligibility is obtained. The predicted reverberant and direct sound levels [the direct level is the first term in Eq. (1)] can be compared with the objective of obtaining a positive direct-to-reverberant sound ratio [4]. Unfortunately, this expression is only suitable for predicting the sound level from a steady sound source in a fully diffuse field, and research and listening experience have shown that the results from the traditional equation can be very inaccurate and usually produce predictions which are larger than measured values [29].

A number of studies have been carried out of sound level distribution in concert halls and in theatres for speech. The total sound level in a concert hall was calculated by Barron [17,18] based on the statistical theory of room acoustics. The so-called 'revised theory' was proposed to account for the variations of early sound energies among the seats, by dividing the received sound energy of the impulse response into three components: the direct sound, d, the early sound (delay < 80 ms), er, and the late sound (delay > 80 ms), l. Also, reverberation time, RT60, or early decay time, EDT10, room volume, V, and source-receiver distance, r, were used [17]. This technique was applied to theatres, and equations were developed, with three additional features, to predict the total sound level. The three additional features related to the directivity of the source, prediction of the early sound, and the energy lost through a proscenium opening. Barron's expression for the total sound level at a receiver position is:

$$\text{Total sound level} = 10 \log(e_d + l) \ \text{dB} \qquad (2)$$

where

$$e_d = (d + e) = \frac{100\, Q\, n}{r^2} \qquad (3)$$

$$l = \frac{31200\, f\, RT_{60}}{V}\, e^{-0.04 r / RT_{60}}\, e^{-0.68 / RT_{60}} \qquad (4)$$

and where e_d is the early sound energy including the direct sound, Q is the directivity factor, n is the early reflection ratio, l is the late sound energy (a constant is changed from the concert hall revised theory since the time limit is now 50 ms), f is the energy fraction in the case of a proscenium stage (for classrooms f equals unity), RT60 is the reverberation time in seconds, and V is the volume of the enclosure in cubic metres. Eq. (2) can be used to calculate the total speech level; however, a simple method to predict sound level distribution in classrooms would be useful.

Most recently, Hodgson et al. [19] have developed a method for determining typical long-term speech levels and background noise levels during lectures. Multivariable


linear regression analysis was used to develop empirical prediction models. The following statistical model predicted room-average A-weighted instructor speech level SL:

$$SL = 48.5 - 2.6\, I_{sex} + 0.58\, SA - 4.0 \log r + 0.013\, V - 11.7 \log A_0 \ \text{dB} \qquad (5)$$

where Isex is the instructor's sex (0 for male and 1 for female), r is the instructor/student distance, V is the volume of the room, A0 is the total occupied room absorption, and SA is the student-activity noise level. The statistical model (independent of SL) which predicted SA is:

$$SA = 83 + 10.0 \log n - 34.4 \log A_0 + 0.081\, A_0 \ \text{dB} \qquad (6)$$

where n is the number of students. The use of these statistical models to predict speech levels at the schematic design stage of a classroom appears valid; however, as concluded in Ref. [19], the technique requires further validation. The statistical models for SL and SA accounted for only 66 and 69% of the variance, respectively [19], so predictions made with these models may be poor. It is for these and other reasons, such as the lack of success in applying traditional techniques and the extended time required to conduct calculations, that it is worth investigating an alternative approach to predicting sound levels in rooms. One such approach is the use of artificial neural networks.

1.2. Neural network analysis

Neural network analysis is an alternative computational paradigm to the conventional machine computation method introduced by von Neumann [20]. It involves computer models inspired by knowledge from neuroscience, though they do not try to be biologically realistic in detail. They consist of processing elements, or units, that attempt to model the percipience and intuitive functions of the human brain. Essentially, neural network analysis involves the design of a specific network architecture that includes a specified number of layers, each consisting of a certain number of either input, hidden, or output neurons [21]. The training of the network is an iterative process. One of various learning algorithms is used to adjust the weights of the connections between the neurons in the network in order to optimally predict (find a best fit to) the sample data on which the training is performed. The trained networks can then be used to generate predictions for unknown cases.
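The iterative weight adjustment described above can be illustrated with a minimal sketch of a one-hidden-layer feedforward network trained by back-propagation. It is not the network or software used in this study: the function name `train_mlp`, the layer size, the learning rate and the toy data are all assumptions chosen only to keep the example small.

```python
import math
import random


def train_mlp(samples, n_hidden=4, lr=0.1, epochs=3000, seed=0):
    """Train a one-hidden-layer network by back-propagation (illustrative only).

    samples: list of (inputs, target) pairs, where inputs is a list of floats.
    Returns a function mapping an input list to the predicted output.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Hidden-layer weights (last entry of each row is the bias) and output weights.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
        y = sum(w * v for w, v in zip(w_o, h + [1.0]))
        return h, y

    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            # Propagate the error back to the hidden units before updating w_o.
            delta_h = [err * w_o[j] * (1.0 - h[j] ** 2) for j in range(n_hidden)]
            for j, v in enumerate(h + [1.0]):
                w_o[j] -= lr * err * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_h[j][i] -= lr * delta_h[j] * v

    return lambda x: forward(x)[1]


# Hypothetical toy data: the target falls linearly with log-distance.
data = [([math.log10(r)], -math.log10(r)) for r in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0)]
predict = train_mlp(data)
rms = math.sqrt(sum((predict(x) - t) ** 2 for x, t in data) / len(data))
print(f"training RMS error = {rms:.4f}")
```

Each pass compares the network output with the known target and nudges every weight in the direction that reduces the squared error, which is the "best fit" search referred to in the text.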
While one of the major advantages of neural networks is that the user need not have any knowledge of the underlying model on which input variables depend, the predominant disadvantage is that it is difficult to interpret the results in traditional analytic terms. There are other important limitations to the successful training of neural networks. Much has been written and discussed about the phenomena of neural networks, their limitations, the software that has been developed in the past


20 years, their possible application to practical problems, and the theory. This information can be found in numerous publications [21–26]. The recent resurgence in neural network research and the consequent increase in its applications have been due to the remarkable ability of neural networks to derive meaning from complicated or imprecise data. Artificial neural networks (multilayer feedforward networks), based on the supervised learning procedure and the back-propagation learning algorithm, have been used extensively in the areas of structural and civil engineering. However, the use of neural networks to predict room acoustical parameters, such as the work by Nannariello and Fricke [27,32,36] on the prediction of reverberation time, RT60, and strength factor, G, is fairly recent.

1.3. Objectives of current work

The main objective of this study is to investigate the possibility of developing a method of predicting sound levels using neural network analysis, which is devoid of the complexities associated with wave or geometrical acoustic theory. The new method should be capable of making reliable and accurate predictions, in octave bands, of the sound propagation, SP, at various listener positions in empty classrooms. SP is the received steady-state sound pressure level, Lp, minus the source sound power level, Lw, in dB. It is a useful measure describing how the room and its contents, independent of the source output level, affect the variation of sound pressure level and, thus, speech level, with distance from a source and the acoustic characteristics of the room [28]. The study also aims to investigate whether neural networks, trained only with the variables that are fundamental to the neural network analysis, and to the prediction of sound propagation in classrooms, can make useful and accurate predictions. The accuracy should be within the subjective difference limen of the sound pressure level, which is approximately 1 dB (i.e. ΔE/E = 0.26, where E is the energy density) [6]. In order to test the effectiveness of neural networks and their ability to make accurate predictions, comparisons will be made to actual measured octave-band sound pressure levels, and to predictions by Barron's revised theory and the Hopkins–Stryker equation.

2. Procedure and source of input data

2.1. Source of input data for neural network analyses

As reported in detail in Ref. [28], measurements of sound pressure levels and reverberation times were made in 34 randomly selected, unoccupied classrooms at the University of British Columbia (UBC). This represents 8% of the UBC classrooms. The classrooms varied from small seminar rooms with volumes of 110 m³ seating about 20 students, to large lecture theatres with a maximum volume of 3890 m³ seating over 500. The largest proportion of rooms had 40–60 seats, volumes of 250–500 m³, and volume-to-surface-area ratios of about 1.0 m. The classroom plan


shapes were mainly rectangular, with a level floor. A few of the larger classrooms had a sloping floor and seating. The range of data for the 34 classrooms used for the neural network analyses is shown in Tables 1 and 2. The critical distance in classrooms varies significantly, depending on the volume and other factors. Typically, the student position closest to the instructor is at a distance of about 1, 2 and 4 m in small, medium and large classrooms, respectively. The farthest receiver is about 4, 8 and 16 m from the speaker, respectively. The range of source-receiver distances of interest in the present work was 0.5–20.0 m (see Table 1).

Sound level and reverberation time measurements were performed in the classrooms at typical student seating positions, with the source at the typical lecturing position. An omnidirectional and a directional (speech) source, located on the longitudinal axis and 1–3 m from the chalkboard wall, were used in separate measurements. The omnidirectional source consisted of a loudspeaker array located centrally at 1.5 m above the floor. The directional (speech) source radiated with directivity approximating that of a typical human speaker and was located at 1.4 m above the floor. The source was directed towards the classroom seating area. Procedures and particulars of the classroom measurements are described in Ref. [28].

Listener positions were chosen to represent typical seating positions towards the central front, middle and rear, and towards the side of the classrooms. These were at 1.5 m above the floor on the longitudinal axis of the classroom (for the omnidirectional source), and at 1.1 m above the floor at the side of the classroom for the directional (speech) source. At each receiver position, and in octave bands from 125 to 2000 Hz, the quantities SP and reverberation time, RT60, were determined from the measured impulse response and source power levels.
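For orientation, Eq. (1) can be evaluated directly to show how SP = Lp − Lw falls with distance and where the direct and reverberant terms become equal (the critical distance). The sketch below is illustrative only. It assumes the Sabine relation A = 0.161 V/RT60 to estimate the absorption, which is not stated in the text, and the room values and function names are hypothetical.

```python
import math


def sp_hopkins_stryker(r, volume, rt60, q=1.0):
    """SP = Lp - Lw (dB) from Eq. (1) at source-receiver distance r (m).

    Assumption (not from the source): absorption A is estimated from the
    Sabine relation A = 0.161 * V / RT60.
    """
    absorption = 0.161 * volume / rt60
    direct = q / (4.0 * math.pi * r ** 2)
    reverberant = 4.0 / absorption
    return 10.0 * math.log10(direct + reverberant) + 0.1


def critical_distance(volume, rt60, q=1.0):
    """Distance (m) at which the direct and reverberant terms of Eq. (1) are equal."""
    absorption = 0.161 * volume / rt60
    return math.sqrt(q * absorption / (16.0 * math.pi))


# Hypothetical medium-sized room: 500 m^3 with a 1.0 s reverberation time.
for r in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    print(f"r = {r:5.1f} m   SP = {sp_hopkins_stryker(r, 500.0, 1.0):6.1f} dB")
print(f"critical distance = {critical_distance(500.0, 1.0):.2f} m")
```

Beyond the critical distance the diffuse-field term dominates and Eq. (1) predicts an almost constant level, which is one reason it tends to overestimate measured levels at the rear of real classrooms.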
Table 1
Descriptive statistics and main physical characteristics of the 34 classrooms and 182 listener positions used to 'train', 'verify' and 'test' the neural networks

Variable   Mean     Minimum   Maximum   Range     S.D.
V (a)      623.77   110.00    3890.00   3780.00   758.94
LMX (b)     12.10     4.35      24.00     19.65     4.70
DMX (c)      9.71     3.15      19.60     16.45     3.93
WMX (d)     11.11     4.35      21.00     16.65     4.13
HMX (e)      3.98     2.20      11.50      9.30     1.81
ST (f)     130.59    25.60     472.26    446.66   108.84
ROS (g)      3.89     0.50      20.00     19.50     3.99

(a) V = volume (m³). (b) LMX = maximum length (m). (c) DMX = distance from front of podium to rearmost wall (m). (d) WMX = maximum width (m). (e) HMX = maximum height (m). (f) ST = floor area (m²). (g) ROS = source–listener distance (m).

Reverberation times were based on the -5 to -25

dB part of the sound decay. The acoustical and constructional data of the classrooms were used for the neural network analyses.

2.2. Input and output data for neural network analysis

Neural networks generalize the knowledge implicit in the 'data' or 'training' set and provide solutions to new situations. It is for this reason that neural network analysis, using a combination of geometric, constructional and acoustical data, has been used in this study to develop an alternative method of comprehending and predicting the sound propagation in classrooms.

Table 2
Descriptive statistics and main physical characteristics of the 30 classrooms and 162 listener positions used to 'train' neural networks

Variable       Mean     Minimum   Maximum   Range     S.D.     Skewness   Kurtosis
V (a)          578.67   110.00    3890.00   3780.00   777.68   0.81        0.32
LMX (b)         11.78     4.35      24.00     19.65     4.74   0.14        0.57
DMX (c)          9.53     3.15      19.60     16.45     4.02   0.24        0.57
WMX (d)         10.51     4.35      20.20     15.85     3.58   0.21        0.04
HMX (e)          3.92     2.20      11.50      9.30     1.89   1.71        3.44
ST (f)         119.47    25.60     472.26    446.66   105.73   0.35        0.51
SX (g)           0.90    -2.50       1.10      3.60     0.84   2.13        6.07
SY               0.17    -1.20       1.50      2.70     0.40   0.66        0.68
SZ               1.45     0.60       1.90      1.30     0.19   3.26       14.31
RX (h)           2.79    -2.10      17.28     19.38     3.71   0.70        0.53
RY               0.39    -5.85       5.80     11.65     1.31   0.77        2.33
RZ               1.57     0.55       5.00      4.45     0.66   0.89        3.21
ROS (i)          3.88     0.50      20.00     19.50     3.99   0.53        0.75
RTA_125 (j)      1.22     0.43       2.77      2.35     0.48   0.30        0.63
RTA_250          1.35     0.40       3.54      3.14     0.76   0.91        0.72
RTA_500          1.06     0.41       3.15      2.73     0.60   0.74        0.02
RTA_1000         0.96     0.47       2.85      2.38     0.58   1.09        0.51
RTA_2000         0.93     0.43       2.36      1.93     0.45   0.72        0.04
SP_125 (k)     -13.30   -28.20      -5.37     22.83     6.09   0.41        0.88
SP_250         -11.97   -25.70      -5.39     20.31     5.17   0.66        0.24
SP_500         -12.17   -23.20      -5.73     17.47     4.78   0.42        0.67
SP_1000        -12.00   -21.60      -6.16     15.44     4.41   0.44        0.72
SP_2000        -11.18   -19.60      -6.05     13.55     4.09   0.47        0.84

(a) V = volume (m³). (b) LMX = maximum length (m). (c) DMX = distance from front of podium to rearmost wall (m). (d) WMX = maximum width (m). (e) HMX = maximum height (m). (f) ST = floor area (m²). (g) SX, SY, SZ = source coordinates. (h) RX, RY, RZ = listener coordinates. (i) ROS = source–listener distance (m). (j) RTA_125 to RTA_2000 = octave frequency band reverberation times at listener positions (s). (k) SP_125 to SP_2000 = sound propagation (dB) in the 125 to 2000 Hz octave frequency bands.

The constructional and acoustical

data available were for a limited number of university classrooms and, more importantly, for a limited number of listener positions. A number of techniques were applied in an attempt to increase the size of the 'training' set for the neural network analyses. One such technique was to include bound values in the 'training' set. The values of the bounds were determined on the basis that, for an omnidirectional source, the sound level at a listener position 0.5 m from the source is the same in every classroom, irrespective of the room conditions. This is also the case for the sound level at a listener position 1.0 m from the source. For the bound values in each classroom, the averaged reverberation time, in each octave frequency band, was used in the analyses. In each of the 34 classrooms the resulting number of listener positions varied between 4 and 7, averaging 5.35, making a combination of 182 cases available to be used for the neural network analyses.

The 'training' set, and in particular the combination of input variables, is critical to the probability of a successful solution. The number of input variables is limited only by the number of classrooms and by the amount of information available and relevant to the process at hand. Because of the limited number of classrooms and combinations of source–listener measurements, the number of input variables in the study was of critical importance. Of the 30 'candidate' input variables investigated, only 15 were finally considered for use in the neural network analyses. The 'candidate' variables were selected heuristically and from various publications [2–10,19,28,29]. The basic criterion, a high causal relationship in predicting SP, was also used in the selection process. Linear and non-linear transformation models were used to establish the relationship between the input variables and between input variables and the output variable, SP.
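One simple way to screen candidate inputs for redundancy is a pairwise correlation check. The sketch below is an illustration only and is not the authors' procedure: the function names, the 0.95 threshold and all sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, threshold=0.95):
    """List pairs of candidate variables whose |correlation| reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical candidate inputs; V_ft3 is V in other units, so it adds nothing new.
volumes = [110.0, 250.0, 500.0, 1360.0, 3890.0]
candidates = {
    "V": volumes,
    "V_ft3": [v * 35.3147 for v in volumes],
    "ROS": [4.0, 0.5, 16.0, 1.0, 2.0],
}
redundant = redundant_pairs(candidates)
print(redundant)
```

A pair flagged here carries nearly the same information twice, so one of the two variables could be dropped without reducing what the network can learn.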
It was desirable that only variables having an orthogonal relationship should be used in the neural network analysis. Input variables were also chosen primarily for their simplicity and suitability.

2.3. Procedure: prediction of SP

The 15 input variables determined to have genuine information content, and hence used in the network analyses to predict SP, were: the source type, S-type; the classroom volume, V; the maximum length, LMX, maximum width, WMX, and maximum height, HMX; the maximum depth of room (distance from front of platform to rearmost wall), DMX; the total floor area, ST; the source position coordinates, SX, SY and SZ; the receiver/listener position coordinates, RX, RY and RZ (the intersection of the front of the platform and the longitudinal axis is the origin of the source and receiver coordinates); the source–receiver distance, ROS; and the receiver position reverberation time, RTA. Table 5 shows the set-up of the network functions used in the neural network analyses.

The significance of the above variables is mainly self-evident. The volume and dimensions (maximum length, width and height), and to some extent the total floor area, ST, are a measure of the size of the classroom. The source and receiver coordinates, and the source–receiver distance, are significant, as one would expect. The source type is also significant, as the directivity of the source affects the early sound component of the overall distribution of the sound in a classroom.


The most significant aspect concerning the selected variables is that, with the exception of the reverberation time, and without any laborious calculations, they can be specified to the architect at the schematic design stage in the form of estimated design criteria. Only simple rules of thumb are needed to verify and coordinate the data. The reverberation time of the classroom can be calculated using conventional methods or, preferably, can be predicted using neural networks [27].

Before initiating training, further processing of the 'training' sets was undertaken. As neural networks tend to perform much better when input data are normally distributed, skewness and kurtosis coefficient calculations were carried out (see Table 2). The data were transformed [30] to optimize the distribution: to normalize the data, a logarithmic (ln(x)) or root (y = √x) transformation was applied to all input variables except the nominal variable source type, S-type. Once the 'data' sets were defined, the neural networks were trained using a number of iterative algorithms such as back propagation, conjugate gradient and Levenberg–Marquardt. The neural network software used for the analyses was Statistica Neural Networks (SNN) V4 [31]. Conditions were set within the SNN by altering the software parameters; these techniques have been discussed elsewhere [31]. Training of the neural networks stopped when the RMS error of the 'testing' set could no longer be improved.

3. Results of neural network analyses

The neural network analyses were carried out with the 'training' set comprising 182 facts. Of these 182, approximately 10% were used for verification and a further 10% (4 classrooms) were set aside to be used for 'testing'; the networks were trained on the remaining facts using the network functions shown in Table 5.
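The normalizing step described in Section 2.3 can be sketched as follows: compute the skewness of a variable, then keep whichever of the untransformed, square-root or ln(x) versions is least skewed. This is an illustration under stated assumptions, not the procedure of Ref. [30]; the selection rule, the function names and the sample volumes are hypothetical.

```python
import math


def skewness(values):
    """Population skewness coefficient (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def best_transform(values):
    """Return (name, data) for the option with the smallest |skewness|.

    Only valid for strictly positive data, since ln(x) and sqrt(x) are used.
    """
    options = {
        "none": list(values),
        "sqrt": [math.sqrt(v) for v in values],
        "ln": [math.log(v) for v in values],
    }
    name = min(options, key=lambda k: abs(skewness(options[k])))
    return name, options[name]


# Hypothetical room volumes (m^3): a few large rooms give a long right tail.
volumes = [110.0, 160.0, 250.0, 300.0, 400.0, 500.0, 990.0, 1360.0, 3890.0]
name, transformed = best_transform(volumes)
print(f"skewness before = {skewness(volumes):.2f}")
print(f"chosen transform = {name}, skewness after = {skewness(transformed):.2f}")
```

Variables that can be zero or negative, such as the source and listener coordinates, would need an offset before either transformation could be applied.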
Table 3
Descriptive statistics and main physical characteristics of the four classrooms and 20 listener positions used to 'test' neural networks

Variable (a)   Mean     Minimum   Maximum   Range     S.D.
V              989.10   159.00    1360.00   1201.00   452.18
LMX             14.68     7.75      17.00      9.25     3.60
DMX             11.21     5.75      13.10      7.35     2.83
WMX             15.97     7.90      21.00     13.10     5.11
HMX              4.49     2.75       5.15      2.40     0.92
ST             220.67    47.52     292.12    244.60    92.30
SX              -1.14    -2.95       0.00      2.95     1.23
SY               0.58     0.05       1.95      1.90     0.74
SZ               1.50     1.50       1.50      0.00     0.00
RX               2.68    -2.45      11.92     14.37     3.90
RY               0.58     0.05       1.95      1.90     0.74
RZ               1.89     1.50       3.50      2.00     0.62
ROS              3.92     0.50      15.00     14.50     4.11
RTA_125          1.63     0.82       2.79      1.97     0.60
RTA_250          1.47     0.80       2.19      1.39     0.51
RTA_500          1.24     0.64       1.90      1.27     0.49
RTA_1000         1.11     0.53       1.80      1.27     0.54
RTA_2000         1.02     0.42       1.65      1.24     0.49

(a) For definitions of abbreviations see Table 2.

Table 4
Details of the four classrooms used to test neural networks and for which results are presented

Room   Volume (m³)   Maximum length (m)   Maximum width (m)   Maximum height (m)   'Global' R.T. (s)   Floor slope (°)
C3     1360.00       16.60                20.00               5.15                 0.78                12
C7      986.00       17.00                13.80               4.60                 1.87                10
C16     159.00        7.75                 7.90               2.75                 0.74                 0
C27    1286.00       15.50                21.00               5.10                 1.55                13

Table 5
Set-up of network functions used for neural network analyses (a)

A   SPrec-pos[125]  = f(S-type, V, LMX, DMX, WMX, HMX, ST, SX, SY, SZ, RX, RY, RZ, ROS, RTA[125])
B   SPrec-pos[250]  = f(S-type, V, LMX, DMX, WMX, HMX, ST, SX, SY, SZ, RX, RY, RZ, ROS, RTA[250])
C   SPrec-pos[500]  = f(S-type, V, LMX, DMX, WMX, HMX, ST, SX, SY, SZ, RX, RY, RZ, ROS, RTA[500])
D   SPrec-pos[1000] = f(S-type, V, LMX, DMX, WMX, HMX, ST, SX, SY, SZ, RX, RY, RZ, ROS, RTA[1000])
E   SPrec-pos[2000] = f(S-type, V, LMX, DMX, WMX, HMX, ST, SX, SY, SZ, RX, RY, RZ, ROS, RTA[2000])

(a) For definitions of abbreviations see Table 2.

The efficacy of the trained networks was tested using four unoccupied classrooms not used in the original analyses and with input variables in the same range as those used in the 'training' set. The descriptive statistics and main characteristics of the four classrooms C3, C7, C16 and C24 are shown in Tables 3 and 4. The four classrooms were chosen randomly from the 34 classrooms measured. Classroom C3 is medium sized with a symmetrical geometric floor plan and a width greater than its length. It has a floor slope of approximately 12° and a mid-frequency reverberation time of 0.64 s. Classroom C7 is medium sized with a rectangular floor and a length greater than its width. It has a floor slope of approximately 10° and a mid-frequency reverberation time of 1.81 s. Classroom C16 is small and parallelepipedic, with an approximately square plan and a mid-frequency reverberation time of 0.71 s. Classroom C24 is small and parallelepipedic, with a width greater than its length. It has a floor slope of approximately 13° and a mid-frequency reverberation time of 1.39 s. For these four classrooms, the sound levels at the receiver positions were measured for the omnidirectional source. The total number of listener positions in the four classrooms was 20. The source–receiver distance ranged from 0.5 to 15.0 m. The numbers of listener positions, and the distances between source and receiver in each of the classrooms 'tested', are

shown in Tables 6–10. In each of the octave bands 'tested', the results varied marginally between the four classrooms. SP prediction in the mid-frequency octave bands of 500 and 1000 Hz produced the best results. The average (standard deviation) of the variations ranged between -0.72 (0.35) and +0.69 (1.05) dB, confirming that in most cases the error was relatively small and that predictions of speech levels at listener positions were accurate to within the magnitude of the subjective difference limen of the sound level. Furthermore, in the mid-frequency bands, listener-position predictions returned high correlation coefficients (R²) for individual classrooms ('global' values, averaged over the four classrooms, are given in parentheses). For the 500 and 1000 Hz octave bands the R² were 0.96–1.00 (0.97) and 0.96–0.98 (0.97), respectively (see Figs. 1 and 2). Tables 6–10 show results of trained networks with set-up functions A–E (see Table 5). The tables clearly show very high correlation coefficients R². With the single exception of C16 at 250 Hz (0.87), all classrooms in each of the octave bands

Table 6
Descriptive statistics for individual classroom listener-position SP [Lp − Lw (97.6)] predictions, at the 125 Hz octave band, using a neural network (NNet 2*): four classrooms and a total of 20 listener positions

Classroom   ROS (m) (a)   SP (dB) (b)   *NNet2 (dB) (c)   R² (d)   %A (e)   E._SP (f)   NN_RMS (g)
C3           0.50          -5.37         -6.12            0.93     87.68    0.75        1.44
             1.00          -9.58        -10.64                     90.00    1.06
             2.00         -17.40        -14.68                     84.37    2.72
             5.00         -18.10        -18.37                     98.54    0.27
            10.00         -18.70        -19.78                     94.56    1.08
C7           0.50          -5.37         -5.20            0.94     96.84    0.17        1.37
             1.00          -9.58         -9.54                     99.64    0.03
             2.40         -14.90        -14.40                     96.63    0.50
             5.00         -17.60        -17.00                     96.60    0.60
            10.00         -15.90        -18.29                     86.94    2.39
            15.00         -16.40        -18.63                     88.03    2.23
C16          0.50          -5.37         -5.69            0.95     94.27    0.33        1.68
             1.00          -9.58         -9.57                     99.95    0.00
             2.00         -12.40        -12.94                     95.80    0.54
             5.00         -18.80        -15.51                     82.50    3.29
C27          0.50          -5.37         -5.47            0.99     98.06    0.11        0.95
             1.00          -9.58        -10.01                     95.70    0.43
             2.00         -14.20        -13.75                     96.82    0.45
             5.00         -15.50        -17.14                     90.44    1.64
             9.00         -20.20        -21.40                     94.38    1.20

(a) ROS = distance between sound source and listener position (m). (b) SP = sound propagation at listener position = measured sound pressure level minus sound power level (Lp − Lw) at that position (dB). (c) *NNet2 = best trained neural network predictions (dB). (d) R² = coefficient of determination (correlation coefficient). (e) %A = percentage agreement between measured SP and SP predicted by the neural network. (f) E._SP = error between measured and predicted SP (dB). (g) NN_RMS = root mean squared error of measured and predicted SP (dB).

Table 7
Descriptive statistics for individual classroom listener-position SP [Lp − Lw (98.0)] predictions, at the 250 Hz octave band, using a neural network (NNet 3*): four classrooms and a total of 20 listener positions

Classroom   ROS (m) (a)   SP (dB) (a)   *NNet3 (dB) (a)   R² (a)   %A (a)   E._SP (a)   NN_RMS (a)
C3           0.50          -5.39         -5.44            1.00     99.02    0.05        0.42
             1.00          -9.23         -9.76                     94.56    0.53
             2.00         -14.50        -14.16                     97.68    0.34
             5.00         -16.30        -16.14                     99.01    0.16
            10.00         -19.50        -18.83                     96.55    0.67
C7           0.50          -5.39         -5.50            0.98     97.94    0.11        0.55
             1.00          -9.23         -8.70                     94.33    0.52
             2.40         -12.80        -12.02                     93.91    0.78
             5.00         -13.60        -13.86                     98.12    0.26
            10.00         -13.90        -14.59                     95.24    0.69
            15.00         -15.60        -15.00                     96.17    0.60
C16          0.50          -5.39         -5.01            0.87     93.03    0.38        1.19
             1.00          -9.23         -8.19                     88.72    1.04
             2.00          -8.60        -10.59                     81.18    1.99
             5.00         -12.60        -13.27                     94.93    0.67
C27          0.50          -5.39         -5.48            0.94     98.34    0.09        1.12
             1.00          -9.23         -9.40                     98.13    0.18
             2.00         -11.90        -12.79                     93.07    0.89
             5.00         -13.60        -15.79                     86.13    2.19
             9.00         -17.10        -16.28                     95.21    0.82

(a) For definitions of abbreviations see Table 6.

Table 8 Descriptive statistics for SP [Lp - Lw(97.9)] predictions at individual classroom listener positions, in the 500 Hz octave band, using a neural network (NNet 3*): four classrooms and a total of 20 listener positions

Classroom  ROS (m)ᵃ  SP (dB)ᵃ  *NNet3 (dB)ᵃ  R²ᵃ   %Aᵃ    E._SPᵃ  NN_RMSᵃ
C3          0.50      5.73      6.18         1.00  92.35  0.47    0.59
            1.00      9.30     10.37               89.96  1.04
            2.00     14.00     14.26               98.74  0.18
            5.00     17.30     18.04               96.51  0.63
           10.00     19.80     19.99               99.71  0.06
C7          0.50      5.73      5.42         0.98  92.81  0.41    0.88
            1.00      9.30      8.80               93.45  0.61
            2.40     13.50     12.29               90.39  1.30
            5.00     15.10     14.10               93.09  1.04
           10.00     15.10     15.48               97.86  0.33
           15.00     16.60     15.52               93.21  1.13
C16         0.50      5.73      5.87         0.99  98.86  0.07    0.73
            1.00      9.30      9.23               98.13  0.17
            2.00     11.10     12.05               92.99  0.84
            5.00     13.40     14.70               91.92  1.18
C27         0.50      5.73      5.91         0.96  98.12  0.11    0.94
            1.00      9.30      9.65               97.23  0.27
            2.00     12.10     13.23               92.23  1.02
            5.00     15.30     16.23               94.95  0.81
            9.00     18.80     17.29               91.35  1.63

ᵃ For definitions of abbreviations see Table 6.


Table 9 Descriptive statistics for SP [Lp - Lw(96.7)] predictions at individual classroom listener positions, in the 1000 Hz octave band, using a neural network (NNet 4*): four classrooms and a total of 20 listener positions

Classroom  ROS (m)ᵃ  SP (dB)ᵃ  *NNet4 (dB)ᵃ  R²ᵃ   %Aᵃ    E._SPᵃ  NN_RMSᵃ
C3          0.50      6.16      4.92         0.98  79.85  1.24    1.07
            1.00      9.39      9.54               98.44  0.15
            2.00     15.70     14.05               89.46  1.65
            5.00     18.50     19.19               96.39  0.69
           10.00     20.50     21.50               95.37  1.00
C7          0.50      6.16      6.25         0.96  98.62  0.09    0.68
            1.00      9.39      9.53               98.52  0.14
            2.40     11.80     13.12               89.91  1.32
            5.00     14.40     13.63               94.62  0.77
           10.00     14.90     14.73               98.86  0.17
           15.00     16.00     15.42               96.40  0.58
C16         0.50      6.16      6.18         0.98  99.63  0.02    0.55
            1.00      9.39      8.84               94.14  0.55
            2.00     12.10     11.14               92.10  0.96
            5.00     13.40     13.35               99.63  0.05
C27         0.50      6.16      5.95         0.97  96.63  0.21    0.98
            1.00      9.39     10.17               92.35  0.78
            2.00     12.60     13.35               94.40  0.75
            5.00     16.60     16.20               97.58  0.40
            9.00     19.40     17.54               90.39  1.86

ᵃ For definitions of abbreviations see Table 6.

Table 10 Descriptive statistics for SP [Lp - Lw(98.5)] predictions at individual classroom listener positions, in the 2000 Hz octave band, using a neural network (NNet 2*): four classrooms and a total of 20 listener positions

Classroom  ROS (m)ᵃ  SP (dB)ᵃ  *NNet2 (dB)ᵃ  R²ᵃ   %Aᵃ    E._SPᵃ  NN_RMSᵃ
C3          0.50      6.05      7.63         0.94  79.36  1.57    1.65
            1.00      8.62     11.17               77.15  2.55
            2.00     16.50     14.75               89.37  1.75
            5.00     17.90     19.16               93.43  1.26
           10.00     21.20     21.18               99.90  0.02
C7          0.50      6.05      6.27         1.00  96.56  0.22    0.32
            1.00      8.62      8.87               97.16  0.25
            2.40     11.90     11.96               99.53  0.06
            5.00     13.50     13.76               98.14  0.26
           10.00     14.20     14.85               95.65  0.65
           15.00     15.30     15.23               99.53  0.07
C16         0.50      6.05      6.93         1.00  87.29  0.88    0.57
            1.00      8.62      9.27               92.98  0.65
            2.00     11.30     11.50               98.26  0.20
            5.00     13.40     13.15               98.14  0.25
C27         0.50      6.05      6.92         0.92  87.50  0.86    1.60
            1.00      8.62      9.96               86.56  1.34
            2.00     10.70     12.81               83.50  2.11
            5.00     14.90     15.51               96.08  0.61
            9.00     18.80     16.46               87.56  2.34

ᵃ For definitions of abbreviations see Table 6.


returned high R² values ranging from 0.93 to 1.00. The high correlations are highlighted by the scatterplots shown in Figs. 1 and 2. The tables also show that the percentage agreement, %A, between measured and predicted values at each listener position in each of the classrooms is also high. Furthermore, the tables show acceptable RMS errors between measured and predicted SP values. The smallest errors are for the predictions in the 500 and 1000 Hz frequency bands; they range between 0.55 and 1.07 dB. The accuracy of the predictions is within the subjective difference limen of sound level, which is approximately 1 dB.

Further assessments were made of the efficacy of the neural networks by comparing the neural network predicted SP values with those obtained using the Hopkins-Stryker equation [Eq. (1)] and Barron's revised theory [Eq. (2)]. Results obtained using the Hopkins-Stryker equation are generally higher than the measured values. For all predictions, 54% of the variations were positive and were well in excess of the subjective difference limen of 1 dB. In addition, the equation's inability to make accurate and reliable predictions was further emphasized by the 'global' percentage agreement, %AG, between measured and predicted SP values of 85% and the 'global' RMSG error of 2.28 dB. This is consistent with previously published literature [18,33-35].

Results obtained using Barron's revised theory did not entirely agree with those reported in the literature, that is, that the revised theory substantially reduces the discrepancy between the predictions of the 'classical theory' and the measured values. The revised theory, like the Hopkins-Stryker equation, makes use of average absorption coefficients. In addition, there is the added problem of having to guess the best value of the early reflection ratio, n, at the schematic design stage. While a number of factors [17] affect the choice of the value of n, for reasons that are not obvious the accuracy of predictions was highest when n was equal to unity. Results showed that the revised theory reduced the variations only marginally. The 'global' percentage agreement, %AG, between measured and predicted SP was 87% and the 'global' RMSG error was 2.20 dB. In addition, the RMS errors for the predictions for each classroom, in each octave band, are relatively high, the only exceptions being classrooms C7 and C16 in the 500 Hz band, where RMS errors of 0.73 and 0.76 dB, respectively, were obtained. All other RMS errors ranged between 1.19 and 4.00 dB. The average (standard deviation) of the variations ranged between -2.70 (0.68) and +2.60 (2.95) dB, suggesting that prediction errors of speech levels at most of the receiver positions are well above the subjective difference limen.

Notwithstanding the high errors, the revised theory produced high coefficients of determination, R². Except for classroom C16 at 125 Hz (0.79) and classroom C27 at 250 and 2000 Hz (0.88 and 0.86), all other predictions for the classrooms in each octave band returned R² values ranging from 0.92 to 1.00.

Regarding a direct comparison of the results, it is clear that the neural networks produced the most accurate predictions of sound propagation in the university classrooms. Fig. 2 shows typical multiple line plots of the variations of predicted SP in each of the four classrooms in the 1000 Hz octave band. They show the agreement between the methods investigated, i.e. that the neural network predictions of SP show better agreement with the measured values and are accurate to within the same order of magnitude as the subjective difference limen for sound levels (1 dB).

Fig. 1. Typical scatterplot showing a 'global' linear regression analysis for neural network predictions of SP for 20 listener positions in the 1000 Hz octave band in classrooms C3, C7, C16 and C27. Curved dotted lines are the confidence bands denoting 95% confidence limits.

Fig. 2. Typical multiple line plots comparing variations, in the 1000 Hz octave band in each of classrooms C3, C7, C16 and C27, between SP predictions of alternative methods: trained neural network *; the revised theory, ~; and the Hopkins-Stryker equation &.
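To make the role of the average absorption coefficient concrete, the sketch below evaluates the Hopkins-Stryker equation in its commonly quoted textbook form, Lp - Lw = 10 log10(Q/(4πr²) + 4/R), with room constant R = Sᾱ/(1 - ᾱ). This is an illustration only: the exact form of the paper's Eq. (1) is not reproduced in this section and may differ, the function name and the room values (S = 300 m², ᾱ = 0.2, Q = 1) are hypothetical, and the function returns Lp - Lw without checking it against the sign convention of the tabulated SP values.

```python
import math


def hopkins_stryker_sp(r, surface_area, mean_absorption, directivity=1.0):
    """Lp - Lw (dB) at distance r (m), textbook Hopkins-Stryker form."""
    # Room constant R = S * alpha / (1 - alpha), in m^2.
    room_constant = surface_area * mean_absorption / (1.0 - mean_absorption)
    # Direct-field term: falls off with the square of the distance.
    direct = directivity / (4.0 * math.pi * r ** 2)
    # Reverberant-field term: independent of distance.
    reverberant = 4.0 / room_constant
    return 10.0 * math.log10(direct + reverberant)


# Hypothetical room, not one of the UBC classrooms.
for distance in (0.5, 1.0, 2.0, 5.0, 10.0):
    level = hopkins_stryker_sp(distance, surface_area=300.0, mean_absorption=0.2)
    print(f"{distance:5.2f} m  {level:7.2f} dB")
```

Because the reverberant term depends only on the surface area and a single average absorption coefficient, the predicted level flattens out to the same value everywhere beyond the direct field, whatever the room shape or the distribution of absorption.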
Furthermore, Tables 11 and 12 show results of t-tests confirming that there is a significant difference between the SP prediction methods of neural networks and the revised theory (P = 0.02 [< 0.05]), and a significant difference between the neural network method and the Hopkins-Stryker equation (P = 2.67E-13 [< 0.05]).

Table 11 Student t-test results for neural network and revised theory predicted SP values in classrooms C3, C7, C16 and C27 for frequency bands 125-2000 Hz (significant at P < 0.05)

Method            Mean   Std.  Number of predictions  Student t-value  P
Neural networks   12.30  4.44  100                    2.36             0.02
Revised theory    12.76  5.54

Table 12 Student t-test results for neural network and Hopkins-Stryker equation predicted SP values in classrooms C3, C7, C16 and C27 for frequency bands 125-2000 Hz (significant at P < 0.05)

Method            Mean   Std.  Number of predictions  Student t-value  P
Neural networks   12.30  4.44  100                    8.44             2.67E-13
Hopkins-Stryker   10.87  4.40

4. Discussion and conclusions

Research over the past 20 years has shown that a number of factors affect the acoustics of a classroom for speech. They are the sound source output power and directivity factor and the resulting speech level at receiver positions, the background noise levels (ventilation and student activity noise), and the reverberation time. Speech intelligibility, simply defined as how clearly one can hear speech in a room, can be predicted from known estimates of these factors. Furthermore, the signal-to-noise ratio, S/N, one of the most important parameters affecting speech intelligibility, is equal to the level of speech minus the level of the background noise at the listener position. It is clear that when deliberating on and assessing the acoustical quality of a classroom at the schematic design stage, there is a need for an accurate, reliable and expeditious method of predicting the distribution of speech levels in the room.

The development of such a method has been accomplished using neural network analysis. Results show that trained neural networks can be used to make predictions of sound propagation, SP, at listener positions, and that neural networks are at least as effective as other methods used to predict the distribution of sound in rooms. Results also showed that SP predictions for classrooms using neural networks were in better agreement with measured values than those obtained using Barron's revised theory or the Hopkins-Stryker equation. Furthermore, results showed that for all classrooms 'tested', and in octave frequency bands from 125 to 2000 Hz (except for classrooms C3, C7 and C16 in the 125 Hz frequency band), RMS errors were within the subjective difference limen of sound levels, which is approximately 1 dB.
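The link between a predicted SP value and the signal-to-noise ratio described above is simple level arithmetic, sketched below. The function names and all numerical values are hypothetical illustrations rather than data from the study, and SP is assumed to be expressed as Lp - Lw, so that the speech level at the listener is Lw + SP.

```python
def speech_level(sound_power_level_db, sp_db):
    """Speech level at the listener, assuming SP is expressed as Lp - Lw (dB)."""
    return sound_power_level_db + sp_db


def signal_to_noise(speech_level_db, noise_level_db):
    """S/N = level of speech minus level of background noise (dB)."""
    return speech_level_db - noise_level_db


# Hypothetical illustration values, not measurements from the study.
talker_power_level = 70.0   # dB, assumed
predicted_sp = -15.0        # dB, assumed, with SP taken as Lp - Lw
background_noise = 40.0     # dB, assumed

level = speech_level(talker_power_level, predicted_sp)
snr = signal_to_noise(level, background_noise)
print(level, snr)
```

With these assumed values the speech level at the listener is 55 dB and the S/N is 15 dB; an error of 1 dB in the predicted SP carries straight through to a 1 dB error in S/N.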
The study had shortcomings, in particular the size of the 'training' and 'testing' data sets used for the neural network analyses. It is expected that a substantial increase in the number of facts used in the 'training' and 'testing' sets (i.e. a larger number of university classrooms and receiver positions) would substantially increase the accuracy of the neural network predictions. RMS errors for SP predictions in the 125, 250 and 2000 Hz frequency bands, affected by 'extreme' or 'random' variations, were higher than expected. These results support the supposition that an absolute minimum of 10 measurements is required in each classroom to conduct significant statistical analyses. It is clear that, with a larger 'testing set' for each classroom, a residual regression analysis would most likely have deemed these 'random' variations to be outliers. Consequently, the RMS values would have been lowered significantly. It is also clear that the removal of the outliers would have significantly improved the average (standard deviation) of the variations, adding confidence in the overall results. An average of only 5.35 measurements per classroom was used in these investigations. Attempts were made, during the course of the investigations, to increase the number of 'facts' using a number of techniques which did not include actual listener position measurements. These techniques, except for the inclusion of bounds values, had no effect on the results of the neural network analyses. Furthermore, while it is postulated that the limitations do not diminish the validity of the results presented here nor the significance of the method developed to predict SP in university classrooms, it is clear that future work should replicate these investigations using a larger database.

Finally, results show that there is a good basis for using geometrical and acoustical variables to train non-linear models, such as neural networks, to predict SP in university classrooms. The number and significance of the 'input' variables used to train these neural networks may not be so apparent. While the present work limited the number of input variables to 15, this number could be decreased or increased, provided sufficient data were available for the 'training' set. However, even with sufficient data, future studies should aim at reducing the number of input variables to an absolute minimum.

This study was mainly concerned with the conceptual design stage of university classrooms. Hence, the database used contains measurements from unoccupied university classrooms, and the input variables were chosen primarily for their simplicity, availability and suitability. Future work should investigate the ramifications of training neural networks with data from occupied university classrooms.

Acknowledgements

The research by J.N. was conducted under an Australian Postgraduate Award and a Department of Architectural and Design Science Supplementary Scholarship.

References

[1] Houtgast T, Steeneken HJM, Plomp R. Predicting speech intelligibility in rooms from the modulation transfer function. I. General room acoustics. Acustica 1980;46:60-72.
[2] Bradley JS. Uniform derivation of optimum conditions for speech in rooms. Report BRN 239, National Research Council Canada, 1985.
[3] Latham HG. The signal-to-noise ratio for speech intelligibility: an auditorium design index. Applied Acoustics 1979;12:253-320.
[4] Davis D, Davis C. Designing for speech intelligibility. In: Ballou G, editor. Handbook for sound engineers: the new audio cyclopedia, 2nd ed. Indiana: SAMS, 1991.
[5] Knudsen VO, Harris CM. Acoustical designing in architecture. New York: Acoustical Society of America, 1978.
[6] Cremer L, Müller HA. Principles and applications of room acoustics, Vol. 1 (Schultz TJ, translator). London: Applied Science Publishers.
[7] Bradley JS, Reich RD, Norcross SG. On the combined effects of signal-to-noise ratio and room acoustics on speech intelligibility. J Acoust Soc Amer 1999;80(3):837-45.
[8] Bradley JS. Speech intelligibility studies in classrooms. J Acoust Soc Amer 1986;80(3):846-54.
[9] Bradley JS. Predictors of speech intelligibility in rooms. J Acoust Soc Amer 1986;80(3):837-45.
[10] Bradley JS. Relationships among measures of speech intelligibility in rooms. J Audio Eng Soc 1998;46(5):396-405.
[11] Friberg R. Noise reduction in industrial halls obtained by acoustical treatment of ceiling and walls. Noise Control and Vibration Reduction 1975:75-9.
[12] Schultz TJ. Relationship between sound power level and sound pressure level in dwellings and offices. ASHRAE Transactions 1985;91(1):124-53.
[13] Vorländer M. Simulation of the transient and steady-state sound propagation in rooms using a new combined ray-tracing/image-source algorithm. J Acoust Soc Amer 1989;86(1):172-8.
[14] Rindel JH, Naylor GM. ODEON - a hybrid computer model for acoustic modelling. In: Proceedings Western Pacific Regional Acoustics Conference, Brisbane, 26-28 November 1991, p. 95-102.
[15] Rindel JH. Computer simulation techniques for acoustical design of rooms. Acoustics Australia 1995;23(3):81-6.
[16] Davis D, Davis C. Audio measurements. In: Ballou G, editor. Handbook for sound engineers: the new audio cyclopedia, 2nd ed. Indiana: SAMS, 1991.
[17] Barron M. Auditorium acoustics and architectural design. London: E & FN Spon, 1993.
[18] Barron M, Lee L-J. Energy relations in concert auditoriums. I. J Acoust Soc Amer 1988;84(2):618-28.
[19] Hodgson M, Rempel R, Kennedy S. Measurement and prediction of typical speech and background-noise levels in university classrooms during lectures. J Acoust Soc Amer 1999;105(1):226-33.
[20] Nelson MM, Illingworth WT. A practical guide to neural nets. Addison-Wesley, 1991.
[21] Lippmann RP. An introduction to computing with neural nets. IEEE ASSP Magazine, April 1987, 4-22.
[22] Simpson PK. Artificial neural systems: foundations, paradigms, applications, and implementations. New York: Pergamon Press, 1990.
[23] Hertz J, Krogh A, Palmer RG. Introduction to the theory of neural computation. New York: Addison-Wesley, 1991.
[24] Hammerstrom D. Working with neural networks. IEEE Spectrum 1993;34(7):46-53.
[25] Hegazy T, Fazio P, Moselhi O. Developing practical neural network applications using back-propagation. Microcomputers in Civil Engineering 1994;9:145-59.
[26] Fausett L. Fundamentals of neural networks. New York: Prentice Hall, 1994.
[27] Nannariello J, Fricke FR. The prediction of reverberation time using neural network analysis. Applied Acoustics 1999;58:305-25.
[28] Hodgson M. Experimental investigation of the acoustical characteristics of university classrooms. J Acoust Soc Amer 1999;84(4):1810-9.
[29] Hodgson M. When is diffuse-field theory applicable. Applied Acoustics 1996;49(3):197-207.
[30] Stein R. Preprocessing data for neural networks. AI Expert, March 1993, 32-7.
[31] StatSoft. Statistica neural networks manual. Tulsa OK: StatSoft, 1999.
[32] Nannariello J, Fricke FR. The use of neural network analysis to predict the acoustic performance of large rooms: part I. Predictions of the parameter G utilizing numerical simulations. Applied Acoustics (in press).
[33] Hodgson M. Experimental evaluation of the accuracy of the Sabine and Eyring theory in the case of non-low surface absorption. J Acoust Soc Amer 1993;94(2):835-40.
[34] Barron M. Growth and decay of sound intensity in rooms according to some formulae of geometrical acoustic theory. J Sound Vib 1973;27:186-96.
[35] Kuttruff H. Room acoustics. 3rd ed. London: Elsevier Applied Science, 1991.
[36] Nannariello J, Fricke FR. The use of neural network analysis to predict the acoustic performance of large rooms: part II. Predictions of the acoustical attributes of concert halls utilizing measured data. Applied Acoustics (in press).