Journal of Phonetics (1974) 2, 65-89
Language background and the perception of foreign accent
William J. Barry
Institut für Phonetik, University of Kiel, West Germany
Received 1st November 1973
Abstract:
The interlingual German-English word-pairs Tip-tip, Paß-pus and Busch-bush (spoken by English and German speakers) were offered to two groups of English observers for judgment of their acceptability as English and to four groups of German learners of English from two regions of Germany for judgment as English or German. Various acoustic properties of the stimuli were measured and the degree of correlation with the group judgments calculated. Results indicate that certain acoustic properties vary systematically according to language and dialect background, and that different properties influence the decisions of the observer groups according to their language or dialect background.
Introduction
The investigation reported below had a threefold goal, each part of which was of an exploratory nature. It was concerned, firstly, with a person's ability not only to identify the words of an utterance (and consequently also the strings of phonemes which make up the word) but, in addition, to pass judgment on the accent with which the utterance was produced. This may occur with extreme accuracy and explicitness as to the origin of the speaker (social or geographical) or merely express the strangeness or foreignness of the accent. Secondly, the investigation was aimed at finding properties of the acoustic signal correlating with the person's judgments of accent, which might thus be considered to signal particular properties of "accent" for the group of observers concerned. The third part of the investigation was an attempt to link the judgments on "accent" with certain regional phenomena in the production of the words on which judgment was passed. Before describing the arrangements used to obtain the information outlined in the above goals, we shall present a brief discussion of the assumptions underlying the investigation. Essential to all three goals is the assumption that the phonetic aspect of speech-perception is at least a "two-tiered" process, one part concerned with phoneme recognition, the other providing a sub-phonemic output. Whether or not the tier involved in the non-semantic (non-phonemic) process should be further sub-divided (Meyer-Eppler, 1959, cf. Hammarström, 1963) is merely a question of the categorization of information contained in the speech signal and does not therefore affect the basic assumption. There are indications (Heike, 1969a) that judgments of deviant accent depend to a large extent on vowel quality and are very little affected by other elements of the speech signal.
Other experimental evidence in phoneme perception supports the assumption that non-semantic information is likely to be carried by vowels rather than, say, by consonants. The exclusion of consonants is supported by the idea of categorical perception as developed in
connection with "motor theory" of speech perception (Liberman, 1957; Liberman, Cooper, Harris & MacNeilage, 1963). Similar experiments with vowels reveal a higher degree of discriminability of vowels which are nonetheless classified as belonging to the same phoneme category; in other words, differences can be perceived at a sub-phonemic level (Fry, Abramson, Eimas & Liberman, 1962). The extent to which such smaller differences can be perceived is, however, still unclear, since it has been shown that the discrimination of vowels presented in syllables is not so acute as that of isolated vowels (Stevens, 1968). Whether acuity diminishes to an even greater degree when vowels are presented in larger contexts is not proven but appears likely. The absolute categoricality of consonant perception is, however, strictly speaking, just as doubtful as the absolute non-categoricality of vowel perception. In a recent study of the perception of voicing (Abramson & Lisker, 1973), giving considerable information on the performance of individual observers, results indicate that, for speakers of Spanish, peaks of discrimination are often found well away from the phoneme boundary for cases where the voice onset is delayed. This can be interpreted as showing that differences in degree of aspiration can be perceived in voiceless plosives independent of phoneme categories. It is, however, not only by modifying the validity of a theory of categorical perception that we can hope to find support for the possibility of perceiving sub-phonemic differences. By definition, categorical perception means that only sounds which have the physical attributes of different phonemes can be perceived as different. 
But speakers with different language or dialectal backgrounds are known to draw the boundary in the continuum of variation for any particular physical attribute at different points (Lisker & Abramson, 1964 etc.; Stevens, Liberman, Studdert-Kennedy & Öhman, 1969; Scholes, 1967a, b; Lindner, 1966a, b; etc.). Thus, if a phoneme category is unambiguous as a result of contextual and situational information, the value of any particular physical attribute can be ambiguous without altering the identification of the phoneme. The value of an attribute may, in other words, be near or on the phoneme boundary in the listener's idiolect. In such cases, even strictly categorical perception will allow that such a sound can be perceived as different from an "unambiguous" realization of a particular phoneme, i.e. from a realization in which the attribute has a value midway between phoneme boundaries as defined by the listener's classificatory habits. It has been shown (Heike, 1969b) that in fact judgments of vowel abnormality given simultaneously with phoneme identification reach a peak at the phoneme boundary. Since there is a discrimination peak at the phoneme boundaries along any physical continuum of consonants too, there is no reason to believe that such judgments of abnormality do not also apply to consonants if individual or dialectal differences in such phoneme boundaries can be discovered. It has become apparent in the course of the discussion that the "two-tiered" nature of the process of speech-sound perception postulated at the beginning is probably not independent of higher levels of speech perception.
However, even if the incoming acoustic signal has to be stored briefly and checked against higher-level information (Fry, 1964; Kozhevnikov & Chistovich, 1965) in cases of doubtful identity, it is necessary to postulate some inner store of distinct units for the immediate identification of non-ambiguous incoming speech signals (otherwise no higher-order information is accumulated for the identification of "doubtful" cases). The judgment of accent can be seen as a process of comparing the stored acoustic impression against signals contained in the short-term store which have been identified phonemically (on the strength of either immediate decisions or with reference to higher-level information) and responding with a deviation count.
Despite their extreme usefulness, one of the main objections to such hypotheses of internalized impressions of sound units, corresponding ultimately to linguistic units, is the extreme variability of phonetic substance. It has been calculated that just to account for differences in the realizations of English phonemes due to phonotactic possibilities and allowing for four degrees of stress (but no differences in delivery speed) a store of some 55,000 impressions would be necessary (see MacNeilage, 1970). The situation is further complicated by the inaccuracy of the speech organs even when repeating items which the speaker himself intends and considers to be identical. Reducing such variability merely to sluggishness of articulating organs is contradicted by electromyographic evidence (Mansell, 1970) which raises the source of random variation at least to the neuro-muscular level. Despite such a discouraging point of departure, phonetic studies have still been able to show that a great deal of the variation in vowel realization at least is regular both with regard to quality (Lindblom, 1963) and duration (Nooteboom, 1972, 1973) and is thus reducible to low-level determinable modification of a unit which is constant at a higher level. Both Lindblom's and Nooteboom's models represent only a partial explanation of phonetic variance in that (so-called) random fluctuations are still apparent, and that factors of the communication situation are not included in the model. However, such restrictions in the scope of a model in no way invalidate it, and a further indication that it is correct to postulate the existence of underlying invariant units modified in production by syntagmatic and prosodic constraints is provided by the results from perceptual experiments (Lindblom & Studdert-Kennedy, 1967; Nooteboom, 1973).
Allowing that the mechanisms of speech perception permit systematic differences in the realization of phonemes, the assumption appears plausible that a connection exists between judgments of "accent" and measured differences in the acoustic properties of stimuli offered for judgment.
Experimental Procedure
Material
Studio recordings were made of four native speakers of English and six native speakers of German in their 1st-3rd year of English studies reading the sentences He gave me a tip, The wound was filled with pus and He found it in a bush. The sentences were spoken as three of a series of 106 sentences, and the speakers did not know which, if any, of the sentences would be the object of any special attention. They were read with falling intonation at a rate which came naturally to each speaker, the main accent occurring on the final word. These words (tip, pus and bush) were separated from the sentences. In addition to the English sentences, the German speakers recorded the same number of German sentences, under the same conditions, each German sentence containing a word which constituted an interlingual pair with the corresponding English word. The German words Tip, Paß and Busch were separated from the sentences Er gab mir einen Tip, Er verlor seinen Paß and Er fand es im Busch to form the basis for comparison with the corresponding English words. These 16 realizations of the three interlingual word-pairs (4 English-English, 6 German-English, 6 German-German) were duplicated 5 times and randomized (in different random order for each set of 16) with 5 s between each realization, thus giving a sequence of 80 realizations for each word-pair with a playing-time of 6 min 40 s. The tapes were presented to English and German observers for judgment.
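The construction of one test tape can be illustrated with a short sketch. It is only an approximation of the procedure described above: the speaker labels are hypothetical, the seed is arbitrary, and the 5 s figure is assumed to be the onset-to-onset spacing, which is the reading that yields the stated playing time of 6 min 40 s.

```python
import random

def build_test_sequence(realizations, repetitions=5, seed=None):
    """Concatenate `repetitions` independently shuffled copies of the 16 realizations."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(repetitions):
        block = list(realizations)
        rng.shuffle(block)          # a different random order for each set of 16
        sequence.extend(block)
    return sequence

# Hypothetical labels: 4 English speakers, 6 German speakers (not the paper's codes).
english_speakers = ["E1", "E2", "E3", "E4"]
german_speakers = ["G1", "G2", "G3", "G4", "G5", "G6"]

realizations = (
    [(s, "English word") for s in english_speakers]    # 4 English-English
    + [(s, "English word") for s in german_speakers]   # 6 German-English
    + [(s, "German word") for s in german_speakers]    # 6 German-German
)

sequence = build_test_sequence(realizations, repetitions=5, seed=1974)

SPACING_S = 5  # assumed onset-to-onset spacing in seconds
playing_time_s = len(sequence) * SPACING_S
minutes, seconds = divmod(playing_time_s, 60)
print(len(sequence), "stimuli,", minutes, "min", seconds, "s")
```

Shuffling each block separately, rather than shuffling all 80 items at once, keeps every realization exactly once in each set of 16, as the description requires.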
Speakers
Two of the four English speakers (RP) spoke without a regional accent while two were identified as having a North Midlands accent (RE). The German speakers were also heterogeneous, three having been born in or near Cologne and having lived in Cologne (C) and the other three having been born and having lived in Hamburg (HH). The basis of comparison was therefore four-sided, with the primary pole being English-German and the secondary poles being formed by the dialectal differences existing within the two language groups.
Observers
Two regionally heterogeneous groups of English observers, consisting of 9 linguistically naive students (UCL) and 13 postgraduates attending a course for teaching English as a Foreign Language (EFL), and four groups of German observers judged the stimuli on the test tapes. Organizational difficulties did not allow the Paß/pus tape to be presented to both English groups. Each of the German groups was regionally homogeneous; one Hamburg group and three Cologne groups of different ages were tested. The Cologne groups were: (1) 25 pupils from the second class of a grammar-school (Gymnasium) (Qui), (2) 12 pupils from the penultimate class of the same school (UI) and (3) 19 1st and 2nd term students of English (PH). The 16 Hamburg observers (HH) were members of the penultimate class of a Hamburg grammar-school.
Test arrangements
In the test carried out with German observers at least 24 h lay between each word-pair presented for judgment. In England, however, all the tests had to be completed in one day. The English observers were informed that the words they were to hear had been separated from English sentences, some of which had been spoken by English speakers, some by foreign learners of English. Their task was to judge the accent with which the word was spoken as being (1) neutral English, (2) regional English, (3) slightly foreign, (4) foreign, (5) very foreign.
They were not informed as to the possible nationality of the foreign learners. The German observers were told that the words had been separated either from a German sentence or from an English sentence. They were asked to judge the words as (1) English, (2) undecided, (3) German. In all cases, the test procedure was the same: instructions were given and answer sheets distributed; ten words were played from the middle of the test-tape to accustom the observers to the condition and speed of delivery; the observers were asked whether they could hear clearly; the 80 stimuli were played; the answer sheets were collected.
Methods of analysis
The analysis comprised three aspects corresponding to the aims of the investigation mentioned above. (1) The judgment tests were considered in their own right as an indication of whether or not a differentiation of the isolated words had been possible. (2) The stimuli offered for judgment were subjected to an acoustic analysis with respect to certain spectral and durational properties. (3) Within the limits of the computational apparatus available, the correlations between judgment-test results and acoustic properties of the stimuli were investigated.
Judgment tests
The necessarily different judgments by the English and German observers prevent a regular comparison of these two sets of observer groups. However, due to the limitation of "foreign" speakers to those of German nationality, the category "foreign" used in the English tests can be equated with the category "German" used in the German tests, at least for the purpose of an impressionistic comparison. To compensate for the difference in the number of categories available for selection (3 in the German tests, 5 in the English), each judgment was given a value between 0 and 1·0 for both the German and the English tests, and the sum of these values was then divided by the number of judgments given. Thus an index of "Englishness" or "Germanness" ("Foreignness" in the English tests) could be calculated for any speaker or group of speakers, any observer or group of observers. The values given for each category were:
English observers                  German observers
(1) Neutral English    1·0         (1) English      1·0
(2) Regional English   0·75        (2) Uncertain    0·5
(3) Slightly foreign   0·5         (3) German       0
(4) Foreign            0·25
(5) Very foreign       0
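As a minimal sketch of the index calculation, the category values listed above can be averaged over a set of judgments. The function and variable names are illustrative assumptions, not taken from the paper.

```python
# Category values as listed in the text (category number -> value).
ENGLISH_SCALE = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25, 5: 0.0}
GERMAN_SCALE = {1: 1.0, 2: 0.5, 3: 0.0}

def index_value(judgments, scale):
    """Mean category value of a list of category numbers (1-based)."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return sum(scale[j] for j in judgments) / len(judgments)

# Two very different judgment patterns in an English test give the same index:
unanimous = index_value([3] * 10, ENGLISH_SCALE)         # every observer chose category 3
spread = index_value([1, 2, 3, 4, 5] * 2, ENGLISH_SCALE)  # equal use of all five categories
print(unanimous, spread)
```

The two example patterns illustrate why an index value on its own does not show how consistently a realization was judged.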
Since index values calculated by this method can be ambiguous (for example a value of 0·5 in an English test can be arrived at both by 100% selection of category 3 or by an equal number of selections of all five categories) a method was devised by means of which the reliability of the judgments could be presented quantitatively in values which also ranged between 0 and 1·0 and thus be indicated graphically parallel with the index values. One advantage of such a reliability figure is the indication it gives of the number of internal judgment categories used by any observer group. Speakers who are judged consistently as either "English" or "German" (foreign) will have a high reliability value while those judged inconsistently will have a low reliability value. The reliability curve can therefore be expected to be U-shaped when the tokens are ordered according to their index-values. If a W-shaped curve is observed, i.e. a third reliability peak at a medium index value, a third category for judging the speakers can be assumed. This possibility becomes interesting in the English tests because the category "regional English", which was given the value 0·75, is not necessarily to be considered as a position along the acceptability continuum "neutral English-very foreign". If the reliability value for speakers with index values on or near 0·75 is particularly high then "regional English" as a separate category, distinct from "neutral English" and "Foreign", can be assumed.
Acoustic measurements
The auditory properties which were considered to be of interest as determiners of observer judgments were (a) vowel quality, (b) vowel length, (c) degree of aspiration (in tip and pus) of the initial consonant, (d) quality of final fricative (in pus and bush), (e) degree of pitch change during the syllable. Of these, the quality of the fricatives was ignored for both theoretical and technical reasons.
It is known that the noise component is a primary cue for distinguishing /s/ and /ʃ/ (Harris, 1958), yet the acoustic properties within these two phoneme categories are
known to be very variable (Hughes & Halle, 1956; Strevens, 1960), so that no cues for differentiating within the phoneme categories could be assumed, even hypothetically. Perception experiments even with phonetically trained observers have revealed a large amount of disagreement even in the task of phoneme recognition on the strength of noise quality (Sharf, 1968), and the ability to recognize deviant fricatives, which Sharf investigated, has little bearing on sub-phonemic differentiation, the task carried out in the present investigation. Technically, the measurement of fricative spectra with an unmodified sonagraph, which was the only one available, is liable to produce artificial peaks of energy (with the narrow filter) or present a non-representative section of the fricative (with the broad filter) due to the fixed bandwidth/integration time relation (Fant, Fintoft, Liljencrants, Lindblom & Martony, 1963; cf. Fant, 1968; Fourcin, 1965). The degree of pitch movement, measured in terms of fundamental frequency of the vowel with a Frøkjær-Jensen Trans-Pitchmeter used together with a Philips Oscilloscript, was not considered to be of importance in a linguistic or para-linguistic sense, particularly as the pitch change represented the nuclear tone of an intonation group, separated and presented as an isolated stimulus. However, it might still affect the judgments of certain observer groups if the amount of change agreed or disagreed with the expectations of the group. The degree of aspiration, represented as the duration value between explosion of the initial consonant and voice onset, was considered important in the light of results in a recent investigation (Abramson & Lisker, 1972, 1973). Differences may well be perceived within a phoneme category with longer aspiration values.
Vowel duration was measured from the explosion to the end of F1. Including the aspiration portion in the vowel duration allowed a parallel treatment of tip and pus with bush, although experimental procedure is by no means uniform in this respect (Maack, 1953; Peterson & Lehiste, 1960; Fischer-Jørgensen, 1964). It is also not clear perceptually which segment should be accorded the voiceless transitional part of the signal. It is known that both the quality of the consonant (aspiration) and the quality of the vowel are signalled by this portion, as can be readily demonstrated by separating the voiced part of the vowel and listening to the explosion and voiceless transition. Since parallel encoding of consonant information (place of articulation) and vowel is accepted as unquestioned with voiced initial stops, there seems no reason not to extend the parallel encoding to aspiration in the case of voiceless stops. Vowel quality was represented in terms of F1 and F2 in the case of Paß:pus and Busch:bush, and in terms of the first three formants with Tip:tip. With the exception of the fundamental frequency, the measurements were taken from Sonagrams made with broadband filter at the 80-8000 Hz setting with the scale-magnifier set at 0·5. A check-set for the duration values was made of the Busch:bush stimuli on the large drum, but the degree of variation in the measured value led to the conclusion that a check for the other stimuli was unnecessary. In the measurement of formant values, the mid-point between the beginning and end of F1 was used as the measurement point since the combinations of initial and final consonants in the test-words did not allow the point to be estimated at which the articulatory movement reflected in the formant changes was nearest to a possible "target". All measurements were made with a Hewlett-Packard No. 7004 B X-Y Recorder. The voltage values were transformed automatically to frequency and duration values.
The error expectation with an average voice pitch of 120 Hz is about 30 Hz (Lindblom, 1962).
Correlations
Correlations between the index values for individual realizations of the test words and the values gained by measuring the acoustic properties of the realizations were calculated as single, double and triple correlations. Due to the rank-like nature of the individual categories, rank-correlations had to be calculated. Single correlations were calculated according to the Spearman formula:
$$ r = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)} $$
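The Spearman formula can be sketched in a few lines, with d taken as the difference between the two ranks of a realization and n as the number of realizations. How tied values were handled is not stated in the text; giving tied values their mean rank, as done here, is an assumption, and the example data are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman rank correlation: r = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Invented example: index values against aspiration durations (ms) for four tokens.
index_values = [0.94, 0.20, 0.55, 0.71]
aspiration_ms = [60, 25, 40, 70]
print(spearman_r(index_values, aspiration_ms))
```

Note that the simple formula is exact only when there are no ties; with many tied ranks it is an approximation.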
Double correlations, using single correlations and the intercorrelations between the two acoustic parameters being considered, can be calculated directly (Lienert, 1967) with the formula:
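For the judgment index c and two acoustic parameters A and B, the double correlation is commonly written in the following standard form. It is given here as an assumption that agrees with the Beta-value formula and the general formula in this section, not as a quotation from Lienert (1967):

$$ R_{c \cdot AB} = \sqrt{\frac{r_{cA}^{2} + r_{cB}^{2} - 2\, r_{cA}\, r_{cB}\, r_{AB}}{1 - r_{AB}^{2}}} $$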
The parameters can be weighted by calculating the standard partial-regression coefficient or "Beta-value" according to the formula:
$$ \beta_{cA} = \frac{r_{cA} - r_{cB}\, r_{AB}}{1 - r_{AB}^{2}} $$
This represents the regression coefficient of one variable while the second one is kept constant. With more than two variables the calculation becomes complicated. The general formula is:
$$ R_{c \cdot AB \ldots k} = \sqrt{r_{cA}\beta_{cA} + r_{cB}\beta_{cB} + r_{cC}\beta_{cC} + \ldots + r_{ck}\beta_{ck}} $$
This presupposes the knowledge of the Beta-values which can be calculated most easily by non-mathematicians by means of the Doolittle method. (Lienert, 1967, pp. 405-8.)
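For the two-parameter case the Beta-values and the double correlation can be computed directly from the three single correlations. The sketch below assumes that the Beta-value of the second parameter follows from the first by exchanging A and B; the function names and the example coefficients are invented for illustration.

```python
import math

def beta_values(r_cA, r_cB, r_AB):
    """Standard partial-regression coefficients (Beta-values) for two parameters A and B."""
    denom = 1 - r_AB ** 2
    if denom == 0:
        raise ValueError("A and B are perfectly correlated; Beta-values are undefined")
    beta_cA = (r_cA - r_cB * r_AB) / denom
    beta_cB = (r_cB - r_cA * r_AB) / denom   # assumed: same formula with A and B exchanged
    return beta_cA, beta_cB

def double_correlation(r_cA, r_cB, r_AB):
    """General formula restricted to two parameters: R = sqrt(r_cA*beta_cA + r_cB*beta_cB)."""
    beta_cA, beta_cB = beta_values(r_cA, r_cB, r_AB)
    return math.sqrt(r_cA * beta_cA + r_cB * beta_cB)

# Invented coefficients: index vs. parameter A, index vs. parameter B, A vs. B.
r_cA, r_cB, r_AB = 0.6, 0.5, 0.4
print(beta_values(r_cA, r_cB, r_AB))
print(double_correlation(r_cA, r_cB, r_AB))
```

When the two parameters are uncorrelated (r_AB = 0) the Beta-values reduce to the single correlations, and the squared double correlation is simply the sum of their squares.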
Results
Judgment tests
The index values calculated for the individual realizations in the various judgment tests are given in Table I. The points considered in the judgment test results were (a) the degree of agreement in the judgment order between groups of common geographical origin, (b) how dependent judgments were on the "voice-quality" of the speakers rather than other features of production, (c) whether a connection between judgment-order of "native-stimuli" (German words spoken by Germans and English words by English speakers) and geographical origin is evident in some or all of the group results, (d) whether a systematic relationship between the judgments given for English and German stimuli spoken by the German subjects is apparent, and (e) the differences in judgment reliability of the individual observer groups.
(a) Table II indicates the degree of agreement in the judgment of the three word-pairs between the groups of common and different origin. It is striking that the correlations between the Cologne groups are in all nine cases higher than those between the Cologne and Hamburg groups. In the case of the Tip-tip test, the degree of agreement between the two English observer groups is extremely high despite the heterogeneity of the observers comprising the groups. In the Busch-bush test, however, the agreement is no higher than between the Hamburg and Cologne groups.
Table I Index values for individual speakers. Values are to be read as % (e.g. 94 = 0·94)
A. English observer groups
B. German observer groups
Hamburg EFL
UCL
Busch Tip
Paß
Busch Tip
Paß
Busch Tip
Paß
Busch
15 86 60
70 85 80 87 21 23 60 56 18 18 95 67 10 21 85 30
36 83 45 97 58 50 56 34 54 56 49 29 16 23 89 52
80 71 80 85 15 28 63 87 61 08 58 26 02 00 83 13
64 82 71 87 56 57 45 82 42 42 64 26 09 05 84 14
60 80 74 99 62 66 27 43 56 42 32 16 31 24 91 39
83 69 50 53 27 32 57 59 70 19 70 42 06 03 89 52
60 65 61 71 56 42 48 79 64 36 59 26 19 19 67 23
37 75 54 93 42 66 28 31 61 52 27 09 16 17 88 29
41 32 60 80 45 47 52 36 70 16 56 28 19 14 94 37
44 48 82 87 52 48 45 62 41 20 70 44 18 13 76 26
30 60 66 90 41 46 36 24 50 26 28 25 35 23 85 45
51 24
52 30
52 21
47 33
51 29
52 24
48 25
49 23
45 25
45 23
48 22
44 25
Paß
Busch Tip
Paß
Busch Tip
p
94 85 74 82 63 66 94 90 91 61 68 21 66 32
-
97 91 80 76 47 56 79 80 85 47 54
93 76
77
s
H Ke Ke La La Te Te Bo Bo Mü Mü Ot Ot
x̄ s
(E) (G) (E) (G) (E) (G) (E) (G) (E) (G) (E) (G)
08
-
66 33
-
63 19
72
-
11
68 60 43 80 55 76 81 40 33 84 47 60 36
78 76 33 31 38 51 46 41 36 41 31 68 43 81 70
59 27
63 23
52 18
11
37 22 75
77
Quinta
UI
Paß
Tip
90 83 64 08 59 71 81 75 67 55 56 46 85 60 62 61
PH
HH
Speaker
B
Cologne
62 65 20 35 71 74 38 45 95 36 59 50 11
Table II Degree of agreement between observer groups

              Cologne/Cologne            Hamburg/Cologne            Engl./Engl.
              PH/UI   PH/Qui  UI/Qui     HH/PH   HH/UI   HH/Qui     UCL/EFL
Tip-tip       0·75    0·64    0·63       0·24    0·57    0·43       0·93
Paß-pus       0·91    0·91    0·77       0·75    0·58    0·76       -
Busch-bush    0·94    0·81    0·94       0·67    0·77    0·67       0·68
(b) The influence which voice-quality had on the judgments cannot be determined with any certainty, although widely different index values given for the English and German realizations of the same speaker indicate that in some cases at least, judgments were made independent of voice-quality (cf. Speakers Te, Bo, Ot in Tip-tip, La, Bo, Ot in Paß-pus and Ot in Busch-bush). Intercorrelations for the results of the different word-pairs were calculated on the assumption that high correlation coefficients could be interpreted as a high dependency on voice-quality (i.e. similar judgment order despite different segment structure in the syllables). However, close agreement between the judgment orders for different word-pairs can also indicate the general degree of acceptability of a particular idiolect, just as a low correlation coefficient between word-pairs can be interpreted either as an indication of a basis of judgment which is independent of voice quality or of a completely haphazard basis of judgment.
Figure 1 English index values for RP and Regional (RE) English speakers and Cologne (C) and Hamburg (HH) German speakers. Individual values on the left, average values for speaker groups on the right. (a) Tip-tip; (b) Paß-pus; (c) Busch-bush.
(c) An ability to judge the speakers according to their geographical origin was only apparent in the English observer groups. Figure 1 represents the index values for the English words spoken by English speakers (RP, RE) and the German words spoken by the Hamburg (HH) and Cologne (C) speakers. The average index values for these speaker groups are shown on the right. With the exception of RE in Busch-bush, where one regional English speaker (RE) received by far the lowest index, thus lowering the average RE index drastically, the speaker groups are regularly placed in the same order; in other words, the RP speakers are consistently the most acceptable and the Cologne speakers the least acceptable to the English observers. It is to be noted that the Hamburg German realizations have a rather wide range of acceptability. The same sort of speaker grouping for the German observers did not reveal the same degree of classification in accordance with the speaker's background.
Figure 2 English index values for the English (E) and German (G) words spoken by the Cologne and Hamburg speakers. Separate panels for the Hamburg speakers and the Cologne speakers, each showing the word-pairs Tip-tip, Paß-pus and Busch-bush.
(d) The differences in the judgments of the English and German words spoken by German speakers show an interesting regularity. Whereas the English observers judged the Cologne English realizations to be much more acceptable than the German realizations in all cases, they judged three Hamburg German realizations as being more acceptable than the English realizations by the same speakers. Even in the other six cases, when the English word was considered more "English", the difference in index value between the English and German words was much lower than with the Cologne speakers (Fig. 2). The same trend was observed with the German observer groups (Table III). It is apparent that the Cologne speakers modified their pronunciation much more clearly than the Hamburg speakers, a fact that is perhaps to be expected when the low index values of the Cologne German realizations compared with the Hamburg German realizations are borne in mind.

Table III Average German index values for the English (E) and German (G) words spoken by Cologne (C) and Hamburg (HH) speakers

      Speaker     Tip-tip   Paß-pus   Busch-bush
HH    Ke   (E)    0·39      0·46      0·51
           (G)    0·45      0·43      0·57
      La   (E)    0·53      0·50      0·37
           (G)    0·57      0·70      0·33
      Te   (E)    0·74      0·41      0·55
           (G)    0·20      0·29      0·44
C     Bo   (E)    0·61      0·72      0·34
           (G)    0·37      0·41      0·20
      Mü   (E)    0·10      0·14      0·25
           (G)    0·08      0·15      0·22
      Ot   (E)    0·88      0·78      0·88
           (G)    0·41      0·23      0·41

(e) Table IV contains the reliability figures for the different observer groups in each of the tests. The consistently higher reliability, smaller range and lower standard deviation in the English observer groups are striking. This is perhaps to be expected in view of the task they were given: judging deviation from the neutral form of their native language. The German observers, on the other hand, were given a bipolar task, being asked to categorize some of the stimuli as belonging to a language which was non-native to them. It is, of course, recognized that a true comparison of the German and English reliability values is not possible since the basis of the judgments is different, but if anything a lower reliability is to be expected with five categories to choose from, so that the higher values of the English groups are doubly convincing. Apart from the general information as to the comparative reliability of the group judgments, the reliability values of the English groups were expected, as was mentioned above, to reveal whether the English observers judged "regional English" as a category in its own right or merely as a deviation from the "neutral English" norm. A third reliability "peak" for index values around 0·75 would be an indication of the separate category. The English UCL-group did in fact show a marked increase in reliability in the Tip-tip and Paß-pus tests for certain stimuli indexed at around 0·75, but (a) the generally very high reliability, (b) the judgment of one bush realization spoken by an RE speaker as unmistakably foreign (Index 0·08, Reliability 0·96), and (c) the lack of any peaks in the EFL-group reliability values, reduce the significance of such peaks.

Table IV Reliability values (R) for the individual tests with variation (range) (V) and standard deviation (S)

                   EFL    UCL    PH     UI     Qui.   HH
Tip-tip       R    0·85   0·85   0·70   0·62   0·55   0·59
              V    0·20   0·19   0·49   0·44   0·47   0·57
              S    0·06   0·07   0·15   0·13   0·13   0·16
Paß-pus       R    -      0·80   0·67   0·56   0·60   0·68
              V    -      0·21   0·55   0·35   0·50   0·50
              S    -      0·06   0·14   0·11   0·12   0·14
Busch-bush    R    0·80   0·80   0·57   0·59   0·57   0·56
              V    0·26   0·20   0·50   0·50   0·40   0·66
              S    0·06   0·06   0·14   0·15   0·10   0·16
Average       R    0·82   0·82   0·64   0·59   0·57   0·61
The reliability values and the wide range of index values are a strong indication that the judgments were not the result of chance. A χ²-test was carried out on the judgments passed by each group on individual realizations. The significance of the deviation from an equal distribution of the judgments was tested, and those realizations which did not deviate significantly were discarded for the correlations. The number of realizations discarded differed from test to test; Table V indicates the tokens discarded for the correlation calculations. Tip-tip, which generally had the highest reliability values (cf. Table IV), had the lowest number of tokens discarded whereas Busch-bush, which was judged less reliably, had the highest number of tokens discarded.
Table V Speakers for whom the judgment distribution did not reach significance
Tip-tip
UCL EFL HH PH UI Quinta
La(E), Te(G) La(E)
Paß-pus
Busch-bush
La(G), Bo(E)
La(G), Te(E)
La(E), Bo(G) La(E) P, Ke(E), La(E) P
S, Te(E), Bo(E), Ot(G) La(G), Te(E), Bo(E), Mü(E), Ot(G) S, Ke(G), La(G), Te(E), Ot(G) Ke(G), La(G)
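The screening step described before Table V (testing each realization's judgment distribution against an equal distribution) can be sketched as follows. This is a minimal illustration, not the procedure actually run for the paper: the 5% significance level, the observer counts and the function names are all assumptions introduced here; only the idea of a χ² test against an equal distribution comes from the text.

```python
# Critical chi-square values at the 5% level (an assumed level; the paper
# does not state which level was used), indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_uniform(counts):
    """Chi-square statistic for observed category counts against an
    equal distribution over the same categories."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def deviates_from_chance(counts):
    """True if the judgment distribution differs significantly (5% level)
    from an equal distribution; such tokens would be kept."""
    degrees_of_freedom = len(counts) - 1
    return chi_square_uniform(counts) > CHI2_CRIT_05[degrees_of_freedom]


# Hypothetical judgment counts for 20 observers (illustration only).
bipolar_clear = [18, 2]            # German task: "English" vs "German"
bipolar_unclear = [12, 8]
five_way_clear = [14, 3, 1, 1, 1]  # English task: five categories

for counts in (bipolar_clear, bipolar_unclear, five_way_clear):
    verdict = "keep" if deviates_from_chance(counts) else "discard"
    print(counts, round(chi_square_uniform(counts), 2), verdict)
```

Under these assumptions a token judged 12:8 by a bipolar group would be discarded, whereas an 18:2 split would be retained for the correlations.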
Acoustic analysis
The acoustic data will be presented in two parts: (a) those data connected with vowel quality and (b) durational data, i.e. vowel and aspiration durations. To provide further evidence and to give an indication of how representative the test-word data are, a selection of control-words taken from the 106 English and German sentences was analysed in addition to the test-words. They were as follows:
/ɪ/: bin-bin, fit-fit, Fisch-fish, Kitt-kit, pickt-picked, tickt-ticked
/a-ʌ/: Baß-bus, Faß-fuss, lasch-lush, hat-hut, Hasch-hush, Kaff-cuff
/u/: guck-cook, Pudding-pudding
The picture of regional differences conveyed by the three short vowel phonemes occurring in these words is, of course, extremely limited, although they do represent the extent of the short vowel area of articulation in both languages (with the exception of the intermediate-length vowel /æ/ in English). By considering the corresponding long vowels /i:/, /a:-ɑ:/, /u:/ in relation to the short vowels, it was felt that considerably more information could be gained about the articulatory habits in vowel production of the different languages and dialects. Therefore the keywords from a further selection of sentences were included in the analysis. They were:
/i:/: Vieth-feet, flieht-fleet, lieb-leap, lief-leaf, Lied-leat, ließ-lease
/a:-ɑ:/: Bahn-barn, tat-tart, kam-calm, kahl-Karl
/u:/: Mut-moot, Mus-moose, tut-toot, Hut-hoot, Huf-hoof
The large variety of consonantal contexts in the selected word-pairs is of no import since each speaker uttered all the words analysed.
Vowel quality
As stated above, F1 and F2 values were obtained for all vowels, and for /ɪ/ and /i:/ F3 in addition. The F1 and F2 values for the test-word vowels are shown in Fig. 3. Although the individual values cover a rather wide range, a systematic tendency is apparent even with the small number of vowels measured. The tendency is supported by the control-word measurements (Fig. 4). It may be summarized as follows:
(1) The English vowel /ɪ/ is considerably more "centralized" than the German one. The most striking feature, acoustically, is the high F1 and the relatively small amount of spread. In general the German F2 values, particularly the Cologne ones, are higher than the English, though this tendency is relatively weak, the overlap being almost complete.
Figure 3. F1/F2 values of test-word vowels spoken by RP and Regional (RE) English, Cologne (C) and Hamburg (HH) speakers. [Axes: F1 (kHz) and F2 (kHz).]
(2) The English vowel /ʌ/ is strongly centralized in the Regional English speakers, so that two distinct realization norms must be assumed, with higher F1 and F2 values for the RP norm. In relation to the German realizations, however, even the RP realization area is more centralized, although there is a small area of overlap with both the Hamburg and Cologne realizations when the total area of spread is considered. The two German regional groups reveal distinctly different tendencies in the F2 values, the Hamburg values being consistently higher.
(3) Once again, the area within which the formant values for English /u/ fall is more centralized than for the corresponding German realizations. The spread areas for the English and Hamburg speakers are larger than for the other two test vowels. The apparent division of RP and RE /u/-realizations in the test-word bush into a centralized and a peripheral norm is not supported by the data from the control words. All English realizations are therefore marked in one area with a correspondingly large F2 spread. The Hamburg and Cologne realizations show differing tendencies in the F1 values.
Figure 4. F1/F2 values of control-word vowels grouped according to speaker origin (English, Hamburg (HH) and Cologne (C) speakers plotted with separate symbols). [Axes: F1 (kHz) and F2 (kHz).]
Figure 5. Average formant values for vowels in German and English test and control words spoken by Hamburg (dashed line) and Cologne (dotted line) speakers compared with English (x) values. [Axes: F1 (kHz) and F2 (kHz).]
It is interesting to note that in producing the English words the German speakers in general modified their vowel articulation in such a way as to counteract the main difference between the German and English vowel in question (Fig. 5). Thus, the Hamburg speakers produced vowels with reduced F2 values when speaking words with E /ʌ/ as opposed to G /a/, whereas the Cologne speakers produced vowels with lower F1 values. In their E /u/ production, the Hamburg speakers showed a greater change in F1, whereas the Cologne speakers had a greater change in F2. It must, however, be pointed out that the clear modifications represented in the figure are naturally not present in each English-German word-pair as spoken by each individual speaker, but the tendency is strong enough to maintain that the two regional German speaker-groups modify their vowel production differently.
The relation between the short vowels just discussed and the corresponding long vowels /i:/, /a:-ɑ:/, /u:/ is shown in Fig. 6. A clear difference in the realization of /u:/ is apparent between the Hamburg and Cologne speakers; the English production of all three vowels differs quite considerably from that of both German speaker-groups.
Figure 6. Average formant values for long and short vowels in German control words spoken by Hamburg (dashed line) and Cologne (dotted line) speakers compared with average values from English control words spoken by English (solid line) speakers. [Axes: F1 (kHz) and F2 (kHz).]
Duration values
The values for vowel duration and aspiration for the individual test-word realizations are given in Table VI. The desire to avoid influencing the speaker's natural production habits while reading the sentences meant that a control of the speed of production to aid comparison of duration values was not possible. An examination of sentence duration revealed,
Table VI. Vowel and aspiration durations for the test-words

Speaker     A. Vowel duration (ms)          B. Duration of aspiration (ms)
            Tip-tip  Paß-pus  Busch-bush    Tip-tip  Paß-pus
P              99      149        99           55       52
B              82      128       107           18       14
S             133      126       116           54       27
H             140      139       119           59       32
Ke (E)        142      172       131           53       38
Ke (G)        137      163       128           52       71
La (E)        129      162       111           43       22
La (G)        129      189       103           43       41
Te (E)        152      140       121           61       27
Te (G)        103      109       110           37       27
Bo (E)        119      133       109           40       12
Bo (G)         91      119        96           27        9
Mü (E)         85      119        86           27       15
Mü (G)         73      113        86           30        9
Ot (E)        152      174       149           63       38
Ot (G)        101      166       124           37       18
RP             91      139       103           37       33
RE            136      133       118           57       30
Engl.*        113      136       110           47       31
HH (E)*       141      158       121           52       29
HH (G)*       123      154       114           44       46
C (E)*        119      142       115           43       22
C (G)*         88      133       102           31       12
*Average values.

however, that with the exception of the Cologne speaker Ot., who spoke at a rate that was 15-20% slower than the others, there was very little variation in the reading speed. A comparison of duration values is therefore still considered to be justified.
In the test-words the average vowel duration for the German words spoken by the Cologne speakers is shorter than that of the English words spoken by the English speakers. The average duration of the Hamburg German realizations, on the other hand, is in all cases longer than the English average values. In the case of Tip-tip, however, an average value for the English speakers is misleading since the two RP values are much lower than the RE values (91 vs. 136 ms). This difference is supported by the data from the control words containing the short vowel E /ɪ/ (84 vs. 112 ms), which indicates that there may be different durational norms dependent on dialect in both English and German. Such an assumption would, in the face of minimal dialectal differences in the other two short vowels examined, imply either that the underlying value is not invariant for all short vowels (cf. Nooteboom, 1972), or that the constraints governing the actual duration during production are different in different dialects for different vowels. In the case of German /a/ and /u/ the shorter realizations for the Cologne speakers are found consistently for all three vowels if allowance is made for the slow reading rate of the speaker Ot. Thus, although either dialectal differences in the underlying vowel duration or in the constraints on actual realization may be assumed, they
are at least constant for all three short vowels examined.
The question of invariant underlying durations, the linguistic-phonetic answer to the objections of variance of duration in actual production, becomes even more difficult in the light of the duration values of the long vowels examined. On the one hand, taking the average values for each speaker group for all the words spoken, there is an apparent consistency in the v/v: ratios for the different vowel pairs between groups (Table VII), in that the /ɪ-i:/ ratio is the highest and /a-a:/ or /ʌ-ɑ:/ the lowest for all groups. On the other hand, the difference in the v/v: ratios for the three vowel pairs in the English and Hamburg speaker groups is too great to be ascribed merely to the heterogeneity of the consonantal context, so that invariant underlying short and long vowel durations appear rather unlikely in these two cases. It might be noted in passing that the v/v: ratios are in no way related to the degree of difference between formant values for the long and short vowels (see Fig. 6), which might be expected if a trading relation were assumed between qualitative and durational differences as a basis for vowel oppositions (Bennett, 1968).
Table VII. v/v: ratios

Vowels              Speaker group
                    Engl.    HH      C
/ɪ-i:/              0·95    0·86    0·65
/u-u:/              0·72    0·78    0·60
/a-a:/, /ʌ-ɑ:/      0·49    0·52    0·54
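The argument about the spread of the v/v: ratios can be illustrated with a short calculation. The sketch below assumes one particular reading of Table VII (columns Engl., HH, C; rows /ɪ-i:/, /u-u:/, /a-a:/ or /ʌ-ɑ:/); the dictionary keys and the function name are hypothetical labels introduced for this example only.

```python
# v/v: ratios per speaker group, as read from Table VII (the assignment of
# values to groups is an assumption about the table layout).
V_RATIOS = {
    "Engl.": {"i": 0.95, "u": 0.72, "a": 0.49},
    "HH": {"i": 0.86, "u": 0.78, "a": 0.52},
    "C": {"i": 0.65, "u": 0.60, "a": 0.54},
}


def ratio_spread(ratios):
    """Difference between the largest and smallest v/v: ratio of a group."""
    return max(ratios.values()) - min(ratios.values())


for group, ratios in V_RATIOS.items():
    print(f"{group}: spread of v/v: ratios = {ratio_spread(ratios):.2f}")
```

On this reading the English and Hamburg groups show a much wider spread across the three vowel pairs than the Cologne group, which is the pattern the text appeals to.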
The second durational property which was examined was the aspiration of the initial plosives. In the test-words the data can be considered neither sufficiently representative nor clear enough in their trends to allow a statement about differences in the aspiration norms of the three speaker groups. They do, however, form a basis on which to consider the additional durational values taken from the control words, where, in addition to /p/ and /t/, /k/ was also represented (Kitt-kit, Kaff-cuff, kahl-Karl, kam-calm). The average values for the speaker-groups in the native-language test-word realizations were as follows:
/p/: Cologne 12 ms, Hamburg 46 ms, English 31 ms
/t/: Cologne 31 ms, Hamburg 44 ms, English 47 ms
In the case of the Hamburg and English speakers no systematic difference is apparent either from the average or the individual values, but a clear difference between these two groups and the Cologne speakers is apparent. This trend is supported even in the individual values despite the expected higher duration values for the Cologne speaker Ot. The only non-Cologne speaker with overlapping aspiration values was the RP speaker B. His voiceless plosive production was characterized by very weak aspiration throughout. The average values for the control words were:
/p/: Cologne 16 ms, Hamburg 34 ms, English 27 ms
/t/: Cologne 25 ms, Hamburg 46 ms, English 44 ms
/k/: Cologne 37 ms, Hamburg 51 ms, English 43 ms
These confirm the trend registered in the test-word data in that the Cologne values are consistently shorter than either the English or the Hamburg values. A comparison of the
English and Hamburg aspiration habits is difficult in view of the degree of variation from speaker to speaker. Standard deviation values for the speaker groups were:
/p/: Cologne 5·99; Hamburg 17·60; English 14·06
/t/: Cologne 8·88; Hamburg 14·58; English 13·12
/k/: Cologne 10·85; Hamburg 12·15; English 9·67
Another aspect of the aspiration values which was of interest was the difference between the German and English words as spoken by the German speakers. The Cologne speakers were again the only ones to reveal anything like systematic differences. The average values for the test-words were:
/p/: Cologne 12/22 ms; Hamburg 46/29 ms (German/English)
/t/: Cologne 31/43 ms; Hamburg 44/52 ms (German/English)
As might be expected, the Cologne speakers increased the degree of aspiration when producing "English" voiceless plosives, i.e. they modified their articulation in the direction of a supposed English norm. This tendency was supported by all but three of the control-word pairs. The apparent systematic modifications of the Hamburg speakers, which are also in accordance with the relationship existing between the native English and native Hamburg production values (increase in /t/ aspiration, decrease in /p/ aspiration), are not borne out by the control-word data.
It can be maintained therefore, on the basis of the test-word and control-word data for /p/ and /t/, that the aspiration habits of the English and Hamburg speakers are comparable, and that the degree of aspiration of the Cologne speakers is considerably less. Further, there is strong enough evidence that the difference in aspiration has led to a modification of the Cologne speakers' articulatory habits in English in the course of their contact with the English language. In the case of E and G /k/, no significant differences can be observed between any of the speaker groups, although once again the Cologne values were slightly lower than either the Hamburg or the English values.
The tendency towards an equalization of the aspiration values for /k/ is presumably due to the physiological factors of relatively restricted mobility of the back of the tongue compared with the tongue-tip and lips and the smaller oral cavity, which reduce the differences in the time taken for the transglottal pressure difference to reach the state necessary for phonation.
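The test-word aspiration averages quoted in this section can be recomputed from the individual values in Table VI. The following is a minimal sketch under two assumptions: the individual values are as read from Table VI (Tip-tip for /t/, Paß-pus for /p/), and averages are rounded half-up to whole milliseconds. The data layout and the function name are illustrative only.

```python
# Aspiration durations (ms) as read from Table VI:
# (speaker, speaker group, language of the word, Tip-tip /t/, Paß-pus /p/)
ASPIRATION_MS = [
    ("P", "Engl", "E", 55, 52), ("B", "Engl", "E", 18, 14),
    ("S", "Engl", "E", 54, 27), ("H", "Engl", "E", 59, 32),
    ("Ke", "HH", "E", 53, 38), ("Ke", "HH", "G", 52, 71),
    ("La", "HH", "E", 43, 22), ("La", "HH", "G", 43, 41),
    ("Te", "HH", "E", 61, 27), ("Te", "HH", "G", 37, 27),
    ("Bo", "C", "E", 40, 12), ("Bo", "C", "G", 27, 9),
    ("Mü", "C", "E", 27, 15), ("Mü", "C", "G", 30, 9),
    ("Ot", "C", "E", 63, 38), ("Ot", "C", "G", 37, 18),
]
COLUMN = {"t": 3, "p": 4}  # position of each plosive's value in a row


def mean_aspiration(group, language, plosive):
    """Group average aspiration in ms, rounded half-up (assumed rounding)."""
    values = [row[COLUMN[plosive]] for row in ASPIRATION_MS
              if row[1] == group and row[2] == language]
    return int(sum(values) / len(values) + 0.5)


for plosive in ("p", "t"):
    print(f"/{plosive}/ German/English words:",
          "Cologne", mean_aspiration("C", "G", plosive),
          "/", mean_aspiration("C", "E", plosive),
          "Hamburg", mean_aspiration("HH", "G", plosive),
          "/", mean_aspiration("HH", "E", plosive),
          "native English", mean_aspiration("Engl", "E", plosive))
```

With these inputs the Cologne speakers' increase in aspiration from German to English words appears for both plosives, while the Hamburg speakers move in opposite directions for /p/ and /t/.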
Discussion of Correlation Results
Figure 7 gives a graphic representation of the single correlations of the acoustic parameters F1, F2, R1 (also R2 in the case of Tip-tip), duration of aspiration (Asp) (with the exception of Busch-bush), vowel duration (V-D), vowel duration/aspiration (VD/Asp) and f0 change (Int) with the index values from the perception tests. There are two questions of interest:
(1) Is there evidence of different strategies for judging English or foreign accent (a) in the English observer groups compared with the German groups, (b) in the Cologne groups compared with the Hamburg group?
(2) Is there any evidence for assuming that any such differences in perceptual strategy are connected with differences in production by speakers from the same region, i.e. are the judgments on "accent" determined by language or regional norms?
Figure 7. Results of single correlations. [Panels: Tip-tip (F1, F2, R1, R2, Asp, VD, VD/Asp, Int); Paß-pus (F1, F2, R1, Asp, VD, VD/Asp, Int); Busch-bush (F1, F2, R1, VD, Int).]
With regard to the first question, no clear-cut differences can be observed which offer conclusive evidence for different strategies. Inconsistencies from word-pair to word-pair and between observer groups of a common language- or dialect-background complicate interpretations. Some indications remain, however. Firstly, for the English groups, none of the correlations of vowel duration with the index-values are significant, whereas in 10 of the 12 perception tests with the German groups (4 groups x 3 word-pairs), vowel duration appears to have influenced the groups' judgments. Secondly, the Hamburg group seems to judge
English or German accent independently of the degree of aspiration, whereas strong aspiration appears to indicate an English accent for the Cologne observers. The results from Tip-tip suggest that lack of aspiration in producing initial /t/ is an indication of a non-English accent for the English observers too (despite the high acceptability of speaker B's very weakly aspirated /t/), but without any significant correlations at all for the acoustic properties measured in Paß-pus it is not possible to claim more general significance for aspiration as an "accent-marker" in English.
Taking up the second question, whether such differences are directly connected with differences in production habits, we are again faced with inconclusive evidence. Allowing for the slower speaking speed of the Cologne subject Ot., the importance of vowel-length for the Cologne judgments on accent would appear to be directly connected with their own shorter norm. Similarly, it can be argued that with such a large degree of variation in evidence in the production of the native English speakers, judgments by the English observer groups, made independently of vowel duration, also confirm expectations. This simple reflection of production habits in perception is, however, not to be found in the Hamburg results. For Tip-tip and Busch-bush, the Hamburg observers appear to have judged the shorter vowel durations as one of the indications of a German accent rather than an English accent in just the same way as the Cologne observers. But the average duration for the Hamburg short vowels was longer than the English average in all three cases. Thus, whereas it is possible to claim that the Cologne observers were judging the acceptability of the stimuli offered on the strength of a comparison with some internalized Cologne norm, this is not possible as an explanation of the Hamburg judgments.
A similar dilemma is apparent when the connection is sought between production and perception along the aspiration parameter. In the case of the English observers (at least for Tip-tip), a deviation from the English aspirated norm was registered as "foreign". For the Cologne groups too, deviations from their own unaspirated norm tended to be judged "English". The Hamburg group, on the other hand, should show a positive correlation (less aspiration = less "German") under the same circumstances, i.e. if its members had judged accent according to the same principle of deviation from the norm. As it is, an almost zero correlation can be observed for Paß-pus and a non-significant negative correlation for Tip-tip.
One fact that is apparent from these results is that the Hamburg observers, at least, did not make their judgments solely within the frame of reference of the stimuli offered, i.e. they did not locate their own norm within the range offered and automatically judge deviations as English. If that had been the case they would have judged differently in both of the above cases. It may be assumed, therefore, that the Hamburg observers at least were aware of the direction a deviation from their own norm would have to take to be "English". Evidence from the vowel quality data suggests that this was also the case for the Cologne observers. Results from the Paß-pus test reveal significant correlations with at least one "quality" parameter for all German groups. A comparison of the Hamburg, Cologne and English production data (Fig. 4) indicates that the Hamburg observers would obtain high F1 and F2 correlations simply by following the principle of registering deviations from the Hamburg extreme low, fronted /a/ norm. The Cologne observers could not, however, follow this principle and obtain significant correlations either with R1 (which is the case for all three Cologne groups), F1 (two cases) or F2 (in one case).
There are indications in the results from the Busch-bush "quality" parameters that the orientation of judgments on an assumed English "norm" is a question of (second) language experience, and that the more naive the hearer the more likely his judgments are to be based on deviations from his own
norm. Thus, whereas the Cologne student group (PH) obtained significant correlations for both F1 and F2, the two school groups' (UI and Qui) index values only correlated significantly with F2. The results from Tip-tip also point to an increase in the importance of vowel quality with increasing age (and presumably increasing L2 experience), but such interpretations are extremely dangerous on the basis of such vague evidence.
In answer to the second question, then, several indications of a direct connection between the respective production norms and perceptual data are to be found, though there is also evidence that some knowledge of the English norm, or alternatively of possible German dialectal variants, influenced the judgments of some German groups, preventing a general classification as "English" of all stimuli deviating from the norm. The uncertain nature of such interpretations must, however, be stressed in the light of other results which cannot be explained at all by these two alternatives. The most extreme example is the Hamburg group's vowel quality data in the Tip-tip test, which, although not reaching significance in the correlations, reveal a reverse trend both to the English and Cologne results. This could be interpreted as pointing to a tendency of the Hamburg observers to judge more centralized vowel realizations as more German despite the fact that their own norm was the most extreme of all three speaker groups. In other words, deviations from their own norm were, if anything, considered more German.
Multiple Correlations
One obvious disadvantage of interpreting data gained from correlations between a dependent variable and only one independent variable (in the present case between the index and one acoustic parameter) is the uncertainty about the genuine independence of the "independent variable". Particularly in the present case, a separation of one parameter from the complex of the speech signal is clearly artificial.
Naturally, any selection of parameters shares this artificiality to a certain extent, but by calculating multiple correlations the statistical relations between the independent variables are taken into consideration. To take one example of possible misinterpretations that can arise from overlooking such interdependencies: the Hamburg index values for Paß-pus correlate significantly with both F1 (+0·7341) and F2 (+0·6649). Translated into predictability values, this means that more than 50% of the index variance is predictable from variance in F1 and more than 40% is predictable from F2. But it is, of course, not possible to conclude that the index is determined 90% by F1 and F2. The double correlation F1 F2/Index (0·7519) shows that in fact the predictability of the index from the two parameters is only 56·5%, due to an intercorrelation of +0·7616 for F1 and F2.
Multiple correlations give a clearer picture of the relationship between the independent variables and the dependent variable. Additional information is gained by calculating the Beta-values (see above), which reveal the relative contribution of the individual variables to the correlation. There is, however, a practical limit to the number of independent variables which it is advisable to include in a correlation (apart from the limit imposed in the present study by the computational facilities available). Firstly, interpretation becomes increasingly difficult as the number increases, and secondly, a significant correlation becomes inevitable above a certain number of variables even if none of them achieve significance individually (McNemar, 1969, p. 204). An F-test can be applied to check whether the increase in a correlation by including an additional variable is significant:
$$F = \frac{(R_1^2 - R_2^2)/(m_1 - m_2)}{(1 - R_1^2)/(N - m_1 - 1)}$$
where R1 is the correlation with m1 independent variables, R2 the correlation with m2 independent variables, the m2 variables are also among the m1 variables, and N is the number of cases. Double and triple correlations and the corresponding Beta-values were calculated for those variables which could be considered "acoustically" independent (i.e. F1, F2 and R1 or the combination Asp, V-D and VD/Asp were not included in the triple correlations). A full discussion of the interpretation cannot be included in the space available (see Barry, 1974, pp. 141-158), but after comparing Beta-values and applying the F-test, the following acoustic parameters were extrapolated from the data as having influenced the judgments of the observer groups.
Table VIII. Acoustic properties extrapolated from multiple correlation data as having influenced observer group judgments
Tip-tip
Paß-pus
UCL
EFL
HH
R1 Asp Int
R1 Asp Int
-R1
(~~)
F1
(Int) Busch-bush
F1
VD
F2
PH
UI
R1
F1
VD
VD
Asp
Asp
F1
F1
Asp
F2
VD
VD
Quinta Asp
VD
(VD)
(F2) (-Asp)
(Asp)
VD
VD
VD
VD
F1
F2
F1
F1
(VD) (Asp)
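The double-correlation example and the F-test described before Table VIII can be checked numerically. The sketch below uses the standard two-predictor formula for the multiple correlation, which the paper does not print but which is consistent with the values it reports; the function names are illustrative, and the number of cases used for the F-test is a purely hypothetical figure, since the paper does not state N for this test.

```python
import math


def double_correlation(r_y1, r_y2, r_12):
    """Multiple correlation R of a dependent variable with two predictors,
    from the two single correlations and the predictors' intercorrelation."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


def f_increase(r_full, m_full, r_reduced, m_reduced, n_cases):
    """F ratio for the gain in correlation when the m_reduced predictors
    are extended to m_full predictors (formula given in the text)."""
    numerator = (r_full ** 2 - r_reduced ** 2) / (m_full - m_reduced)
    denominator = (1 - r_full ** 2) / (n_cases - m_full - 1)
    return numerator / denominator


# Hamburg group, Paß-pus: single correlations of the index with F1 and F2
# and the F1/F2 intercorrelation, as quoted in the text.
R = double_correlation(0.7341, 0.6649, 0.7616)
print(f"F1 alone: {0.7341 ** 2:.1%}, F2 alone: {0.6649 ** 2:.1%}")
print(f"F1 and F2 together: R = {R:.4f}, predictable variance = {R ** 2:.1%}")

# Does adding F2 to F1 raise the correlation significantly?
N_HYPOTHETICAL = 14  # assumed number of retained tokens, illustration only
print(f"F = {f_increase(R, 2, 0.7341, 1, N_HYPOTHETICAL):.2f}")
```

The recomputed double correlation agrees with the reported 0·7519 to within rounding, which illustrates why the two single predictabilities cannot simply be added when the predictors are themselves highly correlated.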
Those parameters whose influence remains in some doubt are given in brackets. The importance of the variables that were selected is confirmed to a certain extent by the level reached in the correlations. For the German groups, the highest coefficient is above 0·75 for every word-pair and reaches 0·93 in two cases (Qui: Tip-tip and PH: Busch-bush); in other words, between 56% and 86% of the index variation is predictable from the variations in the parameters included in the correlation. The results from Tip-tip are comparable for both English groups, as are those from Busch-bush in the case of UCL, but in Paß-pus the correlations are extremely low throughout.
Conclusions
In the light of the test results, acoustic measurements and the correlations between test results and acoustic properties, the following conclusions may be drawn in connection with the aims stated at the outset. The test results clearly show that the observer groups were in a position to pass judgment on the acceptability as English or foreign/German of certain previously identified words. In the case of the word-pairs Tip-tip and Paß-pus, the judgments of the English observers show a remarkable degree of agreement with the actual language background of the individual speakers, i.e. the RP speakers were considered most and the Cologne speakers least acceptable as "English". For Busch-bush, the degree of overlap of the speaker groups was slightly greater. The German observers' judgments did not reveal the same differentiation of speakers according to actual language background. In the search for acoustic correlates of "foreign accent" (English vs. German accent in the case of the German observers), significant correlations were found for all tests except
for the UCL observer group in the Paß-pus test. The interpretation of these correlations as indications of the influence of a particular acoustic property on the observer group's judgments underwent certain modifications in the light of results from the multiple correlations when the intercorrelations between various acoustic parameters, Beta-values and the significance of the increased correlation were taken into account. In some cases, uncertainty as to the importance of a particular property remained.
With regard to the third aim of the investigation, that of linking the correlation data with inter-regional and inter-lingual production differences, the following can be concluded. The priority of vowel quality for the judgment of vowels by English speakers, as against both quality and duration by the German speakers, is a fairly clear trend. This is in accordance with expectations (but compare Bennett, 1968) considering the importance vowel length has as encoded information for the following consonant (Raphael, 1972). This does not mean that vowel length is irrelevant, but merely that the shorter vowel realizations (particularly by the Cologne speakers) were not judged as deviant. Longer "short vowel" realizations, on the other hand, would presumably have led to judgments of "foreign" for vowels before fortis consonants.
A further difference between groups which is in agreement with observed differences in production is that of aspiration. For the Cologne observers, longer duration of aspiration correlates with "English", and the production of the Cologne speakers reveals consistently lower aspiration values. For the English groups, the relation is reversed; low values correlate with "foreign" and the English production reveals aspiration values that are generally higher than the Cologne values and comparable with the Hamburg values.
Despite such cases of agreement between production and perception, the use of natural stimuli presents difficulties of analysis which cannot be completely overcome by employing multiple correlation techniques, particularly since interpretation becomes increasingly difficult with a growing number of variables. Natural stimuli simply do not allow the certainty that all the parameters that have interacted in the judgments have been included in the correlation. The Paß-pus results are the best witness of this fact, since none of the parameters give a significant correlation for UCL despite the fact that the speakers were differentiated extremely well according to their language and regional background. Synthetic stimuli, on the other hand, only provide information about the parameters varied during stimulus production. This provides certainty at one level, but does not allow speculation about other possible parameters. This is particularly important in such an area of speech investigation as foreign or social "accent", where the relevant information is interwoven with more central information.
Whether the Hamburg group's judgments on Tip-tip were in fact a result of such unexplored parameters, an artefact of the test situation or a true reflection of the group's perceptual habits must await further investigation. But it must be mentioned again that they represent a puzzling contradiction to the indications of a link between production norms and the perception of deviant "accent" which pervade the investigation in general.
With all due consideration for such inconsistencies and discrepancies between production habits and perceptual strategies, the important implications of the results should not be overlooked. It is evident that a considerable degree of agreement exists between members of observer groups, not only in the identification of phonemes, as has been amply demonstrated in the past, but also in the judgment of "foreign accent".
With groups of differing language background, significant judgment differences are revealed, and connections are apparent between these differences and measured differences in the production of speakers with the same language background. The possibility of isolating the elements which create
the impression of "foreignness" has obvious applications in language-teaching, and the validity of such diagnoses at regional level is particularly important.
The research reported here was carried out as part of the research for a doctoral thesis under the supervision of Professor Georg Heike at Cologne University Institute of Phonetics. I wish to thank him and the other Institute members, who gave me critical advice and encouragement.
References
Abramson, A. S. & Lisker, L. (1972). Voice-timing perception in Spanish word-initial stops. Status Report on Speech Research, SR-29/30. Haskins Laboratories, New Haven, 15-25; also (1973). Journal of Phonetics 1, 1-8.
Barry, W. J. (1974). Perzeption und Produktion im sub-phonemischen Bereich. Eine kontrastive Untersuchung an intersprachlichen Minimalpaaren des Deutschen und Englischen. Tübingen: Niemeyer, in press.
Bennett, D. C. (1968). Spectral form and duration as cues in the recognition of English and German vowels. Language and Speech 11, 65-85.
Fant, G. (1968). Analysis and synthesis of speech processes. In (B. Malmberg, Ed.) Manual of Phonetics, Amsterdam. pp. 173-277.
Fant, G., Fintoft, K., Liljencrants, J., Lindblom, B. & Martony, J. (1963). Formant amplitude measurements. Journal of the Acoustical Society of America 35, 1753(A).
Fischer-Jørgensen, E. (1964). Sound duration and place of articulation. Zeitschrift für Phonetik 17, 175-207.
Fourcin, A. J. (1965). A note on the spectral analysis of unvoiced sounds. Proceedings of the 5th International Congress of Phonetic Sciences, Basel, pp. 287-291.
Fry, D. B. (1964). The correction of errors in the reception of speech. Phonetica 11, 164-174.
Fry, D. B., Abramson, A. S., Eimas, P. D. & Liberman, A. M. (1962). The identification and discrimination of synthetic vowels. Language and Speech 5, 171-189.
Hammarström, G. (1963). Réflexions sur la linguistique structurale et la phonétique expérimentale. Phonetica 9, 11-16.
Harris, K. S. (1958). Cues for the discrimination of American English fricatives in spoken syllables. Language and Speech 1, 1-7.
Heike, G. (1969a). Sprachliche Kommunikation und linguistische Analyse. Heidelberg.
Heike, G. (1969b). Suprasegmentale Analyse. Marburg: N. G. Elwert Verlag.
Hughes, G. & Halle, M. (1956). Spectral properties of fricative consonants. Journal of the Acoustical Society of America 28, 303-310.
Kozhevnikov, V. A. & Chistovich, L. A. (1965). Speech, Articulation and Perception. Moscow, Leningrad: Joint Publications Research Service, Washington D.C.
Liberman, A. M. (1957). Some results of research on speech perception. Journal of the Acoustical Society of America 29, 117-123.
Liberman, A. M., Cooper, F. S., Harris, K. S. & MacNeilage, P. F. (1963). A motor theory of speech perception. Proceedings of the Speech Communication Seminar. Stockholm.
Lienert, G. A. (1967). Testaufbau und Testanalyse. Weinheim: Julius Beltz Verlag.
Lindblom, B. (1962). Accuracy and limitations of sonagraph measurements. Proceedings of the 4th International Congress of Phonetic Sciences, Helsinki. pp. 188-202. Mouton & Co.
Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America 35, 1773-1781.
Lindblom, B. & Studdert-Kennedy, M. (1967). On the role of formant transitions in vowel recognition. Journal of the Acoustical Society of America 42, 830-843.
Lindner, G. (1966a). Beurteilung synthetisch erzeugter vokalartiger Klänge durch deutschsprachige Hörer. Zeitschrift für Phonetik 19, 45-65.
Lindner, G. (1966b). Veränderung der Beurteilung synthetischer Vokale unter dem Einfluß des Sukzessivkontrastes. Zeitschrift für Phonetik 19, 287-307.
Lisker, L. & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word 20, 384-422.
Maack, A. (1953). Die Beeinflussung der Sonantdauer durch die Nachbarkonsonanten. Zeitschrift für Phonetik 1, 104-128.
MacNeilage, P. F. (1970). The motor control of serial ordering of speech. Psychological Review 71, 182-196.
McNemar, Q. (1969). Psychological Statistics. New York: John Wiley & Sons, Inc.
Mansell, P. (1970). The nature of EMG variations. University of Essex Language Centre, Occasional Papers, 9, Symposium 1970: Aerodynamic and Myodynamic Studies of Speech. Models of Speech Production, pp. 65-87.
Meyer-Eppler, W. (1959). Zum Problem der sphariellen Analyse in der lautsprachlichen Kommunikation. Zeitschrift für Phonetik 12, 228-236.
Nooteboom, S. G. (1972). Production and Perception of Vowel Duration, a Study of Durational Properties of Vowels in Dutch. Dissertation, Utrecht.
Nooteboom, S. G. (1973). The perceptual reality of some prosodic durations. Journal of Phonetics 1, 25-45.
Petersen, G. E. & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America 32, 693-703.
Raphael, L. J. (1972). Preceding vowel duration as a cue to the perception of voicing characteristics of word-final consonants in American English. Journal of the Acoustical Society of America 51, 1296-1303.
Scholes, R. J. (1967a). Phoneme categorization of synthetic vocalic stimuli by speakers of Japanese, Spanish, Persian and American English. Language and Speech 10, 46-68.
Scholes, R. J. (1967b). Categorial responses to synthetic vocalic stimuli by speakers of various languages. Language and Speech 10, 252-282.
Sharf, D. J. (1968). Distinctiveness of "defective" fricative sounds. Language and Speech 11, 38-45.
Stevens, K. N. (1968). On the relations between speech movements and speech perception. Zeitschrift für Phonetik 21, 102-106.
Stevens, K. N., Liberman, A. M., Studdert-Kennedy, M. & Öhman, S. E. G. (1969). Cross-language study of vowel perception. Language and Speech 12, 1-23.
Strevens, P. (1960). Spectra of fricative noise in human speech. Language and Speech 3, 32-49.