Perceptual category mapping between English and Korean prevocalic obstruents: Evidence from mapping effects in second language identification skills


ARTICLE IN PRESS

Journal of Phonetics 36 (2008) 704–723 www.elsevier.com/locate/phonetics

Hanyong Park, Kenneth J. de Jong

Department of Linguistics, Indiana University, Memorial Hall Room 322, 1021 E Third St., Bloomington, IN 47405-7005, USA

Received 29 May 2007; received in revised form 9 June 2008; accepted 20 June 2008

Abstract

The current study develops an approach to quantify the extent to which native language (L1) categories are used in second language (L2) category identification, and uses this approach to examine the identification of a set of English obstruents by Korean learners of English as a foreign language. Forty native Koreans listened to nonsense English CV words consisting of /p b t d f v θ ð/ and /a/, and were asked to identify the consonant with both Korean and English labels. They also gave gradient evaluations of the goodness of the Korean labels for the stimuli. The results of the Korean labeling task were analyzed to predict what confusion patterns would be expected if listeners used L1 categories and probabilistically mapped them onto L2 category responses. Results show that the perceptual patterns of L2 stops can be successfully predicted by use of L1 categories alone if the listeners' goodness rating scores are used to weight the probabilistic mapping from L1 to L2 in the predictions. Accuracy for other segments, such as the fricative /f/, was higher than predicted. In general, this increase in accuracy over what is predicted from the L1 mapping data was negatively correlated with the average goodness-of-fit to the Korean labels. These results provide quantitative corroboration of acquisition models claiming that some L2 categories can function using existing L1 categories alone, while others indicate the addition of a new linguistic category.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

A fundamental tenet of Flege's Speech Learning Model (SLM: Flege, 1987, 1995) is that the degree of success an acquirer will have in approximating the production of a non-native speech segment depends on the perceptual similarity of that segment to segments in the native language. Specifically, when an L1 and an L2 segment are similar enough to each other, the L2 segment can be functionally approximated more quickly than less similar segments. However, over time, these less similar or "new" sounds will be acquired more accurately, since a learner is said to construct an L2 category for them de novo. The more similar segments, by contrast, are heavily entangled with existing L1 categories, and so retain aspects of the L1 categories long into the acquisition process.

Corresponding author. Tel.: +1 812 331 0307; fax: +1 812 855 5356.

E-mail address: [email protected] (H. Park).
0095-4470/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.wocn.2008.06.002


The basic role of cross-language perceptual similarity in L2 perception is also commonly accepted. It is a fundamental tenet of Best and her colleagues' Perceptual Assimilation Model (PAM: Best, McRoberts, & Goodell, 2001; Best, McRoberts, & Sithole, 1988) that a listener's initial ability to discriminate sounds also depends on the perceptual similarity between the objects of perception in an L2 and those in an L1. Using a general technique similar to classic research in the categorical perception literature, PAM research typically uses a cross-language similarity mapping task targeting contrasting pairs of segments in an unfamiliar language, and then uses the results of such higher-level tasks to classify each pair into one of several assimilation scenarios. Based on this classification, researchers then make qualitative predictions about listeners' lower-level abilities to discriminate the sounds. This research has accrued much evidence that discrimination abilities depend not just on familiarity with sounds similar to those in the L2, but also on the degree to which the listeners' L1 categories cause them to fail to differentiate the sounds in linguistic categorization. Discrimination is very good both when the L1 categories require the listeners to differentiate the sounds (i.e., Two Category assimilation) and when one of the sounds falls outside the L1 system and is thus poorly assimilated to any L1 category (i.e., Uncategorized–Categorized pair). Cases in which two sounds map onto a single L1 category, and so are not differentiated in the L1 at all (i.e., Single Category assimilation), yield the poorest discrimination. When the two sounds map onto a single L1 category but one fits that category better than the other (i.e., Category Goodness difference), moderately good discrimination is expected.
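The scenario logic described above can be made concrete as a small classifier. The following is a minimal Python sketch, not a procedure defined by PAM itself: it assumes each L2 sound is summarized by its modal L1 label, the proportion of responses using that label, and the mean goodness rating, and both cutoffs (`CATEGORIZED_MIN`, `GOODNESS_GAP`) are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; neither value comes from PAM or this study.
CATEGORIZED_MIN = 0.5  # modal L1 label must receive at least this share of responses
GOODNESS_GAP = 1.0     # minimum difference in mean goodness (1-7 scale) for a CG pair

# Qualitative discrimination predictions for the scenarios described in the text.
PREDICTED_DISCRIMINATION = {
    "TC": "very good",  # Two Category assimilation
    "UC": "very good",  # Uncategorized-Categorized pair
    "CG": "moderate",   # Category Goodness difference
    "SC": "poor",       # Single Category assimilation
}


@dataclass
class Assimilation:
    """How one L2 sound was mapped onto the L1 (hypothetical summary record)."""
    modal_l1: str      # most frequent L1 label
    proportion: float  # share of responses using that label (0-1)
    goodness: float    # mean goodness rating for that label


def pam_scenario(a: Assimilation, b: Assimilation) -> str:
    """Classify an L2 contrast into one of the four scenarios above (sketch)."""
    a_cat = a.proportion >= CATEGORIZED_MIN
    b_cat = b.proportion >= CATEGORIZED_MIN
    if a_cat and b_cat:
        if a.modal_l1 != b.modal_l1:
            return "TC"
        if abs(a.goodness - b.goodness) >= GOODNESS_GAP:
            return "CG"
        return "SC"
    if a_cat or b_cat:
        return "UC"
    # Both sounds uncategorized: outside the four scenarios discussed here.
    return "unclassified"


if __name__ == "__main__":
    # Illustrative values: modal Korean labels for English /t/ and /d/ from Section 3.1.1.
    t = Assimilation("tʰ", 0.98, 6.13)
    d = Assimilation("t", 0.84, 5.35)
    scenario = pam_scenario(t, d)
    print(scenario, PREDICTED_DISCRIMINATION[scenario])  # TC very good
```

The cutoffs make the classification explicit but arbitrary; the qualitative PAM literature does not fix them, which is part of why the present study seeks quantitative predictions instead.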
Guion, Flege, Akahane-Yamada, and Pruitt (2000) extend this research approach in cross-language perception to the kind of learners who are the object of the SLM, finding generally similar results. Examining a variety of segmental pairs (or triplets) in English that typically exhibit substitution effects in Japanese learners of English, they had Japanese listeners perform mapping tasks in which they labeled English productions with Japanese orthographic characters and gave goodness-of-fit ratings on a Likert scale. These results were then used to classify pairs of segments according to PAM scenarios, yielding essentially the same results as earlier cross-language perception studies. Extending these results to listeners with more experience with English showed that the time course of development could not be readily predicted from the cross-language mapping data. This suggests that, while research in the SLM has accrued much evidence for the role of similarity between segments in L2 production accuracy, predicting learning patterns will require more than similarity mapping. The SLM differs from PAM in that it seeks to determine the extent to which a single L2 category is nativelike, while PAM examines relations between categories. While much previous research has investigated mapping and discrimination (Best et al., 2001; Flege & MacKay, 2004; Polka, 1995; Tsukada et al., 2005), there is little research quantitatively examining the relationship between mapping and identification. Although identification skills require discrimination skills, learning a linguistic function requires more than discrimination: it also requires robust association with the right linguistic categories. When dealing with learners with a variety of ambient experience with the L2, identification performance can indicate explicitly how the listeners are functionally differentiating speech objects.
Much like earlier SLM research, the current research seeks to predict identification abilities in learners on the basis of L1-to-L2 orthographic mapping patterns. However, unlike previous work on identification, the current study seeks to quantify the relationship between mapping results and identification. It estimates the perceptual similarity between consonants in Korean and English, and uses these similarity estimates to make quantitative predictions about L2 identification skills. Similar to the predictions of PAM, the basic hypothesis about L2 identification accuracy is that cases in which native L2 productions of two contrasting segments are mapped onto the same L1 category will yield confusions between the two L2 sounds, and so are particularly problematic for L2 identification. By contrast, cases in which L2 productions of two contrasting segments are systematically mapped onto different L1 categories will not be confusing, even if there is no single L1 category corresponding to either of the L2 categories. Similar to the predictions of the SLM about production, a second hypothesis is that L2 segments that do not map well onto any L1 category will not present identification problems for learners who have enough experience with the L2 to have acquired the segmental label.

Schmidt (1996) provides groundwork for Korean-to-English mapping. Schmidt asked Korean learners of English to label English consonant productions using Korean orthographic characters. The orthographic classification technique, in which a phonemic L1 orthographic system is used for the response labels, allows learners to use very familiar L1 labels to tap into their native Korean categories and apply them to the variety of consonants in English. Several researchers have used this technique in cross-language perception work. Wiik (1965) investigated the vowel spaces of English and Finnish with this method. Since Finnish has a phonemic orthographic system with a one-to-one correspondence between phonemic and graphemic categories, listeners' responses show which English vowel corresponds to which Finnish vowel. He asked Finnish listeners to write what they heard on an answer form in ordinary Finnish orthography, and analyzed how probable each English vowel was to be identified as each Finnish vowel. Flege (1991) also used this technique to investigate Spanish and English vowels. Because the Spanish orthographic system for vowels also has a one-to-one grapheme-to-phoneme mapping, he asked native Spanish listeners to label English vowels with one of the letters for the five vowel phonemes of Spanish (i.e., /i/, /e/, /o/, /a/, and /u/). More recently, Cebrian (2006) investigated the mappings between English and Catalan high and mid front vowels. He asked native speakers of Catalan to choose one of four Catalan orthographic representations, namely, 'i', 'è', 'é', and 'ei' (/i/, /ɛ/, /e/, /ei/, respectively), as the best-matching Catalan vowel for an English vowel stimulus. One potential problem with these studies, however, is that the orthographic probes are the same in both languages: Finnish, Spanish, Catalan, and English all use the Roman alphabet. Confusion may arise between the phonemic and orthographic levels when studying languages that share the same alphabet (Park, de Jong, & Silbert, 2004); responses using L1 labels could be occurring at the level of orthography rather than at the level of auditory categorization.
However, this is not an issue for languages that do not share characters in their orthographic systems. A second potential complication arises in studies such as Guion et al. (2000), which used Roman and Japanese orthography: the Japanese system does not incorporate segmental labels, and so requires the listeners to make syllable-level judgments about the stimuli in order to assign orthographic labels. Kim (1972) examined Korean and English consonants using Korean orthography. Since Korean orthography is alphabetic but shares no characters with the English orthographic system, character usage is not an issue. Schmidt (1996) replicated Kim's study with more controlled experimental variables. She asked 20 native Korean listeners to listen to English stimuli and type, in Korean orthography, the initial consonant they heard. The stimuli were nonsense CV syllables of 22 consonants combined with the vowels /i a u/, spoken by three female native English speakers. She also asked the listeners to judge how similar the English sound was to the selected Korean orthographic response on a Likert scale from 1 to 5. Schmidt (1996), however, did not obtain English identification performance from her listeners. The current research partially replicates Schmidt (1996), and also seeks to determine the degree to which such L2-to-L1 mappings are implicated in L2 perceptual identification. We examined Korean listeners' perception of English anterior obstruents using the orthographic classification technique. We also had the listeners rate each response on a Likert scale in order to assess the strength of the mapping between the stimuli and the Korean labels (Best et al., 1988; Cebrian, 2006; Guion et al., 2000; Polka, 1995; Schmidt, 1996; Strange et al., 1998; Tsukada et al., 2005).
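Response records from an orthographic classification task with ratings reduce to a matrix of label proportions and mean ratings per stimulus category, the same kind of summary given in Table 2 of this paper. A minimal Python sketch, assuming each response is stored as a hypothetical `(l2_stimulus, l1_label, rating)` tuple:

```python
from collections import defaultdict


def tally(responses):
    """Summarize (l2_stimulus, l1_label, rating) tuples.

    Returns {stimulus: {label: (proportion, mean_rating)}}, where proportion is
    the share of that stimulus category's responses that used the label.
    """
    counts = defaultdict(lambda: defaultdict(int))
    rating_sums = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(int)
    for stimulus, label, rating in responses:
        counts[stimulus][label] += 1
        rating_sums[stimulus][label] += rating
        totals[stimulus] += 1

    return {
        stimulus: {
            label: (n / totals[stimulus], rating_sums[stimulus][label] / n)
            for label, n in by_label.items()
        }
        for stimulus, by_label in counts.items()
    }


if __name__ == "__main__":
    # Made-up responses for illustration; not data from this study.
    demo = [("p", "pʰ", 6), ("p", "pʰ", 7), ("p", "p'", 3), ("b", "p'", 6)]
    for stimulus, row in tally(demo).items():
        print(stimulus, row)
```

Keeping the proportion and the mean rating side by side lets either be used later, for example to weight a mapping by its rating rather than by usage alone.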
From these mapping data, we seek to predict English identification accuracy, using a model in which identification is based entirely on the listeners' L1 categories and the mapping of these categories onto L2 labels. The success of these predictions indicates the degree to which L2 identification performance can be attributed to the use of L1 categories. Increments in performance beyond such a hypothetical baseline, then, are taken to indicate the development of new L2 categories.

2. Methods

2.1. Talkers and stimuli

Two male and two female native speakers of American English produced the stimuli. All the speakers were in their late 20s and had a residential history dominated by the Northern Midwest. The speakers were asked to read a randomized list of nonsense words in isolation, consisting of the vowel /a/ and a variety of consonants in various prosodic locations with respect to the vowel, i.e., initial and prevocalic, final and postvocalic, and pre-stressed and post-stressed intervocalic positions. The current analyses only examine the consonants produced in prevocalic word-initial position.

Table 1
Corpus description in the current study

                           Labial               Coronal
                           Voiced   Voiceless   Voiced   Voiceless
Stops                      /b/      /p/         /d/      /t/
Non-sibilant fricatives    /v/      /f/         /ð/      /θ/

The stimuli were recorded in a sound-attenuated recording room in the Linguistics Department at Indiana University, using an Electro Voice (model RE50) standing microphone and a TASCAM DA-30 MKII DAT recorder at a 44.1 kHz sampling rate. The recordings were transferred from DAT to an Apple G4 in the Linguistic Speech Lab at Indiana University for editing. The full stimulus set included most of the non-dorsal consonants of English in various syllabic positions. However, the current paper focuses only on the results from the symmetrical set of eight obstruents in CV position shown in Table 1. Roughly speaking, the stops are acoustically similar to the stops in Korean, though the voicing contrast is somewhat different. (See Kang & Guion, 2006, for a review of the acoustic properties of Korean stops.) Instead of a two-way contrast in initial position, Korean stops show a three-way contrast among the so-called tense, lax, and aspirated stops (Kim, 1970). Although the match across the languages is not exact, English stops and coronal sibilant fricatives are similar to their Korean counterparts in both place and manner of articulation. The non-sibilant fricatives, however, are likely to be novel to Korean listeners, because Korean has no non-sibilant fricatives other than /h/. The whole set (14 consonants × 4 speakers × 4 prosodic locations = 224 stimuli) was presented in order to cover the range of responses to the larger set of segmental neighbors. In the current study, however, only the results for eight consonant types produced by 4 different speakers, i.e., 32 test stimuli (= 8 consonants × 4 speakers × 1 prosodic location), are reported.

2.2. Listeners

Forty listeners (28 female and 12 male; mean age = 24.97 years; 30 from the Seoul/Gyeonggi area and 10 from other areas) were recruited from the undergraduate students at Kyonggi University in Suwon (near Seoul), Korea. All the listeners had studied English for more than 7 years as a regular school subject. However, none had lived in an English-speaking country prior to the experiment. All were paid for participating in the experiment.

2.3. Procedure

All of the stimuli were randomized and presented to groups of approximately 10 listeners at a time. The experiment was conducted in a quiet room at Kyonggi University in Korea. The stimuli were played from a CD, on either a PC or another playback device, through loudspeakers with an interstimulus interval of 5 s. No feedback was given during any of the sessions. The listeners knew that the stimuli had been produced by native speakers of English, and they were also told that the stimuli were not real English words.¹ Five practice items were played before each task to avoid any misunderstanding of the experimental procedure. The experiment was conducted by a native Korean experimenter, so there was no misunderstanding about the mechanics of the tasks resulting from the listeners' limited English proficiency.

Each perception experiment included two different tasks: (1) labeling each stimulus, and (2) giving a gradient evaluation of the response. Each task was performed with two different sets of labels: Korean consonantal characters, and Roman and IPA symbols. Twenty listeners labeled the stimuli with Korean orthography first, and then labeled the same stimuli with the Roman and IPA symbols on a later day. The other 20 listeners did the Roman identification task first.

In the Korean labeling task, the listeners chose the consonant they heard from 13 alternatives presented in Korean orthography. The 13 alternatives were determined through pilot work to cover all of the likely responses for the core obstruents in the corpus. In addition, to cover cases in which there was no match between a stimulus and the preselected alternatives, the listeners were also provided a blank for writing down what they heard. Korean orthography is a phonemic system, and the listeners had no apparent difficulty using the Korean alphabet for their choices. Following each identification response, the listeners were also asked to mark how good they considered the label to be on a scale from 1 to 7: "1" if they thought that their choice was not similar to the stimulus at all, and "7" if they felt that the label was an exact match with the stimulus. The Appendix shows the instructions and the answer sheet for the tasks.

In the Roman labeling task, the listeners were asked to choose the initial consonant of the stimulus from 15 alternatives. They could also write down what they heard if there was no correspondence between a stimulus and the choices. Since the IPA symbols /θ/ and /ð/ were also included among the choices, we provided example words before the task indicating which sound each symbol corresponds to. The example words were selected from among words familiar to the listeners, and the prosodic location of the consonant in each key word was chosen to match the location of the stimulus consonant. Key words for each target sound were presented at the top of each answer sheet.

¹ The stimuli /pa/, /ta/, /fa/, etc., could be real English words to those who have been exposed to non-rhotic English (e.g., par for /pa/, tar for /ta/, far for /fa/, etc.), as one reviewer pointed out. However, the participants in our experiment were unlikely to perceive the stimuli as real English words, given that English teaching in Korea is dominated by American English.
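The stimulus and token counts in Sections 2.1 to 2.3 follow from a simple factorial design. The Python sketch below checks that arithmetic and illustrates one way to randomize presentation order and counterbalance task order. The speaker IDs, the random seed, and the alternating group assignment are hypothetical, since the text does not say how listeners were assigned to the two task orders.

```python
import random
from itertools import product

N_CONSONANTS_FULL = 14  # full corpus size; only the 8 analyzed consonants are named below
ANALYZED = ["p", "b", "t", "d", "f", "v", "θ", "ð"]
SPEAKERS = ["M1", "M2", "F1", "F2"]  # hypothetical IDs for the 2 male and 2 female talkers
POSITIONS = ["initial prevocalic", "final postvocalic",
             "pre-stress intervocalic", "post-stress intervocalic"]
N_LISTENERS = 40

full_set_size = N_CONSONANTS_FULL * len(SPEAKERS) * len(POSITIONS)  # 224
test_stimuli = [(c, s, "initial prevocalic") for c, s in product(ANALYZED, SPEAKERS)]  # 32
tokens_per_consonant = len(SPEAKERS) * N_LISTENERS  # 160 responses per consonant type

# Hypothetical counterbalancing: alternate listeners between the two task orders.
task_order = {
    listener: ("Korean first" if listener % 2 == 0 else "Roman first")
    for listener in range(N_LISTENERS)
}

# One randomized presentation order. The study randomized the full 224-item set;
# only the 32 analyzed items are shuffled here (fixed seed for reproducibility).
presentation = test_stimuli[:]
random.Random(2008).shuffle(presentation)

if __name__ == "__main__":
    print(full_set_size, len(test_stimuli), tokens_per_consonant)  # 224 32 160
    print(sum(order == "Korean first" for order in task_order.values()))  # 20
```

The 160 figure is the "approximately 160 tokens" per stimulus type mentioned in the note to Table 2.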
It should be noted that, since the listeners all had extensive experience with written English, they were very familiar with the Roman orthographic probes. Also, since explicit instruction on the contrast between the voiced and voiceless dental fricatives (including the use of IPA symbols to indicate the difference) is typical of English education in Korea, the students did not seem to have any problems interpreting the IPA response alternatives. Following each identification response, the listeners were asked to indicate how confident they were in the response on a 7-point Likert scale. These confidence ratings do not figure into the analyses presented below.

2.4. Analysis

The aim of the current analyses is to assess the degree to which L2 identification patterns indicate a reliance on L1 categories. Note that the current analyses make no reference to other influences on perceptual identification (e.g., acoustic influences), which should be investigated in a further study. A given identification could be due to reliance on an L1 category plus the additional mapping of that L1 category onto the L2 response category, or it could be due to the development of a (new) L2 category. The mapping data are used to distinguish these two possibilities. Our analyses use the mapping data to derive what a null hypothesis of listeners using L1 categories for L2 identification would predict for each segment, in terms of both identification accuracy and error type. Identification accuracies greater than those expected from the mapping data would then be attributable to the development of L2 categories beyond the L1 categories. An all-or-nothing approach to generating predictions from the mapping data was considered. It would determine the probability that a segment is labeled with a particular Korean label, based on the Korean orthographic labeling data.
The English segment most commonly mapped onto this Korean category (or most highly rated for it) would then be predicted as the Roman label response. This would be a winner-take-all approach, in which a Korean category always maps onto a single English response category; that is, each Korean label would lead to the application of one corresponding Roman label. However, previous work on borrowings of English words has suggested that even borrowing patterns do not exhibit one-to-one, winner-take-all category mapping, but are often probabilistic (Kang, 2003; Park, 2007, 2008). If even conventionalized borrowing is probabilistic, we would expect L1-to-L2 perceptual mapping to be probabilistic as well. Hence, the first approach examined here also uses a probabilistic mapping from the Korean to the English response categories. In this approach, predictions about L2 identification accuracy and error rates are generated from the mapping data with an additional second step, which estimates the probability of each Korean label being mapped onto each English label. The probability of assigning a stimulus to a Korean category, determined in the first step, is multiplied by the probability of that Korean category being mapped onto the correct English category, yielding the predicted probability of correct identification based on L1 categories alone. Predicted error patterns are generated the same way, using incorrect English response categories. For example, the data in Table 2 below (see Section 3.1.1) indicate that 84% of the English /d/ tokens were labeled with the Korean category /t/. This category /t/ was also applied to other English categories, including /v/ (18%), /ð/ (78%), and /θ/ (7%). Thus, of all the cases in which Korean /t/ was used, only 45% (= 0.84/(0.18 + 0.78 + 0.07 + 0.84)) were cases of English /d/. In other words, even though English /d/ is probabilistically connected to Korean /t/, so are several other English categories, yielding a number of likely confusions if the Korean listeners use their native /t/ category for the various English consonants. The predicted accuracy for English /d/ on the basis of the Korean /t/ category, then, is 38% (= 0.45 × 0.84). Note that this approach predicts that a one-to-two mapping of English categories onto Korean categories would not, by itself, create identification errors. From the data in Table 2, for example, English /d/ can be labeled either as Korean lax /t/ or as Korean tense /t'/. Whether the listeners label English /d/ as Korean /t/ or /t'/ will not matter for accuracy, as long as both /t/ and /t'/ are mapped onto the English "d" category. The formulas for accuracy and specific error prediction based on L1 labeling are shown in (1) and (2). Note that the summation in each formula is necessitated by the fact that there may be more than one path through the Korean categories leading to the correct English category.
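The two-step computation in the /d/ example can be written as a short function that implements formulas (1) and (2). The following Python sketch is illustrative, not the authors' code. It assumes `labeling[l2][l1]` holds the proportion of tokens of L2 category `l2` given L1 label `l1`; because every English category had the same number of tokens, proportions from different stimulus categories are pooled directly, as in the worked example. Only the Korean /t/ cells quoted in the text are included, so the result is the contribution of the /t/ path rather than a full prediction.

```python
def association(labeling, l1):
    """P(L2 category | L1 label): share of all uses of `l1` coming from each L2 category.

    labeling[l2][l1] is the proportion of L2-category tokens given L1 label l1.
    Pooling proportions directly assumes equal token counts per L2 category.
    """
    uses = {l2: row[l1] for l2, row in labeling.items() if l1 in row}
    total = sum(uses.values())
    if total == 0:
        return {}
    return {l2: p / total for l2, p in uses.items()}


def predict(labeling, stimulus, response):
    """Formulas (1)/(2): sum over L1 labels X of P(X | stimulus) * P(response | X)."""
    return sum(
        p_label * association(labeling, l1).get(response, 0.0)
        for l1, p_label in labeling[stimulus].items()
    )


if __name__ == "__main__":
    # Korean /t/ cells quoted in the text (proportion of each English category labeled /t/).
    labeling = {
        "d": {"t": 0.84},
        "v": {"t": 0.18},
        "ð": {"t": 0.78},
        "θ": {"t": 0.07},
    }
    print(round(association(labeling, "t")["d"], 2))  # 0.45
    print(round(predict(labeling, "d", "d"), 2))      # 0.38 (accuracy via /t/)
    print(round(predict(labeling, "d", "ð"), 2))      # 0.35 (/d/ -> /ð/ errors via /t/)
```

Adding the other cells of Table 2 (for example /d/ labeled as /t'/) would add further paths to each sum, which is exactly the role of the summation in (1) and (2).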
(1) Prediction of accuracy based on the confusion between L1 and L2. The probability that a token of L2 category A is identified as category A is

$$P(A \rightarrow A) = \sum_{X} P(X \mid A)\, P(A \mid X)$$

where $X$ ranges over the L1 categories, $P(X \mid A)$ is the probability of category A being perceived as L1 category X, and $P(A \mid X)$ is the probability of L1 category X being associated with category A.

(2) Prediction of specific errors based on the confusion between L1 and L2. The probability that a token of L2 category A is identified as category B is

$$P(A \rightarrow B) = \sum_{X} P(X \mid A)\, P(B \mid X)$$

One weakness of this approach is that goodness-of-fit is inferred entirely from how often a label is used, without regard to how good a match the listeners judged that label to be. The prediction based on such proportionate measures alone, then, is at best an indirect indicator of the goodness-of-fit of the stimulus to the L1 category. Mean goodness ratings can augment the estimation of the match between the L1 and L2 categories (Cebrian, 2006; Guion et al., 2000). For instance, if the mean goodness rating of an L1 category for a set of productions of an L2 category is low, it seems less likely that Korean listeners will map that L1 category onto the L2 category. Thus, a second set of predictions incorporating the goodness ratings was generated using a method similar to Guion et al.'s (2000) fit index. In this method, each categorical mapping is weighted by the mean goodness rating of all the cases in which that L1 category is applied to a stimulus of a specific L2 category. We used mean rather than median goodness ratings (cf. Strange et al., 1998) because means permit finer-grained weighting. This mean goodness rating is applied to the mapping of L1 categories onto the L2 response categories. In order to increase the dynamic range of the goodness ratings, the lower end of the response scale was removed by subtracting a lower threshold, defined as the lowest mean rating for any mapping that received 7% or more of the responses (i.e., chance level with 13 possible response alternatives); in the data below, this threshold is 3.5. It should be noted that the choice of threshold is a free parameter, which could be fitted to maximize the match between the predictions and the actual accuracy. In Section 3, we will discuss the general effect of modifying the threshold value. The formula for weighting the connection between L1 and L2 categories according to their mean similarity rating is given in (3).

(3) Weighted proportion of L2 category X in L1 category Y:

$$\tilde{P}(X \mid Y) = \frac{P(Y \mid X)\,(\bar{r}_{XY} - 3.5)}{\sum_{Z} P(Y \mid Z)\,(\bar{r}_{ZY} - 3.5)}$$

where $P(Y \mid X)$ is the probability of L2 category X being perceived as L1 category Y, $\bar{r}_{XY}$ is the mean similarity rating for that mapping, and $Z$ ranges over all L2 categories associated with L1 category Y.

Table 2
Matrix showing percentage labeling of English initial consonants with Korean consonants and their mean goodness ratings (in parentheses)

English consonants: /p/
Korean labels: /p/ /p'/ /pʰ/ /t/ /t'/ /tʰ/ /L/ /s/ /s'/ /c/ /c'/ /cʰ/ /m/ /h/ Others
95 (6.20)
/b/ /f/ /v/
41 (4.89) 46 (5.92) 8 (4.08) 1 (4.00)
8 (4.00) 23 (5.06) 54 (4.84)
65 (4.38)
18 (3.62)
/θ/ /ð/ /t/ /d/
15 (4.79) 7 (4.45) 16 (4.38) 7 (3.82) 24 (4.03)
1 (5.00) 78 (4.40) 1 (4.00)
1 (5.50)
84 (5.35) 14 (4.45) 98 (6.13)
6 (3.78) 4 (2.57)
2 (5.33)
4 (3.86) 40 (4.17) 1 (5.00)
1 (6.00)
2 (1.67) 2 (1.34)
5 (3.88) 4 (4.14)
1 (2.5) 5 (3.25)
1 (4.50) 2 (4.30)

Judgments totaling less than 1% are not shown. Each stimulus had approximately 160 tokens (4 talkers × 40 listeners) for the analysis. Similarity ratings were on a gradient scale from 1 to 7 (7 = exactly like Korean). Modal responses are marked in bold.

3. Results

3.1. English to Korean perceptual category mapping

3.1.1. Results for Korean orthography

Table 2 presents the proportion of Korean labels chosen for each stimulus, with their mean goodness ratings. The number of labels selected for each stimulus shows how many Korean categories are related to a specific English category, and the proportions of the choices reveal the strength of connection between the Korean and English categories. The mean goodness ratings further indicate the strength of this connection. A low mean goodness rating for a specific choice suggests that the listeners chose that answer because there were no better alternatives; high mean goodness ratings, on the other hand, indicate a very strong connection between the Korean and English categories. Fig. 1 gives a visual summary of the proportion of Korean label usage for English categories. In the figure, the thickness of the connecting lines indicates the proportion of times the Korean label was used for the English production. Connections made less than 7% of the time (i.e., chance level with 13 possible response alternatives) are not shown. Some English categories were connected to a single Korean category and others to more than one Korean category. For example, in Fig. 1, English /p/ (at the top) is virtually always labeled as Korean /pʰ/. On the other hand, /θ/ (second from bottom) has five different Korean categories applied to it: /pʰ/, /p'/, /t/, /t'/, and /s'/. (The Korean /s/ category is not shown in Fig. 1 since it was selected less than 7% of the time for English /θ/ stimuli.) This means that the Korean listeners were highly uncertain about which Korean category the /θ/ stimuli belonged to, although they labeled them /s'/ more often than any other alternative. Table 2 shows the relationship between the categories in terms of both the proportion of application and the mean goodness rating.
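Returning to the weighting in formula (3) from Section 2.4, the rating weights change only the association step: each pooled proportion is multiplied by its mean rating minus the 3.5 threshold before normalizing. A minimal Python sketch, again not the authors' code. Clipping negative weights to zero is an added assumption (formula (3) does not say how mappings rated below the threshold are handled), and the example uses only the three Korean /t/ cells whose ratings are given in the text or Table 2 (/d/ 5.35, /v/ 3.62, /ð/ 4.40), omitting /θ/.

```python
THRESHOLD = 3.5  # lower threshold on mean ratings (Section 2.4)


def weighted_association(labeling, ratings, l1, threshold=THRESHOLD):
    """Formula (3): rating-weighted share of L1 label `l1` for each L2 category.

    labeling[l2][l1] -> proportion; ratings[l2][l1] -> mean goodness rating (1-7).
    Assumption: weights for ratings below the threshold are clipped at zero.
    """
    weights = {
        l2: row[l1] * max(ratings[l2][l1] - threshold, 0.0)
        for l2, row in labeling.items()
        if l1 in row
    }
    total = sum(weights.values())
    if total == 0:
        return {}
    return {l2: w / total for l2, w in weights.items()}


if __name__ == "__main__":
    labeling = {"d": {"t": 0.84}, "v": {"t": 0.18}, "ð": {"t": 0.78}}
    ratings = {"d": {"t": 5.35}, "v": {"t": 3.62}, "ð": {"t": 4.40}}

    unweighted = 0.84 / (0.84 + 0.18 + 0.78)
    weighted = weighted_association(labeling, ratings, "t")["d"]
    print(round(unweighted, 2), round(weighted, 2))  # 0.47 0.68
    # Predicted accuracy for /d/ via the /t/ path under the weighted mapping:
    print(round(0.84 * weighted, 2))  # 0.57
```

In this partial example the weights shift the association of Korean /t/ toward English /d/, the mapping with the highest mean rating, which is the intended effect of (3); the threshold argument makes it easy to explore the free parameter mentioned in Section 2.4.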
For instance, /p/ was perceived as /pʰ/ in 95% of the cases, and the mean goodness rating for this mapping was 6.20 out of 7.00. This indicates that the Korean listeners were very confident that the stimulus /p/ was Korean /pʰ/. By contrast, the listeners were not sure what the stimulus /θ/ was: there was a wide range of responses, such as /p'/, /pʰ/, /t/, /t'/, /s/, and /s'/, and all of these choices were given low mean goodness ratings. In general, the proportion of occasions on which a particular label is used for an English sound is related to the mean goodness rating for that choice. The correlation between the mean goodness ratings and the proportions of times a segment was labeled with a particular Korean category (considering mappings greater than 2%) was highly significant (R² = .55, p < .0001).

Fig. 1. Perceptual mapping of English categories to Korean categories. The width of each line corresponds to the proportion of times the Korean label was applied to the English production. Any mappings occurring less than 7% of the time (i.e., chance level) are not shown.

However, there are some systematic deviations between the goodness ratings and the proportion of usage across different English stimuli. For example, both voiced stops and voiced fricatives exhibit one-to-two mappings, with relatively low proportions of mapping onto individual Korean categories. However, the voiced fricatives tend to have lower goodness ratings for the most commonly chosen alternatives. The voiceless stops /p/ and /t/ were perceived as /pʰ/ and /tʰ/, respectively, almost 100% of the time, and with very high goodness ratings. These large proportions and high mean goodness ratings show a strong connection between the English voiceless categories and the Korean aspirated categories. On the other hand, the voiced stop /b/ was perceived mainly as Korean /p/ and /p'/, with a small proportion of /pʰ/. However, the mean goodness rating for /pʰ/ was lower than for the others, suggesting that the connection between Korean /pʰ/ and the voiced stop /b/ is very weak, and that in general the best fit for the English /b/ category is the tense stop. The patterns for /d/ are similar in being split between the tense and lax Korean categories, though slightly different: Korean /t/ was more common and had a higher mean goodness rating than /t'/. The higher proportion of /t/ than /t'/ responses, and the higher mean goodness rating of /t/, suggest that the closest category in the Korean system to the English /d/ productions is the lax stop /t/. The clear voiced–voiceless distinction seen for the stops was not observed in the responses to the English fricatives. In general, the voiced fricatives were perceived as two Korean categories, while the voiceless fricatives were perceived as more than two Korean categories. The stimulus /v/ was perceived most often as Korean /p/, and occasionally as /t/ with a lower goodness rating (/p/: 4.38; /t/: 3.62).
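The correlation reported above relates, across (English stimulus, Korean label) cells, the proportion of usage to the mean goodness rating. A minimal Python sketch of that computation, assuming cells are given as (percentage, mean rating) pairs and keeping only mappings above 2%. The pairs below are a handful of cells quoted in this section and in Table 2, so the resulting R² is illustrative only and is not expected to equal the reported .55.

```python
import math
from statistics import mean


def r_squared(cells, min_percent=2.0):
    """Squared Pearson correlation between usage percentage and mean rating.

    cells: iterable of (percent, mean_rating); cells at or below min_percent are dropped.
    Returns (R^2, r).
    """
    kept = [(p, r) for p, r in cells if p > min_percent]
    xs = [p for p, _ in kept]
    ys = [r for _, r in kept]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * r, r


if __name__ == "__main__":
    cells = [(95, 6.20), (98, 6.13), (84, 5.35), (14, 4.45), (65, 4.38),
             (18, 3.62), (41, 4.89), (46, 5.92), (8, 4.08), (1, 4.00)]
    r2, r = r_squared(cells)
    print(f"r = {r:.2f}, R^2 = {r2:.2f} (n = {sum(p > 2 for p, _ in cells)})")
```

The sketch covers only the effect size; a significance test such as the one reported would additionally require the full set of cells.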
The overall mean goodness rating for the stimulus /v/ was lower than that for its voiced stop counterpart /b/. This suggests that the connection between English /v/ and the two Korean categories is generally weaker than that between the voiced stop stimuli and their two Korean categories, and that the split between L1 categories does not tell the whole story with L2 labeling. The other voiced fricative stimulus, /ð/, was, like English /v/, perceived as Korean /p/ and /t/, except that it was labeled more often as Korean /t/, with higher mean goodness ratings. This connection also differs from that of its voiced stop counterpart /d/. First, the mean goodness rating for the fricative /ð/ was lower than that for the stop /d/. Second, in the connection between the English /d/ stimuli and the Korean categories, the more commonly used Korean category /t/ had a higher mean goodness rating than the less commonly applied Korean category /t'/. In the case of the English /ð/ stimuli, however, the more common Korean category /t/ had a lower mean goodness rating than the less common Korean category /p/. All of this suggests that, while both the English "voiced" stops and fricatives map onto multiple categories in the Korean system, the fit of the stop categories to the Korean categories is systematically better. The /f/ stimuli were generally labeled with either Korean /pʰ/ or /p'/, though a number of other labels were also applied (e.g., /p/, /s'/, and /h/). As with the voiced fricatives, the overall mean goodness ratings for /f/ were relatively low. Although Korean bilabial stops were selected for /f/, the low mean goodness ratings show that the Korean listeners did not consider the voiceless fricative /f/ and these Korean categories to be very similar. An even wider range of alternatives was selected for the /θ/ stimuli (/p'/, /pʰ/, /t/, /t'/, /s/, and /s'/), again with relatively low mean goodness ratings. Although coronal Korean categories were chosen for the voiceless coronal fricative /θ/, the Korean listeners gave them very low goodness ratings. To summarize the general patterns: voiceless stops map closely onto Korean aspirated stops; voiced stops map onto both lax and tense stops and exhibit lower goodness ratings; and fricatives also map onto multiple categories but exhibit even lower goodness ratings.

3.1.2. Comparison with Schmidt (1996)

To determine the degree to which our mappings are likely to be replicable, we compared the current data with those of Schmidt (1996), who ran a similar experiment with 20 Korean learners of English using the orthographic classification technique.
In her experiment, 20 Korean listeners (12 male and 8 female; aged 21 to 38 years, mean = 30 years; length of residence in the USA between 4 months and 5.5 years, mean = 3 years; using Korean at least 50% of the time) listened to the stimuli over headphones in a sound-attenuated booth and typed the consonant they heard. They also rated how similar the English sound was to their chosen consonant on a scale from 1 to 5 (5 = exactly like Korean). The stimuli were CV nonsense words combining 22 consonants with the vowels /i/, /a/, and /u/, produced by 3 female native speakers of English. To compare her results with ours, the results for the eight consonants /p/, /b/, /f/, /v/, /θ/, /ð/, /t/, and /d/ combined with the vowel /a/ were reorganized and are presented here as Table 3. In general, the results for stops were similar in the two studies, while those for most fricatives, except /f/, differed somewhat. Schmidt (1996) reported that the voiceless stops /p/ and /t/ were perceived as single

Table 3. Matrix showing percentage labeling of English initial consonants as Korean consonants, with mean goodness ratings in parentheses, from Schmidt (1996). Columns: Korean consonants.

/p/ /p'/ /pʰ/ /t/ /t'/ /tʰ/ /L/ /s/ /s'/ /c/ /c'/ /cʰ/ /m/ /h/

English consonants /p/

/b/

/f/

/v/

/θ/

/ð/

/t/

3 (2.5)

57 (3.9) 43 (3.9)

5 (1.7) 15 (3.3) 65 (2.0)

97 (2.6)

3 (1.5) 3 (3.5) 5 (2.3) 8 (2.0) 23 (3.1) 5 (3.7)

5 (2.0)

2 (2.0)

48 (3.2) 45 (4.3)

2 (4.0)

97 (4.8)

8 (2.6) 40 (3.1)

/d/

82 (3.7) 18 (3.7)

93 (4.7) 2 (1.0) 3 (1.0)

2 (2.0) 2 (1.0) 15 (2.1)

1 (4.50)

2 (1.0)

Each stimulus had approximately 60 tokens (3 talkers × 20 listeners) for the analysis. Similarity ratings were on a gradient scale from 1 to 5 (5 = exactly like Korean). Modal responses are marked in bold.


Korean categories /pʰ/ and /tʰ/, respectively, while the voiced stops /b/ and /d/ were perceived as two Korean categories. Our results are the same for the stops, with the additional observation that the stimulus /b/ has a connection to the Korean category /pʰ/. Schmidt reported that /f/, /θ/, and /ð/ were perceived as more than two Korean categories, while /v/ was perceived as a single Korean category. Our data also showed that /f/ and /θ/ were connected to more than two Korean categories, though the paths were somewhat different. Schmidt's listeners chose /h/ for /f/ more often than our listeners did. Schmidt reported that /θ/ was perceived as /s'/, /tʰ/, /t'/, etc., as in the current data, but the proportion of /pʰ/ responses for /θ/ was larger in the current results. English /ð/ was mapped onto Korean /p/, /t/, and /t'/ in Schmidt's results, but the strong connection between /ð/ and /t'/ was not observed in the current data. English /v/, which was connected to a single category, /p/, in Schmidt's results, was connected to two Korean categories, /p/ and /t/, in ours. From this comparison, it seems that the mapping between English stops and Korean categories is very similar across the two studies, whereas the mapping between English fricatives and Korean categories varies. This appears to be related to the fact that Schmidt also reported relatively high mean similarity ratings for the stops and low mean similarity ratings for the fricatives: the fricative categories, which map particularly poorly onto the Korean categories, exhibit different patterns across the two studies, while those with a closer match to the Korean categories exhibit similar mapping patterns. However, it should be noted that the two studies were conducted in different environments: the current one in free field through loudspeakers, and Schmidt (1996) in a sound-attenuated booth with headphones.
Hence, the difference in mapping for the fricatives might be due to noise in the presentation environment. Whether the differences are due to the listener population or to noise is not clear, and the results of previous studies of listening in noise, e.g., Miller and Nicely (1955), do not offer much help in explaining the peculiarities of the differences between these groups. The pattern of confusions from +6 to −6 dB S/N in Miller and Nicely's data is very similar, and there does not seem to be any particular difference in sensitivity between stops and fricatives. Non-sibilant fricatives exhibit more confusions overall, but the spread of their confusions is roughly similar to that for stops.

3.2. English identification patterns

We report the results from the Roman labeling task here. The proportions of English categories chosen for each stimulus, with the listeners' mean confidence ratings (N = 40), are shown in Table 4. The Korean listeners were better at identifying voiceless English sounds than their voiced counterparts. For example, English /p/ and /t/ were correctly identified about 90% of the time, and English /f/ 76% of the time. The high mean ratings show that the Korean listeners were also confident in their identification of these sounds. By contrast, identification accuracy for all English voiced stimuli (i.e., /b/, /d/, /v/, and /ð/) and for English /θ/ was below 60%, and their mean confidence ratings for the modal response were below 5.50, except for English /v/. This suggests that the Korean listeners had difficulty identifying these English sounds. English /d/ was confused with English /ð/, with errors in both directions. Confusability among sounds, however, was not always bidirectional: English /b/ was often identified as English /p/, whereas the reverse pattern was rarely observed.
The general patterns apparent in the Roman labeling results are that most voiceless English sounds, except /θ/, are better identified than voiced English sounds, and that poor identification of English sounds seems to be related to confusions among categories. A further observation is that the mean confidence ratings were higher for the less confusable sounds than for the more confusable ones.

3.3. Predicting identification patterns from category mapping

The next step in the analysis is to relate the L2 identification performance in Section 3.2 to predictions from the L1 labeling patterns in Section 3.1.1. To do this, we add mappings from the Korean to the English labels to the English-to-Korean category mappings in Fig. 1. The simple probabilistic model is based on the likelihood that a Korean label is applied to an English stimulus, i.e., on the same data as presented in Fig. 1. While Fig. 1 plots the proportion of times an English stimulus category was labeled with each Korean label, the reverse mapping gives the proportion of times a Korean label corresponded to each English category.
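The reverse (Korean-to-English) proportions described above can be derived from the forward labeling proportions. The sketch below is a minimal illustration, not the authors' code. It assumes that every English stimulus was presented equally often, so that the forward proportions received by a Korean label can be summed and normalized directly (Bayes' rule with a uniform prior over stimuli). The function names, dictionary layout and toy values are hypothetical.

```python
from typing import Dict

# source -> target -> proportion (hypothetical layout)
Mapping = Dict[str, Dict[str, float]]


def reverse_mapping(forward: Mapping) -> Mapping:
    """Return P(English stimulus | Korean label) from P(Korean label | English stimulus).

    Assumes each English stimulus contributed the same number of tokens, so the
    forward proportions can be summed per Korean label and then normalized.
    """
    # Total forward proportion received by each Korean label across all stimuli.
    totals: Dict[str, float] = {}
    for row in forward.values():
        for kor, p in row.items():
            totals[kor] = totals.get(kor, 0.0) + p

    reverse: Mapping = {}
    for eng, row in forward.items():
        for kor, p in row.items():
            if totals[kor] > 0:
                reverse.setdefault(kor, {})[eng] = p / totals[kor]
    return reverse


def drop_below(mapping: Mapping, threshold: float = 0.07) -> Mapping:
    """Hide connections below the 7% display threshold used for Figs. 1 and 2."""
    return {src: {dst: p for dst, p in row.items() if p >= threshold}
            for src, row in mapping.items()}


# Toy example (not study data): English /p/ is always heard as Korean /ph/,
# English /f/ is split evenly between Korean /ph/ and /p'/.
forward = {"p": {"ph": 1.0}, "f": {"ph": 0.5, "p'": 0.5}}
print(reverse_mapping(forward))
```

In this toy case /p/ maps onto /pʰ/ every time, yet /pʰ/ leads back to /p/ only two-thirds of the time, which is the kind of asymmetry the bidirectional mapping is meant to capture. The threshold only affects what is displayed; the predictions would use the full proportions.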


Table 4. Matrix showing percentage labeling of English initial consonants with Roman and IPA symbols, with mean confidence ratings in parentheses. Columns: responses (IPA).

/p/ /b/ /t/ /d/ /θ/ /ð/ /f/ /v/ /s/ /z/ /l/ /r/ /j/ /w/ /h/ Others

English consonants /p/

/b/

/f/

/v/

89 (6.21) 1 (5.50) 1 (4.00)

14 (5.13) 52 (5.13) 2 (3.00)

4 (5.33) 8 (4.07)

14 (4.69)

3 (4.00) 1 (5.00) 76 (6.09) 4 (4.33)

7 (4.09) 3 (5.00) 15 (5.00) 1 (6.00) 52 (6.02)

7 (5.75)

2 (3.66) 2 (3.00) 14 (4.81) 8 (4.83)

/θ/

/ð/

3 (3.60)

/t/

/d/

1 (3.00) 4 (4.42) 93 (6.08)

2 (5.00) 52 (5.39) 14 (4.86) 23 (5.51)

24 (4.79) 3 (5.40) 56 (5.48)

1 (2.00) 59 (5.32) 36 (5.27)

3 (5.75) 8 (5.33)

3 (3.25)

1 (5.50)

2 (3.33) 1 (5.00) 1 (5.5)

2 (4.00) 5 (1.25)

3 (1.00)

3 (4.6)

2 (3.00)

1 (5.50)

Judgments totaling less than 1% are not shown. Each stimulus had approximately 160 tokens (4 talkers × 40 listeners) for the analysis. Confidence ratings were on a gradient scale from 1 to 7 (7 = very confident). Modal responses are marked in bold.

Fig. 2 illustrates the complete bidirectional mapping between English and Korean categories by adding the reverse probabilities to the right of the mappings shown in Fig. 1. As Fig. 2 shows, the mappings of English to Korean categories and of Korean to English categories are not simply mirror images of one another. For example, the highly probable mapping from English /p/ to Korean /pʰ/ becomes less probable from Korean /pʰ/ to English /p/: even though English /p/ is virtually always heard as Korean /pʰ/, a number of other English consonants are also heard as instances of Korean /pʰ/. This contrasts with English /t/, where the Korean /tʰ/ label is used only for the English /t/ stimuli. Another example is the mapping between English /b/ and Korean /pʰ/. The thin line connecting English /b/ to Korean /pʰ/ has no reverse counterpart in this graph, because the proportion of Korean /pʰ/ labels for which the stimulus was English /b/ fell below the 7% threshold used to generate the figure. To generate predictions about the Korean listeners' identification performance, we take each path that leads from a stimulus category through a Korean label to a particular English response label, multiply the proportions along that path, and sum these products over all Korean labels. For example, to predict how accurately Korean listeners perceive English /p/, we consider two connections: one from the English /p/ stimuli to Korean /pʰ/, and the other from /pʰ/ to the English response category /p/. Although English /p/ stimuli were mostly perceived as Korean /pʰ/, the /pʰ/ label was in fact used for four sets of stimuli, /p/, /f/, /b/, and /θ/, as shown in Table 2.
Thus, the probability that English /p/ is identified correctly as English /p/, based on Korean categories alone, is the proportion of times that English /p/ stimuli were perceived as Korean /pʰ/ (95%) multiplied by the proportion of times that Korean /pʰ/ corresponded to an English /p/ (54%), which is 52%. Note that this approach predicts that a one-to-two mapping of English categories onto Korean categories would not, by itself, create identification errors. An example from Fig. 2 is English /b/, which seems to lie between Korean /p/ and /p'/. Here, whether the Korean label applied is /p/ or /p'/ does not matter, as long as the reverse mapping also associates both Korean categories with the English /b/ category. Hence, our prediction for this one-to-two mapping is that the likelihood of obtaining the right English category is the sum of the likelihood of obtaining /b/ through Korean /p/ (13%) and the likelihood of obtaining /b/ through Korean /p'/ (27%), for a total predicted accuracy of 40%. Of course, in practice, when two L1 categories are involved, the coverage of the two categories is likely to be quite high, and so the association of either of those categories with different


Fig. 2. Bidirectional perceptual mapping between English and Korean categories. The width of each line in English to Korean categories corresponds to the proportion of times the Korean label was applied to the English production. The width of each line in Korean to English categories corresponds to the proportion of the English productions which bore the Korean label. Any mappings occurring less than 7% of the time (i.e., chance level) are not included.

L2 categories is also likely to be higher, leading to more misidentifications. However, a one-to-two category mapping does not, by itself, automatically lead to poor predicted L2 identification accuracy. Specific errors can be predicted in the same way. For instance, the rate at which Korean listeners misidentify English /p/ as /f/ is predicted from the proportions associated with the paths connecting English /p/ and English /f/. In Fig. 2, English /p/ is connected to /f/ only through Korean /pʰ/; thus, the probability of English /p/ being labeled /f/ by Korean listeners is the proportion of /p/ stimuli labeled as Korean /pʰ/ (95%) multiplied by the proportion of times that Korean /pʰ/ corresponds to an English /f/ (31%), giving a predicted error rate of 29%. The predictions above do not incorporate the goodness rating data. They can be modulated by weighting the probability of each connection between the Korean labels and the English response labels by the adjusted goodness ratings. Since the goodness ratings for the stops were in general higher than those for the fricatives, such weighting increases the predicted accuracy for stops while decreasing the predicted accuracy for fricatives. Predicted errors will also more often involve labeling fricatives as stops than labeling stops as fricatives. Fig. 3a plots the accuracy in the Roman labeling task for each of the eight target segments against the predictions of the unweighted model. The x = y diagonal indicates perfectly accurate predictions. There is a general correlation between predicted and actual accuracies (R² = .63), suggesting that the model is roughly on the right track. The unweighted model predicted the good identification of /t/, but underestimated actual accuracy by 15–40% for the other segments, as is apparent from most points lying above the diagonal in Fig. 3a.
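The path-based predictions described above amount to multiplying the forward and reverse proportions along each path and summing over Korean labels, P(response | stimulus) = Σ_k P(k | stimulus) · P(response | k). The following is a minimal sketch under that reading, with hypothetical function names, using only the rounded proportions quoted in the text.

```python
from typing import Dict

Mapping = Dict[str, Dict[str, float]]


def predict_response(forward: Mapping, reverse: Mapping,
                     stimulus: str, response: str) -> float:
    """Predicted P(response | stimulus): sum over Korean labels k of
    forward[stimulus][k] * reverse[k][response]."""
    total = 0.0
    for kor, p_forward in forward.get(stimulus, {}).items():
        total += p_forward * reverse.get(kor, {}).get(response, 0.0)
    return total


# Rounded proportions quoted in the text for English /p/ and Korean /ph/.
forward = {"p": {"ph": 0.95}}
reverse = {"ph": {"p": 0.54, "f": 0.31}}

accuracy = predict_response(forward, reverse, "p", "p")  # 0.95 * 0.54
p_as_f = predict_response(forward, reverse, "p", "f")    # 0.95 * 0.31
print(f"/p/ correct: {accuracy:.1%}, /p/ heard as /f/: {p_as_f:.1%}")
```

With these rounded inputs the accuracy comes out near 51% rather than the 52% reported, presumably because the published figure was computed from unrounded proportions. For a one-to-two mapping such as English /b/, the loop simply picks up both paths (through Korean /p/ and /p'/), which is how the 13% + 27% = 40% prediction arises.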
Particularly striking is the model's prediction that /p/ would be identified much less accurately than /t/ (by 43 percentage points: /t/ = 95% and /p/ = 52%), because the Korean listeners used the Korean /pʰ/ category for a number of other English segments. This prediction, however, was not borne out; voiceless stop accuracy is nearly equal for the coronal and the labial stop. Also evident in Fig. 3a, English /b/ and /d/ were

[Fig. 3 graphic: (a) Unweighted model, R² = 0.63; (b) Weighted model, R² = 0.44. x-axis: Predicted Accuracy Rate (%); y-axis: Observed Accuracy Rate (%); separate symbols for stops and fricatives.]

Fig. 3. Accuracy rate predictions. Accuracy in labeling English productions with Roman character labels, plotted as a function of the accuracy predicted from labeling with Korean orthographic labels. Empty circles represent stops and filled circles fricatives. The line (x = y) indicates exact prediction by the model. The left panel (a) plots predictions from a model without weighting by goodness ratings, while the right panel (b) plots weighted predictions.

predicted to have slightly better accuracy than the fricatives (since they lie further to the right). However, this prediction also turned out to be incorrect. The results for /p/ in particular suggest that the reverse mapping of Korean /pʰ/ to the various fricative categories (which did not happen with the Korean /tʰ/ category) has too much impact on the model's predictions. This correlates with the fact that such reverse mappings to English categories other than /p/ receive relatively low goodness ratings. Assuming that the Korean listeners are biased toward using the labels of categories that closely match their L1 categories, we expect the weighted model to perform better. The results of the Roman labeling task are plotted against the weighted model's predictions for each segment in Fig. 3b. The weighted model predicted the perceptual accuracy for stops quite accurately (R² for stops = .71). Even with the weighting, however, the model still underestimates the accuracy for /p/, though the underestimation is considerably smaller. The predictions for the other stops are very accurate (within 5%). The perceptual pattern of the fricatives was not correctly predicted by the weighted model. Weighting lowers the predicted accuracy for fricatives, increasing the divergence of the actual accuracy from the predictions.² The listeners were 25% to 60% more accurate than the weighted model predicted. Comparing the two panels of Fig. 3 shows the effect of the goodness judgments on the predictions. It is quite clear that adjusting the predictions with the goodness ratings, using other threshold values to modulate their strength, will always improve the predictions for either the voiced stops or the fricatives at the expense of the other. The particular threshold values used in the weighted model yield almost exactly correct predicted accuracies for the voiced stops, but do so at the cost of the fricatives.

² Note that the Pearson r² value for the regression between predicted and actual accuracies is quite low (.44) in the weighted model, owing to the poor prediction of the perceptual patterns for the fricatives.
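The exact weighting and threshold values are not restated in this part of the paper, so the sketch below shows only one plausible scheme and is an assumption, not the authors' procedure. Each Korean-to-English proportion is multiplied by a weight that is 0 at or below a hypothetical threshold (4.0 here) and rises linearly to 1 at the top of the 7-point goodness scale; each Korean label's weighted proportions are then renormalized to sum to 1. Renormalization is what lets well-rated stop connections gain probability at the expense of poorly rated fricative connections, in line with the effect of weighting described above. The function name and toy values are hypothetical.

```python
from typing import Dict

Mapping = Dict[str, Dict[str, float]]


def weight_reverse(reverse: Mapping, goodness: Mapping,
                   threshold: float = 4.0, g_max: float = 7.0) -> Mapping:
    """Goodness-weight Korean->English connections (hypothetical scheme).

    goodness[kor][eng] is the mean goodness rating for labeling English
    stimulus `eng` with Korean label `kor`. Ratings at or below `threshold`
    (and missing ratings) get weight 0; the weight rises linearly to 1 at
    `g_max`. Each Korean label's weighted proportions are then renormalized
    to sum to 1; a label whose connections all get weight 0 is left unweighted.
    """
    weighted: Mapping = {}
    for kor, row in reverse.items():
        raw: Dict[str, float] = {}
        for eng, p in row.items():
            g = goodness.get(kor, {}).get(eng, threshold)
            w = max(0.0, (g - threshold) / (g_max - threshold))
            raw[eng] = p * w
        total = sum(raw.values())
        weighted[kor] = ({eng: v / total for eng, v in raw.items()}
                         if total > 0 else dict(row))
    return weighted


# Toy values (not study data): Korean /ph/ leads back to /p/ and /f/ equally
# often, but the /p/ connection is rated much better.
reverse = {"ph": {"p": 0.5, "f": 0.5}}
goodness = {"ph": {"p": 7.0, "f": 5.5}}
print(weight_reverse(reverse, goodness))  # /p/ -> about 0.67, /f/ -> about 0.33
```

Feeding the weighted reverse mapping into the same path-summing step as before would raise predicted stop accuracy and lower predicted fricative accuracy; raising or lowering the threshold strengthens or weakens that trade-off.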

[Fig. 4 graphic: (a) Unweighted model, R² = 0.44; (b) Weighted model, R² = 0.41. x-axis: Predicted Error Rate (%); y-axis: Observed Error Rate (%); legend: FF, FS, SF, SS.]

Fig. 4. Error rate predictions. Error rates in labeling English productions with Roman character labels, plotted as a function of the specific errors predicted from labeling with Korean orthographic labels. Filled circles (FF) represent errors between English fricatives and filled squares (SS) errors between English stops. Empty circles (FS) are errors where the listeners identified a fricative as a stop, while empty squares (SF) are cases where the listeners perceived a stop as a fricative. The line (x = y) is a reference line for correct prediction by the model. The left panel (a) plots predictions from a model without weighting by goodness ratings, while the right panel (b) plots weighted predictions.

Turning to the prediction of specific errors, Fig. 4 plots actual error rates in the Roman labeling task against the predicted error rates. For both models, there is a significant correlation between predicted and actual error rates (R² = .44 and .41 for the unweighted and weighted models, respectively). These correlations are largely due to the models correctly predicting the large number of error types that do not, in fact, occur. Leaving these out, the models do not perform particularly well. As in the accuracy plots, the listeners by and large make fewer errors than predicted, sometimes by as much as 26% in the unweighted model and 36% in the weighted model. In both models, however, almost all of the large deviations involve fricatives. Both models tended to predict more confusions between fricatives and stops than actually occurred (unfilled symbols). This is expected from the accuracy predictions presented above, where accuracy for fricatives was much higher than predicted. Some complementary problems are also apparent: the models, especially the weighted model, underestimate the error rates between fricative categories. The most obvious observation, then, is that segments that are similar to some L1 segment have identification performance that is well predicted on the basis of the mapping data. To quantify this possibility, we use the average goodness estimate for each segment as an index of the general match of that segment to some L1 segment, and use it to predict the degree of deviation from the performance predicted by the mapping models. Note that the goodness data used in the weighted model are not the same as those used here: the goodness estimates in the weighted model are for individual mappings between the L2 stimuli and L1 labels, whereas here the estimate of overall match to an L1 category is the average estimate over all mappings of the segment.
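The deviation analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script. A segment's average goodness is taken as the mean of all ratings it received, which equals the per-label mean ratings weighted by how often each Korean label was used; deviation is observed minus predicted accuracy, as on the y-axis of Fig. 5; and the relation is summarized with Pearson r, whose square is the R² of a simple least-squares line. Function names and toy values are hypothetical.

```python
import math
from typing import Dict, List


def average_goodness(label_props: Dict[str, float],
                     label_goodness: Dict[str, float]) -> float:
    """Mean of all goodness ratings for one segment: per-label mean ratings
    weighted by the proportion of tokens given each Korean label."""
    total = sum(label_props.values())
    return sum(p * label_goodness[k] for k, p in label_props.items()) / total


def deviations(observed: Dict[str, float],
               predicted: Dict[str, float]) -> Dict[str, float]:
    """Observed minus predicted identification accuracy, per segment."""
    return {seg: observed[seg] - predicted[seg] for seg in observed}


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation; r ** 2 is the R^2 of a least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy values (not study data) illustrating a negative relation.
goodness = {"t": 6.0, "b": 5.0, "v": 4.0}
dev = deviations({"t": 0.90, "b": 0.55, "v": 0.50},
                 {"t": 0.90, "b": 0.45, "v": 0.30})
segs = sorted(goodness)
r = pearson_r([goodness[s] for s in segs], [dev[s] for s in segs])
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")
```

Because R² alone hides the direction of the relation, keeping r makes the expected negative slope (better-fitting segments deviating less from the mapping predictions) visible.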
We use average goodness estimates, rather than weighted indices (like the 'fit index' in Guion et al., 2000) that incorporate variation in the mapping pattern, since the logic of the mapping analyses above is that

[Fig. 5 graphic: (a) Unweighted model; (b) Weighted model. x-axis: Average Goodness Rating; y-axis: Actually Observed Accuracy − Predicted Accuracy (%); separate symbols for stops and fricatives; the panels report R² = 0.041 and 0.341.]

Fig. 5. Deviation predictions. Observed identification accuracy deviation from predicted identification accuracy plotted as a function of average goodness estimates for each segment. Empty circles represent stops and filled circles indicate fricatives. The panels (a) and (b) represent the deviations from the unweighted and weighted models’ accuracy predictions, respectively.

two-to-one mappings should not present problems as long as they are mapped consistently back onto an L2 response category. Goodness estimates here are used as a direct index of the degree of fit in assigning the stimuli to some L1 category. The results of this analysis are plotted in Fig. 5. Fig. 5a and b show a general negative correlation between average goodness estimates and the degree of deviation from the identification accuracy predicted by L1 mappings, with two obvious exceptions, /f/ and /p/. The more similar a segment was judged to be to some Korean category, the smaller its deviation from the predicted identification accuracy. For example, English /t/ received very high similarity ratings (mean = 6.09 out of 7.00), and the deviation between its observed accuracy and the predicted identification performance was close to zero. By contrast, the English fricatives /v/, /θ/, and /ð/ received low similarity ratings, and their deviations were relatively large (about 20%). The similarity ratings for the English voiced stops /b/ and /d/ fell between those for English /t/ and for the fricatives /v/, /θ/, and /ð/, as did their deviations from predicted accuracy. Plotting deviations from the weighted model (Fig. 5b) shows an obvious reduction of the deviation for /p/, offset, however, by a greater deviation for /f/. The overall negative correlation in Fig. 5b is more obvious than that in Fig. 5a. This is expected, however, since segments with higher goodness estimates (to the right) have greater predicted accuracy in the weighted model, which is compounded in the value plotted on the y-axis. The exceptionality of /p/ and /f/ also does not go away; the two are simply identified more accurately than predicted by the cross-language mapping data and the overall goodness estimates.

4. Discussion

The general pattern that emerges from the current results is that L2 identification accuracy can be considered a function of L2-to-L1 mapping, but only if the L2-to-L1 mapping is good, as indexed by subjective goodness estimates, and only if mapping of L2 categories onto L2 response options is also


modulated by the goodness-of-fit between the L2 and L1 categories. Thus, as originally proposed by early versions of SLM, L2 segments that are perceptually similar to L1 segments seem to have a confusability that depends on how distinct the L2 segments remain from one another when they are mapped onto L1 segments. L2 segments that are judged to be poor fits to any L1 category, however, exhibit identification accuracy that is systematically much greater than predicted by the L1 mapping. Also, error rates between the good and poor fitting segments are systematically lower than predicted by L1 mapping. These results are reminiscent of Flege's original terminological distinction between "similar" sounds, which are entangled with L1 categories, and "new" sounds, which are not. Lack of entanglement with L1 categories does not, however, tell the whole story for "new" categories. While accuracy with the non-sibilant fricatives is systematically greater than predicted on the basis of L1 mapping, and the error rate between stops and fricatives is typically lower than predicted, there is also a tendency in the data for error rates between the "new" fricative categories to be higher than predicted (Fig. 4b). This suggests that, while L1 categories might not interfere with L2 identification in these cases, the new L2 categories are not necessarily very effective at distinguishing the fricatives; the learners must still develop accuracy with these categories. What this highlights is that mapping models do not address the nature of the developing new categories. Specifically, production studies have suggested that such categories are eventually recognized without interference from L1 categories (Flege, 1987). In perception, however, it appears that, though such categories might be well suited to the L2 productions, they may also be less effective at separating distinct categories in general.
How this develops throughout the process of acquisition is another area for future work; an obvious extension of the current study would be to apply the quantitative model to learners with a broader range of experience. One final note on the "new" vs. "similar" distinction is in order. More recent work on L2 learning has shied away from this categorical approach, based on the observation that similarity is a gradient property, as is evident in previous mapping experiments (e.g., Guion et al., 2000; Strange et al., 1998). Between obviously similar segments (e.g., the English voiceless stops, which are very similar to the Korean aspirated stops) and obviously new segments (e.g., the English anterior non-sibilant fricatives, which are very different from any Korean segment), there are segments that lie somewhere in between, e.g., the voiced stops in the current data. The obvious question, then, is what is to be said of such segments. The results for the voiced stops in the current data suggest that they are identified using L1 categories, as voiceless stops are, exhibiting the (relatively low) accuracy predicted by the L1 mapping data. Further evidence for this conclusion is the very similar mapping results of the current study and of Schmidt (1996): while the fricatives exhibit numerous mapping differences across the two studies, the mapping for the voiced stops seems to be stable. (See, though, Section 3.1.2 above for caveats on this conclusion.) The extent of such stability throughout the learning process is unknown, but is an important target of investigation. One would expect the status ("new" or "similar") of these intermediate sounds to be a function of the degree of experience with the L2: as learners gain more experience with the L2 system, the perceived differences between similar L1 and L2 phones will become greater, and so the likelihood of an independent L2 category will increase.
(See Kang & Guion, 2006, on the independence of L1 and L2 phonological systems in production.) However, this needs to be verified with data from a group of more experienced bilinguals. (See Cebrian, 2006, on the influence of L2 experience on L2 vowel identification.) Looking in more detail at the use of the mapping data to predict L2 accuracy, our approach leads to two expectations that bear special scrutiny. First, to what extent are one-to-two mappings likely to lead to low accuracy? Here, the relevant cases are, again, the voiced stops. Working through the pattern for /b/ illustrates why split mappings are likely to yield lower accuracy. The problem with having two L1 categories corresponding to one L2 category is that each L1 category has some likelihood of mapping onto other segments in a crowded system such as the English one. For example, the mapping model illustrated in Fig. 2 predicts that /b/ will be misidentified as /f/ via the L1 /p'/ category, and such errors are common in the current data (13.7%). On top of this, the model also predicts that /b/ will be misidentified as /v/ via the L1 /p/ category. These errors also occur (7.5%). It is the combination of these two sets of errors that leads the model to predict low accuracy for /b/, which is also what is observed. The second question to be examined is the degree to which goodness estimates should be incorporated into the mapping models. Goodness estimates enter into the current model in two ways. In addition to indicating


the use of an L1 segment for L2 perception, the goodness estimates here are also related to the probability that the non-native listener will use a given L2 label. L2 categories that map well onto L1 categories are more likely to be used to label items that fit well into those L1 categories. The major difference that adding the goodness estimates makes is to increase the predicted accuracy for the stops, which matches actual performance, at the cost of reducing the predicted accuracy for the fricatives, which does not. While adding the goodness weighting to the model does make the results conform more to the expectations of the original versions of SLM, is there evidence in the data to support such weighting? The most obvious feature to look for is a strong directional bias. If an L1 category is used to identify the L2 stimuli and labeling is weighted toward better-fitting segments, we would expect a bias toward using the more similar L2 categories. Perusing Table 4 for error directions, however, shows very little such biasing. For example, /d/ and /ð/ are quite confusable, and the confusions, if anything, go in the direction of /ð/ (24% of /ð/ stimuli were misidentified as /d/, with 56% identified correctly, whereas 36% of /d/ stimuli were misidentified as /ð/, with 59% identified correctly). The problem with interpreting such results with respect to the mapping algorithm, however, is that there is other evidence in the data indicating that the listeners are developing new L2 fricative categories, and hence mapping weights may not be relevant to fricative identification. In this case, the directionality of the errors could be determined by outside factors, such as the listeners' likelihood of choosing a new (English) category for the English stimuli.
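The directional bias discussed here can be read off a confusion matrix by comparing the two off-diagonal cells for a pair of categories. A minimal sketch with a hypothetical function name, using the /d/ and /ð/ percentages quoted above; the ASCII key "dh" stands in for /ð/.

```python
from typing import Dict

# stimulus -> response -> proportion of responses
Confusions = Dict[str, Dict[str, float]]


def confusion_bias(conf: Confusions, a: str, b: str) -> float:
    """Rate of a->b errors minus rate of b->a errors.

    Positive values mean the confusions lean toward response b,
    negative values toward response a.
    """
    return conf[a].get(b, 0.0) - conf[b].get(a, 0.0)


# Percentages quoted in the text for English /d/ and /dh/ (= /ð/).
conf = {
    "d": {"d": 0.59, "dh": 0.36},
    "dh": {"dh": 0.56, "d": 0.24},
}
print(f"bias toward /dh/: {confusion_bias(conf, 'd', 'dh'):+.2f}")  # +0.12
```

A modest positive value such as this corresponds to the weak lean toward /ð/ responses noted above; applying the same function to /b/ and /p/ would require the /p/-to-/b/ rate from Table 4.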
This bias toward categories particularly associated with the L2 has appeared in a number of studies. For example, Flege (1987) reports that inexperienced learners of French tend to produce French /u/ as /y/. Major (1987) likewise reports that some Brazilian Portuguese learners of English tend to produce the English vowel /e/ as /æ/, which is absent from their L1 vowel inventory. In addition, the inexperienced Japanese learners in Nagao, Lim, and de Jong's (2003) study tend to label ambiguously syllabified stops as codas, even though Japanese does not allow such codas. The only directly relevant case in the current data set is that between /b/ and /p/, both of which appear to be linked to L1 categories. Here there is a very robust directionality in the errors, most of them being /b/ identified as /p/, as would be predicted by the goodness estimates: the more similar segment /p/ is used more than the less similar segment /b/. (Note that there are no parallel /t/–/d/ confusions.) While this is not much evidence to go on, further weight is lent by the fact that native English listeners do not exhibit this directionality in /p/–/b/ confusions (Cutler, Weber, Smits, & Cooper, 2004). One final issue concerns the exceptional accuracy for /f/ and /p/. The Korean listeners were much better at identifying the voiceless labials than would be predicted on the basis of the L1 mapping. For /p/, the problem might simply be that our weighting scheme does not give enough weight to the extremely high similarity estimates for the voiceless stops. This, however, does not explain the Korean listeners' exceptional performance on /f/. One possibility is that /f/ is simply more identifiable as a segment than the other anterior non-sibilants are, though previous data are not clear on this point.
While there is some evidence for it in Miller and Nicely's (1955) data, higher accuracy for /f/ appears neither with the native (English) nor with the non-native (Dutch) listeners in Cutler et al. (2004), and the perceptual similarity modeling in Silbert and de Jong (under review) places /f/ where we would expect it to be, based on the other fricatives. Another possibility is that /f/ is somehow singled out as a prototypical English category for Koreans, perhaps by the way it is taught in English classes, fostering increased /f/ responses (Park, Hao, & de Jong, 2007). Thus, it is unclear at this point whether these segmental deviations are due to the nature of the segments themselves, to the listeners' likelihood of choosing a new category for English stimuli, or to some aspect of the ongoing development of the system that is not addressed by the mapping models discussed here.

5. Conclusion

The current paper provides a quantification of the degree to which an L1 category inventory affects the identification accuracy of L2 sounds. This study shows sizable differences in the influence of L1 categories on L2 categories, conforming to the original distinction made in the SLM between “new” and “similar” categories, and suggests a way of determining whether a category is operating independently of mapping from the L1 categories. Results here also suggest that peculiarities remain, particularly with regard to
the development of the new categories, as suggested in previous studies such as Guion et al. (2000). The extent to which aspects of new category development can be predicted either from aspects of the L1 system, on the one hand, or from the interaction between native listeners and the peculiar exigencies of particular segments, on the other, remains a pressing topic for future research, along with more directed longitudinal research on segments that fall somewhere between “similar” and “new”.

Acknowledgments

This work is supported by NSF (Grant #BCS-04406540; “Prosody in Cross-language Production and Perception”). We would like to acknowledge valuable comments from Ocke-Schwen Bohn and three anonymous reviewers. We would also like to express appreciation to Mi-Hui Cho for help in collecting the data reported here and to Kyoko Nagao and Noah Silbert for their work on the design and processing of the data.

Appendix

1. Instructions and answer sheet for the Korean labeling task.

You will hear a series of items spoken by native speakers of English. For each item, you will hear a number followed by a one or two syllable word (these words are not real English words). Please identify the consonant you hear among the alternatives.

If the choices do not include the consonant that you hear, you may write what you heard in the space marked other ( ) in Korean orthography.

After identifying the consonant that you heard among the alternatives, indicate how similar your choice and the consonant you heard are by circling a number to the right of the alternatives. For example, circle “1” if the consonant you chose is not similar to the consonant you heard at all. However, circle “7” if you think that the consonant you chose and the consonant you heard are exactly the same.

2. Instructions and answer sheet for the Roman and IPA labeling task.

You will hear a series of items spoken by native speakers of English. For each item, you will hear a number followed by a one or two syllable nonsense word. For each nonsense word, please identify the consonant you hear by circling the appropriate symbol. If you do not find a symbol for the consonant that you hear, you may write the appropriate English consonant in the space marked other ( ). The symbols that you may
choose from are:

Symbol   Description
p        as in the words pit, apple, and stop
t        as in the words ten, beauty, and cat
f        as in the words fan, beautiful, and half
θ        as in the words think, math, and thank you
s        as in the words salt, list, and pass
r        as in the words rock, hearing, and cover
w        as in the words wood, towel, and cow
h        as in the words happy, ahead, and hand

Symbol   Description
b        as in the words bad, table, and rob
d        as in the words door, body, and mad
v        as in the words van, cover, and save
ð        as in the words they, brother, and this
z        as in the words zebra, amazing, and size
l        as in the words light, feeling, and ball
y        as in the words yes, lawyer, and toy

After identifying the consonant that you heard, indicate how certain you are that you have chosen the appropriate symbol by circling a number to the right of the consonant symbols. The number 1 indicates that you are not confident that you have chosen the appropriate symbol for the consonant that you heard (you are just guessing). The number 7 indicates that you are very confident that you have chosen the appropriate symbol for the consonant that you heard.

Keyword:  tell  dog   thin  that  fall  vase  sit   zip   pin   ball  rain  law   hall  wood  yes
1.        t     d     θ     ð     f     v     s     z     p     b     r     l     h     w     y     + other ( )   confidence: 1 2 3 4 5 6 7

References

Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America, 109, 775–794.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of perceptual reorganization for nonnative speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345–360.
Cebrian, J. (2006). Experience and the use of non-native duration in L2 vowel categorization. Journal of Phonetics, 34, 327–387.
Cutler, A., Weber, A., Smits, R., & Cooper, N. (2004). Patterns of English phoneme confusions by native and non-native listeners. Journal of the Acoustical Society of America, 116, 3668–3678.
Flege, J. E. (1987). The production of new and similar phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65.
Flege, J. E. (1991). The interlingual identification of Spanish and English vowels: Orthographic evidence. Quarterly Journal of Experimental Psychology, 43, 701–731.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and language experience: Issues in cross-language research (pp. 233–277). Baltimore, MD: York Press.
Flege, J. E., & MacKay, I. R. A. (2004). Perceiving vowels in a second language. Studies in Second Language Acquisition, 26, 1–34.
Guion, S. G., Flege, J. E., Akahane-Yamada, R., & Pruitt, J. C. (2000). An investigation of current models of second language speech perception: The case of Japanese adults’ perception of English consonants. Journal of the Acoustical Society of America, 107, 2711–2724.
Kang, K.-H., & Guion, S. G. (2006). Phonological systems in bilinguals: Age of learning effects on the stop consonant systems of Korean–English bilinguals. Journal of the Acoustical Society of America, 119, 1672–1683.
Kang, Y.-J. (2003). Perceptual similarity in loanword adaptation: English postvocalic word-final stops in Korean. Phonology, 20, 219–273.
Kim, C.-W. (1970). A theory of aspiration. Phonetica, 21, 107–116.
Kim, D.-J. (1972). A contrastive study of English and Korean phonology. Language Teaching, 5, 1–36.
Major, R. C. (1987). Phonological similarity, markedness, and rate of L2 acquisition. Studies in Second Language Acquisition, 9, 63–82.
Miller, G. A., & Nicely, P. E. (1955). An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 27, 338–352.
Nagao, K., Lim, B.-J., & de Jong, K. J. (2003). Perception of rate induced resyllabification: Cross-language comparison. In D. Burleson, C. Dillon, & R. Port (Eds.), Speech prosody and timing, dynamic aspects of speech: IULC working papers in linguistics (Vol. 4, pp. 33–43). Bloomington, IN: IULC.
Park, H. (2007). Varied adaptation patterns of English stops and fricatives in Korean loanwords: The influence of the P-map. IULC working papers, online (Vol. 7).
Park, H. (2008, in press). Limits to the role of perception in Korean loanwords: English anterior obstruents in various prosodic locations. Harvard studies in Korean linguistics (Vol. XII).
Park, H., de Jong, K. J., & Silbert, N. (2004). Cross-language perceptual category mapping: Korean perception of English obstruents. Journal of the Acoustical Society of America, 115, 2504.
Park, H., Hao, Y.-C., & de Jong, K. J. (2007). Neutralization in the perception and production of English coda obstruents by Korean learners of English. Journal of the Acoustical Society of America, 122(Part 2), 3018.
Polka, L. (1995). Linguistic influences in adult perception of non-native vowel contrasts. Journal of the Acoustical Society of America, 97, 1286–1296.
Schmidt, A. M. (1996). Cross-language identification of consonants. Part 1. Korean perception of English. Journal of the Acoustical Society of America, 99, 3201–3211.
Silbert, N. H., & de Jong, K. J. (in review). Are all features created equal? The relationship between phonological features and perceptual similarity. Journal of Phonetics.
Strange, W., Akahane-Yamada, R., Kudo, R., Trent, S. A., Nishi, K., & Jenkins, J. J. (1998). Perceptual assimilation of American English vowels by Japanese listeners. Journal of Phonetics, 26, 311–344.
Tsukada, K., Birdsong, D., Bialystok, E., Mack, M., Sung, H., & Flege, J. (2005). A developmental study of English vowel production and perception by native Korean adults and children. Journal of Phonetics, 33, 263–290.
Wiik, K. (1965). Finnish and English vowels. Turun yliopiston julkaisuja B: 94. Turun yliopisto.