Structure and Content in Language Production: A Theory of Frame Constraints in Phonological Speech Errors

COGNITIVE SCIENCE 17, 149-195 (1993)

GARY S. DELL
University of Illinois

CORNELL JULIANO
University of Rochester

ANITA GOVINDJEE
University of Illinois
Theories of language production propose that utterances are constructed by a mechanism that separates linguistic content from linguistic structure. Linguistic content is retrieved from the mental lexicon and is then inserted into slots in linguistic structures or frames. Support for this kind of model at the phonological level comes from patterns of phonological speech errors. We present an alternative account of these patterns using a connectionist or parallel distributed processing (PDP) model that learns to produce sequences of phonological features. The model's errors exhibit some of the properties of human speech errors, specifically, properties that have been attributed to the action of phonological rules, frames, or other structural generalizations.

A standard assumption of linguistic theory is that words and sentences are properly described as a merger of structure and content. A sentence description contains both a string of words (its content) and a syntactic tree specifying the hierarchical relations among the words (its structure). An individual word, itself, is viewed as a merger of phonological structure and content. This research was supported by NSF BNS-8910546, NIH NS25502, and NIH DC-00191. We wish to thank P. O'Seaghdha, R. Peterson, C. Sevald, A. Meyer, S. Garnsey, M. Tanenhaus, T. Bever, J. Lachter, J. Stemberger, J. Cole, D. MacKay, and J. Elman for helpful comments, discussion, and information, and Linda May for work on the manuscript. Correspondence and requests for reprints should be sent to Gary S. Dell, Department of Psychology, University of Illinois, 603 E. Daniel St., Champaign, IL 61820.


The word's content consists of a string of phonological or phonetic features or segments and its structure describes an arrangement of these features into feet, syllables, syllabic constituents, and other levels of organization (e.g., Clements & Keyser, 1983; Liberman & Prince, 1977). In response to seminal papers by Lashley (1951) and Chomsky (1957), and later by Garrett (1975), this structure-content distinction became the cornerstone of modern psychological theories of language production (e.g., Bock, 1982, 1986; Dell, 1986; Fromkin, 1971; Garrett, 1980; Kempen & Hoenkamp, 1987; Levelt, 1989; MacKay, 1982, 1987; Meyer, 1990; Shattuck-Hufnagel, 1979, 1987; Stemberger, 1985, 1990). In these theories, planning a sentence involves the construction of successive levels of representation. A semantic or conceptual representation is assumed to be constructed first, followed by two linguistic representations, one involving syntactic, and the other, phonological information. Ultimately, the phonological representation is translated into a motor program to produce speech. The two linguistic representations are assumed to be built by a frame-and-slot mechanism, as illustrated in Figure 1. Consider the syntactic level. A syntactic frame identifying the phrase structure of the sentence is generated under the guidance of the semantic representation (e.g., Bock, 1982, 1986; Garrett, 1975; Kempen & Hoenkamp, 1987; Lapointe, 1985; Levelt, 1989). This frame contains grammatically labeled slots for lexical items. These items are retrieved from the mental lexicon and inserted into the slots. Thus, there is a separation of the structure (the frame) and content (the lexical items). The frame-and-slot mechanism has also been applied to the planning of the phonological segments of words (e.g., Berg, 1988; Dell, 1986, 1988; Levelt, 1989; MacKay, 1987; Meyer, 1990, 1991; Reich, 1977; Shattuck-Hufnagel, 1979, 1987; Stemberger, 1990, 1991).
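The frame-and-slot mechanism can be made concrete with a small sketch. The data structures, segment inventory, and names below are invented for illustration and are not taken from any of the cited models; the point is only that structure (a frame of labeled slots) is built separately from content (the segments), and that a slot accepts only fillers of its own category.

```python
# Minimal frame-and-slot sketch. The frame (structure) is built separately
# from the segments (content); each labeled slot accepts only fillers of
# its own category. All names here are illustrative, not from the cited
# models.

CONSONANTS = set("bdfghjklmnprstvwz")
VOWELS = set("aeiou")  # deliberately simplified segment inventory

def category(segment):
    """Classify a segment as consonant (C) or vowel (V)."""
    return "C" if segment in CONSONANTS else "V"

def fill_frame(frame, segments):
    """Insert retrieved segments into a labeled frame, slot by slot.
    A slot rejects fillers of the wrong category, which is one way frame
    models keep structural constraints separate from lexical content."""
    filled = []
    for slot_label, segment in zip(frame, segments):
        if category(segment) != slot_label:
            raise ValueError(f"slot {slot_label} cannot take /{segment}/")
        filled.append(segment)
    return "".join(filled)

# A CVC frame accepts "dog"; putting a vowel in the onset slot fails.
print(fill_frame(["C", "V", "C"], list("dog")))  # -> dog
```

Because category checking happens at slot-filling time, a sketch like this rules out cross-category substitutions by construction, which foreshadows the consonant-vowel category effect discussed below.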
A phonological frame, such as that shown at the bottom of Figure 1, is assembled under the guidance of the phonological rules of the language being spoken. According to some theories, a fully constructed frame appropriate for the word to be spoken is retrieved from a stored inventory of frames and, hence, little or no assembly is required. Once the frame is available, phonological segments or feature bundles are then inserted into the slots. These theories, therefore, assert that the phonological encoding of a word initially separates the retrieval or assembly of the word's phonological structure from the word's phonological content. These are then brought together when the frame's slots are filled. One should note that frame-and-slot approaches can be found in connectionist or spreading-activation models (e.g., Berg, 1988; Dell, 1988; MacKay, 1987; Stemberger, 1985) as well as more symbolic models (e.g., Meyer, 1990; Shattuck-Hufnagel, 1979). The frame in the activation models is an interconnected set of nodes, each standing for some structural category such as onset consonant. The filling of a frame slot consists of certain frame nodes nonspecifically activating potential filler content nodes (e.g., all potential

Figure 1. Syntactic and phonological representations in theories of language production.
onset consonants). The clearest example of the latter process is found in the work of MacKay (1987; see, also, Eikmeyer & Schade, 1990). We suggest that much of the psychological data that are thought to support the phonological structure-content distinction in production can be explained by a mechanism in which no such distinction is explicitly present. Specifically, we show that a parallel distributed processing (PDP) model that maps between lexical representations and sequences of phonological features produces a number of speech-error phenomena, phenomena whose

existence has been taken as strong support for the distinction between phonological frames and phonological segments. This occurs even though the model lacks explicit frames. More generally, we suggest that the role of the structure-content distinction in theories of language production needs to be reexamined in the light of PDP accounts of memory storage and retrieval (e.g., Elman, 1990; Rumelhart & McClelland, 1986). First, we review the relevant speech-error data, and then introduce our model.

FRAME CONSTRAINTS IN LANGUAGE PRODUCTION

The strongest support for frame-and-slot models comes from analyses of speech errors. Here, we focus on a set of facts that we call phonological frame constraints. These are error phenomena that have been taken as evidence for a phonological frame such as that shown in Figure 1. There are two classes of frame constraints: general constraints, which apply to all phonological errors, both movement and nonmovement errors (e.g., see Errors 1 and 2 in the following), and movement constraints, which apply only to movement errors. The model that we develop does not systematically represent contextual influences due to surrounding words, and hence, produces what would be characterized as nonmovement errors. Despite this limitation, this model reveals forces at work in both movement and nonmovement errors; thus, we see it as speaking to the general frame constraints. The more specific movement constraints will be considered in the general discussion.

Error 1. a reading list - a leading list (a phonological movement error, anticipation of /l/; from Fromkin, 1971).

Error 2. department - jepartment (substitution of /j/ for /d/, a nonmovement error; from Stemberger, 1982).

The following four error effects are general constraints. For each effect, we estimate its strength and explain how it provides support for the notion of a phonological frame.

Phonotactic Regularity Effect
Phonological speech errors almost always create sound sequences that occur in the language being spoken. Error 3 is one of the rare counterexamples (Stemberger, 1983a). This effect, originally noted by Meringer and Mayer (1895) and termed the "first law" of speech errors by Wells (1951), has been observed in just about every error collection. Stemberger (1983b) noted 37 violations, which constituted less than 1% of his phonological error corpus. It is possible that the percentage of violations is greater, on the assumption that the error collector's perceptual system may regularize violations, but the effect is nonetheless a powerful one.


Error 3. dorm - dlorm (violation of the phonotactic constraint of English prohibiting /dl/ onsets).

Theories of production locate much of the phonotactic regularity effect in the phonological frame. Frames are specified in such a manner that impossible sound sequences are proscribed. For example, the frame in Dell's (1988) model enforces the phonotactic regularity effect by controlling which sequences of categorized slots are made available. It is assumed that there is no available frame that will allow an illegal sequence such as /dlorm/ to be encoded. Dell's (1986) model requires a separate set of rules to influence which segments can go in which slots. Regardless of how it is implemented in current production models, the phonotactic regularity effect is usually taken as solid evidence for the active use of phonological rules or frames in production (e.g., Fromkin, 1971).

Consonant-Vowel Category Effect
When one segment replaces another, the replacing and replaced segments are in the same basic category. Because the basic phonological categories are typically assumed to be vowel (V) and consonant (C), the claim is that consonants slip with consonants and vowels slip with vowels. This effect appears to be even stronger than the phonotactic regularity effect. MacKay (1970) and Stemberger (1983b) reported that there are no clear instances of cross-category errors in their corpora. Shattuck-Hufnagel and Klatt (1979) asserted that they are "extremely rare" (p. 44). Errors such as Mexico being spoken as /meksakL/, which can be analyzed as /o/ replaced by syllabic /L/, do occur (Shattuck-Hufnagel, 1986; Stemberger, 1983b), but these can also be viewed as vowels replacing vowels or as syllable rather than segment errors. A clear exception to the CV-category effect such as /eni:/ (any) being spoken as /em/ would appear to be extremely rare if not impossible.
The CV-category effect is considered to be a frame constraint because the categories C and V are assumed to be part of the slot labels in the frame (e.g., Dell, 1986; MacKay, 1982; Shattuck-Hufnagel, 1979; Stemberger, 1985, 1990). These labels cause the slots to accept only segments of the correct category. Hence, cross-category errors do not happen.

Syllabic Constituent Effect
When an adjacent vowel and consonant are both replaced in a slip, the sequence is more likely a VC than a CV. So, for example, dog - tig would be less likely than dog - dit. Nooteboom (1969) found that 19 of his phonological errors were VCs and 4 were CVs. For Shattuck-Hufnagel's (1983) exchange errors, 6% were VCs and 2% were CVs. Stemberger (1983a) also reported that VCs were more common than CVs. The syllabic constituent effect supports a role for the phonological frame because the closeness of the V and C


in a VC sequence is derivable from the assumed hierarchical structure of the syllable unit within the frame. A VC is likely to be a single subsyllabic unit, the rhyme constituent, whereas a CV is never a single subsyllabic unit (see Figure 1). The division of the syllable into onset and rhyme is an important aspect of phonological structure, obtaining its motivation from the distribution of sounds and the placement of stress in languages of the world. It has, as well, received much support from psycholinguistic data (e.g., Fowler, 1987; MacKay, 1974; Treiman, 1983; Yaniv, Meyer, Gordon, Huff, & Sevald, 1990). The predominance of VC over CV slips, therefore, is evidence that phonological structure is exerting an influence in production. By some means or other, the frame biases the structurally close segments of a VC group to have a common fate in an error: An error in one segment of the group is associated with an error in the other segment (MacKay, 1972; Stemberger, 1983a).

Initialness Effect
Initial or onset consonants are more likely to slip than noninitial ones. There are actually two related effects that need to be considered under this constraint. First, there is a strong tendency for word-initial consonants to slip. Approximately 80% of consonant movement errors involve word-onset consonants; for nonmovement errors, the percentage is much smaller (20-50%), but is, nevertheless, substantially greater than what would be expected if all sounds were equally likely to slip (Shattuck-Hufnagel, 1987; but, see Berg, 1991, for contrary data in Spanish errors). Second, there may be a tendency for syllable-initial consonants to slip more often than syllable-final ones (codas). However, this has been hard to establish because of the confounding of syllable and word position. The standard explanation for the initialness effects has been that initial consonants of words and syllables are structurally distinct in the phonological frame.
They often correspond to a hypothesized phonological constituent, either the word-onset (Shattuck-Hufnagel, 1987) or the syllable-onset (e.g., MacKay, 1972), and hence, are more detachable than the other sounds, which are more buried in the hierarchical structure of the word. So, for example, you can detach the /d/ from dog more easily than the /g/ because the division d / og corresponds to the principal division in the frame structure; the split do / g does not. Accounting for these four general frame constraints is a central concern of a theory of language production. The effects are quantitatively strong and, with the exception of the syllabic constituent effect, characterize most phonological errors. Perhaps most importantly, these effects are assumed to reveal the action of, or specific properties of, phonological rules and structures as they are used in the retrieval of the sounds of words. All current theories of phonological encoding in language production have accounted for the frame constraints by separating phonological structure (frames and


rules) from content (segments and features) and assigning the mechanism responsible for each constraint to the structural component (Berg, 1988; Dell, 1988; MacKay, 1987; Meyer, 1990; Shattuck-Hufnagel, 1979; Stemberger, 1990). The PDP approach outlined in the next section provides an alternative.

PDP MODELS AND LANGUAGE STRUCTURE

One of the most controversial claims associated with PDP models is that they obviate the need for explicit rules in cognitive processing (Rumelhart & McClelland, 1986; Seidenberg & McClelland, 1989). Thus, the PDP approach to language processing has been to assume that linguistic structure is not distinguished a priori from linguistic content. Rather, structural or rule-like effects emerge from the storage of many individual linguistic strings. Storage, or learning, takes place by changing connection strengths or weights among units in a network. As specific items are learned, the network acquires the ability to generalize in a rule-like way to new items. This occurs because items are represented in a distributed and composite fashion, that is, knowledge about a given item is contained in many units and connections, and these units and connections are shared with other related items. A well-known example is Rumelhart and McClelland's (1986a) model, which learned to map between the phonetic form of the root of a verb and the past-tense form. It acted as if it acquired the rules for English past-tense allomorphy in that it generalized to novel items, and, early in the learning process, incorrectly appended the regular form onto irregular items. Verbs that use the same past-tense form (e.g., walk and tap append /t/) are associated with common units and connections, thus providing the basis for generalization to novel forms. However, this knowledge of the past-tense rule is not located in a separate component; rather, it is inherent in the storage of the items themselves. The model exhibits sensitivity to linguistic structure without an explicit structure-content distinction. There have been influential critiques of the past-tense model (Lachter & Bever, 1988; Pinker & Prince, 1988), and new models developed in response to these critiques (MacWhinney, Leinbach, Taraban, & McDonald, 1989; Plunkett & Marchman, 1991).
Moreover, it has been debated whether or not models of this type do, in fact, have rules. An emerging conclusion from this debate is that the issue is not whether the models have rules or structure, but one of how rule-governed or structural effects are to be produced (Elman, 1989). This conclusion guides our analysis of the frame constraints in phonological speech errors. In existing models of production, the frame constraints are produced by mechanisms that separate structural frames from linguistic items. We simply ask whether an alternative mechanism can be developed.


THE MODELS

We develop PDP models that learn to produce the phonological forms of single words. These models lack an explicit structure-content distinction. Our purpose is to determine whether the models' error patterns show effects of phonological structure by evaluating the extent to which the errors obey the four general frame constraints. The models' errors may show some sensitivity to phonological structure from the set of words that they are trained to produce. Structural effects may emerge from the massed influence of these words in the absence of a separate set of frames or rules. We focus on PDP models that learn to map from a set of input units to a set of output units by an error-correcting procedure, specifically, the generalized delta rule or backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986). Like the NETtalk model of Sejnowski and Rosenberg (1987) and Seidenberg and McClelland's (1989) reading model, our models learn to pronounce words. Figure 2 shows the basic design. Each model starts with an input representation of the word to be spoken, coded as a pattern of activation (1's and 0's) across the lexical input units. Activation is passed through weighted connections to units from a hidden layer, and finally to an output layer consisting of 18 units (see Appendix A), 1 unit for each phonological feature of English (Chomsky & Halle, 1968). When trained, the model should activate the features of each segment of the input word, one segment at a time. The sequential aspect of the model owes specific debts to previous work by Jordan (1986) and Elman (1990). Jordan outlined a general scheme for producing sequential output in the PDP framework and applied it to the problem of coarticulation in speech. Jordan's proposal was that the input layer should contain an invariant plan and a state representation derived from the model's output during the sequential execution of the plan.
The corresponding entities in our model are the lexical input units and the external feedback units, which are copies of the output during the encoding of the previous segment. Elman (1990) explored a related architecture where the state representation consisted of a replica of the most recent activation values of the hidden units, rather than the output units. Our model incorporates this kind of state representation as well, labeled internal feedback in Figure 2. Thus, both the output and hidden unit activations are fed back to the input layer. In other respects, the model employed what are coming to be the standard features of backpropagation models:
1. The net input to a unit i is the sum of each incoming connection's signal, Σj wij aj(t), where wij is the connection weight from unit j to unit i, and aj(t) is the activation of unit j at time t.
2. The logistic activation function mapped the net input to a unit to that unit's activation level, with the result that these levels ranged between 0 and 1.


Figure 2. The PDP architecture with both internal and external context units. Each elongated rectangle represents a set of units.

3. Units at the hidden and output layers had a bias, an additional weight from a unit that is always on and is modified in the same way as other weights.
4. Weights were adjusted by the backpropagation algorithm with a momentum factor of .5 and a learning rate that varied between .25 and 1.0.
5. Prior to the training, the weights were initialized to randomly distributed values between -.1 and .1. (For definitions of the logistic activation function and the actual backpropagation mechanism, including the role of the learning rate and momentum parameters, see Rumelhart et al., 1986.)

Because of the model's sequential nature, other characteristics need to be mentioned. Given a lexical input, say some encoding of the word CAT, the network's goal is to generate the features of the first segment /k/ across the output units, then /æ/, then /t/, and then it should "stop," that is, generate neutral activation values of .5 for all the output features. We call a pattern of .5 for all features the null segment. Training via the backpropagation rule occurs after each attempt to produce a segment, including the final null segment. When the encoding process for a word begins, the external feedback state units are initialized to the null segment and the internal feedback state units are initialized to zero. After the model's approximation to the first segment is produced, the activation values of the output units are copied to the external feedback units, and the values of the hidden units are copied to the internal feedback units. Following this, the forward spread of activation


attempts to produce the features of the second segment, and so on, until the word is finished.

STUDY 1: FACTORIAL MANIPULATION OF MODEL CHARACTERISTICS

Simulation Design
In order to gain some insight into the model's mechanisms, we tested eight versions of it derived from manipulating three factors of two levels each. These factors were (1) the training vocabulary, (2) the nature of the input representation, and (3) the nature of the state representation.

Training Vocabulary. Two training vocabularies, each comprised of 50 English words, were prepared. Both were samples from the set of all 3-segment words appearing in Kucera and Francis (1967). In addition, only words spelled with three letters were included because, as explained later, one of the input representations examined was based on spelling. The frequent vocabulary consisted of a sample that was heavily biased toward the most frequent words from the original set (see Appendix B). The goal was to create a sample of word types whose tokens comprised around 90% of the word tokens from the set. Thus, the frequent vocabulary represents, in terms of word tokens, the lion's share of English-speakers' experience with short words. The second, infrequent vocabulary was sampled with a bias toward the middle frequencies (median of 30 tokens) and with a goal of representing around 10% of the tokens. The two vocabularies are given in Table 1, along with some of their characteristics. Our purpose in comparing the two vocabularies was to examine the dependence of error pattern on the training set. Given that our hypothesis is that the frame constraints on errors reflect nothing more than the mass action of the stored vocabulary, the vocabulary is important. The frequent vocabulary not only contains common words, but it also conforms to the statistics of English sound distribution better than the infrequent vocabulary in two specific respects.
First, as shown in Table 1, each existing VC tends to recur in the frequent vocabulary to a greater extent than does each existing CV; the ratio of VC tokens:VC types is 1.62 times larger than the corresponding ratio for CVs. This asymmetry is less pronounced in the infrequent vocabulary. In other words, the frequent vocabulary has the property that its VCs are more redundant than its CVs. English vocabulary, in general, has this characteristic. We examined all 3-letter words in the Kucera and Francis's (1967) vocabulary listing and found that the token:type ratio for VC phonological sequences is 2.02 times greater than that for CVs, an asymmetry that is even more pronounced than that of the frequent vocabulary. Second, the frequent vocabulary has some diversity with regard to word shapes and consonant clusters: 18% of the words in this vocabulary are not CVCs, which

TABLE 1
Characteristics of Training Vocabularies

(The table lists the 50 words of the frequent vocabulary and the 50 words of the infrequent vocabulary, all 3-segment, 3-letter words from Kucera and Francis (1967). Its notes report that the frequent vocabulary accounted for 88.9% of tokens, was 82% CVCs, and had a mean number of words per CV of 1.16 and per VC of 1.88; the infrequent vocabulary accounted for 8.0% of tokens, with a mean number of words per CV of 1.12 and per VC of 1.53.)

is close to the actual percentage of non-CVCs in English word types of this length, 17.3% (U. Frauenfelder, personal communication). Moreover, the vocabulary contains a variety of consonant cluster types: nasal-stop, /s/-stop, stop-/s/, stop-liquid, and liquid-stop. The infrequent vocabulary has fewer non-CVCs. Its clusters include only liquid-nasal combinations and the combination /fy/ in few. Thus, the frequent vocabulary is not only representative in the sense that it represents most of the tokens of short English words, but also in that its structures and cluster combinations contain more of the diversity of English phonology.

Input Representations. The lexical units represented the word to be spoken in one of two ways, a random coding or a correlated coding. In the random coding, each word was input as a pattern of 1's and 0's across 30 lexical units. This pattern was determined by randomly sampling a 30-digit binary number


for each word. The random coding can be thought of as a distributed semantic representation of the word in the sense that it is uncorrelated with the word’s form. In this sense, mapping from the random coding to the output units is analogous to lexical retrieval in normal production. The essence of the correlated coding is that words with similar output feature sequences should have similar input representations. However, we did not want to use a representation that was perfectly consistent, so our correlated coding was based on the word’s orthography. Similarly spelled words tend to be similarly pronounced, but the relationship is not a perfectly consistent one. Each letter of the alphabet was assigned an arbitrary Sdigit binary code, which appeared, where appropriate, in units l-5 for initial letters, 6-10 for second letters, and so on. Because the vocabulary was limited to 3-letter, 3-segment words, there were only 15 lexical units in the correlated coding. Thus, the correlated representation had only half the units of the random one. Mapping from the correlated representation to the output can be interpreted as a kind of reading aloud task, or as a form of lexical retrieval in production in which one goes from a static underlying phonological form to a sequence of phonological features. The importance of the input coding has been emphasized in recent discussions of PDP models of language (e.g., Lachter & Bever, 1988; Pinker & Prince, 1988). By comparing two different input representations, we hope to get a feel for how the model’s ability to reproduce the frame constraints is related to the input. Nature of the State Representation. The state units enable the model to produce sequential output. These units keep track of where the model is in the sequence. 
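The sequencing machinery described above — plan units held constant while both kinds of state units are updated after each segment — can be sketched as follows. This is a re-implementation from the verbal description, not the authors' code; the layer sizes match the text (30 lexical, 20 hidden, 18 output units), but all variable names are ours and the weights here are untrained.

```python
import numpy as np

def logistic(x):
    # Squash net input into activations between 0 and 1 (point 2 above).
    return 1.0 / (1.0 + np.exp(-x))

def forward_step(lexical, ext_fb, int_fb, W_ih, W_ho, b_h, b_o):
    """One segment's worth of processing in the internal-external model.
    The input layer is the plan (lexical units) plus both kinds of state
    units; net input is a weighted sum plus a bias (points 1 and 3)."""
    x = np.concatenate([lexical, ext_fb, int_fb])
    hidden = logistic(W_ih @ x + b_h)
    output = logistic(W_ho @ hidden + b_o)
    return output, hidden

rng = np.random.default_rng(0)
n_lex, n_hid, n_out = 30, 20, 18
# Point 5: initialize weights to small random values in [-.1, .1].
W_ih = rng.uniform(-0.1, 0.1, (n_hid, n_lex + n_out + n_hid))
W_ho = rng.uniform(-0.1, 0.1, (n_out, n_hid))
b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

lexical = rng.integers(0, 2, n_lex).astype(float)  # a random word coding
ext_fb = np.full(n_out, 0.5)  # external feedback starts at the null segment
int_fb = np.zeros(n_hid)      # internal feedback starts at zero
for _ in range(4):            # three segments plus the final null segment
    output, hidden = forward_step(lexical, ext_fb, int_fb,
                                  W_ih, W_ho, b_h, b_o)
    ext_fb, int_fb = output, hidden  # copy activations back to state units
```

Training would follow each step with a backpropagation weight update toward the target feature vector for that segment; that part is omitted here.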
We compared two kinds of state representations, the one discussed previously, in which both the output and hidden units' activation values are copied onto state units (the internal-external architecture), and one in which only the hidden units' activation is copied onto state units (the internal-only architecture). Both of these architectures are, in principle, capable of supporting sequential output. The internal-only architecture was explored in detail by Elman (1990). The internal-external architecture has not previously been studied. One should note that an external-only architecture is not capable of supporting sequential output when there are repeated items in the sequence (Elman, 1990; Jordan, 1986). To see this, consider how such a model would produce the word did (/d/ /I/ /d/ null). After it has produced the features of /d/ across the output features, these output activation values are then copied onto state units and thus represent the state of having just said a /d/. But the state representation does not tell us which /d/, the first or second, and hence, the model cannot tell whether to next produce /I/ or stop. Neither the internal-only nor the internal-external model has this problem because, as noted by Elman (1990), the internal state units, being copies of the hidden units and not the output units, are not

STRUCTURE

AND

CONTENT

IN LANGUAGE

PRODUCTION

161

constrained to represent identical states for each /d/. Thus, these units can, in principle, represent the state of having said an initial /d/, and this state can be different than the state of having already said /did/. Jordan's (1986) model used external feedback only, but got around the repetition problem by giving each state unit a memory; its current activation was a combination of the corresponding output unit's and its own previous activation level. Our purpose in comparing the architectures was to learn something about the role of the state units in producing structure-like effects on the errors. Given that most English phonotactic constraints reflect contingencies between adjacent segments, our expectation was that the internal-external architecture, which emphasizes the most recently output segment to a greater extent than does the internal-only architecture, would be associated with a more rule-governed error pattern. One must keep in mind, however, that the internal-external architecture also has the advantage in that it provides more state units (20 internal feedback units corresponding to the 20 hidden units plus 18 external feedback units corresponding to the 18 output features) than the internal-only architecture (20 internal feedback units).
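The repetition problem just described can be shown in a few lines. The feature vectors below are invented placeholders, not the real 18-feature codes: with external-only feedback, the state after either /d/ in did is the same vector, so no deterministic mapping from state to next output could produce /I/ after one and stop after the other.

```python
# External-only feedback copies the most recent output vector into the
# state units. The feature vectors below are invented stand-ins.
D = (1.0, 0.0, 1.0)  # stand-in features for /d/
I = (0.0, 1.0, 0.0)  # stand-in features for /I/

outputs_for_did = [D, I, D]  # the segments of "did"

# State after producing the first /d/ versus the second /d/:
state_after_first_d = outputs_for_did[0]
state_after_second_d = outputs_for_did[2]

# The two states are identical, yet the first must be followed by /I/
# and the second by the null segment. Internal feedback escapes this
# because the hidden-unit states for the two /d/s need not be identical.
assert state_after_first_d == state_after_second_d
```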

Simulation Training
Twenty-four models were created, comprising three replications of the eight conditions (2 vocabularies x 2 input codings x 2 architectures). Each replication was trained from a different set of starting weights. All models were trained by presenting repeated sweeps through the vocabulary, each sweep, or epoch, consisting of a presentation of all 50 words in a random order. Every 50 epochs, the models were tested by assessing the correctness of the four output vectors associated with each word (the three segments and the concluding null segment). An output vector was deemed correct if it was closer in Euclidean distance to the desired segment than to any other segment. When the model achieved greater than 90% correct segments, training stopped. For the models with the correlated input pattern, greater than 90% accuracy was achieved every time within 200 epochs with a learning rate of 1.0. This accuracy was not achieved with the random input pattern, but it was found that with a learning rate of .25, the target accuracy level did result within 200 epochs.

Error Analysis. When a model's output, after training, was closer to a segment other than the correct segment for any of the four sequential positions, an error was recorded. Across the 24 simulations, there were 376 errors. Table 2 gives examples by presenting all of the errors for one of the models, the first one that we trained. We examined the errors from all simulations to determine the extent to which they followed the four general frame constraints. For each constraint, we present the models' performance and compare it to standards derived from the speech-error literature. Initially, we

TABLE 2
Errors from a Single Model
[The table is not fully recoverable from the scan. It lists target words (including ask, big, did, far, for, him, his, job, not, old, put, sat, set, son, ten, and use) alongside the model's erroneous outputs, with phonotactic violations starred; errors cited later in the text include big producing /bed/, him producing /hen/, and old producing /and/.]
* Phonotactic violations.

focus on the mean pattern across all eight model versions and then contrast the versions.

Mean Performance of the Models

Phonotactic Regularity Effect. Across the 24 models, 92.6% of the errors created phonotactically legal strings of American English. This compares to a standard of 99% in a natural error collection (Stemberger, 1983a). A speech production experiment by Meyer and Dell (1993), in which a limited set of phonotactic constraints in Dutch were examined, found that 97.7% of the errors were legal strings of Dutch sounds. Clearly, the average model's behavior is very good, but not quite up to the human error standard. One must bear in mind, though, that the very high rate of phonotactic regularity in the natural collection may be an overestimate due to perceptual biases on the part of the error collectors. Table 3 gives examples of the model's illegal slips. These errors created strings ending in certain lax vowels or strings containing illegal clusters, illegal word-initial or word-final consonants, or two vowels in a row. (/i:/, /ow/, and /ey/ were treated as single segments; /ay/ was considered to be a single segment pronounced /a/; other diphthongs were represented as vowel-glide combinations.) In general, the model's illegal slips tended to occur on non-CVC words, that is, on word shapes that are less common. It is of interest to note that the illegal error string atk generated by the model actually occurred in Stemberger's (1983a) error corpus.
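The correctness criterion described in the Error Analysis above (nearest segment by Euclidean distance) can be sketched as follows. The three-feature vectors and segment labels here are illustrative stand-ins, not the model's actual 18-feature code.

```python
# Sketch of the scoring rule: an output vector counts as correct if it is
# closer in Euclidean distance to the target segment's feature vector than
# to any other segment's vector in the inventory.
import numpy as np

def closest_segment(output, inventory):
    """Return the label of the inventory segment nearest the output vector."""
    return min(inventory, key=lambda seg: np.linalg.norm(output - inventory[seg]))

inventory = {
    "d": np.array([1.0, 0.0, 1.0]),   # hypothetical feature vectors
    "t": np.array([1.0, 0.0, 0.0]),
    "i": np.array([0.0, 1.0, 1.0]),
}

noisy_output = np.array([0.9, 0.1, 0.8])   # a slightly degraded /d/
assert closest_segment(noisy_output, inventory) == "d"
```

An output is recorded as an error only when noise pushes it past the midpoint to some other segment's region, which is why most deviations leave the intended segment intact.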

TABLE 3
Examples of Phonotactic Violations
[The table is not fully recoverable from the scan. It pairs target words (including ivy, any, ask, old, job, fry, few, ill, arm, and off) with illegal error strings, each labeled with its violation type: an illegal word-final vowel, an illegal cluster, an illegal word-initial consonant, an illegal word-final consonant, two vowels in a row, a doubled consonant (English has no double consonants), or /ey/, which does not occur in the dialect used for comparison.]

Vowel-Category Effect. The average model's errors involved the substitution of consonants for consonants and vowels for vowels 97.6% of the time; /a~/ replacing /h/ in the word his is an example of a violation from the models' output. (In the feature system chosen for the model, the consonant /h/ is somewhat similar to the vowels.) For this constraint there is no clear standard from the speech-error literature, other than claims that the effect is exceptionless or nearly so. So, for the purposes of adopting a standard, we will assert that this rule is upheld 99.5% of the time. So, again, the average model's behavior is good, but falls just short of the standard.¹

Syllabic Constituent Effect. Although the vast majority of the models' errors were substitutions, additions, or deletions of single sounds, there were some errors in which more than one sound was replaced. The syllabic constituent effect concerns the cases in which an adjacent vowel and consonant both slip: VC and CV errors. In accord with speech-error data, the models produced more VC substitutions than CV substitutions; 21, or 5.6%, of the total number of errors constituted a VC sequence and 8, or 2.1%, were CV sequences. This difference is significant across the 24 models, t(23) = 2.26, p < .05, and is in very close agreement with the 6% VC and 2% CV rates cited

¹ For the phonotactic regularity and CV-category effects, we have adopted standards close to 100% adherence. We do not claim that our standards exactly reflect the data (there are simply too many uncertainties involved in coming up with the numbers), but they serve to illustrate what is not in doubt: These effects index very strong constraints and, moreover, they give our models difficult targets to shoot for. Given that the errors the model generates are technically nonmovement errors, it is useful to note that these constraints are strong for both movement and nonmovement errors. Although we have not found any estimate of the phonotactic and CV effects broken down by error category, it is clear that nonmovement errors, which are a nonnegligible portion of phonological errors (31% based on the tape-recorded corpus of Garnham, Shillcock, Brown, Mill, & Cutler, 1981; 15% in Stemberger's 1983a corpus), fall in a range of 94-100% adherence to the phonotactic regularity and CV-category constraints.


earlier from Shattuck-Hufnagel (1983). One must keep in mind, though, that the numbers from natural error collections are based primarily on movement errors (exclusively on movement errors for Shattuck-Hufnagel's, 1983, data), and that the models' errors are nonmovement. We were unable to find an estimate of VC and CV rates based exclusively on nonmovement errors in the speech-error literature.

Initialness Effect. Single-consonant substitution errors from the models corresponded to syllable-onset consonants 61.6% of the time. This occurred in spite of there being more coda than onset consonants in the training vocabularies (47.4% of the vocabularies' consonants were syllable-onsets). Hence, the models exhibit a bias for onset errors. Because the words in the vocabulary were so short, just about all the syllable-onset errors were also word-onset errors, so it is not possible to tell whether the models' initialness effects reflect words, syllables, or both. Estimates of the strength of the initialness effect in natural error corpora vary depending on the kind of error and the size of the word. Given that the training words in the model were three segments long and that the errors generated were nonmovement, a standard was chosen from Shattuck-Hufnagel (1987) based on nonmovement errors in monosyllabic words. This standard was 62% onset errors. So again, the average model's behavior is close to the standard. Our conclusion is that the average model adheres fairly well to the four general frame constraints. But why does it work? There are two related forces that determine the models' error pattern: similarity and sequential biases. When a model errs, the erroneous sound is very likely to be similar to the correct sound. Thus, car might slip to gar, but not to uar, simply because the model's output representation is based on linguistic features, which are motivated by acoustic and articulatory similarity as well as linguistic distinctions.
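The role of featural similarity can be illustrated with a toy calculation. The feature values below are assumed for illustration only and are not the model's actual feature system: /g/ differs from /k/ on a single assumed feature (voicing), whereas a vowel like /u/ differs on more, so a perturbed /k/ pattern stays nearer /g/-like alternatives.

```python
# Toy demonstration that error outputs favor featurally similar sounds:
# distances are computed over a small, assumed feature vocabulary.
import numpy as np

#                 voice  nasal  vowel  high   (illustrative features)
features = {
    "k": np.array([0.0,  0.0,   0.0,   1.0]),
    "g": np.array([1.0,  0.0,   0.0,   1.0]),
    "u": np.array([1.0,  0.0,   1.0,   1.0]),
}

def distance(a, b):
    """Euclidean distance between two segments' feature vectors."""
    return float(np.linalg.norm(features[a] - features[b]))

# /g/ is a near neighbor of /k/; the vowel /u/ is farther away.
assert distance("k", "g") < distance("k", "u")
```

On this toy inventory, a small random perturbation of the /k/ pattern is far more likely to cross into /g/'s region than into /u/'s, which is the sense in which similarity shapes the error pattern.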
When the model is trained to produce /k/, an output pattern that corresponds to /g/ is much closer to its training than one corresponding to /II/. Sequential bias refers to the fact that the context units bias the models' output toward common sequences of sounds. For example, after producing a particular segment, the model will be strongly biased against producing the same segment again because there are no instances in the vocabulary of immediate segment repetition. Similarity and sequential bias can be seen as two manifestations of a single principle of adherence to well-worn paths, where a path is a trajectory in the space defined by the output units, the phonological features. The production of a word tends to strengthen or wear its path, and the model's output tends to stay close to the paths that are worn by the training process. Each word, however, is not an isolated influence. The similarity structure of the lexicon plays a large role by wearing some components of paths more strongly than others. For example, run, gun, and son share the sequence /an#/ and


thus, training on these words creates an increased tendency to output this sequence. Each of the frame constraints results from similarity and/or sequential bias. The phonotactic regularity effect is produced by both of these forces. The substitution of a similar sound usually will not create a phonotactically illegal sequence. This is because the phonotactic patterns of the language are sensitive to similarity; if the sequence AB is acceptable, the sequence AB′ is likely acceptable as well. Sequential bias also plays a role in keeping the models' errors phonotactically legal. The previous sounds in a word bias the model to produce a subsequent sound if that combination of sounds (or a similar combination) occurs in the vocabulary. The CV effect arises primarily from similarity. Vowels and consonants are very different from one another in the models' output representations and, hence, there is little tendency for cross-category substitutions. The syllabic constituent and initialness effects arise from sequential bias. In the models' vocabularies, VC units tend to occur in more than one word to a greater extent than CV units do. Hence, there is a stronger redundancy within the VC than the CV. The initialness effect reflects the fact that there is more uncertainty about sounds at the beginning of a word than elsewhere. At the beginning, the context units have null values and, hence, are less effective at pointing to the desired sound. Later in the word, the context units have activation values that are specific to that word and, thus, are very effective cues for subsequent sounds. Note that this explanation of the initialness effect identifies it as a word-onset, rather than a syllable-onset, phenomenon. Whether the obtained initialness effects in the speech-error literature reflect word or syllable units is controversial.
However, Shattuck-Hufnagel (1987) argued that the data are consistent with a view that the predominance of syllable-onset slips is really due to a large tendency for slips to involve word-onset consonants.

Comparisons Among Model Versions

The degree to which the eight versions of the model obeyed the four frame constraints is shown in Table 4. In order to compare the versions, we computed a deviation score for each of the 24 models based on the error proportions that index the frame constraints. This score was the sum of squared deviations between a given model proportion, Mp(i), and the standard proportion, Sp(i), for each error effect, with the proportions undergoing the arcsine transformation, which corrects for differences in variability for extreme proportions:

Deviation Score = Σᵢ { A[Mp(i)] − A[Sp(i)] }²,  where A(p) = arcsin(√p)

Because the syllabic constituent effect is represented by two proportions, one for VC slips and one for CV slips, there were five components to each

TABLE 4
Error Characteristics for the Eight Model Versions
[The table is not fully recoverable from the scan. Its columns are: Training Set (frequent or infrequent vocabulary), Architecture (int-ext = internal-external; int-only = internal only), Input (correlated or random), Total Errors, % Phonotactically Regular, % Obey CV Rule, % VC Slips, % CV Slips, and % Onset Consonant, with the standards from the speech-error literature given in parentheses (99, 99.5, 6, 2, and 62, respectively).]
Note. The phonotactic and CV effects are shown separately for all three replications of each model version.

deviation score, two for the syllabic constituent effect, and one for each of the other three effects. The square roots of these scores were then compared using analysis of variance (ANOVA).

Effect of Training Vocabulary. The degree of deviation from the standard was smaller for the frequent vocabulary than for the infrequent one, F(1, 16) = 9.24, p < .01 (frequent, M = .670; infrequent, M = .961). The frequent vocabulary showed a stronger phonotactic regularity effect (6 models were at 100% vs. only 2 for the infrequent vocabulary) and better adherence to the CV-category effect (10 of the frequent models were at 100% vs. 5 infrequent ones). In addition, the models trained on the frequent vocabulary


showed a strong syllabic constituent effect (9.2% VC slips and 1.5% CV slips), whereas the models trained on the other vocabulary did not show this effect at all (2.7% VC slips and 3.0% CV slips). Both training vocabularies showed the initialness effect (59.8% for frequent and 63.3% for infrequent). As expected, the training vocabulary matters. Given that the training set is the model's only source of information regarding the properties of English words, it makes sense that the most representative vocabulary would do best at generating error effects. For example, the frequent vocabulary produces a strong syllabic constituent effect, whereas the infrequent one does not. This seems to be related to the fact that the frequent vocabulary has, like the vocabulary of English as a whole, greater redundancy among its VCs than its CVs; the infrequent vocabulary has less of a difference. In particular, when a given VC is present in many different words, it may come to act as an output unit. Segments that co-occur with high frequency give rise to emergent behaviors that mimic the effect of explicitly assigning a unit or node to the higher order grouping. Notice, in the errors from the model in Table 2, that the three slips involving a VC unit (big producing /bed/, him producing /hen/, and old producing /and/) all have a VC that occurs only once in the vocabulary replaced by a VC that occurs more than once. The better performance on the phonotactic regularity effect and the CV effect may be attributed to the greater diversity of phonotactic patterns in the frequent vocabulary. The infrequent vocabulary lacks diversity in its non-CVC patterns and, hence, makes deviant errors on the few non-CVCs that it must produce.

Effect of Input Representation. The models with the random input representations performed just about as well as those with the correlated input representations, F(1, 16) = 0.10 (random, M = .830; correlated, M = .801).
At least in these models, the input representation is not critical in producing adherence to the frame constraints. This is important because it suggests a degree of generality of the model. The input representation could be correlated with the word's form, as in an underlying phonological representation, or uncorrelated, as in a semantic representation. In either case, the error effects that have been attributed to phonological frames emerge. This conclusion must be qualified because there was a small, but significant, interaction between input representation and vocabulary, F(1, 16) = 4.67, p < .05. The superiority of the frequent over the infrequent vocabulary was greater with the correlated representation than with the random representation. The infrequent correlated models are perhaps unfairly punished by our measure because their initialness effects are actually too strong (> 62% onset slips). These models also are the only ones that do poorly on the CV-category effect, and the standard here is quite high, 99.5%. Hence, with the arcsine transformation, the deviation from the standard is large. We have no interpretation to offer for these unexpected effects.
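The deviation measure used throughout these comparisons can be sketched as follows. The transformation and the five standard proportions are taken from the text; the model proportions below are the mean values reported earlier, used here only to illustrate the computation.

```python
# Sketch of the deviation score: the sum over the five error effects of the
# squared difference between model and standard proportions, each first
# passed through the arcsine transformation A(p) = arcsin(sqrt(p)).
import math

def deviation_score(model_props, standard_props):
    a = lambda p: math.asin(math.sqrt(p))
    return sum((a(m) - a(s)) ** 2 for m, s in zip(model_props, standard_props))

# Five components: phonotactic regularity, CV rule, VC slips, CV slips, onset.
standards = [0.99, 0.995, 0.06, 0.02, 0.62]
model     = [0.926, 0.976, 0.056, 0.021, 0.616]   # mean values from the text

score = deviation_score(model, standards)
root = math.sqrt(score)   # the square roots were what the ANOVA compared
```

The arcsine transformation stretches the scale near 0 and 1, so a shortfall from 99% to 92.6% counts for more than the same raw difference near 50%, matching the stated rationale of correcting for variability at extreme proportions.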


Effect of Architecture. As anticipated, the more complex architecture, the one with both external and internal feedback, was superior to the internal-only architecture, F(1, 16) = 5.07, p < .05 (internal-external, M = .707; internal-only, M = .923). Both the phonotactic regularity and CV effects were better with the internal-external models (see Table 4), giving some evidence that the extra context supplied by the internal-external model is helpful for producing structural effects on errors. Whether the effect is due to the specific effect of output context (external feedback) or simply to the presence of more units dedicated to context cannot be ascertained.

Discussion

The analyses performed thus far suggest that not all models are equal, but that it is possible to achieve very humanlike performance on the frame constraints. In order to have good performance with regard to the phonotactic regularity effect, one needs a solid encoding of the surrounding context (better in the internal-external architecture than the internal-only) and a representative vocabulary. Good performance on the CV effect is easier to achieve. As long as consonants and vowels are very different in their output features, the model's slips obey this constraint. The syllabic constituent effect is heavily dependent on the training vocabulary. If there is a large difference in the patterning of VCs and CVs, then the desired effect emerges. These considerations suggest that the six models created with the internal-external architecture and trained on the frequent vocabulary should be very good. And, as Table 4 shows, they were. Their errors were phonotactically regular 96.3% of the time and obeyed the CV constraint 100% of the time. VC slips outnumbered CV slips, 9% to 3%, and the initialness effect was 60%. These numbers are very close to the standards. One of the most important findings from the simulations is that the errors are sensitive to the phonological patterns of English.
The fact that this is true of people's errors prompted some researchers to suggest that phonological rules are actively consulted during production (e.g., Fromkin, 1971). Others (Crompton, 1981; Stemberger, 1983a), however, have been more skeptical and argued that error adherence to phonotactics does not necessarily implicate rule consultation. Our simulations justify this skepticism. They suggest that the phonotactic regularity effect may reflect simpler mechanisms, mechanisms based on the principles of similarity and sequential bias. To shed more light on how the models produce the phonotactic effect, we performed a follow-up set of simulation experiments using just one model, one of those with the internal-external architecture and the correlated coding, trained with the frequent vocabulary. These experiments let the model "babble" in order to examine its general sequential biases. Specifically, we asked the model to tell us what it knows about sound patterns in the absence

TABLE 5
Distance in Features Between the Model's Output and the Average Stop Consonant, the Average Nasal Consonant, and the Average Vowel as a Function of the Preceding Sound Encoded on the External Context Units, from the "Babbling" Simulation
[The table is not fully recoverable from the scan. Rows give the preceding sound (stop consonant, nasal consonant, vowel) and columns the following sound; recoverable distances include, after a stop, 2.00 to the generic stop, 2.60 to the generic nasal, and 1.17 to the generic vowel.]
Note. Numbers in boxes indicate sound combinations that occurred in the training vocabulary.

of any lexical input by examining its output when its lexical nodes were set to zero and its external context units took on various values. We set the external context units to one of three patterns: (1) a generic stop consonant (a vector that is the average of all stop consonants); (2) a generic nasal consonant (the average of all nasals); and (3) a generic vowel (the average of all vowels). Then we examined the model's output in response to this generic input by asking how close that output is to the same three patterns. For example, when the model's external context units were set to the generic stop, the model's output was found to be much closer to the generic vowel (1.17 features) than to the generic nasal (2.60 features) or the generic stop (2.00 features). In effect, the model produces a more vowel-like pattern after a stop than a pattern resembling a stop or nasal consonant. Given that stop-vowel sequences are phonotactically legal, whereas stop-nasal and stop-stop sequences are not, at least in this vocabulary, the model is therefore showing a sensitivity to phonotactic constraints, a sensitivity that is independent of any lexical input. Table 5 shows that this sensitivity is characteristic of the model by presenting the results for the other sound-class combinations. Every example of a legal combination is associated with a distance of less than 2.0 feature units, and every example of an illegal combination is associated with a distance greater than or equal to 2.0. The babbling simulation led to two conclusions. First, the network has made some generalizations about the sound patterns of English. In this sense, the model is exhibiting rulelike behavior; its knowledge is not tied to the lexical items that it was trained on. Second, at least some of this knowledge is found in the associations between the immediately previous output and the current output.
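The mechanics of the babbling probe can be sketched as below. This is only a procedural illustration: the network here has small random weights and omits the internal context units, so the distances it produces are meaningless; the real effect requires the trained weights.

```python
# Procedure sketch of the "babbling" probe: zero the lexical input, clamp
# the external context units to a generic (class-average) pattern, run one
# forward step, and measure the output's distance to each generic pattern.
import numpy as np

rng = np.random.default_rng(0)
n_lex, n_ctx, n_hid, n_out = 15, 18, 20, 18   # sizes from the text
W_lex = rng.normal(0, 0.5, (n_hid, n_lex))     # lexical-to-hidden weights
W_ctx = rng.normal(0, 0.5, (n_hid, n_ctx))     # external-context-to-hidden
W_out = rng.normal(0, 0.5, (n_out, n_hid))     # hidden-to-output weights
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def probe(generic_context):
    lexical = np.zeros(n_lex)                  # no lexical input at all
    hidden = sigmoid(W_lex @ lexical + W_ctx @ generic_context)
    return sigmoid(W_out @ hidden)

generic_stop = rng.random(n_out)    # stand-in for the average stop pattern
generic_vowel = rng.random(n_out)   # stand-in for the average vowel pattern

out = probe(generic_stop)
d_stop = np.linalg.norm(out - generic_stop)
d_vowel = np.linalg.norm(out - generic_vowel)
```

In the paper's trained model this comparison yields, for example, a much smaller distance to the generic vowel than to the generic stop after a stop context, which is the sensitivity reported in Table 5.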
Of course, these associations do not exhaust the model’s knowledge of sound patterns because we have not considered the internal context


units, nor have we taken into account any interactions between both kinds of context units and the lexical input units. Nonetheless, we can conclude that the associations involving the context units play a large role in pushing the model's output toward phonotactically acceptable sequences. Despite the promising results of all of our simulations, there is something a little odd about treating the model's errors as analogous to speech errors. The model's errors reflect a lack of knowledge, rather than a transient inability to use knowledge. Adult speech errors are clearly the latter rather than the former. We address this in the following section by generating errors through transient degradation.

STUDY 2: ERROR GENERATION BY DEGRADATION

As developed so far, the model exhibits the general frame constraints in the errors that it produces when its learning is not yet complete. Although we think this is a valuable result, one that reveals the forces at work in models of this sort, we acknowledge that it is unclear exactly what psychological process is being modelled. Because the speech-error data come from adult subjects who can, most of the time, correctly pronounce all of the words in their vocabulary, it would be most appropriate for the model to behave in this way, too. So, we trained two new models, each using the frequent vocabulary, the correlated representation, and the internal-external architecture, until they correctly produced all their words. To achieve this level of correctness, we used a training device that is commonly employed in models using backpropagation: Error was not back-propagated if the model's output was greater than .9, for a target output of 1.0, or less than .1, for a target output of 0. The learning rate was .25. Model 1 needed 205 training epochs and Model 2, 190 epochs. To create errors, random normal deviations to some of the weights were then introduced. For each word tested, a different set of weight changes was made, and these changes were maintained only through the production of that word. The deviations represent the natural variations in lexical knowledge resulting from recent experience, environmental contamination, and other putative causes of error. First, we examined the effect of adding noise to the weights from hidden to output nodes. Each model was tested with normal deviates of mean 0 (SD = .1) added to the hidden-output weights. A separate set of deviates was used for each word. Errors were computed as before. This process was repeated eight times for each model, resulting in 3,200 segments produced, counting the final null segment. The noise degraded performance to 94.9% correct segments, comparable to that of the models in Study 1.
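The transient-degradation procedure can be sketched as follows. The production step is a stub with assumed shapes; only the perturb-then-restore logic, with fresh Gaussian deviates per word, is the point.

```python
# Sketch of transient degradation: for each word, add fresh Gaussian noise
# (mean 0, SD 0.1) to a copy of the hidden-to-output weights, produce the
# word with the noisy copy, and leave the stored weights untouched.
import numpy as np

rng = np.random.default_rng(1)
W_out = rng.normal(0, 0.5, (18, 20))   # hidden-to-output weights (18 x 20)

def produce(word, weights):
    """Stub production step: map placeholder hidden activations to output."""
    hidden = np.tanh(rng.random(20))
    return weights @ hidden

def produce_with_noise(word, weights, sd=0.1):
    noisy = weights + rng.normal(0, sd, weights.shape)  # per-word deviates
    return produce(word, noisy)

original = W_out.copy()
_ = produce_with_noise("big", W_out)
assert np.array_equal(W_out, original)   # stored knowledge is never changed
```

Because the noise is regenerated for each word and never written back, the degradation is transient: the same well-trained system can err on one word and then pronounce the next one correctly.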
The 164 errors obtained followed the four general frame constraints. As in previous models, the errors that violated phonotactic constraints or the consonant-vowel rule occurred rarely and


TABLE 6
Relation Between Amount of Noise Added to Hidden-Output Connections in Model 2 and Adherence to Frame Constraints

Noise (SD)   No. of Errors   % Phonotactically Regular   % Obey CV Rule   %VC   %CV   % Onset Consonant
.15          129             96.1                        96.9             5.4   3.9   77.6
.10           83             98.8                        98.0             2.4   0.0   82.0
.075          52             100.0                       96.2             3.8   1.9   85.0
.05           27             100.0                       100.0            0     0     50.0
Totals       291             97.9                        97.6             3.8   2.1   78.0

primarily with non-CVC words: 94.5% of the errors were phonotactically legal and 98.8% followed the consonant-vowel rule. These were the only violations (some were obtained more than once): try producing /tr#/, try producing /tRaa/ (R is a syllabic /r/, so this would sound like "turah" with the final "ah" being too long for English), ago producing /aeo/, any producing /enI/, and can producing /aan/. The models' errors also exhibited the initialness and syllabic constituent effects: 76.3% of the consonant substitution errors involved onsets, 2.4% of the errors were VCs, and 0.6% were CVs. So, it appears that the general frame constraints are associated with transient degradation of a completely trained system, as well as an incompletely trained one. Given that normal speech errors reflect deviations in a well-trained system, the model gains in credibility as an account of these error effects. We also explored the extent to which the frame constraints depend on the overall level of noise. The idea is that small deviations should be less upsetting to the general patterns that the network has extracted from the vocabulary and, hence, one expects better adherence to the frame constraints with a higher overall level of correctness, that is, with less noise. We tested Model 2, which was the better of the two models, with noise levels of .15, .075, and .05 standard deviations. There were eight replications for each noise level. The extent to which the errors followed the frame constraints is shown in Table 6, which also presents the results with a standard deviation of .10 from the previous tests with Model 2 for comparison. On average, the errors exhibited the frame constraints (97.9% phonotactically legal, 97.6% obeyed the CV rule, 3.8% VC errors vs. 2.1% CV errors, and a 78% initialness effect). However, as the overall error rate decreases, the errors' tendency to follow phonotactic patterns increases. With the .075 and .05 noise levels, none of the errors violated phonotactic constraints.
There is a similar tendency for greater adherence to the CV rule with a lower error rate, but the trend is not as clear. The initialness and syllabic constituent effects are not associated with any particular trend, but the small number of relevant errors does not allow for a firm conclusion.

TABLE 7
Effects of Noise on Adherence to Frame Constraints in Models 1 and 2 as a Function of the Location of the Lesion

Lesion                               No. of Errors   % Phonotactically Regular   % Obey CV Rule   %VC   %CV   % Onset Consonants
Hidden-Output (SD of noise = .1)     164             94.5                        98.8             2.4   0.6   76.3
Context-Hidden (SD of noise = .1)    238             81.1                        96.6             0.6   0     24.3

Note. Each condition is based on 3,200 segments produced.

These results are encouraging because they suggest that the model's adherence to the frame constraints, if anything, gets better as its error rate decreases. Thus, if we were to extrapolate the model to a level of correctness found in normal adult speech (one error per 300 words, Hotopf, 1983; one error per 900 words, Garnham et al., 1981), we might well find the exceptionally high adherence to the phonotactic regularity and CV rules seen in speech-error collections. At least we can expect that such adherence would not get worse as correctness increased. At this point, it is useful to ask whether the model's adherence to the frame constraints is an automatic consequence of its assumptions, because most of the models that we examined have worked. To some extent, that is exactly what we are saying. The frame constraints are simple consequences of the covariances in the feature sequences in the vocabulary, and so any device that learns to be sensitive to these covariances will tend to do the right thing (see Van Orden, Pennington, & Stone, 1990, for discussion of general principles of covariant learning). However, it is not the case that any model that produces greater than 90% correctness would follow the constraints as well as the previous models have, and this can be demonstrated by adding noise to a different set of connections, the context-hidden connections. We tested Models 1 and 2 eight times each with a noise standard deviation of .1 added to the context-hidden connections: 92.6% of the segments were correct, comparable to the 94.9% found for these models with the same noise level added to the hidden-output connections. However, the models failed to produce the frame constraints, as shown in Table 7. Specifically, the average performance on the phonotactic constraints was a dismal 81.1%, and the initialness effect was transformed into a preference for errors on noninitial consonants (24.3%). These two percentages are significantly different from both the standard percentages (99% and 62%, respectively) and from the corresponding percentages obtained from the models degraded by the same noise added to the hidden-output connections (p < .05 for the phonotactic effect, p < .01 for the initialness effect). The square root of the deviation score, as defined


previously, computed on the average of the models degraded on the context-hidden units, was a large 1.49. In comparison, the score for the average of the models degraded on the hidden-output units was 0.47. The context-hidden models' errors tended to involve sequencing and inappropriate doubling, rather than the substitution of similar sounds. This often resulted in violations of phonotactic constraints, for example, ten producing /tntt/ and ran producing /rierr/. These findings, together with those of the babbling simulations, point to the role of the context units in producing well-behaved output. When degradations affect contextual information, that is, the context-hidden connections, both sequencing and adherence to frame constraints that depend on sequential bias, such as the phonotactic and initialness effects, are disrupted. When degradations occur after the contextual and lexical information have been integrated, at the hidden-output connections, the proper sequential biases have already been added in. Hence, the errors are well-behaved with respect to the frame constraints. The comparison of context-hidden and hidden-output noise, along with the previous investigations of the model's performance without any lexical input, allows us to localize some of the framelike effects in the model, particularly the phonotactic regularity and initialness effects. They reside, at least in part, in the way that the model uses the changing context that occurs during the sequential production of each word's segments.

STUDY 3: LARGER VOCABULARY

Although the frequent 50-word vocabulary represents most of the tokens of 3-letter, 3-segment English words, it makes up only about one sixth of these word types. Specifically, Kucera and Francis's (1967) corpus gives 308 words that are three letters and three segments long. Our next set of simulations sought to examine models that have learned larger vocabularies. The extension to a large vocabulary is a useful challenge for two reasons. First, there is the question of replication. If the model is working well on a representative small vocabulary, it should also do so on a large representative one. Second, the interesting theoretical claim of the model is that the storage of the vocabulary provides the basis for the error phenomena that we seek to explain. Hence, it is important that a good chunk of that vocabulary be stored in the network. Our first attempt with a large vocabulary was only partially successful. We used the internal-external architecture with the correlated input coding because this combination seemed to work the best in the earlier studies. We extended the capacity of the network from 15 to 30 input nodes, 10 for each letter position (the letter coding was the same as in the previous simulation, except that each unit had a twin that behaved in the same way), and from 20 to 25 hidden units. This, then, resulted in 25

TABLE 8
All Errors Violating Phonotactic Constraints from a Model Trained on 308 Words, Reaching a Level of 89.6% Correct Segments

[Table body: pairs of target words (among them and, ask, ado, ant, end, few, icy, ink, mew, out, oil, quo, ski, sky, and vow) and the phonotactically illegal forms the model produced for them; the individual pairings are not recoverable from this scan.]

* The diphthong oy was encoded as /a/.

internal context units. As before, there were 18 external context units, one for each output unit. The training vocabulary consisted of the 308 words, as described before. The learning rate was .25; connection weights were initialized as before. All of the words were presented in a new random order for each epoch. Thus, we did not weight the presentation frequency of each word by its observed frequency. (A model that is heavily weighted by frequency would have a training experience more like that of our 50-word frequent vocabulary.) This model did not learn as well as the smaller ones. We looked at the percentage of correct segments after every 20 epochs, and the best performance achieved was 89.6% correct segments, a level that occurred, surprisingly, with only 80 epochs of training. Performance on the frame constraints was mixed: Only 87.1% of the 116 errors were phonotactically regular. Table 8 gives all of the illegal errors obtained. It appears that the model has yet to learn that adjacent stops in the same syllable are disallowed. The model also has a problem with the unusual clusters /fy/ and /my/ and, in general, has difficulty with non-CVC words: 93.1% of the errors followed the rule that consonants slip with consonants, and vowels with vowels, a percentage that falls short of that with the smaller networks. The results for the other two frame constraints, however, were good: 59% of the consonant errors were onsets (62% is the standard), and 6.9% of the errors were VCs versus 4.3% CVs (standards are 6% and 2%, respectively). Our tentative conclusion is that the initialness and syllabic constituent constraints are present in the large network. Performance on the two other


constraints, however, was not strong. One explanation for this weakness is that the model, itself, did not reach a high degree of correctness. As we suggested earlier, the tendency for a model’s errors to obey the phonotactic and CV rules may depend on the overall level of error. Hence, we sought to improve learning performance. In a second attempt to handle a large vocabulary, we added in several tricks of the trade. First, the number of hidden units was increased to 40, resulting in 40 internal context units as well. Second, each external context unit was given a small memory composed of 20% of its value on the previous time step (e.g., Jordan, 1986). Specifically, the activation at a given time was .8*Input + .2*Old_activation. Third, the strictness of the teacher varied over time. For the first 50 epochs, an output of greater than .6 or less than .4 was acceptable for units whose targets were 1 and 0, respectively. This was increased to .7 and .3 for the next 50 epochs, and so on until the teacher was as strict as possible (1.0 and 0). Finally, we used only the 298 single-syllable 3-letter words from the training set in order to create more uniformity in the set. This removed 10 VCV words. The level of correctness was quite good, and so we trained three separate models, resulting in an average of 99.1% correct segments. The few errors obtained (32) exhibited the frame constraints. There were only two phonotactic violations, ask- /aetk/ and out- /aat/. The latter error was the only violation of the CV rule because we coded out as /awt/ and, hence, the substitution of /a/ for /w/ is a vowel for a consonant. So, the phonotactic regularity effect was upheld 93.8% of the time, and the CV rule, 96.9% of the time. Furthermore, the initialness effect was a good 66.7%. The syllabic constituent effect, however, could not be examined as there were no errors involving more than a single phoneme.
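The training regimen just described (the leaky external context and the progressively stricter teacher) can be sketched as follows. This is a minimal illustration under the article's stated settings; the function names and the error-masking detail are our own, not the authors' code.

```python
import numpy as np

def external_context_update(prev_context, output):
    """Leaky external context: 80% of the current output plus 20% of
    the unit's previous activation (cf. Jordan, 1986)."""
    return 0.8 * output + 0.2 * prev_context

def progressive_criterion(epoch):
    """Progressive teacher: the acceptance band tightens every 50 epochs,
    from .6/.4 through .7/.3 and so on, until the targets are 1.0/0.0."""
    step = min(epoch // 50, 4)
    # (lower bound for 1-targets, upper bound for 0-targets)
    return 0.6 + 0.1 * step, 0.4 - 0.1 * step

def error_signal(output, target, hi, lo):
    """Propagate no error for output units already inside the band."""
    err = target - output
    inside = np.where(target == 1, output >= hi, output <= lo)
    err[inside] = 0.0
    return err
```

With this criterion, an output of .65 for a 1-target counts as correct during the first 50 epochs but produces an error signal from epoch 50 on.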
For our final exploration with a large vocabulary, we increased the vocabulary to include all of the 1- and 2-segment words from the set of Kucera and Francis’s (1967) 3-letter words, in addition to the 3-segment words, resulting in a total of 412 words. This included the 2-syllable words previously excluded. Thus, the model must deal with words of differing length. Four models with 40 hidden units each were trained, two using all the features described immediately before, including the progressive teacher and the 20% memory on the external context units, and two using the features that were successful in training the small models in Study 2, a constant teacher with a criterion of .1 and .9. Each model was trained until performance appeared to asymptote. The two models trained with the progressive teacher reached 95.4% and 95.6% correct segments, and the two models trained with the constant teacher did a little better, 97.5% and 96.1% correct. With the exception of the phonotactic regularity effect, the 206 errors obtained exhibited the frame constraints: 89.3% of the errors were phonotactically regular; 96.1% obeyed the CV-category rule; 61.2% of the consonant substitutions


involved onsets; 3.9% of the errors were VCs and 0% were CVs (see Table 9). The two disappointing features of the results were the lower level of phonotactic regularity and the degree of variability among the four models with regard to the initialness effect. We have no real explanation for the latter finding, but we believe that two factors may be responsible for the lower phonotactic regularity percentage. First, it may simply be the case that higher levels of correctness are required for the large vocabularies to achieve nearly perfect percentages on phonotactic regularity. The models trained with the .1 to .9 teacher had fewer errors overall and did achieve better performance on the phonotactic regularity effect, 92.5%. However, an examination of the nature of the violations in all four models leads to the conclusion that there may be a particular difficulty with word-final vowels in short words. The most common error that violated phonotactic constraints was to replace a word-final tense vowel in a 2- or 1-segment word with /I/ or /E/ (e.g., she- /sI/, new- /nI/, maw- /me/, eye- /E/, the- /ðI/, fee- /fI/ all occurred at least once). Although these are legal syllables, they are not legal word-final syllables. The model has a tendency to produce lax vowels erroneously in first and second positions, simply because its vocabulary has this characteristic. At the point that the lax for tense substitution is made, the string is perfectly legal. However, it is then illegal for the model to end the word at that point by producing a null segment. Yet, all of its training with these words forces it to produce a null segment, ending the word. For the model to avoid making an illegal slip, it would have to make another error and add a final consonant. This did indeed happen on occasion (e.g., eye- /eg/, a legal slip).
But most of the time, the model’s tendency to generate correctly a null segment in the third position (second position for a single-segment word) won out over its tendency to generate a legal string. We don’t know whether this aspect of the model’s behavior should be considered a bug or not. If it is a bug, it could be due to the model’s representational assumptions for vowels. The model codes long vowels by features such as tense. More recent phonological theory and some psycholinguistic data (e.g., MacKay, 1978) associate long vowels with two segment positions. Thus, our phonological assumptions may be in error. Alternatively, the fact that the model’s error words occasionally end with vowels such as /e/, /me/, and /I/ may be true of real speech errors. Errors such as cake- /ke/ may indeed occur and be incorrectly coded as cutoff words, rather than phonotactic violations. A cutoff word occurs when a speaker stops in the middle of a word, often because of a need to correct or clarify what he or she has previously said (Levelt, 1983). Until there is a thorough empirical analysis of the issue, however, we simply cannot tell whether some English speech errors do end inappropriately on lax vowels. In general, the larger models gave promising results. But they were not up to human standards with respect to the percentage of errors that are phonotactically legal or the percentage of errors obeying the CV rule.
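The word-final trap described above (a string that is a legal prefix but an illegal complete word) can be illustrated with a toy computation; the miniature vocabulary and helper names below are our own, with ASCII stand-ins for the phonemes.

```python
# A toy demonstration (our own, not the article's vocabulary) of the
# word-final constraint: lax vowels like /I/ and /E/ occur before
# consonants but never word-finally, so /sI/ is a legal prefix yet an
# illegal place to stop.
VOCAB = ["sIt", "sEt", "nIt", "si", "se", "nu"]   # ASCII phoneme stand-ins

def legal_prefixes(vocab):
    """Every string that begins some word in the vocabulary."""
    return {w[:i] for w in vocab for i in range(1, len(w) + 1)}

def may_stop(prefix, vocab):
    """Producing the null (end-of-word) segment is legal only if some
    training word ends exactly here."""
    return prefix in vocab

assert "sI" in legal_prefixes(VOCAB)   # continuing with /t/ would be fine
assert not may_stop("sI", VOCAB)       # ...but ending the word here is not
assert may_stop("si", VOCAB)           # tense vowel: a legal word-final stop
```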


The simulations with the 412-word vocabulary allow us to explore another feature of these kinds of models, the dependence of error on the extent to which the target word is phonotactically atypical. The most common pattern in the vocabulary is CVC (63%), followed by, in order, CV (17%), VC (8%), VCC (5%), CCV (4%), VCV (2%), V (0.7%), and CVV (0.2%). The probability of an error of any form was much higher on the atypical patterns. For the four models presented in Table 9, the probability per word of any error was CVC (9%), CV (14%), VC (14%), VCC (17%), CCV (19%), VCV (22%), V (33%), and CVV (25%). This strong negative relation between pattern frequency and error illustrates the role of generalization in the model. Training on a particular word transfers to similar words. We do not know whether these relative error percentages map accurately onto data. It does seem to be the case that consonant clusters are particularly vulnerable to error (Stemberger, 1990), but we doubt that simple but infrequent patterns, such as a word made up of a single vowel (V), attract errors to a large extent, as the model appears to predict. One must keep in mind, though, that the token frequency of these simple words is invariably high, which the model did not represent because it was trained on each word equally often. A high token frequency can compensate for an unusual pattern in these kinds of models (Seidenberg & McClelland, 1989).
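Using the percentages just quoted, the strength of this negative relation can be checked with a rank correlation. The calculation below is our own, not one reported in the article.

```python
# Pattern frequencies (% of the 412-word vocabulary) and per-word
# error probabilities (%) as reported in the text.
patterns = ["CVC", "CV", "VC", "VCC", "CCV", "VCV", "V", "CVV"]
freq  = [63, 17, 8, 5, 4, 2, 0.7, 0.2]
error = [9, 14, 14, 17, 19, 22, 33, 25]

def ranks(xs):
    """Average ranks (1 = smallest), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

rho = spearman(freq, error)
```

Over these eight patterns the Spearman coefficient is about -0.97: pattern frequency and error probability are almost perfectly inversely ranked.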

GENERAL DISCUSSION

On average, the models do a reasonable job of simulating four speech-error effects: the phonotactic regularity effect, the CV-category effect, the syllabic constituent effect, and the initialness effect. Earlier, we claimed that these effects are important because they have previously been associated with the action of phonological rules or frames, data structures that are assumed to be separate from the set of phonological segments associated with a word. Our success in simulation using a sequential PDP architecture suggests that the four effects do not necessarily reflect separate rules or frames; rather, they can be produced by a mechanism that does not separate linguistic structure and content in an explicit a priori fashion. The general discussion considers the impact of these findings. Specifically, what are their implications for psycholinguistic theories of language production, linguistic theory, and work in connectionist (PDP) modelling? We first consider the implications for efforts in modelling.

Issues in Connectionist Models of Language

Our model produces sequences of features by integrating two information sources, the intended word to be spoken coded in the lexical plan units, and a changing short-term memory or context that represents the past state of the network. The past state of the network is, itself, comprised of two parts

TABLE 9
Error Characteristics of Models Trained on 412-Word Vocabulary (All 1-, 2-, and 3-Segment Words)

Columns: % correct segments / no. of errors / phonotactic violations / CV-category violations / initial substitutions / final substitutions / VC errors / CV errors

Trained with progressive teacher and memory in context units:
95.6 / 60 / 8 / 3 / 8 / 9 / 5 / 0
95.4 / 66 / 8 / 2 / 9 / 11 / 3 / 0

Trained with .1-.9 teacher:
97.5 / 31 / 4 / 0 / 1 / 7 / 0 / 0
96.1 / 49 / 2 / 3 / 31 / 4 / 0 / 0

Totals: 206 errors / 22 phonotactic violations (89.3% no violations) / 8 CV-category violations (96.1% no violations) / 49 initial substitutions (61.2% onsets) / 31 final substitutions / 8 VC errors (3.9%) / 0 CV errors (0%)

* The diphthong oy was encoded as /a/.


in the internal-external architecture, making this model a combination of those developed by Jordan (1986) and Elman (1990). Because of this use of context units, our model is an example of a simple recurrent network. Much of the interest in these networks has centered on their applications to language, and specifically on the hypothesis that the network’s treatment of sequencing allows it to represent linguistic structure (Allen, 1990; Elman, 1989, 1990; Gasser & Lee, 1990; Hare, 1990; Hare, Corina, & Cottrell, 1990; Jordan, 1986; Lathroum, 1989; McClelland, St. John, & Taraban, 1989; Servan-Schreiber, Cleeremans, & McClelland, 1989; St. John & McClelland, 1990). There has been a particular interest in applying recurrent nets and other connectionist architectures to phonological facts. There seem to be two reasons for this interest. First, phonology is a rule-governed aspect of language that may be more tractable for connectionist models than syntax or semantics: At least for nonphrasal phonology in languages such as English, one does not have to deal with clearly recursive structures. Second, we know more of the real-world link to phonology, namely articulation and speech audition, than we do for syntax and semantics. As a result, models can take advantage of discoveries in linguistics and phonetics by using well-motivated representations, as we have done with our output level. Connectionist approaches to phonology have included Touretzky and Wheeler’s (1990) implementation of Lakoff’s (1988) cognitive phonology and Goldsmith’s (1991) treatment of sonority and accent systems. Recurrent networks similar to the kind used here have been applied to coarticulation (Jordan, 1986), vowel harmony (Hare, 1990), and other feature-persistence effects (Gasser & Lee, 1990). This work has largely been linguistic in nature. It tries to account for facts about the distribution of sounds in words: facts about linguistic competence. In contrast, our approach is psychological.
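A minimal forward-pass sketch of such an internal-external recurrent network is given below. The layer sizes loosely follow Study 3, but the weight initialization, the absence of decay on the external context, and all names are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PLAN, N_HIDDEN, N_OUT = 30, 40, 18   # roughly the Study-3 dimensions

# Weights from the lexical plan, the internal (Elman-style) context, and
# the external (Jordan-style) context into the hidden layer, and from
# hidden units to the output feature units.
W_plan = rng.normal(0, 0.1, (N_HIDDEN, N_PLAN))
W_int  = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))
W_ext  = rng.normal(0, 0.1, (N_HIDDEN, N_OUT))
W_out  = rng.normal(0, 0.1, (N_OUT, N_HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def produce(plan, n_segments):
    """Generate a sequence of feature vectors for one word.  The plan
    (lexical input) stays constant across the word; the two context
    banks change from segment to segment."""
    internal = np.zeros(N_HIDDEN)      # copy of the previous hidden state
    external = np.zeros(N_OUT)         # copy of the previous output
    segments = []
    for _ in range(n_segments):
        hidden = sigmoid(W_plan @ plan + W_int @ internal + W_ext @ external)
        out = sigmoid(W_out @ hidden)
        segments.append(out)
        internal = hidden              # Elman-style recurrence
        external = out                 # Jordan-style recurrence (no decay here)
    return segments

word_plan = rng.integers(0, 2, N_PLAN).astype(float)
features = produce(word_plan, 4)       # e.g., three segments plus a null
```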
We are attempting to give an account of performance data, findings that result from people actually trying to produce language. This is the main sense in which our research differs from the previous work in phonology using recurrent nets. In fact, we find it unusual that much of the PDP phonology work has had more of a linguistic than a psycholinguistic flavor, given that early applications of connectionist models to language emphasized performance data over competence (e.g., Cottrell, 1989; Dell, 1985; Elman & McClelland, 1984; Hirst, 1987; Kawamoto, 1988; McClelland & Kawamoto, 1986; Rumelhart & McClelland, 1986; Seidenberg & McClelland, 1989; Waltz & Pollack, 1985). In sum, the work here contributes to the growing literature on connectionist models of language by its application of a popular class of network architectures to important psycholinguistic facts.

Implications for Linguistic Theory

As we stated before, the model presented here is not a model of phonological competence. It is sometimes tempting to pit linguistic theory against PDP


models and ask whether or not the PDP model can provide an alternative to standard rule- or principle-based accounts of linguistic structure. This is not our goal and, if it were, we would be far from it. The model accounts for few of the phonological facts that are motivating current linguistic theory, such as facts related to the interaction between morphology and phonology, nonsegmental phonology, phrasal phonology, and metrical structure. The model only offers a mechanism for sensitivity to phonotactic constraints, the consonant and vowel categories, and syllabic constituency. Moreover, its mechanism is found in the covariant structure of the vocabulary, and thus begs the question of why the vocabulary has the structure that it does. A theory of phonological competence should explain why certain sound patterns are present and why others are not. Although we do not offer a theory of the structure of the vocabulary, we expect that psycholinguistic theory (connectionist or otherwise) will contribute to its development. Ultimately, it will be found through a combination of the distributional and cross-linguistic analyses that characterize linguistic inquiry and accounts of production and perception that are sensitive to the constraints imposed by the auditory, articulatory, and cognitive systems. For example, one aspect of an action system, such as articulation, is that the system is biased to repeat or partly repeat actions. Actions are more rapidly and accurately carried out when they retain parameters from a prior action. Moreover, repetition can benefit at many levels in an action hierarchy (Rosenbaum, 1990). These facts are suggestive of reduplication phenomena in language, in which a linguistic rule calls for the repetition of some material (e.g., the plural of some Walpiri nouns is formed simply by repeating the noun; see Marantz, 1982, for review).
It is tempting to conclude that reduplication is a feature of language, in part because of this characteristic of action systems. There are two aspects of our work that specifically address current phonological issues. First, there is the status of hierarchical syllabic constituents such as the rhyme. Although the psycholinguistic literature has embraced a view of syllable structure with explicit rhyme constituents (e.g., Dell, 1986; Levelt, 1989; MacKay, 1987; Treiman, 1983), phonologists have been more equivocal, and some theories do not have explicit onsets and rhymes (Clements & Keyser, 1983). In our model, the speech-error effects that have been attributed to the onset/rhyme division (the syllabic constituent and initialness effects) reflect the sequential biases inherent in the vocabulary, and not explicit constituents. We suggest that explicit rhyme constituents may not be required in either the psycholinguistic or the linguistic domains. A second phonological implication of our work concerns variation in phonotactic acceptability (e.g., Pierrehumbert, in press). Some unacceptable sequences are “less bad” than others. For example, the following syllable templates do not occur in English: /s/ stop vowel stop, where the two stops are identical noncoronals (as in /spap/), and /s/ nasal vowel nasal (as in /smq/) (Davis, 1989). Yet, these sequences are easily pronounced by English


speakers and, to our intuitions, are not very deviant. Somewhat more deviant, but still easily pronounced, are words ending in certain vowels, such as /bI/ or /se/. The most deviant sequences, such as /dtb/, simply cannot be pronounced. We feel that models of the type proposed here may be useful in accounting for this variation in acceptability. For example, our model never produced three stop consonants in a row, but it did, on occasion, produce words ending in /I/ or /E/. Although we did not investigate longer sequences such as /spap/, we doubt that a model like ours could learn to exclude these illegal strings. Because /spat/, /stap/, and /pap/ are all good syllables, the proscription of /spap/ would be quite difficult to learn. The model would have to be sensitive to very specific and distant effects: the final /p/ is illegal only if there is a /p/ two segments before it and an /s/ before that. In general, these models find some bad sequences “more bad” than others, and thus may be able to give an account of the graded nature of deviancy. Even better, one may be able to show that certain exclusions are simply too hard for a model, and thus the model can make predictions about which nonoccurring phonological patterns might appear in speech. For example, let us suppose that our model cannot, as we have speculated, learn to exclude patterns such as /spap/. One would then predict that these patterns will not only be easily spoken, but also appear in speech errors, and eventually may show up in the vocabulary of English.

Implications for Psycholinguistic Theory

The most important aspect of our model concerns psycholinguistic issues related to language production. We have suggested that some of the data supporting the distinction between structure and content in phonological encoding in production may result from other mechanisms. In this section, we discuss the theoretical implications of this claim.
First, we consider where our model might fit in a general production model. Then we consider some of the shortcomings of the model and discuss how these limit our conclusions. Finally, we suggest how the model could be applied to some additional data sets.

Where Does the Model Fit In?

We propose that the model corresponds roughly to the segmental spell-out component of phonological encoding in language production (Levelt, 1989). This component maps from a lemma, an abstract symbol representing the word as a semantic/syntactic entity and any attendant morphological markers (e.g., plural), to the sequence of phonological segments of the word. In keeping with Levelt’s (1989) notion of segmental spell-out, the model does not deal with the prior process of lemma selection, that is, the identification of the word to be spoken, nor does it deal with the subsequent process of phonetic and coarticulatory adjustments to the phonological segments of a word. The model also does not handle the metrical and prosodic structure of single words and words in sentences.


The lexical input nodes in the model correspond to the lemma. In the first set of studies with the model, we found that it was successful with both the random and correlated representations. Thus, we would be comfortable viewing the lemma either as Levelt (1989) did, as an entity that contains no phonological information (like the random coding), or as a kind of underlying phonological form (like the correlated coding). The sequence of output features corresponds to the retrieved set of segments for each lexical item. Each segment, represented in a distributed fashion as a set of features, is produced one at a time. If the model is to be taken as an account of segmental spell-out, it is useful to compare it to other accounts of this process. Our comparison centers on Dell’s (1986) model, which is similar to the spell-out models of Stemberger (1985), MacKay (1987), and Levelt (1989), and shares some features with Shattuck-Hufnagel’s (1979) model. To anticipate our conclusion, our model will be shown to be superior in dealing with the general frame constraints on speech errors, but not with other speech-error effects. Segmental spell-out in Dell (1986) is achieved by spreading activation in a hierarchical network. A word (or morpheme) node is activated when it is time to produce the word. Activation spreads to syllable nodes (preferentially to the first syllable) and from there it spreads downwards to syllabic constituent nodes, to phonological segment nodes, and to feature nodes. Activation spreads upwards as well. After a period of time has passed, the most highly activated syllabic onset node, vowel node, and coda node are selected and linked, respectively, to onset, vowel, and coda slots in a syllable frame. Inhibitory processes then diminish the activation of the selected constituents. If there are more syllables in the word, activation then spreads preferentially to the second syllable of the word and the process continues until the end of the word. 
Speech errors occur when the wrong node is more active than the correct one and is selected in its place. This can occur because the spreading-activation process activates nodes other than the intended ones, and because other words in the sentence and extraneously activated words also can interfere. Dell’s (1986) model contrasts with our model in several ways. First, our model retrieves all information sequentially, whereas the earlier one retrieves the segments of a syllable simultaneously. Second, our model has no specific nodes for syllables, syllabic constituents, or segments, but the earlier model does. Third, the earlier model uses the syllabic frame and some unspecified phonological rules to create serial order and enforce phonotactic constraints; our model has no explicit frame or rules that are stored separately from the segmental information. A consequence of the lack of frames is that our model produces substitutions (e.g., try- cry), deletions (fry- tie), and additions (try- trite) by the same mechanism; Dell’s model (1988) followed a suggestion of Stemberger (1985) that additions and deletions involve errors in the


frame, whereas substitutions do not. Fourth, spreading activation is interactive in the earlier model; activated nodes at a lower level can influence the amount of activation at higher levels. In our model, the lexical input pattern is unaffected by the segmental spell-out process. The model here, in our view, gives a more satisfactory explanation of the four general frame constraints than Dell’s (1986), whose account of the phonotactic regularity effect was incomplete. Errors were phonologically acceptable because the frame slots were labeled as either onset, vowel, or coda, and would only accept legal onsets, vowels, or codas. However, extra rules or more slot information would be required to make the model sensitive to word-level phonotactic patterns (e.g., never begin a word with /ŋ/ or end it with /I/). The only motivation for this extra information would be the phonotactic regularity effect, itself. Our model has the advantage of having a fully motivated account of the effect. It results entirely from the mechanism of the vocabulary storage. Moreover, our model, unlike Dell’s (1986), is consistent with the fact that violations of the phonotactic rule actually do occur sometimes. The rule that consonants and vowels are only replaced by segments of the same type is produced in Dell (1986) by labels on the frame slots; hence, the effect is largely built in. In our model, the effect is a by-product of the large dissimilarity between vowels and consonants. The syllabic constituent effect also is built into Dell’s (1986) model because there are specific nodes for rhyme constituents. When a particular V and C form a rhyme, they connect to a specific VC node. This increases the chance that these two segments will have a common fate; that is, they will either both be replaced or neither will be. Our model requires no special nodes to produce the effect. Finally, the initialness effect is simply not present at all in Dell’s (1986) model.
Because it retrieves the onset and coda consonants of syllables at the same time, there is no difference in their error rates. Our model produces the initialness effect as a result of its sequential assumptions. These assumptions also make our model more consistent with Meyer’s (1991) findings, which show that segments within a syllable are retrieved in order. We note that related frame-based models (MacKay, 1987) also produce the initialness effect by sequential activation. Although the PDP model does a better job than the earlier one at explaining the four general frame constraints, there are other data to be considered. Here, we compare the two models’ ability to handle movement constraints and similarity and familiarity effects on phonological errors. Dell’s (1986) model, and other models of segmental spell-out (Shattuck-Hufnagel, 1979; Stemberger, 1990), are able to give accounts of both movement and nonmovement phonological errors. In particular, these models can generate exchange errors (e.g., lork yibrary for York library) and other


movement errors such as anticipations and perseverations. Our model only produces nonmovement errors. This is a serious omission for two reasons. First, movement errors are quite common, making up a majority of phonological errors. In Garnham et al.’s (1981) tape-recorded error corpus, 69% of the phonological errors are contextually based as determined by the analysis of Schwartz, Saffran, and Dell (1990). The actual breakdown between movement and nonmovement slips, however, depends heavily on how error sources are defined. The 31% nonmovement errors from Schwartz et al.’s analysis were characterized by the lack of a segment identical to the error in the clause before and the clause after the error site. Other criteria could lead to different percentages. In general, the distinction is not clear-cut, and hence, we find it most profitable to view a contextual influence as graded, and as one of many possible forces on whether a target sound is correctly produced. Although our model does not directly represent the contextual influence on errors, we believe that the model provides a general account of phonological errors. Contextual influences and other influences on error such as inattention are, for us, just part of the noise, aspects that we do not simulate systematically. The second problem arising from the model’s failure to produce movement errors concerns exchanges. The existence of exchange errors suggests the action of phonological frames (Shattuck-Hufnagel, 1979): in an exchange, each segment is erroneously placed in the other segment’s frame slot. Moreover, there are positional constraints on which sounds can exchange, and these can be explained by labels on the slots. Exchange errors, such as the one listed before, tend to involve similar sounds occupying structurally similar positions.
In fact, more than 70% of between-word exchanges involve word-initial consonants exchanging with word-initial consonants (Shattuck-Hufnagel, 1987). This suggests, in the terms of frame-based models, that word-initial consonant slots are labelled as such and are treated differently in some way. There may also be a syllable-position constraint on which sounds can exchange, suggesting a further labelling of slots with syllable information, but it is unclear whether such a constraint exists (e.g., Meyer & Dell, 1993; Shattuck-Hufnagel, 1987). Can our model be amended to produce exchange errors such as lork yibrary? Our model certainly could be amended to have output influenced by more than one word at a time. For example, one can imagine a model in which the input pattern for York is contaminated with that of library when York is the target word. Specifically, the activation of each input node might be 80% of what it should be for York and 20% of what it should be for library. This contamination would presumably be due to the action of lemma retrieval processes and the syntactic processes responsible for ordering words. The need for continuously updating the lexical input units may naturally lead to circumstances in which the pattern is a mixture of more than one


word. This amendment to our model may account for some of the movement constraints on speech errors. First, errors involving movement of word-initial consonants (lork) would be most common for a reason closely related to the model’s explanation of the initialness effect in nonmovement errors. Because they are word-initial, both the /l/ and /y/ are associated to the null segment and, hence, the context units fail to point to one over the other, resulting in an increased chance of one replacing the other. More generally, there would be an increased probability of a movement error right after any point where the two words have identical or similar contexts. This would create a tendency for movement errors to occur in similar positions and after similar sounds, although it remains to be seen, of course, whether such tendencies would match those in real data (e.g., Dell, 1984; MacKay, 1970). Second, movement errors tend to involve similar consonants (e.g., MacKay, 1970; Shattuck-Hufnagel & Klatt, 1979). Again, this is a simple consequence of the model’s tendency to replace sounds with sounds that are close in output space, and our assumptions about the nature of the output space. Third, movement errors tend to create common sounds and sound combinations, and even words (e.g., Baars, Motley, & MacKay, 1975; Dell, 1990; Dell & Reich, 1981; Levitt & Healy, 1985; Stemberger, 1991). Our model has a natural tendency to produce familiar sounds and sound combinations because of its sensitivity to the statistics of sound distribution. In fact, the model’s avoidance of nonoccurring sound combinations in its nonmovement errors is the result of this tendency. The same forces would be at work in movement errors. Despite our optimistic speculations, there are challenges in applying the model to movement errors. The most serious is that, although movement errors may be created by the mechanism outlined earlier, such a model would not naturally create complete exchange errors.
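The contamination mechanism suggested above can be sketched directly. The 80%/20% weighting follows the example in the text; the plan patterns themselves are random stand-ins of our own.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PLAN = 30

# Hypothetical lexical plan patterns for the two words; in the article
# these would be the model's input codings, here they are random.
plan_york    = rng.integers(0, 2, N_PLAN).astype(float)
plan_library = rng.integers(0, 2, N_PLAN).astype(float)

def contaminated_plan(target, intruder, weight=0.8):
    """The input pattern is mostly the target word's pattern, with a
    minority contribution from another planned word (the 80%/20%
    mixture suggested in the text)."""
    return weight * target + (1 - weight) * intruder

plan_in = contaminated_plan(plan_york, plan_library)
```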
So, when /l/ replaces /y/ to create lork, this replacement would not cause, in the model, a corresponding tendency for the /y/ to replace the /l/ in library to create the exchange lork yibrary. In the frame-based models, anticipations (lork) do cause corresponding perseverations (yibrary), because the replacing sound, /l/, has been "used" and is inhibited or removed from the pool of segments to be inserted, whereas the replaced sound, /y/, has not (Dell, 1986; MacKay, 1987; Reich, 1977; Shattuck-Hufnagel, 1979; Stemberger, 1985). Although these kinds of mechanisms can be implemented in networks by "competitive queuing" (e.g., Burgess & Hitch, 1992; Houghton, 1990), there is no such mechanism in our model. In this respect, the frame-based models enjoy the advantage.

Another challenge to our model arises from facts about movement errors when consonant clusters are involved. Stemberger (1990) showed that consonants are sometimes erroneously moved to form consonant clusters, and that this effect is enhanced if the moving consonant was, itself, part of a cluster. For example, big glass → blig glass would be more likely than big lass → blig lass, suggesting that the moving /l/ "knows" that it is part of a cluster. Because consonant clusterhood is an abstract property of a word's shape, one can explain the effect by proposing that frame characteristics, such as the fact that the word begins with a cluster, exist separately from the segmental content of the word. Stemberger's (1990) model and other frame-based models (e.g., MacKay, 1987) do, in fact, have a unit representing an abstract initial consonant cluster. This unit is activated when a word with a cluster is planned, and this activation can trigger the erroneous production of a cluster in a surrounding word. Thus, these effects must also be considered frame constraints, error effects that suggest the action of frames. We do not know whether our model would produce these cluster movement effects were it to be amended so that planned words could interfere with one another. However, there is reason to be optimistic because of the model's sensitivity to similarity and context. The /l/ in glass follows a stop consonant, and so context units would be more likely to signal for it to be produced after the /b/ in big than if the /l/ came from lass. Specifically, the model that we envisage predicts that blig would be most likely to occur in the context big blast (identical initial consonants and, hence, very similar network states after the /b/ in big and the /b/ in blast), somewhat less likely in big glass (/b/ and /g/ are similar consonants), less likely in big slap (dissimilar consonants), and least likely in big lass (no consonant before the /l/). Stemberger's (1990) data suggest that big blast > big glass = big slap > big lass and, hence, the predicted difference between similar and dissimilar consonantal effects did not show up in the data. Given the absence of an implemented model of these effects, however, it remains to be seen whether the predicted pattern deviates significantly from the data.
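The shared-context idea behind these movement-error predictions can be illustrated with a small sketch. This is our own toy illustration of the stated principle, not part of the authors' implemented model; the function name and example strings are hypothetical.

```python
def shared_context_positions(word1, word2):
    """Return the positions at which the two words' left contexts are
    identical, i.e., points where context units could not distinguish
    the words. Position 0 always qualifies: every word begins after the
    null segment, so word-initial consonants share their context."""
    return [i for i in range(min(len(word1), len(word2)))
            if word1[:i] == word2[:i]]

# Word-initial slips are always possible (position 0), and words that
# share an onset expose additional vulnerable positions.
print(shared_context_positions("york", "library"))  # [0]
print(shared_context_positions("pick", "pit"))      # [0, 1, 2]
```

On this view, the longer the stretch of shared left context, the later into the two words an erroneous substitution remains likely, which is the clustering of movement errors "in similar positions and after similar sounds" described above.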
In summary, we cannot claim that the PDP model is superior to the frame-based models. It lacks development in some areas. However, it gives a parsimonious account of a number of effects, and it may have the potential for development to handle other effects.

Is the Model a Theory of Learning?

One area in which the PDP model may prove to be better than existing models is in the consideration of learning and practice effects. Can the model's learning mechanism be taken as a theory of how the pronunciation of words is learned? First of all, we believe the model should not be credited or blamed for the extent to which its learning assumptions match those of neural systems. It is quite removed from this level. At the same time, though, this kind of model should make predictions from its learning and representational assumptions about change as a function of experience. We feel that such predictions can, at this point, be most easily made by examining learning and priming effects in the mature system rather than developmental effects. Making predictions about phonological development may be premature because the model would have to have an accurate characterization of the initial state of the child and the specifics of the child's experience. Although knowledge in this area has advanced (e.g., Menn, 1983; Jusczyk, Bertoncini, Bijeljac-Babic, Kennedy, & Mehler, 1990; Werker & Lalonde, 1988), a theorist would, nonetheless, have a difficult time deciding how to set up a network that encodes the child's initial state of phonological knowledge and a training regimen that matches the child's early experience. The learning aspect of the model may, therefore, best be tested by looking at changes in the mature system under controlled training conditions. For example, one of the model's most striking predictions is that error adherence to phonotactic constraints is related to experience with the vocabulary. This suggests that recent experience can modify such adherence. An experiment by Meyer and Dell (1993) is relevant. They gave subjects sequences of Dutch monosyllables to produce and recorded the errors. As expected, the errors were sensitive to the phonotactic constraints of Dutch, specifically that /h/ is only syllable-initial and /ŋ/ is only syllable-final. Legal errors producing the segments /h/ and /ŋ/ occurred 352 times, and illegal ones (e.g., having a syllable begin with /ŋ/ or end with /h/) occurred only 8 times. However, Meyer and Dell also showed that subjects' errors were sensitive to a local phonotactic "rule" present only in the experimental stimuli, not in the Dutch language. In the stimuli, /s/ was only syllable-initial and /f/ was only syllable-final. Of the errors involving the production of /s/ and /f/, 234 were "legal" and only 16 were "illegal," producing a final /s/ or an initial /f/. The sensitivity to recent experience was, therefore, almost as strong as that to the actual Dutch rule involving /h/ and /ŋ/. This is exactly what would happen if the PDP model were exposed to the experimental stimuli.
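A quick calculation on the reported counts shows just how close the two adherence rates are. This is simple arithmetic on the figures given above, not an analysis from the original study.

```python
# Error counts reported for Meyer and Dell (1993).
dutch_legal, dutch_illegal = 352, 8    # real Dutch constraint on /h/ and the velar nasal
local_legal, local_illegal = 234, 16   # experiment-local constraint on /s/ and /f/

dutch_rate = dutch_legal / (dutch_legal + dutch_illegal)
local_rate = local_legal / (local_legal + local_illegal)

print(f"Dutch constraint obeyed in {dutch_rate:.1%} of relevant errors")  # 97.8%
print(f"Local constraint obeyed in {local_rate:.1%} of relevant errors")  # 93.6%
```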
It would learn to be sensitive to the distributional properties of the stimuli, and this would show up in its errors.

Another example of learning in the mature system that could be examined in the light of the model is short-term priming effects, the kind of effects discussed earlier in the context of reduplication. When the model correctly produces a word, it nonetheless makes small adjustments to connection weights in response to the deviations of each output unit from the target values of 1's and 0's. These adjustments have consequences for the production of subsequent words. In general, the model would find it easy to produce the same word later, and this benefit would be greatest if the repetition were immediate. Specifically, repeating a pattern such as "pick pick pick..." would be easier than "pick pick ton ton pick pick...," which, in turn, would be easier than "pick ton pick ton...," which would be easier than "pick ton tick pun pick ton tick pun...." This behavior of the model appears to match that of subjects (Sevald & Dell, 1990).

These two applications of the model's learning assumptions to data serve to illustrate the possibility of testing PDP accounts of learning. In general, we suggest that modelling changes in mature systems will, at least initially, prove more fruitful than more ambitious attempts to handle developmental effects. The success of PDP learning models of data from controlled experimental paradigms (e.g., Cleeremans & McClelland, 1991; Cohen, Dunbar, & McClelland, 1990; Gluck & Bower, 1988) is clear, whereas applications to developmental data (e.g., Rumelhart & McClelland, 1986) may continue to be more difficult because of uncertainty about the initial state and early experience.

[Footnote] One should not think that this means that the model (or the subjects!) would then have difficulty speaking normal Dutch after the experiment. The model's learning should be tied to the specific context of the experiment, a context that would presumably be coded by other input nodes. PDP models need to develop mechanisms that insulate learning from severe retroactive interference (e.g., McCloskey & Cohen, 1989; Ratcliff, 1990; Sloman & Rumelhart, 1992).

CONCLUSIONS: RULES AND FRAMES IN PSYCHOLINGUISTIC MODELS

It is important to distinguish between what our model shows (that certain speech-error effects attributed to separate rules and frames can be accounted for otherwise) and the general claim that separate rules and frames are not required in accounts of language performance. We have not addressed any of the data that motivate rules and frames in morphology and syntax. Nor have we addressed phonological facts associated with stress, cliticization, and resyllabification, facts which are standardly accounted for by restructuring phonological frames for single words and which, it can be argued, suggest the need for such frames in production (Levelt, 1989). Consequently, we must be content with making a more limited claim: The PDP model can be taken as a demonstration that sequential biases, similarity, and the structure of the vocabulary are powerful principles in explaining error patterns. In fact, we take the model a bit further and argue that the general frame constraints may simply reflect similarity and sequential effects and not phonological frames, at least not directly. Moreover, we interpret the model's success in its domain as evidence that there is something right about it, that the "correct" model would make use of learning and distributed representations in such a way that phonological speech errors result from the simultaneous influence of all the words stored in the system, in addition to the set of words in the intended utterance.

[Footnote] We verified this claim with one of the small models that had been trained on the frequent vocabulary until it made no errors. Sixteen monosyllabic words from Sevald and Dell (1990) were then added to its training vocabulary, and training continued until all words were produced correctly. Then various patterns of these words (identical words, AAAA...; repeated words, AABBAA...; alternating words, ABABAB...; and repetition at lag four, ABCDABCD...) were produced, and the ease of production, as measured by the summed error score (the Euclidean distance between output and correct vectors, summed over each intended segment), was assessed.
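The summed error score described in this footnote can be sketched in a few lines. The vectors and values below are hypothetical; the paper reports the measure but not an implementation.

```python
import math

def summed_error(outputs, targets):
    """Sum, over intended segments, of the Euclidean distance between
    the network's output vector and the correct feature vector."""
    return sum(math.dist(out, tgt) for out, tgt in zip(outputs, targets))

# Hypothetical 3-feature targets for a two-segment word: output closer
# to the targets yields a lower score, i.e., "easier" production.
targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
clean = [[0.9, 0.1, 0.9], [0.1, 0.9, 0.1]]
noisy = [[0.6, 0.4, 0.7], [0.4, 0.6, 0.5]]

assert summed_error(clean, targets) < summed_error(noisy, targets)
```

On this measure, a perfect production scores 0, and the score grows with each output unit's deviation from its target, so repetition priming shows up directly as a drop in the score for recently produced words.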


But can we take the model's success as evidence that analogous models, which deal with patterns of words instead of patterns of sound, would also work as psycholinguistic models? Although some models have demonstrated remarkable syntactic knowledge (e.g., Elman, 1989, 1990), there is reason to be cautious, if not downright skeptical. The creativity associated with the phonology of single words is trivial compared to that of the syntactic system. Separately stored rules, frames, principles, and the like are effective mechanisms for achieving creativity in a systematic fashion, and it is far from clear that simple networks of the type studied here can do the job (Fodor & Pylyshyn, 1988; Kim, Pinker, Prince, & Prasada, 1991). Nonetheless, we believe that there may be many performance phenomena, including syntactic phenomena, whose explanation lies in the experienced set of items and the mechanisms for their storage. At the same time, there may be other effects best explained by structural knowledge that has been separated out from linguistic content. A worthy goal of psycholinguistic research is to determine which is which.

REFERENCES

Allen, R.B. (1990). Connectionist language users. Connection Science, 4, 279-311.
Baars, B., Motley, M., & MacKay, D. (1975). Output editing for lexical status in artificially elicited slips of the tongue. Journal of Verbal Learning and Verbal Behavior, 14, 382-391.
Berg, T. (1988). Die Abbildung des Sprachproduktionsprozesses in einem Aktivationsflussmodell [The representation of the speech production process in a spreading activation model]. Tübingen: Max Niemeyer.
Berg, T. (1991). Phonological processing in a syllable-timed language with pre-final stress: Evidence from Spanish speech error data. Language and Cognitive Processes, 6, 265-301.

Bock, J.K. (1982). Toward a cognitive psychology of syntax: Information processing contributions to sentence formulation. Psychological Review, 89, 1-47.
Bock, J.K. (1986). Meaning, sound, and syntax: Lexical priming in sentence production. Journal of Experimental Psychology: Learning, Memory, & Cognition, 12, 575-586.
Burgess, N., & Hitch, G. (1992). Toward a network model of the articulatory loop. Journal of Memory and Language, 31, 429-460.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper & Row.
Cleeremans, A., & McClelland, J.L. (1991). Learning the structure of event sequences. Journal of Experimental Psychology: General, 120, 235-253.
Clements, G.N., & Keyser, S.J. (1983). CV phonology: A generative theory of the syllable. Cambridge, MA: MIT Press.
Cohen, J.D., Dunbar, K., & McClelland, J.L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332-364.
Cottrell, G.W. (1989). A connectionist approach to word-sense disambiguation. London: Pitman.
Crompton, A. (1981). Syllables and segments in speech production. Linguistics, 19, 663-716.
Davis, S. (1989). Cross-vowel phonotactic constraints. Computational Linguistics, 15, 109-110.
Dell, G.S. (1984). The representation of serial order in speech: Evidence from the repeated phoneme effect in speech errors. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 222-233.


Dell, G.S. (1985). Positive feedback in hierarchical connectionist models: Applications to language production. Cognitive Science, 9, 3-24.
Dell, G.S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283-321.
Dell, G.S. (1988). The retrieval of phonological forms in production: Tests of predictions from a connectionist model. Journal of Memory and Language, 27, 124-142.
Dell, G.S. (1990). Effects of frequency and vocabulary type on phonological speech errors. Language and Cognitive Processes, 5, 313-349.
Dell, G.S., & Reich, P.A. (1981). Stages in sentence production: An analysis of speech error data. Journal of Verbal Learning and Verbal Behavior, 20, 611-629.
Eikmeyer, H.-J., & Schade, U. (1990). Sequentialization in connectionist language-production models. Cognitive Systems, 3, 128-138.
Elman, J.L. (1989). Structured representations and connectionist models. Proceedings of the 11th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Elman, J.L., & McClelland, J.L. (1984). The interactive activation model of speech perception. In N. Lass (Ed.), Speech and language. New York: Academic.
Fodor, J.A., & Pylyshyn, Z.W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.
Fowler, C.A. (1987). Consonant-vowel cohesiveness in speech production as revealed by initial and final consonant exchanges. Speech Communication, 6, 231-244.
Fromkin, V.A. (1971). The nonanomalous nature of anomalous utterances. Language, 47, 27-52.
Garnham, A., Shillcock, R.C., Brown, G.D.A., Mill, A.I.D., & Cutler, A. (1981). Slips of the tongue in the London-Lund corpus of spontaneous conversation. Linguistics, 19, 805-817.
Garrett, M.F. (1975). The analysis of sentence production. In G.H. Bower (Ed.), The psychology of learning and motivation (pp. 133-177). New York: Academic.
Garrett, M.F. (1980). Levels of processing in sentence production. In B. Butterworth (Ed.), Language production (Vol. 1, pp. 177-220). London: Academic.
Gasser, M., & Lee, C.-D. (1990). Networks that learn about phonological feature persistence. Connection Science, 2, 265-278.
Gluck, M.A., & Bower, G.H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.
Goldsmith, J. (1991). Phonology as an intelligent system. In D.J. Napoli & J. Kegl (Eds.), Bridges between psychology and linguistics: A Swarthmore Festschrift for Lila Gleitman. Hillsdale, NJ: Erlbaum.
Hare, M. (1990). The role of similarity in Hungarian vowel harmony: A connectionist account. Connection Science, 2, 125-152.
Hare, M., Corina, D., & Cottrell, G. (1990). A connectionist perspective on prosodic structure. Proceedings of the Annual Meeting of the Berkeley Linguistics Society, 15. Berkeley, CA: University of California Press.
Hirst, G. (1987). Semantic interpretation and the resolution of ambiguity. Cambridge: Cambridge University Press.
Hotopf, W.H.N. (1983). Lexical slips of the pen and tongue. In B. Butterworth (Ed.), Language production (Vol. 2). San Diego, CA: Academic.
Houghton, G. (1990). The problem of serial order: A neural network model of sequence learning and recall. In R. Dale, C. Mellish, & M. Zock (Eds.), Current research in natural language generation (pp. 287-319). London: Academic.
Jordan, M.I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (pp. 531-546). Hillsdale, NJ: Erlbaum.


Jusczyk, P.W., Bertoncini, J., Bijeljac-Babic, R., Kennedy, L.J., & Mehler, J. (1990). The role of attention in speech perception by infants. Cognitive Development, 5, 265-286.
Kawamoto, A.H. (1988). Distributed representations of ambiguous words and their resolution in a connectionist network. In S. Small, G. Cottrell, & M. Tanenhaus (Eds.), Lexical ambiguity resolution (pp. 195-228). San Mateo, CA: Morgan Kaufmann.
Kempen, G., & Hoenkamp, E. (1987). An incremental procedural grammar for sentence formulation. Cognitive Science, 11, 201-258.
Kim, J.J., Pinker, S., Prince, A., & Prasada, S. (1991). Why no mere mortal has ever flown out to center field. Cognitive Science, 15, 173-218.
Kucera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lachter, J., & Bever, T.G. (1988). The relation between linguistic structure and associative theories of language learning: A constructive critique of some connectionist learning models. Cognition, 28, 195-247.
Lakoff, G. (1988, December). Cognitive phonology. Paper presented at the annual meeting of the Linguistic Society of America, Washington, DC.
Lapointe, S. (1985). A theory of verb form use in the speech of agrammatic aphasics. Brain and Language, 24, 100-155.
Lashley, K.S. (1951). The problem of serial order in behavior. In L.A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.
Lathroum, A. (1989). Feature encoding by neural nets. Phonology, 6, 305-316.
Levelt, W.J.M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41-104.
Levelt, W.J.M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levitt, A.G., & Healy, A.F. (1985). The roles of phoneme frequency, similarity, and availability in the experimental elicitation of speech errors. Journal of Memory and Language, 24, 717-733.
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8, 249-336.

MacKay, D.G. (1970). Spoonerisms: The structure of errors in the serial order of speech. Neuropsychologia, 8, 323-350.
MacKay, D.G. (1972). The structure of words and syllables: Evidence from errors in speech. Cognitive Psychology, 3, 210-227.
MacKay, D.G. (1974). Aspects of the syntax of speech: Syllable structure and speech rate. Quarterly Journal of Experimental Psychology, 26, 642-657.
MacKay, D.G. (1978). Speech errors inside the syllable. In A. Bell & J. Hooper (Eds.), Syllables and segments. New York: North-Holland.
MacKay, D.G. (1982). The problems of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychological Review, 89, 483-506.
MacKay, D.G. (1987). The organization of perception and action: A theory of language and other cognitive skills. New York: Springer.
MacWhinney, B., Leinbach, J., Taraban, R., & McDonald, J. (1989). Language learning: Cues or rules? Journal of Memory and Language, 28, 255-277.
Marantz, A. (1982). Re reduplication. Linguistic Inquiry, 13, 435-482.
McClelland, J.L., & Kawamoto, A.H. (1986). Mechanisms of sentence processing: Assigning roles to constituents. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2). Cambridge, MA: Bradford Books.
McClelland, J.L., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed processing approach. Language and Cognitive Processes, 4, 287-335.
McCloskey, M., & Cohen, N.J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In G.H. Bower (Ed.), Psychology of learning and motivation (Vol. 24, pp. 109-164). New York: Academic.


Menn, L. (1983). Development of speaking skills. In B. Butterworth (Ed.), Language production (Vol. 2). London: Academic.
Meringer, R., & Mayer, K. (1895). Versprechen und Verlesen [Slips of the tongue and errors in reading]. Stuttgart: Göschensche.
Meyer, A.S. (1990). The phonological encoding of successive syllables. Journal of Memory and Language, 29, 524-545.
Meyer, A.S. (1991). The time course of phonological encoding in language production: Phonological encoding inside a syllable. Journal of Memory and Language, 30, 69-89.
Meyer, A.S., & Dell, G.S. (1993). An experimental analysis of positional and phonotactic constraints on phonological speech errors. Manuscript in preparation.
Nooteboom, S.G. (1969). The tongue slips into patterns. In A.G. Sciarone, A.J. van Essen, & A.A. Van Raad (Eds.), Leyden studies in linguistics and phonetics. The Hague: Mouton.
Pierrehumbert, J. (in press). Syllable structure and word structure: A study of triconsonantal clusters in English. In P. Keating & B. Hayes (Eds.), Papers in laboratory phonology (Vol. 3). Cambridge: Cambridge University Press.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-194.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43-102.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.
Reich, P.A. (1977). Evidence for a stratal boundary from slips of the tongue. Forum Linguisticum, 2, 211-217.
Rosenbaum, D.A. (1990). Human motor control. San Diego, CA: Academic.
Rumelhart, D.E., & McClelland, J.L. (1986). On learning the past tenses of English verbs. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 2, pp. 216-271). Cambridge, MA: Bradford Books.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Schwartz, M.F., Saffran, E.M., & Dell, G.S. (1990, October). Comparing speech error patterns in normals and jargon aphasics: Methodological issues and theoretical implications. Paper presented at the Academy of Aphasia, Baltimore, MD.
Seidenberg, M., & McClelland, J.L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Sejnowski, T.J., & Rosenberg, C.R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J.L. (1989). Learning and sequential structure in simple recurrent networks. In D. Touretzky (Ed.), Advances in neural information processing systems (Vol. 2). San Mateo, CA: Morgan Kaufmann.
Sevald, C., & Dell, G.S. (1990, May). Parameter remapping in speech production. Paper presented at the Midwestern Psychological Association, Chicago, IL.
Shattuck-Hufnagel, S. (1979). Speech errors as evidence for a serial-ordering mechanism in sentence production. In W.E. Cooper & E.C.T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, NJ: Erlbaum.
Shattuck-Hufnagel, S. (1983). Sublexical units and suprasegmental structure in speech production planning. In P.F. MacNeilage (Ed.), The production of speech (pp. 109-136). New York: Springer-Verlag.
Shattuck-Hufnagel, S. (1986). The representation of phonological information during speech production planning: Evidence from vowel errors in spontaneous speech. Phonology Yearbook, 3, 117-149.


Shattuck-Hufnagel, S. (1987). The role of word-onset consonants in speech production planning: New evidence from speech error patterns. In E. Keller & M. Gopnik (Eds.), Motor and sensory processes of language. Hillsdale, NJ: Erlbaum.
Shattuck-Hufnagel, S., & Klatt, D. (1979). The limited use of distinctive features and markedness in speech production: Evidence from speech error data. Journal of Verbal Learning and Verbal Behavior, 18, 41-55.
Sloman, S., & Rumelhart, D.E. (1992). Reducing interference in distributed memory through episodic gating. In A.F. Healy, S.M. Kosslyn, & R.M. Shiffrin (Eds.), From learning theory to cognitive processes: Essays in honor of W.K. Estes. Hillsdale, NJ: Erlbaum.
Stemberger, J.P. (1982). The lexicon in a model of language production. Unpublished doctoral dissertation, University of California, San Diego.
Stemberger, J.P. (1983a). Speech errors and theoretical phonology: A review. Bloomington: Indiana University Linguistics Club.
Stemberger, J.P. (1983b). The nature of /r/ and /l/ in English: Evidence from speech errors. Journal of Phonetics, 11, 139-147.
Stemberger, J.P. (1985). An interactive activation model of language production. In A. Ellis (Ed.), Progress in the psychology of language (Vol. 1). London: Erlbaum.
Stemberger, J.P. (1990). Word shape errors in language production. Cognition, 35, 123-157.
Stemberger, J.P. (1991). Apparent antifrequency effects in language production: The addition bias and phonological underspecification. Journal of Memory and Language, 30, 161-185.
St. John, M.F., & McClelland, J.L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46, 217-257.
Touretzky, D.S., & Wheeler, D.W. (1990). A computational basis for phonology. In D.S. Touretzky (Ed.), Advances in neural information processing systems (Vol. 2). San Mateo, CA: Morgan Kaufmann.
Treiman, R. (1983). The structure of spoken syllables: Evidence from novel word games. Cognition, 15, 49-74.
Van Orden, G.C., Pennington, B.F., & Stone, G.O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.
Waltz, D.L., & Pollack, J.B. (1985). Massively parallel parsing: A strongly interactive model of natural language interpretation. Cognitive Science, 9, 51-74.
Wells, R. (1951). Predicting slips of the tongue. Yale Scientific Magazine, 3, 9-30.
Werker, J.F., & Lalonde, C.E. (1988). Cross-language speech perception: Initial capabilities and developmental change. Developmental Psychology, 24, 672-685.
Yaniv, I., Meyer, D.E., Gordon, P.C., Huff, C.A., & Sevald, C.A. (1990). Vowel similarity, connectionist models, and syllable structure in motor programming of speech. Journal of Memory and Language, 29, 1-26.

APPENDIX A:
FEATURE CODING FOR ENGLISH PHONOLOGICAL SEGMENTS

p={0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0};
b={0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0};
m={0,1,1,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0};
mS={1,1,1,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0}; /* SYLLABIC m */
t={0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0};
d={0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0};
n={0,1,1,1,0,1,0,0,0,0,0,1,1,0,0,0,0,0};
nS={1,1,1,1,0,1,0,0,0,0,0,1,1,0,0,0,0,0}; /* SYLLABIC n */
k={0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0};
g={0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0};
ŋ={0,1,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0};
f={0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0};
v={0,1,0,1,1,0,1,0,0,0,1,0,1,0,0,0,0,0};
s={0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,0,0};
z={0,1,0,1,1,0,1,0,0,0,0,1,1,0,0,0,0,0};
θ={0,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0}; /* NOT VOICED th */
ð={0,1,0,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0}; /* VOICED th */
š={0,1,0,0,1,0,1,0,1,0,0,1,0,1,0,0,0,0};
ž={0,1,0,1,1,0,1,0,1,0,0,1,0,1,0,0,0,0};
č={0,1,0,0,0,0,1,0,1,1,0,1,0,1,0,0,0,0};
ǰ={0,1,0,1,0,0,1,0,1,1,0,1,0,1,0,0,0,0};
l={0,1,1,1,1,0,0,1,0,0,0,1,1,0,0,0,0,0};
lS={1,1,1,1,1,0,0,1,0,0,0,1,1,0,0,0,0,0}; /* SYLLABIC l */
r={0,1,1,1,1,0,0,0,0,0,0,1,1,0,0,1,0,0};
rS={1,1,1,1,1,0,0,0,0,0,0,1,1,0,0,1,0,0}; /* SYLLABIC r */
w={0,0,1,1,1,0,0,0,0,0,1,0,0,1,1,0,0,0};
y={0,0,1,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0};
h={0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0};
i={1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1}; /* as in BEAT */
ɪ={1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0}; /* as in BIT */
e={1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}; /* as in BAIT */
ɛ={1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; /* as in GET */
ə={1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0}; /* as in SOFA */
æ={1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0}; /* as in BAT */
ɑ={1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0}; /* as in HOT */
ɔ={1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0}; /* as in CAUGHT */
ʊ={1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0}; /* as in FOOT */
u={1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1}; /* as in CRUDE */
o={1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1}; /* as in BOAT */

The features are, in order: syllabic, consonantal, sonorant, voiced, continuant, nasal, strident, lateral, distributed, affricate, labial, coronal, anterior, high, back, low, round, tense.
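These codings place phonetically similar segments close together in the model's output space, which is the basis for the similarity effects discussed in the text. The short sketch below is our own illustration (the dictionary holds a few vectors as reconstructed above, and the Hamming distance measure is an assumption, not the paper's code):

```python
# Three 18-feature consonant vectors from Appendix A, plus the vowel of BAT.
features = {
    "p":  [0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0],
    "b":  [0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0],
    "t":  [0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0],
    "ae": [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0],
}

def feature_distance(s1, s2):
    """Number of features on which two segments differ (Hamming distance)."""
    return sum(a != b for a, b in zip(features[s1], features[s2]))

# /p/ and /b/ differ only in voicing; a consonant and a vowel differ widely,
# so consonant-for-consonant substitutions are the "nearby" errors.
print(feature_distance("p", "b"))   # 1
print(feature_distance("p", "ae"))  # 5
```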

APPENDIX B:
CODING FOR FREQUENT VOCABULARY

ago /əgo/
and /ænd/
any /ɛni/
art /ɑrt/
ask /æsk/
bad /bæd/
bed /bɛd/
big /bɪg/
but /bət/
can /kæn/
car /kɑr/
cut /kət/
did /dɪd/
end /ɛnd/
far /fɑr/
for /for/
get /gɛt/
god /gɑd/
got /gɑt/
gun /gən/
had /hæd/
has /hæz/
him /hɪm/
his /hɪz/
hot /hɑt/
its /ɪts/
job /ǰɑb/
let /lɛt/
lot /lɑt/
man /mæn/
men /mɛn/
met /mɛt/
nor /nor/
not /nɑt/
old /old/
one /wən/
put /pʊt/
ran /ræn/
red /rɛd/
run /rən/
sat /sæt/
set /sɛt/
son /sən/
ten /tɛn/
try /trɑy/
use /yuz/
war /wɔr/
was /wəz/
yes /yɛs/
yet /yɛt/