Language Sciences, Volume 10, Number 1, pp. 173-192, 1988. Printed in Great Britain. 0388-0001/88 $3.00 + .00 © 1988 Pergamon Press plc

Paraphrasing: Extending the Data Base for Child Language Research

Jan Vorster
Human Sciences Research Council

ABSTRACT

This paper presents a principled method, paraphrasing, for expanding what young children actually say to what they intended to say. The technique consists of minimally restoring all deviant utterances to well-formedness, taking account of available verbal and nonverbal contextual information. The method is illustrated by application to longitudinal studies of six subjects, three initially at 18 months of age and three initially at 28 months of age. Mean Length of Utterance (MLU) spread the children along a developmental continuum, which served as a baseline for comparison. MLU was significantly correlated with realized and paraphrased frequencies of several linguistic items in the children’s corpora. Similar results were obtained from an association of children’s linguistic development with a number of additional measures. These included coverb deletion and the complexity of linguistic context, deletion of the components of copula constructions and the relative information load of those components, pronoun deletion and information load, and preposition deletion and the semantic distinctions underlying various prepositions. The results seem to justify the claim that paraphrase is a productive method for examining children’s corpora of speech and the linguistic development illustrated by them.

INTRODUCTION

The purpose of this article is to explore the capabilities of a method of language analysis involving the extension of the data base to include not merely what speakers actually said, but also what they ostensibly intended to say even if in fact they failed to do so. Justification for this topic in a volume devoted to current issues in child language is found in the perpetual currency of any methodological considerations aimed at illuminating the language acquisition process. Moreover, in exploring the capabilities of the paraphrase technique, the following issues are addressed:

(1) The relation between mean length of utterance and other syntactic and semantic measures.
(2) The relation between complexity and performance constraints.
(3) The relation between information load and performance constraints.
(4) The notion that a priori knowledge of certain concepts precedes the acquisition of the corresponding words.

Theoretical Orientation

Approaches to descriptions of child language vary in terms of a variety of factors, prominent among which are the purpose of the description and the theoretical stance of the investigator. Thus a formal models approach aiming to account for the fact of acquisition would have little in common with a developmental approach attempting to describe the course of a particular language’s acquisition; a syntax-oriented description aimed at establishing the explanatory adequacy of a generative grammar would differ widely from a semantically based description seeking to explain language acquisition in terms of cognitive growth and interpersonal interaction. The decade separating Chomsky’s “Syntactic Structures” (1957) from Fillmore’s “The Case for Case” (1968) was dominated by the notion of a syntactic underlying structure of the sentence, resulting in attempts to characterize children’s developing language by means of transformational grammars (cf. McNeill 1966; Bloom 1970). The swing in linguistic theory away from underlying structures comprising syntactic categories towards underlying structures comprising semantic categories was echoed in child language research during the 1970’s (cf. Schlesinger 1971; Bowerman 1973; Brown 1973; Van der Geest et al. 1973; Wells 1974; Greenfield and Smith 1976). Although working in the tradition of descriptions of linguistic development based on transformational grammars, it was Bloom (1970) who convincingly disposed of the notion that the child’s acquisition of language can be penetrated by paying attention only to surface aspects of utterances, i.e. by ignoring, or denying the relevance of, semantic intent and context. Schlesinger (1971) advanced this line of thought by proposing a language acquisition model based specifically on speaker intentions; and speaker intentions are central both in the work of Greenfield and Smith (1976) and in the present investigation.
By paraphrasing utterances with the purpose of penetrating to the speaker’s semantic intent, one is taking to its logical conclusion Brown’s (1973) advocacy of a “rich” interpretation of child language.

Paraphrasing

The paraphrase technique consists of minimally restoring all deviant utterances to well-formedness, taking account of available verbal and nonverbal contextual information. This procedure enables one to distinguish systematically between the speaker’s semantic intent, or message, and the realization, or code. Paraphrasing was used by Van der Geest et al. (1973) to compare the speech of children from three social classes, and by Snow et al. (1976) to compare the effect of social class on mothers’ child-directed speech. In the present study it is used as a device for the longitudinal monitoring of children’s language development. A likely criticism of the technique is that one may be analyzing adult interpretations rather than children’s intended meanings. Yet, as Wells (1974) points out, even for adult speech “in the last resort it is not possible to know the intended meaning of an utterance: the listener forms the best possible estimate on the basis of all the cues available - perceived speech signal, linguistic context, situation, etc - and responds, or interprets, on the basis of this estimate” (Wells 1974:257). So, while it is not possible to say with absolute certainty what the child’s semantic intent in every case was, it can be said with certainty that the paraphrase represents what an adult would have said had he and the child had the same intent.
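The distinction between message and code can be pictured as a small data structure. The following is a minimal sketch, not the coding scheme actually used in the study: the class names, the English glosses and the two example utterances are hypothetical, and the four status labels are borrowed from the word-status categories described under Coding (normal realization, deletion, substitution, addition). It computes a word-based MLU for the realized and for the paraphrased (extended) version of the same utterances.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REALIZED = "normal realization"   # spoken by the child and kept in the paraphrase
    DELETED = "deletion"              # not spoken, supplied in the paraphrase
    SUBSTITUTED = "substitution"      # an inappropriate word was spoken in its place
    ADDED = "addition"                # redundantly spoken, absent from the paraphrase


@dataclass
class CodedWord:
    form: str       # hypothetical English gloss of the word
    status: Status


# Which statuses count toward each version of an utterance (an assumption of this sketch).
SPOKEN = {Status.REALIZED, Status.SUBSTITUTED, Status.ADDED}
IN_PARAPHRASE = {Status.REALIZED, Status.DELETED, Status.SUBSTITUTED}


def length(utterance, statuses):
    """Number of words in the utterance whose status is in `statuses`."""
    return sum(1 for word in utterance if word.status in statuses)


def mlu(utterances, statuses):
    """Word-based mean length of utterance over the chosen version."""
    return sum(length(u, statuses) for u in utterances) / len(utterances)


# Hypothetical examples: "doggy garden" paraphrased as "the doggy is in the garden",
# and "me want that" paraphrased as "I want that".
utterances = [
    [CodedWord("the", Status.DELETED), CodedWord("doggy", Status.REALIZED),
     CodedWord("is", Status.DELETED), CodedWord("in", Status.DELETED),
     CodedWord("the", Status.DELETED), CodedWord("garden", Status.REALIZED)],
    [CodedWord("I", Status.SUBSTITUTED), CodedWord("want", Status.REALIZED),
     CodedWord("that", Status.REALIZED)],
]

realized_mlu = mlu(utterances, SPOKEN)
extended_mlu = mlu(utterances, IN_PARAPHRASE)
print(realized_mlu, extended_mlu)
```

Keeping the status on each word, rather than storing two separate strings, is what makes it possible to ask later which elements were deleted and in what contexts.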

Despite all the obvious differences between early child speech and adult speech, it is an observable fact that mothers understand children’s early utterances. This observation leads to the first hypothesis, i.e. that children and adults express the way they see the world in essentially similar ways. If this is true, then we may predict that differences between child and adult speech would in an essential way be reducible to the non-realization by children of low-information elements, and language development would be describable in terms of the narrowing, over time, of the gap between child and adult speech.

A second observation, namely that different children’s language development tends to proceed at different rates, leads to the hypothesis that an effective descriptive framework should show up whatever differences in linguistic development there may be between two age-equivalent children. If this is true, then we may predict that with mean length of utterance (MLU) as criterion it would be possible to rank a number of age-equivalent children in a canonical order from the least advanced to the most advanced child. Against the backdrop of this canonical MLU-based order, children’s performances on other measures - disturbing or preserving the order - can be assessed. In exploring the capabilities of the paraphrase technique, we shall be testing the predictions based on the above hypotheses.


METHOD

Subjects

The corpus used for the present analyses was obtained from six subjects, divided into two age-homogeneous cohorts. The first cohort comprised two boys and a girl, all of whom were 18 months old when regular fortnightly sampling started. The second cohort comprised two girls and a boy, with an initial age of 28 months and a sampling interval of three weeks. In this way the age range from 18 to 40 months was covered. For the present analyses a lower MLU limit of 1.5 and an upper limit of 5.0 were set. The least advanced member of the younger cohort passed the 1.5 MLU mark at 23 months and the most advanced member of the older cohort passed the 5.0 MLU mark at 35 months and two weeks. The data therefore cover the age range from 23 through to 35 months, with a one-sample overlap between cohorts at 28 months, and an MLU range of 1.7 through 5.3. Following the “classical” design, the subjects were all first-born children of academic middle-class parents. All the parents were native speakers of Afrikaans and all the children had their mothers as sole caretakers. The sex distribution of the subjects was fortuitous. On the MLU criterion the only girl in the younger cohort consistently lagged behind her two male peers, while the only boy in the older cohort consistently lagged behind his two female peers. Alphabetical pseudonyms are used, “Anna” having the highest and “Freda” the lowest MLU (cf. Figure 1).

[Figure 1. Data Points and Mean MLU for Each Child’s Corpus. Horizontal axis: Freda, Eric, Deon, Chris, Betsy, Anna.]

Sampling

The standard sample size was one side of a C60 audio cassette, i.e. half an hour per sample, recorded on a Sony TC 55 battery operated, integrated microphone, portable tape recorder. Each sample was divided equally among two or three recording situations such as looking at pictures in books and magazines, playing with familiar toys, drawing, coloring and pasting, and household routines. Each sample was transcribed, usually by the mother, after which the transcript was checked and segmented into numbered terminable units (henceforth “utterances”), and typed out for analysis.

Coding

From alternate samples, 100 utterances per sample that met certain criteria aimed at avoiding attenuation of the data were analyzed. Excluded from analysis were, inter alia, solitary vocatives, attention-getters, expressions of assent or dissent, requests for repetition, politeness terms and greetings. Brown’s (1973:54) rules for calculating MLU were used, but since Afrikaans is a highly analytical language in which a word count would capture virtually all the corresponding bound forms of an inflectional language, the word - in preference to the morpheme - was used as base, without prejudicing the comparability of these data to morpheme-based data in the literature. All the words in the paraphrased version of each utterance were extensively coded for a wide variety of aspects such as word class, person, number, function, animacy, transitivity, etc. Finally, each word was coded for its status vis-à-vis the paraphrase, in terms of normal realization, deletion, substitution and addition. Thus, in addition to all other relevant information, it was also readily retrievable whether the word was actually spoken by the child or supplied in the paraphrase or substituted for an inappropriate word or redundantly spoken.

RESULTS

Relation Between MLU and Other Measures

The data support the hypothesis that, using MLU as criterion, individual differences would spread the children fairly evenly along a continuum from the least advanced to the most advanced child. Figure 1 shows the mean MLU for each child’s corpus, as well as the data points contributing to each mean. It can be seen that each mean derives from a cohesive set of data, and that not only the means, but also each set of data as a whole, underscore the canonical order. The question now arises whether there is any relation between the canonical ranking, based on realized MLU, and rankings based on the realized as well as paraphrased frequencies of other elements in the children’s corpora. The total frequencies (paraphrased and realized) of coverbs, prepositions and copulas, as well as the realized percentages are given in Table 1. Correlations varying in degree of significance obtain between the canonical ranking and five of the six rankings, higher correlations in most cases being precluded by a mismatch between the canonical ranking on the one hand and the figures for Eric and Freda on the other.³ However, the overall association between the six sets of data is significant (Kendall’s W = 0.759, p < 0.01).

Table 1. Frequencies of Coverbs, Prepositions and Copulas in Each Child’s Extended Corpus, and Percentages Realized in Actual Speech

              Coverbs            Prepositions         Copulas
          Total   % Real      Total   % Real      Total   % Real
Anna       290    91.72        200    96.08        171    92.98
Betsy      203    90.15        143    83.92        196    85.71
Chris      191    76.96         90    78.89        274    71.17
Deon       133    33.83         95    48.48        188    56.38
Eric        91    54.95         64    47.69        190    19.47
Freda      107    20.56         74    40.28        215    22.97
rs        .943     .943       .886     1.00      -.371     .943
p <        .02      .02        .05      .01     p > .1      .02
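The rank correlations in the bottom rows of Table 1 can be retraced with a few lines of code. The sketch below assumes that rs is Spearman's rank correlation between the canonical order (Anna first, Freda last) and the descending rank of each column; the function and variable names are illustrative only. Under that assumption the computed values agree with the rs row of the table.

```python
# Columns of Table 1, listed in canonical MLU order: Anna, Betsy, Chris, Deon, Eric, Freda.
TABLE_1 = {
    "coverbs_total":      [290, 203, 191, 133, 91, 107],
    "coverbs_pct_real":   [91.72, 90.15, 76.96, 33.83, 54.95, 20.56],
    "preps_total":        [200, 143, 90, 95, 64, 74],
    "preps_pct_real":     [96.08, 83.92, 78.89, 48.48, 47.69, 40.28],
    "copulas_total":      [171, 196, 274, 188, 190, 215],
    "copulas_pct_real":   [92.98, 85.71, 71.17, 56.38, 19.47, 22.97],
}


def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes no ties, which holds for Table 1."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_vs_canonical(values):
    """Spearman's rs between the canonical order 1..n and the ranks of `values`."""
    n = len(values)
    canonical = range(1, n + 1)
    d_squared = sum((c - r) ** 2 for c, r in zip(canonical, ranks_descending(values)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rs = {name: round(spearman_vs_canonical(column), 3) for name, column in TABLE_1.items()}
for name, value in rs.items():
    print(f"{name:18s} rs = {value:+.3f}")
```

The only column that runs against the canonical order is the total number of copulas, which is the one ranking the text excludes from the significant correlations.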

The association between linguistic development - in terms of MLU - and frequency of deletions,⁴ though interesting in its own right, does not provide information on the factors associated with deletion; nor does it reflect the relative deletability of elements in a construction. It is to these matters that we now turn.

Performance Constraints and Cumulative Complexity

In this section the potential of the paraphrase technique is demonstrated for illuminating the relation between children’s deletions and the complexity of the contexts from which elements are deleted. The term “complexity” is not used in the sense of Brown and Hanlon (1970) to denote the result of an accumulation of transformations. For our purposes complexity results from the introduction of optional elements into a string. To the “ideal speaker-listener” who is “unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors” (Chomsky 1966:3) complexity, in any sense of the word, is of no more than academic interest. To the real-life speaker, however, the above “conditions” translate into performance constraints, i.e. limits to the complexity of the structures he/she can handle. Affected by such conditions to an extreme degree, the language-learning child is initially limited to a one-word output to convey a given semantic intent, then to two words at a time, then to three, and so on.

Much of the present argument is consonant with, and a logical expansion of, the view expressed by Greenfield and Smith (1976:210) that “the development from one- to two-word utterances can be seen as the addition of a second, less informative element to a single-word utterance” (emphasis added). In a counterfactual world where - for the sake of the argument - language acquisition is assumed to occur without the characteristic deletion of obligatory elements, i.e. where from the start children produce only well-formed utterances, their output during the early stages would be severely restricted. In the real world different conditions prevail: Instead of confining themselves to the structures allowed by the performance constraints of the moment, children introduce elements before they can “afford” them. The price they pay for this extravagance is that they have to delete obligatory elements. In this section, using coverbs as a test case, we address the question of whether certain contexts are more likely than others to precipitate deletion.
An elegant contingency would be if we were able to isolate a small set of individual elements, each with a high predictive power for coverb deletion and together accounting for all such deletions. However, the opposite is the case. Although the most advanced children realize far more coverbs than they delete, and the least advanced children do the opposite (cf. Table 1), for each child coverb deletion appears to be quite random. Contrary to expectation, deletions sometimes occur in short, ostensibly simple utterances, and fail to occur in longer, ostensibly complex utterances; nor is there any one element with greater predictive power than any other. A more realistic expectation would therefore seem to be a set of potentially co-occurring, optional, high-frequency, growth-sensitive elements, each member of which would by its presence in an utterance not necessarily precipitate a coverb deletion, but merely increase the risk factor for coverb deletion. The above criteria narrow the field down to prenominal adjectives, adverbs, prepositional phrases and object noun phrases.⁵


To facilitate comparisons between samples, a deletion ratio (DR: the number of complicating elements per deleted coverb) and a realization ratio (RR: the number of complicating elements per realized coverb) were computed for each sample in the corpus. The assumption that heightened complexity will result in increased coverb deletion leads to the following predictions:

(1) In any samples containing coverbs the DR will be greater than the RR.
(2) As a particular child’s performance constraints decrease, greater complexity would be required to precipitate a deletion.

The first prediction is confirmed by the fairly consistent distance maintained over time between the least squares regression lines showing the best-fitting straight lines through the observed data points for DR and RR (cf. Figure 2). The second prediction is confirmed by a regression analysis showing a significant association between age and DR (F = 8.42, p < .01). We may conclude from these findings that coverb deletions are not random, but are associated with the overall complexity of the utterances in which they occur. The optional elements contributing to this complexity are adjectives, adverbs, PPs and object NPs.
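The two ratios and the regression lines can be illustrated as follows. This is a sketch under stated assumptions, not the procedure of the study: it reads DR as the total number of complicating elements in utterances whose coverb was deleted, divided by the number of deleted coverbs (and RR likewise for realized coverbs), and every number in it is invented for illustration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoverbContext:
    complicating: int   # adjectives + adverbs + PPs + object NPs in the utterance
    realized: bool      # True if the coverb was spoken, False if supplied in the paraphrase


def ratios(contexts):
    """Return (DR, RR) for one sample; a ratio is None if its denominator is zero."""
    deleted = [c.complicating for c in contexts if not c.realized]
    realized = [c.complicating for c in contexts if c.realized]
    dr = sum(deleted) / len(deleted) if deleted else None
    rr = sum(realized) / len(realized) if realized else None
    return dr, rr


def least_squares(xs, ys):
    """Slope and intercept of the best-fitting straight line through (x, y) points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# One invented sample: two deleted and three realized coverbs.
sample = [CoverbContext(2, False), CoverbContext(3, False),
          CoverbContext(1, True), CoverbContext(0, True), CoverbContext(2, True)]
dr, rr = ratios(sample)

# Invented DR values for one child at four ages (in months).
ages = [23, 25, 27, 29]
drs = [1.5, 1.8, 2.0, 2.6]
slope, intercept = least_squares(ages, drs)
print(dr, rr, slope)
```

In this toy sample DR exceeds RR, as prediction (1) requires, and a positive slope of DR on age is the pattern prediction (2) describes.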

[Figure 2. Regression Lines Showing the Association between Age, Deletion Ratio (DR) and Realization Ratio (RR). Horizontal axis: age in months.]


The relation between performance constraints and complexity having been established, we turn now to the potential of the paraphrase technique for illuminating the relation between children’s deletions and the information load of deleted elements.

Performance Constraints and Information Load

Verification of the highly predictable relation between deletions and MLU is found in the “% Real” columns in Table 1. It is not this observation as such, but the deletion pattern of the essential components of a copula construction - subject NP, copula, complement - that is of interest. The prediction to be considered in this section is that the copula would have the highest deletability potential, the subject the second highest, and that the complement would have a low deletability. This prediction is based on a consideration of the relative informativeness of the three components.

Copulas

Due to the semantic vacuity of the copula, and its high predictability - and commensurately low information load - in the context of a subject and a complement, copula deletion does not result in information loss. Moreover, copula deletion is not only a regular feature of nonstandard Black English (cf. Labov 1972) but also occurs to some extent in South African English as well as in Afrikaans.

Subjects

In a study of the negative utterances produced by two Afrikaans children between the ages of 18 and 30 months, it was found that the least advanced child deleted 89% and the most advanced one 59% of sentence subjects (Vorster 1982). Yet the reactions of mothers to subject-deleted utterances reveal information loss to be negligible. This is largely due to the fact that in the mother-child discourses observed, the same subject tends to persist over several utterances. Once a subject has been introduced, communication is maintained regardless of whether the child articulates the subject in subsequent utterances. Moreover, due to the highly contextualized nature of these discourses, entities under discussion are almost invariably in the joint attention focus of the dyad, so that the child can introduce a subject, by commenting on it, without actually naming it. In such cases the mother typically names it in her next turn, after which the discourse runs its course.

Complements

By the very nature of the copula construction, it is the complement that typically conveys the “new” information, so that an utterance featuring a complement deletion would only in exceptional circumstances succeed in performing a communicative function. Figure 3 shows to what extent the data verify the predicted deletion pattern: copula deletions > subject deletions > complement deletions. However, the relative ranges involved, i.e. the differences between the children deleting the most and the fewest of each element, are more revealing than the mere verification of the prediction. The range for copula deletion is a massive 73.39 (Anna = 7.14%; Eric = 80.53%). A considerably smaller range of 35.20 is found for subject deletions (Anna = 6.43%; Freda = 41.63%). For complement deletions the range is a mere 11.83 (Anna = 1.29%; Freda = 13.12%). These figures clearly show the interrelationship between information load, deletability of elements and children’s level of linguistic development.⁶
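The ranges quoted above follow directly from the extreme deletion percentages. As a small worked check (the dictionary layout and names are illustrative; the percentages are those given in the text):

```python
# Lowest and highest deletion percentage per component, as quoted in the text.
DELETION_EXTREMES = {
    "copula":     (7.14, 80.53),   # Anna, Eric
    "subject":    (6.43, 41.63),   # Anna, Freda
    "complement": (1.29, 13.12),   # Anna, Freda
}

# Range = highest minus lowest percentage, in percentage points.
ranges = {name: round(high - low, 2) for name, (low, high) in DELETION_EXTREMES.items()}

# Components ordered from widest to narrowest range.
by_range = sorted(ranges, key=ranges.get, reverse=True)
print(ranges, by_range)
```

The ordering of the ranges matches the predicted ordering of deletability: the lower the information load of a component, the more its deletion rate varies with the child's level of development.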

[Figure 3. Percentage Deletion of Essential Components of the Copula Construction. Horizontal axis: copula, subject, complement.]

Pronoun Deletions and the “Nominal Shift”

The relative frequency of pronouns and nouns among the first 50 words of 18 children (mean age 19 months) led Nelson (1975) to classify eight of them as “Expressive” and ten as “Referential”. In a follow-up study she analyzed speech samples of all the children at 24 months, and of 16 of them again at 30 months. Twenty-four samples were selected to provide the four cells at the intersection of two binary parameters - expressive or referential child, and high or low MLU - with six children each. The cut-off points for low and high MLU are 1.0-2.5 and 2.5-4.5, respectively, so that Nelson’s two groups are comparable with our younger and older cohorts. We will concern ourselves further with only one of the twelve parameters against which Nelson considered her data, i.e. pronouns as percentage of all nominal elements.

While two of the parameters quadrichotomizing Nelson’s data, i.e. high and low MLU, correspond closely with our older and younger cohorts, the other two, i.e. expressive and referential child, have no prima facie connection with our realized and extended data. Yet the mean percentages from the present data are strikingly similar to those obtained by Nelson:

Nelson’s Data                     Present Data
Expressive x low = 50.2%          Extended x younger = 50.5%
Expressive x high = 51.6%         Extended x older = 57.7%
Referential x low = 36.3%         Realized x younger = 30.7%
Referential x high = 59.6%        Realized x older = 53.5%
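How the same corpus can yield two different pronoun percentages is easy to show with invented counts. In the sketch below, which is only an illustration and not the study's procedure, each nominal element is a hypothetical pair (is it a pronoun, was it realized); the realized percentage is taken over spoken nominals only, the extended percentage over all nominals in the paraphrased corpus.

```python
def pronoun_percentage(nominals, realized_only):
    """Pronouns as a percentage of nominal elements.

    `nominals` is a list of (is_pronoun, was_realized) pairs; with
    `realized_only` set, elements supplied only in the paraphrase are ignored.
    """
    pool = [n for n in nominals if n[1]] if realized_only else list(nominals)
    pronouns = sum(1 for is_pronoun, _ in pool if is_pronoun)
    return 100 * pronouns / len(pool)


# Invented corpus: 7 realized nouns, 3 realized pronouns, 4 deleted pronouns.
nominals = ([(False, True)] * 7) + ([(True, True)] * 3) + ([(True, False)] * 4)

realized_pct = pronoun_percentage(nominals, realized_only=True)
extended_pct = pronoun_percentage(nominals, realized_only=False)
print(realized_pct, extended_pct)
```

When the deleted elements are mostly pronouns, restoring them raises the pronoun share, which is the direction of the difference between the younger group's realized and extended figures.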

While there is no significant difference between our older group’s two sets of data, the means of the realized data from the older and younger groups differ significantly, as do the means of the younger group’s realized and extended data (cf. Figure 4). Comparing the present results with those of Nelson we find a clear correspondence between her referential children at low MLU and our younger group’s realized data. Moreover, these two sets of data differ from the rest of both Nelson’s and the present data. Of interest is that we are dealing here with three different types of differences, the one common factor being some developmental dimension. The differences are:

(1) between the same children at two developmental stages (Nelson’s subjects);
(2) between two groups of children at different developmental stages (our younger and older groups); and
(3) between two types of data (realized vs extended) produced by the same children at the same point in time (our younger group).

The developmental dimension underlying the first two differences is obvious; that underlying the third difference derives from the fact that the data containing the paraphrased items are by implication more “advanced” than those containing only the realized items, the former being in a sense “projected towards” adult speech.

[Figure 4. Pronouns as Percentages of all Nominals. Comparison of Older and Younger Groups, and of Realized and Extended Data.]

When attempting to characterize our two groups in terms of Nelson’s dichotomy, we find two possibilities for the older group: They could be styled (i) “expressive” if we assume that their current high pronoun ratio had remained more or less constant since their MLU’s were in the vicinity of 2.0, or (ii) “referential” if we assume that they had undergone nominal shift and that their initial pronoun ratio had been more than 20% lower than at present. Although circumstantial evidence favours the latter, there can be no certainty, and the use of the older group’s data is restricted to that of a reference point for the younger group. As for the younger group, it is clear that they are to be classified as “pre-shift referential”.

The Afrikaans data, more specifically the differences between the younger group’s extended and realized pronoun percentages, support the hypothesis that a general performance constraint results in the deletion of a particular class of elements, i.e. those with a low information load. The data furthermore seem to indicate that pronouns form part of this class, and that the difference between


referential children’s use of pronouns at low MLU and high MLU is to be ascribed to the same performance constraint responsible for the deletion patterns found in copula constructions. The argument for including pronouns in the class of low-information elements is based on the main functions of the pronoun, i.e. to enable a speaker to refer in the most economical way to a previously identified noun phrase, and to enable him/her to meet the obligation of filling an obligatory syntactic slot. The present data suggest children’s awareness that a pronoun does not primarily convey information; that it merely stands pro something else which has already been identified to the satisfaction of both parties to the discourse and can therefore be deleted. It would seem, then, that the nominal shift represents a record of how children’s deletability threshold for pronouns increases, of how they become progressively more able to “afford” pronouns; more generally, of how the child progressively overcomes those performance constraints responsible for the deletion of low-information elements in his/her language.

Preposition Deletions and the Primacy of Concepts

Ever since Brown (1957) hypothesized that words precede concepts, that adult words serve as “lures to cognition”, word/concept primacy has been at issue. Although Bowerman’s (1973) data seem to support Brown’s stance, there is a substantial body of counter-argument, for example:

(1) In 1972 Macnamara, on the acquisition of logical terms, argued that the first hearing of a term cannot possibly introduce it into a child’s mind. A child can only learn a new word “if he experienced the need for it in his own thinking and looked for it in the linguistic usage about him” (Macnamara 1972:5).
(2) In 1973 H.H. Clark, on the acquisition of English expressions for space and time, argued that the child acquires these expressions by learning how to apply them to a priori knowledge. As Clark puts it, “the child knows much about space and time before he learns the English terms for space and time” (H.H. Clark 1973:28).
(3) Experimental evidence failed to support the hypothesis of Levine and Carey (1982) that the words “front” and “back” are acquired before the corresponding concepts. They found clear evidence that “a complex disjunctive concept of ‘front-back’ orientation . . . precede(s) any knowledge of the words front and back” (Levine and Carey 1982:645).


In this section the paraphrase technique is used to provide evidence that the concepts underlying certain prepositions in the Afrikaans corpus were formed prior to the learning of the corresponding words. Of the 50-odd prepositions in Afrikaans only 19 types and 500 tokens occur in the present data - the ten types in the younger cohort’s corpus forming a subset of the 19 in the older cohort’s corpus - and only eight types appear with any frequency, even in the most advanced children’s corpora. There is also a strong overall predominance, greatly accentuated in the younger cohort’s data, of spatial (locative and directional) prepositions.

Table 2. Frequencies of Preposition Types Realized in Actual Speech and Supplied in the Extended Corpus

                     Anna     Betsy    Chris    Deon     Eric     Freda    Total
In                   23:0     39:4     12:2     20:9     12:6     9:9      115:30
By (loc.)            34:3     7:0      7:2      9:5      1:1      1:7      59:18
To (postpos.)        14:1     11:0     12:0     5:7      5:2      12:4     59:19
On                   14:0     16:2     9:1      4:1      3:5      1:4      47:8
Into                 10:0     8:1      2:0      5:1      5:2      12:4     42:8
To (dative)          55:2     18:8     7:7      *:7      2:6      1:3      83:33
With (instr.)        15:0     3:5      8:3      *:10     *:8      *:4      26:30
With (comit.)        11:0     8:0      5:1      1:4      *:1      1:1      26:7
To (prepos.)         6:0      1:2      2:2      *:2      *:2      *:6      9:14
On (“touch on”)      2:0      3:0      1:1      2:0      1:0      *:2      9:3
From                 3:0      1:0      2:0      *:1      -        -        6:1
Of (“like of”)       2:0      1:1      -        *:1      -        *:2      3:4
Out of               -        2:0      2:0      -        1:1      -        5:1
Under                1:0      1:0      1:0      -        -        -        3:0
Over                 1:0      -        -        1:0      -        1:0      3:0
Beside               -        1:0      1:0      -        -        -        2:0
Near                 1:0      -        -        -        -        -        1:0
Against              1:0      -        -        -        -        -        1:0
Around               1:0      -        -        -        -        -        1:0
Types                17:3     15:7     14:8     8:11     8:10     8:11     19:19
Tokens               194:6    120:23   71:19    47:48    30:34    38:46    500:176

* Cells with no realized prepositions. - Vacant cells.


In each cell in Table 2 the figures for realized prepositions appear on the left of the colon, while figures for prepositions supplied in the paraphrase appear on the right. For ease of identification, cells with no realized prepositions are marked with asterisks, while completely vacant cells contain a dash (-). With realizations and paraphrases as criteria, the following tripartition is imposed on the data:

Group 1. The first five types (all spatials) are realized in all the children’s corpora, while it is also common for them to appear in paraphrased versions of utterances in which they were not actually realized.

Group 2. The middle group differs from the other two in three important respects. First, it contains prepositions expressing a variety of relations compared with the exclusively spatial nature of the other two groups. Secondly, it reveals a marked cohort split, the older cohort’s realized figures resembling their Group 1 performance, and the younger cohort’s their Group 3 performance. Thirdly, the seven types in Group 2 all fail to occur in the realized versions of some of the younger children’s corpora, yet they all occur in the paraphrased versions of one or more of these. (For ease of reference, we shall call such types “unsupported”.)

Group 3. The seven types at the bottom of the list (all spatials) occur with a very low frequency in any one child’s corpus, and fail to occur at all in two-thirds of the cells. With a single exception, these types do not occur in the paraphrased versions of utterances; if they occur at all, they are realized.
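The notion of an “unsupported” type, and the per-group frequencies discussed further on, can be expressed compactly. The sketch below covers the younger cohort only and rests on two assumptions that should be kept in mind: the cell values are read from Table 2 as extracted, and a few cells (Freda's “of” and the position of the Group 3 tokens) are inferred from the row and column totals rather than read directly. The names are illustrative.

```python
YOUNGER = ["Deon", "Eric", "Freda"]

# (realized, supplied) per child in YOUNGER order; None marks a vacant cell.
# Cells marked "inferred" are reconstructed from the table's totals (an assumption).
GROUPS = {
    "Group 1": {
        "in":            [(20, 9), (12, 6), (9, 9)],
        "by (loc.)":     [(9, 5), (1, 1), (1, 7)],
        "to (postpos.)": [(5, 7), (5, 2), (12, 4)],
        "on":            [(4, 1), (3, 5), (1, 4)],
        "into":          [(5, 1), (5, 2), (12, 4)],
    },
    "Group 2": {
        "to (dative)":   [(0, 7), (2, 6), (1, 3)],
        "with (instr.)": [(0, 10), (0, 8), (0, 4)],
        "with (comit.)": [(1, 4), (0, 1), (1, 1)],
        "to (prepos.)":  [(0, 2), (0, 2), (0, 6)],
        "on (touch)":    [(2, 0), (1, 0), (0, 2)],
        "from":          [(0, 1), None, None],
        "of (like)":     [(0, 1), None, (0, 2)],       # Freda's cell inferred
    },
    "Group 3": {
        "out of":  [None, (1, 1), None],               # inferred position
        "under":   [None, None, None],
        "over":    [(1, 0), None, (1, 0)],             # inferred position
        "beside":  [None, None, None],
        "near":    [None, None, None],
        "against": [None, None, None],
        "around":  [None, None, None],
    },
}


def mean_realized(group):
    """Mean number of realized tokens per preposition type, pooled over the cohort."""
    total = sum(cell[0] for cells in group.values() for cell in cells if cell is not None)
    return total / len(group)


def unsupported(group):
    """Types that some child never realizes but that the paraphrase supplies for that child."""
    return {name for name, cells in group.items()
            if any(c is not None and c[0] == 0 and c[1] > 0 for c in cells)}


means = {name: round(mean_realized(group), 2) for name, group in GROUPS.items()}
print(means)
print(sorted(unsupported(GROUPS["Group 2"])))
```

On these figures every Group 2 type is unsupported for at least one of the younger children, while no Group 1 or Group 3 type is, which is the contrast the tripartition is meant to bring out.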

The fact that some types occur only in the paraphrased versions seems to show that certain conceptual distinctions are present in the child’s mind before he/she is able to overtly express them in language. However, we have a problem with the resemblance between the older cohort’s Group 3 data and the younger cohort’s Group 2 data. Why does this resemblance go so far as it does, without extending to the occurrence of unsupported prepositions in the older cohort’s Group 3 data? An answer to this question may be found in closer scrutiny of the distinctions underlying the items being acquired in each group. Of the Group 1 prepositions, four (in, into, by and to) are defined by the perceptually simple parameters containment and general proximity, and by the opposites static and directional. The fifth preposition (on) expresses the first specified proximity relationship usually found in children’s speech (cf. Clark 1973; Dromi 1979).

If we consider the relative frequencies in the younger cohort’s data of the prepositions from the three groups, we find that Group 1 prepositions are used on average 20.80 times each, compared with 1.14 and 0.43 for the other two groups. These figures clearly show that the five Group 1 prepositions are sufficient to convey most spatial relations the young child needs to express. The Group 2 prepositions, for all the contrasts pointed out above, share an important feature with Group 1: They also convey relatively simple, “bread-and-butter” relations. Dative, instrumental and comitative seem to be the most basic non-spatials; the destination-oriented directional “to” is here joined by its source-oriented opposite “from”; “aan” and “van” are fixed prepositions required to convey the simple and commonplace meanings “touch” and “like”. By comparison with the more or less “primitive” relations expressed by the prepositions from Groups 1 and 2, the relations expressed by the Group 3 prepositions are specialized and complex. Moreover, their low frequencies in the data show them to be dispensable - for both cohorts - for the level of communication in question. It is for precisely these reasons that we may hypothesize a fundamental difference between, on the one hand, Group 1 and 2 prepositions, and on the other hand those from Group 3. There seems to be a period when prepositions from the first two groups will be “unsupported”: the child knows about the relations and shows proof that he/she wants to express them before he/she knows the appropriate words. Evidence of this can still be found in the unsupported prepositions occurring in the younger cohort’s Group 2 data; and we may assume that if earlier data for both the cohorts were available, more such forms would be found. In contrast, the absence of “unsupported” instances of Group 3 prepositions in the present data seems to support the suggestion “that Brown’s 1975 hypothesis may be correct whenever a word has a complex, disjunctive definition” (Levine and Carey 1982:646, emphasis added).

CONCLUSIONS

The central concern of this article was to test a descriptive method uniquely capable of illuminating certain aspects of language acquisition. We were able to show that the non-realization of elements occurring in the paraphrase constitutes part of the data base of child language research, a part that can only be made accessible by means of the paraphrase procedure. The central hypothesis is that children and adults express the way they see the world in essentially similar ways, and predictions following from this general hypothesis are that:

(1) one of the most important differences between child and adult speech lies in children’s non-realization of low-information elements; and
(2) we can usefully describe language development in terms of a narrowing gap between child and adult speech.

The second hypothesis is that an effective descriptive framework should reveal such differences in linguistic development as there may be between age-equivalent children. Predictions following from this hypothesis are that:

(1) MLU would rank a number of age-equivalent children from the least to the most advanced child; and
(2) it would be possible to use this canonical order as a backdrop against which the relation between MLU and other, more differentiated or finer-grained measures can be determined.

The results presented offer persuasive support for the hypotheses. With the MLU-based canonical order as a constant, it was possible to show the relation between children’s linguistic development and:

(1) coverb deletion and complexity of linguistic context;
(2) deletion of the components of copula constructions and the relative information load of these components;
(3) pronoun deletion and information load; and
(4) preposition deletion and the semantic distinctions underlying various prepositions.

These positive results seem to justify the claim that the paraphrase procedure is able to reveal relevant information which eludes other methods; that it provides for an objective and controlled comparison between more and less “standard” forms of a language. In considering the potential of this method, it is well to bear in mind that paraphrasing is not limited to deletions but also entails substitutions, additions and permutations; nor is its potential usefulness limited to the language acquisition context.

NOTES

1. This important distinction is articulated, in a somewhat different context, by Hoff-Ginsberg and Shatz (1982:5).
2. A detailed description of the method of the present investigation appears in Vorster (1983).
3. These and similar mismatches are discussed in Vorster (1983). Let one example suffice to underscore the limited informativeness of gross frequencies: The 50 realized coverb tokens in Eric’s corpus comprise only four types, of which one alone accounts for 40 tokens. By contrast the 22 realized coverb tokens in Freda’s corpus comprise nine types, the most frequent of which occurs six times. It is obvious, gross frequencies notwithstanding, that Freda’s acquisition of coverbs outstrips that of Eric to a considerable degree.
4. The term “deletion” is used in the sense of “non-realization”, and not to designate a transformational operation whereby an element is removed from a string.
5. The optionality of object NPs may be questioned. However, a large number of the potentially transitive verbs in the present data can be, and are, used intransitively.
6. The significance of the present results is enhanced by similar results obtained for the elements associated with prepositional phrases - and this in spite of the high semantic value of prepositions, relative to copulas (cf. Vorster 1983). A range of 68.77 was found between the children deleting the most and the least sentence subjects, while the corresponding figure for prepositions was 55.17. In sharp contrast, the range for prepositional NPs is a mere 6.52.
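The contrast drawn in note 3 between gross token frequencies and type diversity can be checked with a short calculation. The sketch below uses only the figures quoted in that note; the two summary measures (a type-token ratio and the share of tokens taken up by the most frequent type) and the function name are illustrative additions, not measures reported in the study.

```python
def coverb_profile(tokens, types, top_type_tokens):
    """Return (type-token ratio, share of tokens held by the most frequent type)."""
    return types / tokens, top_type_tokens / tokens


# Figures quoted in note 3 for realized coverbs.
eric = coverb_profile(tokens=50, types=4, top_type_tokens=40)
freda = coverb_profile(tokens=22, types=9, top_type_tokens=6)

print(f"Eric:  type-token ratio = {eric[0]:.2f}, top-type share = {eric[1]:.2f}")
print(f"Freda: type-token ratio = {freda[0]:.2f}, top-type share = {freda[1]:.2f}")
# Eric:  type-token ratio = 0.08, top-type share = 0.80
# Freda: type-token ratio = 0.41, top-type share = 0.27
```

Although Eric produces more than twice as many coverb tokens, four-fifths of them are instances of a single type, whereas Freda's smaller total is spread over nine types.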

REFERENCES

Bloom, L. 1970 Language Development: Form and Function in Emerging Grammars, Cambridge, MA: The M.I.T. Press.

Bowerman, M. 1973 Early Syntactic Development: A Cross-linguistic Study with Special Reference to Finnish, Cambridge: Cambridge University Press.

Brown, R. 1957 Words and Things, Glencoe, IL: Free Press.

Brown, R. 1973 A First Language: The Early Stages, London: George Allen & Unwin.

Chomsky, N. 1957 Syntactic Structures, The Hague: Mouton.

Clark, E.V. 1973 “Non-linguistic Strategies and the Acquisition of Word Meanings,” Cognition 2, 161-182.

Clark, H.H. 1973 “Space, Time, Semantics and the Child,” in Cognitive Development and the Acquisition of Language, T.E. Moore (ed.), New York: Academic Press.

Dromi, E. 1979 “More on the Acquisition of Locative Prepositions: An Analysis of Hebrew Data,” Journal of Child Language 9, 547-562.

Fillmore, C. 1968 “The Case for Case,” in Universals in Linguistic Theory, pp. 1-90, E. Bach and R. Harms (eds), New York: Holt, Rinehart and Winston.

Greenfield, P. and J. Smith 1976 The Structure of Communication in Early Language Development, New York: Academic Press.

Hoff-Ginsberg, E. and M. Shatz 1982 “Linguistic Input and the Child’s Acquisition of Language,” Psychological Bulletin 92(1), 3-26.

Labov, W. 1972 Sociolinguistic Patterns, Philadelphia: University of Pennsylvania Press.

Levine, S. and S. Carey 1982 “Up Front: The Acquisition of a Concept and a Word,” Journal of Child Language 9, 645-657.

Macnamara, J. 1972 “Cognitive Basis of Language Learning in Infants,” Psychological Review 79, 1-13.

McNeill, D. 1966 “Developmental Psycholinguistics,” in The Genesis of Language, F. Smith and G. Miller (eds), Cambridge, MA: The M.I.T. Press.

Nelson, K. 1975 “The Nominal Shift in Semantic-Syntactic Development,” Cognitive Psychology 7, 461-479.

Schlesinger, I. 1971 “Learning Grammar: From Pivot to Realization Rule,” in Language Acquisition: Models and Methods, R. Huxley and E. Ingram (eds), London: Academic Press.

Snow, C., A. Arlman-Rupp, Y. Hassing, J. Jobse, J. Joosten, and J. Vorster 1976 “Mothers’ Speech in Three Social Classes,” Journal of Psycholinguistic Research 5, 1-20.

Van Der Geest, T., R. Gerstel, R. Appel, and B. Tervoort 1973 The Child’s Communicative Competence, The Hague: Mouton.

Vorster, J. 1982 “The Last Shall Be First: On the Acquisition of the Afrikaans Double Negative,” in Proceedings of the Second International Congress for the Study of Child Language, C.E. Johnson and C.L. Thew (eds), Lanham, Md: University Press of America.

Vorster, J. 1983 Aspects of the Acquisition of Afrikaans Syntax, Doctoral Thesis, University of South Africa, Pretoria.

Wells, G. 1974 “Learning to Code Experience Through Language,” Journal of Child Language 1, 243-269.