Lingua 26 (1971) 281-293, © North-Holland Publishing Company. Not to be reproduced in any form without written permission from the publisher
ON THE DISTRIBUTION OF NOUN-PHRASE TYPES IN ENGLISH CLAUSE-STRUCTURE 1)

F. G. A. M. AARTS
INTRODUCTION
The investigation reported in this paper is based on an examination of a corpus of approximately 72,000 words of present-day English. This corpus, which forms part of the files of the Survey of English Usage at University College London, comprises 14 texts of ca. 5,000 words each, which may be divided into four categories: (1) light fiction: texts 6.1, 6.2, 6.3 and 6.4; (2) scientific writing: texts 8a.1, 8a.3 and 8a.4; (3) informal speech: texts 5b.1, 5b.16 and S1c.11; (4) formal spoken and written English: texts 5b.2 and 5b.51, 8b.1 and 8ab.2.

The question we are interested in is whether it is possible to demonstrate non-randomness in the distribution of noun-phrase types in English clause-structure. We shall assume that, to a large extent, this distribution is determined not only by the function of the noun-phrase in the clause, but also stylistically with respect to the variety of English. More specifically, we shall try to find evidence for the hypotheses that there is a correlation between subject-exponents and structural ‘lightness’ on the one hand and a very strong tendency for non-subject-exponents to be realized by structurally ‘heavy’ noun-phrase types on the other.

The distinction drawn here between ‘light’ and ‘heavy’ noun-phrase types is to be interpreted as follows. By ‘light’ items we shall here understand: (1) pronouns, (2) names, (3) nouns, neither pre- nor postmodified, (4) nouns premodified by determiners only.

1) This paper was written during a stay at University College London (1969-1970), which was partly supported by a grant from the Niels Stensen Foundation in Amsterdam. I am very much obliged to Professor Randolph Quirk for permission to use the files of the Survey of English Usage and especially for his many helpful suggestions and comments.
TABLE 1

            a Subject                b Direct object          c Prepositional adjunct
Texts       Pers.   Other   Names    Pers.   Other   Names    Pers.   Other   Names
            pron.   pron.            pron.   pron.            pron.   pron.
6.1          329      36      94       55      16       3       50      16      20
6.2          508      83     164       76      51       7       51      36      22
6.3          271      48      72       41      10       4       34      15      29
6.4          232      35      71       40      20      10       23      16      22
8a.1          57      65       4        4       2       -        7      34       6
8a.3         117      73       1        3       4       -        4      24       2
8a.4          42      75      25        -       2       -        5      22      41
5b.1         432      90      23       54      29       5       30      39      54
5b.16        520      97      26       44      47       5       39      38      23
S1c.11       643      88      22       72      51       -       46      29      27
5b.2         360     126      16       43      36       1       32      37      25
5b.51        346     101       8       22      37       3       30      36       6
8b.1         163      59      60       25      25       -       15      27      13
8ab.2        154      60      25       22      19       -       10      25      58
Total       4174    1036     611      501     357      38      376     394     348
By ‘heavy’ items we shall understand:
(a) All other premodified noun-phrases, i.e. those premodified by (1) adjective(s), (2) genitive, (3) noun, (4) adjective + noun, (5) genitive + noun.
(b) All postmodified noun-phrases, i.e. those postmodified by (1) prepositional phrase(s), (2) prepositional phrase(s) + clause, (3) prepositional phrase + non-finite clause, (4) non-finite clause, (5) relative clause.

1. PRONOUNS AND NAMES
Table 1 lists all the pronouns and names that occur in the corpus as exponents of the various places in clause-structure (columns a-f). 2)

2) The grammatical terms in this paper are being used with the values given to them in R. W. Zandvoort, A handbook of English grammar, 11th ed., Groningen, 1969.
d Indirect object
Personal pronouns
Other pronouns
12 I1 13
2
Names
2
Personal pronouns 2 6
-
-
2
2
-
2 12 8 5
a
12
aa
3
-
Other
f Predicative
Names
6 6
3
-
-
-
11 3
-
10 11
-
54
adjunct -
Other pronouns
-
-
-
13
Personal pronouns
pronouns
-
e Nominal part of predicate
-
-
2
-
-
-
2 -
11
5
As appears from column a, all 14 texts have a large number of personal pronouns at S. As exponents of non-S they decrease markedly as we move from left to right: their number is considerably smaller in columns b and c, almost negligible in columns d and e, and they are totally lacking in column f. Other pronouns and names are also more frequent at S than elsewhere, but the ratio of S-exponents to non-S-exponents is different from that of the personal pronouns. Table 1, then, contains overwhelming evidence to show that structurally ‘light’ items tend to be found at S rather than elsewhere in the clause. As table 2 shows, in this corpus pronouns and names at S are in fact 2.6 times as frequent as those at non-S. There is more evidence to support our hypothesis. Before going on to examine it, a few comments must be made on the figures in table 1.
Names -
-
TABLE 2

                     Subject   Non-subject
Personal pronouns      4174        978
Other pronouns         1036        813
Names                   611        402
Total                  5821       2193
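The ratio quoted for table 2 can be checked with a few lines of arithmetic. The following is only a minimal sketch: the dictionary `TABLE_2` and the helper `s_to_non_s_ratio` are illustrative names that do not come from the paper, and the counts are simply copied from table 2.

```python
# Counts copied from table 2: (subject, non-subject) per item type.
# The names TABLE_2 and s_to_non_s_ratio are illustrative assumptions.
TABLE_2 = {
    "personal pronouns": (4174, 978),
    "other pronouns": (1036, 813),
    "names": (611, 402),
}


def s_to_non_s_ratio(subject, non_subject):
    """Return how many times more frequent an item type is at S than at non-S."""
    return subject / non_subject


# Column totals of table 2.
total_s = sum(s for s, _ in TABLE_2.values())
total_non_s = sum(n for _, n in TABLE_2.values())

# Ratio for all pronouns and names taken together, and per item type.
overall_ratio = s_to_non_s_ratio(total_s, total_non_s)
ratios = {kind: s_to_non_s_ratio(s, n) for kind, (s, n) in TABLE_2.items()}

print(f"all pronouns and names: {overall_ratio:.2f}")
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}")
```

The per-type ratios make the point of the preceding paragraph concrete: personal pronouns favour S far more strongly than other pronouns or names do.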
The question to be answered here is whether there is any significant relation between the four categories of texts and the distribution of the figures in table 1 for S (column a) and non-S (columns b-f). For each of the three kinds of exponents (personal pronouns, other pronouns and names) this was examined by applying the χ² test to the following 2 × 2 tables 3) (table 3).

TABLE 3

Personal pronouns
1                        S    non-S
Texts 6.1-6.4          1340     421
Texts 5b.1-S1c.11      1595     309

2                        S    non-S
Texts 8a.1-8a.4         216      23
Texts 5b.2-8ab.2       1023     225

Other pronouns
3                        S    non-S
Texts 6.1-6.4           202     206
Texts 5b.1-S1c.11       275     251

4                        S    non-S
Texts 8a.1-8a.4         213      91
Texts 5b.2-8ab.2        346     265

Names
5                        S    non-S
Texts 6.1-6.4           401     127
Texts 5b.1-S1c.11        71     117

6                        S    non-S
Texts 8a.1-8a.4          30      49
Texts 5b.2-8ab.2        109     109
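For readers who wish to retrace the test, the χ² statistic of a 2 × 2 table can be sketched as below. This is an illustration under stated assumptions, not the procedure actually run at the Computer Centre: it uses the plain Pearson formula without continuity correction, the function and variable names are invented here, and the cell values are those implied by the column sums of table 1 (1023 and 401 for the subject cells of sub-tables 2 and 5). The critical values are the standard ones for one degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Sub-tables of table 3 as (S, non-S) of the first text group,
# then (S, non-S) of the second text group.
TABLES = {
    1: (1340, 421, 1595, 309),   # personal pronouns: fiction / informal speech
    2: (216, 23, 1023, 225),     # personal pronouns: science / formal English
    3: (202, 206, 275, 251),     # other pronouns: fiction / informal speech
    4: (213, 91, 346, 265),      # other pronouns: science / formal English
    5: (401, 127, 71, 117),      # names: fiction / informal speech
    6: (30, 49, 109, 109),       # names: science / formal English
}

# Standard critical values of chi-square for 1 degree of freedom.
CRITICAL = [(10.828, "p < 0.001"), (6.635, "p < 0.01"), (5.024, "p < 0.025")]


def verdict(chi2):
    """Return the strictest level reached, or 'p > 0.025' if none is reached."""
    for threshold, label in CRITICAL:
        if chi2 > threshold:
            return label
    return "p > 0.025"


STATISTICS = {k: chi_square_2x2(*cells) for k, cells in TABLES.items()}
RESULTS = {k: verdict(value) for k, value in STATISTICS.items()}

for k in sorted(TABLES):
    print(f"table {k}: chi-square = {STATISTICS[k]:.2f} ({RESULTS[k]})")
```

A continuity correction would lower each statistic slightly; whether one was applied in the original computation is not stated in the paper.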
The χ² test shows that of the above tables 1, 4 and 5 are statistically highly significant (p < 0.001), 2 is significant (p < 0.01) and 3 and 6 are insignificant (p > 0.025). Linguistically speaking this means that there is a strong association between personal pronouns and ‘subjectness’ in all four categories of texts (tables 1 and 2). Since the distribution in tables 3 and 6 is not significant, the same conclusion cannot be drawn for other pronouns and names. The former, according to table 4, are typical S-exponents in text-categories 2 and 4 (Scientific writing and Formal spoken and written English), the latter, according to table 5, in text-category 1 (light fiction) only.

One final comment should be made on the science texts in this connection. Not only do they have a strikingly low number of personal (and other) pronouns at non-S (indeed there are none at all in columns d and f) and an even smaller number of names (none in columns b, d, e and f), they also have comparatively few personal pronouns as subject. There is some evidence to show that one of the reasons for this may be that the subject in this type of text tends to avoid pronominalization. Either of two things may happen. The noun may simply be repeated as in the following examples:

From the lower part of the mid- and hind-brain arose all the cranial nerves except the olfactory and optic. These nerves follow the same plan as those of gnathostomes (8a.1)

The phenyl radicals are capable of effecting arylation, and the arylation is inhibited by the presence of ... (8a.4)

3) For help with statistical problems I am very grateful to Mr Wright of the Computer Centre of University College London and to Mrs Caroline Bott.
or else the noun is replaced by an equivalent:

Little is known about the condition in sea lampreys, where blood is probably hypotonic to the sea. When in the river the animals must deal with the tendency for water to flow in (8a.1)
These examples of course prove little or nothing, but it might well be worth investigating this phenomenon to see whether this is a characteristic feature of scientific English.

2. PREMODIFIED AND POSTMODIFIED NOUN-PHRASES

Table 4 lists the various premodifiers (including zero) that occur in the corpus with the non-pronominal exponents of S and non-S. If we distinguish between ‘light’ items (columns a-b) on the one hand and ‘heavy’ items (columns c-i) on the other, and combine the figures in columns a and b with those in table 2
TABLE 4
Subject, Direct object, Prepositional adjunct, Indirect object, Nominal part of predicate, Predicative adjunct
a Zero + head
b Determiners + head
c 1 adjective + head
d 2 adjectives + head
168 203
760 615
283 307
31 59
2 3
26 11
457 3 44
1090 6 I51
728 1 176
93 18
6 -
4
4
8
3
-
we get the following distribution (table 5)
e 3 (or more) adjectives + head
1
f Genitive + head
h Adjective + noun + head
i Genitive + noun + head
100 71
7 12
-
31 1 10
170 1 34
30 4
I
1
g Noun + head
-
of ‘light’ items at S and non-S
TABLE 5

               Pronouns/names   ± Determiner + head
Subject             5821                928
Non-subject         2193               2560
The χ² test shows that this distribution is highly significant (p < 0.001). The same is true for the tables of each of the four groups of texts separately (table 6). It is evident, then, from the above, that within the category of the structurally ‘light’ items, we must distinguish between pronouns and names on the one hand and nouns (plus or minus determiners) on the other, the former being more typical exponents of S than the latter.

Table 4 requires little further comment. It will be noticed that the premodified noun-phrases in columns d-i represent only a minority of those in the whole of the corpus and that the frequency of their occurrence decreases with their complexity. So far we have only examined structurally ‘light’ exponents, that
1 1 2
-
TABLE 6

Light fiction
               Pronouns/names   ± Determiner + head
Subject             1943                277
Non-subject          754                928

Scientific writing
               Pronouns/names   ± Determiner + head
Subject              459                261
Non-subject          163                466

Informal speech
               Pronouns/names   ± Determiner + head
Subject             1941                123
Non-subject          677                492

Formal spoken and written English
               Pronouns/names   ± Determiner + head
Subject             1478                267
Non-subject          599                674
is pronouns, names and nouns premodified by zero and determiners. This has gone some way to providing evidence for our hypotheses, but we must also take account of the structurally ‘heavy’ items, that is to say the other premodified noun-phrases in table 4 (columns c-i) and all postmodified noun-phrases. The latter are contained in table 7. In table 8 the figures of table 7 have been broken down according to textual category. Thus the 410 subject noun-phrases that are postmodified by one prepositional phrase are distributed over the four textual categories as indicated in the four top cells of column a.

Perhaps it is a little dangerous to attempt to draw far-reaching conclusions from a stylistic point of view. It is evident, however, from the perfectly regular distribution of the figures in table 8, that postmodified noun-phrases, in all four text-groups, typically appear in functions other than subject. Moreover, categories 2 and 4 (Scientific writing and Formal spoken and written English) tend to have a larger number of postmodified noun-phrases than the Fiction
TABLE 7
Subject, Direct object, Prepositional adjunct, Indirect object, Nominal part of predicate, Predicative adjunct
a Noun + 1 prep. phrase
b Noun + 1 prep. phrase + clause
c Noun + 1 prep. phrase + non-finite clause
d Noun + 2 prep. phrases
e Noun + 2 prep. phrases + clause
f Noun + 3 prep. phrases
g Noun + non-finite clause
h Noun + relative clause
410 411
14 45
16 18
74 62
1 8
10 14
58 79
128
709 1
67 -
30 -
133 -
14 -
26 -
98 -
249
196
26
11
31
9
10
36
92
-
-
5
2
TABLE 8

a Noun + 1 prep. phrase
b Noun + 1 prep. phrase + clause
c Noun + 1 prep. phrase + non-finite clause
d Noun + 2 prep. phrases
e Noun + 2 prep. phrases + clause
f Noun + 3 prep. phrases
g Noun + non-finite clause
h Noun + relative clause

                                        a     b     c     d     e     f     g     h
Subject
  Light fiction                        59     1     4     9     -     -     9    17
  Scientific writing                  168     6    10    35     -     5    30    30
  Informal speech                      49     6     -     7     -     -     6    23
  Formal spoken and written English   134     1     2    23     1     5    13    58
Non-subject
  Light fiction                       324    28    15    30     4     5    58   105
  Scientific writing                  327    32    32    97    14    30    82   101
  Informal speech                     235    25     5    28     2     3    35    96
  Formal spoken and written English   436    53     7    71    11    12    38   204
and Informal speech groups. This is true of both the top and bottom halves of table 8.

What does not become clear from table 8 is the correlation between text-variety and degree of structural complexity. Let us confine ourselves to one example. According to column h the Fiction texts contain 105 noun-phrases in non-subject functions postmodified by a relative clause and the Science texts 101. The numerical difference is negligible. However, an examination of the texts themselves shows that the novels have a preference for fairly simple noun-phrases of this type, whereas scientists tend to use very complex structures like the following:

It was this circumstance that led Fowler to introduce the name ‘co-operative transition’, which emphasises that the interaction must be ‘self-helping’ in the sense of increasing in importance with the progress of the change it is tending to promote (8a.3)

Grieve and Hey also showed that the reaction could be extended for the arylation of a solid aromatic compound ArH by using an organic solvent (chloroform or carbon tetrachloride) which is relatively inert towards the radicals involved, and which, when it does undergo reaction with them to a small extent, forms compounds which are easily separated from the desired substitution product ArAr' (8a.4)
3. ‘LIGHT’ AND ‘HEAVY’ ITEMS COMPARED

Tables 1, 4 and 7 account for the distribution of all noun-phrase types in the whole of the corpus. Table 9 combines these figures, distinguishing between ‘light’ noun-phrase types in columns a-b and ‘heavy’ ones in columns c-e.

TABLE 9

‘Light’
a Pronouns/names
b ± Determiner + head
‘Heavy’
c Nouns premodified by 1 adjective
d Nouns postmodified by 1 prep. phrase
e Nouns otherwise pre- or postmodified

                                     a      b      c      d      e    Total
All functions                     8014   3488   1494   1732   2233   16961
As subjects                       5821    928    283    410    456    7898
As complements or in adjuncts     2193   2560   1211   1322   1777    9063

In a more simplified form the data in table 9 may be presented in the following 2 × 2 table (table 10).

TABLE 10

               ‘Light’ items   ‘Heavy’ items
Subjects            6749            1149
Non-subjects        4753            4310
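The step from table 9 to table 10 is a simple collapse of columns a-b into ‘light’ and columns c-e into ‘heavy’. A minimal sketch of that aggregation follows; the dictionary layout and the names `TABLE_9`, `LIGHT`, `HEAVY` and `collapse` are assumptions made for illustration only, with the counts taken from the two bottom rows of table 9.

```python
# Two bottom rows of table 9, keyed by column letter.
# The data layout and all names below are illustrative assumptions.
TABLE_9 = {
    "subjects": {"a": 5821, "b": 928, "c": 283, "d": 410, "e": 456},
    "non-subjects": {"a": 2193, "b": 2560, "c": 1211, "d": 1322, "e": 1777},
}

LIGHT = ("a", "b")        # pronouns/names, (+/-) determiner + head
HEAVY = ("c", "d", "e")   # all other pre- or postmodified nouns


def collapse(row):
    """Sum the 'light' and the 'heavy' columns of one row of table 9."""
    return {
        "light": sum(row[col] for col in LIGHT),
        "heavy": sum(row[col] for col in HEAVY),
    }


table_10 = {function: collapse(row) for function, row in TABLE_9.items()}

# Share of 'heavy' items within each function.
heavy_share = {
    function: cells["heavy"] / (cells["light"] + cells["heavy"])
    for function, cells in table_10.items()
}

grand_total = sum(sum(cells.values()) for cells in table_10.values())

for function, cells in table_10.items():
    print(function, cells, f"heavy share = {heavy_share[function]:.1%}")
```

On these figures roughly one subject in seven is ‘heavy’, against nearly half of the non-subjects.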
The χ² test again shows that this distribution is highly significant (p < 0.001) here as well as in the case of the tables for each of the four text genres in table 11.

TABLE 11

Light fiction
               ‘Light’ items   ‘Heavy’ items
Subjects            2220             211
Non-subjects        1682            1121

Scientific writing
               ‘Light’ items   ‘Heavy’ items
Subjects             720             447
Non-subjects         629            1140

Informal speech
               ‘Light’ items   ‘Heavy’ items
Subjects            2064             148
Non-subjects        1169             811

Formal spoken and written English
               ‘Light’ items   ‘Heavy’ items
Subjects            1745             343
Non-subjects        1273            1238
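One way to compare the strength of the association across the four genres of table 11 is the odds ratio of each 2 × 2 table. This measure is not used in the paper itself; it is offered here only as a hedged illustration, with invented names (`TABLE_11`, `odds_ratio`) and counts copied from table 11. An odds ratio above 1 means that ‘light’ items are over-represented among subjects.

```python
# Table 11 as (light S, heavy S, light non-S, heavy non-S) per genre.
# The odds ratio is an illustrative measure, not one used in the paper.
TABLE_11 = {
    "light fiction": (2220, 211, 1682, 1121),
    "scientific writing": (720, 447, 629, 1140),
    "informal speech": (2064, 148, 1169, 811),
    "formal spoken and written English": (1745, 343, 1273, 1238),
}


def odds_ratio(light_s, heavy_s, light_non_s, heavy_non_s):
    """Odds of 'light' vs 'heavy' at S divided by the same odds at non-S."""
    return (light_s * heavy_non_s) / (heavy_s * light_non_s)


ODDS = {genre: odds_ratio(*cells) for genre, cells in TABLE_11.items()}

# Genres ordered from the strongest to the weakest association.
RANKING = sorted(ODDS, key=ODDS.get, reverse=True)

for genre in RANKING:
    print(f"{genre}: odds ratio = {ODDS[genre]:.2f}")
```

Every genre shows the association, but on this measure it is weakest in the science texts, where ‘heavy’ subjects are comparatively common.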
Now that more complex structures have been examined, it is clear that our hypothesis, which had already received support from the data in table 1, is amply confirmed by the tables above.
4. TEXTUAL ANALYSIS
Table 12 contains a textual analysis of the figures in the two bottom rows of table 9, thus enabling us to examine their distribution for each of the 4 text-groups separately. We now see that the situation we found in table 9 is exactly paralleled in table 12 for each of these. The two columns on the left have already been commented on above. The other three show that ‘heavy’ noun-phrase types are much less likely to occur as exponents of subject than as exponents of other functions in the clause. Independently of their function in the clause, they would also seem to be more frequent in Scientific writing and in formal language than in Fiction and informal speech.

TABLE 12

‘Light’
a Pronouns/names
b ± Determiner + head
‘Heavy’
c Nouns premodified by 1 adjective
d Nouns postmodified by 1 prep. phrase
e Nouns otherwise pre- or postmodified

                                        a      b      c      d      e   Total
Subject
  Light fiction                      1943    277     60     59     92    2431
  Scientific writing                  459    261    104    168    175    1167
  Informal speech                    1941    123     37     49     62    2212
  Formal spoken and written English  1478    267     82    134    127    2088
Non-subject
  Light fiction                       754    928    363    324    434    2803
  Scientific writing                  163    466    289    327    524    1769
  Informal speech                     677    492    249    235    327    1980
  Formal spoken and written English   599    674    310    436    492    2511
5. FINAL REMARKS
Obviously we cannot conclude from what precedes that ‘heavy’ noun-phrase types do not occur as subjects. They do, and by no means infrequently.
Consider for example:

The corresponding sharp, but not mathematically discontinuous, changes predicted by theory for a large but finite assembly (containing perhaps lOBa molecules) are, from an experimental point of view, indistinguishable from those that would be expected for an ‘infinite’ assembly (8a.3)

Instances of more particular interest, including especially those of which details have become available since the publication of the most recent of the aforementioned reviews, are discussed in the following text (8a.4)

Well, I think people who are going to create anything new to develop a new idea, which is not on the present tram lines of communication, have got to do that (5b.2)

A daffodil that Mey had plucked from the flower-banks before coming on board and had planted in the join between the two rear seats, now the sole survivor of the reception left behind them, jiggled its soundless bell (6.4)

A faith which depends upon the private and inward operation of the Spirit, whether in the form of intellectual apprehension, emotional experience, or dreams and visions, will not be the genuine work of the Spirit of God (8ab.2)
It is interesting to note that all these examples, with one exception, are from written texts. In the spoken texts such ‘heavy’ structures are extremely rare.

There is evidence in this corpus to show that the language possesses certain devices to cope with ‘heavy’ structures. Yngve (1960, 1961) has pointed out that among these is the use of discontinuous constituents, as in the following cases:

Several main types of homolytic arylation reactions are known, which feature interaction between aryl radicals and aromatic substrates, and which can be generally expressed by equation (I) (8a.4)

... conditions can be found under which they are not a complicating factor in the interpretation of quantitative results

White corpuscles resembling lymphocytes and polymorphonuclear cells occur, produced by lymphoid tissue in the kidneys and elsewhere (8a.1)

Another device is to put the subject in sentence-final position in be-sentences containing a complement or an adjunct:
Attached to this base is a series of incomplete cartilaginous boxes surrounding the brain and organs of special sense (8a.1)

At the end of the respiratory tube is a series of velar tentacles, corresponding exactly in position to those of amphioxus, and serving to separate the mouth and oesophagus from the respiratory tube while the lamprey is feeding (8a.1)
The following repetitive device occurs in a spoken example:

What he was asking for was that the Opposition, that Her Majesty's Opposition, which has a very important part to play in our whole constitutional structure, that Her Majesty's Opposition should be kept informed by Her Majesty's Government of the main and most crucial developments in defence (5b.16)
These problems present some very interesting questions, but to examine these would be beyond the scope of this paper.

As to our hypothesis, there can be no doubt about its validity. There is overwhelming evidence in our corpus to justify the conclusion that noun-phrase types are not randomly distributed over the English clause, but that there is a marked association between their structural make-up and their functional role. There is less evidence to support the stylistic part of our hypotheses, which probably requires the investigation of more material.

Katholieke Universiteit, Instituut Engels-Amerikaans, B@evekis~@ 72, Nijmegen, The Netherlands
REFERENCES

YNGVE, Victor H., 1960. ‘A model and an hypothesis for language structure’, Proceedings of the American Philosophical Society 104, 444-466.
YNGVE, Victor H., 1961. ‘The depth hypothesis’, Proceedings of Symposia in Applied Mathematics, XII, Structure of Language and its Mathematical Aspects, 130-138.