Zipf and his heirs


Language Sciences 26 (2004) 1–25 www.elsevier.com/locate/langsci

Regina Pustet
Institut für Allgemeine und Indogermanische Sprachwissenschaft, Ludwig-Maximilians-Universität München, Geschwister-Scholl-Platz 1, D-80539 München, Germany
Accepted 9 March 2003

Abstract

During the first half of the twentieth century, George Kingsley Zipf devised a comprehensive model of language mainly on the empirical basis of frequency counts in discourse. Although Zipf is remembered particularly for establishing the general formula “the higher the discourse frequency of a linguistic item, the shorter it will be”, his model is complex enough to merit reevaluation, especially in the light of more recent findings of discourse-based models of language and grammaticalization theory.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Usage-based models of language; Discourse analysis; Word frequency; Diachronic development of language

1. Introduction

Recent years have witnessed a considerable surge of interest in discourse-based approaches to language, which derive all kinds of theoretical generalizations from coherent samples of natural language production, rather than from isolated sentences and clauses which are often formed in an ad hoc manner to prove certain theoretical points. One issue of particular interest within discourse-based models (or “usage-based models”, Langacker, 1987) is frequency phenomena. Discourse studies are, however, by no means a novelty in theoretical linguistics. The most prominent of the early advocates of discourse-based approaches to language was George Kingsley Zipf (1902–1950). Zipf’s influence reached far beyond the limits of the initial object of his studies, linguistics. His numerous publications include the two major works

E-mail address: [email protected] (R. Pustet).
0388-0001/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0388-0001(03)00018-4


R. Pustet / Language Sciences 26 (2004) 1–25


The psycho-biology of language (1935/1965) and Human behavior and the principle of least effort (1949/1965). Zipf’s stated goal was to develop a general theory of human ecology covering aspects of human existence as diverse as geography, politics, economy, art, biology, and psychology. The basic explanatory factor in his ambitious theory is the so-called “principle of least effort” (Zipf, 1965b). The main body of Zipf’s work consists of empirical counts of the frequency of occurrence of all kinds of phenomena relative to another variable, such as physical size, quantity, or distance. The statistical analysis of these data reveals spectacular correlations. For instance, cities, towns, or other communities can be ranked according to population size; Zipf found that the highest-ranking cities, etc. will also be the ones which occur least frequently, and most importantly, that the frequency of cities, etc. decreases proportionally to their rank in the population size hierarchy. Graphical representations of Zipf distributions regularly yield a hyperbolic curve, as in Fig. 1. It must be kept in mind, however, that Fig. 1 depicts only the basic structure of the graph, and not its exact slope in individual cases. More detailed empirical research on the distribution of populations led Zipf to claim that certain political events, such as the American Civil War, would have been predictable on the basis of his curves: he found that certain divergences from the ideal curve correlate with political instability. The distribution of economic power and social status within societies also conforms to the graph given in Fig. 1. The higher the income, the lower the number of people who will receive it; it probably does not even take elaborate statistical analyses to make this formula convincing. Analogous hyperbolic distributions can be observed for the frequency of service establishments, manufacturers, retail stores, etc. in relation to business size.
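The hyperbolic shape of such distributions can be made concrete with a small numerical sketch. It assumes the simplest idealization, in which frequency equals a constant divided by size, so that the product of the two stays fixed. The function name `hyperbolic_curve`, the constant 100 and the size values are hypothetical and are not taken from Zipf's data; as noted for Fig. 1, real distributions differ in their exact slope.

```python
def hyperbolic_curve(scale, sizes):
    """Return the idealized frequency scale / size for each size value."""
    return [scale / size for size in sizes]


# Hypothetical size classes and scale constant, chosen only for illustration.
sizes = [1, 2, 4, 5, 10]
frequencies = hyperbolic_curve(100.0, sizes)

for size, freq in zip(sizes, frequencies):
    # On an ideal hyperbola the product size * frequency is constant.
    print(f"size {size:>2}: frequency {freq:6.1f}, product {size * freq:.1f}")
```

Under this assumption every row has the same product, which is what distinguishes a hyperbolic decline from, say, a linear one.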
Similarly, statistical correlations between frequency and geographical distance produce hyperbolic patterns of distribution. Thus, the number of one-way trips taking place in a certain area within a certain time interval decreases proportionally

Fig. 1. Zipf’s basic frequency:size distribution for populations of cities and other communities (axes: frequency and population size).


as the length of the trips increases. Or, at least in the first half of the twentieth century in American society, on the whole, the number of marriages decreased gradually the farther apart the would-be partners lived. Change of residence of families can also be shown to decrease in frequency as the distance between the new and original residences increases. In music, the number of intervals of like length between repetition of notes increases as the length of the intervals increases, again, in such a way that reproducing the data graphically results in a hyperbolic distribution. Likewise, the length of articles in newspapers and encyclopedias decreases as the frequency of their occurrence increases. This is also true for the length of individual books kept in libraries. The correlations addressed earlier comprise just a handful of the numerous statistical phenomena Zipf discovered. Subsequent research has disclosed a variety of analogous correlations which are all thought to be manifestations of what is generally referred to as “Zipf’s Law” today. But its extremely wide range of application notwithstanding, “Zipf’s law remains essentially unexplained” (Gell-Mann, 1994, p. 97). Paradoxically, the famous Zipf curves have attracted far more attention in the natural and social sciences (e.g. Gell-Mann, 1994, p. 92ff) than in linguistics, although Zipf’s methods originate in his work on language. The interpretations Zipf proposed for the puzzling statistical regularities he found have often been criticized, especially by mathematicians who favor stochastic explanations of the data (more on this in Section 4.1). However, the validity of Zipf’s empirical findings, both within and outside of linguistics, remains indisputable. In the more recent past, Zipf’s model has been corroborated by a wide range of empirical investigations in finance and business, physics, biology, ecology, and library science, to name just a few areas of research in which the model has been tested.
Moreover, Zipf’s theories have always played an eminently important role within information science. Despite its complexity and acknowledged empirical soundness, Zipf’s work in linguistics has never been appreciated by practitioners of the discipline as much as it might deserve; in the literature in general linguistics, references to Zipf’s work are sporadic (for three examples, cf. Bybee and Hopper, 2001b, p. 1f; Croft, 2000, p. 75; Haiman, 1985), and an evaluation of his theories that is more than superficial has apparently never taken place (cf. Bybee and Hopper, 2001b, p. 1f). Zipf’s reputation in general linguistics mainly stems from his popular formula “the more the discourse frequency of a linguistic item increases, the more its length decreases” (Zipf, 1965a,b), which is, as Section 2 will show, just one facet of Zipf’s comprehensive model of language. However, the discussion of Zipf’s model of language has never ceased within more specialized, quantitatively oriented approaches to language, especially in Germany and Eastern Europe (e.g. Guiter and Arapov, 1982; Köhler, 1986; Tuldava, 1995). Currently, this discussion seems to be gathering momentum, as numerous recent Zipf-related publications, many of which are contained in the Journal of Quantitative Linguistics, demonstrate. Moreover, it seems that Zipf’s ideas are now in the process of being integrated into more general approaches to language structure, as the publication of a comprehensive collection of frequency studies in Bybee and Hopper (2001a), Frequency and the Emergence of Linguistic Structure, indicates.


Today, mainstream linguistics, i.e. generative linguistics, is being challenged by a theoretical framework that interprets language structure as the direct output of the need to communicate, and other function-related aspects of language. This theoretical paradigm, for which the terminological label of functionalism is widely used, employs analytical methods based on actual discourse. Within these models, frequency counts play an ever more decisive role (e.g. Bybee and Hopper, 2001b). However, modern functionalism has actually evolved to a far lesser degree on the intellectual foundation Zipf has laid than one might expect, given the fact that in essence, the parallels between Zipf’s work and contemporary functionalist approaches to language go far beyond the mere postulation of a statistical correlation between the discourse frequency and the length of linguistic units. Such convergences would be all the more significant if, as a detailed examination of the extant literature on the subject suggests, the relevant recent research results have largely been worked out in ideological isolation from Zipf. It is generally accepted that one of the criteria by which the quality of a scientific theory should be measured is the amount of independent evidence supporting it. Today, Zipf’s theories appear amazingly modern. It seems time to take stock, to determine how the more recent work in functionalism relates to Zipf’s model of language in detail, and, most importantly, to assess to what extent the combination of Zipf’s intellectual heritage with modern functionalist theories could further the development of a comprehensive theory of human language. These are the issues that will be dealt with in this paper.

2. Zipf’s model of language

2.1. Empirical results

On the basis of statistical discourse analyses Zipf establishes four interdependent variables in terms of which individual linguistic elements can be characterized: discourse frequency, phonetic size, semantic scope, and age. Zipf’s data come from languages as diverse as English, Latin, German, Gothic, Chinese (Mandarin), Lakota, Nootka, and Plains Cree. Henceforth, the term expression unit (EU) will be used to refer to individual linguistic elements. An EU is an invariant string of phonetic units that codes a fixed meaning or function or a fixed set of meanings or functions. Thus, the term EU subsumes the two most basic types of linguistic elements, i.e. lexemes and grammemes, under a single terminological label. Zipf himself does not operate with this terminological simplification, which is introduced here mainly because it ensures an increased compactness of argumentation in the discussion of Zipf’s theories. The point of departure for Zipf’s investigations is the fact that with respect to all four of the parameters listed earlier, there is internal variation among EUs in any one language. The observation that EUs differ in length or phonetic size is so trivial that it does not need to be explicated further. On the basis of variation in phonetic size, the EUs of a given language can be arranged in classes. To use examples from


English, carpet contains more phonetic material than car; but car comprises the same number of phonemes as hit, and consequently, car and hit are members of the same class in terms of the criterion of phonetic size. It should be noted that Zipf is not consistent in the usage of the variable of number of phonemes as the defining criterion for phonetic size. In some of his discourse samples, phonetic size is measured in terms of the number of syllables per EU. The non-linguist may be less conscious of the fact that there is also considerable variation among EUs with respect to the frequency of their occurrence in discourse. For instance, in English discourse, the is (drastically) more frequent than carpet. On the basis of frequency variation in discourse, EUs can be arranged in frequency classes which group EUs of the same discourse frequency together. EUs also differ with respect to their semantic scope. To use English examples again, the meanings of the noun field include: Country or open country in general; a piece of ground enclosed for tillage or pasture or sport; the range of any series of action or energies; specialty; an area of knowledge, interest, etc.; a region of space in which forces are at work; the locality of a battle; the battle itself; a wide expanse; area visible to an observer at one time; a region yielding a mineral etc.; the surface of a shield; the background of a coin, flag, etc.; those taking part in a hunt; the entries collectively against which a contestant has to compete; all parties not individually excepted; disposition of fielders; a set of characters comprising a unit of information. For coyote, on the other hand, only the meanings ‘a prairie-wolf, a small wolf of North America’ are listed in the dictionary consulted. Consequently, field has more meanings than coyote.
From a contemporary perspective, of course, semantic scope should be defined on the basis of criteria other than the number of metaphors given for an EU in a dictionary (cf. Section 4.3); for now, however, it suffices to know that this is the criterion Zipf used. On the basis of variation in semantic scope, EUs can be arranged in classes that subsume EUs of equal semantic scope. Lastly, EUs differ with respect to their relative age, i.e. with respect to the amount of time that passes between their emergence and obsolescence. Thus, the EUs employed by a given language can be assigned to different age groups. Zipf’s analyses of the discourse structure of various languages reveal that the four parameters frequency, phonetic size, semantic scope, and age correlate systematically.

2.1.1. Frequency: phonetic size

According to Zipf (1965a, p. 25), the phonetic size of EUs “tends, on the whole, to stand in an inverse (not necessarily proportionate) relationship to the number of occurrences”. This formula, more than any other statistical relationship he disclosed in his complex work, might have earned Zipf immortality in linguistics. However, it is often overlooked that the formula, as it stands, is ambiguous. One possible reading is that the phonetic size of individual EUs increases as their discourse frequency


decreases, and vice versa. The English article the [ðə], which is phonetically short but extremely frequent in discourse, can be used to illustrate this most common interpretation of Zipf’s frequency: phonetic size principle. This tenet will henceforth be referred to as the Popular Frequency/Size Principle. However, this principle is not the only thing Zipf had in mind when he established the earlier statistical correlation between frequency and phonetic size. Some materials included in Zipf’s work do not concern the distributional and phonetic properties of individual EUs, but rather, the statistical correlation between phonetic size classes and frequency. Thus, in this case, rather than counting the discourse occurrences of individual EUs, Zipf determined the total of discourse occurrences, including repetitions, of the combined members of a given size class. For instance, the phonetic size class which contains English the also includes at [æt], he [hi:], it [ɪt], key [ki:], me [mi:], see [si:], she [ʃi:], tea [ti:], two [tu:], you [ju:], and, presumably, hundreds of additional EUs which are composed of two phonemes. Once the numerical values for the occurrences of each biphonemic EU encountered in a given discourse sample are added up, the properties of the size class characterized by the index two, i.e. of the size class that contains all EUs composed of two phonemes, can be compared to the properties of all other size classes contained in the sample. To illustrate this point, Zipf used the German sample reproduced in Table 1, in which, however, phonetic size is measured in syllables rather than in phonemes. The figures given in Table 1 demonstrate what Zipf aimed at when he established the version of the size: frequency principle which will henceforth be referred to as the Basic Frequency/Size Principle.
Table 1
Distributional data on German, adapted from Zipf (1965a, p. 23)

Phonetic size index            Total of discourse occurrences of EUs with the
(number of syllables in EU)    phonetic size index specified in column 1
 1                             5 426 326
 2                             3 156 448
 3                             1 410 494
 4                               646 971
 5                               187 738
 6                                54 436
 7                                16 993
 8                                 5 038
 9                                 1 225
10                                   461
11                                    59
12                                    35
13                                     8
14                                     2
15                                     1

The total of discourse occurrences of bisyllabic EUs (total: 3 156 448) is lower than the total of occurrences of monosyllabic EUs (total: 5 426 326), but higher than the total of occurrences of trisyllabic EUs (total: 1 410 494), and so on. It must, however, be kept in mind that the validity of the Basic Frequency/Size Principle, which states that the class hierarchies for phonetic size and overall discourse frequency stand in an inverse correlation, by no means implies the validity of the above Popular Frequency/Size Principle, which characterizes the behavior of individual EUs. As a matter of fact, the variables of phonetic size and frequency of individual EUs may stand in any imaginable correlation to each other without violating the Basic Frequency/Size Principle. The following thought experiment elucidates this. Theoretically, the total of discourse occurrences determined for a given class X of EUs with a low phonetic size index such as 2 might come from a large set of individual EUs each of which is repeated infrequently or occurs only once in the discourse sample analyzed. To choose random numbers, the total of occurrences of members of size class X in the fictitious discourse sample is 16, and the sum of individual EUs contained in this class is 10. In the same discourse sample, there might be another class Y with a higher phonetic size index, such as 9. According to the Basic Frequency/Size Principle, the overall discourse frequency of class Y, which has a high phonetic size index, must be lower than that of class X, which has a low phonetic size index. Again, to choose random numbers for purposes of demonstration, for class Y an overall discourse frequency of 7 will be posited; for the total of individual EUs contained in class Y, the value 2 will be posited. The respective internal composition of classes X and Y might be as in Tables 2 and 3.
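Both the class totals of Table 1 and the thought experiment can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the individual frequencies of classes X and Y are those listed in Tables 2 and 3, and the test for the Popular Frequency/Size Principle uses a deliberately strict all-pairs reading (every shorter EU more frequent than every longer EU), which is an assumption made for the demonstration rather than Zipf's own statistical formulation.

```python
# Class totals from Table 1 (German): syllables per EU -> total occurrences.
table1_totals = {
    1: 5_426_326, 2: 3_156_448, 3: 1_410_494, 4: 646_971, 5: 187_738,
    6: 54_436, 7: 16_993, 8: 5_038, 9: 1_225, 10: 461,
    11: 59, 12: 35, 13: 8, 14: 2, 15: 1,
}


def obeys_basic_principle(class_totals):
    """True if the class totals fall strictly as the size index rises."""
    ordered = [class_totals[size] for size in sorted(class_totals)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


def popular_principle_holds(short_class, long_class):
    """Strict reading: every shorter EU is more frequent than every longer EU."""
    return min(short_class) > max(long_class)


# Fictitious classes from the thought experiment (Tables 2 and 3).
class_x = [2, 1, 2, 2, 1, 1, 2, 2, 2, 1]  # size index 2, ten EUs, total 16
class_y = [4, 3]                          # size index 9, two EUs, total 7

print(obeys_basic_principle(table1_totals))                       # True
print(obeys_basic_principle({2: sum(class_x), 9: sum(class_y)}))  # True
print(popular_principle_holds(class_x, class_y))                  # False
```

The last two lines reproduce the point of the thought experiment: the class totals (16 versus 7) satisfy the Basic Frequency/Size Principle, while each individual EU in the shorter class is rarer than each EU in the longer class.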
Table 2
Class X (phonetic size index 2)

                       EU1  EU2  EU3  EU4  EU5  EU6  EU7  EU8  EU9  EU10
Discourse frequency      2    1    2    2    1    1    2    2    2     1

Total of occurrences of class X members in discourse: 16.

Table 3
Class Y (phonetic size index 9)

                       EU1  EU2
Discourse frequency      4    3

Total of occurrences of class Y members in discourse: 7.

This logically possible combination of size classes within a given discourse sample, which obeys the Basic Frequency/Size Principle in that the overall discourse frequency of members of the class whose members are smaller in size exceeds the overall discourse frequency of members of the class whose members are larger in size, has a property that reverses the Popular Frequency/Size Principle: any of the members of the class with the lower size index (class X) is less frequent in discourse than any of the members of the class with the larger size index (class Y). For class X the discourse frequencies of individual EUs are either 1 or 2; for class Y, the discourse frequencies of individual EUs are either 3 or 4. The conclusion to be drawn from the possible existence of such constellations is as follows. Given the fact that the Basic Frequency/Size Principle is effective in any discourse sample, in order for the Popular Frequency/Size Principle to be valid as well, the number of individual EUs within a given size class cannot be arbitrary. Thus, an additional important variable “number of individual EUs in a given size class” is effective in Zipf’s model. This parameter, labeled “variety” by Zipf, will be referred to as diversity in this study. In all of Zipf’s discourse samples, the parameter of diversity behaves in such a way that the validity of the Popular Frequency/Size Principle is confirmed (cf. next section).

2.1.2. Frequency: diversity

Zipf’s discourse analyses reveal that “the number of different words (i.e. variety) seems to be ever larger as the frequency of occurrence becomes ever smaller” (Zipf, 1965a, p. 25). Since the Basic Frequency/Size Principle, as discussed in the preceding section, establishes a correlation between frequency and size, and the parameter of diversity correlates with the parameter of frequency, the parameter of diversity correlates with the parameter of size as well (cf. Zipf, 1965a, p. 27). This predictable relationship is corroborated by the statistical data. What is more, the frequency: diversity correlation lends itself to representation by the formula

\[ a b^{2} = k \]

The variable $a$ stands for the number of distinct EUs in a given frequency class Q; $b$ stands for the frequency index (i.e. number of discourse occurrences) that defines frequency class Q; $k$ symbolizes a constant. To illustrate this, part of Zipf’s data on the Latin of Plautus (Zipf, 1965a, p. 27) is reproduced in Table 4. The formula prescribes that multiplication of $a$ by $b^{2}$ always yields $k$, i.e. a value which is more or less constant. The data given in Table 4 fulfill this requirement, since in every case $k$ ranges between a minimum of 4025 and a maximum of 5577.

2.1.3. Frequency: meaning

That Zipf has also provided statistical evidence for the interdependence between the parameters frequency and meaning is less widely known.
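As a check on the frequency: diversity formula of Section 2.1.2, the constant can be recomputed from the Plautus counts listed in Table 4. The following sketch is only an illustration: the function name `diversity_constants` is hypothetical, and the rows are copied from the table rather than derived independently.

```python
# (b, a) rows from Table 4: frequency index b, number of distinct EUs a.
plautus = [
    (1, 5429), (2, 1198), (3, 492), (4, 299), (5, 161), (6, 126), (7, 87),
    (8, 69), (9, 54), (10, 43), (11, 44), (12, 36), (13, 33),
]


def diversity_constants(rows):
    """Return k = a * b**2 for every (b, a) row."""
    return [a * b ** 2 for b, a in rows]


k_values = diversity_constants(plautus)
print(k_values)
print(min(k_values), max(k_values))  # 4025 5577
```

The recomputed values agree with the k column of Table 4, and their extremes match the range of 4025 to 5577 stated in the text, while the number of distinct EUs itself falls from 5429 to 33.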
Table 4
Discourse frequency:diversity in the Latin of Plautus

  b       a       k
  1    5429    5429
  2    1198    4792
  3     492    4428
  4     299    4784
  5     161    4025
  6     126    4536
  7      87    4263
  8      69    4416
  9      54    4374
 10      43    4300
 11      44    5324
 12      36    5184
 13      33    5577

According to this principle, the number of meanings of EUs increases as their discourse frequency increases (Zipf, 1965b, p. 28). However, Zipf’s frequency: meaning correlation should be taken with a pinch of salt, for two reasons. First, the empirical data Zipf had at his disposal in establishing this correlation come from a single language only, namely English. Second, determining a statistical correlation between the parameters of frequency and meaning presupposes that both parameters are quantifiable. That is, for this purpose, both meaning and frequency must be decomposable into units which can be counted. This does not pose a problem with frequency, but meaning, so far, has always eluded precise definitions, and the question of how the semantic scope of a given EU should be partitioned into units of meaning is one aspect of this difficulty. In many cases it seems to be a matter of taste whether one posits a certain number of semantic units, or sememes, for a given EU, or a higher number of semantically more specific sememes, or a lower number of semantically more abstract sememes. Thus, any attempts at quantifying the meaning of individual EUs will, of necessity, result in ad hoc segmentations of semantic space, and any attempts at defining potential statistical correlations between frequency and meaning will, consequently, remain vague. As indicated in Section 2.1, the empirical method Zipf employs in this context is counting dictionary definitions for EUs. The formula Zipf derives from his data states that the number of meanings of EUs increases as their discourse frequency increases. In Section 4.3, it will be argued that this tenet, despite the shortcomings of the empirical method it is based on, must be taken seriously.¹ Moreover, this aspect of Zipf’s work is borne out by more recent investigations such as Breiter (1994) for Chinese and Levickij et al. (1996) for Modern German.
In Modern German, “on the whole, there is a tendency for the frequency of use to grow with increasing number of meanings” (Levickij et al., 1996, p. 134).

2.1.4. Frequency: age

Like his views on the interaction of discourse frequency and meaning, Zipf’s proposal that there are statistical correlations between discourse frequency and age as well has not been noted widely. On the basis of empirical data on consecutive lexical borrowings from Scandinavian, Romance, and other languages into Middle English and modern English, as compared with those lexical items that were present in the earliest stratum, i.e. Old English, Zipf claims that the discourse frequencies of EUs are, on the whole, “inversely related to their age” (Zipf, 1965b, p. 120). Zipf concludes that if the less frequent EUs tend to be more complex in phonetic size, the parameters phonetic size and age must also stand in an inverse relationship to each other; this prediction is borne out by his data (Zipf, 1965b, p. 112). Thus, in a given

¹ In this context, it is noteworthy that the rule “the number of meanings of EUs increases as their discourse frequency increases” seems to apply mainly to lexical elements, rather than to grammatical elements (anonymous reviewer, p.c.). In English and presumably, in other languages as well, grammatical elements generally have a very low range of meanings, even if their discourse frequency is very high.


language, “the longer and less frequent words tend to be the younger ones” (Zipf, 1965b, p. 120). The complex frequency:size:age correlation observed with borrowed EUs, however, has some interesting implications if its potential motivations are considered. The interpretation that at the “oldest” level, only short/high-frequency EUs have been borrowed, while in more recent times, ever longer/lower-frequency EUs have been borrowed, is implausible. Under the assumption that at the time when borrowing from a given language takes place, EUs of various frequency and size indices are borrowed, two possible explanations for the chronologically stratified distribution of borrowed EUs with respect to frequency and size indices can be offered. For one, the process of erosion might eliminate borrowed EUs (or, possibly, all EUs in a given language) selectively, i.e. according to their frequency and size indices, in such a way that low frequency/large size items are eroded faster than high frequency/small size items. An alternative explanation would be that in the course of time, borrowed EUs are gradually shortened, and their discourse frequency increases. However, these two options might not be the only solutions to the problem. Another possible explanation for the observed effect of chronological stratification of borrowed EUs is the assumption that languages borrow selectively in such a way that low frequency/large size items are more likely to be borrowed than high frequency/small size items. This hypothesis is in line with the common intuition that loanwords tend to have highly specialized meanings (Zipf, 1965b, p. 120), and it is also in line with the assumption that on the whole, lexical items are borrowed more readily than grammatical items (e.g. Moravcsik, 1978, p. 110),² but lexical items, on the whole, have lower frequency and larger size indices than grammatical items.
As will be made explicit in Section 3, current research into language change provides ample evidence for the validity of all three of the earlier hypotheses devised to account for the phenomenon of chronological stratification of borrowed EUs. This is not too surprising since the three explanatory models are not mutually exclusive on logical grounds.

2.1.5. Summary

Zipf’s basic empirical findings regarding the discourse-dependent properties of individual EUs can be summarized in the following integrated chart, which illustrates the systematic correlation of the four parameters of frequency, phonetic size, semantic scope, and age (Table 5). Zipf’s basic model, i.e. the quantitative interaction between the components of discourse frequency, phonetic size, and diversity, can be schematized as in Fig. 2. The proportions chosen in Fig. 2 do not, of course, correspond to the quantitative distributions encountered in real discourse. Each layer in the inverted triangle represents a frequency class, i.e. each layer subsumes those

² This is merely a statistical tendency since there are counterexamples to this claim (for an overview, cf. Harris and Campbell, 1995, pp. 120–150). In addition, Aikhenvald (in press, chapter 12) reports that in the Amazonian area, bound morphemes, which can have the status of grammemes, can be borrowed more easily than free morphemes if there is a cultural inhibition about borrowing forms.


Table 5
Zipf’s basic empirical correlations

Properties of individual EUs
Frequency         Frequent in discourse     Infrequent in discourse
Phonetic size     Small in phonetic size    Large in phonetic size
Semantic scope    Large semantic scope      Small semantic scope
Age               Old                       Young

Fig. 2. Schematic representation of Zipf’s model of language (vertical scale running from max. to min.: discourse frequency, semantic scope, age).

EUs whose discourse frequency indices are identical. Frequency classes are ordered hierarchically: the class with the highest frequency index is placed at the top of the inverted triangle, while the class with the lowest frequency index ranks at the bottom. In addition, the parameter of discourse frequency is iconically represented by a higher or lower number of circles in each layer; each circle represents an EU occurring in natural discourse. Thus, the higher the number of circles in a layer, the higher the frequency index of this layer. The variable of phonetic size is symbolized by the size of the circles contained in the layers. Note that in actual discourse, distributional patterns with respect to phonetic size are not as absolute as the idealized Fig. 2 suggests: in a given frequency layer, EUs of varying phonetic sizes will be found. There is merely a pronounced statistical tendency for shorter EUs to accumulate in higher-frequency classes, and for longer EUs to accumulate in lower-frequency classes. That is, in a given frequency class, EUs of a given size will prevail, but there will be a certain proportion of EUs of longer and shorter sizes as well. So far, Fig. 2 schematizes only the correlation of raw discourse frequency of EUs and their phonetic size. The coloring or marking of the individual circles represents the variable of diversity in the diagram. Identically colored or marked circles represent instances of repeated occurrence of a single EU in discourse. Variety in coloring or marking increases gradually from the top to the bottom layers of the triangle. This symbolizes the correlation between high frequency/small size and low diversity, and vice versa.


The parameters of meaning and age, which are also part of Zipf’s model, are represented in Fig. 2 as well: the “measuring rod” at the left-hand side of the diagram indicates discourse frequency as well as semantic scope and age, since these parameters correlate directly: an increase of the value for one of these parameters amounts to an increase of the value for the two others.

2.2. Interpretation of the statistical data

Especially within the functionalist paradigm of language theory, the question about the system-internal or system-external factors that shape the surface structure of human language has always been of considerable interest. Does Zipf’s putatively universal model of language yield any new insights in this respect? In tackling this issue, one might ask, more specifically, whether any causal chains between the interdependent parameters frequency, size, semantic scope, and age can be established. In this context, the possibility that there are additional, yet undiscovered forces at work in creating causal dependencies between the four parameters must, of course, be taken into consideration. According to Zipf, the structure of human language, like structure in any other domain of human existence, can be traced back to the fundamental principle of economy alias the “principle of least effort”, which is proclaimed “the primary principle that governs our entire individual and collective behavior of all sorts, including the behavior of our language and preconceptions” (Zipf, 1965b, p. viii) since “all physical process throughout the entire time–space continuum is governed by the one single superlative, least action” (Zipf, 1965b, p. 3). Zipf regards the sheer existence of human language as a consequence of the economy principle because “some human objectives are more easily obtained with speech than without it” (Zipf, 1965b, p. 20).
However, in his opinion, the economy principle is effective at a more microscopic level of linguistic organization as well. The fact that the parameters of frequency and phonetic size are inversely proportional can directly, and plausibly, be motivated via the principle of economy since ‘‘an abbreviation of speech is a short-cut in time . . . the more extensive the abbreviation, the greater is the velocity of the stream of speech’’ (Zipf, 1965a, p. 284). Using short EUs more frequently in discourse than long EUs is economical from several perspectives. The shorter the stretch of EUs required for communication, the less time and effort must be invested in speech production; time and effort saved in this context can be spent on other activities. Moreover, there are situations in which survival depends directly on the speed of communication.

Although the principle of economy, as formulated earlier, provides a convincing motivation for Zipf’s model of language, it might not be the only factor that should be taken into consideration in this context. The economy that lies in expressing the most frequent concepts by means of the shortest phonetic sequences works, in the first place, to the advantage of the speaker. Currently, it is not clear whether, and if so, to what extent this manifestation of economy facilitates the hearer’s tasks of speech perception, decoding, and processing as well; it might actually render these tasks more difficult, as Langacker (1977, p. 105) argues:

Signal simplicity naturally benefits the speaker more than the listener, as it limits the complexity of the task of physically producing an utterance by reducing the number, length, and difficulty of the units to be articulated. Obviously, though, continued erosion of the units of expression will ultimately render the listener’s task more difficult.

It should be noted that economy in language structure resides in areas other than word length as well; in particular, frequency effects can also be observed at the level of the complexity of phonemes, as Zipf himself points out (1965a, pp. 49–129). Thus, for instance, across languages, non-aspirated stops are more frequent than aspirated stops, and voiceless stops are more frequent than voiced stops; but aspirated and voiced stops can also be regarded as phonetically more complex than non-aspirated and voiceless stops, respectively. Such observations result in a statistical correlation between high phonetic complexity and low frequency (and between low phonetic complexity and high frequency) which is analogous to the correlation between word length and discourse frequency dealt with above. Further, in diachronic development, complex phonemes are far more susceptible to reduction and simplification than less complex phonemes.

Another case in point is the important question of why the parameters of frequency and diversity are inversely proportional. Zipf (1965a, p. 21) alludes to an indirect explanation of this phenomenon: the mathematical law of permutation prescribes that the number of different collocations of units, such as letters or phonemes, increases as the length of the string of collocated units increases. Thus, in a system comprising the five units A–E, the total of logically possible collocations which are five units long exceeds the total of collocations which are, for instance, only two units long. Consequently, any language will have fewer short collocational strings (i.e. lexemes or grammemes) at its disposal than longer collocational strings. Thus, according to the law of permutation, the system-internal diversity of information strings is a function of the variable of length or phonetic size. But since according to Zipf’s empirical results there is a systematic correlation between the parameters of size and frequency, the parameters of frequency and diversity, of necessity, correlate as well. In other words, small size is concomitant with both low diversity and high frequency, and therefore the formula ‘‘small diversity is concomitant with high frequency, and vice versa’’ applies as well.

No matter how convincing this explanation for the correlation between diversity and frequency may appear to be at first glance, it is unclear whether it is suitable for motivating the situation holding in human languages because, presumably, no language ever utilizes all logically possible collocations of phonemes in its grammar and vocabulary, as Zipf (1965a, p. 21) himself points out. This is partly because certain combinations of phonemes are difficult to pronounce, but even if such combinations are disregarded, in any language, there will always remain a residue of phonetically well-formed collocations that are ‘‘out of service’’. Nevertheless, on the whole, an explanation of the frequency/diversity correlation via the law of permutation does not seem entirely unrealistic.
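
The arithmetic behind the permutation argument can be checked directly. The following Python sketch, which is an illustration and not Zipf’s own calculation, counts the strings that can be formed from the five units A–E at each length. It assumes that units may be repeated within a string, so that the count for length n is 5 to the power n; the helper name `count_strings` is hypothetical. For comparison, `math.perm` (available from Python 3.8) gives the count when no unit may be repeated.

```python
import math
from itertools import product

UNITS = "ABCDE"  # the five-unit system mentioned in the text

def count_strings(units, length):
    """Count strings of the given length by explicit enumeration (repetition allowed)."""
    return sum(1 for _ in product(units, repeat=length))

for n in range(1, 6):
    with_repetition = count_strings(UNITS, n)
    without_repetition = math.perm(len(UNITS), n)  # ordered selections without repetition
    print(n, with_repetition, without_repetition)
```

Under the first assumption there are 25 two-unit strings against 3125 five-unit strings; under the second, 20 against 120. Either way, the inventory of short strings is smaller than that of long ones, which is all the argument requires.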

3. Contemporary research into diachronic change

One of the important recent innovations in linguistic theory is grammaticalization theory, as articulated in works such as Heine et al. (1991), Hopper and Traugott (1993), or Traugott and König (1991). Grammaticalization theory confirms Zipf’s empirical findings in an impressive way. The extent of the parallelism between Zipf’s model of language and grammaticalization theory apparently has gone unnoticed so far, mainly because modern scholars hardly, if ever, deal in greater detail with those facets of Zipf’s model that go beyond the frequency/size principle, i.e. with the correlations pertaining to the parameters of meaning and age, which are focal ingredients of grammaticalization theory.

In Section 2.1.4 it is argued that one of the logically possible implications of Zipf’s frequency:age correlation regarding the behavior of borrowed EUs is that in the course of time, EUs might move upwards from layer to layer in the schematic representation of Zipf’s model given in Fig. 2. In other words, the longer a given EU is retained in a given language, the shorter, more frequent, and consequently, the more desemanticized it will become. By adding this dynamic component to Zipf’s model of language, all the major tenets of grammaticalization theory can be immediately derived from the latter. On this basis, Zipf’s model predicts that while traveling upwards from layer to layer in Fig. 2, a given EU increases its frequency index, decreases its size index, and increases its semantic scope index, if small semantic scope is defined as specificity in meaning and large semantic scope as generality in meaning. Changing the value for one variable in the four-parameter system depicted in Table 5 amounts to changing the values for the other three parameters as well. For instance, increasing the value for age will increase the values for frequency and semantic scope, and diminish the value for size.
It is precisely this kind of systematic interdependence between the variables that lies at the core of the empirical findings of grammaticalization theory. The basic tenet of grammaticalization theory is that grammar, ultimately, arises from the lexicon. Grammaticalization is the diachronic process by which EUs move out of the lexical level and proceed to the grammatical level, or the process by which EUs advance from a specific level of grammar to a more grammatical level. Symptoms of grammaticalization are increase in semantic generality or abstractness, reduction of phonetic substance, and increase of frequency of usage in discourse. The empirical fact that these diachronic developments occur simultaneously directly reflects, and thus underscores, the synchronic statistical correlations Zipf establishes between discourse frequency, phonetic size, and semantic scope.

Of course, neither Zipf’s correlations nor the earlier formula describing the process of grammaticalization can be taken as absolute in the sense that language change proceeds exclusively along the lines prescribed by the two models. For instance, presumably, in any language, there exist EUs which are old but infrequent, phonetically complex, and restricted in semantic scope, as well as EUs which are new but frequent, short, and less restricted in semantic scope. Thus, the model which is schematized in Fig. 2 merely represents statistical tendencies. Instances of language change which do not conform to the earlier formula which characterizes grammaticalization theory are dealt with extensively in Campbell (2001) and Newmeyer
(2001). For example, both semantic change without concurrent phonetic reduction and phonetic reduction without semantic change (as in regular sound change) are widely attested. This basic independence of the processes which together constitute grammaticalization has led several scholars to question the notion of grammaticalization theory (cf. Campbell, 2001; Joseph, 2001; Newmeyer, 2001; among others) and to unmask the term as ‘‘nothing more than a label for the conjunction of certain types of independently occurring linguistic changes’’ (Newmeyer, 2001, p. 191); thus, ‘‘. . .there is no such thing as grammaticalization, at least in so far as it might be regarded as a distinct grammatical phenomenon requiring a distinct set of principles for its explanation’’ (Newmeyer, 2001, p. 188). However, there undoubtedly exists a conspicuous tendency for increase in semantic generality (i.e. semantic ‘‘bleaching’’) and reduction of phonetic substance to co-occur, as Norde (2001, p. 233) notes:

Though it may be true that . . . phonetic erosion and semantic bleaching occur independently from one another, they are so frequently intertwined in language change that they deserve a descriptive framework of their own. If this conflation is the rule rather than the exception, there is no way of denying that a special status must be assigned to grammaticalization in a general theory of language change.

Put differently, the earlier quotation implies that language change has a prototype structure, with co-occurrence of semantic bleaching and reduction of phonetic substance as the prototypical core. There is definitely a need to integrate grammaticalization theory into a more comprehensive approach to language change which determines the nature of the interrelations between all processes contributing to this phenomenon, such as reanalysis, sound change, and borrowing, as Campbell (2001) and Newmeyer (2001) argue.
But even if this analytical step reveals that grammaticalization is only a specific aggregate of certain simultaneously occurring, basically independent mechanisms of change, the mere fact that this aggregate apparently describes something like the statistical norm in diachronic change suffices to elevate the phenomenon to the status of a particularly rewarding subject of research. After all, to use an analogy from language typology, statistical universals, i.e. language universals which capture statistical tendencies, are considered no less interesting than absolute universals, which represent cross-linguistically valid rules without exceptions.

Zipf could have foreseen the direction that research into the mechanisms of language change would take, i.e. the rise of grammaticalization theory, on the basis of his own model, but during the first half of the twentieth century, empirical data suited for proving this point were hard to come by because investigations into diachronic developments other than sound change did not occupy the top position in the research agenda at that time. Zipf, however, did notice certain processes of phonetic reduction in the lexicon of the variety of American English spoken at his time, such as shortening of telephone to phone, or of gasoline to gas (1965b, p. 66), and proposed, at least tentatively, a linkage of such processes with an increase in the frequency of usage of the respective lexemes:

When any object, act, relationship, or quality becomes so frequent in the experience of a speech-community that the word that names it develops a high frequency of occurrence in the stream of speech, the word will probably become truncated (Zipf, 1965a, p. 30).

One of the most striking claims of grammaticalization theory is the principle of unidirectionality (Givón, 1975, p. 196; Heine et al., 1991, p. 212): by and large, grammaticalization must be conceived of as a diachronic one-way street on which EUs proceed only from lower to higher semantic generality, from higher to lower phonetic complexity, and from lower to higher discourse frequency. The reverse processes are comparatively rare (Heine et al., 1991, p. 212); for an extensive discussion of counterexamples to the unidirectionality principle, cf. Campbell (2001), Newmeyer (2001), and Norde (2001).

Zipf’s model of language also foreshadowed other central components of grammaticalization theory, such as the diachronic development of affixes from morphosyntactically free elements, or the emergence of fusional forms. In fact, Zipf’s work, both in spirit and empirical fact, is so rich in hypotheses that anticipate modern functionalist approaches to language that discussing all these aspects would drastically exceed the limitations of space imposed on this paper. Consequently, this study focuses on the basic architecture of Zipf’s model only.

One of the possible explanations for the puzzling frequency:age correlation dealt with in Section 2.1.4 is assuming that the dynamic component of language change, which increases discourse frequency while decreasing phonetic size, affects low frequency/large size EUs more quickly than high frequency/small size EUs. Increasing conservatism in the higher-frequency ranks is the explanation of the correlation that Zipf himself envisaged. He interprets his data as suggesting that ‘‘. . .
innovating ‘Forces’ tend to impinge more successfully upon the forms and functions of less frequent acts at the outer periphery than those of the more frequent ones which serve as the more conservative core or matrix’’ (Zipf, 1965b, p. 120). As a matter of fact, more recent research, especially by Bybee, reveals that this principle of innovative inertia of high-frequency EUs manifests itself in many areas of phonological change. Thus, for instance, Bybee (2000, p. 65) reports that ‘‘the rate at which words undergo sound change is positively correlated with their text frequency’’; in particular, high-frequency EUs are more susceptible to reductive sound change than low-frequency EUs (Bybee, 2000, p. 67). Bybee’s findings explicitly challenge the Neogrammarian tenet that sound change is always regular, i.e. that it affects all lexical items which qualify for such a change at the same time. For instance, schwa deletion in English is more prevalent in high-frequency lexemes such as memory, salary, summary, and nursery than in low-frequency lexemes such as mammary, artillery, summery, and cursory (Bybee, 2000, p. 68). [It should be noted that these data refer to American English only; non-American varieties of English show different distributional patterns, and the process as a whole is influenced by socio-cultural factors (anonymous reviewer, p.c.). The fact that there is such variation, of course, does not rule out the possibility that schwa deletion in non-American varieties of English, and in different socio-cultural contexts, might, in principle, be sensitive to discourse frequency as
well.] Likewise, word-final t/d in English is less likely to delete as past tense suffix -ed, as in walk-ed and learn-ed, than as part of irregular past tense forms such as kep-t and tol-d, each of which generally has a higher discourse frequency than each of the lexical roots combined with -ed (Bybee, 2000, p. 74).

4. Synthesis

Zipf was convinced that virtually every aspect of his discourse-based model of language could be motivated via the economy principle (cf. Section 2.2). However, more recent research in linguistics and other branches of science has brought to light new variables that interact with individual parameters in the model, and new explanations as well. Of primary interest in this context is identifying the cause of the interdependency of the four parameters frequency, size, semantic scope, and age: understanding the mechanics of the coherence of Zipf’s system would amount to understanding a lot more about human language.

4.1. The frequency/size correlation

Before turning to explanatory models offered in linguistics, an approach devised and popularized by the mathematician B. Mandelbrot, who regards Zipf’s frequency/size principle as a necessary result of the laws of probability, will be considered. According to Mandelbrot (1953a,b, 1954, 1961, 1977), discourse can be interpreted as a string of phonemic units whose distribution is regulated by stochastic processes. In order to be fully applicable to language, Mandelbrot’s model requires the use of an additional phonemic unit ‘‘pause’’, which separates the EUs to be counted. (Introducing this element is only partly naturalistic because language employs pauses only for separating lexemes from lexemes, but not for separating morphemes from lexemes and other morphemes.) The status of phonemic units in Mandelbrot’s model is that of keys in a keyboard, with the special key ‘‘space’’ being assigned to the unit ‘‘pause’’. Mandelbrot (e.g. 1977, p. 344ff) argues that producing random strings of units, maybe through computer-generated simulation of stochastic processes, will, according to the laws of probability, yield exactly the same distributional patterns as natural discourse: the frequency of specific strings will increase as their length decreases.
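
A simulation of the kind Mandelbrot has in mind is easy to set up. The Python sketch below is a minimal illustration, not a reconstruction of Mandelbrot’s own calculations: it types random keystrokes on a keyboard of five letters plus a space key, all equally probable, splits the output at the spaces, and averages the token frequency of the distinct strings of each length. The keyboard size, the number of keystrokes, the fixed seed, and the function names are assumptions chosen for the example.

```python
import random
from collections import Counter

def random_typing(n_keystrokes, letters="abcde", seed=0):
    """Return the 'words' produced by hitting letter keys and the space key at random."""
    rng = random.Random(seed)   # fixed seed so that the run is repeatable
    keys = letters + " "        # the space key is as likely as any letter key
    text = "".join(rng.choices(keys, k=n_keystrokes))
    return text.split()         # split() drops empty strings between adjacent spaces

def mean_frequency_by_length(words):
    """Average number of occurrences per distinct string, grouped by string length."""
    counts = Counter(words)
    grouped = {}
    for word, freq in counts.items():
        grouped.setdefault(len(word), []).append(freq)
    return {length: sum(f) / len(f) for length, f in sorted(grouped.items())}

words = random_typing(200_000)
means = mean_frequency_by_length(words)
for length in (1, 2, 3):
    print(length, round(means[length], 1))
```

Under these assumptions a specific one-letter string is expected about six times as often as a specific two-letter string, since each additional keystroke multiplies the probability of a given string by 1/6.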
The same results will be obtained when monkeys create analogous strings of gobbledygook on a typewriter (e.g. Mandelbrot, 1961; Miller, 1957); again, in such a hypothetical experiment, the space key, which defines unit length, is just one of the keys contained in the keyboard, and will be hit with the same probability as all other keys. Such ‘‘fake language’’ can also be produced by means of dice, if one of the six numbers which constitute the ‘‘phonemic inventory’’ in this case is assigned the status of the placeholder for the unit ‘‘pause’’. The results of the dice experiment match those of the arrangements described earlier: individual strings are the more frequent the shorter they are. So if chance creates the same distributional patterns as human language, does it follow from this that the structure of human language is the product of chance? This
interpretation is both intuitively and rationally unappealing. Even the non-linguist will object that we do not choose words on the basis of chance as we talk, but rather, on the basis of their meaning (also, cf. Zipf, 1965a, p. 29). But most importantly, it is not legitimate to argue that chance must rule language just because language produces the same distributional patterns as chance. This convergence might be accidental, although it is quite likely that both phenomena are somehow connected.

It should be remembered at this point that language is by no means the only domain of human existence in which statistical distributions characterized by the formula ‘‘the smaller, the more frequent and vice versa’’ are encountered. On the contrary, most objects of investigation which relate to social science will display such distributional patterns, as, for instance, the population sizes of cities (cf. Section 1). The phenomenon is too widespread to be relegated to the status of a marginal curiosity; language is just one of many aspects of human existence in which Zipf’s Law applies. Zipf’s hyperbolic frequency distributions are about as prevalent in social sciences as Gaussian distributions are in the natural sciences (Mandelbrot, 1977, p. 404), which implies that Zipf’s Law captures a very fundamental regularity in the universe surrounding human beings. Future research might identify some common motivation for all manifestations of Zipf’s Law, and the possibility that this explanation might somehow be related to the laws of probability addressed above cannot be ruled out, of course. But it seems advisable to leave the solution of this problem to interdisciplinary research, given the complexity of the issue.

Zipf has been attacked by Mandelbrot (1977, p. 423) for trying to explain his empirical data via the economy principle, but Mandelbrot’s stochastic model is not immune to criticism either.
Its fundamental weakness is that it deals exclusively with the frequency:size correlation of linguistic items and ignores other essential aspects of Zipf’s model, such as the behavior of the parameters of meaning and age. In the light of more recent findings particularly of grammaticalization theory (for more details, see later), the role of both meaning and age in a joint approach to language based on Zipf’s model turns out to be as essential as the roles of frequency and phonetic size. Given this, it could be argued that the correlation between frequency and size might not be a direct one. If it can be shown, for instance, that substantial causal relationships hold between (a) frequency and meaning, and (b) phonetic size and meaning, a causal chain ‘‘frequency–meaning–phonetic size’’ can be established. In this case, the dependence relation between frequency and size would be secondary and indirect. This would render the search for a direct motivation of the frequency:size correlation pointless, to the effect that Mandelbrot’s explanatory efforts could be dismissed as unnecessary. Such prospects, in turn, might ultimately lead to an upgrading of the status of Zipf’s model of language in linguistic theory: Miller’s skeptical assessment of Zipf’s work in his introduction to the 1965 edition of Zipf’s The psycho-biology of language, which presumably contributed a lot to making Zipf’s ideas unattractive to the linguistic audience at large, was, for the most part, based on Mandelbrot’s criticism.

Professional linguists have also attempted to account for the existence of the frequency:size correlation. On encountering numerous ‘‘truncations’’ of lexical items in English, such as telephone > phone or gasoline > gas, which are concomitant with
an increase in the discourse frequency of the respective lexemes, but never processes such as lengthening resulting from frequency or shortening resulting from rarity in his empirical data, Zipf (1965a, p. 36) concludes that ‘‘where frequency and abbreviatory substitution are connected, the frequency is the cause of the abbreviatory substitution’’. According to Zipf (1965a, p. 291), the motivation for phonetic reduction of high-frequency EUs is, ultimately, the principle of economy, since abbreviation increases the velocity of speech. Zipf’s view is supported by Haiman (1985, p. 150): ‘‘Based on the economy principle, frequently used words should be shorter, and less used forms longer’’. Others also assume a close causal connection between frequency and phonetic reduction. According to Bybee (1994, p. 297), ‘‘differential reduction due to frequency is pervasive throughout the forms of a language’’. Heine (1993, p. 111f) feels that ‘‘it is the pragmatic factor of frequency of use that appears to be most immediately responsible for erosion’’. For related approaches in which frequency is assigned the status of a factor that shapes linguistic form, cf. for instance, DuBois (1985), Givón (1979), and Hopper (1987).

Could the phonetic size of an EU also influence the frequency of its occurrence in discourse? It is at least theoretically conceivable that a certain EU is used ever more frequently just because it is short, which renders its use more economical than that of competing longer EUs. But since there are apparently no empirical data which support this hypothesis, it will be taken for granted for now that the frequency parameter, either directly or indirectly, controls the size parameter.

4.2. The meaning/size correlation

Contemporary empirical research sheds some more light on the nature of Zipf’s meaning:size correlation as well. Regarding chronological order in language change, Heine et al. (1991, p.
213) propose the general rule that functional or semantic change precedes formal change and thus, among other things, phonetic reduction processes. Such lack of simultaneity of the developments involved may be taken as indicating a causal sequence. Thus, it can be hypothesized that meaning change, either directly or indirectly, brings about phonetic reduction. On the other hand, Hopper and Traugott (1993, p. 207) report cases in which ‘‘meaning change accompanies rather than follows syntactic change’’, which, of course, do not allow chronological ordering of semantic and formal change.

Could the phonetic size of an EU also influence its semantic scope? Intuitively, this causal connection seems less plausible, and empirical data documenting its effectiveness are not available.

4.3. The frequency/meaning correlation

A well-known axiom in logic states that the intension or the semantic content of an EU is inversely proportional to its extension or range of applicability to extra-linguistic states of affairs. Low intension corresponds to high extension, and
high intension corresponds to low extension. Low intension is equivalent to high abstractness of meaning, or lack of semantic specificity. But the more abstract and unspecific the meaning of an EU, the higher the number of contexts it can be used in will be; consequently, the higher the index for what has so far been referred to as semantic scope will be for a given EU. Heine et al. (1991, p. 39) explicitly point out that the process of grammaticalization ‘‘has the effect that the intensional content of the concept concerned is reduced while its extension is increased; that is, compared to the source structure, the target structure has a smaller intension but a larger extension’’. On these grounds, increase in semantic scope, of necessity, will result in increase in frequency. Thus, there is a direct dependence relation between the parameters of meaning and frequency. Zipf’s claim that the number of meanings of EUs listed in a dictionary increases as their discourse frequency increases (cf. Section 2.1.3) can be interpreted as one of the reflexes of this fundamental correlation between meaning and discourse frequency.

Independent empirical evidence for the assumption that the parameters of frequency and meaning stand in a close dependence relation comes from recent research in cognitive science. Dehaene and Mehler (1992) report that on the whole, the discourse frequency of number words decreases gradually as the quantity indicated by a given number word increases. This can be interpreted as a manifestation of Benford’s Law (Benford, 1938), which captures the general statistical rule that in many situations in the real world in which numbers occur which are socially or naturally related, 1 appears as the first digit in about 30% of the cases, 2 in about 18% of the cases, 3 in about 12% of the cases, and so on. Contexts in which the law is valid include figures on budget, income tax, population, the size of river basins, and the stock market.
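
The percentages cited here agree with the logarithmic formula by which Benford’s Law is usually stated, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The formula is not given in the discussion above and is supplied here as background; the Python sketch below simply evaluates it, and the function name is hypothetical.

```python
import math

def benford_probability(digit):
    """Probability that `digit` (1-9) is the leading digit under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(d, f"{100 * p:.1f}%")
```

The values for 1, 2, and 3 come out at roughly 30.1%, 17.6%, and 12.5%, and the nine probabilities sum to 1.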
Benford’s Law is regarded as one of the numerous manifestations of Zipf’s Law. The decreasing frequency of higher number words can be explained in terms of decreasing ease of processing. For instance, a human being may be able to determine at a glance how many, say, oranges are lying on a table if the number of oranges is not too high (six might be a realistic limit). If the number is higher, counting will become necessary to determine the exact number of oranges. The process of counting will, obviously, take longer as the number of oranges to be counted increases, with the result that counting is dispensed with entirely for higher numbers, and approximations and estimations are employed to indicate quantities, at least in everyday non-scientific discourse. The phenomenon of decreasing frequency of higher numbers in discourse can thus be motivated by limited processing capacities of the human brain, and it can also be interpreted as a time- and energy-saving measure, which brings the factor of economy into play once more. In the case of number words, the parameter of meaning may not have a direct impact on discourse frequency, but the dependence is nevertheless real: the meaning of number words dictates the cognitive costs of their use in discourse, which, in turn, determines the frequency of their occurrence.

Could the causal relationship between frequency and meaning be reversed as well, so that frequency controls meaning? Since empirical data attesting such a causal chain seem to be lacking, this question has to be answered in the negative.

4.4. The big picture

In summarizing the earlier findings regarding the basic architecture of Zipf’s model of language, particularly with respect to the overall causality relations that link individual parameters, the following statements can be made:

(A) The parameter of frequency controls the parameter of phonetic size, either via Mandelbrot’s stochastic model or via Zipf’s economy principle.
(B) Grammaticalization theory establishes at least a temporal order between semantic expansion and reduction in phonetic size, which implies the possibility that the parameter of meaning controls the parameter of phonetic size.
(C) The parameter of meaning controls the parameter of discourse frequency.

The final challenge in synthesizing Zipf’s pioneering model of language and the findings of more recent research is arranging all relevant components in an integrated model. Turning to tenets (A) and (B) first, the parameter of size figures as a result rather than as a cause in both cases. The question that arises at this point concerns the order of the parameters of frequency and meaning in the causal chain. The link established in (B) is just a temporal one, so that a direct causal connection does not necessarily hold here. The causal connection described in (A) can be regarded as more direct. Thus, on the basis of tenets (A) and (B), the following causal chain can be posited:

meaning > frequency > size

Tenet (C) confirms this arrangement, since it establishes a direct dependence of frequency on meaning. On the basis of their empirical research on language change, Bybee and Pagliuca (1985, p. 72) propose an analogous causal chain: ‘‘As the meaning generalizes and the range of use widens, the frequency increases and this leads automatically to phonological reduction’’.

A final question to be asked about Zipf’s model is whether the system is influenced by any additional external factors. For instance, formal change, i.e.
phonetic reduction, is not necessarily a response to frequency pressures. Phonetic reduction of individual EUs may also be brought about by regular sound change. Further, occasionally, the discourse frequency of EUs increases while their meaning remains unchanged. This seems to imply the possibility that system-external factors may operate on frequency too. For instance, the discourse frequency of the lexical item computer should have increased dramatically in more recent years (in English as well as in many other languages), although the semantic content of the word did not change. One might feel tempted to speculate that the behavior of the frequency parameter is not conditioned by an interference of semantics in this case. This conclusion, however, would be a fallacy because if it were not for the changing cultural relevance of computers, frequency of use of the word computer would have remained unchanged; i.e. increase in frequency of the word computer took place only because of the increased cultural importance of its meaning––frequency still depends on meaning in this case. However, culture definitely is a variable that can be added to

the list of factors that have an impact on Zipf’s basic model; thus, an extended causal chain culture > semantics > frequency > size can be posited (also, cf. Croft, 2000, p. 76).

Further, it has often been stated in the more recent literature that there is an interaction between frequency and certain cognitive parameters. For instance, high-frequency items can be accessed and processed more rapidly than low-frequency items (Rubenstein et al., 1970). In addition, the cognitive phenomenon of entrenchment turns out to be intimately connected with discourse frequency (Langacker, 1987; also, cf. Bybee, 1985; Langacker, 2000). The occurrence of a linguistic event, i.e. the use of an EU or a more complex configuration, “leaves some kind of neurochemical trace that facilitates recurrence” (Langacker, 1987, p. 100). Repetition fosters entrenchment:

Every use of a structure has a positive impact on its degree of entrenchment, whereas extended periods of disuse have a negative impact. With repeated use, a novel structure becomes progressively entrenched . . . moreover, units are variably entrenched depending on the frequency of their occurrence (Langacker, 1987, p. 58).

While Langacker always portrays the relationship between frequency and entrenchment as one in which frequency is the cause of entrenchment, never vice versa, entrenchment has sometimes been interpreted by others as having the effect of increasing the repetition of an element and thus its discourse frequency. Higher discourse frequency, in turn, brings about deeper entrenchment, which further raises the rate of repetition. On this interpretation, entrenchment creates a kind of feedback loop which makes frequency increase self-perpetuating. This view is problematic because, as Langacker points out (cf. previous quotation), the reverse process of “disentrenchment” through decreasing frequency of use occurs as well. In language change, this process manifests itself in the phenomenon of obsolescence.

5. Conclusion

In the preceding sections, it has been argued that Zipf’s model of language, which has largely fallen into oblivion in general linguistics, is in fact strikingly modern. Zipf’s work was complex and visionary enough to anticipate various important aspects of current functionally oriented language theory. Today, the principle of economy, which Zipf regarded as the fundamental factor motivating the statistical regularities he uncovered in numerous areas of human existence, is firmly established as one of the important explanatory parameters in current functionalist approaches to language and in cognitive science. Further, contemporary non-generative approaches to language are increasingly distancing themselves from the strict traditional bipartition of grammar and lexicon. There is ample evidence for the
hypothesis that the connection between grammar and lexicon is much closer than has formerly been assumed; moreover, both grammar and lexicon are intimately connected with discourse. Not only does grammar develop out of the lexicon, as grammaticalization theory states; this development itself takes place during language use, i.e. in discourse. Ultimately, every aspect of linguistic structure arises out of discourse; Bybee and Hopper (2001b, p. 63) argue that “linguistic form is neither fixed nor aprioristically determined; rather structure is shaped . . . by discourse use”. Thus, language “is not what a speaker has, but rather what he does” (Langacker, 1987, p. 382). The birthplace of grammar is discourse, and the matter it emerges from is the lexicon. Systematically linking the domains of grammar, lexicon, and discourse together holds great promise for future research, and claiming that Zipf has provided the most comprehensive and coherent foundation of such an integrated approach hitherto available should not overstate the case.

Moreover, Zipf’s holistic treatment of language opens up new perspectives for language typology. A basic question to be dealt with in this context is whether the statistical correlations Zipf discovered can be observed in all languages. Incipient research on this subject, such as Fenk-Oczlon and Fenk (1999), seems to suggest that at least the frequency:size correlation is, in fact, universal.

However, the special value of Zipf’s model lies not only in its theoretical implications. It also offers a sound methodology, i.e. an empirical approach based on unequivocally definable units of investigation. Discourse, after all, is a string of sounds uttered in temporal succession which can be partitioned into specific sound combinations which have the property of repeating themselves, and which each fulfill a consistent set of functions.
These ever-recurring building blocks of speech, their phonetic size, and their discourse frequency, are physically real, unambiguously defined variables. An empirical method that proceeds by quantifying sound strings on the basis of their length and frequency should be unimpeachable enough to be of timeless validity.

In addition, Zipf’s model has the potential of opening up new dimensions in research on grammaticalization. A major objective of science in general is making things predictable. The issues that might face grammaticalization theory in the future concern, for instance, predicting the grammaticalization paths individual EUs will take in a given language with greater accuracy; or, at an even higher level of subtlety, predicting what EUs in a given language will grammaticalize at all, and at what time. But solving such problems will require a holistic approach to language like the one proposed by Zipf, in which EUs which are part of a given linguistic system are studied both with respect to their individual properties and with respect to the relationships holding between them, as well as within the overall system they are embedded in. It is, admittedly, doubtful whether a predictive model of language change can ever be constructed (Croft, 2000, p. 2), but even the mere attempt to find out why this should be impossible is likely to provide further insights into the nature of human language.

Another logical implication to be derived from combining Zipf’s model with grammaticalization theory is that there might be quantitative limits imposed on grammaticalization processes. According to Zipf’s model, which is schematically
represented in Fig. 2, each class of EUs defined by a specific frequency index contains a higher number of individual EUs than each higher-ranking class, i.e. each class with a higher-frequency index and a lower phonetic size index, and fewer individual EUs than each lower-ranking class, i.e. each class with a lower-frequency index and a higher phonetic size index. This rule precludes the possibility of infinite enlargement of individual frequency classes through excessive grammaticalization, since this process might result in the quantitative inflation of a given class to the point at which it includes a higher number of individual EUs than the lower-ranking classes. Thus, in order to ensure that individual classes comply with Zipf’s Law, grammaticalization processes must be regulated either through simple quantitative restrictions or through a force that creates a balance between innovation and obsolescence within the overall system. Since an increase in the population of EUs within frequency classes can also be brought about by borrowing, it can be hypothesized that such a regulatory mechanism applies to the process of borrowing as well.

The earlier discussion, presumably, merely scratches the surface of the innovative potential inherent in Zipf’s model of language. But even so, or maybe just because this is so, it should be safe to say that it is time for a Zipf Renaissance.
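The constraint on frequency classes stated above can be made concrete with a small sketch. The Python code below is only an illustration, not a procedure taken from Zipf or from the studies cited here: the function names, the toy token list SAMPLE, and the use of orthographic length as a stand-in for phonetic size are all assumptions made for the example. It groups units by discourse frequency, reports the size and mean unit length of each class, and checks whether class size grows strictly from higher-ranking to lower-ranking classes.

```python
from collections import Counter, defaultdict

# Hypothetical toy sample; a real study would use a transcribed discourse corpus.
SAMPLE = ["the"] * 3 + ["of", "in"] * 2 + ["house", "garden", "window"]


def frequency_classes(tokens):
    """Map each observed frequency to the sorted list of units occurring that often."""
    counts = Counter(tokens)
    classes = defaultdict(list)
    for unit, freq in counts.items():
        classes[freq].append(unit)
    return {freq: sorted(units) for freq, units in classes.items()}


def class_profile(tokens):
    """Return (frequency, class size, mean unit length) rows, highest frequency first.

    Length is counted in characters here, which is only a rough proxy
    for phonetic size (an assumption of this sketch).
    """
    classes = frequency_classes(tokens)
    rows = []
    for freq in sorted(classes, reverse=True):
        units = classes[freq]
        mean_length = sum(len(unit) for unit in units) / len(units)
        rows.append((freq, len(units), mean_length))
    return rows


def sizes_grow_down_the_ranks(rows):
    """True if every class has more members than each higher-frequency class."""
    sizes = [size for _, size, _ in rows]
    # Strict growth between neighbours implies strict growth between all pairs.
    return all(upper < lower for upper, lower in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    profile = class_profile(SAMPLE)
    for freq, size, mean_length in profile:
        print(f"frequency {freq}: {size} unit(s), mean length {mean_length:.2f}")
    print("class sizes grow down the ranks:", sizes_grow_down_the_ranks(profile))
```

On a sample this small the mean lengths are not meaningful; the point is only that the variables involved (unit, length, frequency, class size) can be counted directly, so that a hypothetical inflation of one class through grammaticalization or borrowing would show up as a failure of the final check.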

References

Aikhenvald, A.Y., in press. Language Contact in Amazonia. Oxford University Press, Oxford.
Benford, F., 1938. The law of anomalous numbers. Proceedings of the American Philosophical Society 78, 551–572.
Breiter, M.A., 1994. Length of Chinese words in relation to their other systemic features. Journal of Quantitative Linguistics 1, 224–231.
Bybee, J.L., 1985. Morphology: A Study of the Relation between Meaning and Form. Benjamins, Amsterdam.
Bybee, J.L., 1994. A view of phonology from a cognitive and functional perspective. Cognitive Linguistics 5, 285–305.
Bybee, J.L., 2000. The phonology of the lexicon: evidence from lexical diffusion. In: Barlow, M., Kemmer, S. (Eds.), pp. 65–85.
Bybee, J.L., Hopper, P. (Eds.), 2001a. Frequency and the Emergence of Linguistic Structure. Benjamins, Amsterdam.
Bybee, J.L., Hopper, P., 2001b. Introduction to frequency and the emergence of linguistic structure. In: Bybee, J.L., Hopper, P. (Eds.), pp. 1–24.
Bybee, J.L., Pagliuca, W., 1985. Cross-linguistic comparison and the development of grammatical meaning. In: Fisiak, J. (Ed.), Historical Semantics, Historical Word Formation. Mouton de Gruyter, Berlin, pp. 59–83.
Campbell, L., 2001. What’s wrong with grammaticalization? Language Sciences 23, 113–161.
Croft, W., 2000. Explaining Language Change. Longman, Harlow.
Dehaene, S., Mehler, J., 1992. Cross-linguistic regularities in the frequency of number words. Cognition 43, 1–29.
DuBois, J., 1985. Competing motivations. In: Haiman, J. (Ed.), Iconicity in Syntax. Benjamins, Amsterdam, pp. 343–365.
Fenk-Oczlon, G., Fenk, A., 1999. Cognition, quantitative linguistics, and holistic typology. Linguistic Typology 3, 151–177.
Gell-Mann, M., 1994. The Quark and the Jaguar. Freeman, New York.
Givón, T., 1975. Serial verbs and syntactic change: Niger-Congo. In: Li, C. (Ed.), Word Order and Word Order Change. University of Texas Press, Austin, pp. 47–112.
Givón, T., 1979. On Understanding Grammar. Academic Press, New York.
Guiter, H., Arapov, M.V. (Eds.), 1982. Studies on Zipf’s Law. Brockmeyer, Bochum.
Haiman, J., 1985. Natural Syntax. Iconicity and Erosion. Cambridge University Press, Cambridge.
Harris, A.C., Campbell, L., 1995. Historical Syntax in Cross-linguistic Perspective. Cambridge University Press, Cambridge.
Heine, B., 1993. Auxiliaries: Cognitive Forces and Grammaticalization. Oxford University Press, Oxford.
Heine, B., Claudi, U., Hünnemeyer, F., 1991. Grammaticalization. A Conceptual Framework. University of Chicago Press, Chicago.
Hopper, P.J., 1987. Emergent grammar. BLS 13, 139–157.
Hopper, P.J., Traugott, E.C., 1993. Grammaticalization. Cambridge University Press, Cambridge.
Joseph, B., 2001. Is there such a thing as “grammaticalization?”. Language Sciences 23, 163–186.
Köhler, R., 1986. Zur linguistischen Synergetik. Struktur und Dynamik der Lexik. Brockmeyer, Bochum.
Langacker, R., 1977. Syntactic reanalysis. In: Li, C. (Ed.), Mechanisms of Syntactic Change. University of Texas Press, Austin, pp. 59–139.
Langacker, R., 1987. Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites. Stanford University Press, Stanford.
Langacker, R., 2000. A dynamic usage-based model. In: Barlow, M., Kemmer, S. (Eds.), pp. 1–63.
Levickij, V.V., Kiiko, J.J., Spolnicka, S.V., 1996. Quantitative analysis of verb polysemy in Modern German. Journal of Quantitative Linguistics 3, 132–135.
Mandelbrot, B., 1953a. An informal theory of statistical structure of language. In: Jackson, W. (Ed.), Communication Theory. Butterworths, London, pp. 486–502.
Mandelbrot, B., 1953b. Contribution à la théorie mathématique des jeux de communication. Institute of Statistics, University of Paris, Paris.
Mandelbrot, B., 1954. Structure formelle des textes et communication. Word 10, 1–27.
Mandelbrot, B., 1961. On the theory of word frequencies and on related Markovian models of discourse. In: Proceedings of Symposia in Applied Mathematics, vol. 12. American Mathematical Society, Providence, RI, pp. 190–219.
Mandelbrot, B., 1977. The Fractal Geometry of Nature. Freeman, New York.
Miller, G.A., 1957. Some effects of intermittent silence. American Journal of Psychology 70, 311–314.
Miller, G.A., 1965. Introduction to G.K. Zipf’s The psycho-biology of language. In: Zipf, G.K. (Ed.), The Psycho-biology of Language, v–xv, second ed. MIT, Cambridge, MA.
Moravcsik, E.A., 1978. Language contact. In: Greenberg, J.H., Ferguson, C.A., Moravcsik, E.A. (Eds.), Universals of Human Language, vol. 1. Stanford University Press, Stanford, pp. 93–123.
Newmeyer, F.J., 2001. Deconstructing grammaticalization. Language Sciences 23, 187–229.
Norde, M., 2001. Deflexion as a counterdirectional factor in grammatical change. Language Sciences 23, 231–264.
Rubenstein, H., Garfield, L., Milliken, J.A., 1970. Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior 9, 487–492.
Traugott, E.C., König, E. (Eds.), 1991. Approaches to Grammaticalization, vol. 1: Focus on Theoretical and Methodological Issues. Benjamins, Amsterdam.
Tuldava, J. (Ed.), 1995. Methods in Quantitative Linguistics. Wissenschaftlicher Verlag Trier, Trier.
Zipf, G.K., 1965a. The Psycho-biology of Language (1935). MIT, Cambridge, MA.
Zipf, G.K., 1965b. Human Behavior and the Principle of Least Effort (1949). Hafner, New York.