System, Vol. 18, No. 2, pp. 209-220, 1990
Printed in Great Britain
0346-251X/90 $3.00 + 0.00
© 1990 Pergamon Press plc
THE IDEA OF A LEXICAL META-SYLLABUS
STEVEN D. TRIPP
School of Education,
University of Kansas, USA
There are competing orientations in language teaching syllabus design, resulting in a lack of clarity about the goals of the selection stage of curriculum planning. Empirical studies of the statistical distribution of lexical items in spoken and written language reveal regularities which could be exploited for curriculum design. Although the available data are not sufficient to construct a lexical syllabus, the potential of such a syllabus for unifying syllabus design is clear. Recommendations for further research are made.
INTRODUCTION

In 1984, Michael Swan enumerated some of the types of syllabuses needed to teach a language properly. He stated that at least three formal syllabuses, covering structures, words, and pronunciation, are necessary. A fourth syllabus covering graphics may also be necessary if the student does not know the Roman alphabet. Additionally, instructional designers would need at least four meaning syllabuses, covering functions, notions, situations, and subject matter. On top of these, we need a set of skills syllabuses which specify performance. The problem facing curriculum designers is how to combine all these syllabuses into a coherent approach.

A fundamental aspect of curriculum design is the selection and sequencing of content. Mackey (1965), in his book, Language Teaching Analysis, wrote that:

All teaching, whether good or bad, must include some sort of selection, some sort of gradation, some sort of presentation, and some sort of repetition. Selection, because it is impossible to teach the whole of a field of knowledge; we are forced to select the part of it that we wish to teach. Gradation, because it is impossible to teach all of what we have selected at once; we are forced to put something before or after something else. Presentation, because it is impossible to teach without communicating or trying to communicate something to somebody. Repetition, because it is impossible to learn a skill from a single instance; all skill depends on practice. (Mackey, 1965: p. 157)
The bulk of the research published in language teaching journals is about language learning and classroom techniques. In other words, it is either concerned with presentation and repetition or not concerned directly with instruction at all. In recent years there has been little published on the subject of selection and gradation. By my count, of 136 articles published in three journals read by language teachers, only eight could be said to concern
the selection and sequencing aspect of language teaching. Research, where it has dealt with teaching at all, has dealt almost exclusively with how to teach and not what to teach.

This paper is an attempt to look at the question of selection and sequencing in the light of recent research into the nature of spoken English. Specifically, it argues that a lexical meta-syllabus based upon statistical regularities in the spoken and written languages would be a useful tool for curriculum designers. Although we lack sufficient information to construct such a syllabus at the present time, an analysis of existing data is suggestive of the kinds of information which could act as a unifying thread in a variety of language teaching approaches.

To illustrate the kinds of information which could be useful to curriculum designers, this paper first presents a re-analysis of the data in Hartvig Dahl's (1979) Word Frequencies of Spoken American English. Dahl's study is based on a 1,058,888 word corpus drawn from tape recordings of 19 speakers in eight American cities. Dahl lists 17,871 different types¹ that are in use in spoken English. Of these 17,871 types just 42 are sufficient to account for 50% of spoken English. In addition, 848 types will account for 90% of spoken English. Plainly, a very small number of words make up a very large part of the spoken English language. This paper will look in detail at those words that make up the bulk of spoken English and make some suggestions about how these data could be used in curriculum design.

There have been several large studies of the English language before, which have been used to support instructional purposes. Notably, the Thorndike and Lorge study, which led to A Teacher's Word Book of 30,000 Words (1944), and Michael West's General Service List of English Words (1953) have been in use for many years. What new knowledge could possibly come from another large study of English word frequencies? Dahl attempts to answer this question in the introduction to his book as follows:

The answer is that it is now apparent that the differences between spoken and written English are more substantial than many have suspected. Aside from important (and largely uninvestigated) differences in syntactic structure there are fundamental differences in total vocabularies and their distributions, in rates of high-frequency words, and in the presence of certain special classes of words (e.g., profanity). (Dahl, 1979: p. vii)
THE ORIGINAL DATA
To demonstrate clearly that there are important statistical differences between spoken and written English, Table 1 is reproduced from Dahl (p. vii); it compares Dahl's corpus to the Kucera and Francis (Brown University) corpus. This table shows, for example, that a mere 42 types account for 50% of the spoken corpus. In other words, of the approximately 1,000,000 words counted, about 500,000 of them are the same 42 words used repeatedly. We can see that in Kucera and Francis more than three times as many words are required to reach the same 50% level. Looking at the 90% level, it can be seen that the written corpus needs nearly ten times as many words as Dahl's, a difference of more than seven thousand words.
Table 1. Number of types that account for specified percentages of tokens

Per cent tokens accounted for    Spoken (Dahl)    Written (Kucera and Francis)
 50                                      42                     133
 75                                     183                   1,788
 80                                     270                   2,827
 85                                     436                   4,572
 90                                     848                   7,955
 95                                   2,243                  16,217
 98                                   5,669                  30,124
100                                  17,871                  50,406
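As a concrete illustration of how a coverage table of this kind is computed (this sketch is not part of Dahl's analysis), the following short Python program walks a descending frequency list and records how many types are needed to reach each coverage level. The miniature word-count dictionary and the variable names are invented for illustration; Dahl's actual counts would be substituted in practice.

# Sketch: how many types are needed to account for given percentages of tokens.
# The tiny frequency list below is illustrative only, not Dahl's data.
freq = {"I": 500, "and": 400, "the": 300, "to": 250, "you": 200,
        "know": 150, "feel": 100, "time": 50, "people": 30, "thing": 20}

total_tokens = sum(freq.values())
targets = [50, 75, 80, 85, 90, 95, 98, 100]

running = 0
types_needed = {}
# Walk the list from the most frequent type downwards, noting the rank at
# which each target percentage of tokens is first reached.
for rank, (word, count) in enumerate(sorted(freq.items(), key=lambda kv: kv[1], reverse=True), start=1):
    running += count
    coverage = 100 * running / total_tokens
    for t in targets:
        if t not in types_needed and coverage >= t:
            types_needed[t] = rank

for t in targets:
    print(f"{t}% of tokens accounted for by the first {types_needed[t]} types")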
In addition, there are significant differences in the rate of occurrence of certain high-frequency words. For example, the pronouns I and it are about 10 times more frequent in spoken English than in written English (Dahl, 1979). On the other hand, the articles a and the are more than four and 17 times more frequent, respectively, in written text. Differences as great as these should not be ignored in the process of selecting the content of a course in spoken English. Since the range of vocabulary used in spoken English is much narrower, it suggests that courses designed to improve listening and speaking skills can be based on a fairly severely constrained vocabulary set. It also suggests that emphasis on articles may be reduced while emphasis on pronouns may need to be increased.

Before beginning analysis of the data, certain limitations of Dahl's corpus should be noted. The corpus was transcribed from tape recordings of psychoanalytic cases. The technique of elicitation in these cases was free-association. Dahl argues that this produces a representative sample of every-day conversation. The reader might disagree. In the absence of better data the question cannot be answered definitively. However, as he readily admits, this is not a random sample of American speech. It might have been better to call his study Word Frequencies of Private Spoken American English.

Because of the nature of the sample, the word list was modified in five ways for the present purposes. First, there are some types that are inappropriate for curriculum design. Unintelligible words were transcribed as Z_ _ _. This turned out to be the 21st most common type in the corpus. Although it is interesting that about one per cent of the speech was unintelligible, even to several native listeners, it has little applicability for instructional design. Secondly, the types uh, yeah, ah, oh, uhm, eh, gee, huh, uhum, and hmm have been eliminated. Some of these words are highly frequent and have fairly clear-cut uses. There may be a place for them in a teaching program, especially a course in spoken English, and their elimination may be unjustified. Thirdly, fifteen words with sexual or profane meanings were eliminated because they are most likely artifacts of the psychoanalytic process and are inappropriate to many educational situations. Fourthly, in the original transcription process, proper names were encoded to ensure privacy; I eliminated these code words (like MCI). Finally, numerals were also eliminated.
In analyzing these data for pedagogical purposes, it was necessary to classify words by part of speech. Since many frequent words may reasonably be assigned to several categories, I have done so. This was done primarily by intuition and is therefore subject to error. In addition, each time a word is assigned to two parts of speech the number of words in the overall list increases by one. Naturally, this skews the calculated statistics slightly. To some extent, however, the "new" words replace the words that were eliminated. Additionally, some of the types in the list are not readily classifiable into a particular part of speech. For example, is the type I'm a pronoun or a verb? Types like this have been classified sometimes as a pronoun and sometimes as a verb, depending on the purpose of the analysis.
METHOD

Dahl's data, as reported, consist mainly of two long lists: one, an alphabetical listing of all 17,871 types, and the other, a descending frequency list of the same 17,871 types. For pedagogical purposes the alphabetical listing is no more useful than a dictionary, perhaps less so. The descending frequency list could be useful if it were sub-divided and sorted in some systematic way. To do this, a computer program was written to divide the list into subsets, separate the subsets into grammatical categories, alphabetize the words within the categories, and calculate the distributions of the categories. The program was written in BASIC on a Sony SMC-70 computer.
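The original program is not reproduced in this paper; the following Python sketch restates the same steps in outline (the BASIC original ran on the Sony SMC-70). The part-of-speech lookup table stands in for the intuition-based classification described above, and the six-word list is illustrative only.

# Sketch of the procedure described above: divide a descending frequency list
# into ordinal subsets, classify each word by part of speech, alphabetize the
# words within each category, and tally the category distributions.
from collections import Counter

# Hypothetical hand-made lookup; the real classification was done by intuition.
pos_table = {"I": ["pronoun"], "and": ["conjunction"], "the": ["article"],
             "know": ["verb"], "feeling": ["noun", "verb"], "right": ["adjective", "noun"]}

def classify(words, subset_size=50):
    subsets = [words[i:i + subset_size] for i in range(0, len(words), subset_size)]
    report = []
    for subset in subsets:
        by_pos = {}
        for word in subset:
            # A word listed under two parts of speech is counted in both,
            # which is why the overall list grows slightly, as noted above.
            for pos in pos_table.get(word, ["unclassified"]):
                by_pos.setdefault(pos, []).append(word)
        for words_in_pos in by_pos.values():
            words_in_pos.sort()  # alphabetize within each category
        report.append((by_pos, Counter({pos: len(ws) for pos, ws in by_pos.items()})))
    return report

# Illustrative call with a tiny made-up frequency-ordered list:
for by_pos, counts in classify(["I", "and", "the", "know", "feeling", "right"], subset_size=3):
    print(dict(counts))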
RESULTS

As an example of the kind of data obtained from the print-out, the following table of the 100 most common words in spoken English is given. Please note that, because of the additions and deletions described above, this does not correspond exactly to Dahl's first 100 types.
Table 2. The 100 most common words in spoken English by grammatical class

Adjectives (4): all other right some
Adverbs (17): all as just like more no not now out really so then very way well
Articles (4): a an some the
Conjunctions (8): and as because but if or so then
Modals (2): can could
Nouns (7): feeling one right sort thing time way
Prepositions (10): about at for in of on out to up with
Pronouns (22): I I'm he her him how it it's me my she something that that's they this we what when why you your
Verbs (26): are be been didn't do don't feel feeling get go going had have is know like mean said say see sort think thought want was were
A quick glance at this table reveals some of the problems involved in classifying types into grammatical categories. Take the type sort for example. The dictionary lists sort as a noun
and a verb and it has been classified as such in this list. However, intuition suggests that its most common usage in conversation is in expressions like "sort of difficult", in which it functions as an adverb. A simple frequency count does not reveal collocations like this, so it has not been listed as an adverb, to reduce the amount of intuitive data introduced into the study. This is a shortcoming of Dahl's data, and recommendations are made later on how problems like this could be overcome.

One problem with listing a type in all the categories that, according to the dictionary, it can assume is that it nullifies the purpose of this classification. A dictionary must list all the conceivable parts of speech that a word can assume and therefore some relatively rare usages will be listed. The purpose of this selection is to separate the common from the rare in order to guide curriculum designers and reduce the burden on our students learning English. Therefore, although mean can be an adjective it is not listed as such. The same is true of time as a verb and several other cases. Finally, it will be noticed that some is listed as an article. The dictionary lists some only as an adjective. It is this writer's opinion that this is incorrect and that the unstressed form of some is, in fact, the plural of a and should be listed as such. The reader may disagree. The point of this study is not that this classification is absolutely correct (obviously different people will have different opinions) but that selection and gradation such as this can and should be done.

To appreciate how the parts of speech are distributed throughout the 800 most common words, that list was sub-divided into ordinal groups of 50 words and classified by part of speech.
Table 3. The distribution of parts of speech in ordinal subsets of 50 words

Set              1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Part of Speech
Adjective        2   2   5   5   6  11   8   2   8   6  10  11   4   7   6   8
Adverb           8   9  11   8   8   9   9   4   1   7   4   2   4   3   7   2
Article          2   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Conjunction      7   1   0   1   1   3   0   0   0   0   0   0   0   1   1   0
Modal            0   2   2   2   4   2   2   0   0   1   0   0   0   1   0   0
Noun             0   7   7   7  10   9  14  16  18  20  18  15  22  20  16  18
Preposition      8   2   2   6   1   1   1   4   2   1   0   1   1   1   1   1
Pronoun         13   9  11   4   5   5   2   5   3   0   2   2   1   1   0   6
Verb            10  16  12  17  15  10  14  19  19  15  15  19  18  15  19  16
Subset (1) represents the 50 most common words, subset (2) is the second 50 most common words, and so on. The sets are not equally important, of course. From Table 1 it can be seen that although subset (1) represents about 50% of spoken English, the next 15 sets combined represent only about 40% of the language in use. Beyond this level it takes about 1500 words just to add one per cent. It can be seen that the distribution is not even. Function words tend to cluster at the left (the most frequent levels) whereas nouns tend to make up more and more of the corpus as frequency levels go down. It is interesting to note that verbs make up a relatively constant percentage of the corpus, regardless of the level of
frequency. Obviously, certain grammatical categories such as modals or conjunctions are closed sets and, therefore, new modals cannot be easily added to the language. Nouns, on the other hand, may be coined fairly freely and therefore the supply of nouns is theoretically unlimited. These facts do not, however, predict that nouns will be totally absent from the set of the 50 most frequent words or that, for example, prepositions will not be evenly distributed in terms of frequency. These data can be useful in course design.

Another way of looking at the data is to measure the relative distribution of the parts of speech as we increase the size of the sample, moving from the most frequent words to the less frequent ones. Table 4 is a listing of the cumulative frequencies of the parts of speech. Examination of this table reveals why vocabulary lists are so misleading as a guide to curriculum design. The 800-word list is equivalent to a listing of all the different words that appear in a text without regard to frequency. The bulk of such a list will be nouns and verbs. Together with adjectives and adverbs, about 80% of the different words will be accounted for. Naturally, instructional designers and text-book writers will tend to be drawn to those things which seem to be most numerous. A moment's thought will reveal how misleading this tendency can be. The first sample accounts for about half of all spoken language. This means that in a given discourse the chances are approximately 50% that the next word will be one from the first set. It is reasonable to suppose that if a large percentage of our instruction is devoted to those 50 words, effectiveness will increase.
Table 4. Cumulative frequencies as a per cent of the sample size

Sample size        50   100   200   300   500   800
Part of Speech
Adjective           4     4     7    10    11    12
Adverb             16    17    18    17    14    12
Article             4     4     2     1    <1    <1
Conjunction        14     8     4     4     2     1
Modal               0     2     3     4     3     2
Noun                0     7    10    13    21    27
Preposition        16    10     9     6     5     4
Pronoun            26    22    18    15    11     8
Verb               20    26    27    26    29    31
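The arithmetic connecting Table 3 to Table 4 can be made explicit with a short sketch (a reconstruction, not part of the original analysis): the per-subset counts are accumulated and expressed as a percentage of the growing sample, truncated to whole per cents, which reproduces the Noun and Pronoun rows shown above.

# Sketch: derive cumulative part-of-speech percentages (as in Table 4)
# from the per-subset counts of Table 3. Only two rows are copied here.
subset_counts = {
    "Noun":    [0, 7, 7, 7, 10, 9, 14, 16, 18, 20, 18, 15, 22, 20, 16, 18],
    "Pronoun": [13, 9, 11, 4, 5, 5, 2, 5, 3, 0, 2, 2, 1, 1, 0, 6],
}
sample_sizes = [50, 100, 200, 300, 500, 800]

for pos, counts in subset_counts.items():
    row = []
    for size in sample_sizes:
        n_subsets = size // 50                # how many 50-word subsets the sample spans
        cumulative = sum(counts[:n_subsets])  # occurrences of this part of speech so far
        row.append(100 * cumulative // size)  # per cent of the sample, truncated
    print(pos, row)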
It may be objected that those words are so simple that they don't warrant such attention. There are two answers to such criticism. First, the 50 most common words include such words as a, the, just, as, and eight prepositions, which are by no means easy to master. Second, it is roughly true that the more common a word is the more meanings (or uses) it has. Take, for example, the word all. Any dictionary will list such uses as: all the way, all day, all last year, dressed all in black, all alone, all along, all the same, all in all, go all out, all over the world, all right, not all there, all told, give your all, it's all broken, all I want is . . ., we all want it, all of us, after all, not at all, once and for all, not as bad as all that, and all of a sudden. Obviously, these little words are not all that easy. If knowing the word all means knowing all of its most common uses, the reader might
consider whether his/her students would pass such a test. It should be noted also that most of the above examples are formed in combination with other high-frequency words and that, therefore, the teaching of one of these words naturally leads to the teaching of the others.

It is possible to separate out certain parts of speech for special attention. A separate list of the 300 most common verb forms was made and analyzed. The list was divided into base or present simple forms, -ing forms, and past forms. Of course, with some verbs the past form is indistinguishable from the base, so they are inevitably listed with the base forms. In addition, where the past participle differs from the past form it is listed in the past tense group. Modals are included in this list as verbs, as are pronoun plus verb contractions such as I'm. The following is a listing of the 50 most common verb forms.
Table 5. The 50 most common verb forms

30 simple forms: I'll I'm I've are be can can't come do don't feel get go guess has have he's it's know like make mean mind remember say see there's think want you're

7 -ing forms: being doing feeling going saying talking thinking

13 past forms: been could did didn't felt got had said thought was went were would
It should be noted that the largest group is the present simple or base form group. This suggests that at least half our instruction, when it is devoted to teaching verbs, should be applied to this group. It is also interesting to note that would appears without will (although I'll is present) and that talking appears without talk. This suggests that the introduction of certain verbs, such as talking, should be delayed until the progressive form of the verb is being modeled. The verb frequencies can also be arranged in tables similar to the general vocabulary list. The following table shows the relative frequencies of the verb forms in subsets of 50.
Table 6. Verb form frequencies (by per cent) in ordinal subsets of 50 words

Set        1    2    3    4    5    6
Simple    60   60   52   56   52   42
-ing      14   10   12   12   20   24
Past      26   30   36   32   28   34
This table shows that as we move away from the most common verbs [subset (1)] non-simple forms tend to increase. This means that to the extent that we concentrate on forms other than the simple present and other base forms we are getting away from the heart of spoken English. The implications of this for curriculum designers are obvious.

Since nowadays word frequency counts are done by computer, the results are not exactly the same as what we would expect if they were done by humans. Specifically, words like can (to be able to) and can (a tin can) are counted together. Obviously, this is not ideal. In the case of such distinctions as the stressed and unstressed forms of some and that,
for example, we would prefer a more sensitive measure. However, this fault can be an advantage in some cases. The singular and plural forms of nouns are counted separately, as are the negative and third person singular forms of verbs. This allows us to determine the most common plurals, negatives and third person singulars in spoken English. In the following table those types are listed starting with the most frequent.
Table 7. The 10 most common plural nouns, negatives, and third person singulars

Plural nouns: things people feelings times years thoughts days children weeks friends
Negatives: don't didn't can't wasn't doesn't couldn't haven't won't hadn't
Third person singulars: was is it's that's there's he's seems she's does what's
The list of the plural nouns is revealing because it consists mostly of words that refer to people and time. Although this may be an artifact of the free-association process, even so, this is a case where instructional selection gives us a hint about instructional practice. When teaching plurals it may be most natural to direct the discussion towards family and friends and how old they are or how long it has been since you've seen them and similar topics. The point here is that the language presented in courses should be representative of the language used by native speakers. Without statistical data on language use, we cannot be certain that this ideal is being met.

The above list of the most common negatives presents no surprises except for the fact that isn't and aren't are missing. It is worth noting that couldn't is about as frequent as doesn't and more frequent than won't. It is also notable that haven't and hadn't are in the top 10. We can be fairly certain that these forms are part of present and past perfect verbs and that, therefore, when these tenses are presented to students it may be better to present them in the negative. The reader may check his/her own intuitions, but it seems to this writer that sentences like "I haven't done it yet" or "He hadn't finished when we . . ." are more natural than their affirmative counterparts. Again, the emphasis is on modeling language that accurately represents native usage. To further underline the point, it should be noted that in the third person singular list the most frequent verb besides the copula is seems. It is doubtful whether it is equally frequent in English conversation textbooks. Accurate information about usage frequencies can help to correct omissions like this.

There is a possibility that the nature of the elicitation process influenced the distribution of the verb tenses in the corpus and that the above data are not representative. Ota's (1971) corpus of 14,724 verb forms taken from recordings of radio and television programs suggests that this is not the case. The following table, adapted from Ota, shows the distribution of verb forms in his corpus. The total column represents the combined total of the radio and television counts. The written count is appended as evidence that there are significant syntactic differences between written and spoken English. The fact that the simple present represents about 60% of the spoken corpus is in agreement with the numbers presented in Table 6. This suggests that Dahl's elicitation process has not caused the sample to be syntactically skewed.
Table 8. Distribution of the verb forms (by per cent)

Verb forms           Radio    TV     Total   Written
Simple present        65.1    63.6    64.4     26.4
Simple past           13.5    23.1    17.8     58.5
Present perfect        7.2     2.5     5.1      2.7
Past perfect           0.6     0.2     0.4      3.4
Present prog.          4.4     6.4     5.3      0.9
Past prog.             0.8     1.1     0.9      1.1
Pres. perf. prog.      0.5     0.5     0.5      0.1
Past perf. prog.       0.02    0.0     0.0      0.2
Passive                7.9     2.6     5.5      6.6

IMPROVING ON THE PRESENT DATA
Dahl’s study, although useful, could be improved upon in several ways. One area that needs to be researched is the potential differences among lexical samples drawn from differing domains of human activity. In the 10 years that have passed since his study, powerful microcomputers, with sufficient speed and memory, have become available making lexical studies practical without large amounts of funding. Language samples chosen from a variety of domains can now be analyzed by individual researchers provided there is interest. Such information could be of great use to curriculum designers and textbook writers. A second way in which lexical studies could be improved is in the preparation of the data. Common words such as the, some, and that have several usages. The relative frequency of these usages needs to be known. One way to detect these differing usage frequencies would be to encode the word according to their usages. For example, some could be spelled s-me when it is unstressed. Similarly, that could be spelled that, th@t, or th^t depending upon whether it is used as an adjective, a conjunction, or a pronoun. Needless to say, this same technique should be applied to homographs. Obviously this would have to be done by a person sensitive to language usage, but given the capacity of modern word processors and the restricted set of words which would be differentiated, this is not an impractical suggestion. A third type of information we need about the most common words is collocational frequency. One type of knowledge that native speakers have about their language is which words are likely to follow other words. This allows them to anticipate and fill in where there is interference. Strangely, Dahl and others have not used their computers to their full advantage to extract this sort of data. It is a relatively straightforward matter to program a computer to search out the 100 most common words and then list the most common sets of one, two or three words which precede or follow those words. This would give us information about the most common phrases in spoken English, information which would be invaluable to curriculum designers.
A LEXICAL META-SYLLABUS
A lexical meta-syllabus would consist of a structured list of words and phrases divided by relative frequency. It could serve not only as a guide to course designers but also to test designers, because it would define a body of language which has been empirically determined to be central to native usage. Different courses could be evaluated as to the extent to which they contribute to the mastery of the meta-syllabus. Unfortunately, given the data available, a lexical meta-syllabus cannot be constructed. But it should be obvious that a structured word list, with usage and phrase-frequency information, could serve as an underlying design guide for any course in spoken English. Any course must teach words, and logically those words should be the most frequent ones actually used by native speakers.

However, it should be stressed that a sequenced list of words is not a methodology and should not be judged as such. Neither should it be expected to indicate, directly, how a language should be taught. Nevertheless, a careful analysis of the distribution of the words and their likely collocations may be indirectly useful from a methodological point of view. It has already been shown that the selection of frequent words themselves can reveal something about the kind of topics that are most natural in conversation. In the cases of plural nouns and negatives a very small number of words is suggestive of the kind of sentences that are most frequent and, probably, most useful to language learners.
Another advantage of restricting the focus of a course to a limited set of vocabulary is that it serves as a guide to the designer about what is important and what is not. Whenever teaching drifts off into the world of relatively uncommon nouns and verbs, it is good to be reminded that if the students could only master the first 300 or so most frequent words, they would have 80% of spoken English under their belt. Most language programs have nowhere near the time needed to cover the 17,000 words that constitute the remaining 20%, so the small amount of time available should be used wisely.

Furthermore, when attention is focused on a small amount of data, very often patterns and relationships appear that were previously obscured by the vastness of a full language. The wide range of uses of the word all was illustrated before. The reader might select a few words from Table 2, check a collocation dictionary, and see if all is an exception. Even a word like well has many more uses than come to mind at first. Another useful exercise is to take words that can appear in a particular position in a sentence and see how many reasonable combinations can be formed. For example, the following are some phrases that can appear at the beginning of a sentence:

Yes and if . . ., Yes but what if . . ., Well, anyway even if . . ., It's not as if . . ., Yes but even so . . ., Maybe so, but anyway . . ., Because somehow now that . . ., Well, even though . . ., Either that or . . ., Or else if . . ., But maybe if . . ., So either . . ., No but maybe . . ., Anyway now that . . ., Tell you what, if . . .
These phrases were constructed by the author. The reader may judge whether they sound far-fetched and also whether they typically appear in ESL courses. If statistical data were available on such phrases we would have a basis for their sequenced introduction into courses. Only by deliberately restricting your analysis to a subset of the language will such patterns appear.
But such specific advantages of limiting instruction to a restricted set of highly frequent words are not the main reasons for selecting and grading the words to be taught. A lexical meta-syllabus would be, in Mackey's terms, the selection and gradation of a set of data that would specify the content, sequencing and partial goals of a language course. It would be a list of words, phrases, and their most common usages divided into small subsets and ordered from the most frequent to the least frequent. At the subset level it would not be ordered, and it would not restrict the manner of presentation and repetition of the data (the teaching method). A lexical meta-syllabus is not a teaching method. It would be compatible with a grammatical approach, a situational approach, a functional approach, or any other systematic approach that could be imagined.

A lexical meta-syllabus is not a vocabulary oriented approach. In fact it is just the opposite. A lexical meta-syllabus says that in the early stages of language learning, at least, expanding vocabulary is not important, and as such it follows the general educational principle of holding form constant while changing content. In a sense, such a syllabus would specify the atoms that would be the content of more specific syllabuses that address the needs of particular situations. An analysis of specific situations might indicate that certain atoms are more important than others and therefore the emphasis would differ according to the learners' needs. But, since the most common words and expressions are nearly universally necessary, valid tests based on the lexical meta-syllabus would be roughly standardized because they would all be drawn from the same restricted set of language data.

A lexical meta-syllabus would be restricted but not restrictive. It would not forbid the introduction of extra material nor would it specify grammatical structures. It would only state that at level one, for example, certain words and their most common usages should be taught. Such a specification implies certain grammatical, situational, functional, notional, and pragmatic categories but it does not state how they should be communicated to the student.

There are two main impediments to the immediate adoption of a lexical meta-syllabus. The first is the lack of data as described above. The second is that the necessary consensus for such a syllabus cannot be obtained until it is tried and empirically proven. As with any proposal that includes human variables, good results cannot be expected just because an idea seems reasonable. Very often real situations are far more complicated than they seem at first. Having said that, there is nothing to prevent individual instructional designers and institutions from creating their own lexical meta-syllabuses based partly on Dahl's frequency count, partly on good collocation dictionaries, and partly on common sense. After all the emphasis on how to teach, a little thought about what to teach may not be out of place.
NOTE

¹ Types are roughly equivalent to words. It is normal in language corpuses to make a distinction between types and tokens. A type is a category and a token is an instance of a category. In a computer corpus, a type is a unique string of characters. Thus dog and dogs are tokens of different types, while bank (money) and bank (river) are tokens of the same type.
REFERENCES

DAHL, H. (1979) Word Frequencies of Spoken American English. Detroit: Verbatim (distributed by Gale Research).

MACKEY, W. F. (1965) Language Teaching Analysis. Bloomington, IN: Indiana University Press.

OTA, A. (1971) Tense and Aspect of Present-Day American English. Tokyo: Kenkyusha.

SWAN, M. (1984) From structures to skills: a comprehensive view of language teaching and learning (Interview by Terry Toney). The Language Teacher 8, 3-5.

THORNDIKE, E. L. and LORGE, I. (1944) A Teacher's Word Book of 30,000 Words. New York, NY: Columbia Teachers College.

WEST, M. (ed.) (1953) A General Service List of English Words. London: Longman.