J. Theoret. Biol. (1962) 3, 230-267
Typology and Empiricism in Taxonomy†

ROBERT R. SOKAL

Department of Entomology, University of Kansas, Lawrence, Kansas, U.S.A.

(Received 5 January 1962)

The term typology has been used in taxonomy to imply procedures and philosophies of somewhat diverse meanings. Not all typological work meets every criterion suggested for such work by various authors. Some of these criteria are untenable when viewed in the context of modern biological theory, while others seem eminently reasonable in the light of present day knowledge. Mere reference to a work as typological will not convey to the reader or listener a clear idea why a reviewer has decided that it merits the term. Similarly, it would be improper to attach automatic derogatory implications to the adjective "typological", since it is only those aspects of typological procedure which cannot be defended or maintained today that would merit such a connotation. The classificatory philosophy of many proponents of typology is based on a platonic idealism. To consider a taxonomic type as an abstract idea seems of doubtful utility at best and is certainly beyond the realm of scientific inquiry. The method by which types are obtained must be indicated without ambiguity. In classical typological work this is generally not done, and reconstructions relying on the metamorphosis of parts are often rather vague and controversial. Similarly, abstractions relying either on weighted or unweighted averages of the characters in the taxa to be studied are of questionable value. It would appear that character variation of individuals within taxa or of taxa within higher taxa must in some way be accounted for, if types are to be defined which will represent the natural order in the system. The choice of characters must also be clearly defined and defensible.
A priori weighting or preference of some characters over others, based on either presumed phylogenetic importance or logical or functional primacy, is an unjustifiable procedure. Equal weighting and use of all characters leads directly into an empirical approach, which attempts to classify organisms on the basis of all available evidence, without preconceived notions about their arrangement. Such a procedure also arrives at a classification from which types may be abstracted, but does so by clearly defined and previously determined criteria, and does not rely on hypotheses based on certain preliminary conclusions in order to reinforce its argument. The empirical approach, the most notable example

† Contribution No. 1126 from the Department of Entomology, University of Kansas, Lawrence, Kansas, U.S.A.

of which in recent years has been the development of numerical taxonomy, has definite typological aspects which, however, appear to this writer to be of value and preferable to the phylogenetic method. Smirnov's statistical method of constructing types is contrasted with results obtained by numerical taxonomy. The latter appear preferable from several points of view.

1. Introduction

The purpose of the present study is to investigate some of the modern views of typology as applied to systematics, to examine their scientific and philosophical foundations and to evaluate critically their applicability. We shall also consider the extent to which typological approaches are involved in two other taxonomic philosophies, phylogenetic systematics and numerical taxonomy. This is followed by a discussion and theoretical justification of the empirical and statistical typological elements in the new (numerical) taxonomies, which are currently being developed.
In any science there are some points of view which, after a long and varied history, have finally been abandoned by students in the field. Frequently such decisions are preceded by protracted and acrimonious scientific controversy in which both sides, in order to convince the scientific public, tend to reduce the arguments of their opponents to the absurd. When one side emerges victorious from such a controversy the apparent absurdity of their opponents' views remains long remembered in scientific thought and the appellation of the vanquished idea often acquires a pejorative connotation. In biology there are numerous examples of the above-described situation: vitalism, Lamarckism, anthropomorphism and teleology come to mind immediately as cases in point. Negative reactions to these attitudes are instilled in students of biology from their first elementary course on. Deplorably, much shallow thinking surrounds the acceptance of currently prevalent points of view and the total rejection of opposing ones.
From time to time re-examination is needed to explore whether the older views are still scientifically untenable, whether the currently accepted point of view has not acquired elements of the abandoned theory (as will be shown to be the case in the present inquiry) and whether such an acquisition is legitimate. It will also behove us to re-examine periodically the newly acquired body of biological knowledge to see whether the presently accepted forms of thought are adequate to explain the processes or relations concerned, or whether another theoretical structure must be devised to accommodate the factual material. A return to older views may sometimes be indicated, although these views when restated in modern form might not be recognized by their early adherents.
The subject matter of the present paper, typology, illustrates some of the above statements. Except for a small minority, systematists and evolutionists
all over the world have decisively rejected typological principles. The word "principles" should be emphasized here because, while typological principles are spurned, typological practices are of necessity universal among even the most phylogenetically oriented systematists. Nevertheless, among the biologists who care about systematic problems (and there are far too many who could not care less, much to the mutual detriment of systematics and the biochemical and physiological disciplines) typology is a point of view discredited to the extent that to call someone a typologist is to employ a mild form of invective, and to describe someone's systematic work as typological is to condemn it outright on fundamental grounds. These attitudes were also shared by the present author, who with several others (Michener & Sokal, 1957; Sneath, 1957a; Cain & Harrison, 1958; Michener, 1958; Rogers & Tanimoto, 1960; Sokal, 1960; Sneath & Sokal, 1962) has attempted to establish certain taxonomic principles and practices, now generally known as numerical taxonomy, in order to meet some of the objections commonly raised against phylogenetic systematics. These attempts have been labeled typological (Inger, 1958; Simpson, 1961). So deeply ingrained is the rejection of typology (and the idealistic morphology associated with it) among the present generation of biologists, that the first reaction of this author to these charges was an almost instinctive denial of their veracity. More mature reflection, however, has resulted in the realization that there are typological aspects to the numerical approach and that these typological implications are of a peculiar, empirical nature (see also Simpson, 1961). The present paper is a result of these reflections. Before we embark upon a discussion of typology it will be of value to review briefly some of the fundamental principles of numerical taxonomy in order to provide some background for the present inquiry. It will also be necessary to define my use of the word taxonomy.
"Systematics" and "taxonomy" have frequently been used synonymously and are occasionally so used in this paper. However, I agree with the distinction between these two terms (and "classification") established by majority usage at the moment (at least in the United States). The terms are aptly defined by Simpson (1961) as follows: "Systematics is the scientific study of the kinds and diversity of organisms and of any and all relationships among them." "Classification is the ordering of organisms into groups on the basis of their relationships, that is, of associations by contiguity, similarity, or both." "Taxonomy is the theoretical study of classification, including its bases, principles, procedures and rules." I therefore use "taxonomy" as the science of the theory of classification; a person who erects a classification is a practising taxonomist, a person who investigates the theory of classification a theoretical taxonomist; many workers function in both roles.
2. Numerical Taxonomy
Numerical taxonomy is defined as the evaluation by numerical methods of the affinity or similarity between taxonomic units and the employment of these affinities in erecting a hierarchic order of taxa (Sokal, 1960; Sneath & Sokal, 1962). Some of the ideas on which numerical taxonomy rests originated with Adanson, a contemporary of Linnaeus. They have been voiced repeatedly since his day. With the advent of digital computers, necessary to make such methods practical, a number of authors in Europe, America and Asia have applied numerical taxonomy with apparent success. We need mention only a few examples, such as a study on bacteria (Sneath & Cowan, 1958), a study of bees (Michener & Sokal, 1957; Sokal & Michener, 1958), mosquitoes (Rohlf, 1961), butterflies (Ehrlich, 1961), rice (Morishima & Oka, 1960), and members of the nightshade genus Solanum (Soria & Heiser, 1961). Numerical taxonomy aims to develop methods by means of which different scientists working independently with the same taxa and characters must of necessity arrive at identical affinities between organisms. Proponents of numerical taxonomy advocate the strict separation of phylogenetic studies from taxonomic procedure. Taxonomic relationships between taxa are to be evaluated purely on the basis of the resemblances in the material at hand. The relationships are therefore static (Michener, 1958) or phenetic (Cain & Harrison, 1960). They do not take into account the modes of origin of the observed resemblances or the rate at which these resemblances increased or decreased in the past. This attitude is taken for two distinct reasons: (1) There exist no methods for objectively assessing and quantifying the phylogenetic significance of character differences or affinities. (2) Natural taxa cannot be based on phylogenetic information, since in the great majority of cases the phylogeny is unknown.
Numerical taxonomists find phylogeny of interest but, because it is so often hypothetical, regard it as an unsatisfactory basis for classification. A second distinction between numerical and phylogenetic taxonomy is the employment in the former of many, equally weighted taxonomic characters. All kinds of characters are equally desirable: morphological, physiological, ethological ones, etc. Coefficients of phenetic relationship are computed based on these characters, coded numerically. It is hypothesized that as the number of characters utilized increases, the value of the similarity coefficient becomes more stable and that eventually further search for and use of characters is not warranted by the corresponding slight decrease in the width of the confidence band of the similarity coefficient. There are a number of rules for admitting characters to a numerical taxonomic study, which are largely concerned with avoiding
logical fallacies and redundancies. The details of these rules cannot be given here, but will be presented by Sokal & Sneath (1963). Three types of coefficients have been proposed by various authors for the computation of similarity: coefficients of association or contingency (e.g. Sneath, 1957b), the Pearsonian product-moment correlation coefficient (Sokal & Michener, 1958) and taxonomic distance (Sokal, 1961). Once the similarity coefficients have been computed they are recorded in matrix form and structured by various methods of cluster analysis into a hierarchic system. The results of cluster analysis can be put into the form of a dendrogram, which is clearly not a phylogenetic tree but simply shows phenetic relationships with the nearest stem. Horizontal lines drawn across a dendrogram at a given level make all stems they cross taxa of equivalent rank. Instead of naming the taxa at various ranks by the conventional names of genera, tribes, families, etc., Sneath & Sokal (1962) suggest calling them phenons, and supplying them with a number depicting their similarity levels. Terms such as 80-phenon (or 60-phenon) connote groups associated at levels no lower than 80 (or 60)% of the scale used in the analysis. The phenon terminology obviates ambiguous older terms and avoids many sterile debates, as for example, regarding the generic nature of a given taxon. The employment of all characters, the absence of weighting of characters and particularly the rejection of phylogenetic inferences and interpretations during the classificatory process have occasioned the labeling of these techniques as typological by Inger (1958) and also by Simpson (1961). Simpson has brought out the empirical nature of the typology in numerical taxonomic work and Hennig (1950), while his discussion antedates the recently published numerical methods, stresses the relationship between typology and the earlier attempts at numerical methods in the 1920s and 30s.
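The three kinds of similarity coefficient named above can be sketched in a few lines. This is a minimal illustration, not the formulas of the cited papers: the simple matching coefficient stands in for the coefficients of association, the distance is taken in its root-mean-square form, and the character scores are hypothetical. The last function assumes that characters behave like independent binomial trials, which is one simple way to picture why the coefficient should stabilize as characters are added.

```python
import math


def simple_matching(a, b):
    """Coefficient of association: fraction of characters on which two taxa agree."""
    if len(a) != len(b) or not a:
        raise ValueError("taxa must be scored on the same, non-empty set of characters")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def correlation(a, b):
    """Pearson product-moment correlation between two taxa across characters."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def taxonomic_distance(a, b):
    """Root mean squared difference between two taxa across characters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def matching_standard_error(s, n_characters):
    """Binomial standard error of a matching coefficient s based on n characters
    (assumes independent characters; an illustrative simplification)."""
    return math.sqrt(s * (1 - s) / n_characters)


# Hypothetical two-state scores (1 = present, 0 = absent) for six characters.
taxon_a = [1, 0, 1, 1, 0, 1]
taxon_b = [1, 1, 1, 0, 0, 1]

print(simple_matching(taxon_a, taxon_b))
print(correlation(taxon_a, taxon_b))
print(taxonomic_distance(taxon_a, taxon_b))
# Under the binomial assumption the error shrinks as characters are added.
print(matching_standard_error(0.5, 25), matching_standard_error(0.5, 100))
```

All characters enter each coefficient with equal weight, which is the point of contrast with a priori weighting.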
The above account of numerical taxonomy is necessarily brief and the reader is referred to the various cited sources for elaboration. We shall investigate below the extent to which numerical taxonomy is typological. In order to be able to discuss this, however, we first have to become acquainted with typology and examine it critically, to determine which elements of the theory are tenable nowadays and to what extent they must be modified in order to meet the needs of modern systematics and in particular numerical taxonomy.
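The step from a similarity matrix to phenons can be illustrated as follows. The sketch assumes single-linkage clustering, which is only one of the several methods of cluster analysis referred to in the text, and it uses a hypothetical matrix for four taxa with similarities expressed as fractions of the scale (0.80 for the 80% level). The function name `phenons` is chosen here for illustration.

```python
def phenons(names, similarity, level):
    """Group taxa into phenons at `level` by single linkage.

    Two taxa fall in the same phenon when they are joined by a chain of
    pairs whose similarity is no lower than `level`.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in similarity.items():
        if s >= level:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical similarity coefficients between four taxa.
names = ["A", "B", "C", "D"]
similarity = {
    ("A", "B"): 0.92, ("A", "C"): 0.65, ("B", "C"): 0.70,
    ("A", "D"): 0.40, ("B", "D"): 0.45, ("C", "D"): 0.85,
}

print(phenons(names, similarity, 0.80))  # 80-phenons
print(phenons(names, similarity, 0.60))  # 60-phenons
```

Lowering the level corresponds to drawing the horizontal line lower across the dendrogram, so that fewer and more inclusive stems are crossed.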
3. Typology

Historical introduction. A brief look at the intellectual roots of the idea of typology in biology should prove useful. The basic concept traces back to Plato's philosophy of ideas: observable and seemingly real nature is only an imperfect image of reality, which latter is to be found in ideal abstractions or essences of objects in nature. The modern foundation of the idea
of types was laid in the eighteenth century by Goethe in his studies of comparative anatomy. Goethe's ideas can be found in the Erster Entwurf einer allgemeinen Einleitung in die vergleichende Anatomie, in which he states that the variations in form that can be observed in studying comparative anatomy should be compared with the ideal type. Goethe does not state exactly how this ideal type could be arrived at or what constitutes it, and his vagueness on this point has led to some differences of opinion about his basic philosophical position in typology. Some feel (e.g. Simpson, 1961) that the concept of type necessarily implies a platonic idea. It is well known that Goethe's views were strongly influenced by Herder, one of the Naturphilosophen. An excerpt from a letter by Goethe to Johannes Müller would bear out this point of view: "Permit me to add that some forty years ago a controversy existed, which, while already won, has still not subsided. A type was to be recognized, a law, whose manifestation can only be demonstrated by its exceptions; it is this hidden and inscrutable pattern within which all life must move, forever attempting to exceed its closed boundaries."† Remane (1956), on the other hand, feels that Goethe's types rest upon recognition of patterns of biological correlations; thus they are essentially empirical, leading directly to the views later developed by Cuvier. In support of his interpretation Remane cites a number of statements by Goethe of which the following is representative: "The concept of a type will itself show us how such a type is to be found: experience must teach us the parts which are common to all animals and also wherein these parts are different in different animals; thence we proceed to abstraction, involving an ordering of these parts and the erection of a general image." Cuvier's typology was basically empirical.
He based his groupings of animals on correlations among characters (still an essentially modern concept) and established clearly (perhaps for the first time) the distinction between conservative morphological patterns and adaptive correlations to meet certain environmental challenges. Richard Owen, in developing his theory of general homology, postulated a general type in conformity with which an animal is constructed. This general or archetype, which at that time did not possess the evolutionary connotations the present student of biology would ascribe to it, is an essentially typological concept.
What is a type? Modern typology centers on two concepts which will now be discussed. Their meaning is somewhat difficult to separate. They are the Bauplan (translated as structural plan by Zangerl, 1948) and the

† I have translated quotations by Goethe, Hennig, Kälin, Naef and Remane from the original German. This has been a formidable task in view of the difficulty of the language employed by these authors. When a choice had to be made between literalness and comprehensibility I aimed at the latter.

type.‡ Naef (1931, pp. 84-85) defines the Bauplan as "the general positional relations of the shared (therefore characteristic or diagnostic) constituent parts of a group of individuals, species or organs, which permit of a common representation". There will be a hierarchic series of Baupläne, showing an entire gradation of relationships from the simplest and most general ones among the characters of the highest taxonomic ranks to progressively more detailed and hence more concrete relationships among the characters of low ranking taxa. Types are "norms in concrete and diagrammatic form" (Naef, 1931, p. 94). This definition hinges on the exact meaning of the word norm (identically spelt in German), which, while frequently used in Naef's writings, is not clearly defined by him. From qualifying statements it becomes, however, fairly clear what the idealistic morphologists mean by type. While Baupläne contain only those parts found in all members of a taxonomic group, a type would include portions and structures not necessarily found in every member, but thought to be necessary for the morphological derivation of all members of the group being considered. Thus not all typical characters will be common; but all shared characters are necessarily typical. It is ironically true that the simplest definition of type can be given in phylogenetic terms. Naef (1931, p. 95) calls "natural forms which can be related by being of a common type 'form-related' or 'typically similar'. The concept of 'form relationship' or 'typical similarity' designates an ideal, purely formal relationship of observed structures within a natural group". These arguments about types enter the same vicious circle of reasoning to which phylogenetic taxonomy is beholden. The type and typical similarity are defined on the basis of conditions existing within natural taxa, yet there is no initial criterion of what these natural taxa are, why they are natural and how they are to be erected.
The circular reasoning in this instance becomes even more obvious when we follow a further elaboration on the definition of the type of Naef (1931, p. 96). "The type is that natural form conceptualized within a category through which all known forms in this category can be imagined to be connected by the simplest natural metamorphosis (i.e. structural transformation or change in form)". Yet we read in the immediately following sentence that "the natural systematic

‡ The word type as used and defined by typologists is not related to the nomenclatural types on the basis of which names are attached to taxa. It is in this latter sense, however, that the word is much more extensively established. While in earlier times the two kinds of types were confused it is today recognized that a nomenclatural type only serves as a name carrier by means of which questions of identity and nomenclatural problems arising from the splitting or lumping of taxa are decided. Hence Simpson (1940) suggested that in order to avoid confusion, nomenclatural types be called onomatophores, while Schopf (1960) proposed the term nomenifers.

category can be defined as the totality of forms which can be referred back to a certain type". Naef points out that, while the rules for erecting a type are generally rather vague, when a type is to be constructed in actuality, it is found that there is generally little leeway in the process. Thus the principles of typology, in a manner of speaking, force themselves upon the worker. Kälin (1945, p. 137) defines a type as (freely translated) "the ideal construction of a form from which all separate forms within the category being considered can be thought to be derived" (his italics). Derivation here has nothing to do with descent, being entirely a relationship involving thought processes. Another definition of the type of a group (class, family, genus, species, etc.) would be that it is "the basic form of that group obtained by abstraction from the subordinated categories and in the final analysis from all relevant individuals. These latter are related to the type as individual cases to a law or as musical variations to the theme of a melody" (Kälin, 1945, p. 138). Kälin's definition of a Bauplan is that it is the positional or relational plan of the parts of the type to its whole. In this respect Kälin's definition is somewhat different from Naef's, who states that every type would have a Bauplan but not every Bauplan would represent a type. Danser (1950, p. 125) defines a type, which he calls ground-plan, as follows: "The systematic ground-plan of a natural group is an imaginary living being in which are combined the following qualities: firstly, everything that the ground-plans of the component groups of the next lower order have in common; secondly, where these diverge, all the most primitive conditions occurring amongst these." He goes on to say that "the theory of all these ground-plans or types is called typology. It is the essential basis of natural systematics such as we conceive it". Yet again we do not find the natural group defined as such.
A further difficulty of such a definition is the meaning of "primitive". In typological theory "primitive" can have no immediate evolutionary connotations, but must be thought of as the simplest morphological configuration from which other structures can be derived conceptually. Zangerl (1948, pp. 355, 357) expresses the relations between structural plan (Bauplan) and morphotype (type) in yet a different form. The structural plan is "the conformity to a design in the topographic (spatial) relationship of the parts of an organism to the body as a whole. It represents the basic mutual arrangement among the parts of the compared organisms". The morphotype is "an abstraction of the actual form variety [sic] within a group of organisms of the same structural plan; . . . the actual form conditions can be thought of as being derived from it" (his italics). This is essentially the same definition as Kälin's.
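Two of the definitions quoted above are explicit enough to be written as procedures, and doing so shows where they depend on undefined inputs. The following is a sketch under stated assumptions: the Bauplan is read as the set of parts found in every member, and Danser's ground-plan is read character by character. The function names, the characters and the `primitive_rank` table are all hypothetical; the table in particular is exactly the information that the definitions do not tell us how to obtain, as is the prior delimitation of the group itself.

```python
def bauplan(members):
    """Parts found in every member of an already delimited group."""
    return set.intersection(*(set(parts) for parts in members.values()))


def ground_plan(subgroup_plans, primitive_rank):
    """Ground-plan of a group from the ground-plans of its component groups.

    For each character, keep the state the subgroups share; where they
    diverge, keep the state ranked as most primitive in `primitive_rank`
    (a list per character, most primitive first; supplied by the worker).
    """
    characters = set.intersection(*(set(p) for p in subgroup_plans.values()))
    plan = {}
    for character in sorted(characters):
        states = {p[character] for p in subgroup_plans.values()}
        ranking = primitive_rank[character]
        plan[character] = min(states, key=ranking.index)
    return plan


# Hypothetical members of a group, each listed with the parts it possesses.
members = {
    "species_1": {"part_a", "part_b", "part_c"},
    "species_2": {"part_a", "part_b"},
    "species_3": {"part_a", "part_b", "part_c"},
}
print(bauplan(members))

# Hypothetical ground-plans of two component groups and an assumed ranking.
subgroup_plans = {
    "group_1": {"character_1": "simple", "character_2": "present"},
    "group_2": {"character_1": "elaborate", "character_2": "present"},
}
primitive_rank = {
    "character_1": ["simple", "elaborate"],
    "character_2": ["present", "absent"],
}
print(ground_plan(subgroup_plans, primitive_rank))
```

Both functions take the membership of the group as given, which is the circularity discussed in the text: the procedure cannot start until the natural group has been recognized by some other means.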
From the above definitions the nature of a type as an ideal construct should have become apparent. It may be asked why one should describe types at all? If it is only in order to describe a platonic ideal, then it is not an undertaking that would appeal to modern biologists; if it is to fix an archetype from which other morphological plans can be conceptually derived, then the vast majority of present-day biologists would feel that ontogenetic and sound phylogenetic work should take the place of idealistic morphology and typological systematics, respectively; if the recognition of types is undertaken as a taxonomic tool with the implications that the recognition of types will aid us in defining taxa, then we must examine whether these premises hold. The rest of this paper will be concerned with the latter problem.

4. Typological Taxonomy

Any scheme claiming to be a self-sufficient method for the classification of organisms must provide suitable answers to a number of important questions. These questions might be summarized under the following headings:
(1) Are taxa real?
(2) Are there natural taxa and how is "natural" to be defined?
(3) How are natural taxa to be recognized, i.e. delimited, in practice?
(4) How are natural taxa to be characterized?
(5) How and on what basis are characters to be chosen?
(6) What constitutes homology?
We shall discuss these questions below for typological taxonomy.
Are taxa real? A fundamental problem of taxonomy is whether the taxa that are defined and described by taxonomists have any reality at all. Hennig (1950) discusses this question at great length, at both the specific and supraspecific level. He points out, quite correctly, that many of the prevalent arguments on this question are based on confusion about the meaning of the term reality, and on lack of critical distinction between relationships based on similarity or resemblance and relationships based on descent. He feels that there is no sharp distinction between that form of reality commonly associated with the individual and that other ontological form ascribed to species and higher categories. Hennig's distinction between phenetic (sensu Cain & Harrison, 1960) and phyletic relationships is very important. Many systematists nowadays would maintain that while the species can be objectively defined and delimited this cannot be done for higher categories. The hidden snag in such a statement is that we are not told what criteria, phyletic or phenetic ones, are used to define or delimit the taxon under consideration.
The existence of genetical (or inferentially genetical) methods for determining intraspecific relationships is generally conceded nowadays. However, it is often quite difficult to discover and interpret such relations (tokogenetic relations in Hennig's (1950) terminology). While there is no universal agreement on the nature of the species, a consensus on whether certain populations are conspecific may be obtained. In sufficiently well-known groups the (biological) species can be defined with a fair degree of objectivity. However, in view of limitations of material and scientific manpower such considerations are employed for only a tiny fraction of the organic world. I think it would be fair to state that in the vast majority of cases phenetic, purely morphological methods are used in deciding whether a given form is a distinct species. It would not appear that a change in this practice is imminent. Such considerations have recently led some workers to the realization that for the large majority of species the biological species concept is at present not workable. They suggest that species should be frankly defined on a phenetic basis by consideration of many characters. Proposals in this direction, employing numerical taxonomy, have been made by Ehrlich (1961). At the infraspecific level attempts at classifying strains and races of plants by Morishima & Oka (1960) and Soria & Heiser (1961) also deserve mention. Among asexual organisms phenetic species criteria are likely to be the only applicable ones. A search for such criteria was one of the major factors behind the recent development of numerical taxonomic methods in bacteriology (for a review of this extensive field see Sneath, 1962).
Similarly, there remains little doubt that higher categories can be objectively delimited on a phenetic basis by a variety of empirical statistical methods of the general class of cluster analysis (see McQuitty, 1954, 1956, 1957; Sokal & Michener, 1958; Rogers & Tanimoto, 1960; Rohlf & Sokal, 1962), although we may not agree on a name for the hierarchic level so established. A number of workers, such as Martini (1929), Boyden (1947), Bigelow (1958), Sneath & Sokal (1962), and the several typologists mentioned elsewhere in this paper have maintained that phyletic approaches cannot serve to provide objectively definable and repeatable taxa. This is not to say that taxa are not real from a phyletic point of view, when only the branching of the lines, i.e. cladistic relationships (Cain & Harrison, 1960), are considered. Describing a taxon as all the descendants of a monophyletic line existing at a certain point in time provides a logically unequivocal definition. However, from a practical point of view such a definition is of no more use to the taxonomist than the definition of red as the color of blood would be to a red-green color blind man, since, except in unusual circumstances, the taxonomist would have no knowledge of the nature and composition of monophyletic lines and assemblages.
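The cladistic definition can be made concrete in a short sketch, which also makes the practical objection visible. The function below assumes that the genealogy is supplied as a table of ancestors and their immediate descendants; the table `children` and all the names in it are hypothetical, and in the vast majority of real groups no such table is available.

```python
def descendants(children, ancestor):
    """Terminal forms descended from `ancestor` in a known genealogy.

    `children` maps each ancestral form to the forms arising from it;
    a form with no entry is treated as a terminal (existing) form.
    """
    stack = [ancestor]
    terminals = set()
    while stack:
        form = stack.pop()
        offspring = children.get(form, [])
        if offspring:
            stack.extend(offspring)  # keep following the branching lines
        else:
            terminals.add(form)
    return terminals


# Hypothetical genealogy: one stem giving rise to two lines.
children = {
    "stem": ["line_x", "line_y"],
    "line_x": ["species_1", "species_2"],
    "line_y": ["species_3"],
}

print(descendants(children, "line_x"))
print(descendants(children, "stem"))
```

Given the table, membership in the taxon follows unequivocally. The difficulty raised in the text lies entirely in obtaining the table.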
To sum up, I would support Hennig in asserting the reality of taxa of low and high rank from both the phyletic (or tokogenetic) and the phenetic point of view. I would maintain, however, that phenetic classifications are operationally more useful than phyletic classifications which cannot be more than theoretical concepts in the vast majority of cases because of our lack of knowledge of genetical and genealogical relations in the groups concerned. Taxa are real in the system of the typologist, at least as real as the types. The existence of natural taxa is nothing but the "expression of the recognized or assumed typical similarities" (Naef, 1919, p. 19). Naef states that the representation of the diversity of a group of organisms reduces simply to the relations of the separate types determined for these organisms. He maintains that a phylogenetic tree is under all circumstances the simplest and most perfect method of representation since it reproduces the systematic relations completely. In this he would seem to be over-optimistic. It has been recognized repeatedly (see Simpson, 1961) that phylogenetic trees, even if used only for the presentation of phenetic similarities, are still not adequate to show all the various kinds of relationships which have on occasion been ascribed to them. Sokal & Sneath (1963) point out that of the eight types of relationships that have at various times been expressed by phylogenetic trees only three can logically be shown in this manner.
Are there natural taxa and how is "natural" to be defined? There has been a general assumption throughout the history of taxonomy that groupings of organisms exist in a hierarchic, nested scheme ranging from the individual to the species and on to higher categories, such as genera, families, classes, etc.
Following the work of Linnaeus, it has been realized that among the multiplicity of classificatory schemes which could be erected for the known flora and fauna there appeared to be an elusive, yet most consistent and most satisfying scheme called the "natural classification" in antithesis to other "artificial classifications". The different reasons advanced at various times for the existence of such a natural system of classification need not concern us here. They have been ably covered in historical reviews by a number of authors in recent years (see Cain, 1959; Remane, 1956; Simpson, 1961). Virtually all present day taxonomists agree that similarities among organisms are due in large measure to common descent and that the gaps and discontinuities among systematic groups are caused by extinctions, by different evolutionary trends and by different evolutionary rates of separate phyletic lines. The differences among taxonomic schools become apparent in their answers to the questions discussed below, namely, how is "natural" to be defined, how are natural taxa to be recognized in practice and how are
natural taxa to be characterized? These three questions may appear to be essentially the same, but they are quite distinct, as will be shown. Confusion about differences in meaning of these questions has led to considerable controversy in systematics. A definition of "natural" can be given in a theoretical manner. The common reply which one obtains on asking the average systematist to define a natural taxon is that a natural taxon is a monophyletic group. This is a legitimate and consistent definition, but it must of necessity remain a theoretical concept since no natural taxon could ever be recognized or delimited on that basis, unless we had complete knowledge of the evolution of the group. We do not, of course, have such knowledge and cannot have it for the vast majority of groups in the foreseeable future. Thus a natural taxon must be a workable concept, i.e. it must be possible to apply it to existing systematic data. The definition and recognition of natural taxa must therefore go hand in hand. The third question about the characterization of natural taxa refers to their description after they have been erected and delimited. For example, having erected the natural group Insecta, are we to describe it on the basis of a single character, more than one character or in what particular manner? One looks in vain to the orthodox typologist for an answer to the question how "natural" is to be defined. Neither of the extensive essays by Naef (1931) or Kälin (1945) contains any reference to a definition of the natural system, although these authors use the term in establishing definitions of type. Danser (1950, p. 123) realizes the difficulty of defining natural groups, but is not able to state any exact or scientific definition for them, ending with the hope that ". . .
some day systematics will arrive at a more exact stage, but this does not alter the fact that already now we are entitled to face its problems, be it for the moment in a more intuitive but nevertheless scientific manner". He is finally forced to say "the most natural are those groups which on closer inspection become more and more distinct. By the side of these, less natural, more or less artificial groups in all conceivable shades are known". Simpson (1961), in discussing a natural classification, points out quite correctly that types cannot be recognized unless groups are established from which they are to be extracted and that their definition according to idealistic typological principles is thus logically impossible. Yet Simpson himself is unable to produce any definition of natural classification, realizing (1961, p. 57) that "in fact much of the theoretical discussion in the history of taxonomy has, beneath its impersonal language and objective façade, been an attempt to find some theoretical basis for these personal and subjective results". He is led to conclude that a natural classification can be meaningfully achieved only through an evolutionary classification. As has been stated above, this is an acceptable definition, but
not a workable one, since it assumes knowledge which generally is unavailable to the investigator. Simpson rejects as illogical the contention by Gilmour (1951) that a classification serving a large number of purposes will be more natural than one that is more specialized and that the most useful and generally applicable classification will be the most natural one. It would, of course, be quite useless to argue that the definition of "natural" sensu Gilmour is more correct than a phyletic one, since each can be validly proposed in its own right. However, Gilmour's dictum that a system of classification is the more natural the more propositions there are that can be made regarding its constituent classes admits of objective measurement and testing in contradistinction to Simpson's natural system. Furthermore Gilmour's system has powerful predictive properties; it is therefore to be recommended. It is my belief that it will eventually be shown that, with few exceptions, monophyletic taxa will also be most natural in the sense of Gilmour and that the two concepts will emerge as essentially identical. If this is so, phylogenetic conclusions may eventually be drawn from a demonstration of naturalness sensu Gilmour.
How are natural taxa to be delimited or recognized in practice? By what certain or at least probable criteria can we establish natural groups in an assemblage of organisms? Any workable definition of natural groups must contain directions for their construction. Typology not only does not tell us which organisms should initially be grouped and on what basis, but rules for the inclusion or exclusion of a given form in a type are not clearly prescribed. A number of papers, e.g.
Kälin (1945), show how to abstract a type based on a given set of characters, but we are not told how we should deal with extension of our knowledge to more characters and more complex types, particularly if information on a newly studied structure does not lend itself to arrangement within the previously established system. The process of coordination and subordination of types is a rather arbitrary one and is as vague and more or less instinctive as current phylogenetic systematic practices. A number of rules (of "morphological primacy") are given by Naef (1919) for erecting natural taxa. Although not based on phylogenetic arguments, the circular reasoning of phylogenetic taxonomy persists in these rules. For example, in the "primacy of ontogenetic precedence" it is alleged that forms of typically similar morphogenesis will resemble each other more in the earlier compared (homologous) stages. Hence it is claimed that, if stages at the beginning of ontogeny resemble each other more than they do later, they necessarily resemble a type. This is an unsupported hypothesis, which operationally does not differ from the biogenetic law and must necessarily suffer from all the criticisms to which the latter has been
subject. The "primacy of systematic precedence", to cite a second rule, is to recognize those characters as typical within a group of forms which are already typical at the next higher systematic level. Such a procedure, however, rests one hypothesis on another, since we cannot assume that any one hierarchic systematic level has been determined without error and can be used as a standard against which to compare another. On the other hand, one could hardly take exception to the "primacy of paleontological precedence", by which Naef means the ordering of types in geological sequence in those relatively few groups where it is possible to do so.
How are natural taxa to be characterized? The delimitation and full description of a natural taxon should involve the use of many characters. On the other hand, for practical purposes it is useful to diagnose natural taxa by the use of few characters, as in a taxonomic key. The diagnosis of taxa in typology is no different from that in phylogenetic systematics, except that typologists stress repeatedly (e.g. Naef, 1919) that diagnosis is but a poor substitute for description. A scanty description, based on few characters, may be adequate only for diagnosis, but if taken for descriptive work, would weaken the quality of the taxonomy based upon it. Problems arise when through the discovery of new forms the diagnosis has to be revised, while the nature of the type as such is generally not affected. Unless the new forms are included in an old type, when no change at all occurs, the concept of the type of the particular group is only widened, but not fundamentally altered in moderately well-known groups.
How and on what basis are characters to be chosen? The kinds of characters studied have changed during the development of biology and also with different schools of thought of taxonomic research. Modern systematic theory has generally stressed the equivalence of various types of characters.
Physiological, ethological and other characters have begun to take their place alongside morphological ones in modern systematic treatises, although I think it is still fair to say that 95%, if not more, of current systematic work is based on morphological evidence. This evidence is largely of the (relatively) easiest obtainable kind, such as bones and skins in the case of vertebrates or characters of the exoskeleton in the case of arthropods. In the development of the taxonomies of other groups, microorganisms for example, physiological and biochemical characters have necessarily been used at a relatively early stage. Typological systematics has almost exclusively employed morphological characters. While historical reasons, i.e. the early emphasis on descriptive biology, are largely responsible for this situation, this necessity has been turned into a virtue by some typologists who speak of the "primacy" of morphological material. Kälin (1941) means by this that it is impossible to understand function logically other than through form. He holds that there
is a well known, strict correlation between form and function. A given function presupposes a certain structure and external form. The structure and external morphology of a cell, of a tissue, of an organ, etc., present the necessary material basis for the function of these parts. Structure and external form thus are in "functional preparedness" in the organism; they contain potential function. In a modern taxonomy such a position can no longer be taken. While it is true that much functional biology is conditioned by the morphology of the organism, the type of function to which this applies is the grosser physiology of the organism. Should we ignore findings on blood groups and biochemical relationships among organisms, and have we any right to exclude behavioral and other differences, which may or may not be perceptively mirrored in the morphology? A purist might maintain that ethological differences must always have morphological differences as a base, and it might be hard to deny such a claim. On the other hand it would be almost impossible to note in every case the fine differences in morphology at the base of such behavioral differences, while the latter may be quite obvious. To restrict typology to non-functional morphology, as Danser (1950) would have us do, would engulf us in a mass of unproven and unprovable hypotheses concerning potential functions of given structures. With the increase in our knowledge of the fine structure of DNA we may be returning to a "morphological" interpretation of genetic differences and a typology based on the genetic code. However, such a typology would bear little if any resemblance to that of the idealistic morphologists. Are characters to be used indiscriminately, using as many as can be recorded, or are characters to be examined, evaluated and selected on the basis of stated principles or personal preference?
In this respect an important distinction exists between numerical taxonomy on one side and typology and phylogenetic systematics on the other. Numerical taxonomy assumes the equivalence of all characters and requires the use of many equally weighted characters, justifying this procedure by several hypotheses based on genetical theory. Phylogenetic systematics and typology, on the other hand, make a careful selection of characters, basing a classification on homologous characters and excluding the evidence of analogous and convergent ones. The definition of homology and its theoretical foundations in phylogenetic taxonomy and in typology differ markedly, but the characters chosen by typological and phylogenetic taxonomists are frequently identical. Regarding the numbers of characters to be chosen we find no admonitions at all from the typologists, except that in their examples the numbers of characters are relatively small. The idea of numbers of particular characters would be alien to typologists ab initio in view of the stress placed on the
holistic nature of typological theory and the aspersions cast time and again by them on the particularization of morphology as a result of trends in genetic research.
What constitutes homology? We cannot take time here for a historical review of the concept of homology. This has been done thoroughly by Naef (1931), Boyden (1947), Simpson (1961) and others. Taxonomic work of any sort would be impossible if characters could not be homologized among the taxonomic units in any given study. However, the definition of homology has varied considerably depending on the school of thought of the person defining the term and can easily lead to fruitless arguments. It is doubtful that there is one central concept of homology in biology, which by successive definition and redefinition could be made precise to the extent that everyone would accept it. There clearly are several different kinds of homology, which probably should receive different names or at least different qualifying adjectives to distinguish them. There have indeed been repeated attempts in this direction (Haas & Simpson, 1946). In taxonomy the concept of homology, just as that of natural taxa, must be defined so as to be workable and applicable to real data without logical pitfalls. The phylogenetic definition of homology does not meet these requirements, although it is of value as a theoretical concept. It is in their consideration of homology that typologists have made the most solid contributions to systematics. Many definitions of homology are in the typological literature, but just a few will be given here. Naef (1931, p. 85) defines homology as "the formal (idealized) relation between certain parts of the organization of different organisms (or portions of organisms) with similar body plans, resulting from an ordering of the parts in a corresponding manner. It can be represented in a schematic diagram by single components for each part". Kälin (1945, p.
143) has chosen a simpler and more fortunate phrase: "Homology consists of the correspondence of parts of different organisms in the common type-bauplan of a certain systematic category." We should note the absence of any implication of phylogeny in both of the above definitions. Zangerl (1948) has pointed out quite correctly that any phylogenetic definition of homology robs the concept of its only possible function, namely as a tool, since we do not, and cannot, know anything a priori about the causality of a given structural relationship between parts of different organisms. Boyden (1947), no typologist as such, has come to the similar realization that homology has to be based on empirical evidence rather than on hypothetical phylogeny. His definition is that structures are homologous "if they are essentially similar in their structure and embryonic development and in the relative position and connections of corresponding parts of the bodies of the organisms". Boyden's definition is almost identical to the independently
arrived at definition of operational homology by Sokal & Sneath (1963). The important work of Woodger (1945) is similarly empirically based.
The typological position vis-a-vis phylogenetic taxonomy. Phylogenetic systematics is at present accepted by the preponderant majority of systematists, at least as a professed principle if not as an actual guide line of work. The typologists' criticism of phylogenetic systematics has been but little heeded; largely this must be due to the persuasive and pervasive influence of the phylogenetic point of view on taxonomy. Another reason for typological viewpoints being not too familiar (at least among English speaking taxonomists) is that much of the pertinent literature has been in rather complex, philosophically worded German. The vacuity of much of phylogenetic reasoning does not become apparent until the problems of systematics are viewed with some objective detachment. Numerical taxonomists, who independently arrived at views critical of phylogenetic taxonomic principles, have found much merit in the typologists' critique. The fundamental weakness of phylogenetic systematics is well known: hypotheses about phylogenetic relationships are used as evidence about taxonomic relationships, which in turn yield judgments concerning the phylogenetic relationships of other structures and forms. This involves almost any phylogenetic systematic study in a morass of circular reasoning from which escape to solid factual ground is difficult, if not impossible. The more profound among the phylogenetic taxonomists are well aware of this difficulty. Simpson (1961) clearly states the dilemma, but nowhere in his book, the most cogent modern review and exposition of phylogenetic systematics in the English language, is there a logical and consistent defense for such practices, except by calling classification an art, and not a science.
Hennig (1950), who realizes and describes the dilemma in even greater detail, defends the circularity of reasoning on the principle of reciprocal illumination. This means that some light thrown from one source of logical illumination onto a natural situation will kindle another, brighter light in the latter, which in turn will throw added illumination on the first source. Thus in a self-reinforcing, positive feed-back type of operation the relationships between the subjects to be investigated are eventually clarified. While conceptually attractive, such ideas rely upon a peculiar thermodynamic pattern; it remains to be demonstrated how procedures of this sort differ from the much condemned vertical construction of hypothesis upon hypothesis. Remane (1956), a fundamentally phylogenetically oriented systematist, has also shown that phylogenetic reasoning cannot serve as the basis for erecting a natural system. He has chosen a non-phylogenetic criterion of homology as the basis on which he erects the natural system and as the means by which he breaks into the circle of reasoning.
Zangerl (1948, p. 354), discussing the evidence for a natural system to be obtained from morphology, has put the relations very aptly: "All morphological concepts express observed relations. These relations are facts verifiable by any subsequent observer. The morphological concepts are factual generalization derived from observed structural relationships and as such they do not and cannot carry phylogenetic implication" (author's italics). Or again (p. 371): "It must, first of all, be remembered that a given morphological interpretation is sound, while its phylogenetic translations may not be; . . . for this reason it can hardly be overemphasized that morphological interpretation (with careful consideration of its meaning and limits) must always precede a phylogenetic conclusion, so that the speculative elements, hereby introduced, can easily be recognized." If "morphological interpretation" in the above quotation is replaced by "phenetic evidence", the passage would be a proper representation of the views of the numerical taxonomists. That the modern phylogenetic systematists are themselves not true to their professed phylogenetic principles has been stated by a number of authors. On careful inspection it should be obvious to anyone that modern systematics, at the species as well as at higher levels, is based to a very large degree on similarities and differences about the phylogenetic origin of which little or nothing is really known. Conclusions are clothed in phylogenetic terminology, giving the impression that much is known about phylogenetics. Danser (1950, p. 135) has stated it clearly: "Now neither are modern so-called phylogenetic systems based on descent but, in fact, mainly on similarities and divergencies, therefore only phylogenetic as to their terminology. To these quasi-phylogenetic systems the typological system is opposed only in so far as it is desirous of using a terminology that openly acknowledges its non-phylogenetic character."
Similar views are expressed by Myers (1960).
Is typology tenable? To what extent are typological principles consonant with modern biological theory? A fundamental typological approach through idealistic morphology would, I believe, be rejected by most modern biologists. A platonic idea of type with its metaphysical implications is incompatible with the ordinary processes of thought and analysis of natural science (Remane, 1956; Simpson, 1961). In a very thorough philosophical analysis of the nature of systematics Bloch (1956) has shown that metaphysical concepts of the type cannot automatically be connected with the physical cause-and-effect system customarily recognized and investigated in science. Furthermore, and perhaps of more moment to the practical minded systematist, an idealistic type cannot have any value, historical or heuristic, in the establishment or study of the natural system. For a recent view to the contrary see Borgmeier (1957).
While one therefore cannot accept an idealistic type concept in a scientific taxonomy, it does not follow that every type concept is of necessity based on idealistic morphology. Hence Simpson's (1961) statement that typological theory is inextricably linked with philosophic idealism is not correct. Simpson himself has pointed out the empirical nature of some of the newer typological approaches and that there is no necessary connection between the idealistic point of view and an empirical, but non-phylogenetic method. Bloch (1956) also finds that it is not necessary to consider all non-phylogenetically oriented morphology and taxonomy as idealistic. Simpson (1961, p. 50) further condemns typology on the grounds that the "concept of distinct and static patterns cannot meaningfully be applied to real groups of organisms, which are parts of an evolutionary continuum and which are always highly variable. Their variation is not incidental or an 'accident' to be ignored at any level in taxonomy; it belongs to the very nature of taxa and is part of the mechanism of their origin and continuing existence". These statements confuse the dynamism of evolutionary processes with the essentially static nature of the resemblances and differences on the basis of which we perceive evolutionary changes and their results in a group of organisms. Almost identical criticism has been expressed against numerical taxonomy. It is stated that evolution is a dynamic process and, since numerical taxonomy can only evaluate relationships between fixed, static entities, it is in principle incapable of exhibiting evolutionary relationships. While it is, of course, true that evolution is dynamic and that evolutionary processes must be studied as changes in form and shape, these changes can only be described as modifications of fixed stages arbitrarily chosen at any given point in time.
The actual comparison is never made while forms are in a state of flux, but is at certain fixed developmental and evolutionary stages. Simpson, being a palaeontologist, is naturally more aware of phylogenetic changes and the modification of evolutionary lines than a neozoologist would be. However, the occurrence of phylogenetic changes should not blind us to the fact that the paleontologist himself describes and measures changes between fixed stages at various geological horizons. Similarly, embryology is a dynamic science and yet the study of it has not necessarily been of an embryo in statu nascendi, but also of series of developmental stages. The above processes might be analogized to integration in the calculus; we piece together a continuous curve out of many small segments, as we are doing in phylogenetic work using fossil material. The point is also made by Simpson that typology cannot take into consideration variation within any one point in time, i.e. the morphological variation within a taxon. This is a criticism justified only if it is aimed at
classical typological work. A typology should allow for variation of characters to represent the lower hierarchic units contained in a proposed taxon. A number of suggestions for a statistical typology which would meet such requirements are given in a later section. Another objection to classical typological theory is its restriction to morphological evidence. The historical justification of this restriction has already been pointed out. Phylogenetic systematics in its earlier days also relied almost exclusively upon morphological characters and to a large extent does so even today. Just as it is obvious from modern knowledge of genetics and development that the external morphology of an organism will to a remarkable extent portray its genetic make-up, so we also know that there is no essential difference between the types of genes which manifest themselves primarily at other levels of integration, i.e. physiological, behavioral and so forth. Indeed, owing to pleiotropic effects many genes will manifest themselves at several of these levels simultaneously. It is for this reason that consistent and reasonable systems have been arrived at based entirely on morphological evidence and even only on morphological evidence of given organ systems, such as the skeleton in vertebrates, and that discovery of additional characters of whatever sort frequently only verifies such classifications. Yet at the same time there would appear no justification for rejecting non-morphological characters which are being discovered at an ever-increasing rate, as new methods of analysis (biochemical, physiological, ethological, etc.) reveal the existence of previously unsuspected patterns of variation.
It cannot be argued that morphological characters are more significant in evolution, hence should be preferred, or that one type of character is functional and the other type not, since first of all the non-adaptiveness of a character can hardly ever be satisfactorily demonstrated and even if true would be of no consequence to classification. Orthodox typology shares an important shortcoming with phylogenetic systematics, namely, classification on the basis of a few characters, often arbitrarily chosen. When a given organ or organ system lends itself conveniently to typological analysis it may be used to erect a taxonomy, yet no a priori reason may exist for choosing this organ system over any other one. Arbitrariness and bias therefore are inherent in typological classifications erected on the basis of a few characters, just as they are in the case of phylogenetic systematics. In summary, classical typology is unacceptable when contaminated with idealistic, metaphysical concepts, when restricted only to morphological evidence and when based on few characters. On the other hand, when it represents an empirical summation of the information available on a given taxon without phylogenetic value judgments of these characters and when it
is performed on the basis of numerous characters of many kinds rather than few of a single kind, then typology can serve as a yardstick of resemblance between taxa which, when properly quantified, should yield important information on the results of evolutionary processes. It will be the task of the next two sections of this paper to examine the possibilities for a typology grounded in empiricism and employing a statistical methodology.
5. Empiricism and Typology
Empiricism in taxonomy has been ably reviewed by Simpson (1961, pp. 41 ff.). The empirical taxonomist will observe and record as many characters as possible and group the taxa according to a majority of shared characters. The closeness of the relationship depends on the number of shared characters. The method by which the number of shared characters is determined and computed varies with the empirical school. A number of relatively sophisticated statistical methods have been developed to this end in recent years in connection with the development of numerical taxonomy. Historically empiricism dates back to the French botanist Adanson, working in the eighteenth century; there have been a number of empiricists among taxonomists in every generation since. The fundamental problem of the validity of empiricism in taxonomy must be whether it can be used as a consequential and consistent method for arranging organized nature. Alternatively, should any system, phylogenetic or other, be used, which aprioristically or during the process of classification assigns unequal weights to certain characters? Should the phylogenetic hypotheses, which will inevitably be drawn from the classification, enter and influence the classificatory process? It is not difficult to see why phylogenetically oriented systematists use inferences about phylogeny to further their taxonomic work. Given a reasonable working hypothesis, it is only human nature to employ it toward the building of further hypotheses. Having decided that regularity and order in groups of organisms is a function of their descent with modification, one is easily led to speculate on the progress of this descent and the nature of these modifications. To ignore such reasoning would displease some. It would mean (to them) ignoring a basic unifying principle of biology. Thus Simpson (1961) calls the work of modern Adansonians (empiricists) not wrong, but shallow and incomplete.
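The statement that closeness of relationship depends on the number of shared characters can be made concrete with a small sketch. The Python fragment below is an illustration only, not a reconstruction of any particular empirical school's procedure: it assumes two-state characters, equal weights, and the simple matching fraction as the measure of resemblance, and the taxa `A`, `B`, `C` with their scores are hypothetical.

```python
def simple_matching(a, b):
    """Fraction of characters on which two taxa agree (all weights equal)."""
    if len(a) != len(b):
        raise ValueError("taxa must be scored on the same characters")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_table(taxa):
    """Pairwise resemblance for every unordered pair of taxa."""
    names = sorted(taxa)
    return {
        (p, q): simple_matching(taxa[p], taxa[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical taxa scored on six two-state characters (1 = present, 0 = absent).
taxa = {
    "A": [1, 1, 0, 1, 0, 1],
    "B": [1, 1, 0, 0, 0, 1],
    "C": [0, 0, 1, 0, 1, 0],
}

table = similarity_table(taxa)
for pair, value in sorted(table.items()):
    print(pair, round(value, 3))
```

In this invented example A and B agree on five of six characters and would be grouped together before either is joined with C. Other coefficients would be equally admissible; the choice made here is an assumption of the sketch.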
Yet the argument here centers only on when, not on whether, phylogenetic deductions are to be made. None but the most orthodox and regressive typologists would deny the evolutionary record and discourage evolutionary deductions from systematic evidence. However, all consistent empiricists must emphasize that these deductions should be made after
the classification has been established, not during the process of classification. Not only is this position consistent and logically defensible, but phylogenetic deductions made from phenetic evidence as exhaustive as the canons of numerical taxonomy require are bound to be better than those based on the usual taxonomic methods. The argument has often been made both by typological and numerical taxonomists that non-phylogenetic classifications are to be preferred since these are closer to the facts of nature and contain fewer hypothetical elements. Hennig (1950) feels that there are two basically erroneous assumptions contained in such a statement. One of these is that the first step in systematics should be a simple classification without concern for the causes of order in the system. He feels it safer to use a single hypothesis and assumption regarding the origin or the phylogenetic significance of a single structure than to use vague and manifold assumptions at various portions of the natural system to be erected. He claims it is impossible for the taxonomist to proceed toward ordering his system without maintaining some position, often without realizing what this position is and what possible bias could be inherent in it. Hennig's arguments, while pertinent against classical typology, are groundless against empirical typological procedures such as numerical taxonomy. The procedures for numerical taxonomy are clearly defined and circumscribed. The original data that enter into the computations must be equally precisely described. This permits not only the exact delineation of the information used in the classificatory process but also permits its correction if errors in judgment or in classification of characters can be shown. Furthermore, if only admissible characters are employed, following uniform criteria (such as those of Sneath & Sokal, 1962), there need be little doubt about the nature of the assumptions used in the classificatory process.
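The claim that an explicitly recorded data matrix permits both exact delineation and later correction can be illustrated with a short sketch. The Python fragment below is a minimal example under stated assumptions and is not the procedure of Sneath & Sokal: the taxa `t1` to `t4`, their scores, the matching fraction, the single-linkage grouping rule and the threshold values are all hypothetical choices made for illustration.

```python
from itertools import combinations


def matching_fraction(a, b):
    """Fraction of equally weighted characters on which two taxa agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def groups_at_threshold(matrix, threshold):
    """Single-linkage groups: two taxa fall in one group whenever a chain
    of pairwise resemblances >= threshold connects them."""
    names = sorted(matrix)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if matching_fraction(matrix[p], matrix[q]) >= threshold:
            parent[find(q)] = find(p)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


def with_correction(matrix, taxon, index, value):
    """Return a copy of the data matrix with one recorded state corrected."""
    fixed = {name: list(states) for name, states in matrix.items()}
    fixed[taxon][index] = value
    return fixed


# Hypothetical data matrix: four taxa scored on eight two-state characters.
matrix = {
    "t1": [1, 1, 1, 0, 0, 0, 1, 0],
    "t2": [1, 1, 1, 0, 0, 1, 1, 0],
    "t3": [0, 0, 0, 1, 1, 1, 0, 1],
    "t4": [0, 0, 1, 1, 1, 1, 0, 1],
}

before = groups_at_threshold(matrix, 0.9)
# Suppose character 2 of t4 is later shown to have been misrecorded.
after = groups_at_threshold(with_correction(matrix, "t4", 2, 0), 0.9)
print(before)
print(after)
```

Because every input is written down, a demonstrated error in one recorded state can be corrected and the grouping recomputed, and any change in the result is traceable to that single entry.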
Hennig's second charge of error is against those who maintain that relations based on similarity are easiest to obtain and most factual and that individuals (or sometimes species) are to be the basic units in the classificatory process. Hennig points out that many forms undergo metamorphic changes and will appear different during various stages in their life history. Hence a system based on similarity would give quite erroneous results, e.g. caterpillars would be classified with other caterpillars rather than with their own adult form. Hennig therefore concludes that we cannot ignore ontogenetic considerations in the classificatory process. I would concur in this opinion, but cannot follow Hennig when he goes on to say that, because it has been shown that we must take account of ontogenetic relationships as well as tokogenetic relationships (Hennig's term for genetic relationships within a species) in systematic work at the species level, work on the higher categories must of necessity include phylogenetic relationships. This does
not follow from any logical or scientific principle. The fundamental difference between the value of ontogenetic and tokogenetic relations in interpreting systematic relationships at the infraspecific levels, and the use of phylogenetic relationships at higher levels is that the former can be put to exact experimental and analytical proof and demonstration, while the latter must of necessity remain hypothetical. Empirical taxonomy must take ontogenetic relationships into account in the case of the caterpillar and butterfly, since in any phenetic classification we are constantly on guard against the establishment of absurd relationships. I believe it would be legitimate to exclude from consideration the comparison of a butterfly with a caterpillar on a priori grounds, but if that is not granted by the critical reader, then such comparisons are still partly invalid according to the rules of numerical taxonomy which require employment of only those characters comparable between the two forms (Sneath & Sokal, 1962). In a character-by-character analysis it would be quite likely that most of the characters by which adult butterflies could be compared could not be found in the larvae (and vice versa), making a meaningful comparison difficult. If, on the other hand, enough common characters can be found (and these would quite likely be of the biochemical type), it may very well be that the caterpillar and adult of species A would indeed be closer to each other than either of them would be to one of the life history stages of butterfly species B. This whole subject will remain debatable until a problem of this sort is actually tested by an analysis of character-by-character correspondences among forms exhibiting various types of metamorphosis. It should be added that in a comparison of overall similarity between species A and B by numerical taxonomy available characters from all life history stages would be utilized.
Empirical taxonomy thus need not be limited to a consideration of a single life history stage. Hennig (1950) defends the legitimacy of the phylogenetic approach by the questionable principle of reciprocal illumination, discussed above. He cites four reasons for preferring a phylogenetic point of view. The first is that a phylogenetic system is the most meaningful of all possible systems because all other types of classifications, such as ecological, zoogeographic, or morphological, can be derived and explained through the phylogenetic system. None of these special systems could occupy such a central and all-explanatory position. This is a powerful argument and its essential correctness cannot be doubted. The theory of descent with modification is the most adequate, most unitary and indeed simplest hypothesis through which a great variety of biological phenomena such as geographical distribution, physiological adaptation, morphological similarity or biocoenotic complexity can be related. Phylogeny can thus be seen as the central cause of much of biology, but (and here I must ask for the reader's indulgence for my painful belaboring of the same point) phylogeny can be a useful explanatory concept only when it is reasonably well known, and it is not so known in the vast majority of instances! Thus while an empirical classification cannot serve as a universal explanatory concept for these many biological phenomena it is at least a self-sufficient, factual procedure. Hennig's second reason for preferring phylogenetic systematics is no longer valid today. He states that while phylogenetic relationships are at least in principle measurable (measurable is surely meant in sensu latissimo) no exact method for the measurement of similarities between various morphological systems exists. Recent years have witnessed the rapid and varied development of methods of quantitative assessment of similarity of form, both in numerical taxonomy and related fields; the increasing availability of computational hardware promises to make these methods increasingly practical. Hennig differs from other phylogenetic systematists, e.g. Simpson, in accepting the legitimacy of several systems of classification, typology among them. He states (1950, p. 154) that "the quest for typological systems is indissolubly linked with the quest for exact methods for the recognition of types". He thus realizes clearly the need for a numerical taxonomy. Many authors join Hennig in a plea for an exact taxonomy. A pertinent quotation from Singer (1959, p. 200) must suffice here: "We would stress the fact that, from the time of Linnaeus to our own, a weak point in biological science has been the absence of any quantitative meaning in our classificatory terms. What is a Class, and does Class A differ from Class B as much as Class C differs from Class D? The question can be put for the other classificatory grades, such as Order, Family, Genus, and Species. In no case can it be answered fully, and in most cases it cannot be answered at all . . . .
Until some adequate reply can be given to such questions as these, our classificatory schemes can never be satisfactory or 'natural'. They can be little better than mnemonics, mere skeletons or frames on which we hang somewhat disconnected fragments of knowledge. Evolutionary doctrine, which has been at the back of all classificatory systems of the last century, has provided no real answer to these difficulties. Geology has given a fragmentary answer here and there. But to sketch the manner in which the various groups of living things arose is a very different thing from ascribing any quantitative value to those groups." Thirdly, Hennig claims that there is no one-to-one correspondence between morphological similarity and phylogenetic relationship. It would appear from general experience, particularly with convergent evolution, that phenetic relationships may mask phylogenetic relationships to some extent. Which of the two relationships would be more pertinent to the taxonomist under such circumstances remains to be demonstrated (see in
this connection Sneath & Sokal, 1962). In any case, we enter a field unexplored in theory or practice. Until it can be shown through plausible models to what degree phylogenetic relationships can differ from morphological (phenetic) relationships, or until a case of known phyletic history can be used to demonstrate quantitatively the correspondence of morphological with phylogenetic relationships, judgment on such issues should be suspended in favor of research upon them. It must not be forgotten that in any case the basic fallacy in Hennig's reasoning remains, namely, that while phenetic relationships can be observed and quantified, phylogenetic relationships cannot be so studied and frequently have to be derived from the phenetic relationships. Hennig's fourth point, the inability to classify members of different developmental stages from purely morphological points of view, has already been discussed. In numerical taxonomy no real problem develops in this regard, since all kinds of characters are considered. Can empirical taxonomy be equated to typology? Simpson's (1961) and Inger's (1958) assertions to this effect are only partially true. Empirical procedures do not aim at a hypothetical idealistic type and in that sense they clearly are not typological. However, they do on the basis of many characters abstract and define a common "type", best described as a bounded multidimensional space contained within a multidimensional framework, the axes of which are the characters considered. What is said below of empirical taxonomy will primarily refer to numerical taxonomy but need not necessarily be restricted to this procedure. Such a multidimensional representation will have to take into account the variation of a given character among the members within the taxon. It will not be necessary for any character to be constant within the taxon. It is quite likely that many characters will not be constant for all the members of the taxon that were studied.
Simpson and most orthodox taxonomists concur with Beckner's (1959) demonstration that it is possible to define a taxon the members of which possess not a single character in common, although with the large number of characters used in numerical taxonomy it seems to me unlikely that such a situation would occur, at least for the relatively lower hierarchic ranks within a study. Classical typology defines an invariant type, which, while not descriptively representative of all the actual forms subsumed by it, permits their conceptual derivation. The "type" of the empirical taxonomist would be wider and allow for all variations of form within it. It would thus be a space which would contain within it all the forms that have been studied to date, but it would also contain many unoccupied regions into which additional forms could be placed at a later time. Other additional forms might only require the widening of this space but not any basic alteration.
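The empirical "type" just described can be pictured, in its crudest form, as a box in character space. The following Python sketch is an illustration under stated assumptions and not a procedure given in the text: the character names and coded values are hypothetical, and a rectangular box (one range per character) is only the simplest way to bound such a space.

```python
from typing import Dict, List, Tuple

Form = Dict[str, float]              # character name -> coded value (hypothetical)
Box = Dict[str, Tuple[float, float]]  # character name -> (lowest, highest) value


def bounding_type(forms: List[Form]) -> Box:
    """Return, for every character, the range spanned by the forms studied."""
    characters = forms[0].keys()
    return {c: (min(f[c] for f in forms), max(f[c] for f in forms))
            for c in characters}


def fits(form: Form, box: Box) -> bool:
    """True if the form falls inside the box on every character."""
    return all(lo <= form[c] <= hi for c, (lo, hi) in box.items())


def widen(box: Box, form: Form) -> Box:
    """Enlarge the box just enough to admit a form lying outside it."""
    return {c: (min(lo, form[c]), max(hi, form[c]))
            for c, (lo, hi) in box.items()}


# Two hypothetical forms scored on two hypothetical characters.
studied = [{"wing": 1.0, "leg": 4.0}, {"wing": 3.0, "leg": 2.0}]
box = bounding_type(studied)
print(fits({"wing": 2.0, "leg": 3.0}, box))  # a form in an unoccupied region
print(widen(box, {"wing": 5.0, "leg": 3.0}))  # a form that requires widening
```

No character is required to be constant here: a new form either lands in a region the box already contains or widens the box without altering its construction.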
6. Typological Methodology
After removing metaphysical concepts from consideration, the remaining type concepts can be divided into "types by abstraction" and "types by synthesis" (Smirnov, 1925). Types by abstraction are based upon common elements of all the members to be subsumed under the type, i.e. upon characters whose expression is constant among the taxa represented by a given type (but not among related taxa). Of necessity abstracted types are frequently based on few characters, since within any taxon there may be only a few characters which are characteristic of the entire group and not characteristic of related groups. The higher the hierarchic rank the fewer common characters will remain. Abstracted types thus have little to recommend them. For a discussion of two kinds of abstracted types, the diagrammatic type and the generalized type, see Remane (1956). The synthetical types sensu Smirnov (1925) hold more promise. Such a type should express the different features of all the variations it subsumes. The central types of Remane (1956) are synthetical. When characters of forms within a taxon exhibit a series of (presumably adaptive) radiations, a central form can often be extrapolated. Many so-called "primitive" types, which may or may not be represented by extant organisms, are determined by this method. They are considered synthetical because in arriving at the central type we are taking the various series of character variations into consideration. Remane makes a point of stating that the central type is not necessarily a central value in the statistical sense, i.e. it does not represent the means of the characters to be studied. The term "central" is much more appropriate to Smirnov's (1925) types based on central values in the customary statistical sense. They are synthetical because central values involve all members of a distribution.
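Central values in the customary statistical sense can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the coded states are hypothetical, and the four statistics computed (midpoint of the range, median, mode, arithmetic mean) are simply the ordinary textbook definitions.

```python
from statistics import mean, median, multimode


def central_values(states):
    """Return four candidate central values for one character's coded states."""
    return {
        "midrange": (min(states) + max(states)) / 2,  # midpoint of the range
        "median": median(states),
        "modes": multimode(states),                   # most frequent state(s)
        "mean": mean(states),
    }


# Hypothetical character coded 1..5 in nine forms; state 1 is heavily
# over-represented in the sample.
states = [1, 1, 1, 1, 1, 1, 2, 3, 5]
print(central_values(states))
```

In this hypothetical sample the median, the mode and the mean are all pulled toward the over-represented state, whereas the midpoint of the range depends only on the two extreme states and not on how often each state occurs.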
If the characters are coded on a continuous scale, with different states of a character coded arbitrarily one, two, three, etc., then the midpoint of the range, the median, the mode, or the arithmetic mean all are central values. Some hesitation must accompany the use of any of these statistics because they are based on frequency distributions which may not be at all representative of the variation we wish to encompass. An example from human biology may make this point clear. If we code skin color on a graduated scale from palest white to darkest black, what value are we to choose as representative of the species Homo sapiens? Taking either the mean or the median would involve some weighting of various racial classes. Are we to weight them by the number of individuals of each particular race or color class? This might be an acceptable procedure, but may have no relation at all to the evolutionary importance of a given race in terms of the origin of the human species as a whole. It may be argued that in a
phenetic classification evolutionary origin should have no place; however, establishing a phenetic classification does not mean operating a priori in blatant contradiction to evolutionary trends. In ignorance of any dependable method of weighting the character states representing a given taxon, a mid-point between the extreme values of the range or perhaps a modal class representing the most frequent one might be a more useful method for describing central types. Once a central type (sensu Smirnov) is established, it can be plotted in a multi-dimensional space and the relations between different types can be obtained by studying distances in this space in the manner suggested by Sokal (1961). While synthetic by nature, Remane's (1956) systematic types require for their construction prior knowledge of the natural system, which that author bases on homologous characters. In considering the reconstruction of a type from its natural group Remane would include also neighboring and related forms. He would agree with Naef (1919) that of all characters of a group those should be regarded as typical that are also typical of the more inclusive systematic category. To illustrate this procedure Remane cites the pectoral girdle of the monotremes which looks quite different from that of the Eutheria. Ordinarily it would not be included among the type of mammals, since after all only a very tiny proportion of all mammals have such a pectoral girdle. On the other hand, when we consider the reptiles and amphibia, in which structures homologous to the monotrematous pectoral girdle occur, we are forced to recognize that the mammalian type must take the girdle of the monotremes into account. The construction of systematic types cannot be recommended since it requires that the natural system be known. Yet once this system is known little point is served by erecting a type.
Furthermore, if it is true that the natural system can only be elaborated by empirical taxonomy, then it follows that the type to be erected should be an empirical one. The methods that seem to have most promise in the construction of synthetical types are the empirical ones employed in numerical taxonomy. One of these consists of defining clouds or clusters of individual forms, such as individuals, species or genera (so-called operational taxonomic units, OTU's; Sneath & Sokal, 1962). An easy way of thinking of them is to represent these OTU's in a hyperspace bounded by axes representing the characters. Sokal (1961) has shown how Euclidean distances can be calculated between such simple OTU's. The resulting matrix of distance coefficients is similar to the matrix of correlation coefficients obtained in other studies by numerical taxonomy. The distance coefficient matrix can then be analysed by various forms of cluster analysis and mutually exclusive non-arbitrary groups (Simpson, 1961) formed by various methods of
clustering. Methods for defining these clusters have been given by Rogers & Tanimoto (1960), Sokal & Michener (1958) and McQuitty (1954, 1956, 1957). Others are currently under study in the writer's laboratory. Related to these methods is one Sokal & Sneath (1963) have called the exemplar method. On its face it appears not to be synthetical, but rather an abstracted method of arriving at types. On the assumption that it would be difficult to find some scheme for obtaining a central value, the method employs a single member of a presumed group, as its exemplar or representative. If the group is real and definable in the conventional way, the amount of variation or error around this exemplar within the group should be less than the variation of the entire group within the next higher taxon. Thus, to use another example from humans, we might wonder in a study of various primates, including the species Homo sapiens, which particular race of humans to use as a representative. According to the tenets of the exemplar method this would not matter much, because the differences between man and other primates would be of such magnitude that the slight differences in location in hyperspace between a Mongolian and a Caucasian, for example, would be trivial by comparison. Types could be set up based on correlations among operational taxonomic units and the clusters obtained therefrom. Correlations among taxa are the so-called Q-type correlations, and methods of clustering these have already been mentioned. When taxa are correlated they can be conceived of as a bundle of vectors (or tips of these vectors) in a multidimensional space. Clusters of these vectors represent taxa of similar forms. The coordinates of such a space would be rectangular and the number of dimensions necessary to represent the correlations among taxa would correspond to the rank of the correlation matrix. When taxa are represented as clouds or clusters of points, or as vectors, i.e.
when we work with correlations or distances, the resulting taxa are generally not defined on a basis of discrete characters but are defined by overall similarities. Most methods of numerical taxonomy are of this general nature. Thus the establishment of taxa by such methods does not necessarily yield descriptions of types. It is necessary to refer back to the characters on the basis of which the correlations or distances have been computed. In order to represent character variation in a type, the latter would have to be represented by some central value or other synthetical statistic, as was discussed earlier. For example, once a taxon is defined it could be located in a hyperspace as a point representing central values for all the characters defining the space. Experience at this time is not sufficient to tell us what kind of central value is to be preferred. The example given below will show some of the applications of these methods. Perhaps some measure of the variance of forms should be included in
describing a type. This should show how the subordinate taxa are distributed within the higher taxon. No efforts in this direction have yet been made, but such problems might be a worthwhile field for inquiry. In plotting taxa in hyperspace we have assumed that the coordinate axes representing individual characters are orthogonal, i.e. that the characters are uncorrelated. This is an invalid assumption, as is well known. Procedures will have to be modified in subsequent work to conform with the more complicated but realistic postulates of correlated characters, with the result that some of the coordinate axes will be at varying angles to each other. No general treatment for numerical taxonomy based on correlated characters has yet been developed. This will require a thorough analysis of the relations between R-type matrices (matrices of correlations among characters) and Q-type matrices (correlations among taxa). Another method of obtaining clusters of taxa is by the method of multiple factor analysis, first employed for this purpose by Sokal (1958) and recently elaborated by Rohlf & Sokal (1962).
7. An Example
The example chosen is taken from Smirnov (1925), who used 24 genera of the sub-family Syrphinae of the dipterous family Syrphidae in an attempt to quantify types. The example is cited here strictly by way of an illustration of the concepts and methodology discussed above. It should in no way be thought to be a contribution to the systematics of syrphid flies. I have corroborated neither the morphological facts nor the nomenclature, which, being 35 years old, must surely be outdated. Smirnov cautions against using these data in more than an illustrative fashion since they are based on few characters of the internal male reproductive system only. Thus similarities obtained by Smirnov's method or by numerical taxonomy are merely crude, approximate similarities among the male reproductive systems of these forms, but not among the entire organisms.
Smirnov's data consist of records on seven ratios based on eight characters of the male internal reproductive systems of these flies. His raw figures are shown in Table 1 in the form of ratios. Smirnov first examines the frequency distribution of the 24 genera for each of the seven characters. Finding that the shape of this distribution for character V:P, relative length of seminal duct, is distinctly bimodal he proceeds to divide the assemblage of genera into two groups, group I containing genera code numbered 1 through 11 by me and group II containing the other 13 genera, code numbered 12 through 24. This first step of Smirnov's is difficult to follow, since it seems to make the subsequent grouping of the taxa rather arbitrary, depending on the distribution of the first character, which is therefore given an inordinately heavy weight. Furthermore, the division is taken between genera 11 and 12, which in fact have the same value for character V:P. Having set up the two groups on this basis, Smirnov computes the means for each character for each of the groups (except character T:P which is left out from further computations). Then he calculates for each genus the deviation of a particular character from the mean of its group. This deviation is standardized by being divided by the group mean. The absolute sum of these deviations for each genus in group I over all characters is obtained and called S1. He compares this quantity with another, S2, which is based on the deviations of the values of any given character in group I from the mean of that character in group II. By showing that the sum of the deviations is always less within groups than it is among groups, i.e. that S1 is always less than S2 for group I (and conversely for group II), Smirnov contends that the correctness of his natural grouping and of his typology has been demonstrated. It would be remarkable if this were indeed so, since the original grouping was arrived at on the basis of only a single character, rather than all seven characters. In fact, although omitted from his table, S2 for genus 9 (Platychirus) is smaller (4.10) than the S1 value of 4.60. If there is any basis for divergence among the groups (which may be supposed if at least one character diverges and there is some correlation among the other characters), then the conditions of S1 and S2 are almost a necessary consequence, since on the average the deviation of a genus from the mean of its own group will be less than from any other mean. However, the major objection to this particular way of grouping the genera would rest upon the arbitrariness of the initial choice, followed up by only a single test of the fitness of the particular grouping. Having delimited his groups, Smirnov then constructs graphical types for his two groups.
This is done by calculating the mean for the seven characters for each group and from these means, which represent mean ratios between two variables, reconstructing the original means of the variables on which they are based. He is then in a position to draw diagrams based on these means. These are "synthetic" rather than "abstracted" by his terminology. Examples of such graphical representations of central types (sensu Smirnov) are shown in Fig. 1. Even if a given grouping satisfies the criteria postulated by Smirnov, there is no reason why there may not be better groupings showing even greater divergence between S1 and S2. This would lead to a test of naturalness of grouping which has interesting possibilities, namely, to test all possible arrangements of the 24 genera into two not necessarily equal sized groups, to show which particular arrangement would be the most consistent within and divergent between groups. However, such a task on a trial and error basis is most formidable even in this day of the computer. Furthermore, there is no special reason to restrict to two the number of groups into which an assemblage of taxa has to be divided at any given rank. A logically consistent method would have to test up to t-1 groups for t genera. Since this would involve astronomic amounts of computation, more rapid methods for arriving at optimum solutions to the problem of grouping have to be employed. Such solutions would involve the various methods of cluster analysis which have been employed in numerical taxonomy in order to obtain taxonomic structure from matrices of similarity coefficients.
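The exhaustive test of all two-group arrangements can be sketched in Python for a very small case. This is an illustration under stated assumptions, not Smirnov's or the author's computation: the four units and their ratios are hypothetical, the divisor used for S2 (the mean of the other group) is an assumption, and the score is taken to be the average of S2 - S1 over all units.

```python
from itertools import combinations


def group_means(rows):
    """Mean of every character (column) over the rows of one group."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def summed_deviation(row, means):
    """Sum over characters of |value - mean| / mean."""
    return sum(abs(x - m) / m for x, m in zip(row, means))


def divergence(data, group1):
    """Average of (S2 - S1) over all units for the split group1 / the rest."""
    group2 = [u for u in data if u not in group1]
    m1 = group_means([data[u] for u in group1])
    m2 = group_means([data[u] for u in group2])
    total = 0.0
    for u in data:
        own, other = (m1, m2) if u in group1 else (m2, m1)
        total += summed_deviation(data[u], other) - summed_deviation(data[u], own)
    return total / len(data)


def best_split(data):
    """Try every division into two non-empty groups; return (score, group1)."""
    units = sorted(data)
    first, rest = units[0], units[1:]
    best = None
    # The first unit is always kept in group 1 so each split is tried once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            g1 = (first,) + extra
            score = divergence(data, g1)
            if best is None or score > best[0]:
                best = (score, g1)
    return best


def n_two_group_splits(t):
    """Number of ways to divide t units into two non-empty groups."""
    return 2 ** (t - 1) - 1


# Hypothetical ratios for four units on two characters.
data = {"a": [0.6, 0.1], "b": [0.5, 0.2], "c": [0.1, 0.6], "d": [0.2, 0.5]}
print(best_split(data))
print(n_two_group_splits(24))  # two-group arrangements of 24 genera
```

For 24 genera the two-group case alone comprises 2^23 - 1 = 8,388,607 arrangements, and the count grows much faster once more than two groups are admitted, which is why the text turns to cluster analysis rather than enumeration.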
FIG. 1. Graphic representation of types of the internal male reproductive system of the sub-family Syrphinae (redrawn from Smirnov, 1925). These two figures represent the types of the two groups into which Smirnov divided the 24 genera. His first group, representing genera 1 through 11, is shown on the left; the second group, representing genera 12 through 24, is shown on the right. These figures were drawn on the basis of means for the characters shown, computed separately for the two groups. Note the considerable differences in relative lengths of the various structures. This is to be expected, since the two groups were discriminated on the basis of ratios of the structures shown in Fig. 1. Abbreviations: t. = testis; pr. = prostate gland; v.s. = seminal duct; ves. s. = seminal vesicle; v.d. = vas deferens; a. = seminal ampulla; d.ej. = ejaculatory duct. The terminology of these structures may not correspond to modern usage. The prostate is commonly called the accessory gland; the seminal ampulla is also known as ejaculatory sac. Structures labelled v.d., a., and d.ej. might be included in the ejaculatory duct and v.s. might be labeled deferent duct in a modern study.
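One of the cluster-analysis methods referred to above, pair-group clustering with simple averaging of newly formed stems, can be sketched in Python as follows. This is a minimal modern formulation (often called WPGMA) and is an assumption about the general technique; it is not claimed to reproduce every detail of the procedure of Sokal & Michener (1958). The four units and their distances are hypothetical.

```python
from itertools import combinations


def wpgma(labels, distances):
    """Repeatedly merge the closest pair of stems.

    labels:    list of unit names
    distances: dict mapping frozenset({a, b}) -> distance between a and b
    Returns a list of (stem_a, stem_b, distance_at_which_they_join).
    """
    d = dict(distances)  # work on a copy
    active = list(labels)
    merges = []
    while len(active) > 1:
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        new = (a, b)  # the new stem is named by its two members
        rest = [c for c in active if c not in (a, b)]
        for c in rest:
            # Simple average: the two joined stems count equally,
            # whatever the number of units each already contains.
            d[frozenset((new, c))] = (d[frozenset((a, c))] + d[frozenset((b, c))]) / 2
        active = rest + [new]
        merges.append((a, b, height))
    return merges


# Hypothetical distances among four units.
dist = {
    frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 6.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 12.0, frozenset(("C", "D")): 8.0,
}
for step in wpgma(["A", "B", "C", "D"], dist):
    print(step)
```

The successive joining levels returned by such a routine are what a dendrogram displays; the number of distance comparisons grows only polynomially with the number of units, in contrast to the enumeration of all possible groupings.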
To apply some of the methods of numerical taxonomy to Smirnov's data, the data matrix of Table 1 was standardized by columns, i.e. each column was standardized over the 24 genera. The standardized data matrix, coded by the addition of a constant 5.0 in order to remove negative signs, is shown in Table 2. Distances were computed between all possible pairs of the 24 genera using the seven standardized characters (Sokal, 1961). It should be emphasized again that this example is solely methodological since seven characters would be quite inadequate in ordinary numerical taxonomic practice. However, one can still compare the results of Smirnov's method with numerical taxonomy since both are based on the same seven characters. The matrix of distances is shown in Table 3 in coded form, representing the ten complements (10 - distance) of the actual distances. Genera closest to each other will have a relatively high complement of distance, while
TABLE 1
Seven ratios of measurements of parts of the internal male reproductive system of 24 genera of Syrphidae (data by Smirnov, 1925)

Code No.  Genus             V:P   V1:V  T:P   B:T   D:P   Ves:D  Pr:P
 1        Syrphus           0.66  0.55  0.26  0.75  0.07  0.00   0.47
 2        Chrysotoxum       0.63  0.68  0.19  0.94  0.19  0.11   0.48
 3        Xanthogramma      0.61  0.46  0.35  0.81  0.04  0.00   0.47
 4        Leucozona         0.61  0.60  0.33  0.58  0.04  0.00   0.40
 5        Ischyrosyrphus    0.60  0.77  0.28  0.70  0.11  0.00   0.51
 6        Pyrophaena        0.57  0.88  0.30  0.82  0.12  0.00   0.52
 7        Lasiopticus       0.56  0.62  0.29  0.58  0.15  0.00   0.53
 8        Sphaerophoria     0.53  1.00  0.27  0.27  0.19  0.18   0.94
 9        Platychirus       0.46  0.67  0.29  0.87  0.25  0.16   0.51
10        Didea             0.44  0.85  0.50  0.51  0.05  0.00   0.2?
11        Bacha             0.44  0.62  0.44  0.55  0.12  0.00   0.42
12        Pipiza            0.44  0.00  0.24  0.65  0.32  0.49   0.38
13        Heringia          0.38  0.41  0.23  0.66  0.38  0.67   0.42
14        Chilosia          0.27  0.54  0.24  0.50  0.49  0.24   0.28
15        Sphegina          0.23  0.00  0.22  0.70  0.55  0.02   0.24
16        Chrysogaster      0.18  0.00  0.42  0.69  0.40  0.00   0.34
17        Paragus           0.17  1.00  0.42  0.70  0.41  0.38   0.48
18        Rhingia           0.13  0.80  0.26  0.69  0.60  0.53   0.20
19        Hammerschmidtia   0.12  0.46  0.62  0.13  0.25  0.00   0.26
20        Liogaster         0.11  0.67  0.16  0.67  0.73  0.06   0.39
21        Ferdinandea       0.10  0.31  0.29  0.42  0.61  0.13   0.27
22        Orthoneura        0.09  0.54  0.40  0.21  0.51  0.01   0.62
23        Brachyopa         0.08  0.89  0.33  0.32  0.54  0.29   0.28
24        Neoascia          0.07  0.00  0.43  0.62  0.50  0.03   0.15

V:P, relative length of seminal duct to entire length of reproductive ducts. V1:V, relative length of blended part of seminal duct to entire seminal duct. T:P, relative length of testicle to entire length of reproductive ducts. B:T, width:length ratio of testicle. D:P, relative length of deferent duct to entire length of reproductive ducts. Ves:D, relative length of seminal vesicle to length of deferent duct. Pr:P, relative length of prostate gland to entire length of reproductive ducts.
species far apart from each other will have a relatively low one. The distance matrix was then analysed by the weighted pair group method (Sokal & Michener, 1958), using simple averages to compute relations between newly formed stems. The results of this clustering are shown in the form of a numerical taxonomic dendrogram in Fig. 2. It can be seen that, when all seven characters are simultaneously considered, the grouping of the 24 genera is appreciably different from the dichotomy established by Smirnov. By drawing an 86 phenon line (Sneath & Sokal, 1962) across the dendrogram four groups result, which we shall arbitrarily name taxa A, B, C, and D. Taxon A consists of genera 1 through 7 and genera 9 through 11; taxon B consists of genera 12 through 18, and genera 20, 21, 23 and 24; taxon C consists of genera 19 and 22, while taxon D is monogeneric, consisting of genus 8 only. Inspection of Fig. 2 shows that such a classification is quite rough and that there are important subdivisions of these categories, as for
example genera 10 and 11 which are quite distant from the other genera in taxon A, similarly 12 and 13 which are quite distant from the rest of taxon B. On the whole, however, the classification set up by the distance method is preferable to the division by Smirnov. For example genus 8, which is quite aberrant according to Smirnov, is separated by the distance method. The classification shown in Fig. 2 is preferable to Smirnov's also by his
FIG. 2. Dendrogram or diagram of relationships among Smirnov's 24 genera of Syrphinae. This dendrogram is based on the weighted pair group method (Sokal & Michener, 1958) from a computation of distances (Sokal, 1961) of the data in Table 2. The ordinate is graduated as the ten complement of the distance (10 - distance) and multiplied by 10 in order to remove the decimal point. When this scale is compared with that of Table 3, it must be remembered that the ten complements there were multiplied by 100 in order to carry one more significant digit, which is not necessary in the ordinate. Numbers across the top of the figure are genus code numbers identified in Table 1. The broken horizontal line across the dendrogram is a phenon line (Sneath & Sokal, 1962) defining four taxa at a minimum level of 86 (within-taxon resemblance) on the ordinate scale. Note that genera 19, 22, and 8 do not belong into the other two groups, as shown by the method of Smirnov (1925).
criterion of "naturalness". Comparing the S1 and S2 values, the average difference for his scheme is 5.76 when prorated for seven characters, while by numerical taxonomy it is 6.46 for taxa A and B. Using the classification obtained by the weighted pair group analysis of the distances, we can now proceed to the formulation of types. A pictorial type similar to Smirnov's could have been prepared based on means of characters of groups. These would have been similar to those shown in
TYPOLOGY
AND E M P I R I C I S M
IN TAXONOMY
263
TABLE 2
Ratios of Table 1 standardized by columns (over each character) and coded by addition of 5.0

Genus                        Ratios
code no.   V:P    VI:V   T:P    B:T    D:P    Ves:D  Pr:P
   1       6.43   4.98   4.41   5.74   3.83   4.30   5.37
   2       6.29   5.40   3.76   6.66   4.40   4.86   5.43
   3       6.20   4.69   5.25   6.03   3.63   4.30   5.37
   4       6.20   5.15   5.06   4.92   3.63   4.30   4.96
   5       6.15   5.70   4.60   5.50   4.02   4.30   5.61
   6       6.01   6.05   4.78   6.08   4.07   4.30   5.67
   7       5.96   5.21   4.69   4.92   4.21   4.30   5.73
   8       5.82   6.44   4.50   3.41   4.40   5.22   8.17
   9       5.50   5.37   4.69   6.32   4.68   5.11   5.61
  10       5.40   5.96   6.65   4.58   3.74   4.30   3.83
  11       5.40   5.21   6.09   4.77   4.07   4.30   5.08
  12       5.40   3.20   4.22   5.25   5.01   6.79   4.84
  13       5.12   4.53   4.13   5.30   5.29   7.70   5.08
  14       4.61   4.95   4.22   4.53   5.81   5.52   4.24
  15       4.42   3.20   4.04   5.50   6.10   5.60   4.01
  16       4.19   3.20   5.90   5.45   5.39   5.70   4.60
  17       4.14   6.44   5.90   5.90   5.44   6.12   5.43
  18       3.96   5.79   4.41   5.45   6.33   6.20   3.77
  19       3.91   4.69   7.77   2.73   4.68   4.30   4.13
  20       3.86   5.37   3.48   5.35   6.94   4.61   4.90
  21       3.82   4.21   4.69   4.14   6.38   4.96   4.19
  22       3.77   4.95   5.72   3.12   5.91   4.35   6.27
  23       3.72   6.09   5.06   3.65   6.05   4.23   4.24
  24       3.68   3.20   6.00   5.11   5.86   4.45   3.47

For explanation of genus code numbers and abbreviations of ratios see Table 1.
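The coding described in the heading of Table 2 (standardization over each character, followed by addition of 5.0) can be sketched as below. The raw ratios are hypothetical, since Table 1 is not reproduced here, and the function name is invented for the example. The use of the sample standard deviation (divisor n - 1) is an assumption; the V:P values of Table 2, as read from the printed table, have a mean close to 5 and a sample standard deviation close to 1, which is at least consistent with it.

```python
from statistics import mean, stdev

def standardize_and_code(values, offset=5.0):
    """Standardize one character over all genera, then add a constant.

    ASSUMPTION: the sample standard deviation (n - 1 divisor) is used.
    """
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s + offset for x in values]

# Hypothetical raw ratios for one character in five genera (not from Table 1).
raw_ratios = [0.50, 0.60, 0.70, 0.80, 0.90]
coded = standardize_and_code(raw_ratios)

# Coded V:P values for the 24 genera, as read from Table 2.
vp_table2 = [
    6.43, 6.29, 6.20, 6.20, 6.15, 6.01, 5.96, 5.82,
    5.50, 5.40, 5.40, 5.40, 5.12, 4.61, 4.42, 4.19,
    4.14, 3.96, 3.91, 3.86, 3.82, 3.77, 3.72, 3.68,
]

# A coded character should have mean near 5 and standard deviation near 1.
print(round(mean(coded), 3), round(stdev(coded), 3))
print(round(mean(vp_table2), 3), round(stdev(vp_table2), 3))
```

Adding the constant only shifts every coded value to be positive; it leaves differences between genera, and hence any distances computed from them, unchanged.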
Fig. 1, but little purpose would be served in perpetuating such a convention unless strict morphological aims were being pursued. The figure that could be drawn would be more representative of the taxa shown because of the greater naturalness obtained by the method just demonstrated. If we wish to identify a type as a hyperspace bounded in every one of its dimensions and calculate the ranges of the taxa for each of the characters, we note that with respect to characters V:P and D:P there is no overlap between the groups whatsoever, and relatively little overlap for character B:T. One such character without overlap is sufficient to separate taxa, as shown in the schematic diagram of Fig. 3. We note that the taxa represent different subsets of the hyperspace described in the study. Because the characters in this example are continuous variables it is unlikely that they are identical for any of the taxa formed. However, if we consider character values within given ranges to be essentially equivalent, we could simulate character invariance by this method. We would then be able to obtain abstracted diagrammatic types defining groups A and B on the basis of characters V:P and D:P. As we have seen, this method, basing itself on too few characters and not representing a realistic organism, is not
to be recommended. These non-overlapping characters would, however, be useful in the construction of keys for the groups defined by the study. A final reference might be made to the exemplar method of representing types. If taxa A and B had been represented by a single genus each, the possible distance values which might have been obtained between them are all those distances given in Table 3 between members of taxa A and B. The average value of these distances is 851 with a standard deviation of 20. The range of these values is from 900 as an upper value to 813 as a lower value. The level of juncture between the stems as obtained in the dendrogram of Fig. 2 is 850. Thus we are able to obtain an estimate of the error which the exemplar method might provide. We find that the magnitude of error, in this particular example at least, is quite tolerable.

FIG. 3. Schematic diagram in two dimensions illustrating the plotting of taxa in a space, the axes of which are the characters. All the taxa are contained in the two dimensional space (rectangle defined by coordinates x1, x2, y1, y4). When more than three characters are considered they would be bounded in an analogous hyperspace. Taxa are plotted alongside the coordinate axes to show their magnitude with respect to a given axis (character). From these plots it can be seen that taxon A (symbolized by circles) and taxon B (symbolized by crosses) overlap with respect to character X while with respect to character Y there is a clear separation, taxon A being bounded by y1 and y2, taxon B by y3 and y4. The area ("two-space") is therefore divided into two separate subsets, one for taxon A and one for taxon B with respect to character Y, while it is not so divided with respect to character X. Similar principles would apply to a hyperspace which could be divided along certain dimensions, but not along others.

I am grateful for comments and criticisms from Professors George W. Byers
and Charles D. Michener and Mr. F. James Rohlf, all of The University of Kansas, and from Dr. P. H. A. Sneath of the National Institute for Medical Research, Mill Hill, London. Mrs. Julie C. Sokal kindly processed Smirnov's data at The University of Kansas Computation Center, which furnished much appreciated free computer time to me. Mrs. Ann Schlager prepared the illustrations.

REFERENCES
BECKNER, M. (1959). "The Biological Way of Thought". Columbia University Press, New York.
BIGELOW, R. S. (1958). Systematic Zool. 7, 49.
BLOCH, K. (1956). Biblioth. Biotheor. 7, 1.
BORGMEIER, T. (1957). Systematic Zool. 6, 53.
BOYDEN, A. (1947). Amer. Midland Naturalist 37, 648.
CAIN, A. J. (1959). Proc. Linnaean Soc. Lond. 170, 185.
CAIN, A. J. & HARRISON, G. A. (1958). Proc. Zool. Soc. Lond. 131, 85.
CAIN, A. J. & HARRISON, G. A. (1960). Proc. Zool. Soc. Lond. 135, 1.
DANSER, B. H. (1950). Biblioth. Biotheor. 4, 117.
EHRLICH, P. R. (1961). Systematic Zool. 10, 267.
GILMOUR, J. S. L. (1951). Nature, Lond. 168, 400.
HAAS, O. & SIMPSON, G. G. (1946). Proc. Amer. Philos. Soc. 90, 319.
HENNIG, W. (1950). "Grundzüge Einer Theorie der Phylogenetischen Systematik". Deutscher Zentralverlag, Berlin.
INGER, R. F. (1958). Evolution 12, 370.
KÄLIN, J. (1941). "Ganzheitliche Morphologie und Homologie". Universitätsbuchhandlung, Freiburg, Switzerland.
KÄLIN, J. (1945). Bull. Soc. Fribourgeoise Sci. Nat. 37, 135.
MARTINI, E. (1929). 3. Wanderversamml. Deutsch. Entomol. p. 94.
McQUITTY, L. L. (1954). Educ. Psychol. Meas. 14, 598.
McQUITTY, L. L. (1956). Brit. J. Stat. Psychol. 9, 5.
McQUITTY, L. L. (1957). Educ. Psychol. Meas. 17, 207.
MICHENER, C. D. (1958). Systematic Zool. 6, 160.
MICHENER, C. D. & SOKAL, R. R. (1957). Evolution 11, 130.
MORISHIMA, H. & OKA, H. (1960). Evolution 14, 153.
MYERS, G. S. (1960). Systematic Zool. 9, 37.
NAEF, A. (1919). "Idealistische Morphologie und Phylogenetik". Gustav Fischer, Jena.
NAEF, A. (1931). "Die Gestalt als Begriff und Idee" in "Handbuch der Vergleichenden Anatomie der Wirbeltiere". Urban & Schwarzenberg, Berlin.
REMANE, A. (1956). "Die Grundlagen des Natürlichen Systems der Vergleichenden Anatomie und der Phylogenetik". 2nd edition. Akad. Verlagsges. Geest & Portig, Leipzig.
ROGERS, D. J. & TANIMOTO, T. T. (1960). Science 132, 1115.
ROHLF, F. J. (1961). Proc. Ann. Meet. North Central Branch Entomol. Soc. Amer. 16, 12.
ROHLF, F. J. & SOKAL, R. R. (1962). Systematic Zool. 11, 1.
SCHOPF, J. M. (1960). Science 131, 1043.
SIMPSON, G. G. (1940). Amer. J. Sci. 238, 413.
SIMPSON, G. G. (1961). "Principles of Animal Taxonomy". Columbia University Press, New York.
SINGER, C. (1959). "A History of Biology". 3rd edition. Abelard-Schuman, London.
SMIRNOV, E. (1925). Z. indukt. Abstamm. Vererbungsl. 37, 28.
SNEATH, P. H. A. (1957a). J. Gen. Microbiol. 17, 184.
SNEATH, P. H. A. (1957b). J. Gen. Microbiol. 17, 201.
SNEATH, P. H. A. (1962). "The Construction of Taxonomic Groups" in "Microbial Classification". Cambridge University Press.
SNEATH, P. H. A. & COWAN, S. T. (1958). J. Gen. Microbiol. 19, 551.
SNEATH, P. H. A. & SOKAL, R. R. (1962). Nature, Lond. 193, 855.
SOKAL, R. R. (1958). Proc. 10th Intern. Congr. Entomol. 1, 409.
SOKAL, R. R. (1960). Proc. 11th Intern. Congr. Entomol. 1, 7.
SOKAL, R. R. (1961). Systematic Zool. 10, 70.
SOKAL, R. R. & MICHENER, C. D. (1958). Univ. Kansas Sci. Bull. 38, 1409.
SOKAL, R. R. & SNEATH, P. H. A. (1963). "The Principles of Numerical Taxonomy". (Book, in preparation).
SORIA, J. & HEISER, C. B., JR. (1961). Econ. Botany 15, 245.
WOODGER, J. H. (1945). "On Biological Transformations" in "Essays on Growth and Form Presented to D'Arcy Wentworth Thompson", ed. W. E. Le Gros Clark & P. B. Medawar. Clarendon Press, Oxford.
ZANGERL, R. (1948). Evolution 2, 351.