CLUSTER ANALYSIS

PAUL A. GORE, JR.
Department of Psychology, Southern Illinois University, Carbondale, Illinois

Linnaeus, whose system of biological taxonomy survives in modified form to this day, believed that all real knowledge depends on our capacity to distinguish the similar from the dissimilar. The classification of objects and experiences is a fundamental human activity. Perhaps in an attempt to simplify a complex world, we tend to organize aspects of our environment into meaningful units such as gender, political party, or species, sorting objects and events into categories based on their characteristics or uses. During early childhood, for example, we learn that a chair and a table are both pieces of furniture. The very process of classification may be a necessary prerequisite for the acquisition of language. Furthermore, the human capacity to learn over the life span is greatly enhanced as a result of our ability to assimilate new information into existing cognitive categories or schemas.

Classification is also fundamental to the development of science (Gould, 1989; Medin, 1989). Early attempts at categorizing celestial bodies, for example, gave rise to modern-day astronomy. And, although scientists no longer rely on measurement of the four humors to understand the nature and behavior of physical substances, this early Greek classification system spawned inquiry into physics, chemistry, biology, and philosophy. Rationally based classification gave way to more sophisticated techniques when Tryon (1939) and Cattell (1944) introduced mathematical procedures for organizing objects based on observed similarity. It was not until Sokal and Sneath's (1963) publication of Principles of Numerical Taxonomy, however, that clustering methods gained widespread acceptance in the sciences. Today, the literatures of biology, zoology, chemistry, earth sciences, medicine, engineering, business and economics, and the social sciences are replete with cluster analysis studies.

Although the use of cluster techniques is now quite commonplace, there are valid reasons why many investigators continue to be ambivalent about using the procedures in their research. For example, prior to beginning a cluster analysis, researchers must make several critical methodological decisions with little or no guidance. Further, attempts to familiarize oneself with cluster methodology may necessitate an interest in soil science or icosahedral particle orientations, as methodological advances are frequently embedded within content-specific empirical reports. The purpose of this chapter, therefore, is to (a) provide an overview of the uses of cluster methods, (b) describe the procedures involved in conducting cluster analyses, and (c) provide some general recommendations to the researcher interested in using cluster procedures.

I. GENERAL OVERVIEW

Cluster analysis is a term used to describe a family of statistical procedures specifically designed to discover classifications within complex data sets. The objective of cluster analysis is to group objects into clusters such that objects within one cluster share more in common with one another than they do with the objects of other clusters. Thus, the purpose of the analysis is to arrange objects into relatively homogeneous groups based on multivariate observations. Although investigators in the social and behavioral sciences are often interested in clustering people, clustering nonhuman objects is common in other disciplines. For example, a marketing researcher might be interested in clustering metropolitan areas into test-market subsets based on observations of population demographics, economic trends, and retail sales data. Alternatively, researchers in the physical sciences might be interested in clustering proteins or bacteria based on relevant observations of these objects.

It is helpful to consider the relation of cluster analysis to other multivariate procedures. Although cluster and discriminant analyses are both concerned with the characteristics of groups of objects, there is an important conceptual difference between the two procedures. Discriminant analysis is used to identify an optimal subset of variables that is capable of distinguishing among discrete predetermined groups (see M. Brown & Wicker, chapter 8, this volume). In contrast, cluster analysis begins with undifferentiated groups and attempts to create clusters of objects based on the similarities observed among a set of variables. Cluster analysis is also frequently compared to exploratory factor analysis (see Cudeck, chapter 10, this volume). In fact, some social scientists advocate the use of "inverse factor analysis" or Q-analysis as a method of clustering objects (Overall & Klett, 1972; Skinner, 1979). Both cluster and factor analyses attempt to account for similarities among a set of observations by grouping those observations together. Through an inspection of item covariability, factor analysis creates groups of variables. In contrast, cluster methods are used to group people together based on their scores across a set of variables. These contrasting approaches to observing similarities have been likened to the contrast between the Aristotelian and the Galilean approaches to scientific investigation (Cattell, Coulter, & Tsujioka, 1966; Filsinger, 1990). Filsinger notes that although the dominant Galilean approach permits accurate prediction of one variable given knowledge of another, the typological Aristotelian approach characterized by cluster analysis allows us to better understand the unique combination of characteristics that an organism possesses.

Most cluster analyses share a similar process. A representative sample must be identified and variables selected for use in the cluster method. Samples and variables should be carefully selected so as to be both representative of the population in question and relevant to the investigator's purpose for clustering. The researcher must decide whether to standardize the data, which similarity measure to use, and which clustering algorithm to select. The researcher's conception of what constitutes a cluster and his or her research question can provide some guidance in making these decisions. The final stages of cluster analysis involve interpreting and testing the resultant clusters, and replicating the cluster structure on an independent sample. Morris, Blashfield, and Satz (1981), among others, have summarized some generic problems inherent in using cluster analysis and interpreting the findings from cluster studies.

Differences exist within and across disciplines with respect to the terms used to describe cluster methods. A single cluster algorithm, for example, might be called complete linkage, maximum method, space-distorting method, space-dilating method, furthest neighbor method, or diameter analysis. Furthermore, as is often the case with multivariate methods, a set of procedural decisions must be made prior to conducting a cluster analysis. The investigator is faced with choosing from among hundreds of possible analytic combinations, some of which have been described but rarely used. Further complicating these decisions is the fact that statistical packages often include only a subset of these procedures, and some procedures are available only through the use of privately distributed software programs. Emerging research emphasizes the importance of these decisions in reporting that different classification procedures are often best suited to different classification questions or to the nature of the data being analyzed. Finally, researchers often fail to replicate and validate their clusters because of the difficulty involved. Nonreplicable clusters, and clusters that fail to relate to variables outside of the cluster solution, are of limited practical utility. In this volume, Tinsley and S. Brown (chapter 1), Dawis (chapter 3), M. Brown and Wicker (chapter 8), and Thorndike (chapter 9) describe cross-validation, bootstrapping, and jackknife procedures for evaluating the generalizability of multivariate analyses.

II. USES FOR CLUSTER ANALYSIS

Cluster methods lend themselves to use by investigators considering a wide range of empirical questions. Investigators in the life sciences, for example, are often interested in creating classifications for life forms, chemicals, or cells. They may be interested in developing complete taxonomies or in delimiting classifications based on their particular research interests. Medical scientists rely on clinical diagnoses and may use cluster methods to identify groups of people who share common symptoms or disease processes. The use of cluster methods in the behavioral sciences is as varied as the fields that constitute this branch of inquiry. A psychologist might be interested in exploring the possible relations among types of counseling interventions. In contrast, the economist may be charged with identifying economic similarities among developing countries. Clustering methods are useful whenever the researcher is interested in grouping together objects based on multivariate similarity.

Cluster analysis can be employed as a data exploration tool as well as a hypothesis testing and confirmation tool. The most frequent use of cluster analysis is in the development of a typology or classification system where one does not already exist. For example, Heppner and his colleagues (Heppner et al., 1994) identified nine unique clusters of persons presenting for treatment at a university counseling center. Persons in different clusters differed both in the constellation and magnitude of their presenting concerns and on external characteristics (e.g., ethnicity) and family of origin variables (e.g., family history of alcoholism). Alone, these findings might be of use to practitioners working in similar settings. It might be helpful to know, for example, that clients resembling members of clusters 1 and 2 (the severe and high generalized distress clusters) are less likely to come from alcoholic parents compared to clients resembling members of cluster 6 (severe somatic concerns). It is far more likely, however, that findings such as these will lead the researcher to generate new hypotheses that might help to explain the obtained cluster solution. Cluster algorithms might assemble observations into groups in ways that researchers would not have considered. The result of such an unexpected grouping should challenge the investigator to develop and further explore hypotheses that might account for the cluster solution. Similarly, applied researchers might use information obtained from a cluster solution to develop hypotheses regarding treatment or intervention. Larson and Majors (1998) recently provided an example of this type of study. These investigators identified four clusters of individuals based on their reports of subjective career-related distress and problem-solving efficacy. Their report outlines recommended strategies for working with career clients who may resemble members of these four clusters.

Investigators might also use clustering methods to test a priori assumptions or hypotheses or to confirm previously established cluster solutions (Borgen & Barnett, 1987). Chartrand and her colleagues (Chartrand et al., 1994), for example, used cluster analysis to compare a continuum versus an interactional view of career indecision, and McLaughlin, Carnevale, and Lim (1991) used cluster methods to compare competing models of organizational mediation. As researchers become more familiar with clustering procedures, it is likely they will develop new and innovative uses for this methodology. Several recent studies (Gati, Osipow, & Fassa, 1994; Sireci & Geisinger, 1992) apply cluster analysis to the psychometric evaluation of measurement instruments, and the authors make cogent arguments for using cluster techniques to complement existing methods of psychometric evaluation.

III. CLUSTER ANALYSIS METHODS

A. Theory Formulation

The first stage of a framework for classification, according to Skinner (1981), is theory formulation (see Hetherington, chapter 2, this volume). It is during this stage, and prior to collecting data, that the researcher should focus on grounding the proposed analysis in theory, specifying the purpose of the study, and determining the population and variables to be used. Ideally, a researcher should describe the nomological network guiding the study by providing precise definitions of expected classifications and the theoretical relations among them. In reality, however, social scientists rarely specify the conceptual framework guiding their classification research. This may stem in part from the common perception of cluster analysis as an exploratory technique. Other exploratory procedures such as exploratory factor analysis (see Cudeck, chapter 10, this volume) and multiple regression (see Venter & Maxwell, chapter 6, this volume) are routinely used in the absence of explicit theoretical assumptions. One of the primary arguments for beginning with theory when using cluster analysis is the fact that cluster analysis will cluster any data (even random data). When guided by theory, the researcher will have some mechanism for evaluating the meaning of, and potential uses for, the resulting clusters.

Speece (1995) encourages researchers to consider the purpose for their classification during this stage of the study. Cluster analysis may be used to develop a typology or classification system, as a test of existing classification systems, or simply to explore possible undiscovered patterns and similarities among objects. Speece notes that classification systems may be used either to promote communication with practitioners or to enhance prediction. As Blashfield and Draguns (1976) have observed, these two goals are frequently in direct opposition to one another. Speece encourages researchers to be explicit about the purpose for conducting their cluster analyses and to avoid altering their purpose midstream.

Similar to other analytic procedures, cluster analysis requires investigators to select variables for study and to define a population of interest. Although the selected variables represent only one particular subset of measurements available to the investigator, they determine how the clusters will be formed. Because cluster analysis will always produce a classification scheme, regardless of the data available, it is essential that the investigator select variables that relate adequately to the classification problem at hand. A biologist attempting to establish a classification scheme for living organisms, for example, would generate very different clusters using embryonic morphology as opposed to adult features. Psychologists can imagine how different their primary classification tool (DSM-IV) would look if it had been established using etiological variables instead of current symptomatology. Everitt (1986) points out that the initial choice of which variables to include in the analysis is itself an initial categorization of the data, and one for which the investigator receives no statistical guidance.

A related issue involves determining the number of variables to include in a cluster analysis. Although there is no clear-cut rule of thumb, researchers whose studies are guided by theory will have an advantage in specifying which variables are most likely to contribute to a meaningful cluster solution. Research exploring the issue of how many variables to include in an analysis has produced conflicting results. For example, Hands and Everitt (1987) found that increasing the number of variables used in a cluster analysis resulted in better identification of a known cluster structure. Price (1993), on the other hand, found that increasing the number of variables used in the analysis resulted in poorer cluster identification. Further, Milligan (1980) cautions against the use of irrelevant variables. Researchers are encouraged to select variables based on sound theoretical grounds, to select variables that will maximally discriminate among objects, and to avoid the indiscriminate inclusion of variables.

Selection of a representative population also needs to be considered during this stage of a cluster study. As is true with any statistical procedure, inferences drawn from a sample will generalize most adequately to the population from which the sample was drawn. Although there are no clear-cut recommendations with respect to how many objects to include in an analysis, data from several studies (Hands & Everitt, 1987; Schweizer, Braun, & Boiler, 1994) suggest that increasing sample size results in an increase in cluster reliability.

B. Measures of Association

Clustering objects based on their similarity to one another requires that you first determine the degree of similarity among the objects. Generally, the matrices used in cluster analyses are referred to as either similarity (proximity) matrices or distance matrices. Interpretation of the magnitude of the values depends on the type of matrix generated: high levels of similarity among objects are indicated by large values in a similarity (proximity) matrix and by small values in a distance matrix.

1. Similarity Matrices

There are several commonly used measures of similarity. The product-moment correlation coefficient is often used with continuous data. Contrary to its more common usage as an index of the association between two variables, however, the correlation coefficient used in similarity matrices describes the relation between two objects (e.g., people) on a set of variables (i.e., Cattell's Q correlation; Cattell, 1988). This is accomplished by inverting the Person (rows) x Variable (columns) matrix so that it becomes a Variable (rows) x Person (columns) matrix. Correlations calculated using this inverted matrix represent the degree of similarity between two objects (persons) on the defined set of variables. Table 11.1 provides an example of a raw data matrix in the classical (i.e., Person x Variable) and inverted (i.e., Variable x Person) formats. Table 11.2 provides a similarity matrix (product-moment correlations) for that raw data matrix.

When continuous data are not available, other procedures are used to calculate the similarity matrix. The contingency table can be used in combination with one of several different formulae to describe the similarity between two objects when only binary variables are available. For example, a contingency table could be calculated for the binary data in Table 11.1 and used to establish the degree of similarity between two persons on five separate binary variables (see Table 11.2). Once such a table has been established, the investigator has a large number of similarity coefficients from which to choose. Many of the coefficients differentially weight values that signify matching pairs, mismatches, or instances where both objects lack some attribute (negative matches; for examples see Anderberg, 1973, and Romesburg, 1984). Imrey (chapter 14, this volume) provides a more extensive coverage of multivariate procedures for analyzing binary observations.

TABLE 11.1 Raw Data Matrices

Classical format

                        Variable
Participant       1      2      3      4      5
1               1.0    5.0    7.0    7.0    5.0
2               2.0    6.0    2.0    8.0    6.0
3               4.0    4.0    5.0    5.0    4.0
4               5.0    1.0    4.0    4.0    3.0
5               7.0    2.0    3.0    9.0    2.0

Inverted ("Q") format

                        Participant
                  1      2      3      4      5
Var 1           1.0    2.0    4.0    5.0    7.0
Var 2           5.0    6.0    4.0    1.0    2.0
Var 3           7.0    2.0    5.0    4.0    3.0
Var 4           7.0    8.0    5.0    4.0    9.0
Var 5           5.0    6.0    4.0    3.0    2.0

Classical format for binary observations

                        Variable
                  1      2      3      4      5
Participant 1     1      0      0      1      1
Participant 2     0      0      1      1      0

Classical format for use in centroid cluster method

                        Variable
Participant       1      2      3      4      5
1               1.0    5.0    7.0    7.0    5.0
2               2.0    6.0    2.0    8.0    6.0
(3,4)           4.5    2.5    4.5    4.5    3.5
5               7.0    2.0    3.0    9.0    2.0
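The quantities in Tables 11.1 and 11.2 are easy to verify directly. The short sketch below is offered only as an illustration; it assumes Python with NumPy, neither of which is prescribed anywhere in this chapter. It rebuilds the Q-correlation similarity matrix by inverting the classical data matrix and correlating participant profiles rather than variables.

```python
# A minimal sketch (assumes NumPy) reproducing the Q-correlation similarity
# matrix in Table 11.2 from the classical data matrix in Table 11.1.
import numpy as np

# Classical format: rows = participants, columns = variables.
data = np.array([
    [1.0, 5.0, 7.0, 7.0, 5.0],   # participant 1
    [2.0, 6.0, 2.0, 8.0, 6.0],   # participant 2
    [4.0, 4.0, 5.0, 5.0, 4.0],   # participant 3
    [5.0, 1.0, 4.0, 4.0, 3.0],   # participant 4
    [7.0, 2.0, 3.0, 9.0, 2.0],   # participant 5
])

# Inverting the matrix (variables become rows) and correlating its columns
# yields correlations between participants rather than between variables.
q_format = data.T                          # inverted ("Q") format
similarity = np.corrcoef(q_format, rowvar=False)

print(np.round(similarity, 2))             # e.g., r(1,2) = .46, r(1,3) = .75
```

Note that the same routine that ordinarily correlates variables produces object-by-object similarities once the data matrix has been inverted.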

Nominal measures having more than two levels and ordinal observations can be dealt with using a variety of methods. For example, Cohen's (1960) Kappa Coefficient ((Po - Pc)/(1 - Pc), where Po = probability of observed agreement and Pc = probability of chance agreement) is often used with nominal data and has the added advantage of correcting similarity values for the occurrence of chance. Tinsley and Weiss (chapter 4, this volume) review the use of the Kappa Coefficient in evaluating interrater agreement. Kendall's Tau Coefficient (Kendall, 1963) is a method available to researchers when dealing with observations that are ranked.

Two of the most common concerns among investigators using binary or polychotomous variables are how to treat absences and how to treat negative matches. Aldenderfer and Blashfield (1984) provided an excellent example of the difficulty of dealing with absences in anthropology. They noted that failing to discover a particular type of artifact at one site does not necessarily reflect on the behaviors of the deceased inhabitants. It may, for example, reflect decay patterns idiosyncratic to a particular site. Dealing with negative matches is often dependent on the nature of the variables used and the investigator's research question. In some instances it may be inappropriate to consider two objects similar based on the fact that they both lack some characteristic. To consider two people similar because neither of them lives on the East Coast, for example, may not be appropriate. On the other hand, we might be more justified in assuming some similarity between two individuals because they both lack a high school education. The Rand coefficient [(a + d)/(a + b + c + d)] counts negative matches as bona fide matches, whereas the Jaccard coefficient [a/(a + b + c)] ignores negative matches entirely. Consideration of negative matches becomes more important when dealing with polychotomous data (Blong, 1973; Romesburg, 1984).

TABLE 11.2 Distance and Similarity Matrices for Use in Cluster Algorithms

Similarity and distance matrices corresponding to the raw data matrix

Product-moment correlation matrix

Object       1       2       3       4       5
1            -
2          .46       -
3          .75     .07       -
4         -.27    -.47     .36       -
5         -.13     .16     .40     .66       -

Squared Euclidean distance matrix

Object       1       2       3       4       5
1            0
2           29       0
3           19      30       0
4           54      63      13       0
5           74      59      37      32       0

Contingency table used to calculate similarity values for binary observations

                         Person 1
                     1          0
Person 2      1    1 (a)      1 (b)       2
              0    2 (c)      1 (d)       3
                                          5
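As an illustration of these coefficients (a sketch under my own assumptions, again in Python, rather than material from the original presentation), the contingency counts in Table 11.2 and the resulting Rand and Jaccard values can be computed directly from the two binary profiles in Table 11.1.

```python
# A hedged sketch computing the binary similarity coefficients discussed above
# from the two binary profiles in Table 11.1.
person_1 = [1, 0, 0, 1, 1]
person_2 = [0, 0, 1, 1, 0]

# Cells of the 2 x 2 contingency table shown in Table 11.2.
a = sum(1 for x, y in zip(person_1, person_2) if x == 1 and y == 1)  # joint presences
b = sum(1 for x, y in zip(person_1, person_2) if x == 0 and y == 1)
c = sum(1 for x, y in zip(person_1, person_2) if x == 1 and y == 0)
d = sum(1 for x, y in zip(person_1, person_2) if x == 0 and y == 0)  # negative matches

rand = (a + d) / (a + b + c + d)     # counts negative matches as matches
jaccard = a / (a + b + c)            # ignores negative matches entirely

print(a, b, c, d)                    # 1 1 2 1
print(rand, jaccard)                 # 0.4 and 0.25
```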

2. Distance Matrices

In contrast to similarity measures, distance (or dissimilarity) measures emphasize the differences between two objects on a set of observations. Large values in a distance matrix reflect dissimilarity. The squared Euclidean distance measure has received the most widespread use among researchers in the social and behavioral sciences. Represented as D², values are calculated for each object pair by summing the squared differences between observations. The distance between participant 1 and participant 2 in Table 11.1 (see inverted matrix) can be described by the following equation: (1 - 2)² + (5 - 6)² + (7 - 2)² + (7 - 8)² + (5 - 6)² = 29. A squared Euclidean distance matrix (see Table 11.2) is made up of all pairwise comparisons among objects, in this case participants. Using this distance measure has its drawbacks, however, in that changes in the scale of measurement can result in quite different pairwise rankings. For this reason, most researchers choose to standardize their data prior to using the squared distance measure. Another potentially useful distance measure is Mahalanobis's (1936) distance, which is the distance of a case from the centroid of the remaining cases. Conceptually, this measure of distance accounts for possible correlations among variables, something that is not taken into consideration when Euclidean distances are calculated.

Related to the use of distance matrices is the issue of variable standardization. Although considerable debate has surrounded the issue of whether or not to standardize variables, most authors currently agree that standardization of variables is advisable when differences between variables may be an artifact of different measurement units or scales. However, several recent studies also suggest that standardization may not influence the outcome of a cluster analysis as much as was once thought. For example, Punj and Stewart (1983) reviewed 12 cluster studies and concluded that standardizing variables appears to have little effect on the clustering solution. More recent studies echo Punj and Stewart's conclusions (Milligan & Cooper, 1988; Schaffer & Green, 1996).

There are numerous problems associated with the process of selecting an appropriate similarity or distance measure. One of the most frequently discussed issues involves how similarity among objects is captured by distance versus similarity measures. Cronbach and Gleser (1953) described how the similarity between profiles could be decomposed into shape, scatter, and elevation information. Shape describes the pattern of rises and falls across variables. A typical Minnesota Multiphasic Personality Inventory (MMPI) profile, for example, will contain patterns of high and low scores on 10 clinical and 3 validity scales. Scatter describes the distribution of scores around a central average. Psychologists occasionally discuss the scatter of subtest scores relative to the mean performance level of some construct. Finally, elevation represents the degree of rise or fall across variables and describes a profile's absolute magnitude. Objects that are deemed similar on one of these three characteristics are not necessarily similar on another characteristic. For example, MMPI profiles from two clients might be similar with respect to shape (e.g., rising scores for depression and anxiety) but dramatically different with respect to elevation (e.g., client 1 may be in the "clinically significant" range, whereas the scores for client 2 fall well within the average range).

The product-moment correlation coefficient is often criticized for discarding information about elevation and scatter, a by-product of standardizing the variables and subtracting the mean from each score. Two variables will correlate highly regardless of the magnitude or variability observed among scores, provided the scores rise or fall in parallel. In situations where nonlinear relations exist, the product-moment correlation coefficient also fails to adequately capture shape characteristics. Distance matrices calculated on standardized variables will result in a similar loss of elevation and scatter information. Distance matrices calculated on nonstandardized data, on the other hand, retain shape, elevation, and scatter information. Therefore, clusters resulting from a nonstandardized distance matrix may be based on any combination of shape, elevation, and scatter. An investigator hoping to capture severity (elevation) in the formation of clusters may in fact be forming clusters on shape or scatter alone.

Although the debate over the superiority of using either the distance or similarity method appears to have subsided somewhat in recent years, studies investigating the benefits or drawbacks of both methods continue to appear in the literature. Punj and Stewart (1983) concluded from their review of cluster studies that the choice of similarity or distance measures does not appear to be a critical factor in identifying clusters. In contrast, Scheibler and Schneider (1985) reported striking differences between the Pearson correlation and D² in a recent Monte Carlo study. Overall, Gibson, and Novy (1993) echoed these findings more recently. For the applied researcher, the question of whether to use a similarity or distance matrix often depends upon the nature of the classification question. When classification based on elevation or scatter is desirable, then researchers are encouraged to consider using the squared Euclidean distance measure on nonstandardized raw data. In contrast, when a researcher is primarily concerned with establishing clusters based on the shape of relations among observations, then the similarity matrix is the method of choice. Researchers who conduct their cluster analyses using both procedures and obtain similar outcomes are assured their cluster structure is not an artifact of the method used.
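A brief sketch may make the D² computation and the effect of standardization concrete. The code below is illustrative only (it assumes NumPy and is not drawn from the chapter); it reproduces the squared Euclidean distance matrix in Table 11.2 from the raw data and then recomputes it on z-standardized variables, where part of the elevation and scatter information is lost.

```python
# A minimal sketch (assumes NumPy) of squared Euclidean distances on raw and
# standardized data.
import numpy as np

data = np.array([
    [1.0, 5.0, 7.0, 7.0, 5.0],
    [2.0, 6.0, 2.0, 8.0, 6.0],
    [4.0, 4.0, 5.0, 5.0, 4.0],
    [5.0, 1.0, 4.0, 4.0, 3.0],
    [7.0, 2.0, 3.0, 9.0, 2.0],
])

def squared_euclidean(x):
    """All pairwise squared Euclidean distances (D^2) between rows of x."""
    diffs = x[:, None, :] - x[None, :, :]
    return (diffs ** 2).sum(axis=-1)

d2_raw = squared_euclidean(data)
print(d2_raw[0, 1])                  # 29.0, as in the worked example above

# Standardizing each variable (column) before computing D^2 removes the
# influence of measurement scale, but also discards elevation information.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
d2_std = squared_euclidean(z)
print(np.round(d2_std, 1))
```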


C. Clustering Algorithms

During the next stage of cluster analysis research, the investigator must select a clustering procedure. Whereas the similarity or distance measures provide an index of the similarity among objects, the cluster algorithm operationalizes a specific criterion for grouping objects together (Speece, 1995). Although researchers often disagree on the most appropriate classification scheme for cluster procedures (Lorr, 1994; Milligan & Cooper, 1987), cluster methods are frequently classified into the following four general categories: hierarchical methods, partitioning methods, overlapping cluster procedures, and ordination techniques. The first two categories have received widespread use and support among researchers in the social and behavioral sciences and will thus be described in more detail.

D. Hierarchical Methods

Some of the most widely used procedures for conducting cluster analyses are referred to as hierarchical methods or sequential agglomerative hierarchical methods. In general, these procedures begin by assuming that each entity in the similarity or distance matrix is an individual cluster. At each successive stage of the procedure, clusters are combined according to some algorithm. This process continues until all objects have been combined to form one cluster. Hierarchical methods generate strictly nested structures and are often presented graphically using a dendrogram similar to the one shown in Figure 11.1. In this example, objects 3 and 4 are clustered together during the first step of the procedure because that pair of objects has the highest similarity index or the smallest distance value. During the second, third, and fourth steps of this example, objects 1, 2, and 5 are added to the cluster, respectively. Because the cluster method proceeds until only one cluster remains, it is up to the researcher to decide how many clusters to interpret.

FIGURE 11.1 Steps of cluster formation portrayed using a dendrogram (vertical axis: cluster steps; horizontal axis: objects or individuals).


1. Single Linkage Procedure

Four different agglomerative methods are found frequently in the literature. These methods differ primarily in how elements are compared to one another at each successive stage of clustering. Sneath (1957) and McQuitty (1957) proposed the single linkage procedure, in which an object is included in a cluster if it shares a sufficient degree of similarity with at least one member of that cluster. This method also has been referred to as the elementary linkage, minimum method, and nearest neighbor cluster analysis (Johnson, 1967; Lance & Williams, 1967). The squared Euclidean distance matrix in Table 11.2 can be used as a starting point to demonstrate this relatively simple algorithm. The first step in establishing clusters is to join the pair of objects with the smallest distance value (or largest similarity value). Thus, in this example, objects 3 and 4 (D² = 13) are joined to form a cluster. Then, the distance matrix is recalculated to reflect the smallest distance between either 3 or 4 and the remaining objects, thereby reducing the 5 x 5 matrix to a 4 x 4 matrix. For example, the distance between objects 1 and 3 (19) is compared to that between objects 1 and 4 (54), so the value of 19 is used in the recalculated matrix to signify the distance between object 1 and cluster (3,4). This recalculated matrix is presented in Table 11.3. During the next step of the procedure, object 1 is added to cluster (3,4). This process continues until all objects are clustered together. Single linkage procedures can use distance or similarity matrices, but notice that a different cluster solution would emerge if the similarity matrix in Table 11.2 were used.
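The single linkage algorithm just described can be written out in a few lines. The following sketch is offered only as an illustration (it assumes NumPy and is not code from the chapter); it starts from the squared Euclidean distance matrix in Table 11.2, merges the closest pair at each step, and updates the matrix with the minimum rule, reproducing the recalculated matrices shown in Table 11.3.

```python
# A from-scratch sketch of single linkage agglomeration (assumes NumPy).
import numpy as np

d = np.array([
    [ 0, 29, 19, 54, 74],
    [29,  0, 30, 63, 59],
    [19, 30,  0, 13, 37],
    [54, 63, 13,  0, 32],
    [74, 59, 37, 32,  0],
], dtype=float)

clusters = [[i + 1] for i in range(len(d))]   # start: every object is its own cluster

while len(clusters) > 1:
    # Find the pair of clusters with the smallest distance.
    n = len(clusters)
    i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: d[ij[0], ij[1]])
    print(f"merge {clusters[i]} and {clusters[j]} at distance {d[i, j]}")

    # Single linkage update: the distance to the new cluster is the minimum of
    # the distances to its two constituents (np.maximum gives complete linkage).
    merged = np.minimum(d[i], d[j])
    d[i] = merged
    d[:, i] = merged
    d[i, i] = 0.0
    clusters[i] = clusters[i] + clusters[j]

    # Drop the row and column of the absorbed cluster.
    d = np.delete(np.delete(d, j, axis=0), j, axis=1)
    del clusters[j]
```

Running the sketch prints the merge order (3,4) at 13, then object 1 at 19, object 2 at 29, and object 5 at 32, matching the hand calculations.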

2. Maximum Linkage Procedure

A second agglomerative method, referred to as complete linkage analysis, maximum method, or furthest neighbor analysis (Horn, 1943), is the opposite of the single linkage method in that the distance between clusters is defined as the distance between the most remote pair of cluster members. The distance matrix in Table 11.2 can again serve to illustrate this procedure. The first step of the complete linkage analysis is the same as for the single linkage method. Following the initial clustering of objects 3 and 4, however, the distance matrix is recalculated to reflect the maximum distance between the elements within a cluster and the comparison cluster. The recalculated matrix is shown in Table 11.3. The larger of the distances between either 3 or 4 and object 1 is included (i.e., 54) in this recalculated matrix. The next step in the procedure involves the formation of a second cluster containing objects 1 and 2. Table 11.2 illustrates how the same principle can be applied when comparing two clusters that contain multiple elements. The larger distance between 1 or 2 and either 3 or 4 is used for subsequent clustering (i.e., 63). As can be seen from this example, the complete method yielded a pattern after two stages of the analysis that was different from that generated by the single linkage method. These two methods often yield quite different cluster patterns. In general, the maximum linkage method tends to generate clusters that are compact and spherical.

TABLE 11.3 Recalculated Distance Matrices Obtained Using Four Cluster Algorithms

Single linkage method squared Euclidean distance matrices

Object        1      2   (3,4)     5
1             0
2            29      0
(3,4)        19     30      0
5            74     59     32      0

Object   (1,3,4)     2      5
(1,3,4)       0
2            29      0
5            32     59      0

Maximum linkage method squared Euclidean distance matrices

Object        1      2   (3,4)     5
1             0
2            29      0
(3,4)        54     63      0
5            74     59     37      0

Object    (1,2)  (3,4)      5
(1,2)         0
(3,4)        63      0
5            74     37      0

Unweighted pair-group method squared Euclidean distance matrices

Object        1      2   (3,4)     5
1             0
2            29      0
(3,4)      36.5   46.5      0
5            74     59   34.5      0

Object    (1,2)  (3,4)      5
(1,2)         0
(3,4)      41.5      0
5          66.5   34.5      0

Centroid method squared Euclidean distance matrix

Object        1      2   (3,4)     5
1             0
2            29      0
(3,4)      33.3   43.3      0
5            74     59   31.2      0
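For readers who prefer not to carry out the recalculations by hand, standard software will reproduce the merge orders implied by Table 11.3. The sketch below assumes the SciPy library (a choice made purely for illustration; the chapter does not tie these procedures to any particular package) and contrasts the single and complete linkage solutions for the five objects.

```python
# A hedged sketch (assumes SciPy/NumPy) comparing single and complete linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Squared Euclidean distances among the five participants (Table 11.2).
d2 = np.array([
    [ 0, 29, 19, 54, 74],
    [29,  0, 30, 63, 59],
    [19, 30,  0, 13, 37],
    [54, 63, 13,  0, 32],
    [74, 59, 37, 32,  0],
], dtype=float)

condensed = squareform(d2)        # SciPy expects a condensed distance vector

for method in ("single", "complete"):
    Z = linkage(condensed, method=method)
    # Each row of Z records the two clusters merged and the merge distance.
    print(method, "\n", Z[:, :3])
```

Single linkage merges at distances 13, 19, 29, and 32, whereas complete linkage merges at 13, 29, 37, and 74, reflecting the different second-stage patterns discussed above.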

3. Average Linkage Method

The average linkage cluster method (Sokal & Michener, 1958) joins elements to existing clusters based on the average degree of similarity between the element and the existing members of the cluster. The average linkage method actually represents a set of related methods that differ with respect to how the "averaging" is conducted. Romesburg (1984) describes the unweighted pair-group method, which uses arithmetic averages of pairs to recalculate the distance matrix. Table 11.3 illustrates distance matrices derived from the squared Euclidean distance matrix using the unweighted pair-group method. Note that the distance between object 1 and cluster (3,4) is the average of the distances between 1 and 3 (19) and between 1 and 4 (54). Similarly, the distance between cluster (1,2) and cluster (3,4) is represented by the equation (19 + 54 + 30 + 63)/4.

4. Centroid Method

Somewhat related to the unweighted pair-group method is the centroid method of cluster analysis. In this method, distance is defined as the distance between group centroids. Whereas the arithmetic average procedure recalculates distance matrices by averaging previously determined distance values, the recalculation of a distance matrix in the centroid method is preceded by updates to the raw data matrix. Table 11.1 shows an updated raw data matrix reflecting the combination of objects 3 and 4 during the first stage of a centroid cluster analysis. Note that cluster (3,4) now has raw scores that are the arithmetic average of the scores previously belonging to participant 3 and participant 4. Table 11.3 also shows the updated distance matrix resulting from these raw scores. Note that this method generates a distance matrix slightly different from that generated by the averaging method described above.

5. Minimum Variance Method

A final group of agglomerative methods frequently used by researchers is the sum-of-squares or minimum variance methods. Ward's (1963) method is probably the most widely used minimum variance method, although other sum-of-squares procedures are available to the researcher (Friedman & Rubin, 1967; Wishart, 1969). The relative proximity of a set of objects can be described using the concept of the sum of squares (the sum of the squared distances of each object from the mean value of the cluster). Using Ward's method, the cluster that results in the smallest increase in the sum of squares is formed during each step. Every possible combination of cluster formations is considered at each subsequent step.


The raw data matrix in inverted (Q) format shown in Table 11.1 can be used to illustrate this procedure. For the purposes of brevity, let us consider only participants 1-3 and variables 1 and 2. In this reduced data set, there are three possible initial cluster formations: (1,2), (1,3), and (2,3). The following equations are analyzed to determine the lowest sum of squares value:

Cluster (1,2): (1 - 1.5)² + (5 - 5.5)² + (2 - 1.5)² + (6 - 5.5)² + (4 - 4)² + (4 - 4)² = 1.00
Cluster (1,3): (1 - 2.5)² + (5 - 4.5)² + (4 - 2.5)² + (4 - 4.5)² + (2 - 2)² + (6 - 6)² = 5.00
Cluster (2,3): (2 - 3)² + (6 - 5)² + (4 - 3)² + (4 - 5)² + (1 - 1)² + (5 - 5)² = 4.00
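These three sums of squares can be checked with a short calculation. The sketch below is purely illustrative (it assumes NumPy and is not part of the original text); it computes, for each candidate pairing, the total within-cluster sum of squared deviations, including the zero contribution of the remaining singleton.

```python
# A minimal sketch (assumes NumPy) reproducing the Ward's method sums of squares
# for the reduced data set (participants 1-3, variables 1 and 2).
import numpy as np

points = {1: np.array([1.0, 5.0]),
          2: np.array([2.0, 6.0]),
          3: np.array([4.0, 4.0])}

def sum_of_squares(members):
    """Sum of squared deviations of each member from the cluster's mean profile."""
    x = np.array([points[m] for m in members])
    return ((x - x.mean(axis=0)) ** 2).sum()

for pair in ([1, 2], [1, 3], [2, 3]):
    singleton = [m for m in points if m not in pair]
    total = sum_of_squares(pair) + sum_of_squares(singleton)   # singleton SS is 0
    print(pair, total)          # (1,2): 1.0, (1,3): 5.0, (2,3): 4.0
```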

Note that the sums of squares for clusters that contain only one object always equal zero. Each of these formulae captures the overall sum of squared deviations of each object from its group (or potential "cluster") mean on each variable measured. The lowest value (1.00) determines that objects 1 and 2 will be combined to form the first cluster.

6. Divisive Methods

Whereas hierarchical agglomerative methods begin by treating every object as a unique cluster, hierarchical divisive methods begin by assuming that all objects are members of a single cluster and proceed by establishing clusters with successively smaller membership. Monothetic divisive methods are typically used with binary data. Objects are initially separated based on whether or not they possess some specific attribute. Lambert and Williams (1962, 1966) and MacNaughton-Smith (1965) described related procedures for determining which variable to select for clustering at each step of the analysis. These procedures often rely on chi-square pairwise comparisons among variables to determine the cluster criterion.

7. Polythetic Hierarchical Methods

In contrast to monothetic methods, polythetic hierarchical methods can accommodate clusters of objects that have been measured on continuous scales. MacNaughton-Smith, Williams, Dale, and Mockett (1964) described the most commonly used procedure. An initial cluster consisting of one object is formed by determining which object has the largest average distance from the other objects. In the distance matrix in Table 11.2, object 1 differs from objects 2 through 5 by an average of 44 units (i.e., [29 + 19 + 54 + 74]/4). This value is larger than the average distance for any of the other objects, and thus object 1 is split off. During subsequent stages, the average distances of each object from other objects in the main cluster (i.e., the "within-group" distance) are calculated, and the between-group distance is subtracted from the within-group distance (see Table 11.4). The object having the highest positive value (i.e., object 2 in Table 11.4) is incorporated into the newly formed cluster 1, and the process is repeated. The interested reader is referred to Everitt (1986) for a more comprehensive example.

TABLE 11.4 Calculations Demonstrating the Use of the Polythetic Cluster Method

                  Average distance      Average distance       Difference
                  to group (1) (a)      to main group (b)       (b) - (a)
Participant 2            29                   50.6                 21.6
Participant 3            19                   26.3                  7.3
Participant 4            54                   36.0                -18.0
Participant 5            74                   42.6                -31.0

E. Partitioning Methods

Iterative partitioning methods, also frequently referred to as two-stage cluster analysis or k-means partitioning, were developed, in part, in response to one of the major shortcomings of the hierarchical methods: once an object is clustered in a hierarchical method, it cannot be reassigned to a "better fitting" cluster at some subsequent stage of the process. Iterative partitioning methods simultaneously minimize within-cluster variability and maximize between-cluster variability. A typical partitioning method begins by partitioning a set of data into a specific number of clusters. Objects are then evaluated for potential membership in each of the formed clusters, and the process continues until no further cluster assignment or reassignment occurs among objects or until a predetermined number of iterations has been reached. Although the most thorough means of partitioning the data into clusters would be to generate all possible partitions, this method is computationally impractical given the size of most data sets.

Partitioning procedures differ with respect to the methods used to determine the initial partition of the data, how assignments are made during each pass or iteration, and the clustering criterion used. Research shows that initial partitions based on random assignment result in poor cluster recovery (Milligan, 1980; Milligan & Sokal, 1980; Scheibler & Schneider, 1985), leading investigators to propose methods for generating initial cluster partitions (Milligan & Sokal, 1980; Punj & Stewart, 1983). The most frequently used method assigns objects to the clusters having the nearest centroid. This procedure creates initial partitions based on the results from preliminary hierarchical cluster procedures such as the average linkage method or Ward's method, a practice that resulted in partitioning methods being referred to as two-stage cluster analysis. Some partitioning methods use multiple passes during which cluster centroids are recalculated and objects are re-evaluated, whereas other methods use a single-pass procedure. Partitioning methods also differ with respect to how they evaluate an object's distance from cluster centroids; some procedures use simple distance and others use more complex multivariate matrix criteria. Finally, most partitioning methods require that the user specify a priori how many clusters will be formed. As such, these methods prove most useful when the researcher has well-formulated hypotheses about the number of clusters to expect.
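A bare-bones version of this two-stage strategy is sketched below. It is an illustration under assumptions of my own (SciPy supplies the preliminary Ward solution and the reassignment loop is a minimal k-means pass), not a prescription of any particular package or of the procedures evaluated in the studies cited above.

```python
# A hedged sketch of two-stage clustering: a hierarchical (Ward) partition
# initializes a nearest-centroid reassignment loop. Assumes NumPy and SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([
    [1.0, 5.0, 7.0, 7.0, 5.0],
    [2.0, 6.0, 2.0, 8.0, 6.0],
    [4.0, 4.0, 5.0, 5.0, 4.0],
    [5.0, 1.0, 4.0, 4.0, 3.0],
    [7.0, 2.0, 3.0, 9.0, 2.0],
])
k = 2

# Stage 1: hierarchical (Ward) solution cut at k clusters -> initial partition.
labels = fcluster(linkage(data, method="ward"), t=k, criterion="maxclust")

# Stage 2: iterate nearest-centroid reassignment until membership stabilizes.
for _ in range(20):
    centroids = np.array([data[labels == c].mean(axis=0) for c in range(1, k + 1)])
    dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    new_labels = dist.argmin(axis=1) + 1
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels

print(labels)
```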

F. Overlapping and Ordination Methods

Overlapping cluster methods or "clumping techniques" permit clustered objects to belong to more than one cluster and are thus useful when an investigator has sound reason to believe there is overlap among hypothesized clusters. These are methodologically complex procedures that have not been widely used by social and behavioral scientists. Ordination clustering algorithms are often referred to as inverse factor analysis or Q-type factoring because they essentially involve factor analyzing an inverted raw data matrix ("Q" matrix) like that shown in Table 11.1. Several authors have discussed the potential problems of applying factor analytic technology to object clustering problems (Everitt, 1986; Fleiss, Lawlor, Platman, & Fieve, 1971; Gorsuch, 1983). Among the charges levied against Q-type factoring is the fact that these procedures may violate assumptions of the general linear model.

G. Empirical Investigations of Cluster Methods

Studies evaluating cluster methods typically use the recovery of a known cluster structure as a benchmark. Many of these studies adopt Monte Carlo techniques and vary one or more characteristics (e.g., cluster structure, number of objects, number of variables, or type of algorithm). Although the literature examining the performance of cluster methods is voluminous, the vast majority of studies have focused on evaluating the most widely used hierarchical methods. Punj and Stewart (1983) reviewed 12 validity studies conducted between 1972 and 1980. The reviewed studies used a wide range of hierarchical methods including single, complete, centroid, median, and average linkage, and Ward's method, in addition to the k-means partitioning method. In general, these authors concluded that Ward's method and the average linkage method outperformed all other hierarchical methods. The k-means partitioning method appears to provide recovery that rivals that of the better hierarchical methods, but only when a nonrandom starting point is used. Punj and Stewart also pointed out the deleterious effects of adding spurious variables to the analysis and of including all objects in the final solution. Cluster solutions that incorporate most or all of the available objects tend to include more outliers (see Venter and Maxwell, chapter 6, and Huberty and Petoskey, chapter 7, this volume, for a discussion of procedures for detecting and dealing with outliers). This issue relates directly to an investigator's determination of how many clusters to interpret and will be addressed in more detail below.

In comparing nine hierarchical and four nonhierarchical cluster methods, Scheibler and Schneider (1985) found hierarchical procedures to be quite robust at all but the most extreme levels of coverage (e.g., those situations where all objects must be clustered). Ward's and the average methods of clustering were the most accurate of the methods explored. Consistent with previous findings, partitioning methods performed best when nonrandom starting partitions were used. Similar findings were reported by Overall et al. (1993) and Milligan (1981a). The interested reader is referred to Milligan and Cooper (1987) for a comprehensive review of cluster algorithm performance. In general, however, researchers using hierarchical methods should consider using either Ward's or the average linkage method. These methods perform well under a range of circumstances (e.g., in the presence of outliers and overlapping cluster structures). Some data suggest that Ward's method is best used with distance matrices, whereas the average linkage method works best with similarity matrices. I recommend that researchers consider using both the average linkage method and Ward's method with similarity and distance matrices; doing so allows you to rule out method artifact if consistent cluster structures are obtained across the analyses. Investigators using partitioning methods should establish initial partitions based on preliminary hierarchical analyses.

H. Deciding on the Number of Clusters to Interpret

Once a cluster algorithm has been executed, the researcher is faced with the task of deciding how many clusters to interpret. By definition, most clustering procedures will continue unabated until all objects in the data set have been assigned to one or more clusters. In the case of hierarchical cluster algorithms, all objects will ultimately exist within a single all-inclusive cluster, a solution that holds no interest for applied researchers. Fortunately, a number of external and internal criteria exist to assist the researcher in determining the best number of clusters to interpret.

External criteria use information outside of the cluster solution to evaluate the data partition. A large number of external criteria have been developed, including the Rand (Rand, 1971), the Jaccard (Anderberg, 1973), and the kappa statistics (Blashfield, 1976). Monte Carlo studies suggest that some external criteria are better than others. For example, Milligan and Cooper (1986) found the Hubert and Arabie (1985) adjusted Rand index to be superior to four other external criteria across various combinations of cluster number and cluster algorithm. Unfortunately, external criteria are rarely of much use to the applied researcher since the true cluster structure is almost never known a priori.

Internal criteria, or stopping rules, use information inherent in the cluster solution to determine the degree of fit between the data partition and the data used to create the partitions. The large number of internal criteria precludes a detailed description of them here, but the results from a number of empirical studies reveal that the criteria vary in validity. For example, Milligan (1981b) compared thirty different internal criteria to two external criterion indices and concluded that the Baker and Hubert (1975) Gamma and the Hubert and Levin (1976) C-index were the most successful at identifying the correct number of clusters in a data set. Overall and his colleagues (Atlas & Overall, 1994; Overall & Magee, 1992) advocate cross-validating the cluster structure (partition replication) as a means of determining the correct number of clusters in a set of data. Although the procedure described by Overall will result in a rigorous test, it is quite cumbersome and involves multiple hierarchical and higher order cluster procedures on subsets of data. McIntyre and Blashfield (1980) described a somewhat less cumbersome replication process.

The applied researcher may find it difficult to implement many of the stopping rules advocated above. They are often statistically complex and not readily available in popular statistical software (e.g., SAS and SPSS). SPSS currently includes a "coefficient" as part of the default agglomeration schedule printout. Coefficient values represent the squared Euclidean distance between the two objects (or clusters) being joined. As such, small coefficients indicate that fairly homogeneous clusters are being joined, whereas larger values indicate that dissimilar clusters or objects are being joined. Many researchers rely on these values to suggest the number of interpretable clusters in much the same way as factor analysis researchers rely on the scree plot to determine how many factors to extract. SAS includes the cubic clustering criterion, a method that was favorably reviewed by Milligan and Cooper (1985). Atlas and Overall (1994) described a method of calculating ANOVAs at each hierarchical level that is relatively simple to perform and provides a more empirical approach to determining the number of clusters to interpret. Until more internal stopping rules are integrated into the popular computer statistics programs, however, researchers will need to rely on theoretical rationale, subjective inspection, or additional time-consuming statistical computations to determine the best number of clusters to interpret.
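The "subjective inspection" strategy can be approximated in code by examining the sequence of merge distances, which play the same role as the SPSS agglomeration coefficients described above. The sketch below is illustrative only (it assumes SciPy and artificial data constructed for the example): a large jump in the merge distances suggests that relatively dissimilar clusters are being forced together, and the partition just before that jump is a candidate solution.

```python
# A hedged sketch (assumes NumPy and SciPy) of inspecting merge distances
# to suggest how many clusters to interpret.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# Two well-separated artificial groups, 20 objects each, four variables.
data = np.vstack([rng.normal(0, 1, size=(20, 4)),
                  rng.normal(5, 1, size=(20, 4))])

Z = linkage(data, method="ward")
heights = Z[:, 2]                      # merge distance at each agglomeration step
jumps = np.diff(heights)

# The largest jump occurs when the last "real" clusters are merged; the number
# of clusters just before that merge is a candidate solution (here, 2).
step = int(np.argmax(jumps))
n_clusters = len(data) - (step + 1)
print(n_clusters)
```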

I. External Validity

The final stage of developing a classification system involves establishing the external validity of the cluster structure. Issues that might be addressed in such studies include exploring the cluster structure's predictive and descriptive utility and its consistency across samples. At the very least, researchers are encouraged to replicate their cluster solutions on independent samples from the same population. If the researcher wishes to make inferences beyond the population used in the study, then the cluster solution should be validated on diverse populations. As Skinner (1981) points out, the inferences that we draw regarding the meaning of our classification scheme rely on these generalizability studies. Cross-validation samples from the same population are often obtained by randomly dividing an existing sample into two or more subsamples prior to conducting a cluster analysis (see Tinsley & S. Brown, chapter 1, this volume).

In addition to investigating the generalizability of their results to other samples, researchers should also attend to the generalizability of their classification structures to alternative measures of the same constructs. A cluster solution gains strength when an investigator is able to replicate that solution using different observations. When the cluster study is well grounded in theory, the researcher is able to evaluate the validity of the data partition by comparing the classification obtained to theoretical postulates about the relations among relevant constructs. Finally, the researcher's ultimate objective is often to use the obtained cluster solution to predict other phenomena. Thus, the concurrent and predictive utility of the cluster solution must be evaluated carefully and separately across all samples to which the researcher wishes to generalize. For example, it is one thing to establish a highly reliable classification of political and economic stability among emerging nations; it is somewhat more impressive, however, if that classification system is capable of predicting economic growth within the United States resulting from increased foreign trade with those nations. It is not sufficient merely to establish a classification system; a critical analysis of the ability of that system to provide information relevant to real-world questions and problems is necessary to establish the system's usefulness.
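One simple way to operationalize a split-sample replication check is sketched below. The example is illustrative rather than drawn from the chapter (it assumes SciPy, scikit-learn's adjusted Rand index, and artificial data): the sample is split in half, each half is clustered separately, half B is then assigned to the centroids of half A's solution, and the agreement between that assignment and half B's own solution indexes the stability of the classification.

```python
# A hedged sketch of split-sample replication (assumes NumPy, SciPy, scikit-learn).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, size=(50, 4)),
                  rng.normal(4, 1, size=(50, 4))])
rng.shuffle(data)
half_a, half_b = data[:50], data[50:]
k = 2

def ward_labels(x, k):
    return fcluster(linkage(x, method="ward"), t=k, criterion="maxclust")

labels_a = ward_labels(half_a, k)
labels_b = ward_labels(half_b, k)

# Assign half B's objects to the nearest centroid from half A's solution.
centroids_a = np.array([half_a[labels_a == c].mean(axis=0) for c in range(1, k + 1)])
dist = ((half_b[:, None, :] - centroids_a[None, :, :]) ** 2).sum(axis=-1)
nearest = dist.argmin(axis=1) + 1

print(adjusted_rand_score(labels_b, nearest))   # near 1.0 for a replicable structure
```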

J. Presenting Cluster Analysis Results

Aldenderfer and Blashfield (1984), among others (e.g., Romesburg, 1984), provide investigators with guidelines for reporting cluster analysis results. Researchers should begin by explaining and describing the theoretical or empirical framework used to guide the cluster study and how objects and observations were selected for inclusion in the study. They should provide unambiguous descriptions of the cluster and similarity or distance methods used in a study so that researchers from different content areas will be able to identify their method from among the myriad of available procedures. Because the choice of similarity or distance measures can affect the outcome of a cluster solution, it is vital that researchers report which procedure was used in their study. Researchers should also identify the computer program used to establish the cluster solution because various computer programs will occasionally generate different cluster solutions using the same data and cluster method (Blashfield, 1977). Finally, researchers must provide a cogent description of how clusters were selected and evidence for the validity of their cluster structure.

IV. CONCLUSION

The use of cluster methods has increased dramatically in the last 30 years, but many researchers still fail to use the procedure when applicable or they use it improperly. Because cluster analysis is not a single standardized procedure, and there are pitfalls associated with its improper use, care is required in its application. Nevertheless, the perceived difficulties involved in conducting a cluster analysis study are greatly outweighed by the potential usefulness and flexibility of these procedures. Investigators should use theory to guide their research questions and to identify the populations and variables of interest; it is imperative to make this step explicit. Also, use theory to guide your choice of a measure of association (i.e., a distance or similarity measure) and clustering algorithm (e.g., hierarchical or partitioning method). Conduct your analyses using more than one method to increase your confidence in your findings (and indirectly to contribute to cluster methods research). Although a number of statistical options exist for determining the best number of clusters to interpret, cross-validation remains one of the best ways of demonstrating the internal validity of a cluster solution. Consider the cluster study as a first step and not as an end in itself whenever you are interested in how clusters relate to other phenomena. Finally, thoroughly describe your procedure when preparing a written report.

REFERENCES

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Thousand Oaks, CA: Sage Publications.
Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic Press.
Atlas, R. S., & Overall, J. E. (1994). Comparative evaluation of two superior stopping rules for hierarchical cluster analysis. Psychometrika, 59, 581-591.
Baker, F. B., & Hubert, L. J. (1975). Measuring the power of hierarchical cluster analysis. Journal of the American Statistical Association, 70, 31-38.
Blashfield, R. K. (1976). Mixture model tests of cluster analysis: Accuracy of four agglomerative hierarchical methods. Psychological Bulletin, 83, 377-388.
Blashfield, R. K. (1977). On the equivalence of four software programs for performing hierarchical cluster analysis. Psychometrika, 42, 429-431.


Blashfield, R. K., & Draguns, J. G. (1976). Evaluative criterion for psychiatric classification. Journal of Abnormal Psychology, 85, 140-150.
Blong, R. J. (1973). A numerical classification of selected landslides of the debris slide-avalanche-flow type. Engineering Geology, 7, 99-114.
Borgen, F. H., & Barnett, D. C. (1987). Applying cluster analysis in counseling psychology research. Journal of Counseling Psychology, 34, 456-468.
Cattell, R. B. (1944). A note on correlation clusters and cluster search methods. Psychometrika, 9, 169-184.
Cattell, R. B., Coulter, M. A., & Tsujioka, B. (1966). The taxonometric recognition of types and functional emergents. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology. Chicago: Rand McNally.
Cattell, R. B. (1988). The data box: Its ordering of total resources in terms of possible relational systems. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed., pp. 69-130). New York: Plenum Press.
Chartrand, J. M., Martin, W. F., Robbins, S. B., McAuliffe, G. J., Pickering, J. W., & Calliotte, J. A. (1994). Testing a level versus an interactional view of career indecision. Journal of Career Assessment, 2, 55-69.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity between profiles. Psychological Bulletin, 50, 456-473.
Everitt, B. (1986). Cluster analysis (2nd ed.). New York: John Wiley & Sons.
Filsinger, E. E. (1990). Empirical typology, cluster analysis, and family-level measurement. In T. W. Draper & A. C. Marcos (Eds.), Family variables: Conceptualization, measurement, and use. Thousand Oaks, CA: Sage Publications.
Fleiss, J. L., Lawlor, W., Platman, S. R., & Fieve, R. R. (1971). On the use of inverted factor analysis for generating typologies. Journal of Abnormal Psychology, 77, 127-132.
Friedman, H. P., & Rubin, J. (1967). On some invariant criteria for grouping data. Journal of the American Statistical Association, 62, 1159-1178.
Gati, I., Osipow, S. H., & Fassa, N. (1994). The scale structure of multi-scale measures: Application of the split-scale method to the Task Specific Occupational Self-Efficacy Scale and the Career Decision Making Self-Efficacy Scale. Journal of Career Assessment, 2, 384-397.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gould, S. J. (1989). Wonderful life: The Burgess Shale and the nature of history. New York: Norton.
Hands, S., & Everitt, B. (1987). A Monte Carlo study of the recovery of cluster structure in binary data by hierarchical clustering techniques. Multivariate Behavioral Research, 22, 235-243.
Heppner, P. P., Kivlighan, D. M., Jr., Good, G. E., Roehlke, H. J., Hills, H. I., & Ashby, J. S. (1994). Presenting problems of university counseling center clients: A snapshot and multivariate classification scheme. Journal of Counseling Psychology, 41, 315-324.
Horn, D. (1943). A study of personality syndromes. Character and Personality, 12, 257-274.
Hubert, L. J., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193-218.
Hubert, L. J., & Levin, J. R. (1976). A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin, 83, 1072-1080.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241-254.
Kendall, M. G. (1963). Rank correlation methods (3rd ed.). London: Griffin.
Lambert, J. M., & Williams, W. T. (1962). Multivariate methods in plant ecology, IV. Nodal analysis. Journal of Ecology, 50, 775-802.
Lambert, J. M., & Williams, W. T. (1966). Multivariate methods in plant ecology, IV. Comparison of informational analysis and association analysis. Journal of Ecology, 54, 635-664.
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. Computer Journal, 9, 373-380.
Larson, L. M., & Majors, M. S. (1998). Applications of the Coping with Career Indecision instrument with adolescents. Journal of Career Assessment, 6, 163-179.
Lorr, M. (1994). Cluster analysis: Aims, methods, and problems. In S. Strack & M. Lorr (Eds.), Differentiating normal and abnormal personality (pp. 179-195). New York: Springer Publishing.
MacNaughton-Smith, P. (1965). Some statistical and other numerical techniques for classifying individuals (Home Office Research Unit Report No. 6). London: H.M.S.O.
MacNaughton-Smith, P., Williams, W. T., Dale, M. B., & Mockett, L. G. (1964). Dissimilarity analysis. Nature, 202, 1034-1035.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Science, Calcutta, 12, 49-55.
McIntyre, R. M., & Blashfield, R. K. (1980). A nearest centroid technique for evaluating the minimum-variance clustering procedure. Multivariate Behavioral Research, 15, 225-238.
McLaughlin, M. E., Carnevale, P., & Lim, R. G. (1991). Professional mediators' judgements of mediational tactics: Multidimensional scaling and cluster analysis. Journal of Applied Psychology, 76, 465-472.
McQuitty, L. L. (1957). Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educational and Psychological Measurement, 17, 207-229.
Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469-1481.
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325-342.
Milligan, G. W. (1981a). A review of Monte Carlo tests of cluster analysis. Multivariate Behavioral Research, 16, 379-407.
Milligan, G. W. (1981b). A Monte Carlo test of thirty internal criterion measures for cluster analysis. Psychometrika, 46, 187-199.
Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159-179.
Milligan, G. W., & Cooper, M. C. (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, 21, 441-458.
Milligan, G. W., & Cooper, M. C. (1987). Methodology review: Clustering methods. Applied Psychological Measurement, 11, 329-354.
Milligan, G. W., & Cooper, M. C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5, 181-204.
Milligan, G. W., & Sokal, L. M. (1980). A two-stage clustering algorithm with robust recovery characteristics. Educational and Psychological Measurement, 40, 755-759.
Morris, L. C., Blashfield, R. K., & Satz, P. (1981). Neuropsychology and cluster analysis. Journal of Clinical Neuropsychology, 3, 79-99.
Overall, J. E., Gibson, J. M., & Novy, D. M. (1993). Population recovery capabilities of 35 cluster analysis methods. Journal of Clinical Psychology, 49, 459-470.
Overall, J. E., & Klett, C. (1972). Applied multivariate analysis. New York: McGraw Hill.
Overall, J. E., & Magee, K. N. (1992). Replication as a rule for determining the number of clusters in hierarchical cluster analysis. Applied Psychological Measurement, 16, 119-128.
Price, L. J. (1993). Identifying cluster overlap with NORMIX population membership probabilities. Multivariate Behavioral Research, 28, 235-262.
Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20, 134-148.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846-850.
Romesburg, H. C. (1984). Cluster analysis for researchers. Belmont, CA: Lifetime Learning Publications.
Schaffer, C. M., & Green, P. E. (1996). An empirical comparison of standardization methods in cluster analysis. Multivariate Behavioral Research, 31, 149-167.
Scheibler, D., & Schneider, W. (1985). Monte Carlo tests of the accuracy of cluster analysis algorithms: A comparison of hierarchical and nonhierarchical methods. Multivariate Behavioral Research, 20, 293-304.
Schweizer, K., Braun, G., & Boller, E. (1994). Validity and stability of partitions with different sample sizes and classification methods: An empirical study. Diagnostica, 40, 305-319.
Sireci, S. G., & Geisinger, K. F. (1992). Analyzing test content using cluster analysis and multidimensional scaling. Applied Psychological Measurement, 16, 17-31.
Skinner, H. A. (1979). Dimensions and clusters: A hybrid approach to classification. Applied Psychological Measurement, 3, 327-341.
Skinner, H. A. (1981). Toward the integration of classification theory and methods. Journal of Abnormal Psychology, 90, 68-87.
Sneath, P. H. A. (1957). The application of computers to taxonomy. Journal of General Microbiology, 17, 201-226.
Sokal, R., & Sneath, P. (1963). Principles of numerical taxonomy. San Francisco: W. H. Freeman.
Sokal, R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38, 1409-1438.
Speece, D. L. (1995). Cluster analysis in perspective. Exceptionality, 5, 31-44.
Tryon, R. (1939). Cluster analysis. New York: McGraw Hill.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
Wishart, D. (1969). An algorithm for hierarchical classifications. Biometrics, 25, 165-170.