Journal of Archaeological Science 1992, 19, 615-629

Bootstrapping and Correspondence Analysis in Archaeology

T. J. Ringrose¹

(Received 12 April 1991, revised manuscript accepted 9 December 1991)

Correspondence Analysis is a statistical technique for producing graphical displays of frequency data in the form of contingency tables. It has recently become popular in quantitative archaeology, particularly with the advent of fast, easy-to-use PC programs to perform it. This paper describes a method which allows a certain amount of "significance" to be attached to the spread of points on a Correspondence Analysis display, and in addition presents the results of a simulation study which gives important information concerning the properties of this method.

Keywords: CORRESPONDENCE ANALYSIS, BOOTSTRAPPING, SIMULATION, MNI, NISP.

¹Department of Mathematical Sciences, University of Aberdeen, Edward Wright Building, Dunbar Street, Old Aberdeen AB9 2TY, Scotland, U.K.
1. Introduction

Correspondence Analysis (CA) is a statistical technique which has become widely used by archaeologists in recent years. It provides an informative, but informal, graphical representation of the cross-classifying variables in a contingency table. In the archaeological case the tables are most frequently locations (sites, pits, levels) by some type classification (species, artifact type), with the cell values being the numbers of that type found in that location. However, CA is a purely deterministic, algebraic technique, so that there is little indication of the strength, or otherwise, of any apparent relationships. This paper describes a method, "bootstrapping", which allows some degree of statistical significance to be attached to CA displays. The results of a simulation study are also reported; these significantly aid the interpretation of such bootstrap analyses. Section 2 describes CA itself and Section 3 describes the bootstrap method, while Section 4 outlines the design and results of the simulation study and finally Section 5 gives some example analyses, interpreted in the light of the simulation results. Ringrose (1988a, b) gives further examples of the use of the technique described in Section 3. The study in Section 4 sheds new light on the behaviour of the method, and the present account reconsiders these earlier results in view of this new information.

2. Correspondence Analysis

CA is in many ways similar to the well-known techniques of Principal Components Analysis (PCA) and Factor Analysis (FA), although there are important differences.


The most noticeable of these is that CA produces co-ordinate values for both the rows and columns of the data matrix, rather than just the rows. A row and a column point will be close if the value at their intersection is (relatively) large, although distances between row and column points are not defined, and hence not interpretable (a common misconception). Additionally, whereas PCA and FA are intended for "true" multivariate data, where the rows represent "objects" and the columns define the attributes of these objects, CA is intended for contingency tables, where both rows and columns define variables and the cell values are counts of individuals cross-classified by these variables. However, data of the latter kind can usually be regarded as being of the former kind, so that the distinction is often relatively unimportant.

The descriptions below deal with the French method of "Analyse des Correspondances", set out by Benzécri (1973) and described in English by Greenacre (1984), Greenacre & Hastie (1987) and Lebart, Morineau & Warwick (1984). The algebra of the technique seems to date to the 1930s, and has been "discovered" more than once, resulting in several methods which essentially differ only in their way of presenting the results, in particular the "dual scaling" of Nishisato (1980) and the "reciprocal averaging" of Hill (1974). Greenacre (1984) briefly traces the history of these methods, but emphasizes the difference of the French approach in its concentration on multi-dimensional geometry and graphical display. Tenenhaus & Young (1985) also describe the relationships between the methods. The biplot (see, for example, Gabriel, 1981, or Jolliffe, 1986: 75-85) is another way of simultaneously displaying both rows and columns, albeit with the columns usually shown as lines rather than points. There are various forms of biplot, some of which are very similar to CA, and one of which gives identical numerical results to PCA (but displayed in the biplot style). Indeed, nearly all exploratory multivariate methods are closely related (see Greenacre, 1984: Appendix A), and in particular it should be noted that the CA co-ordinate values can, in fact, be calculated as rescaled versions of the PCA co-ordinate values from a transformed data matrix and its transpose (Jolliffe, 1986: 202-203).

Archaeological works in English using CA include Avery & Underhill (1986), Bolviken et al. (1982), Denys (1985), Holm-Olsen (1981) and Powers, Padmore & Gilbertson (1989), although there were earlier works in French including Djindjian (1977a, b) and Boutin, Tallur & Chollet (1977). A statistical paper by Hill (1974) included an archaeological example, while the technique was used earlier in other historical sciences, as in Melguen (1974) and Spicer & Hill (1979). Bolviken et al. (1982) briefly describe the algebra for archaeologists.

It would not be appropriate to give a substantial description of the technique here, but its main practical features are outlined by reference to the well-known classical PCA method, while certain algebraic formulae are necessary in order to facilitate later discussion of the bootstrap technique; the notation used below broadly follows that of Greenacre (1984). Define the following:

X = the data matrix divided by the sum of all its elements
r = vector of row sums of X
c = vector of column sums of X
D_r = diag(r) = "row masses"
D_c = diag(c) = "column masses"
R = D_r^{-1} X = matrix of "row profiles"
C = D_c^{-1} X^T = matrix of "column profiles"

where A^T denotes the transpose of the matrix A and diag(v) denotes a matrix with the elements of the vector v along the main diagonal and zeros elsewhere. Thus the row profile matrix is just the original matrix divided through by its row sums, and the column profile matrix is the original matrix divided through by its column sums and then transposed.

Although, as noted above, CA was designed for the analysis of contingency tables, the algebra can still be formally carried through on "true" multivariate data, providing all the values are non-negative and the possible alterations in interpretation are borne in mind. Greenacre (1984: Sections 9.6, 9.10) gives two examples of this. In fact we do not use X, R, C but centred versions of them, which has the effect of removing a trivial first dimension from the analysis. We then use the Generalized Singular Value Decomposition of the centred data matrix:

X - r c^T = A D_α B^T

where

A^T D_r^{-1} A = I = B^T D_c^{-1} B

and D_α = diag(α), with α the vector of singular values α_1 ≥ α_2 ≥ … > 0.

Note that I is the identity matrix (ones along the main diagonal and zeros elsewhere). Then the columns of B and A define the axes of the row and column points respectively. Hence, if F and G are the matrices containing the co-ordinates of the row and column points relative to the new axes (with each row of F containing the co-ordinates of a row point and each row of G the co-ordinates of a column point), then they are calculated as simple inner products in the χ² metric, that is

F = (R - 1c^T) D_c^{-1} B
G = (C - 1r^T) D_r^{-1} A

where 1 denotes a vector of ones. Further, it can be shown that

F = R G D_α^{-1}    (1)
G = C F D_α^{-1}    (2)

These are often called the "transition formulae", since they show how one set of co-ordinates can be derived from the other, and are the basis of the reciprocal averaging method, which can be viewed as an iterative procedure for calculating the Singular Value Decomposition.

The presence of D_r^{-1} and D_c^{-1} produces a "downweighting" effect, so that the technique pays more attention to the "shape" of the rows and columns, and less to their absolute size. PCA on either the rows or the columns would not have this feature. In particular, the characteristic of PCA that the first axis is simply a weighted measure of the "size" of the rows does not occur in CA because of this. Another consequence of the algebra of CA is that if two rows have the same relative abundances but different absolute abundances (e.g. one is exactly twice the other), then in CA these will have exactly the same co-ordinates on the axes, differing only in their contributions to the overall positioning of the axes. However, in PCA they will have different positions (the larger will be further from the origin than the other on each axis). In the archaeological context it is not entirely clear which characteristic is preferable, since absolute abundances are factors of interest, but are quite likely caused by processes which are not of interest, at least directly, such as post-depositional destruction. Intuitively, however, the Correspondence Analysis version seems perhaps more reasonable in most practical applications.
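As a concrete illustration, the following sketch (my own, in Python with NumPy; not the author's GENSTAT program) computes F, G and the squared singular values from a contingency table, obtaining the generalized SVD above via an ordinary SVD of the standardized residual matrix:

```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way contingency table (rows x columns).

    Returns row co-ordinates F, column co-ordinates G and the squared
    singular values alpha_i**2, following the definitions in Section 2.
    The generalized SVD is obtained through an ordinary SVD of the
    standardized residual matrix D_r^{-1/2} (X - r c^T) D_c^{-1/2}.
    """
    X = np.asarray(table, dtype=float)
    X = X / X.sum()                          # correspondence matrix
    r = X.sum(axis=1)                        # row masses
    c = X.sum(axis=0)                        # column masses
    S = (X - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, alpha, Vt = np.linalg.svd(S, full_matrices=False)
    k = int((alpha > 1e-12).sum())           # drop null dimensions
    U, alpha, Vt = U[:, :k], alpha[:k], Vt[:k]
    F = U * alpha / np.sqrt(r)[:, None]      # row principal co-ordinates
    G = Vt.T * alpha / np.sqrt(c)[:, None]   # column principal co-ordinates
    return F, G, alpha ** 2
```

For a scatterplot, the first two columns of F and G are plotted, with the percentage quoted on each axis being that axis's share of alpha ** 2 relative to its total.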


The overall "inertia" (similar to variance but including the above downweighting) of the data is decomposed along the axes, with α_i², the square of the i-th singular value, being the inertia of the i-th axis in both the row and the column space. In addition, the inertia of an axis can be decomposed among the different points, so that each point's contribution to the position of that axis can be found, as well as allowing the total inertia of a point to be decomposed along the different axes to show how well each point is represented by each axis. These are usually shown as percentages and are important adjuncts to the scatterplots, in the same way that eigenvector coefficients and factor loadings are in PCA and FA respectively.

The "horseshoe effect" (or Guttman effect) denotes the phenomenon, sometimes occurring in CA, PCA or similar "ordination" methods, that all or most of the plotted points appear in a curve. This is due to the fact that, although the axes are orthogonal, non-linear relationships may still exist between them. The technique of "detrended" Correspondence Analysis has been developed by Hill & Gauch (1980) to avoid this problem, and involves centring the row (but not column) scores at each stage in the reciprocal averaging algorithm. This follows from the usual ecological usage of a sites by species presence/absence or abundance matrix, where the orders of the sites along the first few axes are thought to represent underlying environmental gradients. Such data are precisely those most likely to exhibit the "horseshoe effect" (Greenacre, 1984), and if the main object is to produce a site ordination of this kind then it is easy to see why ecologists are so concerned about it. However, the methodology is defined only in terms of certain actions inside a computer program (DECORANA) and it is far from clear that such actions will not cause more problems than they solve. In addition, even in the ecological case, it is not clear that the problem is as serious as Hill & Gauch seem to think (Greenacre, 1984: 226-232). In summary, it seems clearly preferable to keep to the well-understood standard Correspondence Analysis method. This is also the view expressed in Greenacre (1984) and Digby & Kempton (1987) and by J. C. Gower (pers. comm.).

The analysis of a two-way contingency table described above is sometimes referred to as "Simple Correspondence Analysis". Multiple Correspondence Analysis is a version of the method applicable to contingency tables of more than two variables, and essentially just involves applying the same algebra to a different coding of the data. This is a rather less satisfactory method than ordinary CA, especially with respect to the inertias, and Greenacre (1988) has proposed an improved version.

In recent years there has been an upsurge of interest in Correspondence Analysis in the statistical literature, in particular in its relationship to log-linear modelling of contingency tables (Leese & Needham, 1986, describe this method for archaeologists). In particular, Goodman (1986) and Van der Heijden, de Falguerolles & de Leeuw (1989), and references and comments therein, show the similarities in the two approaches, and illustrate the steps being taken to put Correspondence Analysis on a more model-based footing. However, in the discussion of the latter paper, Greenacre states Benzécri's philosophy of the method as a "data re-expressing instrument", a view with which the present author agrees.
In particular, log-linear models are unable to cope easily with sparseness (a data matrix with a large number of zeros), which is a feature of many archaeological data sets.

3. Bootstrapping

"Bootstrapping" is the general statistical term for a number of "resampling" methods; the general principles are expounded by Efron (1982) and Efron & Gong (1983). In the case of CA the objective is to assess the statistical stability of the scatterplots, that is, to determine if small differences in the data matrix can produce (relatively) large differences in the plot(s), and so to assess whether differences in the positions of points on the plots are "significant" or not. The mathematics required for theoretical statistical analysis of this issue is prohibitively complex, and so a simulation method is required.


The "bootstrapping" used here is a form of simulation based on statistically perturbing the observed data. The description given here is developed from Greenacre (1984: 214-218); he gives no references for the idea, and it is not clear whether there are previous uses of this method in the French literature. In fact, Lebart, Morineau & Warwick (1984) propose the use of a confidence region based on the χ² distribution, producing circles around points similar to those placed around group centroids in Canonical Variate Analysis (CVA; see Krzanowski, 1988, for example). However, these only work for points having low contributions to the first two axes, and hence are not useful in general.

In the following we only consider the first two axes from the analysis, although higher dimensionalities can be considered in practice. Depending on the values of the inertias of the axes this restriction of attention may or may not be appropriate, but this can be compared with the aforementioned CVA confidence regions, which are nearly always presented in only two dimensions.

If it were possible to generate more data (i.e. sample) from the underlying statistical distribution from which the observed data were drawn, then this could be performed many times to give a set of "replicate" data matrices, each of which could undergo Correspondence Analysis to produce a complete new set of points. Hence each point in the original analysis could give rise to a cloud of points (one from each replicate matrix) representing the "area of uncertainty" of the point's "true" position. The overlapping or non-overlapping nature of the different clouds could be used informally to assess the "significance", or otherwise, of similarities or differences between different locations or types. However, this additional sampling is unlikely to be possible, so we treat the observed sample itself as a proxy for the underlying distribution and draw the new samples from it instead. This is referred to as "resampling". The idea behind this is that the resampling distribution (defined by the observed sample) is an "unbiased estimate" of the underlying distribution. Strictly, we should say that the empirical distribution function is an unbiased and consistent estimate of the underlying distribution function.

In the case of CA of a contingency table, this amounts to resampling, with replacement, the individuals in the sample, noting their original row and column cross-classification. Hence, if the sum of all the entries in the table is N, and the number in the ij-th cell is N_ij, then, to generate a new "replicate" matrix, we draw a new sample of size N, with the probability of each new individual being in the ij-th cell equal to N_ij/N. In fact we do not perform a new Correspondence Analysis for each replicate matrix, which would be far too time consuming, except for very small numbers, and would lead to all the points' co-ordinates being relative to different axes. Instead, we simply convert the bootstrapped sets of row and column profiles into points on the Correspondence Analysis co-ordinate system calculated from the original data. This is most easily done by using equations 1 and 2. Hence, to get the co-ordinates of the bootstrapped row profiles, we plug the bootstrap value of R and the original G D_α^{-1} into the right hand side of equation 1, to get the bootstrap value of F on the left hand side.
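A minimal sketch of this resampling step, assuming the illustrative correspondence_analysis function given in Section 2; the multinomial draw and the projection through equation (1) are the whole of the computation:

```python
import numpy as np

rng = np.random.default_rng(1991)

def bootstrap_row_points(table, G, alpha, n_boot=300):
    """Generate bootstrap clouds for the row points of a CA.

    Each replicate table is drawn from a single multinomial with cell
    probabilities N_ij / N, and its row profiles are projected onto the
    original axes via the transition formula F = R G D_alpha^{-1}.
    """
    X = np.asarray(table, dtype=float)
    N = int(X.sum())
    probs = (X / X.sum()).ravel()
    proj = G / alpha                      # G D_alpha^{-1}, fixed from the original CA
    clouds = []
    for _ in range(n_boot):
        Xb = rng.multinomial(N, probs).reshape(X.shape).astype(float)
        row_sums = Xb.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1.0     # a row that vanishes sits at the origin
        Rb = Xb / row_sums                # bootstrap row profiles
        clouds.append(Rb @ proj)          # equation (1)
    return np.stack(clouds)               # shape (n_boot, n_rows, n_axes)
```

A column bootstrap is identical, with the column profiles and F D_α^{-1} from equation (2) in place of the row profiles and G D_α^{-1}.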
Given the existence of a good random number generator, and a program or package for CA which can output G D_α^{-1} and F D_α^{-1}, the above procedure is easy to program and runs quite quickly.

The above version of the method treats the data set as a single multinomial distribution for generating the bootstrap matrices. In some situations it might be more sensible to treat it as several, one for each row or column (in other words, the row or column sums are fixed), or even as one binomial distribution for each cell. In the absence of any convincing argument to the contrary, the single multinomial should probably be used. Additionally, it might sometimes be of use to fix certain rows or columns at their values in the original matrix.


However, it should be noted that comparing two columns, say, when the values in the other columns are fixed, will give a different result from comparing them in an ordinary bootstrap when only the matrix sum, N, is fixed. This is because the properties of the multinomial distribution mean that the variability, and hence the size of the bootstrap clouds, will tend to be smaller in the former case. This point affects the interpretation of the bootstrap of the lemming abundances given in Ringrose (1988b). A further different approach would be to generate the bootstrap matrices by giving the value in each cell a Poisson distribution with mean N_ij.

It is usually convenient to show only the convex hulls of the clouds rather than all of the points, and in the example plots in Section 5 the outermost hull and the hulls containing at least 75% and 50% of the points are drawn. These were calculated by the Green-Silverman convex hull peeling routine (Green & Silverman, 1979); a sketch of this peeling step is given at the end of this section. The convex hull of a cloud of points in two dimensions is defined by the lines joining the "outer" points. If an elastic band is imagined to be stretched around the outside of the point cloud, and it is allowed to contract, then the points it touches are the outer points. Peeling is simply the process of successively removing the points on the current hull and recalculating the hull for the remaining points. In practice it is only possible to examine a small number of clouds of points on one plot, so that rows and columns are often plotted separately even if there is interest in both. In addition, it is not appropriate to compare overlaps of row point clouds with those of column points since, as noted previously, there is no formal definition of distance between row and column points in the CA method.

A problem with the above approach is that, with a sparse matrix, the many zero cells in the original will never contain values greater than nought in the replicate bootstrap matrices. Some sort of "smoothed" bootstrap might be of use in this case. However, this comes up against the intractable problem in such data of how significant the zeros are. In other words, a zero observation may be a reflection of rare, but possible, occurrence, or of true impossibility. If, say, a species has abundances of one or two in spits 0-4, 6-9 and 11-15, then one might say that allowing the possibility of it occurring (in a bootstrap sample) in spits 5 and 10 was eminently reasonable, but that it occurring in spit 20 was much less so. Similarly, if its abundance in those spits had been of the order of 10 or 20, rather than one or two, then one might be less happy about the likelihood of occurrences in spits 5 and 10, as well as 16-20. Hence any solution to this would probably cause as many problems as it removed.

The points which have the greatest spread in a bootstrap display will be those with even distributions, i.e. similar numbers in all their cells, and low frequencies, while the reverse will give a tighter group. However, "confidence clouds" for points containing very few non-zero entries will often be a straight line or a single point, and so are not informative.

Note that all of the above discussion requires that the data are a contingency table. In the case of "true" multivariate data a different method for generating the bootstrap matrices would be required. In particular, the methodology could easily be applied to PCA instead.
Projecting the bootstrap matrices onto the PCA axes would be effected by simply using the eigenvectors from the original analysis in the obvious way, while generating the bootstrap matrices could be performed by simulating from a multivariate normal distribution, with the variance matrix estimated from the data. A similar procedure could be applied to other methods, such as FA.
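The hull peeling referred to above is also easy to sketch. The routine below is an illustrative substitute for the Green & Silverman (1979) FORTRAN routine, using scipy's ConvexHull; the semantics (the smallest peeled hull still containing at least the stated fraction of the cloud) are my reading of the description given earlier:

```python
import numpy as np
from scipy.spatial import ConvexHull

def peel_hulls(cloud, fractions=(1.0, 0.75, 0.5)):
    """Convex hull peeling of a 2-D point cloud.

    Successively strips the vertices of the current hull and records,
    for each requested fraction, the hull of the smallest remaining
    subset still containing at least that fraction of the points.
    """
    points = np.asarray(cloud, dtype=float)
    n = len(points)
    remaining = points
    hulls = {}
    while len(remaining) >= 3:
        # degenerate (collinear) clouds raise QhullError here, mirroring
        # the remark above that such clouds are uninformative
        vertices = ConvexHull(remaining).vertices
        for f in fractions:
            if len(remaining) >= f * n:
                hulls[f] = remaining[vertices]   # overwrite until too few remain
        mask = np.ones(len(remaining), dtype=bool)
        mask[vertices] = False                   # peel off the current hull
        remaining = remaining[mask]
        if len(remaining) < min(fractions) * n:
            break
    return hulls    # fraction -> hull vertices, counter-clockwise
```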

4. Simulation Study

This section discusses the design of, and the results obtained from, a simulation study (experiment) which investigated the reliability of the bootstrap "confidence regions" described above. The general idea is to assess what "significance level", in some sense, they represent. Greenacre (1984) does not discuss this point, even though the issue is clearly vitally important.


The fundamental objective is to investigate the behaviour of the bootstrap point clouds, in the form of their convex hulls, in relation to various factors. For simplicity, in the following it will always be assumed that it is the columns of the data matrix which are of interest, and for which bootstrap "confidence regions" have been calculated as described above. The main variables involved in the performance of the regions, and hence the factors used in the experiment performed, are the size and structure of the matrix and the number of bootstraps used. Clearly, the number of rows (n), the number of columns (p) and the sum of the matrix (N) are likely to be important, and these are easy to specify. Similarly, the more bootstrap replications are used in the construction of the hulls, the larger they are likely to be and the greater the probability of overlaps. However, the "structure" of the matrix is less easily defined, being composed of all the possible relationships between the different columns and rows.

Clearly, it would be of interest to see how the overlapping or non-overlapping nature of the convex hulls picks out columns which are genuinely different from the others, but it is difficult to see how to do this sensibly. The confounding of the scale and nature of any differences existing between the columns (how many columns are different, how and to what extent do they differ?) would seem to make any conclusions reached of limited practical use. Since the objective of this experiment is to help establish practical guidelines, it seems that the most helpful subject of study would be the performance of the hulls when the columns all come from the same population (in other words, all differences in the observed data are merely due to random variation), that is, to look at "the null hypothesis" of universal equivalence. In statistical terminology, the experiment will be dealing with the "type 1 error" associated with the "test".

Based on the above justification, the method used is, for each combination of factor levels (described below), to define an underlying structure for the population, which will essentially take the form of a set of multinomial probabilities, and randomly generate 100 matrices following this structure. Each matrix is subjected to Correspondence Analysis and then to the bootstrap procedure described above in order to produce the convex hulls. The results of the experiment will then be summaries of the degrees and frequencies of hull overlap for the different combinations of factor levels. Thus, the study is a two-step simulation (a simulation study of a simulation method).

The parameter values used in this experiment were selected to be as relevant as possible to the analysis of A. L. Armstrong's data from Pin Hole Cave, Creswell Crags, described in Jenkinson (1984) and Jenkinson & Gilbertson (1984). These data are the fossil abundances of 148 vertebrate taxa in 21 1-foot spits, and consist of large numbers of, generally poorly represented, bird species, some fish and bats, large numbers of frogs and toads (all treated as one taxon), and nearly 40 species of mammal, some, in particular the lemmings, relatively abundant. Interest was concentrated mainly on the similarities and differences among the spits. However, the majority of finds are in the top 13 spits, so that the most interesting pair of (n, p) values was deemed to be n = 148, p = 13.
Similarly, the most interesting matrix sum values (N) were deemed to be those for the Armstrong data as quantified by the Minimum Number of Individuals (MNI) and the Number of Identified Specimens (NISP); see Klein & Cruz-Uribe (1984) and Grayson (1984) for substantial descriptions and discussions of these measures. The MNI data are contained in Jenkinson & Gilbertson (1984: Table 25, 119-122), while the NISP data were compiled from Jenkinson (1984: Tables 47-81, 218-252). The MNI data are not really a true contingency table (while the NISP data or, say, counts of artifacts, are), due to their method of calculation, but this is a minor point.


It should be noted that, although increasing the matrix sum before the underlying matrix is generated will decrease the size of the hulls, as the bootstrap variation will be decreased (Greenacre, 1984: 216-218), this will not necessarily reduce the probability of hull overlaps, since the variation between the columns in the generated matrix will be reduced also. Greenacre's examples are the equivalent of generating the matrix and then doubling (say) the numbers, rather than generating the matrix with double the sum.

It was decided that the study would be much simplified if all mean column sums in the underlying population were equal, in other words if there was no bias for or against any columns in generating the initial data matrices. This avoids problems of comparing different overlaps where some hulls are smaller than others due to there being generally larger values in that column. Some differences will arise from the generation of the data matrix, but these are unlikely to be substantial. Information on the effect of different mean column sums within a population is of interest, but not sufficiently so to make worthwhile the complication entailed; the information on variation in column sums between trials provided by use of the MNI and NISP values should be sufficient.

However, it is not clear that such an approach is sensible in the case of the mean row sums. Taking them all to be equal gives low values in nearly all cells and will never produce values as large as some of those in the Armstrong data set, or as many zeros. Hence another approach would be to allow them to vary, and a sensible way seems to be to do so according to the row sums in the Armstrong data. In other words, a data set generated in this way would look just like the Armstrong data set except for the more even distribution of the row sums between the columns.

The number of bootstraps used is of interest in particular in order to see if variations have a large effect on the overlap frequencies. If not, then a low number may be adequate, while if the overlap frequency for two columns from the same population can be raised to, say, 95%, with a feasible number of bootstraps, then this might be worthwhile. Using values of 100 and 1000 seems reasonable, while 300 is a sensible intermediate value, as the approximate geometric mean of these two.

The factor levels considered, then, are as follows:

• number of rows and columns (n, p) = (148, 13) (Armstrong)
• matrix sum (N) = 1895 (Armstrong MNI), 11015 (Armstrong NISP)
• mean row sums even, or based on the Armstrong MNI or NISP ones
• number of bootstrap replications used to generate the convex hulls = 100, 300, 1000

However, in addition to these variants on the Armstrong data set, it is also of interest to see if the characteristics of the method differ with respect to n and p. It was not thought worthwhile conducting a full factorial experiment with all of the above factors included, which would have been prohibitively lengthy, and instead n and p were varied with 300 bootstrap replications, equal mean row sums and the Armstrong MNI matrix sum. The last value was, in fact, varied to allow for different matrix sizes, in order to keep the "density" of the matrices the same. The values used were n = 25, 148 and p = 5, 13, 20. These seem reasonably typical of the kind of values likely to be encountered in practice.

There is a slight theoretical problem with the possible occurrence of rows or columns with zero sums in the generated data matrices, which would cause the CA algorithm to fail. In fact, this never happened in the equal cell probabilities case, but was very common when mean row sums varied. This was solved pragmatically by putting one individual in each row (in a random column) before performing the main generation; a sketch of this generation step is given below. This does, of course, mean that the mean row sums are not quite correct, but it should not be problematic. An alternative, and possibly preferable, course would be to keep a count of how many rows have not had an individual put in them and then, when the number of individuals left to be assigned is equal to this figure, assign one each to these rows.

Although overlap rates for all of the 100%, 75% and 50% hulls were calculated in the experiment, only the results for the 100% hulls will be presented here, as the results for the others did not add substantially to the understanding of the technique.
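A sketch of this generation step, under the assumptions just stated (null structure with equal column probabilities, optionally unequal mean row sums, and one individual seeded per row); the names and details are illustrative rather than the author's PASCAL implementation:

```python
import numpy as np

rng = np.random.default_rng(21)

def generate_null_matrix(n, p, total, row_weights=None):
    """Generate one matrix under the null of universal column equivalence.

    Cell probabilities are proportional to the mean row sums (equal by
    default, or e.g. the Armstrong MNI/NISP row sums) and identical across
    columns, so all column differences are purely random variation. One
    individual is first placed in each row, in a random column, to stop
    zero row sums from breaking the CA algorithm; total must exceed n.
    """
    w = np.ones(n) if row_weights is None else np.asarray(row_weights, float)
    probs = np.outer(w / w.sum(), np.full(p, 1.0 / p)).ravel()
    X = np.zeros((n, p))
    X[np.arange(n), rng.integers(0, p, size=n)] = 1.0      # seed each row
    X += rng.multinomial(total - n, probs).reshape(n, p)   # main generation
    return X
```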

Table 1. Percentages of outermost hulls overlapping for n = 148, p = 13; mean (and standard deviation)

                    Size = MNI                        Size = NISP
Number of      Mean row sums   Mean row sums     Mean row sums   Mean row sums
bootstraps     equal           unequal           equal           unequal

1000           35.71 (5.08)    40.77 (7.27)      37.81 (4.76)    41.31 (6.30)
300            29.10 (4.94)    32.22 (6.86)      30.97 (4.31)    33.37 (7.15)
100            22.13 (4.81)    25.32 (6.44)      24.18 (4.17)    26.68 (6.74)

Table 2. Percentages of outermost hulls overlapping for MNI (equivalent) sum matrices, 300 bootstraps; mean (and standard deviation)

           p = 5            p = 13          p = 20

n = 25     40.20 (10.44)    76.23 (5.96)    84.77 (4.06)
n = 148     7.70 (7.70)     29.10 (4.94)    40.94 (4.16)

It should be noted that the 100% hulls are likely to be the most affected by differences in the number of bootstrap replications, and might be thought to be the most variable in general, due to being a function of the most "extreme" points. Hence it might in some cases be more useful to consider, say, the 90% hulls instead, or perhaps the second or third hulls in from the outside. However, every hull is affected by the ones outside it, so that it is intuitively fairly clear that inner hulls might be quite substantially affected by small changes in ones outside them. Hence it is probably best in most cases just to use the outer hulls.

Overlaps of all pairs of outer hulls (p(p-1)/2 per matrix) were calculated. This had to be performed in two stages, since hulls can either intersect or, in some cases, one hull may contain another. The term "overlap" is taken to include both of these cases; a sketch of such a test is given below.

It now remains to assess which summaries of the results are most helpful in ascertaining the nature of the hulls. The most fundamental property is probably the frequency of overlap of the hulls of two particular columns. In this case we might arbitrarily pick columns 1 and 2 for this purpose, and this would give a picture of how the method performs in discriminating between two pre-specified columns (when they are in fact from the same population). However, in most cases the user is not necessarily interested in pre-specified pairs of columns but in the overall picture (this being an exploratory method). Hence it also seems sensible to assess the overall proportion of columns with overlapping hulls in the data set. These "tests" are not independent, of course, but since this figure is a mean this does not matter, so that this will in fact give a better estimate of the first characteristic.
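For concreteness, a minimal overlap test for two convex hulls; this is a separating-axis formulation of my own rather than the paper's two-stage routine, but it counts both intersection and containment as "overlap", as required:

```python
import numpy as np

def hulls_overlap(hull_a, hull_b):
    """True if two convex polygons intersect or one contains the other.

    Separating axis theorem: two convex sets are disjoint exactly when
    some edge normal of either polygon separates their projections, so
    the absence of such an axis covers both the intersection and the
    containment cases counted as "overlap" above. Vertices must be in
    order around each polygon (as returned by convex hull routines).
    """
    a, b = np.asarray(hull_a, float), np.asarray(hull_b, float)
    for poly in (a, b):
        edges = np.roll(poly, -1, axis=0) - poly
        normals = np.stack([-edges[:, 1], edges[:, 0]], axis=1)
        for axis in normals:
            pa, pb = a @ axis, b @ axis
            if pa.max() < pb.min() or pb.max() < pa.min():
                return False    # separating axis found: hulls are disjoint
    return True
```

Counting, over the 100 generated matrices, the proportion of the p(p-1)/2 hull pairs for which such a test returns True gives the kind of summary reported in Tables 1 and 2.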


Tables 1 and 2 show the mean and standard deviation (as a percentage of the total number of comparisons, p(p-1)/2, over 100 replications) of the number of columns whose outer hulls overlap. Table 1 shows a clear effect of unequal row sums increasing the overlap rates, suggesting that the overall structure of a matrix cannot be totally ignored in assessing bootstrap results. However, the fact that, even for such uneven sets of row sums as those considered, the difference is never more than 5%, suggests that, except in extreme cases, it should not need to be taken into account. There is a rather smaller effect of the use of the NISP matrix sum increasing overlap rates. The very small scale of this effect seems to imply that the interpretation of hull overlap should not differ substantially between matrices of different sum, if this difference can be viewed as being due to "sampling effort" in some sense. However, in the archaeological case of MNI and NISP data sets, the latter are a different measure to the former, so that this does not apply and the situation is less clear. As expected, the number of bootstraps used in generating the hulls had a noticeable effect, with the overlap rate for 1000 replications being of the order of 1.5 times that for 100, and that for 300 being roughly halfway between those two. The distribution of the number of overlapping pairs in each of the 100 trials was very skewed for each combination of factor levels; in most cases all the values were within two standard deviations less than the mean but up to around four standard deviations greater than the mean.

Overall, all of the overlap rates are fairly low, under 42%, so that if the hulls of two pre-specified columns are fairly close but not overlapping we might still conclude that they are not very different. Of course, the effect of multiple comparisons means that this may not be the case for two columns which just happen to have hulls close to each other, but this is unavoidable. However, it seems that, in the Armstrong case at least, columns with overlapping hulls should be viewed as being very similar (in terms of their positions on a Correspondence Analysis display).

Table 2 shows the overlap rates, and their standard deviations, for the experiment varying n and p. It is clear that there are huge differences, with increasing p and decreasing n both increasing the overlap rate quite dramatically. Hence the size of a matrix must clearly be taken into account in any interpretation of a bootstrap plot of this kind. Note that the same results were used for the n = 148, p = 13 case here as for the "MNI matrix sum, mean row sums equal" case above. This is poor design, but since the data are not being analysed formally it has no real effect.

5. Examples

All bootstraps mentioned below consisted of 300 resamplings. The three examples given here are all taken from the Armstrong data described in Section 4.

Figure 1 shows a bootstrap of the spits for the Armstrong MNI data set, with 100%, 75% and 50% hulls shown for each of the top 13 spits. The results in Table 1 suggest that around 33% of the hull-pairs would be expected to overlap if all the spits were effectively the same, but in fact we have only 7.7% (six out of a possible 78). The picture suggests that spits 5 and 6 are very similar, as are 7, 8 and 9, and 10, 11 and 12, while spits 0, 1, 2 and 3 are loosely grouped. As an aside, further analyses (with CA and other methods) suggested that spits 0-4 were quite different from each other and substantially different from the other spits, while the 7-9 and 10-12 groups were quite robust, with the former differing substantially from the latter only in lemming content.

Figure 2 bootstraps the 15 most common taxa in this data set over all of the 19 spits in which they occur (in this case it is sensible to include spits 13-16, 18 and 19, as well as 0-12), so that the most appropriate figure for comparison is 76.23% in Table 2, for n = 25, p = 13. In fact we might expect an even higher overlap rate than this, since we have higher p and lower n. This plot shows that the frogs are very different from the rest; in fact there are large numbers of them, mostly in the top six spits, thus producing a very tight cloud well away from the other taxa.


Figure 1. Bootstrap of top 13 spits in Armstrong MNI data. (Axis 1: 27.38% of inertia.)

Both lemming species also produce small clouds, while all of the other taxa have larger clouds, nearly all overlapping with each other. We might conclude that the frogs and Norway lemmings have different distributions, while those of the other taxa are fairly similar, with the possible exception of the arctic lemmings.

Figure 3 is the bootstrap for the NISP version of Figure 1, although with several taxa (including frogs) omitted (for various reasons). Only the outer hulls are shown here due to the smaller size of the clouds. We see approximately 25-30% overlaps, expecting 33% under universal equivalence, suggesting that 6-12 and 2-4 can be viewed as groups, with 5 intermediate, while 0 and 1 are very different.

6. Conclusions

The bootstrap method outlined above should not be regarded as equivalent to formal statistical hypothesis testing, but is nevertheless a useful way of assessing the meaning of the spreads of points on a CA display.


Figure 2. Bootstrap of 15 most common taxa in Armstrong MNI data. (Taxa shown: Cave Hyaena, Woolly Rhino, Mountain Hare, Red Fox, Bos/Bison, Wolf, Reindeer, Rana sp., Woolly Mammoth, Arctic Lemming, Brown Bear, Norway Lemming, Cave Lion, Horse, Ptarmigan.)

In addition, it shares with formal hypothesis testing the drawback that, when multiple hypotheses are being tested (in this case, the overlapping, or not, nature of p(p-1)/2 pairs of hulls is being assessed at once), some of these will give significant results even when there is no difference. However, this is an unavoidable consequence of the nature of the data and of the over-ambitious demands sometimes made of the method. Further, it also shares the problem that, in most archaeological cases, we are interested in whether two entities (species, spits, etc.) are "significantly different", whereas a hypothesis test is determining just whether there is a significant probability of any difference. This point was made by Shennan (1988) with respect to the χ² test. The simulation study described in Section 4 gives valuable insight into the behaviour of the method, in particular demonstrating that the numbers of rows and columns in the matrix have a substantial effect, while the structure and sum of the matrix are much less important.

Figure 3. Bootstrap of top 13 spits in Armstrong NISP data. (Axis 1: 30.87% of inertia.)

Acknowledgements

The author would like to thank a referee for pointing out the existence of the early archaeological applications of Correspondence Analysis in the French literature. I would also like to thank Nick Fieller for his help in preparing the paper, and the use of his software for producing the plots, and Peter Green for supplying the convex hull peeling routine.

Appendix: Computing

All Correspondence Analyses were performed using a GENSTAT4 program developed by the author from that given in Greenacre (1984: 354). However, GENSTAT4 has now been withdrawn in many institutions, to be superseded by GENSTAT5, which has a totally different syntax.


The simulations were performed using a PASCAL program, calling the Green-Silverman peeling routine, which is in FORTRAN, while the ordinary bootstrapping was performed by a PASCAL program which produces output for a FORTRAN program which also calls the peeling routine. Random numbers were generated using NAG routine G05DYF. Anyone who is interested is welcome to write to the author for a copy of the bootstrapping program, although this may need some modification to comply with local PASCAL compilers. The basic program is in fact very easy to write. I can be contacted most easily by electronic mail on JANET address [email protected]. However, permission from its authors is required for the acquisition of the peeling routine. Some statistical packages, for example SAS, can perform CA, but some effort may be required to obtain all of the information required by the bootstrap program.

References

Avery, G. & Underhill, L. G. (1986). Seasonal exploitation of seabirds by Late Holocene coastal foragers: analysis of modern and archaeological data from the Western Cape, South Africa. Journal of Archaeological Science 13, 339-360.
Benzécri, J.-P. (1973). L'Analyse des Données. Tome 1: La Taxinomie. Tome 2: L'Analyse des Correspondances. Paris: Dunod.
Bolviken, E., Helskog, E., Helskog, K., Holm-Olsen, I. M., Solheim, L. & Bertelsen, R. (1982). Correspondence analysis: an alternative to principal components. World Archaeology 14, 41-60.
Boutin, P., Tallur, B. & Chollet, A. (1977). Essai d'application des techniques de l'analyse des données aux pointes à dos des niveaux aziliens de Rochereil. Bulletin de la Société Préhistorique Française 74, 362-375.
Denys, D. (1985). Palaeoenvironmental and palaeobiogeographical significance of fossil rodent assemblages of Laetoli (Pliocene, Tanzania). Palaeogeography, Palaeoclimatology, Palaeoecology 52, 77-97.
Digby, P. G. N. & Kempton, R. A. (1987). Multivariate Analysis of Ecological Communities. London: Chapman and Hall.
Djindjian, F. (1977a). Burin de Noailles, burin sur troncature et sur cassure: statistique descriptive appliquée à l'analyse typologique. Bulletin de la Société Préhistorique Française 74, 145-154.
Djindjian, F. (1977b). Etude quantitative des séries aurignaciennes de la Ferrassie par l'analyse des données. Bulletin de la Société Préhistorique Française 74, 357-361.
Efron, B. (1982). The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: SIAM.
Efron, B. & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife and cross-validation. The American Statistician 37, 36-48.
Gabriel, K. R. (1981). Biplot display of multivariate matrices for inspection of data and diagnosis. In (V. Barnett, Ed.) Interpreting Multivariate Data. Chichester: Wiley, pp. 147-174.
Goodman, L. A. (1986). Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables. International Statistical Review 54, 243-309.
Grayson, D. K. (1984). Quantitative Zooarchaeology. London: Academic Press.
Green, P. J. & Silverman, B. W. (1979). Constructing the convex hull of a set of points in the plane. The Computer Journal 22, 262-266.
Greenacre, M. J. (1984). Theory and Applications of Correspondence Analysis. London: Academic Press.
Greenacre, M. J. (1988). Correspondence analysis of multivariate categorical data by weighted least squares. Biometrika 75, 457-467.
Greenacre, M. J. & Hastie, T. (1987). The geometric interpretation of correspondence analysis. Journal of the American Statistical Association 82, 437-447.
Hill, M. O. (1974). Correspondence analysis: a neglected multivariate method. Applied Statistics 23, 340-354.
Hill, M. O. & Gauch, H. G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42, 47-58.
Holm-Olsen, I. M. (1981). Economy and settlement pattern 1350-1600 AD, based on evidence from farm mounds. Norwegian Archaeological Review 14, 86-101.


Jenkinson, R. D. S. (1984). Creswell Crags: Late Pleistocene Sites in the East Midlands. British Archaeological Reports, British Series 122.
Jenkinson, R. D. S. & Gilbertson, D. D. (1984). In the Shadow of Extinction. Sheffield: University of Sheffield Monographs in Prehistory and Archaeology.
Jolliffe, I. T. (1986). Principal Component Analysis. New York: Springer.
Klein, R. G. & Cruz-Uribe, K. (1984). The Analysis of Animal Bones from Archaeological Sites. Chicago: University of Chicago Press.
Krzanowski, W. J. (1988). Principles of Multivariate Analysis: A User's Perspective. Oxford: Clarendon Press.
Lebart, L., Morineau, A. & Warwick, K. M. (1984). Multivariate Descriptive Statistical Analysis. New York: Wiley.
Leese, M. N. & Needham, S. P. (1986). Frequency table analysis: examples from early bronze age axe decoration. Journal of Archaeological Science 13, 1-12.
Melguen, M. (1974). Facies analysis by "correspondence analysis": numerous advantages of this new statistical technique. Marine Geology 17, 165-182.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and its Applications. Toronto: University of Toronto Press.
Powers, A. H., Padmore, J. & Gilbertson, D. D. (1989). Studies of late prehistoric and modern opal phytoliths from coastal sand dunes and machair in northwest Britain. Journal of Archaeological Science 16, 27-45.
Ringrose, T. J. (1988a). Correspondence analysis as an exploratory technique for stratigraphic abundance data. In (C. L. N. Ruggles & S. P. Q. Rahtz, Eds) Computer and Quantitative Methods in Archaeology 1987. British Archaeological Reports, International Series 393, pp. 3-14.
Ringrose, T. J. (1988b). Exploratory multivariate analysis of stratigraphic data: Armstrong's data from Pin Hole Cave re-examined. In (E. A. Slater & J. O. Tate, Eds) Science and Archaeology Glasgow 1987. British Archaeological Reports, British Series 196, pp. 521-539.
Shennan, S. (1988). Quantifying Archaeology. Edinburgh: Edinburgh University Press.
Spicer, R. A. & Hill, C. R. (1979). Principal components and correspondence analyses of quantitative data from a Jurassic plant bed. Review of Palaeobotany and Palynology 28, 273-299.
Tenenhaus, M. & Young, F. W. (1985). An analysis and synthesis of multiple correspondence analysis, optimal scaling, dual scaling, homogeneity analysis and other methods of quantifying categorical multivariate data. Psychometrika 50, 91-119.
Van der Heijden, P. G. M., de Falguerolles, A. & de Leeuw, J. (1989). A combined approach to contingency table analysis using correspondence analysis and log-linear analysis. Applied Statistics 38, 249-292.