Journal of Immunological Methods, 78 (1985) 307-321
Elsevier JIM 03460
Multiwell Chamber Chemotaxis Assays: Improved Experimental Design and Data Analysis
Cedric Minkin 1, David J. Bannon, Jr., Selma Pokress and Michael Melnick
Department of Basic Sciences, Section of Physiology and Laboratory of Developmental Biology, University of Southern California School of Dentistry, University Park Campus, Los Angeles, CA 90089-0641, U.S.A.
(Received 23 May 1984, accepted 31 December 1984)
The chemotaxis assay using the Boyden transfilter technique has become widely used in recent years for assessing migratory responses of a wide variety of cell types. In the study reported here we examined the migratory responses of mouse peritoneal macrophages using a multiwell chamber. The experiments were designed to analyze the components of variance in the assay method, to optimize the experimental design, and to develop objective statistical criteria for choosing among experiments with disparate results. Cell counts were obtained with the aid of an image analyzer coupled to a light microscope. Microcomputer software was developed to drive the image analyzer, collect data and conduct statistical analyses. Nested analysis of variance (ANOVA, either 2- or 3-level) was employed to partition the components of variance and F-tests were used to determine their significance. Significant sources of experimental error were identified both within and among wells and were attributed mostly to variability in the chamber/filter assembly and counting procedure. Statistical analyses demonstrated that there was significant variation among assays conducted in different multiwell chambers on the same day, among assays where the same agent was tested on different days in the same chamber, and among replicate counts of the same assay. The following recommendations were made: (1) use ANOVA to distinguish differences due to biological effects from those due to experimental error, (2) design experiments so that all relevant comparisons are included in the same chamber and the same assay, (3) avoid pooling data from different assays unless ANOVA treatment variances are comparable, and (4) when replicate assays yield disparate results, choose the assay with the lowest percentage of variation due to experimental error.
Key words: chemotaxis - multiwell chamber - image analyzer - analysis of variance
1 Address for correspondence and reprint requests: Cedric Minkin, Ph.D., University of Southern California, School of Dentistry Room 4220, University Park MC-0641, Los Angeles, CA 90089-0641, U.S.A. Telephone (213) 743-8149 or 743-7284.
0022-1759/85/$03.30 © 1985 Elsevier Science Publishers B.V. (Biomedical Division)
Introduction
The multiwell chamber has been demonstrated to be useful for performing multiple in vitro chemotaxis assays utilizing small numbers of responder cells and minimum amounts of chemoattractant (Falk et al., 1980). We have utilized such
chambers for our studies on bone-derived macrophage chemotactic factors and, as with single well chambers, have noted considerable variation both within and among assays. Of particular concern was our observation that intragroup variance was frequently so great as to confound group mean comparisons by t-test. In an effort to improve assay reliability, we undertook a series of experiments to examine the sources of variation and to seek means for optimizing the assay conditions. To assist in this task we have developed computer software, utilizing an Optomax IV image analyzer coupled with an Apple II plus microcomputer, which enables us to collect, store and reduce large amounts of data from each assay and subject them to analysis of variance. This approach provides a reliable and efficient means of partitioning components of variance, as well as a sensitive and valid basis for determining the significance of group differences using repeated t-tests. The methods which we have developed and utilized make it possible to conduct chemotaxis assays in which even small differences between treatment groups can be accurately and reliably measured. Furthermore, the analysis of variance provides a means of distinguishing real biological differences from those due to experimental error, as well as a valid and unbiased basis for choosing among replicate experiments with disparate results.
Materials and Methods
Chemotaxis assays
The assays were conducted essentially as previously reported (Minkin et al., 1981) with the major difference being that multiwell chambers (Falk et al., 1980) were used in place of single blind well chambers. Responder cells were thioglycollate-elicited peritoneal macrophages obtained from C3H/He mice. Cells were harvested and chemotactic attractants diluted in Gey's balanced salt solution (Irvine Scientific, Santa Ana, CA) supplemented with 20 mM Hepes and 2% bovine serum albumin. Endotoxin-activated mouse serum (EAMS) was prepared by established procedures (Boumsell and Meltzer, 1975) and used as the positive control for chemotaxis. Bone-derived chemotactic factors (BDCF) were 0.5 M EDTA extracts of newborn mouse calvaria and are described in detail elsewhere (Minkin et al., 1981; Minkin et al., 1984). N-formylmethionyl-leucyl-phenylalanine (FMLP), a chemotactic peptide, and all other reagents were obtained from Sigma Chemical Co., St. Louis, MO. The multiwell chambers used for these experiments were obtained from Neuroprobe, Cabin John, MD, and the polycarbonate filters (5 µm pore size) were from Nucleopore, Pleasanton, CA. The lower (attractant) wells received 30 µl of attractant or appropriate control, while the upper (cell) wells received 50 µl of cell suspension (2-3 million macrophages per ml). Incubation of assay chambers was conducted at 37°C, at 95% relative humidity in an atmosphere of 5% CO2 in air for 3 h. Filters were stained with Diff-Quick (Harleco, Gibbstown, NY).
Cell counts with the image analyzer
Cells which migrated across the filter were counted under a 20× objective using an Olympus BH-2 microscope coupled to an Optomax IV image analyzer (Optomax, Hollis, NH) driven by software developed in our laboratory for an Apple II plus microcomputer. This equipment significantly reduces the time and labor involved in counting the cells on the filter. The image analyzer detects cells (features) as a function of their grey-level intensity. Contrast of the specimen was enhanced by a green filter placed over the light source. The substage condenser was adjusted so that the illumination was sufficiently bright to effectively eliminate the visible boundaries of the filter pores. The detector was then set at an appropriate level so that only the cells, and not the pores in the filter, were counted. The image analyzer determines the raw cell count per field (i.e., the number of individual features detected) and quantitates the area occupied by the cells in a field (i.e., the total area in pixels occupied by the features counted). Calibration of the analyzer for a given cell type, by determining the area to count ratio, was used to transform the area determination to an area corrected cell count. Both raw cell counts and area corrected cell counts were recorded for each field observed and stored in the computer for subsequent analysis.
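The area correction described above can be sketched in a few lines of Python. This is a minimal illustration only and is not the Apple II software used in this study: the function names, the calibration fields and all pixel values are hypothetical, and we assume that the area to count ratio is estimated from fields in which the cells are well separated, so that the raw counts can be trusted.

```python
# Hypothetical sketch: names and numbers are illustrative assumptions,
# not values taken from the image analyzer or its documentation.

def area_per_cell(calibration_fields):
    """Estimate the area to count ratio (pixels per cell).

    calibration_fields is a list of (total_area_in_pixels, raw_count)
    pairs taken from fields where cells do not touch each other.
    """
    total_area = sum(area for area, _ in calibration_fields)
    total_count = sum(count for _, count in calibration_fields)
    return total_area / total_count


def area_corrected_count(field_area, ratio):
    """Convert the total feature area of one field into a cell count."""
    return field_area / ratio


# Three invented calibration fields: (area in pixels, raw count).
calibration = [(5000, 100), (2600, 50), (7400, 150)]
ratio = area_per_cell(calibration)              # pixels per cell
corrected = area_corrected_count(6500, ratio)   # estimated cells in a new field
```

With these invented numbers the ratio is 50 pixels per cell, so a field with 6500 pixels of feature area yields an area corrected count of 130 cells, regardless of how many touching cells the analyzer merged into single features.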
Statistical methods
A mixed-model 2-level nested analysis of variance (ANOVA) was used for single multiwell chamber experiments and a 3-level nested ANOVA was utilized when multiple chambers were compared. Independent estimates of assay variance due to groups ('among treatments') and subgroups ('among wells within treatments') were tested for significance by F-test. A partitioning of variance was conducted as described by Sokal and Rohlf (1969). When significance was found among groups, planned comparisons among group means were conducted using repeated t-tests, which have been shown to be mathematically equivalent to orthogonal comparisons (Sokal and Rohlf, 1969). For more conservative t-tests a group composite standard error term was employed which was derived from the ANOVA using the among subgroup (wells) mean square to estimate average experimental error. In this instance the standard error term appropriate for testing any 2 group means would be the square root of twice the 'among wells' MS (mean square) divided by the total number of observations per group.
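As an illustration of the 2-level nested design described above, the following Python sketch computes the degrees of freedom, mean squares and F ratios for a balanced data set. It is not the software used in this study. The function names, the data layout (data[g][w] holds the field counts for well w of group g) and the toy counts are assumptions chosen so that the arithmetic can be checked by hand.

```python
def mean(values):
    return sum(values) / len(values)


def nested_anova(data):
    """Balanced 2-level nested ANOVA.

    data[g][w] is the list of field counts for well w of treatment group g.
    Groups are tested over wells, and wells over fields (the error term).
    """
    a = len(data)          # number of groups
    b = len(data[0])       # wells per group
    n = len(data[0][0])    # fields per well

    well_means = [[mean(well) for well in group] for group in data]
    group_means = [mean(means) for means in well_means]
    grand_mean = mean(group_means)   # valid because the design is balanced

    ss_groups = b * n * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_wells = n * sum(
        (wm - gm) ** 2
        for means, gm in zip(well_means, group_means)
        for wm in means
    )
    ss_within = sum(
        (y - wm) ** 2
        for group, means in zip(data, well_means)
        for well, wm in zip(group, means)
        for y in well
    )

    df = (a - 1, a * (b - 1), a * b * (n - 1))
    ms = (ss_groups / df[0], ss_wells / df[1], ss_within / df[2])
    f = (ms[0] / ms[1], ms[1] / ms[2])   # (among groups, among wells)
    return {"df": df, "ms": ms, "f": f}


# Toy data: 2 groups x 2 wells x 2 fields (invented counts).
toy = [
    [[1, 3], [5, 7]],
    [[9, 11], [13, 15]],
]
result = nested_anova(toy)
```

For the toy data the mean squares are 128 (among groups), 16 (among wells within groups) and 2 (within wells), so both F ratios equal 8. P-values would then be obtained from the F distribution with the corresponding degrees of freedom.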
Experimental protocols
The basic treatment protocol used to examine the inherent variability in the system involved a series of experiments in which all groups were treated in an identical manner using EAMS as chemoattractant at a concentration sufficient to elicit a near maximal response (1:400 dilution of our EAMS preparation). In setting up the assays, chambers were arbitrarily divided into 8 sections of 4 wells each. Thus, for the purposes of partitioning components of variance for the ANOVA, these 8 sections were designated as groups, the individual wells were designated as subgroups within groups and the microscopic fields observed and counted (either 6 or 12 per well) accounted for the error or within subgroup variance. Since peritoneal cells were harvested for each experiment, variation due to differences in cell preparations could be estimated by comparing identical assays conducted on different days. The variation due to differences among individual multiwell chambers was estimated by comparing the results from identical assays conducted in different chambers on the same day.
In addition to experiments outlined above wherein all groups were treated the same, we conducted experiments in which we examined the chemotactic response to various experimental treatments. In one case we tested the efficacy of ammonium sulfate precipitation as a means of concentrating BDCF activity from bone extracts. In another, we examined the chemotactic dose-response of FMLP. In these experiments, either 2 or 3 individual multiwell chambers were set up as replicate assays.
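The degrees of freedom implied by this layout follow directly from the numbers of groups, wells and fields. The short Python sketch below is illustrative only (the function name and the dictionary keys are our own assumptions); it reproduces the bookkeeping for a chamber divided into 8 groups of 4 wells with 6 fields counted per well.

```python
def nested_df(groups, wells, fields):
    """Degrees of freedom for a balanced 2-level nested design."""
    return {
        "among_groups": groups - 1,
        "among_wells_within_groups": groups * (wells - 1),
        "within_wells": groups * wells * (fields - 1),
        "total_observations": groups * wells * fields,
    }


# 8 sections of 4 wells each, 6 fields counted per well.
design = nested_df(groups=8, wells=4, fields=6)
```

For this design the sketch gives 7, 24 and 160 degrees of freedom from a total of 192 counted fields.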
Results
Experiments with all groups treated the same
Assays conducted where all groups received the same treatment (EAMS 1:400) were useful in examining the inherent variability of the system. The ANOVA for such experiments are presented in Tables I-III. Details of the results are presented below and they can be summarized as follows: (1) area corrected counts are a more reliable measure of cell number than raw counts, (2) there is a consistently significant variability among wells within groups, (3) assays counted more than once demonstrate significant variation among counting times, (4) replicate assays conducted in different chambers demonstrate significant variation among chambers, and (5) replicate assays conducted on different days demonstrate significant variation among days.
(1) Raw counts vs. area corrected counts. Since the treatment is the same in all wells and groups, it is expected that there will be no significant variation among groups or among subgroups. However, in the raw count ANOVA of Table IA and IB there is significant variation among groups as measured by F-test. This is not the case when one examines the area corrected counts in the same assays. This has been a consistent finding in the majority of our experiments and suggests that area corrected counts are a more accurate measure.
(2) Among well variability. One of the greatest concerns revealed in these studies was the significant variation seen among wells within groups in almost all experiments. When all groups are treated the same, from 15% (Table I) to 28% (Table II) of all experimental variation is found among wells.
(3) Repeated counts of the same assay. Table I presents 2 separate counts, conducted on different days, of the same assay. A 3-level nested ANOVA of area corrected counts (Table IIIA) demonstrated that there was significant variation among repeated counts. The grand means of these repeated counts (138.5 vs. 150.0) could also be demonstrated to be different (P < 0.001) using the t-test.
(4) Replicate experiments conducted in different chambers. The data presented in Table II are from an experiment in which the same cell preparation was used to conduct replicate experiments using different microwell chambers. Comparison of independent estimates of variance revealed significant variation among wells (as mentioned above) and among fields within wells. A 3-level nested ANOVA of area corrected counts (Table IIIB) demonstrated a significant variation among chambers. The grand means of these chambers (250.7 vs. 220.0) were also shown to be different (P < 0.001) using the t-test.
(5) Replicate experiments conducted on different days. The data presented in Tables I and II are identical experiments conducted on different days, and of course with different cell preparations. Examination of the partitioning of the components of variance using 2-level nested ANOVA demonstrated that 96-98% of the variation in these experiments was due to experimental error (among subgroups within groups plus within subgroups) and 2-4% was due to variation among groups. This was to be expected since all groups were treated the same. A 3-level nested ANOVA of area corrected counts (Table IIIC) comparing assay M1 (Table IA) with assay M2 (Table II) demonstrated a significant variation among days. The large (85%) component of the variation among days was also apparent in comparing the grand means for the 2 assays (250.7 vs. 150.0) which were found to be different (P < 0.001) by t-test.

TABLE I
COMPARISON OF REPLICATE COUNTS OF THE SAME ASSAY
Migration response was measured in an assay in which all the groups were treated with an identical concentration of EAMS. Separate counts of the filter were made on the dates indicated. Both raw count and area corrected count data were collected and the results of a 2-level nested ANOVA are recorded here. Grand means and SEM for each counting mode and counting time were calculated and are also recorded below.

(A) Assay M1: date counted: 9/9/83 (Groups = 8, Wells = 4, Fields = 6)
                                   Raw count data, Mean = 135.8 (0.99)           Area corrected count data, Mean = 150.0 (1.23)
Source                       DF    MS     F      P-value    % of variation       MS     F      P-value    % of variation
Among groups                   7   935    3.69   0.008      15                   698    1.18   0.352       2
Among wells within groups     24   254    1.76   0.022      10                   593    2.58   < 0.001    20
Within wells (error)         160   144                      75                   230                      78

(B) Assay M1: date counted: 11/15/83 (Groups = 8, Wells = 4, Fields = 6)
                                   Raw count data, Mean = 124.0 (1.4)            Area corrected count data, Mean = 138.5 (1.6)
Source                       DF    MS     F      P-value    % of variation       MS     F      P-value    % of variation
Among groups                   7   994    1.87   0.027       5                   705    0.75   0.986       2
Among wells within groups     24   530    1.63   0.009       9                   938    2.09   0.001      15
Within wells (error)         160   325                      86                   448                      83

TABLE II
COMPARISON OF ASSAYS CONDUCTED IN REPLICATE CHAMBERS
Migration response was measured in replicate assays conducted on the same day using different chambers. Area corrected counts were collected and the results of a 2-level nested ANOVA for each chamber are recorded here. Grand means and SEM for each chamber were also calculated and are recorded below.

(A) Assay M2: chamber no. 4 (Groups = 8, Wells = 4, Fields = 6), Mean = 250.7 (2.75)
Source                       DF    MS      F      P-value    % of variation
Among groups                   7   4866    1.38   0.258       4
Among wells within groups     24   3526    3.53   < 0.001    28
Within wells (error)         160    999                      68

(B) Assay M3: chamber no. 1 (Groups = 8, Wells = 4, Fields = 6), Mean = 220.0 (3.02)
Source                       DF    MS      F      P-value    % of variation
Among groups                   7   5474    1.44   0.237       4
Among wells within groups     24   3812    2.99   < 0.001    24
Within wells (error)         160   1276                      72

TABLE III
THREE-LEVEL NESTED ANOVA OF REPLICATE CHEMOTAXIS ASSAYS
Three-level nested ANOVA were conducted using area corrected count data from experiments found in Tables I and II. Replication was either by time counted, chamber used or day conducted. As previously mentioned, all groups in these assays were treated with the same concentration of EAMS.

(A) Different counting times: same assay. Assay M1 (9/19/83) vs. Assay M1 (11/15/83)
Source                       DF    MS       F       P-value    % of variation
Among counting times           1   13067    18.6    < 0.001    14
Among groups                  14     702     0.9    N.S.        0
Among wells within groups     48     765     2.3    0.001      15
Within wells (error)         320     339                       71

(B) Different chambers: same day and same cells. Assay M2 vs. Assay M3
Source                       DF    MS       F       P-value    % of variation
Among chambers                 1   89792    17.4    < 0.001    21
Among groups                  14    5170     1.4    N.S.        3
Among wells within groups     48    3669     3.2    < 0.001    21
Within wells (error)         320    1137                       55

(C) Different day and different cells: same chamber. Assay M1 vs. M2
Source                       DF    MS        F        P-value    % of variation
Among days/cells               1   970126    348.0    < 0.001    85
Among groups                  14     2782      1.4    N.S.        1
Among wells within groups     48     2059      3.2    < 0.001     4
Within wells (error)         320      614                        10
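The relationship between the 2-level analyses of Table I and the lower levels of the 3-level analysis in Table IIIA can be checked numerically. In a balanced design the sums of squares and degrees of freedom of the two assays simply add, so each pooled mean square is a df-weighted average. The Python sketch below illustrates this under that assumption; the function name and data layout are our own, and the mean squares are the area corrected values from Table I.

```python
def pool_mean_squares(assays):
    """Pool matching levels of several 2-level nested ANOVA tables.

    Each assay is a dict with 'df' and 'ms' tuples ordered as
    (among groups, among wells within groups, within wells).
    Returns a list of (pooled df, pooled MS), one entry per level.
    """
    pooled = []
    for level in range(3):
        ss = sum(a["ms"][level] * a["df"][level] for a in assays)
        df = sum(a["df"][level] for a in assays)
        pooled.append((df, ss / df))
    return pooled


# Area corrected count data from Table I (A) and (B).
table_1a = {"df": (7, 24, 160), "ms": (698, 593, 230)}
table_1b = {"df": (7, 24, 160), "ms": (705, 938, 448)}

pooled = pool_mean_squares([table_1a, table_1b])
```

The pooled degrees of freedom are 14, 48 and 320 and the pooled mean squares are 701.5, 765.5 and 339.0, which agree, to within rounding, with the among groups, among wells and within wells rows of Table IIIA. The top level (among counting times) is not reproduced by this sketch because it requires the assay totals rather than the lower-level mean squares.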
Experiments with different treatments
Conducting experiments with all groups treated the same is useful in examining the inherent variability of the assay system and in developing strategies for optimizing the experimental design. However, the utility of the ANOVA as a means of distinguishing true biological differences from those due to experimental error is most evident when applied to 'typical' experiments conducted as part of standard laboratory research. The following are examples of such experiments routinely performed in our laboratories.
The first example was an experiment designed to test whether ammonium sulfate precipitation was an effective means of precipitating BDCF (bone-derived chemotactic factor) from crude extracts. Briefly, crude extracts of BDCF obtained by 0.5 M EDTA extraction were brought to 85% saturation with ammonium sulfate and a precipitate and supernatant fraction were prepared. These were dialyzed against 3 mM Hepes buffer and diluted with Gey's/BSA to the desired concentration. The original BDCF preparations (BE-3), the ammonium sulfate precipitate fraction (BE-3P) and the supernatant fraction (BE-3S) were tested at the original concentrations and at a 1:5 dilution for chemotactic activity. Replicate assays (designated 224.1 and 224.3) were conducted in 2 separate chambers with 4 wells per treatment. Ten fields were counted per well and the group mean chemotactic responses for cells migrated per 20× field (area corrected count) are recorded in Table IVA. The means for assay 224.1 are larger than those in assay 224.3. Comparisons of the various treatment means to the negative control (Gey's/BSA) suggest relative differences in response between the 2 assays. This is most obvious in the BE-3S groups where the results of assay 224.1 indicate that this fraction, the ammonium sulfate supernatant, has activity, while the results of assay 224.3 suggest that there is little or no activity in this fraction (see mean comparisons by t-test, Table IVA). The ANOVA of these assays is recorded in Table IVB. In assay 224.1, 49% of the variation is due to components associated with error while 51% is due to treatment differences. Assay 224.3, in contrast, has only 23% of the variation associated with experimental error and 77% due to treatment effects. Thus, on the basis of ANOVA, the differences measured in assay 224.3 are likely to be a more accurate estimate of the true biological variation than those in assay 224.1.
Another example of a routine protocol used in many laboratories is the dose-response experiment. Dose-response data are a useful analytical tool with which to characterize chemotactic factors as well as the chemotactic responsiveness of various responder cell types. Table V presents the results of an experiment to test the efficacy and dose responsiveness of FMLP, a tripeptide known to be chemotactic for neutrophils and monocytes, as a chemoattractant for mouse peritoneal macrophages. In this experiment 4 concentrations of FMLP were tested along with a negative (Gey's/BSA) and a positive (EAMS) control. Three replicate assays were conducted with the same cell preparation in different microchambers using 4 wells per treatment group. Twelve fields were counted per well and the group mean chemotactic responses for cells migrated per 20× field (area corrected count) are recorded in Table VA and graphically illustrated in Fig. 1. The results are obviously disparate. Which assay is the most likely to be reflecting real biological differences? It is possible to answer this question if one compares the ANOVA data reported in Table VB. Here it is clear that assay 212.3 is the most reliable since 74% of the variation is among treatments and only 26% is associated with experimental error. This is in contrast to assays 212.1 and 212.2, where substantially less of the variation is associated with treatments (38% and 17%, respectively) and a greater proportion of the variation (62% and 83%, respectively) is due to those components associated with experimental error.

TABLE IV
FRACTIONATION OF BDCF WITH AMMONIUM SULFATE
Cell migration in response to 2 concentrations of each of 3 fractions of BDCF obtained was determined in duplicate assays conducted in different chambers on the same day with the same cell preparation. Area corrected counts were collected and the group means and 2-level nested ANOVA are recorded here.

(A) Mean cell migration data: data are recorded as the number of cells migrated/20× field, mean (SEM)
Treatment        Assay 224.1        Assay 224.3
Gey's/BSA        157.3 (4.8)         95.4 (3.0)
BE-3             235.3 (6.7) a      156.7 (3.2) a
BE-3 (1:5)       177.4 (3.2) a      134.7 (4.1) a
BE-3P            291.6 (4.8) a      236.2 (4.3) a
BE-3P (1:5)      251.0 (5.6) a      171.2 (4.6) a
BE-3S            246.5 (6.2) a      116.8 (3.7)
BE-3S (1:5)      209.5 (5.1) a      100.5 (3.2)
a Significantly different than Gey's/BSA, P < 0.001.

(B) Two-level nested ANOVA: data for Gey's/BSA group was not used for the ANOVA (Groups = 6, Wells = 4, Fields = 10)
                                   Assay 224.1                                   Assay 224.3
Source                       DF    MS       F      P-value    % of variation     MS       F       P-value    % of variation
Among groups                   5   60529    9.6    < 0.001    51                 93362    30.5    < 0.001    77
Among wells within groups     18    6286    8.6    < 0.001    21                  3061     7.7    < 0.001     9
Within wells (error)         216     732                      28                   399                       14

TABLE V
DOSE RELATED RESPONSE TO FMLP
Cell migration in response to graded doses of FMLP was determined in triplicate assays conducted in different chambers on the same day with the same cell preparation. The concentration of undiluted FMLP was 0.1 mM. Area corrected counts were collected and the group means and 2-level nested ANOVA are recorded here. These data are also graphically represented in Fig. 1.

(A) Mean cell migration data: data are recorded as the number of cells migrated/20× field, mean (SEM)
Treatment          Assay 212.1      Assay 212.2      Assay 212.3
Gey's/BSA           36.2 (1.9)       38.8 (1.9)       31.2 (1.7)
EAMS               177.2 (6.8)      228.3 (7.4)      251.9 (6.7)
FMLP undiluted      83.2 (3.4)      100.9 (4.6)      126.9 (4.6)
FMLP 1:2            52.9 (2.9)       82.2 (3.6)       70.3 (2.5)
FMLP 1:4            57.0 (2.6)       77.2 (3.3)       55.9 (2.3)
FMLP 1:8            44.3 (2.6)       66.0 (3.0)       49.4 (2.0)

(B) Two-level nested ANOVA of each assay: ANOVA was conducted on groups treated with FMLP only (Groups = 4, Wells = 4, Fields = 12)
                                   Assay 212.1                         Assay 212.2                         Assay 212.3
Source                       DF    MS       F          % of variation  MS       F         % of variation   MS       F         % of variation
Among groups                   3   13474    11.81 a    38              10149    3.32      17               59688    99.8 a    74
Among wells within groups     12    1141     3.23 a    10               3057    6.19 a    25                 598     1.4       1
Within wells (error)         176     353               52                494              58                 429              25
a Variation significant as determined by F-test, P ≤ 0.001 (others not significant, P > 0.05).

Fig. 1. Cell migration in response to graded doses of FMLP, plotted against the relative concentration of FMLP for assays 212.1, 212.2 and 212.3. The data illustrated in this figure are those recorded in Table V. Data points represent means with ± SEM as error bars.
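The selection among replicate assays described above can be sketched as follows. The sketch estimates the variance components of a balanced 2-level nested ANOVA from its mean squares, expresses them as percentages, and picks the assay with the largest treatment percentage. It is an illustration rather than the software used in this study: the function name is our own, setting negative component estimates to zero is an assumption, and the values of 4 wells per group and 12 fields per well are taken from the description of the dose-response experiment in the text.

```python
def percent_of_variation(ms_groups, ms_wells, ms_within, wells, fields):
    """Variance components of a balanced 2-level nested ANOVA, in percent.

    Returns (among groups, among wells within groups, within wells).
    Negative component estimates are set to zero (an assumption).
    """
    var_within = float(ms_within)
    var_wells = max(0.0, (ms_wells - ms_within) / fields)
    var_groups = max(0.0, (ms_groups - ms_wells) / (wells * fields))
    total = var_groups + var_wells + var_within
    return tuple(100.0 * v / total for v in (var_groups, var_wells, var_within))


# Mean squares (groups, wells, within) for the three replicate
# dose-response assays of Table V.
assays = {
    "212.1": (13474, 1141, 353),
    "212.2": (10149, 3057, 494),
    "212.3": (59688, 598, 429),
}

percentages = {
    name: percent_of_variation(*ms, wells=4, fields=12)
    for name, ms in assays.items()
}

# Choose the assay with the highest percentage of variation among treatments.
best = max(percentages, key=lambda name: percentages[name][0])
```

With these inputs the treatment percentages come out at about 38%, 17% and 74% for assays 212.1, 212.2 and 212.3, so the sketch selects assay 212.3, in agreement with the text.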
Discussion
The assay of chemotaxis using the Boyden transfilter method has become widely used in recent years for assessing migratory responses of a wide variety of cell types. The development of the 48-well micro chemotaxis chamber (Falk et al., 1980) has improved the efficiency of conducting such assays by minimizing manipulation time, numbers of cells and amounts of chemoattractants required. The application of image analysis to the procedure has further improved the efficiency by providing a rapid means of counting the migrated cells. In the course of our experiments we have frequently encountered apparent inconsistencies among our experiments, as well as between our results and those of other laboratories. We therefore embarked upon a series of experiments designed to analyze the components of variance in the assay method, to optimize the experimental design, and to develop objective statistical criteria for choosing among experiments with disparate results. The standard design of a chemotaxis experiment seemed to lend itself to a 2-level nested analysis of variance (ANOVA). Variation in the first level or 'among groups' would be associated with treatment effects (chemotactic agents). Variation in the second level or 'among subgroups within groups' (wells used per treatment) would be associated with well effects. Variation in the error term or 'among observations within wells' (fields counted per well) would be associated with field effects and residual experimental error. As a further aid in comparing the relative magnitudes of the variance components in the nested ANOVA (designated in Tables I-IV as '% of variation') we utilized the components of the expected mean squares following procedures outlined by Sokal and Rohlf (1969). In order to examine the inherent variability of the chemotaxis assay we conducted experiments in which all groups received the same treatment (EAMS). Eight sections of wells were arbitrarily assigned as treatment groups.
In this design, with no difference expected due to treatment, variation among groups due to treatment was theoretically set at zero and all variation in the experiment would be expected to be found in the error term. The results of such experiments (Tables I-III) yielded much useful information regarding the significant sources of error in the multiwell chamber chemotaxis assay. One of the first things which we wished to determine was whether the raw count mode or area corrected count mode of the image analyzer was the more suitable for analysis of chemotactic responsiveness. An examination of the raw count data (Table I) revealed that, despite the fact that all groups were treated the same, there was significant variation among groups. This was not the case with the area corrected counts and thus it appears that there is significant error associated with the counting mode of the image analyzer which is not apparent when the area mode is
utilized and a conversion is made to area corrected counts using a calibration factor. This error is probably due to the presence of closely packed groups of cells which the analyzer cannot count accurately. This point is made in the documentation provided with the image analyzer and is clearly an important factor to consider in these chemotaxis experiments. Thus we conclude that area corrected counts are the most reliable for use in these assays. A consistent finding in all experiments with groups treated the same was a significant variation among wells. Such a result was unexpected in these experiments and indicated an important systematic source of error. When considered in the context of the entire assay procedure, significant variation among wells could represent irregularities in the multiwell chamber/filter assembly, errors in sample pipetting or error associated with the counting procedure including the microscope, image analyzer and/or their operation. Significant variation observed among repeated counts of the same assay (Table IIIA) indicates that the counting procedure is a significant source of error and, although it is probably the major factor influencing within well variation, it may also account for some of the variation among wells. The same may also be said for variance among chambers. When variance among chambers is tested using a 3-level nested ANOVA (Table IIIB) there is still significant variation among wells. One possible explanation that we have not ruled out is that significant variation among wells is due to the fact that the wells were not assigned randomly to treatment groups. This was not done because of the practical difficulties that random assignment of wells would introduce in the counting procedure. Nonetheless, it is reasonable to conclude that, aside from chamber associated errors, the variation among wells is likely to be due to errors associated with the counting procedure.
Another concern regarding the design and conduct of the chemotaxis assay is the question of pooling experimental data. Frequently, a series of experiments is conducted such that it is useful to compare experiments conducted at different times. Also, for certain kinds of experiments, such as time course studies, it is impractical to use a single multiwell chamber and multiple chambers are the only practical answer. In order to address this question we examined assays where different chambers were used with the same cell preparation or the same chamber was used on different days. As before, all groups were treated the same to eliminate (or minimize) among treatment variation. The 3-level nested ANOVA used to evaluate these experiments demonstrated that there was a significant variation among chambers used with the same cell preparation (Table IIIB). A similar difference among chambers was observed when triplicate dose-response assays were conducted on the same day (Table V). This chamber effect indicates that it is not advisable to pool data from assays conducted in different chambers. Results from the experiment using the same chamber on different days demonstrate significant variation among days (Table IIIC). It is possible that this difference could be due to the fact that a different cell preparation was used, but nonetheless, the results indicate that pooling of experiments conducted on different days, even in the same chamber, may introduce significant error. When these results are taken together with those mentioned above for different chambers, it is clear that pooling of experimental data from microwell chamber chemotaxis assays is not advisable. It is possible to avoid this problem by designing assays so that all important comparisons can be made within an individual chamber. It should be noted that the experiments discussed above have been biased by treating all groups the same. These conditions provide a very sensitive test for false positives and in so doing amplify potential sources of error. However, they do indicate what precautions must be taken in the design of experiments so that true biological differences may be discerned from differences due to experimental error or chance. Aside from providing an objective means of identifying sources of variation in the multiwell chamber chemotaxis assay, the ANOVA can also be applied as an aid in the selection of the most biologically informative experiment from replicate experiments with disparate results. To illustrate this point we have included 2 kinds of experiments which are routinely employed in our laboratory. These are a test of ammonium sulfate fractionation on BDCF activity replicated twice in different chambers with the same cell preparation (Table IV), and a dose-response assay replicated 3 times using different chambers (Table V). The experiment whose data are recorded in Table IV is an example of a frequent finding in our laboratory as we have attempted to isolate and purify BDCF from bones. It is clear from an inspection of the data that the fractionation results in a concentration of activity in the precipitate (BE-3P). The disparity in the results between the 2 replicates is whether or not there is activity in the supernatant (BE-3S). The ANOVA (Table IVB) demonstrates that assay 224.3 has the least variation due to experimental error (23% of variation as compared to 49% in assay 224.1) and is thus the most reliable for interpreting treatment differences.
We can conclude, with a reasonable degree of confidence, that there is little or no activity in the supernatant and proceed to design further experiments to pursue these findings. This then is the value of the ANOVA for these types of experiments, since without this information it is virtually impossible to objectively choose among the assays with disparate results. It is equally difficult to objectively determine how many additional times the assay must be repeated before a subjective choice can be made among such disparate results. Another routine type of experiment conducted in our laboratory is the dose-response assay. The disparate results reported in Table V indicate a problem that most likely arises both within and among laboratories studying phenomena of chemotaxis. Since it is known that monocytes and neutrophils respond to FMLP, investigator bias would lead to the selection of assay 212.3 and the rejection of the others. Alternatively, a prudent investigator might decide to repeat the experiments and 'hope' for replicates which were in agreement. The ANOVA, however, provides an objective and relatively unbiased means of selecting from among the 3 assays. The criteria for such a selection must distinguish relevant biological differences from those due to experimental error or chance. The ANOVA enables the investigator to design an experiment in which the variation due to biologic differences (i.e., variance among treatments) can be distinguished from variation due to experimental error (i.e., among well variance and within well variance) and thus provides a
criterion for making objective assessments of real biological differences. The assay which is most likely to be biologically relevant is the assay in which the highest percentage of variation is associated with treatments and, conversely, the least percentage of variation is associated with experimental error. In the experiments presented in Table V it is clear that assay 212.3 meets the criterion for the highest percentage of treatment variation and the lowest percentage of error variation, and thus it is this assay which is most likely to represent the real biologic effects of FMLP on macrophage chemotaxis.

Another use of the ANOVA for evaluating results of multiwell chamber chemotaxis experiments relates to the comparison of treatments. The standard approach for making treatment comparisons using ANOVA is to first test for significance among treatments using the F-test. If there is significant variation due to treatment, it is then appropriate to compare individual treatments using the t-test. The standard t-test for mean comparisons involves calculating a found 't', by dividing the mean difference by the standard error of the mean difference, and comparing this to the tabulated 't' from the t-distribution at a predetermined alpha level of probability. The standard error term is derived from the variance of the observations about their mean. Using the ANOVA it is possible to derive a standard error term from the variance among wells which represents the average error of all observations in the ANOVA and is thus a more conservative, or stringent, test of mean differences. Since we have demonstrated that there is significant inherent variability in the multiwell chamber chemotaxis assay and thus an increased likelihood for false positive results, it seemed that a more conservative t-test would be more likely to reject such false positives. An example of its use is presented in Table VI.
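The two t ratios described above can be recomputed as a rough check. The sketch below is illustrative only: it uses the group means and SEMs tabulated in Table VI and the composite standard error of 7.823 reported for assay M1. Two things are assumptions rather than statements from the paper: that the standard test combines the two group SEMs in quadrature, and that the reported composite value is applied directly as the standard error of the mean difference. The function names are hypothetical.

```python
import math

# Group means and individual SEMs transcribed from Table VI (assay M1).
MEANS = {1: 123.6, 2: 132.9, 3: 114.6, 4: 123.4,
         5: 119.4, 6: 118.7, 7: 120.0, 8: 120.3}
SEMS = {1: 3.9, 2: 4.4, 3: 3.9, 4: 4.2,
        5: 5.6, 6: 4.8, 7: 4.7, 8: 4.4}

REFERENCE_GROUP = 2   # largest mean; every other group is compared with it
COMPOSITE_SE = 7.823  # standard error reported for the ANOVA MS t-test


def standard_t(group, ref=REFERENCE_GROUP):
    """t ratio based on the two groups' own SEMs.

    ASSUMPTION: the standard error of the difference is the two SEMs
    combined in quadrature (no pooling of variances).
    """
    se_diff = math.sqrt(SEMS[ref] ** 2 + SEMS[group] ** 2)
    return (MEANS[ref] - MEANS[group]) / se_diff


def anova_ms_t(group, ref=REFERENCE_GROUP, composite_se=COMPOSITE_SE):
    """t ratio based on the composite standard error from the ANOVA.

    ASSUMPTION: the reported value is used directly as the standard
    error of the mean difference.
    """
    return (MEANS[ref] - MEANS[group]) / composite_se


if __name__ == "__main__":
    for group in sorted(MEANS):
        if group == REFERENCE_GROUP:
            continue
        print(f"group {group}: standard t = {standard_t(group):.2f}, "
              f"ANOVA MS t = {anova_ms_t(group):.2f}")
```

Under these assumptions the composite standard error is larger than every pairwise standard error, so each ANOVA MS t ratio is smaller than its standard counterpart, which is the sense in which the test is more conservative. Converting the ratios to probability levels additionally requires the degrees of freedom for each test, which are not restated in this section.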
The data here are from experiment M1 (see Table I) where all groups are treated the same.

TABLE VI
t-TEST FOR MEAN COMPARISONS USING A COMPOSITE VARIANCE DERIVED FROM THE ANOVA
The means used here are derived from Assay M1 (Table IB) in which all groups were treated the same. Comparisons are made between the largest mean (group no. 2) and each of the others and the mean differences are tested by either a standard t-test using the individual group SEM (listed in the table below) or by a t-test using a composite SEM derived from the among well variance in the ANOVA (ANOVA MS). Data recorded in the table are levels of significance for the mean comparisons using each t-test.

Group no.   Mean (SEM)     Standard t-test   ANOVA MS t-test
2           132.9 (4.4)    -                 -
1           123.6 (3.9)    N.S. a            N.S.
3           114.6 (3.9)    0.003             ?
4           123.4 (4.2)    N.S.              N.S.
5           119.4 (5.6)    ? b               N.S.
6           118.7 (4.8)    0.031             N.S.
7           120.0 (4.7)    0.048             N.S.
8           120.3 (4.4)    ?                 N.S.

a N.S. = no significance (P > 0.10).
b ? = Questionable significance (0.05 < P < 0.10).
The means of the 8 groups are presented and the largest response (group 2) is arbitrarily compared to the others by either standard t-test or ANOVA MS t-test. The standard error of the mean (SEM) for the ANOVA MS test is 7.823. The ANOVA of this assay (Table I) indicated that there was no significant variance among groups, and this was confirmed by the ANOVA MS t-test. On the other hand, the standard t-test suggests that at least 3 of the groups (3, 6 and 7) were significantly different from group 2. Thus the ANOVA MS t-test, because it is more conservative than the standard t-test, appears to be a useful means of reducing the probability of obtaining false positive results when comparing treatment means in multiwell chamber chemotaxis assays.

In summary, we have conducted a series of experiments to analyze the components of variability associated with the multiwell chamber chemotaxis assays. Our experiences suggest that the assay is a rapid and efficient means for evaluation of cell migration. However, in order to increase the probability of obtaining consistently accurate and reliable results, and to provide a standard for comparisons of data among different laboratories, we make the following recommendations:

(1) We recommend the use of ANOVA in these assays in order to distinguish differences due to biological effects from those due to experimental error.

(2) We have observed significant error when using the count mode of the image analyzer and thus recommend using area-corrected counts for these assays.

(3) We have observed significant among well variation in most assays and attribute this error to chamber/filter assembly differences and the counting procedures. Since the chamber/filter assembly and image analyzer are fixed sources of error (assuming proper operation and maintenance), extra caution should be taken in filter staining, and microscopic and image analyzer adjustment and calibration. Pipetting errors can also be a source of variation among wells and should be minimized by frequent calibration of pipetting devices and operators.

(4) Since there is significant variation among different chambers and among assays conducted at different times, it is recommended that assays be designed so that all relevant comparisons are made within the same chamber at the same time and with the same cell preparation.

(5) It does not appear to be advisable to pool data from different assays. However, if it is necessary to do so, appropriate tests for homoscedasticity should be employed. A simple test is to compare treatment mean squares from ANOVA of experiments to be pooled using the F-test. If there is no significance among the mean squares then it may be justified to pool the data.

(6) When faced with the need to select from among assays with disparate results, it is recommended to always choose the assay which has the highest percentage of variation associated with treatments and the lowest percentage associated with experimental error.

(7) When comparing treatments, it is necessary to first demonstrate significant variance among treatments in the ANOVA. Having demonstrated significance it is appropriate to use the t-test; however, we recommend using a standard error term derived from the ANOVA MS for 'among wells' in order to reduce the probability of false positives (i.e., claiming differences where there are no differences).

(8) We have not as yet applied the methods used in this study to the analysis of components of variance using the single well Boyden chamber and we are unaware of any such published report in the literature. However, our experience with these chambers indicates that the results would be similar to those reported here for the multiwell chamber. Thus we recommend that the same precautions cited above be taken in conducting chemotaxis assays using these chambers until a proper analysis of the components of variation is made available in the literature.
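Recommendations (1) and (6) can be sketched in code. The example below is a minimal illustration, not the software used in this study: it assumes a balanced 2-level nested design (treatments, wells within treatments, replicate counts within wells), estimates variance components from the mean squares in the usual way for such a design, and treats among-well plus within-well variance as experimental error, as described in the text. The function names, assay labels and counts are all hypothetical and were invented purely to exercise the code.

```python
from statistics import mean


def nested_anova(data):
    """Balanced 2-level nested ANOVA.

    data[i][j][k] is replicate count k in well j of treatment i.
    ASSUMPTION: every treatment has the same number of wells (>= 2) and
    every well the same number of counts (>= 2), with non-zero variation.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for t in data for w in t for x in w)
    treat_means = [mean(x for w in t for x in w) for t in data]
    well_means = [[mean(w) for w in t] for t in data]

    ss_treat = n * b * sum((tm - grand) ** 2 for tm in treat_means)
    ss_wells = n * sum((wm - tm) ** 2
                       for tm, wms in zip(treat_means, well_means)
                       for wm in wms)
    ss_within = sum((x - wm) ** 2
                    for t, wms in zip(data, well_means)
                    for w, wm in zip(t, wms)
                    for x in w)

    ms_treat = ss_treat / (a - 1)
    ms_wells = ss_wells / (a * (b - 1))
    ms_within = ss_within / (a * b * (n - 1))

    # Variance components; negative estimates are set to zero.
    var_within = ms_within
    var_wells = max((ms_wells - ms_within) / n, 0.0)
    var_treat = max((ms_treat - ms_wells) / (n * b), 0.0)
    total = var_treat + var_wells + var_within

    return {
        "F_treat": ms_treat / ms_wells,    # treatments tested over wells
        "F_wells": ms_wells / ms_within,   # wells tested over replicate counts
        "pct_treat": 100 * var_treat / total,
        "pct_error": 100 * (var_wells + var_within) / total,
    }


def choose_assay(assays):
    """Return the name of the assay with the lowest percentage of error."""
    return min(assays, key=lambda name: nested_anova(assays[name])["pct_error"])


# Hypothetical counts, invented for illustration only:
# 2 treatments x 2 wells x 2 replicate counts per assay.
ASSAYS = {
    "assay_A": [[[10, 12], [11, 13]], [[30, 32], [31, 33]]],
    "assay_B": [[[10, 30], [20, 40]], [[15, 35], [25, 45]]],
}

if __name__ == "__main__":
    for name, counts in ASSAYS.items():
        result = nested_anova(counts)
        print(name, {key: round(value, 2) for key, value in result.items()})
    print("selected:", choose_assay(ASSAYS))
```

In this invented example nearly all of the variation in assay_A is associated with treatments, whereas in assay_B it is entirely associated with error, so the selection rule picks assay_A. The F ratios still have to be compared with tabulated critical values for the appropriate degrees of freedom before any difference is called significant.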
Acknowledgements
This research was supported in part by funds from USPHS-NIDR research Grant DE-04961. The authors gratefully acknowledge the following individuals: E.J. Leonard, R.H. Goodwin and A. Skeel for advice and guidance with the multiwell chamber assay methodology; C. Rohde for statistical consultation; and P.D. Rousseau, L.K. Minkin and K.L.Y.S. Minkin for support and encouragement.
References
Boumsell, L. and M.S. Meltzer, 1975, J. Immunol. 115, 1746.
Falk, W., R.H. Goodwin, Jr. and E.J. Leonard, 1980, J. Immunol. Methods 33, 239.
Minkin, C., R. Posek and J. Newbrey, 1981, Metab. Bone Dis. Rel. Res. 2, 363.
Minkin, C., D.J. Bannon and S. Pokress, 1985, Calcif. Tissue Int. 37, 63.
Sokal, R.R. and F.J. Rohlf, 1969, Biometry: The Principles and Practice of Statistics in Biological Research (Freeman, San Francisco, CA).