Statistical significance in comparative ethological experiments


Applied Animal Behaviour Science, 16 (1986) 303-308


Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands

Commentary

Statistical Significance in Comparative Ethological Experiments

J.A. HOEKSTRA and J. JANSEN

Institute of Applied Computer Science TNO, P.O. Box 100, 6700 AC Wageningen (The Netherlands) (Accepted for publication 30 September 1986)

ABSTRACT

Hoekstra, J.A. and Jansen, J., 1986. Statistical significance in comparative ethological experiments. Appl. Anim. Behav. Sci., 16: 303-308.

This paper is concerned with the statistical analysis of data from comparative experiments in applied animal behaviour research. It is emphasized that the statistical analysis of experimental results should be in accordance with the design of the experiment. An example is given to illustrate this.

INTRODUCTION

The use of statistical methods is common practice in applied ethological research. In many papers in Applied Animal Behaviour Science (AABS), authors present P-values to show the strength of evidence for their conclusions. After reading 50 of these papers in the AABS issues of 1984 and 1985, we found that in about 25 cases statistical methods were used incorrectly. The main defect was that the observations entered into test statistics were not independent. In a number of cases it was totally unclear how the authors made their computations. P-values calculated incorrectly lend a pretence of objectivity to possibly unjustified conclusions. It is therefore very important that ethologists become more aware that the design of an experiment can imply dependency between observations, and that the statistical analysis should take account of this. This will be explained for two situations occurring regularly in applied animal behaviour research: the situation in which animals are housed in groups, and the situation in which the same animal is observed repeatedly to study the development of its behaviour. A more general treatment of the subject can be found in Cochran and Cox (1957) and Cox (1958).

© 1986 Elsevier Science Publishers B.V.

ANIMALS HOUSED IN GROUPS

The aim of many studies in applied behaviour research is to compare different treatments, e.g. housing systems, with respect to some relevant behaviour. Testing a difference between two treatments amounts to answering the question whether there is sufficient evidence that the observed difference is due to the treatments and not the result of accidental differences between the animals or groups of animals that received the treatments. Any statistical test, including the classical t-test but also the more popular rank tests, assumes that if there are no treatment effects every ranking of the observations will be equally likely. To maintain this property, the outcome of one observation should be independent of the outcome of another observation. This is not guaranteed for observations on animals housed in groups. For example, the observed behaviour may be correlated with the rank order of the animals within each group (e.g. dominant animals perform the behaviour more frequently than subordinate ones). Hence if the data of all individual animals involved in the study are ordered, a within-group pattern will arise even in the absence of treatment effects. The same type of argument holds for behaviour that involves synchronization within groups. Only for observations on animals of different groups is every ordering equally likely.

A valid procedure for testing the effect of treatments on the behaviour of group-housed animals may be obtained by summarizing the data of all animals of one group in one relevant quantity and entering these quantities, one for each group, into the test statistic. Of course, there can be more than one quantity of interest. Either these should be tested separately, or a multivariate procedure should be used that allows for correlation between the quantities observed in one group. The basic units to which treatments are allocated (at random!) and which form the material for the statistical test are called experimental units.

A consequence of regarding groups instead of animals as experimental units is that the number of degrees of freedom for testing is greatly reduced. Sometimes a mere case-study remains. The following example illustrates the importance of correctly identifying the experimental units.
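The procedure described above (one summary value per group, then a test on those values) can be sketched as follows. This is a minimal illustration only: the scores, group labels and the choice of a pooled two-sample t statistic are assumptions made for the sketch, not data or methods taken from any study.

```python
from math import sqrt

# Hypothetical observations: (treatment, group id, scores of the animals in that group).
observations = [
    ("crate", "g1", [4, 6, 5]),
    ("crate", "g2", [7, 9, 8]),
    ("crate", "g3", [6, 6, 6]),
    ("pen", "g4", [10, 12, 11]),
    ("pen", "g5", [9, 9, 9]),
    ("pen", "g6", [13, 12, 14]),
]


def group_means(obs):
    """Collapse every group to one number: the mean over its animals."""
    by_treatment = {}
    for treatment, _group, scores in obs:
        by_treatment.setdefault(treatment, []).append(sum(scores) / len(scores))
    return by_treatment


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    df = len(a) + len(b) - 2
    se = sqrt(ss / df * (1 / len(a) + 1 / len(b)))
    return (mean_b - mean_a) / se, df


means = group_means(observations)
t_stat, df = pooled_t(means["crate"], means["pen"])
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

With 18 animals in 6 groups, the test is based on 4 degrees of freedom (groups as units) rather than the 16 that would wrongly result from treating animals as units.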

Example

The example involves an experiment set up to compare two housing systems with respect to the behaviour of piglets (Schouten, 1986). One housing type was a large straw pen, the other a farrowing crate. The variable discussed in this example is the time spent by the piglets on sucking relative to the total amount of time they are active. The data were obtained in the 6th week after birth and just before weaning.


TABLE I

Analysis of variance on sucking of piglets in the 6th week after birth

                     Using piglets as units           Using litters as units
Source of variation  Df   Sum of squares   F-test     Df   Sum of squares   F-test
Sister-pair           3        384.2                   3         48.0
Housing system        1        893.3       5.140*      1        111.7       2.424
Residual             59      10253.7                   3        138.2
Total                63                                7

In the experiment there were 8 sows, falling into 4 sister-pairs. Within each sister-pair, the pigs were assigned to the housing types at random. All litters consisted of 8 animals, so that there were 64 piglets in the entire experiment. Full details of the experiment can be found in Schouten (1986). Table I gives the analysis of variance for the situation in which individual piglets are incorrectly considered as experimental units. The total number of observations is 64, so that after correction for differences between litters from different sister-pairs and housing systems, 59 degrees of freedom for error remain. From this analysis one would conclude that housing systems significantly affect sucking (F(1,59) = 5.140; P = 0.027). However, the litters, not the piglets, are the experimental units, because sow-litter combinations were assigned to housing types. A simple way to analyse the data accordingly is to calculate the mean sucking time per litter and to carry out an analysis of variance on these 8 numbers. The analysis should also reflect the fact that the allocation to treatments took account of sister-pairs (i.e. one sister of each pair to each housing type). This is done by including "sister-pair" as a source of variation. (This is exactly equivalent to a paired t-test. The analysis of variance is presented here for comparison with the earlier analysis.) The results of this analysis are also shown in Table I. Only 3 degrees of freedom for error estimation are available. There is no conclusive evidence for the existence of a housing effect on sucking (F(1,3) = 2.424; P = 0.217).
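The two F-ratios can be recomputed from the sums of squares and degrees of freedom printed in Table I, as in the sketch below. The function names are illustrative. The P-value helper covers only the litter-level test: it relies on the fact that an F statistic with (1, 3) degrees of freedom is the square of a t statistic with 3 degrees of freedom, and uses the standard closed-form Student t distribution function for 3 degrees of freedom.

```python
from math import atan, pi, sqrt


def f_ratio(ss_effect, df_effect, ss_resid, df_resid):
    """F statistic: mean square of the effect over the residual mean square."""
    return (ss_effect / df_effect) / (ss_resid / df_resid)


def p_value_f_1_3(f):
    """Upper-tail P for F with (1, 3) df, i.e. two-sided P for t = sqrt(F) with 3 df."""
    u = sqrt(f) / sqrt(3)
    # Closed-form Student t distribution function for 3 degrees of freedom.
    cdf = 0.5 + (u / (1 + u * u) + atan(u)) / pi
    return 2 * (1 - cdf)


# Sums of squares and degrees of freedom as printed in Table I.
f_piglets = f_ratio(893.3, 1, 10253.7, 59)  # piglets (incorrectly) as units
f_litters = f_ratio(111.7, 1, 138.2, 3)     # litters as units

print(f"piglets as units: F = {f_piglets:.2f}")
print(f"litters as units: F = {f_litters:.2f}, P = {p_value_f_1_3(f_litters):.3f}")
```

Because the table entries are rounded, the recomputed values agree with the printed F-ratios only up to rounding. The litter-level sums of squares are one-eighth of the piglet-level ones (8 piglets per litter), so the drop in significance comes from the residual: its degrees of freedom fall from 59 to 3.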

DEVELOPMENT OF BEHAVIOUR

In some ethological studies, interest focuses on the development of a behaviour over time in relation to the treatments. In such experiments, the animals under study are observed at a number of times. Of course, successive measurements on the same animal or group of animals cannot be regarded as independent. As in the situation described in the preceding section, one should summarize the measurements on each unit by one or several quantities. The quantities should describe the temporal change in behaviour; for example, in the case of a linear trend one could take the slope. The next step is to analyse these quantities, one for each unit, by an appropriate method (rank order test, analysis of variance, etc.).

An alternative approach sometimes used for successive measurements is to carry out a so-called split-plot analysis. However, this is not recommended, as the validity of the analysis depends on the strong assumption that time-profiles of different groups with the same treatment are parallel (Rowell and Walters, 1976; Keen et al., 1986).
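Reducing the repeated measurements on one unit to an average and a slope, as described above, can be sketched as follows. The weekly values are hypothetical, and six equally spaced observation times are assumed; for that case the least-squares slope is a contrast with the linear orthogonal polynomial coefficients -5, -3, -1, 1, 3, 5.

```python
# Linear orthogonal polynomial coefficients for six equally spaced times.
LINEAR_COEFFS = [-5, -3, -1, 1, 3, 5]


def unit_summaries(y):
    """Return (average, least-squares slope per time step) for one unit."""
    if len(y) != len(LINEAR_COEFFS):
        raise ValueError("expected six equally spaced observations")
    average = sum(y) / len(y)
    contrast = sum(c * v for c, v in zip(LINEAR_COEFFS, y))
    # The coefficients equal 2 * (time - 3.5) and sum((time - 3.5)**2) = 17.5,
    # so the slope per time step is contrast / (2 * 17.5) = contrast / 35.
    slope = contrast / 35
    return average, slope


# Hypothetical weekly values for two units (not data from any study).
units = {
    "unit A": [10, 14, 18, 22, 26, 30],
    "unit B": [12, 13, 15, 14, 16, 17],
}

summaries = {name: unit_summaries(values) for name, values in units.items()}
for name, (average, slope) in summaries.items():
    print(f"{name}: average = {average:.2f}, slope = {slope:.3f} per week")
```

Dividing the contrast by a different constant, such as the sum of squared coefficients (70), only rescales the slope; an F-test on the per-unit values is unaffected by the choice.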

Example

In the piglet study discussed in the earlier section, one was also interested in the development of exploratory behaviour of the piglets during the 6 weeks before weaning. On each of the 8 litters, observations had been made in Weeks 1, 2, ..., 6. The data on individual piglets were averaged per litter and per week, as shown in Fig. 1. We see from Fig. 1 that the amount of time spent on exploration is definitely increasing with time and that a difference exists between the two treatments. However, it is not at all clear whether the treatment difference is changing during the 6 weeks before weaning. In order to make a valid comparison between the treatments, the following calculations were performed for each litter:

$$\text{Average} = (Y_1 + Y_2 + Y_3 + Y_4 + Y_5 + Y_6)/6$$

$$\text{Slope} = (-5Y_1 - 3Y_2 - Y_3 + Y_4 + 3Y_5 + 5Y_6)/70$$

Here, $Y_1, Y_2, \ldots, Y_6$ represent the observations in Weeks 1, 2, ..., 6. Parabolic and higher-order effects can also be calculated by referring to a table of orthogonal polynomial coefficients (cf. Pearson and Hartley, 1972). The values for "average" as well as "slope" can be subjected to analysis of variance. The results relating to the development of exploratory behaviour are given in Table II. It appears that the housing systems affect the average amount of time spent on exploration during the 6 weeks before weaning, but it also appears that the difference between the two housing systems becomes larger over time.

[Figure 1 not reproduced: one curve per litter, horizontal axis WEEK.]

Fig. 1. Mean time spent on exploratory behaviour per litter per week. Litters 1, 2, 3 and 4 were housed in a farrowing crate, Litters 5, 6, 7 and 8 in a straw pen.

TABLE II

Analysis of variance on exploratory behaviour of piglets during Weeks 1-6: average and slope

                          Average                      Slope
Source of variation  Df   Sum of squares   F-test     Sum of squares   F-test
Sister-pair           3       236.03                       27.36
Housing system        1      4920.44       53.63**         29.04       49.81**
Residual              3       275.24                        1.75
Total                 7      5431.71                       58.15

DISCUSSION

The design of an experiment involves identification of the experimental units and allocation of the units to the treatments by a well-defined procedure. In this paper, we stress the fact that statistical tests carried out later should take account of the design. When analysing the data, differences between treatments should be tested against variation between these units. If few units are included in an experiment, this variation cannot be estimated very precisely, as was the case in the example. This affects the power of the test adversely, and one should therefore try to include a sufficient number of units in the experiment. We have the impression that in many ethological experiments the effort spent on measuring the behaviour of one unit is too high relative to the number of units. The right balance between number of units and number of observations per unit depends on the relative magnitude of variability between and within units, as well as the costs involved.

Sometimes researchers will find that their study consists of one experimental unit only, so that no statistical test can be carried out. This does not mean that the investigation is useless, but only that it should be regarded as a case-study. Many important biological theories have developed from case-studies, e.g. Darwin's theory of evolution. However, the confirmative use of P-values in case-studies is nonsensical; statistics should there be used for descriptive purposes only.

ACKNOWLEDGEMENT

Dr. W.G.P. Schouten is thanked for supplying the data for the example.

REFERENCES

Cochran, W.G. and Cox, G.M., 1957. Experimental Designs. 2nd edn. Wiley, New York, 611 pp.
Cox, D.R., 1958. Planning of Experiments. Wiley, New York, 308 pp.
Keen, A., Thissen, J.Th.N.M., Hoekstra, J.A. and Jansen, J., 1986. Successive measurement experiments: analysis and interpretation. Stat. Neerl., 4 (1986).
Pearson, E.S. and Hartley, H.O. (Editors), 1972. Biometrika Tables for Statisticians. Vol. 2. Cambridge University Press, Cambridge.
Rowell, J.G. and Walters, J.E., 1976. Analysing data with repeated observations on each experimental unit. J. Agric. Sci., 87: 423-432.
Schouten, W.G.P., 1986. Rearing conditions and behaviour in pigs. Ph.D. Thesis, Agricultural University of Wageningen.