Accepted Manuscript
Title: Measuring behaviour accurately with instantaneous sampling: A new tool for selecting appropriate sampling intervals
Authors: Wilhelmiina Hämäläinen, Salla Ruuska, Tuomo Kokkonen, Saana Orkola, Jaakko Mononen
PII: S0168-1591(16)30095-8
DOI: http://dx.doi.org/10.1016/j.applanim.2016.04.006
Reference: APPLAN 4237
To appear in: APPLAN
Received date: 4-11-2015
Revised date: 31-3-2016
Accepted date: 4-4-2016
Please cite this article as: Hämäläinen, Wilhelmiina, Ruuska, Salla, Kokkonen, Tuomo, Orkola, Saana, Mononen, Jaakko, Measuring behaviour accurately with instantaneous sampling: A new tool for selecting appropriate sampling intervals. Applied Animal Behaviour Science http://dx.doi.org/10.1016/j.applanim.2016.04.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Measuring behaviour accurately with instantaneous sampling: A new tool for selecting appropriate sampling intervals

Wilhelmiina Hämäläinen (a), Salla Ruuska (b, c, 1), Tuomo Kokkonen (d), Saana Orkola (d), Jaakko Mononen (b, c)

a Academy of Finland, University of Helsinki, Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Finland
b University of Eastern Finland, Department of Biology, P.O. Box 1627, 70211 Kuopio, Finland
c Natural Resources Institute Finland (Luke), Green technology, Halolantie 31 A, 71750 Maaninka, Finland
d University of Helsinki, Department of Agricultural Sciences, Animal Science, P.O. Box 28, FI-00014 University of Helsinki, Finland
Current address of Saana Orkola: Taminco Finland Oy (a subsidiary of Eastman Chemical Company)

1 Corresponding author: Salla Ruuska, tel. +358-40-3553275
Highlights
• Accuracy of instantaneous sampling evaluated with computer simulations.
• Random errors determined for five behaviours of dairy cows.
• The starting point of sampling has a major effect on the error magnitude.
• The new program can be used for selecting optimal interval lengths.
Abstract
A central dilemma in instantaneous sampling (IS) is to select appropriate sampling intervals for different behaviours. Ideally, the interval should be as long as possible without risking the accuracy of the obtained estimates. In this study, we developed a computational method for evaluating the accuracy of IS estimates for behaviour durations and for selecting optimal interval lengths. The method was used to test different IS protocols in the analysis of the behaviour of dairy cows in tie-stalls. The data consisted of 29 days of continuous recordings (CR) from 16 dairy cows. Random errors for sampling interval lengths of 0.5, 1, 2, …, 29 min and 30, 40, …, 120 min were estimated from the CR data for eating, ruminating, drinking, standing, and lying durations. For this purpose, each IS simulation was repeated starting from every possible second of the day. The difference between the real and estimated durations was characterised by five indices: The average error magnitude (AEM ± SD) estimated the expected error magnitude from a random starting point with the given IS interval. The error magnitude range (EMR), expressed as minimum–maximum errors, described the best and the worst scenarios for sampling. The probability of the error magnitude exceeding 10% (PEM10) and the upper bound of the error magnitude with probability 90% (EMP90) described the error magnitude distribution, i.e., the chance of getting an appreciable error and the likely maximum error, respectively. Generally, the errors increased with the interval length, and short-term behaviours produced the largest errors. As an example, AEMs and EMRs (in parentheses) for the commonly used IS-10 min were: eating 10.1% (0.0–57.0%), ruminating 3.3% (0.0–22.1%), drinking 68.9% (0.2–620.4%), standing 2.2% (0.0–19.9%), and lying 2.0% (0.0–13.4%). The most surprising finding was the dramatic effect of the starting point.
Therefore, suitable interval lengths cannot be determined from individual simulations. As a solution, we suggest that researchers analyse their own pilot data with the introduced program using appropriate error bounds and confidence probabilities.
Key words: instantaneous sampling; computer simulation; accuracy; behaviour; dairy cow
1. Introduction
Instantaneous sampling (IS, Martin and Bateson, 2007, pp. 48-61) is a systematic time sampling technique, which can be used to estimate the proportion of time spent in different behaviours or activities. The technique has been widely used in behavioural research of animals, including cattle (Palacio et al., 2015; Winckler et al., 2015), as well as in human behaviour research ("momentary time sampling", Radley et al., 2015) and work measurement studies ("work sampling", Wolff et al., 2015, or "activity sampling", Roll and Yadin, 1986).
The main attraction of IS is its efficiency. In IS, behaviour is recorded only at sampled time points. Therefore, IS reduces the amount of work needed to analyse animal behaviour, compared with continuous recording (CR), where the beginning and end of each bout of behaviour are recorded (Martin and Bateson, 2007, pp. 48-61). Since observations are made at regular time intervals, IS can also be used for scan sampling, where several animals are monitored at the same time. Note that scan sampling is a sampling rule in the terminology of Martin and Bateson (2007), but it is sometimes used synonymously with IS (Mitlöhner et al., 2001; Neisen et al., 2009). In addition, automated data loggers recording behavioural data may use IS for data collection (Kitts et al., 2011; Müller and Schrader, 2003; Vasseur et al., 2012).
The dilemma with IS, as with all sampling, is the trade-off between efficiency and accuracy. In IS, long sampling intervals are the most cost-efficient, because they produce small samples (i.e., sets of sampling points), which are fast to process and allow more animals to be observed simultaneously. However, the smaller the sample, the less accurate the resulting duration estimates. In principle, the inaccuracy is due to two kinds of errors: systematic error (bias) and random error (imprecision). A systematic error is a constant or predictable part of the error, which is usually estimated by averaging over a large number of repeated measurements (ISO, 2006, Sections 3.3.2, 3.4.7). A random error is the remaining, unpredictable part of the error, due to inherent variability between repeated measurements (ISO, 2006, Sections 3.4.6, 3.3.4).
It has been stated that IS does not produce any systematic error (Ary and Suen, 1983), but this is true only if certain prerequisites are fulfilled (Sampath, 2001, pp. 29-34; see also Section 2.2). Random errors are a bigger problem, because they are unavoidable and their effects can be considerable. The magnitude of random errors depends on the length of the sampling interval and its relationship to the frequency and duration of the studied behaviour (Ary and Suen, 1983).
The critical question is how to evaluate the error distribution and select appropriate sampling interval lengths for different behaviour categories. Computer simulations are a classic way of testing the accuracy of sampling protocols, i.e., how unbiased and precise the estimates produced by a protocol are. In previous research, simulations have been used to determine suitable sampling interval lengths for cattle behaviour patterns by comparing different IS protocols with CR (Kitts et al., 2011; Ledgerwood et al., 2010; Mattachini et al., 2013; Miller-Cushon and DeVries, 2011; Mitlöhner et al., 2001). However, as far as we know, all previous studies have based their results on individual simulations, where each data set is processed just once from a single starting point. This can give misleading information, either too optimistic or too pessimistic, because the errors depend heavily on the starting point (Fig. 1).
The main objective of this study was to develop a generic computational method for evaluating the effect of random errors in IS and for determining appropriate interval lengths that guarantee desired error and confidence bounds. The motivation is straightforward: to utilize existing data, either from a pilot experiment or from similar previous studies, in determining appropriate interval lengths for a new study. However, the criticality of errors varies, and therefore we wanted a method that maximizes the interval length (and, thus, minimizes the work), given an upper bound for the error and a minimum probability with which that bound must hold. The second objective was to test the new method with authentic CR data collected from dairy cows kept in tie-stalls and to evaluate the suitability of different IS protocols for estimating the duration of eating, ruminating, drinking, lying, and standing behaviour. For this purpose, we developed an efficient computer program that simulates IS systematically with different interval lengths. Because our aim was to test the effect of random errors
(i.e., precision of estimates), the computer simulations were implemented in a manner which eliminated systematic errors. In practice, each observation period (cow-day) was sampled in a circular manner, beginning from all possible starting points (seconds of the day). A special algorithm was developed for this purpose, because such comprehensive simulations are computationally demanding. From these simulations, we calculated five measures (see Table 3) to characterise the size of random errors, including their probability distribution. In addition, we present graphs which can be used for selecting suitable sampling intervals with desired error bounds and confidence probabilities for other similar studies with dairy cows. However, the main contribution is the new generic program that can be used for selecting an optimal sampling interval length according to desired accuracy requirements from any CR data.
2. Materials and methods
2.1. Behavioural data
The behavioural data were produced during another experiment (Viikki research barn, University of Helsinki, Finland), which studied the effects of two levels of grass silage feeding during a dry period on the behaviour and hypothalamus-pituitary-adrenal axis function of dairy cows (Orkola, 2011). During the early dry period, 16 Ayrshire dairy cows were fed with grass silage either ad libitum or restricted to 100% of energy requirement, while during the close-up period (starting from 21 d prior to the expected calving date), cereal concentrate was added to the diet (30% of energy intake). After calving, all the cows were fed with grass silage ad libitum and the daily concentrate allowance was increased to 11 kg by d 14 of lactation. The cows were kept in tie-stalls and they had free access to individual drinking bowls. The cows' behaviour was video-recorded for three 24-hour periods: 23 and 10 days pre-parturition and 14 days post-parturition. Each cow was observed by two cameras, to capture both the postures (from the side) and feeding behaviour (from the front). The video recordings were analysed and behavioural observations were stored into log files (i.e., behaviour logs) using the CowLog software (Hänninen and Pastell, 2009).
In the current study, we used 29 of the original 48 log files as our behavioural data, because some of the files did not include a full 24-h recording and were therefore unusable. Each behaviour log consisted of events of the form (time stamp, behaviour code), where the time stamp indicated the beginning of the behaviour (and, thus, the end of the previous behaviour). Because the log does not reveal the beginning of the first bout or the end of the last bout, these bouts were excluded from the data. Thus, the actual lengths of the animal days were slightly less than 24 hours (22 h 19 min 24 s at the shortest). We note that our objective was to study the effect of the IS interval length on the accuracy of the estimated behaviour durations. Therefore, we ignored possible dependencies between the log files (there were up to three files per animal and up to 16 files per time period of the production cycle of the cows).
The behavioural categories chosen for the accuracy analysis (Table 1) represent behaviours with varying total daily frequencies, lengths of bouts, and intervals between bouts (Table 2).
2.2. Evaluating the accuracy of IS protocols
A sampling strategy is considered accurate if it produces no systematic errors and its random errors are small. The absence of systematic errors means that the expected value of the error is zero. In practice, each sampling estimate contains some error, but in the long run, in repeated sampling, positive and negative errors should compensate each other. When systematic errors have been eliminated, the remaining errors are random errors. If their magnitudes (unsigned errors) are small, the estimate is considered precise. Requirements for sufficient precision depend on the modelling purpose. If all errors are critical, one may require that the random error never exceeds some threshold, say 10%. In less critical situations, larger errors may be allowed occasionally, provided that most errors, say 90% of them, are sufficiently small. In instantaneous sampling, systematic errors can be eliminated with an appropriate sampling strategy, but random errors can still be substantial, depending on the data and the length of the sampling interval, Δ.
In the simplest form of instantaneous sampling (systematic time sampling, see e.g., Buhaug, 1978; Sampath, 2001, pp. 29-39), one randomly selects a starting point among the first Δ time points and then samples every Δth time point systematically (Fig. 2). If behaviour A occurs in h out of m sample points, d = h/m is an IS estimate for the relative duration of A. Similarly, hn/m gives an estimate for the absolute duration of A, when n − 1 is the length of the whole time line (1, 2, …, n). These estimates are unbiased (free of any systematic error) if n is a multiple of Δ (i.e., n = kΔ for some integer k) (Sampath, 2001, pp. 29-39; see also Fig. 2 A). However, if n ≠ kΔ, unbiasedness is no longer guaranteed, and the method can also produce systematic error. The simplest solution is circular systematic sampling (Sampath, 2001, pp. 43-44; see also Fig. 2 B), where one selects the starting point randomly among all n time points and then proceeds circularly, sampling every Δth time point. This method always produces an unbiased estimate for the duration of A (Sampath, 2001, pp. 43-44).
If IS is implemented as circular sampling from a random starting point, the only errors are random errors. Unfortunately, there are no closed-form equations for estimating their size (upper bounds for the error magnitude can be given, but they are too loose to be useful; see Discussion). Consequently, there are no analytical means for selecting optimal Δs.
In this study, the main objective was to estimate random errors by comprehensive computer simulations. For this purpose, we simulated all possible samples with the given Δ, estimated the durations of the selected behaviours, and compared them with the real durations (from continuous recording). For each simulation, the error was expressed as the relative error (estimate − duration)/duration, and together these errors formed the random error distribution for the given data. Finally, results from all 29 log files were combined. Because the lengths of animal days and sampling intervals varied, we used circular systematic sampling to obtain unbiased results. The systematic error (average error over all simulations) was nevertheless checked.
From the random errors, we calculated five indices (Table 3). Three indices, the average error magnitude, the median error magnitude and the error magnitude range (the minimum and maximum error magnitudes presented together), served to express the extent of the discrepancy, i.e., to give a general picture of the random error produced by IS (presented for 0.5, 1, 2, 5, 10, 15, 30, 60 and 120 min intervals). Two further indices, the probability of the error magnitude exceeding e% (PEMe, %) and the minimum upper bound of the error magnitude with probability p% (EMPp, %), were calculated as more practical tools for selecting an appropriate sampling interval, i.e., for estimating the risk of error. In this paper, we report only PEM10 and EMP90, because a 10% error bound and a 90% confidence bound are commonly used. These two indices are reported for IS protocols with Δ = 0.5, 1, 2, …, 29 min and 30, 40, …, 120 min.
2.3. Computer simulation
All error indices were determined by comprehensive computer simulations, which tested all possible IS samples from each log file and reported the combined results for each interval length Δ. We note that for unbiased results, it is necessary to test all possible starting points, because n (the length of the animal day) is seldom a multiple of Δ. For example, when Δ = 30 and n = 30k + q, starting from 1 gives the sample (1, 31, 61, …, 30k + 1), but starting from 31 gives the sample (31, 61, …, 30k + 1, 31 − q). Since all possible samples were tested, the resulting error distributions are exact, unlike in stochastic simulation, where only some randomly selected samples are tested.
The main idea of the simulation algorithm is presented in Figure 3. The algorithm shows only a naive implementation, where the innermost for-loop (code lines 5-9) checks all possible starting points (up to 86,400). In practice, this is time-consuming, and the loop was implemented more efficiently, using a dynamic programming approach.
We note that the program is generic and can be used for any behavioural or other data of the given format, with freely selected values for the error bound e and the confidence probability p in PEMe and EMPp. The source code of the program is available online (https://sites.google.com/site/whsivut/home/sourcecode/isvalidate). The program has been implemented in the C programming language and works at least in Linux/Unix environments.
3. Results
The average (signed) error was zero for all behaviour categories and all sampling interval lengths, which means that there was no systematic error. This is consistent with statistical theory, according to which IS is an unbiased sampling method when performed properly (i.e., circularly from a random starting point) (see e.g., Sampath, 2001).
The average error magnitudes (including standard deviations and medians) are given in Table 4 and the error magnitude ranges (minimum and maximum errors) in Table 5. As expected, all error magnitudes increased with the IS interval length. However, the differences between behaviour categories were large. For example, AEM for IS-10 min varied from 2.0% (lying) to 68.9% (drinking), and EMR from 0.0–13.4% (lying) to 0.2–620.4% (drinking). For all interval lengths, the error magnitudes increased in the order lying, standing, ruminating, eating, and drinking. This is the order of increasing behaviour frequency and of decreasing bout length (Table 2). Graphs for the probability of the error magnitude exceeding 10% (PEM10, %) are presented in Fig. 4. For example, PEM10 for standing and lying increased slowly with the IS interval length and was only 14.8% and 12.8%, respectively, even for IS-30 min. In contrast, PEM10 for drinking was nearly 60% already for IS-1 min and increased rapidly to 100%.
Graphs for the upper bound of the error magnitude with probability 90% (EMP90, %) are presented in Fig. 5. Once again, the error magnitudes increased with the IS interval length. For example, to be 90% sure that the error magnitude does not exceed 10%, one should select IS-22 min for standing, IS-26 min for lying, IS-15 min for ruminating, and IS-4 min for eating. For drinking, the corresponding IS interval has to be shorter than 0.5 min; an interval of 15 s is needed.
4. Discussion
The objective of this study was to test the accuracy of different IS protocols and to determine appropriate IS interval lengths for five cattle behaviour categories. For this purpose, we performed comprehensive computer simulations on authentic behavioural data. The results confirmed the known fact (Martin and Bateson, 2007) that the accuracy of IS estimates depends on the sampling interval length and the behaviour category. The most difficult durations to estimate are those of frequently occurring, short-term behaviours (like eating and especially drinking). The reason is that the relative error has an upper bound (frequency × Δ)/duration, because in each bout the absolute error is at most Δ, and in the worst case all bouts produce the maximum error [Ary and Suen (1983) give a somewhat similar upper bound, but it holds only for impractically short Δs]. Thus, the largest errors are obtained when the frequency is large and the total duration small, which also means short bout lengths. However, these upper error bounds are very loose and therefore not usable in determining appropriate Δs. Instead, we recommend using the PEM and EMP measures introduced in the present study.
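Written out, with f the number of bouts (frequency), D the true total duration and \hat{D} its IS estimate, the bound above reads as follows. The numbers in the second line are hypothetical and only illustrate why many short bouts with a small total duration give the loosest bound.

```latex
\left|\frac{\hat{D} - D}{D}\right| \;\le\; \frac{f\,\Delta}{D},
\qquad
\text{e.g. } f = 12,\ \Delta = 10\ \text{min}:\quad
\frac{12 \cdot 10}{30} = 4\ (400\%)\ \text{for } D = 30\ \text{min},
\qquad
\frac{12 \cdot 10}{600} = 0.2\ (20\%)\ \text{for } D = 600\ \text{min}.
```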
The main contribution of this study was the analysis of error variance. This analysis showed that a simulation from a single starting point can give misleading information on the error magnitude. For example, the commonly used IS-10 min produced on average only small to moderate errors for all behaviours except drinking, but the error ranges were wide. In the worst case, the error magnitudes were five (eating 57.0%/10.1%, IS-10 min) to nine (standing 96.6%/10.6%, IS-60 min) times AEM (see Tables 4 and 5). However, in the best case, a single simulation could produce an error close to 0% for all behaviours, including drinking. In general, short-term behaviours have the largest variance, because they can have the largest errors. In our study, the extreme case was drinking, where the error magnitude with IS-10 min could be anything from 0.2% to 620.4%.
Consequently, appropriate sampling intervals cannot be determined from individual IS simulations. Average error magnitudes, based on all possible starting points, are more informative but still not sufficient for guaranteeing desired error bounds. For this purpose, we need the PEM and EMP measures. For example, if the maximum acceptable error magnitude is 10% (e.g., Martin and Bateson, 2007), AEM would suggest interval lengths of 0.5 min for drinking, 10 min for eating, 30 min for ruminating, and 60 min for standing and lying. However, with these interval lengths the probability that the error magnitude exceeds 10% is at least 36%. To be 90% confident that the 10% error bound holds, one should select interval lengths of 15 s (drinking), 4 min (eating), 15 min (ruminating), 22 min (standing) and 28 min (lying).
In previous studies, sampling interval lengths of 10 min for drinking (Mitlöhner et al., 2001) and feeding (Endres et al., 2005; Kitts et al., 2011; Mitlöhner et al., 2001) and 10–15 min for standing and lying (Mattachini et al., 2013; Mitlöhner et al., 2001) have been recommended. Compared with our results, an interval length of 10–15 min is more than sufficient for estimating the duration of standing and lying, but not for feeding or drinking, if one wants to guarantee the 10% error bound with 90% probability.
There are at least two reasons for this discrepancy, in addition to the different data sets. First, the previous studies neglected the effect of the starting point, and it is not known how representative their simulations were. Second, the accuracy of IS protocols has often been evaluated with correlation (Endres et al., 2005; Kitts et al., 2011; Mitlöhner et al., 2001) or linear regression (Miller-Cushon and DeVries, 2011; Mitlöhner et al., 2001) analyses, with required correlation coefficients ranging from 0.502 to over 0.95. However, the correlation coefficient tells very little about error magnitudes. For example, in our data, random IS samples with Pearson correlation coefficient r = 0.90 could produce a 20% error for eating (Δ = 28 min) and a 36% error for drinking (Δ = 2.5 min). In addition, we recall that the correlation coefficient does not reveal systematic error. A linear regression model is slightly more informative, but it does not optimise the accuracy (i.e., minimise the error) either. The problem is that the IS interval with the best linear regression model (i.e., r and regression slope b closest to 1.0 and intercept a closest to 0) does not necessarily produce the smallest error or error magnitude. For example, in our data we can find the following linear regression models for ruminating: y = 0.851x + 0.045 (r = 0.867) when Δ = 22.5 min and y = 1.114x − 0.045 (r = 0.906) when Δ = 29.5 min. These models suggest that we should select Δ = 29.5 min, but it actually produces a 2.2 percentage points larger error magnitude (9.9% vs. 7.7%) and a larger bias (average error −4.3% vs. 0.9%).
Periodic behaviour patterns (i.e., behaviours with little variation in bout and bout interval lengths) are always a special concern with IS. If the sampling interval length equals (or is a multiple of) the behaviour period length, IS produces a biased sample with large error variance (Daniel, 2012, pp. 125-174). In our study, rumination was the only somewhat periodic behaviour. Still, it occurred so irregularly, and the variance between behaviour logs was so large, that no interference was observed in the average error magnitudes when Δ matched the average cycle length (72 ± 59 min; average ± standard deviation). Some interference was observed in the behaviour log with the smallest cycle variance (68 ± 40 min) when Δ exceeded 60 min. However, one would hardly use such long intervals, because even shorter intervals already produce too much imprecision. This suggests that periodicity can affect the accuracy of IS estimates when feeding behaviour is studied, but the corresponding interval lengths are too long to have any practical meaning.
Finally, we remind readers that one should be cautious in generalizing appropriate sampling interval lengths from one situation to another, e.g., from tie-stalls to loose housing in dairy cattle, since the situation may affect the behaviour (Haley et al., 2000). If one wants to use existing data for selecting suitable interval lengths for future studies, it is best to select data that are as similar as possible (representative of the target population). Mixing data from different experiments (i.e., representative and unrepresentative data) produces unnecessary variation and results in suboptimal Δs. Another important aspect is the size of the data. In general, a large representative data set produces the most accurate duration estimates. In practice, this can be achieved by using more animals and gathering data over longer periods. For example, Ito et al. (2009) demonstrated that accurate estimates of dairy cows' lying behaviour required at least three days of continuous recording of at least 30 cows. Larger data sets also produce larger samples for a fixed Δ and, thus, likely better IS estimates. Therefore, it may be possible to use longer Δs without compromising the accuracy.
5. Conclusions
In this study, we developed a computational method for evaluating the accuracy of IS estimates for behaviour durations. The method was applied to dairy cows' behavioural log data, and the effect of different sampling interval lengths was analysed for five behaviour categories. As expected, the error magnitude increased with the interval length, and short-term behaviours produced the largest errors. Surprisingly, the starting point turned out to have a dramatic effect on the accuracy of sampling, i.e., the minimum and maximum error magnitudes can differ substantially. Due to this large variance, average error magnitudes are not suitable for selecting appropriate IS interval lengths, and therefore we introduced new practical tools for this purpose. Graphs for the probability of the error magnitude exceeding 10% and the upper bound of the error magnitude with probability 90% help researchers of dairy cows choose appropriate interval lengths that guarantee small errors with high probability. However, we strongly recommend that researchers analyse their own behavioural data from a pilot study with the given program and determine suitable interval lengths with the desired error and confidence bounds.
Acknowledgments
We would like to thank Juha Suomi and the technical staff of the University of Helsinki's research barn, Viikki, for their assistance with animal care. We gratefully acknowledge Laura Hänninen, Seija Jaakkola, Siru Salin, Vera Hakala and Pirjo Pursiainen for their contribution in planning and coordinating the original experiment. The experiment was funded by the Finnish Ministry of Agriculture and Forestry. Salla Ruuska's work was supported by the Olvi foundation and Niemi foundation and Wilhelmiina Hämäläinen's by the Academy of Finland (decision number 258589).
References Ary, D., Suen, H.K., 1983. The use of momentary time sampling to assess both frequency and duration of the behavior. J. Behav. Assess. 5, 143-150. http://dx.doi.org/10.1007/BF01321446 Buhaug, H., 1978. The sampling interval in systematic activity sampling. Int. J. Prod. Res. 16, 19-25. http://dx.doi.org/10.1080/00207547808929996 Daniel, J, 2012. Sampling essentials: Practical guidelines for making sampling choices. SAGE Publications Inc., California. http://dx.doi.org/10.4135/9781452272047 Endres, M.I., DeVries, T.J., von Keyserlingk, M.A.G., Weary, D.M., 2005. Short communication: Effect of feed barrier design on the behaviour of loose-housed lactating dairy cows. J. Dairy Sci. 88, 2377-2380. http://dx.doi.org/10.3168/jds.S0022-0302(05)72915-5 Haley, D.B., Rushen, J., de Pasillé, A. M., 2000. Behavioural indicators of cow comfort: activity and resting behaviour of dairy cows in two types of housing. Can. J. Anim. Sci. 80, 257-263. http://dx.doi.org/10.4141/A99-084 Hänninen, L., Pastel, M., 2009. CowLog: open source software for coding behaviors from digital video. Behav. Res. Methods 41, 472-476. http://dx.doi.org/10.3758/BRM.41.2.472 Ito, K., Weary, D.M., von Keyserlingk, M.A.G., 2009. Lying behavior: Assessing within- and between-herd variation in free-stall-housed dairy cows. J. Dairy Science. 92, 4412-4420. http://dx.doi.org/ 10.3168/jds.2009-2235 Kitts, B.L., Duncan, I.J.H., McBride, B.W., DeVries, T.J., 2011. Effect of the provision of a lownutritive feedstuff on the behavior of dairy heifers limit fed a high-concentrate ration. J. Dairy Sci. 94, 940-950. http://dx.doi.org/10.3168/jds.2010-3767 Ledgerwood, D.N., Winckler, C., Tucker, C.B., 2010. Evaluation of data loggers, sampling intervals, and editing techniques for measuring the lying behaviour of dairy cattle. J. Dairy Sci. 93, 51295139. http://dx.doi.org/10.3168/jds.2009-2945 Martin, P., Bateson, P., 2007. Measuring behaviour: An introductory guide, third ed. 
Cambridge University Press, Cambridge. http://dx.doi.org/10.1017/CBO9780511810893
15
Mattachini, G., Riva, E., Bisaglia, C., Pompe, J.C.A.M., Provolo, G., 2013. Methodology for quantifying the behavioural activity of dairy cows in freestall barns. J. Anim. Sci. 91, 4899-4907. http://dx.doi.org/10.2527/jas.2012-5554
Miller-Cushon, E.K., DeVries, T.J., 2011. Technical note: Validation of methodology for characterization of feeding behavior in dairy calves. J. Dairy Sci. 94, 6103-6110. http://dx.doi.org/10.3168/jds.2011-4589
Mitlöhner, F.M., Morrow-Tesch, J.L., Wilson, S.C., Dailey, J.W., McGlone, J.J., 2001. Behavioral sampling techniques for feedlot cattle. J. Anim. Sci. 79, 1189-1193.
Müller, R., Schrader, L., 2003. A new method to measure behavioural activity levels in dairy cows. Appl. Anim. Behav. Sci. 83, 247-258. http://dx.doi.org/10.1016/S0168-1591(03)00141-2
Neisen, G., Wechsler, B., Gygax, L., 2009. Choice of scan-sampling intervals – An example with quantifying neighbours in dairy cows. Appl. Anim. Behav. Sci. 116, 134-140. http://dx.doi.org/10.1016/j.applanim.2008.08.006
Orkola, S., 2011. Rajoitetun tai vapaan ummessaolokauden ruokinnan vaikutus lypsylehmien käyttäytymiseen ja hyvinvointiin. (The effect of feeding level on the behaviour and welfare of dairy cows in the dry period). (In Finnish). University of Helsinki, Faculty of Agriculture and Forestry. Master’s thesis, 43 p.
Palacio, S., Bergeron, R., Lachance, S., Vasseur, E., 2015. The effects of providing portable shade at pasture on dairy cow behavior and physiology. J. Dairy Sci. 98, 6085-6093. http://dx.doi.org/10.3168/jds.2014-8932
Radley, K.C., O’Handley, R.D., Labrot, Z.C., 2015. A comparison of momentary time sampling and partial-interval recording for assessment of effects of social skills training. Psychol. Schools 52, 363-378. http://dx.doi.org/10.1002/pits.21829
Roll, Y., Yadin, M., 1986. Activity sampling in a stochastic environment. IIE Transactions 18, 343-349. http://dx.doi.org/10.1080/07408178608975354
Sampath, S., 2001. Sampling theory and methods. CRC Press, Boca Raton / Narosa Publishing House, New Delhi.
The International Organization for Standardization, 2006. ISO 3534-2:2006. Statistics – Vocabulary and symbols.
Vasseur, E., Rushen, J., Haley, D.B., de Passillé, A.M., 2012. Sampling cows to assess lying time for on-farm animal welfare assessment. J. Dairy Sci. 95, 4968-4977. http://dx.doi.org/10.3168/jds.2011-5176
Winckler, C., Tucker, C.B., Weary, D.M., 2015. Effects of under- and overstocking freestalls on dairy cattle behaviour. Appl. Anim. Behav. Sci. 170, 14-19. http://dx.doi.org/10.1016/j.applanim.2015.06.003
Wolff, J., McCrone, P., Patel, A., Auber, G., Reinhard, T., 2015. A time study of physicians’ work in a German university eye hospital to estimate unit costs. PLoS ONE 10, e0121910. http://dx.doi.org/10.1371/journal.pone.0121910
Figure Captions
Fig. 1. An example demonstrating the dramatic effect of the starting point in IS. The figure is based on modified real log data depicting rumination behaviour during one cow day. The real duration of rumination is 38% of the total time. IS-2h denotes IS with a 2-h sampling interval, i.e., 12 sampling points per day. A) IS-2h beginning at 00:00 detects rumination at every sampling point, and the duration of rumination is estimated to be 100%. The relative error is (100–38)/38 = 163%. B) IS-2h beginning at 00:40 misses all bouts, and the duration of rumination is estimated to be 0%. The relative error is (38–0)/38 = 100%. C) IS-2h beginning at 01:45 produces the optimal result. The duration estimate is 5/12 = 42%, and the relative error is (42–38)/38 = 10.5%.
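The arithmetic in Fig. 1 can be reproduced with a short sketch. It assumes, hypothetically, that behaviour is logged as a per-minute 0/1 sequence over one cow day and that the IS estimate is the proportion of sampling points at which the behaviour is recorded. The names `is_estimate` and `relative_error` and the toy log are illustrative only; they are neither the authors' code nor the rumination data in the figure. The error magnitude is taken as |estimate – true| / true, which matches panels A and B (panel B is reported as 100%, not –100%).

```python
from typing import Sequence


def is_estimate(log: Sequence[int], interval: int, start: int) -> float:
    """Proportion of IS points at which the behaviour was recorded.

    log      -- hypothetical per-minute record (1 = behaviour, 0 = other)
    interval -- IS interval length in minutes (120 for IS-2h)
    start    -- minute of the first sampling point (0 = 00:00)
    """
    points = log[start::interval]
    return sum(points) / len(points)


def relative_error(estimate: float, true: float) -> float:
    """Error magnitude |estimate - true| / true, in percent."""
    return abs(estimate - true) / true * 100


# Arithmetic from the Fig. 1 caption (durations in %).
print(round(relative_error(100, 38)))    # panel A: 163
print(relative_error(0, 38))             # panel B: 100.0
print(round(relative_error(42, 38), 1))  # panel C: 10.5

# Hypothetical toy day: the behaviour fills the first 30 min of every 2 h,
# so its true share is 360/1440 = 25%.
toy = [1 if minute % 120 < 30 else 0 for minute in range(1440)]
print(is_estimate(toy, 120, 0))   # every point falls inside a bout: 1.0
print(is_estimate(toy, 120, 60))  # every point falls between bouts: 0.0
```

The toy day reproduces the two extremes of panels A and B. Panel C uses the rounded values 42% and 38%; carrying 5/12 unrounded against 38% gives about 9.6%.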
Fig. 2. The main types of systematic time sampling. A) In basic systematic time sampling, the starting point is selected randomly from 1, 2, ..., Δ. If the last time point n is a multiple of Δ (i.e., the remainder q of n/Δ is 0), all samples have the same size and the duration estimate is unbiased. Otherwise, the samples differ in size, and the estimate is biased. B) In circular systematic time sampling, the starting point is selected randomly from 1, 2, ..., n, and sampling continues from the beginning, as if time point 1 were n+1. The duration estimate is then always unbiased.
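The bias in panel A, and its removal by the circular scheme in panel B, can be checked exactly by averaging the estimate over every equally likely starting point. The sketch below uses 1-based time points as in the caption. The caption does not state the sample size m of the circular scheme, so it is set here, as an assumption, to ceil(n/Δ). The function names and the toy sequences are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Sequence


def basic_sample(y: Sequence[int], delta: int, start: int) -> List[int]:
    """Basic systematic sampling: time points start, start+delta, ... <= n."""
    return [y[t - 1] for t in range(start, len(y) + 1, delta)]


def circular_sample(y: Sequence[int], delta: int, start: int, m: int) -> List[int]:
    """Circular systematic sampling: m points, wrapping so that point 1 follows n."""
    n = len(y)
    return [y[(start - 1 + i * delta) % n] for i in range(m)]


def proportion(sample: Sequence[int]) -> Fraction:
    """Exact share of sampled points at which the behaviour occurs."""
    return Fraction(sum(sample), len(sample))


def expected_basic(y: Sequence[int], delta: int) -> Fraction:
    """Mean estimate over all starting points 1..delta."""
    return sum(proportion(basic_sample(y, delta, s)) for s in range(1, delta + 1)) / delta


def expected_circular(y: Sequence[int], delta: int) -> Fraction:
    """Mean estimate over all starting points 1..n."""
    n = len(y)
    m = math.ceil(n / delta)  # assumed sample size
    return sum(proportion(circular_sample(y, delta, s, m)) for s in range(1, n + 1)) / n


y = [1, 0, 0, 0, 0]                # n = 5, true proportion 1/5
print(expected_basic(y, 2))        # 1/6 -> biased, because q = 5 mod 2 = 1
print(expected_circular(y, 2))     # 1/5 -> unbiased
print(expected_basic(y + [0], 2))  # 1/6, equal to the true 1/6 because q = 0
```

Exact fractions are used so that unbiasedness shows up as equality rather than as a floating-point approximation.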
Fig. 3. The main idea of the simulation algorithm used to validate IS protocols.
Fig. 4. A) Graphs of the probability of the error magnitude exceeding 10% (PEM10, %) for eating, ruminating, drinking, standing, and lying when the IS interval length varied between 0.5 and 120 min. Interval lengths are shown at 1-min steps from 1 to 30 min and at 10-min steps thereafter (40–120 min). B) Magnification of panel A for IS interval lengths ≤ 30 min. The dashed line marks the 10% PEM10 level.
Fig. 5. A) Graphs of the upper bound of the error magnitude with probability of 90% (EMP90, %) for eating, ruminating, drinking, standing, and lying when the IS interval length varied between 0.5 and 120 min. Interval lengths are shown at 1-min steps from 1 to 30 min and at 10-min steps thereafter (40–120 min). B) Magnification of panel A for IS interval lengths ≤ 30 min. The dashed line marks the 10% EMP90 level.
Tables

Table 1. The behaviour categories and their definitions used in the present study.
Behaviour | Definition
Eating | A cow takes feed into its mouth and swallows it. The cow may masticate the feed.
Ruminating | A cow re-masticates and re-swallows the feed. A cow does not take feed into its mouth while ruminating.
Drinking | A cow’s nose is in the water bowl, and the cow swallows water at regular intervals.
Standing | A cow is standing with at least three hooves on the ground.
Lying | A cow is lying on the sternum or on the flank.
Table 2. Characteristics of the studied behaviour categories: frequencies, bout lengths, and intervals between bouts. Means, standard deviations (SD), and medians (in parentheses) were calculated from 29 animal days (24 hours minus the first and the last incomplete bouts).
Behaviour | Eating | Ruminating | Drinking | Standing | Lying
Frequency, mean ± SD (median) | 41.5 ± 19.8 (45.0) | 25.6 ± 5.6 (26.0) | 49.0 ± 30.2 (38.5) | 15.8 ± 5.5 (16.0) | 15.8 ± 5.4 (16.0)
Bout length, min, mean ± SD (median) | 4.7 ± 4.9 (4.5) | 16.6 ± 15.0 (15.9) | 0.3 ± 0.2 (0.3) | 49.1 ± 47.8 (39.7) | 50.2 ± 34.1 (49.1)
Bout interval, min, mean ± SD (median) | 28.2 ± 63.9 (24.9) | 40.7 ± 55.5 (38.3) | 28.1 ± 66.6 (23.6) | 50.9 ± 33.7 (46.7) | 49.0 ± 47.8 (39.6)
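Bout frequencies, lengths and intervals of the kind reported in Table 2 can be derived from a continuous behaviour log by run-length encoding. This sketch assumes a hypothetical 0/1 log at a fixed time step and treats a bout as incomplete when it is already in progress at the first time step or still in progress at the last one; such bouts are dropped, following the note in the table caption. The bout interval is taken here as the time from the end of one complete bout to the start of the next, which is one plausible reading of the table heading, not a confirmed definition.

```python
from itertools import groupby
from statistics import mean, median, stdev
from typing import List, Sequence, Tuple


def runs(log: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Run-length encode a 0/1 log into (value, start index, length) triples."""
    out, pos = [], 0
    for value, group in groupby(log):
        length = sum(1 for _ in group)
        out.append((value, pos, length))
        pos += length
    return out


def bout_stats(log: Sequence[int], step_min: float = 1.0):
    """Return (frequency, bout lengths, bout intervals) for complete bouts.

    Bouts touching the first or last time step are treated as incomplete and
    dropped. Lengths and intervals are in minutes (step_min minutes per step).
    """
    n = len(log)
    bouts = [(s, k) for v, s, k in runs(log) if v == 1 and s > 0 and s + k < n]
    lengths = [k * step_min for _, k in bouts]
    intervals = [(s2 - (s1 + k1)) * step_min
                 for (s1, k1), (s2, _) in zip(bouts, bouts[1:])]
    return len(bouts), lengths, intervals


def summary(values: Sequence[float]) -> str:
    """Format values as 'mean ± SD (median)', the layout used in Table 2."""
    return f"{mean(values):.1f} ± {stdev(values):.1f} ({median(values):.1f})"


log = [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]  # hypothetical log
freq, lengths, intervals = bout_stats(log)
print(freq, lengths, intervals)  # 3 [3.0, 1.0, 2.0] [1.0, 3.0]
print(summary(lengths))          # 2.0 ± 1.0 (2.0)
```

In the toy log, the opening and closing bouts are excluded, leaving three complete bouts and two intervals between them.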
Table 3. The indices and their abbreviations (in parentheses), verbal definitions, and references to equations in the algorithm code.
Index (abbreviation) | Definition | Equation
Average error magnitude (AEM, %) | The expected error magnitude from a random starting point with the given IS interval length, averaged over all 29 animal days and presented as mean ± standard deviation (median) | Code lines 18-19
Error magnitude range (EMR, %) | The minimum and maximum errors, illustrating the best and the worst sampling scenarios (i.e., the smallest and the largest difference between the real duration and its estimate) | Code lines 20-21
Probability of the error magnitude exceeding 10% (PEM10, %) | The probability that the duration estimate from one sampling deviates from the real duration by at least 10% | Code line 22
Minimum upper bound of the error magnitude with probability of 90% (EMP90, %) | A confidence bound, i.e., the worst-case error in 90% of all possible samples. In practice, there is a 90% probability that the error magnitude from a randomly selected starting point is at most EMP90 (i.e., the worst 10% of outcomes are ignored as outliers) | Code lines 23-24
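The indices in Table 3 can be sketched for a single animal day as follows. The code lines cited in the table refer to the authors' algorithm, which is not reproduced here; this is an independent, hypothetical implementation. It assumes a 0/1 log at a fixed time step, an interval Δ expressed in time steps, and equally likely starting points 0, ..., Δ−1 (basic systematic sampling as in Fig. 2A). PEM10 counts errors of at least 10%, following the verbal definition, and EMP90 is taken as the smallest error bound that covers at least 90% of the starting points.

```python
from statistics import mean, median, stdev
from typing import Dict, List, Sequence


def error_magnitudes(log: Sequence[int], delta: int) -> List[float]:
    """Error magnitude (%) of the IS duration estimate for each start 0..delta-1."""
    if not 0 < delta <= len(log):
        raise ValueError("delta must be between 1 and the log length")
    true = sum(log) / len(log)
    if true == 0:
        raise ValueError("behaviour absent; relative error is undefined")
    errors = []
    for start in range(delta):
        points = log[start::delta]
        estimate = sum(points) / len(points)
        errors.append(abs(estimate - true) / true * 100)
    return errors


def day_indices(errors: Sequence[float], limit: float = 10.0, p: int = 90) -> Dict:
    """AEM, EMR, PEM10 and EMP90 for one animal day."""
    ordered = sorted(errors)
    n = len(ordered)
    k = -(-p * n // 100) - 1  # index of the ceil(p*n/100)-th smallest error
    return {
        "AEM": mean(errors),
        "EMR": (ordered[0], ordered[-1]),
        "PEM10": 100 * sum(e >= limit for e in errors) / n,
        "EMP90": ordered[k],
    }


def across_days(values: Sequence[float]) -> tuple:
    """Mean, SD and median over animal days, the form in which AEM is reported."""
    return mean(values), stdev(values), median(values)


# Hypothetical day: behaviour fills the first 30 min of every 2 h (true share 25%).
toy = [1 if minute % 120 < 30 else 0 for minute in range(1440)]
print(day_indices(error_magnitudes(toy, 120)))
# {'AEM': 150.0, 'EMR': (100.0, 300.0), 'PEM10': 100.0, 'EMP90': 300.0}
print(day_indices(error_magnitudes(toy, 30))["AEM"])  # 0.0
```

With a 120-min interval the toy day is either always caught (300% error) or always missed (100% error), whereas a 30-min interval that divides the behaviour's 2-h cycle evenly gives an exact estimate from every starting point.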
Table 4. The average error magnitudes (AEM, %), expressed as mean ± standard deviation and medians (in parentheses) for selected IS interval lengths (Δ).
Behaviour (Δ, min) | 0.5 | 1 | 2 | 5 | 10 | 15 | 30 | 60 | 120
Eating | 0.6 ± 0.4 (0.6) | 1.4 ± 1.0 (1.2) | 2.8 ± 1.9 (2.9) | 5.3 ± 3.8 (5.4) | 10.1 ± 6.9 (9.1) | 13.4 ± 9.7 (13.6) | 21.2 ± 15.8 (21.5) | 37.1 ± 26.4 (36.0) | 62.8 ± 42.8 (58.8)
Ruminating | 0.2 ± 0.2 (0.2) | 0.4 ± 0.3 (0.4) | 0.8 ± 0.6 (0.7) | 1.8 ± 1.3 (1.7) | 3.3 ± 2.3 (3.4) | 4.8 ± 3.4 (4.7) | 8.8 ± 6.5 (8.7) | 19.6 ± 13.6 (18.5) | 32.8 ± 23.4 (31.0)
Drinking | 9.4 ± 6.3 (9.4) | 16.6 ± 11.4 (16.3) | 26.8 ± 19.7 (24.9) | 45.3 ± 33.2 (44.1) | 68.9 ± 48.6 (65.6) | 87.4 ± 55.6 (85.3) | 122.8 ± 83.9 (131.1) | 154.1 ± 142.0 (162.4) | 175.3 ± 245.2 (180.5)
Standing | 0.1 ± 0.1 (0.1) | 0.3 ± 0.2 (0.2) | 0.4 ± 0.3 (0.4) | 1.1 ± 0.8 (1.0) | 2.2 ± 1.5 (2.2) | 2.9 ± 2.1 (2.9) | 5.6 ± 4.0 (5.3) | 10.6 ± 7.4 (9.5) | 18.7 ± 13.0 (18.2)
Lying | 0.1 ± 0.1 (0.1) | 0.2 ± 0.1 (0.2) | 0.4 ± 0.2 (0.4) | 1.0 ± 0.7 (0.9) | 2.0 ± 1.4 (1.8) | 2.6 ± 1.8 (2.4) | 5.1 ± 3.7 (4.8) | 9.9 ± 6.8 (9.1) | 17.0 ± 12.2 (14.7)
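As a hypothetical use of Table 4, the longest tabulated interval whose mean AEM stays at or below 10% can be read off for each behaviour. This sketch assumes the 10% level marked by the dashed lines in Figs. 4 and 5 as the acceptance limit and applies it to the Table 4 means; it is not the authors' selection rule, and only the tabulated interval lengths can be returned. Because AEM is an average, an interval that passes this check may still give larger errors on individual samplings, which is what PEM10 and EMP90 are meant to capture.

```python
from typing import Optional, Sequence

INTERVALS = [0.5, 1, 2, 5, 10, 15, 30, 60, 120]  # Δ, min (Table 4 columns)
AEM_MEAN = {  # mean AEM, % (Table 4)
    "Eating":     [0.6, 1.4, 2.8, 5.3, 10.1, 13.4, 21.2, 37.1, 62.8],
    "Ruminating": [0.2, 0.4, 0.8, 1.8, 3.3, 4.8, 8.8, 19.6, 32.8],
    "Drinking":   [9.4, 16.6, 26.8, 45.3, 68.9, 87.4, 122.8, 154.1, 175.3],
    "Standing":   [0.1, 0.3, 0.4, 1.1, 2.2, 2.9, 5.6, 10.6, 18.7],
    "Lying":      [0.1, 0.2, 0.4, 1.0, 2.0, 2.6, 5.1, 9.9, 17.0],
}


def longest_acceptable(errors: Sequence[float], limit: float = 10.0) -> Optional[float]:
    """Longest interval before the error first exceeds the limit (None if none)."""
    best = None
    for delta, err in zip(INTERVALS, errors):
        if err > limit:
            break  # assumes error grows with interval length, as it does in Table 4
        best = delta
    return best


for behaviour, errors in AEM_MEAN.items():
    print(behaviour, longest_acceptable(errors))
```

With these values the sketch returns 5 min for eating, 30 min for ruminating and standing, 60 min for lying, and 0.5 min for drinking.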
Table 5. The error magnitude range (EMR, %) expressed as minimum and maximum magnitudes for selected IS interval lengths (Δ).
Behaviour (Δ, min) | 0.5 | 1 | 2 | 5 | 10 | 15 | 30 | 60 | 120
Eating | 0.0 – 4.2 | 0.0 – 8.7 | 0.0 – 17.2 | 0.0 – 37.2 | 0.0 – 57.0 | 0.4 – 76.8 | 0.0 – 133.5 | 0.4 – 181.0 | 0.4 – 330.0
Ruminating | 0.0 – 1.6 | 0.0 – 3.0 | 0.0 – 6.1 | 0.0 – 12.4 | 0.0 – 22.1 | 0.0 – 29.3 | 0.0 – 61.0 | 0.1 – 146.1 | 0.1 – 277.4
Drinking | 0.1 – 55.2 | 0.1 – 117.5 | 0.2 – 189.8 | 0.2 – 442.2 | 0.2 – 620.4 | 0.5 – 707.6 | 12.2 – 1383.3 | 29.9 – 2304.0 | 100.0 – 4162.5
Standing | 0.0 – 0.9 | 0.0 – 2.2 | 0.0 – 4.1 | 0.0 – 6.9 | 0.0 – 19.9 | 0.1 – 25.6 | 0.0 – 45.4 | 0.2 – 96.6 | 1.2 – 118.4
Lying | 0.0 – 0.6 | 0.0 – 1.1 | 0.0 – 2.1 | 0.0 – 6.7 | 0.0 – 13.4 | 0.0 – 15.1 | 0.1 – 47.7 | 0.0 – 60.0 | 0.6 – 84.1