An Approach to Summarize Somatic Cell Score Trends for a Data-Driven, Decision Support System¹

H. G. ALLORE and L. R. JONES²
Department of Animal Science
Cornell University
Ithaca, NY 14853

ABSTRACT

A set of analytical routines was developed to determine significant trends in values for herd DHI somatic cell scores. These trends were used as input to a data-driven, decision support system to aid mastitis management of dairy cattle. The trends of interest were those experienced over the last six DHI sample periods for the entire herd, for three parity groups, and for three stages of lactation. First, cows within the herd were split randomly into two equal groups to account for within-herd variation. For each group, a linear regression of somatic cell score over time was calculated within each parity by stage of lactation group. The 18 slope estimates were then analyzed using a two-way ANOVA to test for main fixed effects of parity, stage of lactation, and their interaction. The results of the trend analyses were converted to facts that were asserted to an embedded expert system for further evaluation.
(Key words: mastitis, expert system, decision support system)

Abbreviation key: SCS = somatic cell score.

INTRODUCTION

Expert systems were first introduced in animal agriculture in 1988 (7). Since then, functional systems for animal agriculture have been slow to appear in the literature, in part because of the complexity of the computer theory, which initially was not understood.

Received July 29, 1994.
Accepted February 9, 1995.
¹This study is part of the NC-119 Regional Project, Dairy Herd Management Strategies for Improved Decision Making and Profitability.
²Corresponding author.

1995 J Dairy Sci 78:1377-1381

There are two general classes of expert systems, knowledge-driven and data-driven. The difference between these classes is their method of operation. Knowledge-driven expert systems operate by collecting data to support or refute the inherent knowledge. These systems generally require substantial data collection and entry by the user. Conversely, a data-driven expert system operates by applying data to a knowledge base, and conclusions are derived through deduction from those data. Data are generally applied in a batch mode, and conclusions are drawn with little or no user interaction. Consequently, because a database already exists, data-driven expert systems are the most appropriate for evaluating DHI data and have been used for interpreting DHI reproduction data (5).

Mastitis is widely recognized as the most costly production disease facing the dairy industry. An important monitor of mastitis status is the SCC of milk. Evaluation of patterns in SCC data can provide information that suggests weaknesses in a strategy for mastitis control. However, extension publications that account for parity structure or stage of lactation generally analyze SCC data only at a single time point. Other techniques have been designed to evaluate clinical mastitis cases (6) or to determine when infections have become clinical (3). The approach used in this study was to examine long-term trends of SCC as a method of evaluating herd programs for mastitis control.

The primary objective of this study was to utilize statistical techniques that convert DHI SCC data into input information for a data-driven analysis system called MAST (2), which evaluates mastitis control. In particular, the goal was to analyze SCC records for individual cows and to determine whether a significant increase or decrease in herd SCC existed and whether either a parity group or a particular stage of lactation had significantly increased or decreased SCC values. These trends assisted the expert system in determining specific management recommendations regarding mastitis control.

TABLE 1. Example data design for regression analysis. SCS records for first parity cows A, B, C, D, E, F, G, and H.¹

                          Stage of lactation
Sample period    Early    Mid           Late
Current          B1       D2, F5, G3    E10, H10
Last month       D1       A5, F4, G2    E9, H9
2 mo ago         G1       A4, F3        E8, C10, H8
3 mo ago         F2       A3            E7
4 mo ago         F1       A2, E6
5 mo ago         A1       E5

¹Subscripts represent sample period (e.g., 1 = first sample period).

MATERIALS AND METHODS

MAST is a decision support system that contains an embedded expert system. In total, MAST contains five modules, but this article focuses on the statistical analysis module. Its functions are performed with database and C language routines. The MAST module uses electronic lactation records for individual cows from the Cornell Dairy Records Processing Laboratory. Somatic cell scores (SCS) are used to represent SCC and are preferable for trend analysis because the log linear transformation, [log10(SCC/12,500)/log10(2)], reduces skewness and kurtosis, increases normality, improves additivity, and results in homogeneity of variance (1).

Individual cow records for test days up to 300 DIM were grouped and summarized by parity (1, 2, and ≥3) and status (active or culled) with the restriction that a maximum of 10 test days were retained. Ten sample periods of 30 d were defined. Because DHI does not collect milk records for cows <7 DIM, the first sample interval was only 7 to 30 DIM.

To analyze SCS trends over the past six sample periods, individual SCS data were represented in the C language as a linked list of structures. Next, the herd was randomly split into two groups by selecting alternating cows and placing all of their SCS records into the appropriate group. With the herd split randomly into two groups, a measure of within-herd variation could be calculated because two sets of samples were now available per herd. The two herd groups that resulted from the splitting routine were further divided based on parity (1, 2, and ≥3 lactations) and stage of lactation (early, 7 to 45 DIM; mid, 46 to 180 DIM; and late, 181 to 300 DIM). Sample periods were sorted, and only the past six observations were analyzed. Management changes were more likely to be detected by viewing the data over this shorter period rather than over the entire 300-d period.

Not all cows had the same pattern or appeared the same number of times in the design. An example pattern is shown in Table 1. Because herds are normally tested at monthly intervals, cows usually had up to two early SCS values, up to five mid SCS values, and up to five late SCS values. For example, a cow currently at 25 DIM would appear only once in the design, in the early stage of lactation for the current sample period (e.g., cow B in Table 1). A cow at 300 DIM would appear six times in the design, four times in the late stage of lactation and twice in the mid stage of lactation (e.g., cow E in Table 1).
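The grouping steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the C routines used in MAST: the record layout (`parity`, `tests`, and the `months_ago` field) and the function names are hypothetical, and calendar time is taken to be months relative to the current sample.

```python
import math
from collections import defaultdict


def scs(scc):
    """Somatic cell score from SCC (cells/mL): log10(SCC / 12,500) / log10(2)."""
    return math.log10(scc / 12500) / math.log10(2)


def stage_of_lactation(dim):
    """Stage class from days in milk, using the cutoffs given in the text."""
    if 7 <= dim <= 45:
        return "early"
    if 46 <= dim <= 180:
        return "mid"
    if 181 <= dim <= 300:
        return "late"
    return None  # outside the 7 to 300 DIM window


def build_design(cows, n_periods=6):
    """Group SCS records by (herd group, parity class, stage of lactation).

    cows: list of dicts with hypothetical keys 'parity' and 'tests', where
    'tests' holds (months_ago, dim, scc) tuples and months_ago = 0 is current.
    """
    design = defaultdict(list)
    for index, cow in enumerate(cows):
        group = "A" if index % 2 == 0 else "B"  # alternating cows
        parity_class = min(cow["parity"], 3)    # 1, 2, or 3 (meaning >= 3)
        for months_ago, dim, scc in cow["tests"]:
            stage = stage_of_lactation(dim)
            if months_ago >= n_periods or stage is None:
                continue  # keep only the past six sample periods
            # x is calendar time in months relative to the current sample
            design[(group, parity_class, stage)].append((-months_ago, scs(scc)))
    return design


# Hypothetical cows mirroring cows B (25 DIM) and E (300 DIM) of Table 1.
cow_b = {"parity": 1, "tests": [(0, 25, 100000)]}
cow_e = {"parity": 1, "tests": [(m, 300 - 30 * m, 200000) for m in range(6)]}
design = build_design([cow_b, cow_e])
```

With these two made-up cows, the first contributes one early record and the second contributes four late and two mid records, which is the pattern the text describes for cows B and E.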

A linear regression of SCS over calendar time was performed for each stage of lactation for the three parity groups within each of the randomly split groups, resulting in 18 slope estimates, as seen in Table 2. The slope estimates, Bij, where i is parity class and j is stage of lactation class, represented a measure of SCS change over the past six sample periods. These slope estimates were analyzed in a two-way ANOVA with replication to test for significant fixed effects of parity, stage of lactation, and the interaction of parity by stage of lactation. Because stage, parity, and stage by parity were fixed effects, the mean sum of squares for replication was used as the error term when the F statistic was calculated for these effects.

TABLE 2. Slope estimates¹ from the two herd groups, A and B.

                    Stage of lactation
Parity    Early         Mid           Late
1         B11A, B11B    B12A, B12B    B13A, B13B
2         B21A, B21B    B22A, B22B    B23A, B23B
≥3        B31A, B31B    B32A, B32B    B33A, B33B

¹Bij, where B is slope estimate, i is parity class (1, 2, and ≥3), and j is stage of lactation class (1 = early, 2 = mid, and 3 = late); A and B are herd groups.

Because herds varied in size and seasonality of calving, not all herds had two slope estimates per cell. Therefore, it was necessary to determine whether the results of the regression routine provided a complete and balanced 3 x 3 matrix with two values per cell. If an observation was missing or a cell was empty, the matrix was modified before the ANOVA was performed; most likely, the early stage had a missing observation. If a slope estimate was missing in the early stage of any parity, the column for early stage of lactation was removed. If a slope estimate was missing for any parity group for mid or late lactation, then the row for that parity group was removed. If more than one row or column was removed from the matrix, then the trend analysis was halted based on too few observations. This procedure ensured that the two-way ANOVA routine would only be done with a complete and balanced design of either a 3 x 3, 2 x 3, or 3 x 2 matrix. Whenever the ANOVA design was not a complete 3 x 3, a fact stating this was asserted to MAST.

All analyses of the ANOVA results were performed at P = .25 to limit Type II errors, because these errors were considered more costly to managers trying to evaluate management practices in light of SCS trends. To test the null hypothesis that the slope is equal to zero on the overall mean, a two-sided t test was used to determine whether the herd had increased, decreased, or had not changed with respect to SCS.
If the result was statistically significant, a fact stating the parameter and resultant sign was asserted to MAST. The mean sum of squares for replication was used as the denominator of the t test. Subsequently, if the F statistic for the interaction term was significant, a fact stating this was asserted to MAST, and testing of the main effects was bypassed. Instead, each cell mean was analyzed. Because multiple tests were performed, Bonferroni's approach (4) for constructing simultaneous estimations of mean responses was applied to test against the correct probability value. Hence, a two-sided t test with nine cells was performed against P = .0138. If the result was significant, a fact stating the parity, stage, and resultant sign was asserted to MAST.

If the interaction term was not significant, then the main effects were tested for significance. If parity was significant, then a two-sided t test was applied to the marginal means. The resulting significance level and sign determined whether the SCS of each parity had increased, decreased, or had not significantly changed. If the result was significant, then a fact stating the parity and resultant sign was asserted to MAST. Similarly, the main effect of stage of lactation was tested. If the result was significant, then a fact stating the stage of lactation and resultant sign was asserted to MAST.

The complete analysis of two example herds is presented in the Appendix. Interaction between parity and stage of lactation was significant for herd A; main effects (parity and stage of lactation) were significant for herd B, but an interaction was not detected.

RESULTS AND DISCUSSION

A data-driven expert system performs deduction on a set of data (or facts) using a collection of rules. The approach utilized in this project was to convert DHI data into statistical statements that served as the input to the expert system. The statistical module determined the trend of SCS for the herd and by parity and stage of lactation over the preceding 6 mo.

The procedures used to determine trends of SCS over time included a splitting routine, simple linear regression, and two-way analysis of variance. Splitting the herd randomly into two groups allowed for a measure of within-herd variation. The simple linear regression returned slope estimates that indicated whether the SCS for the stages of lactation had increased, decreased, or stayed the same over the preceding 6 mo. The final step was to perform a two-way ANOVA. The results of this analysis provided input to the embedded expert system of MAST to resolve several questions. For example, analytical results determined the existence of either increasing or decreasing significant changes in herd SCS; highlighted trends in SCS in early, mid, and late lactation, as well as by parity; and determined whether differences in SCS trends existed between stages of lactation or parities.

Several assumptions were made to analyze the herd SCS trend. The structure of the data consisted of consecutive repeated measures from current lactations of varying lengths, resulting in higher correlations between consecutive measures than between nonconsecutive measures and in the common occurrence of unequal numbers of observations per cow. However, ANOVA constraints were that observations were independent, the data design was balanced, and SCS were normally distributed with a homogeneous variance. We determined that the average dairy farm in New York state had too few cows to use any method that would remove the repeated measures problems, and the influence of this problem was thought to be small. Additionally, all observations were included, although a minimum of two observations was needed for each calculation. Furthermore, prior work into the validity of using SCS in statistical analysis (1) indicated that all basic assumptions behind the analysis were met. Finally, if testing intervals were bimonthly or greater, this situation was flagged by the program so that statistical analysis for trend was bypassed, although heuristic analysis by the expert system occurred.

Some problems associated with this set of statistical procedures were found when analyses were attempted for small herds or herds with significant seasonal calving patterns. A minimum number of observations per cell was required for these routines to be valid. A limit was set of two observations per slope estimate and two slope estimates per cell in the ANOVA design.
If there were fewer than two observations, no slope estimate could be calculated. If there were fewer than two slope estimates per cell in the two-way ANOVA for the early stage of lactation, the column for early stage of lactation was removed. If there was a missing observation in the mid or late stages, then the row for that parity was removed. Analysis was only carried out on complete and balanced designs: 3 x 3, 3 x 2, and 2 x 3. Smaller matrices were thought to contain too little useful management information, although they could be heuristically analyzed by the expert system.

CONCLUSIONS

Expert systems normally have utilized heuristics to draw conclusions from a set of data. However, other traditional techniques of data analysis should be included in a data-driven decision support system when appropriate. In this study, traditional techniques for statistical summarization were used to develop statements regarding SCS trends. These statements served as the input to the embedded expert system. This approach should be carefully considered for other data-driven decision support systems.

ACKNOWLEDGMENTS

The authors gratefully thank G. Casella for statistical advice. This study was partially funded by Northeast DHIA.

REFERENCES

1 Ali, A.K.A., and G. E. Shook. 1980. An optimum transformation of somatic cell concentration in milk. J. Dairy Sci. 63:487.
2 Allore, H. G., L. R. Jones, W. G. Merrill, and P. A. Oltenacu. 1995. A decision support system for evaluating mastitis information. J. Dairy Sci. 78:1382.
3 Elvinger, F., R. C. Littell, R. P. Natzke, and P. J. Hansen. 1991. Analysis of somatic cell count data by a peak algorithm to determine inflammation events. J. Dairy Sci. 74:3396.
4 Neter, J., W. Wasserman, and M. H. Kutner. 1990. Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs. Irwin, Homewood, IL.
5 Oltenacu, P. A., J. D. Ferguson, and A. J. Lednor. 1990. A data-driven expert system to evaluate management, environmental, and cow factors influencing reproductive efficiency in dairy herds. J. Dairy Sci. 73(Suppl. 1):145. (Abstr.)
6 Schukken, Y. H., G. Casella, and J. van den Broek. 1991. Overdispersion in clinical mastitis data from dairy herds: a negative binomial approach. Prev. Vet. Med. 10:239.
7 Spahr, S. L., L. R. Jones, and D. E. Dill. 1988. Expert systems - their use in dairy herd management. J. Dairy Sci. 71:879.

APPENDIX

Herd A

TABLE A1. Slope estimates from linear regression.

                          Stage of lactation
Parity   Replicate   Early     Mid       Late
1        1            .8500    -.0113     .1774
         2            .7000    -.0427    -.1042
2        1           -.2182    -.0678    -.1200
         2           -.5365    -.0959    -.0583
3+       1            .7719    -.1198     .2105
         2            .6529    -.1843     .1357
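Each entry in a table of this kind is the slope of a least-squares line fitted to SCS against calendar time for one parity by stage of lactation cell. A minimal sketch of that calculation follows. The function name and the input values are hypothetical, time is assumed to be measured in months relative to the current sample, and records from all cows in a cell are assumed to be pooled into one regression.

```python
def slope(times, scores):
    """Least-squares slope of scores on times, or None if it cannot be estimated."""
    if len(times) != len(scores):
        raise ValueError("times and scores must have the same length")
    n = len(times)
    if n < 2:
        return None  # the article requires at least two observations
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return None  # all observations come from one sample period
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx


# Hypothetical SCS values for one cell, oldest sample period first.
months = [-5, -4, -3, -2, -1, 0]
scores = [3.0, 3.2, 3.1, 3.5, 3.6, 3.8]
estimate = slope(months, scores)  # SCS units per month
```

Returning `None` instead of raising keeps a missing slope visible to whatever step later checks whether the design is complete.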

TABLE A2. Results from the two-way ANOVA.

Source         df    SS        MS       F          P
Parity          2     .7605    .3803    29.4295    .0001
Stage           2     .6676    .3338    51.6688    .0001
Interaction     4    1.0035    .2509    19.4162    .0002
Replication     9     .1166    .0129
Total          17    2.5480
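The sums of squares in Table A2 can be checked from the slopes in Table A1 with a textbook balanced two-way ANOVA with replication. The sketch below is not the MAST routine; the function name and the nested-list layout are assumptions made for illustration.

```python
def two_way_anova(cells):
    """Two-way ANOVA with replication for a balanced design.

    cells[i][j] is the list of replicate slope estimates for parity class i
    and stage class j. Returns sums of squares and F ratios.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    cf = sum(values) ** 2 / (a * b * n)  # correction for the mean
    ss_total = sum(x * x for x in values) - cf
    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(row[j]) for row in cells) for j in range(b)]
    ss_parity = sum(t * t for t in row_totals) / (b * n) - cf
    ss_stage = sum(t * t for t in col_totals) / (a * n) - cf
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / n - cf
    ss_inter = ss_cells - ss_parity - ss_stage
    ss_rep = ss_total - ss_cells
    ms_rep = ss_rep / (a * b * (n - 1))  # error term for the fixed effects
    return {
        "ss": {"parity": ss_parity, "stage": ss_stage,
               "interaction": ss_inter, "replication": ss_rep,
               "total": ss_total},
        "f": {"parity": ss_parity / (a - 1) / ms_rep,
              "stage": ss_stage / (b - 1) / ms_rep,
              "interaction": ss_inter / ((a - 1) * (b - 1)) / ms_rep},
    }


# Slopes from Table A1: HERD_A[parity][stage] = [replicate 1, replicate 2].
HERD_A = [
    [[0.8500, 0.7000], [-0.0113, -0.0427], [0.1774, -0.1042]],
    [[-0.2182, -0.5365], [-0.0678, -0.0959], [-0.1200, -0.0583]],
    [[0.7719, 0.6529], [-0.1198, -0.1843], [0.2105, 0.1357]],
]
result = two_way_anova(HERD_A)
```

For these data the sketch gives sums of squares that agree with Table A2 to about three decimals, and F ratios near 29.4 for parity and 19.4 for interaction. For stage it gives about 25.8, which is what the printed mean squares imply (.3338/.0129), not the printed 51.6688.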

• SCS trend for the herd is increasing.
• Interaction effect is significant.
• SCS trend for early lactation first parity cows is increasing.
• SCS trend for early lactation third and greater parity cows is increasing.

Figure A1. Information asserted to expert system for herd A. SCS = Somatic cell score.
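Because the interaction was significant for herd A, the cell means were tested against the Bonferroni-adjusted probability given in the Methods. The article does not show the arithmetic. One reading that matches the reported value, offered here as an assumption, divides the overall level of .25 by twice the number of cells (g = 9), which is the per-tail level for g simultaneous two-sided tests:

```latex
\[
P_{\text{cell}} \;=\; \frac{\alpha}{2g} \;=\; \frac{0.25}{2 \times 9} \;\approx\; 0.0139
\]
```

Truncated to four decimals, this is the P = .0138 reported in the Methods.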

• SCS trend for the herd is increasing.
• Parity effect is significant.
• Stage effect is significant.
• SCS trend for first parity cows is increasing.
• SCS trend for second parity cows is increasing.
• SCS trend for third and greater parity cows is decreasing.
• SCS trend for early stage of lactation is increasing.
• SCS trend for mid stage of lactation is decreasing.
• SCS trend for late stage of lactation is increasing.

Figure A2. Information asserted to expert system for herd B. SCS = Somatic cell score.

Herd B

TABLE A3. Slope estimates from linear regression.

                          Stage of lactation
Parity   Replicate   Early     Mid       Late
1        1            .2587     .1218     .1744
         2            .1121     .0158     .2159
2        1            .2711    -.0884     .2065
         2           -.0615    -.0390     .2041
3+       1            .2384    -.1309     .0504
         2            .0501    -.0561    -.2048
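Both example herds produced a complete 3 x 3 design with two slope estimates per cell. For herds that do not, the Methods describe a reduction rule, which can be sketched as below. The function name and the count-matrix input are hypothetical, and the rule is read literally: a missing early slope always removes the early column, even when removing a parity row would also have resolved it.

```python
def reduce_design(counts, needed=2):
    """Return (parity_rows, stage_cols) to keep, or None if analysis is halted.

    counts[i][j] is the number of slope estimates available for parity class i
    (0, 1, 2) and stage class j (0 = early, 1 = mid, 2 = late).
    """
    rows, cols = [0, 1, 2], [0, 1, 2]
    removed = 0
    # A missing early-lactation slope in any parity drops the early column.
    if any(counts[i][0] < needed for i in rows):
        cols.remove(0)
        removed += 1
    # A missing mid or late slope drops the row for that parity group.
    for i in [0, 1, 2]:
        if any(counts[i][j] < needed for j in (1, 2)):
            rows.remove(i)
            removed += 1
    if removed > 1:
        return None  # too few observations for trend analysis
    return rows, cols


complete = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
no_early = [[1, 2, 2], [2, 2, 2], [2, 2, 2]]    # one early slope missing
no_late_p3 = [[2, 2, 2], [2, 2, 2], [2, 2, 1]]  # late slope missing, parity 3+
too_sparse = [[1, 2, 2], [2, 2, 2], [2, 2, 1]]  # both problems at once
```

The three designs that survive are 3 x 3, 3 x 2 (early column removed), and 2 x 3 (one parity row removed), matching the cases listed in the text.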

TABLE A4. Results from the two-way ANOVA.

Source         df    SS       MS       F         P
Parity          2    .0760    .0380    2.6954    .1210
Stage           2    .1011    .0506    7.1757    .0137
Interaction     4    .0631    .0158    1.1189    .4056
Replication     9    .1269    .0141
Total          17    .3671
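The directions listed in Figure A2 can be compared with the signs of the herd and marginal means of the slopes in Table A3. The sketch below computes those means and one plausible t statistic. The form of the statistic (mean divided by the square root of the replication mean square over the number of slopes averaged) is an assumption, because the article states only that the replication mean square was used in the denominator. No P-values are computed, so the sketch reproduces signs only.

```python
import math

# Slopes from Table A3: HERD_B[parity][stage] = [replicate 1, replicate 2].
HERD_B = [
    [[0.2587, 0.1121], [0.1218, 0.0158], [0.1744, 0.2159]],
    [[0.2711, -0.0615], [-0.0884, -0.0390], [0.2065, 0.2041]],
    [[0.2384, 0.0501], [-0.1309, -0.0561], [0.0504, -0.2048]],
]
MS_REPLICATION = 0.0141  # replication mean square from Table A4


def mean(values):
    return sum(values) / len(values)


def direction(value):
    # A mean of exactly zero is not handled separately in this sketch.
    return "increasing" if value > 0 else "decreasing"


def t_statistic(values, ms_error):
    """Mean divided by sqrt(ms_error / n); the exact form is an assumption."""
    return mean(values) / math.sqrt(ms_error / len(values))


herd = [x for row in HERD_B for cell in row for x in cell]
parity = [[x for cell in row for x in cell] for row in HERD_B]
stage = [[x for row in HERD_B for x in row[j]] for j in range(3)]

herd_trend = direction(mean(herd))
parity_trends = [direction(mean(v)) for v in parity]
stage_trends = [direction(mean(v)) for v in stage]
herd_t = t_statistic(herd, MS_REPLICATION)
```

The signs obtained this way agree with the directions in Figure A2 for the herd, the three parity groups, and the three stages of lactation.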

Journal of Dairy Science Vol. 78, No.6, 1995