Preventive Veterinary Medicine, 11 (1991) 37-54
Elsevier Science Publishers B.V., Amsterdam
Simultaneous-equations bias of animal production systems
Ivar Vågsholm^a, Tim E. Carpenter^a, Richard E. Howitt^b
^a Department of Epidemiology and Preventive Medicine, School of Veterinary Medicine, University of California at Davis, CA 95616, USA
^b Department of Agricultural Economics, University of California at Davis, CA 95616, USA
(Accepted for publication 8 March 1991)
ABSTRACT
Vågsholm, I., Carpenter, T.E. and Howitt, R.E., 1991. Simultaneous-equations bias of animal production systems. Prev. Vet. Med., 11: 37-54.
The problem of simultaneous equations and their ensuing bias when examining animal production is the result of: (1) true biological interactions; (2) aggregated observation periods; and/or (3) the fact that in multiple-output production processes, economic theory of production predicts a simultaneous determination of the optimum output levels. The simultaneous-equations bias should be considered if the conceptual analysis of the biological system suggests the presence of the bias. A standard linear model (such as the ordinary least-squares regression model) which ignores this simultaneity might produce biased and inconsistent regression coefficients. In the present study of milk production, the different estimates and significance levels of the regression coefficients of a simultaneous-equations model compared with an ordinary least-squares regression model indicated the presence of simultaneous-equations bias. A practical consequence of ignoring this bias is the flawed estimation of herd health benefits, e.g. seven times larger benefits from improved reproduction were indicated if the simultaneous-equations bias was accounted for. The population studied was the 1913 dairy herds participating in the Norwegian farm efficiency program during 1987.
INTRODUCTION
Simultaneous equations and their ensuing bias have received a lot of attention in econometric studies (e.g. the study of supply and demand for food by Girshick and Haavelmo (1947)). We suggest that the simultaneous-equations bias should also be suspected when examining animal production systems such as dairying. Interacting production processes (such as we find in animal production) arise either from true biological interactions or from aggregated observation periods. They both result in simultaneity in the observations. Moreover, in multiple-output production processes (milk and meat), economic theory of production shows that the most efficient output levels are determined jointly
0167-5877/91/$03.50 © 1991 Elsevier Science Publishers B.V.
or simultaneously (Beattie and Taylor, 1985). Well-specified statistical models should account for these interdependencies (sometimes termed the 'systems problem') between the outputs. A standard linear model (such as the ordinary least-squares regression model) which ignores this simultaneity risks the simultaneous-equations bias, which causes biased and inconsistent regression coefficients (Kmenta, 1986). Accordingly, if the simultaneous-equations bias were present, we expected the regression coefficients to have different estimates and significance levels in a simultaneous-equations model compared with an ordinary least-squares regression model.
Path analysis (Erb et al., 1981) and simultaneous equations (Kmenta, 1986) are model specifications that consider the systems problem. However, path analysis assumes unidirectional flows of causation (recursive structures), while simultaneous-equations models do not. We cannot a priori assume unidirectional flows of causation when studying milk production (e.g. culling rates might affect milk production, but milk production might also affect culling rates). Consequently, the simultaneous-equations model is the suitable method for studying milk production and other simultaneous production processes (with more than one output).
We will first explain what a simultaneous relationship is, then show why the simultaneous-equations bias matters (non-zero expectation of the error term). The consequences of the simultaneous-equations bias will be examined by comparing the production function of milk production estimated using a simultaneous model with that estimated using an ordinary least-squares regression model.
MATERIALS AND METHODS
Illustration of the simultaneous-equations bias
We will illustrate the simultaneous-equations problem using an example from the travails of everyday life and, later, give a brief explanation of why the simultaneous-equations bias matters. Let the $\theta_i$ indicate regression coefficients of endogenous variables (determined by the system), the $\beta_i$ indicate regression coefficients of the exogenous variables (determined outside the system) and $\beta_0$ indicate the intercept term. Then, let us assume that we can write a linear equation for how a person, say a graduate student, allocates his/her time as a function of work requirements.
Graduate student:
\[ \text{Time} = \beta_0 + \beta_1 \times \text{work} \tag{1} \]
The work requirements are assumed to be determined independently of the graduate student (exogenously determined). Hence, we can use a single-equation regression model (such as an ordinary least-squares regression model) to estimate this relationship. As time passes, this graduate student forms a family (spouse 1 and spouse 2) and now life becomes more complicated. They allocate their time according to the following equations
Spouse 1:
\[ \text{Time} = \beta_{01} + \theta_1 \times \text{spouse 2's time} + \beta_1 \times \text{work} \tag{2} \]
Spouse 2:
\[ \text{Time} = \beta_{02} + \theta_2 \times \text{spouse 1's time} + \beta_2 \times \text{work} \tag{3} \]
The critical question is now whether the spouses spend their time independently of each other (the $\theta_i$ are truly zero), or in an interdependent relationship with each other (non-zero $\theta_i$). If both spouses take the other's time allocation into consideration, we have a simultaneous system of relationships. In this case, using single-equation models (such as ordinary least-squares regression models) would give misleading results due to the simultaneous-equations bias. In the case when only one of the spouses takes the other's time allocation into consideration, we have a triangular system of relationships (Kmenta, 1986), which lends itself to analysis by path analysis. It is the conceptual understanding of the relationships which determines whether a simultaneous-equations model, a path model or a single-equation model is appropriate. However, the simultaneous model has the advantage of including the other specifications as special cases.
Let us now jump from the travails of everyday life to the world of animal production systems, and examine why the simultaneous-equations bias matters. Consider the following specification of a simultaneous system where disease ($Y_1$) and milk production ($Y_2$) are endogenous variables (determined by the system), veterinary inputs ($X_1$) are considered exogenous variables (determined outside the system) and the error terms are $e_1$ and $e_2$.
Disease:
\[ Y_1 = \theta_1 \times Y_2 + \beta_1 \times X_1 + e_1 \tag{4} \]
Production:
\[ Y_2 = \theta_2 \times Y_1 + \beta_2 \times X_1 + e_2 \tag{5} \]
The estimated value of the dependent variable ($\hat{Y}_1$) of eqn. (4) is
\[ \hat{Y}_1 = \theta_1 \times Y_2 + \beta_1 \times X_1 \tag{6} \]
Thus, the error term $e_1$ of eqn. (4) could be written as
\[ e_1 = Y_1 - \hat{Y}_1 \tag{7} \]
The error term ($e_1$) would have a zero expectation under the assumptions of the ordinary least-squares model. What is the case in a simultaneous system? Substituting eqn. (5) into eqn. (4) gives
\[ Y_1 = \theta_1 \times (\theta_2 \times Y_1 + \beta_2 \times X_1 + e_2) + \beta_1 X_1 + e_1 \tag{8} \]
The error term ($e_1$) will then be eqn. (8) minus eqn. (6)
\[ e_1 = Y_1 - \hat{Y}_1 = \theta_1 \times \theta_2 Y_1 + \theta_1 \times \beta_2 \times X_1 + \theta_1 e_2 + \beta_1 X_1 - \theta_1 Y_2 - \beta_1 X_1 \tag{9} \]
Equation (9) can be simplified to
\[ e_1 = \theta_1 \times (\theta_2 Y_1 - Y_2) + \theta_1 \times \beta_2 \times X_1 + \theta_1 \times e_2 \tag{10} \]
Unless the regression coefficient ($\theta_2$) of the endogenous variable in eqn. (5) is zero, the error term ($e_1$) of eqn. (4) will have a non-zero expectation. Consequently, ordinary least-squares estimates of the regression coefficients in eqn. (4) are biased and inconsistent, both for the endogenous and exogenous variables. Moreover, the simultaneous-equations bias violates the assumption of independence between the error term and independent variables.
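The consequence described above can be checked numerically. The following is a minimal sketch, not the analysis of this study: it simulates a two-equation system of the same shape as the disease and production equations (4) and (5), but adds a second exogenous variable (X2, hypothetical) that enters only the production equation so that the disease equation can be estimated consistently, and it uses two-stage least squares (2SLS) as a simpler stand-in for the 3SLS estimator. All parameter values and function names are arbitrary assumptions made for illustration.

```python
import random


def fit_two(y, a, b):
    """Least-squares coefficients (ca, cb) for y = ca*a + cb*b (no intercept)."""
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * w for u, w in zip(a, y))
    sby = sum(v * w for v, w in zip(b, y))
    det = saa * sbb - sab * sab
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


def simulate(n=20000, seed=1, theta1=-0.5, theta2=0.8, beta1=1.0, beta2=1.0):
    """Return (OLS, 2SLS) estimates of theta1 in Y1 = theta1*Y2 + beta1*X1 + e1."""
    rng = random.Random(seed)
    d = 1.0 - theta1 * theta2
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)    # exogenous X1 and X2 (assumed)
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent error terms
        # Reduced form for Y1, then the structural production equation for Y2.
        v1 = (beta1 * a + theta1 * beta2 * b + e1 + theta1 * e2) / d
        v2 = theta2 * v1 + beta2 * b + e2
        x1.append(a)
        x2.append(b)
        y1.append(v1)
        y2.append(v2)
    # Single-equation OLS: regress Y1 on Y2 and X1, ignoring the simultaneity.
    ols, _ = fit_two(y1, y2, x1)
    # 2SLS: first predict Y2 from the exogenous variables, then use the prediction.
    p1, p2 = fit_two(y2, x1, x2)
    y2_hat = [p1 * a + p2 * b for a, b in zip(x1, x2)]
    tsls, _ = fit_two(y1, y2_hat, x1)
    return ols, tsls


ols_theta1, tsls_theta1 = simulate()
print(f"true theta1 = -0.5, OLS = {ols_theta1:.3f}, 2SLS = {tsls_theta1:.3f}")
```

Under these assumed values the single-equation estimate stays well away from the true coefficient even in a large sample, whereas the 2SLS estimate lies close to it, because Y2 is correlated with e1 whenever theta2 is non-zero.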
Simultaneous model identification
In simultaneous-equations models, endogenous variables are considered to be random and simultaneously determined by the system. Exogenous variables are considered to be predetermined or fixed in repeated samples and are used as explanatory variables (Kmenta, 1986). Given the endogenous variables and the concept that these simultaneously affect each other, the regression (or structural) equations each have some endogenous variables and some exogenous variables as explanatory variables. This form of writing the equations is called the structural form because the structure of each equation is written out explicitly. However, this form of writing the equations introduces the analytical problem of inconsistency if we estimate each equation as dependent on the exogenous and endogenous variables. Nevertheless, by algebraic manipulation shown in the Appendix, the simultaneous model can be rewritten in a reduced form. The reduced form has the convenient structure of having only the endogenous variables on the left side of the equality and only exogenous variables and error terms on the right. If the reduced form is estimated using linear regression, the results are consistent. One problem remains: can we calculate the unique structural coefficients from the reduced-form coefficients? The mathematical conditions under which the structural form can be calculated uniquely from the reduced form are called the identification conditions. The identification conditions are met by imposing restrictions on the exogenous variables in the structural equations. The restrictions are often in the form of variables having their coefficients set to zero (omitted) from certain equations, and/or equality restrictions.
A convenient rule of thumb is that if the number of exogenous variables omitted (coefficients set equal to zero) in an equation is at least as large as the number of included endogenous variables minus one, the equation is probably identified by zero restrictions; this is also called the order condition. Additional complications of identification are addressed in the Appendix.
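As a worked illustration of the reduced form and the order condition, consider the two-equation disease and production system of eqns. (4) and (5), writing $\theta_i$ for the coefficients of the endogenous variables and $\beta_i$ for those of the exogenous variable. The symbols $\pi_1$, $\pi_2$, $v_1$ and $v_2$ are notation introduced here for the reduced-form coefficients and errors, and the derivation assumes $\theta_1 \theta_2 \neq 1$. It is a sketch of the idea, not the Appendix derivation.

```latex
% Structural form: Y_1 = \theta_1 Y_2 + \beta_1 X_1 + e_1 ,  Y_2 = \theta_2 Y_1 + \beta_2 X_1 + e_2
\begin{aligned}
Y_1 &= \frac{\beta_1 + \theta_1 \beta_2}{1 - \theta_1 \theta_2}\, X_1
       + \frac{e_1 + \theta_1 e_2}{1 - \theta_1 \theta_2}
     = \pi_1 X_1 + v_1 \\
Y_2 &= \frac{\beta_2 + \theta_2 \beta_1}{1 - \theta_1 \theta_2}\, X_1
       + \frac{\theta_2 e_1 + e_2}{1 - \theta_1 \theta_2}
     = \pi_2 X_1 + v_2
\end{aligned}
% Order condition for each equation:
%   omitted exogenous variables = 0  <  included endogenous variables - 1 = 2 - 1 = 1
```

The two reduced-form coefficients $\pi_1$ and $\pi_2$ can be estimated consistently, but they cannot be solved for the four structural coefficients $\theta_1$, $\theta_2$, $\beta_1$ and $\beta_2$. Neither equation omits an exogenous variable, so the order condition fails for both, which is why restrictions are needed before the structural coefficients can be recovered.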
Study population, unit of interest
The unit of interest was the dairy herd. The inclusion criterion was participation in the cow health card program, the animal production recording program (Husdyrkontrollen), and the farm efficiency program (Effektivitetskontrollen). Of a total population of 31 279 dairy herds, 1913 (approximately 6%) were included in the study. Nine (approximately 0.5%) of the included herds had one or more missing observations.
Data origin and variable description
Data were obtained from the animal production recording program on nutrition, reproductive performance, herd size, culling, replacement and milk production. A technician from the program records the milk production on monthly farm visits and records the nutrition annually. The nutrition was measured in barley feed units (Presthegge and Engan-Skei, 1976), which is the energy content of 1 kg barley used for fattening. Data on milk deliveries per farm were obtained from the dairy factories. Data on disease incidence were obtained from the cow health card program, in which about 95% of the animal production herds participated (Norwegian Dairies Association (NML), 1988). In Norwegian dairy herds, an individual cow health card is kept on which veterinarians note clinical signs, diagnosis and treatment every time they treat the cow (Ministry of Agriculture, 1985). Because medical treatments of a Norwegian dairy cow require an individual veterinary diagnosis, this provides us with a reliable indicator of the dairy herd's disease status. The herd's cumulative incidence (hereafter referred to as incidence) of a disease was estimated as the number of cows treated for the particular disease (excluding recurrent cases in that year) divided by herd size (annual average number of lactating cows in the herd). Geographic regions might influence the milk production process since Norway is a geographically diverse country. Dummy variables for the districts of regional veterinary officers represented the geographical regions (Ministry of Agriculture, 1985). The variables (listed by acronyms) and their descriptions appear in Table 1. The 90th, 50th and 10th percentiles of the variables are also described (Table 2).
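The incidence definition above can be written as a small calculation. This is a minimal sketch under assumed inputs: the function name, the cow identifiers and the herd size are hypothetical, and the health card records themselves are not reproduced here.

```python
def cumulative_incidence(treated_cow_ids, herd_size):
    """Cows treated at least once for a disease, divided by herd size.

    treated_cow_ids: one entry per treatment during the year (hypothetical IDs);
    repeat treatments of the same cow are counted once (recurrent cases excluded).
    herd_size: annual average number of lactating cows in the herd.
    """
    if herd_size <= 0:
        raise ValueError("herd_size must be positive")
    return len(set(treated_cow_ids)) / herd_size


# Three treatments, two distinct cows, in a herd averaging 10 cows.
print(cumulative_incidence(["cow07", "cow12", "cow07"], 10))
```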
TABLE 1
Endogenous and exogenous variables used in a study of the simultaneous-equations bias (1913 dairy herds which participated in the Norwegian farm efficiency program during 1987)
Acronym     Description
Endogenous
BMSCC       Annual geometric mean of bulk tank milk somatic cell count for milk delivered to the dairy factory (1000 per ml)
CULL        Herd culling rate (%)
FPCM        Herd average milk produced (kg) per cow calculated from the formula FPCM = 0.297 × M + 11.8 × F + 7 × P, where M = kg milk yield, F = kg butterfat, P = kg protein. FPCM includes milk withheld to avoid drug residues and milk consumed on the farm
FS          Herd fertility index calculated as FS = {[(NR60 + TW)/NAI] - [CIDAY - 125]} × (CCAI - CCOW)/CCAI, where NR60 = non-return percentage at Day 60 post-insemination, TW = number of inseminations during the same heat, NAI = number of inseminations per cow or heifer, CIDAY = time in days between calving and last insemination, CCAI = number of cows inseminated, CCOW = number of cows culled for infertility
MILKD       Herd average milk delivered (kg) per cow to the dairy factory calculated using the same formula as FPCM. MILKD excludes milk withheld to avoid drug residues and milk consumed on the farm
Exogenous
CINT        Herd average calving interval in months
CONC^2      Feeding of concentrates
KETO^1      Ketosis
MILKF^1     Milk fever (hypocalcemia)
OFEED^2     Feeding of dry straw, alkali- or ammonia-treated straw, or other feedstuffs
OTHDZ^1     Other diseases including dystocia, traumatic reticuloperitonitis, laminitis, acute indigestion, arthritis, tendinitis, bursitis and ungual diseases
PAST^2      Pasture feeding
PSCONC^2    Feeding of roots, potatoes, creamery and brewery by-products
REG2^3      Region 2 (Hedmark and Oppland counties, 520 herds)
REG3        Region 3 (Vestfold, Buskerud and Telemark, 57 herds)
REG4        Region 4 (Rogaland, Aust-Agder and Vest-Agder, 233 herds)
REG5        Region 5 (Sogn & Fjordane and Hordaland counties, 123 herds)
REG6        Region 6 (Møre & Romsdal, S-Trøndelag and N-Trøndelag, 770 herds)
REG7        Region 7 (Nordland, Troms and Finnmark counties, 170 herds)
REPL        Herd replacement rate (%)
REPRO^1     Reproductive diseases: cystic ovaries, metritis, silent heat and retained placenta
ROUGH^2     Feeding of hay, silage, fresh grass and grass cubicles
UDDER^1     Udder diseases: clinical mastitis and teat lesions
^1 Disease incidence was the number of cows treated for a particular disease divided by the annual average number of cows in the herd (årskyr) expressed as a percent.
^2 Measured in barley feed units which are defined as the energy content of 1 kg barley used for fattening.
^3 Region 1 (Østfold, Akershus and Oslo, 76 herds) was captured in the intercept. The regions were the districts of the regional veterinary officers.
TABLE 2
Distribution of the endogenous and exogenous variables (1913 dairy herds which participated in the Norwegian farm efficiency program during 1987)
Variable               90th percentile   Median    10th percentile
BMSCC (1000 per ml)    291               173       101
CULL (%)               65                43        24
FPCM (kg)              7350              6413      5158
FS                     93                65        34
MILKD (kg)             6882              6028      5172
CINT (months)          13.3              12.3      11.7
CONC^2                 2196              1761      1382
KETO^1                 0.508             0.172     0
MILKF^1                0.213             0.078     0
OFEED^2                232               0         0
OTHDZ^1                0.416             0.152     0
PAST^2                 1031              589       0
PSCONC^2               456               0         0
REPL (%)               63                41        24
REPRO^1                0.351             0.123     0
ROUGH^2                2229              1609      1411
UDDER^1                0.747             0.400     0.127
^1 Disease incidence was the number of cows treated for a particular disease divided by the annual average number of cows in the herd (årskyr).
^2 Measured in barley feed units which are defined as the energy content of 1 kg barley used for fattening.
Conceptual model of milk production
Dairying has at least two outputs at the conceptual level (milk and meat), and the farmer has to find optimal production levels of milk and meat. Economic theory of production with more than one output (Beattie and Taylor, 1985) implies that those (optimum) output levels (of milk and meat) are determined simultaneously. Accordingly, economic theory of production indicates that milk production is such a simultaneous system.
Milk production has more outcomes than merely milk produced, e.g. milk quality (for which bulk tank milk somatic cell counts (BMSCC) are a common indicator). High BMSCC indicate mastitis, which depreciates milk quality. In Norway, dairy farmers are paid a premium for delivering milk with low BMSCC and are penalized if the milk delivered to the dairy factory has repeatedly high BMSCC. Therefore, economic theory indicates that the management-chosen relationship between BMSCC and milk production might be simultaneous. Moreover, increased BMSCC decrease milk production (Raubertas and Shook, 1982; Dohoo et al., 1984; Jones et al., 1984; Salsberg et al., 1984). Conversely, Emanuelson and Persson (1984) found that high milk production was associated with low BMSCC (possibly a dilution effect). Hence, the biological relationship between milk production and BMSCC might also be a simultaneous one.
Culling and reproduction might have a simultaneous relationship, since both the events and the management response take place during the observation period. Reproduction is measured by a fertility index (FS) (Refsdal, 1982), which is also a function of the number of cows culled for reproductive problems (Table 1). If a farmer has cows with reproductive problems, the routine management response is to cull cows which do not respond to veterinary treatment. On the other hand, a policy of high cull rates (thus producing a lot more meat) could conceivably lead to improved reproduction. Further, the relationship between culling and BMSCC or milk production could also
be simultaneous because the events (high BMSCC and/or low milk production) and management response (culling) occur during the observation period, and because culling itself would also influence the levels of BMSCC and milk production.
The farmer also has to optimize reproduction and milk production simultaneously. A farmer needs good reproduction to maintain both the stock of producing cows and high milk production for revenues. If the reproduction is substandard, a lower milk production is expected. However, there is some evidence indicating that a high milk production (which might cause prolonged negative energy balances) could also impair reproductive performance (Radostits and Blood, 1985). Hence, when looking at the relationship between reproduction and milk production, a simultaneous relationship is both biologically and economically reasonable.
We will now explain some additional concepts of our model. By including milk production (FPCM) and milk deliveries (MILKD) in the model, we captured both the disease's impact on the cow's production potential and the impact on milk delivered, respectively. Milk production is an estimate of a cow's milk production based on monthly weighing of a day's milk production. This measure of milk production includes the milk withheld from delivery by the farmer to avoid drug residues. For example, the milk produced by a cow with clinical mastitis receiving antibiotic treatment is withheld for 7-10 days. Milk delivered is the actual amount of milk delivered to the dairy plant and excludes milk withheld due to drug residues. We assumed that milk produced influenced the amount of milk delivered, but assumed no reciprocal influence (not a simultaneous relationship). Furthermore, a possible dilution effect of milk delivery on BMSCC (the larger the amount delivered, the lower the BMSCC) was suspected.
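For readers who want to reproduce the milk measure, the FPCM formula given in Table 1 can be sketched as a function. This assumes the butterfat coefficient in Table 1 reads 11.8; the function name and the example yields are hypothetical.

```python
def fpcm(milk_kg, butterfat_kg, protein_kg):
    """Table 1 formula for FPCM: 0.297*M + 11.8*F + 7*P (all inputs in kg)."""
    return 0.297 * milk_kg + 11.8 * butterfat_kg + 7.0 * protein_kg


# Hypothetical cow: 6000 kg milk containing 240 kg butterfat and 198 kg protein.
print(round(fpcm(6000, 240, 198), 1))
```

The same formula applies to MILKD, with the withheld milk and the milk consumed on the farm left out of the inputs.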
Accordingly, in this study milk production (FPCM), milk delivery (MILKD), bulk tank milk somatic cell counts (BMSCC), reproduction (FS) and culling rate (CULL) were the endogenous variables. The exogenous variables (which were determined outside the system) were nutrition, disease incidence, geographical region, calving interval (CINT) and replacement rate (REPL). The calving interval was an exogenous variable because farmers frequently prefer a concentrated spring or fall calving. Hence, the desirable calving interval (about 12 months) is determined by the seasonal structure of dairying, rather than by the milk production process itself. The replacement rate was perceived as exogenous because farmers usually determine in advance how many replacement heifers are available (before deciding how many and which cows to cull).
Identification: restrictions applied and statistical analysis
The conceptual model was now translated into a simultaneous-equations model consisting of five structural equations, each with one endogenous variable on the left-hand side, to be explained by endogenous and exogenous variables on the right-hand side. However, this conceptual model did not satisfy the necessary order conditions required for unique estimation of the regression (structural) coefficients. Hence, to identify the simultaneous model, we had to impose either zero or equality restrictions on the regression coefficients (the latter applied only to regional effects). Based on prior assumptions, we could omit certain exogenous variables from the equations (which is equivalent to applying zero restrictions): (1) from the milk production (FPCM) model, the calving interval (CINT) and the replacement rate (REPL); (2) from the milk delivery (MILKD) model, the nutrition variables (ROUGH, CONC, PSCONC, PAST and OFEED), the calving interval (CINT) and the replacement rate (REPL); (3) from the BMSCC model, the calving interval (CINT) and the incidence of reproductive diseases (REPRO); (4) from the reproduction (FS) model, the incidence of udder disease (UDDER) and milk fever (MILKF); (5) from the culling (CULL) model, the nutrition variables (ROUGH, CONC, PSCONC, OFEED and PAST) and the incidence of ketosis (KETO). The calving interval could be excluded from the milk production model because the endogenous fertility index (FS) variable was assumed to capture the effects of reproduction. Milk fever might be associated with unsatisfactory reproduction through an increased incidence of retained placenta and possibly other reproductive problems. However, this relationship was captured by the included incidence of reproductive diseases (REPRO) variable in the reproduction (FS) model. To ensure that the model satisfied the necessary and sufficient rank condition for identification (see the Appendix), we applied equality restrictions for the regional dummy variables.
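The order condition can be checked by simple counting. The sketch below applies it to the milk production (FPCM) equation only, taking the endogenous regressors (BMSCC, CULL and FS) and the variables absent from the final model (CINT, REPL, OTHDZ, REPRO and REG3) from Table 3. The function and variable names are hypothetical, and the count covers only the necessary order condition, not the rank condition treated in the Appendix.

```python
def order_condition(excluded_exogenous, endogenous_in_equation):
    """Necessary order condition for identification by zero restrictions.

    True when the number of exogenous variables omitted from the equation is
    at least the number of endogenous variables in it (dependent variable
    included) minus one.
    """
    return len(excluded_exogenous) >= len(endogenous_in_equation) - 1


# FPCM equation: dependent variable plus the endogenous regressors of Table 3.
fpcm_endogenous = {"FPCM", "BMSCC", "CULL", "FS"}

# Zero restrictions assumed a priori in the text for the FPCM equation.
a_priori_excluded = {"CINT", "REPL"}

# Exogenous variables absent from the final FPCM equation as read from Table 3.
final_excluded = a_priori_excluded | {"OTHDZ", "REPRO", "REG3"}

print(order_condition(a_priori_excluded, fpcm_endogenous))  # 2 omitted, 3 needed
print(order_condition(final_excluded, fpcm_endogenous))     # 5 omitted, 3 needed
```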
All P values were greater than 0.16 (in the final three-stage least-squares (3SLS) model) for these null hypotheses: (1) in the FPCM model, REG2 = REG6 = REG7; (2) in the MILKD model, REG3 = REG4; (3) in the BMSCC model, REG4 = REG7; (4) in the FS model, REG4 = REG6. Moreover, if an exogenous variable or its square root had a regression coefficient with P > 0.15 in both the simultaneous model and the ordinary least-squares model when offered with the a priori restrictions, then these coefficients were restricted to zero too. The zero restrictions were imposed in one step, to avoid the problems of the pre-test bias (Kennedy, 1985) which might be the consequence of using either a backward or a forward variable selection procedure. The model was then re-run for both the simultaneous and the ordinary least-squares versions, and the results of this re-run are presented in Table 3 and Tables A1-A4. To compare the simultaneous-equations model with the classical linear regression model, we estimated a 3SLS model (Zellner and Theil, 1982) and an ordinary least-squares (OLS) model for each equation (Statistical Analysis Systems (SAS), 1984 and 1985, program SAS
TABLE 3
The structural equation (model) for milk production (FPCM) (1913 dairy herds which participated in the Norwegian farm efficiency program during 1987)
                 Ordinary least squares      Three-stage least squares
Variables        β            SE(β)          β            SE(β)
Intercept        2828.5*      159.0          2863.0*      326.0
Endogenous
BMSCC            -0.53*       0.13           0.84         1.06
CULL             3.28*        0.65           6.48*        1.42
FS               0.57         0.44           3.67*        1.04
Exogenous
CONC             1.63*        0.04           1.50*        0.04
KETO             -297.7*      129.2          -475.6*      151.5
KETO^0.5         383.4*       120.2          534.3*       125.9
MILKF            461.3*       121.9          591.4*       125.1
OFEED            0.48*        0.07           0.32*        0.08
PAST             1.44*        0.10           1.65*        0.11
PAST^0.5         -10.46*      3.72           -40.8*       3.53
PSCONC           1.13*        0.05           1.00*        0.05
REG2             -199.1*      41.5           -259.6*      42.7
REG4             -297.8*      48.8           -359.7*      54.6
REG5             -455.5*      61.7           -531.4*      64.0
REG6             -199.1*      41.5           -259.6*      42.7
REG7             -199.1*      41.5           -259.6*      42.7
ROUGH            1.75*        0.14           0.34*        0.07
ROUGH^0.5        -76.8*       8.7            -13.02*      5.72
UDDER            167.3*       46.1           159.0*       57.2
Levels of significance are expressed by *, which denotes P values less than 0.05.
PROC SYSLIN {3SLS} and PROC REG {OLS}). The 3SLS estimation method (Kennedy, 1985) can be described briefly as: (1) estimate the reduced-form coefficients (see the Appendix); (2) use these coefficients to estimate the error terms of the structural equations; then (3) incorporate the correlations between the error terms of the structural equations to obtain efficient estimates of the structural coefficients. Since the observations of both exogenous and endogenous variables were herd averages, we weighted the observations according to herd size to reduce unequal variances. We accounted for potential curvilinear (Beattie and Taylor, 1985) associations between the endogenous and exogenous variables by offering both each exogenous variable and its square root to the models. A square-root transformation was preferable since it would push the distribution rightward for variable values less than 1 and leftward for variable values greater than 1 (which was fortunate since the distribution of disease incidence was concentrated to the left with a long right tail).
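The effect of the square-root transformation on values below and above 1 can be seen with two medians from Table 2. This is a minimal sketch: the helper name is hypothetical, and it only builds the pair of terms (the variable and its square root) that was offered to the models, without fitting anything.

```python
import math


def with_sqrt_term(values):
    """Pair each non-negative value with its square root: (x, sqrt(x))."""
    return [(x, math.sqrt(x)) for x in values]


# Medians from Table 2: ketosis incidence (KETO) and roughage feeding (ROUGH).
for x, root in with_sqrt_term([0.172, 1609]):
    direction = "rightward" if root > x else "leftward"
    print(f"x = {x}: sqrt(x) = {root:.3f} (moved {direction})")
```

Values between 0 and 1, such as the disease incidences, are stretched upward, while large values, such as the feeding variables, are compressed.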
Financial illustration of the simultaneous-equations bias
To illustrate the practical importance of the simultaneous-equations bias, we estimated both the change in milk production and the subsequent change in milk delivery (valued at 3.42 Nkr kg$^{-1}$; Norwegian Dairies Association (NML), 1988), if a 20-cow herd improved its fertility index (FS) from the 10th percentile to the 90th percentile (from 34 to 93).
RESULTS AND DISCUSSION
Results of the statistical analysis
The results of the analysis are shown in Table 3 for the milk production (FPCM) model and in Tables A1-A4 for the other models (structural equations). The simultaneous (3SLS) model had a system weighted $R^2$ of 0.42, while the average $R^2$ for the ordinary least-squares models was 0.38.
One practical result of ignoring the simultaneous-equations bias is the failure to reap the optimum pay-off of investments in herd health. As an illustration, this study indicated that a seven times larger investment in improving a herd's reproduction was warranted if the results of the simultaneous model were followed instead of the ordinary least-squares model. The ordinary least-squares model predicted that improving reproduction (FS) from 34 (10th percentile) to 93 (90th percentile) results in a 33.6 kg increase in milk production, while the simultaneous model predicted an increase of 216.5 kg. According to the simultaneous model of milk delivery (MILKD), this difference in milk production (216.5 - 33.6 = 182.9 kg) would result in a 163.3 kg (0.893 × 182.9) increase in milk delivery (worth 586 Nkr).
The simultaneous model indicated that BMSCC did not significantly influence milk production, contrary to the ordinary least-squares model (Table 3) and the literature (e.g. Raubertas and Shook, 1982; Dohoo et al., 1984; Jones et al., 1984; Salsberg et al., 1984). However, the surveyed herds had low BMSCC levels (Table 2), presumably in contrast with the herds in the cited studies. Radostits and Blood (1985) suggested that BMSCC < 250 000 ml$^{-1}$ indicates udder health. We probably did not have herds with bad enough BMSCC to detect any effects of unsatisfactory udder health. Nevertheless, these findings challenge the assertion that the greatest benefits of reduced somatic cell counts are reaped at lower BMSCC levels, e.g. BMSCC < 200 000 ml$^{-1}$.
A reasonable interpretation of this study's findings concerning BMSCC and milk production is that one should expect changes in milk production only to the extent variations in BMSCC are correlated with mastitis.
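The arithmetic behind the reproduction example in this section can be retraced from the published coefficients. The sketch below uses the FS coefficients from Table 3 (0.57 for ordinary least squares, 3.67 for 3SLS), the factor 0.893 quoted in the text for milk delivery and the price of 3.42 Nkr per kg; the constant and function names are hypothetical. With these rounded inputs the final product comes to roughly 558.6 Nkr, not the 586 Nkr quoted, and the text does not give enough detail to explain the difference.

```python
FS_P10, FS_P90 = 34, 93                  # 10th and 90th percentile of FS (Table 2)
COEF_FS_OLS, COEF_FS_3SLS = 0.57, 3.67   # FS coefficients in the FPCM equation (Table 3)
MILKD_PER_KG_FPCM = 0.893                # factor quoted in the text for milk delivery
PRICE_NKR_PER_KG = 3.42                  # milk price quoted in the text


def milk_gain(coef_fs, fs_from=FS_P10, fs_to=FS_P90):
    """Predicted change in milk production (kg) for a change in the fertility index."""
    return coef_fs * (fs_to - fs_from)


gain_ols = milk_gain(COEF_FS_OLS)
gain_3sls = milk_gain(COEF_FS_3SLS)
extra_delivery = MILKD_PER_KG_FPCM * (gain_3sls - gain_ols)
extra_value = PRICE_NKR_PER_KG * extra_delivery

print(f"OLS gain: {gain_ols:.1f} kg, 3SLS gain: {gain_3sls:.1f} kg")
print(f"extra delivery: {extra_delivery:.1f} kg, value: {extra_value:.1f} Nkr")
```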
The impact of udder disease (UDDER) is negative for milk delivered (Table A1) and positive for milk produced (Table 3). One interpretation is that prompt treatment of udder disease preserves the cow's production potential, while the antibiotic treatment of udder disease causes withheld milk (leading to diminished milk delivery). Hence, when examining disease impacts on animal production, it is useful to distinguish between the impact of disease itself and the impact of its cure.
GENERAL DISCUSSION
The simultaneous 3SLS estimation of our model of milk production resulted, in several instances, in estimates that differed in sign, magnitude and significance from the ordinary least-squares estimates, for both the endogenous and the exogenous variables (Table 3). We expected these results since the simultaneous-equations bias results in a non-zero expectation of the error terms of the structural equations. Hence, accounting for simultaneous-equations bias might change our interpretations of the relationships in the milk production process substantially (Table 3 and Tables A1-A4). It is, however, at the conceptual level that one should discuss whether a simultaneous-equations bias is present or not. The statistical analysis simply substantiates or disputes our conceptual understanding of the animal production system. If the conceptual model of a biological system indicates the presence of simultaneous-equations bias, then it is appropriate to use the simultaneous-equations model, despite the effort involved. Moreover, the simultaneous-equations bias and its consequences when studying biological systems (including veterinary epidemiology) should be subject to further scrutiny.
Both simultaneous-equations models and path analysis account for direct and indirect effects (of diseases on milk production, for example). However, only simultaneous-equations methods can account for the simultaneous-equations bias. Nevertheless, if a system can be specified as a recursive system, then path analysis would provide unbiased and efficient estimates in large samples (Lahiri and Schmidt, 1978). Moreover, if the path analysis could be specified as a triangular structural system, and if the disturbance covariance matrix is known, path analysis would also provide unbiased and efficient estimates (Lahiri and Schmidt, 1978). Two arguments against using a simultaneous-equations approach are computational complexity and more detailed model specification.
However, the value of improved information might be considerable, as shown in this paper. Furthermore, the identification problem compels the researcher to specify the model carefully and to use prior knowledge through imposing restrictions. We believe that the use of simultaneous-equations models will ultimately be a challenge to our competence as veterinarians rather than as statisticians. In conclusion, we recommend consideration of the simultaneous-equations bias when studying animal production systems or carrying out epidemiological studies of animal diseases.
ACKNOWLEDGMENTS
We wish to acknowledge the support from the Producer Services, Norwegian Dairy Association (NML Produsenttjenesten), the Agricultural Research Council of Norway and the Norwegian National Veterinary Institute.
REFERENCES

Beattie, B.R. and Taylor, C.R., 1985. The Economics of Production. Wiley, New York, 255 pp.
Dohoo, I.R., Meek, A.H. and Martin, S.W., 1984. Somatic cell counts in bovine milk: relationships to production and clinical episodes of mastitis. Can. J. Comp. Med., 48: 130-135.
Emanuelson, U. and Persson, E., 1984. Studies on somatic cell counts in milk from Swedish dairy cows. I. Non-genetic causes of variation in monthly test-day results. Acta Agric. Scand., 34: 33-44.
Erb, H.N., Martin, S.W., Ison, N. and Swaminatan, S., 1981. Interrelationships between production and reproductive diseases in Holstein cows. Path analysis. J. Dairy Sci., 64: 282-289.
Girschick, M.A. and Haavelmo, T., 1947. Statistical analysis of demand for food: Examples of simultaneous estimation of structural equations. Econometrica, 15: 79-110.
Jones, G.M., Pearson, R.E., Clabaugh, G.A. and Heald, C.W., 1984. Relationships between somatic cell counts and milk production. J. Dairy Sci., 67: 1823-1831.
Kennedy, P., 1985. A Guide to Econometrics. 2nd Edn. MIT Press, Cambridge, MA, 238 pp.
Kmenta, J., 1986. Elements in Econometrics. 2nd Edn. Macmillan, New York, 786 pp.
Lahiri, K. and Schmidt, P., 1978. On the estimation of triangular systems. Econometrica, 46: 1217-1221.
Ministry of Agriculture, 1985. The state veterinary service and animal health situation in Norway. Ministry of Agriculture, Division of Veterinary Services, Oslo, 30 pp.
Norwegian Dairies Association (NML), 1988. Årsmelding Produsenttjenesten, NML 1987 (Annual Report of the Herd Recording Program, 1987). Norwegian Dairymen Association, Oslo, 144 pp. (in Norwegian).
Presthegge, K. and Engan-Skei, I., 1976. Foring og stell av husdyr (Nutrition and Management of Livestock). Landbruksforlaget, Oslo, 292 pp. (in Norwegian).
Radostits, O.M. and Blood, D.C., 1985. Herd Health. A Textbook of Health and Production Management of Agricultural Animals. Saunders, Philadelphia, PA, 456 pp.
Raubertas, R.F. and Shook, G.E., 1982. Relationship between lactation measures of somatic cell concentration and milk yield. J. Dairy Sci., 65: 419-425.
Refsdal, A.O., 1982. Nytt fruktbarhetsmål - FS-tallet (A new fertility index - FS). Reproduksjonsnytt, 4: 8-9 (in Norwegian).
Salsberg, E., Meek, A.H. and Martin, S.W., 1984. Somatic cell counts: associated factors and relationship to production. Can. J. Comp. Med., 48: 251-257.
Statistical Analysis Systems Institute, 1984. SAS/ETS™ User's Guide: Statistics, Version 5. SAS Institute Inc., Cary, NC, 738 pp.
Statistical Analysis Systems Institute, 1985. SAS User's Guide: Statistics, Version 5. SAS Institute Inc., Cary, NC, 956 pp.
Zellner, A. and Theil, H., 1962. Three-stage least squares: simultaneous estimation of simultaneous equations. Econometrica, 30: 54-78.
APPENDIX: IDENTIFICATION OF SIMULTANEOUS SYSTEMS
The algebra needed to define the identification problem requires that the set of equations to be estimated is written in the form of vectors and matrices. The vector of endogenous variables is denoted $Y$ ($n \times 1$) and the vector of exogenous variables $X$ ($m \times 1$). The vector of error terms from the structural equations is denoted $U$ ($n \times 1$). The structural form of the set of simultaneous equations is written

$$\beta Y + \Gamma X = U \qquad (1)$$

where $\beta$ is an $n \times n$ matrix of structural coefficients for the endogenous variables and $\Gamma$ is an $n \times m$ matrix of coefficients for the exogenous variables. The matrix $\beta$ is always square ($n \times n$) because there is one equation for each of the $n$ endogenous variables, each possibly, but not necessarily, affected by any of the other endogenous variables. If the $j$th endogenous variable is excluded from the $i$th equation, the coefficient $b_{ij}$ will equal zero. Likewise, zero coefficients in the $n \times m$ matrix $\Gamma$ imply exclusion restrictions. Assuming that $\beta$ is of full rank, its inverse exists and eqn. (1) can be pre-multiplied by $\beta^{-1}$ to yield

$$Y + \beta^{-1}\Gamma X = \beta^{-1}U \qquad (2)$$
Defining $-(\beta^{-1}\Gamma) = \pi$ ($n \times m$) and $\beta^{-1}U = V$, we can rewrite eqn. (2) as

TABLE A1
Model of milk delivery (MILKD)

Variables        Ordinary least squares      Three-stage least squares
                 β           SE(β)           β           SE(β)
Intercept        510.0*      76.5            271.0*      100.7
Endogenous
  FPCM           0.855*      0.01            0.89*       0.02
Exogenous
  REG3           121.9*      23.0            117.5*      22.8
  REG4           121.9*      23.0            117.5*      22.8
  REG6           82.5*       18.4            90.1*       18.1
  REPRO          -526.3*     157.7           -500.1*     153.0
  REPRO^0.5      301.2*      110.1           302.6*      106.7
  UDDER          -52.1       37.4            -79.6*      37.9

Levels of significance are expressed by * which denotes P values less than 0.05.
TABLE A2
Model of bulk tank milk somatic cell counts (BMSCC)

Variables        Ordinary least squares      Three-stage least squares
                 β           SE(β)           β           SE(β)
Intercept        471.4*      111.7           490.0*      121.7
Endogenous
  CULL           -0.18       0.13            -2.92*      0.98
  FPCM           0.004       0.005           -0.06       0.03
  MILKD          -0.02*      0.005           0.05        0.03
Exogenous
  CONC           0.14*       0.07            0.10        0.07
  CONC^0.5       -11.7*      5.4             -9.8        5.8
  MILKF          -118.8*     56.8            -115.4*     57.2
  MILKF^0.5      55.5*       28.2            53.3        28.8
  OFEED          0.04*       0.01            0.04*       0.01
  PAST           0.01*       0.005           0.01        0.01
  REG4           15.7*       4.7             -11.3*      5.2
  REG7           -15.7*      4.7             -11.3*      5.2
  REPL           -0.11       0.13            1.34*       0.54
  UDDER          26.2*       7.7             26.4*       9.4

Levels of significance are expressed by * which denotes P values less than 0.05.
$$Y = \pi X + V \qquad (3)$$

This is the reduced form; it has the familiar form of $n$ single regression equations stacked up, with the dependent variable on the left-hand side and the explanatory variables and the error term on the right-hand side. These equations can be estimated consistently, yielding coefficients in $\pi$ and statistical properties based on $V$. The identification problem is to use these estimates and stochastic properties to derive estimates and statistical properties for the structural coefficients in matrices $\beta$ and $\Gamma$. If we can go uniquely from $\pi$ to $\beta$ and $\Gamma$ by a linear transformation, the linear transformation of $V$ will preserve the normal statistical properties. The question is under which conditions we can obtain this linear transformation. A little more matrix algebra will show these conditions. Substituting eqn. (3) into eqn. (1), we obtain

$$\beta(\pi X + V) + \Gamma X = U \quad\text{or}\quad \beta\pi X + \beta V + \Gamma X = U \qquad (4)$$

but since $\beta^{-1}U = V$ by definition, the error vectors in eqn. (4) cancel, leaving

$$\beta\pi X = -(\Gamma X) \quad\text{or}\quad \beta\pi = -\Gamma \qquad (5)$$
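As a numerical illustration of eqns. (1)-(5), the following Python sketch builds a small structural system and checks that the reduced-form coefficients satisfy eqn. (5). The matrices `beta` and `Gamma`, the dimensions (n = 2, m = 3) and all numerical values are hypothetical and are not taken from the milk production model; NumPy is assumed to be available. With π defined as -(β⁻¹Γ), the reduced form reads Y = πX + V.

```python
import numpy as np

# Hypothetical structural system with n = 2 endogenous and m = 3 exogenous
# variables, written as in eqn (1): beta @ Y + Gamma @ X = U.
beta = np.array([[1.0, -0.5],
                 [-0.3, 1.0]])          # n x n, assumed to be of full rank
Gamma = np.array([[-2.0, 0.0, -1.0],
                  [0.0, -1.5, 0.0]])    # n x m, zeros are exclusion restrictions

# Reduced-form coefficients: pi = -(beta^{-1} Gamma), an n x m matrix.
pi = -np.linalg.solve(beta, Gamma)

# Eqn (5): beta @ pi = -Gamma.
assert np.allclose(beta @ pi, -Gamma)

# Simulate a few observations (columns) and confirm the reduced form, eqn (3).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                 # exogenous variables
U = rng.normal(size=(2, 5))                 # structural error terms
Y = np.linalg.solve(beta, U - Gamma @ X)    # solve eqn (1) for Y
V = np.linalg.solve(beta, U)                # reduced-form errors, V = beta^{-1} U
assert np.allclose(Y, pi @ X + V)
```

In applied work π would be estimated by regressing each endogenous variable on all exogenous variables, rather than computed from known structural matrices as in this sketch.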
TABLE A3
Model of reproduction (FS)

Variables        Ordinary least squares      Three-stage least squares
                 β           SE(β)           β           SE(β)
Intercept        306.7*      31.1            321.5*      30.3
Endogenous
  CULL           -0.05       0.03            -0.10       0.06
  FPCM           0.0004      0.001           0.002       0.002
Exogenous
  CINT           -15.0*      0.67            -16.6*      0.71
  CONC           0.035*      0.017           0.032       0.017
  CONC^0.5       -2.96*      1.4             -3.01*      1.4
  PAST           0.02*       0.004           0.02*       0.01
  PAST^0.5       -0.37*      0.15            -0.36*      0.16
  REG2           5.2*        1.4             5.1*        1.4
  REG4           3.1*        1.3             3.1*        1.3
  REG6           3.1*        1.3             3.1*        1.3
  REPRO          -37.7*      8.9             -37.4*      8.6
  REPRO^0.5      17.1*       6.2             16.6*       6.0
  ROUGH          0.001       0.002           0.001       0.001

Levels of significance are expressed by * which denotes P values less than 0.05.
TABLE A4
Model of the culling rate (CULL)

Variables        Ordinary least squares      Three-stage least squares
                 β           SE(β)           β           SE(β)
Intercept        13.3*       3.4             18.1*       6.9
Endogenous
  BMSCC          -0.005      0.004           -0.03       0.02
  FPCM           0.002*      0.001           0.001       0.001
  FS             0.004       0.01            0.07*       0.003
Exogenous
  REG5           -5.6*       1.4             -4.5*       1.3
  REPL           0.53*       0.02            0.53*       0.02
  REPRO^0.5      1.2         1.5             2.8         1.5
  UDDER          9.1*        4.7             8.4         4.5
  UDDER^0.5      -8.3        5.3             -6.6        5.0

Levels of significance are expressed by * which denotes P values less than 0.05.
We now have an equality relationship between the known estimates of the reduced-form coefficients in $\pi$ and the unknown structural coefficients in $\beta$ and $\Gamma$. However, $\beta$ and $\Gamma$ contain the coefficients for all $n$ equations. Breaking out the $k$th equation by selecting the $k$th row of $\beta$ and $\Gamma$, the equality in eqn. (5) for just the $k$th equation can be written as

$$[\,b_{k1}, b_{k2}, \ldots, b_{kn}\,] \begin{bmatrix} \pi_{11} & \cdots & \pi_{1m} \\ \vdots & & \vdots \\ \pi_{n1} & \cdots & \pi_{nm} \end{bmatrix} = -[\,\gamma_{k1}, \gamma_{k2}, \ldots, \gamma_{km}\,] \qquad (6)$$
Since we have a priori excluded the influence of some of the endogenous variables and some of the exogenous variables from the $k$th equation, some of the values of $b_{ki}$ and $\gamma_{kj}$ will be set equal to zero. Matrix algebra allows us to reshuffle the order of the coefficients in $b_k$ and $\gamma_k$ as long as we adjust the coefficients in matrix $\pi$ accordingly. Moving all the zero coefficients to the right end of each vector, we can rewrite the two vectors of structural coefficients as

$$b_k = [\,b_{\#},\ 0_{\#\#}\,] \quad\text{and}\quad \gamma_k = [\,\gamma_{*},\ 0_{**}\,] \qquad (7)$$

where the subscripts $\#$ and $*$ denote the numbers of non-zero coefficients, and $\#\#$ and $**$ the numbers of zero coefficients, in $b_k$ and $\gamma_k$, respectively. The rearranged $\pi$ matrix is partitioned into four submatrices and the equality (6) is written

$$[\,b_{\#} \mid 0_{\#\#}\,] \begin{bmatrix} \pi_{\#,*} & \pi_{\#,**} \\ \pi_{\#\#,*} & \pi_{\#\#,**} \end{bmatrix} = -[\,\gamma_{*} \mid 0_{**}\,] \qquad (8)$$

Using matrix multiplication, eqn. (8) multiplies out to two sets of equations which link the known $\pi$ coefficients with the unknown, non-zero coefficients in $b$ and $\gamma$. The equalities are

$$b_{\#}\,\pi_{\#,*} = -\gamma_{*} \qquad (9)$$

$$b_{\#}\,\pi_{\#,**} = 0 \qquad (10)$$
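Eqns. (9) and (10) can be sketched numerically for a single equation. The example below is a minimal sketch under stated assumptions: the structural matrices `beta` and `Gamma` are hypothetical, the reduced-form matrix is computed from them rather than estimated, and the first included endogenous variable is taken as the dependent variable with its coefficient normalized to one. The index lists `incl_endog`, `incl_exog` and `excl_exog` are illustrative names for the partition into included and excluded variables.

```python
import numpy as np

# Hypothetical structural system (n = 2, m = 3); values are illustrative only.
beta = np.array([[1.0, -0.5],
                 [-0.3, 1.0]])
Gamma = np.array([[-2.0, 0.0, -1.0],
                  [0.0, -1.5, 0.0]])
pi = -np.linalg.solve(beta, Gamma)      # reduced-form coefficients

# Partition for the first equation: both endogenous variables are included
# (dependent variable listed first); the second exogenous variable is excluded.
incl_endog = [0, 1]
incl_exog = [0, 2]
excl_exog = [1]

# Order condition: excluded exogenous >= included endogenous minus one.
assert len(excl_exog) >= len(incl_endog) - 1

# Eqn (10): [1, b_rest] @ P = 0, where P is the block of pi with rows for the
# included endogenous and columns for the excluded exogenous variables.
P = pi[np.ix_(incl_endog, excl_exog)]
b_rest, *_ = np.linalg.lstsq(P[1:].T, -P[0], rcond=None)
b_incl = np.concatenate(([1.0], b_rest))

# Eqn (9): coefficients of the included exogenous variables.
gamma_incl = -b_incl @ pi[np.ix_(incl_endog, incl_exog)]

# The recovered coefficients match the first rows of beta and Gamma.
assert np.allclose(b_incl, beta[0, incl_endog])
assert np.allclose(gamma_incl, Gamma[0, incl_exog])
```

A least-squares solve is used here only so that the same lines also run when there are more equations than unknowns; in this just-identified example it reduces to an exact solution.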
If we can solve for the values of $b_{\#}$ in eqn. (10), we can substitute and solve for $\gamma_{*}$ in eqn. (9). Since one $b$ coefficient in each equation is set equal to one (thinking of it as the dependent variable on the left-hand side), the vector $b_{\#}$ contains $\# - 1$ unknown $b$ coefficients in $**$ equations. We can solve this system only if the number of equations is at least as great as the number of unknowns, i.e. $**$ is larger than or equal to $\# - 1$. Because $**$ is the number of variables excluded from the exogenous set and $\#$ is the number of included endogenous variables, eqn. (10) proves the first condition. The necessary order condition states that in each equation the number of excluded exogenous variables is greater than or equal to the number of included endogenous variables minus one. Matrix algebra enthusiasts will also recognize that a sufficient condition for identification is that the rank of the matrix $\pi_{\#,**}$ is greater than or equal to