Mixed Model Evaluation of Related Sires for Milk Yield Eliminating Age and Interaction of Sire with Herd-Year-Season 1 A. J. LEE Department of Dairy Science University of Illinois Urbana 61801
ABSTRACT
tions for sire evaluation were not indicated clearly. Equations for multivariate mixed models were simplified for traits having the same model and for all traits recorded on each individual (6). Correlations among random elements of a class and multiple random classifications were considered known. This simplification was used to develop sire evaluation for milk yield and age at calving with related and possibly inbred sires and interaction of sire by herd-year-season. Henderson (4) considered multiple traits generally in the absence of interaction of sire by herd-year-season. Computing methods for a single trait with interaction and unrelated sires (2) are special cases of methods to be derived below. Implications for evaluating the genetic merit of dairy sires for milk yield were examined with results from (5). Assumptions leading to analysis of actual milk yield ignoring age and of age-corrected milk yield under a univariate mixed model were examined.
A method of best linear unbiased evaluation of related sires for milk yield considering age and interaction of sire with herd-year-season was developed. It was based on simplification of mixed model equations for multiple traits and multiple random classifications when all traits are recorded for each individual and design matrices are the same for all traits. The resulting modified least squares equations were used for milk yield and age at calving under a model with sire, herdyear-season, and sire by herd-year-season. For unrelated sires, this permits consideration of an arbitrarily large number of classes of herd-year-seasons and sires by herd-year-season. Absorption involves inversion of matrices of order t for t traits and unrelated sires. For groups of c related sires inverses of order t c are needed. Resulting absorbed equations of order ts for s sires must be solved. Sire, sire by herd, and error components of variance and covariance for first lactation milk yield and age at calving were used to examine implications for sire evaluation. Conditions justifying evaluation with only actual milk yield or only agecorrected milk yield were derived.
Model and Assumptions
The multivariate linear model for observations from a design with interaction of sire by herd-year-season (HYS) where all traits are observed on each individual can be written as: y = (It*X)h + (It*Z)s + (It*W) 7 + e where:
INTRODUCTION
!
Methods of evaluating dairy sires for milk yield make use of records adjusted for effects of age and season of calving, eliminating effects of sire and herd-year-seasons. Components of variance and covariance for milk yield in first lactation and age at first calving for sire and sires by herds were in (5). However, implica-
Received July 18, 1977. 1Supported in part by the Illinois Agricultural Experiment Station. 1978 J Dairy Sci 61:600--606
[1]
600
transpose of a matrix or vector, Kronecker or direct product of matrices, y' = Yl' Y2' • . • Yt' with Yi' a row vector of the n observations for trait i, identity matrix of order t, |t = X = an n x p known matrix with a single 1 in each row for column i when the observation is in the ith of the p HYS classes. Consequently, X'X is a p x p matrix with ni., the number of observations for the ith HYS as the ith diagonal element, and zero elsewhere,
TECHNICAL NOTE h' = h i ' h2' . . . h t' with h i' a row vector of p fixed, u n k n o w n HYS effects for trait i. Z = an n × q k n o w n matrix with a single 1 in each row for c o l u m n j when the observation is on a daughter of sire j from q possible sires. Consequently, Z'Z is a q × q matrix with n.j, the n u m b e r of daughters of sire j, for the jth diagonal element, and zero, elsewhere. Also, X ' Z is a p × q matrix of nij's , the n u m b e r of observations on daughter of sire j in HYS i. s' = sl s2 • • • st' with Si F a row vector of q r a n d o m u n k n o w n effects of sires in trait i. Var (si) is siiAq, and Cov (s i, si') is sii'A where A is the matrix of coefficients of relationship a m o n g the q sires. Therefore, Var (s) = S*A. The sii and sii' are the sire c o m p o n e n t s of variance for trait i or covariance between trait i and i', respectively, when sires are neither inbred nor related.
W'= W l ' w 2' . . .Wp I. W i=
a k n o w n n × q matrix. One element of each row of W i corresponding to a filled class sire by HYS in which the observation occurred is 1; other elements are zero. Every e l e m e n t of a row for an e m p t y subclass will be zero. Consequently W W is a pq × pq matrix with diagonal elements nij for each filled subclass and zero otherwise. @
.
W i t h o u t loss of generality, data are sorted by sire within herd and W and W'W are arranged correspondingly. The X'W can be partitioned similarily as (X'Wl X'W2 . . . X'Wp). Row i of X'W i contains the n u m b e r s of observations for each sire represented in herd i. All other rows are null. Similarly Z'W, a q × pq matrix is partitionable into (Z'Wl Z'W2 . . . Z'Wp) with elements nij in the c o l u m n of row j corresponding to sire j if sire j has one or more daughters in herd i and zero otherwise. There will be rows and c o l u m n s of all zeros for missing ceils. It is
It*X'X
It*X'Z
It*Z'X
It*Z'Z + RS -1 *A-X
It*W'X
It*W'Z
txw
601
necessary to include these and their elements in 3' since i n f o r m a t i o n from related sires in the same HYS will provide a means of estimating interaction in missing cells. The vector of interactions is 3" = 3'1' 3'2' • • • 3't' where 3'k' is a row vector of pq r a n d o m u n k n o w n effects for interactions b e t w e e n the ith HYS and sire j in trait k. Assume Var (3'i) = gii (Ip*A) and Cov (3'i, 3'f) = gif (Ip*A). The gii and gii' are the c o m p o n e n t s of variance for .sire by HYS for trait i or covariance b e t w e e n trait i and i', respectively. In the definition of Var (3'i), A is the additive relationship matrix b e t w e e n sires. This implies that interactions of two sires in one HYS are correlated to the extent that the two sires are related. The vector of residual effects is e' = e l ' e2' e3' . • • et' with e l ' a row vector of a random u n k n o w n residueal effects, Var (el) = riln, and Coy (eiei') = r i f I n, so Var (e) = R*I n. The r i and rif are the residual c o m p o n e n t s of variance in trait i or of covariance between traits i and i'. This model is a special case of the general multivariate model of (4) where all traits are recorded for each individual a n d the model contains the same classes of factors for each trait• A simplification of equations of (4) for these conditions and for multiple r a n d o m classifications has been derived (6). This yields the equations for the specific model as shown in [2]. The A-1 matrix, the inverse of the matrix of relationships among sires, can be c o m p u t e d for even large q by the m e t h o d of (8). The solution for s t h e n is the BLUP (best linear unbiased predictor) of sire effects eliminating age and interaction. Solutions for 3' would be BLUP of interactions. In typical dairy cattle data interaction would be estimated poorly. However, it is possible to o b t a i n a BLUP of 3' and s or c o m b i n a t i o n s of them. For example, the BLUP of sire j plus interaction of sire j in HYS i in trait t can be obtained. With limited data in each HYS this will be d o m i n a t e d heavily by the BLUP for sire j. M a x i m u m likelihood under n o r m a l i t y or best linear unbiased
It*Z'W It*W'W + RG -1 *(lp *A -1 )
JI ] II xl =
It*Z' { It*W' j
y
[2]
Journal of Dairy Science Vol. 61, No. 5, 1978
602
LEE
estimated (BLUE) of estimable contrasts among fixed HYS effects also can be obtained from solutions of [2]. These are usually of limited direct interest in applications to dairy cattle breeding. Computing solutions to [2] is simplified greatly when they are actually independent sets of equations for each trait. This occurs if, and only if, both RS-1 and RG -1 are diagonal. Apart from R S - l * A-1 and RG-I*(Ip*A-1), equations (2) are the least squares equations for each trait independently. This can be seen from the block diagonal form of submatrices like It*X'X. Equations for one trait would not involve unknown subvectors of h, s, and 3' for other traits. However, RS- I * A -1 and RG-1 *(Ip*A-1) generally destroy this feature, making solutions for one trait depend on those for other traits. Only when RS- I * A -1 and R G - I * ( I p * A-1) are block diagonal will the independence of equations for different traits be retained. This occurs only when both RS-~ and RG -1 are diagonal. The resulting independent sets of equations for each trait are not generally those for each trait ignoring other traits. The diagonal elements of RS-1 and RG-1 depend on off diagonal as well as diagonal elements of R, G, and S. The sets of equations for a trait will be identical to those for that trait ignoring other traits only in the trivial case where R, G, and S are diagonal. Computing methods for solving a set of equations for each trait in large designs with interaction of sire by HYS have been described (2). One special case of diagonal RG -1 and RS-1 is when matrices R, S, and G are proportional. If S = ksR then S-1 = (1/ks)R -1 and RS -1 = (1/ks)I t . Similarily, if G = kgR then RG -1 = (1/kg)l t. Equal correlation matrices associated with R, S, and G only yield diagonal RG-1 and RS-1 when the ratio of diagonal elements of R and S and of R and G are constants for all traits (i.e. rii/gii = constant all i, rii/sii = constant all i). A common heritability and a common proportion of genetic by environment interaction for all traits along with equal genetic, interaction, and residual correlations for each pair of traits is an equivalent condition. There are many combinations of R, G, and S yielding diagonal RG -1 and RS-1, but the above appear to be the most interesting special cases. Solution of equation [2] for arbitrary RG-l and RS-l is practical for an arbitrary number of Journal of Dairy Science Vol. 61, No. 5, 1978
HYS and sire by HYS classes when A-x is diagonal. This is the case of unrelated sires. First, interactions are absorbed into the herd equations. Then interactions and herds are absorbed into the sire equations. This involves matrix inversions of only order t and results in absorbed sire equations for all traits of order tq. These then might be solved iteratively. Absorption is simplified because interaction equations for a class of sire by HYS for all traits are independent of those for other sire by HYS classes. Details are in Appendix I. The above approach also can be used for related sires when Cov ('Yi'~'i') = gii' (Ip*D), where D is a diagonal matrix of order q. This implies that interaction effects of related sires in the same HYS are uncorrelated. This model is probably unrealistic. However, it may not be in serious error and is one for which solution is practical when sires are related. Absorption of interactions and herds follows the pattern described above and in Appendix I. Then RS-1 *A-1 is added to the absorbed sire equations before solving. The ability to absorb interaction effects in solving equations [2] when Coy (7i7i') = gii' (Ip *A-l) is determined by t, q, and the form of A-1. For arbitrary A-1, an inverse of order tq must be computed at the end of each HYS class. Clearly this is impractical computationally when q is large as in most problems of dairy sire evaluation. Interactions for sires unrelated to sires having one or more daughters in an HYS can be ignored, since estimates of their interactions are zero and they are not included in absorption. Interaction effects for sires, unrelated to other sires but having one or more daughter in an HYS, can be absorbed as shown above. At the end of each HYS inverses of order tqr need to be computed for each group of qr interrelated sires from which one or more have one or more daughters in an HYS. Therefore, if only a few sires are represented in an HYS and these are from small groups of interrelated sires, it might be practical to absorb interactions. This might apply to many problems of dairy sire evaluation. This would be particularly true if only relationships larger than some arbitrary value, say 1/8, were considered. The exact details of an efficient computing method depend on the size and form of A-1 as well as the number of traits in the particular problem.
TECHNICAL NOTE The results depend critically on the assumptions that all traits are recorded for each observed individual and that all traits have the same model. This might be appropriate for milk yield, fat percent, and age at calving in Dairy Herd Improvement (DHI) data. Eliminating these restrictive assumptions while permitting sire by herd-year-season interaction leads to mixed model sire evaluation for multiple traits by (4). Computation of a solution for most sets of field data from dairy carrie is extremely difficult. Fi~'t Lactation Milk Yield and Age at First Calving
Application of the above methods to dairy sire evaluations for milk yield eliminating effects of age and interaction of sire by herd was examined. Henderson (3) has shown that treating herd or herd-year-seasons as fixed effects, even when they are random, avoids several biases in evaluation due to selection. Therefore, only sire and interaction of sire by herd-yearseason are considered random. Estimates of correlations between age at first calving and its square for sire, sire by herd, and residual were essentially one. Hence, quadratic effects were ignored. Estimates of components of variance and covariance among age at calving and first lactation milk yield with year-season from Table 6 of (5), were used for R, G, and S: Age
Milk yield
603
and 8.19 RS-~ = [_-362.00
R=
L326 kg mo
Age Milk yield R=
G=
48 kg mo 1.8 X 104 kg
Age Milk yield
for interaction effects with a correlation of .28, and
G=
0 ~1 50.1 X 104 kg
Age M.E. yield
[ 1.58 rno 2 L-33.69 kg mo
-33.69 kgmo ] 1.89 x 104 kg2J
Age
-.[4i2 mo 2 ~ 4 kg mo
44 kg mo ] 4.3 X 104 kg2J
Age Milk yield
for sire effects with a correlation of .27. Consequently, F 2.87 RG'1 = L-734.16
.0105] 30.96 _j
Age Milk yield
M.E. yield
with a - . 1 9 correlation, and S=
S=
M.E. yield
IO5"04"m°2
for residual effects with a correlation of .20, 1.58 mo 2 48 kg mo
Age Milk yield
Neither of the above matrices is diagonal. Therefore, both age at calving and actual milk yield appear to be required for sire evaluation. Due to large sampling variances associated with sire and sire by herd component of covariance, one might assume that S and G are multiples of R. Then S = ksR and G = kgR, making the correlations associated with R, G, and S all equal, then S-1 = (1/ks)R -1 and G-1 = (1/kg) R-1, making RG -1 = (1/kg)l t and RS -1 = (1/ks)It, which are both diagonal. The ultimate consequence is that sire evaluation for milk yield should use actual milk yield and ignore age at calving. One could assume that components of covariance between milk yield and age for sires and sires by herds are zero making G and S diagonal. This includes the special case where only the milk yield by milk yield elements are nonzero. This does not lead to diagonal RG -l and RS-I , and separate equations for each trait since R is not necessarily diagonal. An alternative approach is to use milk yield adjusted for the residual effects of age at calving (M.E. yield) instead of actual milk yield. If a regression adjustment is used, R, G, and S become: Age
326 kg mo 52.22 X 104 kg
-.0008] 12.51 _J
.62 rno 2 -6.59 kg mo ] -6.59 kg mo 3.99 × 104 kg2J
Age M.E. yield
with a - . 0 4 correlation. Muhiplicative ageseason adjustment factors presently in use (1) accomplish the same effect as well as stabilizing variances of yield across ages. Thus, conclusions from the above R, G, and S are likely appropriate for such records as well as those adjusted by regression. Jouraal of Dairy Science Vol. 61, No. 5, 19"78
604
LEE
Although R is diagonal, G and S are n o t so that Age ~ 3.55 RG-I = 11038.98
M.E. yield .0105 1 25.06 J
Age M.E. yield
-.000791 12.56 J
Age M.E. yield
and
RS-1 =
iI 8.13 -78.96
Neither of the above two matrices is diagonal. Sire evaluation for milk yield by M.E. yield also should consider differences in age at calving due to sires and sires by herds. For R G-t and RS-1 to be diagonal, S and G m u s t be diagonal. This is equivalent to no genuine sire or sire b y herd correlation b e t w e e n first lactation M.E. yield and age at first calving. This implies n o n z e r o correlations of actual first lact a t i o n yield with age at first calving due to sires and sires by herds. Sire evaluation with M.E. yield ignoring age at calving would be appropriate only when the correlations of M.E. yield with age at calving due to sires and sires by herds are zero. The large standard errors attached to estimates of R, G, and S presently available make it difficult to choose confidently a m o n g the different simplifying assumptions above or the estimates. Precise inference on the details of sire evaluation require better estimates of parameters. Present methods are only justified if sire and sire b y herd correlations between M.E. yield and age can be assumed zero. This m a y be no more appropriate than sire evaluation using actual yield ignoring age or considering age as a c o n c o m i t a n t trait using the methods o u t l i n e d earlier in this paper. More precise estimates of G and S for actual or M.E. yield and age at first calving would resolve this dilemma.
CONCLUSIONS A best linear unbiased prediction m e t h o d of evaluating related and possible inbred dairy sires was derived for multiple traits with possible interaction of sires by herd-year-seasons w h e n all traits are observed on each individual. Simplified c o m p u t i n g m e t h o d s for unrelated Journal of Dairy Science Vol. 61, No. 5, 1978
sires were developed. T h e y are practical for an arbitrary n u m b e r of herd-year-season and interaction classes providing there is a limited number of traits and the n u m b e r of sires is moderately large (e.g. 1,500). The m e t h o d also may be practical for related sires, providing only a few sires are in each herd and related sires form small groups unrelated to each other. Use of these m e t h o d s for evaluating dairy sires on first lactation milk yield and age at first calving was examined. However, the limited precision of parameter estimated precluded a clear choice b e t w e e n univariate and hivariate methods.
REFERENCES 1 Dickinson, F. N., R. L. Powell, and H. D. Norman. 1976. An introduction to the USDA-DHIA modified contemporary comparison in the USDA-DHI modified contemporary comparison-sire summary and cow index procedures. Prod. Res. Rep. No. 165. ARS-USDA, BeltsviUe, MD. 2 Henderson, C. R. 1974. General flexibility of linear model techniques for sire evaluation. J. Dairy Sci. 57:963. 3 Henderson, C. R. 1975. Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423. 4 Henderson, C. R. 1976. Multiple trait sire evaluation using the relationship matrix. J. Dairy Sci. 59: 769. 5 Lee, A. J. 1976. Relationship between milk yield and age at calving in first lactation. J. Dairy Sci. 59:1794. 6 Lee, A. J. 1977. Mixed model analysis of multiple traits with homogeneous variances and covariances. Biometrics 33 : 581. (Abstr.) 7 Norman, H. D., P. D. Miller, B. T. McDaniel, F. N. Dickinson, and C. R. Henderson. 1974. USDADHIA factors for standardizing 305-day lactation records for age and month of calving. Prod. Res. Rep. ARS-NE-40 ARS-USDA, Beltsville, MD. 8 Quaas, R. L. 1976. Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics 32:949.
APPENDIX
I
An approach which greatly eases solution of the equations [2] for unrelated sires in typical dairy cattle data is outlined. It depends on the special form of the X, Z, and W matrices and the diagonal form of A-x for this model. First, interaction effects are absorbed into the herd equations. Then, interaction and herd effects are absorbed into the sire equations. This leaves a set of equations of order tq to solve for q sires and t traits.
TECHNICAL NOTE The absorption of interaction effects is described. The last row of equations [2] is rearranged so that the t equations for a particular filled subclass are together. Equations for e m p t y subclasses can be deleted and ignored. A typical one for herd i and s i r e j is:
605
the elements of (nijI t + RG -1 )-1 for each sire in a herd. Let a solution for herd effects eliminating interaction be: "h'i = [ ~ n i j l t - ]~n~'Ci "1]-1 • J J J J (~Yij - ~nijTij) = Di 1 Yi.
j
(nijlt + aJJRG-1 )"/ij + (nijlt)hi + (nijlt)sj = Yij
[3]
where ~'ij = a vector of r a n d o m effects due to interaction of herd i with sire j in each of t traits. h i = a vector of fixed effects due to herd i for each of t traits. sj = a vector of r a n d o m effects due to sire j for each of t traits. Yij = a vector of observation totals for herd i and sire j in each of t traits. aJJ = the jth diagonal element of A-1 . ~j Yij
j
This is calculated at the end of each herd from the stored vectors and involves an inverse of only order t, the n u m b e r of traits. F r o m [5], [7], and [8], herd effects, h i , in terms of h i are: hi = "hi -- DTI1 .~(nijlt -- n?jCi7 )$j j
[9]
The second row of equations [2] is n o w rearranged to bring together equations for the same sire b u t different traits. A typical set of equations for sire j is: (]~nijlt + RS-I )sj + .~(nijh i) + i i
[4]
]~(nij'Yij) = .~Yij
where
i
Cij = (nijI t + aJJRG -1 )
[5]
sires absorbing interaction: (~]nijlt -- ~nijCij 2 -1 + RS-1)sj + 1
~](nijlt - nijCij 2 l )hi = ~(Yij - nij~'ij) i
R e a r r a n g e m e n t of the equations for herds in the first row of [2] similarily gives
J
(nijlt)h i + ~.(nijlt)s i + J .~ (nijTij) = .~ Yij J
[6]
J
J
J
)Sj
J
j
j
[111
Finally e q u a t i o n [9] is used to substitute for h i in [11] and order of s u m m a t i o n is reversed to give the typical set of equations for sire j, with both interaction and herds absorbed. (~. nijI t -- ~. n~jC;] + RS-' )sj 1
Substituting for 7ij [15]yields the following equations with interactions absorbed ( Zt n"i j i _ ~. n~jC~-jl)h i + Zt ( n" i j i _ nijCij2 -1
[10]
")'ij in [10] to give the following equations for
i
7ij = 7"ij-nijCii] (hi + sj)
t
Now equations [5] are used to substitute for
Equations [4] are only of order t, the n u m b e r of traits. For [3] and [4] the interaction f o r a particular size by herd cell is:
J
[81
-- ~ i'
1
~., (nijlt - - n2C -1 )D~ 1 ij ij l 2 1 (nij, It nij,Ciij,) sj' = ~Yij -- ~.nij~'ij -i l
[7]
This involves storage of several vectors of length q, the n u m b e r of sires. A vector is required to store nij , t vectors to store Yij and t 2 to store
~(nijlt n 2 C'7..1 )h i i -- ij ij
[12]
The matrix p r o d u c t within braces is of order t for each j and j'. Calculations within a herd Journal of Dairy Science Vol. 61, No. 5, 1978
606
LEE
need be done only for pairs of sires which both have at least one daughter in a herd. For all other combinations the matrix is null. The complete coefficient matrix in [12] for all sires and traits is of order tq. Accumulating across herd yields the final coefficient matrix and adjusted right hand sides. The computing method reduces the solution of equations [2] of order t(q + c + p) to solution of equations [12] of order tq. Because of the general form of equations [12] there may be little if any advantage in further use of the absorption method when sire evaluations are desired for all traits. Further use of the absorp-
Journal of Dairy Science Vol. 61, No. 5, 1978
tion method would be helpful if sire evaluations are needed for only one trait adjusted for all other traits. Collect sire equation for each trait together leaving the trait for which one desires sire evaluation for the last trait. For each trait adjusted sequentially solve for previous traits and ignoring later traits. This involves t solutions of equations of order q and associated adjustment of right hand sides. The solutions for the last trait are then solutions to equations [12] or [2] for s in that trait. This is an appropriate computing approach in dairy sire evaluation for milk yield adjusting for age at calving and possibly other traits.