A Generalized Computing Procedure for Setting Up and Solving Mixed Linear Models1 EILDERT GROENEVELD and MILENA KOVAC Department of Animal SCiences University of Illinois Urbana 61801 ABSTRACT
A generalized computing procedure applicable to a large set of problems in animal breeding including ordinary least squares, generalized least squares, and mixed model equations is presented. The procedure is implemented using sparse storage of coefficients and iteration on data. It allows any number of fixed and random effects, with any number of levels, any number of covariables. multipletrait models, heterogeneous variances, missing values. and multiple incidence matrices for different traits. Under the sparse storage scheme. the relationship matrix can be included without sorting. Subroutine interfaces are presented that allow identical treatment of the sparse and full stored coefficient matrices. The system of equations can be solved either by direct procedures or iteratively. Modern computers with adequate memory easily can accommodate 100.000 equations under the sparse storage scheme. The iteration on data version. reading data from memory. can accommodate problems up to to times larger. Reading data from disk further increases the scope as might be required in national breeding programs. Pseudo codes for both strategies are given. (Key words: mixed linear models, multiple incidence matrices. programming strategy)
models, the translation of matrix notation into program code does not seem to be equally straightforward. The literature reports many mixed model analyses: however. most are single trait and often are on precorrected data. Correspondingly, multiple-trait mixed model evaluations are not possible in the major statistical packages like SAS, the Statistical Package for the Social Sciences (SPSS). or the BMDP statistical software. VanRaden (12), using results from Harville (3), demonstrated that with some "tricks", mixed model solutions can be generated from the SAS package. However, this can also be seen as a strong indication that generalized computing procedures for mixed linear models are missing. Although there is nothing inherently new in translation of results from matrix algebra into scalar notation for program implementation. fairly little has been reported on computational strategies covering the complete area of ordinary least squares (OLS), generalized least squares (GLS), and mixed model equations (MME). Where computing strategies are presented, special cases are discussed (5, 9). The purpose of this paper is to present a generalized procedure to set up and solve systems of MME for multiple-trait. heterogeneous variances, missing data, covariables and, finally, different incidence matrices. A generalized program code will be developed that can accommodate anyone of the above models and also allowing rapid adaptation for OLS or GLS analysis.
INTRODUCT10N
COMPUTATIONAL PROCEDURES
Although matrix notation is very comprehensive and concise for all types of statistical
Direct translation of matrix notation into programming code generates very inefficient programs if the internal structure of matrices is not taken into consideration. because arithmetic operations are carried out on all elements whether they are zero or not. Thus. a large number of unnecessary operations can be avoided if the inherent structure of the equation system is exploited. Additionally, for real life
Received March 7.1989. Accepted May 25. 1989. lSupponcd by the Agricultural Research Station. Hatch Project ILLU·35·0322. Schaumarm StifllUlg (FRG). and Deutsche Forschungsgemeinschaft (FRG). 1990 J Dairy Sci 73:513-531
513
514
GROENEVELD AND KOVAC
data sets, operating with component matrices like X will usually exceed the memory capacity of most computers. Therefore, to generate efficient programs, the inherent structure of the systems of equations has to be exploited and translated into scalar equivalents. This is done by computing products of certain matrix operations like X'X, X'R-1X directly without forming the component matrices. Pseudo code based on FORTRAN 77 will be presented, starting with OLS and progressing to MME in sufficient detail to clarify the programming strategy. Setting up systems of equations for the complete coefficient matrix is basically independent of the technique used to solve them. The following presentation will demonstrate setting up coefficient matrices for various statistical models storing either the complete coefficient matrix or its nonzero elements only in memory.
sponding sum of observations for the same level. All of the following pseudo codes will assume that each effect is coded sequentially starting from 1. The basic program code for setting up this system of equations for OLS is given by pseudo code 1 in the appendix. Interaction can be treated as other main effects by coding the levels of interactions sequentially. Covariables. Including a covariable nested within the first effect in the model the scalar notation for X'X and X'y gives [4]: n\. 0 X'X=
nIl
n2.
nIl
n21 0.1 n21
n12
n22
n\3
n23
xI..
a
Ordinary Least Squares
0
a a
n22 n23 a X2.. a 0 XlI. x2I. n.2 0
a
Xl2. x22.
n.3
XI3. x23. 2 x 0 XlI. x12. x 13. I.. a x22.. X2.. x2\. x22. X23 a
The OLS system of equations in matrix notation is:
Yl..
X'Xb = X'y
Y2 ..
[ 1]
n12 n\3 xl.. 0
Y.\.
with X being the incidence matrix, and b and y the vectors of the unknown parameters and data, respectively. Solving for b gives: b O = (X'X) - X'y
[2]
and X'y =
Y.2. Y.3. LX IikY 1ik ik
L
ik
Levels of Fixed Effects. Carrying out the multiplications X'X symbolically for a classification model gives the well-known result for X'X and X'y in [3], assuming two main effects with two and three levels each:
X'X=
["'
~ll
n12 n\3
a n2. n21 n22 n23
nIl n21 n.l
a a
n12 n22
a a
n.2
n\3 n23
a a
n.3
]
X'Y{ YL] Y2.. Y." Y.2. Y.3.
[3]
with n\. being the number of observations in level I of factor 1, and YI.. being the correJournal of Dairy Science Vol. 73,
No.2, 1990
X 2ikY2ik
[4]
Programmatically, covariables should not be treated as a special case but rather within the classification model. The accumulation for each element in X'X can be written as: XXO=XXO + clxc2. When accumulating for portions in X'X that deal with main effects only, both cl and c2 are 1 giving nij' If we are dealing with main effects times the covariable, c 1 is 1 and c2 is the independent variable x, giving the subclass sums Xij.' Finally, dealing with the covariable portion of the matrix cl and c2 are the covariables that accumulate the sums of the products of the covariables. The subroutine "determine_cl_c2" in pseudo code 2 delivers the correct factors, which modify pseudo code 1 only slightly (modifications to the previous pseudo code are printed in boldface). Note that the covariables themselves cannot be used for addressing the X'X and X'y as with the main effects. If they are nested within a
515
SEITING UP AND SOLVING MIXED LINEAR MODELS
main effect, the level of this effect has to be used as a displacement. Moreover, the number of levels of the nesting effect is also the number of levels for the covariable. The logical variables "cov" specifies covariable effects, and "nested" specifies in which main effects the covariable is nested. The declaration part of the program has to be modified accordingly and the subroutines to modify the addressing of X'X based on the nesting effects and the computation of c 1 and c2 has been added giving pseudo code 2. This code also accommodates models with more than one covariable.
TABLE I. Data set I. Classific ali on
Traits
I
12
10
2 2
I 1
II
2
13
2 2
2 3
14 17
1 I
RO=
[~
a 12
~
with its inverse
a 12
R-oI --
[5]
sym
with the solutions:
In the following section. V is defined to be the residual variance covariance matrix R; thus, GLS fixed effects models only are considered.
Homogeneous Residual Covariance Matrix. In GLS with two traits, the residual covariance matrix would be R=I 18> Ro. I is the identity matrix of order number of units and Ro is the residual covariance matrix for traits measured
[ 4a11 4a12 X'R- 1X = [2a11 2a12 ] [ 1al1 2a 12 2a22 1a12 [Oa11 Oa12 ] [2a 11 2a12 Oa12 Oa22 [Oa11 Oa12 ] [1a 11 1a12 Oa 12 Oa 22
2.6
] [II ] sym
The GLS set of equations is:
0
3.\ 3.0
on the same unit. The residual covariance matrix is thus the direct product (18)) of I and Ro.
Generalized Least Squares
[2a11 2a 12 ] 2a 12 2a22
2.1 l.l 2.2
~2
[7]
Sorting traits within levels of effects gives a very distinctive block diagonal pattern of the R matrix. The structure of the coefficient matrix will be demonstrated using the six-record sample data set 1 (Table 1). The first two columns denote the levels of each of the two main effects with two observations on each experimental unit with covariance matrix Ro as in [7]. Setting up X and performing the operations X'R-IX gives the coefficient matrix in [8].
11 [2a 2a12
0
4a 12 ] 4a 22
1a12] 1a22 12 ] 2a22 1a12 ] 1a 22
2a
2a12 ] [Oa 11 Oa12 ] [Oa11 Oa12 ] 2a 22 Oa12 Oa22 Oa12 oa 22 11 1a12 ] [2a11 2a12 ] [ 1all 1a12 ] 1a 12 1a22 2a 12 2a 22 1a 12 1a 22 12 ] 11 0 0 3a12 3a 22 [2a11 2a12 ] 0 0 2a 12 2a 22 [1a11 1a12 ] 0 0 1a 12 1a22
[1a [3a 3a
[8J
Journal of Dairy Science Vol. 73,
No.2, 1990
516
GROENEVELD AND KOVAC
Similarly, we get for the right-hand side (RHS) expression [9]: a 12 ] a22
a12 ]
a 22
a12 ] a22
a
12 ] a22
[12.+10.] 2.1+1.1 [11.+13.+14.+17. ] 2.2 + 3.1 + 3.0 + 2.6 [12.+10.+11. ] 2.1+1.1+2.2 [13.+14. ] 3.1 +3.0
a12 ] [17.] 2.6
a 22
(9]
Going from one trait to two doubles the size of the equation system. Every element for each level within each effect in the coefficient matrix is replaced by a block of order number of traits. Accordingly, instead of accumulating l's for each record, now the respective elements of the inverse of the covariance matrix have to be accumulated. This leads directly to the programming algorithm: for each level within each effect a small block of size ntrait x ntrait has to be generated. Pseudo code 3 gives the modified pseudo code 1 for GLS. It should be noted that these few lines of code perform a multiple-trait evaluation for any number of effects and traits. If the model includes covariables, the respective lines from pseudo code 2 have to be included. But for a fixed model, GLS for multiple traits is identical to separate OLS analyses for each trait in the case of no missing values, homogeneous (co)variances, and identical incidence matrices for each trait (proof 1). Proof 1. The incidence matrix X for GLS under these restrictions can be written as direct product of incidence matrix for one trait (X I) and identity matrix of order of number of traits (In)' Thus:
/30 =
=
= =
I [(X; XI )- ® RoJ(X; ® R O ) y
=
{[(X; XI ® In)- (X'I ® In)lY
=
{[(X'I XI )-X'I] ® InlY
(X'X)-X'y
From this folJows that independent OLS analysis for each trait is equivalent to a GLS analysis for aU traits under the condition of homogeneous (co)variances, no missing values, and identical incidence matrices. Thus, pseudo code 3 on its own does not make much sense, because it can be replaced by a computationally cheaper OLS analysis. Heterogeneous Residual Covariance Matrix. In the example with homogeneous residual covariances. Ro is assumed to be the same for all records. With heterogeneous variances, different variances among the levels of one or some effects exist. Let us assume heterogeneous variances for the first effect in data set I. Two covariance matrices have to be specified, one for each level. The covariance matrix for level 1 would be as in (7] and the corresponding one for the second level is (10].
[? ] I
~-
sym
•
°12 c?:,. 2
[. ]
with its inverse RQI. =
all
a•l2
sym
~2
•
[10]
The complete R-I for these records from data set 1 thus becomes (11].
(X'R-IX)- X'R-Iy [(X'I ® In)(I ® ROIXX I ® InW (X'I ® In)(I ® RQ I)y
=
[(X; ® ROI)(X I ® Inn-(X'I ® RQI) Y
Journal of Dairy Science Vol. 73.
No.2. 1990
[11]
517
SElTING UP AND SOLVING MIXED LINEAR MODELS In this case the product X'R-1X is (12].
[2a 11 2a12] 2a 12 2a22 [ 4aj, 4a;2 ] [1a.;, 1a12 ] 0 4a12 4a22 1a'2 1a22 1a 11 11 1 2a12 ] [ i 1ai2 ] [2a 2a'2J [1a;1 1a!2 ] X'R-1X= [2a 2a 12 2a22 + 1a;2 1a22 2a12 2a22 1a12 1a22 [Oa 11 oa'2J [2ai1 2a~2 ] 0 Oa12 Oa22 2a 12 2a 22 [oa" oa'2J [1a;'1 1a;,2 ] 0 Oa 12 Oa22 1a'2 1a 22 [2a 11 2a12 ] 2a'2 2a22
0
[Oa 11 Oa12 [2a j, 2a'2
Oa12] [Oa 11 Oa22 Da 12 2a;2 ] [ 1a!1 2a22 1a'2
Oa'2] Oa22 1aj2 ] 1a22
0
0
[2a!1 2a!2] 2a 12 2a 22
0
0
[ 1a;1 1a!2 ] 1a'2 1a 22 [ 12J
The first diagonal block contains 2RQI, since both records generating this contribution originated in level 1 with the covariance matrix Ro. The same is true for all other blocks in the first row and column. However. all the diagonal blocks for levels 2 and 3 for the second effect originate in level 2 of the first effect, which has covariance matrix ~. The diagonal element for level 1 of effect 2, however. has a contribution from both levels, two records from levelland one from level 2. This is why both R01 and
observation mlssmg. the covariance matrix is
[~] with its inverse [a~I]' The a~l and all are generally not the same! Expanding the covariance matrix to the order of number of traits. by adding zero lines and columns as in [13], allows computation of the coefficient matrix and RHS following the procedure in pseudo code 4, because all covariance matrices have same dimensions.
R;= [~
R(j I· are present. In general terms, the part of R pertinent to the level in which the record originated has to be applied to both the left-hand and right-hand sides. The R(j1 can be stored in a three-dimensional matrix "v". The added third index accesses the covariance matrix corresponding to the levels of the effect that exhibit heterogeneous variances. Pseudo code 4 assumes heterogeneous variances for the first effect. The bold lines of code have to be added to pseudo code 3. Missing Values. Missing values may arise unintentionally in experiments due to culled animals having only generated part of their records or in selection processes with sequential cullings. For a two-trait complete record, the residual covariance matrix and its inverse are as in (7 J. For a record with the second
0 0
]
with its inverse
RQ1. =
[";\
0
0
0
1
[13]
With the first record missing, only the (2,2) element would be nonzero. Treatment of missing values is similar to that of heterogeneous covariances: instead of choosing the R 1 based on the level in which the record onginated. R01 has to be selected on the basis of the pattern of missing values in the record being processed. For an example of two traits with only the second observation missing, pseudo code 5 shows the modifications.
0
Journal of Dairy Science Vol. 73.
No.2. 1990
518
GROENEVELD AND KOVAC
Mixed Model Equations
The mixed model set of equations is:
[
X'R-IZ ] [ Z'R-IZ+G-I
with b and u being solution vectors for the fixed and random parts, respectively. The difference from the GLS set of equations is inclusion of G-I, the inverse of the covariance matrix of u. With an animal model and no relationships among animals, G-I is diagonal with the additive genetic variance as elements. In the multiple-trait situation it is a block diagonal with the additive genetic variance and covariance among the traits (Go), provided they are arranged within levels (in this case, animals). G-I can be added after the complete system is set up. Thus, programmatically, all effects (random or fixed) can be treated alike. Thus, it is convenient to use only one incidence matrix W as [X Z]. In the first step, the coefficient matrix W'R-IW and the RHS are set up from the data as described for fixed models. In mixed models, covariance matrices have to be added in step 2 for all random effects. The modification to pseudo code 5 shown in code 6 sets up the MME, assuming the third effect to be random. Relationship Matrix and Genetic Groups. Adding the inverse of the relationship matrix is straightforward following Henderson's rules (4). For this case, G-l is the direct product of the inverse of the relationship matrix A-I and Go. Because all elements of the coefficient matrix are accessed directly, the relationship data need not be sorted in any specific order. Pseudo code 7 gives the code for adding the inverse of the relationship matrix for an animal's relationship to its parents. The matrix "zeta" and vector "eta" are used to generate the coefficients ("lambda") suggested by Henderson. The pedigree file should contain one record for each animal to be evaluated. Each record includes animal, sire, and dam identification. If genetic groups for unknown parents are used in the model (10, 14), the unknown parent number is replaced by the group identification. The variable type is the number of known parents. For animal, parent, and grandJournal of Dairy Science Vol. 73.
No.2. 1990
b] U
[ =
[14]
parent relationships, coefficients in "zeta" and "eta" have to be changed accordingly. Multiple Incidence Matrices
In a multiple-trait analysis, data often originate in different environments, requiring different adjustments, i.e., different incidence matrices, for each or some of the traits. This problem is illustrated by the data in Table 2. The first three columns give the levels for the three effects A, B, and C, while the last three columns give three measurements (T!, T2, and T3) on each observational unit, which could be an animal. Let the first trait be defined for all effects, trait 2 for effects 1 and 3 only, and trait 3 for effects 1 and 2 as indicated in Table 3. Nesting traits within levels gives the incidence matrix W and the observation vector y in [15]. The first 9 columns in W pertain to effect A, the next 9 columns to B, and the last 6 columns to C. "100000000 010000000 001000000 000000100 000000010 000000001 100000000 0'0000000 001000000 000100000 w=oooo 10000 000001000 '00000000 010000000 1001000000 i0000001QO
000000010 1 ,000000001 !000100000 1000010000 ~O 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 0 100000 000000000 OiOOOO 000001000 000000 100000000 100000 000000000 010000 001000000 000000 000000100 0001001 000000000 000010, 000000001 000000 100000000 000000000 ~~ggggi 001000000 000000' 1 000100000 000 000000000 000010 000001000 000000 000100000 100000 000000000 010000 000001000 000000 000000100 000 100 000000000 0000 10 000000001 000000 1 1 ,
001
t t t
10.0 23 r400.0
12.0 2.0 '350.0 80 18 3800 15.0 y" 2.5 410.0
130 19 450.0 12.0,
22 ' 3900 I 100, 20 3800,
t t [15]
Because trait T2 is not defined for effect B and trait T3 is not defined for effect C, the respective columns have only zero entries as
519
SEITING UP AND SOLVING MIXED LINEAR MODELS
TABLE 3. Definition of traits for effects I of
TABLE 2. Data set 2.
Animal A
B
C
T1
T2
T3
I 2 3 4 5 6 7
2 I 3 I 2 2 3
I
10 12 8 15 13 12 10
23 2.0 1.8
400
I
I
2 1 3
2
1 2 I 2 I 2
Trait
2.5 1.9 2.2 2.0
Ix
la"
!J
'"
au [ sym
a"'" a"'"
a" a" 0
] 2
0
o , [a" [a" a" a,,] [::: a" a,,]o [a" a" o , [a" [a" a" a"] a" a" [a"a~2 a"a" a,,] [a" o [ana~2 a'2 a,,] [an a" o 0
0
0
3 33
W'R"W=
8 13
~,
2
0
,
0
0
0
aX!
~,
0
0
1
a3:J
813
3 23
1 8 12
0
0
a 23
2
0
0
a l3
1 a t2
0
a'2 a"a" 0
0
a'a"2 a"a" a" a"a", 0
0
0
[a" a" a~, ] [a" a"a~, ] an [a" aXl a" a,31 [a" a:J o a" a,,] [a"a~2 a" a,,] a" fa" 2 8'2
,
,
0
0
0
2
0
o
0
a 12
L0
x x
x
= Defined. - = nOI
defined.
a
a,31 a,,] [a" a,,] [a" a"a" ~] an' a" a", aOJ r"aOJ OJ a" a'an2 1 ra" ::] a'2 a" raa" a" o a [~' a ] [a" t a"a,,] la" ana'2 a a'a",2 a,,] [~'a" a,,] [a" 0 +~ a"a", [""a"a a"a" oo a"a" a~, ] an] [a" a" o o [a" a a:n au Xl a" a" o a"a" a~, ] a~, ] [a" '" 0]o [a" a", a Xl a" a" o a" a" a~, ] a" o a,,] [a" a" [a" o a", an a" a,,] [a" a,,] [a" a,,] [a"a" a,,] [a., a"a,2 o a" ~1 a" o o o o a,,] [a" ~,] [a"a~2 a"a,,] a~ a" a"o 1 [a" o 0 a 12 0 8 23 0 ~ 0 1 a'2 0 32' 0 8» 0
0
8'2
x
Performing the multiplication of W'R-IW for data set 2 gives [17]. The corresponding quantity for the RHS after factoring out R l is [18]. The system of equations has five zero rows and columns corresponding to the zero columns in (15). Notice that apart from this, [t7] and [18] are the standard GLS expressions. Adding covariance matrices for random effects turns [17] into the coefficient matrix for mixed models.
o
0
[a" a'a"2 a,,] a" a" a" a",
0
o
[16J
0
3
380
data set 2 [16].
-all
2
390
o
r
x x x
I
2 3
350 380 410 450
indicated by the arrows in [15]. R-I is again a block diagonal matrix of variances and covariances among traits measured on the same unit. Its diagonal blocks R I are of order three for
R-oI --
model.
Effect
Data set 2
3
me
8 23
18,2
a:J:)
8 ,3
0
2
0
0 0 at] 0 0 0 l2 0 81] 0 0 2
0
8 23
1 8'2
0 0 0 0
8 33
8 13
0
r"
1 3'2
8 12
3' 3
:] 8 23 ~
8 33 j
8'3
8 33
8 13
0 0 0 0 0 0 0 0 0
1 a '2
1
2
0
3
0
2
0
0
2
0
0
0
0
~2
2 8'2
8 23
0
0
0
3 23
2 a'2
0
0
32'J
j
3. 2
0
0 0
t
0 8 21
0
~
1
a l2
0
0 0
t
8'2
0
0
0
0
~2
0
0 0 0
2
0 0 0 0
8n
t
4
01 0 0
0) 0 0
l2
3'3
0] 0
0] 0 ~2
12
0
0
~,
0] 0
0
0
0] 0 0
0
8'3
0
8
~
~
0,
'3
0] 2 0
0
8'3
~
31:)
8 12
0
0
~
OJ
0
0
0
0] 1
l2
2 0
0 0
a"a" a" [a"a a"a" a" [a" aa"22 a" [a" a"o an a' , [a" 2 ~1 a" [a" a" ~] a"
r"a"
2 a'2
0
3
[a" aa" 3\2
o
t
22
0
~]j~
t [17]
Journal of Dairy Science Vol. 73,
No.2, 1990
520
GROENEVELD AND KOVAC TABLE 4. Data sets 3a and 3b. 1
a" a'2 a'3] [10.+ 8.+13. ] a,2 a 22 a 23 2.3+ 18 +1.9 [ a'3 a23 a33 400.+380.+450.
[ ::~ ~~ :~] [~:o~:~o] 0
0
2.0 +2.5
a'3 a23 a33 350.+410. all a,2 a'3] [10.+13.+12. 000
[ a'3 a23 a" a'2 o 0 [ a13 a23 all a'2 a,2 an [ 0
o
a33 a'3] 0 a33 a 13 ] a 23 0
2.3+19+2.2 400.+450.+390.
]
0
0
[8.+10. ] 1.8 + 2.0 380.+380. [10.+12.+15.+12. ] 2.3+2.0+2.5+2.2 400.+350.+410.+390.
400.+380.+450
[18]
Multiple-trait multi-model evaluations are surprisingly simple to implement: rows and columns for a trait undefined at a particular level have to be skipped while the system of equations is set up. Alternatively. the complete system can be set up as if all traits are defined for all effects. Before solving the system of equations, the corresponding zero rows and columns have to be set to zero. In terms of program code. a matrix (preferably of data type "logical") keeps track of which effect is defined in which trait and that computations are being performed accordingly. It seems advantageous to rearrange the order of loops to make it easier to test whether a trait is defined for a certain effect, as shown in pseudo code 8. The example given assumed a fixed model. If effect A is assumed to be random, G-I has to be added to the respective portion of the coefficient matrix as described before. To demonstrate the versatility of the multiple model algorithm. consider a performance test with daily gain (00) and ultrasonic backfat (BF) measurement in boars (data set 3a, Table 4) and daily gain and litter size (L) in gilts (data set 3b, Table 4). Effects considered are covariable liveweight (WT) for BF, month of test (MoT) for 00 and BF, and season for L. The model for a simultaneous genetic evaluation is summarized in Table 5. Journal of Dairy Science Vol. 73.
No.2. 1990
(g)
SF
L
(mm)
1 2
- - 3a. Perfonnance records for boars - 125 1 678 12 118 3 712 15
3 4
- - 3b. Perfonnance records for gilts - 1 562 9 4 632 12
. IWT = Liveweight. MoT = month of test. DG gam. BF = backfat. L = liner size.
all a'2 a'3] [10.+8.+13. ] [ a'2 a 22 a 23 2.3+18+1.9
o
Season DG
(kg)
a" a,2 a'3] [12.+12. ] [ a'2 a 22 a 23 2.0 + 2.2
o
MoT
Animal WT
a" a'2 a'3] [15.+10. ] a,2 a 22 a 23 2.5 + 2.0 [ a,3 a23 a33 410.+380.
= daily
Data sets 3a and 3b are combined into one data set 4 to allow their joint analysis (Table 6). Liveweight is not defined for gilts, and effect season is used only for litter records. and thus, is not applicable to records from boars. After having generated the data file. the program has to be parameterized. Using the variable names from the pseudo codes. "ntrait" is equal to 3. "neff' to 4, and the logical vector "cov" set to ".true." for position 1 if the covanable is placed as the first effect in the model. The logical matrix "model" is set according to the entries in Table 5. Note that data set 4 reflects the missing value case treated in pseud~ code 5. From the pattern of missing value It follows that two Ro are required. one for boar records and the other for records from gilts (assuming complete records for boars and gilts). However, there is only one genetic covariance matrix Go for the three traits, DO, BF. and L. Thus, breeding values for DO, BF, and L are obtained for boars and gilts. Analysis of data set 4 requires a combination of pseudo code 2 (covariables), 3 (multiple traits), 5 (missing values), 6 (random effects). possibly 7 (relationship matrix), and 8 (multiple incidence matrices). Moreover. a program code
TABLE 5. Model defmition for data set 4. 1 Trait
WT
DG
Fixed Covariable Fixed
BF L
MoT
Season
Animal
Fixed
Random Random Random
. IWT = Liveweight, MoT = month of test, DG gam, BF = backfat. L liner size.
=
= daily
521
SETIING UP AND SOLVING MIXED LINEAR MODELS TABLE 6. Data set 4 1 Data set 4 Animal
MoT
WT
1 2 3
1 3
125 118
4
lMoT
= Month
of test, WT
= liveweight,
Season
DG
BF
L
678
12 15
missing missing 9 12
712
1
562
4
632
DG
= daily
compnsmg all of these elements can handle equally well simple single-trait OLS and multiple-trait mixed model analysis with different incidence matrices and missing values. Sparse Storage of Coefficients
The nature of MME as applied to animal breeding problems results in sparse coefficient matrices. The proportion of nonzero elements rarely exceeds 1% and for animal models is more likely in the order of .1 %. The lA, JA, A storage scheme is used as a general device to store and access the nonzero elements of sparse coefficient matrix (2). The vector A contains the nonzero coefficients, JA the column number of each nonzero element in the coefficient matrix, while IA contains the starting address of each row of the system of equations in JA and A. Because the total number of nonzero elements is known only after processing all data for a specific model, there is no easy way to set up these three vectors. Sparse Matrix User Interface. A set of subroutines has been written for setting up the lA, JA, and A vectors. The interface is designed such that no major changes have to be made to the previous pseudo code. Pseudo code 9 gives the adaptation of pseudo code 4 to sparse matrix storage. Apart from declaration of work space required, the line for setting up the coefficient matrix is being replaced by a call to subroutine "LHS". lA, JA, and A are the vectors, as discussed with dimension "lvec", which is the number of expected nonzero elements multiplied by a constant. Because hashing (6) is used to store and retrieve the locations of the coefficients in the conceptual complete coefficient matrix and the coefficients themselves additional work space has to be provided. For fast operation a constant in the order of 1.3 should be used. If memory is a restriction, this value can be re-
gain, BF
= backfat,
missing missing L
= litter
size.
duced at the expense of additional central processing unit (CPU) time required to store and locate the coefficients. As a general rule the large this number, the faster is the hashing. "nil' returns the number of nonzero elements stored so far. "s_type" defines the storage type, which can be either full or half-storage of the coefficient matrix. After all data have been read in, the procedure "CLEANUP_LHS" has to be initiated to eventually produce lA, JA, and A by compressing and sorting the three vectors. Solving Systems of Equations
So far, the system of equations has been set up directly in memory, either in its complete or sparse fonnat. Because the systems of equations are generally not of full rank, solving strategies have to be used that can accommodate singular systems. Subroutines that calculate a generalized inverse for matrices not stored in sparse fonnat are readily available [for algorithms, see (7)]. Multiplication of the inverse by the RHS produces the solution vector. This procedure is limited to small problems with equation systems perhaps up to 1000. However, computing time and rounding errors can be reduced by solving a system of linear equations directly without using its inverse (9). Various commercial sparse matrix solvers are also available (9) working with coefficient matrices in sparse fonnat. The Yale Sparse Matrix Package (SMPAK) (2) has been used. Input to this package is the lA, JA, A vectors described. The SMPAK solves directly by LU decomposition. The requirement for large additional work space by SMPAK has lead to investigation of Gauss-Seidel iteration on coefficients stored in sparse fonnat. To utilize memory most efficiently, GaussSeidel iteration should be perfonned on halfJournal of Dairy Science Vol. 73,
No.2, 1990
522
GROENEVELD AND KOVAC
stored coefficients based on the lA, JA, and A vectors. This requires modification of the algorithm applicable to full storage (7). Consider a full-stored system of equations in [19].
In half-stored format the lower diagonal of [19] would not be available. Rewriting [19] accordingly gives [20]:
[r
a12 a22 0
an a23 ] [ a33
[RHSl RHS2 - a2IbI RHS3 - a3I b l - a32b 2
bI b2 b3
]
=
]
[20]
Note that the zeros are not stored in sparse format. When performing Gauss-Seidel iteration. the first equation is solved for the first unknown, the second for the second. and so on. Because the coefficient matrix of the MME is symmetric, aI2 is equal to a21. After bl has been calculated, each coefficient of the current row has to be multiplied by the current solution and subtracted from the next RHS element. as indicated in [20]. Because the solutions change from one round of iteration to the next. this process has to be repeated for each round. Therefore, a temporary work space has to be provided to store the adjusted RHS. Various acceleration methods can be used to speed up convergence (1, 7. 13). The subroutine written to perform Gauss-Seidel iteration on half-stored coefficient in sparse format [20] allows specification of end-of-round relaxation (13). Scope of Sparse Matrix Techniques
Because sparse matrix techniques store only nonzero elements, the size of the system of equations that can be solved for a fixed amount of memory is a function of the number of nonzero elements. A genetic evaluation has been carried out on performance test data of boars (data set 5). The model had one fixed main effect (month of test), one covariable (liveweight) nested within the fixed effect, and Journal of Dairy Science Vol. 73,
No.2. 1990
three random effects (herd, litter, and animal). The 11.152 animals were evaluated on two traits simultaneously; 9297 animals had records. The order of the system was 30.015. Including the complete relationship matrix generated 385,200 nonzero elements. Setting up the coefficient matrix in lA, JA, A format and the RHS takes approximately 5 min CPU time on a SUN 4. This includes the cleanup procedure. On the basis of data set 5, Gauss-Seidel iteration on coefficients leads to a program of around 9 MB, while the use of SMPAK results in a program of more than 19 MB. The respective CPU time was 50 min for Gauss-Seidel iteration on coefficients and approximately 95 min for the SMPAK. The stopping criterion for the iteration on coefficients was no change in the third decimal place of any solution between consecutive rounds. The GS iteration on coefficients as described clearly requires less time than SMPAK for the given stopping criterion. The CPU time measurements seem to indicate that Gauss-Seidel iteration on coefficients is superior if only one solution is to be obtained, as is the case in predicting breeding values. If a system of equations with the same coefficient matrix has to be solved repeatedly [generating inverse (E. Tholen, personal communication). variance component estimation (I. Mizstal, personal communication), solving systems with multiple RHS], direct solutions performed by SMPAK may be superior. Gauss-Seldel Iteration on Data
It is clear that very large problems such as national genetic evaluations exceed the capabilities of the sparse matrix techniques on generally available computers. Creating the current equation from data before it is used leads to Gauss-Seidel iteration on data. which has proven to be a very powerful tool in such situations. The following translates these procedures for setting up the complete coefficient matrix into the context of iteration on data, thus going well beyond the size of problems possible with sparse matrix techniques. Input and Output Considerations. The general strategy to solve systems of equations by Gauss-Seidel iteration has been based on reading data from files repeatedly and having multiple copies of the data file sorted according to
523
SETIING UP AND SOLVING MIXED LINEAR MODELS
this routine and passed in "current_addr". Furthennore, covariates are accommodated and the pertinent RQI in the case of missing values or heterogeneous variances is supplied by "read_data". In step 0 of pseudo code 10, data and pedigree infonnation are read into memory. Main effects are stored in an integer table to avoid data type conversion during iteration. Depending on the number of levels integer*2 or even integer*1 could be used to drastically cut down on memory requirements. Covariates and traits are stored in matrices of type real. The pedigree matrix contains entries as described in pseudo code 7 plus two addresses that link parents with their offspring. Instead of initializing the solution vector with zeros, previous solutions can be read in. To explain the principles of the algorithm applied, the MME are split into smaller subsystems [21]. The vector of unknown parameters for fixed effects b is broken up into b(, which consists of effects with a smaller number of levels like breed, month of test, regression; and a generally larger b2, which includes all other fixed effects. Random effects are divided in two vectors. The first, v, contains effects with independent levels (litter, herd) and a vector u for additive genetic effects (sire, animal). Matrices X J, X2, Zy, and Zu are corresponding incidence matrices. Accordingly, the covariance matrix for random effects is split into G y and G u. The random effects in v and u are assumed to be independent of each other. Matrix R-l is the inverse of covariance matrix for residuals. In addition, genetic groups g for unknown parents (10, 14) are added to the system and mapped to "animals" with incidence matrix Q.
each effect (11). However, this is only useful when memory constraints do not allow storage of all the coefficients of one effect. With memory sizes of many megabytes being the rule rather than the exception, this technique does not seem to be appropriate any longer (8). Contrary to Jacobi iteration, which does not guarantee convergence (7), Gauss-Seidel requires processing data once for each effect for each iteration. Even with optimized input and output (I/O) this puts a high load on the computer system if data are being read from disk. Accessing records repeatedly is much more efficient if data can be stored in memory. Comparing binary I/O from disk to memory access on a SUN 4 workstation reduced the sum of CPU and system time by a factor 150. Even large data sets in the order of hundreds of thousands can be stored in memory on most computers. Setting Up and Solving the Equation System. The main reason to apply iteration on data is its efficient use of memory. The special structure of MME is not exploited in either the direct solver in SMPAK or in the GS iteration on coefficients derived from [20]. However, pseudo code 10 gives the algorithm for iteration on data exploiting the special structure of the MME. Pseudo code 10 comprises the complete set of algorithms, allowing any number of fixed and random effects, covariables, multiple incidence matrices, heterogeneous variances, and missing values, whereas previously each problem has been illustrated with separate codes. The same variable names have been used wherever possible. The routine "read_data" provides one record to the calling program. Direct addresses in the solutions vector are calculated in X'R-1X I I X'R-1X 2 I Z'R-1X
Z'R-IX
Z'R-IX
Z'R-1X
v
u
0
1 1
X'R-IX I
X'R-1X 2
v
u
0
2 2
2 2
I X'RI Y
X'R-IZ I y
X'R-IZ
u
0
13 1
X'R-IZ
X'R-IZ 2 u Z'R-IZ
0
132
X~R-Iy
0
V
1 Z'Rv Y
Z'R-IZ + G- I
--G~lQ
U
1 Z'Ru y
-Q'G-u 1
-Q'G-IQ
g
0
2
Z'R-IZ v
Z'R-IZ u
0
v
+ G-v I v v
I
v
U
U
U
U
u
[21]
The system is solved iteratively using a combination of direct solvers and Gauss-Seidel iteration as suggested by Henderson (13). Each
iteration is perfonned in three relatively independent steps. In the frrst one, the first equation in [21] is simultaneously solved for all effects Journal of Dairy Science Vol. 73.
No.2. 1990
524
GROENEVELD AND KOVAC
x,
in vector ~I [22]. ~~ = (X'jR-1X I)- X~R-I(Y-Xzll~-l- Zvyn-I_ zuun- I)
[22]
Prior to the first iteration, the appropriate coefficient matrix for the first block of equations X~R-IXI is set up and factorized. This is the first part in step 0 of pseudo code 10. In step I, each observation y is adjusted for all effects not included in ~I and accumulated in a
-I( y- X21-'2 An-l - Z vv_Jl-l - Z uu n-I) . 1R new RH S Subroutine "accum_rhs" carries out these operations. Its code is identical to the subroutine "blup_c", except for the "if' bock where "sigma" is accumulated. The small system [22] is solved directly using "forward and backward substitution". the results are new improved solutions for effects in vector ~1. Applying Gauss-Seidel to the second and third equations in [21] gives [23] and [24]. [23]
1 _Jl = (Z'R-1Z An _ X21-'2 An - Z uun-I) v y v + G-v )- Z'vR-1(y - X II-'I
More than one effect can be included in the vector of unknown fixed effects ~2 as well as in vector of unknown random effects v. Each of the effects usually contains a large number of independent levels. The equations that belong to one effect only are solved one group at a time. This is reflected by the "do" loop around step 2 in the main program. The coefficient submatrix for one effect has a very simple form; it is diagonal or at least block diagonal in multiple-trait analysis. The covariance matrix
[24]
has to be added for the random effect v, which is done in step 0 prior to iteration. Observations adjusted for all other effects using the last solution are accumulated for the RHS in step 2. Each small block, which includes all traits within a level, is solved directly. The procedure has to be repeated for each effect in ~2 and v. The third part (step 3) solves for additive genetic effects [25] with or without relationships and genetic groups [26].
(Z~R-IZU + D 1 ® G~~)-w w=
{Z~R-l(y - XI~~ - X2~2 - Zv~) + Q'G~Ig"-l (Q'G~IQ)- Q'G~lun
= matrix of diagonal elements of inverse
of numerator relationship matrix, of all off-diagonal elements of inverse of numerator relationship matrix, G- 1 = inverse of additive genetic covariance uo matrix among traits, and w =RHS. D2
= matrix
Observations are adjusted for all other effects and accumulated in RHS in subroutine "blup_c". Note that solutions for all other effects are available from the present round. The
Z~R-IZu is a sum of inverses of the residual Journal of Dairy Science Vol. 73.
G~;J
u}
[25]
[26]
where D1
- [D2 ®
No.2. 1990
covariance matrices for all observations on an animal. In addition, the RHS includes information from relatives (- [D 2 ® G~~J u) or genetic groups (+ Q'G~lgn-l) and D 1 ® G~~is added to the coefficient matrix in subroutine "relation". This routine is similar to pseudo code 7; however, it has to consider all contributions from relatives using the pointers in the pedigree table. Since the animal solutions are generated by GS iteration one block of equations for all trait" within the animal is solved at a time. This requires the knowledge of the offdiagonal elements among relatives. Therefore, data and pedigree files have to be sorted by effect "animal". There is no need for special treatment of
SETTING UP AND SOLVING MIXED LINEAR MODELS
genetic groups. They are processed the same way as animals without records. The small system (traitxtrait) is solved directly. The memory requirements are largely determined by matrices for data and pedigree information, and vectors "solution", "work", "diag", and "sigma_first". Data set 5 requires 796 kB for data and pedigree storage and 320kB for the rest. This compares very favorably with the memory requirements given for the sparse matrix techniques reducing the memory requirements for the given data set by a factor of 19. An assumed maximum program size of 19MB would allow to solve a system of around 900,000 equations under the animal model. CONCLUSIONS
These procedures are very powerful and flexible tools to tackle a large variety of problems in animal breeding. With systems of equations up to 100,000, sparse matrix techniques facilitate compact source code. Because solving the system of equations is independent of its structure, alternative solving strategies can be used simply by exchanging the solvers in the source code. For direct solutions, solvers from sparse matrix packages can be used. However, Gauss-Seidel iteration on coefficients based on the same IA. JA, A fonnat reduces memory requirements, since much less work space is needed. Also, CPU time is decreased. It should be noted that in a multiple-trait model, the degree of filling of the coefficient matrix is always higher than in a single-trait model of the same order, because for each entry in the coefficient matrix in the single-trait case we have ntraitxntrait entries. Therefore, using sparse matrix techniques, much larger systems of equations can be solved for a given amount of available memory in single-trait cases. Memory requirement is considerably reduced by adapting the described computing strategies to iteration on data. Reading data repeatedly from disk is a severe limitation and is avoided if data are initially read into memory. This has become possible because memory size of computers is increasing rapidly. If memory is still a limiting factor relative to the size of the problem, the source code can be changed easily to read all data or only those parts that do not fit into memory from disks. Because the block of equations pertaining to ~1 is solved simultaneously, the decision about which effects to place in it considerably influ-
525
ences memory requirements and processing time. Although placing an effect with many levels increases memory requirements, it also tends to speed up convergence of the complete system, thus reducing the number of iterations. Note that the strategies presented in pseudo code 1 through 9 are directly applied when setting up the ~1 coefficient submatrix. Thus, the proposed algorithm for iteration on data comprises both strategies: setting up the coefficient submatrix in memory and solving the subsystem and Gauss-Seidel iteration on data. Because one basic program code can deal with most of the linear models in animal breeding and research, adaptation to specific problems is easy and quick. Software described in the text is available from the authors. ACKNOWLEDGMENTS
Helpful discussions with R. L. Fernando are gratefully acknowledged. The valuable suggestions from the referees is greatly appreciated. REFERENCES 1 Berger, P. G., G. R. Luecke, and J. A. Hoekstra. 1989. Iterative algorithms for solving mixed model equations. J. Dairy Sci. 72:514. 2 Eisenstat, S. C.• M. C. Gursky, and A H. Sherman. 1977. Yale sparse matrix package. 1. The symmetric codes. Res. Rep. No. 112, Dep. Comput. Sci., Yale Univ. 3 Harville, D. A. 1986. Using ordinary least squares software to compute combined intra-interblock estimates of treatment contrasts. Am. Stat. 40:153. 4 Henderson, C. R. 1976. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding value. Biometrics 69:83. 5 Henderson, C. R. 1984. Application of linear models in animal breeding. Univ. Guelph, Guelph, ON, Can. 6 Martin, J. 1977. Computer data-base organization. Prentice-HaIl, Eaglewood Cliffs, N1. 7 Maron, M. J. 1987. Numerical analysis. MacMillan Publ. Co., New York, NY. 8 Misztal, I., and D. Gianola. 1987. Indirect solution of mixed model equations. 1. Dairy Sci. 70:716. 9 Press, W. H., B. P. Aannery, S. A. Teukolsky, and W. T. Vetter1ing. 1986. Numerical recipes. Cambridge Univ. Press, New York, NY. 10 Quaas, R. L. 1988. Additive genetic model with groups and relationships. 1. Dairy Sci. 71:1338. 11 Schaeffer, L. R., and B. W. Kennedy. 1986. Computing solutions to mixed model equation.. Proc. Third World Congr. Genet. Appl. Livest. Prod., Lincoln, NE. 12:382 12 VanRaden, P. M. 1987. Using PROC GLM of SAS to obtain mixed model estimates. J. Anim. Sci. 65(Suppl. 1):95. (Abstr.) 13 Van Vleck. L. D., and D. J. Dwyer. 1985. Comparison of iterative procedures for solving equations for sire evaluation. J. Dairy Sci. 68: 1006. 14 Westell, R. A, R. L. Quaas, and L. D. Van Vleck. 1988. Genetic group in an animal model. 1. Dairy Sci. 71: 1310. Journal of Dairy Science Vol. 73,
No.2, 1990
526
GROENEVELD AND KOVAC
Appendix
pseydo Code 1:
Ordinary Least Squares
inteqer neff.dimx ! declare number of effects and dimension of system parameter (neff-3.dimx-2+3) ! supply numerical values inteqer nlevel{neff).xx(dimx.dimxl.effect(neffl.start(neff) real y.rhs(dimxl data nlevel/2.3/ specify number of levels for each effect start (1)-0 define start position of each effect in X·X and X'y do 1-2.neff start(l)-start(l-l)+nlevel(l-l) end do read (I effect.y ! input record do el-l.neff ! determine start addresses in current addr(ell-start(ell+effect(el) ! coefficient matrix for current record end do do el-l.neff i-current addr(el) do e2-l. neff j-current addr(e2) xX(i.jl-xx(i.jl+l set up LHS end do rhs(il-rhs(i)+y accumulate subclass totals end do
read next record
Pseudo Code 2:
Treatment of Covariable.
integer neff,dimx! declare number of effects and dimension parameter (neff-3.dimx-2+3+2J supply numerical values inteqer nlevel(neff).xx(dimx.dimx).effect(neff).start(neffl real y.rhs(dimx).c(n.ff) inteqer nested (neff) logical coy (n.ff» data nlevel/2~3,21 ! specify number of levels for each effect data coy /.fals •.•. fals •• ,.tru•• / data n.sted/ O. 0, 1/ start(ll-O ! define start position of each effect in X·X and X'y do 1-2.neff start(ll-start(l-ll+nlevel{l-ll end do
read () effect.y
! input reeD rd
call d8tQ~ne_cl_c2 (neff, cov,nasted,effact,c) do el-l, neff ! start addresses current addr(ell-start(ell+effect(ell end do do el-L neff
i-current addr(el) calculate address for effect el do e2-L neff j-current addr(e21 calculate address for effect e2 xxIi. jl-xX(i. j)+c(el)·c(.2) end do
rhs(i)-rhs(il+y.c(.1) end do qoto 100
! accumulate subclass totals
subroutine det.rmin. c1 c2(n.ff.cov.n.st.d••ff.ct.c) do al-l,neff - if (cov(.l») then c (.1) _ff.ct (.1) if (n.sted(.1).gt.0) th.n .ff.ct(.1)_ff.ct(n.st.d(.1»)) .1.8 .ff.ct(el)-l .nd if 811181
c (el) -1 . • nd if end do return .nd
Journal of Dairy Science VoL 73.
No.2. 1990
527
SETIING UP AND SOLVING MIXED LINEAR MODELS
pseudo Code 3:
Generalized Least Squares, MultIple-Trait Evaluatlon
integer neff,dirnx,ntrait ! declare parameter (neff=3,dimx=2+3,ntrait=2) integer nlevel(neff) ,xx (dimx*ntrait,dimx*ntrait) integer effect (neff) ,start (neff) real y(ntrait),rhs(dimx*ntrait) real v(ntrait,ntrait) data nlevel/2,3/ specify number of levels for each effect start(l)=O define start position of each effect in X'X and X'y do l~2,neff start (l)=start(l-l)+nlevel(l-l)*ntrait end do specify var~ance / covariance matrix v invert v
c-
read () effect,y input record do lpl=l,neff i=start(lpl)+effect(lpl) calculate address for effect lpl do lp2=l,neff j=start (lp2)+effect (lp21 calculate address for effect lp2 do lp3=l,ntrait ! loop over small trait block do lp4=1, ntrait ii=i+lp3-1 determlne start with subclass jj=j+lp4-1 xx(ii,jj)=xx(ii,jj)+l,*v(lp3,lp4) end do end do end do weigh observations with variance / covariance matrix do lpl=l,ntrait do Ip2=l,ntrait ii=i+lp2-1 rhs(ii) = rhs(ii) + v (lpl,lp2) * Y (lp2) end do end do end do read next record
Generalized Least Squares: Heterogeneous Variances
pseudo Code 4' real
! heterogeneous variances for two levels
v(2,ntrait,ntrait)
specify var~ance covar~ance matrices invert variance covariance matrices
input record specify level of hetero. variance
read () effect,y which=effect(l) do lpl~l,neff
xx(ii,Jj)-xx(ii,jj)+l.*v(which,lp3,lp4) end do
c--
weigh observations with variance / covariance matrix rhs (lpl) - rhs
(lpl) + v
(which,lpl,lp2)
* y
(lp2)
end do read next record
Journal of Dairy Science Vol. 73,
No.2. 1990
528
GROENEVELD AND KOVAC
pseudo Code 5' real
c-
Generalized Least Squares: Multiple-Trait, Missing Values
v{2,ntrait,ntrait)
specify vel) for full matrix invert v (1) specify v(2) for first trait only (= zero second line and column) invert v(2) read () effect,y ! input record .peci~y pattern o~ missing value. if (y(l) .qt.O.and.y(2) .qt.O) then which=l else which=2 end if do lpl=l,neff
read next record
Pseudo Code 6:
c--
end do read next record add G-1 to random effect after all records are read in: i=start(2) do lpl=start(2),.tart(3) do t1=l,ntrait i=i+1 j=i-1 do t2=l, ntrait j=j+l xx(i,j)=XX(i,j)+q(t1,t2) end do end do end do
pseudo Code 7'
c--
c--
Mixed Model Equat1ons, Random Effects
Add1ng the RelationshIp Matr1x
data zeta/2,-1,-1,-1, .5, .5,-1, .5, .5/ coefficients for animal parents reI. data eta/l., .66667, .5/ specify additive genetic covariance matrix 9 invert 9 and store in inv g add relationship matrix with no inbreeding and genetic groups for unknown parents read () animal (1) ,animal(2l ,animal (3) ,type ! pedigree do i=l,3 if (animal(il.ne.O) then no parent or genetic group? do j=l,3 if (animal(j) .ne.O) then lambda~zeta{i.jl*eta(type) coefficient to be added add to trait * trait block: do lpl~l,ntrait ii~start(neff)+animal(i)*ntrait-ntrait+lpl start address do lp2=l,ntrait jj-start (neff)+animal{j) *ntrait-ntrait+lp2 xx(ii,jj)=xx(ii,jj)+lambda*inv_g(i,j) end do end do end if end do end if end do read next oediqree record
Journal of Dairy Science Vol. 73.
No.2. 1990
529
SEllING UP AND SOLVING MIXED LINEAR MODELS
pseudo Code 8'
Mult1ple Inc1dence Matr1ces
model(l,l)-.true. model(I,2)=.true. model(I,3)~.true. model(2,1)~.true.
model(2,2)=.false. model(2,3)=.true. model(3,1)=.true. model (3,2)-.true. model(3,3)=.false.
trait 2 not defined for effect 2
trait 3 not defined for effect 3
read () effect,y c-
c-
set up incidence matr~x and RightHandSide: do i=l,ntrait do k=l,neff if (model (i,k) ) then ii-cadr(k)+i-l do j=l,ntrait do l=l,neff if (model(j,l» then )j=cadr(l)+J-I xx ( ~ i, j j ) =xx ( ii, j j) + 1. ' v (i, j) end if end do sum adJusted traits: rhs (ii) = rhs (ii) + v(i, j) • y(j) end do end if end do end do
pseudo Code g.
General fzed Least Squares: Heterogeneous Variances Us1nq Sparse Matr1x User Interface
integer Ivec,nze,e type parameter (nze=300000, Ivec=nze*1.3) parameter (e_type=l) real
number of nonzero elemenmts full storage heterogeneous variances for two levels
v(2,ntrait,ntrait)
realeS a(lvec) integer ia(lvec),ja(lvec)
storage for coefficients storage for pointers and columns
specify variance covariance matrices invert variance covariance matrices
read () effect,y which=effect(l) do Ipl=l,neff
c
~nput record specify level of hetero. variance
xx(ii, Jj)=xx(ii, jj)+1.'v(which,lp3,lp4) call Ihe (ii,jj,l.*v(which,lp3,lp4),ia,ja,a,nij,lvec,e_type) end do
c--
weigh observatlons with variance / covariance matrix rhs
c--
(lpl) - rhs
(lpl) + v
(which,lpl,lp2)
• Y (lp2)
end do read next record traneform buffer. into final lA, JA, A format call cleanup_lh. (ia,ja,a,nij,lvec,dimx*ntrait)
Journal of Dairy Science Vol. 73.
No.2. 1990
530
GROENEVELD AND KOVAC
Pseudo code 1Q)'
c--
I terat 1on on data
PROGRAM mainJ'rogram parameter (max_level ~ ntrait*max(level(i),i=l, neff-1» parameter (dim - dimx*ntrait) parameter (dim_first - dimension of vector beta1 in [22]) parameter (dim second - dim. of b2) real*S solution (dim) , work(max level) real*S sigma first (dim firste(dim first+1)/2) realeS diag (dim second*(ntrait+1)/2) data storage: integer*4 data_effect (neff-ncov,nrecord) real*4 data trait (ntrait ,nrecord) ,data covariables(ncov,nrecord) integer*4 data~edigree(6,level(animaleffect» read data and pedigree file into memory
c-c--
c-c--
STEP 0
build coefficient matrix for (22) and perform LU decomposition (sigma first = upper triangular matrix after LU decomp.) build coefficient matrix for (23) and (24), LU decomposition (diag() - upper triangular blocks after LU decomp.) iterate while not converged accumulate RHS for [22J and solve for beta1 call subroutine first STEP 1 accumulate RHS for [23) or [24J and solve do i-neff first + 1, neff first + neff second call subroutine second (1) STEP 2 end do accumulate coefficient matrix and RHS for additive genetic effect and genetic groups call subroutine third STEP 3 end of iteration write output end
c ----------------------------------------------------------------subroutine first do i - 1, ;eff_first call accum rhs(work) end do call solution(dim first,sigma first,work) perform acceleration like "end of round relaxation" return end c ------------------------------.-----------------------------------
subroutine second (which effect) real*S sig~M(ntrait*(ntr;it+1)/2) call accum rhs(work) ii = start(which effect) do j - 1, nlevel(which effect) Jj = j*trait-ntrait+1 sigma =: corresponding subvector of "diag" call solution (ntrait, sigma, work(jj» perform acceleration like "end of round relaxation"
end do return
end c -----------------------------------------------------------------
c--
subroutine third process all animals and genetic groups real*S sigma (ntrait* (ntrait+l» real*S rhs(ntrait), dummy (ntrait) ,m(neff) character*5 record
Journal of Dairy Science Vol. 73. No.2. 1990
531
SETIING UP AND SOLVING MIXED LINEAR MODELS
c--
logical end_flag do animal 1, n animal+n group level = effect(animal)-+ start(factor) + 1 if (relationship)then process relationship data .. without inbreeding call relation (animal,solution(start (neff) +1) , inv_g(lanimal_effect, rhs, sigma) else !no relationship
c--
add genetic variance-covariance matrix on diagonal do i = 1, ntrait*(ntrait-1) sigma (i) gvar (last_g+i-l) end do end if if (end_flag) then
c--
c--
process records .. call blup_c(factor,level,rhs, sigma, end_flag, record) end if solve for a new solution for the effect (animal,group) call solution(ntrait,sigma,rhs) perform acceleration like "end of round relaxation" end do return end
c -----------------------------------------------------------------
subroutine blup_c(factor, level, rhs, sigma ,end_flag, record)
c---
c-c--
c--
c--
proceed if animal has record do while (level.eq.current addr(factor) .and.end flag) adjust traits (y) for all effects except cur~ent effect do i = 1, ntrait dummy (i) = y(i) do k = 1, neff if (k.ne.factor)then if (current_addr(k) .ne.O)then dummy (i) = dummy(i) - solution(current_addr(k»*m(i,k) end if end if end do end do accumulate adjusted traits in RHS and coefficient submatrix in SIGMA do i ~ 1, ntrait different model and missing value if (model (factor, i) .and. y(i) .ne. O. )then yy = O. accumulate coefficient submatrix for the level do j = 1, ntrait if (model (factor,j) .and. y(j) .ne. O.)then k = pos(i,j,ntrait) sigma(k) = sigma(k) + v(i,j) end if yy = yy + v(i,j)*dummy(j) end do accumulate RHS rhs(i) = rhs(i) + yy end if end do call read_data('NEXT' current_addr, y, m, v, end_flag) end do return end
Journal of Dairy Science Vol. 73,
No.2, 1990