Solving Large Test-Day Models by Iteration on Data and Preconditioned Conjugate Gradient

M. LIDAUER, I. STRANDÉN, E. A. MÄNTYSAARI, J. PÖSÖ, and A. KETTUNEN
Animal Production Research, Agricultural Research Centre, FIN-31600 Jokioinen, Finland
ABSTRACT
A preconditioned conjugate gradient method was implemented into an iteration on data program for estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by the Gauss-Seidel method and the other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to the number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of the algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. The animal model data comprised 665,629 lactation milk yields, and the random regression test-day model data 6,732,765 test-day milk yields. Both models included pedigree information on 1,099,622 animals. The animal model required 122 rounds of iteration to converge with the reference algorithm but only 88 with the preconditioned conjugate gradient; the corresponding numbers for the random regression test-day model were 305 and 149. Solving the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
(Key words: iteration on data, preconditioned conjugate gradient, test-day model)

Abbreviation key: GSSJ = Gauss-Seidel second-order Jacobi, LSC = least significant change in indices, MME = mixed model equations, PCG = preconditioned conjugate gradient, RRM = random regression test-day model, STM = single-trait animal model.

Received January 5, 1999. Accepted June 4, 1999.
1999 J Dairy Sci 82:2788–2796

INTRODUCTION

Recently, more accurate and realistic statistical models have been introduced for dairy cattle breeding value estimation (7, 13). In particular, implementation of test-day models into routine national evaluations of breeding values has been reported in several studies (6, 7, 10, 15, 21). One practical difficulty in the utilization of test-day models is the heavy computing requirement arising from a dramatic increase in the number of unknowns to be solved. For example, in Canada a multiple-trait, random regression test-day model (RRM) with 72 equations per animal with records led to mixed model equations (MME) with over 87 million unknowns (7). Under Finnish conditions, replacement of the current single-trait repeatability animal model with a multiple-trait RRM would increase the number of unknowns in the MME from 3.5 million to about 50 million. MME of such size can be solved only by powerful iterative methods.

Most common iteration algorithms for the estimation of breeding values (e.g., second-order Jacobi, Gauss-Seidel, and successive overrelaxation) belong to the family of linear stationary iterative methods (4). Consecutive solutions obtained by these algorithms converge by a geometric process. Solutions approach the true solutions rapidly during the early stage of iteration but slowly at the later stage (12). Theoretically, obtaining the true solutions would require an infinite number of iterations. Consequently, the stopping criterion of the iteration process is a compromise between the accuracy of the solutions and the cost of computation (6, 15). A proper stopping criterion may often be difficult to find because the accuracy of intermediate solutions is unknown, and the formulas that give a good approximation of the relative error involve considerably more computation (4). To overcome this problem, quasi-true solutions, obtained by performing many iterations, are often used to assess the iteration round in which the desired accuracy of solutions has been reached (12, 15, 19, 22). The low rate of convergence may require too many iterations to get quasi-true solutions. Thus, for very large MME, the empirical investigation of the stopping criterion might be
impossible. Moreover, it is questionable whether a stopping criterion validated on a subset of the data applies to the complete data, which might behave differently (pedigree length and connectedness). If so, uncertainty exists as to whether the solutions have converged at a given round of iteration when Jacobi, Gauss-Seidel, or related iterative methods are used for solving large MME.

Methods based on conjugate gradients (5) have become dominant in solving linear systems of equations in the field of numerical analysis. These methods give the true solutions in a finite number of iteration steps (4). Furthermore, parameter estimates, like the relaxation factors in second-order Jacobi or in successive overrelaxation, are not necessarily required. In animal breeding, only a few studies have investigated the potential of conjugate gradient methods for solving large linear models. In solving a multiple-trait sire model (20), the conjugate gradient method was found to be less efficient than the successive overrelaxation method with an optimum relaxation factor. The conjugate gradient method was 55% more efficient than successive overrelaxation when the diagonal of the coefficient matrix was used as a preconditioner (1). Carabaño et al. (2) also found the preconditioned conjugate gradient (PCG) method to be superior to Gauss-Seidel related algorithms but observed that the method was not stable in certain cases. In all of these studies, the size of the MME was less than 21,000 equations, which left unsolved whether the conjugate gradient method was favorable for solving large MME.

The objective of this study was to implement the PCG method into an iteration on data BLUP program. The convergence characteristics of PCG were compared with those of a typical iteration on data algorithm, in which one fixed effect was solved by the Gauss-Seidel method and the other effects by the second-order Jacobi method. The algorithms were tested with a single-trait animal model (STM) and with an RRM for the same data set.
MATERIALS AND METHODS

Data

The data were from primiparous Finnish Ayrshires, Holstein-Friesians, and Finncattle calving between January 1988 and October 1996. The 305-d lactation milk yields of 665,629 cows and 6,732,765 test-day milk yields of 674,397 cows were used. Test-day observations were restricted to within 4 to 350 DIM; herds with fewer than 20 test-day measurements were discarded. Pedigree data for both analyses comprised 7795 bulls and 1,091,827 cows from the three breeds. Breed differences and genetic differences between base animals of different ages were described by 108 phantom parent groups.

Statistical Models

Model 1. The STM, as given by Pösö et al. (14), was

\[ y_{ijklmn} = herd_i + cmy_j + ad_k + hcy_l + a_m + e_{ijklmn} \]

where $y_{ijklmn}$ is 305-d milk yield, $herd_i$ is herd effect, $cmy_j$ is calving month × calving year effect, $ad_k$ is calving age × days open effect, $hcy_l$ is random effect of calving year within herd, $a_m$ is random additive genetic effect, and $e_{ijklmn}$ is random residual. There were 24,468 herds, 103 calving month × calving year classes, 49 calving age × days open classes, and 170,353 calving year within herd levels. In matrix notation the model can be written as

\[ y = Hh + Xf + Tc + Za + e \]

where h contains the herd effect, f includes all other fixed effects, c is the random effect of calving year within herd, a is the additive genetic effect, and e contains the random residuals. H, X, T, and Z are incidence matrices. It was assumed that var(c) = Iσ²c and var(a) = Aσ²a, where A is the numerator relationship matrix, and var(e) = R = Iσ²e. The MME can be written

\[
\begin{bmatrix}
H'R^{-1}H & H'R^{-1}X & H'R^{-1}T & H'R^{-1}Z \\
X'R^{-1}H & X'R^{-1}X & X'R^{-1}T & X'R^{-1}Z \\
T'R^{-1}H & T'R^{-1}X & T'R^{-1}T + I\sigma_c^{-2} & T'R^{-1}Z \\
Z'R^{-1}H & Z'R^{-1}X & Z'R^{-1}T & Z'R^{-1}Z + A^{-1}\sigma_a^{-2}
\end{bmatrix}
\begin{bmatrix} \hat{h} \\ \hat{f} \\ \hat{c} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} H'R^{-1}y \\ X'R^{-1}y \\ T'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}.
\]
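As a concrete illustration of how these MME assemble, the following is a minimal numpy sketch on a made-up toy data set. It is our own illustration, not the program used in the paper; the dimensions and incidence patterns are arbitrary, and the identity matrix stands in for the true A⁻¹.

```python
import numpy as np

# Toy sketch (not the paper's code): assemble C xhat = b for the STM.
rng = np.random.default_rng(1)
n, nh, nf, nc, na = 30, 3, 2, 4, 5            # records; herd, fixed, hcy, animal levels
H = np.eye(nh)[rng.integers(0, nh, n)]        # herd incidence (one-hot rows)
X = np.eye(nf)[rng.integers(0, nf, n)]        # other fixed effects
T = np.eye(nc)[rng.integers(0, nc, n)]        # calving year within herd
Z = np.eye(na)[rng.integers(0, na, n)]        # animal incidence
y = rng.normal(100.0, 10.0, n)

s2e, s2c, s2a = 1.0, 0.185, 0.617             # variance ratios from the text (scaled units)
Ainv = np.eye(na)                             # stand-in for the numerator relationship inverse
W = np.hstack([H, X, T, Z])                   # full design matrix
C = W.T @ W / s2e                             # W'R^{-1}W with R = I*s2e
C[nh+nf:nh+nf+nc, nh+nf:nh+nf+nc] += np.eye(nc) / s2c   # + I*sigma_c^{-2}
C[-na:, -na:] += Ainv / s2a                   # + A^{-1}*sigma_a^{-2}
b = W.T @ y / s2e                             # right-hand side W'R^{-1}y
xhat = np.linalg.lstsq(C, b, rcond=None)[0]   # C is singular without restrictions
```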
The variance components were the same as given in Pösö et al. (14): σ²c/σ²e = 0.185, σ²a/σ²e = 0.617, and σ²e = 431,102 kg².

Model 2. The RRM based on covariance functions was

\[
y_{ijklmnopq} = herd_i + ym_j + \sum_{r=1}^{5} s_{kr} v_{(r)} + age_l + dcc_m + htm_n + \phi'_{o(p)} a_p + \phi'_{o(p)} p_p + e_{ijklmnopq}
\]
where $y_{ijklmnopq}$ is test-day milk yield; $herd_i$ is herd effect; $ym_j$ is test-year × test-month effect; $s_{kr}$ are five regression coefficients of test-day milk yield on DIM, which describe the shape of the lactation curves within calving season class k; $v = [\,1\ c\ c^2\ d\ d^2\,]'$, where c and c² are the linear and quadratic Legendre polynomials (9) for DIM and d = ln(DIM); $age_l$ is calving age effect; $dcc_m$ is days carried calf effect; $htm_n$ is random effect of test month within herd; $a_p$ is a vector of three random regression coefficients describing the breeding value of animal p; $\phi_{o(p)}$ is a vector of the first three Legendre polynomials (9) for the DIM of observation o of animal p; $p_p$ is a vector of the first three random regression coefficients for nonhereditary animal effects describing the environmental covariances among measurements along the lactation of animal p; and $e_{ijklmnopq}$ is random residual. There were 24,321 herds; 106 test-year × test-month classes; 8 calving age classes; 5 days carried calf classes; and 1,933,641 test month within herd levels. The fixed regression coefficients were estimated within three calving season classes (October to February; March to June; July to September). Similarly to the STM, the RRM can be written in matrix notation as

\[ y = Hh + Xf + Tc + Za + Wp + e \]
where h contains the herd effect; f includes all other fixed effects; c comprises the random test month within herd effect; $a = [a'_1, \ldots, a'_n]'$ and $p = [p'_1, \ldots, p'_m]'$, where n is the number of animals and m is the number of cows with records; and e contains the random residuals. H, X, T, Z, and W are the incidence and covariate matrices. For each animal with observations, Z and W contain the appropriate $\Phi$; for animal i with n observations, $\Phi_i = [\phi_{i1}, \ldots, \phi_{in}]'$. Note that H, X, T, and Z, as well as the corresponding vectors h, f, c, and a, have different meanings in the RRM than in the STM. It was assumed that

\[
\mathrm{var}\begin{bmatrix} c \\ a \\ p \\ e \end{bmatrix}
= \begin{bmatrix}
I\sigma_c^2 & 0 & 0 & 0 \\
0 & A \otimes K_a & 0 & 0 \\
0 & 0 & I \otimes K_p & 0 \\
0 & 0 & 0 & R
\end{bmatrix}
\]

where A is the numerator relationship matrix, $K_a$ and $K_p$ are the variance-covariance matrices of the additive genetic and nonhereditary animal effects, and R = Iσ²e. Then, the MME become

\[
\begin{bmatrix}
H'R^{-1}H & H'R^{-1}X & H'R^{-1}T & H'R^{-1}Z & H'R^{-1}W \\
X'R^{-1}H & X'R^{-1}X & X'R^{-1}T & X'R^{-1}Z & X'R^{-1}W \\
T'R^{-1}H & T'R^{-1}X & T'R^{-1}T + I\sigma_c^{-2} & T'R^{-1}Z & T'R^{-1}W \\
Z'R^{-1}H & Z'R^{-1}X & Z'R^{-1}T & Z'R^{-1}Z + A^{-1} \otimes K_a^{-1} & Z'R^{-1}W \\
W'R^{-1}H & W'R^{-1}X & W'R^{-1}T & W'R^{-1}Z & W'R^{-1}W + I \otimes K_p^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{h} \\ \hat{f} \\ \hat{c} \\ \hat{a} \\ \hat{p} \end{bmatrix}
=
\begin{bmatrix} H'R^{-1}y \\ X'R^{-1}y \\ T'R^{-1}y \\ Z'R^{-1}y \\ W'R^{-1}y \end{bmatrix}.
\]

The variance-covariance components (Table 1) for the RRM were derived from multiple-trait REML variance components using the continuous covariance function approach described by Kirkpatrick et al. (9). Note that the additive genetic variance-covariance matrix for the first 305 test days can be obtained by the multiplication $G_{305\times305} \cong \Phi K_a \Phi'$, where $\Phi_{305\times3} = [\phi_1, \ldots, \phi_{305}]'$. The heritability for a particular test day j is

\[
h_j^2 = \frac{\phi'_j K_a \phi_j}{\phi'_j K_a \phi_j + \phi'_j K_p \phi_j + \sigma_e^2}
\]

(Table 2). For all analyses, variance-covariance components and observations were scaled to units of residual standard deviation.

Algorithms

The MME for the STM and the RRM contained 1,294,694 and 7,280,477 equations, respectively. Because of the size of the RRM (Table 3), the iteration on data technique (11,
16, 18) was employed in the algorithm when solving the unknowns. The iteration on data technique avoids forming the MME: it allows solving the MME although the coefficient matrix cannot be stored in memory, but the cost is that of reading the data at each round of iteration. Let C be the coefficient matrix of the MME, x the vector of unknowns, and b the right-hand side (i.e., Cx = b). Following Ducrocq (3), we rewrite the equation as

\[ [M_0 + (C - M_0)]x = b \qquad [1] \]

then the functional iterative procedure for several iterative algorithms can be outlined as

\[ x^{(k+1)} = M_0^{-1}(b - Cx^{(k)}) + x^{(k)}. \qquad [2] \]

Let L be the strictly lower triangular part of C and D the diagonal of C. Then, if $M_0 = D$, Equation [2] defines
the Jacobi iteration. If $M_0 = L + D$, Equation [2] gives the Gauss-Seidel iteration. Extending Jacobi to the second-order Jacobi method increases the rate of convergence (11). Following the notation of [2], second-order Jacobi can be written as

\[ x^{(k+1)} = M_0^{-1}(b - Cx^{(k)}) + x^{(k)} + \gamma(x^{(k)} - x^{(k-1)}) \qquad [3] \]

where $M_0^{-1} = D^{-1}$, and γ is the relaxation factor.
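The three stationary schemes differ only in the choice of $M_0$ and γ. Below is a compact dense-matrix sketch of Equations [2] and [3]; this is our illustration only, because the paper's actual solver works by iteration on data and never forms C explicitly.

```python
import numpy as np

def second_order_jacobi(C, b, gamma, iters):
    """Equation [3]: x(k+1) = D^{-1}(b - C x(k)) + x(k) + gamma*(x(k) - x(k-1)).
    Setting gamma = 0 gives plain Jacobi, i.e. Equation [2] with M0 = D."""
    d = np.diag(C)
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    for _ in range(iters):
        x_new = (b - C @ x) / d + x + gamma * (x - x_prev)
        x_prev, x = x, x_new
    return x

def gauss_seidel(C, b, iters):
    """Equation [2] with M0 = L + D: sweep equation by equation,
    using each updated solution immediately."""
    x = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            x[i] += (b[i] - C[i] @ x) / C[i, i]
    return x
```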
Gauss-Seidel second-order Jacobi algorithm. The Gauss-Seidel second-order Jacobi algorithm (GSSJ) was used as the reference in this study. The algorithm is a hybrid of the iterative methods given above and solves the fixed effect of herd (h) by Gauss-Seidel and the other effects by second-order Jacobi (8, 10, 11). The GSSJ algorithm was implemented to utilize the block structure of the MME. The diagonal block for the equations pertaining to f was treated as a single block. For the STM, the matrix $M_0$ in Equation [3] becomes

\[
M_0 = \begin{bmatrix}
H'R^{-1}H & 0 & 0 & 0 \\
X'R^{-1}H & X'R^{-1}X & 0 & 0 \\
T'R^{-1}H & 0 & \mathrm{diag}_{s\times s}\{T'R^{-1}T + I\sigma_c^{-2}\} & 0 \\
Z'R^{-1}H & 0 & 0 & \mathrm{diag}_{t\times t}\{Z'R^{-1}Z + A^{-1}\sigma_a^{-2}\}
\end{bmatrix}
\]

where s = t = 1. Correspondingly, for the RRM, $M_0$ was

\[
M_0 = \begin{bmatrix}
H'R^{-1}H & 0 & 0 & 0 & 0 \\
X'R^{-1}H & X'R^{-1}X & 0 & 0 & 0 \\
T'R^{-1}H & 0 & \mathrm{diag}_{s\times s}\{T'R^{-1}T + I\sigma_c^{-2}\} & 0 & 0 \\
Z'R^{-1}H & 0 & 0 & \mathrm{diag}_{t\times t}\{Z'R^{-1}Z + A^{-1} \otimes K_a^{-1}\} & 0 \\
W'R^{-1}H & 0 & 0 & 0 & \mathrm{diag}_{t\times t}\{W'R^{-1}W + I \otimes K_p^{-1}\}
\end{bmatrix}
\]

where s = 1 and t = 3. For a particular animal i with observations, $\mathrm{diag}_{t\times t}\{Z'R^{-1}Z + A^{-1} \otimes K_a^{-1}\}_i$ is the diagonal block $\Phi'_i R^{-1}\Phi_i + a^{ii}K_a^{-1}$, where $a^{ii}$ is diagonal element i of $A^{-1}$, and $\mathrm{diag}_{t\times t}\{W'R^{-1}W + I \otimes K_p^{-1}\}_i$ is the diagonal block $\Phi'_i R^{-1}\Phi_i + K_p^{-1}$. For the effects solved by second-order Jacobi, the corresponding diagonal blocks in $M_0$ were inverted and stored on disk. The relaxation factor γ for the STM was 0.9, as suggested by Strandén and Mäntysaari (19). For the RRM, two relaxation factors, γ = 0.8 and γ = 0.9, were investigated. For the herd solutions (h), the relaxation factor in [3] was zero, leading to Gauss-Seidel for this effect. The equations for the first level of the calving age × days open effect in the STM, and for the first levels of the test year × test month, calving age, and days carried calf effects in the RRM, were removed to ensure that X'R⁻¹X was of full rank.

Preconditioned conjugate gradient algorithm. Implementation of the PCG iterative method required storing four vectors (size equal to the number of unknowns in the MME) in random access memory: a vector of residuals (r), a search-direction vector (d), the solution vector (x), and a work vector (v). Each round of iteration required one pass through the data to calculate
the product Cd. The preconditioner matrices M were block diagonal matrices formed from the $M_0$ matrices of the GSSJ but without the off-diagonal blocks (X'R⁻¹H, T'R⁻¹H, Z'R⁻¹H, and W'R⁻¹H). The inverse of the preconditioner matrix (M⁻¹) was stored on disk and read at each round of iteration. The starting values were $x^{(0)} = 0$, $r^{(0)} = b - Cx^{(0)} = b$, and $d^{(0)} = M^{-1}r^{(0)} = M^{-1}b$. At every iteration step (k + 1), the following calculations were performed:

\[
\begin{aligned}
v &= Cd^{(k)}, \\
\alpha &= \frac{r'^{(k)} M^{-1} r^{(k)}}{d'^{(k)} v}, \\
x^{(k+1)} &= x^{(k)} + \alpha d^{(k)}, \\
r^{(k+1)} &= r^{(k)} - \alpha v, \\
v &= M^{-1} r^{(k+1)}, \\
\beta &= \frac{r'^{(k+1)} v}{r'^{(k)} M^{-1} r^{(k)}}, \text{ and} \\
d^{(k+1)} &= v + \beta d^{(k)}
\end{aligned}
\qquad [4]
\]

where α and β are step sizes in the PCG method. Restrictions were imposed on the same equations as for GSSJ, either on both C and M or on M only.
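The recurrences in [4] translate almost line for line into code. The sketch below is a generic PCG driver under the setup described above; matvec stands for the data pass that forms Cd, apply_Minv for the stored block-diagonal preconditioner inverse, and both function names are ours.

```python
import numpy as np

def pcg(matvec, apply_Minv, b, n_iter):
    """Preconditioned conjugate gradient following the recurrences in [4].
    matvec(d) returns C d (one pass through the data in iteration on data);
    apply_Minv(r) applies the block-diagonal preconditioner inverse M^{-1}."""
    x = np.zeros_like(b)          # x(0) = 0
    r = b.copy()                  # r(0) = b - C x(0) = b
    v = apply_Minv(r)
    d = v.copy()                  # d(0) = M^{-1} r(0)
    rMr = r @ v                   # r'(k) M^{-1} r(k)
    for _ in range(n_iter):
        v = matvec(d)             # v = C d(k): the single data pass per round
        alpha = rMr / (d @ v)
        x += alpha * d            # x(k+1) = x(k) + alpha d(k)
        r -= alpha * v            # r(k+1) = r(k) - alpha v
        v = apply_Minv(r)         # v = M^{-1} r(k+1)
        rMr_new = r @ v
        beta = rMr_new / rMr      # beta = r'(k+1) M^{-1} r(k+1) / r'(k) M^{-1} r(k)
        d = v + beta * d          # d(k+1) = v + beta d(k)
        rMr = rMr_new
    return x
```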
TABLE 1. Variance-covariance components for the test month within herd effect (σ²c), for the additive genetic (Ka) and nonhereditary animal (Kp) effects, each with three regression coefficients, and for the residual effect (σ²e), when estimating breeding values for milk yield with the random regression test-day model.¹

Ka (lower triangle)
                Linear            Quadratic        Cubic
Linear           0.974
Quadratic       −0.014 (−0.04)     0.121
Cubic           −0.158 (−0.67)     0.024 (0.28)     0.058

Kp (lower triangle)
                Linear            Quadratic        Cubic
Linear           1.808
Quadratic        0.036 (0.04)      0.443
Cubic           −0.102 (−0.19)     0.065 (0.25)     0.153

σ²c = 0.451; σ²e = 1.000.

¹Variance-covariance components are scaled by the residual standard deviation. The correlations between regression coefficients are in parentheses.
Investigation of Convergence

For both algorithms, the stage of convergence was monitored after each round of iteration. Two convergence indicators were used: the relative difference between consecutive solutions,

\[ c_d^{(n)} = \frac{\lVert x^{(n+1)} - x^{(n)} \rVert}{\lVert x^{(n+1)} \rVert}, \]

and the relative average difference between the right-hand and left-hand sides (12),

\[ c_r^{(n)} = \frac{\lVert b - Cx^{(n+1)} \rVert}{\lVert b \rVert}, \]

where $\lVert y \rVert = \sqrt{\sum_i y_i^2}$.
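In code, both indicators are ratios of Euclidean norms. A short sketch (function names are ours); note that in PCG the residual vector r = b − Cx is carried along anyway, so cr is essentially free, whereas for GSSJ an extra pass through the data is needed to form Cx:

```python
import numpy as np

def cd(x_new, x_old):
    """Relative difference between consecutive solutions."""
    return np.linalg.norm(x_new - x_old) / np.linalg.norm(x_new)

def cr(b, r):
    """Relative average difference between right- and left-hand sides,
    given the residual r = b - C x(n+1)."""
    return np.linalg.norm(r) / np.linalg.norm(b)
```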
To allow comparisons between the methods, we first investigated how small the values of cr and cd needed to be for the accuracy of the solutions to be sufficient for practical breeding work. Therefore, quasi-true EBV were obtained by performing PCG iterations until cr became smaller than 10⁻²⁶, which corresponded to a standard deviation of the values in r more than 10⁷ times smaller than the residual standard deviation. This required 301 and 681 rounds of iteration for the STM and the RRM, respectively. In the case of the RRM, the breeding values for the 305-d lactation were calculated from the animal EBV coefficients $\hat{a}_i$: $EBV_i = \Sigma(\Phi_{305\times3}\,\hat{a}_i)$. Intermediate EBV for various cr values were obtained from the corresponding solutions of the MME. The EBV were standardized before comparing them. In Finland, the published indices are formed by dividing the EBV by 1/10 of the standard deviation of the EBV of active sires and rounding to the nearest full integer. Thus, a difference of one index point in the published index was equal to 43.3 kg of milk in EBV. For each investigated cr value, the correlation between the intermediate and the quasi-true indices was calculated. Furthermore, the percentage of indices differing from the quasi-true indices by one or more index points was recorded. Solutions were considered converged if less than 1% of the indices deviated, by at most one index point, from the quasi-true indices. This least significant change in the indices (LSC) was used as the convergence criterion. To avoid a reduction in selection intensity caused by inaccurate solutions of the MME, LSC was a minimum requirement. The convergence of the indices was analyzed in three animal groups: young cows, evaluated sires, and young sires. The group of young cows included all cows having their first lactation in 1995; evaluated sires consisted of bulls born in 1984 and 1985; and young sires comprised progeny-tested bulls born in 1991 and 1992.
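The two post-processing steps described above, summing the daily breeding values over the lactation and standardizing to index points, are sketched below; the helper names are ours, and the 43.3 kg per index point is the paper's figure for milk EBV.

```python
import numpy as np

def lactation_ebv(phi_305x3, a_hat):
    """305-d EBV as the sum of daily breeding values, EBV = sum(Phi a_hat),
    where phi_305x3 holds the three Legendre covariates for DIM 1..305."""
    return np.sum(phi_305x3 @ a_hat)

def index_points(ebv, sd_active_sires):
    """Published index: EBV divided by 1/10 of the EBV standard deviation of
    active sires, rounded to the nearest integer (one point = 43.3 kg here)."""
    return int(round(ebv / (sd_active_sires / 10.0)))
```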
TABLE 2. Heritability (diagonal) and genetic correlations for daily milk yield at different DIM for the random regression test-day model.

DIM      5     55    105   155   205   255   305
  5    0.14  0.92  0.81  0.72  0.65  0.57  0.43
 55          0.19  0.97  0.93  0.88  0.79  0.61
105                0.23  0.99  0.96  0.88  0.69
155                      0.25  0.99  0.94  0.77
205                            0.24  0.98  0.84
255                                  0.22  0.94
305                                        0.17
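The covariance-function algebra behind Tables 1 and 2 can be checked numerically. In the sketch below, the mapping of DIM 1 to 305 onto [−1, 1] and the use of normalized Legendre polynomials are our assumptions, because the exact standardization is not stated in the text; with these choices the computed heritabilities only approximately reproduce Table 2.

```python
import numpy as np
from numpy.polynomial import legendre

# Map DIM 1..305 to [-1, 1] (assumed standardization) and build the
# normalized Legendre covariates phi_j = sqrt((2j+1)/2) * P_j(t).
dim = np.arange(1, 306)
t = 2.0 * (dim - 1) / 304.0 - 1.0
norms = np.sqrt((2.0 * np.arange(3) + 1.0) / 2.0)
Phi = np.stack([norms[j] * legendre.legval(t, np.eye(3)[j]) for j in range(3)], axis=1)

Ka = np.array([[ 0.974, -0.014, -0.158],      # Table 1, additive genetic
               [-0.014,  0.121,  0.024],
               [-0.158,  0.024,  0.058]])
Kp = np.array([[ 1.808,  0.036, -0.102],      # Table 1, nonhereditary animal
               [ 0.036,  0.443,  0.065],
               [-0.102,  0.065,  0.153]])

G305 = Phi @ Ka @ Phi.T                        # G ~= Phi Ka Phi' for 305 test days
var_a = np.einsum('ij,jk,ik->i', Phi, Ka, Phi) # phi_j' Ka phi_j for each day j
var_p = np.einsum('ij,jk,ik->i', Phi, Kp, Phi)
h2 = var_a / (var_a + var_p + 1.0)             # sigma_e^2 = 1 after scaling
print(np.round(h2[[4, 154]], 2))               # approx 0.14 and 0.25 at DIM 5 and 155
```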
TABLE 3. Number of equations, number of nonzero elements in the corresponding mixed model equations, and memory requirements for the preconditioned conjugate gradient (PCG) and Gauss-Seidel second-order Jacobi (GSSJ) methods when solving a single-trait animal model (STM) and a random regression test-day model (RRM). Memory requirements are given in megabytes.

                                                    Size of iteration     Random access
                                                    data files            memory
Model   Number of      Number of nonzero     C¹     PCG²      GSSJ       PCG      GSSJ
        equations      elements
STM     1,294,694      17,271,472            111    47        41         50       35
RRM     7,280,477      403,117,019           2,376  392       582        237      130

¹Memory requirement for storing the nonzero elements of the lower triangle and diagonal of the coefficient matrix (C) of the mixed model equations as a linked list.
²Covariables, to account for the shape of the lactation curve, were stored in a table rather than read from the iteration files.
There were 82,109; 651; and 318 animals in the young cow, evaluated sire, and young sire groups, respectively.

RESULTS AND DISCUSSION

For the STM, PCG required 88 rounds of iteration to meet the convergence criterion LSC, whereas GSSJ needed 122 rounds (Table 4). This result was in agreement with the findings of Berger et al. (1), who reported 83 rounds of iteration with PCG versus 169
rounds for successive overrelaxation when solving a reduced animal model. For the RRM, the difference between the methods was even more apparent. Convergence was reached after 149 rounds of iteration with PCG but not before 305 rounds with GSSJ (Table 5). For GSSJ, the rate of convergence decreased considerably at the later stages of iteration, whereas for PCG it remained almost unchanged (Figure 1). This finding reflected the weakness of Gauss-Seidel and second-order Jacobi related methods, which required many iterations to gain additional accuracy toward the end of the iteration process. If the relaxation
TABLE 4. Different convergence indicators for the preconditioned conjugate gradient (PCG) and Gauss-Seidel second-order Jacobi (GSSJ) methods when solving a single-trait animal model with 1,294,694 unknowns in the mixed model equations.

                                                 Cows                      Evaluated sires           Young sires
Iteration                           Iteration    % of indices deviating    % of indices deviating    % of indices deviating
method         cr¹        cd²       rounds      1 pt³   ≥2 pt⁴   rI,It⁵   1 pt    ≥2 pt   rI,It     1 pt    ≥2 pt   rI,It

PCG            10⁻⁷      6.5×10⁻⁴   12          26.7    62.6     0.9887   38.9    39.1    0.9761    15.4    75.5    0.9543
               10⁻⁸      9.4×10⁻⁵   20          54.5    2.7      0.9975   48.5    14.9    0.9930    52.2    24.6    0.9972
               10⁻⁹      4.0×10⁻⁵   39          9.8              0.9994   19.4    0.1     0.9989    10.1    0.6     0.9994
               10⁻¹⁰     3.6×10⁻⁶   58          3.4              0.9998   5.7             0.9997    4.4             0.9999
               10⁻¹¹     3.8×10⁻⁷   74          2.1              0.9999   1.5             0.9999    1.6             0.9999
               10⁻¹²     5.9×10⁻⁸   88          0.3              1.0      0.3             1.0       0.9             1.0
               10⁻¹³     2.4×10⁻⁹   105         <0.1             1.0                      1.0       0.3             1.0
               10⁻¹⁴     4.8×10⁻¹⁰  117         <0.1             1.0                      1.0                       1.0
GSSJ γ⁶ = 0.9  10⁻⁷      5.6×10⁻⁵   57          19.0    0.5      0.9988   24.7    0.8     0.9985    33.0    1.8     0.9983
               10⁻⁸      4.6×10⁻⁶   81          6.3              0.9996   6.5             0.9997    8.2             0.9997
               10⁻⁹      6.0×10⁻⁷   100         2.0              0.9999   2.2             0.9999    1.9             0.9999
               10⁻¹⁰     5.3×10⁻⁸   122         0.5              1.0      0.8             1.0       0.6             1.0
               10⁻¹¹     6.1×10⁻⁹   143         0.2              1.0      0.3             1.0       0.3             1.0
               10⁻¹²     8.0×10⁻¹⁰  162         <0.1             1.0      0.2             1.0                       1.0
               10⁻¹³     6.1×10⁻¹¹  186         <0.1             1.0                      1.0                       1.0

¹Relative difference between right-hand and left-hand sides.
²Relative difference between consecutive solutions.
³Percentage of indices that deviate one index point from their quasi-true indices.
⁴Percentage of indices that deviate two or more index points from their quasi-true indices.
⁵Correlation between intermediate indices obtained by PCG or GSSJ and quasi-true indices obtained after the PCG iteration process reached a cr value below 10⁻²⁶.
⁶Relaxation factor for second-order Jacobi in GSSJ.
factor is not optimal, this problem can be even more severe. For instance, satisfying the convergence criterion LSC required over 600 rounds of iteration when the relaxation factor for GSSJ was 0.8 (Table 5). Carabaño et al. (2) observed in all their analyses two distinct iteration phases for PCG: an unstable starting phase, in which the solutions converged and diverged alternately, was followed by a phase with a very high rate of convergence. We observed the same behavior in PCG whenever we imposed restrictions on the fixed effect equations in both the coefficient matrix and the preconditioner matrix. Note that our implementation required restrictions in the X'R⁻¹X block of the preconditioner to enable matrix inversion. When constraints were applied only to the preconditioner matrix, a high rate of convergence was realized during the entire iteration process (Figure 1, Tables 4 and 5). With constraints in both matrices, 32 and 219 additional rounds of iteration were required to reach convergence for the STM and the RRM, respectively. This result was contrary to the findings of Berger et al. (1), who reported a 50% reduction in the number of iteration rounds when restrictions were imposed on the fixed effect equations. Their result was based on a sire model in which the
herd-year-season effect was absorbed, and the remaining 890 equations consisted of five fixed birth year groups and 885 sires. The restriction was performed by deleting the first birth year group. According to the theory demonstrated in the literature (4, 17), the PCG method guarantees convergence to the true solutions for symmetric, positive definite coefficient matrices. Without restrictions, the coefficient matrix was not of full rank and, hence, was only positive semidefinite. Because the rate of convergence clearly improved without restrictions, and because the numerical values of all estimable functions do not change (1), it seems beneficial to leave the coefficient matrix unrestricted when the PCG method is used.

From a practical point of view, comparison of the algorithms with respect to execution time is more useful. For the RRM, the PCG method required 59 CPU seconds per round of iteration, and convergence was reached after 2.5 CPU hours of computation. In contrast, the GSSJ algorithm needed 203 CPU seconds per round (without calculation of cr), and convergence was reached after 17.2 CPU hours. Both analyses were performed on a Cycle SPARCengine Ultra AXmp (300 MHz) workstation of the Finnish Agricultural Data
TABLE 5. Different convergence indicators for the preconditioned conjugate gradient (PCG) and Gauss-Seidel second-order Jacobi (GSSJ) methods when solving a random regression test-day model with 7,280,477 unknowns in the mixed model equations.

                                                    Cows                      Evaluated sires           Young sires
Iteration                              Iteration    % of indices deviating    % of indices deviating    % of indices deviating
method         cr¹          cd²        rounds      1 pt³   ≥2 pt⁴   rI,It⁵   1 pt    ≥2 pt   rI,It     1 pt    ≥2 pt   rI,It

PCG            10⁻⁹        2.2×10⁻⁵    38          56.0    8.3      0.9960   48.2    11.1    0.9942    45.3    28.3    0.9973
               10⁻¹⁰       9.5×10⁻⁶    49          56.2    1.8      0.9979   40.7    4.3     0.9971    59.7    12.2    0.9987
               10⁻¹¹       1.0×10⁻⁶    79          13.0             0.9992   8.9             0.9996    23.9            0.9994
               10⁻¹²       7.4×10⁻⁷    83          9.9              0.9994   7.1             0.9996    18.9            0.9995
               10⁻¹³       6.6×10⁻⁸    116         3.9              0.9998   2.3             0.9999    3.5             0.9999
               10⁻¹⁴       7.5×10⁻⁹    149         0.9              0.9999   0.3             1.0       0.6             1.0
               10⁻¹⁵       3.1×10⁻¹⁰   219         0.1              1.0      0.2             1.0                       1.0
               10⁻¹⁶       2.4×10⁻¹⁰   225         0.1              1.0      0.2             1.0                       1.0
               10⁻¹⁷       1.9×10⁻¹¹   259         <0.1             1.0                      1.0                       1.0
GSSJ γ⁶ = 0.8  7.0×10⁻¹²   6.6×10⁻⁹    300         4.1              0.9997   3.7             0.9998    6.0             0.9998
               4.5×10⁻¹⁴   6.3×10⁻¹¹   600         0.5              1.0      0.3             1.0       1.3             1.0
               2.2×10⁻¹⁴   2.3×10⁻¹¹   700         0.3              1.0      0.2             1.0       0.9             1.0
GSSJ γ = 0.9   10⁻⁹        1.2×10⁻⁵    117         8.7              0.9995   4.5             0.9998    4.7             0.9998
               10⁻¹⁰       4.7×10⁻⁶    127         5.3              0.9997   2.9             0.9999    4.4             0.9999
               10⁻¹¹       3.2×10⁻⁸    174         1.7              0.9999   2.3             0.9999    2.2             0.9999
               10⁻¹²       1.7×10⁻⁹    213         1.0              0.9999   1.1             0.9999    1.9             0.9999
               10⁻¹³       1.7×10⁻¹⁰   305         0.3              1.0      0.3             1.0       0.9             1.0
               10⁻¹⁴       2.1×10⁻¹¹   421         0.1              1.0      0.2             1.0       0.3             1.0
               10⁻¹⁵       2.3×10⁻¹²   552         <0.1             1.0                      1.0                       1.0

¹Relative difference between right-hand and left-hand sides.
²Relative difference between consecutive solutions.
³Percentage of indices that deviate one index point from their quasi-true indices.
⁴Percentage of indices that deviate two or more index points from their quasi-true indices.
⁵Correlation between intermediate indices obtained by PCG or GSSJ and quasi-true indices obtained after the PCG iteration process reached a cr value below 10⁻²⁶.
⁶Relaxation factor for second-order Jacobi in GSSJ.
Figure 1. Relative average difference between left-hand and right-hand sides (cr) for the Gauss-Seidel second-order Jacobi method with two different relaxation factors, (a) γ = 0.8 and (b) γ = 0.9, and for (c) the preconditioned conjugate gradient, when solving a random regression test-day model with 7,280,477 unknowns in the mixed model equations.
Processing Centre. All data files were kept in random access memory during the iteration process to keep CPU time unaffected by input/output operations. Two reasons existed for the large difference in execution time between the algorithms. First, implementation of PCG enabled more efficient program code than an algorithm employing Gauss-Seidel. Both algorithms required reading the data at each round of iteration, but additional computing time was required by GSSJ to store the contributions to the MME of each herd and to reread them to adjust the right-hand sides with the new Gauss-Seidel solutions for the herd effect. For the same reason, GSSJ does not allow the method of residual updating (18), but PCG does. Second, Strandén and Lidauer (18) introduced a new technique for iteration on data. Iteration on data requires a fixed number of calculations, say p multiplications and additions, for each record to compute the record's contribution to the matrix multiplication Cx in [3] or Cd in [4]. With the standard iteration on data technique, p is a quadratic function of the number of effects in the statistical model. The PCG allows a reordering of the multiplications such that p is a linear function of the number of effects in the statistical model (18). Consequently, for the RRM, p was 573 for GSSJ but only 66 for PCG. This reduction explained most of the difference in computing time per round of iteration. In fact, computation of the product Cd in [4] with the new iteration on data technique required fewer multiplications and additions than if the sparse matrix of coefficients (403,117,019 nonzero elements) had been used.
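A schematic of that reordering, as we understand it from (18): each record touches only its own equations, so its contribution to Cd can be accumulated as x(x′d)/σ²e at a cost linear in the number of effects per record, and C is never formed. All names below are ours, and the pedigree and other variance-structure contributions are represented generically as penalty blocks.

```python
import numpy as np

def make_matvec(records, penalties, n_eq, s2e=1.0):
    """Iteration on data sketch: form v = C d in one pass over the records.
    records: (addresses, covariates) per observation, addresses unique per record;
    penalties: (addresses, block) pairs for the variance-structure part of C,
    e.g. a_ii * Ka^{-1} per animal or Kp^{-1} per cow."""
    def matvec(d):
        v = np.zeros(n_eq)
        for addr, x in records:
            v[addr] += x * (x @ d[addr]) / s2e   # data part: x (x'd)/s2e, linear in len(addr)
        for addr, block in penalties:
            v[addr] += block @ d[addr]           # A^{-1} (x) Ka^{-1}, I (x) Kp^{-1}, etc.
        return v
    return matvec
```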
A disadvantage of PCG, in comparison with the GSSJ method, was its greater demand for random access memory, which may limit its use in large applications. One way to circumvent this problem is to store the solution vector on disk and to make the work vector unnecessary by reading the data twice at each round of iteration. The cost of these modifications is increased computing time.

The most common convergence indicator in animal breeding applications is cd because it is easy to obtain. However, it has been demonstrated (12) that evaluating convergence solely from cd may be inappropriate, because the real accuracy of the solutions can be much lower than indicated by cd. Our results supported this conclusion. When cd was applied to solutions obtained by GSSJ, the indicator suggested that the accuracy of the solutions from round 300, with γ = 0.8, was higher than that from round 174, with γ = 0.9 (Table 5). However, the indices with one point deviation from the quasi-true indices and the correlation between intermediate and quasi-true indices proved the opposite (Table 5). This finding was also supported by the approximated accuracy of the solutions as derived by Misztal et al. (12), which was 0.0075 for the solutions from round 300 with γ = 0.8 versus 0.0017 for the solutions from round 174 with γ = 0.9. The convergence indicator cr was regarded as more reliable (20), but for second-order Jacobi and Gauss-Seidel methods, calculation of cr is expensive. In PCG, all components of cr were readily available.

Estimation of breeding values with the RRM required greater accuracy in the solutions of the MME than with the STM. This was because a breeding value of an animal in the RRM was a function of breeding value coefficients ($\hat{a}_p$) rather than a single solution from the MME. For both models, a high correlation of at least 0.9999 was observed between the quasi-true indices and the indices that fulfilled the convergence criterion LSC. The correlations between the converged indices from the STM and the RRM were 0.967, 0.990, and 0.988 for young cows, evaluated sires, and young sires, respectively. However, the percentages of indices differing by two or more index points between the two models were 49.5, 28.4, and 44.6 for young cows, evaluated sires, and young sires, respectively. This finding indicated a significant change in the ranking of the animals when estimating EBV with the STM versus the RRM.
CONCLUSIONS

The PCG method seemed to be an attractive alternative for solving large MME. Solving the MME of the RRM was accomplished in only 14% of the computation time needed by GSSJ. The implementation of PCG was straightforward and required no parameter estimates (e.g., relaxation factors), which gave it another advantage over second-order Jacobi-related methods. We observed that PCG performed better when no restrictions were imposed on the coefficient matrix; thus, the convergence was not impaired by the coefficient matrix being only positive semidefinite. The estimation of breeding values with the RRM required greater accuracy of the solutions of the MME than with the STM. This finding favored PCG in particular, because an additional increase in the accuracy of the solutions was computationally less costly than for GSSJ, owing to the high rate of convergence during the later stages of iteration.
REFERENCES

1 Berger, P. J., G. R. Luecke, and A. Hoekstra. 1989. Iterative algorithms for solving mixed model equations. J. Dairy Sci. 72:514–522.
2 Carabaño, M. J., S. Najari, and J. J. Jurado. 1992. Solving iteratively the M.M.E.: genetic and numerical criteria. Pages 258–259 in Book Abstr. 43rd Annu. Mtg. Eur. Assoc. Anim. Prod., Madrid, Spain. Wageningen Pers, Wageningen, The Netherlands.
3 Ducrocq, V. 1992. Solving animal model equations through an approximate incomplete Cholesky decomposition. Genet. Sel. Evol. 24:193–209.
4 Hageman, L. A., and D. M. Young. 1992. Applied Iterative Methods. Acad. Press, Inc., San Diego, CA.
5 Hestenes, M. R., and E. L. Stiefel. 1952. Methods of conjugate gradients for solving linear systems. Natl. Bur. Std. J. Res. 49:409–439.
6 Jamrozik, J., L. R. Schaeffer, and J.C.M. Dekkers. 1997. Genetic evaluation of dairy cattle using test day yields and random regression model. J. Dairy Sci. 80:1217–1226.
7 Jamrozik, J., L. R. Schaeffer, Z. Liu, and G. Jansen. 1997. Multiple trait random regression test day model for production traits. Pages 43–47 in Bull. No. 16. INTERBULL Annu. Mtg., Vienna, Austria. Int. Bull Eval. Serv., Uppsala, Sweden.
8 Jensen, J., and P. Madsen. 1994. DMU: a package for the analysis of multivariate mixed models. Proc. 5th World Congr. Genet. Appl. Livest. Prod., Guelph, ON, Canada XXII:45–46.
9 Kirkpatrick, M., W. G. Hill, and R. Thompson. 1994. Estimating the covariance structure of traits during growth and aging, illustrated with lactation in dairy cattle. Genet. Res. Camb. 64:57–69.
10 Lidauer, M., E. A. Mäntysaari, I. Strandén, A. Kettunen, and J. Pösö. 1998. DMUIOD: a multitrait BLUP program suitable for random regression testday models. Proc. 6th World Congr. Genet. Appl. Livest. Prod., Armidale, NSW, Australia XXVII:463–464.
11 Misztal, I., and D. Gianola. 1987. Indirect solution of mixed model equations. J. Dairy Sci. 70:716–723.
12 Misztal, I., D. Gianola, and L. R. Schaeffer. 1987. Extrapolation and convergence criteria with Jacobi and Gauss-Seidel iteration in animal models. J. Dairy Sci. 70:2577–2584.
13 Misztal, I., L. Varona, M. Culbertson, N. Gengler, J. K. Bertrand, J. Mabry, T. J. Lawlor, and C. P. Van Tassell. 1998. Studies of the values of incorporating the effect of dominance in genetic evaluations of dairy cattle, beef cattle, and swine. Proc. 6th World Congr. Genet. Appl. Livest. Prod., Armidale, NSW, Australia XXV:513–516.
14 Pösö, J., E. A. Mäntysaari, M. Lidauer, I. Strandén, and A. Kettunen. 1998. Empirical bias in the pedigree indices of heifers evaluated using test day models. Proc. 6th World Congr. Genet. Appl. Livest. Prod., Armidale, NSW, Australia XXIII:339–342.
15 Reents, R., J.C.M. Dekkers, and L. R. Schaeffer. 1995. Genetic evaluation for somatic cell score with a test day model for multiple lactations. J. Dairy Sci. 78:2847–2870.
16 Schaeffer, L. R., and B. W. Kennedy. 1986. Computing strategies for solving mixed model equations. J. Dairy Sci. 69:575–579.
17 Shewchuk, J. R. 1994. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. School of Computer Sci., Carnegie Mellon Univ., Pittsburgh, PA.
18 Strandén, I., and M. Lidauer. 1999. Solving large mixed linear models using preconditioned conjugate gradient iteration. J. Dairy Sci. 82:2779–2787.
19 Strandén, I., and E. A. Mäntysaari. 1992. Animal model evaluation in Finland: experience with two algorithms. J. Dairy Sci. 75:2017–2022.
20 Van Vleck, L. D., and D. J. Dwyer. 1985. Successive overrelaxation, block iteration, and method of conjugate gradients for solving equations for multiple trait evaluation of sires. J. Dairy Sci. 68:760–767.
21 Wiggans, G. R., and M. E. Goddard. 1997. A computationally feasible test day model for genetic evaluation of yield traits in the United States. J. Dairy Sci. 80:1795–1800.
22 Wiggans, G. R., I. Misztal, and L. D. Van Vleck. 1988. Animal model evaluation of Ayrshire milk yield with all lactations, herd-sire interaction, and groups based on unknown parents. J. Dairy Sci. 71:1319–1329.