Livestock Production Science 54 (1998) 229–250
Marker-assisted preselection of young dairy sires prior to progeny-testing
M.J. Mackinnon a, M.A.J. Georges b,*
a Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Rd., EH9 3JT, Edinburgh, UK
b Department of Genetics, Faculty of Veterinary Medicine, University of Liège (B43), 20 Bd de Colonster, 4000-Liège, Belgium
Accepted 5 November 1997
Abstract

This study investigates the value of a ‘bottom-up’ approach to marker-assisted selection in a conventional progeny-testing dairy-breeding programme. By marker genotyping the daughters in the progeny test for markers known to be closely linked to a quantitative trait locus (QTL), it can be decided whether their sire is heterozygous for the QTL. If the sire is heterozygous with allelic contrast greater than some threshold, c, then only those bull-sons which inherited the favourable QTL allele are retained for subsequent progeny testing. In this way, posterior information on a sire’s genotype from his daughters is used to preselect his sons and thereby increase the selection differential in the new generation of bulls. Simulations were used to predict the genetic gains and costs of using the bottom-up approach in a national dairy breeding scheme in which 500 young bulls were progeny-tested each generation. It was found that rates of genetic gains could be increased by 8%, 14% and 23% compared with conventional progeny testing if selection was based on 1, 2 and 5 QTL, respectively, and that this would cost less than US$100,000 per locus. A ‘top-down’ approach selecting QTL alleles inherited from the grandsires was also evaluated and shown to be highly profitable, though less so than for the bottom-up scheme. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: QTL; Marker-assisted selection; Dairy cattle
* Corresponding author. Tel.: +32-43-66-41-50; fax: +32-4366-41-22; e-mail: [email protected]

1. Introduction

In recent years, dramatic improvements in genetic-marker technology have permitted the systematic dissection of genetically complex traits into their Mendelian components (e.g., Paterson et al., 1989; Hilbert et al., 1991; Andersson et al., 1994; Georges et al., 1995). As most agriculturally important traits are complex in nature, these new possibilities have attracted much attention from animal and plant geneticists because of the associated opportunities to design more efficient selective breeding programmes (Soller and Beckmann, 1983). For milk production, a number of regions of the genome carrying quantitative trait loci (QTL) have been located (e.g., Geldermann et al., 1985; Cowan et al., 1990; Bovenhuis and Weller, 1994; Ron et al., 1994; Georges et al., 1995; Spelman et al., 1996) and further exploration is underway. The challenge remains, however, to devise breeding programmes which maximize the benefit of this information in outbred populations such as dairy cattle.
0301-6226/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0301-6226(97)00169-3
Benefits from using QTL-linked genetic markers in breeding programmes (called marker-assisted selection, MAS) can accrue in three ways. First, markers can be used to increase the accuracy of selection by providing more information on an animal’s genotype than otherwise obtained using just phenotypic information. Increases in selection accuracy can be achieved if there is across-population disequilibrium between markers and QTL (Smith and Simpson, 1986; Lande and Thompson, 1990; Zhang and Smith, 1993), or through explaining more of the within-family Mendelian sampling variation (Meuwissen and Van Arendonk, 1992; Hoeschele and Romano, 1994; Meuwissen and Goddard, 1996). Second, markers can be used to decrease generation intervals by allowing selection at earlier stages in life (e.g., Kinghorn et al., 1991). Third, markers can be used to increase the selection differential by allowing screening and preselection among larger numbers of candidates for later selection (Kashi et al., 1990).

In this study we present an alternative approach to exploiting markers within a conventional progeny-testing scheme in dairy cattle, the ‘bottom-up’ scheme, and show that its contribution to rates of genetic gain can be high for relatively low extra costs. The following sections present some analytical tools which can be used to evaluate the benefits and costs of the bottom-up scheme and hence its potential gains and associated risks. Comparisons are made with the conventional progeny-testing scheme and with a top-down approach based on the granddaughter design (Weller et al., 1990; Kashi et al., 1990).
2. Description of preselection schemes

Table 1 summarizes the notation used in this study. Assume that a national dairy herd improvement scheme has 500 young bulls to progeny-test each generation. These young bulls are offspring of $N_{BS} = 10$ genetically superior bull-sires and a group (numbering $N_{BD}$) of superior bull-dams. After the progeny test, which involves phenotyping $N_D = 50$ or 100 daughters each, their proofs are released and semen from the best $N_S = 50$ of these bulls is disseminated to the industry for wide use in commercial herds. The best $N_{BS} = 10$ of these 50 are used as bull-sires for generating $N_B = 50$ young bulls each to give 500 young bulls for the next round of progeny testing. It is further assumed that previous marker–QTL linkage studies have identified several regions of the genome as carrying one or more QTL which are segregating in the population. These regions are marked by polymorphic markers or marker haplotypes for which animals are easily genotyped.

2.1. Bottom-up scheme

Imagine that a young sire yields an outstanding progeny-test evaluation and therefore qualifies as a bull-sire. His daughters, on whose phenotypes the sire’s progeny test is based, would be immediately genotyped for markers (with alleles denoted W and w) linked to known QTL (with alleles denoted Q and q) (Fig. 1a). This information would be used to indicate whether the sire is heterozygous for a marked QTL or not. He is deemed to be heterozygous if the difference in mean milk production of his two marker allelic or haplotype daughter groups is greater than a specified value, $c$. If he is heterozygous, his young bull progeny (half-sibs to these daughters) which carry the favourable marker-QTL allele are retained for progeny testing, and all progeny carrying the unfavourable allele are precluded from progeny testing. Those precluded are replaced by additional young bulls which carry the favourable allele, Q. Progeny from sires which seem to be homozygous at the QTL are retained in the team, i.e., they are not discriminated against. The quota of bull progeny is the same for each sire so that genetic merit due to loci other than the QTL in question is not affected by preselection. Progeny testing then proceeds as normal, the difference now being that the candidates have been preselected for segregating QTL beforehand and are therefore genetically superior, prior to progeny testing, to candidates that had not been preselected.

Note that because the scheme involves ‘topping up’ the young bull team with replacements in order to fill the preexisting quota of 50 bull-sons per bull-sire, there is no compromise to the standard progeny-testing scheme (against which gains are compared in this study) in imposing this preselection scheme.
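The bottom-up decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the data layout (lists of tuples) and the toy milk yields are hypothetical. Only the rule itself comes from the text: compare the daughters' marker contrast with the threshold $c$, and if it is exceeded, keep only the sons carrying the favourable marker allele.

```python
from statistics import mean


def marker_contrast(daughters):
    """Return (contrast, favourable_allele) for one bull-sire.

    daughters: list of (paternal_marker_allele, milk_yield) pairs, where the
    allele is "W" or "w" (hypothetical data layout).
    """
    mean_W = mean(y for allele, y in daughters if allele == "W")
    mean_w = mean(y for allele, y in daughters if allele == "w")
    favourable = "W" if mean_W >= mean_w else "w"
    return abs(mean_W - mean_w), favourable


def preselect_sons(sons, daughters, c):
    """Apply the bottom-up rule to one bull-sire's sons.

    sons: list of (son_id, paternal_marker_allele) pairs.
    If the daughters' contrast does not exceed c, the sire is treated as
    homozygous and all sons are kept; otherwise only sons carrying the
    favourable marker allele are retained.
    """
    contrast, favourable = marker_contrast(daughters)
    if contrast <= c:
        return list(sons)
    return [son for son in sons if son[1] == favourable]
```

In the scheme as described, the excluded sons would then be replaced by additional sons carrying the favourable allele so that the quota per bull-sire stays fixed; that topping-up step is left out of the sketch.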
Table 1. Description of notation used in this study (in alphabetical order) and typical values

| Symbol | Description | Values used in this study |
|---|---|---|
| $A$ | Number of alleles per locus | 2, 3 or 4 |
| $B_M$ | Factor by which the number of bulls to be generated is increased | 1.2 to 6 |
| $BV$ | Average genetic value of the sons of a particular bull-sire (breeding value) | |
| $c$ | Value of marker contrast above which a sire is deemed to be heterozygous | 0 to 0.65$\sigma_P$ |
| $d_i^{+}$, $d_i^{-}$ | Effect of the favourable and unfavourable allele at the ith locus | −0.5 to +0.5$\sigma_P$ |
| $D_n$ | Overall discount over $n$ years | 9.8 |
| $E(\Delta > c)$ | Expected value (average) of all allelic contrasts greater than $c$ | 0.17 to 0.78$\sigma_P$ |
| $F_m$ | Number of progeny to be generated to give a 95% probability that $N_B$ have the favourable allele(s) | 500 to 5000 |
| $G_j$ | Genetic gain due to preselection on locus $j$ | |
| $G_M$ | Predicted genetic gain due to preselection based on $M$ loci | 0.05 to 0.15$\sigma_P$ |
| $G_{100\%}$ | Average genetic gain in all the bull progeny | 0.05 to 0.15$\sigma_P$ |
| $G_{10\%}$ | Average genetic gain in the top 10% of bull progeny when ranked on genetic merit | |
| $H$ | Probability that the sire is heterozygous with $\Delta > c$ | 0.01 to 0.67 |
| $\hat{H}$ | Probability that the sire is heterozygous with $\hat{\Delta} > c$ | 0.01 to 0.67 |
| $h$ | Probability that the sire is truly heterozygous, i.e., $\Delta > 0$ | 0.67 |
| $h^2$ | Heritability of milk production | 0.3 |
| $L$ | Number of loci controlling milk production | 5, 10 or 20 |
| $M$ | Number of marked loci | 1, 2 or 5 |
| $m$ | Number of loci for which the sire is deemed to be heterozygous | 1, 2 or 5 |
| $n$ | Number of years over which the economic values are discounted | 20 |
| $N_{BS}$ | Number of bull-sires (elite sires) | 10 |
| $N_{BD}$ | Number of bull-dams | 500 |
| $N_B$ | Number of bull progeny per bull-sire | 50 |
| $N_C$ | Number of cows in the commercial herd | $10^6$ |
| $N_D$ | Number of daughters on which the progeny test is based | 50 or 100 |
| $N_S$ | Number of sires selected after progeny testing | 50 |
| $P(\Delta > c)$ | Probability that $\Delta$ is greater than $c$ | 0 to 1 |
| $P_{AI}$ | Proportion of all cows inseminated by proven sires | 75% |
| $P_M$ | Market share of semen sales previously held by the implementing company | 25% |
| $P_M^{*}$ | Market share held after implementing preselection | 25 to 40% |
| $P_S$ | Proportion of young bulls selected as sires of the next generation ($N_S/(N_{BS} N_B)$) | 10% |
| $P_S^{*}$ | Proportion of young bulls selected which come from the participating AI company | |
| $p_i$ | Frequency of allele $i$ | |
| $\Delta$ | Allelic contrast at a heterozygous QTL in a sire | |
| $\hat{\Delta}$ | Estimated allelic contrast based on marker contrasts in the progeny | |
| $\$_{AI}$ | Profit to the AI company | US$5 million–US$15 million |
| $\$_C$ | Profit per cow | US$5–US$25 |
| $\$_S$ | Cost of a successful insemination (resulting in a live calf) | US$12 |
| $\$_s$ | Net value of one phenotypic standard deviation of milk production | US$122 |
| $\$_g$ | Cost of genotyping one animal for one marker locus | US$2 |
| $\$_p$ | Cost of producing an extra full-sib bull progeny as a replacement for excluded young bulls | US$500 |
| $\sigma_P$ | Phenotypic standard deviation for total annual milk production | 1100 kg of milk |
| $\sigma_G$ | Genetic standard deviation for total annual milk production | 550 kg of milk |
| $\sigma_w^2$ | Variance within marker contrasts | 0.04 to 0.20 |

Fig. 1. Schematic diagram of the bottom-up (upper) and top-down (lower) schemes. Sires of young candidate bulls are evaluated for their QTL genotype using marker information from their daughters (bottom-up) or their sires (top-down). Prior to the progeny test, their sons would be preselected based on their marker genotype linked to segregating QTL: only sons having inherited all favourable alleles for these segregating QTL would be admitted in the progeny test (bold squares). Precluded bulls would be replaced with full or half-sib bulls carrying favourable alleles in order to fill the quota for subsequent progeny testing.

2.2. Top-down scheme

In the top-down scheme, it is assumed that the bull-sire’s and/or bull-dam’s sires (grandsire in Fig. 1b) were included in a granddaughter design, allowing for the identification of the QTL for which the grandsire(s) are heterozygous. For these QTL, progeny are selected according to which grandpaternal alleles they inherit. Note that in the top-down approach, grandpaternal QTL alleles can be traced through bull-sire and bull-dam, while in the bottom-up scheme only the bull-sire’s QTL alleles are considered. On the other hand, in the top-down approach, selection is applied irrespective of whether the bull-sire and bull-dam are heterozygous or not. Thus if these are homozygous, selection is performed when there is nothing to be gained.

There are many possible ways to implement the top-down scheme. Kashi et al. (1990) proposed that extra genetic gain could be obtained by selecting candidate sires prior to progeny-testing “on the basis of an index I = P − Z, where, P = number of marker alleles or haplotypes associated with favourable QTL alleles that have been traced from elite sire to grandson; and Z = number of marker alleles or haplotypes associated with unfavourable QTL alleles that have been traced from elite sire to grandson.” To allow for a proper comparison with the bottom-up scheme, we implement the top-down scheme as follows. When only one parent has a heterozygous sire, selection is based on one side of the pedigree only. In such cases half of the progeny from each bull-sire or bull-dam are retained: those inheriting the favourable allele from their grandsire in the case where the parent (bull-sire or bull-dam) inherited it, or those not receiving the unfavourable allele if the parent inherited the unfavourable allele. When both the bull-sire and bull-dam have selectable alleles, two options are envisaged.

The first way, called the ‘maximal scheme’, is to retain only one quarter of the progeny when both the bull-sire and bull-dam have selectable alleles, namely those having inherited the maximum possible number of favourable alleles or minimum number of unfavourable alleles from their grandsires. The maximal scheme yields the maximum possible genetic gain: this gain is similar to that for the bottom-up scheme (see Section 3), but also incurs greater costs than for the bottom-up scheme where only half the progeny need to be replaced.
The second way, the ‘optimal scheme’, costs a similar amount to the bottom-up scheme but yields less genetic gain. In this scheme one half of the progeny for each bull-sire are selected (as in the bottom-up scheme), but this is distributed unevenly among the dams: if a sire carrying a favourable allele mates with a dam carrying a favourable allele, all the three quarters of the progeny which carry at least one of the favourable alleles are retained; but if this sire mates with a dam carrying an unfavourable allele, only one quarter are retained. The analogous rules are used if the sire carries an unfavourable allele. Thus in the optimal scheme there is discrimination among the bull-dams because some are allowed to retain more progeny than others.

2.3. Method of preselection

In order to implement the scheme, it is necessary to decide upon a minimum marker contrast, $c$, as measured in the daughters in bottom-up, or sires in top-down, which will be used to decide whether the bull-sire or grandsire, respectively, is heterozygous for a QTL or not. Obviously if $c$ is set too low, many young bulls will be precluded based on the wrong assumption that their sire or grandsire is heterozygous and that they carry the unfavourable QTL. Also, if the contrast in the daughters/sires is measured on too few animals, the prediction of whether the sire/grandsire is heterozygous or not will be inaccurate and mistakes in genotype designation will be made. The next sections examine what the optimum $c$ and number of daughters/sires genotyped should be in terms of genetic gains and economic costs. It is assumed that there may be several marked QTL in the genome which could be used simultaneously in this scheme. We denote $M$ as the number of QTL that have been mapped, and $m$ ($m = 0, 1, \ldots, M$) as the number of loci for which the bull-sire is heterozygous. Selecting on more markers increases both the gains and the costs, as shown below.
3. Genetic gains due to preselection

3.1. Bottom-up

Consider the average genetic value of the sons of a particular bull-sire as:

$$BV = \frac{1}{2}\sum_{i=1}^{L}\left(d_i^{+} + d_i^{-}\right) \tag{1}$$

where $L$ denotes the number of loci or QTL controlling the trait of interest (including so-called polygenic loci with small effect), $d_i^{+}$ denotes the allelic effect of the plus allele at the ith QTL locus, and $d_i^{-}$ the effect of the minus allele. These loci may or may not be marked by polymorphic genetic markers. Eq. (1) can be rewritten singling out a marked locus $j$, as:

$$BV = \frac{1}{2}d_j^{+} + \frac{1}{2}d_j^{-} + \frac{1}{2}\sum_{i=1,\,i\neq j}^{L}\left(d_i^{+} + d_i^{-}\right) \tag{2}$$
If preselection is practised, only those bulls with the plus allele at the jth locus are retained and those with the minus allele are discarded. The average value of this group of preselected sons, assuming that the sires are mated to an average group of dams, would be:

$$BV = d_j^{+} + \frac{1}{2}\sum_{i=1,\,i\neq j}^{L}\left(d_i^{+} + d_i^{-}\right) \tag{3}$$
Subtracting Eq. (2) from Eq. (3) gives the gain in average value of the bull sons due to preselection on that locus, $G_j$, as:

$$G_j = \frac{1}{2}\left(d_j^{+} - d_j^{-}\right) = \frac{1}{2}\Delta_j \tag{4}$$
where $\Delta_j$ is the difference between allelic effects at the jth locus and is defined such that it is always greater than or equal to zero. If selection was based on several marked heterozygous loci then by similar reasoning the gain would be the sum of $G_j$ over all these loci. Assuming that the loci are unlinked and act in an additive way, the gain is cumulative over all loci on which selection is practised. Furthermore, gain is only obtained if the bull is heterozygous at the marked locus. Denoting $\Delta$ as the mean value of the contrast over the marked loci, and $H$ as the average probability that the sire is heterozygous at each locus, the genetic gains are expected to be

$$G_M = MH\frac{\Delta}{2}. \tag{5}$$

3.2. Top-down

The genetic gains using the top-down scheme can be derived in a similar way to those for the bottom-up scheme except that now information can be used from both the bull-sire and bull-dam based on their paternal (grandsire) alleles. Denoting the allelic contrasts in the sires of the bull-sire and bull-dam as $\Delta_S$ and $\Delta_D$, respectively, if both have selectable markers, the genetic gain if the maximal strategy is used is $G = \Delta_S/4 + \Delta_D/4$ and if the optimal strategy is used, the gain is $G = \Delta_S/8 + \Delta_D/4$. If only the bull-sire has a selectable allele, the gain is $G = \Delta_S/4$ and if only the bull-dam has a selectable allele the gain is $\Delta_D/4$. If neither is selectable the gain is zero. If, as before, the probability that a grandsire is heterozygous is $H$, and the expected values of the contrasts in sires of bull-dams are the same as those in bull-sires, then the gains from using $M$ markers in the top-down maximal scheme are predicted to be

$$G_M = M\left[H^2\frac{\Delta}{2} + 2H(1-H)\frac{\Delta}{4}\right] = MH\frac{\Delta}{2} \tag{6}$$
i.e., as for bottom-up. In the optimal scheme the gains are

$$G_M = MH\left(1 - \frac{H}{4}\right)\frac{\Delta}{2} \tag{7}$$
As the optimal scheme under-utilizes marker information, and the aim of this study is to show how to fully exploit marker information, it is not further evaluated.
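The gain expressions of this section can be checked numerically by summing the per-case gains stated in Section 3.2 (both parents selectable, exactly one selectable, neither selectable) and comparing with the closed forms. The sketch below is illustrative only: the function names are hypothetical, and the input values in the usage note (M = 2, H = 0.67, Δ = 0.3 in units of $\sigma_P$) are arbitrary examples rather than results from the study.

```python
def gain_bottom_up(M, H, delta):
    """Closed form for the bottom-up gain: M * H * delta / 2."""
    return M * H * delta / 2


def gain_top_down_maximal(M, H, delta):
    """Top-down maximal scheme, built from the per-case gains.

    Both grandsires heterozygous (prob. H^2): delta/4 + delta/4.
    Exactly one heterozygous (prob. 2H(1-H)): delta/4.
    """
    both = H ** 2 * (delta / 4 + delta / 4)
    one = 2 * H * (1 - H) * (delta / 4)
    return M * (both + one)


def gain_top_down_optimal(M, H, delta):
    """Top-down optimal scheme, built from the per-case gains.

    Both grandsires heterozygous (prob. H^2): delta/8 + delta/4.
    Exactly one heterozygous (prob. 2H(1-H)): delta/4.
    """
    both = H ** 2 * (delta / 8 + delta / 4)
    one = 2 * H * (1 - H) * (delta / 4)
    return M * (both + one)
```

Calling `gain_top_down_maximal(2, 0.67, 0.3)` returns the same value as `gain_bottom_up(2, 0.67, 0.3)`, and `gain_top_down_optimal(2, 0.67, 0.3)` agrees with $MH(1 - H/4)\Delta/2$, which is how the two top-down closed forms arise from the case-by-case gains.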
4. Economic costs and benefits

4.1. Costs

There are two major costs associated with the preselection schemes. As above, these are derived for the bottom-up scheme and then modified for the top-down scheme. The first is the cost of generating the extra bull progeny in order to meet the quota of sons with only plus alleles. In the bottom-up approach, the increase in the number of bull progeny is proportional to the number, $m$, of the $M$ marked loci which are heterozygous in the bull-sire. For each marker for which the bull-sire is heterozygous, the required number of bull progeny to generate more than doubles that required without preselection because only half of the progeny receive the plus allele. The increase in the number of bull progeny required to give a 95% chance to obtain $N_B$ bull offspring all with plus alleles at each of the $m$ heterozygous loci is by a factor $B_M$ equal to:

$$B_M = \frac{1}{N_B}\sum_{m=0}^{M} F_m \frac{M!}{m!\,(M-m)!} H^m (1-H)^{M-m} \tag{8}$$

where $F_m$ is the number of progeny required to give a 95% probability that at least $N_B$ offspring carry the favourable alleles for $m$ loci, i.e., $F_m$ satisfies the binomial probability that out of $F_m$ trials there is a probability of 95% that there are at least $N_B$ successes where the probability of each success is $1/2^m$ (i.e., the probability that the progeny carries favourable alleles at all $m$ loci, assuming the loci are unlinked). For example, if 50 bull progeny per bull-sire are required and there are $M = 2$ marked loci under consideration with the probability of each locus being heterozygous as 0.67 (see later), then the probability that the bull has $m = 0$, 1 and 2 marked loci heterozygous is $0.33^2$, $2 \times 0.67 \times 0.33$ and $0.67^2$, and the numbers of progeny required to give a 95% guarantee that there are at least 50 generated are 50, 117 and 242, respectively. This gives the expected number of bull progeny generated to meet the quota of 50 to be 166, i.e., increased by a factor of $B_M = 3.3$. The extra costs due to phenotyping are then $\$_p N_B N_{BS}(B_M - 1)$ where $\$_p$ is the cost of producing each extra bull progeny. In the top-down scheme there are potentially $2M$ marked loci to exploit (one from the bull-sire and one from the bull-dam). Thus to work out the increase in the number of progeny, $2M$ is substituted for $M$ in Eq. (8).

The second major cost is that of genotyping for the markers. In the bottom-up scheme, for each marker, the bull-sires, their daughters and their bull progeny have to be genotyped. Denoting the cost per unit genotype as $\$_g$, the number of bull-sires as $N_{BS}$, the number of daughters per bull-sire as $N_D$, the total genotyping and phenotyping cost in using the scheme for $M$ marked loci (over and above the usual cost of progeny testing) is:

$$\$_M^{BU} = N_{BS}\left[\$_g M\left(1 + N_D + H B_M^{BU} N_B\right) + \$_p\left(B_M^{BU} - 1\right) N_B\right] \tag{9}$$

where the BU superscripts denote the bottom-up scheme. In the top-down scheme, all of the sons of the elite sires (grandsires, or bull-sires in the previous generation) have to be genotyped to decide which grandsires are heterozygous. Bull-dams have to be genotyped, bull-sires are assumed to have been already genotyped as sons of the grandsire, and all the bull progeny have to be genotyped. Denoting the number of bull-dams per elite sire as $N_{BD}$, the total cost is

$$\$_M^{TD} = N_{BS}\left[\$_g M\left(N_B + N_{BD} + H(2-H) B_M^{TD} N_B\right) + \$_p\left(B_M^{TD} - 1\right) N_B\right] \tag{10}$$
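The quantities $F_m$ and $B_M$ of Eq. (8) are straightforward to compute with an exact binomial tail. The following sketch is a hedged illustration, not the authors' code: the function names are hypothetical, the search for $F_m$ is a simple linear scan, and the bottom-up cost function assumes that $N_{BS}$ multiplies both the genotyping and the extra-progeny terms, which is how the cost expression is read here.

```python
from math import comb


def prob_at_least(n, k, p):
    """Exact binomial probability of at least k successes in n trials."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))


def progeny_needed(n_b, m, confidence=0.95):
    """F_m: smallest number of progeny giving the required probability that
    at least n_b carry the favourable allele at all m unlinked loci."""
    p = 0.5 ** m  # chance that one son inherits all m favourable alleles
    n = n_b
    while prob_at_least(n, n_b, p) < confidence:
        n += 1
    return n


def increase_factor(n_b, M, H):
    """B_M of Eq. (8): expected progeny generated per quota place."""
    expected = sum(
        progeny_needed(n_b, m) * comb(M, m) * H ** m * (1 - H) ** (M - m)
        for m in range(M + 1)
    )
    return expected / n_b


def cost_bottom_up(n_bs, n_d, n_b, M, H, b_m, cost_g, cost_p):
    """Bottom-up cost: genotyping plus extra progeny, summed over bull-sires."""
    genotyping = cost_g * M * (1 + n_d + H * b_m * n_b)
    extra_progeny = cost_p * (b_m - 1) * n_b
    return n_bs * (genotyping + extra_progeny)
```

With `n_b = 50`, `M = 2` and `H = 0.67`, `increase_factor` should come out close to the factor of 3.3 quoted in the text's worked example. For the top-down scheme the text substitutes $2M$ for $M$ in Eq. (8), which in this sketch corresponds to calling `increase_factor(n_b, 2 * M, H)`.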
4.2. Profits

Economic profits derived from the scheme will depend on the financial benefit from the genetic gains relative to the expenses incurred in implementing the scheme. These benefits are now described in terms of both the profit per cow and the profit for an AI company investing in the scheme. The methods of Brascamp et al. (1993), who performed an economic analysis of the value of MAS, are used. For calculating the gains on a per cow basis, they assumed that if preselection increased the mean breeding value of young bulls by $G$ (i.e., only on the sire–sire and sire–dam pathways), the annual genetic gain in the commercial cow population would be $0.1G$. Based on earlier work by Brascamp (1973), they assumed that the discounted value of the annual increase in milk production over a 25-year period would be 7 times the economic value corresponding to the genetic gain in the cow population. If the economic value of an increase in milk yield of one phenotypic standard deviation is $\$_s$ then the average gain in value per cow by practising the bottom-up approach using $M$ markers is expected to be $7 \times \$_s \times 0.1G_M$. Assuming that the commercial cow population is $N_C$ the profit per cow is

$$\$_C = \frac{7\,\$_s G_M}{10} - \frac{\$_M}{N_C} \tag{11}$$
Economic gains from MAS can also accrue to the individual AI companies through capturing a greater market share of semen sales by having genetically superior proven bulls. Assuming that a company holds $P_M$ of the market share in semen sales and that this is increased to $P_M^{*}$ through increasing the mean genetic value of their proven sires (see below) then the increased annual profit to this company due to sales of semen at a value of $\$_S$ per successful AI (i.e., leading to the birth of a calf) to the proportion, $P_{AI}$, of all cows inseminated with semen from proven sires is $P_{AI} N_C \$_S (P_M^{*} - P_M)$. Discounting at a rate $r$ over $n$ years, and assuming that the benefits from the first batch of tested bulls arise in year 6, gives the discounting factor, $D_n$, by which these annual gains are multiplied to give the cumulative financial gains over the years, where $D_n$ is calculated as $\sum_{i=1}^{n}\left(\frac{1}{1+r}\right)^{i+5}$ (Brascamp et al., 1993). If the AI company has a market share of $P_M$ the cost of preselection to the company is $\$_M P_M$. Therefore, the profit in terms of semen sales is given by:

$$\$_{AI} = P_{AI} N_C \$_S \left(P_M^{*} - P_M\right) D_n - \$_M P_M \tag{12}$$

The increased proportion of selected young bulls among those of the AI firm participating in preselection is calculated by assuming that the mean of their young bulls has increased by an amount $G_M$. If $P_S$ of the young bulls are selected as sires for the next generation then the truncation point for selection for the participating firm has moved leftward relative to the mean, thereby increasing the proportion selected. For example, if the mean is increased by an amount of $0.15\sigma_G$ then if the top 10% over all the population are selected, then 12% of those from the participating AI firm will be selected according to standard normal theory. The new market share will then be

$$P_M^{*} = \frac{P_M P_S^{*}}{P_M P_S^{*} + (1 - P_M) P_S} \tag{13}$$

where $P_S^{*}$ is the new proportion of bulls selected from among the preselecting AI firm (i.e., 12% in this example).

Now that the general framework for the schemes has been described, it remains to numerically evaluate them for genetic gains, costs and profits. This requires that some assumptions are made about the underlying genetic model for the genes controlling milk production. In Section 5, such a model is described, and the theoretical predictions of gains and costs are derived from it. The model is then used as a basis for stochastic simulations which are used to check the theoretical predictions.
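The profit expressions of Section 4.2 reduce to a handful of arithmetic steps, sketched below. The function names are hypothetical, and the discount rate is an assumption: this section does not state $r$, but $r = 0.05$ with $n = 20$ gives a discount factor of about 9.76, close to the value of 9.8 listed for $D_n$ in Table 1.

```python
def discount_factor(n, r):
    """D_n: sum over i = 1..n of (1 / (1 + r)) ** (i + 5)."""
    return sum((1.0 / (1.0 + r)) ** (i + 5) for i in range(1, n + 1))


def new_market_share(p_m, p_s, p_s_star):
    """Eq. (13): market share after preselection.

    p_m: previous market share; p_s: proportion selected overall;
    p_s_star: proportion selected among the preselecting firm's bulls.
    """
    return p_m * p_s_star / (p_m * p_s_star + (1.0 - p_m) * p_s)


def profit_per_cow(value_s, g_m, cost_m, n_c):
    """Eq. (11): 7 * $s * G_M / 10 minus the scheme cost per cow.

    g_m is expressed in phenotypic standard deviations.
    """
    return 7.0 * value_s * g_m / 10.0 - cost_m / n_c


def ai_company_profit(p_ai, n_c, price_s, p_m, p_m_star, d_n, cost_m):
    """Eq. (12): discounted extra semen sales minus the firm's share of cost."""
    return p_ai * n_c * price_s * (p_m_star - p_m) * d_n - cost_m * p_m
```

Using the text's own example, `new_market_share(0.25, 0.10, 0.12)` evaluates Eq. (13) for a firm holding 25% of the market whose selected proportion rises from 10% to 12%, giving a new share of roughly 28.6%.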
5. Genetic model

The genetic model used to evaluate the scheme should be in accord with what has been observed to date. QTL mapping experiments performed in plants have repeatedly shown that a relatively small number of loci accounts for very large portions of phenotypic variance, with increasing numbers of loci accounting for progressively smaller portions of variance, until the significance threshold is reached (Paterson, 1995). For milk production, whole genome scans using a granddaughter design have demonstrated a small number of loci with surprisingly large allele substitution effects (Georges et al., 1995; Coppieters and Georges, personal communication), that are likely to act in concert with many other genes of minor effect. Moreover, haplotype analysis of, for instance, the casein cluster suggests the existence of polyallelic QTL. Allelic heterogeneity is also commonplace in many human hereditary conditions. For milk production, the underlying loci yield an overall amount of additive genetic variation ($\sigma_G^2$) which is proportionally 0.3 of the phenotypic variation ($\sigma_P^2$), i.e., heritability $h^2 = 0.3$. Assume that there are $L$ loci each with $A$ alleles and that the distribution of allelic effects (expressed in units of $\sigma_P$) follows a double exponential, or Laplace distribution. In such a distribution, alleles with small effect are at higher frequency than rarer alleles with large effect, and the alleles have a positive or negative effect relative to the mean value of allelic effects over all loci which is zero. The gamma distribution, of which the exponential is a special class, has been successfully fitted to the distribution of allelic mutation effects by Keightley (1994). The double exponential distribution has a density function of

$$f(x) = \frac{1}{2}\lambda e^{-\lambda |x|} = \begin{cases} \frac{1}{2}\lambda e^{-\lambda x}, & 0 \le x < \infty \\ \frac{1}{2}\lambda e^{\lambda x}, & -\infty < x < 0 \end{cases} \tag{14}$$

Fig. 2. Numerical evaluation of the assumed genetic model. (a) Heritability as a function of $\lambda$ for 5, 10 or 20 loci with either 2 or 4 alleles each. (b) Distribution of allelic effects when $h^2 = 0.3$ for a model with 10 loci each with 4 alleles (10 × 4), or 20 loci with 2 (20 × 2) or 4 (20 × 4) alleles each. (c) Distribution of allelic contrasts under the genetic models in (b) in a random population. (d) As for (c) except in the top 10% of the population based on genetic merit. (e) Average allelic contrast in heterozygous animals (upper lines, left axis) and variance explained (lower lines, right axis) as a function of rank of locus based on variance explained for three genetic models. (f) Average power to detect QTL as function of rank of locus.
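Sampling from the double exponential density above needs only an exponential draw with a random sign. The sketch below is an illustration under assumptions rather than the simulation used in the study: the function names are hypothetical and the rate $\lambda = 9$ used in the usage note is an arbitrary example. It also estimates how often the absolute difference between two independent draws (an allelic contrast between two distinct alleles) exceeds a threshold $c$, which can be compared with the closed form $e^{-c\lambda}(1 + c\lambda/2)$ that the paper gives later for $P(\Delta > c)$.

```python
import random
from math import exp


def laplace_density(x, lam):
    """Double exponential density: 0.5 * lam * exp(-lam * |x|)."""
    return 0.5 * lam * exp(-lam * abs(x))


def sample_allelic_effect(lam, rng):
    """Draw one allelic effect: exponential magnitude with a random sign."""
    magnitude = rng.expovariate(lam)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude


def contrast_exceeds(c, lam, n_pairs, rng):
    """Monte Carlo estimate of P(|a - b| > c) for two independent effects."""
    hits = 0
    for _ in range(n_pairs):
        a = sample_allelic_effect(lam, rng)
        b = sample_allelic_effect(lam, rng)
        if abs(a - b) > c:
            hits += 1
    return hits / n_pairs
```

For example, with `rng = random.Random(1)` and `lam = 9.0`, the mean absolute effect over many draws should approach $1/\lambda$, and `contrast_exceeds(0.2, 9.0, 100000, rng)` should land near the closed-form value for $c = 0.2$.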
To simulate a quantitative trait controlled by $L$ loci, each with $A$ alleles, we first sample at random $L \times A$ values according to their Laplace densities and assign allele frequencies within each locus according to a uniform distribution, rescaled so that they sum to unity. Then the total genotypic value for each animal in the population is generated by randomly sampling from this pool of alleles (assuming there is gametic phase equilibrium between all loci). In order to yield an overall amount of additive genetic variation ($\sigma_G^2$) which is proportionally 0.3 of the phenotypic variation ($\sigma_P^2$) (i.e., heritability $h^2 = 0.3$, typical for milk production traits), we need to find a value of $\lambda$ which generates this level of variation. This value will depend on the number of loci controlling the trait and the number of alleles per locus, and can be found through simulation. The relationship between $h^2$ and $\lambda$ for $L = 5$, 10 or 20 loci with $A = 2$ or 4 alleles each is shown in Fig. 2a. The data are averages of 25 simulations per set of parameters with 50 sires per simulation: the standard deviation around each line averaged 12% of the mean. Thus sampling 2 to 4 alleles from a Laplace distribution with $\lambda = 6$–12 will yield a value of $\sigma_G^2$ of ≈ 0.3 and is therefore suitable for simulating the genetic basis of milk yield as required in this study. An example distribution of the allelic effects found in a population of 50 bull-sires is shown in Fig. 2b for the cases of 10 loci with 4 alleles, and 20 loci with 2 or 4 alleles each. Typically, for a population with $h^2 = 0.3$ sampled in this way, allelic effects range from $-0.5\sigma_P$ to $0.5\sigma_P$. Note that allelic effects are defined as the deviations from the average of all alleles over all loci. In Fig. 2c, the distribution of allelic contrasts, which provide the material for selection in the preselection schemes described here, is shown. The contrasts ranged from 0.0 (with 60–80% homozygotes) to $0.7\sigma_P$. Even when the top
10% of the population was selected (Fig. 2d), the effect of the number of alleles and loci on these distributions was small. In Fig. 2e the average contrast (excluding homozygotes) and the proportion of the phenotypic variance accounted for by the locus (including homozygotes) is shown as a function of the rank of the locus with respect to the genetic variance it accounts for. The ‘best’ locus on average accounted for 10% of the variance, which is modest compared with most assumed QTL effects. Over the top ten loci, there was not a large difference between the genetic models in the average contrast, though the 2-allele model tended to produce higher contrast than the 4-allele models. This is, however, compensated for by a reduced heterozygosity in the 2-allele case, resulting in a very uniform phenotypic variance accounted for by each locus. In Fig. 2f, the average power to detect these contrasts given 100 sons with 100 daughters each (calculated using a non-central t-test assuming a Type I error of 0.001, including homozygotes) is shown: power to detect QTL using this structure is very similar under the three models. It is therefore concluded that under the chosen model of a Laplace distribution of allelic effects, the estimates of genetic gains achievable through preselection of young dairy sires are likely to be relatively insensitive to variations in the number of loci and alleles per locus, at least within the explored range. A model assuming $L = 10$ and $A = 4$ was considered for further analysis.

6. Predicted gains and cost

In the previous sections it was shown that the gains depended on the average value of the allelic contrast in heterozygous sires, the probability that the sire was heterozygous, and the number of markers (Eq. (5)). Now assuming that preselection is practised only if the contrast is greater than some cut-off value, $c$, and that this contrast is measured very accurately, Eq. (5) can be rewritten to predict the theoretical gains as

$$G_M = M\,h\,P(\Delta > c)\,\frac{E(\Delta > c)}{2} \tag{15}$$

where $h$ is the probability that the sire is truly heterozygous, $P(\Delta > c) = H/h$ is the probability that if the sire is heterozygous the contrast is greater
M.J. Mackinnon, M.A.J. Georgesr LiÕestock Production Science 54 (1998) 229–250
than c, and EŽ D ) c . is the expected value of D when it is greater than c. The latter two terms can be calculated as functions of c and l from the following equations Žsee Appendix A for derivations. and are therefore easily evaluated once the value of l appropriate to the heritability, number of loci and number of alleles has been computed. 3 q 3c l q c 2l2 EŽ D ) c. s Ž 16 . 2 l q c l2 cl P Ž D ) c . s eyc l 1 q Ž 17 . 2 The corollaries to these are: 3 q 3c l q c 2l2 y 3e c l EŽ D - c. s Ž 18 . 2 l q c l2 y 2 l e c l cl P Ž D - c . s 1 y eyc l 1 q Ž 19 . 2 In the model, it is assumed that D is continuous in which case the probability of D being zero is 0. In fact, in this application, there is a limited number of alleles, A, at the locus so the probability of D being zero is the probability of the sire being homozygous when there are A alleles with frequencies randomly assigned from a uniform distribution and rescaled to sum to unity. Thus the probability of being heterozygous, h, is calculated as:
$$h = 1 - \int_0^1 \cdots \int_0^1 \int_0^1 \frac{\sum_{i=1}^{A} p_i^2}{\left(\sum_{i=1}^{A} p_i\right)^2} \, dp_1 \, dp_2 \cdots dp_A \qquad (20)$$

where p_i is the frequency of the ith allele randomly chosen from the uniform distribution for the locus in question. When A = 4, h = 0.67, so that 67% of the sires are expected to be heterozygous at a randomly chosen locus when simulating under this model.

Because Δ is not measured completely accurately, but rather on a limited number of daughters in the bottom-up scheme or sons in the top-down scheme, it is necessary to take into account the consequences of being in error in the decision about whether Δ is greater or less than c, i.e., whether to preselect or not. For example, if the estimate of the contrast, Δ̂, is less than c, but the real value (Δ) is greater than c, then selection will not be practised where it should have been, i.e., a Type 2 error is made. Similarly, if the sire is homozygous (Δ = 0) but Δ̂ > c, then a Type 1 error is made. Table 2 shows the 12 possible situations which can arise from having three possible true situations (Δ = 0, Δ < c or Δ > c), each with four possible decisions based on Δ̂, and the effects on gains and costs. Note that even when Δ < c there are still gains when selection is practised, because E(Δ < c) is greater than zero. Also note that when −Δ̂ > c, such that selection is practised but for the wrong allele (Type 3 error), gains will be negative. Obviously such errors decrease as Δ̂ is measured more accurately using more daughters, and as c increases. Overall the expected gains when there are errors are

$$G_M = \frac{M h}{2} \Big[ E(\Delta > c)\, P(\Delta > c, \hat{\Delta} > c) + E(\Delta < c)\, P(\Delta < c, \hat{\Delta} > c) - E(\Delta > c)\, P(\Delta > c, -\hat{\Delta} > c) - E(\Delta < c)\, P(\Delta < c, -\hat{\Delta} > c) \Big] \qquad (21)$$

The values of P(Δ > c, Δ̂ > c) etc. in Eq. (21) are worked out by integration assuming that Δ̂ is normally distributed with mean Δ and variance equal to the variance of the mean marker contrast in the daughters or sons. This is shown in Appendix A together with graphs of these values as functions of c when the marker contrast is measured using 100 daughters. The effect of error in measuring Δ on costs is to alter the number of bull progeny required to be generated. The new factor by which the number of bull progeny is increased, B_M, is found by substituting in Eq. (8) an erroneous value of H, which is calculated as:

$$\hat{H} = (1 - h)\left[ P(\hat{\Delta} > c \mid \Delta = 0) + P(-\hat{\Delta} > c \mid \Delta = 0) \right] + h \left[ P(\Delta > c, \hat{\Delta} > c) + P(\Delta > c, -\hat{\Delta} > c) + P(\Delta < c, \hat{\Delta} > c) + P(\Delta < c, -\hat{\Delta} > c) \right]$$

Table 2
Effects on gains and costs of the possible classifications of Δ̂ in relation to Δ

Observed          Truth
                  Δ > c            Δ < c            Δ = 0
                  Gain    Cost     Gain    Cost     Gain    Cost
0 < Δ̂ < c         0       0        0       0        0       0
Δ̂ > c             +       +        +       +        0       +
0 < −Δ̂ < c        0       0        0       0        0       0
−Δ̂ > c            −       +        −       +        0       +

Δ corresponds to (δ_j⁺ − δ_j⁻) as defined in Section 3.1.
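As a minimal sketch, the closed-form expressions of Eqs. (15)–(19) and a Monte Carlo estimate of the integral in Eq. (20) can be evaluated as follows. The function names, the number of Monte Carlo draws and the seed are illustrative choices rather than part of the original analysis; λ = 9 and h = 0.67 are the values quoted in the text for h² = 0.3 and A = 4.

```python
import math
import random


def p_gt(c, lam):
    """Eq. (17): P(Delta > c) for a heterozygous sire."""
    return math.exp(-c * lam) * (1.0 + c * lam / 2.0)


def e_gt(c, lam):
    """Eq. (16): expected contrast given Delta > c."""
    return (3.0 + 3.0 * c * lam + (c * lam) ** 2) / (2.0 * lam + c * lam ** 2)


def p_lt(c, lam):
    """Eq. (19): P(Delta < c)."""
    return 1.0 - p_gt(c, lam)


def e_lt(c, lam):
    """Eq. (18): expected contrast given Delta < c (requires c > 0)."""
    x = math.exp(c * lam)
    numerator = 3.0 + 3.0 * c * lam + (c * lam) ** 2 - 3.0 * x
    denominator = 2.0 * lam + c * lam ** 2 - 2.0 * lam * x
    return numerator / denominator


def heterozygosity_mc(n_alleles, n_draws=100_000, seed=1):
    """Monte Carlo estimate of Eq. (20): uniform frequencies rescaled to sum to 1."""
    rng = random.Random(seed)
    homozygosity = 0.0
    for _ in range(n_draws):
        p = [rng.random() for _ in range(n_alleles)]
        total = sum(p)
        homozygosity += sum(x * x for x in p) / (total * total)
    return 1.0 - homozygosity / n_draws


def theoretical_gain(n_markers, h, c, lam):
    """Eq. (15): predicted gain when the contrast is known without error."""
    return n_markers * h * p_gt(c, lam) * e_gt(c, lam) / 2.0


if __name__ == "__main__":
    lam, h = 9.0, 0.67
    print("h estimated for A = 4:", round(heterozygosity_mc(4), 3))
    for c in (0.05, 0.30):
        print("c =", c, "predicted gain, one marker:",
              round(theoretical_gain(1, h, c, lam), 4))
```

The gains printed here refer to an average locus with Δ known exactly; they are not the realized gains for the best-ranking loci.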
7. Numerical evaluation

First a small numerical example is given to illustrate the selection decision process. Following this, the likely realized gains and costs are obtained from stochastic simulations and presented together with the corresponding theoretical gains and costs.

7.1. Example

Consider an example case where there are two markers available for selection. They relate to loci with four alleles with effects (in σ_P units) and frequencies as given in Table 3. The amounts of phenotypic variance which these loci explain are 8% and 5%, respectively. The marker contrasts in the daughters for the two loci in 10 bull-sires are measured, and the numbers of bull-sires which are heterozygous for zero, one or two of the marked loci when the heterozygosity criterion, c, is 0.05σ_P or 0.30σ_P are counted. The resulting gains in breeding value averaged over the 10 bull-sires are 0.13σ_P and 0.12σ_P for c = 0.05σ_P and c = 0.30σ_P, respectively (Table 3). The corresponding total number of bull progeny over the 10 bull-sires was increased from 500 without preselection to 1333 for c = 0.05σ_P and to 931 for c = 0.30σ_P. Among the 10 sires and 2 loci, the numbers of Type 1, 2 and 3 errors were, respectively, 4, 2 and 2 when c = 0.05 and 4, 1 and 1 when c = 0.30.

Table 3
Example calculation of bottom-up preselection on two loci with four alleles each among 10 bull-sires with 100 daughters each

Alleles in population
Locus 1                                 Locus 2
Allelic effect   Population frequency   Allelic effect   Population frequency
−0.207           0.171                  −0.204           0.365
−0.133           0.406                  −0.118           0.233
−0.127           0.074                   0.212           0.145
 0.283           0.349                   0.277           0.257

Contrasts in the 10 bull-sires (σ_P units)
Sire   Locus 1              Locus 2
       Δ         Δ̂          Δ         Δ̂
1       0.074     0.244      0.000    −0.372
2      −0.416    −0.418      0.000    −0.208
3       0.000    −0.167     −0.086    −0.091
4       0.000    −0.067     −0.330    −0.671
5      −0.410    −0.376      0.416     0.746
6      −0.074     0.001      0.086     0.119
7      −0.416    −0.561     −0.086     0.431
8       0.000     0.010      0.086     0.370
9      −0.416    −0.787     −0.086     0.090
10      0.000    −0.516     −0.416     0.004

Average genetic gain over all bull-sires: for c = 0.05, ½(0.074 + 4 × 0.416 + 0.410 + 3 × 0.086 − 2 × 0.086 + 0.330)/10 = 0.128; for c = 0.30, ½(4 × 0.416 + 0.410 + 0.330 − 0.086 + 0.086)/10 = 0.120.
Costs (total no. of bull progeny to get 500 carrying favourable alleles): for c = 0.05, three sires with one heterozygous locus and seven sires with both loci heterozygous required a total of 1333 sons to be generated; for c = 0.30, two sires with 0 heterozygous loci, six sires with one heterozygous locus and two sires with both loci heterozygous required 931 sons to be generated.

7.2. Stochastic simulations

Twenty replicate populations of 500 animals each were generated under the genetic model of 10 loci, each with four alleles, contributing to the genetic variation for the trait. Allelic effects for each locus were sampled from the Laplace distribution with h² = 0.3 (λ = 9) and assigned population frequencies at random as described previously. The loci were ranked on the amount of variation they contributed in order to define which loci would be targeted for selection using markers.

For the bottom-up scheme, 10 bull-sires were chosen at random and 'mated' to dams from the same population to produce either 50, 100 or 200 daughters each. These daughters had values equal to the sum of the values of the sire and dam trait locus alleles, each inherited at random with probability of 0.5 each, and a random environmental deviation which was simulated from a normal distribution with a mean of zero and a variance of (1 − h²)σ_P² = 0.7σ_P². It was assumed that markers segregated with QTL alleles without recombination and that the marker locus was polymorphic enough that paternal and maternal alleles could be distinguished in all cases. The marker contrasts in the daughters were measured taking either M = 1, 2 or 5 markers in their order of decreasing variation. For a range of c from 0.05 to 0.65 in increments of 0.05, for each bull-sire it was decided for which m of the M loci he was heterozygous. Then 50 bull-sons per bull-sire were generated using the same sampling procedure described above, and were kept only if they carried the plus marker allele for all m loci for which the sire was heterozygous.

For the top-down scheme, 50 grandsires were generated as for the 10 bull-sires described above.
Fifty sons from each grandsire were generated by choosing for each locus one of the grandsire's alleles and an allele randomly drawn from the general population, and then adding an 'environmental' deviation from a normal distribution with a variance of (0.75h² + (1 − h²))σ_P²/number of daughters. Mean marker contrasts were then measured in these sons for each grandsire and a decision made as to whether the grandsire was heterozygous or not. Ten of these grandsires were then randomly chosen to become bull-sires. Similarly, 500 bull-dams were created by assigning as a sire one of the 50 grandsires, then for each locus taking one of his alleles and drawing the other allele from the general population. Then each of the 10 bull-sires was mated to 50 different bull-dams to produce 50 sons. These sons were screened for whether they carried the favourable alleles from their paternal and maternal grandsires. They were retained if they carried the maximum possible number of favourable alleles, and were replaced with new progeny if they did not, until 50 young bulls per sire were obtained. The population mean breeding value of all 500 of these preselected sons was then calculated and deviated from the mean when no preselection was practised to give the gain when 100% of the population were considered, G_100%. The sons were then ranked on their actual breeding value, as would occur after progeny testing, and the mean of the best 50 (i.e., the top 10%) was calculated and deviated from the equivalent mean with no preselection to give the gain in the population of bulls which would eventually be selected, denoted G_10%. The gain in mean breeding value due to preselection was then plotted as a function of c and found to be approximately linear. A linear regression model was fitted to the means of G_10% and G_100% over the 20 replicates with fixed effect terms for an intercept, the method of preselection (bottom-up vs. top-down), the number of daughters, number of markers and interaction between them, and regression terms for c and interactions between c and the fixed effects. The total numbers of progeny generated to meet the quota were also recorded.

Fig. 3. Predicted (no symbols) and realized (lines with symbols) genetic gains (in σ_P units) as a function of c when 1 (solid line) or 5 (- - -) markers and 100 daughters are used in the bottom-up (upper) and top-down (lower) schemes. Circles indicate results for an average locus; squares indicate results for the best ranking loci.
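The bottom-up decision step of Section 7.2 can be sketched for a single sire and a single marked locus as below. This is a simplified illustration and not the authors' simulation program: the dam allele, the other loci and the environment are lumped into one normal residual with a standard deviation of 1σ_P (an assumption), and the function names and outcome labels are hypothetical. The error labels follow the Type 1, 2 and 3 definitions of Section 6.

```python
import random


def estimate_contrast(sire_alleles, n_daughters, rng, residual_sd=1.0):
    """Estimate the marker contrast of one sire at one marked locus.

    Each daughter inherits one of the two paternal alleles with probability
    0.5. Assumption: all other sources of variation are lumped into a single
    normal residual with standard deviation residual_sd (sigma_P units).
    """
    sums = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(n_daughters):
        k = rng.randrange(2)  # index of the paternal allele transmitted
        value = sire_alleles[k] + rng.gauss(0.0, residual_sd)
        sums[k] += value
        counts[k] += 1
    if 0 in counts:  # contrast cannot be estimated from one group only
        return 0.0
    return sums[0] / counts[0] - sums[1] / counts[1]


def classify(delta, delta_hat, c):
    """Label the preselection decision given the true and estimated contrast."""
    selected = abs(delta_hat) > c
    if not selected:
        return "type 2 error" if abs(delta) > c else "no preselection"
    if delta == 0.0:
        return "type 1 error"  # homozygous sire treated as heterozygous
    if delta * delta_hat < 0.0:
        return "type 3 error"  # selection practised for the wrong allele
    return "preselection on favourable allele"


if __name__ == "__main__":
    rng = random.Random(42)
    sire = (0.283, -0.133)  # two allelic effects taken from Table 3, locus 1
    delta = sire[0] - sire[1]
    delta_hat = estimate_contrast(sire, 100, rng)
    print(round(delta, 3), round(delta_hat, 3), classify(delta, delta_hat, 0.30))
```

With 100 daughters the estimated contrast has a standard error of roughly 0.2σ_P under these assumptions, which is why misclassifications of the kind listed in Table 3 are common at low values of c.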
Based on an economic value of US$122 per phenotypic standard deviation of milk (1100 kg), which is an estimate of current milk returns net of production costs (Groen, A., personal communication) and the same as that which Brascamp et al. (1993) assumed, and a cost of $_M = US$2 per genotype and $_P = US$500 per bull progeny produced, shared over a population of N_C = 1.0 million cows, the profits per cow were calculated. For the profit to the AI company, the discount rate was r = 5% over n = 20 years, with a value of $_S = US$12 per insemination producing a live calf applied to P_AI = 75% of the cow population, and the firm previously holding P_M = 25% of the market share (Brascamp et al., 1993). Theoretical predictions of genetic gains, costs, error rates and profits for a population with λ = 9 (h² = 0.3) were also computed as a function of c for comparison with the realized values from the stochastic simulations.

Table 4
Regression estimatesᵃ of genetic gains in the whole bull progeny population (G_100%) and in the top 10% (G_10%) on the cut-off criterion, c, when preselection is on 1, 2 or 5 markers and when there are 50 or 100 daughters per bull-sire, averaged over 20 replicates of stochastic simulations

Effect                      G_100%                                   G_10%
                            Intercept           Slope                Intercept           Slope
Overall mean                0.318 (0.000)***    −0.493 (0.018)***    0.220 (0.013)***    −0.264 (0.032)***
Method
  Bottom-up                 0.072               −0.068               0.100               −0.197
  Top-down                  0.000 (0.009)***    0.000 (0.021)*       0.000 (0.016)***    0.000 (0.041)**
No. of markers
  1                         −0.220              0.355                −0.132              0.153
  2                         −0.123              0.204                −0.046              0.031
  5                         0.000 (0.009)***    0.000 (0.021)***     0.000 (0.016)***    0.000 (0.041)***
No. of daughters
  50                        −0.016              0.039                −0.015              0.038
  100                       0.000 (0.004)***    0.000 (0.010)***     0.000 (0.016)       0.000 (0.022)
Method × no. of markers
  Bottom-up   1             0.120               −0.163               0.091               −0.133
              2             0.204               −0.273               0.162               −0.220
              5             0.390               −0.561               0.319               −0.461
  Top-down    1             0.098               −0.138               0.088               −0.111
              2             0.195               −0.289               0.174               −0.233
              5             0.318 (0.011)***    −0.493 (0.027)**     0.220 (0.022)***    −0.264 (0.056)**

ᵃ Main effects are expressed as deviations from the mean for five markers and 100 daughters (overall mean) except for the interaction terms, where the means have been added back in. Significance levels are indicated beside the standard errors in parentheses.
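To show how the regression estimates in Table 4 translate into gains at a given criterion, the sketch below evaluates intercept + slope × c for G_100% using the 'Method × no. of markers' rows, which already include the overall mean and therefore refer to 100 daughters. The dictionary layout and function name are illustrative only.

```python
# (intercept, slope) for G_100%, copied from the Method x no. of markers
# rows of Table 4 (sigma_P units, 100 daughters).
G100_REGRESSION = {
    ("bottom-up", 1): (0.120, -0.163),
    ("bottom-up", 2): (0.204, -0.273),
    ("bottom-up", 5): (0.390, -0.561),
    ("top-down", 1): (0.098, -0.138),
    ("top-down", 2): (0.195, -0.289),
    ("top-down", 5): (0.318, -0.493),
}


def predicted_gain(method, n_markers, c):
    """Fitted G_100% at cut-off c from the linear regression of Table 4."""
    intercept, slope = G100_REGRESSION[(method, n_markers)]
    return intercept + slope * c


if __name__ == "__main__":
    for method in ("bottom-up", "top-down"):
        for n_markers in (1, 2, 5):
            gain = predicted_gain(method, n_markers, 0.30)
            print(method, n_markers, round(gain, 3))
```

At c = 0.30σ_P the bottom-up values come out close to 0.07σ_P, 0.12σ_P and 0.22σ_P for 1, 2 and 5 markers.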
8. Numerical results

Fig. 3 shows the predicted and simulated genetic gains as functions of c when one or five markers are used in a bottom-up and top-down scheme using 100 daughters per sire. Gains decreased approximately linearly with c, indicating that c should be minimized to the point where the costs and effort of using the scheme become feasible. Using five markers (dotted lines) produced about three times the genetic gain of using one marker (solid lines) when selection was based on selected loci, compared with the expected 5-fold increase when using random loci. The theoretical predictions (lines with no symbols) matched the simulated results well if the marked loci were randomly chosen (lines with circles), but when the loci were those which explained the most variance, genetic gains were almost doubled (lines with square symbols). This is because the average values of Δ for the best 1, 2 and 5 loci were 1.9, 1.7 and 1.4 times higher than the average value for all 10 loci, since greater contrasts produce more genetic variance and hence higher ranking for use in preselection. Thus the predicted genetic gain on the basis of an average allelic contrast over all loci is undoubtedly conservative.

Table 4 shows the linear regression estimates for G_100% and G_10% from the simulations in which the marked loci were the ones explaining the highest amounts of variance. Maximum gains (when c = 0) were significantly increased by using the bottom-up scheme instead of the top-down scheme, and by increasing the number of markers and number of daughters (or sons) per bull-sire, but the rate of decline with increasing c also increased with these factors. The fact that G_10% was almost as high as G_100% indicates that preselection did not trade off against the subsequent gains obtained by truncation selection of the top 10% after preselection. Thus preselection supplements conventional genetic gains.

Fig. 4 shows the results from simulations at a fixed value of c = 0.30σ_P and illustrates that most realistically achieved genetic gains will be between 0.05 and 0.15σ_P when preselection is based on loci which rank highest for variance explained by markers.

Fig. 4. Genetic gains when c = 0.3σ_P in the full population of candidate bulls (G_100%) and top 10% of them based on genetic merit when either 50 or 100 daughters and the best ranking 1, 2 or 5 markers were used, as predicted from stochastic simulations.

Fig. 5 shows the predicted (lines without symbols) and realized (lines with symbols) total costs when it is assumed that $_P = US$500 and $_G = US$2 when there were 100 daughters per sire. Predicted costs were generally lower than estimated from simulations, which is in accord with the higher realized than predicted genetic gains. Using very low values of c vastly increased the costs, as did the use of five markers (dotted lines) instead of one marker (solid lines). This was mainly due to the cost of raising the extra bull progeny required when preselection is practised widely, i.e., in most sires and on several loci.
(In some cases the extra number of bull progeny exceeded 5000, which was considered to be infeasible, and so the data were excluded from further analysis.) The majority of costs (> 95%) were due to phenotyping, not genotyping. When one marker was used, the factors by which the number of progeny-tested bulls has to be increased were 1.7 and 1.4 for c = 0.05σ_P and c = 0.30σ_P, respectively, in the bottom-up scheme, with corresponding values of 3.0 and 1.8 in the top-down scheme. When five markers were used at a criterion of c = 0.30σ_P, these expanded to 2.6 and 5.8 in the bottom-up and top-down schemes, respectively, and over 100-fold when c = 0.05σ_P. The top-down scheme incurred much higher costs than the bottom-up scheme at low levels of c because twice the number of progeny are required to be generated when tracking alleles from grandparents rather than parents as in the bottom-up scheme, and tracking two grandparental alleles requires a 16-fold increase in the number of progeny to be generated in order to get both favourable QTL alleles. However, when c is moderate, preselection is practised only in cases where the marker contrast is high, which are quite rare, and so the costs of top-down and bottom-up become low and roughly equivalent.

Fig. 5. Predicted (no symbols) and realized (lines with symbols) costs as a function of c when one (solid line) or five (- - -) markers and 100 daughters are used in the bottom-up (upper) and top-down (lower) schemes. Circles indicate results for an average locus; squares indicate results for the best ranking loci.
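As a rough check on the scale of these costs, the sketch below combines the unit costs given in Section 7.2 (US$500 per bull progeny, US$2 per genotype) with the one-marker bottom-up factor of 1.4 at c = 0.30σ_P. The cost formula itself is an assumption made for illustration, in particular the choice of which animals are genotyped (100 daughters for each of 10 bull-sires plus every bull progeny produced); the cost equation actually used in the study is not reproduced in this section.

```python
def variable_cost(increase_factor, n_markers, n_bulls=500, n_sires=10,
                  n_daughters=100, cost_progeny=500.0, cost_genotype=2.0):
    """Illustrative variable cost of preselection (assumed formula).

    Returns (cost of extra bull progeny, cost of genotyping) in US$.
    """
    extra_progeny = (increase_factor - 1.0) * n_bulls
    progeny_cost = extra_progeny * cost_progeny
    # Assumption: every daughter of every bull-sire and every bull progeny
    # produced is genotyped once per marker.
    n_genotypes = n_markers * (n_sires * n_daughters + increase_factor * n_bulls)
    genotype_cost = n_genotypes * cost_genotype
    return progeny_cost, genotype_cost


if __name__ == "__main__":
    progeny_cost, genotype_cost = variable_cost(1.4, 1)
    total = progeny_cost + genotype_cost
    print("progeny:", round(progeny_cost), "genotyping:", round(genotype_cost))
    print("share due to extra progeny:", round(progeny_cost / total, 3))
```

Under these assumptions the extra progeny account for almost all of the total, which is of the same order as the figure of around US$100,000 per marker reported in the text.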
Overall, if a moderate value of c is used, the costs are expected to be around US$100,000 per marker, depending heavily on the cost of generating replacement progeny.

Fig. 6 shows the profit per cow if the costs were defrayed over 1 million cows. These ranged from US$5 to US$25 per cow depending on how many markers were used and the value of c. At low values of c and using five markers, the top-down scheme was much less profitable than the bottom-up scheme because costs in the top-down scheme were enormous. However, when c > 0.3σ_P, the profitability of top-down was greater than 80% of that for bottom-up. Fig. 7 shows the profitability to the AI companies, which is expected to be between 5 and 15 million dollars at moderate levels of c. These resulted from an increase from a 25% market share to between 30 and 40%. The pattern was similar to that for profit per cow.

In general, the bottom-up scheme is highly profitable, especially when used at the high-risk area of low preselection criterion, c. In contrast, the top-down scheme is more costly, and generally less profitable than bottom-up, and unsuitable for use at low values of c. However, at high values of c the top-down scheme can also contribute to significant economic gains.

Fig. 6. Predicted (no symbols) and realized (lines with symbols) profits per cow as a function of c when one (solid line) or five (- - -) markers and 100 daughters are used in the bottom-up (upper) and top-down (lower) schemes. Circles indicate results for an average locus; squares indicate results for the best ranking loci.

Fig. 7. Predicted (no symbols) and realized (lines with symbols) profits for the participating AI company as a function of c when one (solid line) or five (- - -) markers and 100 daughters are used in the bottom-up (upper) and top-down (lower) schemes. Circles indicate results for an average locus; squares indicate results for the best ranking loci.
9. Discussion

This study has shown that exploiting marker information through preselection can produce considerable increases in genetic gains which are additional to those obtained using conventional progeny testing. The expected increases among young candidate bulls if a moderate criterion of c = 0.3σ_P is used are 0.07σ_P, 0.12σ_P and 0.20σ_P when selection is based on 1, 2 or 5 markers. Compared with genetic gains when the top 10% of bulls are selected on their progeny test, which are expected to equal approximately 0.90 (selection accuracy) × 1.755σ_G (genetic selection differential) × √0.3 (conversion from σ_G to σ_P), this corresponds to an increase in rate of genetic gain among the bull population of 8%, 14% and 23%. These predicted gains are based on a model in which the allelic contrast of the 'best' locus is 0.32σ_P and explains 10% of the total phenotypic variance. These gains are comparable to the gains predicted using MAS in a nucleus breeding scheme (Meuwissen and Van Arendonk, 1992; Meuwissen and Goddard, 1996). This concordance is not surprising because markers used in nucleus schemes also exploit the Mendelian sampling variance within families. Our predictions are considerably lower than those of Kashi et al. (1990) because some of their gains would have been obtained through progeny testing, whereas we predict the additional gains over progeny testing. Nevertheless, our re-evaluation of the preselection approach first proposed by Kashi et al. (1990) clearly shows that it can produce significant genetic gains, and that these gains are achieved at realistic cost and with considerable profit. The profits predicted here are very similar to those predicted by Brascamp et al. (1993) given the genetic gains. The key requirement for achieving these gains is the ability to produce enough full or half-sib bull progeny to replace those precluded from progeny testing due to carrying unfavourable QTL alleles.
Without the use of reproductive technologies such as embryo harvesting, the size of the bull-dam population would have to be increased concomitantly, with a consequent reduction in genetic superiority on the dam side. Using a slightly modified version of Eq. (8) (accounting for the fact that half the progeny will be females), one can determine that on average 1170, 2008, 3420 and 16096 matings will have to be performed in order to have a 95% chance of obtaining 500 bulls when selecting on 0, 1, 2 and 5 QTL, respectively. Note that these figures assume that Δ is known with certainty (Eq. (8)), and that selection is performed for any value of Δ ≠ 0. In reality, Δ is estimated and selection is performed when Δ̂ ≥ c. The numbers given are therefore expected to represent worst-case scenarios. Assuming a total population of one million cows and an average reliability of 0.65 for the estimates of the bull-dams' breeding values (BV_bull-dams = selection intensity (i) × accuracy (√h²) × √0.3), this will lead to a reduction in average bull-dam breeding value of 0.070σ_P, 0.126σ_P and 0.308σ_P when selecting on 1, 2 and 5 QTL, respectively. Half of this genetic loss will be transmitted to the sons, causing a reduction of the genetic gain by 0.035σ_P, 0.063σ_P and 0.154σ_P, to be subtracted from the gains of 0.07σ_P, 0.12σ_P and 0.20σ_P due to marker-assisted selection on 1, 2 and 5 QTL, respectively. In addition, there could be a considerable time-lag in generating the replacement progeny. Therefore it is emphasized that the optimal use of the bottom-up or top-down schemes requires the parallel implementation of reproductive technologies such as MOET or, preferably, gamete harvesting.

Comparison of the bottom-up approach with the top-down approach revealed that while the bottom-up approach is always more profitable than the top-down approach, the benefits of the two approaches become quite similar at a high preselection criterion. As the bottom-up approach in its present version exploits Mendelian variance due to the segregation of the bull-sire's QTL alleles only, while the top-down approach captures information from the bull-dam's side as well, there is good reason to combine the two approaches for optimal benefit.
Such a combined approach would exploit QTL which were segregating in both the current and previous generation of elite sires and could be considered as a continuous process over multiple generations. It should be noted, however, that in the top-down design as described by Kashi et al. (1990) and applied in this study, the heterozygous 'Qq' QTL genotype of the grandsires is inferred from the phenotypic segregation observed in their respective sons. This approach is obviously only feasible as long as these sons have not yet been preselected themselves on the basis of their grandpaternal QTL alleles, which would be the case after a few successive generations of applied MAS. Inference of a bull's QTL genotype would therefore have to be made in unselected daughters, reducing the top-down approach to bottom-up.

Perhaps the most attractive feature of the bottom-up approach is that it suits the current situation regarding mapped QTL. Systematic genome scans for QTL controlling milk production have only identified regions of the genome in which major genes are located (e.g., Georges et al., 1995), but the precise location, the number of QTL in these regions, the number of alleles of each QTL and the magnitude of their effects are not yet known. This presents a problem in using QTL information in genetic evaluation systems (e.g., marker-based BLUP), which require that such parameters are accurately known. Using the bottom-up scheme, preselection is based on rather inaccurate measurements of the allelic contrasts in the parents, but this is not an impediment to genetic gains. Moreover, it targets QTL which are segregating in the current breeding generation but have not yet been explored, and so opens up possibilities for finding new alleles at these loci. The bottom-up scheme is also designed to fit easily into and complement the existing breeding structure of the industry. Indeed, as presented in this work, breeding value estimation and MAS are kept functionally distinct, which obviates the need to alter the genetic evaluation system in place. By performing preselection amongst candidate sires prior to progeny testing, the objective pursued when applying the bottom-up scheme is to take advantage of Mendelian sampling differentiating full-sibs, at a point in time when conventional progeny testing is forced to ignore it.
As the progeny-testing scheme remains unaltered except for this preselection, the genetic gain obtained results neither from an increase in accuracy nor from a reduction of generation interval, but from an increase in selection differential, as one preselects amongst a larger pool of candidates than the number that is conventionally progeny-tested. To maximize genetic gain, the increase in selection differential should be limited only by the available QTL data and not by the number of candidates to select from, as done by Kashi et al. (1990).

The genetic model used here for evaluating gains assumed that milk production was controlled by 10 loci, each with four alleles with a range of effects which increased as their frequency decreased. It could be considered that the assumption of four alleles per locus biased the predictions of genetic gains upwards because it resulted in more heterozygous sires and therefore greater opportunity for preselection. However, simulations showed that reducing the number of alleles to two per locus did not greatly alter the variance attributable to the locus, because simultaneously the average allelic contrast was increased. Thus the assumed number of loci and alleles did not greatly influence either the ability to detect the QTL or the value of exploiting the Mendelian sampling variance. The simulations also showed that the expected variance explained by individual QTL under this model was considerably less than that assumed by most other evaluators of MAS (Meuwissen and Van Arendonk, 1992; Meuwissen and Goddard, 1996; Ruane and Colleau, 1995; Spelman and Garrick, 1996).

The reductions in gains due to the effects of recombination and incomplete marker informativeness were not considered because it is envisaged that in the implementation multi-point marker haplotypes will be used to track QTL alleles. Candidate offspring that show a recombination within the segment to which the QTL has been mapped, and therefore have an ambiguous status with regard to the identity of their QTL allele, would be selected against. This would result in a supplementary increase of the factor B_M (Eq. (8)) that would depend on the size of the chromosome segment to which the QTL has been mapped. Preliminary results (Coppieters, W., personal communication) show that the application of identity-by-descent strategies allows mapping of QTL to chromosome segments that are smaller than 10 centimorgans.

The estimated variable cost of implementing the bottom-up scheme which tests 500 young bulls per year from 10 bull-sires was less than US$500,000. This cost is negligible compared with the cost of the progeny-testing scheme.
Moreover, the extra gains which the bottom-up scheme will yield represent an increase in return on money already invested. On a per cow basis, the profits are of the order of US$10 and for an AI company they are of the order of US$10 million. Even if these returns are grossly overestimated, the cost effectiveness of the scheme is still very good.

Of particular concern among dairy breeders in recent years is the trade-off between short-term and long-term genetic response. If intense selection is practised then genetic gains will be maximised in the short term, but loss of variation through inbreeding will result and so the long-term response may be reduced (Robertson, 1960; Woolliams and Thompson, 1994). MAS may also compromise the improvement in polygenic value and therefore ultimately result in little, or even negative, genetic gain (Gibson, 1994; Ruane and Colleau, 1995; Spelman and Garrick, 1996). The effects on long-term response of practising preselection by exploiting Mendelian sampling variation at segregating QTL require further investigation. The consequences of long-term selection for the shape of the distribution of allelic effects also deserve more study.
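The percentage increases quoted at the start of the Discussion, and the effect of the bull-dam losses described there, can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the 'net gain' (marker gain minus half the bull-dam loss) is derived here for illustration and is not a figure reported by the authors.

```python
import math

# Conventional gain from selecting the top 10% of bulls on progeny test,
# expressed in sigma_P units: accuracy x selection differential x sqrt(h2).
accuracy = 0.90
selection_differential = 1.755  # in sigma_G units
h2 = 0.3
conventional_gain = accuracy * selection_differential * math.sqrt(h2)

# Values quoted in the Discussion, keyed by the number of QTL selected on.
marker_gain = {1: 0.07, 2: 0.12, 5: 0.20}
bull_dam_loss = {1: 0.070, 2: 0.126, 5: 0.308}

percent_increase = {}
net_gain = {}
for n_qtl, gain in marker_gain.items():
    percent_increase[n_qtl] = 100.0 * gain / conventional_gain
    # Half of the bull-dam loss is transmitted to the sons.
    net_gain[n_qtl] = gain - 0.5 * bull_dam_loss[n_qtl]

if __name__ == "__main__":
    print("conventional gain:", round(conventional_gain, 3))
    for n_qtl in marker_gain:
        print(n_qtl, round(percent_increase[n_qtl], 1), round(net_gain[n_qtl], 3))
```

The derived net gains make clear why the text stresses reproductive technologies: without them, much of the gain from selecting on five QTL would be offset by the loss on the dam side.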
Acknowledgements
This work was supported by Holland Genetics, Livestock Improvement and the Ministère de l'Agriculture et des Classes Moyennes (Belgium). The financial support of Holland Genetics and Livestock Improvement Corporation to M.J.M. for part of this work is gratefully acknowledged. Discussions with numerous colleagues, in particular J. van Arendonk, P. Visscher, W. Hill and H. Simianer, have greatly benefited this work. P. Amer, A. Groen and J. Koopman are thanked for advice on the economic analysis.

Appendix A

A.1. Expected values of D without error
When allelic effects are sampled from the double exponential (Laplace) distribution, the alleles which are drawn, $d_+$ and $d_-$ ($d_+ > d_-$), are either both positive, both negative, or one of each. They will be of the same sign with probability 1/2, and of different signs with probability 1/2. If they are of the same sign, since $D$ is defined as the absolute difference between the alleles, we can treat $D$ as an exponential variable and obtain its probabilities and expected values for $D > c$ as follows.

For $d_+ > 0$, $d_- > 0$ or $d_+ < 0$, $d_- < 0$:
$$P(D > c) = \int_c^{\infty} \lambda e^{-\lambda D}\, dD = e^{-c\lambda}$$
$$E(D \mid D > c) = \frac{\int_c^{\infty} D\, \lambda e^{-\lambda D}\, dD}{P(D > c)} = c + \frac{1}{\lambda}$$

If $d_+$ and $d_-$ are of opposite signs, the density function $f(D)$ must be treated in two separate parts: one where $d_+ > c$ (in which case $D > c$ for any $d_- < 0$), and the other where $0 < d_+ < c$, in which case $D > c$ requires $d_- < d_+ - c$. The corresponding probabilities and expected values of these parts are the following.

(1) For $d_+ > c$, $d_- < 0$:
$$P(D > c) = \int_c^{\infty} \lambda e^{-\lambda d_+}\, dd_+ = e^{-c\lambda}$$
$$E(D \mid D > c) = \frac{\int_c^{\infty} \lambda e^{-\lambda d_+}\left(d_+ + \frac{1}{\lambda}\right) dd_+}{\int_c^{\infty} \lambda e^{-\lambda d_+}\, dd_+} = c + \frac{2}{\lambda}$$

(2) For $0 < d_+ < c$, $d_- < d_+ - c$:
$$P(D > c) = \int_0^c \lambda e^{-\lambda d_+}\, e^{-\lambda (c - d_+)}\, dd_+ = \int_0^c \lambda e^{-\lambda c}\, dd_+ = c\lambda e^{-c\lambda}$$
$$E(D \mid D > c) = \frac{\int_0^c \lambda e^{-\lambda c}\left(d_+ + c - d_+ + \frac{1}{\lambda}\right) dd_+}{\int_0^c \lambda e^{-\lambda c}\, dd_+} = c + \frac{1}{\lambda}$$
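The expected value in case (2) rests on an inner integral over the negative allele that is easy to lose track of. The worked step below is a sketch, assuming that the magnitude $y = -d_-$ of the negative allele is exponential with rate $\lambda$ (which follows from the Laplace assumption); the symbol $y$ is introduced here only for illustration.

$$\int_{c - d_+}^{\infty} \lambda e^{-\lambda y}\,(d_+ + y)\, dy = e^{-\lambda (c - d_+)}\left(d_+ + (c - d_+) + \frac{1}{\lambda}\right) = e^{-\lambda (c - d_+)}\left(c + \frac{1}{\lambda}\right)$$

Multiplying by the density $\lambda e^{-\lambda d_+}$ of the positive allele gives $\lambda e^{-\lambda c}\left(c + 1/\lambda\right)$, which no longer depends on $d_+$. Integrating over $0 < d_+ < c$ and dividing by $P(D > c) = c\lambda e^{-c\lambda}$ therefore leaves $c + 1/\lambda$.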
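When these are combined, weighting the same-sign and opposite-sign cases by 1/2 each, they give the overall values of:
$$P(D > c) = e^{-\lambda c}\left(1 + \frac{c\lambda}{2}\right)$$
$$E(D \mid D > c) = \frac{\frac{1}{2} e^{-\lambda c}\left(c + \frac{1}{\lambda}\right) + \frac{1}{2} e^{-\lambda c}\left[\left(c + \frac{2}{\lambda}\right) + c\lambda\left(c + \frac{1}{\lambda}\right)\right]}{e^{-\lambda c}\left(1 + \frac{c\lambda}{2}\right)} = \frac{3 + 3c\lambda + c^2\lambda^2}{2\lambda + c\lambda^2}$$

Corresponding values for when $D < c$ are worked out to be:
$$P(D < c) = 1 - e^{-\lambda c}\left(1 + \frac{c\lambda}{2}\right)$$
$$E(D \mid D < c) = \frac{3 + 3c\lambda + c^2\lambda^2 - 3e^{\lambda c}}{2\lambda + c\lambda^2 - 2\lambda e^{\lambda c}}$$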
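The closed forms for $D > c$ can be checked by simulation. The sketch below is not part of the original analysis: it assumes allelic effects are independent Laplace draws with rate `lam`, and the function names (`closed_form`, `laplace`, `simulate`) and the parameter values are hypothetical choices made for illustration.

```python
import math
import random


def closed_form(lam, c):
    """Closed-form P(D > c) and E(D | D > c) for Laplace allelic effects."""
    p = math.exp(-lam * c) * (1 + c * lam / 2)
    e = (3 + 3 * c * lam + (c * lam) ** 2) / (2 * lam + c * lam ** 2)
    return p, e


def laplace(rng, lam):
    """Draw one allelic effect: exponential magnitude with a random sign."""
    magnitude = rng.expovariate(lam)
    return magnitude if rng.random() < 0.5 else -magnitude


def simulate(lam, c, n, seed=1):
    """Monte Carlo estimates of P(D > c) and E(D | D > c)."""
    rng = random.Random(seed)
    hits = 0
    total = 0.0
    for _ in range(n):
        # D is the absolute difference between the two sampled alleles.
        d = abs(laplace(rng, lam) - laplace(rng, lam))
        if d > c:
            hits += 1
            total += d
    return hits / n, total / hits


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print("closed form:", closed_form(lam=2.0, c=0.5))
    print("simulated:  ", simulate(lam=2.0, c=0.5, n=200_000))
```

With the illustrative values $\lambda = 2$ and $c = 0.5$, the closed forms evaluate to $P(D > c) = 1.5e^{-1} \approx 0.552$ and $E(D \mid D > c) = 7/6 \approx 1.167$; the simulated estimates should approach these as the number of draws grows.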
A.2. Expected values of $P(D > c, \hat{D} > c)$ etc. when D is measured with error
The probabilities of $\hat{D}$ being less than or greater than $c$ are obtained by integrating over the joint distribution of $D$ and $\hat{D}$. It is assumed that $\hat{D}$ is normally distributed with mean $D$ and variance equal to the variance of the marker contrast, $\sigma_w^2$. The expressions for these integrals are given below. Fig. 8 shows the numerical values of these for $D = 0$, $D > c$ and $D < c$ separately as functions of $c$ when $N_D = 100$ using the bottom-up scheme. Note that probabilities sum to one across all lines and all graphs because these represent all the possible combinations between true and observed marker contrasts in relation to $c$. The probabilities are calculated as follows:
$$P(D > c, i) = \frac{1}{2} h \int_{-\infty}^{\infty} \int_{-\infty}^{d_+ - c} f(d_+)\, f(d_-)\, g_i \, dd_-\, dd_+$$
$$P(D < c, i) = \frac{1}{2} h \int_{-\infty}^{\infty} \int_{d_+ - c}^{\infty} f(d_+)\, f(d_-)\, g_i \, dd_-\, dd_+$$
$$P(D = 0, \pm\hat{D} > c) = \frac{1}{2} (1 - h)\, \Phi\left(\frac{-c}{\sigma_w}\right)$$
where $g_i$ is a function which represents the four possible classifications of $\hat{D}$ as follows:

$i = 1$, $\hat{D} < c$: $g_1 = \Phi\left(\frac{c - D}{\sigma_w}\right) - \Phi\left(\frac{-D}{\sigma_w}\right)$
$i = 2$, $\hat{D} > c$: $g_2 = 1 - \Phi\left(\frac{c - D}{\sigma_w}\right)$
$i = 3$, $-\hat{D} < c$: $g_3 = \Phi\left(\frac{-D}{\sigma_w}\right) - \Phi\left(\frac{-c - D}{\sigma_w}\right)$
$i = 4$, $-\hat{D} > c$: $g_4 = 1 - \Phi\left(\frac{c + D}{\sigma_w}\right)$

and $\Phi(x)$ denotes the standard normal cumulative distribution function of $x$.

Fig. 8. Probabilities of getting $\hat{D}$ as $< c$, $> c$, $> -c$ or $< -c$ for the three cases of $D > c$, $D < c$ and $D = 0$ when 100 daughters are used in a bottom-up scheme.
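The four classification functions can be evaluated directly for a given true contrast. The sketch below assumes $g_1$ to $g_4$ take the forms $\Phi((c-D)/\sigma_w) - \Phi(-D/\sigma_w)$, $1 - \Phi((c-D)/\sigma_w)$, $\Phi(-D/\sigma_w) - \Phi((-c-D)/\sigma_w)$ and $1 - \Phi((c+D)/\sigma_w)$, which is one reading of the appendix; the function names `phi` and `classification_probs` and the numerical values are hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def classification_probs(d, c, sigma_w):
    """Return (g1, g2, g3, g4) for true contrast d, threshold c, error s.d. sigma_w.

    Assumed reading: the estimated contrast is Normal(d, sigma_w**2) and the
    four classes split the real line at -c, 0 and c.
    """
    g1 = phi((c - d) / sigma_w) - phi(-d / sigma_w)    # estimate between 0 and c
    g2 = 1.0 - phi((c - d) / sigma_w)                   # estimate above c
    g3 = phi(-d / sigma_w) - phi((-c - d) / sigma_w)   # estimate between -c and 0
    g4 = 1.0 - phi((c + d) / sigma_w)                   # estimate below -c
    return g1, g2, g3, g4


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    probs = classification_probs(d=0.3, c=0.5, sigma_w=0.2)
    print(probs, sum(probs))
```

Under this reading the four values sum to one for any $D$, and for $D = 0$ both $g_2$ and $g_4$ reduce to $\Phi(-c/\sigma_w)$, matching the expression given for the $D = 0$ case.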
References
Andersson, L., Haley, C.S., Ellegren, H., Knott, S.A., Johansson, M., Andersson, K., Andersson-Eklund, L., Edfors-Lilja, I., Fredholm, M., Hansson, I., Håkansson, J., Lundström, K., 1994. Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263, 1771–1774.
Bovenhuis, H., Weller, J.I., 1994. Mapping and analysis of dairy cattle QTL by ML methodology using milk protein genes as genetic markers. Genetics 137, 267–280.
Brascamp, E.W., 1973. Model calculations concerning optimization of A.I.-breeding with cattle: I. The economic value of genetic improvement in milk yield. J. Anim. Breed. Genet. 90, 1–15.
Brascamp, E.W., Van Arendonk, J.A.M., Groen, A.F., 1993. Economic appraisal of the utilization of genetic markers in dairy cattle breeding. J. Dairy Sci. 76, 1204–1213.
Cowan, C.M., Dentine, M.R., Ax, R.L., Schuler, L.A., 1990. Structural variation around prolactin gene linked to quantitative traits in an elite Holstein sire family. Theor. Appl. Genet. 79, 577–582.
Geldermann, H.U., Pieper, U., Roth, B., 1985. Effects of marked chromosome sections on milk performance in cattle. Theor. Appl. Genet. 70, 138–146.
Georges, M., Nielsen, D., Mackinnon, M., Mishra, A., Okimoto, R., Pasquino, A., Sargeant, L., Sorensen, A., Steele, M., Zhao, X., Womack, J.E., Hoeschele, I., 1995. Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139, 907–920.
Gibson, J.P., 1994. Short-term gain at the expense of long-term response with selection of identified loci. In: Smith, C., Gavora, J.S., Benkel, B., Chesnais, J., Fairfull, W., Gibson, J.P., Kennedy, B.W., Burnside, E.B. (Eds.), Proceedings of the 5th World Congress of Genetics Applied to Livestock Production, Guelph. Vol. 21, 201–204.
Hilbert, P., Lindpaintner, K., Serikawa, T., Soubrier, F., Cartwright, P., Dubay, C., Julier, C., Takahashi, S., Vincent, M., Beckmann, J., Ganten, D., Georges, M., Lathrop, M., 1991. Chromosomal mapping of two genetic loci associated with hereditary hypertension in the rat. Nature 353, 521–529.
Hoeschele, I., Romano, E.O., 1994. On the use of marker information from granddaughter designs. J. Anim. Breed. Genet. 110, 429–449.
Kashi, Y., Hallerman, E., Soller, M., 1990. Marker-assisted selection of candidate bulls for progeny testing programmes. Anim. Prod. 51, 63–74.
Keightley, P.D., 1994. The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138, 1315–1322.
Kinghorn, B.P., Smith, C., Dekkers, J.C.M., 1991. Potential genetic gains in dairy cattle with gamete harvesting and in vitro fertilization. J. Dairy Sci. 74, 611–622.
Lande, R., Thompson, R., 1990. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124, 743–756.
Meuwissen, T.H.E., Goddard, M.E., 1996. The use of marker haplotypes in animal breeding schemes. Genet. Select. Evol. 28, 161–172.
Meuwissen, T.H.E., Van Arendonk, J.A.M., 1992. Potential improvements in rate of genetic gain from marker-assisted selection in dairy cattle breeding schemes. J. Dairy Sci. 75, 1651–1659.
Paterson, A.H., 1995. Molecular dissection of quantitative traits: progress and prospects. Genome Res. 5, 321–333.
Paterson, A.H., Lander, E.S., Hewitt, J.D., Peterson, S., Lincoln, S.E., Tanksley, S.D., 1989. Resolution of quantitative traits into Mendelian factors by using a complete linkage map of RFLPs. Nature 335, 721–726.
Robertson, A., 1960. A theory of limits in artificial selection. Proc. R. Soc. London, Ser. B 153, 234–249.
Ron, M., Band, M., Yanai, A., Weller, J.I., 1994. Mapping quantitative trait loci with DNA microsatellites in a commercial dairy cattle population. Anim. Genet. 25, 259–264.
Ruane, J., Colleau, J.J., 1995. Marker assisted selection for genetic improvement of animal populations when a single QTL is marked. Genet. Res. 66, 71–83.
Smith, C., Simpson, S.P., 1986. The use of genetic polymorphisms in livestock improvement. J. Anim. Breed. Genet. 103, 205–217.
Soller, M., Beckmann, J.S., 1983. Restriction fragment length polymorphism in genetic improvement. Theor. Appl. Genet. 67, 25–33.
Spelman, R.J., Garrick, D., 1996. Utilisation of marker-assisted selection in a commercial dairy cow population. Livest. Prod. Sci. 47, 139–147.
Spelman, R.J., Coppieters, W., Karim, L., van Arendonk, J.A.M., Bovenhuis, H., 1996. Quantitative trait loci analysis for five milk production traits on chromosome six in the Dutch Holstein–Friesian population. Genetics 144, 1799–1808.
Weller, J.I., Kashi, Y., Soller, M., 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and QTL in dairy cattle. J. Dairy Sci. 73, 2525–2537.
Woolliams, J.A., Thompson, R., 1994. A theory of genetic contributions. Proceedings of the 5th World Congress on Genetic Application on Livestock Production, Vol. 19, pp. 127–234.
Zhang, W., Smith, C., 1993. Simulation of marker-assisted selection utilizing linkage disequilibrium: the effects of several additional factors. Theor. Appl. Genet. 86, 491–496.