A statistical framework for genome-wide scanning and testing of imprinted quantitative trait loci


ARTICLE IN PRESS

Journal of Theoretical Biology 244 (2007) 115–126 www.elsevier.com/locate/yjtbi

A statistical framework for genome-wide scanning and testing of imprinted quantitative trait loci

Yuehua Cui
Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA

Received 3 May 2006; received in revised form 28 June 2006; accepted 11 July 2006; available online 8 August 2006

Abstract

Genomic imprinting refers to the non-equivalent expression of the two alleles at a locus depending on their parent of origin. In this article, a statistical framework for genome-wide scanning and testing of imprinted quantitative trait loci (iQTL) underlying complex traits is developed for experimental crosses of inbred lines in backcross populations. The joint likelihood function is composed of four component likelihood functions, each derived from one of four backcross families. The proposed approach models the genomic imprinting effect as a probability measure, with which one can test the degree of imprinting. Simulation results show that the model is robust for identifying iQTL with various degrees of imprinting, ranging from no imprinting through partial imprinting to complete imprinting. Under various simulation scenarios, the proposed model gives consistent parameter estimates with reasonable precision and high power in testing iQTL. When a QTL shows a Mendelian effect, the proposed model also outperforms the traditional Mendelian model. An extension to incorporate maternal effects is also given. The developed model, built within the maximum likelihood framework and implemented with the EM algorithm, provides a quantitative framework for testing and estimating iQTL involved in the genetic control of complex traits.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: EM algorithm; Genomic imprinting; Maximum likelihood; Quantitative trait loci; Reciprocal backcross

Tel.: +1 517 432 7098; fax: +1 517 432 1405. E-mail address: [email protected].
doi:10.1016/j.jtbi.2006.07.009

1. Introduction

In nature, most functional regions of the genome are expressed equally from both members of a chromosome pair, one inherited from each parent. When this equivalence is broken, the genetic information contributed by the two parents becomes functionally different. This phenomenon is called genetic or genomic imprinting (Pfeifer, 2000). Genomic imprinting has been shown to play a key role in several human diseases, such as Prader–Willi syndrome and Angelman syndrome (Falls et al., 1999). Imprinted genes have also been identified in other organisms, including animals (Jeon et al., 1999; Nezer et al., 1999; Tuiskula-Haavisto et al., 2004) and plants (Alleman and Doctor, 2000; Luo et al., 2000). While a number of statistical models have been applied to assess imprinted genes in humans (Hanson et al., 2001; Haghighi and Hodge, 2002; Knapp and Strauch, 2004; Pratt et al., 2000; Shete and Amos, 2002; Shete et al., 2003; Strauch et al., 2000) and animals (Knott et al., 1998; de Koning et al., 2000, 2002; Cui et al., 2006), statistical methods for incorporating genomic imprinting into the genetic mapping of quantitative trait loci (QTL) have received little attention and have not been well studied (Hanson et al., 2001; de Koning et al., 2002). Like any Mendelian gene, an imprinted gene follows Mendelian segregation. Unlike a Mendelian gene, however, an imprinted gene shows a parent-specific epigenetic modification effect, i.e. a phenotypic resemblance of the offspring to one of the parents. Rather than being strictly mono-allelic, imprinting has been shown in recent studies to display a diploid-type expression pattern and hence can be considered a partial effect (Naumova and Croteau, 2004; Sandovici et al., 2005; Spencer, 2002). This evidence suggests that an imprinted gene can be treated as a quantitative trait locus whose expression products vary on a quantitative scale.


However, before the true imprinting property of a gene is known, we have limited information about whether it displays haploid or diploid expression. This poses statistical challenges for modelling and testing imprinting effects without prior knowledge of the underlying QTL effect. A powerful model should accommodate the various imprinting mechanisms a gene might display, such as no imprinting, complete imprinting or partial imprinting, while maintaining adequate testing power and reasonable parameter estimation. With these concerns in mind, in this article we develop a statistical framework, based on interval mapping theory, for genome-wide scanning of significant imprinted quantitative trait loci (iQTL), and we further dissect their imprinting effects through statistical tests. The developed model, based on experimental crosses of inbred lines, shows high power in testing iQTL under various conditions, such as complete or partial imprinting. Parameters are estimated within the maximum likelihood framework, implemented with the EM algorithm (Dempster et al., 1977). Extensive simulation studies are performed to investigate the properties of the derived model, and its performance is compared with that of the traditional Mendelian model. The testing power and Type I error of the proposed method are also assessed by simulation. The derived model provides biologists with a quantitative, testable framework for designing experiments aimed at hunting for imprinted genes.

2. Genetic model

2.1. Quantitative genetic model

Consider an autosomal locus with two segregating alleles, Q and q. The current model is developed for experimental crosses derived from two inbred parental lines. Let $Q_m$ and $q_m$ denote the offspring alleles inherited from the maternal parent, and $Q_f$ and $q_f$ the offspring alleles inherited from the paternal parent; the subscripts m and f denote mother and father, respectively. Combining these four types of alleles gives four possible genotypes: $Q_mQ_f$, $Q_mq_f$, $q_mQ_f$ and $q_mq_f$. Under traditional Mendelian inheritance, the genetic values of individuals carrying genotype $Q_mq_f$ or $q_mQ_f$ are not distinguishable, because these two reciprocal genotypes appear as a single observable heterozygote. However, genomic imprinting theory suggests that an allele inherited from one parent might be partially or completely silenced. As a result, the genotypic means of the two reciprocal heterozygotes $Q_mq_f$ and $q_mQ_f$ differ, with the size of the difference depending on the nature of the imprinting mechanism. If a gene is completely maternally imprinted, the offspring phenotype is determined entirely by the allele inherited from the father, since allele $Q_m$ or $q_m$ is completely silent. If partial maternal imprinting is displayed, we would expect a phenotypic resemblance between the offspring and its father. The same pattern, with the roles of the parents reversed, is observed for paternal imprinting.

To dissect genomic imprinting, our first step is to design an experiment that introduces genetic variation among the different genotypes and, most importantly, between the two reciprocal heterozygotes $Q_mq_f$ and $q_mQ_f$. Suppose that a gene is completely imprinted. The complete inactivation of one allele will result in uniparental disomy. Before we get to the statistical model, we further assume that the two reciprocal heterozygotes, $Q_mq_f$ and $q_mQ_f$, are distinguishable by tracing back to their parents. If a gene is completely maternally imprinted, the genotypic values of the four genotypes can be expressed as

$$
\begin{cases}
\mu_{p\cdot Q_mQ_f} = \mu + a & \text{for } Q_mQ_f,\\
\mu_{p\cdot Q_mq_f} = \mu - a & \text{for } Q_mq_f,\\
\mu_{p\cdot q_mQ_f} = \mu + a & \text{for } q_mQ_f,\\
\mu_{p\cdot q_mq_f} = \mu - a & \text{for } q_mq_f,
\end{cases}
\tag{1}
$$

where $\mu_{p\cdot}$ refers to the genotypic mean for a paternally expressed gene, $\mu$ is the overall genetic mean and $a$ is the additive genetic effect. Similarly, the genotypic values of the four genotypes under complete paternal imprinting can be expressed as

$$
\begin{cases}
\mu_{m\cdot Q_mQ_f} = \mu + a & \text{for } Q_mQ_f,\\
\mu_{m\cdot Q_mq_f} = \mu + a & \text{for } Q_mq_f,\\
\mu_{m\cdot q_mQ_f} = \mu - a & \text{for } q_mQ_f,\\
\mu_{m\cdot q_mq_f} = \mu - a & \text{for } q_mq_f,
\end{cases}
\tag{2}
$$

where $\mu_{m\cdot}$ refers to the genotypic mean for a maternally expressed gene. These two sets of mean models apply only to complete imprinting. When genes show a partial imprinting effect, the two models cannot be applied directly; the necessary modification is given in Section 3.1.

2.2. Genetic design

A complete dissection of any imprinted gene based on the above model relies on an experimental design that introduces quantitative variation between the two reciprocal heterozygotes $Q_mq_f$ and $q_mQ_f$. For this purpose, we propose to use backcross families, obtained through reciprocal backcrosses, as the testing population. Consider an $F_1$ heterozygote, Qq, derived from two homozygous inbred lines, a high-valued $P_1$ (QQ) and a low-valued $P_2$ (qq), in plants or animals. This $F_1$ is crossed with each of the two parents to generate two backcross populations, denoted $F_1 \times P_1$ and $F_1 \times P_2$. If we incorporate the parent-of-origin notation into the progeny genotype, the $F_1 \times P_1$ family consists of two genotypes, $Q_mQ_f$ and $q_mQ_f$, and the $F_1 \times P_2$ family consists of another two genotypes, $Q_mq_f$ and $q_mq_f$.


Because $Q_mq_f$ and $q_mQ_f$ come from different cross types, they are distinguishable, and the genotypic means of these two reciprocal heterozygotes can be identified. We denote this design as design $B_2$. One can also use the two parents $P_1$ and $P_2$ as the maternal parent in crosses with $F_1$ to obtain the reciprocal backcross populations, denoted $P_1 \times F_1$ and $P_2 \times F_1$. Similarly, these two backcross populations contain four parent-of-origin-specific genotypes, the same as those obtained in design $B_2$. We denote this backcross design as $B_1$. Table 1 lists the four possible backcross populations with the corresponding genotypes and genotypic values under complete maternal and complete paternal imprinting. These backcross families contain the four genotypes of models (1) and (2). More importantly, the two reciprocal heterozygotes, $Q_mq_f$ and $q_mQ_f$, can be identified because they come from different backcross families. Thus, both designs provide a testable population for assessing the imprinting effect of a gene.

In QTL mapping studies, a phenotype can be explained by a linear model containing a genetic mean and a random error (Lynch and Walsh, 1998). Based on the above genetic designs, the relationship between an observation and its expected mean can be described by the simple linear model

$$
y_{i|k} = \sum_{j=1}^{2} x_{ij|k}\,\mu_{j|k} + e_{i|k},
\tag{3}
$$

where k refers to the kth backcross family shown in Table 1; $x_{ij|k}$ is an indicator variable equal to 1 if individual i carries QTL genotype j and 0 otherwise; $\mu_{j|k}$ is the expected (genotypic) mean of QTL genotype j; and $e_{i|k}$ is the residual error, assumed independently and identically distributed (iid) normal with mean zero and variance $\sigma_k^2$.
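The design and model (3) can be illustrated with a short Python sketch. It enumerates the parent-of-origin offspring genotypes of the four backcrosses and draws each phenotype as the genotypic mean of the single genotype the individual carries (so exactly one indicator $x_{ij|k}$ equals 1) plus iid normal error. The names `CROSSES`, `offspring` and `simulate_family`, and the dictionary of genotypic means passed in, are hypothetical choices for illustration rather than part of the original method.

```python
import itertools
import random

# Parental genotypes of the four backcrosses in Table 1, as (mother, father);
# P1 = "QQ", P2 = "qq", F1 = "Qq".
CROSSES = {
    1: ("QQ", "Qq"),  # P1 x F1 (design B1)
    2: ("qq", "Qq"),  # P2 x F1 (design B1)
    3: ("Qq", "QQ"),  # F1 x P1 (design B2)
    4: ("Qq", "qq"),  # F1 x P2 (design B2)
}


def offspring(k):
    """Distinct parent-of-origin offspring genotypes of cross k, e.g. 'Qmqf'."""
    mother, father = CROSSES[k]
    kinds = {m + "m" + f + "f" for m, f in itertools.product(mother, father)}
    return sorted(kinds)


def simulate_family(k, means, sigma, n, rng):
    """Draw n (genotype, phenotype) pairs following model (3).

    Each of the two offspring genotypes of cross k has probability 1/2
    (Mendelian segregation); the phenotype is means[genotype] plus iid
    normal error with standard deviation sigma.
    """
    genotypes = offspring(k)
    data = []
    for _ in range(n):
        g = rng.choice(genotypes)
        data.append((g, means[g] + rng.gauss(0.0, sigma)))
    return data


if __name__ == "__main__":
    rng = random.Random(7)
    mu, a = 5.0, 0.5
    # Complete maternal imprinting, model (1): the value follows the paternal allele.
    cmi = {g: mu + a if g[2] == "Q" else mu - a
           for g in ("QmQf", "Qmqf", "qmQf", "qmqf")}
    for k in CROSSES:
        print(k, offspring(k), simulate_family(k, cmi, 1.0, 3, rng))
```

With these conventions `offspring(1)` returns `['QmQf', 'Qmqf']` and `offspring(3)` returns `['QmQf', 'qmQf']`, so each design places the two reciprocal heterozygotes in different families, which is what makes them distinguishable in Table 1.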


3. Statistical methods

3.1. Statistical parameterization

When a gene displays complete functional inactivation of one allele, it shows uniparental disomy, and models (1) and (2) can be fitted. In nature, however, imprinted genes often show a partial effect and hence display a quantitative character (Spencer, 2002). The models derived above do not fit such partially functioning genes. Moreover, our knowledge about the imprinting mechanism of an unknown gene is limited before we know the gene function. To account for partial imprinting and to increase model flexibility, we propose to model maternal or paternal imprinting as a probability measure. Let p be the probability that a QTL is maternally imprinted; consequently, 1 − p is the probability of paternal imprinting. With this probability measure, the expected genotypic mean in model (3) for one of the four possible genotypes in the kth backcross family shown in Table 1 can be expressed as

$$
\mu_{j|k} = p\,\mu_{pj|k} + (1 - p)\,\mu_{mj|k}, \qquad k = 1, \ldots, 4,\; j = 1, 2.
\tag{4}
$$

For the reciprocal heterozygotes, substituting models (1) and (2) into (4) gives, for example, $p(\mu - a) + (1 - p)(\mu + a)$ for $Q_mq_f$, so that

$$
\begin{cases}
\mu_{j|k} = \mu + (1 - 2p)\,a & \text{for } Q_mq_f,\\
\mu_{j'|k} = \mu + (2p - 1)\,a & \text{for } q_mQ_f,
\end{cases}
\tag{5}
$$

where j (or j′) = 1 or 2, depending on the parent-of-origin QTL genotype. Clearly, p = 1 implies that the QTL is completely maternally imprinted, which leads to model (1). Similarly, p = 0 means zero probability of maternal imprinting, i.e. the QTL is completely paternally imprinted, which leads to model (2). If we let p = 0.5, the expected genotypic values of the four parent-of-origin genotypes under design $B_1$ can be expressed as

$$
\begin{cases}
\mu_{1|1} = \mu + a & \text{for } Q_mQ_f,\\
\mu_{2|1} = \mu & \text{for } Q_mq_f,\\
\mu_{1|2} = \mu & \text{for } q_mQ_f,\\
\mu_{2|2} = \mu - a & \text{for } q_mq_f.
\end{cases}
\tag{6}
$$

Table 1
Backcross type, maternal and paternal genotypes, offspring parent-of-origin genotypes and offspring genotypic values under complete maternal imprinting (CMI) or complete paternal imprinting (CPI), together with the marker and phenotype data structure of the four-way backcross design

Design  Cross type (k)  Mother  Father  Offspring (j)  CMI     CPI     Marker data  Phenotype data
B1      P1 × F1 (1)     QmQm    Qfqf    QmQf (1)       μ + a   μ + a   M1           y1
                                        Qmqf (2)       μ − a   μ + a
        P2 × F1 (2)     qmqm    Qfqf    qmQf (1)       μ + a   μ − a   M2           y2
                                        qmqf (2)       μ − a   μ − a
B2      F1 × P1 (3)     Qmqm    QfQf    QmQf (1)       μ + a   μ + a   M3           y3
                                        qmQf (2)       μ + a   μ − a
        F1 × P2 (4)     Qmqm    qfqf    Qmqf (1)       μ − a   μ + a   M4           y4
                                        qmqf (2)       μ − a   μ − a

CMI refers to complete maternal imprinting and CPI refers to complete paternal imprinting. The CMI column gives $\mu_{pj|k}$ and the CPI column gives $\mu_{mj|k}$ for offspring genotype j in cross type k.
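Models (1), (2) and (4) amount to a simple mixing rule that can be checked numerically. The Python sketch below, with hypothetical function names, computes the genotypic mean of a parent-of-origin genotype for a given imprinting probability p, so that p = 1 and p = 0 reproduce the CMI and CPI columns of Table 1, and p = 0.5 reproduces model (6).

```python
def cmi_mean(genotype, mu, a):
    """Model (1), complete maternal imprinting: only the paternal allele (3rd char) acts."""
    return mu + a if genotype[2] == "Q" else mu - a


def cpi_mean(genotype, mu, a):
    """Model (2), complete paternal imprinting: only the maternal allele (1st char) acts."""
    return mu + a if genotype[0] == "Q" else mu - a


def imprinted_mean(genotype, mu, a, p):
    """Model (4): p * mu_p + (1 - p) * mu_m, with p = probability of maternal imprinting."""
    return p * cmi_mean(genotype, mu, a) + (1 - p) * cpi_mean(genotype, mu, a)


if __name__ == "__main__":
    for p in (1.0, 0.8, 0.5, 0.0):
        row = {g: round(imprinted_mean(g, 5.0, 0.5, p), 3)
               for g in ("QmQf", "Qmqf", "qmQf", "qmqf")}
        print(p, row)
```

The homozygotes keep the means μ + a and μ − a for every p. At p = 0.5 the two reciprocal heterozygotes share the mean μ, as in model (6); at p = 0.8, with μ = 5 and a = 0.5, they take μ + (1 − 2p)a = 4.7 for $Q_mq_f$ and μ + (2p − 1)a = 5.3 for $q_mQ_f$, as in model (5).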

Model (6) reduces to a regular Mendelian model (Lynch and Walsh, 1998), in which the two reciprocal heterozygotes $Q_mq_f$ and $q_mQ_f$ have the same genotypic value. Therefore, p = 0.5 implies no imprinting effect. The same model is obtained for design $B_2$ given p = 0.5. If p takes a value between 0 and 0.5, the expected genotypic value of genotype $Q_mq_f$ will be greater than that of genotype $q_mQ_f$ (for a > 0), because 1 − 2p > 2p − 1 in model (5). This indicates partial paternal imprinting. A similar argument shows partial maternal imprinting if p takes a value between 0.5 and 1. With this statistical parameterization, we can assess imprinting effects of any degree.

Let $M_k$ and $y_k$ (k = 1, …, 4) denote the marker data and phenotype data for the kth backcross population, respectively. With the two genetic designs illustrated in Table 1, the linkage map can be constructed from the four backcross populations by combining designs $B_1$ and $B_2$ using Mapmaker (Lander et al., 1987). In the present study, we assume that all marker information for the four backcross families used to construct the linkage map is known. Missing or partially missing marker information can be inferred conditional on the observed marker information by applying a Markov chain process (Jiang and Zeng, 1997). Table 1 also lists the data structure of each cross type. The complete marker and phenotype data are given by the vectors $M = [M_1, M_2, M_3, M_4]$ and $y = [y_1, y_2, y_3, y_4]$.

3.2. Finite mixture model

The statistical foundation of QTL mapping is the mixture model (Lander and Botstein, 1989), in which each observation is assumed to have arisen from one of a known or unknown number of components, each component being modelled by a density from the parametric family f. The distribution of phenotype y can thus be modelled as a linear combination of density functions specific to the different QTL genotypes j:

$$
y \sim f(y \mid M, \phi, \eta) = \pi_1 f_1(y \mid M, \phi_1, \eta) + \cdots + \pi_J f_J(y \mid M, \phi_J, \eta),
\tag{7}
$$

where M refers to the marker information; $\pi = (\pi_1, \ldots, \pi_J)'$ are the mixture proportions (i.e. QTL genotype frequencies), constrained to be nonnegative with $\sum_{j=1}^{J} \pi_j = 1$; $\phi = (\phi_1, \ldots, \phi_J)'$ are the component-specific (QTL genotype-specific) parameters, with $\phi_j$ specific to component j; and $\eta$ is a parameter common to all components (i.e. the residual variance). Corresponding to our genetic designs, the mixture model for individual i in the kth backcross family can be expressed as

$$
f(y_{i|k} \mid M_{i|k}, \phi, \eta) = \pi_{i1|k}\, f_{1|k}(y_{i|k} \mid M_{i|k}, \mu_{1|k}, \sigma_k^2) + \pi_{i2|k}\, f_{2|k}(y_{i|k} \mid M_{i|k}, \mu_{2|k}, \sigma_k^2).
\tag{8}
$$

Consider two flanking markers, $M_\eta$ and $M_{\eta+1}$, from a backcross population, with recombination fraction r between them. A putative iQTL showing an imprinting effect is located between these two markers, at recombination fraction $r_1$ from $M_\eta$ and $r_2$ from $M_{\eta+1}$. The conditional probabilities of the QTL genotypes given the four possible flanking marker genotypes in a backcross can then be derived. Tables 2 and 3 give these conditional probabilities: Table 2 applies to cross types 1 and 3, and Table 3 to cross types 2 and 4. These conditional probabilities serve as the mixture proportions in model (8).

Table 2
Conditional probabilities of QTL genotypes given flanking marker genotypes in a backcross design, corresponding to cross types 1 and 3

Marker genotype         QQ                           Qq
MηMη, Mη+1Mη+1          (1 − r1)(1 − r2)/(1 − r)     r1r2/(1 − r)
MηMη, Mη+1mη+1          (1 − r1)r2/r                 r1(1 − r2)/r
Mηmη, Mη+1Mη+1          r1(1 − r2)/r                 (1 − r1)r2/r
Mηmη, Mη+1mη+1          r1r2/(1 − r)                 (1 − r1)(1 − r2)/(1 − r)

r1 and r2 are the recombination fractions between the QTL and markers Mη and Mη+1, respectively, and r is the recombination fraction between the two markers.

3.3. Joint log-likelihood function

Assuming independence of the phenotypic observations within each backcross family, we can construct a component likelihood function, defined as the log-likelihood function for each backcross population.

Table 3
Conditional probabilities of QTL genotypes given flanking marker genotypes in a backcross design, corresponding to cross types 2 and 4

Marker genotype         Qq                              qq
M′ηm′η, M′η+1m′η+1      (1 − r′1)(1 − r′2)/(1 − r′)     r′1r′2/(1 − r′)
M′ηm′η, m′η+1m′η+1      (1 − r′1)r′2/r′                 r′1(1 − r′2)/r′
m′ηm′η, M′η+1m′η+1      r′1(1 − r′2)/r′                 (1 − r′1)r′2/r′
m′ηm′η, m′η+1m′η+1      r′1r′2/(1 − r′)                 (1 − r′1)(1 − r′2)/(1 − r′)

r′1 and r′2 are the recombination fractions between the QTL and markers M′η and M′η+1, respectively, and r′ is the recombination fraction between the two markers.
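The entries of Tables 2 and 3 follow from Bayes' rule applied to the single gamete contributed by the F1 parent. Call the QTL genotype in the first column "A" (QQ in Table 2, Qq in Table 3) and the other "B"; each flanking marker shows the same state as the QTL unless a crossover occurred between them. The Python sketch below assumes no crossover interference, so that r = r1 + r2 − 2r1r2, and converts map distances to recombination fractions with Haldane's map function; both are assumptions made for this example, since the text does not state which map function is used. The function names are hypothetical.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's map function; assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def qtl_probs(left_a, right_a, r1, r2):
    """Return (P(QTL = A), P(QTL = B)) given the two flanking marker states.

    left_a / right_a are True when the marker is in the same state as QTL
    genotype A (e.g. MηMη in Table 2, M'ηm'η in Table 3). With no
    interference the normalising constant equals 1 - r or r.
    """
    def agree(flag, rec):
        # Probability of the observed marker state given the QTL state.
        return (1.0 - rec) if flag else rec

    w_a = agree(left_a, r1) * agree(right_a, r2)
    w_b = agree(not left_a, r1) * agree(not right_a, r2)
    total = w_a + w_b
    return w_a / total, w_b / total


if __name__ == "__main__":
    # Hypothetical interval: markers 20 cM apart, QTL 8 cM from the left marker.
    r1, r2 = haldane(8.0), haldane(12.0)
    for left in (True, False):
        for right in (True, False):
            print(left, right, [round(x, 4) for x in qtl_probs(left, right, r1, r2)])
```

The returned pair plays the role of the mixture proportions $\pi_{i1|k}$ and $\pi_{i2|k}$ in model (8), with genotype 1 being QQ (cross types 1 and 3) or Qq (cross types 2 and 4), matching the genotype order of Table 1.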


Suppose there are $n_k$ observations in the kth backcross population whose phenotypic trait ($y_k$) is affected by an iQTL. The component likelihood function can be expressed as

$$
\ell_k(\Omega) = \log L_k(\Omega) = \sum_{i=1}^{n_k} \log\left[\sum_{j=1}^{2} \pi_{ij|k}\, f_{j|k}(y_{i|k}, M_{i|k} \mid \Omega)\right],
\tag{9}
$$

where $\Omega = (\theta, \mu, a, p, \sigma_k^2)$, k = 1, …, 4, contains the unknown QTL position ($\theta$), the QTL effect parameters ($\mu$, $a$), the imprinting parameter (p) and the residual variances ($\sigma_k^2$); $\pi_{ij|k}$ is the conditional probability of the jth QTL genotype given the marker interval for individual i in the kth backcross population; and $f_{j|k}(y_{i|k})$ is the normal density of the jth QTL genotype with mean $\mu_{j|k}$ and variance $\sigma_k^2$ under the kth backcross design. In principle, data from either design $B_1$ or design $B_2$ can be used to map iQTL. However, to make full use of the genetic information from the different backcross populations and to increase mapping power, we propose to integrate the four component likelihood functions built on the four backcrosses into a joint log-likelihood function. Assuming independence of the four backcross populations, the joint log-likelihood function is the sum of the four component log-likelihood functions:

$$
\ell(\Omega) = \sum_{k=1}^{4} \ell_k(\Omega) = \sum_{k=1}^{4} \sum_{i=1}^{n_k} \log\left[\sum_{j=1}^{2} \pi_{ij|k}\, f_{j|k}(y_{i|k}, M_{i|k} \mid \Omega)\right].
\tag{10}
$$

3.4. Computation algorithm

A standard EM algorithm (Dempster et al., 1977) is derived to obtain the maximum likelihood estimates (MLEs) of the parameters in $\Omega$. This involves differentiating the joint log-likelihood function (Eq. (10)) with respect to each unknown parameter, setting the derivatives to zero and solving the resulting likelihood equations. The detailed derivations of the parameter estimates are given in the Appendix. In the current study, we further assume that $\sigma_k^2 = \sigma^2$ for k = 1, …, 4. A grid search is used to estimate the QTL location: a putative QTL is tested at every 1 or 2 cM within each marker interval throughout the entire linkage map. The log-likelihood ratio test statistic at each testing position is plotted to give a log-likelihood ratio (LR) profile plot, and the genomic position corresponding to the peak of the profile is the MLE of the QTL location.

3.5. Hypothesis tests

Once the parameters are estimated, a likelihood ratio (LR) test can be applied to test the existence of a QTL and further identify its imprinting property. The LR test statistic is computed between the full model (there is a QTL) and the reduced model (there is no QTL). These two models correspond to the two hypotheses

$$
H_0: a = 0 \quad \text{vs.} \quad H_1: a \neq 0.
$$

This is the first test, conducted before testing the imprinting effect of any detected QTL, and is therefore also called the primary QTL test. Letting $\tilde{\Omega}$ and $\hat{\Omega}$ be the MLEs of the unknown parameters under $H_0$ and $H_1$, respectively, the log-likelihood ratio is

$$
LR = -2\left[\log L(\tilde{\Omega} \mid M, y) - \log L(\hat{\Omega} \mid M, y)\right].
\tag{11}
$$
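The estimation step of Section 3.4 and the statistic (11) can be sketched in Python/NumPy at a single test position, taking the mixture proportions $\pi_{ij|k}$ from Tables 2 and 3 as given and a common residual variance $\sigma^2$. This is not the Appendix derivation of the paper: to keep the M-step in closed form, the sketch assumes the reparameterization d = (1 − 2p)a, under which the four genotypic means of models (4) and (5), namely μ + a, μ + d, μ − d and μ − a, are linear in (μ, a, d), so each M-step is a weighted least-squares fit; p is then recovered as (1 − d/a)/2 and is not constrained to [0, 1]. The names `fit_iqtl`, `GENOTYPES` and `COEF` are hypothetical.

```python
import numpy as np

# Parent-of-origin genotypes of QTL genotypes j = 1, 2 in crosses k = 1..4 (Table 1).
GENOTYPES = [("QmQf", "Qmqf"), ("qmQf", "qmqf"), ("QmQf", "qmQf"), ("Qmqf", "qmqf")]
# Coefficients of (mu, a, d) in each genotypic mean, with d = (1 - 2p) * a (assumed).
COEF = {"QmQf": (1.0, 1.0, 0.0), "Qmqf": (1.0, 0.0, 1.0),
        "qmQf": (1.0, 0.0, -1.0), "qmqf": (1.0, -1.0, 0.0)}


def normal_pdf(y, mean, var):
    return np.exp(-((y - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def loglik(y, pi, X, beta, var):
    """Joint log-likelihood (10) and the weighted component densities, shape (N, 2)."""
    dens = pi * normal_pdf(y[:, None], X @ beta, var)
    return np.log(dens.sum(axis=1)).sum(), dens


def fit_iqtl(ys, pis, n_iter=1000, tol=1e-10):
    """EM fit at one position; ys[k] has shape (n_k,), pis[k] has shape (n_k, 2)."""
    y = np.concatenate(ys)
    pi = np.vstack(pis)
    X = np.array([[COEF[g] for g in GENOTYPES[k]]
                  for k, yk in enumerate(ys) for _ in range(len(yk))])  # (N, 2, 3)
    Xf, yf = X.reshape(-1, 3), np.repeat(y, 2)
    # Start at the no-QTL MLE (a = d = 0): EM never decreases the likelihood,
    # so the final fit is at least as good as H0 and LR >= 0.
    beta, var = np.array([y.mean(), 0.0, 0.0]), y.var()
    ll_old, dens = loglik(y, pi, X, beta, var)
    ll0 = ll_old
    for _ in range(n_iter):
        # E-step: posterior probability of each QTL genotype for each individual.
        w = (dens / dens.sum(axis=1, keepdims=True)).reshape(-1)
        # M-step: weighted least squares for (mu, a, d), then the common variance.
        beta = np.linalg.solve(Xf.T @ (w[:, None] * Xf), Xf.T @ (w * yf))
        var = np.sum(w * (yf - Xf @ beta) ** 2) / len(y)
        ll, dens = loglik(y, pi, X, beta, var)
        if ll - ll_old < tol:
            break
        ll_old = ll
    mu, a, d = beta
    return {"mu": mu, "a": a, "p": 0.5 * (1.0 - d / a), "sigma2": var,
            "loglik": ll, "LR": 2.0 * (ll - ll0)}
```

Under $H_0$ all genotypes share the mean μ, so the reduced model's MLE is simply the sample mean and variance, which is why `ll0` is evaluated at that point. In a genome scan, this fit would be repeated at every grid position with the $\pi_{ij|k}$ recomputed from the flanking markers, and the LR profile would be maximized over positions.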

Because the regularity conditions are violated, LR does not follow an asymptotic $\chi^2$ distribution. A permutation test (Churchill and Doerge, 1994) is therefore applied to calculate the critical threshold for declaring the existence of a QTL. By repeatedly shuffling the relationship between marker genotypes and phenotypes (e.g. 1000 times), a series of log-likelihood ratios is calculated, and the critical threshold is determined from their distribution.

A significant QTL detected by the primary QTL test cannot be called an iQTL until further imprinting tests are performed. To examine the imprinting property of a detected QTL, we propose the following three hypothesis tests.

Test I: testing for the existence of an imprinting effect. The corresponding hypotheses are

$$
H_0: p = 0.5 \ (\text{Mendelian QTL}) \quad \text{vs.} \quad H_1: p \neq 0.5 \ (\text{imprinted QTL}),
$$

where rejecting $H_0$ provides evidence of genomic imprinting. This test is also referred to as $LR_{Mend}$.

Test II: testing whether a detected iQTL is completely maternally imprinted. If $H_0$ is rejected in Test I, we can further test whether the identified iQTL shows complete inactivation of the allele derived from the maternal parent, i.e. complete maternal imprinting. This is equivalent to testing whether p = 1:

$$
H_0: p = 1 \quad \text{vs.} \quad H_1: p \neq 1.
$$

This test is also denoted $LR_{Mat}$.

Test III: testing whether a detected iQTL is completely paternally imprinted. Complete paternal imprinting corresponds to complete inactivation of the allele derived from the paternal parent, so the hypotheses are

$$
H_0: p = 0 \quad \text{vs.} \quad H_1: p \neq 0.
$$

This test is also denoted $LR_{Pat}$.

The above tests are sequential in the sense that the primary test should be conducted first. If a significant QTL is identified, Test I is conducted next to determine whether the QTL shows a Mendelian or an imprinting effect. If the QTL shows evidence of imprinting (rejecting $H_0$ in Test I), Tests II and III can follow to further examine the degree of imprinting. Rejecting $H_0$ in both Test II and Test III implies that the iQTL is partially imprinted: p > 0.5 implies partial maternal imprinting and p < 0.5 implies partial paternal imprinting. The critical thresholds for Tests I–III are determined by simulation.

4. Results

4.1. Simulation design

A series of simulation studies is performed to investigate the properties of our model. Assuming that all four backcross populations share the same linkage map, a linkage group of 100 cM with six equidistant markers is simulated for all four backcross populations. At a given marker, there are two genotypes in a backcross family, one heterozygous and the other homozygous. Markers are simulated according to Mendelian inheritance, depending on the cross type. The size of the kth backcross family is denoted $n_k$. Without loss of generality, we assume the same sample size for all four backcross families, i.e. $n_k = n$, which gives a total sample size N = 4n. Assume that a putative QTL affecting an offspring trait is located 28 cM from the first marker and accounts for a proportion ($H^2$) of the total observed variance of this trait in all four backcross populations. Given the QTL location, the QTL genotypes can be simulated by treating the QTL as a "marker", based on the conditional probabilities $\pi_{ij|k}$ in Tables 2 and 3 for the kth backcross family. The phenotypic values of the offspring trait in the kth backcross family are simulated from a normal distribution with the mean given in model (4) and variance $\sigma^2$. To understand the model performance under different conditions, simulations are performed with different sample sizes (n = 100 vs. n = 200) and heritability levels ($H^2$ = 0.1, 0.25, 0.4). To examine the impact of the parameter space on parameter estimation, we also simulate data under different imprinting mechanisms, i.e. no imprinting (p = 0.5), complete maternal imprinting (p = 1), complete paternal imprinting (p = 0) and partial maternal imprinting (p = 0.8). The precision of parameter estimation is evaluated by the square root of the mean-squared error (SMSE).

4.2. Simulation results

In general, the model provides reasonable estimates of the iQTL positions and effects, with estimation precision depending on sample size, heritability level and gene action mode. Tables 4–7 list the hypothesized values of the iQTL location and effect parameters, together with their MLEs and SMSEs under the different simulation schemes.

Parameter estimation: As expected, the estimation precision of the iQTL effects and position depends on sample size and heritability. Larger sample sizes and higher heritabilities consistently increase the precision of all parameter estimates under the different imprinting mechanisms. The small SMSEs in all tables (Tables 4–7) indicate accurate parameter estimation. Interestingly, increasing $H^2$ improves estimation precision more than increasing the sample size n does. For example, for a fixed sample size n = 100, nearly all SMSEs decrease by more than twofold when $H^2$ increases from 0.1 to 0.4, whereas the decreases are much smaller when n increases from 100 to 200 at a fixed heritability $H^2$ = 0.1. This suggests that, in practice, well-managed experiments, in which residual errors are reduced and $H^2$ is therefore increased, are more important than simply increasing sample sizes.

Table 4
MLEs of the QTL position and effect parameters based on 100 simulation replicates for an iQTL with complete maternal imprinting, under different heritabilities and sample sizes, analysed using the imprinting (Imp.) and Mendelian (Mend.) models

H²    n    Model  Position (28 cM)  μ = 5        a = 0.5      p = 1        σ²           Power
0.1   100  Imp.   28.26 (8.38)      5.01 (0.07)  0.54 (0.09)  0.94 (0.11)  2.25 (0.16)  94
           Mend.  30.52 (15.84)     5.14 (0.16)  0.51 (0.11)  –            2.40 (0.23)
      200  Imp.   29.90 (7.30)      5.00 (0.06)  0.53 (0.07)  0.95 (0.08)  2.24 (0.10)  100
           Mend.  30.04 (12.09)     5.13 (0.15)  0.50 (0.08)  –            2.40 (0.19)
0.25  100  Imp.   29.52 (4.94)      5.01 (0.04)  0.52 (0.05)  0.96 (0.07)  0.75 (0.06)  95
           Mend.  28.50 (7.71)      5.13 (0.14)  0.49 (0.07)  –            0.89 (0.15)
      200  Imp.   29.68 (3.74)      5.00 (0.03)  0.51 (0.04)  0.97 (0.05)  0.75 (0.04)  100
           Mend.  28.90 (5.92)      5.13 (0.13)  0.49 (0.05)  –            0.89 (0.15)
0.4   100  Imp.   30.00 (3.35)      5.00 (0.03)  0.52 (0.04)  0.97 (0.04)  0.38 (0.03)  95
           Mend.  29.12 (5.63)      5.13 (0.13)  0.50 (0.05)  –            0.52 (0.15)
      200  Imp.   29.70 (2.60)      5.00 (0.02)  0.51 (0.03)  0.98 (0.03)  0.38 (0.02)  100
           Mend.  29.60 (4.32)      5.13 (0.13)  0.50 (0.03)  –            0.52 (0.15)

The SMSEs of the MLEs are given in parentheses. Data were simulated for an iQTL showing complete maternal imprinting. The QTL location is given as the map distance (in cM) from the first marker of the linkage group (100 cM long). The hypothesized σ² is 2.25 for H² = 0.1, 0.75 for H² = 0.25 and 0.38 for H² = 0.4. Power is the percentage of simulations in which complete maternal imprinting is detected (the null hypothesis of test LR_Mat is not rejected).

Table 5
MLEs of the QTL position and effect parameters based on 100 simulation replicates for an iQTL with complete paternal imprinting, under different heritabilities and sample sizes, analysed using the imprinting (Imp.) and Mendelian (Mend.) models

H²    n    Model  Position (28 cM)  μ = 5        a = 0.5      p = 0        σ²           Power
0.1   100  Imp.   31.04 (12.40)     5.01 (0.07)  0.54 (0.10)  0.06 (0.09)  2.25 (0.17)  93
           Mend.  30.18 (15.65)     5.14 (0.16)  0.51 (0.10)  –            2.40 (0.23)
      200  Imp.   31.18 (7.79)      5.01 (0.06)  0.52 (0.06)  0.05 (0.08)  2.26 (0.12)  98
           Mend.  30.60 (10.48)     5.13 (0.15)  0.51 (0.07)  –            2.39 (0.18)
0.25  100  Imp.   29.72 (4.82)      5.01 (0.04)  0.52 (0.06)  0.03 (0.06)  0.75 (0.06)  95
           Mend.  30.12 (10.90)     5.13 (0.14)  0.50 (0.06)  –            0.89 (0.16)
      200  Imp.   30.48 (3.39)      5.00 (0.03)  0.51 (0.03)  0.03 (0.05)  0.75 (0.04)  98
           Mend.  30.32 (5.54)      5.13 (0.13)  0.51 (0.04)  –            0.89 (0.15)
0.4   100  Imp.   30.20 (3.61)      5.01 (0.03)  0.51 (0.04)  0.03 (0.04)  0.38 (0.03)  96
           Mend.  29.36 (5.37)      5.13 (0.14)  0.49 (0.04)  –            0.52 (0.15)
      200  Imp.   30.38 (3.01)      5.00 (0.02)  0.51 (0.02)  0.02 (0.03)  0.38 (0.02)  99
           Mend.  29.40 (4.19)      5.13 (0.13)  0.50 (0.03)  –            0.52 (0.14)

The SMSEs of the MLEs are given in parentheses. Power is the percentage of simulations in which complete paternal imprinting is detected (the null hypothesis of test LR_Pat is not rejected). See Table 4 for other explanations.

Table 6
MLEs of the QTL position and effect parameters based on 100 simulation replicates for a QTL with no imprinting effect, under different heritabilities and sample sizes, analysed using the imprinting (Imp.) and Mendelian (Mend.) models

H²    n    Model  Position (28 cM)  μ = 5        a = 0.5      p = 0.5      σ²           Type I error
0.1   100  Imp.   32.78 (16.62)     5.01 (0.07)  0.51 (0.11)  0.51 (0.15)  2.24 (0.16)  11
           Mend.  31.56 (15.91)     5.14 (0.16)  0.51 (0.10)  –            2.27 (0.16)
      200  Imp.   31.38 (12.68)     5.01 (0.06)  0.50 (0.07)  0.51 (0.09)  2.25 (0.12)  7
           Mend.  30.22 (9.93)      5.13 (0.14)  0.51 (0.07)  –            2.26 (0.11)
0.25  100  Imp.   29.68 (6.87)      5.01 (0.04)  0.50 (0.06)  0.50 (0.07)  0.75 (0.05)  8
           Mend.  29.82 (8.61)      5.12 (0.13)  0.51 (0.07)  –            0.76 (0.04)
      200  Imp.   30.32 (4.42)      5.00 (0.03)  0.50 (0.04)  0.50 (0.05)  0.75 (0.04)  5
           Mend.  29.12 (5.08)      5.13 (0.13)  0.49 (0.04)  –            0.76 (0.04)
0.4   100  Imp.   29.70 (4.39)      5.01 (0.03)  0.50 (0.04)  0.50 (0.05)  0.38 (0.03)  6
           Mend.  28.20 (4.83)      5.12 (0.13)  0.49 (0.05)  –            0.39 (0.03)
      200  Imp.   29.32 (3.39)      5.00 (0.02)  0.50 (0.03)  0.50 (0.04)  0.38 (0.02)  4
           Mend.  29.96 (3.77)      5.13 (0.13)  0.50 (0.03)  –            0.39 (0.03)

The SMSEs of the MLEs are given in parentheses. Type I error is the percentage of simulations in which a false-positive imprinting effect is detected (the null hypothesis of test LR_Mend is rejected). See Table 4 for other explanations.

Power analysis: Two types of testing power are defined in this article, based on Tests I–III. The first concerns Test I ($LR_{Mend}$) and is defined as the probability of rejecting the null hypothesis $H_0$ when the alternative $H_1$ is true; it is computed as the percentage of the 100 simulation replicates in which imprinting is successfully detected. To evaluate the testing power under different degrees of imprinting, data are simulated with p = 0, 0.1, …, 0.9, 1 under heritability $H^2$ = 0.1 and 0.4. Test I is then conducted for each simulation at the position where a significant QTL is detected by the primary QTL test. First, as can be seen in Fig. 1, for a fixed imprinting level, a larger sample size and a higher heritability always give higher power than a smaller sample size and a lower heritability. For example, when the sample size is 100, the power is 98% under $H^2$ = 0.4 compared with 78% under $H^2$ = 0.1 at the imprinting level p = 0.2. When the heritability is 0.1, the power is 94% for n = 200 compared with 78% for n = 100 at p = 0.2. Second, for a fixed sample size, the power shows a monotone decreasing trend as the degree of imprinting

ARTICLE IN PRESS Yuehua Cui / Journal of Theoretical Biology 244 (2007) 115–126

122

Table 7 The MLEs of the QTL position and effect parameters based on 100 simulation replicates for the partial maternal imprinting effect of an iQTL under different heritabilities and sample sizes analysed using the imprinting model H2

n

Position at 28 cM

m¼5

a ¼ 0:5

p ¼ 0:8

s2

0.1

100 200

30.44(13.43) 29.76(9.13)

5.01(0.07) 5.01(0.06)

0.52(0.10) 0.50(0.07)

0.80(0.14) 0.81(0.09)

2.25(0.16) 2.25(0.12)

77 95

0.25

100 200

28.98(5.72) 30.22(4.21)

5.01(0.04) 5.01(0.03)

0.50(0.06) 0.50(0.04)

0.81(0.09) 0.81(0.05)

0.75(0.06) 0.75(0.04)

93 100

0.4

100 200

29.84(4.13) 30.20(3.27)

5.01(0.03) 5.00(0.02)

0.50(0.05) 0.50(0.03)

0.80(0.06) 0.80(0.04)

0.38(0.03) 0.38(0.02)

97 100

Power

The square roots of the mean squared errors of the MLEs are given in parentheses. Power is calculated as the percentage of the number of those simulations in which imprinting effect is detected (the null in test LRMend is rejected). See Table 4 for the other explanations.

B (n=200)

1

1

0.8

0.8

Testing power

Testing power

A (n=100)

0.6 0.4 H2=0.1 H2=0.4

0.2

0.6 0.4 H2=0.1 H2=0.4

0.2

0

0 0

0.2

0.4

0.6

0.8

1

0

0.2

0.4

p C (H2=0.1)

0.8

1

D (H2=0.4)

1

1

0.8

0.8

Testing power

Testing power

0.6 p

0.6 0.4 0.2

0.6 0.4 0.2

n=100 n=200

0

n=100 n=200

0 0

0.2

0.4

0.6

0.8

1

p

0

0.2

0.4

0.6

0.8

1

p

Fig. 1. The power plot for test LRMend under different heritability levels (H 2 ¼ 0:1 and 0.4) and sample sizes (n ¼ 100 and 200). The effect of heritability on power is compared with sample size n ¼ 100 (A) and n ¼ 200 (B). The dotted line corresponds to H 2 ¼ 0:4 and the solid line corresponds to H 2 ¼ 0:1. The effect of sample size on power is compared under heritability H 2 ¼ 0:1 (C) and H 2 ¼ 0:4 (D). The dotted line corresponds to n ¼ 200 and the solid line corresponds to n ¼ 100. The testing power is calculated as the percentage of rejecting H0 in Test I (LRMend ) given the alternative is true.

increased from complete paternal imprinting ðp ¼ 0Þ to no imprinting ðp ¼ 0:5Þ, and then displays an increasing pattern as the degree of imprinting increased from no imprinting to complete maternal imprinting ðp ¼ 1Þ in all simulation designs. It is also clearly shown in Fig. 1 that the testing power is a function of imprinting parameter p with small power obtained as p close to 0.5. In all the simulation cases, the power distribution is quite symmetric about p ¼ 0:5.
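A heuristic reading of the symmetry in Fig. 1, assuming (as in Eqs. (12) and (13) of Section 5 with the maternal-effect terms set to zero) that the reciprocal heterozygotes Q_mq_f and q_mQ_f have genetic values μ + a(1 − 2p) and μ + a(2p − 1), while the homozygote values do not involve p. The information about p then comes through their difference:

$$
\mu_{Q_mq_f} - \mu_{q_mQ_f} = \left[\mu + a(1-2p)\right] - \left[\mu + a(2p-1)\right] = 2a(1-2p).
$$

This contrast is zero at p = 0.5 and has the same magnitude at p and 1 − p; for example, p = 0.2 and p = 0.8 both give |2a(1 − 2p)| = 1.2a. A test against the Mendelian null therefore has the least signal near p = 0.5 and equal signal at mirror-image imprinting levels, which is consistent with the shape of the power curves. This is an intuition under the stated assumption, not a derivation of the power function.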

Lastly, as can be inferred from Fig. 1, the testing power is more sensitive to heritability than to sample size when p is close to 0.5. Increasing heritability always improves the testing power more than increasing the sample size. For example, for a fixed imprinting level p = 0.7, increasing the sample size from 100 to 200 raises the power by 0.22 for the same heritability H² = 0.1 (Fig. 1C), whereas increasing the heritability from 0.1 to 0.4 raises the power by 0.45 for the same sample size n = 100 (Fig. 1A). As p moves toward 0 or 1, the power approaches 100% more quickly under high heritability. Under high heritability, once p is 0.3 or below (or 0.7 or above), the power is close to 100% and is no longer sensitive to sample size.

The second type of testing power is associated with Tests II and III (LR_Mat and LR_Pat) and is computed as the percentage of the 100 simulation replicates in which the model successfully tests the significance of complete maternal/paternal imprinting. The last columns of Tables 4 and 5 list the testing power for LR_Mat and LR_Pat, respectively. The tables show that the testing power depends on both sample size and heritability, and increasing either always improves power. However, the testing power appears more sensitive to sample size than to heritability. For example, as shown in Table 4, the testing power is always 100% when the sample size is increased from 100 to 200, whereas the power gain is not significant when the heritability is increased from 0.1 to 0.4 for a fixed sample size (say 100).

Type I error: We define the Type I error as the percentage of the 100 simulation replicates in which an iQTL is falsely detected when the QTL is in fact not imprinted (a Mendelian QTL). Data are simulated assuming that the QTL has a Mendelian effect (p = 0.5). Test results based on LR_Mend are shown in Table 6. The Type I error decreases as the sample size or heritability increases. Although the Type I error is somewhat inflated under a small sample size (n = 100) and low heritability (H² = 0.1), it drops markedly when the sample size and heritability are increased.

Imprinting vs Mendelian model: For an imprinted data set, the imprinting model provides accurate estimates of the QTL effects and position (Tables 4, 5 and 7). However, if the same data are analysed with the Mendelian model (model (6)), inconsistent estimates are obtained.
First, the QTL position is poorly estimated with the Mendelian model. For example, the SMSE (square root of the mean squared error) of the QTL position under the Mendelian model is about twice that obtained with the imprinting model. Second, as Tables 4 and 5 show, the overall genetic mean (μ) and the residual variance (σ²) are consistently overestimated when the imprinted data are analysed with the Mendelian model. When data are simulated assuming no imprinting effect (p = 0.5), our model also provides better estimates than the Mendelian model (Table 6): although the residual variance estimates from the two models differ little, the overall genetic mean μ is consistently overestimated by the Mendelian model (5.12 to 5.14, versus 5.00 to 5.01 for the imprinting model, with a true value of 5). In summary, the imprinting model provides good parameter estimates and has high power to detect iQTL when the underlying data show various degrees of imprinting. Even when there is no imprinting effect, it still performs better than the traditional Mendelian model (model (6)).
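Tables 6 and 7 summarise the 100 replicates with two simple statistics: the SMSE of each MLE, and the percentage of replicates in which a likelihood ratio test rejects its null (read as power when the data were simulated under H1, and as Type I error when they were simulated with p = 0.5). The following is a minimal Python sketch of both. The function names are hypothetical, the LR statistic is assumed to be the usual −2[log L(H0) − log L(H1)], and the 3.84 cut-off in the usage example is only a placeholder, since the significance thresholds are not given in this section.

```python
import math
from typing import Sequence


def smse(estimates: Sequence[float], true_value: float) -> float:
    """Square root of the mean squared error of replicate estimates."""
    if not estimates:
        raise ValueError("need at least one replicate")
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)


def lr_statistic(loglik_null: float, loglik_alt: float) -> float:
    """Likelihood ratio statistic LR = -2 [log L(H0) - log L(H1)]."""
    return -2.0 * (loglik_null - loglik_alt)


def rejection_rate(lr_values: Sequence[float], threshold: float) -> float:
    """Percentage of replicates whose LR exceeds the threshold.

    Empirical power if H1 generated the data; empirical Type I error
    if H0 (here a Mendelian QTL, p = 0.5) generated the data.
    """
    if not lr_values:
        raise ValueError("need at least one replicate")
    hits = sum(1 for lr in lr_values if lr > threshold)
    return 100.0 * hits / len(lr_values)


if __name__ == "__main__":
    # Hypothetical replicate estimates of a QTL position whose true value is 28 cM.
    positions = [30.0, 26.0, 29.0, 27.0]
    print(round(smse(positions, 28.0), 3))  # 1.581, i.e. sqrt(2.5)
    # Two hypothetical replicates: LR = 10.0 and LR = 1.0.
    lrs = [lr_statistic(-120.0, -115.0), lr_statistic(-120.0, -119.5)]
    print(rejection_rate(lrs, threshold=3.84))  # 50.0
```

With 100 replicates, each replicate contributes exactly one percentage point, which is why the Type I error and power entries in Tables 6 and 7 are whole numbers.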


5. Extension

Cytoplasmic maternal effects are another genetic source of variation among offspring, and they can be confounded with genomic imprinting (see Wade, 1998, for an overview). For example, the existence of a maternal effect could lead to the conclusion that a detected iQTL is paternally imprinted. To avoid this confounding and make the model more powerful, we can introduce three quantitative parameters, m_QQ, m_Qq and m_qq, which denote the maternal effects when the maternal backcross line is QQ, Qq or qq, respectively. We illustrate the idea with design B1; analogous configurations are easily obtained for design B2. For the backcross type Q_mQ_m × Q_fq_f, the offspring genetic values are

$$
\begin{cases}
\mu_{Q_mQ_f} = \mu + m_{QQ} + a & \text{for } Q_mQ_f,\\
\mu_{Q_mq_f} = \mu + m_{QQ} + a(1-2p) & \text{for } Q_mq_f.
\end{cases} \tag{12}
$$

For the backcross type q_mq_m × Q_fq_f, the offspring genetic values are

$$
\begin{cases}
\mu_{q_mQ_f} = \mu + m_{qq} + a(2p-1) & \text{for } q_mQ_f,\\
\mu_{q_mq_f} = \mu + m_{qq} - a & \text{for } q_mq_f.
\end{cases} \tag{13}
$$

By combining all four backcross populations, the parameters can be estimated with the EM algorithm, and a likelihood ratio test can be used to test whether maternal effects contribute to offspring genetic variation, based on the hypotheses

$$
H_0: m_{QQ} = m_{Qq} = m_{qq} = 0 \quad \text{vs.} \quad H_1: \text{at least one of these parameters is not equal to } 0.
$$

If the null is rejected, we can further test which maternal genotype contributes significantly to offspring variation.

6. Discussion

Imprinting-like phenomena are now widely observed in organisms ranging from plants and animals to humans. Identification of imprinted genes by positional cloning or candidate-gene testing has produced promising results (Pfeifer, 2000).
Because these methods are labor-intensive and can analyse only a limited number of genes, genetic mapping has proved to be a cost-effective alternative for identifying multiple imprinted genes based on a linkage map (Hanson et al., 2001; Shete et al., 2003; de Koning et al., 2000, 2002; Cui et al., 2006). Given the complex properties of imprinted genes, efficient and powerful statistical methods for mapping them are still much in demand. In this article, a novel statistical approach is proposed that provides a general framework for genome-wide scanning and testing of iQTL based on experimental crosses
of inbred line species using backcross populations. The statistical properties of the derived model are investigated through simulation studies. The approach has several merits. First, a probability parameter p is incorporated into the genetic model, yielding a general iQTL mapping framework. As extensive simulation studies show, this parsimonious model is robust: it provides accurate and consistent estimates of the QTL effect parameters and position with reasonable precision in a variety of situations. Biologically, p is more than a probability measure: it quantifies the degree of imprinting. Second, the imprinting model characterizes different degrees of imprinting with high power (sensitivity) and a low Type I error rate. Simulation studies demonstrate that the power is a function of the imprinting probability p and is also affected by sample size and heritability. The power increases as p moves from 0.5 toward 0 or 1, at a rate that depends on sample size and heritability (Fig. 1). When p is close to 0.5, the power is more sensitive to heritability than to sample size. When heritability is low, the Type I error rate is inflated (Table 6); as heritability increases from 0.1 to 0.25 or higher, the error rate falls to around 5%. Given the Type I error rates and the power analysis, conclusions should be drawn cautiously when the proportion of variation explained by the underlying QTL is small, especially when the sample size is also small. Third, the model is biologically and statistically reasonable and interpretable. It can be regarded as a unified approach to mapping both Mendelian and imprinted QTL with backcross designs, covering imprinting mechanisms that range from no imprinting through partial imprinting to complete imprinting.
Specifically, the significance of a Mendelian QTL is assessed first, and the imprinting properties of any identified QTL are then examined through a series of statistical tests. The advantage of the imprinting model is most noticeable with imprinted data, for which it provided more robust parameter estimates than the Mendelian model in every simulated scenario. It is also worth noting that the imprinting model outperforms the Mendelian model even when a QTL shows no imprinting effect. The model is implemented through reciprocal backcross populations, which serve as an optimal pedigree for studying genomic imprinting. Although the two backcross populations generated in either design B1 or design B2 are sufficient to map iQTL, combining the four backcross populations initiated from the same inbred lines can potentially increase mapping power. Increased power for QTL detection when related crosses are combined has been shown in the literature (Xu, 1998; Zou et al., 2001). More importantly, with combined crosses, maternal effects and genomic imprinting can be tested within the same model.
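Eqs. (12) and (13) in Section 5 make the maternal-effect confounding concrete: the reciprocal heterozygotes differ by (m_QQ − m_qq) + 2a(1 − 2p), so a maternal effect with no imprinting can reproduce the contrast expected under partial paternal imprinting. The Python sketch below evaluates Eqs. (12) and (13) for design B1 and computes a p-value for the LR test of H0: m_QQ = m_Qq = m_qq = 0. The helper names are hypothetical, and the chi-square null with 3 degrees of freedom is an assumption made for illustration: this section does not state the null distribution, and QTL studies often rely on empirical (e.g. permutation) thresholds instead.

```python
import math


def b1_genotypic_values(mu, a, p, m_QQ, m_qq):
    """Offspring genetic values in design B1 with maternal effects, Eqs. (12)-(13).

    Keys are (maternal allele, paternal allele) of the offspring.
    """
    return {
        ("Q", "Q"): mu + m_QQ + a,                # mother QQ, Eq. (12)
        ("Q", "q"): mu + m_QQ + a * (1 - 2 * p),  # mother QQ, Eq. (12)
        ("q", "Q"): mu + m_qq + a * (2 * p - 1),  # mother qq, Eq. (13)
        ("q", "q"): mu + m_qq - a,                # mother qq, Eq. (13)
    }


def reciprocal_contrast(values):
    """Q_mq_f minus q_mQ_f, which equals (m_QQ - m_qq) + 2a(1 - 2p)."""
    return values[("Q", "q")] - values[("q", "Q")]


def chi2_sf_3df(x):
    """P(X > x) for X ~ chi-square with 3 df, via its closed form."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def maternal_effect_lr_test(loglik_h0, loglik_h1):
    """LR = -2[log L(H0) - log L(H1)] and its p-value under an assumed chi-square(3) null."""
    lr = -2.0 * (loglik_h0 - loglik_h1)
    return lr, chi2_sf_3df(lr)


if __name__ == "__main__":
    # No imprinting (p = 0.5), but QQ mothers add a maternal effect of 0.3 ...
    maternal = b1_genotypic_values(mu=5.0, a=0.5, p=0.5, m_QQ=0.3, m_qq=0.0)
    # ... gives the same reciprocal contrast as partial paternal imprinting (p = 0.35).
    imprinted = b1_genotypic_values(mu=5.0, a=0.5, p=0.35, m_QQ=0.0, m_qq=0.0)
    print(round(reciprocal_contrast(maternal), 6), round(reciprocal_contrast(imprinted), 6))  # prints 0.3 0.3
```

The two scenarios are distinguishable only through the homozygous classes, whose means shift with m_QQ and m_qq but not with p; this is why fitting the maternal-effect parameters jointly with p, using all backcross families, matters for separating the two explanations.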

Using outbred lines to map imprinted genes was proposed by de Koning et al. (2000, 2002). With outbred lines the imprinting effect can be confounded with the effect of allelic variation; inbred lines avoid this concern. However, successful detection of iQTL, and inference about them, with inbred lines depends largely on the experimental design. Cui et al. (2006) proposed an F2 design for mapping imprinted genes that has successfully identified several imprinted genes responsible for mouse growth. The F2 model relies on sex-specific differences in recombination fraction between male and female chromosomes, which restricts its application in most organisms because this information is rarely available. Clapcott et al. (2000) proposed using reciprocal backcrosses (e.g. F1 × P1 and P1 × F1) to identify imprinted genes by linkage analysis. This design differs from both design B1 and design B2 and is less efficient than the designs proposed in this paper, because the offspring genotype lacks one of the parental genotypes. This could lead to biased parameter estimates and, consequently, reduced power. Moreover, results obtained with their design are not conclusive, since any imprinting result could be confounded by a maternal effect. Compared with these methods, the proposed design should be more powerful in dissecting imprinting effects. Based on the simulation results, the proposed approach can serve as a unified approach for mapping both Mendelian and imprinted QTL, and thus represents a timely effort toward a better understanding of the genetic architecture of complex traits. The model can be applied to both plant and animal species. In plants, endoreduplication is a commonly observed phenomenon that shows a maternally controlled parent-of-origin effect in certain species, for instance in maize endosperm (Dilkes et al., 2002). Cells that undergo endoreduplication are typically larger than other cells, which can result in larger fruits or seeds that are beneficial to humans (Grime and Mowforth, 1982). In animal model systems such as the mouse, many genes are homologous to human genes, which makes these organisms particularly valuable for experiments that could never be performed on humans. When researchers find a gene in a model organism that plays a role in a trait of interest, they can identify its human counterpart through DNA sequence comparison; successful examples have been reported (e.g. Wevrick and Francke, 1997; Kim et al., 1997). Because the backcross populations described here are not difficult to generate with well-established genetic techniques and abundant experimental material, the proposed approach provides a quantitative, testable framework to guide geneticists in designing experiments aimed at finding imprinted genes. We expect the proposed design and model to facilitate significant discoveries. To present our idea for mapping iQTL based on a combined analysis clearly, the model is implemented within the
context of interval mapping (Lander and Botstein, 1989). The model can be extended to incorporate composite interval mapping (Zeng, 1994) or multiple interval mapping (Kao et al., 1999). Epistatic effects have also not yet been considered; models that account for such interaction effects are needed. Such more comprehensive approaches, combined with growing genetic data, will make the model more useful and powerful in practice. The model can be expected to have important implications for the study of genomic imprinting.
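The EM iteration of Appendix A can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: it is pure Python, uses only the two design-B1 families, assumes normal densities f_{j|k} with common variance σ², uses a hypothetical D whose rows are read off Eqs. (12) and (13) with the maternal effects set to zero, and replaces the marker-based priors π_{i1|k} by a crude 0.9/0.1 stand-in. The helper names (coef, em_iqtl, simulate_b1) are hypothetical. The M-step applies the updates of Eqs. (A.4)–(A.8) in sequence, each using the newest values.

```python
import math
import random


def coef(row, j, p):
    """Coefficient of a for QTL genotype j (0 or 1) in one family.

    Same form as D[k,3] + (D[k,1] - D[k,3]) p in Appendix A,
    with row = (D[k,1], D[k,2], D[k,3], D[k,4]) stored 0-based.
    """
    return row[j + 2] + (row[j] - row[j + 2]) * p


def normal_pdf(y, mean, var):
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_iqtl(ys, pis, D, mu, a, p, var, max_iter=500, tol=1e-8):
    """EM iteration patterned on Eqs. (A.2)-(A.8).

    ys[k][i]  : phenotype of individual i in family k
    pis[k][i] : prior probability (from markers) of QTL genotype 1
    D[k]      : 4-tuple of +/-1 coefficients for family k (hypothetical)
    """
    n = sum(len(y) for y in ys)
    for _ in range(max_iter):
        old = (mu, a, p, var)
        # E-step: posterior probability of genotype 1, cf. Eqs. (A.2)-(A.3)
        terms = []
        for row, y, prior in zip(D, ys, pis):
            m1 = mu + coef(row, 0, p) * a
            m2 = mu + coef(row, 1, p) * a
            for yi, pi1 in zip(y, prior):
                f1 = pi1 * normal_pdf(yi, m1, var)
                f2 = (1.0 - pi1) * normal_pdf(yi, m2, var)
                P1 = f1 / (f1 + f2) if f1 + f2 > 0 else pi1
                terms.append((row, yi, P1))
        # M-step, cf. Eqs. (A.4)-(A.8); each update uses the newest values
        mu = sum(P1 * (yi - coef(r, 0, p) * a) + (1 - P1) * (yi - coef(r, 1, p) * a)
                 for r, yi, P1 in terms) / n
        num = sum((yi - mu) * (P1 * coef(r, 0, p) + (1 - P1) * coef(r, 1, p))
                  for r, yi, P1 in terms)
        den = sum(P1 * coef(r, 0, p) ** 2 + (1 - P1) * coef(r, 1, p) ** 2
                  for r, yi, P1 in terms)
        a = num / den
        num = sum(P1 * (yi - mu - r[2] * a) * (r[0] - r[2])
                  + (1 - P1) * (yi - mu - r[3] * a) * (r[1] - r[3]) for r, yi, P1 in terms)
        den = a * sum(P1 * (r[0] - r[2]) ** 2 + (1 - P1) * (r[1] - r[3]) ** 2
                      for r, yi, P1 in terms)
        if den != 0:
            p = min(max(num / den, 0.0), 1.0)  # truncation to [0, 1], cf. Eq. (A.7)
        var = sum(P1 * (yi - mu - coef(r, 0, p) * a) ** 2
                  + (1 - P1) * (yi - mu - coef(r, 1, p) * a) ** 2 for r, yi, P1 in terms) / n
        if max(abs(u - v) for u, v in zip(old, (mu, a, p, var))) < tol:
            break
    return mu, a, p, var


def simulate_b1(mu, a, p, var, n_per_family, seed=1):
    """Simulate the two design-B1 families (maternal effects set to zero)."""
    rng = random.Random(seed)
    # Hypothetical D rows read off Eqs. (12)-(13) with m_QQ = m_qq = 0:
    D = [(1.0, -1.0, 1.0, 1.0),    # QQ mother: Q_mQ_f -> a, Q_mq_f -> a(1 - 2p)
         (1.0, -1.0, -1.0, -1.0)]  # qq mother: q_mQ_f -> a(2p - 1), q_mq_f -> -a
    ys, pis = [], []
    for row in D:
        y, prior = [], []
        for _ in range(n_per_family):
            pi1 = rng.choice((0.9, 0.1))        # stand-in for flanking-marker information
            j = 0 if rng.random() < pi1 else 1  # QTL genotype drawn from that prior
            y.append(rng.gauss(mu + coef(row, j, p) * a, math.sqrt(var)))
            prior.append(pi1)
        ys.append(y)
        pis.append(prior)
    return ys, pis, D


if __name__ == "__main__":
    ys, pis, D = simulate_b1(mu=5.0, a=0.5, p=0.8, var=0.25, n_per_family=500)
    pooled = [v for y in ys for v in y]
    m0 = sum(pooled) / len(pooled)
    v0 = sum((v - m0) ** 2 for v in pooled) / len(pooled)
    print(em_iqtl(ys, pis, D, mu=m0, a=0.3, p=0.5, var=v0))
```

Updating μ, a, p and σ² one at a time, each conditional on the others, keeps every step in closed form, mirroring the separate expressions (A.4)–(A.8). Clamping p̃ to [0, 1] inside the loop plays the role of Eq. (A.7).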

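Acknowledgments

The author thanks the anonymous referees and the editor for their valuable comments on the manuscript. This research was supported by a start-up fund from Michigan State University.

Appendix A. EM algorithm for estimating the parameters

The joint log-likelihood function of Eq. (10) can be expressed as

$$
\ell(\Theta) = \log L(\Theta \mid M, y_1, y_2, y_3, y_4) = \sum_{k=1}^{4}\sum_{i=1}^{n_k} \log\left[\pi_{i1|k} f_{1|k}(y_{i|k}) + \pi_{i2|k} f_{2|k}(y_{i|k})\right],
$$

where $\pi_{i2|k} = 1 - \pi_{i1|k}$ for $i = 1, \ldots, n_k$ and $k = 1, \ldots, 4$, and $k$ refers to the $k$th backcross family. The first derivative of $\ell(\Theta)$ with respect to a particular element $\Theta_s$ is

$$
\frac{\partial}{\partial \Theta_s}\ell(\Theta)
= \sum_{k=1}^{4}\sum_{i=1}^{n_k}\left[
\frac{\pi_{i1|k}\frac{\partial}{\partial \Theta_s} f_{1|k}(y_{i|k})}{\pi_{i1|k} f_{1|k}(y_{i|k}) + \pi_{i2|k} f_{2|k}(y_{i|k})}
+ \frac{\pi_{i2|k}\frac{\partial}{\partial \Theta_s} f_{2|k}(y_{i|k})}{\pi_{i1|k} f_{1|k}(y_{i|k}) + \pi_{i2|k} f_{2|k}(y_{i|k})}\right]
= \sum_{k=1}^{4}\sum_{i=1}^{n_k}\left[
P_{i1|k}\frac{\partial}{\partial \Theta_s}\log f_{1|k}(y_{i|k})
+ P_{i2|k}\frac{\partial}{\partial \Theta_s}\log f_{2|k}(y_{i|k})\right], \tag{A.1}
$$

where

$$
P_{i1|k} = \frac{\pi_{i1|k} f_{1|k}(y_{i|k})}{\pi_{i1|k} f_{1|k}(y_{i|k}) + \pi_{i2|k} f_{2|k}(y_{i|k})} \quad \text{for } k = 1, \ldots, 4, \tag{A.2}
$$

$$
P_{i2|k} = \frac{\pi_{i2|k} f_{2|k}(y_{i|k})}{\pi_{i1|k} f_{1|k}(y_{i|k}) + \pi_{i2|k} f_{2|k}(y_{i|k})} \quad \text{for } k = 1, \ldots, 4, \tag{A.3}
$$

which can be thought of as posterior probabilities of the QTL genotypes given the marker genotypes in the $k$th backcross family. Given initial values for the unknown parameters $\Theta$, we update $P_{1|k} = \{P_{\cdot|k}\}_{i,1}$ and $P_{2|k} = \{P_{\cdot|k}\}_{i,2}$ (E-step). The estimated posterior probabilities are then used to obtain new MLEs of $\Theta$ (M-step). Define the $4 \times 4$ matrix $D = (D_{k,j})$ with entries $\pm 1$, such that the coefficient of $a$ for QTL genotype 1 (respectively 2) in family $k$ is $D_{k,3} + (D_{k,1} - D_{k,3})p$ (respectively $D_{k,4} + (D_{k,2} - D_{k,4})p$). Then the MLEs of the parameters contained in $\Theta$ are obtained by setting the first derivative in Eq. (A.1) equal to 0:

$$
\hat{\mu} = \frac{1}{n}\sum_{k=1}^{4}\left\{P_{1|k}'\left[y_k - (D_{k,3} + (D_{k,1} - D_{k,3})\hat{p})\hat{a}\right] + P_{2|k}'\left[y_k - (D_{k,4} + (D_{k,2} - D_{k,4})\hat{p})\hat{a}\right]\right\}, \tag{A.4}
$$

$$
\hat{a} = \frac{\sum_{k=1}^{4}\left\{P_{1|k}'(y_k - \hat{\mu})(D_{k,3} + (D_{k,1} - D_{k,3})\hat{p}) + P_{2|k}'(y_k - \hat{\mu})(D_{k,4} + (D_{k,2} - D_{k,4})\hat{p})\right\}}{\sum_{k=1}^{4}\left\{\mathbf{1}'P_{1|k}(D_{k,3} + (D_{k,1} - D_{k,3})\hat{p})^2 + \mathbf{1}'P_{2|k}(D_{k,4} + (D_{k,2} - D_{k,4})\hat{p})^2\right\}}, \tag{A.5}
$$

$$
\tilde{p} = \frac{\sum_{k=1}^{4}\left\{P_{1|k}'(y_k - \hat{\mu} - D_{k,3}\hat{a})(D_{k,1} - D_{k,3}) + P_{2|k}'(y_k - \hat{\mu} - D_{k,4}\hat{a})(D_{k,2} - D_{k,4})\right\}}{\sum_{k=1}^{4}\left\{\mathbf{1}'P_{1|k}(D_{k,1} - D_{k,3})^2\hat{a} + \mathbf{1}'P_{2|k}(D_{k,2} - D_{k,4})^2\hat{a}\right\}}. \tag{A.6}
$$

Since $p$ is a probability measure, it can only take values from 0 to 1. The MLE of $p$ is then given by

$$
\hat{p} = \begin{cases} \max(\tilde{p}, 0) & \text{if } \tilde{p} < 0,\\ \min(\tilde{p}, 1) & \text{if } \tilde{p} > 1,\\ \tilde{p} & \text{otherwise,} \end{cases} \tag{A.7}
$$

$$
\hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{4}\left\{P_{1|k}'\left(y_k - \hat{\mu} - (D_{k,3} + (D_{k,1} - D_{k,3})\hat{p})\hat{a}\right)^2 + P_{2|k}'\left(y_k - \hat{\mu} - (D_{k,4} + (D_{k,2} - D_{k,4})\hat{p})\hat{a}\right)^2\right\}, \tag{A.8}
$$

where $D_{k,j}$ is the element in the $k$th row and $j$th column of $D$, $'$ denotes transpose, $\mathbf{1}$ is a vector of ones, and squares of vectors are taken elementwise. This iterative process is repeated between Eqs. (A.2) and (A.8) until the specified convergence criterion is satisfied. The values at convergence are regarded as the MLEs.

References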

Alleman, M., Doctor, J., 2000. Genomic imprinting in plants: observations and evolutionary implications. Plant Mol. Biol. 43, 147–161.
Churchill, G.A., Doerge, R.W., 1994. Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971.
Clapcott, S.J., Teale, A.J., Kemp, S.J., 2000. Evidence for genomic imprinting of the major QTL controlling susceptibility to trypanosomiasis in mice. Parasite Immunol. (Oxf.) 22, 259–264.
Cui, Y.H., Lu, Q., Cheverud, J.M., Littel, R.L., Wu, R.L., 2006. Model for mapping imprinted quantitative trait loci in an inbred F2 design. Genomics 87, 543–551.
de Koning, D.-J., Rattink, A.P., Harlizius, B., van Arendonk, J.A.M., Brascamp, E.W., et al., 2000. Genome-wide scan for body composition in pigs reveals important role of imprinting. Proc. Natl. Acad. Sci. USA 97, 7947–7950.
de Koning, D.-J., Bovenhuis, H., van Arendonk, J.A.M., 2002. On the detection of imprinted quantitative trait loci in experimental crosses of outbred species. Genetics 161, 931–938.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B 39, 1–38.
Dilkes, B.P., Dante, R.A., Coelho, C., Larkins, B.A., 2002. Genetic analysis of endoreduplication in Zea mays endosperm: evidence of sporophytic and zygotic maternal control. Genetics 160, 1163–1177.
Falls, J.G., Pulford, D.J., Wylie, A.A., Jirtle, R.L., 1999. Genomic imprinting: implications for human disease. Am. J. Pathol. 154, 635–647.
Grime, J.P., Mowforth, M.A., 1982. Variation in genome size-an ecological interpretation. Nature 299, 151–153.
Haghighi, F., Hodge, S.E., 2002. Likelihood formulation of parent-of-origin effects on segregation analysis, including ascertainment. Am. J. Hum. Genet. 70, 142–156.
Hanson, R.L., Kobes, S., Lindsay, R.S., Knowler, W.C., 2001. Assessment of parent-of-origin effects in linkage analysis of quantitative traits. Am. J. Hum. Genet. 68, 951–962.
Jeon, J.-T., Carlborg, O., Tornsten, A., Giuffra, E., Amarger, V., et al., 1999. A paternally expressed QTL affecting skeletal and cardiac muscle mass in pigs maps to the IGF2 locus. Nat. Genet. 21, 157–158.
Jiang, C., Zeng, Z.-B., 1997. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101, 47–58.
Kao, C.H., Zeng, Z.-B., Teasdale, R.D., 1999. Multiple interval mapping for quantitative trait loci. Genetics 152, 1203–1216.
Kim, J., Ashworth, L., Branscomb, E., Stubbs, L., 1997. The human homolog of a mouse-imprinted gene, Peg3, maps to a zinc finger gene-rich region of human chromosome 19q13.4. Genome Res. 7, 532–540.
Knapp, M., Strauch, K., 2004. Affected-sib-pair test for linkage based on constraints for identical-by-descent distributions corresponding to disease models with imprinting. Genet. Epidemiol. 26, 273–285.
Knott, S.A., Marklund, L., Haley, C.S., Andersson, K., Davies, W., et al., 1998. Multiple marker mapping of quantitative trait loci in a cross between outbred wild boar and large white pigs. Genetics 149, 1069–1080.
Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.
Lander, E.S., Green, P., Abrahamson, J., Barlow, A., Daly, M.J., Lincoln, S.E., et al., 1987. MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174–181.
Luo, M., Bilodeau, P., Dennis, E.S., Peacock, W.J., Chaudhury, A.M., 2000. Expression and parent-of-origin effects for FIS2, MEA, and FIE in the endosperm and embryo of developing Arabidopsis seeds. Proc. Natl. Acad. Sci. USA 97, 10637–10642.
Lynch, M., Walsh, B., 1998. Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.
Naumova, A.K., Croteau, S., 2004. Mechanisms of epigenetic variation: polymorphic imprinting. Curr. Genomics 5, 417–429.
Nezer, C., Moreau, L., Brouwers, B., Coppieters, W., Detilleux, J., et al., 1999. An imprinted QTL with major effect on muscle mass and fat deposition maps to the IGF2 locus in pigs. Nat. Genet. 21, 155–156.
Pfeifer, K., 2000. Mechanisms of genomic imprinting. Am. J. Hum. Genet. 67, 777–787.
Pratt, S.C., Daly, M.J., Kruglyak, L., 2000. Exact multipoint quantitative-trait linkage analysis in pedigrees by variance components. Am. J. Hum. Genet. 66, 1153–1157.
Sandovici, I., Kassovska-Bratinova, S., Loredo-Osti, J.C., Leppert, M., Suarez, A., et al., 2005. Interindividual variability and parent of origin DNA methylation differences at specific human Alu elements. Hum. Mol. Genet. 14, 2135–2143.
Shete, S., Amos, C.I., 2002. Testing for genetic linkage in families by a variance-components approach in the presence of genomic imprinting. Am. J. Hum. Genet. 70, 751–757.
Shete, S., Zhou, X., Amos, C.I., 2003. Genomic imprinting and linkage test for quantitative trait loci in extended pedigrees. Am. J. Hum. Genet. 73, 933–938.
Spencer, H.G., 2002. The correlation between relatives on the supposition of genomic imprinting. Genetics 161, 411–417.
Strauch, K., Fimmers, R., Kurz, T., Deichmann, K.A., Wienker, T.F., Baur, M.P., 2000. Parametric and nonparametric multipoint linkage analysis with imprinting and two-locus-trait models: application to mite sensitization. Am. J. Hum. Genet. 66, 1945–1957.
Tuiskula-Haavisto, M., de Koning, D.J., Honkatukia, M., Schulman, N.F., Maki-Tanila, A., Vilkki, J., 2004. Quantitative trait loci with parent-of-origin effects in chicken. Genet. Res. 84, 57–66.
Wade, M.J., 1998. The evolutionary genetics of maternal effects. In: Mousseau, T.A., Fox, C.W. (Eds.), Maternal Effects as Adaptations. Oxford University Press, New York, pp. 5–21.
Wevrick, R., Francke, U., 1997. An imprinted mouse transcript homologous to the human imprinted in Prader–Willi syndrome (IPW) gene. Hum. Mol. Genet. 6, 325–332.
Xu, S., 1998. Mapping quantitative trait loci using multiple families of line crosses. Genetics 148, 517–524.
Zeng, Z.-B., 1994. Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.
Zou, F., Yandell, B.S., Fine, J.P., 2001. Statistical issues in the analysis of quantitative traits in combined crosses. Genetics 158, 1339–1346.