Computational Biology and Chemistry 36 (2012) 13–14
Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/compbiolchem
Brief communication
GADS software for parametric linkage analysis of quantitative traits distributed as a point-mass mixture
Tatiana I. Axenovich ∗, Irina V. Zorkoltseva
Institute of Cytology & Genetics, Siberian Division, Russian Academy of Sciences, Novosibirsk 630090, Russia
Article info
Article history: Received 21 October 2011; Accepted 25 November 2011
Keywords: Segregation analysis; Parametric linkage; Large pedigree; Quantitative traits; Point-mass mixture; Elston–Stewart algorithm; Distribution with spike
Abstract
Quantitative data from proteomics and metabolomics studies often have an irregular distribution with a spike. None of the widely used methods for human QTL mapping is applicable to such traits, so researchers have to reduce the sample by excluding the spike and analyze only the continuous measurements. In this study, we propose a method for parametric linkage analysis of traits with a spike in the distribution, together with GADS, the software that implements it. GADS includes not only programs for parametric linkage analysis but also a program for complex segregation analysis, which estimates the model parameters used in the linkage analysis. We tested our method on real data on the vertical cup-to-disc ratio, an important characteristic of the optic disc associated with glaucoma, in a large pedigree from a Dutch isolated population. GADS identified a significant linkage signal on chromosome 6, whereas the analysis of only the normally distributed part of the sample showed just a suggestive linkage peak on this chromosome. GADS is freely available at http://mga.bionet.nsc.ru/soft/index.html.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Data combining continuous and point-mass components are common in proteomics and metabolomics. A spike in the distribution of a quantitative trait arises when the protein or metabolite is absent from the sample or is present at a concentration too low to be detected by the instrument (Dakna et al., 2010). The distribution of such data is characterized by the proportion of observations in the point mass (spike) at zero and by the distribution of the non-negative continuous component. In other words, these data carry information in both a binary and a continuous component.
A spike in the distribution of a quantitative trait complicates statistical analysis. Usually, researchers exclude the point-mass observations and analyze only the continuous measurements. However, this approach discards genetic information and reduces power. It is therefore important to develop methods that analyze the binary and continuous components simultaneously. A general approach to the simultaneous analysis of these components was proposed by Broman (2003), who assumed that a QTL contributes both to the probability of a point-mass observation and to the value of a continuous
measurement. This model of trait inheritance was incorporated into parametric two-point interval mapping (Broman, 2003), multiple interval mapping (Li and Chen, 2009), and composite interval mapping (Taylor and Pollard, 2010). All of these methods were developed for experimental crosses. Here, we incorporate Broman's approach into the parametric linkage analysis of pedigree data, making it applicable to unrestricted large human pedigrees.

∗ Corresponding author at: Institute of Cytology & Genetics, Siberian Division, Russian Academy of Sciences, Lavrentiev Av., 10, Novosibirsk 630090, Russia. Tel.: +7 383 363 4944; fax: +7 383 333 1278. E-mail address: [email protected] (T.I. Axenovich).

1476-9271/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2011.11.004

2. Algorithm and implementation
The likelihood of an arbitrary pedigree can be written in the general form

$$L_H = \sum_{G} P(X \mid G)\, P(G), \qquad (1)$$

where X is a matrix of observed measurements for all pedigree members, G is a matrix of their unobserved genotypes, and the summation is performed over all possible genotypic configurations (Elston and Stewart, 1971). For a pedigree with n members, Expression (1) may be rewritten as

$$L_H = \sum_{g_1} \cdots \sum_{g_n} \prod_{i} \Pr(g_i) \prod_{j} \Pr(g_j \mid g_{m_j}, g_{f_j}) \prod_{k} \Pr(x_k \mid g_k), \qquad (2)$$
where $\Pr(g)$ is the population frequency of genotype g; $\Pr(g \mid g_m, g_f)$ is the transmission probability of offspring genotype g given parental genotypes $g_m$ and $g_f$; $\Pr(x \mid g)$ is the probability of the observed measurements x given genotype g; and i, j, and k index the pedigree founders, offspring, and measured pedigree members, respectively.
The last component of Expression (2), $\Pr(x \mid g)$, is defined by the model of trait inheritance. We used the mixed model of quantitative trait inheritance, which assumes that a quantitative trait is controlled by a QTL and by a large number of additively acting genetic (polygenic) and environmental factors. The effects of these three components (QTL, polygenic, and environmental) are assumed to be independent of each other. We assume a diallelic autosomal QTL (alleles A and B) that determines, for every genotype, the probability of a point-mass observation ($w_{AA}$, $w_{AB}$, and $w_{BB}$) and the genotypic mean of the continuous measurement ($\mu_{AA}$, $\mu_{AB}$, and $\mu_{BB}$) (Broman, 2003). The random environmental and polygenic effects are assumed to be normally distributed with mean zero and variances $\sigma_e^2$ and $\sigma_G^2$, respectively. Hence, $\Pr(x \mid g) = w_g$ for point-mass observations and $\Pr(x \mid g) = (1 - w_g)\, f_x(\mu_g, \sigma_e^2 + \sigma_G^2)$ for continuous measurements, where $f_x(\mu, \sigma^2)$ is the density at x of a normal distribution with mean $\mu$ and variance $\sigma^2$.
Earlier, we developed MQScore SNP, an algorithm and software for the parametric linkage analysis of quantitative traits (Axenovich and Aulchenko, 2010). It implements several algorithms that avoid computational underflow and decrease running time, which permits the analysis of very large pedigrees collected in genetically isolated human populations. GADS is a modification of MQScore SNP adapted to quantitative traits with a spike in the distribution. GADS can calculate the likelihood of arbitrarily large pedigrees without loops under the mixed QTL and polygene model of trait inheritance. The QTL is assumed to be located either near each marker locus (program GADS 2point) or between two marker loci (program GADS 3point).
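To make Expression (2) concrete, the sketch below enumerates every genotype configuration of a small nuclear family and multiplies the three kinds of factors defined above: founder genotype probabilities, Mendelian transmission probabilities, and the point-mass mixture probability $\Pr(x \mid g)$. It is a minimal illustration under stated assumptions, not the GADS implementation. The family, phenotype values, parameter values, and function names are hypothetical; a spike observation is coded as 0.0 and an unmeasured member as None; founder probabilities assume Hardy–Weinberg equilibrium; and the combined variance $\sigma_e^2 + \sigma_G^2$ is used as a residual variance exactly as in the formula for $\Pr(x \mid g)$, so the correlation that polygenes induce between relatives is not modeled here.

```python
import itertools
import math

GENOTYPES = ("AA", "AB", "BB")


def founder_prob(g, q):
    """Pr(g) for a founder, assuming Hardy-Weinberg equilibrium; q = frequency of allele A."""
    p = 1.0 - q
    return {"AA": q * q, "AB": 2.0 * q * p, "BB": p * p}[g]


def transmission_prob(g, gm, gf):
    """Mendelian Pr(g | gm, gf) for a diallelic autosomal locus."""
    a_m = gm.count("A") / 2.0  # chance that the mother transmits allele A
    a_f = gf.count("A") / 2.0  # chance that the father transmits allele A
    return {"AA": a_m * a_f,
            "AB": a_m * (1.0 - a_f) + (1.0 - a_m) * a_f,
            "BB": (1.0 - a_m) * (1.0 - a_f)}[g]


def penetrance(x, g, w, mu, var_e, var_g):
    """Pr(x | g): w_g for a spike (x == 0.0), (1 - w_g) * normal density otherwise.
    x is None for an unmeasured member, which contributes a factor of 1."""
    if x is None:
        return 1.0
    if x == 0.0:
        return w[g]
    var = var_e + var_g
    density = math.exp(-(x - mu[g]) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return (1.0 - w[g]) * density


def pedigree_likelihood(members, phen, q, w, mu, var_e, var_g):
    """Expression (2) by brute force over all 3**n genotype configurations.
    members maps each name to (mother, father), or to None for a founder."""
    names = list(members)
    total = 0.0
    for config in itertools.product(GENOTYPES, repeat=len(names)):
        g = dict(zip(names, config))
        term = 1.0
        for name in names:
            parents = members[name]
            if parents is None:
                term *= founder_prob(g[name], q)
            else:
                mother, father = parents
                term *= transmission_prob(g[name], g[mother], g[father])
            term *= penetrance(phen.get(name), g[name], w, mu, var_e, var_g)
        total += term
    return total


# Hypothetical nuclear family, phenotypes, and parameter values (not from the paper).
FAMILY = {"mother": None, "father": None,
          "child1": ("mother", "father"), "child2": ("mother", "father")}
PHEN = {"mother": 0.0, "father": 0.41, "child1": 0.35, "child2": 0.0}
PARAMS = dict(q=0.3,
              w={"AA": 0.40, "AB": 0.20, "BB": 0.05},
              mu={"AA": 0.25, "AB": 0.30, "BB": 0.40},
              var_e=0.010, var_g=0.005)

if __name__ == "__main__":
    print(pedigree_likelihood(FAMILY, PHEN, **PARAMS))
```

Brute-force enumeration costs 3^n terms for n members, which is why programs for large pedigrees avoid it; the Elston–Stewart algorithm (Elston and Stewart, 1971) instead sums out genotypes one nuclear family at a time. Calling `pedigree_likelihood(FAMILY, {}, **PARAMS)` with no phenotypes returns 1 up to rounding, a convenient sanity check.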
The model parameters (the frequency of the QTL allele, the three probabilities of a point-mass observation, the three genotypic means, the polygenic effect, and the random component of the trait variance) may be specified by the user or fixed at the estimates obtained in complex segregation analysis. The program GADS segr, which performs complex segregation analysis, is included in the GADS package. Note that GADS cannot be applied directly to pedigrees that contain loops. After the loops are broken, however, our software can compute a likelihood approximation based on Stricker's algorithm (Stricker et al., 1996). We recommend the LOOP EDGE program (Axenovich et al., 2008) for optimal loop breaking in large complex pedigrees.
3. Example
We used GADS in our study (Axenovich et al., 2011) to identify new loci that control the vertical cup-to-disc ratio (VCDR), an important characteristic of the optic disc associated with glaucoma.
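In complex segregation analysis, a restricted hypothesis is compared with the unrestricted one by a likelihood ratio, and the statistic is referred to a χ² distribution. The sketch below shows that computation under an assumption of ours that the text does not state: each restriction forces three genotype-specific parameters to share one value and therefore has 2 degrees of freedom. For 2 degrees of freedom the χ² upper-tail probability has the closed form exp(-χ²/2), and with that assumption the χ² values reported for the VCDR example give p-values of about 1.7 × 10⁻¹⁸ and 9.0 × 10⁻⁶, close to the reported ones. The function names are hypothetical.

```python
import math


def lrt_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood-ratio statistic 2 * (ln L_unrestricted - ln L_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Chi-square values reported in the VCDR example; 2 df is an assumption.
    for label, stat in [("equal spike probabilities", 81.88),
                        ("equal genotypic means", 23.24)]:
        print(f"{label}: chi2 = {stat}, p = {chi2_sf_2df(stat):.2g}")
```

For other numbers of degrees of freedom the closed form no longer holds, and a general χ² survival function (for example `scipy.stats.chi2.sf`) would be needed.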
We analyzed 2248 genotyped and phenotyped individuals from a large pedigree comprising 23 612 members across 18 generations. VCDR was estimated as zero for 283 pedigree members. Complex segregation analysis confirmed that the QTL significantly affects both the binary and the continuous components of the trait: the restricted hypotheses of equal spike probabilities and of equal genotypic means for all genotypes were significantly worse than the unrestricted hypothesis ($\chi^2 = 81.88$, $p = 1.6 \times 10^{-18}$ and $\chi^2 = 23.24$, $p = 9.0 \times 10^{-6}$, respectively). The estimates of the model parameters obtained by complex segregation analysis were used in parametric linkage analysis, which identified one significant and five suggestive genome-wide linkage signals for VCDR treated as a point-mass mixture. The strongest linkage signal was on chromosome 6q23–q24 (LOD = 3.33). To assess the effect of analyzing the spike and the normally distributed values simultaneously, we performed a parametric linkage analysis of the restricted sample containing only the continuous VCDR measurements and obtained LOD = 3.0 at 6q23–q24. This result supports the usefulness of analyzing point-mass observations and continuous measurements simultaneously for QTL mapping. For details of the VCDR analysis, see Axenovich et al. (2011).
In conclusion, we describe software for the multipoint parametric linkage analysis of quantitative traits distributed with a spike. The software can be applied to large pedigrees with thousands of members. The analysis of real data distributed as a point-mass mixture demonstrated the usefulness of the software for QTL mapping. GADS is available at http://mga.bionet.nsc.ru/soft/index.html.
Acknowledgements
We thank Anatoly Kirichenko for technical support. This work was supported by the Russian Foundation for Basic Research and by Programs of the Russian Academy of Sciences.
References
Axenovich, T.I., Aulchenko, Y.S., 2010. MQScore SNP software for multipoint parametric linkage analysis of quantitative traits in large pedigrees. Ann. Hum. Genet. 74, 286–289.
Axenovich, T.I., Zorkoltseva, I.V., Liu, F., Kirichenko, A.V., Aulchenko, Y.S., 2008. Breaking loops in large complex pedigrees. Hum. Hered. 65, 57–65.
Axenovich, T.I., Zorkoltseva, I.V., Belonogova, N.M., van Koolwijk, L., Borodin, P.M., Kirichenko, A.V., Babenko, V.N., Ramdas, W.D., Amin, N., Despriet, D., Vingerling, J., Lemij, H., Oostra, B.A., Klaver, C., Aulchenko, Y.S., van Duijn, C.M., 2011. Linkage and association analyses of glaucoma related traits in a large pedigree from a Dutch genetically isolated population. J. Med. Genet. 48, 802–809.
Broman, K.W., 2003. Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163, 1169–1175.
Dakna, M., Harris, K., Kalousis, A., Carpentier, S., Kolch, W., Schanstra, J.P., Haubitz, M., Vlahou, A., Mischak, H., Girolami, M., 2010. Addressing the challenge of defining valid proteomic biomarkers and classifiers. BMC Bioinformatics 11, 594, doi:10.1186/1471-2105-11-594.
Elston, R.C., Stewart, J., 1971. A general model for the genetic analysis of pedigree data. Hum. Hered. 21, 523–542.
Li, W., Chen, Z., 2009. Multiple-interval mapping for quantitative trait loci with a spike in the trait distribution. Genetics 182, 337–342.
Stricker, C., Fernando, R.L., Elston, R.C., 1996. An algorithm to approximate the likelihood for pedigree data with loops by cutting. Theor. Appl. Genet. 91, 1054–1063.
Taylor, S.L., Pollard, K.S., 2010. Composite interval mapping to identify quantitative trait loci for point-mass mixture phenotypes. Genet. Res. (Camb.) 92, 39–53.