6. Definition of the Phenotype

John P. Rice,¹ Nancy L. Saccone, and Erik Rasmussen
Department of Psychiatry
Washington University School of Medicine
St. Louis, Missouri 63110
I. Summary
II. Introduction
III. The Benefits of a Narrowly Defined Disease Phenotype
IV. Endophenotypes and Quantitative Traits
V. The Impact of Diagnostic and Measurement Error
VI. Discussion
References
I. SUMMARY

Definition of the phenotype is a key issue in designing any genetic study whose goal is to detect disease genes. This chapter describes strategies to increase the power to detect susceptibility loci for complex diseases. A narrowly defined disease phenotype can offer advantages over broad definitions. Studies of clinical disease can also benefit from judicious selection of endophenotypes and related quantitative traits for analysis. The effect of diagnostic and measurement error is also discussed; power is maximized when strategies to reduce error are incorporated into a study design.
¹To whom correspondence should be addressed.
Advances in Genetics, Vol. 42. Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved. 0065-2660/01 $35.00
II. INTRODUCTION

To identify susceptibility loci for common, complex human diseases, researchers must first define the disease or phenotype of interest. Although genetic studies may lead to a molecular basis for disease definition, uncertainty in the clinical diagnosis, or confounders and measurement error for quantitative risk factors, may preclude the discovery of linkage.

A dichotomous disease phenotype is often of primary interest to clinical investigators. High heritability $h^2$ of the disease, defined as the ratio of genetic variance to total phenotypic variance, can indicate that direct linkage analysis of the disease phenotype may be fruitful. However, because a disease may be influenced by multiple loci, each of which makes only a small contribution, detection of any one locus may be difficult, even for clearly heritable diseases.
III. THE BENEFITS OF A NARROWLY DEFINED DISEASE PHENOTYPE

One option to counter the difficulties in analyzing common, complex, oligogenic diseases may be to narrow the disease definition or define subtypes for analysis. Such strategies aim to identify more severe, more “biological,” or early onset forms of illness that are perhaps due to one or few genes. Focusing on subtypes can also identify more homogeneous families for analysis. Successful applications of this approach include the subdivision of Alzheimer’s disease according to age of onset, which has led to identification of disease mutations involved in autosomal dominant, early onset forms, and to discovery of the association of the apoE ε4 allele with late-onset Alzheimer’s disease, as reviewed in Tilley et al. (1998).

An additional advantage of a “narrow-phenotype” approach is purely statistical: simulations have shown that the population prevalence of a trait can affect the ability to detect linkage, with more common diseases requiring larger sample sizes for detection in a sibpair study (Rice et al., 2000). Oligogenic traits were generated using the models of Suarez et al. (1994), assuming equal action of the multiple loci. The results in Table 6.1 show the increased power for a disease of 1% prevalence compared to a disease of 10% prevalence, for a fixed heritability $h^2$ and a fixed number of trait loci. This effect is a reflection of the comparative gene frequencies of disease alleles segregating when affected sibpairs are sampled. The advantage of increased ability to detect linkage for less prevalent diseases, however, is somewhat counteracted by the practical difficulty of ascertaining for a rarer phenotype.
Table 6.1. Simulation Results for Oligogenic Modelsᵃ

                         Heritability, $h^2$ (%)
Number of loci       100        75        50        25

10% Prevalence
      1               59        59        70       131
      2              159       251       389       983
      4              669     1,106     1,815     4,112
      6            1,510     2,592     6,264     9,048
      8            3,094     3,740    12,502    20,360
     10            4,926     7,645    20,075    63,078

1% Prevalence
      1               42        42        42        43
      2               62       160       154       163
      4              192       369       466       870
      6              449       685       834     2,012
      8              711     1,149     1,802     4,438
     10            1,357     1,720     2,624     7,118

ᵃSample size N (= number of affected sibpairs) required to detect linkage at a significance level of $\alpha = 0.0001$ and 80% power.
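To make the prevalence effect in Table 6.1 concrete, the short sketch below computes how many times more affected sibpairs the 10% prevalence model requires than the 1% prevalence model. It is only an arithmetic illustration: the numbers are transcribed from the 100% and 25% heritability columns of the table, and the names `LOCI`, `N_10PCT`, `N_1PCT`, and `sample_size_ratios` are hypothetical helpers introduced here, not part of the original analysis.

```python
# Number of trait loci for each row of Table 6.1.
LOCI = [1, 2, 4, 6, 8, 10]

# Required affected sibpairs, transcribed from Table 6.1
# (keys are heritability in percent; only two columns are used here).
N_10PCT = {
    100: [59, 159, 669, 1510, 3094, 4926],
    25: [131, 983, 4112, 9048, 20360, 63078],
}
N_1PCT = {
    100: [42, 62, 192, 449, 711, 1357],
    25: [43, 163, 870, 2012, 4438, 7118],
}


def sample_size_ratios(h2_percent):
    """Ratio N(10% prevalence) / N(1% prevalence) for each number of loci."""
    return [n10 / n1 for n10, n1 in zip(N_10PCT[h2_percent], N_1PCT[h2_percent])]


for h2_percent in (100, 25):
    for loci, ratio in zip(LOCI, sample_size_ratios(h2_percent)):
        print(f"h2 = {h2_percent:3d}%, loci = {loci:2d}: "
              f"10% prevalence needs {ratio:.2f}x the sibpairs")
```

Every ratio exceeds 1 in these two columns, which is the pattern the text describes: for fixed heritability and number of loci, the rarer phenotype requires fewer affected sibpairs.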
IV. ENDOPHENOTYPES AND QUANTITATIVE TRAITS

The study of clinical disease phenotypes can also be enhanced by investigating related quantitative traits. Complex diseases may be influenced by several genes of small effect that are difficult to detect by direct analysis of the clinical phenotype. However, it may be possible to detect such genes if they have a major effect on related traits under study. Such associated biological traits, called endophenotypes or risk factors, offer several advantages. Often these endophenotypes are quantitative measures subject to minimal or quantifiable measurement error. Both unaffected and affected individuals may be measured and included in analysis. A quantitative phenotype provides a range of values with potentially more information than a discrete or threshold scale, and a well-chosen endophenotype may be more “biological” than a clinical diagnosis, and more directly tied to gene expression.

Finally, there are quantitative phenotypes that are important to study in their own right, such as adiposity, fat distribution, and blood pressure, even as
opposed to obesity and hypertension. There is considerably more information for linkage in quantitative variation than for an arbitrarily discretized trait (Duggirala et al., 1997).

For example, plasma cholesterol levels are quantitative indicators of risk for coronary heart disease (CHD) and have been found to be significantly associated with genotypes at candidate genes such as apoE (Boerwinkle and Sing, 1987; Kaprio et al., 1991; Kamboh et al., 1995). Further refinements on the basis of more precise knowledge of the component phenotypes or metabolic processes also can be useful. For example, the study of genes underlying CHD risk can be extended by studying the various fractions of total cholesterol, namely low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), and triglycerides, which have heterogeneous effects on disease risk. In fact, the identification of quantitative trait loci (QTLs) will be easiest for quantitative phenotypes most proximal to the genotype, simply because the relative contribution of a single major locus will be greater. Thus, a more informative and successful study may consist of a search for genes affecting apolipoprotein B and AI levels rather than LDL-c and HDL-c, respectively. This strategy of refining the quantitative phenotype is analogous to that of using narrowly defined disease phenotypes discussed earlier.

Human event-related potentials (ERPs) have been studied as possible endophenotypes for psychiatric diseases. The P50 sensory gating response appears promising in genetic studies of schizophrenia (Freedman et al., 1997). Schizophrenic probands and their first-degree relatives exhibit reduced suppression, or nongating, of the P50 auditory-evoked response when presented with repeated auditory stimuli. Neurobiological studies of the P50 phenotype in rodents and humans strongly suggest the response is mediated by the α7-nicotinic cholinergic receptor gene (CHRNA7) on chromosome 15 (Freedman et al., 1997).
Freedman et al. (1997) conducted a genome-wide scan for nongating QTLs, using the P50 response and schizophrenia as phenotypes, in nine Caucasian pedigrees of European ancestry in which schizophrenia was present in at least two members of a family. For the P50 phenotype, the greatest lod score of 5.30 ($\theta = 0.0$, $P < 0.001$) was obtained with D15S1360, < 120 kb from the first exon of CHRNA7.

Another example of a proposed endophenotype is the P3 component of human ERPs, which shows reduced amplitude in alcoholics even after long-term abstinence (Porjesz et al., 1998). Quantitative linkage analysis has given evidence of several loci linked to the amplitude of the P3 component of the ERP (Begleiter et al., 1998). These loci may lead to candidate genes and further understanding of genetic factors underlying susceptibility to alcohol dependence.

Care should be exercised in selecting and pursuing endophenotypes as a means for studying clinical disease. Platelet monoamine oxidase (MAO)
activity has been suggested as an endophenotype for alcoholism, based on early reports of association between alcohol dependence and low enzyme activity. MAO activity has furthermore been shown to be heritable, and it exhibits a commingled distribution that suggests a major gene for activity level. However, recent findings (Whitfield et al., 2000) indicate that the association between MAO activity and alcohol dependence is most likely explained by the confounding effect of cigarette smoking. While MAO activity may still warrant genetic study in its own right (Saccone et al., 1999), it is no longer expected to shed light directly on the genetics of alcoholism susceptibility.
V. THE IMPACT OF DIAGNOSTIC AND MEASUREMENT ERROR

We now return to the dichotomous trait setting and examine the impact of diagnostic error on the heritability $h^2$ and risk ratio $\lambda$ of a dichotomous polygenic trait. We assume that a trait is determined by an underlying liability scale, which in turn is determined by the additive effects of many genes. Individuals above a set threshold are affected; this threshold determines the true prevalence $K$ of the trait. The joint distribution of liability within a family is assumed to be multivariate normal, with familial resemblance given by the correlation in liability between family members. Assuming no dominance, the correlation between first-degree relatives is one-half the heritability.

Let $s$ be the sensitivity of diagnosis (the probability of correctly diagnosing a true case) and let $t$ be the specificity (the probability of correctly diagnosing a noncase). We will see that reduced specificity, in particular, has a significant effect on the power to detect genes.

Recall that $K$ is the true prevalence of the trait, and let $K_R$ represent the (true) risk to a first-degree relative of a true case. Note that the observed prevalence of the trait is given by

$$K^{*} = sK + (1 - t)(1 - K).$$

If all probands are in fact true cases, the observed rate $K_R^{*}$ in relatives is similarly defined (that is, with $K_R$ in place of $K$). However, assuming that probands include false positives, the observed rate in relatives is

$$K_R^{**} = K_R^{*}\,\frac{sK}{K^{*}} + (1 - t)(1 - K).$$

To illustrate the significance of the preceding formulas, consider the case of a threshold set to yield a true prevalence $K$ of 10%. Setting the true heritability to be 100%, the risk $K_R$ of a first-degree relative being a true case is 32.4% (Rice et al., 1987). Thus the true risk ratio is $\lambda = K_R/K = 3.24$.

Now consider the effect when the specificity $t$ is held fixed and the sensitivity $s$ is reduced. If $t = 1.0$ and $s = 0.95$, then the observed prevalence $K^{*} = 0.095$, $h^2$ is 97%, and the observed $\lambda = K_R^{**}/K^{*} = 3.24$.
If $t = 1.0$ and $s = 0.90$, then the observed prevalence $K^{*} = 0.090$, $h^2$ is 94%, and the observed $\lambda$ is still 3.24. Hence there is only a minor impact of reduced sensitivity on the observed heritability and risk ratio.
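The quantities in this section can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it writes $K_R$ for the true risk to a first-degree relative of a true case and $\lambda$ for the risk ratio, computes $K_R$ from the liability-threshold model by integrating the bivariate normal with Simpson's rule, and then applies the diagnostic-error formulas. The function names (`relative_risk_true`, `observed_rates`) are hypothetical, and the expression used for the "similarly defined" observed rate in relatives of true cases, $sK_R + (1 - t)(1 - K_R)$, is an assumption about what the text intends.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability is standard normal in the population


def relative_risk_true(K, h2, steps=2000, upper=10.0):
    """True risk K_R to a first-degree relative of a true case.

    Assumes the liability-threshold model with no dominance, so the
    liability correlation between first-degree relatives is h2 / 2.
    `steps` must be even (composite Simpson's rule).
    """
    rho = h2 / 2.0
    threshold = STD_NORMAL.inv_cdf(1.0 - K)   # liability cutoff giving prevalence K
    cond_sd = (1.0 - rho * rho) ** 0.5        # SD of relative's liability given proband's

    def integrand(x):
        # density of proband liability x times P(relative affected | x)
        p_relative = 1.0 - STD_NORMAL.cdf((threshold - rho * x) / cond_sd)
        return STD_NORMAL.pdf(x) * p_relative

    width = (upper - threshold) / steps
    total = integrand(threshold) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(threshold + i * width)
    joint = total * width / 3.0               # P(proband and relative both affected)
    return joint / K


def observed_rates(K, K_R, s, t):
    """Return (K*, K_R**, observed lambda) for sensitivity s and specificity t."""
    K_obs = s * K + (1 - t) * (1 - K)                  # observed prevalence K*
    K_R_obs = s * K_R + (1 - t) * (1 - K_R)            # assumed form of K_R*
    K_R_obs_fp = K_R_obs * s * K / K_obs + (1 - t) * (1 - K)   # K_R**
    return K_obs, K_R_obs_fp, K_R_obs_fp / K_obs


K = 0.10
K_R = relative_risk_true(K, h2=1.0)
print(f"true K_R = {K_R:.3f}, true lambda = {K_R / K:.2f}")
for s, t in [(0.95, 1.0), (0.90, 1.0)]:
    K_obs, _, lam = observed_rates(K, K_R, s, t)
    print(f"s = {s:.2f}, t = {t:.2f}: K* = {K_obs:.3f}, observed lambda = {lam:.2f}")
```

With $K = 0.10$ and $h^2 = 100\%$, the integral should come out close to the 32.4% quoted in the text. When $t = 1$, the formulas reduce to $K_R^{**} = sK_R$ and $K^{*} = sK$, so the observed $\lambda$ equals the true $\lambda$ regardless of $s$. The same function accepts $t < 1$ for exploring reduced specificity.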
In contrast, if sensitivity is fixed at $s = 1$ and specificity is $t = 0.95$, then $K^{*} = 0.145$, $h^2 = 68\%$, and the observed $\lambda = K_R^{**}/K^{*} = 2.01$. If specificity is further reduced to $t = 0.90$, we find that $K^{*} = 0.190$, $h^2$ is 50%, and the observed $\lambda = 1.56$. The inclusion of false positives (reduced specificity) has a major impact. A specificity of 90% reduces $h^2$ and $\lambda$ by 50% or more, and would have a dramatic effect in reducing power compared to the situation of no diagnostic error.

A more detailed table of the effects of varying degrees of reduced sensitivity and specificity appears in Rice et al. (2000). However, the lesson is clear from the foregoing examples, which underscore the necessity of making clinical diagnoses as carefully as possible. Incorporating repeated measures into a study design may be a useful strategy to reduce error.

Similarly, measurement error of quantitative traits can reduce the power to detect and localize QTLs. However, various simple strategies can be used to maximize the signal-to-noise ratio. First, the average of multiple measurements can be used to minimize measurement error; this is commonly done in studies of blood pressure. Second, the effect of known or suspected confounders can be controlled for by regressing out the effects of the predictor variables on the phenotype. The resulting adjusted phenotype should reflect much reduced “noise” variance.
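The two strategies for quantitative traits, averaging repeated measurements and regressing out a confounder, can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an analysis from the chapter: each simulated phenotype is the sum of a genetic value, one confounder, and independent measurement error, all with unit variance, and the names `pearson`, `residualize`, and `simulate` are hypothetical. The correlation between the phenotype and the genetic value serves as a simple stand-in for the signal-to-noise ratio.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residualize(y, covariate):
    """Residuals of y after least-squares regression on a single covariate."""
    n = len(y)
    mc, my = sum(covariate) / n, sum(y) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(covariate, y))
             / sum((c - mc) ** 2 for c in covariate))
    return [v - my - slope * (c - mc) for c, v in zip(covariate, y)]


def simulate(n_people=5000, n_repeats=4, seed=1):
    """Correlation of phenotype with genetic value under three designs."""
    rng = random.Random(seed)
    genetic = [rng.gauss(0, 1) for _ in range(n_people)]
    confounder = [rng.gauss(0, 1) for _ in range(n_people)]  # assumed measured covariate

    def measure(i):
        # one noisy measurement: genetic value + confounder effect + error
        return genetic[i] + confounder[i] + rng.gauss(0, 1)

    single = [measure(i) for i in range(n_people)]
    averaged = [sum(measure(i) for _ in range(n_repeats)) / n_repeats
                for i in range(n_people)]
    adjusted = residualize(averaged, confounder)
    return (pearson(single, genetic),
            pearson(averaged, genetic),
            pearson(adjusted, genetic))


r_single, r_averaged, r_adjusted = simulate()
print(f"single measurement:      r = {r_single:.2f}")
print(f"mean of 4 measurements:  r = {r_averaged:.2f}")
print(f"mean, confounder removed: r = {r_adjusted:.2f}")
```

Under these assumed variances, averaging four measurements cuts the error variance to one quarter, and removing the confounder eliminates its variance entirely, so the correlation with the genetic value should rise at each step.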
VI. DISCUSSION

In contrast to Mendelian phenotypes, the effect sizes of genes for complex phenotypes are unknown. The sample sizes in Table 6.1 range from 59 to 63,078 affected sibpairs for a disease with 10% prevalence. Even once a linkage has been detected, the identification of the disease gene may be problematic because there is a wide support interval for the linkage signal.

It is clear that phenotype definition can play a key role in gene discovery. As noted earlier, the use of a narrowly defined phenotype or a refined quantitative phenotype can lead to a dramatic increase in power. Moreover, the elimination of false positive cases or adjustment for confounders can have a similar benefit.

Similar arguments pertain to a quantitative endophenotype. The statistical power for the detection of genes associated with a quantitative measure may be high, even though that gene is a minor susceptibility gene for the disease phenotype of interest. Thus attention should be given not only to the definition of the disease phenotype, but also to related phenotypes.

Phenotype definition should be considered along with sampling strategy and analytic procedures in the design of a study. The power depends on all these factors as well as the true underlying state of nature. A phenotype
with high measurement error or marked heterogeneity is likely to be problematic.
Acknowledgments

This work was supported in part by grants MH37685, MH31302, AA12239, and MH17104 (NLS, ER) from the National Institutes of Health. Special thanks to Christine Roark for preparation of this manuscript.
References

Begleiter, H., Porjesz, B., Reich, T., Edenberg, H. J., Goate, A., Blangero, J., Almasy, L., Foroud, T., Van Eerdewegh, P., Polich, J., Rohrbaugh, J., Kuperman, S., Bauer, L. O., O’Connor, S. J., Chorlian, D. B., Li, T.-K., Conneally, P. M., Hesselbrock, V., Rice, J. P., Schuckit, M. A., Cloninger, R., Nurnberger, J., Jr., Crowe, R., and Bloom, F. E. (1998). Quantitative trait loci analysis of human event-related brain potentials: P3 voltage. Electroencephalogr. Clin. Neurophysiol. 108, 244-250.

Boerwinkle, E., and Sing, C. F. (1987). The use of measured genotype information in the analysis of quantitative phenotypes in man. III. Simultaneous estimation of the frequencies and effects of the apolipoprotein E polymorphism and residual polygenetic effects on cholesterol, betalipoprotein and triglyceride levels. Ann. Hum. Genet. 51, 211-226.

Duggirala, R., Williams, J. T., Williams-Blangero, S., and Blangero, J. (1997). A variance component approach to dichotomous trait linkage analysis using a threshold model. Genet. Epidemiol. 14, 987-992.

Freedman, R., Coon, H., Myles-Worsley, M., Orr-Urtreger, A., Olincy, A., Davis, A., Polymeropoulos, M., Holik, J., Hopkins, J., Hoff, M., Rosenthal, J., Waldo, M., Reimherr, F., Wender, P., Yaw, J., Young, D., Breese, C., Adams, C., Patterson, D., Adler, L., Kruglyak, L., Leonard, S., and Byerley, W. (1997). Linkage of a neuropsychological deficit in schizophrenia to a chromosome 15 locus. Proc. Natl. Acad. Sci. USA 94, 587-592.

Kamboh, M. I., Evans, R. W., and Aston, C. E. (1995). Genetic effect of apolipoprotein(a) and apolipoprotein E polymorphisms on plasma quantitative risk factors for coronary heart disease in American black women. Atherosclerosis 117, 73-81.

Kaprio, J., Ferrell, R. E., Kottke, B. A., Kamboh, M. I., and Sing, C. F. (1991). Effects of polymorphisms in apolipoproteins E, A-IV, and H on quantitative traits related to risk for cardiovascular disease. Arterioscler. Thromb. 11, 1330-1348.

Kardia, S. L., Haviland, M. B., Ferrell, R. E., and Sing, C. F. (1999). The relationship between risk factor levels and presence of coronary artery calcification is dependent on apolipoprotein E genotype. Arterioscler. Thromb. Vasc. Biol. 19, 427-435.

Porjesz, B., Begleiter, H., Reich, T., Van Eerdewegh, P., Edenberg, H. J., Foroud, T., Goate, A., Litke, A., Chorlian, D. B., Stimus, A., Rice, J., Blangero, J., Almasy, L., Sorbell, J., Bauer, L. O., Kuperman, S., O’Connor, S. J., and Rohrbaugh, J. (1998). Amplitude of visual P3 event-related potential as a phenotypic marker for a predisposition to alcoholism: Preliminary results from the COGA Project. Alcohol Clin. Exp. Res. 22, 1317-1323.

Rice, J. P., Endicott, J., Knesevich, M. A., and Rochberg, N. (1987). The estimation of diagnostic sensitivity using stability data: An application to major depressive disorder. J. Psychiat. Res. 21, 337-345.

Rice, J. P., Saccone, N. L., and Suarez, B. K. (2000). The design of studies for investigating linkage and association. In “Analysis of Multifactorial Disease” (T. Bishop and P. Sham, eds.). BIOS, Oxford. In press.

Saccone, N. L., Rice, J. P., Rochberg, N., Goate, A., Reich, T., Shears, S., Wu, W., Nurnberger, J. I., Foroud, T., Edenberg, H. J., and Li, T. K. (1999). Genome screen for platelet monoamine oxidase (MAO) activity. Am. J. Med. Genet. 88, 517-521.

Suarez, B. K., Hampe, C. L., and Van Eerdewegh, P. (1994). Problems of replicating linkage claims in psychiatry. In “Genetic Approaches to Mental Disorders” (E. S. Gershon and C. R. Cloninger, eds.), pp. 23-46. American Psychiatric Press, Inc., Washington, D.C.

Tilley, L., Morgan, K., and Kalsheker, N. (1998). Genetic risk factors in Alzheimer’s disease. J. Clin. Pathol. Mol. Pathol. 51, 293-304.

Whitfield, J. B., Pang, D., Bucholz, K. K., Madden, P. A. F., Heath, A. C., Statham, D. J., and Martin, N. G. (2000). Monoamine oxidase: Associations with alcohol dependence, smoking, and other measures of psychopathology and personality. Psychol. Med. 30, 443-454.