J. psychiat. Res., Vol. 26, No. 4, pp. 329-344, 1992
Printed in Great Britain
0022-3956/92 $5.00 + .00
Pergamon Press Ltd

EXTENDING THE PHENOTYPE OF SCHIZOPHRENIA: IMPLICATIONS FOR LINKAGE ANALYSIS

STEVEN MATTHYSSE* and JOSEF PARNAS†

*McLean Hospital, Belmont, MA 02178, and Harvard Medical School, U.S.A., and †Hvidovre Hospital and Copenhagen University, Denmark
Summary: On the basis of simulation studies, we suggest that most existing designs for studying linkage in schizophrenia do not have sufficient power to detect a major contributing locus, even if one is present. For this reason, recent failures to replicate reports of linkage in schizophrenia are not surprising. Inclusion in the linkage design of phenotypes genetically related to schizophrenia, but more commonly found in the relatives of schizophrenics than schizophrenia itself, may increase the power of linkage analysis substantially.
1. The Problem

The last few years have brought a worrisome series of disconfirmations of much-celebrated findings of genetic linkage in psychiatric disorders. For instance, the report by Sherrington et al. (1988) on linkage between schizophrenia and a marker on chromosome 5 has not been confirmed in other samples (Kaufmann, DeLisi, Lehner, & Gilliam, 1989; Kennedy et al., 1988; St Clair et al., 1989), not even by the original investigators (Detera-Wadleigh et al., 1989). The linkage reported by Egeland et al. (1987) between bipolar affective disorder and a locus on chromosome 11 in a large Amish pedigree was not detected in other studies (Hodgkinson et al., 1987) and was later retracted (Kelsoe et al., 1989). Reviewing the failure to replicate the chromosome 5-schizophrenia linkage relationship, Detera-Wadleigh et al. (1989) suggest:

Taking all the findings into account, we can tentatively consider schizophrenia as aetiologically heterogeneous, although other interpretations are possible. Genetic heterogeneity in affective disorders has also been proposed to account for disparities in linkage results. The resolution of the varying linkage results may have to await study of a large series of families (p. 393).

On the other hand, Risch (1990a) argues that

a stranger to the field of genetics might at first be bewildered by the explanation offered for such inconsistency: genetic heterogeneity. In other words, non-replication of an initial linkage finding by a second investigator does not confer evidence against the initial positive report, but rather greater evidence for genetic causation in terms of a second locus (p. 4).

If linkage analyses in psychiatry are indeed “a bubbling cauldron of false positives” (Matthysse, 1990), a choice among three alternative interpretations presses itself on us: (a) The methods that have been used to assess linkage in psychiatric disorders do not have a reasonable probability of success, even if linkage is present. (b) The methods that have been used are appropriate; it is simply necessary to work with more and larger samples. (c) The methods and the sample sizes are adequate; the message, if we are brave enough to face it, is that there are no major genes to be found.
Based on the analyses reported below, we find (a) the most likely alternative. That does not mean, of course, that a major gene will be found; it means only that we do not find the failure of linkage analyses to date to be compelling evidence for rejecting a major gene hypothesis. If the generally accepted monozygotic/dizygotic (MZ/DZ) concordance ratios for schizophrenia are correct, a single-gene model of schizophrenia cannot be fully adequate, because (when the risk to siblings is close to the risk to offspring, as is the case for schizophrenia) the MZ concordance rate ought to be twice the DZ concordance rate, minus the population frequency (Matthysse & Kidd, 1976), whereas the published MZ/DZ ratios in schizophrenia are much higher. A major gene model should not be confused with a single gene model, however. Risch (1990b) demonstrates, for example, that multiplicative models with a small number of discrete loci predict MZ/DZ ratios in the observed range.

2. Strategy for Estimating Power
Selecting among the three interpretations listed in Section 1 requires an estimation of the power of current methods to detect linkage. Although there are model-free methods of detecting linkage (Elston, Spence, Hodge, & MacCluer, 1989; Weeks & Lange, 1988), there cannot be model-free methods for estimating power, because statistical power is defined as the probability of detecting a deviation from the null hypothesis, for a given empirical procedure (with the risk of false positives controlled), on the assumption that a particular model is true. Ideally, we would compute the probability of false negatives, assuming the validity of each of a number of competing models. Similarly, it is not conclusive to estimate the parameters of the model, and then carry out power calculations using only its maximum-likelihood values. One ought to take into account the rate of decline of the likelihood surface from its peak, as the parameters are varied, and calculate power over the range of parameter choices that are consistent with the observed data.

In the event that power analyses have strongly model-dependent or parameter-dependent outcomes, to attempt to narrow the range of possible models still further would be arbitrary, given the present state of our knowledge. The practical decision for or against undertaking linkage experiments would have to be made on the basis of the plausibility of the competing models, and on the inclinations of linkage analysts (and funding agencies) to base their decisions on the best-case or the worst-case scenario.

In this study, we only make a start toward this program, basing our calculations on one model, and setting the parameters of that model equal to their maximum-likelihood estimates derived from previous empirical studies. We recognize, therefore, that our calculations are not decisive, and we urge enthusiasts for other models to carry out similar calculations in order to make conclusions possible about the model-dependence of the results.

3. Model Specifications
3.1. The Latent Trait Model for Schizophrenia, Spectrum, and EMD

Our calculations were based on the “latent trait” (LT) model we have previously described and applied to the genetics of schizophrenia (Matthysse, Holzman, & Lange, 1986). The latent trait is a hypothetical process that causes both the illness and the associated manifest traits. There are three genotypes, in which the pathogenic allele is absent, heterozygous, or homozygous. The probability that an individual has the latent trait is assumed to depend on his genotype, π0, π1 and π2 being the probabilities if the pathogenic allele is absent, heterozygous or homozygous, respectively. The probability of observing the illness depends on the presence of the latent trait: r1 if it is present, r0 if it is absent. Similarly, the probability of observing the manifest trait is r1′ if the latent trait is present, r0′ if it is absent. The structure of the LT model, with the parameters estimated from our studies in Norway (Holzman et al., 1988), and from the data in Table 1, is illustrated in Figure 1 (top). The complete LT model is phenotypically three-dimensional, making separate predictions for schizophrenia, schizophrenia spectrum disorders (other than schizophrenia itself), and eye movement disorder (EMD).

3.2. The One-Dimensional Model: Schizophrenia Alone

The model used as the basis of the first set of calculations (Figure 1, bottom) is a “one-dimensional projection” of the LT model. By a one-dimensional projection, we mean that
Figure 1. Top: Structure and parameters of the latent trait model. Normal, heterozygous and homozygous genotypes are indicated on the left. The gene frequency is q (q = 0.038). The probabilities of the latent trait for each genotype (π_i) are indicated over the arrows from the genotypes to LT. The arrows from LT to EMD and “spectrum” indicate the probability that an individual with the latent trait will have those phenotypes. The schizophrenia spectrum is subdivided into schizophrenia and other schizophrenia spectrum disorders. The arrows labeled .048, .00063 and .008 represent the probability that individuals without LT will have the indicated phenotypes. Bottom: One-dimensional projection of the latent trait model.
Table 1. Frequencies in the General Population and in Siblings of Schizophrenics, Assumed for Setting Model Parameters

Diagnosis          Frequency
                   Population    Sibs
Schizophrenia      0.6%          3.5%
Other spectrum     3%            15%

Note. Data from Tsuang, Winokur, and Crowe (1980), British Journal of Psychiatry, 137, 497-504, and Siever (personal communication).
the other two phenotypes are still assumed to segregate according to the LT model, but they are ignored, and attention is focused on schizophrenia alone. The “path coefficients” from the latent trait to the manifest traits are absorbed into the π_i in the projected model. The one-dimensional projection is a traditional single major locus (SML) model. The parameters of the one-dimensional projection are easily calculated; they are q = 0.038, π0 = 0.00063, π1 = 0.071, π2 = 0.086. These are by no means the only SML parameter choices consistent with the observed data. On the other hand, two well-established facts about schizophrenia serve as strong constraints on any SML model: the population frequency of about 0.6%, and the risk to siblings of affected probands of about 3.5% (Tsuang, Winokur, & Crowe, 1980) (Table 1). The parameter choices we have made predict 0.59% population frequency and 3.49% frequency in sibs, respectively.¹ Since any alternative SML model would be subject to the same population frequency and relative risk constraints, it seems likely that other SML models would yield power estimates similar to our own. Multi-locus and mixed models, on the other hand, are likely to yield different estimates.

3.3. The Two-Dimensional Model: Schizophrenia and Spectrum

As demonstrated by Kety, Wender, and Rosenthal (1978),² and confirmed by Kendler, Gruenberg, and Strauss (1981), a number of psychiatric disorders other than clinical schizophrenia are also found with greater than population frequency in the relatives of schizophrenic probands. The group of these conditions, called the “schizophrenia spectrum”, includes DSM-III schizotypal, paranoid, and perhaps schizoid diagnoses (Kendler et al., 1981). Accordingly, our model postulates an underlying spectrum condition that has schizophrenia and other schizophrenia spectrum disorders as probabilistic outcomes. We have represented the paths to the two classes of outcomes by dotted lines in Figure 1, to indicate that they are by nature exclusive outcomes, unlike EMD and the underlying spectrum trait, which are not exclusive, but are postulated to be statistically independent, conditioned on the presence or absence of the latent trait. The probabilities of schizophrenia and of other schizophrenia spectrum outcomes of an underlying spectrum trait were adjusted to agree with the observed rates of these phenotypes in the general population and in the sibs of schizophrenics (Table 1).

¹ The population frequency was incorporated into our search algorithm as a constraint; the frequency in sibs was not. The estimate for frequency in sibs is conservative, because only chronic schizophrenia is included. The frequency of other schizophrenia spectrum disorders in siblings is substantially higher (Table 1).
² See also confirmatory data presented by Kety and Ingraham (1992).

4. Method of Estimating Power
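Before turning to the simulations, the single major locus parameters quoted in Section 3.2 can be checked against the two constraints in Table 1. The following is a minimal sketch, not the search algorithm used in the study: it assumes random mating, Hardy-Weinberg proportions, and no shared environmental effects between relatives, and it uses the standard result that siblings share 0, 1 or 2 alleles identical by descent with probabilities 1/4, 1/2 and 1/4. All function names are illustrative.

```python
# Illustrative check of the SML parameters of Section 3.2 (names are hypothetical).
Q = 0.038                                # frequency of the pathogenic allele
PENETRANCE = (0.00063, 0.071, 0.086)     # pi_0, pi_1, pi_2: risk with 0, 1, 2 pathogenic alleles


def genotype_frequencies(q):
    """Hardy-Weinberg frequencies of the normal, heterozygous and homozygous genotypes."""
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)


def offspring_given_parent(q):
    """Row i: genotype distribution of a child whose one parent has genotype i,
    the other parent being drawn at random from the population."""
    p = 1.0 - q
    return ((p, q, 0.0), (p / 2, 0.5, q / 2), (0.0, p, q))


def sibling_given_proband(q):
    """Row i: genotype distribution of a sibling of a person with genotype i."""
    freq = genotype_frequencies(q)
    trans = offspring_given_parent(q)
    return tuple(
        tuple(0.25 * freq[j] + 0.5 * trans[i][j] + 0.25 * (1.0 if i == j else 0.0)
              for j in range(3))
        for i in range(3))


def population_frequency(q, penetrance):
    return sum(f * w for f, w in zip(genotype_frequencies(q), penetrance))


def recurrence_risk(q, penetrance, relative_matrix):
    """Risk to a relative of an affected proband."""
    freq = genotype_frequencies(q)
    joint = sum(freq[i] * penetrance[i] * relative_matrix[i][j] * penetrance[j]
                for i in range(3) for j in range(3))
    return joint / population_frequency(q, penetrance)


prevalence = population_frequency(Q, PENETRANCE)
sib_risk = recurrence_risk(Q, PENETRANCE, sibling_given_proband(Q))
child_risk = recurrence_risk(Q, PENETRANCE, offspring_given_parent(Q))
print(f"population {prevalence:.2%}, sibs {sib_risk:.2%}, offspring {child_risk:.2%}")
```

Under these assumptions the sketch gives a population frequency of about 0.59% and a sibling risk of about 3.5%, in line with the values quoted in Section 3.2; the sibling and offspring risks also come out close to each other, as Section 1 requires.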
Our calculations are based on one very large pedigree, which is currently being studied intensively by our research group in Copenhagen. The use of an actual pedigree helps ensure that our calculations will be relevant to the needs of workers in the field (and, of course, makes the results especially interesting to the authors). At the time of the simulations, the pedigree included 285 individuals (233 living), of whom 11 were diagnosed as schizophrenic. Although this represents a large number of affected people, the frequency of schizophrenia in this family is still low (11/285 = 3.86%, consistent with the published data reported in Table 1). The low morbid risk is, of course, the stumbling block to the success of linkage analysis in schizophrenia.

The power calculation takes place in three steps: (1) simulation of trait and marker genotype vectors, assuming linkage or nonlinkage, and taking into account the structure and observed distribution of schizophrenia in the pedigree; (2) simulation of other schizophrenia spectrum phenotypes, taking into account the trait genotypes generated in step 1; and (3) computation of the maximum LOD score for linkage, taking into account the marker genotypes simulated in step 1, the distribution of schizophrenia, and the other phenotypes simulated in step 2. Step 3 is carried out separately for the three-dimensional LT model and for its one- and two-dimensional projections, outlined in Section 3. We describe each of the steps in detail below.

4.1. Simulation of Trait and Marker Genotypes

When selecting a pedigree for linkage studies, the analyst normally is influenced by the number of individuals with the major phenotype, and by the locations of the affected individuals within the pedigree structure. Our genotype simulations, therefore, must be constrained by the distribution of the major phenotype (in this case, schizophrenia). Satisfying that constraint creates a problem for simulation, because if one were just to select founder genotypes according to a Hardy-Weinberg distribution, and allow Mendelian segregation to occur randomly at each mating, the probability of ending up with exactly the observed distribution of major phenotypes would be astronomically small. Two different ways of solving this problem have been worked out: a genetic risk method (Ott, 1989; Ploughman & Boehnke, 1989), and a random walk method (Lange & Matthysse, 1989). The genetic risk method is easier to understand, and it is more exact than our own; but it is much slower computationally for large pedigrees, so we have used both.
4.1.1. The genetic risk method

The genetic risk method works sequentially through the pedigree, adding genotype information about successive members, based on (a) the genotypes determined in previous steps, and (b) the major phenotypes of the members whose genotypes are still undetermined. Individuals in the pedigree may be chosen in any order for the genotype assignments. At each step, the algorithm selects an individual whose genotype has not yet been determined, and calculates the genetic risk for that individual based on the genotypes already chosen and the major phenotypes of individuals not already chosen. In other words, the algorithm estimates the probability that the new individual will be homozygous, heterozygous or normal, just as if genetic counseling for that individual were in question. The algorithm then flips a “weighted coin” to fix the new individual’s genotype, with weights for each genotype proportional to the genetic risk probabilities. Once fixed, the new individual’s genotype becomes part of the “data” used by the algorithm to estimate the genetic risk of individuals later in the sequence. Since this method depends on successive “coin flips”, it produces an ensemble of pedigrees, a different version for each time the sequence is carried out. It can be shown that the probability of finding any genotype vector in the ensemble is proportional to its likelihood, given the observed pattern of major phenotypes.

4.1.2. The random walk method

The random walk method is an adaptation of the Metropolis algorithm used in statistical physics. Genotypes are arbitrarily assigned initially to each individual in the pedigree. At each step of the process, an individual is selected randomly, and his genotype is changed, as if by mutation.³ The change may increase or decrease the likelihood of the observed major phenotype distribution. If it increases the likelihood, the mutation is allowed to persist. If it decreases the likelihood, the mutation is ordinarily rejected, except that with a certain probability it is accepted, even though it makes the fit to the data worse. The latter step, although it may seem paradoxical, is critical for the success of the Metropolis algorithm, since otherwise the simulation would be likely to get “stuck” in genotype vector choices that are far from optimal, but are at least a little better than those that lie immediately nearby. If this process is carried out a great many times, genotype vectors are generated with probabilities approximately proportional to their likelihoods, given the observed pattern of major phenotypes. The approximation improves as the number of “mutation” steps increases, no matter what the starting point (we use 500,000 steps). The disadvantage of this method is that the correct probabilities are reached only in the limit of a large number of steps; the advantage is that, since “mutations” affect only one individual’s genotype at a time, the steps are very fast. On our VAXstation 3100 in single-user mode, simulating 20 of the 285-member pedigrees took over five days with the genetic risk method, but as many as 100 pedigrees could be generated in one day with the random walk method. Detailed comparisons of results with the two methods are discussed in Section 5.1.
³ Actually, genetic descent states are assigned and mutated, rather than genotypes. Genetic descent states contain more information than genotypes, distinguishing the maternal and paternal alleles and specifying their sources (Lange & Matthysse, 1989).
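The two sampling schemes of Sections 4.1.1 and 4.1.2 can be illustrated on a toy example. The sketch below is not the code used in the study: it assumes a hypothetical father-mother-child trio in which only the child is affected, it mutates plain genotypes rather than genetic descent states, and it computes genetic risks by brute-force enumeration, which is feasible only for very small pedigrees. The penetrances and gene frequency are those of Section 3.2; every function name is illustrative.

```python
import random
from itertools import product

Q = 0.038                                # pathogenic allele frequency (Section 3.2)
PENETRANCE = (0.00063, 0.071, 0.086)     # risk with 0, 1, 2 pathogenic alleles
HARDY_WEINBERG = ((1 - Q) ** 2, 2 * Q * (1 - Q), Q ** 2)
AFFECTED = (False, False, True)          # hypothetical trio: father, mother, child
GENOTYPES = (0, 1, 2)                    # number of pathogenic alleles carried


def transmission(child, father, mother):
    """Mendelian probability of the child's genotype given both parents."""
    pf, pm = father / 2, mother / 2      # chance each parent passes the pathogenic allele
    return ((1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm)[child]


def likelihood(genotypes):
    """Joint probability of a genotype vector and the observed phenotypes."""
    father, mother, child = genotypes
    value = HARDY_WEINBERG[father] * HARDY_WEINBERG[mother] * transmission(child, father, mother)
    for genotype, affected in zip(genotypes, AFFECTED):
        value *= PENETRANCE[genotype] if affected else 1 - PENETRANCE[genotype]
    return value


def exact_posterior():
    """Reference distribution obtained by enumerating all 27 genotype vectors."""
    weights = {g: likelihood(g) for g in product(GENOTYPES, repeat=3)}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def genetic_risk_draw(rng):
    """Fix genotypes one person at a time, each drawn from that person's genetic risk."""
    fixed = ()
    while len(fixed) < 3:
        undetermined = 3 - len(fixed) - 1
        risk = [sum(likelihood(fixed + (g,) + rest)
                    for rest in product(GENOTYPES, repeat=undetermined))
                for g in GENOTYPES]
        fixed += (rng.choices(GENOTYPES, weights=risk)[0],)
    return fixed


def random_walk(steps, rng):
    """Metropolis sampler: mutate one genotype per step, accept by likelihood ratio."""
    state = [1, 1, 1]                    # arbitrary start with nonzero likelihood
    current = likelihood(state)
    visits = {}
    for _ in range(steps):
        person = rng.randrange(3)
        old = state[person]
        state[person] = rng.choice([g for g in GENOTYPES if g != old])
        proposed = likelihood(state)
        if rng.random() < proposed / current:
            current = proposed           # improvements always pass; worse moves sometimes
        else:
            state[person] = old          # reject the mutation
        visits[tuple(state)] = visits.get(tuple(state), 0) + 1
    return {g: n / steps for g, n in visits.items()}
```

In this toy setting, repeated calls to `genetic_risk_draw` sample genotype vectors exactly in proportion to their likelihoods, whereas the visit frequencies returned by `random_walk` approach the same distribution only as the number of steps grows, which mirrors the trade-off between exactness and speed described above.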
4.1.3. Simulation of marker genotypes

The methods discussed so far suffice for simulating trait genotypes, but for estimating the power of linkage analysis it is also necessary to simulate marker genotypes. In the case of the random walk method, simulation of marker genotypes is easy, because the algorithm produces an ensemble of genetic descent states. For example, if in a particular simulation an individual’s paternal trait allele comes from his paternal grandfather, his paternal marker allele will come from the same source with probability 1 − θ, and from his paternal grandmother with probability θ, where θ is the postulated recombination fraction. For the null hypothesis of nonlinkage, θ = ½.

The assignment of marker genotypes is slightly more complicated for the genetic risk method. After the trait genotypes are determined in a particular simulation, the pedigree is traversed once again, working systematically from earlier to later generations. Trial sources for the trait alleles of each offspring in the pedigree are assigned at random, stopping as soon as an assignment is found that is consistent with both the trait allele source assignments for earlier generations and with the offspring’s genotype. Then marker allele source assignments can be made, in the same way as in the random walk method.

4.2. Simulation of Non-Schizophrenic Phenotypes

Although the distribution of the major phenotype in a pedigree is likely to be known (at least approximately) prior to selection of the pedigree for linkage analysis, determining the distribution of associated phenotypes in the pedigree (in our case, other schizophrenia spectrum disorders and EMD) requires a substantial commitment of resources, and is likely to begin only after the pedigree has been chosen. In order to estimate the power of linkage analysis, we therefore must not only generate the marker genotypes, but also simulate the associated phenotypes. This step is relatively easy, because the model specified in Figure 1 permits other spectrum disorders and EMD to be simulated simply by “flipping weighted coins” individually for each pedigree member, once the genotypes for each pedigree member have been generated. Some care is necessary, because we regard schizophrenia spectrum status and EMD as observable traits only within certain age ranges. Similarly, blood samples for assessing marker phenotypes are only available from living members of the pedigree. These limitations are summarized in Table 2. Spectrum diagnoses and EMD were assigned to all pedigree members within the appropriate age ranges during this phase of the simulation, but information on EMD was deleted for power calculations on the two-dimensional model (Section 3.3), and both EMD and spectrum data were omitted for calculations on the one-dimensional model (Section 3.2).

Table 2. Age Constraints on Trait and Marker Phenotypes
Age (years)
Spectrum
EMD
Marker
fa
yes
≥65 17-64 13-16
yes yes
no no
no yes
yes
yes
no no
yes no
yes yes
≤12
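The “weighted coin” steps of Sections 4.1.3 and 4.2 can be sketched in a few lines. This is an illustration under stated assumptions rather than the study’s code: the trait rates in `EXAMPLE_RATES` are hypothetical placeholders, not the path coefficients of Figure 1; the sketch treats the associated traits as independent given latent-trait status; and the age and vital-status restrictions of Table 2 are not applied. All names are illustrative.

```python
import random


def marker_allele_source(trait_source, theta, rng):
    """Grandparental source of the marker allele passed in the same meiosis as the trait allele."""
    other = {"grandfather": "grandmother", "grandmother": "grandfather"}
    recombined = rng.random() < theta    # recombination occurs with probability theta
    return other[trait_source] if recombined else trait_source


def weighted_coin(probability, rng):
    """Return True with the given probability."""
    return rng.random() < probability


# Hypothetical placeholder rates: (P(trait | latent trait present), P(trait | latent trait absent)).
EXAMPLE_RATES = {"EMD": (0.7, 0.05), "other_spectrum": (0.3, 0.01)}


def simulate_associated_traits(has_latent_trait, rng, rates=EXAMPLE_RATES):
    """Flip one weighted coin per associated trait, independently given latent-trait status."""
    return {trait: weighted_coin(present if has_latent_trait else absent, rng)
            for trait, (present, absent) in rates.items()}
```

With `theta = 0.05` the marker allele follows the trait allele in about 95% of meioses; with `theta = 0.5` the two sources are unrelated, which is the null hypothesis of nonlinkage.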
4.3. Linkage Computations

Linkage computations were carried out using the program userm7.for, from the MENDEL genetic analysis package,⁴ modified to incorporate the latent trait model. The trait phenotypes were either (a) schizophrenia alone, (b) schizophrenia and other schizophrenia spectrum disorders, or (c) schizophrenia, other schizophrenia spectrum disorders and EMD, according to which model was being tested. Because of the probabilistic nature of the model, it was not necessary to include or exclude phenotypes on an all-or-none basis; the algorithm automatically weights each phenotype according to the strength of the relevant path coefficient shown in Figure 1. Single-point rather than multi-point linkage methods were used, because of the evidence that multi-point methods are not robust against model misspecification (Risch & Giuffra, 1992). The recombination fraction was set at .05 (true linkage hypothesis) or .5 (null hypothesis). In most of our simulations, the linkage marker was represented by four codominant alleles, distributed equiprobably in the pedigree founders. The approximation underestimates the power of contemporary linkage technology, but including more alleles would have been computationally intractable. In one simulation, the linkage marker was assumed to be completely informative (i.e., to have an infinite number of alleles, so that every allele of every founder in the pedigree could be safely assumed to be distinct). This case is also computationally feasible.

5. Results

5.1. Comparison of Simulation Methods

The two methods of genotype vector simulation, discussed in Section 4.1, are compared in Figure 2. Only 20 pedigrees were simulated, because of the long computation time of the genetic risk method (Section 4.1.2). Although the median LOD scores do not differ substantially, the genetic risk method occasionally produces higher LOD scores than those typically attained by the random walk method. Since the genetic risk method is more exact, it may become preferable when computing speeds are sufficiently high. It was not feasible to use it for the majority of the computations reported here, but the similarity of the cumulative frequency graphs for the two methods in the lower LOD score range (it is the lower LOD score range that is decisive for estimating the power of linkage; see Section 5.5) suggests that power estimates based on the random walk method will not be far off the mark.

5.2. Comparison of Models

Figure 3, based on simulations with the genetic risk method, shows how expected LOD scores (under the hypothesis of true linkage) vary, depending on the number of trait phenotypes taken into account in the linkage analysis. There is a substantial positive shift in the distribution of simulated LOD scores in going from the one-dimensional model (schizophrenia alone) to the two-dimensional model (schizophrenia and non-schizophrenic spectrum), and from the latter to the three-dimensional model, including EMD. The average LOD scores attained by the different models show the same trend (Table 3).⁵

⁴ Kindly provided by Dr K. Lange.
⁵ The value of 0.596 in this table was mistakenly stated as 0.13 in a letter from S. M. to Psychological Science (January, 1992, p. 75). The tabulated value is correct.
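For readers less familiar with the statistic being compared here, the following sketch shows how a maximum LOD score is obtained over a finite grid of recombination fractions. It covers only the textbook special case of phase-known, fully informative meioses, in which the likelihood depends on a simple count of recombinants; the pedigree likelihoods in this study were instead computed with MENDEL under the latent trait model. The grid step and function names are assumptions made for illustration.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD for phase-known meioses: log10 of L(theta) / L(1/2)."""
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible when theta is exactly zero.
        return -math.inf if recombinants else meioses * math.log10(2.0)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + meioses * math.log10(2.0))


def max_lod(recombinants, meioses, step=0.05):
    """Maximum LOD over a grid of recombination fractions from 0 to 0.5.

    Returns a (lod, theta) pair. The step size is an illustrative choice.
    """
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max((lod_score(recombinants, meioses, theta), theta) for theta in grid)


best_lod, best_theta = max_lod(recombinants=1, meioses=20)
print(f"maximum LOD {best_lod:.2f} at theta = {best_theta:.2f}")
```

Because the maximum is taken only over grid points, a true maximum lying between two grid values can be missed, a point that becomes relevant for very small LOD scores.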
Figure 2. Comparison of LOD scores attained by the genetic risk and random walk methods. For each LOD score on the x-axis, the corresponding cumulative frequency (the proportion of simulations with maximum LOD scores for linkage less than or equal to the value indicated on the x-axis) is plotted on the y-axis. Three-dimensional model, 20 pedigrees.

Figure 3. Simulated LOD score distributions for the one-, two- and three-dimensional models. Twenty pedigrees, genetic risk method.
Table 3. Average LOD Scores for the One-, Two- and Three-dimensional Models, and for the Completely Informative Marker Case. Random Walk Method, 100 Pedigrees

Model                     Average LOD score
                          θ = 0.05    θ = 0.5
Schizophrenia only        0.596       0.095
Full spectrum             1.457       0.103
Full spectrum and EMD     2.463       0.132
Completely informative    7.728       1.239
5.3. Comparison of Distributions under Linkage and Nonlinkage

Although the average LOD score attained by the three-dimensional model (2.463) is respectable, the implications for the success of linkage analysis are not as optimistic as this value suggests. Inferences about the power of linkage analysis require both simulations assuming linkage and simulations assuming nonlinkage, since the risk of false negatives must be estimated for a given level of false positives, or vice versa. In order to compare the distributions under hypotheses of linkage and nonlinkage, simulation results were plotted in a standard format (see, for example, Figure 6). For each LOD score on the x-axis, the corresponding cumulative frequency (proportion of scores equal to or below any given value) is plotted on the y-axis. Two graphs are drawn in each figure. The graph marked “linkage” is the cumulative frequency distribution for the case where linkage is present; the graph marked “nonlinkage” is the distribution for the null hypothesis. The nonlinkage graph lies above the linkage-present graph, because LOD scores are clustered around smaller values if linkage is not present. The median LOD score can be found by drawing a horizontal line from 0.5 on the y-axis until it intersects the graph, and then projecting down to the x-axis.

In Figure 4 the assumption is made that schizophrenia is the only phenotype studied. In this figure and all subsequent figures, 100 pedigrees are simulated, using the random walk method. Figure 5 summarizes similar analyses for the two-dimensional model, which includes other schizophrenia spectrum disorders among the observed phenotypes, and Figure 6 summarizes results for the three-dimensional model, which includes EMD as well.

Table 4. Number of False Negatives and False Positives out of 100 Simulations. Threshold Set at 3.0

Model                     False Negatives    False Positives
Schizophrenia only        100⁶               0
Full spectrum             90                 0
Full spectrum and EMD     72                 0
Completely informative    4                  0

⁶ The maximum LOD score attained in 100 simulations for the one-dimensional model was 2.55.
Figure 4. Cumulative distributions of LOD scores under hypotheses of linkage (lower graph) and nonlinkage (upper graph), one-dimensional model.

Figure 5. Cumulative distributions of LOD scores under hypotheses of linkage (lower graph) and nonlinkage (upper graph), two-dimensional model.

Figure 6. Cumulative distributions of LOD scores under hypotheses of linkage (lower graph) and nonlinkage (upper graph), three-dimensional model.
5.4. Results with the Conventional LOD Score Threshold

By examination of Figures 4, 5 and 6, the power of linkage analysis using the conventional LOD score threshold of 3.0 can readily be estimated. The results are summarized in Table 4. Judging from this table, a threshold of 3.0 is not optimal for studying schizophrenia and associated traits in this pedigree, except for the completely informative marker model; no false positives are found, but for all the other models there are far too many false negatives.
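The counts in Table 4, and the trade-off between the two kinds of error at any other threshold, reduce to simple bookkeeping on the two sets of simulated maximum LOD scores. The sketch below shows that bookkeeping, together with a numerical counterpart of the graphical threshold-setting procedure of Section 5.5 and the expected number of false positives in a 330-marker genome scan. The scores in the usage example are invented for illustration and are not simulation output from this study; the function names are likewise illustrative.

```python
def error_counts(linked_scores, unlinked_scores, threshold):
    """False negatives (linked scores below threshold) and false positives
    (unlinked scores at or above threshold)."""
    false_negatives = sum(score < threshold for score in linked_scores)
    false_positives = sum(score >= threshold for score in unlinked_scores)
    return false_negatives, false_positives


def threshold_for_false_negative_rate(linked_scores, rate):
    """An observed score below which at most the fraction `rate` (less than 1)
    of the linked scores fall."""
    ordered = sorted(linked_scores)
    allowed = int(rate * len(ordered))   # number of false negatives tolerated
    return ordered[allowed]


def expected_false_positive_load(false_positive_rate, markers=330):
    """Expected number of unlinked markers exceeding the threshold in a genome scan."""
    return false_positive_rate * markers


# Hypothetical maximum LOD scores, ten simulations under each hypothesis.
linked = [0.2, 0.9, 1.5, 2.1, 2.8, 3.2, 3.6, 4.0, 4.4, 5.1]
unlinked = [0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 1.0, 1.6]

print(error_counts(linked, unlinked, 3.0))
relaxed = threshold_for_false_negative_rate(linked, 0.10)
print(relaxed, error_counts(linked, unlinked, relaxed))
print(expected_false_positive_load(0.04))
```

In the invented example, lowering the threshold from 3.0 to the value that tolerates 10% false negatives reduces the misses from five to one, at the cost of two false positives among the ten unlinked simulations.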
5.5. Graphical Method of Setting the Threshold

A graphical method can be used with Figures 4, 5 and 6 to set a threshold that maintains the rate of false negatives at an acceptable level, e.g. 10%. In each graph, imagine a horizontal line from 0.1 on the y-axis (or whatever false negative rate is considered tolerable), drawn until it intersects the lower graph. Project downward to the x-axis; the intersection with the x-axis is the LOD score threshold that will, on the average, produce false negatives at the chosen rate. Now project upward to the upper graph, and at the intersection, complete the circuit by drawing a horizontal line to the left from the upper graph intersection to the y-axis. 1.0 minus the value where the horizontal line meets the y-axis is the expected proportion of false positives.

When this method is used for Figure 4, it appears that a false negative rate of 25% is achievable with a threshold of 0.0488, but with this threshold there would be 28 false positives per 100 markers, which is certainly an unacceptable figure. A false negative rate of 10% cannot be obtained for any LOD score threshold.⁷ We conclude that linkage analysis cannot be carried out effectively on this pedigree, despite its large size, if schizophrenia alone is taken as the phenotype.⁸

Examining results with the two-dimensional model in Figure 5, we see that a false negative rate of 10% can be achieved, but a very low threshold of 0.0871 must be set; that threshold will give, on the average, 27 false positives per 100 markers tested. As in the case of schizophrenia alone, the balance between false positives and false negatives remains unfavorable.

The three-dimensional model fares better (Figure 6). To achieve a false negative rate of 10%, a threshold of 0.8351 must be set. That threshold will produce, on the average, 4 false positives per 100 markers tested. Since approximately 330 markers will cover the genome with a maximum trait-gene/marker-gene distance of 5 centiMorgans (intermarker spacing 10 centiMorgans), the estimated false positive load for a complete scan of the genome is approximately 13. In other words, if the hypothesis of linkage is true, after completing studies of the entire genome with this pedigree, approximately 14 markers would have to be “carried forward” for work with other pedigrees. One of these would, indeed, be linked to the trait gene, but the others would not. Obviously claims of “discovering linkage” after completing the first pedigree would be out of the question under these circumstances. On the other hand, narrowing the field down to 14 markers, with only a 10% risk of omitting the true linkage marker from the list of candidates, would be a significant step forward.

Our simulations of the completely informative marker case (see Section 4.3) show a dramatic increase in linkage power.
In order to estimate the power to detect linkage under optimal circumstances, we used the three-dimensional model; we assumed that the linkage marker was so close to the trait gene that θ could be taken to be zero; and we considered DNA samples to be available from everyone in the pedigree, including deceased persons. The results are shown in Figure 7. The LOD score distribution curves for the linkage and nonlinkage hypotheses are now very well separated. As discussed in Section 5.4, the conventional threshold of 3.0 yields 4 false negatives out of 100 simulations, and no false positives. The probability of a false negative can also be lowered to approximately 1% by setting a threshold of 1.718, at the price of 1 false positive out of 100 simulations.

Figure 7. Cumulative distributions of LOD scores under hypotheses of linkage (upper graph) and nonlinkage (lower graph). Completely informative marker, three-dimensional model.

⁷ Strictly speaking, there must be some LOD score threshold so low that 10% of the scores attained if linkage is present will exceed it. Computationally, the reason no such threshold is detected is that the maximum LOD score in any simulation is estimated over a finite grid of recombination fractions, increasing in steps of 0.25 from 0.0 to 0.5. The true maximum LOD score is nearly always greater than zero, but if it is close to zero, it may be attained for a θ value between the points of the grid.

⁸ The other major assumptions are that markers with four codominant alleles are used, that the recombination fraction is 0.05, and that the underlying genetic model is correct.

6. Conclusions

6.1. Why Linkage in Schizophrenia Has Failed So Far

Figure 4 provides a tentative answer to the question raised in Section 1, why linkage experiments to date on schizophrenia have not been successful. There is simply no threshold that will give acceptable risks for both false positives and false negatives, if schizophrenia alone is used as the phenotype. It is possible that more optimistic conclusions might be drawn (although, in the context of the failures of linkage analysis so far, they would actually be pessimistic conclusions) from other models, similarly constrained by the facts listed in Table 1; that possibility will remain open until appropriate simulations are carried out.

6.2. How Linkage Analysis Might Be Made To Succeed

According to the simulations reported in Figure 6, linkage studies in schizophrenia do become worthwhile when associated traits with higher penetrance (e.g., other schizophrenia spectrum disorders and EMD) are incorporated.⁹ To be sure, the estimated power will not justify strong claims after studying one large pedigree. As discussed in Section 5.5, a threshold can be set that gives a 10% risk of missing a true linkage in this pedigree, and after scanning the entire genome, approximately 14 marker candidates (13 of them wrong) would be “carried forward” for studies with other pedigrees. An outcome of this kind would be a modest but valuable achievement. In publishing the results, it would be most important to emphasize that the findings amount to “rounding up suspects,” not “convicting the criminal.”

The remarkable improvement in the completely informative marker simulation (only 4% false negatives with the conventional LOD score of 3.0; see Figure 7) is much closer to attainment than might have been imagined a few years ago. In the human genome there are 50,000-100,000 blocks of repeating CA sequences (short tandem repeats), and the number of repeated units in each block is highly polymorphic. After amplification by polymerase chain reaction, it is possible to resolve these small differences in the number of repeated units (Weber & May, 1989). Short tandem repeat markers are already in use in linkage laboratories (Kidd, personal communication), and because of their high abundance in the genome, it will not be long before completely informative markers are available at any site of interest. These advances in linkage technology, coupled with a multi-dimensional view of the phenotype, may make a dramatic difference in the prospects for finding a major gene contributing to schizophrenia.

⁹ Current diagnostic criteria most likely reduce the power of all these models, because the criteria were not designed for genetic purposes. In our studies, we are attempting to optimize the clinical phenotype by taking into account a number of behavioral dimensions.
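The trade-off discussed in Sections 6.1 and 6.2 (choosing a LOD threshold that fixes the false negative rate, then reading off the false positive rate that results) can be sketched from two sets of simulated maximum LOD scores. The sketch below is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, and the two score samples are synthetic draws from arbitrary normal distributions, truncated at zero, that stand in for real pedigree simulations. They do not reproduce any figure in the paper.

```python
import random


def false_negative_rate(linked_scores, threshold):
    """Fraction of linkage-present replicates whose LOD score falls below the threshold."""
    return sum(s < threshold for s in linked_scores) / len(linked_scores)


def false_positive_rate(unlinked_scores, threshold):
    """Fraction of linkage-absent replicates whose LOD score reaches the threshold."""
    return sum(s >= threshold for s in unlinked_scores) / len(unlinked_scores)


def threshold_for_false_negative_rate(linked_scores, target):
    """Pick an observed score such that at most `target` of linked replicates lie below it."""
    ordered = sorted(linked_scores)
    allowed_misses = min(int(target * len(ordered)), len(ordered) - 1)
    return ordered[allowed_misses]


# Synthetic stand-ins for simulated maximum LOD scores (ASSUMED distributions,
# chosen only for illustration). A maximized LOD score is never negative.
rng = random.Random(0)
linked = [max(0.0, rng.gauss(2.0, 1.0)) for _ in range(1000)]
unlinked = [max(0.0, rng.gauss(-1.0, 1.0)) for _ in range(1000)]

threshold = threshold_for_false_negative_rate(linked, target=0.10)
fnr = false_negative_rate(linked, threshold)
fpr = false_positive_rate(unlinked, threshold)
print(f"threshold={threshold:.3f}  false negatives={fnr:.3f}  false positives={fpr:.3f}")

# Raising the threshold to the conventional 3.0 trades false positives for false negatives.
print(false_negative_rate(linked, 3.0), false_positive_rate(unlinked, 3.0))
```

When the two score distributions overlap heavily, as reported for schizophrenia alone, no threshold gives acceptable values for both rates; when they are well separated, as in the completely informative marker case, a single threshold can keep both small.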
Acknowledgements-We acknowledge with thanks programming suggestions from Dr Kenneth Lange, and helpful discussions with Drs Philip Holzman, Deborah Levy and Kenneth Kidd. Research costs were supported by Public Health Service grants MH-31154, MH-31340, and MH-44876.
References

Detera-Wadleigh, S. D., Goldin, L. R., Sherrington, R., Encio, I., de Miguel, C., Berrettini, W., Gurling, H., & Gershon, E. S. (1989). Exclusion of linkage to 5q11-13 in families with schizophrenia and other psychiatric disorders. Nature, 340, 391-393.
Egeland, J. A., Gerhard, D. S., Pauls, D. L., Sussex, J. N., Kidd, K. K., Allen, C. R., Hostetter, A. M., & Housman, D. E. (1987). Bipolar affective disorders linked to DNA markers on chromosome 11. Nature, 325, 783-786.
Elston, R. C., Spence, M. A., Hodge, S. E., & MacCluer, J. W. (1989). Multipoint mapping and linkage based upon affected pedigree members. Liss, New York.
Hodgkinson, S., Sherrington, R., Gurling, H., Marchbanks, R., Reeders, S., Mallet, J., McInnis, M., Petursson, H., & Brynjolfsson, J. (1987). Molecular genetic evidence for heterogeneity in manic depression. Nature, 325, 805-806.
Holzman, P. S., Kringlen, E., Matthysse, S., Flanagan, S. D., Lipton, R. B., Cramer, G., Levin, S., Lange, K., & Levy, D. L. (1988). A single dominant gene can account for eye tracking dysfunctions and schizophrenia in offspring of discordant twins. Archives of General Psychiatry, 45, 641-647.
Kaufmann, C. A., DeLisi, L. E., Lehner, T., & Gilliam, T. C. (1989). Physical mapping, linkage analysis of a putative schizophrenia locus on chromosome 5q. Schizophrenia Bulletin, 15, 441-452.
Kelsoe, J. R., Ginns, E. I., Egeland, J. A., Gerhard, D. S., Goldstein, A. M., Bale, S. J., Pauls, D. L., Long, R. T., Kidd, K. K., Conte, G., Housman, D. E., & Paul, S. M. (1989). Re-evaluation of the linkage relationship between chromosome 11p loci and the gene for bipolar affective disorder in the Old Order Amish. Nature, 342, 238-243.
Kendler, K. S., Gruenberg, A. M., & Strauss, J. S. (1981). An independent analysis of the Copenhagen sample of the Danish adoption study of schizophrenia, II. The relationship between schizotypal personality disorder and schizophrenia. Archives of General Psychiatry, 38, 982-984.
Kennedy, J. L., Giuffra, L. A., Cavalli-Sforza, L. L., Pakstis, A. J., Kidd, J. R., Castiglione, C. M., Sjogren, B., Wetterberg, L., & Kidd, K. K. (1988). Evidence against linkage of schizophrenia to markers on chromosome 5 in a northern Swedish pedigree. Nature, 336, 167-170.
Kety, S. S., & Ingraham, L. J. (1992). Genetic transmission and improved diagnosis of schizophrenia from pedigrees of adoptees. Journal of Psychiatric Research, 26, 247-255.
Kety, S. S., Wender, P. H., & Rosenthal, D. (1978). Genetic relationships within the schizophrenia spectrum: evidence from adoption studies. In R. L. Spitzer & D. F. Klein (Eds.), Critical issues in psychiatric diagnosis (pp. 213-223). Raven Press, New York.
Lange, K., & Matthysse, S. (1989). Simulation of pedigree genotypes by random walks. American Journal of Human Genetics, 45, 959-970.
Matthysse, S., & Kidd, K. K. (1976). Estimating the genetic contribution to schizophrenia. American Journal of Psychiatry, 133, 185-191.
Matthysse, S., Holzman, P. S., & Lange, K. (1986). The genetic transmission of schizophrenia: application of Mendelian latent structure analysis to eye tracking dysfunction in schizophrenia and affective disorder. Journal of Psychiatric Research, 20, 57-67.
Matthysse, S. (1990). Genetic linkage and complex diseases: a comment. Genetic Epidemiology, 7, 29-31.
Ott, J. (1989). Computer simulation methods in human linkage analysis. Proceedings of the National Academy of Sciences of the U.S.A., 86, 4175-4178.
Ploughman, L. M., & Boehnke, M. (1989). Estimating the power of a proposed linkage study for a complex genetic trait. American Journal of Human Genetics, 44, 543-551.
Risch, N. (1990a). Genetic linkage and complex diseases, with special reference to psychiatric disorders. Genetic Epidemiology, 7, 3-16.
Risch, N. (1990b). Linkage strategies for genetically complex traits. I. Multilocus models. American Journal of Human Genetics, 46, 222-228.
Risch, N., & Giuffra, L. (1992). Model misspecification and multi-point linkage analysis. Human Heredity, 42, 77-92.
St Clair, D., Blackwood, D., Muir, W., Baillie, D., Hubbard, A., Wright, A., & Evans, H. J. (1989). No linkage of chromosome 5q11-q13 markers to schizophrenia in Scottish families. Nature, 339, 305-309.
Sherrington, R., Brynjolfsson, J., Petursson, H., Potter, M., Dudleston, K., Barraclough, B., Wasmuth, J., Dobbs, M., & Gurling, H. (1988). Localization of a susceptibility locus for schizophrenia on chromosome 5. Nature, 336, 164-167.
Tsuang, M. T., Winokur, G., & Crowe, R. R. (1980). Morbidity risks of schizophrenia and affective disorders among first-degree relatives of patients with schizophrenia, mania, depression and surgical conditions. British Journal of Psychiatry, 137, 497-504.
Weber, J. L., & May, P. E. (1989). Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. American Journal of Human Genetics, 44, 388-396.
Weeks, D. E., & Lange, K. (1988). The affected-pedigree-member method of linkage analysis. American Journal of Human Genetics, 42, 315-326.