THEORETICAL POPULATION BIOLOGY 2, 507-524 (1971)

Bioassay of Kinship

N. E. MORTON, SHIRLEY YEE, D. E. HARRIS, AND RUTH LEW

Population Genetics Laboratory, University of Hawaii, Honolulu, Hawaii
Received May 14, 1971
Human biologists often make statements about the relationship of two populations, say I and J. Such a statement can take several forms, for example:

1. I is more related to J than to a third population, K.
2. I and J are descended from the same ancestral population K some t generations (or fissions) ago.
3. The probability that a random gene in I be identical by descent with a random allele in J is $\varphi_{ij}$.

The first of these statements is inherently vague. We are not told what “related” means, although it presumably implies common ancestry, and so we cannot ask whether I is twice as related to J as to K, or 100% more related, or in general inquire what the relationship is. The usual approach is to maximize a correlation, although this confers the undesirable property of increasing with the number of variables, and the object of these studies is not to discriminate among populations. In this misapplication of discriminant functions, no estimation problem is raised, and each investigator is free to define his own “index of biological distance” without the slightest risk that it be dismissed as irrelevant or inefficient (Balakrishnan and Sanghvi, 1968).

The second statement seems by comparison delightfully precise, since it purports to measure the time span back to an ancestral population. This is, in fact, a classical approach to major taxonomic categories. It is informative to deduce from the fossil record that hominids diverged from the great apes about ten million years ago; while the estimate is inexact, there is no doubt that the two reproductively isolated phylogenies did indeed separate at roughly that time. Unfortunately, once we turn our attention to microtaxonomic differences, a model of population fission without hybridization becomes unacceptable, especially when applied to a migratory species like man.
If we suspend, for the moment, any reservations about the assumptions underlying recent exercises in microtaxonomy (Edwards, 1969), what meaning are we to attach to the inference that Guaymi Indians are separated by one fission from Yanomama and
by 5 fissions from Jivaro (Fitch and Neel, 1969), when the fissions are entirely hypothetical and ignore the population exchanges which are an essential part of microevolution ? One is discouraged from a more detailed consideration of this approach, requiring as it does the hypothesis of uniform rates of change of angularly transformed gene frequencies, together with empirical approximations to the apparently intractable maximum likelihood estimation of tree form and node times (Edwards, 1969). As geneticists, therefore, we favor statements of the third type, invoking the concept of identity by descent. Admittedly this approach requires that a sample of loci determining selected traits be considered a random sample of all loci. This assumption, which underlies all other approaches, does not seem a major objection, and in any case can be satisfied asymptotically by increasing the number of loci sampled.
THE MODEL
We define the kinship $\varphi_{ij}$ of two populations I and J as the probability that a random gene in I be identical by descent with a random allele in J. If there are $M_i$ individuals in I and $M_j$ individuals in J, and if there are n loci in each haploid set, then we suppose that one of the $2nM_i$ genes in I is chosen at random; with the locus thus specified, one of the $2M_j$ alleles in J is taken at random, and $\varphi_{ij}$ is the a priori probability that the two genes drawn in this way are identical by descent (autozygous). Naturally the value of $\varphi$ may be expected to vary among loci by drift or selection, but our definition assures that only the mean or a priori value is the object of estimation. Falconer (1960) called $\varphi_{ij}$ the mean coancestry, while Malécot (1969) termed it the mean coefficient of kinship.

Now suppose that the populations of interest comprise an array in which the mean frequency of allele $A_k$ is $Q_k$, whereas in the i-th population the corresponding frequency is $q_{ki}$. To estimate $Q_k$ we shall assume that

1. Each population is panmictic, with genotype frequencies defined by

$$P(A_kA_l) = \begin{cases} q_{ki}^2 & \text{for } k = l, \\ 2q_{ki}q_{li} & \text{for } k \neq l. \end{cases} \tag{1}$$
2. A random sample is drawn from each population, the sample size $N_i$ being proportional to population size $M_i$.

Only gross departures from these assumptions seem likely to be serious. Panmixia can always be approximated by taking each population from a sufficiently small area, not exceeding the breeding range of a community.
If the $q_{ki}$ are estimated from (1) by maximum likelihood as $\hat q_{ki}$, the maximum likelihood estimator of $Q_k$ is

$$\hat Q_k = \sum_i N_i \hat q_{ki} \Big/ \sum_i N_i. \tag{2}$$
Several computer programs are available to estimate the $q_{ki}$, allowing for dominance and deleting alleles that do not have a stationary point in the sample. One of these programs is ALLTYPE (Yasuda, 1969; Miki et al., 1969), which assumes a factor-union system (Cotterman, 1969) in which each allele can be represented by a binary vector of factors producing phenotypes by Boolean union (Table I). This is not a severe restriction, since all the regular phenotype systems in man are factor-union systems or may be reduced to such with little loss of information (Morton, 1969).

TABLE I
The ABO Blood Groups as a Factor-Union System

          Serological reaction
Gene      anti-A      anti-B      Binary notation
A           +           -              10
B           -           +              01
O           -           -              00

Phenotypes: AA + AO = 10, BB + BO = 01, AB = 11, OO = 00

Now consider the sample $N_i$ in which the estimated number of allele $A_k$ is

$$n_{ki} = 2N_i \hat q_{ki}.$$

Note that, for a random sample of $2N_i$ genes,

$$E[n_{ki}(n_{ki} - 1)] = 2N_i(2N_i - 1)\,q_{ki}^2.$$

Therefore an appropriate estimate of homozygosis in the i-th population is

$$\widehat{\sum_k q_{ki}^2} = \sum_k n_{ki}(n_{ki} - 1) \Big/ 2N_i(2N_i - 1),$$

whereas in the array,

$$\widehat{\sum_k Q_k^2} = \sum_k \Big(\sum_i n_{ki}\Big)\Big(\sum_i n_{ki} - 1\Big) \Big/ \Big(\sum_i 2N_i\Big)\Big(\sum_i 2N_i - 1\Big).$$
We shall see in the discussion that these estimates of $\sum q_{ki}^2$ and $\sum Q_k^2$ are not efficient, and will derive more efficient estimators. If there were panmixia in the array, the heterozygosis would be $H_0 = 1 - \sum Q_k^2$, compared with $H_i = 1 - \sum q_{ki}^2$ in the i-th population. By definition (Wright, 1951)

$$\varphi_{ii} = (H_0 - H_i)/H_0 = \Big(\sum q_{ki}^2 - \sum Q_k^2\Big) \Big/ \Big(1 - \sum Q_k^2\Big). \tag{3}$$

The mean kinship within populations is defined by Eq. (4), which averages the $\varphi_{ii}$ over populations. Kinship between different populations I and J may be estimated from the heterozygosis in an $F_1$ population,

$$H_{ij} = 1 - \sum_k \hat q_{ki}\hat q_{kj} = 1 - \sum_k n_{ki}n_{kj}/4N_iN_j, \tag{5}$$
where $\sum_k \hat q_{ki}\hat q_{kj}$ can be obtained as the inner product of the vectors of maximum likelihood estimators. A more efficient estimate of $\varphi_{ij}$ is given in the discussion. The mean kinship in the array of populations, $\varphi_R$, is defined by Eq. (6). To impose panmixia for random pairs of gametes, we may adjust our estimate of $\varphi_{ij}$ to give

$$\varphi'_{ij} = (\varphi_{ij} - \varphi_R)/(1 - \varphi_R). \tag{7}$$
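The estimates (3), (5), and (7) can be illustrated with a short numerical sketch. It assumes codominant alleles, so that allele counts can be obtained by direct counting rather than by maximum likelihood, and it pools the samples to estimate the array homozygosis. The function names and the example counts are hypothetical and are not taken from the paper or from ALLTYPE.

```python
from typing import Sequence


def homozygosity(counts: Sequence[float], n_individuals: int) -> float:
    """Estimate sum_k q_k^2 as sum_k n_k (n_k - 1) / [2N (2N - 1)]."""
    genes = 2 * n_individuals
    return sum(n * (n - 1) for n in counts) / (genes * (genes - 1))


def array_heterozygosity(counts, sizes) -> float:
    """H_0 = 1 - sum_k Q_k^2, using allele counts pooled over the array."""
    pooled = [sum(column) for column in zip(*counts)]
    return 1.0 - homozygosity(pooled, sum(sizes))


def kinship_within(counts, sizes, i: int) -> float:
    """Eq. (3): phi_ii = (H_0 - H_i) / H_0."""
    h0 = array_heterozygosity(counts, sizes)
    h_i = 1.0 - homozygosity(counts[i], sizes[i])
    return (h0 - h_i) / h0


def kinship_between(counts, sizes, i: int, j: int) -> float:
    """phi_ij = (H_0 - H_ij) / H_0, with H_ij taken from Eq. (5)."""
    h0 = array_heterozygosity(counts, sizes)
    cross = sum(a * b for a, b in zip(counts[i], counts[j]))
    h_ij = 1.0 - cross / (4.0 * sizes[i] * sizes[j])
    return (h0 - h_ij) / h0


def adjust_for_random_kinship(phi: float, phi_r: float) -> float:
    """Eq. (7): (phi - phi_R) / (1 - phi_R)."""
    return (phi - phi_r) / (1.0 - phi_r)


if __name__ == "__main__":
    # Hypothetical data: two populations of 50 individuals, two alleles.
    counts = [[60, 40], [20, 80]]
    sizes = [50, 50]
    print(kinship_within(counts, sizes, 0))
    print(kinship_within(counts, sizes, 1))
    print(kinship_between(counts, sizes, 0, 1))
```

With only two divergent populations the between-population estimate is negative, which is expected because kinship is measured relative to the pooled array.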
Associated with each estimate of $\varphi_{ij}$ is an amount of information K. Neglecting heterogeneity among systems, we take K as $N_iN_jk/(N_i + N_j)$, where k is the contribution per individual to the information about F = 0 when gene frequencies are known exactly in a sample of phenotypes (Yasuda, 1969). Allowing for heterogeneity in $\varphi_{ij}$ among systems, more nearly equal weights are appropriate. The problem of defining fully efficient weights is treated in the discussion. However defined, K may be used as a weight to average p independent estimates of $\varphi_{ij}$, so that the mean value over all loci sampled is

$$\bar\varphi_{ij} = \sum K\hat\varphi_{ij} \Big/ \sum K. \tag{8}$$

The reliability of $\bar\varphi_{ij}$ as an estimator for random loci will tend to increase with p.
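As a minimal sketch of the weighting in (8), the following assumes that the information attached to one system is $K = N_iN_jk/(N_i + N_j)$ and that the per-system estimates are independent. The function names and the numbers in the example are illustrative assumptions only.

```python
def information(n_i: float, n_j: float, k: float) -> float:
    """Assumed information weight K = N_i N_j k / (N_i + N_j)."""
    return n_i * n_j * k / (n_i + n_j)


def weighted_mean_kinship(estimates, weights) -> float:
    """Eq. (8): sum(K * phi) / sum(K) over independent systems."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


if __name__ == "__main__":
    # Two hypothetical systems scored on the same pair of samples.
    weights = [information(100, 100, 1.0), information(100, 100, 3.0)]
    print(weighted_mean_kinship([0.01, 0.03], weights))
```

Equal sample sizes make the weights proportional to k, so the more informative system dominates the mean.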
EXTENSION TO DIVERSE DATA
Estimation of $\varphi_{ij}$ from phenotypes is an example of induction which will be called bioassay of kinship, to be differentiated from the deductive methods to be discussed later. Data suitable for bioassay include not only regular phenotypes determined by a single locus, but also isonymy and metrics.

By isonymy is meant the probability that two individuals drawn at random from the same population belong to the same localized descent group, and therefore have the same surname or clan (Crow and Mange, 1965). Let $q_{ki}$ be the frequency of the k-th name, of which $n_{ki}$ cases are observed in a random sample of $N_i$ individuals from the i-th population. Then an unbiased estimator of $\sum_k q_{ki}^2$ is

$$\sum_k n_{ki}(n_{ki} - 1) \Big/ N_i(N_i - 1), \tag{9}$$

whereas in the array

$$\widehat{\sum_k Q_k^2} = \sum_k \Big(\sum_i n_{ki}\Big)\Big(\sum_i n_{ki} - 1\Big) \Big/ \Big(\sum_i N_i\Big)\Big(\sum_i N_i - 1\Big). \tag{10}$$

Crow and Mange (1965) showed that if all names are monophyletic (due to descent from a common ancestor), if mutation and selection are negligible relative to migration, and if names are subject to the same forces of drift and migration as genes, then

$$\varphi_{ii} = \Big(\sum q_{ki}^2 - \sum Q_k^2\Big) \Big/ 4\Big(1 - \sum Q_k^2\Big). \tag{11}$$

This will be an overestimate if names are locally polyphyletic. Thus an occupational name like Smith would not be a deviation from the model in England if it were adopted randomly in different shires, whereas regional names adopted by many members of a community (like Andermatten in the Swiss commune of Saas) would give a seriously inflated estimate of $\varphi_{ii}$. Kinship between populations may be estimated from the isonymy of hybrids, as given by Eq. (12).
Again, more efficient estimates of $\varphi_{ii}$ and $\varphi_{ij}$ are presented in the discussion. The large number of names and absence of dominance give estimates from isonymy high precision, but the uncertain validity of the assumptions makes one hesitate to give the estimates disproportionate weight.

Many of the traits of interest to anthropologists are of complex inheritance and distributed continuously or in arbitrary grades. We shall call such traits metrical. Let X be an additively inherited metrical character in standard deviation units (with a mean of zero and unit variance), whose value is $X_i$, $X_j$ for populations I and J. Malécot (1948) showed that

$$E(X_iX_j) = 2h^2\varphi_{ij}/(1 + \alpha) \approx 2h^2\varphi_{ij}, \tag{13}$$

where $h^2$ is the heritability and $\alpha$ is the mean kinship between uniting gametes. Application of (13) requires estimation of heritability and separation of sampling errors from the true variance among populations, which can be done by standard analysis of variance procedures. Dominance deviations can probably be neglected. Metrics can be made orthogonal by taking the first n - 1 as covariates for the n-th metric, the order being arbitrary. The most serious limitation is that the nongenetic sources of variation must be randomly distributed among populations, which cannot be tested directly. Uncertainty about the validity of this assumption seems to outweigh the convenience of metrical traits for assaying many loci simultaneously. However, (13) is of some interest because it allows estimation of heritability relative to a reference trait with high heritability even in the absence of family data.
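The isonymy estimate (11) discussed earlier in this section can be sketched in a few lines. The sketch assumes that the unbiased estimator of $\sum q_k^2$ from name counts is $\sum n_k(n_k - 1)/N(N - 1)$, applied both to one population and to the pooled array; the function names and the surname lists are hypothetical.

```python
from collections import Counter
from typing import Sequence


def isonymy(names: Sequence[str]) -> float:
    """Unbiased estimate of sum_k q_k^2: sum n_k (n_k - 1) / [N (N - 1)]."""
    total = len(names)
    counts = Counter(names)
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


def kinship_from_isonymy(local_names, array_names) -> float:
    """Eq. (11): (sum q_ki^2 - sum Q_k^2) / [4 (1 - sum Q_k^2)]."""
    local = isonymy(local_names)
    pooled = isonymy(array_names)
    return (local - pooled) / (4.0 * (1.0 - pooled))


if __name__ == "__main__":
    # Hypothetical surnames: one village and the whole array (village included).
    village = ["A", "A", "B", "C"]
    array = village + ["A", "D", "E", "F"]
    print(kinship_from_isonymy(village, array))
```

The divisor of 4 carries the assumptions stated in the text (monophyletic names subject to the same drift and migration as genes), so the result should be read as conditional on them.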
DEDUCTION OF KINSHIP
All the approaches considered so far purport to estimate kinship without assumptions about its determinants or the times over which they have acted. Alternative methods are deductive, aiming to predict kinship from genealogical or demographic data.

The genealogical method is by far the oldest (Wright, 1921; Malécot, 1948). It has the unique property of predicting kinship not only as a mean between groups of individuals, but primarily as a relation between pairs of individuals. Without stressing this point, we may pass at once to kinship in the sense of this paper, which is given by Eq. (14), where for each pair of individuals the sum is taken over all possible genealogical paths through a common ancestor, and the expectation is taken over all possible pairs of individuals, one from I and one from J. In practice, random pairs are usually substituted for all possible pairs, so that $\varphi_{ij}$ is estimated rather than deduced precisely. The estimate of $\varphi_{ii}$ may be compared with $\alpha_i$, the inbreeding in population I, where the expectation is taken over all marital pairs in I. The main limitation of this method is that it assumes complete knowledge of the genealogy for indefinitely many generations. Any incompleteness leads to underestimation of $\varphi_{ij}$. Parentage error substitutes an a priori likely genealogy for the true one, decreasing the precision of $\varphi_{ij}$ but without bias. A minor disadvantage of the genealogical method is that it ignores systematic pressure due to mutation and selection, but this is important only for large n.

The deterministic demographic method uses three kinds of parameters:

1. the scalar m representing linearized systematic pressure due to selection, mutation, and long-range migration,

$$\Delta q = -m(q - Q), \tag{15}$$

where $\Delta q$ is the rate of change of gene frequency q per generation and Q is the equilibrium value of q. In actual populations and for most polymorphisms it appears that m is dominated by long-range migration.
2. the vector of effective population size N. For our present purposes an element $N_i$ may be thought of as roughly the number of individuals born in I during one generation (usually 25-30 years), who during their lives produce at least one live-born child. Greater precision is given to this concept by Wright (1969).
3. the column-stochastic exchange matrix P whose element $p_{ij}$ is the probability that an individual reproducing in population J was born in population I.

Malécot (1950) showed how these input parameters yield a series of symmetric matrices $\Phi^{(t)}$, whose element $\varphi_{ij}^{(t)}$ predicts the kinship between populations I and J after t generations. For m > 0, $\Phi^{(t)}$ approaches a limit $\Phi^{(\infty)}$ whose elements lie in the interval (0, 1). An approximation to this method for large values of Nm was introduced by Bodmer and Cavalli-Sforza (1968). Imaizumi et al. (1970) and Morton et al. (1971a) returned to the original method, which is valid even for small Nm. The great advantage of this approach is that it does not depend on the completeness or accuracy of genealogies. It therefore provides a test of the importance of relationship so remote that it is not given in the pedigree.

The demographic method suffers from one main disadvantage. In practice all the input parameters must be estimated, usually over a small number of generations. Predictions of $\varphi_{ij}$ may therefore have large errors, especially if the population sizes or migration patterns are not stable. In principle, these parameters could be altered from generation to generation, but in practice we do not have enough precise demographic knowledge to assure that such a model corresponds with any real population.

This leads us to the third deductive method, Monte Carlo simulation, in which such demographic distributions as age at birth, maturity, and marriage, birth intervals, completed family size, and age differences between mates and sibs may be introduced as random variables. Experience with this method is limited by computing time and cost, but in one recent example there were striking differences between the real and artificial populations in the proportions of various degrees of consanguineous marriage (MacCluer and Schull, 1970). This was largely due to omission of migration from the model, so that predictions of kinship tended to approach unity, while the observed values remained near zero in the real population. So far, complex population structure has seldom been simulated by Monte Carlo methods (Levin, 1969), so that no detailed comparison with other approaches is possible.

It is obvious that the various ways to study kinship are complementary. Bioassay gives a description of population structure as it exists in one generation for a sample of loci, and thus provides the only observational check on the validity of predictions, however derived. The classical genealogical method can be compared with deductions based on recent demography which do not depend on the completeness or accuracy of pedigrees. These comparisons will naturally deal primarily with estimates of $\varphi_{ij}$. However, there is another bridge between the demographic and genealogical predictions, which give values of $\varphi_{ij}$ in successive generations. Consider a panmictic island, with systematic pressure m and effective size N. It can easily be shown that
(16) (Wright, 1931; Morton et al., 1971b). Equation (16) can be applied to estimates of $\varphi^{(t)}$ in successive generations, assuming that m and N are constant, and to estimates of the frequencies of chains of length 2t + 1 for a single or pooled generations, assuming that these frequencies are constant. In this way, data from incomplete genealogies can be used to predict the results of complete pedigrees. Genealogical data from both real and artificial populations give least-squares estimates of the demographic parameters N and m. It is an attractive but still incompletely tested hypothesis that even more complex population structure will approximate (16).
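The growth of kinship on a panmictic island can be sketched numerically. This sketch assumes the familiar island recursion $\varphi^{(t)} = (1 - m)^2[1/2N + (1 - 1/2N)\varphi^{(t-1)}]$, which is offered as an assumption and may differ in form from (16). The function names and the parameter values are hypothetical.

```python
def island_kinship(n_eff: float, m: float, generations: int,
                   phi0: float = 0.0) -> float:
    """Iterate the assumed island recursion for a number of generations."""
    retain = (1.0 - m) ** 2          # both genes escape systematic pressure
    sample_same = 1.0 / (2.0 * n_eff)  # both genes copy the same parental gene
    phi = phi0
    for _ in range(generations):
        phi = retain * (sample_same + (1.0 - sample_same) * phi)
    return phi


def island_equilibrium(n_eff: float, m: float) -> float:
    """Fixed point of the recursion: c / [2N - (2N - 1) c], c = (1 - m)^2."""
    retain = (1.0 - m) ** 2
    return retain / (2.0 * n_eff - (2.0 * n_eff - 1.0) * retain)


if __name__ == "__main__":
    n_eff, m = 50, 0.01
    for t in (1, 10, 100, 1000):
        print(t, island_kinship(n_eff, m, t))
    print("equilibrium", island_equilibrium(n_eff, m))
    print("1/(4Nm + 1)", 1.0 / (4.0 * n_eff * m + 1.0))
```

For small m the fixed point is close to 1/(4Nm + 1), so fitting successive values of kinship to such a curve is one way the demographic parameters N and m might be recovered by least squares.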
ISOLATION BY DISTANCE
As the number of populations in an array increases, we lose interest in the specific values of $\varphi_{ij}$ and look for a more succinct description of population structure. This was provided by the theory of Malécot (1959), which depends on the Euclidean distance d between populations I and J. In practice, it seems to make little difference, and may sometimes be more appropriate, if we take d as the shortest road distance (Cavalli-Sforza, 1969). As originally derived, in the limit for large distances, there was an annoying effect of dimensionality of migration. Fortunately, at the smaller distances where kinship is significantly positive, observations on real populations and deterministic study of artificial populations have shown that
$$\varphi(d) = ae^{-bd}, \tag{17}$$

where

$$a \approx 1/[1 + 4N\sqrt{m(m + 2 - 2H)}], \tag{18}$$

regardless of dimensionality (Imaizumi et al., 1970). In (18), $\sigma^2$ is defined somewhat ambiguously as the expected value of the square of the distance between birth places of parent and child, excluding long-range migration which enters only into the systematic pressure m, and H is the proportion of parents born in a population who reproduce there. We are here concerned not so much with the precision of (18), but with the wide validity of (17). Earlier work concentrated on this relationship, whose parameters a, b vary widely among societies. Large values of a (.03 or more) have been found in island and primitive populations, while large values of b are characteristic of continental isolates but not oceanic islands or wide-ranging hunters and gatherers (Morton, 1969; Imaizumi et al., 1970). Whereas only predictive approaches are applicable to the evolution of kinship in time, both bioassay and deduction can be used for isolation by distance (Figs. 1 and 2). The number of societies examined in this way is still small, and there is need for more comparative study, both among societies and by different methods in the same society.

FIG. 1. Deduction of kinship in Oxfordshire villages by the deterministic demographic method. Abscissa: distance d (mi). Data of Hiorns et al. (1969). After Imaizumi et al. (1970).

FIG. 2. Bioassay of kinship in Switzerland. Abscissa: distance (km). Series: ABO, all Switzerland; ABO, Alpine isolates; random isonymy; pedigree inbreeding. After Morton and Hussels (1970).

The Hierarchical Model
We have defined $\varphi_{ij} = (H_0 - H_{ij})/H_0$ as the probability that a gene in I be identical by descent with an allele in J. There is a corresponding probability

$$\varphi_{i+j} = (H_0 - H_{i+j})/H_0 \tag{19}$$

that two genes taken at random from an $F_2$ hybrid between I and J are autozygous. Since

$$H_{i+j} = 1 - \sum_k (q_{ki} + q_{kj})^2/4 = \Big(4 - 2\sum_k q_{ki}q_{kj} - \sum_k q_{ki}^2 - \sum_k q_{kj}^2\Big)\Big/4 = (2H_{ij} + H_i + H_j)/4,$$

we may write (19) as

$$\varphi_{i+j} = (2\varphi_{ij} + \varphi_{ii} + \varphi_{jj})/4.$$

Finally, there is a coefficient $\theta_{ij}$ comparing the heterozygosity of $F_1$ and $F_2$ populations, which we shall call the hybridity between I and J,

$$\theta_{ij} = \frac{H_{ij} - H_{i+j}}{H_{i+j}} = \frac{\varphi_{i+j} - \varphi_{ij}}{1 - \varphi_{i+j}} = \frac{\varphi_{ii} + \varphi_{jj} - 2\varphi_{ij}}{4 - 2\varphi_{ij} - \varphi_{ii} - \varphi_{jj}}, \tag{20}$$

which equals 0 if i = j. Since $H_{ij} > H_{i+j}$, the expectation of $\theta_{ij}$ is always positive, but it is not a probability because (20) gives

$$\varphi_{ij} = \varphi_{i+j} - \theta_{ij}(1 - \varphi_{i+j}).$$

Note that as $\theta_{ij}$ and $\varphi_{i+j}$ are functions of $\varphi_{ij}$, they can be deduced from genealogies and migration as well as from bioassay.

These coefficients have a simple interpretation in terms of estimating heterozygosis when gene frequencies are given, either by prior knowledge or simultaneous estimation in a defined sample. Given the array gene frequencies $Q_k$, the genotype frequencies in an $F_1$ hybrid between populations I and J are predicted by $\varphi_{ij}$, whereas $\varphi_{i+j}$ predicts the genotype frequencies in an $F_2$ hybrid. Given the mean gene frequencies $q_{k(i+j)}$ of populations I and J, as for an $F_2$ hybrid, $-\theta_{ij}$ gives the genotype frequencies in the $F_1$. Clearly, the genotype frequencies are estimated with greater precision from $(q_{i+j}, \theta_{ij})$ than from $(Q, \varphi_{ij})$, which requires less detailed knowledge of gene frequencies.

Adjustment of $\varphi_{ij}$ by random kinship $\varphi_R$ relates to contemporary array gene frequencies, whereas adjustment by the kinship at large distances relates to gene frequencies of ancestors sufficiently remote so that their descendants may be considered distributed randomly over the array. Bioassay of kinship in this sense is comparable to deductions of kinship, in which expectations are all positive. In Malécot's theory of isolation by distance, $\varphi_{ij}$ follows (17), with $E(\varphi_{ii}) = a$. Therefore from (20) the effect of distance on $\theta_{ij}$ is

$$\theta(d) = a(1 - e^{-bd})/[2 - a(1 + e^{-bd})] \approx a(1 - e^{-bd})/2, \tag{21}$$

which equals 0 if i = j. If it is assumed that all populations in the array have the same kinship ($\varphi_{ii} = \varphi_{jj} = a$), then (20) may be rearranged to give

$$\varphi_{ij} = [a - (2 - a)\theta_{ij}]/[1 - \theta_{ij}]. \tag{22}$$
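The relations among $\varphi_{ij}$, $\varphi_{i+j}$, and $\theta_{ij}$ in (20) to (22) can be checked numerically. The sketch below assumes that kinship declines with distance as $ae^{-bd}$, as in (17); the function names are hypothetical, and the values of a, b, and d in the example are illustrative only.

```python
import math


def f2_kinship(phi_ii: float, phi_jj: float, phi_ij: float) -> float:
    """phi_{i+j} = (2 phi_ij + phi_ii + phi_jj) / 4."""
    return (2.0 * phi_ij + phi_ii + phi_jj) / 4.0


def hybridity(phi_ii: float, phi_jj: float, phi_ij: float) -> float:
    """Eq. (20): theta_ij in terms of the three kinship coefficients."""
    return (phi_ii + phi_jj - 2.0 * phi_ij) / (4.0 - 2.0 * phi_ij - phi_ii - phi_jj)


def hybridity_at_distance(a: float, b: float, d: float) -> float:
    """Eq. (21): theta(d) when phi(d) = a exp(-b d) and phi_ii = phi_jj = a."""
    decay = math.exp(-b * d)
    return a * (1.0 - decay) / (2.0 - a * (1.0 + decay))


def kinship_from_hybridity(a: float, theta: float) -> float:
    """Eq. (22): recover phi_ij from theta_ij when phi_ii = phi_jj = a."""
    return (a - (2.0 - a) * theta) / (1.0 - theta)


if __name__ == "__main__":
    a, b, d = 0.0149, 0.6228, 2.0      # illustrative values
    phi = a * math.exp(-b * d)
    theta = hybridity(a, a, phi)
    print(theta, hybridity_at_distance(a, b, d))
    print(phi, kinship_from_hybridity(a, theta))
```

Because a is small, the exact form of (21) and its approximation a(1 - exp(-bd))/2 differ only slightly, which is why hybridity is close to half the kinship lost with distance.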
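Workman and Niswander (1970) defined the “genetic distance” (derived from Sanghvi, 1953), which in our notation is

$$G_{ij} = \sum \big[(q_{ki} - q_{kj})^2/(q_{ki} + q_{kj})\big],$$

the sum being taken over all A alleles in all S systems. If we write

$$E(q_{ki} - q_{kj})^2 = Q(1 - Q)(\varphi_{ii} + \varphi_{jj} - 2\varphi_{ij}), \qquad E(q_{ki} + q_{kj}) = 2Q,$$

then, because the gene frequencies sum to unity within each system, $G_{ij}$ has the approximate expectation $(A - S)(\varphi_{ii} + \varphi_{jj} - 2\varphi_{ij})/2$ and is nearly proportional to $\theta_{ij}$. However, as estimated by Workman and Niswander, $G_{ij}$ includes the variance component due to errors of estimate, which is eliminated from $\theta_{ij}$.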
Canonical Analysis

The square, symmetrical matrix $\Phi$ whose element is $\varphi_{ij}$ may be estimated from pedigrees, migration, genealogies, or Monte Carlo simulation. The two principal bipolar eigenvectors provide a pair of scores (x, y) which, in general, will be related to the geographic coordinates (X, Y) such that linear regression gives a pair of equations

$$X^* = A + Bx + Cy, \qquad Y^* = A' + B'x + C'y.$$

Then the distance of a sample from its predicted coordinates is

$$D = [(X - X^*)^2 + (Y - Y^*)^2]^{1/2}.$$

Populations which have recently migrated into their present location will tend to have large values of D and to map on the (X, Y) axes near their founders. Hybridization with surrounding groups will gradually reduce D. In other cases, large values of D may indicate social barriers locally more important than geographic distance. Thus comparison of geographic with canonical topology provides information about population history, as well as giving a simple representation of kinship. Two canonical topologies, for example from migration and bioassay, may be submitted to the same analysis to detect differences between contemporary and earlier migration patterns.
DISCUSSION
In our first assays of kinship we concentrated on isolation by distance. With more experience in arrays of small numbers of populations, it seems most informative to consider pairwise kinship before continuing on to isolation by distance. Our initial approach used pairs of individuals (Morton et al., 1968), whose probabilities are approximately linear in $\varphi_{ij}$ if kinship is less than the smallest gene frequency (Yasuda, 1968). This approximation has been avoided by using phenotype frequencies without pairing, but only by introducing the assumption of local panmixia, so that the $Q_k$ are not estimated from pooled samples. The efficiency of these alternatives and of our information weights requires further analysis.

If all populations were of the same size and subject to the same systematic pressure, gene frequency variation would approximate the multivariate beta density, with suitable modification for populations in which a gene has temporarily been lost (Crow and Kimura, 1970). However, the assumption of uniform population structure is not generally valid. This is unfortunate, not only because it forces us to use eclectic estimators of unknown efficiency instead of maximum likelihood theory, but also because one would like to find some use for the gene frequency distributions to which theoretical population geneticists have devoted so much effort.

Gene frequencies can be used to estimate kinship by Wahlund’s principle, according to which
This procedure has been widely applied (e.g., Workman and Niswander, 1970), but it suffers from the disadvantage that an estimate of $\varphi_{ii}$ cannot be negative even on the null hypothesis of panmixia. Nei and Imaizumi (1966) have considered some of the problems in unbiased estimation of $\varphi_{ii}$, assuming sampling with replacement, which in practice is not followed. Workman and Niswander (1970) were able to show by randomization that their Wahlund estimates of $\varphi_{ii}$ were inflated by about 40%. They used also the quantity

$$\chi_c^2 = 2N \sum_i \sum_k (N_i/N)(q_{ki} - Q_k)^2/Q_k.$$

Dominance inflates this statistic, which does not then have a $\chi^2$ distribution even on the null hypothesis of random mating. Since its expectation is a function of sample size, $\chi_c^2$ is not an acceptable measure of kinship.

The definition of genetic distance does not allow for differences in sample size among systems. Greater efficiency can be obtained by taking the weighted form given by Eqs. (23) and (24), where the first sum is taken over all $A_h$ alleles within the h-th system and the second sum is over all systems. However, this is biased by inclusion of errors of estimate. An unbiased estimator of improved efficiency can be obtained by weighting $E(q_k^2)$ with $1/Q_k$, as with the $\chi^2$ test of heterogeneity for codominant alleles, $\chi^2_{(A-1)} = 2N(\sum q_k^2/Q_k - 1)$, so that (3) is replaced by

$$E(q_{ki}^2) = n_{ki}(n_{ki} - 1)/2N_i(2N_i - 1), \tag{25}$$

$$\varphi_{ii} = \Big[\sum_{k=1}^{A} E(q_{ki}^2)/Q_k - 1\Big] \Big/ (A - 1),$$

and (5) by the corresponding expression (26). For isonymy the same substitution is made, except that the factor of 2 is omitted from $E(q_k^2)$. Then $\theta_{ij}$ for each system can be estimated from (20) and $\theta_{ij}$ over all systems from (24).

A program BIOKIN has been written for calculation of three kinship coefficients:

biased $\theta_{ij}$        Eq. 23 and 24,
unbiased $\varphi_{ij}$     Eq. 25 and 26,
unbiased $\theta_{ij}$      Eq. 25 and 20.
To illustrate these methods we have analyzed the data of Workman and Niswander (1970) on polymorphisms of Papago Indians.

TABLE II
Bioassay of Kinship from Polymorphisms in Papago Indians (data of Workman and Niswander, 1970)

Genetic system         Sample size   Information   Kinship within   Random      Random       Random
                       N             k             populations      kinship     unbiased     biased
                                                   $\varphi$        $\varphi_R$ $\theta$     $\theta$
MNSs                   677           3.00           0.0164          -0.0007     0.0086       0.0114
CcDEe                  678           3.00           0.0019          -0.0007     0.0013       0.0467
P                      676           0.33           0.0089          -0.0007     0.0048       0.0082
Jk^a                   540           0.23           0.0173          -0.0009     0.0091       0.0134
Fy^a                   676           0.73           0.0503          -0.0007     0.0026       0.0382
Di                     679           1.00           0.0221          -0.0007     0.0120       0.0076
ABO                    680           0.03           0.0107          -0.0007     0.0057       0.0099
Gc                     500           1.00           0.0091          -0.0010     0.0050       0.0100
Hp                     660           1.00           0.0091          -0.0008     0.0049       0.0084
W                      409           0.12          -0.0066          -0.0012     0.0027       0.0004
Totals                 6175          10.44          -               -           -            -
k weighted average                                  0.0133          -0.0008     0.0071       0.0104
N weighted average                                  0.0149          -0.0008     0.0080       0.0117
kN weighted average                                 0.0134          -0.0008     0.0072       0.0104
unweighted average                                  0.0139          -0.0008     0.0075       0.0112
mean of averages                                    0.0139          -0.0008     0.0075       0.0109

For various weights (Table II) the mean kinship within populations is about .0139 and random kinship is -.0008, so that the adjusted mean kinship is (.0139 + .0008)/1.0008 = .0147. Using estimates of Workman and Niswander with the same weights, mean kinship plus sampling error is .0236 and sampling error (estimated by a randomization experiment) is .0070, giving a mean kinship of .0236 - .0070 = .0166. Equation 3 gives $\varphi$ = .0165. By (20), mean kinship within populations is roughly twice the mean hybridity, which is .0149 from the unbiased estimate and .0217 from biased kinship, or $\varphi$ = .0148 if we deduct Workman and Niswander’s estimate of sampling error. If G is the mean genetic distance,
defined as $\sum_i \sum_j N_iN_jG_{ij} / \sum_i \sum_j N_iN_j$, based on 21 alleles in 9 systems, then G/(21 - 9) = .0235 estimates kinship plus a sampling error, or $\varphi$ = .0165 if we use Workman and Niswander’s estimate of sampling error. Finally, the ALLTYPE computer program gives .0158 as the mean kinship from phenotypes when gene frequencies are determined by (2) and kinship is estimated iteratively. These various estimates of mean kinship within populations (.0147, .0148, .0149, .0158, .0165, .0166) are obviously in substantial agreement. They are relative to random pairs from the contemporary array.

When the distance parameters a and b are estimated simultaneously, the residual variance is less for information weights k than for equal weights, but intermediate weights k + 1 are better. About half of the variation of unbiased estimates and 2/3 of the variation of biased estimates is removed by estimation of a and b and is therefore directly determined by geographic distance, the residual being due to errors of estimate and nonisotropic effects of isolation.

TABLE III
Estimates of Malécot Parameters (individual weight = k + 1)

Reference set                                      Coefficient            a        $\sigma_a$   b        $\sigma_b$
Random pairs, $\varphi_R$ = -0.00080               $\varphi(d)$           0.0149   0.0028       0.6228   0.1167
Large distances (>40 mi.), $\varphi_L$ = -0.00383  $\varphi(d)$           0.0175   0.0026       0.0774   0.0293
Indefinitely large distances,                      $\varphi(d)$           0.0182   0.0037       0.0661   0.0395
  $\varphi_L$(a) = -0.00462                        unbiased $\theta(d)$   0.0319   0.0152       0.0230   0.0196
                                                   biased $\theta(d)$     0.0352   0.0057       0.0429   0.0183

(a) Asymptote estimated simultaneously with a and b.

The values of a and b and their standard errors depend on the method of estimation, which in turn depends on the reference population. Table III compares estimates relative to random pairs, large distance, and indefinitely large distance. Estimates of a increase and of b decrease as the reference set is enlarged. There is much uncertainty about the kinship of random Papago, which may be as low as .0182 - .0149 = .0033, or as great as .0319 - .0149 = .0170, but is unlikely to be as great as the upwardly biased estimate .0352 - .0149 = .0203. It is striking that the estimates of a for this not remarkably inbred population approach or exceed the value of .02 which Wright (1951) considered a maximum for human isolates. Clearly the importance of local identity by descent has been underrated by theoretical population geneticists.
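The arithmetic behind the summary rows of Table II and the adjusted mean kinship can be retraced with a short sketch. It uses only the sample sizes, information values, and within-population kinship estimates listed per system in Table II; the variable and function names are hypothetical, and the rounding to four decimals is an assumption made to match the printed values.

```python
# Per-system values in the order listed in Table II.
SAMPLE_SIZE = [677, 678, 676, 540, 676, 679, 680, 500, 660, 409]
INFORMATION = [3.00, 3.00, 0.33, 0.23, 0.73, 1.00, 0.03, 1.00, 1.00, 0.12]
KINSHIP = [0.0164, 0.0019, 0.0089, 0.0173, 0.0503,
           0.0221, 0.0107, 0.0091, 0.0091, -0.0066]


def weighted_mean(values, weights) -> float:
    """Weighted average sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def adjust_for_random_kinship(phi: float, phi_r: float) -> float:
    """Eq. (7): (phi - phi_R) / (1 - phi_R)."""
    return (phi - phi_r) / (1.0 - phi_r)


k_weighted = weighted_mean(KINSHIP, INFORMATION)
n_weighted = weighted_mean(KINSHIP, SAMPLE_SIZE)
kn_weighted = weighted_mean(
    KINSHIP, [k * n for k, n in zip(INFORMATION, SAMPLE_SIZE)])
unweighted = sum(KINSHIP) / len(KINSHIP)
mean_of_averages = (k_weighted + n_weighted + kn_weighted + unweighted) / 4.0

# Adjusted mean kinship from the rounded values quoted in the text.
adjusted = adjust_for_random_kinship(0.0139, -0.0008)

if __name__ == "__main__":
    for label, value in [("k weighted", k_weighted),
                         ("N weighted", n_weighted),
                         ("kN weighted", kn_weighted),
                         ("unweighted", unweighted),
                         ("mean of averages", mean_of_averages),
                         ("adjusted", adjusted)]:
        print(f"{label:18s}{value:.4f}")
```

The four averages differ only in the third decimal place, which suggests that the choice of weights matters less here than the variation among systems.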
SUMMARY
Kinship between two populations I and J, defined as the probability of identity by descent for a random pair of alleles from I and J, can be estimated by bioassay of phenotypes, names, metrics, and gene frequencies, and can be predicted from genealogy and demography by both deterministic and Monte Carlo methods. The statistical procedures for bioassay may be improved, but even at this early stage of investigation it appears that, because of its wealth of genetic implications, kinship as defined is the best measure of relationship or biological similarity between two populations. The theory is illustrated by data of Workman and Niswander (1970) on polymorphisms of Papago Indians, and their genetic distance parameter (derived from Sanghvi) is shown to be a simple function of kinship and a good estimator. More efficient and unbiased estimators are given.
REFERENCES

BALAKRISHNAN, V. AND SANGHVI, L. D. 1968. Distance between populations on the basis of attribute data, Biometrics 24, 859-865.
BODMER, W. F. AND CAVALLI-SFORZA, L. L. 1968. A migration matrix model for the study of random genetic drift, Genetics 59, 565-592.
CAVALLI-SFORZA, L. L. 1969. Human diversity, Proc. Int. Congr. Genetics 12th 2, 405-416.
COTTERMAN, C. W. 1969. Factor-union phenotype systems, in “Computer Applications in Genetics” (N. E. Morton, Ed.), pp. 1-18, Univ. Hawaii Press, Honolulu.
CROW, J. AND KIMURA, M. 1970. “An Introduction to Population Genetics Theory,” Harper and Row, New York.
CROW, J. AND MANGE, A. P. 1965. Measurements of inbreeding from the frequency of marriages between persons of the same surname, Eugen. Quart. 12, 199-203.
EDWARDS, A. W. F. 1969. Genetic taxonomy, in “Computer Applications in Genetics” (N. E. Morton, Ed.), pp. 140-142, Univ. Hawaii Press, Honolulu.
FALCONER, D. S. 1960. “Introduction to Quantitative Genetics,” Oliver and Boyd, Edinburgh.
FITCH, W. M. AND NEEL, J. V. 1969. The phylogenic relationship of some Indian tribes of Central and South America, Amer. J. Hum. Genet. 21, 384-397.
IMAIZUMI, Y., MORTON, N. E., AND HARRIS, D. E. 1970. Isolation by distance in artificial populations, Genetics 66, 569-582.
LEVIN, B. R. 1969. Simulation of genetic systems, in “Computer Applications in Genetics” (N. E. Morton, Ed.), pp. 38-46, Univ. Hawaii Press, Honolulu.
MACCLUER, J. W. AND SCHULL, W. J. 1970. Frequencies of consanguineous marriage and accumulation of inbreeding in an artificial population, Amer. J. Hum. Genet. 22, 160-175.
MALÉCOT, G. 1948. “Les Mathématiques de l’Hérédité,” Masson, Paris.
MALÉCOT, G. 1950. Quelques schémas probabilistes sur la variabilité des populations naturelles, Ann. Univ. Lyon Sci. Sec. A 13, 37-60.
MALÉCOT, G. 1959. Les modèles stochastiques en génétique de population, Publ. Inst. Statist. Univ. Paris 8, 173-210.
MALÉCOT, G. 1969. “The Mathematics of Heredity,” Freeman, San Francisco.
MIKI, C., YEE, S., YASUDA, N., AND MORTON, N. E. 1969. ALLTYPE. “A Genetics Program Library” (N. E. Morton, Ed.), pp. 24-27, University of Hawaii Press, Honolulu.
MORTON, N. E. 1969. Human population structure, in “Annual Review of Genetics” (H. L. Roman, Ed.), Vol. 3, pp. 53-73, Annual Reviews, Palo Alto, CA.
MORTON, N. E., MIKI, C., AND YEE, S. 1968. Bioassay of population structure under isolation by distance, Amer. J. Hum. Genet. 20, 411-419.
MORTON, N. E., IMAIZUMI, Y., AND HARRIS, D. E. 1971a. Clans as genetic barriers, Amer. Anthro., to appear.
MORTON, N. E., HARRIS, D. E., YEE, S., AND LEW, R. 1971b. Pingelap and Mokil Atolls: migration, Amer. J. Hum. Genet. 23, 339-349.
NEI, M. AND IMAIZUMI, Y. 1966. Genetic structure of human populations. I. Local differentiation of blood group gene frequencies in Japan, Heredity 21, 9-35.
SANGHVI, L. D. 1953. Comparison of genetical and morphological methods for a study of biological differences, Amer. J. Phys. Anthrop. 11, 385-404.
WORKMAN, P. L. AND NISWANDER, J. D. 1970. Population studies on southwestern Indian tribes. II. Local genetic differentiation in the Papago, Amer. J. Hum. Genet. 22, 24-49.
WRIGHT, S. 1921. Systems of mating, Genetics 6, 111-178.
WRIGHT, S. 1931. Evolution in Mendelian populations, Genetics 16, 97-159.
WRIGHT, S. 1951. The genetical structure of populations, Ann. Eugen. 15, 323-354.
WRIGHT, S. 1969. “Evolution and the Genetics of Populations,” Vol. 2, The theory of gene frequencies, University of Chicago Press, Chicago, IL.
YASUDA, N. 1968. An extension of Wahlund’s principle to evaluate mating type frequency, Amer. J. Hum. Genet. 20, 1-23.
YASUDA, N. 1969. Estimation of the inbreeding coefficient and gene frequency from mating type frequency, in “Computer Applications in Genetics” (N. E. Morton, Ed.), pp. 87-96, University of Hawaii Press, Honolulu.