Surnames and dialects in France: Population structure and cultural evolution

ARTICLE IN PRESS

Journal of Theoretical Biology 237 (2005) 75–86 www.elsevier.com/locate/yjtbi

C. Scapoli (a), H. Goebl (b), S. Sobota (b), E. Mamolini (a), A. Rodriguez-Larralde (c), I. Barrai (a, *)

(a) Dipartimento di Biologia, Università di Ferrara, Via L. Borsari 46, I-44100 Ferrara, Italy
(b) Institut für Romanistik, Universität Salzburg, Akademiestraße 24, A-5020 Salzburg, Austria
(c) Centro de Medicina Experimental, Laboratorio de Genetica Humana, IVIC, Caracas, Venezuela

Received 5 January 2005; accepted 30 March 2005
Available online 1 June 2005
Communicated by Gilean McVean

Abstract

To study the isonymy structure of France in relation to local language variation, we analysed the surname distributions of 6.03 million telephone users registered for the year 2002 in the 21 conterminous regions, their 94 departments and 809 towns of the country. For regions and departments, the differences among local dialects were quantified according to the dialectometrization of the Atlas Linguistique Français. We found that Lasker's distance between regions was correlated with geographic distance with r = 0.692 ± 0.040, while Euclidean (r = 0.546 ± 0.058) and Nei's (r = 0.610 ± 0.048) distances were less strongly correlated. Slightly lower correlations were observed for departments. Dialectometric distance was also correlated with geography (r = 0.582 ± 0.069 for regions and r = 0.617 ± 0.015 for departments). The correlations between the Lasker and dialectometric distance matrices for regions and departments are r = 0.625 ± 0.046 and r = 0.544 ± 0.014, respectively, indicating that the common cause generating surname and language diversity accounts for about 35% of the differentiation. Both Lasker and dialectometric distances identify very similar boundaries between Poitou, Centre, Bourgogne and Franche-Comté in the North, and Aquitaine, Limousin, Auvergne and Rhône-Alpes in the South. Average Fisher's α for France was 7877, the highest value observed for the European countries studied to date. The size of α in most French towns indicates considerable recent immigration.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Population structure; Isonymy distances; Isolation by distance; Dialectology; Dialectometry

1. Introduction

Inbreeding studies in France began very early in the history of human population genetics. Initially, the studies addressed the demography of consanguineous marriages as recorded in Church dispensations, and local inbreeding levels in time and space were studied by investigators such as Fleury (1933), Sutter and Tabah (1948, 1955), Sutter and Goux (1962), Sutter (1968) and Henry (1973), to name only a few of the pioneers who also developed the methods of data analysis.

* Corresponding author. Tel.: +39 0532 291731; fax: +39 0532 249761. E-mail address: [email protected] (I. Barrai).
0022-5193/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.jtbi.2005.03.035

Subsequently, inbreeding studies were often associated with the segregation of recessive genes and became an important complement to medical genetics projects. In the second half of the twentieth century, several investigators continued to develop the theory and application of inbreeding to solve specific genetic problems, overcoming the difficulties involved in ascertaining parental consanguinity and in assessing consanguinity levels in specific areas and for specific genes (for example Jacquard, 1972, 1975; Tchen et al., 1977; Bonaiti et al., 1978; Briard et al., 1979; Stoll et al., 1994; von Kleist-Retzow et al., 1998). However, among the main difficulties in such studies there was, and still is, the acquisition, assessment and evaluation of the data and of
their sources, and a comprehensive plan for assessing the levels and dynamics of inbreeding over France, although a desirable endeavour, was never completely realized.

Crow and Mange (1965) defined the relationship between the frequency of isonymous marriages and inbreeding, and opened the field of isonymy studies. These soon joined the classical inbreeding studies and flourished, notwithstanding the legitimate and serious criticisms elicited by the assumptions inherent in the isonymy models (Rogers, 1991). There are many valid criticisms of the use of surnames as material for the evaluation of inbreeding, and we do not repeat them here. Nevertheless, it has been stated that a crude estimate of F_ST is satisfactory for indicating that drift has occurred (Yasuda and Morton, 1967), and that relative measures of kinship estimated through surnames can still be useful (Relethford, 1988). These are important points: in using large national samples of surnames, as we do here, we are not trying to estimate the exact inbreeding level in a given subregion, but to compare its value with that of other subregions of the same nation.

Developments in CD-ROM and PC technology have in the past few years made available enormous numbers of surnames, together with the capability to analyse very large data sets such as lists of telephone users. We are conscious of the bias in these sources of information, since males are more likely to be listed than females, and telephone availability depends on socio-economic status. However, the use of these sources is justified by the enormous sample sizes available, which make it possible to estimate isonymy in different areas of entire states (Colantonio et al., 2003). Estimates of F_ST can then be compared, since they are equally biased.

In France, isonymy studies soon became very popular, and they still are.
Using surnames, Crognier (1985) studied social change and the variation of inbreeding in a French rural population. Darlu and Ruffié (1992) described the relation between consanguinity and migration rate from surname distributions and isonymy, taking advantage of the large samples of surnames easily obtainable from the civil birth registers. They showed that migration and consanguinity estimated from surname distributions are inversely and closely linked. Vernay (2000, 2001) showed how the distribution of surnames and isonymy could contribute to the assessment of local inbreeding in isolated areas, and Legay and Vernay (2000) studied in detail the origin and geographic distribution of some French surnames. Very recently, Morelli et al. (2002) analysed the surnames of Corsica (Corse), the Mediterranean island that has belonged to France since 1768 after five centuries under the Republic of Genoa. Using principal components, they showed that the distribution of surnames on the island agrees with its geographic and linguistic structure. Since their description is fairly complete, we did not include the island in this work.

An important study of isonymy in France, possibly the most important to date, was produced by Mourrieras et al. (1995). Using an original approach, they analysed how surname frequencies vary with geographic distance, constructing a distorted map of the country based on surname distances. Their study remains the most complete description of inbreeding in France to date.

2. Purpose of the present work

We now want to apply to French surnames the techniques we have developed over the years for estimating inbreeding levels from isonymy, in order to explore further the population structure of the country. Specifically, we want to test whether surname distances are linearly correlated with geographic distances, as we observed in all European countries studied, in Venezuela and in Argentina. Further, we can compare the structures identified by surnames with those identified by dialectological distances, since we have available the appropriate methodology to derive such distances from linguistic similarities (Goebl, 1984, 1993, 2000, 2002, 2003). This second aim is most relevant to the study of the evolution of human groups, since isolation by distance for dialects would show that mechanisms typical of genetic evolution also act in cultural evolution.

3. Isonymy structure

In the hierarchical model of inbreeding (Wright, 1951), F_ST results from division into subpopulations, so it is an appropriate metric of differentiation among regions, among departments and among towns. The main utility of F_ST in the history of a population is to indicate when most genetic drift is likely to have occurred. Differences between F_ST values in different areas of a country might indicate the relative age of the settlements in it.
We recall that high values of F_ST from isonymy arise when a population or subgroup has relatively few surnames, and low values when the number of surnames is large. As in the case of alleles, drift of surnames is proportional to time, so a small F_ST suggests recent immigration or settlement.

4. Materials and methods

4.1. Towns, departments and regions

France is the largest country in Western Europe. It has a surface of 547,030 km² and about 60 million
inhabitants. Those over 14 years of age make up about 81.5% of the total. The French State has had continuity, with obvious variations in surface, from the unification under Clovis in 486 AD, namely 10 years after the collapse of the Empire under Romulus Augustus in 476, to date. The official language is French. Literacy is 99%. Several dialects (Occitan, Breton, Alsatian, Catalan, Basque and Flemish) are still prevalent but fading away (Lexikon der Romanistischen Linguistik (LRL), vol. V, tome 2, 1991). Summary data on the country are available at the Central Intelligence Agency site: http://www.cia.gov/cia/publications/factbook/country.html

Here, we consider the surnames of more than 6 million private telephone users, registered for the year 2002 and downloaded from a commercial CD-ROM. These were distributed in the 21 conterminous regions, their 94 departments, and 809 towns of the country. Within the 21 regions, users are classified by department; within the 94 departments, users are classified by town of residence. Our statistical units are the towns, which differ greatly in latitude, in area, and in number of users, from 845,015 in Paris to 47 in Chambord. The average number of users in 807 towns, excluding Paris and Marseille, was 6149. Paris is itself a department, and has the largest number of users. The smallest department was Lozère in Languedoc, with 9606 users. The average number of users for departments was 64,123. The largest region is Ile-de-France with 1,053,592 users, and the smallest is Limousin with 105,071. The average number of users for regions was 287,027. The 6.03 million users included in this study are representative of about 18.6 million persons. As stated above, the present sample does not include the island of Corsica (Corse) in the Mediterranean.

4.2. Isonymy and dialect theory

In the following subsections, we briefly recall the definitions of the statistics derived from the surname distributions and from dialects, and their meaning in the study of microevolution in human groups. The standard parameters are isonymy ($\sum_i p_i^2$), namely $4F_{ST}$, Fisher's α (Fisher, 1943), and Karlin–McGregor's ν (Karlin and McGregor, 1967).

4.2.1. Isonymy between groups

We have defined above isonymy within groups, in the present case regions, departments and towns. However, the distribution of surnames between groups is important for assessing the similarity of their populations and, at the limit, their common origin. Based on the surname distributions, random isonymy between localities I and J ($I_{ij}$) was estimated as

$$I_{ij} = \sum_k p_{ki}\, p_{kj},$$
where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in groups I and J, respectively, and the sum is over all surnames. When the two groups have no surnames in common, isonymy is of course zero.

4.2.2. Alpha (α) and Karlin–McGregor (ν)

Fisher's α was estimated according to Barrai et al. (1996). We note that in general, for large samples, $\alpha = 1/I_{ii}$, where $I_{ii}$ is the random isonymy within the ith sample. It can be defined as the 'Effective Surname Number', ESN (Barrai et al., 2000): it estimates the number of surnames of equal frequency that would give the same isonymy as that observed. A small value of α would be the consequence of high inbreeding and drift, whereas a high value would be the consequence of migration and low inbreeding. Since, in the presence of a migration rate ν, $F_{ST} = 1/(4N\nu + 1)$, and since isonymy is $4F_{ST}$ with α as its inverse, it follows that

$$\alpha = N\nu + \tfrac{1}{4}.$$

For large N, α approximates Nν very closely. This makes α a useful predictor of the evolutionary dynamics of a system and a sufficient indicator of structure.

4.2.3. Isolation by distance

The isonymy between two groups I and J, $I_{ij}$, is twice Lasker's coefficient of relationship, $R_i$ (Lasker, 1977). We note also that $I_{ij}$ is a function of the kinship $\varphi_{ij}$ between groups I and J, as shown by Rodriguez-Larralde et al. (1998a). To detect isolation by distance, we calculate the linear correlation of the surname distances, namely Lasker's (Rodriguez-Larralde et al., 1998a), Euclidean (Cavalli-Sforza and Edwards, 1967) and Nei's (Nei, 1973), with geographic distances. To calculate the geographic distance between locations, we used the weighted average of the individual coordinates inside a location, where the weight was the number of users. Thus, to calculate the coordinates of a department we averaged the coordinates of the towns inside it, and to calculate the coordinates of a region we averaged the coordinates of its departments. The coordinates of towns were measured on maps of France at a scale of 1/1,000,000. The significance of correlations was assessed using a permutation method (Smouse et al., 1986).

To obtain a graphical representation of the surname relationship between different regions and different departments, dendrograms were constructed from the matrices of Lasker's distances using the UPGMA option of the NEIGHBOR algorithm of the PHYLIP package (Felsenstein, 1989, 1993). The graphs of the dendrograms were obtained at http://bioweb.pasteur.fr/seqanal/interfaces/drawgram.html. From the dendrograms, maps were constructed with the main clusters of regions (or departments) indicated in different colours.
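As an illustration of the quantities defined in Section 4.2, the sketch below computes random isonymy within and between groups and the large-sample approximation α = 1/I_ii. The three distance functions are assumptions: this section cites the sources of Lasker's, Nei's and Euclidean distances but does not give their formulas, so the forms used here (L = -ln I_ij, N = -ln(I_ij / sqrt(I_ii I_jj)), E = sqrt(1 - Σ_k sqrt(p_ki p_kj))) should be checked against the cited papers. All function names and the toy surname lists are hypothetical.

```python
import math
from collections import Counter


def frequencies(surnames):
    """Relative frequency p_k of every surname in one group."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def isonymy(p_i, p_j):
    """Random isonymy I_ij = sum_k p_ki * p_kj; pass the same group twice for I_ii."""
    return sum(p * p_j.get(name, 0.0) for name, p in p_i.items())


def alpha(p_i):
    """Large-sample approximation alpha = 1 / I_ii (effective surname number)."""
    return 1.0 / isonymy(p_i, p_i)


def lasker_distance(p_i, p_j):
    """ASSUMED form: L = -ln(I_ij); infinite when no surname is shared."""
    i_ij = isonymy(p_i, p_j)
    return math.inf if i_ij == 0.0 else -math.log(i_ij)


def nei_distance(p_i, p_j):
    """ASSUMED form: N = -ln(I_ij / sqrt(I_ii * I_jj))."""
    i_ij = isonymy(p_i, p_j)
    if i_ij == 0.0:
        return math.inf
    return -math.log(i_ij / math.sqrt(isonymy(p_i, p_i) * isonymy(p_j, p_j)))


def euclidean_distance(p_i, p_j):
    """ASSUMED form: E = sqrt(1 - sum_k sqrt(p_ki * p_kj))."""
    overlap = sum(math.sqrt(p * p_j.get(name, 0.0)) for name, p in p_i.items())
    return math.sqrt(max(0.0, 1.0 - overlap))  # clamp guards against rounding


# Hypothetical toy groups, not data from the study.
town_a = frequencies(["Martin", "Martin", "Bernard", "Durand"])
town_b = frequencies(["Martin", "Meyer", "Muller", "Klein"])
```

In this toy case `town_b` holds four equally frequent surnames, so `alpha(town_b)` is 4, which matches the reading of α as an effective surname number.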


4.3. Dialectometric similarity and distance

Dialects are variations of main languages and their transmission is exclusively cultural. France has a wealth of dialects, which are the basis for the linguistic section of the present study. A convenient description of the French dialects and of their origin can be found in the Lexikon der Romanistischen Linguistik (LRL vol. V, tome 1, 1988) and read at http://www.discoverfrance.net/France/Language/DF_languages.html

For the evaluation of dialectological similarity and distance among departments and regions, we used as a source the Linguistic Atlas of France (Atlas Linguistique de la France (ALF), 1968). We analysed the inner geolinguistic variation of 626 of the 1421 linguistic maps of the ALF according to the following linguistic categories: phonetics (vocalism and consonantism), lexicon and morphosyntax. In this way we obtained a data matrix with the dimensions N × p, with N = 641 geographic inquiry points (located mainly in France but also in Belgium, Switzerland, Italy and Great Britain [Channel Islands]) and p = 1687 "working maps" (equivalent, from a metrological point of view, to categorical multistate characters). Note that, unfortunately, only Romance dialects were considered in the ALF, so the languages most distinct from French (Breton, Basque and Flemish) could not be included in the exploration grid.

In a second step, this matrix was drastically reduced in order to make it comparable with the two surname grids of France (departments and regions): first to include only French inquiry points (reduction of N from 641 to 574 points), secondly from this level to that of the departments (reduction of N from 574 to 94 inquiry points), and ultimately from the department level to the regional level (reduction of N from 94 to 21 inquiry points), whereas the initial number of "working maps" (p = 1687) was not altered. All these reductions were done by means of a progressive superposition of different polygon maps created by the Voronoi tessellation technique (Byers, 1992, 1996).

Since linguists, and geolinguists in particular, are more interested in geolinguistic similarities than in distances, we calculated linguistic similarity through a qualitative similarity index often used in the field of Numerical Classification. We call it the "Relative Identity Value" (RIV; cf. Goebl, 1993). The relationship between linguistic similarity and linguistic distance is given by RIV + RDV = 100, where RDV stands for Relative Distance Value.

Hence, the flowchart of our dialectometric calculations contains the following standard steps: construction of the data matrix (here: N = 94 or 21 inquiry points, p = 1687 characters), choice of an adequate similarity index (here: RIV), calculation of square similarity and distance matrices (here: 94 × 94 and 21 × 21), and final taxometric exploitation of the matrices (e.g. through dendrograms, maps, etc.).

Both the electronic storage of our dialectometric data and their subsequent statistical and cartographic processing are handled by a sophisticated software package, Visual DialectoMetry (VDM), created and still regularly updated by Edgar Haimerl (Blaustein, Germany). Further details of our dialectometric methods and of the program VDM are available at http://ald.sbg.ac.at/dm
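The relation RIV + RDV = 100 can be made concrete with a minimal sketch. It assumes that RIV is the percentage of working maps on which two inquiry points show the same categorical state, and that maps with a missing answer at either point are skipped; the precise definition should be taken from Goebl (1993). The function names, the treatment of missing data and the toy inquiry points are hypothetical and unrelated to the VDM software.

```python
def riv(point_a, point_b):
    """Relative Identity Value (%) between two inquiry points.

    Each point is a sequence of categorical states, one per working map.
    ASSUMPTION: None marks a missing answer and that map is skipped.
    """
    pairs = [(a, b) for a, b in zip(point_a, point_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("the two points share no answered working map")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def rdv(point_a, point_b):
    """Relative Distance Value, from RIV + RDV = 100."""
    return 100.0 - riv(point_a, point_b)


def rdv_matrix(points):
    """Square distance matrix over a dict of inquiry points (zero diagonal)."""
    names = list(points)
    return [[0.0 if a == b else rdv(points[a], points[b]) for b in names]
            for a in names]


# Hypothetical inquiry points with four working maps each.
points = {
    "P1": ["a", "b", "c", "d"],
    "P2": ["a", "b", "x", "d"],
    "P3": ["y", "b", None, "z"],
}
matrix = rdv_matrix(points)
```

With N = 94 or N = 21 inquiry points, the same loop would yield the 94 × 94 and 21 × 21 matrices mentioned in the text.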

5. Results and discussion

5.1. Distribution of users

As already stated, the number of users per department ranged from 9606 in Lozère to 845,015 in the Paris area, which is department number 75. The number of different surnames found in the country was 495,104. The detailed distribution of their occurrences is available from the corresponding author. The identification, coordinates, sample size and number of surnames of each of the 21 regions analysed are given in Table 1, along with α, ν, and I = 4F_ST. The same data were obtained for each of the 94 departments and for all towns; they can be viewed and downloaded at http://www2.unife.it/progetti/genetica/pdata.htm, the site where we make our public data available. The location of the towns is given in Fig. 1.

Table 1
Name, number (nr), coordinates (X, Y), number of users (N), number of surnames (S), Fisher's α, Karlin's ν, and isonymy (I = 4F) of the 21 French regions

Region           X    Y    nr  N        S       α     ν        I = 4F
ALSACE           932  690   1  169793   51277   2841  0.01646  0.0003221
AQUITAINE        335  286   2  282284   78042   6497  0.02251  0.0001540
AUVERGNE         615  455   3  152316   40296   3732  0.02392  0.0002680
BASSE-NORM.      342  751   4  117538   31251   2594  0.02160  0.0003855
BOURGOGNE        704  548   5  185223   51552   4277  0.02258  0.0002339
BRETAGNE         170  664   6  321106   55791   2768  0.00856  0.0003612
CENTRE           495  586   7  246059   64142   4353  0.01739  0.0002298
CHAMPAGNE        696  745   8  156126   45153   4313  0.02689  0.0002319
FR.-COMTÉ        834  561   9  113734   33371   3437  0.02934  0.0002910
HTE-NORMAND.     432  791  10  138020   37320   3033  0.02151  0.0003297
ILE-DE-FRANCE    551  728  11  1053592  232627  9258  0.00872  0.0001081
LANGUEDOC        651  145  12  282221   76076   3712  0.01299  0.0002695
LIMOUSIN         490  383  13  105071   29442   3714  0.03415  0.0002693
LORRAINE         829  732  14  157881   48933   3213  0.01995  0.0003113
MIDI             489  170  15  361775   83423   4638  0.01267  0.0002157
NORD-P.-DE-C.    584  922  16  213947   55525   4277  0.01961  0.0002339
PAYS-DE-LOIRE    304  572  17  402796   67685   3348  0.00825  0.0003348
PICARDIE         574  815  18  122920   39012   3690  0.02915  0.0002711
POITOU           355  442  19  182627   45119   3144  0.01693  0.0003181
PROVENCE         841  158  20  724319   156855  6739  0.00923  0.0001485
RHÔNE-ALPES      772  379  21  538208   117299  5221  0.00962  0.0001916

Fig. 1. Geographic distribution of the 809 towns of France considered in this study.

5.2. The most frequent surnames

The log–log frequency distribution of the occurrence of surnames (Fox and Lasker, 1983) is given in Fig. 2. The figure shows the logarithm of the number of surnames plotted against the logarithm of the number of times they occur. The graph is fairly linear, but there seems to be a slight excess of intermediate occurrences, which confers some convexity on the log–log curve, as we observed in Holland (Barrai et al., 2002) and to a lesser extent in Spain (Rodriguez-Larralde et al., 2003).

In Table 2, we give the list of the 100 most frequent surnames. These 100 surnames comprise 487,072 users, or 8.1% of the total of 6 million. The most frequent surname in this sample is Martin (21,043 occurrences), followed by Bernard (10,663), Durand (9193), Thomas (9171) and Richard (8458). Among the first 100 surnames there are seven Spanish surnames: Garcia (8th in rank, 8169), Martinez (6283), Lopez (5073), Sanchez (4646), Perez (4611), Fernandez (3980) and Rodriguez (3566). Dasilva (3944), although very frequent in Spain, may be of Lusitanian origin. The most frequent Arabic surname is Ben (3363), which is a prefix but is often found isolated. We did not attempt to estimate the number of surnames of Arabic origin, since there are too many ambiguities with Spanish and Italian surnames. However, the origin of names like Benhamou (Son of peace, 584), Abdallah (Servant of God, 213) and Elidrissi (Idris, the name of a prophet, 100) is easily identified. This is less true of Medina (town, 380), Moya (water, 200) and Alcantara (the bridge, 59), which may have entered France from Spain. Rossi, the most frequent Italian surname, ranks 224th with 1681 occurrences, and the homologous Russo, from Southern Italy, ranks 968th with 559 occurrences. Müller (4287) ranks 49th, Schmidt (897) 513th and Schultz (231) 3037th. Vandamme (270) ranks 2478th, but it may be either Dutch or Belgian. The most frequent surname from the Indochina area is Nguyen, which ranks 88th with 3148 occurrences.

Direct patronymics dominate, such as Robert (8278), Michel (7707), Simon (7594), Laurent (7249) and Bertrand (6256). Names derived from phenotypes are very frequent, like Moreau (8286), Roux (6784), Blanc (5990), Rousseau (5571) and Leroux (4897). Locatives like Dubois (7765), Dupont (4797), Duval (3635) and Dumont (3376) are also among the 100 most frequent, as are trade names like Faure (5802), Lefebvre (5356), Mercier (4810), Muller (4287), Masson (4124), Marchand (3629) and others. Surnames like Lemoine (2934), Leveque (1903), Lacroix (3128), Leroy (6809), Leduc (1609), Lecomte (2613) and Lemaire (3052) may, at their origin, have indicated a specific parentage.

Fig. 2. The log–log distribution of the frequency of occurrence of surnames in France.

Table 2
The 100 most frequent surnames among 6 million users in France, in increasing order of frequency (N = number of users)

Surname     N      Surname     N
BENOIT      2910   FABRE       4285
MARTY       2912   MULLER      4287
ROLLAND     2925   NICOLAS     4314
LEMOINE     2934   MATHIEU     4375
PIERRE      2957   CLEMENT     4385
COHEN       2970   VIDAL       4393
RENARD      3003   GAUTIER     4397
ROGER       3032   ROBIN       4450
LEMAIRE     3052   BOYER       4559
RENAUD      3083   PEREZ       4611
PICARD      3097   SANCHEZ     4646
LACROIX     3128   LEFEVRE     4718
NGUYEN      3148   CHEVALIER   4778
COLIN       3186   GAUTHIER    4793
AUBERT      3210   DUPONT      4797
REY         3262   MERCIER     4810
BARBIER     3324   PERRIN      4844
LUCAS       3335   LEROUX      4897
GERARD      3344   GUERIN      4963
NOEL        3358   GARNIER     5008
BEN         3363   LOPEZ       5073
DUMONT      3376   LAMBERT     5166
DUFOUR      3387   MOREL       5169
MARIE       3417   ANDRE       5256
JOLY        3436   VINCENT     5309
DUMAS       3443   LEFEBVRE    5356
ROY         3509   ROUSSEAU    5571
JEAN        3514   FOURNIER    5775
RODRIGUEZ   3566   FAURE       5802
GAILLARD    3626   GIRARD      5882
MARCHAND    3629   BLANC       5990
DUVAL       3635   BONNET      6123
MEUNIER     3657   DAVID       6221
MEYER       3662   BERTRAND    6256
ROCHE       3685   MARTINEZ    6283
DENIS       3710   ROUX        6784
BRUNET      3792   LEROY       6809
ARNAUD      3796   LAURENT     7249
LEGRAND     3821   SIMON       7594
BLANCHARD   3875   MICHEL      7707
FONTAINE    3889   DUBOIS      7765
GIRAUD      3894   PETIT       8113
DASILVA     3944   GARCIA      8169
ROUSSEL     3965   ROBERT      8278
FERNANDEZ   3980   MOREAU      8286
FRANCOIS    4043   RICHARD     8458
BRUN        4056   THOMAS      9171
HENRY       4066   DURAND      9193
MASSON      4124   BERNARD     10663
MORIN       4218   MARTIN      21043
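The log–log distribution discussed in Section 5.2 is a tabulation of how many surnames occur exactly once, twice, and so on. The following is a minimal sketch of that tabulation, assuming surname counts are held in a plain dictionary; the function names and the toy counts are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def occurrence_distribution(surname_counts):
    """Map 'number of occurrences' to 'number of surnames occurring that often'."""
    return Counter(surname_counts.values())


def loglog_points(distribution):
    """(log10 occurrences, log10 number of surnames) pairs, sorted for plotting."""
    return sorted((math.log10(k), math.log10(n)) for k, n in distribution.items())


# Hypothetical toy counts: one surname seen 4 times, two seen twice, three once.
toy_counts = {"S1": 4, "S2": 2, "S3": 2, "S4": 1, "S5": 1, "S6": 1}
distribution = occurrence_distribution(toy_counts)
points = loglog_points(distribution)
```

A roughly straight line through `points` corresponds to the near-linear graph described in the text; a bulge in the middle of the range would show up as the slight convexity noted there.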

The abundance of Spanish surnames among these 100 most frequent French surnames, while surnames from other neighbouring countries are absent or very weakly represented, seems to indicate an important and recent directional migration from Spain into France.

5.3. Isonymy parameters in French regions and departments

5.3.1. Fisher's alpha

The effective surname number, α, is 7877 for France considered as a single unit. The average over the 21 regions is 4229, similar to the values of α we have observed in large European towns. The average over the 94 departments is 3546, and over the 809 towns it is 1615. This sequence of values indicates that the estimate of F_ST from this source varies with the size of the area and of the population studied. This is explained in part by the difference in the frequencies of common surnames in each subdivision. For towns we obtain an average F_ST = 0.0003138, for departments F_ST = 0.0000869, and for regions F_ST = 0.0000652. Nei and Imaizumi (1966a, b) were the first to show that F_ST is smaller in large areas and populations than in smaller subdivisions, and we take pleasure in labelling this effect the "Prefecture Effect", since it was first described by studying the variation of random inbreeding in Japanese Prefectures and in small isolated populations in that country.

From the data it is possible to estimate the proportion of surname exchange, or rate of migration (m), per generation using Wright's (1943) formula

$$m = 1 - \sqrt{\frac{2N_e F_{ST}}{(2N_e - 1)F_{ST} + 1}},$$

where $N_e$ is the effective population size. Faute de mieux, we use for France the rough approximation given for Japan by Nei and Imaizumi, in which the effective size is obtained from the count of N users through the relation $N_e = 0.65\,N$. Here we use the harmonic mean of the number of users for each grouping. Under this crude approximation, we obtain the following migration rates:

Formula               Towns  Departments  Regions
Wright's m            0.45   0.10         0.03
Karlin–McGregor's ν   0.36   0.08         0.02
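The migration-rate calculation can be sketched as follows. The sketch inverts Wright's island-model relation F = (1 - m)² / [2N_e - (2N_e - 1)(1 - m)²] for m and takes N_e = 0.65 N with N the harmonic mean of the user counts, as described in the text. The function names and the three user counts are hypothetical; the per-town counts behind the table are not reproduced here, so the sketch makes no attempt to recover the tabulated values.

```python
import math
from statistics import harmonic_mean


def wright_f(m, n_e):
    """Wright's island-model F for migration rate m and effective size n_e."""
    x = (1.0 - m) ** 2
    return x / (2.0 * n_e - (2.0 * n_e - 1.0) * x)


def wright_m(f_st, n_e):
    """Invert the relation above: m = 1 - sqrt(2 Ne F / ((2 Ne - 1) F + 1))."""
    return 1.0 - math.sqrt(2.0 * n_e * f_st / ((2.0 * n_e - 1.0) * f_st + 1.0))


def effective_size(user_counts, ratio=0.65):
    """N_e = ratio * N, with N the harmonic mean of the user counts of a grouping."""
    return ratio * harmonic_mean(user_counts)


# Hypothetical user counts for three towns (not values from the study).
n_e = effective_size([500, 2000, 8000])
# 0.0003138 is the average town F_ST quoted in the text.
m_towns = wright_m(0.0003138, n_e)
```

Because the harmonic mean is dominated by the smallest units, the estimate of m depends strongly on how many small towns enter the grouping.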

These results are somewhat consistent. It seems that, on average, the towns within departments have exchanged about 40% of surnames per generation, the departments within regions about 10%, and the regions within the country about 2%. Exchange is higher among towns within departments and regions, and among departments within regions. These rates are very sensitive to the population size of the area sampled. We also note that differential population growth between towns, departments and regions is a confounding factor in these estimations.

In Fig. 3, we plot the values of α in each town. A strip of low α is visible, running from Brittany in the North down to the Mediterranean, passing through Maine, Vienne and Creuse to Hérault in the South. A low α indicates some homogeneity of surnames, so it might be proposed, among other possibilities, that the low-α strip represents an area of early settlement with low immigration. The lowest values of α were found in the departments of Lozère, Haute-Loire, Vosges, Finistère and Ariège. At the regional level, the lowest values (Table 1) are those of Basse-Normandie, Bretagne and Alsace, all of which border the sea or other countries. The highest values, reasonably enough, were seen in the Ile-de-France, specifically in Paris, and in the region of Provence, in the departments of Alpes-Maritimes, Var and Bouches-du-Rhône. The second largest town of France, Marseille, is located in this region.

Fig. 3. Three-dimensional plot of α over France in the 809 towns sampled here. Note the North-West to South-East depression transect.

5.3.2. Inbreeding by isonymy

We recall that isonymy is a measure of surname similarity inside a group. It has been stated several times that F_ST obtained from isonymy is only a rough estimate of inbreeding. Be that as it may, in France, where we have the values of inbreeding obtained from isonymy for 21 regions, 94 departments and 809 towns, the relative values in the different groups may be only
indicative of the true levels of inbreeding in the specific areas, but they are surely comparable within the country. Since four times the value of random inbreeding is the inverse of α, the geographic variation of F_ST mirrors that of α. So, in France the highest levels of inbreeding are expected in Brittany and in the North-East of the country. The lowest levels are expected in the Ile-de-France, Paris and Provence.

5.4. Isolation by distance

We studied isolation by distance through the correlation of surname and dialect distances with geography at the region and department levels.

5.4.1. Regions

We found that Lasker's distance between regions was significantly and highly correlated with linear geographic distance, with r = 0.692 ± 0.040. Somewhat lower correlations were obtained for Euclidean distance, with r = 0.546 ± 0.058, and for Nei's distance, with r = 0.610 ± 0.048. The smallest isonymy distance found at the region level, 8.25 Lasker's units, was observed between Haute- and Basse-Normandie. These two neighbouring regions face the English Channel and share many surnames occurring in large numbers. Their centres of gravity are 98 km apart. The least related regions are Alsace and Nord/Pas-de-Calais, with a distance of 9.78 Lasker's units and with
centres of gravity 418 km apart. The most frequent surnames in Alsace are Meyer, Muller, Schmitt, Klein, Schneider, while in the Nord/Pas-de-Calais the most frequent are the typically French Lefebvre, Dubois, Leroy, Leclerc, Lemaire, with some Belgian names like Devos and Jansen. 5.4.2. Departments We found that Lasker’s distance between departments was significantly and highly correlated with linear geographic distance, with r ¼ 0:646  0:010. Very similar results were obtained for Euclidean distance, with r ¼ 0:502  0:015 and for Nei’s distance, with r ¼ 0:576  0:012. The smallest isonymy distance found in this analysis at the department level, 7.716 Lasker’s units, was observed between Haute Loire and Loze`re. These two neighbouring departments are related by sharing many surnames occurring in high numbers, like Martin, Boyer, Brun, Michel, Bonnet, Jouve, Roche. Their centers of gravity are 67 km apart. The least related departments are Gers in MidiPyrene´es and Haut-Rhin in Alsace with a distance of 9.959 Lasker’s units and with centers of gravity 759 km apart. The most frequent surnames in Gers are Abadie, Lalanne, Garcia, Dupuy, Lasserre, while in Haut-Rhin the most frequent are the typically German Meyer, Muller, Schmitt, Schneider, Keller. 5.5. Dialectometric distance We have defined above dialectometric similarity and dissimilarity or distance. In our analysis, we use as a tool the Relative Distance Values, RDV, with the simple aim of obtaining positive instead of negative correlations, but we have well clear that dialectometric similarity is the notion which is preferred by linguists, and our use of distance is only opportunistic. 5.5.1. Regions The average dialectometric distance between the 21 regions is 35.8%. The closest regions are Champagne and Ile-de-France with a dissimilarity of 8.6% and with capital towns 146 km apart. The less similar regions are Alsace and Aquitaine with a distance of 56.7% and 721 km apart. 
The dialectometric distance tends to increase with geographic distance, with a significant linear component; in fact, the correlation with geography is r = 0.582 ± 0.069. The correlations with isonymic distances are 0.625 ± 0.046, 0.404 ± 0.069 and 0.457 ± 0.059 for Lasker’s, Euclidean and Nei’s, respectively.

5.5.2. Departments

The average dialectometric distance between the 94 departments is 37%. The most similar departments are

the pairs Essonne and Val-de-Marne, Hauts-de-Seine and Val-d’Oise, both with a dissimilarity of 7%, 24 and 16 km apart, respectively. Then come the pairs Val-de-Marne and Loiret, Val-de-Marne and Hauts-de-Seine, with a dissimilarity of 8% at a distance of 100 and 23 km, respectively. All these departments are in the Ile-de-France, close to the Paris area. The least similar pairs all involve the Pyrénées-Orientales, at the border with Spain; the other members are Meurthe-et-Moselle, Bas-Rhin and Belfort, all of them close to Germany and at a distance of 708, 747 and 621 km, respectively, from the Pyrénées-Orientales. All three pairs have a 63% dissimilarity. The dialects of the Pyrénées-Orientales have Spanish and Basque influences, while the dialects of Meurthe-et-Moselle, Bas-Rhin and Belfort have German influences.

The dissimilarity tends to increase with geographic distance also for departments, with a significant linear component; in fact, the correlation with geography is r = 0.617 ± 0.015. The correlations with isonymic distances are 0.544 ± 0.014, 0.504 ± 0.013 and 0.572 ± 0.012 for Lasker’s, Euclidean and Nei’s, respectively. These results indicate that the consequences of factors acting on strictly cultural transmission are largely similar to those acting on simulated Y-linked genetic transmission.

5.6. Cladistics of the French regions and departments

5.6.1. Regions

In the dendrogram and related map built from the matrix of Lasker’s distances between regions, two main clusters and a minor one can be identified. The minor cluster consists, unsurprisingly, of Alsace and Lorraine. Basse- and Haute-Normandie, Bretagne, Centre, Pays de Loire, Poitou, Bourgogne, Franche-Comté, Champagne, Nord/Pas-de-Calais, Picardie and Ile-de-France compose the first major cluster. All these regions are conterminous and cover the northern area of France. Aquitaine, Auvergne, Limousin, Rhône-Alpes, Languedoc, Midi, and Provence make up the second cluster.
These regions constitute the southern part of the country (Fig. 4). Within the major clusters, several subclusters can be identified, all of them related to geography.

In the dendrogram built from the matrix of dialectometric distances among regions, two main clusters are also identified, one in the North and one in the South. The northern cluster comprises 15 regions, and the southern six. It is of interest that the main East–West transect of the French regions so identified is shifted South and East with respect to the boundary derived from surname distances between regions (Fig. 5). This is because, in the case of dialectometric distances among regions, the area where Franco-Provençal is prevalent clusters with the northern area and not with the South, as it does in the case of surnames.
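The way a dendrogram is read off a distance matrix can be illustrated with a minimal sketch. It assumes average-linkage (UPGMA-style) agglomeration, which is only one of the tree-building options and is not stated here to be the procedure actually used; the function name and the four toy distances are invented for illustration and are not values from this study.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage (UPGMA-style).

    labels: list of unit names.
    dist:   dict mapping frozenset({a, b}) -> distance between units a and b.
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(name,) for name in labels]
    merges = []

    def between(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters that are currently closest.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: between(*p))
        merges.append((c1, c2, between(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented toy distances (NOT taken from the paper), chosen so that two
# eastern and two southern regions pair up before the final merge.
toy = {
    ("Alsace", "Lorraine"): 1.0,
    ("Aquitaine", "Limousin"): 2.0,
    ("Alsace", "Aquitaine"): 9.0,
    ("Alsace", "Limousin"): 8.0,
    ("Lorraine", "Aquitaine"): 8.5,
    ("Lorraine", "Limousin"): 7.5,
}
dist = {frozenset(pair): d for pair, d in toy.items()}
merges = average_linkage(["Alsace", "Lorraine", "Aquitaine", "Limousin"], dist)

for left, right, height in merges:
    print(left, "+", right, "at", height)
```

Cutting the merge history at a chosen height gives the clusters that are then drawn on the map; with real matrices the choice of linkage rule can change the finer subclusters, so the sketch should be read as an illustration of the principle only.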


Fig. 4. Map of France showing the main clusters obtained from the matrix of Lasker’s distances between regions. Note the East–West boundary, which separates the North of France from the South. Borders downloaded from the site http://flagspot.net/flags/

Fig. 6. (a) Map of France derived from the matrix of surname distances between departments. (b) Map of the subclusters obtained from the matrix of surname distances between departments. All subclusters are formed by conterminous departments.

Fig. 5. Map of France derived from the matrix of dialectometric distances between regions. The main boundary is shifted toward South in comparison to the map from surname distances.

5.6.2. Departments

The map of the 94 departments built from the matrix of Lasker’s distances shows two main clusters, strongly related to the clusters formed by the regions (see map, Fig. 6a). The two main clusters include 83 of the 94 French departments. The first, or northern, cluster includes 53 departments, and the second, or southern, 30 departments. The Moselle, Haut- and Bas-Rhin form a third small cluster. A fourth small cluster is made up of four departments in southern Aquitaine: Gers, Landes, the Pyrénées-Atlantiques and Hautes-Pyrénées. Two departments of the Grande Ceinture, Val-de-Marne and Val-d’Oise, make a small cluster, and the department of Alpes-Maritimes stands alone. In the northern cluster, three distinct and large subclusters can be identified, all made up of conterminous departments. The three departments of Brittany cluster together. The subclusters formed by departments in general belong to one or a few neighbouring regions (see Fig. 6b).

The main feature, however, is the separation between the northern and the southern main clusters, almost exactly isotopic with the boundary identified by the dendrogram of regions. The boundary borders at North with the Charentes, Vienne, Indre, Creuse, Allier, Saône-et-Loire, Ain, and it ends in Haute-Savoie. At South, it borders with Gironde, Dordogne, Haute-Vienne, Corrèze, the Puy-de-Dôme, the Loire and the Rhône, Isère, and it ends in Savoie. Ain and Creuse belong to the southern cluster in the dendrogram formed by regions, and to the northern in that formed by departments. The inverse situation is observed for Allier and Haute-Savoie.

The map derived from the matrix of the dialectometric distances between departments is given in Fig. 7a. Five departments, Isère, Rhône, Haute-Savoie, Savoie and Ain, cluster separately from all others; they are all conterminous and correspond to the Franco-Provençal dialect. Apart from these, the main East–West boundary is defined by the two main remaining clusters, which represent the North (Langue d’Oïl) and the South of France (Langue d’Oc). The boundary is not exactly the same as the one seen through surname distances, in the sense that departments Charente (16)


and Creuse (23) cluster with the southern departments, and, as already underlined, Ain (1), Isère (38), Rhône (69), Savoie (73) and Haute-Savoie (74) form a small third cluster overlapping the North-South division. However, the homology between the map derived from dialect distances and the one derived from surname distances is striking. The subclusters (Fig. 7b) also cover conterminous areas similar to those identified by Lasker’s distances. We underline that the main dialectal stems (Western and Central dialects, French, Picard, Limousin, Auvergnat, Languedocien, Catalan and Provençal) are consistently identified by the subclusters formed by the dialectometric distance matrix.
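The correlations between distance matrices reported in Sections 5.4 and 5.5 can be illustrated with a minimal sketch of a matrix correlation assessed by permutation, in the spirit of the Mantel test (Smouse et al., 1986). It is not claimed to reproduce the exact procedure used for the values above; the function names, the number of permutations and the two 4 × 4 toy matrices are assumptions made purely for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper(m, order=None):
    """Upper-triangle entries of a square matrix, optionally after
    relabelling rows and columns by the same permutation."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=999, seed=0):
    """Matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson(upper(a), upper(b))
    n = len(a)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Rows and columns of `a` are permuted together; `b` stays fixed.
        if pearson(upper(a, order), upper(b)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Invented toy matrices (NOT data from the paper): geographic distance in km
# and an isonymy-like distance for four hypothetical units.
geo = [[0, 100, 300, 700],
       [100, 0, 200, 600],
       [300, 200, 0, 400],
       [700, 600, 400, 0]]
iso = [[0, 7.8, 8.4, 9.9],
       [7.8, 0, 8.1, 9.6],
       [8.4, 8.1, 0, 9.0],
       [9.9, 9.6, 9.0, 0]]

r_obs, p_value = mantel(geo, iso, permutations=199)
print(round(r_obs, 3), p_value)
```

Permuting rows and columns jointly is what distinguishes this from an ordinary correlation test: the entries of a distance matrix are not independent, so the null distribution has to be built by relabelling whole units rather than shuffling individual distances.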

6. Conclusion

Fig. 7. (a) Map of France derived from the main clusters in the dendrogram from dialectometric distances among departments. (b) Map of the subclusters identified by the matrix of dialectometric distances among departments.

A similar source of information and the same methodology described in this work were used to analyse the isonymy structure of Venezuela, where we used 4 million electors, and Argentina, where we used 24.6 million electors. The same kind of source as in the present work was used for European countries and the USA, where we analysed surnames of telephone users (Barrai et al., 1996, 1997, 1999, 2000, 2001, 2002, 2004; Rodriguez-Larralde et al., 1998a, b, 2000, 2003). The average value of α over all cities [or cantons (Switzerland), states (Venezuela), or districts (Argentina; Dipierri et al., 2005)], and the isolation by distance measured by the correlation between isonymy and geographic distances, are given in Table 3 for the

Table 3
Comparison of isonymy parameters in France, and the countries studied to date. Overall, 70.6 million surnames were analysed

Country              Sample size (millions)   Surnames     α (average)   Isolation by distance (r)
EUROPE
Austria              1.0                      140,766      854           0.59
Belgium              1.1                      137,442      997           0.74
France               6.0                      494,646      1617          0.69
Germany              5.2                      462,526      1596          0.51
Holland              2.4                      126,485      787           0.46
Italy                5.1                      215,623      1236          0.61
Switzerland (a)      1.7                      166,116      891           0.72
Spain                3.6
  Double surnames    -                        1,581,387    22077         -
  Paternal           -                        94,886       134           0.21
  Maternal           -                        110,034      144           0.26
NORTH AMERICA
USA                  18.0                     899,585      1366          0.24
SOUTH AMERICA
Argentina (b)        22.6                     414,441      422           0.47
Venezuela (c)        3.9                      68,665       122           0.78

(a) Cantons. (b) Districts. (c) States, elsewhere towns.


countries studied to date. Several features emerge from the comparisons reported in this table. First, the European nations are generally similar in abundance of surnames, as measured indirectly by α, and in isolation by distance. Secondly, the value of α is relatively small in Venezuela, Argentina, and Spain; thirdly, isolation by distance is practically absent in the USA. In France, the situation is similar to that of other European countries for wealth of surnames and for isolation by distance. As in most of Europe, the low frequency of a relatively high number of surnames results in a high value of α.

In addition, in the present work, we could compare the matrices obtained from surnames with those obtained from dialectological similarities (see also Goebl, 2000, 2002, 2003) and the maps thereof. The structures obtained from surnames are almost superimposable on those obtained from dialectological similarities. We noted elsewhere that surnames are a specific part of language and that surname distances can be transformed into geographic distances, so that isolation due to different languages is equivalent to isolation due to physical distance (Barrai et al., 2004). In France, this is seen also for local dialects, and it confirms that one relevant function of language differentiation, even at the stage of dialect, lies in group identification and belonging. So, the general idea that language similarity is an indicator of genetic kinship (Cavalli-Sforza et al., 1989, 1997) emerges also at the local level. It appears further, from the analysis of eleven countries and over 70 million surnames, that the relation between surnames, language, and geography may be very close, and that each distance can be predictive of all the others in those situations where drift dominates over migration.
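The statement that many low-frequency surnames produce a high α can be made concrete with a small sketch. It assumes the definitions commonly used in the isonymy literature, which are not restated in this part of the text: unbiased random isonymy within a population taken as the sum of squared surname frequencies minus 1/N, α as its inverse, and Lasker’s distance as the negative natural logarithm of random isonymy between two populations. The function names and the surname counts are invented for illustration.

```python
from math import log


def isonymy_within(counts):
    """Unbiased random isonymy within a population (assumed definition):
    sum of squared relative surname frequencies minus 1/N."""
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values()) - 1.0 / n


def fisher_alpha(counts):
    """Alpha taken as the inverse of unbiased isonymy within (assumption)."""
    return 1.0 / isonymy_within(counts)


def lasker_distance(counts_i, counts_j):
    """Lasker's distance taken as -ln of random isonymy between two
    populations (assumption); infinite if no surname is shared."""
    ni, nj = sum(counts_i.values()), sum(counts_j.values())
    shared = set(counts_i) & set(counts_j)
    i_ij = sum((counts_i[s] / ni) * (counts_j[s] / nj) for s in shared)
    return -log(i_ij) if i_ij > 0 else float("inf")


# Invented toy counts (NOT data from the paper).
town_a = {"Martin": 4, "Boyer": 3, "Brun": 3}
town_b = {"Martin": 5, "Roche": 5}
town_c = {"Meyer": 6, "Muller": 4}

print(fisher_alpha(town_a))             # few, frequent surnames: small alpha
print(lasker_distance(town_a, town_b))  # one shared surname
print(lasker_distance(town_a, town_c))  # no shared surname
```

Under these assumed definitions, spreading the same number of people over more surnames lowers the sum of squared frequencies and therefore raises α, while populations sharing few surnames at low frequency are separated by a large Lasker’s distance.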

Acknowledgements

This work was supported by grants of the Italian Ministry of Universities and Research (MIUR) and by the Italian Fund for Basic Research to Chiara Scapoli (No. RBAU01C53J), and by Agreements Italy/Venezuela CNR/FONACIT 2002–2004, 132.36.1/PI2000001829 to Alvaro Rodriguez-Larralde and Italo Barrai. Hans Goebl and Slawomir Sobota are supported by two grants (12414 and 13349) given by the Fonds zur Förderung der Wissenschaftlichen Forschung in Österreich (FWF). Prof. G. Zei of Pavia provided key references. Prof. Bahram Dezfuli helped with the etymology of Arabic surnames.

References

Atlas Linguistique de la France (ALF) 1902–1910, 1968. In: Gilliéron, J., Edmont, E. (Eds.), vol. 10. Champion, Paris (reprint: Bologna: Forni).


Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., Rodriguez-Larralde, A., 1996. Isonymy and the genetic structure of Switzerland. I: the distributions of surnames. Ann. Hum. Biol. 23, 431–455.
Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., Rodriguez-Larralde, A., 1997. Isolation by distance in Germany. Hum. Genet. 100, 684.
Barrai, I., Scapoli, C., Mamolini, E., Rodriguez-Larralde, A., 1999. Isolation by distance in Italy. Hum. Biol. 71, 947–962.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Scapoli, C., 2000. Elements of the surname structure of Austria. Ann. Hum. Biol. 26, 1–15.
Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F., Scapoli, C., 2001. Elements of the surname structure of the USA. Am. J. Phys. Anthropol. 114, 109–123.
Barrai, I., Rodriguez-Larralde, A., Manni, F., Mamolini, E., Scapoli, C., 2002. Isonymy and isolation by distance in the Netherlands. Hum. Biol. 74, 263–283.
Barrai, I., Rodriguez-Larralde, A., Manni, F., Ruggiero, V., Tartari, D., Scapoli, C., 2004. Isolation by language and isolation by distance in Belgium. Ann. Hum. Genet. 68, 1–16.
Bonaiti, C., Demenais, F., Briard, M.L., Feingold, J., 1978. Consanguinity in multifactorial inheritance. Application to data on congenital glaucoma. Hum. Hered. 28, 361–371.
Briard, M.L., Frézal, J., Feingold, J., Kaplan, J., 1979. Genetic counseling in consanguineous marriages. J. Genet. Hum. 27, 175–188.
Byers, J.A., 1992. Dirichlet tessellation of bark beetle spatial attack points. J. Anim. Ecol. 61, 759–769.
Byers, J.A., 1996. Correct calculation of Dirichlet polygon areas. J. Anim. Ecol. 65, 528–529.
Cavalli-Sforza, L.L., 1997. Genes, peoples, and languages. Proc. Natl Acad. Sci. USA 94, 7719–7724.
Cavalli-Sforza, L.L., Edwards, A.W.F., 1967. Phylogenetic analysis: models and estimation procedures. Am. J. Hum. Genet. 19, 233–257.
Cavalli-Sforza, L.L., Piazza, A., Menozzi, P., Mountain, J., 1989. Genetic and linguistic evolution. Science 244, 1128–1129.
Colantonio, S.E., Lasker, G.W., Kaplan, B.A., Fuster, V., 2003. Use of surname models in human population biology: a review of recent developments. Hum. Biol. 75, 785–807.
Crognier, E., 1985. Consanguinity and social change: an isonymic study of a French peasant population 1870–1979. J. Biosoc. Sci. 17, 267–279.
Crow, J.F., Mange, A., 1965. Measurements of inbreeding from the frequency of marriages between persons of the same surnames. Eugen. Q. 12, 199–203.
Darlu, P., Ruffié, J., 1992. Relationship between consanguinity and migration rate from surname distributions and isonymy in France. Ann. Hum. Biol. 19, 133–137.
Dipierri, J.E., Alfaro, E.L., Scapoli, C., Mamolini, E., Rodriguez-Larralde, A., Barrai, I., 2005. Surnames in Argentina: a population study through isonymy. Am. J. Phys. Anthropol. (in press).
Felsenstein, J., 1989. PHYLIP, phylogeny inference package (Version 3.2). Cladistics 5, 164–166.
Felsenstein, J., 1993. PHYLIP, Phylogeny Inference Package (Version 3.5c). Distributed by the Author. Department of Genetics, University of Washington, Seattle.
Fisher, R.A., 1943. The relation between the number of species and the number of individuals in a random sample of animal population. J. Anim. Ecol. 12, 42–58.
Fleury, J., 1933. Recherches historiques sur les empêchements de parenté dans le mariage canonique des origines aux fausses décrétales. Paris.
Fox, W.R., Lasker, G.W., 1983. The distribution of surname frequencies. Int. Stat. Rev. 51, 81–87.


Goebl, H., 1984. Dialektometrische Studien. Anhand Italoromanischer, Rätoromanischer und Galloromanischer Sprachmaterialien aus AIS und ALF. Niemeyer, Tübingen.
Goebl, H., 1993. Dialectometry. A short overview of the principles and practice of quantitative classification of linguistic atlas data. In: Köhler, R., Rieger, B. (Eds.), Contributions of Quantitative Linguistics. Kluwer, Dordrecht, Boston, Londres, pp. 277–315 (with 20 dialectometric maps).
Goebl, H., 2000. La dialectométrisation de l’ALF: présentation des premiers résultats. Linguistica 40, 209–236 (with 12 dialectometric maps).
Goebl, H., 2002. Analyse dialectométrique de structures de profondeur de l’Atlas Linguistique de la France. Rev. Linguist. Roman. 261/262, 5–63 (with 24 dialectometric maps).
Goebl, H., 2003. Regards dialectométriques sur les données de l’Atlas linguistique de la France (ALF): relations quantitatives et structures de profondeur. Estudis Romànics 25, 59–96 (with 24 dialectometric maps).
Henry, L., 1973. Perspectives Démographiques. INED-PUF, Paris.
Jacquard, A., 1972. The phenotypic distribution of relatives when consanguinity is present. Ann. Hum. Genet. 36, 233–236.
Jacquard, A., 1975. Inbreeding: one word, several meanings. Theor. Popul. Biol. 7, 338–363.
Karlin, S., McGregor, J., 1967. The number of mutant forms maintained in a population, Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, vol. 4. pp. 415–438.
Lasker, G.W., 1977. A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Hum. Biol. 49, 489–493.
Legay, J.M., Vernay, M., 2000. The distribution and geographical origin of some French surnames. Ann. Hum. Biol. 27, 587–605.
Lexikon der Romanistischen Linguistik (LRL), 1988 and following. Holtus, G., Metzeltin, M., Schmitt, Chr. (Eds.), Niemeyer, Tübingen.
Morelli, L., Paoli, G., Francalacci, P., 2002. Surname analysis of the Corsican population reveals an agreement with geographical and linguistic structure. J. Biosoc. Sci. 34, 289–301.
Mourrieras, B., Darlu, P., Hochez, J., Hazout, S., 1995. Surname distribution in France: a distance analysis by a distorted geographical map. Ann. Hum. Biol. 22, 183–198.
Nei, M., 1973. The theory and estimation of genetic distance. In: Morton, N.E. (Ed.), Genetic Structure of Populations. University Press of Hawaii, Honolulu, pp. 45–54.
Nei, M., Imaizumi, J., 1966a. Genetic structure of human populations. I. Local differentiation of blood groups gene frequencies in Japan. Heredity 21, 9–36.
Nei, M., Imaizumi, J., 1966b. Genetic structure of human populations. II. Differentiation of blood groups gene frequencies among isolated populations. Heredity 21, 183–190.

Relethford, J.H., 1988. Estimation of kinship and genetic distance from surnames. Hum. Biol. 60, 475–492.
Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., Barrai, I., 1998a. Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann. Hum. Biol. 25, 533–540.
Rodriguez-Larralde, A., Barrai, I., Nesti, C., Mamolini, E., Scapoli, C., 1998b. Isonymy and isolation by distance in Germany. Hum. Biol. 70, 1041–1056.
Rodriguez-Larralde, A., Morales, J., Barrai, I., 2000. Surname frequency and the isonymy structure of Venezuela. Am. J. Hum. Biol. 12, 352–362.
Rodriguez-Larralde, A., Gonzalez-Martin, J., Scapoli, C., Barrai, I., 2003. The names of Spain: a study of the isonymy structure of Spain. Am. J. Phys. Anthropol. 121, 280–292.
Rogers, A.R., 1991. Doubts about isonymy. Hum. Biol. 63, 663–668.
Smouse, P.E., Long, J.C., Sokal, R.R., 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35, 627–632.
Stoll, C., Alembik, Y., Dott, B., Feingold, J., 1994. Parental consanguinity as a cause of increased incidence of birth defects in a study of 131,760 consecutive births. Am. J. Hum. Genet. 49, 114–117.
Sutter, J., 1968. Fréquence de l’endogamie et ses facteurs au XIXème siècle. Population 23, 303–324.
Sutter, J., Goux, H., 1962. L’évolution de la consanguineité en France de 1926 à 1958 avec donnés récentes detaillés. Population 17, 683–704.
Sutter, J., Tabah, L., 1948. Fréquence et répartition des mariages consanguins en France. Population 3, 607–623.
Sutter, J., Tabah, L., 1955. L’évolution des isolats de deux départements français: Loir-et-Cher, Finistère. Population 10, 654–674.
Tchen, P., Bois, E., Feingold, N., Kaplan, J., 1977. Inbreeding in recessive diseases. Hum. Genet. 38, 163–167.
Vernay, M., 2000. Trends in inbreeding, isonymy, and repeated pairs of surnames in the Valserine Valley, French Jura, 1763–1972. Hum. Biol. 72, 675–692.
Vernay, M., 2001. Geographic distribution of surnames and genetic structure: the county of Ardèche at the beginning of the twentieth century. C. R. Acad. Sci. III 324, 589–599.
von Kleist-Retzow, J.C., Cormier-Daire, V., de Lonlay, P., Parfait, B., Chretien, D., Rustin, P., Feingold, J., Rotig, A., Munnich, A., 1998. A high rate (20–30%) of parental consanguinity in cytochrome-oxidase deficiency. Am. J. Hum. Genet. 63, 428–435.
Wright, S., 1943. Isolation by distance. Genetics 28, 114–138.
Wright, S., 1951. The genetic structure of populations. Ann. Eugen. 15, 324–354.
Yasuda, N., Morton, N.E., 1967. Studies on human population structure. In: Crow, J.F., Neel, J.V. (Eds.), Third International Congress of Human Genetics. Johns Hopkins University Press, Baltimore, MD, pp. 249–265.