Analysis of a Danish Caucasian population sample of single locus DNA-profiles. Allele frequencies, frequencies of DNA-profiles and heterozygosity


Forensic Science International, 59 (1993) 119-129. Elsevier Scientific Publishers Ireland Ltd.



BIRTHE ERIKSEN and OLE SVENSMARK

Institute of Forensic Genetics, University of Copenhagen (Denmark)

(Received August 3rd, 1992) (Revision received December 14th, 1992) (Accepted December 21st, 1992)

Summary

The frequency distributions of the length of restriction fragments (HinfI) revealed by RFLP analysis (restriction fragment length polymorphism) of blood samples from 482 Danish Caucasians using the single locus VNTR (variable number of tandem repeats) probes MS1, MS31, MS43a and YNH24 are reported. From two blood samples three fragments were obtained with MS1. The consistency of the characteristic allele frequency distribution for each probe is exemplified by comparing the accumulated frequency curves obtained with MS43a in samples consisting of 50 and 920 bands, respectively. The distribution of the differences in migration distance for the two fragments of a bandpair was investigated. The results suggest that the high frequency of apparent homozygotes observed is due mainly to coalescence of close heterozygotes. The distribution of frequencies of 437 DNA-profiles is reported.

Key words: DNA-profiling; Single locus probes; Caucasian population; Allele frequencies; Frequencies of DNA-profiles; Heterozygosity

Introduction

A database comprising single locus DNA-profiles from 482 unrelated Danish individuals involved in criminal cases has been collected. DNA was digested with HinfI and analysed with the probes MS1, MS31, MS43a and YNH24. As shown previously, the measurement error in units of kilobasepairs (kb) increases approximately exponentially with increasing fragment length. To circumvent this obvious disadvantage in the handling of the data, the fragment lengths were transformed into normalized migration distance in units of millimetres. The transformed errors are normally distributed and independent of the fragment length [1,2].

The distributions of the allele frequencies are presented. The distribution of the distances between the two fragments of bandpairs was investigated, and it is suggested, in agreement with others [3,4], that the apparent excess of homozygotes is due to coalescence of close heterozygotes. Frequencies of 437 complete DNA-profiles, i.e. profiles comprising 8 bands, were estimated using a reference sample consisting of 920 bands for each probe.

Correspondence to: Birthe Eriksen, Institute of Forensic Genetics, 11 Frederik V's Vej, DK-2100 Copenhagen, Denmark.

Methods

Blood samples from 482 unrelated Danish individuals involved in criminal cases were collected and DNA-RFLP analysis performed. Restriction was performed with HinfI (Boehringer). The probes were MS1, MS31, MS43a (Cellmark Diagnostics) and YNH24 (Promega Corporation). Ethidium bromide (0.5 µg/ml) was included in the TBE electrophoresis buffer and the loading buffer. The Amersham marker SJ5000 was used as size marker. Preparation of DNA, RFLP-analysis and the calculation of fragment lengths were carried out as described previously [2]. The migration distances of the fragments were measured manually. Only clearly separated fragments were recorded as two bands. Duplicate determinations of fragment lengths were performed on different plates and the mean values used throughout this study.

In order to obtain data with normally distributed measurement errors independent of the fragment length, all measurements were transformed into normalized migration distance [2]. The transformation was accomplished by the function

$$f(b) = m = \frac{796}{3.7 + b^{1.5}} + 32.3$$

where $b$ is the fragment length in units of kilobasepairs and $m$ the normalized migration distance in units of millimetres. In case of single-band patterns (apparent or true homozygotes) the length of the fragment was included twice, i.e. as band 1 and as band 2.

Estimation of the frequency of DNA-profiles

Frequencies of DNA-profiles were estimated from the allele frequencies. These were estimated for each probe by counting the number of bands within a given interval in a reference sample from the population and dividing by the total number of bands ($N$) in the database. If the count of bands in the interval was zero, the frequency ($f$) was set to $1/N$ unless otherwise stated. The width of the interval may be assessed in terms of the standard deviation (S.D.). In our laboratory the S.D. of determinations carried out on two different plates was 0.5 mm [2]. An interval of ±3 S.D. thus corresponds to an interval ranging from $m - 1.5$ mm to $m + 1.5$ mm, where $m$ is the transformed fragment length of the band. In practical casework we have used an interval of ±6 S.D., corresponding to ±3 mm.

Let $f_1$ be the allele frequency of band 1 and $f_2$ that of band 2. The frequency $p$ of the bandpair is then $2 f_1 f_2$, assuming Hardy-Weinberg equilibrium. For single-band patterns the frequency was calculated as $2 f_1$. Let $p_1$ be the frequency of the bandpair obtained with the probe MS1, $p_2$ with MS31, $p_3$ with MS43a and $p_4$ with YNH24; then the frequency of the DNA-profile is $q = p_1 p_2 p_3 p_4$.
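The estimation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names, the inclusive window boundaries and the toy reference values are hypothetical, and the exponent 1.5 in the transformation is the reading of the printed formula adopted here.

```python
from bisect import bisect_left, bisect_right
from math import prod


def kb_to_mm(b_kb):
    """Normalized migration distance (mm) for a fragment length in kb.
    Assumption: exponent 1.5, as read from the printed formula."""
    return 796.0 / (3.7 + b_kb ** 1.5) + 32.3


def allele_frequency(m, reference_mm, half_window=1.5, min_freq=None):
    """Fraction of reference bands lying within m +/- half_window (mm).

    reference_mm must be sorted ascending. Window edges are treated as
    inclusive (an assumption). The minimum default frequency is 1/N
    unless another floor, e.g. 0.01, is passed as min_freq.
    """
    n = len(reference_mm)
    lo = bisect_left(reference_mm, m - half_window)
    hi = bisect_right(reference_mm, m + half_window)
    floor = 1.0 / n if min_freq is None else min_freq
    return max((hi - lo) / n, floor)


def bandpair_frequency(f1, f2=None):
    """2*f1*f2 for a two-band pattern, 2*f1 for a single-band pattern."""
    return 2.0 * f1 if f2 is None else 2.0 * f1 * f2


def profile_frequency(bandpair_freqs):
    """Product of the bandpair frequencies, one per probe."""
    return prod(bandpair_freqs)


# Toy reference sample (mm); real reference samples hold about 920 bands.
reference = sorted([50.0, 60.0, 60.5, 61.0, 61.4, 70.0, 75.0, 80.0, 90.0, 95.0])
f1 = allele_frequency(60.5, reference)  # 4 of 10 bands lie in [59.0, 62.0]
f2 = allele_frequency(85.0, reference)  # empty window, so the floor 1/N applies
p = bandpair_frequency(f1, f2)
```

Passing `half_window=3.0` gives the ±6 S.D. window used in casework, and `min_freq=0.01` gives the alternative minimum default frequency compared later in the paper.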


TABLE 1

THE TOTAL NUMBER OF BANDPAIRS AND THE NUMBER AND FREQUENCY OF SINGLE-BAND PATTERNS IN A DANISH POPULATION SAMPLE

Probe     Number of     Apparent homozygotes
          bandpairs     Number     Frequency

MS1       465           17         0.036
MS31      460           22         0.048
MS43a     471           41         0.087
YNH24     463           21         0.045

Total     1859          101        0.054
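The frequencies in Table 1 are simply the number of single-band patterns divided by the number of bandpairs. The short Python check below recomputes them from the tabulated counts; the variable names are illustrative, and the recomputed values agree with the printed ones to within about 0.001.

```python
# Counts copied from Table 1: probe -> (bandpairs, single-band patterns)
table1 = {
    "MS1": (465, 17),
    "MS31": (460, 22),
    "MS43a": (471, 41),
    "YNH24": (463, 21),
}

# Frequency of apparent homozygotes for each probe
per_probe = {probe: single / pairs for probe, (pairs, single) in table1.items()}

# Totals over all four probes
total_pairs = sum(pairs for pairs, _ in table1.values())
total_single = sum(single for _, single in table1.values())
overall_single = total_single / total_pairs

# Observed heterozygosity is the complement of the single-band frequency
overall_heterozygosity = 1.0 - overall_single
```

The overall single-band frequency of about 0.054 corresponds to an observed heterozygosity of roughly 95%.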

Results

Distribution of allele frequencies

The database comprised more than 900 fragment lengths for each of the four probes. The number of bands is given in Table 1. Two samples exhibited three fragments with MS1. These samples were not included in this study.

Fig. 2. Accumulated allele frequency as a function of fragment length (mm). Fragment lengths (kb) obtained with the probe MS43a were transformed into normalized migration distance (mm) and grouped in 1-mm intervals. Stepped curve: data from a sample of 50 bands; smooth curve: data from a sample of 942 bands.
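An accumulated frequency curve of the kind shown in Fig. 2 can be computed along the following lines. This is a minimal Python sketch, not the authors' procedure: the function name `accumulated_frequency`, its parameters and the toy values are illustrative assumptions, and the input is taken to be fragment lengths already transformed to normalized migration distance (mm).

```python
def accumulated_frequency(values_mm, start_mm, stop_mm, bin_mm=1.0):
    """Return [(upper bin edge, accumulated relative frequency), ...].

    Values are grouped in bins of width bin_mm covering
    [start_mm, stop_mm); values outside that range are ignored
    and do not count towards the total.
    """
    n_bins = int(round((stop_mm - start_mm) / bin_mm))
    counts = [0] * n_bins
    for v in values_mm:
        idx = int((v - start_mm) // bin_mm)
        if 0 <= idx < n_bins:
            counts[idx] += 1

    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the binned range")

    curve = []
    running = 0
    for i, c in enumerate(counts):
        running += c  # accumulate counts from the lowest bin upwards
        curve.append((start_mm + (i + 1) * bin_mm, running / total))
    return curve


# Toy example with four bands between 40 and 44 mm
curve = accumulated_frequency([40.2, 40.7, 41.5, 43.9], 40.0, 44.0)
```

Because every curve ends at 1.0 whatever the sample size, curves from samples of different size, such as 50 and 942 bands, can be overlaid directly.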

For each probe the fragment lengths (mm) were grouped in 1-mm bins and the relative frequencies plotted against the fragment length (Fig. 1). The distributions were similar to frequency data obtained for other Caucasian population samples [5-15]. The characteristic form of the frequency distribution for each probe did not change essentially with the size of the sample. In Fig. 2 the accumulated frequency curves for samples consisting of 50 and 942 fragments obtained with MS43a are compared. The 50-band sample was extracted from the larger database using random numbers. The trends of the curves were very similar. The accumulated frequency plot may be useful when different population samples are to be compared (see Discussion).

Heterozygosity

The overall heterozygosity observed in this population sample was approximately 95% (Table 1). It is not possible to calculate an estimate of the frequency of homozygotes because the true number of alleles is unknown. From the formula

$$H = \sum_{i=1}^{n} f_i^{2}$$

where $f_i$ is the frequency of fragments in interval $i$, and $n$ the number of intervals or 'alleles', it can be calculated, assuming Hardy-Weinberg equilibrium, that a homozygosity, $H$, of 5% corresponds to a width of the interval of approximately 2.5 mm. This again corresponds to 25-50 alleles for each of the four probes. However, the real number of alleles must be expected to be much larger than that, and it is obvious that the apparent homozygosity represents a pronounced excess.

As pointed out by others, especially by Devlin et al. [4], the excess of single-band patterns may be due to coalescence of close heterozygotes. We have calculated the distances between pairs of clearly separated alleles in units of mm. The distances were grouped in 1-mm intervals, and the relative frequencies of the distances plotted against the distance in mm. The results obtained with the probe MS43a are shown in Fig. 3. The frequency increased with decreasing distance between the fragments until 2 mm. Below 2 mm the frequency decreased rapidly.

It is possible to estimate the distribution of the distances between the two fragments of a bandpair by calculating the distances between all possible pairs. With $N$ fragments in the sample the number of possible pairs is $N(N-1)/2$. The distances were grouped in 1-mm intervals and the frequency of each interval plotted against the distance (Fig. 3). From this estimate it is expected that the frequency will increase continuously with decreasing distance. However, the distribution of the distances recorded in this study exhibits a significant lack of distances below 2 mm (Fig. 3). The lack corresponds to a frequency of approximately 0.08, which is of the same order of magnitude as the frequency of single-band patterns (0.087) obtained with MS43a (Table 1).

Fig. 3. Relative frequency of distances between band 2 and band 1 (mm) as a function of the distance (mm). Fragment lengths (kb) obtained with the probe MS43a were transformed into normalized migration distance, the distances between band 2 and band 1 were calculated and grouped in 1-mm intervals. The relative frequencies were calculated and plotted against the distance. (-) The frequency of distances as estimated by the calculation of the distances between the two bands of all possible bandpairs in the sample. The distances were grouped in 1-mm bins and the frequencies plotted against the distance.

Similar results were obtained for the other three probes. Since the transformed values are used, the frequencies can be averaged for all probes and plotted against the distance (Fig. 4). The lack of distances below 2 mm corresponds to a frequency of approximately 0.05, which again corresponds to the overall frequency of single-band patterns (Table 1).

Fig. 4. Relative frequency of distances between band 2 and band 1 (mm) as a function of the distance (mm). Fragment lengths obtained for each of the probes MS1, MS31, MS43a and YNH24 were transformed into normalized migration distance, the distances between band 2 and band 1 were calculated and grouped in 1-mm intervals. The relative frequencies were calculated for each probe, averaged for each interval and plotted against the distance. (-) The frequency of distances as estimated by the calculation of the distances between the two bands of all possible bandpairs for each probe. The distances were grouped in 1-mm bins, and the results for all four probes were averaged and plotted against the distance.

Frequency of DNA-profiles

Allele frequencies have to be estimated from random population samples. It has been suggested to include the upper confidence limit in the estimation [16], or to use a minimum default frequency [17] in order to compensate for sampling errors. The frequencies of 437 DNA-profiles from the database, each with eight different bands, were calculated using a reference sample consisting of 920 bands for each probe. The window was ±3 S.D. (±1.5 mm). The frequency distribution was calculated with a minimum default frequency of 1/920. The frequencies were approximately lognormally distributed (Fig. 5). An increase of the minimum default frequency to 0.01 did not change the distribution significantly (Table 2). With a window of ±6 S.D. (±3 mm) the frequencies were approximately $2^{8} = 256$ times higher than with a window of ±3 S.D.

Fig. 5. The distribution of frequencies of 437 DNA-profiles estimated with a reference sample containing 920 bands for each probe. The window was ±3 S.D. (3 mm). The minimum default allele frequency was 1/920.

TABLE 2

STATISTICS OF THE DISTRIBUTION OF THE FREQUENCIES OF 437 DNA-PROFILES AS ESTIMATED WITH A REFERENCE SAMPLE CONSISTING OF 920 BANDS FOR EACH PROBE. CALCULATIONS WERE CARRIED OUT WITH MINIMUM DEFAULT ALLELE FREQUENCIES OF 0.01 AND 1/920. ALL VALUES ARE GIVEN AS $\log_{10}(q_{920})$

            Minimum default allele frequency
            0.01        1/920

Mean        -8.7        -8.8
Median      -8.7        -8.7
Minimum     -12.3       -12.8
Maximum     -6.7        -6.7

Discussion

Several studies have been published on the frequency distribution of the length of HinfI-restriction fragments as revealed by MS1, MS31, MS43a and YNH24 in different, mainly European, Caucasian populations [5-15]. Usually, the data are not accessible in tabular form and exact comparisons cannot be performed. However, a rough estimate suggests that the distributions found in this study are similar to those reported for other Caucasian population samples. In one

Fig. 6. Accumulated frequency curves obtained with YNH24. (- -) Data from 151 alleles found in a Caucasian population from Utah, USA [5]. (-) Data from 926 alleles found in a Danish Caucasian population (this study). Fragment lengths were grouped in bins of 30 basepairs. The frequencies were calculated, summed and plotted against the fragment length (kb).

study comprising unrelated Caucasians from Utah, USA, the data were presented in tabular form (see Table 8 in Ref. 5). DNA was restricted with HinfI and analysed with YNH24. We compared these data with our own data using accumulated frequency curves (Fig. 6). The two curves exhibit only insignificant differences.

In agreement with others we find a high number of single-band patterns in our database (Table 1). The high frequency of single-band patterns has been associated with the occurrence of an extreme excess of homozygosity [19] and has given rise to an extensive debate on the importance of population stratification [16,20-24]. On the other hand, it has been suggested that the apparent excess of homozygotes is due to coalescence of close heterozygotes and to nondetectable low molecular weight bands [3,4,25]. The transformed data used in this study are very suitable for investigations of close heterozygotes because the measurement error, and hence the discrimination power, is independent of the fragment length. For the same reason data obtained from all 1860 bandpairs can be assembled (Fig. 4). The deficit of two-band patterns with distances less than 2 mm, relative to the number expected, is obvious. This agrees with common experience; it may often be difficult to distinguish two-band patterns when the distance between the bands is less than 2 mm. The deficit of two-band patterns corresponds approximately to the frequency of single-band patterns (0.05), which indicates that close heterozygotes may be the major cause of the apparent excess of homozygotes. This finding constitutes a test, albeit a weak one, of Hardy-Weinberg equilibrium, as does the close similarity between observed and expected frequencies (Figs. 3 and 4).

The distribution of frequencies of DNA-profiles exhibits a very wide range. With a 3-mm window (±3 S.D.) the frequency ranged from $10^{-12}$ to $10^{-6}$ with a median of approximately $10^{-9}$. With a 6-mm window the frequencies were approximately 250 times higher.
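The factor of about 250 can be motivated from the product rule used for the profile frequency. As a rough sketch, assume that doubling the window width doubles the number of reference bands counted for each of the eight bands, so that each allele frequency $f$ becomes approximately $2f$. This local-uniformity assumption is a simplification introduced here for illustration, and the indexing $f_{k1}, f_{k2}$ (bands 1 and 2 of probe $k$) is likewise only notation for this sketch:

```latex
q_{\pm 3\,\mathrm{S.D.}} = \prod_{k=1}^{4} 2\, f_{k1} f_{k2},
\qquad
q_{\pm 6\,\mathrm{S.D.}} \approx \prod_{k=1}^{4} 2\,(2 f_{k1})(2 f_{k2})
  = 2^{8}\, q_{\pm 3\,\mathrm{S.D.}}
  = 256\, q_{\pm 3\,\mathrm{S.D.}}
```

Bands whose frequency is held at the minimum default value in both windows do not double, which is one reason the observed factor can fall somewhat below 256.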
To avoid extreme underestimates of frequencies it has been suggested to use the upper confidence limit [19] or to use a minimum default frequency [17,18,26]. Evett and Gill [17] suggest a minimum default frequency of 0.01 and find no need to use the upper confidence limit. Berry et al. [26] suggest a minimum default value of $1/N$, where $N$ is the number of bands in the reference sample. The distributions of frequency estimates obtained with a reference sample consisting of 920 bands using 0.01 and $1/N$ as the minimum default frequency did not differ significantly. We therefore suggest a minimum default frequency of $1/N$ as an efficient means to avoid large underestimates.

Acknowledgements

We are indebted to Mrs Jane Hellung Lauridsen and Mrs Susanne Billesbølle for excellent technical assistance.

References

1 O. Svensmark and B. Eriksen, Measurement errors in DNA-profiling. Proceedings of the International Symposium on Human Identification, Promega Corporation, Madison, WI, 1991, p. 322.
2 B. Eriksen, A. Bertelsen and O. Svensmark, Statistical analysis of the measurement errors in the determination of fragment length in DNA-RFLP analysis. Forensic Sci. Int., 52 (1992) 181-191.
3 C. Brenner and J.W. Morris, Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. Proceedings of the International Symposium on Human Identification, Promega Corporation, Madison, WI, 1989, pp. 21-54.
4 B. Devlin, N. Risch and K. Roeder, No excess of homozygosity at loci used for DNA fingerprinting. Science, 249 (1990) 1416-1420.
5 S.J. Odelberg, R. Plaetke, J.R. Eldridge, L. Ballard, P. O'Connell, Y. Nakamura, M. Leppert, J.M. Lalouel and R. White, Characterization of eight VNTR loci by agarose gel electrophoresis. Genomics, 5 (1989) 915-924.
6 J.C. Smith, R. Anwar, J. Riley, D. Jenner, A.F. Markham and A.J. Jeffreys, Highly polymorphic minisatellite sequences: Allele frequencies and mutation rates for five locus-specific probes in a Caucasian population. J. Forensic Sci. Soc., 30 (1990) 19-32.
7 B. Brinkmann, S. Rand and P. Wiegand, Population and family data of RFLP's using selected single- and multi-locus systems. Int. J. Legal Med., 104 (1991) 81-86.
8 P. Gill, S. Woodroffe, J.E. Lygo and E.S. Millican, Population genetics of four hypervariable loci. Int. J. Legal Med., 104 (1991) 221-227.
9 C. Buffery, F. Burridge, M. Greenhalgh, S. Jones and G. Willott, Allele frequency distributions of four variable number tandem repeat (VNTR) loci in the London area. Forensic Sci. Int., 52 (1991) 53-64.
10 L. Henke, S. Cleef, M. Zakrzewska and J. Henke, Population genetic data determined for five different single locus minisatellite probes. In T. Burke, G. Dolf, A.J. Jeffreys and R. Wolff (eds.), DNA Fingerprinting: Approaches and Applications, Birkhäuser, Basel, Boston, Berlin, 1991, pp. 144-153.
11 W. Pflug, G. Bassler, G. Mai, U. Keller, S. Aab, B. Eberspächer and G. Wahl, Allele frequencies for five different single locus probes in a population of South-west Germany. In C. Rittner and P.M. Schneider (eds.), Advances in Forensic Haemogenetics, Vol. 4, Springer Verlag, Berlin, 1992, pp. 240-242.
12 E. Valverde, C. Cabrero, A. Diez, A. Carracedo and T. Borras, Allele frequency in the population of Spain using several single locus probes. In C. Rittner and P.M. Schneider (eds.), Advances in Forensic Haemogenetics, Vol. 4, Springer Verlag, Berlin, 1992, pp. 187-189.
13 V.L. Pascali, M. Dobosz, M. Pescarmona, E. d'Aloja and A. Fiori, Allele frequency distribution of two VNTR polymorphisms (YNH24/D2S44; alpha globin 3'HVR/D16) in Italy. In C. Rittner and P.M. Schneider (eds.), Advances in Forensic Haemogenetics, Vol. 4, Springer Verlag, Berlin, 1992, pp. 201-203.
14 S. Alonso, A. Castro, A. Garcia-Orad, P. Arizti, G. Tamayo and M. Martinez de Pancorbo, MS1, MS31 and MS43a single locus probes: a preliminary study in the Basque population and its application in paternity testing. In C. Rittner and P.M. Schneider (eds.), Advances in Forensic Haemogenetics, Vol. 4, Springer Verlag, Berlin, 1992, pp. 228-230.
15 N. Morling and H.E. Hansen, Matching criteria for paternity testing with VNTR systems. In C. Rittner and P.M. Schneider (eds.), Advances in Forensic Haemogenetics, Vol. 4, Springer Verlag, Berlin, 1992, pp. 149-152.
16 E.S. Lander, Population genetic considerations in the forensic use of DNA-typing. In J. Ballantyne, G. Sensabaugh and J. Witkowski (eds.), Banbury Report 32: DNA Technology and Forensic Science, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989, pp. 143-156.
17 I.W. Evett and P. Gill, A discussion of the robustness of methods for assessing the evidential value of DNA single locus profiles in crime investigations. Electrophoresis, 12 (1991) 226-230.
18 P. Gill, I.W. Evett, S. Woodroffe, J.E. Lygo, E. Millican and M. Webster, Databases, quality control and interpretation of DNA profiling in the Home Office Forensic Science Service. Electrophoresis, 12 (1991) 204-209.
19 E.S. Lander, DNA-fingerprinting on trial. Nature, 339 (1989) 501-505.
20 J.E. Cohen, DNA fingerprinting for forensic identification: potential effects on data interpretation of subpopulation heterogeneity and band number variability. Am. J. Hum. Genet., 46 (1990) 358-368.
21 J.E. Cohen, M. Lynch and C.E. Taylor, Forensic DNA tests and Hardy-Weinberg equilibrium. Science, 253 (1991) 1037-1038.
22 B. Devlin, N. Risch and K. Roeder, Response. Science, 253 (1991) 1039-1041.
23 R.C. Lewontin and D.L. Hartl, Population genetics in forensic DNA-typing. Science, 254 (1991) 1745-1750.
24 R. Chakraborty and K.K. Kidd, The utility of DNA typing in forensic work. Science, 254 (1991) 1735-1739.
25 R. Chakraborty, M. De Andrade, S.P. Daiger and B. Budowle, Apparent heterozygote deficiencies observed in DNA typing data and their implications in forensic applications. Ann. Hum. Genet., 56 (1992) 45-57.
26 D.A. Berry, I.W. Evett and R. Pinchin, Statistical inference in crime investigations using deoxyribonucleic acid profiling. Appl. Stat., 41 (1992) 499-531.