Genetic differentiation among local populations of medaka fish (Oryzias latipes) evaluated through grid- and deme-based sampling

Genetic differentiation among local populations of medaka fish (Oryzias latipes) evaluated through grid- and deme-based sampling

Gene 443 (2009) 170–177 Contents lists available at ScienceDirect Gene j o u r n a l h o m e p a g e : w w w. e l s e v i e r. c o m / l o c a t e /...

840KB Sizes 2 Downloads 33 Views

Gene 443 (2009) 170–177

Contents lists available at ScienceDirect

Gene j o u r n a l h o m e p a g e : w w w. e l s e v i e r. c o m / l o c a t e / g e n e

Genetic differentiation among local populations of medaka fish (Oryzias latipes) evaluated through grid- and deme-based sampling Takafumi Katsumura a, Shoji Oda a, Shuhei Mano b, Naoyuki Suguro c, Koji Watanabe d, Hiroshi Mitani a, Hiroki Oota a,⁎, Shoji Kawamura a a

Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha 5-1-5, Seimeitou 502, Kashiwa, Chiba 277-8562, Japan Graduate School of Natural Sciences, Nagoya City University, Nagoya, Aichi, Japan Freshwater Fisheries Experimental Station, Kanagawa Prefectural Fisheries Technology Center, Kanagawa, Japan d FUJIYA CO., LTD., Kanagawa, Japan b c

a r t i c l e

i n f o

Article history: Received 2 March 2009 Received in revised form 13 April 2009 Accepted 20 April 2009 Available online 3 May 2009 Received by J.N. Volff Keywords: Medaka mtDNA Population genetics Demographic effect Threatened species

a b s t r a c t The medaka (Oryzias latipes), a tiny fish, has been an excellent experimental model for molecular and developmental genetics, and is expected for being a “vertebrate” model animal for population genetics, because 1) there is abundant within-species variation, and 2) the whole genome sequence has been determined for one of the inbred strains. In spite of its potential usefulness, there is no comprehensive study on quantifying between- and within-population genetic diversity of wild medaka. To investigate population structure, we examine nucleotide sequences of the non-coding D-loop region and the cytochrome b gene of the mitochondrial (mt) genome for medaka individuals collected with distinct two sampling-methods. Using deme-based sampling, out of 373 total individuals from three local (two wild and one biotope) populations only 16 distinct sequence types of mt D-loop are found. However, we find 26 D-loop types in 35 individuals collected with grid-based sampling from various geographical regions in East Asia. We carry out statistical tests to evaluate the distribution pattern of nucleotide frequencies among segregating sites under the standard neutral model, and we show that the deme-sampled populations might have experienced population size reduction and/or sub-structuring caused by natural or artificial migration. The reduction of genetic variation has been driven more markedly in the populations from the biotope, suggesting that human activities can pose an impact on the demographic history of medaka. Nevertheless, our results suggest that the grid-based sampling gives more abundant variations than the deme-based sampling, while the deme-based sampling gives more information about the local-wild characteristics of allele frequency spectrum than grid-based sampling. This could be a basis of strategy to use medaka as a vertebrate model animal for comparative population genomics. © 2009 Elsevier B.V. All rights reserved.

1. Introduction The medaka (Oryzias latipes) is a tiny fresh-water teleost found in the Japanese archipelago, the southern Korean peninsula, and the eastern coast of the Chinese continent, and has been utilized as an experimental animal since the early 20th century (Wittbrodt et al., 2002; Shima and Mitani, 2004). A number of inbred strains have been established, and labstocks originally collected from different local sites are maintained in

Abbreviations: bp, base pair(s); mt, mitochondria(l); PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism(s); SNP, single nucleotide polymorphism; InDels, insertions/deletions; NJ, the neighbour-joining (method); Fst, fixation intex; dw, the mean number of differences between different sequences sampled from the same subpopulation; db, the mean number of differences between sequences sampled from the two different subpopulations sampled; Chiba-N, Chiba site N; Ibaraki-K, Ibaraki site K; Kanagawa-O, Kanagawa site O; N.JPN, the northern Japanese group; S.JPN, the southern Japanese group; E.KOR, the east Korean group; W.KOR, the west Korean/Chinese group; NBRP Medaka, the National BioResource Project Medaka. ⁎ Corresponding author. Fax: +81 4 7136 3713. E-mail address: [email protected] (H. Oota). 0378-1119/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2009.04.017

universities and institutes. Using these collections, the phylogenetic relationships of the local samples have been analyzed, and four major groups (Southern Japanese, northern Japanese, East Korean, and West Korean/Chinese groups) have been recognized (Sakaizumi et al., 1983; Sakaizumi, 1986; Sakaizumi and Joen, 1987). Recently, the whole genome sequence has been determined for the inbred strain Hd-rR, which belongs to the southern Japanese group (S.JPN) (Kasahara et al., 2007). A comprehensive comparison of the genome sequences between the Hd-rR and HNI (derived from the northern Japanese (N.JPN) group) strains agrees with the study of their mitochondrial cytochrome b gene in estimating their divergence time to be around 4 million years ago (Takehana et al., 2003; Kasahara et al., 2007). With an excellent range of molecular genetic techniques available and an accumulating set of data on genetic variations, the medaka has become an increasingly valuable model system for understanding genetic changes in humans associated with population differentiation (Nakayama et al., 2002; Omran et al., 2008; Matsumoto et al., 2009). In spite of medaka's powerful utilities, the genetic diversity in wild medaka populations remains largely unknown. Our fundamental

T. Katsumura et al. / Gene 443 (2009) 170–177

171

Fig. 1. Map of the sampling locations. The local stocks kept in Graduate School of Frontier Sciences, the University of Tokyo (Shima et al., 1985; Matsumoto et al., 2006) and the local-wild populations are mapped. (a) The opened and gray circles, and opened and filled triangles represent the sampling locations for the local lab-stocks of the northern and the southern Japanese, and the eastern and the western Korean groups, respectively, based on the grid-based sampling method. (b) This window shows the sampling locations for the lab-stock of O. luzonensis. (c) This window shows the sampling location in the Kanto area. The opened circles represent the three local-wild populations obtained by the deme-based sampling method.

question is, how are medaka populations structured? We tentatively regard a school as a “deme” (=a local-wild population) that is the smallest mating-unit of wild medaka. Wild schools of medaka are found in marshes, ponds, irrigation canals, and rice fields. We do not know, however, if and how much gene flow exists between such demes. To investigate genetic diversity for the two Japanese major groups, we collected 35 individuals of the medaka lab-stocks (non-inbred strains) from 33 wild sites in East Asia as “grid-based” samples (Fig.1a and b). We also collected 373 individuals from three local populations as “demebased” samples (Fig. 1c) and compared the genetic variation of the cytochrome b gene and the D-loop region of mtDNA between the gridbased and deme-based samples. Here we describe the first investigation of within-deme genetic diversity of medaka and discuss demographic history in reference to conservation of this threatened species. 2. Materials and methods 2.1. Sampling methods The O. latipes resource of lab-stocks consisted of the four major geographically distinct groups: the northern Japanese, the southern Japanese, the east Korean, and the west Korea/Chinese groups (Sakaizumi et al., 1983, 1984; Sakaizumi and Joen, 1987). These were collected from local sites, and have been maintained for many generations as closed colonies in Graduate School of Frontier Sciences, the University of Tokyo (Shima et al., 1985; Matsumoto et al., 2006). A total of 35 lab-stock individuals from 33 local sites were analyzed (northern Japan: Niigata, Ryotsu, Kaga, Kosugi, Tsuruoka, Maizuru,

Obama, Odate [n = 2], Sabae, Kamikita, Yokote, Miyazu, Yamagata; southern Japan: Tanabe, Takamatsu, Tessei, Kasumi, Urizura, Iwaki, Mishima, Hagi, Okewaki, Kikai, Nago [n = 2], Kochi, Yamaguchi, Akishima, Gushikami; east Korea: Yongchon, Sajin; west Korea: Maegok, Bugang; and China: Shanghai). In addition, 373 medaka individuals were collected from May 2006 to November 2007 at three local sites in the Kanto area (Chiba site N [n = 124], Ibaraki site K [n = 90], and Kanagawa site O [n = 159]). The Chiba-N and the Ibaraki-K sites are irrigation channels of rice fields that we regard as “local-wild” populations because these medaka are not under deliberate human protection. On the other hand, the Kanagawa-O site is an area installed in a wet rice field for the protection and conservation of medaka since the summer of 2005. We refer to this area as a “biotope” wherein medaka habitat has been restored by human activity. Because the biotope connects to irrigation channels and is not closed, we consider it a “semi-wild” population. Fig. 1 is a map of the Japanese archipelago showing the Kanto area from which the three local populations were collected and the local sites from which the 33 labstocks originate. The geographic distance between the three local sites is: 55 km (Chiba-N–Ibaraki-K), 100 km (Chiba-N–Kanagawa-O), and 155 km (Ibaraki-K–Kanagawa-O). We define the individuals from the local-wild and semi-wild sites as “deme-based” samples and the 35 lab-stock individuals as “gridbased” samples. Philippine medaka (Oryzias luzonensis) were supplied by the National BioResource Project Medaka (NBRP Medaka) and examined as an outgroup. In addition, we used the medaka mtDNA sequences (Hd-rR, Amino and SOK) from the DNA sequence database,

172

T. Katsumura et al. / Gene 443 (2009) 170–177

DDBJ/GeneBank/EMBL (the accession numbers are AP008938, AP008944, and AP008947, respectively) for the analysis. 2.2. DNA extraction The bodies from lab-stocks and deme-based individuals (Chiba-N and Ibaraki-K) and the fins from Kanagawa-O were dissolved in a 10% SDS/0.5 M EDTA/proteinase K solution, and total genomic DNAs were extracted and purified using phenol-chloroform and subsequent isopropanol precipitation. The DNAs were resuspended in 10–100 μl of TE buffer. We released the medaka individuals sampled from the biotope back into the collection site after clipping their fins. 2.3. Polymerase chain reaction amplification We amplified the 616-bp region of the mitochondrial D-loop by polymerase chain reaction (PCR) using a pair of primers: Medaka_Dloop_F1 (5′-CCC AAA GCC AGG ATT CTA A-3′) and Medaka_D-loop_R1 (5′-AAC CCC CAC GAT TTT TGT C-3′). We designed these primers based on the whole mitochondrial genome sequence that has been determined for the inbred Hd-rR southern Japanese strain (Mukai et al. unpublished data). Approximately 50 ng of the genomic DNA was used as a template for PCR in a 30 μl solution containing dNTP at 0.2 mM, 0.08 μM of each of primer, 0.75 U of EX Taq polymerase HS (TaKaRa Shuzo Co., Kyoto, Japan), and the reaction buffer attached to the polymerase. Reactions were carried out in the DNA Engine PTC200 (Bio-Rad Laboratories, Inc, Hercules, USA) using the following protocol: an initial denaturing step at 95 °C for 2 min, 40 cycles of denaturation at 95 °C for 30 s, annealing at 60 °C for 30 s, extension at 72 °C for 30 s, and a final extension step at 72 °C for 5 min. The PCR products were diluted 20-fold and used as templates in the directsequencing reaction (following the commercial protocol), and were then analyzed in the ABI3130 DNA Sequencer (Applied Biosystems, Tokyo, Japan). We determined the 616-bp nucleotide sequences for the mt D-loop region in both strands, trimmed both ends, and then compared 508 bp for the analyses as described in Results. To amplify the mitochondrial cytochrome b gene, we used a pair of primers previously reported by Takehana et al. (2003): Cytb Fa (5′AGG ACC TGT GGC TTG AAA AAC CAC-3′) and Cytb RVa (5′-TYC GAC YYC CGR WTT ACA AGA CCG-3′). The following primers were used for determining nucleotide sequences: Cytb Fa, Cytb Fb (5′-CAA ATA TCA TTT TGA GGG GCC ACT-3′), Cytb Fc (5′-CGA CAA AGT ATC CTT CCA CCC TTA CTT-3′), Cytb Fd (5′-CCC TAT TCT ACA CAC CTC TAA ACA ACG-3′), Cytb RVa, Cytb RVb (5′-ACT GAA AAT CCC CCT CAA ATT CAT TG-3′), and Cytb RVc (5′-CCT CCA AGT TTG TTT GGA ATT GAT CGT AG-3′) (Takehana et al., 2003). PCR and sequencing were conducted following Takehana et al. (2003). We determined the 1187-bp nucleotide sequences for the mt cytochrome b gene. These nucleotide sequence data were deposited into the international DNA database DDBJ/EMBL/GeneBank (accession numbers: AB454561–AB455005). 2.4. Sequence alignment and data analysis The whole mtDNA sequences from three strains (amino [accession # AP008944], Hd-rR [accession # AP008938] and SOK [accession # AP008947]) obtained from the international nucleotide sequence database DDBJ/EMBL/GeneBank were added into the data that we sequenced. We made two datasets. The first dataset (from grid-based sampling) included 37 and 39 sequences of the cytochrome b gene (1138 bp) and the D-loop region (508 bp), respectively (35 O. latipes and one O. luzonensis from lab-stocks and three from database, though two from E.KOR were not sequenced for cytochrome b gene), and was used for evaluating between-population diversity. The second dataset (from deme-based sampling) included 373 sequences of the 505-bp fragment in the D-loop region and was used primarily for estimating within-population diversity.

The nucleotide sequences were preliminarily aligned by the program CLUSTAL W (Thompson et al., 1994). The sites showing nucleotide polymorphisms and insertions/deletions (InDels) were re-examined by human eyes for further alignment. We first sequenced [1] a 1241 bp segment that included the complete cytochrome b gene and [2] a 616 bp of the D-loop region for the lab-stock (grid-based) individuals, including O. latipes (the southern and the northern Japanese, the east Korean, and the west Korean/Chinese medakas) and O. luzonensis (the Philippine medaka). We added these sequences from three strains (Amino, Hd-rR and SOK) acquired from the database for the subsequent analysis. Initially, we trimmed both 5′- and 3′-ends of the obtained nucleotide sequences because of nucleotide ambiguity. For the D-loop region, we found InDels of various sizes (1 to 55 bp). There was a tandem repeat region in the D-loop. Seven of the 14 northern Japanese (N.JPN) individuals had an 11 nucleotide repeatunit (GCATGTGCGTT), while another seven had repeat-unit variations. On the other hand, the southern Japanese (S.JPN), the east Korean (E.KOR), and the west Korean/Chinese (W.KOR) individuals had perfect repeats of 11 nucleotides (GCATGCGCGTT). Most of the repeat units were identical in both S.JPN and N.JPN, but the 3′-most units tended to be variable in the N.JPN individuals, and the repeat numbers varied (7 to 32 repeats in O. latipes) that will be described elsewhere (Hirayama et al. in preparation). Therefore, we only used the 5′ seven repetitive sequences for analysis, and then obtained the 508-bp sequences (Supplementary Table 1a). Other shorter InDels were found in various mt genome regions. Ultimately after removing the InDels, for the first dataset (from grid-based sampling) we used 481 bp and 489 bp for the N.JPN and S.JPN groups, respectively; and for the second dataset (from deme-based sampling) we used 500 bp for the subsequent analyses (Supplementary Table 1b). For the cytochrome b gene, we simply removed InDels from the compared sequences and used 1141 bp out of 1241 bp for the subsequent analysis (Supplementary Table 2). Phylogenetic analyses were performed using the program MEGA4 (Tamura et al., 2007). Pairwise sequence divergences were calculated under the Tamura–Nei model (Tamura and Nei, 1993), and phylogenetic trees were reconstructed with the neighbor-joining (NJ) method (Saitou and Nei, 1987). The reliability of the tree was evaluated with 1000 bootstrap replicates (Felsenstein, 1985). To investigate the relationship between sequence types, the median-joining network (Bandelt et al., 1999) was reconstructed using the program NETWORK 4.5.0.0 (http://www.fluxus-engineering.com). 2.5. Estimation of within- and between-population diversity Within-population genetic diversity was estimated by gene diversity and nucleotide diversity (Nei, 1986). To investigate the demographic history of the medaka populations, we employed Tajima's D and Fu and Li's D⁎ tests under the assumption of neutrality for the mt D-loop region. The Tajima's D was originally developed in order to test the neutral hypothesis. However, since Tajima's D is based on the expectation of a constant population size at mutationdrift equilibrium, it is now widely used to detect changes in population size (Aris-Brosou and Excoffier, 1996; Oota et al., 2002; Mousset et al., 2004). Fu and Li's D⁎ test (1993), as well as Tajima's D, measures whether the observed frequencies of segregating mutations are compatible with the frequencies expected under the standard neutral model. Tajima's D and Fu and Li's D⁎ tests were computed by the program DnaSP v4.20.2 (Rozas et al., 2003). Tajima's D evaluates if there is imbalance between average heterozygosity among nucleotide sites (i.e. average number of mutations occurring between two samples) and the number of segregating sites (i.e. total number of mutations occurring in the sample genealogy) in the sampled nucleotide sequences under the standard neutral model. Fu and Li's D⁎ tests evaluate if there is imbalance between the number of segregating sites and the number

T. Katsumura et al. / Gene 443 (2009) 170–177

of singleton sites (i.e. number of mutations expected to occur mostly at the terminal branches of the sample genealogy) in the sampled nucleotide sequences under the standard neutral model (Tajima, 1989; Fu and Li, 1993). An excess of low heterozygosity sites results in negative values for Tajima's D. Analogously, an excess of singleton sites results in negative Fu and Li's D⁎. These conditions are expected to occur when the population grows rapidly or when the population is young after starting from a homogeneous stage, possibly as a result of a severe bottleneck or a selective sweep. They can also occur when many of the mutations are slightly deleterious. On the contrary, an excess of high heterozygosity sites results in positive Tajima's D while a shortage of singleton sites results in positive Fu and Li's D⁎. These conditions can occur when the population shrinks rapidly or experiences balancing selection. Finally, population admixtures can result in the opposite effect on Tajima's D and Fu and Li's D⁎. Fu and Li's D⁎ would always be positive unless admixture is represented by only one sequence among the samples. Tajima's D would be negative if a small population is admixed to a larger population, resulting in an increase of segregating sites without much increase in heterozygosity (Tajima, 1989; Di Rienzo and Wilson, 1991; Slatkin and Hudson, 1991; Fu and Li, 1993). The significance of Tajima's D and Fu and Li's D⁎ was obtained by examining the null distribution of 10,000 coalescent simulations of these statistics using the program DnaSP. The pairwise Fst-value between two subpopulations was calculated using DnaSP v4.20.2 (Rozas et al., 2003) as 1 − dw / db, where dw is the

173

mean number of differences between different sequences sampled from the same subpopulation and db is the mean number of differences between sequences sampled from the two different subpopulations sampled (Hudson et al., 1992). 3. Results 3.1. Phylogenetic analyses on medaka We constructed phylogenetic tree based on the nucleotide sequences obtained from mt cytochrome b and D-loop for grid- and deme-based samples (Supplementary Tables 1a, 1b, and 2). The overall topology of the phylogenetic tree was consistent between cytochrome b and D-loop (Fig. S1), and agreed with the previous study using RFLP and cytochrome b of mtDNA (Matsuda et al., 1997; Takehana et al., 2003). The southern Japanese (S.JPN) and the northern Japanese (N.JPN) groups form a cluster together, and the east Korean (E.KOR) and west Korean (W.KOR) groups form another cluster; inclusion of the outgroup (the Philippine medaka: Luzon) causes the S.JPN and N.JPN groups to cluster with the E.KOR and W. KOR groups. Clustering of each of the four major groups was supported by relatively high bootstrap values (62–100% in the cytochrome b gene and 61–100% in the D-loop region), indicating that the four groups are genetically highly differentiated. In the NJ tree involving the 16 types from the three deme-based population samples

Fig. 2. Median-joining network of the D-loop sequence types observed for O. latipes in Japan. The circles represent the sequence types. Pie-chart represents frequencies of local lab-stocks and Hd-rR (yellow), and local-wild populations [Chiba-N (pink), Ibaraki-K (blue), Kanagawa-O (green)]. The areas of the circles are proportional to the frequency of the samples. The red bars and the numbers on the branches represent the nucleotide differences between the sequence types.

174

T. Katsumura et al. / Gene 443 (2009) 170–177

and the 18 grid-based samples (16 S.JPN from lab-stocks, and two S. JPN from DNA database, plus one N.JPN from lab-stocks), we found that all of the 15 sequence types belong to the S.JPN group, except for type P (Fig. S2). To see their gene genealogy, we constructed a phylogenetic network based on mt D-loop sequences (Fig. 2). In the network, 76 nucleotide changes are observed between the S.JPN and the N.JPN groups. Three colors in the pie-charts represent three local demebased samples, and the size of the circles corresponds to the frequencies of mtDNA sequence types. We detected one distinct sub-cluster in the S.JPN group and another distinct cluster (denoted type P) between the S.JPN and the N.JPN groups. The sub-cluster in the S.JPN group consists of the Kyushu and the Okinawa individuals that inhabit the southwest-most islands of the Japanese archipelago (see Fig. 1). The sequence types from three populations did not form one cluster but were intermingled in the S.JPN groups cluster except for the type P. The type P consists of three identical sequences found only in Kanagawa-O (Supplementary Table 1b) and is distinguishable from the S.JPN and N.JPN groups. We additionally sequenced cytochrome b for the three type P Kanagawa-O individuals and found that it corresponds to the “Kanto district” population reported by Takehana et al. (2003), which consists of their C1-Mooka and C2-Futtu (accession numbers AB084748 and AB084749, respectively) (Fig. S3: 1649 bp in total). All of the type P Kanagawa-O individuals were located outside the N.JPN group with 100% bootstrap probability, and the Kyushu–Okinawa subgroup clustered with non-Kyushu–Okinawa S.JPN individuals through a relatively long branch with 100% bootstrap probability (Fig. S3). Thus, each local population shares D-loop sequence type found in various geographical sites in south Japan except for the Kyushu–Okinawa, and Kanagawa-O that contains three Kanto-specific-type individuals that are rare in and far from S.JPN.

indicates that among deme-based samples the genetic diversity is relatively low in the biotope (semi-wild) population (Kanagawa-O) compared to the local-wild (natural) populations (Chiba-N and Ibaraki-K). The grid-based samples (N.JPN and S.JPN) are typical examples of population admixture, in which only one sequence is collected from most of the localities. Note that the admixture mentioned here is caused by the grid-based sampling method. Thus, if local populations are highly differentiated from each other, an excess of low heterozygosity sites and an excess of singleton sites are expected, and thus negative values are expected for both Tajima's D and Fu and Li's D⁎ (Ptak and Przeworski, 2002). Nevertheless, these values were not significantly different from zero in either the S.JPN (Tajima's D, 0.056; Fu and Li's D⁎, 0.296) or the N.JPN (Tajima's D, − 0.755; Fu and Li's D⁎, −1.528) group (Table 1). This might suggest that some demographic effects which push these statistics upward are operating, such as population subdivision, or population shrinkage. Tajima's D and Fu and Li's D⁎ were significantly positive in the two deme-sampled natural populations, except in Fu and Li's D⁎ of Ibaraki-K (Chiba-N, Tajima's D 1.734 p b 0.05, Fu and Li's D⁎ 1.399 p b 0.05; Ibaraki-K, Tajima's D 2.213 p b 0.02, Fu and Li's D⁎ 0.498 not significant). The results probably suggest population shrinkage in Ibaraki-K and ChibaN. On the other hand, the deme-sampled biotope population (Kanagawa-O) showed a significant-negative Tajima's D (−2.005, p b 0.002) and significantly positive Fu and Li's D⁎ (2.342, p b 0.002). Notice that the two tests show significance in the opposite trends. In fact, the data has very unnatural haplotype distribution; the haplotype P has so many private alleles (Supplementary Table 1b). This suggests that the biotope population consists of individuals from highly differentiated populations which are admixed in a biased proportion. 4. Discussion

3.2. Estimation of within-population diversity 4.1. Phylogenetic relationship between and within geographical groups To compare genetic diversity between grid- and deme-based samples, we calculated population genetic statistics (Table 1). Overall, the deme-based samples had lower values of “within-subpopulation” diversities than the total population diversities from grid-based samples. The gene diversity values of the grid-sampled S.JPN and the N.JPN groups were 0.956–0.967, while those of the deme-sampled natural populations, Chiba-N and Ibaraki-K, and biotope population, Kanagawa-O, were 0.805, 0.641, and 0.225, respectively. The nucleotide diversity (π) of S.JPN was highest (26.4 × 10− 3), followed by N.JPN (17.8 × 10− 3); Chiba-N and the Ibaraki-K populations showed slightly lower nucleotide diversity than N.JPN (13.1 × 10− 3 and 13.8 × 10− 3, respectively); and the Kanagawa-O population was lowest (6.7 × 10− 3). Thus, genetic diversity is remarkably high in the total populations represented by grid-based samples for the S.JPN group and also for the N.JPN group than that of subpopulations represented by deme-based samples, indicating that local populations are genetically differentiated within each group. The result also

Using our lab-stocks, we found that overall topology of the gene tree based on the nucleotide sequences of the cytochrome b gene is in concord with that based on the D-loop sequences (Fig. S1). The topology had two remarkable characteristics shared in both trees: first is the formation of four major clusters with the pattern (((S.JPN, N.JPN), (E.KOR, W.KOR)), Luzon), and second is the intermingling pattern of local sites within each cluster. The former characteristic suggests that the four major geographical groups diverged long enough ago to allow genetic differentiation. The latter characteristic suggests that time has not elapsed long enough to allow new (private) mutations to accumulate between local populations so as for them to be distinguishable at nucleotide sequence level. Thus, only measures of population diversities, such as gene diversity and nucleotide diversity which are based on allelic frequency data, are effective to detect population differentiation among local populations of medaka. This also suggests that the different sequence types that characterize

Table 1 Genetic diversity statistics within populations based on the D-loop nucleotide sequences. Groups

Area

Populations

N.JPN S.JPN Kanto

a

Chiba-N Ibaraki-K Kanagawa-O

Number of individuals

Number of sitesa

Number of sequence types

Segregating sites

Singleton sites

Proportion of segregating sites (s)

Gene Diversity

Nucleotide diversity (π) × 10− 3

Theta (θ) × 10− 3

Tajima's D

Fu and Li's D⁎

14 17 124 90 159

481 (508) 489 (508) 500 (505) 500 (505) 500 (505)

12 14 11 7 3

33 43 22 20 55

22 16 1 2 0

0.0686 0.0879 0.0440 0.0400 0.1100

0.967 0.956 0.805 0.641 0.225

17.8 26.36 13.11 13.84 6.7

21.57 26.01 8.16 7.89 19.49

− 0.755 0.056 1.734⁎ 2.213⁎⁎ − 2.005⁎⁎⁎

− 1.528 0.296 1.399⁎

The number of used sites for the analysis (comparable length including InDels). ⁎ p b 0.05. ⁎⁎ p b 0.02. ⁎⁎⁎ p b 0.002.

0.498 2.341⁎⁎⁎

T. Katsumura et al. / Gene 443 (2009) 170–177

these local populations were present in the ancestral population and were randomly sorted by genetic drift such that the differences do not correlate with the geographic locations of their original sampling sites. The intermingling pattern of geographically distinct samples in a tree could also be explained by the following effects: [1] frequent (artificial/natural) migrations and [2] parallel mutations. The parallel mutation is unlikely because the cytochrome b gene and the noncoding D-loop region have different evolutionary rates, but both show equally the intermingling patterns. In addition, this hypothesis would require too many parallel mutations to have occurred in the relatively short evolutionary timeframe in which within-species variations are considered. In a previous study, only migrations through human activities were considered (Takehana et al., 2003). However, if this is the case, we would also see such intermingling patterns across the S. JPN and N.JPN groups, which we did not find. Instead, if natural

175

migrations (i.e. those not related to human activities) are the primary source of the phylogenetic intermingling pattern, we would expect genetic distance to correlate with geographic distance. Fig. 3 shows correlations between genetic distances and geographic distances among individuals within- and between-groups. We did find a clear correlation between the S.JPN and the N.JPN groups (R2 = 0.0529; p = 6.29 × 10− 6) (Fig. 3a), but did not find any correlation among the individuals within the N.JPN group (R2 = 0.004; p = 0.581) (Fig. 3b). The correlation seen within the S.JPN (R2 = 0.1399; p = 8.45 × 10− 5) (Fig. 3c) is attributed to inclusion of Kyushu–Okinawa subgroup, which is geographically (Fig. 1) and genetically (Fig. 2) isolated from the rest of the S.JPN group and show self correlation within the subgroup (R2 = 0.4165; p = 0.044) (Fig. 3d). When the Kyushu–Okinawa subgroup was excluded from the analysis, then the individuals from the main-island southern Japan did

Fig. 3. Correlation between geographic and genetic distances for Oryzias latipes based on the grid-based sampling method. The genetic distances are based on the cytochrome b and the D-loop region combined sequences: 1649 bp, including 72 bp InDels and 3 unknown nucleotides: i.e. comparable sites are 1574 bp: (a) the northern and the southern Japanese populations, (b) the northern populations, (c) the southern populations, (d) the Kyushu–Okinawa group (and Tamba [=Amino + Kasumi] as the outgroup), and (e) the southern populations excluding the Kyushu–Okinawa group.

176

T. Katsumura et al. / Gene 443 (2009) 170–177

not show a within-group correlation (R2 = 0.002; p = 0.769) (Fig. 3e). Thus, the overall low correlations between genetic and geographic distances within groups support the hypothesis that the effect of random sorting of ancestral polymorphism. 4.2. Eco-system and demographic history of medaka Our analysis on the deme-based populations reveals the medaka's eco-system and demographic history. Each deme of the local-wild populations had relatively high within-genetic diversity. We found 7 and 11 distinct mt D-loop types in Ibaraki-K and Chiba-N, respectively. Four of the mt types were shared between the local-wild populations. The Fst-value, the statistic representing between-population genetic diversity, was 0.220 between Ibaraki-K and Chiba-N, being consistent with high population differentiation and low gene flow (Nei, 1986). Because the sampling sites of the local-wild populations are in different river systems, it is unlikely that the two populations are frequently interbreeding. The 159 samples of the Kanagawa-O population were collected from a biotope that is regarded as a semi-wild population. The biotope was installed by the city government in 2005 in a wet rice field that is maintained under the conservation policy of local people. The number of medaka has increased since the biotope was built from 105 individuals in the summer of 2005, and although the number was once dropped to 42 individuals in the spring of 2006, it has since increased every year to a thousand individuals. Before the city government built the biotope, there was another biotope already in place that was built by the local people. When the city government built the new biotope (that we examined in this study), the medaka individuals from the old biotope were transferred into the new biotope. It is unknown if type P originated in the Kanagawa-O area. On the other hand, the extremely high frequency of type G in the Kanagawa-O population could be a result of biased breeding and founder effect, which has been unrecognized under the current conservation policy. Thus, the unique frequency spectrum of polymorphism in the Kanagawa-O could be attributed to the admixture of very different lineages and the biased breeding of a particular subpopulation in the biotope. If rice fields spread as before, the irrigation channels make a huge network beyond the river system, and inter-population mating would be more frequent. However, the recent reduction in the area covered by rice fields across Japan has probably isolated the local-wild populations and limited inter-population mating, further reducing gene flow and leading to loss of genetic variability by genetic drift in each local-wild population. This recent destruction of medaka habitat may be reflected by the significantly positive Tajima's D value that is the signal of population size reduction. 4.3. Conservation for medaka The influence of human activities was, thus, unexpectedly clear in the biotope. Following the standard of the International Union for Conservation of Nature and Natural Resources (IUCN), the Ministry of the Environment Japan included the medaka as one of the “Threatened Animals of Japan” in 1999 (Hosoya, 2000). Assuming that genetic diversity spectrum in the Chiba-N and the Ibaraki-K populations is the wild state, the Kanagawa-O population is remarkably abnormal due to the mixing of very different lineages (type P) and inbreeding of a specific lineage. Consequently, our analysis detected the effects that current conservation efforts, particularly the construction of biotopes, can have on medaka genetic diversity. We should mention that conservation activity in the Kanagawa-O population was designed with genetic diversity in mind, and the Kanagawa-O population successfully maintains local-region-specific sequences, including the third group (type P) that has not been found or is very rare in the other populations (Fig. 2). Despite such careful

management, we found a peculiar genetic diversity pattern in the Kanagawa-O. The state of low genetic diversity could increase vulnerability to population collapses due to disease. For the conservation movement to be more successful, it would be ideal to have active participation of scientists cooperating with local people to take into consideration the genetic diversity data of the local population and carry out controlled and unbiased breeding. We could conclude that such monitoring of wild medaka schools using a population genetic approach is essential for effective conservation of endangered species and the environments that they can inhabit. 4.4. Efficient use of medaka for comparative population studies Medaka has been used as an experimental model animal for molecular and developmental biology. Our grid- and deme-based analysis further showed that medaka is an excellent experimental model for comparative population genomics. In a recent study, our groups reported that the genes with non-synonymous SNPs showing high Fst-values in humans were polymorphic in the same or close sites in medaka; and some of those medaka SNPs showed signals of positive selection (Matsumoto et al., 2009). For such a study screening SNPs potentially concerning adaptation, first the grid-based sampling would be recommended, because gene diversity values were higher in the grid samples than those in the deme samples (Table 1). Secondary, the deme-based samples are required to see whether the interested SNPs are common or rare variants in the local population, because our deme-based analysis showed that the local-wild populations had substantial diversity (Table 1). It is an interesting question if the lab-stocks have diversity as high as the local-wild populations have. The lab-stocks are not inbred strains, but descendants originally collected from local-wild schools. Because the stocks are closed, they might have reduced diversity in each generation. For Nago and Odate, we examined two individuals each even in the grid-sampling method. Those showed the same sequence types in both the D-loop region and the cytochrome b gene. This could be an effect of closed circumstances of the lab-stocks. To maintain lineage of this threatened species, the particular examination for lab-stock genetic diversity would be required in the near future. At the same time, the deme-based analysis is also important to trace genetic diversity in local-wild populations in both the contexts of conservation and studies on comparative population genomics requiring accurate estimation of allele frequency spectrum. Acknowledgements This work was supported by a Grant-in-Aid for Scientific Research (A) from the Japan Society for the Promotion of Science (JSPS) (19207018) to SK, by a Grant-in-Aid for Scientific Research (C) from JSPS (19570226) to HO, and by a Grant-in-Aid for Scientific Research in the Priority Area “Comparative Genomics” (#015) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) to HM. We thank Dr. Makoto Hirayama for his useful discussion, and must thank Professor Emeritus Akihiro Shima and Dr. Atsuko Shimada (the University of Tokyo) for their efforts on keeping medaka stocks from wild populations and preparing a part of genomic DNA used in this study. Finally, we thank so much for the useful comments from anonymous reviewers. Author contribution HO conceived, and HO and SK formed the project. SO and NS collected the medaka individuals from the local sites, and SO and HM provided the medaka resources (lab-stocks). TK, SO, HO, HM, and SK designed the experiments. TK and KW performed PCR and sequencing, and TK and SM analyzed the data. TK, SK and HO wrote the paper.

T. Katsumura et al. / Gene 443 (2009) 170–177

Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.gene.2009.04.017. References Aris-Brosou, S., Excoffier, L., 1996. The impact of population expansion and mutation rate heterogeneity on DNA sequence polymorphism. Mol. Biol. Evol. 13, 494–504. Bandelt, H.J., Forster, P., Rohl, A., 1999. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48. Di Rienzo, A., Wilson, A.C., 1991. Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc. Natl. Acad. Sci. U. S. A. 88, 1597–1601. Felsenstein, J., 1985. Confidence-limits on phylogenies — an approach using the bootstrap. Evolution 39, 783–791. Fu, Y.X., Li, W.H., 1993. Statistical tests of neutrality of mutations. Genetics 133, 693–709. Hosoya, K., 2000. The circumstances and protection in Japanese ricefish, “Medaka”. J. Jpn. Soc. Water Envir. 23, 135–139 (in Japanese). Hudson, R.R., Boos, D.D., Kaplan, N.L., 1992. A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9, 138–151. Kasahara, M., et al., 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447, 714–719. Matsuda, M., Yonekawa, H., Hamaguchi, S., Sakaizumi, M., 1997. Geographic variation and diversity in the mitochondrial DNA of the medaka, Oryzias latipes, as determined by restriction endonuclease analysis. Zool. Sci. 14, 517–526. Matsumoto, Y., Fukamachi, S., Mitani, H., Kawamura, S., 2006. Functional characterization of visual opsin repertoire in Medaka (Oryzias latipes). Gene 371, 268–278. Matsumoto, Y., et al., 2009. Medaka: a promising model animal for comparative population genomics. BMC Research Notes 2, 88. Mousset, S., Derome, N., Veuille, M., 2004. A test of neutrality and constant population size based on the mismatch distribution. Mol. Biol. Evol. 21, 724–731. Nakayama, K., Fukamachi, S., Kimura, H., Koda, Y., Soemantri, A., Ishida, T., 2002. Distinctive distribution of AIM1 polymorphism among major human populations with different skin color. J. Hum. Genet. 47, 92–94. Nei, M., 1986. Molecular Evolutionary Genetics. Columbia University Press, New York. Omran, H., et al., 2008. Ktu/PF13 is required for cytoplasmic pre-assembly of axonemal dyneins. Nature 456, 611–616.

177

Oota, H., et al., 2002. Extreme mtDNA homogeneity in continental Asian populations. Am. J. Phys. Anthropol. 118, 146–153. Ptak, S.E., Przeworski, M., 2002. Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet. 18, 559–563. Rozas, J., Sanchez-DelBarrio, J.C., Messeguer, X., Rozas, R., 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19, 2496–2497. Saitou, N., Nei, M., 1987. The neighbor-joining method — a new method for reconstructing phylogenetic trees. Mo. Biol. Evol. 4, 406–425. Sakaizumi, M., 1986. Genetic-divergence in wild populations of medaka, Oryzias-latipes (Pisces, Oryziatidae) from Japan and China. Genetica 69, 119–125. Sakaizumi, M., Joen, S., 1987. Two divergent groups in the wild populations of medaka Oryzias latipes (Pisces: Oryziatidae) in Korea. Korean J. Limnol. 20, 13–20. Sakaizumi, M., Moriwaki, K., Egami, N., 1983. Allozymic variation and regional differentiation in wild populations of the fish Oryzias-latipes. Copeia 311–318. Sakaizumi, M., Suzuki, H., Moriwaki, K., Kominami, R., Muramastu, M., 1984. Geographic-variation of ribosomal-RNA gene structure in wild populations of the medaka Oryzias-latipes. Jpn. J. Gen. 59, 651. Shima, A., Mitani, H., 2004. Medaka as a research organism: past, present and future. Mech. Dev. 121, 599–604. Shima, A., Shimada, A., Sakaizumi, M., Egami, N., 1985. First listing of wild stocks of the medaka Oryzias latipes currently kept by the Zoological Institute, Faculty of Science, University of Tokyo. J. Fac. Sci. Univ. Tokyo Sec. IV, pp. 27–35. Slatkin, M., Hudson, R.R., 1991. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129, 555–562. Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595. Takehana, Y., Nagai, N., Matsuda, M., Tsuchiya, K., Sakaizumi, M., 2003. Geographic variation and diversity of the cytochrome b gene in Japanese wild populations of Medaka, Oryzias latipes. Zool. Sci. 20, 1279–1291. Tamura, K., Nei, M., 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial-DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599. Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positionspecific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680. Wittbrodt, J., Shima, A., Schartl, M., 2002. Medaka—a model organism from the far East. Nat. Rev. Genet. 3, 53–64.