Biochemical Systematics and Ecology 63 (2015) 127–135
Characterization of microsatellites and repetitive flanking sequences (ReFS) from the topmouth culter (Culter alburnus Basilewsky)

Shi-Li Liu a, b, Zhi-Min Gu a, *, Jin-Liang Zhao b, **, Yong-Yi Jia a, Wen-Ping Jiang a, Jian-Lin Guo a, Qian Li a

a Agriculture Ministry Key Laboratory of Healthy Freshwater Aquaculture & Key Laboratory of Freshwater Aquatic Animal Genetic and Breeding of Zhejiang Province, Zhejiang Institute of Freshwater Fisheries, Huzhou 313001, China
b Key Laboratory of Freshwater Fishery Germplasm Resources, Shanghai Ocean University, Ministry of Agriculture, Shanghai 201306, China
Article info
Article history: Received 17 June 2015; received in revised form 16 September 2015; accepted 26 September 2015; available online 23 October 2015

Abstract
The topmouth culter (Culter alburnus) is an economically important freshwater fish in China. We obtained 159 microsatellite-containing sequences (MCSs) from genomic DNA of this species enriched with (CAA)8 and (GAA)8 probes. Careful examination of these sequences revealed cryptic repeated elements in flanking regions that were presumed to be unique. These cryptic elements can be grouped into three families, with the MCSs of each family sharing regions of similarity between 40 and 130 bp in length, with 96% sequence similarity. Repbase scans identified a large proportion of the cryptic repetitive DNA as transposable elements (TEs). Complex patterns were apparent among these sequences. In most cases (89.2%), a single TE was identified in an MCS; in three instances, the same TE was observed twice in the same MCS. Some MCSs had two or even four different TEs. We isolated nine polymorphic microsatellite loci from sequences with no matches to TEs. In a sample of 30 cultured C. alburnus, the average allele number was 8.1 per locus (range = 4–17), with polymorphism information content ranging from 0.364 to 0.898. These microsatellites can be used to study the population genetic diversity of this species. © 2015 Elsevier Ltd. All rights reserved.
Keywords: Culter alburnus; Microsatellite; Transposable element; Genetic diversity
1. Introduction
The topmouth culter (Culter alburnus) is an economically important fish that is widely distributed throughout China's large rivers, reservoirs, and lakes (Chen, 1998). The cultured production of C. alburnus has expanded significantly in recent years in response to increased market demand. Molecular genetic markers are needed for this species to characterize the genetic differences between cultured and wild populations, to help preserve genetic variability, and to prevent inbreeding depression of stocks.
* Corresponding author. ** Corresponding author.
E-mail addresses: [email protected] (Z.-M. Gu), [email protected] (J.-L. Zhao).
http://dx.doi.org/10.1016/j.bse.2015.09.024
0305-1978/© 2015 Elsevier Ltd. All rights reserved.
Microsatellites are codominant, highly polymorphic genetic markers that are ideal for assessing population genetic diversity and structure. Associations between microsatellites and mobile elements have been widely reported in numerous taxa (Anderson et al., 2007; Bailie et al., 2010; McInerney et al., 2011). Dinucleotide microsatellite loci have previously been isolated from C. alburnus (Liu et al., 2014). Such loci, however, are stutter-prone (i.e., Taq errors can cause slippage during PCR), making determination of absolute allele sizes difficult (DeWoody et al., 2006). Information from microsatellite-containing sequences (MCS) and their neighboring genomic regions can increase our knowledge of the origins and evolution of microsatellites (Bailie et al., 2010; McInerney et al., 2011). In this study, we developed and characterized nine novel polymorphic microsatellites isolated from the C. alburnus genome, providing a powerful genetic toolbox for the effective management and breeding of this species.
2. Materials and methods
2.1. Sampling and DNA extraction
We collected fin clips from 30 cultured C. alburnus obtained from Huzhou, Zhejiang Province, China, and preserved them in 100% ethanol at room temperature prior to DNA extraction. Genomic DNA was extracted using the standard phenol-chloroform method (Sambrook and Russell, 2001). DNA quality was assessed by agarose gel electrophoresis, and DNA concentration was measured with a NanoDrop 2000c Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
2.2. Development of microsatellite loci and examination of polymorphism
Genomic DNA was enriched for microsatellite loci using the methods described by Glenn and Schable (2005). Briefly, genomic DNA from one individual was digested with the restriction enzyme MseI (New England Biolabs, Beverly, Massachusetts, USA) and ligated to short linkers (MseI A: 5′-TACTCAGGACTCAT-3′, MseI B: 5′-GACGATGAGTCCTGAG-3′). The ligated products were used as templates for PCR amplification with the primer MseI-N (5′-GATGAGTCCTGAGTAAN-3′) under the following cycling conditions: 94 °C for 4 min, then 28 cycles of 94 °C for 30 s, 53 °C for 1 min and 72 °C for 1 min, followed by 72 °C for 5 min. After denaturation at 95 °C for 15 min, the PCR products were hybridized to biotinylated probes (including microsatellite CAA and GAA motifs) and then attached to streptavidin-coated magnetic beads (Promega, Madison, WI, USA). The enriched fragments were amplified by a second round of PCR, and the 250–1000 bp fragments of the DNA product were then isolated on a 1.0% agarose gel. Fragments were excised, purified with a Gel Extraction Kit (Tiangen, Beijing, China) and ligated into the pGEM-T vector (Promega). The ligation products were transformed into DH5α competent cells (Tiangen) and plated onto Luria-Bertani agar plates containing 55 mg/L ampicillin, 60 mg/L IPTG and 30 mg/L X-Gal. Positive clones were sequenced on an ABI 377 DNA sequencer (Applied Biosystems, Foster City, CA, USA).
The sequences obtained were scanned against Repbase (Kohany et al., 2006) (http://www.girinst.org/censor/index.php) to locate possible known transposable elements (TEs). To improve the efficiency of microsatellite development, we designed primers only from sequences that did not match known TEs in Repbase. Microsatellite primer pairs were designed using the program Primer Premier 6.0 (Premier Biosoft International, Palo Alto, CA). One primer of each pair was labeled with FAM at the 5′ end.
The PCR was performed in a total volume of 10 μL containing 1× PCR buffer with 1.5 mmol/L MgCl2, 0.1 mmol/L of each primer, 10 mmol/L dNTP, 20 ng DNA and 0.25 U of Taq polymerase (Tiangen), with the following thermal cycling conditions: pre-denaturation for 3 min at 94 °C, then 30 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 30 s and extension at 72 °C for 30 s, and a final extension at 72 °C for 5 min. The final PCR products were analyzed using an ABI3730xl DNA sequencer (Applied Biosystems). Fragment sizes were determined by reference to a standard ladder, ROX-500 (Applied Biosystems), using GeneMapper version 4.1 (Applied Biosystems).
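To illustrate how sequenced clones can be screened for the target motifs before primer design, the following is a minimal Python sketch that locates perfect tandem runs of CAA or GAA and returns their flanking sequence. It is not the procedure used in this study: the function names, the `min_repeats` threshold and the `width` of the flanks are hypothetical choices, and only the forward strand and perfect repeats are considered.

```python
import re

def find_microsatellites(seq, motifs=("CAA", "GAA"), min_repeats=5):
    """Return (motif, start, end, n_repeats) for each perfect tandem run.

    min_repeats is an illustrative threshold, not a value from the paper.
    Coordinates are 0-based, with end exclusive.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # Non-capturing group repeated at least min_repeats times.
        pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_repeats))
        for m in pattern.finditer(seq):
            n_repeats = (m.end() - m.start()) // len(motif)
            hits.append((motif, m.start(), m.end(), n_repeats))
    return sorted(hits, key=lambda h: h[1])

def flanks(seq, start, end, width=20):
    """Return the left and right flanking sequence around a repeat run."""
    return seq[max(0, start - width):start], seq[end:end + width]

if __name__ == "__main__":
    clone = "ACGT" + "CAA" * 9 + "TTGACC"  # toy sequence, not real data
    for motif, start, end, n in find_microsatellites(clone):
        print(motif, n, flanks(clone, start, end, width=4))
```

The flanks returned here are the regions in which primers would be placed, which is why repeated elements in those regions matter for marker development.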
2.3. Analysis
All sequences were assembled and aligned in the SeqMan program (DNAStar, Madison, WI, USA) and refined manually. We found some contigs and compared these sequences in ClustalX (Larkin et al., 2007). Contigs containing similar cryptic repetitive DNA in the microsatellite flanking regions were subsequently grouped into DNA families (McInerney et al., 2011). Nucleotide diversities within families were calculated using the Kimura 2-parameter model as implemented in the MEGA 6.0 software (http://www.megasoftware.net/). The sequence divergence for the microsatellite flanking sequences of the same family was estimated using a global multiple sequence alignment. We excluded the microsatellite region from the alignment to avoid overestimating divergence. Indices of genetic diversity, including observed number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (HO), expected heterozygosity (HE), and departure from Hardy–Weinberg Equilibrium (HWE), were calculated using GENEPOP 4.2 (Rousset, 2008). Polymorphism information content (PIC) was calculated using CERVUS version 3.0.3 (Kalinowski et al., 2007).
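For readers unfamiliar with the Kimura 2-parameter model, the sketch below computes the pairwise distance d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is a simplified illustration rather than a reproduction of the MEGA analysis: the function name is hypothetical, and skipping gapped or ambiguous sites pair by pair is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (an assumed, pairwise-deletion style treatment).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError when the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

if __name__ == "__main__":
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
```

Averaging such pairwise distances over all sequence pairs in a family gives a within-family divergence of the kind summarized in Table 2.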
3. Results
3.1. Isolation of repetitive DNA and microsatellite characterization
In total, we obtained 159 sequences containing microsatellites. After excluding sequences with high probabilities of matching known TEs in Repbase, we developed 38 microsatellite markers (38 primer pairs). All 30 C. alburnus individuals amplified successfully with these 38 primer pairs; however, only nine primer pairs were polymorphic (two pentanucleotide microsatellites, a tetranucleotide microsatellite, five trinucleotide microsatellites, and a compound microsatellite composed of two tetranucleotide repeats; Table 1). The amplified peak patterns were clear, without shadow bands. The mean number of alleles at these nine polymorphic microsatellite loci was 8.111 (range = 4–17), while mean HO was 0.613 (range = 0.310–0.933) and mean HE was 0.689 (range = 0.390–0.921; Table 1). We observed significant deviation from Hardy–Weinberg Equilibrium at two loci, Cal189 and Cal199, after Bonferroni correction, possibly due to population fragmentation and the relatively small sample size. Pairwise tests revealed no evidence of linkage disequilibrium at any of the nine polymorphic loci. For seven of the microsatellite loci (Cal120, Cal136, Cal146, Cal184, Cal189, Cal199 and Cal207) PIC was >0.5, while PIC for the remaining two loci, Cal149 and Cal198, was 0.364 and 0.393, respectively.
3.2. Families of cryptic repeats in microsatellite flanking sequences
Based on sequence similarity, all the repeated elements identified in the microsatellite flanking sequences could be reliably grouped into distinct DNA families (see Fig. 1). Of the 159 sequences examined, a total of 56 (35.2%) were grouped into 3 DNA families, comprising between 7 and 39 sequences (Table 2). Mean nucleotide diversity estimates varied within families from 0.043 to 0.208 (mean = 0.113; Table 2).
The high divergence of sequences belonging to the same family suggests that these sequences either evolved neutrally or were subjected to only very low selective constraints. They shared regions of similarity ranging from 40 to 130 bp in length, with 96% sequence similarity. Complex patterns were noticed among these sequences. For instance, Family 1 (CAA motif) was the largest, with 39 sequences. The NJ tree constructed from the microsatellite flanking sequences was divided into many branches, and sequence differences among branches were not equivalent. Group 1 contained five very similar sequences that differed only in the number of repeats (Fig. 1). Group 2, however, differed greatly from the other sequences because several 7–30 bp segments were absent in the sequence alignment (Fig. 1). The motif in Family 1 was repeated between 5 and 23 times, with nine sequences having 11 motif repeats and subtle differences in the flanking regions of the MCSs. All other sequences, however, shared cryptic repeated elements in the flanking regions of each MCS but did not share the same number of motifs (Fig. 2). All three families displayed similarities in the cryptic repeated elements in the flanking region on both sides of the microsatellite. However, the shared right-hand region of three sequences from Family 3 (Cal103, Cal104, Cal105) was relatively short.
3.3. Transposable elements
Comprehensive scanning of all 159 microsatellite-containing sequences against Repbase resulted in 164 matches with known repetitive/mobile elements. These included DNA transposons (61.6%), endogenous retroviruses (3.7%), LTR
Table 1 Details of nine polymorphic microsatellite loci in Culter alburnus developed from an enriched genomic library.

| Locus | GenBank accession no. | Primer sequences (5′→3′) | Repeat motif | Size range (bp) | Na | HO | HE | PIC |
|---|---|---|---|---|---|---|---|---|
| Cal120 | KM272078 | F: TTGTTGTCCAGTAGCTCTTG; R: TATGCTGCATCTCTGTTCAA | (AAC)10 | 213–225 | 4 | 0.467 | 0.629 | 0.552 |
| Cal136 | KM272094 | F: GCTACAGCTCCTTTGGTATT; R: CCAGTGAACATGAACATAGAC | (CCAAA)5(AAACA)8 | 208–278 | 13 | 0.933 | 0.89 | 0.864 |
| Cal146 | KM272104 | F: CATGCTACAACTGTGATGAAC; R: GGCTCCGTGTTATGATAGTG | (CAA)9(AAAC)5 | 163–210 | 9 | 0.800 | 0.703 | 0.661 |
| Cal149 | KM272107 | F: CGGTTCTATTGGCTGTCATA; R: CATCGTTACACTATCATTAGGC | (AGA)9 | 110–125 | 5 | 0.310 | 0.39 | 0.364 |
| Cal184 | KM272142 | F: CTCTCAGCTCATATCTCTGC; R: CCTAGTTCCTTGTATGTCACT | (CAA)14 | 147–174 | 7 | 0.700 | 0.785 | 0.736 |
| Cal189 | KM272147 | F: GGAAGATAGCAAAGGGAGTA; R: CATGTCCTGGCATACAGTAA | (AAGA)16 | 163–235 | 17 | 0.667 | 0.921 | 0.898 |
| Cal198 | KM272156 | F: CACTTGGCTTGTATTGTTCA; R: TTGGTCTTGTCGGAATAGTT | (CAA)9 | 205–214 | 4 | 0.500 | 0.468 | 0.393 |
| Cal199 | KM272157 | F: CCTGCCCTCAAACCAGTAAA; R: TCATTGCCCAGTAACCACAT | (AGAAG)18 | 191–219 | 7 | 0.444 | 0.757 | 0.700 |
| Cal207 | KM272165 | F: ATCACCATTCTGCTGTCTAA; R: TCTCAAGTGCTAACTCAACT | (AAG)8 | 287–317 | 7 | 0.700 | 0.661 | 0.597 |

Note: F, forward primer; R, reverse primer; Na, number of alleles revealed; HO, observed heterozygosity; HE, expected heterozygosity; PIC, polymorphism information content.
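The per-locus statistics in Table 1 follow from allele frequencies at each locus. The sketch below shows one way to compute them in Python from diploid genotypes; it is an illustration, not the GENEPOP or CERVUS implementation. The function name and the toy genotypes are hypothetical, the expected heterozygosity uses the sample-size-corrected (unbiased) estimator as an assumption, and PIC uses the standard formula PIC = 1 - Σp_i² - Σ_{i<j} 2p_i²p_j².

```python
from collections import Counter

def locus_summary(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    n = len(genotypes)
    total = 2 * n  # number of gene copies sampled
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / total for c in counts.values()]
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Unbiased expected heterozygosity (assumed estimator).
    he = (total / (total - 1)) * (1 - s2)
    # sum_{i<j} 2 p_i^2 p_j^2 equals (sum p^2)^2 - sum p^4.
    pic = 1 - s2 - (s2 ** 2 - s4)
    return {"Na": len(counts), "Ne": 1 / s2, "HO": ho, "HE": he, "PIC": pic}

if __name__ == "__main__":
    # Toy genotypes (allele sizes in bp); not data from this study.
    toy = [(213, 216), (213, 216), (213, 213), (216, 216)]
    print(locus_summary(toy))
```

For two equally frequent alleles this gives PIC = 0.375, which is why loci with few or very unevenly distributed alleles, such as Cal149 and Cal198, fall below the 0.5 threshold.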
Fig. 1. Neighbor-joining tree of sequences of Family 1 constructed using the Kimura 2-parameter distance model, with the microsatellite region excluded. Bootstrap values from 1000 replicates are shown. Group 1 contained five very similar sequences that differed only in the number of repeats. Group 2, however, differed greatly from the other sequences because several 7–30 bp segments were absent in the sequence alignment.
Table 2 Summary of the DNA families identified with the BLASTn analysis and microsatellite containing sequences with known transposable elements (TEs) identified from their flanking regions.

| Sequence set a | No. of sequences b | GenBank accession no. | Identified TEs (no. of matches) c | Sequence divergence d |
|---|---|---|---|---|
| Family 1 | 39 | KM272013-KM272049, KM272064 and KM272065 | Polinton-1N1_DR(27), L2-3_Dre(12), RIRE7_I(1), DIRS-4N2-LTR_DR(1), DIRS1_DR(1), SINE2-2_DR(1), REX1-5_DR(1) | 0.043 |
| Family 2 | 7 | KM272050-KM272054, KM272133 and KM272171 | Helitron-4_DR(7) | 0.208 |
| Family 3 | 10 | KM272055-KM272063 and KM272097 | Helitron-1N2_DR(14), Helitron-1N1_DR(4), Helitron-1N3_DR(3), Helitron1_DR(2), Helitron-4_DR(1) | 0.087 |

Note: a, the MCS grouped into a DNA family based on sequence similarities among flanking regions; b, the number of sequences which are grouped into each DNA family; c, the name of identified transposable element provided by Repbase; the number of matches is indicated in brackets; d, the sequence divergence for the microsatellite flanking sequences of the same family; the microsatellite region was excluded from the alignment.
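The idea of grouping MCSs into DNA families by shared flanking sequence can be sketched as a clustering problem: link any two sequences whose flanks share a sufficiently long common block, then take connected groups as families. The Python example below is a simplified stand-in for the alignment-based comparison used in this study. It relies on exact shared substrings from the standard-library `difflib` module instead of BLASTn or ClustalX, and the function names and the `min_block` threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def longest_shared_block(a, b):
    """Length of the longest exact substring common to a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size

def group_into_families(flanks, min_block=20):
    """Single-linkage grouping of sequences by shared flanking blocks.

    flanks: dict mapping sequence name -> flanking sequence
            (microsatellite region already removed).
    Returns a list of families, each a sorted list of names;
    sequences with no partner are left out.
    """
    parent = {name: name for name in flanks}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(flanks, 2):
        if longest_shared_block(flanks[a], flanks[b]) >= min_block:
            parent[find(a)] = find(b)

    families = {}
    for name in flanks:
        families.setdefault(find(name), []).append(name)
    return [sorted(m) for m in families.values() if len(m) > 1]

if __name__ == "__main__":
    shared = "ACGTTGCAAGCTTAGGCTAACGTA"  # toy 24-bp shared element
    toy = {
        "s1": "TTTTT" + shared + "GGGGG",
        "s2": "CCCCC" + shared + "AAAAA",
        "s3": "TG" * 15,
    }
    print(group_into_families(toy))
```

Because real family members diverge by substitutions and indels, an alignment-based similarity score would be needed in practice; the exact-match criterion here only keeps the example short.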
retrotransposons (15.2%) and non-LTR retrotransposons (16.5%). These cover all of the main eukaryote groups: 79.9% of matches were to fish, 7.9% to plants, 3.7% to insects and 1.8% to reptiles. Most of the matches were to the zebrafish, Danio rerio, perhaps due to the large volume of research undertaken on this model species. Results from the BLAST analysis (involving all sequences) are shown in Table 3. Information regarding TE transpositional mechanism (autonomous/non-autonomous) was obtained from Repbase and Web of Science reports. TEs were found in 76.1% (N = 121) of all the MCS examined (Tables 2 and 3). On average, regions displaying a high identity to a TE (~80.3%) were 154 bp in length. In most instances (89.2%), a single TE was identified from an MCS. In 13 instances, however, two different TEs were observed in the same MCS (e.g., Cal066). In three instances, the same TE was observed twice in the same MCS (Cal099, Cal140, Cal149, Cal162). Some MCS had three (Cal163, Cal172, Cal200) or even four (Cal192) different TEs. Thus, the cryptic repetitive DNA identified from the DNA families, after Repbase scans, was sometimes shown to be composed of more than one TE (Table 3, Fig. 1 for Family 1 and 3). In Family 1, all clones shared Polinton-1N1_DR or L2-3_DRe, but four MCS (Cal066, Cal075, Cal086 and Cal091) shared a second unrelated cryptic repeated element. In Family 2, the seven sequences shared only Helitron-4_DR. In Family 3, all associated TEs were in the Helitron category, with Cal100 associated three times with Helitron-1N2_DR and once with Helitron-4_DR. Three kinds of TEs (Helitron-1N1_DR, Helitron-1N2_DR, Helitron-1N3_DR) were associated with Cal105, Helitron-1N2_DR being associated twice. The TEs identified in the DNA families were varied: DNA transposons (Polinton, Helitron, Harbinger), LTR retrotransposons (Gypsy, DIRS) and non-LTR retrotransposons (L2, Rex1 and SINE2).
However, in Family 1, which contained the CAA motif, TEs were mainly represented by Polinton-1N1_DR and L2-3_DRe, while in Families 2 and 3, which contained the GAA motif, TEs were mainly represented by helitrons (Helitron-4_DR, Helitron-1N2_DR, Helitron-1_DR, Helitron-1N1_DR and Helitron-1N3_DR). Non-autonomous TEs from the DNA families were observed at high proportions (Table 3). The most frequently identified TE in C. alburnus was Polinton-1N1_DR (Haapa-Paananen et al., 2014). A total of 28 copies of Polinton-1N1_DR were detected in 71.9 kb of MCS from C. alburnus (0.39 copies per kb). Hits for Polinton-1N1_DR were on average 125 bp in alignment length, with varying degrees of homology (82–93%). All of these occurred between nucleotide positions 3165 and 3290 of the published sequence (~15,469 bp) and were mostly found in Family 1. However, Family 1 contained 39 sequences because this family also contained 12 L2-3_DRe hits, which matched a similar region at positions 4945–5066 of L2-3_DRe. They were attributed to this family because the similarity between this part of Polinton-1N1_DR and L2-3_DRe was about 87.5%. In the NJ cluster analysis, they were not independent, but rather clustered together. The second most common TE identified from C. alburnus was Helitron-1N2_DR (N = 16), all copies of which appeared in Family 3. This TE is 2194 bp in length and was initially identified from D. rerio (Bao and Jurka, 2014). Family 3 contained 10 sequences; in one of them, Cal101, a 324-bp section was similar to Helitron-1_DR and obtained a higher similarity score, although it was not associated with Helitron-1N2_DR. This is likely to result from these regions of Helitron-1N2_DR and Helitron-1_DR being similar. In Cal099, Helitron-1_DR appeared once and Helitron-1N2_DR appeared twice, indicating that these two TEs are associated.
Interestingly, although several different classes of TEs were identified from the MCS of the three DNA families, we could not determine whether L2-3_DRe and Helitron-4_DR were autonomous or non-autonomous. There were no cases consisting only of autonomous TEs. The non-autonomous TEs included the DNA transposon Polinton-1N1_DR and the helitrons Helitron-1N1_DR, Helitron-1N2_DR and Helitron-1N3_DR.
4. Discussion
Genomic complexities such as cryptic repetitive DNA and DNA family abundance identified in association with microsatellites have been well studied in insects (Coates et al., 2011; Tay et al., 2010), but have rarely been reported or described in fish. In this study, a BLAST search of the microsatellite-containing sequences obtained from C. alburnus against a database of mobile elements yielded a high number of positive hits to other species. Research undertaken on Drosophila melanogaster suggested that repetitive element banks are generally species-specific for each genus or target species (Meglécz et al., 2007). Therefore, the identification of TEs depends upon database entries. Hence, the number of associations observed in the present
Fig. 2. Schematic representation of DNA families identified with an all-against-all BLASTn analysis. MCS names are provided with GenBank accession numbers in parenthesis. Black boxes indicate microsatellite regions and the number of repeat motifs. Grey boxes indicate cryptic repetitive DNA identified in flanking regions. Black lines indicate unique DNA sequences (i.e., those with no apparent similarity to any other regions). Dashed lines represent missing places. Backslashes represent sequences that are too long and have parts omitted. Boxes filled with graphics indicate the flanking regions of cryptic repetitive DNA identified as TEs (for details see Table 3). Different TEs are distinguished by the filling of the graphics, as shown in the bottom of the figure. For simplicity, some sequences have been omitted from the schematic of DNA Family 1.
study is likely to be grossly underestimated, as a result of the paucity of information on mobile elements in fish species. Nevertheless, we still observed a great abundance of TEs from the microsatellite flanking regions obtained from C. alburnus in this study.
Table 3 Positive matches of microsatellite containing sequences (MCS) with known mobile elements detected through BLAST searches against the Censor GIRI database. To simplify the table some results have been omitted.

| MCS name | GenBank accession no. | Size of match (bp) a | Matching mobile element b | A/NA c | Element associated species d | Class e | Similarity (%) f | Alignment score g |
|---|---|---|---|---|---|---|---|---|
| Cal059 | KM272017 | 137 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 85.2 | 741 |
| Cal060 | KM272018 | 122 | L2-3_DRe | ? | Danio rerio | NonLTR/L2 | 87.0 | 710 |
| Cal066 | KM272024 | 63 | RIRE7_I | ? | Oryza sativa | LTR/Gypsy | 84.6 | 227 |
| Cal066 | KM272024 | 133 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 83.3 | 660 |
| Cal075 | KM272033 | 134 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 82.6 | 659 |
| Cal075 | KM272033 | 172 | DIRS1_DR | ? | Danio rerio | LTR/DIRS | 67.6 | 512 |
| Cal076 | KM272034 | 120 | L2-3_DRe | ? | Danio rerio | NonLTR/L2 | 83.5 | 672 |
| Cal077 | KM272035 | 120 | L2-3_DRe | ? | Danio rerio | NonLTR/L2 | 83.5 | 672 |
| Cal082 | KM272040 | 131 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 83.0 | 642 |
| Cal088 | KM272046 | 127 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 82.4 | 614 |
| Cal088 | KM272046 | 92 | Harbinger-N10_DR | NA | Danio rerio | DNA/Harbinger | 74.5 | 403 |
| Cal089 | KM272047 | 119 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 83.2 | 653 |
| Cal092 | KM272050 | 627 | Helitron-4_DR | ? | Danio rerio | DNA/Helitron | 78.0 | 2534 |
| Cal093 | KM272051 | 439 | Helitron-4_DR | ? | Danio rerio | DNA/Helitron | 78.5 | 1800 |
| Cal099 | KM272057 | 335 | Helitron-1_DR | A | Danio rerio | DNA/Helitron | 80.1 | 1509 |
| Cal099 | KM272057 | 60 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 86.7 | 383 |
| Cal099 | KM272057 | 90 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 78.9 | 344 |
| Cal100 | KM272058 | 249 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 84.7 | 1461 |
| Cal100 | KM272058 | 54 | Helitron-4_DR | ? | Danio rerio | DNA/Helitron | 82.5 | 239 |
| Cal100 | KM272058 | 58 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 94.8 | 454 |
| Cal100 | KM272058 | 28 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 92.9 | 222 |
| Cal101 | KM272059 | 324 | Helitron-1_DR | A | Danio rerio | DNA/Helitron | 81.6 | 1440 |
| Cal101 | KM272059 | 35 | Helitron-1N1_DR | NA | Danio rerio | DNA/Helitron | 97.1 | 308 |
| Cal101 | KM272059 | 243 | Helitron-1N1_DR | NA | Danio rerio | DNA/Helitron | 76.1 | 965 |
| Cal104 | KM272062 | 197 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 80.4 | 996 |
| Cal104 | KM272062 | 91 | Helitron-1N3_DR | NA | Danio rerio | DNA/Helitron | 80.9 | 340 |
| Cal104 | KM272062 | 107 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 77.6 | 494 |
| Cal105 | KM272063 | 266 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 82.5 | 1283 |
| Cal105 | KM272063 | 91 | Helitron-1N3_DR | NA | Danio rerio | DNA/Helitron | 80.0 | 323 |
| Cal105 | KM272063 | 105 | Helitron-1N2_DR | NA | Danio rerio | DNA/Helitron | 78.1 | 498 |
| Cal105 | KM272063 | 479 | Helitron-1N1_DR | NA | Danio rerio | DNA/Helitron | 77.1 | 2012 |
| Cal106 | KM272064 | 122 | L2-3_DRe | ? | Danio rerio | NonLTR/L2 | 83.2 | 635 |
| Cal106 | KM272064 | 153 | hAT-N109_DR | NA | Danio rerio | DNA/hAT | 78.5 | 555 |
| Cal163 | KM272121 | 117 | hAT-N67_DR | NA | Danio rerio | DNA/hAT | 81.7 | 433 |
| Cal163 | KM272121 | 149 | ERVN1-LTR_DR | NA | Danio rerio | ERV | 77.5 | 446 |
| Cal163 | KM272121 | 42 | Polinton-1N1_DR | NA | Danio rerio | DNA/Polinton | 92.9 | 329 |
| Cal192 | KM272150 | 257 | KibiDr1 | ? | Danio rerio | NonLTR/Tx1 | 77.7 | 981 |
| Cal192 | KM272150 | 134 | CR1-42_DR | ? | Danio rerio | NonLTR/CR1 | 81.6 | 617 |
| Cal192 | KM272150 | 199 | Gypsy141-LTR_DR | ? | Danio rerio | LTR/Gypsy | 74.9 | 567 |
| Cal192 | KM272150 | 132 | Gypsy141-I_DR | ? | Danio rerio | LTR/Gypsy | 90.2 | 980 |
| Cal200 | KM272158 | 207 | Helitron-N1_DR | NA | Danio rerio | DNA/Helitron | 75.2 | 725 |
| Cal200 | KM272158 | 36 | Helitron-2_DR | NA | Danio rerio | DNA/Helitron | 86.1 | 230 |
| Cal200 | KM272158 | 146 | MOSAT_DR | ? | Danio rerio | Simple/Sat/SAT | 73.7 | 490 |

Note: a, the length of the region that matched to the corresponding transposable element; b, the transposable element description as defined by Repbase; c, the name of the known transposable element provided by Repbase; d, A: autonomous; NA: non-autonomous; ?: not specified transposition mechanism according to Repbase or Web of Science; e, class of transposable element; f, the ‘similarity’ statistic as calculated by Censor, and converted to a percentage for simplification; g, the alignment score obtained from BLAST.
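The per-MCS patterns described in the text (a single TE, several different TEs, or the same TE matched more than once) can be tallied directly from match records like those in Table 3. The Python sketch below is one way to do so; the function names and the output layout are hypothetical, and the example records are a small subset of rows taken from Table 3. The second helper reproduces the copies-per-kb figure reported for Polinton-1N1_DR (28 copies in 71.9 kb).

```python
from collections import Counter, defaultdict

def summarize_te_hits(hits):
    """Summarize TE matches per MCS.

    hits: iterable of (mcs_name, te_name) pairs, one per match.
    Returns a dict: mcs_name -> number of distinct TEs and the
    TEs matched more than once in that MCS.
    """
    per_mcs = defaultdict(Counter)
    for mcs, te in hits:
        per_mcs[mcs][te] += 1
    return {
        mcs: {
            "distinct_tes": len(te_counts),
            "repeated_tes": sorted(te for te, n in te_counts.items() if n > 1),
        }
        for mcs, te_counts in per_mcs.items()
    }

def copies_per_kb(copies, total_bp):
    """TE copy density per kilobase of screened sequence."""
    return copies / (total_bp / 1000)

if __name__ == "__main__":
    # Subset of the matches listed in Table 3.
    hits = [
        ("Cal059", "Polinton-1N1_DR"),
        ("Cal066", "RIRE7_I"), ("Cal066", "Polinton-1N1_DR"),
        ("Cal099", "Helitron-1_DR"), ("Cal099", "Helitron-1N2_DR"),
        ("Cal099", "Helitron-1N2_DR"),
        ("Cal192", "KibiDr1"), ("Cal192", "CR1-42_DR"),
        ("Cal192", "Gypsy141-LTR_DR"), ("Cal192", "Gypsy141-I_DR"),
    ]
    for mcs, info in summarize_te_hits(hits).items():
        print(mcs, info)
    print(round(copies_per_kb(28, 71900), 2))
```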
The occurrence of cryptic repetitive DNA grouped into DNA families can be problematic for the development of microsatellite markers because the flanking regions of the microsatellite are not unique. A small number of transposable elements carrying the common microsatellites would disrupt primer development without being apparent in the genome. To avoid wasting time and labor on the development of primers that are unlikely to work, a local BLAST analysis, combined with similar BLAST searches against transposable element databases, is recommended to compare all sequenced clones. However, the mobile element–microsatellite association can also be capitalized upon by molecular marker systems, such as those applied to plant genomes (e.g., Inter-Retrotransposon Amplified Polymorphism and Retrotransposon-Microsatellite Amplified Polymorphisms, Kalendar and Schulman, 2006) or those more recently applied to lepidopteran species (e.g.,
repetitive flanking sequences, Anderson et al., 2007). The discovery of transposons in this study and in other fish genomes (Franck and Wright, 1993) raises the possibility of applying such marker systems to future fish studies. The genesis, behavior and evolution of microsatellites within a genome are the subjects of ongoing debate (Goldstein and Schlotterer, 1999). There is growing evidence that mobile elements are at least partially responsible for the genesis of microsatellite DNA repeat units (e.g., Tay et al., 2010). It is thought that TEs containing microsatellite repeated regions may have behaved as microsatellite-inducing elements in the host genome (Coates et al., 2009). The association between the flanking regions of microsatellites and similar cryptic repeated elements observed in this study suggests that the whole region, microsatellite and flank, was already present and was duplicated together. Moreover, the close association between DNA families and some TEs in C. alburnus supports the idea that TEs act as dispersal agents for microsatellites, which hitchhike within TEs during transposition (Coates et al., 2009). The abrupt loss of homology with distance is likely to reflect the transposition or insertion of a short segment. These processes can also explain the formation, and account for the abundance, of multilocus DNA families in C. alburnus (e.g., Family 1 contained 39 sequences, and further members may have gone unobserved). Gene conversion is the non-reciprocal exchange of genetic material among chromosomes and can be mediated by TEs (Meglécz et al., 2004). Two common DNA transposons in the family sequences are helitrons, which transpose by rolling-circle replication (Kapitonov and Jurka, 2001), and mavericks/polintons, which are likely to replicate using a self-encoded DNA polymerase (Haapa-Paananen et al., 2014).
These are widely present in the genomes of diverse eukaryotic taxa, can capture and move gene fragments, and are responsible for gene duplication and conversion. Although helitrons were not identified from C. alburnus in this study, the Helitron-1N1_DR was the second most highly abundant TE we identified in C. alburnus. Studies report greater numbers of Polinton-1N1_DR and Helitron-1N1_DR in the TEs of zebrafish. Therefore, the high recombination rates in C. alburnus may be due to the high number of polintons and helitrons. Genomic rearrangements caused by the transpositional activities of polintons and helitrons can indicate genomic complexity and instability of the microsatellite flanking regions of C. alburnus. The asymmetrical arrangement of similar microsatellite flanking regions from a DNA family usually indicates that it has undergone unequal crossing over (Meglécz et al., 2004). This arrangement was not observed in this study, so unequal crossing over cannot be invoked to explain our results. In a study on mollusks, McInerney et al. (2011) found more non-autonomous elements than autonomous elements. During the conversion of autonomous TEs, high substitution rates in the coding region can affect transposition activity, leading to a greater propensity for fixation. However, non-autonomous TEs that ‘hijack’ the machinery of their partner TEs to accomplish their transposition (Feschotte et al., 2002) would probably be unaffected. Meanwhile, compared to autonomous TEs, non-autonomous TEs may be more highly conserved and have higher transpositional activities. Indeed, as non-autonomous TEs continue to proliferate, their numbers will continue to grow. The results of this study support this view. This study found DNA families among microsatellite flanking sequences during the development of microsatellite primers with a core of CAA and GAA.
Previously, however, we did not detect microsatellite families when using the same method to separate microsatellite primers of CA and GA (Liu et al., 2014). This discrepancy may be due to their being few sequenced clones, or that there were no gene families in these two kinds of dinucleotide repeats. A study on the amphibian, Pachyhynobius shangchengensis, reported that there was microsatellite family in their CA and GA, while the tetranucleotide ATAG did not exist (Wang et al., 2012). In contrast, a study on three species of mollusks did find a microsatellite family in their GAA, CCAT, GACA (McInerney et al., 2011). In these studies, the microsatellite contained sequences that were attributed to a family contained identical microsatellite core sequences. However, a study on three squat lobster species from the Galatheidae (Decapoda: Anomura) showed that there were microsatellite family in dinucleotide, trinucleotide and tetranucleotide microsatellite containing sequences (Bailie et al., 2010). Furthermore, in two of those five families studied the microsatellite core differed (Bailie et al., 2010). The results of this study have provided a basis for further study on the microsatellite family of fish. The microsatellite primers we developed will be valuable for the management and future conservation of C. alburnus. Acknowledgments This work was supported by grants from the Key Special Project for breeding of new aquaculture varieties (2012C12907-7), and the Cultivating Innovation Support Project (No. 2014F13307) of Science Technology Department of Zhejiang Province. We thank Jian-Jun Fu, Ke-Yi Ma, Shu-Ren Zhu and Hong-Gang Zhao for their assistance in experiments. References Anderson, S.J., Gould, P., Freeland, J.R., 2007. Repetitive flanking sequences (ReFS): novel molecular markers from microsatellite families. Mol. Ecol. Notes 7, 374e376. €hl, P., 2010. 
High incidence of cryptic repeated elements in microsatellite flanking regions of galatheid genomes and its Bailie, D., Fletcher, H., Prodo evolutionary implications. J. Crustacean Biol. 30, 664e672. Bao, W., Jurka, J., 2014. DNA transposons from zebrafish. Repbase Rep. 14, 1429. Chen, Y.Y., 1998. Fauna Sinica, Osteichthyes, Cypriniformes (II). Science Press, Beijing. Coates, B.S., Kroemer, J.A., Sumerford, D.V., Hellmich, R.L., 2011. A novel class of miniature inverted repeat transposable elements (MITEs) that contain hitchhiking (GTCY) n microsatellites. Insect Mol. Biol. 20, 15e27. Coates, B.S., Sumerford, D.V., Hellmich, R.L., Lewis, L.C., 2009. Repetitive genome elements in a European corn borer, Ostrinia nubilalis bacterial artificial chromosome library were indicated by bacterial artificial chromosome end sequencing and development of sequence tag site markers: implications for lepidopteran genomic research. Genome 52, 57e67.
DeWoody, J., Nason, J.D., Hipkins, V.D., 2006. Mitigating scoring errors in microsatellite data from wild populations. Mol. Ecol. Notes 6, 951–957.
Feschotte, C., Jiang, N., Wessler, S.R., 2002. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3, 329–341.
Franck, J.P.C., Wright, J.M., 1993. Conservation of a satellite DNA sequence (SATB) in the tilapiine and haplochromine genome (Pisces: Cichlidae). Genome 36, 187–194.
Glenn, T.C., Schable, N.A., 2005. Isolating microsatellite DNA loci. Methods Enzym. 395, 202–222.
Goldstein, D.B., Schlotterer, C., 1999. Microsatellites: Evolution and Applications. Oxford University Press, Oxford.
Haapa-Paananen, S., Wahlberg, N., Savilahti, H., 2014. Phylogenetic analysis of Maverick/Polinton giant transposons across organisms. Mol. Phylogenetics Evol. 78, 271–274.
Kalendar, R., Schulman, H.A., 2006. IRAP and REMAP for retrotransposon-based genotyping and fingerprinting. Nat. Protoc. 1, 2478–2484.
Kalinowski, S.T., Taper, M.L., Marshall, T.C., 2007. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol. Ecol. 16, 1099–1106.
Kapitonov, V.V., Jurka, J., 2001. Rolling-circle transposons in eukaryotes. Proc. Natl. Acad. Sci. USA 98, 8714–8719.
Kohany, O., Gentles, A.J., Hankus, L., Jurka, J., 2006. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and censor. BMC Bioinforma. 7, 474.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., et al., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948.
Liu, S.L., Gu, Z.M., Jia, Y.Y., Zhao, J.L., Jiang, W.P., Li, Q., et al., 2014. Isolation and characterization of 32 microsatellite loci for topmouth culter (Culter alburnus Basilewsky). Genet. Mol. Res. 13, 7480–7483.
McInerney, C.E., Allcock, A.L., Johnson, M.P., Bailie, D.A., Prodöhl, P.A., 2011. Comparative genomic analysis reveals species-dependent complexities that explain difficulties with microsatellite marker development in molluscs. Heredity 106, 78–87.
Meglécz, E., Anderson, S.J., Bourguet, D., Butcher, R., Caldas, A., Cassel-Lundhagen, A., 2007. Microsatellite flanking region similarities among different loci within insect species. Insect Mol. Biol. 16, 175–185.
Meglécz, E., Petenian, F., Danchin, E., Coeur d’Acier, A., Rasplus, J.Y., Faure, E., 2004. High similarity between flanking regions of different microsatellites detected within each of two species of Lepidoptera: Parnassius apollo and Euphydryas aurinia. Mol. Ecol. 13, 1693–1700.
Rousset, F., 2008. Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resour. 8, 103–106.
Sambrook, J., Russell, D.W., 2001. Molecular Cloning: a Laboratory Manual, third ed. Cold Spring Harbor Laboratory Press, New York.
Tay, W.T., Behere, G.T., Batterham, P., Heckel, D.G., 2010. Generation of microsatellite repeat families by RTE retrotransposons in lepidopteran genomes. BMC Evol. Biol. 10, 144.
Wang, H., Zhang, B., Shi, W., Luo, X., Zhou, L., Han, D., et al., 2012. Structural characteristics of di-nucleotide/tetra-nucleotide repeat microsatellite DNA in Pachyhynobius shangchengensis genomes and its effect on isolation. Biodivers. Sci. 20, 51–58.