Two novel cricetine mitogenomes: Insight into the mitogenomic characteristics and phylogeny in Cricetinae (Rodentia: Cricetidae)



Genomics xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Original Article

Li Ding (a), Quan Zhou (a), Yuanhai Sun (a), Natalia Yu Feoktistova (b), Jicheng Liao (a)

(a) School of Life Sciences, Lanzhou University, Lanzhou 730000, PR China
(b) A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow 119071, Russia

ARTICLE INFO

Keywords: Cricetus cricetus; Phodopus sungorus; Mitochondrial genome; Cricetinae; Phylogenetic analysis

ABSTRACT

The mitochondrial genomes (mitogenomes) of Cricetus cricetus and Phodopus sungorus were sequenced and characterized for the first time in the present study. Both mitogenomes contained 37 genes and showed the typical characteristics of the vertebrate mitogenome. Comparative analysis of 10 cricetine mitogenomes indicated that the two new mitogenomes shared similar characteristics with those of other cricetines in terms of gene arrangement, nucleotide composition, codon usage, tRNA structure, nucleotide skew and the origin of light-strand replication. The phylogenetic relationships of the subfamily Cricetinae were reconstructed from the mitogenome data using Bayesian Inference and Maximum Likelihood. The phylogenetic analysis indicated that Cricetulus kamensis occupied a basal position, was phylogenetically distant from all other Cricetulus species and was closely related to the Phodopus group, which supports recognizing Urocricetus as a separate genus. Tscherskia triton formed a separate clade corresponding to a diversified cricetine lineage (Cricetulus, Allocricetulus, and Cricetus).

1. Introduction

The common hamster Cricetus cricetus (Linnaeus, 1758) is the largest living member of the subfamily Cricetinae. It is presently distributed in the forest-steppes and steppes from Europe through northern Kazakhstan to Russia (from the western borders eastward to the Altai Mountains and Minusinsk Steppe, and northward to the upper Volga River) and China (northern Xinjiang) [1,2]. Taxonomically, C. cricetus belongs to the genus Cricetus, which was historically a diverse group but today contains only one living species, based on morphological characteristics and coat color variation [1–3]. The striped desert hamster Phodopus sungorus (Pallas, 1773) is a dwarf cricetine that is widely distributed in the steppes and semi-deserts from northern, central and eastern Kazakhstan eastward to northern China, Mongolia, and Russia (southwestern Siberia and an isolated population in the Minusinsk Steppe, south-central Siberia) [1–3]. Neither mitogenome had been available until now. To understand the evolutionary characteristics of the mammalian mitogenome and the phylogeny of Cricetinae, we sequenced the mitogenomes of C. cricetus and P. sungorus, summarized the characteristics of cricetine mitogenomes and examined the phylogenetic status of these species using a mitogenome dataset.

Mitochondrial DNA (mtDNA) is a pre-eminent model for evolutionary studies because of its small size, relatively conserved gene content and high mutation rate [4]. The vertebrate mitogenome is approximately 16 kb in length and includes 13 protein-coding genes (PCGs) for the electron transport chain, 2 ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs) and the longest non-coding sequence (D-loop) [5]. The mitogenome is a complete genome whose sequence characteristics differ among taxonomic categories, and these differences reflect functional differences or phylogenetic distance between organisms [4]. In recent years, a large number of mitogenomes have been determined across a broad range of Cricetinae species [6,7]. For the genus Cricetulus, apart from some common features, several mitogenome characteristics vary among species, including genome size, gene location, nucleotide composition, codon usage, non-coding structure and the origin of replication of the light strand (OL) [6]. These characteristics have shed light on the species of Cricetulus, but little is known about other cricetines. Moreover, mitogenomic characteristics such as nucleotide skew and nucleotide mismatches in tRNAs were not considered for Cricetulus. In addition, that work did not address codon usage bias in cricetine mitogenomes or the relationship between nucleotide composition and amino acid counts. The present study therefore offers a more detailed and systematic analysis of Cricetinae species based on their mitogenomes.

Previous molecular phylogenetic studies of Cricetinae focused mainly on several mtDNA fragments and partial nuclear genes [8–10], and their results were not completely consistent with cytogenetic studies, for example for Tscherskia triton [11]. In addition, phylogenetic analysis of

Corresponding author. E-mail addresses: [email protected] (L. Ding), [email protected] (J. Liao).

https://doi.org/10.1016/j.ygeno.2019.09.016 Received 17 April 2019; Received in revised form 6 September 2019; Accepted 18 September 2019 0888-7543/ © 2019 Elsevier Inc. All rights reserved.

Please cite this article as: Li Ding, et al., Genomics, https://doi.org/10.1016/j.ygeno.2019.09.016


L. Ding, et al.

chromosomal characters indicated that C. cricetus is more closely related to C. migratorius than to Allocricetulus eversmanni [11]. In general, the molecular and chromosomal results differed. More importantly, these studies did not include C. kamensis when discussing the phylogenetic relationships of Cricetinae, which may have affected their results.

The genus-level classification of C. kamensis has been controversial. Satunin [12] first described a female cricetine from the southeastern part of the Qinghai-Tibet Plateau as Urocricetus kamensis (Satunin, 1903). However, subsequent taxonomic studies abandoned the genus Urocricetus and placed U. kamensis in the genus Cricetulus as C. kamensis [1,13–18]. Pavlinov [19] maintained Urocricetus as a separate subgenus that is phylogenetically distant from Cricetulus. Recently, Lebedev et al. [20] included C. kamensis in a phylogenetic analysis of Cricetinae, suggested that C. kamensis is closely related to the Phodopus group and proposed reinstating the genus Urocricetus. Here, we use a mitogenome dataset to provide genetic evidence on whether C. kamensis belongs to Cricetulus or Urocricetus.

Animal mtDNA exhibits maternal inheritance, relatively low rates of recombination and a high rate of substitution, which makes it an ideal genetic marker for studying animal origin and evolution [21–23]. Phylogenetic relationships inferred from complete mitogenomes can be more accurate than those inferred from a few mitochondrial genes, because mitogenome data resolve trees with strong branch support, reduce stochastic errors, minimize the effect of homoplasy and increase the precision of divergence estimates [24–27]. We therefore determined the complete mitogenomes of C. cricetus and P. sungorus for the first time and characterized them in the present study. We assembled a dataset of 10 cricetine mitogenomes across six genera (Phodopus, Mesocricetus, Tscherskia, Cricetulus, Allocricetulus and Cricetus) and used it to reconstruct a robust phylogeny of Cricetinae and to characterize cricetine mitogenomes. The aim of this study was to summarize the evolutionary characteristics of cricetine mitogenomes and to reveal the phylogenetic relationships of Cricetinae using the mitogenome dataset.

2. Materials and methods

2.1. Sampling, ethics and DNA extraction

All animal voucher specimens are stored in the School of Life Sciences of Lanzhou University (LZUSLS), China. The specimen of C. cricetus (LZUSLS-KB201601) was obtained from Dr. N. Yu. Feoktistova (A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences). It was collected in Nalchik city, Kabardino-Balkaria (KB) (N 43°30′, E 43°39′, 438 m above sea level (a.s.l.)), Russia. The specimen of P. sungorus (LZUSLS-XWQ24) was collected in West Ujimqin Banner (XWQ) (N 44°35′, E 117°41′, 1016 m a.s.l.), Inner Mongolia Autonomous Region, China. Specimen collection in Russia was carried out under the governmental program “Animal communication-behavioral and physiological adaptions”, with governmental permission to collect such samples from public property. The sample collections and the study did not conflict with the ethical guidelines, religious beliefs, or legal requirements of China. All animal experiments conformed to the guidelines for the care and use of laboratory animals and were approved by the Committee of Laboratory Animal Experimentation at Lanzhou University (Lanzhou, China). Total genomic DNA was extracted from ear tissue (C. cricetus) and liver tissue (P. sungorus) using the TIANamp Genomic DNA Kit (Tiangen, Beijing, China) according to the manufacturer's instructions.

2.2. PCR amplification and sequencing

Polymerase chain reaction (PCR) was used to amplify the mtDNA sequence fragments of C. cricetus and P. sungorus. For C. cricetus, two primers were used to amplify partial sequences (16S rRNA and CYTB) with standard PCR, and the remaining long fragments of mtDNA were amplified with specifically designed primers (Supplementary Table S1) using nested PCR; each sequence overlapped the next contig by more than 100 bp [28,29]. The mitogenome fragments of P. sungorus were amplified with standard PCR using specific primers (Supplementary Table S2) designed from the homologous sequence of the P. roborovskii mitogenome. The nested PCR required two pairs of primers (outside and inside primer pairs): the outside primers were used to enrich the target region from the complex mitogenome of C. cricetus, and the inside primers were used to amplify the target region from the first-round PCR products [30]. We amplified adjacent overlapping fragments from each first-round PCR product using both long and accurate PCR (LA-PCR) [31] and standard PCR, with two PCR reaction systems. LA-PCR was used for products longer than 2.5 kb and standard PCR for shorter products.

The long DNA fragments (≥ 2.5 kb) were amplified in a 50 μL reaction containing 10 μL 5× PCR buffer (Mg2+ plus), 4 μL dNTPs (2.5 mM each), 1.5 μL of each primer (10 μM each), 1 μL PrimeSTAR GXL DNA polymerase (1.25 U/μL, Takara, Dalian) and 2 μL total genomic DNA (100–200 ng) as the template; 30 μL ddH2O was added to bring the volume to 50 μL. LA-PCR was run with pre-denaturation for 3 min at 98 °C, followed by 35 cycles of denaturation for 15 s at 98 °C, annealing for 30 s at 50–60 °C and extension at 72 °C, and a final extension at 72 °C for 10 min. The short DNA fragments were amplified in a 25 μL reaction containing 2.5 μL 10× PCR buffer, 1.5 μL 25 mM MgCl2, 0.5 μL 2.5 mM dNTPs, 1 μL of each primer (10 μM each), 0.5 μL Taq polymerase (5 U/μL, Sangon, Shanghai), 1 μL DNA template and 17 μL ddH2O. The standard PCR was run with pre-denaturation for 3 min at 95 °C, followed by 35 cycles of denaturation for 30 s at 95 °C, annealing for 45 s at 45–60 °C and extension at 72 °C, and a final extension at 72 °C for 10 min.

All PCR products were checked by 1% agarose gel electrophoresis with ethidium bromide staining and then directly sequenced using the Sanger method (Genewiz Biotech (Suzhou) Co., Ltd.).

2.3. Mitogenome analysis and gene annotation

DNA sequences were edited and assembled manually using Lasergene version 5.0 (DNASTAR) and aligned with BioEdit version 7.0.5.2 [32]. Contig assembly was performed using the SeqMan program (a subprogram of DNASTAR). Base composition, codon usage and relative synonymous codon usage (RSCU) were calculated using MEGA 6.0 [33] after excluding the termination codons of PCGs. In addition, to examine the relationship between phylogenetic position and GC content (%), we assigned a gradient color to each cricetine representing the GC content of its mtDNA, illustrating the distribution of compositional bias across species, using the phytools package [34] in the R statistical computing environment [35]. Twofold degenerate codons (P2FD) and fourfold degenerate codons (P4FD) were calculated in MEGA 6.0. Skew analysis of mtDNA was carried out using the equations AT-skew = (A% − T%) / (A% + T%) and GC-skew = (G% − C%) / (G% + C%) [36]. All statistical analyses of cricetine mitogenomes were implemented in R [35].

The locations of PCGs, tRNA genes and rRNA genes were annotated by alignment-based searches for homologous sequences using both BioEdit and the online Basic Local Alignment Search Tool [37]. A. eversmanni (GenBank accession number: KP231506.1) and P. roborovskii (KU885975.1) were used as reference sequences to search for the different genes. Most tRNA genes were validated using the tRNAscan-SE 2.0 online server in the default search mode, using the mammalian mitochondrial genetic code source [38]. Furthermore, we used the same method to determine the D-loop region, comparing reference sequences with the conserved mtDNA sequences [6]. Tandem repeats in the A + T-rich region were identified using the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) [39]. The circular mitogenome maps of C. cricetus and P. sungorus were drawn using the CGView online tool (http://stothard.afns.ualberta.ca/cgview_server) [40]. In addition, we searched for the conserved motif 5′-CTTCT-3′ to locate the OL sequence and used the Mfold Web Server to draw its secondary structure [6,41].


Table 1
Source and information used for the phylogenetic analysis.

Species | Common name | GenBank accession number | Reference
Tscherskia triton | Greater long-tailed hamster | NC_013068.1 | Unpublished
Cricetulus griseus | Striped dwarf hamster | NC_007936.1 | [54]
Cricetulus longicaudatus | Long-tailed dwarf hamster | KM067270.1 | [55]
Allocricetulus eversmanni | Eversman's hamster | KP231506.1 | [56]
Cricetulus kamensis | Tibetan dwarf hamster | KJ680375.1 | [57]
Cricetulus migratorius | Gray dwarf hamster | KT918407.1 | [6]
Cricetus cricetus | Common hamster | MF034880 | This study
Mesocricetus auratus | Golden hamster | EU660218.1 | Unpublished
Phodopus roborovskii | Roborovski's desert hamster | KU885975.1 | [7]
Phodopus sungorus | Striped desert hamster | MH166880 | This study

Fig. 1. Gene organizations of C. cricetus and P. sungorus mitogenomes. Arrows indicate the orientation of gene transcription. All tRNA genes are denoted by a single letter (amino acid abbreviation), and rrnL and rrnS by large (16S) and small (12S) rRNA subunits. The PCGs indicated are as follows: ND1–6, NADH dehydrogenase subunits 1–6; COX1–3, cytochrome c oxidase subunits 1–3; ATP6 and ATP8, ATP synthase F0 subunit 6 and 8; CYTB, cytochrome b; and D-loop, control region. The GC content is plotted using a black sliding window, as the deviation from the average GC content of the entire sequence. GC-skew is plotted as the deviation from the average GC-skew of the entire sequence.

2.4. Phylogenetic analysis and genetic distance

The mitogenomic phylogeny was estimated from cricetine mitogenome sequences, which were downloaded from GenBank except for those of C. cricetus and P. sungorus (Table 1). Multiple-sequence alignment of mitochondrial genes was performed in BioEdit with default parameters. The dataset was built by concatenating the nucleotide sequences of the 13 PCGs and two rRNA genes (12S rRNA + 16S rRNA). Each gene was aligned separately, and the alignments were then concatenated using Sequence Matrix v1.7.8 [42]. The format of the concatenated alignment was converted for further analyses using Geneious v9.1.4. Based on previous studies, Dipus sagitta (NC_027499.1) and Euchoreutes naso (NC_027500.1) were used as outgroup taxa [43]. Phylogenetic analyses were run in MrBayes 3.2.6 [44] and RAxML v8.2.X [45] using Bayesian Inference (BI) [46] and Maximum Likelihood (ML) [47], respectively. The best-fit substitution models and partitioning schemes for the dataset were identified with the corrected Akaike information criterion (AICc) [48], as implemented in PartitionFinder v1.1.0 with the “greedy” algorithm [49]. The ML analysis was run with 1000 bootstrap replicates using the rapid bootstrap feature (random seed value 12345) [36]. The partitioned Bayesian analysis was conducted with mixed models in MrBayes 3.2.6, with Markov chain Monte Carlo (MCMC) simulations run for one million generations and sampled every 100 generations. MCMC was run using the default model parameters, starting from a random tree. The first 25% of samples were discarded as a conservative burn-in, and the remaining samples were used to generate a 50% majority-rule consensus tree. The robustness of the ML inference was assessed by bootstrapping (1000 random replicates) [50]. In all phylogenetic analyses, clades with a bootstrap value > 70% and a Bayesian posterior probability > 0.95 were considered strongly supported [51,52]. All trees were drawn using FigTree v1.3.1. Furthermore, genetic distances between cricetines were calculated in MEGA 6 [33] using the Kimura 2-parameter (K2P) model of nucleotide substitution [53].
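The K2P distance used for the genetic distances can be illustrated with a short Python sketch. This is not the MEGA 6 implementation used in the study; it is a minimal example of the standard Kimura 2-parameter formula, d = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The function name `k2p_distance` and the choice to skip sites containing gaps or ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch, not a statement about MEGA's settings).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignments (hypothetical, for illustration only).
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
print(d_transition, d_transversion)
```

Because transitions and transversions enter the formula separately, the same number of observed differences gives slightly different distances depending on their type, as the two toy comparisons show.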

3. Results

3.1. Organization and characteristics of the C. cricetus and P. sungorus mitogenomes

Both the C. cricetus and P. sungorus mitogenomes contained 13 PCGs, 2 rRNAs (12S rRNA and 16S rRNA), 22 tRNAs, and one control region


Table 2
Characteristics of the mitogenomes of C. cricetus (C) and P. sungorus (P).

Gene | C From | C To | P From | P To | Size (bp) (C/P) | Strand (C/P) | Start codon (C/P) | Stop codon (C/P) | Intergenic nucleotides(a) (C/P)
tRNAPhe | 1 | 66 | 1 | 67 | 66/67 | H/H | | | 1/1
12S rRNA | 68 | 1016 | 69 | 1017 | 949/949 | H/H | | | 1/1
tRNAVal | 1018 | 1089 | 1019 | 1090 | 72/72 | H/H | | | 0/0
16S rRNA | 1090 | 2648 | 1091 | 2661 | 1559/1571 | H/H | | | 1/1
tRNALeu1 | 2650 | 2724 | 2663 | 2737 | 75/75 | H/H | | | 0/0
ND1 | 2725 | 3679 | 2738 | 3692 | 955/955 | H/H | GTG/GTG | T(b)/T(b) | 2/0
tRNAIle | 3682 | 3750 | 3693 | 3760 | 69/68 | H/H | | | −3/−3
tRNAGln | 3748 | 3818 | 3758 | 3828 | 71/71 | L/L | | | 4/−5
tRNAMet | 3823 | 3891 | 3824 | 3899 | 69/76 | H/H | | | 0/0
ND2 | 3892 | 4924 | 3900 | 4932 | 1033/1033 | H/H | ATT/ATT | T(b)/T(b) | 0/0
tRNATrp | 4925 | 4991 | 4933 | 4999 | 67/67 | H/H | | | 2/2
tRNAAla | 4994 | 5063 | 5002 | 5071 | 70/70 | L/L | | | 2/1
tRNAAsn | 5066 | 5135 | 5073 | 5143 | 70/71 | L/L | | | 31/32
tRNACys | 5167 | 5234 | 5176 | 5242 | 68/67 | L/L | | | −1/0
tRNATyr | 5234 | 5301 | 5243 | 5309 | 68/67 | L/L | | | 1/1
COX1 | 5303 | 6847 | 5311 | 6855 | 1545/1545 | H/H | ATG/ATG | TAA/TAA | −3/−4
tRNASer2 | 6845 | 6913 | 6853 | 6921 | 69/69 | L/L | | | 3/3
tRNAAsp | 6917 | 6989 | 6925 | 6992 | 73/68 | H/H | | | −5/1
COX2 | 6985 | 7668 | 6994 | 7677 | 684/684 | H/H | ATG/ATG | TAG/TAA | 3/3
tRNALys | 7672 | 7736 | 7681 | 7744 | 65/64 | H/H | | | 1/1
ATP8 | 7738 | 7941 | 7746 | 7949 | 204/204 | H/H | ATG/ATG | TAG/TAG | −43/−43
ATP6 | 7899 | 8579 | 7907 | 8587 | 681/681 | H/H | ATG/ATG | TAA/TAA | −1/−1
COX3 | 8579 | 9362 | 8587 | 9370 | 784/784 | H/H | ATG/ATG | T(b)/T(b) | 0/−1
tRNAGly | 9363 | 9431 | 9370 | 9437 | 69/68 | H/H | | | 0/0
ND3 | 9432 | 9779 | 9438 | 9785 | 348/348 | H/H | ATA/ATA | TAA/TAA | 3/3
tRNAArg | 9783 | 9850 | 9789 | 9856 | 68/68 | H/H | | | 1/0
ND4L | 9852 | 10,148 | 9857 | 10,135 | 297/297 | H/H | ATG/ATG | TAA/TAA | −7/11
ND4 | 10,142 | 11,519 | 10,147 | 11,524 | 1378/1378 | H/H | ATG/ATG | T(b)/T(b) | 0/0
tRNAHis | 11,520 | 11,586 | 11,525 | 11,591 | 67/67 | H/H | | | 0/0
tRNASer1 | 11,587 | 11,644 | 11,592 | 11,649 | 58/58 | H/H | | | 0/0
tRNALeu2 | 11,645 | 11,714 | 11,650 | 11,719 | 70/70 | H/H | | | 0/0
ND5 | 11,715 | 13,535 | 11,720 | 13,540 | 1821/1821 | H/H | ATT/ATT | TAA/TAA | −17/−17
ND6 | 13,519 | 14,043 | 13,524 | 14,048 | 525/525 | L/L | ATG/ATG | TAA/TAA | 0/0
tRNAGlu | 14,044 | 14,112 | 14,049 | 14,117 | 69/69 | L/L | | | 4/4
CYTB | 14,117 | 15,259 | 14,122 | 15,264 | 1143/1143 | H/H | ATG/ATG | TAA/TAA | 1/2
tRNAThr | 15,261 | 15,327 | 15,267 | 15,333 | 67/67 | H/H | | | 3/0
tRNAPro | 15,331 | 15,396 | 15,334 | 15,400 | 66/67 | L/L | | | −2/−2
D-loop | 15,395 | 16,263 | 15,399 | 16,346 | 869/948 | H/H | | |

(a) Numbers correspond to the nucleotides separating different genes. Negative numbers indicate overlapping nucleotides between adjacent genes.
(b) T(AA) stop codon was completed via polyadenylation.
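The intergenic values in Table 2 follow directly from the gene coordinates. The Python sketch below is illustrative only and is not the authors' annotation procedure; the function name `intergenic_nucleotides` and the tuple layout are assumptions made for this example. Given 1-based inclusive start and end positions, the spacer between two adjacent genes is the start of the next gene minus the end of the current gene minus one, so a negative value is the length of the overlap.

```python
from typing import List, Tuple

# (gene name, start, end), 1-based inclusive coordinates as in Table 2
Gene = Tuple[str, int, int]


def intergenic_nucleotides(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (gene, next_gene, spacer) for each adjacent pair of genes.

    spacer > 0: nucleotides separating the two genes
    spacer = 0: the genes abut
    spacer < 0: the genes overlap by |spacer| nucleotides
    """
    result = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        result.append((name_a, name_b, start_b - end_a - 1))
    return result


# C. cricetus coordinates taken from Table 2.
first_genes = [
    ("tRNA-Phe", 1, 66),
    ("12S rRNA", 68, 1016),
    ("tRNA-Val", 1018, 1089),
    ("16S rRNA", 1090, 2648),
]
atp_genes = [("ATP8", 7738, 7941), ("ATP6", 7899, 8579)]

for gene, next_gene, spacer in intergenic_nucleotides(first_genes + atp_genes[:0]):
    print(gene, next_gene, spacer)
print(intergenic_nucleotides(atp_genes))
```

For the ATP8/ATP6 pair this gives the 43-nucleotide overlap listed in the table.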

(Fig. 1 and Table 2). The rRNA genes were located between tRNAPhe and tRNALeu and were separated by tRNAVal. The 22 tRNA genes were interspersed among the rRNAs and PCGs; some adjacent genes were separated by non-coding (intergenic) sequences and others overlapped (Table 2). Notably, the longest overlap, of 43 nucleotides, was found between the ATP8 and ATP6 genes (Supplementary Fig. S1). All genes were encoded on the H-strand except for ND6 and eight tRNA genes, which were encoded on the L-strand (Table 2). Of the 22 tRNA genes, 21 could fold into a cloverleaf secondary structure, whereas tRNASer1 (encoded on the H-strand) was only 58 bp long and lacked the dihydrouridine stem-loop in both C. cricetus and P. sungorus. In the predicted secondary structures of C. cricetus tRNAs, we found a total of 25 mismatched base pairs and G-U wobble pairs scattered across 17 tRNA genes (amino acid acceptor arm (12), DHU loop (5), TψC loop (4) and anticodon stem (4)). These comprised 9 mismatched base pairs (2 C-A, 5 U-U, 1 A-A and 1 A-G) and 16 G-U wobble pairs. Similarly, in the predicted secondary structures of P. sungorus tRNAs, a total of 35 mismatched base pairs and G-U wobble pairs were scattered across 15 tRNA genes (amino acid acceptor arm (12), DHU loop (8), TψC loop (8) and anticodon loop (7)). These comprised 9 mismatched base pairs (3 C-A, 3 U-U, 1 A-A and 1 A-G) and 27 G-U wobble pairs. The H-strand nucleotide composition was AT-rich in both coding and non-coding sequences, and the G content at the 3rd codon position was lower than that at the 1st and 2nd positions (Table 3). Codon usage bias of the 13 PCGs was examined for the C. cricetus and P. sungorus mitogenomes; CUA (Leu) and AUU (Ile) were the most frequently used codons (Supplementary Tables S3 and S4). The PCGs of C. cricetus and P. sungorus shared the same start codons, whereas the stop codons differed in ATP8; ND1, ND2, COX3 and ND4 had an incomplete stop codon (a single nucleotide ‘T’) (Table 2). The RSCU values of the two mitogenomes are summarized in Fig. 2 and Supplementary Tables S3 and S4 and reflect an A/T bias in codon usage. The GC-skew of all genes was negative in both mitogenomes, whereas the AT-skew was positive or negative depending on the gene (Table 3). The GC-skew at the 3rd codon position showed a strong bias toward C (−0.646 in C. cricetus; −0.753 in P. sungorus). PCGs located on the H-strand showed differing skew values, while the ND6 gene displayed the opposite pattern because it is located on the L-strand, followed by ATP8 and ND2 in C. cricetus and P. sungorus (Supplementary Fig. S2). In absolute value, the GC-skew was stronger than the AT-skew across all genes. The D-loop was the largest non-coding sequence in C. cricetus and P. sungorus and was located on the H-strand between the tRNAPro and tRNAPhe genes (Fig. 1 and Table 2). In addition, another non-coding sequence, containing the OL, was located in the WANCY region. This region, approximately 45 bp long and lying between the tRNAAsn and tRNACys genes, had the potential to fold into a stem-loop structure that serves as the DNA polymerase binding site (Supplementary Fig. S3).


Table 3
Nucleotide composition (%) and skew value of C. cricetus and P. sungorus mitogenomes.

C. cricetus
Sequences | T | C | A | G | A+T | C+G | AT skew | GC skew
PCGs | 29.2 | 28.3 | 28.7 | 13.7 | 57.9 | 42.1 | −0.009 | −0.348
1st* | 23 | 24.8 | 31.1 | 21.5 | 53.7 | 46.3 | 0.150 | −0.071
2nd* | 42 | 26.7 | 19.1 | 12.4 | 60.8 | 39.2 | −0.375 | −0.366
3rd* | 23 | 33.5 | 36.0 | 7.2 | 59.3 | 40.7 | 0.220 | −0.646
tRNA | 29.3 | 20.6 | 34.4 | 15.7 | 63.7 | 36.3 | 0.080 | −0.135
12S rRNA | 24.6 | 21.7 | 35.4 | 18.3 | 60.0 | 40.0 | 0.180 | −0.085
16S rRNA | 25.3 | 21.6 | 34.6 | 18.4 | 60.0 | 40.0 | 0.155 | −0.080
D-loop | 32.5 | 25.4 | 28.1 | 14.0 | 60.6 | 39.4 | −0.073 | −0.289
mitogenome | 28.2 | 27.2 | 30.6 | 14.0 | 58.9 | 41.1 | 0.041 | −0.320

P. sungorus
Sequences | T | C | A | G | A+T | C+G | AT skew | GC skew
PCGs | 30.1 | 24.8 | 32.9 | 12.2 | 63.0 | 37.0 | 0.045 | −0.341
1st* | 23.0 | 22.8 | 32.8 | 21.0 | 56.3 | 43.7 | 0.166 | −0.042
2nd* | 42.0 | 26.4 | 19.7 | 12.1 | 61.5 | 38.5 | −0.361 | −0.373
3rd* | 25.0 | 25.3 | 46.2 | 3.6 | 71.2 | 28.8 | 0.300 | −0.753
tRNA | 29.2 | 29.2 | 36.2 | 14.7 | 65.4 | 34.6 | 0.107 | −0.148
12S rRNA | 25.6 | 20.7 | 37.8 | 15.9 | 63.4 | 36.6 | 0.193 | −0.130
16S rRNA | 24.6 | 19.7 | 38.8 | 16.9 | 63.5 | 36.5 | 0.224 | −0.077
D-loop | 35.4 | 22.9 | 29.7 | 11.9 | 65.2 | 34.8 | −0.087 | −0.315
mitogenome | 28.8 | 24.1 | 34.6 | 12.4 | 63.4 | 36.6 | 0.092 | −0.319

* Note: The triplet codon position is denoted by 1st (first codon position), 2nd (second codon position) and 3rd (third codon position).
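The skew values in Table 3 can be reproduced from the base composition with the definitions AT-skew = (A − T) / (A + T) and GC-skew = (G − C) / (G + C). The following Python sketch is a minimal illustration and not the R code used in the study; the function names `at_skew`, `gc_skew` and `skews_from_sequence` are assumptions made for this example. The inputs can be percentages or raw counts, since the ratio is the same.

```python
from collections import Counter
from typing import Tuple


def at_skew(a: float, t: float) -> float:
    """AT-skew = (A - T) / (A + T); positive means more A than T."""
    return (a - t) / (a + t)


def gc_skew(g: float, c: float) -> float:
    """GC-skew = (G - C) / (G + C); negative means more C than G."""
    return (g - c) / (g + c)


def skews_from_sequence(seq: str) -> Tuple[float, float]:
    """Compute (AT-skew, GC-skew) from a raw DNA sequence.

    Raises ZeroDivisionError if the sequence lacks A/T or G/C entirely.
    """
    counts = Counter(seq.upper())
    return (
        at_skew(counts["A"], counts["T"]),
        gc_skew(counts["G"], counts["C"]),
    )


# Whole-mitogenome composition of C. cricetus from Table 3 (%).
whole_at = at_skew(a=30.6, t=28.2)
whole_gc = gc_skew(g=14.0, c=27.2)
print(round(whole_at, 3), round(whole_gc, 3))
```

With the whole-mitogenome percentages for C. cricetus, the sketch returns values that agree with the tabulated AT skew (0.041) and GC skew (−0.320) to three decimal places.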

3.2. Protein-coding genes of cricetine mitogenomes

The PCGs of cricetine mitogenomes were variable in length except for ND1, COX2, ND4L, ND4, ND5, and CYTB, and the most variable length was found in the ND4L gene of C. migratorius (3 codons shorter than that of other cricetines; Supplementary Table S5). The most conserved PCG was COX3 (86.2% amino acid sequence identity), whereas the least conserved was ND5 (24.5%). All PCGs of the 10 cricetine mitogenomes were AT-rich, and ND6 had a strong GC-skew, as in C. cricetus and P. sungorus. The distribution of color across the 10 cricetine mitogenomes changed irregularly, and the six datasets, coding and non-coding, did not appear to follow any pattern; there was no clear correlation between compositional bias and phylogenetic position, except that closely related species often shared very similar compositions (Supplementary Fig. S4). The C content at the 3rd codon position was significantly correlated with the GC content of PCGs (y = 1.124x + 32.526; R2 = 0.598, P < .01). The GC-skew of PCGs at the 3rd codon position showed a stronger GC bias than that at the 1st and 2nd positions in cricetines; however, for the ND6 gene, the GC-skew at the 1st codon position showed a stronger GC bias than that at the 2nd and 3rd positions (Supplementary Figs. S5 and S6). Notably, both P2FD and P4FD showed a strong bias against G at the 3rd codon position in the 10 cricetine mitogenomes (Table 4). P2FD had a higher GC-skew value than P4FD (Supplementary Fig. S7), suggesting that P2FD was under more relaxed natural selection than P4FD. Moreover, the G content at the 3rd codon position was significantly related to the GC-skew of P2FD (y = 0.031x − 0.871; R2 = 0.663, P < .01), while the GC-skew of P4FD showed a weaker

Fig. 2. Relative synonymous codon usage (RSCU) in PCGs of cricetines. Codon families are indicated below the X-axis. The stop codons are not given.
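For readers unfamiliar with RSCU, the following Python sketch shows how such values can be computed for a coding sequence. It is not the MEGA 6.0 procedure used in the study. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), excludes stop codons, and groups synonymous codons by amino acid, so that Leu and Ser are each treated as one six-codon family; this grouping is an assumption of the sketch and may differ from the codon families shown in Fig. 2. The function name `rscu` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in TCAG order:
# AGA/AGG are stops, ATA codes for Met and TGA codes for Trp.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU = observed codon count / mean count of all codons that encode
    the same amino acid. Stop codons and incomplete or ambiguous codons
    are ignored; amino acids absent from the sequence are omitted.
    """
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    counts = Counter(
        c for c in codons if c in CODON_TABLE and CODON_TABLE[c] != "*"
    )

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy sequence (hypothetical): three CTA codons and one CTT codon, all Leu.
toy = rscu("CTACTACTACTT")
print(toy["CTA"], toy["CTT"], toy["CTG"])
```

An RSCU above 1 means that a codon is used more often than expected under equal usage of its synonymous codons, which is how an A/T bias at the 3rd codon position shows up in a plot such as Fig. 2.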


Table 4
The lengths (bp) and nucleotide composition (%) of P2FD and P4FD on the 3rd codon position.

P2FD
Species | Size | T | C | A | G | A+T | G+C
T. triton | 1844 | 26.0 | 34.8 | 30.3 | 8.5 | 56.7 | 43.3
C. griseus | 1948 | 32.0 | 26.0 | 36.5 | 5.0 | 69.0 | 31.0
C. longicaudatus | 1946 | 32.0 | 26.4 | 35.7 | 5.8 | 67.9 | 32.1
A. eversmanni | 1829 | 27.0 | 35.0 | 27.5 | 10.1 | 54.9 | 45.1
C. kamensis | 1829 | 22.0 | 38.8 | 35.0 | 4.0 | 57.2 | 42.8
C. migratorius | 1851 | 28.0 | 33.0 | 32.8 | 6.4 | 60.6 | 39.4
C. cricetus | 1821 | 26.0 | 35.5 | 31.7 | 6.6 | 57.8 | 42.2
M. auratus | 1930 | 34.0 | 25.2 | 36.2 | 4.4 | 70.4 | 29.6
P. roborovskii | 1864 | 26.0 | 34.5 | 33.4 | 6.3 | 59.2 | 40.8
P. sungorus | 1913 | 29.0 | 30.8 | 36.9 | 3.5 | 65.7 | 34.3

P4FD
Species | Size | T | C | A | G | A+T | G+C
T. triton | 1945 | 21.0 | 31.2 | 37.4 | 10.0 | 58.8 | 41.2
C. griseus | 1841 | 28.0 | 14.8 | 53.6 | 4.0 | 81.2 | 18.8
C. longicaudatus | 1843 | 29.0 | 16.3 | 48.9 | 5.5 | 78.2 | 21.8
A. eversmanni | 1963 | 22.0 | 32.2 | 36.5 | 9.3 | 58.5 | 41.5
C. kamensis | 1960 | 19.0 | 26.9 | 49.9 | 4.5 | 68.6 | 31.4
C. migratorius | 1935 | 26.0 | 30.2 | 38.6 | 5.4 | 64.4 | 35.6
C. cricetus | 1968 | 21.0 | 31.8 | 39.8 | 7.7 | 60.5 | 39.5
M. auratus | 1859 | 30.0 | 18.5 | 47.8 | 4.0 | 77.5 | 22.5
P. roborovskii | 1923 | 16.0 | 22.5 | 55.7 | 5.8 | 71.7 | 28.3
P. sungorus | 1876 | 21.0 | 19.6 | 55.8 | 3.7 | 76.7 | 23.3

correlation (y = 0.022x − 0.738; R2 = 0.258, P > .05). Codon usage analysis indicated that CUA (L) and AUU (I) were the most frequently used codons in the 10 cricetine mitogenomes, and that ATG and TAA were the most frequently used start and stop codons, respectively (Supplementary Fig. S8). We detected a significant correlation between GC content and leucine content in the PCGs of cricetine mitogenomes (y = 0.065x + 9.815; R2 = 0.824, P < .01). However, there was no significant correlation between the frequency of stop codon usage (TAA/TAG) and the GC or G content of PCGs in cricetine mitogenomes (G/TAA: R2 = −0.105, P > .05; G/TAG: R2 = −0.030, P > .05; GC/TAA: R2 = −0.068, P > .05; GC/TAG: R2 = −0.122, P > .05).

recognition sites with Cricetulus species (Supplementary Table S7). Base composition showed AT-rich but were not significantly higher than other mtDNA datasets based on the paired t-test, such as rRNA (P > .05) and PCGs (P > .05). The A + T > G + C was commonly found in three domains for cricetine mitogenomes, and the central conserved domain was GC-rich greater than that of two peripheral domains (Supplementary Table S7). In addition, another non-coding sequence was located on the WANCY region containing the OL within 10 cricetine mitogenomes. The AT-rich regions of cricetine mitogenomes were only observed in the D-loop region of P. sungorus (15,496 bp – 15,598 bp) with high nucleotide percentage of A + T (78.64%).

3.3. Ribosomal and transfer RNA genes

3.5. Phylogenetic analysis

The 12S and the 16S rRNA genes of 10 cricetine mitogenomes were located between tRNAPhe and tRNALeu genes, and were separated by tRNAVal gene. The 12S rRNAs ranged from 949 bp (C. cricetus, P. sungorus) to 955 bp (M. auratus) within 10 cricetine mitogenomes, and the 16S RANs varied from 1559 bp (T. triton) to 1571 bp (P. sungorus). There was no substantial size variation between rRNAs. The base composition still showed as AT-rich (average A + T content: 61.2%) (Supplementary Table S6). All tRNA genes were total of 1497 bp (C. migratorius) to 1514 bp (C. kamensis) in length without significant size variation, and the base composite was yet AT-rich (average A + T content: 64.3%) in keeping with the rRNAs. A paired t-test showed that the AT content of tRNAs dataset was significantly higher than that of PCGs (P < .01), rRNAs (P < .01) and D-loop (P < .01). A total of 21 of 22 tRNA genes could fold into a clover-leaf secondary structure, while the tRNASer1 (encoded on the H-strand) lost the dihydrouridine stem-loop structure within 10 cricetine mitogenomes (Supplementary Fig. S9). We detected a total of 32 (C. longicaudatus) to 44 (C. kamensis) mismatched base pairs and G-U wobble pairs scatter throughout the 22 tRNA genes. The results indicated that these mismatches often occurred in the amino acid acceptor arm and G-U wobble pairs were the most frequency mismatched in 22 tRNAs with at the range of 70.3% (P. roborovskii) to 81.1% (C. griseus). Interestingly, only G-U wobble pairs were detected in DHU loop, while other forms of mispairing were richly found in the amino acid acceptor of 10 cricetine mitogenomes, including A-G, A-C, A-A, UeU and UeC.

Consensus tree of 10 cricetines was inferred from the concatenated mtDNA dataset using the methods of BI and ML (Fig. 3). Phylogenetic trees presented the same topological structure with two reconstructed methods, and each clades have a high support value. Phylogenetic analysis indicated that C. kamensis was at the basal position (PP = 1, ML = 100) and separated to the sibling species of P. sungorus and P. roborovskii (PP = 1, ML = 100), and phylogenetically distant from all other Cricetulus species. M. auratus shared a common ancestor with a diversified lineage corresponding to the genera Cricetulus, Tscherskia, Cricetus, and Allocricetulus (PP = 0.99, ML = 85). T. triton split from M. auratus, whereas C. griseus and C. longicaudatus had a close relationship (PP = 1, ML = 100) separated from T. triton (PP = 1, ML = 100). C. migratorius was closely related to A. eversmanni and C. cricetus (PP = 1, ML = 100) in comparison to C. griseus and C. longicaudatus. C. cricetus had a close relationship with the A. eversmanni. 4. Discussion 4.1. Cricetine mitogenomes The mitogenomes of cricetine is similar to that of other mammals in terms of gene quantity and organizational structure [4,5], and characteristics conformed to the typical mammalian mitogenome without any gene insertion, deletion and rearrangement. The nucleotide composition of mitogenome showed AT-rich in line with other mammals [36] as well as other coding and non-coding sequences, including PCGs, rRNAs, tRNAs and D-loop. At this point, DNA with a high AT-content makes the double helix structure less stable because the AT pair is bound by two hydrogen bonds while GC pairs are bound by three hydrogen bonds [58]; therefore, the high AT-content of mitogenome has a higher mutation rate such that it accelerates the evolutionary process for rodents [23]. 
3.4. Non-coding region and the putative origin of mitogenomic replication

The D-loop was the largest non-coding sequence in the 10 cricetine mitogenomes and was located on the H-strand between the tRNAPro and tRNAPhe genes. This region contained three domains: the extended termination-associated sequences (ETAS) domain adjacent to the tRNAPro gene, the central conserved domain, and the conserved sequence blocks (CSB) domain adjacent to the tRNAPhe gene. The organization structure of the D-loop region was similar to that in other mammals that varied in individual length by comparing the conserved

Fig. 3. Coalescent tree constructed by BI and ML based on the concatenated mtDNA dataset. The bootstrap values of ML are based on 1000 replications. The posterior probabilities (PP) and bootstrap values (ML) are indicated at each inner node as follows: PP/ML.

Interestingly, the A + T content of PCGs at the 3rd codon position was higher than that at the 1st and 2nd codon positions, and the G content at the 3rd codon position was lower than that at the 1st and 2nd codon positions, in accord with other mammals, which are strongly biased against G at the 3rd codon position [59,60]. A possible explanation is that the AT bias selectively avoids the formation of stop codons and the loss of amino acids, with the result that T-rich codons tend to be used [61]. There was no significant correlation between compositional bias and phylogenetic position within the 10 cricetine mitogenomes, a result supported by a prior study on murid rodents [62]. Similarly, Huang and Yang [63] investigated the base compositions of 609 mitogenomes across 20 phyla and found that the GC contents of mitogenomes did not differ significantly among animal phyla and could not reflect the evolutionary relationships among them.

Skewness analysis of the mtDNA indicated that the ND6 gene showed stronger asymmetry in the 10 cricetine mitogenomes. A similar pattern is also seen in other mammals, such as the Indian mouse deer [64], suggesting that the ND6 gene is under lower selective pressure because it is encoded on the L-strand. In addition, the G content at the 3rd codon position had a greater effect on P2FD than on P4FD, supporting the existence of asymmetric mutation pressure in cricetines. Mitogenomes with a symmetric gene distribution are very rare, and the cricetine mitogenomes were likewise asymmetric. The asymmetrical process of mitogenome replication is related to mutation pattern, substitution rate and compositional asymmetry [65]. These characteristics are proportional to the time spent by the H-strand in the single-stranded state and to asymmetrical directional mutation pressure [4,66,67].

The RSCU also reflects the A/T bias at the 3rd codon position of mitogenomes [68]. Codon usage analysis revealed that codons for nonpolar amino acids were the most commonly used, which favours the formation of α-helices or β-pleated sheets because these residues make up a large proportion of protein structure [69]. Moreover, we detected a significant correlation between the GC content of PCGs and leucine content within the 10 cricetine mitogenomes, and Gibson et al.
[62] suggested that base composition is closely related to leucine content, in line with our result. Like those of other vertebrates [4,70,71], the PCGs of cricetine mitogenomes also contain incomplete stop codons. Incomplete stop codons are commonly assumed to be completed by polyadenylation [72]. RNA-Seq analysis in the bank vole demonstrated that a TAA stop codon is created by post-transcriptional polyadenylation, which extends T or TA residues to the complete TAA stop codon [73], and the same mechanism is likely to operate in cricetines. The stop codons used in PCGs vary and are probably influenced by the GC or G content in bacterial lineages [74], whereas the frequency of stop codon usage (TAA/TAG) was not affected by the GC or G content of the PCGs within the 10 cricetine mitogenomes. Notably, tRNASer1 (encoded on the H-strand) lacked the dihydrouridine stem-loop in cricetine mitogenomes. This type of tRNA structure has been considered a typical feature of metazoan mitogenomes [4]. A secondary structure lacking the dihydrouridine arm is likely related to a structural compensation mechanism among tRNA arms [75]. Mismatched and wobble pairs are commonly found in invertebrate tRNAs and can be corrected by post-transcriptional RNA editing [76].

The D-loop region was the largest non-coding sequence within the 10 cricetine mitogenomes, in agreement with other mammals [5]. This region contains the major regulatory elements for the replication and expression of the mitogenome in mice [77,78], including the major sites of transcriptional initiation and the origin of H-strand DNA replication in mammalian mtDNA [79]. According to previous research, this region can be divided into three domains. The first is the ETAS domain, adjacent to the tRNAPro gene, where synthesis of the H-strand pauses and which is associated with the regulation of replication and transcription. The second is the central conserved domain. The third is the CSB domain, adjacent to the tRNAPhe gene, which contains the origin of H-strand replication, two promoters and the CSBs associated with the initiation of H-strand synthesis [6,80,81]. Commonly, the central domain has different nucleotide composition constraints from the two peripheral domains. Here, the central domain of the D-loop region was G-rich in comparison with the two peripheral domains within the 10 cricetine mitogenomes, which was also commonly found in other cricetines and might be related to its conserved function [6,81]. Interestingly, we detected an AT-rich region in the D-loop region of P. sungorus, whose AT content was significantly higher than that of the other mtDNA datasets, as reported for insects [82]. The OLs of the C. cricetus and P. sungorus mitogenomes are located in the WANCY cluster, in line with other rodents [6,73]. Prior studies suggest that the motif 5′-GCCGG-3′ is commonly found in other vertebrates [83,84] and has been reported to be critical for human mtDNA replication [85].
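Whether a candidate motif such as 5′-GCCGG-3′ occurs in a given OL region can be checked with a simple string scan. The sketch below is illustrative only: it assumes the region is available as a plain DNA string, the function names are hypothetical, and the toy sequence is invented rather than taken from any cricetine mitogenome.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence containing A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return 0-based start positions of motif in seq (overlapping hits allowed)."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def scan_both_strands(seq, motif):
    """Scan the given strand and its reverse complement for a motif.

    Positions for the reverse strand are relative to the reverse-complemented
    string, not to the input coordinates.
    """
    return {
        "forward": find_motif(seq, motif),
        "reverse": find_motif(reverse_complement(seq), motif),
    }


# Invented toy sequence, for illustration only.
toy_region = "TTGCCGGAACTGA"
toy_hits = scan_both_strands(toy_region, "GCCGG")
```

An empty list on both strands indicates that the motif is absent from the scanned region, which is the kind of evidence on which a statement about motif absence would rest.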
Likewise, the motif 5′-ACTGA-3′ can perform the same role in L-strand replication, where it plays a crucial role in determining the start site of DNA polymerase [86]. This L-strand synthesis start motif (5′-ACTGA-3′) is found in C. migratorius and A. eversmanni, but it differs from that of other cricetines, such as C. griseus and P. roborovskii (Rodentia: Cricetidae), in which the motif is 5′-ACTAA-3′ [6,7]. By contrast, the motif 5′-GCCGG-3′, although found in many vertebrates [83,84], was not found in cricetine mitogenomes.

4.2. Remarks on the taxonomy of Cricetinae

The taxonomy of the subfamily Cricetinae has often been contentious because earlier taxonomic studies based on morphology were partially incongruent with current molecular systematic studies. Phylogenetic analysis indicated that C. kamensis is closely related to the genus Phodopus rather than to the genus Cricetulus, in line with the phylogenetic result of Lebedev et al. [20]. This result overturned the understanding held by taxonomists for the past few decades that C. kamensis was closely related to the genus Cricetulus [13,15,16,18]. The classification of C. kamensis has long been controversial. Historically, Satunin [12] first named a female cricetine from the southeastern part of the Qinghai-Tibet Plateau Urocricetus kamensis in 1903 (type locality: the Moktschjun River in the upper reaches of the Lantsang River). Argyropulo [13] re-examined the U. kamensis specimens, replaced the genus Urocricetus with the genus Cricetulus and kept the species name (C. kamensis). Subsequent taxonomic studies on hamsters adopted this classification [1,14–16,18]. However, Pavlinov [19] maintained that Urocricetus should be regarded as a separate subgenus of Cricetulus that is phylogenetically distant from Cricetulus. Recently, Lebedev et al. [20] suggested, based on molecular and morphological datasets, that Urocricetus is phylogenetically distant from all other Cricetulus but closely related to the group of Phodopus, and that Urocricetus definitely deserves the rank of a separate genus. Their conclusion on the phylogenetic status of C. kamensis agrees with our results, and we also support recognising Urocricetus as a genus separate from Cricetulus rather than as a subgenus of Cricetulus. Three lines of evidence support our view on the classification of C. kamensis. First, C. kamensis shared a common ancestor with the group of Phodopus, and their divergence dates back to the Middle Miocene (about 10 million years ago) [20]. Second, Bradley & Baker [87] suggested that the intrageneric K2P distance ranges from 2.23% to 21.97%, with an average of 11.46%. The K2P distance between C. kamensis and other cricetines ranged from 19.9% to 25.3% (mean: 22.6%) based on the complete CYTB dataset, while the intergeneric distance within Cricetinae ranged from 20.9% to 25.3%.
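The K2P distances quoted above follow Kimura's two-parameter model [53], in which the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The following minimal sketch computes this quantity under the assumption that the two sequences are already aligned and of equal length; the function name and toy sequences are hypothetical, and the sketch is not the software used to obtain the values reported here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    math.log raises ValueError if the sequences are too divergent for the
    model (argument of the logarithm not positive).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment with one transition in ten sites, for illustration only.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Multiplying the returned value by 100 gives a percentage comparable in form to the ranges quoted in the text.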
We therefore consider that the divergence of Urocricetus reaches the intergeneric level. Third, a morphological study of cricetine classification pointed out that C. kamensis is the most specialized taxon and is distant from typical cricetines (Mesocricetus, Allocricetulus, Cricetulus, Tscherskia and Cricetus) [88]. Specifically, cluster analyses of external and craniodental characteristics indicated that C. kamensis is distant from the cluster of smaller cricetines (dissimilarity of 77%) as a result of ecology-dependent morphological traits, including intermediate (mouse-cricetine) morphology and a bounding gait. Moreover, in terms of adaptive evolution, C. kamensis showed a reversed evolutionary direction, with locomotion changing from non-bounding to bounding rather than from bounding (as in mice) to non-bounding (as in typical cricetines).

Our phylogenetic analysis also indicated that P. roborovskii was closely related to P. sungorus, in line with previous molecular systematic and cytogenetic studies [10,11]. Our phylogenetic result indicated that M. auratus split from a diversified group of cricetine taxa (Tscherskia, Cricetulus, Allocricetulus and Cricetus), and this phylogenetic position is supported by previous molecular systematic and cytogenetic studies on the Cricetinae [8,10,11]. However, the divergence of T. triton earlier than C. griseus and C. longicaudatus conflicts with previous studies [6,8,10]. We consider the phylogenetic position of T. triton in the current study to be reliable because of the longer DNA sequences and the relatively complete species sampling, and the chromosomal phylogenetic tree [11] also supports our result. In addition, the K2P distance between T. triton and other cricetines ranged from 19.9% to 26.6% (mean: 22.9%) based on the complete CYTB dataset and also reached the level of genus divergence [87]. It is therefore reasonable to regard Tscherskia as a separate monotypic genus distinct from Cricetulus. The close relationship between C. griseus and C. longicaudatus, as well as the phylogenetic position of C. migratorius, accords with the results of Neumann et al. [10]. A close relationship between C. cricetus and A. eversmanni has previously been proposed on the basis of combined mitochondrial and nuclear gene data [8,10], which supports our result. In summary, our phylogenetic results for Cricetinae based on 10 cricetine mitogenomes coincide with those of Lebedev et al. [20].

Unfortunately, the relationship between Tscherskia and Cansumys remains to be resolved, and few genetic data on Cansumys are available thus far. Morphological characteristics have long constituted the basis for the classification of the Gansu hamster C. canus (Allen, 1928), and no consistent conclusion has been reached. C. canus, a species endemic to China, was first found in Zhuoni County (Gansu Province, China) and was assigned to the new genus Cansumys by Allen [89]. Subsequent taxonomic studies accepted the taxonomic status of C. canus proposed by Allen [89] on the basis of morphological [90] and cytogenetic [91] evidence. However, Ellerman [92] and Argyropulo [13] maintained that C. canus is a subspecies of T. triton rather than a separate species, and Liao et al. [93] supported this conclusion based on morphological and mitochondrial D-loop datasets. Therefore, further research using mitogenome and nuclear gene data is needed to shed light on the phylogenetic status and taxonomy of C. canus.

Acknowledgment

The work was supported by the National Key Research and Development Program of China (2017YFC0504802) and the National Natural Science Foundation of China (Nos. 30870294, 31372179).

Declaration of Competing Interest

None.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ygeno.2019.09.016.

References

[1] Z.X. Luo, W. Chen, W. Gao, Fauna Sinica: Mammalia (Rodentia, Cricetidae), Science Press, Beijing, 2000.
[2] I.Y. Pavlinov, A.A. Lissovsky, The Mammals of Russia: A Taxonomic and Geographic Reference, KMK Scientific Press Ltd., Moscow, 2012.
[3] U.F.J. Pardiñas, P. Myers, L. León-Paniagua, Family Cricetidae (true hamsters, voles, lemmings and new world rats and mice), in: Handbook of the Mammals of the World, vol. 7, Lynx Edicions, Barcelona, 2017.
[4] C. Gissi, F. Iannelli, G. Pesole, Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of congeneric species, Heredity 101 (2008) 301–320, https://doi.org/10.1038/hdy.2008.62.
[5] J.L. Boore, Animal mitochondrial genomes, Nucleic Acids Res.
27 (1999) 1767–1780, https://doi.org/10.1093/nar/27.8.1767.
[6] L. Ding, W. Li, J. Liao, Mitochondrial genome of Cricetulus migratorius (Rodentia: Cricetidae): insights into the characteristics of the mitochondrial genome and the phylogenetic relationships of Cricetulus species, Gene 595 (2016) 121–129, https://doi.org/10.1016/j.gene.2016.10.003.
[7] L. Ding, W. Li, J. Liao, Characterization of the complete mitochondrial genome of Phodopus roborovskii (Rodentia: Cricetidae) and systematic implications for Cricetinae phylogenetics, Biochem. Syst. Ecol. 69 (2016) 226–235, https://doi.org/10.1016/j.bse.2016.10.010.
[8] P.H. Fabre, L. Hautier, D. Dimitrov, E.J. Douzery, A glimpse on the pattern of rodent diversification: a phylogenetic approach, BMC Evol. Biol. 12 (2012) 88, https://doi.org/10.1186/1471-2148-12-88.
[9] K. Neumann, J. Michaux, H. Jansman, A. Kayser, G. Mundt, R. Gattermann, Genetic spatial structure of European common hamsters (Cricetus cricetus) - a result of repeated range expansion and demographic bottlenecks, Mol. Ecol. 14 (2005) 1473–1483, https://doi.org/10.1111/j.1365-294X.2005.02519.x.
[10] K. Neumann, J. Michaux, V. Lebedev, N. Yigit, E. Colak, N. Ivanova, A. Poltoraus, A. Surov, G. Markov, S. Maak, S. Neumann, R. Gattermann, Molecular phylogeny of the Cricetinae subfamily based on the mitochondrial cytochrome b and 12S rRNA genes and the nuclear vWF gene, Mol. Phylogenet. Evol. 39 (2006) 135–148, https://doi.org/10.1016/j.ympev.2006.01.010.
[11] S.A. Romanenko, V.T. Volobouev, P.L. Perelman, V.S. Lebedev, N.A. Serdukova, V.A. Trifonov, L.S. Biltueva, W. Nie, P.C. O’Brien, M.A. Bulatova, F. Ferguson-Smith, A.S.G. Yang, Karyotype evolution and phylogenetic relationships of hamsters (Cricetidae, Muroidea, Rodentia) inferred from chromosomal painting and banding comparison, Chromosom. Res. 15 (2007) 283–297, https://doi.org/10.1007/s10577-007-1124-3.
[12] K.A. Satunin, Ann. Mus. Zool., Acad. Imp. Sci. St. Petersb. 7 (1903) 574.
[13] A.I. Argyropulo, Die Gatungen und Arten der hamster (Cricetinae Murray, 1866) der Paläarktik, Zeit. säugeterk. 8 (1933) 139–148.
[14] G.B. Corbet, The Mammals of the Palaearctic Region: a Taxonomic Review, Cornell University Press, London and Ithaca, 1978.
[15] Z.J. Feng, G.Q. Cai, C.L. Zheng, The Mammals of Tibet, Science Press, Beijing, 1986.
[16] A.T. Smith, Y. Xie, Mammals of China, Princeton University Press, Princeton, 2013.


[17] S. Wang, C.L. Zheng, Notes on Chinese hamsters (Cricetinae), Acta Zool. Sin. 19 (1973) 61–68.
[18] D.E. Wilson, D.M. Reeder, Mammal Species of the World. A Taxonomic and Geographic Reference, 3rd ed, Johns Hopkins University Press, Baltimore, 2005.
[19] I.Y. Pavlinov, Taxonomy of recent mammals, Archives of Zoological Museum of Moscow State University, vol. 46, 2003, pp. 3–297.
[20] V.S. Lebedev, A.A. Bannikova, K. Neumann, M.V. Ushakova, N.V. Ivanova, A.V. Surov, Molecular phylogenetics and taxonomy of dwarf hamsters Cricetulus Milne-Edwards, 1867 (Cricetidae, Rodentia): description of a new genus and reinstatement of another, Zootaxa 4387 (2018) 331–349, https://doi.org/10.11646/zootaxa.4387.2.5.
[21] C.A. Hutchison, J.E. Newbold, S.S. Potter, M.H. Edgell, Maternal inheritance of mammalian mitochondrial DNA, Nature 251 (1974) 536–538, https://doi.org/10.1038/251536a0.
[22] G. Piganeau, M. Gardner, A. Eyrewalker, A broad survey of recombination in animal mitochondria, Mol. Biol. Evol. 21 (2004) 2319–2325, https://doi.org/10.1093/molbev/msh244.
[23] D.A. Triant, J.A. Dewoody, Accelerated molecular evolution in Microtus (Rodentia) as assessed via complete mitochondrial genome sequences, Genetica 128 (2006) 95–108, https://doi.org/10.1007/s10709-005-5538-6.
[24] V. Campbell, F.J. Lapointe, Retrieving a mitogenomic mammal tree using composite taxa, Mol. Phylogenet. Evol. 58 (2011) 149–156, https://doi.org/10.1016/j.ympev.2010.11.017.
[25] J.C. Havird, S.R. Santos, Performance of single and concatenated sets of mitochondrial genes at inferring metazoan relationships relative to full mitogenome data, PLoS One 9 (2014) e84080, https://doi.org/10.1371/journal.pone.0084080.
[26] B. Malyarchuk, M. Derenko, G. Denisova, A mitogenomic phylogeny and genetic history of sable (Martes zibellina), Gene 550 (2014) 56–67, https://doi.org/10.1016/j.gene.2014.08.015.
[27] D.W. Weisrock, Concordance analysis in mitogenomic phylogenetics, Mol. Phylogenet. Evol. 65 (2012) 194–202, https://doi.org/10.1016/j.ympev.2012.06.003.
[28] T.D. Kocher, W.K. Thomas, A. Meyer, S.V. Edwards, S. Paabo, F.X. Villablanca, A.C. Wilson, Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers, Proc. Natl. Acad. Sci. U. S. A. 86 (1989) 6196–6200, https://doi.org/10.1073/pnas.86.16.6196.
[29] K. Rassmann, Evolutionary age of the Galápagos iguanas predates the age of the present Galápagos islands, Mol. Phylogenet. Evol. 7 (1997) 158–172, https://doi.org/10.1006/mpev.1996.0386.
[30] X.X. Shen, D. Liang, Y.J. Feng, M.Y. Chen, P. Zhang, A versatile and highly efficient toolkit including 102 nuclear markers for vertebrate phylogenomics, tested by resolving the higher level relationships of the caudata, Mol. Biol. Evol. 30 (2013) 2235–2248, https://doi.org/10.1093/molbev/mst122.
[31] P. Zhang, Y.Q. Chen, H. Zhou, X.L. Wang, L.H. Qu, The complete mitochondrial genome of a relic salamander, Ranodon sibiricus (Amphibia: Caudata) and implications for amphibian phylogeny, Mol. Phylogenet. Evol. 28 (2003) 620–626, https://doi.org/10.1016/S1055-7903(03)00059-9.
[32] T.A. Hall, BioEdit: a user-friendly biological sequence alignment editor and analysis program for windows 95/98/NT, Nucleic Acids Symp. Ser. 41 (1999) 95–98.
[33] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) 2725–2729, https://doi.org/10.1093/molbev/mst197.
[34] L.J. Revell, Phytools: an R package for phylogenetic comparative biology (and other things), Methods Ecol. Evol. 3 (2012) 217–223, https://doi.org/10.1111/j.2041-210X.2011.00169.x.
[35] R Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2017. https://www.r-project.org/.
[36] N.T. Perna, T.D. Kocher, Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes, J. Mol. Evol. 41 (1995) 353–358, https://doi.org/10.1007/BF00186547.
[37] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2.
[38] T.M. Lowe, S.R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucleic Acids Res. 25 (1997) 955–964, https://doi.org/10.1093/nar/25.5.0955.
[39] G. Benson, Tandem repeats finder: a program to analyze DNA sequences, Nucleic Acids Res. 27 (1999) 573–580, https://doi.org/10.1093/nar/27.2.573.
[40] J.R. Grant, P. Stothard, The CGView server: a comparative genomics tool for circular genomes, Nucleic Acids Res. 36 (2008) W181–W184, https://doi.org/10.1093/nar/gkn179.
[41] M. Zuker, Mfold web server for nucleic acid folding and hybridization prediction, Nucleic Acids Res. 31 (2003) 3406–3415, https://doi.org/10.1093/nar/gkg595.
[42] G. Vaidya, D.J. Lohman, R. Meier, SequenceMatrix: concatenation software for the fast assembly of multi-gene datasets with character set and codon information, Cladistics 27 (2011) 171–180, https://doi.org/10.1111/j.1096-0031.2010.00329.x.
[43] G. Churakov, M.K. Sadasivuni, K.R. Rosenbloom, D. Huchon, J. Brosius, J. Schmitz, Rodent evolution: back to the root, Mol. Biol. Evol. 27 (2010) 1315–1326, https://doi.org/10.1093/molbev/msq019.
[44] F. Ronquist, M. Teslenko, P. van der Mark, D.L. Ayres, A. Darling, S. Höhna, B. Larget, L. Liu, M.A. Suchard, J.P. Huelsenbeck, MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space, Syst. Biol. 61 (2012) 539–542, https://doi.org/10.1093/sysbio/sys029.
[45] A. Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics 30 (2014) 1312–1313, https://doi.org/10.1093/bioinformatics/btu033.

[46] Z. Yang, B. Rannala, Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method, Mol. Biol. Evol. 14 (1997) 717–724, https://doi.org/10.1093/oxfordjournals.molbev.a025811.
[47] J. Felsenstein, Evolutionary trees from DNA sequences: a maximum likelihood approach, J. Mol. Evol. 17 (1981) 368–376, https://doi.org/10.1007/BF01734359.
[48] H. Akaike, A new look at the statistical model identification, IEEE T. Automat. Contr. 19 (1974) 716–723, https://doi.org/10.1109/TAC.1974.1100705.
[49] R. Lanfear, B. Calcott, S.Y. Ho, S. Guindon, Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses, Mol. Biol. Evol. 29 (2012) 1695–1701, https://doi.org/10.1093/molbev/mss020.
[50] J. Felsenstein, Confidence limits on phylogenies: an approach using the bootstrap, Evolution (1985) 783–791, https://doi.org/10.1111/j.1558-5646.1985.tb00420.x.
[51] D.M. Hillis, J.J. Bull, An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis, Syst. Biol. 42 (1993) 182–192, https://doi.org/10.1093/sysbio/42.2.182.
[52] A.D. Leaché, T.W. Reeder, Molecular systematics of the eastern fence lizard (Sceloporus undulatus): a comparison of parsimony, likelihood, and Bayesian approaches, Syst. Biol. 51 (2002) 44–68, https://doi.org/10.1080/106351502753475871.
[53] M. Kimura, A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences, J. Mol. Evol. 16 (1980) 111–120, https://doi.org/10.1007/BF01731581.
[54] M.A. Partridge, M.M. Davidson, T.K. Hei, The complete nucleotide sequence of Chinese hamster (Cricetulus griseus) mitochondrial DNA, DNA Seq. 18 (2007) 341–346, https://doi.org/10.1080/10425170601101287.
[55] Z. Zhang, T. Sun, C. Kang, Y. Liu, S. Liu, B. Yue, T. Zeng, The complete mitochondrial genome of lesser long-tailed hamster Cricetulus longicaudatus (Milne-Edwards, 1867) and phylogenetic implications, Mitochondrial DNA Part A 27 (2016) 1303–1304, https://doi.org/10.3109/19401736.2014.945567.
[56] G. Luo, J. Liao, The complete mitochondrial genome of Allocricetulus eversmanni (Rodentia: Cricetidae), Mitochondrial DNA Part A 27 (2016) 3102–3104, https://doi.org/10.3109/19401736.2015.1007285.
[57] C. Kang, H. Yue, M. Liu, T. Huang, Y. Liu, X. Zhang, B. Yue, T. Zeng, S. Liu, The complete mitochondrial genome of Cricetulus kamensis (Rodentia: Cricetidae), Mitochondrial DNA Part A 27 (2016) 976–977, https://doi.org/10.3109/19401736.2014.926513.
[58] P. Yakovchuk, E. Protozanova, M.D. Frank-Kamenetskii, Base-stacking and base-pairing contributions into thermal stability of the DNA double helix, Nucleic Acids Res. 34 (2006) 564–574, https://doi.org/10.1093/nar/gkj454.
[59] X. Jiang, J. Gao, L. Ni, J. Hu, K. Li, F. Sun, J. Xie, X. Bo, C. Gao, J. Xiao, The complete mitochondrial genome of Microtus fortis calamorum (Arvicolinae, Rodentia) and its phylogenetic analysis, Gene 498 (2012) 288–295, https://doi.org/10.1016/j.gene.2012.02.022.
[60] H.M. Zhong, H.H. Zhang, W.L. Sha, C.D. Zhang, Y.C. Chen, Complete mitochondrial genome of the red fox (Vulpes vulpes) and phylogenetic analysis with other canid species, Zool. Res. 31 (2010) 122–130, https://doi.org/10.3724/SP.J.1141.2010.02122.
[61] C.A. Charneski, F. Honti, J.M. Bryant, L.D. Hurst, E.J. Feil, Atypical AT skew in Firmicute genomes results from selection and not from mutation, PLoS Genet. 7 (2011) e1002283, https://doi.org/10.1371/journal.pgen.1002283.
[62] A. Gibson, V. Gowri-Shankar, P.G. Higgs, M. Rattray, A comprehensive analysis of mammalian mitochondrial genome base composition and improved phylogenetic methods, Mol. Biol. Evol. 22 (2005) 251–264, https://doi.org/10.1093/molbev/msi012.
[63] Z.H. Huang, C.Z. Yang, Analysis of GC contents in animal mitogenome, Sichuan J. Zool. 34 (2015) 107–110, https://doi.org/10.3969/j.issn.1000-7083.2015.01.019.
[64] R.K. Sarvani, D.R. Parmar, W. Tabasum, N. Thota, A. Sreenivas, A. Gaur, Characterization of the complete mitogenome of Indian mouse deer, Moschiola indica (Artiodactyla: Tragulidae) and its evolutionary significance, Sci. Rep. 8 (2018) 2697, https://doi.org/10.1038/s41598-018-20946-5.
[65] A.H. Sahyoun, M. Bernt, P.F. Stadler, K. Tout, GC skew and mitochondrial origins of replication, Mitochondrion 17 (2014) 56, https://doi.org/10.1016/j.mito.2014.05.009.
[66] J.R. Lobry, Asymmetric substitution patterns in the two DNA strands of bacteria, Mol. Biol. Evol. 13 (1996) 660–665, https://doi.org/10.1093/oxfordjournals.molbev.a025626.
[67] A. Reyes, C. Gissi, G. Pesole, C. Saccone, Asymmetrical directional mutation pressure in the mitochondrial genome of mammals, Mol. Biol. Evol. 15 (1998) 957–966, https://doi.org/10.1093/oxfordjournals.molbev.a026011.
[68] J.S. Kim, M.J. Kim, J.S. Jeong, I. Kim, Complete mitochondrial genome of Saturnia jonasii (Lepidoptera: Saturniidae): genomic comparisons and phylogenetic inference among Bombycoidea, Genomics 110 (2018) 274–282, https://doi.org/10.1016/j.ygeno.2017.11.004.
[69] J. Wang, S. Zhu, C. Xu, Essential Biochemistry, Higher Education Press, Beijing, 1997.
[70] Q.J. Chao, Y.D. Li, X.X. Geng, L. Zhang, X. Dai, X. Zhang, J. Li, H.J. Zhang, Complete mitochondrial genome sequence of Marmota himalayana (Rodentia: Sciuridae) and phylogenetic analysis within Rodentia, Genet. Mol. Res. 13 (2013) 2739–2751, https://doi.org/10.4238/2014.April.14.3.
[71] H. Yue, C. Yan, F. Tu, C. Yang, W. Ma, Z. Fan, Z. Song, J. Owens, S. Liu, X. Zhang, Two novel mitogenomes of Dipodidae species and phylogeny of Rodentia inferred from the complete mitogenomes, Biochem. Syst. Ecol. 60 (2015) 123–130, https://doi.org/10.1016/j.bse.2015.04.013.
[72] J.H. Chang, L. Tong, Mitochondrial poly(A) polymerase and polyadenylation, Biochim. Biophys. Acta 1819 (2012) 992–997, https://doi.org/10.1016/j.bbagrm.2011.10.012.


[73] S. Marková, K. Filipi, J.B. Searle, P. Kotlík, Mapping 3′ transcript ends in the bank vole (Clethrionomys glareolus) mitochondrial genome with RNA-Seq, BMC Genomics 16 (2015) 1–9, https://doi.org/10.1186/s12864-015-2103-2.
[74] I. Povolotskaya, F. Kondrashov, A. Ledda, P. Vlasov, Stop codons in bacteria are not selectively equivalent, Biol. Direct 7 (2012) 30, https://doi.org/10.1186/1745-6150-7-30.
[75] S. Steinberg, R. Cedergren, Structural compensation in atypical mitochondrial tRNAs, Nat. Struct. Biol. 1 (1994) 507–510, https://doi.org/10.1038/nsb0894-507.
[76] D. Lavrov, W. Brown, J. Boore, A novel type of RNA editing occurs in the mitochondrial tRNAs of the centipede Lithobius forficatus, Proc. Natl. Acad. Sci. U. S. A. 97 (2000) 13738–13742, https://doi.org/10.1073/pnas.250402997.
[77] J.N. Doda, C.T. Wright, D.A. Clayton, Elongation of displacement-loop strands in human and mouse mitochondrial DNA is arrested near specific template sequences, Proc. Natl. Acad. Sci. U. S. A. 78 (1981) 6116–6120, https://doi.org/10.1073/pnas.78.10.6116.
[78] E. Sbisà, M. Nardelli, F. Tanzariello, A. Tullo, C. Saccone, The complete and symmetric transcription of the main non coding region of rat mitochondrial genome: in vivo mapping of heavy and light transcripts, Curr. Genet. 17 (1990) 247–253, https://doi.org/10.1007/BF00312616.
[79] D.A. Clayton, Replication and transcription of vertebrate mitochondrial DNA, Annu. Rev. Cell Biol. 7 (1991) 453–478, https://doi.org/10.1146/annurev.cb.07.110191.002321.
[80] C. Saccone, M. Attimonelli, E. Sbisa, Structural elements highly preserved during the evolution of the D-loop-containing region in vertebrate mitochondrial DNA, J. Mol. Evol. 26 (1987) 205–211, https://doi.org/10.1007/BF02099853.
[81] E. Sbisà, F. Tanzariello, A. Reyes, G. Pesole, C. Saccone, Mammalian mitochondrial D-loop region structural analysis: identification of new conserved sequences and their functional and evolutionary implications, Gene 205 (1997) 125–140, https://doi.org/10.1016/S0378-1119(97)00404-6.
[82] G. Niu, E.M. Korkmaz, Ö. Doğan, Y. Zhang, M.N. Aydemir, M. Budak, S. Du, H.H. Başıbüyük, M. Wei, The first mitogenomes of the superfamily Pamphilioidea (Hymenoptera: Symphyta): Mitogenome architecture and phylogenetic inference, Int. J. Biol. Macromol. 124 (2019) 185–199, https://doi.org/10.1016/j.ijbiomac.2018.11.129.
[83] S.L. Pereira, Mitochondrial genome organization and vertebrate phylogenetics, Genet. Mol. Biol. 23 (2000) 745–752, https://doi.org/10.1590/S1415-47572000000400008.
[84] R. Zardoya, A. Garrido-Pertierra, J.M. Bautista, The complete nucleotide sequence of the mitochondrial DNA genome of the rainbow trout, Oncorhynchus mykiss, J. Mol. Evol. 41 (1995) 942–951, https://doi.org/10.1007/BF00173174.
[85] J. Hixson, T. Wong, D.A. Clayton, Both the conserved stem-loop and divergent 5′-flanking sequences are required for initiation at the human mitochondrial origin of light-strand DNA replication, J. Biol. Chem. 261 (1986) 2384–2390.
[86] T.W. Wong, D.A. Clayton, In vitro replication of human mitochondrial DNA: accurate initiation at the origin of light-strand synthesis, Cell 42 (1985) 951–958, https://doi.org/10.1016/0092-8674(85)90291-0.
[87] R.D. Bradley, R.J. Baker, A test of the genetic species concept: cytochrome-b sequences and mammals, J. Mammal. 82 (2001) 960–973, https://doi.org/10.1644/1545-1542(2001)082<0960:ATOTGS>2.0.CO;2.
[88] A. Miljutin, Trends of specialisation in rodents: the hamsters, subfamily Cricetinae (Cricetidae, Rodentia, Mammalia), Acta Zoologica Lituanica 21 (2011) 192–206, https://doi.org/10.2478/v10043-011-0024-0.
[89] G. Allen, A new cricetine genus from China, J. Mammal. 9 (1928) 244–246.
[90] Y. Gu, Y. Ma, Y.H. Sun, A rediscussion on the taxonomic status of Cansumys canus, Chin. J. Zool. 40 (2005) 116–120, https://doi.org/10.13859/j.cjz.2005.03.024.
[91] L. Yang, X. Chen, X. Zhao, J. Wang, Karyotype and classification status of Cansumys canus (Cricetidea, Rodentia), Acta Theriol. Sin. 23 (2003) 235–238.
[92] J.R. Ellerman, The Families and Genera of Living Rodents: Family Muridae, British Museum (Natural History), London, 1941.
[93] J. Liao, Z. Xiao, Y. Dong, Z. Zhang, N. Liu, J. Li, Taxonomic status of Cansumys canus (Allen, 1928), Acta Zool. Sin. 53 (2007) 44–53.