Gene Reports 17 (2019) 100523
Gene Reports journal homepage: www.elsevier.com/locate/genrep
A study on codon usage bias in cytochrome c oxidase I (COI) gene of solitary ascidian Herdmania momus Savigny, 1816.
Shabeer Ahmed Nariyampet, Abdul Jaffar Ali Hajamohideen ⁎
Department of Biotechnology, Islamiah College (Autonomous), Vaniyambadi 635752, India
ARTICLE INFO
ABSTRACT
Keywords: Herdmania momus; Synonymous codon usage bias; Cytochrome c oxidase I (COI) gene
Synonymous codon usage bias, the use of synonymous codons at unequal frequencies within a gene, is a widespread phenomenon across organisms. Synonymous codon usage patterns differ between species and also among the genes of a genome. Codon usage bias is shaped, to varying degrees, by mutational bias, natural selection and other factors. In this study, the codon usage pattern of the cytochrome c oxidase I (COI) gene of the solitary ascidian Herdmania momus was determined by analyzing nucleotide content, codon distribution frequency, hydrophobicity, Relative Synonymous Codon Usage (RSCU), Effective Number of Codons (ENC), GC3 and Codon Adaptation Index (CAI) values. The results showed low GC content and high ENC values, indicating weak bias in codon usage. The Grand Average of Hydropathicity (GRAVY) values and hydropathy plots confirmed the hydrophobic nature of the COI protein.
1. Introduction
Triplets of nucleotides called codons encode amino acids, the building blocks of proteins. Most amino acids are encoded by more than one codon; such codons are called synonymous codons. Synonymous codons usually differ at the third position of the codon and, for some amino acids, also at the first or second position (Grantham et al., 1980; Grantham et al., 1981). Synonymous codons are not used at equal frequencies (Bennetzen and Hall, 1982): some codons are preferred over others. Codon usage bias is an essential feature of many genomes (Duret, 2002) and may result from either mutational bias or translational selection. In many organisms, variation in codon usage arises from natural selection for accurate and efficient translation, from mutation pressure, or from both (Jenkins and Holmes, 2003; Palidwor et al., 2010; Sharp et al., 2010). Codon usage bias is important in several fields, for example in heterologous protein expression (Cai et al., 2009) and in molecular evolution, where the analysis of individual genes helps to understand the evolution of organisms (Sheng et al., 2007). Codon usage bias tends to pair the most frequent, optimal codons with the anticodons of the most abundant tRNAs (Sun et al., 2009). It also reflects the balance between natural selection and mutational bias (Sharp and Li, 1986a, 1986b). According to Peden (2000), synonymous codon usage bias is controlled by various factors and is determined either by mutational bias alone or by both mutational bias and natural selection. Previous studies in mammals (Francino and Ochman, 1999) identified mutational bias as the driving force of codon usage bias, whereas other researchers considered natural selection to be the determining factor in eukaryotic organisms (Moriyama and Powell, 1997; Wang and Hickey, 2007; Ingvarsson, 2007). Studying codon usage bias therefore reveals the codon usage pattern of a species and provides evidence about its evolution. Codon usage bias has been studied in bacteria (Liu et al., 2016), in insects such as the silkworm (Wei et al., 2014), mosquitoes (Morlais and Severson, 2002) and other insects (Herbeck and Novembre, 2003), in viruses (Zhong et al., 2012) and in human serum proteins (Mirsafian et al., 2014). However, to the best of our knowledge and after an extensive literature survey, codon usage bias has not been studied in ascidian genes.
Ascidians, commonly called tunicates, belong to the subphylum Urochordata and are the largest and most diverse group in the macrofouling communities of marine ecosystems. Ascidians are generally identified by conventional taxonomy based on morphological characters, but in recent years, with advances in computational technology and sequencing, a new approach called DNA barcoding has been employed for species identification. This method uses a short stretch of the mitochondrial cytochrome c oxidase subunit I (COI) gene to identify species. COI is involved in the electron transport chain and in proton translocation across the membrane (Saraste, 1990). The COI gene was used to identify the study species, Herdmania momus (Jaffar Ali and Shabeer Ahmed, 2016). In this study, the pattern of synonymous codon usage bias in the COI gene of H. momus was analyzed using bioinformatics tools.
Abbreviations: RSCU, Relative Synonymous Codon Usage; COI, cytochrome c oxidase I; ENC, Effective Number of Codons; CAI, Codon Adaptation Index; GRAVY, Grand Average of Hydropathicity; NCBI, National Center for Biotechnology Information.
⁎ Corresponding author. E-mail address: [email protected] (S.A. Nariyampet).
https://doi.org/10.1016/j.genrep.2019.100523
Received 27 July 2019; Received in revised form 13 September 2019; Accepted 22 September 2019
Available online 21 October 2019
2452-0144/ © 2019 Elsevier Inc. All rights reserved.
S.A. Nariyampet and A.J.A. Hajamohideen
2. Materials and methods
2.1. Sequences
Sixteen COI sequences of H. momus, previously deposited in GenBank by the authors, were retrieved in FASTA format from GenBank, NCBI (n.d.) (www.ncbi.nlm.nih.gov/genbank).
2.2. Nucleotide composition
The nucleotide composition of the H. momus COI sequences was analyzed using the BioEdit sequence alignment editor (Hall, 1999). The individual A, T, G and C contents and the AT% and GC% values were calculated.
2.3. Hydrophobicity analysis
The Grand Average of Hydropathicity (GRAVY) (Kyte and Doolittle, 1982) was calculated using the ExPASy ProtParam tool (www.expasy.org/tool/protparam.html). GRAVY is the sum of the hydropathy values of all amino acid residues divided by the length of the protein sequence.
2.4. Hydropathy plot
Hydropathy plots of the COI protein sequences were constructed using the TMpred web server (Hofmann and Stoffel, 1993). Each plot shows the position along the COI protein sequence on the x-axis against the hydrophobicity score on the y-axis.
2.5. Codon distribution frequency analysis
This analysis identifies the frequently and rarely used codons in the COI gene sequences. Rare codons are codons used at low frequency, such as rarely used synonymous codons or stop codons. The analysis was performed with the GenScript web server (GenScript rare codon analysis tool) (www.genscript.com/cgi-bin/tools/rare-codon-analysis/), which plots the codon frequency distribution against the percentage of codons.
2.6. Indices of codon bias level
2.6.1. Effective Number of Codons (ENC)
ENC measures the non-uniform usage of codons within groups of synonymous codons and is used to study codon bias in a single gene (Wright, 1990). ENC values were calculated using CodonW 1.4 (http://codonw.sourceforge.net) (Codon, W; Peden, 2000).
2.7. Relative Synonymous Codon Usage (RSCU)
RSCU was analyzed using the RSCU model in MEGA 6 software (Tamura et al., 2013). The pattern of codon usage and the frequencies of the synonymous codons encoding the same amino acid were computed.
2.8. Indices related to mutational bias
2.8.1. GC3
GC3 is the proportion of G or C nucleotides at the third position of synonymous codons (excluding start and termination codons). GC3 was calculated using CodonW 1.4 (http://codonw.sourceforge.net).
2.9. Indices related to natural selection
2.9.1. Codon Adaptation Index (CAI)
CAI is used to estimate the expression level of a gene. CAI values were calculated using the CAIcal Server (n.d.) (http://genomes.urv.es/CAIcal/intro.php).
3. Results and discussion
Analysis of the nucleotide composition of the H. momus COI sequences is important, as nucleotide composition is closely related to gene function (Garcia et al., 2011). All 16 COI sequences were rich in A and T, which is typical of mitochondrial DNA, unlike nuclear mitochondrial pseudogene (NUMT) sequences (Shabeer Ahmed and Jaffar Ali, 2016). A similar trend has been observed in microbial genomes, where codon usage in AT-rich genomes is strongly associated with AT content (Jon et al., 2013). In bacteria with extreme genomic GC composition, compositional bias was found to be the dominant factor in synonymous codon usage bias (Muto and Osawa, 1987), and a universal bias towards A and T has been observed in bacteria (Hershberg and Petrov, 2010; Hildebrand et al., 2010). The nucleotide distribution in the H. momus COI sequences is given in Table 1.
Hydrophobicity analysis gave consistently positive GRAVY values for all sequences. These values indicate the hydrophobic nature of the encoded protein, in agreement with the hydropathy plots (Fig. 1a to p), in which all peaks had very high scores, above the score of 500 that is considered significant. The TMpred results for these 16 COI protein sequences of H. momus were in accordance with those for the COI protein sequences of two colonial ascidians, Polyclinum indicum and Didemnum candidum (Shabeer Ahmed and Jaffar Ali, 2015). The hydropathy plots predicted transmembrane helices, in accordance with the transmembrane helical structure of the COI protein proposed by Saraste (1990). The high scores of these transmembrane helices indicate the strongly hydrophobic nature of the COI protein. Both GRAVY and the hydropathy plots suggest that the COI protein is hydrophobic (Kyte and Doolittle, 1982). Rare codon analysis was carried out with the GenScript web server to determine the percentage of rare codons in each COI sequence.
A codon frequency distribution value of 100 indicates the most frequently used codon for a particular amino acid, whereas values below 30 indicate low-frequency codons, which may reduce expression efficiency. The percentages of low-frequency codons in the 16 COI sequences are 15%, 8%, 15%, 15%, 15%, 15%, 15%, 15%, 15%, 16%, 14%, 6%, 8%,
Table 1 Nucleotide content of Herdmania momus COI gene sequences.
GenBank accession number   A   T   G   C   AT%   GC%
KM411616   184   294   154   104   64.946   35.054
KM058116   133   257   132   75   65.327   34.673
KR867633   144   252   129   76   65.89   34.11
MH720939   166   282   147   92   65.211   34.789
MH720940   156   278   141   84   65.857   34.143
MH720941   148   282   149   99   63.422   36.578
MH720942   129   243   125   71   65.493   34.507
MH720943   117   211   115   64   64.694   35.306
MH720944   113   191   110   60   64.135   35.865
MH720945   120   214   117   66   64.603   35.397
MH720946   123   233   126   74   64.029   35.971
MH720947   119   208   115   61   65.01   34.99
MH720948   130   246   130   74   64.828   35.172
MH720949   125   225   121   65   65.299   34.701
MH720950   127   236   125   68   65.288   34.712
MH720951   123   217   115   63   65.637   34.363
Average   134.8   241.8   128.18   74.75   64.97   35.02
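The AT% and GC% columns of Table 1 follow directly from the base counts: AT% = 100 (A + T)/(A + T + G + C), and GC% likewise. The Python sketch below illustrates this calculation; it is not the BioEdit implementation, the function name `nucleotide_composition` is hypothetical, and ignoring gap and ambiguity characters is an assumption. A string rebuilt from the KM411616 base counts reproduces the tabulated percentages.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict:
    """Count A, T, G and C in a DNA sequence and return AT% and GC%.

    Characters other than A, C, G and T (gaps, N, other ambiguity codes)
    are ignored (an assumption), so percentages are relative to A + T + G + C.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "A": counts["A"],
        "T": counts["T"],
        "G": counts["G"],
        "C": counts["C"],
        "AT%": 100.0 * (counts["A"] + counts["T"]) / total,
        "GC%": 100.0 * (counts["G"] + counts["C"]) / total,
    }


if __name__ == "__main__":
    # Base counts of KM411616 from Table 1; base order does not affect composition.
    demo = "A" * 184 + "T" * 294 + "G" * 154 + "C" * 104
    result = nucleotide_composition(demo)
    print(f"{result['AT%']:.3f} {result['GC%']:.3f}")  # 64.946 35.054
```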
Fig. 1. a to p Hydropathy plots of H. momus COI sequences.
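GRAVY (Section 2.3) is the mean Kyte and Doolittle (1982) hydropathy value over all residues, and a Kyte-Doolittle hydropathy profile is the same average taken over a sliding window. The sketch below illustrates both for a translated protein sequence containing only the 20 standard residues. It only approximates the idea behind Fig. 1: TMpred (Hofmann and Stoffel, 1993) uses its own scoring, so its score scale, including the threshold of 500, does not correspond to Kyte-Doolittle averages. The function names and the default window of 19 residues are assumptions.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Sum of residue hydropathy values divided by sequence length."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle value of each window of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    print(round(gravy("AIV"), 2))  # 3.5
    print([round(x, 3) for x in hydropathy_profile("AIVG", window=3)])  # [3.5, 2.767]
```

A positive GRAVY, as for all sequences in Table 2, means hydrophobic residues dominate the sum.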
Fig. 2. a to p Codon frequency distribution of H. momus COI gene sequences.
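The codon frequency distribution in Fig. 2 assigns 100 to the most used codon of each amino acid and treats values below 30 as low-frequency codons. A minimal sketch of that scaling follows, assuming a reference codon usage table (for example, frequencies per thousand codons) supplied by the user; the text does not state which reference GenScript used, and the function names and example frequencies are hypothetical. Codons are assigned with the ascidian mitochondrial code (NCBI translation table 13), which matches the assignments used in Table 3 (AUA = Met, UGA = Trp, AGA and AGG = Gly).

```python
from itertools import product

BASES = "UCAG"
# NCBI translation table 13 (ascidian mitochondrial code), codons in UCAG order.
ASCIDIAN_MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), ASCIDIAN_MITO_AAS)}


def relative_codon_values(reference_usage: dict[str, float]) -> dict[str, float]:
    """Scale reference usage so the most used codon of each amino acid scores 100."""
    best: dict[str, float] = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0.0), reference_usage.get(codon, 0.0))
    return {
        codon: 100.0 * reference_usage.get(codon, 0.0) / best[aa] if best[aa] > 0 else 0.0
        for codon, aa in CODE.items()
    }


def percent_low_frequency(cds: str, reference_usage: dict[str, float],
                          cutoff: float = 30.0) -> float:
    """Percentage of sense codons in an in-frame CDS whose value is below `cutoff`."""
    values = relative_codon_values(reference_usage)
    rna = cds.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    sense = [c for c in codons if c in CODE and CODE[c] != "*"]
    if not sense:
        raise ValueError("no sense codons found")
    low = sum(1 for c in sense if values[c] < cutoff)
    return 100.0 * low / len(sense)


if __name__ == "__main__":
    # Hypothetical reference usage (per thousand) for phenylalanine only.
    ref = {"UUU": 30.0, "UUC": 6.0}
    print(relative_codon_values(ref)["UUC"])                  # 20.0
    print(round(percent_low_frequency("TTTTTCTTT", ref), 2))  # 33.33
```

Codons absent from the reference table score 0 and are therefore counted as low-frequency, which seems the natural reading of "rare" but is again an assumption.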
7%, 7%, and 16% (Fig. 2a to p). These results show that the H. momus COI sequences contain only a small proportion of rare codons, i.e., codons that may reduce the translational efficiency of the gene. Codon usage bias was quantified using ENC, one of the best estimators of codon usage bias (Fuglsang, 2003). ENC was calculated to examine the pattern of synonymous codon usage in sequences of varying length. ENC values of the H. momus COI sequences ranged from 34 to 61. An ENC value of 20 indicates that only one codon is used for each amino acid, whereas a value of 61 indicates equal usage of all synonymous codons. ENC values varied among the sequences irrespective of their length (Table 2). Five sequences (KM058116 and MH720947 to MH720950) had lower values (34.00, 35.75, 34.25, 34.66 and 34.90), whereas all the other sequences had higher values, with MH720941 showing the highest value of 61. Sequences MH720943 and MH720945 had values (56.64 and 56.56) close to the ENC of the measles virus genome (55.1) (Jenkins and Holmes, 2003). The average ENC value (47.13) is much higher than the base value (Jon et al., 2013), which suggests that synonymous codons in the H. momus COI gene are used relatively evenly, with little bias in synonymous codon usage. To investigate synonymous codon usage bias further, RSCU
Table 2 Hydrophobicity (GRAVY), GC content, ENC, GC3 and CAI of Herdmania momus COI gene sequences.
GenBank accession number   Gene length (bp)   GRAVY   GC content (%)   ENC   GC3 (%)   CAI
KM411616   736   0.440   35.054   48.744   13.179   0.635
KM058116   597   0.325   34.673   34.003   11.223   0.692
KR867633   601   0.368   34.11   47.209   13.311   0.617
MH720939   687   0.397   34.789   49.982   13.246   0.633
MH720940   659   0.364   34.143   49.201   13.505   0.632
MH720941   678   0.325   36.578   61   14.012   0.612
MH720942   568   0.334   34.507   48.969   13.908   0.623
MH720943   507   0.349   35.306   56.64   14.201   0.602
MH720944   474   0.371   35.865   57.574   14.135   0.595
MH720945   517   0.357   35.397   56.563   14.313   0.602
MH720946   556   0.347   35.971   47.724   14.209   0.607
MH720947   503   0.348   34.99   35.747   11.531   0.681
MH720948   580   0.336   35.172   34.254   11.552   0.686
MH720949   536   0.339   34.701   34.657   11.381   0.679
MH720950   556   0.330   34.712   34.899   11.151   0.681
MH720951   518   0.349   34.363   57.049   13.9   0.609
Average   579.56   0.354   35.02   47.13   13.04   0.636
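The ENC and GC3 columns of Table 2 were obtained with CodonW. As an illustration only (not CodonW's code), the sketch below follows Wright's (1990) estimator: for each amino acid with n observed codons, F = (n Σp_i² − 1)/(n − 1); F is averaged within each degeneracy class, and ENC is the sum over classes of (number of amino acids)/F̄, plus 1 for each single-codon amino acid. GC3s here is the percentage of G or C at third positions of codons from multi-codon families, stops excluded. The genetic code is a parameter: under the standard code ENC runs from 20 to 61, the range quoted in the text, whereas the ascidian mitochondrial code (NCBI table 13) has no three-fold class and a ceiling of 62. Filling an empty degeneracy class with the mean F of the other classes is our own simplification, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# 64 amino-acid letters in NCBI codon order (UCAG at each position).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"       # table 1
ASCIDIAN_MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"  # table 13


def genetic_code(aas: str = ASCIDIAN_MITO_AAS) -> dict[str, str]:
    """Map each RNA codon to its one-letter amino acid ('*' = stop)."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), aas)}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence (DNA or RNA) into RNA codons."""
    rna = cds.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def enc(cds: str, code: dict[str, str]) -> float:
    """Effective number of codons after Wright (1990), capped at the number of sense codons."""
    counts = Counter(c for c in split_codons(cds) if c in code)
    families = defaultdict(list)                  # amino acid -> synonymous codons
    for codon, aa in code.items():
        if aa != "*":
            families[aa].append(codon)
    class_sizes = Counter(len(f) for f in families.values())  # family size -> no. of amino acids
    f_values = defaultdict(list)                  # family size -> F of each amino acid
    for fam in families.values():
        n = sum(counts[c] for c in fam)
        if len(fam) > 1 and n > 1:
            p2 = sum((counts[c] / n) ** 2 for c in fam)
            f_values[len(fam)].append((n * p2 - 1) / (n - 1))
    if not f_values:
        raise ValueError("too few codons to estimate ENC")
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    fallback = sum(mean_f.values()) / len(mean_f)  # simplification for classes without data
    total = float(class_sizes.get(1, 0))           # single-codon amino acids add 1 each
    for k, n_aa in class_sizes.items():
        if k == 1:
            continue
        f = mean_f.get(k, fallback)
        total += n_aa / f if f > 0 else n_aa * k    # F = 0: treat the class as unbiased
    return min(total, float(sum(k * n for k, n in class_sizes.items())))


def gc3s(cds: str, code: dict[str, str]) -> float:
    """Percent G or C at the third position of codons from multi-codon families (stops excluded)."""
    family_size = Counter(code.values())
    thirds = [c[2] for c in split_codons(cds)
              if c in code and code[c] != "*" and family_size[code[c]] > 1]
    if not thirds:
        raise ValueError("no synonymous codons found")
    return 100.0 * sum(b in "GC" for b in thirds) / len(thirds)
```

For example, `gc3s("TTTTTCGCCATG", genetic_code(STANDARD_AAS))` ignores ATG (Met has a single codon in the standard code) and returns about 66.7, while under table 13 ATG is kept and the result is 75.0.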
Table 3 Relative synonymous codon usage (RSCU) in H. momus COI sequences.
Part 1, codons UUU to AAG, for MH720947, MH720948, MH720949, MH720950 and MH720951. Below the codon list, each accession has one line of codon counts followed by one line of RSCU values, in the order listed.
UUU(F) UUC(F) UUA(L) UUG(L) CUU(L) CUC(L) CUA(L) CUG(L) AUU(I) AUC(I) AUA(M) AUG(M) GUU(V) GUC(V) GUA(V) GUG(V) UCU(S) UCC(S) UCA(S) UCG(S) CCU(P) CCC(P) CCA(P) CCG(P) ACU(T) ACC(T) ACA(T) ACG(T) GCU(A) GCC(A) GCA(A) GCG(A) UAU(Y) UAC(Y) UAA(*) UAG(*) CAU(H) CAC(H) CAA(Q) CAG(Q) AAU(N) AAC(N) AAA(K) AAG(K)
14 2 8 9 0 0 0 1 12 1 5 10 5 0 3 4 6 0 0 0 7 0 2 0 5 0 3 0 6 1 1 2 1 2 0 0 2 1 1 2 6 1 0 2
1.75 0.25 2.67 3 0 0 0 0.33 1.85 0.15 0.67 1.33 1.67 0 1 1.33 3 0 0 0 3.11 0 0.89 0 2.5 0 1.5 0 2.4 0.4 0.4 0.8 0.67 1.33 0 0 1.33 0.67 0.67 1.33 1.71 0.29 0 2
15 2 10 14 0 0 0 2 13 1 6 10 6 0 4 5 8 0 0 0 9 0 2 0 7 1 3 0 9 1 1 2 1 2 0 0 2 1 1 2 6 1 0 2
1.76 0.24 2.31 3.23 0 0 0 0.46 1.86 0.14 0.75 1.25 1.6 0 1.07 1.33 3.43 0 0 0 3.27 0 0.73 0 2.55 0.36 1.09 0 2.77 0.31 0.31 0.62 0.67 1.33 0 0 1.33 0.67 0.67 1.33 1.71 0.29 0 2
15 2 8 11 0 0 0 1 12 1 6 10 6 0 4 4 7 0 0 0 8 0 2 0 6 0 3 0 6 1 1 2 1 2 0 0 2 1 1 2 6 1 0 2
1.76 0.24 2.4 3.3 0 0 0 0.3 1.85 0.15 0.75 1.25 1.71 0 1.14 1.14 3.23 0 0 0 3.2 0 0.8 0 2.67 0 1.33 0 2.4 0.4 0.4 0.8 0.67 1.33 0 0 1.33 0.67 0.67 1.33 1.71 0.29 0 2
15 2 8 14 0 0 0 2 13 1 6 10 6 0 4 4 7 0 0 0 8 0 2 0 7 1 3 0 6 1 1 2 1 2 0 0 2 1 1 2 6 1 0 2
1.76 0.24 2 3.5 0 0 0 0.5 1.86 0.14 0.75 1.25 1.71 0 1.41 1.41 3.23 0 0 0 3.2 0 0.8 0 2.55 0.36 1.09 0 2.4 0.4 0.4 0.8 0.67 1.33 0 0 1.33 0.67 0.67 1.33 1.71 0.29 0 2
13 1 10 8 6 6 9 5 6 0 3 3 4 2 9 3 1 0 2 1 0 0 1 0 1 1 2 0 0 1 0 0 3 3 0 0 2 1 1 2 1 0 1 1
1.86 0.14 1.36 1.09 0.82 0.82 1.23 0.68 2 0 1 1 0.89 0.44 2 0.67 1.5 0 3 1.5 0 0 4 0 1 1 2 0 0 4 0 0 1 1 0 0 1.33 0.67 0.67 1.33 2 0 1 1
Table 3 (continued)
Part 1, continued: codons GAU to GGG for MH720947, MH720948, MH720949, MH720950 and MH720951 (one line of codon counts, then one line of RSCU values, per accession in this order). The final value on each count line is the average number of codons.
GAU(D) GAC(D) GAA(E) GAG(E) UGU(C) UGC(C) UGA(W) UGG(W) CGU(R) CGC(R) CGA(R) CGG(R) AGU(S) AGC(S) AGA(G) AGG(G) GGU(G) GGC(G) GGA(G) GGG(G)
4 0 2 0 1 0 2 3 1 0 0 1 6 0 6 4 9 1 1 1 167
2 0 2 0 2 0 0.8 1.2 2 0 0 2 3 0 1.64 1.09 2.45 0.27 0.27 0.27
4 0 2 0 2 0 3 3 1 0 0 1 6 0 6 4 9 1 1 1 193
2 0 2 0 2 0 1 1 2 0 0 2 2.57 0 1.64 1.09 2.45 0.27 0.27 0.27
4 0 2 0 2 0 3 3 1 0 0 1 6 0 6 4 9 1 1 1 178
2 0 2 0 2 0 1 1 2 0 0 2 2.77 0 1.64 1.09 2.45 0.27 0.27 0.27
4 0 2 0 2 0 3 3 1 0 0 1 6 0 6 4 9 1 1 1 185
2 0 2 0 2 0 1 1 2 0 0 2 2.77 0 1.64 1.09 2.45 0.27 0.27 0.27
2 1 3 3 8 2 9 5 0 0 0 2 0 0 1 3 4 0 4 1 172
1.33 0.67 1 1 1.6 0.4 1.29 0.71 0 0 0 4 0 0 0.46 1.38 1.85 0 1.85 0.46
Part 2, codons UUU to GAC, for KM058116, KM411616, KR867633, MH720939, MH720940 and MH720941. Below the codon list, each accession has one line of codon counts followed by one line of RSCU values, in the order listed.
UUU(F) UUC(F) UUA(L) UUG(L) CUU(L) CUC(L) CUA(L) CUG(L) AUU(I) AUC(I) AUA(M) AUG(M) GUU(V) GUC(V) GUA(V) GUG(V) UCU(S) UCC(S) UCA(S) UCG(S) CCU(P) CCC(P) CCA(P) CCG(P) ACU(T) ACC(T) ACA(T) ACG(T) GCU(A) GCC(A) GCA(A) GCG(A) UAU(Y) UAC(Y) UAA(*) UAG(*) CAU(H) CAC(H) CAA(Q) CAG(Q) AAU(N) AAC(N) AAA(K) AAG(K) GAU(D) GAC(D)
16 2 11 15 0 0 0 2 14 1 6 11 6 0 4 5 8 0 0 0 9 0 2 0 8 1 3 0 9 1 1 2 1 2 0 0 2 1 1 2 6 1 0 2 4 0
1.78 0.22 2.36 3.21 0 0 0 0.43 1.87 0.13 0.71 1.29 1.6 0 1.07 1.33 3.43 0 0 0 3.27 0 0.73 0 2.67 0.33 1 0 2.77 0.31 0.31 0.62 0.67 1.33 0 0 1.33 0.67 0.67 1.33 1.71 0.29 0 2 2 0
17 2 13 9 9 7 12 10 6 1 3 3 5 3 12 3 1 0 3 1 0 0 2 0 1 2 5 1 0 2 2 0 4 8 0 0 2 1 3 2 1 0 7 1 2 1
1.79 0.21 1.3 0.9 0.9 0.7 1.2 1 1.71 0.29 1 1 0.87 0.52 2.09 0.52 0.67 0 2 0.67 0 0 4 0 0.44 0.89 2.22 0.44 0 2 2 0 0.67 1.33 0 0 1.33 0.67 1.2 0.8 2 0 1.75 0.25 1.33 0.67
15 1 11 9 7 6 12 9 5 1 3 3 5 2 8 3 1 0 1 1 0 0 2 0 1 1 2 0 0 1 0 0 4 5 0 0 1 1 1 2 1 0 5 1 2 1
1.88 0.13 1.22 1 0.78 0.67 1.33 1 1.67 0.33 1 1 1.11 0.44 1.78 0.67 2 0 2 2 0 0 4 0 1 1 2 0 0 4 0 0 0.89 1.11 0 0 1 1 0.67 1.33 2 0 1.67 0.33 1.33 0.67
17 1 12 9 9 6 12 9 6 1 3 3 5 3 11 3 1 0 2 1 0 0 2 0 1 2 4 0 0 1 2 0 4 7 0 0 2 1 1 2 1 0 5 1 2 1
1.89 0.11 1.26 0.95 0.95 0.63 1.26 0.95 1.71 0.29 1 1 0.91 0.55 2 0.55 0.75 0 1.5 0.75 0 0 4 0 0.57 1.14 2.29 0 0 1.33 2.67 0 0.73 1.27 0 0 1.33 0.67 0.67 1.33 2 0 1.67 0.33 1.33 0.67
17 1 12 9 9 6 12 9 6 1 3 3 5 3 10 3 1 0 2 1 0 0 2 0 1 2 2 0 0 1 1 0 4 5 0 0 2 1 1 2 1 0 5 1 2 1
1.89 0.11 1.26 0.95 0.95 0.63 1.26 0.95 1.71 0.29 1 1 0.95 0.57 1.9 0.57 1 0 2 1 0 0 4 0 0.8 1.6 1.6 0 0 2 2 0 0.89 1.11 0 0 1.33 0.67 0.67 1.33 2 0 1.67 0.33 1.33 0.67
16 1 11 9 8 6 12 9 6 0 3 3 5 2 9 3 1 0 2 1 0 0 2 0 1 1 2 0 0 1 0 0 4 5 0 0 2 1 1 2 1 0 1 1 2 1
1.88 0.12 1.2 0.98 0.87 0.65 1.31 0.98 2 0 1 1 1.05 0.42 1.89 0.63 1.5 0 3 1.5 0 0 4 0 1 1 2 0 0 4 0 0 0.89 1.11 0 0 1.33 0.67 0.67 1.33 2 0 1 1 1.33 0.67
Table 3 (continued)
Part 2, continued: codons GAA to GGG for KM058116, KM411616, KR867633, MH720939, MH720940 and MH720941 (one line of codon counts, then one line of RSCU values, per accession in this order). The final value on each count line is the average number of codons.
GAA(E) GAG(E) UGU(C) UGC(C) UGA(W) UGG(W) CGU(R) CGC(R) CGA(R) CGG(R) AGU(S) AGC(S) AGA(G) AGG(G) GGU(G) GGC(G) GGA(G) GGG(G)
2 0 2 0 3 3 1 0 0 1 6 0 6 4 9 1 1 1 199
2 0 2 0 1 1 2 0 0 2 2.57 0 1.64 1.09 2.45 0.27 0.27 0.27
4 3 15 4 9 8 0 0 0 2 2 2 1 3 4 1 4 1 232
1.14 0.86 1.58 0.42 1.06 0.94 0 0 0 4 1.33 1.33 0.43 1.29 1.71 0.43 1.71 0.43
4 3 13 4 9 6 0 0 0 2 0 0 1 3 4 0 3 1 187
1.14 0.86 1.53 0.47 1.2 0.8 0 0 0 4 0 0 0.5 1.5 2 0 1.5 0.5
4 3 15 4 9 7 0 0 0 2 2 2 1 3 4 1 4 1 215
1.14 0.86 1.58 0.42 1.13 0.88 0 0 0 4 1.5 1.5 0.43 1.29 1.71 0.43 1.71 0.43
4 3 15 4 9 6 0 0 0 2 2 0 1 3 4 0 4 1 205
1.14 0.86 1.58 0.42 1.2 0.8 0 0 0 4 2 0 0.46 1.38 1.85 0 1.85 0.46
4 3 15 3 9 6 0 0 0 2 0 0 1 3 4 0 4 1 190
1.14 0.86 1.67 0.33 1.2 0.8 0 0 0 4 0 0 0.46 1.38 1.85 0 1.85 0.46
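Part 3, codons UUU to GAG, for MH720942, MH720943, MH720944, MH720945 and MH720946. Below the codon list, each accession has one line of codon counts followed by one line of RSCU values, in the order listed.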
UUU(F) UUC(F) UUA(L) UUG(L) CUU(L) CUC(L) CUA(L) CUG(L) AUU(I) AUC(I) AUA(M) AUG(M) GUU(V) GUC(V) GUA(V) GUG(V) UCU(S) UCC(S) UCA(S) UCG(S) CCU(P) CCC(P) CCA(P) CCG(P) ACU(T) ACC(T) ACA(T) ACG(T) GCU(A) GCC(A) GCA(A) GCG(A) UAU(Y) UAC(Y) UAA(*) UAG(*) CAU(H) CAC(H) CAA(Q) CAG(Q) AAU(N) AAC(N) AAA(K) AAG(K) GAU(D) GAC(D) GAA(E) GAG(E)
16 1 11 9 8 6 12 9 6 0 3 3 5 2 9 3 1 0 2 1 0 0 2 0 1 1 2 0 0 1 0 0 4 5 0 0 2 1 1 2 1 0 1 1 2 1 4 3
1.88 0.12 1.2 0.98 0.87 0.65 1.31 0.98 2 0 1 1 1.05 0.42 1.89 0.63 1.5 0 3 1.5 0 0 4 0 1 1 2 0 0 4 0 0 0.89 1.11 0 0 1.33 0.67 0.67 1.33 2 0 1 1 1.33 0.67 1.14 0.86
13 1 9 8 6 6 9 6 5 0 3 3 5 2 8 3 1 0 1 1 0 0 1 0 1 1 2 0 0 1 0 0 3 4 0 0 1 1 1 2 1 0 1 1 2 1 4 3
1.86 0.14 1.23 1.09 0.82 0.82 1.23 0.82 2 0 1 1 1.11 0.44 1.78 0.67 2 0 2 2 0 0 4 0 1 1 2 0 0 4 0 0 0.86 1.14 0 0 1 1 0.67 1.33 2 0 1 1 1.33 0.67 1.14 0.86
10 1 9 8 5 6 8 5 5 0 3 3 4 2 8 3 1 0 1 1 0 0 1 0 1 1 2 0 0 1 0 0 3 3 0 0 1 1 1 2 1 0 1 1 2 1 3 3
1.82 0.18 1.32 1.17 0.73 0.88 1.17 0.73 2 0 1 1 0.94 0.47 1.88 0.71 2 0 2 2 0 0 4 0 1 1 2 0 0 4 0 0 1 1 0 0 1 1 0.67 1.33 2 0 1 1 1.33 0.67 1 1
13 1 9 8 6 6 10 6 5 0 3 3 5 2 8 3 1 0 1 1 0 0 1 0 1 1 2 0 0 1 0 0 3 4 0 0 1 1 1 2 1 0 1 1 2 1 4 3
1.86 0.14 1.2 1.07 0.8 0.8 1.33 0.8 2 0 1 1 1.11 0.44 1.78 0.67 2 0 2 2 0 0 4 0 1 1 2 0 0 4 0 0 0.86 1.14 0 0 1 1 0.67 1.33 2 0 1 1 1.33 0.67 1.14 0.86
14 1 9 8 7 6 10 9 5 0 3 3 5 2 8 3 1 0 1 1 0 0 2 0 1 1 2 0 0 1 0 0 3 5 0 0 1 1 1 2 1 0 1 1 2 1 4 3
1.87 0.13 1.1 0.98 0.86 0.73 1.22 1.1 2 0 1 1 1.11 0.44 1.78 0.67 2 0 2 2 0 0 4 0 1 1 2 0 0 4 0 0 0.75 1.25 0 0 1 1 0.67 1.33 2 0 1 1 1.33 0.67 1.14 0.86
Table 3 (continued)
Part 3, continued: codons UGU to GGG for MH720942, MH720943, MH720944, MH720945 and MH720946 (one line of codon counts, then one line of RSCU values, per accession in this order). The final value on each count line is the average number of codons.
UGU(C) UGC(C) UGA(W) UGG(W) CGU(R) CGC(R) CGA(R) CGG(R) AGU(S) AGC(S) AGA(G) AGG(G) GGU(G) GGC(G) GGA(G) GGG(G)
15 3 9 6 0 0 0 2 0 0 1 3 4 0 4 1 190
1.67 0.33 1.2 0.8 0 0 0 4 0 0 0.46 1.38 1.85 0 1.85 0.46
7 3 9 5 0 0 0 2 0 0 1 3 4 0 3 1 159
1.4 0.6 1.29 0.71 0 0 0 4 0 0 0.5 1.5 2 0 1.5 0.5
5 3 9 5 0 0 0 2 0 0 1 3 4 0 3 1 148
1.25 0.75 1.29 0.71 0 0 0 4 0 0 0.5 1.5 2 0 1.5 0.5
7 3 9 6 0 0 0 2 0 0 1 3 4 0 3 1 161
1.4 0.6 1.2 0.8 0 0 0 4 0 0 0.5 1.5 2 0 1.5 0.5
11 4 9 6 0 0 0 2 0 0 1 3 4 0 3 1 173
1.47 0.53 1.2 0.8 0 0 0 4 0 0 0.5 1.5 2 0 1.5 0.5
Note: Numbers in bold indicate high-frequency codons.
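RSCU as reported in Table 3 is the observed count of a codon multiplied by the number of synonymous codons for its amino acid and divided by the total count for that amino acid, so a value of 1 means the codon is used exactly as often as expected under equal synonymous usage. The sketch below, which is not MEGA's code, reproduces the MH720947 glycine values of Table 3 under the ascidian mitochondrial code (NCBI table 13), where AGA and AGG encode glycine. It also derives a CAI in the usual way, as the geometric mean of relative adaptiveness w = RSCU/RSCU_max taken from a reference set; the reference set used by CAIcal is not stated in the text, and the pseudo-count of 0.5 for codons absent from the reference is an assumption.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# NCBI translation table 13 (ascidian mitochondrial code): AUA = Met, UGA = Trp, AGA/AGG = Gly.
ASCIDIAN_MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), ASCIDIAN_MITO_AAS)}
FAMILIES = defaultdict(list)                      # amino acid (or '*') -> its codons
for _codon, _aa in CODE.items():
    FAMILIES[_aa].append(_codon)


def rscu(counts: dict[str, float]) -> dict[str, float]:
    """RSCU = count x family size / family total; 0.0 for every codon of an unused family."""
    values = {}
    for fam in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in fam)
        for c in fam:
            values[c] = counts.get(c, 0) * len(fam) / total if total else 0.0
    return values


def relative_adaptiveness(reference: dict[str, float], pseudo: float = 0.5) -> dict[str, float]:
    """w = RSCU / max RSCU within the family; codons absent from the reference get `pseudo`."""
    r = rscu({c: reference.get(c, 0) or pseudo for c in CODE})
    return {c: r[c] / max(r[x] for x in FAMILIES[CODE[c]]) for c in CODE}


def cai(cds: str, reference: dict[str, float]) -> float:
    """Geometric mean of w over sense codons that belong to multi-codon families."""
    w = relative_adaptiveness(reference)
    rna = cds.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    used = [c for c in codons
            if c in CODE and CODE[c] != "*" and len(FAMILIES[CODE[c]]) > 1]
    if not used:
        raise ValueError("no informative codons")
    return math.exp(sum(math.log(w[c]) for c in used) / len(used))


if __name__ == "__main__":
    # MH720947 glycine counts from Table 3.
    gly = {"AGA": 6, "AGG": 4, "GGU": 9, "GGC": 1, "GGA": 1, "GGG": 1}
    r = rscu(gly)
    print({c: round(r[c], 2) for c in FAMILIES["G"]})
    # {'AGA': 1.64, 'AGG': 1.09, 'GGU': 2.45, 'GGC': 0.27, 'GGA': 0.27, 'GGG': 0.27}
```

Because w divides each RSCU by the family maximum, a CAI of 1 would mean every codon in the gene is the most frequent synonym in the reference set.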
values were computed for all 16 COI sequences, which differ in length (Table 3). The most preferred codons were UUU, AUU, AUG, UCU, CCU, CCA, ACU, GCU, GCC, UAC, CAU, CAG, AAU, AAG, GAU, GAA, UGU, UGA, CGG and AGU. Of these high-frequency codons, 44% have A or U at the third codon position and 16% have G or C (Table 4); the remaining 40% show equal usage of A- or U-ending and G- or C-ending codons. The propensity of the COI gene to use A + U rather than G + C suggests that mutational bias does not play a major role in synonymous codon usage. The GC content at the third codon position (GC3), analyzed to understand the extent of base composition bias, is given in Table 2. GC3 values ranged from 11.15% to 14.31%. The low GC3 values are attributed to the location of the COI gene in an AT-rich region of the mitochondrial DNA; the codons of this gene therefore tend to use A and T bases, and GC-ending codons are used less often. The Codon Adaptation Index (CAI) was computed to estimate gene expression level. CAI values were similar across sequences, ranging from 0.595 to 0.692 with an average of 0.636 (Table 2). CAI values above 0.8 generally indicate highly expressed genes. Although no sequence reached 0.8, the five sequences with the highest CAI values (KM058116 and MH720947 to MH720950; 0.679 to 0.692) are predicted to be the most highly expressed, while the remaining sequences are close to the good expression range. In other words, none of the sequences showed poor expression.
Table 4 A + U and G + C preferential codon usage (third codon position) of Herdmania momus COI gene sequences.
GenBank accession number   A + U   G + C
KM411616   10   2
KM058116   12   4
KR867633   8   3
MH720939   9   3
MH720940   10   2
MH720941   8   3
MH720942   8   3
MH720943   7   3
MH720944   7   3
MH720945   7   3
MH720946   7   4
MH720947   12   5
MH720948   12   4
MH720949   12   4
MH720950   12   4
MH720951   8   3
4. Conclusion
In summary, to the best of our knowledge, this is the first study to report the relative synonymous codon usage pattern of the COI gene in ascidians. It revealed that codon usage in the H. momus COI gene is only slightly biased. All sequences showed similarly low GC and GC3 content, and most had high ENC values. The GC3 analysis indicated a base compositional bias towards A and T. The low GC and GC3 values suggest that neither mutational bias nor translational selection plays a major role in shaping the codon usage pattern of the H. momus COI gene. Further RSCU studies of the COI gene in ascidians are needed to understand codon usage patterns of cytochrome c oxidase I across ascidian species.
Declaration of competing interest
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.
References
Bennetzen, J.L., Hall, B.D., 1982. Codon selection in yeast. J. Biol. Chem. 257, 3026–3031.
Cai, M.S., Cheng, A.C., Wang, M.S., Zhao, L.C., Zhu, D.K., Luo, Q.H., Liu, F., Chen, X.Y., 2009. Characterization of synonymous codon usage bias in the duck plague virus UL35 gene. Intervirology 52, 266–278.
CAIcal Server. Available online. http://genomes.urv.es/CAIcal/intro.php (Accessed on 6 July 2019).
Codon, W. Available online. http://codonw.sourceforge.net/.
Duret, L., 2002. Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev. 12, 640–649.
Francino, H.P., Ochman, H., 1999. Isochores result from mutation not selection. Nature 400, 30–31.
Fuglsang, A., 2003. The effective number of codons for individual amino acids: some codons are more optimal than others. Gene 320, 185–190.
Garcia, J.A.L., Fernández-Guerra, A., Casamayor, E.O., 2011. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes. Mol. Phylogenet. Evol. 61, 650–658.
GenScript rare codon analysis tool. Available online. http://www.genscript.com/cgi-bin/tools/rare-codon-analysis/ (Accessed on 20 July 2019).
Grantham, R., Gautier, C., Gouy, M., Mercier, R., Pave, A., 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49–r62.
Grantham, R., Gautier, C., Gouy, M., Jacobzone, M., Mercier, R., 1981. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res. 9, r43–r74.
Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98.
Herbeck, J.T., Novembre, J., 2003. Codon usage patterns in cytochrome oxidase I across multiple insect orders. J. Mol. Evol. 56, 691–701.
Hershberg, R., Petrov, D.A., 2010. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6, e1001115.
Hildebrand, F., Meyer, A., Eyre-Walker, A., 2010. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 6, e1001107.
Hofmann, K., Stoffel, W., 1993. TMbase - a database of membrane spanning protein segments. Biol. Chem. Hoppe-Seyler 374, 166. Available online. http://www.ch.embnet.org/software/TMPRED_form.html (Accessed on 19 July 2019).
Ingvarsson, P.K., 2007. Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol. Biol. Evol. 24, 836–844.
Jaffar Ali, H.A., Shabeer Ahmed, N., 2016. DNA barcoding of two solitary ascidians, Herdmania momus Savigny, 1816 and Microcosmus squamiger Michaelsen, 1927 from Thoothukudi coast, India. Mitochondrial DNA A DNA Mapp. Seq. Anal. 27 (4), 3005–3007.
Jenkins, G.M., Holmes, E.C., 2003. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res. 92 (1), 1–7.
Jon, B., Ola, B., Tammi, V., Eystein, S., Ussery, D.W., 2013. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes. PLoS One 8, e69878.
Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 (1), 105–132.
Liu, J., Zhu, D., Ma, G., Liu, M., Wang, M., Jia, R., Chen, S., Sun, K., Yang, Q., Wu, Y., Chen, X., Cheng, A., 2016. Genome-wide analysis of the synonymous codon usage patterns in Riemerella anatipestifer. Int. J. Mol. Sci. 17, 1304.
Mirsafian, H., Mat Ripen, A., Singh, A., Hwan Teo, P., Feisal Merican, A., Bin Mohamad, S., 2014. A comparative analysis of synonymous codon usage bias pattern in human albumin superfamily. Sci. World J. 2014, 639682.
Moriyama, E.N., Powell, J.R., 1997. Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45, 378–391.
Morlais, I., Severson, D.W., 2002. Complete mitochondrial DNA sequence and amino acid analysis of the cytochrome C oxidase subunit I (COI) from Aedes aegypti. DNA Seq. 13 (2), 123–127.
Muto, A., Osawa, S., 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. U. S. A. 84, 166–169.
NCBI. GenBank. Available online. http://www.ncbi.nlm.nih.gov/genbank.
Palidwor, G.A., Perkins, T.J., Xia, X., 2010. A general model of codon bias due to GC mutational bias. PLoS One 5 (10), e13431.
Peden, J.F., 2000. Analysis of Codon Usage (Ph.D. thesis). University of Nottingham, Department of Genetics.
Saraste, M., 1990. Structural features of cytochrome oxidase. Q. Rev. Biophys. 23, 331–366.
Shabeer Ahmed, N., Jaffar Ali, H.A., 2015. In silico analysis of COI gene sequences of two colonial ascidians Polyclinum indicum and Didemnum candidum from Gulf of Mannar. Int. J. Sci. Humanit. 1 (1), 217–224.
Shabeer Ahmed, N., Jaffar Ali, H.A., 2016. Numts: an impediment to DNA barcoding of polyclinids, tunicata. Mitochondrial DNA Part A 27, 3395–3398.
Sharp, P.M., Li, W.H., 1986a. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38.
Sharp, P.M., Li, W.H., 1986b. Codon usage in regulatory genes in Escherichia coli does not reflect selection for rare codons. Nucleic Acids Res. 14 (19), 7737–7749.
Sharp, P.M., Emery, L.R., Zeng, K., 2010. Forces that influence the evolution of codon bias. Philos. Trans. R. Soc. B 365 (1544), 1203–1212.
Sheng, Z., Qin, Z., Chen, Z., Zhao, Y., Zhong, J., 2007. The factors shaping synonymous codon usage in the genome of Burkholderia mallei. J. Genet. Genom. 34, 362–372.
Sun, Z., Wan, D., Murphy, R.W., Ma, L., Zhang, X., Huang, D., 2009. Comparison of base composition and codon usage in insect mitochondrial genomes. Genes Genom. 31 (1), 65–71.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., Kumar, S., 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729.
Wang, H.C., Hickey, D.A., 2007. Rapid divergence of codon usage patterns within the rice genome. BMC Evol. Biol. 7 (1), S6.
Wei, L., He, J., Jia, X., Qi, Q., Liang, Z., Zheng, H., Ping, Y., Liu, S., Sun, J., 2014. Analysis of codon usage bias of mitochondrial genome in Bombyx mori and its relation to evolution. BMC Evol. Biol. 14, 262.
Wright, F., 1990. The effective number of codons used in a gene. Gene 87 (1), 23–29.
Zhong, Q., Xu, W., Wu, Y., Xu, H., 2012. Patterns of synonymous codon usage on human metapneumovirus and its influencing factors. J. Biomed. Biotechnol. 5, 460837.