Chaos, Solitons & Fractals 69 (2014) 209–216
Comparative analysis of bacterial essential and nonessential genes with Hurst exponent based on chaos game representation

Qian Zhou a,b,*, Yong-ming Yu c

a Province-Ministry Joint Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability, Mailbox 293, Hebei University of Technology (East Campus), No. 8 Guangrong Road, Hongqiao District, Tianjin 300130, China
b Department of Biomedical Engineering, Mailbox 293, Hebei University of Technology (East Campus), No. 8 Guangrong Road, Hongqiao District, Tianjin 300130, China
c Department of Biomedical Engineering, Shandong University, No. 17923 Jingshi Road, Lixia District, Jinan 250061, China
Article info
Article history: Received 30 July 2014; Accepted 1 October 2014
Abstract
Essential genes are indispensable for the survival of an organism. Investigating features associated with gene essentiality is fundamental to predicting and identifying essential genes with computational techniques. We use a fractal-theory approach to compare essential and nonessential genes in bacteria. The Hurst exponents of the essential and nonessential genes of 27 bacteria available in the DEG database are calculated from their chaos game representations. We find that, for most of the analyzed bacteria, a weak negative correlation exists between Hurst exponent and gene length. Moreover, essential genes generally differ from nonessential genes in their Hurst exponents: for genes of similar length, the average Hurst exponent of essential genes is smaller than that of nonessential genes. Our results suggest that the gene Hurst exponent is likely to be a useful feature for algorithms that predict essential genes.
© 2014 Elsevier Ltd. All rights reserved.
* Corresponding author at: Province-Ministry Joint Key Laboratory of Electromagnetic Field and Electrical Apparatus Reliability, Mailbox 293, Hebei University of Technology (East Campus), No. 8 Guangrong Road, Hongqiao District, Tianjin 300130, China. Tel.: +86 22 60201524. E-mail address: [email protected] (Q. Zhou).
http://dx.doi.org/10.1016/j.chaos.2014.10.003
0960-0779/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction
Essential genes are indispensable for the survival of an organism and are therefore considered to be the foundation of life [1,2]. Identifying essential genes is not only of great theoretical significance for understanding the universal principles of life [3] but also of practical importance in drug discovery. For instance, essential genes have been proposed as potential targets for new antibiotics because they are indispensable for bacterial cell survival [4]. There are two main approaches to predicting and identifying essential genes: experimental techniques and computational techniques. Experimental techniques are time-consuming and costly, and their results may vary with the experimental conditions and criteria used [5,6]. Computational techniques based on the features of essential genes have therefore attracted rapidly growing interest in recent years [7,8]. For these feature-based methods, selecting features associated with gene essentiality is fundamental to predicting essential genes.

In the recent past, there has been interest in applying nonlinear physics to the analysis of DNA sequences and proteins, including the concepts of scale invariance, fractals, multifractals, long-range correlations and networks [9–21]. Techniques such as mutual information functions, autocorrelation functions, power spectra, the "DNA walk" representation and entropies have been used for the statistical analysis of DNA sequences [10,12]. These studies aim to extract the complexity, regularity, and structural and dynamical information of the sequences, and to explore the relation between the primary structure of DNA and its biological function. Some of them have proved useful for biological identification, classification and prediction [13–15]. C. Stan et al. investigated series of lengths of coding and non-coding DNA sequences of some bacteria and archaea using an extension of multifractal detrended cross-correlation analysis; the results can serve as criteria for identifying the class affiliation of bacteria and archaea [13]. Z.Y. Su et al. found differences in local scaling exponents between coding segments (exons) and non-coding segments (introns), which can be used to locate the coding segments of a DNA sequence that are translated into protein [14]. X.H. Niu et al. predicted DNA-binding proteins using a support vector machine with hybrid fractal features and obtained better performance than existing methods [15]. Hwang et al. used complex-network methods to elucidate the relationships between genes and found that essential genes tend to be highly connected and generally have more interactions than nonessential ones [16].

Chaos game representation (CGR) is a scale-independent representation method for genomic sequences proposed by Jeffrey in 1990 [22]. The technique, formally an iterative map, can be traced further back to the foundations of statistical mechanics, in particular to chaos theory [23]. It converts a one-dimensional DNA sequence into a two-dimensional graphical representation from which both local and global patterns in the nucleotide sequence can be recognized. Although the method was first proposed for DNA sequences, its effectiveness in characterizing the multifractality of time series has led to its use with sequences of arbitrary symbols [24]. CGR provides convenient access to gene-structure analysis and has been successfully applied to sequence comparison, classification, similarity analysis, correlation analysis and related problems [25–29].
Several studies have shown that essential genes and the proteins they encode differ from their nonessential counterparts: essential proteins tend to be enriched in certain functional categories such as transcription, translation and replication; essential genes are preferentially located on the leading strand; enzymes are enriched among bacterial essential genes; and proteins encoded by essential genes are enriched in internal localization sites [30–33]. Here we use a fractal-theory approach to compare essential and nonessential genes in bacteria. The aim of our work is to explore features of essential genes in the available identified gene sequences, in order to help develop computational techniques for predicting and identifying essential genes. In this study, all the currently available identified essential and nonessential genes from the DEG database are analyzed. From the CGR of each gene we obtain a time series model of the gene sequence and calculate the Hurst exponent of the gene. Using this feature, we carry out a comparative analysis of bacterial essential and nonessential genes.

2. Material and methods

2.1. Essential and nonessential gene datasets
The data on essential and nonessential genes are obtained from the DEG database [34] (http://www.essentialgene.org/), which stores records of currently available identified essential genes, nonessential genes and genomic elements for a wide range of organisms, including bacteria, archaea and eukaryotes. A total of 27 bacteria, with their essential and nonessential genes, are collected from DEG 10. Information on the analyzed bacteria is listed in Table 1.

2.2. Chaos game representation of gene sequence
In our work, we use the standard chaos game representation of gene sequences introduced by Jeffrey [22]. In CGR, the four bases A, G, T and C are assigned to the corners of a square: base A is at position (−1, −1), G at (1, −1), T at (1, 1) and C at (−1, 1). Each base of the sequence is represented as one point in the square. The first point is plotted halfway between the centre of the square and the corner corresponding to the first base of the sequence, and each successive point is plotted halfway between the previous point and the corner corresponding to the next nucleotide in the sequence. In this way, the CGR points are generated by an iterated function system defined by the following equation,
$$P_i = 0.5\,(P_{i-1} + S_i), \qquad 1 \le i \le N, \tag{1}$$

where $P_i$ is the coordinate pair of the point in the CGR square corresponding to the base at position $i$ of a sequence of length $N$, $P_0 = (0, 0)$, and $S_i$ is the corner coordinate pair of that base. Such graphical representations involve no loss of information: a DNA sequence can be fully reconstructed from its 2D CGR map.

Table 1
Information on the analyzed bacteria.
No. | Bacteria name | Essential genes | Nonessential genes
1 | Bacillus subtilis 168 | 271 | 3955
2 | Vibrio cholerae N16961 | 779 | 2943
3 | Francisella novicida U112 | 392 | 1329
4 | Acinetobacter baylyi ADP1 | 499 | 2594
5 | Mycoplasma genitalium G37 | 381 | 94
6 | Mycoplasma pulmonis UAB CTIP | 310 | 322
7 | Streptococcus sanguinis | 218 | 2052
8 | Porphyromonas gingivalis ATCC 33277 | 463 | 1627
9 | Bacteroides thetaiotaomicron VPI-5482 | 325 | 4453
10 | Burkholderia thailandensis E264 | 406 | 5226
11 | Sphingomonas wittichii RW1 | 535 | 4315
12 | Shewanella oneidensis MR-1 | 402 | 1103
13 | Campylobacter jejuni NCTC 11168 | 228 | 1395
14 | Caulobacter crescentus | 480 | 3224
15 | Helicobacter pylori 26695 | 323 | 1135
16 | Haemophilus influenzae Rd KW20 | 642 | 512
17 | Escherichia coli MG1655 | 296 | 4077
18 | Salmonella enterica serovar Typhimurium LT2 | 230 | 4228
19 | Salmonella enterica serovar Typhimurium 14028S | 105 | 5210
20 | Salmonella enterica serovar Typhimurium SL1344 | 353 | 4035
21 | Salmonella enterica serovar Typhi | 353 | 4005
22 | Salmonella enterica serovar Typhi Ty2 | 358 | 3906
23 | Mycobacterium tuberculosis H37Rv | 771 | 3171
24 | Staphylococcus aureus N315 | 302 | 2281
25 | Staphylococcus aureus NCTC 8325 | 351 | 2541
26 | Pseudomonas aeruginosa UCBPP-PA14 | 335 | 960
27 | Pseudomonas aeruginosa PAO1 | 117 | 5454
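To make Eq. (1) concrete, the following minimal Python sketch iterates the map and includes a decoder that illustrates the lossless property stated above. The corner coordinates in `CORNERS` are an assumption read from the corner labels of Fig. 1(a). The names `cgr_points` and `decode` are illustrative and do not come from the paper. Decoding works because each coordinate of $P_{i-1}$ lies strictly between −1 and 1, so $P_i$ always falls in the quadrant of the corner $S_i$ of base $i$.

```python
from typing import Dict, List, Tuple

# Assumed corner assignment on the square [-1, 1] x [-1, 1] (read from Fig. 1(a)).
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (-1.0, -1.0),
    "G": (1.0, -1.0),
    "T": (1.0, 1.0),
    "C": (-1.0, 1.0),
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Iterate Eq. (1), P_i = 0.5 * (P_{i-1} + S_i), starting from P_0 = (0, 0)."""
    x, y = 0.0, 0.0
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        if base not in CORNERS:
            raise ValueError(f"unsupported symbol: {base!r}")
        sx, sy = CORNERS[base]
        x, y = 0.5 * (x + sx), 0.5 * (y + sy)  # move halfway towards the corner
        points.append((x, y))
    return points


def decode(points: List[Tuple[float, float]]) -> str:
    """Recover the sequence from its CGR points using the quadrant of each point."""
    by_signs = {(sx > 0, sy > 0): base for base, (sx, sy) in CORNERS.items()}
    return "".join(by_signs[(x > 0, y > 0)] for x, y in points)


if __name__ == "__main__":
    pts = cgr_points("ACGT")
    print(pts)  # [(-0.5, -0.5), (-0.75, 0.25), (0.125, -0.375), (0.5625, 0.3125)]
    print(decode(pts))  # ACGT
```

Any fixed one-to-one assignment of the four bases to the corners gives an equivalent representation; only the decoding table changes.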
2.3. Time series model based on chaos game representation of gene sequences
Milan Randić outlined a procedure that converts 2D CGR maps of DNA into a 'spectrum-like' format [35]. This in effect gives a time series model of gene sequences and makes it possible to explore the physical properties of genes with time series analysis methods. He projected the points in the CGR square from the centre of the square onto the periphery of the unit circle inscribed in the square. If $(x, y)$ are the coordinates of a point in the CGR square, their ratio satisfies $y/x = \tan\theta$, where $\theta$ is the angle measured from the X-axis. Because the arctangent of $y/x$ determines $\theta$ only up to a multiple of $\pi$, the arc length on the unit circle obtained from it must be corrected (by adding $\pi$ or $2\pi$ if necessary) according to the quadrant in which $(x, y)$ lies. The lengths of the resulting arcs, which on the unit circle equal $\theta$, represent the values of the corresponding bases in the gene sequence. In this way, gene sequences can be converted into time series.

2.4. Hurst exponent of time series
The Hurst exponent is a measure of the smoothness of a fractal time series, based on the asymptotic behaviour of the rescaled range of the process [36]. Hurst developed rescaled range (R/S) analysis, a statistical method for analysing long records of natural phenomena, which is considered the central tool of fractal data modelling [37]. In our work, we estimate the Hurst exponent of the time series model of each gene sequence with this method. The estimation procedure involves the following basic steps [36]. For a time series $F$ of total duration $N$, the cumulative total at each point in time is

$$C_{N,k} = \sum_{i=1}^{k} \left(F_i - \mu_N\right), \qquad 0 < k \le N,$$

where $F_i$ is the value of the time series at time $i$ and $\mu_N$ is the mean over the whole data set,

$$\mu_N = \frac{1}{N} \sum_{i=1}^{N} F_i .$$

The range $R$ of $C$ is

$$R = \max_{k} C_{N,k} - \min_{k} C_{N,k},$$

and the standard deviation of the values over the whole data set is

$$S = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(F_i - \mu_N\right)^2 } .$$

The rescaled range is given by $R/S$. The Hurst exponent is estimated by computing $R/S$ for (sub)series of different lengths $n$ and plotting $\ln(R_n/S_n)$ against $\ln(n)$, as in Fig. 1(c); the slope of the least-squares fitting line gives the estimate of the Hurst exponent. The Hurst exponent represents the degree of self-similarity of a data set. For a self-similar series with long-range dependence, the Hurst exponent lies between 0.5 and 1, whereas a value of 0.5 corresponds to an uncorrelated series. A larger Hurst exponent indicates a higher degree of self-similarity and long-range dependence.

Fig. 1. (a) CGR plot of essential gene DEG10010180 of Bacillus subtilis 168 (both axes from −1 to 1; corners labelled A, G, T and C); (b) time series obtained from the CGR plot in Fig. 1(a) (corresponding arc length versus base position, bp); (c) calculation of the Hurst exponent of this gene ($\ln(R_n/S_n)$ versus $\ln(n)$, H = 0.5725). The slope of the fitting line in Fig. 1(c) represents the value of the Hurst exponent.

Fig. 2. The Hurst exponents of essential and nonessential genes of Bacillus subtilis 168 (gene Hurst exponent versus gene length, bp). The majority of both essential and nonessential genes have Hurst exponents above 0.5.

Table 2
Distribution ranges of Hurst exponents, and Pearson correlation coefficients between Hurst exponent and gene length, for the analyzed bacteria.
No. | Bacteria name | Hurst exponent range, essential genes | Hurst exponent range, nonessential genes | Pearson correlation, essential genesᵃ | Pearson correlation, nonessential genesᵃ
1 | Bacillus subtilis 168 | [0.356, 0.764] | [0.374, 0.857] | 0.206/0.001 | 0.256/0.000
2 | Vibrio cholerae N16961 | [0.362, 0.851] | [0.358, 0.802] | 0.302/0.000 | 0.195/0.000
3 | Francisella novicida U112 | [0.402, 0.722] | [0.400, 0.828] | 0.159/0.002 | 0.112/0.000
4 | Acinetobacter baylyi ADP1 | [0.384, 0.690] | [0.372, 0.789] | 0.074/0.097 | 0.171/0.000
5 | Mycoplasma genitalium G37 | [0.370, 0.714] | [0.403, 0.677] | 0.044/0.391 | 0.150/0.149
6 | Mycoplasma pulmonis UAB CTIP | [0.380, 0.776] | [0.254, 0.755] | 0.069/0.225 | 0.128/0.022
7 | Streptococcus sanguinis | [0.410, 0.717] | [0.388, 0.792] | 0.136/0.045 | 0.175/0.000
8 | Porphyromonas gingivalis ATCC 33277 | [0.381, 0.738] | [0.388, 0.794] | 0.247/0.000 | 0.287/0.000
9 | Bacteroides thetaiotaomicron VPI-5482 | [0.418, 0.690] | [0.370, 0.875] | 0.242/0.000 | 0.261/0.000
10 | Burkholderia thailandensis E264 | [0.402, 0.779] | [0.389, 0.848] | 0.152/0.002 | 0.091/0.000
11 | Sphingomonas wittichii RW1 | [0.422, 0.790] | [0.371, 0.794] | 0.268/0.000 | 0.217/0.000
12 | Shewanella oneidensis MR-1 | [0.380, 0.695] | [0.391, 0.767] | 0.238/0.000 | 0.201/0.000
13 | Campylobacter jejuni NCTC 11168 | [0.394, 0.756] | [0.383, 0.821] | 0.290/0.000 | 0.128/0.000
14 | Caulobacter crescentus | [0.394, 0.744] | [0.380, 0.849] | 0.185/0.000 | 0.274/0.000
15 | Helicobacter pylori 26695 | [0.390, 0.819] | [0.402, 0.827] | 0.163/0.003 | 0.186/0.000
16 | Haemophilus influenzae Rd KW20 | [0.369, 0.733] | [0.409, 0.726] | 0.251/0.000 | 0.149/0.001
17 | Escherichia coli MG1655 | [0.404, 0.701] | [0.376, 0.858] | 0.174/0.003 | 0.317/0.000
18 | Salmonella enterica serovar Typhimurium LT2 | [0.397, 0.728] | [0.369, 0.923] | 0.213/0.001 | 0.204/0.000
19 | Salmonella enterica serovar Typhimurium 14028S | [0.376, 0.762] | [0.370, 0.970] | 0.124/0.209 | 0.340/0.000
20 | Salmonella enterica serovar Typhimurium SL1344 | [0.394, 0.791] | [0.375, 0.866] | 0.226/0.000 | 0.182/0.000
21 | Salmonella enterica serovar Typhi | [0.397, 0.923] | [0.374, 0.862] | 0.232/0.000 | 0.274/0.000
22 | Salmonella enterica serovar Typhi Ty2 | [0.420, 0.879] | [0.374, 0.822] | 0.176/0.001 | 0.247/0.000
23 | Mycobacterium tuberculosis H37Rv | [0.376, 0.744] | [0.313, 0.844] | 0.096/0.008 | 0.009/0.625
24 | Staphylococcus aureus N315 | [0.394, 0.718] | [0.366, 0.851] | 0.201/0.000 | 0.137/0.000
25 | Staphylococcus aureus NCTC 8325 | [0.368, 0.744] | [0.366, 0.807] | 0.268/0.000 | 0.159/0.000
26 | Pseudomonas aeruginosa UCBPP-PA14 | [0.385, 0.739] | [0.391, 0.766] | 0.078/0.156 | 0.206/0.000
27 | Pseudomonas aeruginosa PAO1 | [0.418, 0.753] | [0.350, 0.881] | 0.113/0.224 | 0.082/0.000
ᵃ Each entry has the form X/Y, where X is the Pearson correlation coefficient and Y is the p-value of the significance test of X. If Y is less than 0.05, the linear correlation is significant at the 5% level.

Fig. 3. The percentage and Hurst exponent of essential and nonessential genes of Bacillus subtilis 168 in each length fragment. (a) Proportions of essential and nonessential genes in each length fragment; of all the length fragments, 17 contain no less than 1% of both the essential and the nonessential genes. (b) Average Hurst exponent of the two kinds of genes in each of these fragments; for 14 of the 17 fragments, the average Hurst exponent of essential genes is smaller than that of nonessential genes.
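The time series construction of Section 2.3 and the R/S estimate of Section 2.4 can be sketched together in Python as follows. `arc_series` uses `math.atan2`, whose quadrant handling replaces the explicit correction by π or 2π. `hurst_rs` regresses ln(R/S) on ln(n). The paper does not state which lengths n were used, so the doubling window sizes starting at 8 and the averaging of R/S over non-overlapping windows are assumptions, as are all function names. The input is any list of (x, y) CGR points, such as those produced by Eq. (1).

```python
import math
import random
from typing import List, Sequence, Tuple


def arc_series(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Section 2.3: map each CGR point (x, y) to its angle theta in [0, 2*pi),
    measured from the X-axis; on the unit circle this equals the arc length."""
    # atan2 resolves the quadrant; the modulo adds 2*pi to negative angles.
    return [math.atan2(y, x) % (2.0 * math.pi) for x, y in points]


def rescaled_range(series: Sequence[float]) -> float:
    """R/S of one series: range of the cumulative deviations C_{N,k} from the
    mean, divided by the population standard deviation S."""
    n = len(series)
    mean = sum(series) / n
    cum, c_min, c_max = 0.0, math.inf, -math.inf
    for f in series:
        cum += f - mean  # C_{N,k} for k = 1..N
        c_min = min(c_min, cum)
        c_max = max(c_max, cum)
    s = math.sqrt(sum((f - mean) ** 2 for f in series) / n)
    if s == 0.0:
        raise ValueError("constant series: R/S is undefined")
    return (c_max - c_min) / s


def hurst_rs(series: Sequence[float], min_window: int = 8) -> float:
    """Least-squares slope of ln(R/S) against ln(n). For each assumed window
    length n = min_window, 2*min_window, ..., R/S is averaged over the
    non-overlapping windows of that length."""
    log_n: List[float] = []
    log_rs: List[float] = []
    n = min_window
    while n <= len(series):
        values = []
        for start in range(0, len(series) - n + 1, n):
            try:
                values.append(rescaled_range(series[start:start + n]))
            except ValueError:
                continue  # skip constant windows
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for a regression")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_n, log_rs))
    sxx = sum((a - mx) ** 2 for a in log_n)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(1024)]
    walk, total = [], 0.0
    for step in noise:
        total += step - 0.5  # cumulative sum of centred noise: a random walk
        walk.append(total)
    print(f"noise: H = {hurst_rs(noise):.3f}")  # typically a little above 0.5
    print(f"walk:  H = {hurst_rs(walk):.3f}")   # close to 1
```

For a gene, the pipeline would be `hurst_rs(arc_series(points))`. The exact value depends on the window choice, which is why the windows are exposed through `min_window`.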
3. Results

3.1. Hurst exponents of bacterial genes based on their CGR plots
For each gene of the analyzed bacteria, we obtain the time series from its CGR plot with the method described in Section 2.3 and then calculate its Hurst exponent. Taking one gene of Bacillus subtilis 168, DEG10010180 (gene length 1779 bp), as an example, Fig. 1 shows the CGR plot, the converted time series and the Hurst exponent calculation of this gene. The slope of the fitting line in Fig. 1(c) represents the value of the Hurst exponent. We find that, for each analyzed bacterium, most of its essential and nonessential genes have Hurst exponents above 0.5, indicating pronounced fractal characteristics and long-range dependence. For example, Fig. 2 shows the Hurst exponents of all the essential and nonessential genes of Bacillus subtilis 168. There are more nonessential genes than essential genes; most essential and nonessential genes are shorter than 4000 bp, while a few nonessential genes are very long (above 10,000 bp). The majority of both essential and nonessential genes have Hurst exponents above 0.5. For all the analyzed bacteria, the maximum Hurst exponents of both essential and nonessential genes are above 0.5, and the minimum values are below 0.5 (Table 2).

Moreover, we find that the gene Hurst exponent is correlated with gene length. We perform a statistical analysis using SPSS software to verify the correlation between gene length and Hurst exponent. The Pearson correlation coefficients between the two, together with the p-values of their significance tests, are listed in Table 2 for all the analyzed bacteria. The table shows that, in most bacteria, a weak negative correlation exists between Hurst exponent and gene length; that is, longer genes tend to have smaller Hurst exponents and shorter genes larger ones.
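The correlation test above was run in SPSS. The following stdlib-only Python sketch computes the same quantities under a stated simplification. The p-value uses the statistic t = r·√((n − 2)/(1 − r²)), but it approximates Student's t distribution by the normal distribution. For the hundreds to thousands of genes per bacterium in Table 1, this is close to the exact test, though it is not what SPSS computes. The function names `p_value` and `pearson` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, from t = r * sqrt((n - 2) / (1 - r^2)).
    Normal approximation to Student's t with n - 2 degrees of freedom (assumption)."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def pearson(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson r between, e.g., gene lengths x and Hurst exponents y, with its p-value."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, p_value(r, n)


if __name__ == "__main__":
    # Essential genes of M. genitalium G37: r = 0.044, n = 381 (Tables 1 and 2).
    print(round(p_value(0.044, 381), 3))  # 0.391
```

The printed value agrees with the p-value listed for this entry in Table 2. For small samples, the exact t distribution (for example, from a statistics package) should be used instead.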
3.2. Comparative analysis of the Hurst exponents of essential and nonessential genes
For each bacterium, we compare the Hurst exponents of its essential and nonessential genes. Since the Hurst exponent is correlated with gene length, the length range of the genes (from 0 to the maximum gene length) is divided into fragments at 100-bp intervals, and the average Hurst exponents of the two kinds of genes falling in the same fragment are compared. Fragments containing less than 1% of the essential genes or less than 1% of the nonessential genes are excluded from the comparison. This reduces the difference between the two length distributions and concentrates the comparison on the main part of the distributions. Fig. 3 gives the comparison results for Bacillus subtilis 168. Of all the length fragments, 17 contain at least 1% of both the essential and the nonessential genes. Fig. 3(a) shows the percentages of essential and nonessential genes in each of these fragments, which together include about 90% of all the genes. The average Hurst exponents of the two kinds of genes in each fragment are shown in Fig. 3(b). For 14 length fragments (82.35% of the total), the average Hurst exponent of essential genes is smaller than that of nonessential genes. This result shows that, in Bacillus subtilis 168, the average Hurst exponent of essential genes is smaller than that of nonessential genes of similar length.
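The fragment-wise comparison just described can be sketched in Python as below. The 100-bp width and the 1% threshold follow the text. The bin boundaries [100k, 100(k + 1)) bp, the `(length, hurst)` record layout and the function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Gene = Tuple[int, float]  # hypothetical record: (gene length in bp, Hurst exponent)


def bin_by_length(genes: Sequence[Gene], width: int) -> Dict[int, List[float]]:
    """Group Hurst exponents by length fragment [width*k, width*(k+1))."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for length, h in genes:
        bins[length // width].append(h)
    return bins


def compare_fragments(essential: Sequence[Gene], nonessential: Sequence[Gene],
                      width: int = 100, min_share: float = 0.01) -> Tuple[int, int]:
    """Return (kept, smaller): the number of fragments holding at least
    min_share of each gene class, and how many of them have a smaller mean
    Hurst exponent for essential genes than for nonessential genes."""
    ess = bin_by_length(essential, width)
    non = bin_by_length(nonessential, width)
    kept = smaller = 0
    for k in sorted(set(ess) & set(non)):
        if (len(ess[k]) / len(essential) < min_share
                or len(non[k]) / len(nonessential) < min_share):
            continue  # sparse fragment: excluded from the comparison
        kept += 1
        if mean(ess[k]) < mean(non[k]):
            smaller += 1
    return kept, smaller


if __name__ == "__main__":
    essential = [(150, 0.50), (160, 0.52), (250, 0.60)]
    nonessential = [(120, 0.55), (180, 0.57), (260, 0.58), (950, 0.70)]
    kept, smaller = compare_fragments(essential, nonessential)
    print(kept, smaller, f"{100 * smaller / kept:.2f}%")  # 2 1 50.00%
```

The per-bacterium share is 100 × smaller / kept. For Bacillus subtilis 168, 14 of 17 fragments gives 82.35%.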
Fig. 4. Percentages of the length fragments in which the average Hurst exponent of essential genes is smaller than that of nonessential genes. In 21 bacteria, the percentage is above 50%.
Fig. 5. Average Hurst exponents of essential and nonessential genes of the 27 analyzed bacteria. In 24 bacteria, the average Hurst exponent of essential genes is smaller than that of nonessential genes; the remaining three are V. cholerae N16961, P. aeruginosa UCBPP-PA14 and P. aeruginosa PAO1.
For all 27 bacteria, we calculate the percentage of length fragments in which the average Hurst exponent of essential genes is smaller than that of nonessential genes. The percentage is above 50% in 21 bacteria (Fig. 4), exactly 50% in 2, and below 50% in the remaining 4: V. cholerae N16961, M. genitalium G37, C. crescentus and P. aeruginosa UCBPP-PA14, with percentages of about 38.46%, 44.44%, 38.89% and 30.77%, respectively. We also compare, for each analyzed bacterium, the average Hurst exponent of all its essential genes with that of all its nonessential genes. The result is shown in Fig. 5: in 24 of the 27 bacteria, the average Hurst exponent of essential genes is smaller than that of nonessential genes. The remaining three are V. cholerae N16961, P. aeruginosa UCBPP-PA14 and P. aeruginosa PAO1.
4. Conclusions
Because essential genes play an indispensable role in the survival of an organism, their prediction and identification are of great significance, and substantial efforts have been made to predict them. Experimental techniques are time-consuming and costly, and an efficient and effective computational method has yet to be developed. Comparative genomics analysis has been used to identify highly conserved genes, most of which were then experimentally shown to be essential. The chaos game representation method provides convenient access to gene-structure analysis and has been successfully applied to sequence comparison, classification, similarity analysis and correlation analysis.

In this paper, we compare bacterial essential and nonessential genes using the Hurst exponent based on chaos game representation. All the currently available identified essential and nonessential genes from the DEG database are analyzed. Our results show that a weak negative correlation exists between Hurst exponent and gene length for both essential and nonessential genes. Moreover, essential genes generally differ from nonessential genes in their Hurst exponents: for genes of similar length, the average Hurst exponent of essential genes is smaller than that of nonessential genes. This suggests that bacterial essential genes have weaker long-range correlation and self-similarity than nonessential genes. Since this result holds for the large majority of the bacteria analyzed, the Hurst exponent is likely to be a useful feature for algorithms that predict essential genes.

Acknowledgements
The authors would like to thank Prof. Gui-zhi Xu for her support of their research. The present work was supported by the Natural Science Foundation of Hebei Province of China (Grant No. F2012202016) and the National Natural Science Foundation of China (Grant No. 61305077).

References
[1] Itaya M. An estimation of minimal genome size required for life. FEBS Lett 1995;362:257–60.
[2] Kobayashi K, Ehrlich SD, Albertini A, et al. Essential Bacillus subtilis genes. Proc Natl Acad Sci 2003;100:4678–83.
[3] Glass JI, Hutchison III CA, Smith HO, et al. A systems biology tour de force for a near-minimal bacterium. Mol Syst Biol 2009;5:330.
[4] Clatworthy AE, Pierson E, Hung DT. Targeting virulence: a new paradigm for antimicrobial therapy. Nat Chem Biol 2007;3:541–8.
[5] Tong X, Campbell JW, Balázsi G, et al. Genome-scale identification of conditionally essential genes in E. coli by DNA microarrays. Biochem Biophys Res Commun 2004;322:347–54.
[6] Molina-Henares MA, de la Torre J, García-Salamanca A, et al. Identification of conditionally essential genes for growth of Pseudomonas putida KT2440 on minimal medium through the screening of a genome-wide mutant library. Environ Microbiol 2010;12:1468–85.
[7] Gustafson AM, Snitkin ES, Parker SCJ, et al. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics 2006;7:265.
[8] Wang JX, Peng W, Wu FX. Computational approaches to predicting essential proteins: a survey. Proteomics Clin Appl 2013;7:181–92.
[9] Karlin S, Brendel V. Patchiness and correlations in DNA sequences. Science 1993;259:677–80.
[10] Li W. The study of correlation structures of DNA sequences: a critical review. Comput Chem 1997;21:257–71.
[11] Cattani C. Fractals and hidden symmetries in DNA. Math Prob Eng 2010. Article ID 507056.
[12] Arneodo A, Vaillant C, Audit B, Argoul F, d'Aubenton-Carafa Y, Thermes C. Multi-scale coding of genomic information: from DNA sequence to genome structure and function. Phys Rep 2011;498:45–188.
[13] Stan C, Cristescu MT, Iarinca LB, Cristescu CP. Investigation on series of length of coding and non-coding DNA sequences of bacteria using multifractal detrended cross-correlation analysis. J Theor Biol 2013;321:54–62.
[14] Su ZY, Wu T, Wang SY. Local scaling and multifractal spectrum analyses of DNA sequences: GenBank data analysis. Chaos Solitons Fractals 2009;40:1750–65.
[15] Niu XH, Shi F, Xia JB. Predicting DNA binding proteins using support vector machine with hybrid fractal features. J Theor Biol 2014;343:186–92.
[16] Hwang YC, Lin CC, Chang JY, Mori H, Juan HF, Huang HC. Predicting essential genes based on network and sequence analysis. Mol Biosyst 2009;5:1672–8.
[17] Wang Z, Wang L, Perc M. Rewarding evolutionary fitness with links between populations promotes cooperation. J Theor Biol 2014;349:50–6.
[18] Wang Z, Wang L, Perc M. Degree mixing in multilayer networks impedes the evolution of cooperation. Phys Rev E 2014;89:052813.
[19] Wang Z, Szolnoki A, Perc M. Self-organization towards optimally interdependent networks by means of coevolution. New J Phys 2014;16:033041.
[20] Wang Z, Szolnoki A, Perc M. Interdependent network reciprocity in evolutionary games. Sci Rep 2013;3:1183.
[21] Wang Z, Szolnoki A, Perc M. Optimal interdependence between networks for the evolution of cooperation. Sci Rep 2013;3:2470.
[22] Jeffrey HJ. Chaos game representation of gene structure. Nucleic Acids Res 1990;18:2163–70.
[23] Almeida JS, Carriço JA, Maretzek A, et al. Analysis of genomic sequences by chaos game representation. Bioinformatics 2001;17:429–37.
[24] Yu ZG, Anh VV, Wanliss JA, Watson SM. Chaos game representation of the Dst index and prediction of geomagnetic storm events. Chaos Solitons Fractals 2007;31:736–46.
[25] Joseph J, Sasikumar R. Chaos game representation for comparison of whole genomes. BMC Bioinformatics 2006;7:243.
[26] Goli B, Aswathi BL, Nair AS. Naïve Bayes-based classification for short microbial genes using chaos game representation. In: Prospects in Bioscience: Addressing the Issues; 2013. p. 41–7.
[27] Stan C, Cristescu CP, Scarlat EI. Similarity analysis for DNA sequences based on chaos game representation. Case study: the albumin. J Theor Biol 2010;267:513–8.
[28] Yu ZG, Anh V, Lau KS. Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses. J Theor Biol 2004;226:341–8.
[29] Fertil B, Massin M, Lespinats S, Devic C, Dumee P, Giron A. GENSTYLE: exploration and analysis of DNA sequences with genomic signature. Nucleic Acids Res 2005;33:W512–5.
[30] Rocha EPC, Danchin A. Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nat Genet 2003;34:377–8.
[31] Lin Y, Gao F, Zhang CT. Functionality of essential genes drives gene strand-bias in bacterial genomes. Biochem Biophys Res Commun 2010;396:472–6.
[32] Gao F, Zhang R. Enzymes are enriched in bacterial essential genes. PLoS ONE 2011;6:e21683.
[33] Peng C, Gao F. Protein localization analysis of essential genes in prokaryotes. Sci Rep 2014;4:6001.
[34] Luo H, Lin Y, Gao F, et al. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res 2014;42:D574–80.
[35] Randić M. Another look at the chaos-game representation of DNA. Chem Phys Lett 2008;456:84–8.
[36] Kale M, Butar FB. Fractal analysis of time series and distribution properties of Hurst exponent. J Math Sci Math Educ 2011;5:8–19.
[37] Hurst HE, Black RP, Simaika YM. Long-term storage: an experimental study. Constable; 1965.