A rebuttal to the comments on the genome order index

Computational Biology and Chemistry 33 (2009) 350


Technical Note

Ren Zhang
Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, China

Article history: Received 21 July 2008; Accepted 6 November 2008

Keywords: α-order entropy; Shannon entropy; Genome order index

Abstract

Recently, Elhaik et al. [Elhaik, E., Graur, D., Josic, K., 2008. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences. Comp. Biol. Chem. 32, 147] commented on the genome order index, which is defined as S = a² + c² + g² + t², where a, c, g and t denote the corresponding base frequencies. They claimed that (1) "S < 1/3 is in fact a mathematical property that is always true", and (2) "S is strictly equivalent to and derivable from the Shannon H function". The first claim is plainly wrong: for instance, when a = c = 0.5 and g = t = 0, S = 0.5 > 1/3. The second claim is also incorrect, because S and H are different special cases of the α-order entropy, with different functional forms that are neither strictly derivable from nor equivalent to each other. Therefore, the conclusions drawn in their comments are wrong.
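As a quick check of the arithmetic above, the counterexample and the uniform composition can be evaluated directly. The following minimal Python sketch (the function names are illustrative and not part of the original note) computes S and the Shannon entropy H for both:

```python
import math

def genome_order_index(freqs):
    """Genome order index S = a^2 + c^2 + g^2 + t^2 for the four base frequencies."""
    return sum(f * f for f in freqs)

def shannon_entropy(freqs):
    """Shannon entropy H = -sum(p * ln p) in nats, with 0 * ln 0 taken as 0."""
    return -sum(f * math.log(f) for f in freqs if f > 0)

# Counterexample to claim (1), "S < 1/3 always holds": a = c = 0.5, g = t = 0.
skewed = [0.5, 0.5, 0.0, 0.0]
print(genome_order_index(skewed))   # 0.5  -> greater than 1/3
print(shannon_entropy(skewed))      # ln 2 ≈ 0.693

# Uniform composition for comparison: S attains its minimum 0.25, H its maximum ln 4.
uniform = [0.25, 0.25, 0.25, 0.25]
print(genome_order_index(uniform))  # 0.25 -> less than 1/3
print(shannon_entropy(uniform))     # ln 4 ≈ 1.386
```

S thus exceeds 1/3 for sufficiently skewed compositions and falls below it for balanced ones, which is why claim (1) cannot hold universally.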

To examine the first claim, surprisingly, only elementary mathematics is needed: when a = c = 0.5 and g = t = 0, S = 0.5 > 1/3, which is sufficient to show that their first claim is wrong. Examining the second claim requires the α-order entropy H_α (Rényi, 1961). As α tends to 1, H_α reduces to the Shannon H function; when α = 2 and there are four variables, S is a simple transform of H_2 (under Rényi's definition, H_2 = −ln S), and 1 − S is the well-known Gini–Simpson index. S and H are therefore different special cases of the α-order entropy, with different functional forms (a power function and a logarithmic function, respectively), and are not strictly derivable from each other. Nor are they equivalent: for instance, H, but not S, has the mathematical property of additivity of information entropy.
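For completeness, the α-order entropy of Rényi (1961) and the two special cases discussed above can be written out explicitly. The display below restates these standard definitions for the base-frequency vector p = (a, c, g, t); it is a summary for the reader, not a reproduction of equations from the original note.

```latex
% Renyi's alpha-order entropy for the base-frequency vector p = (a, c, g, t)
\[
  H_\alpha(p) \;=\; \frac{1}{1-\alpha}\,\ln\!\Bigl(\sum_i p_i^{\alpha}\Bigr),
  \qquad \alpha > 0,\ \alpha \neq 1 .
\]

% The limit alpha -> 1 recovers the Shannon entropy (a logarithmic function of the p_i):
\[
  \lim_{\alpha \to 1} H_\alpha(p) \;=\; H(p) \;=\; -\sum_i p_i \ln p_i .
\]

% The case alpha = 2 involves the power sum S = a^2 + c^2 + g^2 + t^2:
\[
  H_2(p) \;=\; -\ln\!\Bigl(\sum_i p_i^{2}\Bigr) \;=\; -\ln S ,
  \qquad\text{with } 1 - S \text{ the Gini--Simpson index.}
\]
```

The Shannon limit depends on the frequencies through logarithms of the individual p_i, whereas the α = 2 case is governed by the power sum S itself; this is the sense in which S and H arise as distinct special cases.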


Therefore, S and H are neither strictly derivable from nor equivalent to each other. In conclusion, the claims made in the comments (Elhaik et al., 2008) are wrong.

References

Elhaik, E., Graur, D., Josic, K., 2008. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences. Comp. Biol. Chem. 32, 147.

Rényi, A., 1961. On measures of information and entropy. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, Berkeley, CA, pp. 547–561.