Necessary relations for nucleotide frequencies

Necessary relations for nucleotide frequencies

Journal of Theoretical Biology 374 (2015) 179–182 Contents lists available at ScienceDirect Journal of Theoretical Biology journal homepage: www.els...

485KB Sizes 2 Downloads 17 Views

Journal of Theoretical Biology 374 (2015) 179–182

Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Letter to Editor

Necessary relations for nucleotide frequencies

art ic l e i nf o Keywords: Genome composition Combinatorics Dinucleotide frequency k-mer Analysis

a b s t r a c t Genome composition analysis of di-, tri- and tetra-nucleotide frequencies is known to be evolutionarily informative, and useful in metagenomic studies, where binning of raw sequence data is often an important first step. Patterns appearing in genome composition analysis may be due to evolutionary processes or purely mathematical relations. For example, the total number of dinucleotides in a sequence is equal to the sum of the individual totals of the sixteen types of dinucleotide, and this is entirely independent of any assumptions made regarding mutation or selection, or indeed any physical or chemical process. Before any statistical analysis can be attempted, a knowledge of all necessary mathematical relations is required. I show that 25% of di-, tri- and tetra-nucleotide frequencies can be written as simple sums and differences of the remainder. The vast majority of organisms have circular genomes, for which these relations are exact and necessary. In the case of linear molecules, the absolute error is very nearly zero, and does not grow with contiguous sequence length. As a result of the new, necessary relations presented here, the foundations of the statistical analysis of di-, tri- and tetranucleotide frequencies, and k-mer analysis in general, need to be revisited. & 2015 Elsevier Ltd. All rights reserved.

Theoretical work aimed at deciphering features of molecular evolution and the processes bearing on these features can only be effective if the most fundamental properties of sequences are clearly understood. In particular, patterns or regularities of a purely mathematical nature need to be separated from evolutionary signal. In the following, I describe such a type of mathematical pattern or regularity. I present it in its simplest form, since that is most likely to contribute to actual understanding. The analysis of nucleic acid sequences is a fundamental component of modern genomic and evolutionary analysis. The base composition of DNA has been studied since the earliest days of molecular biology (Freese, 1962), beginning with the empirical observations of Chargaff and the rules he derived from them (Chargaff and Davidson, 1955). As a step towards understanding the genetic code, combinatorics was successfully used (Brenner, 1957) to exclude the possibility of overlapping triplet codes (Gamow, 1954), and the results presented here continue that line of thought which associates pure mathematical proof with molecular reality. Dinucleotide frequencies are the frequencies of neighbouring pairs of nucleotides in the order they appear in a given sequence. For standard nucleic acid sequences, there are sixteen such pairs, and their relative abundances have long been known to be biologically and evolutionarily informative (Freese, 1962; Karlin and Burge, 1995). Here we show that four (25%) of these can be expressed as simple sums and differences of the remaining twelve, for purely combinatorial reasons. This is the first time that this level of dependency has been recognised. The new relations are mathematically exact for any circular molecule, including plasmids and the genomes of many viruses and the vast majority of cellular organisms, irrespective of assumed mutation rates or models, with or without selection. For linear molecules, http://dx.doi.org/10.1016/j.jtbi.2015.03.025 0022-5193/& 2015 Elsevier Ltd. All rights reserved.

the worst case error does not increase with contiguous sequence length. The same type of dependency exists between words of more than two nucleic acids, and the relations apply to metagenomic analysis. Since statistical analysis of any kind relies upon an understanding of what is independent and what is not (Walker, 1940), the assumptions made in statistical analyses of nucleic acid composition need to be revisited. Early studies of base composition tended to focus on the frequencies of base pairs (Freese, 1962; Karlin and Burge, 1995), but it has also become common, particularly in the context of metagenomics, to count triplets, quadruplets (tetranucleotides) (Pride et al., 2003; Teeling et al., 2004b) or longer words (k-mers) (Alneberg et al., 2014; Chor et al., 2009; Ragan and Chan, 2013), and these are applied in a wide variety of contexts (Hohl and Ragan, 2007; Karlin and Ladunga, 1994; Mrazek, 2009; Sims et al., 2009). Dinucleotide frequencies have continued to receive attention (Baran and Ko, 2008; Liu and Li, 2008; Palmeira et al., 2006; Simmonds et al., 2013), as exemplified by studies of CpG islands and CpG methylation (Arndt and Hwa, 2005; Bernardi, 2012; Zemach et al., 2010). Of particular relevance here is a careful study of CpG and TpA deficiencies in human isochores that revealed covariation due to mathematical consequences of dinucleotide overlap (Duret and Galtier, 2000). Starting from that observation, that overlap implies mathematical constraints, one can consider these constraints from an entirely abstract point of view. Here, an exact and complete account of all the mathematical consequences of dinucleotide, trinucleotide and tetranucleotide overlap is provided for the first time. This will impact statistical analysis of the composition of any type of nucleic acid, since what is revealed are necessary relations which until now had not been apparent, and such relations directly determine, among other things, the number of degrees of freedom (Walker, 1940). It is likely

180

Letter to Editor / Journal of Theoretical Biology 374 (2015) 179–182

that patterns seen in the analysis of raw sequence and gene expression data which are not specific to biological source (Zheng et al., 2011) are in part reflections of the dependencies exposed here. The new exact relations can also be used to design more logically consistent mutation models, since any consistent model must respect these dependencies. For a detailed case study of what this might entail, the reader is referred to the study by Duret and Galtier (2000), which improved upon an earlier mutation model (Sved and Bird, 1990) by taking aspects of dinucleotide overlap into account. The results presented here allow one to go to a more fundamental level, and design models which include the complete set of correct dependencies in their basic formulation. The positive impact of improved statistical analysis can be expected to increase the signal to noise ratio in metagenomics, and could be built in to the algorithms associated with large databases (Kryukov et al., 2012; Teeling et al., 2004a). To illustrate what is meant on a practical level, a new relation which can be directly applied in metagenomic composition analysis is provided below, and derived in the Supplementary material. The main result for dinucleotides is best illustrated by concrete examples. The number of times the pair “CG” appears in the DNA sequence of a circular genome can be computed using nothing more than the numbers of times the pairs “GC”, “AC”, “CA”, “TC” and “CT” appear, via the following exact formula: #CG ¼ #GC þ #AC #CA þ #TC  #CT:

ð1Þ

The circular mitochondrial genome of the fish Sardinops melanostictus (Inoue et al., 2000) is a non-trivial example. It has 956 “GC” pairs, 1138 “AC” pairs, 1181 “CA” pairs, 1131 “TC” pairs and 1389 “CT” pairs. Inserting these into the equation, one finds that there should be 956 þ1138 1181 þ1131  1389 or 655 “CG” pairs, and this is indeed the case. Note that it was not necessary to know the length of the genome, but the fact that it is circular does play a role. Before explaining our general approach, let us describe one direct way of deriving Eq. (1), using “N” as a wildcard which will match any nucleotide: One expects that #CN equals #NC, since both are effectively estimates of #C. Writing #CN ¼#NC, removing #CC from both sides and solving for #CG gives a rough derivation of Eq. (1). The role of circularity of the molecule becomes clear when one realises that #CN is exactly equal to #NC on a circular genome, but they may differ by at most one for a linear genome, with any difference due to a “C” at only one end. For example, the linear sequence “CACGT” can be broken down into the four dinucleotides “CA”, “AC”, “CG” and “GT”. Two begin with a “C”, so #CN ¼2. Only one ends in a “C”, so #NC¼1. One can see that #CG ¼ 1, but #GCþ #AC #CA þ#TC #CT¼ 0 þ1  1 þ0 0 ¼ 0, so Eq. (1) has an error of one (the left hand side equals 1, but the right hand side equals 0). If it were a circular molecule, there would be an extra dinucleotide “TC”, completing the circle by linking the final nucleotide back to the intitial nucleotide of “CACGT”. Eq. (1) would then be satisfied exactly. The general approach is elementary, and does not require full formal treatment. It will be useful to imagine a bracelet made of a piece of string threaded through a number of variously coloured beads and then tied in a loop (Fig. 1). Whatever the colours are, it will be possible to divide them up into two types. If the knot is positioned such that it is visible between two specific beads, then, once a direction has been chosen, the type of each bead can be read out in order, beginning with the first bead after the knot, and ending with the last bead reached before returning to the knot. Every bead has a successor, from which it may or may not differ in type. If, in passing from one bead to the next, the type changes from that of the first bead to the other type, this can be imagined as a step away from the starting type. If the next bead is of the same type as the first bead, and the current bead is not, then that can be imagined as a step back to the starting type. Now, since the bracelet is a loop, we must finally return to the first bead, so the

numbers of steps away and back must always be equal. This is a mathematical fact, not related to the materials the bracelet is made of, nor to the procedure by which the colours were divided, nor the position of the knot (Fig. 1). If one strand of a circular genome or plasmid takes the place of the bracelet, the same logic can be applied. Given the standard four nucleotides, “G”, “A”, “T” (for DNA) or “U” (for RNA), and “C”, then there are a number of choices which can be made concerning their type assignments. One could consider purines (i.e. “A” and “G”) as one type, and pyrimidines (“T” or “U”, and “C”) as another. In that case, the logic of the argument leads to the conclusion that the number of nucleotide pairs classified as purine pyrimidine must equal the number of pairs classified as pyrimidine purine. This equality translates into the equation #purine pyrimidine¼#pyrimidine purine, or, if the actual nucleotide combinations are written out in full, assuming DNA, #ATþ#ACþ #GTþ#GC¼ #TAþ#CAþ #TGþ#CG. A linear chromosome corresponds to an open string — a cut bracelet. Let the string be cut between the final bead and the knot. The last bead no longer has a successor, and so, if the first and last beads differ in type, one change in type may be missed. The equation may then only be an approximation, but since the worst case is that only one step has been missed, the greatest error is plus or minus one. Experimentally determined genome sequences can have many gaps. These correspond to many cuts in the string. One can treat each uncut, or contiguous, piece as a linear molecule. For each one, the worst case error is plus or minus one, so the worst case error for a gapped sequence is plus or minus the total number of contiguous subsequences. Human chromosome 1 (Gregory et al., 2006) (NCBI RefSeq (Pruitt et al., 2014) Accession NC_000001.11) has 10145272 “GC”, 11598278 “AC”, 16768284 “CA”, 13844699 “TC” and 16444797 “CT” dinucleotides. Using Eq. (1), one arrives at the approximation

Fig. 1. A bracelet made from eight coloured beads threaded on a black string which has been knotted. One could decide to group the reddish (“R”) and the greenish (“G”) colours together. Starting at the bead to the immediate right of the knot, the bracelet reads GRGGRRRGG, including the first bead once more at the end. The number of times “GR” appears (two) equals the number of times “RG” appears. Such combinatorial equalities are the basis of our approach.

Letter to Editor / Journal of Theoretical Biology 374 (2015) 179–182

#CGE2375168. The actual number of “CG” dinucleotides is 2375159, a difference of nine. Since the sequence contains 167 contiguous (gapless, and with all nucleotides determined) subsequences, the worst case error is plus or minus 167, and nine is clearly much smaller than that. The worst case relative error is very small (0.007%). This example serves to illustrate that the relations found here are extremely precise for linear sequences, more so than purely probabilistic arguments would typically predict (one would usually expect to see a growth proportional to the square root of length). Note that the presence of gaps is a property of an experimentally determined sequence, not of the molecule itself. The main result for dinucleotides is derived from eight equations, listed in the Supplementary material as Eqs. (5)–(12). Seven come from splits of the four nucleotides into two types, and the eighth expresses the fact that the total number of dinucleotides (#tot2) must be the sum of all the counts of specific ones. Four of these equations are redundant, meaning four (i.e. 8–4) dinucleotide counts can be calculated from the other twelve. One has some mathematical freedom in how this is done, and another set of four equations is provided in the Supplementary material to illustrate this. As an example, Eq. (1) and #TA ¼ #AT þ #GT  #TG þ #CT  #TC

ð2Þ

#AG ¼ #GA þ #CA  #AC þ #CT  #TC þ #GT  #TG

ð3Þ

#GG ¼ #tot2  #CC #AA  #TT  #AC  #CA þ #TG  2  ð#GA þ #GC þ #AT þ #CTÞ  3  #GT

ð4Þ

are four equations exact for circular molecules of any length. For linear molecules, the worst case error of Eqs. (1)–(3) is plus or minus one. For Eq. (4), it is at most plus or minus three. Our analysis divides the 16 possible nucleotides into two subsets: 4 composed of equal nucleotides (i.e. “GG”, “AA”, “TT” and “CC”), and 12 composed of unequal nucleotides. The four composed of equal nucleotides appear as the sum #GGþ #AAþ#TTþ#CC in Eq. (12), in the Supplementary material, expressing the fact that the total number of dinucleotides (#tot2) must be the sum of all the counts of specific ones, but they cannot, by construction, appear in any of the other equations derived from considering splits. The twelve dinucleotides composed of unequal or differing nucleotides appear symmetrically in the list of eight Eqs. ((5) to (12)) in the Supplementary material. Also note that these twelve dinucleotides appear in differences of the type #XY #YX in Eqs. (5)–(11), but in sums of the type #XYþ #YX in Eq. (12). These symmetries are broken by making a specific choice of equations to be used. Thus, the elimination of redundancy results in some dinucleotides composed of differing nucleotides appearing to belong to various groupings, but this apparent structure is purely a result of symmetry breaking due to an arbitrary choice. Can one write down additional non-redundant equations for dinucleotides? The answer is no. All the possible changes in pair counts that result from changing single nucleotides in a sequence can be expressed as vectors, and the dimension of the space they span can be computed in an elementary manner, using the rank (Strang, 2006) of the matrix (Supplementary Table 1) of possible dinucleotide changes induced by single nucleotide changes. The result is 12, indicating that four is the greatest number of dependent nucleotide pairs. 25% of dinucleotides can be expressed in terms of the others. Only 75% are independent. The same is of course true for single nucleotide counts, where only one equation applies (#G þ #A þ#T þ#C ¼#tot1), leaving three out of the four counts independent. Matrix rank analysis can be extended to trinucleotide and tetranucleotide counts, and the percentage of independent words remains at 75% for both, indicating that dinucleotides are not atypical in having so many combinatorial dependencies. Complete sets of necessary relations for both tri- and tetra-nucleotides can be computed from the nullspaces

181

(Strang, 2006) of the associated matrices, and these are listed in full in the Supplementary material. In the context of metagenomic binning analysis, it is meaningful to average counts over a sequence and its reverse complement. This changes the number of dependent quantities of interest (Kryukov et al., 2012), but the necessary relations presented here can still be applied, and provide new relations between these quantities of interest. An example, derived in the Supplementary material, is #AT  #TA ¼ ð#GA þ #TCÞ=2  ð#AG þ #CTÞ=2 þ ð#TG þ #CAÞ=2  ð#GT þ #ACÞ=2:

Computational experimentation suggests that, using “R” for purines, “Y” for pyrimidines, and “N” for any nucleotide as before, the mixed equation #RYRY  #YYRR¼#RY #RNY is also exact for circular molecules. It appears that the space of nucleotide frequencies abounds with interdependencies. Appendix A. Supporting information Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2015.03.025. References Alneberg, J., Bjarnason, B.S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U.Z., Lahti, L., Loman, N.J., Andersson, A.F., Quince, C., 2014. Binning Metagenomic Contigs by Coverage and Composition. Nat. Meth. 11, 1144–1146. http://dx.doi.org/ 10.1038/nmeth.3103. Arndt, P.F., Hwa, T., 2005. Identification and measurement of neighbor-dependent nucleotide substitution processes. Bioinformatics 21, 2322–2328. http://dx.doi. org/10.1093/Bioinformatics/Bti376. Baran, R.H., Ko, H., 2008. Detecting horizontally transferred and essential genes based on dinucleotide relative abundance. DNA Res. 15, 267–276. http://dx.doi. org/10.1093/Dnares/Dsn021. Bernardi, G., 2012. The genome: an isochore ensemble and its evolution. Eff. Genome Struct. Seq. Variation Evol. 1267, 31–34. http://dx.doi.org/10.1111/ J.1749-6632.2012.06591.X. Brenner, S., 1957. On the impossibility of all overlapping triplet codes in information transfer from nucleic acid to proteins. Proc. Natl. Acad. Sci. U.S.A. 43, 687–694. http://dx.doi.org/10.1073/Pnas.43.8.687. Chargaff, E., Davidson, J.N., 1955. The Nucleic Acids: Chemistry and Biology. Academic Press, New York. Chor, B., Horn, D., Goldman, N., Levy, Y., Massingham, T., 2009. Genomic DNA k-mer spectra: models and modalities. Genome Biol., 10. http://dx.doi.org/10.1186/Gb2009-10-10-R108. Duret, L., Galtier, N., 2000. The covariation between TpA deficiency, CpG deficiency, and G þ C content of human isochores is due to a mathematical artifact. Mol. Biol. Evol. 17, 1620–1625. Freese, E., 1962. On the evolution of the base composition of DNA. J. Theor. Biol. 3, 82–101. http://dx.doi.org/10.1016/S0022-5193(62)80005-8. Gamow, G., 1954. Possible relation between deoxyribonucleic acid and protein structures. Nature 173, http://dx.doi.org/10.1038/173318a0 318-318. Gregory, S.G., Barlow, K.F., Mclay, K.E., Kaul, R., Swarbreck, D., Dunham, A., Scott, C.E., Howe, K.L., Woodfine, K., Spencer, C.C.A., Jones, M.C., Gillson, C., Searle, S., Zhou, Y., Kokocinski, F., McDonald, L., Evans, R., Phillips, K., Atkinson, A., Cooper, R., Jones, C., Hall, R.E., Andrews, T.D., Lloyd, C., Ainscough, R., Almeida, J.P., Ambrose, K.D., Anderson, F., Andrew, R.W., Ashwell, R.I.S., Aubin, K., Babbage, A.K., Bagguley, C.L., Bailey, J., Beasley, H., Bethel, G., Bird, C.P., Bray-Allen, S., Brown, J.Y., Brown, A.J., Buckley, D., Burton, J., Bye, J., Carder, C., Chapman, J.C., Clark, S.Y., Clarke, G., Clee, C., Cobley, V., Collier, R.E., Corby, N., Coville, G.J., Davies, J., Deadman, R., Dunn, M., Earthrowl, M., Ellington, A.G., Errington, H., Frankish, A., Frankland, J., French, L., Garner, P., Garnett, J., Gay, L., Ghori, M.R.J., Gibson, R., Gilby, L.M., Gillett, W., Glithero, R.J., Grafham, D.V., Griffiths, C., Griffiths-Jones, S., Grocock, R., Hammond, S., Harrison, E.S.I., Hart, E., Haugen, E., Heath, P.D., Holmes, S., Holt, K., Howden, P.J., Hunt, A.R., Hunt, S.E., Hunter, G., Isherwood, J., James, R., Johnson, C., Johnson, D., Joy, A., Kay, M., Kershaw, J.K., Kibukawa, M., Kimberley, A.M., King, A., Knights, A.J., Lad, H., Laird, G., Lawlor, S., Leongamornlert, D.A., Lloyd, D.M., et al., 2006. The DNA sequence and biological annotation of human chromosome 1. Nature 441, 315–321. http://dx.doi.org/10.1038/Nature04727. Hohl, M., Ragan, M.A., 2007. Is multiple-sequence alignment required for accurate inference of phylogeny? Syst. Biol. 56, 206–221. http://dx.doi.org/10.1080/ 10635150701294741. Inoue, J.G., Miya, M., Tsukamoto, K., Nishida, M., 2000. Complete mitochondrial DNA sequence of the Japanese sardine Sardinops melanostictus. Fish. Sci. 66, 924–932. http://dx.doi.org/10.1046/J.1444-2906.2000.00148.X. Karlin, S., Ladunga, I., 1994. Comparisons of eukaryotic genomic sequences. Proc. Natl. Acad. Sci. U.S.A. 91, 12832–12836. http://dx.doi.org/10.1073/ Pnas.91.26.12832.

182

Letter to Editor / Journal of Theoretical Biology 374 (2015) 179–182

Karlin, S., Burge, C., 1995. Dinucleotide relative abundance extremes—a genomic signature. Trends Genet. 11, 283–290. Kryukov, K., Sumiyama, K., Ikeo, K., Gojobori, T., Saitou, N., 2012. A new database (GCD) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses. Genome Biol. Evol. 4, 501–512. http://dx. doi.org/10.1093/Gbe/Evs026. Liu, G.Q., Li, H., 2008. The correlation between recombination rate and dinucleotide bias in drosophila melanogaster. J. Mol. Evol. 67, 358–367. http://dx.doi.org/ 10.1007/S00239-008-9150-0. Mrazek, J., 2009. Phylogenetic signals in DNA composition: limitations and prospects. Mol. Biol. Evol. 26, 1163–1169. http://dx.doi.org/10.1093/Molbev/ Msp032. Palmeira, L., Gueguen, L., Lobry, J.R., 2006. UV-Targeted dinucleotides are not depleted in light-exposed prokaryotic genomes. Mol. Biol. Evol. 23, 2214–2219. http://dx.doi.org/10.1093/Molbev/Msl096. Pride, D.T., Meinersmann, R.J., Wassenaar, T.M., Blaser, M.J., 2003. Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res. 13, 145–158. http://dx.doi.org/10.1101/Gr.335003. Pruitt, K.D., Brown, G.R., Hiatt, S.M., Thibaud-Nissen, F., Astashyn, A., Ermolaeva, O., Farrell, C.M., Hart, J., Landrum, M.J., McGarvey, K.M., Murphy, M.R., O’Leary, N.A., Pujar, S., Rajput, B., Rangwala, S.H., Riddick, L.D., Shkeda, A., Sun, H.Z., Tamez, P., Tully, R.E., Wallin, C., Webb, D., Weber, J., Wu, W.D., DiCuccio, M., Kitts, P., Maglott, D.R., Murphy, T.D., Ostell, J.M., 2014. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–D763. http://dx.doi.org/10.1093/Nar/ Gkt1114. Ragan, M.A., Chan, C.X., 2013. Biological intuition in alignment-free methods: response to Posada. J. Mol. Evol. 77, 1–2. http://dx.doi.org/10.1007/S00239013-9573-0. Simmonds, P., Xia, W.J., Baillie, J.K., McKinnon, K., 2013. Modelling mutational and selection pressures on dinucleotides in eukaryotic phyla – selection against CpG and UpA in cytoplasmically expressed RNA and in RNA viruses. BMC Genomics, 14. http://dx.doi.org/10.1186/1471-2164-14-610. Sims, G.E., Jun, S.R., Wua, G.A., Kim, S.H., 2009. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Natl. Acad. Sci. U.S.A. 106, 2677–2682. http://dx.doi.org/10.1073/Pnas.0813249106.

Strang, G., 2006. Linear Algebra and its Applications. Thomson, Brooks/Cole, Belmont, CA. Sved, J., Bird, A., 1990. The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc. Natl. Acad. Sci. U.S.A. 87, 4692–4696. http://dx.doi.org/10.1073/Pnas.87.12.4692. Teeling, H., Waldmann, J., Lombardot, T., Bauer, M., Glockner, F.O., 2004a. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinf. 5, 163. http://dx. doi.org/10.1186/1471-2105-5-163. Teeling, H., Meyerdierks, A., Bauer, M., Amann, R., Glockner, F.O., 2004b. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ. Microbiol. 6, 938–947. http://dx.doi.org/10.1111/J.1462-2920.2004.00624.X. Walker, H.M., 1940. Degrees of freedom. Psychol. Rev. 31, 253–269. Zemach, A., McDaniel, I.E., Silva, P., Zilberman, D., 2010. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328, 916–919. http://dx.doi. org/10.1126/Science.1186366. Zheng, W., Chung, L.M., Zhao, H.Y., 2011. Bias detection and correction in RNAsequencing data. BMC Bioinf., 12. http://dx.doi.org/10.1186/1471-2105-12-290.

Robert Sinclair n Mathematical Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son 904-0495, Okinawa, Japan E-mail address: [email protected] Received 11 November 2014 1 February 2015 21 March 2015 Available online 3 April 2015

n

Tel.: þ 81 98 966 8624; fax: þ 81 98 966 1062.