Small introns tend to occur in GC-rich regions in some but not all vertebrates




COMMENT

Genetic redundancy

which combinations of the ABC genes act as ‘selectors’ to determine the identity of the floral organs12,13. In other words, action of A-class genes is sufficient to specify sepal identity, A and B for petal, B and C for stamen, and C for carpel identity. At the time the ABC model was first proposed, only one A-gene function had been identified, APETALA2 (AP2), but AP1 was subsequently also characterized as having A-class gene function in addition to its meristem identity role4,6. Further complicating the issue, AP2 encodes a product that is similar to a large family of transcription factors, and at least one partially redundant gene, AINTEGUMENTA (ANT), has now been identified14–16. The genetic evidence for the combinatorial action of the ABC genes is largely based on the novel phenotypes displayed by double-mutant combinations. For instance, plants that are doubly mutant for the A-class gene AP2 and the B-class gene PISTILLATA (PI) display a novel phenotype that differs from that of either single mutant. However, redundancy in the AP2 gene family implies that the ap2 pi ‘novel’ double-mutant phenotype might actually result from the disruption of a single linear genetic pathway. Complete loss of all A-gene function could have more dramatic effects than those predicted by the model; support for this idea comes from the phenotype of ant ap2 double mutants, which lack most floral organs15. It is possible that there is no A function per se, but, rather, that the action of all the ‘A-class’ genes (including the AP2 and AP1 family members) is required for the formation of a floral meristem, which in turn activates the C- and B-class genes via LFY. Although certainly speculative, this might account for a number of discrepancies with the original model. Of course, the idea that redundancy affects our interpretation of genetics is hardly new17. Similarly, the formal interpretations of double-mutant phenotypes are well established18. However, it is only with the analysis of whole genomes that the extent of the redundancy problem has become apparent genetically. If these musings reflect reality, we might need to rewrite much, if not most, of our favorite models in developmental genetics that rely on mutations in gene families for their interpretation.

References
1 Marra, M. et al. High throughput bacterial artificial chromosome fingerprinting of the Arabidopsis genome. Nat. Genet. (in press)
2 Martienssen, R.A. (1998) Functional genomics: probing plant gene function and expression with transposons. Proc. Natl. Acad. Sci. U. S. A. 95, 2021–2026
3 Chory, J. (1993) Out of darkness: mutants reveal pathways controlling light-regulated development in plants. Trends Genet. 9, 167–172
4 Irish, V.F. and Sussex, I.M. (1990) Function of the apetala-1 gene during Arabidopsis floral development. Plant Cell 2, 741–753
5 Weigel, D. et al. (1992) LEAFY controls floral meristem identity in Arabidopsis. Cell 69, 843–859
6 Bowman, J.L. et al. (1993) Control of flower development in Arabidopsis thaliana by APETALA1 and interacting genes. Development 119, 721–743
7 Huala, E. and Sussex, I.M. (1992) LEAFY interacts with floral homeotic genes to regulate Arabidopsis floral development. Plant Cell 4, 901–913
8 Mandel, M.A. et al. (1992) Molecular characterization of the Arabidopsis floral homeotic gene APETALA1. Nature 360, 273–277
9 Kempin, S. et al. (1995) Molecular basis of the cauliflower phenotype in Arabidopsis. Science 267, 522–525
10 Mandel, M.A. and Yanofsky, M.F. (1995) The Arabidopsis AGL8 MADS box gene is expressed in inflorescence meristems and is negatively regulated by APETALA1. Plant Cell 7, 1763–1771
11 Hempel, F.D. et al. (1997) Floral determination and expression of floral regulatory genes in Arabidopsis. Development 124, 3845–3853
12 Bowman, J.L. et al. (1989) Genes directing flower development in Arabidopsis. Plant Cell 1, 37–52
13 Coen, E.S. and Meyerowitz, E.M. (1991) The war of the whorls: genetic interactions controlling flower development. Nature 353, 31–37
14 Weigel, D. (1995) The APETALA2 domain is related to a novel type of DNA binding domain. Plant Cell 7, 388–389
15 Elliott, R.C. et al. (1996) AINTEGUMENTA, an APETALA2-like gene of Arabidopsis with pleiotropic roles in ovule development and floral organ growth. Plant Cell 8, 155–168
16 Klucher, K.M. et al. (1996) The AINTEGUMENTA gene of Arabidopsis required for ovule and female gametophyte development is related to the floral homeotic gene APETALA2. Plant Cell 8, 137–153
17 Pickett, F.B. and Meeks-Wagner, D.R. (1995) Seeing double: appreciating genetic redundancy. Plant Cell 7, 1347–1356
18 Avery, L. and Wasserman, S. (1992) Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends Genet. 8, 312–316

GENOME ANALYSIS

Outlook

There exists considerable variation in the size of introns, both within and between species. It has been reported that, for some mammals and birds, genes in GC-rich isochores might be both shorter (in terms of total intron size plus total exon size) and more compact (in terms of total intron size divided by total exon size) than genes in isochores of lower GC content1. Does this mean that introns are shorter in GC-rich regions, and is this true of vertebrates generally? To address these questions, we analysed the covariance of intron size and local GC composition in a mammal, a bird, a fish and an amphibian.
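The two gene-level measures mentioned above (reported in ref. 1) can be computed as in the Python sketch below. The class name, method names and all exon and intron lengths are hypothetical and chosen only for illustration. The example also shows why gene-level measures do not settle the question about individual introns. Two genes with identical mean intron size can differ fourfold in intron-to-exon ratio simply because they carry different numbers of introns.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeneArchitecture:
    """Exon and intron lengths (bp) of one fully sequenced gene (hypothetical)."""
    exon_lengths: Tuple[int, ...]
    intron_lengths: Tuple[int, ...]

    def __post_init__(self) -> None:
        # A gene with k introns has k + 1 exons; intronless genes are out of scope.
        if not self.intron_lengths:
            raise ValueError("gene must contain at least one intron")
        if len(self.exon_lengths) != len(self.intron_lengths) + 1:
            raise ValueError("expected exactly one more exon than intron")

    def size(self) -> int:
        """'Shorter' in the sense above: total intron size plus total exon size."""
        return sum(self.intron_lengths) + sum(self.exon_lengths)

    def intron_exon_ratio(self) -> float:
        """'More compact' means a lower value of total intron / total exon size."""
        return sum(self.intron_lengths) / sum(self.exon_lengths)

    def mean_intron_length(self) -> float:
        """Average size of an individual intron, the quantity asked about above."""
        return sum(self.intron_lengths) / len(self.intron_lengths)


# Hypothetical genes: same coding length (470 bp) and same mean intron size,
# but different numbers of introns.
gene_few = GeneArchitecture(exon_lengths=(235, 235), intron_lengths=(300,))
gene_many = GeneArchitecture(exon_lengths=(94, 94, 94, 94, 94),
                             intron_lengths=(300, 300, 300, 300))

for name, gene in (("one intron", gene_few), ("four introns", gene_many)):
    print(f"{name}: size={gene.size()} bp, "
          f"intron/exon={gene.intron_exon_ratio():.2f}, "
          f"mean intron={gene.mean_intron_length():.0f} bp")
```

Both hypothetical genes have a mean intron size of 300 bp. The four-intron gene is nevertheless both longer and less compact.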

0168-9525/99/$ – see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(99)01832-6

TIG November 1999, volume 15, No. 11

Laurence D. Hurst, Clair F.A. Brunton and *Nicholas G.C. Smith
Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK BA2 7AY.
*School of Biological Sciences, University of Sussex, Brighton, UK BN1 9QG.

A covariance of intron size with GC composition would potentially shed light on some of the forces affecting intron dimensions, because proportional GC content and recombination rate are known to covary within mammals2. Indeed, it has been hypothesized that, if recombination induces deletions, introns might be smaller in GC-rich regions owing to a mutational bias1. However, an alternative, selectionist model can also be imagined. If longer introns are slightly deleterious, then small deletions and insertions are more likely to be ‘seen’ by selection in GC-rich regions, because selection is more efficient when the local recombination rate is high3. Here, we investigate this putative link between recombination rate, GC content and intron size.

Accession numbers were obtained from the FTP site indicated in Duret et al.1 and from the Hovergen database4. Data extraction from each database entry was automated using information in the GenBank accession files, which allowed us to calculate the size of each intron. Partial intron sequences were excluded. The complete exons of each gene were also extracted and a GC3 percentage (i.e. the GC content at the third codon position) was calculated. The raw intron-size data are skewed, with a tail towards large introns, so they were log-transformed. Within humans we find a significant negative correlation between GC content and intron size [Log(intron size) = −0.49 GC3 + 2.93; P value on slope < 0.0001, N = 1211 introns; see Fig. 1]. This is consistent with the hypothesis that introns are smaller where the recombination rate is highest. Analyses of mouse (L. Duret, pers. commun.) and rat data (P < 0.002) indicate the same tendency. Limitations due to sample size prevent firm conclusions from being drawn for other mammals (see below). It should be noted, however, that although the statistics are very highly significant, there is also very considerable residual variance in intron size that is not explained by GC content (see Fig. 1).

FIGURE 1. Proportional GC3 content versus intron size

Regression of the proportional GC3 content (1.0 = 100%) against log of intron size for 1211 Homo sapiens introns. The proportional GC3 content correlates negatively with intron size (P < 0.0001). To some extent, this result is influenced by an abundance of especially small introns at especially high GC3 levels. However, restricting the analysis to introns with a proportional GC3 content below 70% still yields a significant, though weaker, correlation (P = 0.013).

Data from outside the mammals suggest a potential difference between cold- and warm-blooded species, as there might be in the case of isochores5. Chickens, like mammals, have significantly larger introns in AT-rich isochores [Log(intron size) = −0.67 GC3% + 2.88; P value on slope < 0.0001, N = 313 introns]. In Xenopus there is a significant positive correlation between GC3 percentage and intron size, indicating the opposite pattern [Log(intron size) = 2.28 GC3% + 1.51; P value on slope < 0.0001, N = 83 introns]. In Fugu the correlation is also positive and near significance [Log(intron size) = 0.5 GC3% + 1.93; P value on slope = 0.09, N = 260 introns]. The possible warm-blooded versus cold-blooded dichotomy needs further elucidation. This is particularly so because the Xenopus sample is considerably smaller than the Fugu sample, yet it is Xenopus that yields the significant statistic.

It must also be noted that comparative studies of intron size are heavily influenced by ascertainment biases, because small introns tend to be fully sequenced earlier and more often than large ones. For example, in nine mammalian species that we have examined, the best predictor of a species' mean intron size is the number of introns that have been sequenced (Spearman rank correlation of mean intron size per species versus number of introns sampled per species, P < 0.01). Similarly, our analysis (data not shown) of orthologous mouse–rat introns finds their mean size to be around two-thirds that of a non-orthologous set that is an order of magnitude larger. This is to be expected if small introns are sequenced first, because the probability that the same intron has been sequenced in two species is approximately a function of the square of the probability that it has been sequenced in one. Given the possibility of ascertainment problems, we are cautious about the Xenopus result. However, because the same pattern is seen in Fugu, we consider the finding worth reporting.
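The log-linear fits reported above can be read as predictions of typical intron size at a given GC3. The Python sketch below fits such a line by ordinary least squares on log-transformed sizes and back-transforms the four reported lines. Two assumptions are made here, not by the authors. First, 'Log' is taken to be base 10. Second, GC3 is treated as a proportion (1.0 = 100%, as in the Fig. 1 caption) in every equation, even where the text writes GC3%. The function names are hypothetical. The sketch gives point estimates only: it computes no P values and ignores the large residual variance noted above.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_log_size_on_gc3(gc3: Sequence[float],
                        sizes_bp: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept) for log10(size) = slope * GC3 + intercept."""
    if len(gc3) != len(sizes_bp) or len(gc3) < 2:
        raise ValueError("need at least two paired observations")
    # Log-transform sizes to reduce the long right tail of large introns.
    y = [math.log10(s) for s in sizes_bp]
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values must not all be equal")
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(gc3, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_size_bp(slope: float, intercept: float, gc3: float) -> float:
    """Back-transform a fitted line to an intron size in bp (assumes base-10 logs)."""
    return 10 ** (slope * gc3 + intercept)


# (slope, intercept) pairs as reported in the text; GC3 assumed to be a proportion.
REPORTED_FITS: Dict[str, Tuple[float, float]] = {
    "Homo sapiens": (-0.49, 2.93),
    "chicken": (-0.67, 2.88),
    "Xenopus": (2.28, 1.51),
    "Fugu": (0.5, 1.93),
}

for species, (slope, intercept) in REPORTED_FITS.items():
    low = predicted_size_bp(slope, intercept, 0.4)
    high = predicted_size_bp(slope, intercept, 0.8)
    print(f"{species:>12}: GC3=0.4 -> {low:5.0f} bp, GC3=0.8 -> {high:5.0f} bp")
```

Under these assumptions, the human line implies that typical intron size falls by a factor of about 10^(−0.49 × 0.4) ≈ 0.64 between GC3 = 0.4 and GC3 = 0.8. The Xenopus line implies a rise over the same range.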
Considering these difficulties, and given that the GC contents of Xenopus and human genes are correlated5, it is worth asking whether the sizes of orthologous introns in Xenopus and humans are also correlated, and how this might relate to the above result. We searched Hovergen for orthologous Xenopus–human genes and found a total of 33 orthologous introns. Unfortunately, this sample is too small to support many firm conclusions. Importantly, in this sample the proportional GC3 content and log intron size of the Xenopus genes do not correlate, so the sample may not be representative. We can, however, report that on average human and Xenopus introns do not differ in size (mean of human intron/Xenopus intron = 1.04), although there is very considerable variation (standard deviation of human intron/Xenopus intron = 1.21). Some Xenopus introns are over tenfold larger than the human orthologue, whereas some human ones are nearly five times larger than the Xenopus orthologue. There is, at best, only a weak tendency for the intron sizes to correlate [Log(human intron size) = 1.46 + 0.378 Log(Xenopus intron size), P = 0.1]. We do not find any significant relationship between the GC3 content of the flanking exons in Xenopus and the size ratio of the orthologues. The only strong conclusion is that, given the amount of size variation between orthologues, the introns have clearly undergone plenty of size evolution.

Given the diversity of the proportional GC3 content versus intron size results, one might suspect that intron size, GC3 content and recombination rate are not generally causally related. Indeed, three-way alignment of orthologous mammalian introns indicates a mutational bias in favour of deletions, yet no covariance between GC content and the size difference between introns could be found. Intron size difference and local mutation rate, by contrast, do covary6. An alternative possibility is that recombination, proportional GC content and intron size do covary, but not in the same way in all vertebrates. It might be that in chickens and mammals there is more recombination in GC-rich regions, whereas in cold-blooded species the opposite pattern is found. We are unaware of pertinent data to test this prediction in Xenopus and Fugu (or indeed in any other cold-blooded vertebrate). However, we can test the hypothesis in chickens. If proportional GC content and recombination rate were related in the above manner, then Z-linked genes should have a lower GC content than autosomal genes, because the Z chromosome recombines in males alone (except in the pseudoautosomal region). The W chromosome should have an even lower figure because it never recombines.

Chickmap (www.ri.bbsrc.ac.uk/chickmap/) provides information on the map positions of chicken genes. From this resource we derived a list of genes whose full cDNA was known and that were also known to be either Z-linked (N = 7) or autosomal (N = 23). The Z-linked genes have a mean proportional GC3 content of 41%, compared with 67% for the autosomal sequences. These figures are highly significantly different (Mann-Whitney U test, P = 0.0006). Our figures appear to be consistent with an analysis of all 1454 chicken coding sequences described in GenBank, whose mean proportional GC3 content is 60.4% (from www.dna.affrc.go.jp/~nakamura/CUTG.html). Our autosomal figure is expected to lie above this value because some (unknown) proportion of the 1454 genes are Z-linked. Only four sequences with putative open reading frames have been described on the chicken W chromosome. With a mean proportional GC3 content of 34.7%, these have, as expected, a lower mean GC3 content than both Z-linked and autosomal genes. However, although this figure is significantly lower than the autosomal figure (Mann-Whitney U test, P = 0.0009), it is not significantly different from that of the Z-chromosome sequences (Mann-Whitney U test, P = 0.149). With a total sample size of only 11, this should not be taken as a strong rejection.

We conclude that, at least in some warm-blooded vertebrate species, there is a significant tendency for introns to be smaller in GC-rich regions. We have failed to reject the hypothesis that, in these species, the recombination rate also covaries positively with GC content. Hence, we cannot reject the hypothesis that recombination explains some (but possibly not much) of the variation in intron size. We cannot tell, however, whether this is the result of stronger selection associated with recombination or of a mutational bias. Analysis of the recombination pattern in cold-blooded species will provide a further test of the proposed link between recombination and intron size.

Acknowledgements We thank two anonymous referees for their comments and L. Duret for access to unpublished data.

References
1 Duret, L. et al. (1995) Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40, 308–317
2 Eyre-Walker, A. (1993) Recombination and mammalian genome evolution. Proc. R. Soc. London B 252, 237–243
3 Nordborg, M. et al. (1996) The effect of recombination on background selection. Genet. Res. 67, 159–174
4 Duret, L. et al. (1994) Hovergen – a database of homologous vertebrate genes. Nucleic Acids Res. 22, 2360–2365
5 Bernardi, G. et al. (1997) The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44, 44–51
6 Ogata, H. et al. (1996) The size differences among mammalian introns are due to the accumulation of small deletions. FEBS Lett. 390, 99–103
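The chicken comparisons in the article above (Z-linked versus autosomal GC3, and W-linked versus Z-linked and autosomal GC3) rely on the Mann-Whitney U test. The Python sketch below computes the U statistic from pooled ranks and adds a normal-approximation P value. The GC3 values are hypothetical placeholders, not the Chickmap data, and the function names are illustrative. The normal approximation is only rough at sample sizes as small as 4 versus 7, where an exact test is preferable. In practice, a library routine such as SciPy's scipy.stats.mannwhitneyu would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, approximate two-sided P value).

    P uses the normal approximation with no tie or continuity correction,
    so it is only a rough guide for very small samples."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical GC3 proportions, NOT the Chickmap genes analysed in the article.
z_linked_gc3 = [0.38, 0.41, 0.44, 0.36, 0.47, 0.40, 0.43]
autosomal_gc3 = [0.62, 0.70, 0.66, 0.58, 0.73, 0.69, 0.64, 0.71]

u, p = mann_whitney_u(z_linked_gc3, autosomal_gc3)
print(f"U = {u}, approximate two-sided P = {p:.4f}")
```

Here U = 0 because every hypothetical Z-linked value lies below every autosomal value. A U near n1 × n2 / 2 would indicate no shift between the two groups.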

A conserved RNA structure element involved in the regulation of bacterial riboflavin synthesis genes

0168-9525/99/$ – see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(99)01856-9

Large-scale sequencing of bacterial genomes has opened a new era in computational genomics. Gene complements are successfully analysed at the protein level. However, it has been noted that regulatory sites are less conserved than genes1, although genomic comparisons can be used to predict gene regulation at the level of DNA (Ref. 2) and RNA (Refs 3, 4). Indeed, comparative analysis is one of the standard methods for predicting RNA secondary structure. It has been used to determine the spatial structure of stable RNAs (Refs 5–7), and to analyse regulatory RNAs, mostly in viral genomes8,9. In such studies, either the common fold is selected from among many predicted suboptimal folds for a set of RNAs (Ref. 8), or analysis of complementary substitutions in aligned sequences is used to construct a single conserved structure7,9. Non-viral regulatory RNA elements often involve conserved structural elements and conserved nucleotides. Examples include iron-responsive elements10 and the sites that regulate the initiation of translation in operons of ribosomal proteins3,4,11. Previously, we identified a regulatory region upstream of the riboflavin operon of Bacillus subtilis and Bacillus amyloliquefaciens. Mutations in this region influence the level of riboflavin synthesis12,13. Surprisingly, this
