Control of Gene Expression, Posttranscriptional Regulation
DJ Murphy, University of South Wales, Pontypridd, UK
© 2017 Elsevier Ltd. All rights reserved.
This article is a revision of the previous edition article by R.M. Twyman, volume 1, pp. 549–557, © 2003, Elsevier Ltd.
Nomenclature
Cap-binding complex: A protein complex that assembles on the 5′ end of the mRNA and helps to recruit the small ribosomal subunit.
Codon bias: The species-specific preference for certain codons to represent particular degenerate amino acids.
Codon optimization: The modification of the coding region of a transgene to match the codon bias of the expression host.
Context-dependent silencing: The recognition of a transgene as foreign DNA due to dissimilarities in base composition compared to surrounding genomic DNA.
Cosuppression: The epigenetic silencing of an endogenous gene that is homologous to an integrated transgene.
Dicer: An endonuclease that recognizes and cleaves double-stranded RNA molecules.
Exon: The segments of the primary transcript that are retained in the mature transcript.
Hairpin RNA: An RNA molecule that is self-complementary and folds back on itself.
Homology-dependent silencing: A broad term that includes any form of gene silencing (transcriptional and posttranscriptional) provoked by the presence of multiple copies of the same sequence in the genome.
Initiation codon: The first codon in an open reading frame; usually AUG but occasionally others.
Initiation complex: The complex of proteins required to initiate protein synthesis; required to help the ribosome interact with the mRNA.
Intron: An intervening sequence in the transcribed part of a gene that is removed from the mature transcript.
Intron-mediated enhancement: The ability of a heterologous intron to increase the transcriptional activity of a promoter in transgenic plants.
KDEL: A C-terminal tetrapeptide that causes proteins to be recycled from the Golgi apparatus to the endoplasmic reticulum.
Kozak consensus: The sequence surrounding the initiation codon, which is required for efficient assembly of the initiation complex.
Mature transcript: The final (processed) product of transcription. In the case of protein-encoding genes, this is also known as a messenger RNA (mRNA).
Micro-RNAs (miRNAs): A ubiquitous class of endogenous small RNAs, typically of about 22 nucleotides in length, that are produced by different genes from those that they regulate.
Molecular chaperone: A protein whose function is to help fold or refold another protein.
Nonsense-mediated decay: An alternative mRNA degradation pathway that specifically acts on mRNAs with misplaced termination codons.
Posttranscriptional silencing: Gene silencing that requires transcription and results from the rapid degradation of RNA (also known as RNA silencing); when applied deliberately to inactivate particular genes, it is known as RNA interference.
Primary transcript: The initial (unprocessed) product of transcription, typically containing introns that need to be spliced out; in the case of protein-encoding genes, this is also known as pre-mRNA.
Regulatory element: A cis-acting element that regulates gene expression.
RNA editing: A special form of RNA processing in which the information content of the coding region is changed after transcription by the substitution, insertion, or deletion of one or more nucleotides.
Shine–Dalgarno sequence: The sequence in bacterial genes upstream of the initiation codon, which is complementary to the 16S rRNA and helps to position the ribosome correctly. Similar sequences are found in some plastid genes but not others.
Signal peptide: A short peptide sequence present at the extreme N-terminus of a protein that directs the ribosome to receptors on the endoplasmic reticulum and inserts the protein into the lumen, the first compartment of the secretory pathway.
Small interfering RNA (siRNA): Short segments of double-stranded RNA, typically of 50–250 nucleotides in length; in many cases, siRNAs are the products of Dicer endonucleases, which interact with homologous RNA molecules and result in their rapid degradation.
Spliceosome: The complex of RNAs and proteins that carries out splicing.
Splicing: Removal of introns from the primary transcript.
Steady state level: The amount of a specific RNA or protein present in the cell at any one time, a function of the rate of synthesis and the rate of degradation.
Transcriptional silencing: Gene silencing involving the loss of transcriptional activity; it results from the sequestration of DNA into inactive and often hypermethylated chromatin.
Transgene: A foreign gene introduced into a plant, such plants being described as transgenic if the transgene is nuclear or transplastomic if the transgene is integrated in the plastid genome.
Transgene silencing: Epigenetic loss of transgene expression; can occur at either the transcriptional or posttranscriptional levels.
Encyclopedia of Applied Plant Sciences, 2nd edition, Volume 2
http://dx.doi.org/10.1016/B978-0-12-394807-6.00222-7
Plant Breeding and Genetics | Control of Gene Expression, Posttranscriptional Regulation
Introduction
The biological information that specifies the phenotype of a plant is encoded in its genes. Crop improvement by genetic modification modifies existing genes or introduces new genes, which results in a different version of the original plant genome. The aim is to produce a change of phenotype that is beneficial to humans. However, the presence of new genes is not in itself sufficient to cause a change in phenotype. In order for this to occur, genes must also be expressed, first to produce RNA transcripts and subsequently proteins. The control of gene expression is, therefore, of paramount importance in the development of useful transgenic plants. Transgene expression is regulated by sequences present in the expression construct and factors intrinsic to the host plant.
The first level of gene expression is transcription. While transcription is the predominant level of gene regulation, at least for nuclear genes, research over the last decade has revealed that the abundance and distribution of a gene’s final product also depends on posttranscriptional regulation, which occurs at many different levels and provides immense scope for increasing the diversity of gene products. After transcription, the primary transcript must be processed in a variety of ways to yield a mature transcript. Most nuclear genes of higher plants contain introns that must be removed, and variations in the pattern of splicing can produce different messenger RNAs (mRNAs) called splice variants that encode distinct proteins. An important and often-overlooked level of regulation is the control of mRNA stability, since the steady state level of a transcript depends not only on its rate of synthesis, but also on its rate of degradation. Most transgenes encode proteins, so protein synthesis is another potential opportunity for gene regulation. In the plastid genome, regulation at the level of mRNA stability and translation is thought to be more important than transcriptional regulation.
Once the protein has been synthesized, it folds into its functional conformation and is often chemically modified, for example, by phosphorylation or glycosylation. Folding and modification are important determinants of protein stability and depend to a large extent on where the protein is localized in the cell. The abundance of each protein depends not only on its rate of synthesis, but also on its rate of degradation. This may be rapid for some proteins (e.g., the D1 protein in photosystem II) but extremely slow for others (e.g., seed storage proteins).
The posttranscriptional regulation of gene expression is controlled by regulatory elements present in the primary transcript and mature transcript, and by peptide signal sequences in the protein. All of these elements are provided by the gene and, to a certain extent, they can be recombined in expression constructs to control the posttranscriptional regulation of transgene expression precisely. In addition to these genetic mechanisms, however, nuclear genes are also subject to host-dependent epigenetic regulation, which can reduce the steady state level of mRNA in the cell by increasing the rate of mRNA degradation. This is known as posttranscriptional silencing or RNA silencing and appears to be triggered by double-stranded RNA (dsRNA). The silencing mechanism is thought to have evolved to protect plants against RNA viruses, which utilize a dsRNA replication intermediate. Integrated transgenes can produce dsRNA if they are arranged as inverted repeats or if normal transcripts undergo self-priming. If the integrated transgenes are homologous to endogenous genes, the latter can also be silenced at the posttranscriptional level, a phenomenon known as cosuppression. There appears to be some cross talk between RNA silencing and transcriptional silencing mechanisms because there is evidence of chromatin remodeling and DNA methylation in some transgenes and endogenous genes that are silenced at the RNA level.
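The Introduction's point that a steady state level depends on both synthesis and degradation, for mRNA and protein alike, can be made explicit with a simple worked model. This is only a sketch: it assumes a constant synthesis rate and first-order decay, and the symbols $M$ (amount of the RNA or protein), $k_s$ (synthesis rate), $k_d$ (decay rate constant), and $t_{1/2}$ (half-life) are introduced here for illustration and do not appear in the original article.

```latex
\begin{align}
\frac{dM}{dt} &= k_s - k_d M \\
M_{\mathrm{ss}} &= \frac{k_s}{k_d} \quad \text{(setting } dM/dt = 0\text{)} \\
t_{1/2} &= \frac{\ln 2}{k_d} \;\Rightarrow\; M_{\mathrm{ss}} = \frac{k_s\, t_{1/2}}{\ln 2}
\end{align}
```

Under these assumptions, doubling the half-life of a transcript raises its steady state level by the same factor as doubling its rate of transcription.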
RNA Processing
Splicing
Higher plant genes usually contain introns and these must be spliced out after transcription. Intron splicing in plants bears all the characteristic hallmarks of other eukaryotic systems, although plant introns are on average much smaller than those of vertebrates and have a much higher proportion of uridine residues. Like those of vertebrates, most plant introns follow the so-called GU–AG rule, with AG↓GUAAGU and UGCAG↓G consensus sequences (↓ marking the splice junction) representing the 5′ and 3′ splice sites, respectively. Splicing involves two sequential transesterification reactions, the first joining the 5′ splice site to an internal branch point in the intron, creating a lariat intermediate, and the second joining the 5′ and 3′ exons and eliminating the intron. This process is preceded by the assembly of a ribonucleoprotein complex called the spliceosome from small nuclear ribonucleoprotein particles (snRNPs) and associated proteins. As is the case for vertebrates, a minority of plant genes contain AU–AC introns that are processed by an alternative spliceosome assembly.
One of the major differences between plant and vertebrate genes is the presence of AU-rich sequences in plant introns. These are required for efficient splicing, and may either reduce secondary structure or bind plant-specific components of the splicing apparatus. In support of the latter hypothesis, a protein called UBP-1 isolated from tobacco (Nicotiana tabacum) nuclei has been shown to recognize oligouridylate sequences in vitro and enhance splicing in vivo. Transgene expression in plants is occasionally disrupted by the presence of cryptic AU-rich sequences that are recognized by the splicing apparatus, as occurs in unmodified versions of Bacillus thuringiensis (Bt) toxin genes. The other major difference between plant and vertebrate genes is the difference in intron size, which may reflect the mechanism of intron recognition.
In vertebrates, which typically have small exons and very large introns, the introns are thought to be delimited by protein complexes that span the exon (exon-definition). In plants, with smaller introns, both exon-definition and intron-definition mechanisms may operate.
Alternative splicing, resulting in multiple mRNAs from the same primary transcript, is a relatively common phenomenon in vertebrates. The prevalence of alternative splicing in plants is unknown, and where alternative splicing has been documented its biological significance is often unclear. A recent report with relevance to crop improvement is the demonstration of alternative exon inclusion in the tobacco N gene for resistance to tobacco mosaic virus (TMV). The N gene produces two mRNAs, NS and NL, of which NS is more abundant in noninfected plants. NL becomes prevalent after infection and is required to confer complete resistance to TMV. As in animals, SR proteins (serine/arginine-rich proteins) play an important role in regulating splice-site selection by binding to splicing enhancers or silencers. Varying levels of specific SR proteins in different tissues might be responsible for cell-specific or developmentally regulated differences in mRNA splicing patterns.
One major practical application of splicing in transgenic plants is the improvement of transgene expression levels by intron-mediated enhancement (IME). In the best cases, the inclusion of an intron enhances transgene expression several hundred-fold, although this depends on the intron, the flanking exon sequences, the promoter, the cell type, and the plant species. IME is generally more effective in monocotyledons than dicotyledons, although 30-fold enhancement has been achieved in Arabidopsis. The molecular basis of intron-enhanced transgene expression has yet to be investigated in detail.
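The GU–AG rule and the AU-richness of plant introns described in this section can be illustrated with a small scanning routine. The following is a minimal sketch, not a splice-site predictor: the function name `candidate_introns` and the `min_len` threshold are hypothetical, and real intron recognition depends on the fuller consensus sequences, the branch point, and the protein factors discussed above. Scanning of this kind also shows how cryptic GU...AG spans in an AU-rich transgene could resemble introns.

```python
def candidate_introns(pre_mrna, min_len=60):
    """List (start, end, au_fraction) for every span that begins with GU,
    ends with AG, and is at least min_len nucleotides long.

    min_len is an illustrative assumption, not a value from the article.
    """
    hits = []
    for start in range(len(pre_mrna) - 1):
        if pre_mrna[start:start + 2] != "GU":
            continue  # the 5' end of the span must be GU
        for end in range(start + min_len, len(pre_mrna) + 1):
            if pre_mrna[end - 2:end] == "AG":  # the 3' end must be AG
                intron = pre_mrna[start:end]
                au = (intron.count("A") + intron.count("U")) / len(intron)
                hits.append((start, end, au))
    return hits


if __name__ == "__main__":
    # Toy transcript: exon "CCAG", a 24-nt U-rich intron, exon "GCC".
    toy = "CCAG" + "GU" + "AUUU" * 5 + "AG" + "GCC"
    for start, end, au in candidate_introns(toy, min_len=20):
        print(start, end, round(au, 2))
```

The AU fraction is reported alongside each span because, as noted above, AU-rich sequence is a feature of efficiently spliced plant introns.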
Messenger RNA Stability
The steady state level of a given mRNA depends on its rate of transcription and its rate of turnover. In both animals and plants, the half-life of endogenous mRNAs varies from minutes to days depending on the transcript in question. This intrinsic stability is determined by sequence and/or structural determinants on the transcript, which influence its interaction with the RNA degradation machinery. For some mRNAs, the turnover rate is fixed, while for others it is regulated by internal or external signals. This indicates that trans-acting factors must interact with the transcript to modulate its stability. An example is the pea (Pisum sativum) ferredoxin (Fed-1) mRNA, whose stability is regulated by light.
Sequences that confer mRNA stability have been identified in vertebrates but not thus far in plants. Such elements may eventually be found in highly stable transcripts such as those of the cereal storage protein genes. The motif AUUUA has been shown to confer instability upon a number of naturally occurring mammalian mRNAs including those encoding short-lived products such as growth factors and cytokines. The same sequence is able to confer instability on recombinant plant mRNAs, although endogenous genes containing this sequence remain to be identified. Another motif, known as the downstream element (DST), has been shown to confer instability on a number of plant genes. The DST motif was originally identified in the small auxin-up RNA transcript (SAUR) and increases the turnover rate of reporter gene transcripts in transgenic plants. Such sequences should be eliminated from transgenes to help achieve maximum expression levels. For example, the presence of AU-rich instability sequences in unmodified Bt toxin transgenes is another reason why such genes are expressed at very low levels in transgenic plants.
Endogenous mRNAs are broken down by the general RNA degradation machinery. Several pathways have been identified in yeast, the most important of which is the deadenylation-dependent decapping decay pathway. Components of this pathway have been identified in plants but it is not clear if the mechanism is completely conserved. A minor pathway in yeast, the deadenylation-independent decapping decay pathway, is thought to mediate the degradation of mRNAs with incorrectly placed nonsense codons. This process is sometimes termed nonsense-mediated decay and is a form of mRNA surveillance, avoiding the accumulation of nonfunctional RNA molecules. Components of this pathway have been identified in plants and the accelerated decay of transcripts with 5′ nonsense codons has been documented.
As discussed above, a separate RNA degradation pathway appears to have evolved specifically to protect plants from RNA viruses. The substrate for this pathway is dsRNA, an essential intermediate in most RNA virus replication cycles. The same pathway can be activated by transgene expression if dsRNAs are adventitiously synthesized, as often occurs if transgenes are integrated as inverted repeats. The impact of RNA-mediated silencing on transgene expression is considered in more detail below.
Protein Synthesis
Initiation of Translation
The initiation of protein synthesis in eukaryotes is usually dependent on the 5′ 7meG cap. This is recognized by a cap-binding complex comprising various initiation factors as well as the polyadenylate-binding protein (PABP). The assembly of these proteins brings the 5′ and 3′ ends of the transcript together in a circular RNP complex, which is required for initiation and may facilitate reinitiation with the same ribosome (Figure 1). Global regulation of protein synthesis in animals is accomplished by reversible phosphorylation of the initiation factors that are required to assemble the ribosome on the mRNA. In plants, the global regulation of translational initiation is complicated by the presence of two distinct cap-binding complexes, one comprising the initiation factors eIF4E and eIF4G, and the other comprising the initiation factors eIFiso4E and eIFiso4G. The remaining components of the initiation complex are the same in each case: the RNA helicase eIF4A, the RNA-binding protein eIF4B, and PABP (Figure 1). The abundance and activity of these alternative initiation factors differ according to cell type, developmental stage, and external signals. The genes encoding these proteins are expressed at different levels and the proteins themselves are reversibly phosphorylated under various forms of stress.
Once assembled, the cap-binding complex then interacts with eIF3 to position the preinitiation complex (ribosomal small subunit and eIF2a-GTP-tRNAmet ternary complex) near the 5′ end of the mRNA, allowing it to scan along the 5′ leader sequence toward the initiation codon. The correct initiation codon is embedded within a short sequence known as the Kozak consensus. Once this has been identified, eIF2a-GDP is released from the initiation complex and the large subunit of the ribosome is recruited, processes that require the activities of eIF5 and eIF5B.
The efficiency of translational initiation can also be regulated by sequences that influence the efficiency of ribosomal scanning. In plants, as in other eukaryotes, a common regulatory mechanism is the presence of short open reading frames upstream of the genuine initiation codon. In many cases, these interfere with translational initiation, but they can also confer a translational advantage, as is the case for the polycistronic CaMV 35S RNA. Transcript-specific regulation can be superimposed over global regulation and can be used to protect certain genes from global repression. For example, the 5′ leader sequences of the corn (Zea mays; maize) HSP70 mRNA and the TMV genomic RNA confer a translational advantage in heat-shocked cells because they recruit proteins that maintain the circular organization of the transcript, while in other transcripts the interactions are disrupted due to the phosphorylation of initiation factors.
Figure 1 Assembly of the circular translational initiation complex. Each initiation factor is represented by a number, e.g., 2 = eIF2 and 4A = eIF4A. Initiation factor eIF4E (4E) or eIFiso4E binds to eIF4G (4G) or eIFiso4G, and the 5′-7mGpppN cap of the mRNA (shown as CAP). Interactions between eIF4 subunits and polyA-binding protein (PABP) cause the mRNA to circularize. Initiation factor eIF4B may also interact with the 5′ untranslated region (UTR) of the mRNA. Components of the initiation complex that can be phosphorylated, thus affecting their interactions with each other and the mRNA, are shaded.
In transgenic plants, optimal translational initiation can be achieved by removing negative regulatory elements from the 5′ and 3′ untranslated sequences and replacing them with translational enhancers such as the TMV 5′ leader, also known as the omega sequence. The translational start site should contain a single AUG codon within an optimized Kozak consensus sequence. If the transgenic plant is intended to tolerate a particular form of biotic or abiotic stress, it may be desirable to model the untranslated regions of the transgene on endogenous transcripts that are known to escape global repression under those stressful conditions.
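The advice that a construct should present a single AUG to the scanning ribosome, free of short upstream open reading frames, lends itself to a simple check on the 5′ leader. The following is a minimal sketch under stated assumptions: the function names `upstream_augs` and `upstream_orfs` are hypothetical, the transcript is given as an RNA string, and `main_start` is the index of the intended initiation codon. It does not score the Kozak context, which the article does not specify in detail.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_augs(mrna, main_start):
    """Positions of AUG triplets lying wholly within the 5' leader,
    i.e. before the intended initiation codon at index main_start."""
    leader = mrna[:main_start]
    return [i for i in range(len(leader) - 2) if leader[i:i + 3] == "AUG"]


def upstream_orfs(mrna, main_start):
    """(start, end) spans of open reading frames that begin at an
    upstream AUG and run in frame to the first stop codon."""
    orfs = []
    for i in upstream_augs(mrna, main_start):
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                orfs.append((i, j + 3))
                break
    return orfs


if __name__ == "__main__":
    # Toy transcript with a short upstream ORF (AUG CCC UAA) in the leader;
    # the intended initiation codon starts at index 16.
    toy = "GGAUGCCCUAAGGACCAUGGCUUAA"
    print(upstream_augs(toy, 16), upstream_orfs(toy, 16))
```

A leader that returns empty lists from both functions satisfies the single-AUG recommendation; a non-empty result flags sequence that might be redesigned, although, as noted above for the CaMV 35S RNA, upstream reading frames are not always detrimental.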
Polypeptide Elongation
Translational elongation factors are also subject to global regulation but there is little that can be done in terms of expression construct design to exploit this. However, one feature of construct design that can have a significant effect on polypeptide elongation in transgenic plants is the codon bias in the transgene’s coding region in comparison to the codon preferences of the host. Different species have very different codon preferences when specifying degenerate amino acids. Taking the amino acid arginine as an example, the codon CGU is preferred in alfalfa (Medicago sativa, lucerne), and is 50 times more likely to occur than the rarest codon, CGG. In contrast, both of these codons are equally prevalent in corn, but the preferred choice is CGC. In yeast, AGA is by far the most popular codon for arginine, and this is also the case in potato (Solanum tuberosum), soybean (Glycine max), tobacco, and tomato (Solanum lycopersicum).
The expression of foreign transgenes in transgenic plants can therefore be very inefficient if infrequently used codons are abundant. The ribosome pauses at such codons and this generally reduces the rate at which the recombinant protein accumulates. However, more serious problems can also occur such as premature termination, misincorporation, frameshifting, and skipping. This is yet another reason why native Bt genes are not expressed efficiently in transgenic plants. Codon optimization is the process of modifying a foreign transgene by introducing silent mutations that bring codon preference in line with that of the host plant.
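Codon optimization as defined above, the introduction of silent mutations that match host codon preference, can be sketched in a few lines. This is an illustrative sketch only: the function names `translate` and `optimize` are hypothetical, the coding sequence is assumed to be an RNA string whose length is a multiple of three, and the only host preferences used are the arginine examples quoted in the text (CGC for corn, AGA for tobacco). A practical tool would use a full codon usage table for the host and would also avoid creating the instability motifs and cryptic splice sites discussed earlier.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an RNA coding sequence into one-letter amino acids."""
    return "".join(GENETIC_CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize(cds, preferred):
    """Swap each codon for the host's preferred synonymous codon.

    preferred maps an amino acid letter to a codon; amino acids that are
    not listed keep their original codon. The caller is assumed to supply
    codons that really encode the amino acid they are listed under.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(GENETIC_CODE[codon], codon))
    return "".join(out)


if __name__ == "__main__":
    native = "AUGCGGAGAUAA"      # Met-Arg-Arg-stop
    corn = {"R": "CGC"}          # arginine preference quoted for corn
    recoded = optimize(native, corn)
    print(recoded, translate(native) == translate(recoded))
```

Comparing the translations before and after recoding is a useful safeguard, since an optimized transgene must encode exactly the same polypeptide as the native gene.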
Protein Stability
Protein stability is an often-overlooked component of transgene expression, but in many cases, this is the most crucial factor in the development of useful transgenic plants. Proteins emerging from the ribosome must fold to achieve their native conformation, they are often chemically modified, and some must then assemble into multiprotein complexes. These posttranslational processes are interdependent and often depend critically on the site within the cell where the protein accumulates. Therefore a failure of any one process can lead to the accumulation of a misfolded and nonfunctional polypeptide that is rapidly degraded.
Many simple proteins fold spontaneously to form their native conformation. In the case of soluble globular proteins, this process is driven by hydrophobic collapse, i.e., the shielding of hydrophobic groups in the core of the protein while hydrophilic residues form the external loops. Local secondary structures form within a few milliseconds and then the protein gradually adopts its final tertiary conformation over the next few seconds or minutes. In many cases, this folding process can be recapitulated if proteins are denatured and then allowed to refold in vitro. More complex proteins are unable to fold or refold spontaneously. This suggests that multiple folding pathways are available and that it is possible for proteins to get ‘stuck’ in an unproductive pathway. To prevent this outcome, the cell produces proteins known as molecular chaperones whose function is to help other proteins fold to achieve their correct conformations.
In plants, molecular chaperones are not ubiquitously available. Many are synthesized specifically in the secretory pathway and this is also the compartment wherein glycosylation and other forms of posttranslational modification take place. The chemical environment of the endoplasmic reticulum is also the most suitable for the formation of disulfide bonds (which are required both to fold individual polypeptides and to assemble them into complexes). The control of protein targeting in the cell is therefore an important component of the design of plant expression constructs. There is always a trade-off between the efficiency of protein accumulation, the functionally most suitable site of accumulation, and the health of the plant.
For example, full-size recombinant antibodies produced in plants accumulate to very low levels if they are targeted to the cytosol. The reasons for this include the absence of molecular chaperones (resulting in inefficient folding), the unsuitable environment for disulfide bond formation, and the lack of glycosylation. Much better levels are obtained if antibodies accumulate in the secretory pathway, which can be achieved by the inclusion
of an N-terminal signal peptide. Both plant and animal signal peptides appear to work equally well. The default site of accumulation for proteins thus targeted is the apoplast, the space between the plasma membrane and the cell wall. However, the inclusion of a KDEL signal at the C-terminus of the recombinant protein causes it to be retrieved from the Golgi apparatus and returned to the lumen of the endoplasmic reticulum in the manner of a resident endoplasmic reticulum protein. This strategy is appropriate if the goal is maximal accumulation, but if the antibody is designed to protect the plant from viral pathogens then the cytoplasm is a functionally more suitable accumulation site because this intersects with the viral infection cycle. Thus, a cytosolic antibody has been found to confer better protection than an equivalent antibody in the secretory pathway, even though the stability of the former was so low that the secretory pathway antibody was 40 000 times more abundant. Molecules that interfere with the normal physiology of the plant might be better targeted to other compartments, such as the chloroplast or the vacuole. For example, recombinant avidin expressed in plants accumulated in the vacuole to much higher levels than possible in other compartments because of toxicity effects resulting from the interference with endogenous biotin metabolism.
Posttranscriptional Gene Silencing
Transgene silencing and cosuppression occur at both the transcriptional and posttranscriptional levels. Transcriptional silencing often occurs due to the integration of the transgene at a genomic position that is already repressed (position-dependent silencing). However, even transgenes in active regions of the genome may be silenced if they are recognized as foreign invaders. This context-dependent silencing, which may detect unusual DNA structure or base composition, is a defense mechanism against invasive DNA, i.e., transposons and other mobile elements. Similarly, transcriptional silencing can occur in complex transgene loci containing tandem or inverted repeats, again because the cell may recognize certain features shared with active transposons. In transcriptional silencing, the silenced locus is generally hypermethylated and sequestered into inactive chromatin, and no mRNA is produced. In the case of posttranscriptional silencing, mRNA is produced (indeed transcription is necessary for silencing) but it is rapidly degraded. This is confirmed by nuclear run-on assays, which measure the amount of pre-mRNA in the nucleus.
Like transcriptional silencing, posttranscriptional silencing appears to have evolved as a defense against invasive nucleic acids. In this case, the target is viruses and the trigger is dsRNA. Similar defense pathways have been identified in many animals and are starting to be exploited to deliberately interfere with endogenous gene expression (RNA interference). These posttranscriptional mechanisms can be grouped under the collective term ‘RNA silencing.’
RNA silencing occurs in transgenic plants if the transgene produces dsRNA. At multicopy loci, this can occur by the adventitious production of hairpin RNAs, either through transcriptional read-through from adjacent transgene copies arranged as inverted repeats or through the transcription of inverted partial transgene copies.
The resulting dsRNA is a substrate for the ribonuclease Dicer, which cleaves it into short fragments 21–23 bp in length. Dicer endonucleases are multidomain RNase III enzymes that have been identified in many eukaryotes. In Arabidopsis, there is evidence for at least four Dicer homologs, one of which is encoded by the gene CARPEL FACTORY (CAF), also known as SHORT INTEGUMENT 1 (SIN1). The activity of Dicer alone would be insufficient to destroy invading RNA viruses, since chopping up the replicative intermediate would inhibit replication but would not prevent transcription of single-stranded copies of the genome. In a second phase of the process, however, the short interfering RNA (siRNA) fragments produced by Dicer then associate with other proteins to assemble an RNA-induced silencing complex (RISC), which recognizes homologous single-stranded RNAs and degrades them (Figure 2). This is a very efficient process which can eradicate even quite abundant viral RNAs. If dsRNA corresponding to a transgene or an endogenous gene is produced in the cell, the same potent defense mechanism eradicates the corresponding mRNA as soon as it is synthesized. The silencing mechanism is systemic, reflecting the ability of siRNAs to spread throughout the plant.
It would appear that RNA-mediated transgene silencing could be avoided by making sure transgenic loci were organized in such a fashion that hairpin RNA could not form. However, posttranscriptional silencing of actively transcribed single-copy loci has also been documented, so another mechanism of dsRNA formation must be possible. It has been suggested that there is a threshold level of mRNA above which the Dicer enzyme is activated, to prevent runaway expression of viral genes. One possible mechanism is saturation of the polyadenylation machinery. This would result in the accumulation of unadenylated transcripts which might be capable of self-priming and the formation of hairpins.
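The two-phase logic described above, in which Dicer first cuts dsRNA into short fragments and RISC then uses those fragments to find homologous single-stranded RNA, can be mimicked with a toy string model. This is a deliberately simplified sketch and not a model of the enzymology: the function names `dice` and `is_silenced` are hypothetical, a fixed fragment size of 21 nt is assumed (within the 21–23 bp range given in the text), overhangs and mismatches are ignored, and recognition is reduced to an exact match.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense_strand, size=21):
    """Cut the dsRNA trigger into consecutive size-nt pieces and return
    the antisense (guide) strand of each piece. size is an assumption."""
    return [
        reverse_complement(sense_strand[i:i + size])
        for i in range(0, len(sense_strand) - size + 1, size)
    ]


def is_silenced(mrna, guides):
    """True if any guide strand is perfectly complementary to a stretch
    of the mRNA, standing in for RISC recognition and cleavage."""
    return any(reverse_complement(guide) in mrna for guide in guides)


if __name__ == "__main__":
    trigger = "AUGGCUAGC" * 5            # 45-nt toy dsRNA (sense strand)
    guides = dice(trigger)
    homologous = "GGG" + trigger + "AAA"  # transcript sharing the sequence
    unrelated = "C" * 60
    print(len(guides), is_silenced(homologous, guides), is_silenced(unrelated, guides))
```

The sketch also illustrates cosuppression: any transcript that shares a 21-nt stretch with the trigger, whether it comes from the transgene or from a homologous endogenous gene, is flagged in the same way.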
Figure 2 Mechanism of epigenetic RNA silencing. Double-stranded RNA (dsRNA), the trigger for RNA silencing, is the substrate for the enzyme Dicer, which in Arabidopsis may be encoded by the CARPEL FACTORY/SHORT INTEGUMENT1 (CAF/SIN1) gene. The enzyme cleaves the dsRNA into short interfering RNAs (siRNAs) that are 21–25 bp in length with 2-nt overhangs at each end. The siRNAs are incorporated into the RNA-induced silencing complex (RISC), which shares a conserved domain with Dicer that may facilitate their interaction. The RISC then interacts with and cleaves homologous mRNAs, using the siRNA to achieve sequence specificity.
Roles of Small RNAs in Posttranscriptional Regulation
It is now known that many instances of posttranscriptional gene silencing in plants involve the action of so-called small RNAs (often referred to as small interfering RNAs or siRNAs). Small RNAs are noncoding RNA molecules typically of 50–250 nucleotides in length. They often act as gene regulators and they have important roles in most eukaryotes, including plants. They repress gene expression by acting either on DNA, to guide the elimination of inappropriate sequences and chromatin remodeling, or on mRNA, to guide cleavage and translational repression. Small RNAs are involved in many processes associated with the maintenance of genome stability, organism development, and adaptive responses to biotic and abiotic stresses. Their mechanisms of action are highly diverse. For example, they guide DNA elimination during heterochromatin formation in fungi and plants. They target endogenous mRNAs for cleavage and translational repression in plants and animals, and protect both plant and animal cells against virus infection through an RNA-based immune system. They also control the movement of transposable elements at the transcriptional and posttranscriptional levels in plants and animals. Because small RNAs are repressors of gene expression, small RNA-mediated regulation of gene expression is often referred to as RNA silencing, gene silencing, or RNA interference (RNAi). siRNAs regulate the genes from which they themselves are derived and are therefore referred to as cis-acting or autosilencing RNAs. In contrast, so-called micro-RNAs (miRNAs) are endogenous small RNAs, typically of about 22 nucleotides in length, that are produced by different genes from those that they regulate. Therefore, miRNA-mediated gene regulation is referred to as heterosilencing. The modes of action of siRNA- and miRNA-mediated gene regulation are summarized in Figure 3.
Avoiding Transgene Silencing in Transgenic Plants
From the discussion above, several conclusions can be drawn about transgene silencing and several general strategies can be employed to help prevent it:
• the transgenic locus should be protected from position effects;
• the transgene should have a similar base composition to the host plant;
• superfluous sequences, particularly the plasmid backbone, should be avoided;
• the transgene copy number should be kept as low as possible;
• the locus structure should be simple, preferably a single intact copy of the transgene;
• inverted repeat structures should be avoided wherever possible;
• maximizing transcription may be counterproductive.
Chloroplast Gene Expression
The chloroplast genome differs in many respects from the nuclear genome, and most of these differences reflect the prokaryotic origin of plastids. One distinction with particular relevance to this article is the relative importance of transcriptional and posttranscriptional gene regulation. While chloroplast genes are regulated at the transcriptional level, posttranscriptional gene regulation is predominant. As might be expected, light plays an important role in the regulation of many chloroplast genes. Notably, neither transcriptional nor posttranscriptional silencing has been demonstrated for chloroplast genes, which means that very high levels of transgene expression can be achieved in plants with genetically transformed chloroplasts.
Figure 3 Major miRNA, ta-siRNA, nat-siRNA, and hc-siRNA pathways in plants. A color code is used to indicate members of the same gene family. Question marks indicate that a member is likely to play a role in the pathway, but the identity of the protein has not been experimentally determined. HYL1 is referred to as DRB1 for clarity. The arrow between the miRNA and ta-siRNA pathways indicates that the miRNA pathway is required for the proper functioning of the ta-siRNA pathway. The multiple arrows emanating from the sense gene in the nat-siRNA pathway indicate that the RNA transcribed from this gene is used at various steps in this pathway. DNA modification steps in the hc-siRNA pathway involve additional locus-specific components that are not represented in this figure. Reproduced with permission from Vaucheret, H., 2006. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev. 20, 759–771.
Posttranscriptional Regulation of Chloroplast Gene Expression
RNA processing, degradation, and translation are the predominant levels of gene regulation in the chloroplast genome. Several distinct forms of RNA processing occur in chloroplasts. Like bacterial transcripts, chloroplast transcripts are not capped and generally not polyadenylated. Introns occur in some chloroplast genes but these are generally of the self-splicing variety. An interesting feature of chloroplast gene regulation that is rare in the plant nuclear genome is RNA editing, the replacement of individual cytosine residues with uracil after transcription.
Translation of chloroplast mRNAs has many features in common with bacterial translation, including the use of polycistronic transcripts, the requirement for similar initiation factors, the assembly of a eubacterial-like ribosome, and the formylation of the initiator aminoacyl-tRNA. Chloroplast ribosomes exist either free in the stroma or attached to the thylakoid membranes, the latter allowing the cotranslational insertion of nascent polypeptides into the membrane. The initiation codon in chloroplast genes is almost always AUG, but occasionally GUG may be used, as is the case in eubacteria. In higher plants, the ribosome often binds to a sequence reminiscent of the eubacterial Shine–Dalgarno (SD) motif, which is complementary to the 16S rRNA and helps position the ribosome over the initiation codon. In some cases, the mechanism appears similar to that in bacteria, i.e., the SD-sequence places the 30S ribosomal subunit at the correct position on the transcript for translational initiation; an example of this is the barley (Hordeum vulgare) rbcL gene. In others, the ribosomal subunit may bind to the SD-sequence but must then scan along the transcript to find the initiation codon (e.g., the barley psbA gene). In algae, about half of the chloroplast genes lack an SD-sequence, and initiation must occur through a different mechanism involving the binding of specific translational initiation factors to sequences in the 5′ untranslated region (UTR). Where SD-sequences do exist, mutagenesis experiments suggest that the secondary structure formed by such elements is more important than the sequence itself.
The stability of some chloroplast RNAs is regulated by RNA-binding proteins that respond to light. For example, nuclear-encoded protein factors have been identified in Chlamydomonas reinhardtii that bind to the 5′ UTR of the psbB and psbD transcripts and increase their stability. Similarly, light-dependent proteins are also known to increase the efficiency of translation. The well-studied example of C. reinhardtii psbA mRNA is particularly informative since the components of the translational regulatory complex have been identified. Four nuclear-encoded proteins assemble on the 5′ UTR, one of which is a polyadenylate-binding protein. Although chloroplast transcripts are generally not polyadenylated, this protein recognizes a similar adenine-rich element in the psbA 5′ UTR. Another component of the complex, RB60, is structurally modified by light-dependent signals. For example, it is phosphorylated by an ADP-dependent kinase, which accumulates in the dark, therefore leading to the dissociation of the complex in the absence of sunlight. The redox state of the chloroplast, which is influenced by photosynthetic activity, also changes the structure of RB60 by reversibly oxidizing and reducing critical disulfide bonds. Certain chloroplast genes may also be regulated at the translational level by attenuation, i.e., the binding of the gene product to the mRNA. An example is the C. reinhardtii cytochrome f protein, which inhibits translation of its own transcript (petA).
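The simplest of the initiation mechanisms described in this section, in which an SD-like motif positions the ribosome directly over the initiation codon, can be sketched as a sequence search. This is a minimal illustration under stated assumptions, not a predictor of chloroplast translation: the motif GGAGG is the bacterial consensus and is used here only as a stand-in, the 4–12 nt spacing window is an arbitrary illustrative choice, and the function name is hypothetical. The sketch does not model ribosome scanning, secondary structure, or initiation factors, all of which the text identifies as important.

```python
# Minimal sketch: locate AUG/GUG initiation codons preceded by an SD-like motif.
# SD_MOTIF and the spacing window are illustrative assumptions, not measured
# chloroplast parameters.

SD_MOTIF = "GGAGG"              # assumption: bacterial consensus as a stand-in
START_CODONS = ("AUG", "GUG")   # AUG is usual; GUG is used occasionally


def find_sd_starts(mrna, motif=SD_MOTIF, min_gap=4, max_gap=12):
    """Return (motif_position, start_position, codon) tuples for every
    initiation codon whose 5' side has `motif` ending between `min_gap`
    and `max_gap` nucleotides upstream (0-based positions)."""
    hits = []
    for start in range(len(mrna) - 2):
        codon = mrna[start:start + 3]
        if codon not in START_CODONS:
            continue
        for gap in range(min_gap, max_gap + 1):
            motif_pos = start - gap - len(motif)
            if motif_pos >= 0 and mrna[motif_pos:motif_pos + len(motif)] == motif:
                hits.append((motif_pos, start, codon))
    return hits


if __name__ == "__main__":
    with_sd = "UUUGGAGGUUUUUUUAUGGCU"
    without_sd = "UUUUUUUUUUAUGGCU"
    print(find_sd_starts(with_sd))
    print(find_sd_starts(without_sd))
```

A transcript that returns no hits in such a search corresponds to the cases noted above, where initiation must rely on scanning or on factors binding elsewhere in the 5′ untranslated region.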
See also: Plant Breeding and Genetics: Chromosome Engineering; Insertional and Transposon Mutagenesis; Molecular Biology of Development. Seed Development and Germination: Embryogenesis.
Further Reading
Allen, E., Xie, Z., Gustafson, A.M., Sung, G.H., Spatafora, J.W., Carrington, J.C., 2004. Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat. Genet. 36, 1282–1290.
Ambros, V., 2001. Dicing up RNAs. Science 293, 811–813.
Bartel, D.P., 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297.
Baulcombe, D., 2004. RNA silencing in plants. Nature 431, 356–363.
Dunoyer, P., Himber, C., Voinnet, O., 2005. DICER-LIKE 4 is required for RNA interference and produces the 21-nucleotide small interfering RNA component of the plant cell-to-cell silencing signal. Nat. Genet. 37, 1356–1360.
Floris, M., Hany, H., Lanet, E., Robaglia, C., Menand, B., 2009. Post-transcriptional regulation of gene expression in plants during abiotic stress. Int. J. Mol. Sci. 10, 3168–3185.
Hackenberg, M., Huang, P.J., Huang, C.Y., Shi, B.J., Gustafson, P., Langridge, P., 2013. A comprehensive expression profile of microRNAs and other classes of non-coding small RNAs in barley under phosphorous-deficient and -sufficient conditions. DNA Res. 20, 109–125.
Kim, V.N., 2005. Small RNAs: classification, biogenesis, and function. Mol. Cells 19, 1–15.
Marchive, C., Nikovics, K., To, A., Lepiniec, L., Baud, S., 2014. Transcriptional regulation of fatty acid production in higher plants: molecular bases and biotechnological outcomes. Eur. J. Lipid Sci. Technol. 116, 1332–1343.
Sunkar, R., Li, Y.F., Jagadeeswaran, G., 2012. Functions of microRNAs in plant stress responses. Trends Plant Sci. 17, 196–203.
Vaucheret, H., 2006. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev. 20, 759–771.
Yoshikawa, M., Peragine, A., Park, M.Y., Poethig, R.S., 2005. A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev. 19, 2164–2175.