The trihelix family of transcription factors – light, stress and development


Review

Ruth N. Kaplan-Levy1, Philip B. Brewer2, Tezz Quon and David R. Smyth

School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Vic 3800, Australia

Corresponding author: Smyth, D.R.

1 Present address: Yigal Allon Kinneret Limnological Laboratory, Israel Oceanographic and Limnological Research, P.O.B. 447, Migdal, Israel.

2 Present address: School of Biological Sciences, The University of Queensland, St Lucia, Qld 4072, Australia.

GT factors are the founding members of the trihelix transcription factor family. They bind GT elements in light-regulated genes, and their nature was uncovered in a burst of activity in the 1990s. Study of the trihelix family then slowed, but interest is now reawakening. Genomic studies have revealed 30 members of this family in Arabidopsis and 31 in rice, falling into five clades. Newly discovered functions involve responses to salt and pathogen stresses, the development of perianth organs, trichomes, stomata and the seed abscission layer, and the regulation of late embryogenesis. The time is therefore ripe for a review of the genomic and functional information now emerging for this neglected family.

Trihelix transcription factors

There are over 60 transcription factor families in plants [1], and their functions are being progressively defined. The trihelix family, however, has been rather neglected despite recent advances in knowledge of the roles of its members. This review aims to provide a current, integrated overview of their structure and function, and of their significance in physiology and development.

Discovery of GT factors

Trihelix transcription factors were discovered as proteins that bind specifically to GT elements required for light responses.

GT-1

The first to be discovered was the GT-1 factor of pea (Pisum sativum). It binds specifically to a GT element in the promoter of the light-induced rbcS-3A gene. The core sequence, 5′-GGTTAA-3′, is sufficient for light induction and provided the factor's name [2]. When the gene encoding GT-1 was cloned from tobacco (Nicotiana tabacum), it was found to encode three α-helical sequences separated by loops or turns, a domain required for DNA binding [3–5] (Figure 1). Study of the orthologous gene from Arabidopsis (Arabidopsis thaliana) showed that GT-1 binds DNA as a dimer, with dimerization promoted by the C-terminal region [6], although tetramers were also detected in tobacco [5]. GT-1 of Arabidopsis is a transcriptional activator, and can activate transcription directly by stabilizing the TFIIA–TBP–TATA components of the pre-initiation complex [7].
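
The GT-1 core (5′-GGTTAA-3′) is not palindromic, so in a double-stranded promoter it can lie on either strand. The following minimal Python sketch shows an exact-match scan of both strands. The function names and the toy sequence are hypothetical and are not taken from the cited studies. Exact matching also ignores any contribution of flanking sequence to binding.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_core_sites(promoter: str, core: str = "GGTTAA") -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of `core`.

    A reverse-strand hit is reported at the forward-strand position
    where the reverse complement of `core` begins.
    """
    promoter = promoter.upper()
    rc_core = reverse_complement(core)
    hits = []
    for i in range(len(promoter) - len(core) + 1):
        window = promoter[i : i + len(core)]
        if window == core:
            hits.append((i, "+"))
        # Skip the reverse check for palindromic cores to avoid double counting
        if window == rc_core and rc_core != core:
            hits.append((i, "-"))
    return hits


# Hypothetical toy fragment, not a real rbcS-3A promoter sequence
print(find_core_sites("AAGGTTAACC"))  # [(2, '+'), (4, '-')]
```

In the toy fragment, the reverse-strand hit is reported at position 4, where TTAACC (the reverse complement of GGTTAA) begins on the forward strand.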

Expression of the GT-1 gene is ubiquitous and unaffected by the light regime [3,4]. How, then, could GT-1 respond to light and transmit the signal to its target genes? Study in Arabidopsis showed that light-dependent induction probably depends on its phosphorylation: binding of GT-1 to its target was 10–20 times stronger after calcium-dependent phosphorylation of a threonine in the trihelix domain [8].

GT-2

Meanwhile, the nature of a second DNA binding factor, GT-2, was being revealed. This work involved another light-regulated target gene, the phytochrome A (phyA) gene of rice, whose expression is instead repressed by light. Conserved core elements, GGTAATT and GGTAAAT, occur in its promoter, and initially these were thought to be targets of GT-1 as well [9]. However, when the responsible gene was cloned, the factor was found to differ from GT-1 in its binding-sequence preference and was therefore named GT-2 [10]. The GT-2 protein carries duplicate DNA-binding domains, each related to that of GT-1 (Figure 1). The three amphipathic α-helices led to the name trihelix for this domain [11]. Like GT-1, GT-2 is expressed constitutively, and its expression is not significantly influenced by light treatment in rice [12] or Arabidopsis [13]. Also like GT-1, GT-2 can act as a transcriptional activator [14]. Repression of its target genes following light treatment therefore presumably requires post-translational modification of GT-2, or the joint action of other factors. This is where the field stood at the end of the 1990s, when knowledge of trihelix genes was last reviewed [15].

Structural genomics

With the publication of the genome sequences of Arabidopsis in 2000 [16], rice (Oryza sativa) in 2002 [17,18], and other plant species subsequently, analysis of the full complement of trihelix genes and comparison of their structures became possible (e.g. http://plntfdb.bio.uni-potsdam.de/v3.0/ [19]; http://planttfdb.cbi.pku.edu.cn/ [20]). There are 30 trihelix genes in Arabidopsis and 31 in rice (Table 1; Figure 2).
Thus the number is relatively modest compared with some of the giants of the plant transcription factor world, such as the MYB, AP2/EREBP, bHLH, NAC and C2H2 zinc finger families, all with more than 100 members in Arabidopsis [1].

The trihelix

The defining feature of the family is the trihelical DNA binding domain. This is not a completely new domain, because it resembles the individual repeats of the MYB family, from which the trihelix may have been derived [21]. Each MYB repeat is made up of three consecutive α-helices, and in plants the repeats usually occur in two (R2R3) or three (R1R2R3) tandem copies. There are three highly conserved, regularly spaced tryptophan (W) residues in each repeat, one per helix. The solution structure of a MYB repeat revealed that the three helices form a compact fold with the three tryptophans in an inner hydrophobic core [22]. Strikingly, GT-1 and GT-2 proteins also carry a conserved tryptophan closely upstream of or within each helix (Figure 1). However, the number of residues between the first two tryptophans is consistently higher in trihelix proteins (usually 28) than in MYB repeats (19) [21]. Recently the solution structure of GT-1 of Arabidopsis was reported [23] (Figure 3a). Its shape is related to that of the MYB repeat, although the individual helices are longer and the target sequences are different. The helices also form a bundle held together by a hydrophobic core, which is preceded by a flexible N-terminal arm. The third helix is held against the other two, and also contacts the major groove, where recognition of the GGTTAA element is likely. The binding of GT-1 to its target is strengthened by phosphorylation of threonine-133 [8], and this was also the case when the threonine was replaced by aspartate to mimic its phosphorylation [23]. Significantly, residue 133 lies in the third helix, where DNA recognition is proposed to occur (Figure 3a).

[Figure 1 image: domain diagrams of At1g13450 (GT-1), At2g38250 (GT-3b), At1g76890 (GT-2), At1g31310 (SH4 clade), At1g21200 (GTγ clade) and At3g58630 (SIP1 clade), with the conserved W (or F, I) residue marked at each helix; scale bar, 100 amino acids]

Figure 1. Structure of representative members of the five clades of Arabidopsis trihelix proteins. The trihelix DNA binding domains are shown in yellow. The three individual amphipathic α-helices and their conserved tryptophans (W), or the other amino acid at this location, are indicated. The fourth amphipathic α-helix, with the general sequence (F/Y)(F/Y)-X-X-(L/I/M)-X-X-(L/I/M), is shown in magenta. This helix is not present in the SH4 clade, whose members instead carry an extended third helix. Long, uninterrupted α-helical sequences are colored cyan. These are mostly predicted to form coiled-coils and have been shown to be associated with multimerization in GT-3a and GT-3b [25]. Their sequences are conserved within clades but are unrelated between clades. The exceptions are a very distant relationship between the GT-1 and GT-2 clades [13] and between the GTγ and SIP1 clades.

1360-1385/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tplants.2011.12.002 Trends in Plant Science, March 2012, Vol. 17, No. 3

Family relationships

The conserved trihelical domain has been used to generate a phylogeny of all 30 family members in Arabidopsis and
the 31 in rice (Figure 2). They fall into five clades, here named after the relevant founding member [24]. One well-supported branch leads to two clades. The GT-1 clade has five members in Arabidopsis and four in rice, and the GT-2 clade has seven in Arabidopsis and five in rice. In the GT-2 clade, the N-terminal and C-terminal trihelices (TN and TC, respectively) were each included in the phylogeny. They mostly separate into two clusters, as expected if both were already present in a common ancestor. However, in Arabidopsis the N-terminal trihelix of At5g28300 and the C-terminal trihelix of PTL are particularly divergent, and their placement is uncertain. Also, two GT-2 clade proteins of Arabidopsis, EDA31 and At5g47660, each have only one trihelix (Table 1), indicating that the other has been lost. Similarly, two rice genes have only the C-terminal trihelix, LOC_Os04g45750 and LOC_Os02g01380. The former, however, does have two trihelical domains in the original OsGT-2 gene identified in a different subspecies, ssp. indica [11]. There are three other well-supported clades: SH4, GTγ and SIP1. The membership of each clade is clearly resolved in this analysis, but the relationships between clades, and to the AtMYB reference group, are not. All that can be concluded is that the GT-1 and GT-2 clades are more closely related to each other than to the others. Nevertheless, a key amino acid that is likely involved in DNA target recognition varies according to clade, which supports the membership of each clade. The position of the usual third tryptophan is occupied instead by phenylalanine in the GT-2 N-terminal trihelix and in the GTγ clade. In the SIP1 clade it is occupied by isoleucine (or, less commonly, methionine or valine) (Figure 1).
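
The Figure 2 tree was built from a ClustalW alignment of the trihelix domains using the Neighbor-Joining method. For readers unfamiliar with Neighbor-Joining, below is a minimal Python sketch of the textbook algorithm. It is only an illustration of the method, not the pipeline behind the figure. It starts from a precomputed distance matrix rather than an alignment and uses a hypothetical five-taxon matrix. It also leaves out the 1000 bootstrap replicates that Figure 2 uses to score branch support.

```python
def neighbor_joining(labels, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is (a, b, branch_a, branch_b):
    the two nodes merged and their branch lengths to the new internal
    node. final_edge links the last two remaining nodes.
    """
    nodes = list(labels)
    d = {(a, b): float(dist[i][j])
         for i, a in enumerate(labels) for j, b in enumerate(labels)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}
        # Choose the pair minimising the Q criterion (first pair wins ties)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and to b
        branch_a = d[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[(a, b)] - branch_a
        joins.append((a, b, branch_a, branch_b))
        u = f"({a},{b})"
        # Distance from u to every remaining node
        for c in nodes:
            if c not in (a, b):
                d[(u, c)] = d[(c, u)] = (d[(a, c)] + d[(b, c)] - d[(a, b)]) / 2
        d[(u, u)] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return joins, (a, b, d[(a, b)])


# Hypothetical five-taxon distance matrix (a common teaching example)
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(labels, matrix)
for join in joins:
    print(join)
print(final_edge)
```

Taxa a and b are joined first, because they have the lowest Q value: they are close to each other relative to their total divergence from everything else. In the later steps of this example several pairs tie, and the sketch simply keeps the first pair in list order.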

Table 1. Structure and known functions of Arabidopsis trihelix proteins (and those of related genes in other species)

Locus (a) | Name (b) | Structure (c) | Function | Refs

GT-1 clade
At1g13450 | GT-1 | T F | Binds GGTTAA, light inducible genes (tobacco, Arabidopsis, rice) | [3–8,23,28]
At3g25990 | GT-4 | T F | Binds GGTTAA, light inducible genes (Arabidopsis) | [27]
At5g01380 | GT-3a | T F C | Binds GTTAC, light inducible genes (Arabidopsis) | [25]
At2g38250 | GT-3b | T F C | Expression rapidly induced by salt, pathogen stress (Arabidopsis, soybean) | [25,29]
At5g63420 | EMB2746 | L T F | Lactamase/trihelix chimera, essential in early embryogenesis (Arabidopsis) | [31]

GT-2 clade
At1g76890 | GT-2 | TN F C TC F | Binds GGTAATT, GGTAAAT, light repressible genes (rice, Arabidopsis) | [10–14]
At1g33240 | GTL1 | TN F C TC F | Repression of endoreduplication in trichomes, repression of repressor of stomatal development, binds GGTAAA (Arabidopsis) | [33–35]
At1g76880 | DF1-like | TN F C TC F | Binds GGTAATT, TACAGT, light repressible genes (pea); expression down-regulated by light (GmGT-2, soybean) | [21,37]
At5g03680 | PTL | TN F C TC F | Regionalized growth suppression in developing perianth (Arabidopsis) | [38–40]
At3g10000 | EDA31 | TN F C | Embryo sac development (Arabidopsis) | [43]
At5g28300 | | TN F C TC F | Tolerance to salt, freezing, drought stress (GmGT-2B, soybean) | [26]
At5g47660 | | C TC F | |

SH4 clade (two of these loci are listed as SH4-like1 and SH4-like2)
At2g35640 | | T S | |
At1g31310 | | T S | |
At2g33550 | | T S | |
At4g31270 | | T S? | |
Related rice gene: development of seed abscission layer (rice) | [44,45]

GTγ clade (two of these loci are listed as GTγ-1-3-like1 and GTγ-1-3-like2)
At1g76870 | | T F G | |
At1g21210 | | T F G | |
At3g10040 | | T F G | |
Related rice genes: tolerance to salt stress (rice) | [46]

SIP1 clade (structure T F I unless noted)
At3g11100, At5g05550, At3g58630, At1g54060, At3g14180 (ASIL2), At3g24490, At3g10030 (T F K), At3g54390, At2g44730, At4g17060 (FIP2), At3g24860
SIP1-like | A. tumefaciens 6b-interacting protein (tobacco) | [47]
ASIL1 | Repression of late embryogenesis genes, binds GTGATT (Arabidopsis) | [24,48]
ASIL2 (At3g14180) | Repression of late embryogenesis genes (Arabidopsis) | [48]
At3g10030 | Trihelix/aa-kinase chimera, vegetative development (Arabidopsis) | [49]
FIP2 (At4g17060) | FRIGIDA interacting protein (Arabidopsis) | [50]

(a) Databases of plant transcription factors list all loci except At4g17060 (FIP2) (planttfdb.cbi.pku.edu.cn) [20], or except this locus, three of the SH4 clade and all of the GTγ clade (plntfdb.bio.uni-potsdam.de/v3.0/) [19].

(b) Arabidopsis genes known only through their closest relative in another species are listed as 'like'. In the two cases with two such Arabidopsis genes, they are listed arbitrarily as 'like1' and 'like2'.

(c) Abbreviations: T, trihelix (TN, N-terminal; TC, C-terminal); F, fourth helix; C, central α-helical domain; G, GTγ α-helical domain; I, SIP1 α-helical domain; S, SH4 α-helical domain; K, aa-kinase sequence; L, lactamase sequence.
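
Table 1 codes the fourth amphipathic helix as F, and the Figure 1 legend gives its consensus, (F/Y)(F/Y)-X-X-(L/I/M)-X-X-(L/I/M). The following minimal Python sketch flags candidate matches to that consensus in a protein sequence using a regular expression. The sequences are hypothetical toy examples. A match only marks a candidate; whether it forms a helix just downstream of a trihelix domain has to be judged from the alignment and structure. The sketch also lets X be any residue, although X is mostly hydrophilic in real proteins.

```python
import re

# Fourth-helix consensus from the Figure 1 legend:
# (F/Y)(F/Y)-X-X-(L/I/M)-X-X-(L/I/M), with X treated here as any residue.
# The zero-width lookahead lets overlapping candidates all be reported.
FOURTH_HELIX = re.compile(r"(?=([FY]{2}.{2}[LIM].{2}[LIM]))")


def find_fourth_helix(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, 8-residue match) for each candidate motif."""
    return [(m.start(), m.group(1)) for m in FOURTH_HELIX.finditer(protein.upper())]


# Hypothetical toy sequence, not a real trihelix protein
print(find_fourth_helix("MSKFYEELKRLAG"))  # [(3, 'FYEELKRL')]
```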

The fourth helix

A fourth amphipathic α-helix lying closely downstream of the other three was discovered in GT-1 [5] and GT-2 [14], and shown to be required for DNA binding. Close inspection of the whole family in Arabidopsis reveals its presence in all members except the SH4 group, which instead has an extended third helix (Figure 1, Table 1). It occurs as a short but strongly conserved motif, (F/Y)-(F/Y)-X-X-(L/I/M)-X-X-(L/I/M), where X is mostly a hydrophilic residue (often a charged R, K, D or E). It will be interesting to determine how it fits into the three-dimensional structure of the protein–DNA complex. Perhaps the DNA binding domain should, after all, have been called the ‘tetrahelix’ [5,8,14].

[Figure 2 image: phylogenetic tree of the Arabidopsis and rice trihelix genes, grouped into the GT-1, GT-2, SH4, GTγ and SIP1 clades; scale bar, 0.1]

Figure 2. Phylogeny of the 30 trihelix genes of Arabidopsis thaliana and the 31 of rice, Oryza sativa ssp. japonica. The rice nomenclature follows the locus identifier code of the MSU rice genome annotation project (http://rice.plantbiology.msu.edu/index.shtml), and each name is preceded by LOC_. Alignments of the trihelix domains were assembled using 69 amino acids. These run from 4 residues upstream of the first conserved tryptophan to 12 downstream of the third, corresponding to residues 83–151 of GT-1 and to the equivalent residues in other proteins. Both trihelix domains of the GT-2 clade members (TN and TC) were used. Alignment was performed with ClustalW using default parameters, and a Neighbor-Joining tree was produced with 1000 bootstrap iterations. Results are shown using TreeView, with bootstrap values indicated at the nodes. Branch points that are not firmly established (bootstrap values less than 500 of 1000) are indicated in red. To provide a reference point, the first repeat (R2) of two Arabidopsis R2R3 MYB proteins is included: AtMYB66 (WEREWOLF, At5g14750) and AtMYB108 (BOTRYTIS-SUSCEPTIBLE1, At3g06490). The corresponding repeat of their closest rice relatives, LOC_Os05g35500 and LOC_Os03g20090, is also included. End branches leading to Arabidopsis genes are shown in brown, and those leading to rice genes in blue. Other phylogenies have appeared for Arabidopsis and rice trihelix proteins [24,45,46], although they used incomplete collections and full-length protein sequences. Overall, the 31 rice trihelix genes shown here include the 26 listed in [46] plus LOC_Os02g07800, LOC_Os02g31160, LOC_Os04g32590, LOC_Os04g45940 and LOC_Os04g57530. The scale bar represents 0.1 amino acid substitutions per site.

The α-helical coiled-coil domain

Another conserved structure occurs in trihelix proteins: a long α-helix mostly predicted to form a coiled-coil [13] (Figure 1, Table 1). It occurs in all Arabidopsis family members except GT-1 and GT-4. It lies in the C-terminal half of proteins with one trihelix domain, or centrally where there are two domains. The sequence itself is conserved within clades, but not usually between clades. Coiled-coils in transcription factors are frequently associated with dimerization, and this has been demonstrated for GT-3a and GT-3b [25] and for a soybean (Glycine max) relative of At5g28300 in the GT-2 clade [26]. It seems
likely that all members of the family multimerize through this α-helical region, except for GT-1 and GT-4, which dimerize through an unrelated sequence [5,6].

Function

Following the characterization of GT-1 and GT-2, the biological functions of other family members are now being uncovered at an accelerating pace (Figure 3, Table 1). These include localized suppression of growth and responses to abiotic and pathogenic stresses.

[Figure 3 image: (a) structural model of the AtGT-1 trihelix, with helices αA–αC and residues D133, D137 and R140 labelled; (b) Northern blots of AtGT-3b expression 0–24 h after pathogen or NaCl treatment; (c, d) Col and gtl1 mutant (gtl1-1, gtl1-4) plants; (e) wild type and ptl-1; (f) IL105 and Teqing; (g) mutant lines M1–M5 and wild-type sib CK on normal (N) and salt (S) medium; (h) relative levels of 2S3, CRC and Oleo2 transcripts in Col-0 and asil1-1, with and without ABA]

Figure 3. Biological roles of trihelix family members. (a) Structural model of the trihelix region of GT-1 of Arabidopsis in association with the DNA binding element GGTTAA. The third α-helix (αC) contacts the DNA major groove, and a phosphomimetic aspartate at position 133 lies in close association with the two Gs (inset) (from [23]). (b) Expression of the GT-3b gene is strongly and rapidly induced in 4-week-old Arabidopsis plants by treatment with Pseudomonas syringae and with 150 mM NaCl (Northern blots). Expression of GT-1 is unaffected (from [29]). (c) GTL1 dampens endoreduplication in trichomes of Arabidopsis, and the additional rounds occurring in gtl1 mutants are associated with trichome enlargement (from [34]). (d) GTL1 also directly downregulates SDD1, a negative regulator of stomatal development, so fewer stomata (black) arise in gtl1 mutants, and some are immature (arrows). As a consequence, gtl1 mutants have elevated water use efficiency (from [35]). (e) The PETAL LOSS gene of Arabidopsis inhibits growth between developing sepals. In ptl mutants, overgrowth in this region weakens a petal initiation signal, so petals are often missing and sepals are often partly fused (arrow) (from [39]). (f) The SHATTERING4 (SH4) gene of rice (also known as SHA1) is required for full formation of the abscission zone in the pedicels of maturing seeds. It is functional in wild-type strains such as IL105, but disrupted in all domesticated forms tested, including Teqing, leading to the retention of ripe seeds at harvest (from [45]). (g) The rice GTγ-1 gene promotes salt tolerance, and growth of mutants on 150 mM NaCl medium (S) for 8 days is inhibited compared with controls (N). The plants shown are progeny of individual mutant (M1–M5) and wild-type (CK) sibs (from [46]). (h) ASIL1 inhibits the expression of seed storage genes except during late embryogenesis in Arabidopsis. In asil1 mutants, genes encoding 2S3 albumin, cruciferin C and oleosin 2 are inappropriately expressed at relatively high levels in 2-week-old plants (ABA treatment for 2 days has relatively little effect) (from [24]). Reproduced with permission in each case.

GT-1 clade

In Arabidopsis, GT-4, a close relative of GT-1, binds to the same elements as GT-1 and is also expressed ubiquitously and mostly without light regulation [27] (Table 1). Its expression is, however, induced by light exposure of etiolated 3-day-old seedlings [27]. In fact, the full functions of GT-1 and GT-4 may not yet have been uncovered. Expression of the rice GT-1 ortholog (LOC_Os04g40930, known as RML1) (Figure 2) is reported to be downregulated by light in etiolated seedlings and to show a circadian pattern [28]. In addition, its expression is rapidly induced in seedlings following infection with the rice blast fungus (Magnaporthe grisea). Another pair of similar Arabidopsis genes, GT-3a and GT-3b, also occurs in the GT-1 clade [25] (Figures 1 and 2,
Table 1). A biological role for GT-3b in responding to salt and pathogen stress is clearly established (GT-3a was not examined) [29]. GT-3b expression is rapidly induced in 4-week-old Arabidopsis plants by 150 mM NaCl and by Pseudomonas syringae infection (Figure 3b) (see also [30]). In soybean, expression of a calmodulin signalling gene (SCaM-4) is induced following similar stresses. Significantly, the Arabidopsis GT-3b protein binds to a GT-like element (GAAAAA) in its promoter [29]. An unusual gene within the GT-1 clade encodes a metallo-β-lactamase–trihelix chimera. This gene is present in both Arabidopsis (At5g63420) and rice (LOC_Os02g33610) (Figure 2) and thus predates their common ancestor. The trihelix sequence occurs C-terminally and ends immediately after the fourth helix of the DNA binding domain. The gene is widely expressed in Arabidopsis, especially in seeds [30]. It is required for early embryogenesis, and emb2746 mutants develop only to the globular stage [31]. The lactamase component is a member of the β-CASP lactamase family involved in DNA repair and RNA processing [32], so the trihelix DNA binding domain may direct the hybrid protein to specific genomic locations. The protein may thus have exchanged a transcription factor role for one involved in chromosome maintenance.

GT-2 clade

An Arabidopsis gene closely related to GT-2 was identified early by sequence homology and named GT-2-LIKE1 (GTL1) [33]. Its function remained unknown for over 10 years. Loss-of-function gtl1 mutants were then found to have larger trichomes with increased levels of endoreduplication, although trichome number and branching were unaffected [34] (Figure 3c). GTL1 apparently limits the extent of endoreduplication by modulating the expression of relevant cell cycle genes. GTL1 also influences the development of another epidermal cell type, the stoma. In this case, however, stomatal number is reduced and stomatal size is unaffected in gtl1 mutants [35] (Figure 3d). The explanation is that GTL1 directly represses the expression of the STOMATAL DENSITY AND DISTRIBUTION1 (SDD1) gene, which encodes a serine protease that negatively regulates stomatal generation [35]. Thus, when GTL1 function is lost, negative regulation of stomatal generation is strengthened. This has implications for water use efficiency: gtl1 mutant plants, with fewer stomata, cope better with dehydration through reduced water loss. Significantly, GTL1 may contribute to this in the wild type, as its expression is reduced during drought stress, possibly leading to the generation of fewer stomata in newly arising leaves [35]. PsDF1, another member of the GT-2 group, was identified in the garden pea (P. sativum). It binds to DE1 (dark inducible element 1) in the promoter of the pra2 gene [36]. The trihelix gene was named DF1, from DE1 BINDING FACTOR 1, and the most closely related gene in Arabidopsis is At1g76880 (Table 1).
At the same time, a GT-2 group gene was identified from soybean that bound a related element, D1, in another light-downregulated gene, GmAux28 [37]. This gene was named GmGT-2, but it is more closely related in sequence to AtDF1 than to AtGT-2. Significantly, GmGT-2 expression, like that of its pea relative PsDF1, is also downregulated by light in hypocotyls [37]. Thus, unlike the constitutively expressed AtGT-2, legume DF1 genes may downregulate the light-dependent expression of their targets as a consequence of their own downregulation. PETAL LOSS (PTL) was the first trihelix gene associated with a morphogenetic function. Mutants showed a reduction in the number of petals per flower, and some sepal fusion [38] (Figure 3e). Following its cloning [39], PTL expression was found to occur between sepal primordia rather than in the petal initiation zone. Also, overexpression studies [39], including activation tagging [40], consistently resulted in inhibition of growth. This indicates that the wild-type protein normally dampens inter-sepal growth so that the sepals do not fuse together. In ptl loss-of-function mutants, overgrowth in this zone may affect nearby petal initiation, indirectly weakening an initiation signal [38]. A separate signal that influences petal orientation also seems to be involved. In the sepal and petal development pathway, PTL function is repressed by the transcription factors ASYMMETRIC LEAVES1 (AS1) and JAGGED (JAG), which are involved in organ outgrowth [41]. By contrast, PTL activates RABBIT EARS (RBE), a transcription factor that promotes petal initiation and growth [42]. PTL is also expressed in the basal margins of developing leaves, sepals, petals and stamens, and its general role there may be to limit the extent of organ outgrowth [39]. Interestingly, a definite ortholog of PTL cannot be identified in rice (Figure 2), suggesting that PTL and its growth-suppressing functions have arisen relatively recently. There is preliminary evidence that the closest relative of PTL is required for embryo sac development. A loss-of-function mutant named embryo sac development arrest 31 (eda31) shows a developmental block before the fusion of the polar nuclei [43]. Finally, two other GT-2 group genes have been characterized from soybean [26]. GmGT-2B is most closely related to At5g28300 (Table 1). Its expression is induced by ABA, high salt, drought and cold in 15-day-old seedlings. Consistent with this, overexpression in transgenic Arabidopsis is reported to increase tolerance to salt, drought and freezing. The other gene, GmGT-2A, is most closely related to Arabidopsis GT-2 and GTL1. Even so, its expression profile and the consequences of its overexpression are more similar to those of GmGT-2B [26]. It will be of interest to see whether loss of GmGT-2B and GmGT-2A function decreases tolerance to these stresses.

SH4 clade

The function of one member of this group in rice, LOC_Os04g57530 (Figure 2), is well characterized through a genetic variant that was selected during domestication. The recessive allele reduces shattering of the mature seed head, and hence the loss of ripe seeds before harvest (Figure 3f). The wild-type gene product promotes the full development and function of the abscission layer in the pedicel (stem) of mature seeds [44,45].
The gene was positionally cloned as SHATTERING4 (SH4), a quantitative trait locus segregating in the progeny of a cross between non-shattering domesticated rice (O. sativa ssp. indica) and a shattering wild species (Oryza nivara) [44]. It was later cloned independently as SHATTERING1 in another mapping cross, between the domesticated indica cultivar Teqing and a wild variety, YJCWR, from Yunnan Province in China [45]. Both studies identified the same causal amino acid change, K79N, in the first helix of the trihelix domain. Significantly, the same substitution was consistently present as a recessive change in all non-shattering domesticated strains tested, including ssp. japonica. It was absent from accessions of the shattering species O. nivara, Oryza rufipogon and four other wild species. How this change affects protein function is not yet known. Nothing is known about the function of the four Arabidopsis genes in this clade (Table 1, Figure 2).

GTγ clade

The functions of this clade have only recently been investigated [46]. Three of the four rice genes, OsGTγ-1, OsGTγ-2 and OsGTγ-3 (Figure 2), are expressed most strongly in the leaf lamina. Their expression in seedlings can be induced 2.5–10-fold by 6 h of salt stress, and also by drought and cold, at least for OsGTγ-1. Expression of all three is also boosted somewhat by ABA treatment. Further evidence for a role in salinity tolerance was obtained for OsGTγ-1 [46]. Shoots of a reduced-expression mutant were around 20–25% shorter when seedlings were treated for 8 days with 150 mM NaCl (Figure 3g), whereas overexpression of OsGTγ-1 increased shoot length. Consistent with this, sodium (but not potassium) ion concentrations were higher in the mutant and lower in the overexpressing lines. There are three GTγ group genes in Arabidopsis (Figure 2, Table 1) [46], but their expression does not seem to show similar stress-induced trends [30]. Further studies of the extent of their stress-related functions will be of interest.

SIP1 clade

The first member of this large group was identified by its binding to the Agrobacterium tumefaciens 6b oncogenic protein, and was named 6b INTERACTING PROTEIN1 (SIP1) [47]. The 6b protein promotes hormone-independent cell division in cultured tobacco cells. NtSIP1 is a transcriptional activator that also promotes nuclear localization of the 6b protein, but its role is otherwise unknown. Recently, two closely related SIP1 group genes from Arabidopsis have been shown to repress the expression of late embryo development genes in seedlings [24] and during early embryogenesis itself [48]. The genes were named ARABIDOPSIS 6b-INTERACTING PROTEIN1-LIKE1 and 2 (ASIL1 and ASIL2). ASIL1 transcripts were identified in seedlings because their product binds to a repression element in the promoter of the seed storage albumin gene 2S3, specifically to a GT box-like sequence (GTGATT) [24]. Strikingly, expression of 2S3 and other seed-specific genes was also derepressed in newly germinating seedlings of asil1 mutants [24] (Figure 3h).
In developing embryos, asil1 and asil2 mutations, in heterozygous and homozygous combinations, caused chlorophyll to accumulate precociously, as early as the globular stage, in a dosage-dependent manner [48]. Chlorophyll normally accumulates only at much later stages. Another chimeric gene occurs in the SIP1 group, with most of the coding sequence of an aspartate/glutamate/uridylate kinase fused immediately C-terminal to the fourth helix of the DNA binding region. Again, this gene is present in both Arabidopsis (At3g10030) (Table 1) and rice (LOC_Os04g33300) (Figure 2). The presumed parental kinase gene is also still present in both species (At3g18680 and LOC_Os01g73450). The function of the chimeric gene is unclear, although seedlings of an Arabidopsis mutant strain are dwarfed, with pale green and misshapen leaves [49]. Finally, a hidden member of the trihelix family that has been misannotated in the Arabidopsis genome also falls in the SIP1 group. The gene encodes FRIGIDA-INTERACTING PROTEIN2 (FIP2) (At4g17060). It carries a conserved and presumably functional trihelix sequence in the region annotated as the 5′ UTR. This sequence lies upstream of the annotated first methionine, so translation apparently starts at a different codon even further upstream. FIP2 was identified as interacting strongly with FRIGIDA (FRI) in a yeast two-hybrid screen [50]. FRI itself is a protein with two coiled-coil-forming domains but no apparent DNA binding motif, and it upregulates the expression of FLC, a key repressor of flowering. Together, FRI and FLC confer a vernalization requirement for flowering. However, fip2 loss-of-function mutations did not interfere with the FRI promotion of FLC expression [50], and the significance of the interaction remains to be uncovered. Its function may be conserved, as an apparent ortholog, LOC_Os10g41460, is present in rice (Figure 2).

In summary, the roles deduced early for trihelix genes centered on GT factors that bind to light induction and light repression elements. The family has now been extended to 30 or so members across three additional clades. Its known functions are widening accordingly, to include responses to abiotic and biotic stresses and the fine-tuning of a range of specialized developmental processes involving flowers, trichomes, stomata, embryos and seeds.

Future directions

Function

Functional studies need to be broadened and refined. For example, the involvement of GT factors in light regulation is still only tenuously established. Many of the studies so far have involved correlative matching of GT factors with GT elements. Firmer evidence for GT function is needed, particularly from loss-of-function tests. Scattered examples of trihelix proteins responding to environmental stresses are appearing across three clades (Table 1), but the full extent of this function is not yet established. Another emerging theme is a role in suppressing growth and other events in localized regions of the plant [34]. Examples include repression of the growth of trichomes [34] and of inter-sepal zones [38,39]. They also include repression of the accumulation of storage products outside late embryogenesis [24,48]. It will be interesting to determine the generality of all such functions and any underlying common mechanisms. Precedents from other transcription factor families indicate that functions are often shared within clades [51–53].
One complication of functional studies is the potential redundancy of closely related family members (Figure 2), so multiple-mutant studies, or targeted knock-down tests using artificial miRNAs, are needed. How trihelix transcription factors act at the molecular level can now be studied using new methods [54]. Their expression has already been mapped in tissue extracts [30], but in situ screens would add spatial specificity. Their interactions with each other, and with other transcription factors and co-factors, can now be mapped globally. For example, do they act universally as dimers? Are heterodimers also formed? Do they interact with transcription factors that are also involved in pathogen and abiotic stress responses, such as the WRKY family [55,56]? Their potential targets can now be defined by ChIP-seq and other methods, and candidates followed up. In this regard it will be of interest to determine their target sequence specificity, the approximate consensus so far being GGT(A/T)(A/T)(A/T) (Table 1).

It cannot be assumed that the same functions are conferred across all flowering plants. Only a very limited sample of family members, scattered across species, has been examined to date. Comparative studies are complicated by the occurrence of differing cohorts of recent gene duplications (Figure 2). For example, in the phylogeny the three related Arabidopsis genes GT-2, GTL1 and DF1-like are intermingled with the five closest rice genes, including OsGT-2 (Figure 2) [46]. Thus, the genes called GT-2 in rice and Arabidopsis may have diverged in function through subfunctionalization or neofunctionalization following independent duplications after the two lineages separated from a single common ancestor.

Origin and evolution

The trihelix family is apparently limited to land plants [1], although a report of its presence in humans and Drosophila [57] needs further investigation. Trihelix genes are absent from the green algae (Chlorophyta) [19,20,57,58], and the family has undergone large-scale expansion in the lineage of the last common ancestor of land plants [58]. The presumed origin of the trihelix domain from a MYB-like gene carrying only one repeat [21], and the relationship of trihelix genes to other divergent MYB-like genes, need to be examined in further detail. Their consensus DNA target sequence differs markedly from that of any of the MYB and MYB-like families [59], so any relationship is presumably distant. It will be of interest to deduce the ancestral functions of trihelix genes, especially using information from basal land plants [60]. Clearly, the developmental functions uncovered to date in angiosperms, involving stomata, trichomes, flowers, embryos and seeds, must be new, as many of these structures arose relatively late in evolution. Responses to light and stress, however, are required in all land plants, and these functions may have arisen first. Consistent with this, the relatively large number of introns in the light-associated GT-1 gene, compared with the absence of introns or presence of a single intron in other trihelix genes [46], suggests that GT-1 may be closest to a possible ancestor.
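The approximate trihelix target consensus, GGT(A/T)(A/T)(A/T), can be made concrete with a short sequence scan. The sketch below is a minimal illustration under stated assumptions, not a method from any cited study: it treats the consensus as an exact regular expression, ignores flanking sequence, spacing and affinity differences that real binding studies take into account, and the function names and example sequences are hypothetical. Because the consensus is not palindromic, a site on the reverse strand appears in the forward sequence as its reverse complement, so both strands are searched.

```python
import re

# Approximate trihelix target consensus GGT(A/T)(A/T)(A/T), written as a regex.
# The lookahead makes the match zero-width so overlapping sites are all reported.
GT_CONSENSUS = re.compile(r"(?=(GGT[AT]{3}))")

# Complement table for A/C/G/T; other characters (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gt_elements(seq: str) -> list[tuple[int, str, str]]:
    """Return (start, strand, site) for each consensus match on either strand.

    Hypothetical helper for illustration only. Start positions are 0-based
    coordinates on the forward strand; 'site' is read 5'->3' on its own strand.
    """
    seq = seq.upper()
    hits = []
    for m in GT_CONSENSUS.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    n = len(seq)
    for m in GT_CONSENSUS.finditer(rc):
        site = m.group(1)
        # Convert the reverse-strand match back to forward-strand coordinates.
        start = n - m.start() - len(site)
        hits.append((start, "-", site))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical fragment containing the GGTTAA core on the forward strand.
    print(find_gt_elements("CCGGTTAAGC"))  # prints [(2, '+', 'GGTTAA')]
```

For example, the GT element core GGTTAA from the pea rbcS-3A promoter fits the consensus and is reported at position 2 of the hypothetical fragment above. A match identifies only a candidate site; as the text notes, binding specificity and function would still need experimental confirmation.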
Divergence of newly arisen members from ancestral physiological functions to new morphogenetic roles could be a theme common to other transcription factor families, especially where large-scale amplification has occurred [58,60]. Interestingly, family members with duplicated DNA binding domains may be preferentially involved in morphogenetic functions, both in the trihelix family (Table 1) and in the AP2/EREBP family [60,61]. The presence of two DNA recognition domains within one protein may widen the spectrum of potential target genes and add to the adaptive complexity of their regulation.

Acknowledgments

We especially thank John Alvarez, Megan Griffith, Paul Howles, Aydin Kilinc, Edwin Lampugnani, Martin O’Brien, Pia Sappl and Joel Sohlberg for observations and discussions, and the Australian Research Council for sustained funding.

References
1 Riechmann, J.L. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105–2110
2 Green, P.J. et al. (1987) Sequence-specific interactions of a pea nuclear factor with light-responsive elements upstream of the rbcS-3A gene. EMBO J. 6, 2543–2549
3 Gilmartin, P.M. et al. (1992) Characterization of a gene encoding a DNA binding protein with specificity for a light-responsive element. Plant Cell 4, 839–849
4 Perisic, O. and Lam, E. (1992) A tobacco DNA binding protein that interacts with a light-responsive box II element. Plant Cell 4, 831–838
5 Lam, E. (1995) Domain analysis of the plant DNA-binding protein GT1a: requirement of four putative α-helices for DNA binding and identification of a novel oligomerization region. Mol. Cell. Biol. 15, 1014–1020
6 Hiratsuka, K. et al. (1994) Molecular dissection of GT-1 from Arabidopsis. Plant Cell 6, 1805–1813
7 Le Gourrierec, J. et al. (1999) Transcriptional activation by Arabidopsis GT-1 may be through interaction with TFIIA-TBP-TATA complex. Plant J. 18, 663–668
8 Maréchal, E. et al. (1999) Modulation of GT-1 DNA-binding activity by calcium-dependent phosphorylation. Plant Mol. Biol. 40, 373–386
9 Kay, S.A. et al. (1989) The rice phytochrome gene: structure, autoregulated expression, and binding of GT-1 to a conserved site in the 5′ upstream region. Plant Cell 1, 351–360
10 Dehesh, K. et al. (1990) A trans-acting factor that binds to a GT-motif in a phytochrome gene promoter. Science 250, 1397–1399
11 Dehesh, K. et al. (1992) GT-2: a transcription factor with twin autonomous DNA-binding domains of closely related but different target sequence specificity. EMBO J. 11, 4131–4144
12 Dehesh, K. et al. (1995) Twin autonomous bipartite nuclear localization signals direct nuclear import of GT-2. Plant J. 8, 25–36
13 Kuhn, R.M. et al. (1993) DNA binding factor GT-2 from Arabidopsis. Plant Mol. Biol. 23, 337–348
14 Ni, M. et al. (1996) GT-2: in vivo transcriptional activation activity and definition of novel twin DNA binding domains with reciprocal target sequence specificity. Plant Cell 8, 1041–1059
15 Zhou, D-X. (1999) Regulatory mechanism of plant gene transcription by GT-elements and GT-factors. Trends Plant Sci. 4, 210–214
16 Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815
17 Goff, S.A. et al. (2002) A draft sequence of the rice genome (Oryza sativa ssp. japonica). Science 296, 92–100
18 Yu, J. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92
19 Perez-Rodriguez, P. et al. (2010) PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res. 38, D822–D827
20 Zhang, H. et al. (2011) PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res. 39, D1114–D1117
21 Nagano, Y. (2000) Several features of the GT-factor trihelix domain resemble those of the Myb DNA-binding domain. Plant Physiol. 124, 491–493
22 Ogata, K. et al. (1994) Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79, 639–648
23 Nagata, T. et al. (2010) Solution structure of the trihelix DNA-binding domains of the wild type and a phosphomimetic mutant of Arabidopsis GT-1: mechanism for an increase in DNA-binding affinity through phosphorylation. Proteins 78, 3033–3047
24 Gao, M-J. et al. (2009) Repression of seed maturation genes by a trihelix transcriptional repressor in Arabidopsis seedlings. Plant Cell 21, 54–71
25 Ayadi, M. et al. (2004) Analysis of GT-3a identifies a distinct subgroup of trihelix DNA-binding transcription factors in Arabidopsis. FEBS Lett. 562, 147–154
26 Xie, Z-M. et al. (2009) Soybean trihelix transcription factors GmGT-2A and GmGT-2B improve plant tolerance to abiotic stresses in transgenic Arabidopsis. PLoS ONE 4, e6898
27 Murata, J. et al. (2002) Characterization of a novel GT-box binding protein from Arabidopsis. Plant Biotechnol. 19, 103–112
28 Wang, R. et al. (2004) Transcript abundance of rml1, encoding a putative GT1-like factor in rice, is up-regulated by Magnaporthe grisea and down-regulated by light. Gene 324, 105–115
29 Park, H.C. et al. (2004) Pathogen- and NaCl-induced expression of the SCaM-4 promoter is mediated in part by a GT-1 box that interacts with a GT-1-like transcription factor. Plant Physiol. 135, 2150–2161
30 Schmid, M. et al. (2005) A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506
31 Tzafrir, I. et al. (2004) Identification of genes required for embryo development in Arabidopsis. Plant Physiol. 135, 1206–1220
32 Callebaut, I. et al. (2002) Metallo-beta-lactamase fold within nucleic acid processing enzymes: the beta-CASP family. Nucleic Acids Res. 30, 3592–3601
33 Smalle, J. et al. (1998) The trihelix DNA-binding motif in higher plants is not restricted to the transcription factors GT-1 and GT-2. Proc. Natl. Acad. Sci. U.S.A. 95, 3318–3322
34 Breuer, C. et al. (2009) The trihelix transcription factor GTL1 regulates ploidy-dependent cell growth in the Arabidopsis trichome. Plant Cell 21, 2307–2322
35 Yoo, C.Y. et al. (2010) The Arabidopsis GTL1 transcription factor regulates water use efficiency and drought tolerance by modulating stomatal density via transrepression of SDD1. Plant Cell 22, 4128–4141
36 Nagano, Y. et al. (2001) Trihelix DNA-binding protein with specificities for two distinct cis-elements. J. Biol. Chem. 276, 22238–22243
37 O’Grady, K. et al. (2001) The transcript abundance of GmGT-2, a new member of the GT-2 family of transcription factors from soybean, is down-regulated by light in a phytochrome-dependent manner. Plant Mol. Biol. 47, 367–378
38 Griffith, M.E. et al. (1999) PETAL LOSS gene regulates initiation and orientation of second whorl organs in the Arabidopsis flower. Development 126, 5635–5644
39 Brewer, P.B. et al. (2004) PETAL LOSS, a trihelix transcription factor gene, regulates perianth architecture in the Arabidopsis flower. Development 131, 4035–4045
40 Li, X. et al. (2008) A gain-of-function mutation of transcriptional factor PTL results in curly leaves, dwarfism and male sterility by affecting auxin homeostasis. Plant Mol. Biol. 66, 315–327
41 Xu, B. et al. (2008) Arabidopsis genes AS1, AS2 and JAG negatively regulate boundary-specifying genes to promote sepal and petal development. Plant Physiol. 146, 566–575
42 Takeda, S. et al. (2004) RABBIT EARS, encoding a SUPERMAN-like zinc finger protein, regulates petal development in Arabidopsis thaliana. Development 131, 425–434
43 Pagnussat, G.C. et al. (2005) Genetic and molecular identification of genes required for female gametophyte development and function in Arabidopsis. Development 132, 603–614
44 Li, C. et al. (2006) Rice domestication by reduced shattering. Science 311, 1936–1939
45 Lin, Z. et al. (2007) Origin of seed shattering in rice (Oryza sativa L.). Planta 226, 11–20
46 Fang, Y. et al. (2010) Systematic analysis of GT factor family of rice reveals a novel subfamily involved in stress responses. Mol. Genet. Genomics 283, 157–169
47 Kitakura, S. et al. (2002) The protein encoded by oncogene 6b from Agrobacterium tumefaciens interacts with a nuclear protein of tobacco. Plant Cell 14, 451–463
48 Willman, M.R. et al. (2011) MicroRNAs regulate the timing of embryo maturation in Arabidopsis. Plant Physiol. 155, 1871–1884
49 Kumori, T. et al. (2006) A trial phenome analysis using 4000 Ds-insertional mutants in gene coding regions of Arabidopsis. Plant J. 47, 640–651
50 Geraldo, N. et al. (2009) FRIGIDA delays flowering in Arabidopsis via a cotranscriptional mechanism involving direct interaction with the nuclear cap-binding complex. Plant Physiol. 150, 1611–1618
51 Ariel, F.D. et al. (2007) The true story of the HD-Zip family. Trends Plant Sci. 12, 419–426
52 Carretero-Paulet, L. et al. (2010) Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in Arabidopsis, poplar, rice, moss, and algae. Plant Physiol. 153, 1398–1412
53 Dubos, C. et al. (2010) MYB transcription factors in Arabidopsis. Trends Plant Sci. 15, 573–581
54 Mitsuda, N. and Ohme-Takagi, M. (2009) Functional analysis of transcription factors in Arabidopsis. Plant Cell Physiol. 50, 1232–1248
55 Eulgem, T. et al. (2000) The WRKY superfamily of plant transcription factors. Trends Plant Sci. 5, 199–206
56 Zhang, Y. and Wang, L. (2005) The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol. Biol. 5, 1
57 Riaño-Pachón, D.M. et al. (2008) Green transcription factors: a Chlamydomonas overview. Genetics 179, 31–39
58 Lang, D. et al. (2010) Genome-wide comparative phylogenetic analysis of plant transcriptional regulation: a timeline of loss, gain, expansion and correlation with complexity. Genome Biol. Evol. 2, 488–503
59 Prouse, M.B. and Campbell, M.M. (2012) The interaction between MYB proteins and their target DNA binding sites. Biochim. Biophys. Acta 1819, 67–77
60 Floyd, S.K. and Bowman, J.L. (2007) The ancestral developmental tool kit of land plants. Int. J. Plant Sci. 168, 1–35
61 Feng, J-X. et al. (2005) An annotation update via cDNA sequence analysis and comprehensive profiling of developmental, hormonal and environmental responsiveness of the Arabidopsis AP2/EREBP transcription factor gene family. Plant Mol. Biol. 59, 853–868