Disease resistance signature of the leucine-rich repeat receptor-like kinase genes in four plant species


Plant Science 179 (2010) 399–406

Contents lists available at ScienceDirect

Plant Science journal homepage: www.elsevier.com/locate/plantsci

Ping Tang a,b,1, Ying Zhang a,1, Xiaoqin Sun c, Dacheng Tian a, Sihai Yang a,∗, Jing Ding a,∗

a State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210093 Nanjing, China
b Department of Life Science, Lianyungang Teacher’s College, 222006 Lianyungang, China
c Institute of Botany, Jiangsu Province & Chinese Academy of Science, Nanjing 210014, China

Article info

Article history: Received 14 April 2010; received in revised form 25 June 2010; accepted 30 June 2010; available online 7 July 2010

Keywords: LRR-RLK; Disease resistance gene; Gene expansion; Tandem duplication

Abstract

Receptor-like kinase (RLK) genes may be a potential reservoir of plant disease resistance (R) genes, since defense response has been shown to be one of the main categories of their biological functions. Although genome-wide identification of RLKs has been accomplished in various plant species, little has been done to distinguish the genes related to plant disease resistance. To discover more R gene candidates in plants, we surveyed all the leucine-rich repeat (LRR) RLKs in four fully sequenced genomes: Arabidopsis thaliana, rice, poplar and grapevine. Phylogenetic analysis showed that the LRR-RLKs in these four species could be divided mainly into two contrasting groups: lineage-specific Expanded and ortholog-unambiguous Nonexpanded. About 39.2% of all LRR-RLKs, residing in 16 major clades, belonged to the Expanded group and exhibited characteristics similar to those of typical functional R genes. A high proportion (72%) of these genes was located in tandem duplications, indicating that lineage-independent expansion played an essential role in their evolution. By contrast, genes in the Nonexpanded group tended to function in plant growth and development and in basal defense responses. Positive selection driven by highly variable pathogen effectors, together with gene expansion, is likely to be one of the two major factors underlying the differences between the two groups of LRR-RLKs. © 2010 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding authors. Tel.: +86 25 83686406; fax: +86 25 83592740. E-mail addresses: [email protected] (S. Yang), [email protected] (J. Ding).
1 These authors contributed equally to this work.
0168-9452/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.plantsci.2010.06.017

1. Introduction

The plant immune system is composed of two main mechanisms [1]. The first uses transmembrane pattern recognition receptors (PRRs) to detect conserved pathogen-associated molecular patterns (PAMPs, such as bacterial flagellin) and to trigger basal defense responses to pathogens. This mechanism is called PAMP-triggered immunity (PTI). FLS2, a receptor-like kinase (RLK) with extracellular leucine-rich repeats (LRRs), is one of the representative PRRs; it mediates perception of the flagellin peptide in Arabidopsis thaliana, rice and other land plants [2–4]. The second mechanism, called effector-triggered immunity (ETI), depends on specific recognition and interaction between plant disease resistance (R) genes and pathogen effectors (gene-for-gene resistance [5]). Many functional R genes have been investigated, such as Rpp5, which confers resistance to some strains of the downy mildew pathogen in A. thaliana [6]. Rpp5 belongs to the largest class of plant R genes, which encode nucleotide-binding site (NBS) and LRR domains. The only known function of NBS-LRR genes is believed to lie in the second layer of disease resistance, that is, in plant ETI [7,8]. NBS-LRR genes have been extensively surveyed in various species [9–11]. All genome-wide analyses show that a large proportion of these genes are arranged in clusters. Tandem duplication of paralogous sequences has been proposed as a major cause of this clustering [8] and of the expansion of this gene group in some perennial species [11]. R-gene diversity is shaped by positive selection acting on segregating nonsynonymous sites and by balancing selection that maintains variation at many R loci [12,13]. Both selective forces have been shown to be driven by the co-evolution of host R genes and avirulence (Avr) genes in pathogens [14].

However, RLK genes related to plant disease resistance have been poorly studied. RLKs are transmembrane proteins that typically contain an N-terminal extracellular signal sequence and a C-terminal intracellular kinase [15–17]. These genes form a huge family in plants: more than 610 putative RLK genes have been identified in A. thaliana [17], 605 in soybean [18], 1070 in rice and 1192 in poplar [19], indicating their diverse roles in the plant life cycle. The biological functions of these RLK genes fall into two main categories [17]. The first category includes genes involved in plant growth and development, such as BRI1 for brassinosteroid signal transduction [20] and EMS1 for microspore development [21] in A. thaliana. The other category is related to plant–microbe interactions and defense responses [17,22] and includes FLS2, mentioned above. FLS2 belongs to the LRR-RLK group, the largest RLK subfamily, whose members encode extracellular LRRs in their N-terminal parts. Besides FLS2, several novel LRR-RLKs have recently been found to regulate responses to biotic stresses, such as EFR from A. thaliana, which perceives the bacterial protein EF-Tu [23], and the rice R genes Xa21 and Xa3/Xa26, which confer resistance to rice bacterial blight [24,25]. FLS2 and EFR are PAMP receptors in PTI, whereas the two rice R genes confer gene-for-gene resistance to specific races of pathogens in ETI, suggesting that LRR-RLKs are an important group of genes for both levels of plant defense.

Many NBS-LRR genes have been identified in plant genomes, e.g., 149 in A. thaliana [9], 330 in poplar, and 459 and 480 in the grapevine and rice genomes, respectively [10,11]. Nevertheless, the total number of R genes in a genome seems insufficient for R genes alone to confer resistance to the multitude of pathogens that a plant is likely to encounter [7]. On the other hand, over 200 and 300 LRR-RLK genes have been found in A. thaliana and rice, respectively [17], but only a few of them have known functions [19,26,27]. The LRR domain shared by LRR-RLK and NBS-LRR proteins, together with the examples above of LRR-RLKs that function in plant–pathogen interactions, implies that LRR-RLKs may be candidate plant R genes. In this study, we identified 266 and 321 LRR-RLK genes in the grapevine and poplar genomes, respectively, and analyzed their genomic organization and phylogenetic distribution along with the LRR-RLKs of A. thaliana and rice. Our results show that a number of LRR-RLKs can be classified into a group that shares features of genetic variation and evolution with the NBS-LRRs, indicating potential functions in disease resistance in these four plant genomes.

2. Materials and methods

2.1. Identification of LRR-RLK genes

The Ser/Thr/Tyr kinase genes in A. thaliana (The Arabidopsis Information Resource [TAIR], version 7) and rice (Oryza sativa subsp. japonica; International Rice Genome Sequencing Project, version 6.0) were identified using the same method as Shiu et al. [17]. For poplar (Populus trichocarpa; Joint Genome Institute [JGI], version 1.1) and grapevine (Vitis vinifera; the French National Sequencing Center [Genoscope], version 1.0), tBLASTn searches with the amino acid sequences of the Pkinase domain (Pfam PF00069 and the sequences used in Shiu and Bleecker [16]) against the genomic sequences were first performed with an E-value cutoff of 1 to identify possible encoded homologs [16]. The nucleotide sequences of the candidate Ser/Thr/Tyr kinase genes, together with 5000–10,000 bp of flanking sequence on both sides, were then annotated with the gene-finding programs FGENESH (http://www.softberry.com/) and GENSCAN (http://genes.mit.edu/genescan.html/) to obtain complete open reading frames (ORFs). Afterwards, all candidate kinase genes in the four genomes were examined to define their structural domains using the Pfam database (http://pfam.janelia.org/) and SMART protein motif analysis (http://smart.embl-heidelberg.de/). Genes with extracellular LRR motifs were selected as members of the LRR-RLK subfamily for subsequent analyses. To exclude potentially redundant candidate LRR-RLK genes in poplar and grapevine, sequences from these two genomes were positioned on the genome using BLASTn, and candidates located at the same position were eliminated. As in Lehti-Shiu et al. [19], tandem LRR-RLK genes were defined as genes that (1) are at most 10 genes apart and (2) lie within 100 kb (A. thaliana) or 350 kb (rice, poplar and grapevine) of each other. Some genes in poplar and grapevine were on the unknown chromosome, which consists of contigs that could not be assembled into chromosomes. Because we could not determine whether these genes resided in tandem repeats, they were excluded from the statistics.

2.2. Sequence alignment and phylogenetic analysis

The kinase domain sequences obtained from the Pfam and SMART analyses in all four species were aligned, together with 14 functional LRR-RLKs from other species or ecotypes, using ClustalW with default options [28]. Based on the alignments, we generated a whole phylogenetic tree (Fig. S1) and four separate trees (Fig. S2A–D) using the neighbor-joining (NJ) method with the Jukes–Cantor model in MEGA v4.0 [29]. The stability of internal nodes was assessed by bootstrap analysis with 1000 replicates. We then calculated the average nucleotide diversity (π) of the kinase domain sequences and split the phylogeny into major clades based on the nucleotide diversity/divergence between paralogs within a major clade (≤60%) combined with bootstrap values (≥70). We also tried other criteria for dividing the phylogenetic tree, such as a higher diversity threshold (70%), which would place some practically unrelated genes in the same major clade, or a lower one (50%), which would produce many more single-gene clades. Neither criterion had a large effect on the grouping of genes. Most major clades containing genes with more than 40% similarity had bootstrap values higher than 70, although there were several exceptions that may have resulted from similar levels of divergence among genes within and between clades on the tree. Full-length nucleotide coding sequences (CDSs) of all LRR-RLKs were aligned in MEGA according to the protein sequence alignments. We then divided the genes into two main parts (the Pkinase domain and the LRR region) for further analysis.
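The distance and diversity measures used above can be illustrated with a short sketch. This is a minimal, hypothetical re-implementation rather than the MEGA routines the authors used: it assumes that columns containing a gap or ambiguous base are skipped pairwise (MEGA offers several deletion options, and the text does not say which was chosen), computes the uncorrected p-distance, applies the Jukes–Cantor correction d = -(3/4) ln(1 - 4p/3) used for the NJ trees, and estimates π as the mean pairwise p-distance within one clade. The helper names (`p_distance`, `jukes_cantor`, `mean_pairwise_diversity`, `passes_diversity_cutoff`) are illustrative only.

```python
import itertools
import math

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion; an assumption, not necessarily MEGA's setting).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in VALID and y in VALID]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity (pi) as the mean p-distance over all sequence pairs."""
    pairs = list(itertools.combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def passes_diversity_cutoff(seqs: list[str], cutoff: float = 0.60) -> bool:
    """Diversity part of the clade-splitting rule (pi <= 60%); bootstrap is checked separately."""
    return mean_pairwise_diversity(seqs) <= cutoff


if __name__ == "__main__":
    clade = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTTC"]
    print(f"pi = {mean_pairwise_diversity(clade):.3f}")  # pi = 0.133
    print(f"JC(0.1) = {jukes_cantor(0.1):.4f}")          # JC(0.1) = 0.1073
    print(passes_diversity_cutoff(clade))                # True
```

In the paper this diversity check is combined with bootstrap support (≥70) from the NJ tree; the sketch covers only the sequence-based half of that rule.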
The π values of the whole sequence and of both regions were calculated separately for every possible pair of paralogs on the tree. The ratio of nonsynonymous to synonymous nucleotide substitutions (Ka/Ks, where Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks is the number of synonymous substitutions per synonymous site) is an effective parameter for detecting signatures of selection. We therefore also evaluated the Ka/Ks ratios for the entire sequence and for each part of the genes among the paralogs within each major clade. Some paralogs were too divergent to be aligned in the LRR domain, so we excluded pairs with π values larger than 0.8 for the LRR domain or larger than 0.6 for the whole CDS. All the parameters mentioned above were calculated in MEGA.

AtGenExpress microarray data of plant defense responses [30] were obtained from TAIR (http://www.arabidopsis.org/). A total of 108 LRR-RLKs, including all the unknown Arabidopsis genes in the Expanded group and in the Nonexpanded R and D subgroups plus 37 randomly selected genes in the Nonexpanded U subgroup, were searched in these data to investigate whether they are up-regulated under biotic stress conditions, using the same method as Lehti-Shiu et al. [19].

3. Results

3.1. Total number and organization of LRR-RLK genes in the four plant genomes

Based on the characteristic domains of LRR-RLKs reported previously, we identified 217, 310, 321 and 266 candidate LRR-RLK genes in the genomes of A. thaliana, rice, poplar and grapevine, respectively (Table 1). Given the previously reported numbers of protein-coding genes in these four genomes (approximately 27,000 in A. thaliana [31], 37,544 in rice [32], 45,555 in poplar [33] and 30,434 in grapevine [34]), LRR-RLK genes make up approximately 0.80%, 0.83%, 0.70% and 0.87% of all genes, respectively. Among these more than 1100 LRR-RLKs, only 43 genes (36 from A. thaliana and 7 from rice; Table S1) had known functions or had alleles with known functions in other ecotypes, accounting for less than 5.0% of the identified LRR-RLK genes. The LRR-RLK genes were mostly distributed evenly across the chromosomes in each species; however, 31.3%, 48.1%, 27.2% and 53.5% of the genes resided in 22, 43, 28 and 34 clusters in A. thaliana, rice, poplar and grapevine, respectively (Fig. S3). The average numbers of genes per cluster were very close (from 2.32 to 3.62) in the four genomes. Grapevine had the largest cluster, containing 15 LRR-RLK genes.

Table 1
Grouping of the LRR-RLK genes from the phylogeny.

| Group (a) | Clade number | A. thaliana (b) | Rice (b) | Poplar (b) | Grape (b) | Total (b) |
|---|---|---|---|---|---|---|
| Expanded R | 9 | 46/23.0 | 146/20.9 | 32/10.7 | 40/13.3 | 264/29.3 |
| Expanded U | 7 | 11/2.8 | 36/9.0 | 53/8.8 | 73/12.2 | 173/24.7 |
| Nonexpanded R | 5 | 9/2.3 | 6/2.0 | 14/3.5 | 10/2.5 | 39/7.8 |
| Nonexpanded D | 19 | 40/2.2 | 21/2.1 | 47/2.6 | 31/1.7 | 139/7.3 |
| Nonexpanded U | 62 | 100/1.7 | 59/1.6 | 147/2.4 | 84/1.4 | 390/6.3 |
| ND | 27 | 4/1.3 | 33/2.8 | 27/2.3 | 18/1.3 | 82/3.0 |
| ND single | 28 | 7 | 10 | 1 | 10 | 28 |
| Total | 157 | 217 | 310 | 321 | 266 | 1114 |

(a) Subgroups are named by the group title plus a suffix: R for clades with functional genes involved in plant resistance, D for clades with known genes functioning in plant growth and development, and U for clades without any functional genes. ND and ND single denote clades of the ND group containing multiple genes or a single gene, respectively.
(b) The number to the left of the slash is the total number of genes in the subgroup; the number to the right is the average, calculated as the total gene number divided by the number of clades (only clades including gene(s) of the species are considered).

3.2. Phylogeny and classification of the LRR-RLK genes

We aligned the Pkinase domain (PF00069) sequences of all the LRR-RLK proteins from the four species with 14 functional LRR-RLKs from other species or ecotypes (Table S1) and constructed a phylogenetic tree in MEGA v4.0 (Fig. S1). Based on the nucleotide diversity/divergence between paralogs and the bootstrap values (see Section 2 for details), we split the tree into 157 major clades to investigate the genes in detail. Representative major clades and the whole phylogeny are shown in Figs. 1 and S1, respectively; they exhibit two dominant types of phylogenetic structure. Within a major clade of the first type (Fig. 1A or B), a considerable number of genes (at least six) from the same species are clustered on the tree. These genes are rarely interspersed with genes from other species, regardless of how many species the clade contains. This kind of clustering should be a consequence of gene expansion that occurred independently during evolution, so we assign all major clades with this type of phylogeny to the Expanded group.
By contrast, in the second type of phylogeny, genes from different species are intermixed within a major clade, as illustrated in Fig. 1C. No sign of gene expansion is observed, so we designate this type of clade as the Nonexpanded group. In addition, some major clades contain only a few genes (≤5) from one or two species (Fig. 1D). These clades cannot simply be assigned to either group because of the limited numbers of genes and species they include, so we categorize them as the non-defined (ND) group. The separate phylogenetic trees for the LRR-RLKs of each of the four species show topologies similar to the whole phylogeny (Fig. S2A–D). Genes from the same species belonging to the Expanded group are always clustered on the separate trees, and branch lengths among these genes are generally much shorter than those among LRR-RLKs of the Nonexpanded group. Although some genes cannot be easily assigned to the Expanded or Nonexpanded group based on the separate phylogenies alone, the basic structures of the four separate trees confirm our division of the whole phylogeny.

There are 16 major clades in the Expanded group, comprising 437 (39.2%) of all LRR-RLK genes (Table 2). This number is significantly higher than expected from a random distribution of genes among the clades (χ² = 1127, or 585 when genes in the ND group are excluded; both P < 0.001). Each clade contains 27.3 genes on average. When the average gene number for each species is calculated, clades without any gene from that species are excluded; this gives averages of 9.5, 16.5, 9.4 and 12.6 genes per Expanded clade for A. thaliana, rice, poplar and grapevine, respectively (Table 2). However, gene numbers vary greatly among clades and among species within the same clade. For example, clade Expanded R7 contains 48 genes from the four species plus one functional rice gene from another ecotype (Fig. 1A and Table 2). About half of the 48 genes are from poplar, while the other half comprises 14 genes from rice, 6 from A. thaliana and 5 from grapevine. Most genes from the same species reside in the same minor clade. Of the 14 rice and 6 Arabidopsis genes, 10 and 4, respectively, are located in tandem repeats. Of the 23 poplar genes, 5 of the 12 that have been assembled into chromosomes are located in two tandem repeats; 6 of the remaining genes are located on two scaffolds and are probably clustered as well. The topology of this major clade and the physical distribution of its genes suggest that the most dramatic expansion occurred independently in each species lineage. The different levels of species-specific expansion may reflect different levels of the various biotic stresses these species encounter.
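The three-way classification at the start of this section was made by inspecting tree topology. As a rough illustration only, the hypothetical function below reduces "clustered on the tree" to "consecutive in the clade's leaf order" and applies the thresholds quoted in the text: a run of at least six genes from one species marks an Expanded clade, a clade with at most five genes from one or two species is ND, and a clade mixing genes from three or more species without such a run is treated as Nonexpanded. Patterns the text does not cover are returned as "unresolved"; the function name and the leaf-order simplification are assumptions, not the authors' procedure.

```python
from itertools import groupby


def classify_clade(leaf_species: list[str], min_run: int = 6, nd_max_genes: int = 5) -> str:
    """Assign a major clade to Expanded, Nonexpanded or ND (simplified, hypothetical rule).

    leaf_species lists the species of each gene in the order its leaves appear
    on the tree; a contiguous run stands in for "clustered on the tree".
    """
    if not leaf_species:
        raise ValueError("empty clade")
    longest_run = max(len(list(run)) for _, run in groupby(leaf_species))
    n_species = len(set(leaf_species))
    if longest_run >= min_run:
        return "Expanded"        # lineage-specific expansion of >= 6 genes
    if len(leaf_species) <= nd_max_genes and n_species <= 2:
        return "ND"              # too few genes/species to decide
    if n_species >= 3:
        return "Nonexpanded"     # species intermixed, no expansion signal
    return "unresolved"          # pattern not covered by the criteria in the text


examples = {
    "rice-specific expansion": ["rice"] * 8,
    "intermixed orthologs": ["At", "poplar", "grape", "rice", "At", "poplar"],
    "two-gene clade": ["rice", "rice"],
}
for label, leaves in examples.items():
    print(label, "->", classify_clade(leaves))
# rice-specific expansion -> Expanded
# intermixed orthologs -> Nonexpanded
# two-gene clade -> ND
```

A real implementation would read leaf order from the NJ tree (for example a Newick file) and would also weigh branch lengths, which this sketch ignores.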
Besides clades of this kind, which contain Expanded genes from all four species, many other clades in this group have a single gene or even no gene from some of the species (Table 2, Figs. 1B and S1). This is expected, since the major expansions occurred independently, as discussed above. Species-specific clades are the extreme result of this type of expansion. Six species-specific Expanded clades are observed, all restricted to the rice lineage. Overall, rice has both the largest number and the highest frequency of LRR-RLK genes involved in gene expansion. Grapevine also has a high frequency of Expanded genes, whereas A. thaliana and poplar have fewer genes in this group of clades (Table 2), implying that artificial selection or domestication may contribute to the excess of LRR-RLK expansion in rice and grapevine. Another feature of the Expanded group is that a majority of its genes (72.0% = 283/393, excluding poplar and grapevine genes without known chromosomal positions; see Section 2 for details) reside in tandem repeats (Fig. S3). Based on the physical distributions of LRR-RLK genes on the chromosomes (Fig. S3), the average proportion of genes in tandem duplications in the 16 clades is 77.2%, 65.4%, 52.8% and 91.1% for A. thaliana, rice, poplar and grapevine, respectively (only clades including gene(s) of the species are considered). These results suggest that tandem duplication is the major cause of LRR-RLK expansion in all four plant lineages, confirming the finding of Lehti-Shiu et al. that the RLK/Pelle family has a significantly higher rate of expansion, largely due to tandem duplications in some subfamilies [19].

Fig. 1. Representative kinase domain phylogenies of the LRR-RLK genes in the four species: (A) clades in the Expanded group; (B) species-specific clades in the Expanded group; (C) clades in the Nonexpanded group; (D) clades in the ND group containing genes from two species. Different symbols denote genes identified from A. thaliana, rice, poplar and grapevine, respectively; a further symbol marks genes with known functions related to plant disease resistance.

In the Nonexpanded group, more than half of the LRR-RLK genes (568/1114) reside in 86 major clades, with an average of 6.6 genes per clade. This mean, which covers genes from at least three species, is less than one quarter of the value for the Expanded group, revealing the distinctive traits of the two groups. In general, each clade in the Nonexpanded group contains only one or two members from each species, and the variation among species is slight (Table 1 and Fig. S1). Moreover, a much lower proportion of genes (18.7% = 95/508) is involved in tandem duplications, consistently across the four species. As Fig. S3 shows, genes in this group are dispersed across the chromosomes of each species, much more evenly than genes in the Expanded group. All of these results demonstrate significant differences between the Expanded and Nonexpanded groups. In total, 44 clades in this group contain LRR-RLKs from all four species. The remaining 42 clades consist of genes from three species, and a great majority (85.7%) of them lack rice genes. Genes from poplar and grapevine are always located next to each other, with A. thaliana and rice genes lying close by and more distantly outside, respectively (Figs. 1 and S1). The phylogeny of genes in the Expanded group shows nearly the same relationships among the species, despite the excess of expansions. The inferred lineage relationships agree with the general classification of these four species [33].
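The tandem proportions reported above (for example 72.0% in the Expanded group and 18.7% in the Nonexpanded group) rest on the tandem criterion of Section 2.1, adapted from Lehti-Shiu et al. [19]. A minimal sketch of that criterion follows. It assumes that each gene carries its rank among all annotated genes on its chromosome (used for the "at most 10 genes apart" test) and its start coordinate (used for the 100 kb or 350 kb window), and that neighbouring LRR-RLKs are chained into clusters by single linkage. The `Gene` record, its fields and the chaining rule are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    rank: int    # index among ALL annotated genes on the chromosome (hypothetical field)
    start: int   # start coordinate in bp


def tandem_clusters(genes, max_gene_gap=10, max_bp=100_000):
    """Chain adjacent LRR-RLKs on a chromosome that are <= max_gene_gap genes
    and <= max_bp apart; return the clusters that contain at least two genes."""
    by_chrom = defaultdict(list)
    for g in genes:
        by_chrom[g.chrom].append(g)
    clusters = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        current = [chrom_genes[0]]
        for prev, cur in zip(chrom_genes, chrom_genes[1:]):
            if cur.rank - prev.rank <= max_gene_gap and cur.start - prev.start <= max_bp:
                current.append(cur)          # still inside the tandem window
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [cur]              # start a new candidate cluster
        if len(current) > 1:
            clusters.append(current)
    return clusters


def fraction_in_tandem(genes, **window):
    """Share of genes that fall into any tandem cluster."""
    in_tandem = sum(len(c) for c in tandem_clusters(genes, **window))
    return in_tandem / len(genes)


genes = [
    Gene("g1", "chr1", 1, 1_000), Gene("g2", "chr1", 3, 20_000),
    Gene("g3", "chr1", 40, 500_000), Gene("g4", "chr2", 5, 1_000),
    Gene("g5", "chr2", 6, 90_000), Gene("g6", "chr2", 7, 300_000),
]
print(fraction_in_tandem(genes))                   # A. thaliana window: 4 of 6 genes
print(fraction_in_tandem(genes, max_bp=350_000))   # rice/poplar/grapevine window: 5 of 6
```

Because linkage is pairwise between neighbours, a chained cluster can span more than the window as a whole; whether the original analysis allowed this is not stated in the text.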


Table 2
Distinct expansion of the LRR-RLK genes in the four genomes.

| Clade | A. thaliana | Rice | Poplar | Grape | Total | Func. gene (a) |
|---|---|---|---|---|---|---|
| Expanded R1 | 0 | 38 | 0 | 0 | 38 | OsBISERK1 |
| Expanded R2 | 0 | 0 | 5 | 20 | 25 | Xa3/Xa26 |
| Expanded R3 | 0 | 19 | 0 | 0 | 19 | Xa3/Xa26 |
| Expanded R4 | 0 | 27 | 0 | 0 | 27 | EFR, Xa21 |
| Expanded R5 | 0 | 21 | 0 | 0 | 21 | EFR, Xa21 |
| Expanded R6 | 0 | 6 | 0 | 0 | 6 | EFR, Xa21 |
| Expanded R7 | 6 | 14 | 23 | 5 | 48 | EFR, Xa21 |
| Expanded R8 | 0 | 21 | 0 | 0 | 21 | SIRK, LRRPK |
| Expanded R9 | 40 | 0 | 4 | 15 | 59 | SIRK, LRRPK |
| Expanded U1 | 3 | 20 | 3 | 7 | 33 | |
| Expanded U2 | 6 | 1 | 12 | 18 | 37 | |
| Expanded U3 | 1 | 9 | 2 | 1 | 13 | |
| Expanded U4 | 0 | 0 | 27 | 30 | 57 | |
| Expanded U5 | 0 | 0 | 4 | 6 | 10 | |
| Expanded U6 | 1 | 0 | 5 | 11 | 17 | |
| Expanded U7 | 0 | 6 | 0 | 0 | 6 | |
| Average (b) | 9.5 | 16.5 | 9.4 | 12.6 | – | |
| Total | 57 | 182 | 85 | 113 | 437 | |
| Percentage (c) | 26.3% | 58.7% | 26.5% | 42.5% | 39.2% | |

(a) Functional gene(s) located in or near the clades.
(b) The average gene number in the clades for each species was calculated as the total number of genes from the species divided by the number of clades that contain gene(s) from that species.
(c) The percentage of genes in the Expanded group relative to the total number of LRR-RLKs in the species.
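Footnotes b and c of Table 2 define the summary rows arithmetically, so they can be recomputed from the per-clade counts. The short sketch below does this using the species totals of Table 1 (217, 310, 321 and 266 LRR-RLKs) as denominators for the percentages; the dictionary layout and the `summarize` helper are illustrative.

```python
# Per-clade gene counts copied from Table 2: (A. thaliana, rice, poplar, grapevine).
TABLE2 = {
    "Expanded R1": (0, 38, 0, 0), "Expanded R2": (0, 0, 5, 20),
    "Expanded R3": (0, 19, 0, 0), "Expanded R4": (0, 27, 0, 0),
    "Expanded R5": (0, 21, 0, 0), "Expanded R6": (0, 6, 0, 0),
    "Expanded R7": (6, 14, 23, 5), "Expanded R8": (0, 21, 0, 0),
    "Expanded R9": (40, 0, 4, 15), "Expanded U1": (3, 20, 3, 7),
    "Expanded U2": (6, 1, 12, 18), "Expanded U3": (1, 9, 2, 1),
    "Expanded U4": (0, 0, 27, 30), "Expanded U5": (0, 0, 4, 6),
    "Expanded U6": (1, 0, 5, 11), "Expanded U7": (0, 6, 0, 0),
}
SPECIES = ("A. thaliana", "Rice", "Poplar", "Grape")
ALL_LRR_RLKS = (217, 310, 321, 266)  # species totals from Table 1


def summarize(table, species_totals):
    """Per-species Expanded totals, averages (footnote b) and percentages (footnote c)."""
    summary = {}
    for i, (name, n_all) in enumerate(zip(SPECIES, species_totals)):
        counts = [row[i] for row in table.values()]
        occupied = [n for n in counts if n > 0]  # only clades containing this species
        expanded = sum(counts)
        summary[name] = {
            "total": expanded,
            "average": expanded / len(occupied),
            "percentage": 100 * expanded / n_all,
        }
    return summary


for name, row in summarize(TABLE2, ALL_LRR_RLKS).items():
    print(f"{name:<12} total={row['total']:>3} "
          f"average={row['average']:.1f} percentage={row['percentage']:.1f}%")

overall = sum(sum(counts) for counts in TABLE2.values())
print(f"All species: {overall} genes, {100 * overall / sum(ALL_LRR_RLKS):.1f}% of {sum(ALL_LRR_RLKS)}")
```

Running it reproduces the Average, Total and Percentage rows of Table 2: 9.5, 16.5, 9.4 and 12.6 genes per occupied clade, and 26.3%, 58.7%, 26.5%, 42.5% and 39.2% overall.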

The ND group includes 41 and 14 clades that contain genes from one and two species, respectively. Unlike in the Expanded group, most (28 of 41) species-specific clades in this group are single-gene clades, which diverged from their neighboring genes at early stages. Nevertheless, some clades contain genes in tandem repeats, indicating that they may also have been involved in gene expansion. Rice has many more genes in species-specific clades than the other three species, which may be attributed to its relatively distant relationship to them.

3.3. Functional genes in the LRR-RLK groups

Forty-three LRR-RLKs in the four species have known functions. Together with 11 functional LRR-RLKs from other species, 54 known genes were used as indicators of gene function (Table S1). Six of these 54 genes are located in or near nine clades of the Expanded group: three from A. thaliana (LRRPK, SIRK and EFR) and three from rice (OsBISERK1, Xa21 and Xa3/Xa26; Tables 2 and S1). Of these six genes, Xa21 and Xa3/Xa26 have been clearly identified as rice R genes [24,25], and SIRK and OsBISERK1 have also been shown to be involved in defense responses against pathogen effectors [35,36], although the detailed mechanisms remain unclear. This suggests that most LRR-RLK genes in the Expanded group play a role in plant ETI, like R genes. According to previous studies [24,25], these genes tend to sit in tandem repeats and to have high levels of diversity, features shared by most NBS-LRR genes, the primary R-gene component of plant ETI. EFR is the only known gene in the Expanded group that takes part in plant PTI [23]. It has so far been identified only in A. thaliana, and no homologs have been found in any other plant species [4,23], indicating a recent origin and an evolutionary history distinct from that of typical PAMP receptors.
Another exception is LRRPK (AT4G29990), which functions in light signal transduction but is located next to three Arabidopsis genes in the major clade Expanded R9 (Figs. S1 and S2A). Those three genes, including the R gene SIRK (AT2G19190), are members of a tandem repeat on chromosome 2, implying that the distinctive function of LRRPK may be a consequence of translocation and neofunctionalization after gene duplication(s). In the Nonexpanded group, 24 of the 86 clades contain 44 functional genes altogether, the vast majority of which play roles in growth and developmental control or symbiosis (Table S1). The rest are involved in the plant immune system. FLS2 in A. thaliana is representative of these genes: it encodes a well-known LRR-RLK PRR that recognizes bacterial flagellin and triggers basal defense responses in plant PTI [2]. Studies of this gene and its orthologs in rice and other plants have shown that there is always a single copy of FLS2 in the genomes of land plants [2–4]. The gene is highly conserved within species and can function complementarily across divergent plants, suggesting a conserved perception and signaling system among plant taxa [3]. This differs from R genes involved in plant-specific resistance, such as Xa21 and Xa3/Xa26 in the Expanded group, which show high levels of genetic and phenotypic variation among accessions within species [24,25]. Based on the functional genes located in or near the clades, we further divided the Expanded and Nonexpanded groups into subgroups using the suffixes R, D and U (Table 1). The suffix R indicates that a functional gene involved in plant resistance is present in the clade, D indicates a developmental gene, and U means no functional genes. The Expanded group has no Expanded D subgroup, since none of the developmental genes reside in its clades.

3.4. Nucleotide diversity and nonsynonymous to synonymous substitutions

We calculated the average nucleotide diversity (π) and the nonsynonymous and synonymous substitutions for different regions between each pair of paralogs within clades (Table 3). For most LRR-RLKs, regardless of group, the Ka/Ks ratios for the coding sequence and for each domain are all much smaller than 1, revealing that negative selection is the main force acting on them. However, π and Ka/Ks values in the LRR domains are consistently and significantly greater than those in the Pkinase domains (two-tailed t-tests, both P < 0.001).
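The Ka and Ks values in Table 3 were computed in MEGA, and the text does not state which estimator was used. To show how such a ratio is obtained for one codon-aligned pair of paralogs, the sketch below implements the Nei–Gojobori counting method with a Jukes–Cantor correction, an assumed (commonly used) choice: synonymous sites are averaged over the two codons, differences at codons that differ at several positions are averaged over all mutational pathways (pathways through stop codons are skipped), and gapped, ambiguous or stop codons are ignored. Counting changes to stop codons as nonsynonymous when counting sites is a simplification; all function names are illustrative.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of one codon (changes to stop count as nonsynonymous)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def diff_counts(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over all mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon; skip it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        else:  # pathway completed without hitting a stop codon
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p: float) -> float:
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks) for two codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue  # skip gapped, ambiguous or stop codons
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = diff_counts(c1, c2)
        Sd += sd
        Nd += nd
        n_codons += 1
    N = 3 * n_codons - S
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else math.nan)


print(ka_ks("CTGAAAGAT", "CTAAAAGAA"))  # one synonymous and one nonsynonymous change
```

The clade-level figures in Table 3 would then be averages of such per-pair values over all retained paralog pairs, after the π-based exclusions described in Section 2.2.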
Furthermore, many more Ka/Ks ratios exceed 1 in the LRR domains than in the Pkinase domains (24 vs. 3). These results confirm that the Pkinase domain is more conserved than the extracellular region of the RLK genes [17]. Interestingly, when comparing these parameters between genes in the two contrasting groups, we find that the Ka/Ks ratios for all regions of the genes in the Expanded group are significantly greater than those in the Nonexpanded group (Table 3; P < 0.001). Of the 24 Ka/Ks ratios greater than 1 in the LRR domains, seventeen come from genes in the Expanded group, whereas only one comes from the Nonexpanded group (the remaining six are from the ND group), indicating that Expanded genes are more likely to be under positive or relaxed negative selection than Nonexpanded ones. On the other hand, the ranges of Ka/Ks ratios among clades within the same group are much smaller than the ranges among clades from different groups. In particular, the unknown genes have Ka/Ks ratios equal to those of the functional genes in the same group, indicating that they are subject to similar evolutionary forces. We therefore propose that LRR-RLK genes belonging to the same group may function in similar biological processes or pathways, whereas different groups of genes are likely involved in distinct categories of functions in the four species investigated.

Table 3
Average nucleotide diversity (π), nonsynonymous (Ka) and synonymous (Ks) substitutions of the LRR-RLK genes in groups.

| Group | CDS π | CDS Ka | CDS Ks | CDS Ka/Ks | Pkinase π | Pkinase Ka | Pkinase Ks | Pkinase Ka/Ks | LRR π | LRR Ka | LRR Ks | LRR Ka/Ks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expanded R | 0.41 | 0.30 | 0.96 | 0.34 | 0.30 | 0.19 | 0.88 | 0.24 | 0.46 | 0.36 | 1.03 | 0.38 |
| Expanded U | 0.36 | 0.26 | 0.87 | 0.35 | 0.25 | 0.15 | 0.80 | 0.23 | 0.42 | 0.33 | 0.94 | 0.40 |
| Nonexpanded R | 0.27 | 0.16 | 0.94 | 0.21 | 0.21 | 0.09 | 0.94 | 0.12 | 0.34 | 0.23 | 0.98 | 0.28 |
| Nonexpanded D | 0.29 | 0.18 | 0.92 | 0.21 | 0.20 | 0.08 | 0.94 | 0.09 | 0.35 | 0.24 | 0.94 | 0.27 |
| Nonexpanded U | 0.30 | 0.19 | 0.95 | 0.22 | 0.22 | 0.10 | 0.95 | 0.13 | 0.34 | 0.23 | 0.99 | 0.26 |
| ND | 0.32 | 0.24 | 0.72 | 0.42 | 0.26 | 0.17 | 0.70 | 0.31 | 0.35 | 0.27 | 0.73 | 0.52 |

4. Discussion

4.1. Prediction of gene function for the LRR-RLKs

The RLK/Pelle gene family is one of the largest families in plant species [16,17,19]. LRR-RLK is one major subfamily of the RLK/Pelle genes. It consists of numerous members, but the vast majority of them have no known function [19,26,27]. It would therefore be of great benefit if their functions could be predicted. The LRR-RLKs encode highly diverged LRR motifs, which are commonly found in proteins involved in a variety of signaling pathways [16,17]. In particular, the LRR motifs in NBS-LRRs, the putative R genes in plants, are believed to play an essential role in perceiving and responding to signals produced by pathogen effectors [7,37], indicating that LRR-RLKs may be R gene candidates. According to our results, the LRR-RLKs can be divided into two groups based on their phylogenetic structures. Gene expansion resulting from tandem duplication is the main signature of genes in the Expanded group.
An average of 9.4–16.5 genes from each of the four plant species are included in a major clade of this group, with a high proportion residing in tandem repeats. It has recently been proposed that tandem duplications are correlated with stress responses and with plant-specific adaptation involving variable receptors [19,22], consistent with R functions for these genes. Research on the Xa21 locus provides a good opportunity to examine the role of species-specific expansion in gene evolution. In the resistant rice line IRBB21, at least seven members of the Xa21 gene family are located in a 230-kb genomic region [24,38], whereas the Xa21 locus in the rice line Nipponbare comprises only a single non-functional gene. The functional Xa21 region is derived from the wild rice species Oryza longistaminata, providing evidence of species-specific duplication and recombination in the amplification and diversification of the Xa21 gene family [38]. Besides gene expansion, other characteristics of these genes, such as the topology of their phylogeny and their selective and evolutionary patterns, are all nearly the same as those of the NBS-LRRs.

Besides NBS-LRRs, LRR-RLKs also share the extracellular LRR (eLRR) domain with receptor-like proteins (RLPs). RLPs form the second largest group of eLRR-containing cell surface receptors, outnumbered only by the LRR-RLKs. Among the RLPs identified in various plant species, some function like R genes, such as the tomato Cf genes [39], whereas others play roles in plant development. A recent genome-wide functional investigation of RLPs in A. thaliana revealed that several RLPs are involved in nonhost interactions in plant PTI, but no specific responses to the pathogens were detected [40]. Nevertheless, these findings on RLPs provide a clue for the functional prediction of LRR-RLKs: eLRR-containing cell surface receptors can play roles in plant disease resistance. Microarray data of plant defense responses for Arabidopsis genes [30], available from TAIR, also support our proposals. Twenty of the 54 unknown genes in the Expanded group have microarray data available, of which 60.0% (12/20) are up-regulated under one or more biotic stress conditions (Table S2). By contrast, of the 54 randomly selected genes in the Nonexpanded group, only nine (20.0%) of the 45 with available microarray data are up-regulated. These nine genes include LRR-RLKs located in the Nonexpanded R clades, which may be involved in basal defense responses like FLS2 (Table S2). If these genes are excluded, the percentage falls to 18.6%. We can therefore conclude from the available microarray data that LRR-RLKs in the Expanded group are more likely to be involved in stress responses and are the major contributors to the genes of this type that are overrepresented under biotic stress conditions [19]. All of the above strongly suggests that LRR-RLKs may function in plant defense responses, and that the LRR-RLKs in the Expanded group perform functions similar to those of NBS-LRRs, playing roles in the recognition of, and interaction with, pathogen effectors in plant ETI.
In the plant R-Avr interaction system, these genes would supplement the small number of identified R genes, which may be insufficient given the enormous number of pathogens a plant may encounter [7]. Similarly, the LRR-RLKs of unknown function in the Nonexpanded group are likely involved in plant growth and development or in basal defense responses, because their traits resemble those of the functional genes in this group, such as BRI1, EMS1 and FLS2. In contrast to R genes in plant ETI, genes functioning in basal defense responses recognize slowly evolving PAMPs and act in conserved signaling pathways independent of R genes [1,3,5]. This may explain why these LRR-RLKs rarely reside in tandem duplications and rarely show positive selection [17]. In general, therefore, these genes belong to the Nonexpanded group and share phylogenetic and evolutionary patterns analogous to those of genes involved in developmental processes. The exception is EFR in A. thaliana, which has been shown to encode a PAMP receptor in PTI [23] but resides in a clade of possible Arabidopsis-specific expansion (Fig. S1). No gene from the other species lies near EFR on the phylogenetic tree, consistent with the absence of this gene from other plant species [4,23]. This independent origin in A. thaliana may have led to a distinctive evolutionary process that sets the phylogeny of EFR apart from that of typical PAMP receptor genes.
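The contrast between ortholog-unambiguous clades and lineage-specific expansions, including the Arabidopsis-only clade around EFR, can be illustrated with a small sketch that labels a clade from the species of its member genes. This is not the study's classification procedure. The function name `classify_clade`, the threshold `max_per_species`, the species labels and all gene identifiers are hypothetical. The rule itself is an assumption: a clade counts as Nonexpanded when no species contributes more than one gene, and as Expanded otherwise. The study's Expanded clades are far larger, averaging 9.4–16.5 genes per species.

```python
from collections import Counter

# The four genomes analysed in the study; the labels are illustrative strings.
SPECIES = ("A. thaliana", "rice", "poplar", "grapevine")

def classify_clade(members, species=SPECIES, max_per_species=1):
    """Label one phylogenetic clade from the species of its member genes.

    members: iterable of (gene_id, species) pairs assigned to the clade.
    Hypothetical rule: 'Nonexpanded' when no species contributes more than
    max_per_species genes (orthology is unambiguous); otherwise 'Expanded',
    or 'Expanded (lineage-specific)' when a single species supplies every gene.
    """
    counts = Counter(sp for _, sp in members)
    if not counts:
        raise ValueError("clade has no members")
    unknown = set(counts) - set(species)
    if unknown:
        raise ValueError(f"unexpected species: {sorted(unknown)}")
    # Counter returns 0 for species absent from the clade.
    if all(counts[sp] <= max_per_species for sp in species):
        return "Nonexpanded"
    if len(counts) == 1:
        return "Expanded (lineage-specific)"
    return "Expanded"


# Hypothetical gene identifiers, for illustration only.
orthologous = [("At_1", "A. thaliana"), ("Os_1", "rice"),
               ("Pt_1", "poplar"), ("Vv_1", "grapevine")]
efr_like = [("At_2", "A. thaliana"), ("At_3", "A. thaliana"), ("At_4", "A. thaliana")]
mixed = [("At_5", "A. thaliana"), ("At_6", "A. thaliana"), ("Os_2", "rice")]

for name, clade in [("orthologous", orthologous), ("efr_like", efr_like), ("mixed", mixed)]:
    print(name, classify_clade(clade))
```

A count-based rule like this ignores tree topology and branch lengths. A real analysis would take clade membership from the phylogeny first and then apply whatever criterion the study defines.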

Fig. 2. Relationship between Ka/Ks and Ks for the LRR regions of LRR-RLK paralogs in the (A) Expanded R subgroup (R² = 0.950, P < 0.001) and (B) Nonexpanded R subgroup (R² = 0.447, P > 0.01). The X-axis shows the average Ks within intervals of 0.1 (Ks = 0–0.1, 0.1–0.2, 0.2–0.3, etc.); the Y-axis shows the average Ka/Ks ratio.
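The binning described in the Fig. 2 caption can be sketched as follows. Paralog pairs are grouped into Ks intervals of width 0.1. Ks and Ka/Ks are then averaged within each interval, and R² is taken as the squared Pearson correlation between the bin means. Computing R² on bin means rather than on individual pairs is an assumption, since the caption does not state it. The function names and the input pairs are invented for illustration and do not reproduce the study's data.

```python
import math
from collections import defaultdict
from statistics import mean

def bin_by_ks(pairs, width=0.1):
    """Group paralog pairs by Ks into intervals [0, w), [w, 2w), ... and average each bin.

    pairs: iterable of (ks, ka_ks) tuples, one per paralog pair.
    Returns (mean Ks, mean Ka/Ks) for each non-empty bin, ordered by Ks.
    """
    bins = defaultdict(list)
    for ks, omega in pairs:
        # A tiny epsilon stops values such as 0.3 from dropping into the lower bin
        # through floating-point error (0.3 / 0.1 == 2.9999999999999996).
        bins[math.floor(ks / width + 1e-9)].append((ks, omega))
    return [(mean(k for k, _ in bins[i]), mean(w for _, w in bins[i]))
            for i in sorted(bins)]

def pearson_r(points):
    """Pearson correlation coefficient between the x and y values of points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant x or y")
    return sxy / math.sqrt(sxx * syy)


# Invented (Ks, Ka/Ks) values, for illustration only.
pairs = [(0.05, 2.0), (0.08, 1.6), (0.15, 1.2), (0.25, 0.8), (0.35, 0.5)]
points = bin_by_ks(pairs)
r = pearson_r(points)
print(points, round(r, 3), round(r ** 2, 3))
```

A negative r with a high R² corresponds to the pattern reported for the Expanded R subgroup, where young duplicates (low Ks) carry the highest Ka/Ks. Significance testing is omitted here.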

4.2. Distinct selective and evolutionary patterns of the LRR-RLKs

The two groups of LRR-RLKs, with diverse functions in developmental processes and plant defense, showed significantly different levels of nucleotide diversity and Ka/Ks ratios, suggesting that they have been subject to different selective pressures. Ks measures synonymous (silent) substitutions, which do not alter the amino acid sequence, and can be used as a proxy for time. We could therefore compare the evolutionary processes of the two groups of LRR-RLKs by examining how their Ka/Ks ratios vary with Ks (Fig. 2). There was a significant negative correlation between Ka/Ks and Ks for the LRR regions among the most closely related paralogs in the Expanded group (Fig. 2 and Fig. S4). The Ka/Ks ratios of recent LRR-RLK duplicates were significantly higher, indicating positive selection or relaxed negative selection shortly after the duplication events. This finding agrees with previous studies that detected positive selection specific to the LRR region in a series of LRR-RLKs [41,42]. Interestingly, studies of the NBS-LRR genes also revealed nearly identical significant negative correlations between Ka/Ks and Ks in the four species studied here [11]. This temporal trend is therefore likely a consequence of the adaptation of NBS-LRR and Expanded LRR-RLK genes to diverse plant pathogens. Functional redundancy among recently duplicated NBS-LRR and LRR-RLK genes allowed positive selection or relaxed negative selection to act [17], leading to the accumulation of degenerative mutations in the young duplicates. When purifying selection later intensified, this accumulated variation would provide a reservoir for subsequent functional specialization or neofunctionalization, generating new disease resistance specificities against the fast-evolving effector genes of plant pathogens.
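To make the quantities above concrete, the following sketch shows the last step of a Nei–Gojobori-style estimate of Ka and Ks. The proportions of nonsynonymous and synonymous differences are corrected for multiple hits with the Jukes–Cantor formula d = −(3/4) ln(1 − 4p/3) and then divided to give Ka/Ks. Several things here are assumptions: the site and difference counts are taken as already computed (codon-by-codon counting is omitted), and this is not necessarily the exact estimator used in the study. The counts, the function names and the tolerance in `selection_label` are hypothetical. The labels follow the conventional reading (ratio above 1 means positive selection, below 1 means purifying), whereas the text above interprets high ratios more cautiously as positive or relaxed negative selection.

```python
import math

def jukes_cantor(p):
    """Correct a proportion p of differing sites for multiple substitutions (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the correction is undefined at saturation")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ka_ks(nonsyn_sites, syn_sites, nonsyn_diffs, syn_diffs):
    """Return (Ka, Ks, Ka/Ks) from site and difference counts for one aligned gene pair.

    The counts are assumed to be precomputed, e.g. by Nei-Gojobori site counting.
    Ka/Ks is None when Ks is 0, because the ratio is then undefined.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, (ka / ks if ks > 0 else None)

def selection_label(omega, tol=0.05):
    """Conventional reading of Ka/Ks; the tolerance around 1 is an arbitrary choice."""
    if omega > 1.0 + tol:
        return "positive selection"
    if omega < 1.0 - tol:
        return "purifying selection"
    return "near neutral"


# Hypothetical counts for one LRR region: 300 nonsynonymous and 100 synonymous sites.
ka, ks, omega = ka_ks(300, 100, 30, 5)
print(round(ka, 4), round(ks, 4), round(omega, 2), selection_label(omega))
```

Because Ks appears in the denominator, pairs with very few synonymous differences give unstable ratios. Averaging within Ks bins, as in Fig. 2, partly smooths this noise.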
No comparable negative correlation between Ka/Ks and Ks was detected for the genes in the Nonexpanded group, even for those functioning in basal defense responses (Fig. 2 and Fig. S4). Recognition of conserved PAMPs or developmental signals does not require highly variable receptors in this type of LRR-RLK. Furthermore, the selective forces acting on these genes are unlikely to be uniform, since the genes participate in various signaling pathways. The Ka/Ks ratios showed no regular variation over time, revealing an evolutionary pattern distinct from that of the LRR-RLKs in the Expanded group.

4.3. LRR-RLK expansion related to defense responses

In this study, we identified approximately 0.80%, 0.83%, 0.71% and 0.87% of all predicted genes as LRR-RLKs in A. thaliana, rice, poplar and grapevine, respectively. Although these values show similar proportions of LRR-RLK genes across the four genomes, apart from a slight decrease in poplar, this does not mean that LRR-RLKs are distributed in proportion to the total gene number. First, the increased size of the rice RLK family has been shown to result from dramatic expansion of the gene family, not from a general trend toward larger gene numbers in rice [17]. Second, the most dramatic expansions occurred independently in different species in a few subfamilies whose members have known functions in plant defense responses [19], such as the LRR-RLK subfamily. Our analysis of the four genomes confirmed this finding, and we further propose that the LRR-RLKs in the Expanded group, which play roles in plant ETI, contributed significantly to the tandem duplications and lineage-specific expansions of the whole subfamily. The genes functioning in PTI were rarely involved in expansions, although they also respond to biotic stress. In the Expanded group, 77.2% of Arabidopsis, 65.4% of rice, 52.8% of poplar and 91.1% of grapevine LRR-RLK genes reside in tandem repeats (Fig. S3). Three of these four frequencies are even higher than the corresponding frequencies of clustered NBS-LRRs (73.2%, 59.6%, 67.5% and 83.2% for A. thaliana, rice, poplar and grapevine, respectively [9,10,43]); only poplar is lower. Although the total number of LRR-RLKs in tandem repeats and their extent of expansion are much lower than those of the NBS-LRRs, expansion is still of great importance in the evolution of LRR-RLKs involved in plant gene-for-gene resistance.

Acknowledgments

This work was supported by the National Natural Science Foundation of China.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.plantsci.2010.06.017.

References

[1] J.D. Jones, J.L. Dangl, The plant immune system, Nature 444 (2006) 323–329. [2] L. Gomez-Gomez, T. Boller, FLS2: an LRR receptor-like kinase involved in the perception of bacterial elicitor flagellin in Arabidopsis, Mol. Cell 5 (2000) 1003–1011. [3] R. Takai, A. Isogai, S. Takayama, F.S.
Che, Analysis of flagellin perception mediated by flg22 receptor OsFLS2 in rice, Mol. Plant Microbe Interact. 21 (2008) 1635–1642. [4] C. Zipfel, Pattern-recognition receptors in plant innate immunity, Curr. Opin. Immunol. 20 (2008) 10–16. [5] S.T. Chisholm, G. Coaker, B. Day, B.J. Staskawicz, Host–microbe interactions: shaping the evolution of the plant immune response, Cell 124 (2006) 803–814. [6] J.E. Parker, M.J. Coleman, V. Szabò, L.N. Frost, R. Schmidt, E.A. van der Biezen, T. Moores, C. Dean, M.J. Daniels, J.D.G. Jones, The Arabidopsis downy mildew resistance gene RPP5 shares similarity to the Toll and interleukin-1 receptors with N and L6, Plant Cell 9 (1997) 879–894. [7] J.L. Dangl, J.D. Jones, Plant pathogens and integrated defence responses to infection, Nature 411 (2001) 826–833. [8] B.C. Meyers, S. Kaushik, R.S. Nandety, Evolving disease resistance genes, Curr. Opin. Plant Biol. 8 (2005) 129–134.

[9] B.C. Meyers, A. Kozik, A. Griego, H. Kuang, R.W. Michelmore, Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis, Plant Cell 15 (2003) 809–834. [10] S. Yang, Z. Feng, X. Zhang, K. Jiang, X. Jin, Y. Hang, J.Q. Chen, D. Tian, Genomewide investigation on the genetic variations of rice disease resistance genes, Plant Mol. Biol. 62 (2006) 181–193. [11] S. Yang, X. Zhang, J.X. Yue, D. Tian, J.Q. Chen, Recent duplications dominate NBS-encoding gene expansion in two woody species, Mol. Genet. Genomics 280 (2008) 187–198. [12] M. Mondragón-Palomino, B.C. Meyers, R.W. Michelmore, B.S. Gaut, Patterns of positive selection in the complete NBS-LRR gene family of Arabidopsis thaliana, Genome Res. 12 (2002) 1305–1315. [13] E.G. Bakker, C. Toomajian, M. Kreitman, J. Bergelson, A genome-wide survey of R gene polymorphisms in Arabidopsis, Plant Cell 18 (2006) 1803–1818. [14] R.L. Allen, P.D. Bittner-Eddy, L.J. Grenville-Briggs, J.C. Meitz, A.P. Rehmany, L.E. Rose, J.L. Beynon, Host-parasite coevolutionary conflict between Arabidopsis and downy mildew, Science 306 (2004) 1957–1960. [15] S.H. Shiu, A.B. Bleecker, Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases, Proc. Natl. Acad. Sci. U.S.A. 98 (2001) 10763–10768. [16] S.H. Shiu, A.B. Bleecker, Expansion of the receptor-like kinase/Pelle gene family and receptor-like proteins in Arabidopsis, Plant Physiol. 132 (2003) 530–543. [17] S.H. Shiu, W. Karlowski, R. Pan, Y. Tzeng, K. Mayer, W.H. Li, Comparative analysis of the receptor-like kinase family in Arabidopsis and rice, Plant Cell 16 (2004) 1220–1234. [18] P. Liu, W. Wei, S. Ouyang, J.S. Zhang, S.Y. Chen, W.K. Zhang, Analysis of expressed receptor-like kinases (RLKs) in soybean, J. Genet. Genomics 36 (2009) 611–619. [19] M.D. Lehti-Shiu, C. Zou, K. Hanada, S.H. Shiu, Evolutionary history and stress regulation of plant receptor-like kinase/pelle genes, Plant Physiol. 150 (2009) 12–26. [20] J. Li, J. 
Chory, A putative leucine-rich repeat receptor kinase involved in brassinosteroid signal transduction, Cell 90 (1997) 929–938. [21] D.Z. Zhao, G.F. Wang, B. Speal, H. Ma, The excess microsporocytes1 gene encodes a putative leucine-rich repeat receptor protein kinase that controls somatic and reproductive cell fates in the Arabidopsis anther, Genes Dev. 16 (2002) 2021–2031. [22] A.J. Afzal, A.J. Wood, D.A. Lightfoot, Plant receptor-like serine threonine kinases: roles in signaling and plant defense, Mol. Plant Microbe Interact. 21 (2008) 507–517. [23] C. Zipfel, G. Kunze, D. Chinchilla, A. Caniard, J.D. Jones, T. Boller, G. Felix, Perception of the bacterial PAMP EF-Tu by the receptor EFR restricts Agrobacterium-mediated transformation, Cell 125 (2006) 749–760. [24] W.Y. Song, G.L. Wang, L.L. Chen, H.S. Kim, L.Y. Pi, T. Holsten, J. Gardner, B. Wang, W.X. Zhai, L.H. Zhu, C. Fauquet, P. Ronald, A receptor kinase-like protein encoded by the rice disease resistance gene, Xa21, Science 270 (1995) 1804–1806. [25] X. Sun, Y. Cao, Z. Yang, C. Xu, X. Li, S. Wang, Q. Zhang, Xa26, a gene conferring resistance to Xanthomonas oryzae pv. oryzae in rice, encodes an LRR receptor kinase-like protein, Plant J. 37 (2004) 517–527. [26] S.A. Morillo, F.E. Tax, Functional analysis of receptor-like kinases in monocots and dicots, Curr. Opin. Plant Biol. 9 (2006) 460–469. [27] X. Gou, K. He, H. Yang, T. Yuan, H. Lin, S.D. Clouse, J. Li, Genome-wide cloning and sequence analysis of leucine-rich repeat receptor-like protein kinase genes in Arabidopsis thaliana, BMC Genomics 11 (2010) 19. [28] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680. [29] K. Tamura, J. Dudley, M. Nei, S. Kumar, MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0, Mol. Biol. Evol. 
24 (2007) 1596–1599.
[30] J. Kilian, D. Whitehead, J. Horak, D. Wanke, S. Weinl, O. Batistic, C. D’Angelo, E. Bornberg-Bauer, J. Kudla, K. Harter, The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses, Plant J. 50 (2007) 347–363.
[31] Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana, Nature 408 (2000) 796–815.
[32] International Rice Genome Sequencing Project, The map-based sequence of the rice genome, Nature 436 (2005) 793–800.
[33] G.A. Tuskan, S. Difazio, S. Jansson, J. Bohlmann, I. Grigoriev, U. Hellsten, N. Putnam, S. Ralph, S. Rombauts, A. Salamov, J. Schein, L. Sterck, A. Aerts, R.R. Bhalerao, R.P. Bhalerao, D. Blaudez, W. Boerjan, A. Brun, A. Brunner, V. Busov, M. Campbell, J. Carlson, M. Chalot, J. Chapman, G.L. Chen, D. Cooper, P.M. Coutinho, J. Couturier, S. Covert, Q. Cronk, R. Cunningham, J. Davis, S. Degroeve, A. Déjardin, C. Depamphilis, J. Detter, B. Dirks, I. Dubchak, S. Duplessis, J. Ehlting, B. Ellis, K. Gendler, D. Goodstein, M. Gribskov, J. Grimwood, A. Groover, L. Gunter, B. Hamberger, B. Heinze, Y. Helariutta, B. Henrissat, D. Holligan, R. Holt, W. Huang, N. Islam-Faridi, S. Jones, M. Jones-Rhoades, R. Jorgensen, C. Joshi, J. Kangasjärvi, J. Karlsson, C. Kelleher, R. Kirkpatrick, M. Kirst, A. Kohler, U. Kalluri, F. Larimer, J. Leebens-Mack, J.C. Leplé, P. Locascio, Y. Lou, S. Lucas, F. Martin, B. Montanini, C. Napoli, D.R. Nelson, C. Nelson, K. Nieminen, O. Nilsson, V. Pereda, G. Peter, R. Philippe, G. Pilate, A. Poliakov, J. Razumovskaya, P. Richardson, C. Rinaldi, K. Ritland, P. Rouzé, D. Ryaboy, J. Schmutz, J. Schrader, B. Segerman, H. Shin, A. Siddiqui, F. Sterky, A. Terry, C.J. Tsai, E. Uberbacher, P. Unneberg, J. Vahala, K. Wall, S. Wessler, G. Yang, T. Yin, C. Douglas, M. Marra, G. Sandberg, Y. van de Peer, D. Rokhsar, The genome of black cottonwood, Populus trichocarpa (Torr. & Gray), Science 313 (2006) 1596–1604.
[34] O. Jaillon, J.M. Aury, B. Noel, A. Policriti, C. Clepet, A. Casagrande, N. Choisne, S. Aubourg, N. Vitulo, C. Jubin, A. Vezzi, F. Legeai, P. Hugueney, C. Dasilva, D. Horner, E. Mica, D. Jublot, J. Poulain, C. Bruyère, A. Billault, B. Segurens, M. Gouyvenoux, E. Ugarte, F. Cattonaro, V. Anthouard, V. Vico, C. del Fabbro, M. Alaux, G. di Gaspero, V. Dumas, N. Felice, S. Paillard, I. Juman, M. Moroldo, S. Scalabrin, A. Canaguier, I. le Clainche, G. Malacrida, E. Durand, G. Pesole, V. Laucou, P. Chatelet, D. Merdinoglu, M. Delledonne, M. Pezzotti, A. Lecharny, C. Scarpelli, F. Artiguenave, M.E. Pè, G. Valle, M. Morgante, M. Caboche, A.F. Adam-Blondon, J. Weissenbach, F. Quétier, P. Wincker, French-Italian Public Consortium for Grapevine Genome Characterization, The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla, Nature 449 (2007) 463–467.
[35] S. Robatzek, I.E. Somssich, Targets of AtWRKY6 regulation during plant senescence and pathogen defense, Genes Dev. 16 (2002) 1139–1149.
[36] D. Song, G. Li, F. Song, Z. Zheng, Molecular characterization and expression analysis of OsBISERK1, a gene encoding a leucine-rich repeat receptor-like kinase, during disease resistance responses in rice, Mol. Biol. Rep. 35 (2008) 275–283.
[37] L. McHale, X. Tan, P. Koehl, R.W. Michelmore, Plant NBS-LRR proteins: adaptable guards, Genome Biol. 7 (2006) 212.
[38] W.Y. Song, L.Y. Pi, G.L. Wang, J. Gardner, T. Holsten, P.C. Ronald, Evolution of the rice Xa21 disease resistance gene family, Plant Cell 9 (1997) 1279–1287.
[39] F.L.W. Takken, C.M. Thomas, M.H.A.J. Joosten, C. Golstein, N. Westerink, J. Hille, H.J.J. Nijkamp, P.J.G.M. De Wit, J.D.G. Jones, A second gene at the tomato Cf-4 locus confers resistance to Cladosporium fulvum through recognition of a novel avirulence determinant, Plant J. 20 (1999) 279–288.
[40] G. Wang, U. Ellendorff, B. Kemp, J.W. Mansfield, A. Forsyth, K. Mitchell, K. Bastas, C.M. Liu, A. Woods-Tör, C. Zipfel, P.J. de Wit, J.D. Jones, M. Tör, B.P. Thomma, A genome-wide functional investigation into the roles of receptor-like proteins in Arabidopsis, Plant Physiol. 147 (2008) 503–517.
[41] X.S. Zhang, J.H. Choi, J. Heinz, C.S. Chetty, Domain-specific positive selection contributes to the evolution of Arabidopsis leucine-rich repeat receptor-like kinase (LRR RLK) genes, J. Mol. Evol. 63 (2006) 612–621.
[42] E. Strain, S.V. Muse, Positively selected sites in the Arabidopsis receptor-like kinase gene family, J. Mol. Evol. 613 (2005) 25–32.
[43] T. Zhou, Y. Wang, J.Q. Chen, H. Araki, Z. Jing, K. Jiang, J. Shen, D. Tian, Genomewide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes, Mol. Genet. Genomics 271 (2004) 402–415.