Molecular evolution of the MLO gene family in Oryza sativa and their functional divergence


Available online at www.sciencedirect.com

Gene 409 (2008) 1 – 10 www.elsevier.com/locate/gene

Qingpo Liu a,⁎, Huiqin Zhu b,c

a School of Agriculture and Food Science, Zhejiang Forestry University, Hangzhou, Lin'an 311300, PR China
b Department of Agronomy, Zhejiang University, Hangzhou 310029, PR China
c Department of Agronomy, Qinghai University, Xining 810003, PR China

Received 19 June 2007; received in revised form 22 October 2007; accepted 28 October 2007
Available online 26 December 2007
Received by A.J. van Wijnen

Abstract

The present study identified 12 MLO genes in rice, located on chromosomes 1, 2, 3, 4, 5, 6, 10, and 11 without any obvious clustering. On a genome scale, we showed that the expansion of the rice MLO gene family is primarily attributable to segmental duplication produced by polyploidy rather than to tandem amplification. Gene conversion events also appear to have played an important role in the evolution of MLO genes. The relative rate ratio test and maximum likelihood analysis suggested that positive selection occurred after gene duplication and/or speciation, prompting the formation of distinct MLO subfamilies. Functional divergence analysis provided statistical evidence for shifted evolutionary rates after gene duplication. Compared with extracellular loop 3 and the Ca2+-binding domain, much stronger functional constraints appear to act on intracellular loop 2, although all three regions may be under purifying selection. Sliding window analysis of dN/dS ratios identified one sequence region that is under strong functional constraint and is therefore likely to be crucial for the function of MLO genes.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Rice; MLO; Gene conversion; Adaptive evolution; Functional divergence

Abbreviations: cDNA, DNA complementary to RNA; CDS, coding DNA sequence; dN/dS, ratio of nonsynonymous-to-synonymous substitutions; EST, expressed sequence tag; GEO, Gene Expression Omnibus; HMM, hidden Markov model; KOME, Knowledge-based Oryza Molecular biological Encyclopedia; LRT, likelihood ratio statistic; ME, minimum evolution; MLO, powdery-mildew-resistance gene o; MP, maximum parsimony; NCBI, National Center for Biotechnology Information; NJ, neighbor-joining; RGP, Rice Genome Project; TIGR, The Institute for Genomic Research; θ, coefficient of type I functional divergence; WGD, whole genome duplication.
⁎ Corresponding author. E-mail address: [email protected] (Q. Liu).
0378-1119/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2007.10.031

1. Introduction

The MLO (powdery-mildew-resistance gene o) gene was first identified in barley (Büschges et al., 1997). Devoto et al. (1999) demonstrated experimentally that the MLO protein is an integral plasma membrane-localized protein with seven hydrophobic membrane-spanning helices. Subsequent studies revealed that MLO genes form a multi-gene family that is unique to plants. In Arabidopsis, this family comprises 15 MLO genes (Devoto et al., 2003), whereas 9 MLOs are known in maize, although this number may be an underestimate because the genome project is unfinished. So far, only a few MLO genes have been identified and isolated from barley, wheat, Lycopersicon, Lotus, Capsicum, Brassica, and Physcomitrella. MLO family members play crucial roles in modulating defense responses and cell death (Büschges et al., 1997; Piffanelli et al., 2002). In barley, the presence of MLO proteins is absolutely required for powdery mildew fungi to successfully penetrate the host cell wall (Panstruga, 2005), whereas homozygous mutant (mlo) alleles of the MLO gene confer broad spectrum disease resistance to these pathogens (Jørgensen, 1992),


Q. Liu, H. Zhu / Gene 409 (2008) 1–10

and exhibit a spontaneous leaf cell death phenotype (Wolter et al., 1993). Interestingly, although recessive mlo alleles confer broad resistance to powdery mildew fungi, mlo plants are hypersusceptible to the rice blast fungus Magnaporthe grisea, and acquired resistance (AR) induced by chemical and/or biological treatments is needed to overcome this hypersusceptible phenotype (Jarosch et al., 2003). The expression of barley MLO is affected by many biotic and abiotic stress stimuli. Piffanelli et al. (2002) demonstrated that treatment with barley or wheat powdery mildew fungi, rice blast, leaf wounding, or the herbicide paraquat each upregulated MLO expression, although to different extents. The inducibility of MLO expression under a range of conditions suggests a broad role for MLO genes (Piffanelli et al., 2002). Nonetheless, knowledge of the precise mechanism of action of MLO proteins is limited. Two genes, ROR1 and ROR2, are required for full mlo-mediated resistance (Freialdenhoven et al., 1996; Peterhänsel et al., 1997), while MLO-mediated defense suppression may involve one or several small GTP-binding proteins of the ROP family (Schultheiss et al., 2002). Bhat et al. (2005) accordingly suggested that MLO, ROR2, and potentially additional proteins form a novel pathogen-triggered micro-domain at biotic stress sites. Notably, BI-1, a regulator of cellular defense in barley, was shown to be sufficient to substitute for MLO function in accessibility to fungal parasites (Hückelhoven et al., 2003). Because little has been reported about MLO genes other than barley MLO and its role in powdery mildew pathogenesis, an in-depth analysis of the whole MLO family with respect to its evolution and functional divergence is needed.

The rice genome projects have been completed (International Rice Genome Sequencing Project, 2005; Yu et al., 2005), which makes a comprehensive overview of the rice MLO gene family possible for the first time. The present study identified 12 MLO genes in each of indica and japonica rice. Here we focus on the evolution and divergence of the MLO family. Our results show that the formation and divergence of this family are primarily attributable to gene duplication, gene conversion, and positive selection. Moreover, using a window analysis (Tsunoyama and Gojobori, 1998), we identified, in the representative members of each subfamily, a sequence region that is under strong functional constraint.

2. Materials and methods

2.1. Sequence database search and domain detection

Profile hidden Markov models (profile HMMs) use a statistical description of a sequence family's consensus and allow sensitive database searches (http://hmmer.janelia.org/). HMMER (Eddy, 1998) is a freely distributable implementation of profile HMM software for protein sequence analysis. In the present study, published MLO proteins were used to construct an HMM profile, and a HMMER (version 2.3.2; Eddy, 1998) search of the rice proteome was then performed. To obtain a full list of rice MLOs, TBLASTN searches against the rice genomic sequences collected in RGP, NCBI, and TIGR were also conducted using the MLO domain sequence as query. In addition, BLASTP searches were performed against the NCBI and UniProt/SwissProt protein databases to identify MLO proteins from other plant species. False-positive hits and pseudogenes were excluded by means of domain detection and promoter identification. InterProScan (Quevillon et al., 2005) was used to characterize the domain signature of the MLO gene family. Transmembrane regions were predicted using ConPred II (Arai et al., 2004) and the TMHMM server v2.0 (Krogh et al., 2001). The Ca2+-binding domain was identified as a 20-amino acid sequence in the C-terminal tail in which hydrophobic amino acids are present at positions 1, 8, and 14 and which contains a conserved Trp residue (Kim et al., 2002).

2.2. Analysis of rice MLO gene evolution

Rice MLO genes showed a scattered distribution on the chromosomes, so segmental duplication was expected to have contributed to the expansion of this gene family, although stochastic duplication across long chromosomal regions could sometimes produce a similar pattern. Schauser et al. (2005) found that an effective way to detect this type of duplication event is to identify additional paralogous protein pairs in the neighborhood of each family member. The present study therefore also examined the 10 proteins encoded by the genes flanking each of the 12 rice MLOs (5 on each side).

2.3. Multiple sequence alignment and phylogenetic tree construction

Amino acid sequences were aligned using CLUSTALW (Thompson et al., 1994) with the following parameters: gap opening penalty, 10; gap extension penalty, 0.05; and the BLOSUM series as the protein weight matrix. The protein alignment was subsequently converted into the corresponding coding DNA sequence (CDS) alignment using PAL2NAL (Suyama et al., 2006).

The phylogenetic tree was reconstructed with MEGA (version 3.1; Kumar et al., 2004) using the neighbor-joining (NJ) and minimum evolution (ME) methods. For both methods, the p-distance model and pairwise deletion of gaps/missing data were used. A bootstrap test of phylogeny was performed with 1000 replications. TREEVIEW (Page, 1996) and MEGA v3.1 (Kumar et al., 2004) were used to display the phylogenetic trees.

2.4. Gene conversion events detection

Possible gene conversion events were analyzed using GeneConv (Sawyer, 1989; http://www.math.wustl.edu/~sawyer/geneconv).

2.5. Estimation of functional divergence

DIVERGE, a program developed by Gu and Velden (2002), was used to detect functional divergence between members of a protein family (Gu et al., 2002). In this analysis, four gene clusters of interest (A, B, C, and D) were selected, and the coefficient of type I functional divergence (θ) and the likelihood ratio statistic (LRT) were calculated for each pair of clusters. A θ value significantly greater than 0 indicates altered selective constraints at amino acid sites after gene duplication (Gu, 2003). The estimated θ values for all pairs of clusters can be used to create a matrix of functional distance (dF) values. Given this matrix, a standard least-squares method based on the formula dF(A,B) = bF(A) + bF(B) can be used to estimate bF for each gene cluster, where bF(x) is the functional branch length of gene cluster x (Gu, 2003).

2.6. Adaptive evolution analysis

Adaptive evolution was investigated using the Creevey–McInerney method (Creevey and McInerney, 2002) as implemented in CRANN (Creevey and McInerney, 2003). The Yang (1998) method was used as an alternative approach to detect possible adaptive evolution. In this approach, the Yang and Nielsen (2000) method implemented in the codeml program of the PAML software package (Yang, 1997) was used to calculate the ratio of nonsynonymous-to-synonymous substitutions (dN/dS) for each branch under two models (one-ratio and free-ratio). A likelihood ratio test was applied to decide which model fitted the data better.

2.7. Calculation of dN/dS ratios

The Yang and Nielsen (2000) method implemented in the yn00 program of PAML (Yang, 1997) was used to calculate the ratio of nonsynonymous-to-synonymous substitutions (dN/dS). Three regions, namely intracellular loop 2, extracellular loop 3, and the Ca2+-binding domain, were investigated separately. The nucleotide sequences corresponding to the amino acid residues of each region were aligned using CLUSTALW. The yn00 program was then used to calculate the dN/dS ratio for each pairwise comparison in a multiple alignment, and an average dN/dS ratio was obtained for each region for comparison.
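The per-region procedure of Section 2.7 (cut out the codons that encode a protein region, compare every pair of sequences, average the pairwise values) can be sketched as below. This is only an illustration under stated assumptions: the function names, the toy sequences, and the residue range 2–4 are hypothetical, and `codon_mismatch_fraction` is a simple placeholder statistic standing in for the dN/dS estimate that yn00 would provide.

```python
from itertools import combinations
from statistics import mean


def region_cds(cds, aa_start, aa_end):
    """Return the codons encoding residues aa_start..aa_end (1-based, inclusive)."""
    return cds[3 * (aa_start - 1):3 * aa_end]


def mean_pairwise(seqs, ratio_fn):
    """Average ratio_fn over every unordered pair of sequences."""
    return mean(ratio_fn(a, b) for a, b in combinations(seqs, 2))


def codon_mismatch_fraction(a, b):
    """Placeholder statistic (NOT dN/dS): fraction of aligned codons that differ."""
    codons_a = [a[i:i + 3] for i in range(0, len(a), 3)]
    codons_b = [b[i:i + 3] for i in range(0, len(b), 3)]
    diffs = sum(x != y for x, y in zip(codons_a, codons_b))
    return diffs / len(codons_a)


# Hypothetical, already aligned toy coding sequences (5 codons each).
cds_list = ["ATGGCTAAAGGGTTT", "ATGGCTAGAGGGTTT", "ATGGCAAAAGGCTTT"]
# Hypothetical region: residues 2 to 4.
regions = [region_cds(cds, 2, 4) for cds in cds_list]
average = mean_pairwise(regions, codon_mismatch_fraction)
```

In a real analysis `ratio_fn` would wrap the pairwise dN/dS values parsed from yn00 output; only the slicing and averaging logic is meant to carry over.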


2.8. Window analysis for the dN/dS ratios

To identify particular regions under strong functional constraint, a window analysis (Tsunoyama and Gojobori, 1998) was conducted using CRANN, a program that can calculate pairwise distances in a moving window (Creevey and McInerney, 2003). In this analysis, eight gene pairs were used as representatives for calculating dN/dS ratios (Fig. 3). For each gene pair alignment, a window size of 70 codons was defined, because this is the minimum length that ensures that the numbers of nonsynonymous and synonymous substitutions can be obtained for each window while avoiding the effect of saturation of nucleotide substitutions on the estimation. With this window size, the dN/dS ratio was calculated for each window, moving along the alignment codon by codon.

3. Results

3.1. Identification of rice MLO genes

After a careful genome-scale search, 12 MLO genes were identified in japonica rice (designated OsMLOs; Table 1). OsMLOs were numbered from 1 to 12 according to their HMM search scores. To assess the reliability of the identified genes, we further searched the full-length cDNA database in KOME and the rice expression database in TIGR (http://rice.tigr.org/tdb/e2k1/osa1/locus_expression_evidence.shtml). Nine of the 12 OsMLOs were supported by cDNA sequences in KOME (Table 1), and OsMLOs 6 and 12 each matched one EST. Although OsMLO5 was not supported by the available cDNA or EST sequences, the TIGR rice expression database provided evidence for its existence and expression (data not shown). Further, we identified promoters for the OsMLOs using the plant promoter prediction program TSSP developed by Softberry Inc. These results support the idea that the OsMLOs are functional or expressed in rice. In addition, we identified 12 MLO genes in the indica rice cultivar 93-11 (Supplementary Table S1). Comparison of indica and japonica rice MLO genes showed that most of them are highly homologous (Supplementary Table S2 and Supplementary Fig. S1). Notably, in the GenBank annotation the indica and japonica OsMLO11 proteins are 1004 and 980 AA long, respectively, whereas both are 490 AA long when BGF is used as the prediction software (http://rise.genomics.org.cn/rice/index2.jsp). Fortunately, one full-length cDNA sequence corresponding to japonica OsMLO11 (AK242559) has been cloned in KOME. According to the prediction result, OsMLO11 should be 515 AA long rather than 980 or 490 AA.

Table 1
List of MLO genes identified in japonica rice

MLO       Accession ID   Chromosome   AA length   cDNA
OsMLO1    AC073166       10           580         AK072272
OsMLO2    AP003346       1            502         AK121374
OsMLO3    AC099739       3            555         AK098993
OsMLO4    AL606456       4            596         AK111773
OsMLO5    AL731606       4            476
OsMLO6    AP003518       6            499
OsMLO7    AP003616       6            431         AK109906
OsMLO8    AP004191       2            507         AK066134
OsMLO9    AC134933       5            499         AK121347
OsMLO10   AP004144       2            516         AK121163
OsMLO11   AC135794       11           515         AK242559
OsMLO12   AC135431       5            366

3.2. Genomic distribution of rice MLO genes

Gene families can arise through tandem amplification, resulting in a clustered occurrence, or through segmental duplication of chromosomal regions, resulting in a scattered occurrence of family members (Schauser et al., 2005). The rice MLO family members are located on chromosomes 1, 2, 3, 4, 5, 6, 10, and 11. Whereas only one MLO gene was found on each of chromosomes 1, 3, 10, and 11, two MLOs were located on each of chromosomes 2 (OsMLOs 8 and 10), 4 (OsMLOs 4 and 5), 5 (OsMLOs 9 and 12), and 6 (OsMLOs 6 and 7), without any obvious local tandem clustering (Fig. 1A). The rice MLO gene family would therefore be expected to result mainly from segmental duplications rather than from single-gene tandem amplification. To test this hypothesis, the method of Schauser et al. (2005) was used to investigate the evolutionary relationship between duplicated segments. In this way, three pairs of OsMLOs (OsMLOs 2 and 9, OsMLOs 5 and 10, and OsMLOs 7 and 8) were identified as the result of segmental duplication events (Fig. 1B). Further examination of the rice duplication blocks identified by Yu et al. (2005) and Wang et al. (2005) showed that, with three exceptions (OsMLOs 3, 4, and 6), rice MLO genes are located within regions thought to have undergone previous large-scale duplication events. Moreover, the above three pairs of OsMLOs involve three duplication blocks, corresponding to part of the long arm of chromosome 1 (OsMLO2) and part of the long arm of chromosome 5 (OsMLO9); part of the long arm of chromosome 4 (OsMLO5) and part of the long arm of chromosome 2 (OsMLO10); and part of the short arm of chromosome 2 (OsMLO8) and part of the long arm of chromosome 6 (OsMLO7). OsMLOs 1, 11, and 12 may also have arisen through duplication events and then lost their counterparts over a long period of evolution, because only one copy of each is present in the duplicated regions on chromosomes 10, 11, and 5, respectively. This explanation is reasonable, because a newly duplicated gene is either lost or fixed in the population by genetic drift or natural selection (Lynch et al., 2001).

Fig. 1. Genomic organization of rice MLO genes. (A) Localization of OsMLOs on rice chromosomes. The relative sizes of the rice chromosomes are consistent with those in RGP. (B) Detection of segmental duplications in regions of the rice genome encompassing OsMLOs. The sequences of the 10 proteins surrounding each OsMLO (5 on each side) were concatenated to form one block. A vertical black bar indicates the concatenation of two protein sequences. This was done for all 12 OsMLOs, resulting in 12 blocks, which were then searched against each other using a reciprocal best-hit BLAST strategy. The three pairs of OsMLOs identified as probably resulting from segmental duplications are shown here.
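The reciprocal best-hit step used to pair the blocks of flanking proteins (Fig. 1B) can be illustrated with a short sketch. It is not the authors' pipeline: the block names and scores in `scores` are hypothetical placeholders for real BLAST output, and `best_hit` and `reciprocal_best_hits` are names introduced here for illustration.

```python
def best_hit(query, scores):
    """Return the subject with the highest score for `query`, or None."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(scores):
    """Return the set of unordered pairs that are each other's best hit.

    `scores[q][s]` is the similarity score of query block q against
    subject block s (self-hits are assumed to be excluded already).
    """
    pairs = set()
    for query in scores:
        subject = best_hit(query, scores)
        if subject is not None and best_hit(subject, scores) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


# Hypothetical scores for four blocks; real values would come from BLAST.
scores = {
    "block_A": {"block_B": 950.0, "block_C": 120.0, "block_D": 80.0},
    "block_B": {"block_A": 940.0, "block_C": 110.0, "block_D": 95.0},
    "block_C": {"block_A": 130.0, "block_B": 105.0, "block_D": 90.0},
    "block_D": {"block_A": 85.0, "block_B": 98.0, "block_C": 92.0},
}
pairs = reciprocal_best_hits(scores)
```

With these placeholder scores only `block_A` and `block_B` choose each other, so they form the single candidate pair; `block_C` and `block_D` have best hits that are not reciprocated and are left unpaired.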

3.3. Detection of gene conversion events

Gene conversion is any process that causes a segment of DNA to be copied onto another segment of DNA (Sawyer, 1989). Short-segment gene conversion is an important force in evolution (Hilliker et al., 1994). The software GeneConv (Sawyer, 1989) was used to detect possible gene conversion events that occurred during the evolution of plant-specific MLO genes. The results show no global outer-sequence fragments and 3 global inner-sequence significant fragments, which involve the three gene pairs OsMLOs 3 and 6, OsMLOs 5 and 10, and ZmMLO3 and OsMLO8 (p-values 0.0005, 0.0103, and 0.0463, respectively). OsMLOs 3 and 6 are phylogenetically closest to each other (Fig. 2). However, they lie outside the identified duplication regions on chromosomes 3 and 6. This result may therefore suggest that OsMLOs 3 and 6 originated by tandem duplication and, over time, migrated to other parts of the genome, although direct duplication to remote loci cannot be ruled out (Yu et al., 2005). Furthermore, a total of 272 pairwise inner-sequence and 0 pairwise outer-sequence significant fragments were found (p < 0.05; Supplementary Table S3). These results strongly suggest that gene conversion events have occurred in rice MLO genes and have contributed to their evolution.

Fig. 2. Phylogenetic analysis of MLO proteins using the NJ method implemented in MEGA version 3.1. The numbers beside the branches are bootstrap values (≥60%) based on 1000 replications. The scale is in amino acid substitutions per site. Branch numbers are in bold italics. Black ellipses indicate the branches (1, 6, 7, 19, and 41) identified as possibly being under positive selection by both the relative rate ratio test and maximum likelihood analysis. Putative gene duplicates in rice and Arabidopsis are marked at the nodes by solid and empty triangles, respectively. The square and diamond indicate the Physcomitrella patens and Ostreococcus tauri MLOs, respectively. To identify the species of origin of each MLO protein, a species acronym precedes the protein name: At, Arabidopsis thaliana; Br, Brassica rapa; Ca, Capsicum annuum; Hv, Hordeum vulgare; Lc, Lotus corniculatus; Le, Lycopersicon esculentum; Os, Oryza sativa; Ot, Ostreococcus tauri; Pp, Physcomitrella patens; Ta, Triticum aestivum; Zm, Zea mays.

3.4. Functional divergence between MLO subfamilies

Type I functional divergence between subfamilies of the MLO family was estimated by posterior analysis using DIVERGE, which evaluates shifts in evolutionary rate after gene duplication or speciation (Gu, 2003). The estimation was based on the MLO protein neighbor-joining tree, in which four major subfamilies are clearly resolved with high bootstrap support (Fig. 2). The coefficients of functional divergence (θ) between subfamilies A, B, C, and D are all significantly greater than 0 (p < 0.05; Table 2), which provides statistical evidence for significantly altered selective constraints between subfamilies. Further, a functional distance analysis was conducted; the functional distances (dF) are shown in Table 2. We estimated the functional branch length (bF) of each subfamily using the least-squares method and found that the level of altered selective constraint, measured by this index, followed the order bF (D; 0.23) > bF (A; 0.17) > bF (B; 0.07) > bF (C; 0.03). This suggests that subfamily D is the most divergent from the other subfamilies, that is, type I functional divergence may have occurred mainly in subfamily D during MLO evolution.

Table 2
Functional divergence between subfamilies of the MLO gene family

Group 1   Group 2   θ ± S.E.        LRT     P       dF
A         B         0.112 ± 0.061   3.89    <0.05   0.118
A         C         0.196 ± 0.039   24.89   <0.05   0.218
A         D         0.334 ± 0.091   13.31   <0.05   0.406
B         C         0.154 ± 0.065   5.63    <0.05   0.167
B         D         0.257 ± 0.107   5.80    <0.05   0.297
C         D         0.172 ± 0.072   5.63    <0.05   0.189

Note: θ, the coefficient of type I functional divergence between two gene clusters. LRT, likelihood ratio statistic. dF, functional distance.
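The least-squares step based on dF(A,B) = bF(A) + bF(B) can be sketched from the dF column of Table 2. The code below is an unweighted fit written for illustration; it is an assumption that this matches what DIVERGE does internally, and the function names are introduced here. On the rounded dF values it gives the same ordering D > A > B > C as reported, but the numerical estimates need not equal the published ones. The tabulated dF values are also numerically consistent with dF = −ln(1 − θ), which is an observation from the table rather than a statement in the text.

```python
from math import log

# Functional distances dF between subfamilies, copied from Table 2.
d_f = {
    ("A", "B"): 0.118, ("A", "C"): 0.218, ("A", "D"): 0.406,
    ("B", "C"): 0.167, ("B", "D"): 0.297, ("C", "D"): 0.189,
}


def distance_from_theta(theta):
    """Relation that reproduces the dF column of Table 2 from theta."""
    return -log(1.0 - theta)


def branch_lengths(distances):
    """Unweighted least-squares fit of d(x, y) = b(x) + b(y).

    Assumes a distance is supplied for every pair of clusters.
    """
    clusters = sorted({name for pair in distances for name in pair})
    n = len(clusters)
    if n < 3:
        raise ValueError("need at least three clusters")
    # Row sums T_x of the distance matrix.
    totals = {c: 0.0 for c in clusters}
    for (x, y), d in distances.items():
        totals[x] += d
        totals[y] += d
    # Normal equations: T_x = (n - 2) * b_x + S, where S is the sum of all b.
    s = sum(totals.values()) / (2 * (n - 1))
    return {c: (totals[c] - s) / (n - 2) for c in clusters}


b_f = branch_lengths(d_f)
ranking = sorted(b_f, key=b_f.get, reverse=True)
```

The closed form comes from setting the derivative of the summed squared residuals to zero for each cluster, so no matrix solver is needed when all pairwise distances are available.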

3.5. Adaptive evolution analysis

The relative rate ratio test for adaptive evolution was performed on the coding sequences of MLO genes, based on the neighbor-joining tree shown in Fig. 2. In this analysis, a G-test was used to evaluate whether the ratio of replacement-invariable (RI) to replacement-variable (RV) substitutions was significantly greater than expected under neutrality (Creevey and McInerney, 2002). The ratio of RI to RV deviated significantly from neutrality in branches 0, 1, 3, 4, 6, 7, 17, 18, 19, 20, 22, 23, 25, 28, 31, 33, and 41 (Table 3).

In the second approach, maximum likelihood analysis of nonsynonymous-to-synonymous rate ratios along lineages was performed by comparing two extreme models: the one-ratio model (a single rate ratio for all branches of the phylogenetic tree) and the free-ratio model (an independent rate ratio for each branch). The log likelihood was −27,237.4 for the free-ratio model and −27,422.5 for the one-ratio model. The log likelihood difference between the two models (185.1) was statistically significant (p < 0.05), suggesting that the free-ratio model fits the data better. The branches with dN/dS ratios significantly greater than 1 are also listed in Table 3. Taken together, the branches identified by both approaches as possibly being under adaptive evolution (1, 6, 7, 19, and 41) are marked in Fig. 2 by black ellipses.

3.6. Strong purifying selection in intracellular loop 2

MLO was identified as a novel calmodulin-binding protein (Kim et al., 2002). The Ca2+-binding domain in its highly variable C-terminal tail is thought to serve as a docking site for interaction with CaM isoforms, and thus to function in the response to stress signals. As for the extracellular loops, Devoto et al. (2003) found that strong functional constraint acted on extracellular loop 1 over a long period of evolution.

Table 3
Detection of adaptive evolution by relative rate ratio test and maximum likelihood analysis

RI, RV, SI, SV, and G value: CRANN (Creevey and McInerney, 2003). dN/dS ratio: PAML (Yang, 1997).

Branch no.   RI     RV       SI     SV     G value
0            98     181      71     227    8.87 ⁎
1            191    484      103    447    15.41 ⁎
2            94     157      94     176    0.39
3            191    445      114    348    3.85 ⁎
4            417    1033     219    850    22.62 ⁎
5            7      14       11     35     0.61
6            41     35       29     68     10.18 ⁎
7            88     96       53     122    11.61 ⁎
8            121    248      82     188    0.42
9            44     223      32     143    0.24
10           179    506      120    361    0.21
11           250    764      166    512    0.01
12           29     355      25     284    0.07
13           49     572      42     459    0.09
14           58     88       59     131    2.71
15           111    247      103    280    1.52
16           158    472      118    451    3.18
17           248    1148     168    993    5.08 ⁎
18           507    2016     335    1570   4.47 ⁎
19           29     1019     25     306    13.13 ⁎
20           45     1437     35     503    11.24 ⁎
21           64     758      32     266    2.31
22           104    198      60     184    6.27 ⁎
23           141    450      89     386    4.11 ⁎
24           217    1288     123    699    0.12
25           265    2842     160    1269   7.97 ⁎
26           773    4955     495    2910   1.95
27           775    5346     495    3090   2.59
28           775    5787     495    3266   4.01 ⁎
29           23     344      17     174    1.25
30           68     685      42     307    2.31
31           103    220      54     171    4.07 ⁎
32           84     267      49     205    1.86
33           220    587      122    427    4.44 ⁎
34           47     174      46     172    0.01
35           96     325      102    289    1.18
36           109    557      118    481    2.37
37           127    842      122    663    2.10
38           134    1163     126    853    3.52
39           363    1853     248    1342   0.42
40           434    2702     290    1709   0.44
41           1209   8640     785    5042   4.69 ⁎
42           1629   10,262   1004   6107   0.66

dN/dS ratio (a): 27.8, 28.4, 46.2, 42.6, 29.1, 65.7, 50.9

Note: RI, replacement-invariable. RV, replacement-variable. SI, silent-invariable. SV, silent-variable. a, Only dN/dS ratio values significantly greater than 1 (p < 0.05) are listed in this table. ⁎, Significant at the 0.05 level.
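The two tests of Section 3.5 can be retraced with a short sketch using the counts in Table 3 and the reported log likelihoods. Several points are assumptions made here rather than details given in the text: the G-test is treated as a test of independence on the 2×2 table of RI, RV, SI, and SV with Williams' correction, and the likelihood ratio test uses 42 degrees of freedom (43 branches numbered 0 to 42, minus the single ratio of the one-ratio model). The function names are introduced for illustration.

```python
from math import exp, log


def g_statistic(ri, rv, si, sv, williams=True):
    """G-test of independence on the 2x2 table [[RI, RV], [SI, SV]]."""
    observed = [[ri, rv], [si, sv]]
    rows = [ri + rv, si + sv]
    cols = [ri + si, rv + sv]
    n = sum(rows)
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if observed[i][j] > 0:
                g += observed[i][j] * log(observed[i][j] / expected)
    g *= 2.0
    if williams:
        # Williams' correction for a 2x2 table (assumed, see lead-in).
        q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
                   * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
        g /= q
    return g


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


# Branch 0 of Table 3; the 0.05 critical value for 1 df is 3.841.
g_branch0 = g_statistic(98, 181, 71, 227)

# Likelihood ratio test: twice the log-likelihood difference between models.
lnl_free, lnl_one = -27237.4, -27422.5
lrt = 2.0 * (lnl_free - lnl_one)
p_value = chi2_sf_even_df(lrt, 42)  # df = 42 is an assumption
```

Under these assumptions the branch 0 statistic comes out close to the tabulated 8.87, and the likelihood ratio statistic is twice the reported difference of 185.1, far beyond the 0.05 threshold for any plausible number of degrees of freedom.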


Accordingly, it is of interest to examine whether the same holds for extracellular loop 3. We tested for purifying selection by evaluating the nonsynonymous-to-synonymous substitution rate ratio (dN/dS) using the Yang and Nielsen (2000) method. The advantage of this method is that it does not depend on the accuracy of ancestral sequences. In this analysis, ratios greater than, less than, and equal to 1 indicate positive selection, purifying selection, and neutral evolution, respectively. When strong functional constraints act on a region, dS is much greater than dN, so that dN/dS is close to zero. We calculated the average dN/dS ratios for extracellular loop 3, intracellular loop 2, and the Ca2+-binding domain, and found that these values (0.62, σ = 0.05; 0.21, σ = 0.01; and 0.75, σ = 0.04, respectively) are significantly less than 1 (p < 0.05), suggesting that functional constraints act on these regions. In particular, strong purifying selection appears to act on intracellular loop 2, whereas functional constraints are weaker for extracellular loop 3 and the Ca2+-binding domain: the average dN/dS ratios for the latter two regions are about three times higher than that of intracellular loop 2, and these differences are statistically significant (p < 0.01).

Fig. 3. Sliding window analysis of the dN/dS ratio for representative gene pairs in each subfamily. The vertical axis shows the dN/dS ratios, and the horizontal axis shows the amino acid site numbers of the aligned sequences. The solid arrow indicates the identified sequence region that is under strong functional constraint. The dashed arrow indicates the short sequence region with dN/dS ratios less than 0.1.

3.7. Sliding window analysis

To investigate potentially functionally important regions in the representative gene pairs of each subfamily, a window size of at least 70 codons was defined. For each window, the dN/dS ratio was calculated along the nucleotide


sites codon by codon using the program CRANN (Creevey and McInerney, 2003). Fig. 3 shows the results of our sliding window analysis for MLO gene pairs. For each subfamily, two gene pairs were analyzed for comparison. We observed that with the exception of subfamily D, the two gene pairs in other subfamilies showed similar tendency (Fig. 3). The present study defined dN/dS b 0.1 as the explicit cutoff dictating low dN/dS regions, because this value is significantly below 1. For consistency, one sequence region could be clearly identified as low dN/dS ratio region where strong functional constraints should impose on, although another short region with dN/dS ratios less than 0.1 could be identified for the AtMLO2/BrMLO, OsMLO9/ZmMLO8, and OsMLO11/ZmMLO4 gene pairs (Fig. 3). It was noticeable that for each subfamily, the identified sequence region between the two gene pairs was almost accordant. For the OsMLO11/ZmMLO4 pair, the identified sequence region might probably correspond to extracellular loop 2 (291–322 AA), whereas for the AtMLO2/ BrMLO, OsMLO1/ZmMLO2, and OsMLO9/ZmMLO8 gene pairs, this region might be located in intracellular loops 2 (207– 239 AA), intracellular loop 3 (293–387 AA), and intracellular loop 2 (229–279 AA) respectively. In addition, it was postulated that the C-terminal region of two gene pairs (OsMLO1/ZmMLO2 and OcMLO3/ZmMLO1) should probably be under positive selection (Fig. 3), because their dN/dS ratios for the above regions are greater than 1. We inferred that the observed differences in functional constraint working regions might reflect their specificity for functionality of MLO proteins. 4. Discussion Duplication of individual genes, chromosomal segments, or even entire genomes is an important source of raw materials for gene genesis (Ohno, 1970). 
Recently, 18 distinct pairs of duplicated segments that cover 65.7% of the rice genome have been identified, in which 17 of these pairs can date back to a common time before the divergence of grasses (Yu et al., 2005). The extensive increase of MLO gene numbers in rice may result from these duplication events. In fact, three pairs of rice MLO genes could be identified as the result of segmental duplication event (Fig. 1). It was reported that massive gene losses and chromosome rearrangements, following the large-scale genome duplications, have occurred in rice, leading to a loss of about 30–65% of duplicated genes (Wang et al., 2005). The absence of duplicated pairs of OsMLOs 1, 11, and 12 at their corresponding duplicated regions might be explained in this way. As reported, the rice genome has endured two rounds of ancient polyploidy events (Goff et al., 2002; Guyot and Keller, 2004): one was occurred before the divergence of cereals, and the other was before the monocot–dicot separation. The significant expansion of rice MLO gene family should result from these polyploidy events. Compared with rice, Arabidopsis has more number of MLO genes although it possesses a much smaller genome size. It was reported that the Arabidopsis genome has undergone three rounds of whole genome duplication (WGD; Blanc et al., 2003; Bowers et al., 2003). Therefore, the fact that although most of AtMLOs were found to be within duplicated regions, only one pair of duplicate (AtMLOs 2 and 6) was identified as the result of

segmental duplication event (Devoto et al., 2003), implied that a rapid rate of MLO gene loss should have occurred in the Arabidopsis genome. To actually reflect the evolutionary relationship between MLO genes, a NJ phylogenetic tree was reconstructed using full-length MLO proteins. In this tree, we collected more MLOs for dicotyledons, a full list of rice MLOs, and one full-length of MLO for Physcomitrella patens and Ostreococcus tauri. Moreover, the phylogenetic tree was also reconstructed by employing the ME and MP methods respectively, and the trees exhibited similar topology as the one presented in Fig. 2. Elliott et al. (2002) demonstrated that the function of barley mlo mutants could be complemented by its orthologus in wheat and rice (TaMLO2 and OsMLO3 in this study respectively). Furthermore, Consonni et al. (2006) demonstrated that barley's broad-spectrum immunity against powdery mildews based on loss-of-function mlo alleles could be achieved in distantly related species, the dicot A. thaliana. Atmlo2 mutant showed no disease symptoms against a virulent powdery mildew species G. orontii and G. cichoracearum. This result uncovers a role for these MLO co-orthologus as antagonists of a resistance mechanism(s) preventing fungal ingress at the cell periphery that is conserved over a time span of more than 200 million years (the approximate time of the monocot–dicot split). These results indicated a functional preservation of AtMLO2 and HvMLO, and supported our classification of them into the same subfamily (Fig. 2) rather than different subfamilies. The divergence of family members may involve positive selection, as indicated by many studies (Carginale et al., 2004; Baudry et al., 2006). Using two approaches, some branches of the MLO gene tree were identified to be possible under positive selection, where the branch 41 should correspond to a duplication event. 
Moreover, adaptive evolution also seems to have occurred during MLO evolution by speciation, for example on branches 1 and 7 (Fig. 2). Lynch and Conery (2000) reported that a new gene produced by duplication would either evolve a new function under positive selection or be lost during evolution. For the MLO family, the sequences of most MLO genes were found to be significantly divergent from each other (Supplementary Table S2). Importantly, some experimental evidence has demonstrated functional diversification among some MLO genes (Chen et al., 2006). These results suggest that continued positive selection has been an important force driving the divergence of MLO genes.

In addition, a maximum likelihood test of functional divergence was performed based on the method of Gu (1999). The advantage of this method is that it uses amino acid sequences and is therefore not sensitive to saturation of synonymous sites. The results showed that subfamilies A, B, C, and D were significantly functionally divergent from each other, owing to shifted evolutionary rates at some important amino acid residues. A reasonable explanation for this difference is that MLO family members have evolved new subgroup-specific functions over long periods of evolutionary time. These results strongly support the postulation that the classification of highly divergent MLOs into the same clade indicates the preservation of an early functional diversification (Devoto et al., 2003).
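Likelihood-based tests of the kind referred to above are commonly evaluated with a likelihood ratio test (LRT), in which twice the log-likelihood difference between two nested models is compared with a chi-square distribution. The following Python sketch illustrates only that general calculation; it is not the procedure implemented in PAML or DIVERGE, the function names are hypothetical, and the log-likelihood values are invented for illustration rather than taken from this study.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    lnl_null: log-likelihood of the simpler (null) model.
    lnl_alt:  log-likelihood of the nested, more general model.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable.

    Only df = 1 and df = 2 are supported, because both have
    closed forms available in the standard library.
    """
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# Hypothetical log-likelihoods (NOT values from the paper).
lnl_null = -5234.10  # e.g. a model without the extra parameter
lnl_alt = -5230.25   # e.g. a model with one additional free parameter

stat = lrt_statistic(lnl_null, lnl_alt)
p_value = chi2_sf(stat, df=1)
print(f"2*dlnL = {stat:.2f}, p = {p_value:.4f}")
```

The degrees of freedom equal the difference in the number of free parameters between the two models; a general implementation would call a statistics library for arbitrary degrees of freedom.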

In the sliding window analysis, one sequence region corresponding to either an intra- or extracellular loop was identified as being under strong purifying selection (Fig. 3). Devoto et al. (1999) experimentally identified striking sequence conservation in the cytoplasmic loops and high variation in extracellular loops 1 and 3. However, it was later deduced that strong functional constraint must be imposed on extracellular loop 1 (Devoto et al., 2003). Further, Elliott et al. (2005) demonstrated that the interplay of cytoplasmic loops is crucial for the functionality of MLO proteins. Thus, the differences in functional constraint observed among MLOs might reflect their functional specificity. In addition, Chen et al. (2006) found that each AtMLO gene has a unique expression pattern, suggesting that AtMLO proteins may function in diverse developmental and response processes. This result supports our conclusions from the functional divergence analysis of MLO genes. On the other hand, they also observed that several phylogenetically closely related AtMLO genes showed similar or overlapping tissue expression specificity and analogous responsiveness to external stimuli, suggesting functional redundancy (such as AtMLOs 5 and 9), cofunction (AtMLOs 4 and 11), or antagonistic function (Chen et al., 2006) among genes within the same clade. Therefore, information on the AtMLO genes should be useful for preliminary annotation of the rice MLO genes. The rice EST database was searched using the MLO CDS sequences as queries. The results showed that OsMLO3 was expressed in leaf, shoot, panicle, and flower, and OsMLO2 in flower, etc. Interestingly, OsMLOs 1, 3, 9, 10, and 11 were also expressed in callus, implying that they play important roles during callus differentiation. Moreover, we queried the Gene Expression Omnibus (GEO) at NCBI and found that OsMLO3 might be involved in the abscisic acid (ABA) signaling process.
Nonetheless, further analysis of the biological processes governed by distinct isoforms will be necessary to determine the exact biochemical roles of MLO genes.

Acknowledgements

This project was supported by the China Postdoctoral Science Foundation (No. 20060390348). We thank the two anonymous reviewers for their valuable comments on and suggestions for the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.gene.2007.10.031.

References

Arai, M., et al., 2004. ConPred II: a consensus prediction method for obtaining transmembrane topology models with high reliability. Nucleic Acids Res. 32, W390–W393.
Baudry, E., Desmadril, M., Werren, J.H., 2006. Rapid adaptive evolution of the tumor suppressor gene Pten in an insect lineage. J. Mol. Evol. 62, 738–744.
Bhat, R.A., Miklis, M., Schmelzer, E., Schulze-Lefert, P., Panstruga, R., 2005. Recruitment and interaction dynamics of plant penetration resistance components in a plasma membrane microdomain. Proc. Natl. Acad. Sci. U. S. A. 102, 3135–3140.
Blanc, G., Hokamp, K., Wolfe, K.H., 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 13, 137–144.
Bowers, J.E., Chapman, B.A., Rong, J., Paterson, A.H., 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438.
Büschges, R., et al., 1997. The barley MLO gene: a novel control element of plant pathogen resistance. Cell 88, 695–705.
Carginale, V., Trinchella, F., Capasso, C., Scudiero, R., Riggio, M., Parisi, E., 2004. Adaptive evolution and functional divergence of pepsin gene family. Gene 333, 81–90.
Chen, Z., et al., 2006. Expression analysis of the AtMLO gene family encoding plant-specific seven-transmembrane domain proteins. Plant Mol. Biol. 60, 583–597.
Consonni, C., et al., 2006. Conserved requirement for a plant host cell protein in powdery mildew pathogenesis. Nat. Genet. 38, 716–720.
Creevey, C., McInerney, J.O., 2002. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300, 43–51.
Creevey, C., McInerney, J.O., 2003. CRANN: detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19, 1726.
Devoto, A., et al., 1999. Topology, subcellular localization, and sequence diversity of the Mlo family in plants. J. Biol. Chem. 274, 34993–35004.
Devoto, A., et al., 2003. Molecular phylogeny and evolution of the plant-specific seven-transmembrane MLO family. J. Mol. Evol. 56, 77–88.
Eddy, S.R., 1998. Profile hidden Markov models. Bioinformatics 14, 755–763.
Elliott, C., Zhou, F., Spielmeyer, W., Panstruga, R., Schulze-Lefert, P., 2002. Functional conservation of wheat and rice MLO orthologs in defense modulation to the powdery mildew fungus. MPMI 15, 1069–1077.
Elliott, C., Müller, J., Miklis, M., Bhat, R.A., Schulze-Lefert, P., Panstruga, R., 2005. Conserved extracellular cysteine residues and cytoplasmic loop–loop interplay are required for functionality of the heptahelical MLO protein. Biochem. J. 385, 243–254.
Freialdenhoven, A., Peterhänsel, C., Kurth, J., Kreuzaler, F., Schulze-Lefert, P., 1996. Identification of genes required for the function of non-race-specific MLO resistance to powdery mildew in barley. Plant Cell 8, 5–14.
Goff, S.A., et al., 2002. A draft sequence of the rice genome (Oryza sativa L. ssp japonica). Science 296, 92–100.
Gu, X., 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16, 1664–1674.
Gu, X., 2003. Functional divergence in protein (family) sequence evolution. Genetica 118, 133–141.
Gu, X., Velden, K.V., 2002. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18, 500–501.
Gu, J., Wang, Y., Gu, X., 2002. Evolutionary analysis for functional divergence of Jak protein kinase domains and tissue-specific genes. J. Mol. Evol. 54, 725–733.
Guyot, R., Keller, B., 2004. Ancestral genome duplication in rice. Genome 47, 610–614.
Hilliker, A.J., Harauz, G., Raume, A.G., Gray, M., Clark, S.H., Chovnick, A., 1994. Meiotic gene conversion track length distribution within the rosy locus of Drosophila melanogaster. Genetics 137, 1019–1026.
Hückelhoven, R., Dechert, C., Kogel, K.H., 2003. Overexpression of barley BAX inhibitor 1 induces breakdown of MLO-mediated penetration resistance to Blumeria graminis. Proc. Natl. Acad. Sci. U. S. A. 100, 5555–5560.
International Rice Genome Sequencing Project, 2005. The map-based sequence of the rice genome. Nature 436, 793–800.
Jarosch, B., Jansen, M., Schaffrath, U., 2003. Acquired resistance functions in MLO barley, which is hypersusceptible to Magnaporthe grisea. MPMI 16, 107–114.
Jørgensen, J.H., 1992. Discovery, characterization and exploitation of MLO powdery mildew resistance in barley. Euphytica 63, 141–152.
Kim, M.C., et al., 2002. Mlo, a modulator of plant defense and cell death, is a novel calmodulin-binding protein. J. Biol. Chem. 277, 19304–19314.
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.L., 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.

Kumar, S., Tamura, K., Nei, M., 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinform. 5, 150–163.
Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.
Lynch, M., O'Hely, M., Walsh, B., Force, A., 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159, 1789–1804.
Ohno, S., 1970. Evolution by gene duplication. Springer-Verlag, Berlin. 160 pp.
Page, R.D., 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12, 357–358.
Panstruga, R., 2005. Serpentine plant MLO proteins as entry portals for powdery mildew fungi. Biochem. Soc. Trans. 33, 389–392.
Peterhänsel, C., Freialdenhoven, A., Kurth, J., Kolsch, R., Schulze-Lefert, P., 1997. Interaction analyses of genes required for resistance responses to powdery mildew in barley reveal distinct pathways leading to leaf cell death. Plant Cell 9, 1397–1409.
Piffanelli, P., et al., 2002. The barley MLO modulator of defense and cell death is responsive to biotic and abiotic stress stimuli. Plant Physiol. 129, 1076–1085.
Quevillon, E., et al., 2005. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120.
Sawyer, S.A., 1989. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6, 526–538.
Schauser, L., Wieloch, W., Stougaard, J., 2005. Evolution of NIN-like proteins in Arabidopsis, rice and Lotus japonicus. J. Mol. Evol. 60, 229–237.
Schultheiss, H., Dechert, C., Kogel, K.H., Hückelhoven, R., 2002. A small GTP-binding host protein is required for entry of powdery mildew fungus into epidermal cells of barley. Plant Physiol. 128, 1447–1454.

Suyama, M., Torrents, D., Bork, P., 2006. PAL2NAL: robust conversion of protein sequence alignment into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Tsunoyama, K., Gojobori, T., 1998. Evolution of nicotinic acetylcholine receptor subunits. Mol. Biol. Evol. 15, 518–527.
Wang, X., Shi, X., Hao, B., Ge, S., Luo, J., 2005. Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 165, 937–946.
Wolter, M., Hollricher, K., Salamini, F., Schulze-Lefert, P., 1993. The MLO resistance alleles to powdery mildew infection in barley trigger a developmentally controlled defence mimic phenotype. Mol. Gen. Genet. 239, 122–128.
Yang, Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13, 555–556.
Yang, Z., 1998. Likelihood ratio test for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573.
Yang, Z., Nielsen, R., 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43.
Yu, J., et al., 2005. The genomes of Oryza sativa: a history of duplications. PLOS Biol. 3, 266–281.