Molecular cloning and characterization of the DNA mismatch repair gene class 2 from the Trypanosoma cruzi




Gene 272 (2001) 323–333

www.elsevier.com/locate/gene

Luiz Augusto-Pinto, Daniella Castanheira Bartholomeu, Santuza Maria Ribeiro Teixeira, Sérgio D.J. Pena, Carlos Renato Machado*

Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Received 22 March 2001; received in revised form 18 May 2001; accepted 29 May 2001
Received by T. Sekiya

Abstract

Genes with homology to the bacterial mutS gene, which encodes a protein involved in post-replication DNA mismatch repair, are known in several organisms. Using a degenerate PCR strategy, we cloned a Trypanosoma cruzi genomic DNA fragment homologous to the mutS gene class two (MSH2). This fragment was used as a probe to select the corresponding cDNAs from a T. cruzi cDNA library. The complete sequence of the gene (3304 bp), denominated TcMSH2, was obtained. The sequence contained an open reading frame of 2889 bp coding for a putative protein of 962 amino acids. Computational analyses of the amino acid sequence showed 36% identity with MSH2 proteins from other eukaryotes and revealed the presence of all functional domains of MutS proteins. Hybridization analyses indicated that TcMSH2 is a single-copy gene that is expressed in all forms of the T. cruzi life cycle. The role of the TcMSH2 gene product in mismatch repair was investigated by dominant negative phenotype analyses in Escherichia coli. When eukaryotic mutS homologues are expressed in a prokaryotic system, they increase the bacterial mutation rate. The same phenomenon was observed with the TcMSH2 cDNA, indicating that T. cruzi MSH2 interferes with the bacterial mismatch repair system. Phylogenetic analyses showed that the T. cruzi gene grouped with the MSH2 clade, confirming the nature of the gene isolated in this work. © 2001 Published by Elsevier Science B.V. All rights reserved.

Keywords: DNA repair; Molecular evolution; Functional complementation

Abbreviations: IDLs, insertion/deletion loop-outs; MMR, mismatch repair

* Corresponding author. Departamento de Bioquímica e Imunologia, ICB, UFMG, Av. Antônio Carlos, 6627, Caixa Postal 486, Belo Horizonte, MG, Brazil. Tel.: +55-31-34992628; fax: +55-31-34992984. E-mail address: [email protected] (C.R. Machado).

1. Introduction

Trypanosoma cruzi is a protozoan that causes Chagas disease, an important disease in the tropics. This illness constitutes one of the major social and economic problems affecting millions of people in Latin America, and no effective drug treatment is available (Schmuñiz, 2000). Although biochemical and genetic studies have progressed steadily in the past years, little is known about DNA metabolism in this parasite. Understanding these mechanisms might open an important gateway towards the development of new drugs for Chagas disease. For that reason, we began our studies by searching for DNA repair genes of T. cruzi.

DNA repair proteins can be classified into various groups that are part of distinct repair machineries. The long-patch mismatch repair machinery (MMR) is present in several organisms. Its function is to remove base substitution and frameshift mismatches that escape DNA polymerase proofreading activity after DNA replication, increasing DNA replication fidelity by 100–1000 fold (Modrich and Lahue, 1986). In Escherichia coli, the factors that are exclusively involved in MMR are encoded by the mutS, mutL and mutH genes (Lahue et al., 1989). MutS protein homodimers recognize and bind specifically to base–base mispairs and insertion/deletion loop-outs (IDLs). Then MutS, in association with MutL protein homodimers, activates the MutH protein to make an excision-initiating nick in the unmethylated, newly synthesized strand. The nicked strand containing the mismatch or IDL is excised by exonucleases and re-synthesized by DNA polymerase and DNA ligase (Marra and Schär, 1999).

The current picture of MMR in eukaryotic cells resembles that of E. coli to a great extent, but with two important differences. The first is related to the strand discrimination function. In eukaryotes the hemi-methylation status of newly replicated DNA does not play a role in directing MMR. No MutH homologue has been identified and it has been proposed that strand discrimination might be

0378-1119/01/$ - see front matter © 2001 Published by Elsevier Science B.V. All rights reserved. PII: S0378-1119(01)00549-2


mediated by the presence of strand discontinuities in the newly synthesized DNA. The second fundamental difference is that the MutS and MutL functional homologues are heterodimeric rather than homodimeric. At least six MutS homologues and five MutL homologues, referred to as MSH and MLH, respectively, have been identified in eukaryotes. The best characterized of these factors are MSH2, MSH3 and MSH6, which are involved in MMR in the nucleus. MSH2-MSH6 heterodimers recognize and repair base mismatches and loops of up to two bases, whereas MSH2-MSH3 heterodimers recognize loop-outs of different sizes (Marsischky et al., 1996; Drummond et al., 1995). MSH4 and MSH5 proteins constitute another MSH heterodimer, which does not participate in MMR but is instead involved in meiotic crossing-over and chromosome segregation (Nakagawa et al., 1999). MSH1 is targeted to the mitochondria and is necessary for mitochondrial stability in yeast (Reenan and Kolodner, 1992a).

MSH2 has a key role since it is present in both heterodimers that are required for MMR in the nucleus. Sequence analyses of this protein in several organisms have shown a strong conservation of amino acids in certain domains (Eisen, 1998). Because of these highly conserved domains, the MSH2 gene has been isolated from various eukaryotes, including human and yeast (Fishel et al., 1993; Reenan and Kolodner, 1992b), using PCR amplification with degenerate primers. However, MSH2 genes have not been characterized in protozoans, particularly in kinetoplastids, an ancient group of lower eukaryotes.

In the present study, we used the degenerate primer PCR strategy to isolate cDNAs with homology to the MSH2 gene from a T. cruzi library. We report the full-length cDNA sequence, its functional characterization by dominant negative assay in E. coli and a phylogenetic analysis of the MSH2 genes including the T. cruzi MSH2.

2. Materials and methods

2.1. Parasite

Epimastigote forms of the Tulahuén strain (Pizzi et al., 1952) were maintained in logarithmic growth phase at 28 °C in supplemented liver digest neutralized (LDNT) medium (Kirchhoff et al., 1984). Tissue culture-derived trypomastigotes and amastigotes were obtained and purified by centrifugation in discontinuous metrizamide gradients (Sigma) as described previously (Teixeira et al., 1994).

2.2. Screening of the cDNA library

A unidirectional cDNA library was constructed in the vector λZAP II (Stratagene, La Jolla, CA) with RNA isolated from epimastigote cultures of the Tulahuén strain, as described previously (Teixeira et al., 1994). A pair of degenerate primers (muts10, 5′-TNACNGGNCCNAAYATG-3′ and muts21, 5′-TYTCNRCCATRAANGT-3′),

based on sequences corresponding to two conserved regions identified in the alignment of MSH2 proteins, were used to PCR-amplify a 180 bp fragment from 50 ng of T. cruzi genomic DNA. This fragment, which corresponds to the T. cruzi MSH2 probe, was cloned into the pUC18 vector (Amersham-Pharmacia Biotech) and sequenced using the dideoxy chain termination method and the ALF automated DNA sequencer (Amersham-Pharmacia Biotech). The 180 bp probe was labeled with [α-³²P]dCTP using the Megaprime™ DNA labeling protocol from Amersham-Pharmacia Biotech. Plating and transfer of approximately 10⁵ phage plaques and hybridization with the labeled DNA probe were performed according to instructions provided by the manufacturer of the library (Stratagene). Briefly, replica membranes were hybridized overnight at 42 °C with the T. cruzi MSH2 probe in a solution containing 50% formamide and washed twice, with 2× SSC/0.1% SDS and 0.1× SSC/0.2% SDS, at 60 °C for 30 min each. Clones that hybridized to the MSH2 probe were plaque-purified after two rounds of hybridization and stored in SM buffer (0.1 M NaCl/0.01 M MgSO4/50 ml of 1 M Tris–HCl (pH 7.5)/gelatin 2%).

2.3. Nucleic acid manipulations and DNA sequencing

Recombinant phagemids were excised by co-infecting E. coli XL-1blue cells with the λ phage and ExAssist helper phage (Stratagene) and introduced into the SOLR™ strain of E. coli to generate the corresponding pBluescript plasmids. A protocol described by the supplier of the λ phage vector (Stratagene) was followed. The resulting plasmids were isolated from overnight cultures of bacterial cells growing in LB medium with 100 µg/ml of ampicillin using the Qiagen plasmid-prep method (Qiagen Inc.). Sequencing of cDNA inserts was performed with a Thermo Sequenase fluorescent labeled primer cycle sequencing kit with 7-deaza-dGTP (Amersham-Pharmacia Biotech) using double-stranded pBluescript II cDNAs as template, and M13 reverse and M13 universal primers.
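
As an illustration of what the degenerate primers muts10 and muts21 of section 2.2 represent, the following minimal Python sketch expands their IUPAC ambiguity codes and counts how many distinct oligonucleotides each primer pool contains. Only the two primer sequences come from the text; the function names are hypothetical and the sketch is not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str):
    """Yield every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ("".join(combo) for combo in product(*choices))


# Primer sequences as given in section 2.2 (5' to 3').
muts10 = "TNACNGGNCCNAAYATG"
muts21 = "TYTCNRCCATRAANGT"

for name, seq in (("muts10", muts10), ("muts21", muts21)):
    print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Each N contributes a factor of four and each R or Y a factor of two, so the pool size grows quickly with the number of ambiguous positions.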
To obtain the complete sequence of both strands, PCR amplifications were used to generate overlapping fragments from the isolated cDNAs. To amplify and clone the 5′ UTR of MSH2, two primers were used: TcSL (T. cruzi spliced leader sequence: 5′-ACAGTTTCTGTACTATATTG-3′) and tmuts21 (5′-AGGTGGATGGAATTGTATGC-3′), in a PCR reaction with 5 µl of λZAP II cDNA library (2 ng/µl). The PCR product was gel-purified and submitted to a new amplification with the TcSL and tmuts11 (5′-AGTACTCGACAGAGGCACCA-3′) primers.

2.4. Northern and Southern blot analysis

For Northern blot analysis, 10 µg of RNA were size-fractionated in a 1.2% agarose gel containing 5% formaldehyde. The RNA was blotted onto a Hybond-N+ membrane (Amersham-Pharmacia Biotech) by capillary transfer and


fixed through UV irradiation using a UV Stratalinker 2400 (Stratagene). The cDNA insert of 3250 bp (cDNA 6E1) was PCR-amplified using T3 and T7 primers, gel-purified and labeled with [α-³²P]dCTP using the Megaprime™ DNA labeling protocol from Amersham-Pharmacia Biotech. The membrane was hybridized in a 50% formamide buffer for 18 h at 42 °C as previously described (Teixeira et al., 1994) and washed twice with 2× SSC/0.1% SDS at 60 °C for 30 min each.

For Southern blot hybridization, genomic DNA of T. cruzi was digested with the restriction endonucleases BamHI, EcoRI and NcoI (New England Biolabs) and size-fractionated by electrophoresis in a 0.8% agarose gel. After denaturation and neutralization, the DNA fragments were transferred to a Hybond-N+ membrane (Amersham-Pharmacia Biotech) by capillary transfer. The membrane was then hybridized to the same [α-³²P]dCTP-labeled cDNA (3250 bp) probe. Hybridization and washing steps were carried out as described for Northern blot analysis.

2.5. Alignment of MutS/MSH protein with TcMSH2 and phylogenetic analyses

The MutS/MSH protein sequences were obtained from protein databases of the National Center for Biotechnology Information (NCBI) and aligned with the TcMSH2-deduced amino acid sequence using the multiple alignment program CLUSTALW (Thompson et al., 1994). Gaps and regions of ambiguous alignment were excluded from the analysis. The C-terminal region, which is conserved throughout the evolution of the protein family (Eisen, 1998; Culligan et al., 2000), was used for phylogenetic inference. A neighbor-joining (NJ) tree (Saito and Nei, 1987) was built from Dayhoff PAM distances among MutS/MSH protein sequences, calculated using PROTDIST (Phylip 3.5). Protein parsimony trees were also constructed using PROTPARS (Phylip 3.5) (Felsenstein, 1993).

2.6. Dominant negative assay in E. coli

The rate of spontaneous mutation to rifampicin resistance (rifʳ) in wild-type E. coli AB1157 (F⁻, thr1, leu6, thi1, lacY1, glk4, ara14, xyl5, mtl1, proA2, his4, argE3, str31, tsx33, supE44) was determined using a plate assay. The Bluescript plasmid (Stratagene, La Jolla, CA) containing the T. cruzi MSH2 (TcMSH2) cDNA was transformed into AB1157 according to the procedure of Fishel et al. (1986). Ampicillin-resistant transformants were selected and grown to saturation in 2× YT medium (1.6% tryptone, 1.0% yeast extract, 0.5% NaCl) containing 100 µg/ml ampicillin. To determine the total number of viable cells (ampʳ), the culture was plated onto 2× YT agar plates containing 100 µg/ml ampicillin. The total number of spontaneous rifʳ mutants was determined by plating equivalent volumes of culture on 2× YT plates containing 100 µg/ml ampicillin and 100 µg/ml rifampicin (Sigma). The rate of mutation was calculated according to Lea and Coulson (1949) using $r_0 = M(1.24 + \ln M)$, where $r_0$ is the median number of rifʳ mutations in an odd number of independent cultures (usually 18) and $M$ is the average number of rifʳ mutations per culture. $M$ was solved by interpolation from the known $r_0$ value and then used to calculate the mutation rate $r$, where $r = M/N$ and $N$ is the final average number of viable cells. An E. coli mutS mutant strain, CSH-115 (ara Δ(gpt-lac)5 rpsL mutS::miniTn10 (tetʳ)), was used as a positive control for the assay.
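
The Lea and Coulson relation above has no closed-form solution for M, which is why the text speaks of interpolation. The minimal Python sketch below solves it numerically by bisection instead and then applies r = M/N. Bisection is a stand-in chosen here for simplicity, not the authors' procedure, and the colony counts and cell number in the usage lines are invented purely for illustration.

```python
import math
from statistics import median


def solve_m(r0: float, iterations: int = 200) -> float:
    """Solve r0 = M * (1.24 + ln M) for M by bisection (requires r0 > 0)."""
    if r0 <= 0:
        raise ValueError("the median number of mutants must be positive")
    # At M = exp(-1.24) the right-hand side is 0; at M = r0 + 1 it exceeds r0,
    # and the function is increasing on this interval, so the root is bracketed.
    lo, hi = math.exp(-1.24), r0 + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mid * (1.24 + math.log(mid)) < r0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mutation_rate(mutants_per_culture, viable_cells: float) -> float:
    """Mutation rate r = M / N from per-culture mutant counts and final cell number."""
    r0 = median(mutants_per_culture)
    return solve_m(r0) / viable_cells


# Hypothetical data: rif-resistant colonies in five cultures, 1e9 viable cells each.
counts = [3, 5, 8, 4, 12]
print(mutation_rate(counts, 1e9))
```

Using the median rather than the mean of the cultures limits the influence of occasional "jackpot" cultures in which a mutation arose early.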

3. Results

3.1. Sequencing and analyses of the MSH2 cDNAs

Based on a multiple alignment of MutS/MSH2 proteins, a pair of degenerate primers (muts10, 5′-TNACNGGNCCNAAYATG-3′ and muts21, 5′-TYTCNRCCATRAANGT-3′), corresponding to two conserved regions in MSH2 proteins, was designed. These primers were used to PCR-amplify a 180 bp fragment from T. cruzi genomic DNA. Sequence analyses using the Basic Local Alignment Search Tool (BLAST) showed that the PCR product has 60% identity to the MSH2 gene from other eukaryotes, including the two conserved motifs that were used to design the primers (see Fig. 2). The 180 bp fragment was used as a probe for the screening of a T. cruzi cDNA library, which resulted in the isolation of seven cDNA clones. Restriction digestion analysis of the plasmids revealed that the cDNAs, denominated 6E1, 6D1A, 7D1, 4B2, 7A1, 4E1 and 4A1, were 3250, 2560, 2449, 2270, 1963, 1250 and 1151 bp long, respectively. After determining the nucleotide sequences of approximately 400 bp from the 5′ region of each cDNA and submitting them to BLASTX analyses, the homology of all cDNA clones with the MSH2 gene was confirmed.

The seven T. cruzi MSH2 cDNAs were sequenced completely in both strands and, after subcloning and sequencing various PCR fragments derived from the largest cDNA, 6E1, a 3250 bp consensus sequence was generated. Because the spliced leader sequence was not present at the 5′ end of any of the cDNAs isolated, we devised a strategy to obtain the complete 5′ UTR of the T. cruzi MSH2 cDNA using PCR amplification directly from the cDNA library. To achieve that, we used a primer corresponding to 18 nucleotides of the T. cruzi spliced leader (TcSL) and two primers, tmuts11 and tmuts21, which contain sequences corresponding to the 5′ end of the 6E1 MSH2 cDNA.
The tmuts21 reverse primer is complementary to a sequence located 950 bp downstream of the 5′ end of the 6E1 MSH2 cDNA sequence, whereas the primer tmuts11 is complementary to a sequence located approximately 600 bp upstream of the tmuts21 primer annealing site. Thus, a PCR fragment larger than 950 bp was expected to be generated with the primers tmuts21 and TcSL. Using 10 µl of the library as a template, a fragment of approximately 1000 bp was obtained, purified and submitted to a second PCR with TcSL and tmuts11


primers. In the second PCR amplification, we obtained a fragment of 300 bp, which was cloned and sequenced. After overlapping this sequence with the consensus MSH2 cDNA sequence we generated the full-length T. cruzi MSH2 sequence containing a total of 3305 bp. That sequence includes 17 bp corresponding to the spliced leader sequence at the 5′ end (Fig. 1) (GenBank Accession number: AY005798). A single open reading frame (ORF) of 2886 nt encoding a putative protein of 962 amino acids, a 47 nt 5′ UTR and a 372 nt 3′ UTR were identified using the ORF finder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) (Fig. 1). The homology of the putative 962-amino acid T. cruzi MSH2 protein with other MSH2 proteins was confirmed by comparison with protein databases using the BLASTP search program, and the functional domains were identified with the Prosite (http://www.expasy.ch/tools/scnpsit1.html) and Pfam (http://pfam.wustl.edu/hmmsearch.html) software (Fig. 1). As shown in Fig. 1, the sequence of the T. cruzi MSH2 putative protein revealed all functional motifs of the MutS/MSH proteins previously described by Culligan et al. (2000), such as the mismatch recognition motif in the N-terminal domain, and the ATP binding site and the helix-turn-helix motif in the C-terminal domain. Furthermore, we identified a putative leucine-zipper motif in TcMSH2, which has not been described in other MSH/MutS proteins.

In Fig. 2, we show the putative domain organization and secondary structure of the T. cruzi MSH2 putative protein. This prediction was carried out with a structure-based sequence alignment as described by Ban and Yang (1998). We used the recently published crystal structure of the mismatch repair protein MutS from Thermus aquaticus (TAQ MutS) (Obmolova et al., 2000) and compared it with the yeast, human and T. cruzi MSH2 protein sequences.
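
The lengths reported above for the full-length cDNA can be cross-checked with simple arithmetic, and the idea behind an ORF search can be shown on a toy sequence. The Python sketch below is a minimal forward-strand scan written for illustration; it is not the NCBI ORF finder, the function name is hypothetical, and the toy sequence is invented. The remark about the stop codon is an interpretation, not something the text states.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based; `end` is the index of the stop codon, so
    seq[start:end] is the coding sequence without the stop codon.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
    return best


# Cross-check of the figures given in the text.
protein_length = 962              # amino acids
orf_nt = protein_length * 3       # coding nucleotides without the stop codon
utr5_nt, utr3_nt = 47, 372
total_nt = utr5_nt + orf_nt + utr3_nt
print(orf_nt, total_nt)
# If the stop codon is counted, the ORF spans orf_nt + 3 nucleotides, which may
# explain the 2889 bp quoted in the abstract (an assumption, not stated there).

# Toy example (invented sequence): the ORF encodes Met-Lys-Phe-Gly.
toy = "ccATGAAATTTGGGTAAcc"
print(longest_orf(toy))
```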
The crystal structure of the TAQ MutS protein subunit consists of five structural domains: domain I (residues 1–118) is the N-terminal mismatch recognition domain and domain IV (residues 406–513) is involved in DNA binding together with domain I; domain II (residues 132–245) connects domains I and III. The latter (residues 247–385 and 514–540) is central to the MutS structure, because it is directly connected to domains II, IV and V. Finally, domain V (residues 543–765) contains the Walker ATPase motif and the helix-turn-helix motif that is involved in the protein dimer interface to form the MutS homodimer. All the domains found in the TAQ MutS protein are present in the putative MSH2 protein from T. cruzi; in the trypanosome protein, these domains correspond to residues 49–171 (I), 180–316 (II), 339–504/618–656 (III), 524–610 (IV) and 661–932 (V) (Figs. 2 and 4).

3.2. Identification of the TcMSH2 gene and mRNA expression analysis

Southern blot analyses were performed with genomic DNA isolated from epimastigote cultures of T. cruzi (Tulahuén strain) digested with the endonucleases BamHI, EcoRI and NcoI. Sequence analysis of the full-length TcMSH2 cDNA showed the presence of single restriction sites for the EcoRI and NcoI enzymes (Fig. 1), and no recognition site for the BamHI restriction enzyme. After hybridization with the full-length cDNA probe, we detected a single BamHI fragment and two fragments each for EcoRI and NcoI, indicating the presence of a single copy of the TcMSH2 gene in the T. cruzi genome (Fig. 3A). The same pattern of hybridization was observed with DNA isolated from the Y strain (data not shown).

We also performed Northern blot analyses to investigate the pattern of transcription of the TcMSH2 gene during the T. cruzi life cycle. As observed in Fig. 3B, a single mRNA of approximately 3300 nt was detected in all three forms of T. cruzi. A comparison with the amount of total rRNA present in each lane indicated that the levels of TcMSH2 mRNA are higher in amastigote and epimastigote forms.

3.3. TcMSH2 functional characterization

To investigate whether the TcMSH2 cDNA encodes a protein that is able to recognize and bind to DNA mismatches in a bacterial cell, we performed a dominant negative assay using rifampicin resistance as the target. It has been shown by others that expression in E. coli of a protein related to the MutS family results in an increased mutation rate in the bacteria due to lack of interaction of the heterologous protein with the normal repair machinery of the bacterium. The pBluescript (pBSKII) plasmid containing the 4B2 cDNA, when transformed into the AB1157 strain of E. coli, confers resistance to ampicillin. The expression of TcMSH2 under control of the lac promoter in the same E. coli strain resulted in an increased number of clones that are resistant to rifampicin (rifʳ), when compared with bacteria transformed with the pBSKII vector only. Since the E. coli strain AB1157 is sensitive to both antibiotics, the resistant phenotype (rifʳ) is explained as a consequence of the presence of a MutS homologue that increases the mutation rate in the gene encoding the β subunit of RNA polymerase. Fig. 4 shows the fluctuation analysis (Lea and Coulson, 1949) of several independent experiments. The wild-type E. coli (AB1157) has a rifʳ mutation rate of 2 × 10⁻⁸, whereas an isogenic strain of E. coli containing a mutS mutation (CSH) has a rifʳ mutation rate of 15.5 × 10⁻⁶. The mutation rate of the transformed E. coli strain AB1157 expressing the TcMSH2 cDNA is 6.2 × 10⁻⁷. These results suggest that the 4B2 cDNA directs the expression of a truncated form of the TcMSH2 protein, which is able to interfere with the E. coli MutHLS DNA mismatch repair system and, consequently, increases the mutation rate of the bacterium.

3.4. TcMSH2 phylogenetic analysis

Using the CLUSTALW algorithm and sequences deposited in the GenBank database, an alignment of the deduced


Fig. 1. Nucleotide and deduced amino acid sequence of the T. cruzi MSH2 cDNA (GenBank Accession number: AY005798). Nucleotide and amino acid positions are indicated on the left. Untranslated 5′ and 3′ regions are shown in lower case letters, and the coding region is shown in upper case letters. The 17 nt corresponding to part of the spliced leader sequence in the 5′ UTR are underlined. In the N-terminal region, light gray shading indicates the putative MSH2 mismatch recognition motif, amino acids 89–92. A putative leucine zipper, amino acids 280–301, and an arginine residue conserved in MutS proteins (position 304) are also shown by light gray shading. In the C-terminal region, light gray shading indicates, respectively, the ATP binding site, amino acids 709–717, the MutS family signature, amino acids 784–800, and the helix-turn-helix motif, amino acids 871–888. Dark gray shading in the helix-turn-helix motif marks the main conserved amino acid residues. Dark gray shading of nucleotides 2005–2010 in the coding region and nucleotides 3117–3122 in the 3′ UTR marks the single restriction sites present in the TcMSH2 gene for the NcoI and EcoRI enzymes, respectively.
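
The restriction sites marked in Fig. 1 underlie the Southern blot reasoning of section 3.2: for a single-copy gene probed with the full-length cDNA, an enzyme with no site inside the probed region should give one hybridizing fragment, and an enzyme with one internal site should give two. The Python sketch below locates recognition sites and applies that rule. The recognition sequences are the standard ones for these enzymes; the function names and the toy sequence are hypothetical, and the real TcMSH2 sequence is not reproduced here.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "NcoI": "CCATGG",
}


def find_sites(seq: str, site: str):
    """Return 1-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions


def expected_bands(internal_sites: int) -> int:
    """Hybridizing fragments expected for a single-copy locus and a spanning probe."""
    return internal_sites + 1


# Invented toy sequence with one NcoI site and one EcoRI site, no BamHI site.
toy = "AAAA" + "CCATGG" + "TTTT" + "GAATTC" + "AAAA"
for enzyme, site in RECOGNITION_SITES.items():
    hits = find_sites(toy, site)
    print(enzyme, hits, "expected bands:", expected_bands(len(hits)))
```

With the toy sequence this reproduces the pattern described in the text: one band for BamHI and two bands each for EcoRI and NcoI.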


Fig. 2. TcMSH2 structure-based sequence alignment. Yeast (yMSH2), human (hMSH2) and T. cruzi (TcMSH2) MSH2 proteins and T. aquaticus MutS protein (TAQ MutS) are shown. The secondary structures observed in the TAQ MutS crystal are indicated above the aligned sequences: α-helices (rectangles) and β-strands (arrows), named in alphabetic or numeric order, respectively. Residues conserved for structural integrity are highlighted in gray with black characters, those for DNA recognition in dark gray with white characters, those for protein dimerization are underlined, those for ATPase activity are highlighted in gray with white characters and those for interdomain interactions in black with white characters. The black arrow beneath the sequence alignment indicates the start of the TcMSH2 4B2 cDNA, which was used in the dominant negative assay. White arrows beneath the sequence alignment indicate the conserved protein regions used to design the degenerate primers for the construction of the TcMSH2 probe.


Fig. 3. Southern and Northern blot analysis of the TcMSH2 gene. (A) T. cruzi genomic DNA digested with BamHI, NcoI and EcoRI was probed with the TcMSH2 6E1 cDNA (3250 bp). Molecular weight markers are shown on the left and a representative scale is shown on the right. (B) The Northern blot analysis was performed with 10 µg of total RNA isolated from (A) amastigote, (T) trypomastigote and (E) epimastigote forms, probed with the same fragment used in the Southern blot experiment.

amino acid sequence of TcMSH2 with MutS and MSH proteins from various organisms was performed. In the C-terminal regions of these proteins we identified a conserved region of approximately 270 amino acids, which contains the ATP binding site, the helix-turn-helix motif and the MutS protein signature (alignment available at http://www.icb.ufmg.br/~lgb/tcmsh2.html). This same conserved region was previously identified by Eisen (1998) and Culligan et al. (2000) and was used to describe the evolutionary history of the MutS/MSH proteins. Here, we use these sequences to examine the evolutionary relationships of the TcMSH2 protein in a neighbor-joining phylogenetic tree built from Dayhoff PAM distances among MutS/MSH protein sequences. As an outgroup we used the MutS protein sequences from Bacillus subtilis and Streptococcus pneumoniae (Gram-positive bacteria). In Fig. 5, we show that the

TcMSH2 sequence is positioned within the MSH2 clade, but constitutes a distinct branch. This result indicates that the gene isolated and characterized in this work represents the T. cruzi homologue of the eukaryotic MSH2 gene.

4. Discussion

MMR, which is present in all organisms tested, from bacteria to humans, is critical for replication fidelity and genome stability. In this article we demonstrate the presence of the MMR pathway in the human parasitic protozoan T. cruzi. We showed the existence, in the parasite genome, of the mutS homologue gene (MSH2), which encodes a crucial protein involved in DNA MMR in eukaryotes (Nakagawa et al., 1999). In humans, it is known that germline mutations in


Fig. 4. Dominant negative assay. (A) Graphic representation of the TcMSH2 protein domain organization: I, the N-terminal domain, which contains the mismatch recognition motif (EYYE); II, connector domain; III, core domain; IV, DNA binding domain; V, the C-terminal domain, which contains the ATP binding site, the MutS family signature and the helix-turn-helix motif. Note that the 4B2 cDNA encodes the full-length III, IV and V domains. (B) Representation of the fluctuation test used to calculate the mutation rate. AB1157, E. coli wild-type strain; CSH, E. coli mutS mutant.
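
To put the mutation rates reported in section 3.3 on a common scale, the short Python sketch below expresses each rate as a fold increase over the wild-type strain. The three rates are taken from the text; the dictionary keys and variable names are illustrative only.

```python
# rif-resistance mutation rates as reported in section 3.3.
rates = {
    "AB1157 (wild type)": 2e-8,
    "AB1157 + TcMSH2 4B2 cDNA": 6.2e-7,
    "CSH (mutS mutant)": 15.5e-6,
}

wild_type = rates["AB1157 (wild type)"]

# Fold increase of each strain relative to the wild-type rate.
fold_change = {strain: rate / wild_type for strain, rate in rates.items()}

for strain, fold in fold_change.items():
    print(f"{strain}: {fold:.0f}-fold")
```

On these figures, expression of the truncated TcMSH2 raises the mutation rate by roughly 30-fold, well short of the several-hundred-fold increase seen in the mutS mutant, which is consistent with partial interference rather than complete loss of repair.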

MMR genes cause a predisposition to hereditary non-polyposis colorectal cancer associated with microsatellite instability (Fishel et al., 1993). The cloning of the T. cruzi MSH2 gene will allow us to investigate the consequences of MMR deficiency in a pathogenic microorganism that reproduces non-sexually, such as T. cruzi.

The TcMSH2 gene encodes a protein of 962 amino acids, similar in size to other members of the MSH2 family, which range from 933 to 964 amino acids (Eisen, 1998). Analysis of the TcMSH2 amino acid sequence revealed considerable identity with other eukaryotic MSH2 proteins, such as those of Xenopus, human, mouse, yeast and Drosophila. High sequence conservation was found in the N-terminal, middle and C-terminal domains identified in the MutS/MSH proteins (Culligan et al., 2000). In the N-terminal domain of the TcMSH2 putative protein we observed an aromatic doublet flanked by two negatively charged residues (EYYE), which was predicted to interact closely with base–base DNA mismatches (Malkov et al., 1997). A highly conserved arginine residue, found in the middle domain of the E. coli MutS protein, is also present in TcMSH2. Mutations in the bacterial gene affecting this arginine residue confer a dominant negative phenotype (Wu and Marinus, 1994). However, the biochemical function of this important domain has not yet been established. Another interesting feature identified in the TcMSH2 middle domain is the presence of a hypothetical leucine zipper motif, a type of motif known to be involved in protein–protein interaction. Surprisingly, this motif has not been identified in any of

the other members of the MutS/MSH protein class. The C-terminal conserved domain shows the highest level of conservation among all three domains. It contains a helix-turn-helix and the ATP/magnesium-binding domains, which are predicted to interact with DNA (Alani et al., 1997) and to mediate ATP binding and hydrolysis, respectively (Haber and Walker, 1991). The C-terminal domain also contains the MutS family signature, the function of which has not been characterized (Eisen, 1998). Thus, all three subdomains found in the C-terminal region of MutS/MSH proteins were identified in the TcMSH2 putative protein.

The structural characterization of the TcMSH2 protein was also possible by comparative analyses with crystal structures that have been recently solved for the mismatch binding protein MutS of E. coli (Lammers et al., 2000) and its Thermus aquaticus homologue (Obmolova et al., 2000). This analysis permitted us to identify in the TcMSH2 putative protein all the putative secondary structures and conserved domains that were described in the TAQ MutS protein crystal structure (Obmolova et al., 2000).

The results of the dominant negative assay in E. coli performed with the 4B2 cDNA constitute an initial effort towards the functional characterization of the TcMSH2 gene. Based on this assay we concluded that the N-terminal region is not necessary to cause interference with the E. coli MMR, since the 4B2 cDNA encodes a truncated form of the TcMSH2 protein containing only half of the middle domain and the C-terminal domain. This result was unexpected, since the analyses based on the crystal structures of bacterial MutS proteins suggested that the N-terminal


Fig. 5. Phylogenetic relationships between the TcMSH2 putative protein and MutS/MSH protein sequences. This is a neighbor-joining (NJ) tree for Dayhoff PAM distances. Parsimony trees were also constructed, which produced very similar results (data not shown). The horizontal scale bar indicates the evolutionary distance. Numbers above each branch represent the number of times the branch was found in 100 bootstrap replicates. The B. subtilis and S. pneumoniae MutS protein sequences (Gram-positive eubacteria) were used as an outgroup. The species and GenBank Accession number for each taxon are: MutSBacsb, B. subtilis (P49849); MutSStrep, S. pneumoniae (AAA88597); MutSSalty, Salmonella typhimurium (P10339); MutSEcoli, E. coli (AAG57842); MSH3human, Homo sapiens (P20585); MSH3yeast, Saccharomyces cerevisiae (P25336); MSH1yeast, S. cerevisiae (P25846); MSH1Spomb, Schizosaccharomyces pombe (O13921); MSH4human, H. sapiens (O15457); MSH4yeast, S. cerevisiae (P40965); MSH5human, H. sapiens (O43196); MSH5yeast, S. cerevisiae (S67702); MSH6human, H. sapiens (P52701); MSH6yeast, S. cerevisiae (Q03834); TcMSH2, T. cruzi (AAG00261); MSH2Neucr, Neurospora crassa (O13396); MSH2yeast, S. cerevisiae (S57379); MSH2Corn, Zea mays (Q9XGC9); MSH2Arab, Arabidopsis thaliana (BAB01119); MSH2human, H. sapiens (P43246); MSH2mouse, Mus musculus (S53608); MSH2Rat, Rattus norvegicus (JC6184); MSH2Xenla, Xenopus laevis (S53609).

domain was essential to recognize base-base mismatches and insertion/deletion loops. Unspecified interactions with DNA of domain IV, which is present in cDNA 4B2, constitute a possible explanation for our results. Alternatively, the bacterial MutS protein may interact with the C-terminal domain of the truncated TcMSH2 protein, and this interaction could interfere with E. coli MMR and thus generate the increased mutation rate in the bacteria. As shown by Obmolova et al. (2000) and Lammers et al. (2000), the C-terminal domain is the principal region of MutS/MSH involved in the formation of protein dimers, and the helix-turn-helix domain, which is present in domain V, is particularly involved in this process. Thus, we postulate that the truncated TcMSH2 protein forms a heterodimer with E. coli MutS, causing the dominant negative phenotype described in this work.

RNA analyses showed that the TcMSH2 gene is constitutively expressed in all stages of the T. cruzi life cycle. The presence of the TcMSH2 transcript in epimastigotes and amastigotes, which are replicative forms of the parasite, suggests that post-replication MMR is functional in these forms. However, we also detected the 3300 nt TcMSH2 transcript in the non-replicating trypomastigote stage, although at significantly reduced levels. The reasons for the presence of the TcMSH2 transcript in trypomastigotes are not clear, and it is possible that the protein synthesis machinery of the cell does not translate this mRNA during this stage of the parasite life cycle. In trypanosomatids in general, post-transcriptional control mechanisms are known to be responsible for most of the changes in protein expression that occur during the life cycle (Teixeira, 1998).

Southern blot analysis of T. cruzi genomic DNA showed a correspondence between the hybridization pattern at high-stringency conditions and the restriction map of the TcMSH2 cDNA sequence. This suggests that the TcMSH2 gene exists as a single-copy gene in the T. cruzi genome. The presence of only one copy of the MSH2 gene has been demonstrated in other eukaryotes, including humans (Peltomaki et al., 1993) and yeast (Reenan and Kolodner, 1992b). A single-copy gene is especially suitable for the generation of mutant parasites by knockout of the MSH2 gene, an experimental approach that we have in progress.

In this work we also evaluated some features of the molecular evolution of TcMSH2. Alignment of the C-terminal region of the TcMSH2 protein with the MutS/MSH proteins of other species allowed the construction of a phylogenetic tree that clearly showed that TcMSH2 clusters with other MSH2 proteins, although on a distinct branch. This placement also agrees with the position of the trypanosomatids, order Kinetoplastida, as one of the earliest branching eukaryotic lineages.

Biochemical studies focusing on DNA repair in trypanosomatids are very scarce. The TcMSH2 gene described in this work represents the second DNA repair gene isolated in this group of lower eukaryotes. Recently, Perez et al. (1999) reported the isolation and characterization of cDNAs encoding apurinic/apyrimidinic (AP) endonuclease genes from Leishmania major and T. cruzi, which are capable of complementing the deficiency of exonuclease III and dUTPase in E. coli. The TcMSH2 gene is the first DNA mismatch repair gene isolated from T. cruzi. This work opens the door to the exploration of the functional characteristics of TcMSH2 as well as to the characterization of other members of the MSH gene family that may be present in the T. cruzi genome. Moreover, using reverse genetics, we can envisage assessing its role in the T. cruzi MMR pathway, which would permit the beginning of functional studies of an important metabolic aspect of this human parasite.
Acknowledgements

Luiz Augusto-Pinto and Daniella Castanheira Bartholomeu hold doctoral fellowships (CNPq), and Santuza Maria Ribeiro Teixeira and Sérgio D.J. Pena hold research associate fellowships from CNPq. We are grateful to Kátia Barroso Gonçalves (CNPq) and Neuza Antunes Rodrigues (CNPq) for technical support.

References

Alani, E., Sokolsky, T., Studamire, B., Miret, J.J., Lahue, R.S., 1997. Genetic and biochemical analysis of Msh2p-Msh6p: role of ATP hydrolysis and Msh2p-Msh6p subunit interactions in mismatch base pair recognition. Mol. Cell. Biol. 17, 2436–2447.
Ban, C., Yang, W., 1998. Structural basis for MutH activation in E. coli mismatch repair and relationship of MutH to restriction endonucleases. EMBO J. 17, 1526–1534.
Culligan, K.M., Meyer-Gauen, G., Lyons-Weiler, J., Hays, J.B., 2000. Evolutionary origin, diversification and specialization of eukaryotic MutS homolog mismatch repair proteins. Nucleic Acids Res. 28, 463–471.
Drummond, J.T., Li, G.M., Longley, M.J., Modrich, P., 1995. Isolation of an hMSH2-p160 heterodimer that restores DNA mismatch repair to tumor cells. Science 268, 1909–1912.
Eisen, J.A., 1998. A phylogenomic study of the MutS family of proteins. Nucleic Acids Res. 26, 4291–4300.
Felsenstein, J., 1993. Phylogenetic Inference Package (PHYLIP version 3.5), University of Washington, Seattle, WA.
Fishel, R.A., Siegel, E.C., Kolodner, R., 1986. Gene conversion in Escherichia coli. Resolution of heteroallelic mismatched nucleotides by corepair. J. Mol. Biol. 188 (2), 147–157.
Fishel, R., Lescoe, M.K., Rao, M.R., Copeland, N.G., Jenkins, N.A., Garber, J., Kane, M., Kolodner, R., 1993. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis cancer. Cell 75, 1027–1038.
Haber, L.T., Walker, G.C., 1991. Altering the conserved nucleotide binding motif in the Salmonella typhimurium MutS mismatch repair protein affects both its ATPase and mismatch binding activities. EMBO J. 10, 2707–2715.
Kirchhoff, L.V., Engel, J.C., Dvorak, J.A., Sher, A., 1984. Strains and clones of Trypanosoma cruzi differ in their expression of a surface antigen identified by a monoclonal antibody. Mol. Biochem. Parasitol. 11, 81–89.
Lahue, R.S., Au, K.G., Modrich, P., 1989. DNA mismatch correction in a defined system. Science 245, 160–164.
Lammers, M.H., Perrakis, A., Enzlin, J.H., Winterwerp, H.H.K., de Wind, N., Sixma, T., 2000. The crystal structure of DNA mismatch repair protein MutS binding to a G.T mismatch. Nature 407, 711–717.
Lea, D.E., Coulson, C.A., 1949. The distribution of numbers of mutants in bacterial populations. J. Genet. 49, 264–285.
Malkov, V.A., Biswas, I., Camerini-Otero, R.D., Hsieh, P., 1997. Photocross-linking of the NH2-terminal region of Taq MutS protein to the major groove of a heteroduplex DNA. J. Biol. Chem. 272, 23811–23817.
Marra, G., Schär, P., 1999. Recognition of DNA alterations by the mismatch repair system. Biochem. J. 338, 1–13.
Marsischky, G.T., Filosi, N., Kane, M.F., Kolodner, R., 1996. Redundancy of Saccharomyces cerevisiae MSH3 and MSH6 in MSH2-dependent mismatch repair. Genes Dev. 10, 407–420.
Modrich, P., Lahue, R., 1986. Mismatch repair in replication fidelity, genetic recombination, and cancer biology. Annu. Rev. Biochem. 65, 101–133.
Nakagawa, T., Abhijit, D., Kolodner, R.D., 1999. Multiple functions of MutS and MutL related heterocomplexes. Proc. Natl. Acad. Sci. USA 96, 14186–14188.
Obmolova, G., Ban, C., Hsieh, P., Yang, W., 2000. Crystal structures of mismatch repair protein MutS and its complex with a substrate DNA. Nature 407, 703–710.
Peltomaki, P., Aaltonen, L.A., Sistonen, P., 1993. Genetic mapping of a locus predisposing to human colorectal cancer. Science 260, 810–812.
Perez, J., Gallego, C., Bernier-Villamor, V., Camacho, A., Gonzalez-Pacanowska, D., Ruiz-Perez, L.M., 1999. Apurinic/apyrimidinic endonuclease genes from the Trypanosomatidae Leishmania major and Trypanosoma cruzi confer resistance to oxidizing agents in DNA repair-deficient Escherichia coli. Nucleic Acids Res. 27, 771–777.
Pizzi, T.P., Rubio, M.D., Prager, R., Silva, C., 1952. Acción de la cortisona en la infección experimental por Trypanosoma cruzi. Boletin Chileno de Parasitologia 7, 22–24.
Reenan, R.A., Kolodner, R.D., 1992a. Characterization of insertion mutations in the Saccharomyces cerevisiae MSH1 and MSH2 genes: evidence for separate mitochondrial nuclear functions. Genetics 132, 975–985.
Reenan, R.A., Kolodner, R.D., 1992b. Isolation and characterization of two Saccharomyces cerevisiae genes encoding homologs of the bacterial HexA and MutS mismatch repair proteins. Genetics 132, 963–973.

Saito, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Schmuñiz, G.A., 2000. A Tripanosomiase Americana e seu impacto na saúde pública das Américas. In: Brener, Z., Andrade, Z., Barral-Netto, M. (Eds.), Trypanosoma cruzi e a doença de Chagas, 2nd Edition. Guanabara Koogan, Rio de Janeiro.
Teixeira, S.M.R., 1998. Control of gene expression in Trypanosomatidae. Braz. J. Med. Biol. Res. 31, 1503–1516.
Teixeira, S.M.R., Russel, D.G., Kirchhoff, L.V., Donelson, J.E., 1994. A differentially expressed gene family encoding "amastin", a surface protein for Trypanosoma cruzi, originates from a single multicistronic transcript. FEBS Lett. 250, 497–502.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Wu, T.H., Marinus, M.G., 1994. Dominant negative mutator mutations in the mutS gene of Escherichia coli. J. Bacteriol. 176, 5393–5400.