Cell, Vol. 57, 835-845, June 2, 1989, Copyright © 1989 by Cell Press
The Genes and Transcripts of an Antigen Gene Expression Site from T. brucei

Etienne Pays, Patricia Tebabi, Annette Pays, Helene Coquelet, Philippe Revelard, Didier Salmon, and Maurice Steinert
Department of Molecular Biology
Free University of Brussels
67, rue des Chevaux
B1640 Rhode Saint Genese
Belgium
Summary

The AnTat 1.3A antigen gene expression site of T. brucei was cloned from genomic libraries of the 200 kb expressor chromosome. In addition to the antigen gene, it contains seven putative coding regions (ESAGs, for expression site-associated genes), as well as a RIME retroposon. The polypeptide encoded by ESAG 4 shows homology to yeast adenylate cyclase, and possesses structural features of a transmembrane protein. The expression site is transcribed by a pol I-like polymerase in the parasite bloodstream form only, but sequences similar to ESAGs 5, 4, and 2 are also transcribed constitutively elsewhere, by a polymerase sensitive to α-amanitin. Ultraviolet irradiation, which seems to block RNA processing, allows the tentative mapping of a transcription promoter about 45 kb upstream of the antigen gene.

Introduction

Chronic infection by African trypanosomes is linked to the presence of a dense antigenic coat made of variable surface glycoprotein (VSG). The continuous replacement of VSG allows the parasite to avoid the immune response of its host. Two kinds of mechanisms underlie this antigenic variation. Different DNA rearrangements can lead to either replacement or alteration of the VSG gene in a given expression site, or different expression sites, containing different VSG genes, can be activated alternately (for recent reviews see Borst, 1986; Pays and Steinert, 1988). There are between five and 20 potential expression sites, probably all telomeric. Although conflicting results have been obtained concerning the location of the transcription promoter with respect to the VSG gene, it seems clear that a large portion of the telomere (up to 60 kb) is under the same transcriptional control: this region is transcribed only in the bloodstream form, by a polymerase insensitive to high doses of α-amanitin (Kooter and Borst, 1984; Bernards et al., 1985; Kooter et al., 1987; Shea et al., 1987; Alexandre et al., 1988).
It has been suggested that transcription “reinitiation” loci, whose number and location may vary, can interrupt the expression site (Shea et al., 1987; Alexandre et al., 1988). These loci might allow the RNA polymerase to be recycled for efficient transcription of downstream sequences, as observed for ribosomal gene repeats in Xenopus and mouse (Baker and Platt, 1986; Mitchelson and Moss, 1987). This observation, together with resistance of the polymerase to inhibition by α-amanitin, has led to the proposal that transcription of the VSG gene is of the ribosomal type, and takes place in the nucleolus (Shea et al., 1987).

Whatever the nature, number, and location of the transcription promoter(s), it appears that transcription of the VSG gene expression site is polycistronic (Johnson et al., 1987; Alexandre et al., 1988). Several coding regions, called expression site-associated genes (ESAGs; Cully et al., 1985), have been mapped upstream of the VSG gene. Since the ESAGs are transcribed together with the VSG gene, their role may be linked to antigenic variation, and could thus be important for parasite survival in the blood.

In this paper we describe the cloning of an active VSG gene expression site up to the promoter region, and provide evidence for the presence of seven ESAGs together with a RIME retroposon. One ESAG shows homology to the yeast adenylate cyclase gene, and appears to encode a transmembrane protein. Sequences related to this and two other ESAGs, as well as to RIME, are also transcribed constitutively elsewhere by a different RNA polymerase. Transcription of this particular expression site probably starts about 45 kb upstream of the VSG gene, and could reinitiate at least once, 27 kb downstream.

Results

Cloning the AnTat 1.3A Expression Site

The expression site of the Trypanosoma brucei AnTat 1.3A VSG gene has been mapped at the end of a 200 kb chromosome (Pays et al., 1985). Previous reports (Murphy et al., 1987b; Alexandre et al., 1988) have described the initial cloning of sequences from this expression site, up to 18 kb upstream of the VSG gene. Clones extending further in the 5’ direction have now been obtained from genomic libraries of the isolated 200 kb chromosome, constructed in either plasmid or phage vectors.
Figure 1 shows the map of these clones, which could be classified in two categories, named ES and AL, for expression site and alternate (allele?) sequences, respectively. For the sake of clarity, the demonstration that the ES clones correspond to the active expression site is presented in the last section of Results.

Characterization of the Cloned Sequences

Large regions of both ES and AL clones have been sequenced, as indicated in Figure 1. An 18.6 kb sequence from the expression site, starting at the 5’-terminal SphI site, is shown in Figure 2. This sequence carries four large open reading frames (ORFs), which, according to the nomenclature initiated by Cully et al. (1985), have been named ESAGs 7 to 4, from 5’ to 3’. ESAG 7 and ESAG 6 are very similar, whereas the other ESAGs are completely different. The translation of the ESAGs is presented in Figure 3. The presumptive ESAG 7, 6, and 4 proteins harbor a potential signal peptide with a predicted cleavage site,
Figure 1. Cloned Regions from the 200 kb Expressor Chromosome
The clones, selected by “walking” from libraries of the isolated 200 kb chromosome, were aligned on the basis of their restriction maps. Two related but different sequences have been obtained, and named ES and AL. The extents of the 11 AL and 13 ES clones are indicated. All clones are in plasmids (pUC18 for AL1, 2, 4-8, 9, and 11 and ES1-7, 9, 11-13; pBR322 for AL3, 7, 8 and ES8), except AL10 and ES10, which are λgtWES recombinants. Clones ES1-7 have been described previously, under the designation pES200.1-7 (Alexandre et al., 1988). Accordingly, the new plasmid recombinants are pES200.8-13 and pAL200.1-11, and the λ recombinants λES200.10 and λAL200.10. The filled boxes correspond to regions identical to full-size cDNAs (ESAGs 7, 6, and 5 and the VSG gene, from 5’ to 3’). The hatched boxes correspond to ORFs with evidence of stable polyadenylated transcripts, but whose cDNAs are not yet cloned (ESAGs 4, 3, and 2, from 5’ to 3’). The stippled box corresponds to a presumptive but uncloned coding region (ESAG 1; see Alexandre et al., 1988). The open boxes correspond to ORFs with no evidence for transcription. The box labeled “R” corresponds to a RIME retroposon. The two bars below the ES map correspond to regions absent from AL. Regions whose nucleotide sequences have been fully determined are indicated by arrowed lines above AL and below ES. The open and filled asterisks label the sequences presented previously (Alexandre et al., 1988) and in this paper, respectively. The abbreviations used for restriction endonuclease sites in this and the following figures are: D, DraI; E, EcoRI; H, HindIII; K, KpnI; P, PstI; S, SalI; Sc, ScaI; Sp, SphI; Ss, SstI; T, TaqI; Z, SspI.
suggesting either surface location or secretion. While no significant homology could be found between ESAG 7, 6, or 5 and sequences from data banks, ESAG 4 showed homology to yeast adenylate cyclase. The homology is mostly confined to two blocks, located in both sequences at a similar relative position with respect to the C terminus (Figures 4A and 4B). In ESAG 4 this region is just downstream of a sequence encoding a hydrophobic domain (“b” in Figure 4C). The ESAG 4 protein contains nine potential glycosylation sites, all located upstream of this hydrophobic domain. Four of these sites are within the 100 N-terminal amino acids, on both sides of another hydrophobic region (“a” in Figure 4C).

The ES map was found to differ from AL by two short extensions (bars under ES map in Figure 1). One of them, located downstream of ESAG 6, is due to the presence of a RIME retroposon (Hasan et al., 1984). This RIME copy is flanked by 9 bp repeats (Figure 2).

The two main intergenic regions contain large sequences also found elsewhere. Between ESAGs 6 and 5, a 1060 bp sequence (H1 in Figure 2) is 99.6% homologous to that found immediately upstream of the 5’ barren region in the ILTat 1.3 VSG gene expression site (Shah et al., 1987; J. R. Young, GenBank accession no. M15085). Between ESAG 5 and ESAG 4, a 1920 bp sequence (H2 in Figure 2) is 88% homologous to that found about 13 kb downstream in the same expression site, starting in the middle of ESAG 3 (nucleotides 1245-3158 in the 5853 bp sequence of Alexandre et al. [1988]; Figure 3). These sequences are interrupted by several stop codons in the three reading frames.

Steady-State Transcripts of the Cloned Region

The cloned ES sequences have been hybridized with polyadenylated RNA from trypanosome procyclic forms and four bloodstream variants, which have been chosen so as to analyze the transcription pattern of the VSG gene expression site in different situations (Figure 5). In AnTat 1.3A and 1.1B the expression site is the same, while in AnTat 1.6C this site is inactive, although unchanged, and in AnTat 1.3B a duplicate of this site is transcribed (Pays et al., 1983a; Alexandre et al., 1988). All probes recognize stable RNAs, except between ESAG 4 and ESAG 3. While ESAG 7 and ESAG 6 transcripts are found in bloodstream variants only, ESAG 4, ESAG 5, and RIME transcripts are also detected in procyclic forms (although to a lesser extent for ESAG 5 and RIME). The pattern of transcripts is largely conserved between variants, even if they use different VSG gene expression sites, as is the case for AnTat 1.3A and 1.6C. However, some bands are specific to the cloned expression site only, such as the 0.8 kb transcript recognized by probe 3 (see last section of Results for comments).

ESAG 7, 6, and 5 transcripts (arrows in Figure 5) could be cloned as full-size cDNAs, so that the 3’ splice sites and poly(A) addition sites could be mapped precisely (Figure 2). The positions of both sites differ between ESAGs 7 and 6, although they are located in highly conserved sequences flanking the genes. The cloned 3 kb
ESAG 5 cDNA is 1.8 kb larger than the ORF, and could represent a precursor of the main 2 kb mRNA (Figure 5, panel 6).

Transcription in Isolated Nuclei

Run-on transcription assays have been conducted to evaluate the nature and extent of the RNA polymerase activity in the expression site. The results are presented in Figure 6, and summarized in Figure 7. Four major observations could be made. First, in nuclei from bloodstream forms, RNA polymerase activity can be detected throughout the whole expression site from the 5’ SphI site to the VSG gene, except for a small region upstream of ESAG 3 (band 7 in Figure 6F). Transcription cannot be observed in the AL sequence extending upstream of ES (bands 5 and 2 in Figure 6A). Second, this activity is resistant to 1 mg/ml α-amanitin. Third, several regions also appear transcribed by a polymerase sensitive to α-amanitin (see asterisks in Figures 6D-6G). They comprise RIME, ESAG 5, ESAG 4, and, as already pointed out (Alexandre et al., 1988), ESAG 2. This transcription, which is superimposed on that resistant to α-amanitin, can also be detected in procyclic nuclei. It does not occur in the expression site, since the nucleotide sequence of the corresponding cDNAs differs from that of the expression site (S. Alexandre and E. Pays, unpublished data). Fourth, RNA processing may occur in isolated nuclei, since the RNA synthesized in vitro always hybridizes better with coding sequences than with intergenic regions (Figure 7A). In particular, the 10 kb region between ESAG 1 and the VSG gene (the 5’ barren region) shows a very low hybridization level.

Mapping the Promoter

Cloning of the expression site could not be extended beyond a 5’ SphI site (Figure 1). However, the 5’-terminal region of ES is virtually identical to the corresponding sequence in AL (data not shown), and the restriction maps of AL upstream of the 5’ SalI site and ES upstream of SphI are very similar (data from Southern blot hybridization; not shown), so that one may assume that upstream sequences in AL are homologous to those expected to exist in the expression site.
If true, this would imply that transcription starts in the vicinity of SphI, since the AL sequences upstream of that region do not hybridize to any transcript (bands 2 and 5 in Figure 6A).

The same conclusion has been reached by studying the effects of UV light on transcription. As shown in the kinetics of Figure 8, irradiation of trypanosomes by UV light (254 nm; 1 min with 1 J/sec/cm²) leads to a rapid destruction of endogenous RNA, since only 26% of total RNA, or 3% of poly(A)+ RNA, remains 15 min after irradiation. Transcription then resumes, but with an altered processing pattern (Figure 8). The newly synthesized transcripts are enriched in large RNA species, which hybridize to intergenic probes and can be interpreted as mRNA precursors. Thus it would seem that, in addition to inhibiting RNA elongation, UV irradiation interferes with RNA processing. Near the promoter, where the inhibition of elongation is minimal, a blocking of RNA processing would be expected to lead to an apparent stimulation of transcription. This prediction has been verified in the case of the rDNA transcription unit, where UV irradiation was found to increase hybridization of run-on transcripts with the promoter DNA region by about 10-fold, while total transcription was inhibited by 60%, due to blocking of RNA elongation (Figure 7B). In the expression site a strong increase in hybridization has been found with either run-on or steady-state RNAs and DNA fragments from the 5’ SphI region only (Figures 6-8). This would suggest that the transcription promoter of the VSG gene expression site is in the SphI region.

Evidence That ES Clones Are from the Actual Expression Site

Since all sequences cloned upstream of the VSG gene are repeated at least 5-fold in the genome (see the number of hybridizing fragments in Figure 9), stringent criteria must be met to ensure that clones originate from the unique expression site. Even the construction of clone libraries from the isolated 200 kb expressor chromosome did not prevent ambiguity, since two similar but different sequences were obtained. Three lines of evidence indicate that one of them (ES) is from the active expression site. First, the nucleotide sequences of three full-size ESAG cDNAs were found to be identical to their genomic counterparts in ES, but not in AL (Figure 2). These sequences amount to more than 6 kb, including 2 kb of noncoding regions. Moreover, we found that all restriction maps from 20 independent cDNA clones of ESAG 7 and ESAG 6 are characteristic of either gene from ES, with no evidence for transcripts from AL (data not shown). Second, among the RIME transcripts a 0.8 kb RNA (arrowhead in Figure 5), specific to RIME in ES (data not shown), is found only in the three variants where ES or a copy is thought to be used as the expression site.
Third, since the AnTat 1.3A expression site is duplicated over more than 40 kb in the AnTat 1.3B variant (Pays et al., 1983a), restriction fragments specific to this expression site should appear more abundant in AnTat 1.3B than in AnTat 1.3A. As shown in Figure 9, all fragments that exhibit this characteristic belong to ES.

Discussion

We present here a detailed characterization of an active T. brucei VSG expression site up to the transcription promoter region. Previous publications have dealt only with portions (Shea et al., 1987; Shah et al., 1987; Alexandre et al., 1988; Gibbs and Cross, 1988) or possible models (Kooter et al., 1987) of the unique active site.

The AnTat 1.3A expression site appears to contain a total of eight genes, six of which may encode membrane proteins, and only five of which (ESAGs 7, 6, 3, and 1 and the VSG gene) are transcribed exclusively in the expression site. Apart from ESAG 4, no ESAG exhibits homology to known sequences, so the function of these genes remains
Figure 2. Nucleotide Sequence from the VSG Gene Expression Site
The nucleotide sequence of an 18.6 kb region from ES, starting at the 5’ SphI site, is presented together with the extent of three full-size cDNAs (thick vertical lines at the right), from the 3’ splice site (filled triangles) to the poly(A) addition site (brackets). The ORFs thought to encode the ESAG 7-4 proteins are boxed (7-4), as is the RIME copy (R), whose putative duplicated target site is underlined. Also boxed are two noncoding sequences that are repeated elsewhere, in a different (H1) or in the same (H2) VSG expression site. The arrows flanking both ESAG 7 and ESAG 6 mark the boundaries of a 2 kb duplicated domain. The dotted lines are for sequences homologous to the 76 bp repeats usually found upstream of VSG genes (Liu et al., 1983; Shah et al., 1987). The GenBank accession number for this sequence is M20871.
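The ORFs boxed in Figure 2 were identified by the authors; the paper does not say which software or criteria were used. As a rough illustration of what an ORF scan involves, the following Python sketch looks for ATG-initiated reading frames in the three forward frames of a sequence. The function name, the `min_codons` threshold, and the toy sequence are assumptions made for this example only, and the scan ignores the reverse strand.

```python
# Illustrative ORF scan; an assumed, simplified procedure, not the authors' method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) tuples for ATG-initiated ORFs.

    Only the three forward frames are scanned. Coordinates are 0-based,
    end-exclusive, and the stop codon is included in the reported span.
    min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence (invented): one short ORF, ATG AAA TTT GGG TAA, in frame 2.
toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy, min_codons=3))
```

For a sequence of the size discussed here (18.6 kb), a much larger `min_codons` value would be needed to separate large ORFs from the short ones expected by chance.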
speculative. ESAG 4 seems to encode a transmembrane protein homologous to yeast adenylate cyclase in its C-terminal region. This homology might be significant, since it is located in the region responsible for enzyme activity in yeast adenylate cyclase (Kataoka et al., 1985). Moreover, the homology blocks of both proteins are found in similar relative positions with respect to the C terminus, and, in ESAG 4, they are situated next to a putative transmembrane domain. As expected for a region located at the inner surface of the plasma membrane, the homology is found at the C-terminal side of the transmembrane domain, while all potential glycosylation sites and the putative signal peptide are at the other side. If ESAG 4 does actually encode a trypanosome adenylate cyclase, its coordinate expression with the VSG gene would suggest some form of functional relationship, in accordance with the observation that the release of VSG stimulates adenylate cyclase activity (Voorheis and Martin, 1980). Whatever its nature, one should anticipate the existence of several ESAG 4 activities, only one of which
would be differentially expressed during the parasite life cycle. Indeed, at least one other member of the ESAG 4 family is transcribed outside the expression site, in procyclic as well as in bloodstream forms. Preliminary observations (S. Alexandre, unpublished data) indicate that transcribed ESAG 4-related genes are primarily conserved in their 3’-terminal domains.

The presence of a RIME copy in the active expression site, but not in the silent AL sequence, suggests that insertion of RIME may condition the activation of a potential expression site. The higher transcription rate of RIME in bloodstream over procyclic forms (Murphy et al., 1987a) would be in accordance with this hypothesis. However, in the variant AnTat 1.6C, where the whole AnTat 1.3A expression site is conserved but inactive, the RIME copy of ES is still present (E. Pays, unpublished data). Transposition of RIME is thus not the (only) factor responsible for the selective telomere activation underlying VSG gene expression.

Two long, noncoding intergenic sequences (H1 and H2
Figure 3. Translation of the ESAGs Shown in Figure 2
The ESAG 7 and 6 proteins are aligned to maximize homology; identical residues are boxed. The putative cleavage sites of the signal peptides (Von Heijne, 1983) are marked with downward arrows. Potential glycosylation sites (Marshall, 1972) are underlined.
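Potential N-glycosylation sites such as those underlined in Figure 3 are conventionally recognized as Asn-X-Ser/Thr sequons. The sketch below shows one way such a scan could be written in Python. It is an illustration only: the exclusion of proline at the X position is a commonly used refinement that is assumed here rather than taken from the paper, and the test peptide is invented.

```python
import re

# Asn-X-Ser/Thr sequon. Excluding Pro at position X is an assumption of this
# sketch (a common refinement), not a criterion stated in the paper.
# The lookahead lets overlapping sequons (e.g. N-N-T-S) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Invented peptide: N3 (N-A-T), N12 (N-N-T) and N13 (N-T-S) match; N8 (N-P-S) does not.
print(glycosylation_sites("MKNATLLNPSQNNTS"))  # [3, 12, 13]
```

A match only marks a potential site; whether a given sequon is glycosylated depends on its context in the folded protein, which a sequence scan cannot determine.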
in Figure 2) were also found in another location, in the same or in a different VSG expression site. In particular, the sequence between ESAG 5 and ESAG 4 contains a partial duplication of ESAG 3, located about 13 kb downstream. This is worth mentioning, since the second half of ESAG 3, the “companion” sequence in Pays et al. (1983b), appears particularly prone to translocation (Shea et al., 1987; Kooter et al., 1988). It would then seem that the VSG gene is not the only sequence to undergo frequent rearrangements in the expression site. In this regard, we note that the AnTat 1.3A expression site does not exhibit the large duplications/triplications postulated to exist in the gene 221 expression site (Kooter et al., 1987).

We have tried to map the promoter of the VSG gene transcription unit by plotting the relative sensitivity of transcription to UV irradiation, as done for another expression site by Johnson et al. (1987). While relatively consistent for the rDNA transcription unit taken as a reference, this approach gave complex results for the VSG expression site, even if transcription was conducted in the presence of 1 mg/ml α-amanitin so as to avoid the complications due to transcription of ESAG family members outside the expression site. In particular, we found that, in addition to inhibiting RNA elongation by insertion of pyrimidine dimers in the DNA (Sauerbier and Hercules, 1978), UV irradiation consistently alters the pattern of RNA processing and dramatically increases the hybridization of steady-state and run-on transcripts of some DNA regions. In the rDNA transcription unit these effects characterize the promoter region (this study; H. Coquelet, unpublished data). In the expression site they culminate near the 5’ SphI site, located about 45 kb upstream of the VSG gene. A likely interpretation is that, by preventing RNA processing at the same time as inhibiting RNA elongation, UV irradiation strongly enriches transcripts in precursor RNAs from the promoter region, since the transcription of this region is the least affected by the introduction of pyrimidine dimers. This enrichment can be observed in run-on transcripts because RNA processing would still occur in isolated nuclei, as suggested by the striking correlation between the level of hybridization of run-on transcripts and the relative abundance of steady-state RNAs.

Figure 5. Steady-State Transcripts of the Expression Site
Northern blots of polyadenylated RNA from bloodstream variants AnTat 1.3A, 6C, 1B, and 3B (pedigree in Pays et al., 1983a), as well as from procyclic trypanosomes (p) (see the order of lanes above panel 15), were hybridized with the 15 probes defined in the map below. The arrows point to transcripts cloned as full-size cDNAs, while the arrowhead designates a RIME-specific transcript from the expression site (see Results). The ESAG 7 and 6 mRNAs are very similar, and belong to the same band in panel 2. The exposure in panel 1 is 5 times longer than that in the other panels.
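The promoter-mapping approach discussed above, which plots the sensitivity of transcription to UV irradiation, rests on the idea that a pyrimidine dimer blocks elongation, so sequences far from the promoter are inactivated more readily than sequences close to it. The Python sketch below shows the simplest single-hit version of that idea. It is a textbook-style idealization assumed for illustration, not the analysis performed in this study, and all numerical values are hypothetical. As noted in the Discussion, the real data for the VSG expression site were more complex than this model, because UV irradiation also altered RNA processing.

```python
import math

# Single-hit target model (an assumption of this sketch): the chance that a
# polymerase reaches a point d kb from the promoter without meeting a dimer
# falls off exponentially with d and with the UV dose.


def surviving_fraction(distance_kb, dose_min, hits_per_kb_per_min):
    """Fraction of transcription remaining at a given distance from the promoter."""
    return math.exp(-hits_per_kb_per_min * dose_min * distance_kb)


def calibrate_hit_rate(fraction, distance_kb, dose_min):
    """Estimate the hit rate from a reference fragment at a known distance."""
    return -math.log(fraction) / (dose_min * distance_kb)


def estimate_distance(fraction, dose_min, hits_per_kb_per_min):
    """Invert the model: distance (kb) implied by an observed surviving fraction."""
    return -math.log(fraction) / (hits_per_kb_per_min * dose_min)


# Hypothetical calibration: a reference fragment 5.5 kb from its promoter
# retains 25% of its signal after 1 min of irradiation.
rate = calibrate_hit_rate(fraction=0.25, distance_kb=5.5, dose_min=1.0)

# Hypothetical test fragment retaining 50% of its signal at the same dose.
print(f"hit rate: {rate:.3f} per kb per min")
print(f"implied distance: {estimate_distance(0.5, 1.0, rate):.2f} kb")
```

In this model a fragment whose signal increases after irradiation, as reported near the SphI site, has no valid distance estimate, which is one way of seeing why the authors interpret that increase as an effect on processing and not on elongation.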
Figure 4. Homology of the ESAG 4 Protein to Adenylate Cyclase
(A) The sequences of yeast adenylate cyclase (YAC) (Kataoka et al., 1985) and ESAG 4 (ES4) are aligned starting from the C terminus. Boxes 1 and 2 are blocks of homology between the two sequences.
(B) The amino acid sequences of blocks 1 and 2 from ESAG 4 (ES4) and yeast adenylate cyclase (YAC) are aligned to maximize homology. Identical amino acids are boxed, while the dots indicate similar residues.
(C) Hydropathy profile of the ESAG 4 protein according to Kyte and Doolittle (1982) with a window of six amino acids. Hydrophobicity is indicated by positive values, hydrophilicity by negative values. The two major hydrophobic regions are labeled “a” and “b.” The regions of homology with yeast adenylate cyclase are indicated as lines 1 and 2. The arrow points to the putative signal peptide cleavage site, while arrowheads are for the possible glycosylation sites.
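The hydropathy profile in Figure 4C follows Kyte and Doolittle (1982) with a window of six amino acids. A minimal Python sketch of such a sliding-window calculation is given below, using the published Kyte-Doolittle hydropathy values. The function name and the toy peptide are assumptions for illustration, and details such as how the authors positioned or smoothed the windows are not stated in the legend.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=6):
    """Mean hydropathy of each window of `window` consecutive residues.

    Entry i of the result covers residues i .. i + window - 1 (0-based).
    Positive values indicate hydrophobic stretches, negative values
    hydrophilic ones. Non-standard residue codes raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Invented peptide: a charged stretch followed by a leucine-rich stretch.
profile = hydropathy_profile("DDDDDDLLLLLL", window=6)
print([round(value, 2) for value in profile])
```

With a window this short the profile is noisy, so candidate membrane-spanning regions such as “a” and “b” are read from sustained positive stretches, not from single peaks.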
Figure 6. Transcription of the Expression Site in Isolated Nuclei
RNA synthesized in nuclei was hybridized to Southern blots of sequences cloned from the expression site. In each panel the first lane shows the ethidium bromide staining of the restriction digest, while lanes o and a show the hybridization pattern with RNA transcribed without, and with, 1 mg/ml α-amanitin, respectively. In (A)-(C) the RNA probes were from nuclei of bloodstream forms, while in (D)-(G), b and p refer to RNA from bloodstream and procyclic nuclei, respectively. In (A)-(C) the three lanes labeled “UV” show, from left to right, the hybridization pattern after UV irradiation (254 nm; 1 J/sec/cm²) for 30 sec, 1 min, and 2 min. The interpretation of the restriction digests is shown in the maps below, where the open boxes refer to the ESAGs and RIME, and the filled boxes refer to the plasmid DNA in each clone. Dots indicate fragments that do not appear to be transcribed, while asterisks indicate sequences transcribed both in the expression site by a polymerase resistant to α-amanitin and elsewhere by a polymerase sensitive to the drug, such as in procyclics. The arrow points to the fragment thought to contain the transcription start site, and the arrowhead highlights the fragment where the apparent stimulation of transcription by UV irradiation is maximal. The star in map B refers to a SphI site that is only partially cleaved, generating fragment 2’.
Another interpretation is that both rDNA and VSG gene transcription promoters are stimulated by UV irradiation, like those of some gene families in human skin fibroblasts and keratinocytes (Angel et al., 1986; Kartasova and Van de Putte, 1988). However, the latter type of gene activation is also induced by tumor promoters and factors released by irradiated cells, and we did not observe these effects here (E. Pays and H. Coquelet, unpublished data). Moreover, a stimulation of the promoter activity by UV irradiation would not easily account for the qualitative alteration of the pattern of transcripts.

Finally, this study confirms our previous claim (Alexandre et al., 1988) that the AnTat 1.3A expression site is interrupted by a seemingly silent region just upstream of ESAG 3. This region is probably analogous to that described in another VSG gene expression site by Shea et al. (1987). Whether it represents a true promoter or only a “reinitiation” entry for RNA polymerase, as observed for

Figure 8. Effect of UV Irradiation on RNA Processing
Polyadenylated RNA was extracted from bloodstream forms incubated for different periods at 37°C under conditions defined by Johnson et al. (1987), without prior treatment (lane 1) or after UV irradiation (254 nm; 1 J/sec/cm²) for 1 min (lanes 2-5). Incubation periods were 45 min for the control (lane 1) and 15, 30, 50, and 75 min for UV-treated samples (lanes 2-5). The amount of RNA (12.4, 0.4, 2.6, 2.9, and 1.7 µg in lanes 1-5) is the total yield from each sample of 10⁹ trypanosomes. Lanes 6 and 7 contain the same amount of RNA (1 ng) from control and UV-treated cells, respectively, both incubated for 100 min. The Northern blots were hybridized with several probes from the VSG gene expression site, as defined in the map below (probes a-e). Two exposure times are shown for lanes 1-5; an additional, lower, exposure is shown before lane 1 in panel b to accentuate the qualitative differences between lanes 1 and 3-5. The filled and hatched boxes are for conserved sequences on both sides of ESAGs 7 and 6. A possible interpretation of transcripts in panels a-c, based on the relative extent of hybridization and on cDNA analysis, is shown below the map. Note that the quantitative difference in the overall amount of transcripts between control and UV-treated cells is very high near SphI, and decreases from 5’ to 3’ (lanes 6 and 7 in each panel).
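The yields listed in the Figure 8 legend can be related to the statement in Results that about 3% of poly(A)+ RNA remains 15 min after irradiation. The short Python sketch below expresses each UV-treated yield as a percentage of the control yield. It assumes that the legend's unit is micrograms and that the 45 min control is an adequate reference for every time point, which the legend does not state explicitly.

```python
# Poly(A)+ RNA yields from the Figure 8 legend, read here as micrograms
# per 10^9 trypanosomes (the unit is an interpretation of the legend).
CONTROL_YIELD_UG = 12.4  # lane 1: no UV, 45 min incubation
UV_YIELDS_UG = {15: 0.4, 30: 2.6, 50: 2.9, 75: 1.7}  # lanes 2-5: minutes -> yield


def percent_of_control(yield_ug, control_ug=CONTROL_YIELD_UG):
    """Yield of a UV-treated sample as a percentage of the control yield."""
    return 100.0 * yield_ug / control_ug


for minutes, yield_ug in UV_YIELDS_UG.items():
    print(f"{minutes} min after UV: {percent_of_control(yield_ug):.0f}% of control")
```

The 15 min value comes out at about 3% of the control, in line with the figure quoted in Results, and the later time points show the partial recovery of poly(A)+ RNA described there.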
Figure 7. Summary of Results from Run-On Transcription Assays
(A) Level of hybridization of run-on transcripts with different regions of the expression site. The RNA probe was synthesized by bloodstream nuclei in the presence of 1 mg/ml α-amanitin. Its level of hybridization with restriction fragments from the cloned sequences was determined by liquid scintillation counting of the relevant cutouts from Southern blots of gels with stoichiometric amounts of DNA, and is expressed as cpm/bp. The effects of α-amanitin and UV irradiation are detailed below the map: am/c refers to the ratio of hybridization between the RNA synthesized with 1 mg/ml α-amanitin and the control probe, and uv/c shows the same ratio, but between the “UV” RNA probe and the control. “UV” RNA was synthesized in nuclei from bloodstream forms treated by UV light for 1 min and then incubated for 90 min at 37°C, while the control was processed in the same way except for UV irradiation. Regions showing marked sensitivity to α-amanitin are underlined, whereas the region showing the highest stimulation by UV irradiation is boxed.
(B) Effect of UV irradiation on transcription. Bloodstream forms were irradiated for 30 sec or 1, 2, or 4 min, and then incubated for 90 min at 37°C. Run-on transcripts were hybridized with restriction digests of clones from the rDNA (left) or the expression site (right), and the ratio of hybridization relative to the unirradiated control probe is presented on a logarithmic scale. The restriction fragments hybridized in curves a-d are, in order: 600 bp RsaI, 1.5 kb HindII-BglII, 1.5 kb BglII-HindIII, and 1.4 kb EcoRI-BglII. Their mean distances from the putative transcription promoter, as determined from the rDNA restriction map of White et al. (1986), are, in order, 0.3, 0.6, 2.1, and 5.5 kb (H. Coquelet, unpublished data). In curves e and f the fragments are 1.1 kb SphI-AvaIII and 0.9 kb AvaIII. Their mean distances from the 5' SphI site of ES are 0.5 kb and 2.6 kb. Control values (100%) were 1220, 7910, 14180, 26890, 1140, and 750 cpm, for a-f, respectively. The dotted curve shows the effect of UV irradiation on total RNA synthesis, as measured by liquid scintillation counting of trichloroacetic acid precipitates; in this case the control represented 696,000 cpm.
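The quantities named in this legend (cpm/bp, and the am/c and uv/c ratios plotted on a logarithmic scale) amount to simple arithmetic, which the following minimal Python sketch illustrates. The control counts and fragment lengths are taken from the legend; the function names and the UV-treated counts are hypothetical and serve only as an illustration, and applying the cpm/bp normalization of panel A to the control counts of panel B is an assumption made for the example, not a procedure stated in the paper.

```python
import math

# Control values (cpm, defined as 100%) for curves a-f, as listed in the legend.
CONTROL_CPM = {"a": 1220, "b": 7910, "c": 14180, "d": 26890, "e": 1140, "f": 750}

# Fragment lengths (bp) for curves a-f, as listed in the legend.
FRAGMENT_BP = {"a": 600, "b": 1500, "c": 1500, "d": 1400, "e": 1100, "f": 900}


def cpm_per_bp(cpm: float, fragment_bp: int) -> float:
    """Normalize hybridized counts to fragment length (cpm/bp)."""
    if fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return cpm / fragment_bp


def ratio_to_control(treated_cpm: float, control_cpm: float) -> float:
    """Ratio of treated to control hybridization (the am/c or uv/c value)."""
    if control_cpm <= 0:
        raise ValueError("control counts must be positive")
    return treated_cpm / control_cpm


def log10_percent_of_control(ratio: float) -> float:
    """Position of a ratio on a log10 axis where the control is 100%."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.log10(100.0 * ratio)


# HYPOTHETICAL counts for a UV-treated sample; these are not data from the paper.
hypothetical_uv_cpm = {"a": 2440.0, "f": 375.0}

for curve, treated in hypothetical_uv_cpm.items():
    uv_c = ratio_to_control(treated, CONTROL_CPM[curve])
    density = cpm_per_bp(CONTROL_CPM[curve], FRAGMENT_BP[curve])
    print(f"curve {curve}: control {density:.2f} cpm/bp, uv/c {uv_c:.2f}, "
          f"log10(% of control) {log10_percent_of_control(uv_c):.2f}")
```

Because both values in a ratio come from the same restriction fragment, fragment length cancels out of am/c and uv/c; the cpm/bp normalization matters only when fragments of different sizes are compared with one another.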
rDNA transcription (Mitchelson and Moss, 1987), is a matter of speculation. In support of the latter hypothesis, no apparent stimulation of transcription by UV irradiation, as found for the rDNA promoter, could be detected in this region.
Figure 9. Evidence that the ES Clones Are from the Expression Site
Since the AnTat 1.3A expression site is duplicated in the variant AnTat 1.3B (Pays et al., 1983a), restriction fragments specific to the expression site should appear more abundant in AnTat 1.3B DNA than in AnTat 1.3A. Each panel shows a restriction digest of 1 µg of AnTat 1.3A (left) and AnTat 1.3B genomic DNA (right). The Southern blots were hybridized with different probes from ES, as indicated by thick bars under the map. The fragments designated by arrowheads appear more abundant in AnTat 1.3B than in 1.3A. All belong to the ES map, as indicated by the brackets under the map.
Experimental Procedures
The trypanosome clones AnTat 1.3A and 1.3B have been characterized elsewhere (Pays et al., 1983a; Alexandre et al., 1988). The procedures for DNA and RNA isolation, Southern and Northern blot hybridization, and DNA cloning were as described (Pays et al., 1980; Alexandre et al., 1988). cDNA libraries were constructed in λgt10 according to Gubler and Hoffman (1983) using the Amersham cDNA synthesis and cloning kits. The sequences of DNA fragments, subcloned in bacteriophage M13 derivatives, were determined on both strands by the method of Sanger et al. (1980) using a modified T7 DNA polymerase (Sequenase, United States Biochemical Corp.). Run-on transcription assays were conducted as described by Murphy et al. (1987b). UV irradiation (254 nm; 1 J/sec/cm²) was performed under the conditions defined by Johnson et al. (1987). Computer analysis of DNA was performed using the DNASIS/PROSIS programs of LKB/Hitachi. The sequences have been compared with the following data base releases: GenBank R52.0, EMBL R13.0, and NBRF-PIR R14.0.
Acknowledgments
We acknowledge Dr. D. Jefferies for helpful comments on the manuscript, S. Van Assel for computer analysis of the DNA, and G. Hilgers for help in the UV irradiation experiments. This work was supported by the Belgian Fonds de la Recherche Scientifique Medicale and by a research contract between the University of Brussels and Solvay & Cie. (Brussels). It was also funded by the African Trypanosomiasis Component of the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases and by the Agreement for Collaborative Research on African Trypanosomiasis between ILRAD (Nairobi) and Belgian Research Centres. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC Section 1734 solely to indicate this fact.
Received December 30, 1988; revised March 13, 1989.
References
Alexandre, S., Guyaux, M., Murphy, N. B., Coquelet, H., Pays, A., Steinert, M., and Pays, E. (1988). Putative genes of a variant-specific antigen gene transcription unit in Trypanosoma brucei. Mol. Cell. Biol. 8, 2367-2378.
Angel, P., Pöting, A., Mallick, U., Rahmsdorf, H. J., Schorpp, M., and Herrlich, P. (1986). Induction of metallothionein and other mRNA species by carcinogens and tumor promoters in primary human skin fibroblasts. Mol. Cell. Biol. 6, 1760-1766.
Baker, S. M., and Platt, T. (1986). Pol I transcription: which comes first, the end or the beginning? Cell 47, 839-840.
Bernards, A., Kooter, J. M., and Borst, P. (1985). Structure and transcription of a telomeric surface antigen gene of Trypanosoma brucei. Mol. Cell. Biol. 5, 545-553.
Borst, P. (1986). Discontinuous transcription and antigenic variation in trypanosomes. Annu. Rev. Biochem. 55, 701-732.
Cully, D. F., Ip, H. S., and Cross, G. A. M. (1985). Coordinate transcription of variant surface glycoprotein genes and an expression site associated gene family in Trypanosoma brucei. Cell 42, 173-182.
Gibbs, C. P., and Cross, G. A. M. (1988). Cloning and transcriptional analysis of a variant surface glycoprotein gene expression site in Trypanosoma brucei. Mol. Biochem. Parasitol. 28, 197-206.
Gubler, U., and Hoffman, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
Hasan, G., Turner, M. J., and Cordingley, J. S. (1984). Complete nucleotide sequence of an unusual mobile element from Trypanosoma brucei. Cell 37, 333-341.
Johnson, P. J., Kooter, J. M., and Borst, P. (1987). Inactivation of transcription by UV irradiation of T. brucei provides evidence for a multicistronic transcription unit including a VSG gene. Cell 51, 273-281.
Kartasova, T., and Van de Putte, P. (1988). Isolation, characterization and UV-stimulated expression of two families of genes encoding polypeptides of related structure in human epidermal keratinocytes. Mol. Cell. Biol. 8, 2195-2203.
Kataoka, T., Broek, D., and Wigler, M. (1985). DNA sequence and characterization of the S. cerevisiae gene encoding adenylate cyclase. Cell 43, 493-505.
Kooter, J. M., and Borst, P. (1984). α-Amanitin-insensitive transcription of variant surface glycoprotein genes provides further evidence for discontinuous transcription in trypanosomes. Nucl. Acids Res. 12, 9457-9472.
Kooter, J. M., van der Spek, H. J., Wagter, R., d'Oliveira, C. E., van der Hoeven, F., Johnson, P. J., and Borst, P. (1987). The anatomy and transcription of a telomeric expression site for variant-specific surface antigens in T. brucei. Cell 51, 261-272.
Kooter, J. M., Winter, A. J., d'Oliveira, C., Wagter, R., and Borst, P. (1988). 5' Boundaries of telomere conversions map at different positions in a VSG expression site of T. brucei and may disrupt an expression site associated gene. Gene 69, 1-11.
Kyte, J., and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
Liu, A. Y. C., Van der Ploeg, L. H. T., Rijsewijk, F. A. M., and Borst, P. (1983). The transposition unit of variant surface glycoprotein gene 118 of T. brucei. J. Mol. Biol. 167, 57-75.
Marshall, R. D. (1972). Glycoproteins. Annu. Rev. Biochem. 41, 673-702.
Mitchelson, K., and Moss, T. (1987). The enhancement of ribosomal transcription by the recycling of RNA polymerase I. Nucl. Acids Res. 15, 9577-9596.
Murphy, N. B., Pays, A., Tebabi, P., Coquelet, H., Guyaux, M., Steinert, M., and Pays, E. (1987a). Trypanosoma brucei repeated element with unusual structural and transcriptional properties. J. Mol. Biol. 195, 855-871.
Murphy, N. B., Guyaux, M., Pays, E., and Steinert, M. (1987b). Analysis of VSG expression site sequences in T. brucei. In Molecular Strategies of Parasitic Invasion, N. Agabian, H. Goodman, and N. Nogueira, eds. (New York: Alan R. Liss, Inc.), pp. 449-469.
Pays, E., and Steinert, M. (1988). Control of antigen gene expression in African trypanosomes. Annu. Rev. Genet. 22, 107-126.
Pays, E., Delronche, M., Lheureux, M., Vervoort, T., Bloch, J., Gannon, F., and Steinert, M. (1980). Cloning and characterization of DNA sequences complementary to messenger ribonucleic acids coding for the synthesis of two surface antigens of Trypanosoma brucei. Nucl. Acids Res. 8, 5965-5981.
Pays, E., Delauw, M.-F., Van Assel, S., Laurent, M., Vervoort, T., Van Meirvenne, N., and Steinert, M. (1983a). Modifications of a Trypanosoma b. brucei antigen gene repertoire by different DNA recombinational mechanisms. Cell 35, 721-731.
Pays, E., Van Assel, S., Laurent, M., Dero, B., Michiels, F., Kronenberger, P., Matthyssens, G., Van Meirvenne, N., Le Ray, D., and Steinert, M. (1983b). At least two transposed sequences are associated in the expression site of a surface antigen gene in different trypanosome clones. Cell 34, 359-369.
Pays, E., Guyaux, M., Aerts, D., Van Meirvenne, N., and Steinert, M. (1985). Telomeric reciprocal recombination as a mechanism for antigenic variation in trypanosomes. Nature 316, 562-564.
Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H., and Roe, B. (1980). Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143, 161-178.
Sauerbier, W., and Hercules, K. (1978). Gene and transcription unit mapping by radiation effects. Annu. Rev. Genet. 12, 329-363.
Shah, J. S., Young, J. R., Kimmel, B. E., Iams, K. P., and Williams, R. O. (1987). The 5' flanking sequence of a Trypanosoma brucei variable surface glycoprotein gene. Mol. Biochem. Parasitol. 24, 163-174.
Shea, C., Lee, M. G.-S., and Van der Ploeg, L. H. T. (1987). VSG gene 118 is transcribed from a cotransposed pol I-like promoter. Cell 50, 603-612.
Von Heijne, G. (1983). Patterns of amino acids near signal sequence cleavage sites. Eur. J. Biochem. 133, 17-21.
Voorheis, H. P., and Martin, B. R. (1980). “Swell dialysis” demonstrates that adenylate cyclase in Trypanosoma brucei is regulated by calcium ions. Eur. J. Biochem. 113, 223-227.
White, T. C., Rudenko, G., and Borst, P. (1986). Three small RNAs within the 10 kb trypanosome rRNA transcription unit are analogous to domain VII of other eukaryotic 28S rRNAs. Nucl. Acids Res. 14, 9471-9489.