Genomic repetitive DNA elements of Trypanosoma cruzi

Reviews

J.M. Requena, M.C. López and C. Alonso

Repetitive DNA sequences constitute a substantial proportion of the genomes of eukaryotic organisms. Despite their abundance, their functional implications remain unknown. The classification of such sequences as 'selfish DNA' refers to the fact that these sequences are maintained solely by their ability to replicate within the genome. An alternative explanation for their maintenance is that eukaryotes tend to preserve a vast abundance of redundant genetic material to allow the interactions of individual parts of the genome: the genes and/or their regulatory units.

According to their genomic distribution, the repetitive sequences have been divided into two main groups: (1) clustered repeats; and (2) dispersed repeats. It is most probable that the different genomic organization of these repeats derives from the amplification processes that generate each of them. While tandem repeats may originate either by recombination or by replication slippage mechanisms, the existence of the majority of the interspersed repeated DNA sequences might be explained by retroposition and transposition mechanisms.

Three major classes of tandemly repeated DNA sequences, or satellite DNA, have been described. While the microsatellite sequences are arrayed as 2-5bp repeats, with a mean array size of 100bp, the minisatellite sequences are constituted by arrays of longer (about 15bp) repeats, having hypervariable sizes5 in the range of 0.5-30kb. The third class, the satellite sequences, comprises repeat units 5-200bp long, typically organized as megabase clusters located in the heterochromatic regions of chromosomes.

Short (SINE) and long (LINE) interspersed nucleotide elements also represent a high proportion of the genome of mammals and other higher eukaryotes6. The SINE elements are typically less than 500bp and eminent examples are the well-known Alu sequences in humans and the evolutionarily related B1 elements in the mouse and other rodents. The LINE elements [also called poly(A)-type and non-LTR (long terminal repeat) elements] harbour sequences of several kilobases and may show the characteristics of retroposons. It has been suggested that the enzymatic machinery of the LINE elements is responsible for the integration of the SINE elements and of processed pseudogenes.

Several classes of repeated sequences

The analysis of the kinetics of reassociation of T. cruzi DNA has shown that approximately 9-14% of the total parasite DNA comprises highly repetitive sequences. The first repeated nuclear DNA sequence described in the T. cruzi genome was a satellite DNA sequence, isolated as a fast-sedimenting DNA fraction after sucrose gradient centrifugation of a BspI digest of cellular DNA10. The repeat of the T. cruzi satellite DNA sequence, with a copy number of approximately 120 000 per genome, is about 195bp long and lacks detectable homology with the 177bp satellite repeat of T. brucei. Although it was postulated that this satellite DNA was located in minichromosomes, recent experiments using pulsed-field gel electrophoresis (PFGE) have demonstrated that minichromosomes do not exist in T. cruzi and that the satellite DNA sequences are distributed in several, but not all, of the T. cruzi chromosomes.

In addition to the tandemly repeated satellite DNA, the T. cruzi genome contains several families of interspersed repetitive DNA sequences showing features of the eukaryotic SINE sequences. The first such sequence, isolated from an EcoRI T. cruzi DNA library on the basis of the reassociation kinetics of repetitive DNA sequences, was the E13 element. The DNA sequence that constitutes this element represents approximately 7% of the total nuclear DNA of the parasite and is distributed over most of the chromosomes. This element is species specific and no cross-hybridization has been observed with the DNA from other related trypanosomatids, including the non-pathogenic T. rangeli. Sequence comparisons of the E13 element with the T. cruzi genomic nucleotide sequences reported in the literature have shown homologies with certain regions of the E13 element, which is in agreement with its association with dispersed loci of the T. cruzi genome. For example, sequence homology has been observed between the E13 element and the 24S rDNA pseudogene15, with a sequence identity of 77% in the commonly shared region between positions 1406 and 1527 of the pseudogene. Also, a region of the E13 element shares sequence identity with a region located upstream of the acceptor splice site, between positions 162 and 246 of the DNA fragment containing the gene for the insect stage-specific antigen gp72.

Recent reports from two independent groups have located E13-related sequences within the intergenic region of certain gene clusters. Our group has described the presence of an E13-related sequence in the intergenic region of tandemly repeated T. cruzi H2A genes. Interestingly, the E13-related element is present in only one of the two tandem arrays of these genes. Similarly, Vazquez et al.18 have also found a 428bp long repeated element in the intergenic region of certain, but not all, of the T. cruzi P2β genes. The element, named SIRE, shows 95% sequence identity with the first 122bp of the E13 sequence. The repeated element found in the H2A genes is highly conserved, in both length and nucleotide sequence, with the SIRE repeat.

Two other highly repeated interspersed genomic DNA sequences (called E12 and E22) have also been described in the T. cruzi genome. The copy number of the E12 and E22 elements is 5600 and 7200, respectively, per genome (Table 1). The sequence analysis of the E12 element shows that it is formed by three subregions with sequence homology to other T. cruzi genomic sequences. From positions 1 to 322 (subregion E12A), the E12 sequence has 81% sequence identity with positions 4444-4712 of the genomic clone containing the T. cruzi neuraminidase-encoding gene. The E12B subregion, spanning positions 1020-1123, has 90% sequence identity with the first 104 nucleotides of the E13 sequence. The subregion located between E12A and E12B possesses a TAA-rich region that represents a microsatellite DNA also found in the intergenic region of the hsp70 gene tandem array22. The third repeat element (E22) has been isolated and sequenced, and appears to be associated with previously described T. cruzi genes (J.M. Requena and C. Alonso, unpublished). The significantly different hybridization pattern of the T. cruzi chromosomal bands, separated by PFGE, probed with either E13, E12 or E22 repetitive elements (Box 1) indicates that the chromosomal distribution of each one of these sequences along the genome varies substantially.
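The identity values quoted above (for example, 81% over one region, 90% over another) compare two sequences across a shared aligned region. The text does not state how the percentages were computed, so the following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with '-' as the gap symbol, and a gap opposite a base counts as a mismatch. The function name and the toy sequences are hypothetical and are not taken from the E12, E13 or SIRE elements.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumptions (not from the source text): both strings come from the
    same alignment, '-' marks a gap, and a gap opposite a base is scored
    as a mismatch. Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy example: 10 columns, 8 identical, one substitution, one gap.
print(percent_identity("ACGTTAGC-A", "ACGATAGCTA"))  # 80.0
```

Other conventions (for instance, excluding gapped columns from the denominator) would give slightly different values, which is one reason identity figures are usually quoted together with the coordinates of the compared region, as in the text.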
Also, as shown in Box 1, these repetitive interspersed elements are highly polymorphic among T. cruzi strains. Another element, named RLE (retroposon-like element), has been found in the intergenic region separating the two tandemly linked genes coding for the glycosomal glyceraldehyde-3-phosphate dehydrogenase of T. cruzi23. The RLE sequence is 317bp long, and has the properties of the SINE elements. Truncated forms of this repeated element are also found in the flanking regions of the gene repeat. Southern blot analysis using the RLE as a probe revealed a wide distribution of the RLE-related sequences along the T. cruzi genome. We found a limited sequence homology between the RLE and E13 elements in a common T-rich region.

Three other types of repetitive sequences, designated spacer repetitive elements (SRE), have been identified within the T. cruzi ribosomal intergenic spacers. The SRE elements are formed by relatively short repeats (43-145bp) showing variability, including nucleotide changes, insertions and deletions. The SRE-1 element has a short A-rich region at the end of the repeat, a frequent feature of the retroposon-like elements. Since the copy number of the SRE elements is higher than the copy number of rRNA genes, it is likely that SRE elements are also present outside of the rDNA tandem array.

Another specific nuclear repeated sequence dispersed along the T. cruzi genome has been described by Wincker et al.25. This element is 10-12kb long and has a copy number of 220. This clone hybridizes with a specific transcript that encodes a 3229-residue protein, named DGF-1 (Ref. 26). Although the size of this repeated sequence would indicate that it must be classified as a LINE sequence, the deduced protein sequence suggests that the repeat has the characteristics of a gene coding for a cell surface protein. Searching for E13-containing genomic DNA sequences, we have identified that at least one of the DGF genes is associated with the E13 repeated element (J.M. Requena and C. Alonso, unpublished).

Repetitive sequences and retrotransposition

The evolutionary origin of the LINE and SINE repeat elements is uncertain, but it has been suggested that they may correspond to partial or complete DNA copies of cellular RNA transcripts. Moreover, the LINE and SINE elements have been considered as transposable elements. The LINEs, also known as non-LTR retrotransposons, have been identified in a wide variety of eukaryotic organisms and may constitute as much as 5% of the genome.
Most non-LTR retrotransposons display open reading frames encoding enzymes that could be involved in their own transposition. In the T. cruzi genome, a site-specific non-LTR retrotransposon, named CZAR, has been described. Like other site-specific non-LTR retrotransposons of trypanosomatid protozoa, the CZAR retrotransposons are located in the mini-exon genes of T. cruzi. Recently, our group has identified a new T. cruzi non-LTR retrotransposon, named L1Tc, that may explain the postulated link between the retrotransposons and the SINE sequences. This retrotransposon was identified as a consequence of the search for T. cruzi transcripts containing the E12 repetitive sequence. The isolated 5kb L1Tc cDNA contains, at its 3' end, a sequence that is 95% homologous to subregion E12A. We also found that the E12 element is associated with several poly(A)+ transcripts. The copy number of the L1Tc non-LTR retrotransposon is 2000, and it is found dispersed in several chromosomes along the T. cruzi genome. As the E12A element has been found both associated with non-LTR retrotransposons and in non-associated forms, we have postulated that the non-associated forms of E12A may have their origin in the L1Tc LINE element.

The relationship between the SINE and LINE elements has been reported in the related protozoan T. brucei. Hasan et al. isolated a 511bp repetitive DNA sequence, named RIME (ribosomal mobile element), within a 3kb rDNA fragment from the T. brucei genome, which shows sequence features similar to transposons. It was afterwards found that particular RIME sequences are also associated with a larger dispersed repetitive element described simultaneously by Kimmel et al. and Murphy et al. as ingi and TRS-1, respectively. This repetitive 5.2kb element has a copy number of about 46. There are significant homologies of the encoded primary sequences of this element with reverse transcriptases. Sequences with homology to reverse transcriptases have also been found in the L1Tc from T. cruzi. The 5' and 3' ends of the repetitive element from T. brucei consist of two halves of RIME. However, in the L1Tc element from T. cruzi, the RIME-like sequences are only present in the 5' region.

Functional implications of the repetitive DNA sequences

Although the precise functions of repetitive sequences in eukaryotic genomes are obscure, the recent description that interspersed repetitive DNA sequences are also present in prokaryotic genomes (for review, see Ref. 36) suggests a universal role for this class of DNA sequences. Postulated functions for the prokaryotic repetitive sequences have included roles in gene regulation by differential translation of the genetic units within polycistronic operons and in retroregulation by stabilization of active mRNAs.

A common feature of the genomic organization of the T. cruzi genes is that they are usually repeated and arranged as head-to-tail tandem arrays. Furthermore, gene conversion has been invoked as one of the ways by which sequence identity of tandemly repeated genes could be maintained in trypanosomes. The finding that T. cruzi SINE-like elements are located in the intergenic regions of tandem gene arrays (see above) and also bordering the tandems favours the idea that these elements may be implicated in the generation and maintenance of this peculiar gene organization.

Parasitology Today, vol. 12, no. 7, 1996

The repetitive sequences, on the other hand, may also be implicated in the regulation of gene expression. In fact, the RLE element located in the glyceraldehyde-3-phosphate dehydrogenase gene tandem and the SIRE element located in the ribosomal P2β gene tandem18 provide the 3' acceptor site for trans-splicing. However, since there are also transcriptionally active P2β loci that lack the SIRE element, it has been suggested that the different distribution of the SIRE elements along the ribosomal P2β genes can influence either the level of gene expression or the stability of the transcripts18. A similar situation has been described for the T. cruzi histone H2A genes. These genes are clustered in two loci, each one containing several genes organized as tandem arrays. Both clusters are closely similar in the coding regions but show differences in the intergenic regions, due mainly to the presence, in one of the clusters, of a repetitive sequence related to the SIRE element. The functional implications of this repeated element in histone H2A gene expression are currently under study.

Another characteristic feature of trypanosome gene organization is the apparent lack of promoter sequences in the neighbourhood of the genes. Instead, it is believed that the genes are transcribed as polycistronic precursors from a distant transcription start point of unknown structural nature. On the other hand, some of the eukaryotic interspersed repetitive DNA elements have transcriptional promoter activity and their expression is developmentally regulated. It was suggested that the RIME sequences present in the active expression site of VSG genes, but not in the silent copies, may condition the activation of potential expression sites. Thus, it remains a possibility that the RIME-like sequences present in the T. cruzi non-LTR retrotransposon L1Tc, which have homology to the T. brucei RIME element, function as an internal promoter. If this were the case, the T. cruzi 'hidden' promoters could be located within the highly abundant transposon elements of the parasite.

It was interesting to observe, however, that the repetitive element E12A present in the 3' end of the L1Tc retrotransposon functions as the poly(A) addition site. The function of the E12A element may not be restricted to signalling for the poly(A) addition site of the L1Tc retrotransposon, because the element has also been found associated with other genomic locations. We propose that these elements may influence the stability of RNAs, as has been reported for the LINE element present in the 3' UTR (untranslated region) of the goat casein E mRNA.

Repetitive DNA sequences as molecular tools

The functional or biological significance of most repetitive DNA sequences remains obscure and speculative. Nevertheless, from a practical point of view, certain subsets of repetitive DNA elements can be employed as versatile tools for the diagnosis of pathogens. Most of the highly repetitive sequences isolated to date from the T. cruzi genome (summarized in Table 1) have been found to be species specific, and the probes derived from these sequences may be used as excellent tools for diagnostic purposes. Also, the rapid evolutionary change in the interspersion pattern of the repetitive sequences makes these sequences useful for strain-classification studies.

Future potential applications of the T. cruzi repetitive sequences also include their utility in large-scale genomic mapping projects and linkage analysis of genes. Large genomic fragments can be obtained after their cloning as yeast artificial chromosomes (YACs). From hybridization patterns of the repetitive sequences (SINEs and LINEs), the YACs can be feasibly ordered as collections of overlapping genomic fragments called 'contigs'. In fact, repetitive sequences have been used successfully in the generation of the physical map of the human genome and are currently being employed to construct the physical map of the T. cruzi nuclear genome. Finally, taking advantage of the fact that repetitive sequences have been described in the intergenic sequences of some of the T. cruzi tandemly arranged gene clusters (see above), oligonucleotides derived from these repetitive elements could be used in PCR assays for the isolation of genes associated with these repetitive elements and the definition of sequence-tagged sites (STSs).
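The idea of assembling YAC clones into contigs from repeat-probe hybridization patterns can be illustrated with a small sketch. It is not the procedure used in any mapping project mentioned above. It assumes, purely for illustration, that each clone is summarized by a 'fingerprint' (the set of restriction-fragment sizes, in bp, that hybridize with a repeat probe), that two clones are taken to overlap when they share at least `min_shared` bands, and that a contig is a connected group of overlapping clones. The clone names, band sizes and threshold are all hypothetical.

```python
from itertools import combinations


def build_contigs(fingerprints: dict[str, set[int]],
                  min_shared: int = 2) -> list[list[str]]:
    """Group clones into contigs from shared hybridizing bands.

    Assumption (illustrative only): two clones overlap if their
    fingerprints share at least `min_shared` band sizes. Contigs are
    the connected components of the resulting overlap graph.
    """
    # Build the overlap graph as an adjacency map.
    neighbours: dict[str, set[str]] = {name: set() for name in fingerprints}
    for a, b in combinations(fingerprints, 2):
        if len(fingerprints[a] & fingerprints[b]) >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a depth-first search.
    contigs: list[list[str]] = []
    seen: set[str] = set()
    for start in sorted(fingerprints):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            clone = stack.pop()
            component.append(clone)
            for nxt in neighbours[clone]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        contigs.append(sorted(component))
    return contigs


# Hypothetical fingerprints: band sizes (bp) detected with a repeat probe.
yacs = {
    "YAC1": {1200, 3400, 5000},
    "YAC2": {3400, 5000, 7100},
    "YAC3": {5000, 7100, 9300},
    "YAC4": {11000, 12500},
}
print(build_contigs(yacs))  # [['YAC1', 'YAC2', 'YAC3'], ['YAC4']]
```

In practice, band sizes are measured with error, so exact set intersection would have to be replaced by matching within a tolerance, and the order of clones inside a contig would need a further step. The sketch only shows why dispersed, polymorphic repeats are informative for detecting overlaps.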

Focus

Host-like Sequences in the Schistosome Genome

K.A. Clough, A.C. Drew and P.J. Brindley

Controversy has arisen over the past few years concerning reports of the presence of host-like sequences in the genomes of Schistosoma mansoni and S. japonicum. The debate centres on whether these host-like sequences have actually integrated into the genome of the parasite and been propagated, or whether they represent spurious host DNA sequences arising as contaminants during the laboratory isolation of schistosome nucleic acids. Stimulus for the debate springs from a series of reports by Iwamura and co-workers describing host-like DNA sequences in the schistosome genome. In brief, these workers detected an array of DNA sequences homologous to the mouse intracisternal A particle and endogenous type C retrovirus, mouse type 1 and type 2 Alu sequences, the mo-2 sequence, the env gene of the mouse ecotropic and xenotropic retroviruses and, most recently, murine major histocompatibility complex (MHC) class I sequences. The sequences have been detected in various life cycle stages, including adults, eggs and miracidia of both S. mansoni and S. japonicum, but have not been reported from cercariae.

Simpson and Pena have highlighted the importance of these observations; the idea of genetic exchange between parasite and host is particularly interesting in schistosomiasis because of the role of acquisition of host antigens by schistosomes in avoiding immune detection. Schistosomes acquire a mask of antigens from their hosts, including MHC products, erythrocyte antigens and immunoglobulins. Simpson and Pena concluded their article in Parasitology Today, sceptical about the veracity of these findings, with: 'we should assume that hidden uncontrolled experimental flaws rather than DNA transfer is responsible for the observed results'. While investigating the related issue of genomic rearrangements also reported by Iwamura and colleagues, we encountered several phenomena (outlined below) that we consider can contribute to the resolution of this controversy.

Host-like sequences from schistosome DNA

First, in an experiment designed to locate telomere-like sequences in schistosomes, we designed a 3888-fold degenerate 12-nucleotide primer named tel (sequence given in Fig. 1) based on telomere motifs from other species. Because of its short length and marked degeneracy, we predicted that tel would behave like a random amplified polymorphic DNA (RAPD) primer in the polymerase chain reaction (PCR), priming in both the 5' and 3' directions along the schistosome genome (2.7 × 10^8 bp per haploid genome). However, because of its longer length compared to a conventional RAPD primer (usually 10 nucleotides), tel was predicted to provide specificity and enrich for telomere-like repeat sequences. The resulting products visualized in ethidium bromide-stained gels appeared as smears of DNA ranging from 200bp to 1.5kb in size for schistosome adults, cercariae and eggs, and up to 4kb for mouse and human genomic DNA (results not shown). Smears were expected since tel could anneal at any point along the telomere on any of the chromosomes and amplify products varying in size.
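The degeneracy of a primer is the number of distinct sequences represented by its ambiguity codes, that is, the product of the number of bases allowed at each position. The short sketch below computes it from the standard IUPAC nucleotide codes. The tel sequence itself is given only in Fig. 1 and is not reproduced here, so the 12-mer used in the example is a hypothetical primer, chosen solely because it also has a degeneracy of 3888 (2^4 × 3^5); it should not be read as the authors' primer.

```python
# Number of bases represented by each standard IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    total = 1
    for code in primer.upper():
        if code not in IUPAC_DEGENERACY:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
        total *= IUPAC_DEGENERACY[code]
    return total


# Hypothetical 12-mer (NOT the tel primer): four 2-fold positions,
# five 3-fold positions and three fixed bases.
example = "RYHDBVSWHTAG"
print(len(example), primer_degeneracy(example))  # 12 3888
```

Other combinations of ambiguity codes give the same product (for instance, two 4-fold positions in place of four 2-fold ones), so the degeneracy alone does not determine the primer sequence.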
In addition, internal telomere-like and non-telomere sequences might act as annealing sites for tel. As a control, amplification from the plasmid

to