TIBS 16 - SEPTEMBER 1991
REVIEWS

A new class of DNA-binding proteins with a helix-loop-helix (HLH) structure has recently been described. Many of these transcriptional regulators are known to play a central role in cell-specification and differentiation processes. Four members of the HLH family are now implicated in the development of human lymphoid malignancies as a result of aberrant expression following chromosomal translocation events. This review focuses on two of these family members: SCL and LYL-1.

The helix-loop-helix (HLH) domain is a new type of DNA-binding and dimerization motif that characterizes a rapidly expanding family of proteins found in species ranging from mammals to plants. This motif was initially identified as a region of homology between the c-myc oncogene, the muscle determination gene MyoD (Ref. 1) and the Drosophila achaete-scute complex (Ref. 2). More than 20 HLH genes have now been reported (Fig. 1; Refs 3,4) and almost every member of this class has been implicated in transcriptional regulation, oncogenesis and/or cell type determination and differentiation. In Drosophila, genetic evidence indicates that genes of the achaete-scute complex are involved in neural differentiation (Ref. 2); twist has a role in germ-layer development (Ref. 5); daughterless is involved in both neural and sex determination (Ref. 6); and hairy acts in both segmentation and bristle patterning (Ref. 7). In vertebrates, MyoD (Ref. 1) and several relatives (myogenin, myf-5, MRF-4) play a critical role in myogenesis. The HLH motif is also found in enhancer-binding proteins, namely E12 and E47 (products of the E2A gene; Ref. 8), ITF-1 and ITF-2 (Ref. 9) and TFE3 (Ref. 10), which have all been implicated in regulating the transcription of immunoglobulin genes in B-lymphoid cells.

Karyotypic alterations such as chromosomal translocations are a long-recognized feature of neoplasia, particularly of blood cell cancers. In chronic myeloid leukemia, for example, the Philadelphia chromosome translocation between chromosomes 9 and 22 [t(9;22)] results in a bcr-abl fusion gene. This gene generates a hybrid protein that lacks the amino terminus of the c-Abl protein, leading to enhanced tyrosine kinase activity. Recent experiments using a murine model system have demonstrated that this molecular abnormality plays a pivotal role in the development of the unique phenotype of this leukemia. Another example is follicular B-cell lymphoma, which is characterized by a translocation [t(14;18)] that links the bcl-2 gene to an active immunoglobulin gene locus, resulting in overexpression of bcl-2. The c-myc gene can also be overexpressed as a result of chromosomal translocations occurring in B- and T-lymphocyte tumors. Thus, molecular analysis of translocation breakpoints has allowed the identification of genes that are likely to be critical to normal cell development and whose altered expression may lead to tumor cell proliferation.

Remarkably, four of the HLH genes are now known to be involved in chromosomal translocations associated with specific malignancies: the c-myc, SCL, LYL-1 and E2A genes. The most widely studied translocations are those associated with human Burkitt's lymphomas, in which the c-myc gene is translocated to the immunoglobulin heavy chain locus (Fig. 2). There is now considerable evidence that deregulation of c-myc is a critical component of the pathogenesis of lymphoid tumors (Ref. 11). This article will primarily focus on the SCL and LYL-1 genes within the context of the broader HLH family.
J. Visvader and C. G. Begley are at the Walter and Eliza Hall Institute of Medical Research, Post Office Royal Melbourne Hospital, Parkville, Victoria 3050, Australia.
© 1991, Elsevier Science Publishers, (UK) 0376-5067/91/$02.00

The HLH motif
The HLH domain spans approximately 60 amino acids and comprises two predicted amphipathic α-helices separated by an intervening loop (Ref. 8). This structure has been demonstrated to mediate protein dimerization (Refs 8,12). One notable characteristic of the HLH family is that the size and sequence of the loop are conserved across species for any given member but vary considerably between different family members. The HLH domain is usually preceded by a highly basic motif, 10-20 amino acids in length, that determines DNA-binding specificity (Fig. 3). This bipartite structure (basic region-HLH domain) has been shown to be a critical determinant of the capacity of MyoD to induce myogenesis and of the transforming activity of c-myc (Ref. 13). For many members of this rapidly growing family, the function of the HLH motif remains to be proven, particularly as newer members appear to diverge from the original consensus sequence defining the amphipathic helices.

An interesting feature of HLH proteins is their capacity to form either homodimeric or heterodimeric complexes with certain other members of this family, thereby modulating their DNA-binding capacity. It has been suggested that ubiquitously expressed HLH proteins dimerize with tissue-specific HLH proteins to form a complex that binds with higher affinity to the tissue-specific target sequence (Ref. 12). This is based on the finding that the HLH domain can mediate heterodimer formation between either daughterless, E12 or E47 (ubiquitous expression) and achaete-scute T3 or MyoD (tissue-specific expression) (Ref. 12). An analogous pattern of specialized protein-protein oligomerization is also a feature of the 'leucine zipper' transcriptional regulators, which contain a dimerization domain (leucine repeat structure) adjacent to the DNA-binding motif (basic region) (Ref. 14).

A feature common to MyoD, the immunoglobulin enhancer-binding proteins, c-Myc, Max (the binding partner of c-Myc), the yeast centromere-binding factor (CBF), USF and AP-4 is the ability to recognize a conserved core DNA sequence, CA--TG, referred to as the 'E-box' element. The E-boxes were first identified as in vivo protein-binding sites within immunoglobulin gene loci. In addition to the immunoglobulin heavy chain and light chain enhancers, E-boxes have been identified in, for example, the muscle creatine kinase promoter and the insulin gene enhancer.

Recently, a novel subset of HLH proteins that lack the adjacent basic motif has been described. Id (Ref. 3) and HLH 462 (Ref. 15) appear to be negative regulators of HLH DNA-binding proteins, and Id has been shown to inhibit MyoD-dependent gene activation (Ref. 3). Furthermore, during neurogenesis in Drosophila, achaete-scute activity is regulated by at least two HLH proteins with negative regulatory roles: extramacrochaetae (emc) (Ref. 16) and hairy (Ref. 7). The emc gene product lacks the basic region, while the presence of a proline residue within the hairy basic region probably interferes with DNA-binding activity. The combinatorial action of HLH proteins, due to their ability to heterodimerize and the existence of both positive and negative components, contributes one more level of complexity to transcriptional control networks within the cell.
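As an illustration of the E-box core sequence described in this section, the following minimal Python sketch scans a DNA string for CA--TG. It assumes that the two unspecified central positions may be any nucleotide; the function name `find_eboxes` and the demonstration sequence are hypothetical and are not taken from any of the cited studies.

```python
import re

# Assumption: the two central positions of the "CA--TG" core may be any base.
# The lookahead keeps the match zero-width so that overlapping sites are reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq):
    """Return (0-based start, hexamer) for every E-box core found in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


if __name__ == "__main__":
    demo = "ttgaCAGCTGgaaCACGTGtt"  # made-up sequence, for illustration only
    for start, site in find_eboxes(demo):
        print(start, site)
```

A scan of this kind only locates candidate sites. Whether a given HLH dimer actually binds a particular site depends on the basic regions of the partner proteins, as discussed above.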
The SCL gene
SCL was first identified through its involvement in a translocation between chromosomes 1 and 14 [t(1;14)(p33;q11)], in a case of T-cell acute lymphoblastic leukemia (T-ALL) (Ref. 17). Intriguingly, the leukemic cells of the patient differentiated into myeloid cells following treatment with 2'-deoxycoformycin. This phenotype was recapitulated by a cell line established from this patient that carried the (1;14) translocation and indicates that the leukemia may have arisen in a multipotential hemopoietic precursor cell (Ref. 18). Molecular analysis revealed that the SCL (stem cell leukemia) gene was translocated into the T-cell antigen receptor (TCR) δ locus (Ref. 18). The translocation was associated with a site-specific rearrangement of the TCR diversity Dδ2 and Dδ3 genes in an event that strongly implicated the DNA recombinases, which normally carry out T-cell receptor rearrangement, in the translocation mechanism. The translocation occurred in the 3' untranslated region (UTR) of SCL and therefore did not alter the coding region. As a result, SCL was highly expressed in the leukemic cells, although it is not normally detected in T-cell populations.

Subsequently, six (1;14) translocations involving SCL (Refs 19,20), also referred to as TAL (T-cell acute leukemia) (Ref. 20), have been characterized. Although these were described as disrupting the SCL coding region (Ref. 20), further characterization of the SCL gene has shown that these translocation breakpoints all occur in the 5' UTR and therefore leave the coding region intact (Ref. 21). The t(1;14)(p33;q11) translocation is evident in 3% of patients with T-ALL.

In addition to chromosomal translocation events, a second mechanism of dysregulation of SCL in T-cell leukemia has been observed, and T-cell DNA recombinases are again implicated in its generation (Ref. 22). A site-specific recombination on chromosome 1 deletes approximately 90 kb of DNA encompassing the SCL Interrupting Locus (SIL) transcription unit (Ref. 23) and occurs in up to 25% of patients with T-ALL. The deletion was reported to disrupt the SCL coding region (Ref. 22); however, on the basis of complete SCL sequence data it appears that the 3' deletion junction occurs within the 5' UTR in a manner analogous to the translocations in the 5' UTR of SCL (Refs 21,23). Thus the SCL gene is a target for rearrangement and translocation events in T lymphocytes, apparently mediated by DNA recombinases, and is clearly implicated in the pathogenesis of T-ALL.

[Figure 1: alignment of the basic region, helix I, loop and helix II of Lyl-1, SCL, E12, c-myc, L-myc, N-myc, MAX, MyoD, twist, da, AC-S T4, Lc, CBF-1, AP-4, USF, hairy, Id, emc and E(spl)m8]
Figure 1. Comparison of the protein sequences of representative HLH proteins. The most highly conserved amino acids are indicated (•) above the sequence. The HLH (dimerization) and upstream basic (DNA binding) motifs are shown. Note that the basic domain is disrupted or absent in hairy, Id, emc and the enhancer of split m8 [E(spl)m8] proteins. The HLH family includes products of the achaete-scute gene complex (AC-S T4), twist, daughterless (da), hairy and extramacrochaetae (emc) genes of Drosophila; MyoD (human sequence is given) and related gene products involved in muscle determination; the yeast centromere binding factor CBF1; the enhancer binding transcription factors E12 and AP-4 (human sequences given); the maize pigment Lc; the adenovirus major late transcription factor USF; proteins encoded by the myc proto-oncogene family (human) and the protein Max that complexes with them. As TIBS editorial policy does not allow full citation of references, please see Refs 3,4 for a more comprehensive listing of original articles.
Structure and expression of the SCL gene
The human SCL locus is composed of eight exons spanning 15 kb of genomic sequence (Ref. 21). The gene has been mapped to human chromosome 1p33 and to the mid-portion of mouse chromosome 4 in a region syntenic with human 1p33 (Ref. 24). Characterization of full-length cDNA clones of human and mouse SCL sequences (Ref. 24) revealed that they are closely related, with the predicted proteins having 94% homology. The long open reading frame begins in exon 4 of the human SCL gene. The predicted mouse SCL protein (approximately 36 kDa) is two amino acids shorter than the human protein. The HLH motif and preceding basic region are identical in the two species. Furthermore, a proline-rich region conserved in the amino-terminal half of the mouse and human proteins may reflect a transactivation domain similar to that seen in other transcription factors.

Both the human and mouse genes have a long 3' UTR (3.5 kb in the human and 3.0 kb in mouse) that is relatively AT-rich and displays 58% nucleotide identity overall. Although the motif TTATTTAT that is associated with mRNA instability occurs several times in the human 3' UTR sequence, it is not present in the 3' UTR of the mouse gene. In both mouse and human there are two major SCL transcripts that differ in the 3' UTR, presumably as a result of usage of alternate polyadenylation signals.

There is also heterogeneity in the 5' UTR of the SCL gene. This has been characterized in detail for the human gene, for which five exons representing 5' untranslated sequence and two distinct transcription initiation sites have been identified (Ref. 21). At least six transcripts have been defined that differ as a result of alternative exon usage within the 5' UTR. Analysis of the presumed promoter region(s) upstream of the transcription initiation sites revealed potential SP-1, AP-1 and GATA-1 recognition sites. The 5' UTR lies in a CpG island that is methylated in the promyelomonocytic line HL60, which does not express SCL, but is unmethylated in the erythroleukemia line K562, which does express SCL.

The SCL gene displays an interesting pattern of expression within the hemopoietic system: SCL mRNA (4.7 and 3.0 kb) is abundantly expressed in erythroid, mast and megakaryocytic cell lines whereas little or no expression is observed amongst macrophage, B- and T-lymphoid cell lines (Refs 25,26). This pattern is very similar to that of GATA-1 (Eryf-1, NF-E1, GF-1), an erythroid transcription factor also expressed in the mast and megakaryocytic cell lineages (Ref. 27). Interestingly, SCL mRNA levels increase dramatically during DMSO-induced differentiation of erythroleukemia cells (Ref. 25) and are markedly enhanced in normal erythroid tissue during the regeneration phase following chemically induced hemolysis (Ref. 25). These findings indicate that SCL may have a regulatory role within the erythroid lineage.

[Figure 2: schematic of the IgH locus (chr. 14q32) joined to human c-myc (chr. 8q24), and of human SCL (chr. 1p33; exons Ia, Ib, IIa, IIb, III, IV, V, VI) joined to the TCR δ-chain locus (chr. 14q11)]
Figure 2. Location of the major translocation breakpoints associated with the c-myc gene in Burkitt's lymphoma and the SCL gene in T-ALL. In t(8;14) translocations, the immunoglobulin heavy chain locus (chromosome 14q32) is juxtaposed to the c-myc gene (chromosome 8q24), either far 5' of the gene or within the gene (5' region, first exon or intron) at the regions indicated by arrows. In each of these regions of the c-myc gene, a number of different breakpoints have been identified (clustering of breakpoints is represented by the brackets). In the t(1;14) translocations, the T-cell receptor (TCR) δ segment (chromosome 14q11) is linked to the SCL gene (chromosome 1p33) at the sites indicated by arrows. Exons are indicated as boxes, with coding regions solid and untranslated regions hatched. The translocation breakpoints involving the LYL-1 gene are not shown here, as the complete structure of the LYL-1 gene is not yet known.

[Figure 3: Positive (basic region plus helix-loop-helix): SCL, lyl, E2A, max, myc family, MyoD family, twist, daughterless, achaete-scute. Negative (helix-loop-helix only, or with an interrupted basic region): Id, emc, E(spl), hairy]
Figure 3. Schematic diagram of positive and negative regulators in the helix-loop-helix gene family. Positive regulators (activators) contain a basic motif that specifies DNA binding, adjacent to the HLH domain, while negative regulators (inhibitors) either have an interrupted basic motif or lack one altogether.

The LYL-1 gene
The HLH gene LYL-1 has a number of striking similarities to SCL. LYL-1 was also identified because of its involvement in a chromosomal translocation in T-ALL (Ref. 28). The t(7;19) translocation occurred between the TCR β locus and the first intron of the human LYL-1 gene. The resulting truncated transcripts of LYL-1 lack a potential CTG initiation codon found in exon 1 of human LYL-1. However, the CTG codon is not conserved in the murine LYL gene (Ref. 25), suggesting that translation normally initiates at the first ATG codon in exon 2. Thus, it is unlikely that the translocation breakpoint alters the coding potential of the LYL-1 gene.
Characterization of murine LYL-1 cDNA clones revealed alternative 5' untranslated sequences and differential splicing within the 5' portion of the coding region that may produce a LYL-1 polypeptide lacking an amino-terminal segment (Ref. 25). Comparison of the predicted mouse and human LYL-1 amino acid sequences revealed 77% homology (Ref. 25). The predicted mouse LYL-1 protein (approximately 30 kDa) is two amino acids shorter than the human protein. Their HLH motifs differ by only a single residue, whereas the more carboxy-terminal region is quite divergent (57% homology). Conservation of the amino-terminal segment may reflect the presence of a transactivation domain, as observed for SCL. One notable feature of the LYL-1 protein is the unusual percentage of proline (17%) and basic amino acids (17%).

Mouse SCL and LYL-1 proteins display 84% amino acid homology in the HLH and basic domains, with the majority of differences representing conservative amino acid substitutions. In addition, they share a similar intron-exon structure, suggesting that they may have evolved from a common ancestral gene.

The LYL-1 gene is expressed in most myeloid, erythroid and B lymphocyte cell lines and displays two alternative size classes of transcripts, the smaller size class (1.5-1.8 kb) being typical of the erythroid lineage and the larger class (2.0-2.3 kb) of the B-cell lineage. These two size classes were found to differ in the 5' untranslated region. Thus, expression of the LYL-1 gene appears to be differentially regulated in different hemopoietic cell types (Ref. 25). As observed for SCL, there is low or undetectable expression of LYL-1 in most T lymphoid cell sources. These observations are consistent with the view that translocation of SCL or LYL-1 in human T-cell leukemias alters their normal regulation and thereby contributes to neoplasia. In contrast to the hemopoietic compartment, the level of LYL-1 and SCL mRNA is low or undetectable in the non-hemopoietic tissues and cell lines examined (Ref. 25). Thus these proteins may function primarily within the hemopoietic system.
Other helix-loop-helix proteins and human leukemias
It has been known for several years that the c-myc oncogene is frequently translocated in Burkitt's lymphoma and other types of B-cell neoplasia to a region near the heavy or light chain immunoglobulin regulatory sequences. These translocations do not alter the coding potential of the c-myc gene but lead to deregulation of c-myc expression (Ref. 11). There is compelling evidence for the involvement of c-myc not only in neoplasia but also in the control of cell proliferation, mitogenesis and differentiation.

Intriguingly, another HLH gene has recently been implicated in leukemogenesis. The E2A gene was originally identified through the binding of its products to the κE2 site in the immunoglobulin gene enhancer and encodes the related transcription factors E12 and E47 (Ref. 8). The E2A DNA-binding factors appear to be expressed ubiquitously. This gene has been identified as one partner in the t(1;19)(q23;p13.3) translocation found in 30% of pediatric pre-B ALL. This translocation disrupts the coding region of the E2A gene and links the 5' coding portion of the gene on chromosome 19 to a divergent homeobox gene present on chromosome 1, pbx1 (Refs 29,30). Therefore, deregulated expression of HLH genes (SCL, LYL, c-myc) or the formation of a fusion polypeptide (E2A-pbx1) may contribute to the pathogenesis of leukemia.

In summary, analysis of chromosomal translocations associated with human tumors has clearly implicated four members of the HLH family of DNA-binding factors (SCL, LYL, c-Myc and E2A) in neoplastic transformation. The normal function of these proteins in the control of proliferation and differentiation awaits further analysis. It will therefore be of particular interest to determine the target DNA sequences of the LYL and SCL proteins and to identify the partners that may be important for regulating the function of these two HLH proteins during hemopoiesis.
Acknowledgements
We wish to thank J. Adams for invaluable discussions and R. Harvey for his help with Fig. 3. This work was funded in part by grants from The Victorian Health Promotion Foundation, the Anticancer Council of Australia and the National Health and Medical Research Council, Canberra. J.V. is a recipient of an Australian Research Council Fellowship.
References
1 Weintraub, H. et al. (1991) Science 251, 761-766
2 Villares, R. and Cabrera, C. V. (1987) Cell 50, 415-424
3 Benezra, R. et al. (1990) Cell 61, 49-59
4 Blackwood, E. M. and Eisenman, R. N. (1991) Science 251, 1211-1217
5 Thisse, B., Stoetzel, C., Gorostiza-Thisse, C. and Perrin-Schmitt, F. (1988) EMBO J. 7, 2175-2183
6 Caudy, M. et al. (1988) Cell 55, 1061-1067
7 Rushlow, C. A. et al. (1989) EMBO J. 8, 3095-3103
8 Murre, C., McCaw, P. and Baltimore, D. (1989) Cell 56, 777-783
9 Henthorn, P., Kiledjian, M. and Kadesch, T. (1990) Science 247, 467-470
10 Beckmann, H., Su, L. and Kadesch, T. (1990) Genes Dev. 4, 167-179
11 Cory, S. (1986) Adv. Cancer Res. 47, 189-234
12 Murre, C. et al. (1989) Cell 58, 537-544
13 Stone, J. et al. (1987) Mol. Cell. Biol. 7, 1697-1709
14 Busch, S. J. and Sassone-Corsi, P. (1990) Trends Genet. 6, 36-40
15 Christy, B. A. et al. (1991) Proc. Natl Acad. Sci. USA 88, 1815-1819
16 Ellis, H. M., Spann, D. R. and Posakony, J. W. (1990) Cell 61, 27-38
17 Begley, C. G. et al. (1989) Proc. Natl Acad. Sci. USA 86, 10128-10132
18 Begley, C. G. et al. (1989) Proc. Natl Acad. Sci. USA 86, 2031-2035
19 Bernard, O. et al. (1990) Genes Chrom. Cancer 1, 194-208
20 Chen, Q. et al. (1990) J. Exp. Med. 172, 1403-1408
21 Aplan, P. D. et al. (1990) Mol. Cell. Biol. 10, 6426-6435
22 Brown, L. et al. (1990) EMBO J. 9, 3343-3351
23 Aplan, P. D. et al. (1990) Science 250, 1426-1429
24 Begley, C. G. et al. (1991) Proc. Natl Acad. Sci. USA 88, 869-873
25 Visvader, J., Begley, C. G. and Adams, J. M. (1991) Oncogene 6, 187-194
26 Green, A. R., Salvaris, E. and Begley, C. G. (1991) Oncogene 6, 475-479
27 Martin, D. I. K., Tsai, S. and Orkin, S. H. (1989) Nature 338, 435-438
28 Mellentin, J. D., Smith, S. D. and Cleary, M. L. (1989) Cell 58, 77-83
29 Nourse, J. et al. (1990) Cell 60, 535-545
30 Kamps, M. P., Murre, C., Sun, X. H. and Baltimore, D. (1990) Cell 60, 547-555