Biochimica et Biophysica Acta 1396 (1998) 132–137
Short sequence-paper
Isolation and expression of a human SRY-related cDNA, hSOX20
Yoshiki Hiraoka a,*,1, Motoyuki Ogawa a, Yukinao Sakai a, Koji Taniguchi a, Takuma Fujii b,2, Akihiro Umezawa c, Jun-ichi Hata c, Sadakazu Aiso a
a Department of Anatomy, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160, Japan
b Department of Obstetrics and Gynecology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160, Japan
c Department of Pathology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160, Japan
Received 5 September 1997; revised 9 October 1997; accepted 13 October 1997
Abstract
SOX is a family of genes related to the testis-determining gene, SRY. We have isolated and sequenced an hSOX20 cDNA from a human embryonic carcinoma cell line. The cDNA contains an open reading frame (ORF) encoding 233 amino acids. The protein encompasses an SRY-type HMG box that shows strong homology to those of mouse Sox15 and Sox16. Various adult and fetal tissues were tested for hSOX20 mRNA by Northern analysis: expression is restricted to the fetal testis, and the transcript is 1.5 knt long. An electrophoretic mobility shift assay indicated that recombinant hSOX20 polypeptide is capable of binding to the AACAAT sequence. © 1998 Elsevier Science B.V.
Keywords: DNA binding; Embryonic carcinoma; Fetal testis; HMG box; SOX
Abbreviations: aa, amino acid(s); EMSA, electrophoretic mobility shift assay; GST, glutathione-S-transferase; knt, kilo nucleotide(s) or 1000 nt; nt, nucleotide(s); ORF, open reading frame; RT/PCR, reverse transcription/polymerase chain reaction; SOX, SRY-related protein(s); SOX, gene (DNA, RNA) encoding SOX; SRY, sex-determining region Y gene; SRY, protein encoded by SRY
* Corresponding author. Fax: +81 3 53791977.
1 The nucleotide sequence in this paper has been submitted to the DDBJ, EMBL and GenBank databases under the accession number AB006867.
2 Present address: Department of Internal Medicine, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA.
0167-4781/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0167-4781(97)00186-3
SOX is a novel family of genes related to SRY, the testis-determining gene [1–5]. The SOX genes encode transcription factors that contain a characteristic DNA-binding motif known as the SRY-type HMG box [1,4]. The HMG domain interacts with DNA in a sequence-specific manner [6–14], and the binding induces a sharp bend in the target DNA [7,8]. The SOX genes are involved in developmental decisions [15–25]. Sox4 is expressed in T and pre-B lymphocytes [10]. A recent study has revealed that Sox4 participates in cardiac development and lymphocyte differentiation [23]. Mutations in the human SOX9 gene have been shown to cause campomelic dysplasia and autosomal sex reversal [18,19]. SOX9 is assumed to play important parts in both gonadal development and chondrogenesis. We have previously identified five distinct Sox cDNAs from Xenopus laevis [12–14,26]. The Xenopus Sox genes are involved in frog oogenesis and/or early development. In this study, we have isolated and characterized a cDNA encoding the human hSOX20 protein.
To identify new SOX genes, we constructed a λ phage library of cDNA derived from a human embryonic carcinoma cell line, NCR-G3 [27], and screened it with a mixture of partial cDNAs encoding the SRY-type HMG boxes of hSOX2, hSOX5 and hSOX8 as probes. The probe fragments were prepared by essentially the same RT/PCR method previously used to clone partial SOX genes [3,9]. The screening yielded 67 positive clones. From these 67 cDNAs, we selected four clones, ECZ204, ECZ254, ECZ259 and ECZ260, which gave positive hybridization signals to the hSOX8 probe, because the published hSOX8 sequence is only partial (54 aa) and the sequences flanking the HMG box still
remain unknown [3]. We sequenced the longest (ECZ260; 1.4 knt) of the four cDNAs. Fig. 1 displays the nt sequence of the cDNA and its deduced aa sequence. An ORF of the cDNA encodes a protein of 233 aa containing the SRY-type HMG box. Unexpectedly, the HMG box sequence of ECZ260 differs from that of hSOX8 reported previously [3]. The cDNA showed 61% homology to the partial hSOX8 at the amino acid level, and the two sequences are assumed to cross-hybridize.
Fig. 1. Nucleotide sequence of hSOX20 cDNA. Nucleotides are numbered at right. The deduced aa sequence is shown above the nt sequence. The numbers of aa are indicated in parentheses at right. An in-frame stop codon is indicated by an asterisk. The SRY-related HMG box is underlined. While the canonical poly(A) signal, AATAAA, is missing in this cDNA, a stretch of poly(A) sequence was attached to its 3′ end. The nucleotide sequence (nt 495–1027) encoding the first 178 aa of hSOX20 is completely identical between the cDNA in this study and the genomic clone reported in [28]. It should be noted that the serine codons at aa 25 (TCA), aa 26 (TCT) and aa 27 (TCG) were all incorrectly assigned as alanine residues in the published hSOX20 gene sequence.
Next we compared the HMG box of ECZ260 with those of known SOX proteins. The HMG box sequence of ECZ260 was found to be identical to that of human SOX20 documented previously [28]. In that earlier work, however, the authors showed only a partial sequence (765 nt) for the human SOX20 (hSOX20) gene. Whereas a portion (nt 493–1027) of the hSOX20 cDNA is 100% identical to the first 536 nt of the reported sequence for the hSOX20 gene, the sequences flanking this portion of the cDNA are missing in the genomic clone (Fig. 2(a)). Comparison between the sequences of the cDNA and the genomic clone suggests that the hSOX20 gene is split by at least one intron (Fig. 2(a)). Although only an incomplete aa sequence (178 aa) for hSOX20 was deduced from the genomic clone in the earlier work [28], the entire primary structure of hSOX20 is revealed here by the cDNA isolated in this study.
Known SOX HMG box sequences have been categorized into six distinct groups, named A to F [29]. When compared with known SOX HMG boxes, the hSOX20 HMG box cannot be assigned to any of the six subgroups (Fig. 2(b) and (c)). Since the report on the sequence taxonomy of the SOX HMG box [29], at least nine human and/or mouse genes, including hSOX20, have been newly identified as members of the SOX family [30–34]. Using up-to-date sequence data for SOX proteins (Fig. 2(b)), we constructed a homology matrix of SOX HMG sequences (Fig. 2(c)). Taking 80% homology as a cut-off value, we can categorize all the SOX HMG box sequences listed in Fig. 2(b) into seven subgroups. In Fig. 2(c), aa identities higher than 80% fall into seven distinct areas along the diagonal of the matrix. Of the unassigned members of SOX HMG boxes, mSox19 and
hSOX22 are newly categorized in group C. Mouse Sox21 belongs to group E. Group F contains mSox17 and mSox18. Although human SOX20 and mouse Sox15 and Sox16 cannot be assigned to any of the six previously proposed subgroups A to F [29], they share highly conserved amino acids in the HMG region and form a new subgroup distinct from the six. This seventh subgroup might be named group G. The human gene hSOX12 [35] is also assigned to the new group G (Fig. 2(c)). It should be noted that hSOX12 is not the human orthologue of mouse Sox12 (mSox12), although the genes from the two species have the same number.
We next compared the sequences flanking the HMG domain of hSOX20 with those of known SOX proteins. While hSOX20 does not show any significant homology to known SOX proteins in the N-terminal region, its C-terminal domain is highly homologous to that of mSox15 (Fig. 2(d)). The C-terminal domain of hSOX20 is the same length as that of mSox15, and the two share 74% of their amino acids in this region. Thus, together with the sequence similarity in the HMG box, the high degree of sequence conservation in their C-terminal domains suggests a close evolutionary relationship between the two SOX proteins. The cDNA for mSox15 whose sequence has been deposited in GenBank (accession no. X98369) is a partial clone, and the N-terminal sequence of mSox15 is not available at present.
Fig. 2. Comparison of hSOX20 with known members of the SOX family. (a) Schematic illustration of the hSOX20 cDNA and gene. The upper illustration represents the genomic DNA for hSOX20 reported earlier [28]; the lower is the hSOX20 cDNA in this study. The thick and thin lines denote the coding and non-coding regions, respectively. A presumptive intron sequence in the genomic clone is represented by a wavy line. The wedge-shaped line indicates a boundary between the exons. Nucleotide numbers at the 3′ and 5′ ends and at the boundaries are indicated above or under the lines, and the corresponding amino acid numbers are shown in parentheses. (b) Sequence alignment of the SOX HMG box. Most of the sequences of mouse Sox (mSox) and human SOX (hSOX) HMG boxes shown here have been published [1,3,9,10,18,19,28–38]. The HMG box sequences of mSox16 and mSox19, and the sequence of mSox15 in parentheses, have been deposited in GenBank under the accession numbers L29084, X98368 and X98369, respectively. The aa sequence for mSox7 was deduced from a partial cDNA (unpublished result by Y. Hiraoka and S. Kido). A shorter aa sequence of the mSox7 HMG box has been reported elsewhere [3]. The HMG boxes have been categorized into six subgroups from A to F [29]. The seventh subgroup, G, is newly defined in this study. (c) Homology matrix of the SOX HMG box. The percentage of aa residues identical between members of the SOX family was calculated from the sequence alignment shown in (b). Values higher than 80% are highlighted. The diagonal matrix elements corresponding to self-comparison are marked with dark boxes without showing the value (100%). (d) Sequence comparison of the C-terminal domains of hSOX20 and mSox15. A partial cDNA encoding the C-terminal amino acids of mSox15 has been deposited in GenBank (accession no. X98369). Amino acid numbers are shown at right. Amino acid residues shared by the two proteins are indicated by highlighted letters.
Tissue-specific expression of hSOX20 was investigated by Northern analysis. The hSOX20 transcript was found to be 1.5 knt in length (Fig. 3). The cDNA is 1.4 knt long and therefore covers nearly the full length of the hSOX20 mRNA. As seen in Fig. 3, the expression of hSOX20 is restricted to the fetal testis, and the transcript is not detectable in the adult testis. The human SOX20 gene can therefore be assumed to be involved in testis development.
Fig. 3. Tissue-specific expression of hSOX20. Poly(A) RNAs (3 μg) from adult and fetal human tissues were subjected to Northern analysis for hSOX20. The fetal RNAs tested were prepared from a human fetus of 20 weeks of gestation. Procedures for Northern analysis were the same as described elsewhere [26]. A fragment of 617 nt (Fig. 1; nt 1–617) derived from the cDNA was used as a probe. The same RNA blot was rehybridized with a control probe derived from the human β-actin gene (ACT).
All SOX proteins, including SRY, have been believed to bind to the same sequence, AACAAT [6–14,16]. However, no experimental evidence has shown that the HMG domains belonging to the new subgroup G are capable of binding to the same target nucleotide sequence. We therefore tested whether hSOX20 recognizes this sequence. Recombinant hSOX20 protein was expressed as a GST fusion in E. coli. The binding of the fusion protein to the target nucleotides (Fig. 4(a)) was assayed by EMSA. As shown in Fig. 4(b), GST-hSOX20 bound to an oligonucleotide (WT) containing AACAAT, while non-fused GST did not bind to the sequence. The specificity of the interaction was assessed by EMSA with a series of oligonucleotide probes including WT and its variants (Fig. 4(a) and (c)). Probe WT, containing the canonical AACAAT sequence, exhibited the strongest binding to the protein. Although probe MTALL did not bind, mutant probes MT1 and MT3 were also recognized by hSOX20. A faint retarded band was seen with probe MT6; the remaining mutant probes were not recognized by the protein. Consequently, hSOX20 interacts specifically with the consensus sequence AACAAT in vitro.
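As a computational aside to the binding result above, the short sketch below shows one way to search a DNA sequence for the AACAAT motif on either strand. It is an illustration only and not part of the experiments reported here; the function names and the three test sequences are hypothetical and are not the EMSA probes of Fig. 4(a).

```python
# Minimal sketch: locate the consensus motif AACAAT on either DNA strand.
# All names and example sequences are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "AACAAT"):
    """Return (0-based start, strand) for every occurrence of the motif.

    A '-' hit means the motif lies on the complementary strand, i.e. the
    given strand carries the reverse complement (ATTGTT for AACAAT).
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        if window == motif:
            hits.append((start, "+"))
        if window == rc_motif:
            hits.append((start, "-"))
    return hits


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to exercise the function.
    for name, seq in [("forward", "GGAACAATCC"),
                      ("reverse", "CCATTGTTGG"),
                      ("mutated", "GGAACCATCC")]:
        print(name, find_motif(seq))
```

A sequence-level match of this kind only flags candidate sites; whether a protein actually binds them still has to be tested experimentally, as done by EMSA in this study.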
Fig. 4. EMSAs of oligonucleotide binding by GST-hSOX20. Nucleotides 630–872 from the hSOX20 cDNA, encoding aa 46–126, were amplified by PCR using the primers 5′-AAGGATCCTGGAGAAGGTGAAGCGGC-3′ and 5′-AAGAATTCTAGCTCTTGGCCTTGCGCCGA-3′. The PCR product was digested with BamHI and EcoRI, and was subcloned into the pGEX-3X expression vector (Pharmacia). GST-hSOX20 and non-fused GST proteins expressed in E. coli were purified by affinity chromatography on glutathione-Sepharose 4B. (a) DNA sequences of EMSA probes are aligned. Nucleotides flanking the mutated position in each probe are identical to those of the WT probe and are indicated by hyphens. Names of probes are shown at right. (b) Purified GST (1 μg; lane 2) or GST-hSOX20 (1 μg; lanes 3 and 4) was pre-incubated in 10 μl of binding buffer (10 mM Hepes, pH 7.1/60 mM KCl/1 mM EDTA/1 mM DTT/12% glycerol) with 10 fmol (approx. 20 000 cpm) of 32P-labeled oligonucleotide probe WT for 15 min at room temperature. The reaction for lane 1 was performed in the absence of protein. Unlabeled competitor oligonucleotide WT (1000-fold excess over the labeled probe) was included in the control reaction (lane 4). The reactions were electrophoresed on a 5% polyacrylamide gel in 0.5× TBE buffer at room temperature. (c) Specificity of the sequence recognized by hSOX20 was assessed by EMSA using the series of probes shown in (a). Concentrations of GST-hSOX20 and 32P-labeled probes are the same as those in the experiment shown in (b). The probes used are indicated above the respective lanes.
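The coordinates quoted in this legend can be cross-checked with a few lines of arithmetic. The sketch below assumes 1-based, inclusive numbering and an uninterrupted reading frame. Under those assumptions it checks that nt 630–872 span exactly the codons of aa 46–126, derives the implied position of the first codon, and locates the BamHI (GGATCC) and EcoRI (GAATTC) recognition sites in the two primers. The variable and function names are illustrative.

```python
# Minimal sketch: consistency check of the construct described in the legend.
# Assumptions (not stated in the paper): coordinates are 1-based and
# inclusive, and the ORF is read without interruption in the cDNA.

FRAGMENT_NT = (630, 872)   # amplified nucleotides, from the legend
FRAGMENT_AA = (46, 126)    # encoded amino acids, from the legend

FORWARD_PRIMER = "AAGGATCCTGGAGAAGGTGAAGCGGC"      # from the legend
REVERSE_PRIMER = "AAGAATTCTAGCTCTTGGCCTTGCGCCGA"   # from the legend

BAMHI_SITE = "GGATCC"   # BamHI recognition sequence
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence


def span(first: int, last: int) -> int:
    """Length of an inclusive range."""
    return last - first + 1


def codon_range(aa_index: int, orf_start: int):
    """First and last nucleotide of the codon for a 1-based aa index."""
    first = orf_start + 3 * (aa_index - 1)
    return first, first + 2


nt_length = span(*FRAGMENT_NT)
aa_length = span(*FRAGMENT_AA)

# Position of the first nucleotide of codon 1 implied by the legend.
orf_start = FRAGMENT_NT[0] - 3 * (FRAGMENT_AA[0] - 1)

if __name__ == "__main__":
    print("fragment length (nt):", nt_length)
    print("fragment length (aa):", aa_length)
    print("three nt per aa:", nt_length == 3 * aa_length)
    print("implied first nt of codon 1:", orf_start)
    print("codon of aa 126:", codon_range(FRAGMENT_AA[1], orf_start))
    print("BamHI site offset in forward primer:", FORWARD_PRIMER.find(BAMHI_SITE))
    print("EcoRI site offset in reverse primer:", REVERSE_PRIMER.find(ECORI_SITE))
```

Under these assumptions the first codon would start at nt 495, which agrees with the start of the range (nt 495–1027) quoted in the legend to Fig. 1.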
This work was supported in part by a National Grant-in-Aid for the Establishment of High-Tech Research Center in a Private University from the Ministry of Education, Science and Culture to S.A., by Grants-in-Aid from the Ministry of Education, Science and Culture to S.A. and Y.H., and by Keio Gakuji Fukuzawa Memorial Funds for the Advancement of Education and Research and Keio Gakuji Academic Developmental Funds from Keio University to S.A. and Y.H.
References
[1] A.H. Sinclair, P. Berta, M.S. Palmer, J.R. Hawkins, B.L. Griffiths, M.J. Smith, J.W. Foster, A.-M. Frischauf, R. Lovell-Badge, P.N. Goodfellow, Nature 346 (1990) 240–244.
[2] J. Gubbay, J. Collignon, P. Koopman, B. Capel, A. Economou, A. Münsterberg, N. Vivian, P. Goodfellow, R. Lovell-Badge, Nature 346 (1990) 245–250.
[3] P. Denny, S. Swift, N. Brand, N. Dabhade, P. Barton, A. Ashworth, Nucleic Acids Res. 20 (1992) 2887.
[4] V. Laudet, D. Stehelin, H. Clevers, Nucleic Acids Res. 21 (1993) 2493–2501.
[5] M. Stevanovic, R. Lovell-Badge, J. Collignon, P.N. Goodfellow, Hum. Mol. Genet. 2 (1993) 2013–2018.
[6] V.R. Harley, D.I. Jackson, P.J. Hextall, J.R. Hawkins, G.D. Berkovitz, S. Sockanathan, R. Lovell-Badge, P.N. Goodfellow, Science 255 (1992) 453–456.
[7] K. Giese, J. Cox, R. Grosschedl, Cell 69 (1992) 185–195.
[8] S. Ferrari, V.R. Harley, A. Pontiggia, P.N. Goodfellow, R. Lovell-Badge, M.E. Bianchi, EMBO J. 11 (1992) 4497–4506.
[9] P. Denny, S. Swift, F. Connor, A. Ashworth, EMBO J. 11 (1992) 3705–3712.
[10] M.v.d. Wetering, M. Oosterwegel, K.v. Norren, H. Clevers, EMBO J. 12 (1993) 3847–3854.
[11] C.-Y. King, M.A. Weiss, Proc. Natl. Acad. Sci. U.S.A. 90 (1993) 11990–11994.
[12] M. Shiozawa, Y. Hiraoka, N. Komatsu, M. Ogawa, Y. Sakai, S. Aiso, Biochim. Biophys. Acta 1039 (1996) 73–76.
[13] Y. Hiraoka, N. Komatsu, Y. Sakai, M. Ogawa, M. Shiozawa, S. Aiso, Gene 197 (1997) 65–71.
[14] Y. Sakai, Y. Hiraoka, M. Konishi, M. Ogawa, S. Aiso, Arch. Biochem. Biophys. 346 (1997) 1–6.
[15] P. Koopman, J. Gubbay, N. Vivian, P. Goodfellow, R. Lovell-Badge, Nature 351 (1991) 117–121.
[16] C.M. Haqq, C.-Y. King, E. Ukiyama, S. Falsafi, T.N. Haqq, P.K. Donahoe, M.A. Weiss, Science 266 (1994) 1494–1500.
[17] W.-H. Shen, C.C.D. Moore, Y. Ikeda, K.L. Parker, H.A. Ingraham, Cell 77 (1994) 651–661.
[18] T. Wagner, J. Wirth, J. Meyer, B. Zabel, M. Held, J. Zimmer, J. Pasantes, F.D. Bricarelli, J. Keutel, E. Hustert, U. Wolf, N. Tommerup, W. Schempp, G. Scherer, Cell 79 (1994) 1111–1120.
[19] J.W. Foster, M.A. Dominguez-Steglich, S. Guioli, C. Kwok, P.A. Weller, M. Stevanovic, J. Weissenbach, S. Mansour, I.D. Young, P.N. Goodfellow, J.D. Brook, A.J. Schafer, Nature 372 (1994) 525–530.
[20] D. Uwanogho, M. Rex, E.J. Cartwright, G. Pearl, C. Healy, P.J. Scotting, P.T. Sharpe, Mech. Dev. 49 (1995) 23–36.
[21] Y. Kamachi, S. Sockanathan, Q. Lui, M. Breitman, R. Lovell-Badge, H. Kondoh, EMBO J. 14 (1995) 3510–3519.
[22] S.R.H. Russell, N. Sanchez-Soriano, C.R. Wright, M. Ashburner, Development 122 (1996) 3669–3676.
[23] M.W. Schilham, M.A. Oosterwegel, P. Moerer, J. Ya, P.A.J.d. Boer, M.v.d. Wetering, S. Verbeek, W.H. Lamers, A.M. Kruisbeek, A. Cumano, H. Clevers, Nature 380 (1996) 711–714.
[24] M.W. Schilham, P. Moerer, A. Cumano, H.C. Clevers, Eur. J. Immunol. 27 (1997) 1292–1295.
[25] X. Li, A. Cvekl, S. Bassnett, J. Piatigorsky, Dev. Genetics 20 (1997) 258–266.
[26] N. Komatsu, Y. Hiraoka, M. Shiozawa, M. Ogawa, S. Aiso, Biochim. Biophys. Acta 1305 (1996) 117–119.
[27] J. Hata, J. Fujimoto, E. Ishii, A. Umezawa, Y. Kokai, Y. Matsubayashi, H. Abe, S. Kusakari, H. Kikuchi, T. Yamada, T. Maruyama, Acta Histochem. Cytochem. 25 (1992) 563–576.
[28] J. Meyer, J. Wirth, M. Held, W. Schempp, G. Scherer, Cytogenet. Cell Genet. 72 (1996) 246–249.
[29] E.M. Wright, B. Snopek, P. Koopman, Nucleic Acids Res. 21 (1993) 744.
[30] M.v.d. Wetering, H. Clevers, Nucleic Acids Res. 21 (1993) 1669.
[31] T.L. Dunn, L. Mynett-Johnson, E.M. Wright, B.M. Hosking, P.A. Koopman, G.E.O. Muscat, Gene 161 (1995) 223–225.
[32] Y. Kanai, M. Kanai-Azuma, T. Noce, T.C. Saido, T. Shiroishi, Y. Hayashi, K. Yazaki, J. Cell Biol. 133 (1996) 667–681.
[33] M. Tani, N. Shindo-Okada, Y. Hashimoto, T. Shiroishi, S. Takenoshita, Y. Nagamachi, J. Yokota, Genomics 39 (1997) 30–37.
[34] P. Jay, I. Sahly, C. Gozé, S. Taviaux, F. Poulat, G. Couly, M. Abitbol, P. Berta, Hum. Mol. Genet. 6 (1997) 1069–1077.
[35] C. Gozé, F. Poulat, P. Berta, Nucleic Acids Res. 21 (1993) 2943.
[36] S. Kido, Y. Hiraoka, M. Ogawa, Y. Sakai, Y. Yoshimura, S. Aiso, Gene (1998) in press.
[37] N. Takamatsu, H. Kanda, I. Tsuchiya, S. Yamada, M. Ito, S. Kabeno, T. Shiba, S. Yamashita, Mol. Cell. Biol. 15 (1995) 3759–3766.
[38] F. Connor, E. Wright, P. Denny, P. Koopman, A. Ashworth, Nucleic Acids Res. 23 (1995) 3365–3372.