Biochimica et Biophysica Acta 1354 (1997) 183–188
Short sequence-paper
Complete primary structure of human collagen type XIV (Undulin) ¹
Michael Bauer, Walburga Dieterich, Tobias Ehnis, Detlef Schuppan *
Free University of Berlin, Klinikum Benjamin Franklin, Department of Gastroenterology, Hindenburgdamm 30, D-12200 Berlin, Germany
Received 29 May 1997; accepted 14 July 1997
Abstract
A partial cDNA sequence coding for the human extracellular matrix protein undulin has been completed. The completed sequence provides conclusive evidence for the suggested identity of undulin and collagen type XIV. Two differently sized polyproteins of 1780 and 1796 amino acids, with an overall amino acid sequence identity of 75% compared to chicken CXIV, emerge from variant 3′ sequence ends encoding the C-terminal non-collagenous (NC) NC1 domain of human collagen type XIV. © 1997 Elsevier Science B.V.
Keywords: Undulin; Collagen type XIV; Extracellular matrix; FACIT; von Willebrand factor; Fibronectin
Abbreviations: RACE, rapid amplification of cDNA ends; PCR, polymerase chain reaction; bp, base pairs; FACIT, fibril associated collagens with interrupted triple helices; CXIV, collagen type XIV.
* Corresponding author. Fax: +49 30 8445 4017; E-mail: [email protected]
¹ DNA sequences reported in this paper have been submitted to the EMBL/GenBank Data Libraries with accession numbers Y11709, Y11710 and Y11711.
0167-4781/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-4781(97)00131-0
M. Bauer et al. / Biochimica et Biophysica Acta 1354 (1997) 183–188
Fig. 1. Overlapping cDNA clones completing the human CXIV/undulin sequence were obtained by repeated library screening (λ3′ 1–3) and RACE-PCR (5′ RACE a–c, 3′ RACE d–f). Alternative poly-adenylation signals created two differently sized 3′ RACE-PCR products (d, e) encoding an identical NC1 domain. (f) shows a third 3′ RACE-PCR product encoding a variant NC1 domain (black represents cDNA sequence that differs from (d) and (e)). UN1 is a previously described partial undulin clone [6].
Collagen type XIV (CXIV), a member of the FACIT (fibril associated collagens with interrupted triple helices) family of collagens [1,2], as well as the related matrix protein undulin, is found in association with collagen fibrils in differentiated mesenchymal tissues [3–5]. The deduced protein sequence of cDNA clones covering approximately 40% of human undulin showed a modular arrangement of domains homologous to fibronectin type III (FN III) repeats or von Willebrand factor A (vWF A) domains [6]. Comparison of the partial amino acid sequence of undulin with that of chicken CXIV [7,8], and with the amino acid sequence of proteolytic human CXIV fragments [9], strongly suggested that undulin and CXIV are identical proteins. Chicken CXIV is composed of a large N-terminal non-collagenous (NC) domain, NC3, followed by two short collagenous segments, COL2 and COL1, that are separated by a short NC2 domain, and the NC1 domain at the C-terminus [8,10]. CXIV may modulate the supramolecular organization of collagen fibrils, similar to the function assigned to the FACIT collagen type IX, which is found in cartilage [2]. Recently, a proteoglycan variant of CD44 was identified as a cellular receptor for human CXIV/undulin [11], suggesting a role for CXIV/undulin in cell adhesion and possibly signal transduction. To clarify the relationship between CXIV and undulin, we completed the cDNA and deduced protein sequence of human undulin. 5′ RACE-PCR ([12,13]; Marathon Kit Manual, Clontech) was used to extend the partial human undulin sequence at the 5′ end (Fig. 1). The first
ATG codon (Fig. 2, nucleotides 244–246) is immediately followed by a stop codon (nucleotides 247–249). Therefore, translation is expected to start at the second ATG codon (nucleotides 265–267), which is in a
favorable sequence context for translation initiation [14]. A putative signal peptide of 28 amino acids with a characteristically high content of hydrophobic residues is encoded by nucleotides 265–348. Similar to one reported chicken CXIV sequence [8], the mature human CXIV/undulin starts with an FN III repeat (nucleotides 349–618). Interestingly, the chicken CXIV sequence of a second report [10] lacks this N-terminal FN III domain. Studies with recombinant fragments of human CXIV/undulin showed that the N-terminal FN III repeat mediates binding of CXIV/undulin to decorin, a small chondroitin/dermatan sulfate proteoglycan [22]. A linker sequence (nucleotides 619–708) connects the N-terminal FN III repeat to a vWF A domain (nucleotides 709–1245) and to a cluster of seven FN III repeats already covered by the previously described undulin clones UN1 and UN2 [6]. In contrast to chicken CXIV [10], no evidence for alternative splicing at the 5′ end was found for human CXIV/undulin.
Fig. 2. 5′ RACE-PCR was used to complete the human CXIV/undulin sequence at the 5′ end. Placental or RD (rhabdomyosarcoma) cell mRNA was reverse transcribed using a gene specific primer located > 500 bp downstream of the 5′ ends of the previously characterized partial undulin clones UN1 (v) and UN2 (*). Subcloned RACE-PCR products were sequenced using the dideoxy chain termination method (TaqTrack or Silver Sequence, Promega). Amino acids of the N-terminal FN III repeat are in italics. A putative heparin binding site is boxed [20]. Sequences corresponding to gene specific primers used in 5′ RACE- (underlined) or long-distance PCR (thick underlined) are indicated.
Screening of a placental cDNA library (Stratagene) recovered overlapping phage clones (Fig. 1) that further extended the partial human undulin sequence
at the 3′ end. Conceptual translation (Fig. 3) revealed a second vWF A domain C-terminal to the cluster of FN III repeats, and a domain with homology to the globular N-terminal NC4 domain of the α1 chain of collagen type IX. C-terminal to the thus completed large N-terminal NC3 domain, additional domains with high homology to the COL2, NC2, and COL1 domains of chicken CXIV (Table 1) were found. Only the C-terminal NC1 domain displayed reduced homology between human and chicken. The completed cDNA sequence for human undulin provides conclusive evidence for the identity of human CXIV and undulin. To verify our 3′ sequence we performed 3′ RACE-PCR ([13]; Marathon Kit Manual, Clontech). One of the resulting differently sized 3′ RACE-PCR products confirmed the 3′ end (Fig. 3(A)) identified in phage screening. A second product was apparently generated by usage of an alternative poly-adenylation signal (Fig. 3(A), nucleotides 5928–5933). Surprisingly, a third product
encoded a variant NC1 domain (Fig. 3(B)), which shared only the first 11 amino acids (boxed in Fig. 3) with the prior NC1 domain. The two divergent 3′ sequences were confirmed by Northern analysis with 3′ end specific probes. All probes hybridized to mRNAs of about 6.5 kb, in line with previous results [6–8,10]. In addition, long-distance PCR [15] with the 5′ primer located at the translation start site (Fig. 2, nucleotides 265–291) and 3′ variant specific primers (nucleotides 6001–6022 in Fig. 3(A) and 5751–5772 in Fig. 3(B)), spanning the complete coding region of human CXIV, produced the expected PCR products (not shown).
Table 1
Similarity of corresponding human CXIV/undulin and chicken CXIV domains deduced from the cDNA sequences. For further details refer to the text. Note the lower percentage of amino acids identical to chicken CXIV (identity) in the putative signal peptide, linkers 1–3 of the NC3 domain, and the C-terminal NC1 domains.
Domain | Length (residues) | Position | Identity with chicken CXIV (%)
Putative signal peptide | 28 | 1–28 | 21
FN type III-1 | 90 | 29–118 | 82
Linker-1 | 29 | 119–147 | 39
vWF A-1 | 189 | 148–336 | 83
Linker-2 | 15 | 337–351 | 53
FN type III-2 | 90 | 352–441 | 77
FN type III-3 | 92 | 442–533 | 80
FN type III-4 | 89 | 534–622 | 67
FN type III-5 | 90 | 623–712 | 61
Linker-3 | 20 | 713–732 | 20
FN type III-6 | 95 | 733–827 | 81
FN type III-7 | 91 | 828–918 | 73
FN type III-8 | 88 | 919–1006 | 69
Linker-4 | 15 | 1007–1021 | 73
vWF A-2 | 190 | 1022–1211 | 87
NC4 (CIX)-like | 250 | 1212–1461 | 81
NC3 (complete) | 1433 | 29–1461 | 76
COL2 | 149 | 1462–1610 | 81
NC2 | 43 | 1611–1653 | 79
COL1 | 106 | 1654–1759 | 78
NC1 short variant | 21 | 1760–1780 | 52
NC1 long variant | 37 | 1760–1796 | 62
Length differences (human relative to chicken): human 7 residues shorter; human 3 residues longer; human 6 residues shorter; human 3 residues longer; human 7 residues shorter; human 3 residues shorter; human 67 residues shorter; human 82 residues shorter.
Fig. 4. Comparison of the N-terminal sequences of bovine, human, and chicken CXIV. Identical amino acids of the human sequences are in bold letters. (A): Alignment of the short NC1 variants and a partial bovine NC1 sequence [21]. The CXXXXC motif at the COL1–NC1 junction, thought to be characteristic for the FACIT collagen family, is well conserved. (B): Alignment of the larger NC1 variants. Amino acids of the larger chicken NC1 not included in the smaller chicken variant are in italics.
Analogous to human CXIV, there are also two NC1 variants of chicken CXIV [8]. Alignment of the corresponding human and chicken NC1 domains demonstrates that the human domains are much shorter (Fig. 4). Seven amino acids C-terminal to the conserved CXXXXC sequence motif at the COL1–NC1 junction, which is thought to be characteristic for FACIT
collagens, the NC1 sequences become dissimilar. However, following a gap, both human NC1 variants contain amino acid stretches with homology to the corresponding chicken NC1 domains. These sequences, which have been conserved between human and chicken, may be required for as yet unknown functions of the NC1 domain. Also, between chicken collagen types XII and XIV, the C-terminal NC1 domain displayed the highest sequence dissimilarity. Thus, it was speculated that the NC1 domain may play a role in selecting the correct type of FACIT collagen for association with a given collagen fibril [8]. Furthermore, the NC1 domain may assist in the assembly of appropriate chains during triple-helix formation [16–18]. The two variant NC1 domains observed in CXIV might affect these processes differently. In addition to the suspected functional role of the variant NC1 domains, the 3′ ends might contain
elements that are important for the regulation of CXIV gene expression. The optional usage of an alternative poly-adenylation signal within one 3′ sequence variant results in an mRNA with a shortened 3′ untranslated region (3′ UTR). Variations in the 3′ UTR have been shown to modulate mRNA stability and thus to modify gene expression [19]. Switching between alternative poly-adenylation sites could be one regulatory tool that cells use to achieve and maintain tissue- and development-specific CXIV gene expression. However, to date there is no experimental proof for this in the case of CXIV.
Supported by Grants Schu 646/1–4 and SFB 366 C5 from the Deutsche Forschungsgemeinschaft. D. Schuppan is recipient of a Hermann-and-Lilly-Schilling-professorship. This publication contains parts of the Ph.D. thesis of M. Bauer. We thank Dr. H.-D. Orzechowski for critically reading the manuscript.
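The start-codon argument given for Fig. 2 (an ATG at nucleotides 244–246 immediately followed by a stop codon, and a second ATG at 265–267 in a favorable context [14]) can be expressed as a small scan. The Python sketch below is illustrative only. It runs on a hypothetical toy sequence rather than the Fig. 2 sequence, and the function names are invented for this example. Reducing Kozak's consensus to a purine at position −3 and a G at position +4 is a simplifying assumption, not the criterion stated by the authors.

```python
"""Sketch: choose a candidate translation start codon in a cDNA 5' region.

Hypothetical example; coordinates are 1-based and refer to the A of each ATG.
"""

from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_positions(seq: str) -> list[int]:
    """Return the 1-based position of every ATG in seq."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def followed_by_stop(seq: str, pos: int) -> bool:
    """True if the codon directly after the ATG at 1-based pos is a stop codon."""
    seq = seq.upper()
    next_codon = seq[pos + 2:pos + 5]  # 0-based slice of the next in-frame codon
    return next_codon in STOP_CODONS


def kozak_features(seq: str, pos: int) -> tuple[bool, bool]:
    """Check for a purine at -3 and a G at +4, counting the A of ATG as +1.

    Using only these two positions is an assumption of this sketch.
    """
    seq = seq.upper()
    minus3 = seq[pos - 4] if pos >= 4 else ""
    plus4 = seq[pos + 2] if pos + 2 < len(seq) else ""
    return minus3 in ("A", "G"), plus4 == "G"


def first_candidate_start(seq: str) -> int | None:
    """Return the first ATG that has a full next codon which is not a stop."""
    for pos in atg_positions(seq):
        if len(seq) >= pos + 5 and not followed_by_stop(seq, pos):
            return pos
    return None


if __name__ == "__main__":
    toy = "CCATGTAGCCGCCATGGCT"  # hypothetical sequence, not taken from Fig. 2
    for pos in atg_positions(toy):
        print(pos, followed_by_stop(toy, pos), kozak_features(toy, pos))
    print("candidate start:", first_candidate_start(toy))
```

On the toy input the ATG at position 3 is skipped because TAG follows it, and the ATG at position 14 is reported with both context features present, which mirrors the pattern described for nucleotides 244 and 265.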
Fig. 3. The alternative 3′ sequences of human CXIV/undulin. Amino acid residues of collagen triple helical sequences are in italics. Identical residues of the variant C-terminal NC1 domains are boxed, and poly-adenylation signals are in lower case letters. Sequences corresponding to gene specific primers used in 3′ RACE- (underlined) or long-distance PCR (thick underlined) are indicated. (A): The 3′ sequence of the shorter CXIV/undulin variant. (v) marks the first nucleotide not included in the previously described undulin clone UN1 [6], and (*) marks the alternative polyadenylation site. (B): 3′ cDNA sequence of the long C-terminal NC1 variant.
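Fig. 3 marks poly-adenylation signals in lower case, and the text attributes the shorter 3′ RACE product to an alternative signal at nucleotides 5928–5933, a six-nucleotide stretch. A hexamer scan of this kind can be sketched in Python as follows. Restricting the search to AATAAA and its common variant ATTAAA, the function name, and the toy inputs are all assumptions of this sketch, not the authors' procedure.

```python
"""Sketch: locate candidate poly-adenylation signal hexamers in a cDNA 3' end.

Assumption: only AATAAA and ATTAAA are searched for; real 3' end processing
also depends on other sequence elements that this sketch ignores.
"""

from __future__ import annotations

import re

POLYA_HEXAMERS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str,
                       hexamers: tuple[str, ...] = POLYA_HEXAMERS) -> list[tuple[int, int, str]]:
    """Return (start, end, hexamer) tuples, 1-based and inclusive, sorted by start.

    Case is ignored, so lower-case signals (as marked in Fig. 3) are found too.
    A zero-width lookahead lets overlapping occurrences be reported.
    """
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        for match in re.finditer(f"(?={hexamer})", seq):
            start = match.start() + 1
            hits.append((start, start + len(hexamer) - 1, hexamer))
    return sorted(hits)


if __name__ == "__main__":
    toy = "gcaataaagctttattaaacg"  # hypothetical sequence, not taken from Fig. 3
    for start, end, hexamer in find_polya_signals(toy):
        print(f"{hexamer} at {start}-{end}")
```

The lookahead matters when signals overlap: in AATAAATAAA both AATAAA copies (positions 1–6 and 5–10) are reported, whereas a plain `re.finditer("AATAAA", ...)` would return only the first.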
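The nucleotide ranges given in the text and the residue positions in Table 1 are related through the translation start at nucleotide 265. Below is a minimal Python sketch of that conversion. It assumes 1-based, inclusive coordinates throughout, and the function names and the selection of Table 1 rows are illustrative. The sketch reproduces nucleotides 265–348 for the 28-residue signal peptide and 349–618 for the N-terminal FN III repeat (residues 29–118). It also shows that the upstream ATG at nucleotide 244 lies in the same reading frame as the initiator codon, 21 nucleotides downstream.

```python
"""Sketch: convert between cDNA nucleotide and protein residue coordinates.

Assumes, as stated in the text, that translation starts at nucleotide 265
(the A of the second ATG) and that both coordinate systems are 1-based and
inclusive. Function names are illustrative.
"""

from __future__ import annotations

TRANSLATION_START_NT = 265


def residues_to_nt(first_res: int, last_res: int,
                   start_nt: int = TRANSLATION_START_NT) -> tuple[int, int]:
    """Return the nucleotide span that encodes residues first_res..last_res."""
    if not 1 <= first_res <= last_res:
        raise ValueError("expected 1 <= first_res <= last_res")
    return start_nt + 3 * (first_res - 1), start_nt + 3 * last_res - 1


def nt_to_residue(nt: int, start_nt: int = TRANSLATION_START_NT) -> int:
    """Return the residue whose codon contains nucleotide nt."""
    if nt < start_nt:
        raise ValueError("nucleotide lies upstream of the translation start")
    return (nt - start_nt) // 3 + 1


def frame_offset(nt: int, start_nt: int = TRANSLATION_START_NT) -> int:
    """Return 0 if a codon beginning at nt is in frame with the start codon, else 1 or 2."""
    return (nt - start_nt) % 3


# A few rows of Table 1: (domain, length in residues, first residue, last residue).
TABLE_1_ROWS = [
    ("Putative signal peptide", 28, 1, 28),
    ("FN type III-1", 90, 29, 118),
    ("NC3 (complete)", 1433, 29, 1461),
    ("NC1 short variant", 21, 1760, 1780),
    ("NC1 long variant", 37, 1760, 1796),
]


def length_matches_span(length: int, first: int, last: int) -> bool:
    """Check that a tabulated length equals the inclusive residue span."""
    return last - first + 1 == length


if __name__ == "__main__":
    for name, length, first, last in TABLE_1_ROWS:
        print(name, residues_to_nt(first, last), length_matches_span(length, first, last))
    print("frame of upstream ATG at 244:", frame_offset(244))
```

Keeping the start nucleotide as a default parameter makes it easy to re-run the same conversion if a different initiator codon were assumed.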
References
[1] M. van der Rest, R. Garrone, FASEB J. 5 (1991) 2814–2823.
[2] L.M. Shaw, B.R. Olsen, Trends Biochem. Sci. 16 (1991) 191–194.
[3] D.R. Keene, G.P. Lunstrum, N.P. Morris, D.W. Stoddard, R.E. Burgeson, J. Cell Biol. 113 (1991) 971–978.
[4] X. Zhang, D. Schuppan, J. Becker, P. Reichart, H.R. Gelderblom, J. Histochem. Cytochem. 41 (1993) 245–251.
[5] D. Schuppan, M.C. Cantaluppi, J. Becker, A. Veit, T. Bunte, D. Troyer, F. Schuppan, M. Schmid, R. Ackermann, E.G. Hahn, J. Biol. Chem. 265 (1990) 8823–8832.
[6] M. Just, H. Herbst, M. Hummel, H. Dürkop, D. Tripier, H. Stein, D. Schuppan, J. Biol. Chem. 266 (1991) 17326–17332.
[7] J. Trueb, B. Trueb, Eur. J. Biochem. 207 (1992) 549–557.
[8] C. Wälchli, J. Trueb, B. Kessler, K.H. Winterhalter, B. Trueb, Eur. J. Biochem. 212 (1993) 483–490.
[9] J.C. Brown, K. Mann, H. Wiedemann, R. Timpl, J. Cell Biol. 120 (1993) 557–567.
[10] D.R. Gerecke, J.W. Foley, P. Castagnola, M. Gennari, B. Dublet, R. Cancedda, T.F. Linsenmayer, M. van der Rest, B.R. Olsen, M.K. Gordon, J. Biol. Chem. 268 (1993) 12177–12184.
[11] T. Ehnis, W. Dieterich, M. Bauer, B. von Lampe, D. Schuppan, Exp. Cell Res. 229 (1996) 388–397.
[12] M.A. Frohman, M.K. Dush, G.R. Martin, Proc. Natl. Acad. Sci. U.S.A. 85 (1988) 8998–9002.
[13] P.D. Siebert, A. Chenchik, D.E. Kellogg, K.A. Lukyanov, S.A. Lukyanov, Nucleic Acids Res. 23 (1995) 1087–1088.
[14] M. Kozak, Nucleic Acids Res. 15 (1987) 8125–8148.
[15] W.M. Barnes, Proc. Natl. Acad. Sci. U.S.A. 91 (1994) 2216–2220.
[16] J.C. Brown, R. Globik, K. Mann, R. Timpl, Matrix Biol. 14 (1994) 287–295.
[17] M. Mazzorana, H. Gruffat, A. Sergeant, M. van der Rest, J. Biol. Chem. 268 (1993) 3029–3032.
[18] A. Lesage, F. Penin, C. Geourjon, D. Marion, M. van der Rest, Biochemistry 35 (1996) 9647–9660.
[19] J.D. Lewis, S.I. Gunderson, I.W. Mattaj, J. Cell Sci. Supplement 19 (1995) 13–19.
[20] A.D. Cardin, H.J.R. Weintraub, Arteriosclerosis 9 (1989) 21–32.
[21] B. Dublet, M. van der Rest, J. Biol. Chem. 266 (1991) 6853–6858.
[22] T. Ehnis, W. Dieterich, M. Bauer, H. Kresse, D. Schuppan, J. Biol. Chem. 272 (1997) 20414–20419.