Collagen type I
Type I collagen is the major fibrillar collagen of bone, tendon and skin; it provides these and many other tissues with tensile strength. Type I collagen forms rope-like structures in tendon and sheet-like structures in skin, and in bone it is reinforced with calcium hydroxyapatite. More than one hundred mutations have been observed in the COL1A1 and COL1A2 genes, resulting in various forms of osteogenesis imperfecta (brittle bone disease) and in Ehlers-Danlos syndrome type VII, which is characterized by joint hypermobility and skin fragility. A mouse strain (Mov13) that is homozygous for a viral insertion in the first intron of the COL1A1 gene is essentially a COL1A1 knockout, and the mutation is lethal.
Molecular structure
Type I collagen is synthesized primarily as a heterotrimeric procollagen comprising two proα1(I) chains and one proα2(I) chain. The procollagen is processed extracellularly by N- and C-proteinases to give a triple-helical molecule that can assemble with a stagger of 234 amino acids into cross-banded fibrils with a 67 nm (D) periodicity. These fibrils are stabilized by intermolecular cross-links derived from specific lysine/hydroxylysine residues in both the non-helical (telopeptide) and helical domains. The fibrillar form of type I collagen interacts with the protein cores of the proteoglycans decorin and fibromodulin. The triple-helical domain interacts with cells via integrin receptors, principally α2β1, and a tetrapeptide Asp-Gly-Glu-Ala (DGEA) has been reported to account for this binding. A homopolymer comprising three identical α1(I) chains (designated collagen type I trimer) has also been observed in fetal and diseased tissues, but it is not a significant component of normal adult tissues.
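As a rough consistency check on these figures, the 234-residue stagger and the 67 nm D period together imply an average axial rise per residue, which in turn gives an estimate of the length of the triple helix. The sketch below is arithmetic only and assumes a uniform rise along the molecule; the helical-domain boundaries (residues 179-1192 of the proα1(I) chain) are taken from the structural sites listed for that chain, and the variable names are illustrative.

```python
# Back-of-envelope geometry of the type I collagen fibril.
# Assumption: the axial rise per residue is uniform along the triple helix.

D_PERIOD_NM = 67.0       # D periodicity of the cross-banded fibril
STAGGER_RESIDUES = 234   # axial stagger between neighbouring molecules

# Helical domain of the pro-alpha1(I) chain (inclusive residue numbers).
HELIX_START, HELIX_END = 179, 1192

rise_per_residue_nm = D_PERIOD_NM / STAGGER_RESIDUES
helix_residues = HELIX_END - HELIX_START + 1
helix_length_nm = helix_residues * rise_per_residue_nm
helix_length_in_d = helix_residues / STAGGER_RESIDUES

print(f"rise per residue : {rise_per_residue_nm:.3f} nm")
print(f"helical residues : {helix_residues}")
print(f"helix length     : {helix_length_nm:.0f} nm ({helix_length_in_d:.2f} D)")
```

Under these assumptions the helix spans a little over four D periods, so it is not a whole multiple of the stagger.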
Isolation
Type I procollagen can be isolated from cultured fibroblasts. Type I collagen can be prepared from fetal skin in its intact, processed form by extraction with 0.5 M acetic acid, or in a slightly shorter form by pepsin digestion.
Primary structure: α1(I) chain

Ala A 141   Cys C  18   Asp D  66   Glu E  75
Phe F  27   Gly G 391   His H   9   Ile I  24
Lys K  58   Leu L  48   Met M  13   Asn N  28
Pro P 278   Gln Q  48   Arg R  71   Ser S  60
Thr T  44   Val V  46   Trp W   6   Tyr Y  13
Mol. wt (calc.) = 138 731   Residues = 1464

   1  MFSFVDLRLL LLLAATALLT HGQEEGQVEG QDEDIPPITC VQNGLRYHDR
  51  DVWKPEPCRI CVCDNGKVLC DDVICDETKN CPGAEVPEGE CCPVCPDGSE
 101  SPTDQETTGV EGPKGDTGPR GPRGPAGPPG RDGIPGQPGL PGPPGPPGPP
 151  GPPGLGGNFA PQLSYGYDEK STGGISVPGP MGPSGPRGLP GPPGAPGPQG
 201  FQGPPGEPGE PGASGPMGPR GPPGPPGKNG DDGEAGKPGR PGERGPPGPQ
 251  GARGLPGTAG LPGMKGHRGF SGLDGAKGDA GPAGPKGEPG SPGENGAPGQ
 301  MGPRGLPGER GRPGAPGPAG ARGNDGATGA AGPPGPTGPA GPPGFPGAVG
 351  AKGEAGPQGP RGSEGPQGVR GEPGPPGPAG AAGPAGNPGA DGQPGAKGAN
 401  GAPGIAGAPG FPGARGPSGP QGPGGPPGPK GNSGEPGAPG SKGDTGAKGE
 451  PGPVGVQGPP GPAGEEGKRG ARGEPGPTGL PGPPGERGGP GSRGFPGADG
 501  VAGPKGPAGE RGSPGPAGPK GSPGEAGRPG EAGLPGAKGL TGSPGSPGPD
 551  GKTGPPGPAG QDGRPGPPGP PGARGQAGVM GFPGPKGAAG EPGKAGERGV
 601  PGPPGAVGPA GKDGEAGAQG PPGPAGPAGE RGEQGPAGSP GFQGLPGPAG
 651  PPGEAGKPGE QGVPGDLGAP GPSGARGERG FPGERGVQGP PGPAGPRGAN
 701  GAPGNDGAKG DAGAPGAPGS QGAPGLQGMP GERGAAGLPG PKGDRGDAGP
 751  KGADGSPGKD GVRGLTGPIG PPGPAGAPGD KGESGPSGPA GPTGARGAPG
 801  DRGEPGPPGP AGFAGPPGAD GQPGAKGEPG DAGAKGDAGP PGPAGPAGPP
 851  GPIGNVGAPG AKGARGSAGP PGATGFPGAA GRVGPPGPSG NAGPPGPPGP
 901  AGKEGGKGPR GETGPAGRPG EVGPPGPPGP AGEKGSPGAD GPAGAPGTPG
 951  PQGIAGQRGV VGLPGQRGER GFPGLPGPSG EPGKQGPSGA SGERGPPGPM
1001  GPPGLAGPPG ESGREGAPGA EGSPGRDGSP GAKGDRGETG PAGPPGAPGA
1051  PGAPGPVGPA GKSGDRGETG PAGPAGPVGP AGARGPAGPQ GPRGDKGETG
1101  EQGDRGIKGH RGFSGLQGPP GPPGSPGEQG PSGASGPAGP RGPPGSAGAP
1151  GKDGLNGLPG PIGPPGPRGR TGDAGPVGPP GPPGPPGPPG PPSAGFDFSF
1201  LPQPPQEKAH DGGRYYRADD ANVVRDRDLE VDTTLKSLSQ QIENIRSPEG
1251  SRKNPARTCR DLKMCHSDWK SGEYWIDPNQ GCNLDAIKVF CNMETGETCV
1301  YPTQPSVAQK NWYISKNPKD KRHVWFGESM TDGFQFEYGG QGSDPADVAI
1351  QLTFLRLMST EASQNITYHC KNSVAYMDQQ TGNLKKALLL KGSNEIEIRA
1401  EGNSRFTYSV TVDGCTSHTG AWGKTVIEYK TTKTSRLPII DVAPLDVGAP
1451  DQEFGFDVGP VCFL

Structural and functional sites
Signal peptide: 1-22
N-Propeptide: 23-161
von Willebrand factor C repeat: 35-103
COL2 domain: 109-159
N-Telopeptide: 162-178
Helical domain: 179-1192
C-Telopeptide: 1193-1218
C-Propeptide: 1219-1464
Lysine/hydroxylysine cross-linking sites: 170, 265, 1108, 1208
Potential N-linked glycosylation site: 1365
N-Proteinase cleavage site: 161-162
C-Proteinase cleavage site: 1218-1219
DGEA cell adhesion site: 613-616
Mammalian collagenase cleavage site: 953-954
Accession number P02464, P08123
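The site positions listed for the α1(I) chain can be cross-checked against the sequence listing by hand, or with a small helper such as the one sketched here. The function name `residues` and its arguments are illustrative and not part of any library; the two rows are copied from the proα1(I) listing (the rows that begin at positions 601 and 951), with residues numbered from 1 as in the entry.

```python
# Look up residues by their 1-based sequence number in one printed row.
# Each row of the listing starts at a stated position and is split into
# blocks of ten residues separated by spaces.

def residues(row: str, row_start: int, first: int, last: int) -> str:
    """Return residues first..last (inclusive, 1-based) from one printed row."""
    seq = row.replace(" ", "")
    if not (row_start <= first <= last < row_start + len(seq)):
        raise ValueError("requested range lies outside this row")
    return seq[first - row_start : last - row_start + 1]

# Rows starting at positions 601 and 951 of the pro-alpha1(I) sequence.
ROW_601 = "PGPPGAVGPA GKDGEAGAQG PPGPAGPAGE RGEQGPAGSP GFQGLPGPAG"
ROW_951 = "PQGIAGQRGV VGLPGQRGER GFPGLPGPSG EPGKQGPSGA SGERGPPGPM"

dgea = residues(ROW_601, 601, 613, 616)         # DGEA cell adhesion site
collagenase = residues(ROW_951, 951, 953, 954)  # residues flanking the cleaved bond

print(dgea, collagenase)
```

With these rows the lookup returns Asp-Gly-Glu-Ala for 613-616 and Gly-Ile for 953-954, in agreement with the site list.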
Primary structure: α2(I) chain

Ala A 126   Cys C   9   Asp D  42   Glu E  67
Phe F  23   Gly G 382   His H  17   Ile I  31
Lys K  50   Leu L  62   Met M  10   Asn N  42
Pro P 230   Gln Q  32   Arg R  72   Ser S  51
Thr T  44   Val V  55   Trp W   5   Tyr Y  16
Mol. wt (calc.) = 129 357   Residues = 1366

   1  MLSFVDTRTL LLLAVTLCLA TCQSLQEETV RKGPAGDRGP RGERGPPGPP
  51  GRDGEDGPTG PPGPPGPPGP PGLGGNFAAQ YDGKGVGLGP GPMGLMGPRG
 101  PPGAAGAPGP QGFQGPAGEP GEPGQTGPAG ARGPAGPPGK AGEDGHPGKP
 151  GRPGERGVVG PQGARGFPGT PGLPGFKGIR GHNGLDGLKG QPGAPGVKGE
 201  PGAPGENGTP GQTGARGLPG ERGRVGAPGP AGARGSDGSV GPVGPAGPNG
 251  SAGPPGFPGA PGPKGEIGAV GNAGPTGPAG PRGEVGLPGL SGPVGPPGNP
 301  GANGLTGAKG AAGLPGVAGA PGLPGPRGIP GPPGAAGTTG ARGLVGEPGP
 351  AGSKGESGNK GEPGSAGPQG PPGPSGEEGK RGPNGEAGSA GPPGPPGLRG
 401  SPGSRGLPGA DGRAGVMGPP GSRGASGPAG VRGPNGDAGR PGEPGLMGPR
 451  GLPGSPGNIG PAGKEGPVGL PGIDGRPGPI GPVGARGEPG NIGFPGPKGP
 501  TGDPGKNGDK GHAGLAGARG APGPDGNNGA QGPPGPQGVQ GGKGEQGPAG
 551  PPGFQGLPGP SGPAGEVGKP GERGLHGEFG LPGPAGPRGE RGPPGESGAA
 601  GPTGPIGSRG PSGPPGPDGN KGEPGVVGAV GTAGPSGPSG LPGERGAAGI
 651  PGGKGEKGEP GLRGEIGNPG RDGARGAHGA VGAPGPAGAT GDRGEAGAAG
 701  PAGPAGPRGS PGERGEVGPA GPNGFAGPAG AAGQPGAKGE RGGKGPKGEN
 751  GVVGPTGPVG AAGPAGPNGP PGPAGSRGDG GPPGMTGFPG AAGRTGPPGP
 801  SGISGPPGPP GPAGKEGLRG PRGDQGPVGR TGEVGAVGPP GFAGEKGPSG
 851  EAGTAGPPGT PGPQGLLGAP GILGLPGSRG ERGLPGVAGA VGEPGPLGIA
 901  GPPGARGPPG AVGSPGVNGA PGEAGRDGNP GNDGPPGRDG QPGHKGERGY
 951  PGNIGPVGAA GAPGPHGPVG PAGKHGNRGE TGPSGPVGPA GAVGPRGPSG
1001  PQGIRGDKGE PGEKGPRGLP GFKGHNGLQG LPGIAGHHGD QGAPGSVGPA
1051  GPRGPAGPSG PAGKDGRTGH PGTVGPAGIR GPQGHQGPAG PPGPPGPLGP
1101  LGVSGGGYDF GYDGDFYRAD QPRSAPSLRP KDYEVDATLK SLNNQIETLL
1151  TPEGSRKNPA RTCRDLRLSH PEWSSGYYWI DPNQGCTMEA IKVYCDFPTG
1201  ETCIRAQPEN IPAKNWYRSS KDKKHVWLGE TINAGSQFEY NVEGVTSKEM
1251  ATQLAFMRLL ANYASQNITY HCKNSIAYMD EETGNLKKAV ILQGSNDVEL
1301  VAEGNSRFTY TVLVDGCSKK TNEWGKTIIE YKTNKPSRLP FLDIAPLDIG
1351  GADHEFFVDI GPVCFK

Structural and functional sites
Signal peptide: 1-22
N-Propeptide: 23-79
COL2 domain: 33-77
N-Telopeptide: 80-90
Helical domain: 91-1102
C-Telopeptide: 1103-1119
C-Propeptide: 1120-1366
Lysine/hydroxylysine cross-linking sites: 84, 177, 1023
Histidine cross-linking site: 182
Hydroxylysine glycosylation sites: 177, 264
Potential N-linked glycosylation site: 1267
N-Proteinase cleavage site: 79-80
C-Proteinase cleavage site: 1119-1120
Mammalian collagenase cleavage site: 865-866
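The amino acid compositions given for the two chains can be sanity-checked by confirming that the per-residue counts add up to the stated chain lengths. The sketch below does this and also reports the glycine and proline fractions; the dictionaries are transcribed from the composition tables of this entry, and the names `ALPHA1`, `ALPHA2` and `fraction` are illustrative.

```python
# Amino acid compositions transcribed from the two composition tables
# (one-letter codes; counts refer to the full pro-chains, signal peptide
# and propeptides included).
ALPHA1 = {"A": 141, "C": 18, "D": 66, "E": 75, "F": 27, "G": 391, "H": 9,
          "I": 24, "K": 58, "L": 48, "M": 13, "N": 28, "P": 278, "Q": 48,
          "R": 71, "S": 60, "T": 44, "V": 46, "W": 6, "Y": 13}
ALPHA2 = {"A": 126, "C": 9, "D": 42, "E": 67, "F": 23, "G": 382, "H": 17,
          "I": 31, "K": 50, "L": 62, "M": 10, "N": 42, "P": 230, "Q": 32,
          "R": 72, "S": 51, "T": 44, "V": 55, "W": 5, "Y": 16}

def fraction(composition: dict, residue: str) -> float:
    """Fraction of the chain made up of the given residue."""
    return composition[residue] / sum(composition.values())

for name, comp, stated in (("alpha1(I)", ALPHA1, 1464), ("alpha2(I)", ALPHA2, 1366)):
    total = sum(comp.values())
    status = "matches" if total == stated else "differs from"
    print(f"{name}: {total} residues ({status} stated {stated}), "
          f"Gly {fraction(comp, 'G'):.1%}, Pro {fraction(comp, 'P'):.1%}")
```

Glycine comes out somewhat below one third of each chain, which is expected because the counts include the non-helical signal, propeptide and telopeptide regions as well as the Gly-X-Y helical domain.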
Gene structure
The proα1(I) and proα2(I) chains are encoded by single genes located on human chromosomes 17 (locus q21.3-22) and 7 (locus q21.3-22), respectively. The proα1(I) gene contains 51 exons and the proα2(I) gene 52 exons. All the exons encoding the triple-helical domain are multiples of 9 bp, corresponding to Gly-X-Y triplets (commonly 54 bp). The exon arrangement within the uninterrupted triple-helical domain [exons 7-47 for α1(I), exons 7-48 for α2(I)] is almost identical in the two genes and to that of exons 9-50 in the α1(II) gene. In cartilage, the use of a different transcription start site within intron 2 of the chick proα2(I) gene results in exons 1 and 2 being replaced by a new exon of 96 bp. The resulting transcript contains several open reading frames that are out of frame with the α2(I) coding sequence and therefore encode non-collagenous proteins, one of which appears to be a DNA-binding protein. Whether the α1(I) chain of collagen type I trimer is the same gene product as that in the heterotrimer or a distinct one is still controversial.
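The 9 bp rule follows directly from the genetic code: each Gly-X-Y triplet is three residues, and each residue is encoded by three bases. The sketch below works this out for the common 54 bp exon. The function name is illustrative, and the other exon sizes in the loop are simply further multiples of 9 chosen as examples, not values taken from this entry.

```python
BASES_PER_CODON = 3
RESIDUES_PER_TRIPLET = 3   # Gly-X-Y
BASES_PER_TRIPLET = BASES_PER_CODON * RESIDUES_PER_TRIPLET   # 9 bp

def triplets_in_exon(exon_bp: int) -> int:
    """Number of complete Gly-X-Y triplets encoded by a helical-domain exon."""
    if exon_bp <= 0 or exon_bp % BASES_PER_TRIPLET != 0:
        raise ValueError(f"{exon_bp} bp is not a positive multiple of 9 bp")
    return exon_bp // BASES_PER_TRIPLET

# 54 bp is the size named in the text; 108 and 162 are illustrative multiples.
for exon_bp in (54, 108, 162):
    n = triplets_in_exon(exon_bp)
    print(f"{exon_bp} bp -> {n} triplets -> {n * RESIDUES_PER_TRIPLET} residues")
```

A 54 bp exon therefore encodes six Gly-X-Y triplets, or 18 residues, and because every such exon starts and ends on a triplet boundary the Gly-X-Y register is preserved from one exon to the next.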
References
Kadler, K.E. (1995) Extracellular matrix 1: fibril-forming collagens. In: Protein Profile, Vol. 2, ed. Sheterline, P., Academic Press, London, pp. 491-619.
Kivirikko, K.I. (1993) Collagens and their abnormalities in a wide spectrum of diseases. Ann. Med. 25: 113-126.
Fietzek, P.P. and Kuhn, K. (1976) The primary structure of collagen. Int. Rev. Connective Tissue Res. 7: 1-60.
Schupp-Byrne, D.E. and Church, R.L. (1982) Embryonic collagen (type I-trimer) α1-chains are genetically distinct from type I collagen α1-chains. Collagen Rel. Res. 2: 481-494.
Kuhn, K. (1987) The classical collagens: types I, II and III. In: Structure and Function of Collagen Types, ed. Mayne, R. and Burgeson, R.E., Academic Press, London, pp. 1-42.
Vuorio, E. and de Crombrugghe, B. (1990) The family of collagen genes. Annu. Rev. Biochem. 59: 837-872.
Staatz, W.D. et al. (1991) Identification of a tetrapeptide recognition sequence for the α2β1 integrin in collagen. J. Biol. Chem. 266: 7363-7367.
Miller, E.J. and Rhodes, R.K. (1982) Preparation and characterization of the different types of collagen. Methods Enzymol. 82: 33-64.
Bennett, V. and Adams, S.L. (1990) Identification of a cartilage-specific promoter within intron 2 of the chick α2(I) collagen gene. J. Biol. Chem. 265: 2223-2230.
Sandell, L.J. and Boyd, C.D. (1990) Conserved and divergent sequence and functional elements within collagen genes. In: Extracellular Matrix Genes, ed. Sandell, L.J. and Boyd, C.D., Academic Press, New York, pp. 1-56.
Chu, M.L. and Prockop, D.J. (1993) Collagen: Gene structure. In: Connective Tissue and Its Heritable Disorders, ed. Steinmann, B. and Royce, P., Wiley-Liss, New York, pp. 149-165.