Biochimica et Biophysica Acta, 1173 (1993) 81-84
© 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00
BBAEXP 90493
Short Sequence-Paper
cDNA sequence for bovine biglycan (PGI) protein core
Maureen A. Torok, Suvia A.S. Evans and James A. Marcum
Department of Pathology, Beth Israel Hospital, Harvard Medical School, Boston, MA (USA)
(Received 23 December 1992)
Key words: Biglycan; cDNA; (Aorta); (Smooth muscle cell); (Bovine)

The nucleotide sequence of the protein core for bovine aortic smooth muscle cell biglycan was determined using recombinant DNA technology. Analysis of the deduced amino acid sequence for bovine biglycan revealed a striking homology, 94.6% and 95.7%, to human and rat biglycan, respectively. The bovine biglycan protein core has four potential O-linked and two potential N-linked glycosylation sites and is composed of 11 leucine-rich repeat units.
Correspondence to: J.A. Marcum, Department of Pathology, Beth Israel Hospital, 330 Brookline Ave., Boston, MA 02215, USA. The sequence data presented in this paper have been submitted to the EMBL/GenBank Data Libraries under the accession number L07953.

Sulfated proteoglycans are acidic macromolecules composed of glycosaminoglycans covalently attached, via an O-linkage, to protein cores [1]. The carbohydrate chains consist, in part, of chondroitin sulfate, dermatan sulfate, and heparan sulfate, and the core proteins range in molecular mass from about 10 000 to over 200 000 Da [2]. Neame et al. [3] determined the primary amino acid sequence of a secreted form of a small molecular weight proteochondroitin (biglycan, PG I) isolated from bovine articular cartilage. Fisher et al. [4] cloned the biglycan protein core from a human bone-derived cell cDNA library. The core proteins of bovine and human biglycan contain 331 and 368 amino acid residues, respectively [3,4]. Asundi et al. [5] demonstrated, by Northern analysis with a nonhomologous species cDNA probe, that rat aortic smooth muscle cells maintained in vitro and rat aortic medial tissue obtained in vivo synthesize a 2.9 kb transcript for the core protein of biglycan. Dreher et al. [6] isolated and sequenced clones from a rat aortic smooth muscle cell cDNA library that encode the protein core of biglycan. In the present report, the cDNA sequence for the core protein of bovine aortic smooth muscle cell biglycan was determined utilizing recombinant DNA technology.

Smooth muscle cells were isolated from bovine aorta by explant outgrowth [7] and grown in a humidified atmosphere of 5% CO2, 95% air on 100 mm plastic
culture dishes containing Dulbecco's modified minimal essential medium, 10% fetal bovine serum (Intergen), penicillin (100 units/ml), and streptomycin sulfate (100 µg/ml). Cells were fed every 2-3 days and passaged, after confluence, at a 1:5 split ratio using a 0.05% trypsin-0.2% EDTA solution. Cell type was confirmed by light and electron microscopy [8] and by muscle-actin immunofluorescence [9].

Total RNA was isolated from bovine aortic smooth muscle cells (passage 4) using guanidinium thiocyanate-phenol-chloroform single-step extraction (Stratagene), and poly(A)+ RNA was purified using oligo(dT)-cellulose spun columns (Pharmacia). Synthesis of cDNA was initiated by the addition of reverse transcriptase and oligo(dT)12-18 (Pharmacia) to the bovine poly(A)+ RNA. A 5' extension cDNA library was also constructed using a primer (5'-TTGGGGATGCCAGTGAGCT-3') consisting of nucleotides no. 660 to no. 678 (Fig. 1). Second strand synthesis of cDNA was performed with Escherichia coli DNA polymerase and RNase H (Pharmacia). Double-stranded cDNA was blunted with T4 DNA polymerase, coupled to EcoRI linkers containing a non-phosphorylated EcoRI overhang (Pharmacia), ligated into EcoRI-digested, alkaline phosphatase-treated λgt11 expression vector (Stratagene), and packaged in vitro.

E. coli strain Y1088 was infected with the bacteriophage and plated onto 150 mm Petri dishes. Plaque DNA was transferred, in duplicate, to NEN-Dupont Colony Plaque screen membranes, and filters were prehybridized at 68°C for 10 min in Quikhyb (Stratagene), employing an Autoblot hybridization oven (Bellco). A 2.1 kb contiguous cDNA fragment from the 3' end of rat biglycan [6] was labeled with [32P]ATP (NEN-Dupont) by random priming (Boehringer-Mannheim). Hybridization was conducted at 68°C for 60 min by adding the denatured double-stranded probe (1.25 × 10^6 cpm/ml) to the Quikhyb solution containing the prehybridized filters. After hybridization, the filters were washed twice with 2 × SSC and 0.1% SDS for 15 min at 22°C and then once with 0.1 × SSC and 0.1% SDS for 30 min at 60°C. Filters were exposed overnight to X-OMAT film (Kodak) by autoradiography. Positive plaques were purified by additional rounds of screening.

Phage DNA was isolated from plaques using LambdaSorb Phage Adsorbent (Promega). Inserts were removed with EcoRI restriction enzyme and subcloned into pBluescript II KS(-) (Stratagene) or pGEM3Zf(-) (Promega). The recombinant plasmids were used to transform E. coli DH5α cells (Gibco-BRL). Positive transformants were grown in Circle Grow (Bio-101) containing ampicillin (50 µg/ml), and plasmid DNA was isolated using Plasmid Quik mini-columns (Stratagene). Nested deletions of intact plasmid clones were generated using Erase-a-Base System (Promega). The DNA sequence of three overlapping clones was determined by the Sanger dideoxy chain-termination reaction using Sequenase Version 2.0 (USB) and [35S]ATP (NEN-Dupont). Reaction products were electrophoresed on 6% polyacrylamide-8 M urea buffer gradient gels. After drying, the gel was exposed to film by autoradiography.

The sequence for the 2043 nucleotides of bovine aortic smooth muscle biglycan core protein is shown in Fig. 1. Computer-assisted analysis of the nucleotide sequence revealed an open reading frame of 369 amino acids, corresponding to a molecular mass of 41 589 Da for the complete protein core of bovine biglycan. The size of bovine biglycan protein core is similar to those reported for human [4] and rat [6], and comparison of the deduced amino acid sequences of human [10] and rat [6] biglycan with bovine biglycan revealed a striking homology of 94.6% and 95.7%, respectively. The core protein of bovine biglycan contains four possible O-linked and two possible N-linked glycosylation sites (Fig. 1). Based on amino acid sequence analysis data, Fisher et al. [11] and Neame et al. [3] have proposed that the two O-linked glycosylation sites near the amino-terminus are substituted with glycosaminoglycans (Fig. 1). In addition, bovine biglycan contains eleven leucine-rich repeat units with a consensus sequence of LXXLXLLXNXHXXHPXXXXHX (X denotes any amino acid, H represents hydrophobic amino acids, and lower case letters denote the predominant amino acid). Although there are slight variations, the composition
and number of leucine-rich repeat units for bovine biglycan are similar to those described for human [4] and rat [6].

The deduced amino acid sequence from the nucleotide sequence of bovine aortic smooth muscle cell biglycan is almost identical to the primary amino acid sequence published for the secreted form of bovine articular cartilage biglycan by Neame et al. [3]. Only one difference between the two amino acid sequences was detected: residue no. 151 was Cys (Fig. 1) instead of Glu [3]. This is consistent with the deduced amino acid sequences for the rat [6,10] and human [4] biglycan core proteins, which also contain Cys at this position. In addition, the deduced amino acid sequence for the bovine biglycan protein core shown in Fig. 1 contains the amino acid sequence prior to the amino-terminus for the secreted form of bovine biglycan [3]. Fisher et al. [4] proposed that the 37 amino acids prior to the amino-terminus for the secreted form of human biglycan are composed of a prepeptide (residues No. 1 to No. 19) and a propeptide (residues No. 20 to No. 37), based upon distribution of charged amino acids. Analysis of biglycan secreted by bovine and rat aortic smooth muscle cells maintained under in vitro conditions reveals an additional species of biglycan that contains Leu (residue No. 17) as the amino-terminus [12], supporting the proposal that biglycan contains a prepeptide and a propeptide [4]. The above data suggest, however, that the prepeptide and the propeptide for rat and bovine biglycan are residues No. 1 to No. 16 and No. 17 to No. 37, respectively. Based upon these data, the molecular masses for the forms of bovine biglycan secreted by aortic smooth muscle cells, containing the amino-termini Leu and Asp, are 39 825 and 37 326 Da, respectively.

Fig. 1. cDNA sequence and deduced amino acid sequence for the protein core of bovine biglycan. Lower case letters correspond to untranslated nucleotide sequence, while upper case letters represent an open reading frame. Solid circles and triangles indicate potential O-linked and N-linked oligosaccharide attachment sites, respectively. Potential polyadenylation signal is underlined. [Sequence listing not reproduced; the sequence is deposited under EMBL/GenBank accession number L07953.]
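The sequence-derived figures quoted in the text (369 residues, two potential N-linked sites, four potential O-linked sites, and the molecular masses of the precursor and of the forms beginning at Leu 17 and Asp 38) can be re-derived from the bovine row of Fig. 2. The sketch below is illustrative only and is not the software used in the paper. It assumes standard average residue masses, takes N-linked candidates to be Asn-X-Ser/Thr sequons (X not Pro), and takes O-linked candidates to be Ser-Gly dipeptides; both site rules and all function names are assumptions made for this example.

```python
import re

# Bovine biglycan precursor (369 residues), transcribed from the bovine (b)
# row of Fig. 2 in 68-residue blocks.
BOVINE_BIGLYCAN = (
    "MWPLWPLAALLALSQALPFEQKAFWDFTLDDGLPMLNDEEASGAETTSGIPDLDSLPPTYSAMCPFGC"
    "HCHLRVVQCSDLGLKAVPKEISPDTTLLDLQNNDISELRKDDFKGLQHLYALVLVNNKISKIHEKAFS"
    "PLRKLQKLYISKNHLVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLKNMNCIEMGGNPLENSGFEPG"
    "AFDGLKLNYLRISEAKLTGIPKDLPETLNELHLDHNKIQAIELEDLLRYSKLYRLGLGHNQIRMIENG"
    "SLSFLPTLRELHLDNNKLSRVPAGLPDLKLLQVVYLHTNNITKVGVNDFCPVGFGVKRAYYNGISLFN"
    "NPVPYWEVQPATFRCVTDRLAIQFGNYKK"
)

# Average residue masses in Da. ASSUMPTION: textbook values; the paper does
# not state which mass table its analysis software used.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per free polypeptide chain


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def n_linked_sequons(seq):
    """1-based positions of Asn in Asn-X-Ser/Thr sequons (X is not Pro)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


def ser_gly_sites(seq):
    """1-based positions of Ser in Ser-Gly dipeptides.

    ASSUMPTION: Ser-Gly is used here as a simple heuristic for potential
    O-linked (glycosaminoglycan) attachment sites.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=SG)", seq)]


if __name__ == "__main__":
    seq = BOVINE_BIGLYCAN
    print("length:", len(seq))
    print("N-linked candidates (Asn):", n_linked_sequons(seq))
    print("Ser-Gly candidates (Ser):", ser_gly_sites(seq))
    # Residue No. 17 (Leu) and No. 38 (Asp) are the two amino-termini
    # discussed in the text; Python slices are 0-based.
    for label, start in (("precursor", 1), ("from Leu 17", 17), ("from Asp 38", 38)):
        print(f"{label}: {average_mass(seq[start - 1:]):.0f} Da")
```

Under these assumptions the scan reports Asn 271 and Asn 312 as N-linked candidates and Ser 42, 48, 181 and 199 as Ser-Gly candidates (precursor numbering), matching the counts of two and four given in the text. The computed masses depend on the mass table chosen and may differ slightly from the published values.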
Dreher et al. [6] identified a hypervariable region (residues No. 44 to No. 60) of the core proteins for rat, bovine and human biglycan that contains an unusually high degree of heterogeneity when compared to the remaining amino acid sequence. Examination of the prepeptide amino acid sequence for bovine, rat, and human biglycan also reveals an additional, but shorter, region with a high degree of heterogeneity when compared to the remaining amino acid sequence (Fig. 2). For amino acid residues No. 2 to No. 9 of the biglycan protein core, there are 3 substitutions for the human species (62.5% homology) and 4 substitutions for the rat species (50% homology), when compared to the bovine species. Two of the amino acid changes are conservative, i.e., variation only in the aliphatic amino acid residue, while the other substitutions are changes in the type of amino acid residue (Fig. 2).

Comparison of the nucleotide sequence for the bovine biglycan core protein with those of human and rat revealed a striking homology of 81.6% and 83.0%, respectively. The greatest level of conservation was within the coding region, although selected regions of the 3' and 5' untranslated sequence did exhibit high homology. For example, the nucleotide sequence surrounding the potential polyadenylation signal (Fig. 1) was highly homologous among the three species. Dreher et al. [6] reported the presence of (CT)22 and (AC)38 dinucleotide repeats within the 3' untranslated region of the nucleotide sequence for rat biglycan. No such dinucleotide repeats were observed in the nucleotide sequence for bovine biglycan or reported for the human biglycan nucleotide sequence [4].

b  MWPLWPLAALLALSQALPFEQKAFWDFTLDDGLPMLNDEEASGAETTSGIPDLDSLPPTYSAMCPFGC   68
b  HCHLRVVQCSDLGLKAVPKEISPDTTLLDLQNNDISELRKDDFKGLQHLYALVLVNNKISKIHEKAFS  136
b  PLRKLQKLYISKNHLVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLKNMNCIEMGGNPLENSGFEPG  204
b  AFDGLKLNYLRISEAKLTGIPKDLPETLNELHLDHNKIQAIELEDLLRYSKLYRLGLGHNQIRMIENG  272
b  SLSFLPTLRELHLDNNKLSRVPAGLPDLKLLQVVYLHTNNITKVGVNDFCPVGFGVKRAYYNGISLFN  340
b  NPVPYWEVQPATFRCVTDRLAIQFGNYKK                                         369
[Human (h) and rat (r) substitution rows not recoverable.]

Fig. 2. Comparison of bovine (b), human (h) and rat (r) biglycan deduced amino acid sequences. Human and rat sequences are taken from [10] and [6], respectively. Asterisk denotes missing amino acid in human sequence.

We thank Dr. Kevin Dreher for the generous gift of rat biglycan cDNA, Dr. Tet-Kin Yeo for critical reading of the manuscript, and Dr. Thomas Graf from the
Molecular Biology Computer Research Resource, Dana-Farber Cancer Institute, Boston, MA for assistance with computer analysis. This work was supported by an American Heart Association Grant-in-Aid (No. 900735) and by the Beth Israel Hospital Pathology Foundation.

References
1 Hassell, J.R., Kimura, J. and Hascall, V.C. (1986) Annu. Rev. Biochem. 55, 539-567.
2 Ruoslahti, E. (1988) Annu. Rev. Cell Biol. 4, 229-255.
3 Neame, P.J., Choi, H.U. and Rosenberg, L.C. (1989) J. Biol. Chem. 264, 8653-8661.
4 Fisher, L.W., Termine, J.D. and Young, M.F. (1989) J. Biol. Chem. 264, 4571-4576.
5 Asundi, V., Cowan, K., Matzura, D., Wagner, W. and Dreher, K.L. (1990) Eur. J. Cell Biol. 52, 98-104.
6 Dreher, K.L., Asundi, V., Matzura, D. and Cowan, K. (1990) Eur. J. Cell Biol. 53, 296-304.
7 Fritze, L.M., Reilly, C.F. and Rosenberg, R.D. (1985) J. Cell Biol. 100, 1041-1049.
8 Chamley-Campbell, J., Campbell, G.R. and Ross, R. (1979) Physiol. Rev. 59, 1-61.
9 Libby, P., Warner, S.J.C., Salomon, R.N. and Birinyi, L.K. (1988) N. Engl. J. Med. 318, 1493-1498.
10 Fisher, L.W., Heegaard, A.-M., Vetter, U., Vogel, W., Just, W., Termine, J.D. and Young, M.F. (1991) J. Biol. Chem. 266, 14371-14377.
11 Fisher, L.W., Hawkins, G.R., Tuross, N. and Termine, J.D. (1987) J. Biol. Chem. 262, 9702-9708.
12 Marcum, J.A. and Thompson, M.A. (1991) Biochem. Biophys. Res. Commun. 175, 706-712.