Int. J. Biochem. Cell Biol. Vol. 28, No. 5, pp. 531-542, 1996
Copyright © 1996 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
1357-2725/96 $15.00 + 0.00
Pergamon
1357-2725(95)00167-0

MUHAMMAD M. BASHIR,1 MING-DA HAN,1 WILLIAM R. ABRAMS,1 THOMAS TUCKER,1 RONG-INE MA,1 MARK GIBSON,2 TIM RITTY,3 ROBERT MECHAM,3 JOEL ROSENBLOOM1*

1Department of Anatomy and Histology, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, U.S.A., 2Department of Pathology, University of Adelaide, Adelaide, Australia, and 3Department of Cell Biology, Washington University, St Louis, MO 63110, U.S.A.

Transforming growth factor (TGF)-β is secreted as an inactive complex, which frequently contains a large molecular weight binding protein designated latent TGF-β-binding protein (LTBP). Recently, the LTBPs have been shown to be a gene family that contains three known members and exhibits a multidomain structure containing cysteine-rich motifs that are also found in the fibrillin gene family. The present work seeks to characterize the LTBP-2 gene and to compare its features to those of the other LTBPs and to the fibrillins. [...] libraries were used to isolate cDNA encoding LTBP-2, which was then used to isolate the corresponding LTBP-2 gene. The cloned cDNA encodes a 195 kDa [...] protein [...] 20 epidermal growth factor (EGF)-like repeats, three repeats containing eight cysteines, and one segment that appears to be a hybrid of the two. Single exons encode EGF repeats while the eight-cysteine repeats are encoded in two exons. [...] two transcripts of 7.5 and 9.0 kb, with the presently analyzed cDNA [...] the 7.5 kb transcript. Phylogenetic sequence comparisons [...] to LTBP-1 than LTBP-2, while LTBP-2 shows the most [...] suggest that LTBP-1 diverged from LTBP-3, and that LTBP-2 diverged from LTBP-1. Within the fibrillin family, fibrillin-1 is nearest to the LTBPs. While the domain structure of LTBP-2 is similar to that of the other LTBPs, LTBP-2 has unique regions that make it the largest member of the LTBP family. LTBP-2 may have dual functions as a member of the TGF-β latent complex and as a structural component of microfibrils. Copyright © 1996 Elsevier Science Ltd

Keywords: TGF-β binding proteins; Elastic microfibrils; Fibrillin; Epidermal growth factor-like repeats

Int. J. Biochem. Cell Biol. (1996) 28, 531-542
INTRODUCTION
The transforming growth factor-β (TGF-β) superfamily of cytokines is composed of a group of closely related proteins, TGF-β1, TGF-β2, TGF-β3, TGF-β4, and TGF-β5, having 70-80% sequence identity, and a number of more distantly related proteins, including the activins, inhibins and bone morphogenetic proteins, having 30-40% identity to TGF-β1 at the primary sequence level (Roberts and Sporn, 1990). Analysis of the cDNAs encoding the TGF-βs has demonstrated that each is initially synthesized as a larger precursor molecule containing the mature form of TGF-β at the carboxy-terminal portion (Roberts and Sporn, 1990). After proteolytic cleavage, the two portions of the precursor remain together and are secreted as a biologically inactive, noncovalently bound complex consisting of dimers of both the amino-terminal precursor remainder, designated latency associated peptide (LAP), and mature TGF-β (Miyazono et al., 1988; Wakefield et al., 1988). In some cases this complex is secreted bound to another protein termed latent TGF-β-binding protein (LTBP) (Miyazono et al., 1988; Olofsson et al., 1992). The function of LTBPs remains to be determined, since it is clear that they are not necessary to maintain TGF-β in an inactive form and they do not appear to bind mature TGF-β. Although LTBPs may facilitate the secretion of TGF-β (Miyazono et al., 1991) or binding of the inactive complex to the cell surface where activation takes place (Flaumenhaft et al., 1993), they are also found as a free protein associated with components of the extracellular matrix (Taipale et al., 1994).

LTBP-1 was originally described as a glycoprotein varying in size from 125 to 160 kDa when isolated from platelets (Miyazono et al., 1988) and 170 to 190 kDa when isolated from fibroblasts (Kanzaki et al., 1990). After it was subsequently cloned (Kanzaki et al., 1990; Tsuji et al., 1990), two transcripts of 5.2 kb and 6.2-7.0 kb were identified encoding isoforms of the protein. The smaller transcript contained an open reading frame encoding 1394 amino acids with a predicted Mr of approx. 151,000 minus the signal sequence (Kanzaki et al., 1990), while the larger transcript encoded a protein of 1712 amino acids with a calculated Mr of approx. 184,400 minus the signal sequence (Tsuji et al., 1990). Sequence analysis of the cDNAs demonstrated that the entire difference between the two forms could be accounted for by an additional 318 amino acids at the amino terminus of the larger form. Both forms were composed in large part of cysteine-rich modular domains. One module containing six cysteines was found to be highly homologous to an epidermal growth factor (EGF)-like repeat found in a number of other proteins, while a second module containing eight cysteines was at that time unique to LTBP-1. Since then, both modules have been found in the fibrillins, which are major structural components located in 10-12 nm microfibrils of the extracellular matrix of many tissues (Corson et al., 1993; Pereira et al., 1993; Zhang et al., 1994).

Recently, two other TGF-β-binding proteins (LTBP-2 and LTBP-3) have been described. Analysis of the cDNAs encoding one form of LTBP-2 and that of LTBP-3 demonstrated that their overall structure was similar to LTBP-1 in that they were also composed largely of the two kinds of cysteine-rich modules described above. As with LTBP-1, two transcripts presumably encoding isoforms of human LTBP-2 were found, but they were significantly larger, 7.5 and 9.0 kb, than those encoding LTBP-1 (Moren et al., 1994). Only one transcript of 4.6 kb was identified in the case of LTBP-3 (Yin et al., 1995).

Our laboratories have been interested in identifying and characterizing proteins composing the 10-12 nm microfibrils of the extracellular matrix. In the course of these studies, we have identified a protein that is associated with elastic-fiber microfibrils and contains the cysteine repeat structures found in both fibrillin and LTBP-1. Characterization of the cDNA from both human and bovine tissues showed that the protein was very similar to LTBP-2 as described by Moren et al. (1994). However, the sequence of our cDNA differs in several important respects from that reported by Moren et al. Here, we report the characterization of human cDNA encoding LTBP-2 and discuss how it differs from the other published sequence. In addition, we have obtained and characterized the human LTBP-2 gene encompassing the cloned cDNA, which has permitted determination of exon/intron structure. We have compared this new sequence of LTBP-2 to other members of the LTBP and fibrillin gene families and developed a phylogenetic analysis demonstrating the relationships amongst the members of the two families.

*To whom all correspondence should be addressed. Received 18 September 1995; accepted 8 November 1995.
MATERIALS AND METHODS
Screening of cDNA and genomic libraries

cDNA libraries were prepared in λgt10 and λ ZAP Express (Stratagene) by the method of Gubler and Hoffman (1983) using poly(A)+ RNA isolated from a human fibroblast cell line designated CC102 and oligo(dT) or specific oligonucleotides as primers for the reverse transcriptase step. Initial desired recombinants were identified by screening with a 1.2 kbp bovine LTBP-2 cDNA clone isolated from a ligamentum nuchae library (Gibson et al., 1995). Several clones were identified in this initial screening and, upon analysis, the two largest were found to cover 4.3 kbp. Northern analysis demonstrated that these cDNAs hybridized to two transcripts of 7.5 and 9.0 kb. Exhaustive rescreening of the original oligo(dT)-primed library failed to identify any clones containing more 5' sequences. Therefore, primer extension libraries were constructed and screened with restriction fragments isolated from the most 5' segment of the available clones. The strategy and relative relationships of the overlapping clones are illustrated in Fig. 1.

A human genomic library was constructed by partial digestion of the DNA with Sau3A and insertion of 14-22 kbp restriction fragments into λ DASH (Stratagene) (Bashir et al., 1989). The library was screened with cDNA clones covering the entire available cDNA (7.0 kbp). Genomic clones, which were subsequently shown to encompass the entire cDNA, were purified to homogeneity by several rounds of plaque hybridization. Large scale preparations of the purified clones were characterized by restriction endonuclease digestion analysis, Southern blot analysis, and sequencing in order to compare the genomic sequence with that of the cDNA and to determine the exon/intron structure of the gene.

DNA sequencing

Restriction fragments were subcloned into pUC19 and sequenced by the Sanger dideoxynucleotide chain-termination method as modified for Taq polymerase cycle sequencing using an ABI 373A automated DNA sequencer. Ends of subclones were sequenced using universal primers, with internal portions made accessible to sequencing through introduction of appropriate deletions or use of oligonucleotide primers complementary to insert sequences, as necessary. All reported sequences have been confirmed by multiple sequencing of both strands. Sequence data were assembled and discrepancies resolved using the Wisconsin Package (Genetics Computer Group, Madison, Wisconsin).

Comparative sequence analysis of LTBP-2
The simultaneous alignment of amino acid sequences in the LTBP and fibrillin gene families was accomplished using the Clustal W method (Higgins et al., 1989). This approach begins with a rapid pairwise comparison (Wilbur and Lipman, 1983), which generates similarity scores that are then used to construct a dendrogram by the Neighbor-Joining method (Saitou and Nei, 1987). Individual sequence weights are calculated from the resulting unrooted phylogenetic tree and guide the final multiple alignment. To produce the final alignment, the algorithm of Myers and Miller (1988) is then used to align progressively larger sequence groups. The BLOSUM series of amino acid weight matrices was used (Henikoff and Henikoff, 1992). At least two iterations were performed for each alignment.

[Fig. 1 graphic: the cDNA drawn against a scale of amino acid residues (x10^-2, 0 to 18), showing the 5' and 3' untranslated segments, exons 1-36, the RGD site, the stop codon, the poly(A)+ signal, the cDNA clones cHLTBP2.1 to cHLTBP2.5, and a key to the domains: EGF-like; EGF-like Ca binding; 8-cysteine; hybrid; 4-cysteine; unique.]

Fig. 1. Diagram of human LTBP-2 cDNA and proposed domain structure. The cDNA is divided into exons, which are numbered and drawn to scale. The proposed domain structure is based on homology to motifs found in EGF and LTBPs. The cDNA clones are identified and arrows indicate the positions of oligonucleotides used in the formation of primer extension libraries.

[Fig. 2 sequence listing: the nucleotide and amino acid sequences are not legible in this copy.]

Fig. 2. Nucleotide and deduced amino acid sequence of LTBP-2. 5' and 3' untranslated segments are in lowercase letters. Amino acids are numbered to the left from the start site of translation; nucleotides are numbered to the right. Division of the sequence into exons is indicated. A polyadenylation consensus sequence is underlined.

Phylogenetic tree construction

The Neighbor-Joining method (Saitou and Nei, 1987) was employed to construct the phylogenetic tree. The Clustal W alignment was used to determine a topology and branch length proportional to the estimated divergence (100 minus percent of identical residues per two aligned sequences excluding gaps) of each sequence. The branch lengths therefore reflect distances between sequences and each node represents a consensus between the sequences on each branch. A bootstrap resampling procedure (Felsenstein, 1985) was performed (1000 iterations) to obtain a measure of branching reliability.
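The divergence measure and the bootstrap resampling step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Clustal W or Neighbor-Joining implementation used in the study: the function names and the three short toy sequences are hypothetical, and the alignment is assumed to be supplied already, with "-" marking gaps.

```python
import random

GAP = "-"

def percent_divergence(seq_a, seq_b):
    """100 minus the percent of identical residues, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != GAP and y != GAP]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 - 100.0 * identical / len(pairs)

def distance_matrix(alignment):
    """Pairwise divergence for every ordered pair of named sequences."""
    names = list(alignment)
    return {(a, b): percent_divergence(alignment[a], alignment[b])
            for a in names for b in names}

def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, for illustration only (not data from the paper).
toy = {
    "seqA": "CQDINEC-LT",
    "seqB": "CQDVDECALT",
    "seqC": "CEDLDECAFP",
}

if __name__ == "__main__":
    matrix = distance_matrix(toy)
    print(round(matrix[("seqA", "seqB")], 2))
    replicate = bootstrap_replicate(toy, random.Random(0))
    print(distance_matrix(replicate)[("seqA", "seqB")])
```

In a full analysis, a tree would be rebuilt from the distance matrix of each replicate (1000 in the text), and the support for a branch would be the fraction of replicates in which the same grouping reappears.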
[Fig. 3 graphic: Northern blot, two lanes, with size markers at 7.4, 4.4 and 2.4 kb.]

Fig. 3. Northern blot analysis. Total RNA was isolated from cultured human lung fibroblasts by guanidine thiocyanate extraction. RNA (10 µg in each lane) was separated on a formaldehyde-agarose gel, transferred to a nitrocellulose membrane and hybridized to the following probes: lane 1, 800 bp fragment of LTBP-2; lane 2, bases 21-526 of Moren et al. (1994).

RESULTS

Sequence analysis of cDNA

Overlapping cDNA clones were obtained as described in Materials and Methods and the relative relationships of these clones are illustrated in Fig. 1. Sequence analysis of these clones demonstrated that the cDNA contained an open reading frame of 5460 bp encoding a protein of 1820 amino acids with a predicted molecular weight of 195,050. Also present were a 5' untranslated segment of 387 bp and a 3' untranslated segment of 1164 bp. While this work was in progress, Moren et al. (1994) published the sequence of cDNA encoding the same protein. While our sequence largely agrees with theirs, the two sequences differ in two important respects. First, the sequence of Moren et al. contains a long (1.8 kbp) 5' untranslated sequence, the majority of which we did not find in any of our clones. During comparative analysis of our sequence and theirs, which included BLAST searches of the GenBank databases, we found that the first 1420 bp of the published sequence corresponded to a sequence encoding a portion of human seryl-tRNA synthetase (GenBank Accession No. T06028). Second, our sequence contains three additional guanines in the coding portion (our base numbering 4152, 4205, and 4206 in Fig. 2). Our cDNA sequence agrees precisely with that obtained from sequencing the gene as described below. The additional bases result in a frame shift and a difference between our amino acid sequence and that of Moren et al. extending from residue 1385 to residue 1402. Furthermore, in the sequence of Moren et al., an EGF-like repeat diverges from the 6-cysteine consensus sequence, while our sequence conforms to the consensus.
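The residue range affected by the three additional guanines can be checked with simple codon arithmetic. The sketch below is illustrative only. It assumes that the nucleotide numbering of Fig. 2 starts at the first base of the initiator ATG, which is an assumption made here and is not stated explicitly in the text; the helper names are hypothetical.

```python
def codon_number(base_position, first_coding_base=1):
    """1-based codon that contains a 1-based nucleotide position."""
    offset = base_position - first_coding_base
    if offset < 0:
        raise ValueError("position lies upstream of the coding region")
    return offset // 3 + 1

def codon_position(base_position, first_coding_base=1):
    """Position (1, 2 or 3) of the base within its codon."""
    return (base_position - first_coding_base) % 3 + 1

def cumulative_frame(extra_bases):
    """Reading-frame offset (mod 3) after each successive extra base."""
    return [(pos, (count + 1) % 3)
            for count, pos in enumerate(sorted(extra_bases))]

# Positions of the additional guanines, as given in the text.
extra_guanines = [4152, 4205, 4206]

if __name__ == "__main__":
    for pos in extra_guanines:
        print(pos, codon_number(pos), codon_position(pos))
    # 4152 -> codon 1384, position 3
    # 4205 -> codon 1402, position 2
    # 4206 -> codon 1402, position 3
    print(cumulative_frame(extra_guanines))
    # [(4152, 1), (4205, 2), (4206, 0)]
```

Under this assumption, the first extra base falls at the end of codon 1384 and shifts the frame from codon 1385 onward, and the two further bases in codon 1402 bring the cumulative offset back to a multiple of three. That is consistent with the stated difference running from residue 1385 to residue 1402.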
Northern analysis

In order to determine the size of the mRNA transcripts corresponding to the isolated cDNA, Northern analysis was carried out using total RNA isolated from cultured lung fibroblasts (GM05839, American Type Culture Collection). When a probe covering bases 1952-2823 was used, two transcripts of 7.5 and 9.0 kb were identified (Fig. 3, lane 1), in agreement with the findings of Moren et al. (1994). Using the lung fibroblast RNA as template, we also isolated a 506 bp cDNA fragment by reverse transcriptase/polymerase chain reaction corresponding to bases 21-526 of Moren et al. and used it as a probe in Northern analysis. This probe identified a 2 kb transcript, presumably corresponding to human seryl-tRNA synthetase, which to date has not been cloned in its entirety. However, the transcript for hamster seryl-tRNA synthetase is 2.5 kb (Debastisse et al., 1984), and those for other mammalian synthetases are quite variable. These results, then, demonstrate that the 5' end of the sequence obtained by Moren et al. encompassing 1420 bp is the result of a cloning artifact incorporating a portion of seryl-tRNA synthetase cDNA.

Structure of the gene

A partial Sau3A human genomic library was constructed in λ DASH and screened with cDNA clones covering the entire transcript. Genomic clones, encompassing approximately 100 kbp, were identified and characterized by restriction endonuclease mapping and DNA sequencing (Fig. 4). The analyses demonstrated that the entire cloned cDNA sequence was contained within the cloned genomic DNA, and permitted the definition of the exon/intron structure. However, it should be noted that although the characterized genomic DNA included all the cloned cDNA, it did not form a continuous genomic sequence. Indeterminate portions of introns 1, 2, 3 and 8 are missing (Fig. 4), but it did not seem worthwhile at present to rescreen genomic libraries just to fill in these gaps. Thirty-six exons were identified in the gene, and an interesting feature is the progressive dilution of coding sequence as one progresses from the 3' to the 5' end of the gene (compare segments A, B, and C to segments D and E in Fig. 4). Exons 1-3 are embedded in at least 33 kbp of genomic DNA while exons 19-36 are contained in only 14 kbp. Division of the cDNA into exon segments is illustrated in Fig. 1. The EGF-like repeats are encoded in single exons, while the eight-cysteine repeats are not.

Exon/intron borders, shown in Fig. 5, conform to the consensus sequences deduced from analysis of a large number of eukaryotic genes (Padgett et al., 1986). Generally (31 out of 35 cases), the borders divide codons such that a single nucleotide is found at the 3' end of an exon and the remaining two nucleotides of the codon are found at the 5' end of the following exon. A similar exon border structure is found in the human fibrillin-1 gene (Pereira et al., 1993). In two cases in the LTBP-2 gene, codons are divided so that a single base is found at the 5' border, and in two cases the border does not divide a codon.

[Fig. 4 graphic: genomic segments A to E drawn against a nucleotide scale (kb), with restriction sites marked for Hind III, Bam HI, Sst I, Eco RI and Sal I.]

Fig. 4. Diagram of human LTBP-2 gene. The entire cDNA illustrated in Fig. 1 is contained in five nonoverlapping segments of genomic DNA. The individual λ clones are identified and exons are numbered.

[Fig. 5 table: exon number, exon size and the flanking intron sequence at each junction, running from the initiation codon to the termination codon; the entries are not legible in this copy.]

Fig. 5. Nucleotide sequence of the LTBP-2 gene exon/intron junctions. Exons are numbered in a 5' to 3' direction. Divided border exons are underlined with a solid bar, while undivided border exons are underlined with an open bar. Twenty bases of each flanking intron are shown.

Protein sequence alignments

We have used the Clustal W program of progressive multiple sequence alignment to examine the relationship within and between the LTBP and fibrillin groups (Higgins and Sharp, 1989). This alignment method offers improved sensitivity over previous approaches by assigning individual weights to each sequence based on degree of similarity, by varying amino acid weight matrices as the alignment proceeds depending on estimated distances, and by the use of residue-specific, hydrophilic region-specific and locally determined reduced gap penalties (Thompson et al., 1994). We have compared three regions (Fig. 6) of 10 gene products: human (LTBP1HUM) and rat (LTBP1RAT) LTBP-1, human (LTBP2HUM) and bovine (LTBP2BOV) LTBP-2, mouse (LTBP3MUS) LTBP-3, human (FBN1HUM), bovine (FBN1BOV) and mouse (FBN1MUS) fibrillin-1, and human (FBN2HUM) and mouse (FBN2MUS) fibrillin-2 (Kanzaki et al., 1990; Tsuji et al., 1990; Corson et al., 1993; Pereira et al., 1993; Zhang et al., 1994; Moren et al., 1994; Yin et al., 1995; Gibson et al., 1995). Each region for analysis was selected because discrete ends could be identified, thereby strengthening the alignment by ensuring the comparison of truly analogous segments. Clustal W alignments of regions A, B and C produced consensus sequences that are 188, 559 and 246 residues in length, respectively. The phylogenetic tree shown in Fig. 7 represents region A. Phylogenetic trees constructed from alignments of regions B and C bear the same topology and are not shown. Within the LTBP clade, LTBP-3 is more similar to LTBP-1 than to LTBP-2. LTBP-2 shows greater similarity to the fibrillins than either LTBP-1 or LTBP-3. Within the fibrillin clade, fibrillin-1 is nearest to the LTBPs. Bootstrap resampling analysis (Felsenstein, 1985) of the phylogenies showed that the branches are strongly supported.

DISCUSSION
In the original characterization of LTBP-1, Northern analysis showed two transcripts of 5.2 and 6.2-7.0 kb. Analysis of the corresponding cDNAs indicated that the longer transcript, which was cloned using rat kidney RNA (Tsuji et al., 1990), differed significantly only at the 5’
protein-2 gene
539
end from the shorter transcript, which was cloned using human fibroblast RNA (Kanzaki et al., 1990). Besides containing differing 5’ untranslated sequences, the encoded signal sequences were very dissimilar and most importantly, the longer transcript encoded an additional 318 amino acids at the amino end. In the remainder of the coding sequence where the two transcripts overlapped, they were very homologous with only rare base and amino acid changes which were largely conservative. Thus, it is unlikely that the major difference in the coding sequence located at the 5’ end is due to species variation. The origin of the two different size transcripts is presently unknown. One possibility is that of alternative splicing in which different 5’ exons are utilized in an exclusive fashion. A second possibility is that alternative promoters are used in which different 5’ exons generated by the two separate promoters are spliced to a common downstream exon. In this model, the downstream promoter would be located within an intron of the larger transcript and, depending upon the structure of the gene, alternative splicing might or might not be involved. Our present analysis encompasses 6983 bp of cDNA. The first methionine in the open reading B
Fig. 6. Schematic diagrams of the domain structure of the LTBPs and fibrillins. Segments A, B, and C were used in the comparative sequence analysis and generation of phylogenetic trees as discussed in the text and illustrated in Fig. 7. Legend labels: four-cysteine domain, EGF-like domain, EGF-like calcium-binding domain, eight-cysteine domain, hybrid domain. Sequences shown: LTBP1 rat, LTBP2 human, FBN1 human, FBN2 human.
frame begins at base 387 and Moren et al. (1994) used this methionine as the start codon in the construction of an expression vector. They obtained labeled protein which was immunoprecipitable with LTBP-2-specific antibody from the culture media of COS-1 cells transfected with this vector even though a potential
(but experimentally unproven) hydrophobic secretory signal sequence is not found until 20-33 residues following the initiator methionine. Furthermore, when the LTBP-2 construct was co-transfected with one expressing TGF-β precursor, the two were secreted as a latent complex. These results support the selected
Fig. 7. Phylogenetic tree generated from the alignment of region A (Fig. 6) using the Clustal W method and built with the neighbor-joining method. The branch lengths reflect distances between sequences (1.0 minus the number of mismatches divided by total number of residues excluding gaps) and each length is printed above each branch. Every node represents a consensus between the sequences on each branch. The number at each node denotes the proportion of 1000 bootstrap iterations that supported the subset of sequences as illustrated. Taxon labels recovered from the figure include: FBN1 bovine, FBN1 human, LTBP2 bovine, LTBP2 human, LTBP1 rat, LTBP1 human, LTBP3 mouse.
methionine as a legitimate start codon. Based upon this identification, the presently cloned 6983 bp of cDNA most likely corresponds to the 7.5 kb transcript, not the larger transcript as suggested by Moren et al. (1994). Presumably, the remaining 500 bases consist of a polyadenylation tract and possibly additional bases in the 5’ untranslated segment. Analysis of the genomic sequence 5’ of the cloned cDNA (not shown) demonstrated that it was very G + C rich with a high frequency of CpG dinucleotides, consistent with it being a CpG island (Gardiner-Garden and Frommer, 1987). Although no canonical TATA or CCAAT boxes were found within 500 bases extending 5’ of the cDNA, several consensus sequences for the binding of transcription factors, including Sp1, were identified. These features are characteristic of widely expressed genes. However, future primer extension and nuclease protection experiments are necessary to identify the origin of transcription and the legitimate promoter segment.

The present analyses also leave open the origin of the larger 9.0 kb transcript. Our Northern analyses indicate that the cloned cDNA is present in both the 7.5 and 9.0 kb transcripts, and thus it is possible that the situation with LTBP-2 is similar to that of LTBP-1, in which case the larger transcript may arise through alternative splicing or through the action of another promoter segment. Further cDNA cloning is necessary to identify the additional 1500 bases, which will then permit the localization of the origin of transcription of the longer transcript and the corresponding genomic region, including the promoter.

The LTBPs are a family of large glycoproteins containing multiple domains that are rich in cysteine. The most common motif is an EGF-like sequence that repeats between 8 and 13 times in one contiguous segment.
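The CpG island observation made above for the 5’ genomic sequence can be illustrated with a short calculation. The sketch below is not the procedure used in this work (the sequence itself is not shown); it assumes the commonly quoted criteria associated with Gardiner-Garden and Frommer (1987), namely a stretch of at least 200 bp with G + C content above 50% and an observed/expected CpG ratio above 0.6. The function names and the toy sequences are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    observed = s.count("CG")      # CpG dinucleotides on the given strand
    expected = (c * g) / n        # expectation if C and G were placed independently
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, G+C and obs/exp thresholds to one window."""
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and ratio > min_ratio


if __name__ == "__main__":
    # Toy inputs only; these are not sequences from the LTBP-2 gene.
    print(cpg_stats("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

In practice such a test would be applied in sliding windows along the 5’ flanking sequence rather than to a single fixed stretch.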
The function of the LTBPs is uncertain, although there is convincing evidence that these proteins bind TGF-β intracellularly and facilitate its proper folding and secretion (Miyazono et al., 1991). The LTBPs have also been shown to associate with the extracellular matrix of cells, where they may provide a bridge between TGF-β and matrix-associated proteins (Taipale et al., 1994). Another function of the LTBPs may be as structural components of the extracellular matrix. LTBP-1 has been shown to associate with the extracellular matrix by immunohistochemical localization (Mizoi et al., 1993; Waltenberger et al., 1993; Yamazaki et al.,
1994). LTBP-2 has been specifically localized to elastic fibers by immuno-light microscopy, while localization at the ultrastructural level indicates that the protein is confined to microfibrils (Gibson et al., 1995). This is a particularly intriguing finding considering the strong structural similarities between the LTBPs and the fibrillins. Fibrillins are thought to provide the basic scaffolding of the microfibril and contain many of the same cysteine-rich motifs found in the LTBPs (see Fig. 6).

Similarities of structural domains shared by the fibrillins and LTBPs imply closely related gene families. Phylogenetic analysis of relatedness using sequence alignment and comparison algorithms established that the LTBPs and fibrillins are, indeed, homologous. Homology implies divergence from a common ancestor, which for most proteins is indicated by a sequence identity of greater than 20%. Although the fibrillins and LTBPs clearly form their own gene families, the degree of identity between members of the two families exceeds the 20% threshold. For large multidomain proteins, different parts of the protein frequently have different genetic origins. This appears to be true for both the fibrillins and LTBPs, where greater than 50% of their nucleotides were obtained from an ancestral gene with homology to EGF and approx. 20% of the coding sequence appears to have diverged from a gene encoding the 8 cysteine motifs. EGF motifs are commonly found in extracellular matrix proteins, where they are thought to provide structural stability to regions that connect other functional domains. The 8 cysteine motifs, in contrast, are unique to the fibrillin and LTBP gene families. Analysis of intron-exon boundaries suggests that exons for the EGF and eight-cysteine motifs have been shuffled from common progenitor genes into the LTBPs and fibrillins, with subsequent expansion of individual exons and, possibly, groups of exons, through gene duplication or chromosomal cross-over events.
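The 20% identity threshold mentioned above refers to pairwise identity computed over aligned positions. As a minimal sketch, assuming two sequences that have already been aligned to equal length with "-" as the gap character, identity and the complementary mismatch fraction can be computed over ungapped columns as follows. The function names and the toy alignment are hypothetical, and this is not the Clustal W implementation used for the analysis.

```python
def _compare(a: str, b: str, gap: str = "-") -> tuple[int, int]:
    """Count (compared, matching) columns, skipping any column with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue              # gapped columns are excluded from the total
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return compared, matches


def percent_identity(a: str, b: str) -> float:
    """Percentage of ungapped aligned columns holding identical residues."""
    compared, matches = _compare(a, b)
    return 100.0 * matches / compared


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of ungapped aligned columns holding different residues."""
    compared, matches = _compare(a, b)
    return (compared - matches) / compared


if __name__ == "__main__":
    # Toy alignment only; these are not LTBP or fibrillin sequences.
    s1, s2 = "ACDE-G", "ACDFHG"
    print(percent_identity(s1, s2), mismatch_fraction(s1, s2))
```

A matrix of such pairwise values over all sequences in a region is the kind of input a distance-based method such as neighbor-joining operates on.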
Phylogenetic analysis (Fig. 7) predicts that LTBP-3 may be the common progenitor for both gene families, but we cannot exclude the possibility of LTBP and fibrillin divergence from an as-yet unidentified precursor.

Acknowledgements: Supported by NIH grant AR41414. We thank the Biopolymer Analysis Laboratory, University of Pennsylvania Dental School, for DNA sequence determination and informatics support.
REFERENCES
Bashir M. M., Indik Z., Yeh H., Ornstein-Goldstein N., Rosenbloom J. C., Abrams W., Fazio M., Uitto J. and Rosenbloom J. (1989) Characterization of the complete human elastin gene. Delineation of unusual features in the 5’-flanking region. J. Biol. Chem. 264, 8887-8891.
Corson G. M., Chalberg S. C., Dietz H. C., Charbonneau N. L. and Sakai L. Y. (1993) Fibrillin binds calcium and is coded by cDNAs that reveal a multidomain structure and alternatively spliced exons at the 5’ end. Genomics 17, 476-484.
Debatisse M., Robert De Saint Vincent B. and Buttin G. (1984) Expression of several amplified genes in an adenylate-deaminase overproducing variant of Chinese hamster fibroblasts. EMBO J. 3, 3123-3127.
Felsenstein J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783-791.
Flaumenhaft R., Abe M., Sato Y., Miyazono K., Harpel J., Heldin C.-H. and Rifkin D. B. (1993) Role of the latent TGF-β binding protein in the activation of latent TGF-β by co-cultures of endothelial and smooth muscle cells. J. Cell Biol. 120, 995-1002.
Gardiner-Garden M. and Frommer M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261-282.
Gibson M. A., Hatzinikolas G., Davis E., Baker E., Sutherland G. R. and Mecham R. P. (1995) Bovine latent TGF-β1-binding protein-2: molecular cloning, identification of tissue isoforms and immunolocalization to the elastin-associated microfibrils. Molec. Cell. Biol. 15, 6932-6942.
Gubler U. and Hoffman B. J. (1983) A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
Henikoff S. and Henikoff J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915-10,919.
Higgins D. G. and Sharp P. M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5, 151-153.
Kanzaki T., Olofsson A., Moren A., Wernstedt C., Hellman U., Miyazono K., Claesson-Welsh L. and Heldin C.-H. (1990) TGF-β1 binding protein: a component of the large latent complex of TGF-β1 with multiple repeat sequences. Cell 61, 1051-1061.
Miyazono K., Hellman U., Wernstedt C. and Heldin C.-H. (1988) Latent high molecular weight complex of transforming growth factor-β1. Purification from human platelets and structural characterization. J. Biol. Chem. 263, 6407-6415.
Miyazono K., Olofsson A., Colosetti P. and Heldin C.-H. (1991) A role of the latent TGF-β1-binding protein in the assembly and secretion of TGF-β1. EMBO J. 10, 1091-1101.
Mizoi T., Ohtani H., Miyazono K., Miyazawa M., Matsuno S. and Nagura H. (1993) Immunoelectron microscopic localization of transforming growth factor beta-1 and latent transforming growth factor beta-1 binding protein in human gastrointestinal carcinomas: qualitative difference between cancer cells and stromal cells. Cancer Res. 53, 183-190.
Moren A., Olofsson A., Stenman G., Sahlin P., Kanzaki T., Claesson-Welsh L., ten Dijke P., Miyazono K. and Heldin C.-H. (1994) Identification and characterization of LTBP-2, a novel latent transforming growth factor-β-binding protein. J. Biol. Chem. 269, 32,469-32,478.
Myers E. W. and Miller W. (1988) Optimal alignments in linear space. CABIOS 4, 11-17.
Olofsson A., Miyazono K., Kanzaki T., Colosetti P., Engstrom U. and Heldin C.-H. (1992) Transforming growth factor-β1, -β2, and -β3 secreted by a human glioblastoma cell line. Identification of small and different forms of large latent complexes. J. Biol. Chem. 267, 19,482-19,488.
Padgett R. A., Grabowski P. J., Konarska M. M., Seiler S. and Sharp P. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119-1150.
Pereira L., D’Alessio M., Ramirez F., Lynch J. R., Sykes B., Pangillinan T. and Bonadio J. (1993) Genomic organization of the sequence coding for fibrillin, the defective gene product in Marfan syndrome. Human Mol. Genet. 2, 961-968.
Roberts A. B. and Sporn M. B. (1990) The transforming growth factor-βs. In Peptide Growth Factors and Their Receptors I (Edited by Sporn M. B. and Roberts A. B.), pp. 419-472. Springer-Verlag, Berlin.
Saitou N. and Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406-425.
Taipale J., Miyazono K., Heldin C.-H. and Keski-Oja J. (1994) Latent transforming growth factor-β1 associates to fibroblast extracellular matrix via latent TGF-β binding protein. J. Cell Biol. 124, 171-181.
Thompson J. D., Higgins D. G. and Gibson T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673-4680.
Tsuji T., Okada F., Yamaguchi K. and Nakamura T. (1990) Molecular cloning of the large subunit of transforming growth factor type β masking protein and expression of the mRNA in various rat tissues. Proc. Natl. Acad. Sci. USA 87, 8835-8839.
Wakefield L. M., Smith D. M., Flanders K. C. and Sporn M. B. (1988) Latent transforming growth factor-β from human platelets. J. Biol. Chem. 263, 7646-7654.
Waltenberger J., Lundin L., Oberg K., Wilander E., Miyazono K., Heldin C.-H. and Funa K. (1993) Involvement of transforming growth factor-β in the formation of fibrotic lesions in carcinoid heart disease. Am. J. Pathol. 142, 71-78.
Wilbur W. J. and Lipman D. J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726-730.
Yamazaki M., Minota S., Sakurai H., Miyazono K., Yamada A., Kanazawa I. and Kawai M. (1994) Expression of transforming growth factor-β1 and its relation to endomysial fibrosis in progressive muscular dystrophy. Am. J. Pathol. 144, 221-226.
Yin W., Smiley E., Germiller J., Mecham R. P., Florer J. B., Wenstrup R. J. and Bonadio J. (1995) Isolation of a novel latent transforming growth factor-β binding protein gene (LTBP-3). J. Biol. Chem. 270, 10,147-10,160.
Zhang H., Apfelroth S. D., Hu W., Davis E. C., Sanguineti C., Bonadio J., Mecham R. P. and Ramirez F. (1994) Structure and expression of fibrillin-2, a novel microfibrillar component preferentially located in elastic matrices. J. Cell Biol. 124, 855-863.