Analysis of the human gene encoding latent transforming growth factor-β-binding protein-2


Int. J. Biochem. Cell Biol. Vol. 28, No. 5, pp. 531-542, 1996
Copyright © 1996 Elsevier Science Ltd. Printed in Great Britain. All rights reserved
1357-2725/96 $15.00 + 0.00

Pergamon

1357-2725(95)00167-0

MUHAMMAD M. BASHIR,1 MING-DA HAN,1 WILLIAM R. ABRAMS,1 THOMAS TUCKER,1 RONG-INE MA,1 MARK GIBSON,2 TIM RITTY,3 ROBERT MECHAM,3 JOEL ROSENBLOOM1*

1Department of Anatomy and Histology, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, U.S.A., 2Department of Pathology, University of Adelaide, Adelaide, Australia, and 3Department of Cell Biology, Washington University, St Louis, MO 63110, U.S.A.

Transforming growth factor (TGF)-β is secreted as an inactive complex, which frequently contains a large molecular weight binding protein designated latent TGF-β-binding protein (LTBP). Recently, the LTBPs have been shown to be a gene family that contains three known members and exhibits a multidomain structure containing cysteine-rich motifs that are also found in the fibrillin gene family. The present work seeks to characterize the LTBP-2 gene and to compare its features to those of the other LTBPs and to the fibrillins. Fibroblast cDNA libraries were used to isolate cDNA encoding LTBP-2, which was then used to isolate the corresponding LTBP-2 gene. The cloned cDNA encodes a 195 kDa translated protein containing 20 epidermal growth factor (EGF)-like repeats, three repeats containing eight cysteines, and one segment that appears to be a hybrid of the two. Single exons encode EGF repeats while the eight-cysteine repeats are encoded in two exons. Northern analysis identified two transcripts of 7.5 and 9.0 kb, with the presently analyzed cDNA corresponding to the 7.5 kb transcript. Phylogenetic sequence comparisons indicate that LTBP-3 is more similar to LTBP-1 than is LTBP-2, while LTBP-2 shows the most similarity to the fibrillins. The results suggest that LTBP-1 diverged from LTBP-3, and that LTBP-2 diverged from LTBP-1. Within the fibrillin family, fibrillin-1 is nearest to the LTBPs. While the domain structure of LTBP-2 is similar to that of the other LTBPs, LTBP-2 has unique regions that make it the largest member of the LTBP family. LTBP-2 may have dual functions as a member of the TGF-β latent complex and as a structural component of microfibrils. Copyright © 1996 Elsevier Science Ltd

Keywords: TGF-β binding proteins; Elastic microfibrils; Fibrillin; Epidermal growth factor-like repeats

Int. J. Biochem. Cell Biol. (1996) 28, 531-542

INTRODUCTION

*To whom all correspondence should be addressed. Received 18 September 1995; accepted 8 November 1995.

The transforming growth factor-β (TGF-β) superfamily of cytokines is composed of a group of closely related proteins, TGF-β1, TGF-β2, TGF-β3, TGF-β4, and TGF-β5, having 70-80% sequence identity, and a number of more distantly related proteins, including the activins, inhibins and bone morphogenetic proteins, having 30-40% identity to TGF-β1 at the primary sequence level (Roberts and Sporn, 1990). Analysis of the cDNAs encoding the TGF-βs has demonstrated that each is initially synthesized as a larger precursor molecule containing the mature form of TGF-β at the carboxy-terminal portion (Roberts and Sporn, 1990). After proteolytic cleavage, the two portions of the precursor remain together and are secreted as a biologically inactive, noncovalently bound complex consisting of dimers of both the amino-terminal precursor remainder, designated latency associated peptide (LAP), and mature TGF-β (Miyazono et al., 1988; Wakefield et al., 1988). In some cases this complex is secreted bound to another protein termed latent TGF-β-binding protein (LTBP) (Miyazono et al., 1988; Olofsson et al., 1992). The function of LTBPs remains to be determined, since it is clear that they are not necessary to maintain TGF-β in an inactive form and they do not appear to bind mature TGF-β. Although LTBPs may facilitate the secretion of TGF-β (Miyazono et al., 1991) or binding of the inactive complex to the cell surface where activation takes place (Flaumenhaft et al., 1993), they are also found as a free protein associated with components of the extracellular matrix (Taipale et al., 1994).

LTBP-1 was originally described as a glycoprotein varying in size from 125 to 160 kDa when isolated from platelets (Miyazono et al., 1988) and 170 to 190 kDa when isolated from fibroblasts (Kanzaki et al., 1990). After it was subsequently cloned (Kanzaki et al., 1990; Tsuji et al., 1990), two transcripts of 5.2 kb and 6.2-7.0 kb were identified encoding isoforms of the protein. The smaller transcript had an open reading frame encoding 1394 amino acids and a predicted Mr of approx. 151,000 minus the signal sequence (Kanzaki et al., 1990), while the larger transcript encoded a protein of 1712 amino acids with a calculated Mr of approx. 184,400 minus the signal sequence (Tsuji et al., 1990). Sequence analysis of the cDNAs demonstrated that the entire difference between the two forms could be accounted for by an additional 318 amino acids at the amino terminus of the larger form. Both forms were composed in large part of cysteine-rich modular domains. One module containing six cysteines was found to be highly homologous to an epidermal growth factor (EGF)-like repeat found in a number of other proteins, while a second module containing eight cysteines was at that time unique to LTBP-1.
Since then, both modules have been found in the fibrillins, which are major structural components located in 10-12 nm microfibrils of the extracellular matrix of many tissues (Corson et al., 1993; Pereira et al., 1993; Zhang et al., 1994). Recently, two other TGF-β-binding proteins (LTBP-2 and LTBP-3) have been described. Analysis of the cDNAs encoding one form of LTBP-2 and that of LTBP-3 demonstrated that their overall structure was similar to LTBP-1 in that they were also composed largely of the two kinds of cysteine-rich modules described above. As with LTBP-1, two transcripts presumably encoding isoforms of human LTBP-2 were found, but they were significantly larger, 7.5 and 9.0 kb, than those encoding LTBP-1 (Moren et al., 1994). Only one transcript of 4.6 kb was identified in the case of LTBP-3 (Yin et al., 1995).

Our laboratories have been interested in identifying and characterizing proteins composing the 10-12 nm microfibrils of the extracellular matrix. In the course of these studies, we have identified a protein that is associated with elastic-fiber microfibrils and contains the cysteine repeat structures found in both fibrillin and LTBP-1. Characterization of the cDNA from both human and bovine tissues showed that the protein was very similar to LTBP-2 as described by Moren et al. (1994). However, the sequence of our cDNA differs in several important respects from that reported by Moren et al. Here, we report the characterization of human cDNA encoding LTBP-2 and discuss how it differs from the other published sequence. In addition, we have obtained and characterized the human LTBP-2 gene encompassing the cloned cDNA, which has permitted determination of exon/intron structure. We have compared this new sequence of LTBP-2 to other members of the LTBP and fibrillin gene families and developed a phylogenetic analysis demonstrating the relationships amongst the members of the two families.

MATERIALS AND METHODS

Screening of cDNA and genomic libraries

cDNA libraries were prepared in λgt10 and λ ZAP Express (Stratagene) by the method of Gubler and Hoffman (1983) using poly(A+) RNA isolated from a human fibroblast cell line designated CC102 and oligo(dT) or specific oligonucleotides as primers for the reverse transcriptase step. Initial desired recombinants were identified by screening with a 1.2 kbp bovine LTBP-2 cDNA clone isolated from a ligamentum nuchae library (Gibson et al., 1995). Several clones were identified in this initial screening and, upon analysis, the two largest were found to cover 4.3 kbp. Northern analysis demonstrated that these cDNAs hybridized to two transcripts of 7.5 and 9.0 kb. Exhaustive rescreening of the original oligo(dT)-primed library failed to identify any clones containing more 5' sequences. Therefore, primer extension libraries were constructed and screened with restriction fragments isolated from the most 5' segment of the available clones. The strategy and relative relationships of the overlapping clones are illustrated in Fig. 1.

A human genomic library was constructed by partial digestion of the DNA with Sau3A and insertion of 14-22 kbp restriction fragments into λ DASH (Stratagene) (Bashir et al., 1989). The library was screened with cDNA clones covering the entire available cDNA (7.0 kbp). Genomic clones, which were subsequently shown to encompass the entire cDNA, were purified to homogeneity by several rounds of plaque hybridization. Large scale preparations of the purified clones were characterized by restriction endonuclease digestion analysis, Southern blot analysis, and sequencing in order to compare the genomic sequence with that of the cDNA and to determine the exon/intron structure of the gene.

DNA sequencing

Restriction fragments were subcloned into pUC19 and sequenced by the Sanger dideoxynucleotide chain-termination method as modified for Taq polymerase cycle sequencing using an ABI 373A automated DNA sequencer. Ends of subclones were sequenced using universal primers, with internal portions made accessible to sequencing through introduction of appropriate deletions or use of oligonucleotide primers complementary to insert sequences, as necessary. All reported sequences have been confirmed by multiple sequencing of both strands. Sequence data were assembled and discrepancies resolved using the Wisconsin Package (Genetics Computer Group, Madison, Wisconsin).

Comparative sequence analysis of LTBP-2

The simultaneous alignment of amino acid sequences in the LTBP and fibrillin gene families was accomplished using the Clustal W method (Higgins et al., 1989). This approach begins with a rapid pairwise comparison (Wilbur and Lipman, 1983), which generates similarity scores that are then used to construct a dendrogram by the Neighbor-Joining method (Saitou and Nei, 1987). Individual sequence weights are calculated from the resulting unrooted phylogenetic tree and guide the final multiple alignment. To produce the final alignment, the algorithm of Myers and Miller (1988) is then used to align progressively larger sequence groups. The BLOSUM series of amino acid weight matrices was used (Henikoff and Henikoff, 1992). At least two iterations were performed for each alignment.

[Fig. 1: schematic not reproduced. Scale: amino acid residues (×10^-2), 0 to 18. Tracks: 5' untranslated segment, exons 1-36, 3' untranslated segment, RGD site, stop codon, poly(A)+ signal, and cDNA clones cHLTBP2.1 to cHLTBP2.5. Domain key: EGF-like, EGF-like Ca-binding, 8-cysteine, hybrid, 4-cysteine, unique.]

Fig. 1. Diagram of human LTBP-2 cDNA and proposed domain structure. The cDNA is divided into exons, which are numbered and drawn to scale. The proposed domain structure is based on homology to motifs found in EGF and LTBPs. The cDNA clones are identified and arrows indicate the positions of oligonucleotides used in the formation of primer extension libraries.

[Fig. 2: sequence listing not reproduced.]

Fig. 2. Nucleotide and deduced amino acid sequence of LTBP-2. 5' and 3' untranslated segments are in lowercase letters. Amino acids are numbered to the left from the start site of translation; nucleotides are numbered to the right. Division of the sequence into exons is indicated. A polyadenylation consensus sequence is underlined.

Phylogenetic tree construction

The Neighbor-Joining method (Saitou and Nei, 1987) was employed to construct the phylogenetic tree. The Clustal W alignment was used to determine a topology and branch length proportional to the estimated divergence (100 minus percent of identical residues per two aligned sequences excluding gaps) of each sequence. The branch lengths therefore reflect distances between sequences and each node represents a consensus between the sequences on each branch. A bootstrapping resampling procedure (Felsenstein, 1985) was performed (1000 iterations) to obtain a measure of branching reliability.
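The divergence measure and the resampling step described above can be illustrated with a minimal sketch. The function names and the three short toy sequences are hypothetical and are not taken from the paper or from Clustal W; the code only shows the stated definition (100 minus percent identical residues over aligned positions, excluding gaps) and how one bootstrap replicate might be drawn by resampling alignment columns with replacement.

```python
import random


def percent_divergence(seq_a, seq_b, gap="-"):
    """Return 100 minus percent identical residues, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 - 100.0 * identical / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "seqA": "CQPGY-CTN",
    "seqB": "CQAGYRCTN",
    "seqC": "CE-GFRCSN",
}

if __name__ == "__main__":
    print(percent_divergence(toy_alignment["seqA"], toy_alignment["seqB"]))
    replicate = bootstrap_replicate(toy_alignment, random.Random(0))
    print(replicate)
```

In a full analysis, a distance matrix built this way would be recomputed for each replicate and passed to a Neighbor-Joining implementation; that step is omitted here.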

[Fig. 3: blot image not reproduced. Size markers: 7.4, 4.4 and 2.4 kb.]

Fig. 3. Northern blot analysis. Total RNA was isolated from cultured human lung fibroblasts by guanidine thiocyanate extraction. RNA (10 µg in each lane) was separated on a formaldehyde-agarose gel, transferred to a nitrocellulose membrane and hybridized to the following probes: lane 1, 800 bp fragment of LTBP-2; lane 2, bases 21-526 of Moren et al. (1994).

RESULTS

Sequence analysis of cDNA

Overlapping cDNA clones were obtained as described in Materials and Methods and the relative relationships of these clones are illustrated in Fig. 1. Sequence analysis of these clones demonstrated that the cDNA contained an open reading frame of 5460 bp encoding a protein of 1820 amino acids with a predicted molecular weight of 195,050. Also present were a 5' untranslated segment of 387 bp and a 3' untranslated segment of 1164 bp. While this work was in progress, Moren et al. (1994) published the sequence of cDNA encoding the same protein. While our sequence largely agrees with theirs, the two sequences differ in two important respects. First, the sequence of Moren et al. contains a long (1.8 kbp) 5' untranslated sequence, the majority of which we did not find in any of our clones. During comparative analysis of our sequence and theirs, which included BLAST searches of the GenBank databases, we found that the first 1420 bp of the published sequence corresponded to a sequence encoding a portion of human seryl-tRNA synthetase (GenBank Accession No. T06028). Second, our sequence contains three additional guanines in the coding portion (our base numbering 4152, 4205, and 4206 in Fig. 2). Our cDNA sequence agrees precisely with that obtained from sequencing the gene as described below. The additional bases result in a frame shift and a difference between our amino acid sequence and that of Moren et al. extending from residue 1385 to residue 1402. Furthermore, in the sequence of Moren et al., an EGF-like repeat diverges from the 6-cysteine consensus sequence, while our sequence conforms to the consensus.
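The arithmetic behind the reported open reading frame and the three additional guanines can be checked with a short sketch. It assumes that the nucleotide numbers quoted for Fig. 2 count from the A of the initiator ATG as position 1; that numbering convention is an assumption made here for illustration, and the function names are hypothetical. Only figures stated in the text are used as inputs.

```python
# Figures reported in the text.
ORF_BP = 5460                           # open reading frame length (bp)
PROTEIN_AA = 1820                       # encoded amino acids
PREDICTED_MW = 195_050                  # predicted molecular weight
EXTRA_G_POSITIONS = (4152, 4205, 4206)  # additional guanines (Fig. 2 numbering)


def codon_index(position):
    """Return the 1-based codon that contains a 1-based coding position.

    Assumes position 1 is the A of the initiator ATG (an assumption).
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    return (position - 1) // 3 + 1


def frame_offsets(insert_positions):
    """List (position, codon, cumulative frame offset mod 3) per extra base."""
    offsets = []
    for count, pos in enumerate(sorted(insert_positions), start=1):
        offsets.append((pos, codon_index(pos), count % 3))
    return offsets


if __name__ == "__main__":
    # 5460 bp of coding sequence should correspond to 1820 codons.
    print(ORF_BP // 3 == PROTEIN_AA)
    # Mean residue mass implied by the predicted molecular weight.
    print(round(PREDICTED_MW / PROTEIN_AA, 1))
    for pos, codon, offset in frame_offsets(EXTRA_G_POSITIONS):
        print(pos, codon, offset)
```

Under that numbering assumption, the first extra base falls in codon 1384 and the cumulative offset returns to zero within codon 1402, which is in line with the reported difference spanning residues 1385 to 1402. The implied mean residue mass is about 107.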

Northern analysis

In order to determine the size of the mRNA transcripts corresponding to the isolated cDNA, Northern analysis was carried out using total RNA isolated from cultured lung fibroblasts (GM05839, American Type Culture Collection). When a probe covering bases 1952-2823 was used, two transcripts of 7.5 and 9.0 kb were identified (Fig. 3, lane 1), in agreement with the findings of Moren et al. (1994). Using the lung fibroblast RNA as template, we also isolated a 506 bp cDNA fragment by reverse transcriptase/polymerase chain reaction corresponding to bases 21-526 of Moren et al. and used it as a probe in Northern analysis. This probe identified a 2 kb transcript, presumably corresponding to human seryl-tRNA synthetase, which to date has not been cloned in its entirety. However, the transcript for hamster seryl-tRNA synthetase is 2.5 kb (Debastisse et al., 1984), and those for other mammalian synthetases are quite variable. These results, then, demonstrate that the 5' end of the sequence obtained by Moren et al., encompassing 1420 bp, is the result of a cloning artifact incorporating a portion of seryl-tRNA synthetase cDNA.

Structure of the gene

A partial Sau3A human genomic library was constructed in λ DASH and screened with cDNA clones covering the entire transcript. Genomic clones, encompassing approximately 100 kbp, were identified and characterized by restriction endonuclease mapping and DNA sequencing (Fig. 4). The analyses demonstrated that the entire cloned cDNA sequence was contained within the cloned genomic DNA, and permitted the definition of the exon/intron structure. However, it should be noted that although the characterized genomic DNA included all the cloned cDNA, it did not form a continuous genomic sequence. Indeterminate portions of introns 1, 2, 3 and 8 are missing (Fig. 4), but it did not seem worthwhile at present to rescreen genomic libraries just to fill in these gaps.

Thirty-six exons were identified in the gene, and an interesting feature is the progressive dilution of coding sequence as one progresses from the 3' to the 5' end of the gene (compare segments A, B, and C to segments D and E in Fig. 4). Exons 1-3 are embedded in at least 33 kbp of genomic DNA while exons 19-36 are contained in only 14 kbp. Division of the cDNA into exon segments is illustrated in Fig. 1. The EGF-like repeats are encoded in single exons, while the eight-cysteine repeats are not. Exon/intron borders, shown in Fig. 5, conform to the consensus sequences deduced from analysis of a large number of eukaryotic genes (Padgett et al., 1986). Generally (31 out of 35 cases), the borders divide codons such that a single nucleotide is found at the 3' end of an exon and the remaining two nucleotides of the codon are found at the 5' end of the following exon. A similar exon border structure is found in the human fibrillin-1 gene (Pereira et al., 1993). In two cases in the LTBP-2 gene, codons are divided so that a single base is found at the 5' border, and in two cases the border does not divide a codon.

[Fig. 4: restriction map not reproduced. Genomic segments A to E are drawn on a nucleotide scale (kb); restriction sites shown: HindIII, BamHI, SstI, EcoRI and SalI.]

Fig. 4. Diagram of human LTBP-2 gene. The entire cDNA illustrated in Fig. 1 is contained in five nonoverlapping segments of genomic DNA. The individual λ clones are identified and exons are numbered.

[Fig. 5: table of exon numbers, exon sizes and exon/intron junction sequences not reproduced.]

Fig. 5. Nucleotide sequence of the LTBP-2 gene exon/intron junctions. Exons are numbered in a 5' to 3' direction. Divided border exons are underlined with a solid bar, while undivided border exons are underlined with an open bar. Twenty bases of each flanking intron are shown.

Protein sequence alignments

We have used the Clustal W program of progressive multiple sequence alignment to examine the relationship within and between the LTBP and fibrillin groups (Higgins and Sharp, 1989). This alignment method offers improved sensitivity over previous approaches by assigning individual weights to each sequence based on degree of similarity, by varying amino acid weight matrices as the alignment proceeds depending on estimated distances, and by the use of residue-specific, hydrophilic region-specific and locally determined reduced gap penalties (Thompson et al., 1994). We have compared three regions (Fig. 6) of 10 gene products: human (LTBP1HUM) and rat (LTBP1RAT) LTBP-1, human (LTBP2HUM) and bovine (LTBP2BOV) LTBP-2, mouse (LTBP3MUS) LTBP-3, human (FBN1HUM), bovine (FBN1BOV) and mouse (FBN1MUS) fibrillin-1, and human (FBN2HUM) and mouse (FBN2MUS) fibrillin-2 (Kanzaki et al., 1990; Tsuji et al., 1990; Corson et al., 1993; Pereira et al., 1993; Zhang et al., 1994; Moren et al., 1994; Yin et al., 1995; Gibson et al., 1995). Each region for analysis was selected because discrete ends could be identified, thereby strengthening the alignment by ensuring the comparison of truly analogous segments. Clustal W alignments of regions A, B and C produced consensus sequences that are 188, 559 and 246 residues in length, respectively.

The phylogenetic tree shown in Fig. 7 represents region A. Phylogenetic trees constructed from alignments of regions B and C bear the same topology and are not shown. Within the LTBP clade, LTBP-3 is more similar to LTBP-1 than to LTBP-2. LTBP-2 shows greater similarity to the fibrillins than either LTBP-1 or LTBP-3. Within the fibrillin clade, fibrillin-1 is nearest to the LTBPs. Bootstrap resampling analysis (Felsenstein, 1985) of the phylogenies showed that the branches are strongly supported.

DISCUSSION

In the original characterization of LTBP-1, Northern analysis showed two transcripts of 5.2 and 6.2-7.0 kb. Analysis of the corresponding cDNAs indicated that the longer transcript, which was cloned using rat kidney RNA (Tsuji et al., 1990), differed significantly only at the 5' end from the shorter transcript, which was cloned using human fibroblast RNA (Kanzaki et al., 1990). Besides containing differing 5' untranslated sequences, the encoded signal sequences were very dissimilar and, most importantly, the longer transcript encoded an additional 318 amino acids at the amino end. In the remainder of the coding sequence, where the two transcripts overlapped, they were highly homologous, with only rare base and amino acid changes, which were largely conservative. Thus, it is unlikely that the major difference in the coding sequence located at the 5' end is due to species variation. The origin of the two different size transcripts is presently unknown. One possibility is alternative splicing in which different 5' exons are utilized in an exclusive fashion. A second possibility is that alternative promoters are used, in which case different 5' exons generated by the two separate promoters are spliced to a common downstream exon. In this model, the downstream promoter would be located within an intron of the larger transcript and, depending upon the structure of the gene, alternative splicing might or might not be involved.

Fig. 6. Schematic diagrams of the domain structure of the LTBPs and fibrillins. Segments A, B, and C were used in the comparative sequence analysis and generation of phylogenetic trees as discussed in the text and illustrated in Fig. 7. Legend entries: four-cysteine domain, EGF-like domain, EGF-like calcium-binding domain, eight-cysteine domain, hybrid. Proteins shown: LTBP1 Rat, LTBP2 Human, FBN1 Human, FBN2 Human.

Our present analysis encompasses 6983 bp of cDNA. The first methionine in the open reading frame begins at base 387, and Moren et al. (1994) used this methionine as the start codon in the construction of an expression vector. They obtained labeled protein, immunoprecipitable with LTBP-2-specific antibody, from the culture media of COS-1 cells transfected with this vector, even though a potential (but experimentally unproven) hydrophobic secretory signal sequence is not found until 20-33 residues following the initiator methionine. Furthermore, when the LTBP-2 construct was co-transfected with one expressing TGF-β precursor, the two were secreted as a latent complex. These results support the selected methionine as a legitimate start codon. Based upon this identification, the presently cloned 6983 bp of cDNA most likely corresponds to the 7.5 kb transcript, not the larger transcript as suggested by Moren et al. (1994). Presumably, the remaining approx. 500 bases consist of a polyadenylation tract and possibly additional bases in the 5' untranslated segment.

Fig. 7. Phylogenetic tree generated from the alignment of region A (Fig. 6) using the Clustal W method and built with the neighbor-joining method. The branch lengths reflect distances between sequences (1.0 minus the number of mismatches divided by total number of residues excluding gaps) and each length is printed above each branch. Every node represents a consensus between the sequences on each branch. The number at each node denotes the proportion of 1000 bootstrap iterations that supported the subset of sequences as illustrated.

Analysis of the genomic sequence 5' of the cloned cDNA (not shown) demonstrated that it was very G + C rich with a high frequency of CpG dinucleotides, consistent with it being a CpG island (Gardiner-Garden and Frommer, 1987). Although no canonical TATA or CCAAT boxes were found within 500 bases extending 5' of the cDNA, several consensus sequences for the binding of transcription factors, including Sp1, were identified. These features are characteristic of widely expressed genes. However, future primer extension and nuclease protection experiments are necessary to identify the origin of transcription and the legitimate promoter segment.

The present analyses also leave open the origin of the larger 9.0 kb transcript. Our Northern analyses indicate that the cloned cDNA is present in both the 7.5 and 9.0 kb transcripts, and thus it is possible that the situation with LTBP-2 is similar to that of LTBP-1, in which case the larger transcript may arise through alternative splicing or through the action of another promoter segment. Further cDNA cloning is necessary to identify the additional 1500 bases, which will then permit the localization of the origin of transcription of the longer transcript and the corresponding genomic region, including the promoter.

The LTBPs are a family of large glycoproteins containing multiple domains that are rich in cysteine. The most common motif is an EGF-like sequence that repeats between 8 and 13 times in one contiguous segment.
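The CpG-island assessment of the 5' flanking sequence mentioned in the Discussion can be illustrated with a minimal sketch. It assumes the commonly quoted criteria of Gardiner-Garden and Frommer (1987): a stretch of at least 200 bp, G + C content above 50%, and an observed/expected CpG ratio above 0.6. The function names and the evaluation of a whole sequence rather than a sliding window are assumptions made for illustration, not the procedure used in the present work, and the sequences in the usage lines are artificial.

```python
def cpg_island_stats(seq: str) -> dict:
    """Return length, G+C fraction and observed/expected CpG ratio of a DNA string."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() finds every CpG
    gc_fraction = (c + g) / n
    expected = c * g / n  # CpG count expected from base composition alone
    obs_exp = cpg / expected if expected > 0 else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def looks_like_cpg_island(seq: str,
                          min_len: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, G+C and observed/expected thresholds."""
    stats = cpg_island_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_fraction"] > min_gc
            and stats["obs_exp_cpg"] > min_obs_exp)


# Artificial sequences, for illustration only (not the LTBP-2 5' flank).
cg_rich = "CG" * 150
at_rich = "AT" * 150
print(cpg_island_stats(cg_rich))
print(looks_like_cpg_island(cg_rich), looks_like_cpg_island(at_rich))
```

A sliding-window version would call `cpg_island_stats` on successive windows of the genomic sequence; the whole-sequence form is kept here only for brevity.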
The function of the LTBPs is uncertain, although there is convincing evidence that these proteins bind TGF-β intracellularly and facilitate its proper folding and secretion (Miyazono et al., 1991). The LTBPs have also been shown to associate with the extracellular matrix of cells, where they may provide a bridge between TGF-β and matrix-associated proteins (Taipale et al., 1994). Another function of the LTBPs may be as structural components of the extracellular matrix. LTBP-1 has been shown to associate with the extracellular matrix by immunohistochemical localization (Mizoi et al., 1993; Waltenberger et al., 1993; Yamazaki et al., 1994). LTBP-2 has been specifically localized to elastic fibers by immuno-light microscopy, while localization at the ultrastructural level indicates that the protein is confined to microfibrils (Gibson et al., 1995). This is a particularly intriguing finding considering the strong structural similarities between the LTBPs and the fibrillins. Fibrillins are thought to provide the basic scaffolding of the microfibril and contain many of the same cysteine-rich motifs found in the LTBPs (see Fig. 6).

Similarities of structural domains shared by the fibrillins and LTBPs imply closely related gene families. Phylogenetic analysis of relatedness using sequence alignment and comparison algorithms established that the LTBPs and fibrillins are, indeed, homologous. Homology implies divergence from a common ancestor, which for most proteins is indicated by a sequence identity of greater than 20%. Although the fibrillins and LTBPs clearly form their own gene families, the degree of identity between members of the two families exceeds the 20% threshold. For large multidomain proteins, different parts of the protein frequently have different genetic origins. This appears to be true for both the fibrillins and LTBPs, where greater than 50% of their nucleotides were obtained from an ancestral gene with homology to EGF and approx. 20% of the coding sequence appears to have diverged from a gene encoding the eight-cysteine motifs. EGF motifs are commonly found in extracellular matrix proteins, where they are thought to provide structural stability to regions that connect other functional domains. The eight-cysteine motifs, in contrast, are unique to the fibrillin and LTBP gene families. Analysis of intron-exon boundaries suggests that exons for the EGF and eight-cysteine motifs have been shuffled from common progenitor genes into the LTBPs and fibrillins, with subsequent expansion of individual exons and, possibly, groups of exons, through gene duplication or chromosomal cross-over events.
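The 20% identity criterion discussed above can be made concrete with a small sketch that computes percent identity between two rows of an existing alignment, skipping any column that contains a gap. The function names, the gap character and the two short peptide strings are hypothetical and chosen only for illustration; they are not LTBP or fibrillin sequences, and the code does not reproduce the Clustal W calculations used in the present work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def exceeds_identity_threshold(aligned_a: str, aligned_b: str,
                               threshold: float = 20.0) -> bool:
    """True when identity is above the threshold (20% is the figure quoted in the text)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical aligned fragments, for illustration only.
row_a = "CEDIDECQ-NPC"
row_b = "CVDVNECAGNPC"
print(round(percent_identity(row_a, row_b), 1))
print(exceeds_identity_threshold(row_a, row_b))
```

Short fragments such as these give unstable percentages; in practice the comparison would be made over full aligned regions such as segments A, B and C of Fig. 6.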
Phylogenetic analysis (Fig. 7) predicts that LTBP-3 may be the common progenitor for both gene families, but we cannot exclude the possibility of LTBP and fibrillin divergence from an as-yet unidentified precursor.

Acknowledgements: Supported by NIH grant AR41414. We thank the Biopolymer Analysis Laboratory, University of Pennsylvania Dental School, for DNA sequence determination and informatics support.

REFERENCES

Bashir M. M., Indik Z., Yeh H., Ornstein-Goldstein N., Rosenbloom J. C., Abrams W., Fazio M., Uitto J. and Rosenbloom J. (1989) Characterization of the complete human elastin gene. Delineation of unusual features in the 5'-flanking region. J. Biol. Chem. 264, 8887-8891.
Corson G. M., Chalberg S. C., Dietz H. C., Charbonneau N. L. and Sakai L. Y. (1993) Fibrillin binds calcium and is coded by cDNAs that reveal a multidomain structure and alternatively spliced exons at the 5' end. Genomics 17, 476-484.
Debatisse M., Robert De Saint Vincent B. and Buttin G. (1984) Expression of several amplified genes in an adenylate-deaminase overproducing variant of Chinese hamster fibroblasts. EMBO J. 3, 3123-3127.
Felsenstein J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783-791.
Flaumenhaft R., Abe M., Sato Y., Miyazono K., Harpel J., Heldin C.-H. and Rifkin D. B. (1993) Role of the latent TGF-β binding protein in the activation of latent TGF-β by co-cultures of endothelial and smooth muscle cells. J. Cell. Biol. 120, 995-1002.
Gardiner-Garden M. and Frommer M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261-282.
Gibson M. A., Hatzinikolas G., Davis E., Baker E., Sutherland G. R. and Mecham R. P. (1995) Bovine latent TGF-β1-binding protein-2: molecular cloning, identification of tissue isoforms and immunolocalization to the elastin-associated microfibrils. Molec. Cell. Biol. 15, 6932-6942.
Gubler U. and Hoffman B. J. (1983) A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
Henikoff S. and Henikoff J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915-10,919.
Higgins D. G. and Sharp P. M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5, 151-153.
Kanzaki T., Olofsson A., Moren A., Wernstedt C., Hellman U., Miyazono K., Claesson-Welsh L. and Heldin C.-H. (1990) TGF-β1 binding protein: a component of the large latent complex of TGF-β1 with multiple repeat sequences. Cell 61, 1051-1061.
Miyazono K., Hellman U., Wernstedt C. and Heldin C.-H. (1988) Latent high molecular weight complex of transforming growth factor-β1. Purification from human platelets and structural characterization. J. Biol. Chem. 263, 6407-6415.
Miyazono K., Olofsson A., Colosetti P. and Heldin C.-H. (1991) A role of the latent TGF-β1-binding protein in the assembly and secretion of TGF-β1. EMBO J. 10, 1091-1101.
Mizoi T., Ohtani H., Miyazono K., Miyazawa M., Matsuno S. and Nagura H. (1993) Immunoelectron microscopic localization of transforming growth factor beta-1 and latent transforming growth factor beta-1 binding protein in human gastrointestinal carcinomas: qualitative difference between cancer cells and stromal cells. Cancer Res. 53, 183-190.
Moren A., Olofsson A., Stenman G., Sahlin P., Kanzaki T., Claesson-Welsh L., ten Dijke P., Miyazono K. and Heldin C.-H. (1994) Identification and characterization of LTBP-2, a novel latent transforming growth factor-β-binding protein. J. Biol. Chem. 269, 32,469-32,478.
Myers E. W. and Miller W. (1988) Optimal alignments in linear space. CABIOS 4, 11-17.
Olofsson A., Miyazono K., Kanzaki T., Colosetti P., Engstrom U. and Heldin C.-H. (1992) Transforming growth factor-β1, -β2, and -β3 secreted by a human glioblastoma cell line. Identification of small and different forms of large latent complexes. J. Biol. Chem. 267, 19,482-19,488.
Padgett R. A., Grabowski P. J., Konarska M. M., Seiler S. and Sharp P. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119-1150.
Pereira L., D'Alessio M., Ramirez F., Lynch J. R., Sykes B., Pangillinan T. and Bonadio J. (1993) Genomic organization of the sequence coding for fibrillin, the defective gene product in Marfan syndrome. Human Mol. Genet. 2, 961-968.
Roberts A. B. and Sporn M. B. (1990) The transforming growth factor-βs. In Peptide Growth Factors and Their Receptors I (Edited by Sporn M. B. and Roberts A. B.), pp. 419-472. Springer-Verlag, Berlin.
Saitou N. and Nei M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406-425.
Taipale J., Miyazono K., Heldin C.-H. and Keski-Oja J. (1994) Latent transforming growth factor-β1 associates to fibroblast extracellular matrix via latent TGF-β binding protein. J. Cell Biol. 124, 171-181.
Thompson J. D., Higgins D. G. and Gibson T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673-4680.
Tsuji T., Okada F., Yamaguchi K. and Nakamura T. (1990) Molecular cloning of the large subunit of transforming growth factor type β masking protein and expression of the mRNA in various rat tissues. Proc. Natl. Acad. Sci. USA 87, 8835-8839.
Wakefield L. M., Smith D. M., Flanders K. C. and Sporn M. B. (1988) Latent transforming growth factor-β from human platelets. J. Biol. Chem. 263, 7646-7654.
Waltenberger J., Lundin L., Oberg K., Wilander E., Miyazono K., Heldin C.-H. and Funa K. (1993) Involvement of transforming growth factor-β in the formation of fibrotic lesions in carcinoid heart disease. Am. J. Pathol. 142, 71-78.
Wilbur W. J. and Lipman D. J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726-730.
Yamazaki M., Minota S., Sakurai H., Miyazono K., Yamada A., Kanazawa I. and Kawai M. (1994) Expression of transforming growth factor-β1 and its relation to endomysial fibrosis in progressive muscular dystrophy. Am. J. Pathol. 144, 221-226.
Yin W., Smiley E., Germiller J., Mecham R. P., Florer J. B., Wenstrup R. J. and Bonadio J. (1995) Isolation of a novel latent transforming growth factor-β binding protein gene (LTBP-3). J. Biol. Chem. 270, 10,147-10,160.
Zhang H., Apfelroth S. D., Hu W., Davis E. C., Sanguineti C., Bonadio J., Mecham R. P. and Ramirez F. (1994) Structure and expression of fibrillin-2, a novel microfibrillar component preferentially located in elastic matrices. J. Cell Biol. 124, 855-863.