Molecular structure and sequence homology of a gene related to α1-antitrypsin in the human genome


GENOMICS 2, 165-173 (1988)

JIA-JU BAO,1 L. REED-FOURQUET, RICHARD N. SIFERS, VINCENT J. KIDD,2 AND SAVIO L. C. WOO

Howard Hughes Medical Institute, Department of Cell Biology, and Institute for Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030

Received November 17, 1987; revised February 12, 1988

A 7.7-kb EcoRI genomic DNA fragment highly homologous to the human α1-antitrypsin (AAT) gene has been cloned. This antitrypsin-related sequence is physically linked to the authentic AAT gene, and both are present in a single cosmid clone. Nucleotide sequencing of the AAT-related genomic fragment demonstrated extensive homology with the authentic AAT gene in the introns as well as in the exons. The conservation of all RNA splice sites and the lack of internal termination codons in the exonic regions suggest that it may not be a classical pseudogene. If expressed, it could yield a protein of 420 amino acid residues with 70% overall homology to human α1-antitrypsin. The signal peptide sequence is well conserved in the related gene, but the Met-Ser active site for protease inhibition in α1-antitrypsin has been changed to Trp-Ser. These data suggest that the putative protein encoded by the AAT-related gene is a secretory serine protease inhibitor with an altered substrate specificity. Interestingly, even the intronic regions of the related gene exhibit 66% overall nucleotide sequence homology with those of the authentic AAT gene. These results suggest that the AAT-related gene is derived from a recent duplication of the authentic AAT gene and represents a new member of the serine protease inhibitor superfamily. © 1988 Academic Press, Inc.

INTRODUCTION

α1-Antitrypsin (AAT) is synthesized primarily in the liver and subsequently secreted into the bloodstream, where it serves as the predominant serine protease inhibitor. Although AAT is capable of inhibiting a variety of proteases, including trypsin, chymotrypsin, thrombin, kallikrein, and plasmin (Laurell and Jeppsson, 1975), its major physiological role is to protect the elastic tissue of the lung alveoli from hydrolysis by excessive neutrophil elastase (Laurell and Eriksson, 1963). Clinically, AAT is important because its genetic deficiency predisposes individuals to the development of pulmonary emphysema (Laurell and Eriksson, 1963; Gadek et al., 1980). We have previously reported the molecular cloning and nucleotide sequence analysis of a full-length cDNA of human AAT (Kurachi et al., 1981). Subsequently, the molecular structure of the human AAT gene was characterized and its entire nucleotide sequence reported (Long et al., 1984). AAT is a member of the serine protease inhibitor gene family, which includes α1-antichymotrypsin, antithrombin III, chicken ovalbumin, Y protein, and angiotensinogen (Hunt and Dayhoff, 1980). Varying degrees of nucleotide and amino acid sequence homology exist between members of this family. It has been suggested that the members of this gene family arose from a single ancestral gene, and several models of the divergent evolutionary history of this superfamily have been proposed (Leicht et al., 1982; Prochownik et al., 1985; Bao et al., 1987). We and others have previously reported the detection of an uncharacterized 7.7-kb EcoRI fragment in total human genomic DNA that is highly homologous to the human AAT gene (Lai et al., 1983; Hodgson and Kalsheker, 1986). Here we report the molecular structure and nucleotide sequence of this AAT-related genomic DNA sequence and show that it is located within 8 kb downstream of the authentic human AAT gene. Its extensive nucleotide sequence homology with the AAT gene suggests that it arose from a recent gene duplication event. Furthermore, if expressed, it would encode a plasma protease inhibitor with a substrate specificity distinct from that of AAT.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. J03044.

1 Current address: Institute of Basic Medical Sciences, Department of Biophysics, 5 Dong Dan San Tiao, Beijing, P.R. China.

2 Current address: University of Alabama, Department of Cell Biology and Anatomy, 1918 University Blvd., Birmingham, AL 35294.

0888-7543/88 $3.00
Copyright © 1988 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Materials

All restriction endonucleases, T4 DNA ligase, polynucleotide kinase, and exonuclease III were purchased from New England Biolabs. Escherichia coli DNA polymerase I (Klenow fragment) and calf intestine alkaline phosphatase were products of Boehringer Mannheim. [α-32P]dCTP (sp act, >3000 Ci/mmol) for nick translation and [α-35S]dATP (sp act, >650 Ci/mmol) for nucleotide sequencing were obtained from Amersham Corporation.

Construction and Screening of Cosmid Genomic DNA Library

High-molecular-weight DNA (>150 kb) was isolated from lymphocytes of a patient homozygous for a Pi”” allele of AAT (Muensch et al., 1986). A genomic DNA library was constructed using the cosmid vector pCV107 (Lau and Kan, 1983) according to the procedure reported by DiLella et al. (1986). Bacterial colonies were grown on nitrocellulose filters (3.5 × 10⁴ colonies/137-mm filter), and 7 × 10⁵ colonies were screened with a 32P-nick-translated AAT cDNA clone (Kurachi et al., 1981) as previously described for the cloning of the PiM and PiZ AAT alleles (Sifers et al., 1987). Recombinant cosmid DNA was isolated from hybridizing transformant colonies and subjected to standard restriction mapping and Southern blot analysis.

Nucleotide Sequence Analysis

Following digestion with various restriction endonucleases, genomic DNA fragments from clones hybridizing with the human AAT cDNA were treated with E. coli DNA polymerase I (Klenow fragment) and ligated into the SmaI site of M13mp18 (Messing, 1983). Following transformation of TG-1 cells, single-stranded DNA was isolated from plaques and sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977) using the M13 universal primer. Alternatively, some genomic DNA fragments were sequenced following exonuclease III deletion as described by Henikoff (1984).

Computer Analysis and Comparison of the DNA Sequences

All computer analyses of sequence data were performed using the on-line facilities of the Protein Identification Resource (PIR), the MBIR services provided by Baylor College of Medicine. Sequences were obtained from the PIR, EMBL, and GenBank repositories. The amino acid sequence alignment and the calculations and predictions for the evolutionary tree were performed by the methods of Feng and Doolittle (1988), using the Multalign program of MBIR.

RESULTS

The Antitrypsin-Related Gene Sequence Is Physically Linked to the Authentic AAT Gene

In an attempt to isolate the antitrypsin-related (ATR) DNA sequence previously identified in the human genome (Lai et al., 1983), a cosmid library of partially digested total human genomic DNA was constructed and screened using the human AAT cDNA as probe. Restriction mapping and Southern blot analysis were performed on recombinant cosmid clones that hybridized to the human AAT cDNA. As expected, the clones contained a 9.6-kb hybridizing EcoRI fragment corresponding to the authentic AAT gene. However, some clones also contained an additional 7.7-kb hybridizing fragment representing the ATR DNA sequence (Fig. 1, lane 2). Since the average genomic DNA insert in each recombinant cosmid clone was 40 kb in length, these results indicate that the ATR sequence previously detected within the human genome is physically linked to the authentic AAT gene. Further restriction mapping of partially EcoRI-digested cosmid DNA enabled the generation of a physical map of the AAT locus. As shown in Fig. 2A, the 7.7-kb EcoRI DNA fragment of the ATR sequence is located within 8 kb downstream of the authentic AAT gene.

FIG. 1. Southern blot analysis of a recombinant cosmid clone containing both the AAT gene and the ATR genomic DNA region. Approximately 1 μg of recombinant cosmid clone cATN13 was digested with EcoRI and fractionated in a 0.6% agarose gel. (A) Detection of fractionated DNA fragments by staining with ethidium bromide. (B) Autoradiogram of fractionated DNA transferred to nitrocellulose paper and probed with a 32P-nick-translated human AAT cDNA. Lane 1, DNA molecular size marker. Lane 2, cosmid cATN13 DNA.

FIG. 2. Physical map of the α1-antitrypsin (AAT) and antitrypsin-related (ATR) loci. (A) EcoRI restriction map showing linkage of AAT and ATR sequences. Boxed areas indicate the position of each gene. Solid regions show the positions of exons within the AAT gene. The positions of the methionine translation initiation codon (ATG) and nonsense stop codon (TAA) for the AAT gene are shown in parentheses. (B) Partial restriction map of the ATR DNA fragment (BamHI, EcoRI, HinfI, PvuII, HindIII, and ClaI sites; scale bar, 1 kb) showing the nucleotide sequencing strategy. Solid boxes indicate regions that hybridize with the AAT cDNA. Proposed translation start (ATG) and stop (TAA) codons are shown in parentheses.

Structure of the Antitrypsin-Related Gene

Restriction mapping and Southern blot analysis of the ATR DNA fragment identified four distinct hybridizing regions (Fig. 2B). The nucleotide sequences of all the hybridizing regions plus substantial flanking DNA segments were determined and are shown in Fig. 3. Comparison of this sequence with that of the authentic AAT gene revealed extensive homology, particularly in the regions corresponding to the exons of the AAT gene. Further examination of the ATR regions corresponding to the intron-exon junctions of the AAT gene showed that all of the donor and acceptor sites for RNA splicing are conserved (Fig. 4), suggesting that the ATR gene, if transcribed, could be processed into a mature mRNA encoding a protein 420 amino acids in length.
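As an illustration only (not part of the original analysis), the splice-site conservation can be screened against the canonical GT-AG rule, under which introns almost always begin with GT and end with AG. The sketch below assumes the intron flanks as printed in Fig. 4 (lowercase); the helper `follows_gt_ag` is a hypothetical name, and the check covers only the terminal dinucleotides, not the full splice-site consensus.

```python
# Intron flanks transcribed from Fig. 4. Acceptor-side intron tails precede
# exons 2-5; donor-side intron heads are listed in the order they are printed.
ACCEPTORS = {
    "AAT": ["ccatctctgtcttgcag", "ccttcccctctctccag",
            "tcttttctgcacgacag", "gcttctctcccctccag"],
    "ATR": ["ctgcctctgtcttgcag", "ccttccttttcttccag",
            "tctttccagcatggcag", "gcttctctccccttcag"],
}
DONORS = {
    "AAT": ["gtgattcccc", "gtgagatcac", "gtaaggttgc"],
    "ATR": ["gtgatttcct", "gtgaggtcac", "gtaaggtagc"],
}


def follows_gt_ag(acceptors, donors):
    """True if every acceptor-side intron ends in AG and every donor side starts with GT."""
    return (all(s.lower().endswith("ag") for s in acceptors)
            and all(s.lower().startswith("gt") for s in donors))


for gene in ("AAT", "ATR"):
    print(gene, follows_gt_ag(ACCEPTORS[gene], DONORS[gene]))
```

Both genes pass, which is consistent with the statement that all donor and acceptor sites are conserved; a fuller check would score each flank against a position weight matrix rather than two nucleotides.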

The Antitrypsin-Related Gene Encodes a Putative Protease Inhibitor

The primary structure of the putative ATR protein was deduced from the exonic regions by comparison with the authentic human AAT gene (Fig. 5). The putative ATR protein shows a high degree of amino acid conservation relative to AAT. In particular, the 24 amino acid residues comprising the signal sequence at the amino terminus of AAT have been conserved (Fig. 5). In addition, all three glycosylation sites have been conserved, although the first one is positioned 17 amino acids closer to the amino terminus (Fig. 5). These data suggest that if the ATR gene is expressed, its protein product is likely to be a secreted glycoprotein. Interestingly, the homology between the two proteins is weak in the active-site region of AAT, and the active site itself has been changed from Met-Ser in AAT to Trp-Ser in the putative ATR protein (Fig. 5). Conservation of the critical active-site serine in the putative ATR protein suggests that it is a protease inhibitor, and the substitution of the preceding amino acid from Met to Trp at the P1 position suggests that it has a different substrate specificity. A similar pattern (Fig. 6) that results in altered inhibitory specificities has previously been observed in other serine protease inhibitors (McRae et al., 1980; Courtney et al., 1985).

FIG. 3. Nucleotide sequence of the ATR DNA fragment. Sequences homologous to exons of the human AAT gene are shown by underlined uppercase letters. Sequences homologous to intronic sequences are shown in lowercase letters. Panel A corresponds to sequences around exon II. Panel B corresponds to sequences including exons III, IV, and V. A putative polyadenylation signal is located at position 3418.
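The glycosylation sites compared in Fig. 5 are asparagine-linked, and such sites generally fall in the sequon Asn-X-Ser/Thr, where X is any residue except proline. A minimal sketch of how candidate sites can be located in a deduced protein sequence follows; the function name `find_sequons` and the toy sequence are hypothetical and are not taken from AAT or ATR.

```python
import re

# N-linked glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MKNGTAPNPSQNVT"))  # [3, 12]
```

In the toy sequence the Asn at position 8 is skipped because it is followed by Pro. A sequon is necessary but not sufficient for glycosylation, so a scan like this only flags candidates.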

EXON   AAT                          ATR

       ccatctctgtcttgcag            ctgcctctgtcttgcag
2      GACAAT ... TTAAAG            GACAAT ... TTCACG

       AAT                          ATR

       ccttcccctctctccag            ccttccttttcttccag
3      GCAAAT ... CAGAAG            GCAAGT ... CATAAG
       gtgattcccc                   gtgatttcct

       AAT                          ATR

       tcttttctgcacgacag            tctttccagcatggcag
4      GTCTGC ... TCCAAG            GATCTA ... TCCAAG
       gtgagatcac                   gtgaggtcac

       AAT                          ATR

       gcttctctcccctccag            gcttctctccccttcag
5      GCCGTG ... GAGCTG            GCTGTG ... GAATTG
       gtaaggttgc                   gtaaggtagc

FIG. 4. Comparison of the intron/exon splice junctions of the AAT gene with those of the ATR sequence. Exon sequences are shown in uppercase letters, and intronic sequences in lowercase.
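The percent homologies quoted in this paper come from aligned sequences. For short, gap-free stretches such as the intron flanks in Fig. 4, identity reduces to a position-by-position count. The sketch below assumes equal-length, already-aligned inputs; it is a simplification of, not a substitute for, the Multalign-based alignments described in Materials and Methods, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.lower(), b.lower()))
    return 100.0 * matches / len(a)


# Acceptor-side intron tails preceding exon 2 (AAT vs. ATR), from Fig. 4.
print(round(percent_identity("ccatctctgtcttgcag", "ctgcctctgtcttgcag"), 1))  # 82.4
```

Here 14 of 17 positions match. Whole introns contain insertions and deletions, so comparing them requires a gapped alignment, for example with affine gap costs (Altschul and Erickson, 1986).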

The Antitrypsin-Related Gene Arose from a Recent Duplication of the Authentic AAT Gene

Interestingly, the high degree of nucleotide sequence homology between the AAT and ATR gene sequences exists not only in the exonic regions but also in the introns. This extensive intronic homology suggests that the ATR region arose from a very recent duplication of the authentic AAT gene. The homology between these two genes is much more extensive than that between any other members of the serine protease inhibitor family (Bao et al., 1987), and calculations based on the methods of Feng and Doolittle (1988) support an evolutionary tree in which the ATR sequence and the AAT gene diverged approximately 70 million years ago (Fig. 7).

DISCUSSION

The serine protease inhibitor (SERPIN) superfamily (Carrell and Travis, 1985) includes α1-antitrypsin, α1-antichymotrypsin, antithrombin III, chicken ovalbumin, angiotensinogen, and several additional new members. They share varying degrees of amino acid sequence homology, nucleotide sequence homology, and gene structural organization (Hunt and Dayhoff, 1980; Leicht et al., 1982; Doolittle, 1983; Chandra et al., 1983; Ohkubo et al., 1983; Tanaka et al., 1984). It is postulated that the members of this superfamily arose from a common ancestral gene, and several models of the divergent evolutionary history of this gene family have been proposed (Prochownik et al., 1985; Bao et al., 1987). The ATR sequence has previously been mapped to human chromosome 14 (Lai et al., 1983). In the present report we have demonstrated that the ATR region is physically linked to, and lies within 8 kb downstream of, the authentic AAT gene. This close physical linkage has allowed an EcoRI restriction fragment length polymorphism (RFLP) within the ATR region (Hodgson and Kalsheker, 1986) to be used as an adjunct to conventional prenatal diagnosis of AAT deficiency (Abbott et al., 1987; Cox et al., 1985; Hejtmancik et al., 1986).

Previously, the two most closely related SERPIN family members were considered to be α1-antitrypsin and α1-antichymotrypsin, which share an overall amino acid sequence homology of 42% and a nucleotide sequence homology of 56% between their cDNAs (Chandra et al., 1983). Furthermore, both genes are located in the q31-q32 region of human chromosome 14 (Schroeder et al., 1985; Rabin et al., 1986). These observations suggest that the divergence of these two genes occurred relatively recently compared with other members of the SERPIN family (Bao et al., 1987). The nucleotide sequence homologies between the AAT and ATR sequences, however, range from 75 to 81% within exonic regions and from 43 to 78% within introns. On the basis of this extensive overall nucleotide sequence homology and the identical structure of the two genes, we estimated that the ATR sequence diverged from the authentic AAT gene approximately 70 million years ago. Recently, Hill and Hastie (1987) estimated that the divergence of several contrapsin-related genes in rats occurred approximately 20 million years ago. Contrapsin is a member of the SERPIN family in rodents that functions as the counterpart of α1-antichymotrypsin (ACT) in humans (Hill et al., 1984). Interestingly, whereas a syntenic cluster of contrapsin-related genes is present in rodents (Hill et al., 1985), only one ACT gene is present in humans (Bao et al., 1987). These data clearly indicate that the individual members of the SERPIN family are evolving independently in rodents.
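The 70-million-year figure above was derived from tree calculations following Feng and Doolittle (1988). As a simpler point of comparison, and not the authors' method, a pairwise divergence time can be sketched with the Jukes-Cantor correction K = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites, combined with a molecular clock T = K / (2r). The substitution rate r must be supplied as an assumption; the value in the usage line below is hypothetical, and the resulting time is only as reliable as that rate.

```python
import math


def jukes_cantor_distance(identity):
    """Jukes-Cantor estimate of substitutions per site from fractional identity."""
    p = 1.0 - identity  # observed fraction of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must be in (0.25, 1.0] for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_myr(distance, rate_per_site_per_myr):
    """Molecular-clock time T = K / (2r); both lineages accumulate changes at rate r."""
    return distance / (2.0 * rate_per_site_per_myr)


# Hypothetical inputs: 78% exonic identity (inside the 75-81% range quoted above)
# and an assumed rate of 2e-3 substitutions/site/MYR, which is NOT from the paper.
k = jukes_cantor_distance(0.78)
print(f"K = {k:.3f} substitutions/site, T = {divergence_time_myr(k, 2e-3):.0f} MYR")
```

The correction matters most for the less conserved introns: at 43% identity the corrected distance is well above the raw 57% difference, so uncorrected p-distances would understate divergence.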
The physical structure of the ATR genomic sequence is not consistent with that of a classical pseudogene, as it contains regions homologous to both the introns and the exons of the authentic AAT gene. It is noteworthy that the donor/acceptor splice sites, the amino-terminal signal peptide sequence, and two of the three asparagine-linked glycosylation sites of the AAT gene have been conserved in the ATR sequence. Thus, if this region of DNA is transcribed, it could be expressed as a secretory glycoprotein. However, the nucleotide sequence around the active inhibitory site has diverged, changing the Met-Ser of AAT to Trp-Ser in the putative ATR protein. It has been established that the specificity of the SERPINs depends primarily on a single amino acid at the reactive center of the macromolecule (Johnson and Travis, 1978; McRae et al., 1980). This has been most strikingly demonstrated by analysis of the AAT Pittsburgh variant (Lewis et al., 1978). In this variant, the Met-Ser active inhibitory site of AAT has been converted to Arg-Ser, which is the active site of antithrombin III (Chandra et al., 1981), and individuals bearing this variant AAT protein exhibit excessive antithrombin activity in plasma and suffer from a lethal bleeding disorder. Therefore, the Trp-Ser active site in the putative ATR protein could give it an inhibitory specificity different from that of AAT.

FIG. 5. Predicted amino acid sequence of a putative expression product of the ATR sequence compared with that of the authentic AAT. Homologous sequences are within boxes. The signal peptide consists of amino acids -24 to -1. Asparagine-linked glycosylation sites are shown by vertical arrows, and the reactive site is indicated by asterisks.
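The argument that ATR is not a classical pseudogene rests partly on the absence of in-frame termination codons in its exonic regions. Once the exons are joined in silico at the conserved junctions, that check amounts to scanning codons in frame. The functions below are a hypothetical sketch run on toy sequences (not ATR data), assuming the input is a spliced coding sequence that begins at the ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds):
    """Split a coding sequence into in-frame triplets, dropping any partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def internal_stops(cds):
    """0-based codon indices of stop codons occurring before the final codon."""
    triplets = codons(cds)
    return [i for i, c in enumerate(triplets[:-1]) if c in STOP_CODONS]


def looks_like_intact_orf(cds):
    """True if the CDS starts with ATG, ends in a stop codon, and has no internal stop."""
    triplets = codons(cds)
    return (len(cds) % 3 == 0 and len(triplets) >= 2
            and triplets[0] == "ATG" and triplets[-1] in STOP_CODONS
            and not internal_stops(cds))


print(looks_like_intact_orf("ATGGCTTGGTCTTAA"))  # True  (toy Met-Ala-Trp-Ser-stop)
print(internal_stops("ATGTAAGCTTAA"))            # [1]   (premature TAA)
```

An intact reading frame is evidence against a degenerate pseudogene but does not show expression; as noted below, no transcript was detected in adult liver.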

                                         Reactive site ↓
ATR:    Glu Lys Gly Thr Glu Ala ...  Ala  Trp  Ser
AAT:    Glu Lys Gly Thr Glu Ala ...  Pro  Met  Ser
ACT:                            ...  Leu  Leu  Ser
ATIII:                          ...  Gly  Arg  Ser

FIG. 6. Comparison of the regions containing the active inhibitory sites of α1-antitrypsin (AAT), α1-antichymotrypsin (ACT), antithrombin III (ATIII), and the predicted translation product of the antitrypsin-related (ATR) sequence. Homologous regions are shown within boxes. The reactive center of each protein is shown by a vertical arrow. A single amino acid (P1 position) preceding the common serine residue dictates the specificity of each protease inhibitor.
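Because specificity is attributed mainly to the P1 residue, the comparison in Fig. 6 and the Pittsburgh example from the Discussion can be expressed as a lookup keyed on P1. This is a bookkeeping sketch, not a specificity predictor: the dictionary holds only the P1-P1' pairs stated in the text and Fig. 6, and `group_by_p1` is a hypothetical helper.

```python
from collections import defaultdict

# P1-P1' residues at the reactive-site bond, as stated in the text and Fig. 6.
REACTIVE_SITES = {
    "AAT": ("Met", "Ser"),
    "AAT Pittsburgh": ("Arg", "Ser"),
    "ACT": ("Leu", "Ser"),
    "ATIII": ("Arg", "Ser"),
    "ATR (putative)": ("Trp", "Ser"),
}


def group_by_p1(sites):
    """Group inhibitor names by their P1 residue (P1' is ignored)."""
    groups = defaultdict(list)
    for name, (p1, _p1_prime) in sites.items():
        groups[p1].append(name)
    return dict(groups)


for p1, names in sorted(group_by_p1(REACTIVE_SITES).items()):
    print(p1, names)
```

The grouping places the Pittsburgh variant with antithrombin III under Arg, mirroring its acquired antithrombin activity, while the Trp of the putative ATR protein is shared by none of the listed inhibitors.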

Hill et al. (1984) have reported that the P1 residue at the reactive center of mouse AAT is tyrosine. However, Krauter et al. (1986) have reported a very similar sequence for mouse AAT except that methionine occupies the P1 position. This discrepancy suggests the presence of multiple AAT genes in mice and supports the hypothesis proposed by Hill and Hastie (1987) that positive Darwinian selection is responsible for the rapid divergence of the reactive centers of members of the SERPIN family. Furthermore, Krauter et al. (1986) have shown that the AAT gene and the AAT-related gene(s) are all localized on chromosome 12 in the mouse. The evolutionary relationship between the AAT-related genes in mice and those in humans cannot be determined without further structural analysis of the AAT-related genes in the mouse genome.

The authentic human AAT gene consists of five exons and four introns (Long et al., 1984). We have identified regions within the ATR sequence that are homologous to exons II through V of the AAT gene. Exon I of AAT is only 50 bp in length and consists entirely of 5' untranslated sequence (Long et al., 1984). We have not yet detected regions upstream of the ATR sequence that are sufficiently homologous to exon I of the authentic AAT gene. Furthermore, when an exonic DNA fragment of the ATR gene was used as probe in an S1 nuclease protection assay, no transcript was detected in total RNA from human liver (R. Sifers, unpublished observations), indicating that the ATR sequence is not expressed in adult human liver. These data, however, do not exclude the possibility of its transcription in other tissues or at other developmental stages. Recently, we and others have demonstrated that the human AAT gene is expressed in a variety of cell types in addition to the liver when introduced into the germline of mice (Sifers et al., 1987; Carlson et al., 1988; Kelsey et al., 1987). Thus, further analyses using the transgenic mouse model are needed to determine whether the ATR sequence is expressed in vivo.

FIG. 7. Predicted evolutionary map of the serine protease inhibitor superfamily. Branches depict divergence of genes. Evolutionary time is shown in millions of years (MYR). Abbreviations are those described in the legend to Fig. 6, plus angiotensinogen (ANGIO), ovalbumin (OVAL), and Y protein (Y).

ACKNOWLEDGMENTS

This work was partially supported by NIH Grants HL27509 and HL37136. J.J.B. is the recipient of a WHO fellowship on leave from the Department of Biophysics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences. R.N.S. is the recipient of Postdoctoral Fellowship HL07343 from the NIH. S.L.C.W. is an Investigator of the Howard Hughes Medical Institute. We express our appreciation to Ms. Debbie Martin for typing this manuscript.

REFERENCES

1. ABBOTT, C. M., MCMAHON, C. J., KELSEY, G. D., PARKER, M., WHITEHOUSE, D. B., CORNEY, G., POVEY, S., HOPKINSON, D. A., MIELI-VERGANI, G., AND MOWAT, A. (1987). Alpha-1-antitrypsin-related gene (ATR) for prenatal diagnosis. Lancet 1: 1425-1426.
2. ALTSCHUL, S. F., AND ERICKSON, B. W. (1986). Optimal sequence alignment using affine gap costs. Bull. Math. Biol. 48: 603-616.
3. BAO, J. J., SIFERS, R. N., KIDD, V. J., LEDLEY, F. D., AND WOO, S. L. C. (1987). Molecular evolution of serpins: Homologous structure of the human alpha-1-antichymotrypsin and alpha-1-antitrypsin genes. Biochemistry 26: 7755-7759.
4. BARTH, R. K., GROSS, K. W., GREMKE, L. C., AND HASTIE, N. D. (1982). Developmentally regulated mRNAs in mouse liver. Proc. Natl. Acad. Sci. USA 79: 500-504.
5. CARLSON, J. A., ROGERS, B. B., SIFERS, R. N., HAWKINS, H. K., FINEGOLD, M. J., AND WOO, S. L. C. (1988). Multiple tissues express alpha-1-antitrypsin in transgenic mice and man. J. Clin. Invest., in press.
6. CARRELL, R., AND TRAVIS, J. (1985). α1-Antitrypsin and the serpins: Variation and countervariation. Trends Biochem. Sci. 10: 20-24.
7. CHANDRA, T., KURACHI, K., DAVIE, E. W., AND WOO, S. L. C. (1981). Induction of α1-antitrypsin mRNA and cloning of its cDNA. Biochem. Biophys. Res. Commun. 103: 751-758.
8. CHANDRA, T., STACKHOUSE, R., KIDD, V. J., ROBSON, K. J. H., AND WOO, S. L. C. (1983). Sequence homology between human α1-antichymotrypsin, α1-antitrypsin, and antithrombin III. Biochemistry 22: 5055-5060.
9. COURTNEY, M., JALLAT, S., TESSIER, L.-H., BENAVENTE, A., CRYSTAL, R. G., AND LECOCQ, J.-P. (1985). Synthesis in E. coli of alpha-1-antitrypsin variants of therapeutic potential for emphysema and thrombosis. Nature (London) 313: 149-151.
10. COX, D. W., WOO, S. L. C., AND MANSFIELD, T. (1985). DNA restriction fragments associated with alpha-1-antitrypsin indicate a single origin for deficiency allele PiZ. Nature (London) 316: 79-81.
11. DAYHOFF, M. O. (1976). The origin and evolution of protein superfamilies. Fed. Proc. 35: 2132-2138.
12. DAYHOFF, M. O., ECK, R. V., AND PARK, C. M. (1972). In "Atlas of Protein Sequence and Structure" (M. O. Dayhoff, Ed.), National Biomedical Research Foundation, Washington, DC.
13. DILELLA, A. G., KWOK, S. C. M., LEDLEY, F. D., MARVIT, J., AND WOO, S. L. C. (1986). Molecular structure and polymorphic map of the human phenylalanine hydroxylase gene. Biochemistry 25: 743-749.
14. DOOLITTLE, R. F. (1983). Angiotensinogen is related to the alpha-1-antitrypsin family. Science 222: 417-419.
15. FENG, D. F., AND DOOLITTLE, R. F. (1988). Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., in press.
16. GADEK, J. E., HUNNINGHAKE, G. W., FELLS, G. A., ZIMMERMAN, R. L., KEOGH, B. A., AND CRYSTAL, R. G. (1980). Evaluation of the protease-antiprotease theory of human destructive lung disease. Bull. Eur. Physiopathol. Respir. 16: 27-40.
17. HEJTMANCIK, J. F., SIFERS, R. N., WARD, P. A., HARRIS, S., MANSFIELD, T., AND COX, D. W. (1986). Prenatal diagnosis of α1-antitrypsin deficiency by restriction fragment length polymorphisms, and comparison with oligonucleotide probe analysis. Lancet 2: 767-769.
18. HENIKOFF, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28: 351-359.
19. HILL, R. E., SHAW, P. H., BOYD, P. A., BAUMANN, H., AND HASTIE, N. D. (1984). Plasma protease inhibitors in mouse and man: Divergence within the reactive centre regions. Nature (London) 311: 175-177.
20. HILL, R. E., SHAW, P. H., BARTH, R. K., AND HASTIE, N. D. (1985). A genetic locus closely linked to a protease inhibitor complex controls the level of multiple RNA transcripts. Mol. Cell. Biol. 5: 2114-2122.
21. HILL, R. E., AND HASTIE, N. D. (1987). Accelerated evolution in the reactive centre regions of serine protease inhibitors. Nature (London) 326: 96-99.
22. HODGSON, I., AND KALSHEKER (1986). RFLP for a gene-related sequence of alpha-1-antitrypsin (AAT). Nucleic Acids Res. 14: 6779.
23. HUNT, L. T., AND DAYHOFF, M. O. (1980). A surprising new protein superfamily containing ovalbumin, antithrombin III, and alpha-1-proteinase inhibitor. Biochem. Biophys. Res. Commun. 95: 864-871.
24. JOHNSON, D., AND TRAVIS, J. (1978). Structural evidence for methionine at the reactive site of human alpha-1-proteinase inhibitor. J. Biol. Chem. 253: 7142-7144.
25. KELSEY, G. D., POVEY, S., BYGRAVE, A. E., AND LOVELL-BADGE, R. H. (1987). Species- and tissue-specific expression of human alpha-1-antitrypsin in transgenic mice. Genes Dev. 1: 161-171.
26. KLOTZ, L. C., AND BLANKEN, R. L. (1981). A practical method for calculating evolutionary trees from sequence data. J. Theor. Biol. 91: 261-272.
27. KRAUTER, K. S., CITRON, B. A., HSU, M.-T., POWELL, D., AND DARNELL, J. E. (1986). Isolation and characterization of the alpha-1-antitrypsin gene in mice. DNA 5: 29-36.
28. KURACHI, K., CHANDRA, T., DEGEN, S. J. F., WHITE, T. T., MARCHIORO, T. L., WOO, S. L. C., AND DAVIE, E. W. (1981). Cloning and sequence of cDNA coding for alpha-1-antitrypsin. Proc. Natl. Acad. Sci. USA 78: 6826-6830.
29. LAI, E. C., KAO, F.-T., LAW, M. L., AND WOO, S. L. C. (1983). Assignment of the alpha-1-antitrypsin gene and a sequence-related gene to human chromosome 14 by molecular hybridization. Amer. J. Hum. Genet. 35: 385-392.
30. LATIMER, J. J., BERGER, F. G., AND BAUMANN, H. (1987). Developmental expression, cellular localization, and testosterone regulation of α1-antitrypsin in Mus caroli kidney. J. Biol. Chem. 262: 12641-12646.
31. LAU, Y. F., AND KAN, Y. W. (1983). Versatile cosmid vectors for the isolation, expression, and rescue of gene sequences: Studies with the human α-globin gene cluster. Proc. Natl.

Acad. 32.

33.

34.

Sci. USA 80:

5225-5229.

LAURELL, C. B., AND ERIKSSON, S. (1963). The electrophoretie alpha-l-globulin pattern of serum in alpha-1-antitrypsin deficiency. Stand. J. Clin. Invest. 15: 132-140. LAURELL, C. B., AND JEPPSSON, J. 0. (1975). Protease inhibitors in plasma. In “The Plasma Proteins” (F. W. Putnam, Ed.), Vol. 1, pp. 229-264, Academic Press, New York. LEICHT, M., LONG, G. L., CHANDRA, T., KURACHI, K., KIDD, V. J., MACE, M., DAVIE, E. W., AND Woo, S. L. C. (1982).

ai-ANTITRYPSIN-RELATED

35. 36.

37.

38.

39.

40.

Sequence homology and structural comparison between the chromosomal human alpha-l-antitrypsin and chicken ovalbumin genes. Nature (London) 297: 655-659. LEWIS, J. H., IAMMARINO, R. M., SPERO, J. A., AND HASIBA, U. (1978). Antithrombin pittsburg An a-1-antitrypsin variant causing hemorrhagic disease. Blood 61: 129-137. LONG, G. L., CHANDRA, T., Woo, S. L. C., DAVIE, E. W., AND KUF~ACHI, K. (1984). Complete sequence of the cDNA for human alpha-1-antitrypsin and the gene for the S variant. Biochemistry 23: 4828-4837. MCRAE, B., NAKAJIMA, K., TRAVIS, J., AND POWERS, J. C. (1980). Studies on reactivity of human leukocyte elastase, cathespsin G, and porcine pancreatic elastase toward peptides including sequences related to the reactive site of cYl-protease inhibitor (ai-antitrypsin). Biochemistry 19: 3973-3978. MESSING, J. (1983). New Ml3 vectors for cloning. In “Methods in Enzymology” (R. Wu., L. Grossman, and K. Moldave, Eds.), Vol. 101, Part C, pp. 20-78, Academic Press, New York. MUENSCH, H., GAIDULIS, L., KUEPPERS, F., So, S. Y., ESCANO, G., KIDD, V. J., AND Woo, S. L. C. (1986). Complete absence of serum alpha-1-antitrypsin in conjugation with an apparently normal gene structure. Amer. J. Hum. Genet. 38: 898-907. OHKUBO, H., KAGEYAMA, R., UJIHARA, M., HIROSE, T., IN-

41.

42.

43. 44.

45.

46.

GENE

173

AYAMA, S., AND NAKANISHI, S. (1983). Cloning and sequence analysis of cDNA for rat angiotensinogen. J. Biol. Chem. 80: 2196-2200. PROCHOWNIK, E. V., BOCK, S. C., AND ORKIN, S. H. (1985). Intron structure of the human antithrombin III gene differs from that of other members of the serine protease inhibitor superfamily. J. Biol. Chem. 260: 96089612. RABIN, M., WATSON, M., KIDD, V., Woo, S. L. C., BREG, W. R., AND RUDDLE, F. H. (1986). Regional location of (y-lantichymotrypsin and Lu-1-antitrypsin genes on human chromosome 14. Somut. Cell Mol. Genet. 12: 209-214. SANGER, F., NICKLEN, S., AND COULSEN, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467. SCHROEDER, W. T., MILLER, M. F., Woo, S. L. C., AND SAUNDERS, G. F. (1985). Chromosomal localization of the human alpha-1-antitrypsin gene (Pi) to 14q31-32. Amer. J. Hum. Genet. 37: 868-872. SIFERS, R. N., CARLSON, J. A., CLIFT, W. M., DEMAYO, F. J., BULLOCK, D. W., AND Woo, S. L. C. (1987). Tissue specific expression of the human alpha-1-antitrypsin gene in transgenie mice. Nucleic Acids Res. 15: 1459-1475. TANAKA, T., OHKUBO, H., AND NAKANISHI, S. (1984). Common structural organization of the angiotensinogen and the alpha-1-antitrypsin genes. J. Biol. Chem. 259: 8063-8065.