doi:10.1006/geno.2002.6716, available online at http://www.idealibrary.com on IDEAL
Article
Comparison of Human Chromosome 6p25 with Mouse Chromosome 13 Reveals a Greatly Expanded Ov-Serpin Gene Repertoire in the Mouse
Dion Kaiserman,1 Susan Knaggs,2 Katrina L. Scarff,1 Anneliese Gillard,1 Ghazala Mirza,2 Matthew Cadman,3 Richard McKeone,3 Paul Denny,3 Jessica Cooley,4 Charaf Benarafa,4 Eileen Remold-O’Donnell,4 Jiannis Ragoussis,2 and Phillip I. Bird1,*
1 Department of Biochemistry and Molecular Biology, Monash University 3800, Australia
2 The Genomics Laboratory, Division of Medical Molecular Genetics, GKT Medical School, King’s College, London SE1 9RT, UK
3 MRC UK Mouse Genome Centre and Mammalian Genetics Unit, Harwell, Oxfordshire OX11 0RD, UK
4 Center for Blood Research, Harvard Medical School, Boston, Massachusetts 02115, USA
*To whom correspondence and reprint requests should be addressed. Fax: (+61) 3 9905 4699. E-mail:
[email protected].
Ov-serpins are intracellular proteinase inhibitors implicated in the regulation of tumor progression, inflammation, and cell death. The 13 human ov-serpin genes are clustered at 6p25 (3 genes) and 18q21 (10 genes), and share common structures. We show here that a 1-Mb region on mouse chromosome 13 contains at least 15 ov-serpin genes compared with the three ov-serpin genes within 0.35 Mb at human 6p25 (SERPINB1 (MNEI), SERPINB6 (PI-6), SERPINB9 (PI-9)). The mouse serpins have characteristics of functional inhibitors and fall into three groups on the basis of similarity to MNEI, PI-6, or PI-9. The genes map between the mouse orthologs of the Werner helicase interacting protein and NAD(P)H menadione oxidoreductase 2 genes, in a region that contains the markers D13Mit136 and D13Mit116. They have the seven-exon structure typical of human 6p25 ov-serpin genes, with identical intron phasing. Most show restricted patterns of expression, with common sites of synthesis being the placenta and immune tissue. Compared with human, this larger mouse serpin repertoire probably reflects the need to regulate a larger proteinase repertoire arising from differing evolutionary pressures on the reproductive and immune systems.
Key Words: serpin, ov-serpin, SPI3, SPI6, PI-6, PI-9, MNEI, phylogeny, gene structure
INTRODUCTION
Proteins of the serpin superfamily are found in multicellular organisms and some viruses. Many serpins are extracellular protease inhibitors involved in diverse processes such as the regulation of blood coagulation, complement activation, fibrinolysis, and matrix remodeling. Serpin mutants are implicated in tumor progression, thrombosis, emphysema, angioedema, and familial dementia [reviewed in 1]. The most striking feature of a serpin is an exposed reactive center loop (RCL) located 30–40 amino acids from the carboxy terminus that resembles the natural substrate of the target protease, and contains a peptide bond cleaved during the protease–serpin interaction [2]. Cleavage causes conformational change in the serpin, accompanied by major structural distortion and inactivation of the protease. The protease and serpin remain
GENOMICS Vol. 79, Number 3, March 2002 Copyright © 2002 Elsevier Science (USA). All rights reserved. 0888-7543/02 $35.00
covalently linked, and the complex is eventually removed from the system and degraded. Thus serpins function as suicide substrates regulating proteolysis [1]. Phylogenic studies clearly show that serpins fall into a number of distinct clades [3], and as a consequence human serpin gene nomenclature is now clade-based [1]. Comparison of serpin gene localization and structures shows that the genes for members of the same clade are often clustered, and have very similar numbers and positioning of introns [4,5]. In humans there are nine serpin clades: the largest, clade A, comprises the α1-antitrypsin-like serpins; the next largest, clade B, comprises the ovalbumin (ov)-like serpins [3]. Ov-serpins are emerging as a large class of serpins involved in the regulation of tumor progression, cell differentiation, and cell survival [reviewed in 1]. They are predominantly intracellular molecules, although release from cells can occur under certain circumstances, for example in
FIG. 1. Amino acid sequence alignment of the mouse chromosome 13 ov-serpins. New serpin sequences were deduced from direct sequencing of cDNA clones (SPI3C, R86, NK10, NK13, NK21B, NK26, EIA, EIB), or predicted from public nonredundant and htgs data (EIC, SPI3B, SPI3D, NK9, NK21) or the Celera database (AT2). Also shown are the sequences of the related human ov-serpins, PI-6, PI-8, PI-9, and MNEI (shaded). See Table 1 for accession numbers. Asterisks indicate predicted sequences; intron/exon boundaries are indicated by arrows above the alignment.
response to an inflammatory stimulus or during cell necrosis. Several ov-serpins are associated with disease: maspin is a tumor suppressor, the squamous-cell carcinoma antigens (SCCAs) appear in the circulation of patients with squamous-cell carcinoma, megsin is upregulated in IgA nephropathy, and increased plasminogen activator inhibitor-2 is correlated with a favorable prognosis in breast cancer patients. The 13 human ov-serpin genes are split into two clusters, with 3 genes found on chromosome 6 and 10 on chromosome 18 [4,6]. The region at chromosome 6p25 containing the serpin gene cluster is within a segment positioned between the genetic markers D6S344/AFMB34ya5 and D6S1617 showing loss of heterozygosity in cervical carcinoma [7], implying that one or more of these serpins are tumor suppressors. The 6p25
serpin gene cluster is also in a region deleted in patients with craniofacial and anterior eye anomalies [8]. Similar abnormalities are seen in Del(13)Svea36H mice hemizygous for a syntenic region on mouse chromosome 13 [9]. Most ov-serpin genes have eight exons identically arranged to those in the chicken ovalbumin gene in terms of both position and intron/exon boundary phasing [5]; this suggests that a primordial ov-serpin gene was present in a common ancestor of birds and mammals. However, five of the human ov-serpin genes contain only seven exons [4], having lost the exon encoding a variable interhelical loop in the protein. Otherwise they are identical in structure to the eight-exon genes. Taken with the phylogenic analyses, the structure and localization of the human ov-serpin genes suggest that the clusters have
AF425083 (F)
NK21B Serpinb9g
Serpinb9f
Serpinb6e Serpinb6c
SPI3B
SPI3C
AF425084 (F)
Not assigned
R86B
Serpinb9b
U96705 (F) AK005491 (F)
U96708 (S)
NK21
Serpinb6b
R86
U96707 (F)
NK13
Serpinb8
Serpinb9e
U96703 (F)
NK10
Serpinb9c
U96709 (F)
U96706 (S) AK014448 (F)
NK9
Serpinb9d
NK26
U96704 (S)
AT2
Serpinb9
Serpinb9-ps1
U96700 (F)
SPI6
BG078882 BG065606 (S)
AI506527 (S)
None found
AA389124 (F)
AA409418 (F) AA409417
RP23-262J21 (c) (Acc. no. AL450406) RP23-262J21 (c) (Acc. no. AL450406)
(Acc. no. AL606533)
(Acc. no. AL589871) RP23-312D21
(Acc. no. AL606533) RP23-391I11 (c)
RP23-312D21
RP23-391I11 (c) (Acc. no. AL589871)
RP23-414F19 (p)
(Acc. no. AL450331)
AA408234
No. 618 (s) 5’-gaaatagtcacagtggggtcacc-3’ No. 621 (a/s) 5’-gcacttatggataaggtcacttggg-3’ No. 619 (s) 5’-caattgttctgaaggggtcatcacg-3’ No. 621 (a/s) 5’-gcacttatggataaggtcacttggg-3’
No. 260 (s) 5’-tccgctgagggcatcatt-3’ No. 496 (a/s) 5’-accccacatagtaatgtgcc-3’
No. 623 (s) 5’-ttgcagcctctgctggcaaaattatacttttctgtgat-3’ No. 624 (a/s) 5’-gtccattctcacagagcagtagaag-3’
No. 635 (a/s) 5’-tcctcacagagcagtaaggc-3’
No. 646 (s) 5’-gcagcctctgctgtagaatttatatttttatgttca-3’
RP23-414F19 (p) RP23-41L21 (c)
AA408235 (F)
BG077981
No. 605(a/s) 5’-gctatgcagttgaggctagccctgcatg-3’
No. 637 (a/s) 5’-gatcggcaggttggcaccatcatg-3’ No. 604 (s) 5’-gccaatataggttttaggtgtatggtcc-3’
No. 636 (s) 5’-gtgattaggaacgcccggtgctgtag-3’
No. 690 (s) 5’-gctgtttaggttcttatcccc-3’ No. 691 (a/s) 5’-ggtggtggagttgccaagagag-3’ No. 677 (s) 5’-gccacagctgatgatactgtatgttc-3’ No. 674 (a/s) 5’-gggcacagatgagtgtcaggg-3’
RCL (s) and 3’-UTR (a/s) primers used for RT-PCR and genomic mapping No. 259 (s) 5’-acagctggcatgatgacg-3’ No. 370 (a/s) 5’-ggcaattgtgctcagggagaggagaacc-3’ No. 645 (s) 5’-gccatcatagaattttgctgtgcctc-3’ No. 520 (a/s) 5’-gggatactgaagagagaactctccctgtg-3’
No. 646 (s) 5’-gcagcctctgctgtagaatttatatttttatgttca-3’ No. 635 (a/s) 5’-tcctcacagagcagtaaggc-3’
RP23-391I11 (c)
RP23-130F18 (c)
RP23-414F19 (p)
RP23-312D21 (Acc. no. AL606533) RP23-414F19 (p)
RP23-262J21(c) (Acc. no. AL450406) RP23-391I11 (c) (Acc. no. AL589871)
Htgs clones hit
(Acc. no. AL589871) RP23-312D21 (c) (Acc. no. AL606533) RP23-41L21 (c) (Acc. no. AL450331)
AW259940
AW907882 (S) BG245799 AW260489 (F)
AW538390 (S)
None found
Representative dbEST clone
TABLE 1: Accession numbers and details of mouse serpin cDNAs, ESTs, BAC clones, and primersa Gene symbol Serpinb6
NK21C
U25844 (F)
Acc. no.
Mouse serpin SPI3
Table 1 continued on next page
PI-8 (L40377)
Human ortholog PI-6 (Z22658) PI-9 (U71364)
Serpinb1-ps1
EID
None found
None found
BE650533 (S)
AA201023 (F)
None found
Representative dbEST clone AA168365 (S)
RP23-41L21 (c) (Acc. no. AL450331) RP23-414F19 (c) RP23-262J21 (c) (Acc. no. AL450406)
RP23-262J21 (c) (Acc. no. AL450406) RP23-217N23 (c) CT7-236L20 (Acc. no. AC073666)
CT7-236L20 (Acc. no. AC073666)
CT7-236L20 (Acc. no. AC073666)
CT7-236L20 (Acc. no. AC073666) RP23-391I11 (c) (Acc. no. AL589871) RP23-217N23 (c) RP23-312D21 (Acc. no. AL606533)
RP23-217N23 (c) RP23-427P14 (c)
No. 544 5’-ttttatctattgagtagattcatggtg-3’ No. 545 5’-tatgcctggaggaaaacagg-3’ No. 546 5’-ccaccctttgctgttttgtt-3’ No. 547 5’-acagttccagacatccacctg-3’ No. 548 5’-ggcgtccaaggtccttcc-3’ No. 549 5’-aacgtgtgcctacaatgcac-3’
No. 702 5’-gctctgatccaagagaggttgata-3’ No. 703 5’-acccagaacactggtgagtg-3’
Primers used for mapping No. 536 (s) 5’-tgctcatcgtctatgcacaccaaga-3’ No. 537 (a/s) 5’-tttcttgtggcccttggctcaa-3’ No. 534 (s) 5’-agcatgcagttattcttctgtgtctgga-3’ No. 535 (a/s) 5’-gtagcacttgggggcaaagatcaa-3’ RUVBL (s) 5’-tagttttattggctcataagacctttgc-3’ RUVBL (a/s) 5’- acctggcagctttgtgcaattaagtaat-3’
No. 608 (s) 5’-ggcattattcaggtgctctgcgagaag-3’ No. 609 (a/s) 5’-gggaatgaatagctaagctctgccttc-3’ No. 692 (s) 5’-gttggatgctgcctgatgccc-3’ No. 693 (a/s) 5’-ttatggtcaagggcaagtg-3’
No. 420 (s) 5’-gaggcattgctacattctgt-3’ No. 421 (a/s) 5’-gagtcctgctttcattgtac-3’
No. 622 (a/s) 5’-gcacatatggataagataaagtggg-3’ No. 625 (s) 5’-aattgtgatgatgggtgtaccacc-3’ No. 626 (a/s) 5’-ctcagttctgaagatgggatgcc-3’
(Acc. no. AL513022) RP23-262J21 (Acc. no. AL450406) RP23-3N21 (Acc. no. AL513022)
RP23-3N21
RCL (s) and 3’-UTR (a/s) primers used for RT-PCR and genomic mapping No. 620 (s) 5’-gtgatgatgggtgcatcaccaact-3’
TABLE 1: continued Htgs clones hit
(F), Includes full-length coding sequence; (S), partial sequence including RCL; (c), confirmed by PCR; (p), not in htgs database but found by PCR; (s), sense; (a/s), antisense.
a
D13Mit265
D13Mit241
D13Mit136
D13Mit116.1
Whip F208046 (F) (RUVBL1)
Pher1
Marker Nmor2
Serpinb1c
EIC
AF252260 (F)
AF426025 (F) AK018226 (F)
EIB Serpinb1b
Serpinb1
AF426024 (F)
EIA
Gene symbol Serpinb6d Serpinb6-ps1
Acc. no.
SPI3E
Mouse serpin SPI3D
WHIP (NM_020135)
NMOR2 (NM_000904)
(M93056)
MNEI
Human ortholog
evolved through at least two interchromosomal duplications and several intrachromosomal duplications [4]. To understand the biology of ov-serpins and ov-serpin gene evolution in more detail, we have begun to identify and examine mouse ov-serpin genes, and compare their localization and structures with human genes. Studies so far have shown that at least three mouse ov-serpin genes are found in regions syntenic with human chromosomes 6 and 18 [10–12]. Analysis of two of these genes has demonstrated that both the eight-exon and the seven-exon structures are conserved in the mouse [10,12]. However, RT-PCR-based screening of mouse immune tissue using degenerate primers that amplify the variable RCL has indicated that there are a number of mouse serpins that have no human counterparts [11]. Here we show that the mouse ov-serpin repertoire on chromosome 13 (syntenic with human chromosome 6) comprises at least 15 genes and several pseudogenes, with all those analyzed having the seven-exon gene structure. RT-PCR analysis indicates that most of these mouse genes have restricted expression, with many found in reproductive tissue. These results indicate that the mouse ov-serpin gene repertoire is significantly larger than human, and that it has arisen through intrachromosomal gene duplication.
RESULTS
Identification and Characterization of New Mouse Ov-Serpins
We previously described the mouse ortholog of human SERPINB6 (PI-6 gene), which is designated Spi3 [10]. Spi3 has the seven-exon ov-serpin structure and maps to mouse chromosome 13 between Pl1 and Ctla2a in a region syntenic with human chromosome 6 [10]. We have also described a homolog of human SERPINB9 (PI-9 gene), termed Spi6, which maps to the same region as Spi3 [11]. As part of the latter study we carried out a survey for other ov-serpins expressed in mouse immune cells and tissue. Using RT-PCR with degenerate primers designed to amplify the variable RCL sequences from ov-serpins, we cloned and sequenced partial cDNA sequences encoding seven new proteins, termed AT2, NK9, NK10, NK13, NK21, NK26, and R86 (GenBank accession numbers are given in Table 1). On the basis of sequence similarity between RCLs, it is likely that the NK10 gene represents the mouse ortholog of SERPINB8 (PI-8 gene), but none of the others have obvious human counterparts. (Although SERPINB8 is in the human chromosome 18 ov-serpin cluster, it is closely related to SERPINB6 and SERPINB9 and shares the same seven-exon structure.) Two other partial serpin cDNA sequences were cloned from a bone marrow library in the same study (mBM2A and mBM17 [11]). Subsequent investigations have revealed that the library was contaminated with rat sequences and that mBM2A and mBM17 probably represent the rat counterparts of SPI6 and NK10, respectively (P.I.B., unpublished data).
To obtain full-length sequence data for the new mouse serpins, we iteratively scanned the nonredundant, mouse EST, and high-throughput genomic sequence (htgs) GenBank databases by BLASTN analysis for sequences similar but not identical to SPI3 and SPI6, and for sequences that matched the RCLs of the new serpins. This process yielded hits on the expressed sequence tag (EST) or nonredundant databases for NK9, NK10, NK13, NK21, NK26, and R86 (Table 1). No hits were obtained for AT2 in these databases. Apart from NK9 and NK26, these sequences also appeared in the htgs database, with all except NK10 falling on mouse chromosome 13 genomic clones (the clone containing NK10 has not been assigned to a specific chromosome, but is likely to belong to chromosome 1, which is syntenic with human 18q21). We identified four new sequences in the htgs database that strongly resemble SPI3 but have clearly distinct RCLs (Fig. 1). These are present in two overlapping chromosome 13 genomic clones (262J21 and 3N21), one of which also contains SPI3 (Table 1). We termed these new SPI3-related serpins SPI3B, SPI3C, SPI3D, and SPI3E. Finally, inspection of the Celera mouse database revealed sequences on chromosome 13 matching the previously published RCL of AT2 [11], and the first two exons of a gene closely resembling R86, which we termed R86B. Detailed analysis of genomic sequence data from BAC RP23-41L21 revealed two copies of NK21 (designated NK21 and NK21B) that are 98% identical through both exons and introns, with only slight differences in the lengths of introns E and F. Overall, 5 of 377 codons in the putative NK21 and NK21B open reading frames are different, with a key distinction at P5 in the RCL (glutamic acid and lysine, respectively). NK21 and NK21B are represented in the EST database, indicating that both are expressed (Table 1). 
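The codon-level comparison of NK21 and NK21B quoted above (5 of 377 codons differing) can be illustrated with a minimal sketch. It assumes two open reading frames that are already aligned, gap-free, and of equal length; the function names and the nine-base toy sequences are hypothetical and are not the NK21 or NK21B sequences. Note that the 98% figure in the text refers to identity across exons and introns, which is a different measure from the codon-level identity computed here.

```python
def codon_differences(orf_a: str, orf_b: str) -> list:
    """Return 1-based codon positions at which two aligned ORFs differ.

    Assumes both ORFs are gap-free, in frame, and of equal length.
    """
    if len(orf_a) != len(orf_b) or len(orf_a) % 3 != 0:
        raise ValueError("ORFs must have equal length divisible by 3")
    a, b = orf_a.upper(), orf_b.upper()
    return [i // 3 + 1 for i in range(0, len(a), 3) if a[i:i + 3] != b[i:i + 3]]


def codon_identity(total_codons: int, differing_codons: int) -> float:
    """Percentage of codons that are identical between two ORFs."""
    return 100.0 * (total_codons - differing_codons) / total_codons


# Toy example (hypothetical sequences): codon 2 is GAA (Glu) in one ORF
# and AAA (Lys) in the other, the same kind of change reported at P5.
toy_a = "ATGGAAAAA"
toy_b = "ATGAAAAAA"
print(codon_differences(toy_a, toy_b))

# Codon-level identity for the counts given in the text.
print(round(codon_identity(377, 5), 1))
```

A real comparison would start from a proper pairwise alignment, since insertions or deletions would shift the reading frame and invalidate the position-by-position test used here.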
In addition we identified sequences from a third serpin gene on RP23-41L21 that closely resembles NK21 but probably represents a pseudogene because of multiple frameshifts and termination codons in the putative exons. We termed this gene NK21C. Using the human SERPINB1 (encoding monocyte/neutrophil elastase inhibitor (MNEI)) sequence in the BLASTN program, we identified, obtained, and sequenced several related mouse EST clones. One clone (dbEST 638727; acc. no. AA201023) was full-length and was sequenced in both directions. On the basis of sequence similarity, particularly the RCL, we identified this serpin as the mouse counterpart of human MNEI, and it was designated EIA (Table 1). Several BACs containing portions of mouse chromosome 13 and also containing EIA sequences were found in the htgs database (Table 1), supporting the idea that the counterparts of human PI-6, PI-9, and MNEI also form a gene cluster in the mouse. Further analysis identified three other genes related to EIA on chromosome 13, termed EIB, EIC, and EID. Almost complete cDNA structures containing full-length open reading frames have been deduced for EIB and EIC (Fig. 1; E.R.-O., unpublished data). Information on EID is incomplete, but nucleotide sequence alignments indicate it most closely resembles EIC (> 90% homology; data not shown); however, the absence of
FIG. 2. Relationships between mouse and human ov-serpins. The amino acid sequences of the mouse chromosome 13 serpins and related human serpins were aligned using ClustalW. The alignment was used to calculate a phylogenic tree based on the neighbor-joining method using 1000 bootstraps. Mouse antithrombin (AT-III) was defined as the outgroup. Human serpins are boxed. NK10 is not part of the mouse chromosome 13 serpin gene cluster but is closely related and shown for comparison. The scale bar represents the number of substitutions per 100 amino acids. Bootstrap numbers are indicated at the nodes.
an identifiable exon 7 and the presence of multiple stop codons and frameshifts in other exons suggest it is a pseudogene. EIB is also represented in the nonredundant database by a RIKEN clone (Table 1), and preliminary results indicate both EIB and EIC are expressed (E.R.-O., unpublished data). We obtained and entirely sequenced full-length IMAGE EST clones encoding NK13, NK21B, NK26, R86, and SPI3C. Using information from EST and genomic databases we deduced the coding sequences for six of the remaining serpins (NK9, NK10, NK21, AT2, SPI3B, and SPI3D). We were unable to deduce full-length coding sequences for EID, NK21C, or SPI3E because of a lack of EST data, incomplete htgs data, or assembly errors in the htgs data. However, it is likely that EID, NK21C, and SPI3E are pseudogenes because stop codons, deletions, and frameshifts were evident in almost all exons analyzed. Amino acid sequence alignments and comparisons (Fig. 1) showed that the new proteins are very similar to earlier described mouse ov-serpins such as SPI3 and SPI6. Each has features typical of serpins in general, and ov-serpins in particular. For example, they have the clearly identifiable, conserved proximal and distal hinge (serpin signature) motifs that flank the RCL in almost all serpins, indicating that each is potentially a functional protease inhibitor. By contrast, none has a discernible N-terminal signal peptide, and each is shorter than the serpin prototype α1-antitrypsin, both of which are characteristic of ov-serpins. In addition, many carry the FCAD (or very similar) motif in the distal hinge region that is typical of human PI-6, PI-8, and PI-9, and most terminate with FSSP (typical of many ov-serpins). Taken together, these observations indicate that all are new members of clade B of the serpin superfamily.
Phylogeny
To establish the evolutionary relationships between the mouse serpins, and their similarity or otherwise to human
clade B ov-serpins, we constructed a phylogenic tree using mouse antithrombin as the outgroup (Fig. 2). This clearly shows that the mouse serpins fall into four distinct groups comprising SPI3-like proteins (SPI3, NK13, SPI3B–D), SPI6-like proteins (SPI6, R86, NK9, NK21, NK21B, NK26, AT2), EI-like proteins (EIA, EIB, EIC), and NK10. Adding human serpins to the tree illustrates that each mouse group corresponds to a single human serpin, suggesting that a common ancestor of humans and mice possessed single counterparts of PI-6, PI-8, PI-9, and MNEI, and that the PI-6, PI-9, and MNEI genes were duplicated in mice after the divergence from humans. This is supported by the high homology between EIA, EIB, and EIC (88%), NK21 and NK21B (98%), NK21 and NK26 (93%), SPI3 and NK13 (80%), and SPI6 and R86 (80%).
Gene Nomenclature
Human serpin gene nomenclature has recently been revised to accommodate phylogenic information [1]. In the new system, a serpin gene is given the descriptor SERPIN, followed by a capital letter indicating which clade [3] the encoded protein belongs to, and finally by a number to act as an identifier within the clade. Thus the gene encoding MNEI is SERPINB1, the gene encoding PI-6 is SERPINB6, and the gene encoding PI-9 is SERPINB9. The letter P is reserved for pseudogenes. The Mouse Gene Nomenclature Committee (http://www.informatics.jax.org/mgihome/nomen/) recommends that for orthologous genes the same root symbol is used in the mouse as in human. Major clade designators are represented by a lowercase letter following the root symbol, the gene number is an arabic numeral, and paralogs not present in humans are indicated by a lowercase letter following the gene number. A hyphen followed by “ps” and an arabic numeral is used to indicate pseudogenes. Accordingly, we have used our phylogenic data to assign symbols for the mouse chromosome 13 ov-serpin genes (Table 1). These have been approved by the Mouse Gene Nomenclature Committee. 
For example, the previously identified gene Spi3, which encodes the ortholog to human PI-6, becomes Serpinb6, and the gene for the related serpin NK13 is assigned the symbol Serpinb6b. (Note that for maximum consistency with human nomenclature the designator Serpinb6a is not used for the SERPINB6 ortholog.) The pseudogene SPI3E becomes Serpinb6-ps1. The new gene encoding EIA becomes Serpinb1 because it is orthologous to SERPINB1.
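The naming rules described above are mechanical enough to express as a short sketch. The two helper functions below are hypothetical illustrations of those rules as stated in the text (root symbol, clade letter, gene number, optional paralog letter, optional "-ps" pseudogene suffix); they are not part of any official nomenclature tool.

```python
from typing import Optional


def human_serpin_symbol(clade: str, number: int) -> str:
    """Human symbol: SERPIN + capital clade letter + number within the clade."""
    return f"SERPIN{clade.upper()}{number}"


def mouse_serpin_symbol(clade: str, number: int, paralog: str = "",
                        pseudogene: Optional[int] = None) -> str:
    """Mouse symbol: Serpin + lowercase clade letter + number,
    then a lowercase paralog letter (if any), then -psN for pseudogenes."""
    symbol = f"Serpin{clade.lower()}{number}{paralog.lower()}"
    if pseudogene is not None:
        symbol += f"-ps{pseudogene}"
    return symbol


# Examples taken from the text.
print(human_serpin_symbol("B", 6))                # gene encoding PI-6
print(mouse_serpin_symbol("B", 6))                # formerly Spi3
print(mouse_serpin_symbol("B", 6, paralog="b"))   # NK13
print(mouse_serpin_symbol("B", 6, pseudogene=1))  # SPI3E
```

Leaving the paralog letter empty for the direct ortholog mirrors the convention noted above, under which Serpinb6a is not used for the SERPINB6 ortholog.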
TABLE 2: Expression of mouse ov-serpin genes Tissue
Ov-serpin SPI3
SPI3C
SPI3B
Skin
++
+
+
Brain
++
Eye
++
Heart
++
Lung
+
Kidney
++
Liver
++
Pancreas
+
Stomach
++
Lower intestine
++
Small intestine
++
Muscle
+
Lymph node
++
Thymus
++
Spleen
++
Testis
++
Uterus
++
Placenta
++
Embryo (d13)
+
ES cell
++
SPI3E
SPI3D NK21
NK26
AT2
NK9
R86
SPI6
NK13
EIA
+ + + +
++ ++ ++
+
+
+ + +
+ +
++
+++
+
+
+ +
+++ ++
+
+
+
+
+
+ +
+ +
Tissue Distribution of Mouse Ov-Serpins
With the exception of AT2, EIC, and R86B, all the new genes are represented in the nonredundant cDNA and/or EST databases (Table 1), indicating that they are expressed in mouse tissue and are unlikely to represent pseudogenes. To establish the distribution of the various serpins, we prepared RNA from a range of mouse tissues for use in RT-PCR analysis. The primers used in the analysis are shown in Table 1, and in each case to ensure specificity they were designed to anneal to the RCL (which is the most variable sequence between serpins) and to the 3′-untranslated region (3′-UTR). Because both the RCL and 3′-UTR are in the last exon, the presence of contaminating genomic DNA could not be assessed by amplification across an intron/exon boundary, so genomic DNA traces were removed by DNase I treatment before cDNA synthesis. In each case a control reaction was run without addition of RT. Samples were also checked and normalized by amplification of mRNA encoding the housekeeping enzyme, GAPDH. Results of the analysis are shown in Table 2. SPI3 and EIA showed a broad tissue distribution, which is consistent with previously published results [10] and the expression patterns of their human counterparts PI-6 and MNEI [13–15]. SPI6 was detected in heart, large intestine, spleen, lymph node, and placenta. Contrary to previous results [11], SPI6 was not detected in lung. This discrepancy may be due to lack of
+
+
+
+
++
+
+ +
hybridization stringency in the original northern blot analysis, such that the SPI6 probe cross-reacted with other mouse serpin sequences. Nevertheless, expression of SPI6 in immune tissue and placenta is consistent with the observation that its closest human counterpart (PI-9) is present in immune tissue and placenta [16]. Consistent with previous results [11], we did not detect SPI6 in testis, which contrasts with the presence of PI-9 in human testis [16]. This result implies that SPI6 expression is either induced in testis or not required to fulfill a similar role to PI-9 in this tissue. Of the other serpins, NK13 showed a wide and overlapping distribution with SPI3 and EIA, indicating its presence in similar cells to SPI3 and EIA. By contrast the remaining serpins showed very restricted expression, being present only in skin (SPI3B, SPI3C), muscle (SPI3D), embryonic stem (ES) cells (SPI3C), or placenta (R86, NK9, NK21 and/or NK21B, NK26). Given the very high similarity between NK21 and NK21B, we cannot be sure whether one or both genes are expressed in placenta. Expression of AT2 and SPI3E was not detected.
Gene Structures
As outlined above, examination of human ov-serpin genes has shown two very similar gene organizations (8-exon or 7-exon structures). The only distinction is lack of an exon encoding the interhelical loop in the 6p25 genes encoding
TABLE 3: Organization of human and mouse ov-serpin genes(a)
Intron entries are given as size (kb) / phase.

Gene                 Intron A     Intron B   Intron C   Intron D   Intron E   Intron F   Ref.
SERPINB6 (PI-6)      2.0 / U(b)   4.0 / 0    1.0 / 0    1.5 / 1    5.0 / 0    0.7 / 0    [4]
Serpinb6 (SPI3)      4.3 / U      1.4 / 0    4.2 / 0    2.3 / 1    4.0 / 0    0.5 / 0    [10]
Serpinb6b (NK13)     3.0 / U      2.8 / 0    0.6 / 0    2.5 / 1    2.5 / 0    0.3 / 0
Serpinb6e (SPI3B)    -   / U      1.9 / 0    0.1 / 0    1.2 / 1    3.1 / 0    0.3 / 0
Serpinb6c (SPI3C)    -   / U      1.8 / 0    2.0 / 0    1.3 / 1    13  / 0    0.3 / 0
Serpinb6d (SPI3D)    -   / U      2.1 / 0    1.0 / 0    1.4 / 1    1.4 / 0    0.3 / 0
SERPINB9 (PI-9)      4.0 / U      -   / 0    0.8 / 0    1.8 / 1    1.3 / 0    1.3 / 0    [4]
Serpinb9 (SPI6)      1.8 / U      1.3 / 0    0.6 / 0    1.8 / 1    3.0 / 0    1.4 / 0
Serpinb9b (R86)      1.8 / U      3.3 / 0    0.5 / 0    1.9 / 1    2.3 / 0    1.4 / 0
Serpinb9c (NK9)      1.8 / 2      0.7 / 0    0.4 / 0    1.8 / 1    2.4 / 0    1.4 / 0
Serpinb9f (NK21)     1.3(c) / U   1.3 / 0    0.5 / 0    1.4 / 1    2.6 / 0    1.9 / 0
Serpinb9g (NK21B)    1.7 / U      1.3 / 0    0.5 / 0    1.4 / 1    2.6 / 0    1.9 / 0
Serpinb9d (AT2)      1.5 / U      1.1 / 0    0.5 / 0    1.4 / 1    2.5 / 0    1.9 / 0
SERPINB1 (MNEI)      1.2 / U      1.7 / 0    0.5 / 0    1.6 / 1    0.08 / 0   1.8 / 0    [17]
SERPINB8 (PI-8)      0.3 / U      1.1 / 0    1.7 / 0    1.6 / 1    1.4 / 0    1.7 / 0    [4]
Serpinb8 (NK10)      7.2 / U      1.3 / 0    3.8 / 0    1.7 / 1    1.0 / 0    1.0 / 0

a Human genes are indicated in bold.
b U, Exon/intron boundary falls in 5’-UTR.
c This is the estimated minimum size, pending final assembly of adjoining contigs.
MNEI (SERPINB1), PI-6 (SERPINB6), and PI-9 (SERPINB9), as well as the 18q21 genes encoding maspin (SERPINB5) and PI-8 (SERPINB8) [4,5,17]. Mouse Serpinb6 (formerly Spi3) conforms to the seven-exon structure [10]. Analysis of htgs data and PCR-based mapping of genomic clones (data not shown) allowed us to deduce the structures of 10 of the new mouse chromosome 13 serpin genes (Table 3). All have the seven-exon structure,
with the first intron occurring in the 5′-UTR, and conserved phasing of the following exon/intron boundaries. The only exception to the rule is Serpinb9c (NK9), which has an 11-residue N-terminal extension, meaning that its first intron interrupts the coding sequence rather than the 5′-UTR. Serpinb9f (NK21) and Serpinb9g (NK21B) are 98% identical across both exons and introns, indicating that duplication occurred
recently. Introns B, C, and D are identical in size, while intron E is 32 bp larger in Serpinb9g (NK21B) and intron F in Serpinb9f (NK21) contains a 64-bp repeat. Sequence assembly for intron A of Serpinb9f (NK21) is incomplete, but the data show that its minimum size is 1.3 kb and it is 98% identical to intron A of Serpinb9g (NK21B). Our ability to identify only two exons of R86B (exons 1 and 2) implies that the gene either encodes a truncated product or is a nonfunctional remnant. Given its uncertain status, a gene symbol was not assigned to R86B.
Chromosome 13 Ov-Serpin Gene Localization and Order
To localize and order the chromosome 13 serpin genes, we used SPI3 and SPI6 cDNA probes to isolate 30 clones from a 129sv mouse genomic PAC library [18]. We also obtained those BACs listed in the htgs database containing mouse chromosome 13 ov-serpin genes (RP23-391I11, 262J21, 41L21, 3N21). Using fingerprint, sequence-tagged site (STS), and genomic survey sequence data, we identified overlapping BACs and constructed a contig comprising 217N23-391I11-414F19-41L21-3N21-262J21. Two BACs that potentially overlapped 391I11 and 217N23 were also identified (312D21, 427P14), as was a BAC from a different library (CT7-236L20). The localization of the new serpin genes to mouse chromosome 13 was confirmed by FISH using the PACs 600A11, 532I21, 507G8, 578H9, 672N8, and 358P7, as well as the BACs 262J21, 3N21, 414F19, 312D21, and 427P14 (see Table 4 for the genes on these clones). FISH was carried out using metaphase chromosomes isolated from mice heterozygous for the Del(13)Svea36H deletion, which have lost a 25-Mb region of chromosome 13 syntenic with human 6p21–6p25 [9]. All of the clones hybridized to the wild-type chromosome 13, within the region lost in the Del(13)Svea36H deletion (data not shown). Thus the serpin genes are clustered in a region syntenic with human 6p25. 
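The step of ordering overlapping BACs into a contig can be sketched as a simple chaining problem. The example below is only an illustration and not the fingerprint or STS-content procedure used in the study: it assumes each clone overlaps at most two neighbours, and it takes the pairwise overlaps to be the adjacent pairs in the contig stated above. The function name `order_contig` is hypothetical.

```python
from collections import defaultdict


def order_contig(overlaps):
    """Chain pairwise clone overlaps into one linear ordering.

    Assumes the overlaps describe a single unbranched path, so exactly
    two clones (the contig ends) have one neighbour each.
    """
    neighbours = defaultdict(set)
    for a, b in overlaps:
        neighbours[a].add(b)
        neighbours[b].add(a)

    ends = sorted(c for c, n in neighbours.items() if len(n) == 1)
    if len(ends) != 2 or any(len(n) > 2 for n in neighbours.values()):
        raise ValueError("overlaps do not form a single linear contig")

    # Walk from one end to the other, never stepping back.
    path, previous = [ends[0]], None
    while True:
        candidates = [c for c in neighbours[path[-1]] if c != previous]
        if not candidates:
            break
        previous = path[-1]
        path.append(candidates[0])

    if len(path) != len(neighbours):
        raise ValueError("overlaps contain more than one contig")
    return path


# Overlaps given in arbitrary order (adjacent pairs of the stated contig).
pairs = [
    ("414F19", "41L21"),
    ("217N23", "391I11"),
    ("3N21", "262J21"),
    ("391I11", "414F19"),
    ("41L21", "3N21"),
]
print("-".join(order_contig(pairs)))
```

The walk starts from the lexicographically smaller end clone, so the orientation of the result is arbitrary; placing the contig relative to centromere and telomere needs flanking markers, as described in the text.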
In humans, genes encoding the human receptor (TNFRSF)-interacting serine-threonine kinase 1 (RIPK1; NM_003804) and biphenyl hydrolase-like protein (BPHL; NM_004332) are close to the human 6p25 serpin genes, and their mouse counterparts (Bphl and Ripk1) have been mapped to mouse chromosome 13 (J.R., unpublished data). Therefore the PACs were placed into contigs on the basis of hybridization with serpin probes and probes for Ripk1 and Bphl (data not shown). These contig data indicate that in the mouse, as in human, the RIPK1, BPHL, and serpin genes are clustered. Mapping and recent sequence analysis of the region of human chromosome 6 containing the genes for PI-6, PI-9, and MNEI has identified two other neighboring genes, besides RIPK1 and BPHL, which encode the Werner helicase interacting protein (WHIP; NM_020135) and NAD(P)H menadione oxidoreductase 2 or quinone oxidoreductase (NMOR2; NM_000904; Fig. 3). In addition, our analysis of the sequence of the mouse clone 262J21, which contains the gene for SPI3, also identified two intronless pseudogenes similar to the calcium-sensing receptor (CaSR) gene, which is a member of the pheromone receptor family. One pseudogene appears to be derived from CaSR (AF110178), while the other appears to
be derived from a differentially spliced variant (AF110179). We termed these pseudogenes Pher1 and Pher2 and designed PCR primers to Pher1 (Table 1). To confirm synteny between this region of mouse chromosome 13 and human chromosome 6p24–p25, we tested whether Nmor2, Pher1, and Whip are also present in the genomic clones. As shown in Table 4, Whip, Nmor2, and Pher1 are indeed associated with the mouse chromosome 13 ov-serpin genes. We also tested for the markers D13Mit241, D13Mit136, D13Mit116.1, and D13Mit265 thought to be in this region of mouse chromosome 13 [19]. D13Mit136 and D13Mit116.1 were present on a number of clones but none were positive for D13Mit241 or D13Mit265. Subsequent sequence analysis showed that D13Mit116.1 is within intron A of Serpinb6c (SPI3C). The PAC and BAC genomic clones were screened for the presence of ov-serpin and marker genes by PCR using the primer pairs listed in Table 1. Nine clones were negative for all the probes tested. With the exception of Serpinb8 (NK10), each of the ov-serpin genes was found on one or more of the remaining clones (Table 4). Although it is not yet represented in the htgs database, Serpinb9c (NK9) was found on the same clone as Serpinb9f (NK21) and Serpinb9e (NK26), demonstrating that these genes are clustered. This PCR analysis unequivocally placed the serpin and marker genes into three clusters encoding Whip-Serpinb1-Serpinb1-ps1-Serpinb1c-Serpinb6b-Serpinb9-Serpinb9b-Serpinb1b, Serpinb6d-Serpinb6-ps1-Pher1-Serpinb6e-Serpinb6c-D13Mit136-Serpinb6-Nmor2, and Serpinb9c-Serpinb9d-Serpinb9e-Serpinb9f-D13Mit116.1. Subsequent analysis of htgs sequence data from 312D21 placed Serpinb1-ps1 (EID) between Serpinb1 (EIA) and Serpinb1c (EIC), and R86B between Serpinb1c (EIC) and Serpinb6b (NK13), while data from 41L21 placed Serpinb9g (NK21B) and Serpinb9-ps1 (NK21C) distal to Serpinb9f (NK21).
Celera sequence information suggests that Serpinb9c (NK9) and Serpinb9d (AT2) are distal to Serpinb1b (EIB), but there is a 0.2-Mb gap between Serpinb9d (AT2) and the Serpinb6 cluster. From the BAC fingerprint contig and STS content data we deduced linkage and the relative arrangement of the three clusters using the following overlaps: 391I11 (contains Serpinb1b (EIB)) and 414F19 (contains Serpinb9c (NK9)); 414F19 and 41L21 (contains Serpinb9-ps1 (NK21C)); 41L21 and 3N21 (contains Serpinb6-ps1 (SPI3E)); and 3N21 and 262J21 (contains Serpinb6 (SPI3)). Thus the organization of the serpin gene cluster is cen-Whip-Serpinb1-Serpinb1-ps1-Serpinb1c-R86B-Serpinb6b-Serpinb9-Serpinb9b-Serpinb1b-Serpinb9c-Serpinb9d-Serpinb9e-Serpinb9f-D13Mit116.1-Serpinb9g-Serpinb9-ps1-Serpinb6d-Serpinb6-ps1-Pher1-Serpinb6e-Serpinb6c-D13Mit136-Serpinb6-Nmor2-tel (Fig. 3). Analysis of sequence and PCR data (not shown) allowed us to orient most of the genes and pseudogenes with respect to the flanking markers (Fig. 3). Several other potential pseudogenes or gene remnants are also evident (Fig. 3, diamonds). From sequence information we estimate that the minimum size of the mouse serpin cluster from Whip to Ripk1 is 1 Mb, compared to 0.35 Mb in humans.
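
The way pairwise clone overlaps lead to a single clone order can be illustrated with a short sketch. It uses only the four BAC overlaps named above; the helper `chain_clones` is hypothetical and is not the FPC procedure used in this study, and it assumes the overlaps form one unbranched chain.

```python
from collections import defaultdict

# Pairwise BAC overlaps as stated in the text (each supported by a shared gene).
OVERLAPS = [
    ("391I11", "414F19"),
    ("414F19", "41L21"),
    ("41L21", "3N21"),
    ("3N21", "262J21"),
]


def chain_clones(overlaps):
    """Order clones into one linear path from pairwise overlaps.

    Hypothetical helper for illustration: it assumes the overlaps describe a
    single unbranched chain and raises ValueError otherwise.
    """
    neighbours = defaultdict(set)
    for a, b in overlaps:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # A linear chain has exactly two end clones and no clone with >2 neighbours.
    ends = [c for c, n in neighbours.items() if len(n) == 1]
    if len(ends) != 2 or any(len(n) > 2 for n in neighbours.values()):
        raise ValueError("overlaps do not form a single linear chain")

    # Start from the end clone that appears first in the input, so the
    # orientation of the result follows the order in which overlaps were given.
    listed = [c for pair in overlaps for c in pair]
    start = min(ends, key=listed.index)

    path, previous = [start], None
    while True:
        candidates = [c for c in neighbours[path[-1]] if c != previous]
        if not candidates:
            break
        previous = path[-1]
        path.append(candidates[0])

    if len(path) != len(neighbours):
        raise ValueError("overlaps contain a disconnected component")
    return path


contig_order = chain_clones(OVERLAPS)
```

Applied to the overlaps listed above, the sketch walks from 391I11 to 262J21, which matches the central part of the RP23 contig described earlier; it says nothing about gene orientation within each clone.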
TABLE 4: Identification of serpin genes and markers on PACs and BACs (a)
[Table body: the per-clone entries (+, S, N, blank) could not be recovered from the extracted layout.]
Genes and markers scored: Whip, b1 (EIA), b1-ps1 (EID), b1c (EIC), b9-ps2 (R86B), b6b (NK13), b9 (SPI6), b9b (R86), b1b (EIB), b9c (NK9), b9d (AT2), b9e (NK26), b9f (NK21), Mit116, b9g (NK21B), b9-ps1 (NK21C), b6d (SPI3D), b6-ps1 (SPI3E), Pher1, b6e (SPI3B), b6c (SPI3C), Mit136, b6 (SPI3).
PAC/BAC clones listed: 600A11, 532I21, 661D24, 598A11, 507G8, 672N8, 548D1, 577C18, 455G5, 358P7, 628P21, 339H23, 346O17, 339I24, 473C19, 530C20, 642K6, 578H9, 584B16, 442N20, 470B12, 414F19, 391I11, 262J21, 41L21, 3N21, 217N23, 312D21, 427P14, 236L20.
(a) Clones 600A11–642K6 are from the RPCI21 library. Clones 414F19–427P14 are from the RP23 BAC library. 236L20 is from the CT7 library. +, Positive by PCR; blank, negative by PCR; S, positive by database sequence information; N, PCR not done. Serpin genes are indicated without the Serpin root symbol.
DISCUSSION
There are 13 ov-serpins in humans, and they are implicated in processes such as tumor progression, inflammation, and the regulation of apoptosis. Three human ov-serpin genes are clustered at 6p25, with the remainder at 18q21, and all have
essentially the same structures. In this study we have identified and characterized 15 serpin genes on mouse chromosome 13, in a region syntenic to human 6p25. Synteny is illustrated by the presence of genes encoding RIPK1, BPHL, WHIP, and NMOR2 near the ov-serpin clusters in both humans and mice. Although the gene order is conserved in mouse and human, it is likely that the orientation of each locus on the chromosome
FIG. 3. Comparison of the human and mouse ov-serpin gene clusters. The relationship of the htgs RP23 BACs to the mouse cluster is indicated, as deduced from genomic survey sequence data and PCR analysis. The Celera scaffold designations for this part of mouse chromosome 13 are shown; information on clones used to sequence the region can be found at http://www.celera.com. Arrows indicate gene orientation; closed circles indicate named pseudogenes; closed diamonds indicate unnamed pseudogenes or remnants; closed square indicates truncated gene. Accession numbers for the serpin sequences, markers, and htgs RP23 BACs are listed in Table 1. The clone 236L20 is from the CT7 library. Note that the map is not to scale. Sequence information indicates that the mouse cluster spans ~ 1-Mb, whereas the human cluster spans ~ 0.35-Mb. The structure and predicted orientation of the human 6p25 locus serpin is as shown in the NCBI map viewer (http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/maps.cgi?org=hum&chr=6).
may differ between species. Thus the gene for BPHL is telomeric in the mouse, but is centromeric in humans according to the draft genomic sequence. At present the orientation of the mouse locus is supported by analysis of the Del(13)Svea36H deletion mice [9], but the orientation of the human locus has yet to be confirmed by joining of the adjacent sequence contigs. Hence the formal possibility remains that the human locus is in the same orientation as the mouse. Although there has been a significant expansion in the number of serpin genes in the mouse cluster compared to human, gene structures have been conserved in terms of exon number and intron phasing. At least seven potential serpin pseudogenes or gene remnants have been observed at mouse chromosome 13 (EID, NK21C, SPI3E, and four unnamed sequences) and one at human 6p25 (PI8L1 [4]), but until sequencing of the region is completed we cannot rule out the existence of additional genes or pseudogenes. The possibility of additional genes is illustrated by one PAC (577C18) that was identified as carrying serpin sequences by probing for SPI3/SPI6 and is positive for Bphl, yet is negative for Nmor2 and all of the serpin genes described here. The several other PACs isolated in the same way that are negative for all markers may contain chromosome 1 serpin sequences (from a region syntenic to human chromosome 18q21). Phylogenic analysis indicates that the mouse serpins encoded by genes on chromosome 13 fall into three groups,
each with similarity to one of the three 6p25 human serpins. Hence mice have at least three MNEI-like serpins, seven PI-9-like serpins, and five PI-6-like serpins. The simplest explanation for the evolution of the mouse chromosome 13 cluster is that a common ancestor of humans and mice possessed a corresponding primordial cluster of three ov-serpin genes that were independently duplicated after the human and mouse lineages diverged. This would explain the conservation of gene structure. An alternative view, that the common ancestor had 16 or more serpin genes in the primordial cluster and that most were lost in humans after divergence, is not supported by human genomic sequence data, which show no evidence of gene remnants. One or more duplications of the entire primordial cluster is also unlikely to have occurred in mice because there are different numbers of serpins in each of the three subgroups. Our map of the chromosome 13 genes is also inconsistent with such a “duplication en-masse” model, which would predict conserved gene order within each subcluster (for example, PI-6-like, then PI-9-like, then MNEI-like).

What are the likely functions of the mouse chromosome 13 serpins? On the basis of sequence similarity, functional studies, and expression patterns, we have earlier suggested that SPI3 and SPI6 are counterparts of the human 6p25 serpins PI-6 and PI-9, respectively [10,11]. The work reported here supports this view by extending the expression pattern data, and showing that SPI6 has an identical gene structure
to PI-9. It is also clear from sequence comparisons that EIA is the counterpart of MNEI (Fig. 1). The 6p25 human serpins are widely expressed proteinase inhibitors thought to have intracellular cytoprotective roles [reviewed in 20]. Known targets of PI-6, PI-9, and MNEI are the leukocyte granule proteinases cathepsin G, granzyme B, and neutrophil elastase, respectively [14,21,22]. Although they may have additional targets and sites of action [15,23], the main physiological role of the 6p25 serpins is probably the protection of proteinase-producing leukocytes, bystander cells, and lining cells from misdirected proteolysis. For example, PI-9 is produced by cytotoxic, dendritic, endothelial, and epithelial cells [16,24,25], and has been shown to protect cells from apoptosis induced by misdirected granzyme B [21]. It is likely that in the mouse SPI6 has a similar role to PI-9, and that SPI3 and EIA mirror PI-6 and MNEI, respectively. On the basis of our earlier studies identifying AT2, R86, NK9, NK13, NK21 and NK26 in mouse immune tissue [11], it is possible that these serpins also protect cells and tissues against mouse granule or secretory proteinases. As there are more granule proteinases in the mouse than in human [26], it is reasonable to suggest that the larger number of mouse ov-serpins reflects the need to regulate a larger number of proteinases. Inspection of the sequences of the chromosome 13 mouse serpins shows conservation of the proximal and distal hinge motifs flanking the RCL, suggesting that all are functional proteinase inhibitors. From RCL comparisons alone it is not possible to predict what proteinases are potential targets of these serpins. Although the overall homology between the serpins is high, their RCLs show substantial sequence variation, suggesting that they inhibit different proteinases and are unlikely to be functionally redundant.
The exceptions are NK21 and NK21B, which are essentially identical in their RCLs and probably inhibit the same proteinase(s). At present we do not know whether NK21 and NK21B are differentially expressed, but it is possible they inhibit the same proteinase in different contexts, for example at different sites or in response to different stimuli. Our RT-PCR analysis also shows that apart from NK13, the new serpins have a very restricted expression pattern in normal adult tissue, suggesting that their targets are also restricted. However, it remains to be seen whether expression of one or more of these genes also occurs during development or is induced by specific signals, for example in response to immune challenge or other stress.

An interesting question raised by our analysis of mouse chromosome 13 is whether a similar expansion has occurred in the region of mouse chromosome 1 syntenic with human chromosome 18q21 (which contains the larger number of human ov-serpin genes). Expansion of mouse serpin gene number compared to human is not limited to the chromosome 13 ov-serpin subgroup, as genes corresponding to the single-copy human plasma serpins α1-antitrypsin and α1-antichymotrypsin have undergone duplication on mouse chromosome 12 to form clusters of 5 and 10 genes, respectively [27,28]. Members of each of these plasma serpin gene clusters show considerable divergence in their RCLs but
otherwise are very closely related, suggesting that they have also evolved to regulate different proteinases [29]. It is clear from database searches that there are indeed other mouse ov-serpins that have no obvious human counterparts, and are probably not encoded in the cluster we have analyzed. At present we do not know whether they represent genes on chromosome 13, chromosome 1 or elsewhere. Our preliminary analysis of the region on chromosome 1 encoding NK10 (the mouse homolog of PI-8) suggests that it has not been duplicated, and there is no evidence of additional NK10-like genes in the EST or htgs databases. The possibility therefore remains open that the chromosome 1 ov-serpin gene cluster has not undergone significant expansion. In closing, it should be noted that our work illustrates the importance of using complementary techniques to characterize clusters of closely-related genes, and the potential difficulties in relying on genomic nucleotide sequence information alone. At the time of writing neither the public nor private mouse genome sequence databases contain the entire, correctly constructed and ordered chromosome 13 serpin cluster. Although the assembly phase is complete, the Celera database has a 0.2-Mb gap in the middle of the serpin cluster, and is missing or failed to predict EID, NK21, NK21B, NK21C, NK26, SPI3B, SPI3D and SPI3E. By contrast, the public database sequences (htgs) remain mainly unassembled, and lack AT2, NK9 and NK26. In addition, the presence of introns in the 5’-untranslated regions of these serpin genes complicates identification of the first splice donor and the upstream promoter region using gene prediction programs. 
Comparison of EST and genomic sequences is required to reliably delineate and validate these and other intron/exon boundaries. It is therefore important that large-scale or genome-wide projects characterizing full-length ESTs are continued, and that the range of source tissues used in library generation is expanded.
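
The subgroup counts given in this Discussion (three MNEI-like, seven PI-9-like, and five PI-6-like genes, 15 in total) can be checked against the cluster order reported in the Results. The sketch below is illustrative only. It assumes that the gene symbol root (Serpinb1, Serpinb6, Serpinb9) reflects the subgroup, and it excludes pseudogenes, R86B, markers, and flanking genes; `serpin_group` is a hypothetical helper.

```python
from collections import Counter

# Cluster order as reported in the Results (centromere to telomere).
CLUSTER_ORDER = [
    "Whip", "Serpinb1", "Serpinb1-ps1", "Serpinb1c", "R86B", "Serpinb6b",
    "Serpinb9", "Serpinb9b", "Serpinb1b", "Serpinb9c", "Serpinb9d",
    "Serpinb9e", "Serpinb9f", "D13Mit116.1", "Serpinb9g", "Serpinb9-ps1",
    "Serpinb6d", "Serpinb6-ps1", "Pher1", "Serpinb6e", "Serpinb6c",
    "D13Mit136", "Serpinb6", "Nmor2",
]

# Assumption: the symbol root indicates which human 6p25 serpin a gene resembles.
GROUP_BY_ROOT = {
    "Serpinb1": "MNEI-like",
    "Serpinb6": "PI-6-like",
    "Serpinb9": "PI-9-like",
}


def serpin_group(symbol):
    """Return the subgroup of a named serpin gene, or None for anything else."""
    if not symbol.startswith("Serpinb") or "-ps" in symbol:
        return None  # marker, flanking gene, unnamed remnant, or pseudogene
    return GROUP_BY_ROOT.get(symbol[:8])  # "Serpinb" plus one digit


group_counts = Counter(
    group for group in map(serpin_group, CLUSTER_ORDER) if group is not None
)
total_genes = sum(group_counts.values())
```

Under these assumptions the tally reproduces the three subgroup sizes and the total of 15 genes stated in the text.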
MATERIALS AND METHODS

Mouse EST and genomic clones. The mouse PAC library RPCI21 was constructed using female 129/SvevTACfBr mouse spleen genomic DNA [18]. It was obtained in the form of DNA on membranes from the HGMP Resource Centre, Hinxton, UK. ESTs were obtained from the IMAGE collection (ATCC). Clones from the RPCI23 C57BL/6J BAC library were obtained from the HGMP Resource Centre (Hinxton, UK) or from PAC/BAC Resources at the Murdoch Institute (Melbourne, Australia). Clone DNA was prepared by standard procedures [30].

Database mining and sequence analysis. GenBank accession numbers for the clones identified or used in this study are shown in Table 1. Mouse counterparts of human PI-6 (SPI3) and PI-9 (SPI6) have been reported [10,11]. Using the human MNEI amino acid sequence in the tBLASTn program (http://www.ncbi.nlm.nih.gov/BLAST/ [31]), we found a clone in the GenBank mouse EST database that we identified as the mouse counterpart of human MNEI, on the basis of very high homology in the RCL. We designated this serpin EIA. A full-length RIKEN clone was released and identified later in the GenBank non-redundant database. In all human and mouse ov-serpin genes characterized so far, the RCL and 3’-UTR are encoded by the last exon [4,5]. Our previous study identified 100 bp fragments of new, expressed mouse serpin genes on the basis of RCL sequence comparisons [11]. Each of these 100 bp fragments comprised part of
the coding sequence and 3’-UTRs were not identified. As a first step towards obtaining full-length sequences of the new ov-serpins, the BLASTn program (http://www.ncbi.nlm.nih.gov/BLAST/ [31]) was used to search the GenBank mouse EST database with the previously identified RCL fragments [11] for EST clones containing both RCL and 3’-UTR sequences. Of the many clones identified, one was apparently full length with essentially complete sequence in the database (NK9), and three (EIB, NK21 and NK26) had inserts of sufficient length (> 1.2 kb) to encode a full-length serpin. ESTs encoding EIB, NK21 and NK26 were obtained from the ATCC IMAGE collection and sequenced on both strands. The full-length cDNA sequences of NK21 and NK26 were aligned to SPI6 and SPI3 using ClustalW (http://www.clustalw.genome.ad.jp/), and a consensus nucleotide sequence for the mouse ov-serpin coding region was derived. (Subsequent studies showed that the NK21 EST actually represents a highly homologous gene, designated NK21B.) The consensus sequence was used with BLASTn to identify clones in the GenBank high throughput genomic sequence (htgs) database that contain mouse ov-serpin sequences. These htgs sequences were then compared by BLASTn to the known RCL sequences [11]. Several BACs carrying EIA, NK10, NK13, NK21, R86, SPI3 and SPI6 were identified. In addition, six previously unknown serpin genes (SPI3B-E, NK21B, NK21C) were identified by BLASTn analysis using the consensus sequence. No htgs sequences corresponding to AT2, NK9 or NK26 were identified. Inspection of data in the Celera mouse database (http://www.celera.com) revealed genomic sequences containing NK9 and AT2. The gene prediction program GENSCAN (http://genes.mit.edu/GENSCAN.html [32]) was then used to analyze the htgs data to predict the full-length open reading frames of NK10, NK13, R86, AT2, and the five novel serpins.
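
Intron phase, which is used in this study to compare gene structures, is conventionally defined by where an intron interrupts the reading frame: phase 0 between codons, phase 1 after the first base of a codon, and phase 2 after the second. The sketch below illustrates that general definition only; it is not the EST_GENOME procedure, the function name is hypothetical, and the exon lengths in the example are invented. It assumes no intron lies downstream of the stop codon.

```python
def intron_phases(coding_lengths):
    """Return the phase of each intron from per-exon coding lengths.

    coding_lengths[i] is the number of coding (CDS) bases contributed by
    exon i, listed 5' to 3'. Use 0 for an exon that lies entirely in the
    5'-UTR. An intron located upstream of the start codon has no phase and
    is reported as None.
    """
    phases = []
    cumulative = 0
    # There is one intron after every exon except the last.
    for length in coding_lengths[:-1]:
        cumulative += length
        if cumulative == 0:
            phases.append(None)  # intron within the 5'-UTR
        else:
            phases.append(cumulative % 3)
    return phases


# Invented example: a non-coding first exon followed by three coding exons.
example_phases = intron_phases([0, 100, 50, 32])
```

Two genes have identical intron phasing in this sense when the lists returned for them are equal, even if the intron sizes differ.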
Intron size and phasing for the ov-serpin genes were predicted by comparing the experimentally determined (SPI3, SPI6, NK21B) and predicted (NK10, NK13, R86, SPI3B-E, AT2) cDNA sequences to the corresponding genomic (htgs) sequences using the program EST_GENOME (http://zeno.well.ox.ac.uk/git-bin/est_genome [33]). Because ov-serpin genes have an intron in the 5′-UTR, it was difficult to locate exon I in htgs data de novo using BLASTn and GENSCAN, and hence to identify the beginning of the relevant genes. Accordingly, exon II of each predicted gene was used with BLASTn to identify corresponding clones in the mouse EST database that contain sequences upstream of exon II. This approach identified EST clones for NK10, NK13, R86 and SPI3C, which were obtained from the relevant EST collection and completely sequenced. In each case the sequencing data yielded the exon I sequence, and confirmed the predicted open reading frames. The full-length cDNA sequence was then used to identify the corresponding exon I in the htgs data, and complete the determination of intron size and phasing using EST_GENOME. Because no ESTs containing exon II of SPI3B or SPI3D were identified, the beginning of these genes and size of intron A could not be identified. Due to incomplete or incorrect assembly of htgs data, the full nucleotide sequence and structure of the EIA, EID and SPI3E genes could not be determined. Phylogenic trees based on the amino acid sequences of the new serpins were constructed as described [3].

DNA sequencing. Plasmids containing EIA, EIB, NK10, NK13, NK21B, NK26, R86, and SPI3C ESTs were sequenced on both strands using an “oligonucleotide walk” strategy. Primers were purchased from Sigma and used with purified plasmid template DNA in the Big-Dye system (Amersham Pharmacia Biotech). Samples were run on an Applied Biosystems 373 DNA sequencer.

RT-PCR analysis.
Extraction of RNA from mouse (C57BL/6) tissue samples was performed using the RNAzol B RNA isolation system (Tel-Test). To remove any contaminating DNA, 5 µg of RNA was incubated with 2 units of RQ1 RNase-free DNase (Promega) in Taq DNA polymerase buffer and 3.75 mM MgCl2 at room temperature for 30 minutes. To inactivate the DNase, EDTA was added to a final concentration of 1 mM, and the samples were phenol-extracted. Approximately 2.5 µg of DNA-free RNA was reverse transcribed with 1 µg of oligo dT (Amersham Pharmacia Biotech) in the presence of 20 units RNase inhibitor and 200 units M-MLV reverse transcriptase (Promega) in a total volume of 25 µl. Parallel reactions lacking reverse transcriptase were performed to ensure the absence of genomic DNA. The reactions were incubated at 42°C for 2 hours. To assess the efficiency of cDNA synthesis and absence of contaminating genomic DNA, 1 µl from each reaction was used in a 30 cycle PCR (10 mM Tris, pH 9, 2.5 mM MgCl2, 50 mM KCl, 0.1% Triton X-100, 200 µM dNTP) with 10 pmol primers specific for GAPDH (5′-gaccccttcattgacctcaac-3′
and 5′-gatgaccttgcccacagcctt-3′). Approximately 1 µl of cDNA was used in each PCR used to amplify a mouse serpin sequence with 20 pmol of the appropriate primer combination listed in Table 1. Thirty-cycle PCR reactions were performed in 50 µl containing 1.5 or 2.5 mM MgCl2 and the highest possible annealing temperature (usually 60–65°C) empirically determined for each oligonucleotide combination on a genomic DNA template. Amplified samples were assessed by electrophoresis of approximately 20 µl on 2% ethidium bromide agarose gels.

PAC mapping. Genomic PAC clones containing ov-serpin genes were identified in the RPCI21 mouse genomic PAC library by screening with cDNA probes corresponding to SPI3, SPI6 or human PI8. Plasmids containing these cDNAs were cleaved with the appropriate restriction endonuclease(s) to release the insert, which was gel purified using a BIO 101 kit and the concentration determined by gel electrophoresis. DNA (50 ng) was labeled using a Rediprime II kit (Amersham Pharmacia Biotech) with 0.05 mCi/1.85 MBq of [α-32P]dCTP and incubated at 37°C for 30 minutes. PAC library membranes were prehybridized for 1 hour at 65°C in hybridization solution (10% dextran sulfate, 10× Denhardt’s, 50 mM Tris HCl, 6× SSC, 1% Sarkosyl). The probe was denatured at 95°C for 5 minutes, added to pre-warmed hybridization mix and hybridized for 16 hours at 65°C. The filters were then washed in 2× SSC, 0.1% SDS at 65°C for 20 minutes, followed by two 20-minute washes in 1× SSC, 0.1% SDS. Positive clones were identified and ordered from the HGMP Resource Centre, Hinxton, UK.

Contig building. DNA from the positive clones was prepared by standard procedures [30]. Approximately 200 ng of clone DNA was incubated with 10 U EcoRI at 37°C for 2 hours, and the sample was split and resolved on two 0.8% ethidium bromide agarose gels.
DNA was transferred to membranes by alkali blotting techniques as recommended by the manufacturer (Amersham Hybond N+) and hybridized to SPI3, SPI6 or human PI8 probes. From these results it was possible to identify the serpin genes contained in individual clones and to build contigs. Two IMAGE clones, IMAGE 739159 and IMAGE 592125, corresponding to the Ripk1 and Bphl genes were obtained from the HGMP Resource Centre, Hinxton, UK, digested and used as probes as described above on the clones to aid contig construction and further refine the map of the region. The RP23 BAC contig was built using FPC [34] with a combination of STS content (as described below, and M.C., R.M., Chris Sellick, and P.D., unpublished data) and restriction endonuclease digestion data (Genome Sequence Centre; BC Cancer Research Centre; http://www.bcgsc.bc.ca/projects/mouse_mapping/). Confirmation of BAC order was performed by BLASTn analysis using the following genomic survey sequences: GenBank acc. nos. AQ983039, AQ929164, AZ110066, AZ110072, AZ250279, AZ250271, AZ060024, AZ561201, AZ561203, AAZ027750, AQ970854, AAZ088339, and AZ060148. Overlap of BACs 41L21 and 414F19 was established using the STS marker M05614 (http://www-genome.wi.mit.edu [35]).

FISH. Ear explants from Del(13)Svea36H mice were washed in 100% ethanol and then twice in phosphate-buffered saline (PBS). Each ear was chopped into 4 parts and transferred into a 25 cm3 flask containing 5 ml Dulbecco’s medium containing 10% fetal calf serum, 4 mM L-glutamine, and 50 µg/ml gentamycin. The explants were grown at 37°C in 5% CO2 until fibroblasts had reached about 80% confluence, then Colcemid (Gibco) was added to a final concentration of 75 ng/ml. After 2 hours the cells were washed with PBS and incubated for 5 minutes with 0.5 ml of 0.25% trypsin, 0.02% EDTA (Sigma). We added 3 ml of 75 mM KCl to the detached cells, which were transferred to a centrifuge tube.
After 20 minutes at 37°C, 10 drops of pre-fixative (3:1 methanol:acetic acid) were mixed in well. The cells were collected by centrifugation at 1000 rpm in a Jouan C312 centrifuge, resuspended in fixative then collected again by centrifugation. This step was repeated before finally resuspending the pellet in 1 ml of fixative. The FISH procedure was as described [36] except mouse cot1 DNA was used.
ACKNOWLEDGMENTS
We thank Pannos Ioannou (Murdoch Institute, Melbourne) for providing BACs, Ruth Arkell (MRC UK Mouse Genome Centre & Mammalian Genetics Unit, Harwell) for Del(13)Svea36H mice, Chris Sellick and Anne Southwall (MRC, Harwell) for technical assistance, Owen McCann and Simon Gregory (Sanger Centre) for assistance with FPC, Nicos Tripodis (Division of Medical Molecular Genetics, GKT Medical School) for fibroblast preparation, and James Irving (Monash University) for discussions. Parts of these data were generated through use of the Celera Discovery System and Celera’s associated databases. This work was supported by the National Health and Medical Research Council of Australia, the Wellcome Trust (grant 060759/Z), the Medical Research Council (UK), the National Institutes of Health (grant HL66548), and Monash University.

RECEIVED FOR PUBLICATION OCTOBER 23; ACCEPTED DECEMBER 27, 2001.
REFERENCES
1. Silverman, G. A., et al. (2001). The serpins are an expanding superfamily of structurally similar but functionally diverse proteins. J. Biol. Chem. 276: 33293–33296.
2. Huntington, J. A., Read, R. A., and Carrell, R. W. (2000). Structure of a serpin–protease complex shows inhibition by deformation. Nature 407: 923–926.
3. Irving, J. A., Pike, R. N., Lesk, A. M., and Whisstock, J. C. (2000). Phylogeny of the serpin superfamily: implications of patterns of amino acid conservation for structure and function. Genome Res. 10: 1845–1864.
4. Scott, F. L., et al. (1999). Human ovalbumin serpin evolution: phylogenic analysis, gene organization, and identification of new PI-8-related genes suggest that two interchromosomal and several intrachromosomal duplications generated the gene clusters at 18q21–q23 and 6p25. Genomics 62: 490–499.
5. Ragg, H., Lokot, T., Kamp, P.-B., Atchley, W. R., and Dress, A. (2001). Vertebrate serpins: construction of a conflict-free phylogeny by combining exon–intron and diagnostic site analyses. Mol. Biol. Evol. 18: 577–584.
6. Bartuski, A. J., Kamachi, Y., Schick, C., Overhauser, J., and Silverman, G. A. (1997). Cytoplasmic antiproteinase 2 (PI8) and bomapin (PI10) map to the serpin cluster at 18q21.3. Genomics 43: 321–328.
7. Chatterjee, A., et al. (2001). Mapping the sites of putative tumor suppressor genes at 6p25 and 6p21.3 in cervical carcinoma: occurrence of allelic deletions in precancerous lesions. Cancer Res. 61: 2119–2123.
8. Davies, A. F., et al. (1999). Delineation of two distinct 6p deletion syndromes. Hum. Genet. 104: 64–72.
9. Arkell, R. M., et al. (2001). Genetic, physical and phenotypic characterization of the Del(13)Svea36H mouse. Mamm. Genome 12: 687–694.
10. Sun, J., Rose, J. B., and Bird, P. (1995). Gene structure, chromosomal localization and expression of the murine homologue of human proteinase inhibitor 6 (PI6) suggests divergence of PI6 from the ovalbumin serpins. J. Biol. Chem. 270: 16089–16096.
11.
Sun, J., et al. (1997). A new family of 10 murine ovalbumin serpins includes two homologs of proteinase inhibitor 8 and two homologs of the granzyme B inhibitor (proteinase inhibitor 9). J. Biol. Chem. 272: 15434–15441.
12. Dougherty, K. M., et al. (1999). The plasminogen activator inhibitor-2 gene is not required for normal murine development or survival. Proc. Natl. Acad. Sci. USA 96: 686–691.
13. Coughlin, P., Sun, J., Cerruti, L., Salem, H. H., and Bird, P. (1993). Cloning and molecular characterization of a human intracellular proteinase inhibitor. Proc. Natl. Acad. Sci. USA 90: 9417–9421.
14. Remold-O’Donnell, E., Chin, J., and Alberts, M. (1992). Sequence and molecular characterisation of human monocyte/neutrophil elastase inhibitor. Proc. Natl. Acad. Sci. USA 89: 5635–5639.
15. Scott, F., et al. (1998). Proteinase inhibitor 6 (PI-6) expression in human skin: induction of PI-6 and a PI-6/proteinase complex during keratinocyte differentiation. Exp. Cell Res. 245: 263–271.
16. Sun, J., et al. (1996). A cytosolic granzyme B inhibitor related to the viral apoptotic regulator cytokine response modifier A is present in cytotoxic lymphocytes. J. Biol. Chem. 271: 27802–27809.
17. Zeng, W., Silverman, G. A., and Remold-O’Donnell, E. (1998). Structure and sequence of human M/NEI (monocyte/neutrophil elastase inhibitor), an ov-serpin family gene. Gene 213: 179–187.
18. Osoegawa, K., et al. (2000). Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 10: 116–128.
19. Hong, H.-K., Lass, J. H., and Chakravarti, A. (1999). Pleiotropic skeletal and ocular phenotypes of the mouse mutation congenital hydrocephalus (ch/Mf1) arise from a winged helix/forkhead transcription factor gene. Hum. Mol. Genet. 8: 625–637.
20. Bird, P. I. (1999). Regulation of pro-apoptotic leucocyte granule serine proteinases by intracellular serpins. Immunol. Cell Biol. 77: 47–57.
21. Bird, C. H., et al. (1998). Selective regulation of apoptosis: the cytotoxic lymphocyte serpin proteinase inhibitor 9 protects against granzyme B-mediated apoptosis without perturbing the Fas cell death pathway. Mol. Cell. Biol. 18: 6387–6398.
22. Scott, F. L., et al. (1999). The intracellular serpin proteinase inhibitor 6 (PI-6) is expressed in monocytes and granulocytes and is a potent inhibitor of the azurophilic granule proteinase, cathepsin G. Blood 93: 2089–2097.
23. Kato, K., et al. (2001). Serine proteinase inhibitor 3 and murinoglobulin I are potent inhibitors of neuropsin in adult mouse brain. J. Biol. Chem. 276: 14562–14571.
24. Buzza, M. S., et al. (2001). The granzyme B inhibitor, PI-9, is present in endothelial and mesothelial cells, suggesting it protects bystander cells during immune responses. Cell. Immunol. 210: 21–29.
25.
Bladergroen, B. A., et al. (2001). The granzyme B inhibitor, protease inhibitor 9, is mainly expressed by dendritic cells and at immune-privileged sites. J. Immunol. 166: 3218–3225.
26. Smyth, M. J., and Trapani, J. A. (1995). Granzymes: exogenous proteinases that induce target cell apoptosis. Immunol. Today 16: 202–206.
27. Borriello, F., and Krauter, K. S. (1991). Multiple murine alpha 1-protease inhibitor genes show unusual evolutionary divergence. Proc. Natl. Acad. Sci. USA 88: 9417–9421.
28. Inglis, J. D., and Hill, R. E. (1991). The murine Spi-2 proteinase inhibitor locus: a multigene family with a hypervariable reactive site domain. EMBO J. 10: 255–261.
29. Paterson, T., and Moore, S. (1996). The expression and characterization of five recombinant murine α1-protease inhibitor proteins. Biochem. Biophys. Res. Commun. 219: 64–69.
30. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
31. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
32. Burge, C., and Karlin, S. (1997). Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268: 78–94.
33. Mott, R. (1997). EST_GENOME: a program to align spliced DNA sequences to unspliced DNA. Comput. Appl. Biosci. 13: 477–478.
34. Soderland, C., Humphrey, S., Dunham, A., and French, L. (2000). Contigs built with fingerprints, markers and FPC V4.7. Genome Res. 10: 1772–1787.
35. Nusbaum, C., et al. (1999). A YAC-based physical map of the mouse genome. Nat. Genet. 22: 388–393.
36. Davies, A. F., et al. (1995). Evidence of a locus for orofacial clefting on human chromosome 6p24 and STS map of the region. Hum. Mol. Genet. 4: 121–128.
GENOMICS Vol. 79, Number 3, March 2002 Copyright © 2002 Elsevier Science (USA). All rights reserved.