General and Comparative Endocrinology 108, 199–208 (1997) Article No. GC976965
Elephantfish Proinsulin Possesses a Monobasic Processing Site
Michael A. Gieseg, Peter A. Swarbrick, Lana Perko, Robert J. Powell, and John F. Cutfield1
Biochemistry Department, School of Medical Sciences, University of Otago, Dunedin, New Zealand
Accepted June 23, 1997
Total pancreatic RNA from the holocephalan species Callorhyncus milii (elephantfish) was used to make cDNA as a template for the polymerase chain reaction. Three redundant primers based on the known amino acid sequence of elephantfish insulin were used to amplify a fragment of proinsulin comprising truncated B-chain, complete C-peptide, and complete A-chain. Whereas the C-peptide/A-chain junction contained the expected dibasic cleavage site (-Lys-Arg-), the B-chain/C-peptide junction was found to contain only a single Arg, the first such site to be unequivocally associated with the proteolytic processing of a proinsulin to insulin. Examination of the flanking sequences around this site shows that a typical endocrine/neuroendocrine PC3 conversion enzyme should still be able to cleave, as the general requirements for precursor processing at a monobasic site are satisfied, notably a basic residue (Lys) at the −4 position. An acidic residue (in this case Asp) at the +1 position, which is seen in all known proinsulins, is maintained. The corresponding genomic DNA fragment of elephantfish proinsulin was also amplified by PCR, revealing a 402-bp intron at the conserved IVS-2 position within the C7 codon. © 1997 Academic Press
1 To whom correspondence should be addressed at Biochemistry Department, University of Otago, P.O. Box 56, Dunedin, New Zealand. Fax: 64 3 4797866. E-mail: john.cutfield@stonebow.otago.ac.nz.
0016-6480/97 $25.00 Copyright © 1997 by Academic Press All rights of reproduction in any form reserved.
Most polypeptide hormones and neuropeptides, as well as many constitutively secreted proteins, are synthesised as extended precursor molecules that are subsequently cleaved by proteolytic enzymes to produce their active forms. The sites of cleavage are usually identifiable by pairs of basic residues (Lys-Arg and Arg-Arg especially), although sometimes a single basic residue or a longer run of such residues may also act as recognition sites (Devi, 1991; Lindberg and Hutton, 1991). Sequences flanking these sites also seem to be important in defining the points of cleavage, as not all such sites are actually cleaved, and consensus sequence rules have been formulated to help identify likely cleavage sites (Nakayama et al., 1992; Rholam et al., 1995). More significantly, characterization of a family of processing endoproteases in higher eukaryotes, which are related to the yeast Kex2 enzyme, is now providing insights into both the specificity and the regulation of proprotein processing (Steiner et al., 1992).
Proinsulin is processed to the two-chain hormone insulin by excision of a central connecting C-peptide that is typically 30–35 residues in length (Steiner, 1981). Cleavage occurs predominantly in the secretory granules of the pancreatic beta cells and involves two Ca2+-dependent enzyme activities with preferences for dibasic sites, first identified in rat insulinoma granules and labeled Type I and Type II (Davidson et al., 1988). The Type I enzyme was shown to cleave on the carboxy-terminal side of Arg-Arg at the B-chain/C-peptide junction, whereas the Type II enzyme targeted the Lys-Arg sequence at the C-peptide/A-chain junction. These two enzyme activities have subsequently been found to correspond to the subtilisin-related proprotein convertases PC3 (also referred to as PC1/3 or SPC3) and PC2 (or SPC2), respectively, which are expressed in endocrine and neuroendocrine cells (Steiner et al., 1992; Hutton, 1994; Halban and Irminger, 1994). Following removal of the C-peptide by the action of the endoproteases, the resulting carboxy-terminal basic residues are trimmed off by carboxypeptidase H (Davidson and Hutton, 1987).
All proinsulins sequenced so far, including representatives from mammals, birds, amphibians, and fish, have their C-peptides flanked by dibasic sites, invariably Lys-Arg at the C/A junction and either Arg-Arg or Lys-Arg at the B/C junction. Of considerable interest from an evolutionary point of view are the holocephali, which are cartilaginous fishes believed to have diverged from the main vertebrate line of evolution some 280–300 million years ago and which are now limited to just three families (Long, 1995). These are the most primitive fish to possess an exocrine pancreatic gland containing islets of Langerhans (Falkmer et al., 1984). Two species from one of the families, namely the ratfish (Hydrolagus colliei) and the rabbitfish (Chimaera monstrosa), possess an unusual insulin in which the B-chain is extended at the C-terminus by some eight residues, apparently because of an Ile for Arg substitution at the B/C junction that forms a monobasic rather than a dibasic processing site (Conlon et al., 1986, 1988). An alternative processing site located further into the C-peptide was inferred to exist based on limited amino acid sequence data. Our work on the insulin from the elephantfish (Callorhyncus milii), another holocephalan species but from a different family, showed that its sequence is 94% identical to that of ratfish (and rabbitfish) but without the B-chain extension (Berks et al., 1989).
We wished to find out if elephantfish proinsulin contained a different processing site from ratfish, or indeed from any other species, and if so, whether this information might usefully add to the current knowledge of prohormone conversion enzyme specificity. Using PCR amplification methodology we have obtained sequence data for elephantfish proinsulin through the B/C and C/A junctions, from both cDNA and genomic DNA, the latter also allowing intron 2 (Steiner et al., 1985) to be characterized.
EXPERIMENTAL
Isolation of Total RNA
Fresh pancreatic tissue from elephantfish was immediately placed in liquid nitrogen before purification using the CsCl method of RNA extraction (Chirgwin et al., 1979). Tissue was lyophilized, ground to a powder, and homogenized in GT buffer (4.0 M guanidinium thiocyanate, 0.1 M Tris–HCl, pH 7.5, 1% 2-mercaptoethanol). The homogenate was prespun at 2000g for 20 min and then equilibrated by adding 1 g of CsCl per 2.5 ml, before being layered on to a 10 ml CsCl cushion (5.7 M CsCl/0.01 M EDTA, pH 7.5) in a Beckman SW27 swinging bucket rotor. Tubes were centrifuged for 24 hr at 23,000 rpm and 20°. The RNA pellet was washed with 70% ethanol and resuspended in TE (10 mM Tris/1 mM EDTA/1% SDS). After additions of TE, sodium acetate, and 85% ice-cold ethanol, the sample was left on ice for 30 min, microfuged for 10 min, washed in 85% ethanol, and air-dried. The purified RNA was redissolved in diethylpyrocarbonate-treated water and stored at −80°. RNA samples were checked for integrity by formaldehyde agarose electrophoresis prior to cDNA synthesis.
cDNA Synthesis
First-strand cDNA was synthesized from total RNA using Gibco BRL M-MLV reverse transcriptase. Total RNA (30 µg) was incubated with 20% (v/v) 5× first-strand synthesis buffer, 10 mM DTT, 3 mM MgCl2, 0.5 mM dNTPs, 1.0 µg/ml oligo(dT), and 200 units of M-MLV for 60 min at 37°.
Isolation of Genomic DNA
Elephantfish genomic DNA was isolated from spleen using a modification of an established method (Ausubel et al., 1987). Frozen elephantfish spleen was ground to a fine powder, suspended in 1.2 ml digestion buffer (100 mM NaCl, 100 mM Tris–HCl, 25 mM EDTA, 0.5% SDS, 0.1 mg/ml proteinase K) per 100 mg tissue, and incubated with shaking overnight. The digested sample was extracted with 3 × 1⁄2 vol of phenol, pH 8.0, and 1× volume of chloroform/isoamyl alcohol, and the DNA was precipitated from the aqueous phase by the addition of 1⁄2 vol 7.5 M ammonium acetate plus 2 vol ice-cold absolute ethanol in a siliconized glass tube. DNA was spooled, transferred to an Eppendorf tube, washed in 70% ethanol, resuspended in 200 µl TE, pH 8.0, and stored frozen at −80°.
PCR Amplification of cDNA and Genomic DNA
Template cDNA (ca. 150 ng in 2 µl) was added to a PCR cocktail containing 1× Jeffrey’s buffer (45 mM Tris–HCl, pH 8.0, 11 mM ammonium sulfate, 4.5 mM magnesium chloride, 6.7 mM 2-mercaptoethanol, 0.25 mM spermidine), 2.5 units of Taq DNA polymerase (Boehringer Mannheim), 0.2 mg/ml BSA plus relevant primers (total volume 50 µl). Primers were designed for minimum codon redundancy from the amino acid sequence of elephantfish insulin, together with some codon preference information. Those selected were A1–7 (20mer, 64× degeneracy), A15–21 (21mer, 32×), and B16–21 (17mer, 128×), used as forward, reverse, and forward primers, respectively (see Fig. 1a). In addition to amplifying across the C-peptide region of insulin, this allowed the synthesis of complete A-chain (63-bp) DNA for use as a probe in screening cloned PCR products. Oligonucleotides were synthesized by Macromolecular Resources, Colorado State University. For A-chain amplification the primers were GGN AT(C/T) GT(G/A) GA(G/A) CA(G/A) TGC TG (forward) and GTT GCA GTA NCC (C/T)TC CA(G/A) (G/A)TT (reverse). The PCR was carried out in a Techne Thermocycler using 20 pmol/µl of primers and reaction conditions 93° (30 sec), 50° (60 sec), 72° (30 sec) for 30 cycles. For amplification of the truncated B-chain–C-peptide–A-chain region (B′CA) the B-chain primer (forward) was TA(C/T) TT(C/T) GTN TG(C/T) GGN GA. The A-chain primer was GTT GCA GTA GCC CTC CAG GTT and was the unique A15–21 nucleotide sequence determined from sequencing the cloned A-chain PCR product. Reaction conditions were as above except that the forward primer was present at a concentration of 200 pmol/µl, i.e., 10× that of the reverse primer. Based on the sequencing of the cloned B′CA product a second (unique) B-chain primer was prepared for the PCR amplification of genomic DNA. This primer, corresponding to B17–23, had the same GC content (57%) as the unique A15–21 primer, and both were used at 20 pmol/µl. Reaction conditions were 95° (30 sec), 60° (60 sec), and 72° (30 sec) over 30 cycles.
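The quoted degeneracies and GC content follow directly from the primer sequences. The sketch below is only an illustrative check, not part of the published method: it rewrites the (C/T) and (G/A) alternatives as the single-letter codes Y and R, multiplies the number of bases allowed at each position, and computes the GC percentage of the unique A15–21 primer. The function names and the PRIMERS dictionary are hypothetical.

```python
# Number of bases each symbol can represent (N = any base, R = G/A, Y = C/T).
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def expand(primer: str) -> str:
    """Turn the notation used in the text, e.g. 'GT(G/A)', into one symbol per base."""
    seq = primer.replace(" ", "")
    for pair, code in (("(C/T)", "Y"), ("(G/A)", "R")):
        seq = seq.replace(pair, code)
    return seq


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides covered by a redundant primer."""
    total = 1
    for symbol in expand(primer):
        total *= DEGENERACY[symbol]
    return total


def gc_percent(primer: str) -> float:
    """GC content in percent; meaningful only for a non-redundant primer."""
    seq = expand(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Primer sequences exactly as given in the text.
PRIMERS = {
    "A1-7 forward": "GGN AT(C/T) GT(G/A) GA(G/A) CA(G/A) TGC TG",
    "A15-21 reverse": "GTT GCA GTA NCC (C/T)TC CA(G/A) (G/A)TT",
    "B16-21 forward": "TA(C/T) TT(C/T) GTN TG(C/T) GGN GA",
    "A15-21 unique": "GTT GCA GTA GCC CTC CAG GTT",
}

for name, primer in PRIMERS.items():
    print(name, len(expand(primer)), degeneracy(primer))
print("GC% of unique A15-21:", round(gc_percent(PRIMERS["A15-21 unique"])))
```

The lengths (20, 21, and 17 nucleotides), the degeneracies (64, 32, and 128), and the 57% GC content all agree with the values stated above.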
Cloning PCR Products
The A-chain PCR product was visualized following electrophoresis in NuSieve agarose (FMC Corp., Rockland, ME) and the 63-bp band was cut out of the gel. The gel slice was melted at 67° for 10 min before adding 4–5 vol of TE buffer at the same temperature. This was then mixed with an equal volume of phenol-saturated TE buffer and centrifuged (1600g for 3 min). The aqueous phase was reextracted in successive steps with phenol, phenol/chloroform, and chloroform. The B′CA products were purified using the Promega Magic DNA Purification kit. Purified PCR products (5 µl) were then end-filled using Klenow (5 units) and 1 unit T4 DNA polymerase (Boehringer Mannheim), 1 µl dNTPs, 1.0 µl 10× Jeffrey’s buffer, made up to a final volume of 20 µl with MilliQ water, and incubated for 30 min at room temperature. The reaction was stopped by heating at 65° for 5 min. Phosphorylation was carried out by incubating with 2 µl kinase buffer and 10 units of polynucleotide kinase for 15 min at 37°. Standard ethanol precipitation was carried out and the DNA pellet dissolved in 5.0 µl MilliQ water. DNA (5 µl) was ligated into SmaI-digested dephosphorylated pBluescript (2.0 µl) with 1 µl ligation buffer and 800 units of T4 DNA ligase (Boehringer Mannheim) for 16 hr at room temperature, and the reaction mix was ethanol-precipitated. Ligated DNA (1.0 µl) was electroporated with 40 µl of electrocompetent Escherichia coli DH5α cells using a Bio-Rad Gene Pulser at a setting of 1.7 kV/cm2. Transformed cells were incubated in 1 ml of LB broth at 37° with shaking for 30 min before plating out on LB/ampicillin agar. For the A-chain transformants six colonies were selected at random for sequencing. For the B′CA transformants colony lifts were carried out.
After 30 min of alkali lysis the filters were washed at 63° in 2× SSC, with three changes over 30 min, and then prehybridized for 15 min at the same temperature before incubating for 6 hr with 32P-labeled A-chain DNA (for the cDNA PCR) or B′CA DNA (for genomic PCR). Probes were constructed with the MegaPrime DNA labeling system. After incubation the filters were washed twice at 63° before being exposed to X-ray film for 30 to 120 min. Positive colonies were selected and streaked out on LB agar, grown overnight, and then rescreened. The four strongest positives were selected for sequencing, and the plasmids were extracted using the Promega Magic Minipreps DNA purification system. Automated sequencing was then carried out using an Applied Biosystems 373A DNA sequencer with universal dye-labeled primers. Sequence manipulations were carried out using the Applied Biosystems SeqEd version 1.0.3 and the GCG Wisconsin Package of Sequencing Utilities version 7.
RESULTS AND DISCUSSION
PCR Strategy
Using cDNA synthesized from total pancreatic RNA, together with three highly redundant primers based on the most conserved regions of the known elephantfish insulin amino acid sequence (Berks et al., 1989), a PCR product, B′CA, that included the desired C-peptide DNA sequence, the full A-chain, and a truncated B-chain (B17–31) was synthesized (Fig. 1a). Based on an earlier study (Chan et al., 1990) it was decided to use PCR to make an A-chain probe for later screening of cloned PCR products and at the same time provide sequence information for a nonredundant A-chain primer to be made. Using forward and reverse primers A1–7 (64-fold redundancy) and A15–21 (32-fold), respectively, the expected 63-bp product, representing the complete A-chain, was obtained (Fig. 1b). The sequence of this fragment was entirely consistent with the known amino acid sequence.
A second PCR experiment utilized a 128-fold degenerate B-chain forward primer encoding residues B16–21, together with the new nonredundant A15–21 reverse primer in a ratio of 10:1. Other ratios were tried without success. The product(s) of this reaction was not actually visible in the gel but Southern analysis with the A-chain probe revealed a band of about 240 bp. This low-yield PCR product was purified and subjected to a second round of amplification which produced the desired product together with some minor products. Sequencing established that the major product contained 222 bp and that it spanned the appropriate B′CA region.
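The 222-bp and 63-bp product lengths can be checked against residue counts given elsewhere in the paper: a 31-residue B-chain, a single Arg at the B/C junction, a 34-residue C-peptide, Lys-Arg at the C/A junction, and a 21-residue A-chain. The following is a minimal sketch of that arithmetic, assuming the product runs from the first nucleotide of the codon for the first B-chain residue covered by the forward primer (B16) to the last nucleotide of the A21 codon. The constant and function names are illustrative only.

```python
B_CHAIN_LEN = 31     # residues in the elephantfish B-chain
BC_JUNCTION = 1      # single Arg at the B/C junction
C_PEPTIDE_LEN = 34   # residues, excluding flanking basic residues
CA_JUNCTION = 2      # Lys-Arg at the C/A junction
A_CHAIN_LEN = 21     # residues in the A-chain


def product_bp(first_b_residue: int) -> int:
    """Length in bp of a product spanning from a given B-chain residue to A21.

    Assumes amplification starts at the first nucleotide of that residue's codon.
    """
    b_residues = B_CHAIN_LEN - first_b_residue + 1
    codons = b_residues + BC_JUNCTION + C_PEPTIDE_LEN + CA_JUNCTION + A_CHAIN_LEN
    return 3 * codons


a_chain_bp = 3 * A_CHAIN_LEN

print("A-chain product:", a_chain_bp, "bp")
print("B16-to-A21 product:", product_bp(16), "bp")
```

Under these assumptions the A-chain product is 63 bp and the B16-to-A21 product is 74 codons, or 222 bp, matching the sequenced lengths.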
FIG. 1. PCR strategy and products. (a) Schematic outline of the PCR strategy used to amplify the B′CA region of elephantfish proinsulin cDNA, where B′ refers to truncated B-chain. Lowest redundancy primers Af (forward) and Ar (reverse) were first used to find the nucleotide sequence of the A-chain. Second, unique primer Ar was used with redundant Bf to obtain the B′CA sequence. Third, unique primers Bf and Ar were used to determine the genomic DNA sequence, as well as confirm the cDNA sequence. (b) Agarose gel electrophoresis of elephantfish proinsulin PCR products. Lane 1 shows pUC18/Msp markers (base pair units); lane 2 is the A-chain product from cDNA template (redundant primers); lane 3 is the B′CA product from cDNA (unique primers); and lane 4 shows the intron-containing B′CA product from genomic DNA (unique primers).
The third PCR experiment was performed essentially to confirm the sequence and employed the two unique primers B17–23 and A15–21 (same size and GC content), which gave rise to a single product (Fig. 1b). After blunt-end ligation into Bluescript vector and transformation, four positive clones were selected for sequencing (both strands). Two gave the correct nucleotide sequence, which is shown in Fig. 2. Finally, the same two nonredundant primers were used to amplify DNA extracted from elephantfish spleen. The approximately 660-bp product (Fig. 1b) was confirmed by Southern analysis to contain the B′CA region. Three clones were selected for sequencing and these results showed that the C-peptide was interrupted by an intron, the position of which coincided with that of insulin IVS2 (see Fig. 2). The full DNA sequence of elephantfish proinsulin B′CA has been deposited with GenBank under Accession No. U82395.
FIG. 2. The DNA sequence of elephantfish proinsulin (B′CA region). Exon sequences are shown in upper case, and the intron 2 sequence is shown in lower case. The deduced amino acid sequence for the B- and A-chain segments is identical to that obtained by protein sequencing (Berks et al., 1989).
Elephantfish Proinsulin C-Peptide Sequence
The elephantfish proinsulin C-peptide was shown to contain 34 residues, excluding the flanking basic residues that define the processing sites, which is slightly more than was found for mammalian species (e.g., sheep has 26 residues, human has 31 residues) but in the middle of the range (30–38 residues) for fishes. Proinsulin C-peptides vary considerably in sequence, presumably because of the lack of constraints on their structure. Various roles have been ascribed to the C-peptide, though these have been difficult to demonstrate experimentally. It may act as a “molecular spacer” to provide sufficient length for the polypeptide chain to allow transmembrane segregation in the early stages of the secretory pathway; its charged residues enhance the solubility of proinsulin in comparison to insulin; it protects insulin from degradation; and it allows proinsulin to fold so that the insulin A- and B-chains are juxtaposed to promote rapid disulfide bond formation between them (Steiner et al., 1985).
The most striking aspect of the elephantfish proinsulin sequence is the presence of a single arginine at the B/C junction, a site which has previously been observed only for proinsulins of holocephalan species (Conlon et al., 1989). Fish and amphibian proinsulins possess the dibasic Lys-Arg at this site while mammals and birds prefer Arg-Arg. Evidence from amino acid sequencing of isolated pancreatic elephantfish insulin (Berks et al., 1989) indicates that efficient processing takes place at this monobasic site. At the other end, the C/A junction was found to contain the typical Lys-Arg dibasic cleavage site, and this is also processed normally, indicating that a typical PC2-type conversion enzyme is present.
Comparison of C-peptide sequences from a variety of vertebrate classes (Oyer et al., 1971; Ko et al., 1971; Chance et al., 1968; Kwok et al., 1983; Perler et al., 1980; Shuldiner et al., 1989; Hobart et al., 1990; Hahn et al., 1983; Chan et al., 1981; Chan et al., 1990) as seen in Fig. 3 permits some general observations to be made. The N-terminal residue is apparently always acidic, the second is invariably hydrophobic, and the third is often acidic but from then on there is little overall similarity except within classes. The importance of these acidic residues as a recognition “domain” for the PC3 processing enzyme has been demonstrated by mutagenesis studies on human proinsulin (Gross et al., 1989; Kaufmann et al., 1995). At the carboxy-terminus of the C-peptide there is no overall conserved residue, let alone motif, suggesting different secondary specificity requirements for the PC2 enzyme compared to PC3. Whereas mammalian sequences are glycine rich in their central regions, the corresponding fish sequences contain a cluster of acidic residues. However, in the absence of a tertiary structure for (any) proinsulin it is difficult to rationalize these distributions in terms of C-peptide function.
FIG. 3. Amino acid sequence comparisons of proinsulin C-peptides with flanking regions (B-chain to left, A-chain to right). A representative group of vertebrates is shown plus the protochordate, amphioxus (references in text). Processing sites are shown in boldface.
From an evolutionary point of view it is interesting to note that the more primitive hagfish, a cyclostome, has a 31-residue B-chain as does elephantfish, but possesses “normal” dibasic cleavage sites at both the B/C and the C/A junctions. DNA sequence comparison between the two (Fig. 4a) shows stronger similarity at the B/C junction if a 3-bp deletion in the elephantfish sequence corresponding to the lysine of the Lys-Arg is assumed, thereby creating a monobasic site. An alternative explanation for the presence of the monobasic site is that the usual (fish) lysine (codon AAA) has mutated to isoleucine (AUA). The former hypothesis, namely that a common ancestral proinsulin more closely resembles that of hagfish, allows a more plausible explanation for the various B-chain lengths seen in different vertebrates. For the teleost fish a 9-bp deletion corresponding to hagfish residues B29–31 has occurred, leaving B28 Pro adjacent to the dibasic cleavage site which might prevent the carboxypeptidase H from removing both basic residues, assuming in the first place that normal cleavage has occurred. Mammalian insulins on the other hand generally contain a 30-residue B-chain, implying a 3-bp deletion at position 30 or 31 with respect to the progenitor insulin.
Comparison between Holocephalan Species
Although the C-peptide for ratfish, or rabbitfish, proinsulin has not been sequenced, the extension of its B-chain allows comparison of the corresponding B/C regions (Fig. 4b). The obvious question to be asked is why the single arginine in ratfish is apparently not a good cleavage site whereas it is for elephantfish. For the nine residues between B28 and C4, most of which might reasonably be expected to interact with the conversion enzyme (Siezen et al., 1994; Lipkind et al., 1995), five are identical (B28 Pro, B29 Lys, B31 Ile, the Arg, C4 Pro), two are similar (C1 acidic, C2 aliphatic), and two are different (B30, C3). Interestingly the Glu-Leu-Glu sequence at C1–3 in ratfish is more mammalian than fish-like (Fig. 4) and may not be an ideal recognition site for a “typical” fish processing enzyme, assuming of course that ratfish and elephantfish possess similar processing endoproteases. Another difference that could affect efficiency of cleavage is the presence of a proline at B30. This would impose conformational restrictions close to the cutting site and could prevent efficient binding to the conversion enzyme. If so, the ratfish must still be able to produce sufficient amounts of viable insulin; indeed the additional B-chain residues do not seem to adversely influence receptor-binding properties (Conlon et al., 1989).
The question then arises as to the location and composition of the B/C processing site in ratfish and rabbitfish, but without complete sequence data this cannot be answered. A possible clue may lie in the elephantfish sequence C8–11 (Phe-Arg-Asp-Leu) which is similar to the actual site of cleavage within B31 to C2 (Ile-Arg-Asp-Val). If ratfish contained a similar sequence, then it is possible that cleavage could occur after the Arg at C9, although this would leave the ratfish B-chain with two extra amino acids that would then have to be cleaved off by a nonspecific carboxypeptidase. Some support for such a hypothesis may be found in the observation by Conlon and co-workers (1989) that in addition to the 38-residue B-chain insulin, 36- and 37-residue variants were also present, suggesting inexact or partial processing. Interestingly they also isolated some “normal” 31-residue B-chain from ratfish pancreas, which implies that processing at the B/C junction as seen in elephantfish can also occur, albeit more sluggishly. How, then, do these monobasic cleavage sites compare with those found in other proproteins?
FIG. 4. Nucleotide and amino acid sequence comparisons in the region of the proinsulin B/C junction between (a) the cyclostome hagfish and the holocephalan elephantfish and (b) the holocephalan species elephantfish and ratfish. The −/+ notation refers to residues N-terminal and C-terminal, respectively, of the scissile bond.
Processing at Monobasic Sites
Although proprotein processing more often occurs at dibasic sites than at monobasic sites, there are now many recorded examples of the latter (Devi, 1991). Analyses of these sites have led to the formulation of consensus site rules and tendencies that have been gradually modified over the past 10 years. Schwartz (1986) distinguished two types of monobasic processing site, proline directed (i.e., presence of an adjacent proline) and nonproline directed. The former involves conversion enzymes that are distinct from those which cleave at dibasic sites. It is the more prevalent nonproline group, however, for which consensus recognition sequences have been sought (reviewed by Devi, 1991) and subsequently compared to the dibasic sites to see if the same conversion enzyme(s) might be involved. The most important requirement for monobasic cleavage would seem to be the presence of a basic residue at the third or fifth position N-terminal (−4, −6) to the single Arg (or occasionally Lys) at position −1, with Arg being preferred to Lys at the −4 position (Benoit et al., 1987; Nakayama et al., 1992). It was also observed that aliphatic residues are not suitable for the +1 position, that cysteine is not found in the vicinity of the cleavage site, and that aromatic amino acids are not tolerated at the −1 position. The B/C processing site in elephantfish proinsulin −5 to +3 is -Pro-Lys-Gln-Ile-Arg-Asp-Val-Gly-, showing a Lys at −4 and generally satisfying the other requirements for monobasic processing. The significance of the −4 Lys for enhancing recognition of the human proinsulin cleavage site by PC3 has previously been recognized (Kaufmann et al., 1995). By contrast, the piece of sequence further along the C-peptide region of ratfish proinsulin, suggested above to be a possible cleavage site, is -Leu-Ser-Ala-Phe-Arg-Asp-Leu-Glu- and does not satisfy the rules on two counts: the lack of a basic residue at −4 and the presence of an aromatic at −1. An analogous Arg at C9 (equivalent to B41 of ratfish proinsulin) would not be a likely cleavage site as the −4 position would then be Leu. In the absence of sequence information for ratfish or rabbitfish C-peptide there is no simple explanation for the extended B-chain in their insulins.
Given that elephantfish proinsulin contains a typical monobasic cleavage site, is it reasonable to assume that it is cleaved by a typical PC3 proinsulin conversion enzyme and not some other endoprotease? Nakayama and co-workers (1992) demonstrated that PC1/PC3 was capable of cleaving both dibasic and consensus monoarginyl sites using recombinant prorenin constructs expressed in mouse pituitary AtT20 cells. As far as insulin processing is concerned, however, there has been no evidence yet put forward to indicate a relaxation in the dibasic specificity at the B/C junction. Mutational studies, also involving transfected AtT20 cells, showed that processing did not occur at the B/C junction of mouse proinsulin when Arg-Arg was changed to Arg-Gly (Docherty et al., 1989) but this would mean B29 Lys is now at the −3 and not the −4 position, a disallowed situation.
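The positional rules discussed above lend themselves to a small worked check. The sketch below is a hedged illustration, not the authors' analysis: it encodes only the requirement for a basic residue at −4 or −6 and the exclusion of aliphatic residues at +1, and leaves out the cysteine and aromatic rules. Sequences are written in one-letter code from the residues quoted in the text; the Pro placed at −6 of the internal candidate site is the elephantfish C4 Pro mentioned in the holocephalan comparison. The function names, the SITES dictionary, and the choice of Ala, Val, Leu, and Ile as the aliphatic set are all assumptions.

```python
BASIC = set("KR")
ALIPHATIC = set("AVLI")  # assumption: residues treated as aliphatic


def residue_at(upstream, downstream, pos):
    """Residue at a position relative to the scissile bond.

    Negative positions count back from the bond (-1 is the last upstream
    residue); positive positions count forward (+1 is the first downstream
    residue). Returns None if the position is outside the given sequence.
    """
    if pos < 0:
        return upstream[pos] if -pos <= len(upstream) else None
    return downstream[pos - 1] if 1 <= pos <= len(downstream) else None


def check_site(upstream, downstream):
    """Apply the simplified monobasic-site rules to one candidate site."""
    def at(pos):
        return residue_at(upstream, downstream, pos)

    return {
        "basic_at_-1": at(-1) in BASIC,
        "basic_at_-4_or_-6": at(-4) in BASIC or at(-6) in BASIC,
        "non_aliphatic_at_+1": at(1) not in ALIPHATIC,
    }


def identical_positions(site_a, site_b, positions):
    """Positions at which two sites carry the same (known) residue."""
    same = []
    for pos in positions:
        a, b = residue_at(*site_a, pos), residue_at(*site_b, pos)
        if a is not None and a == b:
            same.append(pos)
    return same


# (upstream, downstream) residues around the scissile bond, from the text.
SITES = {
    "elephantfish B/C junction": ("PKQIR", "DVG"),
    "internal C-peptide candidate": ("PLSAFR", "DLE"),
    "porcine pronatriuretic peptide": ("SPKTMR", "DSG"),
}

for name, site in SITES.items():
    print(name, check_site(*site))

print(identical_positions(SITES["elephantfish B/C junction"],
                          SITES["porcine pronatriuretic peptide"],
                          [-5, -4, -3, -2, -1, 1, 2, 3]))
```

With these inputs the elephantfish B/C junction and the porcine pronatriuretic peptide site pass all three checks, whereas the internal candidate fails the −4/−6 requirement. The elephantfish residue at −6 is not quoted in the text, so the identity comparison covers only −5 to +3.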
Further mutagenesis studies of the type X-Arg in a mammalian insulin would be of considerable interest in establishing whether the Type I convertase could still cleave efficiently. A model-building study of the active sites of PC2 and PC3 (Lipkind et al., 1995) demonstrated how these enzymes might attract basic residues at the −1, −2, and −4 positions of a proprotein substrate, although the subsite interacting with the −2 position did not appear to be tailored specifically for a basic residue. The elephantfish PC3-type enzyme would require an Ile to be bound at this subsite. Interestingly, a mono-arginyl cleavage site in the pronatriuretic peptide of porcine brain is seen to closely resemble that of elephantfish proinsulin (Nakayama et al., 1992), viz., -Ser-Pro-Lys-Thr-Met-Arg-Asp-Ser-Gly-, being identical at the −6, −5, −4, −1, +1, and +3 positions.
In summary, processing at the B/C junction of elephantfish proinsulin is carried out by an endoprotease that appears closely related to the mammalian proinsulin PC3 enzyme. Cleavage occurs at a monobasic site, rather than the usual dibasic site, presumably because the other specificity requirements are met: a basic residue (Lys) at the −4 position (typical of monobasic processing in general), and an acidic residue (Asp) at the +1 position (typical of proinsulin B/C processing in particular). Outside of the holocephali the only other proinsulin for which a form of monobasic cleavage has been reported is that of dog (Kwok et al., 1983), where the C-peptide is cleaved at an internal Arg (C8) in addition to the normal dibasic processing sites. However, this position does not obey the −4 basic rule, though intriguingly the +1 position is acidic. It was suggested in this case that either a posttranslational event involving a trypsin-like enzyme had occurred or that degradation had taken place during the purification procedure. An unusual processing pathway is also evident in the proinsulin of the amphibian amphiuma, in this case at the C-peptide/A-chain junction, where the dibasic site has been converted to a monobasic site, resulting in an alternative (as yet undefined) processing site (Conlon et al., 1996). Clearly a cloned cDNA sequence would be valuable here, as it would be for ratfish proinsulin, in order to define the sequence specificity of the processing enzyme involved.
Genomic Sequence
Determination of the genomic DNA sequence of the B′CA region not only confirmed the cDNA sequence but also revealed the presence of intron 2 (IVS2) of the insulin gene (Steiner et al., 1985). The intron 1 sequence was not determined in this study as it is known to lie just upstream of the start site. Intron 2 contains 402 bp and interrupts residue C7 Ala immediately following the first nucleotide (G) of the codon. Every known insulin gene, except that of rat I and mouse I where it is absent, possesses an intron precisely at this position within the C-peptide regardless of the size of the exons flanking it. The length of this intron varies from 141 bp (in hagfish) to about 3500 bp (in chicken) (Steiner et al., 1985). Examination of the intron 2 splice sites in known insulin genes indicates consensus sequences GGTG/AAG and GCAGT at the 5′ and 3′ ends, respectively. The elephantfish sequence is different at the 3′ end (CCAGC) but this is still consistent with the general consensus (Lewin, 1994). It is of interest to note that residue 7 of all currently known insulin C-peptides is aliphatic (specifically Val, Ala, or Gly), a consequence of the C7 codon starting with G. Of the 16 possible codons beginning with G, 4 code for each of Val, Ala, and Gly, while 2 code for each of Asp and Glu. The interaction of this residue with a specific part of the insulin molecule, over which the mobile C-peptide is believed to drape itself (Weiss et al., 1990), probably dictates that it must be nonpolar. The more primitive protochordate species amphioxus, which possesses a hybrid insulin/IGF molecule, also contains an intron at C7, in this case interrupting another aliphatic residue (Ile) after an A (of ATC), whereas the IGFs themselves do not have introns at this position (Chan et al., 1990).
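The codon counts quoted for the C7 position can be verified by enumerating the standard genetic code. The following is a small illustrative sketch, assuming the standard (universal) code written in the conventional T, C, A, G order; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order for all three
# positions ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Tally the amino acids encoded by codons whose first nucleotide is G.
g_codons = {codon: aa for codon, aa in CODON_TABLE.items() if codon[0] == "G"}
g_counts = Counter(g_codons.values())

print(len(g_codons), dict(g_counts))
print("ATC encodes", CODON_TABLE["ATC"])
```

The tally gives 16 codons: four each for Val, Ala, and Gly, and two each for Asp and Glu, as stated above. The amphioxus codon ATC translates to Ile.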
ACKNOWLEDGMENTS
Support from the University of Otago Medical School is acknowledged. The Centre for Gene Research, University of Otago, provided the DNA sequencing facilities. We thank Dr. Shu Jin Chan and Professor D. F. Steiner (University of Chicago) for helpful advice.
REFERENCES
Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (1987). “Current Protocols in Molecular Biology.” Wiley, New York.
Benoit, R., Ling, N., and Esch, F. (1987). A new prosomatostatin-derived peptide reveals a pattern for prohormone cleavage at monobasic sites. Science 238, 1126–1129.
Berks, B. C., Marshall, C. J., Carne, A., Galloway, S. M., and Cutfield, J. F. (1989). Isolation and structural characterization of insulin and glucagon from the holocephalan species Callorhynchus milii (elephantfish). Biochem. J. 263, 261–266.
Chan, S. J., Cao, Q. P., and Steiner, D. F. (1990). Evolution of the insulin superfamily: cloning of a hybrid insulin/insulin-like growth factor cDNA from amphioxus. Proc. Natl. Acad. Sci. USA 87, 9319–9323.
Chan, S. J., Emdin, S. O., Kwok, S. C. M., Kramer, J. M., Falkmer, S., and Steiner, D. F. (1981). Messenger RNA sequence and primary structure of preproinsulin in a primitive vertebrate, the Atlantic hagfish. J. Biol. Chem. 256, 7595–7602.
Chance, R. E., Ellis, R. M., and Bromer, W. W. (1968). Porcine proinsulin: Characterization and amino acid sequence. Science 161, 165–167.
Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979). Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18, 5294–5299.
Conlon, J. M., Dafgard, E., Falkmer, S., and Thim, L. (1986). The primary structure of ratfish insulin reveals an unusual mode of proinsulin processing. FEBS Lett. 208, 445–450.
Conlon, J. M., Andrews, P. C., Falkmer, S., and Thim, L. (1988). Isolation and structural characterization of insulin from the holocephalan fish, Chimaera monstrosa (rabbit fish). Gen. Comp. Endocrinol. 72, 154–160.
Conlon, J. M., Goke, R., Andrews, P. C., and Thim, L. (1989). Multiple molecular forms of insulin and glucagon-like peptide from the Pacific ratfish (Hydrolagus colliei). Gen. Comp. Endocrinol. 73, 136–146.
Conlon, J. M., Cavanaugh, E. S., Mynarcik, D. C., and Whittaker, J. (1996). Characterization of an insulin from the three-toed amphiuma (Amphibia:Urodela) with an N-terminally extended A-chain and high receptor-binding affinity. Biochem. J. 313, 283–287.
Davidson, H. W., and Hutton, J. C. (1987). The insulin-secretory-granule carboxypeptidase H. Biochem. J. 245, 575–582.
Davidson, H. W., Rhodes, C. J., and Hutton, J. C. (1988). Intraorganellar calcium and pH control proinsulin cleavage in the pancreatic β cell via two distinct site-specific endopeptidases. Nature 333, 93–96.
Devi, L. (1991). In “Peptide Biosynthesis and Processing” (L. D. Fricker, Ed.), pp. 175–198. CRC Press, Boca Raton, FL/London.
Docherty, K., Rhodes, C. J., Taylor, N. A., Shennan, K. I., and Hutton, J. C. (1989). Proinsulin endopeptidase substrate specificities defined by site-directed mutagenesis of proinsulin. J. Biol. Chem. 264, 18335–18339.
Falkmer, S., El-Salhy, M., and Titlbach, M. (1984). In “Evolution and Tumour Pathology of the Neuroendocrine System” (S. Falkmer, R. Hakanson, and F. Sundler, Eds.), pp. 59–87. Elsevier, Amsterdam.
Gross, D. J., Villa-Komaroff, L., Kahn, C. R., Weir, G. C., and Halban, P. A. (1989). Deletion of a highly conserved tetrapeptide sequence of the proinsulin connecting peptide (C-peptide) inhibits proinsulin to insulin conversion by transfected pituitary corticotroph (AtT20) cells. J. Biol. Chem. 264, 21486–21490.
Hahn, V., Winkler, J., Rapoport, T. A., Liebscher, D. H., Coutelle, C., and Rosenthal, S. (1983). Carp preproinsulin cDNA sequence and evolution of insulin genes. Nucleic Acids Res. 11, 4541–4552.
Halban, P. A., and Irminger, J.-C. (1994). Sorting and processing of secretory proteins. Biochem. J. 299, 1–18.
Hobart, P. M., Shen, L.-P., Crawford, R., Pictet, R. L., and Rutter, W. J. (1980). Comparison of the nucleic acid sequence of anglerfish and mammalian insulin mRNA’s from cloned cDNA’s. Science 210, 1360–1363.
Hutton, J. C. (1994). Insulin secretory granule biogenesis and proinsulin-processing endopeptidases. Diabetologia 37 (Suppl. 2), S48–S56.
Kaufmann, J. E., Irminger, J.-C., and Halban, P. A. (1995). Sequence requirements for proinsulin processing at the B-chain/C-peptide junction. Biochem. J. 310, 869–874.
Ko, A. S. C., Smyth, D. G., Markussen, J., and Sundby, F. (1971). The amino acid sequence of the C-peptide of human proinsulin. Eur. J. Biochem. 20, 190–199.
Kwok, S. C. M., Chan, S. J., and Steiner, D. F. (1983). Cloning and nucleotide sequence analysis of the dog insulin gene. J. Biol. Chem. 258, 2357–2363.
Lewin, B. (1994). In “Genes V,” p. 914. Oxford Univ. Press, Oxford.
Lindberg, I., and Hutton, J. C. (1991). In “Peptide Biosynthesis and Processing” (L. D. Fricker, Ed.), pp. 141–174. CRC Press, Boca Raton, FL/London.
Lipkind, G., Gong, Q., and Steiner, D. F. (1995). Molecular modeling of the substrate specificity of prohormone convertases SPC2 and SPC3. J. Biol. Chem. 270, 13277–13284.
Long, J. A. (1995). “The Rise of Fishes: 500 Million Years of Evolution.” Johns Hopkins Press, London.
Nakayama, K., Watanabe, T., Nakagawa, T., Kim, W.-S., Nagahama, M., Hosaka, M., Hatsuzawa, K., Kondoh-Hashiba, K., and Murakami, K. (1992). Consensus sequence for precursor processing at mono-arginyl sites. J. Biol. Chem. 267, 16335–16340.
Oyer, P. E., Cho, S., Peterson, J. D., and Steiner, D. F. (1971). Studies on human proinsulin. Isolation and amino acid sequence of the human pancreatic C-peptide. J. Biol. Chem. 246, 1375–1386.
Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980). The evolution of genes: The chicken preproinsulin gene. Cell 20, 555–566.
Rholam, M., Brakch, N., Germain, D., Thomas, D. Y., Fahy, C., Boussetta, H., Boileau, G., and Cohen, P. (1995). Role of amino acid sequences flanking dibasic cleavage sites in precursor proteolytic processing. Eur. J. Biochem. 227, 707–714.
Schwartz, T. W. (1986). The processing of peptide precursors. Proline-directed arginyl cleavage and other monobasic processing mechanisms. FEBS Lett. 200, 1–10.
Shuldiner, A. R., Phillips, S., Roberts, C. T., Le Roith, D., and Roth, J. (1989). Xenopus laevis contains two nonallelic preproinsulin genes. J. Biol. Chem. 264, 9428–9432.
Siezen, R. J., Creemers, J. W. M., and Van de Ven, W. S. M. (1994). Homology modelling of the catalytic domain of human furin. A model for the eukaryotic subtilisin-like proprotein convertases. Eur. J. Biochem. 222, 255–266.
Steiner, D. F. (1981). In “Structural Studies on Molecules of Biological Interest” (G. Dodson, J. P. Glusker, and D. Sayre, Eds.), pp. 407–419. Oxford Univ. Press, Oxford.
Steiner, D. F., Chan, S. J., Welsh, J. M., and Kwok, S. C. M. (1985). Structure and evolution of the insulin gene. Annu. Rev. Genet. 19, 463–484.
Steiner, D. F., Smeekens, S. P., Ohagi, S., and Chan, S. J. (1992). The new enzymology of precursor processing endoproteases. J. Biol. Chem. 267, 23435–23438.
Weiss, M. A., Frank, B. H., Khait, I., Pekar, A., Heiney, R., Shoelson, S. E., and Neuringer, L. J. (1990). NMR and photo-CIDNP studies of human proinsulin and prohormone processing intermediates with application to endopeptidase recognition. Biochemistry 29, 8389–8401.