Amplification of human argininosuccinate synthetase pseudogenes


J. Mol. Biol. (1986) 192, 221-233

Amplification of Human Argininosuccinate Synthetase Pseudogenes

Hisayuki Nomiyama¹, Kenshi Obaru¹, Yoshihiro Jinno¹,², Ichiro Matsuda², Kazunori Shimada¹† and Takashi Miyata³

Departments of ¹Biochemistry and ²Pediatrics, Kumamoto University Medical School, Honjo, Kumamoto 860, Japan
³Department of Biology, Faculty of Science, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812, Japan

(Received 6 January 1986, and in revised form 10 June 1986)

The human genome contains multiple pseudogenes for an argininosuccinate synthetase (AS) gene. To elucidate the molecular mechanisms of generation and dispersion, complete nucleotide sequences of four different AS pseudogenes, ψAS-Y, ψAS-A1, ψAS-A2 and ψAS-A3, have been determined. A comparison of these sequences with those of three reported AS pseudogenes, ψAS-1, ψAS-3 and ψAS-7, revealed that two pairs, ψAS-Y/ψAS-7 and ψAS-A3/ψAS-1, are highly homologous but not identical, thereby suggesting that one member of each pair was generated by a duplication of the other member. The ψAS-Y, which is probably located on chromosome Y, and the partially sequenced ψAS-7 are both interrupted by an Alu element at exactly the same site in their 3'-end regions. These two Alu elements are located in an opposite orientation relative to the direction of transcription of the pseudogene, and their possible role in pseudogene dispersion was examined. The ψAS-A1 is also accompanied by an Alu element at its 3' end. In this case, the orientation of the Alu element is the same as that of the pseudogene. The ψAS-A1 and the Alu element are flanked with direct repeats, as if they had been inserted into a chromosomal site as a single unit.

1. Introduction

Many eukaryotic genomes contain dead relics of genes termed pseudogenes. They are classified into at least two different groups. One group completely lacks introns, has an oligo(A) tract at its 3' end and is often flanked with direct repeats. Based on these structural features, this group is assumed to be created by a mechanism involving a reverse transcription of mRNA followed by an integration of the resulting cDNA copy into a new chromosomal site, thereby generating a target site duplication. This group has been termed processed pseudogenes (for reviews, see Vanin, 1984; Jeffreys & Harris, 1984). The other group shows all the major structural features of functional genes, such as promoters, exons and introns, and is supposed to be created by DNA-mediated duplication of chromosomal segments, including functional genes (for a review, see Jeffreys & Harris, 1984).

Argininosuccinate synthetase (L-citrulline : L-aspartate ligase; EC 6.3.4.5) is one of the urea cycle enzymes. Several cDNAs for the AS‡ gene have been cloned (Su et al., 1981; Bock et al., 1983) and their nucleotide sequences have been reported (Bock et al., 1983). Human ASase is encoded by a single functional gene located on chromosome 9 (Beaudet et al., 1982). This gene is 63 kb in length and contains at least 13 exons (Freytag et al., 1984b). Other than this functional gene, the human genome contains 14 AS pseudogenes that are dispersed along 11 different chromosomes, including chromosomes X and Y (Su et al., 1984).

† Author to whom all correspondence should be addressed.
‡ Abbreviations used: AS gene, the gene for argininosuccinate synthetase; ASase, the enzyme L-citrulline : L-aspartate ligase; kb, 10³ base-pairs; bp, base-pair(s); SSC is 0.15 M-NaCl, 0.015 M-trisodium citrate, pH 7.4.

© 1986 Academic Press Inc. (London) Ltd.

In the course of screening a human genomic library for the functional AS gene, we obtained many phage clones carrying AS cDNA-like sequences (Jinno et al., 1984). Restriction mapping and Southern blot analyses of these clones suggested that some carry AS pseudogenes. In this work, we have analyzed the nucleotide sequences of four different AS pseudogenes, for the purpose of elucidating the molecular mechanisms of pseudogene generation and dispersion. A comparison of these sequences with those of the three AS pseudogenes reported by Freytag et al. (1984a) suggested that two are created by a duplication of the other two. Moreover, one of the four has a novel structure consisting of an AS pseudogene accompanied by an Alu element at its 3' end. They are surrounded by direct repeats, as if the combined unit had been inserted into a chromosomal site as a single unit.
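The direct repeats mentioned above (the target site duplication left when a cDNA copy integrates) can be illustrated with a small sketch. It is not the procedure used in the paper: the function name `flanking_direct_repeat`, the coordinates and the toy sequence are all assumptions made for illustration. It simply looks for the longest string that ends immediately before an insert and reappears immediately after it.

```python
def flanking_direct_repeat(seq, start, end, max_len=20):
    """Return the longest direct repeat that flanks seq[start:end].

    A flanking direct repeat is a string that ends immediately before
    `start` and also begins exactly at `end`. An empty string is
    returned when no such repeat exists. `max_len` caps the search.
    """
    best = ""
    # The repeat cannot be longer than the sequence available on either side.
    limit = min(max_len, start, len(seq) - end)
    for k in range(1, limit + 1):
        left = seq[start - k:start]   # k bases just upstream of the insert
        right = seq[end:end + k]      # k bases just downstream of the insert
        if left == right:
            best = left               # keep the longest match seen so far
    return best


# Toy example (hypothetical sequence): a 16-base "insert" flanked by ATTGCA.
toy = "GGCC" + "ATTGCA" + "CCCCGGGGTTTTAAAA" + "ATTGCA" + "TTAA"
repeat = flanking_direct_repeat(toy, start=10, end=26)
print(repeat, len(repeat))
```

Exact matching is a simplification; repeats that have diverged by point mutations since integration would need a mismatch-tolerant comparison.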

2. Materials and Methods

(a) Enzymes and chemicals

Enzymes and chemicals were obtained from the following sources: restriction enzymes from Takara Shuzo (Kyoto, Japan) and New England Biolabs; bacteriophage T4 polynucleotide kinase and DNA ligase from Takara Shuzo; Escherichia coli DNA polymerase I from New England Biolabs; SP6 RNA polymerase and RNasin ribonuclease inhibitor from Promega Biotec; calf intestinal alkaline phosphatase from Boehringer-Mannheim; RNase-free DNase I (DPRF) from Cooper Biomedical. [α-32P]dCTP (3000 Ci/mmol), [γ-32P]ATP (5000 Ci/mmol), [α-32P]GTP (409 Ci/mmol) and [α-32P]CTP (409 Ci/mmol) were purchased from Amersham.

(b) Preparation of DNAs containing AS cDNA or AS cDNA-like sequences

The AS cDNA was prepared from pAS1 (Su et al., 1981), which was obtained from Dr T. Saheki of Kagoshima University, Japan. The isolation of phage clones carrying AS cDNA-like sequences has been described (Jinno et al., 1984). The clones were isolated from a human gene library constructed in our laboratory on EcoRI partial digests of male placental DNA (Tsuzuki et al., 1983). Physical maps of some of these clones have been described (Jinno et al., 1984). The presence of the AS cDNA-like sequences in these clones was detected using a nick-translated AS cDNA as a probe. For further structural studies, restriction fragments were isolated from the recombinant phage DNAs and were subcloned into plasmid pBR322.

(c) Propagation of E. coli cells

Propagation of E. coli cells carrying recombinant phages or plasmids was carried out in accordance with the guidelines for recombinant DNA research issued by the Ministry of Education, Science and Culture of Japan.

(d) DNA hybridization and sequencing

Nick translation, Southern blotting and hybridizations were performed as described (Jinno et al., 1985b).

Nucleotide sequences were determined by the method of Maxam & Gilbert (1980). Entry, editing and analyses of the sequence data were carried out with the GENAR, GRASE and GENIAS programs purchased from Mitsui Knowledge Industry (Tokyo, Japan), using a personal computer, NEC PC-9801E.

(e) RNA dot-blot hybridization

Total RNAs were extracted by the guanidine thiocyanate method (Chirgwin et al., 1979) from human liver and the following cultured human cell lines: hepatoblastoma HUH-6 (Doi, 1976), hepatoma HUH-7 (Nakabayashi et al., 1982) and HeLa cells. HUH-6 and HUH-7 were kindly provided by Dr J. Sato of Okayama University, Japan. RNA samples were suspended and incubated at 37°C for 10 min in 50 µl of 40 mM-Tris·HCl (pH 7.5), 6 mM-MgCl2 containing RNase-free DNase I (20 µg/ml) and RNasin (1090 units/ml). RNAs were then denatured and applied to nitrocellulose filters, as described by White & Bancroft (1982), using a Minifold apparatus (Schleicher & Schuell). The filters were then baked in vacuo at 80°C for 2 h and prehybridized at 60°C for 4 h in 50% (v/v) formamide, 50 mM-sodium phosphate (pH 6.5), 5 x SSC, 2.5 x Denhardt's solution (0.5 g of polyvinylpyrrolidone, 0.5 g of bovine serum albumin and 0.5 g of Ficoll 400 per litre), 0.01% (w/v) sodium dodecyl sulfate, 500 µg of yeast RNA/ml and 250 µg of E. coli DNA/ml. Hybridization was carried out in the same buffer with the addition of an appropriate probe at 60°C for 12 h. Filters were washed 3 times at 65°C for 20 min in 0.1 x SSC, 0.1% (w/v) sodium dodecyl sulfate, and autoradiographed at -80°C with a DuPont Lightning-Plus intensifying screen.

RNA probes for the hybridization were synthesized from the following plasmids as templates. The AS cDNA extracted from pAS1 was digested with HindIII and EcoRI, and a 289 bp fragment (nucleotides 233 to 521; Bock et al., 1983) was cloned into the HindIII-EcoRI sites of pSP64 and pSP65 (Melton et al., 1984). The resulting plasmids were linearized by digestion with EcoRI or HindIII, and were used for the synthesis of 32P-labeled and/or unlabeled sense and anti-sense strand RNAs. The amounts of unlabeled sense and anti-sense AS RNAs synthesized per reaction were estimated from parallel reactions using [α-32P]CTP. The syntheses of RNAs by SP6 RNA polymerase were performed according to Melton et al. (1984). The bona fide transcripts hybridize with the anti-sense RNA probe. The synthesized unlabeled anti-sense AS RNA was used to estimate the sensitivity of the RNA dot-blot hybridization analysis for detecting possible anti-sense transcripts derived from some of the processed type AS pseudogenes.
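The relationship between sense and anti-sense probes can be sketched in a few lines. This is only an illustration of strand complementarity, not a model of hybridization kinetics or of the SP6 system itself. The helper names (`dna_to_rna`, `antisense_rna`, `can_hybridize`) are hypothetical, and the input is just the short stretch ATG TCC AGC AAA GGC TCC from the Figure 3 data, used as a toy mRNA.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def dna_to_rna(dna):
    """Write a coding-strand DNA sequence as the equivalent RNA (T -> U)."""
    return dna.upper().replace("T", "U")


def antisense_rna(sense_rna):
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sense_rna))


def can_hybridize(probe, target):
    """True if the probe is perfectly complementary to a stretch of target.

    Perfect complementarity is a deliberate simplification: real probes
    tolerate mismatches depending on the stringency of the wash.
    """
    return antisense_rna(probe) in target


mrna = dna_to_rna("ATGTCCAGCAAAGGCTCC")   # toy "bona fide" transcript
sense_probe = mrna[3:15]                  # same strand as the mRNA
anti_probe = antisense_rna(sense_probe)   # complementary strand

print(can_hybridize(anti_probe, mrna))    # anti-sense probe pairs with mRNA
print(can_hybridize(sense_probe, mrna))   # sense probe does not
```

By the same logic, a sense probe would detect only transcripts made from the opposite strand, which is why both probe orientations are prepared.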

3. Results

(a) Isolation and characterization of AS cDNA-like sequences

We isolated 12 phage clones from a human genomic library, using an AS cDNA derived from pAS1 (Su et al., 1981) as a probe. To determine the chromosomal assignment of each AS cDNA-like sequence carried by these phage clones, EcoRI digests of human male and female genomic DNAs and those of DNAs derived from the phage clones were co-electrophoresed in an agarose gel, Southern blotted and hybridized with the AS cDNA probe. From the result shown in Figure 1 and the reported data of the chromosomal assignments of EcoRI fragments containing the AS cDNA-like sequences (Su et al., 1984), Lm AS-8 is assumed to carry several EcoRI fragments derived from the functional AS gene. We confirmed that this clone actually contains the 5'-end region of the functional AS gene (Jinno et al., 1985a,b). Lm AS-2 carries a 4.2 kb long EcoRI fragment that is assigned to cover an AS pseudogene present on chromosome Y (Fig. 1; and see Su et al., 1984). Furthermore, the restriction map of the Lm AS-2 (Fig. 2) is consistent with the observation that BamHI and KpnI digests of a human male genomic DNA produce male-specific AS cDNA-like fragments of 2.45 kb and 1.8 kb in length, respectively (Daiger et al., 1982). Therefore, we assumed that the AS cDNA-like sequence carried by Lm AS-2 is an AS pseudogene derived from chromosome Y. Comparison of the restriction fragments derived from the rest of the phage clones with those of the human male and female genomic DNAs suggested that all these, except Lm AS-6, contain pseudogenes (Fig. 1). Restriction maps of the Lm AS-1, Lm AS-3 and Lm AS-11 DNAs indicate that all contain AS pseudogenes (Jinno et al., 1984; Fig. 2).

Figure 1. Southern blot analysis of EcoRI digests of human genomic DNAs and that of phage cloned human DNAs. EcoRI digests of male and female genomic DNAs (10 µg) were co-electrophoresed on a 0.8% (w/v) agarose gel with those of the phage cloned DNAs (about 10 ng) carrying AS cDNA-like sequences, Southern blotted and hybridized with the AS cDNA probe. F, female; M, male. Lane a, Lm AS-1; lane b, Lm AS-11; lane c, Lm AS-7; lane d, Lm AS-12; lane e, Lm AS-3; lane f, Lm AS-4; lane g, Lm AS-13; lane h, Lm AS-6; lane i, Lm AS-14; lane j, Lm AS-2; lane k, Lm AS-5; lane l, Lm AS-8. A male-specific EcoRI fragment is indicated by the arrow at the right. Sizes of the EcoRI-digested DNA fragments derived from the functional AS gene are specified (in kb) at the left (Su et al., 1984).

To study their structures in more detail, restriction fragments carrying AS cDNA-like sequences were subcloned into pBR322. Locations of the AS pseudogenes in subcloned fragments were determined by Southern blot analysis and by comparisons of their restriction maps with that of the AS cDNA (Bock et al., 1983; Fig. 2). Sequencing strategies of these pseudogenes are shown in Figure 2, and Figure 3 summarizes all the obtained sequences. The determined nucleotide sequences showed that all these AS cDNA-like sequences possess features of processed pseudogenes; they lack intervening sequences and have oligo(A) tracts at their 3' ends. The AS pseudogenes carried by clones Lm AS-2, Lm AS-3, Lm AS-1 and Lm AS-11 were designated ψAS-Y, ψAS-A1, ψAS-A2 and ψAS-A3, respectively, and structural features of each pseudogene are described below.
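The comparisons above rest on the sizes of EcoRI fragments, so a minimal sketch of an in-silico complete digest may help. EcoRI recognizes GAATTC and cuts after the first G. The function name `digest_fragments` and the toy sequence are assumptions for illustration, and the sketch treats the molecule as linear and the digest as complete (the library itself was built from partial digests).

```python
def digest_fragments(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths (in bp) from a complete digest of a linear DNA.

    `site` is the recognition sequence and `cut_offset` the position of the
    cut within it on the top strand (EcoRI: G^AATTC, so offset 1).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)     # coordinate of the cut
        pos = seq.find(site, pos + 1)     # continue searching downstream
    bounds = [0] + cuts + [len(seq)]
    # Fragment lengths are the distances between consecutive boundaries.
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy molecule (hypothetical): two EcoRI sites give three fragments.
toy_dna = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
print(digest_fragments(toy_dna))
```

Comparing such predicted sizes with the bands seen in genomic and cloned DNA lanes is the kind of reasoning used to match a clone to a genomic EcoRI fragment.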

Figure 2. Restriction maps of the phage cloned DNAs carrying AS cDNA-like sequences. DNA regions that hybridized with the AS cDNA are indicated by hatched bars. Filled and open bars in the enlarged maps indicate the pseudogene and Alu sequences determined by nucleotide sequence analyses. The restriction map of the AS cDNA (Bock et al., 1983) is shown at the bottom. Arrows beneath the enlarged maps represent the sequencing strategy, and the length and direction of the sequencing. The end-labeled sites are marked with filled circles. Restriction sites: AatII (At), AvaI (A), BamHI (B), BglII (Bg), EcoRI (E), HindIII (H), KpnI (K), MstII (M), PstI (Pt), PvuII (P), SacI (Sa), StuI (S), XbaI (X).

Comparison of the nucleotide sequence of ψAS-Y with that of the AS cDNA (Bock et al., 1983) revealed that the 3'-end region of this pseudogene is interrupted by an Alu element, which is inserted in an opposite orientation relative to the direction of transcription of the pseudogene (Fig. 3). The ψAS-Y is flanked with 15 bp direct repeats, and its nucleotide sequence shows 87% homology to that of the AS cDNA. The accompanying Alu element is 80% homologous to the Alu consensus sequence (Jelinek & Schmid, 1982), and is flanked with 9 bp direct repeats, which were probably generated at the time when the Alu element was integrated into the pseudogene.

We found that ψAS-Y and ψAS-7 (Freytag et al., 1984a) share the following distinct structural features. First, ψAS-7 is interrupted by an Alu element at exactly the same site as that of the ψAS-Y (Fig. 3). Second, DNA sequence homology between ψAS-7 and the AS cDNA is 89% and that between ψAS-7 and ψAS-Y is much the same. Third, both pseudogenes have many common substitutions (Fig. 3(a)). Finally, the two Alu elements associated with these pseudogenes also have the same extent of homologies and common base-substitutions, when the sequences are compared with that of an Alu consensus sequence (Jelinek & Schmid, 1982; Fig. 3(b)). As a detailed restriction map of the ψAS-7 has not been reported, we cannot directly compare the map of ψAS-Y with that of ψAS-7. Freytag et al. (1984a) reported that ψAS-7 extends over two EcoRI fragments of the human genomic DNA, while ψAS-Y is present on a single EcoRI fragment of 4 kb in length (Fig. 2). Accordingly, we assume that ψAS-Y differs from ψAS-7.

The resemblance of the DNA structures of ψAS-Y and ψAS-7 suggests that one was created by a DNA or RNA-mediated duplication of the other (see Discussion). The presence of the Alu elements suggests the latter possibility. As Alu elements contain an RNA polymerase III promoter, transcripts from this promoter could extend into the AS pseudogene, thereby synthesizing anti-sense RNAs of the AS pseudogene. Some cDNA copies of these anti-sense RNAs could insert into new chromosomal sites. To investigate this possibility, total RNAs extracted from human liver and several cultured cell lines were dot-hybridized with sense or anti-sense RNAs synthesized from two plasmid clones, both containing an SP6 promoter and a part of the AS cDNA. As shown in Figure 4, the anti-sense RNA probe, which hybridizes with the AS mRNA, gave positive hybridization signals, though the sense RNA probe gave no such signal. The meaning of the strong positive signals with anti-

Figure 3. Nucleotide sequences of the AS pseudogenes, ψAS-Y, ψAS-A1, ψAS-A2 and ψAS-A3. (a) Comparison of the sequences with that of the AS cDNA. Sequences of the ψAS-1, ψAS-3 and ψAS-7 (Freytag et al., 1984a) are included in the Figure. All these AS pseudogene sequences are compared with that of the AS cDNA (top line). Since the full-length cDNA has not been cloned, 5'-end regions of the pseudogenes are compared with that of the functional AS gene (Jinno et al., 1985a; Freytag et al., 1984b). The bases that are identical with those of the AS cDNA or genomic DNA are shown by dashes. Additions of 1 or several bases to align sequences for maximum homology are indicated by asterisks. The sites where Alu elements are present are indicated, and their nucleotide sequences are separately shown in (b). Common base substitutions in each pair ψAS-Y/ψAS-7 and ψAS-A3/ψAS-1 are boxed; those base substitutions that are common in at least 5 different pseudogenes are indicated by arrowheads. Direct repeats flanking the pseudogenes and the Alu elements are underlined. The putative transcription initiation site that is deduced from the comparison of the sequences between the functional AS gene and those of pseudogenes (Jinno et al., 1985b) is indicated by a downward arrow. (b) Nucleotide sequences of Alu elements that are found in or near the AS pseudogenes. The bases that are identical with the Alu consensus sequence (top line; Deininger et al., 1981) are shown by dashes. Dots in the Alu sequence of the ψAS-7 indicate the region that has not been sequenced. The complementary and reverse sequences are shown for the Alu elements of ψAS-Y and ψAS-7. Sequence hyphens have been omitted for clarity.

[Figure 3 sequence panels. (a) Alignment of the AS cDNA (and, for the 5'-end region, the functional AS gene) with ψAS-Y, ψAS-7, ψAS-A1, ψAS-A2, ψAS-A3, ψAS-1 and ψAS-3, together with their flanking sequences. (b) Alignment of the Alu consensus sequence with the Alu elements of ψAS-Y, ψAS-7, ψAS-A1 and ψAS-A3.]

*AS-AZ *AS-A3

as-3

LAS-1

W+A3

*AS-AZ

MS-Al

*AS-AZ *AS-A3 Ids-1 w4S-3

*AS-Y *AS-Al

sense RNA probe in the RNAs from HeLa cells is not clear; however, we did confirm by Northern blotting that these hybridizable RNAs are of the AS mRNA size (data not shown).

(c) ψAS-A1

A striking feature of the ψAS-A1 is that this pseudogene is immediately followed by an Alu element; that is, the pseudogene and the Alu element are directly juxtaposed in a head to tail fashion (Fig. 3(a) and (b)). Apparently, 15 bp direct repeats are located immediately 5' to the 5' end of the pseudogene and immediately 3' to the oligo(A) tract of the Alu element. Homology of the ψAS-A1 to the AS cDNA (Bock et al., 1983) is 91% (Fig. 3(a)), and that of the Alu element to the Alu consensus sequence (Jelinek & Schmid, 1982) is 88% (Fig. 3(b)).

(d) ψAS-A2

The ψAS-A2 and AS cDNA share 94% DNA sequence homology, and the ψAS-A2 has structural features of a typical processed pseudogene. However, we found no direct repeat in the vicinity of the pseudogene (Fig. 3(a)), suggesting that a cDNA copy of the AS mRNA was ligated to blunt ends of a chromosomal break.

(e) ψAS-A3

The ψAS-A3 is flanked with direct repeats of 14 bp in length and shares 93% homology with the AS cDNA (Fig. 3(a)). A comparison of the nucleotide sequence of ψAS-A3 with that of the ψAS-1 (Freytag et al., 1984a) revealed that these pseudogenes share several common structural features. They have almost the same extent of sequence homologies to the AS cDNA (ψAS-1 is 93% homologous to the AS cDNA), and have many common base substitutions (Fig. 3(a)). Moreover, the restriction maps of these pseudogenes and those of their 5'- and 3'-flanking regions, spanning more than 5 kb, are similar (Fig. 5). In particular, the 3'-flanking regions of these two pseudogenes have extensive sequence homologies (Fig. 3(a)). However, as shown in Figure 3(a), ψAS-1 is apparently an incomplete pseudogene. It lacks a block of DNA sequence corresponding to the 5'-end region of the AS cDNA. This difference may have been caused by an insertion (or a deletion) of a small DNA segment in the 5'-end region of the ψAS-1. In the 3'-flanking region of the ψAS-A3 there is an Alu element that is not surrounded by direct repeats (Fig. 3(a)). Since the corresponding region of the ψAS-1 has yet to be sequenced, it is not known whether this region contains an Alu element.

Figure 4. RNA dot-blot analysis of the total RNAs from human liver and from cultured human hepatoblastoma HuH-6, hepatoma HuH-7 and HeLa cells. Total RNAs from these sources, as well as unlabeled sense and anti-sense AS RNAs synthesized in vitro using the SP6 system, were applied in serial dilutions to nitrocellulose filters. As another control, we applied RNA samples treated with RNase A. These control samples were prepared as follows: prior to the denaturation step, 4 µg RNA samples were incubated at 37°C for 30 min in 10 mM-Tris·HCl (pH 7.0), 1 mM-EDTA containing 1 µg of RNase A. The filters were then hybridized with the anti-sense RNA probe (6 x 10^5 cts/min per ml) or sense RNA probe (6 x 10^6 cts/min per ml). Preparation of RNA probes, conditions for hybridization and washing were as described in Materials and Methods. Under this experimental condition, we confirmed that at least 0.5 ng of the unlabeled anti-sense AS RNA synthesized in vitro is clearly detectable. When we assume that most of the anti-sense AS RNAs synthesized in vitro cover the entire HindIII-EcoRI AS cDNA fragment, i.e. 289 bp in length, 0.5 ng corresponds to 3 fmol, and when we assume that most of the anti-sense AS RNAs present in human cells cover the entire AS pseudogene, i.e. about 1.6 kb in length, 3 fmol of the entire anti-sense AS RNA corresponds to 2.8 ng. This value indicates that the sensitivity of the analysis is around 0.07% of the total RNA.
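The sensitivity estimate given in the legend to Figure 4 can be checked with a short calculation. The sketch below is an illustration only, not part of the original analysis. It assumes that, at a fixed molar amount, mass scales linearly with transcript length, and that the reference load is the 4 µg of total RNA mentioned for the RNase controls. The function and variable names are hypothetical.

```python
def equivalent_mass_ng(mass_ng, from_length, to_length):
    """Mass of a molecule of a different length carrying the same number of moles.

    At a fixed molar amount, mass is proportional to length, so the
    per-nucleotide mass cancels out of the ratio.
    """
    return mass_ng * to_length / from_length


# Values quoted in the Figure 4 legend.
detection_limit_ng = 0.5      # anti-sense AS RNA synthesized in vitro
probe_length = 289            # HindIII-EcoRI AS cDNA fragment
pseudogene_length = 1600      # "about 1.6 kb"

# Assumption: 4 ug of total RNA is the amount applied per dot.
total_rna_ng = 4000.0

full_length_ng = equivalent_mass_ng(detection_limit_ng, probe_length, pseudogene_length)
sensitivity_percent = 100.0 * full_length_ng / total_rna_ng

print(f"{full_length_ng:.1f} ng of full-length anti-sense RNA")   # 2.8 ng
print(f"{sensitivity_percent:.2f}% of the total RNA")             # 0.07%
```

Under these assumptions the result matches the 2.8 ng stated in the legend and a sensitivity of roughly 0.07% of the total RNA.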

4. Discussion

We determined nucleotide sequences of four different AS pseudogenes, ψAS-Y, ψAS-A1, ψAS-A2 and ψAS-A3 (Fig. 3). One of them, ψAS-Y, is assumed to be located on chromosome Y (Fig. 1). These four pseudogenes differ from each other and from the three AS pseudogenes studied by Freytag et al. (1984a). Therefore, seven out of the 14 AS pseudogenes, present on 11 different chromosomes, have been characterized. Representations of their structures are shown in Figure 6. A comparison of their nucleotide sequences suggests that two of the pseudogenes were created by a duplication of two other, pre-existing pseudogenes.

(a) Amplification of pseudogene

Freytag et al. (1984a) proposed two mechanisms for the generation of multiple AS pseudogenes. One

Figure 5. Comparison of the restriction maps of ψAS-A3 and ψAS-1. Thick bars indicate pseudogene sequences determined by nucleotide sequence analysis. A triangle indicates the site where an insertion (or a deletion) of a small DNA segment has been predicted. Abbreviations of enzyme sites are the same as for Fig. 1. The restriction map of ψAS-1 was taken from Freytag et al. (1984a).

is the "independent origin" hypothesis, which implies that each AS pseudogene was created by an independent event involving a reverse transcription. The second is the "common intermediate" hypothesis, that more than one pseudogene was created from a single reverse transcript. The finding that two pairs of AS pseudogenes, ψAS-Y/ψAS-7 and ψAS-A3/ψAS-1, share several common structural features (Figs 3 and 5) is consistent with the common intermediate hypothesis. In the ψAS-A3/ψAS-1 pair, we noted several common restriction sites, even in their 5'- and 3'-flanking regions, thereby suggesting that one of the pair was created by a duplication of a large DNA segment, including a pseudogene region. From the nucleotide sequence analyses, this duplication was calculated to have occurred at

Figure 6. Representations of the structures of 7 AS pseudogenes. Open and black boxes indicate AS pseudogenes and Alu elements, respectively. Hatched regions represent oligo(A) tracts at the 3' ends of the pseudogenes or those of Alu elements. Filled arrows indicate the orientation of Alu elements; open arrows, the direct repeats. Broken lines represent DNA regions for which nucleotide sequence data are not available, and broken arrows indicate the predicted direct repeats.

around 9 million years ago (see Fig. 7 and Discussion, section (c) for the calculation). Accordingly, they are probably not alleles of the same pseudogene locus. It is possible that the ψAS-1 described by Freytag et al. (1984a) may have undergone a recombinational event, perhaps during cloning or growth of the cloned phage, the result being a juxtaposition of a piece of sequence at the 5' end, which actually does not belong to this pseudogene.

The ψAS-Y/ψAS-7 pair may also be the result of a similar DNA duplication. Su et al. (1984) reported that two of the AS pseudogenes are present on the X chromosome. If ψAS-7 is one of the pseudogenes on this chromosome, then ψAS-Y may have arisen from ψAS-7 or vice versa, because it has been demonstrated that crossing-over occurs at a relatively high frequency between the X and Y chromosomes (Cooke et al., 1985; Simmler et al., 1985; Buckle et al., 1985). However, the presence of an Alu element at exactly the same 3'-end site of this pair suggests another possibility. The Alu elements are believed to be dispersed along human genomes by RNA polymerase III transcripts, followed by self-primed cDNA syntheses and subsequent integration of the cDNAs into chromosomal breaks (Jagadeeswaran et al., 1981; Sharp, 1983). Moreover, it has been reported that RNA polymerase III synthesizes long RNA transcripts in vitro from an Alu element present in the 3'-flanking region of the β-globin gene (Manley & Colozzo, 1982). Therefore, it is possible that one of the two pseudogenes may have been created from a long RNA transcript synthesized from an RNA polymerase III promoter present at the 5' end of the Alu element, and extending into the AS pseudogene sequence, thereby synthesizing an anti-sense RNA of the AS pseudogene, followed by an insertion of a cDNA copy into a new chromosomal site. If such an anti-sense RNA is an intermediate, then ψAS-Y must be the template for the transcription, because it contains both the 3' end of the pseudogene and an oligo(A) tract. These two components of the pseudogene are separated by the Alu element from the rest of the pseudogene, and transcripts initiated from the Alu promoter are not able to cover both the 3' end of the AS pseudogene and its oligo(A) tract in a single RNA molecule. Therefore, we speculate that ψAS-7 may be a duplicate copy of ψAS-Y. We have not detected any anti-sense AS RNAs in several human cell lines. However, it is possible that the Alu element was actively transcribed at the time when the ψAS-7 was created, or is transcribed in germ-line cells. Detailed characterization of the structure of ψAS-7 should elucidate the relationship between these two pseudogenes.


(b) Features of ψAS-A1

ψAS-A1 is followed immediately at its 3' end by an Alu element of the same orientation, and direct repeats surround both of them, as if one cDNA copy of an AS mRNA and that of an Alu transcript had been integrated into a chromosomal site as a single unit. Alternatively, it may have been created by an insertion of an Alu element into an oligo(A) tract of a pre-existing AS pseudogene. Another example of such a combined unit of two differently categorized DNA sequences that are assumed to be transposed by RNA intermediates is the case of an Alu element and a member of the KpnI family, one family of the human middle repetitive sequences (Miyake et al., 1983). In this case, an Alu element is present at the 5' end of a KpnI family member.

(c) Evolutionary history of the AS pseudogenes

Comparison of the nucleotide sequences of the AS cDNA and seven different AS pseudogenes revealed the presence of 17 common base substitutions in at least five different pseudogenes (Fig. 3(a)). Ten of the 13 common base substitutions in the coding region are located at the third bases of the codons, and none of these base substitutions generates a termination codon. Furthermore, nine of these ten base substitutions do not lead to amino acid changes. As noted by Freytag et al. (1984a), these common base substitutions may reflect the nucleotide sequences of the functional AS gene at the time when these pseudogenes were created. Therefore, analyses of more AS pseudogenes may elucidate ancestral sequences of the AS gene at various evolutionary stages.

To determine the evolutionary relationships of the functional AS gene and pseudogenes in detail, a phylogenetic tree was constructed on the basis of their nucleotide sequences.
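The tally of common substitutions by codon position, and the check on whether each one changes the encoded amino acid or creates a termination codon, can be sketched as follows. This is a minimal illustration, not the procedure used in the paper. It assumes two gap-free, in-frame, equal-length DNA strings and the standard genetic code, and it evaluates each differing base in isolation against the reference codon. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(ref, alt):
    """List single-base differences between two in-frame coding sequences.

    Each difference is reported with its codon position (1, 2 or 3),
    whether it is silent, and whether it turns the codon into a stop.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    results = []
    for start in range(0, len(ref) - len(ref) % 3, 3):
        ref_codon, alt_codon = ref[start:start + 3], alt[start:start + 3]
        for pos in range(3):
            if ref_codon[pos] == alt_codon[pos]:
                continue
            # Apply only this one change to the reference codon.
            mutated = ref_codon[:pos] + alt_codon[pos] + ref_codon[pos + 1:]
            results.append({
                "site": start + pos,
                "codon_position": pos + 1,
                "silent": CODON_TABLE[ref_codon] == CODON_TABLE[mutated],
                "creates_stop": CODON_TABLE[mutated] == "*",
            })
    return results


# Hypothetical three-codon example: two third-position changes and one
# second-position change.
for hit in classify_substitutions("AAGGGCCAG", "AAAGGTCGG"):
    print(hit)
```

In this toy example the two third-position changes (AAG to AAA, GGC to GGT) are silent, whereas the second-position change (CAG to CGG) replaces glutamine with arginine.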
In comparisons of functional genes, the rate of change of nucleotide sequences differs considerably between amino-acid-changing (or replacement) positions and silent positions of protein-coding regions (Miyata et al., 1980); the former evolve slowly, whereas the latter evolve rapidly. In contrast, pseudogenes are thought to accumulate mutational changes at a rapid and even rate between different positions (Miyata & Hayashida, 1981). Thus, the silent positions of the functional AS gene (Bock et al., 1983) and the corresponding positions of the

Figure 7. Evolutionary relationships of the AS pseudogenes. Thick line, functional gene lineage; (+) possible node by gene duplication. For details of the methods for constructing the phylogenetic tree and for estimating divergence times, see the text. (Scale ticks in the original figure: 80 to 0 in steps of 10.)

pseudogenes were compared by the method described by Miyata & Yasunaga (1980). On the basis of the difference matrix with correction for multiple substitutions (Kimura, 1983), a phylogenetic tree representing evolutionary relationships among the eight AS-related genes, seven AS pseudogenes and a single functional AS gene, was inferred by the simple clustering method (Sokal & Sneath, 1963). To establish the approximate times of divergence of branching nodes, the rate of pseudogene evolution was determined from the comparisons of primate η-globin pseudogenes (Koop et al., 1986); assuming 40 million years ago for the separation of the Old World monkeys and New World monkeys, the evolutionary rate was estimated to be 1.63 x 10^-9/site per year.

Figure 7 shows a phylogenetic tree of the functional AS gene and seven AS pseudogenes. This tree suggests that the multiple AS pseudogenes have been generated by at least two distinct mechanisms. As discussed above, two pairs of the pseudogenes, ψAS-A3/ψAS-1 and ψAS-Y/ψAS-7, have diverged by gene duplication, while the common ancestors of the pairs as well as the other pseudogenes are likely to have derived from RNA intermediates.

It has been reported that primate genes evolved at a non-constant rate, reducing their rates considerably along lineages from prosimians to hominoids (Koop et al., 1986). Thus, the present estimates of the divergence times are highly tentative. The divergence times may be estimated correctly for nodes corresponding to times around 40 million years, at which the molecular clock was calibrated, but may be underestimated for recent nodes and overestimated for remote nodes.

We thank Dr T. Saheki of Kagoshima University for providing pAS1, Dr J. Sato of Okayama University for HuH-6 and HuH-7 cell lines, and M. Ohara of Kyushu University for reading the manuscript. This work was supported by grants from the Ministry of Education, Science and Culture of Japan.
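The step from a corrected sequence difference to a divergence time, as used in section (c), can be illustrated with a short sketch. Several assumptions are made here. The Jukes-Cantor one-parameter formula stands in for the correction for multiple substitutions; the paper cites Kimura (1983) without stating the exact formula. The rate is read as 1.63 x 10^-9 per site per year. Both lineages of a duplicated pair are taken to evolve at that pseudogene rate, so that K = 2rT. The observed difference of 2.9% is a hypothetical input, not a value from the paper.

```python
import math

# Assumed reading of the calibrated pseudogene rate (per site per year).
PSEUDOGENE_RATE = 1.63e-9


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Jukes-Cantor model: K = -(3/4) * ln(1 - 4p/3), valid for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_years(distance, rate=PSEUDOGENE_RATE):
    """Time since two lineages split, assuming both evolve at `rate`."""
    return distance / (2.0 * rate)


# Hypothetical observed difference between two duplicated pseudogenes.
observed = 0.029
corrected = jukes_cantor_distance(observed)
years = divergence_time_years(corrected)

print(f"corrected distance: {corrected:.4f} substitutions per site")
print(f"divergence time: {years / 1e6:.1f} million years")
```

With these inputs the estimate is close to 9 million years, the order of magnitude quoted for the ψAS-A3/ψAS-1 duplication. As the text notes, such estimates depend strongly on the assumed constancy of the rate.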

References

Beaudet, A. L., Su, T.-S., O'Brien, W. E., D'Eustachio, P., Barker, P. E. & Ruddle, F. H. (1982). Cell, 30, 285-293.
Bock, H. G., Su, T.-S., O'Brien, W. E. & Beaudet, A. L. (1983). Nucl. Acids Res. 11, 6505-6512.
Buckle, V., Mondello, C., Darling, S., Craig, I. W. & Goodfellow, P. N. (1985). Nature (London), 317, 739-741.
Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979). Biochemistry, 18, 5294-5299.
Cooke, H. J., Brown, W. R. A. & Rappold, G. A. (1985). Nature (London), 317, 687-692.
Daiger, S. P., Wildin, R. S. & Su, T.-S. (1982). Nature (London), 298, 682-684.
Deininger, P. L., Jolly, D. J., Rubin, C. M., Friedmann, T. & Schmid, C. W. (1981). J. Mol. Biol. 151, 17-33.
Doi, I. (1976). Gann, 67, 1-10.
Freytag, S. O., Bock, H. G., Beaudet, A. L. & O'Brien, W. E. (1984a). J. Biol. Chem. 259, 3160-3166.
Freytag, S. O., Beaudet, A. L., Bock, H. G. & O'Brien, W. E. (1984b). Mol. Cell. Biol. 4, 1978-1984.
Jagadeeswaran, P., Forget, B. G. & Weissman, S. M. (1981). Cell, 26, 141-142.
Jeffreys, A. J. & Harris, S. (1984). BioEssays, 1, 253-258.
Jelinek, W. R. & Schmid, C. W. (1982). Annu. Rev. Biochem. 51, 813-844.
Jinno, Y., Nomiyama, H., Wakasugi, S., Shimada, K., Matsuda, I. & Saheki, T. (1984). J. Inher. Metab. Dis. 7, 133-134.
Jinno, Y., Nomiyama, H., Matuo, S., Shimada, K., Saheki, T. & Matsuda, I. (1985a). J. Inher. Metab. Dis. 8, 157-159.
Jinno, Y., Matuo, S., Nomiyama, H., Shimada, K. & Matsuda, I. (1985b). J. Biochem. (Tokyo), 98, 1395-1403.
Kimura, M. (1983). In The Neutral Theory of Molecular Evolution, pp. 55-97, Cambridge University Press, Cambridge.
Koop, B. F., Goodman, M., Xu, P., Chan, K. & Slightom, J. L. (1986). Nature (London), 319, 234-238.
Manley, J. L. & Colozzo, M. T. (1982). Nature (London), 300, 376-379.
Maxam, A. M. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Melton, D. A., Krieg, P. A., Rebagliati, M. R., Maniatis, T., Zinn, K. & Green, M. R. (1984). Nucl. Acids Res. 12, 7035-7056.
Miyake, T., Migita, K. & Sakaki, Y. (1983). Nucl. Acids Res. 11, 6834-6846.
Miyata, T. & Hayashida, T. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 5739-5743.
Miyata, T. & Yasunaga, T. (1980). J. Mol. Evol. 16, 23-36.
Miyata, T., Yasunaga, T. & Nishida, T. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 7328-7332.
Nakabayashi, H., Taketa, K., Miyano, K., Yamane, T. & Sato, J. (1982). Cancer Res. 42, 3858-3863.
Sharp, P. A. (1983). Nature (London), 301, 471-472.
Simmler, M.-C., Rouyer, F., Vergnaud, G., Nystrom-Lahti, M., Ngo, K. Y., de la Chapelle, A. & Weissenbach, J. (1985). Nature (London), 317, 692-697.
Sokal, R. R. & Sneath, P. H. (1963). Principles of Numerical Taxonomy. Freeman, San Francisco.
Su, T.-S., Bock, H. G., O'Brien, W. E. & Beaudet, A. L. (1981). J. Biol. Chem. 256, 11826-11831.
Su, T.-S., Nussbaum, R. L., Airhart, S., Ledbetter, D. H., Mohandas, T., O'Brien, W. E. & Beaudet, A. L. (1984). Amer. J. Hum. Genet. 36, 954-964.
Tsuzuki, T., Nomiyama, H., Setoyama, C., Maeda, S. & Shimada, K. (1983). Gene, 25, 223-229.
Vanin, E. F. (1984). Biochim. Biophys. Acta, 782, 231-241.
White, B. A. & Bancroft, F. C. (1982). J. Biol. Chem. 257, 8569-8572.

Edited by P. Chambon