Gene, 150 (1994) 243-250
© 1994 Elsevier Science B.V. All rights reserved. 0378-1119/94/$07.00
GENE 08326
Construction of a human full-length cDNA bank
(Recombinant DNA; shuttle vector; phagemid; cDNA synthesis; cloning; sequencing; in vitro translation; expression in mammalian cells)
Seishi Kato^a,b, Shingo Sekine^a, %-Wan Oh^a, Nam-Soon Kim^a, Yuri Umezawa^a, Naoto Abe^a, Midori Yokoyama-Kobayashi^a and Takashi Aoki^b
^a Human Protein Project, Kanagawa Academy of Science and Technology (KAST), Kawasaki, Kanagawa, Japan. Tel. (81-427) 42-5091; and ^b Sagami Chemical Research Center, Sagamihara, Kanagawa, Japan
Received by Y. Sakaki: 24 April 1994; Revised/Accepted: 28 June/1 July 1994; Received at publishers: 4 August 1994
SUMMARY
We aimed to construct a full-length cDNA bank from an entire set of human genes and to analyze the function of a protein encoded by each cDNA. To achieve this purpose, a multifunctional phagemid shuttle vector, pKA1, was constructed for preparing a high-quality cDNA library composed of full-length cDNA clones which can be sequenced and expressed in vitro and in mammalian cells without subcloning the cDNA fragment into other vectors. Using this as a vector primer, we have prepared a prototype of the bank composed of full-length cDNAs encoding 236 human proteins whose amino acid sequences are identical or similar to known proteins. Most cDNAs contain a putative cap site sequence, some of which show a pyrimidine-rich conserved sequence exhibiting polymorphism. It was confirmed that the vector permits efficient in vitro translation, expression in mammalian cells and the preparation of nested deletion mutants.
INTRODUCTION
The number of human proteins is estimated to be 50000-100000, of which only approx. 3000 have had their primary structure and basic function characterized. A conventional approach to proteins starts with the purification of a target protein, which has required much time and labor. Recent progress in gene technology has enabled us to approach proteins from genes, especially cDNA synthesized from mRNA. Several groups have started the partial sequencing of human cDNA fragments as a part of the Human Genome Project (Adams et al., 1991; 1992; 1993a,b; Okubo et al., 1992; Khan et al., 1992; Sikela and Auffray, 1993). However, there is a large gap between the cDNA fragment and the corresponding protein. For the last several years, we have prepared various human cDNA libraries using different methods and characterized them by partial sequencing. This strategy has been confirmed to be an effective way to obtain novel proteins (Kim et al., 1993a,b; Oh et al., 1993), but there is still room for improvement in the quality of the cDNA library. To prepare a high-quality cDNA library, we have developed novel methods for synthesizing full-length cDNAs from mRNAs at high efficiency using a multifunctional shuttle vector that enables us to sequence the entire cDNA and to express the cDNA in vitro and in mammalian cells. By sequencing the full-length cDNAs and producing the proteins encoded by the cDNAs, we can prepare a collection of human proteins whose primary structures have already been determined, called the Homo-Protein Bank. This paper describes the construction of a prototype of the Homo-Protein Bank.
Correspondence to: Dr. S. Kato, Sagami Chemical Research Center, 4-4-1 Nishi-Ohnuma, Sagamihara, Kanagawa 229, Japan. Tel. (81-427) 42-4791; Fax (81-427) 49-7631.
Abbreviations: aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; DTT, dithiothreitol; IPEC, intact protein-encoding cDNA; kb, kilobase or 1000 bp; nt, nucleotide(s); PAGE, polyacrylamide-gel electrophoresis; r-, ribosomal; SDS, sodium dodecyl sulfate; SV40, simian virus 40; tsp, transcription start point(s); u, unit(s).
SSDI 0378-1119(94)00537-O
RESULTS AND DISCUSSION
(a) Strategy
To construct the Homo-Protein Bank, we planned to prepare a cDNA library satisfying the following three requirements: (i) each clone has a full-length cDNA starting from a cap site or at least an intact protein-encoding cDNA (IPEC) containing the start codon, (ii) the nt sequence of each clone can be determined easily and (iii) each cDNA can be translated in vitro and in mammalian cells without extensive work such as subcloning of the cDNA into other vectors. So far, there has been no method for preparing a cDNA library satisfying all these requirements. The Gubler and Hoffman (1983) method, which is the simplest and most popular at present, does not satisfy the first requirement. The Okayama and Berg (1982) method, which is so far the best method for synthesizing a full-length cDNA, does not satisfy the second and third requirements. We selected the following strategy to prepare a cDNA library satisfying the above three requirements: (i) synthesis of a cDNA using a vector primer, as in the Okayama-Berg method, (ii) use of a novel multifunctional phagemid as a vector primer and (iii) use of a novel method for synthesizing a full-length cDNA based on modification of the cap site of mRNA (Fromont-Racine et al., 1993; Maruyama and Sugano, 1994). Fig. 1A shows the structure of a multifunctional shuttle vector, pKA1, developed for this purpose. The plasmid pKA1 possesses various functional units that enable us to do a unidirectional cDNA cloning, to prepare an antisense single-stranded DNA for sequencing and screening, to prepare a sense RNA for in vitro translation and screening, to express the cDNA in mammalian cells by transfection, and to sequence the entire cDNA using a deletion method. These functions satisfy the second and third requirements.
Fig. 1. Synthesis of full-length cDNAs using a multifunctional shuttle vector pKA1. (A) The structure of a multifunctional shuttle vector pKA1. The pKA1 contains an SV40 replication origin (ori), an SV40 early promoter (P), a splicing site, a T7 RNA polymerase promoter (T7), a universal sequencing primer site (U), a cDNA cloning site, a T3 RNA polymerase promoter (T3), a poly(A)-addition signal, the intergenic region of bacteriophage f1 (f1 ori), restriction sites generating a 3' protruding end and another reverse sequencing primer site (R). Plasmid pKA1 was digested with KpnI and tailed with about 60 nt (dT60). After digestion with EcoRV, the dT-tailed plasmid was separated on agarose gel and used as a vector primer. The nt sequence of pKA1 is available from GenBank/EMBL/DDBJ under accession No. D13749. (B) The capping method for synthesizing full-length cDNAs. Total mRNA was isolated using the guanidinium/CsCl method, and poly(A)+RNA was purified using oligo(dT)-cellulose column chromatography. The poly(A)+RNA (10 µg) was dephosphorylated at 37°C for 30 min in 100 µl of a reaction mixture containing 100 mM Tris·HCl (pH 7.5)/2 mM DTT/100 u of RNase inhibitor (TaKaRa)/0.6 u of bacterial alkaline phosphatase (TaKaRa). After phenol extraction and ethanol precipitation, the pellet was dissolved in 100 µl of a solution containing 50 mM Na·acetate (pH 6.0)/1 mM EDTA/2 mM DTT/100 u of RNase inhibitor/0.1 u of tobacco acid pyrophosphatase (Epicentre). The decapped mRNA and 3 nmol of a chimeric DNA-RNA oligonucleotide linker (5'-dGdGdGdGdAdAdTdTdCdGdA-GGA-3') synthesized with a DNA synthesizer (ABI) were ligated at 20°C for 24 h in 100 µl of a reaction mixture containing 50 mM Tris·HCl (pH 7.5)/5 mM MgCl2/0.5 mM ATP/2 mM DTT/25% polyethylene glycol 6000/100 u of RNase inhibitor/250 u of T4 RNA ligase (TaKaRa). The DNA-capped mRNA and 1.2 µg of a dT-tailed pKA1 vector primer were dissolved in 20 µl of a solution containing 50 mM Tris·HCl (pH 8.3)/75 mM KCl/3 mM MgCl2/1.25 mM dNTP/5 mM DTT/50 u of RNase inhibitor/200 u of Superscript II (GIBCO-BRL). The reaction mixture was incubated at 45°C for 1 h and then 50°C for 30 min. After the first-strand synthesis, the reaction product was digested at 37°C for 1 h with 20 u of EcoRI in 50 µl of a buffer solution. The digested product was dissolved in 50 µl of TE solution. Of this solution, 5 µl were used for self-ligation and the second-strand synthesis according to Okayama and Berg (1982). The obtained cDNA vectors were used for transformation of the competent cells E. coli DH12S (GIBCO-BRL) by an electroporation method. TE is 10 mM Tris·HCl (pH 8.0)/1 mM EDTA.
(b) Synthesis of full-length cDNA
We used two methods for synthesizing full-length cDNAs: one is a modified Okayama-Berg method (Pruitt, 1988) and the other is a novel method using a DNA-capped mRNA developed by us (Sekine and Kato, 1993). Using the first method, the cDNA library was prepared from poly(A)+RNA of human histiocytic lymphoma cell line U937. This method produces full-length cDNAs at high frequency if the starting mRNA is intact. The principle of the second method, based on modification of a cap site, is shown in Fig. 1B. The poly(A)+RNA extracted from human fibrosarcoma cell line HT-1080 was treated with alkaline phosphatase to remove the 5' phosphate residue of a degraded mRNA. The cap of an intact mRNA was removed by a tobacco acid pyrophosphatase treatment. To the generated 5' phosphate group, a chimeric DNA-RNA oligonucleotide linker carrying an EcoRI site was ligated using T4 RNA ligase. The resulting DNA-capped mRNA was used as a template for cDNA synthesis. The cDNA vector was circularized after EcoRI digestion, and the RNA strand was converted to a DNA strand. In principle, this capping method gives only full-length cDNAs.
(c) Quality of the cDNA library
The resulting cDNA libraries contained >10^6 independent clones, of which >90% possessed the cDNA insert. The overall yields of cDNA clones varied from 2 × 10^6 to 2 × 10^7 independent clones per µg vector primer depending on the transformation efficiency. To examine the quality of the cDNA library, the partial sequencing of the 5' region of the cDNA insert was performed using the amplified cDNA libraries prepared from the above two libraries. The amplified libraries contained short cDNA inserts (mainly less than 3 kb) and the average length of cDNA inserts was approx. 1.2 kb. When analyzing non-amplified libraries, we found many clones carrying a long cDNA insert of up to 8 kb. The long cDNAs seem to be remarkably reduced by amplification. Nevertheless, we used the amplified libraries to complete the short full-length cDNA bank at the first stage of the bank construction.
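Because the capping method of section b places a linker of known sequence in front of the mRNA 5' end, 5' reads from that library could in principle be screened for the linker. The following is a minimal sketch, not the authors' procedure. It assumes that reads are sense-strand, that the clone retains the linker from the EcoRI site onward after EcoRI digestion and self-ligation, and that the ribonucleotide part of the linker is written in DNA letters; the function name `strip_linker` is hypothetical.

```python
# Linker from the Fig. 1 legend: 5'-dGdGdGdGdAdAdTdTdCdGdA-GGA-3'
LINKER_DNA = "GGGGAATTCGA"  # deoxyribonucleotide part
LINKER_RNA = "GGA"          # ribonucleotide part, written in DNA letters
ECORI_SITE = "GAATTC"


def strip_linker(read):
    """Return the sequence downstream of the linker remnant, or None if absent.

    Assumption: after EcoRI digestion the clone keeps the linker from the
    EcoRI recognition site onward (GAATTCGAGGA), directly followed by the
    sequence copied from the 5' end of the mRNA.
    """
    read = read.upper()
    linker = LINKER_DNA + LINKER_RNA
    remnant = linker[linker.index(ECORI_SITE):]
    position = read.find(remnant)
    if position == -1:
        return None
    return read[position + len(remnant):]


if __name__ == "__main__":
    example = "ACGT" + "GAATTCGAGGA" + "CTTTTCC"  # made-up read
    print(strip_linker(example))
```

A read in which the sequence after the linker starts inside the coding region would correspond to the truncated, linker-carrying clones discussed in this section.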
The obtained nt sequences and the deduced aa sequences were compared with those of known genes and proteins in databases. The results are summarized in Table I. Of the clones selected from U937 and HT-1080 libraries, 42% and 51% contained the nt sequences which agreed with those of known human genes deposited in the databases. In these clones, the ratio of an IPEC clone was 75% and 78%, respectively. The insert sizes of the IPEC clones varied from 400 bp to 2.9 kb. The average size was 1.1 kb for clones obtained from either library. 14 and 13% of total clones contained the cDNA insert encoding the protein whose aa sequence has similarity to those of known proteins accumulated in the protein databases. Because most of them show great similarity around the N-terminal region, these clones seem to contain an IPEC. If so, the ratio of the IPEC clones becomes 71 and 83% for U937 and HT-1080 libraries, respectively. Considering that the ratio of IPEC clones was less than 20% in the cDNA library prepared using the conventional methods (unpublished data), these results suggest that both methods are appropriate for obtaining IPEC clones. When comparing the two methods, the ratio of the IPEC clones was found similar. The truncated clones in the library prepared by the tailing method may originate from the degraded mRNAs contained in the starting material. Unfortunately, the library prepared using the

TABLE I
Classification of cDNA clones by partial sequencing^a
Cell^b     Method^c   Total   Identical^d (IPEC)   Similar^e (IPEC)   Novel^f
U937       Tailing     776    326 (246)            105 (74)           345
HT-1080    Capping     720    369 (287)             92 (76)           259
Total                 1496    695 (533)            197 (150)          604
^a The clones selected randomly from the amplified cDNA libraries were infected with helper phage M13K07, and single-stranded DNAs were prepared for sequencing. The sequencing reaction was performed using a Taq kit (ABI) or a BcaBEST™ kit (TaKaRa). The dye-labeled products were analyzed on a 373A automated DNA sequencer (ABI). The obtained sequence data were analyzed on a Macintosh computer using the software GENETYX-MAC (Software Development). A similarity search was done using the databases (GenBank™/EMBL/DDBJ, SWISS-PROT and NBRF-PDB) and the program GENETYX-MAC/CD (Software Development) based on the algorithm of Lipman and Pearson (1985).
^b The source of a poly(A)+RNA used for synthesizing cDNAs.
^c The method used for preparing a cDNA library. Details are described in section b.
^d The number of cDNA clones containing the nt sequence identical to that of known human genes. The number in parentheses represents the number of IPEC clones.
^e The number of cDNA clones encoding the protein whose aa sequence has similarity to those of known proteins. The number in parentheses represents the number of IPEC clones.
^f The number of novel cDNA clones whose nt sequences or deduced aa sequences show no similarity to known genes or proteins.
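The percentages quoted in section c for identical and similar clones can be recomputed from the counts in Table I. The sketch below is only an arithmetic check. It takes the total for each library as the sum of its identical, similar and novel counts (776 for U937 and 720 for HT-1080); the names `TABLE_I`, `percent` and `summarize` are illustrative.

```python
# Counts from Table I: (clones, IPEC clones) for identical and similar classes.
TABLE_I = {
    "U937": {"identical": (326, 246), "similar": (105, 74), "novel": 345},
    "HT-1080": {"identical": (369, 287), "similar": (92, 76), "novel": 259},
}


def percent(part, whole):
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)


def summarize(library):
    """Recompute the proportions reported in the text for one library."""
    row = TABLE_I[library]
    identical, identical_ipec = row["identical"]
    similar, _similar_ipec = row["similar"]
    total = identical + similar + row["novel"]
    return {
        "total": total,
        "identical_pct": percent(identical, total),
        "similar_pct": percent(similar, total),
        "ipec_among_identical_pct": percent(identical_ipec, identical),
    }


if __name__ == "__main__":
    for name in TABLE_I:
        print(name, summarize(name))
```

With these counts the sketch reproduces the 42% and 51% identical clones, the 14% and 13% similar clones, and the 75% and 78% IPEC ratios among identical clones.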
TABLE II
Current collection in the Homo-Protein Bank
No.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
HP No.^a  Protein^b
HP00440 2’.3’-cyclic nucleotidc 3’-phosphcdiessrue HP00318 23-kD8 highly brie protein HP00053 A&l-B HP00109 ActinHP00367 Amidophosphoriboryltran:ferase HP00512 Arninoacylue- 1 HP00102 Anwin II HP00190 A’Il’ syrithase B chain HP00306 A’fP sptthue c subunit (P2 form) HP00020 ATP synthae 1 chain-lib HP@3374 ATP synthua lipid-binding protein-like HP00375 ATP rynthue Om-sensitivity-conferral protein-like HP00433 Basigin HP00282 8-2.miuoglobulin HP00425 B-coat protein-like HP00461 8-@actori&-binding protein HP00393 Breast bulc coruerved protein 1 HP00069 Calcium-binding protein-like HP00494 Calcyclin HP00176 c!almodulin HP&360 Calmrin precurtor HP00206 Calpactin I light chain HP00517 Cathepsin D HP00412 Cathepsin H HP00276 CCAAT-binding trnscription factor I subunit A HP00446 CD44 antigen HP00443 CKSLZ HP00471 Clathrin coat assembly protein APS&like HP00331 Ciathrbt-usociaed protein 17.like HP00016 Cofilin HP00477 Cyclin (proliferating all nucleu anti8en) HP00042 Cyclophilin HP00529 Cytochrome c HP00090 cytocbronm c oxidase II HP00178 Cytochrotnc c onidasc III HP00162 Cytochrome c oxidne IV HP00506 Cytochronw c oxidac Via-liver HP00369 Cytochromc c oxidase VIb HP00523 Cytochrome c ox&se VIIc HP00445 Cytokeratin 8 HP00145 Diuspun binding inhibitor HP00304 Dihydrolipoamide dehydrogcnase HP00466 DNA binding protein inhibitor Id (HLH)-like. HP00403 DnaI-like HP00017 EB virus small RNAa associated protein HP00173 Electron transfer flawprotein waubunit HP00155 Elongation factor 1-o. HP00037 Elongation factor 1-p HP00249 Elongation factor 1-y HP00414 Enolue-. 
HP00337 l?sterw D HP00136 Eukaryotic initiation factor 4AI HP00172 Bukaryotic initiation factor-2a HP00404 Bxcirion npir protein HP00465 F-actin capping protein 8 subunit-like HP00492 Perritin heavy chain HP00439 FK506.binding protein HP00064 Frwtom-bispborphate aldolasa A HP00429 Plunwylacetoaceuse HP00287 0 protein p subunit-like protein 12.3 HPtYJ326 Gene H5 protein-like HP00495 Gln-tRNA rynthctasc-like HP00138 Glutathiom peroxidase HP00473 Glut&ione S-transferase Pi HP00117 Glycenldohyda-3-phosphate dehydrogenasc HP@3488 OTP-binding protein (ml) HP00442 H+-ATPwe protcolipid protein-like HP00516 H+-A’Tl’aseproton chnnoel subunit (vruolar) HP00004 Ha shock protein 90 kDa HP00449 Helix-loop-helix protein IR2l HP00474 Histone HlD HPC0453 Histone H2AZ HP00452 Hitonc H2B.l-like HP00193 Histonc H3.3 HPCQO14 Histonc H3.3.like HP00504 HLA class II hiitocompatibility antigen OR LI chain uPcaO49 h”RNP Al HP00303 hnRNP Al-like HP00432 h”RNP A2 HP00185 bnRNP Cl/C2
v
Hl=PL4
1 0 PLl 9 9 FL2 1 8 PL3 3 6 FL1 1 0 ? 0 1 PL4 0 lPL4 1 OIu 1 0 Ix1 0 1 ? 1 O? 1 0 ? 0 2 Ix3 4 2 PLl 1 0 ? OlFL4 1 lOPL3 0 1 ? 0 2 FL1 2oFL4 1 O? 02Fl.4 0 1 FL1 1 oFL4
IO? 0 1 l-xl 0 I FLA 0 2 ? 0 0 0 7
1 3 1 6
? FL3 l-u FL2
IO? 3 0 FL1 150 FL1 0 1 ? 0 1 ? 1 oPL4 0 1 PI-4 0 3 PLl 0 1 FL4
I OFU 0 1 ? I I ? 1 I FL2
1 OFu 199 FL1 1 1 FL2 0 4 FL3 1 2 FL1 2 0 PL3 0 2 FL3 1 0 PLl 1 oPL4 0 1 ? 0 3 PLl 0 1 FL1 0 2 Ix1 1 OIu I 5 FL3 1 O? 0 1 ? 0 2 Ix1 0 1 FL1 1 13 PL2 0 1 PL‘l 0 1 ? 0 1 ? 3 0 FL3 0 2 FL1 0 1 PLl 0 1 FLA 0 2 ? 1 1 PLI 1 0 ? OlPL4 2 0 FL3 1 O? 1 ow 1 OFu
No.
HPNo.
Protein
8 1 HP00267 Growth and tmttsfomution-depsndem protein 82 HPC042rI Hyponanthin phorphoribosyltnnrfc.rsrue 83 HP00410 Id-2 (brdix-loop-helix) protein 84 HP00129 Lactate dehydmgmtua H chain 85 HP00094 Lactate &hydro~nnuc M chaii 86 HP00146 Lantinitt-binding protein 87 HP00259 Lufw La uttigen 88 HP00349 Msla dehydmpnar-like 89 HP00434 Membrw glycoprotein gp25L-like 90 HP00436 Mctallothionsin-II 9 1 HP00082 Mitochottdrial phosphate carrier protein 92 HPOOSOOMitochondrial proteolipid 6.8-kDa-liio 93 HP00383 Multifunctional pmtein ADE2Hl 94 HP00496 NADH-ubiquirtom oxidoreduaue. B15 subunit-lii 95 HP00330 NADH-ubiquinom oxidomdwtam B22 subunit-liie 96 HPC0370 NADH-ubiquiwne oxidorsducprs chaitt 1 97 HP00441 NADH-ubiiuiwno oxidomdunue chin PSST-lie 98 HP0053 1 NADH-ubiiuinone oxidoredu~ CI-SGDH-ill 99 HP00400 Neuropoptide Y3 maptor 100 HP00521 Na-hirtom chromosomal protein HMO-17-like 10 1 HP00007 Nucleoli phorphoprotcin 823 102 HP00114 Nuckoside diphosphatc kimse A 103 HP00223 Nucbosids diphosphate kinue B 104 HP00382 Nuckoaome ummbly protein 105 HP00103 Omithine decuboxylasc 106 HP00524 Omithinc dccarboxylase antizyme-like 107 HP00182 Osteonectin 108 HP00233 Pcnuxin 109 HP00526 Peptidyl-prolyl cis-tram isomerarc B 110 HP00139 Phoaphoglyarate kimse 1 111 HP00023 Plutin-L 112 HP00358 Polyadenylrne-binding protein 113 HP00342 Potphobilinogen dsaminase (non-erythroporetic) 114 HP00277 Pm-mRNA aplici~ factor SFQp32 115 HP00362 pm-mRNA splicing factor SRp20 116 HP00166 Profilin 117 HP00361 Proteuome f~ subunit-like 118 HPM)S18 Proteuome chain 7 119 HP00509 Proteasorne component Cll-like 120 HP00416 Prot@uom component Cl3 121 HP00235 Proteasome component C13-like 122 HP00083 Proteuome component C2 123 HP005 19 Proteasome component C3 124 HP00092 Proteasome component C9 125 HP00402 Proteasorm component-like (IOTA) 126 HP00334 Proteasome-like 127 HW0327 Protein kinsse 40-kDa-like 128 HP00255 Protein kinuc-like 129 HP00527 Protain phasphatase PP-X-like 130 HP00226 Prothymorin-n 13 
1 HP00319 QM protein (Wilm’s tumor-related protein) 132 HP00156 N-like protein TC4 133 HP00408 ma-related protein rab-1A 134 HP00341 r-phosphoprotein H) 135 HP00113 r-phorphoprotein Pl 136 HP00320 r-phaphoprotein P2 137 HP00530 r-protein BLlS-like 138 HP00149 r-protein BL23 homolog 139 HP00265 r-protein BSlZ-like 140 HPC0502 r-protein BS14-like 14 1 HP00324 r-protein BS17-like 142 HP00515 r-protein BS6-like 143 HP00072 r-protein Lll 144 HP00438 r-protein L12 145 HP00032 r.protein L17 146 HP00486 r-orotein L18 147 HP00251 r-protein LlNa 148 HP00310 r-protein LZl-like 149 HP00305 r-protein LWA-like 150 HP00325 r-protein L26 15 1 HP00468 r-protein L27 152 HP00460 r-protein L28 153 HW0335 r-protein L3 154 HP00415 r-protein L30 15 5 HWOZOOr-protein L3 1 156 HP00199 r-protein L32 157 HPC0347 r-protein WPlikc 158 HP00448 r-protein L35-like 159 HP00397 r-protein L35a 160 HP00332 r-protein L36A-like
lJHI,R
1 0 ? 1 0 ? 1 0 FL4 4 0 PLl 1 0 ? 9 7 FL2 1 0 FL4 1 0 ? 011 0 3 n1 0 1 FL1 0 1 ? 2oFL4 0 1 ? 0 1 ? 1 0 FL1 Ol? 1 0 ? 1 0 Ix4 0 1 ? 2 0 FL3 0 I FL4 1 1 FL2 3 0 Fl/4 0 2 PL3 0 2 FL3 0 1 FL4 0 1 lx4 1 0 FL4 1 0 FL1 0 1 ? 1 0 ? 1 0 FL1 0 2 ? 20x3 0 7 Ix3 1 0 ? 0 1 n1 0 1 ? 1 0 ? 0 1 ? 0 1 ? OlFL4 1 0 Iu 1 0 Ix4 1 0 ? 1 0 ? 0 1 ? 1 0 ? 0 1 FLd 0 8 FIa3 1 0 FL4 1 0 lu 3 3 lx2 6 6 FL2 2 2 Ix2 1 0 ? 1 4 FL3 1 1 FL2 0 1 ? 1 0 ? 0 1 ? 0 1 FL4 0 3 FL3 1 1 FL2 1 2 FL2 0 6 FL3 2 1 FL2 6 2 m2 5 2 PL2 0 2 FL3 0 4 FL3 4 0 FL3 1 3 FL2 4 3 FL2 1 1 FL2 1 1 Ix2 0 2 FL3 1 0 FL4 4 2 n2
247 TABLE II (continued) 16 1 HP00050 r-protein L3Wke 162 HPOW63 r-protein L37A 163 HP00379 r-protein WE-like 164 HP00266 r-protein L39 165 HP00104 r-pm& Ls-like 166 HP00212 r-protsio L6 167 HFW130 r-proteinL7 168 HP00309 r-protein L7a 169 HP00348 r-protein L&like 170 HP00301 r-protein L9-like 171 HPOO371 r-protein SlO-like 172 HP00312 r-protein S11 173 HPOOI68 r.proteio s12 174 HP00373 r-protein 513 175 HP00043 r-proain S14 176 HPW470 r-proteinSl5 177 HP00250 r-protein S16 178 HP00174 r.proein 517 179 HP00363 r-protein Sl8 180 HP00214 r-protein SL9 18 1 HP00073 r-protein S20 182 HW0357 r-pmtcinS2J 183 HP@3365 r-protein s24 184 HP00133 r-proteinS25 185 HP00285 r-protein S26-like 186 HP00469 r-protein S27 187 HP09015 r-protein 53 188 HP00153 r-proteinS3r I89 HP00279 r-protein S4 190 HW0077 r-protein S4X isoform 191 J1P00508 r-protoio S4Y isoform 192 HPOM72 r-protein SS-like 193 HP00028 r-protein S6 194 HP00323 r-protein S7 195 HP00085 r-protein S8 196 HP00419 r-protein S9-lib 197 HP00034 r-protein XLlA-Jike 198 HPW339 r.pmtein YLlO
3 0 62FL2 1 3 1 1 SOFU 2 1 5 0 6 4 2 3 114 4 4 1 3 0 1
FL3 FL2 FL2 FL2 FL1 FJ.1 FL2 FL2 lx2 FL2 FL4
1
OFU
1 0 1 2 2 1 1 2 3 3 1 0 0 8 0 f 0
2 FL1 J J%l 5 FL2 0 FLJ 0 FL3 1 rx2 2 FL2 J FL2 1 FL2 J FL2 O? 1 ? 8 FL3 7 FL2 18FL3 2 FL2
0
9 2 1 0 3 1
1 FL4 2 3 0 1 4 3 1
FL3 FL2 FL3 FL1 FL3 FL2 FL2
199 200 201 202 233 204 205 206 207 208 209 210 211 212 213 214 215 216 217 2 18 219 220 221 222 223 224 225 226 227 228 229 230 23 J 232 233 234 235 236
HP00451 r-pro&t Yi.25.like HP@3302 r-protein YLWA-like HPOW# r-protein YL4t bomolog HP00321 r-pro&in YL43-like WOOD27 r-protein YS% homolog HP00264 Secmtory ~tmnuk pmteoglycn core protsio HP00229 Signal mcognition puti& mccptor n subunit HP00536 Sin&tmnded DNA bindins protein P%lib HP00210 Smooth murcb protein U-o-like WOO444 snRNP-E &ted protein C29-like HP00396 SW& U’.22 HP00372 St&mitt HP00507 Testis-vp;ciJic protein TPXl-like HP00165 Thiomdoxin HP00100 Thymorin ~10 HP00196 Thymorin p-4 HP00209 Trnrniption factor BTF3 homologue HP00376 Tmnsniptioo faaor BTFMkc HP00029 TruzrformiDp protein rho A HP003 16 Tmn&tio~lJy cootrotlsd tomor proteio HP@0485 Triorephosphw isomerwe HP00222 Tub&t u. 1 HP00187 TubuJJn 8-l HPOOS14 Tumor antigen L6 HP00422 Ul snRNp C HP00409 UllU2 snRNP E HPC0418 Ubiquinol-cytoch c mduama 11.koII protein HP&%25 Ubiquinol-cytoduome c radwtwc 8-kDa protein-like HW0171 Ubiqo~offi.biodinE protein HP00184 Ubiquitin JlwO224 Ubiquitin carrier protein (E2-FW) HP00366 Ubiquitin.52 *P fusion protein HP00308 Ubiquitin-80 u fusion protein (UbaSO) HP00484 Ubiquitiwconjugating enzyme E2-17-kDs HP00522 Ubiquitin-like HP00346 Ubiquitin-like HP00307 Ubiquitis-Jike.S30 fusion protein (ftw) HP00192 Vimentin
Ol? 3 0 02s3 1 3 1 4 4 0 OlFl.4 10 Ol? 0 1 10 3 0 Of? J 1 0 3 4 0 20&l lo? 1 0 2 4 0 3 02Fu 02? Ol? 1 J 1 0 1 0 Ol? 2 0 J 0 02Ju 1 1 4 6 0 1 0 1 1 1 0 J 03FL4
J%3 r%2 FL2 FL1 ? ? FL4 FL3 FL1 FL3 FL3
? FL2 Rl
FL2 FL4 FL4 r&l ? FL2 FL2 FL4 ? FL2 FL4
^a The identification number given to the individual cDNA clone encoding one kind of protein. Thus, the overlapping clones were indicated by the same HP number.
^b The name of the protein encoded by the cDNA. Om = oligomycin.
^c The numbers of IPEC clones obtained from U937 (U) and HT-1080 (HT) cDNA libraries are listed.
^d The grade indicating certainty of a full-length cDNA, which was defined in section e.
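The proportions of the grades FL1 to FL4 quoted in section e can be recomputed from the reported counts and the 236 kinds of cDNAs in the bank. This is a small arithmetic sketch only; the names `GRADE_COUNTS`, `TOTAL_KINDS` and `grade_percentages` are illustrative and not part of the paper.

```python
# Numbers of clones per full-length grade, as reported in section e.
GRADE_COUNTS = {"FL1": 36, "FL2": 47, "FL3": 36, "FL4": 53}
TOTAL_KINDS = 236  # kinds of cDNAs in the bank prototype (section d)


def grade_percentages(counts, total):
    """Percentage of clones in each grade, rounded to the nearest integer."""
    return {grade: round(100 * n / total) for grade, n in counts.items()}


def judged_full_length(counts):
    """Number of clones satisfying one of the criteria FL1, FL2 or FL3."""
    return counts["FL1"] + counts["FL2"] + counts["FL3"]


if __name__ == "__main__":
    print(grade_percentages(GRADE_COUNTS, TOTAL_KINDS))
    print(judged_full_length(GRADE_COUNTS))
```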
capping method also contained the truncated cDNA clones possessing the linker sequences at the 5' terminus. These may result from degradation of the mRNAs during the tobacco acid pyrophosphatase treatment and/or the linker ligation procedure. However, the obtained ratio is satisfactory for practical use. More than 30 clones were fully sequenced using the nested deletion method, which can be easily performed because pKA1 carries the functional units needed for deletion: another sequencing primer site followed by several restriction sites generating a 3' protruding end. Because all clones sequenced had a poly(A) tail of 50-120 nt, it was confirmed that the 3' end of the cDNA is intact.
(d) Prototype of the Homo-Protein cDNA bank
The classification of cDNA clones by partial sequencing showed that a total of 683 cDNAs encoded intact proteins (Table I). These include overlapping clones transcribed from abundant mRNAs in the cell. By eliminating the overlapping clones, we obtained 236 kinds of cDNAs encoding different intact proteins that consist of 169 kinds of known proteins and 68 kinds of novel proteins. The content of the bank prototype is listed alphabetically in Table II. Among these clones, 61 kinds of cDNAs were isolated from both libraries and may encode housekeeping proteins. Because we used the amplified libraries prepared from two different mRNA sources using two different methods, the number of overlapping clones does not reflect the accurate composition of the corresponding mRNA population in the cell. However, the abundant clones showed a tendency similar to that reported for the expressed-sequence tags of non-biased libraries prepared from HepG2 cells (Okubo et al., 1992), that is, the abundance of protein-synthesis-related proteins such as elongation factor-1α and r-proteins. Table II contains a total of 70 kinds of r-proteins, including 23 new ones, which correspond to approx. 90% of approx. 80 eukaryotic r-proteins (Wool et al., 1993). Furthermore, the bank contains main housekeeping proteins including other protein-synthesis-related proteins, cytoskeletons, metabolic enzymes, signal transduction-related proteins and transcription factors.
(e) Cap site sequences
All clones shown in Table II contain an IPEC, but it is uncertain whether they contain a full-length cDNA.
Although our method must give a full-length cDNA in principle, some clones contained truncated cDNAs that may be derived from degraded mRNAs. We judged cDNAs satisfying one of the following three criteria to be a full-length cDNA: FL1, the 5' terminal sequence agreed with the tsp that had been determined using S1 mapping of the corresponding genomic sequence; FL2, more than two clones possessing the same 5' terminal sequence were obtained from two different cDNA libraries; FL3, the same clones were obtained from the same cDNA library. When judging whether or not the cDNA satisfies the criterion of FL1, we allowed a difference in length of not more than 7 nt (2 nt in most cases) because alternative tsp within this range have been reported frequently and the S1 mapping method gives only putative tsp. In Table II, 119 clones (50%) satisfied one of the above criteria: FL1, 36 (15%); FL2, 47 (20%); FL3, 36 (15%). If only one clone has been obtained and the corresponding genomic sequence is unknown, we cannot judge whether or not the clone contains a full-length cDNA. However, most of the remaining IPEC clones seem to contain a full-length cDNA because the cDNAs identical to known genes have longer 5' terminal sequences than those registered in the databases, and 53 clones (22%) came under this classification, which was denoted by FL4. We compared the 5' terminal sequences of putative full-length cDNAs. When comparing the cDNAs of grade FL4, only the sequences obtained by the capping method were used because we could not determine whether the 5' terminus (dG) of cDNAs obtained by the tailing method originated from mRNA or the tailed (dC). We found that the pattern C_mT_nC (where m = 0, 1, 2 and n ≥ 1) is conserved at the 5' terminal sequence of many cDNAs, especially ones encoding r-proteins (Table III). The existence of a pyrimidine-rich sequence has been reported around the tsp of vertebrate r-protein genomes (e.g., Wool et al., 1990).
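The conserved pattern described above, zero to two C residues followed by one or more T residues and then a C at the 5' terminus, can be written as a regular expression. The sketch below is one possible reading of that pattern, anchored at the first nt of the cDNA; the names `CAP_PATTERN` and `cap_motif` are hypothetical, and the example sequences are 5' ends of the kind listed in Table III.

```python
import re

# C_m T_n C at the 5' terminus, with m = 0, 1, 2 and n >= 1.
CAP_PATTERN = re.compile(r"^(C{0,2})(T+)C")


def cap_motif(sequence):
    """Return (m, n) if the 5' end matches C_mT_nC, otherwise None."""
    match = CAP_PATTERN.match(sequence.upper())
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


if __name__ == "__main__":
    # Variants differing in the length of the (dT) stretch give different n.
    for five_prime_end in ("CTTTTTCGCAACGG", "CTTTTCCGCAACGG", "TTTTTCCGTGCTAC"):
        print(five_prime_end, cap_motif(five_prime_end))
```

Comparing the returned (m, n) values for clones that share the same downstream sequence is one simple way to tabulate the variation in the numbers of (dT) and (dC) discussed in this section.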
All of the putative full-length cDNA clones encoding r-proteins in Table II, except for some clones presumably produced by an alternative tsp, contained this conserved sequence at the 5' terminal. Other clones encoding housekeeping proteins such as elongation factors, nuclear proteins and ubiquitin-related proteins, also possessed the similar conserved sequence. This sequence may play an important role in the regulation of transcription and/or translation of these proteins (Hariharan and Perry, 1990; Levy et al., 1991; Kaspar et al., 1992; Morris et al., 1993). The existence of the conserved sequence at the 5' terminus of novel cDNA clones may become one criterion for judging whether it contains a full-length cDNA. Indeed, some novel cDNAs in our bank showed this type of sequence. We also found polymorphism of the putative cap site sequences of cDNA clones originating from abundant mRNAs as shown in Table III. Some of them, in which only the tsp is different, may result from mRNAs initiated at an alternative tsp. In other clones, a variation in the numbers of (dT) and (dC) was found. This variation does not seem to be an artifact produced during the cDNA synthesis process because (i) multiple clones possessing each sequence were obtained from both libraries prepared using different methods and (ii) we could not find this kind of variation in the other part of the cDNA containing a (dT) or (dC) stretch. The r-proteins are known to be a member of multi-gene families (Meyuhas and Perry, 1980; Monk et al., 1981), and it has been controversial whether all genes are functional (Wool et al., 1990). Our results suggest that several polymorphic mRNAs may be transcribed from different loci on the genome. Other characteristic patterns at the putative cap site sequences have not yet been observed. All combinations (16 types from AA to TT) were seen in the first two nt of the 5' terminal sequence. More detailed analysis will be done when more data are accumulated. The 5'-noncoding sequences of putative full-length cDNA clones in Tables II and III are available from GenBank™/EMBL/DDBJ (Accession Nos. D28342-D28465), except for the identical sequences in the database.
(f) Expression of cDNA clones
The pKA1 possesses a T7 RNA polymerase promoter adjacent to the cDNA cloning site, which enables us to do in vitro transcription followed by in vitro translation using T7 RNA polymerase and rabbit reticulocyte lysates to obtain the protein encoded by the cDNA. If [35S]methionine is added to the reaction mixture, we can obtain a radioisotope-labeled protein which can be used to determine the molecular mass of the translated product and in a binding assay to search for a ligand or a substrate interacting with the product. We have confirmed that more than 200 clones selected from the present cDNA libraries could be translated in vitro (data not shown). A eukaryotic promoter on the vector enables us to express a cDNA in mammalian cells. Recently, we have found that recombinant f1 phage particles carrying a pKA1-cDNA vector can transfect COS7 cells with DEAE-dextran and the introduced single-stranded cDNA can replicate and produce the corresponding protein in the cells (Yokoyama-Kobayashi and Kato, 1993). Based on this finding, we have developed a large-scale expression system in which only several tens of µl of E. coli culture medium containing f1 phage particles are necessary for transfection without purifying the cDNA vector.
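When the molecular mass of an in vitro translated product is measured, it is convenient to have an expected value from the cDNA sequence. The sketch below is an illustration under stated assumptions, not the authors' procedure: it treats the first ATG of the sense strand as the start codon, reads to the first in-frame stop codon, and uses an average residue mass of 110 Da, which is a common rule of thumb. The names `first_orf_length` and `estimated_mass_kda` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per aa (assumption)


def first_orf_length(cdna):
    """Number of aa encoded from the first ATG to the first in-frame stop.

    Returns None if there is no ATG or no in-frame stop codon.
    Assumption: the first ATG in the sense strand is the start codon.
    """
    cdna = cdna.upper()
    start = cdna.find("ATG")
    if start == -1:
        return None
    residues = 0
    for i in range(start, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return residues
        residues += 1
    return None


def estimated_mass_kda(n_residues):
    """Rough expected molecular mass in kDa for a protein of n_residues aa."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    toy_cdna = "CCGCC" + "ATGGCTGGTAAG" + "TAA" + "GG"  # made-up sequence
    length = first_orf_length(toy_cdna)
    print(length, estimated_mass_kda(length))
```

The estimate ignores post-translational modification and the actual aa composition, so it is only a guide for locating the expected band on SDS-PAGE.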
TABLE III
Conserved sequences at the putative cap site and their polymorphism

HP No.^a   Protein^b                                 5’-terminal sequence^c (5’-3’)
HP00318    23-kDa highly basic protein               TTTTTTTTTCCAAGCGGCTGCCGAAGATGG
                                                     CTTTTTTTCCAAGCGGCTGCCGAAGATGG
                                                     CTTTTTCCAAGCGGCTGCCGAAGATGG
                                                     CTTTTCCAAGCGGCTGCCGAAGATGG
HP00393    ATP synthase c subunit (P2 form)          TCTTCTCTGCAGTGGGAGCAGCTCTCCTGC
(missing)  Breast basic conserved protein 1          CTTTCCGCTCGGCTGTTTTCCTGCGCAGGA
HP00017    EB virus small RNAs associated protein    CTTTTTTTTTCTAACTCCGCTGCCGCCATG
                                                     CTTTCTAACTCCGCTGCCGCCATG
HP00155    Elongation factor 1-α                     CTTTTTTTTTCGCAACGGGTTTGCCGCCAG
                                                     CTTTTTTTTCGCAACGGGTTTGCCGCCAG
                                                     CTTTTTTCGCAACGGGTTTCCCGCCAG
                                                     CTTTTTCGCAACGGGTTTGCCGCCAG
                                                     CTTTTCCGCAACGGGTTTGCCGCCAG
HP00037    Elongation factor 1-β                     CTTTTTCCTCTCTTCAGCGTGGGGCGCCCA
HP00249    Elongation factor 1-γ                     CTTTCTTTGCGGAATCACCATGGCGGCTGG
HP00287    G protein β subunit-like protein 12.3     CTTTTTTTCACTGCAAGGCCGCGGCAGGAG
                                                     CTTTCACTGCAAGGCGGCGGCAGGAG
HP00049    hnRNP A1                                  CTTTTTTTTCTGCCCGTGGACGCCGCCGAA
                                                     CTTTCTGCCCGTGGACGCCGCCGAA
HP00146    Laminin-binding protein                   CTTTTTTTCCGTGCTACCTGCAGAGGGGTC
                                                     TTTTTTTCCGTGCTACCTGCAGAGGGGTC
                                                     CTTTTTTCCGTGCTACCTGCAGAGGGGTC
                                                     CTTTTTCCGTGCTACCTGCAGAGGGGTC
                                                     TTTTTCCGTGCTACCTGCAGAGGGGTC
                                                     CTTTTCCGTGCTACCTGCAGAGGGGTC
HP00007    Nucleolar phosphoprotein B23              CTTTCCCTGGTGTGATTCCGTCCTGCGCGG
HP00223    Nucleoside diphosphate kinase B           TCCTCCGCCGCCGGCTCCCGGGTGTGGTGG
HP00319    QM protein                                TCCTTTCCCTTCGGTGTGCCACTGAAGATC
                                                     CTCTTTCCCTTCGGTGTGCCACTGAAGATC
                                                     TTTCCCTTCGGTGTGCCACTGAAGATC
                                                     TCTTTTTCGTTGCAGCCACTGAAGATC
HP00113    r-phosphoprotein P1                       CTTTTTTTCCTCAGCTGCCGCCAAGGTGCT
                                                     CCCCTTTCCTCAGCTGCCGCCAAGGTGCT
                                                     CCTTTTCCTCAGCTGCCGCCAAGGTGCT
                                                     CCTTTCCTCAGCTGCCGCCAAGGTGCT
                                                     CTTTCCTCAGCTGCCGCCAAGGTGCT
HP00320    r-phosphoprotein P2                       CTTTTTTTCCTCCCTGTCGCCACCGGGGTC
                                                     CTTTTCCTCCCTGTCGCCACCGAGGTC
HP00200    r-protein L31                             CTTTTTTTCCAACTTGGACGCTGCAGAATG
                                                     CTTTTTCCAACTTGGACGCTGCAGAATG
                                                     CTTTTCCAACTTGGACGCTGCAGAATG
                                                     CTTCCTTTCCAACTTAGACGCTGCAGAATG
                                                     CCTTTCCAACTTGGACGCTGCAGAATG
HP00063    r-protein L37A                            CTTTTCTTTCTAGGCTCGGACCTAGGTCGC
                                                     CTTCCTTTCTGGGCTCGGACCTAGGTCGC
                                                     CTTTTTCTGGGCTCGGACCTAGGTCGC
                                                     CTTTCTGGGCTCGGACCTAGGTCGC
HP00379    r-protein L38-like                        CTTTTTTTTTTCGTCCTTTTCCCCGGTTGC
                                                     CTTCTTTTTCGTCCTTTTCCCCGGTTGC
                                                     CTTTTTTCGTCCTTTTCCCCGGTTGC
HP00309    r-protein L7a                             CTTTTTTTTCTCTCTCCTCCCGCCGCCCAA
                                                     CCTTTTCTCTCTCCTCCCGCCGCCCAA
HP00301    r-protein L9-like                         TTTTTTTTTTTTTTGCTGCGTCTACTGCGA
                                                     CTTTTTTTTGCTGCGTCTACTGCGA
                                                     CTTTTCTTTGCTGCGTCTACTGCGA
                                                     CTTTTTTTGCTGCGTCTACTGCGA
                                                     CTTTCTTTGCTGCGTCTACTGCGA
HP00133    r-protein S25                             TTTTTTTTTTTTTTTTTTTTTGTCCGACAT
                                                     CTTCCTTTTTGTCCGACAT
HP00028    r-protein S6                              CTCTTTTCCGTGGCGCCTCGGAGGCGTTC
                                                     CTTTTTTTCCGTGGCGCCTCGGAGGCGTTC
HP00316    Translationally controlled tumor protein  CTTTTTTTTCCGCCCGCTCCCCCCTCCCCC
                                                     CTTTTCCGCCCGCTCCCCCCTCCCCC
HP00171    Ubiquinone-binding protein                CTCTTTCTGGTCAAAATGGCTGGTAAGCAG
HP00366    Ubiquitin-52 aa fusion protein            TTTTTTTTTCTTCAGCGAGGCGGCCGAGCT
                                                     TTCTTTTTCTTCAGCGAGGCGGCCGAGCT
HP00308    Ubiquitin-80 aa fusion protein            CTTCCTTTTCGATCCGCCATCTGCGGTGGA
                                                     CTTTTTTCGATCCGCCATCTGCGGTGGA
                                                     CTTTTTCGATCCGCCATCTGCGCTGGA

^a,^b See Table II.
^c The 5’-terminal sequence of the cDNA containing the putative cap site sequence.
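The variation in the numbers of (dT) and (dC) among clones of the same protein in Table III can be tabulated mechanically. The sketch below is illustrative only and is not the authors' procedure: it assumes that the conserved tract is the uninterrupted run of C and T beginning at the first nucleotide, and the function names are hypothetical. The example rows are copied from Table III.

```python
import re
from collections import defaultdict

# A few rows copied from Table III: (protein, 5'-terminal sequence).
ROWS = [
    ("23-kDa highly basic protein", "TTTTTTTTTCCAAGCGGCTGCCGAAGATGG"),
    ("23-kDa highly basic protein", "CTTTTTTTCCAAGCGGCTGCCGAAGATGG"),
    ("23-kDa highly basic protein", "CTTTTTCCAAGCGGCTGCCGAAGATGG"),
    ("23-kDa highly basic protein", "CTTTTCCAAGCGGCTGCCGAAGATGG"),
    ("EB virus small RNAs associated protein", "CTTTTTTTTTCTAACTCCGCTGCCGCCATG"),
    ("EB virus small RNAs associated protein", "CTTTCTAACTCCGCTGCCGCCATG"),
]


def leading_pyrimidine_tract(seq):
    """Return the uninterrupted run of C/T at the 5' end (assumed tract)."""
    return re.match(r"[CT]*", seq.upper()).group(0)


def run_lengths(tract):
    """Collapse a tract into (base, count) pairs, e.g. CTTT -> C1 T3."""
    return [(m.group(0)[0], len(m.group(0)))
            for m in re.finditer(r"C+|T+", tract)]


def summarise(rows):
    """Group the run-length description of each clone by protein name."""
    by_protein = defaultdict(list)
    for protein, seq in rows:
        by_protein[protein].append(run_lengths(leading_pyrimidine_tract(seq)))
    return dict(by_protein)


if __name__ == "__main__":
    for protein, variants in summarise(ROWS).items():
        print(protein)
        for runs in variants:
            print("   ", " ".join(f"{base}{n}" for base, n in runs))
```

Printing the run lengths side by side makes it easy to see which clones differ only in the length of the (dT) stretch and which also differ in the first nucleotide.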
(g) Conclusions
(1) We constructed a multifunctional phagemid shuttle vector, pKA1, which enables uni-directional cDNA cloning, preparation of antisense single-stranded DNA, preparation of sense RNA, and transfection of mammalian cells without subcloning into other vectors.
(2) Using this vector as a vector primer, we prepared a high-quality cDNA library from poly(A)+RNAs of human cell lines. Sequence analysis and in vitro translation experiments showed that the library contains intact protein-encoding cDNA clones (IPEC) at a frequency of more than 75%.
(3) We have prepared a prototype of the human protein bank composed of cDNAs encoding 236 human proteins. Most of them contain putative cap site sequences. We found that the cap site of cDNAs encoding some families of proteins, such as protein-synthesis-related proteins, has the pyrimidine-rich conserved sequence motif C,T,C. Polymorphism at the cap site sequence was also found in these cDNAs.
(4) We demonstrated that the vector functions well for determination of the partial or entire nt sequence of the cDNA, in vitro translation and expression in mammalian cells.
ACKNOWLEDGEMENTS
We thank Dr. Y. Deng, and Mss. N. Aida, F. Ohmori, T. Kato and M. Saeki for technical assistance with the sequencing analysis. This study was partly supported by a Grant-in-Aid for Creative Basic Research (Human Genome Program) from The Ministry of Education, Science, and Culture, Japan.
REFERENCES
Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R. and Venter, J.C.: Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252 (1991) 1651-1656.
Adams, M.D., Dubnick, M., Kerlavage, A.R., Moreno, R., Kelley, J.M., Utterback, T.R., Nagle, J.W., Fields, C. and Venter, J.C.: Sequence identification of 2,375 human brain genes. Nature 355 (1992) 632-634.
Adams, M.D., Kerlavage, A.R., Fields, C. and Venter, J.C.: 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nature Genet. 4 (1993a) 256-267.
Adams, M.D., Soares, M.B., Kerlavage, A.R., Fields, C. and Venter, J.C.: Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nature Genet. 4 (1993b) 373-380.
Fromont-Racine, M., Bertrand, E., Pictet, R. and Grange, T.: A highly sensitive method for mapping the 5’ termini of mRNAs. Nucleic Acids Res. 21 (1993) 1683-1684.
Gubler, U. and Hoffman, B.J.: A simple and very efficient method for generating cDNA libraries. Gene 25 (1983) 263-269.
Hariharan, N. and Perry, R.P.: Functional dissection of a mouse ribosomal protein promoter: significance of the polypyrimidine initiator and an element in the TATA-box region. Proc. Natl. Acad. Sci. USA 87 (1990) 1526-1530.
Kaspar, R.L., Kakegawa, T., Cranston, H., Morris, D.R. and White, M.W.: A regulatory cis element and a specific binding factor involved in the mitogenic control of murine ribosomal protein L32 translation. J. Biol. Chem. 267 (1992) 508-514.
Khan, A.S., Wilcox, A.S., Polymeropoulos, M.H., Hopkins, J.A., Stevens, T.J., Robinson, M., Orpana, A.K. and Sikela, J.M.: Single pass sequencing and physical and genetic mapping of human brain cDNAs. Nature Genet. 2 (1992) 180-185.
Kim, N.-S., Umezawa, Y., Ohmura, S. and Kato, S.: Human glyoxalase I. cDNA cloning, expression, and sequence similarity to glyoxalase I from Pseudomonas putida. J. Biol. Chem. 268 (1993a) 11217-11221.
Kim, N.-S., Kato, T., Abe, N. and Kato, S.: Nucleotide sequence of human cDNA encoding eukaryotic initiation factor 4AI. Nucleic Acids Res. 21 (1993b) 2012.
Levy, S., Avni, D., Hariharan, N., Perry, R.P. and Meyuhas, O.: Oligopyrimidine tract at the 5’ end of mammalian ribosomal protein mRNAs is required for their translational control. Proc. Natl. Acad. Sci. USA 88 (1991) 3319-3323.
Lipman, D.J. and Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227 (1985) 1435-1441.
Maruyama, K. and Sugano, S.: Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138 (1994) 171-174.
Meyuhas, O. and Perry, R.P.: Construction and identification of cDNA clones for mouse ribosomal proteins: application for the study of r-protein gene expression. Gene 10 (1980) 113-129.
Monk, R.J., Meyuhas, O. and Perry, R.P.: Mammals have multiple genes for individual ribosomal proteins. Cell 24 (1981) 301-306.
Morris, D.R., Kakegawa, T., Kaspar, R.L. and White, M.W.: Polypyrimidine tracts and their binding proteins: regulatory sites for posttranscriptional modulation of gene expression. Biochemistry 32 (1993) 2931-2937.
Oh, S., Iwahori, A. and Kato, S.: Human cDNA encoding DnaJ protein homologue. Biochim. Biophys. Acta 1174 (1993) 114-116.
Okayama, H. and Berg, P.: High-efficiency cloning of full-length cDNA. Mol. Cell. Biol. 2 (1982) 161-170.
Okubo, K., Hori, N., Matoba, R., Niiyama, T., Fukushima, A., Kojima, Y. and Matsubara, K.: Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet. 2 (1992) 173-179.
Pruitt, S.C.: Expression vectors permitting cDNA cloning and enrichment for specific sequences by hybridization/selection. Gene 66 (1988) 121-134.
Sekine, S. and Kato, S.: Synthesis of full-length cDNA using DNA-capped mRNA. Nucleic Acids Res. Symp. Ser. No. 29 (1993) 143-144.
Sikela, J.M. and Auffray, C.: Finding new genes faster than ever. Nature Genet. 3 (1993) 189-191.
Wool, I.G., Endo, Y., Chan, Y.-L. and Gluck, A.: Structure, function, and evolution of mammalian ribosomes. In: Hill, W.E., Dahlberg, A., Garrett, R.A., Moore, P.B., Schlessinger, D. and Warner, J.R. (Eds.), The Ribosome: Structure, Function, & Evolution. Am. Soc. Microbiol., Washington, DC, 1990, pp. 203-214.
Yokoyama-Kobayashi, M. and Kato, S.: Recombinant f1 phage particles can transfect monkey COS-7 cells by DEAE dextran method. Biochem. Biophys. Res. Commun. 192 (1993) 935-939.