METHODS: A Companion to Methods in Enzymology Vol. 2, No. 1, February, pp. 32-41, 1991
Cloning Multigene Families with Degenerate PCR Primers
Thomas M. Wilkie and Melvin I. Simon
Biology Division, California Institute of Technology, Pasadena, California 91125
Multigene families often exhibit regions of highly conserved amino acid sequence that are characteristic of a given family. Degenerate PCR primers that target these motifs will selectively amplify members within that multigene family. Members of the family that are expressed in a particular tissue can be amplified from cDNA. The PCR products can be easily cloned, screened, and sequenced over a precise region to rapidly identify novel genes. PCR with degenerate primers can also be used to screen the expression of these genes in various tissues and cell types. © 1991 Academic Press, Inc.
Growth and development are complex processes that are controlled, in part, by genes that constitute various multigene families. It is likely that many developmental processes are regulated by as yet unknown genes that are also members of multigene families. Various studies have employed degenerate oligonucleotide primers in the PCR to clone new genes within these families, including protein kinases (9, 37), seven-transmembrane receptors (15), G proteins (4, 32), ion channels (9), and putative transcription factors such as the POU (7) and helix-loop-helix proteins (8). Because the genetic code is conserved, the same degenerate PCR primers can also be used to clone genes within a particular family from distantly related organisms. For example, G protein α-subunits from Drosophila (31), Caenorhabditis elegans (16), Dictyostelium (J. Hadwiger, personal communication), Arabidopsis (17), and Neurospora (K. Borkovich, personal communication) have been cloned using the same degenerate PCR primers that were originally used to clone several novel mouse Gα genes (32). PCR cloning is clearly a powerful technique that may reveal novel genes playing central roles in the biological processes of a variety of organisms.

We have endeavored to clone novel members of the tyrosine kinase receptor subfamily that are expressed in male mouse germ cells. The detailed descriptions of the methods that we used to identify these genes are generally applicable to any multigene family that exhibits blocks of conserved amino acid motifs. We discuss how to choose a target sequence for PCR amplification, how to design degenerate PCR primers, how to bias the PCR primers toward particular subfamilies, and how to screen clones to identify those genes that are most abundant in certain tissues or cell types. Following these protocols, we have cloned PCR fragments from 19 distinct genes that are members of the tyrosine kinase subfamilies. At least 6 of the genes have not been previously identified in any organism, and 5 genes are not closely related to any of the recognized tyrosine kinase subfamilies.

MATERIALS AND REAGENTS

Enzymes
M-MLV reverse transcriptase: BRL or Pharmacia.
AmpliTaq: Perkin-Elmer/Cetus.
Restriction enzymes: New England Biolabs.
T4 DNA ligase: New England Biolabs.
T4 DNA kinase: New England Biolabs.
Multiprime radiolabeling kit: Amersham or Stratagene.

Chemicals and Buffers
Reverse transcriptase buffer (5X): 250 mM Tris-HCl (pH 8.3), 375 mM KCl, 50 mM DTT, 15 mM MgCl2.
Random hexanucleotides [pd(N)6]: Pharmacia.
Deoxynucleotides (dATP, dCTP, dTTP, dGTP): Pharmacia.
Taq polymerase buffer (10×): 100 mM Tris-HCl (pH 8.3), 500 mM KCl, 15 mM MgCl2, 0.01% (w/v) gelatin.
Oligonucleotide primers for PCR: Prepared on an ABI DNA synthesizer.
NuSieve agarose: FMC.
E buffer (10X): 400 mM Tris-HCl, 330 mM Na acetate, 10 mM EDTA, 12.5% glacial acetic acid (pH 7.9).
Restriction enzyme mix: 2 µl 5× buffer (manufacturer's specifications), 0.4 µl enzyme 1, 0.4 µl enzyme 2, 5.2 µl dH2O.
Ligation mix: 1.5 µl 10X ligase buffer, 1.5 µl dH2O, 0.75 µl 10 mM ATP, 0.75 µl T4 DNA ligase.
Ligase buffer (10X): 500 mM Tris-HCl (pH 7.8), 100 mM MgCl2, 20 mM DTT, 10 mM ATP.
1046-2023/91 $3.00. Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
Cloning vector (such as Bluescript KS II): Stratagene.
SOC medium: LB, 0.4% glucose, 10 mM MgCl2, 10 mM MgSO4.
Denaturing solution: 0.5 M NaOH, 1.5 M NaCl.
Neutralizing solution: 0.5 M Tris-HCl, 1.5 M NaCl (pH 7.5).
STET: 8% sucrose, 5% Triton X-100, 50 mM Tris-HCl, pH 8.0, 50 mM EDTA.
TE: 10 mM Tris-HCl, 0.25 mM EDTA (pH 7.5).
Sequencing reagents (Sequenase sequencing kit or order separate reagents): U.S. Biochemical.
Sequencing primers (M13 universal, M13 reverse, KS, and SK if using the Stratagene Bluescript KS II cloning vectors): Stratagene.
Sequencing S-mix: 2 µl sequencing buffer, 1 µl 0.1 M DTT, 0.5 µl labeling mix (dGTP or dITP), 0.5 µl [35S]dATP, 0.25 µl Sequenase version 2.0, 0.5 µl DMSO.
Sequencing buffer (10×): 200 mM Tris-HCl (pH 7.5), 100 mM MgCl2, 250 mM NaCl.
[35S]dATP > 1000 Ci/mmol: Amersham.
Sequencing stop solution: 95% formamide, 20 mM EDTA, 0.05% bromphenol blue, 0.05% xylene cyanol.
GeneScreen nylon filter: DuPont.
Equipment
Positive displacement pipets: Baxter Scientific Products (SMI Digitron), Gilson (Microman M-25).
Microfuge: Eppendorf.
Speedvac.
Agarose gel electrophoresis equipment: Lab shop.
Sequencing gel electrophoresis equipment: BRL or Bio-Rad.
PCR machine: Perkin-Elmer/Cetus or Ericomp.
Flat-bottom and round-bottom microtiter dishes: Falcon.
Optional
GeneAmp reaction tubes: Perkin-Elmer/Cetus.
Electronic digital repeat pipet: Rainin (EDP™ motorized microliter pipets EDP-25, EDP-100).
Electroporation equipment: Bio-Rad.
METHODS

Design of Degenerate PCR Primers

The PCR requires two oligonucleotide primers that are complementary to opposite strands of the target sequence (19, 26, 28). To identify new genes within a multigene family, at least two regions of highly conserved amino acid sequence should be common to most members of the family. The amino acid sequence of the conserved motifs dictates the DNA sequence of the degenerate oligonucleotides that serve as PCR primers. The proper orientation of the primers is critical to the PCR: the "sense" primer should target the upstream conserved motif, and the opposing "antisense" primer should be complementary to the coding strand of the downstream motif. These conserved motifs should be at least 5 amino acid residues long and should be separated from each other by at least 20 amino acids. When cDNA is used as the starting template, it is advantageous for the primers to flank an intron; a product amplified from genomic DNA that contaminated the original RNA sample would then retain the intron and differ in size, which reduces the chance of cloning it. To distinguish the targeted sequences from nonspecific PCR products that might be cloned, the amino acid sequence between the conserved motifs must be diagnostic for that multigene family.

The PCR requires that the target sequence within the degenerate primers be at least 15 nucleotides long, but it may contain up to 27 or 30 nucleotides. This translates to a conserved motif of at least 5 amino acid residues, but the best primers are generally derived from 6 to 8 amino acids of uninterrupted identity. Longer primers are useful when two blocks of 2 to 4 highly conserved amino acids are separated by 2 or 3 residues that are less well conserved.

The DNA sequence of a degenerate primer can be designed to include all possible nucleotide sequences coding for the amino acids at the primer binding site. This approach is recommended for amino acid motifs that are identical in every known member of the multigene family. If the motifs are not highly conserved, the degeneracy of the primers can become too great to support the PCR. When many members of a multigene family are known, degeneracy can be reduced by synthesizing sets of primers that target specific classes within the family. Another approach is to synthesize primers containing a consensus sequence derived from the nucleotide sequences of the known genes across the conserved region (15, 38).
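The back-translation step described above can be sketched in Python. This is a minimal set of hypothetical helpers (the function names are not from the original work), assuming the standard genetic code and one IUPAC mixed-base letter per codon position, which is how a degenerate oligonucleotide is usually specified for synthesis. It reports two numbers: the count of true coding sequences for the motif and the number of distinct oligonucleotides in the IUPAC mixture. The second can be larger, because six-codon amino acids such as Leu and Arg collapse to YTN and MGN, which also encode Phe and Ser.

```python
from itertools import product
from math import prod

# Standard genetic code: codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC letter for every set of nucleotides that can occupy one position.
IUPAC = {
    frozenset(bases): code
    for bases, code in [
        ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
        ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
        ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
    ]
}
IUPAC_SIZE = {code: len(bases) for bases, code in IUPAC.items()}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def codons_for(aa: str) -> list[str]:
    """All codons that encode one amino acid (one-letter code)."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def codon_degeneracy(motif: str) -> int:
    """Number of distinct coding sequences for the peptide motif."""
    return prod(len(codons_for(aa)) for aa in motif)


def back_translate(motif: str) -> str:
    """Sense-strand degenerate oligo, one IUPAC letter per codon position."""
    letters = []
    for aa in motif:
        codons = codons_for(aa)
        for pos in range(3):
            letters.append(IUPAC[frozenset(c[pos] for c in codons)])
    return "".join(letters)


def iupac_degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer mixture."""
    return prod(IUPAC_SIZE[b] for b in primer)


def reverse_complement(primer: str) -> str:
    """Antisense primer: complement of the coding strand, written 5' to 3'."""
    return primer.translate(COMPLEMENT)[::-1]


sense = back_translate("HRDLAARN")  # motif VI, the target of CT73
print(sense, codon_degeneracy("HRDLAARN"), iupac_degeneracy(sense))
# prints: CAYMGNGAYYTNGCNGCNMGNAAY 27648 65536
```

For an antisense primer, back-translate the downstream motif and pass the result to reverse_complement.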
[Figure 1: (A) the 11 conserved motifs (I-XI) of the protein kinase catalytic domain, with arrows marking primers CT73 (motif VI) and CT74 (motif IX); (B) degenerate primers CT73 (sense, from HRDLAARN, 5' BamHI site) and CT74 (antisense, from DVWS(F/Y)G(V/I), 5' EcoRI site)]
FIG. 1. Design of degenerate PCR primers. (A) The positions of the 11 conserved amino acid motifs within the catalytic domain of protein kinases are indicated by roman numerals. The arabic numerals refer to the position of the central amino acid of each domain in the chicken src protein (31). The arrows mark the position and orientation of the degenerate PCR primers CT73 and CT74 (shown in B) that were used to amplify and clone novel members of the tyrosine kinase subfamilies. (B) CT73 and CT74 were synthesized in the sense and antisense orientation, respectively. The DNA sequence of the PCR primers was derived from the consensus amino acids in boldface type above CT73 and below CT74. Degenerate positions in the nucleotide sequence are indicated by either two nucleotides or an inosine. The restriction endonuclease sites BamHI and EcoRI used for cloning the PCR products are underlined. DNA polymerization during the PCR extends from the 3' end of either primer in the direction of the arrows.

Degeneracy can be further reduced if inosine is used at any nucleotide position that has three- or fourfold degeneracy (15). We have analyzed the cDNA sequence of several genes at the PCR primer binding site and have not yet detected a bias in inosine pairing to the starting template. We found that guanine specifically replaces inosine in the DNA sequence of the cloned PCR fragments, suggesting that Taq polymerase incorporates cytosine opposite inosine during the PCR. In at least one study, only the degenerate primers that contained inosine at degenerate positions amplified the target sequence (8).

Degenerate primers may be synthesized to include restriction enzyme sites on their 5' ends. This provides the obvious advantage that the PCR products may be directionally cloned. In our experience, BamHI and EcoRI have always performed well, but HindIII, SalI, and PstI have also been used (15, 28). Adding two nucleotides (G or C) beyond the restriction site at the 5' end of each primer facilitates enzyme recognition. One disadvantage of using restriction enzyme sites to directionally clone the PCR products arises if the recognition sequences are also present within the target DNA between the primers. In this case, the entire sequence between the PCR primers can be obtained by ligating the PCR fragments directly into a blunt-end cloning site (SmaI or EcoRV) in the vector. If this approach is taken, it is recommended that the cloning site in the vector be dephosphorylated with alkaline phosphatase to reduce vector reclosure. For ligation into a dephosphorylated vector it is essential to phosphorylate either the degenerate primers or the PCR products with polynucleotide kinase (27). Taq polymerase frequently adds an adenine nucleotide to the 3' end of duplex DNA fragments (3). Therefore, prior incubation of the PCR products with the Klenow fragment of DNA polymerase I will increase the efficiency of blunt-end ligations. PCR products can also be prepared for blunt-end ligation with T4 DNA polymerase (14).

Degeneracy at the 3' ends of PCR primers is best held to a minimum (30). If possible, the last one or two amino acids should be conserved among all family members in the targeted region. In addition, the PCR primers usually do not include degeneracy at the position of the 3' nucleotide. This is partly for expediency: oligonucleotides are synthesized from 3' to 5', starting from a 3' nucleotide that is coupled to the column. One or two inosine residues can be used at the 3' end if this position must be degenerate, although inosine-coupled columns are not yet commercially available and degenerate 3' ends greatly reduce the efficiency of PCR amplification.
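The inosine substitution, the 5' restriction-site tail, and the limit on 3'-end degeneracy can likewise be treated as string operations on an IUPAC-encoded primer. The following is a self-contained sketch under assumptions made only for this illustration: inosine (I) is counted as a single species at its position, the two-base 5' clamp defaults to "CG" (the text specifies only G or C), and internal_sites is a naive scan for a restriction site that would also be cut inside the expected insert. None of these helpers is taken from the original work.

```python
from math import prod

# Bases represented by each IUPAC code. Inosine (I) is counted as one species,
# because every molecule in the mixture carries I at that position.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
              "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
              "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the primer mixture."""
    return prod(IUPAC_SIZE[b] for b in primer)


def add_inosine(primer: str) -> str:
    """Replace every three- or fourfold degenerate position with inosine."""
    return "".join("I" if IUPAC_SIZE[b] >= 3 else b for b in primer)


def trim_3prime(primer: str) -> str:
    """Drop trailing positions until the 3' nucleotide is a single defined base."""
    end = len(primer)
    while end > 0 and (primer[end - 1] == "I" or IUPAC_SIZE[primer[end - 1]] > 1):
        end -= 1
    return primer[:end]


def add_5prime_tail(primer: str, site: str, clamp: str = "CG") -> str:
    """Prepend a restriction site plus two extra bases (the clamp is an assumption)."""
    return clamp + site + primer


def internal_sites(insert: str, site: str) -> list[int]:
    """0-based positions where the site also occurs inside the expected insert."""
    return [i for i in range(len(insert) - len(site) + 1) if insert.startswith(site, i)]


core = add_inosine("CAYMGNGAYYTNGCNGCNMGNAAY")  # fully degenerate HRDLAARN
core = trim_3prime(core)
print(core, degeneracy(core))          # CAYMGIGAYYTIGCIGCIMGIAA 32
print(add_5prime_tail(core, "GGATCC"))  # BamHI site on the 5' end
```

In this example, trimming removes the final Y of the asparagine codon, so the primer ends in AA with a defined 3' nucleotide; the five remaining twofold positions leave a 32-fold mixture.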
Design of Degenerate PCR Primers Targeting Tyrosine Kinase Receptors

Most of the suggestions given above for designing degenerate oligonucleotide primers were applied to cloning members of the tyrosine kinase multigene family. The search for PCR primer binding sites focused on the catalytic domain because it is generally conserved among all
protein kinases. Within the catalytic domain there are 11 amino acid motifs that are highly conserved (Fig. 1A). The motifs best suited to serve as PCR primer sites occur at regions I, VI, VII, VIII, and IX [their amino acid sequences are found in Ref. (6)]. We chose the sequences HRDLAARN within motif VI and DVWS(F/Y)G(V/I) within motif IX to synthesize the sense and antisense oligonucleotide primers, CT73 and CT74, respectively (Fig. 1B). These sequences were chosen because they were the most highly conserved motifs among the tyrosine kinase receptors and they were not expected to amplify members of the serine/threonine protein kinase subfamily. Note that these primers would also be expected to amplify genes within the Abl class of tyrosine kinases, which are not transmembrane proteins; this was tolerated in an effort to include as many potential receptors as possible among the PCR products.

Degeneracy in the PCR primers CT73 and CT74 was held to a minimum toward their 3' ends. For example, motif IX includes a serine (immediately preceding DVWS) that was omitted from the 3' end of CT74 because serine codons are highly degenerate. Of lesser importance, but perhaps influential to the distribution of clones that were obtained, the serine position within CT74 was assigned only the "AGN" anticodons. Toward the 5' end, CT74 includes two positions of amino acid degeneracy because either phenylalanine or tyrosine and either valine or isoleucine were found in motif IX among the tyrosine kinase receptors. The DNA sequences of CT73 and CT74 also included inosines at all nucleotide positions with three- and fourfold degeneracy. An additional feature of these primers is that regions VI and IX are separated by about 60 amino acids containing two regions of amino acid conservation (motifs VII and VIII) diagnostic for protein kinases (Figs. 1A and 2). Although it is important to include diagnostic sequence between the PCR primers, the presence of highly conserved motifs could enhance template crossover during the PCR.

[Figure 2: predicted amino acid sequences of the YK clones between primers CT73 and CT74 (aligned to residues 392-443 of chicken src), grouped by subfamily; the number of clones identified is given in parentheses]
YK2  (3)   VLVSSNDCVKLGDFGLSRYME.DSTYYKA.SKGKLPIKWMAPESINFRRFTSAS

YK19 (2)   CLVGENHLVKVADFGLSRLMT.GDTYTAH.AGAKFPIKWTAPESLAYNKFSIKS
YK6  (5)   CLVGENHVVKVADFGLSRLMT.GDTYTAH.AGAKFPIKWTAPESLAYNTFSIKS
YK17 (2)   CLVGENHVVKVADFGLSRLMT.GDTYTAH.AGAKFPIKWTAPESLAYNTFSIKS

YK9  (3)   CMVAEDFTVKIGDFGMTRDIY.ETDYYRKGGKGLLPVRWMSPESLKDGVFTTHS
YK1  (4)   CMLNENMSVCVADFGLSKKIY.NGDYYRQGRIAKMPVKWIAIESLADRVYTSKS
YK5  (5)   CMLAEDMTVCVADFGLSRKIY.SGDYYRQGCASKLPVKWLALESLADNLYTVHS

YK4  (22)  VLLAQGKIVKICDFGLARDIMHDSNYVSK.GSTFLPVKWMAPESIFDNLYTTLS
YK16 (1)   VLICEGKLVKICDFGLARDIMRDSNYISK.GSTFLPLKWMAPESIFNSLYTTLS
YK10 (2)   ILLSESDIVKICDFGLARDIYKDPDYVRK.GSARLPLKWMAPESIFDKVYTTQS
YK15 (1)   ILLSENNVVKICDFGLARDIYKNPDYVRR.GDTRLPLKWMAPESIFDKVYSTKS
YK18 (18)  ILVAEGRKMKISDFGLSRDVYEEDSYVKK.SKGRIPVKWMAIESLFDHIYTTQS
YK8  (5)   VLVTEDNVMKIADFGLARD.IHHIDYYKKTTNGRLPVKWMAPEALFDRIYTHQS
YK11 (6)   VLVTENNVMKIADFGLARD.INNIDYYKKTTNGRLPVKWMAPEALFDRVYTHQS

YK3  (2)   ILVGENYIAKIADFGLSRG...QEVYVKK.TMGRLPVRWMAIESLNYSVYTTNS
YK7  (6)   CVIDDTLQVKITDNALSRDLF.PMDYHCLGDNENRPVRWMALESLVNNEFSSAS
YK13 (3)   VLVESEHQVKIGDFGLTKAIETDKEYYTVKDDRDSPVFWYAPECLIQCKFYIAS
YK14 (1)   ILVENENRIKIGDFGLTKVLPQDKEYYKVKEPGESPIFWYAPESLTESKFSVAS
YK12 (1)   VLLDNDRLVKIGDFGLAKAVPEGHEYYRVREDGDSPVFWYAPECLKECKFYYAS
Consensus (identity in 18 of 19 sequences): D, F, G, L, Y, P, W, E
FIG. 2. Predicted amino acid sequence from 19 tyrosine kinase clones. The amino acid sequence of each YK clone was present in a contiguous open reading frame from the BamHI site in CT73 to the EcoRI site in CT74, but only the sequence between the primers is shown. The clone name and the number of clones identified by DNA sequence are indicated to the left of each sequence. The clones are grouped by subfamily according to homology scores obtained from a TFASTA search (20) of the GenBank nucleotide sequences. Five clones, YK3, 7, 12, 13, and 14, are only distantly related to the recognized tyrosine kinase subfamilies. Homologues of the YK clones that have been previously identified in other organisms are indicated by gene name and reference; an asterisk indicates that it has been cloned from mouse (a, 29; b, 13; c, 36; d, 21; e, 2; f, 5; g, 34; h, 25; i, 10; j, 37; k, 12). A consensus position was defined by amino acid identity in 18 of the 19 sequences shown; the sequences are aligned between amino acids 392 and 443 in the chicken src protein (35). Boldface type in the tyrosine kinase sequences indicates agreement with the consensus. The conserved motifs VII and VIII are located by the double line.

Template crossover is a PCR artifact that joins two different sequences, ABC and αβγ, at a region of local sequence identity (B and β) to produce a novel PCR fragment of sequence ABγ. Template crossover is stimulated under conditions of incomplete extension from the primer, as occurs during the plateau phase of the PCR. We did not detect template crossover in any of the YK clones, but it occurred in about 2% of the Gα-subunit PCR fragments (32). In one case, sequence identity between two genes in 10 contiguous nucleotides was sufficient to achieve template crossover and, in another case, 16 of 17 nucleotides were identical in the two participating sequences. However, two sequences that are identical through a portion of the PCR fragment and then diverge do not necessarily indicate a PCR artifact. One set of clones with this characteristic proved to be the product of alternative splicing in the Go α-subunit gene (33). These cDNAs were cloned because the PCR primers flanked an absolutely conserved intron in the Gα multigene family that participated in alternative splicing in the Go gene. Although the PCR primers CT73 and CT74 also flank introns present in mammalian tyrosine kinase genes (1, 13, 18), we did not find alternative splice products.
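A quick way to flag pairs of family members that share enough local identity to support template crossover is to find the longest identical stretch between them. The sketch below uses a standard longest-common-substring dynamic program; the function names are hypothetical, and the 10-nt default threshold is borrowed from the single crossover example described above, so it is an assumption rather than a general rule.

```python
def longest_shared_run(a: str, b: str) -> tuple[int, int, int]:
    """Longest identical contiguous stretch shared by two sequences.

    Returns (length, start_in_a, start_in_b). Classic longest-common-substring
    dynamic programming: O(len(a) * len(b)) time, two rows of memory.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def crossover_candidate(a: str, b: str, min_run: int = 10) -> bool:
    """Flag sequence pairs whose longest shared run reaches min_run nucleotides."""
    return longest_shared_run(a, b)[0] >= min_run


a = "AAACGTACGTTT"
b = "GGGCGTACGCC"
print(longest_shared_run(a, b), crossover_candidate(a, b))
```

Because crossover also occurred at 16 of 17 identical nucleotides, an exact-match search like this one is only a first screen; near-identical stretches would need an alignment that tolerates mismatches.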
cDNA Synthesis (First Strand Only)

The PCR requires single- or double-strand DNA as a starting template to amplify a target sequence. To clone members of a multigene family that are expressed in a specific tissue, it is most convenient to use single-strand cDNA prepared from that tissue as the input template. cDNA may be synthesized from poly(A) or total RNA. Synthesis of the first strand of cDNA may be primed from an oligonucleotide specific to a given multigene family, from oligo(dT), or from random hexadeoxynucleotide primers. The latter is by far the most versatile, because cDNA from a single reaction may be used for PCR amplification of any target gene(s), irrespective of whether the PCR primers are located toward the 3' or 5' end of the sequence. The tyrosine kinase clones shown in Fig. 2 were amplified from cDNA synthesized using random primers according to the following protocol:

2-5 µg total RNA from cells or tissues of interest
1 µg random hexanucleotides [pd(N)6]
1 mM each dNTP
1× reverse transcriptase buffer
400 u M-MLV reverse transcriptase
50-µl reaction volume
Incubate 90 min at 37°C.
Store at -20°C.
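Assembling the 50-µl reverse transcription reaction is a C1V1 = C2V2 calculation. In this sketch the stock concentrations (RNA and random hexamers at 1 µg/µl, dNTPs at 10 mM each, M-MLV at 200 u/µl) are illustrative assumptions, not values from the protocol; only the 5X buffer and the final amounts come from the recipe above, and the function names are hypothetical.

```python
def stock_volume(stock_conc: float, final_conc: float, total_ul: float) -> float:
    """C1 * V1 = C2 * V2 solved for V1 (stock and final in the same units)."""
    return final_conc * total_ul / stock_conc


def rt_reaction(total_ul: float = 50.0, rna_ug: float = 5.0) -> dict[str, float]:
    """Pipetting volumes (ul) for the random-primed first-strand reaction."""
    # Assumed stock concentrations (illustrative only; use your reagents' values).
    rna_ug_per_ul = 1.0
    hexamer_ug_per_ul = 1.0
    dntp_mm_each = 10.0
    rt_units_per_ul = 200.0

    volumes = {
        "5X RT buffer": stock_volume(5.0, 1.0, total_ul),    # 1X final
        "dNTPs": stock_volume(dntp_mm_each, 1.0, total_ul),  # 1 mM each final
        "random hexamers": 1.0 / hexamer_ug_per_ul,          # 1 ug per reaction
        "total RNA": rna_ug / rna_ug_per_ul,                 # 2-5 ug per reaction
        "M-MLV RT": 400.0 / rt_units_per_ul,                 # 400 u per reaction
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["water"] = water
    return volumes


for component, ul in rt_reaction().items():
    print(f"{component:16s}{ul:6.1f} ul")
```

With these assumed stocks, 23 µl of components leave 27 µl of water; the guard raises an error when dilute stocks would overflow the reaction volume.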
PCR Amplification of Multigene Families

Optimal conditions for the use of degenerate primers in the PCR must be determined empirically for each primer pair. In general, the greatest diversity of clones within a multigene family is obtained by using the lowest annealing temperature that still yields only a discrete band(s) of the expected size. The optimal annealing temperature will vary with the length of the target sequence and the ratio of A+T to G+C content among the degenerate oligonucleotides. In general, annealing temperatures between 37 and 45°C are recommended for target sequences of 14-17 nucleotides; for target sequences of 24-27 nucleotides, try 50 to 65°C. If the annealing temperature is too low, nonspecific DNA fragments may appear as a band(s) of unexpected size or as an intense background smear after electrophoresis through an agarose gel.

It is worth spending some time testing various PCR conditions before cloning and sequencing the PCR products. Suboptimal conditions will restrict the diversity and decrease the proportion of clones that are members of the targeted multigene family; in the worst case, most or all of the clones may be unrecognizable, nonspecific sequences. If the background is unacceptably high, first check that during the denaturing step the sample temperature is at least 92°C but not more than 94°C; temperatures above 94°C may denature Taq polymerase after several cycles, thus inhibiting DNA amplification. To suppress nonspecific DNA amplification, try increasing the annealing temperature and perhaps shortening the time that the samples are held there. If the background persists, try reducing the concentration of the PCR primers. If all else fails, redesign the PCR primers to reduce their degeneracy, particularly at the 3' ends.
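To see how the A+T/G+C composition of a degenerate primer spreads its melting temperature, the rough Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, can be applied to the most A/T-rich and the most G/C-rich members of the mixture. This is a swapped-in rule of thumb, not a method from the text; it is only meaningful for short oligonucleotides, and scoring inosine like an A·T pair is an arbitrary assumption of this sketch. The result brackets the starting point for the empirical annealing tests described above rather than replacing them.

```python
# Bases represented by each IUPAC code.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
               "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def wallace_tm_range(primer: str) -> tuple[int, int]:
    """(lowest, highest) Wallace-rule Tm in degrees C over all primer species."""
    low = high = 0
    for b in primer.upper():
        if b == "I":
            # Assumption: score inosine like a weak A.T pair in both bounds.
            low += 2
            high += 2
            continue
        bases = set(IUPAC_BASES[b])
        low += 2 if bases & {"A", "T"} else 4   # pick A/T whenever the position allows
        high += 4 if bases & {"G", "C"} else 2  # pick G/C whenever the position allows
    return low, high


# A hypothetical 14-nt inosine-containing primer for the motif DLAAR.
print(wallace_tm_range("GAYYTIGCIGCIMG"))
```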
The PCR is extremely sensitive; it is therefore imperative to avoid even minute contamination of the PCR reagents with DNA that contains sequence complementary to the primers. To avoid contamination, add the PCR template(s) only after all other aqueous components have been added to the reaction vessels. In addition, positive displacement pipets can be used in all procedures for the PCR. To verify that reagents are clean, routinely set up a control reaction in which no template DNA is added, and other controls in which only one of the two PCR primers is added. When loading the PCR products onto an agarose gel for electrophoresis, always leave an empty well between two different samples. If the isolated PCR products are to be reamplified, form the agarose gels on clean gel plates and, if contamination is a problem, soak the plates in 0.1 M HCl and rinse with dH2O before pouring the gel. After the agarose gel has been stained, always place it on plastic wrap rather than directly onto a uv transilluminator that is in general use.
The conditions given below were used with the degenerate PCR primers CT73 and CT74 (Fig. 1B) to amplify the tyrosine kinase clones shown in Fig. 2. These conditions gave a discrete band of the expected size with little or no background (Fig. 3A). The same conditions were also used to obtain a variety of G protein α-subunits using two different sets of PCR primers specific to that family (32). To PCR-amplify cDNA, use:

1 µl cDNA as prepared above
9 µl PCR mix:
100 ng each primer (see Design of Degenerate PCR Primers)
200 µM each dNTP
1× Taq DNA polymerase buffer
1 u AmpliTaq
10-µl reaction volume

35 cycles:
1 min 94°C (denature)
1.5 min 37-65°C (anneal)
2 min 72°C (extend)
Follow the 35 cycles with a 10-min extension at 72°C. All reaction volumes can be scaled up if desired.

A second round of PCR is generally recommended to increase the amount of DNA available for cloning and subsequent screening protocols. This also provides an opportunity to separate the multiple bands that are expected products of some multigene families.

1. Load 5 µl of each first-round PCR reaction in every second well of a 3% NuSieve agarose minigel submerged in 0.5X E buffer. DNA fragments between 100 and 1000 bp can be resolved in a 3% NuSieve gel.
2. Visualize the amplified DNA by submerging the agarose gel in an ethidium bromide solution (10 ng/ml) for 3 min at room temperature. Rinse the gel with water a few times to remove excess ethidium bromide and illuminate the DNA-ethidium bromide complex with a long-wavelength uv source. Short-wavelength uv illuminators will nick the DNA unless it is shielded by several layers of plastic wrap. Use a single-edge razor blade to excise the band of interest, taking care to trim away excess agarose that does not contain DNA. Use extreme caution that the DNA fragments are not cross-contaminated while manipulating these small pieces of agarose. Place the isolated cubes of agarose into separate 0.75-ml microfuge tubes.
3. Melt the agarose plug at 68°C for 2 min.
4. Add 1 µl of the DNA contained in the molten agarose to 19 µl of PCR mix.
5. Reamplify the DNA according to the conditions that were used in the first round of amplification:
1 µl DNA (in NuSieve agarose, heated to 68°C for 3 min)
19 µl PCR mix:
200 ng each primer
200 µM each dNTP
1× Taq buffer
2 u AmpliTaq
20-µl reaction volume
35 cycles as above

[Figure 3: (A) mouse tissues and (B) germ cell types; ethidium bromide (EtBr)-stained PCR products and Southern blot signals for YK19 and YK18]
FIG. 3. Tissue and cell-type distribution of PCR products from degenerate primers. The degenerate PCR primers CT73 and CT74 were used to PCR-amplify tyrosine kinase genes present in cDNA made from various mouse tissues (A) and germ cell types (B). Ethidium bromide (EtBr) staining demonstrates that amplified products are present in each lane at approximately 205 bp. A Southern blot of the amplified products was hybridized with a radiolabeled PCR fragment of YK19 and a radiolabeled oligonucleotide specific to YK18 (ATCTTGGTGGCTGAGGGACGG).

Cloning the Amplified PCR Products

The PCR-amplified DNA can be easily cloned into any one of a variety of bacterial plasmid vectors that are suitable for preparing double-strand templates for DNA sequencing. High-copy-number plasmids are recommended because they usually increase the yield of DNA template. However, the multiple cloning sites (MCS) in these vectors are usually engineered into the α-fragment of the lacZ gene. If expression of a lacZ/PCR fusion product were detrimental to the host, it could affect the cloning and sequencing of that PCR product. For this reason, it is recommended that a bacterial strain that expresses the lacIq repressor (e.g., JM101) be transformed, because expression of the lacZ transcription unit is suppressed in these strains. The protocol for cloning PCR products is as follows:
1. Isolate PCR products by electrophoresis through a NuSieve agarose gel in 0.5× E buffer. The second round of amplification should yield at least 100 ng of DNA.
2. Stain the gel with ethidium bromide, visualize the bands of DNA on a long-wave uv illuminator, and isolate each band into an individual 0.75-ml microfuge tube.
3. Melt the agarose cube(s) containing DNA by placing the microfuge tubes in a 68°C water bath for 2 min.
4. Add 2 µl of the molten agarose to 8 µl of the restriction enzyme mix (see Chemicals and Buffers). Mix the contents and place in a 37°C water bath for 1-2 h.
5. Remelt the agarose plug at 68°C for 5 min. Add 0.5 µl of the appropriately cut vector DNA (1-20 ng). Mix and heat at 68°C for an additional minute. Remove the microfuge tube from the water bath and immediately add 4.5 µl of ligation mix. Mix the contents and incubate at room temperature for 1-2 h.
6. Remelt the agarose plug at 68°C for 2 min. Add 1.7 µl 1 M NaCl and 30 µl 95% ethanol (room temperature). Mix the contents.
7. Spin tubes in the microfuge for 5 min. Aspirate the ethanol and wash the pellet twice with 70% ethanol (room temperature). Dry the pellet in a speedvac or desiccation chamber.
8. To dissolve the DNA, add 7.5 µl water to the dried pellet and heat at 68°C for 3 min.
9. Combine 5 µl of DNA and 40 µl of high-efficiency electrocompetent Escherichia coli (5 × 10⁸ colonies/µg supercoiled Bluescript KS+). Place on ice for 2 min. The cloning efficiency should be great enough to yield at least 200 colonies per ligation; electroporation generally yields thousands of transformants.
10. Electroporate and immediately add 800 µl SOC medium to the E. coli.
11. Incubate the cells for 15 to 60 min at 37°C without shaking.
12. Plate 200 µl of the SOC culture. If desired, the remaining transformed E. coli may be grown up in 5 ml of LB-amp broth (100 µg/ml). Aliquot 1 ml for storage at -70°C in 7.5% DMSO to retain a library of the transformants.
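The competence figure quoted in step 9 can be checked by electroporating a known amount of supercoiled plasmid and correcting for the fraction of the culture that is plated. The sketch below uses the volumes from steps 9-12 (5 µl DNA, 40 µl cells, 800 µl SOC, 200 µl plated); the 10-pg control and the 1200-colony count are hypothetical numbers chosen for illustration, and the function name is likewise hypothetical.

```python
def transformation_efficiency(colonies: int, dna_ug: float,
                              plated_ul: float, total_ul: float) -> float:
    """Colonies per microgram of DNA, corrected for the fraction of cells plated."""
    fraction_plated = plated_ul / total_ul
    return colonies / (dna_ug * fraction_plated)


# Volumes from the protocol: 5 ul DNA + 40 ul cells, then 800 ul SOC; 200 ul plated.
total_ul = 5 + 40 + 800
# Hypothetical control: 10 pg (1e-5 ug) of supercoiled plasmid giving 1200 colonies.
eff = transformation_efficiency(1200, 10e-6, 200, total_ul)
print(f"{eff:.2e} colonies/ug")
```

Ligated, agarose-embedded vector transforms far less efficiently than supercoiled plasmid, so this number is an upper bound on what a ligation will yield.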
Screening the PCR Clones

Electroporation of a ligation mix yields several thousand clones. A few of these clones can be picked at random and sequenced, but it is often more efficient to first screen the population and sort clones based on their tissue distribution as determined by their hybridization patterns. The tyrosine kinase clones that we obtained were passed through a differential screen of this type to sort the clones prior to DNA sequencing. Because differential screens require duplicate hybridization filters, the original transformants were randomly picked into individual wells of 96-well microtiter dishes, grown to stationary phase, and transferred to four replica filters with a multichannel pipet. The bacteria were lysed on the nylon filters and the DNA was cross-linked to the filters by uv irradiation. Three of the filters were hybridized with radiolabeled probes prepared by the Multiprime method from the PCR products of Sertoli cell, type A spermatogonium, and spermatid cDNA; the fourth filter was hybridized with the degenerate oligonucleotides CT73 and CT74, radiolabeled by T4 DNA kinase (27). Clones that hybridized to the PCR primers were assumed to contain an insert and were then categorized according to their patterns of hybridization to the PCR-product probes.

1. Aliquot 150 µl LB-amp culture medium into each well of a 96-well flat-bottom microtiter dish.
2. Pick isolated ampicillin-resistant colonies to inoculate individual wells in the microtiter dish.
3. Grow cultures until turbid (4-12 h) on a shaker platform at 37°C.
4. Use a multichannel pipet (8 or 12 channels) to transfer 10 µl of each culture in an 8 × 12 grid onto GeneScreen nylon filters that have been placed on top of a damp pad of 3MM paper soaked in 2X SSC.
5. When all 96 cultures have been transferred from the microtiter plate, float the nylon filter on the surface of denaturing solution for 5 min.
6. Float the filters on the surface of neutralizing solution for 5 min.
7. Rinse the filters submerged in 2X SSC.
8. Air-dry the filters and either bake them at 80°C for 1 h or uv-irradiate to covalently attach the DNA to the filter.
9. Hybridize the filters with the appropriate radiolabeled probes and wash according to conditions described by Sambrook et al. (27).
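The sorting step can be expressed as a small grouping function. The sketch below assumes each filter has already been scored as a yes/no call per well (how a spot is judged positive is left to the experimenter), uses the three germ cell PCR-product probes named above as keys, and discards clones that do not hybridize to the CT73/CT74 probe. The names sort_clones, WELLS, and PCR_PROBES are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Wells of a 96-well dish, in the 8 x 12 grid used for the replica filters.
WELLS = [f"{row}{col}" for row, col in product("ABCDEFGH", range(1, 13))]
PCR_PROBES = ("sertoli", "spermatogonia", "spermatid")


def sort_clones(signals: dict[str, dict[str, bool]]) -> dict[tuple[str, ...], list[str]]:
    """Group insert-containing clones by the PCR-product probes they hybridize to.

    `signals` maps a well name to yes/no calls read from the four filters:
    'primers' (CT73 + CT74 probe) plus one key per entry in PCR_PROBES.
    """
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for well in WELLS:
        calls = signals.get(well)
        if calls is None or not calls["primers"]:
            continue  # empty well, or no insert detected by the primer probe
        pattern = tuple(p for p in PCR_PROBES if calls[p])
        groups[pattern].append(well)
    return dict(groups)


example = {
    "A1": {"primers": True, "sertoli": False, "spermatogonia": True, "spermatid": False},
    "A2": {"primers": False, "sertoli": True, "spermatogonia": True, "spermatid": True},
    "B7": {"primers": True, "sertoli": False, "spermatogonia": True, "spermatid": False},
}
print(sort_clones(example))  # {('spermatogonia',): ['A1', 'B7']}
```

Grouping by the full on/off pattern keeps clones detected in only one cell type separate from those detected in several, which is the distinction the differential screen is meant to draw before sequencing.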
Rapidly Boiling the Miniprep to Isolate Plasmid DNA for Sequencing

Plasmid DNA can be easily prepared from transformants that contain candidate clones. The following protocol is recommended for bacterial strains DH5, JM101, and MC1061; it is not recommended for strains RR1, HB101, and JM109. We also found that it was difficult to obtain good DNA sequence from miniprep DNA if the cultures were grown too long.

1. Grow cultures in 2 ml of LB-amp for 8-12 h at 37°C (aeration is important; do not grow the cultures in sealed microfuge tubes).
2. Transfer 1.5 ml to a microfuge tube and spin in a microfuge for 25 s.
3. Pour off the supernatant, removing all residual LB.
4. Resuspend the cell pellet by vortexing in 300 µl STET.
5. Add 20-25 µl lysozyme made as a 10 mg/ml stock in water and kept frozen at -20°C. Immediately mix by inverting the tubes for no longer than 30 s.
6. Place tubes in boiling water for 2 min.
7. Spin 5 min in the microfuge.
8. Remove the mucoid pellet with a toothpick.
9. Add an equal volume (usually 250 µl) of 75% isopropanol/25% 10 M ammonium acetate to the supernatant, mix vigorously, and spin 5 min in the microfuge.
10. Rinse the pellet well with 70% ethanol and dry in a speedvac for 5 min.
11. Resuspend in 50 µl TE.
Sequencing Miniprep DNA
Other than the addition of DMSO to the sequencing reaction (S-mix), we follow the manufacturer's protocol for DNA sequencing. Prepare the S-mix (see Chemicals and Buffers) when you are ready to sequence the plasmid DNA. To anneal the sequencing primer to the template, combine 9 μl of the miniprep DNA, 1.1 μl of DMSO, and 1 μl of sequencing primer at 20-40 ng/ml, then boil for 5 min. Place on ice or store frozen at -20°C before sequencing. After the sequencing gel has been read, translate the DNA sequences of the PCR inserts in all six reading frames. The proper open reading frame is obvious if the conserved amino acids are present at their appropriate positions within the PCR sequence.
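The six-frame check described above can be sketched in Python using the standard genetic code. This is an illustrative helper, not the software used by the authors. The motif HRDLAARN, a conserved tyrosine kinase catalytic-domain sequence, is used here only as an example and is not necessarily the target of CT73 or CT74. The insert sequence is likewise hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; unreadable codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return {(strand, offset): protein} for all six reading frames."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(s[offset:])
    return frames


def frames_with_motif(seq, motif):
    """Frames whose translation contains the conserved amino acid motif."""
    return [frame for frame, protein in six_frames(seq).items() if motif in protein]


# Hypothetical insert encoding HRDLAARN in frame +1 (illustration only).
insert = "A" + "CACCGGGACCTGGCCGCCCGGAAC" + "GT"
print(frames_with_motif(insert, "HRDLAARN"))  # [('+', 1)]
```

Translating both strands matters because a PCR fragment can be ligated into the vector in either orientation.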
Screening Expression in Different Tissues and Cell Types
The same degenerate PCR primers that were initially used to amplify and clone novel members of a multigene family can also be used to screen the expression pattern of a given clone. All of the YK clones whose amino acid sequences are shown in Fig. 2 were originally amplified from male germ cell cDNA with the degenerate PCR primers CT73 and CT74. To assay the expression profile of these clones in a variety of germ cell types and somatic tissues, we PCR-amplified random-primed cDNA exactly as described above. Figure 3A shows the ethidium bromide-stained PCR fragments obtained from these tissues. Southern blots of the PCR-amplified cDNA were hybridized with radiolabeled probes prepared from individual clones by PCR amplification of the inserts (Fig. 3, YK19). A potential drawback of using the entire PCR fragment as a hybridization probe is that it may cross-react with other members of the same multigene family. An alternative is to hybridize the Southern blot with a specific oligonucleotide probe radiolabeled with [γ-32P]ATP by T4 DNA kinase (27; see Fig. 3, YK18).
1. Prepare random-primed cDNA from tissues and cell types as described above.
2. Amplify 1 μl of cDNA in a 10-μl PCR reaction for 30-35 rounds as described above.
3. Pour a 2% agarose gel with multiple combs spaced at least 2 cm apart. After amplification, set aside 2 μl of each PCR sample in 8 μl of 0.5X dye buffer for staining with ethidium bromide. Dilute the remaining 8 μl of each PCR sample about 20-fold: add 160 μl of 0.5X dye buffer and 100 μl of chloroform (to dissolve the mineral oil), vortex, and spin briefly. For Southern blotting, load all but one set of wells with 10 μl of the diluted PCR samples, and load the last set of wells with the samples to be stained with EtBr. Run the gel in 0.5X E buffer.
4. Remove the portion of the gel to be stained with EtBr, visualize the PCR products on a uv illuminator, and record the data on positive/negative film.
5. For Southern blotting, soak the remainder of the gel twice in denaturing solution, 5 min each time.
6. Rinse the gel once with neutralizing solution and then soak it in neutralizing solution for 10 min.
7. Place the gel upside-down onto a sheet of plastic wrap in a small pool of neutralizing solution. Remove all air bubbles from beneath the gel and aspirate or blot away excess buffer. Set up a standard Southern blot with a GeneScreen nylon filter.
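Comparing hybridization signals between tissues in this way rests on a simple property of exponential amplification. It can be sketched with the textbook model N = N0(1 + E)^n, where N0 is the starting template, E is the per-cycle efficiency, and n is the cycle number. The model is our assumption, not the authors' analysis. It holds only before the reaction plateaus, and the efficiencies below are hypothetical. When two templates share the same efficiency, their ratio does not depend on n. When one template matches the degenerate primers poorly (lower E), the ratio drifts with every cycle.

```python
def copies_after(n0, efficiency, cycles):
    """Exponential-phase PCR model: N = N0 * (1 + E) ** n."""
    return n0 * (1 + efficiency) ** cycles


def relative_abundance(n0_a, n0_b, eff_a, eff_b, cycles):
    """Ratio of product A to product B after a given number of cycles."""
    return copies_after(n0_a, eff_a, cycles) / copies_after(n0_b, eff_b, cycles)


# Same efficiency: a 10-fold difference in starting template stays 10-fold.
for n in (10, 20, 30):
    print(n, round(relative_abundance(100, 10, 0.9, 0.9, n), 3))

# Template B primes poorly (hypothetical lower efficiency): the ratio drifts.
for n in (10, 20, 30):
    print(n, round(relative_abundance(100, 10, 0.9, 0.7, n), 1))
```

The second loop is the reason that profiles for genes with divergent primer sites need independent confirmation.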
This technique is useful for rapidly screening the expression profiles of individual genes within a multigene family. The relative abundance of a given sequence in different tissues does not vary significantly with the number of rounds of PCR amplification; only its absolute level increases with each round until the PCR reaches saturation. This holds for PCR amplification with degenerate primers that simultaneously amplify many members of a given multigene family, such as the YK primers CT73 and CT74 or the Gα protein PCR primers (33). However, it holds only for those genes whose target sequences closely or exactly complement the sequence of the degenerate PCR primers. Genes with divergent motif sequences may not amplify proportionally in each PCR sample. Once candidate genes have been identified, their expression profiles should be confirmed by an independent assay.
RESULTS AND DISCUSSION
We identified PCR fragments of 19 different tyrosine kinase genes, whose amino acid sequences are shown in Fig. 2. We initially sorted 384 randomly picked clones, and all but 8 contained inserts. Of these, 96 were selected for DNA sequence analysis based on the results of differential hybridization to the PCR fragments amplified from Sertoli cell, type A spermatogonium, and spermatid cDNA. We obtained sequence from 92 clones using the M13 reverse primer, and their distribution is shown in Fig. 2. To determine the DNA sequences accurately, representative clones were selected to be sequenced using either the M13 universal or the reverse primer in separate reactions. The amino acid sequences of these PCR fragments were classified according to their homology scores following a TFASTA comparison search (23) of the GenBank DNA sequence files. The PCR primers CT73 and CT74 biased the amplification toward the tyrosine kinase receptors: 75% of the sequenced clones were most closely related to other genes in these subfamilies. None of the clones were members of the serine/threonine protein kinase subfamilies. Only YK2, which is related to src, was not a member of the targeted subfamilies (Abl and the tyrosine kinase receptors). Two Drosophila src homologues contain an arginine residue at the penultimate position of the CT73 motif (6). This may be a feature shared with some mammalian src homologues, such as YK2. Of the 19 sequences presented, at least 6 have not been previously identified in mammals, and 5 are not closely related to any of the recognized subfamilies of tyrosine kinase genes (6). This was not a comprehensive screen; for example, the c-kit gene, which is expressed in testis (20), was not obtained. PCR fragments of c-kit might have been lost before bacterial transformation, or cloned but overlooked during subsequent screens.
The distribution of clones that are obtained depends on three factors: their abundance in the cDNA, the pairing characteristics of the degenerate PCR primers in competition with all other potential cDNA templates, and other factors related to cloning. Under the conditions used, PCR amplification introduced apparently random base pair substitutions about once per 400-500 nucleotides. At this rate, a significant fraction of individual isolates may contain missense mutations. However, nucleotide substitutions usually occur in the later PCR cycles (11), so it is correspondingly rare to find two isolates of a cognate sequence that carry the same substitution. To minimize DNA sequence errors due to PCR mutagenesis, we sequenced pools of DNA templates from three to five individual isolates when available. The sequence of each isolate had first been determined using the M13 reverse primer. In this approach, a consensus DNA sequence is read on the gel. Both strands of each YK gene were sequenced using the M13 universal and reverse primers. This method is fairly accurate. For example, the DNA sequence of YK8 is identical to the cDNA sequence of the mouse basic fibroblast growth factor receptor (bFGFR; 25), and YK11 is identical to the bek cDNA (10). Further evidence for the fidelity of PCR cloning comes from the DNA sequences of the PCR fragments of the G protein α-subunits (36), which also agreed with the mouse cDNA clones that we isolated (31, 33).
The accuracy of these DNA sequences is encouraging. It is interesting that YK6 and YK17 share the same amino acid sequence even though their DNA sequences differ at 4 nucleotides. None of the other YK clones had more than 1 nucleotide difference from their cognate sequence. We therefore propose that YK6 and YK17 are independent PCR isolates of two different genes. Two other PCR clones, YK13 and YK14, have been independently obtained from mouse (36). The DNA sequence of YK14 has a single nucleotide change that is presumably a PCR artifact fixed in the one clone that we sequenced. YK13, however, was sequenced from three separate clones and has 3 nucleotide differences from the PCR clone FD22 (38). These differences might be attributable to DNA polymorphisms in the two sources of cDNA. Among the other YK clones that were previously identified in other species, only YK9, YK12, and YK18 differ, by one amino acid substitution each, from their most closely related homologues: the human IGF1R, TYK2, and Ret genes, respectively (36, 12, 34). The amino acid sequence in this region of the catalytic domain is highly conserved. For example, the sequence of YK11 is identical to the cek3 cDNA in chicken (22), and the mouse clones YK6, 13, 14, and 19 are identical to clones that were PCR-amplified from zebra finch brain cDNA (P. Tsoulfas, personal communication). This striking sequence conservation might indicate that YK9, YK12, and YK18 are not the mouse homologues of IGF1R, TYK2, and Ret but of other closely related genes. To resolve this question, the PCR fragments can be used as probes to isolate the corresponding cDNA clones, which can then be compared across the more divergent domains of these genes.
Expression of the YK clones was assayed in spermatogenic cell types and various somatic tissues. Figure 3 shows the expression patterns of two clones: YK19, a PCR fragment of the mouse abl gene, and YK18, which is most closely related to the Ret gene. YK19 was ubiquitously expressed in somatic tissues and male germ cells, in agreement with Ponzetto and Wolgemuth (24). In contrast, germ cell expression of YK18 was most abundant in the primitive and type A spermatogonia and in the meiotic cell types: leptotene, zygotene, and pachytene spermatocytes.
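The argument for sequencing pooled isolates can be made concrete with a back-of-the-envelope model. This is our illustration, not the authors' analysis, and it rests on four assumptions:
- Substitutions are independent and Poisson-distributed at one per 450 nt, the midpoint of the 400-500 nt range reported above.
- Each isolate derives from an independent amplification event.
- A pooled consensus is misread only when a majority of the isolates carry the same substitution at a site.
- The 210-bp fragment length is a hypothetical value.

Under these assumptions, about one isolate in three would carry at least one substitution. The chance of a majority sharing the same error falls steeply as more isolates are pooled.

```python
import math


def prob_error_free(fragment_len, nt_per_error):
    """Poisson estimate of the chance that one isolate carries no substitution."""
    return math.exp(-fragment_len / nt_per_error)


def majority_wrong_prob(n_isolates, q):
    """Chance that a majority of independent isolates share one substitution.

    q is the per-isolate probability of one specific substitution at a site.
    """
    k_min = n_isolates // 2 + 1
    return sum(
        math.comb(n_isolates, k) * q**k * (1 - q) ** (n_isolates - k)
        for k in range(k_min, n_isolates + 1)
    )


FRAGMENT_LEN = 210               # hypothetical insert length (bp)
NT_PER_ERROR = 450               # midpoint of the 400-500 nt range in the text
q_site = (1 / NT_PER_ERROR) / 3  # one of three possible wrong bases at a site

print(f"error-free isolate: {prob_error_free(FRAGMENT_LEN, NT_PER_ERROR):.2f}")
for n in (1, 3, 5):
    print(n, f"{majority_wrong_prob(n, q_site):.1e}")
```

The independence assumption reflects the observation that most substitutions arise in late cycles, so different isolates rarely inherit the same one.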
This was the only clone whose relative expression was much greater in type A spermatogonia than in Sertoli cells or spermatids. We confirmed this expression pattern using an oligonucleotide probe specific to YK18 that differed from all other YK clones in at least 7 of 16 nucleotide positions. None of the YK clones were germ cell specific, but expression of YK18 appears to be highest in testis cDNA.
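The specificity criterion used for the YK18 oligonucleotide (at least 7 of 16 positions different from every other YK clone) can be checked by counting mismatches, as sketched below. This is a hypothetical helper, not the authors' procedure. It assumes all sequences are supplied in the same orientation as the probe. It also scans every same-length window of the other clones, which is more conservative than comparing only the aligned homologous positions. Reverse complements can be added to `others` to check the opposite strand as well.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(probe, others):
    """Fewest mismatches between the probe and any same-length window in others."""
    k = len(probe)
    best = k
    for seq in others:
        for i in range(len(seq) - k + 1):
            best = min(best, hamming(probe, seq[i:i + k]))
    return best


def is_specific(probe, others, required=7):
    """True if every window differs from the probe at >= `required` positions."""
    return min_mismatches(probe, others) >= required
```

With a 16-nt probe and `required=7`, the check passes only if no other clone comes within 6 mismatches anywhere along its length.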
CONCLUSION
We have described a detailed method for differential hybridization screening of PCR fragments that have been amplified from targeted multigene families. Any multigene family with at least two short motifs of highly conserved amino acids, such as those described in the protein kinase family, may support PCR amplification of a specific sequence domain. This approach provides a very efficient method for cloning and sequencing novel members of multigene families and for screening their patterns of expression in various tissues and cell types.
ACKNOWLEDGMENTS
We thank Mike Strathmann for discussions and numerous contributions, Carol Lee for screening and sequencing the PCR clones, Anthony Bellvé for isolating the spermatogenic cell types, members of the Microchemical Facility at Caltech for oligonucleotide synthesis and for coupling inosine to columns on special order, Pantelis Tsoulfas for TFASTA sequence comparisons, and Jane Johnson, Murray Robinson, and Pantelis Tsoulfas for comments on the manuscript. This work was supported in part by an NIH postdoctoral fellowship to T.M.W. (GM11576) and an NIH grant to M.I.S. (GM34236).
REFERENCES
1. Anderson, S. K., Gibbs, C. P., Tanaka, A., Kung, H.-J., and Fujita, D. J. (1985) Mol. Cell. Biol. 5, 1122-1129.
2. Claesson-Welsh, L., Eriksson, A., Westermark, B., and Heldin, C.-H. (1989) Proc. Natl. Acad. Sci. USA 86, 4917-4921.
3. Clark, J. M. (1988) Nucleic Acids Res. 16, 9677-9686.
4. Gautam, N., Northup, J., Tamir, H., and Simon, M. I. (1990) Proc. Natl. Acad. Sci. USA 87, 7973-7977.
5. Gronwald, R. G. K., Grant, F. J., Haldeman, B. A., Hart, C. E., O'Hara, P. J., Hagen, F. S., Ross, R., Bowen-Pope, D. F., and Murray, M. J. (1988) Proc. Natl. Acad. Sci. USA 85, 3435-3439.
6. Hanks, S. K., Quinn, A. M., and Hunter, T. (1988) Science 241, 42-52.
7. He, X., Treacy, M. N., Simmons, D. M., Ingraham, H. A., Swanson, L. W., and Rosenfeld, M. G. (1989) Nature 340, 35-42.
8. Johnson, J. E., Birren, S., and Anderson, D. (1990) Nature 346, 858-861.
9. Kamb, A., Weir, M., Rudy, B., Varmus, H., and Kenyon, C. (1989) Proc. Natl. Acad. Sci. USA 86, 4372-4376.
10. Kornbluth, S., Paulson, K. E., and Hanafusa, H. (1988) Mol. Cell. Biol. 8, 5541-5544.
11. Krawczak, M., Reiss, J., Schmidtke, J., and Rosler, U. (1989) Nucleic Acids Res. 17, 2197-2201.
12. Krolewski, J. J., Lee, R., Eddy, R., Shows, T. B., and Dalla-Favera, R. (1990) Oncogene 5, 277-282.
13. Kruh, G. D., King, C. R., Kraus, M. H., Popescu, N. C., Amsbaugh, S. C., McBride, W. O., and Aaronson, S. A. (1986) Science 234, 1545-1548.
14. Kumar, R. (1989) Technique 1, 133-152.
15. Libert, F., Parmentier, M., Lefort, A., Dinsart, C., Van Sande, J., Maenhaut, C., Simons, M.-J., Dumont, J., and Vassart, G. (1989) Science 244, 569-572.
16. Lochrie, M. A., Mendel, J. E., Sternberg, P. W., and Simon, M. I. (1991) Cell Regul. 2, 135-154.
17. Ma, H., Yanofsky, M., and Meyerowitz, E. (1990) Proc. Natl. Acad. Sci. USA 87, 3821-3825.
18. Matsushime, H., Wang, L.-H., and Shibuya, M. (1986) Mol. Cell. Biol. 6, 3000-3004.
19. Mullis, K. B., and Faloona, F. A. (1987) in Methods in Enzymology (Wu, R., Ed.), Vol. 155, pp. 335-351, Academic Press, San Diego.
20. Nocka, K., Majumder, S., Chabot, B., Ray, P., Cervone, M., Bernstein, A., and Besmer, P. (1989) Genes Dev. 3, 816-826.
21. Partanen, J., Makela, T. P., Alitalo, R., Lehvaslaiho, H., and Alitalo, K. (1990) Proc. Natl. Acad. Sci. USA 87, 8913-8917.
22. Pasquale, E. B. (1990) Proc. Natl. Acad. Sci. USA 87, 5812-5816.
23. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448.
24. Ponzetto, C., and Wolgemuth, D. J. (1985) Mol. Cell. Biol. 5, 1791-1794.
25. Reid, H. H., Wilks, A. F., and Bernard, O. (1990) Proc. Natl. Acad. Sci. USA 87, 1596-1600.
26. Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A., and Arnheim, N. (1985) Science 230, 1350-1354.
27. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
28. Scharf, S. J., Horn, G. T., and Erlich, H. A. (1986) Science 233, 1076-1078.
29. Shtivelman, E., Lifshitz, B., Gale, R. P., Roe, B. A., and Canaani, E. (1986) Cell 47, 277-284.
30. Sommer, R., and Tautz, D. (1989) Nucleic Acids Res. 17, 6749.
31. Strathmann, M. P., and Simon, M. I. (1990) Proc. Natl. Acad. Sci. USA 87, 9113-9117.
32. Strathmann, M. P., Wilkie, T. M., and Simon, M. I. (1989) Proc. Natl. Acad. Sci. USA 86, 7407-7409.
33. Strathmann, M. P., Wilkie, T. M., and Simon, M. I. (1990) Proc. Natl. Acad. Sci. USA 87, 6477-6481.
34. Takahashi, M., and Cooper, G. M. (1987) Mol. Cell. Biol. 7, 1378-1385.
35. Takeya, T., and Hanafusa, H. (1983) Cell 32, 881-890.
36. Ullrich, A., Gray, A., Tam, A. W., Yang-Feng, T., Tsubokawa, M., Collins, C., Henzel, W., LeBon, T., Kathuria, S., Chen, E., Jacobs, S., Francke, U., Ramachandran, J., and Fujita-Yamaguchi, Y. (1986) EMBO J. 5, 2503-2512.
37. Wilks, A. F. (1989) Proc. Natl. Acad. Sci. USA 86, 1603-1607.
38. Wilks, A. F., Kurban, R. R., Hovens, C. M., and Ralph, S. J. (1989) Gene 85, 67-74.