Isolation of genes from cloned DNA
Anthony P Monaco
John Radcliffe Hospital, Oxford, UK
During the past year, improvements in the physical and genetic maps of the human genome, in combination with more efficient methods to isolate genes from cloned DNA, have made an increasing impact on the identification of disease genes. Sequence analysis of genomic DNA and the random sequencing and mapping of cDNA clones are helping to integrate the transcript map with the developing physical and genetic maps.
Current Opinion in Genetics and Development 1994, 4:360-365
Introduction
The isolation of expressed sequences from cloned genomic DNA has been essential for the construction of transcript maps across large chromosome regions and for the identification of genes responsible for inherited diseases and cancer. New techniques for efficient isolation of genes from cloned DNA have been developed over the past few years, most notably exon amplification and cDNA selection. Direct nucleotide sequencing has also contributed to gene identification, including sequencing of genomic DNA to search for potential exons using computer algorithms and the random sequence analysis of cDNA clones. This review highlights the improvements in the techniques for the isolation of genes from cloned DNA and their application to 'positional cloning', which involves the isolation of disease genes on the basis of their chromosome location, rather than prior knowledge about the defective protein.
Positional cloning
Positional cloning is the process whereby genes are identified as a direct result of genetic analysis; it does not involve functional information, such as protein sequence or available antibodies (for review, see [1•]). Positional cloning is a multistep process (see Fig. 1) which begins by localizing a disease gene locus to a particular region of a chromosome by the identification of structural abnormalities and/or genetic linkage analysis in families segregating for the disease. This preliminary localization is followed by a molecular analysis of the region, including finer genetic mapping, physical mapping, DNA isolation, transcript identification, cDNA cloning and mutation searching of candidate genes.
Most disease genes isolated to date by positional cloning techniques have been pinpointed by chromosome abnormalities that are cytogenetically visible, such as fragile sites, deletions, duplications and translocations. Physical mapping and DNA isolation of large regions surrounding the chromosome position of these abnormalities have been greatly advanced by the ability to clone large fragments of DNA in yeast artificial chromosomes (YACs) [2]. For the Human Genome Project, YACs have been essential for the construction of physical maps of cloned DNA, and a first-generation physical map of the entire human genome has recently been described [3••]. YACs have also been used to identify the position of structural rearrangements in patients' chromosomes by fluorescence in situ hybridization (FISH) of YAC DNA to metaphase spreads [4,5]. Another type of mutation that can be detected by Southern blot analysis of genomic DNA or by PCR amplification is the expanding trinucleotide repeat. Repeat instability and amplification in affected patients have been documented for seven human inherited diseases, most notably fragile-X syndrome, myotonic dystrophy and Huntington's disease (for review, see [6•]).
When there is no obvious structural aberration detectable cytogenetically or as altered DNA fragments in Southern blot analysis, disease genes must be localized solely by genetic linkage analysis in families segregating for the disease phenotype. This approach has been greatly enhanced by the generation of a high-resolution genetic map of the human genome with very informative microsatellite polymorphisms [7••]. Using these genetic resources, a large number of disease genes have been positioned on the chromosome map in the past year. The density of the genetic map and the success of positional cloning for monogenic diseases have spurred interest in the isolation of genes for more complex diseases, such as diabetes, hypertension, obesity and psychiatric disorders (for review, see [8]).
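The linkage step described above is usually summarized as a LOD score, the base-10 logarithm of the likelihood of the family data at a given recombination fraction relative to the likelihood under free recombination. The following is a minimal sketch for the simplest case only (phase-known, fully informative meioses); the function name and the example counts are illustrative assumptions and are not taken from the studies cited here.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    theta is the assumed recombination fraction between marker and disease
    locus (0 = complete linkage, 0.5 = no linkage).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Likelihood of the observed meioses if the loci are linked at theta.
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood of the same meioses under free recombination (theta = 0.5).
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")
    return math.log10(linked / unlinked)


if __name__ == "__main__":
    # Hypothetical family data: 1 recombinant in 10 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.4):
        print(f"theta={theta:.2f}  LOD={lod_score(1, 10, theta):.2f}")
```

By convention, a LOD score of 3 or more is taken as significant evidence for linkage; real pedigrees with unknown phase or incomplete penetrance require full likelihood programs rather than this closed form.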
Abbreviations
EST: expressed sequence tag; YAC: yeast artificial chromosome.
© Current Biology Ltd ISSN 0959-437X
Fig. 1. Schematic outline of steps involved in positional cloning (see text for further details). The steps shown are: disease; chromosome position; finer genetic and physical mapping; YAC contig construction (lambda and cosmid subclones); gene isolation; mutation analysis. Gene isolation is divided into traditional approaches, namely (i) conservation on zoo blots, (ii) CpG islands, (iii) northern blots and (iv) cDNA library screening with YAC, cosmid or lambda clones, and newer approaches, namely (i) exon amplification, (ii) cDNA selection, (iii) genomic sequencing and computer analysis and (iv) regionally mapped candidate genes and analysis of function. © 1994 Current Opinion in Genetics and Development.
Strategies to isolate genes from cloned DNA
Overlapping YAC clones (contigs) covering up to several megabases are usually isolated from the region surrounding a chromosomal abnormality or critical disease region defined by recombinants or allelic association in genetic mapping. To isolate new polymorphisms and expressed sequences, YAC clones are often represented by contigs of overlapping bacteriophage or cosmid clones in bacterial hosts, from which DNA is easily purified. This is accomplished by subcloning the YAC after partial restriction enzyme digestion into these vectors and identifying human positive clones [9], or by directly hybridizing YAC inserts purified from the yeast host chromosomes to gridded chromosome-specific cosmid libraries [10].
Traditional approaches
Several traditional approaches are available for gene identification, especially if specific single-copy DNA fragments are available from YAC subclones (see Fig. 1). These include searching for evolutionary conservation of DNA fragments by hybridization to Southern blots of genomic DNA isolated from multiple species ('zoo blots' [11]) or identification of CpG islands as signposts for the 5' end of genes [12,13]. Conserved DNA fragments, CpG island-containing fragments, and random single-copy DNA fragments can be tested for expression by hybridization to northern blots of RNA isolated from fetal and adult human tissues. If a transcript is identified, then the genomic DNA fragment can be subsequently hybridized to cDNA libraries generated from the appropriate tissue. Both multiple-tissue northern blots and specific-tissue cDNA libraries are now commercially available from several sources.
Although the analysis of conserved DNA fragments and CpG islands has resulted in the isolation of many disease genes, it is a very labour-intensive strategy when mapping a large region. For genomic regions of 0.5-3.0 Mb that can be isolated in overlapping YAC clones, searching for genes from cloned DNA requires a more efficient approach (see Fig. 1). One strategy is to directly hybridize whole YAC or cosmid inserts to cDNA libraries on filters after suppression of repeated sequences in the radioactively labelled probe [14,15]. This method can lead to difficulties because of the hybridization of pseudogenes and low copy number repeat sequences that are not efficiently suppressed in the large genomic probe. For YAC inserts of several hundred kilobases, this approach requires the use of large amounts of high specific activity radioactive probe, which can cause significant background on the library filters. Despite these technical difficulties, several groups have been successful and have isolated many new genes [16,17].
cDNA selection
Another strategy, termed 'cDNA selection' [18,19], is to hybridize an amplified cDNA library to YAC inserts or cosmid clones immobilized on nylon membranes after suppression of repeated sequences. The cDNA inserts that specifically hybridize to the cloned genomic DNA are then eluted and amplified again by PCR; this process is repeated two or three times before cloning the selected cDNAs. The resulting cDNA sub-libraries are enriched for expressed sequences from the genomic region, but still require thorough characterization because of simultaneous selection of cDNAs with homology to pseudogenes and low copy number repeat sequences in the genomic DNA template. Several groups have reported improvements on the cDNA selection protocols using hybridization in solution, rather than filter-bound template, and the use of biotin-streptavidin and magnetic beads to capture cDNA inserts specifically hybridizing to the genomic template [20,21]. The cDNA selection procedures have led to the generation of transcript maps across several areas of the human genome in the past year, including Xq28 [22], Xq13.3 [23] and the HLA class I region [24]. A similar selection method has also been described which enriches for genomic DNA sequences conserved between species [25]. Using human cosmids from the Xq28 region, sub-libraries were generated with homologously conserved sequences from mouse and pig DNA.
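The reason the hybridize, elute and re-amplify cycle is repeated two or three times can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each round enriches region-specific cDNAs over background by a constant fold; the starting fraction and the fold-enrichment values are hypothetical and are not measurements from the cited protocols.

```python
def enriched_fraction(start_fraction: float, fold_per_round: float, rounds: int) -> float:
    """Fraction of region-specific cDNAs after repeated rounds of selection.

    Each round is modelled as multiplying the odds (target : background)
    by a constant, hypothetical fold enrichment.
    """
    if not 0.0 < start_fraction < 1.0:
        raise ValueError("start_fraction must lie strictly between 0 and 1")
    if fold_per_round <= 0 or rounds < 0:
        raise ValueError("fold_per_round must be positive and rounds non-negative")
    fraction = start_fraction
    for _ in range(rounds):
        odds = fraction / (1.0 - fraction) * fold_per_round
        fraction = odds / (1.0 + odds)
    return fraction


if __name__ == "__main__":
    # Hypothetical numbers: 1 target cDNA per 100 000 clones, 100-fold per round.
    for n in range(4):
        print(n, f"{enriched_fraction(1e-5, 100.0, n):.5f}")
```

Under these assumed numbers a single round leaves the sub-library dominated by background, whereas a third round makes the region-specific cDNAs the majority, which is consistent with the need for repeated cycles and for careful characterization of the final clones.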
Exon amplification
Exon amplification or 'trapping' is another useful method for the isolation of expressed sequences from cloned human DNA. It relies on functional sequences required for RNA splicing that flank exon coding sequences. Cloned genomic DNA fragments are ligated into a plasmid vector that contains splicing sequences. If the genomic fragment inserted in the vector contains an exon, then it can be spliced properly into the mature mRNA after expression in mammalian cells in culture. In one version [26], the vector contains the splice donor site and the genomic fragment containing the putative exon provides the splice acceptor site. This method relies on there being only one splice site to be selected and has the disadvantage of selecting genomic fragments with cryptic splice acceptor sites, thereby trapping false positive exons. Another version of exon trapping [27,28] requires both splice donor and splice acceptor sites in the putative genomic exon to be incorporated into the resulting mRNA. This type of exon amplification vector should in theory provide fewer false positive exons because of the selection for both donor and acceptor splice sites. The exon amplification vector pSPL1 [27] has been problematic, having a cryptic splice site in the HIV-tat intron in which the genomic DNA is ligated, thus generating false positives [29,30]. As this system requires the presence of both functional 3' and 5' splice sites, it will be unable to identify intronless or single exon genes. However, the pSPL1 exon amplification system has generally provided an efficient system for isolating exons from cloned cosmid DNA [29,30]. This technique has been important in the identification of the Menkes disease gene [31], the neurofibromatosis type 2 tumour suppressor gene [32], the glycerol kinase gene [33], and the Huntington's disease gene [34].
Modifications to the original pSPL1 vector have been described recently [35•] that increase both the sensitivity and efficiency of exon amplification and its ability to isolate expressed sequences from highly complex sources of genomic DNA. The new version, pSPL3, in particular, has been modified to eliminate vector-only splice products and the false positives resulting from cryptic splicing in the HIV-tat intron.
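The requirement for both a splice acceptor and a splice donor, and the false positives that cryptic sites produce, can be illustrated in silico. The sketch below is a deliberately crude assumption-laden model: it treats any AG dinucleotide as a possible 3' splice acceptor and any GT as a possible 5' splice donor, and lists every stretch between them within an illustrative size range. It is not the selection performed by the cell or by any published vector, and the function name and size limits are hypothetical.

```python
from typing import List, Tuple


def candidate_exons(fragment: str, min_len: int = 50, max_len: int = 300) -> List[Tuple[int, int]]:
    """List (start, end) intervals flanked by an upstream AG and a downstream GT.

    Coordinates are 0-based and half-open, so fragment[start:end] is the
    putative exon. Only the invariant dinucleotides are checked, which is
    why many of the intervals returned on real DNA would be cryptic sites.
    """
    seq = fragment.upper()
    # A putative exon starts immediately after an acceptor AG.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # A putative exon ends immediately before a donor GT.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    return [(s, e) for s in starts for e in ends if min_len <= e - s <= max_len]


if __name__ == "__main__":
    # Synthetic fragment: intron end (..AG), 60-base "exon", intron start (GT..).
    demo = "TTTTAG" + "C" * 60 + "GTAAAT"
    print(candidate_exons(demo))
```

On real genomic sequence this scan returns far more intervals than there are exons, which mirrors the point made above: selecting on a single splice site, or tolerating a cryptic site in the vector intron, inflates the number of false positive products.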
Nucleotide sequencing of genomic DNA
Expressed sequences can be identified in cloned genomic DNA after direct nucleotide sequencing by searching for coding sequences. On a smaller scale, sequence analysis of conserved DNA fragments, CpG islands, exon amplification products or cDNA-selected fragments provides an entry point for obtaining the full-length cDNA sequence. As an alternative to hybridization to cDNA libraries on filters, coding sequences provided by direct sequencing of these small fragments can be used to design primers to obtain the complete cDNA sequence using methods such as rapid amplification of cDNA ends ('RACE') [36,37]. On a larger scale, long stretches of genomic DNA can be sequenced and potential exons identified using computer algorithms. For genomic DNA isolated in bacteriophage or cosmid clones, 'shotgun' sequencing approaches have been used to generate libraries of overlapping fragments for sequencing on high-throughput automated sequencing machines. Improvements in the construction of high-quality random libraries have utilized adaptor-based strategies to decrease the number of insertless clones or clones with extraneous DNA or multiple inserts [38]. Once the random sequencing approach has assembled 90-95% of the final sequence, a directed sequencing approach using specifically designed primers helps to close any remaining gaps. In the past two years, several groups have reported sequences of human genomic DNA ranging from 58 to 106 kb [39-41]. The assembled sequences were analyzed for the location and density of human repeated sequences. Several programs have been developed to find potential coding sequences, most notably GRAIL [42], which uses a multi-sensor/neural network approach to find genes in DNA sequences, and BLASTX [43•], which translates nucleotide sequences and searches a protein database for homologous coding regions.
Recent reviews have provided a more detailed discussion of computer algorithms and the issues involved in finding coding sequences in genomic DNA and searching molecular sequence databases ([44••]; see the review by M Boguski, pp 383-388). The approach of sequencing genomic DNA to find potentially expressed sequences has been applied to the positional cloning of the gene for Kallmann syndrome. A deletion-critical region in Xp22.3 was identified in which at least part of the gene must have been located. One group [45] used conservation of YAC subclones on zoo blots to identify potential exons, while another group [46] completely sequenced 60 kb of genomic DNA and identified potential coding regions using several computer analyses. As automated sequencing technology becomes more accessible to many researchers in the next few years, and computer programs and databases become more sensitive and efficient at finding and comparing coding sequences, the approach of direct sequencing to find genes should become more popular.
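The first step of a translated database search, conceptual translation of a genomic sequence in all six reading frames, can be sketched in a few lines. The code below is only an illustration of that step using the standard genetic code; it is not the BLASTX program, performs no database search or scoring, and its function names are assumptions introduced here.

```python
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an ambiguous codon."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq: str) -> Dict[int, str]:
    """Map frame (+1, +2, +3, -1, -2, -3) to its conceptual translation."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(forward[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


def longest_open_stretch(protein: str) -> str:
    """Longest stop-free stretch, a crude indicator of coding potential."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCAAATAA").items()):
        print(frame, protein, longest_open_stretch(protein))
```

In a real analysis each of the six translations would be compared against a protein database; long stop-free stretches on their own are weak evidence, because human exons are short and interrupted by introns.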
Nucleotide sequencing of cDNA clones
In the past few years, over 20 000 new genes have been identified in many species by randomly sequencing cDNA clones to produce 'expressed sequence tags' (ESTs) [47]. This approach has provided a large number of cDNA sequences from a variety of tissues isolated from organisms at different developmental stages (for review, see [48•]; for details of human ESTs, see [47,49-53]). By generating cDNA sequences from either the 5' end, 3' end, or both ends of cDNA clones, researchers have provided partial gene sequences which have now been collected into a single database for expressed sequence tags, dbEST [54]. A modest number of human ESTs have been assigned chromosome locations for integration with the physical and genetic maps of the Human Genome Project and for applications to the positional cloning of disease genes [55,56,57•]. On the human X chromosome, 19 ESTs were recently assigned to smaller regions on the basis of a panel of somatic cell hybrids [58]. This has increased the number of genes assigned to specific regions of the X chromosome by 20%. More efficient methods for assigning the thousands of new genes to specific chromosome regions are needed to integrate them into the physical and genetic maps. As more ESTs are mapped to specific chromosome regions, disease loci mapping to the same regions will have increasing numbers of potential candidate genes. This has led to the 'positional candidate approach' [59••], whereby the information obtained from a gene's sequence and function, in combination with its chromosome location, will implicate it as a candidate gene for a specific disease that has been mapped to the same region. One example, specific for an EST approach, is the gene for X-linked glycerol kinase deficiency. The gene was isolated not only by positional cloning approaches [33,60], but also by a random sequence analysis of testes cDNA clones, one of which showed significant homology with bacterial glycerol kinase [61].
After mapping this EST to the Xp21 region and obtaining longer cDNA sequence, it was shown to be the X-linked gene involved in glycerol kinase deficiency.
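The positional candidate approach amounts to intersecting two kinds of map data: the interval to which a disease has been localized and the regional assignments of already sequenced genes or ESTs. The following sketch shows that intersection for intervals expressed on a common coordinate scale. All names, coordinates and the record layout are hypothetical placeholders chosen for illustration; they do not reproduce dbEST records or published map positions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MappedEST:
    """An EST with a regional assignment (hypothetical coordinates in Mb)."""
    name: str
    chromosome: str
    start_mb: float
    end_mb: float


def positional_candidates(ests: Iterable[MappedEST], chromosome: str,
                          start_mb: float, end_mb: float) -> List[MappedEST]:
    """Return ESTs whose assigned interval overlaps the disease interval."""
    return [
        est for est in ests
        if est.chromosome == chromosome
        and est.start_mb <= end_mb
        and est.end_mb >= start_mb
    ]


if __name__ == "__main__":
    # Entirely hypothetical ESTs and positions.
    catalogue = [
        MappedEST("EST_A", "X", 30.0, 33.0),
        MappedEST("EST_B", "X", 120.0, 125.0),
        MappedEST("EST_C", "4", 1.0, 3.0),
    ]
    for est in positional_candidates(catalogue, "X", 31.0, 35.0):
        print(est.name)
```

Candidates returned by such a filter would then be prioritized by sequence homology or known function, as in the glycerol kinase example, before mutation searching in patients.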
Conclusions
In the past few years, the number of resources and techniques for the isolation of genes from cloned DNA to generate transcript maps and for the positional cloning of disease genes has increased dramatically. Most of these resources (e.g. YAC physical maps, microsatellite genetic maps, sequence databases and ESTs) have been the result of the efforts of researchers involved in the Human Genome Project. As the number of cloned and mapped genes increases, the need to isolate new candidate genes by positional cloning will decrease over time. This is particularly relevant to projects for the determination of genes involved in complex diseases, in which genetic mapping may be able to localize specific loci only to intervals several Mb in size. The combination of efficient gene isolation procedures (e.g. cDNA selection and exon amplification) and the increasing number of randomly isolated gene sequences (ESTs) should make these challenging genetic problems more approachable.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
1. • Collins FS: Positional Cloning: Let's Not Call It Reverse Anymore. Nature Genet 1992, 1:3-6.
This review outlines the history behind, and basic steps involved in, positional cloning (previously termed 'reverse genetics'). It includes tables listing disease genes which have been identified by both positional and functional cloning.
2. Burke DT, Carle GF, Olson MV: Cloning of Large Segments of Exogenous DNA into Yeast by Means of Artificial Chromosome Vectors. Science 1987, 236:806-812.
3. •• Cohen D, Chumakov I, Weissenbach J: A First-Generation Physical Map of the Human Genome. Nature 1993, 366:698-701.
This paper reports a first-generation YAC map of the human genome using 33 000 YAC clones (0.9 Mb average insert size) in an approach combining hybridization fingerprinting with human repeat sequences and PCR screening with 2000 Généthon microsatellite markers. The more detailed maps are to be published separately, but this paper describes the strategy and, more importantly, provides e-mail access to the YAC contig data.
4. Heitz D, Rousseau F, Devys D, Saccone S, Abderrahim H, Le Paslier D, Cohen D, Vincent A, Toniolo D, Della Valle G, et al.: Isolation of Sequences that Span the Fragile X and Identification of a Fragile X-Related CpG Island. Science 1991, 251:1236-1239.
5. Tümer Z, Chelly J, Tommerup N, Ishikawa-Brush Y, Tønnesen T, Monaco AP, Horn N: Characterization of a 1.0 Mb YAC Contig Spanning Two Chromosome Breakpoints Related to Menkes Disease. Hum Mol Genet 1992, 1:483-489.
6. • Richards RI, Sutherland GR: Simple Repeat DNA is Not Replicated Simply. Nature Genet 1994, 6:114-117.
This News and Views article summarizes the most recent disease genes that have expanding repeat sequences as the causative mutation (including a table with references) and outlines a mechanism for repeat expansion mutation.
7. •• Weissenbach J, Gyapay G, Dib C, Vignal A, Morissette J, Millasseau P, Vaysseix G, Lathrop M: A Second Generation Linkage Map of the Human Genome. Nature 1992, 359:794-801.
A landmark paper describing the French Généthon effort to use randomly generated microsatellite markers to construct a high-resolution genetic map of the human genome.
8. Lathrop M: Genetic Approaches to Common Diseases. Curr Opin Biotechnol 1993, 4:678-683.
9. Monaco AP, Larin Z: Subcloning YACs into Bacteriophage and Cosmid Vectors. In Current Protocols in Human Genetics, unit 5.11. Edited by Dracopoli NC, Haines JL, Korf BR, Moir DT, Morton CC, Seidman CE, Seidman JG, Smith D. New York: Greene and Wiley-Interscience; 1994: in press.
10. Baxendale S, Bates GP, MacDonald ME, Gusella JF, Lehrach H: The Direct Screening of Cosmid Libraries with YAC Clones. Nucleic Acids Res 1991, 19:6651.
11. Monaco AP, Neve RL, Colletti-Feener C, Bertelson CJ, Kurnit DM, Kunkel LM: Isolation of Candidate cDNAs for Portions of the Duchenne Muscular Dystrophy Gene. Nature 1986, 323:646-650.
12. Bird AP: CpG-Rich Islands and the Function of DNA Methylation. Nature 1986, 321:209-213.
13. Larsen F, Gundersen G, Lopez R, Prydz H: CpG Islands as Gene Markers in the Human Genome. Genomics 1992, 13:1095-1107.
14. Herrmann BG, Labeit S, Poustka A, King TR, Lehrach H: Cloning of the T Gene Required in Mesoderm Formation in the Mouse. Nature 1990, 343:617-622.
15. Elvin P, Slynn G, Black D, Graham A, Butler R, Riley J, Anand R, Markham AF: Isolation of cDNA Clones Using Yeast Artificial Chromosome Probes. Nucleic Acids Res 1990, 18:3913-3917.
16. Kahloun AE, Chauvel B, Mauvieux V, Dorval I, Jouanolle A-M, Gicquel I, Le Gall J-Y, David V: Localization of Seven New Genes Around the HLA-A Locus. Hum Mol Genet 1993, 2:55-60.
17. Geraghty MT, Brody LC, Martin LS, Marble M, Kearns W, Pearson P, Monaco AP, Lehrach H, Valle D: The Isolation of cDNAs from OATL1 at Xp11.2 Using a 480 kb YAC. Genomics 1993, 16:440-446.
18. Parimoo S, Patanjali SR, Shukla H, Chaplin DD, Weissman SM: cDNA Selection: Efficient PCR Approach for the Selection of cDNAs Encoded in Large Chromosomal DNA Fragments. Proc Natl Acad Sci USA 1991, 88:9623-9627.
19. Lovett M, Kere J, Hinton LM: Direct Selection: a Method for the Isolation of cDNAs Encoded by Large Genomic Regions. Proc Natl Acad Sci USA 1991, 88:9628-9632.
20. Korn B, Sedlacek Z, Manca A, Kioschis P, Konecki D, Lehrach H, Poustka A: A Strategy for the Selection of Transcribed Sequences in the Xq28 Region. Hum Mol Genet 1992, 1:235-242.
21. Morgan JG, Dolganov GM, Robbins SE, Hinton LM, Lovett M: The Selective Isolation of Novel cDNAs Encoded by the Regions Surrounding the Human Interleukin 4 and 5 Genes. Nucleic Acids Res 1992, 20:5173-5179.
22. Sedlacek Z, Korn B, Konecki DS, Siebenhaar R, Coy JF, Kioschis P, Poustka A: Construction of a Transcription Map of a 300 kb Region Around the Human G6PD Locus by Direct cDNA Selection. Hum Mol Genet 1993, 2:1865-1869.
23. Gecz J, Villard L, Lossi AM, Millasseau P, Djabali M, Fontes M: Physical and Transcriptional Mapping of DXS56-PGK1 1 Mb Region: Identification of Three New Transcripts. Hum Mol Genet 1993, 2:1389-1396.
24. Wei H, Fan W-F, Xu H, Parimoo S, Shukla H, Chaplin DD, Weissman SM: Genes in One Megabase of the HLA Class I Region. Proc Natl Acad Sci USA 1993, 90:11870-11874.
25. Sedlacek Z, Konecki DS, Siebenhaar R, Kioschis P, Poustka A: Direct Selection of DNA Sequences Conserved Between Species. Nucleic Acids Res 1993, 21:3419-3425.
26. Duyk GM, Kim S, Myers RM, Cox DR: Exon Trapping: A Genetic Screen to Identify Candidate Transcribed Sequences in Cloned Mammalian Genomic DNA. Proc Natl Acad Sci USA 1990, 87:8995-8999.
27. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE: Exon Amplification: a Strategy to Isolate Mammalian Genes Based on RNA Splicing. Proc Natl Acad Sci USA 1991, 88:4005-4009.
28. Hamaguchi M, Sakamoto H, Tsuruta H, Sasaki H, Muto T, Sugimura T, Terada M: Establishment of a Highly Selective and Specific Exon-Trapping System. Proc Natl Acad Sci USA 1992, 89:9779-9783.
29. Church DM, Banks LT, Rogers AC, Graw SL, Housman DE, Gusella JF, Buckler AJ: Identification of Human Chromosome 9 Specific Genes Using Exon Amplification. Hum Mol Genet 1993, 2:1915-1920.
30. North MA, Sanseau P, Buckler AJ, Church D, Jackson A, Patel K, Trowsdale J, Lehrach H: Efficiency and Specificity of Gene Isolation by Exon Amplification. Mamm Genome 1993, 4:466-474.
31. Vulpe C, Levinson B, Whitney S, Packman S, Gitschier J: Isolation of a Candidate Gene for Menkes Disease and Evidence that it Encodes a Copper-Transporting ATPase. Nature Genet 1993, 3:7-13.
32. Trofatter JA, MacCollin MM, Rutter JL, Murrell JR, Duyao MP, Parry DM, Eldridge R, Kley N, Menon AG, Pulaski K, et al.: A Novel Moesin-, Ezrin-, Radixin-Like Gene is a Candidate for the Neurofibromatosis 2 Tumor Suppressor. Cell 1993, 72:791-800.
33. Walker AP, Muscatelli F, Monaco AP: Isolation of the Human Xp21 Glycerol Kinase Gene by Positional Cloning. Hum Mol Genet 1993, 2:107-114.
34. The Huntington's Disease Collaborative Research Group: A Novel Gene Containing a Trinucleotide Repeat That is Expanded and Unstable on Huntington's Disease Chromosomes. Cell 1993, 72:971-983.
35. • Church DM, Stotler CJ, Rutter JL, Murrell JR, Trofatter JA, Buckler AJ: Isolation of Genes from Complex Sources of Mammalian Genomic DNA Using Exon Amplification. Nature Genet 1994, 6:98-105.
The authors describe the modification of an exon amplification vector that had previously been used widely in positional cloning projects. The modified vector has improved sensitivity, can handle complex genomic input DNA and has eliminated some of the previous technical problems with the original vector.
36. Loh EY, Elliott JF, Cwirla S, Lanier LL, Davis MM: Polymerase Chain Reaction with Single-Sided Specificity: Analysis of T Cell Receptor δ Chain. Science 1989, 243:217-220.
37. Frohman MA, Dush MK, Martin GR: Rapid Production of Full-Length cDNAs from Rare Transcripts: Amplification Using a Single Gene-Specific Oligonucleotide Primer. Proc Natl Acad Sci USA 1988, 85:8998-9002.
38. Povinelli CM, Gibbs RA: Large-Scale Sequencing Library Production: an Adaptor-Based Strategy. Anal Biochem 1993, 210:16-26.
39. Martin-Gallardo A, McCombie WR, Gocayne JD, FitzGerald MG, Wallace S, Lee BMB, Lamerdin J, Trapp S, Kelley JM, Liu L-I, et al.: Automated DNA Sequencing and Analysis of 106 Kilobases from Human Chromosome 19q13.3. Nature Genet 1992, 1:34-39.
40. McCombie WR, Martin-Gallardo A, Gocayne JD, FitzGerald MG, Dubnick M, Kelley JM, Castilla L, Liu L-I, Wallace S, Trapp S, et al.: Expressed Genes, Alu Repeats and Polymorphisms in Cosmids Sequenced from Chromosome 4p16.3. Nature Genet 1992, 1:348-353.
41. Iris FJM, Bougueleret L, Prieur S, Caterina D, Primas G, Perrot V, Jurka J, Rodriguez-Tome P, Claverie JM, Dausset J, Cohen D: Dense Alu Clustering and a Potential New Member of the NFκB Family within a 90 Kilobase HLA Class III Segment. Nature Genet 1993, 3:137-143.
42. Uberbacher EC, Mural RJ: Locating Protein-Coding Regions in Human DNA Sequences by a Multiple Sensor-Neural Network Approach. Proc Natl Acad Sci USA 1991, 88:11261-11265.
43. • Gish W, States DJ: Identification of Protein Coding Regions by Database Similarity Search. Nature Genet 1993, 3:266-272.
This paper describes the latest Basic Local Alignment Search Tool (BLAST) software, called BLASTX. It simultaneously translates genomic DNA sequence in all six reading frames and searches the protein database for similar protein sequences. BLASTX detects reading frames even in the presence of sequencing errors and is useful for sequencing projects in the early stages of assembly which still require information on potential gene identification.
44. •• Altschul SF, Boguski MS, Gish W, Wootton JC: Issues in Searching Molecular Sequence Databases. Nature Genet 1994, 6:119-129.
This recent review outlines issues related to sequence database searching, including "the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services." An essential paper for any molecular biologist who uses nucleotide and protein database searching!
45. Franco B, Guioli S, Pragliola A, Incerti B, Bardoni B, Tonlorenzi R, Carrozzo R, Maestrini E, Pieretti M, Taillon-Miller P, et al.: A Gene Deleted in Kallmann's Syndrome Shares Homology with Neural Cell Adhesion and Axonal Path-Finding Molecules. Nature 1991, 353:529-536.
46. Legouis R, Hardelin J-P, Levilliers J, Claverie J-M, Compain S, Wunderle V, Millasseau P, Le Paslier D, Cohen D, Caterina D, et al.: The Candidate Gene for the X-Linked Kallmann Syndrome Encodes a Protein Related to Adhesion Molecules. Cell 1991, 67:423-435.
47. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, et al.: Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project. Science 1991, 252:1651-1656.
48. • Sikela JM, Auffray C: Finding New Genes Faster Than Ever. Nature Genet 1993, 3:189-191.
This News and Views article summarizes the state of the art in large-scale cDNA sequencing programs involving the development of ESTs, including both published papers and unpublished data presented at the European Community Symposium on Strategies in cDNA Programs, Paris, France, October 1992.
49. Adams MD, Dubnick M, Kerlavage AR, Moreno RF, Kelley JM, Utterback TR, Nagle JW, Fields C, Venter JC: Sequence Identification of 2375 Human Brain Genes. Nature 1992, 355:632-634.
50. Okubo K, Hori N, Matoba R, Niiyama T, Fukushima A, Kojima Y, Matsubara K: Large Scale cDNA Sequencing for Analysis of Quantitative and Qualitative Aspects of Gene Expression. Nature Genet 1992, 2:173-179.
51. Khan AS, Wilcox AS, Polymeropoulos MH, Hopkins JA, Stevens TJ, Robinson M, Orpana AK, Sikela JM: Single Pass Sequencing and Physical and Genetic Mapping of Human Brain cDNAs. Nature Genet 1992, 2:180-185.
52. Adams MD, Kerlavage AR, Fields C, Venter JC: 3400 New Expressed Sequence Tags Identify Diversity of Transcripts in Human Brain. Nature Genet 1993, 4:256-267.
53. Adams MD, Soares MB, Kerlavage AR, Fields C, Venter JC: Rapid cDNA Sequencing (Expressed Sequence Tags) from a Directionally Cloned Human Infant Brain cDNA Library. Nature Genet 1993, 4:373-380.
54. Boguski MS, Lowe TMJ, Tolstoshev CM: dbEST-Database for 'Expressed Sequence Tags'. Nature Genet 1993, 4:332-333.
55. Wilcox AS, Khan AS, Hopkins JA, Sikela JM: Use of 3' Untranslated Sequences of Human cDNAs for Rapid Chromosome Assignment and Conversion to STSs: Implications for an Expression Map of the Genome. Nucleic Acids Res 1991, 19:1837-1843.
56. Polymeropoulos MH, Xiao H, Glodek A, Gorski M, Adams MD, Moreno RF, Fitzgerald MG, Venter JC, Merril CR: Chromosomal Assignment of 46 Brain cDNAs. Genomics 1992, 12:492-496.
57. • Polymeropoulos MH, Xiao H, Sikela JM, Adams M, Venter JC, Merril CR: Chromosomal Distribution of 320 Genes from a Brain cDNA Library. Nature Genet 1993, 4:381-386.
This paper describes the chromosomal assignment of several hundred brain ESTs using somatic cell hybrid mapping panels and by genetically mapping polymorphic cDNAs in the CEPH reference pedigrees and database. This represents a modest beginning to integrate ESTs with the physical and genetic map and provides interesting correlations of gene density with GC content and cytogenetic length of chromosomes.
58. Parrish JE, Nelson DL: Regional Assignment of 19 X-Linked ESTs. Hum Mol Genet 1993, 2:1901-1905.
59. •• Ballabio A: The Rise and Fall of Positional Cloning? Nature Genet 1993, 3:277-279.
This April 1993 News and Views article reviews the current state of positional cloning (including a table and references) and compares it to functional cloning and the large number of genes (ESTs and others) that have been mapped to specific chromosomal regions. It describes the increasing number of genes that are being identified by 'positional candidate cloning', using a combination of positional information of disease gene location and the identification of the disease gene on the basis of functional information of genes already mapped to this region.
60. Guo W, Worley K, Adams V, Mason J, Sylvester-Jackson D, Zhang Y-H, Towbin JA, Fogt DD, Madu S, Wheeler DA, McCabe ERB: Genomic Scanning for Expressed Sequences in Xp21 Identifies the Glycerol Kinase Gene. Nature Genet 1993, 4:367-371.
61. Sargent CA, Affara NA, Bentley E, Pelmear A, Bailey DMD, Davey P, Dow D, Leversha M, Aplin H, Besley GTN, Ferguson-Smith MA: Cloning of the X-Linked Glycerol Kinase Deficiency Gene and Its Identification by Sequence Comparison to the Bacillus subtilis Homologue. Hum Mol Genet 1993, 2:97-106.
AP Monaco, Imperial Cancer Research Fund Laboratories, Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DU, UK.