CEPH viewer: a client-server database to browse and manipulate CEPH physical mapping and linkage data


CEPH Viewer utilizes some data in the CEPH data set that quickmap does not: namely, fingerprints and YAC size information. (Size information on YACs is displayed by quickmap, but is not currently used for inferencing.) CEPH Viewer uses this information as follows:

CEPH Viewer: A Client-Server Database to Browse and Manipulate CEPH Physical Mapping and Linkage Data

Prakash M. Nadkarni¹

• Given the well-known presence of significant chimerism in the CEPH mega-YAC library, it is important for a mapper to be able to identify “clean” clones (i.e., nonchimeric clones with no internal deletions) for the characterization of a particular region. Several obviously chimeric YACs are already flagged in the CEPH data set as binding to more than one chromosome; flagging of YACs with internal deletions is a more subtle problem. CEPH Viewer uses size information to flag a possible internal deletion in a YAC if it binds to more than one adjacent STS on a chromosome, and the linkage distance spanning these STSs is suspiciously large given the size of the YAC. The linkage-to-physical distance ratio for flagging a “suspicious” YAC can be specified by the user, since this ratio is not linear across the genome.³

• CEPH Viewer uses fingerprint information to display a cartoon of a gel with a user-selected set of clones: individual lanes (clone digests) can be repositioned using the mouse. Visual inspection of the cartoon will sometimes help to determine partially the tiling pattern of YACs hybridizing to a region of a chromosome when the number of YAC fragments is too few for quickmap to apply its statistical criteria. (Complete determination of the tiling pattern will generally require additional data points through experiments using probes derived from the ends of these YACs.)
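The deletion-flagging rule in the first item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CEPH Viewer's own code (which was built with 4D and Sybase): the function name `flag_suspicious`, the tuple layout of the STS hits, the default ratio of 1 cM per Mb, and the sample values are all hypothetical.

```python
from collections import defaultdict

def flag_suspicious(yac_size_kb, sts_hits, cm_per_mb=1.0):
    """Return {chromosome: span_in_cM} where the STS hits span more
    linkage distance than a YAC of this size could plausibly cover.

    yac_size_kb -- YAC size in kilobases
    sts_hits    -- iterable of (sts_name, chromosome, position_in_morgans)
    cm_per_mb   -- user-chosen linkage-to-physical ratio (assumed default)
    """
    by_chrom = defaultdict(list)
    for _name, chrom, morgans in sts_hits:
        by_chrom[chrom].append(morgans)

    # Largest linkage span (in cM) consistent with the YAC's physical size.
    max_span_cm = (yac_size_kb / 1000.0) * cm_per_mb

    flagged = {}
    for chrom, positions in by_chrom.items():
        if len(positions) < 2:
            continue  # a single STS gives no span to test
        span_cm = (max(positions) - min(positions)) * 100.0  # Morgans -> cM
        if span_cm > max_span_cm:
            flagged[chrom] = span_cm
    return flagged

# Illustrative (made-up) hits: one STS on chromosome 9, two on chromosome 6.
hits = [("STS_A", 9, 1.4290), ("STS_B", 6, 0.3200), ("STS_C", 6, 0.3090)]
default_flags = flag_suspicious(1580, hits)                # 1 cM per Mb
strict_flags = flag_suspicious(1580, hits, cm_per_mb=0.5)  # stricter ratio
```

With the default ratio the 1580-kb clone is not flagged; with the stricter ratio the two chromosome 6 hits are reported as suspicious. Keeping the ratio as a parameter reflects the point that the linkage-to-physical relationship varies across the genome.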

and Patricia Bray-Ward*

Center for Medical Informatics and *Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06510

Received March 29, 1994; revised September 27, 1994

With their announcement of the first high-level physical map of the human genome (1), the Center for Study of Human Polymorphisms (CEPH) made the supporting data on the clones in their mega-YAC library and their linkage markers publicly available through anonymous ftp. Individual researchers as well as genome centers now need to constantly reference this voluminous body of data that, as of May 9, 1994, held information on approximately 4000 STSs, 33,500 YACs, 17,600 STS-YAC associations, 87,400 YAC-YAC associations based on Alu-PCR analysis, and 99,000 YAC-YAC associations based on a combination of fingerprint and Alu-PCR data, in addition to 6 sets of fingerprints for each YAC and a total of 1.5 million fingerprint fragments for all YACs. The optimal use of the CEPH data set is facilitated by storing it in relational database form. One benefit of doing this is to be able to add value in the form of tables containing local experimental data, to perform queries that span both the public and the local data sets, and to report agreement/discrepancies between the two. CEPH Viewer, a database for this purpose, was originally built by us as part of the informatics support of the chromosome 12 genome center. To assist viewing of this large volume of data from geographically separated laboratories, we decided to create a client-server implementation that could use an Internet-accessible shared database. CEPH Viewer uses Sybase (running on a Sun) as the back end (server) and the Macintosh DBMS 4th Dimension (4D) as the front end (client). We implemented the server component rapidly through a straightforward transformation of the CEPH data files into eight relational tables.² The use of CEPH Viewer is complementary to the use of CEPH’s publicly available quickmap program (2).

We emphasize that, given the nature of the data on which it operates, CEPH Viewer is meant to assist, not supplant, the mapper’s judgment, and the verification (or disproof) of clones that it flags as “suspect” (or the verification or disproof of a suggested tiling pattern) can be performed only in the laboratory. With regard to straight reporting of CEPH data, the user interface component of CEPH Viewer emphasizes the ability to choose multiple working sets of YAC clones (or STSs) and display as much data as possible on a chosen set of YACs or STSs up front. This involves issuing searches to the server based on sets of clone identifiers (or sets of STSs spanning a particular zone on the linkage map of a particular chromosome), capturing the results, and concatenating the contents of result columns into single text fields prior to display.4 CEPH Viewer will integrate experimentally derived local data from independent (non-CEPH derived) tables wherever such data is available. Figure 1 shows such an integrated display of a user-specified YAC set, which also combines local data (in this case, FISH mapping data) and fingerprint information. Such integration is important, because the precom-

¹ To whom correspondence should be addressed at the Yale Center for Medical Informatics, Department of Anesthesiology, Yale School of Medicine, 333 Cedar Street, New Haven CT 06510. Telephone: (203) 785-7403. Fax: (203) 737-2243.

² We took one liberty in data transformation to reduce its bulk: the size of clone fragments derived from fingerprinting gels is reported in the CEPH data to six digits, whereas the technology CEPH uses for sizing the fragments cannot reliably size fragments above 20 kb; fragments above this size are too close to the origin. We therefore truncated large fragments to 32,000 bases, thereby allowing their storage as small integers (which occupy two bytes of storage) rather than long integers (which occupy four bytes). With 1.5 million gel fragments in the database, considerable space is saved.

GENOMICS 25, 318-320 (1995)
0888-7543/95 $6.00
Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
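The fragment-size truncation described in the footnote on data transformation can be illustrated with a short Python sketch. It is an assumption-laden stand-in for the Sybase column types: Python's `array` type code `"h"` plays the role of the two-byte small integer, and the function name `pack_fragments` and the sample sizes are hypothetical.

```python
from array import array

MAX_FRAGMENT = 32000  # cap from the footnote; below the signed 16-bit limit of 32,767

def pack_fragments(sizes):
    """Clamp fragment sizes (in bases) to MAX_FRAGMENT and store them
    as signed 2-byte integers."""
    return array("h", (min(size, MAX_FRAGMENT) for size in sizes))

# Made-up fragment sizes; the last one exceeds the cap and is truncated.
packed = pack_fragments([1200, 18500, 145000])

# Space saved for 1.5 million fragments at 2 bytes instead of 4 bytes each.
bytes_saved = 1_500_000 * (4 - 2)
```

Under these assumptions the saving is 3,000,000 bytes for the fragment sizes alone, a meaningful amount on mid-1990s hardware.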

³ This information is computed on demand and not stored permanently in the database, since the binding of one clone to more than one STS could be alternatively interpreted to indicate that two different regions of the same chromosome might share a similar stretch of sequence.

⁴ The infoclone program, which is a nongraphic component of the quickmap package and operates like a UNIX filter, reports similar information, but does so for a single STS or YAC at a time.

[Figure 1: screen display. Upper panel columns: YAC; FISH; Size (kb); STSs Bound (+ Chr/distance); ALU-PCR Info; "Associated Clones". Lower panel: Fingerprint Data (labels on top indicate clone names and number of bands); Probe: Kpn; Enzyme: PvuII.]

FIG. 1. An integrated display of data showing a user-selected set of YAC clones. The top part shows for each YAC: physical localization (chromosome and range of percentage fractional length from Pter) (this is local experimental data); size of each YAC in kilobases; the list of STSs to which this YAC hybridizes (and for each STS, the chromosome with which it is associated, and the sex-averaged linkage distance from the end of Pter, in Morgans); other YACs associated with this YAC by Alu-PCR hybridization (the letters F, M, and S indicate faint, medium, and strong hybridization signals in three dimensions: a signal that is strong in all three dimensions is highly reliable, while a signal that is faint in two or more dimensions is highly doubtful); and YACs “associated” with this YAC on the basis of an unpublished algorithm used by CEPH, which statistically weights fingerprint and Alu-PCR information. The lower part of the display shows fingerprints (a gel mock-up) for the same set. This mock-up can be manipulated by reordering the lanes and adding or subtracting clones from the selected list to facilitate visual comparison of YACs, as a means of assessing the position of a candidate YAC within a region prior to experimental evaluation.
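The reading of the three-letter Alu-PCR codes given in the caption can be expressed as a small helper. This is a sketch under assumptions: the function name `signal_reliability` is hypothetical, and the label "intermediate" for codes that fit neither stated extreme is our own placeholder, since the caption defines only the two extremes.

```python
def signal_reliability(code):
    """Classify a three-letter Alu-PCR signal code such as 'FFF' or 'SSF'.

    Each letter is F (faint), M (medium), or S (strong), one per dimension.
    """
    code = code.upper()
    if len(code) != 3 or any(ch not in "FMS" for ch in code):
        raise ValueError(f"expected three letters from F, M, S: {code!r}")
    if code == "SSS":
        return "highly reliable"   # strong in all three dimensions
    if code.count("F") >= 2:
        return "highly doubtful"   # faint in two or more dimensions
    return "intermediate"          # assumed label; not defined in the caption

labels = {code: signal_reliability(code) for code in ("SSS", "FFF", "FSF", "MSM")}
```

A filter of this kind could be used to hide doubtful associations before comparing clones visually.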


puted associations between YACs (computed by CEPH through a proprietary algorithm) are not 100% reliable: the underlying data is in constant flux. For example:

Assignment of the Waldner Blood Group Locus (WD) to 17q12-q21

• The STS-YAC hybridization screening as performed by CEPH has a significant incidence of false negatives (Janine Leblanc-Straceski, pers. commun.).

• There are often discrepancies between data derived from CEPH and non-CEPH sources. An earlier version of the CEPH data set showed an inconsistency for the clone 771_h_1 in Fig. 1, which was reported by CEPH to hybridize exclusively to chromosome 9-derived STSs, but hybridized during FISH mapping (performed by the second author) to chromosome 6 only. This was communicated to CEPH, which then repeated the experiments: Fig. 1 shows hybridization of 771_h_1 with both chromosome 9- and chromosome 6-derived STSs.
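A discrepancy of the kind just described can be found with a single relational query once local and CEPH-derived data share a clone identifier. The sketch below uses Python's built-in `sqlite3` as a stand-in for the Sybase server. The table names (`ceph_sts_yac`, `local_fish`), column names, and clone and STS names are hypothetical and do not reflect the actual CEPH Viewer schema.

```python
import sqlite3

def find_discrepancies(conn):
    """Return (yac, fish_chrom, ceph_chroms) for clones whose local FISH
    chromosome matches none of the chromosomes of their CEPH STS hits."""
    sql = """
        SELECT f.yac, f.chrom, GROUP_CONCAT(DISTINCT s.chrom)
        FROM local_fish AS f
        JOIN ceph_sts_yac AS s ON s.yac = f.yac
        GROUP BY f.yac, f.chrom
        HAVING SUM(s.chrom = f.chrom) = 0
        ORDER BY f.yac
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ceph_sts_yac (yac TEXT, sts TEXT, chrom INTEGER);
    CREATE TABLE local_fish   (yac TEXT, chrom INTEGER);
    CREATE INDEX idx_sts_yac ON ceph_sts_yac (yac);  -- indexed lookup on the clone identifier
""")
# Made-up rows: YAC_A binds only chromosome 9 STSs but maps to chromosome 6 by FISH.
conn.executemany("INSERT INTO ceph_sts_yac VALUES (?, ?, ?)",
                 [("YAC_A", "STS_1", 9), ("YAC_A", "STS_2", 9), ("YAC_B", "STS_3", 6)])
conn.executemany("INSERT INTO local_fish VALUES (?, ?)",
                 [("YAC_A", 6), ("YAC_B", 6)])

rows = find_discrepancies(conn)
```

Here only `YAC_A` is returned, since its FISH localization agrees with none of its STS-derived chromosomes. As the text stresses, such a report is a prompt for laboratory verification rather than a conclusion.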

T. Zelinski,¹ G. Coghlan, L. White,

Integration of local data with CEPH data would be very difficult without a relational database framework. CEPH Viewer achieves integration of local data through a simple indexed lookup of the local data on CEPH-derived identifiers (other identifiers, such as GDB D numbers or long integer unique identifiers, may also be stored with a local object).

A client-server version of CEPH Viewer requires the following software licenses: (a) Sybase (Sybase Inc., Emeryville, CA) and (b) 4th Dimension, 4D SQL Server and 4D DRAW, all from ACI US, Cupertino, CA, or from discount software retailers. For institutions with a Sybase license who express in writing their intention to use CEPH Viewer, we will provide, free of cost, UNIX shell scripts for schema definition (which includes skeletons of tables for storing local data) and Sybase stored procedures, awk and perl scripts to transform the CEPH data files into normalized form, and the 4D front-end code and documentation.

Those who wish to run a stand-alone microcomputer version of CEPH Viewer (without the graphical fingerprint displays) will need a machine with at least 70 MB of free disk space and a Foxpro license. (The 4D database engine does not handle this volume of data satisfactorily in stand-alone mode.) For such users, we will supply a Foxpro schema and interface code.

ACKNOWLEDGMENTS

We thank Ken Krauter, Janine LeBlanc, Kate Montgomery, Beatrice Renault, David Ward, and Sung-Joo Yoon for their inputs. We thank Phillipe Rigault (CEPH) for giving us a detailed explanation of quickmap’s workings. The data imported into CEPH Viewer was downloaded by anonymous ftp from the directory /pub/ceph-genethon-map of the server ceph-genethon-map.genethon.fr. This work was supported by Grants P01HG00965 and R01HG00272 from the National Center for Human Genome Research.

and S. Philipps

Rh Laboratory, Departments of Pediatrics and Human Genetics, University of Manitoba, Winnipeg, Manitoba, Canada, R3E 0L8

Received June 14, 1994; revised September 28, 1994

The Waldner blood group antigen (WD1) was first recognized as a distinct erythrocyte surface structure in 1978 (6). Occurring infrequently (incidence

¹ To whom correspondence should be addressed at Rh Laboratory, 735 Notre Dame Avenue, Winnipeg, Manitoba, Canada R3E 0L8. Telephone: (204) 789-3244. Fax: (204) 787-4807.

REFERENCES

1. Cohen, D., Chumakov, I., and Weissenbach, J. (1993). A first-generation physical map of the human genome. Nature 366(6456): 698-701.

2. Rigault, P., and Poullier, E. (1994). QUICKMAP: Compact database and navigation tool for integration of CEPH-Généthon mapping data. Cold Spring Harbor Conference on Genome Mapping and Sequencing, 1994, p. 206.

FIG. 1. An example of a family cosegregating for WD and D17S41. O/Cl represent WD: -l,O, and O/II represent WD:l,O. D17S41 alleles are indicated by arrowheads. A paternal z = 9:1 count for the WD:D17S41 pair is observed.

GENOMICS 25, 320-322 (1995)
0888-7543/95 $6.00
