Comparison of 2-D gel electrophoresis images

Comparison of 2-D gel electrophoresis images

Paste- Rccoenieac t _t ;, Comparison of 2-D gel electrophoresis images Horace H .-S . IP Research Computer Unit, fmper a! Cancer Research hand L...

332KB Sizes 2 Downloads 59 Views



Paste-

Rccoenieac t _t

;,

Comparison of 2-D gel electrophoresis images Horace H .-S . IP Research Computer Unit, fmper a! Cancer Research hand Laboratories, P.O . Box 123. Lincoln s [nn Fields . London If C2.1 2PX. United Kingdom

D .J . POTTER* Image Processing Group, Deparenent

of Physics

and Astronomy, University College London, London, United Kingdom

Received September 1985 Revised 8 May 1986

Abstract : This paper reports the design and implementation of an algorithm for pairing protein spots seen on separate 2-D electrophoresis gets . The algorithm consists of a global linear transformation applied to one set of protein spots followed by local matching based on the chamfering_ technique . The algorithm has been implemented on the CLIP4 family of 2-D cellular array processors .

Ker ,vordc image blobs matching, parallel algorithm, array processors .

1 . Introduction Two-dimensional gel electrophoresis is a process by which the various types of proteins synthesised in a cell sample are separated according to two intrinsic properties of the protein molecules, namely molecular charge and size (mass) . The output of the process is a gel layer containing a twodimensional pattern of spots . The location of each spot on the get is related to an associated protein molecular charge along the x-axis and to the size of the molecule along the y-axis . The density of the spot is also related to the abundance of the associated protein in the sample . By noting differences in the spot patterns produced from normal and cancerous cell samples, this process potentially enables us to pinpoint the difference between proteins synthesized in normal and cancerous cells, and thereby may lead to the development of antibodies which would offset 'Supported by the Science and Eneineerina Research `buncil .

these differences . Because over 1000 spots can typically be found on one of these gels and the differences, both qualitative and quantitative, are expected to be small, this work is directed at matching corresponding protein spots found on two separate gels with the aid of an array computer, CLIP4 (see Duff, 1978) . Approaches which are based on graph matching (Skolnick, 1982), and others based on affine transformations working in conjunction with a set of heuristics (e .g . Lemkin and Lipkin, 1981 ; Carrels et al ., 1983) have been proposed . Our technique draws on experience gained from these approaches, and, at the same time, takes advantage of the parallelism available in the 2-D array computer .

2 . Observations Due to the variability of local background intensity around each spot and the physiochemical effects which serve to generate these spots in the first

016 - -8655 8T'53 .50 i 1987 . Elsevier Science Publishers B .y' . (North-Holland)

81



January 198'

PA ,! ERN RDCOG-N . . .-% LErFERS

place, segmen _d object features s 'ch as edge seg m •n ts, shape, size and optical: density of corresponding spots are not maintained in separate gels to a degree which can be used for matching purposes . The only useful guides to matching corresponding spots are the positions of the spot in the gel and the local configuration around corresponding spots . Since these protein spots represent the focuii of protein molecules having similar molecular weights and isoelectric points, they are typically elliptical and have a major axis of 5-7 pixels across in the gel image . Our matching algorithm consists of two stages : a global linear transformation applied to all the spots on one of the gels, followed by matching based on the chamfering technique (Barrow et al ., 1977) . The chamfer is a distance function represented by grey levels in an image . It was first suggested by Barrow et al . in matching satellite images to maps .

on _he two ma chin gels . These parameters are estimated using the method of least squares . The transformation is then applied globally to all the spots in one of the gels . A chamfer defined by a distance (metric) function is generated for the untransformed spot image . The process effectively generates a terrain of 'hills' and `wells' in the resulting chamfer map which correspond to the distances to the nearest spot and the locations of the spots respectively . Marching is accomplished by projecting the set of spots from the transformed gel image onto the terrain (chamfer function) and moving each spot down along the steepest slope of the 'hills' in the chamfer until it arrives at the nearest 'well' . Multiple matching is prevented by an averaging and interpolation step which acts on the shift-distance for the spots as defined by the chamfer map .

4 . Global linear transform 3 . Approach The linear transform is defined as follows : The inputs to the matching programs are binary images consisting of isolated white points (Figure I), each white point denoting the location of the centre of a protein spot previously segmented from the gel image (see Potter, 1984) . The parameters of the linear transformation, which aims to correct for the relative displacement and orientation of the two sets of spots are estimated from a set of manually located landmark

x'=ax+by+c

(1)

y'=dx-ey+f

(2)

and

where x' and y' are the transformed coordinates of x and y respectively . To map a spot centred at x and y to x' and y' respectively requires movement over the distances : dy=y'-y=cr+(d-I)y+f

(3)

and dx=x'-x=(a-I -b-c/d)-x +(b/d)-y'+e-(fb)/d .

Figure 1 . .A pattern of dots marking the positions of protein spots in a 2-D electrophoresis get .

(4)

The present implementation initially generates two images, DX and DY, whose pixel values are equal to the amount of pixel shifts required by the least squares criterion . Ramp images (wedge functions) along the x- and the y-axes are generated using global propagation operations (see Otto, 1984) . By treating each pixel in the ramp images with the appropriate scaling factors and constants as specified in equation (3),



'AFT MI

! Lk-

"? ON= !he 40-image DA , He Ayvzraae DX

an be comp . red similar ; . . The actual pixel shitting operation is done serialeach group of pixels having the same ialue in he shift images . DX and DT . This insolces finding all pixels in the _hut image having tle same followed by executing the correct number of shifts for those pixels . The complexity of this part of the matching algorithm is proportional to the total number of shift operations, rather than to the sum of the total number of shifts required by each of the spots in the ima g e, Parallel implementation of equations (3) and (4) means that the values of d-v or d v can be computed simultaneously for alt the spots and the transformation is applied to all spots requiring [lie same number of pixel shifts simultaneously . This is in advantage over serial implementation of the same algorithm where a large number of spots has to he processed .

n,l , ~~ - : i, ? al mum her or proton spor, ! and in Ae it : inlays, Equation O can be evaluated for all pixels in an imags simultaneously by using the cellular logic operation SHRINK itenni%ck . Tae binar: image containing the spots (white dots) is first inverted and the resulting~ image is SHRINKed . After each SHRINK„ the result 6 added to an accumulator . The SHRINK-and-ADD process is repeated until the accumulator stabilises . Each pixel value in the accumulator then contains the chess-board distance of the pixel to the nearest spot in the binary spot ima g e (see Figure 2) . The chamfer map is generated for the spot image which has not previously been transformed . Having Obtained the chamfer function for the spot image, the next problem is to determine the path which a spot on the globally transformed image has to traverse in order to arrive at the nearest `well' in the chamfer map . The direction of steepest descent for each point on the chamfer image is described by

5 . Local transform

(7)

- 1 , [g(v, .V)] For an isolated pixel i at coordinate (-v,, v, ), the chamfer value gjxy) as defined by the chessboard distance metric is given by z(k_v)=maxj,_v-x,I,Iy-y, 1

(5)

When there is more than one point in the grid, the chamfer value at coordinate (xy) becomes ,g(.v, .v) =min[gjx, y)]

Ft'ure

2

T,o

for i = 1, . . ., n,

and the distance to be covered is equal to the value of gets y)- The vector which describes the path is therefore : J=dx- i+ dv-j

(8)

where

(6)

Jwinter maps generated from the dot pattern shos,n in

dx = - (

I i,;u

1:

ax

[g (vy)j 2 ) 2

(a) Chess-hoard distance .

(9)

ih,

Euclidean dtitance .



PAT 1 - LFU' RPCOuN'Tlli\ LL IILFtS

and

-)

sub .maec -i ~- [

( .v. )l I

r

interpolated shift

(IC)

and i and j are unit e cctors along the v and y directions respectively . The shift-images DX and DY for the local transform can be computed using gradient operators applied to the square of the chamfer map . The number of shifts (in the .v or y direction) for each spot on the globally transformed image can be determined and applied to the spots themselves by logically Awning the appropriate shift-image with the binary spot image, the result being called the label-spot-image .

Rearranging eves 1 g(7) (XY)-[A-(XY-xY)(XY-k .r')

+B(XY vY)Xv+CvyXY +DYX(XY-Xv)(

(12)

6 .1 . Implementation 6 . Averaging and interpolation The use of label-spot-image as it stands for spot matching does not give satisfactory results . This is because spots on one gel which do not have true corresponding spots on the other gel would give rise to large incorrect shift values as dictated by the chamfer map . Furthermore, multiple matchings may occur . The first problem can be resolved by setting a threshold on the maximum distance over which the spots are permitted to migrate for the purpose of matching . To correct for the second problem, the shift values in the label-spot-image must be averaged in some fashion before shifting is performed . Simple 3 by 3 window averaging or similar methods will not have much effect as there are only a small number of spots in the window . Instead, averaging must be done over lager areas and calculation must only include shifts marked by points to be shifted . The averages are interpolated so that the distortion function, represented by the shifts, is continuous . Averaging is achieved by partitioning the labelspot-image into a number of subimages each consisting of a (n x n) block of pixels . The shiftdistances for the spots inside each subimage are first averaged and the corrected shift-distance for a spot is the interpolated value of the four averaged distances . A, 8, C and D, calculated for the four nearest subimages . For a spot lying a distance of x and y pixels from the centre of the lower-left 84

The averaged and interpolated shift-distance (in thex or v direction) for the set of spots can be computed in parallel as follows : (1) Form an image which contains a set of vertical lines, n pixels apart . (2) Use a procedure called `SUMLN MK' (see Otto, 1984) which sums, in the horizontal direction, the pixel values of a grey image within a binary mask (defined by a binary image), with the appropriate label-spot-image and the inversion of the vertical lines image generated in (I) as inputs . This summing procedure makes use of the global propagation operations and, consequently, the complexity of this part of the algorithm is essentially of an order proportional to the bit length of the images involved . (3) The resulting image is shifted one pixel to the right and added to the label-spot-image in order to

Figure

3.

Image paruuonme :or ntcrpo1au :e acer ;rvmg .



January 1987

PATTERN RECOGNITION LETTERS

\olunu 5, "Umber I

take into account those spots which happen to be lying exactly on the vertical lines (these values lie outside the binary masks and, consequently, are not summed in step (2)) . AND the sum image and the vertical lines image to obtain an image whose pixel values are the sum of all the shift values (in the label-spot-image) found in the preceding n columns of the same row . (4) Repeat the above steps but, this time, form a set of horizontal lines (in step (1)), sum along the vertical direction (in step (2)), and replace the labelspot-image used in steps (2) and (3) by the partial sum image computed in (3) . The result is an image in which the bottom right corner pixel of each subimage contains the sum of the shift distances for all the spots lying inside it in the label-spot-image (image S) . (5) Repeat steps (I) to (4) using the binary spot image instead of the label-spot-image to obtain an image in which the bottom right hand corner of each subimage contains the total number of spots found inside it (image T) . (6) Divide image S by image T to obtain the averaged shift distance . (7) Calculate the 8 images A, B, C, D, xy, yX, Xy and XYas specified by equation (12) . The pixel values in images A, B, C and D contain the averaged shift distance of the four nearest subimages respectively . To compute image B, for instance, we simply spread the relevant averaged shift values

computed in step (6) from the bottom right corner of each subimage to cover all the pixels belonging to the associated subimage . Images A, C and D are the appropriately shifted versions of B, corrected for edge effects . The remaining images are wedge images of modulus n scaled by the constants X and Y accordingly . (8) The corrected shift distance for the spots can be obtained by arithmetically combined the 8 images computed in step (7) according to equation (12) .

7 . Results and conclusions To measure the success of the algorithm, ten gels were matched to one standard gel, simulating an experiment for high and low metastatic cell line comparison . Table 1 shows the results of the comparison . False positives are divided into three categories : (a) the spots on both gels are true spots and the transformation applied is incorrect, (b) one of the spots is a foreign particle on the gel, and (c) the segmentation process picked out a spot although none is present . False negatives are divided into four categories : (a) two true spots are present which should have been matched, (b) the transformation has been applied to a foreign particle on the gel, (c) the transformation has been applied to a non-spot (due to an error by the segmentaion process), and

Table I Approximate percentages of matching ten gels to a standard gel (see text for explanation) Get no .

Matched Correct

Nonmatched

Incorrect (a)

Correct

(b)

(c)

Incorrect (a)

(b)

(c)

(d)

1 0 0 0 0 0 0 1 0 0

0 0 0 0 1 0 0 0 0 0

1 3 4 6 4 0 4 2 3 1

0 .2

0 .1

2 .8

1 2 3 4 5 6 7 8 9 10

79 70 78 66 78 82 73 83 74 84

3 4 I 1 2 2 3 3 0 1

0 1 0 0 0 0 0 1 0 0

0 0 0 0 1 0 0 0 0 0

10 12 13 7 Il 11 14 9 8 4

6 10 4 20 3 5 6 1 15 10

Av .

76 .7

2

0 .2

0 .1

9 .9

8

85



"alume i, Number I

PATTERN RECOGNITION LETTERS

January 1987

(d) the spot has not been matched because there are

analysis system using the CLIP4 array processor .

two possible matches on the ocher gel .

The size (96x96) of the array imposed certain

Normally 15-20 pairs of manually matched spots

restrictions on the pixel resolution and the area of

were used in the estimation of the least squares

the images which can be processed . Work is cur-

parameters . When compared with results obtained from another matching algorithm which uses a

rently being carried out to implement the system on the CLIP4S computer which has an array of

global transformation followed by adjusting the

512x4 processing elements .

position of each spot using neighbouring previous-

At the Imperial Cancer Research Fund Labora-

ly matched pairs, this technique has been shown to

tories, a database for storing and organising the

reduce incorrect matches whilst improving correct matches (see Potter, 1984) . Other factors affecting

protein spot data records derived from the electro-

the performance of the algorithm are the number

plemented .

and the distribution of the set of manually matched landmark spots used in the global transforma-

and the relevant data analysis software .

phoresis images has been designed and is being imComputer graphics will be used extensively to improve user access to the database

tion step and the size of the subimages used in the averaging and interpolation step . Euclidean distance (see Figure 2) has been tried

References

instead of the chess-board distance for the chamfer function, but no significant improvement in the matching rate is observed . Because of the SIMD nature of the CLIP4 computer, substantial overheads are incurred in the data reshuffling which is needed to position the data in the right order or configuration prior to the execution of operations such as parallel addition or subtraction of two sets of operand data . The current CLIP4 implementation requires approximately 90 seconds to match 400 pairs of spots . The timing can be broken down as follows : (a) manual location of landmark spots takes approximately 60 sec ., (b) the calculation and application of the global transformation takes approximately 3 sec ., (c) the generation of the chess-board chamfer function takes around 1 sec ., and (d) averaging and interpolation takes another 7 sec . for each of the two shift directions .

8 . Current developments The work described in the preceding sections forms a part of a 2-D gel electrophoresis image

86

Barrow, H,G ., J .M . Tenenbaum, R .C . Bolles and H .C . Wolf (1977) . Parametric correspondence and chamfer matching : Two new techniques for image matching . Proc. 5th Intern at, Joint Conf. Artificial Intell . 2, 659-663 . Duff, M .J .B . (1978) . Review of the CLIP image processing system . Proc . Nat . Comput. Conf, pp . 1055-1060 . Garrets, J .I ., J .T . Farrar and C .B . Burwell (1983) . The quest system for computer-analyzed two-dimensional electrophoresis of proteins . In : R . Bravo, Ed ., Two-Dimensional Electrophoresis of Proteins : Methods and Applications . Academic Press, New York . Lemkin, P .F . and L .E . Lipkin (1981) . GELL4a : A computer system for 2D gel electrophoresis analysis Il ; pairing spots . Comput. Biomed. Res. 14, 355-380 . Otto, G .P . (1984) . Algorithms for image processing on the CLIP4 cellular array processor . Ph .D . Thesis, University of London . Potter, D .J . (1984) . Analysis of images containing blob-like structures using an array processor . Ph .D . Thesis, University of London . Skolnick, M .M . (1982) . An approach to completely automatic comparison of two-dimensional gels . Clin . Chem . 28, 979-986 .