Computers and Electronics in Agriculture 17 (1997) 249-261
ELSEVIER
Using model-based image processing to track animal movements
R.D. Tillett *, C.M. Onyango, J.A. Marchant
Silsoe Research Institute, Wrest Park, Silsoe, Bedfordshire, MK45 4HS, UK
Abstract
Previous work has shown how a trainable flexible model (a point distribution model) can be used to locate pigs in images. This paper extends the idea to tracking animal movements through sequences of images where a single pig is viewed from above. As well as position and rotation, more subtle motion such as bending and head nodding can be modelled. This type of model-based tracking could be used to characterise animal behaviour over time. The technique was used on seven sequences and worked well in most cases. However, it is possible to lose lock, as happened in one sequence, and the method currently reported cannot restart tracking. Further developments are required to investigate model fitting methods and high level control over the fitting and tracking process. © 1997 Elsevier Science B.V.
Keywords: Image processing; Animal behaviour; Pigs; Tracking
1. Introduction
Animal behaviour is a complex mixture of responses to internal and external stimuli. A full description of animal behaviour would include internal responses, e.g. immunological, but some aspects of animal behaviour can be deduced from the animal’s physical actions. These physical actions result in changes of posture and position. Although the behavioural information contained in a gross estimate of an animal’s posture and movements is only partial, it is still useful.

In ethological research the responses to controlled stimuli are studied and measurements of the position and movement of animals are often required. Commercial systems are available which can track artificial markers attached to an animal, e.g. MacReflex Motion Analysis System from Qualisys, Bedford. Only the marker positions are recorded, and the animal’s behaviour may be affected by the attachment or presence of the markers.

Another potential use of position or posture data is in using the animal itself as a sensor, for example for temperature control. Work by Boon (1981) suggested a link between the average floor area occupied by a group of resting pigs and the deviation of the ambient temperature from the lower critical temperature under standardised conditions.

One method of observing and measuring an animal’s movements is to use image processing. A camera is a non-contact sensor that can provide large amounts of data without any disruption to the animal’s normal behaviour. The potential benefits of computer vision for livestock systems are discussed by DeShazer et al. (1988). The authors suggest many possible applications including monitoring individual or group activity levels, early detection of lameness, identification of biting pigs or other aggressive behaviour, and the detection of predator or human intruders from the animal group response. Work on the specific applications of monitoring weight and growth rates, and on environmental control using the animal as the sensor, is described by Van der Stuyft et al. (1991). However, the animal’s position in the image is unconstrained, the lighting may not be even, the background such as bedding will change over time, and multiple animals in a pen will occlude each other at times. The interpretation of this type of data requires robust algorithms which can use as much prior information as possible.

* Corresponding author.
0168-1699/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0168-1699(96)01308-7
Several approaches for locating pigs in images have been reported. Tillett (1991) used an outline derived from one training image, combined with an assumption of smooth bending along the spine to create a model with one mode of shape variation. Marchant and Schofield (1993) used a ‘snake’ algorithm to locate the edges of a pig at a drinker. The snake algorithm is good at locating smooth continuous edges in an image but it is difficult to include prior knowledge of the overall shape required. Work by McFarlane and Schofield (1995) has used a very simple model (an ellipse) for tracking piglets in a sequence of images.

Model-based image processing is a family of techniques where a model of the target object is used to guide the searching of an image. This approach allows a target object to be found even when there is only partial information available, for example due to shadows, poor lighting, occlusion or confusion due to background clutter. Provided sufficient good information is available to guide the model-fitting, the model will bridge over localised areas of uncertainty. A detailed model, once fitted to the image, will provide information on where various parts of the animal are, e.g. the head and rump. It can also provide information on bodily distortions such as particular postures.

One type of model is the point distribution model (Cootes et al., 1992). The model consists of two parts, a mean shape represented by a numbered set of landmark points, and a number of modes of variation. The model is learnt from a training set of shapes and can then be used to describe similar shapes within the variability seen in the training set. This type of model has been used for a range of applications including medical imaging (Cootes et al., 1994), face identification (Lanitis et al., 1994) and gesture recognition (Ahmad et al., 1995). An enhancement to the model, developed by Marchant (1993) and applied to pig images, is the use of a finite element technique to add a grey level rendering to the model, allowing it to be compared directly to grey levels from an image.

The use of models for tracking objects in image sequences has been reported in the literature. Pentland and Sclaroff (1991) describe a technique for building a model based on the physical properties of the object, which can then be fitted to three-dimensional information for object recognition. Pentland and Horowitz (1991) extend this technique with an extended Kalman filter to track an object over time. Baumberg and Hogg (1994) use a point distribution model learnt from a training set and a standard Kalman filter to track a pedestrian walking across the scene. Blake and Isard (1994) use a B-spline curve model derived from a shape template plus a number of key-frame templates. The model tracks an object using a Kalman filter. Dynamical models learnt from the tracking can then be incorporated to give improved tracking. The last two pieces of work mentioned both use edge data selected locally in the image, and predictive information on the dynamics to maintain a fit of the model. An alternative approach is taken in this paper, where the trained model is fitted to the image using grey level shading information as well as edge strengths. Marchant and Onyango (1995) and Onyango et al. (1995) have shown that this approach can fit the model to single images in a range of conditions including multiple pigs partially obscuring each other.
This paper describes a point distribution model to track single pigs in image sequences. The model is trained on some of the images. The modes of variation are apparently linked to independent deformations of the animal. The model is then used to track the animals through sequences and the results are presented.
2. Building a model
2.1. Training data
Seven image sequences of a single pig in a pen, viewed from above, were collected. The sequences show the pig walking, running, turning and sniffing at the ground. They vary in length from 13 to 30 images, captured at a speed of eight images per second. A sample from sequence 1 is shown in Fig. 1. The lighting was predominantly from above and arranged to be fairly even. A black and white CCD camera was used.

Two images from each sequence were used to make up the set required for training the model. These were selected from near the beginning and end of each sequence. Fourteen images should be sufficient to find the first one or two main modes of variation, provided the analysis shows that they account for most of the variation seen within the training set. The amount of variation accounted for is not known until the model is trained (see Section 2.3). The sequences do not show the pig lying or sitting so these postures will not be included in the model.
Fig. 1. Sequential images from sequence 1 of a pig viewed from above.
A point distribution model is based on the shape taken by a set of landmark points on the object to be recognised. For the pigs, a network of 49 points has been used and these are shown in Fig. 2. The network is chosen to divide the pig’s surface up into a number of elements within which the grey level shading can be interpolated (Marchant and Onyango, 1995). Each element has eight nodes on its boundary, and the corners of the elements are at specific points on the pig: the tip of the nose, the base and tips of the ears, behind the shoulders, in front of the hind legs, and the base of the tail. The choice of the key points was made to use visible landmarks on the pig as far as possible. (Normally there are kinks in the pig outline behind the shoulder and in front of the hind legs, but they are not always visible.) The other points are positioned by partitioning the edge joining these key points.

Fig. 2. The landmark points used to represent the pig’s shape.

For each image in the training set, the landmark points were located by hand, using a mouse to record the positions. When a key point was not visible its position was estimated. Poor location of some of the points will result in a model which cannot predict the position of those points very accurately. However, provided the overall model retains a ‘pig-like’ shape, it will predict pig shapes adequately. A data file was used to record the x and y coordinates of each landmark point, in order, along with the image grey level at that point. The grey level information was then used to generate the grey level rendering on the trained model.
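The text does not say how the edges between key points were partitioned. As a minimal sketch only, the following assumes that the intermediate landmarks are spaced equally by arc length along a hand-traced polyline joining two key points. The function name `partition_edge` and the equal-spacing rule are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def partition_edge(polyline, n_intermediate):
    """Place n_intermediate points at equal arc-length spacing strictly
    between the first and last vertex of a traced polyline.

    Assumption: equal spacing is used here only for illustration; the
    paper does not state the partitioning rule.
    """
    pts = np.asarray(polyline, dtype=float)
    # Length of each polyline segment, then cumulative arc length.
    seg = np.hypot(*np.diff(pts, axis=0).T)
    s = np.concatenate(([0.0], np.cumsum(seg)))
    # Arc-length positions of the intermediate points (end points excluded).
    targets = np.linspace(0.0, s[-1], n_intermediate + 2)[1:-1]
    xs = np.interp(targets, s, pts[:, 0])
    ys = np.interp(targets, s, pts[:, 1])
    return np.column_stack((xs, ys))
```

For example, calling `partition_edge` with the two key points at either end of a traced edge and `n_intermediate=3` would return three landmark positions between them.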
2.2. Aligning the data
The shape data for each training image must be rotated and translated to align them to a default position before the variations in shape can be calculated. The technique used in this work differs from the standard approach (Cootes et al., 1992) in not removing changes in scale prior to calculating the shape model. In a general three-dimensional system the object can rotate, translate or change in scale (due to moving nearer to or further from the camera) without changing shape. However, in the imaging configuration used for viewing the pigs the camera was at a fixed height and looking vertically down. Since pigs can’t fly, and only one pig was used for these experiments, there should be no changes in scale other than those due to changes in posture. For instance, a pig with its head down will present a smaller plan area than one with its head up. The global factors of translation and rotation are used to align the shapes. All other variation is counted as a variation in shape and will appear in the modes of variation calculated for the model.

The alignment of the shapes is achieved using the form of Generalised Procrustes Analysis described by Cootes et al. (1992) with the following alterations. Instead of calculating four parameters, the translation ($t_x$, $t_y$), the rotation ($\theta$) and the scale ($s$), to align each shape to a chosen position, the best fit achieved using only the translation and rotation is calculated. The values of $t_x$, $t_y$ and $\theta$ are calculated by minimising the sum of squares of the distances between equivalent points in the two shapes. Therefore the error, $E$, to be minimised in mapping a shape $x_2$ onto another shape $x_1$ is given by:

$$E = \sum_{k=0}^{n-1} \left[ (x_{2k}\cos\theta - y_{2k}\sin\theta + t_x - x_{1k})^2 + (x_{2k}\sin\theta + y_{2k}\cos\theta + t_y - y_{1k})^2 \right] \qquad (1)$$
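where $t_x$ is the translation in the x-direction, $t_y$ the translation in the y-direction, $\theta$ is the angle of rotation and the summation is over the number of points, $n$, used to represent each shape. Minimising this error by the least squares approach gives:

$$t_x = \frac{1}{n}\left(X_1 - X_2\cos\theta + Y_2\sin\theta\right) \qquad (2)$$

$$t_y = \frac{1}{n}\left(Y_1 - X_2\sin\theta - Y_2\cos\theta\right) \qquad (3)$$

$$\tan\theta = \frac{X_1 Y_2 - X_2 Y_1 - A}{-X_1 X_2 - Y_1 Y_2 + B} \qquad (4)$$

where

$$X_1 = \sum_{k=0}^{n-1} x_{1k} \qquad (5)$$

$$Y_1 = \sum_{k=0}^{n-1} y_{1k} \qquad (6)$$

$$X_2 = \sum_{k=0}^{n-1} x_{2k} \qquad (7)$$

$$Y_2 = \sum_{k=0}^{n-1} y_{2k} \qquad (8)$$

$$A = n \sum_{k=0}^{n-1} \left(x_{1k} y_{2k} - x_{2k} y_{1k}\right) \qquad (9)$$

$$B = n \sum_{k=0}^{n-1} \left(x_{1k} x_{2k} + y_{1k} y_{2k}\right) \qquad (10)$$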
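The closed-form alignment of Eqs. (2)-(10) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names `align_rigid` and `apply_rigid` are hypothetical, the signs of $t_y$ and $B$ are taken from differentiating Eq. (1), and `arctan2` is used in place of a plain arctangent so that the quadrant of $\theta$ is resolved.

```python
import numpy as np

def align_rigid(shape1, shape2):
    """Translation (tx, ty) and rotation theta that map shape2 onto shape1
    in the least squares sense, with no scaling (Eqs. (2)-(10)).

    shape1, shape2: arrays of shape (n, 2) holding equivalent landmarks.
    """
    x1, y1 = shape1[:, 0], shape1[:, 1]
    x2, y2 = shape2[:, 0], shape2[:, 1]
    n = len(x1)
    X1, Y1, X2, Y2 = x1.sum(), y1.sum(), x2.sum(), y2.sum()
    A = n * np.sum(x1 * y2 - x2 * y1)
    B = n * np.sum(x1 * x2 + y1 * y2)
    # arctan2(numerator, denominator) of Eq. (4) keeps the correct quadrant.
    theta = np.arctan2(X1 * Y2 - X2 * Y1 - A, B - X1 * X2 - Y1 * Y2)
    tx = (X1 - X2 * np.cos(theta) + Y2 * np.sin(theta)) / n
    ty = (Y1 - X2 * np.sin(theta) - Y2 * np.cos(theta)) / n
    return tx, ty, theta

def apply_rigid(shape, tx, ty, theta):
    """Rotate a shape by theta about the origin, then translate it."""
    c, s = np.cos(theta), np.sin(theta)
    x, y = shape[:, 0], shape[:, 1]
    return np.column_stack((x * c - y * s + tx, x * s + y * c + ty))
```

Aligning a training set would then amount to calling `align_rigid` for each shape against the chosen reference shape and passing the result to `apply_rigid`.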
2.3. Training the model
The model is trained by the technique presented by Cootes et al. (1992). The model consists of two parts, the mean shape, and a number of modes of variation representing the way in which the points of the shape tend to move together. The mean shape of the aligned training shapes is found by taking the mean position of each landmark point over the training examples. The modes of variation are calculated using principal component analysis. Each mode of variation is given by a unit eigenvector of the covariance matrix of the training shapes’ deviation from the mean shape. The associated eigenvalues give the proportion of the total variation that is explained by each eigenvector.

The results for the training data used in this paper are shown in Fig. 3. The figure shows the computer generated model with the grey level rendering added. The rotation and translation of the mean shape are arbitrary (they depend on which training shape was used to align the others). The mean shape is in the centre, with changes in the first mode of variation along the vertical axis, and the second mode of variation along the horizontal axis. The images shown represent the deformation by one standard deviation in each direction for each mode of variation. The model can also deform with components contributed from several modes of variation at once.

Fig. 3. Examples of the trained model with grey level rendering added. The mean shape is in the centre, with changes in the first mode of variation along the vertical axis, and the second mode of variation along the horizontal axis. The images show a deformation by one standard deviation in each direction for each mode of variation.

The modes of variation appear to represent basic postural changes of the pig, as demonstrated in the training set. The first mode, accounting for 63% of the total variation, shows lateral bending of the pig’s back. The second mode (20%) shows nodding of the pig’s head. The less significant modes (not shown in Fig. 3) allow more subtle changes in shape to be included, but are less clearly linked to a specific behavioural cause.
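The training step described in this section (mean shape, eigenvectors of the covariance matrix, proportion of variation per mode) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `train_pdm` and `deform` are hypothetical, the covariance is normalised by the number of training shapes, and the grey level rendering is not included.

```python
import numpy as np

def train_pdm(shapes):
    """Train a point distribution model from aligned shapes.

    shapes: array of shape (n_examples, n_points, 2).
    Returns the mean shape vector, the modes (unit eigenvectors as columns,
    most significant first), the eigenvalues, and the proportion of the
    total variation explained by each mode.
    """
    data = shapes.reshape(len(shapes), -1)
    mean = data.mean(axis=0)
    deviation = data - mean
    # Covariance of the deviations from the mean (assumed 1/N normalisation).
    covariance = deviation.T @ deviation / len(data)
    eigvals, eigvecs = np.linalg.eigh(covariance)
    order = np.argsort(eigvals)[::-1]
    # Clip tiny negative values caused by rounding in rank-deficient cases.
    eigvals = np.clip(eigvals[order], 0.0, None)
    modes = eigvecs[:, order]
    proportion = eigvals / eigvals.sum()
    return mean, modes, eigvals, proportion

def deform(mean, modes, eigvals, weights):
    """Shape obtained by moving weights[i] standard deviations along mode i."""
    weights = np.asarray(weights, dtype=float)
    b = weights * np.sqrt(eigvals[:len(weights)])
    return (mean + modes[:, :len(weights)] @ b).reshape(-1, 2)
```

With such a model, `deform(mean, modes, eigvals, [1.0])` would give the mean shape displaced by one standard deviation along the first mode, which corresponds to the kind of example shown in Fig. 3.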
3. Fitting the model
3.1. Measures of fit
An optimisation technique (explained below) is used to find a set of model parameters that minimises the weighted sum of two measures of fit. The measures of fit are described by Marchant and Onyango (1995). They are: (i) the average difference between the grey levels in the rendered model and those from the image at the model position, and (ii) the grey level gradient in the image in a direction normal to the model boundary, averaged along the model boundary.

The optimisation technique used is the Simplex method (Press et al., 1988). This method evaluates the measure of fit for a number of model positions which form a simplex in the model parameter space. The resulting fits are used to decide how to reshape the simplex to move towards a minimum. This technique is used for multivariable optimisation where the shape of the objective function is not well understood.

Marchant and Onyango (1995) showed how smoothing the image would smooth the objective function, allowing more reliable location of a good minimum. The smoothed image is used to find a minimum of the grey level fit, and this position is then used as a starting point to find a minimum for the edge strength fit. During each fit the simplex is restarted from the converged position to allow it to climb out of local minima. This is repeated until the new converged position is within a given tolerance of the previous one.
3.2. Tracking a sequence
The image sequences used in this work were tracked in the following way. The model had four parameters: the x and y positions, the rotation of the pig and the amount of bend (the first mode of variation from the point distribution model). For the first image in the sequence the model was positioned approximately by hand. The optimisation procedure was then used to refine the model’s fit to the image. The model parameters were then recorded. For subsequent images of the sequence the optimisation procedure was started from two model positions. The first was the position recorded for the previous image of the sequence. The second was the previous position translated along the pig’s axis towards its head by 20 pixels. The final measures of fit evaluated for each starting point were compared to select the best result, and the parameters of the corresponding model position were recorded.

The technique of starting the model from two positions helps to overcome one of the problems in fitting the model. There is often a local minimum when the model pig’s ears are aligned with the shoulders of the pig in the image. This problem is likely to occur if the pig moves forward during the sequence, and so is a common occurrence. Starting the model in the ‘moved forward’ position reduces this problem, but doubles the computational requirement.
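The restarted simplex fit of Section 3.1 and the two-start tracking loop described above can be sketched together as follows. This is a simplified illustration, not the implementation used for the results: the downhill simplex routine is a compact variant written for this example, the measure of fit is passed in as a hypothetical callable per image (lower is better), the separate smoothed grey level and edge strength stages are omitted, and the rotation parameter is assumed to be in radians with the head lying in the direction (cos, sin) of that angle.

```python
import math
import numpy as np

def nelder_mead(f, x0, step, tol=1e-8, max_iter=2000):
    """Compact downhill simplex search; returns (best_point, best_value)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each parameter axis.
    simplex = [x0.copy()]
    for i in range(n):
        vertex = x0.copy()
        vertex[i] += step[i]
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(values)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] <= tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        direction = centroid - simplex[-1]
        reflected = centroid + direction
        f_ref = f(reflected)
        if f_ref < values[0]:
            # Reflection beat the best vertex: try going further.
            expanded = centroid + 2.0 * direction
            f_exp = f(expanded)
            if f_exp < f_ref:
                simplex[-1], values[-1] = expanded, f_exp
            else:
                simplex[-1], values[-1] = reflected, f_ref
        elif f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
        else:
            contracted = centroid - 0.5 * direction
            f_con = f(contracted)
            if f_con < values[-1]:
                simplex[-1], values[-1] = contracted, f_con
            else:
                # Shrink every vertex towards the best one.
                for i in range(1, n + 1):
                    simplex[i] = simplex[0] + 0.5 * (simplex[i] - simplex[0])
                    values[i] = f(simplex[i])
    best = int(np.argmin(values))
    return simplex[best], values[best]

def fit_with_restarts(f, x0, step, pos_tol=1e-3, max_restarts=20):
    """Restart the simplex from each converged position until it stops moving."""
    x, fx = nelder_mead(f, x0, step)
    for _ in range(max_restarts):
        x_new, f_new = nelder_mead(f, x, step)
        moved = float(np.max(np.abs(x_new - x)))
        x, fx = x_new, f_new
        if moved <= pos_tol:
            break
    return x, fx

def track_sequence(objectives, first_guess, step, advance=20.0):
    """objectives: one callable per image mapping (x, y, angle, bend) to a fit."""
    params, _ = fit_with_restarts(objectives[0], first_guess, step)
    tracked = [params]
    for f in objectives[1:]:
        previous = tracked[-1]
        # Second start: previous position moved `advance` pixels towards the head.
        ahead = previous.copy()
        ahead[0] += advance * math.cos(previous[2])
        ahead[1] += advance * math.sin(previous[2])
        fits = [fit_with_restarts(f, start, step) for start in (previous, ahead)]
        best_params, _ = min(fits, key=lambda fit: fit[1])
        tracked.append(best_params)
    return tracked
```

Keeping the measure of fit as a callable leaves the choice of objective (grey level difference, edge strength, or a weighted sum) outside the tracking loop, so the two-start logic can be read on its own.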
4. Results
The results of fitting the model to the sequence shown in Fig. 1 are shown in Fig. 4. The model fits the body of the pig well throughout the sequence. The head is not so reliably found. This is because only one mode of shape variation is used in the model. This mode is the bend of the pig. When the pig bends evenly along its length, including the head, a good fit is achieved. When the head is moved independently of the main body position, as in images 4 and 5, the objective function minimum is biased towards the larger area and so fits the body and ignores the head.

Adding more modes of variation would allow the shape to vary more, but can introduce problems in the minimisation procedure because the minimisation surface becomes more complex. Further work is required to investigate this area, such as investigating alternative measures of fit, more extensive training of the model, and possibly multi-scale or sequential fitting of the parameters. The complexity required within the model will depend on what is required from the tracking data. For some applications one mode of variation may be sufficient.
Fig. 4. The model fitted to the image sequence shown in Fig. 1.
Fig. 5. The model fitted to the images in sequence 4.
The algorithms as currently implemented take just under 10 min on a Sparc 20 processor for the 18 images in the sequence. This includes running the simplex method 256 times, with the convergence tolerance set to 0.001 for each run. Changes to these values and attention to the method of evaluating the fit to the image could reduce the computation time considerably, possibly by a factor of five or ten. However, because of the iterative method of optimisation, this technique is not currently appropriate for real-time implementation.

The model was fitted to all seven sequences. The results were similar to those shown in Fig. 4 for all the sequences except number 4. The results for sequence 4 are shown in Fig. 5. At image 9 the fitted model is behind the pig’s position with the model ears close to the pig’s shoulders. The tracking does not recover from this situation. The current procedure is very dependent on maintaining a good fit to the pig in order to provide a good estimate for the next image. The minimisation step is unreliable if the initial estimate is not close to the pig’s position, and further work is needed to increase reliability in this area.

The parameters returned from the model-fitting process give information about the pig at each step of a sequence. The graphs in Fig. 6 show the parameter values for sequence 1. The graph on the left shows how the position of the centre of the pig has moved down the image and up again (the y-value) and from right to left (the x-value). The graph on the right shows that the angle of the pig’s body has increased (the pig rotated clockwise) and the bend of the body was initially to the left but later to the right. This information can be used to calculate the speed or activity of the animal, or other information such as the location of the head over time.
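As an illustration of how speed could be derived from the tracked parameters, the sketch below differences the fitted centre positions of consecutive images and scales by the capture rate of eight images per second given in Section 2.1. The function names and the tuple layout (x, y, angle, bend) are assumptions made for this example, results are in pixels per second because no pixel-to-distance calibration is given in the text, and the turning rate ignores wrap-around of the angle.

```python
import math

FRAME_RATE_HZ = 8.0  # capture rate stated in Section 2.1

def speeds(track, frame_rate=FRAME_RATE_HZ):
    """Speed of the model centre between consecutive images, in pixels/s.

    track: sequence of (x, y, angle, bend) tuples, one per image.
    """
    result = []
    for previous, current in zip(track, track[1:]):
        dx = current[0] - previous[0]
        dy = current[1] - previous[1]
        result.append(math.hypot(dx, dy) * frame_rate)
    return result

def turning_rates(track, frame_rate=FRAME_RATE_HZ):
    """Rate of change of the body angle between consecutive images, per second."""
    return [(current[2] - previous[2]) * frame_rate
            for previous, current in zip(track, track[1:])]
```

A simple activity measure could then be taken as, for instance, the mean of `speeds(track)` over a sequence.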
The development of a point distribution model of a pig, and a method for fitting it to a sequence of images, has shown some promise. The model captures the shape variations due to independent behaviours of the pig, such as bending its body, or nodding its head. However, there are problems with fitting the model reliably to a large range of images. Further work is required on the choice of objective function, the minimisation technique used, and the higher level monitoring of the fitting and tracking in order to detect failures.

Previous work by McFarlane and Schofield (1995) described the tracking of piglets using an ellipse as the model. The advance given by the model described in this paper is that the parts of the model correspond directly to parts of the pig. Once the model is fitted to a sequence, the movement of each landmark point, e.g. the tip of the nose or the shoulders, can be extracted. Also, since the modes of variation correspond to particular postures (such as lateral bending), the amount of this posture seen through a sequence can be calculated. Therefore this type of model is a rich source of information on how the animal is behaving, and can be used to locate parts of the sequence which are required for further investigation by automatic or manual means.

The technique could be extended to track multiple pigs, and this would allow interactions between pigs to be measured. The location of multiple pigs in a single image, even when overlapping, has been reported by Onyango et al. (1995). The reliability of the fitting process would be even more important with multiple pigs in the scene. The computational requirement would also increase with the number of pigs being followed.
Fig. 6. Graphs showing the model parameters for each image in sequence 1. (Axis label: position in sequence; plotted series include rotation and bend.)
Conclusions
A flexible model has been developed to capture the variable appearance of a pig viewed from above. The mean shape and the first two modes of variation capture over 80% of the variability seen in the training set. The modes of variation of the model are related to independent behaviours seen in the training set. The first mode of variation corresponds to the lateral bending of the pig’s back. The second mode corresponds to head nodding. The model can be used to track a pig in an image sequence. Seven sequences have been tracked using a model with only the first mode of variation, but further work is required to increase the reliability of the tracking. The fit of the model through the sequence provides data on the pig’s position and posture. Activity measurements, localisation of specific parts such as the head, and some behavioural information can be deduced from this data. If several pigs were tracked then interactions between them could be measured.
This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The pig images were provided by C.P. Schofield of the Animal Science and Engineering Division of Silsoe Research Institute.
References
Ahmad, T., Taylor, C.J., Lanitis, A. and Cootes, T.F. (1995) Tracking and recognising hand gestures using statistical shape models. In: D. Pycock (Editor), BMVC95, Proceedings of the 6th British Machine Vision Conference, Birmingham. BMVA Press, Malvern, U.K., pp. 403-412.
Baumberg, A.M. and Hogg, D.C. (1994) An efficient method for contour tracking using active shape models. Univ. of Leeds, School of Computer Studies, Research Report Series, Report 94.11.
Blake, A. and Isard, M. (1994) 3D position, attitude and shape input using video tracking of hands and lips. SIGGRAPH 94, Computer Graphics Proceedings. Annu. Conf. Ser., 1994: 185-192.
Boon, C.R. (1981) The effect of departures from lower critical temperature on the group postural behaviour of pigs. Animal Prod., 33: 71-79.
Cootes, T.F., Taylor, C.J., Cooper, D.H. and Graham, J. (1992) Training models of shape from sets of examples. In: D. Hogg and R. Boyle (Editors), BMVC92, Proceedings of the British Machine Vision Conference, Leeds. Springer Verlag, Berlin, pp. 9-18.
Cootes, T.F., Taylor, C.J. and Lanitis, A. (1994) Active shape models: Evaluation of a multi-resolution method for improving image search. In: E. Hancock (Editor), BMVC94, Proceedings of the 5th British Machine Vision Conference, York. BMVA Press, Malvern, U.K., pp. 327-336.
DeShazer, J.A., Moran, P., Onyango, C.M., Randall, J.M. and Schofield, C.P. (1988) Imaging systems to improve stockmanship in pig production. AFRC Inst. Eng. Res. Div. Note DN 1459; 24 pp.
Lanitis, A., Taylor, C.J. and Cootes, T.F. (1994) An automatic face identification system using flexible appearance models. In: E. Hancock (Editor), BMVC94, Proceedings of the 5th British Machine Vision Conference, York. BMVA Press, Malvern, U.K., pp. 65-74.
McFarlane, N.J.B. and Schofield, C.P. (1995) Segmentation and tracking of piglets in images. Machine Vision Appl., 8: 187-193.
Marchant, J.A. (1993) Adding grey level information to point distribution models using finite elements. In: J. Illingworth (Editor), BMVC93, Proceedings of the 4th British Machine Vision Conference, Guildford. BMVA Press, Malvern, U.K., pp. 309-318.
Marchant, J.A. and Onyango, C.M. (1995) Fitting grey level point distribution models to animals in scenes. Image Vision Comput., 13 (1): 3-12.
Marchant, J.A. and Schofield, C.P. (1993) Extending the snake image processing algorithm for outlining pigs in scenes. Comput. Electron. Agric., 8: 261-275.
Onyango, C.M., Marchant, J.A. and Ruff, B.P. (1995) Model based location of pigs in scenes. Comput. Electron. Agric., 12: 261-273.
Pentland, A. and Horowitz, B. (1991) Recovery of non-rigid motion and structure. IEEE Trans. Pattern Analysis Machine Intelligence, 13 (7): 730-742.
Pentland, A. and Sclaroff, S. (1991) Closed-form solutions for physically based shape modelling and recognition. IEEE Trans. Pattern Analysis Machine Intelligence, 13 (7): 715-729.
Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1988) Numerical recipes in C. Cambridge University Press, Cambridge, pp. 305-309.
Tillett, R.D. (1991) Model-based image processing to locate pigs within images. Comput. Electron. Agric., 6: 51-61.
Van der Stuyft, E., Schofield, C.P., Randall, J.M., Wambacq, P. and Goedseels, V. (1991) Development and application of computer vision systems for use in livestock production. Comput. Electron. Agric., 6: 243-265.