Image mosaicing using sequential bundle adjustment


Image and Vision Computing 20 (2002) 751–759 www.elsevier.com/locate/imavis

Philip F. McLauchlan*,¹ Allan Jaenicke¹
School of EE, IT and Mathematics, University of Surrey, Guildford GU2 7XH, UK
Received 10 June 2001; received in revised form 13 March 2002; accepted 14 March 2002

Abstract

We describe the construction of accurate panoramic mosaics from multiple images taken with a rotating camera, or alternatively of a planar scene. The novelty of the approach lies in (i) the transfer of photogrammetric bundle adjustment techniques to mosaicing; (ii) a new representation of image line measurements enabling the use of lines in camera self-calibration, including computation of the radial and other non-linear distortion; and (iii) the application of the variable state dimension filter to obtain efficient sequential updates of the mosaic as each image is added. We demonstrate that our method achieves better results than the alternative approach of optimising over pairs of images. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Bundle adjustment; Mosaic; Sequential algorithm; Recursive algorithm

1. Introduction

Mosaicing is a common and popular method of effectively increasing the field of view of a camera, by allowing several views of a scene to be combined into a single view. To use the technique one must assume that either the camera is rotating in a stationary scene, or that the scene is approximately planar. In these cases there is no parallax induced by the motion, and so no truly 3D effects are involved, simplifying the task of combining views. Existing methods are based on either image-to-image warping [5,11,12], or corner feature location and matching [2]. There are also commercial mosaicing systems such as the RealViz 'Stitcher' software. In all cases the aim is to recover the homographies that map images to each other, and thence allow the images to be transformed and combined in a single coordinate system. When seamless, high-resolution mosaics are required it makes sense to consider the techniques that are used in full 3D structure-from-motion and photogrammetry, which have been developed over many years for high-accuracy metrology.

* Corresponding author. Tel.: +44-1483-876037; fax: +44-1483-876031. E-mail addresses: [email protected] (P.F. McLauchlan), [email protected] (P.F. McLauchlan), ahj@imagineersoftware.com (A. Jaenicke).
¹ Imagineer Software Ltd, The Technology Centre, The Surrey Research Park, Guildford, Surrey GU2 7YG, UK

That is the main goal of this paper. The most relevant technique is bundle adjustment [1], which is a photogrammetric technique to combine multiple images of the same scene into an accurate 3D reconstruction. Initial estimates of the 3D location of features in the scene must first be computed, as well as estimates of the camera locations. Then bundle adjustment applies an iterative algorithm to compute optimal values for the 3D reconstruction of the scene and camera positions, by minimising the log-likelihood of the overall feature projection errors using a least-squares algorithm. It is quite straightforward to modify the usual bundle adjustment algorithms for image mosaicing; indeed it amounts merely to a simplification from 3D to 2D. It is perhaps more surprising that this does not appear to have been attempted before. Consider the advantages of this procedure:

† The result is statistically optimal given the usual assumptions concerning the projection errors (unbiased, Gaussian, known variance).
† All geometric constraints among the features are enforced. This is most important for dense mosaics where each part of the scene is viewed in several (i.e. more than two) images, as we shall demonstrate in our results.




† Self-calibration is a common feature of bundle adjustment algorithms, and different assumptions (e.g. fixed intrinsic parameters/varying focal length/varying focal length and zoom) may be incorporated without difficulty.
† Features other than points may be incorporated, for instance lines and curves [8], and all the feature information utilised in an optimal way.
† Since 2D bundle adjustment is basically a reduction of existing methods from 3D to 2D, it is fairly easy to implement given existing 3D reconstruction tools.

On the other hand, it cannot be doubted that bundle adjustment is a more complex procedure than simply computing the pairwise image-to-image homographies. It is probably thought of as somewhat excessive when visualisation of an extended field of view is the primary aim. However, where accuracy is a prerequisite, and/or self-calibration is an aim, the bundle adjustment method should be considered. The benefits of a feature-based method over direct image matching are:

† Features constitute a reduction in the data to be processed, while containing most of the geometric information useful for image matching.
† A robust feature matching algorithm such as RANSAC [3] allows part of the scene to be in motion while the stationary part of the scene is still matched.
† It is difficult to generalise image matching beyond matching pairs of images, so constructing dense mosaics with many overlapping images will give rise to errors.

In order to apply bundle adjustment techniques there must be a clear distinction between the 'image' and 'scene' representations, so that a single scene reconstruction can be computed from multiple images. In order to use image-based techniques one would have to define an 'abstract' image, for instance in spherical/cylindrical coordinates, to be computed from multiple projections. The 'parameters' of this abstracted image would be the pixel values at each point in the coordinate space, in other words the mosaic itself. There is no doubt that such a method could produce good results, but computationally it would be prohibitive because of the size of the parameter space.

We distinguish between sequences taken with a still camera, typically from markedly different views and with a small overlap between images, and video sequences with closely spaced images. In the latter case feature matching is relatively straightforward, and the challenge is to register a large quantity of image data in a reasonable length of time and without building up error through the sequence. With still images, conversely, the major problem is feature matching, so we have designed an interface based on Java and OpenGL that allows the user to place each image in turn by hand, providing an approximate registration that is then optimised.

We employ the variable state dimension filter (VSDF) [6,8] to implement sequential bundle adjustment. The VSDF supports standard sparse matrix techniques that are used to accelerate bundle adjustment iterations. The sequential feature of the VSDF is important for two reasons: firstly it allows long sequences to be registered in a reasonable length of time, and secondly, when using the user interface to build the mosaic, the results can be viewed as each image is processed and added to the mosaic. This greatly enhances the user-friendliness of the software as well as aiding interactive control, such as manual error correction.

Another novel aspect of our work is the use of lines in self-calibration, in particular calibration of the non-linear distortion parameters, for which in the past points have always been used. To accomplish this requires a new model for the projection of straight lines into images, which we discuss in detail in Section 3. The method can be generalised to self-calibration under general 3D viewing conditions.

Section 2 describes the reconstruction model for points, i.e. how the point features are represented in the mosaic, and how the camera motion is represented. Section 3 discusses how the representation is extended to straight lines. Feature matching using RANSAC is dealt with in Section 4, and Section 5 explains how initial estimates for the point and line reconstructions are computed from the feature matches. Then Section 6 describes the bundle adjustment algorithm employed to optimise the reconstruction, the VSDF [7–9]. Results and conclusions are then presented in Sections 7 and 8.

2. Projection model

We consider first the case of a rotating camera. We represent the scene as fixed features in a scene-centred coordinate frame, based on which the images provide observations. Thus a point in the scene is represented by a direction, which we implement as a unit vector X = (X Y Z)^T. Then the projection of point X onto the image plane x, y is, in homogeneous coordinates,

$$ \mathbf{p} = K\, d(R\mathbf{X};\, K_1, K_2) \quad \text{or} \quad
\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix} =
\begin{pmatrix} f_x & 0 & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{pmatrix}
d\!\left(
\begin{pmatrix} R_{XX} & R_{XY} & R_{XZ} \\ R_{YX} & R_{YY} & R_{YZ} \\ R_{ZX} & R_{ZY} & R_{ZZ} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix};\, K_1, K_2
\right) \qquad (1) $$

where
† p contains the homogeneous image coordinates of the projected point, so that x = p_x/p_z, y = p_y/p_z;
† K is a calibration matrix containing the camera focal lengths f_x, f_y and the image centre x_0, y_0;
† R is a 3 × 3 rotation matrix describing the rotation of the camera;
† X is the scene point, with ||X||_2 = 1;


† d(·) is a distortion function describing the non-linear image distortion, which we represent as two terms of radial distortion:

$$ d(\mathbf{X};\, K_1, K_2) = \begin{pmatrix} (1 + K_1 r^2 + K_2 r^4)\, X \\ (1 + K_1 r^2 + K_2 r^4)\, Y \\ Z \end{pmatrix} $$

where r^2 = (X^2 + Y^2)/Z^2 and K_1, K_2 are the radial distortion parameters.

The camera rotation R is represented using the local rotation approach, employed in 3D reconstruction by Taylor and Kriegman [14]. To reconstruct the mosaic, considering point features only at this stage, the initial registration provides the bundle adjustment algorithm with estimates of the camera motion R^(j) for each image j. The feature detection and matching stages then provide observations x, y for each feature i in each image j, which are used along with the initial R^(j) to compute initial estimates of the scene point X_i. The bundle adjustment then adjusts the estimates of motion R^(j), scene structure X_i and (optionally) the calibration parameters K and K_1, K_2.

2.1. Projective model for points

In the case of reconstructing a plane, or when the calibration is not known even approximately, or finally when non-linear distortions are either neglected or compensated for in the image, we can apply the uncalibrated projective method that reconstructs the mosaic up to a projective transformation. The point projection equation is then

$$ \mathbf{p} = P\mathbf{X} \quad \text{or} \quad \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix} = \begin{pmatrix} P_{XX} & P_{XY} & P_{XZ} \\ P_{YX} & P_{YY} & P_{YZ} \\ P_{ZX} & P_{ZY} & P_{ZZ} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \qquad (2) $$

The matrix P is a 3 × 3 matrix defining a homography from projective 2D scene coordinates into projective 2D image coordinates. It is clear that the P^(j)s and X_i s can be recovered up to a general homographic ambiguity. The scale freedoms in each P^(j) are removed by imposing the constraint ||P^(j)||_F^2 = 1, where ||·||_F denotes the Frobenius matrix norm.
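To make the point projection model concrete, the following NumPy sketch evaluates Eq. (1), including the two-term radial distortion. It is an illustration only: the function names and the numeric values in the example are invented here, not taken from the paper or its software.

```python
import numpy as np

def radial_distort(X, K1, K2):
    """Two-term radial distortion d(X; K1, K2) applied to a homogeneous
    3-vector X = (X, Y, Z), as in Eq. (1)."""
    x, y, z = X
    r2 = (x * x + y * y) / (z * z)
    f = 1.0 + K1 * r2 + K2 * r2 * r2
    return np.array([f * x, f * y, z])

def project_point(X, R, fx, fy, x0, y0, K1, K2):
    """Project a unit scene direction X through rotation R and calibration
    K = [[fx, 0, x0], [0, fy, y0], [0, 0, 1]], returning pixel coordinates."""
    K = np.array([[fx, 0.0, x0],
                  [0.0, fy, y0],
                  [0.0, 0.0, 1.0]])
    p = K @ radial_distort(R @ X, K1, K2)
    return p[:2] / p[2]          # x = px/pz, y = py/pz

# example: project a direction through a small rotation about the vertical axis
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
X = np.array([0.1, 0.05, 1.0])
X /= np.linalg.norm(X)           # scene points are unit vectors
print(project_point(X, R, fx=800.0, fy=800.0, x0=320.0, y0=240.0,
                    K1=-0.05, K2=0.001))
```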


3. Line features

We employ an infinite line representation, which similarly to points can be represented as a unit 3-vector L such that L · X = 0 and ||L||_2 = 1. Under a linear camera model (i.e. no radial distortion), this would project to the image line l · p = 0, where p = KRX. This leads to the equivalent equation for lines under linear perspective projection:

$$ \mathbf{l} = K^{-T} R \mathbf{L} \qquad (3) $$

Eq. (3) would then define the line projection model, just as Eq. (1) describes the point projection model. With non-linear distortion, however, the scene line projects to a curve, forming the well-known 'barrel' or 'pin-cushion' effects in the image. This complication, a qualitative change in the projection equation for lines, has prevented the direct use of lines in non-linear self-calibration methods up to now.

However, we can circumvent this problem by modifying the line observation model. We build observations of lines by going back to each edge point that was used to fit the line. One can construct a measurement for each edge point p by considering the corresponding scene point X to lie on the scene line L. For each edge observation we introduce an angle parameter θ that describes where on the line L the point X lies. Then the edge point observation is written as an observation depending on both L and θ. In this way line reconstruction can proceed in a similar manner to point reconstruction.

We must first define a reference zero for the angle θ. We do this by constructing another line L_0 perpendicular to L. We then define θ by the equation

$$ \mathbf{X}(\mathbf{L}, \theta) = \cos\theta\, (\mathbf{L} \times \mathbf{L}_0) + \sin\theta\, [(\mathbf{L} \times \mathbf{L}_0) \times \mathbf{L}] \qquad (4) $$

relating L, L_0 and θ to the point on the line X(L, θ). With this construction we have automatically L · X(L, θ) = 0. The final line projection model is obtained by combining Eqs. (1) and (4). This results in a projection equation

$$ \mathbf{p} = K\, d(R\, \mathbf{X}(\mathbf{L}, \theta);\, K_1, K_2) \qquad (5) $$

relating the line L, angle θ, calibration parameters K, K_1, K_2 and rotation R to the edge point image position p. This model is unusual because a θ parameter is attached to each edge point observation. It might be thought that this would considerably increase the computation involved in updating the reconstruction, but in fact sparse matrix techniques can be applied and not much extra computation is involved. It is impractical to build an observation for every edge point, and so instead a set of 'representative' points is chosen. In the results we show later we used just the line endpoints.

3.1. Projective model for lines

The projective line projection equation is

$$ \mathbf{l} = P^{-T} \mathbf{L} \qquad (6) $$

Again the scene line L is normalised to ||L||_2 = 1. Without the non-linear distortion there is no need to introduce the representative points on the image lines in order to relate scene to image; the image line parameters l may be used directly, first reducing them to a non-redundant representation in a coordinate-frame independent manner (details omitted).
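The angle parameterisation of Eqs. (4) and (5) is easy to prototype. The sketch below constructs X(L, θ) from a line L and a perpendicular reference line L_0, and projects it like an ordinary point through the calibrated distortion model; the names and example values are our own and purely illustrative.

```python
import numpy as np

def point_on_line(L, L0, theta):
    """Eq. (4): the scene point at angle theta on the line L, using the
    reference line L0 (perpendicular to L) to define theta = 0."""
    a = np.cross(L, L0)
    return np.cos(theta) * a + np.sin(theta) * np.cross(a, L)

def project_line_point(L, L0, theta, R, K, K1, K2):
    """Eq. (5): project the edge point X(L, theta) like an ordinary point,
    including the two-term radial distortion."""
    X = point_on_line(L, L0, theta)
    x, y, z = R @ X
    r2 = (x * x + y * y) / (z * z)
    f = 1.0 + K1 * r2 + K2 * r2 * r2
    p = K @ np.array([f * x, f * y, z])
    return p[:2] / p[2]

# example with invented values: a scene line L, its reference line L0,
# and one representative edge point at theta = 0.1 rad
L = np.array([0.0, 1.0, -0.05]); L /= np.linalg.norm(L)
L0 = np.array([-1.0, 0.0, 0.0]); L0 -= (L0 @ L) * L; L0 /= np.linalg.norm(L0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
print(project_line_point(L, L0, 0.1, np.eye(3), K, -0.05, 0.001))
```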



Fig. 1. Five images of Market Square in Brussels, with corner and line image features.

4. Feature matching

We set the rotation of the first image R^(1) to the identity (or the homography P^(1) to the identity in the projective case). Given then an initial approximate rotation R_0^(k) (or homography P_0^(k)) relating a new image k to the existing scene, the feature matching attempts to match the point and line features in the new image to the existing point and line scene vectors X_i and L_i. The main assumption is that the solution involves a small modification to the initially estimated R_0^(k) or homography P_0^(k). The first stage, therefore, involves a search around each projected scene point p = K d(R^(k) X; K_1, K_2) and projected line l = K^{-T} R L for potential matching image features in the new image k (or p = PX and l = P^{-T} L for the projective model). In the case of lines we first remove the non-linear distortion from the line endpoints using the latest distortion parameter estimate. Once candidate matches have been computed, RANSAC [3] is used to find a rotation R^(k) or homography P^(k) consistent with a large number of scene feature/image feature matches. RANSAC works by randomly sampling small sets of feature matches from the candidate matches, so that a number of camera motion hypotheses can be produced.

These are each checked against the complete candidate match set, producing a 'vote' for each motion hypothesis. In this way motion estimation and feature matching are integrated, since the candidate matches voting for the sample with the highest vote are taken as the best guess for the feature matching.

We apply homography fitting using point and line matches, which requires a minimum of four matches. We have found that best results are obtained using samples of five matches, from which the homography is computed using the linear eigensystem technique. We first separate the homography P to be estimated into its three rows and combine these rows into a single vector P̄:

$$ P = \begin{pmatrix} \mathbf{P}_1^T \\ \mathbf{P}_2^T \\ \mathbf{P}_3^T \end{pmatrix}, \qquad \bar{\mathbf{P}} = \begin{pmatrix} \mathbf{P}_1 \\ \mathbf{P}_2 \\ \mathbf{P}_3 \end{pmatrix} $$

Given a match from scene point X to image point p, we convert the projection Eq. (2) into the form

$$ p_x\, \mathbf{P}_3 \cdot \mathbf{X} - p_z\, \mathbf{P}_1 \cdot \mathbf{X} = 0, \qquad p_y\, \mathbf{P}_3 \cdot \mathbf{X} - p_z\, \mathbf{P}_2 \cdot \mathbf{X} = 0 $$



This is a homogeneous linear equation in the elements of P. Similarly, for a line L we identify two points X_1, X_2 as 'representative' points on the line, by back-projecting the previous image endpoints onto L. Then if the image line in the new image is l, we can use the equations

$$ l_x\, \mathbf{P}_1 \cdot \mathbf{X}_1 + l_y\, \mathbf{P}_2 \cdot \mathbf{X}_1 + l_z\, \mathbf{P}_3 \cdot \mathbf{X}_1 = 0, \qquad l_x\, \mathbf{P}_1 \cdot \mathbf{X}_2 + l_y\, \mathbf{P}_2 \cdot \mathbf{X}_2 + l_z\, \mathbf{P}_3 \cdot \mathbf{X}_2 = 0 $$

These equations for the sample of point and line matches are collected into a linear matrix equation

$$ A \bar{\mathbf{P}} = \mathbf{0} $$

We then solve for P̄, and hence P, by computing the eigenvector of A^T A corresponding to its smallest eigenvalue, i.e. the right null-vector of A.

Note the essential difference between this procedure and the more usual RANSAC matcher used for matching features between images, e.g. [2]. We match the reconstructed abstract features X_i, L_i to the new image data, rather than the previous image features. This enforces a tighter constraint by ensuring that all the images of a feature are consistent with a single abstract feature, and is more in tune with the strict global rigidity assumption of structure from motion.

Fig. 2. The first two images from the Market Square sequence stitched together. On the left is the manual alignment, on the right the optimised registration. The bottom pair shows close-ups of the join between the two images in the two cases.
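A minimal sketch of the linear eigensystem step is given below, assuming NumPy and using an SVD to obtain the right null-vector of A (equivalently, the eigenvector of A^T A with smallest eigenvalue). In the full system this solve would sit inside the RANSAC loop, once per sampled set of matches; the toy example at the end is invented for illustration.

```python
import numpy as np

def fit_homography(point_matches, line_matches):
    """Linear (eigensystem) estimate of the homography P from scene/image
    point matches (X, p) and line matches ((X1, X2), l).  All scene and
    image entities are homogeneous 3-vectors."""
    rows = []
    for X, p in point_matches:
        px, py, pz = p
        rows.append(np.concatenate([-pz * X, np.zeros(3), px * X]))
        rows.append(np.concatenate([np.zeros(3), -pz * X, py * X]))
    for (X1, X2), l in line_matches:
        lx, ly, lz = l
        rows.append(np.concatenate([lx * X1, ly * X1, lz * X1]))
        rows.append(np.concatenate([lx * X2, ly * X2, lz * X2]))
    A = np.array(rows)
    # right null-vector of A = eigenvector of A^T A with smallest eigenvalue
    _, _, Vt = np.linalg.svd(A)
    P_bar = Vt[-1]
    return P_bar.reshape(3, 3)   # rows are P1, P2, P3

# tiny example: four exact point matches under a known homography
P_true = np.array([[1.0, 0.02, 5.0], [-0.01, 1.0, -3.0], [1e-4, 2e-4, 1.0]])
Xs = [np.array([x, y, 1.0]) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]]
matches = [(X, P_true @ X) for X in Xs]
P_est = fit_homography(matches, [])
print(P_est / np.linalg.norm(P_est))   # proportional to P_true (up to scale/sign)
```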

5. Computing initial point and line estimates

Before we can apply bundle adjustment, we must compute estimates of the point and line parameters, to provide a starting point for the optimisation. We start from the presumption that the RANSAC matcher has successfully computed an estimate for the camera motion (R or P) consistent with a large number of tracked features. Features in the new image that were not tracked from previous images need to be added to the reconstruction. To do this we need to show how to compute X/L consistent with the single view available of the new feature.² This can be achieved simply by back-projecting from the feature's image coordinates onto the scene point/line coordinates, inverting the projection Eqs. (1) and (2) for points and Eqs. (5) and (6) for lines. The result is

$$ \mathbf{X} = R^{-1} d^{-1}(K^{-1}\mathbf{p};\, K_1, K_2), \qquad \mathbf{X} = P^{-1}\mathbf{p} $$

for the perspective and projective point models, respectively, given an image point p, and

$$ \mathbf{L} = \mathbf{X}_1 \times \mathbf{X}_2, \qquad \mathbf{L} = (P^{-1}\mathbf{p}_1) \times (P^{-1}\mathbf{p}_2) $$

for the perspective and projective line models, given image line endpoints p_1, p_2. Here d^{-1}(·) is the inverse distortion function that removes non-linear distortion from a distorted image point. In the perspective line case, X_1 and X_2 are back-projections of the line endpoints p_1 and p_2:

$$ \mathbf{X}_1 = R^{-1} d^{-1}(K^{-1}\mathbf{p}_1;\, K_1, K_2), \qquad \mathbf{X}_2 = R^{-1} d^{-1}(K^{-1}\mathbf{p}_2;\, K_1, K_2) $$

In all these cases, X and L are scaled to unit length. In the case of perspective lines, we must also choose the reference line L_0 perpendicular to L.

Fig. 3. The five images from the Market Square sequence stitched together. The visible image boundaries are not registration errors, but brightness variations between and within images, which we do not currently model. Indeed the centre of each image is significantly brighter than the periphery.

² For 3D reconstruction the situation would be more complicated because at least two views of a new feature are needed to initialise it in the reconstruction.



Fig. 4. Six images from a video sequence of Stefan Edberg. The images were digitised at frame rate, so the 216 images of the sequence represent about eight seconds of real-time video.

This could be chosen as either X_1 or X_2, but for the sake of symmetry we choose L_0 as the unit eigenvector of the matrix X_1 X_1^T + X_2 X_2^T corresponding to the largest eigenvalue (L is the null-vector of this matrix). When L is adjusted, L_0 must also be adjusted to maintain L · L_0 = 0, which we achieve using the replacement

$$ \mathbf{L}_0 \leftarrow (\mathbf{L} \times \mathbf{L}_0) \times \mathbf{L} $$

which is also rescaled to unit length.
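As an illustration of the initialisation step, the sketch below back-projects image points through the inverse calibration and distortion, and forms L and the reference line L_0 from two endpoints. The paper does not say how d^{-1}(·) is computed, so a simple fixed-point inversion is assumed here; all names are ours.

```python
import numpy as np

def undistort(xn, K1, K2, iters=10):
    """Invert the two-term radial distortion d(.) for a normalised image
    point xn = K^{-1} p (homogeneous, z = 1), by fixed-point iteration."""
    x, y = xn[0] / xn[2], xn[1] / xn[2]
    xu, yu = x, y
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        f = 1.0 + K1 * r2 + K2 * r2 * r2
        xu, yu = x / f, y / f
    return np.array([xu, yu, 1.0])

def backproject_point(p, R, K, K1, K2):
    """X = R^{-1} d^{-1}(K^{-1} p; K1, K2), scaled to unit length
    (R^{-1} = R^T for a rotation matrix)."""
    X = R.T @ undistort(np.linalg.inv(K) @ p, K1, K2)
    return X / np.linalg.norm(X)

def init_line(p1, p2, R, K, K1, K2):
    """Initialise a scene line L from two image endpoints, together with a
    reference line L0 perpendicular to L (the top eigenvector of
    X1 X1^T + X2 X2^T, as a symmetric choice between the endpoints)."""
    X1 = backproject_point(p1, R, K, K1, K2)
    X2 = backproject_point(p2, R, K, K1, K2)
    L = np.cross(X1, X2)
    L /= np.linalg.norm(L)
    M = np.outer(X1, X1) + np.outer(X2, X2)
    w, V = np.linalg.eigh(M)
    L0 = V[:, -1]                  # eigenvector of the largest eigenvalue
    return L, L0 / np.linalg.norm(L0)
```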

6. Bundle adjustment

Bundle adjustment is an iterative Gauss–Newton method, implementing non-linear least squares, computing the mean of the likelihood or posterior distribution (depending on whether prior knowledge is present or not), and taking advantage of sparseness in the system

information matrix to speed up its inversion. The VSDF actually uses the Levenberg–Marquardt algorithm, which is a modification of Gauss–Newton. To apply the VSDF method we have to define the vector of state parameters x, and an observation model which relates the state parameters to the image observations. The state x will be constructed from all the unknowns in the system: the n_p scene points X_i and n_l lines L_i, the camera rotations R^(j), the calibration parameters K, K_1, K_2, and the line observation angle parameters θ_i(j). These are all bundled together into x as follows:

$$ \mathbf{x} = \begin{pmatrix} \mathbf{x}_z \\ \mathbf{x}_l \\ \mathbf{x}_d \\ \mathbf{x}_f \end{pmatrix} $$



Fig. 5. The Stefan sequence stitched together (a) using a linear (pinhole) camera model; (b) incorporating non-linear radial distortion parameters K_1 and K_2; and (c) using the accumulated homographies of image pairs (linear camera model). The total processing time in each case was about 3 h on an Ultra-Sparc.

where

$$ \mathbf{x}_z = \begin{pmatrix} \theta_1(1) \\ \vdots \\ \theta_1(k) \\ \vdots \\ \theta_n(k) \end{pmatrix}, \qquad
\mathbf{x}_l = \begin{pmatrix} \mathbf{L}_1 \\ \vdots \\ \mathbf{L}_{n_l} \\ \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_{n_p} \end{pmatrix}, \qquad
\mathbf{x}_d = \begin{pmatrix} \mathbf{r}(1) \\ \vdots \\ \mathbf{r}(k) \end{pmatrix}, \qquad
\mathbf{x}_f = \begin{pmatrix} f_x \\ f_y \\ x_0 \\ y_0 \\ K_1 \\ K_2 \end{pmatrix} $$

† The 'observation' parameters x_z are attached to individual observations, in this case the angles θ_i(j).
† The 'local' parameters x_l are the scene structure parameters, in this case 2D points X_i and lines L_i in homogeneous coordinates.
† The 'dynamic' parameters x_d represent the camera motion over time. Each 3-vector r(j) represents the local rotation parameters for image j.
† The 'fixed' parameters x_f represent the camera calibration. Here we assume that the calibration is the same for each image; if not then x_f would consist of multiple sets of calibration parameters.

We now turn to the observation model, which defines the way in which the state parameters relate to the image observations. The VSDF software requires that the projection Eqs. (1), (2) and (6) be translated into routines to evaluate the image point/line coordinates given the relevant parts of the state vector x for each observation. This projected feature is then compared with the actual (i.e. matched) image feature, the difference or 'innovation' being used to update the state vector. The routines should also evaluate the Jacobians of the projection with respect to x for



use by the Gauss–Newton iterations. Other routines are used to set up the initial state vector values given an initial 'batch' of image observations along with any prior knowledge, for instance of calibration, and also to initialise new scene features and camera rotations/homographies when in VSDF sequential mode. Details of the VSDF algorithm are given in Ref. [6].

The VSDF uses the above routines to build the linear system used in Levenberg–Marquardt iterations. The linear system matrix turns out to be sparse, and this is exploited to compute an efficient solution using the recursive partitioning algorithm [13]. In fact the computational complexity of a bundle iteration for this problem is theoretically O(k²(k + n_p + n_l)), i.e. proportional to the number of features n_p and n_l, and is not affected by the line observation parameters θ_i(j).

In the case of uncalibrated projective reconstruction, there are no angles θ_i(j) and so no observation state vector x_z. Also there are no calibration parameters, so x_f is also absent. The local state vector x_l is made up of the scene points X_i and lines L_i, and the dynamic state vector contains the elements of the P^(j) matrices.
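For readers unfamiliar with the structure of such an optimiser, the sketch below packs the unknowns into a single state vector in the (x_z, x_l, x_d, x_f) layout above and performs one damped Gauss–Newton (Levenberg–Marquardt) step with a dense finite-difference Jacobian. This is only a didactic stand-in: the VSDF builds the same linear system analytically and solves it exploiting its block sparsity, which is what makes the sequential updates efficient. The residual function in the example is invented.

```python
import numpy as np

def pack_state(thetas, lines, points, rotations, calib):
    """Bundle all unknowns into a single state vector x = (x_z, x_l, x_d, x_f):
    observation angles, scene structure, per-image local rotation 3-vectors,
    and the shared calibration parameters."""
    return np.concatenate([np.ravel(a) for a in
                           (thetas, lines, points, rotations, calib)])

def lm_step(x, residual_fn, lam=1e-3, eps=1e-6):
    """One dense Levenberg-Marquardt step: linearise the stacked residuals
    about x and solve the damped normal equations."""
    r0 = residual_fn(x)
    J = np.empty((r0.size, x.size))
    for i in range(x.size):                  # finite-difference Jacobian
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (residual_fn(x + dx) - r0) / eps
    H = J.T @ J + lam * np.eye(x.size)       # damped Gauss-Newton system
    return x - np.linalg.solve(H, J.T @ r0)

# toy usage with an invented residual; in the mosaic problem residual_fn
# would stack the point/line reprojection errors of Eqs. (1) and (5)
f = lambda x: np.array([x[0] - 1.0, x[0] * x[1] - 2.0, x[1] ** 2 - 4.0])
x = np.array([0.5, 1.5])
for _ in range(20):
    x = lm_step(x, f)
print(x)     # converges towards (1, 2)
```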

7. Results

We present here results for mosaic reconstruction from point and line features for two image sequences, one of separate pictures taken with a still camera, the other a long video sequence. Five overlapping images from Market Square in Brussels are shown in Fig. 1. To locate point features we use the Plessey corner detector [4], and for lines edge detection followed by Hough transform line fitting [10]. We build up the mosaic image by image, registering the images approximately by hand, and then matching features and optimising using the software. Fig. 2 shows the process for the first two images. The result of stitching the five images together is shown in Fig. 3. The visible joins are due to brightness variations between the images, which we neglect in the current software.

While these results are good, other systems can also perform well on such images. More challenging is the long video sequence of which we show a sample in Fig. 4. Here the danger is that for a long sequence of closely spaced images, the registration error will build up as the sequence is processed. Any system based on exploiting pairwise overlaps between images would have difficulties here, because to maintain good registration overlaps between non-adjacent images have to be utilised, and it is not at all clear how to select the image pairs for this purpose. The sequence is especially difficult because the camera initially pans left, but at around frame 110 it backtracks to the right, finishing beyond the initial position. Thus correct registration involves matching images between the start and end of the sequence. Our system circumvents this problem by constraining image features not to each other but to abstracted scene point/line features X/L, a procedure which guarantees, just as in the case of 3D reconstruction, that all available geometric constraints are exploited.

In Fig. 5a we show the results of stitching 216 images together, using a linear (pinhole) camera model. The non-linear image distortion is clear from the marked curvature of the tennis court baseline. Indeed registration breaks down slightly in some areas of the tennis court. This is mainly because most features are detected among the spectators, so that the registration is best there. When the distortion is modelled, the result is much better, both in removing the curvature and in registering all images correctly, as can be seen in Fig. 5b. The distortion parameters are recovered by the sequential VSDF algorithm, initialising K_1 and K_2 to zero. Note that Stefan himself has disappeared. Such a clean separation of a moving foreground object is only possible with a densely overlapping sequence. Finally, Fig. 5c shows what happens when we only match features over adjacent image pairs. The RANSAC matching algorithm is unchanged; only the manner in which the matched data is employed is simplified. As expected, the errors in the homographies accumulate over the sequence, to the detriment of the registration.

8. Conclusions

We have designed a sequential semi-automatic mosaicing system for building high-accuracy mosaics from multiple images taken with a rotating camera, or from images of a planar scene. The system is completely automatic when processing a sequence of closely spaced images such as from video. The system is flexible in that it can compute either a calibrated (optionally with self-calibration) or an uncalibrated mosaic. It has performed well on an image sequence that would be difficult to register accurately using alternative methods.

Acknowledgment

Allan Jaenicke was funded by a Nuffield Foundation undergraduate research bursary.

References

[1] K.B. Atkinson, Close Range Photogrammetry and Machine Vision, Whittles Publishing, Caithness, Scotland, 1996.
[2] D. Capel, A. Zisserman, Automated mosaicing with super-resolution zoom, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (1998) 885–891.
[3] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (6) (1981) 381–395.
[4] C.J. Harris, M. Stephens, A combined corner and edge detector, Proceedings of the Fourth Alvey Vision Conference, Manchester (1988) 147–151.
[5] M. Irani, P. Anandan, S. Hsu, Mosaic based representations of video sequences and their applications, Proceedings of the Fifth International Conference on Computer Vision, Boston (1995) 605–611.
[6] P.F. McLauchlan, The variable state dimension filter, Technical Report VSSP 4.99, University of Surrey, Department of Electrical Engineering, December 1999.
[7] P.F. McLauchlan, J. Malik, Vision for longitudinal control, in: A. Clark (Ed.), Proceedings of the Eighth British Machine Vision Conference, BMVA Press, Essex, 1997.
[8] P.F. McLauchlan, D.W. Murray, A unifying framework for structure and motion recovery from image sequences, Proceedings of the Fifth International Conference on Computer Vision, Boston (1995) 314–320.
[9] P.F. McLauchlan, D.W. Murray, Active camera calibration for a head-eye platform using the variable state-dimension filter, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (1) (1996) 15–22.
[10] J. Kittler, P.L. Palmer, M. Petrou, An optimising line finder using a Hough transform algorithm, CVGIP: Image Understanding 67 (1) (1997) 1–23.
[11] H.S. Sawhney, R. Kumar, True multi-image alignment and its application to mosaicing and lens distortion correction, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (1997) 450–456.
[12] H.Y. Shum, R. Szeliski, Panoramic image mosaics, Technical Report MSR-TR-97-23, Microsoft, 1997.
[13] C.C. Slama, C. Theurer, S.W. Henriksen, Manual of Photogrammetry, American Society of Photogrammetry, 1980.
[14] C.J. Taylor, D.J. Kriegman, Structure and motion from line segments in multiple images, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (11) (1995) 1021–1032.