Further Five-Point Fit Ellipse Fitting




Graphical Models and Image Processing 61, 245–259 (1999) Article ID gmip.1999.0500, available online at http://www.idealibrary.com on

Paul L. Rosin
Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom
E-mail: [email protected]

Received July 8, 1998; revised February 23, 1999; accepted July 6, 1999

The least-squares method is the most commonly used technique for fitting an ellipse through a set of points. However, it has a low breakdown point, which means that it performs poorly in the presence of outliers. We describe various alternative methods for ellipse fitting which are more robust: the Theil–Sen, least median of squares, Hilbert curve, and minimum volume estimator approaches. Testing with synthetic data demonstrates that the least median of squares is the most suitable method in terms of accuracy and robustness. © 1999 Academic Press

Key Words: ellipse fitting; least squares; Theil–Sen; least median of squares; Hilbert curve; minimum volume estimator.

1. INTRODUCTION

Ellipses commonly occur in manmade scenes, often being formed as projections of circular objects onto the image plane. They provide a useful representation of parts of the image since (1) they are more convenient to manipulate than the corresponding sequences of straight lines needed to represent the curve and (2) their detection is reasonably simple and reliable. Thus they are often used by computer vision systems for model matching [4, 6]. Over the years much attention has been paid to fitting ellipses to data samples, and many variations of the standard method for finding the least-squares (LS) solution exist [5, 7]. However, computer vision often requires more robust methods that can tolerate large numbers of outliers, since the data is likely to be substantially corrupted by faulty feature extraction, segmentation errors, etc. While LS is optimal under Gaussian noise it is very sensitive to severe non-Gaussian outliers and is therefore unsuitable for many vision applications. This has led to increased interest in the application of robust techniques to vision problems [12, 14]. Previously we described a robust method for fitting ellipses to curve data [17]. It uses the method of minimal subsets to accumulate ellipse hypotheses. A minimal subset is the smallest number of points necessary to uniquely define a geometric primitive (five for an ellipse).



Many five-tuples of points are selected from the full data set. Those that define nonelliptic conics are rejected, and the remainder are considered as hypotheses of the ellipse that best fits all the data. An ellipse can be described by the general equation for a conic, $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$, or by its more intuitive geometric parameters, namely center coordinates $(x_c, y_c)$, major and minor axes $\{a, b\}$, and orientation $\theta$. We have generally found it convenient to fit the ellipse using the conic form and then convert to the geometric representation using [8]

$$x_c = \frac{BE - 2CD}{4AC - B^2}, \qquad y_c = \frac{BD - 2AE}{4AC - B^2},$$

$$\{a, b\} = 2\sqrt{\frac{-2F}{A + C \pm F\sqrt{\left(\frac{B}{F}\right)^{2} + \left(\frac{A-C}{F}\right)^{2}}}},$$

$$\theta = \frac{1}{2}\tan^{-1}\frac{B}{A - C}.$$
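As a concrete illustration of the minimal-subset step, the following sketch (Python with NumPy; the function names are illustrative and not from the paper) fits a conic exactly through five points and converts it to geometric form. The semi-axis lengths are obtained from the eigenvalues of the quadratic form after translating to the center, an equivalent route to the axis formula above.

```python
import numpy as np

def conic_through_five_points(pts):
    """Conic A x^2 + B xy + C y^2 + D x + E y + F = 0 passing exactly
    through five points: the null vector of the 5 x 6 design matrix."""
    x, y = np.asarray(pts, float).T
    M = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    return np.linalg.svd(M)[2][-1]                 # smallest right singular vector

def conic_to_geometric(coeffs):
    """Convert conic coefficients to (xc, yc, a, b, theta), with a and b the
    semi-axis lengths; returns None for non-elliptic or degenerate conics."""
    A, B, C, D, E, F = coeffs
    disc = 4.0 * A * C - B**2
    if disc <= 0:
        return None                                # hyperbola or parabola: reject
    if A + C < 0:                                  # normalise the overall sign
        A, B, C, D, E, F = (-v for v in (A, B, C, D, E, F))
    xc = (B * E - 2.0 * C * D) / disc
    yc = (B * D - 2.0 * A * E) / disc
    Fc = F + (D * xc + E * yc) / 2.0               # constant term at the centre
    if Fc >= 0:
        return None                                # degenerate or imaginary ellipse
    root = np.hypot(B, A - C)
    a, b = np.sqrt(-2.0 * Fc / np.array([A + C - root, A + C + root]))
    theta = 0.5 * np.arctan2(B, A - C)
    return xc, yc, a, b, theta
```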

Each of the geometric parameters of the ellipse is estimated as the median of the parameters of the hypothesized ellipses. In other words, if $P_s^q$ is the qth parameter estimated from minimal subset $s$, then the final estimates are $\hat{P}^q = \operatorname{med}_s P_s^q$ for $q = 1, \ldots, 5$. This is in fact an application of the Theil–Sen estimator [28], which has already been used for fitting lines [11] and circular arcs [15]. Other applications of the minimal subset method to the estimation of geometric features are given in [2, 22, 27, 29]. Unfortunately there are a number of inadequacies with the method just described, involving (1) the treatment of circular parameters (i.e., orientation), (2) statistical efficiency, and (3) correlation between the five parameters. This paper presents several solutions to these problems and describes some variations on the theme of robust ellipse fitting.

2. ELLIPSE FITTING TECHNIQUES

2.1. Approximate Circular Median

The Theil–Sen method described above estimates each parameter by the median of the set of hypothesized parameters produced from the minimal subsets. While this is adequate if the ellipses are described by their conic coefficients, there is a problem when the ellipses' geometric parameters are used.¹ Since the ellipse orientation is directional data, the median will produce incorrect estimates if the data is clustered around the ends of the range $[0, 2\pi]$. The natural solution would be to use the circular median in place of the linear median [13, 27]. This is defined as the point P such that half of the data points lie on either side of the diameter PQ of the unit circle (on which the data lies) and the majority of points are closer to P than to Q.

¹ Experiments showed that it was preferable to use the geometric parameters of the ellipse rather than their corresponding conic coefficients [17]. Their values tend to lie in a smaller range than the coefficients and are invariant to translation and rotation of the data. Therefore all the methods described in this paper use geometric parameters rather than conic coefficients.



More formally [26], if $\theta_1, \theta_2, \ldots, \theta_n$, $0 \le \theta_i \le 2\pi$, are the angles of a set of $n$ vectors, then the median of the angles is the angle $\theta$ minimizing the sum of geodesic distances $\sum_j d(\theta, \theta_j)$ around the circle, where $d(\theta, \theta_j) = \min(|\theta - \theta_j|, 2\pi - |\theta - \theta_j|)$. However, unless the values are binned (which is undesirable since the results will then depend on an arbitrary bin size) all pairs of points need to be considered, which is computationally expensive. Instead, as a compromise between efficiency and robustness, we use a combination of the circular mean and the circular median. The circular mean $\mu$ of a set of $n$ orientation samples $\theta_i$ is defined as [13]

$$\mu = \begin{cases} \mu' & \text{if } S > 0 \text{ and } C > 0 \\ \mu' + \pi & \text{if } C < 0 \\ \mu' + 2\pi & \text{if } S < 0 \text{ and } C > 0, \end{cases}$$

where

$$\mu' = \tan^{-1}\frac{S}{C}, \qquad C = \frac{1}{n}\sum_{i=1}^{n}\cos\theta_i, \qquad S = \frac{1}{n}\sum_{i=1}^{n}\sin\theta_i. \tag{1}$$

We rotate the data in order to center its mean at the orientation midrange, $\pi$,

$$\theta_i' = (\theta_i - \mu + \pi) \bmod 2\pi,$$

and then perform a linear median, after which we undo the rotation:

$$\hat{\theta} = (\operatorname{med}_i \theta_i' + \mu - \pi) \bmod 2\pi.$$

Since the median can be found in linear time, the running time of the approximate circular median is also O(n).
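A minimal sketch (assuming NumPy; names are illustrative) of this circular mean and of the approximate circular median built on it, i.e., the rotate, median, and unrotate procedure just described:

```python
import numpy as np

def circular_mean(angles):
    """Circular mean of angles in [0, 2*pi); equivalent to the case
    analysis of Eq. (1), using arctan2 to resolve the quadrant."""
    return np.arctan2(np.mean(np.sin(angles)), np.mean(np.cos(angles))) % (2 * np.pi)

def approx_circular_median(angles):
    """O(n) approximate circular median: rotate the data so its circular
    mean sits at the mid-range (pi), take an ordinary median, then undo
    the rotation."""
    angles = np.asarray(angles, float) % (2 * np.pi)
    mu = circular_mean(angles)
    shifted = (angles - mu + np.pi) % (2 * np.pi)
    return (np.median(shifted) + mu - np.pi) % (2 * np.pi)
```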

Although the Theil–Sen estimator is more robust than the LS method, for five dimensions its breakdown point² is only 12.9%, since the fraction of outliers $\varepsilon$ must satisfy $(1 - \varepsilon)^5 \ge 0.5$, so that $\varepsilon = 1 - (1/2)^{1/5}$. Although the results published in Rosin [17] appear to show better robustness, this is possibly because many of the conics passing through the five-tuples are nonellipses (i.e., hyperbolae or parabolae) and are therefore rejected.

² An estimator's breakdown point is defined as the smallest percentage of outliers that may force the estimator to return a value arbitrarily far from the correct result.

2.2. Repeated Median

Estimators more robust than the Theil–Sen are available. For instance, the repeated median (RM) [25] has a breakdown point of 50%, which is achieved by calculating

$$\hat{P}^q = \operatorname{med}_i \operatorname{med}_j \operatorname{med}_k \operatorname{med}_l \operatorname{med}_m P^q_{i,j,k,l,m},$$

where $P^q_{i,j,k,l,m}$ is the qth parameter estimated from the five points $\{\mathbf{p}_i, \mathbf{p}_j, \mathbf{p}_k, \mathbf{p}_l, \mathbf{p}_m\}$. For $n$ points this recursive application of the median results in $\binom{n}{5}$ tuples being considered, the same complexity as the Theil–Sen estimator. Since this is O(n⁵), in practice evaluating all the tuples is too costly, and fitting must be speeded up by selecting only a subset of the possible tuples. More details on the selection of these tuples will be described later; however, such selection schemes are not easily applied to the repeated median estimator.
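An illustrative, deliberately naive sketch of this nested-median computation, reusing the five-point fit helpers sketched earlier (it evaluates every five-tuple and so is O(n⁵); it is intended only to make the recursion explicit):

```python
import numpy as np

def repeated_median_parameter(points, q):
    """Repeated median estimate of the qth geometric parameter (q = 0..4),
    taking nested medians over the five subset indices. The circular
    parameter (orientation) would instead use the approximate circular
    median sketched above."""
    pts = np.asarray(points, float)
    n = len(pts)

    def med_level(fixed):
        if len(fixed) == 5:
            geom = conic_to_geometric(conic_through_five_points(pts[list(fixed)]))
            return np.nan if geom is None else geom[q]   # non-ellipses are ignored
        return np.nanmedian([med_level(fixed + (m,)) for m in range(n) if m not in fixed])

    return med_level(())
```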



2.3. Least Median of Squares

Another robust estimator with a breakdown point of 50% is the least median of squares (LMedS) estimator [23], which was applied to ellipse fitting by Roth and Levine [21]. For each of the minimal subsets $s$ they calculated the errors $\varepsilon_i$ from each point $\mathbf{p}_i$ to the ellipse defined by the subset. The LMedS ellipse has the parameters defined by the subset with the least median error, namely

$$\min_s \operatorname{med}_i \varepsilon_i^2.$$

Unlike the Theil–Sen and RM estimators, this application of the LMedS uses the residuals of the full data set for each hypothesized ellipse. The ideal error measure would be the Euclidean distance between the point and the ellipse, but that is expensive to calculate, requiring solving a quartic equation and choosing the smallest of the four solutions. In practice there are many simpler and numerically more stable approximations that are often used [18–20]. The principal one (and both the simplest and the most efficient) is the algebraic distance, while other more accurate approximations include the gradient-weighted algebraic distance (used by Roth and Levine), the angular bisector method, and the orthogonal conics method.³

³ A brief description of the distance measures follows (for more details and analysis see [18–20]): the algebraic distance is calculated by substituting the point's coordinates into the implicit conic equation defining the ellipse; the gradient-weighted algebraic distance divides the algebraic distance by its gradient at the point; the angular bisector method constructs two straight lines from the point to the ellipse's foci, intersects the angular bisector of these lines with the ellipse, and returns the distance between the point and the calculated intersection; the orthogonal conics method constructs the confocal hyperbola to the ellipse that passes through the point of interest and returns the distance between the point and the intersection of the ellipse and the hyperbola.

Although the LMedS method is significantly more robust than the least-squares method, it is less efficient as a statistical estimator (i.e., it has lower asymptotic relative efficiency) as well as being generally less efficient computationally. This can be improved by "polishing" the fit returned by the best minimal subset. First the noise is robustly estimated using the median absolute deviation (mad), which finds the median of the errors and returns the median of the differences of the errors from that median,

$$\mathrm{mad} = \operatorname{med}_{i=1}^{n}\left|\varepsilon_i - \operatorname{med}_{j=1}^{n} \varepsilon_j\right|.$$

The median is modified by $f$, a Gaussian normalization and finite-sample correction factor [23],

$$f = 1.4826\left(1 + \frac{5}{n-1}\right),$$

and outliers can then be robustly detected by applying the indicator function

$$w_i = \begin{cases} 1 & |\varepsilon_i| < 3 \times f\,\mathrm{mad} \\ 0 & \text{otherwise.} \end{cases}$$

The ellipse parameters are now reestimated by performing a standard LS fit, but weighting the data points $\mathbf{p}_i$ by $w_i$ to eliminate the effect of the (robustly detected) outliers.
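A compact sketch of the whole LMedS procedure with mad-based polishing, reusing the five-point fit helpers sketched earlier. For simplicity it uses plain algebraic residuals and a unit-norm algebraic least-squares fit for the reweighting step; the subset count and other details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def lmeds_ellipse(points, n_subsets=95, rng=np.random.default_rng(0)):
    """Least-median-of-squares ellipse fit (sketch); residuals are plain
    algebraic distances, whereas the paper reports better results with
    the gradient-weighted algebraic or orthogonal conics distances."""
    pts = np.asarray(points, float)
    x, y = pts.T
    design = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])

    best, best_med = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(len(pts), 5, replace=False)
        coeffs = conic_through_five_points(pts[idx])
        if conic_to_geometric(coeffs) is None:      # reject non-elliptic conics
            continue
        residuals = design @ coeffs                 # algebraic distances
        med = np.median(residuals ** 2)
        if med < best_med:
            best, best_med = coeffs, med
    if best is None:
        return None

    # "Polishing": robust scale from the median absolute deviation of the
    # residuals, then an ordinary algebraic LS fit restricted to the inliers.
    residuals = design @ best
    mad = np.median(np.abs(residuals - np.median(residuals)))
    f = 1.4826 * (1.0 + 5.0 / (len(pts) - 1))
    inliers = np.abs(residuals) < 3.0 * f * mad
    refined = np.linalg.svd(design[inliers])[2][-1]
    return conic_to_geometric(refined)
```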



FIG. 1. Hilbert curve.

In addition to its robustness, a significant advantage of the LMedS approach just described is that all the parameters are estimated simultaneously. A weakness of the Theil–Sen and RM approaches is that each parameter is estimated independently, thereby ignoring any correlations between parameters that may be present.

2.4. Hilbert Curve

A different approach to estimating the five ellipse parameters simultaneously is to map the 5D parameter space onto 1D. This allows a simple 1D estimation technique to be employed as before, after which the result is mapped back to the full 5D parameter space. Previously the Hilbert curve has been used for dimensionality reduction to improve data visualization [10] and to cluster data as part of texture analysis [16]. Figure 1 shows a 2D example of the Hilbert curve. It can be seen that it is a space-filling curve with the property that points close in 2D usually map to points close in 1D along the curve. Many algorithms for converting between 2D Cartesian coordinates and arclength along the Hilbert curve, usually based on bitwise operations, have been described. We have used Butz's [3] algorithm, which can operate on coordinates of arbitrary dimension. Since the ellipse orientation is circular it is treated separately using the circular median. Therefore, in practice we analyze the remaining parameters using a 4D Hilbert curve. These four parameters from each subset are converted into a Hilbert curve arclength. The median subset arclength is converted back into 4D space, providing the parameters of the ellipse estimate. Regarding practical issues, the Hilbert curve is quantized so that each axis is divided into 256 bins. The four parameters are all measured in the same units (pixels), obviating scaling problems. Since the Hilbert curve only covers a subset of $\mathbb{R}^4$ we must specify the desired range of each axis; in our experiments we have set all of them to [0, 1000].

2.5. Vector Median and MVE

Another way to estimate the parameters simultaneously, rather than performing independent 1D medians, is to use a multivariate generalization of the median. Again the circular ellipse orientation would be treated separately. For the remaining parameters the vector median $\mathbf{p}_M$ of the points $\mathbf{p}_i \in \mathbb{R}^4$ in parameter space is defined as [1]

$$\min_{\mathbf{p}_M} \sum_{i=1}^{S} \|\mathbf{p}_M - \mathbf{p}_i\|,$$



where $\mathbf{p}_M \in \{\mathbf{p}_i\}$ and $S$ is the number of minimal subsets producing candidate parameter estimates. A disadvantage is that all pairs of points need to be considered, making the method computationally too expensive; if all minimal subsets are evaluated, the total algorithm becomes O(n¹⁰)!

Just as the vector median is a robust way to estimate the center of a set of points, another robust technique is the minimum volume ellipsoid estimator (MVE) [23]. The MVE is defined as the ellipsoid with the smallest volume that contains half of the data points. It is affine invariant and so it can be easily applied to multivariate data without the need to normalize the dimensions, although that is not an issue here. The center of the MVE is used as an estimate of the ellipse parameters. As usual, the ellipse orientation is estimated separately by an approximate circular median. Since calculation of the MVE is expensive it is approximated by a random resampling approach similar to the minimal subset method we are using for ellipse fitting. For $p$-dimensional data, subsamples of size $p + 1$ are drawn. From each subset the mean $\bar{\mathbf{x}}_s$ and covariance $\mathbf{C}_s$ are calculated, defining a sample ellipsoid. The ellipsoid is scaled by $m_s$ so that it covers half the points, where

$$m_s^2 = \operatorname{med}_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}}_s)\mathbf{C}_s^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}_s)^{T}.$$

The volume of the ellipsoid is proportional to $m_s^{p}\sqrt{\det \mathbf{C}_s}$, and so the algorithm selects the subset producing the smallest value. From the best subset $s$ the MVE is approximated by $\bar{\mathbf{x}}_s$ and $(\chi^2_{p,0.5})^{-1} c_{n,p}^2 m_s^2 \mathbf{C}_s$, where $c_{n,p}^2 = (1 + 15/(n - p))^2$ is a small-sample correction factor. Finally, robust distances are calculated for each point relative to the MVE, and the result is polished using a reweighted mean and covariance [24].
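A sketch of this resampling approximation to the MVE location estimate (NumPy assumed; the reweighted polishing step is omitted and the function name is illustrative):

```python
import numpy as np

def mve_center(params, n_subsets=500, rng=np.random.default_rng(0)):
    """Resampling approximation to the minimum volume ellipsoid location.

    params is an S x p array of candidate parameter vectors (p = 4 here)."""
    X = np.asarray(params, float)
    n, p = X.shape
    best_center, best_vol = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, p + 1, replace=False)
        mean = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det <= 0:
            continue                                  # degenerate subsample
        diff = X - mean
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        m2 = np.median(md2)                           # scale so the ellipsoid covers half the points
        vol = m2 ** (p / 2.0) * np.sqrt(det)          # ellipsoid volume up to a constant factor
        if vol < best_vol:
            best_center, best_vol = mean, vol
    return best_center
```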

3. TUPLE SAMPLING STRATEGIES

All the estimators described in this paper use minimal subsets generated from different combinations of data points. For the line fitting application in [11] it was acceptable to use all the possible pairs of points, as the complexity was only O(n²). Since ellipse fitting requires five-tuples, taking all the combinations would result in O(n⁵) complexity, which is not practically useful unless the data sets are very small. However, reasonable results can still be obtained at much lower computational cost if only some of the minimal subsets are evaluated. The simplest approach is to randomly sample the data set with replacement to generate the number of minimal subsets estimated as necessary to achieve the desired likelihood of success. For instance, for the LMedS a single minimal subset containing no outliers is sufficient for success. This implies that if the proportion of outliers is $\varepsilon$, then the number of subsets $N$ that must be chosen to include, with $100 \times C\%$ confidence, one composed entirely of inliers is [21, 23]

$$N = \frac{\ln(1 - C)}{\ln\left(1 - (1 - \varepsilon)^5\right)}.$$
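As a quick check of this expression, a minimal sketch (the function name is illustrative):

```python
import math

def subsets_needed(outlier_fraction, confidence=0.95, subset_size=5):
    """Number of random minimal subsets needed so that, with the given
    confidence, at least one contains no outliers."""
    w = (1.0 - outlier_fraction) ** subset_size   # P(a subset is outlier-free)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w))

# subsets_needed(0.5) == 95, the number of tuples used later in Section 4.
```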

Unfortunately, this analysis only considers type II noise, the outliers. But type I noise, consisting of the low-level noise often modeled as additive Gaussian (sometimes called perturbation errors), can also have a considerable effect.



FIG. 2. Orientation error of the line depends on the distance between the two points.

In our earlier work on ellipse fitting [17] it was noted that even for near perfect test cases made up from uncorrupted synthetic data, many of the minimal subsets generated ellipses that were poor estimates of the full data set. Just the effects of quantizing the data to the pixel grid were sufficient to substantially distort these local ellipses, particularly if the points in the minimal subset were close together. Closely spaced points have high "leverage," and therefore small variations in their position have a large effect on the ellipse through the minimal subset. The instability of the ellipse fit when points are closely spaced can be demonstrated more easily by the example of a straight line passing through two points. If the error of the points is bounded by $e$ and the distance between the points is $d$, then from Fig. 2 the maximum orientation error can be seen to be approximately

$$\theta = \tan^{-1}\frac{2e}{d}.$$

When the error is plotted as a function of the distance between the two points (Fig. 3), it is evident that the error grows rapidly as the distance decreases. Based on work carried out to analyze the distribution of spaces on lottery tickets, we can calculate the probability of the smallest spacing $S$ of $r$ sequentially ordered items selected from $n$ as [9]

$$P(S \le k) = 1 - \binom{n - (r-1)(k-1)}{r} \bigg/ \binom{n}{r}.$$
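As a quick numerical check of this distribution, the probability that a five-point subset contains two directly adjacent members can be computed by counting the subsets with no adjacent pair (a standard combinatorial identity); this reproduces the roughly 50% figure for 33 points quoted below.

```python
from math import comb

def prob_adjacent_pair(n, r=5):
    """Probability that r indices drawn from 1..n include at least one pair
    of directly adjacent indices; subsets with no adjacent pair number
    comb(n - r + 1, r)."""
    return 1.0 - comb(n - r + 1, r) / comb(n, r)

# prob_adjacent_pair(33) is approximately 0.50, the figure quoted below.
```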

FIG. 3. Orientation error as a function of distance between points.



FIG. 4. (a) Probability of choosing five points from n and getting a minimum spacing of one; (b) reduced breakdown point of estimator.

Figure 4a shows this probability applied to ellipse fitting, i.e., for subsets of five points drawn from $n$. Naturally, as the number of potential points that the subset is drawn from increases, the probability of two points in the subset being adjacent decreases. Nevertheless, it may appear surprising that even with 33 points there is still a 50% chance of a minimal subset of five points containing two directly adjacent members. Thus, random sampling will produce a number of subsets with closely spaced elements, which could reduce their effectiveness and therefore reduce the robustness of the estimator. Let us take as an extreme example the situation in which minimal subsets containing adjacent points always produce erroneous ellipses. Then for a set of 33 points half the sampled tuples are likely to contain adjacent points, so that even an estimator with a 50% breakdown point will be expected to have zero resistance to outliers. In general, an estimator with a breakdown point $b$ will have its effective breakdown point reduced so that the fraction of outliers $\varepsilon$ must satisfy $(1 - \varepsilon) \times P(S > 1) \ge b$. This is illustrated in Fig. 4b for an estimator with an original breakdown point of 0.5. Although in practice the effects of closely spaced points will not be as extreme as those modeled here, they will still have an effect, albeit a lesser one, depending on the spatial distance between the points as well as on the amount of mislocalization of the points.

Closely spaced points are not the only problem; points spaced too far apart can also lead to difficulties. In computer vision the data set to which the ellipse is fitted is likely to be structured. For instance, the points may be sampled from a curve obtained by detecting edges in the image. This means that the outliers will not be randomly distributed among the data set. Instead, if the curve is poorly segmented, then one end of the curve may be derived from the elliptical feature of interest, while the other end arises from a separate outlying object such as background clutter. When outliers are clustered like this it is better to prevent the range of the minimal subsets from being too large; otherwise, at least one point will lie on the outlying section of the curve.



FIG. 5. (a) Probability of choosing five points from n and getting an overall range R ≥ K; (b) probability of the range being larger than a fraction f of the number of points.

If the points $\mathbf{p}_j$ in a minimal subset are randomly selected, then we can analyze the joint distribution of the spacings $X_j$ between them [9],

$$P(X_j \ge k_j;\ j = 1, \ldots, r-1) = \binom{n - \sum_{j=1}^{r-1}(k_j - 1)}{r} \bigg/ \binom{n}{r}.$$

Thus, the distribution of the total range that the subset covers, i.e., $R = 1 + \sum_{j=1}^{r-1} k_j$, is

$$P(R \ge K) = \binom{n + r - K}{r} \bigg/ \binom{n}{r},$$

and is plotted for various values of K in Fig. 5a. The likelihood of a minimal subset covering a fixed range increases as the number of available points increases. Of more interest is the case where the range of the minimal subset is taken as a fraction $f$ of the full data set $n$. Replacing $K$ by $\lceil nf \rceil$ gives the graph in Fig. 5b. The jagged outline is due to the nonlinear ceiling operation, necessary since there are only a discrete number of points. We can see from the graph that for small numbers of points there is a moderate probability that a large fraction will be covered by a randomly selected minimal subset. For example, 19% of five-tuples taken from 20 points will extend to cover at least half the range of the data.

As an alternative to random selection of points, Rosin [17] suggested choosing only subsets whose elements are spaced regularly, i.e., $X_i = X_j$ for all $i, j$, which results in O(n²) complexity. It is also easy to restrict the separation values so that, for instance, they only lie within a range whose minimum separation is chosen to avoid instability problems while the maximum separation is chosen to deal with clustered outliers. Another approach to the problem of the instability of the ellipses arising from close points is to use subsets larger than the minimal size to hypothesize ellipses using LS fitting. While this ameliorates the effects of type I noise, increasing the size of the subsets also increases the number of subsets necessary to provide the same probability of selecting at least one without any outliers.
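A sketch of the regularly spaced tuple generation just described (the separation bounds shown are illustrative, not the values used in the experiments):

```python
def regular_tuples(n, r=5, min_sep=1, max_sep=None):
    """Index r-tuples with equal spacing between consecutive members.

    Bounding the spacing avoids both unstable (too close) and
    clutter-spanning (too far apart) subsets; there are O(n^2) such tuples.
    """
    if max_sep is None:
        max_sep = (n - 1) // (r - 1)
    tuples = []
    for sep in range(min_sep, max_sep + 1):
        span = sep * (r - 1)
        for start in range(n - span):
            tuples.append(tuple(start + j * sep for j in range(r)))
    return tuples

# e.g. regular_tuples(38, min_sep=2, max_sep=7) restricts the separations.
```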



FIG. 6. Examples of test data. (a) Ellipse with superimposed sine wave and Gaussian noise (σ = 10); (b) ellipse with added clutter (44%) and Gaussian noise (σ = 10).

4. EXPERIMENTAL RESULTS

The different ellipse-fitting methods were tested on synthetic data to assess their performance. Four sets of test data were used, all based on 38 points sampled from an elliptic arc (axis lengths 333 and 250, subtended angle 200°, and instances taken over the full range of orientations) corrupted by Gaussian noise in the direction of the normals:

1. a sine wave was superimposed on the arc, generating some confusing small scale structure;
2. some points were additionally corrupted by extremely large amounts of Gaussian noise, producing outliers;
3. clutter was introduced (similar to structured outliers) by adding points along three noisy line segments; and
4. different lengths of the noisy arc were sampled.

For each of the graphs about 50,000 data sets were generated. Some examples of the test data are shown in Fig. 6: an elliptic arc with the superimposed sine wave plus noise and an arc with clutter appended.

All the ellipse-fitting methods were applied to the data, and plots of the α-trimmed means⁴ of the error in the estimated ellipse centers are displayed in Fig. 7. The mean is trimmed since some incorrect ellipse fits produce extremely deviant parameter estimates that would have a large influence on the mean.

⁴ The α-trimmed mean is calculated by sorting the data, removing the top and bottom α × 100% of the values, and calculating the mean of the remainder.
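For concreteness, a sketch of how test data of the kind described above might be generated; every value not stated in the text (sine frequency, rotation, random seed, and so on) is an illustrative assumption.

```python
import numpy as np

def synthetic_arc(n=38, a=333, b=250, span_deg=200, rot=0.3,
                  noise=10.0, sine_amp=0.0, rng=np.random.default_rng(0)):
    """Points on an elliptic arc with Gaussian noise (and optionally a
    superimposed sine wave) added along the approximate normal direction."""
    t = np.linspace(0.0, np.radians(span_deg), n)
    arc = np.column_stack([a * np.cos(t), b * np.sin(t)])
    normals = np.column_stack([b * np.cos(t), a * np.sin(t)])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    offsets = rng.normal(0.0, noise, n) + sine_amp * np.sin(8.0 * t)
    pts = arc + offsets[:, None] * normals
    R = np.array([[np.cos(rot), -np.sin(rot)],
                  [np.sin(rot),  np.cos(rot)]])
    return pts @ R.T
```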



FIG. 7. α-trimmed mean error of estimated center location (α = 0.1); (a) superimposed sine wave and Gaussian noise; (b) type I and II noise; (c) clutter and Gaussian noise; (d) varying subtended angle.

Some of the methods occasionally failed to fit an ellipse (fitting a hyperbola or parabola instead). In these cases the fit was ignored during the calculation of the mean. The LMedS method applied to the residuals is labeled LMedS residuals 1, 2, and 3, corresponding to the algebraic, gradient-weighted algebraic, and orthogonal conics distance approximations, respectively. In Fig. 7a, despite the presence of the superimposed sine wave, the LS method outperforms all the other methods. However, this is explained by the fact that the noise is symmetric and uniformly distributed over all the data. We can also see that the LMedS method applied to the residuals with the gradient-weighted algebraic and orthogonal conics distance approximations gives better results than the remaining techniques. Thus improving the accuracy of the residual calculation, beyond the efficient but crude measure given by the algebraic distance, greatly improves the resulting fits. Figure 7b shows how adding outliers (σ = 10 and 500 for type I and II noise, respectively) causes the LS method to break down immediately. The LMedS residual methods now work best, particularly with the gradient-weighted algebraic and orthogonal conics distance approximations, which consistently produce good results. The Theil–Sen, Hilbert curve, and MVE methods produce less accurate results and also show earlier breakdown points.



FIG. 8. α-trimmed mean error of other parameters (α = 0.1); (a) major axis error; (b) orientation error.

The addition of clutter (Fig. 7c) also makes the LS estimator perform poorly, while the same two LMedS residual methods perform well again, only breaking down drastically when more than 50% of the data is made up of the clutter. In addition, the Theil–Sen method worked well, often outperforming the LMedS residual method. In part this is due to its rejection of nonelliptic conics generated by sampled tuples, many of which will occur along the straight sections of clutter. We also analyze the effect of changing the subtended angle sampled from the ellipse. The ellipse superimposed with the sinusoid was used, and the results are shown in Fig. 7d. As expected, smaller arc sections produce greater errors for all methods. It can be seen that the algebraic error is particularly error prone unless nearly complete sections of elliptical data are provided. When we look at the errors of the other estimated parameters, they appear similar to the center error plots in Fig. 7. Figure 8 shows (a) the major axis and (b) the orientation errors incurred by the superimposed sine data set. There are fewer methods shown in Fig. 8b since several share the approximate circular median approach for orientation estimation. The only difference in the relative merits of the algorithms is the poorer performance of the Theil–Sen method’s estimation of orientation. This suggests that a more robust approximation to the circular median is required. Finally, we consider the effects of the different sampling methods. All the results presented above used the complete set of tuples with regular spacing. If there are fewer than 50% outliers, then 95 randomly selected tuples should (with 95% confidence) contain at least one minimal set containing only inliers, which is therefore sufficient for the LMedS method to succeed. Two new runs of the Theil–Sen and LMedS methods were made using only 95 randomly selected tuples. Either the tuples were a random subset of the regularly spaced tuples or they were made completely randomly and were not necessarily regularly spaced. Some of the results using the regular (solid lines) and nonregular (dotted lines) spacing are shown in Fig. 9. It can be seen that randomly selected nonregular tuples generally provided a better performance except at low noise levels when regular tuples showed benefits.



FIG. 9. α-trimmed mean error of estimated center location (α = 0.1) using regular (solid) and nonregular (dotted) tuple selection; (a) superimposed sine wave and Gaussian noise; (b) clutter and Gaussian noise.

5. CONCLUSIONS

We have described and tested various methods for fitting ellipses to data. The error plots suggest that in the presence of outliers the LMedS approach, applied using either the gradient-weighted algebraic or the orthogonal conics distance approximation, produces the best results. Both approximations result in robust and accurate fits. If the simpler but more biased algebraic distance approximation is used, then the accuracy degrades significantly, and the robustness also suffers to a lesser extent. The LS method can produce excellent fits, but it is not robust and therefore it cannot be considered if the data may be contaminated by clutter or other outliers. For random outliers the Theil–Sen method is markedly inferior to the LMedS residual method in terms of accuracy and robustness. However, for structured clutter its relative accuracy is better, and in our particular experiment it outperformed the LMedS residual method. Finally, both the Hilbert curve and MVE approaches were reasonably robust, but provided poor accuracy.

With the exception of the LS approach, all the algorithms described used random sampling to provide acceptable computational efficiency. The number of subsets required to produce acceptable robustness is not a function of the number of data points. Therefore the Theil–Sen and Hilbert curve methods, which just analyze parameter estimates generated from the samples, are both of O(1) complexity. The remaining approaches (least squares, least median of squares, and minimum volume estimator) all analyze the residuals calculated over the complete data set. This dominates their complexity, which is therefore O(n).

We discussed several issues concerning the sampling of points to form the minimal subsets. In order to avoid tuples with points separated by either too small or too large a gap, a regular sampling of the points appeared advantageous. However, the experiments show that in fact no advantage was gained by regular sampling except at low noise levels.

It should be noted that it is difficult to truly assess the performance of the fitting techniques. We have measured the deviation of the parameter estimates from the underlying parameter set used to generate the contaminated data sets.



FIG. 10. LMedS residual fit; inliers are boxed.

However, many of the fits which produced low scores according to this criterion still represented the data quite adequately. An instance is shown in Fig. 10, where the LMedS method is applied. The full data set is shown as crosses, the inliers selected by the indicator function are marked by boxes, and the plotted ellipse is the result of applying a standard LS fit to the inliers. While the estimated ellipse parameters rate poorly against the expected parameters, the majority of the data is still well represented. The ability to ignore the mismatch between the fitted ellipse and a minor part of the data is of course the strength of the LMedS method, since this is what gives it its resilience to outliers.

ACKNOWLEDGMENTS I thank Spencer W. Thomas for providing the code to convert between Cartesian coordinates and the Hilbert curve indices. The MVE estimate was calculated using Rousseeuw and Leroy’s MINVOL program.

REFERENCES

1. J. Astola, P. Haavisto, and Y. Neuvo, Vector median filters, Proc. IEEE 78(4), 1990, 678–689.
2. R. C. Bolles and M. A. Fischler, A RANSAC-based approach to model fitting and its applications to finding cylinders in range data, in 7th Int. Joint Conf. on Artificial Intelligence, 1981, pp. 637–643.
3. A. R. Butz, Alternative algorithm for Hilbert's space-filling curve, IEEE Trans. Comput. 20, 1971, 424–426.
4. M. Dhome, J. T. Lapreste, G. Rives, and M. Richetin, Spatial localisation of modelled objects of revolution in monocular perspective vision, in Proc. Europ. Conf. Computer Vision, 1990, pp. 475–485.
5. A. W. Fitzgibbon and R. B. Fisher, A buyer's guide to conic fitting, in British Machine Vision Conf., 1995, pp. 513–522.
6. D. A. Forsyth, J. L. Mundy, A. Zisserman, C. Coelho, A. Heller, and C. Rothwell, Invariant descriptors for 3-D object recognition and pose, IEEE Trans. Pattern Anal. Mach. Intell. 13(10), 1991, 971–991.
7. W. Gander, G. H. Golub, and R. Strebel, Least-squares fitting of circles and ellipses, BIT 34, 1994, 558–578.
8. R. M. Haralick and L. G. Shapiro, Computer and Robot Vision, Addison–Wesley, Reading, MA, 1992.
9. N. Henze, The distribution of spaces on lottery tickets, Fibonacci Quart. 33, 1995, 426–431.



10. S. Kamata and E. Kawaguchi, An interactive analysis method for multidimensional images using a Hilbert curve, Systems Comput. Jpn. J77-D-II(7), 1994, 1255–1264.
11. B. Kamgar-Parsi, B. Kamgar-Parsi, and N. S. Netanyahu, A nonparametric method for fitting a straight line to a noisy image, IEEE Trans. Pattern Anal. Mach. Intell. 11(9), 1989, 998–1001.
12. K. M. Lee, P. Meer, and R. H. Park, Robust adaptive segmentation of range images, IEEE Trans. Pattern Anal. Mach. Intell. 20, 1998, 200–205.
13. K. V. Mardia, Statistics of Directional Data, Academic Press, San Diego, 1972.
14. P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim, Robust regression methods for computer vision: A review, Int. J. Comput. Vision 6, 1991, 59–70.
15. D. M. Mount and N. S. Netanyahu, Computationally efficient algorithms for high-dimensional robust estimators, CVGIP: Graph. Models Image Process. 56, 1994, 289–303.
16. P. T. Nguyen and J. Quinqueton, Space filling curves and texture analysis, in Proc. Int. Conf. Pattern Recognition, 1982, pp. 282–285.
17. P. L. Rosin, Ellipse fitting by accumulating five-point fits, Pattern Recog. Lett. 14, 1993, 661–669.
18. P. L. Rosin, Analysing error of fit functions for ellipses, Pattern Recog. Lett. 17(14), 1996, 1461–1470.
19. P. L. Rosin, Assessing error of fit functions for ellipses, Graph. Models Image Process. 58, 1996, 494–502.
20. P. L. Rosin, Ellipse fitting using orthogonal hyperbolae and Stirling's oval, Graph. Models Image Process. 60, 1998, 209–213.
21. G. Roth and M. D. Levine, Extracting geometric primitives, Comput. Vision Image Understand. 58(1), 1993, 1–22.
22. G. Roth and M. D. Levine, Geometric primitive extraction using a genetic algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 16(9), 1994, 901–905.
23. P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection, Wiley, New York, 1987.
24. P. J. Rousseeuw and B. C. van Zomeren, Unmasking multivariate outliers and leverage points, J. Am. Stat. Assoc. 85, 1990, 633–651.
25. A. F. Siegel, Robust regression using repeated medians, Biometrika 69, 1982, 242–244.
26. C. G. Small, The Statistical Theory of Shape, Springer-Verlag, Berlin/New York, 1996.
27. A. Stein and M. Werman, Robust statistics in shape fitting, in Proc. Conf. Computer Vision Pattern Recognition, 1992, pp. 540–546.
28. H. Theil, A rank-invariant method of linear and polynomial regression analysis, Nederl. Akad. Wetenschappen Ser. A 53, 1950, 386–392.
29. Y. Wang and N. Funakubo, Detection of geometric shapes by the combination of genetic algorithm and subpixel accuracy, in Proc. Int. Conf. Pattern Recognition, 1996, pp. 535–539.