Pattern Recognition 42 (2009) 2317 -- 2326
Adaptive and optimal difference operators in image processing
Peter Veelaert ∗, Kristof Teelen
University College Ghent, Applied Engineering Sciences — Ghent University/IBBT/Image Processing and Interpretation, Schoonmeersstraat 52, B-9000 Ghent, Belgium
ARTICLE INFO
Article history: Received 25 June 2008; received in revised form 7 November 2008; accepted 10 November 2008.
Keywords: Difference operator; Gröbner basis; Local feature detector; Tangent; Laplacian
ABSTRACT
Differential operators are essential in many image processing applications. Previous work has shown how to compute derivatives more accurately by examining the image locally, and by applying a difference operator which is optimal for each pixel neighborhood. The proposed technique avoids the explicit computation of fitting functions, and replaces the function fitting process by a function classification process using a filter bank of feature detection templates. Both the feature detectors and the optimal difference operators have a specific shape and an associated cost, defined by a rigid mathematical structure, which can be described by Gröbner bases. This paper introduces a cost criterion to select the operator of the best approximating function class and the most appropriate template size, so that the difference operator can be locally adapted to the digitized function. We describe how to obtain discrete approximations for commonly used differential operators, and illustrate how image processing applications can benefit from the adaptive selection procedure for the operators by means of two example applications: tangent computation for digitized object boundaries and the Laplacian of Gaussian edge detector. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction

Differential operators are essential in many applications in image processing and computer vision, and they are ubiquitous in low-level vision: gradients and Laplacians for images, tangents for curves. There is a large choice of popular operators: Sobel, Roberts, Prewitt, or the more elaborate operators that combine Gaussian smoothing with derivatives. Differential operators also prove their worth in higher level image processing algorithms. For example, the scale-invariant feature transform (SIFT) [1] combines the Laplacian with Gaussian smoothing at different scales to extract key points from images. The speeded up robust features (SURF) [2] rely on the outcome of digitized box filters, used to approximate Gaussian second order partial derivatives.

Because of their importance for discrete image processing, several methods have been proposed that either replace a differential operator by a difference operator or estimate its outcome in the context of numerous discrete feature extraction and edge detection applications. Lindeberg [3] shows how to define discrete derivative approximations for low-level feature extraction and edge detection. Gunn [4] and Demigny and Kamlé [5] consider discrete versions of edge detection algorithms.
∗ Corresponding author.
E-mail addresses: [email protected] (P. Veelaert), [email protected] (K. Teelen).
doi:10.1016/j.patcog.2008.11.017
Lachaud et al. [6] evaluate different accurate estimation methods for the tangent of digitized curves, based on digital line recognition. In fact, almost any algorithm that extracts interesting features from images, such as edges or remarkable points, relies in some way on first or higher order derivatives.

Remarkably, the difference operators used to approximate a differential operator are often quite rudimentary. The discrete operator is frequently chosen on an experimental basis: for a given class of images, which operator yields visually the best results? One reason is that in the past, applying a 3 × 3 linear filter to a large image was an expensive operation. A more fundamental cause for using an ad hoc approach, however, is that some questions have not been solved in a profound way. Suppose that we want to compute the tangent to a digitized curve at a point p by approximating the curve in a small window around p by some polynomial. One recurring difficulty is the appropriate choice of the window in which the differential is calculated, and the choice of the order of the polynomials. The largest window and the polynomials of the highest order will not necessarily give the best results. If one chooses the order of the polynomial too high, there is a risk of overfitting, while a window chosen too large can result in a sudden degradation of the fit, because the curve may behave differently farther away from p, where, for example, discontinuities may occur. Digital images also contain a lot of noise, while difference operators are quite sensitive to noise. One usually assumes that Gaussian smoothing at different scales is the best way to remove noise. Too much smoothing, however, displaces edges. In fact, fitting polynomials to curves or images will always yield inaccurate results at
discontinuities, although one of the primary objectives of computing derivatives is to find regions where the image varies rapidly. One way to handle this problem is to look for zero crossings of the second derivative, after some Gaussian smoothing, which produces more or less stable results.

The goal of this paper is to relate the derivation of an optimal difference operator to the analysis of the local image content. Most of the difficulties mentioned above can be solved when the difference operator and the window size are adapted to the neighborhood of each pixel. In practice, the computation of a derivative involves two simple steps. First, a filter bank is used to determine the best class of fitting functions and the optimal window size. Next, a difference operator is used that has been optimized for the specific function class and window size.

The design of the filter bank and the optimization of the operators rely heavily on the use of ideals of difference operators. The reduction of difference operators with Gröbner bases will be one of our primary tools to find good operators. One of the advantages of Gröbner bases is that one can reduce systems of difference equations for multidimensional functions, while classical methods are mainly limited to difference equations (or recurrences) for functions in a single variable [7]. Using Gröbner bases to solve or reduce difference equations is not new, however. The work by Oberst and by Gerdt et al., where Gröbner bases are used to solve systems of partial difference equations, has led to renewed interest in the design of efficient discrete schemes for solving PDEs [8–10]. These new results extend and clarify earlier techniques, such as those found in the work of Collatz [11]. Without doubt the improved schemes will also find their way into image processing, where PDEs are used in diffusion processes and image segmentation. The focus of this work, however, is on one particular problem: how can we estimate differentials as accurately as possible? Previous work [12–14] has already shown that it is possible to compute derivatives more accurately by examining the image locally. This paper generalizes the approach and shows how to select the best class of fitting functions as well as the best window size. Because the Gröbner bases for the different function classes form a lattice, the filter bank also has a more elegant structure.

We will illustrate the proposed operator selection procedure by the computation of the tangent for digitized curves, and the derivative and Laplacian for images. These results show how the computation of the approximating difference operators can benefit in accuracy from locally applying the optimal digitized version of the differential operator. We also show that the Sobel and Prewitt operators are optimal within our framework and can, thus, be incorporated in a larger filter bank of optimal, adaptive operators.

Section 2 introduces most of the mathematical ideas. In Section 3 we derive several examples of optimal operators. Section 4 discusses the criterion for selecting an appropriate operator. Practical results are evaluated in Section 5.

2. Function classes

To find the derivative of a discrete function f, a standard approach is to approximate f by a continuous function g̃ before taking derivatives. This is a time consuming process, involving unclear choices such as the size of the window in which the approximation takes place and the nature of the approximating function.
The idea advocated in this and previous work is that it is not necessary to compute g˜ itself, but that it is sufficient to know by which class of continuous functions the digitized function can be approximated well, and to use a difference operator that is known to be optimal for this class. We introduce a criterion that will be used to select an optimal class of approximating functions, and an optimal window size.
We use a continuous real function g̃ : ℝ^m → ℝ to approximate a digitized function f : ℤ^m → ℤ. To approximate the value of a differential at a point x₀, it is sufficient to approximate f in a finite subset D ⊂ ℤ^m containing x₀. |f − g̃| < ε is used as a shorthand for |f(x) − g̃(x)| < ε, x ∈ D. In this paper, we will introduce classes of approximation functions. Let G be a class of approximation functions g̃. Given a domain D, if there is at least one g̃ ∈ G such that |f − g̃| < ε, we denote this as f ∈ G_{ε;D}.

The shift operator σ^j is defined by σ^j f(x) = f(x + j), for x, j ∈ ℤ^m. The functional composition of shift operators can be expressed as a multiplication of polynomials, i.e. σ^j σ^k f = σ^{j+k} f. A difference operator P can be represented as a polynomial in σ with non-negative, bounded exponents, that is P = Σ_j p_j σ^j, with P ∈ ℝ[σ], the ring of polynomials in σ. We write P g̃ = 0 as a shorthand for Σ_j p_j g̃(x + j) = 0, x, j ∈ ℤ^m. If we write that |Pf − Pg̃| < ε, this means that |Pf(x) − Pg̃(x)| < ε for all x for which Pf(x) is well defined, that is (x + j) ∈ D for every non-vanishing coefficient p_j of the difference operator P. Since polynomial ideals are usually defined for polynomials with non-negative exponents, we will assume, without loss of generality, that D is a finite neighborhood containing only points with non-negative coordinates.

The difference operators can be represented by templates. For example, a two-dimensional difference operator P = Σ_j p_j σ^j = Σ_{jx,jy} p_{jx,jy} σx^{jx} σy^{jy}, j = (jx, jy) ∈ ℤ², is represented by a two-dimensional template of its coefficients p_{jx,jy}:

[ p₀₀ p₁₀ p₂₀ ··· ; p₀₁ p₁₁ p₂₁ ··· ; ··· ]    (1)
We use the convention that the box at the upper left corner corresponds to p₀₀. When considering central operators, without loss of generality, we will multiply the operators with σ^{jc}, with jc the position of the central point, so that all exponents are non-negative. The same goes for non-central operators, with jc indicating the point at which the operator is computed. Boxes with vanishing coefficients are either not drawn, or drawn as empty boxes.

Now, we come to the main idea of this paper. Given a pixel in a digital image or a point on a digitized curve, how can we select the best difference operator? Let L be the differential operator that must be replaced by a difference operator. Suppose we have an appropriate class G of fitting functions g̃ and a difference operator Q = Σ_j q_j σ^j that satisfies Q g̃ = L g̃ for every g̃ ∈ G. Then, if g̃ is an approximation for f such that |f − g̃| < ε in the domain D, since the operators Q and L are linear, we have

|Qf − Lg̃| < ε Σ_j |q_j|,    (2)

which will also be written as |Qf − Lg̃| < ε|Q|, where |Q| denotes Σ_j |q_j|. Hence, the difference operator Q is a good approximation for the differential operator L, provided G contains at least one good approximation g̃ for f.

This property will be exploited as follows. A bank of linear filters H_i is applied to an image f, as shown in Fig. 1. Each filter gives an indication of the error ε_i that can be expected when the image f is locally approximated by functions of a given class, e.g. linear functions. Since each filter can only estimate the error ε_i, more than one filter is needed to obtain reliable estimates. The filters are grouped into subbanks such that each subbank corresponds to a given class of functions. For each class there is an optimal difference operator Q, with cost |Q| = Σ_j |q_j|. By multiplying each ε_i by the operator cost |Q| of the corresponding class, and after taking maximum values, the filter bank reliably estimates the maximal error ε|Q| that can be expected by applying the difference operator Q.
Fig. 1. Scheme of the filter bank for computing the optimal operator selection criterion.
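To make the cost and the bound (2) concrete, here is a small numerical sketch (ours, not from the paper). It digitizes one quadratic fitting function, applies the optimal window-5 tangent operator Q = [−1/4, 0, 0, 0, 1/4] derived later in Section 3, and checks that the estimated derivative never deviates from the true one by more than ε|Q|:

```python
# Minimal sketch of inequality (2), assuming f = round(g) so that |f - g| <= 1/2.
import numpy as np

q = np.array([-0.25, 0.0, 0.0, 0.0, 0.25])   # difference operator Q (window 5)
cost = np.abs(q).sum()                        # |Q| = sum |q_j| = 1/2
eps = 0.5                                     # digitization error bound

x = np.arange(30, dtype=float)
g = 0.07 * x**2 - 1.3 * x + 4.0               # one fitting function g in F2
f = np.round(g)                               # digitized function, |f - g| <= eps

Qf = np.convolve(f, q[::-1], mode='valid')    # Q f at the window centers x = 2..27
Lg = 0.14 * x[2:-2] - 1.3                     # exact derivative g' at those points

print(np.abs(Qf - Lg).max(), "<=", eps * cost)   # the bound eps*|Q| holds
```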
The optimal operator is the one that minimizes the maximal error. Note that the same filter can often cooperate with other filters to estimate the error for multiple classes (subbanks may overlap). For example, a function that is locally linear is also locally quadratic, although the difference operator costs may be different.

It is important to keep in mind that the total error ε Σ_j |q_j| in (2) is due to two successive approximations. The error ε arises when f is replaced by some continuous fitting function g̃, a step which cannot be avoided if we want to compute differentials. The additional error Σ_j |q_j| is due to the replacement of the differential operator L by a difference operator Q, which avoids the explicit computation of the continuous approximation. This technique only works if two distinct requirements are satisfied. First, the approximation (2) is only valid provided Q g̃ = L g̃ holds for g̃. Second, the approximation error ε should be as small as possible.

The best way to satisfy both requirements is to define a class of fitting functions by means of a finite set of difference equations P_i g̃ = 0, 1 ≤ i ≤ k. If a function satisfies these k difference equations, then it will satisfy any difference equation of the form (S₁P₁ + ··· + S_k P_k) g̃ = 0, where the S_i are arbitrary polynomials in the shift operators. The set of all operators P = Σ_i S_i P_i forms an ideal I, spanned by the polynomials P_i, which is denoted as I = ⟨P₁, P₂, …, P_k⟩. A polynomial ideal I can always be represented by a Gröbner basis. In this paper, we will assume that when we write I = ⟨P₁, P₂, …, P_k⟩, the P_i form a Gröbner basis, according to lexicographic order. For the properties of Gröbner bases, we refer to [16,17]. Fig. 2 shows the templates for a subset of operators P_i in the ideal ⟨Δx², Δy²⟩, which is the Gröbner basis for the class of functions of the form α₁xy + α₂x + α₃y + α₄.

Using an ideal of difference operators to characterize a class of fitting functions has several advantages:

• At each point, the best class of local fitting functions can be determined from the output of a bank of linear filters.
• The finite set of classes of fitting functions forms a lattice, with respect to the union and intersection of the ideals in this set. Thus, we can design efficient filter banks that compute the approximation error for each class. The structure of the lattice will reappear in the filter bank.
Fig. 2. A class of fitting functions can be defined by means of a set of difference equations P_i g̃ = 0. In this example, the operators P_i of the ideal ⟨Δx², Δy²⟩ define functions g̃ of the form α₁xy + α₂x + α₃y + α₄ of class G6 (see Table 2). Only a limited subset of operators P = Σ_i S_i P_i from the ideal is shown, as filter kernels on an image. The kernels correspond to five distinct filters in the filter bank.
• The optimal difference operator for each class can be found in a systematic way by using Gröbner bases.

In the remainder of this section, we will look more closely at each of these advantages.

Lattices of fitting classes. We will use two kinds of differential operators to illustrate the concepts of this paper. We will compute the derivative of a digitized curve to obtain the tangent, and the Laplacian of an image. Choosing the fitting function classes F_j for one-dimensional functions is straightforward, as shown in Table 1. We select polynomial functions of increasing order. In this case, the Gröbner bases are very simple and can be generated by a single operator. The structure of the lattice is also simple, since F_i ⊂ F_{i+1} for each class.
Table 1
Function classes and corresponding Gröbner bases for the defining ideals.

F0: α₀ | ⟨Δx⟩
F1: α₁x + α₀ | ⟨Δx²⟩
F2: α₂x² + α₁x + α₀ | ⟨Δx³⟩
F3: α₃x³ + α₂x² + α₁x + α₀ | ⟨Δx⁴⟩
F4: α₄x⁴ + α₃x³ + α₂x² + α₁x + α₀ | ⟨Δx⁵⟩
F5: α₅x⁵ + α₄x⁴ + α₃x³ + α₂x² + α₁x + α₀ | ⟨Δx⁶⟩

Table 2
Functions and corresponding Gröbner bases with lexicographic ordering Δx > Δy.

G1: α₁ | ⟨Δx, Δy⟩
G2: α₁x + α₂ | ⟨Δx², Δy⟩
G3: α₁y + α₂ | ⟨Δy², Δx⟩
G4: α₁(x + y) + α₂ | ⟨Δx − Δy, Δy²⟩
G5: α₁x + α₂y + α₃ | ⟨Δx², ΔxΔy, Δy²⟩
G6: α₁xy + α₂x + α₃y + α₄ | ⟨Δx², Δy²⟩
G7: α₁(x + y)² + α₂(x + y) + α₃ | ⟨Δx − Δy, Δy³⟩
G8: α₁(x + y)² + α₂x + α₃y + α₄ | ⟨Δx² − Δy², Δy(Δx − Δy), Δy³⟩
G9: α₁(x² + y²) + α₂x + α₃y + α₄ | ⟨Δx² − Δy², ΔxΔy, Δy³⟩
G10: α₁x² + α₂y² + α₃x + α₄y + α₅ | ⟨Δx³, ΔxΔy, Δy³⟩
G11: α₁(x² + y²) + α₂xy + α₃x + α₄y + α₅ | ⟨Δx² − Δy², ΔxΔy², Δy³⟩
G12: α₁x² + α₂y² + α₃xy + α₄x + α₅y + α₆ | ⟨Δx³, Δx²Δy, ΔxΔy², Δy³⟩
G13: α₁x³ + α₂x²y + ··· + α_n | ⟨Δx⁴, Δx³Δy, Δx²Δy², ΔxΔy³, Δy⁴⟩
G14: α₁x⁴ + α₂x³y + ··· + α_n | ⟨Δx⁵, Δx⁴Δy, Δx³Δy², Δx²Δy³, ΔxΔy⁴, Δy⁵⟩

Fig. 3. Lattice of the operator ideals of the function classes that are shown in Table 2.

As Table 2 shows, the set of possible choices for the function classes increases significantly when considering function classes of two variables. In this case, the classes G_i do not form a chain of subsets, but a lattice I, as shown in Fig. 3. The lattice in Fig. 3 can be interpreted in terms of function classes as well as ideals. For each pair of ideals I_k, I_l in I, the intersection (or meet) of I_k and I_l, as well as their union (or join or sum), are also included in I. Note that to find a Gröbner basis of an intersection of two ideals, one cannot simply take the union of both Gröbner bases. Instead, one must apply an algorithm for computing ideal intersections, such as the one given in [16]. In terms of function classes this means that the set of classes in Table 2 is closed with respect to intersections and unions. For example, I12 is the intersection of I10 and I11, or, equivalently, G12 = G11 ∪ G10. In this sense, the function classes form a complete collection.

Optimal difference operators. The second benefit of using ideals to define fitting functions is that the optimal difference operator can be selected from a large set of operators. The following result follows immediately.

Lemma 1. Suppose we have one difference operator Q such that Q g̃ = L g̃; then R g̃ = L g̃ for any R for which R − Q ∈ I.

In fact, since P g̃ = 0 for all operators P ∈ I, if R − Q ∈ I, then (R − Q) g̃ = 0, hence R g̃ = Q g̃ = L g̃. The optimization criterion is that |R| should be as small as possible. As a result of Lemma 1, R can be chosen from an infinite set of difference operators in the ideal of the function class. Section 3 explains in more detail how an optimal operator can be computed for each class.

Class selection with filter banks. The task of the filter bank is to verify whether the class of fitting functions contains at least one member g̃ such that |f − g̃| ≤ ε. In fact, we have the following immediate result, because a difference operator is a linear operator and P g̃ = 0 for all P ∈ I.

Lemma 2. The inequality |f − g̃| ≤ ε implies |Pf| ≤ |P|ε for each P ∈ I.

In previous work [12], we have proven that the converse is also true.

Theorem 1. There is a finite set of m difference operators H₁, …, H_m in I, such that |f − g̃| ≤ ε holds for at least one member g̃ of the corresponding function class, provided |H_i f| ≤ |H_i|ε, for i = 1, …, m.

The finite set of operators can be computed as follows. Assume that any solution g of the partial difference equations P₁g = 0, …, P_n g = 0 can be written as a linear combination of l functions g₁, …, g_l: g = λ₁g₁ + ··· + λ_l g_l. This will be the case when there are enough difference equations to restrict the solution space to a finite dimensional vector space (e.g., by including powers of Δx as well as of Δy, as in Table 2). Let K_D be the set of (|D| choose l + 1) difference operators H of the form

        | g₁(x₁)    ···   g₁(x_{l+1})  |
        |   ⋮                 ⋮        |
H =     | g_l(x₁)   ···   g_l(x_{l+1}) |    (3)
        | σ^{x₁}    ···   σ^{x_{l+1}}  |

where the x_j are l + 1 arbitrary points in D. The operators of K_D are determinantal expressions in the coefficients g_i(x_j) and the shift operators σ^{x_j}. Since D is finite, K_D is finite. The operators H in K_D are all operators for which Hf is well defined. One can show that |f − g̃| ≤ ε holds, provided |H_i f| ≤ |H_i|ε for H_i ∈ K_D [12].

In fact, although up to now we used a set of difference operators to define a class of fitting functions, one may proceed in the opposite way. With the set K_D, one can find an ideal of defining operators when the solution set is given as a linear combination of basis functions g_i. When we assume that D is rectangular, there is a unique point x₀ in D with the smallest coordinates. Let σ^{−x₀}K_D denote the polynomials of K_D multiplied by σ^{−x₀}. The polynomials in σ^{−x₀}K_D all have non-negative exponents, but some of the exponents will be zero. One can show that if D is chosen not too small, the polynomials H_i in σ^{−x₀}K_D will generate the entire ideal I = ⟨P₁, …, P_n⟩. For example, for two-dimensional operators, in practice, it is usually sufficient that D has size at least (a + 2) × (b + 2) when the leading monomial of the fitting functions is x^a y^b, or smaller.

As stated, for f to be in G_{ε;D}, it is sufficient to verify |Hf| ≤ |H|ε for all H in the finite set K_D. In other words, it is sufficient to verify a finite set of inequalities to determine whether a discrete function f can be approximated well by at least one function in a given function class. The set K_D is still a very large set, however. One can show that it is sufficient to use only a small subset of operators H_i in K_D to estimate the approximation error in a reliable way [12].
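As an illustration of the determinantal construction (3), the sketch below (our sympy rendering, with s0, s1, s2 standing for the shift operators σ⁰, σ¹, σ²) builds the single operator of K_D for the linear class F1 on D = {0, 1, 2} and recovers the Gröbner basis element Δx² of Table 1:

```python
from sympy import Matrix, symbols, expand, factor

s0, s1, s2, s = symbols('s0 s1 s2 sigma')

# Rows: g1 = 1 and g2 = x evaluated at the points 0, 1, 2 of D,
# plus the row of shift operators, as in (3).
H = Matrix([[1, 1, 1],
            [0, 1, 2],
            [s0, s1, s2]]).det()
print(expand(H))                                  # -> s0 - 2*s1 + s2

# Substituting sigma^j for s_j exposes the second difference Delta_x^2.
print(factor(H.subs({s0: 1, s1: s, s2: s**2})))   # -> (sigma - 1)**2
```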
Theorem 2. Let H₁, …, H_k be a finite subset of the operators in K_D, with k > m, such that the operators H₁, …, H_k span the entire ideal I. Then there is a positive real number λ, such that if |H_i f| ≤ |H_i|ε, for i = 1, …, k, then |H_i f| ≤ λ|H_i|ε for all H_i in K_D.

The number λ can be close to one for a small but well chosen set of operators H₁, …, H_k. In the filter bank of Fig. 1, each of the operators H_i gives rise to a linear filter.
3. Computing optimal operators

The optimal difference operator R_{G;D} for a function class G and a window D is computed by an optimization process. The optimal operator is the operator for which |R_{G;D}| is minimal. We will proceed in two steps. First, we give a general procedure for finding all operators with a predefined shape for a given function class. Then we compute the optimal operator for each class by solving the optimization problem min(|R|).

3.1. Shaped operators

Difference operators S with a predefined shape have the advantage that known properties of the operator can be taken into account, so that the optimization process can be sped up. For example, if it is known that there exists an optimal symmetric operator, we can use this fact by starting with a predefined symmetrical shape, which reduces the optimization search space considerably. Given a domain D and an ideal I defining a function class, a general form for all operators of a predefined shape is derived as follows.

(1) Select any difference operator Q satisfying Q g̃ = L g̃ for this function class, and a predefined shape for the operator S. The shape has zero entries outside the domain D.
(2) With the Gröbner basis of I, compute a reduced operator modulo the function class for both Q and S.
(3) Identify the reduced predefined operator Q with the reduced shaped operator S, and solve the resulting system of linear equations. If this system has no solution, there is no operator with the predefined form. When a solution is found, some variables can be eliminated to obtain an expression in which the number of remaining variables equals the number of dimensions of the solution space. The resulting operator represents symbolically all correct operators of the predefined shape.

Example. We illustrate the given procedure by the computation of an optimal difference operator to replace the differential tangent operator for functions of class F2 = α₂x² + α₁x + α₀. The tangent operator L = ∂/∂x can be correctly replaced by the operator

Q = Δx − Δx²/2 = −3/2 + 2σx − σx²/2,    (4)

so that L g̃ = Q g̃ for a function g̃ in the class F2. In [14], we give a more detailed explanation on how to systematically compute a derivative operator by relating the shift operators to partial derivatives. As the operator Q is shift invariant, we can shift it by σx² so that the tangent is computed in the center of a window of length 5, which gives Q = −3σx²/2 + 2σx³ − σx⁴/2. The operator Q is then reduced modulo the ideal I_{F2} of this function class, resulting in Q mod I_{F2} = 1/2 − 2σx + 3σx²/2.

The same Gröbner basis is used to reduce the operator S, which we choose to be a difference operator with shape [−a, −b, c, b, a]. The reduction of S yields S mod I_{F2} = (2a + b) − (8a + 4b)σx + (6a + 3b + c)σx². Note that the Gröbner basis is used here to find conditions for the shaped operator S so that S is the same as Q modulo the ideal. Therefore, the monomial order used to compute the Gröbner basis, e.g. lexicographic, is not important. Identifying Q mod I_{F2} with S mod I_{F2} gives a system of three linear equations of rank 2:

2a + b = 1/2,
−8a − 4b = −2,
6a + 3b + c = 3/2.    (5)

This system allows us to eliminate two variables, resulting in an operator of the general form [−a, −1/2 + 2a, 0, 1/2 − 2a, a], where a can be chosen freely. Any operator of this form has the required shape, and satisfies R g̃ = (∂/∂x) g̃.
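The reduction in this example can be reproduced mechanically. The sketch below uses sympy (our tooling choice; the paper does not prescribe software) to reduce Q and the shaped operator S modulo the ideal ⟨Δx³⟩ = ⟨(σ − 1)³⟩ of F2 and to solve system (5):

```python
from sympy import symbols, groebner, solve, expand, Rational, Poly

s, a, b, c = symbols('sigma a b c')
I = groebner([(s - 1)**3], s, order='lex')     # Groebner basis of the ideal of F2

Q = -Rational(3, 2)*s**2 + 2*s**3 - s**4/2     # central tangent operator, window 5
S = -a - b*s + c*s**2 + b*s**3 + a*s**4        # predefined shape [-a, -b, c, b, a]

rQ = I.reduce(Q)[1]                            # Q mod I_F2 = 1/2 - 2*s + 3*s**2/2
rS = I.reduce(S)[1]                            # (2a+b) - (8a+4b)*s + (6a+3b+c)*s**2

eqs = Poly(expand(rS - rQ), s).all_coeffs()    # identify coefficients: system (5)
print(solve(eqs, [b, c]))                      # -> {b: 1/2 - 2*a, c: 0}
```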
Note that Q is not unique. For example, Q = Δx − Δx²/2 + Δx³/3 will also give a correct result. However, the reduction by the Gröbner basis will automatically eliminate unnecessary terms, such as Δx³/3, because the third derivative is always zero for this particular class of fitting functions F2.

3.2. Optimal operators

An operator is optimal when the sum of the absolute values of its coefficients is minimal. For example, the cost of the operator in the previous example is |R| = Σ|r_i| = |−a| + |−1/2 + 2a| + |1/2 − 2a| + |a|. Although the expression for |R| is non-linear, it is a convex function for which we can easily compute an optimum. In fact, it is easy to see that the minimum is attained where at least one of the absolute values is zero, and that the solution set is a convex polytope. Therefore, the optimum can be found by solving a set of linear programming problems. In this example, we obtain a minimum for cost |R| = 1/2, with the template R = [−1/4, 0, 0, 0, 1/4].

Table 3
Optimal tangent operators for the one-dimensional function classes with different window sizes.

Size | F1 and F2 | F3 and F4
3 | (−1 + σ²)/2 | non-existing
5 | (−1 + σ⁴)/4 | 1/12 − (2/3)σ + (2/3)σ³ − (1/12)σ⁴
7 | (−1 + σ⁶)/6 | 1/12 − (1/4)σ − (1/4)σ² + (1/4)σ⁴ + (1/4)σ⁵ − (1/12)σ⁶
9 | (−1 + σ⁸)/8 | 1/24 − (1/3)σ² + (1/3)σ⁶ − (1/24)σ⁸
11 | (−1 + σ¹⁰)/10 | 9/160 − (25/96)σ² + (25/96)σ⁸ − (9/160)σ¹⁰

Table 3 shows the optimal tangent operator for the function classes of Table 1, with increasing domain (or window) size. Note that some function classes may share the same optimal operator. For example, this is the case for the functions of first and second order. This is no longer true when the tangent is computed outside the central position of the window. Table 4 shows that the operator cost |R| decreases for larger window sizes in the same function class. When the size of the window remains fixed, we notice that the operator cost increases for higher order function classes. The operator 1/12 − (2/3)σ + (2/3)σ³ − (1/12)σ⁴ is often used in numerical analysis to compute derivatives, although the way in which it was derived here is completely different from the derivation described in [18].
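The linear programming step can be sketched as follows (our formulation with scipy; the paper only notes that the optimum is found by solving linear programs). Splitting r = u − v with u, v ≥ 0 turns min Σ|r_j| into an LP; for the window-5 problem above it returns the template [−1/4, 0, 0, 0, 1/4] with cost 1/2:

```python
import numpy as np
from scipy.optimize import linprog

k = np.arange(5) - 2                       # offsets around the central position
A = np.vstack([k**0, k**1, k**2])          # exactness on F2: moments 0, 1, 2 ...
b = np.array([0.0, 1.0, 0.0])              # ... must equal 0, 1, 0

c = np.ones(10)                            # objective sum(u) + sum(v) = sum |r|
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=[(0, None)] * 10)

r = res.x[:5] - res.x[5:]
print(np.round(r, 6), "cost:", res.fun)    # -> [-0.25 0. 0. 0. 0.25], cost 0.5
```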
The same procedure can be repeated to compute a difference operator that approximates another differential operator, e.g. the Laplacian L = (∂²/∂x²) + (∂²/∂y²) of functions in two variables. We illustrate this procedure for function class G7, where the Laplacian L can be replaced by the difference operator Q = 2Δy² = 2σy² − 4σy + 2. We choose an operator with the shape given in Fig. 4, compute the linear system and solve the optimization problem, resulting in the operator given on the right in Fig. 4.

Fig. 4. Example: Laplacian operator computation for G7 with shaped operator (left) and resulting optimal operator (right). [The templates are shown as images in the original.]

If the reduction of the operators with the ideals of different function classes yields operators of the same form, then the optimal operator of that form will also be the same for those function classes. Table 5 shows the optimal Laplacian operators for functions in two variables. The set of linear function classes comprises G1, G2, G3, G4 and G5; the quadratic classes are G9, G10, G11 and G12; the cubic class is G13; the class of fourth order is G14. Although G6 is not a class of linear functions, with respect to the Laplacian it behaves as a linear function. We use the term quadratic symmetric to denote the set of function classes {G7, G8}, which are symmetric in the second degree terms. There are no 3 × 3 operators that correctly compute the Laplacian for cubics and higher order functions. The operator cost for all these function classes in different window sizes is given in Table 6. For a given window size, the operator cost increases when the function class becomes more general. In fact, Table 2 shows that I12 ⊂ I10, i.e. G12 contains G10 as a subset, and the functions in G10 have to satisfy the difference equations from I12 plus some extra difference equations. As a result, the class of operators from which an optimal operator has to be chosen is larger for I10 than for I12. Second, for a given function class the cost decreases when the window size increases. In fact, the operators for a larger window contain the operators for a small window as a special case.
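The published 3 × 3 templates for Fig. 4 and Table 5 survive only as images, but their correctness is easy to check symbolically. The diagonal operator below is one operator with the tabulated cost 2 (our choice of representative, since the optimum is not unique); the sketch verifies that it reproduces the Laplacian exactly on the quadratic symmetric class G8:

```python
from sympy import symbols, Rational, simplify, diff

x, y, a1, a2, a3, a4 = symbols('x y a1 a2 a3 a4')
g = a1*(x + y)**2 + a2*x + a3*y + a4      # quadratic symmetric class G8

# R f = (f(x+1, y+1) - 2 f(x, y) + f(x-1, y-1)) / 2, with cost |R| = 2.
Rg = Rational(1, 2)*(g.subs({x: x + 1, y: y + 1}, simultaneous=True) - 2*g
                     + g.subs({x: x - 1, y: y - 1}, simultaneous=True))

Lg = diff(g, x, 2) + diff(g, y, 2)        # the true Laplacian: 4*a1
print(simplify(Rg - Lg))                  # -> 0, so R g = L g on the whole class
```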
Table 4
The operator cost |R| of the optimal tangent operators for the one-dimensional function classes, given for different window sizes.

Size | F1 and F2 | F3 and F4 | F5
3 | 1 | n.e. | n.e.
5 | 1/2 | 3/2 | n.e.
7 | 1/3 | 7/6 | 11/6
9 | 1/4 | 3/4 | 79/60
11 | 1/5 | 19/30 | 153/140
Table 5 shows that, for linear functions, the optimal operator is the zero operator with cost equal to zero. This is correct since the Laplacian of a linear function (and of G6) is zero, a value which can be estimated without error. This may give rise to some anomalies when we have to select an appropriate function class and window size. In the next section, we discuss how these anomalies can be avoided by introducing an artificial cost for linear functions (shown between brackets in Table 6).

Table 5
Optimal Laplacian operators for the 2D function classes in 3 × 3 and 5 × 5 templates. [The operator templates are shown as coefficient masks in the original; columns: linear, quadratic symmetric, quadratic, cubic, 4th order. The cubic and 4th order entries are non-existing (n.e.) for 3 × 3.]

Table 6
Cost of optimal Laplacian operators for different sets of function classes and window sizes.

Size | Linear | Quadratic symmetric | Quadratic | Cubic | 4th order
3 × 3 | 0 (1) | 2 | 4 | n.e. | n.e.
5 × 5 | 0 (1/4) | 1/2 | 1 | 1 | 16/3
7 × 7 | 0 (1/9) | 2/9 | 4/9 | 4/9 | 9/5

3.3. Non-central operators

Although all the operators in Table 3 compute the tangent at the center of the window, the theory is not limited to central operators. When a curve or function contains discontinuities or sudden jumps, it makes sense to consider operators that compute the tangent at a non-central position. These operators incorporate information arising further along the function into the computation of the difference operator. Table 7 shows some optimal operators of size 7 for function class F3. Within the window, the tangent is computed at the indicated point. Not surprisingly, the cost of the operator becomes larger when we compute the tangent closer to the boundaries of the window. Table 8 shows operators that compute the tangent at the fourth position of the window, for different window sizes. As expected, the error drops when the window size increases, although more slowly than for the central tangents of Table 4.

Table 7
Optimal tangent operators for the function class F3 for a fixed window size of 7.

1/12  −1/4  −1/4  [0]   1/4   1/4   −1/12 | 7/6
1/15  0     −5/12 0     [0]   4/15  1/12  | 5/6
1/72  0     0     −1/9  −3/8  [0]   17/36 | 35/36

Within the window the tangent is computed at the bracketed point. The cost |R| of the operator is shown in the last column. [A further row, with the tangent computed at the seventh position and cost 10/3, is not reliably recoverable from the scan.]

Table 8
Optimal tangent operators for the function class F3, computed at the fourth position of the window, for window sizes 7, 9 and 11.

[The operator coefficients are not reliably recoverable from the scan; the costs |R| of the three operators are 7/6, 13/20 and 29/63.]

Within the window the tangent is computed at the bracketed point. The cost |R| of the operator is shown in the last column.

3.4. Derivative operators for images

An important application is the computation of derivatives for discrete images. Let

−a  0  a
−b  0  b    (6)
−a  0  a

be a proposed shape for a difference operator that computes the derivative ∂/∂x at its central position. In terms of shift operators, this
shape can be written as −a + aσx² − bσy + bσx²σy − aσy² + aσx²σy². This operator must be equivalent to the operator Δx − Δx²/2 + Δx³/3 − Δx⁴/4 + ···, which computes ∂/∂x. After Gröbner reduction of both operators and identifying coefficients, one finds that for constant functions and functions of the form α₁y + α₂, the zero operator is optimal. For all other linear and quadratic functions one finds, after Gröbner reduction, that the derivative operator must have the form

−a           0  a
−(1/2 − 2a)  0  1/2 − 2a    (7)
−a           0  a

with operator cost 4|a| + 2|1/2 − 2a|. The minimal value of the cost is 1, which is obtained for 0 ≤ a ≤ 1/4. It is interesting to note that the Sobel, Prewitt and Frei–Chen operators are all optimal in this sense (choose a = 1/8, 1/6, or 1/(2(2 + √2)) ≈ 0.146, respectively). Furthermore, these operators exploit the limited freedom of the parameter a to perform some smoothing in the vertical direction. There is no 3 × 3 operator that correctly computes the derivative for cubic functions. In fact, it makes no sense to use cubics in a 3 × 3 window to approximate an image function, since a cubic has 10 parameters, and the fit will always have zero error. In Fig. 5, we show an optimal operator in a 5 × 5 window for linear and quadratic functions with cost 1/2, and one for cubics with cost 3/2.

Fig. 5. Optimal operators in a 5 × 5 window for linear and quadratic functions with cost 1/2 (left), and for cubics with cost 3/2 (right).

(1/40) ×  −1  0  0  0  1        (1/24) ×   1   0  0  0  −1
          −2  0  0  0  2                   0  −8  0  8   0
          −4  0  0  0  4                   0   0  0  0   0
          −2  0  0  0  2                   0  −8  0  8   0
          −1  0  0  0  1                   1   0  0  0  −1
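The optimality claim is easy to verify numerically. The sketch below (ours) checks that the normalized Sobel and Prewitt kernels are members of the family (7) and that both have cost exactly 1:

```python
import numpy as np

def family(a):
    """Template (7): [-a, 0, a; -(1/2 - 2a), 0, 1/2 - 2a; -a, 0, a]."""
    b = 0.5 - 2 * a
    return np.array([[-a, 0, a], [-b, 0, b], [-a, 0, a]])

sobel   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0   # a = 1/8
prewitt = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]) / 6.0   # a = 1/6

for name, K, a in [("Sobel", sobel, 1/8), ("Prewitt", prewitt, 1/6)]:
    assert np.allclose(K, family(a))
    print(name, "cost |R| =", np.abs(K).sum())    # both print 1.0
```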
4. Optimal operator selection in practice

The error produced by a difference operator is equal to ε|R|. Several parameters affect the quality of the approximation in different ways:

• When the window size increases, the fitting cost ε increases for a given function class.
• When the window size remains fixed, ε decreases for more general functions (e.g. polynomials of high order).
• When the window size increases, the operator cost |R| goes down for a given function class.
• When the window size remains fixed, the operator cost |R| increases for more general functions.

That is, increasing the window size can have a beneficial as well as a bad influence on the approximation, and the same goes for the choice of the function class. To take all the above effects into account at once, it makes sense to use ε|R| as the criterion for selecting the best operator.

There may be some artifacts, however, which prohibit the blind application of this criterion. First, some operators have zero cost. As noted in Table 6, we have |R| = 0 for the Laplacian of a linear function, and therefore ε|R| = 0. Second, the fitting error may be zero. Suppose that we compute the tangent of a discrete curve on which three subsequent points are collinear by accident. Then the fitting error of class F1 in a window of size 3 will be zero. Therefore, the error ε|R| will be smaller for a window of size 3 and an operator of class F1 than for any other function class or window size. This phenomenon will appear whenever the window size is small compared to the order of the polynomial. For example, three points of an arbitrary function always lie on a parabola, which makes the fitting error zero. Also, four points of a digitized curve will still often lie by accident on a parabola.

There is no obvious solution for these problems, since they are related to the loss of information that arises when a curve or function is digitized. How to cope with these artifacts may often depend on the application. For example, the noise level in an image influences the decision to accept that a function is locally linear. One simple way to avoid these anomalies is to incorporate two additional, artificial costs in the selection criterion: (ε + α)(|R| + β), where α and β are small positive real numbers. Alternatively, one can introduce artificial costs for some of the operators, as was done in Table 6. For example, for functions that are locally linear, the best Laplacian difference operator is the zero operator, which has zero cost. To avoid that this operator has an insurmountable advantage over the other operators, we introduce an artificial non-zero cost in the column of the linear operator, which is in line with the costs of the other operators. A more radical approach is simply to avoid the function classes or windows that may give rise to problems. For example, to compute the Laplacian, one may discard the class of linear functions, or to compute a tangent, one may avoid using a window whose size is small compared to the order of the approximating polynomial.

5. Difference operator applications

We present two applications where a filter bank of adaptive operators is used to compute differentials: computing the optimal tangent operator for one-dimensional functions or object boundaries, and the Laplacian operator for local image surfaces.

5.1. Tangent computation for object boundaries

As a first application, we consider the computation of an optimal tangent operator for functions of one variable.
Fig. 6. Two examples of the tangent computation for a digitized function: a third order monotonically increasing polynomial (a), and the join of two third order polynomials with a discontinuity (c). For each function, the actual tangent value (o), the tangent estimated by our method (∗), and by taking backward differences (.) are indicated, respectively, in (b) and (d). The squares indicate the upper and lower error bounds on the tangent value estimated by our method.
The function classes used are given in Table 1. A filter bank with feature detection templates H_i computes the fitting costs ε_{Fj;Dk} at each point of the curve, for several function classes F_j and several windows D_k. For each function class, the operator cost |R_{Fj;Dk}| of the optimal operator for that class and window is known in advance. The cost criterion is used to select the optimal operator, adapted to the local function neighborhood around each point. In this example, the slightly modified criterion (ε_{Fj;D} + α)|R_{Fj;D}| (with e.g. α = 1) was used to compensate for the anomalies explained before. Also, non-central windows and operators were included in the filter bank to cope with discontinuities in the curve.

Fig. 6 illustrates the computation of the tangent for digitized curves. Fig. 6(b) shows the exact tangent value for a digitized polynomial of third order. The adaptive operators approximate the actual value very well, certainly when compared to the result of computing the tangent by taking the backward difference. Fig. 6(c) shows the concatenation of two third order curves with a discontinuity in between. Adaptive operators approximate the actual tangent value well on both sides of the discontinuity, thanks to the non-central operators which compute the tangent left and right of the discontinuity.

The filter bank also estimates an upper and a lower error bound on the estimation of the tangent value. At each point of the digitized function f, |Lg̃ − Rf| ≤ |R|ε must hold when there is at least one function g̃ of that class approximating the digitized function f well, i.e. |g̃ − f| ≤ ε. In this example, the function f_c before digitization is known, so that the error bounds can be computed as Lf_c ± |R|ε, which is also shown in Fig. 6(d). Note that there can be an additional approximation error, i.e. g̃ is not necessarily equal to f_c; this is not accounted for by the indicated error bounds.

The computation of the tangent of more general digitized curves requires some adaptations. Points do not necessarily form the graph of a function, but may correspond to a multivalued function. In this case, we consider all possible digitized functions f_j that can be constructed with the curve points, taken along one of the main axes. The minimum cost is taken not only over all possible windows and function classes, but also over the different functions that can be derived from the multivalued function, both for functions of the x and of the y coordinates. The application to the boundary of a digitized circle illustrates that the adaptive operators approximate the actual value well, as indicated by the full and the dotted line, respectively, in Fig. 7. The adaptive operators perform far better than the backward difference.

Fig. 7. An example of the tangent estimation for a digital curve, which is the boundary of a digital circle. For each point of the curve, the actual tangent value (dotted), the tangent estimated by our method (solid), and by taking backward differences (dashed) are shown.
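The following condensed sketch (ours; the names and simplifications are assumptions) shows the selection mechanism for a single window size of 5 and the two classes F2 and F3 of Table 3. The fitting cost ε is estimated from the responses of the ideal generators Δ³ and Δ⁴ used as feature detectors (cf. Theorem 1), and the class minimizing (ε + α)|R| supplies the tangent operator; the full method additionally varies the window size and includes non-central operators:

```python
import numpy as np

classes = {
    # class: (detector H from the ideal, optimal operator R of Table 3, |R|)
    "F2": (np.array([1., -3., 3., -1.]),
           np.array([-1/4, 0., 0., 0., 1/4]), 0.5),
    "F3": (np.array([1., -4., 6., -4., 1.]),
           np.array([1/12, -2/3, 0., 2/3, -1/12]), 1.5),
}
alpha = 1.0                                   # artificial cost of Section 4

def tangent(f, x):
    """Estimate f'(x) with the class minimizing (eps + alpha) * |R|."""
    seg = f[x - 2: x + 3]                     # window of size 5 around x
    best = (np.inf, 0.0)
    for H, R, cost in classes.values():
        resp = np.abs(np.convolve(seg, H[::-1], mode='valid'))
        eps = resp.max() / np.abs(H).sum()    # fitting error estimate |Hf| / |H|
        crit = (eps + alpha) * cost
        if crit < best[0]:
            best = (crit, float(R @ seg))
    return best[1]

x = np.arange(40, dtype=float)
f = np.round(0.002 * x**3 - 0.1 * x**2 + 2.0 * x)     # a digitized cubic
print([round(tangent(f, i), 2) for i in range(5, 10)])
```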
5.2. Laplacian of Gaussian edge detection

The ε|R| cost criterion gives a remarkable result for zero operators: their operator cost |R| is always zero, and therefore their Laplacian estimation error ε|R| is also zero. Theoretically, this is correct because ε|R| represents the error with which we can estimate the value of Lg̃. Since the Laplacian of a linear function is always zero, we can in fact estimate the value of the Laplacian without error. The fitting cost does not count in this case.
Fig. 8. The original image (a), for which edges are detected as zero crossings after Laplacian computation on the unsmoothed image by the proposed method (b), by the hierarchical method (c), and by the default method (d). (e) and (f) show the edges obtained by the Laplacian of Gaussian (LoG) method for images smoothed with a Gaussian of standard deviation √2 and 2√2, respectively [13].
One solution would be to define a more elaborate mixed cost h(ε, |R|). In [15], we proposed a simpler solution by introducing a non-zero cost for zero operators. Observing Table 6, one sees that the cost |R| often goes down by a factor of 2 when the function class gets more specialized. For example, in a 3 × 3 window, the cost is 4 for quadratic functions, and 2 for quadratic symmetric functions. We extrapolate this trend and use a cost for linear functions which is half the cost of symmetric quadratic functions, as shown between brackets in Table 6. Thus, quadratic or higher order functions will only be used if their fitting cost is less than one half the fitting cost of linear functions. By a fortunate coincidence, the cost used in a 3 × 3 window for linear functions is then simply the fitting cost ε itself.

Fig. 3 shows that the function classes G_i form a poset with subset inclusion as ordering relation. If one reduces the Laplacian operator ∂²/∂x² + ∂²/∂y² = (Δx − Δx²/2 + Δx³/3 − ···)² + (Δy − Δy²/2 + Δy³/3 − ···)² with the Gröbner bases of the respective function classes of Table 2, some of the operators will be identical.
This means that the structure of the filter bank will be simpler than the lattice of Fig. 3. As the same difference operator is optimal for more than one function class, the number of operators is also limited. When two classes have the same optimal operator for a given window size, we consider only the most general class for fitting, because it yields the smallest approximation error ε. The following classes share an operator:

• The function classes G1, G2, G3, G4, G5 and G6 form a subset of functions for which the optimal difference operator is the zero operator. G6 is the most general of these functions and is, therefore, used as a representative for this set of classes.
• G8 represents the set of quadratic symmetric function classes {G7, G8}.
• Quadratic functions as in G9, G10, G11 and G12 are represented by G12. For 5 × 5 templates, the optimal operator generated for these classes is the same as that for class G13. Therefore, we use G13 to generate the feature detection templates in the filter bank for this subset of function classes.
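A condensed sketch of the resulting per-pixel selection, restricted to 3 × 3 windows and to two representatives: the zero operator for G1–G6 with the artificial cost 1, and a quadratic symmetric operator with cost 2. The detector and operator templates here are our assumptions, since the published ones appear only as images, and a reliable filter bank would use several detectors per class:

```python
import numpy as np
from scipy.ndimage import correlate, maximum_filter

def laplacian_adaptive(img):
    img = img.astype(float)

    # Fitting error for G6 = <Dx^2, Dy^2>: filter responses |Hf| / |H|.
    dxx = np.abs(correlate(img, np.array([[1., -2., 1.]]))) / 4.0
    dyy = np.abs(correlate(img, np.array([[1.], [-2.], [1.]]))) / 4.0
    eps6 = maximum_filter(np.maximum(dxx, dyy), size=3)

    # For G8 only the x and y curvatures must agree: one detector from the
    # ideal is Dx^2 - Dy^2, with |H| = 4.
    d_sym = correlate(img, np.array([[0., -1., 0.], [1., 0., 1.], [0., -1., 0.]]))
    eps8 = maximum_filter(np.abs(d_sym) / 4.0, size=3)

    # One cost-2 operator exact on G7, G8: half the second diagonal difference.
    R8 = 0.5 * np.array([[1., 0., 0.], [0., -2., 0.], [0., 0., 1.]])
    lap8 = correlate(img, R8)

    # Criterion eps * |R|, with the artificial cost 1 for the zero operator.
    return np.where(eps6 * 1.0 <= eps8 * 2.0, 0.0, lap8)
```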
Better estimation of the Laplacian offers a considerable advantage in feature detection applications, for example edge detection. It also results in a more accurate computation of the Laplacian zero crossings, which determine the location of the edges. In particular, there is a considerable improvement for large noisy image regions with homogeneous gray values. Regions around edges can be well approximated by the quadratic and higher order function classes. Fig. 8 shows the result for the Laplacian applied to the unsmoothed Lena image. When we compare the edges obtained from zero crossings of the Laplacian, we notice that adaptive Laplacian operators using the criterion ε|R| give better results than the hierarchical method proposed in [13], and much better results than the widely applied method for the discrete Laplacian, which gives no acceptable results without smoothing. After smoothing, the results remain better. Note that some drawbacks of LoG edge detection become more apparent in the results when the images are smoothed more. Edges tend to form closed loops, and sharp corners are smoothed too much. The edges of finer (and even coarser) details disappear on higher scales, while edges are still detected in (noisy) homogeneous regions.

6. Conclusion

This paper presents a mathematical framework to replace a differential operator by a difference operator that is optimal for a specific class of functions. Instead of actually approximating the digitized function by a fitting function of a certain class, we apply a filter bank in order to determine locally, for a window D of a certain size, how well the digitized function can be characterized by that class of functions. While this step results in a fitting cost ε for each class and window size, there is also an optimal operator cost |R|. For each local part of the function, the best operator is the one that minimizes the combined cost ε|R|. As this cost relies on both the fitting and the operator cost, we can select fitting functions of the most appropriate order in a window of appropriate size, thus striking a balance between over- (or under-) fitting and a sufficiently accurate estimation of the differential operation. When the filter bank includes operators for non-central points (albeit with an increased operator cost), it also copes well with discontinuities in the digitized functions. A topic not discussed in this paper is how to employ the freedom left in choosing an operator when there is more than one optimal operator (e.g., when computing derivatives for image functions).
This freedom arises when the window is large enough, and allows the use of additional criteria, such as further reduction of noise, or the reduction of the average error, apart from the maximum error of the operator.
References

[1] D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[2] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: Ninth European Conference on Computer Vision, 2006.
[3] T. Lindeberg, Discrete derivative approximations with scale-space properties: a basis for low-level feature extraction, Journal of Mathematical Imaging and Vision 3 (1993) 349–376.
[4] S. Gunn, On the discrete representation of the Laplacian of Gaussian, Pattern Recognition 32 (1999) 1463–1472.
[5] D. Demigny, T. Kamlé, A discrete expression of Canny's criteria for step edge detector performances evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 1199–1211.
[6] J.-O. Lachaud, A. Vialard, F. de Vieilleville, Analysis and comparative evaluation of discrete tangent estimators, in: Lecture Notes in Computer Science, Proceedings of the Discrete Geometry for Computer Imagery, vol. 3429, Springer, Berlin, 2005, pp. 240–251.
[7] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, second ed., Addison-Wesley, Reading, MA, 1994.
[8] U. Oberst, Multidimensional constant linear systems, Acta Applicandae Mathematicae 20 (1990) 1–175.
[9] U. Oberst, F. Pauer, The constructive solution of linear systems of partial difference and differential equations with constant coefficients, Multidimensional Systems and Signal Processing 12 (2001) 253–308.
[10] V. Gerdt, Y. Blinkov, V. Mozzhilkin, Gröbner bases and generation of difference schemes for partial differential equations, Symmetry, Integrability and Geometry: Methods and Applications 2 (2006), Paper 051, arXiv:math.RA/0605334.
[11] L. Collatz, Numerical Treatment of Differential Equations, third ed., Springer, Berlin, 1960.
[12] P. Veelaert, Local feature detection for digital surfaces, in: Proceedings of the SPIE Conference on Vision Geometry V, SPIE, vol. 2826, 1996, pp. 34–45.
[13] K. Teelen, P. Veelaert, Improving difference operators by local feature detectors, in: Lecture Notes in Computer Science, Proceedings of the Discrete Geometry for Computer Imagery, vol. 4245, Springer, Berlin, 2006, pp. 391–402.
[14] P. Veelaert, K. Teelen, Feature controlled adaptive difference operators, Discrete Applied Mathematics 157 (3) (2009) 571–582.
[15] P. Veelaert, K. Teelen, Optimal difference operator selection, in: Lecture Notes in Computer Science, Proceedings of the Discrete Geometry for Computer Imagery, Springer, Berlin, 2008, pp. 495–506.
[16] D. Cox, J. Little, D. O'Shea, Ideals, Varieties and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, Springer, New York, 1992.
[17] W.W. Adams, P. Loustaunau, An Introduction to Gröbner Bases, Graduate Studies in Mathematics, vol. 3, American Mathematical Society, Providence, RI, 1994.
[18] P. Henrici, Elements of Numerical Analysis, Wiley, New York, 1964.
About the Author—PETER VEELAERT received a degree in electronic engineering from the University of Ghent in 1981. After his studies he started working as an engineer at the Digitized Information Systems Corporation, Belgium, where he developed computer graphics software. In 1986 he joined the Laboratory for Electronics at the University of Ghent from which he received a PhD degree in 1992. He currently teaches and does research at the University College Ghent, department of Applied Engineering Sciences, Ghent University. His current research interests include real-time systems for low-level vision, image interpretation of road scenes, and geometric uncertainty models. About the Author—KRISTOF TEELEN received the MS degree in electronic engineering from the Catholic University of Leuven in 2003. After his studies, he joined the department of Applied Engineering Sciences, University College Ghent—Ghent University Association, where he currently teaches and does research on image correspondence problems. He is currently a PhD candidate at the Image Processing and Interpretation group (IPI-TELIN-IBBT) at Ghent University. His research interests include image interpretation for real-time vision systems, correspondence problems and geometric uncertainty models.