Pattern Recognition 37 (2004) 609 – 621 www.elsevier.com/locate/patcog
Automatic localization of craniofacial landmarks for assisted cephalometry

I. El-Feghi, M.A. Sid-Ahmed, M. Ahmadi*

Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ont., Canada N9B 3P4

Received 17 October 2002; accepted 12 September 2003
Abstract

In this paper we propose a system for localization of cephalometric landmarks. The process of localization is carried out in two steps: deriving a smaller expectation window for each landmark using a trained neuro-fuzzy system (NFS), then applying a template-matching algorithm to pinpoint the exact location of the landmark. Four points are located on each image using edge detection. The four points are used to extract further features such as distances, shifts and rotation angles of the skull. A limited number of representative groups to be used for training is selected based on k-means clustering. The most effective features are selected based on a Fisher discriminant for each feature set. Using fuzzy linguistic if-then rules, a membership degree is assigned to each of the selected features and fed to the NFS. The NFS is trained, utilizing gradient descent, to learn the relation between the sizes, rotations and translations of landmarks and their locations. The data for training are obtained manually from one image from each cluster; the image whose features lie closest to the center of its cluster is used for extracting the training data. The expected locations on target images can then be predicted using the trained NFS. For each landmark, a parametric template space is constructed from a set of templates extracted from several images based on the clarity of the landmark in each image. The template is matched to the search window to find the exact location of the landmark. Decomposition of landmark shapes is used to desensitize the algorithm to size differences. The system is trained to locate 20 landmarks on a database of 565 images. Preliminary results show a recognition rate of more than 90%.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Cephalometry; Landmarks; k-means clustering; Fuzzy linguistics; Gradient descent; Fisher discriminant
1. Introduction

1.1. Cephalometry

Cephalometry is the scientific measurement of the dimensions of the head, usually through the use of standardized lateral skull radiographs, or cephalograms. Analysis of the cephalogram images and measurement of parameters is based on a set of agreed-upon feature points, called landmarks.

* Corresponding author. Tel.: +1-519-253-3000; fax: +1-519-971-3695. E-mail addresses:
[email protected] (I. El-Feghi),
[email protected] (M.A. Sid-Ahmed),
[email protected] (M. Ahmadi).
The number of landmarks that have been named and defined is quite large. Admittedly, many of these landmarks are rarely used. Rakosi [1] defines a smaller set of 20-30 landmarks. Table 1 contains the landmarks most frequently used by orthodontists in what is called cephalometric evaluation. The detection of the landmarks plays an essential role in diagnosis and treatment planning by orthodontists. Lines are constructed from the detected landmarks, and various angular and linear parameters are then measured and recorded. This allows comparison with normal values for a population, diagnosis, assessment of craniofacial growth, planning of orthodontic treatment, and evaluation of treatment effects and of treated cases. The conventional method of locating landmarks depends
0031-3203/$30.00 © 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2003.09.002
Table 1. List of the most commonly used landmarks and their definitions

No. Abbreviation: Definition
1. N: Nasion, the most anterior point of the nasofrontal suture in the median plane; the skin nasion (N) is located at the point of maximum convexity between nose and forehead
2. S: Sella, the midpoint of the hypophysial fossa
3. Se: Midpoint of the entrance to the sella; the midpoint of the line connecting the posterior clinoid process and the anterior opening of the sella turcica
4. Sn: Subnasale, a skin point; the point at which the nasal septum merges mesially with the integument of the upper lip
5. A: Point A, subspinale, the deepest midline point in the curved bony outline from the base to the alveolar process of the maxilla
6. APMax: The anterior landmark for determining the length of the maxilla; it is constructed by dropping a perpendicular from point A to the palatal plane
7. Pr: Prosthion, the lowest, most anterior point on the alveolar portion of the premaxilla, in the median plane, between the upper central incisors
8. Is: Incisor superius, tip of the crown of the most anterior maxillary central incisor
9. AP1: Apicale, root apex of the most anterior maxillary central incisor
10. Ii: Incisor inferius, tip of the crown of the most anterior mandibular central incisor
11. Ap1: Apicale, root apex of the most anterior mandibular central incisor
12. Id: Infradentale, alveolar rim of the mandible; the highest, most anterior point on the alveolar process, in the median plane, between the mandibular central incisors
13. B: Point B, supramentale, most anterior part of the mandibular base; it is the most posterior point in the outer contour of the mandibular alveolar process, in the median plane
14. Pog: Pogonion, most anterior point of the bony chin, in the median plane
15. Gn: Gnathion, the most anterior and inferior point of the bony chin
16. Go: Gonion, a constructed point, the intersection of the lines tangent to the posterior margin of the ascending ramus and the mandibular base
17. Me: Menton, the most caudal point in the outline of the symphysis; it is regarded as the lowest point of the mandible and corresponds to the anthropological gnathion
18. APMan: The anterior landmark for determining the length of the mandible; it is defined as the perpendicular dropped from Pog to the mandibular plane
19. Ar: Articulare, the point of intersection of the posterior margin of the ascending ramus and the outer margin of the cranial base
20. Cd: Condylion, most superior point on the head of the condyle
21. Or: Orbitale, lowermost point of the orbit in the radiograph
22. Pn/2: A constructed point, obtained by bisecting the Pn vertical between its intersection with the palatal plane and point N
23. Int.FH/R.asc: Intersection of the ideal Frankfurt horizontal and the posterior margin of the ascending ramus
24. ANS: Anterior nasal spine, the tip of the bony anterior nasal spine, in the median plane
25. PNS: Posterior nasal spine, the intersection of a continuation of the anterior wall of the pterygopalatine fossa and the floor of the nose
26. S: Landmark for assessing the length of the maxillary base in the posterior section; it is defined as a perpendicular dropped from point S to a line extending the palatal plane
27. APOcc: Anterior point of the occlusal plane, the midpoint of the incisor overbite in occlusion
28. PPOcc: Posterior point of the occlusal plane, the most distal point of contact between the most posterior molars in occlusion
29. Ba: Basion, lowest point on the anterior margin of the foramen magnum in the median plane
30. Ptm: Pterygomaxillary fissure, the contour of the fissure projected onto the palatal plane
on manual tracing of the radiographic images to locate the landmarks on line crossings. This can take an experienced orthodontist up to 20 min. The process is tedious, time-consuming and subject to human error. An automated system would eliminate the above-mentioned problems and produce repeatable results.
The first X-ray pictures of the skull in the standard lateral view were taken in 1922. It was not until 1931 that Hofrath and Broadbent [2] developed a standardized method for the production of cephalometric radiographs, using special holders known as cephalostats, to permit assessment of growth and of treatment response. By introducing the X-ray
Fig. 1. Most commonly used landmarks as defined on the cephalometric tracing.
pictures of the skull, cephalometric radiology was born. Since then, a variety of analyses have been developed based on the standard lateral radiograph of the skull. In 1948 Downs [3] introduced the first cephalometric analysis method. His method has been the basis for many of the methods used at present. These analyses are composed of series of linear and angular measurements based on the locations of the landmarks. Cephalograms are rarely used directly in the analysis. Instead, a tracing as shown in Fig. 1 is first made to outline the most important bony and visible soft tissues. Tracings from previous or subsequent X-rays are superimposed to compare growth and monitor treatment.

1.2. Automated cephalometry

Investigations into automatic landmark recognition using different image processing techniques have been undertaken for several years. The task is a difficult one due to the complexity and variability of the images. In model-driven approaches, edge information from X-ray images is extracted so that desired features can be found via line trackers in a predetermined order. Another method searches for known patterns in the images; it uses various well-developed techniques and requires predefined contour maps for the various landmarks along with systematic training methods. Automatic cephalometric analysis has been a subject of research for many years, and the automatic location of landmarks has been attempted by more than 20 independent researchers with varying degrees of success. The various methods used in this field can be divided into three categories. The first category follows a strategy similar to that used by orthodontists; it employs a combination of image processing techniques to extract the important edges, which are then used to locate landmarks on line crossings. The second category employs knowledge-based edge trackers or geometrical means to reduce the search area, followed by image-matching techniques to pinpoint the exact location of the landmarks. The third category employs artificial intelligence and neural networks.

It was Levy-Mandel et al., in 1986 [4], who first attempted to automate the location of landmarks using knowledge-based line extraction techniques. In their system they used a median filter and histogram equalization to enhance the contrast of the image and remove noise. In the second stage they used a Mero-Vassy operator [5] for edge detection and extracted the relevant edges using a knowledge-based line tracker. Lines are tracked in a predetermined order based on algorithms introduced to the system by ad hoc criteria. The landmarks are then located on line crossings according to their geometric definitions. This algorithm can only detect landmarks located on lines and requires good-quality X-rays, something that cannot be guaranteed in practice. Parthasaraty et al. [6] improved the system of Ref. [4] by including a resolution pyramid to reduce the processing time. The X-ray image's resolution is reduced to 64 × 60 pixels; once the landmarks are found, the resolution is scaled back to the original size and the locations are fine-tuned. Yen et al. [7], Contereras et al. [8], Jackson et al. [9], Cohen et al. [10,11] and Davis and Forsyth [12] presented similar knowledge-based edge-tracking methods. The approaches given in Refs. [4,6-12] suffered from similar problems due to the use of rigid knowledge-based rules. The disadvantage of edge-tracking systems is that their success depends heavily on the quality of the X-ray, and they can only locate a landmark if it is positioned on an edge.
Another disadvantage of knowledge-based approaches is the difficulty of adding a new landmark to the system, for which new rules have to be formulated and added to the rule base. The second category of researchers used mathematical or statistical models to narrow down the search area for each landmark; shape-matching techniques are then utilized to pinpoint the exact location of each landmark. Cardillo and Sid-Ahmed [13] presented the most promising results, locating 76% of the landmarks within 2 mm. They used a combination of mathematical modeling methods similar to an affine transformation to remove the shifts, rotations and size differences and so reduce the size of the search window. They then applied a shape recognition algorithm based on gray-scale mathematical morphology to exactly locate the landmark. Decomposition of shapes was used to desensitize the algorithm to size differences. This method produced good results when tested on a set of approximately 40 X-rays. Grau et al. [14] improved the work of Cardillo and Sid-Ahmed [13] by using a line detection module to search for the most significant lines, such as the jaw line or nasal
spine, then utilized a mathematical morphology approach similar to that used by Cardillo and Sid-Ahmed [13] for shape recognition. The mathematical morphology approach used here is very sensitive to noise. They used only one component for shape matching, which meant that the algorithm was not desensitized to size differences. Twenty images were used for training and another 20 for testing the algorithm. Their method produced some improvements over Cardillo and Sid-Ahmed [13] when tested on a smaller set of 20 X-ray images. Desvignes et al. [15] addressed the problem of finding an initial estimate of the location of landmarks using an adaptive coordinate space in which locations are registered. To reduce the variability and produce an initial estimate of the region of interest they used statistical models and training sets. The algorithm was tested on a set of 58 X-rays, of which 28 were used for training. Although they claimed to locate 99.5% of the landmarks within a window, the mean error between the real position and the estimated position was about 4.5 mm. This error is significantly large considering that the allowed error is ±2 mm. Hutton et al. [16] used active shape models for cephalometric landmarking. Permissible deformations of a template were established from a training set of hand-annotated images and the resulting model was used to fit unseen images. Sixty-three randomly selected cephalograms were tested using a drop-one-out method. On average, 35% of 16 landmarks were within an acceptable range of ±2 mm. It was concluded that the implementation did not give sufficient accuracy for completely automated landmarking, but could be used as a time-saving tool to provide a first-estimate location of the landmarks. The third category of researchers used neural networks and fuzzy inference systems to locate the landmarks. Neural networks can be trained using a training set and then used to locate or estimate the locations on target images.
Fuzzy neural networks use fuzzy rules to offer a more flexible solution. Uchino et al. [17] presented a fuzzy learning machine that could learn the relation between the mean gray levels of the image and the location of the landmarks. They divided the image into nine blocks, where the mean gray level of each block constituted an input to the machine. The weights were adjusted by learning the coordinates of the landmark. Once the machine located the block containing the landmark, the block was further divided into nine separate blocks. The process is repeated until the size of the block is reduced and the exact landmark location is obtained. This method only works if images are registered in a common space, having the same sizes, rotations and shifts. The method suffers from several drawbacks, such as the training time, the number of weights used and the number of fuzzy rules. Each landmark requires nine levels of training to produce nine sets of weights. The number of rules is equal to the number of sets raised to the power of nine. The lowest number of sets that can be used in the output is two, making the number of rules equal to 512. Therefore, the number of weights is equal to 9 × 512. This
method was tested on a set of images not used for training, producing an average error of 2.3 mm. Recently, Innes [18] used pulse-coupled neural networks (PCNN) to highlight regions containing key craniofacial features in digital X-rays. They applied averaging filters of different sizes prior to using the PCNN to minimize the noise in the different regions of the image. In this study a larger set of images (109 X-rays) was used, and the method was tested on highlighting the regions surrounding three landmarks of increasing difficulty of identification. The success rate was 36.7% for the region containing the landmark sella, 88.1% for the region containing the chin, and 93.6% for the region containing the nose. Although PCNNs have been shown to be capable of image smoothing and segmentation, they require considerable manual intervention to set the required parameters. In this paper, we present an approach that uses a neuro-fuzzy system to produce an initial estimate of the landmark location and then uses a template-matching technique to find the exact location of the landmark inside the search area, on a relatively large data set.

2. Outline of the proposed approach

To prevent wrong detection of landmarks, the search window is minimized. The region of interest for each landmark depends on the size, rotation and shift of the head in the X-ray image. A small shift in the head will reflect a large shift in the locations of landmarks. In the method outlined in this paper, the sizes, rotations and shifts of the head in the X-ray image are measured and used as features. Due to the hazardous nature of X-ray imaging, several images of the same patient with different skull rotations, shifts and distances from the cephalostat cannot be taken at the same time. Hence, classes representing groups of similar images are formed based on the measurements of selected features. Fig. 2 shows the distribution of three features of 150 randomly selected images. It is obvious that the data are closely distributed.
Instead of using a large number of images as a training set, clustering is used to select representatives of similar images. Five steps are incorporated in the proposed approach. In the first step, features are extracted that provide a measure of the skull's size, rotation and shift in the X-ray image. In the second step, similar images are clustered into groups based on the similarity of these features. Since not all features are needed, nor can all be used in the neural system, a smaller number of features is selected based on a Fisher distance measure [19]. In the third step, a neuro-fuzzy system is constructed using one representative from each group, to ensure that a similar image from each group is included in the training. In the fourth step, a training algorithm is applied. After the neuro-fuzzy system is trained, it approximates the location of each landmark within a small window. In the fifth step, a template-matching algorithm is applied to the search window to pinpoint the exact location of the landmark.
Fig. 2. Distribution of three normalized features of 150 images.
2.1. Feature extraction

The most important issue in the classification of feature-based images is the extraction of image feature points that effectively represent the original image. This section presents the method used to extract image features. Since we are interested only in locations, it is desirable to convert the original image to binary form, for the following reasons:

1. To identify the region of the image that contains the skull, including outer soft and bony tissues. In this step we identify the perimeter of the skull and separate the object (skull and tissues) from the background.
2. To extract the shape of the skull from the edge map.
3. To convert the edge-enhanced image to a line drawing. This is done to distinguish strong edges that correspond to object outlines from weak edges due to illumination changes, shadows, etc.
4. Binary images take less space and carry more information regarding the size and rotation of the skull.

Before features can be extracted from the X-ray image, histogram equalization is applied, followed by a median filter of size 5 × 5 for noise removal. A Sobel operator is then used to enhance the edges, as shown in Fig. 3. Finally, thresholding is applied to convert the resultant image to binary form (black and white) containing the edges of the skull, as shown in Fig. 4. Four points are located on the image (Fig. 5). The points are located by tracing the image in the directions shown in
Fig. 3. Image after applying histogram equalization followed by a 5 × 5 median filter and a Sobel operator.
Fig. 5 until the first or last non-zero point is found. Points P1 and P4 are located by tracing the image vertically from top to bottom, starting from the left side for P1 and from the right side for P4. Point P2 is found by extending a line from P1 until the last non-zero value is found. Point P3 is found
Fig. 6. Original image.
Fig. 4. Binary image.
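The preprocessing chain described above (histogram equalization, a 5 × 5 median filter, Sobel edge enhancement and thresholding) can be sketched in plain NumPy. The function names are illustrative assumptions, not the authors' code; in practice a library such as OpenCV provides equivalent routines.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def median_filter(img, size=5):
    """size x size median filter; edge rows/columns are replicated."""
    r = size // 2
    padded = np.pad(img, r, mode='edge')
    stack = [padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
             for dy in range(size) for dx in range(size)]
    return np.median(np.stack(stack), axis=0).astype(img.dtype)

def sobel_magnitude(img):
    """Gradient magnitude using the 3 x 3 Sobel kernels."""
    p = np.pad(img.astype(np.float64), 1, mode='edge')
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def binarize(img, thresh):
    """Threshold to a 0/1 image of the skull edges."""
    return (img > thresh).astype(np.uint8)
```

The threshold passed to `binarize` would be tuned to the edge magnitudes of the radiograph set; the paper does not state its value.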
(Cx, Cy) is the center of gravity of the shape, calculated as follows:

C_x = \frac{1}{6A} \sum_{i=0}^{N-1} (x_i + x_{i+1})(x_i y_{i+1} - x_{i+1} y_i),   (1)

C_y = \frac{1}{6A} \sum_{i=0}^{N-1} (y_i + y_{i+1})(x_i y_{i+1} - x_{i+1} y_i),   (2)

where N is the number of points and A is the area of the shape given by Eq. (3):

A = \frac{1}{2} \sum_{i=0}^{N-1} (x_i y_{i+1} - x_{i+1} y_i).   (3)
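Equations (1)-(3) are the standard shoelace formulas for the signed area and centroid of a closed polygon. A minimal NumPy sketch, with illustrative function names and vertices assumed to be ordered around the boundary:

```python
import numpy as np

def polygon_area(pts):
    """Signed area via Eq. (3): A = 1/2 * sum(x_i*y_{i+1} - x_{i+1}*y_i)."""
    x, y = pts[:, 0], pts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)  # wrap to close the polygon
    return 0.5 * np.sum(x * yn - xn * y)

def center_of_gravity(pts):
    """Centroid (Cx, Cy) of a closed polygon, Eqs. (1)-(2)."""
    x, y = pts[:, 0], pts[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    a = 0.5 * np.sum(cross)
    cx = np.sum((x + xn) * cross) / (6.0 * a)
    cy = np.sum((y + yn) * cross) / (6.0 * a)
    return cx, cy
```

For a unit square traversed counter-clockwise, the routines return area 1 and centroid (0.5, 0.5).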
All distances are Euclidean distances measured in pixels, and angles are measured in degrees. In some images it may not be possible to detect the nose tissue, due to fading into the background, as shown in Figs. 6 and 7. In these rare cases manual detection of P1 is used. This situation was encountered in about 8% of the images in the data set we have worked on.

Fig. 5. Locating four points on the image.
by scanning diagonally from right to left, starting from the lower-right corner. A shape similar to that shown in Fig. 8 is constructed, and more features are created from the extracted four points. The created features are lines measuring the distances between points, and angles between those lines. The lines represent the scale and local shift of the skull relative to the lower corner of the image. The lines are drawn to connect P1, P2, P3, P4 and the center c to obtain Fig. 8. Table 2 lists the features created from the four points.

2.2. Data clustering
Clustering is used to categorize similar images of different patients by partitioning them into several groups such that similarity within a group is larger than between different groups. Images of different patients tend to be correlated, since they all represent a common visual object. This correlation denotes redundancy in the data. An obvious step is to take advantage of this redundancy to reduce the number of training samples while preserving only significant variations between data samples. In this situation, a clustering algorithm is applied to find the
Table 2. Name and description of the created features

No. Name: Description
1. Line1-2: Distance between P1 and P2
2. Line1-3: Distance between P1 and P3
3. Line1-4: Distance between P1 and P4
4. Line2-4: Distance between P2 and P4
5. Line2-3: Distance between P2 and P3
6. Line3-4: Distance between P3 and P4
7. cx: Horizontal coordinate of the center of gravity
8. cy: Vertical coordinate of the center of gravity
9. c1: Distance between the center of gravity and P1
10. c2: Distance between the center of gravity and P2
11. c3: Distance between the center of gravity and P3
12. c4: Distance between the center of gravity and P4
13. θ1: Angle between Line1-3 and the vertical axis
14. θ2: Angle between Line3-4 and the horizontal axis
15. θ3: Angle between Line1-4 and the horizontal axis
16. θ4: Angle between Line2-4 and Line1-2
17. θ5: Angle between Line2-4 and the vertical axis
18. Pr: Perimeter of the shape = Line1-3 + Line3-2 + Line2-4 + Line4-1
19. A: Area of the shape
20. θ6: Angle between the line connecting the center of gravity to the lower-left corner and the horizontal axis
21. Dc: Distance between the center of gravity and the lower-left corner of the image
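A sketch of how the distance-based features of Table 2 could be computed from the four located points; the dictionary keys follow the table's feature names, while the helper functions are illustrative assumptions rather than the authors' code:

```python
import math

def dist(p, q):
    """Euclidean distance in pixels between points p = (x, y) and q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle_deg(p, q):
    """Angle in degrees between the line p-q and the horizontal axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def four_point_features(p1, p2, p3, p4, cg):
    """A subset of the Table 2 features, computed from the four located
    points and the center of gravity cg of the constructed shape."""
    pts = {'1': p1, '2': p2, '3': p3, '4': p4}
    f = {}
    for a, b in (('1', '2'), ('1', '3'), ('1', '4'),
                 ('2', '4'), ('2', '3'), ('3', '4')):
        f['Line%s-%s' % (a, b)] = dist(pts[a], pts[b])
    f['cx'], f['cy'] = cg
    for k, p in pts.items():
        f['c' + k] = dist(cg, p)  # distances from the center of gravity
    # Perimeter of the shape (feature 18 in Table 2)
    f['Pr'] = f['Line1-3'] + f['Line2-3'] + f['Line2-4'] + f['Line1-4']
    return f
```

The angle features θ1-θ6 would be obtained with `angle_deg` applied to the corresponding point pairs, and the area feature with the shoelace formula of Eq. (3).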
“representative” centroids of the data and then use the best representative from each group as a member of the training set. The number of images belonging to a group reflects the number of samples covered by the group centroid. From each group one image is selected as a representative and the rest are used for testing, as shown in Fig. 9. Clustering ensures that the various data samples are represented in the training set. Since all clustering algorithms are sensitive to the range of elements in the input vectors, all features are normalized within the unit hypercube. Subtractive clustering, proposed by Chiu [20], is used to create the initial clusters. In this method, all images are considered candidates for cluster centers. Consider a collection of n images, each represented by a vector X = {x_0, x_1, ..., x_m} of dimension m, where m is the number of features. Since each vector is a candidate for
Fig. 7. Manual intervention is needed to detect point P1.
Fig. 8. The features made from the extracted four points.
cluster centers, a density measure at vector X_i is defined as

D_i = \sum_{j=0}^{n-1} \exp\left(-\frac{\|X_i - X_j\|^2}{(r_a/2)^2}\right),   (4)

where r_a is a positive constant defining the radius of the neighborhood; the contribution of data points outside this radius to the density measure is small. The radius r_a is estimated as r_a = 2 min(d_i), where

d_i = \frac{1}{n} \sum_{j=1}^{n} \|X_i - X_j\|

is the average of the distances from vector i to all other vectors.
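The density measure of Eq. (4), together with the iterative center selection and the density revision of Eq. (5) described in the following paragraphs, can be sketched as below. This is a simplified reading of subtractive clustering: the function names and the fixed cluster count are assumptions for illustration.

```python
import numpy as np

def density_measures(X, ra):
    """Eq. (4): D_i = sum_j exp(-||X_i - X_j||^2 / (ra/2)^2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (ra / 2.0) ** 2).sum(axis=1)

def revise_densities(D, X, s, rb):
    """Eq. (5): suppress the density of points near the chosen center X[s]."""
    d2 = np.sum((X - X[s]) ** 2, axis=-1)
    return D - D[s] * np.exp(-d2 / (rb / 2.0) ** 2)

def subtractive_centers(X, ra, n_clusters):
    """Pick n_clusters initial center indices by repeated density revision."""
    D = density_measures(X, ra)
    rb = 1.5 * ra  # per the text, rb ~ 1.5 ra keeps centers apart
    centers = []
    for _ in range(n_clusters):
        s = int(np.argmax(D))   # highest remaining density
        centers.append(s)
        D = revise_densities(D, X, s, rb)
    return centers
```

In the paper the process stops when a sufficient number of clusters is obtained; here a fixed count is passed in for simplicity, and the resulting centers would seed the k-means refinement.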
After the density measure has been calculated for each image, the image with the highest density measure is selected as the center of the first cluster. Let x_{s1} be this image, with density D_{s1}. Next, the density measure of every image is revised by the formula

D_i' = D_i - D_{s1} \exp\left(-\frac{\|x_i - x_{s1}\|^2}{(r_b/2)^2}\right),   (5)

where r_b is a positive number selected to be about 1.5 r_a to prevent closely spaced centers [21]. After revising the density using Eq. (5), images near the first center x_{s1} have a reduced density measure, making them unlikely to be selected as the next cluster center. This process is repeated until a sufficient number of clusters is obtained. After the initial clusters have been decided, we apply the k-means algorithm [21,22] to find the cluster centers. When the Euclidean distance is used as the measure of dissimilarity between an image X_l and the groups G_i, i = 1, ..., c, with cluster centers C_i, the best group centers are obtained by minimizing the cost function

J = \sum_{i=0}^{c-1} J_i = \sum_{i=0}^{c-1} \sum_{l, X_l \in G_i} d(X_l - C_i),   (6)

where c is the number of clusters and d is the Euclidean distance between vector X_l and C_i, the center of the ith group. The value J_i depends on the geometrical properties of G_i and the location of C_i. X_i belongs to group i if the distance between X_i and C_i is smaller than the distance between X_i and any other center. C_i is updated by the mean of all vectors in the group as follows:

C_i = \frac{1}{|G_i|} \sum_{l, X_l \in G_i} X_l,   (7)

where |G_i| is the size of the ith group. This is repeated until the improvement of Eq. (6) over the previous iteration falls below ε; in this research we used ε = 0.0001. After all images have been assigned to clusters, the image with minimum distance to the center of its cluster is chosen as the representative to be used for training. Since an image can belong to only one cluster, no image is chosen more than once.

Fig. 9. The closest image to the center of a cluster is chosen as a group representative.

The number of features used for clustering is high, so it is desirable to reduce the number of features while preserving the properties of the cluster centers. Reducing the number of features is very important in reducing the number of fuzzy rules: the number of fuzzy rules required equals the number of sets raised to the power of the number of inputs, so designing a system with 2^7 rules is more desirable than designing the same system with 2^21 rules if the two systems perform the same task. For this purpose we use the Fisher discriminant, or Fisher distance [19], for each feature set to select the most effective features. The Fisher distance is a measure of cluster separability. The mean and standard deviation of the feature value in each cluster are used to compute the Fisher distance

F_l = \sum_{i=0}^{c-1} \sum_{j=i+1}^{c-1} \frac{(\mu_{l,i} - \mu_{l,j})^2}{\sigma_{l,i}^2 + \sigma_{l,j}^2},   (8)
where l is the feature number, μ_{l,i} and μ_{l,j} are the means of feature l in clusters i and j, and σ_{l,i} and σ_{l,j} are the standard deviations of feature l in clusters i and j. As can be seen from Eq. (8), the Fisher distance increases as the difference between the means of the feature values for two clusters increases, i.e. as the separability between the two clusters increases. Similarly, as the standard deviation of the feature in each cluster decreases, the Fisher distance increases. The features that generate the largest Fisher distances are considered effective. From the 21 features used for clustering, only seven are chosen. The clustering is repeated, and each time one of the less effective features is removed. If the clustering results remain unchanged, another feature is removed. This process is repeated until removing one of the less effective features changes the clustering results. It is found that the clustering result obtained using the 21 features listed in Table 2 can be obtained with the seven features listed in Table 3.

2.3. Neuro-fuzzy network

Neural networks and fuzzy systems are universal function approximators. Neural networks are easy to train and have the ability to learn, but their internal representation is difficult to understand because the acquired knowledge
Table 3. The most effective features

No. Name
1. cx
2. cy
3. c1
4. c2
5. c3
6. c4
7. θ6
is implicitly expressed. Conversely, a fuzzy system provides a natural interpretation of the problem by means of if-then fuzzy rules. Combining the properties of neural networks and fuzzy systems brings together the learning abilities of the former and the representation of the latter in what is called a neuro-fuzzy network (NFN). Neuro-fuzzy networks offer rapid computation due to their intrinsic parallel-processing nature. At the same time, the parameters of a neuro-fuzzy network have clear physical meanings, and it is easier to design the input-output data pairs for the training stage. They also provide improved knowledge representation and uncertainty reasoning, with natural-language processing capabilities to model complex, non-linear problems. Moreover, neuro-fuzzy systems provide an effective framework to incorporate human linguistic descriptions into the problem at hand [25,26]. The non-linear mapping from the input space [x_0, ..., x_N] to the output space y_L is realized by a Sugeno fuzzy model [27,28] using a set of fuzzy if-then inference rules with a generalized bell membership function in the antecedent part and a singleton in the consequent part. The fuzzy inference rules are described by

Rule j: IF x_1 is A_{1j} AND x_2 is A_{2j}, ..., AND x_N is A_{Nj}, THEN y_1 is y_j^1, ..., y_p is y_j^p   (j = 1, ..., M),   (9)
where X = {x_i, i = 1, ..., N} are the inputs to the NFN, A_{ij} (i = 1, ..., N) are fuzzy subsets, and y_j^1, ..., y_j^p are real numbers in [0,1] representing the outputs of rule j on pattern p. Since the input to the NFN is fuzzy, the features must be fuzzified first. Here we employ complementary generalized bell membership functions for fuzzification, as shown in Fig. 10. The membership functions are made large enough to cover the whole range of the input features; max and min represent the maximum and minimum feature values in the training sets, respectively. The values of a, c and the slopes of the curves are determined by the training algorithm. Here we use the neuro-fuzzy network shown in Fig. 11, similar to that used by Kothari [24]. The network is of three
Fig. 10. Complementary membership functions.
Fig. 11. Architecture of the neuro-fuzzy network.
layers, feed-forward, with n inputs X = {x_1, ..., x_n} and a single output y_L, L = 1, 2, representing the horizontal and vertical coordinates of the landmark. This is not a limitation, since a fuzzy algorithm with n inputs and m outputs is equivalent to m fuzzy algorithms with n inputs and one output. The first layer is the fuzzification layer; it calculates the degrees of membership of an input vector x. This layer is composed of groups of n neurons. The ith neuron in the jth group computes the membership value of the ith input variable x_i in the fuzzy subset A_{ij} for the jth rule. This is achieved by the generalized bell membership function:

μ_{ij}(x_i) = 1 / (1 + |(x_i − c_{ij})/a_{ij}|^{2b_{ij}}),   (10)
where c_{ij} is the center, b_{ij} is the slope, and a_{ij} is the width of the membership function. Every input feature activates only two neighboring membership functions, and the sum of their membership degrees is always one.
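The generalized bell function of Eq. (10) and the complementary pairing of Fig. 10 can be sketched as follows. The parameter values here are illustrative only, not taken from the paper:

```python
import numpy as np

def bell_mf(x, a, b, c):
    """Generalized bell membership function of Eq. (10):
    mu(x) = 1 / (1 + |(x - c) / a|^(2b)),
    with width a, slope b and center c."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

# Illustrative fuzzification of one feature value: a bell function and
# its complement, so the two membership degrees always sum to one.
x = 4.0
mu = bell_mf(x, a=5.0, b=2.0, c=0.0)
mu_comp = 1.0 - mu
```

Pairing each bell function with its complement is the simplest way to guarantee the degrees sum to one, as stated in the text.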
618
I. El-Feghi et al. / Pattern Recognition 37 (2004) 609 – 621
The second layer performs the inference of the rules by means of a product operator. The number of nodes in this layer is equal to the number of rules. The activation strength of the pth rule is formed as

R^{(p)} = μ_{1p}(x_1) · μ_{2p}(x_2) ··· μ_{Np}(x_N),   (11)
which corresponds to the activation strength of the pth rule for the input vector X = {x_1, ..., x_n}. The third layer performs the defuzzification phase by means of the weighted-sum method. The output of the single unit in this layer is

y_L(x) = Σ_{p=1}^{M} w_{Lp} R^{(p)} / Σ_{p=1}^{M} R^{(p)},   (12)

where w_{Lp} is the connection weight between the pth unit in the second layer and the Lth output unit in the third layer.

2.4. Training

The slopes and widths of the generalized bell membership functions and the connection weights are adaptive parameters determined through an iterative supervised learning process: the features extracted from the representative image of each group are fed to the network, and the adaptive parameters are adjusted so that the output corresponds to the location of the landmark [22,23]. Let us then define a cost function to be minimized over the training cycle as

E_L = (1/2) Σ_{p=1}^{N} (y_{dp} − y_{ot})^2,   (13)
where E is the error and L indexes the outputs of the network; in this application there are two outputs, one for the horizontal and one for the vertical landmark location. Here y_{dp} is the desired output of the network for prototype p, N is the number of training samples, and y_{ot} is the network output obtained in training cycle t. The gradient of the total error with respect to each adaptive parameter is computed, and the parameter is then updated in the direction opposite to the measured gradient:

w_j(t + 1) = w_j(t) − η_w dE_j/dw_j,   (14)

where the w_j are the connection weights, η_w is the learning rate of the weights, and t is the training-cycle number. The slope b is updated as

b_{ij}(t + 1) = b_{ij}(t) − η_b dE_L/db_{ij},   (15)

where, by the chain rule,

dE/db_{ij} = (∂E/∂y_o)(∂y_o/∂R)(∂R/∂μ_{ij}(x_i))(∂μ_{ij}(x_i)/∂b_{ij}),

and η_b is the gradient-descent rate for b_{ij}.
Similar expressions are obtained for c_{ij} and a_{ij}, for both the horizontal and vertical locations. With the proposed cost function a global minimum is always guaranteed, because the cost function is quadratic and hence has only one minimum. After the network is trained, it can predict the coordinates of the landmark location within a window no larger than 50 × 50 pixels.
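One training step of the network described by Eqs. (10)–(14) can be sketched as below. The network shape, the initial parameter values and the single training pair are all synthetic assumptions for illustration; only the weight update of Eq. (14) is shown, with its analytic gradient, while the membership parameters are held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
N_in, M_rules = 7, 4               # 7 selected features, 4 rules (assumed)

a = np.full((N_in, M_rules), 2.0)            # widths
b = np.full((N_in, M_rules), 2.0)            # slopes
c = rng.uniform(0, 10, (N_in, M_rules))      # centers
w = rng.uniform(0, 1, M_rules)               # weights for one output y_L

def forward(x):
    # Layer 1, Eq. (10): membership degrees of each input in each rule.
    mu = 1.0 / (1.0 + np.abs((x[:, None] - c) / a) ** (2 * b))
    # Layer 2, Eq. (11): rule activation strengths (product operator).
    R = mu.prod(axis=0)
    # Layer 3, Eq. (12): weighted-sum defuzzification.
    return (w * R).sum() / R.sum(), R

x, y_d = rng.uniform(0, 10, N_in), 0.3       # one synthetic training pair
y, R = forward(x)
# Eq. (14): gradient of E = 0.5*(y_d - y)^2 with respect to the weights.
grad_w = (y - y_d) * R / R.sum()
eta = 0.1                                    # learning rate (assumed)
w -= eta * grad_w
y_new, _ = forward(x)                        # error shrinks toward y_d
```

Because the output is linear in the weights, this step provably reduces the error for any learning rate below 2, which is the sense in which the quadratic cost has a single minimum in w.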
2.5. Template matching

Template matching refers to the comparison of a standard representative pictorial pattern (template) with an image for the purpose of finding occurrences of the reference pattern within the search image. The search is done by moving the template pixel by pixel over the search image, calculating the correlation between the template and the search area at each position, in order to locate the position with the highest correlation value. The difficulty with this method is its high computational cost, especially for large search windows. Many techniques have been developed to reduce the computation time, such as coarse-to-fine template matching, in which a low-resolution version of the search area is compared with a low-resolution template to find the location of a good match [24,25]. Because of the low-resolution image and template, this search requires less time but yields lower precision.

Since our search area is small, computation time does not constitute a major problem. The problem arises instead from differences in shape, size, translation and rotation of the landmarks: no two landmarks have exactly the same shape or location in two different images. By minimizing the search window for each landmark, the problem of translation or shift in the image is solved, since the landmark is guaranteed to fall inside that small window and the whole window can be searched. The problem of size differences between landmark shapes can be partially solved by dividing the complex shape of a landmark template into several smaller components; it is easier to locate one complex, large shape if it is broken down into several small components [13] (Table 4). One way to overcome shape rotation is to use multiple templates for each component of the landmark shape. However, the number of templates required to reflect the individual variations is very large, which would drastically increase the computation cost.
It is desirable, therefore, to combine a set of templates {t_0, t_1, ..., t_n} into one template and match it against the reference image. Modeling a template as a combination of several templates has been used successfully in Refs. [29,30]. A parametric template P can be constructed as a linear combination of all the templates:
P = Σ_{i=1}^{k} α_i t_i,   (17)
Fig. 12. Templates extracted from the same image with different rotation angles.
Table 4
Comparison of results with the work of Cardillo and Sid-Ahmed [13]

Landmark   Cardillo et al. [13] (%)   Proposed system (%)   Improvement (%)
N                  83                        91                     8
S                  53                        77                    24
Nose               94                       100                     6
A Point            77                        94                    17
IS                 76                       100                    24
AP1                89                       100                    11
Ii                 64                        79                    15
AP1                79                        98                    19
B                  71                        85                    14
Pog                97                       100                     3
Gn                100                       100                     0
Go                 61                        87                    26
Me                 78                        84                     6
Cd                 89                        91                     2
Or                 40                        74                    34
SoftPog            91                       100                     9
AnS                68                        92                    24
PNS                86                       100                    14
APOcc              48                        68                    20
PPOcc              71                        87                    16
Total              76                        90                    14
where t_i is a rotated version of a component of the landmark shape, k is the number of shapes to be represented in template P, and the α_i are constant values obtained by constrained optimization such that the following two conditions are satisfied:

(1) the normalized correlation between all the rotated templates and the unrotated template is maximized;
(2) the α_i satisfy the condition

Σ_{i=1}^{k} α_i = 1.   (18)
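Construction of a parametric template in the sense of Eqs. (17) and (18) can be sketched as below. The paper obtains the α_i by constrained optimization maximizing correlation with the unrotated template; here, as a simplifying assumption, uniform weights are used only as one feasible choice satisfying the sum-to-one constraint:

```python
import numpy as np

def parametric_template(templates):
    """Combine rotated component templates t_i into one parametric
    template P = sum(alpha_i * t_i) with sum(alpha_i) = 1 (Eq. (18)).
    Uniform alpha_i are a placeholder for the constrained optimum."""
    k = len(templates)
    alpha = np.full(k, 1.0 / k)
    P = sum(a_i * t_i for a_i, t_i in zip(alpha, templates))
    return P, alpha

# Five synthetic 20x20 "rotated" templates, as in the experiments.
rots = [np.random.default_rng(i).uniform(0, 1, (20, 20)) for i in range(5)]
P, alpha = parametric_template(rots)
```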
Fig. 12 shows the extraction of a template from the same image at different rotation angles. To avoid corner clipping, a larger template is first extracted and then rotated around its center; only the area marked by the white square is saved as the template.
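The rotate-then-crop extraction of Fig. 12 can be sketched with a simple nearest-neighbour resampling. Sampling rotated coordinates directly from the full image, rather than rotating a pre-cropped patch, is what avoids the corner clipping mentioned above; the image and parameters are illustrative:

```python
import numpy as np

def extract_rotated(image, center, size, angle_deg):
    """Extract a size x size template around `center`, sampled from the
    image rotated by angle_deg about that center (nearest neighbour)."""
    cy, cx = center
    half = size // 2
    th = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(th), np.sin(th)
    out = np.zeros((size, size), dtype=image.dtype)
    for r in range(size):
        for col in range(size):
            dy, dx = r - half, col - half
            # Rotate the sampling offset back into the original image.
            sy = cy + dy * cos_t - dx * sin_t
            sx = cx + dy * sin_t + dx * cos_t
            iy, ix = int(round(sy)), int(round(sx))
            if 0 <= iy < image.shape[0] and 0 <= ix < image.shape[1]:
                out[r, col] = image[iy, ix]
    return out

img = np.arange(100.0).reshape(10, 10)
t0 = extract_rotated(img, center=(5, 5), size=4, angle_deg=0.0)
```

At zero rotation the extraction reduces to a plain crop around the center, which gives a convenient sanity check.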
3. Experimental results

Applying the outlined clustering algorithm to 600 images, the k-means procedure converged to 35 clusters. From each cluster one image is chosen for training; the rest of the images are used for testing the algorithm. The training set thus consists of 35 images, one from each cluster, each being the image closest to the center of its cluster. This leaves 565 images for testing. The Fisher function is used to reduce the number of features; it is shown that with only seven features identical clustering can be obtained. The seven features of the prototype images are used to construct the fuzzy sets. Landmarks are located manually on the prototypes. For each landmark the network is trained and two sets of weights are obtained: one for estimating the x location and the other for the y location of the landmark.

Twenty landmarks are selected for this experiment. The selected landmarks are of different types and locations and should provide a reasonable test set for assessing the algorithm. Templates of size 20 × 20 pixels are extracted manually from better-than-average-quality X-ray images so as to enclose the shape of each landmark. Most of the landmarks can be represented by one or two template elements, except the point "Sella" (landmark labeled 2 in Fig. 1), which requires three elements, as shown in Fig. 13. A parametric template space is constructed for each element using five templates with rotation angles −5, −2.5, 0, 2.5 and 5 degrees, as shown in Fig. 12.

The system was tested on 565 images with resolutions ranging from 461 × 461 to 563 × 563 pixels. The location of each landmark is compared to that located by a human operator. If the difference between the location of a landmark obtained by the algorithm and that obtained by the human operator is less than ±6 pixels (±2 mm), we consider the location correct.
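The acceptance criterion above translates directly into a recognition-rate computation. The coordinates below are synthetic examples, not results from the paper:

```python
import numpy as np

def recognition_rate(auto_xy, manual_xy, tol=6):
    """Fraction of landmarks whose automatic location falls within
    +/- tol pixels (about 2 mm here) of the manual location on both
    axes -- the correctness criterion used in the experiments."""
    auto_xy, manual_xy = np.asarray(auto_xy), np.asarray(manual_xy)
    ok = np.all(np.abs(auto_xy - manual_xy) <= tol, axis=1)
    return ok.mean()

auto   = [(100, 210), (55, 63), (300, 299)]   # algorithm output (synthetic)
manual = [(104, 208), (70, 60), (298, 303)]   # human operator (synthetic)
rate = recognition_rate(auto, manual)          # 2 of 3 within tolerance
```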
An example of a successful localization of the landmark "Nose" is shown in Fig. 14. Test results are summarized in Table 2 and compared to those obtained by Cardillo and Sid-Ahmed [13].

4. Conclusion

In this paper, we have presented a system for the extraction of craniofacial landmarks from lateral skull X-rays. A method based on neuro-fuzzy networks is used to predict the expected locations of the landmarks on target images through a non-linear input/output mapping of a set of
Fig. 13. Extraction of three templates to represent the landmark “Sella”.
Fig. 14. Localization of landmark “Nose” inside the minimized search area.
input features using a set of fuzzy inference rules. A template-matching method is then used to pinpoint the exact locations of the landmarks. The method was tested on 565 images, and the results showed an improvement over previous work.

References

[1] T. Rakosi, An Atlas and Manual of Cephalometric Radiology, Wolfe Medical Publications, London, 1982.
[2] H.B. Broadbent, A new X-ray technique and its application to orthodontia, Angle Orthod. 1 (1931) 45–66.
[3] W.B. Downs, Variations in facial relationships: their significance in treatment and prognosis, Am. J. Orthod. 34 (1941) 812–840.
[4] A. Lévy-Mandel, A. Venetsanopoulos, J. Tsotsos, Knowledge-based landmarking of cephalograms, Comput. Biomed. Res. 19 (3) (1986) 282–309.
[5] L. Mero, Z. Vassy, A simplified and fast version of the Hueckel operator for finding optimal edges in pictures, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 1975, pp. 650–655.
[6] S. Parthasaraty, S. Nugent, P.G. Gregson, D.F. Fay, Automatic landmarking of cephalograms, Comput. Biomed. Res. 22 (1989) 248–269.
[7] C.K. Yan, A. Venetsanopoulos, E. Filleray, An expert system for landmarking of cephalograms, Proceedings of the Sixth International Workshop on Expert Systems and Applications, 1986, pp. 337–356.
[8] J.L. Contereras-Vidal, J. Garza-Garza, Knowledge-based system for image processing and interpolation of cephalograms, Proceedings of the Canadian Conference on Electrical and Computer Engineering, 1990, pp. 75.11–75.14.
[9] P.H. Jackson, G.C. Dickson, D.J. Birnie, Digital image processing of cephalometric radiographs: a preliminary report, Br. J. Orthod. 12 (1985) 122–132.
[10] A.M. Cohen, A.D. Linney, A low cost system for computer-based cephalometric analysis, Br. J. Orthod. 13 (1986) 105–108.
[11] A.M. Cohen, H.H.-S. Ip, A.D. Linney, A preliminary study of computer recognition and identification of skeletal landmarks as a new method of cephalometric analysis, Br. J. Orthod. 11 (1984) 143–154.
[12] D. Davis, Knowledge-based cephalometric analysis: a comparison with clinicians using interactive computer methods, Comput. Biomed. Res. 27 (1994) 10–28.
[13] J. Cardillo, M. Sid-Ahmed, An image processing system for the automatic extraction of craniofacial landmarks, IEEE Trans. Med. Imaging 13 (2) (1994) 275–289.
[14] V. Grau, M.C. Juan, C. Monserrat, C. Knoll, Automatic localization of cephalometric landmarks, J. Biomed. Inf. 34 (2001) 146–156.
[15] M. Desvignes, B. Romaniuk, R. Demoment, M. Revenu, M.J. Deshayes, Computer assisted landmarking of cephalometric radiographs, Proceedings of the Fourth IEEE Southwest Symposium on Image Analysis and Interpretation, 2000, pp. 296–300.
[16] T.J. Hutton, S. Cunningham, P. Hammond, An evaluation of active shape models for the automatic identification of cephalometric landmarks, Eur. J. Orthod. 22 (5) (2000) 499–508.
[17] E. Uchino, T. Yamakawa, High speed fuzzy learning machine with guarantee of global minimum and its application to chaotic system identification and medical image processing, Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 1995, pp. 242–249.
[18] A. Innes, V. Ciesielski, J. Mamutil, S. John, Landmark detection for cephalometric radiology images using pulse coupled neural networks, International Conference on Computing in Communications, June 2002, pp. 391–396.
[19] M. Mehrubeoglu, N. Kehtarnavaz, G. Marquez, L.V. Wang, Characterization of skin lesion texture in diffuse reflectance spectroscopic images, Proceedings of the Fourth IEEE Southwest Symposium on Image Analysis and Interpretation, 2000, pp. 146–150.
[20] S.L. Chiu, Fuzzy model identification based on cluster estimation, J. Intell. Comput. Fuzzy Systems 2 (3) (1994) 267–278.
[21] P.R. Krishna, L.N. Kanal, Classification, Pattern Recognition, and Reduction of Dimensionality, in: Handbook of Statistics, Vol. 2, North-Holland, Amsterdam, 1982.
[22] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (11) (1985) 1551–1588.
[23] F. Guély, P. Siarry, Gradient descent method for optimizing various fuzzy rule bases, Proceedings of the IEEE International Conference on Fuzzy Systems, 1993, pp. 1241–1246.
[24] R. Kothari, Robust regression based training of ANFIS, 18th International Conference of the North American Fuzzy Information Processing Society, pp. 605–609.
[25] J. Zhang, A.J. Morris, Recurrent neuro-fuzzy networks for nonlinear process modeling, IEEE Trans. Neural Networks 10 (2) (1999) 313–326.
[26] D. Abu-al-nadi, D. Popovic, Texture analysis using an adaptive neuro-fuzzy network and fractals, Proceedings of the IEEE International Conference on Fuzzy Systems 2 (1999) 1102–1107.
[27] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Systems Man Cybernet. 15 (1985) 116–132.
[28] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets and Systems 15 (1988) 15–33.
[29] S. Ullman, R. Basri, Recognition by linear combinations of models, IEEE Trans. Pattern Anal. Mach. Intell. 13 (10) (1991) 992–1006.
[30] K. Tanaka, M. Sano, S. Ohara, M. Okudaira, A parametric template method and its application to robust matching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, 2000, pp. 620–627.
About the Author—IDRIS EL-FEGHI was born in Misurata, Libya, in 1959. He received his B.Sc. degree in Electrical Engineering from the University of Portland, Portland, OR, USA, and his M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from the University of Windsor, Windsor, Ont., Canada, in 1983, 1999 and 2003, respectively. He is currently a postdoctoral fellow at the University of Windsor. His research interests include biomedical image analysis, neural networks, computer vision, and face detection and recognition.

About the Author—MAHER A. SID-AHMED was born in Alexandria, Egypt, in 1945. Obtained his B.Sc. in Electrical Engineering in 1968. Served two years in the Egyptian Navy. Completed his Ph.D. at the University of Windsor in 1973. Joined the utility coordination group of Alberta Government Telephones, Edmonton, Alberta, in 1973. Was hired by the University of Windsor in 1978. Carried out research work in DSP, image processing, robotic vision and related areas. Has published over 100 papers and holds four US patents in improved definition television. Published the book "Image Processing: Theory, Algorithms and Architectures," McGraw-Hill, 1995. Has graduated over 30 Masters and Ph.D. students. Presently working on a new edition of the Image Processing book.

About the Author—MAJID AHMADI (S'75–M'77–SM'84–F'02) received the B.Sc. degree in Electrical Engineering from Arya Mehr University, Tehran, Iran, in 1971 and the Ph.D. degree in Electrical Engineering from Imperial College, London University, London, UK, in 1977. He has been with the Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada, since 1980, currently as a Professor and Associate Director of the Research Center for Integrated Microsystems. His research interests include digital signal processing, machine vision, pattern recognition, neural network architectures and applications, and VLSI implementation.
He has coauthored the book Digital Filtering in One and Two Dimensions: Design and Applications (New York: Plenum, 1989) and has published over 300 articles in this area. He is an Associate Editor of the Journal of Pattern Recognition, the Journal of Circuits, Systems and Computers, and the International Journal of Computers and Electrical Engineering. Dr. Ahmadi was the IEEE-CAS representative on the Neural Network Council and the Chair of the IEEE Circuits and Systems Neural Systems Applications Technical Committee. He received an Honorable Mention award from the Editorial Board of the Journal of Pattern Recognition in 1992 and the Distinctive Contributed Paper award from the Multiple-Valued Logic Conference Technical Committee and the IEEE Computer Society in 2000. He is a Fellow of the Institution of Electrical Engineers.