Computers, Environment and Urban Systems 39 (2013) 48–62
An adaptive fuzzy-genetic algorithm approach for building detection using high-resolution satellite images
Emre Sumer a,⇑, Mustafa Turker b,1
a Baskent University, Faculty of Engineering, Department of Computer Engineering, 06810 Ankara, Turkey
b Hacettepe University, Faculty of Engineering, Department of Geomatics Engineering, 06800 Ankara, Turkey
Article info
Article history: Received 12 March 2012; received in revised form 22 January 2013; accepted 23 January 2013
Keywords: Building detection; Image processing; High resolution satellite imagery; Genetic algorithms; Fuzzy logic
Abstract
We propose a new approach for building detection using high-resolution satellite imagery based on an adaptive fuzzy-genetic algorithm. This novel approach improves object detection accuracy by reducing the premature convergence problem encountered when using genetic algorithms. We integrate the fundamental image processing operators with genetic algorithm concepts such as population, chromosome, gene, crossover and mutation. To initiate the approach, training samples are selected that represent the specified two feature classes, in this case “building” and “non-building”. The image processing operations are carried out on a chromosome-by-chromosome basis to reveal the attribute planes. These planes are then reduced to one hyperplane that is optimal for discriminating between the specified feature classes. For each chromosome, the fitness values are calculated through the analysis of detection and mis-detection rates. This analysis is followed by genetic algorithm operations such as selection, crossover and mutation. At the end of each generation cycle, the adaptive-fuzzy module determines the new (adjusted) probabilities of crossover and mutation. This evolutionary process repeats until a specified number of generations has been reached. To enhance the detected building patches, morphological image processing operations are applied. The approach was tested on ten different test scenes of the Batikent district of the city of Ankara, Turkey using 1 m resolution pan-sharpened IKONOS imagery. The kappa statistics computed for the proposed adaptive fuzzy-genetic algorithm approach were between 0.55 and 0.88. The extraction performance of the algorithm was better for urban and suburban buildings than for buildings in rural test scenes. © 2013 Elsevier Ltd. All rights reserved.
⇑ Corresponding author. Tel.: +90 312 246 66 66; fax: +90 312 246 66 60.
1 Tel.: +90 312 297 69 90; fax: +90 312 297 61 69.
E-mail addresses: [email protected] (E. Sumer), [email protected] (M. Turker).
0198-9715/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compenvurbsys.2013.01.004
1. Introduction
Building detection has long been one of the major research areas in urban remote sensing. At first glance, buildings may appear to be simple objects that can be easily identified and extracted. However, automatic building extraction from high-resolution images must address several difficulties caused by differences in viewpoint and by buildings of complex shape and size. Buildings are one of the fundamental GIS data components, and building detection has been shown to be extremely useful in urban planning, infrastructure development, the construction of telecommunication lines, pollution modeling, disaster planning and many other types of urban simulation. Several approaches based on high-resolution multi-spectral spaceborne imagery have been developed for the acquisition of 2-D building information. Depending on the application area, single and stereo uses of panchromatic, multi-spectral and pan-sharpened spaceborne imagery are commonly encountered. Fraser, Baltsavias, and Gruen (2001), Lee, Shan, and Bethel (2003), Shackelford and Davis (2003), Kim, Lee, and Kim (2006), Sirmacek and Unsalan (2009) and Koc San and Turker (2012) utilized IKONOS imagery for building extraction with different methodologies based on image classification, image segmentation, fuzzy pixel/object based approaches, line analysis and graph theoretical methods. Additionally, QuickBird imagery has been used to extract buildings in several studies by employing different region- and feature-based approaches such as clustering, edge detection and snake contours (Liu, Cui, & Yan, 2008; Mayunga, Coleman, & Zhang, 2007; Wei, Zhao, & Song, 2004). Inglada (2007) proposed an image processing system based on support vector machines for the detection and recognition of man-made objects from SPOT-5 imagery. Furthermore, the use of hybrid datasets such as integrated SAR–optical imagery and LIDAR–optical imagery has also been tested in several studies (Karantzalos & Paragios, 2010; Sohn & Dowman, 2007; Tupin & Roux, 2003). Moreover, the use of digital elevation models (DEMs) and digital surface models (DSMs) generated from different spaceborne sensors for building detection can be
observed in Ioannidis, Psaltis, and Potsiou (2009), Lafarge, Descombes, Zerubia, and Pierrot-Deseilligny (2010) and Tournaire, Bredif, Boldo, and Durupt (2010). These previous approaches have generally employed purely spectral input vectors built by the set of intensity values from each spectral channel for each pixel in the image. Although these vectors provide a suitable fixed-dimensionality space in which conventional classifiers often work well, it is evident that spatial relationships such as texture, proximity and shape can also be very informative in feature extraction. This type of additional information can be added to the spectral information. However, there exist a huge number of potential combinations of these additional vector dimensions. To address this problem, a hybrid evolutionary algorithm called GENIE (GENetic Image Exploitation) was developed by Perkins et al. (2000). The algorithm maintains a population of primitive image processing operators such as basic mathematical, logical and texture operators. For each individual (chromosome) in the population, the ability to find the feature of interest (e.g., the building) is tested by assigning a fitness value. The fitness of an individual is determined by the agreement between the extracted feature of interest and the user-provided reference pixels. After fitness determination, evolutionary operators such as selection, crossover and mutation are employed until some stopping criterion is satisfied. As a general tendency, the ‘‘less-fit’’ individuals are discarded and the ‘‘more-fit’’ ones are preserved to produce better operator chains. In a study conducted by Harvey et al. (2002), GENIE proved to perform better than the fundamental conventional supervised classifiers such as minimum distance, maximum likelihood, Mahalanobis distance, spectral angle mapping and binary encoding. In a further study conducted by Perkins et al. (2005), a system called GENIE Pro was developed. 
Similar to GENIE, GENIE Pro is a general purpose adaptive tool that derives automatic pixel classification algorithms for satellite and aerial imagery from training inputs. In particular, GENIE Pro integrated spectral information and spatial cues such as texture, local morphology and large-scale shape information in a much more sophisticated way. The performance of genetic algorithms (GAs) is quite sensitive to control parameters. For example, it is possible to destroy a well-performing chromosome when the crossover probability is high. On the other hand, a low crossover probability may prevent the algorithm from obtaining better individuals and does not guarantee faster convergence. A high mutation rate may cause too much diversity and take longer to reach the optimal solution, whereas low mutation tends to miss some near-optimal points. A tendency for all of the population to converge to a single suboptimal solution is also possible given a low mutation rate. If all of the members of the population are very similar, the crossover operator has little function and mutation turns out to be the primary operator (Herrera & Lozano, 2003). This negative effect triggers the problem of premature convergence, where the solving procedure is trapped in a suboptimal state and most of the operators are unable to generate offspring that surpass their parents any more. The use of fuzzy logic controllers to adapt GA parameters is one possible solution to overcome these impediments and improve the performance of the GA. In this study, we propose an adaptive fuzzy logic-based genetic approach to the detection of buildings from high-resolution satellite images. The approach is based on the combination of GAs and supervised image classification and therefore, it can be considered a hybrid feature extraction procedure. 
As the approach’s major novelty, an adaptive-fuzzy logic module is integrated with the conventional GA in an attempt to improve the performance of the GA and reduce the premature convergence problem by adjusting the algorithm’s parameters. Unlike the abovementioned previous studies, the present study solely utilizes satellite imagery; no auxiliary
data such as LIDAR or a DEM are employed to locate the buildings. The approach was implemented using a program written in the MATLAB programming environment. 2. Methodology A flowchart of the proposed methodology is given in Fig. 1. First, training and test regions are selected within the imagery. Next, predefined image processing operations are arbitrarily applied to the Blue (B), Green (G), Red (R) and Near Infrared (NIR) image bands to obtain spectral and textural attributes that are then reduced to a single binary image band (the temporary building regions) through Fisher’s linear discriminant analysis. Then, based on the temporary output and reference data, the fitness value of the candidate solution is computed by comparing the pixels that belong to the region of interest. This computation is followed by running GA operations such as selection, crossover and mutation. Selection retains the successful solutions (operator chains), whereas crossover and mutation are included to try to diversify the remaining candidate solutions for the next generations. In the next step, the parameters of the GA are updated by an adaptive fuzzy logic controller to improve the algorithm’s performance. The newly adjusted parameters are then used in the next generation. This evolutionary process is repeated until a predefined number of generations is reached. As the final step, we apply post-processing operations. Post-processing is essential to remove various false alarm areas and image distortions that are likely to appear. The operations we use in post-processing include morphological image processing functions such as opening, artifact removal, closing and hole-filling. 2.1. Image-based genetic algorithm fundamentals GAs are a relatively popular paradigm that mimics the principles of genetics and natural selection. A GA is a search heuristic that is used to generate solutions for optimization and search problems (Haupt & Haupt, 2004). 
In recent years, GAs have become a popular optimization technique in the field of image processing. Applications of image-based GAs extend from image enhancement filters (Chang-Shing, Shu-Mei, & Chin-Yuan, 2005) and edge detection (Li, Bai, & Zhang, 2007) to image classification (Yang, 2007) and segmentation (Maulik, 2009). For instance, GAs can be used to construct new image enhancement filters or to optimize the parameters of existing filters. Different image-based GA studies have addressed different problems. Every approach is unique, with different chromosome and gene encoding schemes as well as selection, crossover and mutation strategies, which are the key to the success of the optimization (Paulinas & Usinskas, 2007). The structure of the image-based GA used in our study is shown in Fig. 2. In this model, the population of the GA is generated from a predefined number of chromosomes, each of which can be seen as a candidate solution for extracting the building regions. The structure of a chromosome consists of a predefined number of image processing operations (genes). The genes are well-known operators such as the basic mathematical, logical, thresholding, texture, spectral and filtering operators. 2.2. Spectral and textural attribute extraction The first processing step is the selection of training and test areas from the imagery. In this study, we aim to discriminate buildings from all other background objects. Therefore, the feature classes ‘‘building’’ and ‘‘non-building’’ are specified, and the training samples are selected for those two classes. We also collect test samples with which to compute the fitness value of the extracted
Fig. 1. Flowchart of the proposed adaptive fuzzy-genetic algorithm approach for building detection.
Fig. 2. The structure of a population that is composed of M chromosomes and N genes in each chromosome.
building regions. Next, the initialization of the chromosomes with the image processing operators (genes) is carried out. These operators are randomly selected from a gene pool. The complete list of image processing operators (Gonzalez, Woods, & Eddins, 2009, chaps. 2–6 & 9–10) included in the gene pool is given in Table 1. In the gene pool, the first category (Table 1) consists of basic mathematical operators. Operator 1 simply adds two bands of
imagery. Operator 2 adds a positive or negative scalar parameter to a band. Operator 3 subtracts two bands from each other, whereas Operator 4 subtracts a positive or a negative scalar parameter from a band. Operator 5 is similar to Operator 3 but divides the result by the sum of its two inputs. Operator 6 multiplies the pixel values of two bands. Operator 7 scales the input band by a positive scalar. Operator 8 divides two bands pixel by pixel where the divisor and the divided bands are selected arbitrarily. Operator 9 is similar to Operator 7 but multiplies the input band by the reciprocal of the scalar. Operators 10, 11 and 12 respectively apply the negation, square root and square operators to a single input band. Operator 13 is similar to Operator 7 but adds an extra parameter to the scaled input. Operator 14 outputs a linear combination of two inputs specified by a parameter that takes a value between 0 and 1. The second category comprises the fundamental logical operators. Operators 15 and 16 perform the minimum and maximum operations, respectively, pixel-by-pixel. Operator 17 outputs its third input whenever the first input is less than the second input and outputs its fourth input otherwise. The third category in the gene pool comprises several basic thresholding operators. In this category, Operator 18 truncates any pixel values above a value set by its parameter. Operator 19 does the reverse operation of Operator 18. With the use of Operator 20, the values below its parameter are set to 0 (black), whereas the values above the parameter are set to 1 (white). The operators in the texture category apply Laws’ texture energy measures (Laws, 1980) to the input bands. The fundamental
Table 1. The primitive image processing operators (the gene pool).
Category            Operator Id  Operator description     # of input bands  # of parameters
Basic mathematical  1            Add bands                2                 0
Basic mathematical  2            Add scalar               1                 1
Basic mathematical  3            Subtract bands           2                 0
Basic mathematical  4            Subtract scalar          1                 1
Basic mathematical  5            Normalized difference    2                 0
Basic mathematical  6            Multiply bands           2                 0
Basic mathematical  7            Multiply by scalar       1                 1
Basic mathematical  8            Divide bands             2                 0
Basic mathematical  9            Divide by scalar         1                 1
Basic mathematical  10           Negate band              1                 0
Basic mathematical  11           Square root              1                 0
Basic mathematical  12           Square                   1                 0
Basic mathematical  13           Linear scale             1                 2
Basic mathematical  14           Linear combination       2                 1
Logical             15           Minimum                  2                 0
Logical             16           Maximum                  2                 0
Logical             17           If less than else        4                 0
Thresholding        18           Clip high                1                 1
Thresholding        19           Clip low                 1                 1
Thresholding        20           Threshold                1                 1
Texture             21           R5R5                     1                 0
Texture             22           LAWB                     1                 0
Texture             23           LAWD                     1                 0
Texture             24           LAWF                     1                 0
Texture             25           LAWH                     1                 0
Spectral            26           Distance similarity      3                 0
Spectral            27           Correlation similarity   3                 0
Spectral            28           Similarity value         3                 0
Filtering           29           Average                  1                 0
Filtering           30           Sobel                    1                 1
Filtering           31           Prewitt                  1                 1
Filtering           32           Gaussian                 1                 1
Filtering           33           Laplacian                1                 1
Filtering           34           Laplacian of Gaussian    1                 1
Filtering           35           Unsharp                  1                 1
L3, E3 and S3 and the derived vectors L5, E5, S5, W5 and R5 are composed of 1-D convolution kernels: L3 = [1 2 1], E3 = [−1 0 1], S3 = [−1 2 −1], L5 = [1 4 6 4 1], E5 = [−1 −2 0 2 1], S5 = [−1 0 2 0 −1], W5 = [−1 2 0 −2 1] and R5 = [1 −4 6 −4 1]. For these kernels, the mnemonics stand for (L)evel, (E)dge, (S)pot, (W)ave and (R)ipple (Laws, 1980). In this study, R5R5, LAWB, LAWD, LAWF and LAWH (Operators 21–25) are generated from the above set of 1-D kernels, in which R5R5 corresponds to R5^T R5 and LAWB, LAWD, LAWF and LAWH correspond to S3^T L3, E3^T E3, L3^T S3 and S3^T S3, respectively. In the next category, which is composed of spectral operators (Operators 26–28), the spectral similarity within the input bands is provided by the distance and correlation similarities along with the similarity value. The last category (Operators 29–35) consists of various filtering operators with a default kernel size of 3 × 3. Operator 29 performs average filtering. Operators 30 and 31 emphasize edges, where an additional binary parameter (0 or 1) is used to indicate the gradient direction such that 0 indicates vertical and 1 refers to horizontal. Operator 32 performs rotationally symmetric Gaussian low-pass filtering with a standard deviation (sigma) value between 0 and 1. Operator 33 is a Laplacian filter that approximates the shape of the two-dimensional Laplacian operator. The parameter “alpha” controls the shape of the Laplacian and ranges from 0 to 1. Operator 34 is the Laplacian of Gaussian filter with the same standard deviation parameter as in Operator 32. Finally, Operator 35 performs unsharp contrast enhancement filtering from the negative of the Laplacian filter with an “alpha” parameter that controls the shape of the Laplacian and must be between 0 and 1.
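As an illustration of how the 2-D texture masks of Operators 21–25 can be formed from the 1-D kernels, the following minimal Python sketch builds each mask as an outer product (column kernel times row kernel), using the signed coefficients usually given for Laws' kernels. The helper names `outer` and `convolve_valid` are hypothetical and are not taken from the authors' MATLAB program; the sketch only applies the mask and does not compute the subsequent texture energy statistics.

```python
# Laws' 1-D kernels (signed coefficients as usually given for Laws, 1980).
L3 = [1, 2, 1]
E3 = [-1, 0, 1]
S3 = [-1, 2, -1]
R5 = [1, -4, 6, -4, 1]


def outer(col, row):
    """Return the 2-D mask col^T * row as a list of rows (hypothetical helper)."""
    return [[c * r for r in row] for c in col]


def convolve_valid(image, mask):
    """Slide the mask over the image without flipping it (correlation),
    keeping only positions where the mask fits entirely inside the image."""
    mh, mw = len(mask), len(mask[0])
    out = []
    for i in range(len(image) - mh + 1):
        row = []
        for j in range(len(image[0]) - mw + 1):
            row.append(sum(mask[a][b] * image[i + a][j + b]
                           for a in range(mh) for b in range(mw)))
        out.append(row)
    return out


# Masks named as in Table 1: first kernel is the column, second the row.
masks = {
    "R5R5": outer(R5, R5),
    "LAWB": outer(S3, L3),
    "LAWD": outer(E3, E3),
    "LAWF": outer(L3, S3),
    "LAWH": outer(S3, S3),
}

# A constant band has no texture, so a zero-sum mask responds with zeros.
flat_band = [[5] * 4 for _ in range(4)]
flat_response = convolve_valid(flat_band, masks["LAWF"])
```

Because every mask except those built only from level kernels sums to zero, a uniform region produces no response, which is why these masks highlight local texture rather than brightness.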
All chromosomes in the population have the same fixed number of genes. An example of a chromosome with five genes could be [3 10 20 10 24], where the numbers denote the operators. For this chromosome, the image processing operators ‘‘Subtraction (Operator 3)’’, ‘‘Negation (Operator 10)’’, ‘‘Thresholding (Operator 20)’’, ‘‘Negation (Operator 10)’’ and ‘‘Texture (LAWF) (Operator 24)’’ are randomly applied to selected input and output bands. The input image bands include B (blue), G (green), R (red) and NIR (near-infrared), and the output bands are the empty temporary bands. A temporary output band can also be used as an input band after it is initialized by an operator. Throughout this study, we used four predefined temporary bands: ‘‘temp1’’, ‘‘temp2’’, ‘‘temp3’’ and ‘‘temp4’’. Considering the above hypothetical chromosome, an example fictitious scenario works as follows: Let us assume that the algorithm selects two input bands (R and G) and one output band (temp3) for Operator 3. The result of the subtraction (R–G) is written to ‘‘temp3’’. From now on, band ‘‘temp3’’ can also be used as an input band. For the next gene (Operator 10), single input and output bands are selected. For example, the bands ‘‘NIR’’ and ‘‘temp1’’ might be selected as the input and output, respectively. The result of negating the ‘‘NIR’’ band is written to ‘‘temp1’’. Next, a single input band (temp3) and an output band (temp2) are selected for Operator 20 together with a scalar parameter between 0 and 255 for an 8-bit image. In our example, the scalar parameter is defined as 135. Therefore, those pixel values of ‘‘temp3’’ above 135 are set to 255, whereas values below the parameter are set to 0. The result of this operation is written to band ‘‘temp2’’. After that, Operator
10 is used to negate an input band (B) and the resultant image is written to an output band (temp4). In this case, “temp4” becomes the automatic output band because it is the only remaining empty band. Finally, an input band (temp1) is selected for Operator 24, the LAWF texture mask is applied to this band and the output is written to the selected output band (temp4).
2.3. Dimension reduction using Fisher's linear discriminant analysis
In the next step, the temporary output bands are reduced to a single band that represents the temporary building regions. The reduction process is conducted by means of Fisher's linear discriminant, which is a conventional classification algorithm. The method provides a linear combination of the temporary output bands that maximizes the mean separation between true pixels (building) and false pixels (non-building), normalized by the total variance in the projection defined by the linear combination. The result of the discriminant-finding phase is a gray-scale image, which is then reduced to a binary image using a threshold value that maximizes “fitness”. Fig. 3 illustrates the dimension reduction procedure (Duda, Hart, & Stork, 2001). In a projection onto one direction, w (two-class problem), the samples are d-dimensional vectors x1, ..., xn, which consist of two subsets, D1 and D2. The projected samples y, which fall into two corresponding subsets Y1 and Y2, are computed using Eq. (1) (Duda et al., 2001, chap. 3):
$$y = w^{t} x \tag{1}$$
The criterion is to maximize Fisher's linear discriminant J(w):
$$J(w) = \frac{w^{t} S_B w}{w^{t} S_w w} \tag{2}$$
where $S_B = (m_1 - m_2)(m_1 - m_2)^{t}$ is the between-scatter matrix ($m_i$ = mean of $x \in D_i$) and $S_w = S_1 + S_2$ is the within-scatter matrix, where
$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^{t} \tag{3}$$
The optimal line direction w can be computed as follows:
$$w = S_w^{-1}(m_1 - m_2) \tag{4}$$
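The computation in Eqs. (1)–(4) can be sketched for the simplest case of two temporary bands (d = 2). The Python below is only an illustrative sketch under that assumption: the function names and the toy sample values are hypothetical, and a real implementation would handle d bands with a general matrix inverse.

```python
def mean(vectors):
    """Component-wise mean m_i of a list of 2-D samples."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, m):
    """2x2 scatter matrix S_i = sum over x of (x - m)(x - m)^t, as in Eq. (3)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(d1, d2):
    """Return w = S_w^{-1} (m1 - m2), as in Eq. (4), for 2-D samples."""
    m1, m2 = mean(d1), mean(d2)
    s1, s2 = scatter(d1, m1), scatter(d2, m2)
    sw = [[s1[a][b] + s2[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero here
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, x):
    """Projection y = w^t x, as in Eq. (1)."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical training pixels, each described by two temporary band values.
building = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5)]
non_building = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5)]

w = fisher_direction(building, non_building)
y_building = [project(w, x) for x in building]
y_non_building = [project(w, x) for x in non_building]
```

For this toy data the projected building values all lie above the projected non-building values, so a single threshold on y separates the two classes, which is the role of the binary thresholding step described above.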
Fig. 3. The optimum direction w in discriminating the points that belong to two different classes (red and black). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
2.4. Fitness computation After extracting the building regions in binary form, the fitness value (FT) of the candidate solution (chromosome) is computed. For the building and non-building regions, the fitness value of a chromosome can be defined by the degree of agreement between the binary output and the test pixels. For each chromosome, FT is calculated using the following equation:
$$FT = 500\,(D + (1 - MD)) \tag{5}$$
where D, the detection rate, is the fraction of test pixels marked as ‘‘building’’ that the classifier marks as ‘‘building’’ plus the fraction of test pixels marked as ‘‘non-building’’ that the classifier marks as ‘‘non-building’’. MD, the mis-detection rate, is the fraction of test pixels marked as ‘‘building’’ that the classifier marks as ‘‘non-building’’ plus the fraction of test pixels marked as ‘‘non-building’’ that the classifier marks as ‘‘building’’. For instance, if D = 1, then MD becomes 0 and FT is computed as 1000, which is the best case. In the worst case, FT becomes 0, for which D = 0 and MD = 1. Note that a fitness score of 500 can be achieved with a classifier that identifies all pixels as ‘‘building’’ or ‘‘non-building’’. 2.5. Selection, crossover and mutation operations After obtaining the fitness values for all chromosomes in the population, the chromosomes are ranked according to their values and only the highest-ranking chromosomes are selected, discarding the rest. The selection rate XR is the fraction of the total population NPOP that survives for the next generation. The number of chromosomes to be kept NKEPT is computed as follows:
$$N_{KEPT} = N_{POP}\, X_R \tag{6}$$
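A minimal Python sketch of the fitness score of Eq. (5) and the rank-based retention of Eq. (6) is given below. It takes the rates D and MD as already computed inputs. Rounding a fractional product up with `math.ceil` is an assumption made here for illustration, since the text does not say how a non-integer value is handled, and the function names are hypothetical.

```python
import math


def fitness(d, md):
    """Fitness FT = 500 * (D + (1 - MD)), as in Eq. (5)."""
    return 500.0 * (d + (1.0 - md))


def select_survivors(population, scores, selection_rate):
    """Keep the top N_KEPT = N_POP * X_R chromosomes by fitness (Eq. (6)).

    Rounding up is an assumption of this sketch, not stated in the paper.
    """
    n_kept = math.ceil(len(population) * selection_rate)
    ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
    return [chrom for _, chrom in ranked[:n_kept]]


# Hypothetical population of four chromosomes with (D, MD) pairs.
population = [[3, 10, 20, 10, 24], [1, 15, 29, 7, 33],
              [5, 18, 21, 2, 30], [8, 12, 25, 16, 35]]
rates = [(0.9, 0.1), (0.5, 0.5), (1.0, 0.0), (0.0, 1.0)]
scores = [fitness(d, md) for d, md in rates]
survivors = select_survivors(population, scores, selection_rate=0.5)
```

With a selection rate of 0.5, the two highest-scoring chromosomes survive and the other two slots are left free for offspring.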
For the chromosome population in a generation NPOP, only the top NKEPT are kept for mating and the rest (NPOP–NKEPT) are discarded to make room for new chromosomes. Next, two chromosomes are selected from among the NKEPT chromosomes to produce two new offspring. To select the chromosomes, a random pairing technique, which utilizes a uniform random number generator, is used. To preserve the success rate for the next generations, the chromosome with the highest fitness value, the ‘‘elite chromosome’’, is excluded from this process as suggested by De Jong (1975). After selecting the parent chromosomes, the mating procedure is carried out. Mating can be defined as the creation of one or more offspring chromosomes from the selected parents. The most common forms of mating include the production of two offspring by two parents (crossover) and the creation of a single offspring from one parent (mutation). In general, mutation takes place after a crossover is performed. These operations are aimed at creating a better population in the next generation by producing altered offspring versions of ‘‘fit’’ parent chromosomes. The probabilities of the parent chromosomes being involved in the crossover and mutation operations are set to Pc and Pm, respectively. In this study, we use the ‘‘single point’’ crossover operation, in which a crossover point is randomly selected between the first and the last genes of the parents’ chromosomes (Holland, 1992). In Fig. 4, Parent Chromosome-1 first copies those of its genes that are to the left of the crossover point to Offspring Chromosome-1. Similarly, Parent Chromosome-2 copies those of its genes that are to the left of the same crossover point to Offspring Chromosome-2. Then, the genes to the right of the crossover point from Parent Chromosome-1 are moved to Offspring Chromosome-2 and the same genes from Parent Chromosome-2 (genes 1–3) are passed to Offspring Chromosome-1 in the same manner. 
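The single-point crossover described above can be sketched in a few lines of Python. Chromosomes are represented as lists of operator ids; the function names and the example parents are hypothetical, and restricting the crossover point so that both sides are non-empty is an assumption of this sketch.

```python
import random


def single_point_crossover(parent1, parent2, point):
    """Swap the genes to the right of `point` between two parents."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def random_crossover(parent1, parent2, rng=random):
    """Pick the crossover point uniformly so that each side keeps at least one gene."""
    point = rng.randint(1, len(parent1) - 1)
    return single_point_crossover(parent1, parent2, point)


# Hypothetical parents with five genes each.
p1 = [3, 10, 20, 10, 24]
p2 = [1, 15, 29, 7, 33]
c1, c2 = single_point_crossover(p1, p2, point=2)
```

Each offspring keeps the left part of one parent and receives the right part of the other, so at every gene position the two offspring together still carry exactly the two parental genes.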
Fig. 4. An example for the crossover operation and the generated offspring chromosomes.
Mutation is the second way of diversifying a population. As with crossover, we employ a single point mutation procedure (Holland, 1992). The gene to be mutated is randomly selected from the offspring chromosome produced by the crossover operation and exchanged for a new gene. This gene is selected arbitrarily from the gene pool, which comprises the set of image operators (Fig. 5). At the end of the crossover and mutation operations, the parents are expected to produce a total of NPOP–NKEPT offspring to keep the chromosome population at NPOP. To do that, the selection and mating procedures are repeated until the required number of offspring is produced.
2.6. Adaptive fuzzy logic controller operation
Before proceeding to the next generation of the genetic image component, an adaptive fuzzy logic controller operation proposed by Herrera and Lozano (2003) and Liu, Xu, and Abraham (2005) is performed. In this study, the measures for the performance of the GA, such as the average and maximum fitness values and the control parameters (crossover and mutation probabilities), are fed into the adaptive fuzzy logic controller. Then, the controller returns the adjusted parameters for use in the next generation of the GA cycle. The idea behind the adaptive fuzzy logic controller is that the crossover and mutation probabilities (Pc and Pm) should increase if they consistently produce better offspring. However, Pc should decrease and Pm should increase when fave(k) (average fitness of the kth generation) approaches fmax(k) (maximum fitness of the kth generation) or fave(k − 1) approaches fave(k). This scheme is based on encouraging the well-performing genes to produce more offspring and reducing the chance for poorly performing genes to destroy the potential chromosomes during the crossover and mutation processes. Two parameters (e1 and e2) were introduced (Eqs. (7) and (8)) to define the fuzzy rules for crossover and mutation operations.
With the use of these parameters, the fuzzy rules are identified to describe the relationship between the inputs (e1 and e2) and the output, which is the step size of the crossover or mutation probabilities (Table 2).
$$e_1 = \frac{f_{max}(k) - f_{ave}(k)}{f_{max}(k)} \tag{7}$$
$$e_2 = \frac{f_{ave}(k) - f_{ave}(k-1)}{f_{max}(k)} \tag{8}$$
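As a small worked illustration of Eqs. (7) and (8), the Python sketch below computes the two controller inputs from the fitness statistics of two consecutive generations. The function name and the numerical values are hypothetical, and a positive maximum fitness is assumed so that the division is defined.

```python
def controller_inputs(f_max_k, f_ave_k, f_ave_prev):
    """Return (e1, e2) from Eqs. (7) and (8); assumes f_max_k > 0."""
    e1 = (f_max_k - f_ave_k) / f_max_k
    e2 = (f_ave_k - f_ave_prev) / f_max_k
    return e1, e2


# Hypothetical statistics: best fitness 800, average 600, previous average 560.
e1, e2 = controller_inputs(800.0, 600.0, 560.0)
```

Here e1 is 0.25 and e2 is about 0.05. A small e1 means the average fitness is close to the best fitness (low diversity), while a small e2 means the average fitness has barely changed since the previous generation.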
In Table 2, the abbreviations NL, NS, ZE, PS and PL, respectively, stand for “Negative Large”, “Negative Small”, “Zero”, “Positive Small” and “Positive Large”. The values of these parameters are derived from the membership functions given in Liu et al. (2005). The inputs for the mutation controller are the same as those for the fuzzy logic controller of the crossover. However, the output values of the mutation (illustrated by an asterisk in Table 2) are scaled by 10% (i.e., PS = PS/10). The output values, which are computed by the defuzzification process, specify the step sizes ΔPc(k) and ΔPm(k) for the crossover and mutation probabilities, respectively. The defuzzification process, which converts the fuzzy output back into numerical values, is performed by means of the centroid approach using the membership functions described above. In this approach, the fuzzy set membership function has the shape of a triangle (Fig. 6a). If this triangle is cut along a straight horizontal line somewhere between the top and the bottom and the top portion is removed, the remaining shape looks like a trapezoid (Fig. 6b). In the initial step of defuzzification, parts of the graphs are cut off to form trapezoids. All of these trapezoids are then superimposed on one another, forming a single geometric shape. Then, the centroid of this shape is calculated and used as the defuzzified value (Fig. 6c). If the shape is regarded as a plate of uniform density, the centroid is the point along the horizontal axis about which this shape would balance. By means of the defuzzification process, the control parameters of the GA are modified using the computed values ΔPc(k) and ΔPm(k) given in the following equations (Liu et al., 2005):
$$P_c(k) = P_c(k-1) + \Delta P_c(k) \tag{9}$$
$$P_m(k) = P_m(k-1) + \Delta P_m(k) \tag{10}$$
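The centroid defuzzification and the update of Eqs. (9) and (10) can be sketched numerically as follows. This is an illustrative Python sketch only: the triangle parameters and firing strengths are hypothetical (the actual membership functions are those of Liu et al. (2005)), the centroid is approximated on a sampling grid, and clamping the updated probability to [0, 1] is an assumption that is not stated in the text.

```python
def triangle(x, left, peak, right):
    """Triangular membership value at x."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def centroid_defuzzify(rules, lo, hi, steps=2000):
    """Approximate the centroid of the superimposed clipped triangles.

    rules: list of (firing_strength, (left, peak, right)). Each triangle is
    cut at its firing strength (giving a trapezoid), the trapezoids are
    superimposed with max, and the balance point of the result is returned.
    """
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = max(min(strength, triangle(x, *tri)) for strength, tri in rules)
        num += x * mu
        den += mu
    return num / den if den > 0 else 0.0


def update_probability(p_prev, step, low=0.0, high=1.0):
    """P(k) = P(k-1) + delta P(k); clamping to [low, high] is an assumption."""
    return min(high, max(low, p_prev + step))


# Hypothetical output sets "NS" and "PS" on a step-size axis of [-0.1, 0.1].
rules = [(0.2, (-0.1, -0.05, 0.0)), (0.8, (0.0, 0.05, 0.1))]
delta_pc = centroid_defuzzify(rules, lo=-0.1, hi=0.1)
pc_next = update_probability(0.6, delta_pc)
```

Because the positive set fires more strongly than the negative one in this example, the centroid lies on the positive side and the crossover probability increases slightly.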
After determining the new probabilities for crossover and mutation, the next generation is initiated with a renewed population. The number of generations in the model depends on whether an acceptable solution is reached or a predefined number of iterations is reached. After a while, all of the chromosomes and their fitness values will become the same. At this point, the algorithm is stopped. In our experiments, the GA is stopped after the predefined number of generations is reached.
Fig. 5. An example for the mutation operation and the updated offspring.
Table 2. Fuzzy rules for crossover and mutation operations (rows: e2; columns: e1).
CROSSOVER (ΔPc(k))
e2 \ e1   PL   PS   ZE
NL        NS   ZE   NS
NS        ZE   ZE   NL
ZE        NS   NL   NL
PS        PS   ZE   NL
PL        PL   ZE   NL
MUTATION (ΔPm(k))
e2 \ e1   PL   PS   ZE
NL        PS   ZE   PS
NS        ZE   ZE   PL
ZE        PS   PL   PL
PS        NS   ZE   PL
PL        NL   NS   PS
Fig. 6. An example defuzzification process; (a) The original membership functions, (b) the membership functions whose top portions are removed and (c) the superimposed trapezoids with the centroid of the shape (red dashed line), which specifies the defuzzified value. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
2.7. Morphological post-processing
Once the buildings are detected, artifacts are removed by applying post-processing operations. The post-processing functions that we utilize include the image morphological operations: (i) opening, (ii) artifact removal, (iii) closing and (iv) hole filling (Gonzalez & Woods, 2008, chap. 9). First, the opening operation is carried out to smooth the contours of the building regions and eliminate thin protrusions. The opening of set A by structuring element B is denoted as A ∘ B, which is formulated as
$$A \circ B = (A \ominus B) \oplus B \tag{11}$$
where the symbols ⊖ and ⊕ denote morphological erosion and dilation, respectively. Erosion tends to decrease the sizes of objects and remove small anomalies by subtracting objects with a radius smaller than that of the structuring element. In contrast, dilation generally increases the sizes of objects and connects areas that are separated by spaces smaller than the size of the structuring element. Next, an artifact removal operation is carried out to remove isolated regions. Then, the closing operation, which tends to smooth sections of contours, is carried out. In contrast to the opening operation, the closing operation generally fuses narrow breaks and long thin gulfs. The closing of set A by structuring element B, denoted A • B, is defined as
$$A \bullet B = (A \oplus B) \ominus B \tag{12}$$
where the disk-shaped structuring element is used once again. In the final step of post-processing, the hole-filling operation is carried out. A hole is defined as a set of background pixels surrounded by a connected border of foreground pixels in a binary image. In general, the hole-filling algorithms are based on a combination of dilation, complementation and intersection in an image. The effects of the morphological operators on a sample test scene (scene 1) are illustrated in Fig. 7. The composite effect of the opening and artifact removal operations shows that isolated regions and small protrusions are eliminated to a great extent. Closing fuses the narrow building object parts and hole-filling fills the isolated regions inside the building patches.
3. Experimental setup
3.1. Description of study area and data
The developed methodology was implemented in 10 different test scenes selected from the Batikent district of the city of Ankara, Turkey. Batikent, which covers an area of approximately 1000 ha, is located on the western corridor of Ankara. It is a planned and regularly developed settlement area that contains various types of buildings with different shapes and usages, such as residential, industrial, commercial, social and cultural facilities. The district was the biggest mass-housing project of the 1980s, accomplished through cooperatives in Turkey. The project was planned for 50,000 housing units and 250,000 inhabitants (European Resettlement Fund, 2011). Fig. 8 illustrates the test scenes in false color composite. Scenes 1, 3, 4, 5, 6 and 7 were classified as “urban” with respect to a study conducted by Steiniger, Lange, Burghardt, and Weibel (2008). The perceptual properties of an urban area are such that the built-up area is dense and the building shapes are generally complex and compact. The areas covered by the urban scenes are 353 × 263 m for scene 1, 283 × 282 m for scene 3, 164 × 156 m for scene 4, 357 × 348 m for scene 5, 324 × 196 m for scene 6 and 247 × 190 m for scene 7. Scenes 2 and 8 were identified as “suburban” areas where rows of single houses along roads are emphasized. Moreover, the built-up density is low and the buildings are rather dispersed. The areas covered by the suburban scenes are 572 × 347 m for scene 2 and 632 × 431 m for scene 8. The remaining scenes (scenes 9 and 10) were classified as “rural” areas where the rural context generally comprises single buildings. Additionally, the built-up area is open and the size of the buildings varies from small to large. Rural scenes 9 and 10 cover areas of 552 × 572 m and 646 × 607 m, respectively. For the satellite data, we used 1-m resolution pan-sharpened IKONOS imagery acquired on August 4, 2002. The image was in “Geo” format.
To assess the classification accuracies, a reference dataset was prepared for each scene by manually delineating and labeling the buildings from the image (Fig. 9).

Fig. 7. For a sample test scene: (a) the extracted building patches and the building patches after applying the (b) opening, (c) artifact removal, (d) closing and (e) hole filling operations.

Fig. 8. The false color pan-sharpened IKONOS images for the test scenes 1–10.

Fig. 9. The reference data prepared for the test scenes 1–10.

3.2. Parameter assignment

The proposed approach to building detection utilizes a number of parameters including the selection rate (XR), the number of generations, the number of runs, the population size (number of chromosomes), the chromosome size (number of genes) and the probabilities of crossover (Pc) and mutation (Pm). For each of these parameters, we determined the optimum value. Deciding on a selection rate is somewhat arbitrary. Letting only a few chromosomes survive may limit the number of available genes, whereas keeping too many chromosomes may result in poor performance. Similar to a study conducted by Haupt and Haupt (2004, chaps. 1–2), XR was kept at the 50% level in the natural selection process. After performing several tests, we found 20 generations to be optimal, beyond which the maximum fitness barely changed. Similar to studies conducted by Garg (2009) and Bielza, Fernandez del Pozo, Larranaga, and Bengoetxea (2010), the number of runs was set to 10. The initial crossover and mutation probabilities were set to 0.8 and 0.2, in parallel with the literature (Haupt & Haupt, 2004; Liu et al., 2005; Perkins et al., 2000). To determine the optimum values for the population and chromosome sizes, the accuracy test conducted in Sumer (2011) based on the average and maximum fitness values was used. The test results showed that a population size of 30 and a chromosome size of 5 can be used for scenes with similar characteristics to the test scenes used in this study.

For the post-processing operations, a disk-shaped structuring element with a radius of 3 was used for both the opening and closing operations. Although other shapes of structuring elements, such as diamond, line, square and rectangle, are also available, the disk-shaped element with a radius of 3 was found to be more suitable for preserving the orientation of the building regions. The threshold value for the artifact removal operation was set to 75. This threshold value was selected because it represents half of the minimum area patch (150 square meters) among the patches in the reference data. Therefore, the patches above the specified threshold value were considered to be buildings and preserved; the patches below the threshold value were removed from the binary image.

3.3. Performance evaluation

To evaluate the performance of the proposed building detection approach, the quantitative metrics proposed by Shufelt (1999), Lillesand, Kiefer, and Chipman (2008, chap. 7) and Rutzinger, Rottensteiner, and Pfeifer (2009) were used. When comparing the detected buildings with the reference data, a True Positive (TP) is defined as an entity labeled as a “building” that also corresponds to a “building” in the reference data. A True Negative (TN) is an entity that belongs to “non-building” in both the detection results and the reference data. A False Positive (FP) is an entity labeled as a “building” that corresponds to a “non-building” in the reference, whereas a False Negative (FN) is the opposite case: an entity labeled as a “non-building” that corresponds to a “building” in the reference. To evaluate the accuracies of the detected building patches, the metrics Producer’s Accuracy (PA), User’s Accuracy (UA) and Kappa (κ) were computed using the following equations (Lillesand et al., 2008):
$$PA = \frac{TP}{TP + FN} \tag{13}$$

$$UA = \frac{TP}{TP + FP} \tag{14}$$

$$\kappa = \frac{(TP + TN)(TP + TN + FP + FN) - \text{chance agreement}}{(TP + TN + FP + FN)^2 - \text{chance agreement}} \tag{15}$$

$$\text{chance agreement} = (TP + FP)(TP + FN) + (TN + FN)(TN + FP) \tag{16}$$

The metric PA, also referred to as “Detection Rate” or “Completeness”, is treated as a measure of the object detection performance. It evaluates the fraction of reference pixels labeled as “building” that is also identified as such by the approach. The metric UA, also referred to as “Correctness”, indicates how well the detected buildings match with the reference data and is an indicator of the false alarm rate. The compound metric κ is generally assumed to be a more robust measure than a simple percent agreement calculation because it also takes into account the agreement that would occur by chance. The kappa value will be 0 if two datasets agree only at the rate expected by chance, 1 if they always agree and negative if the performance is worse than random. In general, a kappa value above 0.8 is considered a “good” agreement, a value between 0.67 and 0.8 is taken as “fair” and agreement below 0.67 is assumed to be “dubious” (Manning, Raghavan, & Schütze, 2008, chap. 8).

4. Results and discussion

4.1. Building detection using the adaptive fuzzy logic controller
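As a brief aside, the accuracy metrics of Section 3.3 (Eqs. 13–16) can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the function name `accuracy_metrics` is hypothetical, and the pixel counts in the example are invented purely to exercise the formulas.

```python
def accuracy_metrics(tp, tn, fp, fn):
    """Return (PA, UA, kappa) from the four confusion-matrix counts."""
    n = tp + tn + fp + fn
    pa = tp / (tp + fn)            # Eq. (13): producer's accuracy (completeness)
    ua = tp / (tp + fp)            # Eq. (14): user's accuracy (correctness)
    # Eq. (16): agreement expected by chance, built from the marginal totals
    chance = (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)
    # Eq. (15): observed agreement corrected for chance agreement
    kappa = ((tp + tn) * n - chance) / (n * n - chance)
    return pa, ua, kappa


if __name__ == "__main__":
    # Hypothetical counts, not taken from any of the test scenes.
    pa, ua, kappa = accuracy_metrics(tp=80, tn=900, fp=10, fn=10)
    print(f"PA={pa:.3f}  UA={ua:.3f}  kappa={kappa:.3f}")
```

With these definitions, perfect agreement (FP = FN = 0) gives κ = 1, and a detection that agrees with the reference only at the chance rate gives κ = 0, in line with the interpretation given for Eq. (15).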
The developed approach was tested using the predetermined GA parameters. We used a value of 20 for the parameter “number of generations”, 30 for “population size”, 5 for “chromosome size”, 0.8 for “crossover probability” and 0.2 for “mutation probability”. Due to the adaptive fuzzy logic controller, the crossover and mutation probabilities were adjusted adaptively with respect to performance measures. Each experiment was repeated 10 times and the highest average fitness values computed for scenes 1–10,
Table 3. (a) For 10 individual runs, the fitness values computed for 20 generations for scene #1 and (b) average fitness values for scenes #2–#10. Rows list runs (a) or scenes (b); columns list generations.

(a) Scene #1
Generation    #1  #2  #3  #4  #5  #6  #7  #8  #9  #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Run #1        921 937 940 940 940 940 940 940 940 940 944 944 944 944 944 944 944 944 944 944
Run #2        855 858 858 858 942 942 942 942 942 942 942 942 942 942 942 942 942 942 942 943
Run #3        902 902 902 902 902 938 939 939 939 941 942 942 942 947 947 947 947 947 947 947
Run #4        861 862 923 934 935 935 935 935 935 935 935 935 935 935 935 935 935 935 936 936
Run #5        903 903 903 903 903 936 936 936 936 936 936 936 936 936 936 936 936 936 939 940
Run #6        894 899 899 899 899 935 935 935 935 935 935 935 939 941 941 941 941 941 944 944
Run #7        918 920 920 920 920 920 920 920 920 920 920 934 935 935 935 935 935 935 935 935
Run #8        893 896 921 921 921 921 921 921 936 936 936 936 936 936 940 940 940 940 940 940
Run #9        896 896 902 902 902 902 931 931 931 944 944 944 944 944 944 944 944 944 944 944
Run #10       918 937 937 937 937 938 946 946 946 946 946 946 946 946 946 947 947 947 947 947
Avg. fitness  896 901 910 912 920 931 934 934 936 937 938 939 940 941 941 941 941 941 942 942

(b) Average fitness
Generation    #1  #2  #3  #4  #5  #6  #7  #8  #9  #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Scene #2      908 918 920 921 924 930 933 934 934 935 935 936 936 937 937 937 938 938 938 938
Scene #3      897 899 901 902 902 902 902 902 903 903 903 905 905 906 906 906 906 906 906 907
Scene #4      897 898 901 902 902 902 903 904 904 904 904 905 905 906 906 906 906 907 907 907
Scene #5      810 813 815 816 817 820 822 823 824 824 826 826 826 827 827 827 829 830 831 831
Scene #6      945 951 953 954 954 954 955 959 959 960 960 962 962 962 963 963 964 964 964 965
Scene #7      851 864 866 868 869 873 874 877 877 877 889 895 904 905 905 905 912 913 913 913
Scene #8      750 773 775 787 795 801 804 811 816 837 843 844 844 847 850 852 853 854 857 857
Scene #9      852 854 860 872 881 885 894 902 909 910 917 917 920 920 920 924 926 927 927 927
Scene #10     887 892 912 916 917 921 933 933 933 934 936 939 939 940 940 940 942 943 943 944
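The progress figures quoted for the adaptive approach can be recomputed from the average fitness series in Table 3b. The sketch below uses the scene 8 row; the helper name `progress` is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, chosen because it matches the value of 33.28 reported in the text.

```python
from statistics import stdev

# Average fitness of scene 8 over 20 generations (Table 3b).
SCENE_8 = [750, 773, 775, 787, 795, 801, 804, 811, 816, 837,
           843, 844, 844, 847, 850, 852, 853, 854, 857, 857]


def progress(series):
    """Return (last minus first fitness, sample standard deviation)."""
    return series[-1] - series[0], stdev(series)


if __name__ == "__main__":
    gain, spread = progress(SCENE_8)
    print(gain, round(spread, 2))  # 107 33.28
```

The same helper can be applied to any other row of Table 3b or Table 4b to compare how much each scene improves between the first and the last generation.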
respectively, in the 20th generation were 942, 938, 907, 907, 831, 965, 913, 857, 927 and 944. The complete fitness values obtained for scene 1 and the average fitness values for the remaining scenes (scenes 2–10) are given in Table 3. For scene 1, the best progress was made by run 2, which starts from a fitness value of 855 and reaches 943. These extreme values are shown in italics in Table 3a. However, assessing the progress of an individual run may yield misleading results because the runs may start from different fitness scores in the parameter space. This is because the proposed approach is based on a random process in which one individual may produce a very successful score in a certain run, whereas another may fail in a different run for the same scene. Therefore, to stabilize the extreme values, we considered the average of the fitness values. In this context, the average fitness values were analyzed for all generations, where the first generation refers to the classification results of the Fisher’s linear discriminant analysis and the last generation corresponds to the final fitness values. As expected, an increasing trend is evident for all scenes; the differences in fitness values between the first and the last generations were found to be 46, 30, 10, 10, 21, 20, 62, 107, 75 and 57 for scenes 1–10, respectively. The best progress, with a standard deviation of 33.28, was achieved for scene 8. This scene is classified as “suburban” and covers an area of 632 × 431 m, which can be considered rather large. On the other hand, the worst progress was observed for scenes 3 and 4 with standard deviations of 2.76 and 2.86, respectively. These scenes were categorized as “urban” and, based on their areas (283 × 282 m for scene 3 and 164 × 156 m for scene 4), they can be considered small.

The computed fitness values cannot be used for the evaluation of the quantitative results. They are used only to determine the output images, which are assessed using the metrics described in Section 3.3. Therefore, for each test scene, we considered the output image that has the highest fitness value. For the quantitative assessments, the counts TP, TN, FP and FN were computed together with the metrics PA, UA and κ. The binary output images with the highest fitness values and the metrics PA, UA and κ are shown in Fig. 10. The producer’s accuracies ranged from 0.69 (scene 2) to 0.89 (scene 1). The user’s accuracies ranged from 0.50 (scene 10) to 0.91 (scene 3) and the kappa values from 0.55 (scene 5) to 0.88 (scene 1). The computed average kappa value was 0.76 for the suburban context (scenes 2, 8), 0.74 for the urban context (scenes 1, 3, 4, 5, 6, 7) and 0.68 for the rural context (scenes 9, 10). Similarly, the computed average producer’s and user’s accuracies were 0.75 and 0.84 for the suburban context, 0.79 and 0.85 for the urban context and 0.78 and 0.66 for the rural context, respectively. Of the ten test scenes used, scene 1 can be said to be the most successful one, with the highest kappa value of 0.88. Therefore, scene 1 demonstrates “good” agreement. In contrast, scene 5 yielded the worst kappa value of 0.55 and therefore can be interpreted as “dubious”. We believe that the failure of the classification for this scene is due to its high building density.

Fig. 10. The detected buildings for the test scenes 1–10 with the metrics: Fitness, Producer’s Accuracy, User’s Accuracy and Kappa.

Fig. 11. Examples of false alarm regions (circled in red), incompletely detected buildings (circled in yellow) and closely located buildings (circled in green) selected from the test scenes 1–4. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Although the computed accuracies were satisfactory, several shortcomings were evident in certain cases. The reasons for the failures were investigated and we found that the success of building detection was highly affected by the scene characteristics. For instance, in scene 3, vegetation occludes the roof edges to a certain extent, which increases the number of FN pixels (Fig. 11c). Furthermore, in the lower part of scene 2, the zigzag-shaped roof structures cause spectral confusion that tends to increase the FP pixels (Fig. 11b). The other shortcoming that emerged from the scene characteristics is that closely located buildings are detected as a joined single building patch. This case is shown in scene 4 (circled in green), where the urban buildings are rather dense (Fig. 11d). Apart from these, the red and yellow circles respectively indicate areas of false alarms and incompletely detected buildings (Fig. 11). Aside from the experimental results, the running times were also computed for each test scene. The average running time for 10 individual runs varied between 40 s (scene 4) and 274 s (scene
Table 4. (a) For 10 individual runs, the fitness values computed for 20 generations using the conventional approach for scene #1 and (b) average fitness values for scenes #2–#10. Rows list runs (a) or scenes (b); columns list generations.

(a) Scene #1
Generation    #1  #2  #3  #4  #5  #6  #7  #8  #9  #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Run #1        915 915 931 931 931 931 931 931 931 931 931 931 931 931 931 931 931 931 931 931
Run #2        934 934 934 934 934 934 934 934 934 934 934 934 937 937 938 938 938 938 938 938
Run #3        904 904 906 906 934 934 934 934 934 934 934 934 934 934 934 934 934 934 935 935
Run #4        899 899 903 903 935 935 935 935 935 935 935 935 935 935 940 940 940 940 940 940
Run #5        863 897 897 897 897 897 897 897 897 899 922 922 922 922 922 922 922 922 922 922
Run #6        903 903 903 903 903 903 903 903 903 903 934 934 934 935 935 935 935 935 935 935
Run #7        938 938 939 939 939 939 939 939 939 939 939 940 940 940 940 941 941 945 945 945
Run #8        880 880 880 880 880 880 903 903 903 903 903 938 938 938 938 938 938 938 941 941
Run #9        932 932 932 932 932 932 932 939 939 939 939 939 939 939 939 939 939 939 939 939
Run #10       864 880 880 880 880 880 889 889 902 902 902 902 902 902 908 908 908 908 908 908
Avg. fitness  903 908 910 910 916 916 920 920 922 922 927 931 931 931 932 932 933 933 933 933

(b) Average fitness
Generation    #1  #2  #3  #4  #5  #6  #7  #8  #9  #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Scene #2      919 920 930 930 931 932 932 933 933 933 934 934 934 935 935 935 935 935 935 935
Scene #3      895 896 898 899 899 900 900 900 901 901 901 902 902 902 902 902 902 902 902 902
Scene #4      897 897 898 898 899 900 900 900 901 901 902 903 903 903 904 904 904 904 904 904
Scene #5      812 813 814 818 819 820 823 823 823 823 824 824 825 825 825 826 826 827 827 827
Scene #6      951 952 952 955 955 955 956 956 957 957 958 959 960 960 960 960 961 961 961 961
Scene #7      849 860 865 865 869 873 879 884 885 891 891 892 892 896 896 896 896 898 898 899
Scene #8      786 789 810 813 814 827 830 835 838 839 839 839 840 843 843 848 850 851 851 853
Scene #9      858 864 877 886 887 897 898 901 905 909 909 909 911 912 913 913 914 915 915 916
Scene #10     903 907 910 919 923 924 924 926 931 933 933 933 935 936 936 936 936 937 937 938
Fig. 12. Equalized performance curves of the two approaches for the test scenes 1–10. The dashed lines denote the proposed adaptive approach, while the solid lines refer to the conventional approach.
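The text does not spell out how the starting fitness values of the two approaches were equalized. One reading that reproduces the equalized end values and differences quoted for Fig. 12 is to shift each adaptive curve so that it starts at the first-generation value of the conventional curve. The sketch below follows that assumption; the function name `equalized_differences` is hypothetical, and the four lists are the first- and last-generation average fitness values of scenes 1–10 from Tables 3 and 4.

```python
# First and last generation average fitness for scenes 1-10.
ADAPTIVE_FIRST = [896, 908, 897, 897, 810, 945, 851, 750, 852, 887]  # Table 3
ADAPTIVE_LAST = [942, 938, 907, 907, 831, 965, 913, 857, 927, 944]   # Table 3
CONV_FIRST = [903, 919, 895, 897, 812, 951, 849, 786, 858, 903]      # Table 4
CONV_LAST = [933, 935, 902, 904, 827, 961, 899, 853, 916, 938]       # Table 4


def equalized_differences(adp_first, adp_last, conv_first, conv_last):
    """Shift each adaptive curve to the conventional start (ASSUMED method).

    Returns the equalized adaptive end values and their lead over the
    conventional end values.
    """
    ends = [cf + (al - af)
            for af, al, cf in zip(adp_first, adp_last, conv_first)]
    diffs = [e - cl for e, cl in zip(ends, conv_last)]
    return ends, diffs


if __name__ == "__main__":
    ends, diffs = equalized_differences(
        ADAPTIVE_FIRST, ADAPTIVE_LAST, CONV_FIRST, CONV_LAST)
    print(ends)   # equalized adaptive fitness in the 20th generation
    print(diffs)  # lead of the adaptive over the conventional approach
```

Under this reading, the equalized difference for a scene is simply the difference between the fitness gains of the two approaches over the 20 generations.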
10). Of the test scenes used, scene 4 has the smallest area, 164 × 156 m, whereas scene 10 has the largest area, 646 × 607 m. We found that the scene size was not the only factor that affects the execution time. The gene arrangement is also an important factor because every image processing operator has a different run time. The total time required to process 10 scenes was computed to be 20.9 min (1251 s), which means that each scene was processed in 2.1 min on average. All of these performance measurements were based on a Windows XP operating system with a Pentium Core 2 1.86 GHz processor and 2 GB RAM.

4.2. Building detection without using the adaptive fuzzy logic controller

To assess the contribution of the adaptive-fuzzy approach, the system was re-executed excluding the adaptive fuzzy logic controller. In this case, the crossover and mutation probabilities were not allowed to change and remained at 0.8 and 0.2, respectively, throughout the execution. As in the previous scenario (adaptive approach), each experiment was repeated 10 times. The complete fitness values computed for scene 1 and the average fitness values for the remaining scenes (scenes 2–10) are provided in Table 4. For scene 1, the best fitness progress was made by run 8, which starts from a fitness value of 880 and reaches 941. The extreme values are shown in italics in Table 4a. Similarly to the adaptive scenario, the average fitness values were analyzed instead of individual runs. The computed highest values (averaged over 10 runs) were 933, 935, 902, 904, 827, 961, 899, 853, 916 and 938 for scenes 1–10, respectively, in the 20th generation. In this respect, the fitness differences between the first and the last generations were analyzed for each scene. For scenes 1–10 respectively, the differences were computed to be 30, 16, 7, 7, 15, 10, 50, 67, 58 and 35. As in the adaptive scenario, the best progress, with a standard deviation of 19.74, was achieved for scene 8.
On the other hand, scenes 3 and 4 made the worst progress with standard deviations of 2.09 and 2.57, respectively.

In addition to the fitness values, the performance curves of the two approaches (Fig. 12) were also analyzed. To make a better comparison between the conventional approach and the proposed adaptive approach, the starting fitness values were equalized. It is evident that for all test scenes, the proposed adaptive approach performs better than the conventional approach. The equalized differences between the two approaches are 16 (949–933) for scene 1, 14 (949–935) for scene 2, 3 (905–902) for scene 3, 3 (907–904) for scene 4, 6 (833–827) for scene 5, 10 (971–961) for scene 6, 12 (911–899) for scene 7, 40 (893–853) for scene 8, 17 (933–916) for scene 9 and 22 (960–938) for scene 10 (Fig. 12). The performance curves indicate that there is a larger probability of getting trapped in local optima using the conventional approach. In particular, the performance curves for scenes 1 and 6 (solid lines) reach their maxima after the 17th generation, whereas the performance curves for scenes 2, 3 and 4 reach their maxima after the 14th, 12th and 15th generations at the local optimal solutions. For scenes 3 and 7, the adaptive approach takes more time to find a better solution with a larger probability of arriving at the optimum solution. On the other hand, scene 8 yields the highest difference between the adaptive and conventional approaches.

Similarly to the adaptive approach, the running times were also computed in the present scenario. The computed average running times for all scenes stayed between 32 s for scene 4 and 274 s for scene 10. The scenes for which the fastest and the slowest running times were computed were the same as in the adaptive approach. However, the total running time required to process 10 scenes was 20.1 min (1205 s), which is 4% faster than the running time of the adaptive approach.
It is believed that this difference is due to the variable probabilities of crossover and mutation in the adaptive approach. In the present scenario, the crossover and mutation probabilities are fixed, whereas in the adaptive approach, these probabilities are subject to change. In particular, it is observed from the experiments that the mutation probability tends to increase in later generations, making the running times slightly longer due to the execution of many more operations.

5. Conclusions

In this study, we presented a GA-based building detection approach using high-resolution satellite imagery. The approach combines a hybrid system of evolutionary techniques with a traditional classification method (Fisher’s linear discriminant) and an adaptive fuzzy logic component. The approach makes a novel improvement to object detection accuracy by reducing the premature convergence problem encountered in GAs. The fundamental image processing operators are integrated with the GA concepts such as population, chromosome, gene, crossover and mutation. The effectiveness of the proposed approach for producing successful results was demonstrated. The experimental validation of the approach was carried out on ten selected test scenes with different characteristics. The kappa values computed for the detected building patches were in the range from 0.55 to 0.88. The extraction performance was better for urban and suburban buildings than for the buildings in the rural test scenes. Among the scenes analyzed, the rural scenes generally include more buildings under construction, which yield low user’s and producer’s accuracies. Further, the detection of densely located urban buildings is also problematic, although the computed user’s and producer’s accuracies were fairly high. The proposed approach provided higher fitness values when compared to a traditional classification method (Fisher’s linear discriminant classifier), for which the corresponding fitness values are computed in the first generation of the GA.
In the experimental tests that were carried out, considerable improvements were evident such that average fitness increased by 107 of 1000 units (e.g. scene 8). On the other hand, minor improvements were encountered for the test scenes that have smaller areas (e.g. scene 3). Moreover, if initiated at a high fitness value, it was noticeable that the algorithm has little chance to make a significant jump. Compared to the conventional GA approach, the proposed adaptive fuzzy-genetic algorithm approach is efficient, yielding higher average fitness values. For the test scenes analyzed, the differences in average fitness values between the two approaches were computed in the range from 3 to 40. It is believed that these differences are due to the fixed initial probabilities of the crossover and mutation operations, which greatly increase the risk of getting trapped in a local optimum. In other words, the adaptive fuzzy-genetic approach reveals the most appropriate solution by dynamically adjusting the GA control parameters. Finally, the image morphology-based post-processing stage reduced the false alarm areas successfully. The morphological opening and artifact removal operations removed the isolated regions and small protrusions to a large extent. Further, the closing and hole-filling operations successfully fused the narrow building parts and removed the holes in the building patches, respectively. The proposed approach has a few limitations. One limitation is that although an adaptive fuzzy logic controller is integrated into the proposed approach, there is no absolute assurance that the GA will find a global optimum solution. Apart from that, parameter selection and initialization are the critical issues in terms of execution cost and overall accuracy. Careful selection is needed for settings like selection, crossover and mutation methods along with the size of the population and of chromosomes. Choosing improper
parameters might result in longer program runs or even unsatisfactory results.

Acknowledgments

The authors thank the two anonymous reviewers for their constructive suggestions and comments on this article.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compenvurbsys.2013.01.004.

References

Bielza, C., Fernandez del Pozo, J. A., Larranaga, P., & Bengoetxea, E. (2010). Multidimensional statistical analysis of the parameterization of a genetic algorithm for the optimal ordering of tables. Expert Systems with Applications, 37, 804–815.
Chang-Shing, L., Shu-Mei, G., & Chin-Yuan, H. (2005). Genetic-based fuzzy image filter and its applications to image processing. IEEE Transactions on Systems, Man, and Cybernetics – Part B. Cybernetics, 35, 694–711.
De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems. Doctoral dissertation, University of Michigan, Ann Arbor, Michigan, USA.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley-Blackwell.
European Resettlement Fund. (2011). Batikent project-Turkey. Retrieved 12.10.11.
Fraser, C. S., Baltsavias, E., & Gruen, A. (2001). 3-D building reconstruction from high-resolution Ikonos stereo-imagery. In E. P. Baltsavias, A. Gruen, & L. V. Gool (Eds.), Automatic extraction of man-made objects from aerial and space images (III) (pp. 331–344). Leiden: Balkema Publishing.
Garg, P. (2009). A comparison between memetic algorithm and genetic algorithm for the cryptanalysis of simplified data encryption standard algorithm. International Journal of Network Security & Its Applications, 1, 34–42.
Gonzalez, R. C., Woods, R. E., & Eddins, S. L. (2009). Digital image processing using Matlab (2nd ed.). Gatesmark Publishing, LLC.
Gonzalez, R. C., & Woods, R. E. (2008). Digital image processing (3rd ed.). New Jersey: Pearson Education.
Harvey, N. R., Theiler, J., Brumby, S. P., Perkins, S., Szymanski, J. J., Bloch, J. J., et al. (2002). Comparison of GENIE and conventional supervised classifiers for multispectral image feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40, 393–404.
Haupt, R. L., & Haupt, S. E. (2004). Practical genetic algorithms (2nd ed.). New Jersey: John Wiley & Sons.
Herrera, F., & Lozano, M. (2003). Fuzzy adaptive genetic algorithms: Design, taxonomy and future directions. Soft Computing, 7, 545–562.
Holland, J. H. (1992). Adaptation in natural and artificial systems (2nd ed.). Cambridge, MA: MIT Press (Chapter 6).
Inglada, J. (2007). Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS Journal of Photogrammetry and Remote Sensing, 62, 236–248.
Ioannidis, C., Psaltis, C., & Potsiou, C. (2009). Towards a strategy for control of suburban informal buildings through automatic change detection. Computers, Environment and Urban Systems, 33, 64–74.
Karantzalos, K., & Paragios, N. (2010). Large scale building reconstruction through information fusion and 3-D priors. IEEE Transactions on Geoscience and Remote Sensing, 48, 2283–2296.
Kim, T., Lee, T. Y., & Kim, K. O. (2006). Semiautomatic building line extraction from Ikonos images through monoscopic line analysis. Photogrammetric Engineering and Remote Sensing, 72, 541–549.
Koc San, D., & Turker, M. (2012). A model-based approach for automatic building database updating from high-resolution space imagery. International Journal of Remote Sensing, 33, 4193–4218.
Lafarge, F., Descombes, X., Zerubia, J., & Pierrot-Deseilligny, M. (2010). Structural approach for building reconstruction from a single DSM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 135–146.
Laws, K. I. (1980). Rapid texture identification. Proceedings of SPIE, 238, 376–380.
Lee, D. S., Shan, J., & Bethel, J. S. (2003). Class-guided building extraction from Ikonos imagery. Photogrammetric Engineering and Remote Sensing, 69, 143–150.
Lillesand, T., Kiefer, R. W., & Chipman, J. W. (2008). Remote sensing and image interpretation (6th ed.). John Wiley & Sons Inc.
Li, Y., Bai, B., & Zhang, Y. (2007). An adaptive immune genetic algorithm for edge detection. Advanced Intelligent Computing Theories and Applications: With Aspects of Artificial Intelligence Lecture Notes in Computer Science, 4682, 565–571.
Liu, Z., Cui, S., & Yan, Q. (2008). Building extraction from high resolution satellite imagery based on multi-scale image segmentation and model matching. In Proceedings of 2008 international workshop on earth observation and remote sensing applications, Beijing.
Liu, H., Xu, Z., & Abraham, A. (2005). Hybrid fuzzy-genetic algorithm approach for crew grouping. In Proceedings of 5th international conference on Intelligent Systems Design and Applications (ISDA’05) (pp. 332–337), Wroclaw.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). An introduction to information retrieval. Cambridge University Press.
Maulik, U. (2009). Medical image segmentation using genetic algorithms. IEEE Transactions on Information Technology in Biomedicine, 13, 166–173.
Mayunga, S. D., Coleman, D. J., & Zhang, Y. (2007). A semi-automated approach for extracting buildings from Quickbird imagery applied to informal settlement mapping. International Journal of Remote Sensing, 28, 2343–2357.
Paulinas, M., & Usinskas, A. (2007). A survey of genetic algorithms applications for image enhancement and segmentation. Information Technology and Control, 36, 278–284.
Perkins, S., Edlund, K., Esch-Mosher, D., Eads, D., Harvey, N., & Brumby, S. (2005). Genie Pro: Robust image classification using shape, texture and spectral information. Proceedings of SPIE, 5806, 139–148.
Perkins, S., Theiler, J., Brumby, S. P., Harvey, N. R., Porter, R., Szymanski, J. J., et al. (2000). GENIE: A hybrid genetic algorithm for feature classification in multispectral images. Proceedings of SPIE, 4120, 52–62.
Rutzinger, M., Rottensteiner, F., & Pfeifer, N. (2009). A comparison of evaluation techniques for building extraction from airborne laser scanning. IEEE Journal of Selected Topics in Applied Earth Observation and Remote Sensing, 2, 11–20.
Shackelford, A. K., & Davis, C. H. (2003). A combined fuzzy pixel-based and object-based approach for classification of high-resolution multispectral data over urban areas. IEEE Transactions on Geoscience and Remote Sensing, 41, 2354–2363.
Shufelt, J. A. (1999). Performance evaluation and analysis of monocular building extraction from aerial imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 311–326.
Sirmacek, B., & Unsalan, C. (2009). Urban-area and building detection using SIFT keypoints and graph theory. IEEE Transactions on Geoscience and Remote Sensing, 47, 1156–1167.
Sohn, G., & Dowman, I. (2007). Data fusion of high-resolution satellite imagery and LIDAR data for automatic building extraction. ISPRS Journal of Photogrammetry and Remote Sensing, 62, 43–63.
Steiniger, S., Lange, T., Burghardt, D., & Weibel, R. (2008). An approach for the classification of urban building structures based on discriminant analysis techniques. Transactions in GIS, 12, 31–59.
Sumer, E. (2011). Automatic reconstruction of photorealistic 3-D building models from satellite and ground-level images. Unpublished PhD thesis, Middle East Technical University, Ankara, Turkey.
Tournaire, O., Bredif, M., Boldo, D., & Durupt, M. (2010). An efficient stochastic approach for building footprint extraction from digital elevation models. ISPRS Journal of Photogrammetry and Remote Sensing, 65, 317–327.
Tupin, F., & Roux, M. (2003). Detection of building outlines based on the fusion of SAR and optical features. ISPRS Journal of Photogrammetry and Remote Sensing, 58, 71–82.
Wei, Y., Zhao, Z., & Song, J. (2004). Urban building extraction from high-resolution satellite panchromatic image using clustering and edge detection. In Proceedings of the IEEE international geoscience and remote sensing symposium, Anchorage, AK, 2008–2010.
Yang, M. D. (2007). A genetic algorithm (GA) based automated classifier for remote sensing imagery. Canadian Journal of Remote Sensing, 33, 203–213.