A Benchmark Data Set for Aircraft Type Recognition from Remote Sensing Images

Zhi-Ze Wu (a), Shou-Hong Wan (b), Xiao-Feng Wang (c,∗), Ming Tan (c), Le Zou (c), Xin-Lu Li (a), Yan Chen (a)

(a) Institute of Applied Optimization, School of Artificial Intelligence and Big Data, Hefei University; Jinxiu Dadao 99, Hefei, Anhui, China, 230601.
(b) School of Computer Science and Technology, University of Science and Technology of China; Hefei, Anhui, China, 230027.
(c) School of Artificial Intelligence and Big Data, Hefei University; Jinxiu Dadao 99, Hefei, Anhui, China, 230601.
Abstract
Aircraft type recognition from remote sensing images has many civil and military applications. In images obtained with modern technologies such as high spatial resolution remote sensing, even details of aircraft can become visible. With this, the identification of aircraft types from remote sensing images becomes possible. However, the existing methods for this purpose have mostly been evaluated on different data sets and under different experimental settings. This makes it hard to compare their results and judge the progress in the field. Moreover, the data sets used are often not publicly available, which brings difficulties in reproducing the works for a fair comparison. This severely limits the progress of research, and the state of the art is not entirely clear. To address this problem, we introduce a new benchmark data set for aircraft type recognition from remote sensing images. This data set is called Multi-Type Aircraft Remote Sensing Images (MTARSI); it contains 9'385 images of 20 aircraft types, with complex backgrounds, different spatial resolutions, and complicated variations in pose, spatial location, illumination, and time period. The publicly available MTARSI data set allows researchers to develop more accurate and robust methods for both remote sensing image processing and interpretation analysis of remote sensing objects. We also provide a performance analysis of state-of-the-art aircraft type recognition and deep learning approaches on MTARSI, which serves as a baseline on this benchmark.

∗ Corresponding author.
Email addresses: [email protected] (Zhi-Ze Wu), [email protected] (Shou-Hong Wan), [email protected] (Xiao-Feng Wang), [email protected] (Ming Tan), [email protected] (Le Zou), [email protected] (Xin-Lu Li), [email protected] (Yan Chen)
URL: iao.hfuu.edu.cn (Zhi-Ze Wu)
Keywords: Benchmark, Airplane Type Recognition, Remote Sensing Images, Pattern Recognition
1. Introduction
With the rapid development of technology, remote sensing images have become a data source of great significance, which is widely used in civil [1] and military fields [2]. Among the many possible objects to detect in remote sensing images, aircraft are a key concern. Aircraft are not only an important means of transportation in the civil sector, but also a strategic objective in the military field. Here, dynamic aircraft detection and type identification can provide information of great significance for a timely analysis of the battlefield situation and the formulation of military decisions [2, 3]. Civil applications such as emergency aircraft search, airport surveillance, and aircraft detection are highly important as well [1, 4].

Due to the continuous improvement of the spatial resolution of remote sensing cameras, the information contained in single images is growing. The currently existing automated interpretation approaches for identifying aircraft from such images are unable to meet the needs of many real-world applications [5]. Therefore, the question of how to accurately and quickly extract the position and type of aircraft has come into the focus of research.

In particular, aircraft type recognition has attracted significant attention from academia and industry. Aircraft differ in details such as model and type, as illustrated in Figure 1, where three types of transport aircraft, C-5, C-17, and C-130, are shown in sequence. Even though these aircraft have different uses and functions, they are quite similar in appearance.
Figure 1: Three types of transport aircraft: (a) C-5, (b) C-17, (c) C-130
In addition, complex backgrounds and the characteristics of the image sources, i.e., different satellites, cause additional difficulties for aircraft recognition. The apparent shape, shadow, and color of an object may further be influenced by the solar radiation angle, the radar radiation angle, and the surrounding environment [6]. Therefore, recognizing the aircraft type from remote sensing images remains a challenging problem.
In recent years, several data sets with images that can be used to train aircraft detection algorithms have been created, including UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], and FGVC-Aircraft [10]. However, in UCMerced LandUse, NWPU-RESISC45, and PatternNet, aircraft are only a sub-category and appear only in a subset of the data. Also, there is no distinction between their types. While FGVC-Aircraft can be used for fine-grained visual categorization of aircraft, it is not based on remote sensing images and instead contains natural scenes. In summary, the publicly available data sets are not suitable for the domain of aircraft type recognition from remote sensing images.
The currently existing approaches for aircraft type recognition can coarsely be divided into three categories, namely methods based on template matching [4, 11, 12], on the shallow feature learning model [3, 13, 14], and on deep neural networks [4, 6, 12, 15, 16]. Most of these works were evaluated under different experimental settings and on different data sets, which, moreover, are not publicly available. This makes it hard to judge the overall progress in the field and also reduces the reproducibility of results very significantly.
A benchmark data set is needed, which should be permanently publicly available and which would make it easy to perform fair algorithm comparisons. Such a benchmark would accelerate and promote the development of the research field [17]. The data set should demonstrate all the challenging aspects of the problem, for instance, the high diversity in the geometrical and spatial patterns of aircraft in remote sensing images. Along with the data set, a comprehensive set of results obtained with state-of-the-art approaches should be provided, so that new works can directly be compared to tested, published, and verified results. In this paper, we present such a benchmark data set, called the Multi-Type Aircraft Remote Sensing Images (MTARSI) [18]. It contains 9'385 remote sensing images of 20 aircraft types, with complex backgrounds, different spatial resolutions, and complicated variations in pose, spatial location, illumination, and time period. As discussed in the benchmarking study [19], we describe the collection, simulation, labeling, and categorization of the samples for the data set in detail. This will allow researchers to extend the data set using the same approach as us, if the need for a larger data set should arise due to progress in technology, or to develop similar data sets for other domains. We then compare and evaluate the performance of several typical aircraft type recognition and deep learning approaches on MTARSI in order to provide proper baseline results.
The contributions of this paper are three-fold:

1. We construct, present, and publish MTARSI, the first public aircraft remote sensing image database. By doing so, we significantly aid the development of robust methods for aircraft type recognition from remote sensing images.

2. We describe in detail how this data set is created by applying image simulation methods supported by data gathered from online sources. This allows for applying our approach to generate high-quality benchmark image data sets for other application areas.

3. We provide the results obtained by state-of-the-art approaches in the field, to serve as baselines for comparison for researchers working on remote sensing computer vision. We show that MTARSI can indeed identify the strengths and weaknesses of different aircraft type recognition algorithms for remote sensing images.
The remainder of this paper is organized as follows. In Section 2, we review existing data sets for aircraft recognition and the state of the art in aircraft type recognition from remote sensing images. Then, the design and contents of MTARSI are presented in Section 3. The performance of several state-of-the-art aircraft type recognition methods is evaluated on MTARSI in Section 4. Finally, the paper closes with a conclusion and discussion in Section 5, where we also describe how to obtain the MTARSI data set from its public repository.

Figure 2: A subset of the images in the UCMerced LandUse Data Set [7]
2. Related Work
In this section, we first review several data sets commonly used for aircraft recognition and then present a comprehensive analysis of the state of the art in aircraft type recognition from remote sensing images.
2.1. Existing Data Sets for Aircraft Recognition

At present, there are four popular data sets for aircraft recognition, namely UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], and FGVC-Aircraft [10].

The UCMerced LandUse Data Set [7] was established at the University of California, Merced, USA. It is one of the most widely used data sets in the field of remote sensing, especially for various remote sensing image classification tasks. The images in UCMerced LandUse were first selected from the United States Geological Survey National Map and then randomly cropped into sizes of 256 × 256 pixels. The spatial resolution is 1 foot (about 0.3 meters). UCMerced LandUse contains a total of 21 different scene types: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. The data set has a total of 2100 images, 100 in each category. Note that "aircraft" is only a sub-category of this data set, i.e., there are only 100 images relevant to our domain. Figure 2 shows some examples of this data set.

The NWPU-RESISC45 Data Set [8] was created at the Northwestern Polytechnical University, Xi'an, China. It is by far the largest open source remote sensing imagery data set. It contains a total of 31'500 images, based on an investigation of existing data sets and a selection of the most representative images of 45 remote sensing scene categories, including aircraft and airports, but also baseball fields, basketball courts, beaches, bridges, bushes, churches, circular farmland, clouds, commercial areas, industrial areas, crossroads, railways, boats, snow mountains, and so on. Each category in the data set has 700 images with an image size of 256 × 256 pixels and a spatial resolution of between 0.2 and 30 meters. Aircraft are again a sub-category of this data set, this time including 700 images. Figure 3 shows some examples of this data set.
Figure 3: A subset of the images in the NWPU-RESISC45 Data Set [8]

The PatternNet Data Set [9] was established at the State Key Laboratory of Surveying and Mapping Remote Sensing Information Engineering of Wuhan University, Wuhan, China, and the University of California, Merced, USA. The data set is currently one of the largest high spatial resolution remote sensing image data sets, with a total of 38 categories, such as airplanes, baseball fields, basketball courts, beaches, bridges, cemeteries, shrubs, tanks, swimming pools, tennis courts, substations, and so on. Each category contains 800 images with a size of 256 × 256 pixels and spatial resolutions ranging from 0.062 m to 4.693 m. The images of the PatternNet data set are obtained from the Google Earth Satellite Imagery and the Google Maps US Urban Interface. Figure 4 shows some examples from this data set.

Figure 4: A subset of the images in the PatternNet Data Set [9]
The Fine-Grained Visual Classification of Aircraft data set (FGVC-Aircraft [10]) is a benchmark data set for the visual categorization of aircraft. The data set contains 10'200 images of aircraft, with 100 images for each of 102 different aircraft model variants, most of which are airplanes [10]. The (main) aircraft in each image is annotated with a tight bounding box and a hierarchical airplane model label. Aircraft models are organized in a hierarchy with four levels [10]. These are, from finer to coarser:

• Model, e.g., Boeing 737-76J: Since certain models are nearly visually indistinguishable, this level is usually not used in the evaluation of algorithms.

• Variant, e.g., Boeing 737-700: A variant collapses the models that are visually indistinguishable into one class [10]. The data set comprises 102 different variants.

• Family, e.g., Boeing 737: The data set comprises 70 different families.

• Manufacturer, e.g., Boeing: The data set comprises 41 different manufacturers.

Figure 5: A subset of the images in the FGVC-Aircraft Data Set [10]

Note that this data has been used as part of ImageNet [20]. Some sample images from this data set are displayed in Figure 5. They show that the images contain front and side views of aircraft, whereas remote sensing images from satellites are usually top views. Thus, while FGVC-Aircraft can indeed be used for the visual categorization of aircraft, it is not suitable for training or comparing algorithms for remote sensing.
To the best of our knowledge, no remote sensing data set for aircraft type recognition is publicly available. This is perhaps the primary factor limiting the development of the area.

2.2. Review of Aircraft Type Recognition from Remote Sensing Images
Most of the recent methods for aircraft recognition from remote sensing images are focused on deciding whether an object is an aircraft or not [21, 22, 23, 24, 25]. Less attention has been paid to the further fine-grained classification and recognition of the aircraft model and type. General image classification approaches can be divided into hand-crafted feature based methods [26, 27, 28, 29] and deep learning based methods [30, 31, 32, 33]. The research on object type identification from remote sensing images can roughly be divided into three categories, namely methods based on template matching [4, 11, 12], on the shallow feature learning model [3, 13, 14], and on deep neural networks [4, 6, 12, 15, 16].

Let us first consider the template matching based methods. Here, Wu et al. [11] proposed a jigsaw reconstruction approach to extract aircraft shapes and then match them with standard templates. Zhao et al. [4] designed a keypoint detection model based on Convolutional Neural Networks (CNNs) [30] and a keypoint matching method to recognize aircraft, transforming the aircraft recognition problem into the landmark detection problem. Due to the use of CNNs, this method could also partially be classified as a deep learning approach. Furthermore, Zuo et al. [12] built a template matching method using both aircraft keypoints and segmentation results, which provided more detailed information to improve recognition accuracy.

An important shallow feature learning model based method was proposed by Hsieh et al. [13], who use a hierarchical classification algorithm based on four different features: wavelet transform, Zernike moment, distance transform, and bitmap. In [3], a coarse-to-fine method is built to integrate high-level information for aircraft type recognition.
A method using the artificial bee colony algorithm with an edge potential function is proposed in [14] to solve this problem.

These methods for object type identification from remote sensing images have achieved significant results, but there are still many shortcomings. The methods based on template matching have problems adapting to complex scenarios [6]. The methods based on the shallow feature learning model have difficulties extracting high-level semantic features from objects, and there is a bottleneck in their recognition performance [15].

In recent years, with the improvement of parallel GPU computation and the emergence of large-scale image data sets, deep neural networks [30, 31, 32, 33, 34, 35] have experienced tremendous development and have proven to be efficient in a variety of vision tasks, such as the classification [36, 37, 38, 39], detection [40, 41, 42, 43], and segmentation [44, 45, 46, 47] of images. Also benefiting from the existing large-scale data sets from different application domains [20, 48], deep neural networks can be initialized by using transfer learning [38, 49]. As a result, the application of deep neural networks has led to breakthroughs in aircraft recognition from remote sensing images. Diao et al. [15] used a pixel-wise learning method based on deep belief networks (DBNs) for object recognition in remote sensing images. In [4], Zhao et al. proposed an approach to address the aircraft type recognition problem by detecting the landmark points of an aircraft using a vanilla network. By combining semantic segmentation and aircraft type recognition, another CNN based method is used in [12]. Zhang et al. [16] developed an aircraft type recognition framework for learning representative features from images without type labels, based on conditional generative adversarial networks (GANs). Fu et al. [6] propose a fine-grained aircraft recognition method for remote sensing images. Their multi-class activation mapping (MultiCAM) utilizes two subnetworks, i.e., the target net and the object net, in order to fully use the features of discriminative object parts.
Compared with handcrafted-feature based methods, neural-network based models can obtain better discriminative feature representations and achieve significant improvements in terms of recognition accuracy. However, these models must be trained well on target domain data sets after the initialization stage.

One problem is that the aforementioned deep neural network based aircraft recognition methods [4, 6, 12, 16, 15] for remote sensing images were evaluated on different data sets under different experimental settings. This makes it hard to compare their results. Additionally, the data sets used in these works and in the research on the template matching based [4, 11, 12] and shallow feature learning based methods [3, 13, 14] are not publicly available. It is thus not easy to reproduce these works for a fair comparison. Therefore, the state of the art of aircraft type recognition from remote sensing images is not entirely clear.

To address this problem, we introduce a remote sensing image data set of diverse aircraft, which can be used to assess and evaluate the performance of aircraft type recognition algorithms. It will also particularly benefit the future development of both remote sensing image processing and interpretation analysis of remote sensing objects.
3. The MTARSI Data Set
To advance the state of the art in aircraft type recognition from remote sensing images, we construct MTARSI, a public large-scale aircraft image data set. In all, MTARSI has a total of 9'385 remote sensing images acquired from Google Earth satellite imagery and manually expanded, including 20 different types of aircraft covering 36 airports. The new data set is made up of the following 20 aircraft types: B-1, B-2, B-29, B-52, Boeing, C-130, C-135, C-17, C-5, E-3, F-16, F-22, KC-10, C-21, U-2, A-10, A-26, P-63, T-6, and T-43. All sample images are carefully labeled by seven specialists in the field of remote sensing image interpretation. Each image contains exactly one complete aircraft. A sample of each aircraft type in the MTARSI data set is shown in Figure 6. The numbers of sample images vary with the aircraft type (see Table 1) and range from 230 to 846.
Figure 6: Samples of the 20 aircraft types from MTARSI: (a) Boeing, (b) B-1, (c) B-2, (d) B-52, (e) C-5, (f) C-17, (g) C-130, (h) C-135, (i) E-3, (j) F-22, (k) KC-10, (l) C-21, (m) U-2, (n) T-6, (o) A-10, (p) A-26, (q) P-63, (r) F-16, (s) B-29, (t) T-43
Table 1: Different aircraft classes and the number of images in each class of MTARSI

Types    #images    Types    #images    Types    #images    Types    #images
B-1      513        C-130    763        F-16     372        A-10     345
B-2      619        C-135    526        F-22     846        T-6      248
B-29     321        C-17     480        KC-10    554        A-26     230
B-52     548        C-5      499        C-21     491        P-63     305
Boeing   605        E-3      452        U-2      362        T-43     306
The spatial resolution of the images is between 0.3 and 1.0 meters. During the data set construction process, the sample images for each class in MTARSI are carefully chosen from different airports around the world, mainly in the United States, England, Japan, Russia, France, and China. They were extracted at different times and seasons under different imaging conditions, which increases the intra-class diversity of the data. Different colors of body paint are chosen for the same type of aircraft, and different backgrounds and poses are also taken into account. This way, the intra-class diversity of the data set is greatly increased. Moreover, for each type, aircraft with similar appearance but different models are selected, which gives the data set a certain inter-class similarity. Therefore, this data set is challenging for methods that try to recognize the aircraft type.

In Table 2, we show some samples based on variations of resolution, pose, color, light, background, and model. Some aircraft, such as the B-2 bomber and the KC-10 tanker, are very rare and difficult to capture by satellite remote sensing. This situation hinders the process of collecting and building image data sets. In order to mitigate this problem, we artificially expand the data by simulating images of aircraft that are rare and difficult to observe. The specific process is shown in Figure 8.

First, a Prewitt-operator based segmentation algorithm [50] is applied to an existing image of the aircraft. The Prewitt operator is often used in image processing, particularly within edge detection algorithms [51]. It is a discrete differentiation operator approximating the gradient of the image intensity function.
Table 2: Samples under the variation of resolution, pose, color, time/light, background, and model
With this operator, we can separate the aircraft from the background. A different background image is then selected randomly from relevant remote sensing images (such as airport runways, landing fields, etc.) that do not contain any aircraft. Figure 7 shows four background sample images: cross runway, grass, tarmac, and fog.
Figure 7: Background samples for simulating rare aircraft: (a) cross runway, (b) grass, (c) tarmac, (d) fog.

Figure 8: Flowchart of the remote sensing aircraft image simulation strategy (foreground extraction of the airplane chip, rotations and flips, random selection of a background chip, and embedding)
Finally, the extracted aircraft image is rotated, mirrored, and then combined with the selected background to obtain a final simulated image. The extracted object and the selected background image always have the same spatial resolution and are obtained from the same remote sensing sensor.
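To make the procedure concrete, the following is a minimal sketch of the simulation strategy of Figure 8, assuming that the aircraft chip and the background chip are same-resolution RGB arrays and that the background chip is at least as large as the aircraft chip; the edge threshold and the function names are illustrative and not taken from the paper.

# Minimal sketch of the image simulation strategy in Figure 8 (assumptions noted above).
import random
import numpy as np
from scipy import ndimage

def extract_foreground_mask(gray_chip, threshold=0.15):
    """Approximate foreground mask from the Prewitt gradient magnitude."""
    gx = ndimage.prewitt(gray_chip, axis=0)
    gy = ndimage.prewitt(gray_chip, axis=1)
    edges = np.hypot(gx, gy)
    mask = edges > threshold * edges.max()      # keep strong edges only
    return ndimage.binary_fill_holes(mask)      # close the aircraft body

def simulate_sample(aircraft_rgb, background_rgb):
    """Rotate/flip the aircraft chip and embed it into a background chip."""
    gray = aircraft_rgb.mean(axis=2)
    mask = extract_foreground_mask(gray)
    k = random.choice([0, 1, 2, 3])             # rotation by k * 90 degrees
    aircraft = np.rot90(aircraft_rgb, k=k)
    mask = np.rot90(mask, k=k)
    if random.random() < 0.5:                   # optional mirroring
        aircraft, mask = aircraft[:, ::-1], mask[:, ::-1]
    out = background_rgb.copy()
    h, w = mask.shape
    out[:h, :w][mask] = aircraft[mask]          # paste foreground pixels onto the background
    return out

# Usage (hypothetical inputs): simulate_sample(aircraft_chip, background_chip),
# where both arguments are HxWx3 arrays of the same spatial resolution.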
4. Evaluation of State-of-the-Art Algorithms on the MTARSI Data Set
We now evaluate general identification algorithms and deep learning approaches for aircraft type recognition on our new data set. We first investigate the performance of the typical state-of-the-art (but still off-the-shelf) CNN AlexNet [30].
Transfer learning [38, 49] is typically applied in object recognition. It prescribes two steps, namely pre-training on a different data set and then fine-tuning on the target data set (here: MTARSI). We thus conduct a series of experiments with AlexNet applying pre-training in Section 4.1.

The question of how the fine-tuning parameters affect the final performance is then investigated in Section 4.2. Thus, we establish a baseline for the performance of a very typical contemporary machine learning technique in our domain and on our data set. Any more sophisticated approach which is developed for aircraft type recognition from satellite images should be able to outperform this method.

We then conduct a larger study to compare ten aircraft recognition methods in Section 4.3. Six of them again apply deep learning based on CNNs, while the other four methods are hand-crafted feature approaches. This way, we cover the state of the art on aircraft recognition from remote sensing image data and can provide reproducible results that other researchers can compare with.

Finally, we investigate which of the image features of MTARSI cause problems for which approach in Section 4.4. In summary, we answer questions such as:
• Which types of aircraft were recognized incorrectly by which method in our experiment?
• Which aircraft were confused most often?
• Which variations caused the most trouble for which approach?

As the environment for all experiments, we use a PC with an Intel i7-7700 CPU at 3.6 GHz, 16 GB RAM, and an Nvidia GTX Titan X graphics card. The algorithms are implemented and executed based on the Caffe open source library [52].
4.1. Experiments with Pre-training AlexNet Network using Different Source Domain Data Sets
We first want to analyze the impact of transferring knowledge from different source domains to our target domain, aircraft recognition from remote sensing data. We also experimentally investigate which kind of source data set is appropriate for pre-training deep neural networks for aircraft type recognition from remote sensing images. This experiment therefore uses seven different source domain data sets: UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], ImageNet [20], AID [5], WHU-RS19 [53], and RSSCN7 [54]. AID, WHU-RS19, and RSSCN7 are the latest data sets used to classify remote sensing scenes. For comparison, an additional group of experiments trains the AlexNet network model directly on MTARSI without using transfer learning.
When pre-training the AlexNet model with different data sets, the output of the eighth fully connected layer needs to be modified accordingly; that is, its output dimension should equal the number of categories in the pre-training data set. For example, when using the ImageNet data set for pre-training, the output dimension of the eighth fully connected layer in the AlexNet network model needs to be set to 1000.
As listed in Table 3, AlexNet requires inputs of the fixed size 227 × 227 × 3. After two alternations of convolutional and pooling layers, it produces 256 feature maps as output, each with the size 13 × 13. The third, fourth, and fifth convolutional layers are connected to each other without pooling layers. The third convolutional layer (Conv3) takes the output of the second pooling layer (Pool2) as input and filters it with 384 kernels of size 3 × 3 × 256. The fourth convolutional layer contains 384 kernels of size 3 × 3 × 384, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 384. The series of convolutional layers is followed by a pooling layer (Pool5) and the three fully-connected layers FC6, FC7, and FC8. The output of the last fully-connected layer (FC8) is then a 1000-dimensional vector, i.e., a probability distribution over the 1000 class labels.
Table 3: Details of network configurations of AlexNet used in the experiments

Layer   Input        Kernel   Stride   Pad   Output
Conv1   227×227×3    11×11    4        0     55×55×96
Pool1   55×55×96     3×3      2        0     27×27×96
Conv2   27×27×96     5×5      1        2     27×27×256
Pool2   27×27×256    3×3      2        0     13×13×256
Conv3   13×13×256    3×3      1        1     13×13×384
Conv4   13×13×384    3×3      1        1     13×13×384
Conv5   13×13×384    3×3      1        1     13×13×256
Pool5   13×13×256    3×3      2        0     6×6×256
FC6     6×6×256      6×6      1        0     4096×1
FC7     4096×1       1×1      1        0     4096×1
FC8     4096×1       1×1      1        0     1000×1
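For reference, the configuration in Table 3 can be written down directly. The following is a minimal PyTorch sketch of this architecture (omitting the local response normalization and dropout of the original AlexNet [30]); it is meant only to mirror the table, not the Caffe implementation used in our experiments.

# AlexNet layer configuration following Table 3 (input 227x227x3, FC8 output 1000).
import torch.nn as nn

alexnet_table3 = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),   # Conv1 -> 55x55x96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    # Pool1 -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # Conv2 -> 27x27x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    # Pool2 -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # Conv3 -> 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # Conv4 -> 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # Conv5 -> 13x13x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    # Pool5 -> 6x6x256
    nn.Flatten(),
    nn.Linear(6 * 6 * 256, 4096),                              # FC6
    nn.ReLU(inplace=True),
    nn.Linear(4096, 4096),                                     # FC7
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                     # FC8 (1000 source-domain classes)
)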
When the AlexNet network model has been pre-trained to convergence, the pre-trained parameters of all layers except the eighth fully-connected layer are transferred into the AlexNet model to be fine-tuned. The MTARSI data set is then divided into a training and a test set at a ratio of 4 : 1, i.e., the AlexNet network model is fine-tuned using 80% of MTARSI. The remaining 20% of the data are used to test the network model after fine-tuning and to evaluate its performance in terms of recognition accuracy. Based on the settings described in [30], the learning rate of each layer during fine-tuning was set to 0.001 and reduced to 10% of its previous value every 5000 iterations; the momentum was set to 0.9, the weight decay to 0.0005, and the batch size to 44. The number of iterations depends on the convergence of the network model.
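The fine-tuning setup described above can be summarized in a few lines. The following is a minimal sketch in PyTorch, used here only as a stand-in for the Caffe [52] setup of our experiments; the folder path and the use of torchvision's AlexNet variant (whose channel counts differ slightly from Table 3) are assumptions, while the hyperparameters are the ones listed above.

# Minimal transfer-learning sketch: ImageNet pre-trained AlexNet, FC8 replaced for 20 classes.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 20                                    # aircraft types in MTARSI

model = models.alexnet(pretrained=True)             # pre-trained on ImageNet
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # replace FC8 for the target domain

transform = transforms.Compose([
    transforms.Resize((227, 227)),                  # AlexNet input size
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("mtarsi/train", transform=transform)  # hypothetical path
loader = DataLoader(train_set, batch_size=44, shuffle=True)

# Hyperparameters from the text: lr 0.001, momentum 0.9, weight decay 0.0005,
# learning rate multiplied by 0.1 every 5000 iterations.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.1)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                       # one pass shown; repeat until convergence
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()                                # per-iteration decay schedule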
The experimental results are shown in Table 4. We note that when the
MTARSI data set is used directly for training without transfer learning, the recognition accuracy of the AlexNet network model on MTARSI is 80.13%.
Table 4: Pre-training results using different source domain data sets

Pre-training Data Set   Scale               Categories   Accuracy
Without Pre-training    -                   -            80.13%
WHU-RS19                1005                19           75.65%
UCMerced LandUse        2100                21           74.47%
RSSCN7                  2800                7            77.15%
AID                     10'000              30           80.56%
NWPU-RESISC45           30'400              45           82.52%
PatternNet              31'500              38           81.79%
ImageNet                about 1.2 million   1000         84.61%
However, when the data sets WHU-RS19, UCMerced LandUse, and RSSCN7 are used for pre-training, then even after parameter fine-tuning based on MTARSI, the recognition accuracy is lower. This is called the "negative transfer" phenomenon and means that the knowledge learned from the source domain has a negative impact on the learning in the target domain [49]. The reason for this phenomenon is that these three source domain data sets do not contain enough images and overfitting occurs during the pre-training. The knowledge in the source domain has not been learned well and thus cannot be transferred usefully to the target domain.

If we use the data sets AID, NWPU-RESISC45, PatternNet, or ImageNet as the source domain data set for pre-training, then AlexNet works well after the fine-tuning phase. The accuracy becomes 80.56%, 82.52%, 81.79%, and 84.61%, respectively, which is higher than the 80.13% achieved without transfer learning. The results obtained with pre-training on ImageNet are the best.

When the size of the source domain data set used for pre-training reaches a certain level and there is enough data to ensure that the knowledge in the source domain is grasped, transfer learning will promote the learning on the target domain MTARSI.
Figure 9: Loss and recognition accuracy rate of the AlexNet network on MTARSI for no pre-training (top) and pre-training (bottom) on ImageNet (each panel shows training loss, test loss, and accuracy over the training iterations)
In order to measure the value of the pre-training on the source domain data set, we additionally carry out a group of experiments where we directly train on MTARSI without using transfer learning. In Figure 9, we illustrate the loss and recognition accuracy rate of the AlexNet network on MTARSI without pre-training and with pre-training on ImageNet, respectively. It can clearly be seen that the AlexNet network model using transfer learning has better recognition accuracy and that the convergence speed and convergence behavior on the training set and the test set are better than without transfer learning.
Table 5: Different learning rate settings and the corresponding experimental results

Model     Conv-1   Conv-2   Conv-3   Conv-4   Conv-5   FC-6     FC-7     FC-8     Accuracy
Model-1   0.001    0.001    0.001    0.001    0.001    0.001    0.001    0.001    85.61%
Model-2   0        0.001    0.001    0.001    0.001    0.001    0.001    0.001    85.61%
Model-3   0        0        0.001    0.001    0.001    0.001    0.001    0.001    85.61%
Model-4   0        0        0        0.001    0.001    0.001    0.001    0.001    84.31%
Model-5   0        0        0        0        0.001    0.001    0.001    0.001    83.56%
Model-6   0        0        0        0        0        0.001    0.001    0.001    82.35%
Model-7   0        0        0        0        0        0        0.001    0.001    81.52%
Model-8   0        0        0        0        0        0        0        0.001    80.27%
4.2. Experiments of Layers in AlexNet Network under Different Learning Rate Settings
We now investigate the different layers of the AlexNet network model under different learning rates, in order to further study the inner characteristics of deep neural networks when processing remote sensing images. First, the ImageNet data set was used to pre-train the AlexNet network model, since we know that this has a very positive impact on the accuracy. Then, in the parameter transfer phase, eight different learning rate settings for the layers of the fine-tuned AlexNet network model were investigated. Each setting corresponds to a model, as shown in Table 5. These models were again fine-tuned using 80% of the MTARSI data set and tested on the remaining 20% of the data.

In Table 5, the learning rate "0" indicates that the layer parameters are transferred from the pre-training stage without fine-tuning on MTARSI. As before, the learning rate is reduced to 10% of its previous value every 5000 iterations, the momentum is set to 0.9, the weight decay is set to 0.0005, the batch size is set to 44, and the number of iterations depends on the convergence of the network model.
In neural networks, the low-level convolutional layers generally extract common features such as edges and textures of the image, while the high-level convolutional layers and the fully connected layers extract more task-related high-level features [55]. As can be seen from Table 5, Model-2 only uses the parameters of the first convolutional layer as transferred from the pre-trained AlexNet network, without fine-tuning them on the target domain, and it achieves a recognition accuracy equivalent to that of Model-1. The same is true for Model-3. This shows that the features extracted by the first few convolutional layers of a convolutional neural network have a certain common semantics.

The experimental results of Model-4 to Model-8 show that learning suffers if no task-related parameter fine-tuning is performed for the high-level convolutional layers and the fully connected layers after the parameter migration. From this, we can conclude that the low-level parameters of pre-trained CNNs can be used directly for the target task, while fine-tuning of the high-level parameters is necessary.
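The learning rate settings of Table 5 amount to freezing the transferred layers (learning rate 0) and fine-tuning the remaining ones. Below is a minimal sketch of this idea for Model-4 (Conv1 to Conv3 frozen), again using PyTorch only as a stand-in for the Caffe configuration; note that torchvision's AlexNet variant differs slightly from Table 3 in its channel counts.

# Sketch of the Table 5 setup: transferred layers frozen, higher layers fine-tuned at lr 0.001.
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)             # pre-trained on ImageNet
model.classifier[6] = nn.Linear(4096, 20)            # FC8 resized to the 20 MTARSI classes

# In torchvision's AlexNet, features[0], [3], [6] correspond to Conv1, Conv2, Conv3.
for idx in (0, 3, 6):
    for p in model.features[idx].parameters():
        p.requires_grad = False                      # learning rate effectively 0 (frozen)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.001, momentum=0.9, weight_decay=0.0005)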
4.3. Experiments of Different Classification Algorithms on MTARSI
We now investigate the performance of five state-of-the-art CNN structures on MTARSI, namely VGG [31], GoogLeNet [32], ResNet [33], DenseNet [34], and EfficientNet [35].
In the experiment, the five CNNs again use the ImageNet data set for pre-training. The MTARSI data set is also again divided into a training and a test set at a ratio of 4 : 1, and the size of the images is fixed to 256 × 256 pixels. As in the experiment in Section 4.1, the learning rate of each layer in the parameter fine-tuning stage is uniformly set to 0.001 and reduced to 10% of its previous value every 5000 iterations; the momentum is 0.9, the weight decay is 0.0005, the batch size is 44, and the number of iterations depends on the convergence of the network model.

For comparison purposes, we also apply four hand-crafted feature-based approaches [26, 27, 28, 29]. For local patch descriptors such as SIFT and HOG, we use a fixed-size grid (16 × 16 pixels) with a spacing step of 8 pixels to extract all descriptors in an image and adopt average pooling over each dimension of the descriptors to obtain the final image features. We fix the dictionary size to 4096 for BOVW and LLC, and to 256 for ScSPM.
Table 6: Experimental results of different classification algorithms on the MTARSI data set

Method            Accuracy
SIFT [26]+BOVW    59.02%
HOG [27]+SVM      61.34%
ScSPM [28]        60.61%
LLC [29]          64.93%
AlexNet           85.61%
GoogLeNet         86.53%
VGG               87.56%
ResNet            89.61%
DenseNet          89.15%
EfficientNet      89.79%
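As one illustration of the hand-crafted baselines, the following is a minimal sketch of a HOG feature extractor combined with a linear SVM using scikit-image and scikit-learn; it is not the exact configuration of [27] used in our experiments, and the image lists and labels are placeholders.

# Minimal HOG + linear SVM baseline sketch (assumptions noted above).
# `train_images` / `test_images` are lists of 256x256 grayscale arrays,
# `train_labels` the corresponding aircraft-type labels.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_feature(image):
    # 16x16 cells roughly mirror the 16x16-pixel patch grid described above.
    return hog(image, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

def fit_hog_svm(train_images, train_labels):
    X = np.array([hog_feature(img) for img in train_images])
    clf = LinearSVC(C=1.0)
    clf.fit(X, train_labels)
    return clf

def predict(clf, test_images):
    X = np.array([hog_feature(img) for img in test_images])
    return clf.predict(X)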
Table 6 shows that the aircraft type recognition algorithms based on transfer learning are far superior to the traditional classification methods in aircraft recognition performance; the average accuracy is improved by nearly 26%. This also indicates that, for aircraft type recognition from remote sensing images, methods based on CNNs can extract higher-level features with more generalization and image expression ability than the traditional feature-based methods. They can also more effectively distinguish aircraft of different types.

We further find that ResNet, DenseNet, and EfficientNet have similar performance, while AlexNet, GoogLeNet, and VGG perform slightly worse. In terms of recognition time, AlexNet and EfficientNet perform best, while the others are similar. Note that AlexNet has only eight layers, which is much shallower than the other deep models. For EfficientNet, a compound scaling method is used to balance network width, depth, and resolution. With this, inference via EfficientNet is 6.1 times faster than with ResNet [35].
4.4. Analysis of Image Features of MTARSI

Based on our experiments with different classification algorithms, we now investigate which of the image features of MTARSI are most problematic. We first study the recognition rate for each type of aircraft over the different classification methods and check which types were confused most often. In order to find out which variations caused the most trouble for which approach, we further conduct a new group of experiments on a validation set of size 600, built by randomly selecting 100 images from the MTARSI test set for each of the variations resolution, pose, color, light, background, and aircraft model.

In Figure 10, the confusion matrices on MTARSI for SIFT [26], HOG [27], ScSPM [28], and LLC [29] are presented. Figure 11 provides the confusion matrices on MTARSI for AlexNet, GoogLeNet, VGG, ResNet, DenseNet, and EfficientNet. We find that the hand-crafted feature based methods do generalize, but suffer from some misclassifications of similar aircraft types or models, such as C-5 and Boeing or B-52 and C-5. The deep neural network based methods exhibit good performance except on some extremely similar aircraft. We also find that there are significant differences in the recognition performance of these two kinds of methods for each specific type.
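The per-type analysis relies on row-normalized confusion matrices. A minimal sketch of how such a matrix, the per-type recognition rates, and the most frequently confused pairs can be computed is shown below; the use of scikit-learn is an assumption, and y_true and y_pred stand for the test labels and a method's predictions.

# Row-normalized confusion matrix and per-class recognition rates, as used for Figures 10 and 11.
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)     # each row sums to 1
    return cm

def per_class_accuracy(cm):
    return np.diag(cm)                       # recognition rate of each aircraft type

def most_confused_pairs(cm, labels, k=5):
    off = cm.copy()
    np.fill_diagonal(off, 0.0)               # ignore correct classifications
    flat = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, off.shape)
    return [(labels[i], labels[j], off[i, j]) for i, j in zip(rows, cols)]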
We conclude that MTARSI is a challenging benchmark for most of the state-of-the-art methods. It is very likely that it can help to design, analyze, and improve practical applications of aircraft identification. For this purpose, deep learning models like ResNet, DenseNet, and EfficientNet lend themselves. We illustrate the recognition rates of the state-of-the-art methods on the validation set covering the six variations, i.e., resolution, pose, color, light, background, and aircraft model, in Figure 12. We can see that the aircraft model and the background are the two biggest factors that influence the methods based on hand-crafted features, while the image resolution is the biggest factor affecting the performance of the deep learning methods.
Figure 10: Confusion matrices on MTARSI when using (a) SIFT [26]+BOVW, (b) HOG [27]+SVM, (c) ScSPM [28], and (d) LLC [29]
5. Conclusion and Future Work

The recognition of aircraft types from remote sensing images attracts much research interest. In this paper, we first reviewed several data sets commonly used for aircraft recognition. We found that all of them are either not suitable for our domain or have significant shortcomings. This inspired us to develop and publish a new benchmark data set for aircraft type recognition from remote sensing images, MTARSI.

We then reviewed the state of the art in this domain and found that, as a consequence of the lack of suitable benchmarks, most approaches were evaluated on different data sets under different experimental settings. It is thus
26
Journal Pre-proof
[Figure 11: confusion-matrix values omitted; panels (a) AlexNet, (b) GoogLeNet, (c) VGGNet, (d) ResNet, (e) DenseNet, (f) EfficientNet; both axes list the 20 MTARSI aircraft types (B-1, B-2, B-29, B-52, Boeing, C-5, C-17, C-21, C-130, C-135, E-3, F-16, F-22, KC-10, U-2, A-10, A-26, P-63, T-6, T-43).]
Figure 11: Confusion Matrices on MTARSI when using AlexNet, GoogLeNet, VGG, ResNet, DenseNet, and EfficientNet
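As an illustrative aid only, and not the exact evaluation code used in this work, row-normalized confusion matrices of the kind shown in Figures 10 and 11 can be computed from true and predicted labels with a few lines of Python; scikit-learn is assumed to be available, and the label lists below are placeholders for real test-set predictions.

# Illustrative sketch (assumed tooling, not the authors' pipeline): build a
# row-normalized confusion matrix over the 20 MTARSI classes from label lists.
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["B-1", "B-2", "B-29", "B-52", "Boeing", "C-5", "C-17", "C-21",
           "C-130", "C-135", "E-3", "F-16", "F-22", "KC-10", "U-2",
           "A-10", "A-26", "P-63", "T-6", "T-43"]

def normalized_confusion(y_true, y_pred, labels=CLASSES):
    # Count predictions per (true class, predicted class) pair ...
    cm = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
    # ... and convert each row into fractions of that true class.
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1.0)  # guard against empty classes

# Placeholder labels; in practice these come from a trained classifier.
y_true = ["B-1", "B-1", "F-16", "T-43"]
y_pred = ["B-1", "B-2", "F-16", "T-43"]
print(np.round(normalized_confusion(y_true, y_pred), 2))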
[Figure 12: bar values omitted; the categories shown are Pose, Color, Time/light, Background, Model, and Resolution; the vertical axis spans 0.00 to 1.00.]
Figure 12: Experimental results of the state-of-the-art methods on the validation set
It is thus hard to fairly compare the results published in the literature. To solve this problem, we make MTARSI available in the online
repository [18]. MTARSI can be used to assess and evaluate the performance of aircraft type recognition algorithms on remote sensing images. It also benefits the development of image processing, object recognition algorithms, and vision technology for remote sensing.
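As a purely hypothetical usage sketch, and assuming the archive from the online repository [18] is extracted into one folder per aircraft type (a directory layout we assume here for illustration; the actual organization is documented with the data set), the images could be loaded with a standard folder-based loader such as torchvision's ImageFolder:

# Hypothetical loading sketch; assumes an extracted directory "MTARSI/" with one
# sub-folder per aircraft type (e.g. MTARSI/B-1/, MTARSI/F-16/, ...).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # a common input size for the CNNs compared above
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("MTARSI", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

print(len(dataset), "images across", len(dataset.classes), "classes")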
We then evaluated a set of representative aircraft type recognition approaches with various experimental protocols on the new data set. We found that our data set makes it possible to clearly distinguish the performance of different methods. We also report which characteristics of the images contained in the data set pose difficulties for which approach. Researchers using MTARSI will therefore have a sound baseline of results to compare with, as well as pointers for where to look if their experimental results leave room for improvement.
In our future work, we will use MTARSI to develop better aircraft recognition methods. We will also expand MTARSI by collecting more data and by taking other object types in remote sensing images into account.
Acknowledgements. We acknowledge support from the Youth Project of the Provincial Natural Science Foundation of Anhui (1908085QF285 and 1908085MF184), the Scientific Research and Development Fund of Hefei University (19ZR05ZDA), the Talent Research Fund Project of Hefei University (18-19RC26), the National Natural Science Foundation of China under grants 61673359 and 61672204, and grant KJ2018A0555 of the Key Scientific Research Foundation of the Education Department of Anhui Province.
Disclosure. All authors declare that there are no conflicts of interest. All authors declare that they have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of the work described in this manuscript. Declarations of interest: none. All authors approve the final article.

References
[1] J. Chen, B. Zhang, C. Wang, Backscattering feature analysis and recognition of civilian aircraft in TerraSAR-X images, IEEE Geoscience and Remote Sensing Letters 12 (4) (2015) 796–800. doi:10.1109/LGRS.2014.2362845.
[2] Y. Zhong, A. Ma, Y.-S. Ong, Z. Zhu, L. Zhang, Computational intelligence in optical remote sensing image processing, Applied Soft Computing 64 (2018) 75–93. doi:10.1016/j.asoc.2017.11.045.
[3] G. Liu, X. Sun, K. Fu, H. Wang, Aircraft recognition in high-resolution satellite images using coarse-to-fine shape prior, IEEE Geoscience and Remote Sensing Letters 10 (3) (2013) 573–577. doi:10.1109/LGRS.2012.2214022.
[4] A. Zhao, K. Fu, S. Wang, J. Zuo, Y. Zhang, Y. Hu, H. Wang, Aircraft recognition based on landmark detection in remote sensing images, IEEE Geoscience and Remote Sensing Letters 14 (8) (2017) 1413–1417. doi:10.1109/LGRS.2017.2715858.
[5] G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, X. Lu, AID: A benchmark data set for performance evaluation of aerial scene classification, IEEE Transactions on Geoscience and Remote Sensing 55 (7) (2017) 3965–3981. doi:10.1109/TGRS.2017.2685945.
[6] K. Fu, W. Dai, Y. Zhang, Z. Wang, M. Yan, X. Sun, Multicam: Multiple class activation mapping for aircraft recognition in remote sensing images, Remote Sensing 11 (5) (2019) 544. doi:10.3390/rs11050544.
[7] Y. Yang, S. D. Newsam, Bag-of-visual-words and spatial extensions for land-use classification, in: D. Agrawal, P. Zhang, A. El Abbadi, M. F. Mokbel (Eds.), 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, November 3-5, 2010, San Jose, CA, USA, Proceedings, ACM, 2010, pp. 270–279. doi:10.1145/1869790.1869829.
[8] G. Cheng, J. Han, X. Lu, Remote sensing image scene classification: Benchmark and state of the art, Proceedings of the IEEE 105 (10) (2017) 1865–1883. doi:10.1109/JPROC.2017.2675998.
[9] W. Zhou, S. Newsam, C. Li, Z. Shao, PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval, ISPRS Journal of Photogrammetry and Remote Sensing 145 (2018) 197–209.
[10] S. Maji, E. Rahtu, J. Kannala, M. B. Blaschko, A. Vedaldi, Fine-grained visual classification of aircraft, CoRR abs/1306.5151. arXiv:1306.5151. URL: http://arxiv.org/abs/1306.5151.
[11] Q. Wu, H. Sun, X. Sun, D. Zhang, K. Fu, H. Wang, Aircraft recognition in high-resolution optical satellite remote sensing images, IEEE Geoscience and Remote Sensing Letters 12 (1) (2015) 112–116. doi:10.1109/LGRS.2014.2328358.
[12] J. Zuo, G. Xu, K. Fu, X. Sun, H. Sun, Aircraft type recognition based on segmentation with deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 15 (2) (2018) 282–286. doi:10.1109/LGRS.2017.2786232.
[13] J.-W. Hsieh, J.-M. Chen, C.-H. Chuang, K.-C. Fan, Aircraft type recognition in satellite images, IEE Proceedings - Vision, Image and Signal Processing 152 (3) (2005) 307–315.
[14] C. Xu, H. Duan, Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft, Pattern Recognition Letters 31 (13) (2010) 1759–1772. doi:10.1016/j.patrec.2009.11.018.
[15] W. Diao, X. Sun, F. Dou, M. Yan, H. Wang, K. Fu, Object recognition in remote sensing images using sparse deep belief networks, Remote Sensing Letters 6 (10) (2015) 745–754.
[16] Y. Zhang, H. Sun, J. Zuo, H. Wang, G. Xu, X. Sun, Aircraft type recognition in remote sensing images based on feature learning with conditional generative adversarial networks, Remote Sensing 10 (7) (2018) 1123. doi:10.3390/rs10071123.
[17] T. Weise, X. Wang, Q. Qi, B. Li, K. Tang, Automatically discovering clusters of algorithm and problem instance behaviors as well as their causes from experimental data, algorithm setups, and instance features, Applied Soft Computing 73 (2018) 366–382. doi:10.1016/j.asoc.2018.08.030.
[18] Z. Wu, Multi-type Aircraft of Remote Sensing Images: MTARSI, Zenodo.org, 2019. doi:10.5281/zenodo.3464319.
[19] M. Jian, Q. Qi, H. Yu, J. Dong, C. Cui, X. Nie, H. Zhang, Y. Yin, K.-M. Lam, The extended marine underwater environment database and baseline evaluations, Applied Soft Computing.
[20] J. Deng, W. Dong, R. Socher, L. Li, K. Li, F. Li, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, IEEE Computer Society, 2009, pp. 248–255. doi:10.1109/CVPRW.2009.5206848.
[21] Y. Li, K. Fu, H. Sun, X. Sun, An aircraft detection framework based on reinforcement learning and convolutional neural networks in remote sensing images, Remote Sensing 10 (2) (2018) 243. doi:10.3390/rs10020243.
[22] Y. Xu, M. Zhu, P. Xin, S. Li, M. Qi, S. Ma, Rapid airplane detection in remote sensing images based on multilayer feature fusion in fully convolutional neural networks, Sensors 18 (7) (2018) 2335. doi:10.3390/s18072335.
[23] G. Wang, X. Wang, B. Fan, C. Pan, Feature extraction by rotation-invariant matrix representation for object detection in aerial image, IEEE Geoscience and Remote Sensing Letters 14 (6) (2017) 851–855. doi:10.1109/LGRS.2017.2683495.
[24] F. Zhang, B. Du, L. Zhang, M. Xu, Weakly supervised learning based on coupled convolutional neural networks for aircraft detection, IEEE Transactions on Geoscience and Remote Sensing 54 (9) (2016) 5553–5563. doi:10.1109/TGRS.2016.2569141.
[25] Y. Tan, Q. Li, Y. Li, J. Tian, Aircraft detection in high-resolution SAR images based on a gradient textural saliency map, Sensors 15 (9) (2015) 23071–23094. doi:10.3390/s150923071.
[26] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110. doi:10.1023/B:VISI.0000029664.99615.94.
[27] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, IEEE Computer Society, 2005, pp. 886–893. doi:10.1109/CVPR.2005.177.
[28] J. Yang, K. Yu, Y. Gong, T. S. Huang, Linear spatial pyramid matching using sparse coding for image classification, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, IEEE Computer Society, 2009, pp. 1794–1801. doi:10.1109/CVPRW.2009.5206757.
[29] K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, A. Culotta (Eds.), Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, Vancouver, British Columbia, Canada, Curran Associates, Inc., 2009, pp. 2223–2231.
[30] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, 2012, pp. 1106–1114.
[31] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society, 2015, pp. 1–9. doi:10.1109/CVPR.2015.7298594.
[33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.
[34] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[35] M. Tan, Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, arXiv preprint arXiv:1905.11946.
[36] L. M. Dang, S. I. Hassan, I. Suhyeon, A. Kumar Sangaiah, I. Mehmood, S. Rho, S. Seo, H. Moon, UAV based wilt detection system via convolutional neural networks, Sustainable Computing: Informatics and Systems.
[37] J. G. Ha, H. Moon, J. T. Kwak, S. I. Hassan, M. Dang, O. N. Lee, H. Y. Park, Deep convolutional neural network for classifying Fusarium wilt of radish from unmanned aerial vehicles, Journal of Applied Remote Sensing 11 (4) (2017) 042621.
[38] S. Chakraborty, M. Roy, A neural approach under transfer learning for domain adaptation in land-cover classification using two-level cluster mapping, Applied Soft Computing 64 (2018) 508–525. doi:10.1016/j.asoc.2017.12.018.
[39] Z. Chai, W. Song, H. Wang, F. Liu, A semi-supervised auto-encoder using label and sparse regularizations for classification, Applied Soft Computing 77 (2019) 205–217. doi:10.1016/j.asoc.2019.01.021.
[40] X. Yang, H. Sun, K. Fu, J. Yang, X. Sun, M. Yan, Z. Guo, Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks, Remote Sensing 10 (1) (2018) 132. doi:10.3390/rs10010132.
[41] K. Fu, Y. Li, H. Sun, X. Yang, G. Xu, Y. Li, X. Sun, A ship rotation detection model in remote sensing images based on feature fusion pyramid network and deep reinforcement learning, Remote Sensing 10 (12) (2018) 1922. doi:10.3390/rs10121922.
[42] J. C. Rangel, J. Martínez-Gómez, C. Romero-González, I. García-Varea, M. Cazorla, Semi-supervised 3D object recognition through CNN labeling, Applied Soft Computing 65 (2018) 603–613. doi:10.1016/j.asoc.2018.02.005.
[43] Y. Chen, X. Yang, B. Zhong, S. Pan, D. Chen, H. Zhang, CNNTracker: Online discriminative object tracking via deep convolutional neural network, Applied Soft Computing 38 (2016) 1088–1098. doi:10.1016/j.asoc.2015.06.048. URL: https://doi.org/10.1016/j.asoc.2015.06.048.
[44] Z. Yan, M. Yan, H. Sun, K. Fu, J. Hong, J. Sun, Y. Zhang, X. Sun, Cloud and cloud shadow detection using multilevel feature fused segmentation network, IEEE Geoscience and Remote Sensing Letters 15 (10) (2018) 1600–1604. doi:10.1109/LGRS.2018.2846802.
[45] X. Gao, X. Sun, Y. Zhang, M. Yan, G. Xu, H. Sun, J. Jiao, K. Fu, An end-to-end neural network for road extraction from remote sensing imagery by multiple feature pyramid network, IEEE Access 6 (2018) 39401–39414. doi:10.1109/ACCESS.2018.2856088.
[46] M. Larsson, Y. Zhang, F. Kahl, Robust abdominal organ segmentation using regional convolutional neural networks, Applied Soft Computing 70 (2018) 465–471. doi:10.1016/j.asoc.2018.05.038.
[47] M. Mittal, L. M. Goyal, S. Kaur, I. Kaur, A. Verma, D. J. Hemanth, Deep learning based enhanced tumor segmentation approach for MR brain images, Applied Soft Computing 78 (2019) 346–354. doi:10.1016/j.asoc.2019.02.036.
[48] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common objects in context, in: D. J. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.), Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, Vol. 8693 of Lecture Notes in Computer Science, Springer, 2014, pp. 740–755. doi:10.1007/978-3-319-10602-1_48.
[49] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2010) 1345–1359. doi:10.1109/TKDE.2009.191.
[50] J. R. Dim, T. Takamura, Alternative approach for satellite cloud classification: edge gradient application, Advances in Meteorology 2013.
[51] J. M. Prewitt, Object enhancement and extraction, Picture Processing and Psychopictorics 10 (1) (1970) 15–19.
[52] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in: K. A. Hua, Y. Rui, R. Steinmetz, A. Hanjalic, A. Natsev, W. Zhu (Eds.), Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03 - 07, 2014, ACM, 2014, pp. 675–678. doi:10.1145/2647868.2654889.
[53] G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, H. Maître, Structural high-resolution satellite image indexing, in: ISPRS TC VII Symposium - 100 Years ISPRS, Vol. 38, 2010, pp. 298–303.
[54] Q. Zou, L. Ni, T. Zhang, Q. Wang, Deep learning based feature selection for remote sensing scene classification, IEEE Geoscience and Remote Sensing Letters 12 (11) (2015) 2321–2325. doi:10.1109/LGRS.2015.2475299.
[55] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: D. J. Fleet, T. Pajdla, B. Schiele, T. Tuytelaars (Eds.), Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, Vol. 8689 of Lecture Notes in Computer Science, Springer, 2014, pp. 818–833. doi:10.1007/978-3-319-10590-1_53.
Highlights
1. We construct and present a diverse remote sensing image data set for aircraft type recognition.
2. We present a comprehensive review of up-to-date algorithms for aircraft type recognition of remote sensing images.
3. We provide the results obtained with state-of-the-art approaches in the field, to serve as baseline for comparison for researchers working on remote sensing computer vision.
Declaration of Interest Statement

Article Title: A Benchmark Data Set for Aircraft Type Recognition from Remote Sensing Images
Authors: Zhi-Ze Wu, Shou-Hong Wan, Xiao-Feng Wang, Ming Tan, Le Zou, Xin-Lu Li, Yan Chen
All authors of this manuscript have directly participated in planning, execution, and/or analysis of this study.
The contents of this manuscript have not been copyrighted or published previously. The contents of this manuscript are not now under consideration for publication elsewhere.
The contents of this manuscript will not be copyrighted, submitted, or published elsewhere while acceptance by Applied Soft Computing is under consideration.
There are no directly related manuscripts or abstracts, published or unpublished, by any authors of this manuscript.
No financial support or incentive has been provided for this manuscript (if financial support was provided, please specify below).

I am one author signing on behalf of all co-authors of this manuscript, and attesting to the above.

Signature:
Date: May 19, 2019
Printed Name: Xiao-Feng Wang
Institution: Department of Computer Science and Technology, Hefei University