A benchmark data set for aircraft type recognition from remote sensing images


PII: S1568-4946(20)30072-7
DOI: https://doi.org/10.1016/j.asoc.2020.106132
Reference: ASOC 106132

To appear in: Applied Soft Computing Journal

Received date: 19 May 2019
Revised date: 16 January 2020
Accepted date: 23 January 2020

Please cite this article as: Z.-Z. Wu, S.-H. Wan, X.-F. Wang et al., A benchmark data set for aircraft type recognition from remote sensing images, Applied Soft Computing Journal (2020), doi: https://doi.org/10.1016/j.asoc.2020.106132. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Elsevier B.V. All rights reserved.


A Benchmark Data Set for Aircraft Type Recognition from Remote Sensing Images

Zhi-Ze Wu (a), Shou-Hong Wan (b), Xiao-Feng Wang (c,*), Ming Tan (c), Le Zou (c), Xin-Lu Li (a), Yan Chen (a)

a Institute of Applied Optimization, School of Artificial Intelligence and Big Data, Hefei University; Jinxiu Dadao 99, Hefei, Anhui, China, 230601.
b School of Computer Science and Technology, University of Science and Technology of China; Hefei, Anhui, China, 230027.
c School of Artificial Intelligence and Big Data, Hefei University; Jinxiu Dadao 99, Hefei, Anhui, China, 230601.

Abstract

Aircraft type recognition from remote sensing images has many civil and military applications. In images obtained with modern technologies such as high spatial resolution remote sensing, even details of aircraft can become visible. This makes the identification of aircraft types from remote sensing images possible. However, the existing methods for this purpose have mostly been evaluated on different data sets and under different experimental settings. This makes it hard to compare their results and to judge the progress in the field. Moreover, the data sets used are often not publicly available, which makes it difficult to reproduce the works for fair comparison. This severely limits the progress of research, and the state of the art is not entirely clear. To address this problem, we introduce a new benchmark data set for aircraft type recognition from remote sensing images. This data set, called Multi-Type Aircraft Remote Sensing Images (MTARSI), contains 9'385 images of 20 aircraft types, with complex backgrounds, different spatial resolutions, and complicated variations in pose, spatial location, illumination, and time period. The publicly available MTARSI data set allows researchers to develop more accurate and robust methods for both remote sensing image processing and interpretation analysis of remote sensing objects. We also provide a performance analysis of state-of-the-art aircraft type recognition and deep learning approaches on MTARSI, which serves as a baseline result on this benchmark.

* Corresponding author
Email addresses: [email protected] (Zhi-Ze Wu), [email protected] (Shou-Hong Wan), [email protected] (Xiao-Feng Wang), [email protected] (Ming Tan), [email protected] (Le Zou), [email protected] (Xin-Lu Li), [email protected] (Yan Chen)
URL: iao.hfuu.edu.cn (Zhi-Ze Wu)

Preprint submitted to Applied Soft Computing, January 16, 2020

Keywords: Benchmark, Airplane Type Recognition, Remote Sensing Images, Pattern Recognition

1. Introduction

With the rapid development of technology, remote sensing images have become a data source of great significance, which is widely used in civil [1] and military fields [2]. Among the many possible objects to detect from remote sensing images, aircraft are a key concern. Aircraft are not only an important means of transportation in the civil sector, but also a strategic objective in the military field. Here, dynamic aircraft detection and type identification can provide information of great significance for a timely analysis of the battlefield situation and the formulation of military decisions [2, 3]. Civil applications such as emergency aircraft search, airport surveillance, and aircraft detection are highly important as well [1, 4].

Due to the continuous improvement of the spatial resolution of remote sensing cameras, the information contained in single images is growing. The currently existing automated interpretation approaches for identifying aircraft from such images are unable to meet the needs of many real-world applications [5]. Therefore, the question of how to accurately and quickly extract the position and type of aircraft has come into the focus of research.

In particular, aircraft type recognition has attracted significant attention from academia and industry. Aircraft differ in details such as model and type, as illustrated in Figure 1, where three types of transport aircraft, C-5, C-17, and C-130, are shown in sequence. Even though these aircraft have different uses and functions, they are quite similar in appearance.


Figure 1: Three types of transport aircraft: C-5, C-17, and C-130

In addition, complex backgrounds and the characteristics of the image sources, i.e., different satellites, cause additional difficulties for aircraft recognition. The apparent shape, shadow, and color of an object may further be influenced by the solar radiation angle, the radar radiation angle, and the surrounding environment [6]. Therefore, recognizing the aircraft type from remote sensing images remains a challenging problem.

In recent years, several data sets with images that can be used to train aircraft detection algorithms have been created, including UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], and FGVC-Aircraft [10]. However, in UCMerced LandUse, NWPU-RESISC45, and PatternNet, aircraft are only a sub-category and appear only in a subset of the data. Also, there is no distinction between their types. While FGVC-Aircraft can be used for fine-grained visual categorization of aircraft, it is not based on remote sensing images and instead contains natural scenes. In summary, the publicly available data sets are not suitable for the domain of aircraft type recognition from remote sensing images.

The currently existing approaches for aircraft type recognition can coarsely be divided into three categories, namely methods based on template matching [4, 11, 12], methods based on shallow feature learning models [3, 13, 14], and methods based on deep neural networks [4, 6, 12, 15, 16]. Most of these works were evaluated under different experimental settings and on different data sets, which, moreover, are not publicly available. This makes it hard to judge the overall progress in the field and also reduces the reproducibility of results very significantly.


A benchmark data set is needed, which should be permanently publicly available and which would make it easy to perform fair algorithm comparisons. Such a benchmark would accelerate and promote the development of the research field [17]. The data set should demonstrate all the challenging aspects of the problem, for instance, the high diversity in the geometrical and spatial patterns of aircraft in remote sensing images. Along with the data set, a comprehensive set of results obtained with state-of-the-art approaches should be provided, so that new works can directly be compared to tested, published, and verified results. In this paper, we present such a benchmark data set, called the Multi-Type Aircraft Remote Sensing Images (MTARSI) [18]. It contains 9'385 remote sensing images of 20 aircraft types, with complex backgrounds, different spatial resolutions, and complicated variations in pose, spatial location, illumination, and time period. As discussed in the benchmarking study [19], we describe the collection, simulation, labeling, and categorization of the samples for the data set in detail. This will allow researchers to extend the data set using the same approach as us, if the need for a larger data set should arise due to the progress in technology, or to develop similar data sets for other domains. We then compare and evaluate the performance of several typical aircraft type recognition and deep learning approaches on MTARSI in order to provide proper baseline results.

The contributions of this paper are three-fold:

1. We construct, present, and publish MTARSI – the first public aircraft remote sensing image database. By doing so, we significantly aid the development of robust methods for aircraft type recognition from remote sensing images.

2. We describe in detail how this data set is created by applying image simulation methods supported by data gathered from online sources. This allows for applying our approach to generate high-quality benchmark image data sets for other application areas.

3. We provide the results obtained by state-of-the-art approaches in the field, to serve as baselines for comparison for researchers working on remote sensing computer vision. We show that MTARSI can indeed identify the strengths and weaknesses of different aircraft type recognition algorithms for remote sensing images.

Figure 2: A subset of the images in the UCMerced LandUse Data Set [7]

The remainder of this paper is organized as follows. In Section 2, we review existing data sets for aircraft recognition and the state of the art in aircraft type recognition from remote sensing images. Then, the design and contents of MTARSI are presented in Section 3. The performance of several state-of-the-art aircraft type recognition methods is evaluated on MTARSI in Section 4. Finally, the paper closes with a conclusion and discussion in Section 5, where we also describe how to obtain the MTARSI data set from its public repository.

2. Related Work

In this section, we first review several data sets commonly used for aircraft recognition and then present a comprehensive analysis of the state of the art in aircraft type recognition from remote sensing images.


2.1. Existing Data Sets for Aircraft Recognition

At present, there are four popular data sets for aircraft recognition, namely UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], and FGVC-Aircraft [10].

The UCMerced LandUse Data Set [7] was established at the University of California, Merced, USA. It is one of the most widely used data sets in the field of remote sensing, especially for various remote sensing image classification tasks. The images in UCMerced LandUse were first selected from the United States Geological Survey National Map and then randomly cropped into sizes of 256 × 256 pixels. The spatial resolution is 1 foot (about 0.3 meters). UCMerced LandUse contains a total of 21 different scene categories: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. The data set has a total of 2100 images, 100 in each category. Note that "airplane" is only a sub-category of this data set, i.e., there are only 100 images relevant to our domain. Figure 2 shows some examples of this data set.

The NWPU-RESISC45 Data Set [8] was created at the Northwestern Polytechnical University, Xi'an, China. The data set is by far the largest open-source remote sensing imagery data set. It contains a total of 31'500 images, which are based on an investigation of existing data sets and a selection of the most representative images of 45 remote sensing scene categories, including aircraft and airports, but also baseball fields, basketball courts, beaches, bridges, bushes, churches, circular farmland, clouds, commercial areas, industrial areas, crossroads, railways, boats, snow mountains, and so on. Each category in the data set has 700 images with an image size of 256 × 256 pixels and a spatial resolution of between 0.2 and 30 meters. Aircraft are again a sub-category of this data set, this time including 700 images. Figure 3 shows some examples of this data set.


Figure 3: A subset of the images in the NWPU-RESISC45 Data Set [8]


The PatternNet Data Set [9] was established at the State Key Laboratory of Surveying and Mapping Remote Sensing Information Engineering of Wuhan University, Wuhan, China and the University of California, Merced, USA. The data set is currently one of the largest high spatial resolution remote sensing image data sets, with a total of 38 categories, such as airplanes, baseball fields, basketball courts, beaches, bridges, cemeteries, shrubs, tanks, swimming pools, tennis courts, substations, and so on. Each category contains 800 images with a size of 256 × 256 pixels and spatial resolutions ranging from 0.062 m to 4.693 m. The images of the PatternNet data set are obtained from the Google Earth Satellite Imagery and the Google Maps US Urban Interface. Figure 4 shows some examples from this data set.

Figure 4: A subset of the images in the PatternNet Data Set [9]

The Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft [10]) is a benchmark data set for the visual categorization of aircraft. The data set contains 10'200 images of aircraft, with 100 images for each of 102 different aircraft model variants, most of which are airplanes [10]. The (main) aircraft in each image is annotated with a tight bounding box and a hierarchical airplane model label. Aircraft models are organized in a hierarchy with four levels [10]. These are, from finer to coarser:

• Model, e.g., Boeing 737-76J: Since certain models are nearly visually indistinguishable, this level is usually not used in the evaluation of algorithms.

• Variant, e.g., Boeing 737-700: A variant collapses the models that are visually indistinguishable into one class [10]. The data set comprises 102 different variants.

• Family, e.g., Boeing 737: The data set comprises 70 different families.

• Manufacturer, e.g., Boeing: The data set comprises 41 different manufacturers.

Figure 5: A subset of the images in the FGVC-Aircraft Data Set [10]

Note that this data has been used as part of ImageNet [20]. Some samples from this data set are displayed in Figure 5, showing that the images contain front and side views of aircraft, while remote sensing images from satellites are usually top views. Thus, while FGVC-Aircraft can indeed be used for visual categorization of aircraft, it is not suitable for training or comparing algorithms for remote sensing.


To the best of our knowledge, no remote sensing data set for aircraft type recognition is publicly available. This is perhaps the primary reason limiting the development in the area.

2.2. Review of Aircraft Type Recognition from Remote Sensing Images

Most of the recent methods for aircraft recognition from remote sensing images are focused on deciding whether an object is an aircraft or not [21, 22, 23, 24, 25]. Less attention has been paid to the further fine-grained classification and recognition of the aircraft model and type. General image classification approaches can be divided into hand-crafted feature based methods [26, 27, 28, 29] and deep learning based methods [30, 31, 32, 33]. The research on object type identification from remote sensing images can roughly be divided into three categories, namely methods based on template matching [4, 11, 12], methods based on shallow feature learning models [3, 13, 14], and methods based on deep neural networks [4, 12, 15, 16, 6].

Let us first consider the template matching based methods. Here, Wu et al. [11] proposed a jigsaw reconstruction approach to extract aircraft shapes and then match them with standard templates. Zhao et al. [4] designed a keypoint detection model based on Convolutional Neural Networks (CNNs) [30] and a keypoint matching method to recognize aircraft, transforming the aircraft recognition problem into the landmark detection problem. Due to the use of CNNs, this method could also partially be classified as a deep learning approach. Furthermore, Zuo et al. [12] built a template matching method using both aircraft keypoints and segmentation results, which provided more detailed information to improve recognition accuracy.

An important shallow feature learning model based method was proposed by Hsieh et al. [13], who use a hierarchical classification algorithm based on four different features: wavelet transform, Zernike moments, distance transform, and bitmap. In [3], a coarse-to-fine method is built to integrate high-level information for aircraft type recognition. A method using the artificial bee colony algorithm with an edge potential function is proposed in [14] to solve this problem.

These methods for object type identification from remote sensing images have achieved significant results, but there are still many shortcomings. The methods based on template matching have problems adapting to complex scenarios [6]. The methods based on shallow feature learning models have difficulties extracting high-level semantic features from objects, and there is a bottleneck in their recognition performance [15].

In recent years, with the improvement of parallel GPU computation and the emergence of large-scale image data sets, deep neural networks [30, 31, 32, 33, 34, 35] have experienced tremendous development and have proven to be efficient in a variety of vision tasks, such as the classification [36, 37, 38, 39], detection [40, 41, 42, 43], and segmentation [44, 45, 46, 47] of images. Also benefiting from the existing large-scale data sets from different application domains [20, 48], deep neural networks can be initialized by using transfer learning [38, 49]. As a result, the application of deep neural networks has led to breakthroughs in aircraft recognition from remote sensing images. Diao et al. [15] used a pixel-wise learning method based on deep belief networks (DBNs) for object recognition in remote sensing images. In [4], Zhao et al. proposed an idea to address the aircraft type recognition problem by detecting the landmark points of an aircraft using a vanilla network. By combining semantic segmentation and aircraft type recognition, another CNN based method is used in [12]. Zhang et al. [16] developed an aircraft type recognition framework for learning representative features from images without type labels, based on conditional generative adversarial networks (GANs). Fu et al. [6] propose a fine-grained aircraft recognition method for remote sensing images. Their multi-class activation mapping (MultiCAM) utilizes two subnetworks, i.e., the target net and the object net, in order to fully use the features of discriminative object parts.

Compared with handcrafted-feature based methods, neural-network based models can obtain better discriminative feature representations and achieve significant improvements in terms of recognition accuracy. However, these models must be trained well on target domain data sets after the initialization stage.

One problem is that the aforementioned deep neural network based aircraft recognition methods [4, 6, 12, 16, 15] for remote sensing images were evaluated on different data sets under different experimental settings. This makes it hard to compare their results. Additionally, the data sets used in these works and in the research on the template matching based [4, 11, 12] and shallow feature learning based methods [3, 13, 14] are not publicly available. It is thus not easy to reproduce these works for a fair comparison. Therefore, the state of the art of aircraft type recognition from remote sensing images is not entirely clear.

To address this problem, we introduce a remote sensing image data set of diverse aircraft, which can be used to assess and evaluate the performance of aircraft type recognition algorithms. It also particularly benefits the future development of both remote sensing image processing and interpretation analysis of remote sensing objects.

3. The MTARSI Data Set

To advance the state of the art in aircraft type recognition from remote sensing images, we construct MTARSI, a public large-scale aircraft image data set. In all, MTARSI has a total of 9'385 remote sensing images acquired from Google Earth satellite imagery and manually expanded, including 20 different types of aircraft covering 36 airports. The new data set is made up of the following 20 aircraft types: B-1, B-2, B-29, B-52, Boeing, C-130, C-135, C-17, C-5, E-3, F-16, F-22, KC-10, C-21, U-2, A-10, A-26, P-63, T-6, and T-43. All the sample images are carefully labeled by seven specialists in the field of remote sensing image interpretation. Each image contains exactly one complete aircraft. Samples of each aircraft type in the MTARSI data set are shown in Figure 6. The numbers of sample images vary with different aircraft types (see Table 1) and range from 230 to 846.


Figure 6: Samples of the 20 aircraft types from MTARSI


Table 1: Different aircraft classes and the number of images in each class of MTARSI

Types    #images    Types    #images    Types    #images    Types    #images
B-1      513        C-130    763        F-16     372        A-10     345
B-2      619        C-135    526        F-22     846        T-6      248
B-29     321        C-17     480        KC-10    554        A-26     230
B-52     548        C-5      499        C-21     491        P-63     305
Boeing   605        E-3      452        U-2      362        T-43     306

The spatial resolution of the images is between 0.3 and 1.0 meters. During the data set construction process, the sample images for each class in MTARSI are carefully chosen from different airports around the world, but mainly in the United States, England, Japan, Russia, France, and China. They were extracted at different times and seasons under different imaging conditions, which increases the intra-class diversity of the data. Different colors of the body paint are chosen for the same type of aircraft, and different backgrounds and poses are also taken into account. This way, the intra-class diversity of the data set is greatly increased. Moreover, for each type, aircraft with similar appearance but different models are selected, which gives the data set a certain inter-class similarity. Therefore, this data set is challenging for methods that try to recognize the aircraft type.

In Table 2, we show some samples based on variations of resolution, pose, color, light, background, and model. Some aircraft, such as the B-2 bomber and the KC-10 tanker, are very rare and difficult to capture by satellite remote sensing. This situation hinders the process of collecting and building image data sets. In order to mitigate this problem, we artificially expand the data by simulating images of aircraft that are rare and difficult to observe. The specific process is shown in Figure 8.

First, a Prewitt-operator based segmentation algorithm [50] is applied to an existing image of the aircraft. The Prewitt operator is often used in image processing, particularly within edge detection algorithms [51]. It is a discrete differentiation operator approximating the gradient of the image intensity function.


Table 2: Samples under the variation of resolution, pose, color, time/light, background, and model.

With this operator, we can separate the aircraft from the background.
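For reference, the Prewitt operator combines two 3 × 3 convolution kernels, one for the horizontal and one for the vertical component of the intensity gradient (this is the standard textbook formulation and is not reproduced from [50]):

    G_x = [ -1 0 +1 ; -1 0 +1 ; -1 0 +1 ],    G_y = [ -1 -1 -1 ; 0 0 0 ; +1 +1 +1 ],

and the edge strength used for separating the aircraft from the background can be taken as |∇I| ≈ sqrt((I ∗ G_x)² + (I ∗ G_y)²), where ∗ denotes two-dimensional convolution.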

A different background image is then selected randomly from relevant remote sensing images (such as airport runways, landing fields, etc.) that do not contain any aircraft. Figure 7 shows four background sample images, including a cross runway, grass, tarmac, and fog.


Figure 7: Background samples for simulating rare aircraft.


Figure 8: Flowchart of the remote sensing aircraft image simulation strategy

Finally, the extracted aircraft image is rotated, mirrored, and then combined with the selected background to obtain a final simulated image. The extracted object and the selected background image always have the same spatial resolution and are obtained from the same remote sensing sensor.
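The simulation strategy of Figure 8 can be condensed into a short script. The following Python sketch is our own illustration of the described steps, not the authors' implementation; the gradient threshold, the morphological clean-up, and the helper names are assumptions:

    import numpy as np
    from scipy import ndimage

    def prewitt_mask(gray, thresh=0.15):
        """Binary foreground mask from the Prewitt gradient magnitude."""
        kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
        ky = kx.T
        mag = np.hypot(ndimage.convolve(gray, kx), ndimage.convolve(gray, ky))
        mask = mag > thresh * mag.max()            # threshold value is an assumption
        # close small gaps and fill the aircraft silhouette
        return ndimage.binary_fill_holes(ndimage.binary_closing(mask, iterations=2))

    def simulate(aircraft_rgb, background_rgb, rng):
        """Rotate/flip an aircraft chip and embed it into a background chip.

        Both chips are assumed to have the same spatial resolution, and the
        background is assumed to be at least as large as the aircraft chip.
        """
        mask = prewitt_mask(aircraft_rgb.mean(axis=2))
        angle = rng.uniform(0.0, 360.0)
        chip = ndimage.rotate(aircraft_rgb, angle, reshape=False, order=1)
        m = ndimage.rotate(mask.astype(float), angle, reshape=False, order=0) > 0.5
        if rng.random() < 0.5:                     # random mirroring
            chip, m = chip[:, ::-1], m[:, ::-1]
        out = background_rgb.copy()
        h, w = m.shape
        out[:h, :w][m] = chip[m]                   # paste foreground pixels only
        return out

    # Example usage: rng = np.random.default_rng(0); img = simulate(airplane, background, rng)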

4. Evaluation of State-of-the-Art Algorithms on the MTARSI Data Set


We now evaluate general identification algorithms and deep learning approaches for aircraft type recognition on our new data set. We first investigate the performance of the typical state-of-the-art (but still off-the-shelf) CNN AlexNet [30].


Transfer learning [38, 49] is typically applied in object recognition. It prescribes two steps, namely pre-training on a different data set and then fine-tuning on the target data set (here: MTARSI). We thus conduct a series of experiments with AlexNet applying pre-training in Section 4.1.

The question of how the fine-tuning parameters affect the final performance is then investigated in Section 4.2. Thus, we establish a baseline for the performance of a very typical contemporary machine learning technique in our domain and on our data set. Any more sophisticated approach which is developed for aircraft type recognition from satellite images should be able to outperform this method.


We then conduct a larger study to compare ten aircraft recognition methods in Section 4.3. Six of them again apply deep learning based on CNNs while the other four methods are hand-crafted feature approaches. This way, we cover the state-of-the-art on aircraft recognition from remote sensing image data and can provide reproducible results that other researchers can compare with.

Finally, we investigate which of the image features of MTARSI cause problems for which approach in Section 4.4. In summary, we answer questions such as:

• Which types of aircraft were recognized incorrectly by which method in our experiment?

• Which aircraft were confused most often?

• Which variations caused most trouble for which approach?

As environment for all experiments, we use a PC with an Intel i7-7700 CPU at 3.6 GHz, 16 GB RAM, and an Nvidia GTX Titan X graphics card. The algorithms are implemented and executed based on the Caffe open source library [52].


4.1. Experiments with Pre-training AlexNet Network using Different Source Domain Data Sets


We first want to analyze the impact of transferring knowledge from different source domains to our target domain, aircraft recognition from remote sensing data. We also experimentally investigate which kind of source data set is appropriate for pre-training deep neural networks for aircraft type recognition from remote sensing images. This experiment therefore uses seven different source domain data sets: UCMerced LandUse [7], NWPU-RESISC45 [8], PatternNet [9], ImageNet [20], AID [5], WHU-RS19 [53], and RSSCN7 [54]. AID, WHU-RS19, and RSSCN7 are the latest data sets used to classify remote sensing scenes. In addition, a group of experiments that trains the AlexNet network model on MTARSI without using transfer learning is carried out for comparison.

When pre-training the AlexNet model with different data sets, the output of the eighth fully connected layer needs to be modified accordingly, that is, its output dimension should be equal to the number of categories of the pre-training data set. For example, when using the ImageNet data set for pre-training, the output dimension of the eighth fully connected layer in the AlexNet network model needs to be set to 1000.
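As an illustration of this two-step procedure (the paper uses Caffe [52]; the following PyTorch-style sketch only conveys the idea, and names such as NUM_MTARSI_CLASSES are placeholders), the classifier head is sized to the source data set during pre-training and then replaced by a 20-way layer before fine-tuning on MTARSI:

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_MTARSI_CLASSES = 20        # aircraft types in MTARSI

    # 1) Pre-training: FC8 is sized to the source domain, e.g. 1000 for ImageNet.
    net = models.alexnet(num_classes=1000)
    # ... train `net` on the source data set until convergence ...

    # 2) Parameter transfer: keep all layers except FC8 and replace the head
    #    with a 20-way classifier before fine-tuning on MTARSI.
    in_features = net.classifier[6].in_features    # 4096 in AlexNet
    net.classifier[6] = nn.Linear(in_features, NUM_MTARSI_CLASSES)

    # 3) Fine-tuning settings from Section 4.1: learning rate 0.001, decayed to
    #    10% of its value every 5000 iterations, momentum 0.9, weight decay 0.0005.
    optimizer = torch.optim.SGD(net.parameters(), lr=0.001,
                                momentum=0.9, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.1)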


As listed in Table 3, AlexNet requires inputs of the fixed size 227 × 227 × 3. After two alternations of convolutional and pooling layers, it produces as output 256 feature maps, each with the size 13 × 13. The third, fourth, and fifth convolutional layers are connected to each other without pooling layers. The third convolutional layer (Conv3) takes the output of the second pooling layer (Pool2) as input and filters it with 384 kernels of size 3 × 3 × 256. The fourth convolutional layer contains 384 kernels of size 3 × 3 × 384, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 384. The series of convolutional layers is followed by a pooling layer (Pool5) and the three fully-connected layers FC6, FC7, and FC8. The output of the last fully-connected layer (FC8) is then a 1000-dimensional vector, i.e., a probability distribution over the 1000 class labels.


Table 3: Details of network configurations of AlexNet used in the experiments

Layer   Input        Kernel   Stride   Pad   Output
Conv1   227*227*3    11*11    4        0     55*55*96
Pool1   55*55*96     3*3      2        0     27*27*96
Conv2   27*27*96     5*5      1        2     27*27*256
Pool2   27*27*256    3*3      2        0     13*13*256
Conv3   13*13*256    3*3      1        1     13*13*384
Conv4   13*13*384    3*3      1        1     13*13*384
Conv5   13*13*384    3*3      1        1     13*13*256
Pool5   13*13*256    3*3      2        0     6*6*256
FC6     6*6*256      6*6      1        0     4096*1
FC7     4096*1       1*1      1        0     4096*1
FC8     4096*1       1*1      1        0     1000*1
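The configuration in Table 3 maps directly onto standard deep learning building blocks. A PyTorch-style transcription (ours, for illustration only; the ReLU nonlinearities are not listed in the table and are inserted in their usual places, and FC6 is written as Flatten + Linear, which is equivalent to the 6 × 6 convolution given in the table) would be:

    import torch.nn as nn

    alexnet_table3 = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0), nn.ReLU(),   # Conv1
        nn.MaxPool2d(kernel_size=3, stride=2),                              # Pool1
        nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2), nn.ReLU(),  # Conv2
        nn.MaxPool2d(kernel_size=3, stride=2),                              # Pool2
        nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv3
        nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv4
        nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(), # Conv5
        nn.MaxPool2d(kernel_size=3, stride=2),                              # Pool5
        nn.Flatten(),
        nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),                            # FC6
        nn.Linear(4096, 4096), nn.ReLU(),                                   # FC7
        nn.Linear(4096, 1000),                                              # FC8
    )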

When the AlexNet network model is pre-trained to convergence, the pre-training parameters of all layers except the eighth fully-connected layer are migrated into the fine-tuning AlexNet network model. Then, the MTARSI data set is divided into a training and a test data set according to a ratio of 4 : 1, that is, the AlexNet network model is fine-tuned using 80% of MTARSI. The remaining 20% of the data are used to test the network model after training and fine-tuning, and to evaluate its performance in terms of the recognition accuracy. Based on the settings described in [30], the learning rate of each layer during the fine-tuning of the AlexNet model was set to 0.001 and reduced to 10% of its previous value after every 5000 iterations; the momentum was set to 0.9, the weight decay to 0.0005, and the batch size to 44; the number of iterations depends on the convergence of the network model.
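The 4 : 1 division can be reproduced with any standard splitting utility. A minimal sketch (using scikit-learn; the sample list is a placeholder, and the stratification is our choice, since the paper does not state whether its split preserves the per-type proportions of Table 1):

    from sklearn.model_selection import train_test_split

    # Placeholder sample list; in practice these are the 9'385 labeled MTARSI images.
    samples = [("img_%04d.jpg" % i, "B-1" if i % 2 else "F-22") for i in range(100)]
    paths, labels = zip(*samples)

    train_paths, test_paths, train_labels, test_labels = train_test_split(
        paths, labels,
        test_size=0.2,        # the 4 : 1 training/test ratio used in the paper
        stratify=labels,      # stratified per aircraft type (our assumption)
        random_state=0)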

The experimental results are shown in Table 4. We note that when the MTARSI data set is used directly for training without transfer learning, the recognition accuracy of the AlexNet network model on MTARSI is 80.13%.


Table 4: Pre-training results using different source domain data sets

Pre-training Data Set   Scale               Categories   Accuracy
Without Pre-training    -                   -            80.13%
WHU-RS19                1005                19           75.65%
UCMerced LandUse        2100                21           74.47%
RSSCN7                  2800                7            77.15%
AID                     10'000              30           80.56%
NWPU-RESISC45           30'400              45           82.52%
PatternNet              31'500              38           81.79%
ImageNet                about 1.2 million   1000         84.61%

However, when the data sets WHU-RS19, UCMerced LandUse, and RSSCN7 are used for pre-training, then even after parameter fine-tuning based on MTARSI, the recognition accuracy is lower. This is called the "negative transfer" phenomenon and means that the knowledge learned from the source domain has a negative impact on the learning on the target domain [49]. The reason for this phenomenon is that the three source domain data sets do not contain enough images and overfitting occurs during the pre-training. The knowledge in the source domain has not been learned well and thus cannot be transferred usefully to the target domain.

If we use the data sets AID, NWPU-RESISC45, PatternNet, or ImageNet as the source domain data set for pre-training, then AlexNet works well after the fine-tuning phase. The accuracy becomes 80.56%, 82.52%, 81.79%, and 84.61%, respectively, which is significantly higher than the 80.13% achieved without transfer learning. The results obtained with pre-training on ImageNet are the best.


When the size of the source domain data set used in the pre-training reaches a certain level and there is enough data to ensure that the knowledge in the source domain is grasped, transfer learning will promote the learning on the target domain MTARSI.



Figure 9: Loss and recognition accuracy rate of the AlexNet network on MTARSI for no pre-training (top) and pre-training (bottom) on ImageNet

In order to measure the value of the pre-training on the source domain data set, we additionally carry out a group of experiments where we directly train on MTARSI without using transfer learning. In Figure 9, we illustrate the loss and recognition accuracy rate of the AlexNet network on MTARSI without pre-training and with pre-training on ImageNet, respectively. It can clearly be seen that the AlexNet network model using transfer learning has better recognition accuracy and that the convergence speed and convergence effect on the training set and the test set are better than without transfer learning.


Table 5: Different learning rate settings and the corresponding experimental results

Model     Conv-1   Conv-2   Conv-3   Conv-4   Conv-5   FC-6    FC-7    FC-8    accuracy
Model-1   0.001    0.001    0.001    0.001    0.001    0.001   0.001   0.001   85.61%
Model-2   0        0.001    0.001    0.001    0.001    0.001   0.001   0.001   85.61%
Model-3   0        0        0.001    0.001    0.001    0.001   0.001   0.001   85.61%
Model-4   0        0        0        0.001    0.001    0.001   0.001   0.001   84.31%
Model-5   0        0        0        0        0.001    0.001   0.001   0.001   83.56%
Model-6   0        0        0        0        0        0.001   0.001   0.001   82.35%
Model-7   0        0        0        0        0        0       0.001   0.001   81.52%
Model-8   0        0        0        0        0        0       0       0.001   80.27%

4.2. Experiments of Layers in AlexNet Network under Different Learning Rate Settings

We now investigate the different layers of the AlexNet network model under different learning rates, in order to further study the inner characteristics of deep neural networks for processing remote sensing images. First, the ImageNet data set was used to pre-train the AlexNet network model, since we know that this has a very positive impact on the accuracy. Then, in the parameter transfer phase, eight different learning rate settings for the layers of the fine-tuned AlexNet network model were investigated. Each setting corresponds to a model, as shown in Table 5. These models were again fine-tuned using 80% of the MTARSI data set and tested on the remaining 20% of the data.

In Table 5, the learning rate "0" indicates that the layer parameters are transferred from the pre-training stage without fine-tuning on MTARSI. As before, the learning rate is reduced to 10% of its previous value after every 5000 training iterations, the momentum is set to 0.9, the weight decay is set to 0.0005, the batch size is set to 44, and the number of iterations depends on the convergence of the network model.
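A learning rate of 0 for a layer simply means that its transferred weights are frozen during fine-tuning. As a rough PyTorch-style illustration (not the paper's Caffe configuration), Model-4 from Table 5 would freeze Conv1–Conv3 and fine-tune the remaining layers with a learning rate of 0.001:

    import torch
    from torchvision import models

    net = models.alexnet(num_classes=20)     # head already sized for MTARSI

    # Freeze Conv1-Conv3 (learning rate 0 in Table 5). In torchvision's AlexNet,
    # net.features[0], [3] and [6] are the first three convolutional layers.
    for idx in (0, 3, 6):
        for p in net.features[idx].parameters():
            p.requires_grad = False

    # Fine-tune the remaining layers with the shared settings of Section 4.2.
    trainable = [p for p in net.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=0.001, momentum=0.9, weight_decay=0.0005)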

In the networks, the low-level convolutional layers generally extract common features such as edges and textures of the image, while the high-level convolutional layers and the fully connected layers extract more task-related high-level features [55]. As can be seen from Table 5, Model-2 directly uses the parameters of Convolution Layer 1 transferred from the pre-trained AlexNet network, without fine-tuning this layer on the target domain, and it achieves a recognition accuracy equivalent to Model-1. The same is true for Model-3. This also shows that the features extracted by the first few convolutional layers of a convolutional neural network have a certain common semantics.

The experimental results of Model-4 to Model-8 show that the learning will be negatively affected if no task-related parameter fine-tuning is performed for the high-level convolutional layers and the fully connected layers after the parameter migration. From this, we can conclude that the low-level parameters of the pre-trained CNNs can be directly used for the target task, while fine-tuning of the high-level parameters is necessary.

4.3. Experiments of Different Classification Algorithms on MTARSI


We now investigate the performance of five state-of-the-art CNN structures on MTARSI, namely VGG [31], GoogLeNet [32], ResNet [33], DenseNet [34], and EfficientNet [35]. In the experiment, the five CNNs again use the ImageNet data set for pre-training. The MTARSI data set is also again divided into a training and a test set according to a ratio of 4 : 1, and the size of the images is fixed to 256 × 256 pixels. As in the experiment in Section 4.1, the learning rate of each layer of the network model in the parameter fine-tuning stage is uniformly set to 0.001 and decayed to 10% of its previous value; the momentum is 0.9, the weight decay is 0.0005, the batch size is 44, and the number of iterations depends on the convergence of the network model.

For comparison purposes, we also apply four hand-crafted feature based approaches [26, 27, 28, 29]. For the local patch descriptors, such as SIFT and HOG, we use a fixed-size grid (16 × 16 pixels) with a spacing step of 8 pixels to extract all the descriptors in the image, and we adopt average pooling over each dimension of the descriptor to obtain the final image features.


Table 6: Experimental results of different classification algorithms on the MTARSI data set

Method            accuracy
SIFT [26]+BOVW    59.02%
HOG [27]+SVM      61.34%
ScSPM [28]        60.61%
LLC [29]          64.93%
AlexNet           85.61%
GoogLeNet         86.53%
VGG               87.56%
ResNet            89.61%
DenseNet          89.15%
EfficientNet      89.79%

We fix the dictionary size to 4096 for BOVW and LLC, and to 256 for ScSPM.
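As a rough illustration of this hand-crafted pipeline (our sketch; the paper does not publish its implementation, and the internal HOG parameters such as orientations and cell size are assumptions), the dense-grid extraction with average pooling could look as follows:

    import numpy as np
    from skimage.feature import hog

    def dense_hog_feature(gray, patch=16, step=8):
        """Average-pooled HOG descriptors over a dense 16x16 grid with step 8."""
        descriptors = []
        h, w = gray.shape
        for y in range(0, h - patch + 1, step):
            for x in range(0, w - patch + 1, step):
                d = hog(gray[y:y + patch, x:x + patch],
                        orientations=9, pixels_per_cell=(8, 8),
                        cells_per_block=(2, 2), feature_vector=True)
                descriptors.append(d)
        # average pooling over each descriptor dimension -> one feature per image
        return np.mean(descriptors, axis=0)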

Table 6 shows that the aircraft type recognition algorithms based on transfer learning are far superior to the traditional classification methods in aircraft target recognition performance; the average accuracy rate is improved by nearly 26%. This also indicates that, for aircraft type recognition from remote sensing images, methods based on CNNs can extract higher-level features with more generalization and image expression ability compared to the traditional feature-based methods. They can also more effectively distinguish aircraft of different types.

We further find that ResNet, DenseNet, and EfficientNet have similar performance, while AlexNet, GoogLeNet, and VGG perform slightly worse. In terms of recognition time, AlexNet and EfficientNet perform best, while the others are similar. Note that AlexNet has only eight layers, which is much shallower than the other deep models. For EfficientNet, a compound scaling method is used for balancing network width, depth, and resolution. With this, inference via EfficientNet is 6.1 times faster than with ResNet [35].


4.4. Analysis of Image Features of MTARSI

Based on our experiments with different classification algorithms, we now investigate which of the image features of MTARSI are most problematic. We first study the recognition rate for each type of aircraft over the different classification methods and check which types were confused most often. In order to find which variations caused most trouble for which approach, we further conduct a new group of experiments on a validation set of size 600, assembled by randomly selecting 100 images from the testing set of MTARSI for each of the variations resolution, pose, color, light, background, and aircraft model.

In Figure 10, the confusion matrices on MTARSI for SIFT [26], HOG [27], ScSPM [28], and LLC [29] are presented. Figure 11 provides the confusion matrices on MTARSI for AlexNet, GoogLeNet, VGG, ResNet, DenseNet, and EfficientNet. We find that the hand-crafted feature based methods do generalize, but suffer from some misclassifications on similar aircraft types or models, such as C-5 and Boeing or B-52 and C-5. The deep neural network based methods exhibit good performance except on some extremely similar aircraft. We also find that there are significant differences in the recognition performance of these two kinds of methods for each specific type.
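This per-type analysis boils down to inspecting a confusion matrix. A minimal sketch (using scikit-learn; y_true and y_pred are placeholder lists standing in for the MTARSI test labels and the predictions of any of the evaluated classifiers):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    classes = ["B-1", "B-2", "B-29", "B-52", "Boeing", "C-5", "C-17", "C-21", "C-130",
               "C-135", "E-3", "F-16", "F-22", "KC-10", "U-2", "A-10", "A-26", "P-63",
               "T-6", "T-43"]

    # Placeholder predictions; in practice these come from the evaluated classifier.
    y_true = ["C-5", "C-5", "B-52", "Boeing", "C-17"]
    y_pred = ["C-5", "Boeing", "C-5", "Boeing", "C-17"]

    cm = confusion_matrix(y_true, y_pred, labels=classes)

    # Most frequent confusion per true class (largest off-diagonal count).
    off_diag = cm - np.diag(np.diag(cm))
    for i, row in enumerate(off_diag):
        if row.sum() > 0:
            j = int(row.argmax())
            print("%s is most often confused with %s (%d time(s))"
                  % (classes[i], classes[j], row[j]))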

We conclude that MTARSI is a challenging benchmark for most of the state-of-the-art methods. It is very likely that it can help to design, analyze, and improve practical applications of aircraft identification. For this purpose, deep learning models like ResNet, DenseNet, and EfficientNet lend themselves. We illustrate the recognition rates of the state-of-the-art methods on the validation set covering the six variations, i.e., resolution, pose, color, light, background, and aircraft model, in Figure 12. We can see that the aircraft model and the background are the two biggest factors that influence the methods based on hand-crafted features, while the image resolution is the biggest variable that affects the performance of the deep learning methods.

Figure 10: Confusion matrices on MTARSI when using (a) SIFT [26]+BOVW, (b) HOG [27]+SVM, (c) ScSPM [28], and (d) LLC [29]

5. Conclusion and Future Work

The recognition of aircraft types from remote sensing images attracts much research interest. In this paper, we first reviewed several data sets commonly used for aircraft recognition. We found that all of them are either not suitable for our domain or have significant shortcomings. This inspired us to develop and publish a new benchmark data set for aircraft type recognition from remote sensing images, MTARSI.

We then reviewed the state of the art in this domain and found that, as a consequence of the lack of suitable benchmarks, most approaches were evaluated on different data sets under different experimental settings. It is thus hard to compare their results.

26

Journal Pre-proof

Figure 11: Confusion Matrices on MTARSI when using (a) AlexNet, (b) GoogLeNet, (c) VGGNet, (d) ResNet, (e) DenseNet, and (f) EfficientNet. Rows and columns correspond to the same 20 aircraft types as in Figure 10.


Figure 12: Experimental results of the state-of-the-art methods on the validation set. Categories shown: Pose, Color, Time/light, Background, Model, and Resolution; the vertical axis ranges from 0.00 to 1.00.

It is thus hard to fairly compare the results published in the literature. To address this problem, we make MTARSI available in an online repository [18]. MTARSI can be used to assess and evaluate the performance of aircraft type recognition algorithms. It also benefits the development of image processing, object recognition algorithms, and vision technology for remote sensing.
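To illustrate one way the data set can be used, the minimal sketch below fine-tunes an ImageNet-pretrained [20] convolutional network on an MTARSI-style image collection. It is only an illustrative example: the directory layout (`mtarsi/train/<type>/` and `mtarsi/val/<type>/`), the choice of ResNet-50, and all hyper-parameters are assumptions made here for demonstration and are not the experimental protocol used in this paper.

```python
# Minimal fine-tuning sketch; assumes images are sorted into
# mtarsi/{train,val}/<aircraft type>/*.jpg (hypothetical layout).
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                       # input size expected by ImageNet models
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],     # ImageNet channel statistics
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("mtarsi/train", transform=preprocess)
val_set = datasets.ImageFolder("mtarsi/val", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(pretrained=True)                 # transfer learning from ImageNet
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # 20 aircraft types
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(5):                                   # illustrative number of epochs
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    model.eval()                                         # simple validation accuracy check
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")
```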

We then evaluated a set of representative aircraft type recognition approaches with various experimental protocols on the new data set. We found that our data set allows us to clearly distinguish the performance of different methods. We also report which features of the images contained in the data set cause trouble for which approach. Researchers using MTARSI will therefore have a sound baseline of results to compare with, as well as pointers for where to look if their experimental results leave room for improvement.
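Figures 10 and 11 report confusion matrices whose entries appear row-normalized, i.e., entry (i, j) gives the fraction of test images of aircraft type i that were assigned to type j. The short sketch below shows one way such a matrix and the per-class accuracies can be computed from predicted labels; the toy arrays are placeholders, not results from our experiments.

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of
    class-i samples predicted as class j, so every non-empty row sums to 1."""
    counts = np.zeros((num_classes, num_classes), dtype=np.float64)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1.0
    row_totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_totals, 1.0)   # guard against empty classes

# Toy example with 3 placeholder classes (MTARSI itself has 20 aircraft types).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = normalized_confusion_matrix(y_true, y_pred, num_classes=3)
per_class_accuracy = np.diag(cm)                  # diagonal entries, as shown in Figs. 10-11
overall_accuracy = float((y_true == y_pred).mean())
print(cm, per_class_accuracy, overall_accuracy)
```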

In our future work, we will use MTARSI to develop better aircraft recognition methods. We will also expand the data set by collecting more abundant data based on MTARSI and by taking other object types in remote sensing images into account.

Acknowledgements. We acknowledge support from the Youth Project of the Provincial Natural Science Foundation of Anhui 1908085QF285 and 1908085MF184, the Scientific Research and Development Fund of Hefei University 19ZR05ZDA, the Talent Research Fund Project of Hefei University 18-19RC26, the National Natural Science Foundation of China under grants 61673359 and 61672204, and the grant KJ2018A0555 of the Key Scientific Research Foundation of the Education Department of Anhui Province.

Disclosure. All authors declare that there are no conflicts of interest. All authors declare that they have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of the work described in this manuscript. Declarations of interest: none. All authors approve the final article.

References

[1] J. Chen, B. Zhang, C. Wang, Backscattering feature analysis and recognition of civilian aircraft in TerraSAR-X images, IEEE Geoscience and Remote Sensing Letters 12 (4) (2015) 796–800. doi:10.1109/LGRS.2014.2362845.

[2] Y. Zhong, A. Ma, Y.-S. Ong, Z. Zhu, L. Zhang, Computational intelligence in optical remote sensing image processing, Applied Soft Computing 64 (2018) 75–93. doi:10.1016/j.asoc.2017.11.045.

[3] G. Liu, X. Sun, K. Fu, H. Wang, Aircraft recognition in high-resolution satellite images using coarse-to-fine shape prior, IEEE Geoscience and Remote Sensing Letters 10 (3) (2013) 573–577. doi:10.1109/LGRS.2012.2214022.

[4] A. Zhao, K. Fu, S. Wang, J. Zuo, Y. Zhang, Y. Hu, H. Wang, Aircraft recognition based on landmark detection in remote sensing images, IEEE Geoscience and Remote Sensing Letters 14 (8) (2017) 1413–1417. doi:10.1109/LGRS.2017.2715858.

[5] G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, X. Lu, AID: A benchmark data set for performance evaluation of aerial scene classification, IEEE Transactions on Geoscience and Remote Sensing 55 (7) (2017) 3965–3981. doi:10.1109/TGRS.2017.2685945.

[6] K. Fu, W. Dai, Y. Zhang, Z. Wang, M. Yan, X. Sun, MultiCAM: Multiple class activation mapping for aircraft recognition in remote sensing images, Remote Sensing 11 (5) (2019) 544. doi:10.3390/rs11050544.

[7] Y. Yang, S. D. Newsam, Bag-of-visual-words and spatial extensions for land-use classification, in: 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems (ACM-GIS 2010), San Jose, CA, USA, ACM, 2010, pp. 270–279. doi:10.1145/1869790.1869829.

[8] G. Cheng, J. Han, X. Lu, Remote sensing image scene classification: Benchmark and state of the art, Proceedings of the IEEE 105 (10) (2017) 1865–1883. doi:10.1109/JPROC.2017.2675998.

[9] W. Zhou, S. Newsam, C. Li, Z. Shao, PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval, ISPRS Journal of Photogrammetry and Remote Sensing 145 (2018) 197–209.

[10] S. Maji, E. Rahtu, J. Kannala, M. B. Blaschko, A. Vedaldi, Fine-grained visual classification of aircraft, CoRR abs/1306.5151. arXiv:1306.5151. URL http://arxiv.org/abs/1306.5151

[11] Q. Wu, H. Sun, X. Sun, D. Zhang, K. Fu, H. Wang, Aircraft recognition in high-resolution optical satellite remote sensing images, IEEE Geoscience and Remote Sensing Letters 12 (1) (2015) 112–116. doi:10.1109/LGRS.2014.2328358.

[12] J. Zuo, G. Xu, K. Fu, X. Sun, H. Sun, Aircraft type recognition based on segmentation with deep convolutional neural networks, IEEE Geoscience and Remote Sensing Letters 15 (2) (2018) 282–286. doi:10.1109/LGRS.2017.2786232.

[13] J.-W. Hsieh, J.-M. Chen, C.-H. Chuang, K.-C. Fan, Aircraft type recognition in satellite images, IEE Proceedings - Vision, Image and Signal Processing 152 (3) (2005) 307–315.

[14] C. Xu, H. Duan, Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft, Pattern Recognition Letters 31 (13) (2010) 1759–1772. doi:10.1016/j.patrec.2009.11.018.

[15] W. Diao, X. Sun, F. Dou, M. Yan, H. Wang, K. Fu, Object recognition in remote sensing images using sparse deep belief networks, Remote Sensing Letters 6 (10) (2015) 745–754.

[16] Y. Zhang, H. Sun, J. Zuo, H. Wang, G. Xu, X. Sun, Aircraft type recognition in remote sensing images based on feature learning with conditional generative adversarial networks, Remote Sensing 10 (7) (2018) 1123. doi:10.3390/rs10071123.

[17] T. Weise, X. Wang, Q. Qi, B. Li, K. Tang, Automatically discovering clusters of algorithm and problem instance behaviors as well as their causes from experimental data, algorithm setups, and instance features, Applied Soft Computing 73 (2018) 366–382. doi:10.1016/j.asoc.2018.08.030.

[18] Z. Wu, Multi-type Aircraft of Remote Sensing Images: MTARSI, Zenodo.org, 2019. doi:10.5281/zenodo.3464319.

[19] M. Jian, Q. Qi, H. Yu, J. Dong, C. Cui, X. Nie, H. Zhang, Y. Yin, K.-M. Lam, The extended marine underwater environment database and baseline evaluations, Applied Soft Computing.

[20] J. Deng, W. Dong, R. Socher, L. Li, K. Li, F. Li, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, IEEE Computer Society, 2009, pp. 248–255. doi:10.1109/CVPRW.2009.5206848.

[21] Y. Li, K. Fu, H. Sun, X. Sun, An aircraft detection framework based on reinforcement learning and convolutional neural networks in remote sensing images, Remote Sensing 10 (2) (2018) 243. doi:10.3390/rs10020243.

[22] Y. Xu, M. Zhu, P. Xin, S. Li, M. Qi, S. Ma, Rapid airplane detection in remote sensing images based on multilayer feature fusion in fully convolutional neural networks, Sensors 18 (7) (2018) 2335. doi:10.3390/s18072335.

[23] G. Wang, X. Wang, B. Fan, C. Pan, Feature extraction by rotation-invariant matrix representation for object detection in aerial image, IEEE Geoscience and Remote Sensing Letters 14 (6) (2017) 851–855. doi:10.1109/LGRS.2017.2683495.

[24] F. Zhang, B. Du, L. Zhang, M. Xu, Weakly supervised learning based on coupled convolutional neural networks for aircraft detection, IEEE Transactions on Geoscience and Remote Sensing 54 (9) (2016) 5553–5563. doi:10.1109/TGRS.2016.2569141.

[25] Y. Tan, Q. Li, Y. Li, J. Tian, Aircraft detection in high-resolution SAR images based on a gradient textural saliency map, Sensors 15 (9) (2015) 23071–23094. doi:10.3390/s150923071.

[26] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110. doi:10.1023/B:VISI.0000029664.99615.94.

[27] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, IEEE Computer Society, 2005, pp. 886–893. doi:10.1109/CVPR.2005.177.

[28] J. Yang, K. Yu, Y. Gong, T. S. Huang, Linear spatial pyramid matching using sparse coding for image classification, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, IEEE Computer Society, 2009, pp. 1794–1801. doi:10.1109/CVPRW.2009.5206757.

[29] K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Advances in Neural Information Processing Systems 22 (NIPS 2009), Vancouver, British Columbia, Canada, Curran Associates, Inc., 2009, pp. 2223–2231.

[30] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, Nevada, United States, 2012, pp. 1106–1114.

[31] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.

[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA, IEEE Computer Society, 2015, pp. 1–9. doi:10.1109/CVPR.2015.7298594.

[33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, IEEE Computer Society, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.

[34] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.

[35] M. Tan, Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, arXiv preprint arXiv:1905.11946.

[36] L. M. Dang, S. I. Hassan, I. Suhyeon, A. K. Sangaiah, I. Mehmood, S. Rho, S. Seo, H. Moon, UAV based wilt detection system via convolutional neural networks, Sustainable Computing: Informatics and Systems.

[37] J. G. Ha, H. Moon, J. T. Kwak, S. I. Hassan, M. Dang, O. N. Lee, H. Y. Park, Deep convolutional neural network for classifying fusarium wilt of radish from unmanned aerial vehicles, Journal of Applied Remote Sensing 11 (4) (2017) 042621.

[38] S. Chakraborty, M. Roy, A neural approach under transfer learning for domain adaptation in land-cover classification using two-level cluster mapping, Applied Soft Computing 64 (2018) 508–525. doi:10.1016/j.asoc.2017.12.018.

[39] Z. Chai, W. Song, H. Wang, F. Liu, A semi-supervised auto-encoder using label and sparse regularizations for classification, Applied Soft Computing 77 (2019) 205–217. doi:10.1016/j.asoc.2019.01.021.

[40] X. Yang, H. Sun, K. Fu, J. Yang, X. Sun, M. Yan, Z. Guo, Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks, Remote Sensing 10 (1) (2018) 132. doi:10.3390/rs10010132.

[41] K. Fu, Y. Li, H. Sun, X. Yang, G. Xu, Y. Li, X. Sun, A ship rotation detection model in remote sensing images based on feature fusion pyramid network and deep reinforcement learning, Remote Sensing 10 (12) (2018) 1922. doi:10.3390/rs10121922.

[42] J. C. Rangel, J. Martínez-Gómez, C. Romero-González, I. García-Varea, M. Cazorla, Semi-supervised 3D object recognition through CNN labeling, Applied Soft Computing 65 (2018) 603–613. doi:10.1016/j.asoc.2018.02.005.

[43] Y. Chen, X. Yang, B. Zhong, S. Pan, D. Chen, H. Zhang, CNNTracker: Online discriminative object tracking via deep convolutional neural network, Applied Soft Computing 38 (2016) 1088–1098. doi:10.1016/j.asoc.2015.06.048.

[44] Z. Yan, M. Yan, H. Sun, K. Fu, J. Hong, J. Sun, Y. Zhang, X. Sun, Cloud and cloud shadow detection using multilevel feature fused segmentation network, IEEE Geoscience and Remote Sensing Letters 15 (10) (2018) 1600–1604. doi:10.1109/LGRS.2018.2846802.

[45] X. Gao, X. Sun, Y. Zhang, M. Yan, G. Xu, H. Sun, J. Jiao, K. Fu, An end-to-end neural network for road extraction from remote sensing imagery by multiple feature pyramid network, IEEE Access 6 (2018) 39401–39414. doi:10.1109/ACCESS.2018.2856088.

[46] M. Larsson, Y. Zhang, F. Kahl, Robust abdominal organ segmentation using regional convolutional neural networks, Applied Soft Computing 70 (2018) 465–471. doi:10.1016/j.asoc.2018.05.038.

[47] M. Mittal, L. M. Goyal, S. Kaur, I. Kaur, A. Verma, D. J. Hemanth, Deep learning based enhanced tumor segmentation approach for MR brain images, Applied Soft Computing 78 (2019) 346–354. doi:10.1016/j.asoc.2019.02.036.

[48] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common objects in context, in: Computer Vision - ECCV 2014, Zurich, Switzerland, Vol. 8693 of Lecture Notes in Computer Science, Springer, 2014, pp. 740–755. doi:10.1007/978-3-319-10602-1_48.

[49] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (10) (2010) 1345–1359. doi:10.1109/TKDE.2009.191.

[50] J. R. Dim, T. Takamura, Alternative approach for satellite cloud classification: edge gradient application, Advances in Meteorology 2013.

[51] J. M. Prewitt, Object enhancement and extraction, Picture Processing and Psychopictorics 10 (1) (1970) 15–19.

[52] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in: Proceedings of the ACM International Conference on Multimedia (MM '14), Orlando, FL, USA, ACM, 2014, pp. 675–678. doi:10.1145/2647868.2654889.

[53] G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, H. Maître, Structural high-resolution satellite image indexing, in: ISPRS TC VII Symposium - 100 Years ISPRS, Vol. 38, 2010, pp. 298–303.

[54] Q. Zou, L. Ni, T. Zhang, Q. Wang, Deep learning based feature selection for remote sensing scene classification, IEEE Geoscience and Remote Sensing Letters 12 (11) (2015) 2321–2325. doi:10.1109/LGRS.2015.2475299.

[55] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: Computer Vision - ECCV 2014, Zurich, Switzerland, Vol. 8689 of Lecture Notes in Computer Science, Springer, 2014, pp. 818–833. doi:10.1007/978-3-319-10590-1_53.

Highlights

1. We construct and present a diverse remote sensing image data set for aircraft type recognition.
2. We present a comprehensive review of up-to-date algorithms for aircraft type recognition of remote sensing images.
3. We provide the results obtained with state-of-the-art approaches in the field, to serve as a baseline for comparison for researchers working on remote sensing computer vision.

Declaration of Interest Statement

Article Title: A Benchmark Data Set for Aircraft Type Recognition from Remote Sensing Images
Authors: Zhi-Ze Wu, Shou-Hong Wan, Xiao-Feng Wang, Ming Tan, Le Zou, Xin-Lu Li, Yan Chen

All authors of this manuscript have directly participated in the planning, execution, and/or analysis of this study. The contents of this manuscript have not been copyrighted or published previously. The contents of this manuscript are not now under consideration for publication elsewhere. The contents of this manuscript will not be copyrighted, submitted, or published elsewhere while acceptance by Applied Soft Computing is under consideration. There are no directly related manuscripts or abstracts, published or unpublished, by any authors of this manuscript. No financial support or incentive has been provided for this manuscript (if financial support was provided, please specify below). I am one author signing on behalf of all co-authors of this manuscript, and attesting to the above.

Signature:
Date: May 19, 2019
Printed Name: Xiao-Feng Wang
Institution: Department of Computer Science and Technology, Hefei University