Applied Soft Computing Journal 83 (2019) 105650

journal homepage: www.elsevier.com/locate/asoc

Unsupervised anomaly detection in unmanned aerial vehicles

Samir Khan a,∗, Chun Fui Liew a, Takehisa Yairi a, Richard McWilliam b

a University of Tokyo, Department of Aeronautics and Astronautics, Tokyo, 113-8654, Japan
b Durham University, School of Engineering and Computing Sciences, Durham, DH1 3HP, UK

Highlights

• State of the art on the current trends for anomaly detection systems in UAVs.
• Importance of unsupervised anomaly detection in a multivariate time series.
• Application of the isolation forest as an out-of-the-box approach.
• High-level synthesis of the machine learning solution for rapid FPGA implementation.

Article info

Article history:
Received 28 January 2019
Received in revised form 2 June 2019
Accepted 20 July 2019
Available online 25 July 2019

Keywords:
System health monitoring
Machine learning
Isolation forest
Fault diagnostics and isolation

Abstract

A real-time anomaly detection solution operates on a continuous stream of operational and labelled data and must satisfy several resource and latency requirements. Traditional solutions to the problem rely heavily on well-defined features and prior supervised knowledge, where most techniques refer to hand-crafted rules derived from known conditions. While successful in controlled situations, these rules assume that good data is available for them to detect anomalies, meaning that they will fail to generalise beyond known scenarios. To investigate these issues, current literature is examined for solutions that can detect known and unknown anomalous instances whilst functioning as an out-of-the-box approach for efficient decision-making. The applicability of the isolation forest is discussed for engineering applications using the Aero-Propulsion System Simulation dataset as a benchmark, where it is shown to outperform other unsupervised distance-based approaches. The authors have also carried out real-time experiments on an unmanned aerial vehicle to highlight further applications of the method. Finally, some conclusions are drawn concerning its simplicity and robustness in handling diagnostic problems.

1. Introduction

Continuous monitoring of health can help sustain asset reliability and reduce maintenance costs. Here, the occurrence of anomalies during operation is an area of concern, as considerable human expertise is often required to make sure that sensors are operating within their predefined tolerances. However, with the ever-increasing number of sensors, it is becoming difficult to maintain performance within statistical limits. Since anomalies are unpredictable, a priori knowledge models and rules may no longer remain valid, suggesting the need to generalise behaviours across multiple conditions. From a practical point of view, this becomes an issue when established individual component tolerances can no longer guarantee the intended system performance.

The past three years have witnessed a proliferation of publications related to the application of machine learning based

approaches for complex engineering systems [1]. This growing interest stems from the combined research and development of efficient algorithms; availability of data; advances in high-performance computing; and the successes reported by industry, academia, and research communities alike. However, most recent developments have two major drawbacks: they are either computationally intensive during the training/inference process, or they are optimised only to detect normal system behaviours. This indicates that recognising anomalies is simply a consequence of a mismatched classification process. When taking such factors into account, the probable beneficial opportunities from existing approaches might not be enough to outweigh their limitations and expected application risks.1 This is because their detection capabilities are a by-product of an algorithm that was originally designed for a purpose other than

1 This can be attributed to the highly complex nature of modern systems and the current drive towards developing more-electric solutions. As data volumes keep increasing in size and complexity, there is a need to adapt and demonstrate intelligence in related domains.


anomaly detection (it was perhaps designed for classification, clustering, etc.). For engineering applications, this will inevitably cause problems since:

• Most detection algorithms are based on distance measurements — this can be unreliable in high-dimensional spaces due to the so-called curse of dimensionality2;
• These methods are not optimised to detect anomalies — as a consequence, they may under-perform by causing either too many false alarms or too few anomalies being detected [3];
• These methods are limited to small data sizes because of the legacy attached to their original algorithms. Typically, features will need to be extracted, or else PCA and autoencoders are used to reduce the dimensionality of the dataset [1];
• Realising real-time solutions is not straightforward due to the sum of computational complexities [4].

In light of progress on the above, a notable development in the field has been the isolation forest algorithm, which presumably does not require prior knowledge about the internal dynamics of the system. It operates on the principle that anomalous instances are few and different compared to the rest of the data. This makes them more sensitive to the process of isolation, causing such instances to be confined closer to the root of a decision tree, whereas normal data is more likely to be embedded much deeper in the tree. As it does not use any distance or density function, this unsupervised method can yield lower overall computational costs, linear time complexity and the ability to deal with large datasets. Surprisingly, the application of the isolation forest algorithm in engineering-focused publications has been limited, hence the authors' attempt to demonstrate its applicability for anomaly detection in real-time problems.

1.1. Contributions

The article focuses on aspects of machine learning for anomaly detection in the aerospace domain. To summarise the main contributions, it provides:

• The literature background on the current trends for anomaly detection systems in the aerospace domain;

• An emphasis on the importance of unsupervised anomaly detection to detect unknown instances in a multivariate time series, as one of the most important properties of any detection system;
• The use of the isolation forest as an out-of-the-box approach that does not require expert knowledge for configuration, yet can still address the dimensionality and correlation challenges found in most anomaly detection systems.

The data used for the experiments is the well-known Turbofan engine degradation simulation dataset, followed by a case study on data collected using a UAV during field experiments. The rest of the paper is organised as follows: Section 2 reviews the necessary literature and field developments in machine learning and its application for anomaly detection in UAVs. The section also details the research gaps and requirements within the domain. Section 3 describes the proposed approach for unsupervised anomaly detection using the isolation forest algorithm. Section 4 presents the experimental efforts carried out and discussions on the results. Finally, some conclusions are drawn from the preceding analysis.

2 This is because having a high level of correlation within variables can degrade the performance of anomaly detection algorithms. This can be observed when a small deviation within a cluster of correlated variables may generate a large anomalous spike, as compared to a larger deviation within an isolated variable [2].

Fig. 1. Illustration of simple anomalies in two-dimensional space.

2. Literature background

2.1. Problem statement

The advent of system health monitoring methods was realised to preserve system functionality within harsh operational environments. In the aerospace domain, achieving near-zero downtime whilst integrating more sophisticated health management systems stretches these challenges, affecting cost, design periods, availability of experts, etc. No matter how well a maintenance system is designed, there are always deficiencies due to decisions and trade-offs; these present an inherent weakness that only becomes evident once the system is in operation [1]. To address this challenge, many statistical and reliability models can be used for maintenance tasks that allow the periodic replacement of components, regardless of their state of health. Although this is useful, such practices are not always cost-effective [5]. Each time a serviceable component is replaced during scheduled maintenance, its remaining useful life is essentially wasted. This has served as a major motivation for the aerospace domain to move towards predictive maintenance strategies, which offer considerable opportunity for the application of machine learning techniques [6]. However, there are inevitable challenges associated with the availability of adequate data for constructing the prediction model. Since failure rates are relatively low, with a diverse number of signatures, training a model from a generic (and incomplete) dataset is not an effective approach. This paves the way for a more reasonable method: training the model from the operational sensor data attached to the system. As a result, anomaly detection approaches are more suitable for this type of monitoring task, focusing on identifying deviations from the nominal response. There are a number of techniques for detecting and isolating such problems; most can be divided into rule-based [7], model-based [8]3 and data-driven [9] methods. In the case of the latter, the development of health monitoring solutions requires knowing which sensor data are best for extracting meaningful information about the system and how they should be classified.

2.2. The importance of anomaly detection

Illustrated in Fig. 1, anomaly detection refers to finding patterns in data that do not conform to an expected behaviour. Its applications span many domains, including credit card

3 These are powerful as they rely on a deep understanding of the system, and can benefit from established relationships. However, they have trouble accounting for complexity and limited variables, which can make them expensive and time-consuming.

S. Khan, C.F. Liew, T. Yairi et al. / Applied Soft Computing Journal 83 (2019) 105650

fraud detection, insurance, health care, the food industry, intrusion detection for cyber-security, fault detection in safety-critical systems, and military surveillance of enemy activities [10]. Anomaly detection is an important concept because anomalies in a dataset can translate into significant (and often critical) actionable information; for example, a machine's anomalous behaviour observed when inspecting time series data.4 However, a specific formulation of the anomaly detection problem is determined by such factors as the nature of the input data, the availability (or unavailability) of labels, and the constraints and requirements of the application domain. Designing the system can be a challenging endeavour because of the subjective boundaries between anomalies and normal data, indicating that, at times, normal observations could be considered anomalies and vice-versa. Also, what is considered normal at the moment may not be normal in the future under the influence of various factors. As a result, the classifier is expected to recognise these complex patterns to make intelligent decisions. Many techniques, e.g. neural networks, One-class SVMs, etc., can be utilised to generalise system models from available training data. These techniques can help recognise critical information about the system and can greatly benefit the construction of effective diagnostic solutions.

2.2.1. The limitations of many techniques

Before the application of any learning algorithm, data normalisation and dimensionality reduction techniques are often used. The aim of these approximations is not only to reduce data redundancy but also to retain much of the important information.5 While these techniques can prove useful for a highly correlated dataset, there is no way to determine whether sufficient information, which will help isolate anomalies, is retained [12]. This issue becomes more noticeable when there are many large clusters of correlated variables in the data, which can make the impact of an isolated uncorrelated variable less significant. This makes it difficult to detect any anomalies that are associated only with such isolated variables. Another limitation is associated with several assumptions that are made during algorithm design. For example, assume a dataset x1, x2, . . . , xn, where each xi ∈ R. If sensor parameters are being monitored (e.g., position, velocity, telemetry, etc.), the data will be a mixture of nominal and anomalous points generated by different generative processes. An important consideration here is the well-defined anomaly distribution assumption, i.e., that anomalies are drawn from a well-defined (and known) probability distribution, such as repeated instances of known machine failures. However, such an assumption can be risky as it:

• Undermines adversarial situations;6
• Ignores the diverse set of potential causes for unknown failure modes;

• Does not consider that a user's notion of an anomaly might change with time; e.g., maybe the analyst wanted to see failure data points at first, but later wanted to concentrate on other aspects of the system.

So, how do these assumptions hinder the process? Let us divide the problem into two cases, both of which depend on one parameter, α, the fraction of training points that are anomalies

4 Within the statistical literature, popular techniques for anomaly detection include the autoregressive moving average (ARMA), vector autoregression (VARMA), the exponentially weighted moving average, etc.
5 This is usually defined in terms of the percentage of explained variance [11].
6 Since training and test data are usually assumed to be generated from the same distribution, such methods will not be able to account for intelligent and adaptive adversaries.


within the dataset. In the first case, α is large, e.g., >5% of the data, where the data can be modelled as a distribution of nominal points. In the second case, α is small, e.g., <1% of the dataset. This now becomes more like an outlier detection problem, and the literature has many efficient techniques to address it. However, the problem with anomaly detection (unlike other domains like natural language processing or computer vision) is that most methods are only evaluated with limited datasets, which are often proprietary. So a research gap within the community is the requirement for a large (and growing) collection of public anomaly detection benchmarks, in order to do a thorough evaluation of an algorithm. These could perhaps be organised according to a number of categories like dataset size/length, dimensions, features, etc., with α then varied systematically to study the effects on the results.7 Furthermore, even though several unsupervised techniques have been proposed in the literature, their performance depends heavily on the data and application they are being used in. This indicates that most of these methods have little systematic advantage over one another when compared across many other datasets. The authors have listed some techniques that have appeared in use for anomaly detection in aerospace-related applications in Table 1. A detailed description and evaluation of each of these algorithms is beyond the scope of this paper, and the reader is referred to notable surveys that address these topics in much detail [13–16]. It is evident that most existing work employs clustering that is either distance-based8 or density-based.9 The problem when applying some of these techniques is the need to define parameters related to the data observations beforehand, such as a similarity matrix or the number of clusters that should exist in the data. It therefore becomes the responsibility of the designer to decide what these parameters should be, even if the data has a random structure.10

2.3. Recent developments in unmanned aerial vehicles

Several techniques exist that can identify anomalous patterns in time series data during operation. The authors present a brief review of the most common ones used in UAV-related anomaly detection. This is to identify whether the isolation forest technique (which is used in this article) would be a useful contribution to the community. The application of UAVs has gained significant momentum across many commercial domains, such as real-time monitoring, remote sensing, inspecting disaster areas, delivery of goods, surveillance, agriculture and several other disciplines [44]. This is largely due to their ease of development, low costs, flexibility and recent improvements in control. However, this has also caused a surge in terms of complexity and autonomy. Coupled with the increasing use of these assets within extreme conditions, this has had

7 It should be noted that most anomaly detection algorithms approximate density estimations, where they try to model the probability density of the nominal points and score points according to lower density — those are the anomalies. If anomalies are clustered together to increase their density, then it is likely that these algorithms will miss them.
8 Distance-based approaches determine the distances between an object and its nearest neighbours, which are then used to estimate anomalous data points, e.g., based on the Euclidean distance.
9 Density-based approaches measure the density around an anomaly and its neighbours. The assumption is that the density of a point's neighbourhood will be correlated with that of its neighbours' neighbourhoods. If there are any significant differences between the calculated densities, then the data point can be considered an anomaly.
10 Since safety-critical systems can be highly non-linear, the data will be non-stationary and will depend on the mission at hand. This makes it difficult for operators to take all factors into account with increasing data volumes and complexity.


Table 1
Popular approaches found in aerospace literature for anomaly detection.

Clustering-based
• K-means [17,18]: used to divide a group of data points into hard clusters. It assumes a balanced cluster size, a joint distribution of features with equal variance, and independent features with similar cluster density. Determining the optimal K can be difficult, but for small values it is computationally fast and efficient.
• Gaussian mixture model [19,20]: uses a Gaussian distribution-based parametric model to identify the underlying populations, which can be explained by a normal distribution in the midst of many heterogeneous populations. However, in many practical situations the data distribution may not have any explicit clusters. As a result, each point can be assigned different weights or probabilities to soft clusters.

Classification-based
• Neural network [21,22]: a collection of neurons that learn to approximate any mapping between two or more related things, given enough depth of layers and number of neurons. They are easy to generalise and can have nonlinear representations in which normal data and anomalies are well separated.
• Random forest [23]: operates by constructing a multitude of decision trees at training time and outputting the class that is either the mode of the classes (classification) or the mean prediction (regression) of the individual trees. It is invariant to monotonic transformations of individual features, i.e. per-feature scaling will not change anything as features are never compared in magnitude to each other.
• SVM [24,25]: a non-parametric classification method that transforms the data to a higher-dimensional space and builds a hyperplane decision boundary. It assumes that the training points belong to one class and all non-training points belong to another. However, SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
• KNN [26,27]: assigns data points according to the majority of their nearest neighbours to find anomalous data points by measuring the local deviation. A choice needs to be made on the value of K to avoid overfitting/underfitting issues.

Spectral-based
• PCA [28]: PCA-based anomaly detection analyses available features to determine what constitutes a normal class and applies distance metrics to identify cases that represent anomalies. This allows a model to be trained using existing imbalanced data.
• Autoencoders [29]: a neural network that attempts to reconstruct its input. It can serve as a form of feature extraction to produce a compressed representation of its input at the encoder. This representation can be mapped back to its original form using a decoder. Anomaly detection uses the reconstruction error to measure how well the decoder is performing.

Model-based
• Residual particle filter estimation [30,31]: based on Bayesian estimation theory, it allows for representation and management of uncertainty in a computationally efficient manner. The resulting anomaly detection routine declares a fault only when a specified confidence level is reached at a given false alarm rate. The underlying principle of the methodology is the approximation of relevant distributions with particles (samples from the space of the unknowns) and their associated weights. Compared to classical Monte Carlo methods, sequential importance sampling enables particle filtering to reduce the number of samples required to approximate the distributions with the necessary precision, making it a faster and more computationally efficient approach.
• Unscented Kalman filter estimation [32,33]: can perform as a robust, real-time outlier detector. For example, a system can estimate its position relative to its close environment and at the same time build a map of this environment without knowing the exact position. This problem (called simultaneous localisation and mapping) is a typical example use case of the Kalman filter. For time series data, it can be used to learn and predict, although this can be complex to implement in practice. As a result, many practitioners choose to split the learning/estimation step from the prediction/inference step.
• Autoregressive models [34]: time series models that use observations from previous time steps as input to a regression equation to predict the value at the next time step. It is easy to develop complicated models which are fast to compute. However, since the series explains itself, it is difficult to assess structural changes, unlike with explanatory variables.
• Recursive least squares estimation [35]: the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearised) correlation. The method allows for the dynamic application of LS to time series acquired in real-time. As with LS, there may be several correlation equations and a set of dependent (observed) variables.

Statistics-based
• Bayesian statistics [36,37]: good at classification based on observations. The network can model relations between events in the present situation, its symptoms and potential future effects. The model would then be able to classify the present situation and hence predict future events with a probability. This requires specifying priors and posteriors; however, defining these can become difficult, making the model computationally infeasible.
• Mahalanobis calculations [38–40]: a measure of the distance between a data point and a distribution, taking into account the correlation of the dataset. It is a multi-dimensional generalisation of the idea of measuring how many standard deviations away the data point is from the mean of the distribution.
• Kernel density estimation [41]: uses kernels to estimate the unknown probability distribution of a random variable, based on a sample of points taken from that distribution. A Gaussian bell curve is typically used. The smoother the curve (for the Gaussian this means higher variance), the more spread out the evidence.
• Local outlier factor [42,43]: similar to non-parametric methods such as kernel density estimation, it finds anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. Although it is easily generalised, it can be difficult to interpret; e.g., a value of 1 (or even less) indicates a clear inlier, but there is no clear rule for when a point is an outlier.

several economic and liability consequences [45]. Mostly, issues arise due to sensor faults, battery problems and faulty motors, many of which can be addressed by applying simple statistical and rule-based methods, e.g., implementing tolerance levels on battery discharges, and monitoring the vibrations and correlations within the sensor data. Another common problem is when

the propellers experience degradation. These components are key to a UAV's flight control system, and hence measures should be in place to ensure that faults are detected in a manner that ensures safety. While many traditional linear, time-invariant solutions can be applied in basic situations, these might not be able to account


for dynamically changing ones — where most UAVs operate.11 It is therefore in these dynamic situations that a non-linear, time-variant process must be studied.12 Also, typical data-driven methods used for fault analysis need a lot of field data to carry out an extensive analysis of the results in different situations. The problem is that these datasets can become so large and complex that most traditional methods become inadequate to deal with them directly — something highlighted in the discussion at the outset. As a consequence, their dimensionality often needs to be reduced by removing redundant data (that has no variation across the dataset) using methods like PCA or autoencoders.

UAVs often require various levels of anomaly assessment. Designers build detection algorithms that aim to extract all the anomalous information from sensors, previous knowledge and maintenance requirements as a collective effort. A number of model-based approaches, such as particle filters, Extended Kalman Filter variants, etc., have appeared [47]. However, most of these methods rely on accurately estimating state residuals for detecting any changes in the system response, and on obtaining exact physical models. On the data-driven front, UAV-related literature largely makes use of distance-based techniques to detect anomalies [17,18,24,25,38–40,42,43], in which approaches based on filter estimations and the Mahalanobis distance seem to be even more popular. These algorithms use variations of the sensor readings rather than their absolute values by computing the Mahalanobis distance between the mean of the distribution (of nominal points) and the new sensor reading vector (in units of standard deviations) [38]. When a data point is too far from the normal data, it is considered anomalous. Such methods consist of determining the distance thresholds below which a data point is still valid and above which it is considered an anomaly. Furthermore, predefined anomaly rules can also be used to detect sensor faults [30]. However, their performance seems to be limited in terms of the resulting false alarm rates and slow processing speeds due to the high computational loads. Since the rules are fixed, this indicates limitations towards detecting unknown faults and adversarial situations. As a result, even though data-driven methods have provided a good alternative for detecting anomalies and then classifying them, there still exist limitations in terms of performance and implementation requirements.

In the literature, Lin et al. developed a method to detect sensor faults in a UAV by monitoring a number of internal and external sensor readings [39]. Again, the Mahalanobis distance was used in the calculations. However, the problem with such setups is that data measurements must be within a certain threshold while being processed offline. Practical validation of such techniques has also been questionable, as it has not been verified in the field. Khalastchi et al. proposed the use of the Mahalanobis distance to evaluate correlated attributes within a data set collected from real-time in-flight navigation sensor parameters [40]. The authors advocated that faults could be effectively detected based on a threshold determined directly from the distance residual size, where a warning is triggered if the distance residual exceeds a specified threshold.
However, even though this may seem to be an effective approach, some of the assumptions made are too ideal and hence difficult to realise in practice. What is noted is that many of the highlighted techniques depend on realising the model requirements prior to implementation [28,33,48]. Others seem to be computationally heavy or

11 The system will have to fly with strong winds, temperature fluctuations, component degradation, corrosion, body vibrations, etc.
12 Neural networks are able to handle such problems as they are non-linear approximators [46].


are designed solely with supervised cases in mind. As a result, the authors surveyed a number of techniques that would not require detailed knowledge about the internal dynamics of the system, and there is evidence which suggests that the isolation forest has better potential in terms of fast response times, with the added flexibility of making decisions based on statistical information [3,49]. Since its application in this domain could not be found, it provided a unique opportunity to contribute to this area.

2.4. Current challenges

There are several challenges in the domain, including gaining access to data that can reasonably predict situations, which may require dynamically enabling information, and the scale and speed required to analyse the data. This requires some means of high-performance computing. For example, if a problem is detected, then processing needs to be quick enough to shift to systems that are unaffected by the problem. This also requires near real-time classification of potential issues, their locations and the systems affected. Especially for real-world applications, there is a big difference between the sort of detection that is needed and the real-world equivalent that actually meets the hardware budget and time constraints. Machine learning can be used to circumvent some of these issues. But regardless of the problem, an effective anomaly detection solution must satisfy the following requirements:

• Availability of data: A very common challenge is the acquisition of relevant data. The availability, quality, and composition (e.g. are the meta-data included? are the data labelled?) of the data at hand have a strong influence on the performance of machine learning algorithms. A common challenge is the high dimensionality of the available data, which can contain irrelevant and redundant information that can impact the solution's performance. Furthermore, acquiring anomalous training data can be difficult; and even if it is obtained, it is highly likely that it will not account for all deviations during operation;
• Unsupervised learning: Since there is a large amount of heterogeneous data, it can become impractical to label anomalies or unknown events within the data. Furthermore, prediction of faults, failures, degradation, etc., requires machine learning algorithms that can predict and provide an informative trend on an event's propagation through the system. This can be done by recording large volumes of data over long periods of time to build a history, and tracing back through a fault to find the root cause or even early warning signs. Unsupervised machine learning methods such as variational autoencoders can be combined with time-frequency representations of the data, such as Fourier series, to learn distribution patterns. A model can then be built to detect and act on these problems before a fault occurs [50]. There is also a need to blend practical experience with other diagnostic and prognostic tools and techniques [51];
• Selection of algorithm: this influences the ability to classify various faults and failure modes using machine learning techniques. Since faults are rare, data is expensive and often disparate. There is a need for fusing information from many different aspects, including environmental, history, hardware, software, and performance, to indicate precursors and root causes. As data is produced in high volume, scale and scope, this requires methods that can automate fault classification. There also seems to be a need to carry out this process in situ, where both labelled and unlabelled fault data can be used;


• Process uncertainty: Since fault data is rare and expensive to label, there is a need to measure the accuracy of the results. Depending on the distribution/variance, the computational expense will be large. This demonstrates the need for investigating statistical approaches such as Naïve Bayes to gain insight into the ambiguities involved in the process. There will always be some uncertainty in the trained models themselves, uncertainty in the subsystem and requirement definitions, and uncertainties in the environment. Further, it is important to identify the various types and deal with their effects individually;
• Real-time solution: The system must be able to detect anomalies and learn new patterns in real-time. This points to the rising computational complexity of these problems, as there seems to be no systematic approach to solving them. Learning and correlating logged data is a large-scale off-line data analytics task, and adding any on-line data processing opens up yet another set of algorithmic and architectural challenges;
• Evaluation criteria: Given the above challenges, it is important to evaluate the solution against measurable performance metrics such as runtime performance, detection rate, scalability, etc. This raises an important question about how to carry out quality assurance of AI solutions, where factors such as data quality, features and the algorithm pipeline/infrastructure often vary.

3. An unsupervised approach

3.1. Overview

A popular supervised learning method is the decision tree, which looks to model higher-order interactions. Over the years, it has been adapted into other, more powerful algorithms – the random forest and the isolation forest – both of which fall under the category of ensemble methods.13 The former generates a multitude of decision trees on different subsets of the feature space and the training set. It averages the results from each tree and votes for the highest prediction. This substantially reduces instability in the predictor. The latter technique is a further development that processes data in high-dimensional spaces by constructing a fully random binary tree and partitioning the entire input space into patterns, with the assumption that if there is any unusual behaviour, it will fall into these partitioned spaces.

The isolation forest algorithm assumes that anomalous data should, in general, be significantly easier to isolate than normal data. The dataset is split recursively until each data point becomes its own leaf within the tree. The statistic of interest here is the depth of a data point within this tree when it becomes isolated (the number of splits needed to reach the sample from the input data). This process is repeated multiple times to construct diverse decision trees,14 with the intuition that if there is an anomaly, only a few splits will be needed until it becomes isolated.15 Since it is assumed that anomalies are rare, or α is small, it is likely that the path taken to reach an anomaly is far shorter than that for normal data. Finally, the path length is averaged across the depth of all trees and then normalised to calculate the anomaly score. Data instances with a small average tree depth have a higher anomaly score because they need fewer splits to be separated, indicating that higher scores will be considered anomalies.
13 This means that both these methods use a number of weak classifiers to produce a strong classifier; e.g., a simple decision tree is a weak classifier, but stacking many together to make a forest produces a much more robust solution.
14 This can be seen on a conceptual level as the exploration of random local sub-spaces.
15 Like any other algorithm, if the anomaly is deeply embedded in the dataset, then many splits will be needed to isolate it.
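To make the isolation principle concrete, the following is a minimal sketch (illustrative, not the authors' implementation) of how a single isolation tree is built by random recursive splits, and how the path depth at which a point becomes isolated is recovered:

```python
import numpy as np

def c(n):
    """Average path length of an unsuccessful binary-search-tree search
    over n points, used to normalise depths; 0.5772 is the
    Euler-Mascheroni constant (see Fig. 3)."""
    if n <= 1:
        return 0.0
    return 2.0 * (np.log(n - 1) + 0.5772) - 2.0 * (n - 1) / n

def build_itree(X, depth=0, max_depth=10):
    """Recursively split on a random attribute at a random split point
    until points are isolated or the maximum depth is reached."""
    n = X.shape[0]
    if n <= 1 or depth >= max_depth:
        return {"size": n}                      # external (leaf) node
    q = np.random.randint(X.shape[1])           # random split attribute
    lo, hi = X[:, q].min(), X[:, q].max()
    if lo == hi:                                # attribute offers no split
        return {"size": n}
    p = np.random.uniform(lo, hi)               # random split point
    return {"attr": q, "point": p,
            "left": build_itree(X[X[:, q] < p], depth + 1, max_depth),
            "right": build_itree(X[X[:, q] >= p], depth + 1, max_depth)}

def path_length(x, node, depth=0):
    """Depth at which x becomes isolated; anomalies tend to exit near
    the root, while normal points travel deep into the tree."""
    if "size" in node:                          # reached an external node
        return depth + c(node["size"])
    child = "left" if x[node["attr"]] < node["point"] else "right"
    return path_length(x, node[child], depth + 1)

# The anomaly score then follows Eq. (1): s = 2 ** (-E[h(x)] / c(n)),
# with E[h(x)] averaged over a forest of such trees.
```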

Fig. 2. An isolation tree. The internal nodes in the tree illustrate the attributes and split values. The end nodes illustrate the data. The random partitioning procedure is represented by an ensemble of trees. In this illustration, data near the root of the tree is easier to isolate and hence is more likely to be an anomaly, as compared to the data that is embedded far away from the root and hence is more likely to be classified as normal data.

A tree in an isolation forest is illustrated in Fig. 2. During training, the data moves across each isolation tree (of the forest) until it reaches a terminal node or until the maximum depth is reached. The choice of depth affects the granularity of the anomaly detection scheme.16 The tuning parameters of the algorithm include the subsample size and the number of trees. The maximum depth can also be included in the analysis. Fig. 3 outlines the algorithm implemented to calculate the scores over time. The setup has two phases. The off-line phase generates a forest with all the necessary parameters required during the online phase; this is part of the variable initialisation during the second phase. The output includes tree.size, which is known beforehand; tree.node, which is 1 for internal nodes and 0 for external nodes; tree.height, the height of the node; tree.splitattribute, the splitting attribute used by the tree; tree.splitpoint, the splitting point; tree.leftchild, the left child tree, which forms the first half-space tree model; and finally tree.rightchild, the right child tree, which forms the second half of the space tree model. During the online phase, a data window is used to pass through the data samples.17 This global window is defined according to the number of data points required to process that chunk of data, and can be used to process multivariate data as well. The rest of the computation is the standard algorithm to produce the scores; a sketch is given below.

It should be noted that although the algorithm works well with high-dimensional data, it is still useful to perform a correlation analysis and group the data accordingly before the anomaly detection process. This is a type of feature selection process which measures the relationships between variables and groups them together depending on their correlation coefficients. The reason for doing this is two-fold: firstly, it can be used to isolate the group of variables that are causing the anomaly, which can be useful when carrying out any kind of fault diagnosis. Secondly, the computed anomaly scores are not biased towards highly correlated features [49]. This prevents the algorithm from giving a higher score to anomalies that appear in correlated features, as compared to ones that do not.

16 The assumption is that anomalous points will traverse a shorter path in the forest as compared to normal points. As a result, the end result produces anomaly scores as a function of the normalised value of the average among all trees.
17 The window is updated depending on the sample rate as new data streams in.
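A hedged sketch of this two-phase setup follows, using the node fields named above; the forest is assumed to have been generated during the off-line phase, and score_fn stands in for the standard scoring routine (both names are illustrative):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    """Node fields mirroring the description in the text."""
    node: int                                 # 1 = internal, 0 = external
    size: int = 0                             # tree.size, known beforehand
    height: int = 0                           # tree.height of this node
    split_attribute: Optional[int] = None     # tree.splitattribute
    split_point: Optional[float] = None       # tree.splitpoint
    left_child: "Optional[TreeNode]" = None   # first half-space tree
    right_child: "Optional[TreeNode]" = None  # second half-space tree

def online_scores(stream, forest, score_fn, window_size=200):
    """Online phase: slide a global window over incoming (possibly
    multivariate) samples and score each full window on the forest."""
    window = deque(maxlen=window_size)
    for sample in stream:
        window.append(sample)     # the window updates as data streams in
        if len(window) == window_size:
            yield score_fn(list(window), forest)
```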


3.2. Computing the scores and likelihood

In the original publication, the authors defined an anomaly scoring strategy that quantitatively indexes all observations by [3]:

$$s(x, n) = 2^{-\frac{h(x)}{c(n)}} \tag{1}$$

where $s(x, n)$ is the score, $h(x)$ is the mean path length in a forest and $c(n)$ is the average number of steps required to isolate a sample of the dataset. When $h(x)$ approaches $n - 1$, the score approaches 0, indicating a normal condition. However, if $h(x)$ approaches 0, the score approaches 1, indicating an anomaly. Again, the benefit of employing such a method is that it helps avoid density or distance-based estimations that may require significant computational costs depending on the size of the data. Furthermore, a threshold can now be applied to flag the scores as anomalous. After successfully calculating the scores, there is a need to investigate the underlying implications of collective anomalies as an indication of a fault. This is done by making use of a sliding window that calculates the moving mean, $\mu_t$, and variance, $\sigma_t^2$, of the scores within that window:

$$\mu_t = \frac{\sum_{i=0}^{j-1} s_{t-i}}{j} \tag{2}$$

$$\sigma_t^2 = \frac{\sum_{i=0}^{j-1} (s_{t-i} - \mu_t)^2}{j-1} \tag{3}$$

For anomaly detection purposes, this is followed by computing the likelihood, $AL_t$, as the complement of the Gaussian tail probability, or Q-function [52]:

$$AL_t = 1 - Q\left(\frac{\hat{\mu}_t - \mu_t}{\sigma_t}\right) \tag{4}$$

where $\hat{\mu}_t$ is the median. It should be noted that the decision to choose the median over the mean, or vice versa, can be further scrutinised for better performance depending on the application [50]; for the analysis carried out in this article, the authors have opted to work with the median. By placing a threshold on $AL_t$, it can be used to report a fault incident. Eq. (4) is applied to a distribution of predicted errors to serve as an estimate of how accurately the model is able to pick up on collective anomalies, in the context of the history within the assigned window. In probable situations, $AL_t$ will almost always behave like the original scores calculated by the isolation forest, as the distribution of results will have smaller variances and be centred near 0. Sudden jumps within the scores, however, will result in an anomaly in $AL_t$. For example, in noisy sensor conditions, data variances will be a lot larger than in the ideal case. This will push the overall mean further away from 0. As a result, a single anomaly in the score will not cause an increase in $AL_t$, but successive anomalies will.

The question is how this threshold value should be decided. It is important to get this right, because a threshold close to 0 would generate too many alerts; an upper bound must effectively be imposed on the number of false positives. Under the assumption that anomalies themselves are extremely rare, the ratio of true positives to false positives will always need to be in a healthy range to remove trivial anomalies. Therefore, the threshold should be placed somewhere further away from the mean, where it acts as an inherent upper limit on the number of alerts, e.g., at 80%–90%. Of course, this will require some tuning, as some distributions, like a Gaussian, have tighter tails. Other distributions, like the Cauchy distribution, are much wider, indicating a relatively high probability of anomalous events.18 Finally, the overall quality of any anomaly detector depends heavily upon the quality of the underlying model it represents.
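As a minimal sketch of Eqs. (2)–(4) (illustrative names; scipy's norm.sf is used for the Q-function, and the per-sample isolation forest scores are assumed to be already available):

```python
import numpy as np
from scipy.stats import norm

def anomaly_likelihood(scores, j=200):
    """Windowed anomaly likelihood: moving mean (Eq. (2)) and variance
    (Eq. (3)) of the scores, then AL_t = 1 - Q((median - mean)/sigma)
    per Eq. (4), where Q is the Gaussian tail probability."""
    scores = np.asarray(scores, dtype=float)
    al = np.zeros_like(scores)
    for t in range(j, len(scores)):
        w = scores[t - j:t]
        mu = w.mean()                       # Eq. (2)
        sigma = w.std(ddof=1)               # square root of Eq. (3)
        if sigma == 0.0:
            continue                        # degenerate window, skip
        med = np.median(w)                  # the median, as chosen here
        al[t] = 1.0 - norm.sf((med - mu) / sigma)   # Eq. (4); = norm.cdf
    return al

# A fault incident can then be reported when al crosses a threshold,
# e.g. one placed in the 80%-90% range discussed above.
```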

3.3. Simulation results

Many practitioners are currently investigating effective ways that can help benchmark unsupervised anomaly detection algorithms. This plays an important role when commercialising any technological developments before allowing them to be put into operation.19 With the rise in the use of black-box models, there is a need for structured methodologies and techniques. The following methods are included in the analysis:

• Isolation forest
• One-class SVM
• Gaussian mixture model20
• Mahalanobis distance

As multi-dimensional datasets are of more interest in many practical situations, the authors have selected the dataset provided by C-MAPSS (Commercial Modular Aero-Propulsion System Simulation). It accounts for many sensor outputs from aircraft engines throughout their usage cycle for 100 units. The data is divided into a training dataset, whose trajectories end at the cycle in which the failure occurs for each engine, and a test dataset, whose trajectories end at the cycle prior to the failure. In addition, the data is labelled as normal if there are >50 cycles left before failure. The sensor data can be visualised in Fig. 4a. This data is normalised before applying PCA; as a result, only the first two components are used instead of the total 14.21 Fig. 4b demonstrates that the separation between anomalous and normal data points is limited in this application, and that attempting to use purely distance-based detection is likely to have low accuracy. After computing contours for the one-class SVM, GMM and Mahalanobis distance, the anomaly detection results are compared with test data from 10 units. The test dataset has a total of 510 anomalies. In this process, the aim is to detect the 1% of anomalies in the test dataset before unit failure. The condition of the engine is shown by changing colours for normal and anomalous data points. As demonstrated in Fig. 5a, the contours defined by the one-class SVM, GMM and Mahalanobis distance are useful tools as anomaly detectors before a failure takes place. Fig. 5b illustrates how the isolation forest algorithm approaches the problem of isolating anomalies from the rest of the data. As a result, in Fig. 6, instances with <50 cycles in the test dataset were better detected. Further details of the experiment are outlined in Table 2; a sketch of the benchmarking pipeline is given below.

18 In fact, the Cauchy distribution has tails that can be so wide that it may not even have a well-defined mean; only a median [53].
19 If a machine learning solution cannot be benchmarked appropriately, it will not be approved by any legislation and hence cannot be used for any safety-critical application.
20 Based on Akaike's information criterion, the optimum number of clusters was calculated to be 3.
21 It was noted that the first singular values already explained more than 79% of the variance in the first 2 components of the data.
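A sketch of this benchmarking pipeline under the stated setup (normalisation, the first two principal components, a 1% anomaly ratio and 3 GMM clusters); the file names and sensor-column slice are assumptions about the C-MAPSS layout:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.mixture import GaussianMixture
from sklearn.covariance import EmpiricalCovariance

# Assumed file layout: unit/cycle/setting columns first, sensors after
X_train = np.loadtxt("train_FD001.txt")[:, 5:]
X_test = np.loadtxt("test_FD001.txt")[:, 5:]

# Normalise, then keep only the first two principal components (Fig. 4b)
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Fit each detector; scores are arranged so larger means more anomalous
iforest = IsolationForest(n_estimators=100, contamination=0.01).fit(Z_train)
ocsvm = OneClassSVM(nu=0.01, gamma="scale").fit(Z_train)
gmm = GaussianMixture(n_components=3).fit(Z_train)   # 3 clusters via AIC
maha = EmpiricalCovariance().fit(Z_train)

scores = {
    "Isolation forest": -iforest.score_samples(Z_test),
    "One-class SVM": -ocsvm.score_samples(Z_test),
    "GMM": -gmm.score_samples(Z_test),
    "Mahalanobis distance": maha.mahalanobis(Z_test),
}
# Thresholding each score at its 99th percentile targets the 1% anomalies
```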


Fig. 3. The algorithm being implemented in real-time. The forest is pre-computed and used during the online phase to calculate the scores of the data in a sliding window. Note that 0.577 is the Euler–Mascheroni constant.

Fig. 4. The datasets used for benchmarking.


Fig. 5. Detecting anomalies in the data.

Fig. 6. The result from the Isolation forest algorithm.


Table 2
Evaluating algorithm performance.

Method                 True positive   True negative   False positive   False negative   Accuracy %
One-class SVM          1463            126             384              288              68.5
Isolation forest       1482            426             144              84               89.3
GMM                    1350            121             389              276              68.9
Mahalanobis distance   1326            129             381              300              68.1
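For reference, the accuracy column follows directly from the confusion counts; a short worked check using the isolation forest row:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn)

print(round(100 * accuracy(1482, 426, 144, 84), 1))  # 89.3, as in Table 2
```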

4. Case study

In this section, the UAV system under consideration and the implementation of the algorithm are discussed. The goal is to classify normal and anomalous data instances, where an anomalous instance indicates a potential fault. Since not all anomalies are considered faults, the goal is not to classify the type of fault, but rather to detect whether an issue was present at a given point in time in the multivariate time series.

Fig. 7. A custom-built quadrotor UAV used for data collection.

Table 3
Characteristics and explanation of the collected quadrotor data.

Name        Description
accSmooth   X, Y, and Z accelerometer readings
gyroADC     ROLL, PITCH, and YAW gyroscope readings in rad/s
magAD       X, Y, and Z magnetometer readings

4.1. The UAV system

The system under consideration is shown in Fig. 7. It is a custom-built quadrotor UAV, manufactured with aluminium parts, a 5000 mAh lithium-polymer battery, and four propulsion units with 1260 kV outrunner brushless motors. It weighs approximately 1 kg and has a compact size (the length between the centres of two diagonal motors is 39 cm). It is equipped with a self-developed Flight Controller (FC), which collects all sensor information onboard, performs simple PID control and outputs the control signals to the propulsion units. Currently, the FC collects raw data from a gyroscope (3-axis), an accelerometer (3-axis), a magnetometer (3-axis), a barometer and a GPS sensor. In addition, the FC is capable of logging the battery voltage and current levels being consumed, the motor commands and the ground station commands from the human operator. Integrating such navigation sensors onboard the UAV allows the system to be controlled and operated with robustness and reliability. This also provides the opportunity to make use of this information to effectively detect anomalies during operation and hence ensure that the UAV is safe to fly. The quadrotor was operated for about 6 min within a controlled indoor environment. The data collected reflected ideal conditions (absence of wind and relatively stable temperature) for its successful operation. A smoothing method was used to remove outliers in the data, in addition to noise. The sample rate was set at 200 Hz.

4.2. Experimental work

Data requirements: The first step is data acquisition. A number of sensors have been placed on the UAV and data is recorded

to an onboard data logger. This can then be analysed off-line for various data-driven modelling and processing. As raw data is often prone to a lot of noise and missing values, it is put through a preprocessing stage to filter and smooth out some of the information, also avoiding any sudden behavioural patterns. Since the recorded data is multi-dimensional, it is not ideal to analyse it directly in its current form. This is because some of the variables: (i) might not be required for health assessments and will lead to extra computational issues, (ii) can have low variability or remain constant throughout the operation, or (iii) cannot be used for health monitoring. This is where feature extraction and selection becomes a central part of the whole process. The former reduces the number of features that are available; the latter is representative of features that can be selected (or enhanced) for anomaly detection, failure diagnosis and prognosis.22 A breakdown of the selected variables used is given in Table 3.

Anomaly types: There are a number of problem sources that can arise during UAV operation; i.e., something that is not expected to happen frequently, does not follow a certain expected pattern, or is simply not what was expected. These can provide early indications of intermittent faults, electronic failures and degradation. Intermittent faults are temporary issues that result in momentary erroneous behaviour. These are often a result of magnetic/electrical interference caused by the operating environment. Electronic failures can be attributed to on-board components, often caused by sensors, control units and batteries. Degradation can develop over time and cause structural issues in the UAV system. These can encompass problems such as mount dislodging, uncontrollable vibrations and even bearing failures in the UAV's electric motors. Many of these problems can be studied simply with statistical analysis and rule-based strategies, e.g. using min/max, mean, standard deviation and correlations in the data to detect any anomalies. Some of the rules that can be used for detecting electronic failures, in particular, focus on the operating range of components, such as monitoring the battery current, GPS signal strengths, correlations between actual and desired outputs, etc. However, some failures (including incipient ones) can be difficult to classify and are dependent on other sensors. This can be exacerbated by uncertainties in positional estimates, sensor noise and the flight velocity. As a result, anomalies are introduced by loosening the UAV sensors to replicate impending component failures or intermittent wiring connections during flight.

Evaluation: An excerpt of two variables within the data is illustrated in Fig. 8. For evaluation purposes, a couple of synthetic anomalies were added at t = 5000 and t = 10000 using a process described by Kriegel et al., which generates anomalies based on a number of Gaussian distributions of different random mean and variance [2].23 The isolation forest was trained with 100 trees.24 Fig. 9 illustrates the results, including the distribution of scores assigned to anomalous and non-anomalous observations. Most of the normal observations are scored between 0.6 and 0.7, whereas the distribution of the anomalous observations is spread

22 Especially for prognosis, the health index is usually extracted and fused together with many other constituents to indicate the system degradation and health state.
23 Using a labelled dataset to compare an unsupervised algorithm might be somewhat counterintuitive, but the idea here is to only use the labels to compare relative model performance. 24 The original paper also suggested to use 100 trees because the path lengths would have already converged before that [3].


Fig. 8. Evaluating the detection performance.

Fig. 9. The real-time anomaly likelihood results generated after windowing the isolation forest.

Fig. 10. The data cluster labelled after calculating AL with window = 200.

above 0.75.25 However, it is still difficult to delineate. In contrast, the anomaly likelihood computed via Eq. (4) helps to further distinguish the result in Fig. 9b. This approach is useful not only for labelling data but also for triggering an alarm each time the likelihood crosses the predefined threshold, e.g., scores ≥ the 95th percentile. The final result is illustrated in Fig. 10. The follow-up analysis consists of all 9 variables that represent the system behaviour during UAV take-off and hovering. The

25 Liu and Ting (2008) had suggested that for normal observations the algorithm gives a score close to zero, and for anomalous observations a score between 0.5 and 1 [3].

results reveal that a number of anomalies take place at different time instances in Fig. 11a.26 Many of these alarms are a result of intermittent faults brought about by loose sensor connections. Points tagged as anomalous can be visualised in Fig. 11b, which illustrates the PCA result with 3 principal components. From this analysis, there are two instances which warrant further investigation, i.e. alarms raised between 189 s – 206 s and 325 s – 345 s. At the moment, there is no way to determine why an instance is being considered anomalous just by observing the

26 The anomaly likelihood threshold has been set at the 95th percentile. With a more relaxed threshold, more anomalies can be detected.


Fig. 11. The results from the anomaly detection setup.

Fig. 12. Using the Violin plot to investigate the root cause. The width of the plots shows the density of anomalous observations as a function of normalised data magnitude. The embedded boxplots show the inter-quartile range (blue boxes) and ±1.5 times the interquartile range (tolerance lines) of the data in each variable.

This leaves the analyst with no guidance about where to begin their investigation. To address this issue, consideration was given to grouping variables together so that anomalies can be detected and isolated to a particular group of variables. The algorithm would then have to be executed on each group separately (and in parallel) in order to achieve anomaly isolation in the incoming data. For real-time decision-making this appears to be the ideal solution; however, the authors opted for a post hoc off-line analysis using a visually intuitive tool, the violin plot, to statistically investigate the anomalous instances in the data. This can also help rank probable anomalous variables by their spread and skewness, and by the number of points beyond the min/max quartile range. Fig. 12 analyses the anomalies occurring between 189 s – 206 s and 325 s – 345 s: it reveals that in Fig. 12a, gyro readings 4, 5, 6 and variable 9 show the most variability, whereas in Fig. 12b, variable 9 contributes the most variability to the isolation forest score. Analysing the video of the UAV experiment during these periods revealed that the system was attempting to stabilise itself after a drop in altitude. It should be noted that this approach may not unequivocally diagnose the root cause of a problem, but it can provide much greater insight into the probable causes and hence help to narrow the search.
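A sketch of this violin-plot triage is given below, again with the hypothetical `X` (normalised data) and `alarms` arrays; the final ranking counts points beyond ±1.5 interquartile ranges, mirroring the tolerance lines in Fig. 12:

```python
import numpy as np
import matplotlib.pyplot as plt

anomalous = X[alarms]                      # observations inside an alarm window
fig, ax = plt.subplots()
ax.violinplot([anomalous[:, j] for j in range(anomalous.shape[1])],
              showmedians=True)            # one violin per variable
ax.set_xlabel('variable index')
ax.set_ylabel('normalised magnitude')

# Rank variables by how many points fall beyond +/-1.5 IQR (tolerance lines).
q1, q3 = np.percentile(anomalous, [25, 75], axis=0)
iqr = q3 - q1
beyond = ((anomalous < q1 - 1.5 * iqr) | (anomalous > q3 + 1.5 * iqr)).sum(axis=0)
print(np.argsort(beyond)[::-1])            # most suspicious variables first
plt.show()
```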

5. Conclusions

The detection of anomalies can either be missed entirely or suffer a significant time lag. There is a need to react quickly in order to learn about an event as soon as possible, identify its causes, and understand what to do about it. This entails identifying abnormal values, collecting the corresponding events centrally, and monitoring a far larger number of metrics and dimensions than human capabilities allow. The expectation is therefore that such solutions respond in a timely manner to unexpected events. There is, of course, a huge variety of approaches, methods and algorithms for detecting anomalies; the authors have therefore surveyed the burgeoning literature and analysed current developments in the context of UAVs.

This article investigated the requirements for unsupervised anomaly detection in engineering applications and demonstrated a real-time implementation of the isolation forest. The aim of the solution was to determine the complete range, description and operational parameters of what constitutes anomalous behaviour. Since it is not necessary to know a priori which observations are anomalies, the idea is to provide the algorithm with enough data to build a forest that can identify which observations are likely to be anomalous and which are not. The resulting scores should encompass the majority of such behaviours exhibited during operation. To avoid reacting to benign deviations and faulty measurements, a post-analysis stage was also implemented to calculate an anomaly likelihood metric with various window sizes. This produced a more robust result than using the raw scores generated by the isolation forest algorithm. Further analysis can be undertaken to better understand this function and its threshold levels, and to use violin plots to isolate the cause of an anomaly.

5.1. Future work

One of the challenges highlighted in the research gaps was the ability to process large volumes of data and test simulations with streaming signals. This can be attributed to the data acquisition and algorithm processing that must take place within a defined time frame, which motivated this effort to explore a hardware implementation of the algorithm. Since a functioning isolation forest had been implemented, the authors aimed to accelerate it for embedded applications. This section describes a Register Transfer Level (RTL) implementation, with the result optimised using Xilinx Vivado High-Level Synthesis (HLS). The framework for transferring the algorithm (and its functions) into an embedded solution is illustrated in Fig. 13. The major steps are to: (i) validate it,27 and (ii) synthesise the solution for hardware acceleration. The training of the forest was completed and stored in memory beforehand.
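To make the kernel concrete before discussing synthesis, the sketch below shows an array-based scoring routine of the kind that would be translated to C for HLS. The flattened node layout (feature index, split threshold, child indices per tree) and all names are illustrative assumptions rather than the generated code; the c(n) normalisation follows Liu et al. [3]:

```python
import numpy as np

def path_length(x, feat, thresh, left, right, depth_cap):
    """Traverse one flattened tree; -1 in `left` marks a leaf node."""
    node, depth = 0, 0
    while left[node] != -1 and depth < depth_cap:
        node = left[node] if x[feat[node]] < thresh[node] else right[node]
        depth += 1
    return depth  # simplified: omits the c(size) leaf adjustment of [3]

def iforest_score(x, trees, n, depth_cap=8):
    """Paper-style score in (0, 1]; values close to 1 indicate anomalies [3]."""
    # c(n) = 2*H(n-1) - 2*(n-1)/n, with H(i) ~ ln(i) + Euler's constant.
    c_n = 2.0 * (np.log(n - 1.0) + 0.5772156649) - 2.0 * (n - 1.0) / n
    e_h = np.mean([path_length(x, *t, depth_cap) for t in trees])
    return 2.0 ** (-e_h / c_n)
```

Fixed-size arrays of this kind map naturally onto BRAM, and the per-tree loop is the obvious target for the unrolling and pipelining directives discussed below.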
27 This may contain a hierarchy of subfunctions.


Table 4
Synthesis results.

             BRAM     DSP     FF        LUT
Standard     150      461     89 400    202 341
Optimised    211      31      47 702    61 079

The C code can then be generated from the Matlab implementation of the machine learning algorithm, which includes the necessary extensions and libraries. The generated code accounts for all the necessary function calls and related operations that enable the RTL description using a Hardware Description Language (HDL). The primary advantage of such a framework is rapid hardware prototyping under various configurations, which is otherwise a labour-intensive process [55]. The authors are currently targeting the cost-effective XC7Z020 FPGA board with a clock period of 10 ns. Initial implementations make use of two optimisation methods: unrolling28 and pipelining.29 These can be configured in various ways to perform different arithmetic operations and to exploit the internal pipelining of the DSP blocks to enhance throughput. The original code was divided into a machine learning function and a test bench. The test bench is executed on the ARM processor, whilst the machine learning function was converted into RTL for the FPGA. The HLS tool runs on an Intel(R) Core(TM) i7-8550 CPU @ 1.8 GHz. Typically, the metrics most relevant to assessing the quality and performance of a hardware implementation are the use of DSP48s,30 Flip-Flops (FF), Block RAMs (BRAM) and Look-Up Tables (LUT). Table 4 shows the synthesis results in terms of hardware resources. The standard implementation makes use of a large number of LUTs and consumes a significant amount of resources; hence, depending on the design requirements and intended use, a trade-off analysis between performance and resource usage must be carried out when the design is optimised. It appears that using HLS for machine learning problems enables optimisation of the hardware design in terms of both performance and digital resources. However, the solution presented here is not trivial: overly aggressive design constraints can result in lower performance and greater usage of FPGA area. Some optimisation strategies, such as inner loop unrolling, directly increase performance and resource usage, whereas the effect of more complex strategies, such as outer loop unrolling, cannot easily be predicted. The solution also depends on the problem being investigated and the implementation platform being used. Nonetheless, the combination of automatic code generation and such tools enables fast hardware realisation of the algorithm. The experimental work demonstrates great potential for real-time unsupervised anomaly detection in aerospace applications, and FPGA technology with high-level synthesis further opens the door to verifiable machine learning implementations. As a result, the authors are currently investigating the following:

• How to reduce failure ambiguity by using the anomaly likelihood and interpretable information that can aid in diagnosing the root cause;
• The practicality of deep autoencoders as an unsupervised learning algorithm;

28 Vivado HLS can enable partial or complete unrolling of loops. Completely unrolling a loop generates dedicated hardware for every iteration of the loop. This is in contrast to software programming, where the commands in a given loop are executed sequentially: successive iterations cannot start until all commands of the previous iteration have completed, which adds delay.
29 Loop pipelining helps implement concurrent commands within a loop, improving overall throughput and latency by reusing the same hardware with different control logic.
30 These are self-contained arithmetic logic units supporting add/subtract/multiply/logic operations.



Fig. 13. Development methodology for algorithm development and design work-flow.

• Accounting for user feedback, which can help improve the anomaly detection model in its next iteration;
• Onboard implementation;
• Latency and power analysis of the solution on FPGAs.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105650.

Acknowledgement

This project was partially funded by the Japan Society for the Promotion of Science (JSPS): P1609. All Matlab simulation files will be made available from the authors' GitHub page.

References

[1] S. Khan, T. Yairi, A review on the application of deep learning in system health management, Mech. Syst. Signal Process. 107 (2018) 241–265.
[2] H.-P. Kriegel, A. Zimek, et al., Angle-based outlier detection in high-dimensional data, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 444–452.
[3] F.T. Liu, K.M. Ting, Z.-H. Zhou, Isolation forest, in: 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 413–422.
[4] X. Jin, B.W. Wah, X. Cheng, Y. Wang, Significance and challenges of big data research, Big Data Res. 2 (2) (2015) 59–64.
[5] B. de Jonge, R. Teunter, T. Tinga, The influence of practical factors on the benefits of condition-based maintenance over time-based maintenance, Reliab. Eng. Syst. Saf. 158 (2017) 21–30.
[6] J. Daily, J. Peterson, Predictive maintenance: How big data analysis can improve maintenance, in: Supply Chain Integration Challenges in Commercial Aerospace, Springer, 2017, pp. 267–278.
[7] Y. Peng, M. Dong, M.J. Zuo, Current status of machine prognostics in condition-based maintenance: a review, Int. J. Adv. Manuf. Technol. 50 (1–4) (2010) 297–313.
[8] C.P. Ward, P. Weston, E. Stewart, H. Li, R.M. Goodall, C. Roberts, T.X. Mei, G. Charles, R. Dixon, Condition monitoring opportunities using vehicle-based sensors, Proc. Inst. Mech. Eng. F 225 (2) (2011) 202–218.
[9] S. Yin, S.X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Trans. Ind. Electron. 61 (11) (2014) 6418–6428.
[10] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: A survey, ACM Comput. Surv. 41 (3) (2009) 15.
[11] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Phil. Trans. R. Soc. A 374 (2065) (2016) 20150202.
[12] F. Gharibnezhad, L.E. Mujica, J. Rodellar, Applying robust variant of principal component analysis as a damage detector in the presence of outliers, Mech. Syst. Signal Process. 50 (2015) 467–479.
[13] J. Friedman, T. Hastie, R. Tibshirani, The Elements of Statistical Learning, Vol. 1, Springer Series in Statistics, Springer, New York, NY, USA, 2001.
[14] G.E. Hinton, T.J. Sejnowski, T.A. Poggio, Unsupervised Learning: Foundations of Neural Computation, MIT Press, 1999.
[15] C.C. Aggarwal, Outlier Analysis, Springer, 2015.
[16] M. Goldstein, S. Uchida, A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data, PLoS One 11 (4) (2016) e0152173.

[17] S. Sanchez, M. Perhinschi, H. Moncayo, M. Napolitano, J. Davis, M. Fravolini, In-flight actuator failure detection and identification for a reduced size uav using the artificial immune system approach, in: AIAA Guidance, Navigation, and Control Conference, 2009, p. 6266.
[18] A. Srivastava, Enabling the discovery of recurring anomalies in aerospace problem reports using high-dimensional clustering techniques, in: 2006 IEEE Aerospace Conference, IEEE, 2006, 17 pp.
[19] L. Li, R. Hansman, R. Palacios, R. Welsch, Anomaly detection via a gaussian mixture model for flight operation and safety monitoring, Transp. Res. C 64 (2016) 45–57.
[20] T.G. Dietterich, T. Zemicheal, Anomaly detection in the presence of missing values, in: ACM SIGKDD 2018 Workshop, 2018.
[21] T. Brotherton, T. Johnson, Anomaly detection for advanced military aircraft using neural networks, in: 2001 IEEE Aerospace Conference Proceedings (Cat. No. 01TH8542), Vol. 6, IEEE, 2001, pp. 3113–3123.
[22] A. Nanduri, L. Sherry, Anomaly detection in aircraft data using recurrent neural networks (rnn), in: 2016 Integrated Communications Navigation and Surveillance (ICNS), 5C2–1, IEEE, 2016.
[23] S. Lee, W. Park, S. Jung, Fault detection of aircraft system with random forest algorithm and similarity measure, Sci. World J. 2014 (2014).
[24] S. Ge, L. Jun, D. Liu, Y. Peng, Anomaly detection of condition monitoring with predicted uncertainty for aerospace applications, in: 2015 12th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), Vol. 1, IEEE, 2015, pp. 248–253.
[25] M. Amer, M. Goldstein, S. Abdennadher, Enhancing one-class support vector machines for unsupervised anomaly detection, in: Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, ACM, 2013, pp. 8–15.
[26] Y. Liu, W. Ding, A knns based anomaly detection method applied for uav flight data stream, in: 2015 Prognostics and System Health Management Conference (PHM), IEEE, 2015, pp. 1–8.
[27] G.O. Campos, A. Zimek, J. Sander, R.J. Campello, B. Micenková, E. Schubert, I. Assent, M.E. Houle, On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study, Data Min. Knowl. Discov. 30 (4) (2016) 891–927.
[28] D. Tibaduiza, L. Mujica, J. Rodellar, Damage classification in structural health monitoring using principal component analysis and self-organizing maps, Struct. Control Health Monit. 20 (10) (2013) 1303–1316.
[29] M. Xia, T. Li, L. Liu, L. Xu, C.W. de Silva, Intelligent fault diagnosis approach with unsupervised feature learning by stacked denoising autoencoder, IET Sci. Meas. Technol. 11 (6) (2017) 687–695.
[30] J. Bu, R. Sun, H. Bai, R. Xu, F. Xie, Y. Zhang, W.Y. Ochieng, Integrated method for the uav navigation sensor anomaly detection, IET Radar Sonar Navig. 11 (5) (2017) 847–853.
[31] D. Brown, G. Georgoulas, H. Bae, G. Vachtsevanos, R. Chen, Y. Ho, G. Tannenbaum, J. Schroeder, Particle filter based anomaly detection for aircraft actuator systems, in: 2009 IEEE Aerospace Conference, IEEE, 2009, pp. 1–13.
[32] L. Cork, R. Walker, Sensor fault detection for uavs using a nonlinear dynamic model and the imm-ukf algorithm, in: Information, Decision and Control, 2007. IDC'07, IEEE, 2007, pp. 230–235.
[33] J.-A. Ting, E. Theodorou, S. Schaal, A kalman filter for robust outlier detection, in: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2007, pp. 1514–1519.
[34] R. Fujimaki, T. Yairi, K. Machida, An anomaly detection method for spacecraft using relevance vector learning, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2005, pp. 785–790.
[35] Z. Birnbaum, A. Dolgikh, V. Skormin, E. O'Brien, D. Muller, C. Stracquodaine, Unmanned aerial vehicle security using recursive parameter estimation, J. Intell. Robot. Syst. 84 (1–4) (2016) 107–120.
[36] J. Schumann, K.Y. Rozier, T. Reinbacher, O.J. Mengshoel, T. Mbaya, C. Ippolito, Towards real-time, on-board, hardware-supported sensor and software health management for unmanned aerial systems, Int. J. Progn. Health Manage. (2015).

[37] J. Schumann, P. Moosbrugger, K.Y. Rozier, R2u2: monitoring and diagnosis of security threats for unmanned aerial systems, in: Runtime Verification, Springer, 2015, pp. 233–249.
[38] Z. Hu, M. Xiao, L. Zhang, S. Liu, Y. Ge, Mahalanobis distance based approach for anomaly detection of analog filters using frequency features and parzen window density estimation, J. Electron. Test. 32 (6) (2016) 681–693.
[39] R. Lin, E. Khalastchi, G.A. Kaminka, Detecting anomalies in unmanned vehicles using the mahalanobis distance, in: Robotics and Automation (ICRA), 2010 IEEE International Conference on, IEEE, 2010, pp. 3038–3044.
[40] E. Khalastchi, M. Kalech, G.A. Kaminka, R. Lin, Online data-driven anomaly detection in autonomous robots, Knowl. Inf. Syst. 43 (3) (2015) 657–688.
[41] J. Ting, Z. Fugen, B. Xiangzhi, Moving object detection in airborne video using kernel density estimation, Infrared Laser Eng. 40 (1) (2011) 153–157.
[42] C. Fu, R. Duan, D. Kircali, E. Kayacan, Onboard robust visual tracking for uavs using a reliable global-local object model, Sensors 16 (9) (2016) 1406.
[43] K. Guo, L. Liu, S. Shi, D. Liu, X. Peng, Uav sensor fault detection using a classifier without negative samples: a local density regulated optimization algorithm, Sensors 19 (4) (2019) 771.
[44] C.F. Liew, D. DeLatte, N. Takeishi, T. Yairi, Recent developments in aerial robotics: A survey and prototypes overview, arXiv preprint arXiv:1711.10085 (2017).
[45] G. Goh, S. Agarwala, G. Goh, V. Dikshit, S.L. Sing, W.Y. Yeong, Additive manufacturing in unmanned aerial vehicles (uavs): challenges and potential, Aerosp. Sci. Technol. 63 (2017) 140–151.
[46] G. Baffi, E. Martin, A. Morris, Non-linear projection to latent structures revisited (the neural network pls algorithm), Comput. Chem. Eng. 23 (9) (1999) 1293–1307.


[47] G.R. Rodríguez-Canosa, S. Thomas, J. Del Cerro, A. Barrientos, B. MacDonald, A real-time method to detect and track moving objects (datmo) from unmanned aerial vehicles (uavs) using a single camera, Remote Sens. 4 (4) (2012) 1090–1111.
[48] K. Hundman, V. Constantinou, C. Laporte, I. Colwell, T. Soderstrom, Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 387–395.
[49] L. Puggini, S. McLoone, An enhanced variable selection and isolation forest based methodology for anomaly detection with oes data, Eng. Appl. Artif. Intell. 67 (2018) 126–135.
[50] A. Kentaro, S. Khan, T. Yairi, C. Liew, Towards anomaly detection using variational long short-term memory autoencoders for system health monitoring, in: 35th International Conference on Machine Learning, 2018.
[51] J. Lee, J. Ni, D. Djurdjanovic, H. Qiu, H. Liao, Intelligent prognostics tools and e-maintenance, Comput. Ind. 57 (6) (2006) 476–489.
[52] S. Ahmad, A. Lavin, S. Purdy, Z. Agha, Unsupervised real-time anomaly detection for streaming data, Neurocomputing 262 (2017) 134–147.
[53] P.R. Garvey, S.A. Book, R.P. Covert, Probability Methods for Cost Uncertainty Analysis: A Systems Engineering Perspective, Chapman and Hall/CRC, 2016.
[54] A. Saxena, K. Goebel, Turbofan engine degradation simulation data set, NASA Ames Progn. Data Repos. (2008).
[55] R. Nane, V.-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y.T. Chen, H. Hsiao, S. Brown, F. Ferrandi, et al., A survey and evaluation of fpga high-level synthesis tools, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 35 (10) (2016) 1591–1604.