A hybrid approach for improving unsupervised fault detection for robotic systems




Accepted Manuscript

A Hybrid Approach for Improving Unsupervised Fault Detection for Robotic Systems
Eliahu Khalastchi, Meir Kalech, Lior Rokach

PII: S0957-4174(17)30221-X
DOI: 10.1016/j.eswa.2017.03.058
Reference: ESWA 11216

To appear in: Expert Systems With Applications

Received date: 24 December 2016
Revised date: 3 March 2017
Accepted date: 25 March 2017

Please cite this article as: Eliahu Khalastchi , Meir Kalech , Lior Rokach , A Hybrid Approach for Improving Unsupervised Fault Detection for Robotic Systems, Expert Systems With Applications (2017), doi: 10.1016/j.eswa.2017.03.058

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights
- From unsupervised to supervised learning of a fault detection model (for robots)
- Insights into why and when it becomes more accurate
- Theoretical analysis & a prediction tool
- Empirical results on 3 real-world domains that back these insights


A Hybrid Approach for Improving Unsupervised Fault Detection for Robotic Systems

Eliahu Khalastchi¹, Meir Kalech², Lior Rokach²
[email protected], [email protected], [email protected]
¹College of Management Academic Studies, ²Ben-Gurion University of the Negev

December 22, 2016

Abstract


The use of robots in our daily lives is increasing. As we rely more on robots, it becomes more important to us that they carry out their missions successfully. Unfortunately, these sophisticated, and sometimes very expensive, machines are susceptible to different kinds of faults. It therefore becomes important to apply a Fault Detection (FD) mechanism that is suitable for the domain of robots. Two important requirements of such a mechanism are high accuracy and a low computational load during operation (online). Supervised learning can potentially produce very accurate FD models, and if the learning takes place offline then the online computational load can be reduced. Yet, the domain of robots is characterized by the absence of the labeled data (e.g., "faulty", "normal") required by supervised approaches, and consequently, unsupervised approaches are used. In this paper we propose a hybrid approach: an unsupervised approach labels a data set, with a low degree of inaccuracy, and then the labeled data set is used offline by a supervised approach to produce an online FD model. Now we are faced with a choice: should we use the unsupervised or the hybrid fault detector? Seemingly, there is no way to validate the choice due to the absence of (a priori) labeled data. In this paper we give an insight into why, and a tool to predict when, the hybrid approach is more accurate. In particular, the main impacts of our work are: (1) we theoretically analyze the conditions under which the hybrid approach is expected to be more accurate; (2) our theoretical findings are backed with empirical analysis, using data sets of three different robotic domains: a high fidelity flight simulator, a laboratory robot, and a commercial Unmanned Aerial Vehicle (UAV); (3) we analyze how different unsupervised FD approaches are improved by the hybrid technique; and (4) we analyze how well this improvement fits our prediction tool. The significance of the hybrid approach and the prediction tool is the potential benefit to expert and intelligent systems in which labeled data is absent or expensive to create.

Keywords: Fault Detection, Robotic Systems, Unsupervised.


1. INTRODUCTION

In recent years we have witnessed a rapid increase in the use of robots. Recent reports by the International Federation of Robotics (IFR, 2016) describe yearly increases of 15% and 25% in sales of service and industrial robots, respectively. Robots are used for tasks that are too dangerous, too dull, too dirty or too difficult to be done by humans. Among such tasks we find surveillance and patrolling (Agmon, Kraus, & Kaminka, 2008), aerial search (Goodrich, et al., 2008), rescue (Birk & Carpin, 2006) and mapping (Thrun, 2002).

Robots are complex entities comprised of both physical (hardware) and virtual (software) components. They operate in different dynamic physical environments with a varying degree of autonomy, e.g., satellites, Mars rovers, Unmanned Aerial, Ground or Underwater Vehicles (UAV, UGV, UUV), a robotic arm in an assembly line, etc. These sophisticated, and sometimes very expensive, machines are susceptible to a wide range of physical and virtual faults such as wear and tear, noise, or software-control failures (Steinbauer, 2013). If not detected in time, a fault can quickly deteriorate into a catastrophe, endangering the safety of the robot itself or its surroundings (Dhillon, 1991). For instance, an undetected engine fault in a UAV can cause it to stall and crash. Thus, robots should be equipped with Fault Detection (hereinafter FD) mechanisms that will allow recovery processes to take place.

Different types of robots necessitate different FD requirements that challenge FD approaches in different ways. In this paper we focus on the following requirements:

1. High accuracy. In particular, a low false positive (false alarm) rate and a high true positive (detection) rate are required. These requirements are especially important where a false alarm might lead to a very expensive mission abortion and an undetected fault might lead to mission failure, e.g., in an unmanned spacecraft mission. In addition, in the dynamic context (e.g., new missions, dynamic environments) under which robots operate, one cannot account for all possible faults a priori; unknown faults might occur and must be detected as well.

2. Small computational burden. This is required where the FD process has to be executed onboard the robot. Robots are already engaged in mission-oriented real-time processes, such as vision processing, which are very demanding and compete over system resources (CPU, memory). An additional computationally heavy FD process might burden the system and thus interfere with the robot's behavior. This requirement is especially important for autonomous robots where there is less dependency on remote supporting systems; e.g., a rover on Mars cannot wait 22 minutes for a server on Earth to communicate the fact that the rover has a critical fault.

There are types of robots that necessitate both of these requirements from an FD mechanism, e.g., a UAV flying solo in part of its mission. Unfortunately, traditional approaches for FD in the literature are challenged to meet both of these requirements at once. Three general approaches are usually used for FD: Knowledge-Based systems, Model-Based, and Data-Driven approaches. Knowledge-Based systems (Akerkar & Sajja, 2010) typically associate recognized behaviors with predefined known faults. These approaches tend to have a small computational burden but are unable to detect unknown faults, and thus, accuracy is compromised. Model-Based approaches (Isermann, 2005), on the other hand, are well equipped to detect unknown faults. Instead of modeling different faults, the expected (normal) behavior is modeled. High discrepancies between the model output and the observed system output are reported as detected faults. Yet, one must consider the practicality of the model construction, and its load on the robot's computational system during runtime. Data-Driven approaches (Hodge & Austin, 2004) (Anderson, Michalski, Carbonell, & Mitchell, 1986) are model-free and have a natural appeal for detecting unknown faults. Online data processing may capture the current temporal context of the robot's operation and thus achieve better accuracy (Khalastchi, Kaminka, Kalech, & Lin, 2011). Yet, these online computations increase the computational load of the robot (Chandola, Banerjee, & Kumar, 2009). Machine learning approaches are data-driven approaches that may involve offline preprocessing which reduces online computations. An FD model is learned from offline data and applied online. Supervised learning approaches rely on data that is already labeled for detected faults (Khalastchi, Kalech, & Rokach, 2014). Unfortunately, such data is typically absent in the domain of robots, and very expensive to produce.


In this paper we provide three contributions. The first contribution is a hybrid approach that overcomes the absence of labeled data. We utilize an unsupervised FD approach which could have been used to detect faults online. We use this unsupervised FD approach offline, as a black box, to label a large data set. Then, supervised learning is applied and a new classifier is produced. The produced supervised classifier is used online for FD. Since the classifier has statically captured the online behavior of the unsupervised FD approach, it potentially places less burden on system resources (CPU, memory). In addition, under some conditions, the supervised classifier can even be more accurate than the unsupervised FD approach would have been.

This proposed hybrid approach raises several interesting questions: (1) What are the conditions under which it should achieve greater accuracy? (2) Why can greater accuracy be achieved at all, especially given the fact that the unsupervised FD approach is not able to perfectly label the data set? And (3) can we predict the success of such a hybrid approach in the absence of labeled data? To the best of our knowledge, although approaches that are similar in concept have been applied, there has been no attempt to give theoretical answers to these questions.

Our second contribution lies in the theoretical extension of our previous work (Khalastchi, Kalech, & Rokach, 2014). We extend our theoretical findings by addressing these questions for any general binary classification problem, nominal data, and decision tree learning. We provide a theoretical analysis that (a) uncovers the conditions that make the decision tree more accurate than the original unsupervised FD approach, (b) provides an answer to why it is more accurate, and (c) provides a prediction about when it is advisable to use the proposed hybrid approach. Thus, we partially fill the theoretical gap.

The third contribution of our work is the experimental extension of our previous work. We use data sets of three different robotic domains: a new benchmark for FD in a high fidelity flight simulator, a laboratory robot, and a commercial Unmanned Aerial Vehicle (UAV). The new benchmark is very challenging since it contains (a) different types of faults, (b) varying fault durations, (c) concurrent occurrence of multiple faults, and (d) contextual faults, i.e., data instances that possess values which


are valid under one context but are invalid under another. We provide an empirical analysis which is consistent with our theoretical findings. In particular, we show how the hybrid approach is more accurate than four different unsupervised fault detection approaches that are utilized to label the data sets, and how well this improvement in accuracy fits our theoretical prediction.

The paper is organized as follows. In the next section we present related work. In Section 3 we provide an overview of the proposed hybrid approach. In Section 4 we provide the theoretical analysis which serves as our main contribution. In particular, we uncover the conditions under which the technique works, answer why it works, and derive a predicting calculation. In Section 5 we provide our third contribution: the empirical experimental setup and results. We conclude with a discussion in Section 6.

2. RELATED WORK

Steinbauer conducted a survey on the nature of faults of autonomous robots (Steinbauer, 2013). The survey participants were developers competing in different leagues of the RoboCup competition (2013). The reported faults were categorized as hardware, software, algorithmic and interaction-related faults. The survey concludes that hardware faults, such as sensor, actuator and platform-related faults, have a deep negative impact on mission success. In this paper we focus on improving the detection accuracy of different FD approaches which can detect such faults to sensors and actuators.

In the introduction we discussed the main FD approaches and their disadvantages when applied to different types of robotic systems, namely, Model-Based, Data-Driven, and Knowledge-Based approaches. Our arguments are consistent with those of Pettersson (2005), who presents a survey of execution monitoring in robotics. He uses the knowledge and terminology of industrial control to classify different execution monitoring approaches applied to robotics. His survey particularly focuses on mobile robots and contains a very good discussion of the advantages and disadvantages of each approach. Our research is focused on a hybrid approach which can avoid these disadvantages.


Knowledge-based systems (Akerkar & Sajja, 2010) are typically challenged by the requirement to detect unknown faults. Model-based approaches (Isermann, 2005) (Travé-Massuyès, 2014), which use a model that describes the expected behavior of the robot, can detect unknown faults. For example, Steinbauer and Wotawa (2005), and recently Wienke (2016), use a model-based approach for detecting failures in the control software of a robot. The software architecture was utilized for the model creation. Different aspects of software components are observed and expected to behave according to the model unless there is a fault of any type, including an unknown type. Yet, model-based approaches are challenged to model the expected behavior of components and their interactions with respect to the dynamic environment. In recent work, Steinbauer and Wotawa (2010) emphasize the importance of the robot's belief management and fault detection with respect to the real-world dynamic environment. Akhtar and Kuestenmacher (2011) address this challenge by the use of naive physical knowledge. The naive physical knowledge is represented by the physical properties of objects, which are formalized in a logical framework. Yet, formalizing the physical laws demands great a priori knowledge about the context in which the robot operates (environment and task). A diagnosis model can be automatically generated (Zaman & Steinbauer, 2013) (Zaman, Steinbauer, Maurer, Lepej, & Uran, 2013). The authors utilize the Robot Operating System (ROS) (Quigley, et al., 2009) to model communication patterns and functional dependencies between nodes. A node is a process that controls a specific element of the robot, e.g., a laser range finder or path planning. The hybrid approach we propose applies decision tree learning. The decision tree is an automatically produced fault detection model, and as such, the challenge of describing expected behaviors with respect to the environment is avoided.

Data-driven techniques (Golombek, Wrede, Hanheide, & Heckmann, 2011) (Hodge & Austin, 2004) (Anderson, Michalski, Carbonell, & Mitchell, 1986) process data and produce a model, typically for fault detection. The model creation can be done in offline preprocessing or online as the robot operates and generates the data. Either way, the model should be applied online to detect faults as they occur. Online model creation, in particular density-based approaches such as in (Pokrajac, Lazarevic, & Latecki, 2007) (Serdio, et al., 2014) (Khalastchi, Kalech, Kaminka, & Lin, 2015) (Costa, Angelov, & Guedes, 2015) (Bezerra, Costa, Guedes, & Angelov, 2016), may produce dynamic models that fit very well with the dynamic nature of the robot. Yet, such approaches face the challenge of producing the dynamic model within the constraints of time and computational load (Chandola, Banerjee, & Kumar, 2009). On the other hand, offline model creation, such as in machine learning approaches, reduces online calculations, making the approach quicker and easier on system resources. However, the produced model is static and typically cannot account for every possible scenario. This challenge is met with the use of very large data sets which represent the scope of operations of the robot. Our empirical analysis utilizes such data sets.

The absence of labeled data in the domain of robots can be handled in different ways. Some approaches, such as in (Leeke, Arif, Jhumka, & Anand, 2011) and (Christensen, O'Grady, Birattari, & Dorigo, 2008), depend on the injection of fault expressions into a data set of non-faulty operation prior to the learning phase. Such approaches face the challenge of mimicking the fault's propagation through the system in order to correctly inject the fault's expressions in the data. Another possibility is to use one-class classification techniques. Hornung et al. (2014) utilize a data set of fault-free operation. They apply two clustering techniques to produce two-class labeled data. First, the Mahalanobis Distance (Mahalanobis, 1936) is utilized in a radial basis function to cluster the data instances. The formed clusters depict the normal behavior of the robot. Then, negative selection is used to cluster the unknown data instances, i.e., an infinite amount of possible data instances that do not appear in the data and depict the abnormal behavior of the robot. Having the two labels, a Support Vector Machine (SVM) (Steinwart & Christmann, 2008) learning algorithm is applied offline, and used online as an anomaly detector. For efficiency, the high dimension of the data is reduced by techniques of projection and re-projection. In this paper we assume we have a large (unlabeled) data set of operations which already contains hidden faults. There is no need to mimic fault expressions. In addition, we assume we have an unsupervised fault detection approach which is utilized offline as a data labeler. Thus, there


is no need for a generic clustering algorithm; we use an FD approach which is already suitable for the domain. Table 1 summarizes the general advantages and challenges the archetypes of FD approaches have when applied to robotic systems. Note that this is a general summary and exceptions can be found. We can see that each approach has a certain challenge that is an advantage of the other approaches.

Approaches      | Advantages                                            | Challenges
----------------|-------------------------------------------------------|------------------------------------------------
Knowledge-based | Low computational burden                              | Detection of unknown faults
Model-based     | Low computational burden; detection of unknown faults | Modeling expected behaviors of every component and their interactions, especially w.r.t. the environment
Data-driven     | Model free; detection of unknown faults               | High computational burden; absence of labeled data

Table 1: Summary of advantages and challenges FD approaches have when applied to robotic systems.

The robot's operational data is a time series comprised of quantitative data streams. In this paper, we focus on nominal data. The quantitative data streams are transformed into nominal values depicting the existence of suspicious patterns (e.g., "drift", "stuck"). Such patterns are general in nature and can be found in other works on fault detection. For instance, the Advanced Diagnostics and Prognostics Testbed (ADAPT) (Mengshoel, et al., 2008) depicts the following faults of sensors on an electrical circuit: "stuck", where all values produced by the sensor are the same; "drift", where the values show a movement towards higher (or lower) values; and "abrupt", where there is a sudden large increase (or decrease) in the sensor's values. Another example can be found in the work of Hashimoto et al. (2003). They use Kalman filters along with kinematical models to diagnose "stuck" and "abrupt" faults of sensors of a mobile robot, as well as "scale" faults, where the (gain) scale of the sensor output differs from the normal expectation. Sharma et al. (2010) discuss how to use rule-based, estimation, time series analysis, and learning-based approaches to detect "real-world" sensor


patterns such as “constant” (stuck), “short” (abrupt), and “noise”, where the variance of the sensor readings increases.
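Detectors for such window-level patterns can be sketched in a few lines. The following is an illustrative stand-in rather than any cited work's implementation; the function name and the thresholds (`drift_slope`, `noise_std`) are hypothetical:

```python
# Illustrative sketch: map one fixed-size window of quantitative sensor
# readings to a single nominal pattern label ("stuck", "drift", "noise",
# or "normal"). Thresholds here are hypothetical placeholders.
from statistics import pstdev

def detect_pattern(window, drift_slope=0.5, noise_std=2.0):
    """Return a nominal label for one window of quantitative readings."""
    if len(set(window)) == 1:
        return "stuck"                       # all values identical
    # mean slope between first and last sample suggests a steady drift
    slope = (window[-1] - window[0]) / (len(window) - 1)
    if abs(slope) >= drift_slope:
        return "drift"
    if pstdev(window) >= noise_std:
        return "noise"                       # variance of readings increased
    return "normal"

print(detect_pattern([5.0] * 10))                    # stuck
print(detect_pattern([float(i) for i in range(10)])) # drift
```

A real detector would tune such thresholds per attribute; the point is only that each window of quantitative readings collapses to one nominal value.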


3. AN OVERVIEW OF THE HYBRID APPROACH


Figure 1: an overview of the hybrid approach.

Figure 1 depicts an overview of the hybrid approach. The left side describes the offline process. We begin (at the top) with a large unlabeled data set. This data set depicts the nominal values of the robot's attributes as recorded during many different operations. This set already includes hidden examples of faults that the robot has suffered.

Specifically, during the robot's operations, data streams of sampled attributes are recorded. The attributes are based on sensor readings, actuator commands and actuator feedback. For instance, attributes of a UAV may include sensor readings such as indicated altitude, airspeed, heading, etc., actuator commands such as the force to be applied on the elevators, ailerons, rudder, throttle, etc., and actuator feedback such as the current angle of the ailerons. These recordings form a quantitative


time series. Hence, we require an unsupervised approach that has the ability to transform the quantitative time series into nominal data.¹ Next, an unsupervised fault detection approach is used offline, as a black box, to label the data set, i.e., each row in the data set receives the label "Fault" or "Normal". The fault detection approach used is very accurate, but not perfect. It may have a high fault detection rate (true positive rate), and a low


false alarm rate (false positive rate). Yet, the resulting labeled data set contains false positive and false negative examples, i.e., normal instances deemed faulty and faulty instances deemed normal (respectively). Note that we do not know the fault detection and false alarm rates of the approach, nor the amount of false positives and negatives in the data. However, we estimate that relatively small amounts of mistakes do exist within this data set.

Even so, in the next step we apply supervised decision tree learning, i.e., based on the data that the unsupervised approach has labeled. This process produces a fault-detecting decision tree which depicts, and perhaps generalizes, the fault detection decision mechanism of the utilized unsupervised approach.

The online process is depicted on the right side of the figure. The robot's sampled quantitative data is processed and transformed into nominal data. The nominal data is fed to the decision tree, which quickly determines whether or not this instance is a fault. Applying this decision tree online would typically require significantly fewer calculations than the online calculations done by the original unsupervised fault detection approach; these calculations are replaced with a static tree. We benefit from a quick and computationally easy fault detection approach. But is it more accurate than the original


online approach? We answer this question in the next section.

¹ Implementation detail: we used a fixed-size sliding-window technique to frame the data streams. Pattern detectors were applied to each stream. Detected patterns, such as "drift" or "stuck", that describe the data stream of an attribute were assigned as its nominal value.
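The offline half of the pipeline described above can be sketched end to end. This is a minimal illustration under assumed inputs: `unsupervised_label` is a hypothetical stand-in for the black-box unsupervised detector, the data set is a toy list of nominal attribute tuples, and the learned model is a degenerate one-level "tree" that stores the majority label per unique case (a real implementation would grow a full decision tree):

```python
# Sketch of the offline hybrid pipeline: an unsupervised detector labels
# nominal examples, then a per-case majority vote is learned and used as
# the cheap online classifier. All names and data here are hypothetical.
from collections import Counter, defaultdict

def unsupervised_label(example):
    """Black-box unsupervised FD stand-in: flag any suspicious pattern."""
    return "Fault" if "stuck" in example or "drift" in example else "Normal"

def learn_tree(examples):
    """Group examples by their nominal attribute tuple -- each unique case
    plays the role of a leaf -- and store the majority label per case."""
    leaves = defaultdict(Counter)
    for ex in examples:
        leaves[ex][unsupervised_label(ex)] += 1
    return {case: counts.most_common(1)[0][0] for case, counts in leaves.items()}

# Offline: label a toy data set and "train"; online: classify new instances.
data = [("normal", "normal"), ("stuck", "normal"), ("normal", "drift")] * 10
tree = learn_tree(data)
print(tree[("stuck", "normal")])   # Fault
print(tree[("normal", "normal")])  # Normal
```

The online step is then a dictionary lookup, which mirrors why the hybrid detector is cheap at runtime: the heavy computation happened offline.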



4. THEORETICAL ANALYSIS



In this section we provide our main contribution. In subsection 4.1 we provide a theoretical analysis for the hybrid approach by investigating the conditions under which it is expected to achieve better accuracy than the unsupervised approach utilized as a data labeler. In addition, we derive the conditions under which greater accuracy can be achieved. In subsection 4.2 we describe a predictor that can help to decide whether to use the hybrid approach or the original unsupervised approach even though we do not possess labeled data for testing.

4.1 Why the Hybrid Approach can be More Accurate

Table 2 depicts the notations we use in this section, and their meaning. Let S be a sufficiently large unlabeled set of examples, consisting of P positive and N negative examples. Since S is unlabeled, P and N are unknown. Let U be an unsupervised classification approach. We know U is a good classifier, but not perfect. Figure 2 depicts U's classification of S. The examples of S are represented by the circle. The top half represents P_U, the share of examples classified by U as positive, and the bottom half represents N_U, the share of examples classified by U as negative. P_U includes true positives TP_U and false positives FP_U, and N_U includes true negatives TN_U and false negatives FN_U. We do not know these amounts, but we do know that U satisfies TPR_U > FPR_U.

Notation     | Meaning
-------------|--------------------------------------------------------------------
S            | A large unlabeled data set
P, N         | Unknown numbers of positive and negative examples in S
U            | An unsupervised classifier that labels S
P_U          | The (countable) number of examples in S that U classified as positive
N_U          | The (countable) number of examples in S that U classified as negative
TP_U, FP_U   | Unknown numbers of true positives and false positives of U
TN_U, FN_U   | Unknown numbers of true negatives and false negatives of U
TPR_U, FPR_U | The unknown true positive rate and false positive rate of U
T            | A decision tree, learned on S as labeled by U
l, S_l       | l is a leaf in T, and S_l is the subset of examples which satisfy the case of l
P_l          | The (countable) number of examples in S_l that are classified as positive by U
N_l          | The (countable) number of examples in S_l that are classified as negative by U
TP_T, FP_T   | Unknown numbers of true positives and false positives of T
TN_T, FN_T   | Unknown numbers of true negatives and false negatives of T
TPR_T, FPR_T | The unknown true positive rate and false positive rate of T

Table 2: notations and their meaning.

Figure 2: how S is classified by an unsupervised classifier.

We can label S with U, apply decision tree learning, and produce a classification tree T. We wish to predict if T is more accurate than U when put to a test, i.e., whether T would have a higher true positive rate and a lower false positive rate. We are interested in these measures since they depict the fault detection rate and false alarm rate (respectively) of a fault detection approach. We wish to determine which classifier is more accurate even though no labeled data exist.

In decision tree classifiers, the decisions are made in the leaves of the tree. Each leaf l represents a case where different features possess different values. For instance, Figure 3 depicts a decision tree where the leaf l represents such a case.

Figure 3: a decision tree.

Let S_l be the set of examples in S which satisfy the case of l. Let P_l be the number of examples in S_l that are classified as positive by U, and let N_l be the number of examples in S_l that are classified as negative by U. If P_l > N_l then the tree decides that the case of l should be classified as positive. If N_l > P_l then the tree decides that the case of l should be classified as negative. If P_l = N_l then the tree cannot decide (the leaf has no information gain) and thus such leaves are omitted. Figure 4 represents the two types of leaves (positive and negative) and their corresponding subsets of examples in S. The leaf l+ classifies as positive since P_l > N_l. The leaf l- classifies as negative since N_l > P_l.

Figure 4: the two types of leaves and their corresponding subset of examples.
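The leaf-labeling rule described above can be stated compactly; a small sketch with our own naming (`leaf_decision`), where a tie returns nothing because such a leaf carries no information gain:

```python
# Sketch of the leaf-labeling rule: a leaf is labeled by the majority of
# U's labels among the examples S_l that reach it; ties carry no
# information gain, so such leaves are omitted (here: None).
def leaf_decision(p_l, n_l):
    """p_l / n_l: counts of examples in S_l labeled positive / negative by U."""
    if p_l > n_l:
        return "positive"
    if n_l > p_l:
        return "negative"
    return None  # P_l == N_l: no information gain, leaf omitted

print(leaf_decision(80, 20))  # positive
print(leaf_decision(3, 7))    # negative
print(leaf_decision(5, 5))    # None
```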

Generally, in each subset S_l represented by a leaf, there is an unknown number of correct and incorrect labels U has made. Correct labels include a number of true positives denoted as tp_l, and true negatives denoted as tn_l. Incorrect labels include a number of false positives denoted as fp_l and false negatives denoted as fn_l. Each of these values can be equal to or greater than zero.

Let us focus on l+, a leaf that classifies as positive, i.e., every example in S_l is classified by T as positive. Let us investigate the correlation between the classification by T and the classification by U. tp_l is the number of examples for which both T and U are correct. fp_l is the number of examples for which both are incorrect. tn_l is the number of examples for which U is correct and T is incorrect. fn_l is the number of examples for which U is incorrect and T is correct. Thus, if fn_l > tn_l then T correctly classifies more examples in S_l than does U.

Generally, the probability for U to be correct when classifying an example x as negative is P(U is correct | U(x) = negative). Since U is assumed to be a good classifier, this probability is greater than 1/2. Seemingly, this might mean that tn_l > fn_l in S_l.

However, this analysis disregards two important pieces of available information. (a) l is a leaf. The construction process of the tree only splits nodes if there is meaningful information to be gained. This means that any way to split S_l would result in leaves depicting subsets of examples for which U's decision does not contribute any meaningful information; such leaves are pruned. Hence, all the

ACCEPTED MANUSCRIPT

examples in

, though different from one another, depict a single unique case and provide the same

information with respect to the desired decision. (b) We know that

that

times, and

times as negative, where

is certain that the case depicted by

this value as

. Note that

. With this information we can state

is positive with a probability of

is known since

and

are countable in

For example, the tree learning process has grouped examples were labeled by

all the 100 examples in

examples under a leaf

as positive and

. Among these

as negative. We know that

depict a single unique case with respect to the contribution of meaningful

information. Thus, we can state that probability

.

is certain that the case depicted by

.

is positive with a

AN US

examples

. We denote

CR IP T

positive

has classified this single case as

Now we can understand why greater accuracy can be achieved. Every example in s depicts the same case, and according to U, each example has a probability of P for being positive. Thus, we can estimate that the number of false negatives in s is P·n, and the number of true negatives in s is (1−P)·n. That is, if P is greater than 1/2 then T classifies more examples in s correctly than U classified.

Continuing the example above, we can estimate that the number of false negatives in s is 0.8·20 = 16 and the number of true negatives is 4, as Figure 5 depicts. We can see that for the case depicted by s, which T classifies as positive, U is correct 68 times (its 80 positive labels are correct with probability 0.8 and its 20 negative labels with probability 0.2, i.e., 64 + 4 = 68 expected correct labels), while T is correct 80 times. Thus, given an example in s, the tree has a probability of 80/100 = 0.8 of being correct, while U has a probability of only 68/100 = 0.68 of being correct; T is better.

The same principle can be applied to negative leaves. For the examples in a negative leaf s we can state that U is certain with a probability of P = n/(p+n) that they are negative. Hence, if P is greater than 1/2 then T correctly classifies more examples in s than does U.

Figure 5: An example of a positive leaf.
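The per-leaf arithmetic of this example can be sketched in a few lines of Python (an illustration of ours, not code from the paper); the only inputs are the counts p and n of positive and negative labels that the unsupervised approach produced inside the leaf:

```python
def leaf_accuracy(p, n):
    """For a positive leaf in which the unsupervised approach U labeled
    p examples positive and n examples negative: the depicted case is
    positive with probability P = p / (p + n).  The tree T labels all
    p + n examples positive; U labeled only p of them positive."""
    total = p + n
    P = p / total                    # probability the single case is positive
    correct_t = P * total            # expected correct labels of the tree T
    correct_u = P * p + (1 - P) * n  # expected correct labels of U
    return P, correct_t, correct_u

# The running example: 100 examples, 80 labeled positive, 20 negative.
P, correct_t, correct_u = leaf_accuracy(80, 20)
print(P, correct_t, correct_u)  # 0.8 80.0 68.0
```

Running leaf_accuracy(80, 20) reproduces the numbers above: the tree is expected to be correct 80 times out of 100, the unsupervised approach only 68.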

Now, we can understand the conditions under which greater accuracy can be achieved. As there are more positive leaves in which P > 1/2 and more negative leaves in which P > 1/2, there are more cases in which T is correct and U is not. Understanding this condition allows us to create a predictor that can help us decide whether U or T should be used. We describe this predictor in the next subsection. Note that this theoretical analysis is not restricted to the fault detection problem; it can be applied to any binary classification problem where nominal data is used and there is enough unlabeled data.
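The single-unique-case property suggests a minimal sketch of the hybrid scheme on nominal data (our illustration, with made-up cases and labels, not the authors' implementation): group the examples by their full nominal case, and let each group act as a leaf whose label is the majority vote of the possibly noisy unsupervised labels.

```python
from collections import Counter, defaultdict

def train_hybrid(labeled):
    """labeled: iterable of (nominal_case, unsupervised_label) pairs.
    Each unique nominal case plays the role of a leaf; its label is the
    majority vote of the (possibly noisy) unsupervised labels."""
    votes = defaultdict(Counter)
    for case, lab in labeled:
        votes[case][lab] += 1
    return {case: c.most_common(1)[0][0] for case, c in votes.items()}

def classify(model, case, default="neg"):
    return model.get(case, default)

# Hypothetical stream: the unsupervised detector mislabels a minority of
# the examples of each case; majority voting per case recovers the label.
stream = [(("stuck", "ok"), "pos")] * 80 + [(("stuck", "ok"), "neg")] * 20 \
       + [(("ok", "ok"), "neg")] * 95 + [(("ok", "ok"), "pos")] * 5
model = train_hybrid(stream)
print(classify(model, ("stuck", "ok")))  # pos
print(classify(model, ("ok", "ok")))     # neg
```

The 80/20 case mirrors the positive leaf of the running example: even though the unsupervised labeler errs on 20 of its 100 examples, the per-case vote classifies the whole case correctly.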

4.2 Predicting Which Approach is More Accurate

Recall that in our domain, labeled data is absent and thus we cannot apply a test on a testing set. Even so, we wish to know which approach is better to use: the original unsupervised approach U, or the hybrid approach with its decision tree T? Seemingly, more information is needed. Yet, we can answer the following question: given an estimation of the true positive and false positive rates U would have theoretically achieved on an imaginary testing set, what would be the expected true positive and false positive rates of T? The answer to this question can help us decide if T is indeed better. Formally, we define a predictor that, when given tpr_U (an estimated true positive rate of U) and fpr_U (an estimated false positive rate of U), returns the expected true positive rate and the expected false positive rate of T, denoted tpr_T and fpr_T respectively.

For example, Alice implemented an unsupervised approach U. In addition, she applied the hybrid approach and produced a decision tree T. Due to the absence of labeled data she cannot test which classifier is better. However, she can ask: if U were to achieve a true positive rate of 0.8 and a false positive rate of 0.1 on some data set, what would T be able to achieve? Alice applies the predictor to (0.8, 0.1) and gets an answer (tpr_T, fpr_T); if this pair is more accurate than her input, it indicates that T is preferable.

We start by estimating the number of positive and negative examples in the data set D, denoted |D+| and |D−| respectively. In the data set D, we know the number of examples that are labeled by U as positive and as negative; these are denoted |D_U+| and |D_U−| respectively. Among the |D_U+| positively labeled examples, some are true positives and some are false positives. The true positive rate tpr_U represents the ratio between the true positives and |D+|; hence, the amount of true positives is tpr_U·|D+|. Similarly, the false positive rate fpr_U represents the ratio between the false positives and |D−|; hence, the amount of false positives is fpr_U·(|D| − |D+|). Thus, |D_U+| = tpr_U·|D+| + fpr_U·(|D| − |D+|). With some algebra we can assign the estimated number of positive examples, |D+| = (|D_U+| − fpr_U·|D|) / (tpr_U − fpr_U), and the estimated number of negative examples, |D−| = |D| − |D+|.

Continuing the running example, Alice used a data set D whose size |D| is known, and applying U to label D produced countable numbers of positive and negative labels. According to Alice's input (0.8, 0.1), she can estimate |D+| and |D−|.
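Assuming hypothetical values for |D| and for the number of examples U labeled positive (the concrete numbers of the running example are not reproduced here), the algebra reads:

```python
def estimate_class_sizes(n_total, n_labeled_pos, tpr, fpr):
    """Invert  n_labeled_pos = tpr * pos + fpr * (n_total - pos)  for pos,
    the estimated number of truly positive examples in the data set."""
    assert tpr != fpr, "degenerate rates carry no information"
    pos = (n_labeled_pos - fpr * n_total) / (tpr - fpr)
    return pos, n_total - pos

# Hypothetical numbers in the spirit of Alice's input (0.8, 0.1):
pos, neg = estimate_class_sizes(10_000, 1_500, tpr=0.8, fpr=0.1)
print(round(pos), round(neg))  # 714 9286
```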

Now, we can calculate the number of true positives, false positives, true negatives, and false negatives of T.

Note that we have just estimated that in D there are FN_U = (1 − tpr_U)·|D+| mislabeled positive examples and FP_U = fpr_U·|D−| mislabeled negative examples. These examples are distributed among the subsets represented by the leaves of T. In the previous subsection we theorized what the portions of false negatives and false positives are for each type of leaf. In particular, we stated that in a subset s which corresponds to a positive leaf, the amount of false negatives is P·n, that is, the probability of an example in s being positive times the amount of examples labeled as negative by U. Similarly, in a subset s which corresponds to a negative leaf, the amount of false positives is P·p, that is, the probability of an example in s being negative times the amount of examples labeled as positive by U. Note that all of these amounts are countable in each leaf and thus known.

Let S+ be the set of positive leaves of T, and S− be the set of negative leaves of T. With the prior estimation of |D+| and |D−|, we can now calculate the false negatives, false positives, true positives and true negatives of T:

FN_T = FN_U − Σ_{s ∈ S+} P_s·n_s
FP_T = FP_U − Σ_{s ∈ S−} P_s·p_s
TP_T = Σ_{s ∈ S+} (p_s + n_s) − FP_T
TN_T = Σ_{s ∈ S−} (p_s + n_s) − FN_T

where p_s and n_s count the examples in leaf s that U labeled as positive and as negative, P_s = p_s/(p_s + n_s) for a positive leaf, and P_s = n_s/(p_s + n_s) for a negative leaf.

The amount of false negatives for the tree, FN_T, should count all the examples in D that were falsely labeled as negative, apart from those which belong to positive leaves: FN_U counts all the examples in D that are falsely labeled as negative by U, but the examples among them that fall in positive leaves are classified as positive by T, and thus cannot be regarded as part of FN_T; this is the negated sum Σ_{s ∈ S+} P_s·n_s. Similarly, FP_T should count all the examples in D that were falsely labeled as positive, apart from those which belong to negative leaves, i.e., the negated sum Σ_{s ∈ S−} P_s·p_s. The amount of true positives for the tree is set to be the number of examples classified as positive by the tree, which is countable, minus its false positives; the number of true negatives for the tree is the number of examples it classifies as negative minus its false negatives.

Continuing the running example, assume that Alice counted the examples that T classifies as positive, 400 in total, and the examples it classifies as negative. For each positive leaf Alice calculated the estimated number of examples that were falsely labeled by U, and likewise for each negative leaf. Accordingly, she obtains FN_T, FP_T, TP_T, and TN_T.

Note that, according to our formulas, as U and T agree on more examples the negated sums are reduced, so FN_T becomes closer to FN_U and FP_T becomes closer to FP_U. As a result, the predicted true positive rate and false positive rate of T and those of U become very similar. This is in support of the conditions to achieve greater accuracy, as discussed in the previous subsection: if U and T mostly agree, there is little room for improvement. If the prediction for T is better than the estimation given for U, then it is advisable to use T.

Now we can estimate the true positive rate and the false positive rate of T:

tpr_T = TP_T / |D+|,  fpr_T = FP_T / |D−|.
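Putting the pieces together, the predictor can be sketched as follows (our reconstruction of the procedure just described; the leaf counts in the example call are made-up):

```python
def predict_tree_rates(n_total, leaves, tpr_u, fpr_u):
    """Sketch of the predictor of Section 4.2 (our reconstruction).

    leaves : list of (sign, p, n) triples, one per leaf of the tree T,
             where p and n count the examples the unsupervised approach U
             labeled positive and negative inside that leaf, and sign is
             '+' for a positive leaf and '-' for a negative leaf.
    tpr_u, fpr_u : estimated rates of U on an imaginary testing set.
    Returns the expected (true positive rate, false positive rate) of T.
    """
    n_labeled_pos = sum(p for _, p, _ in leaves)
    # Estimate the number of truly positive / negative examples in D.
    pos = (n_labeled_pos - fpr_u * n_total) / (tpr_u - fpr_u)
    neg = n_total - pos
    fn_u = (1 - tpr_u) * pos   # U's estimated false negatives
    fp_u = fpr_u * neg         # U's estimated false positives
    # Mistakes of U that T repairs: in a positive leaf the n examples U
    # labeled negative are false negatives with probability p/(p+n);
    # symmetrically for negative leaves.
    fn_t = fn_u - sum(p / (p + n) * n for s, p, n in leaves if s == '+')
    fp_t = fp_u - sum(n / (p + n) * p for s, p, n in leaves if s == '-')
    tp_t = sum(p + n for s, p, n in leaves if s == '+') - fp_t
    return tp_t / pos, fp_t / neg

# Made-up leaf counts: one positive leaf (160 pos / 10 neg labels by U)
# and one negative leaf (20 pos / 810 neg labels by U).
tpr_t, fpr_t = predict_tree_rates(1000, [('+', 160, 10), ('-', 20, 810)],
                                  tpr_u=0.9, fpr_u=0.1)
print(round(tpr_t, 3), round(fpr_t, 3))  # 0.995 0.078
```

In this made-up scenario the predicted tree improves on both rates of U (0.9, 0.1), matching the qualitative pattern reported in Section 5.3.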

In our running example, Alice calculated tpr_T and fpr_T. Since the prediction for T is better than the input (0.8, 0.1) of U, Alice should prefer the use of T.

Even without labeled data of any sort, with this predictor we can plot several instances of (tpr_U, fpr_U) and the corresponding expected (tpr_T, fpr_T) on a curve. If the prediction for T is better for these instances than U, then it is advisable to use T. For example, Figure 6 depicts a case where the hybrid approach is potentially more accurate than the unsupervised approach. The X-axis represents the false positive rate and the Y-axis represents the true positive rate. As a point in the chart tends towards the upper left corner, it depicts greater accuracy. The four black dots represent the pairs of fabricated (tpr_U, fpr_U); the four gray dots represent the pairs of the corresponding predicted (tpr_T, fpr_T). We can see that for each input, the predictor yields a more accurate prediction for T. Thus, for this case one may know that it is recommendable to use the hybrid approach. On the other hand, Figure 7 depicts a case where the hybrid approach is not expected to be different from the unsupervised approach. For each fabricated pair of (tpr_U, fpr_U), the prediction for T was of insignificant difference. This case signifies that there is little room for improvement and hence, it is not recommendable to use the hybrid approach.

Figure 6: A prediction example. The hybrid approach is expected to be more accurate than the unsupervised approach.

Figure 7: A prediction example. There is no significant difference between the hybrid and the unsupervised approach.

5. EVALUATION

In this section we provide our evaluation. We start by describing the experimental setup in subsection 5.1, which includes the simulated and real-world domains. We continue in subsection 5.2 with the description of the different fault detection approaches used offline to label the data sets. In subsection 5.3 we provide the results. In particular, we show that the predictor was able to indicate the right decision as to whether or not we should use the hybrid approach.

5.1 Experimental setup

In this paper we provide a way to predict whether or not it is useful to use the hybrid approach over the unsupervised approach, even in the absence of labeled data. However, in order to validate our hypothesis that the predictions are correct, we have to use labeled data sets. We use the following labeled data sets. We created a new data set that can be used as a public benchmark for general fault detection approaches as well as approaches designed more specifically for the domain of robots. We utilized the FlightGear (Perry, 2004) flight simulator. FlightGear is an open source high fidelity flight simulator designed for research purposes and is used for a variety of research topics. FlightGear is a great domain for robotic FDD research, particularly for UAVs, and yet it is not widely enough used for this topic of research. In addition to this simulated domain, we experiment with two real-world domains as well: a commercial UAV and a laboratory robot. In this section we describe the FlightGear domain and the real-world domains.

5.1.1. The FlightGear Domain

The most important aspect of FlightGear as a domain for FD is the fact that FlightGear has built-in, realistically simulated instrument, system, engine, and actuator faults. For example, if the vacuum system fails, the HSI (Horizontal Situation Indicator) gyros spin down slowly with a corresponding degradation in response as well as a slowly increasing bias/error (drift) (Perry, 2004), which in turn, if not detected, can lead the aircraft miles away from its desired position. Thus, the first advantage is that FDD approaches may solve a real-world problem. More importantly, while in other domains faults' expressions are injected into the recorded data after the flight is done, in FlightGear the simulated faults are built-in and can be injected during the flight. First, there is no bias; the faults were not modeled by the scientist who created and tested the FDD approach. Second, built-in faults which are injected during the flight propagate and affect the whole simulation. Hence, fault expressions are more realistic.

We created control software to fly a Cessna 172p aircraft as an autonomous UAV (see Figure 8). The flight instruments are used as sensors which sense the aircraft's state with respect to its environment. These sensor readings are fed into a decision-making mechanism which issues instructions to the flight controls (actuators) such that goals are achieved. As the UAV operates, its state changes and is again sensed by its sensors. The desired flight pattern consists of seven steps: a takeoff, 30 seconds of straight flight, a right turn, 15 seconds of straight flight, a left turn of 180 degrees, 15 seconds of straight flight, and a descent to 1,000 feet. In total, the flight pattern duration is 6 minutes of flight.

During a flight, 23 attributes are sampled at a frequency of 4 Hz. These attributes comprise 5 flight-control feedbacks (actuators) and 18 attributes of flight instruments (sensors). The sampled flight controls are the ailerons, elevators, rudder, flaps, and engine throttle. The flight instruments are the airspeed indicator, altimeter, horizontal situation indicator, encoder, GPS, magnetic compass, slip skid ball, turn indicator, vertical speed indicator, and the engine's RPM. Each instrument yields 1 to 4 attributes of data, which together add up to 18 sensor-based attributes. Note that we did not sample attributes that would have made the fault detection easy. For instance, sampling the current and voltage of the aircraft would have made the detection of an electrical system failure unchallenging.

Figure 8: Flying a Cessna 172p as a UAV in the FlightGear simulator.

#  | Fault to                 | Type       | Effect
1  | Airspeed indicator       | Instrument | Stuck
2  | Altimeter                | Instrument | Stuck
3  | Magnetic compass         | Instrument | Stuck
4  | Turn indicator           | Instrument | Quick drift to minimum value
5  | Heading indicator        | Instrument | Stuck
6  | Vertical speed indicator | Instrument | Stuck
7  | Slip skid ball           | Instrument | Stuck
8  | Pitot tube               | Subsystem  | Airspeed drifts up
9  | Static                   | Subsystem  | Airspeed drifts up or down; altimeter & encoder are stuck
10 | Electrical               | Subsystem  | Turn indicator slowly drifts down
11 | Vacuum                   | Subsystem  | Horizontal situation indicator slowly drifts
12 | Flight elevator          | Actuator   | Stuck

Table 3: Summary of injected faults

The data set contains one flight which is free from faults, 5 subsets that each contain 12 recorded flights in which different faults were injected, and one special "stunt flight". In total, the data set contains 62 recorded flights with almost 90,000 data instances. We injected faults into 7 different instruments and into 4 different subsystems. Table 3 depicts the different faults and their effects. For instance, a fault injection of type 9 fails the static subsystem, which, in turn, leads the airspeed indicator to drift upwards or downwards, and the altimeter and encoder to be stuck.

Four subsets represent a single-fault scenario. Each of the 12 flights in a subset corresponds to an injected fault in Table 3, i.e., flight 1 was injected with an airspeed indicator failure, flight 2 was injected with an altimeter failure, etc. Each fault was injected 3-6 times per flight, and lasted 5-30 seconds. The fifth subset represents a multi-fault scenario. Each of the 12 flights in this set was injected 3 times; each time, a fault was injected into two different components at the same time. The double injection occurred at a random time of the flight, and for a random duration of 10-30 seconds. In total, the data set contains 290 injected faults. The first subset was used as an unlabeled data set. Subsets 2-5 were used as a testing set. That is, on these subsets, we compared the fault detection rate and the false alarm rate of the online FD approaches against the hybrid approach which utilized these approaches. We checked if the improvement fitted the theoretical prediction.


5.1.2. Real-World Domains

In addition to the FlightGear simulated domain, we experiment with two physical robots: a commercial UAV and a laboratory robot. These domains are not as complex as the simulated domain, but they serve to show the domain independence of the SFDD and its ability to handle real-world data.

Commercial UAV domain: The real UAV domain consists of 6 recorded real flights of a commercial UAV. 53 attributes were sampled at 10 Hz. The attributes consist of telemetry, inertial, engine and servo data. Flight durations vary from 37 to 71 minutes. The UAV manufacturer injected a synthetic fault into two of the flights. The first scenario is a value that drifts down to zero. The second scenario is a value that remains frozen (stuck). The detection of these two faults was challenging for the manufacturer since in both scenarios the values are in the normal range. The remaining four flights were used as a training set, where we injected similar synthetic faults into two of the flights. In total, the test set contains 65,741 instances, out of which 1,593 are expressions of faults.

Laboratory robot domain: Robotican1 is a laboratory robot (see Figure 9) that has 2 wheels, 3 sonar range detectors in the front, and 3 infrared range detectors located right above the sonars, making the sonars and infrareds redundant systems to one another. This redundancy reflects real-world domains such as unmanned vehicles. In addition, Robotican1 has a 5-degrees-of-freedom arm. Each joint is held by two electrical engines. These engines provide feedback of the voltage applied by their action.

Figure 9: Robotican1 laboratory robot.

The following scenario was repeated 10 times: the robot slows its movement as it approaches a cup. Concurrently, the robot's arm is adjusted to grasp the cup. In 9 out of the 10 times, faults were injected. Faults of type stuck or drift were injected into different types of sensors (motor voltage, infrared and sonar). We sampled 15 attributes at 8 Hz. Each scenario lasted only 10 seconds, of which 1.25 seconds expressed a fault. In total, the test set contains 800 instances, out of which 90 are expressions of faults. Four scenarios were used as an unlabeled training set and the other 6 were used as a testing set.

5.2 The Unsupervised Fault Detection Approaches

Potentially, every unsupervised approach can be used to label the data set. In order to show the strengths and weaknesses of the hybrid approach we compare four different unsupervised approaches and analyze how the dependent hybrid approach is affected by each. We used the following unsupervised FD approaches to label the data sets offline. The first is the incremental local outlier detection approach (Pokrajac, Lazarevic, & Latecki, 2007), denoted here as U_LOF. This approach uses the K-nearest neighbor metric to compare densities of data instances. We arbitrarily chose a fixed threshold above which an outlier is considered a result of a fault.

The second approach, denoted U_ODDAD, is the Online Data Driven Anomaly Detector (Khalastchi, Kalech, Kaminka, & Lin, 2015). Data is consumed online in a sliding window fashion. The Mahalanobis distance is utilized to compare correlated streams of current temporal data. Outliers above a calculated threshold are considered anomalies that may have been caused by faults.

The third approach, denoted U_SFDD, is the Sensor based Fault Detection and Diagnosis approach (Khalastchi, Kalech, & Rokach, 2013). The SFDD is an improvement of the ODDAD, where suspicious pattern recognition is utilized instead of the Mahalanobis distance. In addition, a fault detection heuristic differentiates faults from normal behaviors.

The fourth approach is an extended implementation of the SFDD, denoted as U_SFDD*, where, among other extensions, the online temporal correlation detection is replaced with an offline large scale constant correlation detection. These correlations are learned offline from a fault-free record of operation. Since this approach is very accurate we do not expect the decision tree to achieve greater accuracy.

We denote these unsupervised approaches as U_LOF, U_ODDAD, U_SFDD, and U_SFDD* respectively, and T_LOF, T_ODDAD, T_SFDD, and T_SFDD* as the produced decision trees for these approaches.

5.3 Results

First we wish to show the prediction ability of our predictor. For each unsupervised approach, we inputted the predictor with several fabricated instances of true positive rates and false positive rates. It predicted an improvement for all approaches besides U_SFDD*; Figure 6 depicts the predicted values for one of these approaches and shows an improvement with respect to the unsupervised approach. The U_SFDD* approach is very accurate. Examination of the leaves of T_SFDD* showed that, apart from only one example, U_SFDD* and T_SFDD* agree on all examples in the data set. Thus, it is expected that the hybrid approach will not be better than U_SFDD*.

We classified the testing sets with the unsupervised approaches and calculated their true positive rate and false positive rate. These results were re-inputted to the predictor. We checked if the predictions are similar to the real results of the corresponding hybrid approach. Table 4 depicts this comparison. The columns depict the unsupervised approaches. The first row depicts the results of these approaches on the test set; each cell shows the true positive rate and false positive rate of the corresponding approach. These values were inputted, as an estimation, to the predictor. The second row depicts the predictions for the corresponding decision trees. The last row depicts the observed results of the trees on the test set. These results are collected from the FlightGear domain. By comparing the first row and the last row we can see how well each unsupervised approach can be improved by the hybrid approach. For instance, U_SFDD managed a very high true positive rate of 0.98 and a very low false positive rate of 0.06. Yet, the hybrid approach T_SFDD managed to achieve even better results: a true positive rate of 0.996 and a false positive rate of 0.03. By comparing all the rows we can see how close the prediction is to the real obtained results. For instance, consider the column of U_ODDAD: given the estimation (0.88, 0.13), the prediction for T_ODDAD is (0.90, 0.12), i.e., it is recommendable to use T_ODDAD. Indeed, T_ODDAD achieved even better results than predicted on the testing set, i.e., (1.00, 0.02).

                 | U_LOF       | U_ODDAD    | U_SFDD      | U_SFDD*
Estimation of U  | 0.87, 0.075 | 0.88, 0.13 | 0.98, 0.06  | 0.96, 0.003
Prediction for T | 0.90, 0.069 | 0.90, 0.12 | 1.00, 0.05  | 0.96, 0.003
Results of T     | 0.99, 0.022 | 1.00, 0.02 | 0.996, 0.03 | 0.94, 0.004

Table 4: Observed results (FlightGear domain); each cell shows (true positive rate, false positive rate)

Obviously, the data of the testing set is different from that of the training set. As a result, the prediction cannot be 100% precise. Yet, we can see that for each approach the prediction can support the decision whether to use the hybrid or the unsupervised approach. For each of the first three approaches, the prediction favored the hybrid approach, and indeed it turned out to be better. For U_SFDD*, the predictions for T_SFDD* are exactly the same as the estimation. That is, the tree depicts the exact same decisions as U_SFDD*, and thus there is no room for improvement. Hence, we can decide not to use T_SFDD*, but rather use U_SFDD*.

In the particular case of U_SFDD* vs. T_SFDD*, not only is there no room for improvement, but the unsupervised U_SFDD* is better equipped to handle previously unobserved data instances than the supervised T_SFDD*. Thus, the unsupervised U_SFDD* can be more accurate than T_SFDD*. Indeed, the results for the testing set reveal that U_SFDD* is more accurate. We can conclude that when there is no room for improvement, the unsupervised approach is preferable.

The implication of these results is the ability to use the hybrid approach in domains where labeled data is absent or expensive, and to use our predictor to indicate which classifier we should use, even when seemingly there is no way to validate the choice due to the lack of (a priori) labeled data.

Next, we wish to demonstrate different aspects of improvement of the hybrid approach. Recall that the SFDD is sliding-window based. The size of the window governs the accuracy of the approach (Khalastchi, Kalech, & Rokach, 2013). Figure 10 illustrates the degree of reduction in the average false positive rate over the FlightGear domain. Note that the false positive rate is on a logarithmic scale. We can see that for each size of sliding window, the false positive rate of the hybrid approach is significantly lower than that of the unsupervised approach.

Figure 10: FP rate vs. sliding window size. Unsupervised is U_SFDD, hybrid is T_SFDD; data taken from FlightGear.

The different parameters used by the unsupervised approach during the offline phase can be viewed as different unsupervised approaches, each with its own rate of false positives. The hybrid approach contributes to the reduction of the false positive rate for each of these unsupervised approaches. Moreover, as the false positive rate of the unsupervised approach gets lower, the false positive rate of the hybrid approach tends to 0.

Recall that the sampled data of the robot is of a quantitative nature. Again, a sliding window technique is used online to capture frames of the data stream, apply pattern detection on these framed streams, and output a nominal value for each attribute. As in the SFDD approach, the size of the online sliding window may govern the accuracy of the hybrid approach. In particular, a smaller size yields more fault reports. The increase in reports may increase the detection rate (true positive rate) of the hybrid approach, but it may also increase the rate of false alarms (false positives). Since the false alarm rate of the hybrid approach is very low, one can decide to tolerate more false alarms in order to increase the fault detection rate.
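As an illustration of this online step (a sketch of ours; the actual pattern detection of the SFDD is more elaborate), each window of a quantitative sensor stream can be mapped to a nominal value such as "stuck", "drift-up", "drift-down", or "normal":

```python
def to_nominal(stream, w):
    """Slide a window of size w over a quantitative sensor stream and
    emit one nominal value per window, as consumed by the hybrid approach."""
    out = []
    for i in range(len(stream) - w + 1):
        win = stream[i:i + w]
        if max(win) - min(win) < 1e-9:
            out.append("stuck")          # no change across the window
        elif all(b < a for a, b in zip(win, win[1:])):
            out.append("drift-down")     # strictly decreasing values
        elif all(b > a for a, b in zip(win, win[1:])):
            out.append("drift-up")       # strictly increasing values
        else:
            out.append("normal")
    return out

print(to_nominal([5.0, 5.0, 5.0, 4.0, 3.0, 2.0, 2.5], 3))
# ['stuck', 'normal', 'drift-down', 'drift-down', 'normal']
```

Note that a smaller w produces more windows, and hence more reports, which is the trade-off discussed above.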

Figure 11 illustrates the false alarm rates and the detection rates of the unsupervised approach versus the hybrid approach under the influence of a changing size of the online sliding window (62 sec to 47 sec). The X-axis represents the false alarm rate and the Y-axis represents the detection rate. As a point in the chart tends towards the upper left corner, it depicts greater accuracy. Note that the scale of Figure 11 zooms in on high detection rates (close to 1) and low false alarm rates (close to 0). The addition of false alarms to the hybrid approach is of little significance, while the effect on the unsupervised approach is apparent. As expected, the detection rate of the hybrid approach gets higher as the size of the online sliding window decreases.

Figure 11: ROC curves, hybrid vs. unsupervised, for different sizes of the online sliding window.

Similar trends were observed for the real-world domains, the UAV and Robotican1. As in the FlightGear domain, the predictor supported the use of the hybrid approach for every unsupervised approach but U_SFDD*. The injected faults were relatively easy to detect, and where the unsupervised approach achieved a detection rate of 1, so did the hybrid approach, i.e., all faults were detected. Yet, the hybrid approach achieved significantly lower false alarm rates than the unsupervised approaches, apart from U_SFDD*, where both approaches achieved similar false alarm rates. Table 5 depicts the false alarm rates of the unsupervised approach vs. the hybrid approach for the real-world domains; a lower score is better. We can see that the hybrid approach is significantly better. Similar results were obtained for the other unsupervised approaches, signifying the success of the hybrid approach even for real-world data.

Domain     | Unsupervised | Hybrid
UAV        | 0.033        | 0.016
Robotican1 | 0.067        | 0.041

Table 5: False alarm rates

In order to test the generality of the prediction in domains other than fault detection, we tested it on the Breast Cancer Wisconsin data set (Wolberg & Mangasarian, 1990). The data set contains 699 instances of 10 nominal attributes with a benign/malignant classification problem. We used 80% of the instances as a training set, and the other 20% as a testing set. We treated this domain as if it were unlabeled. We removed the labels from the training set and used the (unsupervised) K-nearest neighbor algorithm to relabel the training set. Then, we constructed a decision tree and used the predictor with several parameters of true positive and false positive rates. The predictor indicated an improvement in the true positive rate at the cost of a higher false positive rate. Indeed, this was the case when we tested the classifiers on the testing set, as Table 6 depicts.

                 | TP rate | FP rate
Results of KNN   | 0.823   | 0.025
Prediction for T | 0.865   | 0.034
Results of T     | 0.86    | 0.089

Table 6: The Breast Cancer data set results

The first row depicts the results of the K-nearest neighbor relabeling on the testing set. The second row depicts the predictor output for the decision tree, where 0.823 and 0.025 are used as the parameters for the true positive and false positive rates respectively. We can see that the predictor indicated a higher true positive rate at the cost of a higher false positive rate. The third row depicts the results of the decision tree on the testing set. As indicated by the predictor, the tree achieved higher true positive and false positive rates. This indicates that our predictor is not limited to the fault detection classification problem.

6. DISCUSSION AND FUTURE WORK

In this paper we addressed the problem of fault detection (FD) for robotic systems. In particular, we tackled the need for higher accuracy and lower computational load of FD approaches. Given the absence of labeled data in the domain of robots, we introduced a hybrid approach that can utilize an unsupervised approach, designed for FD for robotics, to label a large unlabeled data set. Then, we apply decision tree learning to produce a fault-detecting decision tree. The online work of classifying according to the decision tree is very easy on system resources, but is it more accurate?

We answered three important questions. (1) What are the conditions under which the hybrid approach should achieve greater accuracy? We answered that a higher rate of disagreement between the approaches may leave room for improvement. (2) Why can greater accuracy be achieved at all, given the fact that the unsupervised FD approach is not able to perfectly label the data set? The answer lies in the notion that all examples belonging to the same leaf depict a single unique case with respect to a meaningful information gain. As such, the probability of the leaf being correct can be calculated, and we showed it is greater than 1/2. (3) Can we predict the success of the hybrid approach in the absence of labeled data? We answered that indeed we can. We introduced a predictor which is based on our theoretical analysis. We empirically showed that the predictor can support the correct decision as to whether we should use the hybrid approach or the original unsupervised approach.

The implications of our work are: (1) we introduced a hybrid approach that may be more accurate than a given unsupervised approach for FD in robotics, (2) the theoretical analysis is not restricted to the fault detection problem, but rather applies to any binary classification problem with nominal data at hand, and (3) even in the absence of labeled data, one can use our predictor to estimate whether the hybrid approach is preferable.


For future work we intend to answer the following questions: (1) How would the hybrid approach work on a quantitative data stream? (2) Would the hybrid approach be suitable for a multiclass classification problem, and as such be applied to the diagnosis problem?

REFERENCES


Agmon, N., Kraus, S., & Kaminka, G. A. (2008). Multi-robot perimeter patrol in adversarial settings. Proceedings of the IEEE International Conference on Robotics and Automation, (pp. 2339-2345).

Akerkar, R., & Sajja, P. (2010). Knowledge-based systems. Jones & Bartlett Publishers.

Anderson, J. R., Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (1986). Machine learning: An artificial intelligence approach. Morgan Kaufmann.


Bezerra, C. G., Costa, B. S., Guedes, L. A., & Angelov, P. P. (2016). An evolving approach to unsupervised and real-time fault detection in industrial processes. Expert Systems with Applications, 63, 134-144.

Birk, A., & Carpin, S. (2006). Rescue robotics - a crucial milestone on the road to autonomous systems. Advanced Robotics, 20(5).

Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 1-58.

M

Christensen, A. L., O’Grady, R., Birattari, M., & Dorigo, M. (2008). Fault detection in autonomous robots based on fault injection and learning. Autonomous Robots, 24, 49-67. Competition, D. (n.d.). Retrieved from https://sites.google.com/site/dxcompetition2011/

ED

Costa, B. S., Angelov, P. P., & Guedes, L. A. (2015). Fully unsupervised fault detection and identification based on recursive density estimation and self-evolving cloud-based classifier. Neurocomputing, 150, 289-303.

PT

Dhillon, B. S. (1991). Robot reliability and safety. Springer. Golombek, R., Wrede, S., Hanheide, M., & Heckmann, M. (2011). Online data-driven fault detection for robotic systems. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

CE

Goodrich, M. A., Morse, B. S., Gerhardt, D., Cooper, J. L., Quigley, M., Adams, J. A., & Humphrey, C. (2008). Supporting wilderness search and rescue using a camera-equipped mini UAV. Field Robotics, 89-110.

AC

Guizzo, E. (2010). World Robot population reaches 8.6 million. IEEE Spectrum . Hashimoto, M., Kawashima, H., & Oba, F. (2003). A multi-model based fault detection and diagnosis of internal sensors for mobile robot. International Conference on Intelligent Robots and Systems (IROS). Hodge, V., & Austin, J. (2004). A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22, 85126. Hornung, R., Urbanek, H., Klodmann, J., Osendorfer, C., & van der Smagt, P. (2014). Model-free robot anomaly detection. International Conference on Intelligent Robots and Systems (IROS). IFR. (2016). Executive Summary World Robotics 2016 Industrial Robots. the International Dederation of Robotics (IFR).

ACCEPTED MANUSCRIPT

IFR. (2016). Executive Summary World Robotics 2016 Service Robot. the International Federation of Robotics (IFR). Isermann, R. (2005). Model-based fault-detection and diagnosis--status and applications. Annual Reviews in control, 71-85. Khalastchi, E., Kalech, M., & Rokach, L. (2013). Sensor fault detection and diagnosis for autonomous systems. the 12th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-2013). Saint Paul.

CR IP T

Khalastchi, E., Kalech, M., & Rokach, L. (2014). A Hybrid Approach for Fault Detection in Autonomous Physical Agents . the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2014). Paris. Khalastchi, E., Kalech, M., Kaminka, G. A., & Lin, R. (2015). Online data-driven anomaly detection in autonomous robots. Knowledge and Information Systems, 657-688. Khalastchi, E., Kaminka, G. A., Kalech, M., & Lin, R. (2011). Online Anomaly Detection in Unmanned Vehicles. the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2011). Taipei.

AN US

Khalastchi, E., Meir, K., & Rokach, L. (2016). A Sensor-Based Approach for Fault Detection and Diagnosis for Robotic Systems. submited to Autonomous Robots. khtar, N., & Kuestenmacher, A. (2011). Using Naive Physics for unknown external faults in robotics. the 22nd International Workshop on Principles of Diagnosis (DX-2011). Leeke, M., Arif, S., Jhumka, A., & Anand, S. S. (2011). A methodology for the generation of efficient error detection mechanisms. the 41st International Conference on Dependable Systems & Networks (DSN), (pp. 25-36).

M

Mahalanobis, P. C. (1936). On the generalised distance in statistics. the National Institute of Sciences of India, 2, 4955.

ED

Mengshoel, O. J., Darwichse, A., Cascio, K., Chavira, M., Poll, S., & Uckun, S. (2008). Diagnosing faults in electrical power systems of spacecraft and aircraft. Association for the Advancement of Artificial Intelligence (www.aaai.org).

PT

Perry, A. R. (2004). The flightgear flight simulator. USENIX Annual Technical Conference. Boston, MA. Pettersson, O. (2005). Execution monitoring in robotics: A survey. Robotics and Autonomous Systems, 73-88.

CE

Pokrajac, D., Lazarevic, A., & Latecki, L. J. (2007). Incremental local outlier detection for data streams. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM). (pp. 504-515).

AC

Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., . . . Ng, A. Y. (2009). ROS: an open-source Robot Operating System. ICRA workshop on open source software. Robocup. (n.d.). Retrieved from http://www.robocup.org/ Robocup. (2013). Retrieved from http://www.robocup.org/ Robotics, W. (n.d.). Retrieved from http://www.worldrobotics.org/ Serdio, F., Lughofer, E., Pichler, K., Buchegger, T., Pichler, M., & Efendic, H. (2014). Fault detection in multi-sensor networks based on multivariate time-series models and orthogonal transformations. Information Fusion, 20, 272-291. Sharma, A. B., Golubchik, L., & Govindan, R. (2010). Sensor faults: Detection methods and prevalence in real-world datasets. ACM Transactions on Sensor Networks (TOSN).

ACCEPTED MANUSCRIPT

Steinbauer, G. (2013). A Survey about Faults of Robots Used in RoboCup. In RoboCup 2012: Robot Soccer World Cup XVI (pp. 344-355). Springer Berlin Heidelberg. Steinbauer, G., & Wotawa, F. (2005). Detecting and locating faults in the control software of autonomous mobile robots. the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), (pp. 1742-1743). Steinbauer, G., & Wotawa, F. (2010). On the Way to Automated Belief Repair for Autonomous Robots. the 21st International Workshop on Principles of Diagnosis (DX-10). Steinwart, I., & Christmann, A. (2008). Support Vector Machines.

CR IP T

Thrun, S. (2002). Robotic mapping: A survey. Exploring artificial intelligence in the new millennium, 1-35. Travé-Massuyès, L. (2014). Bridging control and artificial intelligence theories for diagnosis: A survey. Engineering Applications of Artificial Intelligence, 27, 1-16. Wienke, J. l. (2016). Autonomous fault detection for performance bugs in component-based robotic systems. EEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

AN US

Wolberg, W. H., & Mangasarian, O. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 9193-9196. Zaman, S., & Steinbauer, G. (2013). Automated Generation of Diagnosis Models for ROS-based Robot Systems. The 24th International Workshop on Principles of Diagnosis.

AC

CE

PT

ED

M

Zaman, S., Steinbauer, G., Maurer, J., Lepej, P., & Uran, S. (2013). An integrated model-based diagnosis and repair architecture for ROS-based robot systems. International Conference on Robotics and Automation (ICRA). IEEE International Conference on Robotics and Automation (ICRA).