FAULT DETECTION USING REDUNDANT NAVIGATION MODULES Paul Sundvall ∗,1 Patric Jensfelt ∗ Bo Wahlberg ∗
∗
S3 Automatic Control and Centre for Autonomous Systems, KTH, SE-100 44 Stockholm, Sweden
Abstract: Mobile robots and other moving vehicles need to know their position with a certain level of confidence. In this paper, a method is proposed to handle faults in the navigation system by considering the outputs of existing navigation modules rather than processing sensor data directly. The proposed method needs only a simple model for drift and noise, an extended Kalman filter and a CUSUM test. It can handle position information given in different coordinate systems and does not require any modification of existing navigation modules. Promising experimental c 2006 IFAC. results are shown. Copyright Keywords: fault detection, navigation, mobile robots, autonomous vehicles
1. INTRODUCTION AND PROBLEM DESCRIPTION
Most implementations of fault detection systems operate at a low level in the system, more or less directly operating on sensor data. This is the most powerful approach because it has access to all information in the system. However, it has two important drawbacks. First, detailed knowledge of sensor characteristics is necessary. Second, access to sensor data is required, which is not always possible.
Navigation is a crucial component of autonomous land vehicles. Such vehicles often perform tasks like moving objects between locations. To avoid collisions and reach the goal, knowledge of the vehicle position is needed. Inputs from accelerometers, wheel encoders and/or other sensors are often fused into an estimate of its position. The main problem is how to provide reliable navigation. Because this is a central problem for autonomous vehicles, numerous methods have been developed for navigation. Robustness and reliability are often increased by means of using more or better sensors, possibly in combination with advanced modelling of the environment and sensor noise.
In this paper, a method is proposed for doing fault detection on a higher level instead of working with sensor data directly. It utilizes already existing navigation systems (pose providers) which are often already present and encapsulate extensive knowledge of sensors and the environment. The proposed method detects faults by considering the outputs of existing pose providers. Changing or adding code to the pose providers is thus not necessary.
To be able to operate safely even in the presence of failing sensors or malfunctioning navigation algorithms, fault detection methods can be incorporated into the navigation system.
In the next section, alternative approaches to the problem are discussed. The details of the approach presented in this paper are provided in Section 3. Experimental results from an implementation of the approach are given in Section 4.
1
Supported by the Swedish Foundation For Strategic Research
522
2. RELATED WORK
and thus operate at a low level in the system. This requires detailed models andthat it is possible to obtain sensor data to the fault detection module. The method proposed in the current paper operate at a higher level in the system, where sensor data have been refined.
Increasing robustness has been a driving force in developing navigation methods. Many solutions use advanced modelling of unwanted sensor input, such as detecting and removing laser echoes from people or detecting movable objects in the environment to separate them from the fixed environment. Other contributions introduce methods to handle ambiguous situations, for instance by using multiple hypotheses. Even if methods like these improve the navigation performance, they can still not handle sensor faults.
For the applications considered here, it is necessary not only to obtain correct positioning, but also to detect extraordinary events, for instance collisions. Regarding the operation of an autonomous robot, it is often equally important to know that a collision occurred as it is that the navigation has degraded. Thus, it is not enough to increasing navigation performance. Fault detection is also necessary for the robot to be aware that something unexpected has occured.
An example of adding fault detection for navigation purposes is (Roumeliotis et al., 1998), where a bank of Kalman filters is used. Each filter is tuned to a specific fault and is used for detection purposes. Fault isolation is obtained by studying the residuals at a given moment, using their probability distribution to decide what fault model is most likely. In this approach, the detection is implemented in close conjunction with the sensors. Fault isolation and detection are obtained, at the cost of using a model for each sensor fault.
3. METHODS AND MODELLING The proposed method relies on obtaining position estimates from two or more pose providers, where the term pose provider is used for sensors or systems that deliver estimates of the robot’s position. A flow diagram is shown in Fig. 1. The pose providers do not need to use the same coordinate system or have the same update rate. The most obvious example of a pose provider is odometry (integrating relative motion into position, most often based on wheel encoders). As localization is extremely important for mobile robots, there exist many different kinds of pose providers: GPS, inertial navigation, simultaneous localization and mapping(SLAM), laser range based odometry, visual odometry and wireless localization are some examples. It is important that the pose providers provide redundancy, otherwise it is impossible to distinguish what could be a normal motion and what is a fault. As mobile robots often are equipped with many different types of sensors, and there are several algorithms that use a subset of the sensors, it is not difficult finding independent
Another approach to fault detection and isolation is used in (Lu et al., 2004). Multiple sensors are combined to obtain several estimates of the robot’s position, orientation and speed based on different sensors for each estimate. The pairwise differences between the estimates are then used as residuals. Based on the size of the residuals, a table mapping high residuals to specific faults is used to isolate faults. Navigation is then performed with the subset of sensors that are considered functioning. It is not clear how the algorithm handles accelerometer and gyroscope drift, which would cause the speed estimates to drift compared to the odometry based estimates. In (Verma et al., 2004), a particle filter is used for fault detection and isolation. One discrete state is used for each fault mode, having an associated model for the continuous states. Particle filters are very powerful for tracking systems, and output a probability distribution over the states, given as samples (particles). They can be used for nonlinear, non-Gaussian and multi modal distributions and outperforms the Kalman filter in such cases. The drawback is that the computational demands increase fast with the state dimension, even if there are means for increased efficiency. Another drawback is that a fault model is needed to track the probability of being in a fault state. With the proposed method, a Kalman filter tracks the normal state. It is then possible to test for deviation from the normal (fault free) model, requiring a model only for normal behavior.
Fig. 1. The proposed system layout: sensors deliver data to navigation algorithms, which deliver their position estimates to an extended Kalman filter. The weighted residual of the Kalman filter is fed to a CUSUM test, which alarms if necessary.
Common to the fault detection methods described above, is that the methods use sensor data directly
523
equations, the matrices are given for p = 2 pose providers. Extending to use more pose providers is straight forward. c c Q u e1 ue1 Q , w = ue2 (2) Qe2 Q= c1 u Qc1 Qc2 uc2 1 1 ∆T cos(θi ) 0 G G 0 I3 0 , Gi = ∆T sin(θi ) 0 Gt = G2 0 G2 0 I3 0 ∆T (3) Each pose provider gives rise to a Gi matrix, which is a linearization around the heading angle θi for the corresponding pose provider. This is what causes Gt to vary with the state xt , introducing nonlinearity in (1). ∆T is the time interval between the updates of the system.
pose providers. An example of a basic system is to use odometry as one pose provider and an integrator that converts motion control commands to position as a second pose provider. Pose providers can be divided into two groups depending on the characteristic of their estimation error. The first group has an estimation error that grows over time, and includes methods like inertial navigation and different kinds of odometry. The second group has a bounded estimation error, and includes map based methods and GPS-like sensors. The approach presented here can handle both types of systems. System model To detect deviation between the pose providers, a model is used to predict the outputs. A set of three T is used for every state variables ri = x y θ pose provider i, which corresponds to rectilinear position and orientation of the robot, in the frame of each pose provider. The frames do not need to be known for the method to work.
Each element of the process noise vector w represents different sources for driving the pose providers. The noise is assumed to be zero mean and white (independent of future and past noise).
A standard first order dynamic model is used for the evolvement of the output from the pose providers:
Common robot speed: The first and dominating part of the process noise vector is uc , the common T (forward and rotational robot speed uv uω speed). As the robot moves, uc is the part of the model that captures this. In absence of model errors, pose provider errors and noise it would be the only nonzero part of the process noise. The model assumes that the control signal is an unknown stochastic variable. One can argue that the commanded motion of the robot is known and should be included as a deterministic input signal in the model (1). However, experiments have shown that this does not contribute significantly to detection performance. In some systems, it might not even be possible to get this signal, for instance when the robot is controlled by the hardware directly or by another piece of software.
xt+1 = xt + Gt (xt )wt , cov(wt ) = Qt (xt ) (1) yt = xt + vt , cov(vt ) = Rt where xt is the aggregated state vector T xt = r1T r2T . . . rpT at time t, for p pose providers. The process noise wt corresponds to commanded motion as well as noise inherent within the pose providers, including sensor noise. The 3p-by-(2 + 5p) matrix Gt is the gain from process noise wt to states (see (3)), Qt is the process noise intensity (see (2)) and vt is measurement noise. This particular choice of the state vector leads to a nearly linear system. An alternative is to choose the state to be the true position augmented with a coordinate transform for each pose provider. Having a pose provider that slowly drifts, as for instance odometry, does however lead to cross couplings between noise sources and robot speed and is thus more difficult to track.
If the speed reference is available, it can be externally integrated into pose using an appropriate dynamic model and then used as a pose provider in the present framework. The dynamic model for the relation between speed reference signal and pose could be arbitrarily complex without affecting the structure of the fault detector, as only the output of the algorithm is read into the fault detector. This means that the complexity of the fault detector does not increase, even if a very complex model for the relation between speed reference and robot motion is used. This elucidates the modularity in the method.
The measurements are the outputs of the pose providers, and the influence of sensor noise is captured in the process noise w, and not the measurement noise v. This is further discussed in the remainder of the section. Model input and process noise The process noise w is a 2 + (2 + 3)p vector which is used to model both the commanded motion of the robot as well as disturbances and imperfection within each pose provider. The latter will give rise to drift in the pose providers. In the following
Pose provider robot frame drift: A part of the drift uei of the pose provider is modeled as additive T input of the robot speed uei = v ei ω ei . A system that integrates signals from robot fixed
524
speed sensors, such as a Doppler radar or a rate gyro, will be subject to this type of drift.
from the pose providers) agree with the model or not. This is done by monitoring the state with an extended Kalman filter (EKF). In this way, an estimate x ˆ and an associated covariance matrix P are calculated every time step. The purpose of tracking is not to get an estimate of the pose provider outputs, but instead to tell whether the pose providers agree or not. In the ideal case with no fault present, the innovation (predicted output error) e = y − C x ˆ from the Kalman filter is Gaussian and white, given that the model assumptions hold and that linearization effects are negligible. When something abrupt happens like sensor malfunction, wheel slip or collision, the model will not be valid, and the innovation e will not be Gaussian nor white.
Pose provider Cartesian frame drift: As uei models drift relative to the robot frame, uci models T drift in a Cartesian frame. uci = vxci vyci vθci represents noise that models a drift in all states simultaneously. GPS-like pose providers are subject to this type of drift. Using this drift model, problems arising from linearization and model errors are also alleviated. The characteristic of a navigation algorithm normally changes with speed. For instance, odometry has a larger drift at high speed than low. Other types of pose providers, like inertial navigation, might have drift nearly independent of speed. To accommodate for these effects of changing speed, some elements of the noise intensity matrix Q are scaled with a factor k defined in (4). A similar scaling can be found in (Chong and Kleeman, 1997).
At startup, the filter must be initialized. By setting the initial covariance P0|−1 to a high value βI, β 1, the filter will rapidly converge to the correct values. Having one or more pose providers starting at a large nonzero position is thus not a problem. If the pose providers deliver data at different rates or readings are missing, the Kalman filter provides means to handle such situations.
Herein, a small offset α is added to the scaling factor k, to avoid unreasonable low noise at standstill. The offset also alleviates problems of having a slight delay in calculating the factor. The scaling factor is dimensionless, and uses normalization constants v¯f , v¯θ (see (4)) which are set to normal driving speed of the robot. Thus the factor k will be approximately 1 at normal driving forward or rotating, and α 1 at standstill. The reason for scaling the covariance (almost) linear with speed, is that adding independent uncertainty Q/n in n steps obtains Q total increase of uncertainty, independent of n. s 2 2 vθ vf + +α (4) k= v¯f v¯θ
An alternative to use EKF might be to compare the speeds from the pose providers directly, as done in (Lu et al., 2004). However, this is not straight forward, as different noise characteristics make it difficult to reason if changes are large or not. Comparing absolute sensors (GPS-like) and relative sensors like accelerometers is not easy. The straight forward approach of comparing, by differentiating the absolute position and comparing it to the integral of the accelerometer signal would suffer from noise amplification and drift at the same time. Detecting abrupt changes From the Kalman filter, the innovation e is obtained. The associated covariance will be Re = P + R and a test statistic is obtained by the Mahalanobis distance st = eT Re−1 e, which will be χ2 distributed with 3p degrees of freedom. This is true when the model assumptions hold. A simple test of when to alarm for faults is to set a threshold on st . This can be done using a standard χ2 table, or from monitored levels of the test statistic. However, thresholding is sensitive to outliers in st , and can cause false alarms. To avoid this, st is fed into a detector that takes the size and duration of the test statistic into account.
Measurement noise Sensor noise and imperfection of the pose providers give rise to drift in the position estimates, which is modelled by the process noise w in (1). This means there is no regularmeasurement noise v in the system. There will however be effects from scheduling jitter, which is especially important when implementing the fault detector in a non realtime system. Also, communication delays between sensors and receiving system may also cause jitter. Effects from finite machine precision and interpolation errors might also be captured by the measurement noise. Having a nonzero R is also beneficial for the numerical properties of tracking the state, as R = 0 indicates that the measurements are exact, and can cause inconsistencies in the estimate.
CUSUM test The CUSUM test (see for instance (Gustafsson, 2000)) is a test for detecting positive changes in a positive (noisy) scalar signal. In short, the test will alarm for a single sample being extremely large, or several consecutive samples being unexpectedly large. The algorithm is shown in Fig. 2. There are
Tracking the state With a model of the system at hand, it is now possible to test if the measurements (readings
525
gt = gt−1 + st − v if (gt < 0) then talarm = t, gt = 0 if (gt > h) then raise alarm and set gt = 0 Fig. 2. The CUSUM algorithm. two parameters in the test, the drift value v > 0 and the alarm threshold h > v. A simplified description is that v is related to what level the input to the detector normally has, and h is adjusted to trade off false alarms and risk of missed detection. The values can be initially set by rules of thumb and then tuned if necessary. Since the Kalman filter will slowly adapt to the new data after a disturbance, faults will only give residuals under a limited time. Therefore, the detector should not be too slow. It is also important that the alarm delay is kept small.
Fig. 3. Detail of an alarm from the CUSUM test: The curve is the test variable gt . The alarms are marked with circles at the times they are raised, and the reported time talarm of the first alarm is marked with a vertical line.
The CUSUM test can estimate when the change in the signal has occurred, see for instance (Gustafsson, 2000). This is given by the last reset time talarm . An example of an alarm time obtained with this method is shown in Fig. 3. Upon an alarm, this information can be used to perform recovery actions on the navigation system. For instance, estimation of the current position can be carried out on stored sensor data from the time of when the fault was believed to occur and forward, where faulty sensors are excluded from the analysis. The memory of old sensor data can be of finite length, as a fault alarm probably will come rather quickly or not at all.
Experiment setup During the experiments, the robot moved around autonomously between goal points using the Nearness Diagram algorithm (Minguez and Montano, 2000). The experimental environment is a room furnished with sofas, tables and bookshelves to resemble a domestic living room in size and furnishing. As the robot is moving around in the room, it has been pushed to induce wheel slip. During pushing, the robot has been rotated approximately π3 (due to its weight, it is difficult to move the robot other ways than rotating it). The algorithm was run in realtime on a standard laptop computer, connected via wireless network to the robot and its sensors.
4. EXPERIMENTAL RESULTS
Results The filter has been tested during several sessions, with both fault free and faulty data. No false alarms were triggered during the fault free sessions, even if the robot was forced to do fast moves with the obstacle avoidance behavior, triggered by suddenly walking into the field of view of the laser scanner, close to the robot. Faults were injected by pushing the robot manually (to not cause damage to the robot), while it was moving autonomously between waypoints. An example of output from such an experiment with faulty data is shown here. The Mahalanobis distance st from the Kalman filter is shown in Fig. 4. One can clearly see when the robot has been pushed (multiple times). One can also see that the residual is very low during the beginning of the test, when no fault is present. The corresponding test variable gt and its threshold are shown in Fig. 5. Each time it exceeds the threshold, gt is reset to zero and an alarm is raised. The alarm times are marked with circles in the figure. Several alarms are raised after each other. This is a sign that the fault is large compared to the sensitivity of the fault detector. A detailed plot of the first alarm is shown in Fig. 3.
To test the proposed method, it has been implemented on an indoor service robot intended for domestic use. The robot is equipped with odometry and a SICK laser scanner, where the odometry system can be used directly as a pose provider. As a second pose provider (independent from the odometry system), a scan matching routine was implemented, which integrates the position by matching consecutive laser scans to each other. See (Lu and Milios, 1994) for an example of a scan matching algorithm. Since the match of two consecutive scans will have an error that is mainly independent of the robot motion, the error model with Cartesian noise dominates the drift. The size of the scan matcher error is for the speed range considered here nearly independent of the speed, and the corresponding parts of Q can thus be held constant. The measurement noise covariance matrix R has been selected diagonal, where the elements have been chosen as a fraction of the typical motion between two time steps and the fraction corresponds to the size of the scheduling jitter compared to the sampling time.
526
In summary, the pros and cons of the proposed approach are described below: + Existing navigation modules can be used, as long as they provide redundancy. + The model is quite simple, and it is fairly easy to set the parameters. + No special model is needed for faults, only the normal model is required. + Very low computational power is needed. – Knowledge about system behavior for specific faults is not utilized. – Fault isolation is not obtained. Fig. 4. Weighted residual st from the Kalman filter from test with robot being pushed several times. The pushes are clearly visible as peaks in st . The first 25 seconds are fault free and the residual is small.
6. FUTURE WORK The crucial part for detection using the proposed framework is that the fault model is valid. Some pose providers are able to give an estimate of their own uncertainty. If this can be built in to the model, the performance of the fault detector can be improved even further. Regarding the isolation problem, there is a possibility to combine pose providers pairwise, using one filter per pair.
REFERENCES Chong, Kok Seng and Lindsay Kleeman (1997). Accurate odometry and error modelling for a mobile robot. In: International Conference on Robotics and Automation (ICRA). Gustafsson, Fredrik (2000). Adaptive filtering and change detection. Wiley. Lu, F. and E. Milios (1994). Robot pose estimation in unknown environments by matching 2d range scans. In: CVPR94. pp. 935–938. Lu, Y., E. Collins and M. Selekwa (2004). Parity relation based fault detection, isolation and reconfiguration for autonomous ground vehicle localization sensors. In: 24th Army Science Conference. Minguez, J. and L. Montano (2000). Nearness diagram navigation(ND): A new real-time collision avoidance approach. In: International Conference on Intelligent Robots and Systems (IROS). Roumeliotis, S.I., G.S. Sukhatme and G.A. Bekey (1998). Sensor fault detection and identification in a mobile robot. In: International Conference on Intelligent Robots and Systems (IROS). Verma, V., R. Simmons, G. Gordon and S. Thrun (2004). Particle filters for fault diagnosis. IEEE Robotics and Automation Magazine.
Fig. 5. Output from CUSUM test with robot being pushed several times. When the test variable gt exceeds the threshold, an alarm is generated (marked with circles) and it is reset. The weighted residual from Fig. 4 is used as input for the detector. 5. SUMMARY AND CONCLUSIONS The proposed system is demonstrated to be able to detect wheel slip in realtime. The system handles initially unknown coordinate systems and different noise characteristics. The approach is here demonstrated using two providers, odometry and scan matching. If more pose providers are available, the quality of detection would further improve. It is demonstrated that it is possible to detect faults on top of the navigation system, as opposed to the normal approach which is to make the navigation system more robust by building in more intelligence into the navigation algorithms themselves, which requires good domain knowledge. The approach uses a simple model for noise, and no specific fault model is needed. Thus the demand of modeling is very modest. The approach is suitable to run in realtime even on systems with little computing power, as the only calculation needed is a low order Kalman filter and a CUSUM test.
527