Mechanical Systems and Signal Processing 21 (2007) 795–808
Application of a data-driven monitoring technique to diagnose air leaks in an automotive diesel engine: A case study

David Antory

Electrical Test for Advanced Architectures, International Automotive Research Centre, Warwick Manufacturing Group, University of Warwick, Coventry CV4 7AL, UK

Received 24 October 2005; received in revised form 15 November 2005; accepted 16 November 2005. Available online 4 January 2006.
Abstract

This paper presents a case study of the application of a data-driven monitoring technique to diagnose air leaks in an automotive diesel engine. Using measurement signals taken from the sensors/actuators which are present in a modern automotive vehicle, a data-driven diagnostic model is built for condition monitoring purposes. Detailed investigations have shown that measured signals taken from the experimental test-bed often contain redundant information and noise due to the nature of the process. In order to deliver a clear interpretation of these measured signals, they therefore need to undergo a 'compression' and an 'extraction' stage in the modelling process. It is at this stage that the proposed data-driven monitoring technique plays a significant role by retaining only the important information in the original measured signals for fault diagnosis purposes. The status of the engine's performance is then monitored using this diagnostic model. This condition monitoring process involves two separate stages: fault detection and root-cause diagnosis. The effectiveness of the diagnostic model was validated using an experimental automotive 1.9 L four-cylinder diesel engine coupled to a chassis dynamometer in an engine test-bed. Two joint diagnostics plots were used to provide an accurate and sensitive fault detection process. Using the proposed model, small air leaks in the inlet manifold plenum chamber with diameters of 2–6 mm were accurately detected. Further analyses using contributions to the T^2 and Q statistics show the effect of these air leaks on fuel consumption. It was later discovered that these air leaks may contribute to an emissions fault. In comparison to existing model-based approaches, the proposed method has several benefits: (i) it makes no simplifying assumptions, as the model is built entirely from the measured signals; (ii) it is simple and straightforward; (iii) no additional hardware is required for modelling; (iv) it is a time- and cost-efficient way to deliver condition monitoring (i.e. fault diagnosis); (v) it is capable of pin-pointing the root cause and the effect of the problem; and (vi) it is feasible to implement in practice.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Application; Data-driven technique; Condition monitoring; Diagnosis; Air leaks; Automotive diesel engine
Tel.: +44 24 7657 5441; fax: +44 24 7657 5403. E-mail address: [email protected].
doi:10.1016/j.ymssp.2005.11.005
1. Introduction

Stringent emission regulations have led automotive manufacturers to develop systems which can detect and diagnose any fault which may cause tailpipe emissions to rise above a prescribed threshold. This can be achieved by continuously monitoring the automotive data characteristics for any abnormal behaviour. Recently, Mills [1] discussed a way to perform automated analysis of automotive data to oversee vehicle system operations, to automate data capture and analysis, and also to improve the diagnostic process. Such an approach can be viewed as a method for improving the reliability, safety and efficiency of the processes, as discussed by Isermann [2] and Gertler [3]. It can also be used as a way to conduct fault detection and identification.

Previous work by Antory [4] investigated faults in an automotive engine using only those measurement signals which are available in production engines and excluded the remaining signals which can only be measured in a test-bed environment. The work reported in this paper extends those investigations by using all measurement signals taken from an engine during tests conducted in a laboratory. This additional step allows a complete analysis of the experimental data, which may be beneficial in the design, development, manufacturing and service stages of the vehicle lifecycle. A detailed analysis is then performed to demonstrate the detection and diagnosis processes. This paper shows that faults caused by various small leaks (of 2, 4 and 6 mm diameter) in the intake manifold plenum chamber of a turbocharged direct injection (TDI) 1.9 L diesel engine can be reliably detected and diagnosed. The model, built using a data-driven technique named principal component analysis (PCA), performed more accurate condition monitoring of this fault than that achieved by a conventional physical model (Section 4). The improved performance is especially apparent for the smallest air leak (with a diameter of 2 mm).

This paper is organised as follows: Section 2 describes the data-driven technique, the PCA method, which is followed by a discussion of the experimental data in Section 3. Section 4 discusses the condition monitoring process, where the detection and diagnosis of various sizes of air leak are explained in detail. Finally, Section 5 concludes this paper.

2. Data-driven technique for condition monitoring

This section discusses the data-driven technique known as PCA. PCA has gained considerable attention, mostly in the field of industrial chemical and semiconductor processes, for condition monitoring [5–8]. The technique can also be successfully applied to automotive applications [4].

2.1. The PCA method

The different types of signals collected from the process are recorded in a range of different unit scales. Therefore, in PCA, normalisation is an essential first stage to make the variance of one process variable comparable to that of any other [9]. Normalisation can be done by mean-centring or by auto-scaling the raw data; the latter divides the mean-centred data by its standard deviation. The normalised data are then stored in column vectors that form a matrix. PCA identifies a combination of variables that describes the major trends in the data set. It relies on an eigenvector decomposition of the covariance or correlation matrix of the process variables [10]. The most important information can then be described using a small number of principal components (PCs). PCA is a powerful tool in this respect for analysing multivariate data sets [11].
For a given data matrix X \in \Re^{m \times n}, with m samples and n process variables,

X = \sum_{i=1}^{k} t_i p_i^T + E.    (1)

X is decomposed into a sum of vector products of PC score vectors t_i, stored as column vectors in T, and PC loading vectors p_i, stored as column vectors in P, where k < n represents the significant process variation
shown by the first k dominant eigenvectors of the correlation matrix S_{xx}, defined as follows:

S_{xx} = \frac{1}{m-1} X^T X \in \Re^{n \times n}.    (2)
Here, X is auto-scaled to have zero mean and unit variance. The residual matrix E describes unimportant variation and noise in the original data X. The important variation is stored in the estimation matrix \hat{X} = \sum_{i=1}^{k} t_i p_i^T; thus, the residual E can be written as follows:

E = X - \hat{X}.    (3)
Whilst the elements in the loading vectors describe the coefficients of the linear relationships between the process variables, the elements in the score vectors represent the variation in these variables. The model is built by determining k, which represents a reduced set of PCs that describe the significant process variation. The loading vectors p_i are the eigenvectors of the correlation matrix S_{xx}, which can be formulated as follows:

S_{xx} p_i = \lambda_i p_i,    (4)
where \lambda_i is the eigenvalue associated with the eigenvector p_i of the correlation matrix. It measures the amount of variance explained by the {t_i, p_i} pair; the pairs are arranged in descending order of \lambda_i. Consequently, the first k pairs capture the largest amount of variation and hence the largest amount of information from the original data. The score vector t_i is the linear combination of the original data matrix X defined by the loading p_i, as shown below:

X p_i = t_i.    (5)
This transformation enhances the ability of PCA to extract information from the original data by eliminating redundant information. The reduced set of variables is then used for modelling and analysis. Kourti and MacGregor [8] stated that this new, reduced data set often contains more robust information about the process than the original data. More details about PCA can be found in Jackson [10] and Jolliffe [12].

2.2. Monitoring statistics

Using PCA for condition monitoring involves the application of a PCA model to new observed data. The procedure applied is similar to that used to build the actual model. The observed data are normalised using the mean and standard deviation of the PCA model. By using the same number of retained PCs k, the loadings P, and the correlation matrix S_{xx}, condition monitoring can be performed. The examination focuses on the variation of the observed data within the PCA model and the mismatch between the PCA model and the observed data.

2.2.1. The Hotelling's T^2 statistic

The Hotelling's T^2 statistic gives a measure of the significant variation of the process. It is simply the sum of the squared scores, each normalised by its variance. The PC score vector t is obtained by projecting the new observed data x_{new} onto the plane defined by the PCA loadings P. This can be summarised as follows:

t = x_{new} P,    (6)

T^2 = t \Lambda^{-1} t^T = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i},    (7)

where \Lambda^{-1} is a diagonal matrix containing the inverse of the k largest eigenvalues \lambda_i of the correlation matrix S_{xx}, in descending order, and t_i is the ith score.
The Hotelling's T^2 statistic can be plotted as a function of time. The statistical threshold for T^2 can be calculated using the F-distribution [10,12] as follows:

T^2_\alpha = \frac{k(m-1)}{m-k} F_\alpha(k, m-k),    (8)

where T^2_\alpha is the threshold value at a confidence level \alpha, typically 95% or 99%, m is the number of samples used to build the PCA model, k is the number of PCs retained and F_\alpha(k, m-k) is the upper 100\alpha% critical point of the F-distribution with k and (m-k) degrees of freedom.

2.2.2. The Q (residual) statistic

The Q statistic gives a measure of the mismatch between the PCA model and the observed data. It shows how well the newly observed data conform to the PCA model. The mismatch between the measured and estimated sensor readings results in the residual e, which forms the basis of the Q statistic and is formulated as follows:

e = x - t P^T = x [I_n - P P^T].    (9)
The Q statistic is simply the sum of the squared residuals; thus

Q = e e^T = \sum_{j=1}^{n} e_j^2,    (10)

where e_j is the jth residual. The Q statistic can be plotted as a function of time. The statistical threshold for the Q statistic [13] can be calculated as follows:

Q_\alpha = \theta_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0},    (11)

where \theta_1 = \sum_{i=k+1}^{n} \lambda_i, \theta_2 = \sum_{i=k+1}^{n} \lambda_i^2, \theta_3 = \sum_{i=k+1}^{n} \lambda_i^3, h_0 = 1 - 2\theta_1\theta_3/(3\theta_2^2) and c_\alpha is the normal deviate corresponding to the (1-\alpha) percentile.

2.2.3. Geometrical interpretation of the monitoring statistics

The geometrical interpretation of the Hotelling's T^2 and the Q statistics is illustrated in Fig. 1 for a two-dimensional (2D) plane formed by the first and second PCs. Point A shows the orthogonal deviation of a new sample perpendicular to the ellipse plane of the model, while point B shows the horizontal deviation of a new sample from the centre of the ellipse plane of the model. The deviation represents the severity of the effect of an abnormal situation on the process: the further this deviation lies from the ellipse plane of the model, the more serious the effect of the fault which has occurred.
Fig. 1. Geometric interpretation of the monitoring statistics.
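The monitoring statistics and their thresholds in Eqs. (6)–(11) translate directly into code. The sketch below assumes the quantities mu, sigma, P and eigvals produced by a PCA model such as the one sketched in Section 2.1; it is an illustrative reading of the published formulas, not the author's implementation.

```python
import numpy as np
from scipy import stats

def t2_q_statistics(x_new, mu, sigma, P, eigvals, k):
    """Hotelling's T^2 (Eqs. (6)-(7)) and Q (Eqs. (9)-(10)) for one new sample."""
    x = (x_new - mu) / sigma             # normalise with the training mean/std
    t = x @ P                            # scores, Eq. (6)
    T2 = np.sum(t ** 2 / eigvals[:k])    # Eq. (7)
    e = x - t @ P.T                      # residual, Eq. (9)
    return T2, e @ e                     # Q = sum of squared residuals, Eq. (10)

def t2_limit(k, m, alpha=0.99):
    """F-distribution threshold for T^2, Eq. (8)."""
    return k * (m - 1) / (m - k) * stats.f.ppf(alpha, k, m - k)

def q_limit(eigvals, k, alpha=0.99):
    """Jackson-Mudholkar threshold for Q, Eq. (11)."""
    lam = eigvals[k:]                            # discarded eigenvalues
    th1, th2, th3 = lam.sum(), (lam ** 2).sum(), (lam ** 3).sum()
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    ca = stats.norm.ppf(alpha)                   # normal deviate for confidence alpha
    return th1 * (ca * np.sqrt(2.0 * th2 * h0 ** 2) / th1
                  + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2) ** (1.0 / h0)
```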
The two monitoring statistics mentioned above complement each other to produce more accurate condition monitoring. However, when the effect of a fault emerges in only one of the monitoring statistics, it may cause confusion in the analysis and interpretation of results. A joint monitoring statistics plot, which combines the Q and the Hotelling's T^2 statistics, may give a better interpretation. This is discussed in detail in Section 2.2.4.

2.2.4. Kernel density method for joint diagnostics

Chen et al. [14] stated that, in process condition monitoring, the Q and Hotelling's T^2 statistics are the most important statistical parameters. They are useful for monitoring the system performance independently to detect any abnormal situation. Combining both statistics can improve the sensitivity of the individual monitoring statistics, especially when dealing with an incipient fault such as a small air leak. This can be done by simply using the individual statistics' confidence limits to create joint diagnostics confidence limits (i.e. by plotting the Q against the T^2 statistic in 2D). Alternatively, a new confidence region can be generated from the probability density function (PDF) of the joint Q and T^2 statistics using the kernel density estimation (KDE) method [14]. A PDF describes the likelihood with which a data point has occurred in previous process operations. KDE approximates the density function by a sum of small kernel functions (e.g. of a Gaussian or Epanechnikov type) centred on each data point. Using the kernel, confidence regions are determined entirely from the structure contained in the data set, without reference to a parametric model. KDE provides simple, reliable and useful information in a wide range of applications in fields such as medicine, engineering and economics [15]. The univariate kernel density estimator can be formulated as follows:

\hat{f}(x; h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\},    (12)
where K is a kernel function that satisfies the condition \int K(x)\,dx = 1, and h is the bandwidth. Using the rescaling notation K_h(u) = h^{-1} K(u/h), Eq. (12) is transformed into

\hat{f}(x; h) = n^{-1} \sum_{i=1}^{n} K_h(x - X_i).    (13)
A unimodal PDF that is symmetric about zero is usually chosen for K. One important aspect when using the non-parametric KDE approach is the determination of the bandwidth h. Wand and Jones [15] state that, even though it is possible to choose the bandwidth subjectively by eye in many situations, this is very time-consuming, especially if one has no prior knowledge of the structure of the data. They proposed the use of an automatic bandwidth selector. In this paper, a mean integrated squared error (MISE) type of automatic bandwidth selector, cross-validation, is adopted.

The extension from univariate to multivariate KDE requires some modification. The bandwidth h is transformed into a bandwidth matrix H, here taken as a diagonal matrix with a single parameter, H = h^2 I, where I is an identity matrix. In order to avoid a loss of accuracy by forcing the bandwidth to be the same in all dimensions, Fukunaga [16] suggested rescaling the data, which makes all variables comparable across dimensions. This reduces the computational load and provides a reasonable choice for bandwidth selection. The determination of H has the effect of minimising a global error criterion. For MISE cross-validation, H is chosen to minimise

MISE\{\hat{f}(\cdot; H)\} = E \int [\hat{f}(x; H) - f(x)]^2 \, dx,    (14)

where \hat{f}(x; H) is the fitted density function and f(x) is the true density function. The multivariate kernel density estimator can then be written as follows:

\hat{f}(x; H) = n^{-1} \sum_{i=1}^{n} K_H(x - X_i),    (15)

where H is a symmetric, positive definite d \times d bandwidth matrix. In analogy to the univariate version, K_H(x) = |H|^{-1/2} K(H^{-1/2} x), where \int K(x)\,dx = 1. The PDF of the joint monitoring statistics, the Q and Hotelling's T^2 statistics, can be built using Eq. (15) with a small modification, where x = [Q, T^2]^T and X_i = [Q_i, T_i^2]^T. In this paper, a 99% confidence region is adopted, which means that under normal operating conditions not more than 1% of the total observed data lie outside this region. More detailed information about KDE can be found in [15,17]. Examples of the application of KDE to process monitoring can be found in [18,19].
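As a rough sketch of the joint-diagnostics confidence region described above, the code below estimates the joint (Q, T^2) density with SciPy's gaussian_kde and takes the density level below which about 1% of the reference samples fall as the 99% boundary. Note that gaussian_kde uses a Scott's-rule bandwidth by default rather than the MISE cross-validation selector adopted in this paper, so this is only an approximation of the published procedure; all names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_confidence_region(Q_ref, T2_ref, alpha=0.99):
    """Joint (Q, T^2) density, Eq. (15), and an approximate alpha-confidence level."""
    data = np.vstack([Q_ref, T2_ref])       # shape (2, n): one column per sample
    kde = gaussian_kde(data)                # Gaussian product kernel, Scott's rule
    dens = kde(data)                        # density evaluated at each reference point
    level = np.quantile(dens, 1.0 - alpha)  # ~1% of reference samples fall below this
    return kde, level

def outside_region(kde, level, Q_new, T2_new):
    """True where a new (Q, T^2) pair falls outside the confidence region."""
    return kde(np.vstack([Q_new, T2_new])) < level
```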
3. An experimental automotive diesel engine

This section explains the procedure used to obtain the experimental data from an engine test-cell facility. A description of the automotive diesel engine used in this study is briefly presented. This is followed by a discussion of the type of fault condition investigated.

3.1. Design of experiment

A four-cylinder Volkswagen 1.9 L TDI diesel engine was used to provide the experimental data. The engine was coupled to a 145 kW a.c. Schenck dynamometer and Ricardo control system in an instrumented test-bed facility. A photo of the test laboratory is given in Fig. 2.

The fault-free or baseline performance characteristics of the engine were recorded at steady-state conditions with speed settings of 1500, 2500, 3500 and 4500 rev/min. Five different pedal positions, ranging from 30% to 100%, were tested at each speed. These test conditions are summarised in Table 1. The values of the pedal positions were chosen using the following procedure. Firstly, the peak torque values at each speed were recorded. Next, the pedal positions corresponding to 20%, 40%, 60%, 80% and 100% of these peak torque values were noted. These pedal positions were then used during both the fault-free and fault-containing tests, which ensured that the same inputs were used in all tests. Experimental data from a total of 20 different combinations, covering a wide range of operating conditions, were therefore used to develop the model. Each steady-state condition was recorded for 30 s at a sampling rate of 10 Hz. A total of 300 points was therefore recorded for each combination, producing an overall total of 6000 points across the entire range of steady-state driving conditions.
Fig. 2. Photo of engine test cell with Volkswagen diesel engine connected to a chassis dynamometer.
Table 1
Matrix of speed/load settings used during the engine tests

Speed (rev/min)    Pedal position (% load)
1500               30    40    54    62    100
2500               49    59    74    78    100
3500               57    64    74    80    100
4500               62    65    76    83    100
Table 2
Recorded experimental engine signals

Engine variable                 Unit      Note
Speed                           rev/min   Input
Pedal position                  %         Input
Fuel flow                       kg/h      Output
Air flow                        kg/h      Output
Intake manifold pressure        bar       Output
Intake manifold temperature     °C        Output
Turbine inlet pressure          bar       Output
Turbine inlet temperature       °C        Output
Turbine exit pressure           bar       Output
Torque                          Nm        Output
Turbo speed                     Hz        Output
CO2                             %         Output
HC                              ppm       Output
O2                              %         Output
At each point, the signals from the 12 output transducers listed in Table 2 were recorded at the combined input settings summarised in Table 1. The first seven outputs are available in production engines, while the remaining five outputs can only be captured using laboratory instruments. The derived model did not include engine speed and pedal position, as these represent input parameters that are set by the test-cell operator via the dynamometer control system. Including these inputs in the model would not give additional information, since they represent the ideal situation of steady-state behaviour. It would be a different matter for a transient (dynamic) experimental case, where the dynamic characteristics of the inputs (speed and load) heavily influence the output signals; in that case it would be mandatory to include them in the model. In this steady-state experimental case, the remaining 12 output variables shown in Table 2 can be affected by any operational fault that occurs.

3.2. Fault investigated: air leak in the intake manifold

The fault to be examined was an air leak in the intake manifold. This particular kind of fault can be difficult to detect as, under a range of operating conditions, the turbocharger waste gate will inherently try to counteract the fault and maintain the manifold boost pressure at a pre-determined level. Consequently, depending on the magnitude of the air leak, the fault may be imperceptible to the driver. However, the engine management system (EMS) assumes that all of the air which passes the airflow meter will subsequently enter the combustion chamber. If some of this air escapes from the manifold, then the overall air–fuel ratio will be lower than that assumed by the EMS. This could therefore lead to an increase in the levels of carbon monoxide, unburned hydrocarbons and particulate matter being released into the atmosphere, especially at full-load conditions. Depending on the location of the leak within the intake manifold and the method of control used, the exhaust gas recirculation process may also be affected, leading to an increase in NOx emissions.
Fig. 3. Raw plot of the experimental data for all measured signals.
In this investigation, the air leak was created by drilling holes of 2, 4 and 6 mm diameter in a removable bolt in the inlet manifold plenum chamber. The manifold was pressure tested for leaks prior to the experimental leak being introduced. The complexity of the combustion process makes the identification of such a fault a difficult task. The effect of these leaks on the raw data is shown in Fig. 3, which highlights how difficult it is to identify the abnormal effect. Consequently, with the exception of the 6 mm hole, this type of fault would be difficult to detect using a physical model (see Section 4.1). Of particular interest are the data recorded with a 2 mm leak, which appear to be identical to the data recorded during the fault-free condition.

4. Condition monitoring of intake manifold air leaks

This section discusses the condition monitoring process, in which the detection and diagnosis of air leaks in the inlet manifold plenum chamber are investigated. A comparison between the physical and PCA models is provided to illustrate the effectiveness of the proposed data-driven model over a conventional physical technique in examining the effect of air leaks, especially for the smallest leak (2 mm).

4.1. Physical model

Using a physical model, the air leak rate was calculated from the pressure difference between the manifold and the surrounding atmosphere using the following equation:

\dot{m} = C_D A \sqrt{2 \rho \Delta P},    (16)

where \dot{m} is the mass flow rate of air through the hole (kg/s), A the area of the hole (m^2), \rho the density (kg/m^3), \Delta P the manifold boost pressure (Pa) and C_D the coefficient of discharge, which was taken to be 0.6. The airflow entering the engine was measured by the airflow meter during engine testing.
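For orientation, Eq. (16) can be evaluated directly. The operating point below is an assumption chosen only to illustrate the order of magnitude (a boost pressure of 1 bar above ambient and a charge-air density of 1.2 kg/m^3); it is not one of the test conditions reported in this paper.

```python
import math

def leak_mass_flow(d_hole_m, delta_p_pa, rho=1.2, cd=0.6):
    """Eq. (16): m_dot = C_D * A * sqrt(2 * rho * delta_P)."""
    area = math.pi * (d_hole_m / 2.0) ** 2               # hole area (m^2)
    return cd * area * math.sqrt(2.0 * rho * delta_p_pa)  # mass flow (kg/s)

# Assumed, illustrative operating point: 2 mm hole, 1 bar boost, rho = 1.2 kg/m^3
m_dot = leak_mass_flow(0.002, 1.0e5)
print(f"Estimated leak rate: {m_dot * 3600:.2f} kg/h")    # roughly 3.3 kg/h
```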
As expected, it was found that the percentage of air lost through the hole increased as the diameter of the hole increased. For the three diameters tested, the highest percentage loss occurred at 1500 rev/min at full load, reaching 1.98%, 7.05% and 15.19% for the 2 mm, 4 mm and 6 mm holes, respectively. Under these conditions the airflow rate entering the engine is low, due to the low engine speed, while the manifold boost pressure, which is the driving force for the leakage flow, is almost at its maximum. Consequently, the air leak flow rate constitutes a high proportion of the flow rate entering the engine. Fig. 4 shows the flow rate of air through the hole as a percentage of the air entering the engine. Given that the maximum air leak rate was less than 2% for the 2 mm hole, with an average value of less than 1%, this fault posed a difficult challenge for the fault detection and diagnosis algorithm.

4.2. PCs model

Using the 12 output signals taken from the experimental data, a PCA model was built. Table 3 shows the variance captured by the PCs in descending order.

Fig. 4. Percentage air loss caused by 2, 4 and 6 mm air leaks in the inlet manifold plenum chamber for all combinations tested.
Table 3
Variance captured by PCA

Number of PC    Eigenvalue    Variance captured by each PC (%)    Total sum of variance captured (%)
1               7.46          62.13                               62.13
2               3.55          29.61                               91.74
3               0.44          3.68                                95.42
4               0.30          2.52                                97.94
5               0.15          1.25                                99.19
6               0.049         0.40                                99.59
7               0.040         0.33                                99.92
8               0.006         0.047                               99.967
9               0.003         0.015                               99.982
10              0.001         0.011                               99.993
11              0.0006        0.005                               99.998
12              0.0002        0.002                               100
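Because the 12 signals are auto-scaled, the eigenvalues of the correlation matrix sum to approximately 12, and each percentage in Table 3 is simply \lambda_i divided by that sum. The short sketch below reproduces the variance columns from the rounded eigenvalues; small differences from the published percentages are due to that rounding.

```python
import numpy as np

# Eigenvalues transcribed from Table 3 (as published, rounded)
eigvals = np.array([7.46, 3.55, 0.44, 0.30, 0.15, 0.049, 0.040,
                    0.006, 0.003, 0.001, 0.0006, 0.0002])

variance_pct = 100.0 * eigvals / eigvals.sum()   # variance captured by each PC
cumulative_pct = np.cumsum(variance_pct)         # total sum of variance captured
for i, (v, c) in enumerate(zip(variance_pct, cumulative_pct), start=1):
    print(f"PC {i:2d}: {v:6.2f} %   cumulative {c:8.3f} %")
```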
The method of choosing how many PCs to retain was based on the percentage of variance captured by each PC: a minimum of 1% was required for a PC to be included. Popular methods such as the eigenvalue-one rule and the cross-validation procedure are not suitable in this case; both approaches select only two PCs to be retained. During residual evaluation it was found that, with these approaches, the residual still contained a considerable amount of the variation of the original data. Further examination revealed that five PCs captured most of the original variance of the experimental data and left only a negligible level, less than 1%, of unimportant variation and noise. The methods used to choose the number of PCs are not discussed here due to limitations of space, but can be found in Jolliffe [12].

4.3. Process monitoring of air leaks fault

Section 2.2 discussed the monitoring statistics used to detect air leaks in the intake manifold plenum chamber. To illustrate the monitoring process, a new data set of 150 s (corresponding to 1500 samples) covering various driving conditions, with a 2 mm air leak introduced for the last 300 samples at 1500 rev/min and full load, was used for validation purposes. Figs. 5 and 6 illustrate the monitoring process using the Hotelling's T^2 and Q statistics, respectively. Two confidence limits (99% and 95%) are provided to highlight the violation caused by the 2 mm air leak at full load and 1500 rev/min. While the first 1200 samples remain below the confidence limits, the majority of the last 300 samples (sample 1201 onwards) strongly violate the confidence limits. This abnormal condition caused by the 2 mm air leak in the intake manifold becomes even more apparent in Fig. 7.

The joint monitoring statistics plots shown in Fig. 7 enhance the detection capabilities with increased sensitivity. The first joint diagnostics plot simply combines the two monitoring statistics (Q and T^2) and plots them against each other on the X- and Y-axes. The validation data set consists of the plus symbols (which represent the first 1200 conforming samples) and the cross symbols (which represent the non-conforming samples caused by the 2 mm air leak at sample numbers 1201–1500). There are four regions in Fig. 7(a), defined by the two 99% confidence limits of the monitoring statistics and denoted R1, R2, R3 and R4. R1 is the normal region, containing samples which fall below both confidence limits. In contrast, R3 is the region containing those samples which violate both confidence limits. The samples contained in this region represent the most abnormal conditions and stem mostly from the 2 mm leak condition, represented by the cross symbols.
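A block of validation data such as the 1500-sample set above can be screened against the model in a few lines, as sketched below. The helper quantities (mu, sigma, P, eigvals and the 99% limits) are assumed to come from the earlier sketches; the routine simply mirrors the per-sample T^2 and Q checks plotted in Figs. 5 and 6 and is illustrative rather than the author's implementation.

```python
import numpy as np

def screen_samples(X_new, mu, sigma, P, eigvals, k, T2_lim, Q_lim):
    """Per-sample T^2 and Q values and a flag for non-conforming samples."""
    Xs = (X_new - mu) / sigma                     # normalise with training statistics
    T = Xs @ P                                    # scores for every sample
    T2 = np.sum(T ** 2 / eigvals[:k], axis=1)     # Hotelling's T^2 per sample
    E = Xs - T @ P.T                              # residuals
    Q = np.sum(E ** 2, axis=1)                    # Q statistic per sample
    flags = (T2 > T2_lim) | (Q > Q_lim)           # violation of either 99% limit
    return T2, Q, flags
```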
Fig. 5. Process monitoring using Hotelling's T^2 statistic.
Fig. 6. Process monitoring using the Q residual statistic.
R2 and R4 contain samples which violate only the Q residual (R2) or only the Hotelling's T^2 (R4) confidence limit. The second joint diagnostics plot, shown in Fig. 7(b), utilises a confidence region estimated using the kernel density method discussed in Section 2.2.4. Here, the contour represents the 99% confidence region of the joint PDF of the Q and T^2 statistics. Any points falling outside the contour represent outliers which occurred as an effect of the air leak.

Further analysis of the effect of the air leaks can be obtained using the contributions to the T^2 and Q statistics. Fig. 8 shows an analysis of the contributions to the variation captured within the model (the T^2 statistic). It is evident that the HC (hydrocarbon) measurement signal is affected the most when the air leak occurs, especially from sample 1200 onwards, where the air leak is introduced. In a similar fashion, Fig. 9 shows an analysis of the contributions to the mismatch between the diagnostic model and the unseen measured signals (the Q statistic). It clearly shows that the fuel flow measurement signal is affected the most by the air leak, especially from sample 1200 onwards.
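The paper does not spell out the contribution formulas behind Figs. 8 and 9, so the sketch below uses one common formulation: the Q contribution of a variable is its squared residual element, and its T^2 contribution is its share of each variance-normalised score, summed over the retained PCs. It is offered as an assumption-laden illustration rather than the author's exact calculation.

```python
import numpy as np

def q_contributions(x, mu, sigma, P):
    """Per-variable contribution to Q: squared elements of the residual vector."""
    xs = (x - mu) / sigma
    e = xs - (xs @ P) @ P.T
    return e ** 2

def t2_contributions(x, mu, sigma, P, eigvals, k):
    """Per-variable contribution to T^2 (one common formulation)."""
    xs = (x - mu) / sigma
    t = xs @ P                                   # scores of this sample
    # variable j contributes sum_i (t_i / lambda_i) * P[j, i] * xs[j]
    return ((t / eigvals[:k]) * P).sum(axis=1) * xs
```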
Fig. 7. (a) Process monitoring using combined Q residual and Hotelling's T^2 statistics. (b) Process monitoring using a kernel density confidence region of the combined Q residual and Hotelling's T^2 statistics.

5. Conclusions

This paper has demonstrated that a data-driven monitoring technique, PCA, is a simple, straightforward, powerful and potentially useful technique for condition monitoring in automotive applications. The diagnostic model is capable of exploring and exploiting the underlying 'hidden' information in the experimental data in a compact manner. No simplifying assumptions are needed to build the model, which is derived solely from the measurement signals. The interdependency of the original signals is 'captured' and 'transformed' into a new and smaller number of independent signals (Section 2); the remaining un-captured portion contains mainly uninformative, noisy data. The variation within the model (the T^2 statistic) and the residual of the un-captured portion (the Q statistic) form the backbone of the fault detection and diagnosis process. It was shown in Section 4 that the diagnostic PCA model performed better than a physical model (in which an assumption is made to define the coefficient of discharge, C_D) when detecting air leaks in the intake manifold plenum chamber, especially for the smallest air leak (2 mm diameter). Using the two joint monitoring statistics plots, a clearer detection and diagnosis can be visually represented and a better analysis carried out. A confidence region estimated using the kernel density method increases the sensitivity of the monitoring process.
The easier visual interpretation it provides thereby improves the detection and diagnosis of small air leaks (see Fig. 7(b)) in comparison to the joint diagnostics plot built by simply combining both monitoring statistics (see Fig. 7(a)). Further analysis using the contributions to the T^2 and Q statistics shows the effect of the air leaks on fuel consumption and indicates that they may contribute to an emissions fault. Another important benefit of this diagnostic model is that it can be used to detect and diagnose any type of fault (within the scope of the measured signals) in the same manner as the air leak fault. The proposed technique has therefore shown good potential for automotive applications. It may be a valuable tool for a variety of condition monitoring situations, especially as emissions regulations become increasingly stringent.
Fig. 8. Contribution to Hotelling's T^2 statistic for various data points (fault-free and faulty conditions).
Fig. 9. Contribution to Q residual statistic for various data points (fault-free and faulty conditions).
Acknowledgements

David wishes to acknowledge the support of Dr. Darja Brandenburg for proofreading this manuscript. Comments and suggestions received from Dr. Geoffrey McCullough and Dr. Paul McEntee of the Internal Combustion Engines Research Group (ICERG) and Prof. George W. Irwin and Dr. Uwe Kruger of the Intelligent Systems and Control (ISAC) Research Group, Virtual Engineering Centre, Queen's University Belfast are gratefully acknowledged. Special thanks to the Electrical Test for Advanced Architectures team at the International Automotive Research Centre (IARC), University of Warwick for their support and encouragement.

References

[1] W.N. Mills III, Automated analysis of automotive data, in: SAE World Congress, Vehicle Diagnostic, SP-1922, No. 2005-01-1437, Detroit, USA, April 2005.
[2] R. Isermann, Model-based fault detection and diagnosis—status and applications, Annual Reviews in Control 29 (2005) 71–85.
[3] J. Gertler, Fault Detection and Diagnosis in Engineering Systems, Marcel Dekker, New York, USA, 1998.
[4] D. Antory, Fault diagnosis applications using nonlinear multivariate statistical process control, Ph.D. Thesis, School of Electrical & Electronics Engineering, Virtual Engineering Centre, Queen's University Belfast, Belfast, Northern Ireland, UK, February 2005.
[5] S.J. Qin, Statistical process monitoring: Basics and beyond, Journal of Chemometrics 17 (2003) 480–502.
[6] J.F. MacGregor, Data-based methods for process analysis, monitoring and control, in: Proceedings of the 13th IFAC Symposium on System Identification, Rotterdam, The Netherlands, 2003, pp. 1019–1029.
[7] J.F. MacGregor, T. Kourti, Statistical process control of multivariable processes, Control Engineering Practice 3 (1995) 403–414.
[8] T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems 28 (1995) 3–21.
[9] P. Geladi, B.R. Kowalski, Partial least-squares regression: A tutorial, Analytica Chimica Acta 185 (1986) 1–17.
[10] J.E. Jackson, A User's Guide to Principal Components, Wiley, New York, USA, 1991.
[11] K.V. Mardia, J.T. Kent, J.M. Bibby, Multivariate Analysis, Academic Press, London, UK, 1979.
[12] I.T. Jolliffe, Principal Component Analysis, Springer, New York, USA, 1986.
[13] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349.
[14] Q. Chen, U. Kruger, M. Meronk, A.Y.T. Leung, Synthesis of T^2 and Q statistics for process monitoring, Control Engineering Practice 12 (2004) 745–755.
[15] M.P. Wand, M.C. Jones, Kernel Smoothing, Monographs on Statistics and Applied Probability, vol. 60, Chapman & Hall, London, UK, 1995.
[16] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, London, UK, 1990.
[17] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, vol. 26, Chapman & Hall, London, UK, 1986.
[18] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance monitoring charts, Journal of Process Control 6 (1996) 349–358.
[19] Q. Chen, R. Wynne, P. Goulding, D.J. Sandoz, The application of principal component analysis and kernel density estimation to enhance process monitoring, Control Engineering Practice 8 (2000) 531–543.