Genetic polynomial regression as input selection algorithm for engine load prediction in combine harvesters


Copyright © 2004 IFAC Fifth International Workshop on Artificial Intelligence in Agriculture, Cairo, Egypt

GENETIC POLYNOMIAL REGRESSION AS INPUT SELECTION ALGORITHM FOR ENGINE LOAD PREDICTION IN COMBINE HARVESTERS

K. Maertens*, J. De Baerdemaeker*, R. Babuška**

* Katholieke Universiteit Leuven, Laboratory for Agro-Machinery and Processing, Kasteelpark Arenberg 30, B-3001 Leuven, Belgium
** Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands

Abstract: The performance of non-linear identification techniques is often determined by the selected input variables and the corresponding time lags. High correlation coefficients between candidate input variables, in addition to a non-linear relation with the output signal, induce the need for an appropriate input selection methodology. This paper proposes a genetic polynomial regression technique to select the significant input variables for the identification of non-linear dynamic systems with multiple inputs. An evolutionary measure is presented to visualize and to process the results from different selection runs. This evolutionary approach was applied to select an optimal set of variables for engine load prediction in combine harvesters. Copyright © 2000 IFAC

Keywords: Genetic algorithms, Variable selection, Non-linear systems, Takagi-Sugeno fuzzy models, Agricultural machinery, Combine harvester

1. INTRODUCTION

The data-driven identification of complex, non-linear systems requires the application of an appropriate input selection methodology to identify the significant regressor variables. The high correlation between candidate regressor variables, in addition to a non-linear relationship, deteriorates the performance of the standard multi-variate statistical tools. A straightforward, data-driven method to select the key process variables is to compare the performance of all possible variable combinations by identifying and validating the accompanying non-linear models. However, this problem is non-trivial. A set of $p$ candidate input variables brings about $2^p - 1$ possible combinations (e.g. 10 candidate variables produce 1023 possible models), which are impossible to evaluate with advanced identification techniques such as fuzzy models or neural networks.

In this study, an alternative approach is proposed to reduce the number of possible signal combinations by means of an evolutionary strategy. A polynomial regression is used to approximate the general non-linear, multi-variable process behaviour. This genetic polynomial regression approach is applied to select those variables that have a significant impact on the engine load of a combine harvester. A Takagi-Sugeno fuzzy model was identified with the proposed set of regressor variables.

2. GENETIC POLYNOMIAL REGRESSION

The non-linear relation between input and output variables, together with a high interaction between the candidate regressor variables, engenders the need for a special input-output representation. Polynomial terms offer a direct method to model this complex non-linear interaction. Important signal combinations can be found by determining those polynomial terms that have a significant impact on the accuracy of the full model.
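As an illustration of how a single polynomial term can expose an interaction between inputs, the following minimal Python sketch fits three candidate terms to a synthetic output and compares their residual RMSE. The data set, the helper `fit_single_term` and the candidate list are hypothetical and are not taken from the paper; the fit is an ordinary one-term least-squares regression, not the genetic search described later.

```python
from itertools import product
from math import sqrt


def fit_single_term(t, y):
    """Least-squares fit of y ~ a*t + a0 for one candidate term; returns (a, a0, rmse)."""
    m = len(t)
    t_mean = sum(t) / m
    y_mean = sum(y) / m
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    a = s_ty / s_tt
    a0 = y_mean - a * t_mean
    rmse = sqrt(sum((yi - (a * ti + a0)) ** 2 for ti, yi in zip(t, y)) / m)
    return a, a0, rmse


# Synthetic data (assumption): the output depends only on the interaction x1*x2.
grid = [0.0, 1.0, 2.0, 3.0]
samples = list(product(grid, grid))
y = [2.0 * x1 * x2 + 1.0 for x1, x2 in samples]

# Candidate polynomial terms: two first-degree terms and one second-degree product.
candidates = {
    "x1": [x1 for x1, _ in samples],
    "x2": [x2 for _, x2 in samples],
    "x1*x2": [x1 * x2 for x1, x2 in samples],
}

rmse_by_term = {name: fit_single_term(t, y)[2] for name, t in candidates.items()}
best_term = min(rmse_by_term, key=rmse_by_term.get)
print(best_term, rmse_by_term)
```

In this toy case only the product term reproduces the output, which is the kind of evidence the selection procedure relies on when it ranks polynomial terms by their effect on the model accuracy.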

2.1 Polynomial regression

More formally, we denote by $X = [x_1, \ldots, x_p]^T$ the vector of potential regressors (input variables). The regressand (output) $y$ is modelled as a $d$th-degree polynomial in $X$ and defined as:

$$y = P(X; d, n, A) = \sum_{i=1}^{n} a_i \prod_{j \in S_i} x_j + a_0 \qquad (1)$$

where $S_i$ are combinations of $k$ indices (with repetition) chosen from the set $\{1, 2, \ldots, p\}$, where $k = 1, 2, \ldots, d$. The parameter vector $A = [a_0, a_1, \ldots, a_n]^T$ can be estimated by means of standard least-squares techniques.

The largest possible number $n_{max}$ of these combinations (and therefore the maximal number of terms in the polynomial) is:

$$n_{max} = \sum_{k=1}^{d} \binom{p+k-1}{k} \qquad (2)$$

and the number of possible model structures containing a selection of these $n_{max}$ terms is $2^{n_{max}} - 1$.

2.2 Evolutionary selection of polynomial terms

Clearly, it is not possible to compare the predictive power of all $2^p - 1$ polynomial models. Therefore, a genetic algorithm is introduced to select a subset of relevant input variables. The algorithm evolves 'in parallel' a large number of polynomial structures, which are constructed from the $p$ potential input variables, and selects those polynomial terms that have an important impact on the accuracy of the full polynomial model by calculating the fitness value from the residual Root Mean Squared Error (RMSE). This evolutionary technique was implemented in a Matlab Mex-file as follows:

    n=n_init; p=p_tot;
    while (p>p_min),
      % genetic polynomial regression
      [str,par,rmse]=gpoly([Xsel y],[d n]);
      % occurrence count of every variable over all terms
      % (assumed definition; avar is not defined in the original listing)
      avar=sum(str);
      if (any(avar==0)),
        % if possible, delete variable from X
        i=1;
        while (avar(i) ~= 0), i=i+1; end;
        Xsel=[Xsel(:,1:i-1) Xsel(:,i+1:p)];
        RMSE=[RMSE;rmse];
        p=p-1; n=n_init;
      else
        % no variable to delete
        n=n-1;
      end;
    end;

The algorithm is initialized with $n_{init}$ polynomial terms and all $p$ potential input variables. Subsequently, the genetic polynomial regression algorithm gpoly is carried out and the resulting polynomial structure str is analyzed. If one of the variables $x_i$ is absent in all $n$ polynomial terms, it is removed from the set $X_{sel}$ and the procedure is repeated with $p - 1$ variables and the initial $n_{init}$ polynomial terms. If all variables are present in the polynomial structure str, the number of terms $n$ is reduced and gpoly is called again until one of the variables in $X_{sel}$ disappears from the structure. This backward selection procedure is repeated until a predefined, minimal number of $p_{min}$ variables is reached. The residual root mean squared error RMSE is registered to visualize the evolution of the final fitness value.

The user has to choose the initial number of terms $n_{init}$ (typically $n_{init} \approx 1.5p$ to $2p$) and to define the final number of regressors $p_{min}$.

2.3 Statistical interpretation of evolutionary runs

The results of the stepwise selection procedure are affected by the random operations in the genetic algorithm (mutation, cross-over) and the initial random selection of $n_{init}$ polynomial terms. The backward selection procedure should be repeated a number of times to reduce the impact of these random effects on the selection results.

The different selection runs can be analyzed by applying an 'All-Selected-Regressions' approach (Maertens et al. 2004b). The best subsets of variables are determined for every number of input variables $p$ by introducing an appropriate selection criterion. Subsequently, the optimal number of variables $p$ is determined by visual inspection or by applying an information criterion. An alternative way to process the selection results is to look at the resulting selection scenarios. Variables that are usually eliminated at the end of the selection process will be of higher significance compared to variables that are mostly removed during the first elimination rounds. As a consequence, the average lifetime, expressed by $LI(x_i)$, could be considered as a measure of importance:

$$LI(x_i) = 100\% \cdot \frac{\sum_{k=1}^{N} \sum_{p=p_{min}}^{p_{tot}-1} \left[ x_i \in X_{sel}(p) \right]}{N \, (p_{tot} - p_{min})} \qquad (3)$$

where the bracket equals 1 if $x_i$ belongs to the set $X_{sel}(p)$ of $p$ variables retained in run $k$ and 0 otherwise, and $N$ corresponds to the number of calculated selection runs. Variables that are always eliminated in the first elimination round will have a 'Lifetime Index' of 0%, while variables that are always present after the final elimination round will be linked to an $LI(x_i)$ value of 100%.

3. ENGINE LOAD PREDICTION IN COMBINE HARVESTERS

Engines in off-road vehicles are exposed to strongly varying loads due to other functional tasks besides transport. A combine harvester is a typical example of this type of vehicle. The mounted engine has to provide the power to move through the field and to drive on the road, but also has to drive the different harvest, separation and cleaning elements. Keeping the instantaneous machine load below the 100% level is necessary to avoid breakdown. The practical implementation of this engine load restriction in an automatic harvest speed controller requires a proper choice of regressor variables.

Table 1 gives an overview of the candidate variables that were considered as potential input variables.

• The ground speed signal, GSpeed (km/h), is registered via the rotation speed of the driving wheels.
• The amount of crop that is harvested instantaneously, Throughput (V), is estimated indirectly by measuring the driving torque on the crop elevator that transports the harvested material into the machine.
• The torque on the threshing drum, TDrum (V), provides a second indication of the crop throughput and is typically used for automatic ground speed control algorithms (Eimer 1974, Jensen 1988).
• The backward-forward machine inclination, Slope (°), determines the relation between the speed control input and the engine load via the gravitational force.
• The mass of stored crop material, Mass (t), accounts for a significant share (up to 33%) of the total mass of the machine and is estimated on-line by integrating the flow of clean crop material after unloading the grain bin.
• The ground speed of a combine harvester is manually adjusted by means of a handle. The handle position, HPos (%), is directly related to the machine speed and thus determines the engine load.
• The engine load itself, y (%), is estimated by registering the instantaneous fuel consumption and is distorted by strong high-frequency noise disturbances.

Table 1. Overview of candidate input variables to predict engine load in combine harvesters.

    Signal                      Symbol   Lag    x_i
    Engine load (%)             y        1, 2   x_1, x_2
    Ground speed (km/h)         u_GS     1      x_3
    Crop throughput (V)         u_F      4, 5   x_4, x_5
    Torque threshing drum (V)   u_TD     1, 2   x_6, x_7
    Back/Forw slope (°)         u_S      1      x_8
    Stored crop (t)             u_M      1      x_9
    Handle position (%)         u_H      1      x_10

The selected time lags result from an iterative, non-linear correlation procedure that calculates the optimal time delay between two signals (Maertens et al. 2004a). No pre-processing algorithms were applied.

All measurement devices were installed on a New Holland CX combine harvester and connected to the mobile Controller Area Network (CAN). Measurements were carried out at a fixed 5 Hz sample rate during normal operation in wheat.

4. RESULTS AND DISCUSSION

The evolutionary selection procedure of Section 2 was applied to a measured data set of 5158 samples to identify those variables that have an important impact on the engine load. Subsequently, the selected variables were used to identify a Takagi-Sugeno fuzzy model.

4.1 Evolutionary selection

The bottom-left plot of Fig. 1 gives a two-dimensional overview of the 78 selection runs. The upper-right and bottom-right plots provide the frequency with which each variable is present in the final set ($p_{min} = 3$) and the 'Lifetime Index' $LI(x_i)$, respectively. The final fitness value of the gpoly function is added in the upper-left corner and indicates a loss in accuracy when a set of 5 variables is reduced further. The following input variables were selected by the lifetime index, in order of relevance: the previous engine load measurement $y(k-1)$ ($x_1$), the crop throughput measurement $u_F(k-4)$ ($x_4$), the machine slope $u_S(k-1)$ ($x_7$) and a second engine load measurement $y(k-2)$ ($x_2$).
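The Lifetime Index of Eq. (3) can be computed from the order in which the variables were eliminated in each selection run. The Python sketch below is a minimal illustration under stated assumptions: each run is stored as a list of zero-based variable indices in order of elimination, a variable removed in the first round is counted as having survived zero rounds, and the function name `lifetime_index` and the three example runs are hypothetical rather than taken from the paper.

```python
def lifetime_index(runs, p_tot, p_min):
    """Return the Lifetime Index (in %) of each of the p_tot variables.

    runs: list of selection runs; each run lists the variable indices
          (0-based) in the order in which they were eliminated.
    """
    rounds = p_tot - p_min  # number of elimination rounds per run
    totals = [0] * p_tot
    for order in runs:
        if len(order) != rounds or len(set(order)) != rounds:
            raise ValueError("each run must eliminate exactly p_tot - p_min distinct variables")
        # Variables never eliminated survive all rounds.
        survived = [rounds] * p_tot
        for round_index, variable in enumerate(order):
            # A variable removed in round r (1-based) survived r - 1 rounds.
            survived[variable] = round_index
        for i in range(p_tot):
            totals[i] += survived[i]
    return [100.0 * total / (len(runs) * rounds) for total in totals]


# Hypothetical example: 5 candidate variables, reduced to 2 in each of 3 runs.
example_runs = [[4, 3, 2], [4, 2, 3], [3, 4, 2]]
li = lifetime_index(example_runs, p_tot=5, p_min=2)
ranking = sorted(range(5), key=lambda i: li[i], reverse=True)
print([round(value, 1) for value in li], ranking)
```

Variables 0 and 1 are never eliminated in this example and reach 100%, while variable 4, which is usually removed first, scores lowest. This mirrors the two limiting cases described for the index.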

4.2 Identification of a Takagi-Sugeno fuzzy model

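The following TS fuzzy model structure was chosen on the basis of the selection results (Maertens et al. 2004c):

$$\begin{aligned} &\text{IF } (u_F(k-4) \in A_i) \text{ AND } (u_S(k-1) \in B_i) \\ &\text{THEN } y(k) = a_{1,i}\, y(k-1) + a_{2,i}\, y(k-2) + b_{F,i}\, u_F(k-4) + b_{S,i}\, u_S(k-1) + b_{H,i}\, u_H(k-1) + d_i \end{aligned} \qquad (4)$$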
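To make the rule structure of Eq. (4) concrete, the following Python sketch evaluates a three-rule Takagi-Sugeno model for one sample. The consequent parameters are the values listed with Fig. 2. Everything else is an assumption made for illustration: the Gaussian membership functions and their centres and widths are hypothetical, the rule conjunction is taken as a product, and the rule outputs are combined by the usual normalized weighted average of TS models.

```python
from math import exp

# Consequent parameters (a1, a2, bF, bS, bH, d) of the three local models,
# as listed with Fig. 2.
CONSEQUENTS = [
    (1.51, -0.625, 2.07, 0.109, 0.065, -2.84),
    (0.94, -0.186, 7.36, 0.676, 0.125, -13.5),
    (0.92, -0.139, 7.79, 0.814, 0.264, -24.7),
]

# Hypothetical Gaussian membership functions (centre, width); NOT from the paper.
A_SETS = [(0.5, 0.5), (2.5, 1.0), (2.5, 1.0)]   # throughput uF(k-4)
B_SETS = [(0.0, 5.0), (-3.0, 2.0), (3.0, 2.0)]  # slope uS(k-1)


def gauss(x, centre, width):
    """Gaussian membership degree in (0, 1]."""
    return exp(-0.5 * ((x - centre) / width) ** 2)


def ts_predict(y1, y2, u_f, u_s, u_h):
    """Evaluate the TS model for lagged outputs y1 = y(k-1), y2 = y(k-2)."""
    # Degree of fulfilment of each rule: product of the membership degrees.
    betas = [gauss(u_f, *a) * gauss(u_s, *b) for a, b in zip(A_SETS, B_SETS)]
    # Output of each local linear model (THEN part of the rule).
    local_outputs = [
        a1 * y1 + a2 * y2 + b_f * u_f + b_s * u_s + b_h * u_h + d
        for a1, a2, b_f, b_s, b_h, d in CONSEQUENTS
    ]
    total = sum(betas)
    weights = [beta / total for beta in betas]
    prediction = sum(w * y for w, y in zip(weights, local_outputs))
    return prediction, weights, local_outputs


# Hypothetical operating point: low throughput on flat ground.
prediction, weights, local_outputs = ts_predict(50.0, 50.0, 0.5, 0.0, 60.0)
print(round(prediction, 2), [round(w, 3) for w in weights])
```

With these illustrative membership functions the first rule dominates at low throughput, so the prediction stays close to the output of the first local model.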

[Figure 1: four panels. Upper left: training error versus number of variables p. Bottom left: point of elimination of each variable x_i over the selection runs. Upper right: presence of each variable in the final set. Bottom right: Lifetime Index of each variable.]

Fig. 1. Results of the genetic polynomial regression algorithm to rank the potential variables $x_i$ for engine load prediction in combine harvesters ($N = 78$, $n_{init} = 15$, $p_{min} = 3$).

The premise variables are the throughput $u_F(k-4)$ and the backward-forward machine slope $u_S(k-1)$, while two lagged outputs and the handle position $u_H(k-1)$ are also added to the inputs of the consequent rules. The AND operation was realized as the product of the membership degrees. Both lagged outputs $y(k-1)$ and $y(k-2)$ are necessary to capture the system's dynamics, but appear to be redundant with respect to the partition of the premise space. The Gustafson-Kessel fuzzy clustering algorithm (Gustafson and Kessel 1979) was applied for different numbers of clusters according to the methodology described in (Babuška 1985). As a result, three clusters were found, which after projection provide the partition of the antecedent space of Fig. 2. This partition was found in an automatic manner, but at the same time, a good a posteriori interpretation can be given to the obtained partition. The first local model characterizes a low crop throughput and corresponds to the machine regime at which the vehicle is only moving through the field without actually harvesting any crop. The second and third membership functions correspond to engine load responses registered for downhill and uphill harvesting, respectively.

[Figure 2: partition of the antecedent space into three local models; recoverable axis label: Throughput(k-4).]

    Local model i   a_1,i   a_2,i    b_F,i   b_S,i   b_H,i   d_i
    1               1.51    -0.625   2.07    0.109   0.065   -2.84
    2               0.94    -0.186   7.36    0.676   0.125   -13.5
    3               0.92    -0.139   7.79    0.814   0.264   -24.7

Fig. 2. Antecedent space and consequent parameters of the Takagi-Sugeno fuzzy model to predict engine load in combine harvesters.

The consequent parameters were identified by optimizing a multi-objective cost function to guarantee both model transparency and prediction accuracy (Maertens et al. 2004c). The parameters of the three resulting local linear models are added to Fig. 2 and the corresponding step responses to a variation of the speed control input $u_H$ are illustrated in Fig. 3.
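The dynamics of the three local linear models can be examined directly from the consequent parameters listed with Fig. 2. The Python sketch below simulates each local model's response to a unit step in the handle position. It assumes that all other inputs are held constant, so that only $a_{1,i}$, $a_{2,i}$ and $b_{H,i}$ enter the recursion, that the initial conditions are zero, and that the responses are expressed as deviations from the operating point. The function names are hypothetical and no scaling is applied, so the amplitudes need not match the axes of the published figure.

```python
# (a1, a2, bH) of the three local models, as listed with Fig. 2.
LOCAL_MODELS = {
    1: (1.51, -0.625, 0.065),
    2: (0.94, -0.186, 0.125),
    3: (0.92, -0.139, 0.264),
}


def step_response(a1, a2, b_h, steps=200):
    """Simulate y(k) = a1*y(k-1) + a2*y(k-2) + bH*u(k-1) for a unit step u."""
    y = [0.0, 0.0]  # y(-1) and y(0): zero initial conditions
    for _ in range(steps):
        y.append(a1 * y[-1] + a2 * y[-2] + b_h * 1.0)
    return y[1:]  # y(0), y(1), ..., y(steps)


def static_gain(a1, a2, b_h):
    """Steady-state gain from handle position to engine load."""
    return b_h / (1.0 - a1 - a2)


responses = {i: step_response(*params) for i, params in LOCAL_MODELS.items()}
gains = {i: static_gain(*params) for i, params in LOCAL_MODELS.items()}

for i in LOCAL_MODELS:
    overshoot = max(responses[i]) / gains[i] - 1.0
    print(f"model {i}: gain = {gains[i]:.3f}, overshoot = {100.0 * overshoot:.1f}%")
```

Under these assumptions the first local model overshoots its final value, while the second and third settle monotonically, and the third has the largest static gain with respect to the handle position.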

Local Model 1 approximates the engine load responses when the machine is not harvesting any crop or only small amounts. The recorded throughput variations are mainly induced by noise disturbances and unwanted correlation with other varying signals and are therefore less reliable. The engine is only slightly loaded, resulting in faster and less damped dynamic behaviour. Local Model 2 characterizes the system response during downhill harvesting. The harvesting task introduces a higher engine load and therefore slower dynamics. The negative machine slope makes it easier for the engine to respond to variations of the ground speed input signal $u_H(k-1)$. Local Model 3 describes the highest engine load regime because of the high throughput values and positive machine slopes. This combination makes the engine load more sensitive to handle variations.

[Figure 3: step responses of the three local linear models.]

Fig. 3. Step responses of the three local linear models of Fig. 2 as a result of a unit variation of the handle position $u_H$.

5. CONCLUSION

The identification of the key process variables is a first, but important step to model complex processes on the basis of a measured data set. Although statistical tools are available to handle data from multi-variate linear processes, special attention should be paid when dealing with dynamic processes that are characterized by non-linear phenomena. Genetic polynomial regression offers a tool to handle this type of data set with a minimal amount of a priori process knowledge. This evolutionary approach was used to determine the key variables to predict engine load in a combine harvester. The crop throughput and backward-forward machine slope were found to be significant for engine load prediction, besides previously measured engine load values and the speed control signal.

A fuzzy Takagi-Sugeno engine load prediction model was successfully identified on the basis of the selected regressor variables. Both the position of the local models and the consequent rules could be physically interpreted.

6. ACKNOWLEDGEMENTS

The authors thank Attila Almos (University of Budapest) for his contribution to the development of the genetic selection algorithm and its implementation in MATLAB and C. The authors also gratefully acknowledge the I.W.T. (Instituut voor Wetenschappelijk Technologisch onderzoek) for the financial support through doctoral grant No. 3249 and F.W.O. for financing the stay of the first author at the Delft University of Technology.

7. REFERENCES

Babuška, R. (1985). Fuzzy modeling for control. Kluwer Academic Publishers. Norwell.

Eimer, M. (1974). Automatic control of combine threshing process. In: Proc. IFAC Symp. on Automatic Control for Agriculture, June 18-20. Saskatoon, Canada.

Gustafson, D. E. and W. C. Kessel (1979). Fuzzy clustering with a fuzzy covariance matrix. In: Proc. of IEEE Conference on Decision and Control (CDC). San Diego, USA.

Jensen, L. P. (1988). An adaptive feedrate controller for combine harvesters. In: Proc. Int. Symp. Modelling, Identification and Control (IASTED), February 17-19. Grindelwald, Switzerland.

Maertens, K., H. Ramon and J. De Baerdemaeker (2004a). An on-the-go modelling algorithm for separation processes in combine harvesters. Computers and Electronics in Agriculture (In Press).

Maertens, K., J. De Baerdemaeker and R. Babuška (2004b). Genetic polynomial regression as input selection algorithm for nonlinear identification. Soft Computing (Submitted).

Maertens, K., T. A. Johansen and R. Babuška (2004c). Engine load prediction in off-road vehicles using multi-objective nonlinear identification. Control Engineering Practice (In Press).
