Forecasting carbon monoxide concentrations near a sheltered intersection using video surveillance and neural networks: A comment

Transpn Res.-D, Vol. 2, No. 3, pp. 221-222, 1997
© 1997 Published by Elsevier Science Ltd. All rights reserved. Printed in Great Britain
1361-9209/97 $17.00+0.00
PII: S1361-9209(97)00012-6

NOTES AND COMMENTS

FORECASTING CARBON MONOXIDE CONCENTRATIONS NEAR A SHELTERED INTERSECTION USING VIDEO SURVEILLANCE AND NEURAL NETWORKS: A COMMENT*

MARK S. DOUGHERTY
Centre for Research on Transportation and Society (CTS), Högskolan Dalarna, S-781 88 Borlänge, Sweden

and

LAURIE A. SCHINTLER
The Institute of Public Policy, George Mason University, Fairfax, VA 22030-4444, U.S.A.

(Accepted 19 April 1997)

In a recent article, Moseholm et al. (1996) argue that neural networks are an effective and efficient method for exploring complex interrelationships between traffic, wind, and short-term carbon monoxide (CO) concentrations near intersections. They argue furthermore that because a neural network can be trained to “learn” these dynamics, it is likely to produce forecasts of emissions that are superior to those generated by traditional linear regression methods and dispersion models. This is an interesting idea, which attacks what has always been a very difficult problem.

To illustrate these points, they conduct an experiment in which they train a neural network using data on traffic conditions, on-site wind parameters, and CO concentrations near a sheltered intersection; then, using a separate data set containing the same variables, they compare the predictive accuracy of the neural network, standard linear regression methods, and two dispersion models. They conclude that the neural network is the most accurate predictor of CO concentrations. Finally, a “sensitivity analysis” of the trained neural network is conducted to demonstrate how this method can be used as a tool for better understanding the complex interrelationships between these variables.

Unfortunately the authors have, perhaps unwittingly, stepped into some rather tricky territory through their use of a neural network, and there are several points concerning this paper which we feel obliged to criticise, since they bring into question the validity of many of the arguments presented. First, it appears that their neural network was overtrained, ‘overfitting’ the training data. Overfitting describes a circumstance in which the neural network simply learns the training data rather than the underlying function, consequently impairing its ability to generalize to a new data set (Hush and Horne, 1993).
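The scale of the mismatch between model size and data set size is easy to check. The sketch below uses the layer sizes reported in the commented paper (32 inputs, 17 hidden neurons, 1 output, 179 training examples); the counting convention, which includes bias terms, is our own assumption about how the quoted figure of 561 weights arises.

```python
# Parameter count for the 32-17-1 backpropagation network criticised above.
# Layer sizes are taken from the commented paper; the counting itself is
# our own illustrative arithmetic.

def mlp_parameter_count(n_in, n_hidden, n_out):
    """Weights plus bias terms for a fully connected one-hidden-layer network."""
    input_to_hidden = (n_in + 1) * n_hidden    # +1: bias on each hidden neuron
    hidden_to_output = (n_hidden + 1) * n_out  # +1: bias on each output neuron
    return input_to_hidden, hidden_to_output

in_to_hid, hid_to_out = mlp_parameter_count(32, 17, 1)
print(in_to_hid)               # 561 -- matches the figure quoted in the paper
print(in_to_hid + hid_to_out)  # 579 free parameters in total
print(179 / (in_to_hid + hid_to_out))  # well under one training example per parameter
```

Whichever counting convention is used, the free parameters outnumber the 179 training examples roughly threefold, which is the essence of the overfitting concern.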
Overtraining, in their case, is evident from the rather large difference in R² between the training and test sets: 0.925 in contrast to 0.692. It is also evident from the overspecified nature of their neural network, as well as the lack of data examples used to train it. It is well known that overfitting is likely to occur when there are too few training examples, too many predictor variables, and/or too many hidden neurons (Hush and Horne, 1993), all of which apply to their neural network. They train it with only 179 examples, and test it with only 22. The backpropagation neural network they use has 32 input variables, 17 hidden neurons, and 1 output variable, equating to 561 internal weights. Thus their model has considerably more degrees of freedom than the data set used to train it. Nowhere in their paper is it demonstrated how this specification was chosen, or that it is in fact ‘optimal’.

Although determining an optimal specification is not always straightforward, there are techniques, such as cross-validation, that one can employ to approach this design. Cross-validation requires separating the data into three sets, for training, testing and validation respectively. The training set is used to train the network. As the neural network is being trained, its generalization capability is periodically monitored using the test set, and training is terminated when the performance on the test set begins to deteriorate. Finally, the neural network is validated using the third, independent validation set. This step is necessary because the test set is used during the training process and is therefore not itself considered independent.

*Editor’s Note: A copy of this comment was sent to the contact author of the original paper for possible reaction, but no reply has been received.

A second doubt we have concerns their use of the neural network to conduct what they term a “sensitivity analysis”. Although they argue that this sensitivity analysis helped to uncover important interrelationships between variables, many of their findings could just as easily have been derived from the raw data itself, or from the statistical analysis they performed as part of the study. For example, their plot of volume, occupancy, and neural-network-generated CO concentrations is almost identical to their plot of the actual values of these variables. Furthermore, they claim that the sensitivity analysis helped in identifying the interaction between wind direction and lane-specific occupancy, yet this very relationship was identified by their original statistical analysis. In other words, it is never made clear what the true role of the neural network was in determining relationships between variables.

One should also note that the sensitivity of an individual input of a neural network depends on the values of all the inputs to the network (including the input under consideration). It is therefore necessary to examine a sensitivity distribution by repeating the calculations across the entire training set (Bishop, 1995); it is not made clear whether this precaution was taken. The same reference also points out that sensitivities are only valid for small changes in network inputs, yet the authors performed their test by displacing inputs to their minimum and maximum values, which implies rather large changes.

Despite these additional complications, we agree that training a neural network and then performing a sensitivity analysis is an interesting concept, and there are certain instances in which the technique has been demonstrated to be useful. For example, in applications in which the output is discrete, such as a discrete choice problem, the neural network provides the necessary machinery to translate the raw data into the domain of continuous probabilities (Dougherty, 1995a). A sensitivity analysis then reveals information which could not be deduced directly from the data. Another possibility is to use a sensitivity test to prune superfluous inputs to a neural network (Dougherty and Cobbett, 1997).

A final difficulty with their paper concerns their argument for using the neural network over standard statistical procedures. This argument rests on the fact that the neural network outperformed the statistical methods they employed in their study. Given that a nonlinear relationship exists, it is hardly surprising that a linear statistical technique did not perform well; they could have used a nonlinear regression method, yet they opted not to do so. Furthermore, they used stepwise regression, which does not perform adequately when there are interrelationships between the independent variables (Studenmund, 1992). These interrelationships are, of course, the basis of much of their paper.
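The sensitivity-distribution precaution discussed above can be sketched as follows. This is a minimal illustration with a hypothetical trained model (a fixed toy function of two inputs standing in for a network); the finite-difference step and the choice to summarise by mean and standard deviation are our own, not the authors' procedure.

```python
import numpy as np

def model(x):
    # Stand-in for a trained network: a fixed nonlinear function of two inputs.
    return np.tanh(0.8 * x[..., 0]) + 0.1 * x[..., 1] ** 2

def sensitivity_distribution(f, X, eps=1e-4):
    """Finite-difference sensitivity of f to each input, evaluated at every
    training example -- yielding a distribution rather than a single value."""
    n, d = X.shape
    sens = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps  # small perturbation, per Bishop's small-change caveat
        sens[:, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return sens

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # stand-in training set
S = sensitivity_distribution(model, X)

# Because sensitivities vary with the operating point, report a distribution
# over the training set rather than a single number per input:
print(S.mean(axis=0), S.std(axis=0))
```

Note that the second input's sensitivity even changes sign across the training set, which a single-point sensitivity figure would conceal entirely.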
In summary, the presence of nonlinearity and significant covariance amongst input variables is not necessarily a convincing argument for using neural networks (Dougherty, 1995b). We do not dispute their usefulness, but do argue that they must be used with considerable care, taking into account the points we have raised above. This is particularly the case when carrying out sensitivity tests, where there are several subtle pitfalls awaiting the unwary. However, despite these reservations we enjoyed the paper and look forward to further articles on this subject from the authors.

REFERENCES

Bishop, C. (1995) Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK.
Dougherty, M. S. (1995a) Exploring traffic systems by elasticity analysis of neural networks. Proceedings of the International Symposium on Neural Network Applications in Transport, Helsinki, Finland.
Dougherty, M. S. (1995b) A review of neural networks applied to transport. Transportation Research-C 3, 247-260.
Dougherty, M. S. and Cobbett, M. R. (1997) Short term inter-urban traffic forecasts using neural networks. International Journal of Forecasting 13, 21-31.
Hush, D. R. and Horne, B. G. (1993) Progress in supervised neural networks: what's new since Lippmann? IEEE Signal Processing Magazine 10(1), 8-39.
Moseholm, L., Silva, J. and Larson, T. (1996) Forecasting carbon monoxide concentrations near a sheltered intersection using video surveillance and neural networks. Transportation Research-D 1, 15-28.
Studenmund, A. H. (1992) Using Econometrics: A Practical Guide. Harper Collins, New York.