Modelling and Forecasting of Greenhouse Whitefly Incidence Using Time-Series and ARIMAX Analysis


6th IFAC Conference on Sensing, Control and Automation for Agriculture


December 4-6, 2019. Sydney, Australia

Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 52-30 (2019) 196–201

Lin-Ya Chiu, Dan Jeric Arcega Rustia, Chen-Yi Lu, Ta-Te Lin

Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taiwan, R.O.C.

Abstract: Greenhouse whitefly (Trialeurodes vaporariorum) is a major insect pest of greenhouse crops. To prevent the damage caused by whiteflies, farmers control the whitefly population by spraying pesticides on a regular basis.
However, pesticides are costly and may affect the environment and the health of farmers. To provide a more efficient way of applying pesticides, this research aims to develop a model for predicting the possible increase in whitefly population in greenhouses using autoregressive integrated moving average (ARIMA) and ARIMA with exogenous variables (ARIMAX).
The data used in this work were collected using wireless imaging devices that monitor the number of whiteflies trapped on sticky paper traps with an automatic insect counting algorithm. The wireless imaging devices were installed in a greenhouse growing tomato seedlings; tomato is one of the host plants of whiteflies.
The ARIMA and ARIMAX models were compared using different combinations of input data: ARIMA uses only the whitefly count, while ARIMAX uses the whitefly count together with environmental data. Based on preliminary testing, the minimum length of input data was found to be around 60 to 90 days.
ARIMAX was found to be the best model when its input data included the increase in whitefly count, temperature and humidity.
On average, the RMSE of 7-day forecasting with the proposed method was around 1.30. To assist farmers in scheduling pesticide applications, four levels of increase in whitefly count, namely Normal, Moderate, High and Critical, were determined using the K-means clustering algorithm. On a testing dataset, the F1-scores were 0.86 and 0.42 for the Normal and Moderate levels of daily increase in whitefly count, respectively.

Keywords: Greenhouse, Sticky paper trap, Early warning, Integrated pest management, Whitefly, Forecast.
© 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

1. INTRODUCTION

The greenhouse whitefly (Trialeurodes vaporariorum) is one of the most harmful greenhouse insect pests and is known to be capable of transmitting 111 kinds of plant viruses (Jones, 2003). In practice, farmers spray pesticides regularly to control the whitefly population. However, applying pesticides in this way may not always be efficient. To devise a more efficient method for spraying pesticides, the population of insect pests must be monitored and predicted. One way of monitoring insect pests is to use sticky paper traps. Once the number of insect pests trapped on the sticky paper traps has been collected, the count information can be used to develop models that predict possible insect pest outbreaks.

Prediction or forecasting in many applications is usually done with statistical models and neural network models. Among the studies that used statistical models, Arya et al. (2016) applied Autoregressive Integrated Moving Average (ARIMA) to forecasting thrips and whitefly populations. On the other hand, Xiang and Zhou (2010) used a Support Vector Machine (SVM) for forecasting Dendrolimus punctatus. Works using neural network models include a Multi-Layer Perceptron (MLP) neural network for predicting the population of thrips on cotton crops (Patil and Mitri, 2013) and a Bayesian Belief Network (BBN) for infectious disease outbreak prediction.

In this work, ARIMA is used for forecasting the daily increase in the population of the greenhouse whitefly. ARIMA was chosen because of its flexibility in forecasting for several applications. For example, Pasaribu et al. (2018) used ARIMA to predict rainfall in Merauke from monthly rainfall data from 2005 to 2017. Moreover, Al-Sakkaf and Jones (2014) applied ARIMA to forecasting the monthly incidence of campylobacteriosis in New Zealand from 1998 to 2008.

This research presents the use of ARIMA and multivariate ARIMA for forecasting the daily increase in the population of whiteflies in greenhouses. Unlike other works, the data used in this work were obtained from a wireless imaging system that automatically records the number of insect pests found on sticky paper traps. This means that the proposed method can be applied in real time, making the management of insect pests more effective. This paper also aims to find the minimum amount of input data needed for applying ARIMA and multivariate ARIMA in forecasting. We also tested the proposed approach on different sets of time-series data collected from different farms in order to investigate and compare the forecasting errors and the validity of the model.

2405-8963 © 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2019.12.521



2. MATERIALS AND METHODS


2.1 Dataset collection
Two sets of greenhouse data were used in this work, comprising the whitefly count, temperature, humidity and light intensity collected with a wireless imaging system. The system is composed of wireless imaging devices built from Raspberry Pi 3 B+ embedded boards and Raspberry Pi v2 cameras (Rustia and Lin, 2017). The yellow sticky paper images were processed by two CNN models: the first model filters out the insect objects, and the second model classifies them. An insect pest counting algorithm was used to obtain the whitefly count data, with 97% accuracy in whitefly identification (Rustia et al., 2018).

Specifically, 10 devices were installed in greenhouse No. 1, located in Yunlin, Taiwan, with an area of 2208 m², and 7 devices were installed in greenhouse No. 2, located in Chiayi, Taiwan, with an area of 529 m². Both greenhouses grow tomato seedlings as the main crop. From the raw whitefly count, the daily increase in whitefly count is defined as the number of whiteflies detected in a day. The data from greenhouse No. 1, covering 2018/5/9 to 2018/12/9, were used for model training and testing, whereas the data from greenhouse No. 2, covering 2018/2/22 to 2018/9/10, were used for model validation.

2.2 ARIMA model
Autoregressive Integrated Moving Average (ARIMA) is a method for fitting a model to time-series data in order to forecast future values from historical data. ARIMA includes three components: autoregressive (AR), moving-average (MA) and integrated (I) terms. The AR and MA components are expressed in (1):

$$ W_t = \underbrace{\phi_1 W_{t-1} + \cdots + \phi_p W_{t-p}}_{\text{AR terms}} + e_t \underbrace{- \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}}_{\text{MA terms}} \tag{1} $$

where $\phi_1, \ldots, \phi_p$ are the AR coefficients (for a stationary AR(1) process, $\phi_1$ lies strictly between −1 and +1), $\theta_1, \ldots, \theta_q$ are the MA weights, $e_t$ is the error term, $p$ is the order of the AR part and $q$ is the order of the MA part. The I component is defined in (2), where $d$ is usually set to 0, 1 or 2 (Cryer, 2008):

$$ W_t = \begin{cases} Y_t, & d = 0 \\ Y_t - Y_{t-1}, & d = 1 \\ (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}), & d = 2 \end{cases} \tag{2} $$

The orders are selected with the Akaike information criterion (AIC), in which the number of estimated parameters is k = p + q + 1 if the model contains an intercept or constant term and k = p + q otherwise (Cryer, 2008). The best values of p, d and q are those that give the lowest AIC among the tested combinations of p, d and q.

2.3 ARIMAX model
ARIMAX extends ARIMA by including exogenous variables $X$, which turns ARIMA into a multiple regression model. In ARIMAX, the I component is the same as in ARIMA, as given in (2). ARIMAX is suitable for analyses in which additional variables may influence the predicted values. The ARIMAX formulation is given in (4):

$$ \Big(1 - \sum_{s=1}^{p} \alpha_s L^s\Big) y_t = \mu + \sum_{s=1}^{q} \beta_s' L^s X_t + \Big(1 + \sum_{s=1}^{r} \gamma_s L^s\Big) e_t \tag{4} $$

where $L$ is the usual lag operator, for example $L^s y_t = y_{t-s}$ and $L^s X_t = X_{t-s}$; $\alpha_s \in \mathbb{R}$, $\beta_s \in \mathbb{R}^k$ (here $k$ is the number of exogenous variables) and $\gamma_s \in \mathbb{R}$ are unknown parameters; the $e_t$ are the errors; and $p$, $q$ and $r$ are pre-defined natural numbers (Bierens, 1987).

2.4 ARIMA (or ARIMAX) model training and forecasting
The flowchart for applying the ARIMA (or ARIMAX) model is shown in Fig. 1.

Fig. 1. ARIMA (or ARIMAX) model training and forecasting flowchart. The input data should be stationary before fitting them to the models. This is because the mean and variance of a stationary data is constant over time, which can be easier to predict. In this research, ADF (Augmented Dickey-Fuller) test (α =0.05) is used to check whether the data is stationary or not. The ADF test statistic is an estimated coefficient from the method of least squares regression (Cryer, 2008). If the p-value > α condition of the ADF test is met, the null hypothesis cannot be rejected and this means that the data is stationary, and nonstationary otherwise. The results of the ADF test are shown in Table 1.

(2)

The p, q, d values of ARIMA can be automatically determined using a variant of ARIMA called Auto-ARIMA. AutoARIMA iteratively computes the information criteria used to select the best p, q, d values such as Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Hannan-Quinn Information Criterion (HQIC), and Out of Bag (OOB). Among the different criteria, AIC was used in this work as the criterion for optimizing Auto-ARIMA. AIC is based on the maximum likelihood estimation (MLE) as shown in (3): AIC = −2 log(maximum likelihood) + 2𝑘𝑘

In Table 1, the p-values of the raw data are larger than 0.05 for all factors except light intensity, which means that the raw data are not all stationary. One way to make a series stationary is to take its first-order difference; Fig. 2 and Fig. 3 show the raw data and the first-order difference, respectively. For testing the model, the last 79 of the 199 data points of greenhouse No. 1 are used.

Table 1. ADF test p-values for stationarity testing of each factor

Factor                  | Raw data | First-order difference
Count                   | 0.22     | 2.13×10⁻⁶
Temp. (°C)              | 0.11     | 3.52×10⁻¹⁹
Humidity (%RH)          | 0.57     | 2.21×10⁻¹⁹
Light intensity (lux)   | 0.04     | 2.83×10⁻⁷

Tukey's fences, used for outlier removal in Section 2.5, are defined as

$$\left[\,Q_1 - k(Q_3 - Q_1),\; Q_3 + k(Q_3 - Q_1)\,\right] \qquad (6)$$

where $Q_1$ is the lower quartile and $Q_3$ is the upper quartile. John Tukey proposed the non-negative constant $k = 1.5$; observations outside this interval are regarded as outliers (Mosteller and Tukey, 1977).
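Equations (5) and (6) can be combined into a small validation helper that removes outliers before computing the RMSE. This is a sketch under assumptions: the text does not say which series is screened, so the rule used here (a time step is dropped if either the observed or the forecast value lies outside the fences of its own series) is an assumption, and the function names and data are hypothetical.

```python
import numpy as np


def tukey_fences(x, k=1.5):
    """Interval (6): [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)]."""
    q1, q3 = np.percentile(np.asarray(x, dtype=float), [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def rmse(observed, forecast):
    """Equation (5): square root of the mean squared forecast error."""
    err = np.asarray(observed, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def rmse_without_outliers(observed, forecast, k=1.5):
    """RMSE after dropping time steps flagged by Tukey's fences.

    Assumed rule: a step is dropped if the observed OR the forecast value
    lies outside the fences computed on its own series.
    """
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    keep = np.ones(observed.shape, dtype=bool)
    for series in (observed, forecast):
        lo, hi = tukey_fences(series, k)
        keep &= (series >= lo) & (series <= hi)
    return rmse(observed[keep], forecast[keep])


observed = [2, 3, 1, 4, 2, 30]  # hypothetical testing data
forecast = [2, 2, 2, 3, 3, 5]   # hypothetical one-day forecasts
print(rmse(observed, forecast), rmse_without_outliers(observed, forecast))
```

In this example the last pair is flagged (the observed value 30 lies above its upper fence of 6.375), and dropping it lowers the RMSE from about 10.24 to about 0.89, which illustrates how a single outlier can dominate the average error.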

2.6 Increase in whitefly count early warning levels

The increase in whitefly count can further be used to define early warning levels. In this work, the warning levels were set as Normal, Moderate, High, and Critical. The levels were set separately for the two farms and were determined automatically with the K-means clustering algorithm (Zhang et al., 2018). K-means divides a dataset into K clusters without supervision by minimizing the within-cluster sum of squares. The value of K, which determines the number of cluster centroids, is pre-defined by the user. The initial cluster centroids are selected randomly and updated iteratively until they no longer change or a maximum number of iterations is reached (Hartigan and Wong, 1979). In this work, K was set to 4, matching the number of target warning levels.
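A minimal sketch of deriving four warning levels from daily increases with K = 4 is shown below. It uses plain Lloyd iterations rather than the Hartigan-Wong algorithm cited above, and, for reproducibility, starts from centroids spaced evenly between the minimum and maximum instead of random ones. Turning centroids into level boundaries at the midpoints between neighbouring centroids is an assumption (in one dimension it is equivalent to nearest-centroid assignment). All names and data are hypothetical.

```python
import numpy as np

LEVELS = ("Normal", "Moderate", "High", "Critical")


def kmeans_1d(x, k=4, max_iter=100):
    """Lloyd-style K-means on 1-D data; returns the k centroids in ascending order."""
    x = np.asarray(x, dtype=float)
    # Deterministic start (evenly spaced between min and max) instead of the
    # random initial centroids described in the text.
    centroids = np.linspace(x.min(), x.max(), k)
    for _ in range(max_iter):
        # Assign each daily increase to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):  # centroids no longer change
            break
        centroids = new
    return np.sort(centroids)


def level_thresholds(centroids):
    """Warning-level boundaries halfway between adjacent centroids."""
    c = np.sort(np.asarray(centroids, dtype=float))
    return (c[:-1] + c[1:]) / 2


def to_level(values, thresholds):
    """Map daily increases to level indices 0..k-1 (0 = Normal)."""
    return np.digitize(values, thresholds)


daily_increase = [0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]  # hypothetical data
cuts = level_thresholds(kmeans_1d(daily_increase))
print(cuts, [LEVELS[i] for i in to_level([0, 7, 17, 40], cuts)])
```

Because the boundaries are learned from each farm's own distribution, the same code yields different thresholds for greenhouses No. 1 and No. 2, which matches the farm-specific levels described above.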

Fig. 2. Raw whitefly count data of greenhouse No. 1.

Fig. 3. First difference of whitefly count of greenhouse No. 1.

To evaluate the model, the confusion matrix is obtained and used to calculate the prediction F1-score, which is the harmonic mean of precision and recall. F1-score values range from 0 to 1, with values closer to 1 indicating better model performance. Precision is the ratio of true positives (TP) to the total number of predictions, while recall is the ratio of TP to the total number of testing data to be predicted; both are computed separately for each warning level. In this work, a TP is a predicted warning level that matches the true warning level defined manually.

$$\text{Precision} = \frac{TP}{\text{total number of predictions}} \qquad (7)$$

$$\text{Recall} = \frac{TP}{\text{total number of testing data}} \qquad (8)$$

$$F_1\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (9)$$
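Equations (7) to (9) can be evaluated per warning level from a confusion matrix, as in the sketch below. It is an illustration with hypothetical labels, not the authors' evaluation script; levels that never occur in the testing set are skipped, mirroring how the Critical level is excluded for greenhouse No. 1 in the results.

```python
import numpy as np


def per_level_f1(true_levels, pred_levels, n_levels=4):
    """Per-level precision (7), recall (8) and F1-score (9) from a confusion matrix."""
    cm = np.zeros((n_levels, n_levels), dtype=int)  # rows: true level, columns: predicted level
    np.add.at(cm, (np.asarray(true_levels), np.asarray(pred_levels)), 1)
    scores = {}
    for c in range(n_levels):
        tp = cm[c, c]
        n_pred = cm[:, c].sum()  # predictions of level c
        n_true = cm[c, :].sum()  # testing data whose true level is c
        if n_true == 0:
            continue  # level absent from the testing set, so it is not scored
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[c] = (float(precision), float(recall), float(f1))
    return cm, scores


# Hypothetical levels: 0 = Normal, 1 = Moderate, 2 = High, 3 = Critical
cm, scores = per_level_f1([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 0, 2])
print(cm)
print(scores)
```

Here level 0 has precision 2/3 and recall 2/3, so its F1-score is also 2/3; level 3 does not appear in the true labels and is therefore left out.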

After first-order differencing, the p-values became smaller than 0.05, which means that the processed data are stationary and can be used to fit the ARIMA or ARIMAX model. The differenced data are then used as training data for the ARIMA (or ARIMAX) models, and the predictions are inverted (undifferenced) to obtain the forecasts on the original scale. Forecasts are produced up to a pre-set number of N days ahead; in this research, a 7-day forecast is produced each day. The forecasting model was retrained iteratively, adding each newly observed day of real data to the training data before forecasting the whitefly count of the next day.

2.5 Model validation and outlier removal

In this work, the model is validated by calculating the RMSE (root-mean-squared error) between the real and forecasted data, as shown in (5):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n}\left(\text{Observed data}_t - \text{Forecast data}_t\right)^2}{n}} \qquad (5)$$

where $n$ is the number of forecasted data points.

Preliminary testing, however, revealed outliers in the testing data and predicted data. Outlier removal was therefore performed so that these points would not bias the average error between the testing and predicted data. This research uses Tukey's fences, defined in (6), for outlier removal.

3. RESULTS AND DISCUSSION

3.1 Data selection and determination of minimum input data

To determine the input data needed for forecasting, different combinations of input data were tested. In this work, C is the number of whiteflies detected each day, T is the ambient temperature (°C), H is the relative humidity (%RH), and L is the light intensity (lux). The minimum number of training days was also determined, to assess whether the method can be applied effectively to other greenhouses.

Training lengths of 15, 30, 60, 90, and 120 days were tested. The performance of the ARIMA and ARIMAX models trained with different combinations of input data is shown in Table 2.

The results show that the best combination of input data was C+T+H. They also show that with 60 days of training data, most of the models reached their minimum RMSE, and all ARIMAX models had a smaller RMSE than the ARIMA model (C only). Regarding the minimum amount of input data, the results were acceptable from around 30 days of training data onward, since the RMSE values were then smaller than 3.

Table 2. RMSE ± STD of different factors with different numbers of training days, where C, T, H and L are the daily increase in whitefly counts, temperature, humidity, and light intensity, respectively.

Factors (differenced) | 15 days       | 30 days     | 60 days     | 90 days     | 120 days
C                     | 2.21 ± 1.43   | 2.21 ± 1.43 | 2.21 ± 1.43 | 2.21 ± 1.43 | 2.21 ± 1.43
C+T                   | 2.66 ± 1.95   | 2.16 ± 1.36 | 1.57 ± 1.04 | 1.69 ± 1.09 | 1.66 ± 1.09
C+H                   | 2.61 ± 1.69   | 1.98 ± 1.26 | 1.67 ± 1.13 | 1.82 ± 1.18 | 1.63 ± 0.99
C+L                   | 3.41 ± 3.62   | 1.74 ± 1.08 | 1.70 ± 1.09 | 1.66 ± 0.93 | 1.70 ± 1.14
C+T+H                 | 3.14 ± 2.30   | 2.21 ± 1.48 | 1.51 ± 0.93 | 1.78 ± 1.15 | 1.56 ± 0.91
C+T+L                 | 3.26 ± 2.53   | 2.09 ± 1.31 | 1.74 ± 1.15 | 1.69 ± 0.95 | 1.75 ± 1.14
C+H+L                 | 3.65 ± 2.87   | 1.95 ± 1.27 | 1.70 ± 1.17 | 1.92 ± 1.27 | 1.72 ± 1.13
C+T+H+L               | 10.56 ± 11.88 | 2.02 ± 1.30 | 1.62 ± 1.02 | 1.67 ± 0.95 | 1.62 ± 0.96

3.2 Sample forecasting results

Using input data C, T, and H, more in-depth forecasting performance tests were carried out. Fig. 4 compares the one-day forecasting results with the raw data for different numbers of days of input data. Consistent with the results in Table 2, Fig. 4 (b)-(d) show that the predictions were close to the testing data and followed its trend. Furthermore, Fig. 5 shows sample 7-day forecasting results on greenhouse No. 1 using 60 days of input data.

Fig. 4. One-day forecasting results on greenhouse No. 1 with different numbers of days of input data for training: (a) 30 days, (b) 60 days, (c) 90 days, (d) 120 days. Negative forecast values were set to zero.

Fig. 5. Sample 7-day forecasting results on greenhouse No. 1 using 60 days of input data.

The results in Fig. 5 show that even though the predictions did not exactly match the testing data, the trends were similar. As shown later, the trend and the resulting warning levels matter more than the exact counts when applying the method.

3.3 Testing the ARIMAX method with a separate data set

The proposed method was also tested on another greenhouse for validation, to check whether it can be readily applied to other greenhouses. Fig. 6 and Fig. 7 show the raw data and its first-order difference, respectively. As for greenhouse No. 1, the last 79 of the 199 data points were used as testing data.

Fig. 6. Raw whitefly count of greenhouse No. 2.

Fig. 7. First difference of whitefly count of greenhouse No. 2.

The summary of testing results on greenhouse No. 2 using different numbers of days for training is shown in Table 3.

Table 3. Summary of testing results on greenhouse No. 2 using different numbers of input days with input data C+T+H.

Number of input days | 15   | 30   | 60   | 90   | 120
RMSE                 | 3.06 | 1.12 | 1.08 | 0.99 | 1.09
STD                  | 3.60 | 0.54 | 0.51 | 0.38 | 0.50

The results in Table 3 show that the RMSE dropped to about 1 once 30 or more input days were used (1.12 at 30 days, with a minimum of 0.99 at 90 days), compared with 3.06 at 15 days. This indicates that the method can be used for other greenhouses as well. To compare the predictions with the real data, four sample one-day forecasting results for greenhouse No. 2 are shown in Fig. 8.

Fig. 8. One-day forecasting results on greenhouse No. 2 with different numbers of input days: (a) 30 days, (b) 60 days, (c) 90 days, (d) 120 days.

As in greenhouse No. 1, the forecasting results in greenhouse No. 2 also appear to be similar to the testing data. Sample 7-day forecasting results are shown in Fig. 9.

Fig. 9. Sample 7-day forecasting results on greenhouse No. 2 using 60 days of input data.

The results in Fig. 9 show that the trends were predicted properly even though there were slight differences in count. These results demonstrate that the proposed method was flexible and that the chosen combination of input data was appropriate for forecasting. The predictions may also become more accurate as more data are included as input, for example more than 60 days.

3.4 Increase in whitefly count warning levels

To determine the warning levels, the frequency distributions of the daily increase in whitefly counts were obtained. The distributions were categorized with the K-means clustering algorithm described in Section 2.6. The frequency distributions of greenhouses No. 1 and No. 2 are shown in Fig. 10 and Fig. 11.

Fig. 10. Frequency distribution of the daily increase in whitefly counts of greenhouse No. 1 with the warning levels determined by K-means clustering.

Fig. 11. Frequency distribution of the daily increase in whitefly counts of greenhouse No. 2 with the warning levels determined by K-means clustering.

The frequency distributions and the determined warning levels of the two greenhouses were about the same. K-means clustering also allows the warning levels to be set in a non-arbitrary way, making the method more adaptive. The distributions show that the increase in count reached the Critical level in only a few instances, less than about 6% for both farms. These instances were mostly caused by sudden changes in the environment, such as in temperature or humidity. The determined warning levels are used to validate the model by obtaining the confusion matrix between the predicted and true levels. The results are shown in Fig. 12.

Fig. 12. Warning level confusion matrices for greenhouses (a) No. 1 and (b) No. 2, where N, M, H, and C stand for Normal, Moderate, High, and Critical, respectively.

Based on the confusion matrices in Fig. 12, the F1-scores of levels N, M, and H for greenhouse No. 1 are 0.83, 0.51, and 0.48, respectively. Level C was not included in the calculation because the testing set contained no Critical levels. For greenhouse No. 2, the F1-scores are 0.89 for N and 0.33 for M. These preliminary results show that the method can predict the warning levels with satisfactory accuracy, particularly the Normal level (F1-scores of 0.83 and 0.89); the lower scores for the Moderate and High levels (0.33 to 0.51) indicate that the method should be tested further once more data at the High and Critical levels are available.

4. CONCLUSIONS

The results of this research indicate that the best model for predicting the incidence of whiteflies was ARIMAX with temperature and humidity as the exogenous factors. Based on the testing results, the forecasting method can be started in a new farm after at least 30 days of data, with RMSE values below 3, and using 60 days of data leads to better model performance. The warning levels were successfully determined by K-means clustering, and the testing results show F1-scores of 0.86 and 0.42 for the Normal and Moderate levels of daily increase in whitefly count, which correspond to the averages of the per-greenhouse values reported above. More data should nevertheless be collected to test the method further on the High and Critical levels. The proposed ARIMAX model can assist farmers in decision-making for scheduling pesticide application.

ACKNOWLEDGEMENTS

This work was supported by a grant (MOST 107-2321-B-002061-) from the Ministry of Science and Technology, Taiwan, R.O.C.

REFERENCES

Al-Sakkaf, A., & Jones, G. (2014). Comparison of time series models for predicting campylobacteriosis risk in New Zealand. Zoonoses Public Health, 61(3), 167-174.
Arya, P., Paul, R. K., Kumar, A., Singh, K. N., Sivaramne, N., & Chaudhary, P. (2016). Predicting pest population using weather variables: An ARIMAX time series framework. International Journal of Agricultural and Statistical Sciences, 11(2), 381-386.
Bierens, H. J. (1987). ARMAX model specification testing, with an application to unemployment in the Netherlands. Journal of Econometrics, 35(1), 161-190.
Cryer, D. J. (2008). Time Series Analysis with Applications in R. New York, USA: Springer.
Hartigan, J., & Wong, M. (1979). Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100-108.
Jones, D. R. (2003). Plant viruses transmitted by whiteflies. European Journal of Plant Pathology, 109, 195-219.
Liao, Y., Xu, B., Wang, J., & Liu, X. (2017). A new method for assessing the risk of infectious disease outbreak. Scientific Reports, 7, 40084.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: a second course in statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods.
Pasaribu, Y. P., Fitrianti, H., & Suryani, D. R. (2018). Rainfall forecast of Merauke using autoregressive integrated moving average model. In The 3rd International Conference on Energy, Environmental and Information System, 73, Central Java, Indonesia.
Patil, J., & Mytri, V. D. (2013). A prediction model for population dynamics of cotton pest (Thrips tabaci Linde) using multilayer-perceptron neural network. International Journal of Computer Applications, 67(4), 19-26.
Rustia, D. J. A., & Lin, T.-T. (2017). An IoT-based wireless imaging and sensor node system for remote greenhouse pest monitoring. Chemical Engineering Transactions, 58, 601-606.
Rustia, D. J. A., Lin, C. E., Chung, J.-Y., & Lin, T.-T. (2018). A real-time multi-class insect pest identification method using cascaded convolutional neural networks. In The 9th International Symposium on Machinery and Mechatronics for Agriculture and Biosystems Engineering, Jeju, South Korea.
Xiang, C., & Zhou, Z. (2010). Application of ARIMA and SVM hybrid model in pest forecast. Acta Entomologica Sinica, 53(9), 1055-1060.
Yusof, Y., & Mustaffa, Z. (2011). Dengue outbreak prediction: A least squares support vector machines approach. International Journal of Computer Theory and Engineering, 3(4), 489-493.
Zhang, C., Cai, J., Xiao, D., Ye, Y., & Chehelamirani, M. (2018). Research on vegetable pest warning system based on multidimensional big data. Insects, 9(2), 66.