Predicting bike sharing demand using recurrent neural networks

Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 147 (2019) 562–566

www.elsevier.com/locate/procedia

2018 International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2018

Yan Pan^{a,∗}, Ray Chen Zheng^a, Jiaxi Zhang^a, Xin Yao^b

^a The High School Affiliated to Renmin University of China, Beijing, 100080, China
^b Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing, 100871, China

Abstract

Predicting bike sharing demand can help bike sharing companies to allocate bikes better and ensure a more sufficient circulation of bikes for customers. This paper proposes a real-time method for predicting bike renting and returning in different areas of a city during a future period based on historical data, weather data, and time data. We construct a network of bike trips from the data, use a community detection method on the network, and find the two communities with the most demand for shared bikes. We use data of stations in the two communities as our dataset, and train a deep LSTM model with two layers to predict bike renting and returning, making use of the gating mechanism of long short term memory and the ability of recurrent neural networks to process sequence data. We evaluate the model with the Root Mean Squared Error (RMSE) and show that the prediction of the proposed model outperforms that of other deep learning models by comparing their RMSEs.

© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 2018 International Conference on Identification, Information and Knowledge in the Internet of Things.

Keywords: Shared bike demand prediction; time series forecasting; recurrent neural networks; long short term memory

∗ Corresponding author. Tel.: +86-187-0137-1618. E-mail address: [email protected]
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
DOI: 10.1016/j.procs.2019.01.217

1. Introduction

Bikes have long played an important part in city transportation. As a consequence, bike-sharing has recently received increasing attention around the world. Bike-sharing customers prefer to quickly find a bike whenever they need one. Thus, bike provider companies need to allocate bikes efficiently according to the demand. Appropriate prediction of bike demand across different areas and different times is thus crucial.

As many underlying factors (for example, time of the day, day of the week, events, weather, and correlation between stations) contribute to the demand for shared bikes [1], predicting bike demand is very challenging. Several studies show that analyzing usage data of taxicabs [11], subways [2], buses [4], and bikes [14] could predict future transport usage. Kaltenbrunner et al. [9] discovered that temporal and spatial mobility patterns exist within the city; Vogel et al. [13] discovered that there are spatio-temporal dependencies between rents and returns of bikes at stations.

In this paper, we propose a real-time method for predicting bike demand in different areas of a city during a future period based on historical data from the Citi Bike System Data and meteorology data. We use the time sequence of bike rents and returns as the dataset. We train a deep long short term memory (LSTM) [8] recurrent neural network (RNN) with this data, making use of the self-loop and forget gate of LSTM. Experiments against several alternative architectures show that the model is effective. The method is able to handle large amounts of data in an acceptable amount of time, and the same method can be applied to other bike sharing systems.

2. Methodology

2.1. Deep LSTM

We choose the LSTM sequence learning model because of its ability to process sequential data and memorize data of past time steps [7]. LSTM is a type of gated RNN which is capable of learning long-term dependencies. LSTM was designed to overcome the vanishing gradient problem of plain RNNs [8]; exploding gradients can still occur and are usually handled separately, for example by gradient clipping. Fig. 1a shows the mechanism of LSTM. An LSTM has an internal recurrence (a self-loop) [5] in addition to the outer recurrence, which allows the network to accumulate information. The self-loop weight of LSTM is controlled by a forget gate, using a sigmoid unit which sets the self-loop weight between 0 and 1. The external input gate and output gate have similar computations to the forget gate. This gives the network the ability to learn long-term dependencies.

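[Figure 1. Panel (a), the mechanism of LSTM, is labelled with the forget gate f, the update gate i, the tanh gate c~, and the output gate o, with input x and cell state c. Panel (b), the structure of the deep LSTM sequence learning model, is labelled with the dataset, inputs x(d) and x(d+1), two LSTM layers, outputs y(d) and y(d+1), and a loss function for each output.]

(a) The mechanism of LSTM.

(b) The structure of the deep LSTM sequence learning model.

Fig. 1: Fig. 1a shows the complete mechanism of LSTM using a flowchart. Fig. 1b shows the implementation of a deep LSTM model with two layers of LSTM.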
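The gating mechanism and the two-layer stacking described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' implementation: the helper names (`init_layer`, `lstm_step`, `run_layer`), the weight initialisation, and the toy sizes are all hypothetical, and the output layer and training loop are left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_layer(n_in, n_hidden, rng):
    """Random weights for the four gates; each acts on [h_prev, x] (assumed init)."""
    def gate():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
             for _ in range(n_hidden)]
        return W, [0.0] * n_hidden
    return {name: gate() for name in ("f", "i", "g", "o")}


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM time step: returns the new hidden state and cell state."""
    v = h_prev + x  # concatenate previous hidden state and current input
    f = [sigmoid(z) for z in affine(*params["f"], v)]    # forget gate in (0, 1)
    i = [sigmoid(z) for z in affine(*params["i"], v)]    # input (update) gate
    g = [math.tanh(z) for z in affine(*params["g"], v)]  # candidate cell value
    o = [sigmoid(z) for z in affine(*params["o"], v)]    # output gate
    # Self-loop: the forget gate scales how much of the old cell state is kept.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_layer(params, xs, n_hidden):
    """Run one LSTM layer over a whole sequence and return all hidden states."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


rng = random.Random(0)
n_features, n_hidden, n_steps = 3, 4, 24  # toy sizes, not the paper's
layer1 = init_layer(n_features, n_hidden, rng)
layer2 = init_layer(n_hidden, n_hidden, rng)  # the second layer has its own parameters

xs = [[rng.random() for _ in range(n_features)] for _ in range(n_steps)]
hidden1 = run_layer(layer1, xs, n_hidden)       # first layer reads the input sequence
hidden2 = run_layer(layer2, hidden1, n_hidden)  # second layer reads the first layer's states
```

Because the forget gate is a sigmoid, each cell value is carried forward with a weight strictly between 0 and 1, which is the self-loop behaviour described above. A real model would add an output layer on top of `hidden2` and fit the weights by minimising a loss.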

Like other neural networks in deep learning, RNNs can be stacked into deeper versions that contain more than one recurrent layer. Because RNNs are especially computationally expensive to train, a deep RNN model normally contains no more than three LSTM layers. Deep RNNs are very useful for learning complex functions. We use two LSTM layers in our model. In a deep LSTM the model contains multiple layers, and each layer has its own set of parameters. The first LSTM layer computes a layer of hidden units from the input. The second LSTM layer then calculates the output from these hidden units. Finally, the neural network calculates the loss function and tries to minimize it. Fig. 1b shows the structure of the sequence model. With the deep LSTM sequence learning model, we are able to learn complex functions and predict sequential data accurately.

3. Experiment

3.1. Data Description & Processing

We use data from the Citi Bike System Data of 2017 as the training set and data from January, February, and March of 2018 as the test set to conduct the experimental study. Citi Bike has more than 800 bike stations in New York City and Jersey City. However, Fig. 2a and 2b show that the number of bike rents in a given hour can vary hugely according to the location. Furthermore, each station analyzed on its own shows a repetitive pattern, as shown in Fig. 2c.

Stations with few related trips cause two problems. First, they may lead to data scarcity, while LSTM places strict requirements on the quality of data [3]. Second, since they have few rents and returns, their bikes hardly ever run out, so analyzing their time sequences is much less meaningful. We use the community detection method proposed by Rosvall et al. (2008) [12] to detect the station community structure. The method results in 12 large communities with more than 3 stations each, plus other small communities. We choose only the two communities with the largest number of related trips as our dataset. Using the stations of a community as the dataset therefore keeps the interactions between stations in view while filtering out low-quality data.
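The selection step above can be sketched as follows. This is a hedged illustration only: the community detection itself is not implemented, so the partition `community_of` is a hypothetical stand-in for its output, the station names and trips are made up, and counting a trip for every community it touches is an assumption about what "related trips" means.

```python
from collections import Counter


def build_trip_network(trips):
    """Count trips between each ordered (start_station, end_station) pair."""
    return Counter((start, end) for start, end in trips)


def trips_per_community(edge_weights, community_of):
    """Sum the trips related to each community.

    A trip counts for a community when it starts or ends at one of the
    community's stations (an assumption about what "related trips" means).
    """
    totals = Counter()
    for (start, end), n_trips in edge_weights.items():
        for community in {community_of[start], community_of[end]}:
            totals[community] += n_trips
    return totals


# Toy trips as (start_station, end_station); station names are made up.
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
         ("C", "D"), ("D", "C"), ("E", "F"), ("B", "C")]
# Hypothetical partition, standing in for the output of community detection.
community_of = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3}

edges = build_trip_network(trips)
totals = trips_per_community(edges, community_of)
top_two = [community for community, _ in totals.most_common(2)]
selected_stations = sorted(s for s, c in community_of.items() if c in top_two)
```

In this toy network, communities 1 and 2 have the most related trips, so their four stations form the dataset and the sparsely used stations E and F are filtered out.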

(a) The accumulative spatial distribution of bike rents during 8:00-9:00 in March 2018.

(b) The accumulative spatial distribution of bike rents during 18:00-19:00 in March 2018.

(c) The accumulative number of rents and returns for a single station in different hours of week in February 2018. (W 52 St & 6 Ave, station id: 3443)

(d) The community map. Community 1 and community 3 are chosen because of their large number of related trips.

Fig. 2: The spatial distribution of bike rents and its repetitive pattern. Fig. 2a and 2b show that the demand for bikes varies significantly between stations, and the comparison of the two sub-figures shows the difference in the distribution of rent and return behaviors. Fig. 2c shows the repetitive pattern of rents and returns for a single station at different times of the week. The curve shows similar patterns from Monday to Friday, but different patterns on Saturday and Sunday. Fig. 2d shows the positions of all communities and the stations included in each community.

The raw data contains information in many dimensions, including spatial information, temporal information, and customer information. In this model we use only the start time, end time, start station, and end station of each trip. We first convert the trip records into station-level data by dividing each day into several time steps and counting the number of rents and returns separately, denoted by $X_{rent}$ and $X_{return}$.

We also consider the importance of different influence factors in our model, including Weather, Date, and Day of Week. Because people are more exposed to harsh weather conditions during bike rides, the demand for shared bikes is greatly influenced by weather [1]. We consider the potential influence of three weather indicators: Temperature, Precip Intensity, and Wind Speed. We use both the historical weather and the future weather in the dataset, using $X_{weather}$ and $X'_{weather}$ to denote the weather data of the past day and the target day respectively, and $X_{year}$ and $X_{week}$ to denote day of year and day of week. The input data structure is therefore the combination of all these matrices. The input matrix consists of $N$ rows, denoting the features, and $T$ columns, denoting the time steps. As we only need to predict future data of rents and returns, the output is the combination of two matrices $Y_{rent}$ and $Y_{return}$.

3.2. Deep LSTM Sequence Learning Model

To avoid the potentially strong influence of Day of Week, we use previous data to predict the data 7 days later. That is, we use $X_{rent}^{(d)}$ and $X_{return}^{(d)}$ to predict $X_{rent}^{(d+7)}$ and $X_{return}^{(d+7)}$. We have 360 stations in total. We use data from January 1, 2017 to December 31, 2017 as the training set and data from January 1, 2018 to March 31, 2018 as the test set. The input and output shapes are 358 × 24 × 728 and 358 × 24 × 720 for the training set, and 90 × 24 × 728 and 90 × 24 × 720 for the test set. The experimental parameters are shown in Table 1a.
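The preprocessing described above (hourly counting of rents and returns, then pairing day d with day d+7) could look roughly like the sketch below. It is an illustration under assumptions: the function names, the tuple layout of a trip, and the station ids are hypothetical, and the weather and date features are omitted, so each hourly row holds only the rent and return counts.

```python
from datetime import date, datetime, timedelta


def hourly_counts(trips, stations, first_day, n_days):
    """Return rent and return counts indexed as [day][hour][station]."""
    index = {station: k for k, station in enumerate(stations)}
    rent = [[[0] * len(stations) for _ in range(24)] for _ in range(n_days)]
    ret = [[[0] * len(stations) for _ in range(24)] for _ in range(n_days)]
    for start_time, end_time, start_station, end_station in trips:
        d = (start_time.date() - first_day).days
        if 0 <= d < n_days and start_station in index:
            rent[d][start_time.hour][index[start_station]] += 1
        d = (end_time.date() - first_day).days
        if 0 <= d < n_days and end_station in index:
            ret[d][end_time.hour][index[end_station]] += 1
    return rent, ret


def weekly_pairs(rent, ret, lag_days=7):
    """Pair each day's 24-step sequence with the sequence lag_days later.

    Each hourly row concatenates rents and returns, so a row has
    2 * n_stations values (720 for the 360 stations in the paper).
    """
    days = [[r + t for r, t in zip(rent[d], ret[d])] for d in range(len(rent))]
    return [(days[d], days[d + lag_days]) for d in range(len(days) - lag_days)]


stations = ["S1", "S2"]  # hypothetical station ids
first_day = date(2017, 1, 1)
t0 = datetime(2017, 1, 1, 8, 15)
trips = [
    # (start_time, end_time, start_station, end_station)
    (t0, t0 + timedelta(minutes=50), "S1", "S2"),  # 08:15 to 09:05 on day 0
    (t0 + timedelta(days=7), t0 + timedelta(days=7, minutes=20), "S2", "S1"),
]
rent, ret = hourly_counts(trips, stations, first_day, n_days=10)
pairs = weekly_pairs(rent, ret)
```

With 365 days of data and a 7-day lag this pairing yields 365 − 7 = 358 input/target pairs, which matches the number of training sequences reported in the text. The 8 additional input columns (728 versus 720) presumably carry the weather and date features, which this sketch leaves out.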
We use the mean squared error as the loss function:

$$\mathrm{MSE}^{(d)} = \frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2,$$

where $N$ and $T$ are the number of training samples and the total number of timesteps, $d$ represents the day, and $\hat{Y}_i^{\langle j \rangle (d)}$ and $Y_i^{\langle j \rangle (d)}$ are the predicted value and the real value of the $i$th training sample at the $j$th timestep on day $d$, respectively.



Table 1: The experimental parameters, as well as average RMSEs for each method.

(a) Experimental Parameters

Parameter                          Value
Number of trips                    16364502
Number of stations                 360
Time step length                   1 hour
Time sequence length               24 hours
Number of sequences                358
Number of hidden layers            0-2
Number of nodes in hidden layer    1000

(b) Average RMSEs for each method.

Method            Training set    Test set
LSTM              3.7046          2.7128
DNN               4.6083          3.1117
LSTM+LSTM         3.6752          2.7069
LSTM+DNN          3.4953          2.7731
DNN+DNN           3.8289          2.9361
LSTM+LSTM+DNN     4.4015          3.1106

[Figure 3. Panels: (a) Training set; (b) Test set; (c) Net demand of training set; (d) Net demand of test set.]

Fig. 3: Fig. 3a and 3b compare the RMSEs for different neural networks. The comparison shows the deep LSTM model fits the test set best. Fig. 3c and 3d show the RMSEs of net demand for different deep learning models on the training set and the test set.

(a) Real

(b) Predicted

Fig. 4: Real and predicted number of rents at 18:00-19:00, January 14, 2018.

4. Result Analysis

To evaluate the performance of our proposed model, we use different deep learning models to predict the demand and compare their results. Apart from LSTM, we also use deep neural networks (DNN) [6] to predict the result; these do not take the sequential nature of the data into account. We use the Root Mean Squared Error (RMSE) [10] as the performance metric, which is calculated by

$$\mathrm{RMSE}^{(d)} = \sqrt{\frac{1}{N \cdot T} \sum_{i=1}^{N} \sum_{j=1}^{T} \left( \hat{Y}_i^{\langle j \rangle (d)} - Y_i^{\langle j \rangle (d)} \right)^2}.$$

The result shows a mean RMSE of 3.6752 for the training set and a mean RMSE of 2.7069 for the test set. Considering the number of docks in each station, the error is acceptable. The RMSEs for the test set are lower than those for the training set, which suggests that the model does not overfit. Figure 3 shows the boxplot of RMSE for each model. The comparison shows that our model with two layers of LSTM fits the test set best, indicating that LSTM is better than DNN at predictions with sequential data. Fig. 4 shows an example of prediction. The prediction is accurate in the areas around Central Park and the New York Stock Exchange. However, the prediction is less accurate in the areas around the Museum of Modern Art and the Empire State Building, possibly due to the influence of events.
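As a small worked example of the metric, the sketch below computes the RMSE over an N × T grid of predictions. It applies the same helper to the difference between rents and returns, which is the net demand evaluated in the following paragraph. All counts are made up for illustration and the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over an N x T grid of values."""
    n = len(actual)
    t = len(actual[0])
    total = sum((p - a) ** 2
                for p_row, a_row in zip(predicted, actual)
                for p, a in zip(p_row, a_row))
    return math.sqrt(total / (n * t))


def net_demand(rent, ret):
    """Element-wise rents minus returns."""
    return [[r - q for r, q in zip(r_row, q_row)] for r_row, q_row in zip(rent, ret)]


# Made-up counts: 2 samples x 3 time steps.
real_rent = [[4, 6, 2], [1, 0, 3]]
pred_rent = [[5, 6, 0], [1, 2, 3]]
real_return = [[3, 5, 2], [2, 0, 1]]
pred_return = [[4, 5, 1], [2, 1, 1]]

rent_rmse = rmse(pred_rent, real_rent)  # sqrt(9 / 6)
net_rmse = rmse(net_demand(pred_rent, pred_return),
                net_demand(real_rent, real_return))  # sqrt(2 / 6)
```

In this toy example the rent and return errors often share a sign, so they partly cancel in the difference and the net demand RMSE comes out lower than the rent RMSE. Whether the same cancellation explains the paper's results is not something this sketch can establish.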


To assist the allocation of bikes and predict the actual demand for each area, we need to compute the difference between rents and returns, which we define as NetDemand:

$$\widehat{\mathrm{NetDemand}}_n^{(d)} = \hat{Y}_{rent,n}^{(d)} - \hat{Y}_{return,n}^{(d)}.$$

We can therefore evaluate our performance by calculating the RMSE for NetDemand. The mean RMSE is 3.0203 for the training set and 1.9323 for the test set. Fig. 3c and 3d show the boxplot of RMSE for net demand. We conclude that the prediction is even more precise on the net demand than on rents and returns.

5. Conclusion & Future Application

In response to the unequal spatio-temporal distribution of demand for shared bikes, we propose a model based on long short-term memory to predict the rents and returns of each bike sharing station in different areas of a city based on historical bike data, weather data, and time data. We evaluate our model on data from the Citi Bike System Data. Experimental results show that the model achieves an average RMSE of 2.7069 on the test set. We further evaluate our model by comparing its RMSE with the RMSE of results predicted by other deep learning neural networks. We obtain the net demand by calculating the difference between the number of rents and returns. The result for net demand is even better, showing that our model can predict the demand accurately.

By learning from historical bike data and past weather data, the proposed deep LSTM model can predict the rents and returns of bikes for the entire city as well as the demand for bikes at a certain time. Based on the prediction, we can make suggestions to bike companies about how to distribute bikes to each station so as to satisfy the needs of customers while saving the unnecessary cost of keeping surplus bikes. Applying the proposed model would benefit both the bike company and its customers.
Acknowledgments

This work is funded by Studies on Talent Cultivation Model: International Experience and Domestic Reform (Project ID: ADA160004), a Key National Project under the 13th Five Year Plan for National Education Science, supported by the National Social Science Foundation of China.

References

[1] Campbell, A.A., Cherry, C.R., Ryerson, M.S., Yang, X., 2016. Factors influencing the choice of shared bicycles and shared electric bikes in Beijing. Transportation Research Part C: Emerging Technologies 67, 399–414.
[2] Ding, C., Wang, D., Ma, X., Li, H., 2016. Predicting short-term subway ridership and prioritizing its influential factors using gradient boosting decision trees. Sustainability 8, 1100.
[3] Dong, D., Wu, H., He, W., Yu, D., Wang, H., 2015. Multi-task learning for multiple language translation, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1723–1732.
[4] Foell, S., Phithakkitnukoon, S., Kortuem, G., Veloso, M., Bento, C., 2015. Predictability of public transport usage: A study of bus rides in Lisbon, Portugal. IEEE Transactions on Intelligent Transportation Systems 16, 2955–2960.
[5] Gers, F.A., Schmidhuber, J., Cummins, F., 1999. Learning to forget: Continual prediction with LSTM.
[6] Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323.
[7] Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep learning. volume 1. MIT Press, Cambridge.
[8] Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Computation 9, 1735–1780.
[9] Kaltenbrunner, A., Meza, R., Grivolla, J., Codina, J., Banchs, R., 2010. Urban cycles and mobility patterns: Exploring and predicting trends in a bicycle-based public transport system. Pervasive and Mobile Computing 6, 455–466.
[10] Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.Y., et al., 2015. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intelligent Transportation Systems 16, 865–873.
[11] Phithakkitnukoon, S., Veloso, M., Bento, C., Biderman, A., Ratti, C., 2010. Taxi-aware map: Identifying and predicting vacant taxis in the city, in: International Joint Conference on Ambient Intelligence, Springer. pp. 86–95.
[12] Rosvall, M., Bergstrom, C.T., 2008. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105, 1118–1123.
[13] Vogel, P., Greiser, T., Mattfeld, D.C., 2011. Understanding bike-sharing systems using data mining: Exploring activity patterns. Procedia-Social and Behavioral Sciences 20, 514–523.
[14] Yang, Z., Hu, J., Shu, Y., Cheng, P., Chen, J., Moscibroda, T., 2016. Mobility modeling and prediction in bike-sharing systems, in: Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ACM. pp. 165–178.