Available online at www.sciencedirect.com
ScienceDirect
Transportation Research Procedia 32 (2018) 62–68
www.elsevier.com/locate/procedia
International Steering Committee for Transport Survey Conferences
Estimation of bus passengers’ waiting time at a coach terminal with Wi-Fi MAC addresses
Takahiko Kusakabe a,*, Hideki Yaginuma b, Daisuke Fukuda c
a The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8568, JAPAN
b Tokyo University of Science, 641 Yamazaki, Noda-shi, Chiba 278-8510, JAPAN
c Tokyo Institute of Technology, 2-12-1, O-okayama, Meguro, Tokyo, 152-8552, JAPAN
Abstract
The purpose of this study is to develop a method to estimate duration of stay at transit facilities by detecting Wi-Fi MAC addresses. The validation survey was conducted at one of the largest highway bus terminals, “Busta Shinjuku”, in Tokyo, Japan. The data is compared to interview survey results obtained during the same period for verification. The estimated staying duration was smaller than the one obtained by interview surveys. This implies that the Wi-Fi survey possibly observed through traffic as well as the staying travelers.
© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the International Steering Committee for Transport Survey Conferences (ISCTSC).
Keywords: Travel time; Mixed traffic; MAC address; Wi-Fi Detector
* Corresponding author. E-mail address: [email protected]
2352-1465 © 2018 The Authors. Published by Elsevier Ltd. doi: 10.1016/j.trpro.2018.10.012

1. Introduction

The investigation of transfer and waiting behaviour for boarding intercity coaches/trains is one of the significant factors for evaluating and designing transit facilities. This behaviour is influenced by the level of service and reliability of the access transit modes, because a traveller estimates a buffer time in order not to miss the coach/train, which is a major part of his/her intercity trip. An investigation of this behaviour is expected to be useful for understanding the effects of improving the reliability of transit facilities as well as the facilities of the terminals themselves. The behaviour also influences the planning and designing of transfer facilities, such as waiting rooms, because the capacity of the room is determined by both the accumulation of passengers and the duration of their stay. However, it is difficult to count transit users and to measure their duration of stay in the facility automatically, because individual travellers must be identified and tracked throughout their stay.
The recent rise in smartphone use enables us to detect travellers using Wi-Fi and/or Bluetooth. Each of these devices has its own unique media access control (MAC) address, which can be detected when the device is looking for access points (AP). By observing these requests at transit facilities or roadsides, several studies have proposed methods to detect traffic conditions, congestion, and travel behaviour. For example, Tsubota et al. (2011) observed the travel time of road traffic and Malinovskiy et al. (2012) observed that of pedestrians by using Bluetooth signals. Abedi et al. (2015) compared Wi-Fi and Bluetooth scanners to determine the travel time of pedestrians, runners and cyclists. Vu et al. (2010) analysed people’s movements on a university campus using Bluetooth and Wi-Fi. Musa and Eriksson (2012) conducted a survey to demonstrate the usability and accuracy of Wi-Fi scanners for road traffic observations. Danalet et al. (2014) showed pedestrian movements with Wi-Fi devices. Previous studies, however, have not focused on the duration of stay of waiting passengers.
The purpose of this study is to develop a method to estimate staying duration at transit facilities by detecting Wi-Fi MAC addresses. The validation survey was conducted at one of the largest highway bus terminals, “Busta Shinjuku”, in Tokyo, Japan. The data is compared with interview survey results obtained during the same period for verification.

2. Methodology

This study proposes a method to estimate the duration of passengers’ stay in a coach terminal waiting room by detecting Wi-Fi probe requests from passengers’ Wi-Fi devices. Basically, the duration can be estimated as the time gap between the first and the last probe requests broadcasted by a given device. However, probe requests are not always broadcasted, depending on the situation and the model of the device. In order to mitigate the effects of incomplete observations, this study proposes an estimation method to identify a distribution of the duration of stay. Section 2.1 describes the specification and characteristics of the observation device used for collecting Wi-Fi probe requests. Section 2.3 shows the data cleaning method used to eliminate the data associated with through traffic.

2.1. Device

This study employs a device that scans MAC addresses in Wi-Fi probe requests, referred to as the AMP sensor (Anonymous MAC address Probe Sensor, Nishida et al., 2014). This device observes the probe requests from travellers’ Wi-Fi devices such as smartphones. When Wi-Fi devices are turned on, they broadcast probe requests in order to find access points (AP) available for connection. Each probe request includes a MAC address, which is used for identifying the device when establishing the Wi-Fi connection to the AP. The AMP sensor is capable of anonymizing the collected MAC addresses with a one-way hash function, in order to avoid identifying the owner of a Wi-Fi device from the MAC address data combined with other observations.
This study assumes that the gap between the first and last requests broadcasted by the same device represents the duration of stay in the waiting room. However, several conditions and features of the observation process can influence the observed duration.
The conditions needed for estimating staying duration are:
• The Wi-Fi mode of the traveller’s device needs to be turned on and the device needs to be broadcasting probe requests;
• The probe request needs to be scanned by the AMP sensor.
These conditions are not always satisfied because of the following features:
• The timing, frequency, and signal intensity of the broadcasted probe requests differ depending on the model of the user’s device;
• Even if the signal intensity is high, a request may not be observed due to noise;
• The reachable distance of probe requests varies from dozens to several hundred meters depending on the user’s device and signal condition;
• Probe requests from devices which are not in the waiting room may also be observed.
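Setting the caveats listed above aside, the anonymization and the basic first-to-last-gap assumption of this section can be sketched as follows. This is an illustrative sketch only: the hash algorithm used by the AMP sensor is not specified here, so SHA-256 with a secret salt is an assumption, and the function names `anonymize` and `naive_durations` are hypothetical.

```python
import hashlib


def anonymize(mac, salt):
    """Return a one-way hash of a MAC address.

    SHA-256 and the salt are assumptions made for illustration; they are
    not taken from the AMP sensor specification.
    """
    return hashlib.sha256((salt + mac.lower()).encode("utf-8")).hexdigest()


def naive_durations(observations, salt):
    """Map each anonymized device to (last time - first time).

    observations: iterable of (mac, time_in_seconds) pairs.
    """
    first, last = {}, {}
    for mac, t in observations:
        key = anonymize(mac, salt)
        first[key] = min(t, first.get(key, t))
        last[key] = max(t, last.get(key, t))
    return {key: last[key] - first[key] for key in first}


# Hypothetical observations: two requests from one device, one from another.
observations = [
    ("AA:BB:CC:00:11:22", 100),
    ("aa:bb:cc:00:11:22", 700),
    ("AA:BB:CC:00:11:33", 50),
]
durations = naive_durations(observations, salt="example-salt")
```

A device seen only once gets a duration of zero, which is one reason the gap alone is not a reliable estimate of the staying duration.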
Due to these features, it is difficult to know whether a traveller actually leaves the location or his/her device just stops broadcasting. Furthermore, travellers who are walking near the waiting room can also be observed even though they are not waiting travellers. To mitigate the effects of these factors, we introduce a method to estimate the duration of stay of the waiting passengers, as defined in the following sections.

2.2. Waiting time estimation

The proposed method consists of data cleaning and estimation of the mixture distribution of waiting time. Both parts are needed because the probe requests detected by an AMP sensor include requests from through traffic as well as from waiting passengers. As shown in Section 2.1, the timing and frequency of requests differ depending on the states, operating systems, and models of Wi-Fi devices. The data cleaning process is mainly intended to eliminate requests from through traffic. The estimation of the mixture distribution is intended to distinguish the distribution of actual staying duration from the censored/truncated duration caused by the conditions of Wi-Fi devices. In this study, the duration obtained from the data cleaning process is called “possible staying duration”. The final estimation results are identified as “staying duration” and “truncated duration”.
The following list gives the notation used for the proposed methodology:
• $i$: Anonymized MAC address;
• $j$: Sequence number of a probe request sorted by time;
• $k$: Sequence number of a possible stay;
• $t_{ij}$: Received time of probe request $j$ broadcasted from $i$;
• $S_{ik}$: Set of probe requests broadcasted from $i$ included in possible stay $k$;
• $t_{\max}$: Maximum time interval between probe requests included in $S_{ik}$, that is, the bound on $\max\{ t_{i,j+1} - t_{i,j} \mid j \in S_{ik} \}$;
• $n_{\min}$: Minimum number of data which should be included in $S_{ik}$;
• $d_{ik}$: Observation duration of possible stay $k$, namely $t_{i,\max(S_{ik})} - t_{i,\min(S_{ik})}$;
• $d_{\max}$: Maximum continuous observation time of each device;
• $N_{\max}$: Maximum number of observations in the same day;
• $A$: Set of $S_{ik}$ remaining after the data cleaning process.
2.3. Data cleaning process

The data cleaning process is mainly intended to eliminate requests from through traffic and those from devices not held by travellers. The process consists of four steps. The first step is to build sets consisting of the probe requests from an identical device, and to divide a set when the interval between consecutive observations reaches the threshold value $t_{\max}$. This creates the initial sets $S_{ik}$ by distinguishing single and multiple visits of a passenger to the waiting room. The second step is to eliminate a set when the number of probe requests it includes is less than the threshold value $n_{\min}$. The third step is to derive the duration of the possible stay. The fourth step is to eliminate devices whose number of observations is larger than the threshold $N_{\max}$, and to eliminate sets whose observation duration is longer than $d_{\max}$.
In the first process, the probe requests are divided into visits, and the sequence number $k$ is incremented when the observation interval reaches $t_{\max}$. This process is expressed as:
• Step 1: Set the initial sequence numbers $j = 1$, $k = 1$;
• Step 2-1: Add probe request $j$ to $S_{ik}$;
• Step 2-2: If $j$ is the last data of MAC address $i$, proceed to Step 3;
• Step 2-3: Compare the observation times $t_{ij}$ and $t_{i,j+1}$. If $t_{i,j+1} - t_{ij} \geq t_{\max}$, add $S_{ik}$ to $A$ and update $k$ as $k = k + 1$. In either case, update $j$ as $j = j + 1$ and return to Step 2-1;
• Step 3: Add $S_{ik}$ to set $A$.
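The four steps summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to cover a single day, times are in seconds, a set is kept when it has at least `n_min` requests (following the "less than" wording of the second step), and the function names are hypothetical.

```python
from collections import defaultdict


def split_possible_stays(times, t_max):
    """First step: split one device's sorted timestamps into possible stays
    wherever the gap between consecutive requests reaches t_max."""
    stays, current = [], [times[0]]
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev >= t_max:   # gap too long: close S_ik, start the next stay
            stays.append(current)
            current = []
        current.append(nxt)
    stays.append(current)         # Step 3: add the last open set
    return stays


def clean(records, t_max, n_min, d_max, n_max):
    """Return {device: [possible staying durations]} after the four steps.

    records: iterable of (device_id, time_in_seconds) for a single day.
    """
    by_device = defaultdict(list)
    for device, t in records:
        by_device[device].append(t)

    result = {}
    for device, times in by_device.items():
        times.sort()
        stays = split_possible_stays(times, t_max)
        stays = [s for s in stays if len(s) >= n_min]      # second step
        if len(stays) > n_max:                             # fourth step: repeat visitors
            continue
        durations = [s[-1] - s[0] for s in stays]          # third step: d_ik
        durations = [d for d in durations if d <= d_max]   # fourth step: too long
        if durations:
            result[device] = durations
    return result


# Hypothetical records (device, seconds since the start of the survey day).
records = (
    [("a", t) for t in (0, 60, 120, 600, 900)]             # one valid stay
    + [("b", t) for t in (0, 10)]                          # too few requests
    + [("c", t) for t in (0, 100, 200, 300, 10000, 10100, 10200, 10300)]  # two visits
    + [("d", t) for t in (0, 3000, 6000, 9000, 9700)]      # longer than d_max
)
cleaned = clean(records, t_max=7200, n_min=4, d_max=9600, n_max=1)
print(cleaned)  # {'a': [900]}
```

Only device `a` survives: `b` has too few requests, `c` visits twice in the day, and `d` is observed for longer than `d_max`.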
In the second process, when multiple requests are observed in the same second, only the request which has the largest RSSI remains in $S_{ik}$. If the number of data in $S_{ik}$ is larger than $n_{\min}$, the set $S_{ik}$ remains in $A$. Otherwise $S_{ik}$ is eliminated from $A$. This process is intended to reduce the amount of data by eliminating multiple observations taken at the same time, and to eliminate sets whose number of observations is too small, since such devices may not have had a stable data transfer.
The fourth process eliminates devices that visit the bus terminal multiple times in the same day, because travellers who use long-distance buses should not visit the same terminal several times. In this process, when device $i$ appears more than $N_{\max}$ times in the same day, all the sets $S_{ik}$ of device $i$ observed on that day are eliminated from $A$. When the possible staying duration derived from $S_{ik}$ is more than $d_{\max}$, the set is also removed from $A$.
As described in Section 2.1, the possible staying duration $d_{ik}$ is not always equivalent to the staying duration because probe requests do not always reach the AMP sensor. This study assumes that the duration of a possible stay is either an actual duration or an insufficient observation. Insufficient observation is assumed to occur when a device stops broadcasting probe requests, when the conditions of Wi-Fi signals are poor, and when a passenger moves out of the detection area to wait for his/her bus. Since these two types of duration cannot be identified in advance, this study employs a mixture Weibull distribution for the possible staying duration $d_{ik}$. The distribution consists of the distribution of the actual staying duration and that of insufficient observation. The probability density function of duration $d$ is represented by Equation 1:
$$p(d \mid m_1, \eta_1, m_2, \eta_2) = s\,\frac{m_1}{\eta_1}\left(\frac{d}{\eta_1}\right)^{m_1-1}\exp\left(-\left(\frac{d}{\eta_1}\right)^{m_1}\right) + (1-s)\,\frac{m_2}{\eta_2}\left(\frac{d}{\eta_2}\right)^{m_2-1}\exp\left(-\left(\frac{d}{\eta_2}\right)^{m_2}\right) \qquad (1)$$

where $s$ is the mixture rate of actual staying, $\eta_1$ and $m_1$ are the parameters of the Weibull distribution for the actual staying duration, and $\eta_2$ and $m_2$ are those for the duration of insufficient observation. The average durations of the two distributions are $\eta_1\Gamma(1 + 1/m_1)$ and $\eta_2\Gamma(1 + 1/m_2)$, respectively, where $\Gamma(\cdot)$ is the gamma function. If the disconnection timing occurs independently, the distribution of the duration of insufficient observation is expected to follow an exponential distribution, which has $m_2 = 1$. Note that the parameters can be estimated by the EM algorithm.
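Since the EM algorithm is only mentioned by name, the following is one possible sketch of how Equation 1 could be fitted, using only the Python standard library. It is not the authors' implementation. The assumptions are: all durations are strictly positive, the E-step computes the responsibility of the "staying" component for each duration, and the M-step solves the weighted Weibull likelihood equations, finding each shape parameter by bisection within an arbitrary bracket of 0.05 to 20. All function names are hypothetical.

```python
import math


def weibull_pdf(d, m, eta):
    """Weibull density with shape m and scale eta, for d > 0."""
    z = d / eta
    return (m / eta) * z ** (m - 1.0) * math.exp(-(z ** m))


def mixture_pdf(d, s, m1, eta1, m2, eta2):
    """Equation (1): s * Weibull(m1, eta1) + (1 - s) * Weibull(m2, eta2)."""
    return s * weibull_pdf(d, m1, eta1) + (1.0 - s) * weibull_pdf(d, m2, eta2)


def log_likelihood(data, params):
    """Log-likelihood of the mixture for params = (s, m1, eta1, m2, eta2)."""
    return sum(math.log(max(mixture_pdf(d, *params), 1e-300)) for d in data)


def weighted_weibull_mle(data, weights, lo=0.05, hi=20.0, iters=50):
    """Weighted Weibull maximum likelihood estimate of (shape, scale).

    The shape solves sum(w d^m ln d) / sum(w d^m) - 1/m - sum(w ln d) / sum(w) = 0,
    a function increasing in m, so bisection on [lo, hi] is used (assumed bracket).
    """
    logs = [math.log(d) for d in data]
    top = max(logs)                      # subtracting top keeps exp() in range
    w_sum = sum(weights)
    mean_log = sum(w * l for w, l in zip(weights, logs)) / w_sum

    def score(m):
        u = [w * math.exp(m * (l - top)) for w, l in zip(weights, logs)]
        return sum(ui * l for ui, l in zip(u, logs)) / sum(u) - 1.0 / m - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    m = 0.5 * (lo + hi)
    power_mean = sum(w * math.exp(m * (l - top)) for w, l in zip(weights, logs)) / w_sum
    eta = math.exp(top) * power_mean ** (1.0 / m)
    return m, eta


def em_step(data, params):
    """One EM iteration for the two-component Weibull mixture."""
    s, m1, eta1, m2, eta2 = params
    resp = []                            # E-step: P(staying | d) for each duration
    for d in data:
        a = s * weibull_pdf(d, m1, eta1)
        b = (1.0 - s) * weibull_pdf(d, m2, eta2)
        resp.append(a / max(a + b, 1e-300))
    s_new = sum(resp) / len(data)        # M-step: mixture rate and both components
    m1, eta1 = weighted_weibull_mle(data, resp)
    m2, eta2 = weighted_weibull_mle(data, [1.0 - r for r in resp])
    return (s_new, m1, eta1, m2, eta2)
```

In use, `em_step` would be applied repeatedly to the possible staying durations from an initial guess such as `(0.5, 1.0, 30.0, 1.0, 3.0)` until `log_likelihood` stops improving. The result depends on the starting values, so several initial guesses may be worth trying.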
3. Empirical survey

Data collection was conducted at “Busta Shinjuku”, which is adjacent to the Shinjuku Station of East Japan Railway Company in Tokyo. Shinjuku is one of the largest urban rail stations in the world in terms of the number of daily passengers, and Busta Shinjuku is one of the largest coach terminals in Japan. Busta Shinjuku opened in April 2016. The AMP sensor was installed in a passenger waiting room in Busta Shinjuku. The survey was conducted from 6:30 am to 11:00 pm for 15 days from April 28th, 2016 to May 13th, 2016. For validation analysis, a parallel interview survey was conducted on May 3rd, 2016. In this survey, passengers were asked about their waiting time at Busta Shinjuku.
In the survey by the AMP sensor, 3,447,118 probe requests were observed over 15 days. This number corresponds to an average of 13,928 requests per hour. These probe requests were sent by 487,730 MAC addresses, which means that an average of 32,515 MAC addresses were observed per day.

3.1. Results of data cleaning

The parameters for data cleaning were set as:
• $t_{\max}$ = 120 minutes;
• $n_{\min}$ = 4 records;
• $d_{\max}$ = 9600 seconds;
• $N_{\max}$ = 1.
As shown in Table 1, the first part of data cleaning detected 517,304 possible stays over 15 days. The second part retained 162,858 possible stays with more than 4 records. These stays corresponded to 31.41% of the unique MAC addresses. After the fourth part, 155,561 possible stays remained. Figure 1 shows the day-to-day changes in the number of possible stays after data cleaning. For validation, the actual number of boarding passengers manually counted on May 3rd, 2016 was 18,953 travellers. The number of possible stays from the AMP sensor recorded on the same day was 13,333. Considering that the penetration rate of smartphones in Japan is 72.0% (Ministry of Internal Affairs and Communications, 2016), this number is plausible because the number of possible stays corresponds to 70.3% of the total number of travellers.

Table 1. Results of the data cleaning process.
Data cleaning process | Number of MAC addresses | Percentage of remaining data [%] | Number of probe requests | Number of possible stays
Original data | 487,730 | | 3,447,118 |
First part | 487,730 | | 3,447,118 | 517,304
Second part | 153,192 | 31.41 | 2,920,101 | 162,858
Fourth part | 150,278 | 30.81 | - | 155,561
Figure 1. Number of possible stays after data cleaning. (Vertical axis: number of possible stays, 0 to 16,000.)
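The averages and percentages quoted in this section can be reproduced from the raw counts with a few lines of arithmetic. The sketch below assumes 16.5 observation hours per day, which follows from the 6:30 am to 11:00 pm schedule; the variable names are illustrative only.

```python
requests = 3_447_118          # probe requests observed over the survey
macs = 487_730                # unique MAC addresses in the original data
days = 15
hours_per_day = 16.5          # 6:30 am to 11:00 pm

requests_per_hour = requests / (days * hours_per_day)
macs_per_day = macs / days

# Share of MAC addresses remaining after the second and fourth parts (Table 1).
share_second = 100 * 153_192 / macs
share_fourth = 100 * 150_278 / macs

# Possible stays on May 3rd relative to the manually counted boarding passengers.
capture_rate = 100 * 13_333 / 18_953

print(round(requests_per_hour), round(macs_per_day))
print(round(share_second, 2), round(share_fourth, 2), round(capture_rate, 1))
```

The rounded values match the figures reported in the text and in Table 1, and the last one can be compared directly with the 72.0% smartphone penetration rate.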
3.2. Results of parameter estimation

Table 2 and Figure 2 show the results of parameter estimation of the proposed duration model with the observation data from May 3rd, 2016. The average staying duration was estimated as 20.39 minutes and the average duration of insufficient observation was 2.56 minutes. Staying and truncated durations appear to be distinguished by these average durations, and the share of staying was 0.67. However, since $m_1$ is less than one, the mode appears where the staying duration is zero. This result may imply that many passengers just pass through the waiting room because the room is so crowded. The parameter $m_2$ for the duration of insufficient observation satisfies $1.0 < m_2 < 2.0$. This result suggests that unexpected disconnections tend to occur early in the connection.
Figure 3 shows the estimation results over the observation period. According to the figure, the composition of staying and the average duration do not vary greatly from day to day. This implies that the proposed method can estimate the average duration stably, though it tends to underestimate it, as described in the next subsection. The minimum value of the average waiting time was 18.9 minutes and the maximum was 22.9 minutes. The average share of staying was 65.1%.
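As a quick consistency check, the average durations can be recomputed from the coefficients reported in Table 2 with the formula $\eta\Gamma(1 + 1/m)$ from Section 2.3. The sketch below also evaluates the mode of each component using the standard Weibull result (zero for $m \leq 1$, otherwise $\eta((m-1)/m)^{1/m}$), which is not stated in the paper. The function names are hypothetical, and small differences from the reported averages are expected because the published coefficients are rounded.

```python
import math


def weibull_mean(m, eta):
    """Mean of a Weibull distribution with shape m and scale eta."""
    return eta * math.gamma(1.0 + 1.0 / m)


def weibull_mode(m, eta):
    """Mode of a Weibull density: 0 when m <= 1, else eta * ((m - 1) / m) ** (1 / m)."""
    if m <= 1.0:
        return 0.0
    return eta * ((m - 1.0) / m) ** (1.0 / m)


# Coefficients as reported in Table 2 (rounded to two decimals in the source).
m1, eta1 = 0.89, 19.31   # staying
m2, eta2 = 1.34, 2.79    # insufficient observation

staying_mean = weibull_mean(m1, eta1)          # reported as 20.39 min
insufficient_mean = weibull_mean(m2, eta2)     # reported as 2.56 min
staying_mode = weibull_mode(m1, eta1)          # zero, since m1 < 1
insufficient_mode = weibull_mode(m2, eta2)     # positive, since m2 > 1

print(staying_mean, insufficient_mean, staying_mode, insufficient_mode)
```

The zero mode of the staying component is the property the text refers to when it notes that the mode appears at a staying duration of zero.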
Table 2. Estimation results from May 3rd, 2016.
Number of samples | 13333
Log likelihood | -47368.45
BIC | 94784.386

Parameter | Coef. | t-value | p-value
s | 0.67 | 206.19 | 0.0
m1 | 0.89 | 189.11 | 0.0
η1 | 19.31 | 143.22 | 0.0
m2 | 1.34 | 192.83 | 0.0
η2 | 2.79 | 156.07 | 0.0

Figure 2. Estimated probability density of staying duration. (Horizontal axis: duration [min], 0 to 60; vertical axis: probability density; curves: Staying, Insufficient observation.)
Figure 3. Day-to-day changes of estimated average durations. (Axes: average duration of staying/truncated staying [min] and composition rate of staying, by date from 2016/4/29 to 2016/5/13; series: Staying, Truncated staying (insufficient observation), Composition of staying.)
Figure 4. Staying duration from interview survey. (Horizontal axis: duration of stay (min); vertical axes: number of samples and composition rate.)
3.3. Comparison with interview survey

The data from May 3rd, 2016 was compared with the results of the interview survey shown in Figure 4. The interview survey collected 284 samples. According to the interview survey, the average staying duration was 59.0 minutes and the largest number of travellers was observed in the 30 to 39 minute range. Compared with the duration estimated from the Wi-Fi data, larger staying durations were observed in the interview survey. This may be because some waiting passengers who visited the ticket counter in the waiting room then waited outside the room to avoid the congestion. As a result, the travellers observed in the interview survey, who stayed in the room, seemed to have longer staying times than those estimated from the Wi-Fi data.

4. Conclusion

This study introduced a method for estimating travellers’ staying time in the waiting room of a coach terminal. The method employs an AMP sensor, which can observe probe requests from Wi-Fi devices. The methodology consists of two parts, data cleaning and estimation of the mixture distribution. These processes are intended to mitigate the effects of the features of probe requests described in Section 2.1. The proposed estimation model was applied to actual observations collected in a large coach terminal in Japan. The estimated staying duration was smaller than the one obtained from the interview survey. This implies that the Wi-Fi survey possibly observed through traffic in addition to staying travellers. On the other hand, the interview survey possibly failed to collect travellers who were in too much of a rush to answer or who waited for the coach outside the waiting room. These results indicate that multiple observation locations should be set up when the observation field is very large.

Acknowledgements

We appreciate the support of the Kanto Regional Development Bureau of the Ministry of Land, Infrastructure, Transport and Tourism in Japan.

References

Abedi, N., Bhaskar, A., Chung, E., Miska, M., 2015. Assessment of antenna characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses. Transp Res Part C 60, 124–141.
Danalet, A., Farooq, B., Bierlaire, M., 2014. A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures. Transp Res Part C 44, 146–170.
Malinovskiy, Y., Saunier, N., Wang, Y., 2012. Analysis of pedestrian travel with static Bluetooth sensors. Transp. Res. Rec.: J. Transp. Res. Board 2299, 137–149.
Musa, A.B.M., Eriksson, J., 2012. Tracking Unmodified Smartphones Using Wi-Fi Monitors. Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems, pp. 281–294.
Nishida, J., Adachi, T., Makimura, K., 2014. Traffic flow analysis by the use of Wi-Fi packets receiver. IRF Asia, Indonesia.
Tsubota, T., Bhaskar, A., Chung, E., Billot, R., 2011. Arterial traffic congestion analysis using Bluetooth duration data. Aust Transp Res Forum, Adelaide SA.
Vu, L., Nahrstedt, K., Retika, S., Gupta, I., 2010. Joint Bluetooth/Wifi scanning framework for characterizing and leveraging people movement in university campus. Proc of the 13th ACM Int Conf on Model. Anal Simul Wirel Mob Syst, Bodrum, Turkey.
Ministry of Internal Affairs and Communications, 2016. The White Paper on Information and Communications in Japan, Japan.