Transportation Research Part C 107 (2019) 510–524
Transportation Research Part C journal homepage: www.elsevier.com/locate/trc
Diffusion behavior in a docked bike-sharing system
Xueyan Wei a,b,c,d, Sida Luo b, Yu (Marco) Nie b,⁎
a Jiangsu Key Laboratory of Urban ITS, Southeast University, No. 2 Southeast University Road, Nanjing 211189, China
b Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL 60208, United States
c Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, China
d School of Transportation, Southeast University, China
ARTICLE INFO
Keywords: Bike sharing; Diffusion behavior; Bike trip distance; Rebalancing action; Isolated communities
ABSTRACT
This paper examines bike diffusion behavior in a docked bike-sharing system in Chicago. The analysis is based on an analogy between the movement of shared bikes and the transmission of information on the internet or the spreading of epidemics among humans. By mining a bike trip data set collected in the city, we find that (1) the distribution of bike trip distance peaks between 0.8 and 2 km and, beyond 6.3 km, follows a strong power law; (2) the diffusion intensity of a community is affected positively by the number of incoming bike trips and rebalancing actions, and negatively by the percentage of inner-community trips, with the effect of rebalancing actions roughly twice as strong as that of incoming bike trips; (3) both the diffusion range of a bike and the number of rebalancing actions it receives are strong predictors of its use: reaching one more community produces about 14 more trips, and an additional rebalancing action contributes about 8.6; and (4) even the most active bikes could only reach about 75% of all communities in Chicago. The last finding helps identify a cluster of communities poorly connected with the rest of the city by bike travel. Interestingly, these isolated communities are strongly correlated with the areas of the city that have high concentrations of African American population, low-income households and homicide crimes.
1. Introduction

Thousands of cities around the world now operate some form of bike-sharing system (Meddin, 2018). Bike sharing helps transit users solve their first- and last-mile problem (Liu et al., 2012). It also offers others, including residents and visitors of large metropolitan areas, a low-cost mode for relatively short trips. While bike-sharing systems are generally expensive to operate (Shaheen et al., 2010), they are quite popular among travelers. A recent survey found that over 90 percent of bike-sharing users in Beijing, China were satisfied with the experience (Wang, 2017).

Bike-sharing systems may or may not be operated with fixed docks. Dockless bike-sharing systems allow users to pick up and return bikes pretty much anywhere, a crucial feature that has quickly attracted myriad users and stimulated their dramatic expansion, especially in China (e.g. Mobike and ofo). Yet, they are often frowned upon by city managers, who see the bikes littering the streets and sidewalks as a threat to their city's ecosystem (Liu et al., 2017; Jiang, 2018). Docked systems, such as Capital Bikeshare in Washington D.C. and Divvy Bike in Chicago, are less intrusive, but they are generally viewed as less flexible and more expensive.

Although large-scale bike sharing is a relatively new concept, it has attracted great interest from the research community in recent years; see Fishman (2016) for a review. Researchers have attempted to examine the characteristics of bike-sharing systems using
⁎ Corresponding author. E-mail address: [email protected] (Y.M. Nie).
https://doi.org/10.1016/j.trc.2019.08.018 Received 2 October 2018; Received in revised form 3 May 2019; Accepted 20 August 2019 Available online 12 September 2019 0968-090X/ © 2019 Elsevier Ltd. All rights reserved.
data-driven approaches (Faghih-Imani and Eluru, 2015; de Chardon and Caruso, 2015; Wergin and Buehler, 2017; Yang et al., 2017; Zhang and Mi, 2018; Du et al., 2019), to develop operational strategies such as rebalancing operations (Chemla et al., 2013; Erdoğan et al., 2015; Forma et al., 2015; Schuijbroek et al., 2017; Haider et al., 2014; Zhang et al., 2019; Warrington and Ruchti, 2019), to integrate them with transit systems (Chow and Sayarshad, 2014; Ma et al., 2015; Ai et al., 2018; Lu et al., 2018; Stiglic et al., 2018; Yang et al., 2018; Gu et al., 2019), and to optimize the system design such as station locations (Frade and Ribeiro, 2015; Hyland et al., 2018). In the following we briefly review the data-driven research because our study falls into this category.

To analyze bicyclists' destination preferences, Faghih-Imani and Eluru (2015) specify a multinomial logit model and estimate its coefficients using a Divvy Bike data set. Their analysis shows that travel distance, land use, built environment, and access to public transportation infrastructure all affect the destination choices of bikers. de Chardon and Caruso (2015) develop several models, differing in their level of spatial and temporal aggregation, that estimate the number of daily bike sharing trips based on the number of bicycles available at a station over time. Wergin and Buehler (2017) analyze detailed GPS trajectory data of 94 Capital Bikeshare (CaBi) bikes. Distinguishing riders by type of CaBi membership, their analysis reveals popular routes, bicycle infrastructure usage, and activity patterns associated with places of interest. Strong differences are found in trip attributes between short-term and long-term users. Yang et al. (2017) combine smart card data with survey data to explore the factors contributing to bike sharing usage in underdeveloped cities. 
They find that bike sharing in underdeveloped cities offers an alternative transportation mode, instead of a mere last-mile solution, as in large cities. Using big data techniques, Zhang and Mi (2018) estimate the impacts of bike sharing on energy use and carbon dioxide (CO2 ) and nitrogen oxide (NOx ) emissions in Shanghai. Their analysis suggests that bike sharing cut the city’s CO2 and NOx emissions by 25,240 and 64 tonnes, respectively in 2016. This paper attempts to understand the operation of a bike-sharing system from a quite different perspective. It is motivated by the similarity between bike sharing and other dynamic processes driven by human mobility, in particular the diffusion process of information (e.g. a rumor) or epidemics (Castellano et al., 2009; Lloyd and May, 2001; Newman, 2002; Pastor-Satorras and Vespignani, 2001). Such an analogy is new and interesting in its own right, but as we shall see, it also offers a promising tool to extract new knowledge from the data. We divide the region in which a bike-sharing system operates into smaller zones called communities. Each bike mimics a piece of news or an epidemic and each community is seen as the analogy for a human agent. The inter-community flow of bike traffic then represents the connection between agents, and the flow rate represents the strength of the connection, or the likelihood of transmission of information or disease from one agent to another. We first characterize the diffusion process of shared bikes using empirical data collected by Divvy Bike. We then conduct various analyses in order to understand: (1) whether and how fast a bike can reach all communities in the region within a given period of time; (2) how a bike's usage is affected by diffusion behaviors, both at the community and the individual level; and (3) what factors affect the bike diffusion behavior. Among other things we discover communities that are largely skipped even by the most active bikes. 
Surprisingly, these areas are well aligned with the socially disconnected communities that have high concentrations of African American population, low-income households and homicide crimes.

The remainder of this paper is structured as follows. Section 2 presents an overview of the Divvy Bike data used in this study, followed by results of a preliminary analysis of the distribution of trip distance by shared bikes. Sections 3 and 4 characterize the bike diffusion behavior from collective and individual perspectives, respectively. Section 5 performs a case study to verify a hypothesis about isolated communities. The last section concludes the study with a summary of findings and possible directions for future research.

2. Preliminary analysis

2.1. Data description

Divvy Bike started its docked bike-sharing system in 2013 and by February of 2016 was generating 3.2 million rides per year in the Chicago metropolitan area. Chicago is one of the most populous megacities in North America, with a large and vibrant downtown that attracts commuters and visitors in great numbers. With around 600 stations, Divvy covers the city of Chicago and two neighboring suburbs, as shown in Fig. 1. In 2017, Divvy Bike had about 6,243 bikes in active service in the Chicago area (Divvy, 2018). Acquired from the official web site of Divvy Bike (Divvy, 2018), the data used in this study contains 3,829,014 bike trips associated with 585 bike stations, collected from 1/1/2017 to 12/31/2017. The ID and location (latitude and longitude) of each station are also included. Each trip record includes order ID, bike ID, departure time, origin station, arrival time and destination station. The raw trip data offers no information on real trip distance. As an approximation, we compute the trip distance between two stations using the Distance Matrix API of Google Maps for the bike mode. 
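The record layout above is enough to sketch the two preprocessing steps applied in this study: screening out implausible trips (same-station trips, trips longer than five hours, and average speeds outside 5–30 km/h) and inferring rebalancing moves from gaps in a bike's chain of stations. The following Python sketch is only an illustration under stated assumptions: the `Trip` class, its field names, and the precomputed `distance_km` value are hypothetical and are not taken from the Divvy data files.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Trip:
    # Hypothetical field names mirroring the trip attributes listed above.
    bike_id: int
    start: datetime
    end: datetime
    origin: int          # origin station ID
    destination: int     # destination station ID
    distance_km: float   # station-to-station cycling distance (assumed precomputed)


def is_valid(trip, max_hours=5.0, min_kmh=5.0, max_kmh=30.0):
    """Apply the three screening rules: same station, duration, average speed."""
    if trip.origin == trip.destination:
        return False
    hours = (trip.end - trip.start).total_seconds() / 3600.0
    if hours <= 0 or hours > max_hours:
        return False
    speed = trip.distance_km / hours
    return min_kmh <= speed <= max_kmh


def infer_rebalancing(trips):
    """Return (bike_id, from_station, to_station) for every gap in a bike's chain.

    A move is inferred when a trip starts at a station other than the one
    where the same bike's previous trip ended.
    """
    moves = []
    last_destination = {}
    for trip in sorted(trips, key=lambda t: (t.bike_id, t.start)):
        previous = last_destination.get(trip.bike_id)
        if previous is not None and previous != trip.origin:
            moves.append((trip.bike_id, previous, trip.origin))
        last_destination[trip.bike_id] = trip.destination
    return moves
```

Because each inferred move is detected through the trip that follows it, every move found this way is followed by at least one use of the bike; checking that the use falls inside a particular analysis period would need an extra date filter, which this sketch omits.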
We filter out the trips that (1) start and end at the same station (this typically means that the bike was returned where it was picked up), (2) take more than five hours, or (3) display an average speed below 5 km/h or above 30 km/h. Applying these filters leaves 3,431,037 trips for the analysis. We note that the Divvy bike trips used in the analysis are only associated with users, including Subscriber, Customer and Dependent. The movement of bikes by non-users (e.g., by the system operator) is not recorded as a trip. Such a movement is inferred from the raw trip data, whenever a discrepancy is detected between the destination station of a trip and the origin station of the next trip, and is interpreted as a rebalancing (reposition) action. A rebalancing action is considered effective if the bike is used at least once within an analysis period after the action is taken. We only consider effective rebalancing actions in this study. All Divvy bike stations are assigned to one of the 48 communities (see Fig. 1), of which 46 are officially designated in the Chicago
Fig. 1. Visualization of Divvy operation in Chicago: geographic distribution of Divvy bike stations (left) and a heat map of bike stations in different communities (right).
area community GIS data (City of Chicago, 2018). Communities 80 and 81 are synthesized from other data sources, since they are not part of the City of Chicago. It is easy to see from the right panel of Fig. 1 that the downtown area has a much higher density of bike stations. Specifically, Communities 32, 8, 7 and 28 have higher bike-station density than others because they are part of the central business district (CBD), while the suburban communities, such as 81, 13, 15 and 46, are covered by fewer stations.

2.2. Empirical distribution of bike trip distance

It is well known that the distribution of travel distance decays following a power law at the inter-city level (Gonzalez et al., 2008). Specifically, let P(r) be the probability density function (PDF) of inter-city trip distance r; then P(r) takes the form of:

$$P(r) = \alpha r^{-\beta}, \tag{1}$$

where α and β are parameters. Notations employed in this paper are summarized in Table 1. We start our analysis by asking whether the distance of bike trips exhibits a similar feature. The empirical probability density function P(r) is plotted in Fig. 2a, computed from an ensemble of all valid trips. The plot shows that the probability density peaks between 0.8 and 2 km, consistent with the observation in the literature (Mobike, 2017b). Also, the PDF appears to consist of two distinct patterns, with a critical r ranging between 6 and 7 km. The trend follows a power law when r exceeds this critical value, but a different law seems to be at work for smaller values of r. The existence of the threshold may result from a few factors. First, the size of Chicago downtown (roughly 4.88 km by 2.66 km) might play a role. The second and more plausible explanation is Divvy's pricing policy. Under this policy, a single ride is charged every 30 min and a member does not have to pay for the first 30 min in each ride. This means the user cost jumps after the riding time exceeds 30 min, creating an incentive for many users to terminate their trip within that limit. The 6–7 km breakpoint, which on average corresponds to 20–30 min of riding time, likely results from this limit.

2.3. Validation of power law

To confirm our observation in the previous section, we fit the empirical P(r) with a power law with exponential cutoff (Eq. (2)), a continuous nonlinear function (Eq. (3)) and a piecewise nonlinear function (Eq. (4)), respectively:
$$P_1(r) = r^{\beta_1} \exp(-\beta_2 r), \tag{2}$$

$$P_2(r) = \exp\left(\beta_3 - \beta_4 r^{0.5} - \beta_5 / r^{0.5}\right), \tag{3}$$
Table 1
Summary of notations.

Notation    Explanation
P(r)        Probability density function (PDF) of trip distance r
r           Single trip distance (km)
r_0         Minimum trip distance (km)
r_max       Maximum trip distance (km)
r_c         Critical trip distance (km)
d           Cumulative trip distance of a shared bike (km)
b           Cumulative number of unique bikes that had visited a community at least once within a given time period
B           Total number of active shared bikes in a certain time period
p_c         Diffusion intensity of a community
q_c         Number of trips destined for stations inside a community
e_c         Number of effective rebalancing actions (NERA) for a community
g_c         Percentage of internal trips (PIT)
p_i         Diffusion range of an individual shared bike
q_i         Number of trips recorded for an individual bike
e_i         NERA for an individual bike
p̄_i         Average diffusion range of a set of bikes
I           A binary variable, which equals 1 if a community is isolated and 0 otherwise
P(I = 1)    The probability that a community is identified as an isolated community
PAAP        The percentage of African American population (%)
MHI         The median household income (10^4 $)
Fig. 2. Analysis of bike trip distance distribution: (a) Empirical distribution; (b) Empirical distribution vs. fitted power law with exponential cutoff and continuous nonlinear functions; (c) Empirical distribution vs. fitted piecewise nonlinear function, (d) Empirical vs. fitted probability density function of cumulative trip distance in a year for individual shared bikes.
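$$P_3(r) = \begin{cases} \exp\left(\beta_6 - \beta_7 \ln(r) - \beta_8 / r^{0.5}\right), & r_0 \le r \le r_c, \\ r^{-\beta_9} \exp(-\beta_{10} r), & r_c < r \le r_{\max}, \end{cases} \tag{4}$$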
where β_i, i = 1, …, 10 are parameters to be calibrated, r_0 and r_max are the minimum and maximum trip distances observed from data, and r_c is the critical value that separates the two pieces in Eq. (4). We found that in this data set r_0 is around 0.2 km, consistent with the minimum trip distance seen in dockless shared bike systems (Mobike, 2017b)¹. Such a lower bound is easy to understand: few are willing to go through the trouble of bike sharing for a trip shorter than 200 meters. The maximum trip distance is cut off at 30 km in this study. The actual data records only five trips longer than that, which seem to be rare exceptions. Fig. 2b-c plot the three fitted functions and the statistics of the curve-fitting results are reported in Table 2. The goodness of fit is the worst for the power law with exponential cutoff (Eq. (2)), as indicated by the adjusted R2, the sum of squared errors (SSE) and the root mean squared error (RMSE). The continuous nonlinear function (Eq. (3)) and the piecewise nonlinear function (Eq. (4)) both fit the empirical distribution very well, with an adjusted R2 value of 0.99. For the piecewise nonlinear function, the analysis confirms that the best fitting result is achieved when r_c is around 6.3 km. In the power law piece (i.e., r > r_c ≈ 6.3 km), the scaling exponent β_9 is 1.80, which is very close to 1.75, the scaling exponent found by Gonzalez et al. (2008). This suggests that the scaling rule observed for human mobility in general also applies to long-distance bike travel. The value of β_9 is slightly larger in our study than that from Gonzalez et al. (2008), likely because bikers' limited willingness and ability to travel very far make the probability of observing long-distance trips decay more quickly. Finally, Fig. 2d reports the empirical and fitted PDF of individual bikes' total distance travelled in a period of one year, denoted as d hereafter. 
As expected, d approximately follows a normal distribution, as it is essentially the sum of many random variables (each being a trip distance). The fitted plot shows that the PDF of annual total travel distance peaks at d ∈ (1600, 1700) km for Divvy bikes in Chicago, corresponding to an average of about 1,650 km. Thus, each shared Divvy bike travels about 4.5 km per day in Chicago. In comparison, a Mobike in Beijing travels about 13.3 km per day (Mobike, 2017a).

3. Diffusion analysis: community perspective

Starting from this section, we treat the movement of a shared bike as a diffusion process of information or epidemics, each individual bike as a piece of news or an epidemic, each community as a human agent and each bike sharing trip as the transmission of information or disease. Despite the strong analogy, bike sharing is different from epidemic spreading in two aspects. First, a bike-sharing system has many bikes that are moving around, whereas in a system of epidemic spreading there is usually a single epidemic. Second, an epidemic spreads through infection, meaning it duplicates itself with a certain probability (i.e., rate). There seems to be no suitable analogy of the infection rate in bike sharing, which makes it difficult to directly adopt existing epidemic models like the SIS model and SIR model (Hethcote, 2000). Instead, we focus on developing various indexes that characterize bikes' diffusion behavior. The main objective is to analyze bike diffusion behavior, individually and collectively, and understand whether and how it contributes to the utilization of individual bikes and the overall performance of the shared bike system. In this section, we shall take the perspective of a community. The next section will approach the analysis from the perspective of individual bikes.

3.1. Basic properties

We first analyze the collective diffusion patterns of all bikes in the 48 communities across the region. 
To this end, let b be the cumulative number of unique bikes that had visited a community at least once within a given time period, and B be the total number of active shared bikes corresponding to the time period. A bike is considered "active" if and only if it has been used at least once in the period; otherwise it is considered out of service. In this study, B ≈ 6,243. Fig. 3a plots the value of b against the rank of the communities in descending order of b. Each of the twelve months in 2017 has a separate b vs. community rank curve. For example, the point on any curve with x = 3 corresponds to the community that had been visited by the third largest cumulative number of unique bikes of all 48 communities for that month. In terms of the general trend, all twelve curves in Fig. 3a are rather similar. The view with a logarithmic vertical axis (Fig. 3b) shows a nearly linear trend, indicating that the number of unique bike visits decreases roughly exponentially with community rank. It is quite interesting that the twelve curves can be easily classified into two groups: one consists of results corresponding to months from April to October and the other consists of the other months. The average value of b for any given x is significantly different between the two groups. The reason for the difference is easy to see: biking is sensitive to weather. April through October have the most suitable weather for biking in Chicago, and hence diffusion is also faster in these months. Expanding the temporal scale from one week to one year, we depict the diffusion of shared bikes in Figs. 3c-d. These plots reveal, as expected, that more unique bike visits are recorded across the board for longer periods of observation. Also, note that b observed for the top ranked communities (the first 5–8) all exceed 6,000. In other words, given enough time, nearly all shared bikes would reach some of the "hot regions", strongly suggesting that few bikes are confined to a small number of communities.
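The counts b and B and the rank curve described above can be computed directly from a list of trip arrivals. The sketch below is a minimal illustration; it assumes that a bike "visits" a community whenever one of its trips ends there, and the function names and the `(bike_id, community)` input format are hypothetical.

```python
from collections import defaultdict


def unique_bike_counts(arrivals):
    """Map each community to b, the number of distinct bikes that arrived there."""
    bikes_seen = defaultdict(set)
    for bike_id, community in arrivals:
        bikes_seen[community].add(bike_id)
    return {community: len(bikes) for community, bikes in bikes_seen.items()}


def rank_curve(arrivals):
    """Return (community, b) pairs sorted by b in descending order."""
    counts = unique_bike_counts(arrivals)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def active_bikes(arrivals):
    """B: number of bikes that were used at least once in the period."""
    return len({bike_id for bike_id, _ in arrivals})
```

Restricting `arrivals` to one week, month, quarter or year before calling `rank_curve` gives one curve per aggregation period, in the spirit of Fig. 3.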
¹ The Mobike data is collected in Wuhan, China.
Table 2
Results of parameter estimation (with 95% confidence intervals).

Function    SSE        R2     Adjusted R2   RMSE       Coefficients
Eq. (2)     2.70E-03   0.87   0.86          2.74E-03   β_1 = 3.31 (3.26, 3.36); β_2 = 4.43 (4.26, 4.61)
Eq. (3)     7.87E-06   0.99   0.99          1.49E-04   β_3 = 4.57 (4.52, 4.62); β_4 = 3.57 (3.55, 3.59); β_5 = 4.37 (4.34, 4.40)
Eq. (4)-1   7.90E-05   0.99   0.99          1.12E-03   β_6 = 6.71 (6.23, 7.18); β_7 = 4.65 (4.45, 4.84); β_8 = 10.06 (9.59, 10.54)
Eq. (4)-2   4.82E-06   0.96   0.96          1.28E-04   β_9 = 1.80 (1.61, 1.99); β_10 = 0.38 (0.33, 0.42)
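As a rough plausibility check of the piecewise fit, the two pieces of Eq. (4) can be evaluated around the critical distance of 6.3 km. The sketch below rests on two assumptions that should be checked against the original typeset paper: that the short-distance piece is exp(β_6 − β_7 ln r − β_8/√r) and the long-distance piece is r^(−β_9) exp(−β_10 r), and that the values 6.71, 4.65, 10.06, 1.80 and 0.38 belong to β_6 through β_10 in that order. The function names are illustrative.

```python
import math

# Coefficient values as read from Table 2; their assignment to beta_6..beta_10
# is an assumption of this sketch.
B6, B7, B8, B9, B10 = 6.71, 4.65, 10.06, 1.80, 0.38
R_C = 6.3  # critical trip distance (km)


def short_piece(r):
    """Assumed form of Eq. (4) for r_0 <= r <= r_c."""
    return math.exp(B6 - B7 * math.log(r) - B8 / math.sqrt(r))


def long_piece(r):
    """Assumed form of Eq. (4) for r > r_c: power law with exponential cutoff."""
    return r ** (-B9) * math.exp(-B10 * r)


def p3(r):
    """Piecewise density: short piece up to r_c, long piece beyond it."""
    return short_piece(r) if r <= R_C else long_piece(r)


def short_piece_mode():
    """Distance at which the short piece peaks.

    Setting the derivative of the exponent to zero,
    -B7 / r + B8 / (2 r^1.5) = 0, gives sqrt(r) = B8 / (2 B7).
    """
    return (B8 / (2.0 * B7)) ** 2


def relative_gap_at_rc():
    """Relative mismatch between the two pieces at r_c."""
    left, right = short_piece(R_C), long_piece(R_C)
    return abs(left - right) / right
```

Under these assumptions the short piece peaks a little above 1 km, inside the 0.8–2 km range reported for the empirical distribution, and the two pieces differ by less than 20% at r_c, so the fitted curve is close to, but not exactly, continuous there.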
Fig. 3. Community rank curves in descending order of b in different aggregation time periods: (a) Monthly view; (b) Logarithmic vertical axis view of (a); (c) Cold season view and (d) Warm season view including time spans of one week, one month, one quarter, half year and one year.
Fig. 4. Number of bike trips recorded in each month of 2017.
3.2. Statistical analysis

We proceed to examine what factors affect the speed at which shared bikes diffuse to different communities. To this end, we first define a community's diffusion intensity within a given period of time as p_c = b/B, which is the percentage of all unique bikes that have visited the community at least once in the period. The results reported in the previous section suggest that the diffusion intensity may be related to bike travel intensity because bikes evidently diffuse faster in the summer than in the winter. Fig. 4 confirms that the total number of recorded bike trips is strongly influenced by the time of year: in July and August it reaches a peak that is more than four times what is recorded in December and January. Accordingly, we postulate that the number of recorded trips may be a strong predictor of the diffusion intensity. In particular, we consider the number of trips destined for stations inside a community, denoted as q_c. To test this hypothesis, the following data sample is prepared. For each of the 48 communities, the diffusion intensity and the number of recorded trips are generated for 69 distinct time periods of different lengths. These include 48 individual weeks (four per month), twelve months, four quarters, two half years (January to June, and July to December), warm and cold seasons (April to September, October to March), and the entire year. Hence, in total, the sample has 48 × 69 = 3,312 data points. Fig. 5 plots the diffusion intensity p_c against ln(q_c) (the natural logarithm of q_c) for the sample generated above. Interestingly, the relationship between p_c and ln(q_c) demonstrates a strong S shape, which resembles the S-shaped growth curve widely used in biology to describe the growth of a population in a finite natural environment. It is well known that an S-shaped curve can be modeled using a logistic function (Grubler, 1990) depicted below:
$$p_c = \frac{1}{1 + \exp\left(a_1 - a_2 \ln(q_c)\right)}, \tag{5}$$
where a1 and a2 are coefficients to be calibrated. The basic regression analysis based on the sample generated before results in a1 = 9.51 and a2 = 1.08 at a 95% confidence level. The goodness-of-fit measure R2 is 0.97, indicating that the proposed functional form is indeed a very good fit to the data. Having demonstrated that the diffusion intensity of a community is strongly influenced by the number of bike trips that it receives, we note that other factors may also play an important role. Notably, the effort to reposition bikes from one station to another
Fig. 5. Diffusion intensity vs. travel intensity.
Table 3
Results of multivariate linear regression for diffusion intensity.

Coefficient (variable)          Estimated value
α_0 (constant)                  −0.31∗∗∗
α_1 (total trips, ln q_c)       0.04∗∗∗
α_2 (NERA, ln e_c)              0.08∗∗∗
α_3 (PIT, g_c)                  −0.19∗∗∗

Regression statistics
R2                              0.78
Adjusted R2                     0.78
Standard error                  0.14

Note: ∗∗∗ means the p-value is significant at a confidence level of 99%.
(sometimes in a different community) evidently helps the circulation of bikes in the system. Bike rebalancing has received much attention in recent years, but few studies have considered its impact on bike diffusion. We record the total number of effective rebalancing actions (NERA) for each community, denoted as e_c, and employ it as the second explanatory variable. Intuitively, if many of the trips ending in a community originate within it, the diffusion intensity may be negatively affected. In the extreme case where all trips start and end in the same community, the bikes are confined to that community and no diffusion takes place. To capture this effect, a third explanatory variable, denoted as g_c, is introduced to represent the percentage of internal trips (PIT). Our linear regression model then reads:
$$p_c = \alpha_0 + \alpha_1 \ln q_c + \alpha_2 \ln e_c + \alpha_3 g_c + \epsilon_c, \tag{6}$$
where α_i, i = 0, …, 3 are coefficients and ε_c is the random error associated with unobserved factors. The sample used to perform the regression is different from the one used to calibrate Eq. (5). To ensure each data point can be treated as an independent observation, we only use data aggregated for each of the twelve months in each community. Our tests suggest that one month is long enough to observe significant diffusion while still providing a reasonably large sample (48 × 12 = 576) for regression. The regression results are reported in Table 3. First of all, all coefficients are statistically significant at a confidence level of 99% and their signs agree with the expectation. For example, the results confirm that the total number of trips ending in a community positively affects the diffusion intensity, and that the percentage of internal trips has the opposite effect. The value of coefficient α_1 indicates that doubling the number of trips ending in a community would increase the diffusion intensity by roughly 2.8 percentage points (or about 170 additional unique bike visits). As expected, the number of effective rebalancing actions (e_c) also positively affects the diffusion intensity, but its effect appears much stronger: doubling e_c would result in an increase of 5.6 percentage points (or 340 additional unique bike visits) in the diffusion intensity. Moreover, to create an effect similar to doubling e_c, one could try to decrease PIT in a community by 30 percentage points. The above model may also be used to estimate the number of rebalancing actions required to achieve a desired level of diffusion intensity for a community (p_c), provided that q_c and g_c can be obtained through a demand forecasting model of bike trips.
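The arithmetic behind these interpretations can be retraced with the rounded estimates in Table 3. In the sketch below, the diffusion intensity and PIT are assumed to be expressed as fractions between 0 and 1, the names `A0` to `A3` are illustrative labels for the four regression coefficients, and `required_nera` simply inverts the fitted linear model for e_c. Because the table reports coefficients to two decimals, the results match the figures quoted in the text only approximately.

```python
import math

# Rounded estimates from Table 3: constant, ln(q_c), ln(e_c), g_c.
A0, A1, A2, A3 = -0.31, 0.04, 0.08, -0.19
ACTIVE_BIKES = 6243  # B, the number of active bikes in 2017


def predict_intensity(q_c, e_c, g_c):
    """Diffusion intensity p_c predicted by the linear model (as a fraction)."""
    return A0 + A1 * math.log(q_c) + A2 * math.log(e_c) + A3 * g_c


def doubling_effect(coefficient):
    """Change in p_c when a log-transformed regressor doubles."""
    return coefficient * math.log(2.0)


def required_nera(target_p, q_c, g_c):
    """Number of rebalancing actions e_c that yields target_p for given q_c, g_c."""
    return math.exp((target_p - A0 - A1 * math.log(q_c) - A3 * g_c) / A2)
```

For example, `doubling_effect(A1)` is about 0.028 (roughly 170 bikes out of 6,243), `doubling_effect(A2)` is about 0.055, and lowering g_c by 0.30 changes p_c by 0.19 × 0.30 ≈ 0.057, which is of the same size as doubling e_c.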
Fig. 6. Linear relationship between: (a) number of trips q_i and e_i; (b) cumulative trip distance d and e_i (each dot in the plot represents a bike).
Table 4
Results of multivariate linear regression for bike usage q_i.

Coefficient (variable)          Estimated value
γ_0 (constant)                  −145.27∗∗∗
γ_1 (diffusion range, p_i)      682.75∗∗∗
γ_2 (NERA, e_i)                 8.64∗∗∗

Regression statistics
R2                              0.76
Adjusted R2                     0.76
Standard error                  106.50

Note: ∗∗∗ means the p-value is significant at a confidence level of 99%.
4. Diffusion analysis: bike perspective

Having analyzed the diffusion intensity of communities, this section proceeds to examine the diffusion behavior from the perspective of individual bikes. To this end, an individual bike's diffusion range, denoted as p_i, is defined as the percentage of unique communities it visits within a given period of time. Here i stands for individual. Our first goal is to uncover the key factors that affect p_i.

4.1. Bike usage estimation

We start from the relationship between the number of trips made using a given bike and the number of effective rebalancing actions applied to the bike. The former is denoted as q_i and the latter as e_i to distinguish them from the similar variables defined for a community. Fig. 6a reveals a surprisingly strong linear relationship between the two variables. A simple linear regression yields a goodness-of-fit measure R2 = 0.71, a slope of 10.18 and a Y-intercept of about 109. Hence, when there is no rebalancing action, a bike gets used roughly 100 times a year, or once every three days. Each rebalancing action then produces about ten more trips, highlighting the prominent role it plays in promoting the utilization of shared bikes. Fig. 6b further confirms that the bikes' cumulative distance d in 2017 is also strongly and positively correlated with e_i, with a Pearson correlation coefficient of 0.82. The linear regression shows that, with a goodness-of-fit measure of 0.67, an additional rebalancing action yields about 25 more bike kilometers travelled. Fig. 6 shows that the number of rebalancing actions e_i is a strong predictor of the bike usage q_i. We further add the diffusion range p_i as the second predictor. The new linear regression model reads
$$q_i = \gamma_0 + \gamma_1 p_i + \gamma_2 e_i + \epsilon_i, \tag{7}$$
where γ_i, i = 0, 1, 2 are coefficients and ε_i is the random error associated with unobserved factors. The regression results are reported in Table 4. First of all, both e_i and p_i are statistically significant at a confidence level of 99%. Second, the adjusted R2 value is 0.76, better than that of the model with e_i as its only explanatory variable (R2 = 0.71; Fig. 6a). In other words, the diffusion range index improves the predictive power of the bike usage model. The coefficient of p_i, i.e. γ_1, indicates that reaching one more community (i.e., an increase of 1/48 in p_i) leads to about 14 more trips on average. Also, an additional NERA generates about 8.6 more trips. In comparison, when p_i is not included in the model, the sensitivity of trip generation to NERA is about 10.2.
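The two marginal effects quoted above follow directly from the estimates in Table 4. The short sketch below reproduces them; it assumes that p_i is expressed as a fraction of the 48 communities, and the names `G0`, `G1` and `G2` are illustrative labels for the constant, the diffusion-range coefficient and the NERA coefficient.

```python
# Estimates from Table 4: constant, diffusion range p_i, NERA e_i.
G0, G1, G2 = -145.27, 682.75, 8.64
N_COMMUNITIES = 48


def predict_trips(p_i, e_i):
    """Annual number of trips predicted for a bike by the two-variable model."""
    return G0 + G1 * p_i + G2 * e_i


def trips_per_extra_community():
    """Extra trips associated with reaching one more community (1/48 in p_i)."""
    return G1 / N_COMMUNITIES
```

With these numbers, one more community corresponds to 682.75 / 48 ≈ 14.2 additional trips, and a bike that reaches half of the communities and is rebalanced 40 times is predicted to make roughly 540 trips in the year.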
Fig. 7. Change of average diffusion range with time, for three bike sets.
Fig. 8. Relation between p_i and d for bikes in three bike sets.
4.2. Diffusion properties considering rebalancing actions

In light of the impact of rebalancing actions, the following three groups of bikes are created for the following analysis, depending on how often they were rebalanced.
• Low-rebalance-bike set (LRS). It includes bikes whose e_i in the entire year ranks in the bottom 5%. The group has 304 bikes, none of which has been rebalanced more than 16 times.
• High-rebalance-bike set (HRS). It includes bikes whose e_i ranks in the top 5%. The group has 324 bikes, none of which has been rebalanced fewer than 70 times.
• Whole set (WHS). The set that includes all bikes.
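The grouping above can be sketched as a simple rank-based split on each bike's annual e_i. The function below is an illustration only: the input dictionary and the function name are hypothetical, and ties at the 5% cut-offs are broken arbitrarily, so the resulting group sizes need not match the 304 and 324 bikes reported here.

```python
def split_by_rebalancing(nera_by_bike, share=0.05):
    """Split bikes into LRS, HRS and WHS by their annual rebalancing counts.

    nera_by_bike maps a bike ID to its number of effective rebalancing
    actions e_i over the year.
    """
    ranked = sorted(nera_by_bike, key=nera_by_bike.get)  # ascending e_i
    k = max(1, round(len(ranked) * share))
    return {
        "LRS": set(ranked[:k]),    # bottom share of bikes by e_i
        "HRS": set(ranked[-k:]),   # top share of bikes by e_i
        "WHS": set(ranked),        # all bikes
    }
```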
For each set, p_i is first computed for every individual bike for a given period of observation, and the simple average, denoted as p̄_i, is taken over all bikes included in the set. Fig. 7 reports how the average diffusion range p̄_i varies with the length of the observation period for each of the three groups. First and foremost, the plot shows, somewhat surprisingly, how small the average diffusion range is for a bike in the system. Even for HRS, the most actively rebalanced group, a bike barely reaches 51% of the communities on average after a year. Notably, towards the end of the year, the curve for HRS has leveled off, indicating it might be approaching a limit. It appears as if the region has a few subregions that are isolated from each other. It may be difficult to cross the boundaries between those subregions by bike, and accordingly the operator sees no reason to move bikes from one subregion to another when rebalancing the bikes. Also worth noting in Fig. 7 is the difference between the curves corresponding to HRS and LRS. Clearly, rebalancing actions still positively impact the bikes' diffusion range in a quite significant manner. By the 100th day, p̄_i for HRS bikes is more than double that for LRS bikes. At the end of the year, highly rebalanced bikes can still reach 60% more communities than poorly rebalanced ones. Since we only use data from one year, it is unclear how much the LRS bikes can still expand their diffusion range after a year. Fig. 8 plots p_i against the cumulative travel distance d in 2017 for each bike. It shows, in general, that the longer a bike travels in the system, the more communities it is likely to reach (see Fig. 8a). Yet, the trend is much less clear for the HRS group, which has a quite high and uniform cumulative travel distance. The Pearson correlation coefficients for p_i and d are 0.61 for WHS, 0.74 for LRS and 0.20 for HRS. 
Worth noting here is that the strongest correlation between pi and d occurs for LRS, the group with the least rebalancing activity. This is likely because the lack of rebalancing makes the diffusion process more dependent on bike usage, which is directly related to the cumulative distance travelled.
Fig. 9 plots the empirical distribution of pi for bikes in each group and for different time periods. The subplots in the left (right) column show the time periods associated with the cold (warm) season. In each plot, the vertical axis represents pi and the horizontal axis represents the cumulative percentage of active bikes whose diffusion range has reached at least pi within a given period of time.
A couple of observations can be made by comparing the plots in Fig. 9 across different groups. First, the distribution of pi for most active bikes is concentrated between 45% and 60%. For WHS, about 60% of the bikes are in this range, whereas for HRS the share is 80%. Second, the curves shift upward (i.e., the diffusion range increases) when the observation period becomes longer, as expected and consistent with the finding from Fig. 7. The plots associated with the LRS bikes are subject to much greater irregularities. Importantly, their empirical diffusion range is much less concentrated, suggesting larger variances in the diffusion range when the bikes are insufficiently rebalanced. This implies that rebalancing helps reduce the differences in the diffusion range between bikes.
Comparing the plots in the left column with those in the right, we again see that the warm season helps diffusion, likely through greater travel and rebalancing activities (note that the two are closely correlated). For WHS bikes, more than 80% of active bikes reach a diffusion range of 30% in the third quarter (July to September), whereas only 20% reach that range in the first quarter (January to March). An even bigger gap is observed for HRS bikes.
Comparing the plots in the second and third rows highlights the role of rebalancing activities. For example, in the third quarter (the warm season), fewer than 30% of LRS bikes reach a diffusion range of 30%, compared with more than 90% of HRS bikes.
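Statements such as "more than 80% of active bikes reach a diffusion range of 30%" are readings of an exceedance curve. The sketch below shows one way such a curve and such a share could be computed, assuming a list of per-bike diffusion ranges for a given set and period. The function names and the five sample values are hypothetical.

```python
def share_reaching(ranges, threshold):
    """Fraction of bikes whose diffusion range is at least `threshold`."""
    if not ranges:
        raise ValueError("no active bikes in this set and period")
    return sum(1 for p in ranges if p >= threshold) / len(ranges)


def exceedance_curve(ranges):
    """Points (cumulative share of bikes, p) for a plot laid out like Fig. 9.

    Bikes are sorted from the largest diffusion range to the smallest, so the
    point (x, p) says that a share x of bikes has reached at least p.
    """
    ordered = sorted(ranges, reverse=True)
    n = len(ordered)
    return [((rank + 1) / n, p) for rank, p in enumerate(ordered)]


# Hypothetical diffusion ranges of five bikes in one quarter.
q3_ranges = [0.10, 0.35, 0.42, 0.28, 0.55]
share = share_reaching(q3_ranges, 0.30)  # three of the five bikes qualify
```

With real data, `share_reaching` would be evaluated separately for each bike set and each quarter to reproduce comparisons like the ones drawn between LRS and HRS.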
Transportation Research Part C 107 (2019) 510–524
X. Wei, et al.
Fig. 9. Relation between pi and cumulative percentage of active bikes in three bike sets and different periods of time.
Last but not least, the maximum diffusion range in all plots never reaches 75%. Thus, even the bike with the greatest mobility fails to reach at least one quarter of all communities. This, once again, suggests that not all communities in the region are well connected by bike travel, and that the largest subregion that is well connected by bikes may encapsulate approximately 35 communities in Chicago.
5. Case Study: isolated communities
The previous diffusion analysis found that no shared Divvy bike can reach more than three quarters of the 48 communities. We hypothesize that the region has isolated communities that are not well connected by bike travel. In this section we perform a case study to test this hypothesis.
Fig. 10. Spatial distribution of (a) isolated communities, (b) concentration of African American, (c) median household income and (d) homicide rate.
Table 5
Results of logistic regression.

Number of obs = 48
LR Chi2(2) = 14.78
Prob > Chi2 = 0.00
Pseudo R2 = 0.25
Log likelihood = −22.42

I          Coef.    Std. Err.   z       P > |z|   [95% Conf. Interval]
PAAP       −0.03    0.01        −2.79   0.01      −0.05    −0.01
MHI         0.40    0.18         2.17   0.03       0.04     0.75
Constant    0.15    0.92         0.17   0.87      −1.66     1.96
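As a reading aid for Table 5, the following sketch plugs the reported (rounded) coefficients into the logistic form of Eq. (8) as printed in the paper, and recomputes the two odds ratios quoted in the text. The input values passed to the function are hypothetical. The units of MHI are not stated in this section, so the numbers only illustrate the direction of each effect and are not predictions for any real community.

```python
from math import exp

# Rounded coefficients as reported in Table 5.
W0, W_PAAP, W_MHI = 0.15, -0.03, 0.40


def p_isolated(paap, mhi):
    """Eq. (8) as printed: P(I = 1) = 1 / (1 + exp(w0 + w1*PAAP + w2*MHI)).

    `paap` is assumed to be a percentage between 0 and 100; the scale of
    `mhi` is an assumption because its units are not given in this section.
    """
    return 1.0 / (1.0 + exp(W0 + W_PAAP * paap + W_MHI * mhi))


# Odds ratios quoted in the text.
odds_ratio_paap = exp(W_PAAP)  # about 0.97
odds_ratio_mhi = exp(W_MHI)    # about 1.49

# Hypothetical communities that differ only in PAAP.
low_paap = p_isolated(paap=10, mhi=5)
high_paap = p_isolated(paap=90, mhi=5)
```

Under this form, a larger PAAP lowers the exponent and therefore raises P(I = 1), while a larger MHI does the opposite, which matches the interpretation given in the text.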
We select a group of bikes that have the highest diffusion range (pi ≥ 65%) and track their whereabouts in the region. In total, there are 68 such bikes. We count the total number of visits by these bikes to each community during the year. A community is considered isolated if the count is no more than 68, that is, if each of these bikes visits the community no more than once on average. Using this definition, 15 communities are identified as isolated, as visualized in Fig. 10a. The isolated communities are distributed on the south and west sides of the city. They appear strongly associated with the segregated communities in the region that have a high concentration of African American population, low-income households and homicide crimes (Justicemap, 2018; Federal Bureau of Investigation, 2017), as shown in Fig. 10b-d. To confirm the correlation, we perform a logistic regression that takes the following form
$$P(I = 1) = \frac{1}{1 + \exp\left(w_0 + w_1 \cdot \mathrm{PAAP} + w_2 \cdot \mathrm{MHI}\right)}, \qquad (8)$$
where PAAP represents the percentage of African American population and MHI represents the median household income. We exclude the homicide crime rate because it is strongly correlated with PAAP, with a Pearson correlation coefficient of 0.72. The dependent variable in the regression is a binary variable I, which equals 1 if a community is isolated. As reported in Table 5, the coefficients of both PAAP and MHI are significant at the 95% confidence level, and their signs are as expected. PAAP has an odds ratio of exp(−0.03) = 0.97 < 1, suggesting that a higher percentage of African American population increases the probability of isolation. The odds ratio of MHI (1.49 > 1) implies that raising household income helps reduce isolation. Note that, because the linear predictor enters the denominator of Eq. (8), these odds ratios describe the odds of a community not being isolated. The goodness-of-fit test for the proposed logistic model shows a correct classification rate of 77.08%.
Fig. 11 reports the number of trips made by the 68 bikes between (1) two isolated communities; (2) two non-isolated (or connected) communities; and (3) an isolated community and a connected community. The first two types of trips will be referred to as Type 1 and Type 2 trips. For the third type, if the isolated community is the origin, it is called Type 3O; otherwise it is called Type 3D. Trips starting from an isolated community are roughly equally likely to end in an isolated or a connected community. More specifically, the medians are 401 and 391 for Type 1 and Type 3O trips, respectively (see Fig. 11a). In contrast, trips starting from a connected community are much more likely to end in a connected community than in an isolated one: the median for Type 2 is 31,285, almost two orders of magnitude higher than that for Type 3D (675, see Fig. 11b). Fig. 11c shows that, of all trips that end in an isolated community, roughly half also start from an isolated community. Yet, less than 1% of trips ending in a connected community start in an isolated community.
These results indicate that isolated communities are reasonably connected by bike trips among themselves but are poorly connected with other communities. To summarize, the curious limit on the bikes’ diffusion range is but a sober reminder of the racial segregation in the region. This interesting finding highlights how bike diffusion analysis could be used to diagnose potential structural problems in the system.
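The isolation rule and the trip typology used in this case study can be sketched as follows. The rule is the one stated above: a community is isolated if the selected bikes visit it no more often in total than there are selected bikes. The data layout, community labels and toy records are hypothetical, and the two ratios follow the definitions given in the caption of Fig. 11.

```python
from collections import Counter


def isolated_communities(visits, communities, n_bikes):
    """Communities visited at most `n_bikes` times in total by the selected bikes.

    `visits` is a hypothetical list of (bike_id, community_id) records.
    Communities that never appear in `visits` count as zero visits.
    """
    counts = Counter(community for _bike, community in visits)
    return {c for c in communities if counts[c] <= n_bikes}


def trip_type(origin, destination, isolated):
    """Label a trip as Type 1, Type 2, Type 3O or Type 3D."""
    origin_isolated = origin in isolated
    destination_isolated = destination in isolated
    if origin_isolated and destination_isolated:
        return "Type 1"
    if not origin_isolated and not destination_isolated:
        return "Type 2"
    return "Type 3O" if origin_isolated else "Type 3D"


# Toy example with two bikes and three communities (not real data).
communities = {"A", "B", "C"}
visits = [("b1", "A"), ("b2", "A"), ("b1", "A"),
          ("b1", "B"), ("b2", "B"), ("b2", "B"),
          ("b1", "C")]
isolated = isolated_communities(visits, communities, n_bikes=2)

trips = [("A", "B"), ("A", "C"), ("C", "A"), ("C", "C"), ("B", "A")]
counts = Counter(trip_type(o, d, isolated) for o, d in trips)

# Ratios as defined in the caption of Fig. 11.
ratio_1 = counts["Type 1"] / (counts["Type 1"] + counts["Type 3D"])
ratio_2 = counts["Type 3O"] / (counts["Type 2"] + counts["Type 3O"])
```

In the paper the threshold is 68 because 68 bikes are selected; the toy example uses two bikes so that the classification can be checked by hand.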
Fig. 11. Box plots of Type 1, 2, 3O and 3D trips. (a) Type 1 vs. Type 3O trips. (b) Type 2 vs. Type 3D trips. (c) Ratio 1 = Type 1/(Type 1 + Type 3D) vs. Ratio 2 = Type 3O/(Type 2 + Type 3O).
6. Conclusions
We have examined the operation of a docked bike-sharing system from the perspective of bike diffusion behavior. An analogy is drawn between the movement of shared bikes in the system and the transmission of certain information on the internet or the spreading of epidemics among humans. Our relatively straightforward data-mining efforts and analysis have uncovered quite a few interesting, if not entirely surprising, findings, as highlighted in the following.
• The distribution of bike trip distance peaks between 0.8 and 2 km. Beyond 6.3 km, it follows a strong power law with an exponent slightly larger than those reported in the literature on human mobility.
• The relationship between the diffusion intensity and the number of total incoming bike trips of a community follows an S-curve quite similar to the growth of a population in a finite natural environment. Furthermore, the diffusion intensity of a community is also affected negatively by the percentage of inner-community trips and positively by the number of rebalancing actions. The effect of rebalancing actions is roughly twice as strong as that of incoming bike trips.
• Both the diffusion range of a bike and the number of rebalancing actions it receives are strong predictors of its use. Reaching one more community will produce about 14 more trips and an additional rebalancing action contributes about 8.6.
• Even the most active bikes could only reach about 75% of all communities in Chicago. This finding helps identify a cluster of communities poorly connected with the rest of the city by bike travel. It is confirmed that these isolated communities are strongly correlated with the communities in the region that have a high concentration of African American population, low-income households and homicide crimes.
We caution that some of the above empirical findings may not be directly transferable to other bike-sharing systems. This study takes only an initial step towards analyzing and understanding the diffusion patterns in shared bike systems. The present study lacks a physical representation of the underlying process that presumably governs the diffusion behavior. Modeling and simulating the process, as well as comparing the analytical and simulated results with the empirical observations, would be an important next step, one that could potentially generate more useful managerial insights for the operators of these systems. Another general direction is to apply the analysis to other bike-sharing systems, particularly the dockless systems, and try to identify and understand the spatial, societal and operational characteristics that would affect the findings made here.
Acknowledgments
The research was conducted when the first author was a visiting PhD student at Northwestern University. Her visit was supported by the China Scholarship Council. The research was supported by the National Natural Science Foundation of China (No. 51878166 and No. 71801042), the Scientific Research Foundation of Graduate School of Southeast University (No. YBJJ1837) and the Jiangsu Graduate Research Innovation Program (No. KYCX18_0134). The research was also partially supported by the National Science Foundation under the award number PFI:BIC 1534138. The authors are grateful for the constructive comments provided by three anonymous reviewers. The case study presented in Section 5 was added based on these comments. The remaining errors are the authors' alone.
Appendix A. Supplementary material
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.trc.2019.08.018.
References
Ai, Y., Li, Z., Gan, M., 2018. A solution to measure traveler’s transfer tolerance for walking mode and dockless bike-sharing mode. J. Supercomput. 1–18.
Castellano, C., Fortunato, S., Loreto, V., 2009. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591.
de Chardon, C.M., Caruso, G., 2015. Estimating bike-share trips using station level data. Transport. Res. Part B: Methodol. 78, 260–279.
Chemla, D., Meunier, F., Calvo, R.W., 2013. Bike sharing systems: Solving the static rebalancing problem. Disc. Optim. 10, 120–146.
Chow, J.Y., Sayarshad, H.R., 2014. Symbiotic network design strategies in the presence of coexisting transportation networks. Transport. Res. Part B: Methodol. 62, 13–34.
City of Chicago, 2018. Data portal: Boundaries - community areas (current), 2018. https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6.
Divvy, 2018. Divvy help: Pricing structure. https://www.divvybikes.com/system-data.
Du, Y., Deng, F., Liao, F., 2019. A model framework for discovering the spatio-temporal usage patterns of public free-floating bike-sharing system. Transport. Res. Part C: Emerg. Technol. 103, 39–55.
Erdoğan, G., Battarra, M., Calvo, R.W., 2015. An exact algorithm for the static rebalancing problem arising in bicycle sharing systems. Eur. J. Oper. Res. 245, 667–679.
Faghih-Imani, A., Eluru, N., 2015. Analysing bicycle-sharing system user destination choice preferences: Chicago’s divvy system. J. Transp. Geogr. 44, 53–64.
Federal Bureau of Investigation, 2017. Crime in the us. https://ucr.fbi.gov/crime-in-the-u.s.
Fishman, E., 2016. Bikeshare: A review of recent literature. Transport Rev. 36, 92–113.
Forma, I.A., Raviv, T., Tzur, M., 2015. A 3-step math heuristic for the static repositioning problem in bike-sharing systems. Transport. Res. Part B: Methodol. 71, 230–247.
Frade, I., Ribeiro, A., 2015. Bike-sharing stations: A maximal covering location approach. Transport. Res. Part A: Policy Practice 82, 216–227.
Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L., 2008. Understanding individual human mobility patterns. Nature 453, 779.
Grubler, A., 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Physica-Verlag, Heidelberg.
Gu, T., Kim, I., Currie, G., 2019. Measuring immediate impacts of a new mass transit system on an existing bike-share system in china. Transport. Res. Part A: Policy Practice 124, 20–39.
Haider, Z., Nikolaev, A., Kang, J.E., Kwon, C., 2014. Inventory rebalancing through pricing in public bike sharing systems. Working Paper 564, Department of Industrial and Systems Engineering, University at Buffalo, Buffalo, NY, USA.
Hethcote, H., 2000. The mathematics of infectious diseases. SIAM Rev. 42, 599–653.
Hyland, M., Hong, Z., de Farias Pinto, H.K.R., Chen, Y., 2018. Hybrid cluster-regression approach to model bikeshare station usage. Transport. Res. Part A: Policy Practice 115, 71–89.
Jiang, Y., 2018. Bike sharing dilemma: Intending to protect the environment risks becoming urban waste. http://www.chinanews.com/sh/2018/08-10/8595499.shtml.
Justicemap, 2018. Data comes from the 2010 us census and the american community survey (2013–2017 - 5 year summary). http://www.justicemap.org/data.php.
Liu, S., Li, W., Deng, H., 2017. Time to regulate china’s booming bike share sector. https://www.chinadialogue.net/blog/9887-Time-to-regulate-China-s-booming-bike-share-sector-/en.
Liu, Z., Jia, X., Cheng, W., 2012. Solving the last mile problem: Ensure the success of public bicycle system in beijing. In: Procedia - Social and Behavioral Sciences 43, 73–78. 8th International Conference on Traffic and Transportation Studies (ICTTS 2012).
Lloyd, A.L., May, R.M., 2001. How viruses spread among computers and people. Science 292, 1316–1317.
Lu, M., Hsu, S.C., Chen, P.C., Lee, W.Y., 2018. Improving the sustainability of integrated transportation system with bike-sharing: A spatial agent-based approach. Sustain. Cities Soc. 41, 44–51.
Ma, T., Liu, C., Erdoğan, S., 2015. Bicycle sharing and public transit: does capital bikeshare affect metrorail ridership in washington, dc? Transport. Res. Rec.: J. Transport. Res. Board 1–9.
Meddin, R., 2018. In Operation: Beginning December 2018. http://www.bikesharingmap.com.
Mobike, 2017a. Traveling report of sharing bikes in China, 2017. Technical Report. Beijing Mobike Technology Co., Ltd.
Mobike, 2017b. Traveling report of sharing bikes in Wuhan, 2017. Technical Report. Wuhan Transportation Development Strategy Research Institute.
Newman, M.E., 2002. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128.
Pastor-Satorras, R., Vespignani, A., 2001. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200.
Schuijbroek, J., Hampshire, R.C., Van Hoeve, W.J., 2017. Inventory rebalancing and vehicle routing in bike sharing systems. Eur. J. Oper. Res. 257, 992–1004.
Shaheen, S., Guzman, S., Zhang, H., 2010. Bikesharing in Europe, the americas, and asia: past, present, and future. Transport. Res. Rec.: J. Transport. Res. Board 159–167.
Stiglic, M., Agatz, N., Savelsbergh, M., Gradisar, M., 2018. Enhancing urban mobility: Integrating ride-sharing and public transit. Comput. Oper. Res. 90, 12–21.
Wang, S., 2017. Beijing municipal bureau of statistics released a survey on the satisfaction of bicycle sharing: Over 90% of users are satisfied. http://beijing.qianlong.com/2017/0510/1674493.shtml.
Warrington, J., Ruchti, D., 2019. Two-stage stochastic approximation for dynamic rebalancing of shared mobility systems. Transport. Res. Part C: Emerg. Technol. 104, 110–134.
Wergin, J., Buehler, R., 2017. Where do bikeshare bikes actually go? analysis of capital bikeshare trips with gps data. Transport. Res. Rec.: J. Transport. Res. Board 12–21.
Yang, X.H., Cheng, Z., Chen, G., Wang, L., Ruan, Z.Y., Zheng, Y.J., 2018. The impact of a public bicycle-sharing system on urban public transport networks. Transport. Res. Part A: Policy Practice 107, 246–256.
Yang, Y., Li, T., Zhang, T., Yang, W., 2017. Understanding the utilization characteristics of bicycle-sharing systems in underdeveloped cities: a case study in xuchang city, china. Transport. Res. Rec.: J. Transport. Res. Board 78–85.
Zhang, Y., Mi, Z., 2018. Environmental benefits of bike sharing: A big data-based analysis. Appl. Energy 220, 296–301.
Zhang, Y., Wen, H., Qiu, F., Wang, Z., Abbas, H., 2019. ibike: Intelligent public bicycle services assisted by data analytics. Future Generat. Comput. Syst. 95, 187–197.